git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Vegard Nossum <vegard.nossum@oracle.com>
Cc: git@vger.kernel.org,
	"Quentin Casasnovas" <quentin.casasnovas@oracle.com>,
	"Shawn Pearce" <spearce@spearce.org>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: Huge performance bottleneck reading packs
Date: Thu, 13 Oct 2016 16:43:51 -0400	[thread overview]
Message-ID: <20161013204351.rezqssvw63343l4g@sigill.intra.peff.net> (raw)
In-Reply-To: <d727ee5d-5f0f-e6df-3f83-35fe74641ace@oracle.com>

On Thu, Oct 13, 2016 at 08:18:11PM +0200, Vegard Nossum wrote:

> > My guess is that the number is relatively high. And that would explain
> > why nobody else has really complained much; such a pattern is probably
> > uncommon.
> 
> I get ~3,700 objects "they are advertising that we don't have".
> 
> They are all indeed tags which I don't have (nor want) locally.

Thanks. That's about what I expected.

> So without your patch I get these numbers:
> 
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>  37.34     29.83    29.83 1948265116     0.00     0.00  strip_suffix_mem
>  27.56     51.85    22.02    11358     0.00     0.01 prepare_packed_git_one
>  14.24     63.23    11.38 1924457702     0.00     0.00  strip_suffix
>   4.88     67.13     3.90 26045319     0.00     0.00  find_pack_entry_one
>   4.17     70.46     3.33 84828149     0.00     0.00  hashcmp
>   3.28     73.08     2.62 79760199     0.00     0.00  hashcmp
>   2.44     75.03     1.95                             const_error
> 
> the call graph data shows all the reprepare_packed_git() calls to come
> from a chain like this:
> 
> do_for_each_ref
> do_for_each_ref_iterator
> ref_iterator_advance
> files_ref_iterator_advance
> ref_resolves_to_object
> has_sha1_file
> has_sha1_file_with_flags
> reprepare_packed_git

Hrm. It seems weird that we'd hit reprepare_packed_git() via
do_for_each_ref(), because that's looking at _your_ refs. So any time
that code path hits reprepare_packed_git(), it should be complaining
about repository corruption to stderr.

And that also wouldn't make sense that my patch would improve it. Are
you sure you've read the perf output correctly (I'll admit that I am
often confused by the way it reports call graphs)?

> With your (first) patch I get this instead:
> 
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  29.41      0.25     0.25  4490399     0.00     0.00  hashcmp
>  16.47      0.39     0.14   843447     0.00     0.00  find_pack_entry_one

These functions appearing at the top are probably a sign that you have
too many packs (these are just object lookup, which has to linearly try
each pack).

So I'd expect things to improve after a git-gc (and especially if you
turn off the packSizeLimit).

> Do you have all the profile data you were interested in before I try a
> 'git gc'?

Yes, I think so. At least I'm satisfied that there's not another
HAS_SHA1_QUICK case that I'm missing.

> I really appreciate the quick response and the work you put into fixing
> this, even when it's my fault for disabling gc, thanks!

No problem. I do think you'll benefit a lot from packing everything into
a single pack, but it's also clear that Git was doing more work than it
needed to be. This was a known issue when we added the racy-check to
has_sha1_file(), and knew that we might have to field reports like this
one.

-Peff

  reply	other threads:[~2016-10-13 20:45 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-12 22:30 Huge performance bottleneck reading packs Vegard Nossum
2016-10-12 22:45 ` Junio C Hamano
2016-10-13  7:17   ` Vegard Nossum
2016-10-13 14:50     ` Jeff King
2016-10-12 23:01 ` Jeff King
2016-10-12 23:18   ` Jeff King
2016-10-12 23:47     ` Jeff King
2016-10-13  9:04       ` Vegard Nossum
2016-10-14  9:35         ` Jakub Narębski
2016-10-13  7:20   ` Vegard Nossum
2016-10-13 15:26     ` Jeff King
2016-10-13 16:53       ` [PATCH] fetch: use "quick" has_sha1_file for tag following Jeff King
2016-10-13 17:04         ` Jeff King
2016-10-13 20:06           ` Jeff King
2016-10-14 17:39             ` Junio C Hamano
2016-10-14 18:59               ` Jeff King
2016-10-17 17:30                 ` Junio C Hamano
2016-10-18 10:28                   ` Jeff King
2016-10-13 18:18       ` Huge performance bottleneck reading packs Vegard Nossum
2016-10-13 20:43         ` Jeff King [this message]
2016-10-14  6:55           ` Vegard Nossum
2016-10-14 19:00             ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161013204351.rezqssvw63343l4g@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=quentin.casasnovas@oracle.com \
    --cc=spearce@spearce.org \
    --cc=vegard.nossum@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).