git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Vegard Nossum <vegard.nossum@oracle.com>
Cc: git@vger.kernel.org,
	"Quentin Casasnovas" <quentin.casasnovas@oracle.com>,
	"Shawn Pearce" <spearce@spearce.org>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: Huge performance bottleneck reading packs
Date: Thu, 13 Oct 2016 11:26:32 -0400	[thread overview]
Message-ID: <20161013152632.ynsabbv3yrthgidy@sigill.intra.peff.net> (raw)
In-Reply-To: <fb66dc02-fb75-5aad-74e5-01b969cf9f9a@oracle.com> <af801f22-0e24-525d-a862-f2114941719a@oracle.com>

On Thu, Oct 13, 2016 at 09:20:07AM +0200, Vegard Nossum wrote:

> > Does the patch below help?
> 
> Yes, ~2m10s -> ~1m25s when I test a git fetch this morning (the other
> variation in time may be due to different CPU usage by other programs,
> but I ran with/without the patch multiple times and the difference is
> consistent).
> [...]
> There are some 20k refs on the remote, closer to 25k locally.

OK, that makes some sense. For whatever reason, your remote has a bunch
of tags that point to objects you do not already have. That could
happen, I think, if the remote added a lot of tags since you cloned
(because clone will grab all of the tags), but those tags do not point
to history that you are otherwise fetching (since fetch by default will
"auto-follow" such tags).

It might be interesting to see the results of:

  # all the objects we have
  git cat-file --batch-all-objects --batch-check='%(objectname)' >us

  # all the objects the remote is advertising
  git ls-remote origin | cut -f1 | sort -u >them

  # what they are advertising that we don't have
  comm -13 us them | wc -l

My guess is that the number is relatively high. And that would explain
why nobody else has really complained much; such a pattern is probably
uncommon.

> The fetch doesn't actually get anything from the remote as everything is
> already up to date (that makes the 2m40s times even more frustrating in
> a way :-)). Here's count-objects:

If the fetch is largely a noop, then that makes me wonder why we are
spending even a minute in the "good" case. I wonder if there is some
other spot that is wasting CPU on some inefficient data structure
related to the number of refs you have. If you can do any profiling that
points to a hot spot, that would be interesting to see (and also whether
a gc improves things).

I see in find_non_local_tags() that we build up a sorted string_list via
repeated calls to string_list_insert(), which will keep the thing sorted
at each stage. That's not as efficient as just sorting at the end, but I
think it's probably OK in practice because we actually feed it via
for_each_ref(), whose output is sorted, and so we'd always just be
appending to the end.

-Peff

  reply	other threads:[~2016-10-13 16:27 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-12 22:30 Huge performance bottleneck reading packs Vegard Nossum
2016-10-12 22:45 ` Junio C Hamano
2016-10-13  7:17   ` Vegard Nossum
2016-10-13 14:50     ` Jeff King
2016-10-12 23:01 ` Jeff King
2016-10-12 23:18   ` Jeff King
2016-10-12 23:47     ` Jeff King
2016-10-13  9:04       ` Vegard Nossum
2016-10-14  9:35         ` Jakub Narębski
2016-10-13  7:20   ` Vegard Nossum
2016-10-13 15:26     ` Jeff King [this message]
2016-10-13 16:53       ` [PATCH] fetch: use "quick" has_sha1_file for tag following Jeff King
2016-10-13 17:04         ` Jeff King
2016-10-13 20:06           ` Jeff King
2016-10-14 17:39             ` Junio C Hamano
2016-10-14 18:59               ` Jeff King
2016-10-17 17:30                 ` Junio C Hamano
2016-10-18 10:28                   ` Jeff King
2016-10-13 18:18       ` Huge performance bottleneck reading packs Vegard Nossum
2016-10-13 20:43         ` Jeff King
2016-10-14  6:55           ` Vegard Nossum
2016-10-14 19:00             ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161013152632.ynsabbv3yrthgidy@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=quentin.casasnovas@oracle.com \
    --cc=spearce@spearce.org \
    --cc=vegard.nossum@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).