From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: upload-pack is slow with lots of refs
Date: Wed, 3 Oct 2012 17:20:07 -0400
Message-ID: <20121003212007.GC4484@sigill.intra.peff.net>
In-Reply-To: <CACBZZX4Grya=FbL9XEh_EK6KVsFZYWCuHveV2QevcBwr+iYTMQ@mail.gmail.com>

On Wed, Oct 03, 2012 at 10:16:56PM +0200, Ævar Arnfjörð Bjarmason wrote:

> I can't provide all the details now (not with access to that machine
> now), but briefly:
> 
>  * The git client/server version is 1.7.8
> 
>  * The repository has around 50k refs, they're "real" refs, almost all
>    of them (say all but 0.5k-1k) are annotated tags, the rest are
>    branches.

I'd definitely try upgrading, then; I got measurable speedups from this
exact case using the patches in v1.7.10.

>  * >99% of them are packed, there's a weekly cronjob that packs them
>    all up, there were a few newly pushed branches and tags outside of
>    the

A few strays shouldn't make a big difference. The killer is calling
open(2) 50,000 times, but having most of it packed should prevent that.
I suspect Michael Haggerty's work on the ref cache may help, too
(otherwise we have to try each packed ref in the filesystem to make sure
nobody has written it since we packed).

>  * I tried "echo -n | git upload-pack <repo>" on both that 50k
>    repository and a repository with <100 refs, the former took around
>    ~1-2s to run on a 24 core box and the latter ~500ms.

More cores won't help, of course, as dumping the refs is single-threaded.

With v1.7.12, my ~400K-ref test repository takes about 0.8s to run (on my
2-year-old 1.8 GHz i7, though it is probably turbo-boosting to 3 GHz).
So I'm surprised it is so slow.

Your 100-ref case is slow, too. Upload-pack's initial advertisement on
my linux-2.6 repository (with about 900 refs) is more like 20ms. I'd

>  * A co-worker who was working on this today tried it on 1.7.12 and
>    claimed that it had the same performance characteristics.

That's surprising to me. Can you try to verify those numbers?

>  * I tried to profile it under gcc -pg && echo -n | ./git-upload-pack
>    <repo> but it doesn't produce a profile like that, presumably
>    because the process exits unsuccessfully.

If it's a recent version of Linux, you'll get much nicer results with
perf. Here's what my 400K-ref case looks like:

  $ time echo 0000 | perf record git-upload-pack . >/dev/null
  real    0m0.808s
  user    0m0.660s
  sys     0m0.136s

  $ perf report | grep -v ^# | head
  11.40%  git-upload-pack  libc-2.13.so        [.] vfprintf
   9.70%  git-upload-pack  git-upload-pack     [.] find_pack_entry_one
   7.64%  git-upload-pack  git-upload-pack     [.] check_refname_format
   6.81%  git-upload-pack  libc-2.13.so        [.] __memcmp_sse4_1
   5.79%  git-upload-pack  libc-2.13.so        [.] getenv
   4.20%  git-upload-pack  libc-2.13.so        [.] __strlen_sse42
   3.72%  git-upload-pack  git-upload-pack     [.] ref_entry_cmp_sslice
   3.15%  git-upload-pack  git-upload-pack     [.] read_packed_refs
   2.65%  git-upload-pack  git-upload-pack     [.] sha1_to_hex
   2.44%  git-upload-pack  libc-2.13.so        [.] _IO_default_xsputn

So nothing too surprising, though there is some room for improvement
(e.g., it looks like we are calling getenv in a tight loop, which could
be hoisted out to a single call).
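
To illustrate what hoisting means here, a minimal sketch follows. It is
not git's actual code: the variable name DEMO_REF_PREFIX and both
functions are hypothetical, invented only to show the pattern of one
environment lookup per ref versus one lookup before the loop.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical per-ref check. "DEMO_REF_PREFIX" is an invented name,
 * not a variable that git reads.
 */
static int ref_matches_slow(const char *refname)
{
	/* getenv() scans the environment on every single call. */
	const char *prefix = getenv("DEMO_REF_PREFIX");
	return !prefix || !strncmp(refname, prefix, strlen(prefix));
}

/* Slow variant: one getenv() call per ref. */
size_t count_matches_slow(const char **refs, size_t nr)
{
	size_t i, count = 0;

	assert(refs || !nr);
	for (i = 0; i < nr; i++)
		if (ref_matches_slow(refs[i]))
			count++;
	return count;
}

/* Hoisted variant: one getenv() call and one strlen() before the loop. */
size_t count_matches_hoisted(const char **refs, size_t nr)
{
	const char *prefix = getenv("DEMO_REF_PREFIX");
	size_t prefix_len = prefix ? strlen(prefix) : 0;
	size_t i, count = 0;

	assert(refs || !nr);
	for (i = 0; i < nr; i++)
		if (!prefix || !strncmp(refs[i], prefix, prefix_len))
			count++;
	return count;
}
```

Both functions return the same count; the hoisted form is only valid as
long as nothing changes the environment while the loop runs.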

Do note that this version of git was compiled with -O3. Compiling with
-O0 produces very different results (it's more like 1.3s, and the
hotspots are check_refname_component and sha1_to_hex).

>    Maybe someone here knows offhand what mock data I could feed
>    git-upload-pack to make it happy to just list the refs, or better
>    yet do a bit more work which it would do if it were actually doing
>    the fetch (I suppose I could just do a fetch, but I wanted to do
>    this from a locally compiled checkout).

If you feed "0000" as I did above, that is the flush signal for "I have
no more lines to send you", which means that we are not actually
fetching anything. I.e., this is the exact same conversation a no-op
"git fetch" would produce.
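
For reference, the framing behind that "0000" can be sketched as below.
Each pkt-line starts with four hex digits giving the total length of the
line, including those four digits; a length of "0000" is the flush
packet. The function names are hypothetical (they are not git's own
helpers), and the 0xffff bound is just what four hex digits can express;
the real protocol imposes a lower limit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PKT_HEADER_LEN 4

/*
 * Format one pkt-line into "out" as a NUL-terminated string.
 * Returns the number of bytes in the pkt-line (header plus payload),
 * or -1 if it does not fit in the length field or the buffer.
 */
int pkt_line_format(char *out, size_t outsz, const char *payload)
{
	size_t len;

	assert(out && payload);
	len = strlen(payload) + PKT_HEADER_LEN;
	if (len > 0xffff || len + 1 > outsz)
		return -1;
	/* The length counts the four header bytes themselves. */
	snprintf(out, outsz, "%04zx%s", len, payload);
	return (int)len;
}

/*
 * Format a flush packet: the special length "0000" with no payload.
 * Returns 4, or -1 if the buffer is too small.
 */
int pkt_flush_format(char *out, size_t outsz)
{
	assert(out);
	if (outsz < PKT_HEADER_LEN + 1)
		return -1;
	memcpy(out, "0000", PKT_HEADER_LEN + 1);
	return PKT_HEADER_LEN;
}
```

So a payload of "hello\n" is framed as "000ahello\n" (6 bytes of payload
plus 4 bytes of header), while the flush packet is the bare string
"0000".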

-Peff
