From: Jeff King <peff@peff.net> To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com> Cc: Git Mailing List <git@vger.kernel.org> Subject: Re: upload-pack is slow with lots of refs Date: Wed, 3 Oct 2012 19:15:29 -0400 [thread overview] Message-ID: <20121003231529.GA11618@sigill.intra.peff.net> (raw) In-Reply-To: <CACBZZX6yMfeOx6x4iy8beq5niy9HvPq0c8ND5jZkoiJWAgVjfw@mail.gmail.com> On Thu, Oct 04, 2012 at 12:15:47AM +0200, Ævar Arnfjörð Bjarmason wrote: > I think he was wrong, I tested this on git.git by first creating a lot > of tags: > > parallel --eta "git tag -a -m"{}" test-again-{}" ::: $(git rev-list HEAD) > > Then doing: > > git pack-refs --all > git repack -A -d > > And compiled with -g -O3 I get around 1.55 runs/s of git-upload-pack > on 1.7.8 and 2.59/s on the master branch. Thanks for the update, that's more like what I expected. > FWIW here are my results on the above pathological git.git > > $ uname -r; perf --version; echo 0000 | perf record > ./git-upload-pack .>/dev/null; perf report | grep -v ^# | head > 3.2.0-2-amd64 > perf version 3.2.17 > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.026 MB perf.data (~1131 samples) ] > 29.08% git-upload-pack libz.so.1.2.7 [.] inflate > 17.99% git-upload-pack libz.so.1.2.7 [.] 0xaec1 > 6.21% git-upload-pack libc-2.13.so [.] 0x117503 > 5.69% git-upload-pack libcrypto.so.1.0.0 [.] 0x82c3d > 4.87% git-upload-pack git-upload-pack [.] find_pack_entry_one > 3.18% git-upload-pack ld-2.13.so [.] 0x886e > 2.96% git-upload-pack libc-2.13.so [.] vfprintf > 2.83% git-upload-pack git-upload-pack [.] search_for_subdir > 1.56% git-upload-pack [kernel.kallsyms] [k] do_raw_spin_lock > 1.36% git-upload-pack libc-2.13.so [.] vsnprintf > > I wonder why your report doesn't note any time in libz. This is on > Debian testing, maybe your OS uses different strip settings so it > doesn't show up? Mine was on Debian unstable. The difference is probably that I have 400K refs, but only 12K unique ones (this is the master alternates repo containing every ref from every fork of rails/rails on GitHub). So I spend proportionally more time fiddling with refs and outputting than I do actually inflating tag objects. Hmm. It seems like we should not need to open the tags at all. The main reason is to produce the "peeled" advertisement just after it. But for a packed ref with a modern version of git that supports the "peeled" extension, we should already have that information. The hack-ish patch below tries to reuse that. The interface is terrible, and we should probably just pass the peel information via for_each_ref (peel_ref tries to do a similar thing, but it also has a bad interface; if we don't have the information already, it will redo the ref lookup. We could probably get away with a peel_sha1 which uses the same optimization trick as peel_ref). With this patch my 800ms upload-pack drops to 600ms. I suspect it will have an even greater impact for you, since you are spending much more of your time on object loading than I am. And note of course that while these micro-optimizations are neat, we're still going to end up shipping quite a lot of data over the wire. Moving to a protocol where we are advertising fewer refs would solve a lot more problems in the long run. --- diff --git a/refs.c b/refs.c index 551a0f9..68eca3a 100644 --- a/refs.c +++ b/refs.c @@ -510,6 +510,14 @@ static struct ref_entry *current_ref; static struct ref_entry *current_ref; +/* XXX horrible interface due to implied argument. not for real use */ +const unsigned char *peel_current_ref(void) +{ + if (!current_ref || !(current_ref->flag & REF_KNOWS_PEELED)) + return NULL; + return current_ref->u.value.peeled; +} + static int do_one_ref(const char *base, each_ref_fn fn, int trim, int flags, void *cb_data, struct ref_entry *entry) { diff --git a/refs.h b/refs.h index 9d14558..88c5445 100644 --- a/refs.h +++ b/refs.h @@ -14,6 +14,8 @@ struct ref_lock { #define REF_ISPACKED 0x02 #define REF_ISBROKEN 0x04 +const unsigned char *peel_current_ref(void); + /* * Calls the specified function for each ref file until it returns * nonzero, and returns the value. Please note that it is not safe to diff --git a/upload-pack.c b/upload-pack.c index 8f4703b..cdf43b0 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -736,8 +736,9 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo " include-tag multi_ack_detailed"; struct object *o = lookup_unknown_object(sha1); const char *refname_nons = strip_namespace(refname); + const unsigned char *peeled = peel_current_ref(); - if (o->type == OBJ_NONE) { + if (!peeled && o->type == OBJ_NONE) { o->type = sha1_object_info(sha1, NULL); if (o->type < 0) die("git upload-pack: cannot find object %s:", sha1_to_hex(sha1)); @@ -756,11 +757,13 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo o->flags |= OUR_REF; nr_our_refs++; } - if (o->type == OBJ_TAG) { + if (!peeled && o->type == OBJ_TAG) { o = deref_tag_noverify(o); if (o) - packet_write(1, "%s %s^{}\n", sha1_to_hex(o->sha1), refname_nons); + peeled = o->sha1; } + if (peeled && !is_null_sha1(peeled)) + packet_write(1, "%s %s^{}\n", sha1_to_hex(peeled), refname_nons); return 0; }
next prev parent reply other threads:[~2012-10-04 21:53 UTC|newest] Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top 2012-10-03 12:36 Ævar Arnfjörð Bjarmason 2012-10-03 13:06 ` Nguyen Thai Ngoc Duy 2012-10-03 18:03 ` Jeff King 2012-10-03 18:53 ` Junio C Hamano 2012-10-03 18:55 ` Jeff King 2012-10-03 19:41 ` Shawn Pearce 2012-10-03 20:13 ` Jeff King 2012-10-04 21:52 ` Sascha Cunz 2012-10-05 0:20 ` Jeff King 2012-10-05 6:24 ` Johannes Sixt 2012-10-05 16:57 ` Shawn Pearce 2012-10-08 15:05 ` Johannes Sixt 2012-10-09 6:46 ` Shawn Pearce 2012-10-09 20:30 ` Johannes Sixt 2012-10-09 20:46 ` Johannes Sixt 2012-10-03 20:16 ` Ævar Arnfjörð Bjarmason 2012-10-03 21:20 ` Jeff King 2012-10-03 22:15 ` Ævar Arnfjörð Bjarmason 2012-10-03 23:15 ` Jeff King [this message] 2012-10-03 23:54 ` Ævar Arnfjörð Bjarmason 2012-10-04 7:56 ` [PATCH 0/4] optimizing upload-pack ref peeling Jeff King 2012-10-04 7:58 ` [PATCH 1/4] peel_ref: use faster deref_tag_noverify Jeff King 2012-10-04 18:24 ` Junio C Hamano 2012-10-04 8:00 ` [PATCH 2/4] peel_ref: do not return a null sha1 Jeff King 2012-10-04 18:32 ` Junio C Hamano 2012-10-04 8:02 ` [PATCH 3/4] peel_ref: check object type before loading Jeff King 2012-10-04 19:06 ` Junio C Hamano 2012-10-04 19:41 ` Jeff King 2012-10-04 20:41 ` Junio C Hamano 2012-10-04 21:59 ` Jeff King 2012-10-04 8:03 ` [PATCH 4/4] upload-pack: use peel_ref for ref advertisements Jeff King 2012-10-04 8:04 ` [PATCH 0/4] optimizing upload-pack ref peeling Jeff King 2012-10-04 9:01 ` Ævar Arnfjörð Bjarmason 2012-10-04 12:14 ` Nazri Ramliy 2012-10-03 22:32 ` upload-pack is slow with lots of refs Ævar Arnfjörð Bjarmason 2012-10-03 23:21 ` Jeff King 2012-10-03 23:47 ` Ævar Arnfjörð Bjarmason 2012-10-03 19:13 ` Junio C Hamano
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style List information: http://vger.kernel.org/majordomo-info.html * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20121003231529.GA11618@sigill.intra.peff.net \ --to=peff@peff.net \ --cc=avarab@gmail.com \ --cc=git@vger.kernel.org \ --subject='Re: upload-pack is slow with lots of refs' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Code repositories for project(s) associated with this inbox: https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).