git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: upload-pack is slow with lots of refs
Date: Wed, 3 Oct 2012 19:15:29 -0400	[thread overview]
Message-ID: <20121003231529.GA11618@sigill.intra.peff.net> (raw)
In-Reply-To: <CACBZZX6yMfeOx6x4iy8beq5niy9HvPq0c8ND5jZkoiJWAgVjfw@mail.gmail.com>

On Thu, Oct 04, 2012 at 12:15:47AM +0200, Ævar Arnfjörð Bjarmason wrote:

> I think he was wrong, I tested this on git.git by first creating a lot
> of tags:
> 
>      parallel --eta "git tag -a -m"{}" test-again-{}" ::: $(git rev-list HEAD)
> 
> Then doing:
> 
>     git pack-refs --all
>     git repack -A -d
> 
> And compiled with -g -O3 I get around 1.55 runs/s of git-upload-pack
> on 1.7.8 and 2.59/s on the master branch.

Thanks for the update, that's more like what I expected.

> FWIW here are my results on the above pathological git.git
> 
>     $ uname -r; perf --version; echo 0000 | perf record
> ./git-upload-pack .>/dev/null; perf report | grep -v ^# | head
>     3.2.0-2-amd64
>     perf version 3.2.17
>     [ perf record: Woken up 1 times to write data ]
>     [ perf record: Captured and wrote 0.026 MB perf.data (~1131 samples) ]
>         29.08%  git-upload-pack  libz.so.1.2.7       [.] inflate
>         17.99%  git-upload-pack  libz.so.1.2.7       [.] 0xaec1
>          6.21%  git-upload-pack  libc-2.13.so        [.] 0x117503
>          5.69%  git-upload-pack  libcrypto.so.1.0.0  [.] 0x82c3d
>          4.87%  git-upload-pack  git-upload-pack     [.] find_pack_entry_one
>          3.18%  git-upload-pack  ld-2.13.so          [.] 0x886e
>          2.96%  git-upload-pack  libc-2.13.so        [.] vfprintf
>          2.83%  git-upload-pack  git-upload-pack     [.] search_for_subdir
>          1.56%  git-upload-pack  [kernel.kallsyms]   [k] do_raw_spin_lock
>          1.36%  git-upload-pack  libc-2.13.so        [.] vsnprintf
> 
> I wonder why your report doesn't note any time in libz. This is on
> Debian testing, maybe your OS uses different strip settings so it
> doesn't show up?

Mine was on Debian unstable. The difference is probably that I have 400K
refs, but only 12K unique ones (this is the master alternates repo
containing every ref from every fork of rails/rails on GitHub). So I
spend proportionally more time fiddling with refs and outputting than
I do actually inflating tag objects.

Hmm. It seems like we should not need to open the tags at all. The main
reason is to produce the "peeled" advertisement just after it. But for a
packed ref with a modern version of git that supports the "peeled"
extension, we should already have that information.

The hack-ish patch below tries to reuse that. The interface is terrible,
and we should probably just pass the peel information via for_each_ref
(peel_ref tries to do a similar thing, but it also has a bad interface;
if we don't have the information already, it will redo the ref lookup.
We could probably get away with a peel_sha1 which uses the same
optimization trick as peel_ref).

With this patch my 800ms upload-pack drops to 600ms. I suspect it will
have an even greater impact for you, since you are spending much more of
your time on object loading than I am.

And note of course that while these micro-optimizations are neat, we're
still going to end up shipping quite a lot of data over the wire. Moving
to a protocol where we are advertising fewer refs would solve a lot more
problems in the long run.

---
diff --git a/refs.c b/refs.c
index 551a0f9..68eca3a 100644
--- a/refs.c
+++ b/refs.c
@@ -510,6 +510,14 @@ static struct ref_entry *current_ref;
 
 static struct ref_entry *current_ref;
 
+/* XXX horrible interface due to implied argument. not for real use */
+const unsigned char *peel_current_ref(void)
+{
+	if (!current_ref || !(current_ref->flag & REF_KNOWS_PEELED))
+		return NULL;
+	return current_ref->u.value.peeled;
+}
+
 static int do_one_ref(const char *base, each_ref_fn fn, int trim,
 		      int flags, void *cb_data, struct ref_entry *entry)
 {
diff --git a/refs.h b/refs.h
index 9d14558..88c5445 100644
--- a/refs.h
+++ b/refs.h
@@ -14,6 +14,8 @@ struct ref_lock {
 #define REF_ISPACKED 0x02
 #define REF_ISBROKEN 0x04
 
+const unsigned char *peel_current_ref(void);
+
 /*
  * Calls the specified function for each ref file until it returns
  * nonzero, and returns the value.  Please note that it is not safe to
diff --git a/upload-pack.c b/upload-pack.c
index 8f4703b..cdf43b0 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -736,8 +736,9 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo
 		" include-tag multi_ack_detailed";
 	struct object *o = lookup_unknown_object(sha1);
 	const char *refname_nons = strip_namespace(refname);
+	const unsigned char *peeled = peel_current_ref();
 
-	if (o->type == OBJ_NONE) {
+	if (!peeled && o->type == OBJ_NONE) {
 		o->type = sha1_object_info(sha1, NULL);
 		if (o->type < 0)
 		    die("git upload-pack: cannot find object %s:", sha1_to_hex(sha1));
@@ -756,11 +757,13 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo
 		o->flags |= OUR_REF;
 		nr_our_refs++;
 	}
-	if (o->type == OBJ_TAG) {
+	if (!peeled && o->type == OBJ_TAG) {
 		o = deref_tag_noverify(o);
 		if (o)
-			packet_write(1, "%s %s^{}\n", sha1_to_hex(o->sha1), refname_nons);
+			peeled = o->sha1;
 	}
+	if (peeled && !is_null_sha1(peeled))
+		packet_write(1, "%s %s^{}\n", sha1_to_hex(peeled), refname_nons);
 	return 0;
 }
 

  reply	other threads:[~2012-10-04 21:53 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-03 12:36 upload-pack is slow with lots of refs Ævar Arnfjörð Bjarmason
2012-10-03 13:06 ` Nguyen Thai Ngoc Duy
2012-10-03 18:03 ` Jeff King
2012-10-03 18:53   ` Junio C Hamano
2012-10-03 18:55     ` Jeff King
2012-10-03 19:41       ` Shawn Pearce
2012-10-03 20:13         ` Jeff King
2012-10-04 21:52           ` Sascha Cunz
2012-10-05  0:20             ` Jeff King
2012-10-05  6:24         ` Johannes Sixt
2012-10-05 16:57           ` Shawn Pearce
2012-10-08 15:05             ` Johannes Sixt
2012-10-09  6:46               ` Shawn Pearce
2012-10-09 20:30                 ` Johannes Sixt
2012-10-09 20:46                   ` Johannes Sixt
2012-10-03 20:16   ` Ævar Arnfjörð Bjarmason
2012-10-03 21:20     ` Jeff King
2012-10-03 22:15       ` Ævar Arnfjörð Bjarmason
2012-10-03 23:15         ` Jeff King [this message]
2012-10-03 23:54           ` Ævar Arnfjörð Bjarmason
2012-10-04  7:56             ` [PATCH 0/4] optimizing upload-pack ref peeling Jeff King
2012-10-04  7:58               ` [PATCH 1/4] peel_ref: use faster deref_tag_noverify Jeff King
2012-10-04 18:24                 ` Junio C Hamano
2012-10-04  8:00               ` [PATCH 2/4] peel_ref: do not return a null sha1 Jeff King
2012-10-04 18:32                 ` Junio C Hamano
2012-10-04  8:02               ` [PATCH 3/4] peel_ref: check object type before loading Jeff King
2012-10-04 19:06                 ` Junio C Hamano
2012-10-04 19:41                   ` Jeff King
2012-10-04 20:41                     ` Junio C Hamano
2012-10-04 21:59                       ` Jeff King
2012-10-04  8:03               ` [PATCH 4/4] upload-pack: use peel_ref for ref advertisements Jeff King
2012-10-04  8:04               ` [PATCH 0/4] optimizing upload-pack ref peeling Jeff King
2012-10-04  9:01                 ` Ævar Arnfjörð Bjarmason
2012-10-04 12:14                   ` Nazri Ramliy
2012-10-03 22:32   ` upload-pack is slow with lots of refs Ævar Arnfjörð Bjarmason
2012-10-03 23:21     ` Jeff King
2012-10-03 23:47       ` Ævar Arnfjörð Bjarmason
2012-10-03 19:13 ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121003231529.GA11618@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).