git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, jrnieder@gmail.com, spearce@spearce.org
Subject: Re: [PATCH v3] fetch-pack: always allow fetching of literal SHA1s
Date: Thu, 11 May 2017 16:52:14 -0400	[thread overview]
Message-ID: <20170511205214.ofwkqogksohs6s4b@sigill.intra.peff.net> (raw)
In-Reply-To: <695d833e-9ef1-4d74-265e-f5e3af8a54cd@google.com>

On Thu, May 11, 2017 at 10:51:37AM -0700, Jonathan Tan wrote:

> > This is inside the nr_sought loop. So if I were to do:
> > 
> >   git fetch origin $(git ls-remote origin | awk '{print $1}')
> > 
> > we're back to being quadratic. I realize that's probably a silly thing
> > to do, but in the general case, you're O(m*n), where "n" is number of
> > unmatched remote refs and "m" is the number of SHA-1s you're looking
> > for.
> 
> The original patch was quadratic regardless of whether we're fetching names
> or SHA-1s, which can be said to be bad since it is a regression in an
> existing and common use case (and I agree). This one is O(m*n), as you said
> - the "quadratic-ness" only kicks in if you fetch SHA-1s, which before this
> patch is impossible.

Yeah, no question this is an improvement over the original. I think this
still "regresses" a certain request performance-wise, but it's a request
that could never have succeeded in the original. Mostly I'd just rather
not leave a hidden quadratic loop that will blow up in somebody's face
a year or two down the road.

> Having a sha1_array or oidset would require allocation (O(n log n) time, I
> think, in addition to O(n) space) and this cost would be incurred regardless
> of how many SHA-1s were actually fetched (if m is an order of magnitude less
> than log n, for example, having a sha1_array might be actually worse). Also,
> presumably we don't want to incur this cost if we are fetching zero SHA-1s,
> so we would need to do some sort of pre-check. I'm inclined to leave it the
> way it is and consider this only if the use case of fetching many SHA1-s
> comes up.

An oidset should be O(n) in both time and space. Which we're already
spending in the earlier loop (and in general, when we allocate O(n) ref
structs in the first place). I don't think we need to care too much
about micro-optimizing bumps to the constant-factor here; we just need
to make sure we don't end up in an algorithmic explosion.

And if we initialize the oidset lazily, then we incur that cost only
when we would be doing the linear walk in your patch anyway. See the
patch below.

> > AIUI, you could also avoid creating the unmatched list entirely when the
> > server advertises tip/reachable sha1s. That's a small optimization, but
> > I think it may actually make the logic clearer.
> 
> If you mean adding an "if" block at the point where we add the unmatched ref
> to the unmatched list (that either adds it to the list or immediately frees
> it), I think it makes the logic slightly more complicated. Or maybe you had
> something else in mind?

I was just thinking that it does not leave a case where we create a
data structure (the unmatched list) even though we know right then that
it will not ever be used. So it's an extra line of logic there, but the
overall function may be clearer. I dunno, it's probably not a big deal
either way.

-Peff

---
diff --git a/fetch-pack.c b/fetch-pack.c
index 5cace7458..a660331e6 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -15,6 +15,7 @@
 #include "version.h"
 #include "prio-queue.h"
 #include "sha1-array.h"
+#include "oidset.h"
 
 static int transfer_unpack_limit = -1;
 static int fetch_unpack_limit = -1;
@@ -592,6 +593,12 @@ static void mark_recent_complete_commits(struct fetch_pack_args *args,
 	}
 }
 
+static void add_refs_to_oidset(struct oidset *oids, const struct ref *refs)
+{
+	for (; refs; refs = refs->next)
+		oidset_insert(oids, &refs->old_oid);
+}
+
 static void filter_refs(struct fetch_pack_args *args,
 			struct ref **refs,
 			struct ref **sought, int nr_sought)
@@ -600,6 +607,8 @@ static void filter_refs(struct fetch_pack_args *args,
 	struct ref **newtail = &newlist;
 	struct ref *unmatched = NULL;
 	struct ref *ref, *next;
+	struct oidset tip_oids = OIDSET_INIT;
+	int tip_oids_initialized = 0;
 	int i;
 
 	i = 0;
@@ -654,22 +663,15 @@ static void filter_refs(struct fetch_pack_args *args,
 		    (ALLOW_TIP_SHA1 | ALLOW_REACHABLE_SHA1))) {
 			can_append = 1;
 		} else {
-			struct ref *u;
-			/* Check all refs, including those already matched */
-			for (u = unmatched; u; u = u->next) {
-				if (!oidcmp(&ref->old_oid, &u->old_oid)) {
-					can_append = 1;
-					goto can_append;
-				}
-			}
-			for (u = newlist; u; u = u->next) {
-				if (!oidcmp(&ref->old_oid, &u->old_oid)) {
-					can_append = 1;
-					break;
-				}
+			if (!tip_oids_initialized) {
+				/* Check all refs, including those already matched */
+				add_refs_to_oidset(&tip_oids, unmatched);
+				add_refs_to_oidset(&tip_oids, newlist);
+				tip_oids_initialized = 1;
 			}
+			can_append = oidset_contains(&tip_oids, &ref->old_oid);
 		}
-can_append:
+
 		if (can_append) {
 			ref->match_status = REF_MATCHED;
 			*newtail = copy_ref(ref);
@@ -680,6 +682,7 @@ static void filter_refs(struct fetch_pack_args *args,
 	}
 	*refs = newlist;
 
+	oidset_clear(&tip_oids);
 	for (ref = unmatched; ref; ref = next) {
 		next = ref->next;
 		free(ref);

  reply	other threads:[~2017-05-11 20:52 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-09 18:20 [PATCH] fetch-pack: always allow fetching of literal SHA1s Jonathan Tan
2017-05-09 22:16 ` Jeff King
2017-05-10  4:22   ` Shawn Pearce
2017-05-10  4:33     ` Jeff King
2017-05-10  4:46       ` Mike Hommey
2017-05-10 17:50         ` Ævar Arnfjörð Bjarmason
2017-05-10 18:20           ` Jonathan Nieder
2017-05-10 18:48             ` Martin Fick
2017-05-10 18:54               ` Jonathan Nieder
2017-05-10  4:57       ` Shawn Pearce
2017-05-10 17:00       ` Jonathan Nieder
2017-05-10 18:55         ` Sebastian Schuberth
2017-05-11  9:59         ` Jeff King
2017-05-11 19:03           ` Jonathan Nieder
2017-05-11 21:04             ` Jeff King
2017-05-10 16:44 ` [PATCH v2] " Jonathan Tan
2017-05-10 18:01   ` Jonathan Nieder
2017-05-10 22:11 ` [PATCH v3] " Jonathan Tan
2017-05-10 23:22   ` Jonathan Nieder
2017-05-11  9:46   ` Jeff King
2017-05-11 17:51     ` Jonathan Tan
2017-05-11 20:52       ` Jeff King [this message]
2017-05-11 10:05   ` Jeff King
2017-05-11 17:00     ` Brandon Williams
2017-05-13  9:29       ` Jeff King
2017-05-11 21:14 ` [PATCH v4] " Jonathan Tan
2017-05-11 21:35   ` Jonathan Nieder
2017-05-11 21:59     ` Jeff King
2017-05-11 22:30 ` [PATCH v5] " Jonathan Tan
2017-05-11 22:46   ` Jonathan Nieder
2017-05-12  2:59     ` Jeff King
2017-05-12  6:01     ` Junio C Hamano
2017-05-12  7:59       ` Jeff King
2017-05-12  8:14         ` Jeff King
2017-05-12 18:00           ` Jonathan Tan
2017-05-13  8:30             ` Jeff King
2017-05-12 18:09         ` Jonathan Tan
2017-05-12 19:06           ` Jonathan Nieder
2017-05-12  3:06   ` Jeff King
2017-05-12 20:45 ` Jonathan Tan
2017-05-12 20:46 ` [PATCH v6] " Jonathan Tan
2017-05-12 22:28   ` Jonathan Nieder
2017-05-13  8:36   ` Jeff King
2017-05-15  1:26     ` Junio C Hamano
2017-05-15 17:32 ` [PATCH v7] " Jonathan Tan
2017-05-15 17:46   ` Jonathan Nieder
2017-05-15 22:10   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170511205214.ofwkqogksohs6s4b@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).