git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [BUG?] fetch into shallow sends a large number of objects
Date: Tue, 8 Mar 2016 08:25:24 -0500
Message-ID: <20160308132524.GA22866@sigill.intra.peff.net>
In-Reply-To: <CACsJy8Dk_g1O98UsDaeVS3VXmE2Mn5aR+w1OiFir+QwyJYLVZQ@mail.gmail.com>

On Tue, Mar 08, 2016 at 07:33:43PM +0700, Duy Nguyen wrote:

> On Tue, Mar 8, 2016 at 7:14 PM, Jeff King <peff@peff.net> wrote:
> > ...
> >
> > So I think the solution to both is that we need to do a _separate_
> > traversal with all of the positive tips we're going to send, and the
> > parents of any shallow commits the client has, to find their fork points
> > (i.e., merge bases). And then we add those fork points to the shallow
> > list (grafting out their parents), and communicate them to the client to
> > add to its shallow setup.
> 
> Good news. We have the mechanism in place, I think.
> get_shallow_commits_by_rev_list() (from 'pu') will produce the right
> shallow points for sending back to the client if you pass "--not
> <current shallow points>" to it. It's meant to be used for
> --shallow-exclude and --shallow-since, but if neither is given (nor
> --depth) I guess we can run it with current shallow points. I wonder
> if we can detect some common cases and avoid commit traversing this
> way though.

I tried that, but I couldn't quite get it to work. I don't think we need
any special rev-list, though; we can just find the boundary points of
that traversal and mark them as new shallows.
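
To make "boundary points" concrete, here is a toy sketch in C. It is not
git's revision machinery: struct toy_commit, the TOY_* flags, and
toy_find_boundaries() are all hypothetical names invented for this
illustration. It walks from the wanted tips, treats each client shallow
commit and its ancestors as excluded (like "^<shallow>"), and flags any
excluded commit that is a direct parent of a walked commit. Those flagged
commits are the ones that would become new shallows.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PARENTS 2
#define TOY_SEEN          (1u << 0)
#define TOY_UNINTERESTING (1u << 1)
#define TOY_BOUNDARY      (1u << 2)

/* Hypothetical stand-in for a commit; NOT git's struct commit. */
struct toy_commit {
	const char *name;
	struct toy_commit *parents[TOY_MAX_PARENTS];
	unsigned flags;
};

/* Equivalent of "^commit": exclude the commit and all of its ancestors. */
static void toy_mark_uninteresting(struct toy_commit *c)
{
	int i;

	if (!c || (c->flags & TOY_UNINTERESTING))
		return;
	c->flags |= TOY_UNINTERESTING;
	for (i = 0; i < TOY_MAX_PARENTS; i++)
		toy_mark_uninteresting(c->parents[i]);
}

/* Walk included history; an excluded parent of an included commit is a boundary. */
static void toy_walk(struct toy_commit *c, size_t *nr_boundary)
{
	int i;

	if (!c || (c->flags & (TOY_SEEN | TOY_UNINTERESTING)))
		return;
	c->flags |= TOY_SEEN;
	for (i = 0; i < TOY_MAX_PARENTS; i++) {
		struct toy_commit *p = c->parents[i];

		if (!p)
			continue;
		if (!(p->flags & TOY_UNINTERESTING)) {
			toy_walk(p, nr_boundary);
		} else if (!(p->flags & TOY_BOUNDARY)) {
			p->flags |= TOY_BOUNDARY;
			(*nr_boundary)++;
		}
	}
}

/* Flags boundary commits with TOY_BOUNDARY and returns how many were found. */
size_t toy_find_boundaries(struct toy_commit **wants, size_t nr_wants,
			   struct toy_commit **shallows, size_t nr_shallows)
{
	size_t i, nr_boundary = 0;

	assert(wants && shallows);
	for (i = 0; i < nr_shallows; i++)
		toy_mark_uninteresting(shallows[i]);
	for (i = 0; i < nr_wants; i++)
		toy_walk(wants[i], &nr_boundary);
	return nr_boundary;
}
```

With history A <- B <- S (S being the client's shallow commit) and a new
branch A <- N1 <- N2, asking for N2 flags only A, the fork point, so the
walk never descends into A's ancestry.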

I think this patch does roughly the right thing:

diff --git a/upload-pack.c b/upload-pack.c
index 4859535..da76f70 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -833,12 +833,41 @@ static void receive_needs(void)
 		deepen_by_rev_list(av.argc, av.argv, &shallows);
 		argv_array_clear(&av);
 	}
-	else
-		if (shallows.nr > 0) {
-			int i;
-			for (i = 0; i < shallows.nr; i++)
-				register_shallow(shallows.objects[i].item->oid.hash);
+	else if (shallows.nr > 0) {
+		struct rev_info revs;
+		struct argv_array av = ARGV_ARRAY_INIT;
+		struct commit *c;
+		int i;
+
+		argv_array_push(&av, "rev-list");
+		argv_array_push(&av, "--boundary");
+		for (i = 0; i < want_obj.nr; i++) {
+			struct object *o = want_obj.objects[i].item;
+			argv_array_push(&av, oid_to_hex(&o->oid));
 		}
+		for (i = 0; i < shallows.nr; i++) {
+			struct object *o = shallows.objects[i].item;
+			argv_array_pushf(&av, "^%s", oid_to_hex(&o->oid));
+		}
+
+		init_revisions(&revs, NULL);
+		setup_revisions(av.argc, av.argv, &revs, NULL);
+		if (prepare_revision_walk(&revs))
+			die("revision walk setup failed");
+
+		while ((c = get_revision(&revs))) {
+			if (!(c->object.flags & BOUNDARY))
+				continue;
+			register_shallow(c->object.oid.hash);
+			packet_write(1, "shallow %s",
+				     oid_to_hex(&c->object.oid));
+		}
+		packet_flush(1);
+		argv_array_clear(&av);
+
+		for (i = 0; i < shallows.nr; i++)
+			register_shallow(shallows.objects[i].item->oid.hash);
+	}
 
 	shallow_nr += shallows.nr;
 	free(shallows.objects);

Though I think perhaps we should also be adding those BOUNDARY commits
to the "shallows" object array? This works because the "--shallow" we
pass to pack-objects comes by reading the commit-graft list manipulated
by register_shallow(), so I'm not sure if it matters.

_But_, the client is not prepared to handle this. We send "shallow"
lines that it is not expecting, since it did not ask for any depth. So I
think this logic would have to kick in only when the client tells us to
do so.

I hacked around it with:

diff --git a/fetch-pack.c b/fetch-pack.c
index e8ae6d1..988c808 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -830,6 +830,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 
 	sort_ref_list(&ref, ref_compare_name);
 	qsort(sought, nr_sought, sizeof(*sought), cmp_ref_by_name);
+	args->deepen = 1;
 
 	if ((args->depth > 0 || is_repository_shallow()) && !server_supports("shallow"))
 		die("Server does not support shallow clients");

and confirmed that the resulting "git fetch origin new" from my earlier
example does a sane thing.
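
For reference, each "shallow" line from the upload-pack patch above travels
as a pkt-line: four hex digits giving the total length (including those
four bytes), followed by the payload. The sketch below only illustrates
that framing. toy_pkt_line() and the TOY_* constants are hypothetical
names, not git's packet_write(), and the object id used in the usage note
is a made-up placeholder.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_PKT_HEADER_LEN 4
#define TOY_PKT_MAX_PAYLOAD 65516 /* 65520 total minus the 4-byte header */

/*
 * Frame payload as a pkt-line into out (NUL-terminated).
 * Returns the number of bytes in the framed line, or -1 if out is too small.
 * Hypothetical helper for illustration only.
 */
int toy_pkt_line(char *out, size_t outsz, const char *payload)
{
	size_t len = strlen(payload);

	assert(len <= TOY_PKT_MAX_PAYLOAD);
	if (outsz < len + TOY_PKT_HEADER_LEN + 1)
		return -1;
	/* The length prefix counts itself plus the payload. */
	snprintf(out, outsz, "%04x%s",
		 (unsigned)(len + TOY_PKT_HEADER_LEN), payload);
	return (int)(len + TOY_PKT_HEADER_LEN);
}
```

Framing "shallow " plus a 40-character hex id this way gives a 52-byte
line with the prefix "0034". The flush that ends the list is the special
packet "0000". A client that never asked for a depth does not expect to
read any such lines at this point in the conversation.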

So what next? I think there's some protocol work here, and I think the
overall design of that needs to be considered alongside the other
"deepen" options your topic in pu adds (and of which I'm largely
ignorant). Does this sufficiently interest you to pick up and roll into
your other shallow work?

-Peff
