git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Stefan Beller <sbeller@google.com>,
	Shawn Pearce <spearce@spearce.org>, git <git@vger.kernel.org>
Subject: Re: RFC: Resumable clone based on hybrid "smart" and "dumb" HTTP
Date: Wed, 10 Feb 2016 17:03:35 -0500
Message-ID: <20160210220334.GB5853@sigill.intra.peff.net>
In-Reply-To: <20160210212207.GB10155@google.com>

On Wed, Feb 10, 2016 at 01:22:07PM -0800, Jonathan Nieder wrote:

> > I am not quite sure if that is an advantage, though.  The second
> > message proposes that the lost-found computation be done by the
> > client using *.pack, but any client, given the same *.pack, will
> > compute the same result, so if the result is computed on the server
> > side just once when the *.pack is prepared and downloaded to the
> > client, it would give us a better overall resource utilization.  And
> > in essence, that was what the *.info file in the first message was.
> 
> Advantages of not providing the list of roots:
>  1. only need one round-trip to serve the packfile as-is
>  2. less data sent over the wire (not important unless the list of roots
>     is long)
>  3. can be enabled on the server for existing repositories without an
>     extra step of generating .info files
> 
> Advantage of providing the list of roots:
> - speedup because the client does not have to compute the list of roots
> 
> For a client that is already iterating over all objects and inspecting
> FLAG_LINK, the advantage (3) seems compelling enough to prefer the
> protocol that doesn't send a list of roots.

I'm not sure how compelling (3) is, since we are relying on the server
to make certain packing choices. I guess a stock "git repack -ad" would
do in a pinch; it should at least contain all needed objects, but it's
going to potentially have extra cruft objects (from reflogs, for
example).

I outlined some alternatives to Shawn's proposal elsewhere in the
thread. I think it's a useful feature for this redirect to not just be
"go fetch this packfile", but "go clone from here and come back to me".
That opens up a lot of flexibility.

It does make "go fetch this packfile without roots" a little harder, but
I think it's still do-able. Right now when git hits an http URL, we pass
the smart-http "?service=" magic, and we look at the response to figure
out whether we got:

  1. A smart-http server.

  2. A dumb-http server.

  N. Something else, in which case we die.

The alternative I outlined elsewhere (and the patches I posted long
ago) basically adds:

  3. If it's a bundle, fetch the bundle and then clone from that.

But we could also do:

  4. If it's a packfile, fetch the packfile and then do the
     find-the-roots magic.
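
To make that dispatch concrete, here is a rough sketch in Python (not
git's actual C code) of how a client might sort the response into the
cases above. The function and enum names are hypothetical. The checks
rely on the smart-http advertisement content type, the "# v2 git bundle"
header line, and the "PACK" magic followed by a 4-byte version. The
dumb-http check assumes 40-hex object ids and is only a heuristic.

```python
from enum import Enum


class Kind(Enum):
    SMART_HTTP = "smart-http"
    DUMB_HTTP = "dumb-http"
    BUNDLE = "bundle"
    PACKFILE = "packfile"
    UNKNOWN = "unknown"  # case "N": the caller would die here


# Content type a smart-http server uses for the upload-pack ref advertisement.
SMART_TYPE = "application/x-git-upload-pack-advertisement"
HEX = b"0123456789abcdef"


def looks_like_dumb_refs(body: bytes) -> bool:
    """Heuristic: every line is '<40-hex oid>\\t<refname>' (assumes SHA-1 ids)."""
    lines = body.splitlines()
    if not lines:
        return False  # an empty body is treated as unknown in this sketch
    for line in lines:
        oid, sep, ref = line.partition(b"\t")
        if not sep or not ref or len(oid) != 40:
            return False
        if not all(c in HEX for c in oid):
            return False
    return True


def classify(content_type: str, body: bytes) -> Kind:
    """Guess what kind of response we got from its header and leading bytes."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == SMART_TYPE:
        return Kind.SMART_HTTP
    if body.startswith(b"# v2 git bundle\n"):
        return Kind.BUNDLE
    if len(body) >= 12 and body[:4] == b"PACK":
        version = int.from_bytes(body[4:8], "big")
        if version in (2, 3):
            return Kind.PACKFILE
    if looks_like_dumb_refs(body):
        return Kind.DUMB_HTTP
    return Kind.UNKNOWN
```

A real client would only need to peek at the first few bytes before
deciding whether to stream the rest to disk as a bundle or a pack.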

> Except when people pass --depth, "git clone" sets
> 'check_self_contained_and_connected = 1'.  That means clients that
> already iterate over all objects and inspect FLAG_LINK are the usual
> case.

Somewhat related, but I've wondered if we could do something similar
even for non-clone cases. That is, `index-pack` could tell us the set of
referenced but missing objects, and we could verify that each of those
is reachable (we _could_ just have it verify that we have the object at
all; traditionally we only guaranteed that reachable objects were kept,
but these days we keep anything reachable from another object we are
keeping, so if you have X, you should always have X^, etc).
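
As a toy model of that idea (this is not how index-pack is implemented,
and the names are made up for illustration): if we record, for each
object in the pack, the ids it links to, then the referenced-but-missing
set and the roots both fall out of two set differences.

```python
def roots_and_missing(links):
    """Return (roots, missing) for a pack.

    `links` maps each object id present in the pack to the ids it
    references (hypothetical input; a real tool would gather this while
    parsing the objects).
    """
    present = set(links)
    referenced = set()
    for targets in links.values():
        referenced.update(targets)
    roots = present - referenced    # in the pack, nothing in the pack points at them
    missing = referenced - present  # pointed at, but not in the pack
    return roots, missing


def unresolved(missing, have_object):
    """Ids from `missing` that the local repository cannot supply either.

    `have_object` is a caller-supplied predicate (an assumption of this sketch).
    """
    return {oid for oid in missing if not have_object(oid)}


# Tiny example: commit C2 -> tree T2 and parent C1; C1 -> tree T1 (not in pack).
links = {"C2": ["T2", "C1"], "T2": ["B1"], "B1": [], "C1": ["T1"]}
roots, missing = roots_and_missing(links)
```

In the example, the only root is "C2" and the only missing object is
"T1"; the receiving side would then just have to confirm it already has
"T1".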

Anyway, that's quite a tangent from this topic.

-Peff
