git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: Shawn Pearce <spearce@spearce.org>, git <git@vger.kernel.org>
Subject: Re: RFC: Resumable clone based on hybrid "smart" and "dumb" HTTP
Date: Wed, 10 Feb 2016 18:03:07 -0500
Message-ID: <20160210230307.GA6633@sigill.intra.peff.net>
In-Reply-To: <20160210221758.GC10155@google.com>

On Wed, Feb 10, 2016 at 02:17:58PM -0800, Jonathan Nieder wrote:

> > Because the magic happens in the git protocol, that would mean this does
> > not have to be limited to git-over-http. It could be "resumable=<url>"
> > to point the client anywhere (the same server over a different protocol,
> > another server, etc).
> 
> Thanks for bringing this up.  A worry with putting the URL in the
> capabilities line is that it makes it easy to run into the 1000-byte
> limit.  It's been a while since v1.8.3-rc0~148^2~6 (pkt-line: provide
> a LARGE_PACKET_MAX static buffer, 2013-02-20) but we still can't
> rely on clients having that applied.

I hadn't considered that, but I'm not sure how much of a problem it is
in practice. The first line is 40-hex sha1, space, HEAD, NUL, then
capabilities. The current capabilities string from github.com on a
sample repo (which has a rather large agent string, and a normal-sized
symref pointer for HEAD) is 188 bytes. So that's still over 750 bytes
available for a URL. The space isn't infinite, and we may want to add
more capabilities later. But I'd think that devoting even 256 bytes to a
URL would be reasonable, and leave us with fair room to grow.
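To make the byte-counting above concrete, here is a small sketch that frames the first advertised ref line as a pkt-line and computes how long a URL could be. It assumes the old limit applies to the whole pkt-line (4-byte hex length prefix included) and that the line ends in LF; the 188-byte capability string is filler of the measured size, not the real github.com string, and the function names are hypothetical. The `resumable=` name is only the one floated in this thread.

```python
# Sketch: how much of the old 1000-byte pkt-line budget is left for a
# hypothetical "resumable=<url>" capability on the first ref line.
# Assumptions: the limit covers the whole pkt-line, including its 4-byte
# hex length prefix, and the line ends in LF.

OLD_PKT_LINE_LIMIT = 1000  # assumed limit for clients without LARGE_PACKET_MAX

def pkt_line(payload: bytes) -> bytes:
    """Frame payload as a pkt-line: 4 hex digits giving the total length
    (prefix included), followed by the payload itself."""
    return b"%04x" % (len(payload) + 4) + payload

def first_ref_line(sha1_hex: str, caps: str) -> bytes:
    # "<40-hex sha1> HEAD" NUL "<capabilities>" LF
    return pkt_line(sha1_hex.encode() + b" HEAD\0" + caps.encode() + b"\n")

def url_budget(sha1_hex: str, caps: str) -> int:
    """Longest URL that still fits if " resumable=<url>" is appended."""
    used = len(first_ref_line(sha1_hex, caps))
    return OLD_PKT_LINE_LIMIT - used - len(" resumable=")

# Placeholders: a dummy sha1 and 188 bytes of filler standing in for the
# capability string measured above (not the real string).
SHA1 = "0" * 40
CAPS = "x" * 188

if __name__ == "__main__":
    print(len(first_ref_line(SHA1, CAPS)))  # 239
    print(url_budget(SHA1, CAPS))           # 750
```

Under these assumptions the line uses 239 bytes, leaving 750 bytes for the URL itself, which matches the rough estimate in the message.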

> Another nice thing about using a 302 is that you can set cookies
> during the redirect, which might make authenticated access easier.
> (That said, authenticated access through e.g. signed URLs can work
> fine without that.)

Yeah, I can see there are advantages to assuming all the world is HTTP,
but it just doesn't seem that practical to me.

> > Clients do not have to _just_ fetch a packfile. They could get a bundle
> > file that contains the roots along with the packfile. I know that one of
> > your goals is not duplicating the storage of the packfile on the server,
> > but it would not be hard for the server to store the packfile and the
> > bundle header separately, and concatenate them on the fly.
> 
> Doesn't that prevent using a git-unaware file transfer service to
> serve the files?

Sort of. They would just need to serve the combined file. But _if_ you
have a git-unaware service (rsync?) _and_ it's hitting the same storage
(so you could in theory not store the packfile twice), _and_ it cannot
be taught to do any kind of concatenation, then yes, it would be a
problem.
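A minimal sketch of the "concatenate on the fly" idea, assuming the server keeps the bundle header and the packfile as separate blobs. The function names are hypothetical; the header layout (signature line, `<oid> <refname>` lines, a blank line, then pack data) follows git's v2 bundle format, and no prerequisite lines are written because a full clone needs none. The `offset` parameter stands in for wherever a resumed download asks to restart (for example an HTTP Range request), so the combined file stays resumable without ever being stored whole.

```python
import io
from typing import BinaryIO, Dict, Iterator

# First line of every v2 bundle.
BUNDLE_V2_SIGNATURE = b"# v2 git bundle\n"

def bundle_header(refs: Dict[str, str]) -> bytes:
    """Build a v2 bundle header from {refname: hex oid}.

    No "-<oid>" prerequisite lines are written: a full clone needs none.
    The header ends with the blank line that precedes the pack data.
    """
    lines = [BUNDLE_V2_SIGNATURE]
    for refname, oid in sorted(refs.items()):
        lines.append(("%s %s\n" % (oid, refname)).encode())
    lines.append(b"\n")
    return b"".join(lines)

def stream_bundle(header: bytes, pack: BinaryIO, offset: int = 0,
                  chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield header + pack as one logical file, starting at `offset`.

    `offset` is a position in the combined stream (e.g. from a resumed
    download); the server never has to store the combined file.
    """
    if offset < len(header):
        pack.seek(0)
        yield header[offset:]
    else:
        pack.seek(offset - len(header))
    while True:
        chunk = pack.read(chunk_size)
        if not chunk:
            break
        yield chunk

if __name__ == "__main__":
    hdr = bundle_header({"refs/heads/master": "1" * 40})
    pack = io.BytesIO(b"PACK" + bytes(100))  # stand-in for real pack data
    whole = b"".join(stream_bundle(hdr, pack))
    resumed = b"".join(stream_bundle(hdr, pack, offset=30))
    print(resumed == whole[30:])  # True
```

The same offset arithmetic is what a git-unaware file server could not do for you, which is the trade-off discussed here.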

I do think I favor the "split bundle" anyway, though, just for
simplicity on both ends.

> > And you'll notice, too, that all of the bundle-http magic kicks in
> > during step 2 because the client sees they're grabbing a bundle. Which
> > means that the <url> in step 1 doesn't _have_ to be a bundle. It can be
> > "go fetch from kernel.org, then come back to me".
> 
> I think that use case brings in complications that make it not
> necessarily worth it.  In this example, if kernel.org is serving pack
> files, why shouldn't I point directly at the advertised pack CDN URL
> instead of adding an extra hop that puts added load on kernel.org
> servers?

Sure, that would be more efficient if kernel.org is providing such a CDN
URL. But they aren't now, because it doesn't exist yet, and this feature
can be used by you without having to coordinate their use of the
feature. And you can replace kernel.org in my example with any other
server that happens to be preferable to fetching from you, for whatever
reason.
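The client-side decision described above can be sketched as follows: the redirect target is treated as a bundle only if its content says so, and otherwise as just another remote, with the origin always consulted last. This is a hypothetical illustration, not existing git behavior; `plan_clone` and the `peek` convention (first bytes read from the URL when it was fetched as a plain file, or `None` when it is an ordinary git remote) are assumptions. Only the v2 bundle signature line is taken from git's bundle format.

```python
from typing import List, Optional

# Every v2 bundle begins with this signature line.
BUNDLE_V2_SIGNATURE = b"# v2 git bundle\n"

def plan_clone(origin: str, redirect_url: str,
               peek: Optional[bytes]) -> List[str]:
    """Describe the two-step clone after a server redirected the client.

    `peek` (hypothetical) is the first bytes read from `redirect_url` when
    it was fetched as a plain file, or None for an ordinary git remote.
    """
    steps = []
    if peek is None:
        # Not a file download: just clone from the other server first.
        steps.append("clone from %s" % redirect_url)
    elif peek.startswith(BUNDLE_V2_SIGNATURE):
        # Bundle: the resumable-download logic kicks in only here.
        steps.append("download %s resumably and unbundle it" % redirect_url)
    else:
        raise ValueError("%s is neither a bundle nor a remote" % redirect_url)
    # Either way, finish by fetching whatever is still missing from origin.
    steps.append("fetch remaining objects from %s" % origin)
    return steps

if __name__ == "__main__":
    for step in plan_clone("https://example.com/project.git",
                           "git://git.kernel.org/pub/scm/git/git.git", None):
        print(step)
```

Keeping the bundle check in the second step is what lets the first step point anywhere.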

> My motivation comes from the example of
> alternates: it is pretty and very flexible and ended up as a support
> and maintenance headache instead of being widely useful.

I guess one man's trash is another's treasure. Alternates are in wide
use at GitHub, and are pretty much what make our data model feasible.
I'm pretty sure that repo.or.cz uses them for similar purposes, too.

> I think what
> you are proposing is more harmless but I'd still want to have an
> example of what it's used for before going in that direction.

Unless there is a big immediate downside, I'd rather err in the opposite
direction: keep orthogonal concerns separate (e.g., redirecting to X
versus protocol details of X).
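One way to picture that separation: the code that reads the advertisement only extracts a URL, and a per-scheme handler owns everything about how that URL is fetched. This sketch assumes the `resumable=<url>` capability floated earlier in the thread; the function names and handler table are hypothetical, and since capabilities are space-separated, the URL is assumed to contain no spaces.

```python
from typing import Callable, Dict, Optional
from urllib.parse import urlsplit

CAP_PREFIX = b"resumable="  # capability name floated in this thread

def parse_resumable(payload: bytes) -> Optional[str]:
    """Extract <url> from a "resumable=<url>" capability, if present.

    `payload` is the first ref line's pkt-line payload: sha1, space,
    HEAD, NUL, then space-separated capabilities, then LF.
    """
    _, nul, caps = payload.rstrip(b"\n").partition(b"\0")
    if not nul:
        return None  # no capability list on this line
    for cap in caps.split(b" "):
        if cap.startswith(CAP_PREFIX):
            return cap[len(CAP_PREFIX):].decode()
    return None

# The redirect layer only chooses a handler; each handler owns the
# protocol details of its own transport.
Handler = Callable[[str], str]

def follow_resumable(url: str, handlers: Dict[str, Handler]) -> str:
    scheme = urlsplit(url).scheme
    if scheme not in handlers:
        raise ValueError("unsupported scheme %r in %s" % (scheme, url))
    return handlers[scheme](url)

if __name__ == "__main__":
    line = (b"0" * 40 + b" HEAD\0multi_ack "
            b"resumable=https://cdn.example.com/r.bundle\n")
    handlers = {
        "https": lambda u: "HTTP download of " + u,
        "git": lambda u: "git-protocol clone of " + u,
    }
    print(follow_resumable(parse_resumable(line), handlers))
```

Adding a new transport then means adding a handler, without touching how the redirect is advertised.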

-Peff


Thread overview: 24+ messages
2016-02-10 18:59 RFC: Resumable clone based on hybrid "smart" and "dumb" HTTP Shawn Pearce
2016-02-10 20:11 ` Shawn Pearce
2016-02-10 20:23   ` Stefan Beller
2016-02-10 20:57     ` Junio C Hamano
2016-02-10 21:22       ` Jonathan Nieder
2016-02-10 22:03         ` Jeff King
2016-02-10 21:01     ` Jonathan Nieder
2016-02-10 21:07       ` Junio C Hamano
2016-02-11  3:43       ` Junio C Hamano
2016-02-11 18:04         ` Shawn Pearce
2016-02-11 23:53       ` Duy Nguyen
2016-02-13  5:07         ` Junio C Hamano
2016-02-10 21:49   ` Jeff King
2016-02-10 22:17     ` Jonathan Nieder
2016-02-10 23:03       ` Jeff King [this message]
2016-02-10 22:40     ` Junio C Hamano
2016-02-11 21:32     ` Junio C Hamano
2016-02-11 21:46       ` Jeff King
2016-02-13  1:40     ` Blake Burkhart
2016-02-13 17:00       ` Jeff King
2016-02-14  2:14     ` Shawn Pearce
2016-02-14 17:05       ` Jeff King
2016-02-14 17:56         ` Shawn Pearce
2016-02-16 18:34         ` Stefan Beller
