git@vger.kernel.org mailing list mirror (one of many)
From: Jonathan Nieder <jrnieder@gmail.com>
To: Stefan Beller <sbeller@google.com>
Cc: Shawn Pearce <spearce@spearce.org>, git <git@vger.kernel.org>
Subject: Re: RFC: Resumable clone based on hybrid "smart" and "dumb" HTTP
Date: Wed, 10 Feb 2016 13:01:15 -0800
Message-ID: <20160210210115.GA10155@google.com>
In-Reply-To: <CAGZ79kZMvxa5Np4GbShv_A6NZwVAqff94+d8MFTZwrZS+2CqeQ@mail.gmail.com>

Stefan Beller wrote:
> On Wed, Feb 10, 2016 at 12:11 PM, Shawn Pearce <spearce@spearce.org> wrote:

>> Several of us at $DAY_JOB talked about this more today and thought a
>> variation makes more sense:
>>
>> 1. Clients attempting clone ask for /info/refs?service=git-upload-pack
>> like they do today.
>>
>> 2. Servers that support resumable clone include a "resumable"
>> capability in the advertisement.
>
> like "resumable-token=hash" similar to a push cert advertisement?

It could just be the string 'resumable'.
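A client could spot such a plain-string capability the same way it finds any other one: in the NUL-separated tail of the first ref line of the /info/refs reply. The sketch below assumes the standard smart-HTTP framing (a "# service=..." pkt-line, a flush-pkt, then the ref advertisement). The capability name `resumable` is only the one proposed in this thread, and the helper names are hypothetical.

```python
from __future__ import annotations


def parse_pkt_lines(data: bytes) -> list[bytes | None]:
    """Split a pkt-line stream into payloads; None stands for a flush-pkt ("0000")."""
    payloads = []
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)  # 4 hex digits, counting themselves
        if length == 0:
            payloads.append(None)
            pos += 4
            continue
        if length < 4:
            raise ValueError(f"invalid pkt-line length {length}")
        payloads.append(data[pos + 4:pos + length])
        pos += length
    return payloads


def advertised_capabilities(info_refs_body: bytes) -> set[str]:
    """Capabilities from the first ref line of a smart-HTTP /info/refs reply."""
    for payload in parse_pkt_lines(info_refs_body):
        if payload is None or payload.startswith(b"# service="):
            continue  # skip the service banner and flush-pkts
        # First ref line looks like: "<oid> <refname>\0<cap> <cap> ...\n"
        _, _, caps = payload.rstrip(b"\n").partition(b"\0")
        return set(caps.decode("ascii").split())
    return set()


def pkt(payload: bytes) -> bytes:
    """Frame one payload as a pkt-line (handy for building test input)."""
    return b"%04x" % (len(payload) + 4) + payload
```

A client would then only take the resumable path when `"resumable" in advertised_capabilities(body)`, and fall back to an ordinary clone otherwise.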

But I wonder if it would be possible to save a round-trip by getting the
302 response in the initial request.  If the client requests

	/info/refs?service=git-upload-pack&want_resumable=true

then the server could answer with a 302 redirect pointing at its current
mostly-whole pack.  Current clients would never send such a request,
because the current protocol requires that for smart clients

	The request MUST contain exactly one query parameter,
	`service=$servicename`, where `$servicename` MUST be the service
	name the client wishes to contact to complete the operation.
	The request MUST NOT contain additional query parameters.

Current http-backend ignores extra query parameters.  I haven't
checked other smart http server implementations, though.
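To make the round-trip saving concrete, here is a sketch of the client side. It assumes the `want_resumable=true` parameter proposed above and a 302 whose Location points at the pack; the function names are hypothetical. Because http-backend ignores the extra parameter, an old server would simply answer 200 with the normal advertisement, so the client needs no separate probe.

```python
from __future__ import annotations

from urllib.parse import urlencode


def info_refs_url(repo_url: str, want_resumable: bool) -> str:
    """Build the initial smart-HTTP request, optionally asking for a redirect."""
    params = {"service": "git-upload-pack"}
    if want_resumable:
        # Proposed parameter: the current protocol text forbids extra
        # query parameters, so no existing client sends it.
        params["want_resumable"] = "true"
    return repo_url.rstrip("/") + "/info/refs?" + urlencode(params)


def classify_info_refs_reply(status: int, headers: dict[str, str]) -> tuple[str, str | None]:
    """Decide what the client got back from the request above."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    if status == 302 and "location" in lowered:
        # New server: fetch the mostly whole pack first.
        return ("pack-redirect", lowered["location"])
    if status == 200:
        # Old server (e.g. http-backend ignoring the extra parameter):
        # carry on with the normal ref advertisement.
        return ("ref-advertisement", None)
    raise ValueError(f"unexpected HTTP status {status}")
```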

>> 3. Updated clients on clone request GET /info/refs?service=git-resumable-clone.
>
> Or just in the non-http case, they would terminate after the ls-remote
> (including capability advertisement) was done and connect again to
> a different service such as git-upload-stale-pack with the resumable
> token to identify the pack.

HTTP supports range requests and existing CDNs speak HTTP, so I
suspect it would work better if the git-resumable-clone service
printed an HTTP URL from which to grab the packfile.

I think the details are something that could be figured out after
trying out the idea with http first, though.
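The appeal of a plain HTTP URL is that resuming needs nothing git-specific, only a standard Range request. A minimal sketch, assuming the client keeps the partially downloaded pack in a local file (the path and function names are hypothetical):

```python
import os
import urllib.error
import urllib.request


def range_headers(partial_path: str) -> dict:
    """Request only the bytes not already on disk."""
    try:
        have = os.path.getsize(partial_path)
    except FileNotFoundError:
        have = 0
    return {"Range": f"bytes={have}-"} if have else {}


def resume_download(url: str, partial_path: str, chunk_size: int = 1 << 16) -> None:
    """Fetch url into partial_path, continuing a previous partial download."""
    req = urllib.request.Request(url, headers=range_headers(partial_path))
    try:
        resp = urllib.request.urlopen(req)
    except urllib.error.HTTPError as err:
        if err.code == 416:  # Range Not Satisfiable: typically nothing left to fetch
            return
        raise
    with resp:
        # 206 Partial Content means the range was honoured, so append;
        # a plain 200 means the whole file was sent, so start over.
        mode = "ab" if resp.status == 206 else "wb"
        with open(partial_path, mode) as out:
            while True:
                block = resp.read(chunk_size)
                if not block:
                    break
                out.write(block)
```

Handling the 200 case matters because a CDN or proxy is free to ignore the Range header.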

[...]
>> 5. Clients fetch the file using standard HTTP GET, possibly with
>> byte-ranges to resume.
>
> In the non-http case the git-upload-stale-pack would be rsync with the
> resume token to determine the file name of the pack,
> such that we have resumeability.

How do I tunnel rsync over git protocol?

So I think in the non-http case the git-resumable-clone service would
have to print a URL to be served using a possibly different protocol
(e.g., a signed https URL for getting the file from a service like S3,
or an rsync URL for getting the file using the same ssh creds that
were used for the initial request).
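On the client, the printed URL then only has to be mapped to some downloader that can resume. The sketch below is one way to do that, assuming the service prints either an http(s) URL or an `rsync://` URL; how an rsync-over-ssh source reusing the original credentials would be spelled is not settled here and is left out. The function name is hypothetical; `curl -C -` and `rsync --partial` are the tools' real resume options.

```python
from urllib.parse import urlsplit


def fetch_command(pack_url: str, dest: str) -> list:
    """Pick a resumable downloader for the URL git-resumable-clone printed."""
    scheme = urlsplit(pack_url).scheme.lower()
    if scheme in ("http", "https"):
        # e.g. a signed https URL on S3 or a CDN; "-C -" makes curl
        # work out the resume offset from the existing output file.
        return ["curl", "--fail", "-C", "-", "-o", dest, pack_url]
    if scheme == "rsync":
        # --partial keeps a half-transferred file so a rerun need not
        # start from nothing.
        return ["rsync", "--partial", pack_url, dest]
    raise ValueError(f"no resumable fetcher for scheme {scheme!r}")
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` and simply rerun after an interruption.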

[...]
>> 6. Once stored and indexed with .idx, clients run `git fsck
>> --lost-found` to discover the roots of the pack it downloaded. These
>> are saved as temporary references.
>
> jrn:
> > I suspect we can do even faster by making index-pack do the work
>
>     index-pack --check-self-contained-and-connected

Running index-pack with --strict and --check-self-contained-and-connected
checks that the pack is self-contained.  In the process, it marks each
object that is reachable from another object in the pack with FLAG_LINK.

The objects not marked with FLAG_LINK are the roots.
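In set terms, the roots are the pack's objects minus every object some other object in the pack points at. The sketch below models only that idea, assuming the pack has already been parsed into a map from object id to the ids it references (commit to tree and parents, tree to entries, tag to target); it is not index-pack's actual code, and the names are hypothetical.

```python
from __future__ import annotations


def pack_roots(objects: dict[str, list[str]]) -> set[str]:
    """Return the objects in the pack that nothing else in the pack references."""
    linked = set()
    for oid, targets in objects.items():
        for target in targets:
            if target not in objects:
                # Not self-contained: the pack depends on an outside object.
                raise ValueError(f"{oid} references {target}, which is not in the pack")
            linked.add(target)  # plays the role of setting FLAG_LINK
    return set(objects) - linked
```

Each returned root would then be saved as a temporary reference, as in step 6.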

[...]
>> To make step 4 really resume well, clients may need to save the first
>> Location header it gets back from
>> /info/refs?service=git-resumable-clone and use that on resume. Servers
>> are likely to embed the pack SHA-1 in the Location header, and the
>> client wants to use this on subsequent GET attempts to abort early if
>> the server has deleted the pack the client is trying to obtain.

Yes.
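Pinning the first Location could look like the sketch below. It assumes the client keeps a small state file next to the partial download and that the server names the pack `pack-<sha1>.pack` in the URL; both the file name and that naming are assumptions, as are the function names.

```python
from __future__ import annotations

import json
import os
import re

STATE_FILE = "resumable-clone.json"  # hypothetical file name


def pinned_location(state_dir: str, location: str) -> str:
    """Return the first Location ever seen for this clone, saving it if new."""
    path = os.path.join(state_dir, STATE_FILE)
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["location"]
    with open(path, "w") as f:
        json.dump({"location": location}, f)
    return location


def pack_sha1(location: str) -> str | None:
    """Pull a pack SHA-1 out of the URL, assuming a pack-<sha1>.pack name."""
    m = re.search(r"pack-([0-9a-f]{40})\.pack", location)
    return m.group(1) if m else None


def pack_is_gone(status: int) -> bool:
    """404/410 on the pinned URL: the server deleted that pack, so give up
    early and restart from /info/refs?service=git-resumable-clone."""
    return status in (404, 410)
```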

I really like this design.  I'm tempted to implement it (since it
lacks a bunch of the downsides of clone.bundle).

Thanks,
Jonathan

Thread overview: 24+ messages
2016-02-10 18:59 RFC: Resumable clone based on hybrid "smart" and "dumb" HTTP Shawn Pearce
2016-02-10 20:11 ` Shawn Pearce
2016-02-10 20:23   ` Stefan Beller
2016-02-10 20:57     ` Junio C Hamano
2016-02-10 21:22       ` Jonathan Nieder
2016-02-10 22:03         ` Jeff King
2016-02-10 21:01     ` Jonathan Nieder [this message]
2016-02-10 21:07       ` Junio C Hamano
2016-02-11  3:43       ` Junio C Hamano
2016-02-11 18:04         ` Shawn Pearce
2016-02-11 23:53       ` Duy Nguyen
2016-02-13  5:07         ` Junio C Hamano
2016-02-10 21:49   ` Jeff King
2016-02-10 22:17     ` Jonathan Nieder
2016-02-10 23:03       ` Jeff King
2016-02-10 22:40     ` Junio C Hamano
2016-02-11 21:32     ` Junio C Hamano
2016-02-11 21:46       ` Jeff King
2016-02-13  1:40     ` Blake Burkhart
2016-02-13 17:00       ` Jeff King
2016-02-14  2:14     ` Shawn Pearce
2016-02-14 17:05       ` Jeff King
2016-02-14 17:56         ` Shawn Pearce
2016-02-16 18:34         ` Stefan Beller
