From: Jonathan Nieder <jrnieder@gmail.com>
To: Stefan Beller <sbeller@google.com>
Cc: Shawn Pearce <spearce@spearce.org>, git <git@vger.kernel.org>
Subject: Re: RFC: Resumable clone based on hybrid "smart" and "dumb" HTTP
Date: Wed, 10 Feb 2016 13:01:15 -0800
Message-ID: <20160210210115.GA10155@google.com>
In-Reply-To: <CAGZ79kZMvxa5Np4GbShv_A6NZwVAqff94+d8MFTZwrZS+2CqeQ@mail.gmail.com>
Stefan Beller wrote:
> On Wed, Feb 10, 2016 at 12:11 PM, Shawn Pearce <spearce@spearce.org> wrote:
>> Several of us at $DAY_JOB talked about this more today and thought a
>> variation makes more sense:
>>
>> 1. Clients attempting clone ask for /info/refs?service=git-upload-pack
>> like they do today.
>>
>> 2. Servers that support resumable clone include a "resumable"
>> capability in the advertisement.
>
> like "resumable-token=hash" similar to a push cert advertisement?
It could just be the string 'resumable'.
But I wonder if it would be possible to save a round-trip by getting the
302 response in the initial request. If the client requests

    /info/refs?service=git-upload-pack&want_resumable=true

then allow the server to respond with a 302 redirect to its current
mostly whole pack. Current clients would never send such a request
because the current protocol requires that for smart clients:

    The request MUST contain exactly one query parameter,
    `service=$servicename`, where `$servicename` MUST be the service
    name the client wishes to contact to complete the operation.
    The request MUST NOT contain additional query parameters.

Current http-backend ignores extra query parameters. I haven't
checked other smart http server implementations, though.
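
To make the request shape concrete, here is a minimal Python sketch of
how a client might build the URL and how a lenient server would still
find the service name. The helper names info_refs_url and
requested_service are hypothetical, and want_resumable is only the
parameter floated above, not part of any released protocol.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def info_refs_url(repo_url, want_resumable=False):
    """Build the ref advertisement URL, optionally asking for a resumable pack."""
    params = [("service", "git-upload-pack")]
    if want_resumable:
        # Assumption: this extra parameter is the one proposed in this thread.
        params.append(("want_resumable", "true"))
    return repo_url.rstrip("/") + "/info/refs?" + urlencode(params)


def requested_service(url):
    """Return (service, wants_resumable) the way a lenient server could read it."""
    query = parse_qs(urlsplit(url).query)
    # A server that ignores unknown parameters only looks at "service".
    return query["service"][0], query.get("want_resumable") == ["true"]


url = info_refs_url("https://example.com/repo.git", want_resumable=True)
print(url)
print(requested_service(url))
```

A server that does not know the extra parameter and ignores it would
answer with the normal advertisement, so the client has to be ready for
either a 302 or a regular response.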
>> 3. Updated clients on clone request GET /info/refs?service=git-resumable-clone.
>
> Or just in the non-http case, they would terminate after the ls-remote
> (including capability advertisement) was done and connect again to
> a different service such as git-upload-stale-pack with the resumable
> token to identify the pack.
HTTP supports range requests and existing CDNs speak HTTP, so I
suspect it would work better if the git-resumable-clone service
printed an HTTP URL from which to grab the packfile.
I think the details are something that could be figured out after
trying out the idea with http first, though.
[...]
>> 5. Clients fetch the file using standard HTTP GET, possibly with
>> byte-ranges to resume.
>
> In the non-http case the git-upload-stale-pack would be rsync with the
> resume token to determine the file name of the pack,
> such that we have resumeability.
How do I tunnel rsync over git protocol?
So I think in the non-http case the git-resumable-clone service would
have to print a URL to be served using a possibly different protocol
(e.g., a signed https URL for getting the file from a service like S3,
or an rsync URL for getting the file using the same ssh creds that
were used for the initial request).
[...]
>> 6. Once stored and indexed with .idx, clients run `git fsck
>> --lost-found` to discover the roots of the pack it downloaded. These
>> are saved as temporary references.
>
> jrn:
> > I suspect we can do even faster by making index-pack do the work
>
> index-pack --check-self-contained-and-connected
--strict + --check-self-contained-and-connected check that the pack
is self-contained. In the process they mark each object that is
reachable from another object in the pack with FLAG_LINK.
The objects not marked with FLAG_LINK are the roots.
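
The idea can be sketched in a few lines of Python. This is only an
illustration of "roots are the objects nothing else in the pack points
at", not how index-pack is implemented; the find_roots helper and the
toy object ids are made up for the example.

```python
def find_roots(objects):
    """Return ids of objects that no other object in the pack references.

    `objects` maps an object id to the list of ids it points at
    (a commit's parents and tree, a tree's entries, and so on).
    """
    linked = set()  # plays the role of the FLAG_LINK marks
    for refs in objects.values():
        for target in refs:
            if target in objects:
                linked.add(target)
    return sorted(oid for oid in objects if oid not in linked)


# Toy pack: two commits sharing a blob; only the tip commit is a root.
pack = {
    "commit2": ["commit1", "tree2"],
    "commit1": ["tree1"],
    "tree1": ["blob1"],
    "tree2": ["blob1"],
    "blob1": [],
}
print(find_roots(pack))
```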
[...]
>> To make step 4 really resume well, clients may need to save the first
>> Location header it gets back from
>> /info/refs?service=git-resumable-clone and use that on resume. Servers
>> are likely to embed the pack SHA-1 in the Location header, and the
>> client wants to use this on subsequent GET attempts to abort early if
>> the server has deleted the pack the client is trying to obtain.
Yes.
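
As a rough sketch of the resume logic on the client side, assuming the
client has saved the redirect target and knows how many bytes it
already has on disk: the helper names resume_headers and next_action
are hypothetical, and treating 404 or 410 as "the pack was deleted" is
an assumption about how a server would signal that.

```python
def resume_headers(bytes_on_disk):
    """Headers for re-requesting the saved pack URL."""
    if bytes_on_disk > 0:
        # Open-ended byte range: everything from the first missing byte on.
        return {"Range": f"bytes={bytes_on_disk}-"}
    return {}


def next_action(status):
    """Decide what to do with the response to a resume attempt."""
    if status == 206:
        return "append"   # server honored the range; append to the partial file
    if status == 200:
        return "restart"  # full body sent; overwrite from offset zero
    if status in (404, 410):
        return "abort"    # assumption: pack is gone, start over from info/refs
    return "retry"


print(resume_headers(1048576))
print(next_action(206))
```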
I really like this design. I'm tempted to implement it (since it
lacks a bunch of the downsides of clone.bundle).
Thanks,
Jonathan