git@vger.kernel.org mailing list mirror (one of many)
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Junio C Hamano <gitster@pobox.com>
Cc: Duy Nguyen <pclouds@gmail.com>,
	Kevin Wern <kevin.m.wern@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Resumable clone
Date: Sun, 6 Mar 2016 08:59:10 +0100 (CET)
Message-ID: <alpine.DEB.2.20.1603060831570.3152@virtualbox>
In-Reply-To: <xmqqoaasvkrt.fsf@gitster.mtv.corp.google.com>

Hi Junio & Duy,

On Sat, 5 Mar 2016, Junio C Hamano wrote:

> Duy Nguyen <pclouds@gmail.com> writes:
> 
> > Resumable clone is happening. See [1] for the basic idea, [2] and [3]
> > for some preparation work. I'm sure you can help. Once you've gone
> > through at least [1], I think you can pick something (e.g. finalizing
> > the protocol, update the server side, or git-clone....)
> >
> > [1] http://thread.gmane.org/gmane.comp.version-control.git/285921
> > [2] http://thread.gmane.org/gmane.comp.version-control.git/288080/focus=288150
> > [3] http://thread.gmane.org/gmane.comp.version-control.git/288205/focus=288222
> 
> I think your response needs to be refined with a bit higher level
> overview, though.  Here are some thoughts to summarize the discussion
> and to extend it.
> 
> I think the right way to think about this is that we are adding a
> capability for the server to instruct the clients: I prefer not to
> serve a full clone to you in the usual route if I can avoid it.  You
> can help me by going to an alternate resource and populate your
> history first and then coming back to me for an additional fetch to
> complete the history if you want to.  Doing so would also help you
> because that alternate resource can be a static file (or two) that
> you can download over a resumable transport (like static files
> served over HTTPS).

For quite some time I considered presenting some alternate/additional
ideas. I feel a little bad for mentioning them here because I *really*
have no time to follow up on them whatsoever. But maybe they turn out to
contribute something to the final solution.

I tried to follow the discussion as much as possible, sometimes failing
due to time constraints, therefore I'd like to apologize in advance if any
of these ideas have been mentioned already.

First of all: my main gripe with the discussed approach is that it uses
bundles. I know, I introduced bundles, but they just seem too clunky and
too static for the resumable clone feature.

So I wonder whether it would be possible to come up with a subset of the
revs with a stable order, with associated thin packs (using prior revs as
negative revs in the commit range) such that each thin pack weighs roughly
1MB (or whatever granularity you desire). My thinking was that it should
be possible to follow a similar strategy as bisect to come up with said
list.
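
A minimal sketch of the chunking step, in Python. It assumes the revs are
already in a stable order (oldest first) and that some estimate of the
bytes each rev adds to a pack is available; both the function name
pick_ranges and the size estimates are hypothetical. It uses a plain
greedy scan rather than the bisect-like strategy mentioned above, simply
because that is the shortest way to show the shape of the result: a list
of (negative rev, positive rev) pairs, one per thin pack.

```python
from typing import List, Tuple


def pick_ranges(revs: List[Tuple[str, int]],
                target_bytes: int = 1_000_000) -> List[Tuple[str, str]]:
    """Split a stable-ordered rev list into ranges of roughly target_bytes.

    revs: (rev, estimated_bytes_added) pairs in stable order (assumption).
    Returns (negative_rev, positive_rev) pairs; an empty negative_rev
    means the range starts at the root and has nothing to exclude.
    """
    ranges = []
    prev = ""        # positive rev of the previous range, used as negative rev
    acc = 0          # bytes accumulated in the current range
    pending = None   # last rev seen that is not yet covered by a range
    for rev, size in revs:
        acc += size
        pending = rev
        if acc >= target_bytes:
            ranges.append((prev, rev))
            prev, acc, pending = rev, 0, None
    if pending is not None:
        # Whatever is left over forms a final, smaller range.
        ranges.append((prev, pending))
    return ranges
```

Each pair would then describe one thin pack: everything reachable from
the positive rev, minus everything reachable from the negative one.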

The client could then state that it was interrupted while downloading a
given rev's pack, at a specific offset, and the (thin) pack could be
regenerated on the fly (or cached), serving only the desired chunk. The
server would then also automatically know where in the list of
stable-ordered revs the clone was interrupted and continue with the next
one.
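
The resume step could look roughly like the following Python sketch. It
is an illustration under assumptions, not a protocol proposal: the packs
are held in an in-memory dict keyed by rev as a stand-in for regenerated
or cached thin packs, and resume_stream is a hypothetical name.

```python
from typing import Dict, Iterator, List


def resume_stream(order: List[str], packs: Dict[str, bytes],
                  rev: str, offset: int) -> Iterator[bytes]:
    """Yield the bytes a resuming client still needs.

    order:  the stable-ordered list of revs that have a thin pack.
    packs:  rev -> pack bytes (stand-in for regenerating or caching).
    rev:    the rev whose pack the client was downloading when cut off.
    offset: number of bytes of that pack the client already has.
    """
    idx = order.index(rev)  # ValueError if the rev is not in the list
    pack = packs[rev]
    if not 0 <= offset <= len(pack):
        raise ValueError("offset lies outside the pack")
    # First the remainder of the interrupted pack ...
    yield pack[offset:]
    # ... then every later pack in the stable order, in full.
    for later in order[idx + 1:]:
        yield packs[later]
```

Because the position in the list follows from the rev alone, the client
only has to send the rev and the offset.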

Oh, and if regenerating the thin pack instead of caching it, we need to
ensure a stable packing (i.e. no threads!). That is, given a commit range,
we need to (re-)generate bytewise-identical thin packs.
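
One way to check this property is to regenerate the same range a few
times and compare digests, as in the Python sketch below. The generate
callable is a hypothetical stand-in for whatever produces the thin pack;
on the git side, pack-objects has a --threads option and a pack.threads
setting, but whether a single thread alone is enough for byte-identical
output is not something this sketch establishes.

```python
import hashlib
from typing import Callable


def regenerates_identically(generate: Callable[[str, str], bytes],
                            negative: str, positive: str,
                            runs: int = 3) -> bool:
    """Return True if `runs` regenerations of one range are byte-identical.

    generate(negative, positive) is assumed to return the pack bytes for
    the range negative..positive.
    """
    digests = {
        hashlib.sha256(generate(negative, positive)).hexdigest()
        for _ in range(runs)
    }
    # A single distinct digest means every run produced the same bytes.
    return len(digests) == 1
```

A server could run such a check before deciding to regenerate packs on
the fly instead of caching them.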

Of course this stable-ordered rev list would have to be persisted when the
server serves its first resumable clone and then extended with future
resumable clones whenever new revisions were pushed. (And there would also
have to be some way to evict no-longer-reachable revs, maybe by simply
regenerating the whole shebang.)
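
A sketch of the bookkeeping, in Python, with hypothetical helper names.
The list is only ever appended to, so that positions already handed out
to clients stay valid; eviction is not attempted in place, the helper
merely reports that a full regeneration is due.

```python
from typing import Iterable, List, Set


def extend_checkpoints(existing: List[str],
                       pushed: Iterable[str]) -> List[str]:
    """Append newly pushed revs, keeping the existing prefix untouched."""
    seen = set(existing)
    out = list(existing)
    for rev in pushed:
        if rev not in seen:
            out.append(rev)
            seen.add(rev)
    return out


def needs_rebuild(existing: List[str], reachable: Set[str]) -> bool:
    """True if any persisted rev is no longer reachable.

    In that case the sketch assumes the whole list (and its packs) is
    regenerated rather than patched, since later thin packs depend on
    earlier revs.
    """
    return any(rev not in reachable for rev in existing)
```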

For all of this to work, the most crucial idea would be this one: a clone
can *always* start as-is. Only when it was interrupted, the server
supports the "resumable clone" capability, and the client is "resuming"
the clone would the client *actually* ask for a resumable clone.
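
The client-side decision can be written down as a small Python sketch.
The capability string "resumable-clone" and the function name are
assumptions made for illustration; no such capability name has been
agreed on in this thread.

```python
from typing import Set

# Hypothetical capability name, not part of any agreed protocol.
RESUMABLE_CAP = "resumable-clone"


def choose_request(was_interrupted: bool, resuming: bool,
                   server_caps: Set[str]) -> str:
    """Pick the kind of clone request the client should send.

    A resumable clone is requested only if all three conditions hold;
    in every other case the client falls back to the regular clone.
    """
    if was_interrupted and resuming and RESUMABLE_CAP in server_caps:
        return "resumable"
    return "regular"
```

A first attempt therefore always takes the regular path, as does any
attempt against a server that does not advertise the capability.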

Yes, this could potentially waste a bit of bandwidth on the part of the
user with a flaky connection (because whatever was transferred during the
first, non-resumable clone would be thrown out of the window), but it might
make it easier for us to provide a non-fragile upgrade path because the
cloning process would still default to the current one.

Food for thought?

Ciao,
Dscho
