git@vger.kernel.org mailing list mirror (one of many)
From: Josh Triplett <josh@joshtriplett.org>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Stefan Beller <sbeller@google.com>,
	Duy Nguyen <pclouds@gmail.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	sarah@thesharps.us
Subject: Re: Resumable git clone?
Date: Wed, 2 Mar 2016 00:13:44 -0800
Message-ID: <20160302081344.GB8024@x>
In-Reply-To: <20160302023024.GG17997@ZenIV.linux.org.uk>

On Wed, Mar 02, 2016 at 02:30:24AM +0000, Al Viro wrote:
> On Tue, Mar 01, 2016 at 05:40:28PM -0800, Stefan Beller wrote:
> 
> > So throwing away half finished stuff while keeping the front load?
> 
> Throw away the object that got truncated and ones for which delta chain
> doesn't resolve entirely in the transferred part.
>  
> > > indexing the objects it
> > > contains, and then re-running clone and not having to fetch those
> > > objects.
> > 
> > The pack is not deterministic for a given repository. When creating
> > the pack, you may encounter races between threads, such that the order
> > in a pack differs.
> 
> FWIW, I wasn't proposing to recreate the remaining bits of that _pack_;
> just do the normal pull with one addition: start with sending the list
> of sha1 of objects you are about to send and let the recipient reply
> with "I already have <set of sha1>, don't bother with those".  And exclude
> those from the transfer.  Encoding for the set being available is an
> interesting variable here - might be plain list of sha1, might be its
> complement ("I want the following subset"), might be "145th to 1029th,
> 1517th and 1890th to 1920th of the list you've sent"; which form ends
> up more efficient needs to be found experimentally...

As a simple proposal, the server could send the list of hashes (in
approximately the same order it would send the pack), the client could
send back a bitmap where '0' means "send it" and '1' means "got that one
already", and the client could compress that bitmap.  That gives you the
RLE and similar without having to write it yourself.  That might not be
optimal, but it would likely set a high bar with minimal effort.
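
A minimal sketch of that exchange, for illustration only. The function names, the bit order (most significant bit first), and the use of zlib are assumptions made here, not part of any existing git protocol:

```python
import zlib


def build_have_bitmap(offered, have):
    """Client side: one bit per offered hash, in the server's order.

    A set bit (1) means "got that one already"; a clear bit (0) means "send it".
    The packed bitmap is zlib-compressed, so long runs of identical bits shrink
    without any hand-written RLE.
    """
    bits = bytearray((len(offered) + 7) // 8)
    for i, sha in enumerate(offered):
        if sha in have:
            bits[i // 8] |= 0x80 >> (i % 8)
    return zlib.compress(bytes(bits), 9)


def objects_to_send(offered, compressed_bitmap):
    """Server side: decode the client's bitmap and keep only the '0' entries."""
    bits = zlib.decompress(compressed_bitmap)
    if len(bits) != (len(offered) + 7) // 8:
        raise ValueError("bitmap length does not match the offered list")
    return [sha for i, sha in enumerate(offered)
            if not bits[i // 8] & (0x80 >> (i % 8))]


if __name__ == "__main__":
    # Made-up hashes: the client already holds the first 12 of 20 objects.
    offered = [f"{n:040x}" for n in range(20)]
    reply = build_have_bitmap(offered, set(offered[:12]))
    print(objects_to_send(offered, reply) == offered[12:])
```

Since an interrupted clone leaves the client holding a mostly contiguous prefix of the list, the bitmap would be dominated by one long run of 1s followed by 0s, which is the case a general-purpose compressor handles well.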

One debatable optimization on top of that would rely on git object
structure to imply object hashes without sending them: the message from
the server could have a list of commit/tree hashes that imply sending
all objects reachable from those, without having to send all the implied
hashes.  However, that would then make the message back from the client
about what it already has larger and more complicated; that might not
make it worthwhile.

This seems like a good case for doing the simplest possible thing first
(complete hash list, compressed "got it already" bitmap), seeing how
much benefit that provides, and creating a v2 protocol if some
additional optimization proves sufficiently worthwhile.
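
For comparing encodings experimentally, the positional-range form Al mentioned ("145th to 1029th, 1517th and ...") could be computed along these lines. This is only a sketch; `to_ranges` is a hypothetical helper name, and the 1-based inclusive convention is an assumption:

```python
def to_ranges(offered, have):
    """Return 1-based inclusive (first, last) runs of positions in `offered`
    whose hashes are already in `have`."""
    ranges = []
    start = None
    for pos, sha in enumerate(offered, start=1):
        if sha in have:
            if start is None:
                start = pos          # a new run of held objects begins here
        elif start is not None:
            ranges.append((start, pos - 1))
            start = None
    if start is not None:            # the last run extends to the end of the list
        ranges.append((start, len(offered)))
    return ranges


if __name__ == "__main__":
    # Made-up hashes: the client holds positions 2-4, 7, and 9-10 of 10.
    offered = [f"{n:040x}" for n in range(1, 11)]
    have = {offered[i] for i in (1, 2, 3, 6, 8, 9)}
    print(to_ranges(offered, have))
```

Serializing the output of such a helper and measuring its size against the compressed bitmap on real interrupted clones would show whether anything beyond the simple bitmap is worth a v2.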

- Josh Triplett

Thread overview: 24+ messages
2016-03-02  1:30 Resumable git clone? Josh Triplett
2016-03-02  1:40 ` Stefan Beller
2016-03-02  2:30   ` Al Viro
2016-03-02  6:31     ` Junio C Hamano
2016-03-02  7:37       ` Duy Nguyen
2016-03-02  7:44         ` Duy Nguyen
2016-03-02  7:54         ` Josh Triplett
2016-03-02  8:31           ` Junio C Hamano
2016-03-02  9:28             ` Duy Nguyen
2016-03-02 16:41             ` Josh Triplett
2016-03-02  8:13     ` Josh Triplett [this message]
2016-03-02  8:22       ` Duy Nguyen
2016-03-02  8:32         ` Jeff King
2016-03-02 10:47           ` Bhavik Bavishi
2016-03-02 16:40         ` Josh Triplett
2016-03-02  8:14     ` Duy Nguyen
2016-03-02  1:45 ` Duy Nguyen
2016-03-02  8:41 ` Junio C Hamano
2016-03-02 15:51   ` Konstantin Ryabitsev
2016-03-02 16:49   ` Josh Triplett
2016-03-02 17:57     ` Junio C Hamano
2016-03-24  8:00   ` Philip Oakley
2016-03-24 15:53     ` Junio C Hamano
2016-03-24 21:08       ` Philip Oakley
