git@vger.kernel.org mailing list mirror (one of many)
From: Josh Triplett <josh@joshtriplett.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: Duy Nguyen <pclouds@gmail.com>, Al Viro <viro@zeniv.linux.org.uk>,
	Stefan Beller <sbeller@google.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	sarah@thesharps.us
Subject: Re: Resumable git clone?
Date: Wed, 2 Mar 2016 08:41:18 -0800
Message-ID: <20160302164118.GA13732@x>
In-Reply-To: <xmqq4mcp5lij.fsf@gitster.mtv.corp.google.com>

On Wed, Mar 02, 2016 at 12:31:16AM -0800, Junio C Hamano wrote:
> Josh Triplett <josh@joshtriplett.org> writes:
> > I think several simpler optimizations seem
> > preferable, such as binary object names, and abbreviating complete
> > object sets ("I have these commits/trees and everything they need
> > recursively; I also have this stack of random objects.").
> 
> Given the way the pack stream is organized (i.e. commits first and then
> trees and blobs that belong to the same delta chain together), and
> our assumed goal being to salvage objects from an interrupted
> transfer of a packfile, you are unlikely to ever see "I have these
> commits/trees and everything they need" that are salvaged from such
> a failed transfer.  So I doubt such an optimization is worth doing.

True for the resumable clone case.  For that optimization, I was
thinking of the "pull during the merge window" case that Al Viro was
also interested in optimizing.
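
A minimal sketch of the "binary object names" idea from the quoted text
above, assuming SHA-1 object names (40 hex characters, 20 raw bytes). The
helper names pack_names and unpack_names are hypothetical and are not part
of git or its wire protocol; the point is only that a list sent in binary
takes roughly half the space of the same list sent in hex.

```python
from typing import List, Sequence

SHA1_RAW_LEN = 20  # a SHA-1 object name is 20 raw bytes, i.e. 40 hex digits


def pack_names(hex_names: Sequence[str]) -> bytes:
    """Concatenate hex object names as fixed-width raw bytes (hypothetical helper)."""
    return b"".join(bytes.fromhex(name) for name in hex_names)


def unpack_names(blob: bytes, size: int = SHA1_RAW_LEN) -> List[str]:
    """Split a blob of fixed-width raw names back into lowercase hex strings."""
    if len(blob) % size != 0:
        raise ValueError("blob length is not a multiple of the name size")
    return [blob[i:i + size].hex() for i in range(0, len(blob), size)]


if __name__ == "__main__":
    # Made-up object names, for illustration only.
    names = ["00" * 20, "ff" * 20]
    blob = pack_names(names)
    hex_size = sum(len(n) + 1 for n in names)  # one name per line, newline-terminated
    print(f"hex list: {hex_size} bytes, binary list: {len(blob)} bytes")
```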

> Besides it is very expensive to compute (the computation is done on
> the client side, so the cycles burned and the time the user has to
> wait is of much less concern, though); you'd essentially be doing
> "git fsck" to find the "dangling" objects.

Trading client-side computation for bandwidth can potentially be
worthwhile if you have plenty of local compute but a slow and metered
link.

> The list of what would be transferred needs to come in full from the
> server end, as the list names objects that the receiving end may not
> have seen, but the response by the client could be encoded much more
> tightly.  For the full list of N objects from the server, we can
> think of your response as a bitstream of N bits, each on-bit in
> which signals an unwanted object in the list.  You can optimize this
> transfer by RLE compressing the bitstream, for example.
> 
> As git-over-HTTP is stateless, however, you cannot assume that the
> server side remembers what it sent to the client (instead, the
> client side needs to re-post what it heard from the server in the
> previous exchange to allow the server side to use it after
> validating).  So "objects at these indices in your list" kind of
> optimization may not work very well in that environment.  I'd
> imagine that an exchange of "Here is the list of objects", "Give me
> these objects" done naively in full 40-hex object names would work
> OK there, though.

Good point.  Between statelessness and Duy's point about the client list
usually being smaller than the server list, perhaps it would make sense
to not have the server send a list at all, and just have the client send
its own list.
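
For reference, the N-bit response with RLE compression described in the
quoted text above could look roughly like the sketch below. It assumes the
client and server agree on the order of the server's list, which, as noted
in the quote, is not something a stateless transport gives you for free.
The function names and the run-length format (alternating run lengths,
starting with a run of 0 bits) are illustrative assumptions, not anything
git implements.

```python
from typing import List, Sequence, Set


def unwanted_bitmap(server_list: Sequence[str], have: Set[str]) -> List[int]:
    """One bit per entry in the server's list; 1 marks an object the client already has."""
    return [1 if name in have else 0 for name in server_list]


def rle_encode(bits: Sequence[int]) -> List[int]:
    """Encode bits as alternating run lengths, starting with a (possibly empty) run of 0s."""
    runs: List[int] = []
    current = 0
    count = 0
    for bit in bits:
        if bit == current:
            count += 1
        else:
            runs.append(count)
            current = bit
            count = 1
    runs.append(count)
    return runs


def rle_decode(runs: Sequence[int]) -> List[int]:
    """Inverse of rle_encode: expand alternating run lengths back into bits."""
    bits: List[int] = []
    value = 0
    for length in runs:
        bits.extend([value] * length)
        value ^= 1
    return bits


if __name__ == "__main__":
    # Made-up short names stand in for full object names.
    server_list = ["obj1", "obj2", "obj3", "obj4", "obj5", "obj6"]
    have = {"obj1", "obj2", "obj6"}
    bits = unwanted_bitmap(server_list, have)
    runs = rle_encode(bits)
    assert rle_decode(runs) == bits
    print(bits, runs)
```

The encoding only pays off when the unwanted objects cluster into long
runs in the server's ordering; a bitmap with scattered bits can come out
larger after RLE than before.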

- Josh Triplett
