From: Junio C Hamano <gitster@pobox.com>
To: Josh Triplett <josh@joshtriplett.org>
Cc: Duy Nguyen <pclouds@gmail.com>, Al Viro <viro@zeniv.linux.org.uk>,
	Stefan Beller <sbeller@google.com>,
	"git\@vger.kernel.org" <git@vger.kernel.org>,
	sarah@thesharps.us
Subject: Re: Resumable git clone?
Date: Wed, 02 Mar 2016 00:31:16 -0800
Message-ID: <xmqq4mcp5lij.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <20160302075437.GA8024@x> (Josh Triplett's message of "Tue, 1 Mar 2016 23:54:37 -0800")

Josh Triplett <josh@joshtriplett.org> writes:

> I don't think it's worth the trouble and ambiguity to send abbreviated
> object names over the wire.  

Yup.  My unscientific experiment was to show that the list would be
far smaller than the actual transfer and between full binary and
full textual object name representations there would not be much
meaningful difference--you seem to have a better design sense to
grasp that point ;-)

> I think several simpler optimizations seem
> preferable, such as binary object names, and abbreviating complete
> object sets ("I have these commits/trees and everything they need
> recursively; I also have this stack of random objects.").

Given the way a pack stream is organized (i.e. commits first and then
trees and blobs that belong to the same delta chain together), and
our assumed goal being to salvage objects from an interrupted
transfer of a packfile, you are unlikely to ever see "I have these
commits/trees and everything they need" that are salvaged from such
a failed transfer.  So I doubt such an optimization is worth doing.

Besides, it is very expensive to compute (the computation is done on
the client side, so the cycles burned and the time the user has to
wait are of much less concern, though); you'd essentially be doing
"git fsck" to find the "dangling" objects.

The list of what would be transferred needs to come in full from the
server end, as the list names objects that the receiving end may not
have seen, but the response by the client could be encoded much
more tightly.  For the full list of N objects from the server, we can
think of your response to be a bitstream of N bits, each on-bit in
which signals an unwanted object in the list.  You can optimize this
transfer by RLE compressing the bitstream, for example.
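
To make the bitstream idea concrete, here is a minimal sketch in Python.
It is illustration only, not anything git implements: the function names
are hypothetical, the bits are kept as a plain list of ints rather than
packed bytes, and the run-length form is just (bit, count) pairs.

```python
from itertools import groupby


def have_bitmap(server_list, have):
    """One bit per entry in the server's list, in the server's order.

    An on-bit (1) marks an object the client already holds and does
    not want resent; an off-bit (0) marks one it still needs.
    """
    return [1 if name in have else 0 for name in server_list]


def rle_encode(bits):
    """Collapse consecutive equal bits into (bit, run_length) pairs."""
    return [(bit, sum(1 for _ in run)) for bit, run in groupby(bits)]


def rle_decode(runs):
    """Expand (bit, run_length) pairs back into the flat bit list."""
    return [bit for bit, count in runs for _ in range(count)]


# Hypothetical data: six made-up 40-hex names standing in for the
# server's list; the client salvaged the first three and the fifth.
server_list = ["%040x" % i for i in range(6)]
have = {server_list[0], server_list[1], server_list[2], server_list[4]}

bits = have_bitmap(server_list, have)
runs = rle_encode(bits)
```

Since a pack stream interrupted partway tends to leave the client with a
long prefix of the list and little after it, the runs should be long and
the encoded response short.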

As git-over-HTTP is stateless, however, you cannot assume that the
server side remembers what it sent to the client (instead, the
client side needs to re-post what it heard from the server in the
previous exchange to allow the server side to use it after
validating).  So "objects at these indices in your list" kind of
optimization may not work very well in that environment.  I'd
imagine that an exchange of "Here is the list of objects", "Give me
these objects" done naively in full 40-hex object names would work
OK there, though.
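
A rough sketch of that naive exchange, again in Python and again purely
hypothetical (neither function corresponds to anything in git's
protocol code).  The point is that the request names the objects
themselves, so a stateless server can validate it from scratch without
remembering the list it sent earlier.

```python
import re

# A full textual SHA-1 object name: exactly 40 lowercase hex digits.
HEX40 = re.compile(r"[0-9a-f]{40}")


def client_request(offered, have):
    """Return the full names of offered objects the client still lacks."""
    return [name for name in offered if name not in have]


def server_handle(requested, servable):
    """Validate a request without relying on any remembered state.

    `servable` stands in for whatever set of objects the server is
    willing to send; how it is computed is outside this sketch.
    """
    for name in requested:
        if not HEX40.fullmatch(name):
            raise ValueError("malformed object name: %r" % (name,))
        if name not in servable:
            raise ValueError("object not available: %s" % name)
    return list(requested)


# Hypothetical data: five made-up names; the client already has two.
offered = ["%040x" % i for i in range(5)]
have = {offered[0], offered[3]}

request = client_request(offered, have)
to_send = server_handle(request, set(offered))
```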
