From: Josh Triplett <josh@joshtriplett.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: Duy Nguyen <pclouds@gmail.com>, Al Viro <viro@zeniv.linux.org.uk>,
Stefan Beller <sbeller@google.com>,
"git@vger.kernel.org" <git@vger.kernel.org>,
sarah@thesharps.us
Subject: Re: Resumable git clone?
Date: Wed, 2 Mar 2016 08:41:18 -0800
Message-ID: <20160302164118.GA13732@x>
In-Reply-To: <xmqq4mcp5lij.fsf@gitster.mtv.corp.google.com>
On Wed, Mar 02, 2016 at 12:31:16AM -0800, Junio C Hamano wrote:
> Josh Triplett <josh@joshtriplett.org> writes:
> > I think several simpler optimizations seem
> > preferable, such as binary object names, and abbreviating complete
> > object sets ("I have these commits/trees and everything they need
> > recursively; I also have this stack of random objects.").
>
> Given the way pack stream is organized (i.e. commits first and then
> trees and blobs that belong to the same delta chain together), and
> our assumed goal being to salvage objects from an interrupted
> transfer of a packfile, you are unlikely to ever see "I have these
> commits/trees and everything they need" that are salvaged from such
> a failed transfer. So I doubt such an optimization is worth doing.
True for the resumable clone case. For that optimization, I was
thinking of the "pull during the merge window" case that Al Viro was
also interested in optimizing.
> Besides it is very expensive to compute (the computation is done on
> the client side, so the cycles burned and the time the user has to
> wait is of much less concern, though); you'd essentially be doing
> "git fsck" to find the "dangling" objects.
Trading client-side computation for bandwidth can potentially be
worthwhile if you have plenty of local compute but a slow and metered
link.
> The list of what would be transferred needs to come in full from the
> server end, as the list names objects that the receiving end may not
> have seen, but the response by the client could be encoded much
> tightly. For the full list of N objects from the server, we can
> think of your response to be a bitstream of N bits, each on-bit in
> which signals an unwanted object in the list. You can optimize this
> transfer by RLE compressing the bitstream, for example.
>
> As git-over-HTTP is stateless, however, you cannot assume that the
> server side remembers what it sent to the client (instead, the
> client side needs to re-post what it heard from the server in the
> previous exchange to allow the server side to use it after
> validating). So "objects at these indices in your list" kind of
> optimization may not work very well in that environment. I'd
> imagine that an exchange of "Here are the list of objects", "Give me
> these objects" done naively in full 40-hex object names would work
> OK there, though.
Good point. Between statelessness and Duy's point about the client list
usually being smaller than the server list, perhaps it would make sense
to not have the server send a list at all, and just have the client send
its own list.
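
A minimal sketch of the encodings discussed above, in Python and purely for illustration; nothing here is git's actual protocol. The object names are fabricated by hashing placeholder strings, and the helper names (`unwanted_bitmap`, `rle_encode`, `rle_decode`) are hypothetical. It shows the N-bit "unwanted object" response with run-length encoding, next to the per-object cost of 40-hex versus 20-byte binary SHA-1 names.

```python
import hashlib


def unwanted_bitmap(server_list, have):
    """One bit per server-listed object; 1 marks an object the client already has."""
    return [1 if name in have else 0 for name in server_list]


def rle_encode(bits):
    """Run-length encode a 0/1 list as (first_bit, [run lengths])."""
    if not bits:
        return 0, []
    runs = []
    current = bits[0]
    length = 0
    for b in bits:
        if b == current:
            length += 1
        else:
            runs.append(length)
            current = b
            length = 1
    runs.append(length)
    return bits[0], runs


def rle_decode(first_bit, runs):
    """Inverse of rle_encode: expand alternating runs back into a 0/1 list."""
    bits = []
    value = first_bit
    for n in runs:
        bits.extend([value] * n)
        value ^= 1
    return bits


# Fabricated data: 1000 made-up object names, of which the client is
# assumed to already hold one contiguous stretch (entries 200..699).
server_list = [hashlib.sha1(f"object {i}".encode()).hexdigest() for i in range(1000)]
have = set(server_list[200:700])

bitmap = unwanted_bitmap(server_list, have)
first_bit, runs = rle_encode(bitmap)

# Cost of naming the same 500 objects explicitly instead of by index.
hex_bytes = sum(len(name) for name in have)                     # 40 bytes each
binary_bytes = sum(len(bytes.fromhex(name)) for name in have)   # 20 bytes each

print(first_bit, runs)          # 0 [200, 500, 300]
print(hex_bytes, binary_bytes)  # 20000 10000
```

The index-based form is only meaningful if the server can reproduce the same list in the same order. Over stateless HTTP the client would have to re-post that list, which is where explicit object names sent by the client, hex or binary, become the simpler option.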
- Josh Triplett