git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Stefan Beller <sbeller@google.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	sarah@thesharps.us
Subject: Re: Resumable git clone?
Date: Wed, 2 Mar 2016 03:32:27 -0500
Message-ID: <20160302083227.GA30065@sigill.intra.peff.net>
In-Reply-To: <CACsJy8CBBk4bgz6Gn0QvCwWtOsqcQZBYgOBQTd=4Y+2YKs44Qg@mail.gmail.com>

On Wed, Mar 02, 2016 at 03:22:17PM +0700, Duy Nguyen wrote:

> > As a simple proposal, the server could send the list of hashes (in
> > approximately the same order it would send the pack), the client could
> > send back a bitmap where '0' means "send it" and '1' means "got that one
> > already", and the client could compress that bitmap.  That gives you the
> > RLE and similar without having to write it yourself.  That might not be
> > optimal, but it would likely set a high bar with minimal effort.
> 
> We have an implementation of EWAH bitmap compression, so compressing
> is not a problem.
> 
> But I still don't see why it's more efficient to have the server send
> the hash list to the client. Assume you need to transfer N objects.
> That direction makes you always send N hashes. But if the client sends
> the list of already fetched objects, M, then M <= N. And we won't need
> to send the bitmap. What did I miss?

Right, I don't see what the point is in compressing the bitmap. The sha1
list for a clone of linux.git is 87 megabytes. The return bitmap, even
naively, is 500K. Compressing it only matters if you are trying to
optimize for wildly asymmetric links.
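
As a back-of-the-envelope check on those two numbers, here is a small
sketch. It assumes the list holds raw 20-byte SHA-1s (not 40-character
hex) and derives the object count from the 87 MB figure rather than from
a real repository; the function names are made up for illustration.

```python
SHA1_BYTES = 20  # assumption: raw (binary) SHA-1, not 40-char hex


def hash_list_bytes(n_objects: int) -> int:
    """Size of a list of n raw SHA-1s."""
    return n_objects * SHA1_BYTES


def bitmap_bytes(n_objects: int) -> int:
    """Size of an uncompressed bitmap with one bit per object."""
    return (n_objects + 7) // 8


# Object count implied by an 87 MB hash list (derived, not measured).
n = 87_000_000 // SHA1_BYTES

print(hash_list_bytes(n), bitmap_bytes(n))
```

Under those assumptions the naive bitmap is about 1/160th the size of the
hash list (one bit versus 160 bits per object), which is in line with the
500K estimate.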

If the client just naively sends "here's what I have", then we know it
can never be _more_ than 87 megabytes. And as a bonus, the longer the
list is, the more we are saving (so at the moment you are sending 82MB,
it's really worth it, because you do have 95% of the pack, which is
worth amortizing).
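
To make the comparison concrete, the sketch below tallies bytes on the
wire for both schemes under the same assumptions as before (raw 20-byte
SHA-1s, one bit per object in an uncompressed bitmap). The names
`have_list_cost` and `server_list_cost` are hypothetical, and any protocol
framing overhead is ignored.

```python
SHA1_BYTES = 20  # assumption: raw (binary) SHA-1


def have_list_cost(m_have: int) -> int:
    """Client sends the M hashes it already has; no bitmap is needed."""
    return m_have * SHA1_BYTES


def server_list_cost(n_total: int) -> int:
    """Server sends all N hashes; client answers with an N-bit bitmap."""
    return n_total * SHA1_BYTES + (n_total + 7) // 8


n_total = 87_000_000 // SHA1_BYTES  # object count implied by an 87 MB list
m_have = 82_000_000 // SHA1_BYTES   # client already holds 82 MB worth of hashes

# Share of objects (not pack bytes) the client already has.
fraction_held = m_have / n_total

print(have_list_cost(m_have), server_list_cost(n_total), round(fraction_held, 2))
```

Since M <= N, the "here's what I have" list never costs more than the
server-list-plus-bitmap exchange, and the list only gets long when most of
the objects are already on the client.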

I'm still a little dubious that anything involving "send all the hashes"
is going to be useful in practice, especially for something like the
kernel (where you have tons of small objects that delta well). It
would work better when you have gigantic objects that don't delta (so
the cost of a sha1 versus the object size is way better), but then I
think we'd do better to transfer all of the normal-sized bits up front,
and then allow fetching the large stuff separately.
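
The "cost of a sha1 versus the object size" point can be illustrated with
a tiny calculation. The two object sizes below are invented purely for
illustration (a 100-byte delta and a 10 MB blob); they are not
measurements from any repository, and `hash_overhead` is a hypothetical
helper.

```python
SHA1_BYTES = 20  # assumption: raw (binary) SHA-1


def hash_overhead(object_bytes: int) -> float:
    """Bytes spent naming an object, relative to the object's own size."""
    return SHA1_BYTES / object_bytes


# Illustrative sizes only: a well-deltified small object vs. a large blob.
examples = [("small delta", 100), ("large blob", 10 * 1024 * 1024)]

for label, size in examples:
    print(f"{label}: {hash_overhead(size):.4%} overhead")
```

For the small object the hash is a substantial fraction of the payload;
for the large one it is negligible, which is why per-object hash lists
look better for repositories dominated by big, non-delta objects.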

-Peff
