git@vger.kernel.org mailing list mirror (one of many)
From: Josh Triplett <josh@joshtriplett.org>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Stefan Beller <sbeller@google.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	sarah@thesharps.us
Subject: Re: Resumable git clone?
Date: Tue, 1 Mar 2016 23:54:37 -0800
Message-ID: <20160302075437.GA8024@x>
In-Reply-To: <CACsJy8DcNrOmrKKPibV6GuSqspovBmHzUv_mRB6fZyLjw5wWzQ@mail.gmail.com>

On Wed, Mar 02, 2016 at 02:37:53PM +0700, Duy Nguyen wrote:
> On Wed, Mar 2, 2016 at 1:31 PM, Junio C Hamano <gitster@pobox.com> wrote:
> > Al Viro <viro@ZenIV.linux.org.uk> writes:
> >
> >> FWIW, I wasn't proposing to recreate the remaining bits of that _pack_;
> >> just do the normal pull with one addition: start with sending the list
> >> of sha1 of objects you are about to send and let the recipient reply
> >> with "I already have <set of sha1>, don't bother with those".  And exclude
> >> those from the transfer.
> >
> > I did a quick-and-dirty unscientific experiment.
> >
> > I had a clone of Linus's repository that was about a week old, whose
> > tip was at 4de8ebef (Merge tag 'trace-fixes-v4.5-rc5' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace,
> > 2016-02-22).  To bring it up to date (i.e. a pull about a week's
> > worth of progress) to f691b77b (Merge branch 'for-linus' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs, 2016-03-01):
> >
> >     $ git rev-list --objects 4de8ebef..f691b77b1fc | wc -l
> >     1396
> >     $ git rev-parse 4de8ebef..f691b77b1fc |
> >       git pack-objects --revs --delta-base-offset --stdout |
> >       wc -c
> >     2444127
> >
> > So in order to salvage some transfer out of 2.4MB, the hypothetical
> > Al protocol would first have the upload-pack give 20*1396 = 28kB
> 
> It could be 10*1396 or less. If the server calculates the shortest
> unambiguous SHA-1 length (quite cheap on fully packed repo) and sends
> it to the client, the client can just send short SHA-1 instead. It's
> racy though because objects are being added to the server and abbrev
> length may go up. But we can check ambiguity for all SHA-1 sent by
> client and ask for resend for ambiguous ones.
> 
> On my linux-2.6.git, 10 letters (so 5 bytes) are needed for
> unambiguous short SHA-1. But we can even go optimistic and ask the
> client for shorter SHA-1 with hope that resend won't be many.
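
For reference, here is a minimal sketch of the "shortest unambiguous
SHA-1 length" computation described in the quoted text. It is plain
Python over a list of hex object names, not git's own abbreviation code;
the function name and the synthetic sample names are assumptions made up
for illustration.

```python
import hashlib


def shortest_unambiguous_length(hex_names):
    """Smallest number of hex digits at which every name is distinct.

    After sorting, the longest prefix shared by any two names is always
    shared by two neighbours, so one pass over adjacent pairs is enough.
    """
    names = sorted(set(hex_names))
    longest_common = 0
    for a, b in zip(names, names[1:]):
        common = 0
        for x, y in zip(a, b):
            if x != y:
                break
            common += 1
        longest_common = max(longest_common, common)
    # One digit past the longest shared prefix separates all names.
    return longest_common + 1


if __name__ == "__main__":
    # Synthetic stand-ins for real object names (hypothetical data).
    sample = [hashlib.sha1(str(i).encode()).hexdigest() for i in range(1396)]
    print("digits needed:", shortest_unambiguous_length(sample))
```

The race mentioned in the quote shows up here as well: the result only
holds for the set of names it was computed over, so adding an object can
raise it.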

I don't think it's worth the trouble and ambiguity to send abbreviated
object names over the wire.  Several simpler optimizations seem
preferable, such as sending object names in binary, and abbreviating
complete object sets ("I have these commits/trees and everything they
need recursively; I also have this stack of random objects.").
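
To put rough numbers on the binary-names option, here is a small sketch.
It assumes the 1396 objects from the experiment quoted above and 20-byte
SHA-1 names; the helper names are hypothetical, and this is only
arithmetic, not an actual protocol format.

```python
OBJECT_COUNT = 1396   # objects in 4de8ebef..f691b77b1fc, per the quoted rev-list
HEX_DIGITS = 40       # a full SHA-1 written as hex text
RAW_BYTES = 20        # the same SHA-1 as raw bytes


def to_binary(hex_name: str) -> bytes:
    """Convert a full hex object name into its 20-byte binary form."""
    raw = bytes.fromhex(hex_name)
    if len(raw) != RAW_BYTES:
        raise ValueError("expected a full 40-digit SHA-1")
    return raw


def advertisement_size(count: int, bytes_per_name: int) -> int:
    """Bytes needed to list `count` object names, ignoring any framing."""
    return count * bytes_per_name


if __name__ == "__main__":
    print("hex   :", advertisement_size(OBJECT_COUNT, HEX_DIGITS), "bytes")
    print("binary:", advertisement_size(OBJECT_COUNT, RAW_BYTES), "bytes")
```

Binary names come to 27920 bytes, which matches the 28kB figure quoted
above; hex text would be twice that, at 55840 bytes.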

That would work especially well for resumable pull, or for the case of
optimizing pull during the merge window.

- Josh Triplett
