git@vger.kernel.org mailing list mirror (one of many)
From: Sam Vilain <sam@vilain.net>
To: Jakub Narebski <jnareb@gmail.com>
Cc: Nicolas Pitre <nico@cam.org>,
	Tomasz Kontusz <roverorna@gmail.com>, git <git@vger.kernel.org>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	nick edelen <sirnot@gmail.com>
Subject: Re: Continue git clone after interruption
Date: Fri, 21 Aug 2009 10:57:42 +1200
Message-ID: <1250809062.5963.24.camel@maia.lan>
In-Reply-To: <200908200937.05412.jnareb@gmail.com>

On Thu, 2009-08-20 at 09:37 +0200, Jakub Narebski wrote:
> You would have the same (or at least quite similar) problems with 
> deepening part (the 'incrementals' transfer part) as you found with my
> first proposal of server bisection / division of rev-list, and serving
> 1/Nth of revisions (where N is selected so packfile is reasonable) to
> client as incrementals.  Yours is top-down, mine was bottom-up approach
> to sending series of smaller packs.  The problem is how to select size
> of incrementals, and that incrementals are all-or-nothing (but see also
> comment below).

I've defined a way to do this which doesn't have the complexity of
bisect in GitTorrent, making the compromise that you can't guarantee
each chunk is exactly the same size... I'll have a crack at doing it
based on the rev-cache code in C instead of the horrendously slow
Perl/Berkeley solution I have at the moment to see how well it fares.

> Another solution would be to try to come up with some sort of stable
> sorting of objects so that packfile generated for the same parameters
> (endpoints) would be always byte-for-byte the same.  But that might be
> difficult, or even impossible.

Delta compression is not repeatable enough for this.

The first version of GitTorrent assumed that this would be an
appropriate solution.

So, first you have to sort the objects. That's fine: --date-order is a
good starting point, and I reasoned that interleaving the new objects
for each commit with the commit objects would be a useful sort order.
You also need to tie-break for commits with the same commit date; I just
used the SHA-1 of the commit for that.  Finally, to avoid excessive
transfer, you have to try to make sure that the packs you make are
"thin" packs.
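
To make the sort order above concrete, here is a minimal sketch in
Python. It is not the GitTorrent code: the Commit type, its fields and
stable_object_order() are hypothetical names invented for illustration.
It assumes the objects reachable from each commit are already known, it
walks oldest-first, and it ignores the parent/child constraint that
--date-order also enforces.

```python
from typing import Iterable, List, NamedTuple, Set


class Commit(NamedTuple):
    """Hypothetical record; not a structure taken from git or GitTorrent."""
    sha: str            # hex SHA-1 of the commit
    commit_date: int    # commit timestamp, seconds since the epoch
    objects: List[str]  # IDs of trees/blobs reachable from this commit


def stable_object_order(commits: Iterable[Commit]) -> List[str]:
    """Return object IDs in an order that does not depend on input order.

    Commits are sorted by commit date (oldest first, an assumption) and
    ties are broken by the commit SHA-1. Each commit is followed by the
    objects that no earlier commit has already contributed.
    """
    seen: Set[str] = set()
    order: List[str] = []
    for commit in sorted(commits, key=lambda c: (c.commit_date, c.sha)):
        if commit.sha not in seen:
            seen.add(commit.sha)
            order.append(commit.sha)
        for oid in commit.objects:
            if oid not in seen:  # only objects new to this commit
                seen.add(oid)
                order.append(oid)
    return order
```

Because the sort key is (commit date, SHA-1), feeding the same commits
in a different order yields the same list, which is the property the
chunking needs.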

Currently, thin packs can only work starting at the beginning of history
and working forward, which is opposite to what happens most of the time
in packs.  I think this is the source of much of the inefficiency caused
by chopping up the object lists mentioned in my other e-mail.  It might
be possible, if you could also know which earlier objects were using
this object as a delta base, to try delta'ing against all those objects
and see which one results in the smallest delta.
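
The "try each candidate and keep the smallest delta" idea can be
sketched as follows. This is only an illustration: it does not use
git's delta encoder. As a stand-in, it measures how small zlib can
compress the target when the candidate base is supplied as a preset
dictionary. The function names are hypothetical.

```python
import zlib
from typing import Dict, Optional, Tuple


def approx_delta_size(base: bytes, target: bytes) -> int:
    """Stand-in for a delta size: compress target with base as zlib dictionary."""
    comp = zlib.compressobj(level=9, zdict=base)
    return len(comp.compress(target) + comp.flush())


def pick_delta_base(target: bytes,
                    candidates: Dict[str, bytes]) -> Optional[Tuple[str, int]]:
    """Return (candidate id, approximate delta size) for the best base.

    Candidates are visited in sorted ID order so that ties resolve the
    same way on every run. Returns None when there are no candidates.
    """
    best: Optional[Tuple[str, int]] = None
    for oid in sorted(candidates):
        size = approx_delta_size(candidates[oid], target)
        if best is None or size < best[1]:
            best = (oid, size)
    return best
```

The cost is one trial per candidate, so this only makes sense if the
list of objects that used the target as a delta base is short.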

> Well, we could send the list of objects in pack in order used later by
> pack creation to client (non-resumable but small part), and if packfile
> transport was interrupted in the middle client would compare list of 
> complete objects in part of packfile against this manifest, and send
> request to server with *sorted* list of objects it doesn't have yet.
> Server would probably have to check validity of objects list first (the
> object list might be needed to be more than just object list; it might
> need to specify topology of deltas, i.e. which objects are base for which
> ones).  Then it would generate rest of packfile.

Mmm.  It's a bit chatty, that.  Object lists add another 10-20% on,
which I think should be avoidable if the thin pack problem, plus the
problem of some objects ending up in more than one of the thin packs
that are created, can be reduced to very little.
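
For reference, the client side of the manifest exchange quoted above
could look roughly like this. It is a sketch of the proposal as
described, not an existing git protocol feature; missing_objects() and
its parameters are hypothetical, and the delta topology the server
might also need is left out.

```python
from typing import Iterable, List, Sequence


def missing_objects(manifest: Sequence[str],
                    received: Iterable[str]) -> List[str]:
    """List the manifest entries the client still lacks, in manifest order.

    manifest: object IDs in the order the server will pack them.
    received: IDs of objects fully received before the interruption.
    """
    have = set(received)
    return [oid for oid in manifest if oid not in have]
```

Keeping the reply in manifest order is what makes the list "sorted" in
the sense of the proposal: the server can resume packing without
re-deriving the order.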

Sam
