git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Davide Libenzi <davidel@xmailserver.org>
To: Linus Torvalds <torvalds@osdl.org>
Cc: "C. Scott Ananian" <cscott@cscott.net>,
	Alon Ziv <alonz@nolaviz.org>,
	git@vger.kernel.org
Subject: Re: RFC: adding xdelta compression to git
Date: Tue, 3 May 2005 11:10:32 -0700 (PDT)	[thread overview]
Message-ID: <Pine.LNX.4.58.0505031048440.13099@bigblue.dev.mdolabs.com> (raw)
In-Reply-To: <Pine.LNX.4.58.0505031031240.3594@ppc970.osdl.org>

On Tue, 3 May 2005, Linus Torvalds wrote:

> On Tue, 3 May 2005, C. Scott Ananian wrote:
> > 
> > Linus knows this.  His point is just to be sure you actually *code* that 
> > walk in fsck, and (hopefully) do so w/o complicating the fsck too much.
> 
> Indeed. It's also a performance issue.
> 
> If you do xdelta objects, and don't tell fsck about it, then fsck will 
> just check every object as a blob. Why is that bad?
> 
> Think about it: let's say that you have a series of xdelta objects, and a 
> fsck that is xdelta-unaware. It will unpack each object independently, 
> which means that it will keep on doing the same early xdelta work over and 
> over and over again. Instead of just applying them in order, and checking 
> the sha1 of the result at each point.
> 
> Now, You probably want to limit the length of the chains to some firly 
> small number anyway, so maybe that's not a big deal. Who knows. And I'm 
> actually still so anal that I don't think I'd use this for _my_ tree, just 
> because I'm a worry-wart (and I still think disk is incredibly cheap ;)

If you use a "full tip" metadata format with reverse deltas, you drop a 
"full" version "time to time" along the chain, and you keep a small index 
file, you have:

1) No matter how big it becomes the xdelta collection object, you are only 
   touching very limited regions of it (due the small index file, that can 
   be less than 20+8 bytes per entry in the xdelta blob)

2) Checkout happens w/out even doing xpatching (since the tip is full)

3) Checkins requires only one xdelta operation (since the tip is full), 
   and zero if it is the time to store a full version along the chain (I 
   use to drop one every 10-16 xdeltas, depending on the progressive size 
   of the delta operations)

4) Worst case performance in reconstructing histories are bound by the 
   longest xdelta chain (10-16)

In some way I tend to agree (strangely ;) with you about the disk-cheap 
mantra, but network bandwidth matter IMO. So, if you do not want (being a 
real worry-wart) to use xdelta leverage on the FS trees, you can have way 
smarter network protocols using xdelta plus the knowledge of the git 
history structure. The rsync algo uses xdelta, but the poor guy is not 
able to leverage from the knowledge of the history that only git knows. 
So, if Larry and Greg shares a common object A, Larry changes A and makes 
a new git object B, rsync will transfer the whole object B, because it 
does not have any idea of the git structure. Git though, has this 
knowledge, and it can say to the remote fetcher: Look, I have this new 
thing called B, that is basically your thing A plus this very small xdelta 
(B-A). And typical xdelta diffs are really small (1/7 to 1/10 of classical 
'diff -u' ones).




- Davide


  reply	other threads:[~2005-05-03 18:04 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-05-03  3:57 RFC: adding xdelta compression to git Alon Ziv
2005-05-03  4:12 ` Nicolas Pitre
2005-05-03  4:52 ` Linus Torvalds
2005-05-03  5:30   ` Davide Libenzi
2005-05-03 15:52     ` C. Scott Ananian
2005-05-03 17:35       ` Linus Torvalds
2005-05-03 18:10         ` Davide Libenzi [this message]
2005-05-03  8:06   ` [PATCH] add the ability to create and retrieve delta objects Nicolas Pitre
2005-05-03 11:24     ` Chris Mason
2005-05-03 12:51       ` Nicolas Pitre
2005-05-03 15:07       ` Linus Torvalds
2005-05-03 16:09         ` Chris Mason
2005-05-03 15:57       ` C. Scott Ananian
2005-05-03 16:35         ` Chris Mason
2005-05-03 14:13     ` Chris Mason
2005-05-03 14:24       ` Nicolas Pitre
2005-05-03 14:37         ` Chris Mason
2005-05-03 15:04           ` Nicolas Pitre
2005-05-03 16:54             ` Chris Mason
2005-05-03 14:48     ` Linus Torvalds
2005-05-03 15:52       ` Nicolas Pitre
2005-05-04 15:56     ` Chris Mason
2005-05-04 16:12       ` C. Scott Ananian
2005-05-04 17:44         ` Chris Mason
2005-05-04 22:03           ` Linus Torvalds
2005-05-04 22:43             ` Chris Mason
2005-05-05  3:25             ` Nicolas Pitre
2005-05-04 21:47       ` Geert Bosch
2005-05-04 22:34         ` Chris Mason
2005-05-05  3:10           ` Nicolas Pitre
2005-05-03 12:48   ` RFC: adding xdelta compression to git Dan Holmsand
2005-05-03 15:50   ` C. Scott Ananian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.58.0505031048440.13099@bigblue.dev.mdolabs.com \
    --to=davidel@xmailserver.org \
    --cc=alonz@nolaviz.org \
    --cc=cscott@cscott.net \
    --cc=git@vger.kernel.org \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).