git@vger.kernel.org mailing list mirror (one of many)
From: Linus Torvalds <torvalds@osdl.org>
To: Alon Ziv <alonz@nolaviz.org>
Cc: git@vger.kernel.org
Subject: Re: RFC: adding xdelta compression to git
Date: Mon, 2 May 2005 21:52:42 -0700 (PDT)
Message-ID: <Pine.LNX.4.58.0505022131380.3594@ppc970.osdl.org>
In-Reply-To: <200505030657.38309.alonz@nolaviz.org>



On Tue, 3 May 2005, Alon Ziv wrote:
> 
> 1. Add a git-deltify command, which will take two trees and replace the second 
> tree's blobs with delta-blobs referring to the first tree.

If you do something like this, you want such a delta-blob to be named by 
the sha1 of the result, so that things that refer to it can transparently 
see either the original blob _or_ the "deltified" one, and will never 
care.

It seems that is your plan:

> from the outside it looks like any other blob, but internally it
> contains another blob reference + an xdelta.

Yes. git doesn't much care, as long as the objects unpack to the right 
format. That's all hidden away.

> The only function which would need to understand the new format would be
> unpack_sha1_file.

Yes. EXCEPT for one thing. fsck. I'd _really_ like fsck to be able to know
something about any xdelta objects, if only because if/when things go
wrong, it's really nasty to suddenly see a million "blob" objects not work
any more, with no indication of _why_ they don't work. The core reason may
be that one original object (that just got used as a base for tons of
other objects through deltas) is corrupt or missing. And then you want to
show that _one_ object.
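
A minimal sketch of that reporting idea, assuming a hypothetical
fsck-side table in which each object records the index of its delta
base. The struct and function names are invented for illustration and
are not git's actual code; the walk also assumes delta chains contain
no cycles.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fsck-side view of an object; not git's real structures. */
struct fsck_obj {
    const char *name;   /* object name (a sha1 in real git) */
    int base;           /* index of the delta base, -1 for a plain object */
    int readable;       /* 0 if the object itself is corrupt or missing */
};

/*
 * Walk the delta chain starting at objs[idx] and return the index of the
 * deepest unreadable object on it, i.e. the one worth reporting.
 * Returns -1 if every object on the chain is readable.
 * Assumes the chain has no cycles.
 */
int root_cause(const struct fsck_obj *objs, int idx)
{
    int culprit = -1;

    assert(objs != NULL);
    while (idx >= 0) {
        if (!objs[idx].readable)
            culprit = idx;      /* keep going: a deeper base may be the cause */
        idx = objs[idx].base;
    }
    return culprit;
}
```

With this, every delta that fails because of the same bad base maps back
to a single index, so the checker can print that one object once instead
of a million separate failures.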

> Cons:
> * Changes the repository format.

It wouldn't necessarily. You should be able to do this with _zero_ changes 
to existing objects whatsoever.

What you do is introduce an "xdelta" object, which has a reference to a 
blob object and the delta. The git object model already names all objects 
by a simple ascii name, so adding a new object type in _no_ way changes 
any existing objects.

So you can just make "unpack_sha1_file()" notice that it unpacked an xdelta 
object, and then do the proper delta application, and nobody will ever be 
the wiser.
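
A rough sketch of how such transparent unpacking could look, assuming a
toy in-memory object store and a made-up copy/insert delta format.
Neither is git's on-disk layout nor the real xdelta encoding, and all
names here (raw_unpack, unpack, struct delta_op) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum obj_type { OBJ_BLOB, OBJ_XDELTA };

/* Toy delta instruction; NOT the real xdelta encoding. */
struct delta_op {
    int from_base;      /* 1: copy from the base, 0: insert literal bytes */
    size_t off;         /* offset into the base (copy only) */
    size_t len;         /* number of bytes to copy or insert */
    const char *lit;    /* literal bytes (insert only) */
};

/* Hypothetical stored object, filed under the name of its *result*. */
struct object {
    const char *name;
    enum obj_type type;
    const char *data;   /* blob contents */
    size_t size;
    const char *base;   /* xdelta: name of the base object */
    const struct delta_op *ops;
    size_t nops;
};

/* Raw lookup: returns the object exactly as stored, deltas unresolved. */
const struct object *raw_unpack(const struct object *store, size_t n,
                                const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(store[i].name, name) == 0)
            return &store[i];
    return NULL;
}

/*
 * Resolved lookup: callers get the final contents whether the object was
 * stored whole or as a delta. Returns a malloc'ed buffer the caller must
 * free, or NULL on a missing object or a bad delta. No cycle check.
 */
char *unpack(const struct object *store, size_t n, const char *name,
             size_t *size)
{
    const struct object *obj = raw_unpack(store, n, name);
    size_t base_size = 0, out_size = 0, pos = 0;
    char *base, *out;

    assert(size != NULL);
    if (!obj)
        return NULL;
    if (obj->type == OBJ_BLOB) {
        out = malloc(obj->size ? obj->size : 1);
        if (out) {
            memcpy(out, obj->data, obj->size);
            *size = obj->size;
        }
        return out;
    }

    /* xdelta: rebuild the base first (it may itself be a delta). */
    base = unpack(store, n, obj->base, &base_size);
    if (!base)
        return NULL;
    for (size_t i = 0; i < obj->nops; i++)
        out_size += obj->ops[i].len;
    out = malloc(out_size ? out_size : 1);
    for (size_t i = 0; out && i < obj->nops; i++) {
        const struct delta_op *op = &obj->ops[i];
        if (op->from_base) {
            if (op->off > base_size || op->len > base_size - op->off) {
                free(out);
                out = NULL;     /* delta points outside the base */
                break;
            }
            memcpy(out + pos, base + op->off, op->len);
        } else {
            memcpy(out + pos, op->lit, op->len);
        }
        pos += op->len;
    }
    free(base);
    if (out)
        *size = out_size;
    return out;
}
```

Callers that only want contents use unpack() and never learn whether a
delta was involved; only code that cares about the stored form needs
raw_unpack().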

> * Some performance impact (probably quite small).

If you limit the depth of deltas, probably not too bad.
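
One way a depth limit might be enforced when picking a delta base,
sketched under assumptions: the table layout, the function names and
the value of MAX_DELTA_DEPTH are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DELTA_DEPTH 10  /* arbitrary illustrative limit */

/* Hypothetical record of how an object is stored. */
struct stored {
    int base;   /* index of the delta base, -1 if stored whole */
};

/* Number of delta applications needed to rebuild objs[idx]. */
size_t delta_depth(const struct stored *objs, int idx)
{
    size_t depth = 0;

    assert(objs != NULL);
    while (objs[idx].base >= 0) {
        depth++;
        idx = objs[idx].base;
    }
    return depth;
}

/* A new delta against objs[idx] would sit one level deeper than it. */
int usable_as_base(const struct stored *objs, int idx)
{
    return delta_depth(objs, idx) + 1 <= MAX_DELTA_DEPTH;
}
```

Refusing bases that are already too deep bounds the work of any single
unpack to at most MAX_DELTA_DEPTH delta applications.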

> * Same blob may have different representation in two repositories (one 
> compressed, one deltified). [I am not sure this is really a bad thing...]

THIS, I think, is the real issue. fsck-cache, pull etc, which need to
know about references to other objects, would have to be able to see the
xdelta object, so that they can build up the reference graph. So you'd
need to basically make a "raw_unpack_sha1_file()" interface (doing what
the current regular unpack_sha1_file() does) for that.
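
To illustrate why the raw view matters for the reference graph, here is
a small sketch in which a pull-like walker marks an object and every
delta base it depends on. The struct and function names are
hypothetical, not git's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical raw (unresolved) view of a stored object. */
struct raw_obj {
    int is_delta;   /* 1 if stored as an xdelta object */
    int base;       /* index of the base object when is_delta is set */
};

/*
 * Mark objs[idx] and every delta base it depends on as needed.
 * Returns how many objects were newly marked. Stops at anything already
 * marked, so shared bases are visited only once.
 */
size_t mark_needed(const struct raw_obj *objs, int idx,
                   unsigned char *needed)
{
    size_t marked = 0;

    assert(objs != NULL && needed != NULL);
    while (idx >= 0 && !needed[idx]) {
        needed[idx] = 1;
        marked++;
        idx = objs[idx].is_delta ? objs[idx].base : -1;
    }
    return marked;
}
```

A walker that only saw the resolved contents would miss the base edge
and could fetch a delta without the object it is built on.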

Also, the fact is, since git saves things as separate files, you'd not win
as much as you would with some other backing store. So the second step is
to start packing the objects etc. I think there is actually a very steep
complexity edge here - not because any of the individual steps necessarily
add a whole lot, but because they all lead to the "next step".

I personally clearly feel that simplicity (and the resulting robustness)
is worth a _lot_ of disk-space.

So I think that what you suggest is likely to actually be pretty easy, but 
I'm not entirely convinced it's worth the slide into complexity.

		Linus
