From: Nicolas Pitre <nico@cam.org>
To: Dan Holmsand <holmsand@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] improved delta support for git
Date: Wed, 18 May 2005 14:41:54 -0400 (EDT)
Message-ID: <Pine.LNX.4.62.0505181428170.20274@localhost.localdomain>
In-Reply-To: <d6evrk$jv2$1@sea.gmane.org>
On Wed, 18 May 2005, Dan Holmsand wrote:
> Nicolas Pitre wrote:
> > One thing I've been wondering about is whether gzipping small deltas is
> > actually a gain. For very small files it seems that gzip is adding more
> > overhead making the compressed file actually larger. Might be worth storing
> > some deltas uncompressed if the compressed version turns out to be larger.
>
> It's probably better to skip deltafication of very small files altogether. Big
> pain for small gain, and all that.
No, that's not what I mean.
Suppose a large source file changes by only one line between two
versions. The delta may therefore end up being only a few bytes long.
Compressing a few bytes with zlib creates a _larger_ file than the
original few bytes.
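
A minimal sketch of that effect, assuming Python's standard zlib module. The one-byte "R"/"Z" flag and the helper names encode_delta and decode_delta are hypothetical, used only for illustration; they are not git's object format. The idea is simply to keep whichever form is smaller:

```python
import zlib

def encode_delta(delta: bytes) -> bytes:
    """Store the delta raw or zlib-compressed, whichever is smaller.

    The leading flag byte (b"R" = raw, b"Z" = zlib) is a made-up
    convention for this sketch only.
    """
    packed = zlib.compress(delta)
    if len(packed) < len(delta):
        return b"Z" + packed
    return b"R" + delta

def decode_delta(stored: bytes) -> bytes:
    """Reverse encode_delta() by looking at the flag byte."""
    flag, body = stored[:1], stored[1:]
    return zlib.decompress(body) if flag == b"Z" else body
```

For a delta of only a few bytes, the zlib header and checksum alone outweigh the payload, so the raw form wins; for a long, repetitive delta the compressed form wins.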
> > Well, any delta object smaller than its original object saves space, even if
> > it's 75% of the original size. But...
>
> That's not true if you want to keep the delta chain length down (and thus
> performance up).
Sure. That's why I added the -d switch to mkdelta. But if a delta is
75% of the size of its original object, then you still save 25% of the
space, regardless of the delta chain length.
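
As an illustration of the trade-off, here is a rough sketch that assumes the -d switch caps the delta chain depth (the message implies this but does not spell it out). The base_of mapping and both function names are hypothetical, not mkdelta's actual code:

```python
def chain_depth(obj, base_of):
    """Count how many deltas must be applied to reconstruct obj.

    base_of maps a delta object to the object it is a delta against;
    objects stored in full do not appear as keys.
    """
    depth = 0
    while obj in base_of:
        obj = base_of[obj]
        depth += 1
    return depth

def may_add_delta(base, base_of, max_depth):
    """True if a new delta against base would stay within max_depth."""
    return chain_depth(base, base_of) + 1 <= max_depth
```

The space saving of each delta is independent of this check: a 750-byte delta replacing a 1000-byte object saves 250 bytes at any depth. The depth limit only bounds how much work it takes to reconstruct an object.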
> But in this case, the trick is to know when to stop deltafying against one
> base file, and start over with another. If you switch to a new keyframe too
> often, you obviously lose some potential savings. But if you don't switch
> often enough, you end up repeating the same data in too many delta files.
That's why multiple combinations should be tried. And to keep things
under control, a new argument specifying the delta "distance" could
limit the number of trials.
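
One way to picture such a "distance" argument, as a sketch only: try every revision within a fixed window around the target and keep the cheapest reference. The cost function below uses Python's difflib as a stand-in for a real delta encoder, and the 8-byte cost per copy operation is an arbitrary assumption; delta_size and best_base are hypothetical names:

```python
import difflib

def delta_size(base: bytes, target: bytes) -> int:
    """Rough delta cost: literal bytes to insert plus a fixed cost per copy.

    This is only an estimate built on difflib, not a real delta format.
    """
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    copied = 0
    copy_ops = 0
    for block in matcher.get_matching_blocks():
        if block.size:  # the final block is a zero-size sentinel
            copied += block.size
            copy_ops += 1
    return (len(target) - copied) + 8 * copy_ops

def best_base(revisions, index, distance):
    """Try every revision within `distance` of `index` as a reference.

    Returns (reference_index, estimated_delta_size), or None if the
    window contains no candidate.
    """
    target = revisions[index]
    lo = max(0, index - distance)
    hi = min(len(revisions), index + distance + 1)
    best = None
    for j in range(lo, hi):
        if j == index:
            continue
        size = delta_size(revisions[j], target)
        if best is None or size < best[1]:
            best = (j, size)
    return best
```

A larger distance means more trials and potentially a better reference; a distance of 1 only considers the immediate neighbours.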
> A maximum delta size of 10% turned out to be ideal for at least the "fs"
> tree. 8% was significantly worse, as was 15%. (The ideal size depends on how
> big the average change is: the smaller the average change, the smaller the max
> delta size should be).
In fact it seems that deltas might be significantly harder to compress.
Therefore a test on the resulting file should probably be done as well
to make sure we don't end up with a delta larger than the original
object.
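
A sketch of such a test, assuming both candidates are compared after zlib compression, since that is the size that ends up on disk. The function name and the max_ratio knob are hypothetical; max_ratio merely shows where a threshold such as the 10% quoted above could be plugged in:

```python
import zlib

def worth_storing_as_delta(delta: bytes, original: bytes,
                           max_ratio: float = 1.0) -> bool:
    """Compare on-disk sizes: compressed delta versus compressed original.

    With max_ratio=1.0 the delta only has to be smaller than the full
    object; a lower value (for example 0.10) demands a bigger saving.
    """
    delta_cost = len(zlib.compress(delta))
    full_cost = len(zlib.compress(original))
    return delta_cost < full_cost * max_ratio
```

A delta that looks small before compression can still lose this comparison if it compresses poorly while the original object compresses well.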
> > ... but then the ultimate solution is to try out all possible references
> > within a given list. My git-deltafy-script already finds out the list of
> > objects belonging to the same file. Maybe git-mkdelta should try all
> > combinations between them. This way a deeper delta chain could be allowed
> > for maximum space saving.
>
> Yeah. But then you lose the ability to do incremental deltafication, or
> deltafication on-the-fly.
Not at all. Nothing prevents you from making the latest revision of a
file the reference object and turning the previous revision into a
delta against that latest revision, even if it was itself a reference
object before. The only thing that must be avoided is a delta loop, and
the current mkdelta code already takes care of that.
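
The incremental scheme and the loop check can be sketched as follows. This is an illustration under assumptions, not mkdelta's actual implementation; the base_of mapping (delta object to its reference) and both function names are hypothetical:

```python
def would_create_loop(obj, new_base, base_of):
    """True if storing obj as a delta against new_base creates a cycle.

    Walk the chain starting at new_base; if it ever reaches obj, then
    obj would end up depending on itself.
    """
    current = new_base
    while current is not None:
        if current == obj:
            return True
        current = base_of.get(current)
    return False

def rebase_on_latest(previous, latest, base_of):
    """Turn `previous` into a delta against `latest`, refusing loops."""
    if would_create_loop(previous, latest, base_of):
        raise ValueError("delta loop: %r already depends on %r"
                         % (latest, previous))
    base_of[previous] = latest
```

For example, if v1 is already a delta against v2 and a new revision v3 arrives, calling rebase_on_latest("v2", "v3", base_of) makes v2 a delta against v3 while v1 keeps pointing at v2. Trying afterwards to store v3 as a delta against v1 would be rejected as a loop.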
Nicolas