git@vger.kernel.org mailing list mirror (one of many)
From: Nicolas Pitre <nico@cam.org>
To: Dan Holmsand <holmsand@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] improved delta support for git
Date: Wed, 18 May 2005 14:41:54 -0400 (EDT)
Message-ID: <Pine.LNX.4.62.0505181428170.20274@localhost.localdomain>
In-Reply-To: <d6evrk$jv2$1@sea.gmane.org>

On Wed, 18 May 2005, Dan Holmsand wrote:

> Nicolas Pitre wrote:
> > One thing I've been wondering about is whether gzipping small deltas is
> > actually a gain.  For very small files it seems that gzip is adding more
> > overhead making the compressed file actually larger.  Might be worth storing
> > some deltas uncompressed if the compressed version turns out to be larger.
> 
> It's probably better to skip deltafication of very small files altogether. Big
> pain for small gain, and all that.

No, that's not what I mean.

Suppose a large source file changes by only one line between two 
versions.  The delta may therefore end up being only a few bytes long.
Compressing a few bytes with zlib creates a _larger_ file than the 
original few bytes.
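
To make the point concrete, here is a minimal Python sketch of the 
fallback discussed above: compress a delta with zlib, but keep the raw 
bytes when compression would make it larger.  This is not git's actual 
object format; the one-byte RAW/DEFLATED flag and the pack_delta / 
unpack_delta names are hypothetical, chosen only for illustration.

```python
import zlib

RAW, DEFLATED = 0, 1  # hypothetical one-byte storage flag, not a git format


def pack_delta(delta: bytes) -> bytes:
    """Store a delta zlib-compressed only when that actually saves space."""
    compressed = zlib.compress(delta)
    if len(compressed) < len(delta):
        return bytes([DEFLATED]) + compressed
    # zlib's fixed overhead (2-byte header, 4-byte Adler-32 trailer, block
    # framing) makes tiny inputs grow, so keep them uncompressed.
    return bytes([RAW]) + delta


def unpack_delta(stored: bytes) -> bytes:
    """Reverse pack_delta by looking at the leading flag byte."""
    flag, body = stored[0], stored[1:]
    return zlib.decompress(body) if flag == DEFLATED else body


tiny = b"\x05\x03abc"   # a few bytes, like the delta for a one-line change
big = b"x" * 1000       # highly repetitive, compresses well
print(len(tiny), len(zlib.compress(tiny)))  # the compressed form is longer
```

The flag costs one byte per delta, which is cheaper than the six or 
more bytes zlib adds to inputs that do not compress.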

> > Well, any delta object smaller than its original object saves space, even if
> > it's 75% of the original size. But...
> 
> That's not true if you want to keep the delta chain length down (and thus
> performance up).

Sure.  That's why I added the -d switch to mkdelta.  But if you can fit 
a delta that is 75% of its original object's size, you still save 25% 
of the space, regardless of the delta chain length.
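
The trade-off in this exchange can be written as a small decision 
function.  This is a sketch, assuming that the -d switch simply caps how 
many deltas deep a chain may go; the function names and the depth 
convention below are hypothetical, not mkdelta's code.

```python
from typing import Optional


def delta_depth_if_stored(object_size: int, delta_size: int,
                          base_depth: int, max_depth: Optional[int]) -> Optional[int]:
    """Return the chain depth the new delta would have, or None to store the object whole.

    base_depth is 0 when the base is a full object, 1 when the base is itself
    a delta against a full object, and so on.  max_depth=None means no cap.
    """
    if delta_size >= object_size:
        return None                      # no space saved at all
    new_depth = base_depth + 1
    if max_depth is not None and new_depth > max_depth:
        return None                      # chain would exceed the cap
    return new_depth


def space_saved(object_size: int, delta_size: int) -> float:
    """Fraction of the object's size saved by storing the delta instead."""
    return 1 - delta_size / object_size
```

A 750-byte delta for a 1000-byte object saves 25% whatever the depth; 
the depth cap only decides whether that saving is worth the extra 
reconstruction work.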

> But in this case, the trick is to know when to stop deltafying against one
> base file, and start over with another. If you switch to a new keyframe too
> often, you obviously lose some potential savings. But if you don't switch
> often enough, you end up repeating the same data in too many delta files.

That's why multiple combinations should be tried.  And to keep things 
under control, a new argument specifying the delta "distance" might 
limit the number of trials.
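
One way to read "try multiple combinations, bounded by a distance" is a 
brute-force search over the preceding revisions of one file, limited to 
a window.  The sketch below is only an illustration under assumptions: 
toy_delta_size is a hypothetical stand-in for a real delta encoder (it 
only exploits a shared prefix and suffix), and window plays the role of 
the proposed "distance" argument.

```python
from typing import List, Optional

HEADER = 8  # toy delta header: two 4-byte lengths (prefix kept, suffix kept)


def toy_delta_size(base: bytes, target: bytes) -> int:
    """Size of a toy delta that reuses base's common prefix/suffix and inserts the rest."""
    n = min(len(base), len(target))
    p = 0
    while p < n and base[p] == target[p]:
        p += 1
    s = 0
    while s < n - p and base[-1 - s] == target[-1 - s]:
        s += 1
    return HEADER + len(target) - p - s


def choose_bases(revisions: List[bytes], window: Optional[int] = None) -> List[Optional[int]]:
    """For each revision, pick the earlier revision giving the smallest delta.

    window limits how many preceding revisions are tried (None = all of them).
    A result of None means "store this revision whole".
    """
    choices: List[Optional[int]] = [None]  # the oldest revision has no candidate base
    for i in range(1, len(revisions)):
        lo = 0 if window is None else max(0, i - window)
        best, best_size = None, len(revisions[i])  # fall back to storing it whole
        for j in range(lo, i):
            size = toy_delta_size(revisions[j], revisions[i])
            if size < best_size:
                best, best_size = j, size
        choices.append(best)
    return choices


revs = [b"A" * 100, b"B" * 100, b"A" * 100 + b"C"]
print(choose_bases(revs), choose_bases(revs, window=1))
```

Here the third revision deltas well only against the first one, so a 
window of 1 misses it; a wider window finds it at the cost of more 
trials.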

> A maximum delta size of 10% turned out to be ideal for at least the "fs"
> tree. 8% was significantly worse, as was 15%. (The ideal size depends on how
> big the average change is: the smaller the average change, the smaller the max
> delta size should be).
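
Dan's keyframe heuristic can be sketched as a single pass that deltas 
each revision against the current keyframe and starts a new keyframe 
once the delta grows too large.  Assumptions: "maximum delta size" is 
taken to mean delta size relative to the full revision, and 
toy_delta_size is a hypothetical prefix/suffix delta standing in for 
the real encoder.

```python
from typing import List


def toy_delta_size(base: bytes, target: bytes) -> int:
    """Hypothetical delta size: 8-byte header plus the bytes between common prefix and suffix."""
    n = min(len(base), len(target))
    p = 0
    while p < n and base[p] == target[p]:
        p += 1
    s = 0
    while s < n - p and base[-1 - s] == target[-1 - s]:
        s += 1
    return 8 + len(target) - p - s


def plan_keyframes(revisions: List[bytes], max_ratio: float = 0.10) -> List[str]:
    """Label each revision "keyframe" (stored whole) or "delta" (against the current keyframe)."""
    plan = ["keyframe"]
    key = revisions[0]
    for rev in revisions[1:]:
        if toy_delta_size(key, rev) <= max_ratio * len(rev):
            plan.append("delta")
        else:
            key = rev  # too much changed: this revision becomes the new base
            plan.append("keyframe")
    return plan


base = bytes(range(256)) * 4        # 1024-byte first revision
r1 = bytearray(base); r1[500] ^= 0xFF   # one-byte change
r2 = bytes(1024)                    # completely rewritten
r3 = bytearray(r2); r3[10] = 7      # one-byte change to the rewrite
print(plan_keyframes([base, bytes(r1), r2, bytes(r3)]))
```

Lowering max_ratio switches keyframes sooner; raising it keeps 
deltafying against an increasingly stale base, which is the tension 
Dan describes.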

In fact it seems that deltas might be significantly harder to compress.  
Therefore a test on the resulting file should probably be done as well 
to make sure we don't end up with a delta larger than the original 
object.
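
The check suggested here can be sketched by compressing both candidates 
and keeping whichever is smaller.  This is a minimal illustration, 
assuming both the full object and the delta would be stored 
zlib-compressed; choose_storage is a hypothetical name.

```python
import zlib
from typing import Tuple


def choose_storage(full_object: bytes, delta: bytes) -> Tuple[str, bytes]:
    """Keep the delta only if, once compressed, it beats the compressed object."""
    packed_object = zlib.compress(full_object)
    packed_delta = zlib.compress(delta)
    if len(packed_delta) < len(packed_object):
        return "delta", packed_delta
    # A delta can compress worse than the object it replaces, so fall
    # back to the whole object when the delta does not win.
    return "full", packed_object


repetitive_object = b"x" * 1000
noisy_delta = bytes(range(256))        # no repeated 3-byte sequences
print(choose_storage(repetitive_object, noisy_delta)[0])
```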

> > ... but then the ultimate solution is to try out all possible references
> > within a given list.  My git-deltafy-script already finds out the list of
> > objects belonging to the same file.  Maybe git-mkdelta should try all
> > combinations between them.  This way a deeper delta chain could be allowed
> > for maximum space saving.
> 
> Yeah. But then you lose the ability to do incremental deltafication, or
> deltafication on-the-fly.

Not at all.  Nothing prevents you from making the latest revision of a 
file the reference object and turning the previous revision into a 
delta against that latest revision, even if it was itself a reference 
object before.  The only thing that must be avoided is a delta loop, and 
the current mkdelta code already takes care of that.


Nicolas

Thread overview: 17+ messages
2005-05-12  3:51 [PATCH] improved delta support for git Nicolas Pitre
2005-05-12  4:36 ` Junio C Hamano
2005-05-12 14:27   ` Chris Mason
     [not found]     ` <2cfc403205051207467755cdf@mail.gmail.com>
2005-05-12 14:47       ` Jon Seymour
2005-05-12 15:18         ` Nicolas Pitre
2005-05-12 17:16           ` Junio C Hamano
2005-05-13 11:44             ` Chris Mason
2005-05-17 18:22 ` Thomas Glanzmann
2005-05-17 19:02   ` Thomas Glanzmann
2005-05-17 19:10   ` Thomas Glanzmann
2005-05-17 21:43 ` Dan Holmsand
2005-05-18  4:32   ` Nicolas Pitre
2005-05-18  8:54     ` Dan Holmsand
2005-05-18 18:41       ` Nicolas Pitre [this message]
2005-05-18 19:32         ` Dan Holmsand
2005-05-18 15:12   ` Linus Torvalds
2005-05-18 17:15     ` Dan Holmsand
