git@vger.kernel.org mailing list mirror (one of many)
From: Sergio Callegari <scallegari@arces.unibo.it>
To: git@vger.kernel.org
Subject: Re: Git and OpenDocument (OpenOffice.org) files
Date: Mon, 27 Aug 2007 15:16:28 +0000 (UTC)
Message-ID: <loom.20070827T170518-603@post.gmane.org>
In-Reply-To: 20070827141600.GA11000@glandium.org

Mike Hommey <mh <at> glandium.org> writes:

> A zipped file will be 100% different at each revision.
> The unzipped counterpart may be similar for 90% or more between revisions.
>
> Mike

In my (modest) experience, not really.

In fact, ODF files are zip archives of many individual files (for instance,
if you have an Impress presentation, the zip archive will contain all
the images that appear in the presentation).

Now, zip differs from .tar.gz in that tar.gz first concatenates the
files and then compresses the whole stream, while zip compresses (or merely
stores) each file individually and then concatenates and indexes the results.

The consequence is that in a tar.gz file, changing a single byte in one of
the internal files can lead to a completely different compressed stream,
while in a zip file, changing an internal file only affects the corresponding
part of the archive.
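
To illustrate the difference, here is a minimal Python sketch. The member
names, the sample payloads, and the chunk-matching measure are all assumptions
made up for this example; the measure is only a crude stand-in for what a
binary delta could copy and is not git's delta algorithm.

```python
import gzip
import io
import random
import tarfile
import zipfile

# Constant timestamp, so that only the payloads differ between builds.
FIXED_TIME = (2007, 8, 27, 15, 16, 28)


def make_zip(members):
    """Build a zip in memory, deflating every member individually."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in members.items():
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, data)
    return buf.getvalue()


def make_targz(members):
    """Concatenate the members into a tar, then gzip the whole stream."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tf:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return gzip.compress(raw.getvalue(), mtime=0)


def shared_fraction(old, new, chunk=64):
    """Fraction of fixed-size chunks of `new` that occur anywhere in `old`."""
    chunks = [new[i:i + chunk] for i in range(0, len(new), chunk)]
    return sum(c in old for c in chunks) / len(chunks)


def sample_versions():
    """Two revisions: a small content.xml changes, a large member does not."""
    rng = random.Random(0)
    words = [b"alpha", b"bravo", b"charlie", b"delta", b"echo", b"foxtrot",
             b"golf", b"hotel", b"india", b"juliett", b"kilo", b"lima",
             b"mike", b"november", b"oscar", b"papa"]
    big = b" ".join(rng.choice(words) for _ in range(40000))
    v1 = {"content.xml": b"<text>Hello world</text>",
          "Pictures/big.dat": big}
    v2 = {**v1,
          "content.xml": b"<text>Hello brave new world, second revision</text>"}
    return v1, v2


v1, v2 = sample_versions()
print("zip    shared:", shared_fraction(make_zip(v1), make_zip(v2)))
print("tar.gz shared:", shared_fraction(make_targz(v1), make_targz(v2)))
```

In the zip case the unchanged member is compressed on its own, so its bytes
reappear verbatim in the second archive (merely shifted), and nearly all chunks
are found again. For the tar.gz case the result depends on the data; I would
expect a much lower figure on inputs like this one, but that is not guaranteed.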

This means that:
- if you have an ODF document containing lots of internal objects (e.g.
images) that do not change much from version to version, git can make
very good deltas;
- conversely, if you have an ODF document whose size is dominated by the
content itself (the text or cell data), git will not be able to make good
deltas.

As an example, I am finding that Impress presentations (dominated by images)
delta very well, while Calc spreadsheets (dominated by content) do not.

It would probably be nice to write a filter that takes an ODF file and
re-zips it so that the inner content.xml file is only stored, rather
than deflated.  This could then be used with the git file filtering
machinery.
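
A rough sketch of such a re-zipping step, in Python, might look like the
following. The function names store_members and main are hypothetical, and
nothing here has been tried against real OpenOffice.org output; it simply
copies every member in its original order (so a leading, stored mimetype
entry stays where it is) and writes the selected members uncompressed.

```python
import io
import sys
import zipfile


def store_members(odf_bytes, stored=("content.xml",)):
    """Return a copy of the archive with the named members stored, not deflated."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(odf_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            # Keep the original name, timestamp and attributes of each member.
            info = zipfile.ZipInfo(item.filename, date_time=item.date_time)
            info.external_attr = item.external_attr
            if item.filename in stored:
                info.compress_type = zipfile.ZIP_STORED
            else:
                info.compress_type = item.compress_type
            dst.writestr(info, data)
    return out.getvalue()


def main():
    """Filter-style entry point: archive on stdin, re-zipped archive on stdout."""
    sys.stdout.buffer.write(store_members(sys.stdin.buffer.read()))
```

A script that calls main() could then be registered as a clean filter (a
filter.<name>.clean configuration entry plus a matching .gitattributes line),
so that git sees content.xml as plain bytes inside the archive. Whether this
actually gives good deltas on real documents is something I have not measured.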

Sergio
