From: Florian Weimer <fw@deneb.enyo.de>
To: Linus Torvalds <torvalds@osdl.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@elte.hu>,
	git@vger.kernel.org
Subject: Re: Index/hash order
Date: Wed, 13 Apr 2005 23:40:05 +0200
Message-ID: <87aco2gxu2.fsf@deneb.enyo.de>
In-Reply-To: <Pine.LNX.4.58.0504131008500.4501@ppc970.osdl.org> (Linus Torvalds's message of "Wed, 13 Apr 2005 10:24:53 -0700 (PDT)")

* Linus Torvalds:

>  - I want things to distribute well. This means that it has to be based 
>    on a "append data" model, where historical data never changes, and you 
>    only append on top of it (either by adding totally new files, or by 
>    just letting the files grow).

Yes, I think this is something which can easily dominate the choice of
data structure.

>    This works in a forward-delta environment (which is fundamentally based 
>    on the notion of "we know the old version, we're adding new stuff on
>    top of it"), but does _not_ work in the backwards model of "we keep the
>    old history as a delta against the new" model.

Forward deltas don't have to be terribly inefficient.  You can get
O(log n) access to revision n fairly easily, using the trick described
here:

  <http://svn.collab.net/repos/svn/trunk/notes/skip-deltas>
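A minimal sketch of the skip-delta idea, assuming the rule from the
Subversion notes that revision n is stored as a delta against revision n
with its lowest set bit cleared.  The function names are hypothetical, and
revision 0 is assumed to be the full text (i.e. a delta against an empty
base).  Reconstructing revision n then takes popcount(n) delta
applications, never more than about log2(n) + 1, instead of n with plain
deltas-to-previous:

```python
def skip_delta_base(n: int) -> int:
    """Base revision for revision n: n with its lowest set bit cleared."""
    if n <= 0:
        raise ValueError("revision 0 is assumed to be stored as full text")
    return n & (n - 1)


def skip_delta_chain(n: int) -> list[int]:
    """Revisions whose deltas are applied, in order, on top of revision 0."""
    chain = []
    while n > 0:
        chain.append(n)
        n = skip_delta_base(n)  # each step clears one set bit
    chain.reverse()  # apply from the oldest base up to n
    return chain


def linear_chain(n: int) -> list[int]:
    """With plain deltas-to-previous, every revision 1..n must be applied."""
    return list(range(1, n + 1))


if __name__ == "__main__":
    print(skip_delta_chain(7))  # → [4, 6, 7]
    for rev in (100, 8000):
        print(rev, len(linear_chain(rev)), len(skip_delta_chain(rev)))
```

The price is storage: a skip delta spans several revisions instead of
one, so the deltas are larger than deltas-to-previous.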

I've run a few tests, just to get some numbers on the overhead
involved.  I used the last ~8,000 changesets from the BKCVS kernel
repository.  A checkout from cold cache takes about 250 seconds on my
laptop.  I don't have git numbers, but a mere copy of the kernel tree
needs 40 seconds.

For the hot-cache case, the difference is 140 seconds vs. 2.5 seconds
(or 6 seconds with decompression).

Uh-oh.  I wouldn't have imagined the difference would be *that*
dramatic.  The file system layer is *fast*.

Subversion's delta implementation is not a speed demon (it handles
arbitrarily large files, which increases complexity significantly and
slows things down, compared to simpler in-memory algorithms), but it
will be very hard to come even close to the 2.5 seconds.

On the storage front, we have 220 MB for the skip deltas vs. 106 MB
for pure deltas-to-previous vs. 1.1 GB for uncompressed files
(directories are always delta-compressed, so to speak[1]).  In the
first two cases, the first revision in the repository is deltaed
against /dev/null and itself and thus compressed, in case you think
the numbers are suspiciously low.

1. AFAICS, you can't really avoid that if you want to track file
   identity information without introducing arbitrary file IDs.
