git@vger.kernel.org mailing list mirror (one of many)
From: Florian Weimer <fw@deneb.enyo.de>
To: Linus Torvalds <torvalds@osdl.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@elte.hu>,
	git@vger.kernel.org
Subject: Re: Index/hash order
Date: Thu, 14 Apr 2005 00:48:00 +0200
Message-ID: <87br8ithsv.fsf@deneb.enyo.de>
In-Reply-To: <Pine.LNX.4.58.0504131503180.4501@ppc970.osdl.org> (Linus Torvalds's message of "Wed, 13 Apr 2005 15:11:57 -0700 (PDT)")

* Linus Torvalds:

> I will bet you that a git checkout is _faster_ than a kernel source tree
> copy. The time will be dominated by the IO costs (in particular the read
> costs), and the IO costs are lower thanks to compression. So I think that
> the cold-cache case will beat your 40 seconds by a clear margin. It
> generally compresses to half the size, so 20 seconds is not impossible
> (although seek costs would tend to stay constant, so I'd expect it to be
> somewhere in between the two).

It's indeed slightly faster (34 seconds).  The hot-cache case is about
6 seconds.  Still okay.

However, I should redo these tests with a real git.  The numbers could
be quite different because seek overhead is a bit hard to predict.
Which version should I try?

> That's actually pretty encouraging. Your 1.1GB number implies to me that a
> compressed file setup should be about half that, which in turn says that
> the cost of full-file is not at all outrageous.

I usually try to avoid the typical O(f(n)) fallacy because constant
factors do matter in practice.  But the way you put it -- maybe delta
compression isn't worth the complexity after all.  At least I'm
beginning to have doubts.

Especially since the same Subversion repository, stored by the
Berkeley DB backend, requires a whopping 1.3 GB of disk space.

> Or maybe I misunderstood what you were comparing?

My estimates only cover file data, not metadata.  Based on the
Subversion dumps, it might be possible to get some rough estimates for
the cost of storing directory information.  What is the average size
of a directory blob?  Is it true that for each tree revision, you need
to store a new directory blob for each directory which indirectly
contains a modified file?
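
To make the last question concrete, here is a minimal sketch that counts the directories which would need a new directory blob in one revision. It assumes that every directory on the path from a modified file up to the root has to be rewritten. That assumption is exactly what the question asks about, so treat the result as an upper bound for estimation, not as a description of the actual storage format. The function name and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def dirs_needing_new_blob(modified_files):
    """Return the directories that directly or indirectly contain a modified file.

    Assumption (not confirmed by the thread): each such directory needs a
    new directory blob in the new tree revision. The root is reported as ".".
    """
    dirs = set()
    for name in modified_files:
        # .parents yields every ancestor directory, ending with "." for
        # relative paths.
        for parent in PurePosixPath(name).parents:
            dirs.add(str(parent))
    return dirs


if __name__ == "__main__":
    changed = ["drivers/net/e1000/e1000_main.c", "drivers/net/Kconfig"]
    for d in sorted(dirs_needing_new_blob(changed)):
        print(d)
```

Multiplying the count per revision by an average directory blob size would give a rough metadata estimate to add to the file data figures.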

Does your 50% estimate include wasted space due to the file system
block size?
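
For reference, the block-size overhead could be estimated with a sketch like the one below. It assumes each object is stored as its own file and is rounded up to whole blocks. The 4096-byte default is an assumed value, and inode overhead and file systems with tail packing are ignored. The function names are hypothetical.

```python
def on_disk_size(file_sizes, block_size=4096):
    """Total bytes allocated when every file is rounded up to whole blocks.

    block_size=4096 is an assumption; pass the real value for the file
    system under test.
    """
    # -(-size // block_size) is ceiling division on non-negative integers.
    return sum(-(-size // block_size) * block_size for size in file_sizes)


def slack_fraction(file_sizes, block_size=4096):
    """Fraction of the allocated space that holds no file data."""
    allocated = on_disk_size(file_sizes, block_size)
    if allocated == 0:
        return 0.0
    return (allocated - sum(file_sizes)) / allocated


if __name__ == "__main__":
    sizes = [1, 4096, 4097]  # made-up compressed object sizes in bytes
    print(on_disk_size(sizes), slack_fraction(sizes))
```

Feeding in the compressed sizes of all objects would show how much of a 50% saving survives once files are rounded up to blocks.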


