git@vger.kernel.org mailing list mirror (one of many)
From: Nicolas Pitre <nico@fluxnic.net>
To: Avery Pennarun <apenwarr@gmail.com>
Cc: Jakub Narebski <jnareb@gmail.com>, Theodore Tso <tytso@mit.edu>,
	Jeff King <peff@peff.net>, Will Palmer <wmpalmer@gmail.com>,
	git@vger.kernel.org
Subject: Re: Why is "git tag --contains" so slow?
Date: Thu, 08 Jul 2010 16:13:21 -0400 (EDT)
Message-ID: <alpine.LFD.2.00.1007081559300.6020@xanadu.home>
In-Reply-To: <AANLkTikVNkObOxGQhDJ5Qau-vYn2YcomHQW2p2zsMof9@mail.gmail.com>


On Thu, 8 Jul 2010, Avery Pennarun wrote:

> On Thu, Jul 8, 2010 at 3:29 PM, Nicolas Pitre <nico@fluxnic.net> wrote:
> > I might be looking at this from my own perspective as one of the few
> > people who hacked extensively on the Git pack format from the very
> > beginning.  But I do see a way for the pack format to encode commit and
> > tree objects so that walking them would be a simple lookup in the pack
> > index file where both the SHA1 and offset in the pack for each parent
> > can be immediately retrieved.  Same for tree references.  No deflating
> > required, no binary search, just simple dereferences.  And the pack size
> > would even shrink as a side effect.
> 
> One trick that bup uses is an additional file that sits alongside the
> pack and acts as an index.  In bup's case, this is to work around
> deficiencies in the .idx file format when using ridiculously huge
> numbers of objects (hundreds of millions) across a large number of
> packfiles.  But the same concept could apply here: instead of doing
> something like rev-cache, you could just construct the "efficient"
> part of the packv4 format (which I gather is entirely related to
> commit messages), and store it alongside each pack.

No.  I want the essential information in an efficient encoding _inside_ 
the pack, actually replacing the existing encoding.  One of the goals is 
also to reduce repository size, not to grow it.
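
To make the lookup idea from the quoted paragraph above concrete, here is a minimal Python sketch. It is not the pack v4 layout: the name table, the record layout, and the short fake object names are all hypothetical. It only shows that once parents are stored as positions into a sorted name table, an ancestry walk (the kind "git tag --contains" needs) becomes plain indexing with no deflating and no binary search.

```python
from collections import deque

# Hypothetical stand-in for the pack index: object names kept sorted,
# as they are in a .idx file. Short fake names replace real SHA-1s.
sorted_ids = ["1a", "4c", "9f", "b2", "e7"]

# Hypothetical compact commit records, keyed by index position.
# Parents are stored as positions into sorted_ids, not as hex names.
parents_by_pos = {
    0: [],      # 1a: root commit
    1: [0],     # 4c -> 1a
    2: [1],     # 9f -> 4c
    3: [1],     # b2 -> 4c
    4: [2, 3],  # e7: merge of 9f and b2
}

def ancestors(start_pos):
    """Return every position reachable from start_pos, itself included."""
    seen = {start_pos}
    queue = deque([start_pos])
    while queue:
        pos = queue.popleft()
        # Plain dereference: no inflate, no search by object name.
        for parent in parents_by_pos[pos]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

def contains(tip_pos, commit_pos):
    """True if the commit at commit_pos is reachable from the tip at tip_pos."""
    return commit_pos in ancestors(tip_pos)
```

With this toy data, contains(4, 0) is true (the merge reaches the root), while contains(2, 3) is false because 9f and b2 are siblings.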

> This would allow people to incrementally modify git to use the new,
> efficient commit object storage, without breaking backward
> compatibility with earlier versions of git.  (Just as bup can index
> huge numbers of packed objects but still stores them in the plain git
> pack format.)

Initially, what I'm aiming for is for pack-objects to produce the new 
format, for index-pack to grok it, and for sha1_file:unpack_entry() to 
simply regenerate the canonical object format whenever a pack v4 object 
is encountered.  Also pack-objects would be able to revert the object 
encoding to the current format on the fly when it is serving a fetch 
request to a client which is not pack v4 aware, just like we do now with 
the ofs-delta capability.
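
A rough Python sketch of that first stage, under stated assumptions: the compact record, the helper names (to_compact, from_compact, entry_for_client) and the "pack-v4" capability string are invented for illustration and are not Git APIs. The point is only that a compact entry can be expanded back to the canonical bytes, so the object name is unchanged, and that the sender can fall back to the canonical form for a client that does not advertise the new capability.

```python
import hashlib

def object_name(obj_type, body):
    """Git object name: SHA-1 of '<type> <size>', a NUL byte, then the body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

# Hypothetical sorted name table standing in for the pack index.
names = sorted(hashlib.sha1(seed).hexdigest() for seed in (b"tree", b"p1", b"p2"))

def to_compact(body):
    """Replace hex names on the tree/parent lines with table positions."""
    head, rest = body.decode().split("\nauthor ", 1)
    refs = []
    for line in head.split("\n"):
        key, hex_name = line.split(" ")
        refs.append((key, names.index(hex_name)))  # linear search, fine for a toy
    return {"refs": refs, "rest": "author " + rest}

def from_compact(compact):
    """Regenerate the canonical commit bytes from the compact record."""
    head = "\n".join(f"{key} {names[pos]}" for key, pos in compact["refs"])
    return (head + "\n" + compact["rest"]).encode()

def entry_for_client(compact, client_caps):
    """Send the compact form only if the client advertises it (made-up name)."""
    if "pack-v4" in client_caps:
        return ("v4", compact)
    return ("canonical", from_compact(compact))

# A small example commit body built from the fake names above.
sample = (
    f"tree {names[0]}\nparent {names[1]}\nparent {names[2]}\n"
    "author A U Thor <author@example.com> 1278620001 -0400\n"
    "committer A U Thor <author@example.com> 1278620001 -0400\n"
    "\nExample merge\n"
).encode()
```

Round-tripping sample through to_compact and from_compact gives back the same bytes, so object_name("commit", ...) is identical before and after.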

Once that stage is reached, I'll submit the lot and hope that other 
people will help incrementally convert parts of Git to benefit from 
native access to the pack v4 data.  The tree object walk code would be 
the first obvious candidate.  And so on.


Nicolas


Thread overview: 46+ messages
2010-07-01  0:54 Why is "git tag --contains" so slow? Theodore Ts'o
2010-07-01  0:58 ` Shawn O. Pearce
2010-07-03 23:27   ` Sam Vilain
2010-07-01  1:00 ` Avery Pennarun
2010-07-01 12:17   ` tytso
2010-07-01 15:03     ` Jeff King
2010-07-01 15:38       ` Jeff King
2010-07-02 19:26         ` tytso
2010-07-03  8:06           ` Jeff King
2010-07-04  0:55             ` tytso
2010-07-05 12:27               ` Jeff King
2010-07-05 12:33                 ` [RFC/PATCH 1/4] tag: speed up --contains calculation Jeff King
2010-10-13 22:07                   ` Jonathan Nieder
2010-10-13 22:56                   ` Clemens Buchacher
2011-02-23 15:51                   ` Ævar Arnfjörð Bjarmason
2011-02-23 16:39                     ` Jeff King
2010-07-05 12:34                 ` [RFC/PATCH 2/4] limit "contains" traversals based on commit timestamp Jeff King
2010-10-13 23:21                   ` Jonathan Nieder
2010-07-05 12:35                 ` [RFC/PATCH 3/4] default core.clockskew variable to one day Jeff King
2010-07-05 12:36                 ` [RFC/PATCH 4/4] name-rev: respect core.clockskew Jeff King
2010-07-05 12:39                 ` Why is "git tag --contains" so slow? Jeff King
2010-10-14 18:59                   ` Jonathan Nieder
2010-10-16 14:32                     ` Clemens Buchacher
2010-10-27 17:11                       ` Jeff King
2010-10-28  8:07                         ` Clemens Buchacher
2010-07-05 14:10                 ` tytso
2010-07-06 11:58                   ` Jeff King
2010-07-06 15:31                     ` Will Palmer
2010-07-06 16:53                       ` tytso
2010-07-08 11:28                         ` Jeff King
2010-07-08 13:21                           ` Will Palmer
2010-07-08 13:54                             ` tytso
2010-07-07 17:45                       ` Jeff King
2010-07-08 10:29                         ` Theodore Tso
2010-07-08 11:12                           ` Jakub Narebski
2010-07-08 19:29                             ` Nicolas Pitre
2010-07-08 19:39                               ` Avery Pennarun
2010-07-08 20:13                                 ` Nicolas Pitre [this message]
2010-07-08 21:20                                   ` Jakub Narebski
2010-07-08 21:30                                     ` Sverre Rabbelier
2010-07-08 23:10                                       ` Nicolas Pitre
2010-07-08 23:15                                     ` Nicolas Pitre
2010-07-08 11:31                           ` Jeff King
2010-07-08 14:35                           ` Johan Herland
2010-07-08 19:06                           ` Nicolas Pitre
2010-07-07 17:50                       ` Jeff King
