git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Shawn Pearce <spearce@spearce.org>
Cc: Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	git <git@vger.kernel.org>, Colby Ranger <cranger@google.com>
Subject: Re: Commit cache to speed up rev-list and merge
Date: Thu, 27 Sep 2012 14:32:44 -0400
Message-ID: <20120927183244.GB2519@sigill.intra.peff.net>
In-Reply-To: <CAJo=hJtus46UGyTcnfTDArp=RkK-P24wO8pjhEY7qAmssyxgVA@mail.gmail.com>

On Thu, Sep 27, 2012 at 10:45:32AM -0700, Shawn O. Pearce wrote:

> On  2012-08-12 Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> > Long term we might gain slight lookup speedup if we know object type
> > as search region is made smaller. But for that to happen, we need to
> > propagate object type hint down to find_pack_entry_one() and friends.
> > Possible thing to do, I think.
> 
> I'm not sure reclustering the index by object type is going to make a
> worthwhile difference. Of 2.2m objects in the Linux tree, 320k are
> commits. The difference between doing the binary search through all
> objects vs. just commits is only 2 iterations more of binary search if
> we assume the per-type ranges have their own fan-out tables.

To me the big win would be implicit indexing for items that are present
for every instance of a particular object type. So if we wanted to keep
the timestamp for every commit, you could have a "pack-*.timestamps"
that is literally just a packed list of uint32's, one per commit, where
the position of a commit's timestamp in the list is the same as its
position in the index of sha1s in the pack index.
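
A minimal sketch of the implicit-indexing idea, assuming a commit-only index held as a sorted list of 20-byte ids. The function names and the big-endian uint32 layout are illustrative assumptions, not an existing git file format. The point it shows is that the side file stores no keys at all: a commit's position in the sorted index is its slot number.

```python
import bisect
import struct

def build_timestamps(sorted_ids, timestamp_of):
    """Pack one big-endian uint32 per commit, in the same order as the index."""
    return b"".join(struct.pack(">I", timestamp_of[cid]) for cid in sorted_ids)

def lookup_timestamp(sorted_ids, packed, commit_id):
    """Binary-search the index, then read the slot at the same position."""
    pos = bisect.bisect_left(sorted_ids, commit_id)
    if pos == len(sorted_ids) or sorted_ids[pos] != commit_id:
        raise KeyError(commit_id)
    (ts,) = struct.unpack_from(">I", packed, 4 * pos)
    return ts

# Hypothetical data: three fake 20-byte ids with arbitrary timestamps.
timestamp_of = {
    b"\x0a" * 20: 1348700000,
    b"\x01" * 20: 1348770764,
    b"\xff" * 20: 1,
}
sorted_ids = sorted(timestamp_of)
packed = build_timestamps(sorted_ids, timestamp_of)
```

The packed list costs exactly 4 bytes per commit, and the only search needed is the one already done against the index.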

That's simple to do if your index is just commits. But if it includes
all objects, then your list is sparse. So either you waste space by
making an empty slot for the non-commit objects, or you have an extra
level of indirection mapping the commit into the packed list, which is
going to double the storage in this case (though you could reuse that
extra mapping for the parent, generation number, etc., so it at least
gets amortized as you store more data). Or is there some clever solution
I'm missing?
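
The trade-off above can be put into rough numbers, using the object counts quoted earlier in this message (2.2m objects, 320k commits). This is a back-of-the-envelope sketch: the 4-byte slot size matches the uint32 example, and the indirection layout (one extra uint32 per commit) is an assumption about how such a mapping might be stored, not a described format.

```python
TOTAL_OBJECTS = 2_200_000  # objects in the pack, per the quoted figure
COMMITS = 320_000          # commits among them, per the quoted figure
SLOT = 4                   # bytes per uint32 entry

def dense_bytes(fields=1):
    """One slot per object; slots for non-commit objects are wasted."""
    return TOTAL_OBJECTS * SLOT * fields

def commit_only_bytes(fields=1):
    """One slot per commit; only possible with a commit-only index."""
    return COMMITS * SLOT * fields

def indirect_bytes(fields=1):
    """One mapping entry per commit, plus one slot per commit per field."""
    return COMMITS * SLOT * (fields + 1)
```

With a single field, the indirect layout is twice the size of the commit-only one (2.56 MB versus 1.28 MB), while the dense layout is 8.8 MB. With three fields sharing the same mapping, the overhead of indirection drops to one third, which is the amortization mentioned above.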

For your extension, I don't think it matters. You're sparse even in the
commit-object space, so you have to store the mapping anyway. And your
data is big enough that the overhead isn't too painful.

-Peff
