git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Shawn Pearce <spearce@spearce.org>
Cc: git <git@vger.kernel.org>
Subject: Re: expanding pack idx fanout table
Date: Tue, 9 Jul 2013 00:19:13 -0400	[thread overview]
Message-ID: <20130709041913.GB27903@sigill.intra.peff.net> (raw)
In-Reply-To: <CAJo=hJsto1ik=GTC8c3+2_jBuUqcAPL0UWp-1uoYYMpgbLB+qg@mail.gmail.com>

On Mon, Jul 08, 2013 at 08:54:24AM -0700, Shawn O. Pearce wrote:

> Has anyone studied the impact of converting the pack idx fanout table
> from 256 entries to 65536 entries?
> 
> Back of the envelope estimates for 3.1M objects in linux.git suggests
> a 2^16 fanout table would decrease the number of binary search
> iterations from ~14 to ~6. The increased table costs an extra 255 KiB
> of disk. On a 70M idx file this is noise.

No, I hadn't considered it, but I think your analysis is correct that it
would decrease the lookup cost. Having run "perf" on git a lot of times,
I do see find_pack_entry_one turn up as a noticeable hot spot, but
usually not more than a few percent.

Which kind of makes sense, because of the way we cache objects in
memory. If you are doing something like `rev-list --objects`, you are
going to do O(n) packfile lookups, but you end up doing _way_ more
lookups in the object hash. Not just one per tree entry, but one per
tree entry for each commit in which the entry was untouched.

I tried (maybe a year or so ago) to make the object hash faster using a
similar fan-out trick, but I don't remember it having any real impact
(because we keep our hash tables relatively collision free already). I
think I also tried a full prefix-trie, with no luck.

Obviously the exact results would depend on your workload, but in
general I'd expect the object hash to take the brunt of the load for
repeated lookups, and for non-repeated lookups to be dominated by the
time to actually access the object, as opposed to saving a few hashcmps.
That may change as we start using the pack .idx for more clever things
than simply accessing the objects, though (e.g., it might make a
difference when coupled with my commit cache patches).

-Peff

      parent reply	other threads:[~2013-07-09  4:19 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-08 15:54 expanding pack idx fanout table Shawn Pearce
2013-07-08 17:37 ` Junio C Hamano
2013-07-08 17:58   ` Shawn Pearce
2013-07-09  4:19 ` Jeff King [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130709041913.GB27903@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).