git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Shawn Pearce <spearce@spearce.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: git <git@vger.kernel.org>
Subject: Re: expanding pack idx fanout table
Date: Mon, 8 Jul 2013 10:58:28 -0700	[thread overview]
Message-ID: <CAJo=hJu2WgVfWhBGozxgPf9QT=9TydP55-Hqjfis6bhMLEPQag@mail.gmail.com> (raw)
In-Reply-To: <7vsizpymuy.fsf@alter.siamese.dyndns.org>

On Mon, Jul 8, 2013 at 10:37 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Shawn Pearce <spearce@spearce.org> writes:
>
>> Has anyone studied the impact of converting the pack idx fanout table
>> from 256 entries to 65536 entries?
>>
>> Back of the envelope estimates for 3.1M objects in linux.git suggests
>> a 2^16 fanout table would decrease the number of binary search
>> iterations from ~14 to ~6. The increased table costs an extra 255 KiB
>> of disk. On a 70M idx file this is noise.
>>
>> I'm starting to wonder if increasing the fanout table once the object
>> count is above a certain threshold is a reasonable optimization for
>> larger repositories.
>
> Yeah, and I do not think we have to be worried too much about
> backward compatibility for .idx files, as they are local and can be
> regenerated if an older version cannot read it.
>
> I also wonder if we can generate a finer-grained fan-out table on
> the fly, perhaps lazily, without changing the on-disk format ;-)

Heh. Yes. The reader at runtime could expand the 256 fanout table to a
65536 fanout table without changing the on disk format. Unfortunately
doing the expansion will require O(N) time as the 2nd byte of each
SHA-1 must be examined from every bucket. If you are going to perform
O(N) lookups this expansion might save time as it lowers the "log N"
bound of the O(N log N) algorithm for N lookups. It doesn't help
enough when doing only N/20 lookups.

  reply	other threads:[~2013-07-08 17:58 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-08 15:54 expanding pack idx fanout table Shawn Pearce
2013-07-08 17:37 ` Junio C Hamano
2013-07-08 17:58   ` Shawn Pearce [this message]
2013-07-09  4:19 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJo=hJu2WgVfWhBGozxgPf9QT=9TydP55-Hqjfis6bhMLEPQag@mail.gmail.com' \
    --to=spearce@spearce.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).