From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: git@vger.kernel.org, sbeller@google.com, dstolee@microsoft.com,
jrnieder@gmail.com, jonathantanmy@google.com,
mfick@codeaurora.org
Subject: Re: [PATCH 00/23] Multi-pack-index (MIDX)
Date: Thu, 07 Jun 2018 16:45:18 +0200 [thread overview]
Message-ID: <877enazmdd.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <20180607140338.32440-1-dstolee@microsoft.com>
On Thu, Jun 07 2018, Derrick Stolee wrote:
> To test the performance in this situation, I created a
> script that organizes the Linux repository in a similar
> fashion. I split the commit history into 50 parts by
> creating branches on every 10,000 commits of the first-
> parent history. Then, `git rev-list --objects A ^B`
> provides the list of objects reachable from A but not B,
> so I could send that to `git pack-objects` to create
> these "time-based" packfiles. With these 50 packfiles
> (deleting the old one from my fresh clone, and deleting
> all tags as they were no longer on-disk) I could then
> test 'git rev-list --objects HEAD^{tree}' and see:
>
> Before: 0.17s
> After: 0.13s
> % Diff: -23.5%
>
> By adding logic to count hits and misses to bsearch_pack,
> I was able to see that the command above calls that
> method 266,930 times with a hit rate of 33%. The MIDX
> has the same number of calls with a 100% hit rate.
Do you have the script you used for this? It would be very interesting
as something we could stick in t/perf/ to test this use-case in the
future.
How does this & the numbers below compare to just a naïve
--max-pack-size=<similar size> on linux.git?
Is it possible for you to tar this test repo up and share it as a
one-off? I've been polishing the core.validateAbbrev series I have, and
it would be interesting to compare some of the (abbrev) numbers.
> Abbreviation Speedups
> ---------------------
>
> To fully disambiguate an abbreviation, we must iterate
> through all packfiles to ensure no collision exists in
> any packfile. This requires O(P log N) time. With the
> MIDX, this is only O(log N) time. Our standard test [2]
> is 'git log --oneline --parents --raw' because it writes
> many abbreviations while also doing a lot of other work
> (walking commits and trees to compute the raw diff).
>
> For a copy of the Linux repository with 50 packfiles
> split by time, we observed the following:
>
> Before: 100.5 s
> After: 58.2 s
> % Diff: -59.7%
>
>
> Request for Review Attention
> ----------------------------
>
> I tried my best to take the feedback from the commit-graph
> feature and apply it to this feature. I also worked to
> follow the object-store refactoring as I could. I also have
> some local commits that create a 'verify' subcommand and
> integrate with 'fsck' similar to the commit-graph, but I'll
> leave those for a later series (and review is still underway
> for that part of the commit-graph).
>
> One place where I could use some guidance is related to the
> current state of 'the_hash_algo' patches. The file format
> allows a different "hash version" which then indicates the
> length of the hash. What's the best way to ensure this
> feature doesn't cause extra pain in the hash-agnostic series?
> This will inform how I go back and make the commit-graph
> feature better in this area, too.
>
>
> Thanks,
> -Stolee
next prev parent reply other threads:[~2018-06-07 14:45 UTC|newest]
Thread overview: 192+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-06-07 14:03 [PATCH 00/23] Multi-pack-index (MIDX) Derrick Stolee
2018-06-07 14:03 ` [PATCH 01/23] midx: add design document Derrick Stolee
2018-06-11 19:04 ` Stefan Beller
2018-06-18 18:48 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 02/23] midx: add midx format details to pack-format.txt Derrick Stolee
2018-06-11 19:19 ` Stefan Beller
2018-06-18 19:01 ` Derrick Stolee
2018-06-18 19:41 ` Stefan Beller
2018-06-07 14:03 ` [PATCH 03/23] midx: add midx builtin Derrick Stolee
2018-06-07 17:20 ` Duy Nguyen
2018-06-18 19:23 ` Derrick Stolee
2018-06-11 21:02 ` Stefan Beller
2018-06-18 19:40 ` Derrick Stolee
2018-06-18 19:55 ` Stefan Beller
2018-06-18 19:58 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 04/23] midx: add 'write' subcommand and basic wiring Derrick Stolee
2018-06-07 17:27 ` Duy Nguyen
2018-06-07 14:03 ` [PATCH 05/23] midx: write header information to lockfile Derrick Stolee
2018-06-07 17:35 ` Duy Nguyen
2018-06-12 15:00 ` Duy Nguyen
2018-06-19 12:54 ` Derrick Stolee
2018-06-19 14:59 ` Duy Nguyen
2018-06-19 15:24 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 06/23] midx: struct midxed_git and 'read' subcommand Derrick Stolee
2018-06-07 17:54 ` Duy Nguyen
2018-06-20 13:13 ` Derrick Stolee
2018-06-07 18:31 ` Duy Nguyen
2018-06-20 13:33 ` Derrick Stolee
2018-06-20 15:07 ` Duy Nguyen
2018-06-20 16:39 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 07/23] midx: expand test data Derrick Stolee
2018-06-07 14:03 ` [PATCH 08/23] midx: read packfiles from pack directory Derrick Stolee
2018-06-07 18:03 ` Duy Nguyen
2018-06-20 16:33 ` [PATCH] packfile: generalize pack directory list Derrick Stolee
2018-06-07 14:03 ` [PATCH 09/23] midx: write pack names in chunk Derrick Stolee
2018-06-07 18:26 ` Duy Nguyen
2018-06-21 15:25 ` Derrick Stolee
2018-06-21 17:38 ` Junio C Hamano
2018-06-22 18:25 ` Derrick Stolee
2018-06-22 18:31 ` Junio C Hamano
2018-06-22 18:32 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 10/23] midx: write a lookup into the pack names chunk Derrick Stolee
2018-06-09 16:43 ` Duy Nguyen
2018-06-21 17:23 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 11/23] midx: sort and deduplicate objects from packfiles Derrick Stolee
2018-06-09 17:07 ` Duy Nguyen
2018-06-21 17:54 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 12/23] midx: write object ids in a chunk Derrick Stolee
2018-06-09 17:25 ` Duy Nguyen
2018-06-07 14:03 ` [PATCH 13/23] midx: write object id fanout chunk Derrick Stolee
2018-06-09 17:28 ` Duy Nguyen
2018-06-21 19:49 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 14/23] midx: write object offsets Derrick Stolee
2018-06-09 17:41 ` Duy Nguyen
2018-06-07 14:03 ` [PATCH 15/23] midx: create core.midx config setting Derrick Stolee
2018-06-07 14:03 ` [PATCH 16/23] midx: prepare midxed_git struct Derrick Stolee
2018-06-09 17:47 ` Duy Nguyen
2018-06-07 14:03 ` [PATCH 17/23] midx: read objects from multi-pack-index Derrick Stolee
2018-06-09 17:56 ` Duy Nguyen
2018-06-21 20:03 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 18/23] midx: use midx in abbreviation calculations Derrick Stolee
2018-06-09 18:01 ` Duy Nguyen
2018-06-22 18:38 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 19/23] midx: use existing midx when writing new one Derrick Stolee
2018-06-07 14:03 ` [PATCH 20/23] midx: use midx in approximate_object_count Derrick Stolee
2018-06-09 18:03 ` Duy Nguyen
2018-06-22 18:39 ` Derrick Stolee
2018-06-07 14:03 ` [PATCH 21/23] midx: prevent duplicate packfile loads Derrick Stolee
2018-06-09 18:05 ` Duy Nguyen
2018-06-07 14:03 ` [PATCH 22/23] midx: use midx to find ref-deltas Derrick Stolee
2018-06-07 14:03 ` [PATCH 23/23] midx: clear midx on repack Derrick Stolee
2018-06-09 18:13 ` Duy Nguyen
2018-06-22 18:44 ` Derrick Stolee
2018-06-07 14:06 ` [PATCH 00/23] Multi-pack-index (MIDX) Derrick Stolee
2018-06-07 14:45 ` Ævar Arnfjörð Bjarmason [this message]
2018-06-07 14:54 ` Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 00/24] " Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 01/24] multi-pack-index: add design document Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 02/24] multi-pack-index: add format details Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 03/24] multi-pack-index: add builtin Derrick Stolee
2018-06-25 19:15 ` Junio C Hamano
2018-06-25 14:34 ` [PATCH v2 04/24] multi-pack-index: add 'write' verb Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 05/24] midx: write header information to lockfile Derrick Stolee
2018-06-25 19:19 ` Junio C Hamano
2018-07-05 19:13 ` Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 06/24] multi-pack-index: load into memory Derrick Stolee
2018-06-25 19:38 ` Junio C Hamano
2018-07-05 14:19 ` Derrick Stolee
2018-07-05 18:58 ` Eric Sunshine
2018-07-06 19:20 ` Junio C Hamano
2018-06-25 14:34 ` [PATCH v2 07/24] multi-pack-index: expand test data Derrick Stolee
2018-06-25 19:45 ` Junio C Hamano
2018-06-25 14:34 ` [PATCH v2 08/24] packfile: generalize pack directory list Derrick Stolee
2018-06-25 19:57 ` Junio C Hamano
2018-06-25 14:34 ` [PATCH v2 09/24] multi-pack-index: read packfile list Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 10/24] multi-pack-index: write pack names in chunk Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 11/24] midx: read pack names into array Derrick Stolee
2018-06-25 23:52 ` Eric Sunshine
2018-06-25 14:34 ` [PATCH v2 12/24] midx: sort and deduplicate objects from packfiles Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 13/24] midx: write object ids in a chunk Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 14/24] midx: write object id fanout chunk Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 15/24] midx: write object offsets Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 16/24] config: create core.multiPackIndex setting Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 17/24] midx: prepare midxed_git struct Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 18/24] midx: read objects from multi-pack-index Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 19/24] midx: use midx in abbreviation calculations Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 20/24] midx: use existing midx when writing new one Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 21/24] midx: use midx in approximate_object_count Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 22/24] midx: prevent duplicate packfile loads Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 23/24] packfile: skip loading index if in multi-pack-index Derrick Stolee
2018-06-25 14:34 ` [PATCH v2 24/24] midx: clear midx on repack Derrick Stolee
2018-07-06 0:52 ` [PATCH v3 00/24] Multi-pack-index (MIDX) Derrick Stolee
2018-07-06 0:52 ` [PATCH v3 01/24] multi-pack-index: add design document Derrick Stolee
2018-07-06 0:52 ` [PATCH v3 02/24] multi-pack-index: add format details Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 03/24] multi-pack-index: add builtin Derrick Stolee
2018-07-06 3:54 ` Eric Sunshine
2018-07-06 0:53 ` [PATCH v3 04/24] multi-pack-index: add 'write' verb Derrick Stolee
2018-07-06 4:07 ` Eric Sunshine
2018-07-06 0:53 ` [PATCH v3 05/24] midx: write header information to lockfile Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 06/24] multi-pack-index: load into memory Derrick Stolee
2018-07-06 4:19 ` Eric Sunshine
2018-07-06 5:18 ` Eric Sunshine
2018-07-09 19:08 ` Junio C Hamano
2018-07-12 16:06 ` Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 07/24] multi-pack-index: expand test data Derrick Stolee
2018-07-06 4:36 ` Eric Sunshine
2018-07-06 5:20 ` Eric Sunshine
2018-07-12 14:10 ` Derrick Stolee
2018-07-12 18:02 ` Eric Sunshine
2018-07-12 18:06 ` Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 08/24] packfile: generalize pack directory list Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 09/24] multi-pack-index: read packfile list Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 10/24] multi-pack-index: write pack names in chunk Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 11/24] midx: read pack names into array Derrick Stolee
2018-07-06 4:58 ` Eric Sunshine
2018-07-06 0:53 ` [PATCH v3 12/24] midx: sort and deduplicate objects from packfiles Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 13/24] midx: write object ids in a chunk Derrick Stolee
2018-07-06 5:04 ` Eric Sunshine
2018-07-06 0:53 ` [PATCH v3 14/24] midx: write object id fanout chunk Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 15/24] midx: write object offsets Derrick Stolee
2018-07-06 5:27 ` Eric Sunshine
2018-07-12 16:33 ` Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 16/24] config: create core.multiPackIndex setting Derrick Stolee
2018-07-06 5:39 ` Eric Sunshine
2018-07-12 13:19 ` Derrick Stolee
2018-07-12 16:30 ` Derrick Stolee
2018-07-11 9:48 ` SZEDER Gábor
2018-07-12 13:01 ` Derrick Stolee
2018-07-12 13:31 ` SZEDER Gábor
2018-07-12 15:40 ` Derrick Stolee
2018-07-12 17:29 ` Junio C Hamano
2018-07-06 0:53 ` [PATCH v3 17/24] midx: prepare midxed_git struct Derrick Stolee
2018-07-06 5:41 ` Eric Sunshine
2018-07-06 0:53 ` [PATCH v3 18/24] midx: read objects from multi-pack-index Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 19/24] midx: use midx in abbreviation calculations Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 20/24] midx: use existing midx when writing new one Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 21/24] midx: use midx in approximate_object_count Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 22/24] midx: prevent duplicate packfile loads Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 23/24] packfile: skip loading index if in multi-pack-index Derrick Stolee
2018-07-06 0:53 ` [PATCH v3 24/24] midx: clear midx on repack Derrick Stolee
2018-07-06 5:52 ` Eric Sunshine
2018-07-12 19:39 ` [PATCH v4 00/23] Multi-pack-index (MIDX) Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 01/23] multi-pack-index: add design document Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 02/23] multi-pack-index: add format details Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 03/23] multi-pack-index: add builtin Derrick Stolee
2018-07-20 18:22 ` Junio C Hamano
2018-07-20 22:15 ` brian m. carlson
2018-07-20 22:28 ` Junio C Hamano
2018-07-12 19:39 ` [PATCH v4 04/23] multi-pack-index: add 'write' verb Derrick Stolee
2018-07-12 22:56 ` Eric Sunshine
2018-07-12 19:39 ` [PATCH v4 05/23] midx: write header information to lockfile Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 06/23] multi-pack-index: load into memory Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 07/23] t5319: expand test data Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 08/23] packfile: generalize pack directory list Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 09/23] multi-pack-index: read packfile list Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 10/23] multi-pack-index: write pack names in chunk Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 11/23] midx: read pack names into array Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 12/23] midx: sort and deduplicate objects from packfiles Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 13/23] midx: write object ids in a chunk Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 14/23] midx: write object id fanout chunk Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 15/23] midx: write object offsets Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 16/23] config: create core.multiPackIndex setting Derrick Stolee
2018-07-12 21:05 ` Junio C Hamano
2018-07-13 0:50 ` Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 17/23] midx: read objects from multi-pack-index Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 18/23] midx: use midx in abbreviation calculations Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 19/23] midx: use existing midx when writing new one Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 20/23] midx: use midx in approximate_object_count Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 21/23] midx: prevent duplicate packfile loads Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 22/23] packfile: skip loading index if in multi-pack-index Derrick Stolee
2018-07-12 19:39 ` [PATCH v4 23/23] midx: clear midx on repack Derrick Stolee
2018-07-12 21:11 ` [PATCH v4 00/23] Multi-pack-index (MIDX) Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=877enazmdd.fsf@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=jonathantanmy@google.com \
--cc=jrnieder@gmail.com \
--cc=mfick@codeaurora.org \
--cc=sbeller@google.com \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).