From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
"Garima Singh" <garima.singh@microsoft.com>,
"Derrick Stolee" <stolee@gmail.com>,
"Jakub Narebski" <jnareb@gmail.com>, "Jeff King" <peff@peff.net>,
"Taylor Blau" <me@ttaylorr.com>,
"SZEDER Gábor" <szeder.dev@gmail.com>
Subject: [PATCH 33/34] commit-graph: write modified path Bloom filters in "history order"
Date: Fri, 29 May 2020 10:50:37 +0200 [thread overview]
Message-ID: <20200529085038.26008-34-szeder.dev@gmail.com> (raw)
In-Reply-To: <20200529085038.26008-1-szeder.dev@gmail.com>
[Explain! Try topo order!]
I don't have recent benchmark results at hand yet, but as far as I can
recall some old results this reduces the time spent loading and
checking modified path Bloom filters by another ~10%. Checking Bloom
filters is of course only a small part of the whole runtime of a
pathspec-limited revision walk, so the overall improvement is only
about 1-2%.
Oh, well, I expected more from this change... But perhaps it has
larger impact with less embedded modified path Bloom filters?
This is the last patch in this series that improves pathspec-limited
revision walk performance, so it's time to compare average runtime and
memory usage of
git rev-list HEAD -- "$path"
for 5000+ randomly selected paths from each repository with and
without modified path Bloom filters:
Average runtime Average | Average max RSS
without with speedup | without with
---------------------------------------------+----------------------------
android-base 0.8780s 0.1456s 6.03x | 387MB 91.2MB -76.5%
cmssw 0.3143s 0.0313s 10.04x | 181MB 37.4MB -79.4%
cpython 0.7453s 0.0810s 9.20x | 159MB 43.8MB -72.5%
elasticsearch 0.1492s 0.0136s 10.95x | 134MB 21.8MB -83.7%
gcc 7.1852s 0.2114s 34.00x | 297MB 91.1MB -69.3%
gecko-dev 4.6113s 0.4815s 9.58x | 832MB 175.2MB -79.0%
git 0.6180s 0.0310s 19.94x | 131MB 31.5MB -76.0%
glibc 0.5618s 0.0282s 19.92x | 135MB 25.0MB -81.6%
go 0.4913s 0.0403s 12.19x | 130MB 24.7MB -81.1%
jdk 0.0482s 0.0068s 7.09x | 52MB 27.1MB -48.2%
linux 0.7043s 0.0873s 8.08x | 438MB 150.9MB -65.5%
llvm-project 2.6844s 0.4135s 6.49x | 384MB 126.5MB -67.1%
rails 0.2784s 0.0391s 7.12x | 88MB 28.8MB -67.3%
rust 0.7757s 0.0439s 17.67x | 344MB 42.0MB -87.8%
tensorflow 0.6258s 0.0487s 12.85x | 233MB 36.7MB -84.2%
webkit 1.9137s 0.2420s 7.91x | 940MB 84.7MB -91.0%
The astute reader may notice that in several cases the achieved
speedup is a bit higher than the calculated potential speedup shown in
the log message of the patch adding the modified path Bloom filter
specs, e.g. 34x vs. 29.7x in the gcc repository. I suspect that by
eliminating much of the tree-diff overhead we load a lot less (tree)
objects, putting less strain on the caches, and in turn making other
parts of the process faster as well.
Alas, this is not without extra cost: writing the commit-graph file
with modified path Bloom filters takes a lot longer and the resulting
commit-graph file is considerably larger:
Writing a commit-graph file from |
scratch with '--reachable' | commit-graph file size
without with | without with
----------------------------------------------+--------------------------------
android-base 6.230s 40.880s 6.56x | 25077512 31278605 +24.7%
cmssw 3.177s 25.691s 8.09x | 13017516 23971497 +84.1%
cpython 1.344s 8.951s 6.66x | 6808068 8152146 +19.7%
gcc 2.886s 36.917s 12.79x | 12839660 18068286 +40.7%
gecko-dev 9.331s 97.729s 10.47x | 43335208 66568549 +53.6%
git 0.750s 5.245s 6.99x | 3388208 4011595 +18.3%
glibc 0.565s 4.146s 7.34x | 2832460 3936588 +38.9%
homebrew-cask 1.083s 27.024s 24.95x | 5928644 6859160 +15.7%
homebrew-core 1.702s 55.478s 32.60x | 9171324 10509040 +14.5%
jdk 0.568s 19.418s 34.19x | 3237508 15858410 +389.8%
linux 13.525s 100.837s 7.46x | 49800744 81939893 +64.5%
llvm-project 4.275s 31.188s 7.30x | 19309116 25336867 +31.2%
rails 0.986s 5.607s 5.69x | 4990252 6317541 +26.5%
rust 1.260s 13.250s 10.52x | 6053824 8601440 +42.0%
webkit 3.658s 30.469s 8.33x | 12288620 19438512 +58.1%
---
commit-graph.c | 28 +++++++++++++++++++++-------
1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/commit-graph.c b/commit-graph.c
index 1677e4fb3e..8eb0cbedaf 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -67,6 +67,8 @@ struct modified_path_bloom_filter_info {
uint64_t offset;
uint32_t merge_index_pos;
struct modified_path_bloom_filter_info *next;
+ /* TODO: need better names for 'next' and 'next_commit_bfi' */
+ struct modified_path_bloom_filter_info *next_commit_bfi;
};
static void free_modified_path_bloom_filter_info_in_slab(
@@ -1159,6 +1161,10 @@ struct write_commit_graph_context {
/* Used to find identical modified path Bloom filters */
struct hashmap dedup_hashmap;
+
+ /* To chain up Bloom filters in history order */
+ struct modified_path_bloom_filter_info *first_commit_bfi;
+ struct modified_path_bloom_filter_info *prev_commit_bfi;
} mpbfctx;
};
@@ -1363,18 +1369,18 @@ static int write_graph_chunk_modified_path_bloom_filters(struct hashfile *f,
struct write_commit_graph_context *ctx)
{
uint64_t offset = 0;
- int i;
+ struct modified_path_bloom_filter_info *next_commit_bfi;
- for (i = 0; i < ctx->commits.nr; i++) {
- struct commit *commit = ctx->commits.list[i];
- struct modified_path_bloom_filter_info *bfi;
- unsigned int filter_size;
+ next_commit_bfi = ctx->mpbfctx.first_commit_bfi;
+ while (next_commit_bfi) {
+ struct modified_path_bloom_filter_info *bfi = next_commit_bfi;
+
+ next_commit_bfi = next_commit_bfi->next_commit_bfi;
display_progress(ctx->progress, ++ctx->progress_cnt);
- bfi = modified_path_bloom_filters_peek(
- &modified_path_bloom_filters, commit);
for (; bfi; bfi = bfi->next) {
+ unsigned int filter_size;
if (bfi->duplicate_of)
continue;
if (!bfi->filter.nr_bits)
@@ -1762,6 +1768,14 @@ static void create_modified_path_bloom_filter(
bloom_filter_set_bits(&bfi->filter, mpbfctx->hashes,
mpbfctx->hashes_nr);
+ if (!mpbfctx->first_commit_bfi) {
+ mpbfctx->first_commit_bfi = bfi;
+ mpbfctx->prev_commit_bfi = bfi;
+ } else if (!nth_parent) {
+ mpbfctx->prev_commit_bfi->next_commit_bfi = bfi;
+ mpbfctx->prev_commit_bfi = bfi;
+ }
+
if (path_component_count > mpbfctx->embedded_limit &&
!handle_duplicate_modified_path_bloom_filter(mpbfctx, bfi))
mpbfctx->total_filter_size += sizeof(uint32_t) +
--
2.27.0.rc1.431.g5c813f95dc
next prev parent reply other threads:[~2020-05-29 8:52 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-29 8:50 [PoC PATCH 00/34] An alternative modified path Bloom filters implementation SZEDER Gábor
2020-05-28 22:09 ` Johannes Schindelin
2020-05-29 8:50 ` [PATCH 01/34] tree-walk.c: don't match submodule entries for 'submod/anything' SZEDER Gábor
2020-05-29 8:50 ` [PATCH 02/34] commit-graph: fix parsing the Chunk Lookup table SZEDER Gábor
2020-05-29 8:50 ` [PATCH 03/34] commit-graph-format.txt: all multi-byte numbers are in network byte order SZEDER Gábor
2020-05-29 8:50 ` [PATCH 04/34] commit-slab: add a function to deep free entries on the slab SZEDER Gábor
2020-06-04 16:43 ` Derrick Stolee
2020-06-27 15:56 ` SZEDER Gábor
2020-06-29 11:30 ` Derrick Stolee
2020-05-29 8:50 ` [PATCH 05/34] diff.h: drop diff_tree_oid() & friends' return value SZEDER Gábor
2020-05-29 8:50 ` [PATCH 06/34] commit-graph: clean up #includes SZEDER Gábor
2020-05-29 8:50 ` [PATCH 07/34] commit-graph: simplify parse_commit_graph() #1 SZEDER Gábor
2020-05-29 8:50 ` [PATCH 08/34] commit-graph: simplify parse_commit_graph() #2 SZEDER Gábor
2020-05-29 8:50 ` [PATCH 09/34] commit-graph: simplify write_commit_graph_file() #1 SZEDER Gábor
2020-05-29 8:50 ` [PATCH 10/34] commit-graph: simplify write_commit_graph_file() #2 SZEDER Gábor
2020-05-29 8:50 ` [PATCH 11/34] commit-graph: allocate the 'struct chunk_info' array dinamically SZEDER Gábor
2020-05-29 8:50 ` [PATCH 12/34] commit-graph: unify the signatures of all write_graph_chunk_*() functions SZEDER Gábor
2020-05-29 8:50 ` [PATCH 13/34] commit-graph: simplify write_commit_graph_file() #3 SZEDER Gábor
2020-05-29 8:50 ` [PATCH 14/34] commit-graph: check chunk sizes after writing SZEDER Gábor
2020-05-29 8:50 ` [PATCH 15/34] commit-graph-format.txt: document the modified path Bloom filter chunks SZEDER Gábor
2020-06-02 17:59 ` SZEDER Gábor
2020-05-29 8:50 ` [PATCH 16/34] Add a generic and minimal Bloom filter implementation SZEDER Gábor
2020-05-29 8:50 ` [PATCH 17/34] Import a streaming-capable Murmur3 hash function implementation SZEDER Gábor
2020-05-29 8:50 ` [PATCH 18/34] commit-graph: write "empty" Modified Path Bloom Filter Index chunk SZEDER Gábor
2020-05-29 8:50 ` [PATCH 19/34] commit-graph: add commit slab for modified path Bloom filters SZEDER Gábor
2020-05-29 8:50 ` [PATCH 20/34] commit-graph: fill the Modified Path Bloom Filter Index chunk SZEDER Gábor
2020-05-29 8:50 ` [PATCH 21/34] commit-graph: load and use " SZEDER Gábor
2020-05-29 8:50 ` [PATCH 22/34] commit-graph: write the Modified Path Bloom Filters chunk SZEDER Gábor
2020-05-29 8:50 ` [PATCH 23/34] commit-graph: load and use " SZEDER Gábor
2020-05-29 8:50 ` [PATCH 24/34] commit-graph: check all leading directories in modified path Bloom filters SZEDER Gábor
2020-05-29 8:50 ` [PATCH 25/34] commit-graph: check embedded modified path Bloom filters with a mask SZEDER Gábor
2020-05-29 8:50 ` [PATCH 26/34] commit-graph: deduplicate modified path Bloom filters SZEDER Gábor
2020-05-29 8:50 ` [PATCH 27/34] commit-graph: load modified path Bloom filters for merge commits SZEDER Gábor
2020-05-29 8:50 ` [PATCH 28/34] commit-graph: write Modified Path Bloom Filter Merge Index chunk SZEDER Gábor
2020-05-29 8:50 ` [PATCH 29/34] commit-graph: extract init and free write_commit_graph_context SZEDER Gábor
2020-05-29 8:50 ` [PATCH 30/34] commit-graph: move write_commit_graph_reachable below write_commit_graph SZEDER Gábor
2020-05-29 8:50 ` [PATCH 31/34] t7007-show: make the first test compatible with the next patch SZEDER Gábor
2020-05-29 8:50 ` [PATCH 32/34] PoC commit-graph: use revision walk machinery for '--reachable' SZEDER Gábor
2020-05-29 8:50 ` SZEDER Gábor [this message]
2020-05-29 8:50 ` [PATCH 34/34] commit-graph: use modified path Bloom filters with wildcards, if possible SZEDER Gábor
2020-05-29 13:59 ` [PoC PATCH 00/34] An alternative modified path Bloom filters implementation Derrick Stolee
2020-06-01 23:25 ` Taylor Blau
2020-06-02 17:08 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200529085038.26008-34-szeder.dev@gmail.com \
--to=szeder.dev@gmail.com \
--cc=garima.singh@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jnareb@gmail.com \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).