From: Junio C Hamano <gitster@pobox.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: "SZEDER Gábor" <szeder.dev@gmail.com>,
git@vger.kernel.org, dstolee@microsoft.com, peff@peff.net
Subject: Re: [PATCH v2 10/13] bloom: encode out-of-bounds filters as non-empty
Date: Thu, 17 Sep 2020 17:52:11 -0700 [thread overview]
Message-ID: <xmqqo8m3oqis.fsf@gitster.c.googlers.com> (raw)
In-Reply-To: <20200917231344.GA1591704@nand.local> (Taylor Blau's message of "Thu, 17 Sep 2020 19:13:44 -0400")
Taylor Blau <me@ttaylorr.com> writes:
>> > In practice, these boundary commits likely occupy a small proportion of
>> > the overall number of commits, and so the size penalty is likely smaller
>> > than storing a bitmap for all commits.
>>
>> | Percentage of
>> | commits modifying
>> | 0 path | >= 512 paths
>> ---------------+------------+----------------
>> android-base | 13.20% | 0.13%
>> cmssw | 0.15% | 0.23%
>> cpython | 3.07% | 0.01%
>> elasticsearch | 0.70% | 1.00%
>> gcc | 0.00% | 0.08%
>> gecko-dev | 0.14% | 0.64%
>> git | 0.11% | 0.02%
>> glibc | 0.02% | 0.10%
>> go | 0.00% | 0.07%
>> homebrew-cask | 0.40% | 0.02%
>> homebrew-core | 0.01% | 0.01%
>> jdk | 0.26% | 5.64%
>> linux | 0.01% | 0.51%
>> llvm-project | 0.12% | 0.03%
>> rails | 0.10% | 0.10%
>> rust | 0.07% | 0.17%
>> tensorflow | 0.09% | 1.02%
>> webkit | 0.05% | 0.31%
>
> This is very useful information to have! Without the total number of
> commits, it's impossible to know whether or not this is a win over the
> BFXL chunk. But, since the number of commits is probably "large" versus
> the percentage of boundary commits which is "small", it's almost
> certainly an advantage.
Do you want to include it in either the log message in one of the
commits, in code comment, or a technical doc?
> So, I think that if it's truly misleading, we could revisit this after
> the topic is merged. But, I'm not planning on changing anything at this
> point.
If you do not want to help us go the last-mile to completion, that
is sad, but I do not want to see a basically good patch stall like
that, so let's find somebody else who can do the helping ;-)
Here is what my trial rebase produced. I'll queue it to 'seen' (if
you prefer I can send a full v3 patch, but I expect that you know
how to fetch from 'seen' and review locally) after checking if the
result passes the tests locally; an extra set of eyeballs to verify
the result is pretty much appreciated.
Thanks.
1: ca6c060171 ! 1: 9b294a0c66 bloom: encode out-of-bounds filters as non-empty
@@ Commit message
efficient than storing a bitmap for which commits haven't computed their
Bloom filters. Storing a bitmap incurs a penalty of one bit per commit,
whereas storing explicit filters as above incurs a penalty of one byte
- per too-large or too-small commit.
+ per too-large or empty commit.
In practice, these boundary commits likely occupy a small proportion of
the overall number of commits, and so the size penalty is likely smaller
@@ bloom.c: struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
filter->len = (hashmap_get_size(&pathmap) * settings->bits_per_entry + BITS_PER_WORD - 1) / BITS_PER_WORD;
+ if (!filter->len) {
+ if (computed)
-+ *computed |= BLOOM_TRUNC_SMALL;
++ *computed |= BLOOM_TRUNC_EMPTY;
+ filter->len = 1;
+ }
filter->data = xcalloc(filter->len, sizeof(unsigned char));
@@ bloom.h: enum bloom_filter_computed {
BLOOM_NOT_COMPUTED = (1 << 0),
BLOOM_COMPUTED = (1 << 1),
BLOOM_TRUNC_LARGE = (1 << 2),
-+ BLOOM_TRUNC_SMALL = (1 << 3),
++ BLOOM_TRUNC_EMPTY = (1 << 3),
};
struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
@@ commit-graph.c: struct write_commit_graph_context {
int count_bloom_filter_computed;
int count_bloom_filter_not_computed;
-+ int count_bloom_filter_trunc_small;
++ int count_bloom_filter_trunc_empty;
int count_bloom_filter_trunc_large;
};
@@ commit-graph.c: static void trace2_bloom_filter_write_statistics(struct write_co
ctx->count_bloom_filter_computed);
trace2_data_intmax("commit-graph", ctx->r, "filter-not-computed",
ctx->count_bloom_filter_not_computed);
-+ trace2_data_intmax("commit-graph", ctx->r, "filter-trunc-small",
-+ ctx->count_bloom_filter_trunc_small);
++ trace2_data_intmax("commit-graph", ctx->r, "filter-trunc-empty",
++ ctx->count_bloom_filter_trunc_empty);
trace2_data_intmax("commit-graph", ctx->r, "filter-trunc-large",
ctx->count_bloom_filter_trunc_large);
}
@@ commit-graph.c: static void compute_bloom_filters(struct write_commit_graph_cont
&computed);
if (computed & BLOOM_COMPUTED) {
ctx->count_bloom_filter_computed++;
-+ if (computed & BLOOM_TRUNC_SMALL)
-+ ctx->count_bloom_filter_trunc_small++;
++ if (computed & BLOOM_TRUNC_EMPTY)
++ ctx->count_bloom_filter_trunc_empty++;
if (computed & BLOOM_TRUNC_LARGE)
ctx->count_bloom_filter_trunc_large++;
} else if (computed & BLOOM_NOT_COMPUTED)
@@ t/t4216-log-bloom.sh: test_max_changed_paths () {
grep "\"key\":\"filter-computed\",\"value\":\"$1\"" $2
}
-+test_filter_trunc_small () {
-+ grep "\"key\":\"filter-trunc-small\",\"value\":\"$1\"" $2
++test_filter_trunc_empty () {
++ grep "\"key\":\"filter-trunc-empty\",\"value\":\"$1\"" $2
+}
+
test_filter_trunc_large () {
@@ t/t4216-log-bloom.sh: test_expect_success 'correctly report changes over limit'
'
+test_expect_success 'correctly report commits with no changed paths' '
-+ git init small &&
-+ test_when_finished "rm -fr small" &&
++ git init empty &&
++ test_when_finished "rm -fr empty" &&
+ (
-+ cd small &&
++ cd empty &&
+
+ git commit --allow-empty -m "initial commit" &&
+
@@ t/t4216-log-bloom.sh: test_expect_success 'correctly report changes over limit'
+ git commit-graph write --reachable --changed-paths &&
+ test_filter_computed 1 trace.event &&
+ test_filter_not_computed 0 trace.event &&
-+ test_filter_trunc_small 1 trace.event &&
++ test_filter_trunc_empty 1 trace.event &&
+ test_filter_trunc_large 0 trace.event
+ )
+'
2: 11db600d51 = 2: 1b4c861e68 commit-graph: rename 'split_commit_graph_opts'
3: cf49598137 ! 3: d6c1bd395e builtin/commit-graph.c: introduce '--max-new-filters=<n>'
@@ t/t4216-log-bloom.sh: test_expect_success 'correctly report commits with no chan
+
+ test_filter_computed 2 trace.event &&
+ test_filter_not_computed 3 trace.event &&
-+ test_filter_trunc_small 0 trace.event &&
++ test_filter_trunc_empty 0 trace.event &&
+ test_filter_trunc_large 0 trace.event
+ )
+'
@@ t/t4216-log-bloom.sh: test_expect_success 'correctly report commits with no chan
+ --split=replace --max-new-filters=1 &&
+ test_filter_computed 1 trace.event &&
+ test_filter_not_computed 4 trace.event &&
-+ test_filter_trunc_small 0 trace.event &&
++ test_filter_trunc_empty 0 trace.event &&
+ test_filter_trunc_large 0 trace.event
+ )
+'
@@ t/t4216-log-bloom.sh: test_expect_success 'correctly report commits with no chan
+ --changed-paths --max-new-filters=2 &&
+ test_filter_computed 2 trace.event &&
+ test_filter_not_computed 4 trace.event &&
-+ test_filter_trunc_small 2 trace.event &&
++ test_filter_trunc_empty 2 trace.event &&
+ test_filter_trunc_large 0 trace.event
+ done &&
+
@@ t/t4216-log-bloom.sh: test_expect_success 'correctly report commits with no chan
+ --changed-paths --max-new-filters=2 &&
+ test_filter_computed 0 trace.event &&
+ test_filter_not_computed 6 trace.event &&
-+ test_filter_trunc_small 0 trace.event &&
++ test_filter_trunc_empty 0 trace.event &&
+ test_filter_trunc_large 0 trace.event
+ )
+'
4: 1bc82cd008 ! 4: b0d51fb04a commit-graph: introduce 'commitGraph.maxNewFilters'
@@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation is limited by --max-
+ --split=replace &&
test_filter_computed 1 trace.event &&
test_filter_not_computed 4 trace.event &&
- test_filter_trunc_small 0 trace.event &&
+ test_filter_trunc_empty 0 trace.event &&
@@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills previously-skipped filters' '
)
'
@@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills previously
+ --max-new-filters=1 &&
+ test_filter_computed 1 trace.event &&
+ test_filter_not_computed 1 trace.event &&
-+ test_filter_trunc_small 0 trace.event &&
++ test_filter_trunc_empty 0 trace.event &&
+ test_filter_trunc_large 0 trace.event
+ )
+'
next prev parent reply other threads:[~2020-09-18 0:52 UTC|newest]
Thread overview: 75+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-09 15:22 [PATCH 00/12] more miscellaneous Bloom filter improvements, redux Taylor Blau
2020-09-09 15:22 ` [PATCH 01/12] commit-graph: introduce 'get_bloom_filter_settings()' Taylor Blau
2020-09-09 15:22 ` [PATCH 02/12] t4216: use an '&&'-chain Taylor Blau
2020-09-09 15:22 ` [PATCH 03/12] commit-graph: pass a 'struct repository *' in more places Taylor Blau
2020-09-09 15:23 ` [PATCH 04/12] t/helper/test-read-graph.c: prepare repo settings Taylor Blau
2020-09-09 15:23 ` [PATCH 05/12] commit-graph: respect 'commitGraph.readChangedPaths' Taylor Blau
2020-09-09 15:23 ` [PATCH 06/12] commit-graph.c: store maximum changed paths Taylor Blau
2020-09-09 15:23 ` [PATCH 07/12] bloom: split 'get_bloom_filter()' in two Taylor Blau
2020-09-09 15:23 ` [PATCH 08/12] bloom: use provided 'struct bloom_filter_settings' Taylor Blau
2020-09-09 15:23 ` [PATCH 09/12] bloom/diff: properly short-circuit on max_changes Taylor Blau
2020-09-09 15:23 ` [PATCH 10/12] bloom: encode out-of-bounds filters as non-empty Taylor Blau
2020-09-10 3:35 ` Taylor Blau
2020-09-10 15:45 ` Taylor Blau
2020-09-11 18:15 ` Derrick Stolee
2020-09-09 15:23 ` [PATCH 11/12] commit-graph: rename 'split_commit_graph_opts' Taylor Blau
2020-09-09 15:24 ` [PATCH 12/12] builtin/commit-graph.c: introduce '--max-new-filters=<n>' Taylor Blau
2020-09-11 17:52 ` Jeff King
2020-09-11 18:59 ` Taylor Blau
2020-09-11 19:25 ` Taylor Blau
2020-09-14 20:12 ` Taylor Blau
2020-09-14 20:31 ` Derrick Stolee
2020-09-14 20:36 ` Taylor Blau
2020-09-15 0:59 ` Derrick Stolee
2020-09-15 4:31 ` Taylor Blau
2020-09-15 21:49 ` Junio C Hamano
2020-09-15 21:53 ` Taylor Blau
2020-09-11 19:47 ` Jeff King
2020-09-11 19:31 ` Junio C Hamano
2020-09-16 18:06 ` [PATCH v2 00/13] more miscellaneous Bloom filter improvements, redux Taylor Blau
2020-09-16 18:06 ` [PATCH v2 01/13] commit-graph: introduce 'get_bloom_filter_settings()' Taylor Blau
2020-09-16 18:07 ` [PATCH v2 02/13] t4216: use an '&&'-chain Taylor Blau
2020-09-16 18:07 ` [PATCH v2 03/13] commit-graph: pass a 'struct repository *' in more places Taylor Blau
2020-09-16 18:07 ` [PATCH v2 04/13] t/helper/test-read-graph.c: prepare repo settings Taylor Blau
2020-09-16 18:07 ` [PATCH v2 05/13] commit-graph: respect 'commitGraph.readChangedPaths' Taylor Blau
2020-09-16 18:07 ` [PATCH v2 06/13] commit-graph.c: store maximum changed paths Taylor Blau
2020-09-16 18:07 ` [PATCH v2 07/13] bloom: split 'get_bloom_filter()' in two Taylor Blau
2020-09-16 18:07 ` [PATCH v2 08/13] bloom: use provided 'struct bloom_filter_settings' Taylor Blau
2020-09-16 18:07 ` [PATCH v2 09/13] bloom/diff: properly short-circuit on max_changes Taylor Blau
2020-09-16 18:07 ` [PATCH v2 10/13] bloom: encode out-of-bounds filters as non-empty Taylor Blau
2020-09-17 22:13 ` SZEDER Gábor
2020-09-17 23:13 ` Taylor Blau
2020-09-18 0:52 ` Junio C Hamano [this message]
2020-09-18 1:15 ` Taylor Blau
2020-09-16 18:08 ` [PATCH v2 11/13] commit-graph: rename 'split_commit_graph_opts' Taylor Blau
2020-09-16 18:08 ` [PATCH v2 12/13] builtin/commit-graph.c: introduce '--max-new-filters=<n>' Taylor Blau
2020-09-18 9:23 ` SZEDER Gábor
2020-09-18 13:27 ` Taylor Blau
2020-09-16 18:08 ` [PATCH v2 13/13] commit-graph: introduce 'commitGraph.maxNewFilters' Taylor Blau
2020-09-16 22:51 ` [PATCH v2 00/13] more miscellaneous Bloom filter improvements, redux Derrick Stolee
2020-09-16 23:07 ` Junio C Hamano
2020-09-17 0:45 ` Taylor Blau
2020-09-17 0:59 ` Junio C Hamano
2020-09-17 1:10 ` Taylor Blau
2020-09-17 13:34 ` Taylor Blau
2020-09-17 13:38 ` Derrick Stolee
2020-09-18 2:58 ` [PATCH v3 " Taylor Blau
2020-09-18 2:58 ` [PATCH v3 01/13] commit-graph: introduce 'get_bloom_filter_settings()' Taylor Blau
2020-09-18 2:58 ` [PATCH v3 02/13] t4216: use an '&&'-chain Taylor Blau
2020-09-18 2:59 ` [PATCH v3 03/13] commit-graph: pass a 'struct repository *' in more places Taylor Blau
2020-09-18 2:59 ` [PATCH v3 04/13] t/helper/test-read-graph.c: prepare repo settings Taylor Blau
2020-09-18 2:59 ` [PATCH v3 05/13] commit-graph: respect 'commitGraph.readChangedPaths' Taylor Blau
2020-09-18 2:59 ` [PATCH v3 06/13] commit-graph.c: store maximum changed paths Taylor Blau
2020-09-18 2:59 ` [PATCH v3 07/13] bloom: split 'get_bloom_filter()' in two Taylor Blau
2020-09-18 2:59 ` [PATCH v3 08/13] bloom: use provided 'struct bloom_filter_settings' Taylor Blau
2020-09-18 16:27 ` SZEDER Gábor
2020-09-18 16:32 ` Taylor Blau
2020-09-18 2:59 ` [PATCH v3 09/13] bloom/diff: properly short-circuit on max_changes Taylor Blau
2020-09-18 2:59 ` [PATCH v3 10/13] bloom: encode out-of-bounds filters as non-empty Taylor Blau
2020-09-18 2:59 ` [PATCH v3 11/13] commit-graph: rename 'split_commit_graph_opts' Taylor Blau
2020-09-18 2:59 ` [PATCH v3 12/13] builtin/commit-graph.c: introduce '--max-new-filters=<n>' Taylor Blau
2020-09-18 2:59 ` [PATCH v3 13/13] commit-graph: introduce 'commitGraph.maxNewFilters' Taylor Blau
2020-09-18 13:29 ` Taylor Blau
2020-09-18 17:43 ` Junio C Hamano
2020-09-18 13:31 ` [PATCH v3 00/13] more miscellaneous Bloom filter improvements, redux Taylor Blau
2020-09-18 13:34 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqo8m3oqis.fsf@gitster.c.googlers.com \
--to=gitster@pobox.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
--cc=szeder.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).