git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net,
	szeder.dev@gmail.com
Subject: [PATCH v3 01/13] commit-graph: introduce 'get_bloom_filter_settings()'
Date: Thu, 17 Sep 2020 22:58:52 -0400	[thread overview]
Message-ID: <8f8a82224364ba6ca487d99d0e2cafeabe2edc5a.1600397826.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1600397826.git.me@ttaylorr.com>

Many places in the code often need a pointer to the commit-graph's
'struct bloom_filter_settings', in which case they often take the value
from the top-most commit-graph.

In the non-split case, this works as expected. In the split case,
however, things get a little tricky. Not all layers in a chain of
incremental commit-graphs are required to themselves have Bloom data,
and so whether or not some part of the code uses Bloom filters depends
entirely on whether or not the top-most level of the commit-graph chain
has Bloom filters.

This has been the behavior since Bloom filters were introduced, and has
been codified into the tests since a759bfa9ee (t4216: add end to end
tests for git log with Bloom filters, 2020-04-06). In fact, t4216.130
requires that Bloom filters are not used in exactly the case described
earlier.

There is no reason that this needs to be the case, since it is perfectly
valid for commits in an earlier layer to have Bloom filters when commits
in a newer layer do not.

Since Bloom settings are guaranteed in practice to be the same for any
layer in a chain that has Bloom data, it is sufficient to traverse the
'->base_graph' pointer until either (1) a non-null 'struct
bloom_filter_settings *' is found, or (2) until we are at the root of
the commit-graph chain.

Introduce a 'get_bloom_filter_settings()' function that does just this,
and use it instead of purely dereferencing the top-most graph's
'->bloom_filter_settings' pointer.

While we're at it, add an additional test in t5324 to guard against code
in the commit-graph writing machinery that doesn't correctly handle a
NULL 'struct bloom_filter *'.

Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 blame.c                       |  6 ++++--
 bloom.c                       |  6 +++---
 commit-graph.c                | 11 +++++++++++
 commit-graph.h                |  2 ++
 revision.c                    |  5 +----
 t/t4216-log-bloom.sh          |  9 ++++++---
 t/t5324-split-commit-graph.sh | 13 +++++++++++++
 7 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/blame.c b/blame.c
index 1be1cd82a2..903e23af23 100644
--- a/blame.c
+++ b/blame.c
@@ -2892,16 +2892,18 @@ void setup_blame_bloom_data(struct blame_scoreboard *sb,
 			    const char *path)
 {
 	struct blame_bloom_data *bd;
+	struct bloom_filter_settings *bs;
 
 	if (!sb->repo->objects->commit_graph)
 		return;
 
-	if (!sb->repo->objects->commit_graph->bloom_filter_settings)
+	bs = get_bloom_filter_settings(sb->repo);
+	if (!bs)
 		return;
 
 	bd = xmalloc(sizeof(struct blame_bloom_data));
 
-	bd->settings = sb->repo->objects->commit_graph->bloom_filter_settings;
+	bd->settings = bs;
 
 	bd->alloc = 4;
 	bd->nr = 0;
diff --git a/bloom.c b/bloom.c
index 1a573226e7..cd9380ac62 100644
--- a/bloom.c
+++ b/bloom.c
@@ -38,7 +38,7 @@ static int load_bloom_filter_from_graph(struct commit_graph *g,
 	while (graph_pos < g->num_commits_in_base)
 		g = g->base_graph;
 
-	/* The commit graph commit 'c' lives in doesn't carry bloom filters. */
+	/* The commit graph commit 'c' lives in doesn't carry Bloom filters. */
 	if (!g->chunk_bloom_indexes)
 		return 0;
 
@@ -195,8 +195,8 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
 	if (!filter->data) {
 		load_commit_graph_info(r, c);
 		if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH &&
-			r->objects->commit_graph->chunk_bloom_indexes)
-			load_bloom_filter_from_graph(r->objects->commit_graph, filter, c);
+			load_bloom_filter_from_graph(r->objects->commit_graph, filter, c))
+				return filter;
 	}
 
 	if (filter->data)
diff --git a/commit-graph.c b/commit-graph.c
index 0ed003e218..6a36ed0b06 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -667,6 +667,17 @@ int generation_numbers_enabled(struct repository *r)
 	return !!first_generation;
 }
 
+struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r)
+{
+	struct commit_graph *g = r->objects->commit_graph;
+	while (g) {
+		if (g->bloom_filter_settings)
+			return g->bloom_filter_settings;
+		g = g->base_graph;
+	}
+	return NULL;
+}
+
 static void close_commit_graph_one(struct commit_graph *g)
 {
 	if (!g)
diff --git a/commit-graph.h b/commit-graph.h
index 09a97030dc..0677dd1031 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -87,6 +87,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size);
  */
 int generation_numbers_enabled(struct repository *r);
 
+struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r);
+
 enum commit_graph_write_flags {
 	COMMIT_GRAPH_WRITE_APPEND     = (1 << 0),
 	COMMIT_GRAPH_WRITE_PROGRESS   = (1 << 1),
diff --git a/revision.c b/revision.c
index 1239023f93..6aeb764821 100644
--- a/revision.c
+++ b/revision.c
@@ -681,10 +681,7 @@ static void prepare_to_use_bloom_filter(struct rev_info *revs)
 
 	repo_parse_commit(revs->repo, revs->commits->item);
 
-	if (!revs->repo->objects->commit_graph)
-		return;
-
-	revs->bloom_filter_settings = revs->repo->objects->commit_graph->bloom_filter_settings;
+	revs->bloom_filter_settings = get_bloom_filter_settings(revs->repo);
 	if (!revs->bloom_filter_settings)
 		return;
 
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index 4bb9e9dbe2..715912ad0f 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -65,7 +65,7 @@ setup () {
 
 test_bloom_filters_used () {
 	log_args=$1
-	bloom_trace_prefix="statistics:{\"filter_not_present\":0,\"maybe\""
+	bloom_trace_prefix="statistics:{\"filter_not_present\":${2:-0},\"maybe\""
 	setup "$log_args" &&
 	grep -q "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.perf" &&
 	test_cmp log_wo_bloom log_w_bloom &&
@@ -139,8 +139,11 @@ test_expect_success 'setup - add commit-graph to the chain without Bloom filters
 	test_line_count = 2 .git/objects/info/commit-graphs/commit-graph-chain
 '
 
-test_expect_success 'Do not use Bloom filters if the latest graph does not have Bloom filters.' '
-	test_bloom_filters_not_used "-- A/B"
+test_expect_success 'use Bloom filters even if the latest graph does not have Bloom filters' '
+	# Ensure that the number of empty filters is equal to the number of
+	# filters in the latest graph layer to prove that they are loaded (and
+	# ignored).
+	test_bloom_filters_used "-- A/B" 3
 '
 
 test_expect_success 'setup - add commit-graph to the chain with Bloom filters' '
diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh
index 18216463c7..c334ee9155 100755
--- a/t/t5324-split-commit-graph.sh
+++ b/t/t5324-split-commit-graph.sh
@@ -427,4 +427,17 @@ done <<\EOF
 0600 -r--------
 EOF
 
+test_expect_success '--split=replace with partial Bloom data' '
+	rm -rf $graphdir $infodir/commit-graph &&
+	git reset --hard commits/3 &&
+	git rev-list -1 HEAD~2 >a &&
+	git rev-list -1 HEAD~1 >b &&
+	git commit-graph write --split=no-merge --stdin-commits --changed-paths <a &&
+	git commit-graph write --split=no-merge --stdin-commits <b &&
+	git commit-graph write --split=replace --stdin-commits --changed-paths <c &&
+	ls $graphdir/graph-*.graph >graph-files &&
+	test_line_count = 1 graph-files &&
+	verify_chain_files_exist $graphdir
+'
+
 test_done
-- 
2.28.0.510.g375ecf1f36


  reply	other threads:[~2020-09-18  2:59 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-09 15:22 [PATCH 00/12] more miscellaneous Bloom filter improvements, redux Taylor Blau
2020-09-09 15:22 ` [PATCH 01/12] commit-graph: introduce 'get_bloom_filter_settings()' Taylor Blau
2020-09-09 15:22 ` [PATCH 02/12] t4216: use an '&&'-chain Taylor Blau
2020-09-09 15:22 ` [PATCH 03/12] commit-graph: pass a 'struct repository *' in more places Taylor Blau
2020-09-09 15:23 ` [PATCH 04/12] t/helper/test-read-graph.c: prepare repo settings Taylor Blau
2020-09-09 15:23 ` [PATCH 05/12] commit-graph: respect 'commitGraph.readChangedPaths' Taylor Blau
2020-09-09 15:23 ` [PATCH 06/12] commit-graph.c: store maximum changed paths Taylor Blau
2020-09-09 15:23 ` [PATCH 07/12] bloom: split 'get_bloom_filter()' in two Taylor Blau
2020-09-09 15:23 ` [PATCH 08/12] bloom: use provided 'struct bloom_filter_settings' Taylor Blau
2020-09-09 15:23 ` [PATCH 09/12] bloom/diff: properly short-circuit on max_changes Taylor Blau
2020-09-09 15:23 ` [PATCH 10/12] bloom: encode out-of-bounds filters as non-empty Taylor Blau
2020-09-10  3:35   ` Taylor Blau
2020-09-10 15:45     ` Taylor Blau
2020-09-11 18:15       ` Derrick Stolee
2020-09-09 15:23 ` [PATCH 11/12] commit-graph: rename 'split_commit_graph_opts' Taylor Blau
2020-09-09 15:24 ` [PATCH 12/12] builtin/commit-graph.c: introduce '--max-new-filters=<n>' Taylor Blau
2020-09-11 17:52   ` Jeff King
2020-09-11 18:59     ` Taylor Blau
2020-09-11 19:25       ` Taylor Blau
2020-09-14 20:12         ` Taylor Blau
2020-09-14 20:31           ` Derrick Stolee
2020-09-14 20:36             ` Taylor Blau
2020-09-15  0:59               ` Derrick Stolee
2020-09-15  4:31                 ` Taylor Blau
2020-09-15 21:49               ` Junio C Hamano
2020-09-15 21:53                 ` Taylor Blau
2020-09-11 19:47       ` Jeff King
2020-09-11 19:31     ` Junio C Hamano
2020-09-16 18:06 ` [PATCH v2 00/13] more miscellaneous Bloom filter improvements, redux Taylor Blau
2020-09-16 18:06   ` [PATCH v2 01/13] commit-graph: introduce 'get_bloom_filter_settings()' Taylor Blau
2020-09-16 18:07   ` [PATCH v2 02/13] t4216: use an '&&'-chain Taylor Blau
2020-09-16 18:07   ` [PATCH v2 03/13] commit-graph: pass a 'struct repository *' in more places Taylor Blau
2020-09-16 18:07   ` [PATCH v2 04/13] t/helper/test-read-graph.c: prepare repo settings Taylor Blau
2020-09-16 18:07   ` [PATCH v2 05/13] commit-graph: respect 'commitGraph.readChangedPaths' Taylor Blau
2020-09-16 18:07   ` [PATCH v2 06/13] commit-graph.c: store maximum changed paths Taylor Blau
2020-09-16 18:07   ` [PATCH v2 07/13] bloom: split 'get_bloom_filter()' in two Taylor Blau
2020-09-16 18:07   ` [PATCH v2 08/13] bloom: use provided 'struct bloom_filter_settings' Taylor Blau
2020-09-16 18:07   ` [PATCH v2 09/13] bloom/diff: properly short-circuit on max_changes Taylor Blau
2020-09-16 18:07   ` [PATCH v2 10/13] bloom: encode out-of-bounds filters as non-empty Taylor Blau
2020-09-17 22:13     ` SZEDER Gábor
2020-09-17 23:13       ` Taylor Blau
2020-09-18  0:52         ` Junio C Hamano
2020-09-18  1:15           ` Taylor Blau
2020-09-16 18:08   ` [PATCH v2 11/13] commit-graph: rename 'split_commit_graph_opts' Taylor Blau
2020-09-16 18:08   ` [PATCH v2 12/13] builtin/commit-graph.c: introduce '--max-new-filters=<n>' Taylor Blau
2020-09-18  9:23     ` SZEDER Gábor
2020-09-18 13:27       ` Taylor Blau
2020-09-16 18:08   ` [PATCH v2 13/13] commit-graph: introduce 'commitGraph.maxNewFilters' Taylor Blau
2020-09-16 22:51   ` [PATCH v2 00/13] more miscellaneous Bloom filter improvements, redux Derrick Stolee
2020-09-16 23:07     ` Junio C Hamano
2020-09-17  0:45       ` Taylor Blau
2020-09-17  0:59         ` Junio C Hamano
2020-09-17  1:10           ` Taylor Blau
2020-09-17 13:34             ` Taylor Blau
2020-09-17 13:38               ` Derrick Stolee
2020-09-18  2:58 ` [PATCH v3 " Taylor Blau
2020-09-18  2:58   ` Taylor Blau [this message]
2020-09-18  2:58   ` [PATCH v3 02/13] t4216: use an '&&'-chain Taylor Blau
2020-09-18  2:59   ` [PATCH v3 03/13] commit-graph: pass a 'struct repository *' in more places Taylor Blau
2020-09-18  2:59   ` [PATCH v3 04/13] t/helper/test-read-graph.c: prepare repo settings Taylor Blau
2020-09-18  2:59   ` [PATCH v3 05/13] commit-graph: respect 'commitGraph.readChangedPaths' Taylor Blau
2020-09-18  2:59   ` [PATCH v3 06/13] commit-graph.c: store maximum changed paths Taylor Blau
2020-09-18  2:59   ` [PATCH v3 07/13] bloom: split 'get_bloom_filter()' in two Taylor Blau
2020-09-18  2:59   ` [PATCH v3 08/13] bloom: use provided 'struct bloom_filter_settings' Taylor Blau
2020-09-18 16:27     ` SZEDER Gábor
2020-09-18 16:32       ` Taylor Blau
2020-09-18  2:59   ` [PATCH v3 09/13] bloom/diff: properly short-circuit on max_changes Taylor Blau
2020-09-18  2:59   ` [PATCH v3 10/13] bloom: encode out-of-bounds filters as non-empty Taylor Blau
2020-09-18  2:59   ` [PATCH v3 11/13] commit-graph: rename 'split_commit_graph_opts' Taylor Blau
2020-09-18  2:59   ` [PATCH v3 12/13] builtin/commit-graph.c: introduce '--max-new-filters=<n>' Taylor Blau
2020-09-18  2:59   ` [PATCH v3 13/13] commit-graph: introduce 'commitGraph.maxNewFilters' Taylor Blau
2020-09-18 13:29     ` Taylor Blau
2020-09-18 17:43       ` Junio C Hamano
2020-09-18 13:31   ` [PATCH v3 00/13] more miscellaneous Bloom filter improvements, redux Taylor Blau
2020-09-18 13:34     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8f8a82224364ba6ca487d99d0e2cafeabe2edc5a.1600397826.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    --subject='Re: [PATCH v3 01/13] commit-graph: introduce '\''get_bloom_filter_settings()'\''' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).