git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Derrick Stolee <stolee@gmail.com>,
	git@vger.kernel.org, dstolee@microsoft.com, peff@peff.net,
	szeder.dev@gmail.com
Subject: Re: [PATCH v2 00/13] more miscellaneous Bloom filter improvements, redux
Date: Thu, 17 Sep 2020 09:34:42 -0400	[thread overview]
Message-ID: <20200917133442.GA12879@nand.local> (raw)
In-Reply-To: <20200917011049.GB6478@nand.local>

Junio,

On Wed, Sep 16, 2020 at 09:10:49PM -0400, Taylor Blau wrote:
> I should have the patch in your inbox by the end of tonight, depending
> on how fast my workstation can run the ASan-enabled test suite 13 times
> ;).

All finished. This is sufficient to fix the ASan-enabled test suite,
along with fixing a bug where we wouldn't respect the limit on changed
paths when loading an existing commit-graph. This has nothing to do with
the user-specified '--max-new-filters', nor does it mean that we're
storing the limit in the commit-graph file. Instead it's because we're
loading the bloom_filter_settings struct from the graph and
initializing it ourselves, instead of using the default values (which is
the case when we don't load a graph at all).

Anyway, let's use this instead of 6/13. Here's an inter-diff that shows
the fix and test change:

  diff --git a/commit-graph.c b/commit-graph.c
  index 33af6c2430..fc6c6fdc3e 100644
  --- a/commit-graph.c
  +++ b/commit-graph.c
  @@ -424,6 +424,7 @@ struct commit_graph *parse_commit_graph(struct repository *r,
          graph->bloom_filter_settings->hash_version = hash_version;
          graph->bloom_filter_settings->num_hashes = get_be32(data + chunk_offset + 4);
          graph->bloom_filter_settings->bits_per_entry = get_be32(data + chunk_offset + 8);
  +				graph->bloom_filter_settings->max_changed_paths = DEFAULT_BLOOM_MAX_CHANGES;
        }
        break;
      }
  diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
  index dc7d62c778..5bc1627568 100755
  --- a/t/t4216-log-bloom.sh
  +++ b/t/t4216-log-bloom.sh
  @@ -175,11 +175,11 @@ test_expect_success 'persist filter settings' '
      GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=9 \
      GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY=15 \
      git commit-graph write --reachable --changed-paths &&
  -	grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15" trace2.txt &&
  +	grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15,\"max_changed_paths\":512" trace2.txt &&
    GIT_TRACE2_EVENT="$(pwd)/trace2-auto.txt" \
      GIT_TRACE2_EVENT_NESTING=5 \
      git commit-graph write --reachable --changed-paths &&
  -	grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15" trace2-auto.txt
  +	grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15,\"max_changed_paths\":512" trace2-auto.txt
   '

   test_max_changed_paths () {

...and here is the updated patch. Sorry again for all of the trouble.

--- >8 ---

Subject: [PATCH] commit-graph.c: store maximum changed paths

For now, we assume that there is a fixed constant describing the
maximum number of changed paths we are willing to store in a Bloom
filter.

Prepare for that to (at least partially) not be the case by making it a
member of the 'struct bloom_filter_settings'. This will be helpful in
the subsequent patches by reducing the size of test cases that exercise
storing too many changed paths, as well as preparing for an eventual
future in which this value might change.

This patch alone does not cause newly generated Bloom filters to use
a custom upper-bound on the maximum number of changed paths a single
Bloom filter can hold, that will occur in a later patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 bloom.h              | 11 ++++++++++-
 commit-graph.c       |  4 ++++
 t/t4216-log-bloom.sh |  4 ++--
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/bloom.h b/bloom.h
index d8fbb0fbf1..0b9b59a6fe 100644
--- a/bloom.h
+++ b/bloom.h
@@ -28,9 +28,18 @@ struct bloom_filter_settings {
 	 * that contain n*b bits.
 	 */
 	uint32_t bits_per_entry;
+
+	/*
+	 * The maximum number of changed paths per commit
+	 * before declaring a Bloom filter to be too-large.
+	 *
+	 * Not written to the commit-graph file.
+	 */
+	uint32_t max_changed_paths;
 };

-#define DEFAULT_BLOOM_FILTER_SETTINGS { 1, 7, 10 }
+#define DEFAULT_BLOOM_MAX_CHANGES 512
+#define DEFAULT_BLOOM_FILTER_SETTINGS { 1, 7, 10, DEFAULT_BLOOM_MAX_CHANGES }
 #define BITS_PER_WORD 8
 #define BLOOMDATA_CHUNK_HEADER_SIZE 3 * sizeof(uint32_t)

diff --git a/commit-graph.c b/commit-graph.c
index ea54d108b9..ba6d4a4c6c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -424,6 +424,7 @@ struct commit_graph *parse_commit_graph(struct repository *r,
 				graph->bloom_filter_settings->hash_version = hash_version;
 				graph->bloom_filter_settings->num_hashes = get_be32(data + chunk_offset + 4);
 				graph->bloom_filter_settings->bits_per_entry = get_be32(data + chunk_offset + 8);
+				graph->bloom_filter_settings->max_changed_paths = DEFAULT_BLOOM_MAX_CHANGES;
 			}
 			break;
 		}
@@ -1201,6 +1202,7 @@ static void trace2_bloom_filter_settings(struct write_commit_graph_context *ctx)
 	jw_object_intmax(&jw, "hash_version", ctx->bloom_settings->hash_version);
 	jw_object_intmax(&jw, "num_hashes", ctx->bloom_settings->num_hashes);
 	jw_object_intmax(&jw, "bits_per_entry", ctx->bloom_settings->bits_per_entry);
+	jw_object_intmax(&jw, "max_changed_paths", ctx->bloom_settings->max_changed_paths);
 	jw_end(&jw);

 	trace2_data_json("bloom", ctx->r, "settings", &jw);
@@ -1669,6 +1671,8 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 							      bloom_settings.bits_per_entry);
 		bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES",
 							  bloom_settings.num_hashes);
+		bloom_settings.max_changed_paths = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_MAX_CHANGED_PATHS",
+							  bloom_settings.max_changed_paths);
 		ctx->bloom_settings = &bloom_settings;
 	}

diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index fc7693806c..593571358d 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -174,11 +174,11 @@ test_expect_success 'persist filter settings' '
 		GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=9 \
 		GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY=15 \
 		git commit-graph write --reachable --changed-paths &&
-	grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15}" trace2.txt &&
+	grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15,\"max_changed_paths\":512" trace2.txt &&
 	GIT_TRACE2_EVENT="$(pwd)/trace2-auto.txt" \
 		GIT_TRACE2_EVENT_NESTING=5 \
 		git commit-graph write --reachable --changed-paths &&
-	grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15}" trace2-auto.txt
+	grep "{\"hash_version\":1,\"num_hashes\":9,\"bits_per_entry\":15,\"max_changed_paths\":512" trace2-auto.txt
 '

 test_expect_success 'correctly report changes over limit' '
--
2.28.0.510.g86fdc5f89a


  reply	other threads:[~2020-09-17 13:43 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-09 15:22 [PATCH 00/12] " Taylor Blau
2020-09-09 15:22 ` [PATCH 01/12] commit-graph: introduce 'get_bloom_filter_settings()' Taylor Blau
2020-09-09 15:22 ` [PATCH 02/12] t4216: use an '&&'-chain Taylor Blau
2020-09-09 15:22 ` [PATCH 03/12] commit-graph: pass a 'struct repository *' in more places Taylor Blau
2020-09-09 15:23 ` [PATCH 04/12] t/helper/test-read-graph.c: prepare repo settings Taylor Blau
2020-09-09 15:23 ` [PATCH 05/12] commit-graph: respect 'commitGraph.readChangedPaths' Taylor Blau
2020-09-09 15:23 ` [PATCH 06/12] commit-graph.c: store maximum changed paths Taylor Blau
2020-09-09 15:23 ` [PATCH 07/12] bloom: split 'get_bloom_filter()' in two Taylor Blau
2020-09-09 15:23 ` [PATCH 08/12] bloom: use provided 'struct bloom_filter_settings' Taylor Blau
2020-09-09 15:23 ` [PATCH 09/12] bloom/diff: properly short-circuit on max_changes Taylor Blau
2020-09-09 15:23 ` [PATCH 10/12] bloom: encode out-of-bounds filters as non-empty Taylor Blau
2020-09-10  3:35   ` Taylor Blau
2020-09-10 15:45     ` Taylor Blau
2020-09-11 18:15       ` Derrick Stolee
2020-09-09 15:23 ` [PATCH 11/12] commit-graph: rename 'split_commit_graph_opts' Taylor Blau
2020-09-09 15:24 ` [PATCH 12/12] builtin/commit-graph.c: introduce '--max-new-filters=<n>' Taylor Blau
2020-09-11 17:52   ` Jeff King
2020-09-11 18:59     ` Taylor Blau
2020-09-11 19:25       ` Taylor Blau
2020-09-14 20:12         ` Taylor Blau
2020-09-14 20:31           ` Derrick Stolee
2020-09-14 20:36             ` Taylor Blau
2020-09-15  0:59               ` Derrick Stolee
2020-09-15  4:31                 ` Taylor Blau
2020-09-15 21:49               ` Junio C Hamano
2020-09-15 21:53                 ` Taylor Blau
2020-09-11 19:47       ` Jeff King
2020-09-11 19:31     ` Junio C Hamano
2020-09-16 18:06 ` [PATCH v2 00/13] more miscellaneous Bloom filter improvements, redux Taylor Blau
2020-09-16 18:06   ` [PATCH v2 01/13] commit-graph: introduce 'get_bloom_filter_settings()' Taylor Blau
2020-09-16 18:07   ` [PATCH v2 02/13] t4216: use an '&&'-chain Taylor Blau
2020-09-16 18:07   ` [PATCH v2 03/13] commit-graph: pass a 'struct repository *' in more places Taylor Blau
2020-09-16 18:07   ` [PATCH v2 04/13] t/helper/test-read-graph.c: prepare repo settings Taylor Blau
2020-09-16 18:07   ` [PATCH v2 05/13] commit-graph: respect 'commitGraph.readChangedPaths' Taylor Blau
2020-09-16 18:07   ` [PATCH v2 06/13] commit-graph.c: store maximum changed paths Taylor Blau
2020-09-16 18:07   ` [PATCH v2 07/13] bloom: split 'get_bloom_filter()' in two Taylor Blau
2020-09-16 18:07   ` [PATCH v2 08/13] bloom: use provided 'struct bloom_filter_settings' Taylor Blau
2020-09-16 18:07   ` [PATCH v2 09/13] bloom/diff: properly short-circuit on max_changes Taylor Blau
2020-09-16 18:07   ` [PATCH v2 10/13] bloom: encode out-of-bounds filters as non-empty Taylor Blau
2020-09-17 22:13     ` SZEDER Gábor
2020-09-17 23:13       ` Taylor Blau
2020-09-18  0:52         ` Junio C Hamano
2020-09-18  1:15           ` Taylor Blau
2020-09-16 18:08   ` [PATCH v2 11/13] commit-graph: rename 'split_commit_graph_opts' Taylor Blau
2020-09-16 18:08   ` [PATCH v2 12/13] builtin/commit-graph.c: introduce '--max-new-filters=<n>' Taylor Blau
2020-09-18  9:23     ` SZEDER Gábor
2020-09-18 13:27       ` Taylor Blau
2020-09-16 18:08   ` [PATCH v2 13/13] commit-graph: introduce 'commitGraph.maxNewFilters' Taylor Blau
2020-09-16 22:51   ` [PATCH v2 00/13] more miscellaneous Bloom filter improvements, redux Derrick Stolee
2020-09-16 23:07     ` Junio C Hamano
2020-09-17  0:45       ` Taylor Blau
2020-09-17  0:59         ` Junio C Hamano
2020-09-17  1:10           ` Taylor Blau
2020-09-17 13:34             ` Taylor Blau [this message]
2020-09-17 13:38               ` Derrick Stolee
2020-09-18  2:58 ` [PATCH v3 " Taylor Blau
2020-09-18  2:58   ` [PATCH v3 01/13] commit-graph: introduce 'get_bloom_filter_settings()' Taylor Blau
2020-09-18  2:58   ` [PATCH v3 02/13] t4216: use an '&&'-chain Taylor Blau
2020-09-18  2:59   ` [PATCH v3 03/13] commit-graph: pass a 'struct repository *' in more places Taylor Blau
2020-09-18  2:59   ` [PATCH v3 04/13] t/helper/test-read-graph.c: prepare repo settings Taylor Blau
2020-09-18  2:59   ` [PATCH v3 05/13] commit-graph: respect 'commitGraph.readChangedPaths' Taylor Blau
2020-09-18  2:59   ` [PATCH v3 06/13] commit-graph.c: store maximum changed paths Taylor Blau
2020-09-18  2:59   ` [PATCH v3 07/13] bloom: split 'get_bloom_filter()' in two Taylor Blau
2020-09-18  2:59   ` [PATCH v3 08/13] bloom: use provided 'struct bloom_filter_settings' Taylor Blau
2020-09-18 16:27     ` SZEDER Gábor
2020-09-18 16:32       ` Taylor Blau
2020-09-18  2:59   ` [PATCH v3 09/13] bloom/diff: properly short-circuit on max_changes Taylor Blau
2020-09-18  2:59   ` [PATCH v3 10/13] bloom: encode out-of-bounds filters as non-empty Taylor Blau
2020-09-18  2:59   ` [PATCH v3 11/13] commit-graph: rename 'split_commit_graph_opts' Taylor Blau
2020-09-18  2:59   ` [PATCH v3 12/13] builtin/commit-graph.c: introduce '--max-new-filters=<n>' Taylor Blau
2020-09-18  2:59   ` [PATCH v3 13/13] commit-graph: introduce 'commitGraph.maxNewFilters' Taylor Blau
2020-09-18 13:29     ` Taylor Blau
2020-09-18 17:43       ` Junio C Hamano
2020-09-18 13:31   ` [PATCH v3 00/13] more miscellaneous Bloom filter improvements, redux Taylor Blau
2020-09-18 13:34     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200917133442.GA12879@nand.local \
    --to=me@ttaylorr.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    --cc=szeder.dev@gmail.com \
    --subject='Re: [PATCH v2 00/13] more miscellaneous Bloom filter improvements, redux' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).