From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Elijah Newren <newren@gmail.com>,
	"Eric W. Biederman" <ebiederm@gmail.com>,
	Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>,
	Patrick Steinhardt <ps@pks.im>
Subject: [PATCH v4 00/17] bloom: changed-path Bloom filters v2 (& sundries)
Date: Wed, 18 Oct 2023 14:32:22 -0400
Message-ID: <cover.1697653929.git.me@ttaylorr.com>
In-Reply-To: <cover.1696629697.git.me@ttaylorr.com>

(Rebased onto the tip of 'master', which is 3a06386e31 (The fifteenth
batch, 2023-10-04), at the time of writing).

This series is a reroll of the combined efforts of [1] and [2] to
introduce the v2 changed-path Bloom filters, which fix a bug in our
existing implementation of murmur3 when hashing paths with non-ASCII
characters (when the "char" type is signed).

In large part, this is the same as the previous round. But this round
includes some extra bits that address issues pointed out by SZEDER
Gábor, which are:

  - not reading Bloom filters for root commits
  - corrupting Bloom filter reads by tweaking the filter settings
    between layers.

These issues were discussed in (among other places) [3] and [4],
respectively.

Thanks to Jonathan, Peff, and SZEDER, who have helped a great deal in
assembling these patches. As usual, a range-diff is included below.
Thanks in advance for your review!

[1]: https://lore.kernel.org/git/cover.1684790529.git.jonathantanmy@google.com/
[2]: https://lore.kernel.org/git/cover.1691426160.git.me@ttaylorr.com/
[3]: https://public-inbox.org/git/20201015132147.GB24954@szeder.dev/
[4]: https://lore.kernel.org/git/20230830200218.GA5147@szeder.dev/

Jonathan Tan (4):
  gitformat-commit-graph: describe version 2 of BDAT
  t4216: test changed path filters with high bit paths
  repo-settings: introduce commitgraph.changedPathsVersion
  commit-graph: new filter ver. that fixes murmur3

Taylor Blau (13):
  t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
  revision.c: consult Bloom filters for root commits
  commit-graph: ensure Bloom filters are read with consistent settings
  t/helper/test-read-graph.c: extract `dump_graph_info()`
  bloom.h: make `load_bloom_filter_from_graph()` public
  t/helper/test-read-graph: implement `bloom-filters` mode
  bloom: annotate filters with hash version
  bloom: prepare to discard incompatible Bloom filters
  commit-graph.c: unconditionally load Bloom filters
  commit-graph: drop unnecessary `graph_read_bloom_data_context`
  object.h: fix mis-aligned flag bits table
  commit-graph: reuse existing Bloom filters where possible
  bloom: introduce `deinit_bloom_filters()`

 Documentation/config/commitgraph.txt     |  26 ++-
 Documentation/gitformat-commit-graph.txt |   9 +-
 bloom.c                                  | 208 ++++++++++++++++-
 bloom.h                                  |  38 ++-
 commit-graph.c                           |  61 ++++-
 object.h                                 |   3 +-
 oss-fuzz/fuzz-commit-graph.c             |   2 +-
 repo-settings.c                          |   6 +-
 repository.h                             |   2 +-
 revision.c                               |  26 ++-
 t/helper/test-bloom.c                    |   9 +-
 t/helper/test-read-graph.c               |  67 ++++--
 t/t0095-bloom.sh                         |   8 +
 t/t4216-log-bloom.sh                     | 282 ++++++++++++++++++++++-
 14 files changed, 692 insertions(+), 55 deletions(-)

Range-diff against v3:
 1:  fe671d616c =  1:  e0fc51c3fb t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
 2:  7d0fa93543 =  2:  87b09e6266 revision.c: consult Bloom filters for root commits
 3:  2ecc0a2d58 !  3:  46d8a41005 commit-graph: ensure Bloom filters are read with consistent settings
    @@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills empty comm
     +	done
     +'
     +
    -+test_expect_success 'split' '
    ++test_expect_success 'ensure incompatible Bloom filters are ignored' '
     +	# Compute Bloom filters with "unusual" settings.
     +	git -C $repo rev-parse one >in &&
     +	GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=3 git -C $repo commit-graph write \
    @@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills empty comm
     +
     +test_expect_success 'merge graph layers with incompatible Bloom settings' '
     +	# Ensure that incompatible Bloom filters are ignored when
    -+	# generating new layers.
    ++	# merging existing layers.
     +	git -C $repo commit-graph write --reachable --changed-paths 2>err &&
     +	grep "disabling Bloom filters for commit-graph layer .$layer." err &&
     +
     +	test_path_is_file $repo/$graph &&
     +	test_dir_is_empty $repo/$graphdir &&
     +
    -+	# ...and merging existing ones.
    -+	git -C $repo -c core.commitGraph=false log --oneline --no-decorate -- file \
    -+		>expect 2>err &&
    -+	GIT_TRACE2_PERF="$(pwd)/trace.perf" \
    ++	git -C $repo -c core.commitGraph=false log --oneline --no-decorate -- \
    ++		file >expect &&
    ++	trace_out="$(pwd)/trace.perf" &&
    ++	GIT_TRACE2_PERF="$trace_out" \
     +		git -C $repo log --oneline --no-decorate -- file >actual 2>err &&
     +
    -+	test_cmp expect actual && cat err &&
    -+	grep "statistics:{\"filter_not_present\":0" trace.perf &&
    -+	! grep "disabling Bloom filters" err
    ++	test_cmp expect actual &&
    ++	grep "statistics:{\"filter_not_present\":0," trace.perf &&
    ++	test_must_be_empty err
     +'
     +
      test_done
 4:  17703ed89a =  4:  4d0190a992 gitformat-commit-graph: describe version 2 of BDAT
 5:  94552abf45 =  5:  3c2057c11c t/helper/test-read-graph.c: extract `dump_graph_info()`
 6:  3d81efa27b =  6:  e002e35004 bloom.h: make `load_bloom_filter_from_graph()` public
 7:  d23cd89037 =  7:  c7016f51cd t/helper/test-read-graph: implement `bloom-filters` mode
 8:  cba766f224 !  8:  cef2aac8ba t4216: test changed path filters with high bit paths
    @@ Commit message
     
      ## t/t4216-log-bloom.sh ##
     @@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible Bloom settings' '
    - 	! grep "disabling Bloom filters" err
    + 	test_must_be_empty err
      '
      
     +get_first_changed_path_filter () {
    @@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible
     +	(
     +		cd highbit1 &&
     +		echo "52a9" >expect &&
    -+		get_first_changed_path_filter >actual &&
    -+		test_cmp expect actual
    ++		get_first_changed_path_filter >actual
     +	)
     +'
     +
 9:  a08a961f41 =  9:  36d4e2202e repo-settings: introduce commitgraph.changedPathsVersion
10:  61d44519a5 ! 10:  f6ab427ead commit-graph: new filter ver. that fixes murmur3
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +	test_commit -C doublewrite c "$CENT" &&
     +	git -C doublewrite config --add commitgraph.changedPathsVersion 1 &&
     +	git -C doublewrite commit-graph write --reachable --changed-paths &&
    ++	for v in -2 3
    ++	do
    ++		git -C doublewrite config --add commitgraph.changedPathsVersion $v &&
    ++		git -C doublewrite commit-graph write --reachable --changed-paths 2>err &&
    ++		cat >expect <<-EOF &&
    ++		warning: attempting to write a commit-graph, but ${SQ}commitgraph.changedPathsVersion${SQ} ($v) is not supported
    ++		EOF
    ++		test_cmp expect err || return 1
    ++	done &&
     +	git -C doublewrite config --add commitgraph.changedPathsVersion 2 &&
     +	git -C doublewrite commit-graph write --reachable --changed-paths &&
     +	(
11:  a8c10f8de8 = 11:  dc69b28329 bloom: annotate filters with hash version
12:  2ba10a4b4b = 12:  85dbdc4ed2 bloom: prepare to discard incompatible Bloom filters
13:  09d8669c3a = 13:  3ff669a622 commit-graph.c: unconditionally load Bloom filters
14:  0d4f9dc4ee = 14:  1c78e3d178 commit-graph: drop unnecessary `graph_read_bloom_data_context`
15:  1f7f27bc47 = 15:  a289514faa object.h: fix mis-aligned flag bits table
16:  abbef95ae8 ! 16:  6a12e39e7f commit-graph: reuse existing Bloom filters where possible
    @@ t/t4216-log-bloom.sh: test_expect_success 'when writing another commit graph, pr
      	test_commit -C doublewrite c "$CENT" &&
     +
      	git -C doublewrite config --add commitgraph.changedPathsVersion 1 &&
    --	git -C doublewrite commit-graph write --reachable --changed-paths &&
     +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
     +		git -C doublewrite commit-graph write --reachable --changed-paths &&
     +	test_filter_computed 1 trace2.txt &&
     +	test_filter_upgraded 0 trace2.txt &&
    ++
    + 	git -C doublewrite commit-graph write --reachable --changed-paths &&
    + 	for v in -2 3
    + 	do
    +@@ t/t4216-log-bloom.sh: test_expect_success 'when writing commit graph, do not reuse changed-path of ano
    + 		EOF
    + 		test_cmp expect err || return 1
    + 	done &&
     +
      	git -C doublewrite config --add commitgraph.changedPathsVersion 2 &&
     -	git -C doublewrite commit-graph write --reachable --changed-paths &&
17:  ca362408d5 ! 17:  8942f205c8 bloom: introduce `deinit_bloom_filters()`
    @@ bloom.h: void add_key_to_filter(const struct bloom_key *key,
      	BLOOM_NOT_COMPUTED = (1 << 0),
     
      ## commit-graph.c ##
    -@@ commit-graph.c: static void close_commit_graph_one(struct commit_graph *g)
    +@@ commit-graph.c: struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r)
      void close_commit_graph(struct raw_object_store *o)
      {
    - 	close_commit_graph_one(o->commit_graph);
    + 	clear_commit_graph_data_slab(&commit_graph_data_slab);
     +	deinit_bloom_filters();
    + 	free_commit_graph(o->commit_graph);
      	o->commit_graph = NULL;
      }
    - 
     @@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
      
      	res = write_commit_graph_file(ctx);
-- 
2.42.0.415.g8942f205c8

