git@vger.kernel.org mailing list mirror (one of many)
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: peff@peff.net, dstolee@microsoft.com, szeder.dev@gmail.com,
	gitster@pobox.com
Subject: [PATCH v3 00/14] more miscellaneous Bloom filter improvements
Date: Tue, 11 Aug 2020 16:51:13 -0400
Message-ID: <cover.1597178914.git.me@ttaylorr.com>
In-Reply-To: <cover.1596480582.git.me@ttaylorr.com>

Hi,

Here's a(nother) re-roll of Stolee's and my series to introduce the
new BFXL commit-graph chunk, along with the '--max-new-filters' option
to 'git commit-graph write'.

Not much has changed since v2, other than rebasing onto the latest
from master (the fifth 2.29 batch, at the time of writing) and
squashing in a few fixups that I sent in response to my v2 series.

Hopefully this is ready for queueing. (Stolee has looked at this a lot
off-list, but it would be great to get an ack from him on-list, too.)

Derrick Stolee (1):
  bloom/diff: properly short-circuit on max_changes

Taylor Blau (13):
  commit-graph: introduce 'get_bloom_filter_settings()'
  t4216: use an '&&'-chain
  commit-graph: pass a 'struct repository *' in more places
  t/helper/test-read-graph.c: prepare repo settings
  commit-graph: respect 'commitGraph.readChangedPaths'
  commit-graph.c: store maximum changed paths
  bloom: split 'get_bloom_filter()' in two
  bloom: use provided 'struct bloom_filter_settings'
  commit-graph.c: sort index into commits list
  csum-file.h: introduce 'hashwrite_be64()'
  commit-graph: add large-filters bitmap chunk
  commit-graph: rename 'split_commit_graph_opts'
  builtin/commit-graph.c: introduce '--max-new-filters=<n>'

 Documentation/config.txt                      |   2 +
 Documentation/config/commitgraph.txt          |   8 +
 Documentation/git-commit-graph.txt            |   4 +
 .../technical/commit-graph-format.txt         |  12 +
 blame.c                                       |   8 +-
 bloom.c                                       |  51 +++-
 bloom.h                                       |  22 +-
 builtin/commit-graph.c                        |  61 +++-
 commit-graph.c                                | 274 +++++++++++++-----
 commit-graph.h                                |  19 +-
 csum-file.h                                   |   6 +
 diff.h                                        |   2 -
 fuzz-commit-graph.c                           |   5 +-
 line-log.c                                    |   2 +-
 midx.c                                        |   3 +-
 repo-settings.c                               |   3 +
 repository.h                                  |   1 +
 revision.c                                    |   7 +-
 t/helper/test-bloom.c                         |   4 +-
 t/helper/test-read-graph.c                    |   3 +-
 t/t4216-log-bloom.sh                          | 148 ++++++++--
 t/t5324-split-commit-graph.sh                 |  13 +
 tree-diff.c                                   |   5 +-
 23 files changed, 522 insertions(+), 141 deletions(-)
 create mode 100644 Documentation/config/commitgraph.txt

Range-diff against v2:
 [ ... rebase onto 'master' ... ]
 1:  001f3385ff = 34:  e714e54240 commit-graph: introduce 'get_bloom_filter_settings()'
 2:  e4d068a478 = 35:  9fc8b17d6f t4216: use an '&&'-chain
 3:  afdc614c0d = 36:  8dbe4838b7 commit-graph: pass a 'struct repository *' in more places
 4:  038e996ced = 37:  f59db1e30d t/helper/test-read-graph.c: prepare repo settings
 5:  404f10319a = 38:  daae6788c0 commit-graph: respect 'commitGraph.readChangedPaths'
 6:  053991f048 = 39:  bf498844ef commit-graph.c: store maximum changed paths
 7:  23525947c8 ! 40:  eba2794873 bloom: split 'get_bloom_filter()' in two
    @@ bloom.h: void add_key_to_filter(const struct bloom_key *key,
     +						 int compute_if_not_present,
     +						 int *computed);
     +
    -+#define DEFAULT_BLOOM_MAX_CHANGES 512
     +#define get_bloom_filter(r, c) get_or_compute_bloom_filter( \
     +	(r), (c), 0, NULL)

 8:  4deb724fc1 ! 41:  4f08177dbe bloom: use provided 'struct bloom_filter_settings'
    @@ bloom.h: void init_bloom_filters(void);
     +						 const struct bloom_filter_settings *settings,
      						 int *computed);

    - #define DEFAULT_BLOOM_MAX_CHANGES 512
      #define get_bloom_filter(r, c) get_or_compute_bloom_filter( \
     -	(r), (c), 0, NULL)
     +	(r), (c), 0, NULL, NULL)
 9:  d1c4bbcaa9 = 42:  cc1dc8b121 bloom/diff: properly short-circuit on max_changes
10:  e92ccafcf7 = 43:  23fd52c3b8 commit-graph.c: sort index into commits list
11:  c42d678714 = 44:  4800cd373e csum-file.h: introduce 'hashwrite_be64()'
12:  100b26d7c8 ! 45:  619e0c619d commit-graph: add large-filters bitmap chunk
    @@ Commit message
         To allow using the existing bitmap code with 64-bit words, we write the
         data in network byte order from the 64-bit words. This means we also
         need to read the array from the commit-graph file by translating each
    -    word from network byte order using get_be64() upon first use of the
    -    bitmap. This is only used when writing the commit-graph, so this is a
    -    relatively small operation compared to the other writing code.
    +    word from network byte order using get_be64() when loading the commit
    +    graph. (Note that this *could* be delayed until first-use, but a later
    +    patch will rely on this being initialized early, so we assume the
    +    up-front cost when parsing instead of delaying initialization).

         By avoiding the need to move to new versions of the BDAT and BIDX chunk,
         we can give ourselves more time to consider whether or not other
    @@ commit-graph.c: struct commit_graph *parse_commit_graph(struct repository *r,
     +			if (graph->chunk_bloom_large_filters)
     +				chunk_repeated = 1;
     +			else if (r->settings.commit_graph_read_changed_paths) {
    -+				graph->bloom_large_to_alloc = get_be64(chunk_lookup + 4)
    -+							      - chunk_offset - sizeof(uint32_t);
    -+
    -+				graph->bloom_large.word_alloc = 0; /* populate when necessary */
    ++				size_t alloc = get_be64(chunk_lookup + 4) - chunk_offset - sizeof(uint32_t);
     +				graph->chunk_bloom_large_filters = data + chunk_offset + sizeof(uint32_t);
     +				graph->bloom_filter_settings->max_changed_paths = get_be32(data + chunk_offset);
    ++				if (alloc) {
    ++					size_t j;
    ++					graph->bloom_large = bitmap_word_alloc(alloc);
    ++
    ++					for (j = 0; j < graph->bloom_large->word_alloc; j++)
    ++						graph->bloom_large->words[j] = get_be64(
    ++							graph->chunk_bloom_large_filters + j * sizeof(eword_t));
    ++				}
     +			}
     +			break;
      		}
    @@ commit-graph.c: struct commit_graph *parse_commit_graph(struct repository *r,
      		graph->chunk_bloom_data = NULL;
     +		graph->chunk_bloom_large_filters = NULL;
      		FREE_AND_NULL(graph->bloom_filter_settings);
    ++		bitmap_free(graph->bloom_large);
      	}

    + 	hashcpy(graph->oid.hash, graph->data + graph->data_len - graph->hash_len);
     @@ commit-graph.c: struct tree *get_commit_tree_in_graph(struct repository *r, const struct commit
      	return get_commit_tree_in_graph_one(r, r->objects->commit_graph, c);
      }
    @@ commit-graph.c: struct tree *get_commit_tree_in_graph(struct repository *r, cons
     +	while (g && graph_pos < g->num_commits_in_base)
     +		g = g->base_graph;
     +
    -+	if (!g || !g->bloom_large_to_alloc)
    ++	if (!(g && g->bloom_large))
     +		return 0;
    -+
    -+	if (!g->bloom_large.word_alloc) {
    -+		size_t i;
    -+		g->bloom_large.word_alloc = g->bloom_large_to_alloc;
    -+		g->bloom_large.words = xmalloc(g->bloom_large_to_alloc * sizeof(eword_t));
    -+
    -+		for (i = 0; i < g->bloom_large_to_alloc; i++)
    -+			g->bloom_large.words[i] = get_be64(g->chunk_bloom_large_filters
    -+							   + i * sizeof(eword_t));
    -+	}
    -+
    -+	return bitmap_get(&g->bloom_large, graph_pos - g->num_commits_in_base);
    ++	return bitmap_get(g->bloom_large, graph_pos - g->num_commits_in_base);
     +}

      struct packed_oid_list {
    @@ commit-graph.h: struct commit_graph {
      	const unsigned char *chunk_bloom_data;
     +	const unsigned char *chunk_bloom_large_filters;
     +
    -+	size_t bloom_large_to_alloc;
    -+	struct bitmap bloom_large;
    ++	struct bitmap *bloom_large;

      	struct bloom_filter_settings *bloom_filter_settings;
      };
13:  2ee0b84351 = 46:  b2e33ecba8 commit-graph: rename 'split_commit_graph_opts'
14:  3b66ae4a9c ! 47:  09f6871f66 builtin/commit-graph.c: introduce '--max-new-filters=<n>'
    @@ bloom.c: static int load_bloom_filter_from_graph(struct commit_graph *g,
      		start_index = 0;

     +	if ((start_index == end_index) &&
    -+	    (g->bloom_large.word_alloc && !bitmap_get(&g->bloom_large, lex_pos))) {
    ++	    (g->bloom_large && !bitmap_get(g->bloom_large, lex_pos))) {
     +		/*
     +		 * If the filter is zero-length, either (1) the filter has no
     +		 * changes, (2) the filter has too many changes, or (3) it
    @@ commit-graph.c: struct tree *get_commit_tree_in_graph(struct repository *r, cons
      {
      	uint32_t graph_pos = commit_graph_position(c);
      	if (graph_pos == COMMIT_NOT_FROM_GRAPH)
    +@@ commit-graph.c: static int get_bloom_filter_large_in_graph(struct commit_graph *g,
    +
    + 	if (!(g && g->bloom_large))
    + 		return 0;
    ++	if (g->bloom_filter_settings->max_changed_paths != max_changed_paths) {
    ++		/*
    ++		 * Force all commits which are subject to a different
    ++		 * 'max_changed_paths' limit to be recomputed from scratch.
    ++		 *
    ++		 * Note that this could likely be improved, but is ignored since
    ++		 * all real-world graphs set the maximum number of changed paths
    ++		 * at 512.
    ++		 */
    ++		return 0;
    ++	}
    + 	return bitmap_get(g->bloom_large, graph_pos - g->num_commits_in_base);
    + }
    +
     @@ commit-graph.c: static void compute_bloom_filters(struct write_commit_graph_context *ctx)
      	int i;
      	struct progress *progress = NULL;
--
2.28.0.rc1.13.ge78abce653
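
The range-diff for patch 12 moves the byte-order translation of the
large-filters bitmap to commit-graph load time, and patch 14 makes the
lookup return 0 when the stored 'max_changed_paths' differs from the
requested one. The following is a minimal, standalone sketch of those
two ideas, not the code from the series. Every 'sketch_*' name is
hypothetical; the struct only stands in for Git's 'struct bitmap', and
the bit layout (bit 'pos % 64' of word 'pos / 64') is an assumption
about how bitmap_get() indexes. The loader takes a word count directly
rather than deriving it from chunk offsets as the patch does.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for Git's 'struct bitmap'. */
struct sketch_bitmap {
	uint64_t *words;
	size_t word_alloc;
};

/* Decode one 64-bit value stored in network (big-endian) byte order. */
static uint64_t sketch_get_be64(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Eagerly translate 'nr_words' on-disk words into host byte order.
 * Returns NULL for an empty chunk or on allocation failure.
 */
static struct sketch_bitmap *sketch_load_bitmap(const unsigned char *chunk,
						size_t nr_words)
{
	struct sketch_bitmap *b;
	size_t j;

	if (!nr_words)
		return NULL;
	b = malloc(sizeof(*b));
	if (!b)
		return NULL;
	b->words = malloc(nr_words * sizeof(uint64_t));
	if (!b->words) {
		free(b);
		return NULL;
	}
	b->word_alloc = nr_words;
	for (j = 0; j < nr_words; j++)
		b->words[j] = sketch_get_be64(chunk + j * sizeof(uint64_t));
	return b;
}

/* Test bit 'pos'; a missing bitmap or out-of-range 'pos' reads as 0. */
static int sketch_bitmap_get(const struct sketch_bitmap *b, size_t pos)
{
	if (!b || pos / 64 >= b->word_alloc)
		return 0;
	return (int)((b->words[pos / 64] >> (pos % 64)) & 1);
}

/*
 * Report whether the commit at 'pos' is marked as having a too-large
 * filter. A mismatch between the stored and requested limits yields 0,
 * which forces the caller to recompute the filter from scratch.
 */
static int sketch_filter_is_large(const struct sketch_bitmap *b,
				  uint32_t stored_max, uint32_t requested_max,
				  size_t pos)
{
	if (!b)
		return 0;
	if (stored_max != requested_max)
		return 0;
	return sketch_bitmap_get(b, pos);
}

static void sketch_bitmap_free(struct sketch_bitmap *b)
{
	if (!b)
		return;
	free(b->words);
	free(b);
}
```

Loading up front costs one pass over the chunk at parse time, but every
later lookup reduces to a NULL check and a bit test, which is the
trade-off the updated commit message describes.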
