From: "Garima Singh via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: stolee@gmail.com, szeder.dev@gmail.com, jonathantanmy@google.com,
Garima Singh <garima.singh@microsoft.com>,
Garima Singh <garima.singh@microsoft.com>
Subject: [PATCH v3 06/16] commit-graph: compute Bloom filters for changed paths
Date: Mon, 30 Mar 2020 00:31:28 +0000 [thread overview]
Message-ID: <c38b9b386ef246cd3144ec5ede991c3cad32f3d9.1585528298.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.497.v3.git.1585528298.gitgitgadget@gmail.com>
From: Garima Singh <garima.singh@microsoft.com>
Add new COMMIT_GRAPH_WRITE_CHANGED_PATHS flag that makes Git compute
Bloom filters for the paths that changed between a commit and it's
first parent, for each commit in the commit-graph. This computation
is done on a commit-by-commit basis.
We will write these Bloom filters to the commit-graph file, to store
this data on disk, in the next change in this series.
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
---
commit-graph.c | 32 +++++++++++++++++++++++++++++++-
commit-graph.h | 3 ++-
2 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/commit-graph.c b/commit-graph.c
index e4f1a5b2f1a..862a00d67ed 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -16,6 +16,7 @@
#include "hashmap.h"
#include "replace-object.h"
#include "progress.h"
+#include "bloom.h"
#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
@@ -789,9 +790,11 @@ struct write_commit_graph_context {
unsigned append:1,
report_progress:1,
split:1,
- check_oids:1;
+ check_oids:1,
+ changed_paths:1;
const struct split_commit_graph_opts *split_opts;
+ size_t total_bloom_filter_data_size;
};
static void write_graph_chunk_fanout(struct hashfile *f,
@@ -1134,6 +1137,28 @@ static void compute_generation_numbers(struct write_commit_graph_context *ctx)
stop_progress(&ctx->progress);
}
+static void compute_bloom_filters(struct write_commit_graph_context *ctx)
+{
+ int i;
+ struct progress *progress = NULL;
+
+ init_bloom_filters();
+
+ if (ctx->report_progress)
+ progress = start_delayed_progress(
+ _("Computing commit changed paths Bloom filters"),
+ ctx->commits.nr);
+
+ for (i = 0; i < ctx->commits.nr; i++) {
+ struct commit *c = ctx->commits.list[i];
+ struct bloom_filter *filter = get_bloom_filter(ctx->r, c);
+ ctx->total_bloom_filter_data_size += sizeof(unsigned char) * filter->len;
+ display_progress(progress, i + 1);
+ }
+
+ stop_progress(&progress);
+}
+
static int add_ref_to_list(const char *refname,
const struct object_id *oid,
int flags, void *cb_data)
@@ -1776,6 +1801,8 @@ int write_commit_graph(struct object_directory *odb,
ctx->split = flags & COMMIT_GRAPH_WRITE_SPLIT ? 1 : 0;
ctx->check_oids = flags & COMMIT_GRAPH_WRITE_CHECK_OIDS ? 1 : 0;
ctx->split_opts = split_opts;
+ ctx->changed_paths = flags & COMMIT_GRAPH_WRITE_BLOOM_FILTERS ? 1 : 0;
+ ctx->total_bloom_filter_data_size = 0;
if (ctx->split) {
struct commit_graph *g;
@@ -1870,6 +1897,9 @@ int write_commit_graph(struct object_directory *odb,
compute_generation_numbers(ctx);
+ if (ctx->changed_paths)
+ compute_bloom_filters(ctx);
+
res = write_commit_graph_file(ctx);
if (ctx->split)
diff --git a/commit-graph.h b/commit-graph.h
index e87a6f63600..86be81219da 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -79,7 +79,8 @@ enum commit_graph_write_flags {
COMMIT_GRAPH_WRITE_PROGRESS = (1 << 1),
COMMIT_GRAPH_WRITE_SPLIT = (1 << 2),
/* Make sure that each OID in the input is a valid commit OID. */
- COMMIT_GRAPH_WRITE_CHECK_OIDS = (1 << 3)
+ COMMIT_GRAPH_WRITE_CHECK_OIDS = (1 << 3),
+ COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 4),
};
struct split_commit_graph_opts {
--
gitgitgadget
next prev parent reply other threads:[~2020-03-30 0:31 UTC|newest]
Thread overview: 159+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-12-20 22:05 [PATCH 0/9] [RFC] Changed Paths Bloom Filters Garima Singh via GitGitGadget
2019-12-20 22:05 ` [PATCH 1/9] commit-graph: add --changed-paths option to write Garima Singh via GitGitGadget
2020-01-01 20:20 ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 2/9] commit-graph: write changed paths bloom filters Garima Singh via GitGitGadget
2019-12-21 16:48 ` Philip Oakley
2020-01-06 18:44 ` Jakub Narebski
2020-01-13 19:48 ` Garima Singh
2019-12-20 22:05 ` [PATCH 3/9] commit-graph: use MAX_NUM_CHUNKS Garima Singh via GitGitGadget
2020-01-07 12:19 ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 4/9] commit-graph: document bloom filter format Garima Singh via GitGitGadget
2020-01-07 14:46 ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 5/9] commit-graph: write changed path bloom filters to commit-graph file Garima Singh via GitGitGadget
2020-01-07 16:01 ` Jakub Narebski
2020-01-14 15:14 ` Garima Singh
2019-12-20 22:05 ` [PATCH 6/9] commit-graph: test commit-graph write --changed-paths Garima Singh via GitGitGadget
2020-01-08 0:32 ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 7/9] commit-graph: reuse existing bloom filters during write Garima Singh via GitGitGadget
2020-01-09 19:12 ` Jakub Narebski
2019-12-20 22:05 ` [PATCH 8/9] revision.c: use bloom filters to speed up path based revision walks Garima Singh via GitGitGadget
2020-01-11 0:27 ` Jakub Narebski
2020-01-15 0:08 ` Garima Singh
2019-12-20 22:05 ` [PATCH 9/9] commit-graph: add GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS test flag Garima Singh via GitGitGadget
2020-01-11 19:56 ` Jakub Narebski
2020-01-15 0:55 ` Garima Singh
2019-12-20 22:14 ` [PATCH 0/9] [RFC] Changed Paths Bloom Filters Junio C Hamano
2019-12-22 9:26 ` Christian Couder
2019-12-22 9:38 ` Jeff King
2020-01-01 12:04 ` Jakub Narebski
2019-12-22 9:30 ` Jeff King
2019-12-22 9:32 ` [PATCH 1/3] commit-graph: examine changed-path objects in pack order Jeff King
2019-12-27 14:51 ` Derrick Stolee
2019-12-29 6:12 ` Jeff King
2019-12-29 6:28 ` Jeff King
2019-12-30 14:37 ` Derrick Stolee
2019-12-30 14:51 ` Derrick Stolee
2019-12-22 9:32 ` [PATCH 2/3] commit-graph: free large diffs, too Jeff King
2019-12-27 14:52 ` Derrick Stolee
2019-12-22 9:32 ` [PATCH 3/3] commit-graph: stop using full rev_info for diffs Jeff King
2019-12-27 14:53 ` Derrick Stolee
2019-12-26 14:21 ` [PATCH 0/9] [RFC] Changed Paths Bloom Filters Derrick Stolee
2019-12-29 6:03 ` Jeff King
2019-12-27 16:11 ` Derrick Stolee
2019-12-29 6:24 ` Jeff King
2019-12-30 16:04 ` Derrick Stolee
2019-12-30 17:02 ` Junio C Hamano
2019-12-31 16:45 ` Jakub Narebski
2020-01-13 16:54 ` Garima Singh
2020-01-20 13:48 ` Jakub Narebski
2020-01-21 16:14 ` Garima Singh
2020-02-02 18:43 ` Jakub Narebski
2020-01-21 23:40 ` Emily Shaffer
2020-01-27 18:24 ` Garima Singh
2020-02-01 23:32 ` Jakub Narebski
2020-02-05 22:56 ` [PATCH v2 00/11] " Garima Singh via GitGitGadget
2020-02-05 22:56 ` [PATCH v2 01/11] commit-graph: use MAX_NUM_CHUNKS Garima Singh via GitGitGadget
2020-02-09 12:39 ` Jakub Narebski
2020-02-05 22:56 ` [PATCH v2 02/11] bloom: core Bloom filter implementation for changed paths Garima Singh via GitGitGadget
2020-02-15 17:17 ` Jakub Narebski
2020-02-16 16:49 ` Jakub Narebski
2020-02-22 0:32 ` Garima Singh
2020-02-23 13:38 ` Jakub Narebski
2020-02-24 17:34 ` Garima Singh
2020-02-24 18:20 ` Jakub Narebski
2020-02-05 22:56 ` [PATCH v2 03/11] diff: halt tree-diff early after max_changes Derrick Stolee via GitGitGadget
2020-02-17 0:00 ` Jakub Narebski
2020-02-22 0:37 ` Garima Singh
2020-02-05 22:56 ` [PATCH v2 04/11] commit-graph: compute Bloom filters for changed paths Garima Singh via GitGitGadget
2020-02-17 21:56 ` Jakub Narebski
2020-02-22 0:55 ` Garima Singh
2020-02-23 17:34 ` Jakub Narebski
2020-02-05 22:56 ` [PATCH v2 05/11] commit-graph: examine changed-path objects in pack order Jeff King via GitGitGadget
2020-02-18 17:59 ` Jakub Narebski
2020-02-24 18:29 ` Garima Singh
2020-02-05 22:56 ` [PATCH v2 06/11] commit-graph: examine commits by generation number Derrick Stolee via GitGitGadget
2020-02-19 0:32 ` Jakub Narebski
2020-02-24 20:45 ` Garima Singh
2020-02-05 22:56 ` [PATCH v2 07/11] commit-graph: write Bloom filters to commit graph file Garima Singh via GitGitGadget
2020-02-19 15:13 ` Jakub Narebski
2020-02-24 21:14 ` Garima Singh
2020-02-25 11:40 ` Jakub Narebski
2020-02-25 15:58 ` Garima Singh
2020-02-05 22:56 ` [PATCH v2 08/11] commit-graph: reuse existing Bloom filters during write Garima Singh via GitGitGadget
2020-02-20 18:48 ` Jakub Narebski
2020-02-24 21:45 ` Garima Singh
2020-02-05 22:56 ` [PATCH v2 09/11] commit-graph: add --changed-paths option to write subcommand Garima Singh via GitGitGadget
2020-02-20 20:28 ` Jakub Narebski
2020-02-24 21:51 ` Garima Singh
2020-02-25 12:10 ` Jakub Narebski
2020-02-20 22:10 ` Bryan Turner
2020-02-22 1:44 ` Garima Singh
2020-02-05 22:56 ` [PATCH v2 10/11] revision.c: use Bloom filters to speed up path based revision walks Garima Singh via GitGitGadget
2020-02-21 17:31 ` Jakub Narebski
2020-02-21 22:45 ` Jakub Narebski
2020-02-05 22:56 ` [PATCH v2 11/11] commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag Garima Singh via GitGitGadget
2020-02-22 0:11 ` Jakub Narebski
2020-02-07 13:52 ` [PATCH v2 00/11] Changed Paths Bloom Filters SZEDER Gábor
2020-02-07 15:09 ` Garima Singh
2020-02-07 15:36 ` Derrick Stolee
2020-02-07 16:15 ` SZEDER Gábor
2020-02-07 16:33 ` Derrick Stolee
2020-02-11 19:08 ` Garima Singh
2020-02-08 23:04 ` Jakub Narebski
2020-02-21 17:41 ` Garima Singh
2020-03-29 18:36 ` Junio C Hamano
2020-03-30 0:31 ` [PATCH v3 00/16] " Garima Singh via GitGitGadget
2020-03-30 0:31 ` [PATCH v3 01/16] commit-graph: define and use MAX_NUM_CHUNKS Garima Singh via GitGitGadget
2020-03-30 0:31 ` [PATCH v3 02/16] bloom.c: add the murmur3 hash implementation Garima Singh via GitGitGadget
2020-03-30 0:31 ` [PATCH v3 03/16] bloom.c: introduce core Bloom filter constructs Garima Singh via GitGitGadget
2020-03-30 0:31 ` [PATCH v3 04/16] bloom.c: core Bloom filter implementation for changed paths Garima Singh via GitGitGadget
2020-03-30 0:31 ` [PATCH v3 05/16] diff: halt tree-diff early after max_changes Derrick Stolee via GitGitGadget
2020-03-30 0:31 ` Garima Singh via GitGitGadget [this message]
2020-03-30 0:31 ` [PATCH v3 07/16] commit-graph: examine changed-path objects in pack order Jeff King via GitGitGadget
2020-03-30 0:31 ` [PATCH v3 08/16] commit-graph: examine commits by generation number Garima Singh via GitGitGadget
2020-03-30 0:31 ` [PATCH v3 09/16] diff: skip batch object download when possible Garima Singh via GitGitGadget
2020-03-30 0:31 ` [PATCH v3 10/16] commit-graph: write Bloom filters to commit graph file Garima Singh via GitGitGadget
2020-03-30 0:31 ` [PATCH v3 11/16] commit-graph: reuse existing Bloom filters during write Garima Singh via GitGitGadget
2020-03-30 0:31 ` [PATCH v3 12/16] commit-graph: add --changed-paths option to write subcommand Garima Singh via GitGitGadget
2020-03-30 0:31 ` [PATCH v3 13/16] revision.c: use Bloom filters to speed up path based revision walks Garima Singh via GitGitGadget
2020-03-30 0:31 ` [PATCH v3 14/16] revision.c: add trace2 stats around Bloom filter usage Garima Singh via GitGitGadget
2020-03-30 0:31 ` [PATCH v3 15/16] t4216: add end to end tests for git log with Bloom filters Garima Singh via GitGitGadget
2020-03-30 0:31 ` [PATCH v3 16/16] commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag Garima Singh via GitGitGadget
2020-04-06 16:59 ` [PATCH v4 00/15] Changed Paths Bloom Filters Garima Singh via GitGitGadget
2020-04-06 16:59 ` [PATCH v4 01/15] commit-graph: define and use MAX_NUM_CHUNKS Garima Singh via GitGitGadget
2020-04-06 16:59 ` [PATCH v4 02/15] bloom.c: add the murmur3 hash implementation Garima Singh via GitGitGadget
2020-04-06 16:59 ` [PATCH v4 03/15] bloom.c: introduce core Bloom filter constructs Garima Singh via GitGitGadget
2020-04-06 16:59 ` [PATCH v4 04/15] bloom.c: core Bloom filter implementation for changed paths Garima Singh via GitGitGadget
2020-06-27 15:53 ` SZEDER Gábor
2020-04-06 16:59 ` [PATCH v4 05/15] diff: halt tree-diff early after max_changes Derrick Stolee via GitGitGadget
2020-08-04 14:47 ` SZEDER Gábor
2020-08-04 16:25 ` Derrick Stolee
2020-08-04 17:00 ` SZEDER Gábor
2020-08-04 17:31 ` Derrick Stolee
2020-08-05 17:08 ` Derrick Stolee
2020-04-06 16:59 ` [PATCH v4 06/15] commit-graph: compute Bloom filters for changed paths Garima Singh via GitGitGadget
2020-04-06 16:59 ` [PATCH v4 07/15] commit-graph: examine changed-path objects in pack order Jeff King via GitGitGadget
2020-04-06 16:59 ` [PATCH v4 08/15] commit-graph: examine commits by generation number Garima Singh via GitGitGadget
2020-04-06 16:59 ` [PATCH v4 09/15] commit-graph: write Bloom filters to commit graph file Garima Singh via GitGitGadget
2020-05-29 8:57 ` SZEDER Gábor
2020-05-29 13:35 ` Derrick Stolee
2020-05-31 17:23 ` SZEDER Gábor
2020-07-09 17:00 ` [PATCH] commit-graph: fix "Writing out commit graph" progress counter SZEDER Gábor
2020-07-09 18:01 ` Derrick Stolee
2020-07-09 18:20 ` Derrick Stolee
2020-04-06 16:59 ` [PATCH v4 10/15] commit-graph: reuse existing Bloom filters during write Garima Singh via GitGitGadget
2020-06-19 14:02 ` SZEDER Gábor
2020-06-19 19:28 ` Junio C Hamano
2020-07-27 21:33 ` SZEDER Gábor
2020-04-06 16:59 ` [PATCH v4 11/15] commit-graph: add --changed-paths option to write subcommand Garima Singh via GitGitGadget
2020-06-07 22:21 ` SZEDER Gábor
2020-04-06 16:59 ` [PATCH v4 12/15] revision.c: use Bloom filters to speed up path based revision walks Garima Singh via GitGitGadget
2020-06-26 6:34 ` SZEDER Gábor
2020-04-06 16:59 ` [PATCH v4 13/15] revision.c: add trace2 stats around Bloom filter usage Garima Singh via GitGitGadget
2020-04-06 16:59 ` [PATCH v4 14/15] t4216: add end to end tests for git log with Bloom filters Garima Singh via GitGitGadget
2020-04-06 16:59 ` [PATCH v4 15/15] commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag Garima Singh via GitGitGadget
2020-04-08 15:51 ` [PATCH v4 00/15] Changed Paths Bloom Filters Derrick Stolee
2020-04-08 19:21 ` Junio C Hamano
2020-04-08 20:05 ` Jakub Narębski
2020-04-12 20:34 ` Taylor Blau
2020-03-05 19:49 ` [PATCH 0/9] [RFC] " Garima Singh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c38b9b386ef246cd3144ec5ede991c3cad32f3d9.1585528298.git.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=garima.singh@microsoft.com \
--cc=git@vger.kernel.org \
--cc=jonathantanmy@google.com \
--cc=stolee@gmail.com \
--cc=szeder.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).