From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> To: git@vger.kernel.org Cc: stolee@gmail.com, szeder.dev@gmail.com, jonathantanmy@google.com, Garima Singh <garima.singh@microsoft.com>, Derrick Stolee <dstolee@microsoft.com> Subject: [PATCH v4 05/15] diff: halt tree-diff early after max_changes Date: Mon, 06 Apr 2020 16:59:45 +0000 [thread overview] Message-ID: <2d4c0b2da38632424c8bd31ccb2037e0676c3c74.1586192395.git.gitgitgadget@gmail.com> (raw) In-Reply-To: <pull.497.v4.git.1586192395.gitgitgadget@gmail.com> From: Derrick Stolee <dstolee@microsoft.com> When computing the changed-paths bloom filters for the commit-graph, we limit the size of the filter by restricting the number of paths in the diff. Instead of computing a large diff and then ignoring the result, it is better to halt the diff computation early. Create a new "max_changes" option in struct diff_options. If non-zero, then halt the diff computation after discovering strictly more changed paths. This includes paths corresponding to trees that change. Use this max_changes option in the bloom filter calculations. This reduces the time taken to compute the filters for the Linux kernel repo from 2m50s to 2m35s. On a large internal repository with ~500 commits that perform tree-wide changes, the time reduced from 6m15s to 3m48s. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> --- bloom.c | 4 +++- diff.h | 5 +++++ tree-diff.c | 6 ++++++ 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/bloom.c b/bloom.c index 881a9841ede..a16eee92331 100644 --- a/bloom.c +++ b/bloom.c @@ -133,6 +133,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r, struct bloom_filter_settings settings = DEFAULT_BLOOM_FILTER_SETTINGS; int i; struct diff_options diffopt; + int max_changes = 512; if (bloom_filters.slab_size == 0) return NULL; @@ -141,6 +142,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r, repo_diff_setup(r, &diffopt); diffopt.flags.recursive = 1; + diffopt.max_changes = max_changes; diff_setup_done(&diffopt); if (c->parents) @@ -149,7 +151,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r, diff_tree_oid(NULL, &c->object.oid, "", &diffopt); diffcore_std(&diffopt); - if (diff_queued_diff.nr <= 512) { + if (diff_queued_diff.nr <= max_changes) { struct hashmap pathmap; struct pathmap_hash_entry *e; struct hashmap_iter iter; diff --git a/diff.h b/diff.h index 6febe7e3656..9443dc1b003 100644 --- a/diff.h +++ b/diff.h @@ -285,6 +285,11 @@ struct diff_options { /* Number of hexdigits to abbreviate raw format output to. */ int abbrev; + /* If non-zero, then stop computing after this many changes. */ + int max_changes; + /* For internal use only. */ + int num_changes; + int ita_invisible_in_index; /* white-space error highlighting */ #define WSEH_NEW (1<<12) diff --git a/tree-diff.c b/tree-diff.c index 33ded7f8b3e..f3d303c6e54 100644 --- a/tree-diff.c +++ b/tree-diff.c @@ -434,6 +434,9 @@ static struct combine_diff_path *ll_diff_tree_paths( if (diff_can_quit_early(opt)) break; + if (opt->max_changes && opt->num_changes > opt->max_changes) + break; + if (opt->pathspec.nr) { skip_uninteresting(&t, base, opt); for (i = 0; i < nparent; i++) @@ -518,6 +521,7 @@ static struct combine_diff_path *ll_diff_tree_paths( /* t↓ */ update_tree_entry(&t); + opt->num_changes++; } /* t > p[imin] */ @@ -535,6 +539,7 @@ static struct combine_diff_path *ll_diff_tree_paths( skip_emit_tp: /* ∀ pi=p[imin] pi↓ */ update_tp_entries(tp, nparent); + opt->num_changes++; } } @@ -552,6 +557,7 @@ struct combine_diff_path *diff_tree_paths( const struct object_id **parents_oid, int nparent, struct strbuf *base, struct diff_options *opt) { + opt->num_changes = 0; p = ll_diff_tree_paths(p, oid, parents_oid, nparent, base, opt); /* -- gitgitgadget
next prev parent reply other threads:[~2020-04-06 17:00 UTC|newest] Thread overview: 159+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-12-20 22:05 [PATCH 0/9] [RFC] Changed Paths Bloom Filters Garima Singh via GitGitGadget 2019-12-20 22:05 ` [PATCH 1/9] commit-graph: add --changed-paths option to write Garima Singh via GitGitGadget 2020-01-01 20:20 ` Jakub Narebski 2019-12-20 22:05 ` [PATCH 2/9] commit-graph: write changed paths bloom filters Garima Singh via GitGitGadget 2019-12-21 16:48 ` Philip Oakley 2020-01-06 18:44 ` Jakub Narebski 2020-01-13 19:48 ` Garima Singh 2019-12-20 22:05 ` [PATCH 3/9] commit-graph: use MAX_NUM_CHUNKS Garima Singh via GitGitGadget 2020-01-07 12:19 ` Jakub Narebski 2019-12-20 22:05 ` [PATCH 4/9] commit-graph: document bloom filter format Garima Singh via GitGitGadget 2020-01-07 14:46 ` Jakub Narebski 2019-12-20 22:05 ` [PATCH 5/9] commit-graph: write changed path bloom filters to commit-graph file Garima Singh via GitGitGadget 2020-01-07 16:01 ` Jakub Narebski 2020-01-14 15:14 ` Garima Singh 2019-12-20 22:05 ` [PATCH 6/9] commit-graph: test commit-graph write --changed-paths Garima Singh via GitGitGadget 2020-01-08 0:32 ` Jakub Narebski 2019-12-20 22:05 ` [PATCH 7/9] commit-graph: reuse existing bloom filters during write Garima Singh via GitGitGadget 2020-01-09 19:12 ` Jakub Narebski 2019-12-20 22:05 ` [PATCH 8/9] revision.c: use bloom filters to speed up path based revision walks Garima Singh via GitGitGadget 2020-01-11 0:27 ` Jakub Narebski 2020-01-15 0:08 ` Garima Singh 2019-12-20 22:05 ` [PATCH 9/9] commit-graph: add GIT_TEST_COMMIT_GRAPH_BLOOM_FILTERS test flag Garima Singh via GitGitGadget 2020-01-11 19:56 ` Jakub Narebski 2020-01-15 0:55 ` Garima Singh 2019-12-20 22:14 ` [PATCH 0/9] [RFC] Changed Paths Bloom Filters Junio C Hamano 2019-12-22 9:26 ` Christian Couder 2019-12-22 9:38 ` Jeff King 2020-01-01 12:04 ` Jakub Narebski 2019-12-22 9:30 ` Jeff King 2019-12-22 9:32 ` [PATCH 1/3] commit-graph: examine changed-path objects in pack order Jeff King 2019-12-27 14:51 ` Derrick Stolee 2019-12-29 6:12 ` Jeff King 2019-12-29 6:28 ` Jeff King 2019-12-30 14:37 ` Derrick Stolee 2019-12-30 14:51 ` Derrick Stolee 2019-12-22 9:32 ` [PATCH 2/3] commit-graph: free large diffs, too Jeff King 2019-12-27 14:52 ` Derrick Stolee 2019-12-22 9:32 ` [PATCH 3/3] commit-graph: stop using full rev_info for diffs Jeff King 2019-12-27 14:53 ` Derrick Stolee 2019-12-26 14:21 ` [PATCH 0/9] [RFC] Changed Paths Bloom Filters Derrick Stolee 2019-12-29 6:03 ` Jeff King 2019-12-27 16:11 ` Derrick Stolee 2019-12-29 6:24 ` Jeff King 2019-12-30 16:04 ` Derrick Stolee 2019-12-30 17:02 ` Junio C Hamano 2019-12-31 16:45 ` Jakub Narebski 2020-01-13 16:54 ` Garima Singh 2020-01-20 13:48 ` Jakub Narebski 2020-01-21 16:14 ` Garima Singh 2020-02-02 18:43 ` Jakub Narebski 2020-01-21 23:40 ` Emily Shaffer 2020-01-27 18:24 ` Garima Singh 2020-02-01 23:32 ` Jakub Narebski 2020-02-05 22:56 ` [PATCH v2 00/11] " Garima Singh via GitGitGadget 2020-02-05 22:56 ` [PATCH v2 01/11] commit-graph: use MAX_NUM_CHUNKS Garima Singh via GitGitGadget 2020-02-09 12:39 ` Jakub Narebski 2020-02-05 22:56 ` [PATCH v2 02/11] bloom: core Bloom filter implementation for changed paths Garima Singh via GitGitGadget 2020-02-15 17:17 ` Jakub Narebski 2020-02-16 16:49 ` Jakub Narebski 2020-02-22 0:32 ` Garima Singh 2020-02-23 13:38 ` Jakub Narebski 2020-02-24 17:34 ` Garima Singh 2020-02-24 18:20 ` Jakub Narebski 2020-02-05 22:56 ` [PATCH v2 03/11] diff: halt tree-diff early after max_changes Derrick Stolee via GitGitGadget 2020-02-17 0:00 ` Jakub Narebski 2020-02-22 0:37 ` Garima Singh 2020-02-05 22:56 ` [PATCH v2 04/11] commit-graph: compute Bloom filters for changed paths Garima Singh via GitGitGadget 2020-02-17 21:56 ` Jakub Narebski 2020-02-22 0:55 ` Garima Singh 2020-02-23 17:34 ` Jakub Narebski 2020-02-05 22:56 ` [PATCH v2 05/11] commit-graph: examine changed-path objects in pack order Jeff King via GitGitGadget 2020-02-18 17:59 ` Jakub Narebski 2020-02-24 18:29 ` Garima Singh 2020-02-05 22:56 ` [PATCH v2 06/11] commit-graph: examine commits by generation number Derrick Stolee via GitGitGadget 2020-02-19 0:32 ` Jakub Narebski 2020-02-24 20:45 ` Garima Singh 2020-02-05 22:56 ` [PATCH v2 07/11] commit-graph: write Bloom filters to commit graph file Garima Singh via GitGitGadget 2020-02-19 15:13 ` Jakub Narebski 2020-02-24 21:14 ` Garima Singh 2020-02-25 11:40 ` Jakub Narebski 2020-02-25 15:58 ` Garima Singh 2020-02-05 22:56 ` [PATCH v2 08/11] commit-graph: reuse existing Bloom filters during write Garima Singh via GitGitGadget 2020-02-20 18:48 ` Jakub Narebski 2020-02-24 21:45 ` Garima Singh 2020-02-05 22:56 ` [PATCH v2 09/11] commit-graph: add --changed-paths option to write subcommand Garima Singh via GitGitGadget 2020-02-20 20:28 ` Jakub Narebski 2020-02-24 21:51 ` Garima Singh 2020-02-25 12:10 ` Jakub Narebski 2020-02-20 22:10 ` Bryan Turner 2020-02-22 1:44 ` Garima Singh 2020-02-05 22:56 ` [PATCH v2 10/11] revision.c: use Bloom filters to speed up path based revision walks Garima Singh via GitGitGadget 2020-02-21 17:31 ` Jakub Narebski 2020-02-21 22:45 ` Jakub Narebski 2020-02-05 22:56 ` [PATCH v2 11/11] commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag Garima Singh via GitGitGadget 2020-02-22 0:11 ` Jakub Narebski 2020-02-07 13:52 ` [PATCH v2 00/11] Changed Paths Bloom Filters SZEDER Gábor 2020-02-07 15:09 ` Garima Singh 2020-02-07 15:36 ` Derrick Stolee 2020-02-07 16:15 ` SZEDER Gábor 2020-02-07 16:33 ` Derrick Stolee 2020-02-11 19:08 ` Garima Singh 2020-02-08 23:04 ` Jakub Narebski 2020-02-21 17:41 ` Garima Singh 2020-03-29 18:36 ` Junio C Hamano 2020-03-30 0:31 ` [PATCH v3 00/16] " Garima Singh via GitGitGadget 2020-03-30 0:31 ` [PATCH v3 01/16] commit-graph: define and use MAX_NUM_CHUNKS Garima Singh via GitGitGadget 2020-03-30 0:31 ` [PATCH v3 02/16] bloom.c: add the murmur3 hash implementation Garima Singh via GitGitGadget 2020-03-30 0:31 ` [PATCH v3 03/16] bloom.c: introduce core Bloom filter constructs Garima Singh via GitGitGadget 2020-03-30 0:31 ` [PATCH v3 04/16] bloom.c: core Bloom filter implementation for changed paths Garima Singh via GitGitGadget 2020-03-30 0:31 ` [PATCH v3 05/16] diff: halt tree-diff early after max_changes Derrick Stolee via GitGitGadget 2020-03-30 0:31 ` [PATCH v3 06/16] commit-graph: compute Bloom filters for changed paths Garima Singh via GitGitGadget 2020-03-30 0:31 ` [PATCH v3 07/16] commit-graph: examine changed-path objects in pack order Jeff King via GitGitGadget 2020-03-30 0:31 ` [PATCH v3 08/16] commit-graph: examine commits by generation number Garima Singh via GitGitGadget 2020-03-30 0:31 ` [PATCH v3 09/16] diff: skip batch object download when possible Garima Singh via GitGitGadget 2020-03-30 0:31 ` [PATCH v3 10/16] commit-graph: write Bloom filters to commit graph file Garima Singh via GitGitGadget 2020-03-30 0:31 ` [PATCH v3 11/16] commit-graph: reuse existing Bloom filters during write Garima Singh via GitGitGadget 2020-03-30 0:31 ` [PATCH v3 12/16] commit-graph: add --changed-paths option to write subcommand Garima Singh via GitGitGadget 2020-03-30 0:31 ` [PATCH v3 13/16] revision.c: use Bloom filters to speed up path based revision walks Garima Singh via GitGitGadget 2020-03-30 0:31 ` [PATCH v3 14/16] revision.c: add trace2 stats around Bloom filter usage Garima Singh via GitGitGadget 2020-03-30 0:31 ` [PATCH v3 15/16] t4216: add end to end tests for git log with Bloom filters Garima Singh via GitGitGadget 2020-03-30 0:31 ` [PATCH v3 16/16] commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag Garima Singh via GitGitGadget 2020-04-06 16:59 ` [PATCH v4 00/15] Changed Paths Bloom Filters Garima Singh via GitGitGadget 2020-04-06 16:59 ` [PATCH v4 01/15] commit-graph: define and use MAX_NUM_CHUNKS Garima Singh via GitGitGadget 2020-04-06 16:59 ` [PATCH v4 02/15] bloom.c: add the murmur3 hash implementation Garima Singh via GitGitGadget 2020-04-06 16:59 ` [PATCH v4 03/15] bloom.c: introduce core Bloom filter constructs Garima Singh via GitGitGadget 2020-04-06 16:59 ` [PATCH v4 04/15] bloom.c: core Bloom filter implementation for changed paths Garima Singh via GitGitGadget 2020-06-27 15:53 ` SZEDER Gábor 2020-04-06 16:59 ` Derrick Stolee via GitGitGadget [this message] 2020-08-04 14:47 ` [PATCH v4 05/15] diff: halt tree-diff early after max_changes SZEDER Gábor 2020-08-04 16:25 ` Derrick Stolee 2020-08-04 17:00 ` SZEDER Gábor 2020-08-04 17:31 ` Derrick Stolee 2020-08-05 17:08 ` Derrick Stolee 2020-04-06 16:59 ` [PATCH v4 06/15] commit-graph: compute Bloom filters for changed paths Garima Singh via GitGitGadget 2020-04-06 16:59 ` [PATCH v4 07/15] commit-graph: examine changed-path objects in pack order Jeff King via GitGitGadget 2020-04-06 16:59 ` [PATCH v4 08/15] commit-graph: examine commits by generation number Garima Singh via GitGitGadget 2020-04-06 16:59 ` [PATCH v4 09/15] commit-graph: write Bloom filters to commit graph file Garima Singh via GitGitGadget 2020-05-29 8:57 ` SZEDER Gábor 2020-05-29 13:35 ` Derrick Stolee 2020-05-31 17:23 ` SZEDER Gábor 2020-07-09 17:00 ` [PATCH] commit-graph: fix "Writing out commit graph" progress counter SZEDER Gábor 2020-07-09 18:01 ` Derrick Stolee 2020-07-09 18:20 ` Derrick Stolee 2020-04-06 16:59 ` [PATCH v4 10/15] commit-graph: reuse existing Bloom filters during write Garima Singh via GitGitGadget 2020-06-19 14:02 ` SZEDER Gábor 2020-06-19 19:28 ` Junio C Hamano 2020-07-27 21:33 ` SZEDER Gábor 2020-04-06 16:59 ` [PATCH v4 11/15] commit-graph: add --changed-paths option to write subcommand Garima Singh via GitGitGadget 2020-06-07 22:21 ` SZEDER Gábor 2020-04-06 16:59 ` [PATCH v4 12/15] revision.c: use Bloom filters to speed up path based revision walks Garima Singh via GitGitGadget 2020-06-26 6:34 ` SZEDER Gábor 2020-04-06 16:59 ` [PATCH v4 13/15] revision.c: add trace2 stats around Bloom filter usage Garima Singh via GitGitGadget 2020-04-06 16:59 ` [PATCH v4 14/15] t4216: add end to end tests for git log with Bloom filters Garima Singh via GitGitGadget 2020-04-06 16:59 ` [PATCH v4 15/15] commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag Garima Singh via GitGitGadget 2020-04-08 15:51 ` [PATCH v4 00/15] Changed Paths Bloom Filters Derrick Stolee 2020-04-08 19:21 ` Junio C Hamano 2020-04-08 20:05 ` Jakub Narębski 2020-04-12 20:34 ` Taylor Blau 2020-03-05 19:49 ` [PATCH 0/9] [RFC] " Garima Singh
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style List information: http://vger.kernel.org/majordomo-info.html * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=2d4c0b2da38632424c8bd31ccb2037e0676c3c74.1586192395.git.gitgitgadget@gmail.com \ --to=gitgitgadget@gmail.com \ --cc=dstolee@microsoft.com \ --cc=garima.singh@microsoft.com \ --cc=git@vger.kernel.org \ --cc=jonathantanmy@google.com \ --cc=stolee@gmail.com \ --cc=szeder.dev@gmail.com \ --subject='Re: [PATCH v4 05/15] diff: halt tree-diff early after max_changes' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Code repositories for project(s) associated with this inbox: https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).