git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] bloom: ignore renames when computing changed paths
@ 2020-04-08 16:38 Derrick Stolee via GitGitGadget
  2020-04-08 19:11 ` Junio C Hamano
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-04-08 16:38 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The changed-path Bloom filters record an entry in the filter for
every path that was changed. This includes every add and delete,
regardless of whether a rename was detected. Detecting renames
causes significant performance issues and also triggers
downloading missing blobs in a partial clone.

The simple fix is to disable rename detection when computing a
changed-path Bloom filter.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
    bloom: ignore renames when computing changed paths
    
    I promised [1] I would adapt the commit that was dropped from
    gs/commit-graph-path-filter [2] on top of gs/commit-graph-path-filter
    and jt/avoid-prefetch-when-able-in-diff. However, I noticed that the
    change was extremely simple and has value without basing it on
    jt/avoid-prefetch-when-able-in-diff.
    
    Applied to gs/commit-graph-path-filter, this change should clearly reduce
    the CPU time needed to compute changed-path Bloom filters (I did not
    measure it). The partial clone improvements also require
    jt/avoid-prefetch-when-able-in-diff, but the code does not depend on
    that topic at compile time.
    
    Thanks, -Stolee
    
    [1] https://lore.kernel.org/git/7de2f54b-8704-a0e1-12aa-0ca9d3d70f6f@gmail.com/
    [2] https://lore.kernel.org/git/55824cda89c1dca7756c8c2d831d6e115f4a9ddb.1585528298.git.gitgitgadget@gmail.com/

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-601%2Fderrickstolee%2Fdiff-and-bloom-filters-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-601/derrickstolee/diff-and-bloom-filters-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/601

 bloom.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/bloom.c b/bloom.c
index c5b461d1cfe..dd9bab9bbd6 100644
--- a/bloom.c
+++ b/bloom.c
@@ -189,6 +189,7 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
 
 	repo_diff_setup(r, &diffopt);
 	diffopt.flags.recursive = 1;
+	diffopt.detect_rename = 0;
 	diffopt.max_changes = max_changes;
 	diff_setup_done(&diffopt);
 

base-commit: d5b873c832d832e44523d1d2a9d29afe2b84c84f
-- 
gitgitgadget
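
The reasoning in the commit message can be illustrated with a toy model.
This is only a sketch and not Git's implementation: the filter size, the
two seeded FNV-1a hashes, and every toy_* name are hypothetical. It
assumes, as the commit message states, that both sides of a rename are
recorded in the filter. Under that assumption, a delete plus an add
produces the same bits as a detected rename pair, so rename detection
adds cost without changing the filter.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_FILTER_BITS 256 /* hypothetical size, not Git's */

struct toy_filter {
	uint8_t bits[TOY_FILTER_BITS / 8];
};

enum toy_kind { TOY_ADD, TOY_DELETE, TOY_RENAME };

struct toy_change {
	enum toy_kind kind;
	const char *old_path; /* NULL for an add */
	const char *new_path; /* NULL for a delete */
};

/* FNV-1a, perturbed by a seed so that two bit positions can be derived. */
uint32_t toy_hash(const char *s, uint32_t seed)
{
	uint32_t h = 2166136261u ^ seed;

	for (; *s; s++) {
		h ^= (uint8_t)*s;
		h *= 16777619u;
	}
	return h;
}

/* Set the two bits that correspond to this path. */
void toy_filter_add(struct toy_filter *f, const char *path)
{
	uint32_t seed;

	for (seed = 0; seed < 2; seed++) {
		uint32_t bit = toy_hash(path, seed) % TOY_FILTER_BITS;
		f->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
	}
}

/* Return 0 if the path was definitely not added, 1 if it may have been. */
int toy_filter_maybe_contains(const struct toy_filter *f, const char *path)
{
	uint32_t seed;

	for (seed = 0; seed < 2; seed++) {
		uint32_t bit = toy_hash(path, seed) % TOY_FILTER_BITS;
		if (!(f->bits[bit / 8] & (1u << (bit % 8))))
			return 0;
	}
	return 1;
}

/* Record every path named by each change, whatever its kind. */
void toy_record(struct toy_filter *f, const struct toy_change *c, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (c[i].old_path)
			toy_filter_add(f, c[i].old_path);
		if (c[i].new_path)
			toy_filter_add(f, c[i].new_path);
	}
}

/* Compare the filter built from delete+add with one built from a rename. */
int toy_filters_match(void)
{
	const struct toy_change plain[] = {
		{ TOY_DELETE, "old.c", NULL },
		{ TOY_ADD, NULL, "new.c" },
	};
	const struct toy_change paired[] = {
		{ TOY_RENAME, "old.c", "new.c" },
	};
	struct toy_filter a, b;

	memset(&a, 0, sizeof(a));
	memset(&b, 0, sizeof(b));
	toy_record(&a, plain, 2);
	toy_record(&b, paired, 1);
	return memcmp(a.bits, b.bits, sizeof(a.bits)) == 0;
}
```

Calling toy_filters_match() from a small main() returns nonzero, because
both change lists name the same two paths. Only the work needed to pair
them up differs, and that is the work the one-line patch skips.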


end of thread, other threads:[~2020-04-09 14:16 UTC | newest]

Thread overview: 9+ messages
2020-04-08 16:38 [PATCH] bloom: ignore renames when computing changed paths Derrick Stolee via GitGitGadget
2020-04-08 19:11 ` Junio C Hamano
2020-04-08 19:13 ` Philip Oakley
2020-04-08 22:31 ` Jeff King
2020-04-09 11:56   ` Derrick Stolee
2020-04-09 13:47     ` Jeff King
2020-04-09 14:00       ` Derrick Stolee
2020-04-09 14:15         ` Jeff King
2020-04-09 13:00 ` [PATCH v2] " Derrick Stolee via GitGitGadget
