From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Derrick Stolee" <dstolee@microsoft.com>,
"Jonathan Tan" <jonathantanmy@google.com>,
"Taylor Blau" <me@ttaylorr.com>,
"Junio C Hamano" <gitster@pobox.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Elijah Newren" <newren@gmail.com>,
"Elijah Newren" <newren@gmail.com>,
"Elijah Newren" <newren@gmail.com>
Subject: [PATCH v2 1/8] diffcore-rename: enable filtering possible rename sources
Date: Tue, 09 Mar 2021 00:09:52 +0000 [thread overview]
Message-ID: <dab8e3c6aee5a852ad46c569c0be729d64310ad9.1615248599.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.845.v2.git.1615248599.gitgitgadget@gmail.com>
From: Elijah Newren <newren@gmail.com>
Add the ability to diffcore_rename_extended() to allow external callers
to declare that they only need renames detected for a subset of source
files, and use that information to skip detecting renames for them.
There are two important pieces to this optimization that may not be
obvious at first glance:
* We do not require callers to just filter the filepairs out
to remove the non-relevant sources, because exact rename detection
is fast and when it finds a match it can remove both a source and a
destination whereas the relevant_sources filter can only remove a
source.
* We need to filter out the source pairs in a preliminary pass instead
of adding a
strset_contains(relevant_sources, one->path)
check within the nested matrix loop. The reason for that is if we
have 30k renames, doing 30k * 30k = 900M strset_contains() calls
becomes extraordinarily expensive and defeats the performance gains
from this change; we only want to do 30k such calls instead.
If callers pass NULL for relevant_sources, that is special cases to
treat all sources as relevant. Since all callers currently pass NULL,
this optimization does not yet have any effect. Subsequent commits will
have merge-ort compute a set of relevant_sources to restrict which
sources we detect renames for, and have merge-ort pass that set of
relevant_sources to diffcore_rename_extended().
Signed-off-by: Elijah Newren <newren@gmail.com>
---
diffcore-rename.c | 26 +++++++++++++++++++-------
diffcore.h | 1 +
merge-ort.c | 1 +
3 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 1fe902ed2af0..7f6115fd9018 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -991,11 +991,12 @@ static int find_renames(struct diff_score *mx,
return count;
}
-static void remove_unneeded_paths_from_src(int detecting_copies)
+static void remove_unneeded_paths_from_src(int detecting_copies,
+ struct strset *interesting)
{
int i, new_num_src;
- if (detecting_copies)
+ if (detecting_copies && !interesting)
return; /* nothing to remove */
if (break_idx)
return; /* culling incompatible with break detection */
@@ -1022,12 +1023,18 @@ static void remove_unneeded_paths_from_src(int detecting_copies)
* from rename_src here.
*/
for (i = 0, new_num_src = 0; i < rename_src_nr; i++) {
+ struct diff_filespec *one = rename_src[i].p->one;
+
/*
* renames are stored in rename_dst, so if a rename has
* already been detected using this source, we can just
* remove the source knowing rename_dst has its info.
*/
- if (rename_src[i].p->one->rename_used)
+ if (!detecting_copies && one->rename_used)
+ continue;
+
+ /* If we don't care about the source path, skip it */
+ if (interesting && !strset_contains(interesting, one->path))
continue;
if (new_num_src < i)
@@ -1040,6 +1047,7 @@ static void remove_unneeded_paths_from_src(int detecting_copies)
}
void diffcore_rename_extended(struct diff_options *options,
+ struct strset *relevant_sources,
struct strset *dirs_removed,
struct strmap *dir_rename_count)
{
@@ -1060,6 +1068,8 @@ void diffcore_rename_extended(struct diff_options *options,
want_copies = (detect_rename == DIFF_DETECT_COPY);
if (dirs_removed && (break_idx || want_copies))
BUG("dirs_removed incompatible with break/copy detection");
+ if (break_idx && relevant_sources)
+ BUG("break detection incompatible with source specification");
if (!minimum_score)
minimum_score = DEFAULT_RENAME_SCORE;
@@ -1127,9 +1137,10 @@ void diffcore_rename_extended(struct diff_options *options,
/*
* Cull sources:
* - remove ones corresponding to exact renames
+ * - remove ones not found in relevant_sources
*/
trace2_region_enter("diff", "cull after exact", options->repo);
- remove_unneeded_paths_from_src(want_copies);
+ remove_unneeded_paths_from_src(want_copies, relevant_sources);
trace2_region_leave("diff", "cull after exact", options->repo);
} else {
/* Determine minimum score to match basenames */
@@ -1148,7 +1159,7 @@ void diffcore_rename_extended(struct diff_options *options,
* - remove ones involved in renames (found via exact match)
*/
trace2_region_enter("diff", "cull after exact", options->repo);
- remove_unneeded_paths_from_src(want_copies);
+ remove_unneeded_paths_from_src(want_copies, NULL);
trace2_region_leave("diff", "cull after exact", options->repo);
/* Preparation for basename-driven matching. */
@@ -1167,9 +1178,10 @@ void diffcore_rename_extended(struct diff_options *options,
/*
* Cull sources, again:
* - remove ones involved in renames (found via basenames)
+ * - remove ones not found in relevant_sources
*/
trace2_region_enter("diff", "cull basename", options->repo);
- remove_unneeded_paths_from_src(want_copies);
+ remove_unneeded_paths_from_src(want_copies, relevant_sources);
trace2_region_leave("diff", "cull basename", options->repo);
}
@@ -1342,5 +1354,5 @@ void diffcore_rename_extended(struct diff_options *options,
void diffcore_rename(struct diff_options *options)
{
- diffcore_rename_extended(options, NULL, NULL);
+ diffcore_rename_extended(options, NULL, NULL, NULL);
}
diff --git a/diffcore.h b/diffcore.h
index c6ba64abd198..737c93a6cc79 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -166,6 +166,7 @@ void partial_clear_dir_rename_count(struct strmap *dir_rename_count);
void diffcore_break(struct repository *, int);
void diffcore_rename(struct diff_options *);
void diffcore_rename_extended(struct diff_options *options,
+ struct strset *relevant_sources,
struct strset *dirs_removed,
struct strmap *dir_rename_count);
void diffcore_merge_broken(void);
diff --git a/merge-ort.c b/merge-ort.c
index 467404cc0a35..aba0b9fa54c3 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2029,6 +2029,7 @@ static void detect_regular_renames(struct merge_options *opt,
diff_queued_diff = renames->pairs[side_index];
trace2_region_enter("diff", "diffcore_rename", opt->repo);
diffcore_rename_extended(&diff_opts,
+ NULL,
&renames->dirs_removed[side_index],
&renames->dir_rename_count[side_index]);
trace2_region_leave("diff", "diffcore_rename", opt->repo);
--
gitgitgadget
next prev parent reply other threads:[~2021-03-09 0:11 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-28 3:58 [PATCH 0/8] Optimization batch 9: avoid detecting irrelevant renames Elijah Newren via GitGitGadget
2021-02-28 3:58 ` [PATCH 1/8] diffcore-rename: enable filtering possible rename sources Elijah Newren via GitGitGadget
2021-02-28 3:58 ` [PATCH 2/8] merge-ort: precompute subset of sources for which we need rename detection Elijah Newren via GitGitGadget
2021-02-28 3:58 ` [PATCH 3/8] merge-ort: add data structures for an alternate tree traversal Elijah Newren via GitGitGadget
2021-02-28 3:58 ` [PATCH 4/8] merge-ort: introduce wrappers for " Elijah Newren via GitGitGadget
2021-02-28 3:58 ` [PATCH 5/8] merge-ort: precompute whether directory rename detection is needed Elijah Newren via GitGitGadget
2021-02-28 3:58 ` [PATCH 6/8] merge-ort: use relevant_sources to filter possible rename sources Elijah Newren via GitGitGadget
2021-02-28 3:58 ` [PATCH 7/8] merge-ort: skip rename detection entirely if possible Elijah Newren via GitGitGadget
2021-02-28 3:58 ` [PATCH 8/8] diffcore-rename: avoid doing basename comparisons for irrelevant sources Elijah Newren via GitGitGadget
2021-03-01 16:39 ` [PATCH 0/8] Optimization batch 9: avoid detecting irrelevant renames Elijah Newren
2021-03-04 7:54 ` Elijah Newren
2021-03-09 0:09 ` [PATCH v2 " Elijah Newren via GitGitGadget
2021-03-09 0:09 ` Elijah Newren via GitGitGadget [this message]
2021-03-09 22:21 ` [PATCH v2 1/8] diffcore-rename: enable filtering possible rename sources Derrick Stolee
2021-03-09 22:40 ` Elijah Newren
2021-03-09 22:45 ` Derrick Stolee
2021-03-09 0:09 ` [PATCH v2 2/8] merge-ort: precompute subset of sources for which we need rename detection Elijah Newren via GitGitGadget
2021-03-09 0:09 ` [PATCH v2 3/8] merge-ort: add data structures for an alternate tree traversal Elijah Newren via GitGitGadget
2021-03-09 0:09 ` [PATCH v2 4/8] merge-ort: introduce wrappers for " Elijah Newren via GitGitGadget
2021-03-09 23:06 ` Derrick Stolee
2021-03-09 0:09 ` [PATCH v2 5/8] merge-ort: precompute whether directory rename detection is needed Elijah Newren via GitGitGadget
2021-03-09 0:09 ` [PATCH v2 6/8] merge-ort: use relevant_sources to filter possible rename sources Elijah Newren via GitGitGadget
2021-03-09 0:09 ` [PATCH v2 7/8] merge-ort: skip rename detection entirely if possible Elijah Newren via GitGitGadget
2021-03-09 22:51 ` Derrick Stolee
2021-03-09 22:57 ` Elijah Newren
2021-03-09 0:09 ` [PATCH v2 8/8] diffcore-rename: avoid doing basename comparisons for irrelevant sources Elijah Newren via GitGitGadget
2021-03-09 22:08 ` [PATCH v2 0/8] Optimization batch 9: avoid detecting irrelevant renames Derrick Stolee
2021-03-10 15:08 ` Derrick Stolee
2021-03-11 0:38 ` [PATCH v3 " Elijah Newren via GitGitGadget
2021-03-11 0:38 ` [PATCH v3 1/8] diffcore-rename: enable filtering possible rename sources Elijah Newren via GitGitGadget
2021-03-11 0:38 ` [PATCH v3 2/8] merge-ort: precompute subset of sources for which we need rename detection Elijah Newren via GitGitGadget
2021-03-11 0:38 ` [PATCH v3 3/8] merge-ort: add data structures for an alternate tree traversal Elijah Newren via GitGitGadget
2021-03-11 0:38 ` [PATCH v3 4/8] merge-ort: introduce wrappers for " Elijah Newren via GitGitGadget
2021-03-11 0:38 ` [PATCH v3 5/8] merge-ort: precompute whether directory rename detection is needed Elijah Newren via GitGitGadget
2021-03-11 0:38 ` [PATCH v3 6/8] merge-ort: use relevant_sources to filter possible rename sources Elijah Newren via GitGitGadget
2021-03-11 0:38 ` [PATCH v3 7/8] merge-ort: skip rename detection entirely if possible Elijah Newren via GitGitGadget
2021-03-11 0:38 ` [PATCH v3 8/8] diffcore-rename: avoid doing basename comparisons for irrelevant sources Elijah Newren via GitGitGadget
2021-03-15 13:57 ` [PATCH v3 0/8] Optimization batch 9: avoid detecting irrelevant renames Derrick Stolee
2021-03-15 17:10 ` [PATCH v2 " Elijah Newren
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=dab8e3c6aee5a852ad46c569c0be729d64310ad9.1615248599.git.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=avarab@gmail.com \
--cc=dstolee@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jonathantanmy@google.com \
--cc=me@ttaylorr.com \
--cc=newren@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).