From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Derrick Stolee" <stolee@gmail.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
"Elijah Newren" <newren@gmail.com>,
"Bagas Sanjaya" <bagasdotme@gmail.com>,
"Elijah Newren" <newren@gmail.com>,
"Elijah Newren" <newren@gmail.com>
Subject: [PATCH v4 6/7] merge-ort: avoid recursing into directories when we don't need to
Date: Wed, 21 Jul 2021 04:24:03 +0000 [thread overview]
Message-ID: <7b2112718157e65ba558842e27521df8c351f596.1626841444.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.988.v4.git.1626841444.gitgitgadget@gmail.com>
From: Elijah Newren <newren@gmail.com>
This combines the work of the last several patches, and implements the
conditions when we don't need to recurse into directories. It's perhaps
easiest to see the logic by separating the fact that a directory might
have both rename sources and rename destinations:
* rename sources: only files present in the merge base can serve as
rename sources, and only when one side deletes that file. When the
tree on one side matches the merge base, that means every file
within the subtree matches the merge base. This means that the
skip-irrelevant-rename-detection optimization from before kicks in
and we don't need any of these files as rename sources.
* rename destinations: the tree that does not match the merge base
might have newly added and hence unmatched destination files.
This is what usually prevents us from doing trivial directory
resolutions in the merge machinery. However, the fact that we have
deferred recursing into this directory until the end means we know
whether there are any unmatched relevant potential rename sources
elsewhere in this merge. If there are no unmatched such relevant
sources anywhere, then there is no need to look for unmatched
potential rename destinations to match them with.
This informs our algorithm:
* Search through relevant_sources; if we have entries, they better all
be reflected in cached_pairs or cached_irrelevant, otherwise they
represent an unmatched potential rename source (causing the
optimization to be disallowed).
* For any relevant_source represented in cached_pairs, we do need to
to make sure to get the destination for each source, meaning we need
to recurse into any ancestor directories of those destinations.
* Once we've recursed into all the rename destinations for any
relevant_sources in cached_pairs, we can then do the trivial
directory resolution for the remaining directories.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 5.235 s ± 0.042 s 205.1 ms ± 3.8 ms
mega-renames: 9.419 s ± 0.107 s 1.564 s ± 0.010 s
just-one-mega: 480.1 ms ± 3.9 ms 479.5 ms ± 3.9 ms
Acked-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
merge-ort.c | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 99 insertions(+), 3 deletions(-)
diff --git a/merge-ort.c b/merge-ort.c
index dbccf8c62e2..a013708fa79 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1230,6 +1230,18 @@ static int collect_merge_info_callback(int n,
return mask;
}
+static void resolve_trivial_directory_merge(struct conflict_info *ci, int side)
+{
+ VERIFY_CI(ci);
+ assert((side == 1 && ci->match_mask == 5) ||
+ (side == 2 && ci->match_mask == 3));
+ oidcpy(&ci->merged.result.oid, &ci->stages[side].oid);
+ ci->merged.result.mode = ci->stages[side].mode;
+ ci->merged.is_null = is_null_oid(&ci->stages[side].oid);
+ ci->match_mask = 0;
+ ci->merged.clean = 1; /* (ci->filemask == 0); */
+}
+
static int handle_deferred_entries(struct merge_options *opt,
struct traverse_info *info)
{
@@ -1239,9 +1251,72 @@ static int handle_deferred_entries(struct merge_options *opt,
int side, ret = 0;
for (side = MERGE_SIDE1; side <= MERGE_SIDE2; side++) {
- renames->deferred[side].trivial_merges_okay = 0;
- strintmap_for_each_entry(&renames->deferred[side].possible_trivial_merges,
- &iter, entry) {
+ unsigned optimization_okay = 1;
+ struct strintmap copy;
+
+ /* Loop over the set of paths we need to know rename info for */
+ strset_for_each_entry(&renames->relevant_sources[side],
+ &iter, entry) {
+ char *rename_target, *dir, *dir_marker;
+ struct strmap_entry *e;
+
+ /*
+ * If we don't know delete/rename info for this path,
+ * then we need to recurse into all trees to get all
+ * adds to make sure we have it.
+ */
+ if (strset_contains(&renames->cached_irrelevant[side],
+ entry->key))
+ continue;
+ e = strmap_get_entry(&renames->cached_pairs[side],
+ entry->key);
+ if (!e) {
+ optimization_okay = 0;
+ break;
+ }
+
+ /* If this is a delete, we have enough info already */
+ rename_target = e->value;
+ if (!rename_target)
+ continue;
+
+ /* If we already walked the rename target, we're good */
+ if (strmap_contains(&opt->priv->paths, rename_target))
+ continue;
+
+ /*
+ * Otherwise, we need to get a list of directories that
+ * will need to be recursed into to get this
+ * rename_target.
+ */
+ dir = xstrdup(rename_target);
+ while ((dir_marker = strrchr(dir, '/'))) {
+ *dir_marker = '\0';
+ if (strset_contains(&renames->deferred[side].target_dirs,
+ dir))
+ break;
+ strset_add(&renames->deferred[side].target_dirs,
+ dir);
+ }
+ free(dir);
+ }
+ renames->deferred[side].trivial_merges_okay = optimization_okay;
+ /*
+ * We need to recurse into any directories in
+ * possible_trivial_merges[side] found in target_dirs[side].
+ * But when we recurse, we may need to queue up some of the
+ * subdirectories for possible_trivial_merges[side]. Since
+ * we can't safely iterate through a hashmap while also adding
+ * entries, move the entries into 'copy', iterate over 'copy',
+ * and then we'll also iterate anything added into
+ * possible_trivial_merges[side] once this loop is done.
+ */
+ copy = renames->deferred[side].possible_trivial_merges;
+ strintmap_init_with_options(&renames->deferred[side].possible_trivial_merges,
+ 0,
+ NULL,
+ 0);
+ strintmap_for_each_entry(©, &iter, entry) {
const char *path = entry->key;
unsigned dir_rename_mask = (intptr_t)entry->value;
struct conflict_info *ci;
@@ -1254,6 +1329,13 @@ static int handle_deferred_entries(struct merge_options *opt,
VERIFY_CI(ci);
dirmask = ci->dirmask;
+ if (optimization_okay &&
+ !strset_contains(&renames->deferred[side].target_dirs,
+ path)) {
+ resolve_trivial_directory_merge(ci, side);
+ continue;
+ }
+
info->name = path;
info->namelen = strlen(path);
info->pathlen = info->namelen + 1;
@@ -1289,6 +1371,20 @@ static int handle_deferred_entries(struct merge_options *opt,
if (ret < 0)
return ret;
}
+ strintmap_clear(©);
+ strintmap_for_each_entry(&renames->deferred[side].possible_trivial_merges,
+ &iter, entry) {
+ const char *path = entry->key;
+ struct conflict_info *ci;
+
+ ci = strmap_get(&opt->priv->paths, path);
+ VERIFY_CI(ci);
+
+ assert(renames->deferred[side].trivial_merges_okay &&
+ !strset_contains(&renames->deferred[side].target_dirs,
+ path));
+ resolve_trivial_directory_merge(ci, side);
+ }
}
return ret;
}
--
gitgitgadget
next prev parent reply other threads:[~2021-07-21 4:25 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-01 3:46 [PATCH 0/7] Optimization batch 14: trivial directory resolution Elijah Newren via GitGitGadget
2021-07-01 3:46 ` [PATCH 1/7] merge-ort: resolve paths early when we have sufficient information Elijah Newren via GitGitGadget
2021-07-01 3:46 ` [PATCH 2/7] merge-ort: add some more explanations in collect_merge_info_callback() Elijah Newren via GitGitGadget
2021-07-01 3:46 ` [PATCH 3/7] merge-ort: add data structures for allowable trivial directory resolves Elijah Newren via GitGitGadget
2021-07-01 3:46 ` [PATCH 4/7] merge-ort: add a handle_deferred_entries() helper function Elijah Newren via GitGitGadget
2021-07-01 3:46 ` [PATCH 5/7] merge-ort: defer recursing into directories when merge base is matched Elijah Newren via GitGitGadget
2021-07-01 3:46 ` [PATCH 6/7] merge-ort: avoid recursing into directories when we don't need to Elijah Newren via GitGitGadget
2021-07-01 3:46 ` [PATCH 7/7] merge-ort: restart merge with cached renames to reduce process entry cost Elijah Newren via GitGitGadget
2021-07-01 13:21 ` [PATCH 0/7] Optimization batch 14: trivial directory resolution Ævar Arnfjörð Bjarmason
2021-07-01 15:04 ` Elijah Newren
2021-07-01 19:22 ` Elijah Newren
2021-07-13 19:32 ` [PATCH v2 " Elijah Newren via GitGitGadget
2021-07-13 19:32 ` [PATCH v2 1/7] merge-ort: resolve paths early when we have sufficient information Elijah Newren via GitGitGadget
2021-07-13 19:32 ` [PATCH v2 2/7] merge-ort: add some more explanations in collect_merge_info_callback() Elijah Newren via GitGitGadget
2021-07-13 23:34 ` Bagas Sanjaya
2021-07-14 0:19 ` Elijah Newren
2021-07-13 19:32 ` [PATCH v2 3/7] merge-ort: add data structures for allowable trivial directory resolves Elijah Newren via GitGitGadget
2021-07-15 13:54 ` Derrick Stolee
2021-07-15 15:54 ` Elijah Newren
2021-07-13 19:33 ` [PATCH v2 4/7] merge-ort: add a handle_deferred_entries() helper function Elijah Newren via GitGitGadget
2021-07-15 14:32 ` Derrick Stolee
2021-07-15 15:59 ` Elijah Newren
2021-07-13 19:33 ` [PATCH v2 5/7] merge-ort: defer recursing into directories when merge base is matched Elijah Newren via GitGitGadget
2021-07-15 14:43 ` Derrick Stolee
2021-07-15 16:03 ` Elijah Newren
2021-07-15 17:14 ` Derrick Stolee
2021-07-13 19:33 ` [PATCH v2 6/7] merge-ort: avoid recursing into directories when we don't need to Elijah Newren via GitGitGadget
2021-07-15 14:55 ` Derrick Stolee
2021-07-15 16:28 ` Elijah Newren
2021-07-13 19:33 ` [PATCH v2 7/7] merge-ort: restart merge with cached renames to reduce process entry cost Elijah Newren via GitGitGadget
2021-07-15 15:09 ` Derrick Stolee
2021-07-15 16:53 ` Elijah Newren
2021-07-15 17:19 ` Derrick Stolee
2021-07-15 17:32 ` Elijah Newren
2021-07-16 5:22 ` [PATCH v3 0/7] Optimization batch 14: trivial directory resolution Elijah Newren via GitGitGadget
2021-07-16 5:22 ` [PATCH v3 1/7] merge-ort: resolve paths early when we have sufficient information Elijah Newren via GitGitGadget
2021-07-16 5:22 ` [PATCH v3 2/7] merge-ort: add some more explanations in collect_merge_info_callback() Elijah Newren via GitGitGadget
2021-07-16 5:22 ` [PATCH v3 3/7] merge-ort: add data structures for allowable trivial directory resolves Elijah Newren via GitGitGadget
2021-07-16 5:22 ` [PATCH v3 4/7] merge-ort: add a handle_deferred_entries() helper function Elijah Newren via GitGitGadget
2021-07-16 5:22 ` [PATCH v3 5/7] merge-ort: defer recursing into directories when merge base is matched Elijah Newren via GitGitGadget
2021-07-16 5:22 ` [PATCH v3 6/7] merge-ort: avoid recursing into directories when we don't need to Elijah Newren via GitGitGadget
2021-07-16 5:22 ` [PATCH v3 7/7] merge-ort: restart merge with cached renames to reduce process entry cost Elijah Newren via GitGitGadget
2021-07-20 13:00 ` [PATCH v3 0/7] Optimization batch 14: trivial directory resolution Derrick Stolee
2021-07-20 21:43 ` Junio C Hamano
2021-07-21 4:23 ` [PATCH v4 " Elijah Newren via GitGitGadget
2021-07-21 4:23 ` [PATCH v4 1/7] merge-ort: resolve paths early when we have sufficient information Elijah Newren via GitGitGadget
2021-07-21 4:23 ` [PATCH v4 2/7] merge-ort: add some more explanations in collect_merge_info_callback() Elijah Newren via GitGitGadget
2021-07-21 4:24 ` [PATCH v4 3/7] merge-ort: add data structures for allowable trivial directory resolves Elijah Newren via GitGitGadget
2021-07-21 4:24 ` [PATCH v4 4/7] merge-ort: add a handle_deferred_entries() helper function Elijah Newren via GitGitGadget
2021-07-21 4:24 ` [PATCH v4 5/7] merge-ort: defer recursing into directories when merge base is matched Elijah Newren via GitGitGadget
2021-07-21 4:24 ` Elijah Newren via GitGitGadget [this message]
2021-07-21 4:24 ` [PATCH v4 7/7] merge-ort: restart merge with cached renames to reduce process entry cost Elijah Newren via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7b2112718157e65ba558842e27521df8c351f596.1626841444.git.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=avarab@gmail.com \
--cc=bagasdotme@gmail.com \
--cc=git@vger.kernel.org \
--cc=newren@gmail.com \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).