git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Derrick Stolee" <stolee@gmail.com>,
	"Elijah Newren" <newren@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Elijah Newren" <newren@gmail.com>,
	"Elijah Newren" <newren@gmail.com>
Subject: [PATCH v3 10/10] diffcore-rename: compute dir_rename_guess from dir_rename_counts
Date: Fri, 26 Feb 2021 01:58:19 +0000	[thread overview]
Message-ID: <65f7bfb735f21ff774938b49ba500bf81265db94.1614304700.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.844.v3.git.1614304699.gitgitgadget@gmail.com>

From: Elijah Newren <newren@gmail.com>

dir_rename_counts has a mapping of a mapping, in particular, it has
   old_dir => { new_dir => count }
We want a simple mapping of
   old_dir => new_dir
based on which new_dir had the highest count for a given old_dir.
Compute this and store it in dir_rename_guess.

This is the final piece of the puzzle needed to make our guesses at
which directory files have been moved to when basenames aren't unique.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       12.775 s ±  0.062 s    12.596 s ±  0.061 s
    mega-renames:    188.754 s ±  0.284 s   130.465 s ±  0.259 s
    just-one-mega:     5.599 s ±  0.019 s     3.958 s ±  0.010 s

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c | 45 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 4 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index e5fa0cb555dd..1fe902ed2af0 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -389,6 +389,24 @@ static void dirname_munge(char *filename)
 	*slash = '\0';
 }
 
+static const char *get_highest_rename_path(struct strintmap *counts)
+{
+	int highest_count = 0;
+	const char *highest_destination_dir = NULL;
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
+
+	strintmap_for_each_entry(counts, &iter, entry) {
+		const char *destination_dir = entry->key;
+		intptr_t count = (intptr_t)entry->value;
+		if (count > highest_count) {
+			highest_count = count;
+			highest_destination_dir = destination_dir;
+		}
+	}
+	return highest_destination_dir;
+}
+
 static void increment_count(struct dir_rename_info *info,
 			    char *old_dir,
 			    char *new_dir)
@@ -512,6 +530,8 @@ static void initialize_dir_rename_info(struct dir_rename_info *info,
 				       struct strset *dirs_removed,
 				       struct strmap *dir_rename_count)
 {
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
 	int i;
 
 	if (!dirs_removed) {
@@ -558,6 +578,23 @@ static void initialize_dir_rename_info(struct dir_rename_info *info,
 					 rename_dst[i].p->one->path,
 					 rename_dst[i].p->two->path);
 	}
+
+	/*
+	 * Now we collapse
+	 *    dir_rename_count: old_directory -> {new_directory -> count}
+	 * down to
+	 *    dir_rename_guess: old_directory -> best_new_directory
+	 * where best_new_directory is the one with the highest count.
+	 */
+	strmap_for_each_entry(info->dir_rename_count, &iter, entry) {
+		/* entry->key is source_dir */
+		struct strintmap *counts = entry->value;
+		char *best_newdir;
+
+		best_newdir = xstrdup(get_highest_rename_path(counts));
+		strmap_put(&info->dir_rename_guess, entry->key,
+			   best_newdir);
+	}
 }
 
 void partial_clear_dir_rename_count(struct strmap *dir_rename_count)
@@ -682,10 +719,10 @@ static int idx_possible_rename(char *filename, struct dir_rename_info *info)
 	 *       rename.
 	 *
 	 * This function, idx_possible_rename(), is only responsible for (4).
-	 * The conditions/steps in (1)-(3) will be handled via setting up
-	 * dir_rename_count and dir_rename_guess in a future
-	 * initialize_dir_rename_info() function.  Steps (0) and (5) are
-	 * handled by the caller of this function.
+	 * The conditions/steps in (1)-(3) are handled via setting up
+	 * dir_rename_count and dir_rename_guess in
+	 * initialize_dir_rename_info().  Steps (0) and (5) are handled by
+	 * the caller of this function.
 	 */
 	char *old_dir, *new_dir;
 	struct strbuf new_path = STRBUF_INIT;
-- 
gitgitgadget

  parent reply	other threads:[~2021-02-26  2:01 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-14  7:58 [PATCH 00/10] Optimization batch 8: use file basenames even more Elijah Newren via GitGitGadget
2021-02-14  7:58 ` [PATCH 01/10] Move computation of dir_rename_count from merge-ort to diffcore-rename Elijah Newren via GitGitGadget
2021-02-14  7:58 ` [PATCH 02/10] diffcore-rename: add functions for clearing dir_rename_count Elijah Newren via GitGitGadget
2021-02-14  7:58 ` [PATCH 03/10] diffcore-rename: move dir_rename_counts into a dir_rename_info struct Elijah Newren via GitGitGadget
2021-02-14  7:58 ` [PATCH 04/10] diffcore-rename: extend cleanup_dir_rename_info() Elijah Newren via GitGitGadget
2021-02-14  7:58 ` [PATCH 05/10] diffcore-rename: compute dir_rename_counts in stages Elijah Newren via GitGitGadget
2021-02-14  7:58 ` [PATCH 06/10] diffcore-rename: add a mapping of destination names to their indices Elijah Newren via GitGitGadget
2021-02-14  7:59 ` [PATCH 07/10] diffcore-rename: add a dir_rename_guess field to dir_rename_info Elijah Newren via GitGitGadget
2021-02-14  7:59 ` [PATCH 08/10] diffcore-rename: add a new idx_possible_rename function Elijah Newren via GitGitGadget
2021-02-14  7:59 ` [PATCH 09/10] diffcore-rename: limit dir_rename_counts computation to relevant dirs Elijah Newren via GitGitGadget
2021-02-14  7:59 ` [PATCH 10/10] diffcore-rename: use directory rename guided basename comparisons Elijah Newren via GitGitGadget
2021-02-23 23:43 ` [PATCH v2 00/10] Optimization batch 8: use file basenames even more Elijah Newren via GitGitGadget
2021-02-23 23:43   ` [PATCH v2 01/10] Move computation of dir_rename_count from merge-ort to diffcore-rename Elijah Newren via GitGitGadget
2021-02-24 15:25     ` Derrick Stolee
2021-02-24 18:50       ` Elijah Newren
2021-02-23 23:43   ` [PATCH v2 02/10] diffcore-rename: add functions for clearing dir_rename_count Elijah Newren via GitGitGadget
2021-02-23 23:44   ` [PATCH v2 03/10] diffcore-rename: move dir_rename_counts into a dir_rename_info struct Elijah Newren via GitGitGadget
2021-02-23 23:44   ` [PATCH v2 04/10] diffcore-rename: extend cleanup_dir_rename_info() Elijah Newren via GitGitGadget
2021-02-24 15:37     ` Derrick Stolee
2021-02-25  2:16     ` Ævar Arnfjörð Bjarmason
2021-02-25  2:26       ` Ævar Arnfjörð Bjarmason
2021-02-25  2:34       ` Junio C Hamano
2021-02-23 23:44   ` [PATCH v2 05/10] diffcore-rename: compute dir_rename_counts in stages Elijah Newren via GitGitGadget
2021-02-24 15:43     ` Derrick Stolee
2021-02-23 23:44   ` [PATCH v2 06/10] diffcore-rename: add a mapping of destination names to their indices Elijah Newren via GitGitGadget
2021-02-23 23:44   ` [PATCH v2 07/10] diffcore-rename: add a dir_rename_guess field to dir_rename_info Elijah Newren via GitGitGadget
2021-02-23 23:44   ` [PATCH v2 08/10] diffcore-rename: add a new idx_possible_rename function Elijah Newren via GitGitGadget
2021-02-24 17:35     ` Derrick Stolee
2021-02-25  1:13       ` Elijah Newren
2021-02-23 23:44   ` [PATCH v2 09/10] diffcore-rename: limit dir_rename_counts computation to relevant dirs Elijah Newren via GitGitGadget
2021-02-23 23:44   ` [PATCH v2 10/10] diffcore-rename: use directory rename guided basename comparisons Elijah Newren via GitGitGadget
2021-02-24 17:44     ` Derrick Stolee
2021-02-24 17:50   ` [PATCH v2 00/10] Optimization batch 8: use file basenames even more Derrick Stolee
2021-02-25  1:38     ` Elijah Newren
2021-02-26  1:58   ` [PATCH v3 " Elijah Newren via GitGitGadget
2021-02-26  1:58     ` [PATCH v3 01/10] diffcore-rename: use directory rename guided basename comparisons Elijah Newren via GitGitGadget
2021-02-26  1:58     ` [PATCH v3 02/10] diffcore-rename: add a new idx_possible_rename function Elijah Newren via GitGitGadget
2021-02-26 15:52       ` Derrick Stolee
2021-02-26  1:58     ` [PATCH v3 03/10] diffcore-rename: add a mapping of destination names to their indices Elijah Newren via GitGitGadget
2021-02-26  1:58     ` [PATCH v3 04/10] Move computation of dir_rename_count from merge-ort to diffcore-rename Elijah Newren via GitGitGadget
2021-02-26 15:55       ` Derrick Stolee
2021-02-26  1:58     ` [PATCH v3 05/10] diffcore-rename: add function for clearing dir_rename_count Elijah Newren via GitGitGadget
2021-02-26  1:58     ` [PATCH v3 06/10] diffcore-rename: move dir_rename_counts into dir_rename_info struct Elijah Newren via GitGitGadget
2021-02-26  1:58     ` [PATCH v3 07/10] diffcore-rename: extend cleanup_dir_rename_info() Elijah Newren via GitGitGadget
2021-02-26  1:58     ` [PATCH v3 08/10] diffcore-rename: compute dir_rename_counts in stages Elijah Newren via GitGitGadget
2021-02-26  1:58     ` [PATCH v3 09/10] diffcore-rename: limit dir_rename_counts computation to relevant dirs Elijah Newren via GitGitGadget
2021-02-26  1:58     ` Elijah Newren via GitGitGadget [this message]
2021-02-26 16:34     ` [PATCH v3 00/10] Optimization batch 8: use file basenames even more Derrick Stolee
2021-02-26 19:28       ` Elijah Newren
2021-02-27  0:30     ` [PATCH v4 " Elijah Newren via GitGitGadget
2021-02-27  0:30       ` [PATCH v4 01/10] diffcore-rename: use directory rename guided basename comparisons Elijah Newren via GitGitGadget
2021-02-27  0:30       ` [PATCH v4 02/10] diffcore-rename: provide basic implementation of idx_possible_rename() Elijah Newren via GitGitGadget
2021-02-27  0:30       ` [PATCH v4 03/10] diffcore-rename: add a mapping of destination names to their indices Elijah Newren via GitGitGadget
2021-02-27  0:30       ` [PATCH v4 04/10] Move computation of dir_rename_count from merge-ort to diffcore-rename Elijah Newren via GitGitGadget
2021-02-27  0:30       ` [PATCH v4 05/10] diffcore-rename: add function for clearing dir_rename_count Elijah Newren via GitGitGadget
2021-02-27  0:30       ` [PATCH v4 06/10] diffcore-rename: move dir_rename_counts into dir_rename_info struct Elijah Newren via GitGitGadget
2021-02-27  0:30       ` [PATCH v4 07/10] diffcore-rename: extend cleanup_dir_rename_info() Elijah Newren via GitGitGadget
2021-02-27  0:30       ` [PATCH v4 08/10] diffcore-rename: compute dir_rename_counts in stages Elijah Newren via GitGitGadget
2021-02-27  0:30       ` [PATCH v4 09/10] diffcore-rename: limit dir_rename_counts computation to relevant dirs Elijah Newren via GitGitGadget
2021-02-27  0:30       ` [PATCH v4 10/10] diffcore-rename: compute dir_rename_guess from dir_rename_counts Elijah Newren via GitGitGadget
2021-03-09 21:52       ` [PATCH v4 00/10] Optimization batch 8: use file basenames even more Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=65f7bfb735f21ff774938b49ba500bf81265db94.1614304700.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).