git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>,
	Derrick Stolee <dstolee@gmail.com>, Taylor Blau <me@ttaylorr.com>,
	Elijah Newren <newren@gmail.com>,
	Elijah Newren <newren@gmail.com>
Subject: [PATCH v2 3/5] diffcore-rename: allow different missing_object_cb functions
Date: Tue, 15 Jun 2021 22:41:44 +0000	[thread overview]
Message-ID: <9f2a8ed8d61f7eef014ec14d93733e2267e1939f.1623796907.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.969.v2.git.1623796907.gitgitgadget@gmail.com>

From: Elijah Newren <newren@gmail.com>

estimate_similarity() was setting up a diff_populate_filespec_options
every time it was called, requiring the caller of estimate_similarity()
to pass in some data needed to set up this option.  Currently the needed
data consisted of a single variable (skip_unmodified), but we want to
also have the different estimate_similarity() callsites start using
different missing_object_cb functions as well.  Rather than also passing
that data in, just have the caller pass in the whole
diff_populate_filespec_options, and reduce the number of times we need to
set it up.

As a side note, this also drops the number of calls to
has_promisor_remote() dramatically.  If L is the number of basename
paths to compare, M is the number of inexact sources, and N is the
number of inexact destinations, then the number of calls to
has_promisor_remote() drops from L+M*N down to at most 2 -- one for each
of the sites that calls estimate_similarity().  has_promisor_remote() is
a very fast function so this almost certainly has no measurable
performance impact, but it seems cleaner to avoid calling that function
so many times.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c | 58 +++++++++++++++++++++++++++++++----------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 35378d84e8f1..e13e52046026 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -126,7 +126,7 @@ static int estimate_similarity(struct repository *r,
 			       struct diff_filespec *src,
 			       struct diff_filespec *dst,
 			       int minimum_score,
-			       int skip_unmodified)
+			       struct diff_populate_filespec_options *dpf_opt)
 {
 	/* src points at a file that existed in the original tree (or
 	 * optionally a file in the destination tree) and dst points
@@ -143,15 +143,6 @@ static int estimate_similarity(struct repository *r,
 	 */
 	unsigned long max_size, delta_size, base_size, src_copied, literal_added;
 	int score;
-	struct diff_populate_filespec_options dpf_options = {
-		.check_size_only = 1
-	};
-	struct prefetch_options prefetch_options = {r, skip_unmodified};
-
-	if (r == the_repository && has_promisor_remote()) {
-		dpf_options.missing_object_cb = prefetch;
-		dpf_options.missing_object_data = &prefetch_options;
-	}
 
 	/* We deal only with regular files.  Symlink renames are handled
 	 * only when they are exact matches --- in other words, no edits
@@ -169,11 +160,13 @@ static int estimate_similarity(struct repository *r,
 	 * is a possible size - we really should have a flag to
 	 * say whether the size is valid or not!)
 	 */
+	dpf_opt->check_size_only = 1;
+
 	if (!src->cnt_data &&
-	    diff_populate_filespec(r, src, &dpf_options))
+	    diff_populate_filespec(r, src, dpf_opt))
 		return 0;
 	if (!dst->cnt_data &&
-	    diff_populate_filespec(r, dst, &dpf_options))
+	    diff_populate_filespec(r, dst, dpf_opt))
 		return 0;
 
 	max_size = ((src->size > dst->size) ? src->size : dst->size);
@@ -191,11 +184,11 @@ static int estimate_similarity(struct repository *r,
 	if (max_size * (MAX_SCORE-minimum_score) < delta_size * MAX_SCORE)
 		return 0;
 
-	dpf_options.check_size_only = 0;
+	dpf_opt->check_size_only = 0;
 
-	if (!src->cnt_data && diff_populate_filespec(r, src, &dpf_options))
+	if (!src->cnt_data && diff_populate_filespec(r, src, dpf_opt))
 		return 0;
-	if (!dst->cnt_data && diff_populate_filespec(r, dst, &dpf_options))
+	if (!dst->cnt_data && diff_populate_filespec(r, dst, dpf_opt))
 		return 0;
 
 	if (diffcore_count_changes(r, src, dst,
@@ -862,7 +855,11 @@ static int find_basename_matches(struct diff_options *options,
 	int i, renames = 0;
 	struct strintmap sources;
 	struct strintmap dests;
-
+	struct diff_populate_filespec_options dpf_options = {
+		.check_binary = 0,
+		.missing_object_cb = NULL,
+		.missing_object_data = NULL
+	};
 	/*
 	 * The prefeteching stuff wants to know if it can skip prefetching
 	 * blobs that are unmodified...and will then do a little extra work
@@ -873,7 +870,10 @@ static int find_basename_matches(struct diff_options *options,
 	 * the extra work necessary to check if rename_src entries are
 	 * unmodified would be a small waste.
 	 */
-	int skip_unmodified = 0;
+	struct prefetch_options prefetch_options = {
+		.repo = options->repo,
+		.skip_unmodified = 0
+	};
 
 	/*
 	 * Create maps of basename -> fullname(s) for remaining sources and
@@ -910,6 +910,11 @@ static int find_basename_matches(struct diff_options *options,
 			strintmap_set(&dests, base, i);
 	}
 
+	if (options->repo == the_repository && has_promisor_remote()) {
+		dpf_options.missing_object_cb = prefetch;
+		dpf_options.missing_object_data = &prefetch_options;
+	}
+
 	/* Now look for basename matchups and do similarity estimation */
 	for (i = 0; i < rename_src_nr; ++i) {
 		char *filename = rename_src[i].p->one->path;
@@ -953,7 +958,7 @@ static int find_basename_matches(struct diff_options *options,
 			one = rename_src[src_index].p->one;
 			two = rename_dst[dst_index].p->two;
 			score = estimate_similarity(options->repo, one, two,
-						    minimum_score, skip_unmodified);
+						    minimum_score, &dpf_options);
 
 			/* If sufficiently similar, record as rename pair */
 			if (score < minimum_score)
@@ -1272,6 +1277,14 @@ void diffcore_rename_extended(struct diff_options *options,
 	int num_sources, want_copies;
 	struct progress *progress = NULL;
 	struct dir_rename_info info;
+	struct diff_populate_filespec_options dpf_options = {
+		.check_binary = 0,
+		.missing_object_cb = NULL,
+		.missing_object_data = NULL
+	};
+	struct prefetch_options prefetch_options = {
+		.repo = options->repo
+	};
 
 	trace2_region_enter("diff", "setup", options->repo);
 	info.setup = 0;
@@ -1433,6 +1446,13 @@ void diffcore_rename_extended(struct diff_options *options,
 				(uint64_t)num_destinations * (uint64_t)num_sources);
 	}
 
+	/* Finish setting up dpf_options */
+	prefetch_options.skip_unmodified = skip_unmodified;
+	if (options->repo == the_repository && has_promisor_remote()) {
+		dpf_options.missing_object_cb = prefetch;
+		dpf_options.missing_object_data = &prefetch_options;
+	}
+
 	CALLOC_ARRAY(mx, st_mult(NUM_CANDIDATE_PER_DST, num_destinations));
 	for (dst_cnt = i = 0; i < rename_dst_nr; i++) {
 		struct diff_filespec *two = rename_dst[i].p->two;
@@ -1458,7 +1478,7 @@ void diffcore_rename_extended(struct diff_options *options,
 			this_src.score = estimate_similarity(options->repo,
 							     one, two,
 							     minimum_score,
-							     skip_unmodified);
+							     &dpf_options);
 			this_src.name_score = basename_same(one, two);
 			this_src.dst = i;
 			this_src.src = j;
-- 
gitgitgadget


  parent reply	other threads:[~2021-06-15 22:42 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-05  1:27 [PATCH 0/5] Optimization batch 13: partial clone optimizations for merge-ort Elijah Newren via GitGitGadget
2021-06-05  1:28 ` [PATCH 1/5] promisor-remote: output trace2 statistics for number of objects fetched Elijah Newren via GitGitGadget
2021-06-09 21:12   ` Derrick Stolee
2021-06-05  1:28 ` [PATCH 2/5] t6421: add tests checking for excessive object downloads during merge Elijah Newren via GitGitGadget
2021-06-09 21:16   ` Derrick Stolee
2021-06-05  1:28 ` [PATCH 3/5] diffcore-rename: allow different missing_object_cb functions Elijah Newren via GitGitGadget
2021-06-05  1:28 ` [PATCH 4/5] diffcore-rename: use a different prefetch for basename comparisons Elijah Newren via GitGitGadget
2021-06-05  1:28 ` [PATCH 5/5] merge-ort: add prefetching for content merges Elijah Newren via GitGitGadget
2021-06-15 22:41 ` [PATCH v2 0/5] Optimization batch 13: partial clone optimizations for merge-ort Elijah Newren via GitGitGadget
2021-06-15 22:41   ` [PATCH v2 1/5] promisor-remote: output trace2 statistics for number of objects fetched Elijah Newren via GitGitGadget
2021-06-15 22:41   ` [PATCH v2 2/5] t6421: add tests checking for excessive object downloads during merge Elijah Newren via GitGitGadget
2021-06-17  4:49     ` Junio C Hamano
2021-06-15 22:41   ` Elijah Newren via GitGitGadget [this message]
2021-06-15 22:41   ` [PATCH v2 4/5] diffcore-rename: use a different prefetch for basename comparisons Elijah Newren via GitGitGadget
2021-06-15 22:41   ` [PATCH v2 5/5] merge-ort: add prefetching for content merges Elijah Newren via GitGitGadget
2021-06-17  5:04     ` Junio C Hamano
2021-06-22  8:02       ` Elijah Newren
2021-06-16 17:54   ` [PATCH v2 0/5] Optimization batch 13: partial clone optimizations for merge-ort Derrick Stolee
2021-06-17  5:05   ` Junio C Hamano
2021-06-22  8:04   ` [PATCH v3 " Elijah Newren via GitGitGadget
2021-06-22  8:04     ` [PATCH v3 1/5] promisor-remote: output trace2 statistics for number of objects fetched Elijah Newren via GitGitGadget
2021-06-22  8:04     ` [PATCH v3 2/5] t6421: add tests checking for excessive object downloads during merge Elijah Newren via GitGitGadget
2021-06-22  8:04     ` [PATCH v3 3/5] diffcore-rename: allow different missing_object_cb functions Elijah Newren via GitGitGadget
2021-06-22  8:04     ` [PATCH v3 4/5] diffcore-rename: use a different prefetch for basename comparisons Elijah Newren via GitGitGadget
2021-06-22  8:04     ` [PATCH v3 5/5] merge-ort: add prefetching for content merges Elijah Newren via GitGitGadget
2021-06-22 16:10     ` [PATCH v3 0/5] Optimization batch 13: partial clone optimizations for merge-ort Derrick Stolee
2021-06-22 18:45       ` Elijah Newren
2021-06-23  2:14         ` Derrick Stolee
2021-06-23  8:11           ` Elijah Newren
2021-06-23 17:31             ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9f2a8ed8d61f7eef014ec14d93733e2267e1939f.1623796907.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=dstolee@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).