git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Derrick Stolee <dstolee@microsoft.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Taylor Blau <me@ttaylorr.com>, Jeff King <peff@peff.net>,
	Karsten Blees <blees@dcon.de>, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 2/2] diffcore-rename: filter rename_src list when possible
Date: Wed, 03 Feb 2021 11:12:40 -0800	[thread overview]
Message-ID: <xmqqbld1c6dz.fsf@gitster.c.googlers.com> (raw)
In-Reply-To: <580ba9a10f54c7a2e7f28d60395fc2edae25eec1.1612331345.git.gitgitgadget@gmail.com> (Elijah Newren via GitGitGadget's message of "Wed, 03 Feb 2021 05:49:05 +0000")

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +static int remove_unneeded_paths_from_src(int num_src,
> +					  int detecting_copies)
> +{
> +	int i, new_num_src;
> +
> +	/*
> +	 * Note on reasons why we cull unneeded sources but not destinations:
> +	 *   1) Pairings are stored in rename_dst (not rename_src), which we
> +	 *      need to keep around.  So, we just can't cull rename_dst even
> +	 *      if we wanted to.  But doing so wouldn't help because...
> +	 *
> +	 *   2) There is a matrix pairwise comparison that follows the
> +	 *      "Performing inexact rename detection" progress message.
> +	 *      Iterating over the destinations is done in the outer loop,
> +	 *      hence we only iterate over each of those once and we can
> +	 *      easily skip the outer loop early if the destination isn't
> +	 *      relevant.  That's only one check per destination path to
> +	 *      skip.
> +	 *
> +	 *      By contrast, the sources are iterated in the inner loop; if
> +	 *      we check whether a source can be skipped, then we'll be
> +	 *      checking it N separate times, once for each destination.
> +	 *      We don't want to have to iterate over known-not-needed
> +	 *      sources N times each, so avoid that by removing the sources
> +	 *      from rename_src here.
> +	 */
> +	if (detecting_copies)
> +		return num_src; /* nothing to remove */
> +	if (break_idx)
> +		return num_src; /* culling incompatbile with break detection */
> +
> +	for (i = 0, new_num_src = 0; i < num_src; i++) {
> +		/*
> +		 * renames are stored in rename_dst, so if a rename has
> +		 * already been detected using this source, we can just
> +		 * remove the source knowing rename_dst has its info.
> +		 */
> +		if (rename_src[i].p->one->rename_used)
> +			continue;
> +
> +		if (new_num_src < i)
> +			memcpy(&rename_src[new_num_src], &rename_src[i],
> +			       sizeof(struct diff_rename_src));
> +		new_num_src++;
> +	}
> +
> +	return new_num_src;
> +}

Essentially we are compacting rename_src[] array from num_src
elements down to new_num_src elements; we are losing pointers, but I
presume these are all borrowed pointers that we do not own and we
are not responsible for freeing?  If we were to free them, the
compaction would leave duplicates after the new tail (new_num_src)
and we'd end up having to worry about double-freeing, so hopefully
all we need to do is just to free the entire array of pointers, and
not the pointees.

Having to do this just once and being able to reduce the number of
entries we need to iterate over does sound like a good simple
optimization.

>  void diffcore_rename(struct diff_options *options)
>  {
>  	int detect_rename = options->detect_rename;
> @@ -463,10 +512,11 @@ void diffcore_rename(struct diff_options *options)
>  	struct diff_score *mx;
>  	int i, j, rename_count, skip_unmodified = 0;
>  	int num_destinations, dst_cnt;
> -	int num_sources;
> +	int num_sources, want_copies;
>  	struct progress *progress = NULL;
>  
>  	trace2_region_enter("diff", "setup", options->repo);
> +	want_copies = (detect_rename == DIFF_DETECT_COPY);
>  	if (!minimum_score)
>  		minimum_score = DEFAULT_RENAME_SCORE;
>  
> @@ -529,13 +579,10 @@ void diffcore_rename(struct diff_options *options)
>  		goto cleanup;
>  
>  	/*
> -	 * Calculate how many renames are left (but all the source
> -	 * files still remain as options for rename/copies!)
> +	 * Calculate how many renames are left
>  	 */
>  	num_destinations = (rename_dst_nr - rename_count);
> -	num_sources = rename_src_nr;
> -	if (detect_rename != DIFF_DETECT_COPY)
> -		num_sources -= rename_count;
> +	num_sources = remove_unneeded_paths_from_src(rename_src_nr, want_copies);

OK, this is in a sense an extended version of the previous step.

I am not sure if rename_src_nr can be left out of sync with reality
like this patch does, though.  The reference to that variable in
register_rename_src() and find_exact_renames() are OK as we are not
going to call them after we futz with the rename_src[] array, but
the reference in prefetch(), which does not actually happen early
but only when we start running estimate_similarity(), which is after
we compacted the rename_src[] array, would be affected, no?

Thanks.

  parent reply	other threads:[~2021-02-03 19:15 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-03  5:49 [PATCH 0/2] Optimization batch 6: make full use of exact renames Elijah Newren via GitGitGadget
2021-02-03  5:49 ` [PATCH 1/2] diffcore-rename: no point trying to find a match better than exact Elijah Newren via GitGitGadget
2021-02-03 11:44   ` Derrick Stolee
2021-02-03 16:31     ` Elijah Newren
2021-02-03 18:46     ` Junio C Hamano
2021-02-03 19:10       ` Elijah Newren
2021-02-03  5:49 ` [PATCH 2/2] diffcore-rename: filter rename_src list when possible Elijah Newren via GitGitGadget
     [not found]   ` <13feb106-c3a7-a26d-0e6e-013aa45c58d4@gmail.com>
2021-02-03 17:12     ` Elijah Newren
2021-02-03 19:12   ` Junio C Hamano [this message]
2021-02-03 19:19     ` Elijah Newren
2021-02-03 20:03 ` [PATCH v2 0/2] Optimization batch 6: make full use of exact renames Elijah Newren via GitGitGadget
2021-02-03 20:03   ` [PATCH v2 1/2] diffcore-rename: no point trying to find a match better than exact Elijah Newren via GitGitGadget
2021-02-03 20:03   ` [PATCH v2 2/2] diffcore-rename: filter rename_src list when possible Elijah Newren via GitGitGadget
2021-02-13  1:04     ` Junio C Hamano
2021-02-13  4:24       ` Elijah Newren
2021-02-13  1:06     ` Junio C Hamano
2021-02-13  4:43       ` Elijah Newren
2021-02-03 21:56   ` [PATCH v2 0/2] Optimization batch 6: make full use of exact renames Junio C Hamano
2021-02-03 23:06     ` Elijah Newren
2021-02-03 23:26       ` Junio C Hamano
2021-02-03 23:36       ` Jeff King
2021-02-04  0:05         ` Elijah Newren
2021-02-14  7:34   ` [PATCH v3 " Elijah Newren via GitGitGadget
2021-02-14  7:35     ` [PATCH v3 1/2] diffcore-rename: no point trying to find a match better than exact Elijah Newren via GitGitGadget
2021-02-14  7:35     ` [PATCH v3 2/2] diffcore-rename: filter rename_src list when possible Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqbld1c6dz.fsf@gitster.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=blees@dcon.de \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).