git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Derrick Stolee <dstolee@microsoft.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Taylor Blau <me@ttaylorr.com>, Junio C Hamano <gitster@pobox.com>,
	Jeff King <peff@peff.net>
Subject: Re: [PATCH v2 3/4] diffcore-rename: guide inexact rename detection based on basenames
Date: Tue, 9 Feb 2021 09:41:16 -0800	[thread overview]
Message-ID: <CABPp-BFfF5rxbHwP1xnr63fPmp2=NkTZ-r0=FVZ1oVhAZD_s9w@mail.gmail.com> (raw)
In-Reply-To: <48a208c2-75fe-773e-aa2a-12a90dd06ddd@gmail.com>

On Tue, Feb 9, 2021 at 5:33 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 2/9/2021 6:32 AM, Elijah Newren via GitGitGadget wrote:
> > +     num_sources = rename_src_nr;
> > +
> > +     if (want_copies || break_idx) {
> > +             /*
> > +              * Cull sources:
> > +              *   - remove ones corresponding to exact renames
> > +              */
> > +             trace2_region_enter("diff", "cull after exact", options->repo);
> > +             remove_unneeded_paths_from_src(want_copies);
> > +             trace2_region_leave("diff", "cull after exact", options->repo);
>
> Isn't this the same as
>
> > +     } else {
> > +             /* Determine minimum score to match basenames */
> > +             int min_basename_score = (int)(5*minimum_score + 0*MAX_SCORE)/5;
> > +
> > +             /*
> > +              * Cull sources:
> > +              *   - remove ones involved in renames (found via exact match)
> > +              */
> > +             trace2_region_enter("diff", "cull exact", options->repo);
> > +             remove_unneeded_paths_from_src(want_copies);
> > +             trace2_region_leave("diff", "cull exact", options->repo);
>
> ...this? (except the regions are renamed)
>
> Could this be simplified as:
>
> +       num_sources = rename_src_nr;
> +
> +       trace2_region_enter("diff", "cull after exact", options->repo);
> +       remove_unneeded_paths_from_src(want_copies);
> +       trace2_region_leave("diff", "cull after exact", options->repo);
> +
> +       if (!want_copies && !break_idx) {
> +               /* Determine minimum score to match basenames */
>
> I realize you probably changed the region names on purpose to distinguish
> that there are two "cull" regions in the case of no copies, but I think
> that isn't really worth different names. Better to have a consistent region
> name around the same activity in both cases.

Actually, the reason they were split is because a later series has to
call remove_unneeded_paths_from_src() differently for the two
branches.  The patch history was so dirty that the easiest way to
clean things up was just to create completely new patches pulling off
relevant chunks of code and touching them up; while doing that, I
didn't notice that the changes I made to split out this early series
resulted in this near-duplication.

So, I can join them...but they would just need to be split back out in
my "Optimization batch 9" series.

I'm happy to fix the region name to make them the same.  Is that good
enough, or would you rather these common code regions combined for
this patch and then split out later?

>
> > +             int min_basename_score = (int)(5*minimum_score + 0*MAX_SCORE)/5;
>
> Did you intend for this to be 5*min + 0*MAX? This seems wrong if you want
> this value to be different from minimum_score.

In my cover letter I noted that I didn't know what to set this to and
wanted input; yesterday you said it wasn't worth worrying about using
a different value, but Junio suggested we should use one (but didn't
state how much higher it should be or whether it should be user input
driven).  This weird construct was here just to show that it is easy
to feed a different score into the basename comparison than what is
used elsewhere; I can fix it up once I get word on what Junio wants to
see.

Since I didn't know what to use, though, and I didn't want to get a
different set of numbers for the final commit message on the speedup
achieved if I'm just going to throw them away and recompute once I
find out what Junio wants here, I did intentionally set the
computation to just give us minimum_score, for now.

> > +
> > +             /* Utilize file basenames to quickly find renames. */
> > +             trace2_region_enter("diff", "basename matches", options->repo);
> > +             rename_count += find_basename_matches(options,
> > +                                                   min_basename_score,
> > +                                                   rename_src_nr);
> > +             trace2_region_leave("diff", "basename matches", options->repo);
> > +
> > +             /*
> > +              * Cull sources, again:
> > +              *   - remove ones involved in renames (found via basenames)
> > +              */
> > +             trace2_region_enter("diff", "cull basename", options->repo);
> > +             remove_unneeded_paths_from_src(want_copies);
> > +             trace2_region_leave("diff", "cull basename", options->repo);
> > +     }
> > +
>
> Thanks,
> -Stolee

  reply	other threads:[~2021-02-09 17:46 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-06 22:52 [PATCH 0/3] Optimization batch 7: use file basenames to guide rename detection Elijah Newren via GitGitGadget
2021-02-06 22:52 ` [PATCH 1/3] diffcore-rename: compute basenames of all source and dest candidates Elijah Newren via GitGitGadget
2021-02-06 22:52 ` [PATCH 2/3] diffcore-rename: complete find_basename_matches() Elijah Newren via GitGitGadget
2021-02-06 22:52 ` [PATCH 3/3] diffcore-rename: guide inexact rename detection based on basenames Elijah Newren via GitGitGadget
2021-02-07 14:38   ` Derrick Stolee
2021-02-07 19:51     ` Junio C Hamano
2021-02-08  8:38       ` Elijah Newren
2021-02-08 11:43         ` Derrick Stolee
2021-02-08 16:25           ` Elijah Newren
2021-02-08 17:37         ` Junio C Hamano
2021-02-08 22:00           ` Elijah Newren
2021-02-08 23:43             ` Junio C Hamano
2021-02-08 23:52               ` Elijah Newren
2021-02-08  8:27     ` Elijah Newren
2021-02-08 11:31       ` Derrick Stolee
2021-02-08 16:09         ` Elijah Newren
2021-02-07  5:19 ` [PATCH 0/3] Optimization batch 7: use file basenames to guide rename detection Junio C Hamano
2021-02-07  6:05   ` Elijah Newren
2021-02-09 11:32 ` [PATCH v2 0/4] " Elijah Newren via GitGitGadget
2021-02-09 11:32   ` [PATCH v2 1/4] diffcore-rename: compute basenames of all source and dest candidates Elijah Newren via GitGitGadget
2021-02-09 13:17     ` Derrick Stolee
2021-02-09 16:56       ` Elijah Newren
2021-02-09 17:02         ` Derrick Stolee
2021-02-09 17:42           ` Elijah Newren
2021-02-09 11:32   ` [PATCH v2 2/4] diffcore-rename: complete find_basename_matches() Elijah Newren via GitGitGadget
2021-02-09 13:25     ` Derrick Stolee
2021-02-09 17:17       ` Elijah Newren
2021-02-09 17:34         ` Derrick Stolee
2021-02-09 11:32   ` [PATCH v2 3/4] diffcore-rename: guide inexact rename detection based on basenames Elijah Newren via GitGitGadget
2021-02-09 13:33     ` Derrick Stolee
2021-02-09 17:41       ` Elijah Newren [this message]
2021-02-09 18:59         ` Junio C Hamano
2021-02-09 11:32   ` [PATCH v2 4/4] gitdiffcore doc: mention new preliminary step for rename detection Elijah Newren via GitGitGadget
2021-02-09 12:59     ` Derrick Stolee
2021-02-09 17:03       ` Junio C Hamano
2021-02-09 17:44         ` Elijah Newren
2021-02-10 15:15   ` [PATCH v3 0/5] Optimization batch 7: use file basenames to guide " Elijah Newren via GitGitGadget
2021-02-10 15:15     ` [PATCH v3 1/5] t4001: add a test comparing basename similarity and content similarity Elijah Newren via GitGitGadget
2021-02-13  1:15       ` Junio C Hamano
2021-02-13  4:50         ` Elijah Newren
2021-02-13 23:56           ` Junio C Hamano
2021-02-14  1:24             ` Elijah Newren
2021-02-14  1:32               ` Junio C Hamano
2021-02-14  3:14                 ` Elijah Newren
2021-02-10 15:15     ` [PATCH v3 2/5] diffcore-rename: compute basenames of all source and dest candidates Elijah Newren via GitGitGadget
2021-02-13  1:32       ` Junio C Hamano
2021-02-10 15:15     ` [PATCH v3 3/5] diffcore-rename: complete find_basename_matches() Elijah Newren via GitGitGadget
2021-02-13  1:48       ` Junio C Hamano
2021-02-13 18:34         ` Elijah Newren
2021-02-13 23:55           ` Junio C Hamano
2021-02-14  3:08             ` Elijah Newren
2021-02-10 15:15     ` [PATCH v3 4/5] diffcore-rename: guide inexact rename detection based on basenames Elijah Newren via GitGitGadget
2021-02-13  1:49       ` Junio C Hamano
2021-02-10 15:15     ` [PATCH v3 5/5] gitdiffcore doc: mention new preliminary step for rename detection Elijah Newren via GitGitGadget
2021-02-10 16:41       ` Junio C Hamano
2021-02-10 17:20         ` Elijah Newren
2021-02-11  8:15     ` [PATCH v4 0/6] Optimization batch 7: use file basenames to guide " Elijah Newren via GitGitGadget
2021-02-11  8:15       ` [PATCH v4 1/6] t4001: add a test comparing basename similarity and content similarity Elijah Newren via GitGitGadget
2021-02-11  8:15       ` [PATCH v4 2/6] diffcore-rename: compute basenames of all source and dest candidates Elijah Newren via GitGitGadget
2021-02-11  8:15       ` [PATCH v4 3/6] diffcore-rename: complete find_basename_matches() Elijah Newren via GitGitGadget
2021-02-11  8:15       ` [PATCH v4 4/6] diffcore-rename: guide inexact rename detection based on basenames Elijah Newren via GitGitGadget
2021-02-11  8:15       ` [PATCH v4 5/6] gitdiffcore doc: mention new preliminary step for rename detection Elijah Newren via GitGitGadget
2021-02-11  8:15       ` [PATCH v4 6/6] merge-ort: call diffcore_rename() directly Elijah Newren via GitGitGadget
2021-02-13  1:53       ` [PATCH v4 0/6] Optimization batch 7: use file basenames to guide rename detection Junio C Hamano
2021-02-14  7:51       ` [PATCH v5 " Elijah Newren via GitGitGadget
2021-02-14  7:51         ` [PATCH v5 1/6] t4001: add a test comparing basename similarity and content similarity Elijah Newren via GitGitGadget
2021-02-14  7:51         ` [PATCH v5 2/6] diffcore-rename: compute basenames of source and dest candidates Elijah Newren via GitGitGadget
2021-02-14  7:51         ` [PATCH v5 3/6] diffcore-rename: complete find_basename_matches() Elijah Newren via GitGitGadget
2021-02-14  7:51         ` [PATCH v5 4/6] diffcore-rename: guide inexact rename detection based on basenames Elijah Newren via GitGitGadget
2021-02-14  7:51         ` [PATCH v5 5/6] gitdiffcore doc: mention new preliminary step for rename detection Elijah Newren via GitGitGadget
2021-02-14  7:51         ` [PATCH v5 6/6] merge-ort: call diffcore_rename() directly Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BFfF5rxbHwP1xnr63fPmp2=NkTZ-r0=FVZ1oVhAZD_s9w@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).