git@vger.kernel.org list mirror (unofficial, one of many)
From: Elijah Newren <newren@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Derrick Stolee <dstolee@microsoft.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH v2 07/13] merge-ort: populate caches of rename detection results
Date: Wed, 19 May 2021 17:48:53 -0700
Message-ID: <CABPp-BH79AC+k99djq28my=3VyyszR4=uRhSU8Ouy=P9WmiSCw@mail.gmail.com>
In-Reply-To: <df8260bf-0990-a2df-86be-0059ca561751@gmail.com>

On Mon, May 17, 2021 at 6:51 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/3/21 10:12 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Fill in cache_pairs, cached_target_names, and cached_irrelevant based on
> > rename detection results.  Future commits will make use of these values.
>
> Thank you for continuing to break this down into nice-sized pieces.
>
> > +static void possibly_cache_new_pair(struct rename_info *renames,
> > +                                 struct diff_filepair *p,
> > +                                 unsigned side,
> > +                                 char *new_path)
> > +{
> > +     char *old_value;
> > +     int dir_renamed_side = 0;
> > +
> > +     if (new_path) {
> > +             /*
> > +              * Directory renames happen on the other side of history from
> > +              * the side that adds new files to the old directory.
> > +              */
> > +             dir_renamed_side = 3 - side;
>
> Neat trick. Side is in { 1, 2 } so this makes sense.
>
> > +     } else {
> > +             int val = strintmap_get(&renames->relevant_sources[side],
> > +                                     p->one->path);
> > +             if (val == RELEVANT_NO_MORE) {
> > +                     assert(p->status == 'D');
> > +                     strset_add(&renames->cached_irrelevant[side],
> > +                                p->one->path);
>
> Ok, I see a transition here from a relevant side to an
> irrelevant one.
>
> > +             }
> > +             if (val <= 0)
> > +                     return;
> > +     }
> > +
> > +     if (p->status == 'D') {
> > +             /*
> > +              * If we already had this delete, we'll just set its value
> > +              * to NULL again, so no harm.
> > +              */
> > +             strmap_put(&renames->cached_pairs[side], p->one->path, NULL);
> > +     } else if (p->status == 'R') {
> > +             if (new_path) {
> > +                     new_path = xstrdup(new_path);
> > +                     old_value = strmap_put(&renames->cached_pairs[dir_renamed_side],
> > +                                            p->two->path, new_path);
> > +                     strset_add(&renames->cached_target_names[dir_renamed_side],
> > +                                new_path);
> > +                     assert(!old_value);
>
> This assert implies that p->status == 'R' only if this is the
> first side (and first commit) to show a rename, right?

Um, this assert implies that p->two->path was not already found in
renames->cached_pairs[dir_renamed_side].

>
> > +             }
> > +             if (!new_path)
> > +                     new_path = p->two->path;
> > +             new_path = xstrdup(new_path);
>
> If new_path was provided as non-NULL, then this is the second
> time we are dup-ing it. However, that seems correct because we
> want a different copy for every time we add it to the cached_pairs
> and cached_target_names data.
>
> > +             old_value = strmap_put(&renames->cached_pairs[side],
> > +                                    p->one->path, new_path);
> > +             strset_add(&renames->cached_target_names[side],
> > +                        new_path);
>
> Since we appear to be doing this in multiple places, would this
> be a good place for a helper method? We could have it take a
> `const char *new_path` and have the helper manage the `xstrdup()`
> so we never forget to do that exactly once per insert to these
> sets.

Makes sense.

> > +             free(old_value);
> > +     } else if (p->status == 'A' && new_path) {
> > +             new_path = xstrdup(new_path);
> > +             old_value = strmap_put(&renames->cached_pairs[dir_renamed_side],
> > +                                    p->two->path, new_path);
> > +             strset_add(&renames->cached_target_names[dir_renamed_side],
> > +                        new_path);
> > +             assert(!old_value);
>
> And here's the third instance, making the "three is many" rule
> kick in. A helper method would help make this easier. You can
> also have a parameter corresponding to whether you need to
> free() the old_value or assert it is NULL.

Yep, I'll add a helper.
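
To make the suggested helper concrete, here is a rough standalone sketch,
not the code that will go into the series. The toy_map type and the toy_*
functions are hypothetical stand-ins for strmap/strset so that the example
compiles outside of git, and toy_strdup() stands in for xstrdup(). The name
cache_new_pair() and its free_old_value parameter are likewise assumptions
chosen for illustration. The one property borrowed from the real API is
that a put returns the previously stored value, or NULL if the key was new.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for git's strmap (NOT the real API); keys are copied. */
#define TOY_MAX 16
struct toy_map {
	char *keys[TOY_MAX];
	char *vals[TOY_MAX];
	size_t nr;
};

/* Stand-in for xstrdup(): duplicate a string, dying on allocation failure. */
static char *toy_strdup(const char *s)
{
	char *copy = malloc(strlen(s) + 1);

	assert(copy);
	strcpy(copy, s);
	return copy;
}

/* Like strmap_put(): store val under key, return the old value or NULL. */
static char *toy_put(struct toy_map *m, const char *key, char *val)
{
	size_t i;

	for (i = 0; i < m->nr; i++) {
		if (!strcmp(m->keys[i], key)) {
			char *old = m->vals[i];
			m->vals[i] = val;
			return old;
		}
	}
	assert(m->nr < TOY_MAX);
	m->keys[m->nr] = toy_strdup(key);
	m->vals[m->nr] = val;
	m->nr++;
	return NULL;
}

/* Look up key; returns NULL when the key is absent. */
static char *toy_get(const struct toy_map *m, const char *key)
{
	size_t i;

	for (i = 0; i < m->nr; i++)
		if (!strcmp(m->keys[i], key))
			return m->vals[i];
	return NULL;
}

/*
 * Hypothetical helper: copy new_path exactly once per insert, record the
 * copy in both containers, then either free the value that was replaced
 * or insist that there was none.
 */
static void cache_new_pair(struct toy_map *pairs,
			   struct toy_map *target_names,
			   const char *old_path,
			   const char *new_path,
			   int free_old_value)
{
	char *copy = toy_strdup(new_path);
	char *old_value = toy_put(pairs, old_path, copy);

	toy_put(target_names, copy, NULL); /* used as a set; value unused */
	if (free_old_value)
		free(old_value);
	else
		assert(!old_value);
}
```

With something of this shape, the three call sites in the quoted hunk would
differ only in their arguments: the two directory-rename cases would pass
dir_renamed_side, p->two->path, new_path and free_old_value=0, while the
plain rename case would pass side, p->one->path, the target path and
free_old_value=1.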

>
> > +     }
> > +}
> > +
> >  static int compare_pairs(const void *a_, const void *b_)
> >  {
> >       const struct diff_filepair *a = *((const struct diff_filepair **)a_);
> > @@ -2415,6 +2474,7 @@ static int collect_renames(struct merge_options *opt,
> >               char *new_path; /* non-NULL only with directory renames */
> >
> >               if (p->status != 'A' && p->status != 'R') {
> > +                     possibly_cache_new_pair(renames, p, side_index, NULL);
> >                       diff_free_filepair(p);
> >                       continue;
> >               }
> > @@ -2426,11 +2486,11 @@ static int collect_renames(struct merge_options *opt,
> >                                                     &collisions,
> >                                                     &clean);
> >
> > +             possibly_cache_new_pair(renames, p, side_index, new_path);
> >               if (p->status != 'R' && !new_path) {
> >                       diff_free_filepair(p);
> >                       continue;
> >               }
> > -
>
> nit: this deletion seems unnecessary.

Will fix.

> >               if (new_path)
> >                       apply_directory_rename_modifications(opt, p, new_path);
> >
> > @@ -3701,8 +3761,16 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
> >                                        NULL, 1);
> >               strmap_init_with_options(&renames->dir_renames[i],
> >                                        NULL, 0);
> > +             /*
> > +              * relevant_sources uses -1 for the default, because we need
> > +              * to be able to distinguish not-in-strintmap from valid
> > +              * relevant_source values from enum file_rename_relevance.
> > +              * In particular, possibly_cache_new_pair() expects a negative
> > +              * value for not-found entries.
> > +              */
> >               strintmap_init_with_options(&renames->relevant_sources[i],
> > -                                         0, NULL, 0);
> > +                                         -1 /* explicitly invalid */,
> > +                                         NULL, 0);
> >               strmap_init_with_options(&renames->cached_pairs[i],
> >                                        NULL, 1);
> >               strset_init_with_options(&renames->cached_irrelevant[i],
> >
>
> Functionally looks good. I just had some nits about organization.

As always, thanks for the review and the helpful suggestions!
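
For anyone following along, the reason for the -1 default in the
relevant_sources hunk quoted above can be shown with a small standalone
sketch. Everything named toy_* here is hypothetical and only mimics the
shape of strintmap_get() and the checks in possibly_cache_new_pair(). The
enum values are an assumption as well: the sketch supposes that the
"no more" value is 0 and that the other relevance values are positive,
which is what the val <= 0 check in the patch suggests.

```c
#include <stddef.h>
#include <string.h>

/* Assumed values, mirroring enum file_rename_relevance in spirit only. */
enum toy_relevance {
	TOY_NO_MORE = 0,
	TOY_CONTENT = 1,
	TOY_LOCATION = 2
};

/* Toy stand-in for a strintmap with a configurable default value. */
struct toy_intmap {
	const char *keys[8];
	int vals[8];
	size_t nr;
	int default_value; /* returned for paths that are not in the map */
};

static int toy_intmap_get(const struct toy_intmap *m, const char *key)
{
	size_t i;

	for (i = 0; i < m->nr; i++)
		if (!strcmp(m->keys[i], key))
			return m->vals[i];
	return m->default_value;
}

enum toy_action {
	TOY_SKIP,             /* not found: nothing to cache */
	TOY_CACHE_IRRELEVANT, /* found, but no longer relevant */
	TOY_CACHE_PAIR        /* found and still relevant */
};

/* Same order of checks as the non-directory-rename branch of the patch. */
static enum toy_action toy_classify(const struct toy_intmap *m,
				    const char *path)
{
	int val = toy_intmap_get(m, path);

	if (val == TOY_NO_MORE)
		return TOY_CACHE_IRRELEVANT;
	if (val <= 0)
		return TOY_SKIP;
	return TOY_CACHE_PAIR;
}
```

With default_value set to -1, a path that was never inserted is classified
as TOY_SKIP. With the old default of 0 it would be indistinguishable from
TOY_NO_MORE and would wrongly be recorded as irrelevant.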

