git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Derrick Stolee <dstolee@microsoft.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH v2 13/13] merge-ort, diffcore-rename: employ cached renames when possible
Date: Wed, 19 May 2021 17:36:26 -0700	[thread overview]
Message-ID: <CABPp-BFdxn9f0-jUjY6Ake_6kX-jeN1EEzpeJeTg+TV4wfepwg@mail.gmail.com> (raw)
In-Reply-To: <f0c50259-5627-482b-1daf-b73819cde108@gmail.com>

On Mon, May 17, 2021 at 7:23 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 5/3/21 10:12 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> ...
> > diff --git a/merge-ort.c b/merge-ort.c
> > index 2fc98b803d1c..17dc3deb3c73 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -753,15 +753,48 @@ static void add_pair(struct merge_options *opt,
> >       struct rename_info *renames = &opt->priv->renames;
> >       int names_idx = is_add ? side : 0;
> >
> > -     if (!is_add) {
> > +     if (is_add) {
> > +             if (strset_contains(&renames->cached_target_names[side],
> > +                                 pathname))
> > +                     return;
> > +     } else {
> >               unsigned content_relevant = (match_mask == 0);
> >               unsigned location_relevant = (dir_rename_mask == 0x07);
> >
> > +             /*
> > +              * If pathname is found in cached_irrelevant[side] due to
> > +              * previous pick but for this commit content is relevant,
> > +              * then we need to remove it from cached_irrelevant.
> > +              */
> > +             if (content_relevant)
> > +                     /* strset_remove is no-op if strset doesn't have key */
> > +                     strset_remove(&renames->cached_irrelevant[side],
> > +                                   pathname);
>
> I see, content can become relevant again.
>
> ...
>
> > diff --git a/t/t6429-merge-sequence-rename-caching.sh b/t/t6429-merge-sequence-rename-caching.sh
> > index f47d8924ee73..035edc40b1eb 100755
> > --- a/t/t6429-merge-sequence-rename-caching.sh
> > +++ b/t/t6429-merge-sequence-rename-caching.sh
> > @@ -101,10 +101,10 @@ test_expect_success 'caching renames does not preclude finding new ones' '
> >  # dramatic change in size of the file, but remembering the rename and
> >  # reusing it is reasonable too.
> >  #
> > -# Rename detection (diffcore_rename_extended()) will run twice here; it is
> > -# not needed on the topic side of history for either of the two commits
> > -# being merged, but it is needed on the upstream side of history for each
> > -# commit being picked.
> > +# We do test here that we expect rename detection to only be run once total
> > +# (the topic side of history doesn't need renames, and with caching we
> > +# should be able to only run rename detection on the upstream side one
> > +# time.)
> >  test_expect_success 'cherry-pick both a commit and its immediate revert' '
> >       test_create_repo pick-commit-and-its-immediate-revert &&
> >       (
> > @@ -140,11 +140,11 @@ test_expect_success 'cherry-pick both a commit and its immediate revert' '
> >               GIT_TRACE2_PERF="$(pwd)/trace.output" &&
> >               export GIT_TRACE2_PERF &&
> >
> > -             test_might_fail test-tool fast-rebase --onto HEAD upstream~1 topic &&
> > +             test-tool fast-rebase --onto HEAD upstream~1 topic &&
>
> Here is a change of behavior, but it appears to be a good one!
>
> >               #git cherry-pick upstream~1..topic &&
> >
> >               grep region_enter.*diffcore_rename trace.output >calls &&
> > -             test_line_count = 2 calls
> > +             test_line_count = 1 calls
> >       )
> ...
> > @@ -450,7 +459,7 @@ test_expect_success 'dir rename unneeded, then add new file to old dir' '
> >               #git cherry-pick upstream..topic &&
> >
> >               grep region_enter.*diffcore_rename trace.output >calls &&
> > -             test_line_count = 3 calls &&
> > +             test_line_count = 2 calls &&
> >
> >               git ls-files >tracked &&
> >               test_line_count = 5 tracked &&
> > @@ -516,7 +525,7 @@ test_expect_success 'dir rename unneeded, then rename existing file into old dir
> >               #git cherry-pick upstream..topic &&
> >
> >               grep region_enter.*diffcore_rename trace.output >calls &&
> > -             test_line_count = 4 calls &&
> > +             test_line_count = 3 calls &&
>
> I appreciate that this use of tracing demonstrates a change of
> internal behavior.
>
> >               test_path_is_missing somefile &&
> >               test_path_is_missing olddir/newfile &&
> > @@ -648,9 +657,8 @@ test_expect_success 'caching renames only on upstream side, part 1' '
> >  # for the wrong side of history.
> >  #
> >  #
> > -# This testcase should only need three calls to diffcore_rename_extended(),
> > -# because there are no renames on the topic side of history for picking
> > -# Topic_2.
> > +# This testcase should only need two calls to diffcore_rename_extended(),
> > +# both for the first merge, one for each side of history.
> >  #
> >  test_expect_success 'caching renames only on upstream side, part 2' '
> >       test_setup_topic_rename cache-renames-only-upstream-rename-file &&
> > @@ -677,7 +685,7 @@ test_expect_success 'caching renames only on upstream side, part 2' '
> >               #git cherry-pick upstream..topic &&
> >
> >               grep region_enter.*diffcore_rename trace.output >calls &&
> > -             test_line_count = 3 calls &&
> > +             test_line_count = 2 calls &&
>
> Same here.
>
> As I was reading, I was also thinking that it would be good to
> have some kind of tracing, if only a summary of how often we
> relied upon the cached renames.

That's an interesting thought.  If there are renames that remain
cached, the code always uses them (whether we need the cached rename
or not doesn't matter since we already have it and it is cheap to
use), so a summary would probably be the only thing that would make
sense.

The easiest summary of how often we rely upon cached renames, is
checking if we have enough to avoid calling diffcore_rename_extended()
entirely.  That's precisely what the code above does that you are
responding to.

What more information could we get out?  I guess we could get ever so
slightly more information by tracking how many times we decide that
the cache from a previous merge can be re-used (by tracking the number
of times that cached_valid_side is set to 1 or 2 instead of 0).  Since
we sometimes have some cached renames but not enough to skip rename
detection (because source paths that were irrelevant in previous
commits are marked relevant for the current commit), this would be an
independent number from the region_enter-diffcore_rename count used
above.

> That would present a mechanism
> for the test cases to verify that the rename cache is behaving
> as expected

How so?  I did check something like that above with verifying I was
able to reduce the number of calls to diffcore_rename_extended() while
still getting the expected result (which can only be done if renames
are known).  The only additional thing I can think of we could check
would be a testcase where renames are cached and can be re-used but
it's not enough to avoid calling diffcore_rename_extended().  Did you
have something else in mind?

> but also provides a way to diagnose any issues that
> might arise in the future by asking a user to reproduce a
> problematic rebase/merge with a GIT_TRACE2* target. Such a
> change would fit as a follow-up, and does not need to insert
> into an already heavy patch.

I'm not sure what information could be recorded that would help
diagnose any such issues.  "9 out of 10 commits in your rebase reused
cached renames" just doesn't seem granular enough to help.  Is there
something else you were thinking of recording?


> I have now read all of the patches in this series to the level
> I can. It's all very deep stuff, so the more we can rely on the
> tests to show correctness, the better.
>
> I appreciate the extra tests that you added, which increases my
> confidence in the series.
>
> Thanks,
> -Stolee

  reply	other threads:[~2021-05-20  0:36 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-24 21:32 [PATCH 0/7] Optimization batch 11: avoid repeatedly detecting same renames Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 1/7] merge-ort: add data structures for in-memory caching of rename detection Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 2/7] merge-ort: populate caches of rename detection results Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 3/7] merge-ort: add code to check for whether cached renames can be reused Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 4/7] merge-ort: avoid accidental API mis-use Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 5/7] merge-ort: preserve cached renames for the appropriate side Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 6/7] merge-ort: add helper functions for using cached renames Elijah Newren via GitGitGadget
2021-03-24 21:32 ` [PATCH 7/7] merge-ort, diffcore-rename: employ cached renames when possible Elijah Newren via GitGitGadget
2021-03-24 22:04 ` [PATCH 0/7] Optimization batch 11: avoid repeatedly detecting same renames Junio C Hamano
2021-03-24 23:25   ` Elijah Newren
2021-03-25 18:59     ` Junio C Hamano
2021-03-29 22:34       ` Elijah Newren
2021-03-30 12:07         ` Derrick Stolee
2021-05-04  2:12 ` [PATCH v2 00/13] " Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 01/13] t6423: rename file within directory that other side renamed Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 02/13] Documentation/technical: describe remembering renames optimization Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 03/13] fast-rebase: change assert() to BUG() Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 04/13] fast-rebase: write conflict state to working tree, index, and HEAD Elijah Newren via GitGitGadget
2021-05-17 13:32     ` Derrick Stolee
2021-05-18  3:42       ` Elijah Newren
2021-05-18 13:54         ` Derrick Stolee
2021-05-04  2:12   ` [PATCH v2 05/13] t6429: testcases for remembering renames Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 06/13] merge-ort: add data structures for in-memory caching of rename detection Elijah Newren via GitGitGadget
2021-05-17 13:41     ` Derrick Stolee
2021-05-18  3:55       ` Elijah Newren
2021-05-18 13:57         ` Derrick Stolee
2021-05-04  2:12   ` [PATCH v2 07/13] merge-ort: populate caches of rename detection results Elijah Newren via GitGitGadget
2021-05-17 13:51     ` Derrick Stolee
2021-05-20  0:48       ` Elijah Newren
2021-05-04  2:12   ` [PATCH v2 08/13] merge-ort: add code to check for whether cached renames can be reused Elijah Newren via GitGitGadget
2021-05-17 14:01     ` Derrick Stolee
2021-05-04  2:12   ` [PATCH v2 09/13] merge-ort: avoid accidental API mis-use Elijah Newren via GitGitGadget
2021-05-17 14:10     ` Derrick Stolee
2021-05-04  2:12   ` [PATCH v2 10/13] merge-ort: preserve cached renames for the appropriate side Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 11/13] merge-ort: add helper functions for using cached renames Elijah Newren via GitGitGadget
2021-05-04  2:12   ` [PATCH v2 12/13] merge-ort: handle interactions of caching and rename/rename(1to1) cases Elijah Newren via GitGitGadget
2021-05-17 14:16     ` Derrick Stolee
2021-05-04  2:12   ` [PATCH v2 13/13] merge-ort, diffcore-rename: employ cached renames when possible Elijah Newren via GitGitGadget
2021-05-17 14:23     ` Derrick Stolee
2021-05-20  0:36       ` Elijah Newren [this message]
2021-05-22 11:17         ` Derrick Stolee
2021-05-14 17:37   ` [PATCH v2 00/13] Optimization batch 11: avoid repeatedly detecting same renames Elijah Newren
2021-05-14 21:04     ` Derrick Stolee
2021-05-20  6:09   ` [PATCH v3 " Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 01/13] t6423: rename file within directory that other side renamed Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 02/13] Documentation/technical: describe remembering renames optimization Elijah Newren via GitGitGadget
2021-05-20 11:32       ` Bagas Sanjaya
2021-05-20 15:14         ` Kerry, Richard
2021-05-20 16:34         ` Elijah Newren
2021-05-20  6:09     ` [PATCH v3 03/13] fast-rebase: change assert() to BUG() Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 04/13] fast-rebase: write conflict state to working tree, index, and HEAD Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 05/13] t6429: testcases for remembering renames Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 06/13] merge-ort: add data structures for in-memory caching of rename detection Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 07/13] merge-ort: populate caches of rename detection results Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 08/13] merge-ort: add code to check for whether cached renames can be reused Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 09/13] merge-ort: avoid accidental API mis-use Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 10/13] merge-ort: preserve cached renames for the appropriate side Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 11/13] merge-ort: add helper functions for using cached renames Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 12/13] merge-ort: handle interactions of caching and rename/rename(1to1) cases Elijah Newren via GitGitGadget
2021-05-20  6:09     ` [PATCH v3 13/13] merge-ort, diffcore-rename: employ cached renames when possible Elijah Newren via GitGitGadget
2021-05-22 11:17     ` [PATCH v3 00/13] Optimization batch 11: avoid repeatedly detecting same renames Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BFdxn9f0-jUjY6Ake_6kX-jeN1EEzpeJeTg+TV4wfepwg@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).