git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: Elijah Newren <newren@gmail.com>
Cc: Jonathan Tan <jonathantanmy@google.com>,
	Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 3/3] merge-ort: fix issue with dual rename and add/add conflict
Date: Wed,  6 Jul 2022 10:25:05 -0700	[thread overview]
Message-ID: <20220706172505.270363-1-jonathantanmy@google.com> (raw)
In-Reply-To: <CABPp-BG8r4zUOKZ+DzZOdsEwr+580cK=SvZ4n_euPOV1Nocu+A@mail.gmail.com>

Sorry for getting back to this so late. I don't have any issues with the
patches, but just to close the loop:

Elijah Newren <newren@gmail.com> writes:
> On Mon, Jun 27, 2022 at 11:47 AM Jonathan Tan <jonathantanmy@google.com> wrote:
> >
> > "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > > The testcases added in t6423 a couple commits ago are slightly different
> > > but similar in principle.  They involve a similar case of paired
> > > renaming but instead of A/ -> B/ and B/ -> C/, the second side renames
> > > a leading directory of B/ to C/.
> >
> > Hmm...one side moved sub1 -> sub3 and the other moved sub2 from the root
> > to under sub1, right? So it's more like A/ -> B/ and C/ -> A/C/.
> 
> Hmm, maybe I should have been more explicit in my mapping:
>    A = sub2
>    B = sub1/sub2
>    leading directory of B = sub1
>    C = sub3

Substituting into A/ -> B/ and "a leading directory of B/ to C/", we
get:

  sub2 -> sub1/sub2 and sub1 -> sub3

which is indeed what is happening. I see, thanks.

> > > And both sides add a new file
> > > somewhere under the directory that the other side will rename.
> >
> > Rename or move, I think.
> 
> I'm confused; I don't understand what this comment means.  Renames
> tend to be created using "git mv", before or after making content
> changes, so to me a file being renamed or a file being moved to a
> different location are synonymous.  What distinction are you making
> between renames and moves?

I was thinking that a rename means that the directory still has the same
parent directory, whereas a move means that the directory keeps its
basename but has a different parent directory. Maybe it's just the way I
think about things, but the important thing here is that a directory was
moved so that its parent directory is a directory that is different and
that already exists, and I think that this meaning is lost when we say
"rename". But it might just be me.

> Hmm, perhaps I should change this to:
> 
> /* Cases where we don't have or don't want a directory rename for this
> path, so we return NULL */
> 
> The purpose of this function is to check whether a given path would be
> modified by directory renaming to get a new path.  So, "no new path"
> means we don't have an applicable directory rename or don't want to
> use it.

I see - the new text makes sense.

> > >       if (strmap_empty(dir_renames))
> > >               return new_path;
> > > +     if (strmap_get(&collisions[other_side], path))
> > > +             return new_path;
> >
> > So as far as I understand, collisions here, for a given side, is a map.
> > The map's keys are all the filenames of added and renamed files (I'm
> > assuming that's what 'A' and 'R' are) from that side after any directory
> > moves on the other side are applied. So, for example, if we add "foo/a"
> > on side A and rename "foo/" to "bar/" on side B, then side A's collision
> > map would have a key "bar/a". So I'm not sure if "collision" is the
> > right name here, but the existing code already uses it so I'll leave it
> > be.
> 
> Let's take your example a bit further, to discuss the original kind of
> usecase that "collisions" was written for.  In addition to adding
> "foo/a" on side A, we also add "foo2/a" and "foo3/a".  And then in
> addition to renaming "foo/" to "bar/" on side B, we also rename
> "foo2/" to "bar/" and "foo3/" to "bar/", thus merging all of foo/,
> foo2/, and foo3/ into a single directory named bar/.  If the files in
> foo/, foo2/, and foo3/ on side B were all unique, you can see how
> there'd be no problem merging these directories together.  The problem
> only comes at merge time when you attempt to apply the directory
> renames from side B to the new files on side A.  That's when you get
> collisions, in this case three files that would be named bar/a.
> 
> Checking for such collisions was the purpose of the "collisions"
> metadata, so I think the name is fitting.  The only problem is that
> we're reusing that data now for a slightly different purpose, though I
> think it's still "collision-y".

That makes sense, thanks.

> > It makes sense that this situation (of side A having "bar/a" because
> > side B renamed "foo/" to "bar/", and at the same time, side B adds its
> > own "bar/a") is weird, as stated in the commit message, so I don't mind
> > disabling checking for directory rename in this case.  However, in
> > theory, I don't see how disabling this check would fix anything, since
> > the bug seems to be about two different files on different sides being
> > renamed to the same filename through some convoluted means. (Unless this
> > is the only convoluted means to do it, and disabling it means that the
> > bug wouldn't occur.)

[snip]

> Hmm...let me see if I can explain this a different way.
> 
> The short version of the issue here is that if a directory rename
> wants to rename NEWFILE -> ALTERNATE_NEWFILE, but there is another
> directory rename on the other side of history that wants to rename
> ANOTHER_FILE -> NEWFILE, then we run into problems and have to turn
> off the NEWFILE -> ALTERNATE_NEWFILE.  This check here is all about
> this case.
> 
> To see why this is the problematic setup...
> 
> Now a conflict_info
> stores information about the OIDs and modes for all three sides of the
> merge (i.e. both sides of the merge and the base).  Whenever a rename
> is processed, we have to update this map, because the rename makes the
> conflict_info now apply to a different path.  In the simple cases, the
> conflict_info from the source path needs to be merged with the
> conflict_info for the target path, and the source path's conflict_info
> needs to be marked as null (literally setting the .is_null field to
> 1).  rename/rename and such can make this a bit more complicated.

Ah, I think I was missing this part. The intention is that processing
one side in this way (and thus modifying its conflict_info) would not
affect the processing of any other sides, but there is a bug that makes
it not so. (And the intention is not, say, when processing all sides,
to write the output in a new data structure so that the result is the
same no matter the order.)

So I think the answer to my question is: no, we do not want to be able
to rename two different files on different sides to the same filename
through any convoluted means, and if that happens, it's a bug.

[snip illustration of what happens in either merge order]

> To avoid this order dependence, and the weird multiple-renaming of a
> single path, we just want to turn off directory renames when the
> source of a directory rename on one side is the target of a directory
> rename on the other.  That's what this patch does.
> 
> 
> Does that help?  Or did I make it more confusing?

I think that helped, thanks.

  reply	other threads:[~2022-07-06 17:25 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-22  4:20 [PATCH 0/3] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
2022-06-22  4:20 ` [PATCH 1/3] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
2022-06-27 18:20   ` Jonathan Tan
2022-06-30  0:06     ` Elijah Newren
2022-06-30 22:32       ` Jonathan Tan
2022-07-01  2:57         ` Elijah Newren
2022-06-27 22:30   ` Calvin Wan
2022-06-30  0:07     ` Elijah Newren
2022-06-22  4:20 ` [PATCH 2/3] merge-ort: shuffle the computation and cleanup of potential collisions Elijah Newren via GitGitGadget
2022-06-27 18:48   ` Jonathan Tan
2022-06-27 21:04     ` Calvin Wan
2022-06-30  0:05       ` Elijah Newren
2022-06-22  4:20 ` [PATCH 3/3] merge-ort: fix issue with dual rename and add/add conflict Elijah Newren via GitGitGadget
2022-06-27 18:47   ` Jonathan Tan
2022-06-30  0:05     ` Elijah Newren
2022-07-06 17:25       ` Jonathan Tan [this message]
2022-06-22  4:36 ` [PATCH 0/3] Fix dual rename into each other plus conflicting adds Elijah Newren
2022-06-30  6:57 ` [PATCH v2 " Elijah Newren via GitGitGadget
2022-06-30  6:57   ` [PATCH v2 1/3] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
2022-06-30 10:21     ` Ævar Arnfjörð Bjarmason
2022-07-01  2:57       ` Elijah Newren
2022-07-01  9:29         ` Ævar Arnfjörð Bjarmason
2022-06-30  6:57   ` [PATCH v2 2/3] merge-ort: shuffle the computation and cleanup of potential collisions Elijah Newren via GitGitGadget
2022-06-30 10:28     ` Ævar Arnfjörð Bjarmason
2022-07-01  3:02       ` Elijah Newren
2022-06-30  6:57   ` [PATCH v2 3/3] merge-ort: fix issue with dual rename and add/add conflict Elijah Newren via GitGitGadget
2022-06-30 10:31     ` Ævar Arnfjörð Bjarmason
2022-07-01  3:03       ` Elijah Newren
2022-07-01  5:23   ` [PATCH v3 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
2022-07-01  5:23     ` [PATCH v3 1/5] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
2022-07-01  5:23     ` [PATCH v3 2/5] merge-ort: small cleanups of check_for_directory_rename Elijah Newren via GitGitGadget
2022-07-01  5:23     ` [PATCH v3 3/5] merge-ort: make a separate function for freeing struct collisions Elijah Newren via GitGitGadget
2022-07-01  5:23     ` [PATCH v3 4/5] merge-ort: shuffle the computation and cleanup of potential collisions Elijah Newren via GitGitGadget
2022-07-01  9:16       ` Ævar Arnfjörð Bjarmason
2022-07-25 12:00         ` C99 "for (int ..." form on "master" (was: [PATCH v3 4/5] merge-ort: shuffle the computation and cleanup of potential collisions) Ævar Arnfjörð Bjarmason
2022-07-26  2:14           ` Elijah Newren
2022-07-26  4:48             ` C99 "for (int ..." form on "master" Junio C Hamano
2022-07-01  5:23     ` [PATCH v3 5/5] merge-ort: fix issue with dual rename and add/add conflict Elijah Newren via GitGitGadget
2022-07-05  1:33     ` [PATCH v4 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
2022-07-05  1:33       ` [PATCH v4 1/5] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
2022-07-05  1:33       ` [PATCH v4 2/5] merge-ort: small cleanups of check_for_directory_rename Elijah Newren via GitGitGadget
2022-07-05  1:33       ` [PATCH v4 3/5] merge-ort: make a separate function for freeing struct collisions Elijah Newren via GitGitGadget
2022-07-05  1:33       ` [PATCH v4 4/5] merge-ort: shuffle the computation and cleanup of potential collisions Elijah Newren via GitGitGadget
2022-07-05  1:33       ` [PATCH v4 5/5] merge-ort: fix issue with dual rename and add/add conflict Elijah Newren via GitGitGadget
2022-07-06 16:52       ` [PATCH v4 0/5] Fix dual rename into each other plus conflicting adds Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220706172505.270363-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).