git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Elijah Newren <newren@gmail.com>
Cc: git <git@vger.kernel.org>
Subject: Re: [PATCH 10/30] directory rename detection: more involved edge/corner testcases
Date: Tue, 14 Nov 2017 14:47:55 -0800	[thread overview]
Message-ID: <CAGZ79kY+DcQkSBjz-=_pUzkpLk+yE2b+Nqi07a-WC4fwzsSCyA@mail.gmail.com> (raw)
In-Reply-To: <CABPp-BE7qKiYBNxMquXRv+Gr=sA3jGs8JyKuqNSzAwKbOWOQ=Q@mail.gmail.com>

On Tue, Nov 14, 2017 at 1:11 PM, Elijah Newren <newren@gmail.com> wrote:
> On Mon, Nov 13, 2017 at 4:42 PM, Stefan Beller <sbeller@google.com> wrote:
>> On Fri, Nov 10, 2017 at 11:05 AM, Elijah Newren <newren@gmail.com> wrote:
>
>> "In my opinion" ... sounds like commit message?
>
> Sure, I can move it there.
>
>
>>> +# Testcase 7a, rename-dir vs. rename-dir (NOT split evenly) PLUS add-other-file
>>> +#   Commit A: z/{b,c}
>>> +#   Commit B: y/{b,c}
>>> +#   Commit C: w/b, x/c, z/d
>>> +#   Expected: y/d, CONFLICT(rename/rename for both z/b and z/c)
>>> +#   NOTE: There's a rename of z/ here, y/ has more renames, so z/d -> y/d.
>>
>> But the creator of C intended to have z/d, not {w,x}/d, and as {w,x} == y,
>> I am not sure I like this result. (I have no concrete counter example, just
>> messy logic)
>
> I'm open to alternative interpretations here.  The biggest issue for
> me -- going back our discussion at the end of
> https://public-inbox.org/git/CABPp-BFKiam6AK-Gg_RzaLuLur-jz0kvv3TqsHNHg5+HTv_uzA@mail.gmail.com/
> -- is "simple, predictable rule", which is consistent with the other
> rules and limits the number of nasty corner cases as much as possible.
> Perhaps you think this is one of those nasty corner cases, and that's
> fair, but I think it'd be hard to do much better.

yup, I agree that a simple, predictable rule is better than optimizing
for the corner cases we can come up now. (I sent this email before
reading your reply, so thanks for re-iterating the answer.)

> After spending quite a while trying to think of any other alternative
> rules or ways of looking at this, I could only come up with two
> points:
>
>   1) One could view this as a case where commit C didn't in fact do
> any directory rename -- note that directory z/ still exists in that
> commit.  Thus, only B did a rename, it renamed z/ -> y/, thus C's z/d
> should be moved to y/d.  So, this choice is consistent with the other
> rules we've got.

I wonder if we can do a data driven approach, i.e. mine some history
(linux, git, other large projects), that would tell us which of these cases
happens very often, and which of these corner cases can be "safely
ignored because it never happens". My gut feeling tells me that
splitting up a directory into two or three (potential sub-)directories
is a common thing, whereas double renames are not as often.
(But that's just my view, I have no data to back it up; the selection
of the data would open the next debate as it will be very specific to
a given community. But as linux has had such a huge impact on git,
I'd be tempted to claim any study on linux.git is fruitful for gits defaults)

>   2) An alternate (or maybe additional?) rule: We could decide that if
> a source path is renamed on both sides of history, then we'll just
> ignore both renames for consideration of directory rename detection.
>
> The new rule idea would "fix" this testcase to your liking, although
> now we'd be somewhat inconsistent with the "directory still exists
> implies no directory rename occurred rule".  But what other weirdness
> could entail?  Here's a few I've thought of:
>
> Commit O: z/{b,c,d}
> Commit A: y/{b,c}
> Commit B: z/{newb, newc, e}
>
> Here, A renamed z/ -> y/.

.. while deleting d ...


>  Except B renamed z/b and z/c differently,

... and z/d -> z/e, maybe(?).

> so all paths used to detect the z/ -> y/ rename are ignored, so there
> isn't a rename after all.  I'm not so sure I like that decision.
> Let's keep looking though, and change it up a bit more:
>
> Commit O: z/{b,c,d}
> Commit A: y/{b,c}, x/d
> Commit B: z/{newb, newc, d, e}
>
> Here, A has a split rename.  Since B renames z/b and z/c differently,
> we have to ignore the z/ -> y/ rename, and thus the only rename left
> implies z/ -> x/.  Thus we'd end up with z/e getting moved into x/e.
> Seems weird to me, and less likely that a user would understand this
> rule than the "majority wins" one.

It still is "majority wins" except the set of inspected files is filtered first.

In an *ideal, but expensive* algorithm, we might give different
weights to files, e.g.
* large files have more weight than smaller files,
* files with interesting names have more weight (c.f. Makefile vs. xstrbuf.c)
* similar files have more weight than files that are rewritten, or rather
  the more rewrite is done the less impact one file has.
* how unique file content is (LICENSE.txt that exists 23 times in the
  tree has less weight than the sekret-algorithm.c)

and depending on these weights we have a "majority of content" moved
to y/ or x/.

>>> +# Testcase 7c, rename/rename(1to...2or3); transitive rename may add complexity
>>> +#   (Related to testcases 3b and 5c)
>>> +#   Commit A: z/{b,c}, x/d
>>> +#   Commit B: y/{b,c}, w/d
>>> +#   Commit C: z/{b,c,d}
>>> +#   Expected: y/{b,c}, CONFLICT(x/d -> w/d vs. y/d)
>>
>> CONFLICT(x/d -> y/d vs w/d) ?
>
> I'm afraid I'm not following the question.

Yesterday I had the impression the renaming perspective changed,
note the difference in order of y/ and w/ inside the CONFLICT.
I might have been confused already, though.

>>> +# Testcase 7e, transitive rename in rename/delete AND dirs in the way
>>> +#   (Very similar to 'both rename source and destination involved in D/F conflict' from t6022-merge-rename.sh)
>>> +#   (Also related to testcases 9c and 9d)
>>> +#   Commit A: z/{b,c},     x/d_1
>>> +#   Commit B: y/{b,c,d/g}, x/d/f
>>> +#   Commit C: z/{b,c,d_1}
>>> +#   Expected: rename/delete(x/d_1->y/d_1 vs. None) + D/F conflict on y/d
>>> +#             y/{b,c,d/g}, y/d_1~C^0, x/d/f
>>> +#   NOTE: x/d/f may be slightly confusing here.  x/d_1 -> z/d_1 implies
>>> +#         there is a directory rename from x/ -> z/, performed by commit C.
>>> +#         However, on the side of commit B, it renamed z/ -> y/, thus
>>> +#         making a rename from x/ -> z/ when it was getting rid of z/ seems
>>> +#         non-sensical.  Further, putting x/d/f into y/d/f also doesn't
>>> +#         make a lot of sense because commit B did the renaming of z to y
>>> +#         and it created x/d/f, and it clearly made these things separate,
>>> +#         so it doesn't make much sense to push these together.
>>
>> This is confusing.
>
> Indeed it is.  When I first wrote this testcase, I didn't realize that
> I actually had two potentially directory renames involved and a
> doubly-transitive rename from it, on top of the D/F conflict.  I can
> see two ways to resolve this.
>
> 1) Leave the testcase alone, just try to make the NOTE more clear:
>
> NOTE: The main path of interest here is d_1 and where it ends up, but
> this is actually a case that has two potential directory renames
> involved and D/F conflict(s), so it makes sense to walk through each
> step.  Commit B renames z/ -> y/.  Thus everything that C adds to z/
> should be instead moved to y/.  This gives us the D/F conflict on y/d
> because x/d_1 -> z/d_1 -> y/d_1 conflicts with y/d/g.  Further, commit
> C renames x/ -> z/, thus everything B adds to x/ should instead be
> moved to z/...BUT we removed z/ and renamed it to y/, so maybe
> everything should move not from x/ to z/, but from x/ to z/ to y/.
> Doing so might make sense from the logic so far, but note that commit
> B had both an x/ and a y/; it did the renaming of z/ to y/ and created
> x/d/f and it clearly made these things separate, so it doesn't make
> much sense to push these together.  Doing so is what I'd call a doubly
> transitive rename; see testcases 9c and 9d for further discussion of
> this issue and how it's resolved.
>
> 2) Modify the testcase so it doesn't have two potential directory
> renames involved.  Just add another unrelated file under x/ that
> doesn't change on either side, thus removing the x/ -> z/ rename from
> the mix.  That wouldn't actually change the expected result (other
> than the new file should remain around), but it would change the
> reasoning and simplify it:
>
> NOTE: Commit B renames z/ -> y/.  Thus everything that C adds to z/
> should be instead moved to y/.  This gives us the D/F conflict on y/d
> because x/d_1 -> z/d_1 -> y/d_1 conflicts with y/d/g.  As a side note,
> one could imagine an alternative implementation trying to resolve D/F
> conflicts caused by renames by just undoing the rename, but in this
> case that would end up with us needing to write an x/d_1, which would
> still be a D/F conflict with x/d/f.

What do you want to test in 7e? AFAICT section 7 is about
"More involved Edge/Corner cases", so keeping it edge sounds fine.
(hence I'd vote for (1), just adjusting the notes)

  reply	other threads:[~2017-11-14 22:48 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-10 19:05 [PATCH 00/30] Add directory rename detection to git Elijah Newren
2017-11-10 19:05 ` [PATCH 01/30] Tighten and correct a few testcases for merging and cherry-picking Elijah Newren
2017-11-13 19:32   ` Stefan Beller
2017-11-10 19:05 ` [PATCH 02/30] merge-recursive: Fix logic ordering issue Elijah Newren
2017-11-13 19:48   ` Stefan Beller
2017-11-13 22:04     ` Elijah Newren
2017-11-13 22:12       ` Stefan Beller
2017-11-13 23:39         ` Elijah Newren
2017-11-13 23:46           ` Stefan Beller
2017-11-10 19:05 ` [PATCH 03/30] merge-recursive: Add explanation for src_entry and dst_entry Elijah Newren
2017-11-13 21:06   ` Stefan Beller
2017-11-13 22:57     ` Elijah Newren
2017-11-13 23:11       ` Stefan Beller
2017-11-14  1:26   ` Junio C Hamano
2017-11-10 19:05 ` [PATCH 04/30] directory rename detection: basic testcases Elijah Newren
2017-11-13 22:04   ` Stefan Beller
2017-11-14  0:57     ` Elijah Newren
2017-11-14  1:21       ` Stefan Beller
2017-11-14  1:40         ` Elijah Newren
2017-11-14  2:03     ` Junio C Hamano
2017-11-10 19:05 ` [PATCH 05/30] directory rename detection: directory splitting testcases Elijah Newren
2017-11-13 23:20   ` Stefan Beller
2017-11-10 19:05 ` [PATCH 06/30] directory rename detection: testcases to avoid taking detection too far Elijah Newren
2017-11-13 23:25   ` Stefan Beller
2017-11-14  1:02     ` Elijah Newren
2017-11-10 19:05 ` [PATCH 07/30] directory rename detection: partially renamed directory testcase/discussion Elijah Newren
2017-11-14  0:07   ` Stefan Beller
2017-11-10 19:05 ` [PATCH 08/30] directory rename detection: files/directories in the way of some renames Elijah Newren
2017-11-14  0:15   ` Stefan Beller
2017-11-14  1:19     ` Elijah Newren
2017-11-10 19:05 ` [PATCH 09/30] directory rename detection: testcases checking which side did the rename Elijah Newren
2017-11-14  0:25   ` Stefan Beller
2017-11-14  1:30     ` Elijah Newren
2017-11-10 19:05 ` [PATCH 10/30] directory rename detection: more involved edge/corner testcases Elijah Newren
2017-11-14  0:42   ` Stefan Beller
2017-11-14 21:11     ` Elijah Newren
2017-11-14 22:47       ` Stefan Beller [this message]
2017-11-10 19:05 ` [PATCH 11/30] directory rename detection: testcases exploring possibly suboptimal merges Elijah Newren
2017-11-14 20:33   ` Stefan Beller
2017-11-14 21:42     ` Elijah Newren
2017-11-10 19:05 ` [PATCH 12/30] directory rename detection: miscellaneous testcases to complete coverage Elijah Newren
2017-11-15 20:03   ` Stefan Beller
2017-11-16 21:17     ` Elijah Newren
2017-11-10 19:05 ` [PATCH 13/30] directory rename detection: tests for handling overwriting untracked files Elijah Newren
2017-11-10 19:05 ` [PATCH 14/30] directory rename detection: tests for handling overwriting dirty files Elijah Newren
2017-11-10 19:05 ` [PATCH 15/30] merge-recursive: Move the get_renames() function Elijah Newren
2017-11-14  4:46   ` Junio C Hamano
2017-11-14 17:41     ` Elijah Newren
2017-11-15  1:20       ` Junio C Hamano
2017-11-10 19:05 ` [PATCH 16/30] merge-recursive: Introduce new functions to handle rename logic Elijah Newren
2017-11-14  4:56   ` Junio C Hamano
2017-11-14  5:14     ` Junio C Hamano
2017-11-14 18:24       ` Elijah Newren
2017-11-10 19:05 ` [PATCH 17/30] merge-recursive: Fix leaks of allocated renames and diff_filepairs Elijah Newren
2017-11-14  4:58   ` Junio C Hamano
2017-11-10 19:05 ` [PATCH 18/30] merge-recursive: Make !o->detect_rename codepath more obvious Elijah Newren
2017-11-10 19:05 ` [PATCH 19/30] merge-recursive: Split out code for determining diff_filepairs Elijah Newren
2017-11-14  5:20   ` Junio C Hamano
2017-11-10 19:05 ` [PATCH 20/30] merge-recursive: Add a new hashmap for storing directory renames Elijah Newren
2017-11-10 19:05 ` [PATCH 21/30] merge-recursive: Add get_directory_renames() Elijah Newren
2017-11-14  5:30   ` Junio C Hamano
2017-11-14 18:38     ` Elijah Newren
2017-11-10 19:05 ` [PATCH 22/30] merge-recursive: Check for directory level conflicts Elijah Newren
2017-11-10 19:05 ` [PATCH 23/30] merge-recursive: Add a new hashmap for storing file collisions Elijah Newren
2017-11-10 19:05 ` [PATCH 24/30] merge-recursive: Add computation of collisions due to dir rename & merging Elijah Newren
2018-06-10 10:56   ` René Scharfe
2018-06-10 11:03     ` René Scharfe
2018-06-10 20:44     ` Jeff King
2018-06-11 15:03     ` Elijah Newren
2018-06-14 17:36     ` Junio C Hamano
2017-11-10 19:05 ` [PATCH 25/30] merge-recursive: Check for file level conflicts then get new name Elijah Newren
2017-11-10 19:05 ` [PATCH 26/30] merge-recursive: When comparing files, don't include trees Elijah Newren
2017-11-10 19:05 ` [PATCH 27/30] merge-recursive: Apply necessary modifications for directory renames Elijah Newren
2017-11-15 20:23   ` Stefan Beller
2017-11-16  3:54     ` Elijah Newren
2017-11-10 19:05 ` [PATCH 28/30] merge-recursive: Avoid clobbering untracked files with " Elijah Newren
2017-11-10 19:05 ` [RFC PATCH 29/30] merge-recursive: Fix overwriting dirty files involved in renames Elijah Newren
2017-11-10 19:05 ` [PATCH 30/30] merge-recursive: Fix remaining directory rename + dirty overwrite cases Elijah Newren
2017-11-10 22:27 ` [PATCH 00/30] Add directory rename detection to git Philip Oakley
2017-11-10 23:26   ` Elijah Newren
2017-11-13 15:04     ` Philip Oakley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGZ79kY+DcQkSBjz-=_pUzkpLk+yE2b+Nqi07a-WC4fwzsSCyA@mail.gmail.com' \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).