git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Stefan Beller <sbeller@google.com>
Cc: git <git@vger.kernel.org>
Subject: Re: [PATCH 04/30] directory rename detection: basic testcases
Date: Mon, 13 Nov 2017 16:57:31 -0800	[thread overview]
Message-ID: <CABPp-BFKiam6AK-Gg_RzaLuLur-jz0kvv3TqsHNHg5+HTv_uzA@mail.gmail.com> (raw)
In-Reply-To: <CAGZ79kbmt7jt13D-HY90+LBaCHsqvDOYnrmJ41hR3YsgEceirg@mail.gmail.com>

On Mon, Nov 13, 2017 at 2:04 PM, Stefan Beller <sbeller@google.com> wrote:
> On Fri, Nov 10, 2017 at 11:05 AM, Elijah Newren <newren@gmail.com> wrote:
> (minor thought:)
> After rereading the docs above this is clear; I wonder if instead of A, B, C
> a notation of Base, ours, theirs would be easier to understand?

Sure, that'd be an easy change.

>> +test_expect_success '1a-setup: Simple directory rename detection' '
>> +test_expect_failure '1a-check: Simple directory rename detection' '
>
> Thanks for splitting the setup and the check into two different test cases!
>
>
>> +       git checkout B^0 &&
>
> Any reason for ^0 ? (to make clear it is a branch?)

Partially force-of-habit (did the same with t6036 and t6042), but it
seemed to have the nice feature of making debugging easier through
improved reproducability.  In particular, if I had checked out B
rather than B^0 in the testcase and a merge succeeded when I didn't
expect it, then attempting to run the same commands gives me a
different starting point for the merge.  By instead explicitly
checking out B^0, then even if the merge succeeds, someone who again
runs checkout B^0 will have the same starting point.

>> +test_expect_success '1b-setup: Merge a directory with another' '
>> +       git rm -rf . &&
>> +       git clean -fdqx &&
>> +       rm -rf .git &&
>> +       git init &&
>
> This is quite a strong statement to start a test with.

It was actually copy-paste from t6036, and achieved two purposes:
  1) Even if one test fails, subsequent ones continue running.  (Had
lots of problems with this in t6036 years ago and just ended up with
those four steps)
  2) Someone who runs into a failing testcase has a _much_ easier time
figuring out what is going on with the testcase.  I find it takes a
fair amount of time to figure out what's going on with other tests in
git's testsuite because of the presence of so many files completely
unrelated to the given test, which have simply accumulated from
previous tests.  For many tests, that may be fine, but in particular
for t6036, t6042, and now t6043, since these are mostly about corner
cases that are hard enough to reason about, I didn't want the extra
distractions.

but...

> Nowadays we rather do
>
>     test_when_finished "some command" &&
>
> than polluting the follow up tests. But as you split up the previous test
> into 2 tests, it is not clear if this would bring any good.

test_when_finished looks pretty cool; I didn't know about it.  Thanks
for the pointer.  Not sure if we want to use it here (if we do, we'd
only do so in the check tests rather than the setup ones).

> Also these are four cleanup commands, I'd have expected fewer.
> Maybe just "rm -rf ." ? Or as we make a new repo for each test case,
>
>     test_create_repo 1a &&
>     cd 1a
>
> in the first setup, and here we do
>     test_create_repo 1b &&
>     cd 1b
>
> relying on test_done to cleanup everything afterwards?

rm aborts for me with 'rm -rf .', but I could possibly make it 'rm -rf
* .* && git init .'

The test_create_repo might not be so bad as long as every test used it
and put all files under it's own little repo.  It does create some
clutter, but it's at least somewhat managed.  I'm still a bit partial
to just totally cleaning it out, but if others would prefer, I can
switch.

>> +       test 3 -eq $(git ls-files -s | wc -l) &&
>
>     git ls-files >out &&
>     test_line_count = 3 out &&
>
> maybe? Piping output of git commands somewhere is an
> anti-pattern as we cannot examine the return code of ls-files in this case.

Um...I guess that makes sense, if you assumed that I cared about the
return code of ls-files.  But it seems to make the tests with multiple
calls to ls-files in a row (of which there are many) considerably
uglier, so I'd personally prefer to avoid that.  Also, why would I
care about the return code of ls-files?

Your suggestion made me curious, so I went looking.  As far as I can
tell, the return code of ls-files is always 0 unless you both specify
both --error-unmatch and one or more paths, neither of which I did.
Am I missing something?

If you feel the return code of ls-files is important, perhaps I could
just have a separate
   git ls-files -s >/dev/null &&
call before the others?

>> +       test $(git rev-parse HEAD:y/b) = $(git rev-parse A:z/b) &&
>> +       test $(git rev-parse HEAD:y/c) = $(git rev-parse A:z/c) &&
>> +       test $(git rev-parse HEAD:y/d) = $(git rev-parse A:x/d) &&
>
> Speaking of these, there are quite a lot of invocations of rev-parse,
> though it looks clean; recently Junio had some good ideas regarding a
> similar test[1].
> So maybe
>
>   git rev-parse >actual \
>     HEAD:y/b  HEAD:y/c HEAD:y/d &&
>   git rev-parse >expect \
>     A:z/b    A:z/c    A:x/d &&
>   test_cmp expect actual
>
> Not sure if that is any nicer, but has fewer calls.

Sure, I can switch it over.

>> +       test_must_fail git rev-parse HEAD:x/d &&
>> +       test_must_fail git rev-parse HEAD:z/d &&
>> +       test ! -f z/d
>> +'
>> +
>> +# Testcase 1d, Directory renames (merging two directories into one new one)
>> +#              cause a rename/rename(2to1) conflict
>> +#   (Related to testcases 1c and 7b)
>> +#   Commit A. z/{b,c},        y/{d,e}
>> +#   Commit B. x/{b,c},        y/{d,e,m,wham}
>> +#   Commit C. z/{b,c,n,wham}, x/{d,e}
>> +#   Expected: x/{b,c,d,e,m,n}, CONFLICT:(y/wham & z/wham -> x/wham)
>> +#   Note: y/m & z/n should definitely move into x.  By the same token, both
>> +#         y/wham & z/wham should to...giving us a conflict.
>
> If wham are equal (in oid), shouldn't this not conflict and only conflict
> when z/wham and x/wham differ in oid, but have the same sub-path?

Good eyes, I should have labelled these as wham_1 and wham_2, since
the testcase did explicitly make them different (having contents of
"wham1\n" and "wham2\n"), but it could make sense to add a test to the
testsuite where two such colliding files are identical.

I thought about that, figured it wasn't worth it, and didn't add the
code to handle it.  I could add a testcase for it, though I'd be very
tempted to just leave it as test_expect_failure and let someone else
come along and implement it if someone feels so inclined.  Note,
though, that the situation is made slightly more complex due to the
fact that it might be N colliding files rather than just 2 (due to the
possibility of multiple directories being merged into one on either
side of history).

>> +
>> +# Testcase 1e, Renamed directory, with all filenames being renamed too
>> +#   Commit A: z/{oldb,oldc}
>> +#   Commit B: y/{newb,newc}
>> +#   Commit C: z/{oldb,oldc,d}
>
> What about oldd ?
> (expecting a pattern across many files of s/old/new/ isn't unreasonable,
> but maybe too much for now?)
> By having no "old" prefix there is only one thing to do, which is y/d

I'm not following.  The "old" and "new" in the filenames were just
there so a human reading the testcase could easily tell which
filenames were related and involved in renames.  There is no rename
associated with d, so why would I have called it "old" or "new"?

>> +# Testcase 1f, Split a directory into two other directories
>> +#   (Related to testcases 3a, all of section 2, and all of section 4)
>> +#   Commit A: z/{b,c,d,e,f}
>> +#   Commit B: z/{b,c,d,e,f,g}
>> +#   Commit C: y/{b,c}, x/{d,e,f}
>> +#   Expected: y/{b,c}, x/{d,e,f,g}
>
> Why not y/g ? Because there are more files in x?

Yes.

> I can come up with a reasonable counter example:

Oh sure, of course.  You should also be able to come up with
counter-examples to the "correctness" of git's content merging of
files when it fails to report any conflicts (e.g. one side added
another call to a function, the other side changed the number of
parameters the function took).  In short, this is just a case of where
we need to come up with simple predictable rules since we can't read
minds.  Here's the situation we're faced with, and my line of thinking
in coming up with this simple, predictable, but definitely coarse
rule:

  * Users sometimes rename directories on one side of history and add
files to the original directory on the other side.
  * We would like to "detect" the directory rename, and move the new
files into the correct directory.
  * We don't really have any hints for detecting directory renames
other by comparing content, i.e. basing it on the file rename
detection we already have.
  * There is a possibility that not all files in a certain directory
went to the same location.  It's possible that a few may have gone
elsewhere.
  * Only in weird corner cases would I expect renamed-file location to
be split nearly 50-50 (as in this contrived testcase) -- although for
such cases, as you point out, there is a much higher chance of us
getting the merge "wrong".  Thus, our rule should be simple so people
can understand and expect what we did and have an easy time fixing it.

So, what to do?  There are a few options:
  1) Don't do directory rename detection at all.  Just declare it to
be an anti-feature.
  2) Try to guess why the user moved different files to different
directories and try to mimic that somehow.
  3) Only allow directory rename detection to happen when ALL renamed
files in the directory went to the same directory.
  4) Use a simple predictable rule like majority wins.

I think 2 is insanity.  Choices 1 and 3 don't have much appeal to me;
I'm strongly against them.  I'm unware of any remaining choices other
than 4, but 4 seems pretty reasonable to me.  It won't always be
right, but neither is simple content merging.

  reply	other threads:[~2017-11-14  0:57 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-10 19:05 [PATCH 00/30] Add directory rename detection to git Elijah Newren
2017-11-10 19:05 ` [PATCH 01/30] Tighten and correct a few testcases for merging and cherry-picking Elijah Newren
2017-11-13 19:32   ` Stefan Beller
2017-11-10 19:05 ` [PATCH 02/30] merge-recursive: Fix logic ordering issue Elijah Newren
2017-11-13 19:48   ` Stefan Beller
2017-11-13 22:04     ` Elijah Newren
2017-11-13 22:12       ` Stefan Beller
2017-11-13 23:39         ` Elijah Newren
2017-11-13 23:46           ` Stefan Beller
2017-11-10 19:05 ` [PATCH 03/30] merge-recursive: Add explanation for src_entry and dst_entry Elijah Newren
2017-11-13 21:06   ` Stefan Beller
2017-11-13 22:57     ` Elijah Newren
2017-11-13 23:11       ` Stefan Beller
2017-11-14  1:26   ` Junio C Hamano
2017-11-10 19:05 ` [PATCH 04/30] directory rename detection: basic testcases Elijah Newren
2017-11-13 22:04   ` Stefan Beller
2017-11-14  0:57     ` Elijah Newren [this message]
2017-11-14  1:21       ` Stefan Beller
2017-11-14  1:40         ` Elijah Newren
2017-11-14  2:03     ` Junio C Hamano
2017-11-10 19:05 ` [PATCH 05/30] directory rename detection: directory splitting testcases Elijah Newren
2017-11-13 23:20   ` Stefan Beller
2017-11-10 19:05 ` [PATCH 06/30] directory rename detection: testcases to avoid taking detection too far Elijah Newren
2017-11-13 23:25   ` Stefan Beller
2017-11-14  1:02     ` Elijah Newren
2017-11-10 19:05 ` [PATCH 07/30] directory rename detection: partially renamed directory testcase/discussion Elijah Newren
2017-11-14  0:07   ` Stefan Beller
2017-11-10 19:05 ` [PATCH 08/30] directory rename detection: files/directories in the way of some renames Elijah Newren
2017-11-14  0:15   ` Stefan Beller
2017-11-14  1:19     ` Elijah Newren
2017-11-10 19:05 ` [PATCH 09/30] directory rename detection: testcases checking which side did the rename Elijah Newren
2017-11-14  0:25   ` Stefan Beller
2017-11-14  1:30     ` Elijah Newren
2017-11-10 19:05 ` [PATCH 10/30] directory rename detection: more involved edge/corner testcases Elijah Newren
2017-11-14  0:42   ` Stefan Beller
2017-11-14 21:11     ` Elijah Newren
2017-11-14 22:47       ` Stefan Beller
2017-11-10 19:05 ` [PATCH 11/30] directory rename detection: testcases exploring possibly suboptimal merges Elijah Newren
2017-11-14 20:33   ` Stefan Beller
2017-11-14 21:42     ` Elijah Newren
2017-11-10 19:05 ` [PATCH 12/30] directory rename detection: miscellaneous testcases to complete coverage Elijah Newren
2017-11-15 20:03   ` Stefan Beller
2017-11-16 21:17     ` Elijah Newren
2017-11-10 19:05 ` [PATCH 13/30] directory rename detection: tests for handling overwriting untracked files Elijah Newren
2017-11-10 19:05 ` [PATCH 14/30] directory rename detection: tests for handling overwriting dirty files Elijah Newren
2017-11-10 19:05 ` [PATCH 15/30] merge-recursive: Move the get_renames() function Elijah Newren
2017-11-14  4:46   ` Junio C Hamano
2017-11-14 17:41     ` Elijah Newren
2017-11-15  1:20       ` Junio C Hamano
2017-11-10 19:05 ` [PATCH 16/30] merge-recursive: Introduce new functions to handle rename logic Elijah Newren
2017-11-14  4:56   ` Junio C Hamano
2017-11-14  5:14     ` Junio C Hamano
2017-11-14 18:24       ` Elijah Newren
2017-11-10 19:05 ` [PATCH 17/30] merge-recursive: Fix leaks of allocated renames and diff_filepairs Elijah Newren
2017-11-14  4:58   ` Junio C Hamano
2017-11-10 19:05 ` [PATCH 18/30] merge-recursive: Make !o->detect_rename codepath more obvious Elijah Newren
2017-11-10 19:05 ` [PATCH 19/30] merge-recursive: Split out code for determining diff_filepairs Elijah Newren
2017-11-14  5:20   ` Junio C Hamano
2017-11-10 19:05 ` [PATCH 20/30] merge-recursive: Add a new hashmap for storing directory renames Elijah Newren
2017-11-10 19:05 ` [PATCH 21/30] merge-recursive: Add get_directory_renames() Elijah Newren
2017-11-14  5:30   ` Junio C Hamano
2017-11-14 18:38     ` Elijah Newren
2017-11-10 19:05 ` [PATCH 22/30] merge-recursive: Check for directory level conflicts Elijah Newren
2017-11-10 19:05 ` [PATCH 23/30] merge-recursive: Add a new hashmap for storing file collisions Elijah Newren
2017-11-10 19:05 ` [PATCH 24/30] merge-recursive: Add computation of collisions due to dir rename & merging Elijah Newren
2018-06-10 10:56   ` René Scharfe
2018-06-10 11:03     ` René Scharfe
2018-06-10 20:44     ` Jeff King
2018-06-11 15:03     ` Elijah Newren
2018-06-14 17:36     ` Junio C Hamano
2017-11-10 19:05 ` [PATCH 25/30] merge-recursive: Check for file level conflicts then get new name Elijah Newren
2017-11-10 19:05 ` [PATCH 26/30] merge-recursive: When comparing files, don't include trees Elijah Newren
2017-11-10 19:05 ` [PATCH 27/30] merge-recursive: Apply necessary modifications for directory renames Elijah Newren
2017-11-15 20:23   ` Stefan Beller
2017-11-16  3:54     ` Elijah Newren
2017-11-10 19:05 ` [PATCH 28/30] merge-recursive: Avoid clobbering untracked files with " Elijah Newren
2017-11-10 19:05 ` [RFC PATCH 29/30] merge-recursive: Fix overwriting dirty files involved in renames Elijah Newren
2017-11-10 19:05 ` [PATCH 30/30] merge-recursive: Fix remaining directory rename + dirty overwrite cases Elijah Newren
2017-11-10 22:27 ` [PATCH 00/30] Add directory rename detection to git Philip Oakley
2017-11-10 23:26   ` Elijah Newren
2017-11-13 15:04     ` Philip Oakley

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BFKiam6AK-Gg_RzaLuLur-jz0kvv3TqsHNHg5+HTv_uzA@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).