git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/3] Fix dual rename into each other plus conflicting adds
@ 2022-06-22  4:20 Elijah Newren via GitGitGadget
  2022-06-22  4:20 ` [PATCH 1/3] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
                   ` (4 more replies)
  0 siblings, 5 replies; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-06-22  4:20 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

This series adds some testcases based on the tensorflow repository issue
reported by Glen Choo at [1], demonstrating bugs in both the ort and
recursive strategies. It also provides a fix for the ort strategy.

[1]
https://lore.kernel.org/git/kl6lee006mle.fsf@chooglen-macbookpro.roam.corp.google.com/

Elijah Newren (3):
  t6423: add tests of dual directory rename plus add/add conflict
  merge-ort: shuffle the computation and cleanup of potential collisions
  merge-ort: fix issue with dual rename and add/add conflict

 merge-ort.c                         |  60 ++++++++++------
 t/t6423-merge-rename-directories.sh | 102 ++++++++++++++++++++++++++++
 2 files changed, 142 insertions(+), 20 deletions(-)


base-commit: e54793a95afeea1e10de1e5ad7eab914e7416250
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1268%2Fnewren%2Ffix-dual-rename-into-each-other-plus-conflicting-adds-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1268/newren/fix-dual-rename-into-each-other-plus-conflicting-adds-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1268
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/3] t6423: add tests of dual directory rename plus add/add conflict
  2022-06-22  4:20 [PATCH 0/3] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
@ 2022-06-22  4:20 ` Elijah Newren via GitGitGadget
  2022-06-27 18:20   ` Jonathan Tan
  2022-06-27 22:30   ` Calvin Wan
  2022-06-22  4:20 ` [PATCH 2/3] merge-ort: shuffle the computation and cleanup of potential collisions Elijah Newren via GitGitGadget
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-06-22  4:20 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

This is an attempt at minimalizing a testcase reported by Glen Choo
with tensorflow where merge-ort would report an assertion failure:

    Assertion failed: (ci->filemask == 2 || ci->filemask == 4), function apply_directory_rename_modifications, file merge-ort.c, line 2410

reversing the direction of the merge provides a different error:

    error: cache entry has null sha1: ...
    fatal: unable to write .git/index

so we add testcases for both.  With these new testcases, the
recursive strategy differs in that it returns the latter error for
both merge directions.

These testcases are somehow a little different than Glen's original
tensorflow testcase in that these ones trigger a bug with the recursive
algorithm whereas his testcase didn't.  I figure that means these
testcases somehow manage to be more comprehensive.

Reported-by: Glen Choo <chooglen@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t6423-merge-rename-directories.sh | 102 ++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)

diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh
index 479db32cd62..296c04f8046 100755
--- a/t/t6423-merge-rename-directories.sh
+++ b/t/t6423-merge-rename-directories.sh
@@ -5199,6 +5199,108 @@ test_expect_success '12k: Directory rename with sibling causes rename-to-self' '
 	)
 '
 
+# Testcase 12l, Both sides rename a directory into the other side, both add
+#   a file with after directory renames are the same filename
+#   Commit O: sub1/file,                 sub2/other
+#   Commit A: sub3/file,                 sub2/{other, new_add_add_file_1}
+#   Commit B: sub1/{file, newfile}, sub1/sub2/{other, new_add_add_file_2}
+#
+#   In words:
+#     A: sub1/ -> sub3/, add sub2/new_add_add_file_1
+#     B: sub2/ -> sub1/sub2, add sub1/newfile, add sub1/sub2/new_add_add_file_2
+#
+#   Expected: sub3/{file, newfile, sub2/other}
+#             CONFLICT (add/add): sub3/sub2/new_add_add_file
+
+test_setup_12l () {
+	test_create_repo 12l_$1 &&
+	(
+		cd 12l_$1 &&
+
+		mkdir -p sub1 sub2
+		echo file >sub1/file &&
+		echo other >sub2/other &&
+		git add sub1 sub2 &&
+		git commit -m "O" &&
+
+		git branch O &&
+		git branch A &&
+		git branch B &&
+
+		git checkout A &&
+		git mv sub1 sub3 &&
+		echo conflicting >sub2/new_add_add_file &&
+		git add sub2 &&
+		test_tick &&
+		git add -u &&
+		git commit -m "A" &&
+
+		git checkout B &&
+		echo dissimilar >sub2/new_add_add_file &&
+		echo brand >sub1/newfile &&
+		git add sub1 sub2 &&
+		git mv sub2 sub1 &&
+		test_tick &&
+		git commit -m "B"
+	)
+}
+
+test_expect_merge_algorithm failure failure '12l (B into A): Rename into each other + add/add conflict' '
+	test_setup_12l BintoA &&
+	(
+		cd 12l_BintoA &&
+
+		git checkout -q A^0 &&
+
+		test_must_fail git -c merge.directoryRenames=true merge -s recursive B^0 &&
+
+		git ls-files -s >out &&
+		test_line_count = 5 out &&
+
+		git rev-parse >actual \
+			:0:sub3/file :0:sub3/newfile :0:sub3/sub2/other \
+			:2:sub1/sub2/new_add_add_file \
+			:3:sub1/sub2/new_add_add_file &&
+		git rev-parse >expect \
+			O:sub1/file  B:sub1/newfile O:sub2/other \
+			A:sub2/new_add_add_file \
+			B:sub1/sub2/new_add_add_file &&
+		test_cmp expect actual &&
+
+		git ls-files -o >actual &&
+		test_write_lines actual expect out >expect &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_merge_algorithm failure failure '12l (A into B): Rename into each other + add/add conflict' '
+	test_setup_12l AintoB &&
+	(
+		cd 12l_AintoB &&
+
+		git checkout -q B^0 &&
+
+		test_must_fail git -c merge.directoryRenames=true merge -s recursive A^0 &&
+
+		git ls-files -s >out &&
+		test_line_count = 5 out &&
+
+		git rev-parse >actual \
+			:0:sub3/file :0:sub3/newfile :0:sub3/sub2/other \
+			:2:sub1/sub2/new_add_add_file \
+			:3:sub1/sub2/new_add_add_file &&
+		git rev-parse >expect \
+			O:sub1/file  B:sub1/newfile O:sub2/other \
+			B:sub1/sub2/new_add_add_file \
+			A:sub2/new_add_add_file &&
+		test_cmp expect actual &&
+
+		git ls-files -o >actual &&
+		test_write_lines actual expect out >expect &&
+		test_cmp expect actual
+	)
+'
+
 ###########################################################################
 # SECTION 13: Checking informational and conflict messages
 #
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 2/3] merge-ort: shuffle the computation and cleanup of potential collisions
  2022-06-22  4:20 [PATCH 0/3] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
  2022-06-22  4:20 ` [PATCH 1/3] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
@ 2022-06-22  4:20 ` Elijah Newren via GitGitGadget
  2022-06-27 18:48   ` Jonathan Tan
  2022-06-22  4:20 ` [PATCH 3/3] merge-ort: fix issue with dual rename and add/add conflict Elijah Newren via GitGitGadget
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-06-22  4:20 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Run compute_collisions() for renames on both sides of history before
any calls to collect_renames(), and do not free the computed collisions
until after both calls to collect_renames().  This is just a code
reorganization at this point that doesn't make sense on its own, but
will permit us to use the computed collision info from both sides
within each call to collect_renames() in a subsequent commit.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 56 ++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 36 insertions(+), 20 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 8545354dafd..fa6667de18c 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2259,6 +2259,27 @@ static void compute_collisions(struct strmap *collisions,
 	}
 }
 
+static void free_collisions(struct strmap *collisions)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
+
+	/* Free each value in the collisions map */
+	strmap_for_each_entry(collisions, &iter, entry) {
+		struct collision_info *info = entry->value;
+		string_list_clear(&info->source_files, 0);
+	}
+	/*
+	 * In compute_collisions(), we set collisions.strdup_strings to 0
+	 * so that we wouldn't have to make another copy of the new_path
+	 * allocated by apply_dir_rename().  But now that we've used them
+	 * and have no other references to these strings, it is time to
+	 * deallocate them.
+	 */
+	free_strmap_strings(collisions);
+	strmap_clear(collisions, 1);
+}
+
 static char *check_for_directory_rename(struct merge_options *opt,
 					const char *path,
 					unsigned side_index,
@@ -2314,7 +2335,8 @@ static char *check_for_directory_rename(struct merge_options *opt,
 	}
 
 	new_path = handle_path_level_conflicts(opt, path, side_index,
-					       rename_info, collisions);
+					       rename_info,
+					       &collisions[side_index]);
 	*clean_merge &= (new_path != NULL);
 
 	return new_path;
@@ -3023,18 +3045,15 @@ static int detect_regular_renames(struct merge_options *opt,
 static int collect_renames(struct merge_options *opt,
 			   struct diff_queue_struct *result,
 			   unsigned side_index,
+			   struct strmap *collisions,
 			   struct strmap *dir_renames_for_side,
 			   struct strmap *rename_exclusions)
 {
 	int i, clean = 1;
-	struct strmap collisions;
 	struct diff_queue_struct *side_pairs;
-	struct hashmap_iter iter;
-	struct strmap_entry *entry;
 	struct rename_info *renames = &opt->priv->renames;
 
 	side_pairs = &renames->pairs[side_index];
-	compute_collisions(&collisions, dir_renames_for_side, side_pairs);
 
 	for (i = 0; i < side_pairs->nr; ++i) {
 		struct diff_filepair *p = side_pairs->queue[i];
@@ -3050,7 +3069,7 @@ static int collect_renames(struct merge_options *opt,
 						      side_index,
 						      dir_renames_for_side,
 						      rename_exclusions,
-						      &collisions,
+						      collisions,
 						      &clean);
 
 		possibly_cache_new_pair(renames, p, side_index, new_path);
@@ -3076,20 +3095,6 @@ static int collect_renames(struct merge_options *opt,
 		result->queue[result->nr++] = p;
 	}
 
-	/* Free each value in the collisions map */
-	strmap_for_each_entry(&collisions, &iter, entry) {
-		struct collision_info *info = entry->value;
-		string_list_clear(&info->source_files, 0);
-	}
-	/*
-	 * In compute_collisions(), we set collisions.strdup_strings to 0
-	 * so that we wouldn't have to make another copy of the new_path
-	 * allocated by apply_dir_rename().  But now that we've used them
-	 * and have no other references to these strings, it is time to
-	 * deallocate them.
-	 */
-	free_strmap_strings(&collisions);
-	strmap_clear(&collisions, 1);
 	return clean;
 }
 
@@ -3100,6 +3105,7 @@ static int detect_and_process_renames(struct merge_options *opt,
 {
 	struct diff_queue_struct combined = { 0 };
 	struct rename_info *renames = &opt->priv->renames;
+	struct strmap collisions[3];
 	int need_dir_renames, s, i, clean = 1;
 	unsigned detection_run = 0;
 
@@ -3149,12 +3155,22 @@ static int detect_and_process_renames(struct merge_options *opt,
 	ALLOC_GROW(combined.queue,
 		   renames->pairs[1].nr + renames->pairs[2].nr,
 		   combined.alloc);
+	for (int i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
+		int other_side = 3 - i;
+		compute_collisions(&collisions[i],
+				   &renames->dir_renames[other_side],
+				   &renames->pairs[i]);
+	}
 	clean &= collect_renames(opt, &combined, MERGE_SIDE1,
+				 collisions,
 				 &renames->dir_renames[2],
 				 &renames->dir_renames[1]);
 	clean &= collect_renames(opt, &combined, MERGE_SIDE2,
+				 collisions,
 				 &renames->dir_renames[1],
 				 &renames->dir_renames[2]);
+	for (int i = MERGE_SIDE1; i <= MERGE_SIDE2; i++)
+		free_collisions(&collisions[i]);
 	STABLE_QSORT(combined.queue, combined.nr, compare_pairs);
 	trace2_region_leave("merge", "directory renames", opt->repo);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 3/3] merge-ort: fix issue with dual rename and add/add conflict
  2022-06-22  4:20 [PATCH 0/3] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
  2022-06-22  4:20 ` [PATCH 1/3] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
  2022-06-22  4:20 ` [PATCH 2/3] merge-ort: shuffle the computation and cleanup of potential collisions Elijah Newren via GitGitGadget
@ 2022-06-22  4:20 ` Elijah Newren via GitGitGadget
  2022-06-27 18:47   ` Jonathan Tan
  2022-06-22  4:36 ` [PATCH 0/3] Fix dual rename into each other plus conflicting adds Elijah Newren
  2022-06-30  6:57 ` [PATCH v2 " Elijah Newren via GitGitGadget
  4 siblings, 1 reply; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-06-22  4:20 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

There is code in both merge-recursive and merge-ort for avoiding doubly
transitive renames (i.e. one side renames directory A/ -> B/, and the
other side renames directory B/ -> C/), because this combination would
otherwise make a mess for new files added to A/ on the first side and
wondering which directory they end up in -- especially if there were
even more renames such as the first side renaming C/ -> D/.  In such
cases, it just turns "off" directory rename detection for the higher
order transitive cases.

The testcases added in t6423 a couple commits ago are slightly different
but similar in principle.  They involve a similar case of paired
renaming but instead of A/ -> B/ and B/ -> C/, the second side renames
a leading directory of B/ to C/.  And both sides add a new file
somewhere under the directory that the other side will rename.  While
the new files added start within different directories and thus could
logically end up within different directories, it is weird for a file
on one side to end up where the other one started and not move along
with it.  So, let's just turn off directory rename detection in this
case as well.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c                         | 4 ++++
 t/t6423-merge-rename-directories.sh | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index fa6667de18c..5bcb9a4980b 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2292,9 +2292,13 @@ static char *check_for_directory_rename(struct merge_options *opt,
 	struct strmap_entry *rename_info;
 	struct strmap_entry *otherinfo = NULL;
 	const char *new_dir;
+	int other_side = 3 - side_index;
 
+	/* Cases where there is no new path, so we return NULL */
 	if (strmap_empty(dir_renames))
 		return new_path;
+	if (strmap_get(&collisions[other_side], path))
+		return new_path;
 	rename_info = check_dir_renamed(path, dir_renames);
 	if (!rename_info)
 		return new_path;
diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh
index 296c04f8046..4286ae987c4 100755
--- a/t/t6423-merge-rename-directories.sh
+++ b/t/t6423-merge-rename-directories.sh
@@ -5245,7 +5245,7 @@ test_setup_12l () {
 	)
 }
 
-test_expect_merge_algorithm failure failure '12l (B into A): Rename into each other + add/add conflict' '
+test_expect_merge_algorithm failure success '12l (B into A): Rename into each other + add/add conflict' '
 	test_setup_12l BintoA &&
 	(
 		cd 12l_BintoA &&
@@ -5273,7 +5273,7 @@ test_expect_merge_algorithm failure failure '12l (B into A): Rename into each ot
 	)
 '
 
-test_expect_merge_algorithm failure failure '12l (A into B): Rename into each other + add/add conflict' '
+test_expect_merge_algorithm failure success '12l (A into B): Rename into each other + add/add conflict' '
 	test_setup_12l AintoB &&
 	(
 		cd 12l_AintoB &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/3] Fix dual rename into each other plus conflicting adds
  2022-06-22  4:20 [PATCH 0/3] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-06-22  4:20 ` [PATCH 3/3] merge-ort: fix issue with dual rename and add/add conflict Elijah Newren via GitGitGadget
@ 2022-06-22  4:36 ` Elijah Newren
  2022-06-30  6:57 ` [PATCH v2 " Elijah Newren via GitGitGadget
  4 siblings, 0 replies; 45+ messages in thread
From: Elijah Newren @ 2022-06-22  4:36 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: Git Mailing List

On Tue, Jun 21, 2022 at 9:21 PM Elijah Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> This series adds some testcases based on the tensorflow repository issue
> reported by Glen Choo at [1], demonstrating bugs in both the ort and
> recursive strategies. It also provides a fix for the ort strategy.

Also, just to be clear, this is not a regression since 2.36.  It's
been with the ort strategy since it was introduced, and with the
recursive strategy for a number of years.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/3] t6423: add tests of dual directory rename plus add/add conflict
  2022-06-22  4:20 ` [PATCH 1/3] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
@ 2022-06-27 18:20   ` Jonathan Tan
  2022-06-30  0:06     ` Elijah Newren
  2022-06-27 22:30   ` Calvin Wan
  1 sibling, 1 reply; 45+ messages in thread
From: Jonathan Tan @ 2022-06-27 18:20 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: Jonathan Tan, git, Elijah Newren

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Elijah Newren <newren@gmail.com>
> 
> This is an attempt at minimalizing a testcase reported by Glen Choo
> with tensorflow where merge-ort would report an assertion failure:
> 
>     Assertion failed: (ci->filemask == 2 || ci->filemask == 4), function apply_directory_rename_modifications, file merge-ort.c, line 2410

First of all, thanks for this fix - I've verified with Glen Choo's test
cases and it does fix it.

> diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh
> index 479db32cd62..296c04f8046 100755
> --- a/t/t6423-merge-rename-directories.sh
> +++ b/t/t6423-merge-rename-directories.sh
> @@ -5199,6 +5199,108 @@ test_expect_success '12k: Directory rename with sibling causes rename-to-self' '
>  	)
>  '
>  
> +# Testcase 12l, Both sides rename a directory into the other side, both add
> +#   a file with after directory renames are the same filename
> +#   Commit O: sub1/file,                 sub2/other
> +#   Commit A: sub3/file,                 sub2/{other, new_add_add_file_1}
> +#   Commit B: sub1/{file, newfile}, sub1/sub2/{other, new_add_add_file_2}
> +#
> +#   In words:
> +#     A: sub1/ -> sub3/, add sub2/new_add_add_file_1
> +#     B: sub2/ -> sub1/sub2, add sub1/newfile, add sub1/sub2/new_add_add_file_2
> +#
> +#   Expected: sub3/{file, newfile, sub2/other}
> +#             CONFLICT (add/add): sub3/sub2/new_add_add_file

The "expected" might need to be updated. After all patches are applied,
this is the tree I get (note that it's "sub1/sub2/new_add_add_file, not
"sub3/sub2/new_add_add_file"):

  .
  |-- sub1
  |   `-- sub2
  |       `-- new_add_add_file
  `-- sub3
      |-- file
      |-- newfile
      `-- sub2
          `-- other

Also, at first glance, "newfile" shouldn't belong in a minimal
reproduction, but it somehow changes the output. When I apply this diff
(to the state after all patches are applied):

  diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh
  index 4286ae987c..9472fb7233 100755
  --- a/t/t6423-merge-rename-directories.sh
  +++ b/t/t6423-merge-rename-directories.sh
  @@ -5237,7 +5237,6 @@ test_setup_12l () {
   
                  git checkout B &&
                  echo dissimilar >sub2/new_add_add_file &&
  -               echo brand >sub1/newfile &&
                  git add sub1 sub2 &&
                  git mv sub2 sub1 &&
                  test_tick &&
  @@ -5255,14 +5254,14 @@ test_expect_merge_algorithm failure success '12l (B into A): Rename into each ot
                  test_must_fail git -c merge.directoryRenames=true merge -s recursive B^0 &&
   
                  git ls-files -s >out &&
  -               test_line_count = 5 out &&
  +               test_line_count = 4 out &&
   
                  git rev-parse >actual \
  -                       :0:sub3/file :0:sub3/newfile :0:sub3/sub2/other \
  +                       :0:sub3/file :0:sub3/sub2/other \
                          :2:sub1/sub2/new_add_add_file \
                          :3:sub1/sub2/new_add_add_file &&
                  git rev-parse >expect \
  -                       O:sub1/file  B:sub1/newfile O:sub2/other \
  +                       O:sub1/file  O:sub2/other \
                          A:sub2/new_add_add_file \
                          B:sub1/sub2/new_add_add_file &&
                  test_cmp expect actual &&

I get:

  .
  |-- sub1
  |   `-- sub2
  |       |-- new_add_add_file
  |       `-- other
  `-- sub3
      `-- file

(Note the path to "other".) I haven't figured out what's going on,
though.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 3/3] merge-ort: fix issue with dual rename and add/add conflict
  2022-06-22  4:20 ` [PATCH 3/3] merge-ort: fix issue with dual rename and add/add conflict Elijah Newren via GitGitGadget
@ 2022-06-27 18:47   ` Jonathan Tan
  2022-06-30  0:05     ` Elijah Newren
  0 siblings, 1 reply; 45+ messages in thread
From: Jonathan Tan @ 2022-06-27 18:47 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: Jonathan Tan, git, Elijah Newren

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> The testcases added in t6423 a couple commits ago are slightly different
> but similar in principle.  They involve a similar case of paired
> renaming but instead of A/ -> B/ and B/ -> C/, the second side renames
> a leading directory of B/ to C/.  

Hmm...one side moved sub1 -> sub3 and the other moved sub2 from the root
to under sub1, right? So it's more like A/ -> B/ and C/ -> A/C/.

> And both sides add a new file
> somewhere under the directory that the other side will rename.  

Rename or move, I think.

> While
> the new files added start within different directories and thus could
> logically end up within different directories, it is weird for a file
> on one side to end up where the other one started and not move along
> with it.  So, let's just turn off directory rename detection in this
> case as well.

Makes sense.

> diff --git a/merge-ort.c b/merge-ort.c
> index fa6667de18c..5bcb9a4980b 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -2292,9 +2292,13 @@ static char *check_for_directory_rename(struct merge_options *opt,
>  	struct strmap_entry *rename_info;
>  	struct strmap_entry *otherinfo = NULL;
>  	const char *new_dir;
> +	int other_side = 3 - side_index;
>  
> +	/* Cases where there is no new path, so we return NULL */

What do you mean by "no new path"?

>  	if (strmap_empty(dir_renames))
>  		return new_path;
> +	if (strmap_get(&collisions[other_side], path))
> +		return new_path;

So as far as I understand, collisions here, for a given side, is a map.
The map's keys are all the filenames of added and renamed files (I'm
assuming that's what 'A' and 'R' are) from that side after any directory
moves on the other side are applied. So, for example, if we add "foo/a"
on side A and rename "foo/" to "bar/" on side B, then side A's collision
map would have a key "bar/a". So I'm not sure if "collision" is the
right name here, but the existing code already uses it so I'll leave it
be.

It makes sense that this situation (of side A having "bar/a" because
side B renamed "foo/" to "bar/", and at the same time, side B adds its
own "bar/a") is weird, as stated in the commit message, so I don't mind
disabling checking for directory rename in this case. However, in
theory, I don't see how disabling this check would fix anything, since
the bug seems to be about two different files on different sides being
renamed to the same filename through some convoluted means. (Unless this
is the only convoluted means to do it, and disabling it means that the
bug wouldn't occur.)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/3] merge-ort: shuffle the computation and cleanup of potential collisions
  2022-06-22  4:20 ` [PATCH 2/3] merge-ort: shuffle the computation and cleanup of potential collisions Elijah Newren via GitGitGadget
@ 2022-06-27 18:48   ` Jonathan Tan
  2022-06-27 21:04     ` Calvin Wan
  0 siblings, 1 reply; 45+ messages in thread
From: Jonathan Tan @ 2022-06-27 18:48 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: Jonathan Tan, git, Elijah Newren

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> @@ -2314,7 +2335,8 @@ static char *check_for_directory_rename(struct merge_options *opt,
>  	}
>  
>  	new_path = handle_path_level_conflicts(opt, path, side_index,
> -					       rename_info, collisions);
> +					       rename_info,
> +					       &collisions[side_index]);

Is this a fix of a latent bug? handle_path_level_conflicts() is not
changed in this patch.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/3] merge-ort: shuffle the computation and cleanup of potential collisions
  2022-06-27 18:48   ` Jonathan Tan
@ 2022-06-27 21:04     ` Calvin Wan
  2022-06-30  0:05       ` Elijah Newren
  0 siblings, 1 reply; 45+ messages in thread
From: Calvin Wan @ 2022-06-27 21:04 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: Calvin Wan, Elijah Newren via GitGitGadget, git, Elijah Newren

Jonathan Tan <jonathantanmy@google.com> writes:
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > @@ -2314,7 +2335,8 @@ static char *check_for_directory_rename(struct merge_options *opt,
> >  	}
> >  
> >  	new_path = handle_path_level_conflicts(opt, path, side_index,
> > -					       rename_info, collisions);
> > +					       rename_info,
> > +					       &collisions[side_index]);
> 
> Is this a fix of a latent bug? handle_path_level_conflicts() is not
> changed in this patch.
> 

I don't think so. IIUC this is what's happening given the callstack:

detect_and_process_renames()
  - Now defines `struct strmap collisions[3];` and computes all
    three collisions here
  - Passes collisions into collect_renames()
collect_renames()
  - Originally defined as `struct strmap collisions;` and computed
    collisions in here
  - Now takes collisions as an argument
  - Passes collisions into check_for_directory_rename()
check_for_directory_rename()
  - Collisions isn't used in this function at all except to pass into
    handle_path_level_conflicts
handle_path_level_conflicts()
  - Expecting pointer to singular collisions, not an array so side_index
    is now required

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/3] t6423: add tests of dual directory rename plus add/add conflict
  2022-06-22  4:20 ` [PATCH 1/3] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
  2022-06-27 18:20   ` Jonathan Tan
@ 2022-06-27 22:30   ` Calvin Wan
  2022-06-30  0:07     ` Elijah Newren
  1 sibling, 1 reply; 45+ messages in thread
From: Calvin Wan @ 2022-06-27 22:30 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: Calvin Wan, git, Elijah Newren

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> +# Testcase 12l, Both sides rename a directory into the other side, both add
> +#   a file with after directory renames are the same filename
> +#   Commit O: sub1/file,                 sub2/other
> +#   Commit A: sub3/file,                 sub2/{other, new_add_add_file_1}
> +#   Commit B: sub1/{file, newfile}, sub1/sub2/{other, new_add_add_file_2}
> +#
> +#   In words:
> +#     A: sub1/ -> sub3/, add sub2/new_add_add_file_1
> +#     B: sub2/ -> sub1/sub2, add sub1/newfile, add sub1/sub2/new_add_add_file_2
> +#
> +#   Expected: sub3/{file, newfile, sub2/other}
> +#             CONFLICT (add/add): sub3/sub2/new_add_add_file

Grammatically, I could not understand

"both add a file with after directory renames are the same filename"

I also found the same issue with `Expected` that Jonathan mentions. I ran the
command separately and got

CONFLICT (add/add): Merge conflict in sub1/sub2/new_add_add_file

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 3/3] merge-ort: fix issue with dual rename and add/add conflict
  2022-06-27 18:47   ` Jonathan Tan
@ 2022-06-30  0:05     ` Elijah Newren
  2022-07-06 17:25       ` Jonathan Tan
  0 siblings, 1 reply; 45+ messages in thread
From: Elijah Newren @ 2022-06-30  0:05 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Mon, Jun 27, 2022 at 11:47 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > The testcases added in t6423 a couple commits ago are slightly different
> > but similar in principle.  They involve a similar case of paired
> > renaming but instead of A/ -> B/ and B/ -> C/, the second side renames
> > a leading directory of B/ to C/.
>
> Hmm...one side moved sub1 -> sub3 and the other moved sub2 from the root
> to under sub1, right? So it's more like A/ -> B/ and C/ -> A/C/.

Hmm, maybe I should have been more explicit in my mapping:
   A = sub2
   B = sub1/sub2
   leading directory of B = sub1
   C = sub3

> > And both sides add a new file
> > somewhere under the directory that the other side will rename.
>
> Rename or move, I think.

I'm confused; I don't understand what this comment means.  Renames
tend to be created using "git mv", before or after making content
changes, so to me a file being renamed or a file being moved to a
different location are synonymous.  What distinction are you making
between renames and moves?

> > While
> > the new files added start within different directories and thus could
> > logically end up within different directories, it is weird for a file
> > on one side to end up where the other one started and not move along
> > with it.  So, let's just turn off directory rename detection in this
> > case as well.
>
> Makes sense.
>
> > diff --git a/merge-ort.c b/merge-ort.c
> > index fa6667de18c..5bcb9a4980b 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -2292,9 +2292,13 @@ static char *check_for_directory_rename(struct merge_options *opt,
> >       struct strmap_entry *rename_info;
> >       struct strmap_entry *otherinfo = NULL;
> >       const char *new_dir;
> > +     int other_side = 3 - side_index;
> >
> > +     /* Cases where there is no new path, so we return NULL */
>
> What do you mean by "no new path"?

Hmm, perhaps I should change this to:

/* Cases where we don't have or don't want a directory rename for this
path, so we return NULL */

The purpose of this function is to check whether a given path would be
modified by directory renaming to get a new path.  So, "no new path"
means we don't have an applicable directory rename or don't want to
use it.

> >       if (strmap_empty(dir_renames))
> >               return new_path;
> > +     if (strmap_get(&collisions[other_side], path))
> > +             return new_path;
>
> So as far as I understand, collisions here, for a given side, is a map.
> The map's keys are all the filenames of added and renamed files (I'm
> assuming that's what 'A' and 'R' are) from that side after any directory
> moves on the other side are applied. So, for example, if we add "foo/a"
> on side A and rename "foo/" to "bar/" on side B, then side A's collision
> map would have a key "bar/a". So I'm not sure if "collision" is the
> right name here, but the existing code already uses it so I'll leave it
> be.

Let's take your example a bit further, to discuss the original kind of
usecase that "collisions" was written for.  In addition to adding
"foo/a" on side A, we also add "foo2/a" and "foo3/a".  And then in
addition to renaming "foo/" to "bar/" on side B, we also rename
"foo2/" to "bar/" and "foo3/" to "bar/", thus merging all of foo/,
foo2/, and foo3/ into a single directory named bar/.  If the files in
foo/, foo2/, and foo3/ on side B were all unique, you can see how
there'd be no problem merging these directories together.  The problem
only comes at merge time when you attempt to apply the directory
renames from side B to the new files on side A.  That's when you get
collisions, in this case three files that would be named bar/a.

Checking for such collisions was the purpose of the "collisions"
metadata, so I think the name is fitting.  The only problem is that
we're reusing that data now for a slightly different purpose, though I
think it's still "collision-y".

> It makes sense that this situation (of side A having "bar/a" because
> side B renamed "foo/" to "bar/", and at the same time, side B adds its
> own "bar/a") is weird, as stated in the commit message, so I don't mind
> disabling checking for directory rename in this case.  However, in
> theory, I don't see how disabling this check would fix anything, since
> the bug seems to be about two different files on different sides being
> renamed to the same filename through some convoluted means. (Unless this
> is the only convoluted means to do it, and disabling it means that the
> bug wouldn't occur.)

Hmm...let me see if I can explain this a different way.

The short version of the issue here is that if a directory rename
wants to rename NEWFILE -> ALTERNATE_NEWFILE, but there is another
directory rename on the other side of history that wants to rename
ANOTHER_FILE -> NEWFILE, then we run into problems and have to turn
off the NEWFILE -> ALTERNATE_NEWFILE.  This check here is all about
this case.

To see why this is the problematic setup...

The primary data structure in merge-ort is opt->priv->paths, a strmap
which maps: (path involved in the merge) -> (conflict_info).
(Technically, it could have a merged_info instead of a conflict_info
if the file was trivially resolvable early on but since merged_info is
the first entry in a conflict_info, logically it can still be thought
of as a conflict_info just with less data.).  Now a conflict_info
stores information about the OIDs and modes for all three sides of the
merge (i.e. both sides of the merge and the base).  Whenever a rename
is processed, we have to update this map, because the rename makes the
conflict_info now apply to a different path.  In the simple cases, the
conflict_info from the source path needs to be merged with the
conflict_info for the target path, and the source path's conflict_info
needs to be marked as null (literally setting the .is_null field to
1).  rename/rename and such can make this a bit more complicated.

Normally, directory renames would actually be a simpler case for this
merging of conflict_info structs rather than a more complicated case.
There cannot be a directory rename if the directory exists on both
sides, so we don't need to worry about there already being a file on
the other side whose conflict_info we need to merge with the source
conflict_info.  So, the code just added an assertion that there wasn't
one.  The problem is, that _another_ directory rename for the other
side of history can move a file into the way of where our directory
rename wants to operate on.  Let's jump into the example testcase I
added to make this more concrete:

   #   Commit O: sub1/file,                 sub2/other
   #   Commit A: sub3/file,                 sub2/{other, new_add_add_file_1}
   #   Commit B: sub1/{file, newfile}, sub1/sub2/{other, new_add_add_file_2}
   #
   #   In words:
   #     A: sub1/ -> sub3/, add sub2/new_add_add_file_1
   #     B: sub2/ -> sub1/sub2, add sub1/newfile, add
sub1/sub2/new_add_add_file_2

Here, the sub2/sub1/sub2/ rename on sideB will rename A's
sub2/new_add_add_file to sub1/sub2/new_add_add_file, which is at the
same location as B's sub1/sub2/new_add_add_file (and which A's sub1/
-> sub3/ directory rename would normally operate on).

Given our opt->priv->paths data structure, if we wanted to let both
directory renames take effect, then the order would matter:

* If the sub1/ -> sub3/ directory rename is applied first, then:
  * B's sub1/sub2/new_add_add_file gets renamed to sub3/sub2/new_add_add_file
  * sub1/sub2/new_add_add_file is marked as .is_null=1
  * A's sub2/new_add_add_file gets renamed to
sub1/sub2/new_add_add_file (which was already marked as null)

This set of steps seems to trigger the "error: cache entry has null
sha1" fatal error I mentioned earlier.  In contrast, if we take the
other order:

* If the sub2/ -> sub1/sub2/ rename is applied first, then:
  * A's sub2/new_add_add_file gets renamed to sub1/sub2/new_add_add_file
  * the conflict_info for the two sub1/sub2/new_add_add_file's are now merged
  * the sub1/ -> sub3/ directory rename is applied to move this file
to sub3/sub2/new_add_add_file

This second order may not sound so bad.  And, in fact, you can get
this behavior simply by relaxing (or commenting out) the assertion
Glen reported hitting.  However, that results in making the merge have
a fatal error when you reverse its direction (i.e. when swapping HEAD
and MERGE_HEAD), and seems somewhat confusing to me given that A's
sub2/new_add_add_file was renamed twice, going against the general
"avoid-mutiply-transitive-renames" rule employed elsewhere in
directory rename detection.

To avoid this order dependence, and the weird multiple-renaming of a
single path, we just want to turn off directory renames when the
source of a directory rename on one side is the target of a directory
rename on the other.  That's what this patch does.


Does that help?  Or did I make it more confusing?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/3] merge-ort: shuffle the computation and cleanup of potential collisions
  2022-06-27 21:04     ` Calvin Wan
@ 2022-06-30  0:05       ` Elijah Newren
  0 siblings, 0 replies; 45+ messages in thread
From: Elijah Newren @ 2022-06-30  0:05 UTC (permalink / raw)
  To: Calvin Wan; +Cc: Jonathan Tan, Elijah Newren via GitGitGadget, Git Mailing List

On Mon, Jun 27, 2022 at 2:04 PM Calvin Wan <calvinwan@google.com> wrote:
>
> Jonathan Tan <jonathantanmy@google.com> writes:
> > "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > > @@ -2314,7 +2335,8 @@ static char *check_for_directory_rename(struct merge_options *opt,
> > >     }
> > >
> > >     new_path = handle_path_level_conflicts(opt, path, side_index,
> > > -                                          rename_info, collisions);
> > > +                                          rename_info,
> > > +                                          &collisions[side_index]);
> >
> > Is this a fix of a latent bug? handle_path_level_conflicts() is not
> > changed in this patch.
> >
>
> I don't think so. IIUC this is what's happening given the callstack:
>
> detect_and_process_renames()
>   - Now defines `struct strmap collisions[3];` and computes all
>     three collisions here
>   - Passes collisions into collect_renames()
> collect_renames()
>   - Originally defined as `struct strmap collisions;` and computed
>     collisions in here
>   - Now takes collisions as an argument
>   - Passes collisions into check_for_directory_rename()
> check_for_directory_rename()
>   - Collisions isn't used in this function at all except to pass into
>     handle_path_level_conflicts
> handle_path_level_conflicts()
>   - Expecting pointer to singular collisions, not an array so side_index
>     is now required

Sweet, thanks for answering for me.  This is exactly right.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/3] t6423: add tests of dual directory rename plus add/add conflict
  2022-06-27 18:20   ` Jonathan Tan
@ 2022-06-30  0:06     ` Elijah Newren
  2022-06-30 22:32       ` Jonathan Tan
  0 siblings, 1 reply; 45+ messages in thread
From: Elijah Newren @ 2022-06-30  0:06 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Mon, Jun 27, 2022 at 11:20 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > This is an attempt at minimalizing a testcase reported by Glen Choo
> > with tensorflow where merge-ort would report an assertion failure:
> >
> >     Assertion failed: (ci->filemask == 2 || ci->filemask == 4), function apply_directory_rename_modifications, file merge-ort.c, line 2410
>
> First of all, thanks for this fix - I've verified with Glen Choo's test
> cases and it does fix it.

:-)

> > diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh
> > index 479db32cd62..296c04f8046 100755
> > --- a/t/t6423-merge-rename-directories.sh
> > +++ b/t/t6423-merge-rename-directories.sh
> > @@ -5199,6 +5199,108 @@ test_expect_success '12k: Directory rename with sibling causes rename-to-self' '
> >       )
> >  '
> >
> > +# Testcase 12l, Both sides rename a directory into the other side, both add
> > +#   a file with after directory renames are the same filename
> > +#   Commit O: sub1/file,                 sub2/other
> > +#   Commit A: sub3/file,                 sub2/{other, new_add_add_file_1}
> > +#   Commit B: sub1/{file, newfile}, sub1/sub2/{other, new_add_add_file_2}
> > +#
> > +#   In words:
> > +#     A: sub1/ -> sub3/, add sub2/new_add_add_file_1
> > +#     B: sub2/ -> sub1/sub2, add sub1/newfile, add sub1/sub2/new_add_add_file_2
> > +#
> > +#   Expected: sub3/{file, newfile, sub2/other}
> > +#             CONFLICT (add/add): sub3/sub2/new_add_add_file
>
> The "expected" might need to be updated.

Oops!  Yes, I kept changing and editing the testcase and the
comments...but didn't quite get all the updates I needed when I was
revising.

> After all patches are applied,
> this is the tree I get (note that it's "sub1/sub2/new_add_add_file, not
> "sub3/sub2/new_add_add_file"):
>
>   .
>   |-- sub1
>   |   `-- sub2
>   |       `-- new_add_add_file
>   `-- sub3
>       |-- file
>       |-- newfile
>       `-- sub2
>           `-- other

Yes, that's right.

> Also, at first glance, "newfile" shouldn't belong in a minimal
> reproduction

Ah, I can see you've looked at this very closely.  Thanks for digging
in!   I know it's counter-intuitive at first, but the file is
necessary in order to get the sub1/ -> sub3/ rename.  The reasoning is
as follows: We don't need to detect a directory rename for a directory
where the other side added no new files into that directory...because
the whole point of directory renames is to move new files in a
directory to the new location.  Therefore, no new files in the
directory on one side, means no need to detect a directory rename for
it on the other side.  For a deeper discussion of this, see commit
c64432aacd (t6423: more involved rules for renaming directories into
each other, 2020-10-15).


>, but it somehow changes the output. When I apply this diff
> (to the state after all patches are applied):
>
>   diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh
>   index 4286ae987c..9472fb7233 100755
>   --- a/t/t6423-merge-rename-directories.sh
>   +++ b/t/t6423-merge-rename-directories.sh
>   @@ -5237,7 +5237,6 @@ test_setup_12l () {
>
>                   git checkout B &&
>                   echo dissimilar >sub2/new_add_add_file &&
>   -               echo brand >sub1/newfile &&
>                   git add sub1 sub2 &&
>                   git mv sub2 sub1 &&
>                   test_tick &&
>   @@ -5255,14 +5254,14 @@ test_expect_merge_algorithm failure success '12l (B into A): Rename into each ot
>                   test_must_fail git -c merge.directoryRenames=true merge -s recursive B^0 &&
>
>                   git ls-files -s >out &&
>   -               test_line_count = 5 out &&
>   +               test_line_count = 4 out &&
>
>                   git rev-parse >actual \
>   -                       :0:sub3/file :0:sub3/newfile :0:sub3/sub2/other \
>   +                       :0:sub3/file :0:sub3/sub2/other \
>                           :2:sub1/sub2/new_add_add_file \
>                           :3:sub1/sub2/new_add_add_file &&
>                   git rev-parse >expect \
>   -                       O:sub1/file  B:sub1/newfile O:sub2/other \
>   +                       O:sub1/file  O:sub2/other \
>                           A:sub2/new_add_add_file \
>                           B:sub1/sub2/new_add_add_file &&
>                   test_cmp expect actual &&
>
> I get:
>
>   .
>   |-- sub1
>   |   `-- sub2
>   |       |-- new_add_add_file
>   |       `-- other
>   `-- sub3
>       `-- file
>
> (Note the path to "other".) I haven't figured out what's going on,
> though.

Yeah, this is the result of having no directory rename due to having
no new files that need to be moved by a directory rename.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/3] t6423: add tests of dual directory rename plus add/add conflict
  2022-06-27 22:30   ` Calvin Wan
@ 2022-06-30  0:07     ` Elijah Newren
  0 siblings, 0 replies; 45+ messages in thread
From: Elijah Newren @ 2022-06-30  0:07 UTC (permalink / raw)
  To: Calvin Wan; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Mon, Jun 27, 2022 at 3:30 PM Calvin Wan <calvinwan@google.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > +# Testcase 12l, Both sides rename a directory into the other side, both add
> > +#   a file with after directory renames are the same filename
> > +#   Commit O: sub1/file,                 sub2/other
> > +#   Commit A: sub3/file,                 sub2/{other, new_add_add_file_1}
> > +#   Commit B: sub1/{file, newfile}, sub1/sub2/{other, new_add_add_file_2}
> > +#
> > +#   In words:
> > +#     A: sub1/ -> sub3/, add sub2/new_add_add_file_1
> > +#     B: sub2/ -> sub1/sub2, add sub1/newfile, add sub1/sub2/new_add_add_file_2
> > +#
> > +#   Expected: sub3/{file, newfile, sub2/other}
> > +#             CONFLICT (add/add): sub3/sub2/new_add_add_file
>
> Grammatically, I could not understand
>
> "both add a file with after directory renames are the same filename"

Oops, that should be "with" -> "which".

> I also found the same issue with `Expected` that Jonathan mentions. I ran the
> command separately and got
>
> CONFLICT (add/add): Merge conflict in sub1/sub2/new_add_add_file

Yeah, repeatedly revising stuff is fine as long as you remember to
update all the parts.  (I was swapping which directories were named
"sub1", "sub2" and "sub3" to see if lexicographic ordering might be
related to why this simpler test triggered a bug in the "recursive"
strategy but Glen's bigger testcase didn't.).  Anyway, I just missed
updating one of the locations while doing that revision; sorry about
that.  Will fix.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 0/3] Fix dual rename into each other plus conflicting adds
  2022-06-22  4:20 [PATCH 0/3] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-06-22  4:36 ` [PATCH 0/3] Fix dual rename into each other plus conflicting adds Elijah Newren
@ 2022-06-30  6:57 ` Elijah Newren via GitGitGadget
  2022-06-30  6:57   ` [PATCH v2 1/3] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
                     ` (3 more replies)
  4 siblings, 4 replies; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-06-30  6:57 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jonathan Tan, Calvin Wan, Elijah Newren

This series adds some testcases based on the tensorflow repository issue
reported by Glen Choo at [1], demonstrating bugs in both the ort and
recursive strategies. It also provides a fix for the ort strategy.

Changes since v1:

 * Fixed some wording issues in comments, and added a bit more details to
   one of the commit messages

[1]
https://lore.kernel.org/git/kl6lee006mle.fsf@chooglen-macbookpro.roam.corp.google.com/

Elijah Newren (3):
  t6423: add tests of dual directory rename plus add/add conflict
  merge-ort: shuffle the computation and cleanup of potential collisions
  merge-ort: fix issue with dual rename and add/add conflict

 merge-ort.c                         |  63 +++++++++++------
 t/t6423-merge-rename-directories.sh | 102 ++++++++++++++++++++++++++++
 2 files changed, 145 insertions(+), 20 deletions(-)


base-commit: e54793a95afeea1e10de1e5ad7eab914e7416250
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1268%2Fnewren%2Ffix-dual-rename-into-each-other-plus-conflicting-adds-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1268/newren/fix-dual-rename-into-each-other-plus-conflicting-adds-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1268

Range-diff vs v1:

 1:  69d62041843 ! 1:  bf4c03d01d5 t6423: add tests of dual directory rename plus add/add conflict
     @@ t/t6423-merge-rename-directories.sh: test_expect_success '12k: Directory rename
       '
       
      +# Testcase 12l, Both sides rename a directory into the other side, both add
     -+#   a file with after directory renames are the same filename
     ++#   a file which after directory renames are the same filename
      +#   Commit O: sub1/file,                 sub2/other
      +#   Commit A: sub3/file,                 sub2/{other, new_add_add_file_1}
      +#   Commit B: sub1/{file, newfile}, sub1/sub2/{other, new_add_add_file_2}
     @@ t/t6423-merge-rename-directories.sh: test_expect_success '12k: Directory rename
      +#     B: sub2/ -> sub1/sub2, add sub1/newfile, add sub1/sub2/new_add_add_file_2
      +#
      +#   Expected: sub3/{file, newfile, sub2/other}
     -+#             CONFLICT (add/add): sub3/sub2/new_add_add_file
     ++#             CONFLICT (add/add): sub1/sub2/new_add_add_file
      +
      +test_setup_12l () {
      +	test_create_repo 12l_$1 &&
 2:  d8c13e56209 = 2:  cfa38f01481 merge-ort: shuffle the computation and cleanup of potential collisions
 3:  bb2badccb71 ! 3:  da3ae38e390 merge-ort: fix issue with dual rename and add/add conflict
     @@ Commit message
          with it.  So, let's just turn off directory rename detection in this
          case as well.
      
     +    Another way to look at this is that if the source name involved in a
     +    directory rename on one side is the target name of a directory rename
     +    operation for a file from the other side, then we avoid the doubly
     +    transitive rename.  (More concretely, if a directory rename on side D
     +    wants to rename a file on side E from OLD_NAME -> NEW_NAME, and side D
     +    already had a file named NEW_NAME, and a directory rename on side E
     +    wants to rename side D's NEW_NAME -> NEWER_NAME, then we turn off the
     +    directory rename detection for NEW_NAME to prevent the
     +    NEW_NAME -> NEWER_NAME rename, and instead end up with an add/add
     +    conflict on NEW_NAME.)
     +
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## merge-ort.c ##
     @@ merge-ort.c: static char *check_for_directory_rename(struct merge_options *opt,
       	const char *new_dir;
      +	int other_side = 3 - side_index;
       
     -+	/* Cases where there is no new path, so we return NULL */
     ++	/*
     ++	 * Cases where we don't have or don't want a directory rename for
     ++	 * this path, so we return NULL.
     ++	 */
       	if (strmap_empty(dir_renames))
       		return new_path;
      +	if (strmap_get(&collisions[other_side], path))

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v2 1/3] t6423: add tests of dual directory rename plus add/add conflict
  2022-06-30  6:57 ` [PATCH v2 " Elijah Newren via GitGitGadget
@ 2022-06-30  6:57   ` Elijah Newren via GitGitGadget
  2022-06-30 10:21     ` Ævar Arnfjörð Bjarmason
  2022-06-30  6:57   ` [PATCH v2 2/3] merge-ort: shuffle the computation and cleanup of potential collisions Elijah Newren via GitGitGadget
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-06-30  6:57 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jonathan Tan, Calvin Wan, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

This is an attempt at minimalizing a testcase reported by Glen Choo
with tensorflow where merge-ort would report an assertion failure:

    Assertion failed: (ci->filemask == 2 || ci->filemask == 4), function apply_directory_rename_modifications, file merge-ort.c, line 2410

reversing the direction of the merge provides a different error:

    error: cache entry has null sha1: ...
    fatal: unable to write .git/index

so we add testcases for both.  With these new testcases, the
recursive strategy differs in that it returns the latter error for
both merge directions.

These testcases are somehow a little different than Glen's original
tensorflow testcase in that these ones trigger a bug with the recursive
algorithm whereas his testcase didn't.  I figure that means these
testcases somehow manage to be more comprehensive.

Reported-by: Glen Choo <chooglen@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t6423-merge-rename-directories.sh | 102 ++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)

diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh
index 479db32cd62..db13bb995f3 100755
--- a/t/t6423-merge-rename-directories.sh
+++ b/t/t6423-merge-rename-directories.sh
@@ -5199,6 +5199,108 @@ test_expect_success '12k: Directory rename with sibling causes rename-to-self' '
 	)
 '
 
+# Testcase 12l, Both sides rename a directory into the other side, both add
+#   a file which after directory renames are the same filename
+#   Commit O: sub1/file,                 sub2/other
+#   Commit A: sub3/file,                 sub2/{other, new_add_add_file_1}
+#   Commit B: sub1/{file, newfile}, sub1/sub2/{other, new_add_add_file_2}
+#
+#   In words:
+#     A: sub1/ -> sub3/, add sub2/new_add_add_file_1
+#     B: sub2/ -> sub1/sub2, add sub1/newfile, add sub1/sub2/new_add_add_file_2
+#
+#   Expected: sub3/{file, newfile, sub2/other}
+#             CONFLICT (add/add): sub1/sub2/new_add_add_file
+
+test_setup_12l () {
+	test_create_repo 12l_$1 &&
+	(
+		cd 12l_$1 &&
+
+		mkdir -p sub1 sub2
+		echo file >sub1/file &&
+		echo other >sub2/other &&
+		git add sub1 sub2 &&
+		git commit -m "O" &&
+
+		git branch O &&
+		git branch A &&
+		git branch B &&
+
+		git checkout A &&
+		git mv sub1 sub3 &&
+		echo conflicting >sub2/new_add_add_file &&
+		git add sub2 &&
+		test_tick &&
+		git add -u &&
+		git commit -m "A" &&
+
+		git checkout B &&
+		echo dissimilar >sub2/new_add_add_file &&
+		echo brand >sub1/newfile &&
+		git add sub1 sub2 &&
+		git mv sub2 sub1 &&
+		test_tick &&
+		git commit -m "B"
+	)
+}
+
+test_expect_merge_algorithm failure failure '12l (B into A): Rename into each other + add/add conflict' '
+	test_setup_12l BintoA &&
+	(
+		cd 12l_BintoA &&
+
+		git checkout -q A^0 &&
+
+		test_must_fail git -c merge.directoryRenames=true merge -s recursive B^0 &&
+
+		git ls-files -s >out &&
+		test_line_count = 5 out &&
+
+		git rev-parse >actual \
+			:0:sub3/file :0:sub3/newfile :0:sub3/sub2/other \
+			:2:sub1/sub2/new_add_add_file \
+			:3:sub1/sub2/new_add_add_file &&
+		git rev-parse >expect \
+			O:sub1/file  B:sub1/newfile O:sub2/other \
+			A:sub2/new_add_add_file \
+			B:sub1/sub2/new_add_add_file &&
+		test_cmp expect actual &&
+
+		git ls-files -o >actual &&
+		test_write_lines actual expect out >expect &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_merge_algorithm failure failure '12l (A into B): Rename into each other + add/add conflict' '
+	test_setup_12l AintoB &&
+	(
+		cd 12l_AintoB &&
+
+		git checkout -q B^0 &&
+
+		test_must_fail git -c merge.directoryRenames=true merge -s recursive A^0 &&
+
+		git ls-files -s >out &&
+		test_line_count = 5 out &&
+
+		git rev-parse >actual \
+			:0:sub3/file :0:sub3/newfile :0:sub3/sub2/other \
+			:2:sub1/sub2/new_add_add_file \
+			:3:sub1/sub2/new_add_add_file &&
+		git rev-parse >expect \
+			O:sub1/file  B:sub1/newfile O:sub2/other \
+			B:sub1/sub2/new_add_add_file \
+			A:sub2/new_add_add_file &&
+		test_cmp expect actual &&
+
+		git ls-files -o >actual &&
+		test_write_lines actual expect out >expect &&
+		test_cmp expect actual
+	)
+'
+
 ###########################################################################
 # SECTION 13: Checking informational and conflict messages
 #
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 2/3] merge-ort: shuffle the computation and cleanup of potential collisions
  2022-06-30  6:57 ` [PATCH v2 " Elijah Newren via GitGitGadget
  2022-06-30  6:57   ` [PATCH v2 1/3] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
@ 2022-06-30  6:57   ` Elijah Newren via GitGitGadget
  2022-06-30 10:28     ` Ævar Arnfjörð Bjarmason
  2022-06-30  6:57   ` [PATCH v2 3/3] merge-ort: fix issue with dual rename and add/add conflict Elijah Newren via GitGitGadget
  2022-07-01  5:23   ` [PATCH v3 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
  3 siblings, 1 reply; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-06-30  6:57 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jonathan Tan, Calvin Wan, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Run compute_collisions() for renames on both sides of history before
any calls to collect_renames(), and do not free the computed collisions
until after both calls to collect_renames().  This is just a code
reorganization at this point that doesn't make sense on its own, but
will permit us to use the computed collision info from both sides
within each call to collect_renames() in a subsequent commit.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 56 ++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 36 insertions(+), 20 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 8545354dafd..fa6667de18c 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2259,6 +2259,27 @@ static void compute_collisions(struct strmap *collisions,
 	}
 }
 
+static void free_collisions(struct strmap *collisions)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
+
+	/* Free each value in the collisions map */
+	strmap_for_each_entry(collisions, &iter, entry) {
+		struct collision_info *info = entry->value;
+		string_list_clear(&info->source_files, 0);
+	}
+	/*
+	 * In compute_collisions(), we set collisions.strdup_strings to 0
+	 * so that we wouldn't have to make another copy of the new_path
+	 * allocated by apply_dir_rename().  But now that we've used them
+	 * and have no other references to these strings, it is time to
+	 * deallocate them.
+	 */
+	free_strmap_strings(collisions);
+	strmap_clear(collisions, 1);
+}
+
 static char *check_for_directory_rename(struct merge_options *opt,
 					const char *path,
 					unsigned side_index,
@@ -2314,7 +2335,8 @@ static char *check_for_directory_rename(struct merge_options *opt,
 	}
 
 	new_path = handle_path_level_conflicts(opt, path, side_index,
-					       rename_info, collisions);
+					       rename_info,
+					       &collisions[side_index]);
 	*clean_merge &= (new_path != NULL);
 
 	return new_path;
@@ -3023,18 +3045,15 @@ static int detect_regular_renames(struct merge_options *opt,
 static int collect_renames(struct merge_options *opt,
 			   struct diff_queue_struct *result,
 			   unsigned side_index,
+			   struct strmap *collisions,
 			   struct strmap *dir_renames_for_side,
 			   struct strmap *rename_exclusions)
 {
 	int i, clean = 1;
-	struct strmap collisions;
 	struct diff_queue_struct *side_pairs;
-	struct hashmap_iter iter;
-	struct strmap_entry *entry;
 	struct rename_info *renames = &opt->priv->renames;
 
 	side_pairs = &renames->pairs[side_index];
-	compute_collisions(&collisions, dir_renames_for_side, side_pairs);
 
 	for (i = 0; i < side_pairs->nr; ++i) {
 		struct diff_filepair *p = side_pairs->queue[i];
@@ -3050,7 +3069,7 @@ static int collect_renames(struct merge_options *opt,
 						      side_index,
 						      dir_renames_for_side,
 						      rename_exclusions,
-						      &collisions,
+						      collisions,
 						      &clean);
 
 		possibly_cache_new_pair(renames, p, side_index, new_path);
@@ -3076,20 +3095,6 @@ static int collect_renames(struct merge_options *opt,
 		result->queue[result->nr++] = p;
 	}
 
-	/* Free each value in the collisions map */
-	strmap_for_each_entry(&collisions, &iter, entry) {
-		struct collision_info *info = entry->value;
-		string_list_clear(&info->source_files, 0);
-	}
-	/*
-	 * In compute_collisions(), we set collisions.strdup_strings to 0
-	 * so that we wouldn't have to make another copy of the new_path
-	 * allocated by apply_dir_rename().  But now that we've used them
-	 * and have no other references to these strings, it is time to
-	 * deallocate them.
-	 */
-	free_strmap_strings(&collisions);
-	strmap_clear(&collisions, 1);
 	return clean;
 }
 
@@ -3100,6 +3105,7 @@ static int detect_and_process_renames(struct merge_options *opt,
 {
 	struct diff_queue_struct combined = { 0 };
 	struct rename_info *renames = &opt->priv->renames;
+	struct strmap collisions[3];
 	int need_dir_renames, s, i, clean = 1;
 	unsigned detection_run = 0;
 
@@ -3149,12 +3155,22 @@ static int detect_and_process_renames(struct merge_options *opt,
 	ALLOC_GROW(combined.queue,
 		   renames->pairs[1].nr + renames->pairs[2].nr,
 		   combined.alloc);
+	for (int i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
+		int other_side = 3 - i;
+		compute_collisions(&collisions[i],
+				   &renames->dir_renames[other_side],
+				   &renames->pairs[i]);
+	}
 	clean &= collect_renames(opt, &combined, MERGE_SIDE1,
+				 collisions,
 				 &renames->dir_renames[2],
 				 &renames->dir_renames[1]);
 	clean &= collect_renames(opt, &combined, MERGE_SIDE2,
+				 collisions,
 				 &renames->dir_renames[1],
 				 &renames->dir_renames[2]);
+	for (int i = MERGE_SIDE1; i <= MERGE_SIDE2; i++)
+		free_collisions(&collisions[i]);
 	STABLE_QSORT(combined.queue, combined.nr, compare_pairs);
 	trace2_region_leave("merge", "directory renames", opt->repo);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v2 3/3] merge-ort: fix issue with dual rename and add/add conflict
  2022-06-30  6:57 ` [PATCH v2 " Elijah Newren via GitGitGadget
  2022-06-30  6:57   ` [PATCH v2 1/3] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
  2022-06-30  6:57   ` [PATCH v2 2/3] merge-ort: shuffle the computation and cleanup of potential collisions Elijah Newren via GitGitGadget
@ 2022-06-30  6:57   ` Elijah Newren via GitGitGadget
  2022-06-30 10:31     ` Ævar Arnfjörð Bjarmason
  2022-07-01  5:23   ` [PATCH v3 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
  3 siblings, 1 reply; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-06-30  6:57 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Jonathan Tan, Calvin Wan, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

There is code in both merge-recursive and merge-ort for avoiding doubly
transitive renames (i.e. one side renames directory A/ -> B/, and the
other side renames directory B/ -> C/), because this combination would
otherwise make a mess for new files added to A/ on the first side and
wondering which directory they end up in -- especially if there were
even more renames such as the first side renaming C/ -> D/.  In such
cases, it just turns "off" directory rename detection for the higher
order transitive cases.

The testcases added in t6423 a couple commits ago are slightly different
but similar in principle.  They involve a similar case of paired
renaming but instead of A/ -> B/ and B/ -> C/, the second side renames
a leading directory of B/ to C/.  And both sides add a new file
somewhere under the directory that the other side will rename.  While
the new files added start within different directories and thus could
logically end up within different directories, it is weird for a file
on one side to end up where the other one started and not move along
with it.  So, let's just turn off directory rename detection in this
case as well.

Another way to look at this is that if the source name involved in a
directory rename on one side is the target name of a directory rename
operation for a file from the other side, then we avoid the doubly
transitive rename.  (More concretely, if a directory rename on side D
wants to rename a file on side E from OLD_NAME -> NEW_NAME, and side D
already had a file named NEW_NAME, and a directory rename on side E
wants to rename side D's NEW_NAME -> NEWER_NAME, then we turn off the
directory rename detection for NEW_NAME to prevent the
NEW_NAME -> NEWER_NAME rename, and instead end up with an add/add
conflict on NEW_NAME.)

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c                         | 7 +++++++
 t/t6423-merge-rename-directories.sh | 4 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index fa6667de18c..17db4c30e5b 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2292,9 +2292,16 @@ static char *check_for_directory_rename(struct merge_options *opt,
 	struct strmap_entry *rename_info;
 	struct strmap_entry *otherinfo = NULL;
 	const char *new_dir;
+	int other_side = 3 - side_index;
 
+	/*
+	 * Cases where we don't have or don't want a directory rename for
+	 * this path, so we return NULL.
+	 */
 	if (strmap_empty(dir_renames))
 		return new_path;
+	if (strmap_get(&collisions[other_side], path))
+		return new_path;
 	rename_info = check_dir_renamed(path, dir_renames);
 	if (!rename_info)
 		return new_path;
diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh
index db13bb995f3..1300a01206a 100755
--- a/t/t6423-merge-rename-directories.sh
+++ b/t/t6423-merge-rename-directories.sh
@@ -5245,7 +5245,7 @@ test_setup_12l () {
 	)
 }
 
-test_expect_merge_algorithm failure failure '12l (B into A): Rename into each other + add/add conflict' '
+test_expect_merge_algorithm failure success '12l (B into A): Rename into each other + add/add conflict' '
 	test_setup_12l BintoA &&
 	(
 		cd 12l_BintoA &&
@@ -5273,7 +5273,7 @@ test_expect_merge_algorithm failure failure '12l (B into A): Rename into each ot
 	)
 '
 
-test_expect_merge_algorithm failure failure '12l (A into B): Rename into each other + add/add conflict' '
+test_expect_merge_algorithm failure success '12l (A into B): Rename into each other + add/add conflict' '
 	test_setup_12l AintoB &&
 	(
 		cd 12l_AintoB &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/3] t6423: add tests of dual directory rename plus add/add conflict
  2022-06-30  6:57   ` [PATCH v2 1/3] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
@ 2022-06-30 10:21     ` Ævar Arnfjörð Bjarmason
  2022-07-01  2:57       ` Elijah Newren
  0 siblings, 1 reply; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-06-30 10:21 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Jonathan Tan, Calvin Wan, Elijah Newren


On Thu, Jun 30 2022, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>

> +test_setup_12l () {
> +	test_create_repo 12l_$1 &&

Can & should just be "git init", see f0d4d398e28 (test-lib: split up and
deprecate test_create_repo(), 2021-05-10).


> +	(
> +		cd 12l_$1 &&
> +
> +		mkdir -p sub1 sub2

The "-p" here isn't needed, you're not creating recursive directories. 

> +		git ls-files -s >out &&
> +		test_line_count = 5 out &&

Can't this just use test_stdout_line_count? Except...

> +
> +		git rev-parse >actual \
> +			:0:sub3/file :0:sub3/newfile :0:sub3/sub2/other \
> +			:2:sub1/sub2/new_add_add_file \
> +			:3:sub1/sub2/new_add_add_file &&
> +		git rev-parse >expect \
> +			O:sub1/file  B:sub1/newfile O:sub2/other \
> +			A:sub2/new_add_add_file \
> +			B:sub1/sub2/new_add_add_file &&
> +		test_cmp expect actual &&
> +
> +		git ls-files -o >actual &&
> +		test_write_lines actual expect out >expect &&
> +		test_cmp expect actual

This test seems a bit odd, here we're just checking that we've created
scratch files for the internal use of our test, why is it important that
we have an "out" file, when the only reason we have it is because we
needed it for checking the "ls-files" line count above?

> +	)
> +'
> +
> +test_expect_merge_algorithm failure failure '12l (A into B): Rename into each other + add/add conflict' '
> +	test_setup_12l AintoB &&
> +	(
> +		cd 12l_AintoB &&
> +
> +		git checkout -q B^0 &&
> +
> +		test_must_fail git -c merge.directoryRenames=true merge -s recursive A^0 &&
> +
> +		git ls-files -s >out &&
> +		test_line_count = 5 out &&

ditto.

> +		git rev-parse >actual \
> +			:0:sub3/file :0:sub3/newfile :0:sub3/sub2/other \
> +			:2:sub1/sub2/new_add_add_file \
> +			:3:sub1/sub2/new_add_add_file &&
> +		git rev-parse >expect \
> +			O:sub1/file  B:sub1/newfile O:sub2/other \
> +			B:sub1/sub2/new_add_add_file \
> +			A:sub2/new_add_add_file &&
> +		test_cmp expect actual &&
> +
> +		git ls-files -o >actual &&
> +		test_write_lines actual expect out >expect &&

ditto

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/3] merge-ort: shuffle the computation and cleanup of potential collisions
  2022-06-30  6:57   ` [PATCH v2 2/3] merge-ort: shuffle the computation and cleanup of potential collisions Elijah Newren via GitGitGadget
@ 2022-06-30 10:28     ` Ævar Arnfjörð Bjarmason
  2022-07-01  3:02       ` Elijah Newren
  0 siblings, 1 reply; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-06-30 10:28 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Jonathan Tan, Calvin Wan, Elijah Newren


On Thu, Jun 30 2022, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
>
> Run compute_collisions() for renames on both sides of history before
> any calls to collect_renames(), and do not free the computed collisions
> until after both calls to collect_renames().  This is just a code
> reorganization at this point that doesn't make sense on its own, but
> will permit us to use the computed collision info from both sides
> within each call to collect_renames() in a subsequent commit.

I think this would be easier to follow if split in two, since...

> +static void free_collisions(struct strmap *collisions)
> +{
> +	struct hashmap_iter iter;
> +	struct strmap_entry *entry;
> +
> +	/* Free each value in the collisions map */
> +	strmap_for_each_entry(collisions, &iter, entry) {
> +		struct collision_info *info = entry->value;
> +		string_list_clear(&info->source_files, 0);
> +	}
> +	/*
> +	 * In compute_collisions(), we set collisions.strdup_strings to 0
> +	 * so that we wouldn't have to make another copy of the new_path
> +	 * allocated by apply_dir_rename().  But now that we've used them
> +	 * and have no other references to these strings, it is time to
> +	 * deallocate them.
> +	 */
> +	free_strmap_strings(collisions);
> +	strmap_clear(collisions, 1);
> +}

...a large part of it...
>  
> -	/* Free each value in the collisions map */
> -	strmap_for_each_entry(&collisions, &iter, entry) {
> -		struct collision_info *info = entry->value;
> -		string_list_clear(&info->source_files, 0);
> -	}
> -	/*
> -	 * In compute_collisions(), we set collisions.strdup_strings to 0
> -	 * so that we wouldn't have to make another copy of the new_path
> -	 * allocated by apply_dir_rename().  But now that we've used them
> -	 * and have no other references to these strings, it is time to
> -	 * deallocate them.
> -	 */
> -	free_strmap_strings(&collisions);
> -	strmap_clear(&collisions, 1);

..is moving existing code into a utility function...

>  	return clean;
>  }
>  
> @@ -3100,6 +3105,7 @@ static int detect_and_process_renames(struct merge_options *opt,
>  {
>  	struct diff_queue_struct combined = { 0 };
>  	struct rename_info *renames = &opt->priv->renames;
> +	struct strmap collisions[3];
>  	int need_dir_renames, s, i, clean = 1;
>  	unsigned detection_run = 0;
>  
> @@ -3149,12 +3155,22 @@ static int detect_and_process_renames(struct merge_options *opt,
>  	ALLOC_GROW(combined.queue,
>  		   renames->pairs[1].nr + renames->pairs[2].nr,
>  		   combined.alloc);
> +	for (int i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
> +		int other_side = 3 - i;
> +		compute_collisions(&collisions[i],
> +				   &renames->dir_renames[other_side],
> +				   &renames->pairs[i]);
> +	}
>  	clean &= collect_renames(opt, &combined, MERGE_SIDE1,
> +				 collisions,
>  				 &renames->dir_renames[2],
>  				 &renames->dir_renames[1]);
>  	clean &= collect_renames(opt, &combined, MERGE_SIDE2,
> +				 collisions,
>  				 &renames->dir_renames[1],
>  				 &renames->dir_renames[2]);
> +	for (int i = MERGE_SIDE1; i <= MERGE_SIDE2; i++)
> +		free_collisions(&collisions[i]);
>  	STABLE_QSORT(combined.queue, combined.nr, compare_pairs);
>  	trace2_region_leave("merge", "directory renames", opt->repo);

...which when looking at it makes the functional change harder to follow
than it otherwise would be.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] merge-ort: fix issue with dual rename and add/add conflict
  2022-06-30  6:57   ` [PATCH v2 3/3] merge-ort: fix issue with dual rename and add/add conflict Elijah Newren via GitGitGadget
@ 2022-06-30 10:31     ` Ævar Arnfjörð Bjarmason
  2022-07-01  3:03       ` Elijah Newren
  0 siblings, 1 reply; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-06-30 10:31 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Jonathan Tan, Calvin Wan, Elijah Newren


On Thu, Jun 30 2022, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
>
> There is code in both merge-recursive and merge-ort for avoiding doubly
> transitive renames (i.e. one side renames directory A/ -> B/, and the
> other side renames directory B/ -> C/), because this combination would
> otherwise make a mess for new files added to A/ on the first side and
> wondering which directory they end up in -- especially if there were
> even more renames such as the first side renaming C/ -> D/.  In such
> cases, it just turns "off" directory rename detection for the higher
> order transitive cases.
>
> The testcases added in t6423 a couple commits ago are slightly different
> but similar in principle.  They involve a similar case of paired
> renaming but instead of A/ -> B/ and B/ -> C/, the second side renames
> a leading directory of B/ to C/.  And both sides add a new file
> somewhere under the directory that the other side will rename.  While
> the new files added start within different directories and thus could
> logically end up within different directories, it is weird for a file
> on one side to end up where the other one started and not move along
> with it.  So, let's just turn off directory rename detection in this
> case as well.
>
> Another way to look at this is that if the source name involved in a
> directory rename on one side is the target name of a directory rename
> operation for a file from the other side, then we avoid the doubly
> transitive rename.  (More concretely, if a directory rename on side D
> wants to rename a file on side E from OLD_NAME -> NEW_NAME, and side D
> already had a file named NEW_NAME, and a directory rename on side E
> wants to rename side D's NEW_NAME -> NEWER_NAME, then we turn off the
> directory rename detection for NEW_NAME to prevent the
> NEW_NAME -> NEWER_NAME rename, and instead end up with an add/add
> conflict on NEW_NAME.)
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  merge-ort.c                         | 7 +++++++
>  t/t6423-merge-rename-directories.sh | 4 ++--
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/merge-ort.c b/merge-ort.c
> index fa6667de18c..17db4c30e5b 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -2292,9 +2292,16 @@ static char *check_for_directory_rename(struct merge_options *opt,
>  	struct strmap_entry *rename_info;
>  	struct strmap_entry *otherinfo = NULL;
>  	const char *new_dir;
> +	int other_side = 3 - side_index;
>  
> +	/*
> +	 * Cases where we don't have or don't want a directory rename for
> +	 * this path, so we return NULL.
> +	 */
>  	if (strmap_empty(dir_renames))
>  		return new_path;
> +	if (strmap_get(&collisions[other_side], path))
> +		return new_path;

I realize from looking at merge-recursive.c that this is carrying
forward some legacy debt, but I find this code and the need for a
comment more complex than it needs to. On top of master this will work
just as well:
	
	diff --git a/merge-ort.c b/merge-ort.c
	index b5015b9afd4..f5a02b1ff6f 100644
	--- a/merge-ort.c
	+++ b/merge-ort.c
	@@ -2268,16 +2268,16 @@ static char *check_for_directory_rename(struct merge_options *opt,
	 					struct strmap *collisions,
	 					int *clean_merge)
	 {
	-	char *new_path = NULL;
	+	char *new_path;
	 	struct strmap_entry *rename_info;
	 	struct strmap_entry *otherinfo = NULL;
	 	const char *new_dir;
	 
	 	if (strmap_empty(dir_renames))
	-		return new_path;
	+		return NULL;
	 	rename_info = check_dir_renamed(path, dir_renames);
	 	if (!rename_info)
	-		return new_path;
	+		return NULL;
	 	/* old_dir = rename_info->key; */
	 	new_dir = rename_info->value;

I.e. we're really just making the reader squint to see that we're
actually returning NULL here, it's only later that we have an actual
"new path".

Wouldn't sticking that earlier in this series be an improvement? The
reason you need to explain "so we return NULL" is because we're carrying
forward this oddity, but if we just don't initialize it and return NULL
instead...

If you want to keep this pretty much as-is I think you should add a \n
before the (not seen in your context) "old_dir" comment seen in the
context here above, to make it visually clear that your new comment is
referring to these chains of returns. That could also be made clearer
with (again, on top of master, and could be combined with the above):
	
	diff --git a/merge-ort.c b/merge-ort.c
	index b5015b9afd4..a418f81a3eb 100644
	--- a/merge-ort.c
	+++ b/merge-ort.c
	@@ -2278,8 +2278,6 @@ static char *check_for_directory_rename(struct merge_options *opt,
	 	rename_info = check_dir_renamed(path, dir_renames);
	 	if (!rename_info)
	 		return new_path;
	-	/* old_dir = rename_info->key; */
	-	new_dir = rename_info->value;
	 
	 	/*
	 	 * This next part is a little weird.  We do not want to do an
	@@ -2305,6 +2303,7 @@ static char *check_for_directory_rename(struct merge_options *opt,
	 	 * As it turns out, this also prevents N-way transient rename
	 	 * confusion; See testcases 9c and 9d of t6043.
	 	 */
	+	new_dir = rename_info->value; /* old_dir = rename_info->key; */
	 	otherinfo = strmap_get_entry(dir_rename_exclusions, new_dir);
	 	if (otherinfo) {
	 		path_msg(opt, rename_info->key, 1,

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/3] t6423: add tests of dual directory rename plus add/add conflict
  2022-06-30  0:06     ` Elijah Newren
@ 2022-06-30 22:32       ` Jonathan Tan
  2022-07-01  2:57         ` Elijah Newren
  0 siblings, 1 reply; 45+ messages in thread
From: Jonathan Tan @ 2022-06-30 22:32 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Jonathan Tan, Elijah Newren via GitGitGadget, Git Mailing List

Elijah Newren <newren@gmail.com> writes:
> Ah, I can see you've looked at this very closely.  Thanks for digging
> in!   I know it's counter-intuitive at first, but the file is
> necessary in order to get the sub1/ -> sub3/ rename.  The reasoning is
> as follows: We don't need to detect a directory rename for a directory
> where the other side added no new files into that directory...because
> the whole point of directory renames is to move new files in a
> directory to the new location.  Therefore, no new files in the
> directory on one side, means no need to detect a directory rename for
> it on the other side.  For a deeper discussion of this, see commit
> c64432aacd (t6423: more involved rules for renaming directories into
> each other, 2020-10-15).

Thanks! This makes sense. Might be worth including as a comment
(explaining why "newfile" is there) in the test.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/3] t6423: add tests of dual directory rename plus add/add conflict
  2022-06-30 10:21     ` Ævar Arnfjörð Bjarmason
@ 2022-07-01  2:57       ` Elijah Newren
  2022-07-01  9:29         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 45+ messages in thread
From: Elijah Newren @ 2022-07-01  2:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Jonathan Tan,
	Calvin Wan

On Thu, Jun 30, 2022 at 3:26 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Thu, Jun 30 2022, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
>
> > +test_setup_12l () {
> > +     test_create_repo 12l_$1 &&
>
> Can & should just be "git init", see f0d4d398e28 (test-lib: split up and
> deprecate test_create_repo(), 2021-05-10).

I've switched to "git init" and even encouraged others to do the same
in other test scripts, but since literally every other test in this
file uses test_create_repo, I think they should all be converted at
once and just be consistent here.  But, so we can stop having this
conversation, after this series lands, I'll send one in to update the
various merge testfiles that make heavy use of test_create_repo and
convert them over.

> > +     (
> > +             cd 12l_$1 &&
> > +
> > +             mkdir -p sub1 sub2
>
> The "-p" here isn't needed, you're not creating recursive directories.

Indeed; will fix.

> > +             git ls-files -s >out &&
> > +             test_line_count = 5 out &&
>
> Can't this just use test_stdout_line_count? Except...

Ooh, new function from late last year that I was unaware of.  Yeah,
I'm happy to start using it.

> > +
> > +             git rev-parse >actual \
> > +                     :0:sub3/file :0:sub3/newfile :0:sub3/sub2/other \
> > +                     :2:sub1/sub2/new_add_add_file \
> > +                     :3:sub1/sub2/new_add_add_file &&
> > +             git rev-parse >expect \
> > +                     O:sub1/file  B:sub1/newfile O:sub2/other \
> > +                     A:sub2/new_add_add_file \
> > +                     B:sub1/sub2/new_add_add_file &&
> > +             test_cmp expect actual &&
> > +
> > +             git ls-files -o >actual &&
> > +             test_write_lines actual expect out >expect &&
> > +             test_cmp expect actual
>
> This test seems a bit odd, here we're just checking that we've created
> scratch files for the internal use of our test, why is it important that
> we have an "out" file, when the only reason we have it is because we
> needed it for checking the "ls-files" line count above?

Nah, you've misunderstood the purpose of the check.  It isn't "make
sure that these untracked files are present among whatever else might
also be present", it's "make sure the merge step did not introduce
*any* untracked files" (something the recursive backend periodically
did, and they weren't cruft untracked files but stored actual merge
results).  There wasn't a nice easy check for that, the closest was to
translate the requirement to "make sure the only untracked files are
the ones explicitly added by this test script", which is the check you
see here.  I don't actually care about "actual", "expect", or "out", I
just care that there aren't any _other_ untracked files.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/3] t6423: add tests of dual directory rename plus add/add conflict
  2022-06-30 22:32       ` Jonathan Tan
@ 2022-07-01  2:57         ` Elijah Newren
  0 siblings, 0 replies; 45+ messages in thread
From: Elijah Newren @ 2022-07-01  2:57 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Thu, Jun 30, 2022 at 3:32 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
> > Ah, I can see you've looked at this very closely.  Thanks for digging
> > in!   I know it's counter-intuitive at first, but the file is
> > necessary in order to get the sub1/ -> sub3/ rename.  The reasoning is
> > as follows: We don't need to detect a directory rename for a directory
> > where the other side added no new files into that directory...because
> > the whole point of directory renames is to move new files in a
> > directory to the new location.  Therefore, no new files in the
> > directory on one side, means no need to detect a directory rename for
> > it on the other side.  For a deeper discussion of this, see commit
> > c64432aacd (t6423: more involved rules for renaming directories into
> > each other, 2020-10-15).
>
> Thanks! This makes sense. Might be worth including as a comment
> (explaining why "newfile" is there) in the test.

Sure, will do.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 2/3] merge-ort: shuffle the computation and cleanup of potential collisions
  2022-06-30 10:28     ` Ævar Arnfjörð Bjarmason
@ 2022-07-01  3:02       ` Elijah Newren
  0 siblings, 0 replies; 45+ messages in thread
From: Elijah Newren @ 2022-07-01  3:02 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Jonathan Tan,
	Calvin Wan

On Thu, Jun 30, 2022 at 3:29 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Thu, Jun 30 2022, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > Run compute_collisions() for renames on both sides of history before
> > any calls to collect_renames(), and do not free the computed collisions
> > until after both calls to collect_renames().  This is just a code
> > reorganization at this point that doesn't make sense on its own, but
> > will permit us to use the computed collision info from both sides
> > within each call to collect_renames() in a subsequent commit.
>
> I think this would be easier to follow if split in two, since...
>
> > +static void free_collisions(struct strmap *collisions)
> > +{
> > +     struct hashmap_iter iter;
> > +     struct strmap_entry *entry;
> > +
> > +     /* Free each value in the collisions map */
> > +     strmap_for_each_entry(collisions, &iter, entry) {
> > +             struct collision_info *info = entry->value;
> > +             string_list_clear(&info->source_files, 0);
> > +     }
> > +     /*
> > +      * In compute_collisions(), we set collisions.strdup_strings to 0
> > +      * so that we wouldn't have to make another copy of the new_path
> > +      * allocated by apply_dir_rename().  But now that we've used them
> > +      * and have no other references to these strings, it is time to
> > +      * deallocate them.
> > +      */
> > +     free_strmap_strings(collisions);
> > +     strmap_clear(collisions, 1);
> > +}
>
> ...a large part of it...
> >
> > -     /* Free each value in the collisions map */
> > -     strmap_for_each_entry(&collisions, &iter, entry) {
> > -             struct collision_info *info = entry->value;
> > -             string_list_clear(&info->source_files, 0);
> > -     }
> > -     /*
> > -      * In compute_collisions(), we set collisions.strdup_strings to 0
> > -      * so that we wouldn't have to make another copy of the new_path
> > -      * allocated by apply_dir_rename().  But now that we've used them
> > -      * and have no other references to these strings, it is time to
> > -      * deallocate them.
> > -      */
> > -     free_strmap_strings(&collisions);
> > -     strmap_clear(&collisions, 1);
>
> ..is moving existing code into a utility function...
>
> >       return clean;
> >  }
> >
> > @@ -3100,6 +3105,7 @@ static int detect_and_process_renames(struct merge_options *opt,
> >  {
> >       struct diff_queue_struct combined = { 0 };
> >       struct rename_info *renames = &opt->priv->renames;
> > +     struct strmap collisions[3];
> >       int need_dir_renames, s, i, clean = 1;
> >       unsigned detection_run = 0;
> >
> > @@ -3149,12 +3155,22 @@ static int detect_and_process_renames(struct merge_options *opt,
> >       ALLOC_GROW(combined.queue,
> >                  renames->pairs[1].nr + renames->pairs[2].nr,
> >                  combined.alloc);
> > +     for (int i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
> > +             int other_side = 3 - i;
> > +             compute_collisions(&collisions[i],
> > +                                &renames->dir_renames[other_side],
> > +                                &renames->pairs[i]);
> > +     }
> >       clean &= collect_renames(opt, &combined, MERGE_SIDE1,
> > +                              collisions,
> >                                &renames->dir_renames[2],
> >                                &renames->dir_renames[1]);
> >       clean &= collect_renames(opt, &combined, MERGE_SIDE2,
> > +                              collisions,
> >                                &renames->dir_renames[1],
> >                                &renames->dir_renames[2]);
> > +     for (int i = MERGE_SIDE1; i <= MERGE_SIDE2; i++)
> > +             free_collisions(&collisions[i]);
> >       STABLE_QSORT(combined.queue, combined.nr, compare_pairs);
> >       trace2_region_leave("merge", "directory renames", opt->repo);
>
> ...which when looking at it makes the functional change harder to follow
> than it otherwise would be.

Good point; will split.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 3/3] merge-ort: fix issue with dual rename and add/add conflict
  2022-06-30 10:31     ` Ævar Arnfjörð Bjarmason
@ 2022-07-01  3:03       ` Elijah Newren
  0 siblings, 0 replies; 45+ messages in thread
From: Elijah Newren @ 2022-07-01  3:03 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Jonathan Tan,
	Calvin Wan

On Thu, Jun 30, 2022 at 3:41 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Thu, Jun 30 2022, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > There is code in both merge-recursive and merge-ort for avoiding doubly
> > transitive renames (i.e. one side renames directory A/ -> B/, and the
> > other side renames directory B/ -> C/), because this combination would
> > otherwise make a mess for new files added to A/ on the first side and
> > wondering which directory they end up in -- especially if there were
> > even more renames such as the first side renaming C/ -> D/.  In such
> > cases, it just turns "off" directory rename detection for the higher
> > order transitive cases.
> >
> > The testcases added in t6423 a couple commits ago are slightly different
> > but similar in principle.  They involve a similar case of paired
> > renaming but instead of A/ -> B/ and B/ -> C/, the second side renames
> > a leading directory of B/ to C/.  And both sides add a new file
> > somewhere under the directory that the other side will rename.  While
> > the new files added start within different directories and thus could
> > logically end up within different directories, it is weird for a file
> > on one side to end up where the other one started and not move along
> > with it.  So, let's just turn off directory rename detection in this
> > case as well.
> >
> > Another way to look at this is that if the source name involved in a
> > directory rename on one side is the target name of a directory rename
> > operation for a file from the other side, then we avoid the doubly
> > transitive rename.  (More concretely, if a directory rename on side D
> > wants to rename a file on side E from OLD_NAME -> NEW_NAME, and side D
> > already had a file named NEW_NAME, and a directory rename on side E
> > wants to rename side D's NEW_NAME -> NEWER_NAME, then we turn off the
> > directory rename detection for NEW_NAME to prevent the
> > NEW_NAME -> NEWER_NAME rename, and instead end up with an add/add
> > conflict on NEW_NAME.)
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  merge-ort.c                         | 7 +++++++
> >  t/t6423-merge-rename-directories.sh | 4 ++--
> >  2 files changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/merge-ort.c b/merge-ort.c
> > index fa6667de18c..17db4c30e5b 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -2292,9 +2292,16 @@ static char *check_for_directory_rename(struct merge_options *opt,
> >       struct strmap_entry *rename_info;
> >       struct strmap_entry *otherinfo = NULL;
> >       const char *new_dir;
> > +     int other_side = 3 - side_index;
> >
> > +     /*
> > +      * Cases where we don't have or don't want a directory rename for
> > +      * this path, so we return NULL.
> > +      */
> >       if (strmap_empty(dir_renames))
> >               return new_path;
> > +     if (strmap_get(&collisions[other_side], path))
> > +             return new_path;
>
> I realize from looking at merge-recursive.c that this is carrying
> forward some legacy debt

...which was introduced by me as well...

> , but I find this code and the need for a
> comment more complex than it needs to. On top of master this will work
> just as well:
>
>         diff --git a/merge-ort.c b/merge-ort.c
>         index b5015b9afd4..f5a02b1ff6f 100644
>         --- a/merge-ort.c
>         +++ b/merge-ort.c
>         @@ -2268,16 +2268,16 @@ static char *check_for_directory_rename(struct merge_options *opt,
>                                                 struct strmap *collisions,
>                                                 int *clean_merge)
>          {
>         -       char *new_path = NULL;
>         +       char *new_path;
>                 struct strmap_entry *rename_info;
>                 struct strmap_entry *otherinfo = NULL;
>                 const char *new_dir;
>
>                 if (strmap_empty(dir_renames))
>         -               return new_path;
>         +               return NULL;
>                 rename_info = check_dir_renamed(path, dir_renames);
>                 if (!rename_info)
>         -               return new_path;
>         +               return NULL;
>                 /* old_dir = rename_info->key; */
>                 new_dir = rename_info->value;

You know, while making this patch series I was asking myself, "Why
didn't I just explicitly return NULL here?"  I don't know why I did it
this way, but it now seems slightly odd to me too.  I decided to not
fix it up and just provide a more targeted fix, but since it tripped
you up, I'm happy to add this as a preparatory cleanup.

> I.e. we're really just making the reader squint to see that we're
> actually returning NULL here, it's only later that we have an actual
> "new path".
>
> Wouldn't sticking that earlier in this series be an improvement? The
> reason you need to explain "so we return NULL" is because we're carrying
> forward this oddity, but if we just don't initialize it and return NULL
> instead...

I still think the comment makes sense to add, other than the "so we
return NULL" bit, even after this change.  But yeah, we can strike the
"so we return NULL" part of it after this cleanup.

> If you want to keep this pretty much as-is I think you should add a \n
> before the (not seen in your context) "old_dir" comment seen in the
> context here above, to make it visually clear that your new comment is
> referring to these chains of returns. That could also be made clearer
> with (again, on top of master, and could be combined with the above):
>
>         diff --git a/merge-ort.c b/merge-ort.c
>         index b5015b9afd4..a418f81a3eb 100644
>         --- a/merge-ort.c
>         +++ b/merge-ort.c
>         @@ -2278,8 +2278,6 @@ static char *check_for_directory_rename(struct merge_options *opt,
>                 rename_info = check_dir_renamed(path, dir_renames);
>                 if (!rename_info)
>                         return new_path;
>         -       /* old_dir = rename_info->key; */
>         -       new_dir = rename_info->value;
>
>                 /*
>                  * This next part is a little weird.  We do not want to do an
>         @@ -2305,6 +2303,7 @@ static char *check_for_directory_rename(struct merge_options *opt,
>                  * As it turns out, this also prevents N-way transient rename
>                  * confusion; See testcases 9c and 9d of t6043.
>                  */
>         +       new_dir = rename_info->value; /* old_dir = rename_info->key; */
>                 otherinfo = strmap_get_entry(dir_rename_exclusions, new_dir);
>                 if (otherinfo) {
>                         path_msg(opt, rename_info->key, 1,

Sure, seems fine.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v3 0/5] Fix dual rename into each other plus conflicting adds
  2022-06-30  6:57 ` [PATCH v2 " Elijah Newren via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-06-30  6:57   ` [PATCH v2 3/3] merge-ort: fix issue with dual rename and add/add conflict Elijah Newren via GitGitGadget
@ 2022-07-01  5:23   ` Elijah Newren via GitGitGadget
  2022-07-01  5:23     ` [PATCH v3 1/5] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
                       ` (5 more replies)
  3 siblings, 6 replies; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-07-01  5:23 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Jonathan Tan, Calvin Wan,
	Ævar Arnfjörð Bjarmason, Elijah Newren

This series adds some testcases based on the tensorflow repository issue
reported by Glen Choo at [1], demonstrating bugs in both the ort and
recursive strategies. It also provides a fix for the ort strategy.

Changes since v2:

 * Added a couple preparatory cleanup patches
 * Added a comment about why sub1/newfile is important to the new testcases
 * A couple other minor code cleanups

Changes since v1:

 * Fixed some wording issues in comments, and added a bit more details to
   one of the commit messages

[1]
https://lore.kernel.org/git/kl6lee006mle.fsf@chooglen-macbookpro.roam.corp.google.com/

Elijah Newren (5):
  t6423: add tests of dual directory rename plus add/add conflict
  merge-ort: small cleanups of check_for_directory_rename
  merge-ort: make a separate function for freeing struct collisions
  merge-ort: shuffle the computation and cleanup of potential collisions
  merge-ort: fix issue with dual rename and add/add conflict

 merge-ort.c                         |  74 +++++++++++++-------
 t/t6423-merge-rename-directories.sh | 105 ++++++++++++++++++++++++++++
 2 files changed, 153 insertions(+), 26 deletions(-)


base-commit: e54793a95afeea1e10de1e5ad7eab914e7416250
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1268%2Fnewren%2Ffix-dual-rename-into-each-other-plus-conflicting-adds-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1268/newren/fix-dual-rename-into-each-other-plus-conflicting-adds-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1268

Range-diff vs v2:

 1:  bf4c03d01d5 ! 1:  a16a1c4d947 t6423: add tests of dual directory rename plus add/add conflict
     @@ t/t6423-merge-rename-directories.sh: test_expect_success '12k: Directory rename
      +#
      +#   Expected: sub3/{file, newfile, sub2/other}
      +#             CONFLICT (add/add): sub1/sub2/new_add_add_file
     ++#
     ++#   Note that sub1/newfile is not extraneous.  Directory renames are only
     ++#   detected if they are needed, and they are only needed if the old directory
     ++#   had a new file added on the opposite side of history.  So sub1/newfile
     ++#   is needed for there to be a sub1/ -> sub3/ rename.
      +
      +test_setup_12l () {
      +	test_create_repo 12l_$1 &&
      +	(
      +		cd 12l_$1 &&
      +
     -+		mkdir -p sub1 sub2
     ++		mkdir sub1 sub2
      +		echo file >sub1/file &&
      +		echo other >sub2/other &&
      +		git add sub1 sub2 &&
     @@ t/t6423-merge-rename-directories.sh: test_expect_success '12k: Directory rename
      +
      +		test_must_fail git -c merge.directoryRenames=true merge -s recursive B^0 &&
      +
     -+		git ls-files -s >out &&
     -+		test_line_count = 5 out &&
     ++		test_stdout_line_count = 5 git ls-files -s &&
      +
      +		git rev-parse >actual \
      +			:0:sub3/file :0:sub3/newfile :0:sub3/sub2/other \
     @@ t/t6423-merge-rename-directories.sh: test_expect_success '12k: Directory rename
      +		test_cmp expect actual &&
      +
      +		git ls-files -o >actual &&
     -+		test_write_lines actual expect out >expect &&
     ++		test_write_lines actual expect >expect &&
      +		test_cmp expect actual
      +	)
      +'
     @@ t/t6423-merge-rename-directories.sh: test_expect_success '12k: Directory rename
      +
      +		test_must_fail git -c merge.directoryRenames=true merge -s recursive A^0 &&
      +
     -+		git ls-files -s >out &&
     -+		test_line_count = 5 out &&
     ++		test_stdout_line_count = 5 git ls-files -s &&
      +
      +		git rev-parse >actual \
      +			:0:sub3/file :0:sub3/newfile :0:sub3/sub2/other \
     @@ t/t6423-merge-rename-directories.sh: test_expect_success '12k: Directory rename
      +		test_cmp expect actual &&
      +
      +		git ls-files -o >actual &&
     -+		test_write_lines actual expect out >expect &&
     ++		test_write_lines actual expect >expect &&
      +		test_cmp expect actual
      +	)
      +'
 -:  ----------- > 2:  297fef60b19 merge-ort: small cleanups of check_for_directory_rename
 -:  ----------- > 3:  f5f87acbbd2 merge-ort: make a separate function for freeing struct collisions
 2:  cfa38f01481 ! 4:  d3eac3d0bf6 merge-ort: shuffle the computation and cleanup of potential collisions
     @@ Commit message
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## merge-ort.c ##
     -@@ merge-ort.c: static void compute_collisions(struct strmap *collisions,
     - 	}
     - }
     - 
     -+static void free_collisions(struct strmap *collisions)
     -+{
     -+	struct hashmap_iter iter;
     -+	struct strmap_entry *entry;
     -+
     -+	/* Free each value in the collisions map */
     -+	strmap_for_each_entry(collisions, &iter, entry) {
     -+		struct collision_info *info = entry->value;
     -+		string_list_clear(&info->source_files, 0);
     -+	}
     -+	/*
     -+	 * In compute_collisions(), we set collisions.strdup_strings to 0
     -+	 * so that we wouldn't have to make another copy of the new_path
     -+	 * allocated by apply_dir_rename().  But now that we've used them
     -+	 * and have no other references to these strings, it is time to
     -+	 * deallocate them.
     -+	 */
     -+	free_strmap_strings(collisions);
     -+	strmap_clear(collisions, 1);
     -+}
     -+
     - static char *check_for_directory_rename(struct merge_options *opt,
     - 					const char *path,
     - 					unsigned side_index,
      @@ merge-ort.c: static char *check_for_directory_rename(struct merge_options *opt,
       	}
       
     @@ merge-ort.c: static int detect_regular_renames(struct merge_options *opt,
       	int i, clean = 1;
      -	struct strmap collisions;
       	struct diff_queue_struct *side_pairs;
     --	struct hashmap_iter iter;
     --	struct strmap_entry *entry;
       	struct rename_info *renames = &opt->priv->renames;
       
       	side_pairs = &renames->pairs[side_index];
     @@ merge-ort.c: static int collect_renames(struct merge_options *opt,
       		result->queue[result->nr++] = p;
       	}
       
     --	/* Free each value in the collisions map */
     --	strmap_for_each_entry(&collisions, &iter, entry) {
     --		struct collision_info *info = entry->value;
     --		string_list_clear(&info->source_files, 0);
     --	}
     --	/*
     --	 * In compute_collisions(), we set collisions.strdup_strings to 0
     --	 * so that we wouldn't have to make another copy of the new_path
     --	 * allocated by apply_dir_rename().  But now that we've used them
     --	 * and have no other references to these strings, it is time to
     --	 * deallocate them.
     --	 */
     --	free_strmap_strings(&collisions);
     --	strmap_clear(&collisions, 1);
     +-	free_collisions(&collisions);
       	return clean;
       }
       
 3:  da3ae38e390 ! 5:  121761e26e2 merge-ort: fix issue with dual rename and add/add conflict
     @@ Commit message
       ## merge-ort.c ##
      @@ merge-ort.c: static char *check_for_directory_rename(struct merge_options *opt,
       	struct strmap_entry *rename_info;
     - 	struct strmap_entry *otherinfo = NULL;
     + 	struct strmap_entry *otherinfo;
       	const char *new_dir;
      +	int other_side = 3 - side_index;
       
     +-	/* Cases where we don't have a directory rename for this path */
      +	/*
      +	 * Cases where we don't have or don't want a directory rename for
     -+	 * this path, so we return NULL.
     ++	 * this path.
      +	 */
       	if (strmap_empty(dir_renames))
     - 		return new_path;
     + 		return NULL;
      +	if (strmap_get(&collisions[other_side], path))
     -+		return new_path;
     ++		return NULL;
       	rename_info = check_dir_renamed(path, dir_renames);
       	if (!rename_info)
     - 		return new_path;
     + 		return NULL;
      
       ## t/t6423-merge-rename-directories.sh ##
      @@ t/t6423-merge-rename-directories.sh: test_setup_12l () {

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v3 1/5] t6423: add tests of dual directory rename plus add/add conflict
  2022-07-01  5:23   ` [PATCH v3 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
@ 2022-07-01  5:23     ` Elijah Newren via GitGitGadget
  2022-07-01  5:23     ` [PATCH v3 2/5] merge-ort: small cleanups of check_for_directory_rename Elijah Newren via GitGitGadget
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-07-01  5:23 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Jonathan Tan, Calvin Wan,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

This is an attempt at minimalizing a testcase reported by Glen Choo
with tensorflow where merge-ort would report an assertion failure:

    Assertion failed: (ci->filemask == 2 || ci->filemask == 4), function apply_directory_rename_modifications, file merge-ort.c, line 2410

reversing the direction of the merge provides a different error:

    error: cache entry has null sha1: ...
    fatal: unable to write .git/index

so we add testcases for both.  With these new testcases, the
recursive strategy differs in that it returns the latter error for
both merge directions.

These testcases are somehow a little different than Glen's original
tensorflow testcase in that these ones trigger a bug with the recursive
algorithm whereas his testcase didn't.  I figure that means these
testcases somehow manage to be more comprehensive.

Reported-by: Glen Choo <chooglen@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t6423-merge-rename-directories.sh | 105 ++++++++++++++++++++++++++++
 1 file changed, 105 insertions(+)

diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh
index 479db32cd62..ed5586de28c 100755
--- a/t/t6423-merge-rename-directories.sh
+++ b/t/t6423-merge-rename-directories.sh
@@ -5199,6 +5199,111 @@ test_expect_success '12k: Directory rename with sibling causes rename-to-self' '
 	)
 '
 
+# Testcase 12l, Both sides rename a directory into the other side, both add
+#   a file which after directory renames are the same filename
+#   Commit O: sub1/file,                 sub2/other
+#   Commit A: sub3/file,                 sub2/{other, new_add_add_file_1}
+#   Commit B: sub1/{file, newfile}, sub1/sub2/{other, new_add_add_file_2}
+#
+#   In words:
+#     A: sub1/ -> sub3/, add sub2/new_add_add_file_1
+#     B: sub2/ -> sub1/sub2, add sub1/newfile, add sub1/sub2/new_add_add_file_2
+#
+#   Expected: sub3/{file, newfile, sub2/other}
+#             CONFLICT (add/add): sub1/sub2/new_add_add_file
+#
+#   Note that sub1/newfile is not extraneous.  Directory renames are only
+#   detected if they are needed, and they are only needed if the old directory
+#   had a new file added on the opposite side of history.  So sub1/newfile
+#   is needed for there to be a sub1/ -> sub3/ rename.
+
+test_setup_12l () {
+	test_create_repo 12l_$1 &&
+	(
+		cd 12l_$1 &&
+
+		mkdir sub1 sub2
+		echo file >sub1/file &&
+		echo other >sub2/other &&
+		git add sub1 sub2 &&
+		git commit -m "O" &&
+
+		git branch O &&
+		git branch A &&
+		git branch B &&
+
+		git checkout A &&
+		git mv sub1 sub3 &&
+		echo conflicting >sub2/new_add_add_file &&
+		git add sub2 &&
+		test_tick &&
+		git add -u &&
+		git commit -m "A" &&
+
+		git checkout B &&
+		echo dissimilar >sub2/new_add_add_file &&
+		echo brand >sub1/newfile &&
+		git add sub1 sub2 &&
+		git mv sub2 sub1 &&
+		test_tick &&
+		git commit -m "B"
+	)
+}
+
+test_expect_merge_algorithm failure failure '12l (B into A): Rename into each other + add/add conflict' '
+	test_setup_12l BintoA &&
+	(
+		cd 12l_BintoA &&
+
+		git checkout -q A^0 &&
+
+		test_must_fail git -c merge.directoryRenames=true merge -s recursive B^0 &&
+
+		test_stdout_line_count = 5 git ls-files -s &&
+
+		git rev-parse >actual \
+			:0:sub3/file :0:sub3/newfile :0:sub3/sub2/other \
+			:2:sub1/sub2/new_add_add_file \
+			:3:sub1/sub2/new_add_add_file &&
+		git rev-parse >expect \
+			O:sub1/file  B:sub1/newfile O:sub2/other \
+			A:sub2/new_add_add_file \
+			B:sub1/sub2/new_add_add_file &&
+		test_cmp expect actual &&
+
+		git ls-files -o >actual &&
+		test_write_lines actual expect >expect &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_merge_algorithm failure failure '12l (A into B): Rename into each other + add/add conflict' '
+	test_setup_12l AintoB &&
+	(
+		cd 12l_AintoB &&
+
+		git checkout -q B^0 &&
+
+		test_must_fail git -c merge.directoryRenames=true merge -s recursive A^0 &&
+
+		test_stdout_line_count = 5 git ls-files -s &&
+
+		git rev-parse >actual \
+			:0:sub3/file :0:sub3/newfile :0:sub3/sub2/other \
+			:2:sub1/sub2/new_add_add_file \
+			:3:sub1/sub2/new_add_add_file &&
+		git rev-parse >expect \
+			O:sub1/file  B:sub1/newfile O:sub2/other \
+			B:sub1/sub2/new_add_add_file \
+			A:sub2/new_add_add_file &&
+		test_cmp expect actual &&
+
+		git ls-files -o >actual &&
+		test_write_lines actual expect >expect &&
+		test_cmp expect actual
+	)
+'
+
 ###########################################################################
 # SECTION 13: Checking informational and conflict messages
 #
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 2/5] merge-ort: small cleanups of check_for_directory_rename
  2022-07-01  5:23   ` [PATCH v3 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
  2022-07-01  5:23     ` [PATCH v3 1/5] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
@ 2022-07-01  5:23     ` Elijah Newren via GitGitGadget
  2022-07-01  5:23     ` [PATCH v3 3/5] merge-ort: make a separate function for freeing struct collisions Elijah Newren via GitGitGadget
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-07-01  5:23 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Jonathan Tan, Calvin Wan,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@palantir.com>

No functional changes, just some preparatory cleanups.

Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Elijah Newren <newren@palantir.com>
---
 merge-ort.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 8545354dafd..ff037cca8d2 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2267,18 +2267,17 @@ static char *check_for_directory_rename(struct merge_options *opt,
 					struct strmap *collisions,
 					int *clean_merge)
 {
-	char *new_path = NULL;
+	char *new_path;
 	struct strmap_entry *rename_info;
-	struct strmap_entry *otherinfo = NULL;
+	struct strmap_entry *otherinfo;
 	const char *new_dir;
 
+	/* Cases where we don't have a directory rename for this path */
 	if (strmap_empty(dir_renames))
-		return new_path;
+		return NULL;
 	rename_info = check_dir_renamed(path, dir_renames);
 	if (!rename_info)
-		return new_path;
-	/* old_dir = rename_info->key; */
-	new_dir = rename_info->value;
+		return NULL;
 
 	/*
 	 * This next part is a little weird.  We do not want to do an
@@ -2304,6 +2303,7 @@ static char *check_for_directory_rename(struct merge_options *opt,
 	 * As it turns out, this also prevents N-way transient rename
 	 * confusion; See testcases 9c and 9d of t6043.
 	 */
+	new_dir = rename_info->value; /* old_dir = rename_info->key; */
 	otherinfo = strmap_get_entry(dir_rename_exclusions, new_dir);
 	if (otherinfo) {
 		path_msg(opt, rename_info->key, 1,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 3/5] merge-ort: make a separate function for freeing struct collisions
  2022-07-01  5:23   ` [PATCH v3 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
  2022-07-01  5:23     ` [PATCH v3 1/5] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
  2022-07-01  5:23     ` [PATCH v3 2/5] merge-ort: small cleanups of check_for_directory_rename Elijah Newren via GitGitGadget
@ 2022-07-01  5:23     ` Elijah Newren via GitGitGadget
  2022-07-01  5:23     ` [PATCH v3 4/5] merge-ort: shuffle the computation and cleanup of potential collisions Elijah Newren via GitGitGadget
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-07-01  5:23 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Jonathan Tan, Calvin Wan,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@palantir.com>

This commit makes no functional changes, it's just some code movement in
preparation for later changes.

Signed-off-by: Elijah Newren <newren@palantir.com>
---
 merge-ort.c | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index ff037cca8d2..1514dd173c0 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2259,6 +2259,27 @@ static void compute_collisions(struct strmap *collisions,
 	}
 }
 
+static void free_collisions(struct strmap *collisions)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
+
+	/* Free each value in the collisions map */
+	strmap_for_each_entry(collisions, &iter, entry) {
+		struct collision_info *info = entry->value;
+		string_list_clear(&info->source_files, 0);
+	}
+	/*
+	 * In compute_collisions(), we set collisions.strdup_strings to 0
+	 * so that we wouldn't have to make another copy of the new_path
+	 * allocated by apply_dir_rename().  But now that we've used them
+	 * and have no other references to these strings, it is time to
+	 * deallocate them.
+	 */
+	free_strmap_strings(collisions);
+	strmap_clear(collisions, 1);
+}
+
 static char *check_for_directory_rename(struct merge_options *opt,
 					const char *path,
 					unsigned side_index,
@@ -3029,8 +3050,6 @@ static int collect_renames(struct merge_options *opt,
 	int i, clean = 1;
 	struct strmap collisions;
 	struct diff_queue_struct *side_pairs;
-	struct hashmap_iter iter;
-	struct strmap_entry *entry;
 	struct rename_info *renames = &opt->priv->renames;
 
 	side_pairs = &renames->pairs[side_index];
@@ -3076,20 +3095,7 @@ static int collect_renames(struct merge_options *opt,
 		result->queue[result->nr++] = p;
 	}
 
-	/* Free each value in the collisions map */
-	strmap_for_each_entry(&collisions, &iter, entry) {
-		struct collision_info *info = entry->value;
-		string_list_clear(&info->source_files, 0);
-	}
-	/*
-	 * In compute_collisions(), we set collisions.strdup_strings to 0
-	 * so that we wouldn't have to make another copy of the new_path
-	 * allocated by apply_dir_rename().  But now that we've used them
-	 * and have no other references to these strings, it is time to
-	 * deallocate them.
-	 */
-	free_strmap_strings(&collisions);
-	strmap_clear(&collisions, 1);
+	free_collisions(&collisions);
 	return clean;
 }
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 4/5] merge-ort: shuffle the computation and cleanup of potential collisions
  2022-07-01  5:23   ` [PATCH v3 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
                       ` (2 preceding siblings ...)
  2022-07-01  5:23     ` [PATCH v3 3/5] merge-ort: make a separate function for freeing struct collisions Elijah Newren via GitGitGadget
@ 2022-07-01  5:23     ` Elijah Newren via GitGitGadget
  2022-07-01  9:16       ` Ævar Arnfjörð Bjarmason
  2022-07-01  5:23     ` [PATCH v3 5/5] merge-ort: fix issue with dual rename and add/add conflict Elijah Newren via GitGitGadget
  2022-07-05  1:33     ` [PATCH v4 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
  5 siblings, 1 reply; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-07-01  5:23 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Jonathan Tan, Calvin Wan,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Run compute_collisions() for renames on both sides of history before
any calls to collect_renames(), and do not free the computed collisions
until after both calls to collect_renames().  This is just a code
reorganization at this point that doesn't make sense on its own, but
will permit us to use the computed collision info from both sides
within each call to collect_renames() in a subsequent commit.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 1514dd173c0..b496f0e3803 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2335,7 +2335,8 @@ static char *check_for_directory_rename(struct merge_options *opt,
 	}
 
 	new_path = handle_path_level_conflicts(opt, path, side_index,
-					       rename_info, collisions);
+					       rename_info,
+					       &collisions[side_index]);
 	*clean_merge &= (new_path != NULL);
 
 	return new_path;
@@ -3044,16 +3045,15 @@ static int detect_regular_renames(struct merge_options *opt,
 static int collect_renames(struct merge_options *opt,
 			   struct diff_queue_struct *result,
 			   unsigned side_index,
+			   struct strmap *collisions,
 			   struct strmap *dir_renames_for_side,
 			   struct strmap *rename_exclusions)
 {
 	int i, clean = 1;
-	struct strmap collisions;
 	struct diff_queue_struct *side_pairs;
 	struct rename_info *renames = &opt->priv->renames;
 
 	side_pairs = &renames->pairs[side_index];
-	compute_collisions(&collisions, dir_renames_for_side, side_pairs);
 
 	for (i = 0; i < side_pairs->nr; ++i) {
 		struct diff_filepair *p = side_pairs->queue[i];
@@ -3069,7 +3069,7 @@ static int collect_renames(struct merge_options *opt,
 						      side_index,
 						      dir_renames_for_side,
 						      rename_exclusions,
-						      &collisions,
+						      collisions,
 						      &clean);
 
 		possibly_cache_new_pair(renames, p, side_index, new_path);
@@ -3095,7 +3095,6 @@ static int collect_renames(struct merge_options *opt,
 		result->queue[result->nr++] = p;
 	}
 
-	free_collisions(&collisions);
 	return clean;
 }
 
@@ -3106,6 +3105,7 @@ static int detect_and_process_renames(struct merge_options *opt,
 {
 	struct diff_queue_struct combined = { 0 };
 	struct rename_info *renames = &opt->priv->renames;
+	struct strmap collisions[3];
 	int need_dir_renames, s, i, clean = 1;
 	unsigned detection_run = 0;
 
@@ -3155,12 +3155,22 @@ static int detect_and_process_renames(struct merge_options *opt,
 	ALLOC_GROW(combined.queue,
 		   renames->pairs[1].nr + renames->pairs[2].nr,
 		   combined.alloc);
+	for (int i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
+		int other_side = 3 - i;
+		compute_collisions(&collisions[i],
+				   &renames->dir_renames[other_side],
+				   &renames->pairs[i]);
+	}
 	clean &= collect_renames(opt, &combined, MERGE_SIDE1,
+				 collisions,
 				 &renames->dir_renames[2],
 				 &renames->dir_renames[1]);
 	clean &= collect_renames(opt, &combined, MERGE_SIDE2,
+				 collisions,
 				 &renames->dir_renames[1],
 				 &renames->dir_renames[2]);
+	for (int i = MERGE_SIDE1; i <= MERGE_SIDE2; i++)
+		free_collisions(&collisions[i]);
 	STABLE_QSORT(combined.queue, combined.nr, compare_pairs);
 	trace2_region_leave("merge", "directory renames", opt->repo);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v3 5/5] merge-ort: fix issue with dual rename and add/add conflict
  2022-07-01  5:23   ` [PATCH v3 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
                       ` (3 preceding siblings ...)
  2022-07-01  5:23     ` [PATCH v3 4/5] merge-ort: shuffle the computation and cleanup of potential collisions Elijah Newren via GitGitGadget
@ 2022-07-01  5:23     ` Elijah Newren via GitGitGadget
  2022-07-05  1:33     ` [PATCH v4 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
  5 siblings, 0 replies; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-07-01  5:23 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Jonathan Tan, Calvin Wan,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

There is code in both merge-recursive and merge-ort for avoiding doubly
transitive renames (i.e. one side renames directory A/ -> B/, and the
other side renames directory B/ -> C/), because this combination would
otherwise make a mess for new files added to A/ on the first side and
wondering which directory they end up in -- especially if there were
even more renames such as the first side renaming C/ -> D/.  In such
cases, it just turns "off" directory rename detection for the higher
order transitive cases.

The testcases added in t6423 a couple commits ago are slightly different
but similar in principle.  They involve a similar case of paired
renaming but instead of A/ -> B/ and B/ -> C/, the second side renames
a leading directory of B/ to C/.  And both sides add a new file
somewhere under the directory that the other side will rename.  While
the new files added start within different directories and thus could
logically end up within different directories, it is weird for a file
on one side to end up where the other one started and not move along
with it.  So, let's just turn off directory rename detection in this
case as well.

Another way to look at this is that if the source name involved in a
directory rename on one side is the target name of a directory rename
operation for a file from the other side, then we avoid the doubly
transitive rename.  (More concretely, if a directory rename on side D
wants to rename a file on side E from OLD_NAME -> NEW_NAME, and side D
already had a file named NEW_NAME, and a directory rename on side E
wants to rename side D's NEW_NAME -> NEWER_NAME, then we turn off the
directory rename detection for NEW_NAME to prevent the
NEW_NAME -> NEWER_NAME rename, and instead end up with an add/add
conflict on NEW_NAME.)

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c                         | 8 +++++++-
 t/t6423-merge-rename-directories.sh | 4 ++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index b496f0e3803..daa0e4496f8 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2292,10 +2292,16 @@ static char *check_for_directory_rename(struct merge_options *opt,
 	struct strmap_entry *rename_info;
 	struct strmap_entry *otherinfo;
 	const char *new_dir;
+	int other_side = 3 - side_index;
 
-	/* Cases where we don't have a directory rename for this path */
+	/*
+	 * Cases where we don't have or don't want a directory rename for
+	 * this path.
+	 */
 	if (strmap_empty(dir_renames))
 		return NULL;
+	if (strmap_get(&collisions[other_side], path))
+		return NULL;
 	rename_info = check_dir_renamed(path, dir_renames);
 	if (!rename_info)
 		return NULL;
diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh
index ed5586de28c..99baf77cbfd 100755
--- a/t/t6423-merge-rename-directories.sh
+++ b/t/t6423-merge-rename-directories.sh
@@ -5250,7 +5250,7 @@ test_setup_12l () {
 	)
 }
 
-test_expect_merge_algorithm failure failure '12l (B into A): Rename into each other + add/add conflict' '
+test_expect_merge_algorithm failure success '12l (B into A): Rename into each other + add/add conflict' '
 	test_setup_12l BintoA &&
 	(
 		cd 12l_BintoA &&
@@ -5277,7 +5277,7 @@ test_expect_merge_algorithm failure failure '12l (B into A): Rename into each ot
 	)
 '
 
-test_expect_merge_algorithm failure failure '12l (A into B): Rename into each other + add/add conflict' '
+test_expect_merge_algorithm failure success '12l (A into B): Rename into each other + add/add conflict' '
 	test_setup_12l AintoB &&
 	(
 		cd 12l_AintoB &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v3 4/5] merge-ort: shuffle the computation and cleanup of potential collisions
  2022-07-01  5:23     ` [PATCH v3 4/5] merge-ort: shuffle the computation and cleanup of potential collisions Elijah Newren via GitGitGadget
@ 2022-07-01  9:16       ` Ævar Arnfjörð Bjarmason
  2022-07-25 12:00         ` C99 "for (int ..." form on "master" (was: [PATCH v3 4/5] merge-ort: shuffle the computation and cleanup of potential collisions) Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-01  9:16 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Jonathan Tan, Calvin Wan, Elijah Newren, Junio C Hamano


On Fri, Jul 01 2022, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
>
> Run compute_collisions() for renames on both sides of history before
> any calls to collect_renames(), and do not free the computed collisions
> until after both calls to collect_renames().  This is just a code
> reorganization at this point that doesn't make sense on its own, but
> will permit us to use the computed collision info from both sides
> within each call to collect_renames() in a subsequent commit.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>

This is much easier to read & review with the prep patches, thanks a
lot.

B.t.w. on the "legacy code" comment wrt merge-{ort,recursive}.c : I
didn't look in that case, but I've seen that you've copied various older
code from merge-recursive.c to merge-ort.c (which makes sense) in the
past, but I didn't check the origin in that case. Sorry :)

> @@ -3106,6 +3105,7 @@ static int detect_and_process_renames(struct merge_options *opt,
>  {
>  	struct diff_queue_struct combined = { 0 };
>  	struct rename_info *renames = &opt->priv->renames;
> +	struct strmap collisions[3];
>  	int need_dir_renames, s, i, clean = 1;
>  	unsigned detection_run = 0;
>  
> @@ -3155,12 +3155,22 @@ static int detect_and_process_renames(struct merge_options *opt,
>  	ALLOC_GROW(combined.queue,
>  		   renames->pairs[1].nr + renames->pairs[2].nr,
>  		   combined.alloc);
> +	for (int i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {

The "int i" here will need to be pre-declared earlier, per: 6563706568b
(CodingGuidelines: give deadline for "for (int i = 0; ...", 2022-03-30)

I also don't mind us just saying "we've waited enough". Junio?

> +		int other_side = 3 - i;
> +		compute_collisions(&collisions[i],
> +				   &renames->dir_renames[other_side],
> +				   &renames->pairs[i]);
> +	}
>  	clean &= collect_renames(opt, &combined, MERGE_SIDE1,
> +				 collisions,
>  				 &renames->dir_renames[2],
>  				 &renames->dir_renames[1]);
>  	clean &= collect_renames(opt, &combined, MERGE_SIDE2,
> +				 collisions,
>  				 &renames->dir_renames[1],
>  				 &renames->dir_renames[2]);
> +	for (int i = MERGE_SIDE1; i <= MERGE_SIDE2; i++)
> +		free_collisions(&collisions[i]);
>  	STABLE_QSORT(combined.queue, combined.nr, compare_pairs);
>  	trace2_region_leave("merge", "directory renames", opt->repo);


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v2 1/3] t6423: add tests of dual directory rename plus add/add conflict
  2022-07-01  2:57       ` Elijah Newren
@ 2022-07-01  9:29         ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-01  9:29 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Jonathan Tan,
	Calvin Wan


On Thu, Jun 30 2022, Elijah Newren wrote:

> On Thu, Jun 30, 2022 at 3:26 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>> On Thu, Jun 30 2022, Elijah Newren via GitGitGadget wrote:
>>
>> > From: Elijah Newren <newren@gmail.com>
>>
>> > +test_setup_12l () {
>> > +     test_create_repo 12l_$1 &&
>>
>> Can & should just be "git init", see f0d4d398e28 (test-lib: split up and
>> deprecate test_create_repo(), 2021-05-10).
>
> I've switched to "git init" and even encouraged others to do the same
> in other test scripts, but since literally every other test in this
> file uses test_create_repo, I think they should all be converted at
> once and just be consistent here.  But, so we can stop having this
> conversation, after this series lands, I'll send one in to update the
> various merge testfiles that make heavy use of test_create_repo and
> convert them over.

Sorry, genuinely I didn't mean to mention it again, just saw it scroll
past & wondered if it was intentional. I'm fine with keeping it...

>> > +     (
>> > +             cd 12l_$1 &&
>> > +
>> > +             mkdir -p sub1 sub2
>>
>> The "-p" here isn't needed, you're not creating recursive directories.
>
> Indeed; will fix.

Thanks!

>> > +             git ls-files -s >out &&
>> > +             test_line_count = 5 out &&
>>
>> Can't this just use test_stdout_line_count? Except...
>
> Ooh, new function from late last year that I was unaware of.  Yeah,
> I'm happy to start using it.
>
>> > +
>> > +             git rev-parse >actual \
>> > +                     :0:sub3/file :0:sub3/newfile :0:sub3/sub2/other \
>> > +                     :2:sub1/sub2/new_add_add_file \
>> > +                     :3:sub1/sub2/new_add_add_file &&
>> > +             git rev-parse >expect \
>> > +                     O:sub1/file  B:sub1/newfile O:sub2/other \
>> > +                     A:sub2/new_add_add_file \
>> > +                     B:sub1/sub2/new_add_add_file &&
>> > +             test_cmp expect actual &&
>> > +
>> > +             git ls-files -o >actual &&
>> > +             test_write_lines actual expect out >expect &&
>> > +             test_cmp expect actual
>>
>> This test seems a bit odd, here we're just checking that we've created
>> scratch files for the internal use of our test, why is it important that
>> we have an "out" file, when the only reason we have it is because we
>> needed it for checking the "ls-files" line count above?
>
> Nah, you've misunderstood the purpose of the check.  It isn't "make
> sure that these untracked files are present among whatever else might
> also be present", it's "make sure the merge step did not introduce
> *any* untracked files" (something the recursive backend periodically
> did, and they weren't cruft untracked files but stored actual merge
> results). 

Ah, thanks!

> There wasn't a nice easy check for that, the closest was to
> translate the requirement to "make sure the only untracked files are
> the ones explicitly added by this test script", which is the check you
> see here.  I don't actually care about "actual", "expect", or "out", I
> just care that there aren't any _other_ untracked files.

I'm fine with keeping this as-is, but FWIW perhaps this pattern is more
explicit about the intent:

	test_expect_success 'do merge stuff' '
		... &&
		rm -f expect actual &&
		git ls-files -o ':!out' >out &&
		test_must_be_empty out
	'

Or piping it to ".git/out", to avoid the path exclude, but like this is
also fine:) Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v4 0/5] Fix dual rename into each other plus conflicting adds
  2022-07-01  5:23   ` [PATCH v3 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
                       ` (4 preceding siblings ...)
  2022-07-01  5:23     ` [PATCH v3 5/5] merge-ort: fix issue with dual rename and add/add conflict Elijah Newren via GitGitGadget
@ 2022-07-05  1:33     ` Elijah Newren via GitGitGadget
  2022-07-05  1:33       ` [PATCH v4 1/5] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
                         ` (5 more replies)
  5 siblings, 6 replies; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-07-05  1:33 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Jonathan Tan, Calvin Wan,
	Ævar Arnfjörð Bjarmason, Elijah Newren

This series adds some testcases based on the tensorflow repository issue
reported by Glen Choo at [1], demonstrating bugs in both the ort and
recursive strategies. It also provides a fix for the ort strategy.

Changes since v3:

 * Remove use of for-initializer

Changes since v2:

 * Added a couple preparatory cleanup patches
 * Added a comment about why sub1/newfile is important to the new testcases
 * A couple other minor code cleanups

Changes since v1:

 * Fixed some wording issues in comments, and added a bit more details to
   one of the commit messages

[1]
https://lore.kernel.org/git/kl6lee006mle.fsf@chooglen-macbookpro.roam.corp.google.com/

Elijah Newren (5):
  t6423: add tests of dual directory rename plus add/add conflict
  merge-ort: small cleanups of check_for_directory_rename
  merge-ort: make a separate function for freeing struct collisions
  merge-ort: shuffle the computation and cleanup of potential collisions
  merge-ort: fix issue with dual rename and add/add conflict

 merge-ort.c                         |  74 +++++++++++++-------
 t/t6423-merge-rename-directories.sh | 105 ++++++++++++++++++++++++++++
 2 files changed, 153 insertions(+), 26 deletions(-)


base-commit: e54793a95afeea1e10de1e5ad7eab914e7416250
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1268%2Fnewren%2Ffix-dual-rename-into-each-other-plus-conflicting-adds-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1268/newren/fix-dual-rename-into-each-other-plus-conflicting-adds-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/1268

Range-diff vs v3:

 1:  a16a1c4d947 = 1:  a16a1c4d947 t6423: add tests of dual directory rename plus add/add conflict
 2:  297fef60b19 = 2:  297fef60b19 merge-ort: small cleanups of check_for_directory_rename
 3:  f5f87acbbd2 = 3:  f5f87acbbd2 merge-ort: make a separate function for freeing struct collisions
 4:  d3eac3d0bf6 ! 4:  9d813116112 merge-ort: shuffle the computation and cleanup of potential collisions
     @@ merge-ort.c: static int detect_and_process_renames(struct merge_options *opt,
       	ALLOC_GROW(combined.queue,
       		   renames->pairs[1].nr + renames->pairs[2].nr,
       		   combined.alloc);
     -+	for (int i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
     ++	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
      +		int other_side = 3 - i;
      +		compute_collisions(&collisions[i],
      +				   &renames->dir_renames[other_side],
     @@ merge-ort.c: static int detect_and_process_renames(struct merge_options *opt,
      +				 collisions,
       				 &renames->dir_renames[1],
       				 &renames->dir_renames[2]);
     -+	for (int i = MERGE_SIDE1; i <= MERGE_SIDE2; i++)
     ++	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++)
      +		free_collisions(&collisions[i]);
       	STABLE_QSORT(combined.queue, combined.nr, compare_pairs);
       	trace2_region_leave("merge", "directory renames", opt->repo);
 5:  121761e26e2 = 5:  993ac405408 merge-ort: fix issue with dual rename and add/add conflict

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH v4 1/5] t6423: add tests of dual directory rename plus add/add conflict
  2022-07-05  1:33     ` [PATCH v4 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
@ 2022-07-05  1:33       ` Elijah Newren via GitGitGadget
  2022-07-05  1:33       ` [PATCH v4 2/5] merge-ort: small cleanups of check_for_directory_rename Elijah Newren via GitGitGadget
                         ` (4 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-07-05  1:33 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Jonathan Tan, Calvin Wan,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

This is an attempt at minimalizing a testcase reported by Glen Choo
with tensorflow where merge-ort would report an assertion failure:

    Assertion failed: (ci->filemask == 2 || ci->filemask == 4), function apply_directory_rename_modifications, file merge-ort.c, line 2410

reversing the direction of the merge provides a different error:

    error: cache entry has null sha1: ...
    fatal: unable to write .git/index

so we add testcases for both.  With these new testcases, the
recursive strategy differs in that it returns the latter error for
both merge directions.

These testcases are somehow a little different than Glen's original
tensorflow testcase in that these ones trigger a bug with the recursive
algorithm whereas his testcase didn't.  I figure that means these
testcases somehow manage to be more comprehensive.

Reported-by: Glen Choo <chooglen@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t6423-merge-rename-directories.sh | 105 ++++++++++++++++++++++++++++
 1 file changed, 105 insertions(+)

diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh
index 479db32cd62..ed5586de28c 100755
--- a/t/t6423-merge-rename-directories.sh
+++ b/t/t6423-merge-rename-directories.sh
@@ -5199,6 +5199,111 @@ test_expect_success '12k: Directory rename with sibling causes rename-to-self' '
 	)
 '
 
+# Testcase 12l, Both sides rename a directory into the other side, both add
+#   a file which after directory renames are the same filename
+#   Commit O: sub1/file,                 sub2/other
+#   Commit A: sub3/file,                 sub2/{other, new_add_add_file_1}
+#   Commit B: sub1/{file, newfile}, sub1/sub2/{other, new_add_add_file_2}
+#
+#   In words:
+#     A: sub1/ -> sub3/, add sub2/new_add_add_file_1
+#     B: sub2/ -> sub1/sub2, add sub1/newfile, add sub1/sub2/new_add_add_file_2
+#
+#   Expected: sub3/{file, newfile, sub2/other}
+#             CONFLICT (add/add): sub1/sub2/new_add_add_file
+#
+#   Note that sub1/newfile is not extraneous.  Directory renames are only
+#   detected if they are needed, and they are only needed if the old directory
+#   had a new file added on the opposite side of history.  So sub1/newfile
+#   is needed for there to be a sub1/ -> sub3/ rename.
+
+test_setup_12l () {
+	test_create_repo 12l_$1 &&
+	(
+		cd 12l_$1 &&
+
+		mkdir sub1 sub2
+		echo file >sub1/file &&
+		echo other >sub2/other &&
+		git add sub1 sub2 &&
+		git commit -m "O" &&
+
+		git branch O &&
+		git branch A &&
+		git branch B &&
+
+		git checkout A &&
+		git mv sub1 sub3 &&
+		echo conflicting >sub2/new_add_add_file &&
+		git add sub2 &&
+		test_tick &&
+		git add -u &&
+		git commit -m "A" &&
+
+		git checkout B &&
+		echo dissimilar >sub2/new_add_add_file &&
+		echo brand >sub1/newfile &&
+		git add sub1 sub2 &&
+		git mv sub2 sub1 &&
+		test_tick &&
+		git commit -m "B"
+	)
+}
+
+test_expect_merge_algorithm failure failure '12l (B into A): Rename into each other + add/add conflict' '
+	test_setup_12l BintoA &&
+	(
+		cd 12l_BintoA &&
+
+		git checkout -q A^0 &&
+
+		test_must_fail git -c merge.directoryRenames=true merge -s recursive B^0 &&
+
+		test_stdout_line_count = 5 git ls-files -s &&
+
+		git rev-parse >actual \
+			:0:sub3/file :0:sub3/newfile :0:sub3/sub2/other \
+			:2:sub1/sub2/new_add_add_file \
+			:3:sub1/sub2/new_add_add_file &&
+		git rev-parse >expect \
+			O:sub1/file  B:sub1/newfile O:sub2/other \
+			A:sub2/new_add_add_file \
+			B:sub1/sub2/new_add_add_file &&
+		test_cmp expect actual &&
+
+		git ls-files -o >actual &&
+		test_write_lines actual expect >expect &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_merge_algorithm failure failure '12l (A into B): Rename into each other + add/add conflict' '
+	test_setup_12l AintoB &&
+	(
+		cd 12l_AintoB &&
+
+		git checkout -q B^0 &&
+
+		test_must_fail git -c merge.directoryRenames=true merge -s recursive A^0 &&
+
+		test_stdout_line_count = 5 git ls-files -s &&
+
+		git rev-parse >actual \
+			:0:sub3/file :0:sub3/newfile :0:sub3/sub2/other \
+			:2:sub1/sub2/new_add_add_file \
+			:3:sub1/sub2/new_add_add_file &&
+		git rev-parse >expect \
+			O:sub1/file  B:sub1/newfile O:sub2/other \
+			B:sub1/sub2/new_add_add_file \
+			A:sub2/new_add_add_file &&
+		test_cmp expect actual &&
+
+		git ls-files -o >actual &&
+		test_write_lines actual expect >expect &&
+		test_cmp expect actual
+	)
+'
+
 ###########################################################################
 # SECTION 13: Checking informational and conflict messages
 #
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v4 2/5] merge-ort: small cleanups of check_for_directory_rename
  2022-07-05  1:33     ` [PATCH v4 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
  2022-07-05  1:33       ` [PATCH v4 1/5] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
@ 2022-07-05  1:33       ` Elijah Newren via GitGitGadget
  2022-07-05  1:33       ` [PATCH v4 3/5] merge-ort: make a separate function for freeing struct collisions Elijah Newren via GitGitGadget
                         ` (3 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-07-05  1:33 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Jonathan Tan, Calvin Wan,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@palantir.com>

No functional changes, just some preparatory cleanups.

Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Elijah Newren <newren@palantir.com>
---
 merge-ort.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 8545354dafd..ff037cca8d2 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2267,18 +2267,17 @@ static char *check_for_directory_rename(struct merge_options *opt,
 					struct strmap *collisions,
 					int *clean_merge)
 {
-	char *new_path = NULL;
+	char *new_path;
 	struct strmap_entry *rename_info;
-	struct strmap_entry *otherinfo = NULL;
+	struct strmap_entry *otherinfo;
 	const char *new_dir;
 
+	/* Cases where we don't have a directory rename for this path */
 	if (strmap_empty(dir_renames))
-		return new_path;
+		return NULL;
 	rename_info = check_dir_renamed(path, dir_renames);
 	if (!rename_info)
-		return new_path;
-	/* old_dir = rename_info->key; */
-	new_dir = rename_info->value;
+		return NULL;
 
 	/*
 	 * This next part is a little weird.  We do not want to do an
@@ -2304,6 +2303,7 @@ static char *check_for_directory_rename(struct merge_options *opt,
 	 * As it turns out, this also prevents N-way transient rename
 	 * confusion; See testcases 9c and 9d of t6043.
 	 */
+	new_dir = rename_info->value; /* old_dir = rename_info->key; */
 	otherinfo = strmap_get_entry(dir_rename_exclusions, new_dir);
 	if (otherinfo) {
 		path_msg(opt, rename_info->key, 1,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v4 3/5] merge-ort: make a separate function for freeing struct collisions
  2022-07-05  1:33     ` [PATCH v4 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
  2022-07-05  1:33       ` [PATCH v4 1/5] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
  2022-07-05  1:33       ` [PATCH v4 2/5] merge-ort: small cleanups of check_for_directory_rename Elijah Newren via GitGitGadget
@ 2022-07-05  1:33       ` Elijah Newren via GitGitGadget
  2022-07-05  1:33       ` [PATCH v4 4/5] merge-ort: shuffle the computation and cleanup of potential collisions Elijah Newren via GitGitGadget
                         ` (2 subsequent siblings)
  5 siblings, 0 replies; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-07-05  1:33 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Jonathan Tan, Calvin Wan,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@palantir.com>

This commit makes no functional changes, it's just some code movement in
preparation for later changes.

Signed-off-by: Elijah Newren <newren@palantir.com>
---
 merge-ort.c | 38 ++++++++++++++++++++++----------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index ff037cca8d2..1514dd173c0 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2259,6 +2259,27 @@ static void compute_collisions(struct strmap *collisions,
 	}
 }
 
+static void free_collisions(struct strmap *collisions)
+{
+	struct hashmap_iter iter;
+	struct strmap_entry *entry;
+
+	/* Free each value in the collisions map */
+	strmap_for_each_entry(collisions, &iter, entry) {
+		struct collision_info *info = entry->value;
+		string_list_clear(&info->source_files, 0);
+	}
+	/*
+	 * In compute_collisions(), we set collisions.strdup_strings to 0
+	 * so that we wouldn't have to make another copy of the new_path
+	 * allocated by apply_dir_rename().  But now that we've used them
+	 * and have no other references to these strings, it is time to
+	 * deallocate them.
+	 */
+	free_strmap_strings(collisions);
+	strmap_clear(collisions, 1);
+}
+
 static char *check_for_directory_rename(struct merge_options *opt,
 					const char *path,
 					unsigned side_index,
@@ -3029,8 +3050,6 @@ static int collect_renames(struct merge_options *opt,
 	int i, clean = 1;
 	struct strmap collisions;
 	struct diff_queue_struct *side_pairs;
-	struct hashmap_iter iter;
-	struct strmap_entry *entry;
 	struct rename_info *renames = &opt->priv->renames;
 
 	side_pairs = &renames->pairs[side_index];
@@ -3076,20 +3095,7 @@ static int collect_renames(struct merge_options *opt,
 		result->queue[result->nr++] = p;
 	}
 
-	/* Free each value in the collisions map */
-	strmap_for_each_entry(&collisions, &iter, entry) {
-		struct collision_info *info = entry->value;
-		string_list_clear(&info->source_files, 0);
-	}
-	/*
-	 * In compute_collisions(), we set collisions.strdup_strings to 0
-	 * so that we wouldn't have to make another copy of the new_path
-	 * allocated by apply_dir_rename().  But now that we've used them
-	 * and have no other references to these strings, it is time to
-	 * deallocate them.
-	 */
-	free_strmap_strings(&collisions);
-	strmap_clear(&collisions, 1);
+	free_collisions(&collisions);
 	return clean;
 }
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v4 4/5] merge-ort: shuffle the computation and cleanup of potential collisions
  2022-07-05  1:33     ` [PATCH v4 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
                         ` (2 preceding siblings ...)
  2022-07-05  1:33       ` [PATCH v4 3/5] merge-ort: make a separate function for freeing struct collisions Elijah Newren via GitGitGadget
@ 2022-07-05  1:33       ` Elijah Newren via GitGitGadget
  2022-07-05  1:33       ` [PATCH v4 5/5] merge-ort: fix issue with dual rename and add/add conflict Elijah Newren via GitGitGadget
  2022-07-06 16:52       ` [PATCH v4 0/5] Fix dual rename into each other plus conflicting adds Junio C Hamano
  5 siblings, 0 replies; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-07-05  1:33 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Jonathan Tan, Calvin Wan,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Run compute_collisions() for renames on both sides of history before
any calls to collect_renames(), and do not free the computed collisions
until after both calls to collect_renames().  This is just a code
reorganization at this point that doesn't make sense on its own, but
will permit us to use the computed collision info from both sides
within each call to collect_renames() in a subsequent commit.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 1514dd173c0..a37c1c19aca 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2335,7 +2335,8 @@ static char *check_for_directory_rename(struct merge_options *opt,
 	}
 
 	new_path = handle_path_level_conflicts(opt, path, side_index,
-					       rename_info, collisions);
+					       rename_info,
+					       &collisions[side_index]);
 	*clean_merge &= (new_path != NULL);
 
 	return new_path;
@@ -3044,16 +3045,15 @@ static int detect_regular_renames(struct merge_options *opt,
 static int collect_renames(struct merge_options *opt,
 			   struct diff_queue_struct *result,
 			   unsigned side_index,
+			   struct strmap *collisions,
 			   struct strmap *dir_renames_for_side,
 			   struct strmap *rename_exclusions)
 {
 	int i, clean = 1;
-	struct strmap collisions;
 	struct diff_queue_struct *side_pairs;
 	struct rename_info *renames = &opt->priv->renames;
 
 	side_pairs = &renames->pairs[side_index];
-	compute_collisions(&collisions, dir_renames_for_side, side_pairs);
 
 	for (i = 0; i < side_pairs->nr; ++i) {
 		struct diff_filepair *p = side_pairs->queue[i];
@@ -3069,7 +3069,7 @@ static int collect_renames(struct merge_options *opt,
 						      side_index,
 						      dir_renames_for_side,
 						      rename_exclusions,
-						      &collisions,
+						      collisions,
 						      &clean);
 
 		possibly_cache_new_pair(renames, p, side_index, new_path);
@@ -3095,7 +3095,6 @@ static int collect_renames(struct merge_options *opt,
 		result->queue[result->nr++] = p;
 	}
 
-	free_collisions(&collisions);
 	return clean;
 }
 
@@ -3106,6 +3105,7 @@ static int detect_and_process_renames(struct merge_options *opt,
 {
 	struct diff_queue_struct combined = { 0 };
 	struct rename_info *renames = &opt->priv->renames;
+	struct strmap collisions[3];
 	int need_dir_renames, s, i, clean = 1;
 	unsigned detection_run = 0;
 
@@ -3155,12 +3155,22 @@ static int detect_and_process_renames(struct merge_options *opt,
 	ALLOC_GROW(combined.queue,
 		   renames->pairs[1].nr + renames->pairs[2].nr,
 		   combined.alloc);
+	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
+		int other_side = 3 - i;
+		compute_collisions(&collisions[i],
+				   &renames->dir_renames[other_side],
+				   &renames->pairs[i]);
+	}
 	clean &= collect_renames(opt, &combined, MERGE_SIDE1,
+				 collisions,
 				 &renames->dir_renames[2],
 				 &renames->dir_renames[1]);
 	clean &= collect_renames(opt, &combined, MERGE_SIDE2,
+				 collisions,
 				 &renames->dir_renames[1],
 				 &renames->dir_renames[2]);
+	for (i = MERGE_SIDE1; i <= MERGE_SIDE2; i++)
+		free_collisions(&collisions[i]);
 	STABLE_QSORT(combined.queue, combined.nr, compare_pairs);
 	trace2_region_leave("merge", "directory renames", opt->repo);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v4 5/5] merge-ort: fix issue with dual rename and add/add conflict
  2022-07-05  1:33     ` [PATCH v4 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
                         ` (3 preceding siblings ...)
  2022-07-05  1:33       ` [PATCH v4 4/5] merge-ort: shuffle the computation and cleanup of potential collisions Elijah Newren via GitGitGadget
@ 2022-07-05  1:33       ` Elijah Newren via GitGitGadget
  2022-07-06 16:52       ` [PATCH v4 0/5] Fix dual rename into each other plus conflicting adds Junio C Hamano
  5 siblings, 0 replies; 45+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-07-05  1:33 UTC (permalink / raw)
  To: git
  Cc: Elijah Newren, Jonathan Tan, Calvin Wan,
	Ævar Arnfjörð Bjarmason, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

There is code in both merge-recursive and merge-ort for avoiding doubly
transitive renames (i.e. one side renames directory A/ -> B/, and the
other side renames directory B/ -> C/), because this combination would
otherwise make a mess for new files added to A/ on the first side and
wondering which directory they end up in -- especially if there were
even more renames such as the first side renaming C/ -> D/.  In such
cases, it just turns "off" directory rename detection for the higher
order transitive cases.

The testcases added in t6423 a couple commits ago are slightly different
but similar in principle.  They involve a similar case of paired
renaming but instead of A/ -> B/ and B/ -> C/, the second side renames
a leading directory of B/ to C/.  And both sides add a new file
somewhere under the directory that the other side will rename.  While
the new files added start within different directories and thus could
logically end up within different directories, it is weird for a file
on one side to end up where the other one started and not move along
with it.  So, let's just turn off directory rename detection in this
case as well.

Another way to look at this is that if the source name involved in a
directory rename on one side is the target name of a directory rename
operation for a file from the other side, then we avoid the doubly
transitive rename.  (More concretely, if a directory rename on side D
wants to rename a file on side E from OLD_NAME -> NEW_NAME, and side D
already had a file named NEW_NAME, and a directory rename on side E
wants to rename side D's NEW_NAME -> NEWER_NAME, then we turn off the
directory rename detection for NEW_NAME to prevent the
NEW_NAME -> NEWER_NAME rename, and instead end up with an add/add
conflict on NEW_NAME.)

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c                         | 8 +++++++-
 t/t6423-merge-rename-directories.sh | 4 ++--
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index a37c1c19aca..3855f9de25e 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2292,10 +2292,16 @@ static char *check_for_directory_rename(struct merge_options *opt,
 	struct strmap_entry *rename_info;
 	struct strmap_entry *otherinfo;
 	const char *new_dir;
+	int other_side = 3 - side_index;
 
-	/* Cases where we don't have a directory rename for this path */
+	/*
+	 * Cases where we don't have or don't want a directory rename for
+	 * this path.
+	 */
 	if (strmap_empty(dir_renames))
 		return NULL;
+	if (strmap_get(&collisions[other_side], path))
+		return NULL;
 	rename_info = check_dir_renamed(path, dir_renames);
 	if (!rename_info)
 		return NULL;
diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh
index ed5586de28c..99baf77cbfd 100755
--- a/t/t6423-merge-rename-directories.sh
+++ b/t/t6423-merge-rename-directories.sh
@@ -5250,7 +5250,7 @@ test_setup_12l () {
 	)
 }
 
-test_expect_merge_algorithm failure failure '12l (B into A): Rename into each other + add/add conflict' '
+test_expect_merge_algorithm failure success '12l (B into A): Rename into each other + add/add conflict' '
 	test_setup_12l BintoA &&
 	(
 		cd 12l_BintoA &&
@@ -5277,7 +5277,7 @@ test_expect_merge_algorithm failure failure '12l (B into A): Rename into each ot
 	)
 '
 
-test_expect_merge_algorithm failure failure '12l (A into B): Rename into each other + add/add conflict' '
+test_expect_merge_algorithm failure success '12l (A into B): Rename into each other + add/add conflict' '
 	test_setup_12l AintoB &&
 	(
 		cd 12l_AintoB &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v4 0/5] Fix dual rename into each other plus conflicting adds
  2022-07-05  1:33     ` [PATCH v4 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
                         ` (4 preceding siblings ...)
  2022-07-05  1:33       ` [PATCH v4 5/5] merge-ort: fix issue with dual rename and add/add conflict Elijah Newren via GitGitGadget
@ 2022-07-06 16:52       ` Junio C Hamano
  5 siblings, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2022-07-06 16:52 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Elijah Newren, Jonathan Tan, Calvin Wan,
	Ævar Arnfjörð Bjarmason

"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This series adds some testcases based on the tensorflow repository issue
> reported by Glen Choo at [1], demonstrating bugs in both the ort and
> recursive strategies. It also provides a fix for the ort strategy.
>
> Changes since v3:
>
>  * Remove use of for-initializer

I missed them while queuing the previous round and updates look OK.

I however personally find the resulting code irritating to read.
The counter 'i', which never is used for two purposes at the same
time, has multiple "hiding" declarations in this function, in
addition to its top-level declaration.  It forces the readers to
think about which variable each reference of 'i' talks about, and
more importantly, if the value in outer 'i' of its last use, after
an inner 'i' is used, matters.

Will queue.

Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 3/3] merge-ort: fix issue with dual rename and add/add conflict
  2022-06-30  0:05     ` Elijah Newren
@ 2022-07-06 17:25       ` Jonathan Tan
  0 siblings, 0 replies; 45+ messages in thread
From: Jonathan Tan @ 2022-07-06 17:25 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Jonathan Tan, Elijah Newren via GitGitGadget, Git Mailing List

Sorry for getting back to this so late. I don't have any issues with the
patches, but just to close the loop:

Elijah Newren <newren@gmail.com> writes:
> On Mon, Jun 27, 2022 at 11:47 AM Jonathan Tan <jonathantanmy@google.com> wrote:
> >
> > "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > > The testcases added in t6423 a couple commits ago are slightly different
> > > but similar in principle.  They involve a similar case of paired
> > > renaming but instead of A/ -> B/ and B/ -> C/, the second side renames
> > > a leading directory of B/ to C/.
> >
> > Hmm...one side moved sub1 -> sub3 and the other moved sub2 from the root
> > to under sub1, right? So it's more like A/ -> B/ and C/ -> A/C/.
> 
> Hmm, maybe I should have been more explicit in my mapping:
>    A = sub2
>    B = sub1/sub2
>    leading directory of B = sub1
>    C = sub3

Substituting into A/ -> B/ and "a leading directory of B/ to C/", we
get:

  sub2 -> sub1/sub2 and sub1 -> sub3

which is indeed what is happening. I see, thanks.

> > > And both sides add a new file
> > > somewhere under the directory that the other side will rename.
> >
> > Rename or move, I think.
> 
> I'm confused; I don't understand what this comment means.  Renames
> tend to be created using "git mv", before or after making content
> changes, so to me a file being renamed or a file being moved to a
> different location are synonymous.  What distinction are you making
> between renames and moves?

I was thinking that a rename means that the directory still has the same
parent directory, whereas a move means that the directory keeps its
basename but has a different parent directory. Maybe it's just the way I
think about things, but the important thing here is that a directory was
moved so that its parent directory is a directory that is different and
that already exists, and I think that this meaning is lost when we say
"rename". But it might just be me.

> Hmm, perhaps I should change this to:
> 
> /* Cases where we don't have or don't want a directory rename for this
> path, so we return NULL */
> 
> The purpose of this function is to check whether a given path would be
> modified by directory renaming to get a new path.  So, "no new path"
> means we don't have an applicable directory rename or don't want to
> use it.

I see - the new text makes sense.

> > >       if (strmap_empty(dir_renames))
> > >               return new_path;
> > > +     if (strmap_get(&collisions[other_side], path))
> > > +             return new_path;
> >
> > So as far as I understand, collisions here, for a given side, is a map.
> > The map's keys are all the filenames of added and renamed files (I'm
> > assuming that's what 'A' and 'R' are) from that side after any directory
> > moves on the other side are applied. So, for example, if we add "foo/a"
> > on side A and rename "foo/" to "bar/" on side B, then side A's collision
> > map would have a key "bar/a". So I'm not sure if "collision" is the
> > right name here, but the existing code already uses it so I'll leave it
> > be.
> 
> Let's take your example a bit further, to discuss the original kind of
> usecase that "collisions" was written for.  In addition to adding
> "foo/a" on side A, we also add "foo2/a" and "foo3/a".  And then in
> addition to renaming "foo/" to "bar/" on side B, we also rename
> "foo2/" to "bar/" and "foo3/" to "bar/", thus merging all of foo/,
> foo2/, and foo3/ into a single directory named bar/.  If the files in
> foo/, foo2/, and foo3/ on side B were all unique, you can see how
> there'd be no problem merging these directories together.  The problem
> only comes at merge time when you attempt to apply the directory
> renames from side B to the new files on side A.  That's when you get
> collisions, in this case three files that would be named bar/a.
> 
> Checking for such collisions was the purpose of the "collisions"
> metadata, so I think the name is fitting.  The only problem is that
> we're reusing that data now for a slightly different purpose, though I
> think it's still "collision-y".

That makes sense, thanks.

> > It makes sense that this situation (of side A having "bar/a" because
> > side B renamed "foo/" to "bar/", and at the same time, side B adds its
> > own "bar/a") is weird, as stated in the commit message, so I don't mind
> > disabling checking for directory rename in this case.  However, in
> > theory, I don't see how disabling this check would fix anything, since
> > the bug seems to be about two different files on different sides being
> > renamed to the same filename through some convoluted means. (Unless this
> > is the only convoluted means to do it, and disabling it means that the
> > bug wouldn't occur.)

[snip]

> Hmm...let me see if I can explain this a different way.
> 
> The short version of the issue here is that if a directory rename
> wants to rename NEWFILE -> ALTERNATE_NEWFILE, but there is another
> directory rename on the other side of history that wants to rename
> ANOTHER_FILE -> NEWFILE, then we run into problems and have to turn
> off the NEWFILE -> ALTERNATE_NEWFILE.  This check here is all about
> this case.
> 
> To see why this is the problematic setup...
> 
> Now a conflict_info
> stores information about the OIDs and modes for all three sides of the
> merge (i.e. both sides of the merge and the base).  Whenever a rename
> is processed, we have to update this map, because the rename makes the
> conflict_info now apply to a different path.  In the simple cases, the
> conflict_info from the source path needs to be merged with the
> conflict_info for the target path, and the source path's conflict_info
> needs to be marked as null (literally setting the .is_null field to
> 1).  rename/rename and such can make this a bit more complicated.

Ah, I think I was missing this part. The intention is that processing
one side in this way (and thus modifying its conflict_info) would not
affect the processing of any other sides, but there is a bug that makes
it not so. (And the intention is not, say, when processing all sides,
to write the output in a new data structure so that the result is the
same no matter the order.)

So I think the answer to my question is: no, we do not want to be able
to rename two different files on different sides to the same filename
through any convoluted means, and if that happens, it's a bug.

[snip illustration of what happens in either merge order]

> To avoid this order dependence, and the weird multiple-renaming of a
> single path, we just want to turn off directory renames when the
> source of a directory rename on one side is the target of a directory
> rename on the other.  That's what this patch does.
> 
> 
> Does that help?  Or did I make it more confusing?

I think that helped, thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* C99 "for (int ..." form on "master" (was: [PATCH v3 4/5] merge-ort: shuffle the computation and cleanup of potential collisions)
  2022-07-01  9:16       ` Ævar Arnfjörð Bjarmason
@ 2022-07-25 12:00         ` Ævar Arnfjörð Bjarmason
  2022-07-26  2:14           ` Elijah Newren
  0 siblings, 1 reply; 45+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-25 12:00 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Jonathan Tan, Calvin Wan, Elijah Newren, Junio C Hamano,
	Johannes Schindelin


On Fri, Jul 01 2022, Ævar Arnfjörð Bjarmason wrote:

> On Fri, Jul 01 2022, Elijah Newren via GitGitGadget wrote:
> [...]
>> @@ -3106,6 +3105,7 @@ static int detect_and_process_renames(struct merge_options *opt,
>>  {
>>  	struct diff_queue_struct combined = { 0 };
>>  	struct rename_info *renames = &opt->priv->renames;
>> +	struct strmap collisions[3];
>>  	int need_dir_renames, s, i, clean = 1;
>>  	unsigned detection_run = 0;
>>  
>> @@ -3155,12 +3155,22 @@ static int detect_and_process_renames(struct merge_options *opt,
>>  	ALLOC_GROW(combined.queue,
>>  		   renames->pairs[1].nr + renames->pairs[2].nr,
>>  		   combined.alloc);
>> +	for (int i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
>
> The "int i" here will need to be pre-declared earlier, per: 6563706568b
> (CodingGuidelines: give deadline for "for (int i = 0; ...", 2022-03-30)
>
> I also don't mind us just saying "we've waited enough". Junio?

This case got fixed, but per the changed $subject others have snuck
through.

Since be733e12001 (Merge branch 'en/merge-tree', 2022-07-14) we've had
these forms on "master", see 6debb7527b0 (merge-ort: store messages in a
list, not in a single strbuf, 2022-06-18) and cb2607759e2 (merge-ort:
store more specific conflict information, 2022-06-18).

We could "fix" those, but per the above I think it's just as valid to
just move up the deadline & say that 2.38.0 will have a hard dependency
on this C99 feature...

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: C99 "for (int ..." form on "master" (was: [PATCH v3 4/5] merge-ort: shuffle the computation and cleanup of potential collisions)
  2022-07-25 12:00         ` C99 "for (int ..." form on "master" (was: [PATCH v3 4/5] merge-ort: shuffle the computation and cleanup of potential collisions) Ævar Arnfjörð Bjarmason
@ 2022-07-26  2:14           ` Elijah Newren
  2022-07-26  4:48             ` C99 "for (int ..." form on "master" Junio C Hamano
  0 siblings, 1 reply; 45+ messages in thread
From: Elijah Newren @ 2022-07-26  2:14 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Jonathan Tan,
	Calvin Wan, Junio C Hamano, Johannes Schindelin

On Mon, Jul 25, 2022 at 5:04 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Fri, Jul 01 2022, Ævar Arnfjörð Bjarmason wrote:
>
> > On Fri, Jul 01 2022, Elijah Newren via GitGitGadget wrote:
> > [...]
> >> @@ -3106,6 +3105,7 @@ static int detect_and_process_renames(struct merge_options *opt,
> >>  {
> >>      struct diff_queue_struct combined = { 0 };
> >>      struct rename_info *renames = &opt->priv->renames;
> >> +    struct strmap collisions[3];
> >>      int need_dir_renames, s, i, clean = 1;
> >>      unsigned detection_run = 0;
> >>
> >> @@ -3155,12 +3155,22 @@ static int detect_and_process_renames(struct merge_options *opt,
> >>      ALLOC_GROW(combined.queue,
> >>                 renames->pairs[1].nr + renames->pairs[2].nr,
> >>                 combined.alloc);
> >> +    for (int i = MERGE_SIDE1; i <= MERGE_SIDE2; i++) {
> >
> > The "int i" here will need to be pre-declared earlier, per: 6563706568b
> > (CodingGuidelines: give deadline for "for (int i = 0; ...", 2022-03-30)
> >
> > I also don't mind us just saying "we've waited enough". Junio?
>
> This case got fixed, but per the changed $subject others have snuck
> through.
>
> Since be733e12001 (Merge branch 'en/merge-tree', 2022-07-14) we've had
> these forms on "master", see 6debb7527b0 (merge-ort: store messages in a
> list, not in a single strbuf, 2022-06-18) and cb2607759e2 (merge-ort:
> store more specific conflict information, 2022-06-18).
>
> We could "fix" those, but per the above I think it's just as valid to
> just move up the deadline & say that 2.38.0 will have a hard dependency
> on this C99 feature...

Thanks for catching this and bringing it up; sorry for missing it earlier.

6563706568b says we should "revisit this *around* November 2022"
(emphasis added).  2.38 will be released in October 2022.  So, maybe
it's fine as-is.

But if others prefer these be fixed over moving up the deadline, I'll
go ahead and submit a patch.

I guess we just need a call.  Junio?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: C99 "for (int ..." form on "master"
  2022-07-26  2:14           ` Elijah Newren
@ 2022-07-26  4:48             ` Junio C Hamano
  0 siblings, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2022-07-26  4:48 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Ævar Arnfjörð Bjarmason,
	Elijah Newren via GitGitGadget, Git Mailing List, Jonathan Tan,
	Calvin Wan, Johannes Schindelin

Elijah Newren <newren@gmail.com> writes:

> Thanks for catching this and bringing it up; sorry for missing it earlier.
>
> 6563706568b says we should "revisit this *around* November 2022"
> (emphasis added).  2.38 will be released in October 2022.  So, maybe
> it's fine as-is.
>
> But if others prefer these be fixed over moving up the deadline, I'll
> go ahead and submit a patch.
>
> I guess we just need a call.  Junio?

I do not see a need to do anything special at this moment.

When somebody reports that their favorite compiler does not yet grok
the syntax, we can ask them to ensure that there are only two places
that need "fixing" for them to compile.  It would have been ideal if
we kept the "weather balloon" to the only one known place, but the
second one would not be a disaster.

We would re-evaluate the situation when such a report comes, but
even if we decide to stay away from the syntax a bit longer, we know
exactly where the only two places we need to revert are.

One thing we should absolutely avoid is to open the floodgate and
deliberately add more of them, just for the sake of adding more of
them.

Thanks.

P.S. Tomorrow is that "gitster goes offline every other Tuesday"
day.

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2022-07-26  4:48 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-22  4:20 [PATCH 0/3] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
2022-06-22  4:20 ` [PATCH 1/3] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
2022-06-27 18:20   ` Jonathan Tan
2022-06-30  0:06     ` Elijah Newren
2022-06-30 22:32       ` Jonathan Tan
2022-07-01  2:57         ` Elijah Newren
2022-06-27 22:30   ` Calvin Wan
2022-06-30  0:07     ` Elijah Newren
2022-06-22  4:20 ` [PATCH 2/3] merge-ort: shuffle the computation and cleanup of potential collisions Elijah Newren via GitGitGadget
2022-06-27 18:48   ` Jonathan Tan
2022-06-27 21:04     ` Calvin Wan
2022-06-30  0:05       ` Elijah Newren
2022-06-22  4:20 ` [PATCH 3/3] merge-ort: fix issue with dual rename and add/add conflict Elijah Newren via GitGitGadget
2022-06-27 18:47   ` Jonathan Tan
2022-06-30  0:05     ` Elijah Newren
2022-07-06 17:25       ` Jonathan Tan
2022-06-22  4:36 ` [PATCH 0/3] Fix dual rename into each other plus conflicting adds Elijah Newren
2022-06-30  6:57 ` [PATCH v2 " Elijah Newren via GitGitGadget
2022-06-30  6:57   ` [PATCH v2 1/3] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
2022-06-30 10:21     ` Ævar Arnfjörð Bjarmason
2022-07-01  2:57       ` Elijah Newren
2022-07-01  9:29         ` Ævar Arnfjörð Bjarmason
2022-06-30  6:57   ` [PATCH v2 2/3] merge-ort: shuffle the computation and cleanup of potential collisions Elijah Newren via GitGitGadget
2022-06-30 10:28     ` Ævar Arnfjörð Bjarmason
2022-07-01  3:02       ` Elijah Newren
2022-06-30  6:57   ` [PATCH v2 3/3] merge-ort: fix issue with dual rename and add/add conflict Elijah Newren via GitGitGadget
2022-06-30 10:31     ` Ævar Arnfjörð Bjarmason
2022-07-01  3:03       ` Elijah Newren
2022-07-01  5:23   ` [PATCH v3 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
2022-07-01  5:23     ` [PATCH v3 1/5] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
2022-07-01  5:23     ` [PATCH v3 2/5] merge-ort: small cleanups of check_for_directory_rename Elijah Newren via GitGitGadget
2022-07-01  5:23     ` [PATCH v3 3/5] merge-ort: make a separate function for freeing struct collisions Elijah Newren via GitGitGadget
2022-07-01  5:23     ` [PATCH v3 4/5] merge-ort: shuffle the computation and cleanup of potential collisions Elijah Newren via GitGitGadget
2022-07-01  9:16       ` Ævar Arnfjörð Bjarmason
2022-07-25 12:00         ` C99 "for (int ..." form on "master" (was: [PATCH v3 4/5] merge-ort: shuffle the computation and cleanup of potential collisions) Ævar Arnfjörð Bjarmason
2022-07-26  2:14           ` Elijah Newren
2022-07-26  4:48             ` C99 "for (int ..." form on "master" Junio C Hamano
2022-07-01  5:23     ` [PATCH v3 5/5] merge-ort: fix issue with dual rename and add/add conflict Elijah Newren via GitGitGadget
2022-07-05  1:33     ` [PATCH v4 0/5] Fix dual rename into each other plus conflicting adds Elijah Newren via GitGitGadget
2022-07-05  1:33       ` [PATCH v4 1/5] t6423: add tests of dual directory rename plus add/add conflict Elijah Newren via GitGitGadget
2022-07-05  1:33       ` [PATCH v4 2/5] merge-ort: small cleanups of check_for_directory_rename Elijah Newren via GitGitGadget
2022-07-05  1:33       ` [PATCH v4 3/5] merge-ort: make a separate function for freeing struct collisions Elijah Newren via GitGitGadget
2022-07-05  1:33       ` [PATCH v4 4/5] merge-ort: shuffle the computation and cleanup of potential collisions Elijah Newren via GitGitGadget
2022-07-05  1:33       ` [PATCH v4 5/5] merge-ort: fix issue with dual rename and add/add conflict Elijah Newren via GitGitGadget
2022-07-06 16:52       ` [PATCH v4 0/5] Fix dual rename into each other plus conflicting adds Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).