git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 00/11] merge-ort: add basic rename detection
@ 2020-12-09 19:41 Elijah Newren via GitGitGadget
  2020-12-09 19:41 ` [PATCH 01/11] merge-ort: add basic data structures for handling renames Elijah Newren via GitGitGadget
                   ` (11 more replies)
  0 siblings, 12 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-09 19:41 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

This series builds on en/merge-ort-2 and adds basic rename detection to
merge-ort.

The first five patches set up the basic algorithm structure. Patches 6-10
implement different types of rename-related cases, one by one. Patch 11 also
implements a rename-related case, but one that merge-recursive doesn't
really handle.

A couple of early patches mimic or even copy from merge-recursive, but in later
patches the implementation here diverges heavily from merge-recursive's.
Patches 6-10 refer to this repeatedly, all having a slight variant of the
following paragraph in their commit messages:

The consolidation of $NUM separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally.  This means we're much less likely to miss special
handling of some combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.

This patch series does not make more tests pass under
GIT_TEST_MERGE_ALGORITHM=ort by itself, because once renames are handled
then content merging needs to happen and that code still does a die("Not yet
implemented."). I'll soon be submitting parallel patches for more basic
conflict handling and recursiveness, and when all three series are merged
down (in any order), it will drop the number of test failures under
GIT_TEST_MERGE_ALGORITHM=ort from 1448 to 60.

Elijah Newren (11):
  merge-ort: add basic data structures for handling renames
  merge-ort: add initial outline for basic rename detection
  merge-ort: implement detect_regular_renames()
  merge-ort: implement compare_pairs() and collect_renames()
  merge-ort: add basic outline for process_renames()
  merge-ort: add implementation of both sides renaming identically
  merge-ort: add implementation of both sides renaming differently
  merge-ort: add implementation of rename collisions
  merge-ort: add implementation of rename/delete conflicts
  merge-ort: add implementation of normal rename handling
  merge-ort: add implementation of type-changed rename handling

 merge-ort.c | 439 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 423 insertions(+), 16 deletions(-)


base-commit: 2f73290465428ae9d088819b8a07bc5c4efe4a8b
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-812%2Fnewren%2Fort-renames-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-812/newren/ort-renames-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/812
-- 
gitgitgadget


* [PATCH 01/11] merge-ort: add basic data structures for handling renames
  2020-12-09 19:41 [PATCH 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
@ 2020-12-09 19:41 ` Elijah Newren via GitGitGadget
  2020-12-11  2:03   ` Derrick Stolee
  2020-12-09 19:41 ` [PATCH 02/11] merge-ort: add initial outline for basic rename detection Elijah Newren via GitGitGadget
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-09 19:41 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

This will grow later, but we only need a few fields for basic rename
handling.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index ef143348592..90baedac407 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -29,6 +29,25 @@
 #include "unpack-trees.h"
 #include "xdiff-interface.h"
 
+struct rename_info {
+	/*
+	 * pairs: pairing of filenames from diffcore_rename()
+	 *
+	 * Index 1 and 2 correspond to sides 1 & 2 as used in
+	 * conflict_info.stages.  Index 0 unused.
+	 */
+	struct diff_queue_struct pairs[3];
+
+	/*
+	 * needed_limit: value needed for inexact rename detection to run
+	 *
+	 * If the current rename limit wasn't high enough for inexact
+	 * rename detection to run, this records the limit needed.  Otherwise,
+	 * this value remains 0.
+	 */
+	int needed_limit;
+};
+
 struct merge_options_internal {
 	/*
 	 * paths: primary data structure in all of merge ort.
@@ -96,6 +115,11 @@ struct merge_options_internal {
 	 */
 	struct strmap output;
 
+	/*
+	 * renames: various data relating to rename detection
+	 */
+	struct rename_info *renames;
+
 	/*
 	 * current_dir_name: temporary var used in collect_merge_info_callback()
 	 *
@@ -1356,6 +1380,7 @@ static void merge_start(struct merge_options *opt, struct merge_result *result)
 
 	/* Initialization of opt->priv, our internal merge data */
 	opt->priv = xcalloc(1, sizeof(*opt->priv));
+	opt->priv->renames = xcalloc(1, sizeof(*opt->priv->renames));
 
 	/*
 	 * Although we initialize opt->priv->paths with strdup_strings=0,
-- 
gitgitgadget



* [PATCH 02/11] merge-ort: add initial outline for basic rename detection
  2020-12-09 19:41 [PATCH 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
  2020-12-09 19:41 ` [PATCH 01/11] merge-ort: add basic data structures for handling renames Elijah Newren via GitGitGadget
@ 2020-12-09 19:41 ` Elijah Newren via GitGitGadget
  2020-12-11  2:39   ` Derrick Stolee
  2020-12-09 19:41 ` [PATCH 03/11] merge-ort: implement detect_regular_renames() Elijah Newren via GitGitGadget
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-09 19:41 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 60 insertions(+), 8 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 90baedac407..92b765dd3f0 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -617,20 +617,72 @@ static int handle_content_merge(struct merge_options *opt,
 
 /*** Function Grouping: functions related to regular rename detection ***/
 
+static int process_renames(struct merge_options *opt,
+			   struct diff_queue_struct *renames)
+{
+	die("Not yet implemented.");
+}
+
+static int compare_pairs(const void *a_, const void *b_)
+{
+	die("Not yet implemented.");
+}
+
+/* Call diffcore_rename() to compute which files have changed on given side */
+static void detect_regular_renames(struct merge_options *opt,
+				   struct tree *merge_base,
+				   struct tree *side,
+				   unsigned side_index)
+{
+	die("Not yet implemented.");
+}
+
+/*
+ * Get information of all renames which occurred in 'side_pairs', discarding
+ * non-renames.
+ */
+static int collect_renames(struct merge_options *opt,
+			   struct diff_queue_struct *result,
+			   unsigned side_index)
+{
+	die("Not yet implemented.");
+}
+
 static int detect_and_process_renames(struct merge_options *opt,
 				      struct tree *merge_base,
 				      struct tree *side1,
 				      struct tree *side2)
 {
-	int clean = 1;
+	struct diff_queue_struct combined;
+	struct rename_info *renames = opt->priv->renames;
+	int s, clean = 1;
+
+	memset(&combined, 0, sizeof(combined));
+
+	detect_regular_renames(opt, merge_base, side1, 1);
+	detect_regular_renames(opt, merge_base, side2, 2);
+
+	ALLOC_GROW(combined.queue,
+		   renames->pairs[1].nr + renames->pairs[2].nr,
+		   combined.alloc);
+	clean &= collect_renames(opt, &combined, 1);
+	clean &= collect_renames(opt, &combined, 2);
+	QSORT(combined.queue, combined.nr, compare_pairs);
+
+	clean &= process_renames(opt, &combined);
+
+	/* Free memory for renames->pairs[] and combined */
+	for (s = 1; s <= 2; s++) {
+		free(renames->pairs[s].queue);
+		DIFF_QUEUE_CLEAR(&renames->pairs[s]);
+	}
+	if (combined.nr) {
+		int i;
+		for (i = 0; i < combined.nr; i++)
+			diff_free_filepair(combined.queue[i]);
+		free(combined.queue);
+	}
 
-	/*
-	 * Rename detection works by detecting file similarity.  Here we use
-	 * a really easy-to-implement scheme: files are similar IFF they have
-	 * the same filename.  Therefore, by this scheme, there are no renames.
-	 *
-	 * TODO: Actually implement a real rename detection scheme.
-	 */
 	return clean;
 }
 
-- 
gitgitgadget



* [PATCH 03/11] merge-ort: implement detect_regular_renames()
  2020-12-09 19:41 [PATCH 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
  2020-12-09 19:41 ` [PATCH 01/11] merge-ort: add basic data structures for handling renames Elijah Newren via GitGitGadget
  2020-12-09 19:41 ` [PATCH 02/11] merge-ort: add initial outline for basic rename detection Elijah Newren via GitGitGadget
@ 2020-12-09 19:41 ` Elijah Newren via GitGitGadget
  2020-12-11  2:54   ` Derrick Stolee
  2020-12-09 19:41 ` [PATCH 04/11] merge-ort: implement compare_pairs() and collect_renames() Elijah Newren via GitGitGadget
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-09 19:41 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Based heavily on merge-recursive's get_diffpairs() function.
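The bookkeeping in detect_regular_renames() is small but easy to miss: a rename limit of zero or less is replaced by 1000 before diffcore_std() runs, and the largest needed_rename_limit reported by either side is kept so that merge_switch_to_result() can warn once. A minimal sketch of just that logic follows; effective_rename_limit() and note_needed_limit() are hypothetical helpers for illustration, not functions in merge-ort.c.

```c
#include <assert.h>

/*
 * Hypothetical helper: mirrors the patch's fallback to a limit of 1000
 * when the configured rename limit is zero or negative.
 */
static int effective_rename_limit(int configured)
{
	return configured <= 0 ? 1000 : configured;
}

/*
 * Hypothetical helper: remember the largest limit any side needed.
 * A stored value of 0 means inexact rename detection ran on every side.
 */
static void note_needed_limit(int *needed_limit, int needed_for_this_side)
{
	if (needed_for_this_side > *needed_limit)
		*needed_limit = needed_for_this_side;
}
```

After both sides have been diffed, the remembered maximum is the value the patch hands to diff_warn_rename_limit().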

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 92b765dd3f0..1ff637e57af 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -634,7 +634,33 @@ static void detect_regular_renames(struct merge_options *opt,
 				   struct tree *side,
 				   unsigned side_index)
 {
-	die("Not yet implemented.");
+	struct diff_options diff_opts;
+	struct rename_info *renames = opt->priv->renames;
+
+	repo_diff_setup(opt->repo, &diff_opts);
+	diff_opts.flags.recursive = 1;
+	diff_opts.flags.rename_empty = 0;
+	diff_opts.detect_rename = DIFF_DETECT_RENAME;
+	diff_opts.rename_limit = opt->rename_limit;
+	if (opt->rename_limit <= 0)
+		diff_opts.rename_limit = 1000;
+	diff_opts.rename_score = opt->rename_score;
+	diff_opts.show_rename_progress = opt->show_rename_progress;
+	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
+	diff_setup_done(&diff_opts);
+	diff_tree_oid(&merge_base->object.oid, &side->object.oid, "",
+		      &diff_opts);
+	diffcore_std(&diff_opts);
+
+	if (diff_opts.needed_rename_limit > opt->priv->renames->needed_limit)
+		opt->priv->renames->needed_limit = diff_opts.needed_rename_limit;
+
+	renames->pairs[side_index] = diff_queued_diff;
+
+	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
+	diff_queued_diff.nr = 0;
+	diff_queued_diff.queue = NULL;
+	diff_flush(&diff_opts);
 }
 
 /*
@@ -1379,6 +1405,10 @@ void merge_switch_to_result(struct merge_options *opt,
 			printf("%s", sb->buf);
 		}
 		string_list_clear(&olist, 0);
+
+		/* Also include needed rename limit adjustment now */
+		diff_warn_rename_limit("merge.renamelimit",
+				       opti->renames->needed_limit, 0);
 	}
 
 	merge_finalize(opt, result);
-- 
gitgitgadget



* [PATCH 04/11] merge-ort: implement compare_pairs() and collect_renames()
  2020-12-09 19:41 [PATCH 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
                   ` (2 preceding siblings ...)
  2020-12-09 19:41 ` [PATCH 03/11] merge-ort: implement detect_regular_renames() Elijah Newren via GitGitGadget
@ 2020-12-09 19:41 ` Elijah Newren via GitGitGadget
  2020-12-11  3:00   ` Derrick Stolee
  2020-12-09 19:41 ` [PATCH 05/11] merge-ort: add basic outline for process_renames() Elijah Newren via GitGitGadget
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-09 19:41 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 1ff637e57af..3cdf8124b85 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -625,7 +625,13 @@ static int process_renames(struct merge_options *opt,
 
 static int compare_pairs(const void *a_, const void *b_)
 {
-	die("Not yet implemented.");
+	const struct diff_filepair *a = *((const struct diff_filepair **)a_);
+	const struct diff_filepair *b = *((const struct diff_filepair **)b_);
+
+	int cmp = strcmp(a->one->path, b->one->path);
+	if (cmp)
+		return cmp;
+	return a->score - b->score;
 }
 
 /* Call diffcore_rename() to compute which files have changed on given side */
@@ -671,7 +677,24 @@ static int collect_renames(struct merge_options *opt,
 			   struct diff_queue_struct *result,
 			   unsigned side_index)
 {
-	die("Not yet implemented.");
+	int i, clean = 1;
+	struct diff_queue_struct *side_pairs;
+	struct rename_info *renames = opt->priv->renames;
+
+	side_pairs = &renames->pairs[side_index];
+
+	for (i = 0; i < side_pairs->nr; ++i) {
+		struct diff_filepair *p = side_pairs->queue[i];
+
+		if (p->status != 'R') {
+			diff_free_filepair(p);
+			continue;
+		}
+		p->score = side_index;
+		result->queue[result->nr++] = p;
+	}
+
+	return clean;
 }
 
 static int detect_and_process_renames(struct merge_options *opt,
-- 
gitgitgadget
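Sorting is what lets process_renames() spot rename/rename(1to1) and rename/rename(1to2) cases cheaply: collect_renames() keeps only 'R' pairs and tags each with its side in p->score, and compare_pairs() then orders the combined queue by source path and then by side, so a source renamed on both sides lands in two adjacent entries. A simplified sketch of that step is below; struct rename_pair is a hypothetical stand-in for struct diff_filepair, and the sketch skips the diff_free_filepair() calls merge-ort makes for discarded pairs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct diff_filepair: only the fields used. */
struct rename_pair {
	const char *src;   /* like p->one->path */
	const char *dst;   /* like p->two->path */
	char status;       /* 'R' for a rename; 'M', 'A', 'D', ... otherwise */
	int side;          /* like p->score after collection: 1 or 2 */
};

/* Same ordering as compare_pairs(): by source path, then by side. */
static int compare_rename_pairs(const void *a_, const void *b_)
{
	const struct rename_pair *a = *(const struct rename_pair *const *)a_;
	const struct rename_pair *b = *(const struct rename_pair *const *)b_;
	int cmp = strcmp(a->src, b->src);

	if (cmp)
		return cmp;
	return a->side - b->side;
}

/*
 * Like collect_renames(): keep only renames and tag each with its side.
 * 'out' must have room for n more pointers; returns the new count.
 */
static size_t collect_side(struct rename_pair **out, size_t out_nr,
			   struct rename_pair *side_pairs, size_t n, int side)
{
	for (size_t i = 0; i < n; i++) {
		if (side_pairs[i].status != 'R')
			continue;
		side_pairs[i].side = side;
		out[out_nr++] = &side_pairs[i];
	}
	return out_nr;
}
```

Once the array is sorted, checking whether entry i+1 has the same source path as entry i, as process_renames() does, is enough to find renames made on both sides. Subtracting the side numbers in the comparator is safe because they are only ever 1 or 2.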



* [PATCH 05/11] merge-ort: add basic outline for process_renames()
  2020-12-09 19:41 [PATCH 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
                   ` (3 preceding siblings ...)
  2020-12-09 19:41 ` [PATCH 04/11] merge-ort: implement compare_pairs() and collect_renames() Elijah Newren via GitGitGadget
@ 2020-12-09 19:41 ` Elijah Newren via GitGitGadget
  2020-12-11  3:24   ` Derrick Stolee
  2020-12-09 19:41 ` [PATCH 06/11] merge-ort: add implementation of both sides renaming identically Elijah Newren via GitGitGadget
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-09 19:41 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Add code which determines which kind of special rename case each rename
corresponds to, but leave the handling of each type unimplemented for
now.  Future commits will implement each one.

There is some tenuous resemblance to merge-recursive's
process_renames(), but comparing the two is very unlikely to yield any
insights.  merge-ort's process_renames() is a bit complex, and I would
like to simplify it further, but in my opinion it is far easier to grok
than merge-recursive's function of the same name.  Plus, merge-ort
handles more rename conflict types than merge-recursive does.
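The case selection in process_renames() depends only on two filemasks (bit 0 for the merge base, bit 1 for side 1, bit 2 for side 2), on which side did the rename, and on whether the file is a regular file at each end of the rename. A minimal sketch of that decision, assuming those simplified inputs, is below; classify_rename() and enum rename_kind are hypothetical names, and the rename/rename cases are left out because they are detected earlier from adjacent sorted entries.

```c
#include <assert.h>

/*
 * Hypothetical labels for the branches of process_renames().
 * filemask bits: 1 = merge base, 2 = side 1, 4 = side 2.
 */
enum rename_kind {
	RENAME_NORMAL,
	RENAME_DELETE,               /* rename/delete */
	RENAME_VS_TYPECHANGE,        /* rename vs. typechange */
	RENAME_COLLISION,            /* rename/add or rename/rename(2to1) */
	RENAME_COLLISION_DELETE,     /* rename/add/delete, 2to1/delete */
	RENAME_TYPECHANGE_COLLISION  /* still die()s in this patch */
};

static enum rename_kind classify_rename(unsigned old_filemask,
					unsigned new_filemask,
					int target_index,
					int old_is_regular, /* oldpath, other side */
					int new_is_regular) /* newpath, target side */
{
	int other_source_index, source_deleted, collision, type_changed;
	unsigned old_sidemask;

	assert(target_index == 1 || target_index == 2);
	other_source_index = 3 - target_index;
	old_sidemask = 1u << other_source_index;         /* 2 or 4 */
	source_deleted = (old_filemask == 1);            /* only in base */
	collision = (new_filemask & old_sidemask) != 0;  /* newpath on other side */
	type_changed = !source_deleted && (old_is_regular != new_is_regular);

	if (type_changed && collision)
		return RENAME_TYPECHANGE_COLLISION;
	if (collision)
		return source_deleted ? RENAME_COLLISION_DELETE : RENAME_COLLISION;
	if (type_changed)
		return RENAME_VS_TYPECHANGE;
	if (source_deleted)
		return RENAME_DELETE;
	return RENAME_NORMAL;
}
```

For example, if side 1 renames a.c to b.c while side 2 keeps a.c (old filemask 1|4) and nobody else creates b.c (new filemask 2), the result is RENAME_NORMAL; if side 2 had also added b.c (new filemask 2|4), it becomes RENAME_COLLISION.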

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 97 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 3cdf8124b85..faec29db955 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -620,7 +620,103 @@ static int handle_content_merge(struct merge_options *opt,
 static int process_renames(struct merge_options *opt,
 			   struct diff_queue_struct *renames)
 {
-	die("Not yet implemented.");
+	int clean_merge = 1, i;
+
+	for (i = 0; i < renames->nr; ++i) {
+		const char *oldpath = NULL, *newpath;
+		struct diff_filepair *pair = renames->queue[i];
+		struct conflict_info *oldinfo = NULL, *newinfo = NULL;
+		struct strmap_entry *old_ent, *new_ent;
+		unsigned int old_sidemask;
+		int target_index, other_source_index;
+		int source_deleted, collision, type_changed;
+
+		old_ent = strmap_get_entry(&opt->priv->paths, pair->one->path);
+		oldpath = old_ent->key;
+		oldinfo = old_ent->value;
+
+		new_ent = strmap_get_entry(&opt->priv->paths, pair->two->path);
+		newpath = new_ent->key;
+		newinfo = new_ent->value;
+
+		/*
+		 * diff_filepairs have copies of pathnames, thus we have to
+		 * use standard 'strcmp()' (negated) instead of '=='.
+		 */
+		if (i+1 < renames->nr &&
+		    !strcmp(oldpath, renames->queue[i+1]->one->path)) {
+			/* Handle rename/rename(1to2) or rename/rename(1to1) */
+			const char *pathnames[3];
+
+			pathnames[0] = oldpath;
+			pathnames[1] = newpath;
+			pathnames[2] = renames->queue[i+1]->two->path;
+
+			if (!strcmp(pathnames[1], pathnames[2])) {
+				/* Both sides renamed the same way. */
+				die("Not yet implemented");
+
+				/* We handled both renames, i.e. i+1 handled */
+				i++;
+				/* Move to next rename */
+				continue;
+			}
+
+			/* This is a rename/rename(1to2) */
+			die("Not yet implemented");
+
+			i++; /* We handled both renames, i.e. i+1 handled */
+			continue;
+		}
+
+		VERIFY_CI(oldinfo);
+		VERIFY_CI(newinfo);
+		target_index = pair->score; /* from collect_renames() */
+		assert(target_index == 1 || target_index == 2);
+		other_source_index = 3-target_index;
+		old_sidemask = (1 << other_source_index); /* 2 or 4 */
+		source_deleted = (oldinfo->filemask == 1);
+		collision = ((newinfo->filemask & old_sidemask) != 0);
+		type_changed = !source_deleted &&
+			(S_ISREG(oldinfo->stages[other_source_index].mode) !=
+			 S_ISREG(newinfo->stages[target_index].mode));
+		if (type_changed && collision) {
+			/* special handling so later blocks can handle this */
+			die("Not yet implemented");
+		}
+
+		assert(source_deleted || oldinfo->filemask & old_sidemask);
+
+		/* Need to check for special types of rename conflicts... */
+		if (collision && !source_deleted) {
+			/* collision: rename/add or rename/rename(2to1) */
+			die("Not yet implemented");
+		} else if (collision && source_deleted) {
+			/* rename/add/delete or rename/rename(2to1)/delete */
+			die("Not yet implemented");
+		} else {
+			/* a few different cases... */
+			if (type_changed) {
+				/* rename vs. typechange */
+				die("Not yet implemented");
+			} else if (source_deleted) {
+				/* rename/delete */
+				die("Not yet implemented");
+			} else {
+				/* normal rename */
+				die("Not yet implemented");
+			}
+		}
+
+		if (!type_changed) {
+			/* Mark the original as resolved by removal */
+			oldinfo->merged.is_null = 1;
+			oldinfo->merged.clean = 1;
+		}
+
+	}
+
+	return clean_merge;
 }
 
 static int compare_pairs(const void *a_, const void *b_)
-- 
gitgitgadget



* [PATCH 06/11] merge-ort: add implementation of both sides renaming identically
  2020-12-09 19:41 [PATCH 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
                   ` (4 preceding siblings ...)
  2020-12-09 19:41 ` [PATCH 05/11] merge-ort: add basic outline for process_renames() Elijah Newren via GitGitGadget
@ 2020-12-09 19:41 ` Elijah Newren via GitGitGadget
  2020-12-11  3:32   ` Derrick Stolee
  2020-12-09 19:41 ` [PATCH 07/11] merge-ort: add implementation of both sides renaming differently Elijah Newren via GitGitGadget
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-09 19:41 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Implement rename/rename(1to1) handling, i.e. both sides of history
renaming a file and renaming it the same way.  This code replaces the
following from merge-recursive.c:

  * all the 1to1 code in process_renames()
  * the RENAME_ONE_FILE_TO_ONE case of process_entry()

Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):

  * handle_rename_normal()
  * setup_rename_conflict_info()

The consolidation of four separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally.  This means we're much less likely to miss special
handling of some combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.
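Because of that design, the rename/rename(1to1) branch needs only a few field updates: once the renamed path receives the base version as its stage 0, it looks like an ordinary three-way entry to process_entry(), and the old path is simply resolved by removal. A sketch of those updates follows, with hypothetical simplified structs (struct version, struct path_entry) in place of merge-ort's version_info and conflict_info, and a string standing in for an object ID.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for version_info; a string replaces the oid. */
struct version {
	unsigned mode;
	const char *oid;
};

/* Hypothetical stand-in for the conflict_info fields touched here. */
struct path_entry {
	struct version stages[3]; /* 0 = base, 1 = side 1, 2 = side 2 */
	unsigned filemask;        /* bit n set => stages[n] is present */
	int merged_is_null;
	int merged_clean;
};

/*
 * Both sides renamed 'base' to the same path 'target': give the target
 * the base version as stage 0 and resolve the old path by removal.
 */
static void handle_rename_1to1(struct path_entry *base,
			       struct path_entry *target)
{
	memcpy(&target->stages[0], &base->stages[0],
	       sizeof(target->stages[0]));
	target->filemask |= (1 << 0);
	base->merged_is_null = 1;
	base->merged_clean = 1;
}
```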

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index faec29db955..085e81196a5 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -647,14 +647,31 @@ static int process_renames(struct merge_options *opt,
 		    !strcmp(oldpath, renames->queue[i+1]->one->path)) {
 			/* Handle rename/rename(1to2) or rename/rename(1to1) */
 			const char *pathnames[3];
+			struct version_info merged;
+			struct conflict_info *base, *side1, *side2;
+			unsigned was_binary_blob = 0;
 
 			pathnames[0] = oldpath;
 			pathnames[1] = newpath;
 			pathnames[2] = renames->queue[i+1]->two->path;
 
+			base = strmap_get(&opt->priv->paths, pathnames[0]);
+			side1 = strmap_get(&opt->priv->paths, pathnames[1]);
+			side2 = strmap_get(&opt->priv->paths, pathnames[2]);
+
+			VERIFY_CI(base);
+			VERIFY_CI(side1);
+			VERIFY_CI(side2);
+
 			if (!strcmp(pathnames[1], pathnames[2])) {
-				/* Both sides renamed the same way. */
-				die("Not yet implemented");
+				/* Both sides renamed the same way */
+				assert(side1 == side2);
+				memcpy(&side1->stages[0], &base->stages[0],
+				       sizeof(merged));
+				side1->filemask |= (1 << 0);
+				/* Mark base as resolved by removal */
+				base->merged.is_null = 1;
+				base->merged.clean = 1;
 
 				/* We handled both renames, i.e. i+1 handled */
 				i++;
-- 
gitgitgadget



* [PATCH 07/11] merge-ort: add implementation of both sides renaming differently
  2020-12-09 19:41 [PATCH 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
                   ` (5 preceding siblings ...)
  2020-12-09 19:41 ` [PATCH 06/11] merge-ort: add implementation of both sides renaming identically Elijah Newren via GitGitGadget
@ 2020-12-09 19:41 ` Elijah Newren via GitGitGadget
  2020-12-11  3:39   ` Derrick Stolee
  2020-12-09 19:41 ` [PATCH 08/11] merge-ort: add implementation of rename collisions Elijah Newren via GitGitGadget
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-09 19:41 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Implement rename/rename(1to2) handling, i.e. both sides of history
renaming a file and renaming it differently.  This code replaces the
following from merge-recursive.c:

  * all the 1to2 code in process_renames()
  * the RENAME_ONE_FILE_TO_TWO case of process_entry()
  * handle_rename_rename_1to2()

Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):

  * handle_file_collision()
  * setup_rename_conflict_info()

The consolidation of five separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally.  This means we're much less likely to miss special
handling of some combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.

To be fair, there is a _slight_ tweak to process_entry() here to make
sure that the two different paths aren't marked as clean but are left in
a conflicted state.  So process_renames() and process_entry() aren't
quite entirely orthogonal, but they are pretty close.
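The subtle part of the rename/rename(1to2) branch is the binary-blob case. handle_content_merge() cannot merge binaries, so on conflict it takes one side's version; the patch detects this by comparing the result with side 1's version, and then leaves side 2's own version at side 2's path instead of copying side 1's contents there. A minimal sketch of that decision is below, with a hypothetical struct version (a string in place of struct object_id) and a hypothetical store_1to2_results() helper.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for version_info; a string replaces the oid. */
struct version {
	unsigned mode;
	const char *oid;
};

/*
 * Store the merge result at both new paths.  If the merge was unclean
 * and simply returned side 1's version (what happens for binaries),
 * side 2's path keeps its own version rather than a copy of side 1's.
 */
static void store_1to2_results(struct version *side1_stage,
			       struct version *side2_stage,
			       struct version merged, int clean)
{
	int was_binary_blob = !clean &&
		merged.mode == side1_stage->mode &&
		strcmp(merged.oid, side1_stage->oid) == 0;

	*side1_stage = merged;
	if (was_binary_blob)
		merged = *side2_stage;
	*side2_stage = merged;
}
```

In the patch, both new paths are also flagged with path_conflict, which is what keeps process_entry() from reporting them as clean.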

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 085e81196a5..75e638a23eb 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -680,7 +680,59 @@ static int process_renames(struct merge_options *opt,
 			}
 
 			/* This is a rename/rename(1to2) */
-			die("Not yet implemented");
+			clean_merge = handle_content_merge(opt,
+							   pair->one->path,
+							   &base->stages[0],
+							   &side1->stages[1],
+							   &side2->stages[2],
+							   pathnames,
+							   1 + 2 * opt->priv->call_depth,
+							   &merged);
+			if (!clean_merge &&
+			    merged.mode == side1->stages[1].mode &&
+			    oideq(&merged.oid, &side1->stages[1].oid)) {
+				was_binary_blob = 1;
+			}
+			memcpy(&side1->stages[1], &merged, sizeof(merged));
+			if (was_binary_blob) {
+				/*
+				 * Getting here means we were attempting to
+				 * merge a binary blob.
+				 *
+				 * Since we can't merge binaries,
+				 * handle_content_merge() just takes one
+				 * side.  But we don't want to copy the
+				 * contents of one side to both paths.  We
+				 * used the contents of side1 above for
+				 * side1->stages, let's use the contents of
+				 * side2 for side2->stages below.
+				 */
+				oidcpy(&merged.oid, &side2->stages[2].oid);
+				merged.mode = side2->stages[2].mode;
+			}
+			memcpy(&side2->stages[2], &merged, sizeof(merged));
+
+			side1->path_conflict = 1;
+			side2->path_conflict = 1;
+			/*
+			 * TODO: For renames we normally remove the path at the
+			 * old name.  It would thus seem consistent to do the
+			 * same for rename/rename(1to2) cases, but we haven't
+			 * done so traditionally and a number of the regression
+			 * tests now encode an expectation that the file is
+			 * left there at stage 1.  If we ever decide to change
+			 * this, add the following two lines here:
+			 *    base->merged.is_null = 1;
+			 *    base->merged.clean = 1;
+			 * and remove the setting of base->path_conflict to 1.
+			 */
+			base->path_conflict = 1;
+			path_msg(opt, oldpath, 0,
+				 _("CONFLICT (rename/rename): %s renamed to "
+				   "%s in %s and to %s in %s."),
+				 pathnames[0],
+				 pathnames[1], opt->branch1,
+				 pathnames[2], opt->branch2);
 
 			i++; /* We handled both renames, i.e. i+1 handled */
 			continue;
@@ -1257,13 +1309,13 @@ static void process_entry(struct merge_options *opt,
 		int side = (ci->filemask == 4) ? 2 : 1;
 		ci->merged.result.mode = ci->stages[side].mode;
 		oidcpy(&ci->merged.result.oid, &ci->stages[side].oid);
-		ci->merged.clean = !ci->df_conflict;
+		ci->merged.clean = !ci->df_conflict && !ci->path_conflict;
 	} else if (ci->filemask == 1) {
 		/* Deleted on both sides */
 		ci->merged.is_null = 1;
 		ci->merged.result.mode = 0;
 		oidcpy(&ci->merged.result.oid, &null_oid);
-		ci->merged.clean = 1;
+		ci->merged.clean = !ci->path_conflict;
 	}
 
 	/*
-- 
gitgitgadget



* [PATCH 08/11] merge-ort: add implementation of rename collisions
  2020-12-09 19:41 [PATCH 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
                   ` (6 preceding siblings ...)
  2020-12-09 19:41 ` [PATCH 07/11] merge-ort: add implementation of both sides renaming differently Elijah Newren via GitGitGadget
@ 2020-12-09 19:41 ` Elijah Newren via GitGitGadget
  2020-12-09 19:41 ` [PATCH 09/11] merge-ort: add implementation of rename/delete conflicts Elijah Newren via GitGitGadget
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-09 19:41 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Implement rename/rename(2to1) and rename/add handling, i.e. a file is
renamed into a location where another file is added (with that other
file either being a plain add or itself coming from a rename).  Note
that rename collisions can also have a special case stacked on top: the
file being renamed on one side of history is deleted on the other
(yielding either a rename/add/delete conflict or perhaps a
rename/rename(2to1)/delete[/delete] conflict).

One thing to note here is that when there is a double rename, the code
in question only handles one of them at a time; a later iteration
through the loop will handle the other.  After they've both been
handled, process_entry()'s normal add/add code can handle the collision.
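Handling one rename at a time works because of two details of the collision branch: the content merge for a rename reads the base and the non-renaming side at the old path and the renaming side at the new path, and the result is stored only in the renaming side's stage of the colliding path. A sketch of both details follows, with hypothetical names (struct collision_target, collision_merge_paths(), store_collision_result()) and strings standing in for blobs.

```c
#include <assert.h>

/* Hypothetical stand-in for the conflict_info of the colliding path. */
struct collision_target {
	const char *stages[3]; /* 0 = base, 1 = side 1, 2 = side 2 */
	unsigned filemask;     /* bit n set => stages[n] present */
};

/*
 * Paths whose versions feed the content merge for one colliding rename:
 * base and the non-renaming side at oldpath, the renaming side at
 * newpath (as in the pathnames[] setup in the patch).
 */
static void collision_merge_paths(const char *pathnames[3],
				  const char *oldpath, const char *newpath,
				  int target_index)
{
	assert(target_index == 1 || target_index == 2);
	pathnames[0] = oldpath;
	pathnames[3 - target_index] = oldpath;
	pathnames[target_index] = newpath;
}

/*
 * One loop iteration stores its merge result only in the renaming side's
 * stage; the other stage is left for the other rename or the plain add.
 */
static void store_collision_result(struct collision_target *newinfo,
				   int target_index, const char *merged)
{
	assert(target_index == 1 || target_index == 2);
	newinfo->stages[target_index] = merged;
}
```

For a rename/rename(2to1) where side 1 renames a.c to c.c and side 2 renames b.c to c.c, the first iteration fills stage 1 and the second fills stage 2; with no base version at c.c, process_entry() then sees what looks like an add/add conflict.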

This code replaces the following from merge-recursive.c:

  * all the 2to1 code in process_renames()
  * the RENAME_TWO_FILES_TO_ONE case of process_entry()
  * handle_rename_rename_2to1()
  * handle_rename_add()

Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):

  * handle_file_collision()
  * setup_rename_conflict_info()

The consolidation of six separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally.  This means we're much less likely to miss special
handling of some combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 51 insertions(+), 3 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 75e638a23eb..4ec6b0701f1 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -759,10 +759,58 @@ static int process_renames(struct merge_options *opt,
 		/* Need to check for special types of rename conflicts... */
 		if (collision && !source_deleted) {
 			/* collision: rename/add or rename/rename(2to1) */
-			die("Not yet implemented");
+			const char *pathnames[3];
+			struct version_info merged;
+
+			struct conflict_info *base, *side1, *side2;
+			unsigned clean;
+
+			pathnames[0] = oldpath;
+			pathnames[other_source_index] = oldpath;
+			pathnames[target_index] = newpath;
+
+			base = strmap_get(&opt->priv->paths, pathnames[0]);
+			side1 = strmap_get(&opt->priv->paths, pathnames[1]);
+			side2 = strmap_get(&opt->priv->paths, pathnames[2]);
+
+			VERIFY_CI(base);
+			VERIFY_CI(side1);
+			VERIFY_CI(side2);
+
+			clean = handle_content_merge(opt, pair->one->path,
+						     &base->stages[0],
+						     &side1->stages[1],
+						     &side2->stages[2],
+						     pathnames,
+						     1 + 2*opt->priv->call_depth,
+						     &merged);
+
+			memcpy(&newinfo->stages[target_index], &merged,
+			       sizeof(merged));
+			if (!clean) {
+				path_msg(opt, newpath, 0,
+					 _("CONFLICT (rename involved in "
+					   "collision): rename of %s -> %s has "
+					   "content conflicts AND collides "
+					   "with another path; this may result "
+					   "in nested conflict markers."),
+					 oldpath, newpath);
+			}
 		} else if (collision && source_deleted) {
-			/* rename/add/delete or rename/rename(2to1)/delete */
-			die("Not yet implemented");
+			/*
+			 * rename/add/delete or rename/rename(2to1)/delete:
+			 * since oldpath was deleted on the side that didn't
+			 * do the rename, there's not much of a content merge
+			 * we can do for the rename.  oldinfo->merged.is_null
+			 * was already set, so we just leave things as-is so
+			 * they look like an add/add conflict.
+			 */
+
+			newinfo->path_conflict = 1;
+			path_msg(opt, newpath, 0,
+				 _("CONFLICT (rename/delete): %s renamed "
+				   "to %s in %s, but deleted in %s."),
+				 oldpath, newpath, rename_branch, delete_branch);
 		} else {
 			/* a few different cases... */
 			if (type_changed) {
-- 
gitgitgadget


* [PATCH 09/11] merge-ort: add implementation of rename/delete conflicts
  2020-12-09 19:41 [PATCH 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
                   ` (7 preceding siblings ...)
  2020-12-09 19:41 ` [PATCH 08/11] merge-ort: add implementation of rename collisions Elijah Newren via GitGitGadget
@ 2020-12-09 19:41 ` Elijah Newren via GitGitGadget
  2020-12-09 19:41 ` [PATCH 10/11] merge-ort: add implementation of normal rename handling Elijah Newren via GitGitGadget
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-09 19:41 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Implement rename/delete conflicts, i.e. one side renames a file and the
other deletes the file.  This code replaces the following from
merge-recursive.c:

  * the code relevant to RENAME_DELETE in process_renames()
  * the RENAME_DELETE case of process_entry()
  * handle_rename_delete()

Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):

  * handle_change_delete()
  * setup_rename_conflict_info()

The consolidation of five separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally.  This means we're much less likely to miss the special
handling needed for some combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.

To be fair, there is a _slight_ tweak to process_entry() here, because
rename/delete cases will also trigger the modify/delete codepath.
However, we only want a modify/delete message to be printed for a
rename/delete conflict if there is a content change in the renamed file
in addition to the rename.  So process_renames() and process_entry()
aren't quite fully orthogonal, but they are pretty close.
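The _slight_ tweak described above amounts to a small predicate. This hedged sketch is not the code in the patch. struct fake_oid is a hypothetical stand-in for struct object_id, memcmp() stands in for oideq(), and want_modify_delete_msg() is an invented name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for git's struct object_id (20-byte SHA-1). */
struct fake_oid {
	unsigned char hash[20];
};

/*
 * Should a "CONFLICT (modify/delete)" notice be printed for a path that
 * survives on only one side?  If process_renames() already flagged a
 * rename/delete (path_conflict) and the surviving side's content equals
 * the base, the rename/delete message already covered it.
 */
static int want_modify_delete_msg(int path_conflict,
				  const struct fake_oid *base,
				  const struct fake_oid *side)
{
	int unchanged = !memcmp(base->hash, side->hash, sizeof(base->hash));

	return !(path_conflict && unchanged);
}
```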

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 47 +++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 39 insertions(+), 8 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 4ec6b0701f1..412a3b1da76 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -630,6 +630,7 @@ static int process_renames(struct merge_options *opt,
 		unsigned int old_sidemask;
 		int target_index, other_source_index;
 		int source_deleted, collision, type_changed;
+		const char *rename_branch = NULL, *delete_branch = NULL;
 
 		old_ent = strmap_get_entry(&opt->priv->paths, pair->one->path);
 		oldpath = old_ent->key;
@@ -752,6 +753,14 @@ static int process_renames(struct merge_options *opt,
 		if (type_changed && collision) {
 			/* special handling so later blocks can handle this */
 			die("Not yet implemented");
+		if (source_deleted) {
+			if (target_index == 1) {
+				rename_branch = opt->branch1;
+				delete_branch = opt->branch2;
+			} else {
+				rename_branch = opt->branch2;
+				delete_branch = opt->branch1;
+			}
 		}
 
 		assert(source_deleted || oldinfo->filemask & old_sidemask);
@@ -812,13 +821,26 @@ static int process_renames(struct merge_options *opt,
 				   "to %s in %s, but deleted in %s."),
 				 oldpath, newpath, rename_branch, delete_branch);
 		} else {
-			/* a few different cases... */
+			/*
+			 * a few different cases...start by copying the
+			 * existing stage(s) from oldinfo over the newinfo
+			 * and update the pathname(s).
+			 */
+			memcpy(&newinfo->stages[0], &oldinfo->stages[0],
+			       sizeof(newinfo->stages[0]));
+			newinfo->filemask |= (1 << 0);
+			newinfo->pathnames[0] = oldpath;
 			if (type_changed) {
 				/* rename vs. typechange */
 				die("Not yet implemented");
 			} else if (source_deleted) {
 				/* rename/delete */
-				die("Not yet implemented");
+				newinfo->path_conflict = 1;
+				path_msg(opt, newpath, 0,
+					 _("CONFLICT (rename/delete): %s renamed"
+					   " to %s in %s, but deleted in %s."),
+					 oldpath, newpath,
+					 rename_branch, delete_branch);
 			} else {
 				/* normal rename */
 				die("Not yet implemented");
@@ -1346,12 +1368,21 @@ static void process_entry(struct merge_options *opt,
 		modify_branch = (side == 1) ? opt->branch1 : opt->branch2;
 		delete_branch = (side == 1) ? opt->branch2 : opt->branch1;
 
-		path_msg(opt, path, 0,
-			 _("CONFLICT (modify/delete): %s deleted in %s "
-			   "and modified in %s.  Version %s of %s left "
-			   "in tree."),
-			 path, delete_branch, modify_branch,
-			 modify_branch, path);
+		if (ci->path_conflict &&
+		    oideq(&ci->stages[0].oid, &ci->stages[side].oid)) {
+			/*
+			 * This came from a rename/delete; no action to take,
+			 * but avoid printing "modify/delete" conflict notice
+			 * since the contents were not modified.
+			 */
+		} else {
+			path_msg(opt, path, 0,
+				 _("CONFLICT (modify/delete): %s deleted in %s "
+				   "and modified in %s.  Version %s of %s left "
+				   "in tree."),
+				 path, delete_branch, modify_branch,
+				 modify_branch, path);
+		}
 	} else if (ci->filemask == 2 || ci->filemask == 4) {
 		/* Added on one side */
 		int side = (ci->filemask == 4) ? 2 : 1;
-- 
gitgitgadget


* [PATCH 10/11] merge-ort: add implementation of normal rename handling
  2020-12-09 19:41 [PATCH 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
                   ` (8 preceding siblings ...)
  2020-12-09 19:41 ` [PATCH 09/11] merge-ort: add implementation of rename/delete conflicts Elijah Newren via GitGitGadget
@ 2020-12-09 19:41 ` Elijah Newren via GitGitGadget
  2020-12-09 19:41 ` [PATCH 11/11] merge-ort: add implementation of type-changed " Elijah Newren via GitGitGadget
  2020-12-14 16:21 ` [PATCH v2 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
  11 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-09 19:41 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Implement handling of normal renames.  This code replaces the following
from merge-recursive.c:

  * the code relevant to RENAME_NORMAL in process_renames()
  * the RENAME_NORMAL case of process_entry()

Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):

  * handle_rename_normal()
  * setup_rename_conflict_info()

The consolidation of four separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally.  This means we're much less likely to miss the special
handling needed for some combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.

(To be fair, the code for handling normal renames wasn't all that
complicated beforehand, but it's still much simpler now.)

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 412a3b1da76..f2e4edf6506 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -843,7 +843,11 @@ static int process_renames(struct merge_options *opt,
 					 rename_branch, delete_branch);
 			} else {
 				/* normal rename */
-				die("Not yet implemented");
+				memcpy(&newinfo->stages[other_source_index],
+				       &oldinfo->stages[other_source_index],
+				       sizeof(newinfo->stages[0]));
+				newinfo->filemask |= (1 << other_source_index);
+				newinfo->pathnames[other_source_index] = oldpath;
 			}
 		}
 
-- 
gitgitgadget


* [PATCH 11/11] merge-ort: add implementation of type-changed rename handling
  2020-12-09 19:41 [PATCH 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
                   ` (9 preceding siblings ...)
  2020-12-09 19:41 ` [PATCH 10/11] merge-ort: add implementation of normal rename handling Elijah Newren via GitGitGadget
@ 2020-12-09 19:41 ` Elijah Newren via GitGitGadget
  2020-12-14 16:21 ` [PATCH v2 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
  11 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-09 19:41 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Implement cases where renames are involved in type changes (i.e. the
side of history that didn't rename the file changed its type from a
regular file to a symlink or submodule).  There was some code to handle
this in merge-recursive, but only in the special case when the renamed
file had no content changes.  The code here works differently -- it
knows process_entry() can handle mode conflicts, so it does a few
minimal tweaks to ensure process_entry() can just finish the job as
needed.
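The type_changed test that sends a rename down this path can be tried in isolation. This sketch assumes the usual Git mode values: 0100644 or 0100755 for regular files, 0120000 for symlinks, and 0160000 for submodules. It also assumes a POSIX <sys/stat.h> whose S_ISREG() matches those values. rename_type_changed() is a hypothetical name.

```c
#include <assert.h>
#include <sys/stat.h>

/*
 * Hypothetical restatement of the patch's test:
 *   type_changed = !source_deleted &&
 *       (S_ISREG(mode at oldpath on the non-renaming side) !=
 *        S_ISREG(mode at newpath on the renaming side));
 * Symlinks (0120000) and submodules (0160000) are not S_ISREG().
 */
static int rename_type_changed(int source_deleted,
			       unsigned other_side_mode_at_oldpath,
			       unsigned renamed_mode_at_newpath)
{
	return !source_deleted &&
		(S_ISREG(other_side_mode_at_oldpath) !=
		 S_ISREG(renamed_mode_at_newpath));
}
```

A side that replaced the file with a symlink (0120000), while the other side renamed the regular file, yields 1. A mere 0100644 vs. 0100755 difference yields 0.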

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index f2e4edf6506..64b23c8aa2a 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -752,7 +752,32 @@ static int process_renames(struct merge_options *opt,
 			 S_ISREG(newinfo->stages[target_index].mode));
 		if (type_changed && collision) {
 			/* special handling so later blocks can handle this */
-			die("Not yet implemented");
+			/*
+			 * if type_changed && collision are both true, then this
+			 * was really a double rename, but one side wasn't
+			 * detected due to lack of break detection.  I.e.
+			 * something like
+			 *    orig: has normal file 'foo'
+			 *    side1: renames 'foo' to 'bar', adds 'foo' symlink
+			 *    side2: renames 'foo' to 'bar'
+			 * In this case, the foo->bar rename on side1 won't be
+			 * detected because the new symlink named 'foo' is
+			 * there and we don't do break detection.  But we detect
+			 * this here because we don't want to merge the content
+			 * of the foo symlink with the foo->bar file, so we
+			 * have some logic to handle this special case.  The
+			 * easiest way to do that is make 'bar' on side1 not
+			 * be considered a colliding file but the other part
+			 * of a normal rename.  If the file is very different,
+			 * well we're going to get content merge conflicts
+			 * anyway so it doesn't hurt.  And if the colliding
+			 * file also has a different type, that'll be handled
+			 * by the content merge logic in process_entry() too.
+			 *
+			 * See also t6430, 'rename vs. rename/symlink'
+			 */
+			collision = 0;
+		}
 		if (source_deleted) {
 			if (target_index == 1) {
 				rename_branch = opt->branch1;
@@ -832,7 +857,11 @@ static int process_renames(struct merge_options *opt,
 			newinfo->pathnames[0] = oldpath;
 			if (type_changed) {
 				/* rename vs. typechange */
-				die("Not yet implemented");
+				/* Mark the original as resolved by removal */
+				memcpy(&oldinfo->stages[0].oid, &null_oid,
+				       sizeof(oldinfo->stages[0].oid));
+				oldinfo->stages[0].mode = 0;
+				oldinfo->filemask &= 0x06;
 			} else if (source_deleted) {
 				/* rename/delete */
 				newinfo->path_conflict = 1;
-- 
gitgitgadget

* Re: [PATCH 01/11] merge-ort: add basic data structures for handling renames
  2020-12-09 19:41 ` [PATCH 01/11] merge-ort: add basic data structures for handling renames Elijah Newren via GitGitGadget
@ 2020-12-11  2:03   ` Derrick Stolee
  2020-12-11  9:41     ` Elijah Newren
  0 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee @ 2020-12-11  2:03 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren

On 12/9/2020 2:41 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> This will grow later, but we only need a few fields for basic rename
> handling.

Perhaps these things will be extremely clear as the patch
series continues, but...

> +struct rename_info {
> +	/*
> +	 * pairs: pairing of filenames from diffcore_rename()
> +	 *
> +	 * Index 1 and 2 correspond to sides 1 & 2 as used in
> +	 * conflict_info.stages.  Index 0 unused.

Hm. This seems wasteful. I'm sure that you have a reason to use
index 0 in the future instead of just avoiding instances of [i-1]
indexes.

> +	 */
> +	struct diff_queue_struct pairs[3];
> +
> +	/*
> +	 * needed_limit: value needed for inexact rename detection to run
> +	 *
> +	 * If the current rename limit wasn't high enough for inexact
> +	 * rename detection to run, this records the limit needed.  Otherwise,
> +	 * this value remains 0.
> +	 */
> +	int needed_limit;
> +};
> +
>  struct merge_options_internal {
>  	/*
>  	 * paths: primary data structure in all of merge ort.
> @@ -96,6 +115,11 @@ struct merge_options_internal {
>  	 */
>  	struct strmap output;
>  
> +	/*
> +	 * renames: various data relating to rename detection
> +	 */
> +	struct rename_info *renames;
> +

And here, you create this as a pointer, but...
>  	/* Initialization of opt->priv, our internal merge data */
>  	opt->priv = xcalloc(1, sizeof(*opt->priv));
> +	opt->priv->renames = xcalloc(1, sizeof(*opt->priv->renames));

...unconditionally allocate it here. Perhaps there are other cases
where 'struct merge_options_internal' is allocated without the renames
member?

Searching merge-ort.c at this point does not appear to have any
other allocations of opt->priv or struct merge_options_internal.
Perhaps it would be best to include struct rename_info not as a
pointer?

If you do have a reason to keep it as a pointer, then perhaps it
should be freed in clear_internal_opts()?
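The following minimal sketch shows what embedding the rename data by value would look like. It uses simplified, hypothetical stand-in structs, and calloc() takes the place of git's xcalloc(). A single zeroing allocation of the parent also initializes the embedded member. No second allocation is needed, and no matching free in clear_internal_opts() either. In the real code, callers would take its address as &opt->priv->renames.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-ins for the structs quoted above. */
struct rename_info_sketch {
	int needed_limit;
};

struct merge_options_internal_sketch {
	/* Embedded by value rather than pointed to. */
	struct rename_info_sketch renames;
};

/* One zeroing allocation covers the embedded member as well. */
static struct merge_options_internal_sketch *new_internal_opts(void)
{
	return calloc(1, sizeof(struct merge_options_internal_sketch));
}
```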

Thanks,
-Stolee

* Re: [PATCH 02/11] merge-ort: add initial outline for basic rename detection
  2020-12-09 19:41 ` [PATCH 02/11] merge-ort: add initial outline for basic rename detection Elijah Newren via GitGitGadget
@ 2020-12-11  2:39   ` Derrick Stolee
  2020-12-11  9:40     ` Elijah Newren
  2020-12-13  7:47     ` Elijah Newren
  0 siblings, 2 replies; 65+ messages in thread
From: Derrick Stolee @ 2020-12-11  2:39 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren

On 12/9/2020 2:41 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  merge-ort.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 60 insertions(+), 8 deletions(-)
> 
> diff --git a/merge-ort.c b/merge-ort.c
> index 90baedac407..92b765dd3f0 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -617,20 +617,72 @@ static int handle_content_merge(struct merge_options *opt,
>  
>  /*** Function Grouping: functions related to regular rename detection ***/
>  
> +static int process_renames(struct merge_options *opt,
> +			   struct diff_queue_struct *renames)
> +static int compare_pairs(const void *a_, const void *b_)
> +/* Call diffcore_rename() to compute which files have changed on given side */
> +static void detect_regular_renames(struct merge_options *opt,
> +				   struct tree *merge_base,
> +				   struct tree *side,
> +				   unsigned side_index)
> +static int collect_renames(struct merge_options *opt,
> +			   struct diff_queue_struct *result,
> +			   unsigned side_index)

standard "I promise this will follow soon!" strategy, OK.

>  static int detect_and_process_renames(struct merge_options *opt,
>  				      struct tree *merge_base,
>  				      struct tree *side1,
>  				      struct tree *side2)
>  {
> -	int clean = 1;
> +	struct diff_queue_struct combined;
> +	struct rename_info *renames = opt->priv->renames;

(Re: my concerns that we don't need 'renames' to be a pointer,
this could easily be "renames = &opt->priv->renames;")

> +	int s, clean = 1;
> +
> +	memset(&combined, 0, sizeof(combined));
> +
> +	detect_regular_renames(opt, merge_base, side1, 1);
> +	detect_regular_renames(opt, merge_base, side2, 2);

Find the renames in each side's diff.

I think the use of "1" and "2" here might be better situated
for an enum. Perhaps:

enum merge_side {
	MERGE_SIDE1 = 0,
	MERGE_SIDE2 = 1,
};

(Note, I shift these values to 0 and 1, respectively, allowing
us to truncate the pairs array to two entries while still
being mentally clear.)

> +
> +	ALLOC_GROW(combined.queue,
> +		   renames->pairs[1].nr + renames->pairs[2].nr,
> +		   combined.alloc);
> +	clean &= collect_renames(opt, &combined, 1);
> +	clean &= collect_renames(opt, &combined, 2);

Magic numbers again.

> +	QSORT(combined.queue, combined.nr, compare_pairs);
> +
> +	clean &= process_renames(opt, &combined);

I need to mentally remember that "clean" is a return state,
but _not_ a fail/success result. Even though we are using
"&=" here, it shouldn't be "&&=" or even "if (method()) return 1;"

Looking at how "clean" is used in struct merge_result, I
wonder if there is a reason to use an "int" over a simple
"unsigned" or even "unsigned clean:1;" You use -1 in places
as well as a case of "mi->clean = !!resolved;"

If there is more meaning to values other than "clean" or
"!clean", then an enum might be valuable.
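This small sketch shows why the distinction matters. A hypothetical step() function stands in for calls like collect_renames() and process_renames(). First, '&=' never short-circuits, so every step still runs after a conflict. Second, on two's-complement platforms '1 & -1' is 1, so a -1 error value folded in this way is indistinguishable from a clean result.

```c
#include <assert.h>

static int calls;

/* Hypothetical step: returns 1 (clean), 0 (conflict) or -1 (error). */
static int step(int result)
{
	calls++;
	return result;
}

/* Accumulate "clean" with '&=', as detect_and_process_renames() does. */
static int accumulate(int r1, int r2)
{
	int clean = 1;

	clean &= step(r1);	/* always evaluated */
	clean &= step(r2);	/* still evaluated after a 0 above */
	return clean;
}
```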

> +	/* Free memory for renames->pairs[] and combined */
> +	for (s = 1; s <= 2; s++) {
> +		free(renames->pairs[s].queue);
> +		DIFF_QUEUE_CLEAR(&renames->pairs[s]);
> +	}

This loop is particularly unusual. Perhaps it would be
better to do this instead:

	free(renames->pairs[MERGE_SIDE1].queue);
	free(renames->pairs[MERGE_SIDE2].queue);
	DIFF_QUEUE_CLEAR(&renames->pairs[MERGE_SIDE1]);
	DIFF_QUEUE_CLEAR(&renames->pairs[MERGE_SIDE2]);

> +	if (combined.nr) {
> +		int i;
> +		for (i = 0; i < combined.nr; i++)
> +			diff_free_filepair(combined.queue[i]);
> +		free(combined.queue);
> +	}
>  
> -	/*
> -	 * Rename detection works by detecting file similarity.  Here we use
> -	 * a really easy-to-implement scheme: files are similar IFF they have
> -	 * the same filename.  Therefore, by this scheme, there are no renames.
> -	 *
> -	 * TODO: Actually implement a real rename detection scheme.
> -	 */
>  	return clean;

I notice that this change causes detect_and_process_renames() to
change from an "unhelpful result, but success" to "die() always".

I wonder if there is value in swapping the order of the patches
to implement the static methods first. Of course, you hit the
"unreferenced static method" problem, so maybe your strategy is
better after all.

Thanks,
-Stolee

* Re: [PATCH 03/11] merge-ort: implement detect_regular_renames()
  2020-12-09 19:41 ` [PATCH 03/11] merge-ort: implement detect_regular_renames() Elijah Newren via GitGitGadget
@ 2020-12-11  2:54   ` Derrick Stolee
  2020-12-11 17:38     ` Elijah Newren
  0 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee @ 2020-12-11  2:54 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren

On 12/9/2020 2:41 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Based heavily on merge-recursive's get_diffpairs() function.

(You're not kidding, and I should have looked here before making
some comments below.)

> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  merge-ort.c | 32 +++++++++++++++++++++++++++++++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/merge-ort.c b/merge-ort.c
> index 92b765dd3f0..1ff637e57af 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -634,7 +634,33 @@ static void detect_regular_renames(struct merge_options *opt,
>  				   struct tree *side,
>  				   unsigned side_index)
>  {
> -	die("Not yet implemented.");
> +	struct diff_options diff_opts;
> +	struct rename_info *renames = opt->priv->renames;
> +
> +	repo_diff_setup(opt->repo, &diff_opts);
> +	diff_opts.flags.recursive = 1;
> +	diff_opts.flags.rename_empty = 0;
> +	diff_opts.detect_rename = DIFF_DETECT_RENAME;
> +	diff_opts.rename_limit = opt->rename_limit;

I assume that opt->rename_limit has been initialized properly
against merge.renameLimit/diff.renameLimit in another location...

> +	if (opt->rename_limit <= 0)
> +		diff_opts.rename_limit = 1000;

(I made the following comments before thinking to look at
get_diffpairs() which behaves in an equivalent way with this
"1000" constant limit. I'm not sure if there is a reason why
this limit is different from the _other_ limits I discovered,
but it might still be good to reduce magic literal ints by
grouping this "1000" into a const or macro.)

...and this just assigns the default again. Why is this done
here instead of inside the diff machinery? Also, wouldn't a
diff.renameLimit = 0 imply no renames, not "use default"?

I notice that the docs don't make this explicit:

diff.renameLimit::
	The number of files to consider when performing the copy/rename
	detection; equivalent to the 'git diff' option `-l`. This setting
	has no effect if rename detection is turned off.

but also too_many_rename_candidates() has this strange
default check:

	/*
	 * This basically does a test for the rename matrix not
	 * growing larger than a "rename_limit" square matrix, ie:
	 *
	 *    num_create * num_src > rename_limit * rename_limit
	 */
	if (rename_limit <= 0)
		rename_limit = 32767;

this is... a much larger limit than I would think is reasonable.

Of course, diff_rename_limit_default is set to 400 inside diff.c.
Should that be extracted as a constant so we can repeat it here?
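Here is a sketch of the suggestion to name these literals, using hypothetical macro and function names. The quadratic check restates the comment quoted from too_many_rename_candidates(). It uses 64-bit arithmetic so the products cannot overflow. This is not the actual diffcore-rename.c code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the literals discussed above. */
#define MERGE_DEFAULT_RENAME_LIMIT 1000
#define RENAME_LIMIT_FALLBACK 32767

/* Mirrors "if (opt->rename_limit <= 0) diff_opts.rename_limit = 1000;" */
static int effective_merge_rename_limit(int configured)
{
	return configured <= 0 ? MERGE_DEFAULT_RENAME_LIMIT : configured;
}

/* num_create * num_src > rename_limit * rename_limit, without overflow */
static int rename_matrix_too_large(int num_create, int num_src,
				   int rename_limit)
{
	if (rename_limit <= 0)
		rename_limit = RENAME_LIMIT_FALLBACK;
	return (uint64_t)num_create * (uint64_t)num_src >
	       (uint64_t)rename_limit * (uint64_t)rename_limit;
}
```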

> +	diff_opts.rename_score = opt->rename_score;
> +	diff_opts.show_rename_progress = opt->show_rename_progress;
> +	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
> +	diff_setup_done(&diff_opts);
> +	diff_tree_oid(&merge_base->object.oid, &side->object.oid, "",
> +		      &diff_opts);
> +	diffcore_std(&diff_opts);
> +
> +	if (diff_opts.needed_rename_limit > opt->priv->renames->needed_limit)
> +		opt->priv->renames->needed_limit = diff_opts.needed_rename_limit;
> +
> +	renames->pairs[side_index] = diff_queued_diff;
> +
> +	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
> +	diff_queued_diff.nr = 0;
> +	diff_queued_diff.queue = NULL;
> +	diff_flush(&diff_opts);
>  }
>  
>  /*
> @@ -1379,6 +1405,10 @@ void merge_switch_to_result(struct merge_options *opt,
>  			printf("%s", sb->buf);
>  		}
>  		string_list_clear(&olist, 0);
> +
> +		/* Also include needed rename limit adjustment now */
> +		diff_warn_rename_limit("merge.renamelimit",
> +				       opti->renames->needed_limit, 0);

I suppose this new call is appropriate in this patch, since you assign
the value inside detect_regular_renames(), but it might be good to
describe its presence in the commit message.

Thanks,
-Stolee


* Re: [PATCH 04/11] merge-ort: implement compare_pairs() and collect_renames()
  2020-12-09 19:41 ` [PATCH 04/11] merge-ort: implement compare_pairs() and collect_renames() Elijah Newren via GitGitGadget
@ 2020-12-11  3:00   ` Derrick Stolee
  2020-12-11 18:43     ` Elijah Newren
  0 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee @ 2020-12-11  3:00 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren

On 12/9/2020 2:41 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>

Perhaps it is worth pointing out the similarity to score_compare().

> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  merge-ort.c | 27 +++++++++++++++++++++++++--
>  1 file changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/merge-ort.c b/merge-ort.c
> index 1ff637e57af..3cdf8124b85 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -625,7 +625,13 @@ static int process_renames(struct merge_options *opt,
>  
>  static int compare_pairs(const void *a_, const void *b_)
>  {
> -	die("Not yet implemented.");
> +	const struct diff_filepair *a = *((const struct diff_filepair **)a_);
> +	const struct diff_filepair *b = *((const struct diff_filepair **)b_);
> +
> +	int cmp = strcmp(a->one->path, b->one->path);
> +	if (cmp)
> +		return cmp;
> +	return a->score - b->score;

Hm. I wasn't sure what would happen when subtracting these
"unsigned short" scores, but I see that score_compare() does
the same. Any potential for an existing, hidden bug here?
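This minimal sketch suggests there is no hidden bug here. It assumes score is an unsigned short, as described above, and that int is wider than short, which holds on common platforms. Both operands undergo integer promotion to int before the subtraction. The result is therefore an ordinary signed value rather than a wrapped unsigned one. In this series the field only carries the side index 1 or 2 anyway. struct pair_sketch is a hypothetical stand-in for struct diff_filepair.

```c
#include <assert.h>

/* Hypothetical stand-in for the score field of struct diff_filepair. */
struct pair_sketch {
	unsigned short score;
};

/*
 * Both unsigned short operands are promoted to int first, so this is a
 * signed difference in [-65535, 65535] when int is wider than short.
 */
static int score_diff(const struct pair_sketch *a, const struct pair_sketch *b)
{
	return a->score - b->score;
}
```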

Thanks,
-Stolee

* Re: [PATCH 05/11] merge-ort: add basic outline for process_renames()
  2020-12-09 19:41 ` [PATCH 05/11] merge-ort: add basic outline for process_renames() Elijah Newren via GitGitGadget
@ 2020-12-11  3:24   ` Derrick Stolee
  2020-12-11 20:03     ` Elijah Newren
  0 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee @ 2020-12-11  3:24 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren

On 12/9/2020 2:41 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Add code which determines which kind of special rename case each rename
> corresponds to, but leave the handling of each type unimplemented for
> now.  Future commits will implement each one.
> 
> There is some tenuous resemblance to merge-recursive's
> process_renames(), but comparing the two is very unlikely to yield any
> insights.  merge-ort's process_renames() is a bit complex and I would
> prefer if I could simplify it more, but it is far easier to grok than
> merge-recursive's function of the same name in my opinion.  Plus,
> merge-ort handles more rename conflict types than merge-recursive does.
> 
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  merge-ort.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 97 insertions(+), 1 deletion(-)
> 
> diff --git a/merge-ort.c b/merge-ort.c
> index 3cdf8124b85..faec29db955 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -620,7 +620,103 @@ static int handle_content_merge(struct merge_options *opt,
>  static int process_renames(struct merge_options *opt,
>  			   struct diff_queue_struct *renames)
>  {
> -	die("Not yet implemented.");
> +	int clean_merge = 1, i;
> +
> +	for (i = 0; i < renames->nr; ++i) {
> +		const char *oldpath = NULL, *newpath;

This "= NULL" is not necessary, since you initialize it to
old_ent->key unconditionally.

> +		struct diff_filepair *pair = renames->queue[i];
> +		struct conflict_info *oldinfo = NULL, *newinfo = NULL;

These, too.

> +		struct strmap_entry *old_ent, *new_ent;
> +		unsigned int old_sidemask;
> +		int target_index, other_source_index;
> +		int source_deleted, collision, type_changed;
> +
> +		old_ent = strmap_get_entry(&opt->priv->paths, pair->one->path);
> +		oldpath = old_ent->key;
> +		oldinfo = old_ent->value;
> +
> +		new_ent = strmap_get_entry(&opt->priv->paths, pair->two->path);
> +		newpath = new_ent->key;
> +		newinfo = new_ent->value;

This is moving data around. I wonder if there is any possibility that
old_ent or new_ent could be NULL here, and we should check for that?
(The "any possibility" is probably "is there a chance of a bug in the
earlier logic that didn't cause a failure yet, but would cause a SEGFAULT
here?".)

> +		/*
> +		 * diff_filepairs have copies of pathnames, thus we have to
> +		 * use standard 'strcmp()' (negated) instead of '=='.
> +		 */
> +		if (i+1 < renames->nr &&

nit: I tend to prefer "i + 1".

> +		    !strcmp(oldpath, renames->queue[i+1]->one->path)) {
> +			/* Handle rename/rename(1to2) or rename/rename(1to1) */
> +			const char *pathnames[3];
> +
> +			pathnames[0] = oldpath;
> +			pathnames[1] = newpath;
> +			pathnames[2] = renames->queue[i+1]->two->path;
> +
> +			if (!strcmp(pathnames[1], pathnames[2])) {
> +				/* Both sides renamed the same way. */
> +				die("Not yet implemented");
> +
> +				/* We handled both renames, i.e. i+1 handled */
> +				i++;
> +				/* Move to next rename */
> +				continue;
> +			}
> +
> +			/* This is a rename/rename(1to2) */
> +			die("Not yet implemented");

Interesting that you chose to do some internal logic to split this
case, but have both die(). Perhaps that is wise, but also this could
have been a die() at the start, along with the pathnames[] initialization
in a later patch that implements the 1to1 case (leaving the 1to2 case
to die()).
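The i+1 lookahead only works because detect_and_process_renames() sorts the combined queue with compare_pairs(). That sort orders by source path first, then by pair->score, which carries the side index. As a result, renames of the same source from both sides end up adjacent. The sketch below uses hypothetical stand-in types and sorts an array of structs rather than an array of pointers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a detected rename. */
struct rename_sketch {
	const char *oldpath;
	const char *newpath;
	int side;		/* 1 or 2, like pair->score in the patch */
};

/* Order by source path, then by side, as compare_pairs() does. */
static int cmp_renames(const void *a_, const void *b_)
{
	const struct rename_sketch *a = a_, *b = b_;
	int cmp = strcmp(a->oldpath, b->oldpath);

	return cmp ? cmp : a->side - b->side;
}

enum rename_kind { RENAME_SINGLE, RENAME_1TO1, RENAME_1TO2 };

/* After sorting, entry i only needs to be compared with entry i + 1. */
static enum rename_kind classify(const struct rename_sketch *r, size_t nr,
				 size_t i)
{
	if (i + 1 < nr && !strcmp(r[i].oldpath, r[i + 1].oldpath))
		return strcmp(r[i].newpath, r[i + 1].newpath) ?
			RENAME_1TO2 : RENAME_1TO1;
	return RENAME_SINGLE;
}
```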

> +			i++; /* We handled both renames, i.e. i+1 handled */
> +			continue;
> +		}
> +
> +		VERIFY_CI(oldinfo);
> +		VERIFY_CI(newinfo);
> +		target_index = pair->score; /* from append_rename_pairs() */

Hm. I don't see append_rename_pairs() anywhere else in the codebase.
Do you mean record_rename_pair()? But in that case, I don't understand
the following assertion:

> +		assert(target_index == 1 || target_index == 2);
> +		other_source_index = 3-target_index;

nit: "3 - target_index"

> +		old_sidemask = (1 << other_source_index); /* 2 or 4 */
> +		source_deleted = (oldinfo->filemask == 1);

This oldinfo->filemask check made me go to the declaration to find

	/*
	 * For filemask and dirmask, see tree-walk.h's struct traverse_info,
	 * particularly the documentation above the "fn" member.  Note that
	 * filemask = mask & ~dirmask from that documentation.
	 */
	unsigned filemask:3;
	unsigned dirmask:3;

Perhaps I've missed my window to complain about this comment pointing to
a comment in another struct definition instead of something like:

	/*
	 * The ith bit corresponds to whether the ith entry is a file
	 * (filemask) or a directory (dirmask). Thus, filemask & dirmask
	 * is always zero and filemask | dirmask == 7 always.
	 */

And of course, looking at this struct provides the justification for
using "1" and "2" for the "sides" and wasting the 0th value, because
it is consistent with the three entries here, using 0 as the base.

Thus, the comment about not using the 0 position could use a reference
to these triples. I still think that using an enum would help here.

Coming back to the line in question, "filemask == 1" _does_ mean that
the file exists only in the base. Deleted indeed.
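
For readers following along, the bit arithmetic in these lines can be
sketched in isolation.  This is a minimal sketch that assumes, as the
quoted code does, that bit i of a mask stands for entry i of the triple:
0 is the merge base, 1 is side1, 2 is side2.  The helper names are
hypothetical; merge-ort.c open-codes these expressions.

```c
#include <assert.h>

/* Hypothetical helper: bit for entry i of (base, side1, side2): 1, 2 or 4. */
static unsigned entry_bit(unsigned index)
{
	assert(index <= 2);
	return 1u << index;
}

/* Hypothetical helper: with sides numbered 1 and 2, the other is 3 - side. */
static unsigned other_side(unsigned side)
{
	assert(side == 1 || side == 2);
	return 3 - side;
}

/* A filemask of 1 means a file exists only in the merge base. */
static int only_in_base(unsigned filemask)
{
	return filemask == entry_bit(0);
}
```

So for a rename detected on side 1, old_sidemask corresponds to
entry_bit(other_side(1)), i.e. 4, and source_deleted corresponds to
only_in_base(oldinfo->filemask).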

> +		collision = ((newinfo->filemask & old_sidemask) != 0);
> +		type_changed = !source_deleted &&
> +			(S_ISREG(oldinfo->stages[other_source_index].mode) !=
> +			 S_ISREG(newinfo->stages[target_index].mode));
> +		if (type_changed && collision) {
> +			/* special handling so later blocks can handle this */
> +			die("Not yet implemented");
> +		}
> +
> +		assert(source_deleted || oldinfo->filemask & old_sidemask);
> +
> +		/* Need to check for special types of rename conflicts... */
> +		if (collision && !source_deleted) {
> +			/* collision: rename/add or rename/rename(2to1) */
> +			die("Not yet implemented");
> +		} else if (collision && source_deleted) {
> +			/* rename/add/delete or rename/rename(2to1)/delete */

How did we get to three actions here? I'll probably learn when
it is implemented.

> +			die("Not yet implemented");
> +		} else {
> +			/* a few different cases... */
> +			if (type_changed) {
> +				/* rename vs. typechange */
> +				die("Not yet implemented");
> +			} else if (source_deleted) {
> +				/* rename/delete */
> +				die("Not yet implemented");
> +			} else {
> +				/* normal rename */
> +				die("Not yet implemented");
> +			}
> +		}
> +
> +		if (!type_changed) {
> +			/* Mark the original as resolved by removal */
> +			oldinfo->merged.is_null = 1;
> +			oldinfo->merged.clean = 1;
> +		}
> +
> +	}
> +
> +	return clean_merge;

I'm glad you separated out this case organization from the implementations.
It's still a bit dense, so I'll probably need to revisit as I see you fill
in the rest.
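
As a reading aid, the branch structure of the loop can be restated as a
pure function of the three flags it computes.  This is a hypothetical
sketch, not code from the patch: the enum and function names are
invented, and it leaves out the earlier "type_changed && collision"
block, which per its comment only adjusts the entry so that the later
branches can handle it.

```c
#include <assert.h>

/* Hypothetical labels for the branches of the quoted if/else chain. */
enum rename_case {
	RENAME_ADD_OR_2TO1,   /* collision, source still present */
	RENAME_ADD_DELETE,    /* collision, source deleted */
	RENAME_VS_TYPECHANGE, /* no collision, file type changed */
	RENAME_DELETE,        /* no collision, source deleted */
	RENAME_NORMAL         /* none of the above */
};

/*
 * type_changed is computed as "!source_deleted && ...", so the two
 * flags are never both set.
 */
static enum rename_case classify_rename(int source_deleted, int collision,
					int type_changed)
{
	assert(!(type_changed && source_deleted));
	if (collision)
		return source_deleted ? RENAME_ADD_DELETE : RENAME_ADD_OR_2TO1;
	if (type_changed)
		return RENAME_VS_TYPECHANGE;
	if (source_deleted)
		return RENAME_DELETE;
	return RENAME_NORMAL;
}
```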

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 06/11] merge-ort: add implementation of both sides renaming identically
  2020-12-09 19:41 ` [PATCH 06/11] merge-ort: add implementation of both sides renaming identically Elijah Newren via GitGitGadget
@ 2020-12-11  3:32   ` Derrick Stolee
  0 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee @ 2020-12-11  3:32 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren

On 12/9/2020 2:41 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Implement rename/rename(1to1) handling, i.e. both sides of history
> renaming a file but renaming the same way.  This code replaces the
> following from merge-recursive.c:
> 
>   * all the 1to1 code in process_renames()
>   * the RENAME_ONE_FILE_TO_ONE case of process_entry()
> 
> Also, there is some shared code from merge-recursive.c for multiple
> different rename cases which we will no longer need for this case (or
> other rename cases):
> 
>   * handle_rename_normal()
>   * setup_rename_conflict_info()
> 
> The consolidation of four separate codepaths into one is made possible
> by a change in design: process_renames() tweaks the conflict_info
> entries within opt->priv->paths such that process_entry() can then
> handle all the non-rename conflict types (directory/file, modify/delete,
> etc.) orthogonally.  This means we're much less likely to miss special
> implementation of some kind of combination of conflict types (see
> commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
> 2020-11-18), especially commit ef52778708 ("merge tests: expect improved
> directory/file conflict handling in ort", 2020-10-26) for more details).
> That, together with letting worktree/index updating be handled
> orthogonally in the merge_switch_to_result() function, dramatically
> simplifies the code for various special rename cases.
> 
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  merge-ort.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/merge-ort.c b/merge-ort.c
> index faec29db955..085e81196a5 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -647,14 +647,31 @@ static int process_renames(struct merge_options *opt,
>  		    !strcmp(oldpath, renames->queue[i+1]->one->path)) {
>  			/* Handle rename/rename(1to2) or rename/rename(1to1) */
>  			const char *pathnames[3];
> +			struct version_info merged;
> +			struct conflict_info *base, *side1, *side2;
> +			unsigned was_binary_blob = 0;

Since you are adding to the declarations here, I suppose it would be
reasonable to include the 1to2/1to1 split here instead of the previous
patch, if that seems useful to reduce the complexity of that patch.
  
>  			pathnames[0] = oldpath;
>  			pathnames[1] = newpath;
>  			pathnames[2] = renames->queue[i+1]->two->path;
>
> +			base = strmap_get(&opt->priv->paths, pathnames[0]);
> +			side1 = strmap_get(&opt->priv->paths, pathnames[1]);
> +			side2 = strmap_get(&opt->priv->paths, pathnames[2]);
> +
> +			VERIFY_CI(base);
> +			VERIFY_CI(side1);
> +			VERIFY_CI(side2);
> +
>  			if (!strcmp(pathnames[1], pathnames[2])) {
> -				/* Both sides renamed the same way. */
> -				die("Not yet implemented");
> +				/* Both sides renamed the same way */
> +				assert(side1 == side2);
> +				memcpy(&side1->stages[0], &base->stages[0],
> +				       sizeof(merged));
> +				side1->filemask |= (1 << 0);
> +				/* Mark base as resolved by removal */
> +				base->merged.is_null = 1;
> +				base->merged.clean = 1;

Looks good.
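
The 1to1 resolution boils down to two moves, sketched here with
simplified stand-in structs (an assumption for illustration: the real
conflict_info and version_info have more fields, use struct object_id,
and keep is_null/clean inside a "merged" member).  The shared new path
receives the base version as its stage 0, so later per-path code sees an
ordinary three-way entry, and the old path is marked resolved by removal.

```c
#include <assert.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for the merge-ort.c structs. */
struct toy_version_info {
	unsigned mode;
	unsigned char oid[20];
};

struct toy_conflict_info {
	struct toy_version_info stages[3];
	unsigned filemask:3;
	unsigned is_null:1;
	unsigned clean:1;
};

/* Both sides renamed base's path to the same new path ("renamed"). */
static void resolve_rename_1to1(struct toy_conflict_info *base,
				struct toy_conflict_info *renamed)
{
	assert(base != renamed); /* old and new paths differ */
	memcpy(&renamed->stages[0], &base->stages[0],
	       sizeof(renamed->stages[0]));
	renamed->filemask |= (1 << 0); /* new path now "has" a base version */
	base->is_null = 1;             /* old path resolved by removal */
	base->clean = 1;
}
```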

Thanks,
-Stolee


* Re: [PATCH 07/11] merge-ort: add implementation of both sides renaming differently
  2020-12-09 19:41 ` [PATCH 07/11] merge-ort: add implementation of both sides renaming differently Elijah Newren via GitGitGadget
@ 2020-12-11  3:39   ` Derrick Stolee
  2020-12-11 21:56     ` Elijah Newren
  0 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee @ 2020-12-11  3:39 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren

On 12/9/2020 2:41 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Implement rename/rename(1to2) handling, i.e. both sides of history
> renaming a file and renaming it differently.  This code replaces the
> following from merge-recursive.c:
> 
>   * all the 1to2 code in process_renames()
>   * the RENAME_ONE_FILE_TO_TWO case of process_entry()
>   * handle_rename_rename_1to2()
> 
> Also, there is some shared code from merge-recursive.c for multiple
> different rename cases which we will no longer need for this case (or
> other rename cases):
> 
>   * handle_file_collision()
>   * setup_rename_conflict_info()
> 
> The consolidation of five separate codepaths into one is made possible
> by a change in design:

Excellent!

>  			/* This is a rename/rename(1to2) */
> -			die("Not yet implemented");
> +			clean_merge = handle_content_merge(opt,
> +							   pair->one->path,
> +							   &base->stages[0],
> +							   &side1->stages[1],
> +							   &side2->stages[2],
> +							   pathnames,
> +							   1 + 2 * opt->priv->call_depth,
> +							   &merged);

(this method currently die()s. ok)

> +			if (!clean_merge &&
> +			    merged.mode == side1->stages[1].mode &&
> +			    oideq(&merged.oid, &side1->stages[1].oid)) {
> +				was_binary_blob = 1;
> +			}

nit: Extraneous braces?

> +			memcpy(&side1->stages[1], &merged, sizeof(merged));
> +			if (was_binary_blob) {
> +				/*
> +				 * Getting here means we were attempting to
> +				 * merge a binary blob.
> +				 *
> +				 * Since we can't merge binaries,
> +				 * handle_content_merge() just takes one
> +				 * side.  But we don't want to copy the
> +				 * contents of one side to both paths.  We
> +				 * used the contents of side1 above for
> +				 * side1->stages, let's use the contents of
> +				 * side2 for side2->stages below.
> +				 */
> +				oidcpy(&merged.oid, &side2->stages[2].oid);
> +				merged.mode = side2->stages[2].mode;
> +			}
> +			memcpy(&side2->stages[2], &merged, sizeof(merged));
> +
> +			side1->path_conflict = 1;
> +			side2->path_conflict = 1;
> +			/*
> +			 * TODO: For renames we normally remove the path at the
> +			 * old name.  It would thus seem consistent to do the
> +			 * same for rename/rename(1to2) cases, but we haven't
> +			 * done so traditionally and a number of the regression
> +			 * tests now encode an expectation that the file is
> +			 * left there at stage 1.  If we ever decide to change
> +			 * this, add the following two lines here:
> +			 *    base->merged.is_null = 1;
> +			 *    base->merged.clean = 1;
> +			 * and remove the setting of base->path_conflict to 1.
> +			 */
> +			base->path_conflict = 1;

I'm getting to the point of the review/evening where I'm starting to gloss
over these important details. Time to take a break (after this patch).

> +			path_msg(opt, oldpath, 0,
> +				 _("CONFLICT (rename/rename): %s renamed to "
> +				   "%s in %s and to %s in %s."),
> +				 pathnames[0],
> +				 pathnames[1], opt->branch1,
> +				 pathnames[2], opt->branch2);

This output differs a bit from handle_rename_rename_1to2() in
merge-recursive.c:

	output(opt, 1, _("CONFLICT (rename/rename): "
	       "Rename \"%s\"->\"%s\" in branch \"%s\" "
	       "rename \"%s\"->\"%s\" in \"%s\"%s"),
	       o->path, a->path, ci->ren1->branch,
	       o->path, b->path, ci->ren2->branch,
	       opt->priv->call_depth ? _(" (left unresolved)") : "");

How much do we want to have _exact_ output matches between the
two strategies, at least in the short term?

> @@ -1257,13 +1309,13 @@ static void process_entry(struct merge_options *opt,
>  		int side = (ci->filemask == 4) ? 2 : 1;
>  		ci->merged.result.mode = ci->stages[side].mode;
>  		oidcpy(&ci->merged.result.oid, &ci->stages[side].oid);
> -		ci->merged.clean = !ci->df_conflict;
> +		ci->merged.clean = !ci->df_conflict && !ci->path_conflict;
>  	} else if (ci->filemask == 1) {
>  		/* Deleted on both sides */
>  		ci->merged.is_null = 1;
>  		ci->merged.result.mode = 0;
>  		oidcpy(&ci->merged.result.oid, &null_oid);
> -		ci->merged.clean = 1;
> +		ci->merged.clean = !ci->path_conflict;

These exist because this is the first time we assign path_conflict.
Sure.

Thanks,
-Stolee


* Re: [PATCH 02/11] merge-ort: add initial outline for basic rename detection
  2020-12-11  2:39   ` Derrick Stolee
@ 2020-12-11  9:40     ` Elijah Newren
  2020-12-13  7:47     ` Elijah Newren
  1 sibling, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2020-12-11  9:40 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Thu, Dec 10, 2020 at 6:39 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/9/2020 2:41 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  merge-ort.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++-------
> >  1 file changed, 60 insertions(+), 8 deletions(-)
> >
> > diff --git a/merge-ort.c b/merge-ort.c
> > index 90baedac407..92b765dd3f0 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -617,20 +617,72 @@ static int handle_content_merge(struct merge_options *opt,
> >
> >  /*** Function Grouping: functions related to regular rename detection ***/
> >
> > +static int process_renames(struct merge_options *opt,
> > +                        struct diff_queue_struct *renames)
> > +static int compare_pairs(const void *a_, const void *b_)
> > +/* Call diffcore_rename() to compute which files have changed on given side */
> > +static void detect_regular_renames(struct merge_options *opt,
> > +                                struct tree *merge_base,
> > +                                struct tree *side,
> > +                                unsigned side_index)
> > +static int collect_renames(struct merge_options *opt,
> > +                        struct diff_queue_struct *result,
> > +                        unsigned side_index)
>
> standard "I promise this will follow soon!" strategy, OK.
>
> >  static int detect_and_process_renames(struct merge_options *opt,
> >                                     struct tree *merge_base,
> >                                     struct tree *side1,
> >                                     struct tree *side2)
> >  {
> > -     int clean = 1;
> > +     struct diff_queue_struct combined;
> > +     struct rename_info *renames = opt->priv->renames;
>
> (Re: my concerns that we don't need 'renames' to be a pointer,
> this could easily be "renames = &opt->priv->renames;")

Yeah, there'll be a lot of these...

>
> > +     int s, clean = 1;
> > +
> > +     memset(&combined, 0, sizeof(combined));
> > +
> > +     detect_regular_renames(opt, merge_base, side1, 1);
> > +     detect_regular_renames(opt, merge_base, side2, 2);
>
> Find the renames in each side's diff.
>
> I think the use of "1" and "2" here might be better situated
> for an enum. Perhaps:

I can see where you're coming from...but that's a monumentally huge
shift.  What about all my "loops over the sides"?  Sure, the ones that
are only two lines long could be just turned into code without a loop,
but when the loop is 140 lines long, that doesn't make much sense.
Loop over enum ranges?

Are all the variables that track an index still okay?    Do I need to
also introduce an enum for all the filemask/dirmask/match_mask/etc.
variables:

enum merge_mask {
     MERGE_JUST_BASE = 1,
     MERGE_JUST_SIDE1 = 2,
     MERGE_BASE_AND_SIDE1 = 3,
     MERGE_JUST_SIDE2 = 4,
     MERGE_BASE_AND_SIDE2 = 5,
     MERGE_SIDE1_AND_SIDE2 = 6,
     MERGE_BASE_AND_SIDE1_AND_SIDE2 = 7
};

?  That seems like a pretty big pain to use.

Also, what about the code that uses a side index to get the other side
index?  Or my conversions from side indices to masks (or vice versa)?
I tend to put comments by these, but there's a _lot_ of them.


I suspect this one would take a week to change, and I'd miss several
locations, and....some cases would certainly look cleaner but I
suspect some would be far uglier and end up being unchanged and then
leave us with a mess of trying to understand both.

What if I added a big comment near the top of the file that we've got
dozens of variables that are arrays of size 3 which are meant to be
indexed by the "side" of the three-way merge that it is tracking
information for:
    0: merge_base
    1: side1
    2: side2
(though several of the variables might have index 0 unused since it
doesn't track anything specifically for the merge base), and further
that masks are used in certain variables which try to track which
sides are present or match, with 1<<SIDE being the bit to track that a
given side is present/relevant.

I mean, this stuff is all over the place throughout the 4500-line
merge-ort.c file.  And it is in lots of diffcore-rename.c (which will
grow by about 1K lines as well).

> enum merge_side {
>         MERGE_SIDE1 = 0,
>         MERGE_SIDE2 = 1,
> };
>
> (Note, I shift these values to 0 and 1, respectively, allowing
> us to truncate the pairs array to two entries while still
> being mentally clear.)

As mentioned in the previous patch, the shift of the indices would
cause me at least a large amount of mental confusion and I suspect it
would for others too.  Both conflict_info.stages[] and
conflict_info.pathnames[] are meant to be indexed by the side of the
merge (or the base), but using conflict_info.stages[MERGE_SIDE1] or
conflict_info.pathnames[MERGE_SIDE2] as you have them defined here
would provide the wrong answer.  Since there's a conflict_info per
file and it is used all over the code, this would just be ripe for
off-by-one errors.

Since both conflict_info.stages[] and renames->pairs[] are meant to be
indexed by the merge side, this kind of conflict is inevitable.  The
only clean solution is making both be arrays of size three, and just
skipping index 0 in the variables that don't need to track something
for the merge_base.
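
For illustration only, here is a sketch of the kind of top-of-file
comment described above, paired with an enum whose values equal the
existing indices (unlike the enum proposed above, which shifts them to
0 and 1).  The enum and helpers are hypothetical; with these values,
stages[MERGE_SIDE1] and pairs[MERGE_SIDE1] refer to the same side, so
the off-by-one mismatch does not arise.

```c
#include <assert.h>

/*
 * Arrays of size 3 are indexed by the "side" of the three-way merge:
 *   0: merge base
 *   1: side1
 *   2: side2
 * Variables that only track the two sides leave index 0 unused, so the
 * same index works for them and for conflict_info.stages[].
 *
 * Masks use bit (1 << index) per entry: 1 = base, 2 = side1, 4 = side2.
 */
enum merge_side { MERGE_BASE = 0, MERGE_SIDE1 = 1, MERGE_SIDE2 = 2 };

/* Hypothetical: which side a single-side mask (2 or 4) refers to. */
static int mask_to_side(unsigned mask)
{
	assert(mask == 2 || mask == 4);
	return mask == 4 ? MERGE_SIDE2 : MERGE_SIDE1;
}

/* Hypothetical: how many sides (not counting the base) have a file. */
static int count_sides_present(unsigned filemask)
{
	int side, count = 0;

	for (side = MERGE_SIDE1; side <= MERGE_SIDE2; side++)
		if (filemask & (1u << side))
			count++;
	return count;
}
```

mask_to_side() is the same mapping as the "(ci->filemask == 4) ? 2 : 1"
expression in process_entry().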

> > +
> > +     ALLOC_GROW(combined.queue,
> > +                renames->pairs[1].nr + renames->pairs[2].nr,
> > +                combined.alloc);
> > +     clean &= collect_renames(opt, &combined, 1);
> > +     clean &= collect_renames(opt, &combined, 2);
>
> Magic numbers again.
>
> > +     QSORT(combined.queue, combined.nr, compare_pairs);
> > +
> > +     clean &= process_renames(opt, &combined);
>
> I need to mentally remember that "clean" is a return state,
> but _not_ a fail/success result. Even though we are using
> "&=" here, it shouldn't be "&&=" or even "if (method()) return 1;"
>
> Looking at how "clean" is used in struct merge_result, I
> wonder if there is a reason to use an "int" over a simple
> "unsigned" or even "unsigned clean:1;" You use -1 in places
> as well as a case of "mi->clean = !!resolved;"

Heh, when I used an unsigned for a boolean, Jonathan Tan asked why I
didn't use an int.  When I use an int for a boolean, you ask why I
don't use an unsigned.  I think my stock answer should just be that
the other reviewer suggested it.  ;-)

> If there is more meaning to values other than "clean" or
> "!clean", then an enum might be valuable.

Yeah, this came from unpack-trees.c and merge-recursive.c, where the
value is usually 1 (clean) or 0 (not clean), but the special value of
-1 signals something went wrong enough that we need to stop further
processing and return up the call chain for any necessary cleanup
(e.g. removal of lock files).  The value of -1 is only used for things
like "disk-is-full, can't write any more files to the working
directory", or "failed to read one of the trees involved in the merge
from the git object store".

-1, though, is the return value from unpack_trees(), traverse_trees(),
and perhaps other places in the code, so I'd be worried about
attempting to use a different special value for fear that I'd miss
converting the return value I got from one of those to the new special
value.

merge-ort has far fewer locations where -1 appears (in part because
the checkout() code is an external function rather than being
sprinkled everywhere), and it tends to cause the code to return
immediately, so most all call sites can assume a simple boolean value
of either 0 or 1.
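
A small sketch of why the -1 has to be handled before any "clean &= ..."
accumulation: with two's complement ints, -1 & 1 is 1, so a bare "&="
would silently turn an error into "clean".  Both helper names are
hypothetical.

```c
#include <assert.h>

/*
 * Convention as described above:
 *    1  clean merge
 *    0  conflicts, keep going
 *   -1  hard error; stop and return up the call chain
 */

/* Naive accumulation: fine for 0/1, but loses -1 (since -1 & 1 == 1). */
static int naive_combine(int clean, int step_result)
{
	return clean & step_result;
}

/* Error-preserving accumulation: any -1 wins. */
static int combine_clean(int clean, int step_result)
{
	if (clean < 0 || step_result < 0)
		return -1;
	return clean & step_result;
}
```

In merge-ort the same effect comes from returning immediately on -1, so
the "&=" accumulations only ever see 0 or 1.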

> > +     /* Free memory for renames->pairs[] and combined */
> > +     for (s = 1; s <= 2; s++) {
> > +             free(renames->pairs[s].queue);
> > +             DIFF_QUEUE_CLEAR(&renames->pairs[s]);
> > +     }
>
> This loop is particularly unusual. Perhaps it would be
> better to do this instead:
>
>         free(renames->pairs[MERGE_SIDE1].queue);
>         free(renames->pairs[MERGE_SIDE2].queue);
>         DIFF_QUEUE_CLEAR(&renames->pairs[MERGE_SIDE1]);
>         DIFF_QUEUE_CLEAR(&renames->pairs[MERGE_SIDE2]);

If this were the only one, then I'd absolutely agree.  There are 7 such
loops in the version of merge-ort.c in the 'ort' branch.  I can't get
rid of all of them, because even though some are short, some of those
are very long for-loops.  (The long ones use "side" for the loop
counter instead of "s" -- maybe I should use "side" even on the short
ones?)

There are another 6 that loop over the sides including the
merge-base (thus including index 0).  If those count, it's closer to
13.

> > +     if (combined.nr) {
> > +             int i;
> > +             for (i = 0; i < combined.nr; i++)
> > +                     diff_free_filepair(combined.queue[i]);
> > +             free(combined.queue);
> > +     }
> >
> > -     /*
> > -      * Rename detection works by detecting file similarity.  Here we use
> > -      * a really easy-to-implement scheme: files are similar IFF they have
> > -      * the same filename.  Therefore, by this scheme, there are no renames.
> > -      *
> > -      * TODO: Actually implement a real rename detection scheme.
> > -      */
> >       return clean;
>
> I notice that this change causes detect_and_process_renames() to
> change from an "unhelpful result, but success" to "die() always".
>
> I wonder if there is value in swapping the order of the patches
> to implement the static methods first. Of course, you hit the
> "unreferenced static method" problem, so maybe your strategy is
> better after all.

I used to do that kind of thing, but the unreferenced static method
problem is really annoying.  It means the code doesn't even compile,
which is a bad state to submit patches in.  I can work around that by
adding "(void)unused_funcname;" expressions somewhere in the code, but
reviewers tend to be even more surprised by those.
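
For the curious, the workaround mentioned above looks roughly like this
(function names are hypothetical).  Evaluating a function's name in a
void cast counts as a use, so GCC and Clang stop emitting
-Wunused-function for it; GCC's __attribute__((unused)) on the
definition is another way to get the same effect.

```c
#include <assert.h>

/* Hypothetical helper whose real caller only arrives in a later patch. */
static int helper_for_later_patch(int x)
{
	return x * 2;
}

/* Hypothetical function that already has callers. */
int existing_entry_point(void);

int existing_entry_point(void)
{
	/* Reference the helper so -Wunused-function stays quiet. */
	(void)helper_for_later_patch;
	return 0;
}
```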

* Re: [PATCH 01/11] merge-ort: add basic data structures for handling renames
  2020-12-11  2:03   ` Derrick Stolee
@ 2020-12-11  9:41     ` Elijah Newren
  0 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2020-12-11  9:41 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Thu, Dec 10, 2020 at 6:03 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/9/2020 2:41 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > This will grow later, but we only need a few fields for basic rename
> > handling.
>
> Perhaps these things will be extremely clear as the patch
> series continues, but...
>
> > +struct rename_info {
> > +     /*
> > +      * pairs: pairing of filenames from diffcore_rename()
> > +      *
> > +      * Index 1 and 2 correspond to sides 1 & 2 as used in
> > +      * conflict_info.stages.  Index 0 unused.
>
> Hm. This seems wasteful. I'm sure that you have a reason to use
> index 0 in the future instead of just avoiding instances of [i-1]
> indexes.

Yes, it is...and it gets more wasteful when I increase the number of
fields that are arrays of size 3 with none of them using index 0.
Currently, there's only 1 such field; later there will be 10.

However, this does not scale with the number of files or size of the
repository or anything like that; it's a flat overhead.  At this point
in my patch submissions, that overhead is 16 bytes per merge.  Later
when I have 10 variables that are arrays of size three, it'll be 940
bytes per merge.

I'm not planning on using index 0 later; the reason for this really is
to avoid off-by-one errors (it's one of the two biggest problems in
computer science, right?).  The off-by-one problem becomes huge when
you consider all the references:

* The conflict_info type has stages which is an array of size three --
index 0 is always the base commit, index 1 is side1, and index 2 is
side2.  There is one of these per path involved in the merge, and they
are used all over the place, so it's nice to think in terms of "1 is
side1, 2 is side2".  (There is also a pathnames variable of size three
with the same indexing rules, and a bunch of bitmasks that rely on
1<<0 == base, 1<<1 == side1, and 1<<2 == side2.)

* These other 10 variables that are arrays of size 3 in the
rename_info struct are all keeping track of information for side1 and
side2.  When you consider the number of references for all 10 of them
combined across the codebase, it adds up to quite a bit.

I'm certain that if I had had to use off-by-one indexing for these 10
variables, while using not-off-by-one indexing for the stages and
pathnames in conflict_info, I would have messed it up many dozens of
times and spent countless hours tracking down bugs.  And I
think the result would be a lot harder to review.  And future
developers would come along and fall into that trap and get various
indices off.

I'm willing to pay a one-time overhead of 940 bytes to avoid that.

> > +      */
> > +     struct diff_queue_struct pairs[3];
> > +
> > +     /*
> > +      * needed_limit: value needed for inexact rename detection to run
> > +      *
> > +      * If the current rename limit wasn't high enough for inexact
> > +      * rename detection to run, this records the limit needed.  Otherwise,
> > +      * this value remains 0.
> > +      */
> > +     int needed_limit;
> > +};
> > +
> >  struct merge_options_internal {
> >       /*
> >        * paths: primary data structure in all of merge ort.
> > @@ -96,6 +115,11 @@ struct merge_options_internal {
> >        */
> >       struct strmap output;
> >
> > +     /*
> > +      * renames: various data relating to rename detection
> > +      */
> > +     struct rename_info *renames;
> > +
>
> And here, you create this as a pointer, but...
> >       /* Initialization of opt->priv, our internal merge data */
> >       opt->priv = xcalloc(1, sizeof(*opt->priv));
> > +     opt->priv->renames = xcalloc(1, sizeof(*opt->priv->renames));
>
> ...unconditionally allocate it here. Perhaps there are other cases
> where 'struct merge_options_internal' is allocated without the renames
> member?
>
> Searching merge-ort.c at this point does not appear to have any
> other allocations of opt->priv or struct merge_options_internal.
> Perhaps it would be best to include struct rename_info not as a
> pointer?

That's a really good point; I'll try it out.

> If you do have a reason to keep it as a pointer, then perhaps it
> should be freed in clear_internal_opts()?

Eek.  It's there in my 'ort' branch, but one of the problems trying to
rearrange and clean things up to make nice digestible series is that
you sometimes forget to bring important parts along.  Whoops; good
catch.  I'm going to try just turning renames into an embedded struct
instead of a pointer, though.  If it doesn't work out, I'll make sure
to clear it.
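
A minimal sketch of the embedded-struct variant, using simplified
stand-ins for the structs quoted above and calloc() in place of Git's
xcalloc() (which dies on failure; abort() stands in for that here).  The
setup/clear function names are hypothetical.  Because renames lives
inside the single allocation, the one free() also releases it, and code
takes its address with &opt->priv->renames.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins; the real structs have many more fields. */
struct rename_info {
	int needed_limit;
};

struct merge_options_internal {
	struct rename_info renames; /* embedded: no separate allocation */
};

struct merge_options {
	struct merge_options_internal *priv;
};

static void setup_internal_sketch(struct merge_options *opt)
{
	/* calloc() zero-fills, so the embedded renames starts zeroed too. */
	opt->priv = calloc(1, sizeof(*opt->priv));
	if (!opt->priv)
		abort();
}

static void clear_internal_sketch(struct merge_options *opt)
{
	/* One free() covers renames as well; nothing extra to forget. */
	free(opt->priv);
	opt->priv = NULL;
}
```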

* Re: [PATCH 03/11] merge-ort: implement detect_regular_renames()
  2020-12-11  2:54   ` Derrick Stolee
@ 2020-12-11 17:38     ` Elijah Newren
  0 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2020-12-11 17:38 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Thu, Dec 10, 2020 at 6:54 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/9/2020 2:41 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Based heavily on merge-recursive's get_diffpairs() function.
>
> (You're not kidding, and I should have looked here before making
> some comments below.)

I can provide some extra background on all the crazy magic numbers and
nonsensical treatment of tiny values, though.  And since you were so
curious about these, I have an excuse to dump more info on you than
you probably were bargaining for...  :-)

> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  merge-ort.c | 32 +++++++++++++++++++++++++++++++-
> >  1 file changed, 31 insertions(+), 1 deletion(-)
> >
> > diff --git a/merge-ort.c b/merge-ort.c
> > index 92b765dd3f0..1ff637e57af 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -634,7 +634,33 @@ static void detect_regular_renames(struct merge_options *opt,
> >                                  struct tree *side,
> >                                  unsigned side_index)
> >  {
> > -     die("Not yet implemented.");
> > +     struct diff_options diff_opts;
> > +     struct rename_info *renames = opt->priv->renames;
> > +
> > +     repo_diff_setup(opt->repo, &diff_opts);
> > +     diff_opts.flags.recursive = 1;
> > +     diff_opts.flags.rename_empty = 0;
> > +     diff_opts.detect_rename = DIFF_DETECT_RENAME;
> > +     diff_opts.rename_limit = opt->rename_limit;
>
> I assume that opt->rename_limit has been initialized properly
> against merge.renameLimit/diff.renameLimit in another location...

Yes, see init_merge_options() and merge_recursive_config() in
merge-recursive.c.  People using merge-ort will nevertheless be using
some functions out of merge-recursive.c...for now.

> > +     if (opt->rename_limit <= 0)
> > +             diff_opts.rename_limit = 1000;
>
> (I made the following comments before thinking to look at
> get_diffpairs() which behaves in an equivalent way with this
> "1000" constant limit. I'm not sure if there is a reason why
> this limit is different from the _other_ limits I discovered,
> but it might still be good to reduce magic literal ints by
> grouping this "1000" into a const or macro.)

I'll discuss the value of 1000 later...

> ...and this just assigns the default again. Why is this done
> here instead of inside the diff machinery? Also, wouldn't a
> diff.renameLimit = 0 imply no renames, not "use default"?

Yes, I totally agree that would make more sense, but backward
compatibility sometimes requires violating common sense.  See commit
89973554b5 ("diffcore-rename: make diff-tree -l0 mean -l<large>",
2017-11-29).  For the same reasons discussed in that commit, I'm
hesitant to change what is used here; it's a backward compatibility
concern now.

One reason opt->rename_limit could be 0 is if some caller does the following:

   struct merge_options opt;
   memset(&opt, 0, sizeof(opt));
   opt.ancestor = ....;
   /* forget to set opt.rename_limit */
   merge_incore_nonrecursive(&opt, ...);

The most likely reason for a negative value is probably that
init_merge_options() in merge-recursive.c set opt->rename_limit to -1.
Having init_merge_options() set the value to the actual default
probably would have made more sense, but the
assign-it-to-negative-one-and-deal-with-it-later goes back to the
introduction of init_merge_options() in 2008.  Actually, if you ignore
init_merge_options() the same thing was being done before back in 2007
as soon as any limit handling was introduced to the code.

Since init_merge_options() is shared for now between merge-recursive.c
and merge-ort.c, any updates I make here would necessitate similar
code updates to merge-recursive.c.

Also, it's not just internal code callers.  Someone could set
merge.renameLimit or diff.renameLimit in their repository (or their
global .gitconfig) to a non-positive value and get this behavior of
treat-non-positive-as-whatever-the-default-is.
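
The treat-non-positive-as-default rule amounts to the following sketch.
The function name is hypothetical; 1000 is the fallback used in the
quoted patch, matching merge-recursive's get_diffpairs().

```c
#include <assert.h>

/* Hypothetical helper: non-positive limits fall back to 1000. */
static int effective_merge_rename_limit(int configured)
{
	return configured <= 0 ? 1000 : configured;
}
```

So a merge.renameLimit of 0, or the -1 left behind by
init_merge_options(), both end up as 1000 on this code path.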

> I notice that the docs don't make this explicit:
>
> diff.renameLimit::
>         The number of files to consider when performing the copy/rename
>         detection; equivalent to the 'git diff' option `-l`. This setting
>         has no effect if rename detection is turned off.

See also https://lore.kernel.org/git/20180426162339.db6b4855fedb5e5244ba7dd1@google.com/
where we talked about documenting the special value of 0 (in that case
for diff -l, though merge.renameLimit should have one too), but we
obviously never got around to it.  Yet.  (I did at least put it on my
projects list, though things sometimes languish there for years.)

> but also too_many_rename_candidates() has this strange
> default check:
>
>         /*
>          * This basically does a test for the rename matrix not
>          * growing larger than a "rename_limit" square matrix, ie:
>          *
>          *    num_create * num_src > rename_limit * rename_limit
>          */
>         if (rename_limit <= 0)
>                 rename_limit = 32767;
>
> this is... a much larger limit than I would think is reasonable.

The value of 32767 came from backward compatibility and in particular
from the exact same commit referenced above -- 89973554b5
("diffcore-rename: make diff-tree -l0 mean -l<large>", 2017-11-29).

Also, perhaps this value is *smaller* than reasonable -- I've used
values like 48941 before on real world repositories.  (And I'm not
picking a random large value to report; *that* exact value came up
enough times that I remember that particular one.)  If 0 (or negative)
is supposed to mean "large", then shouldn't it handle values people
use on real world repositories?  (Not that I care that much, because I
think the usage of 0 to mean "large" is kind of illogical, so I'll
avoid it and discourage others from using it.)

I do know where the 32767 came from, though.  Once upon a time, 32767
was "the biggest supported value possible" and in fact any other
number was silently capped to 32767.  This of course led to a number
of issues.  See commit 9f7e4bfa3b ("diff: remove silent clamp of
renameLimit", 2017-11-13) and perhaps also commits b520abf1c8
("sequencer: warn when internal merge may be suboptimal due to
renameLimit", 2017-11-13) and d6861d0258 ("progress: fix progress
meters when dealing with lots of work", 2017-11-13).
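For reference, the "square matrix" test in the quoted comment can be written as a standalone function.  This is a minimal sketch, not the code in diffcore-rename.c: the function name is hypothetical, and it deliberately multiplies in 64 bits, since squaring a limit such as 48941 gives 2,395,221,481, which does not fit in a 32-bit int.

```c
#include <stdint.h>

/*
 * Hypothetical sketch of the quoted check: inexact rename detection is
 * refused when the candidate matrix (num_create x num_src) would be
 * larger than a rename_limit x rename_limit square.  A non-positive
 * limit is treated as 32767, mirroring the quoted code.
 */
int too_many_candidates(int num_create, int num_src, int rename_limit)
{
	int64_t limit;

	if (rename_limit <= 0)
		rename_limit = 32767;
	limit = rename_limit;

	/* 64-bit products so that large limits cannot overflow. */
	return (int64_t)num_create * num_src > limit * limit;
}
```

With a limit of 1000, up to 1,000,000 source/destination pairs are allowed; 1001 new files against 1000 sources (1,001,000 pairs) trips the limit.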

> Of course, diff_rename_limit_default is set to 400 inside diff.c.
> Should that be extracted as a constant so we can repeat it here?

I think it makes sense to have merge have a higher default rename
limit than diffs.  I can see folks just doing a "git log -p" and not
wanting individual commits to take a long time, especially since it's
not at all clear that most of the commits are going to be of interest to
the user.  In contrast, when merging, the commits are definitely of
interest to the user, and spending a little more time on a few commits
provides a nice payoff.

Also, merges provide progress meters on rename detection; I don't
think that log -p does.  I think that the presence of progress meters
makes it easier to deal with larger values as well.

It may also be worth noting that both of these numbers were modified
in the same commit in the past and retained distinct values; see
commit 92c57e5c1d ("bump rename limit defaults (again)", 2011-02-19).

After all my rename optimizations, all those cases that used to
require limits in the 20k ~ 50k range can now all complete with a
limit under 1000, and quite rapidly.  (It was really hard to get one
of them under 1000, though.  It stubbornly required a value of 1002
until I figured out another optimization allowing me to avoid
detecting more renames without any change in behavior.)  It's nice
that it's fast, and it's also nice that rename detection just works
instead of having the merge throw a warning that the limit was too
low, doing the merge all wrong, and expecting the user to undo the
merge, set the limit higher, and redo it.

400 definitely isn't high enough.  I'm actually tempted to double the
1000 to buy more room.  The last bump was about a decade ago and noted
that processors had gotten faster since the bump before it, so perhaps
it is time to bump it again.

All that said, it could possibly make sense to define 1000 as a
special constant near the top of the file and then use it via whatever
macro/constant/variable name we give it.  Such a change would make it
harder to compare this patch to get_diffpairs() in merge-recursive.c,
though...
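A sketch of what such a constant might look like, folding in the treat-non-positive-as-default behavior described earlier.  The macro and function names are hypothetical; merge-ort.c does not define them.

```c
/* Hypothetical name; merge-ort.c does not define this macro. */
#define MERGE_DEFAULT_RENAME_LIMIT 1000

/*
 * Return the rename limit to hand to diffcore: a non-positive value
 * (whether from a caller or from merge.renameLimit/diff.renameLimit)
 * falls back to the merge default.
 */
int effective_rename_limit(int configured)
{
	return configured > 0 ? configured : MERGE_DEFAULT_RENAME_LIMIT;
}
```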

> > +     diff_opts.rename_score = opt->rename_score;
> > +     diff_opts.show_rename_progress = opt->show_rename_progress;
> > +     diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
> > +     diff_setup_done(&diff_opts);
> > +     diff_tree_oid(&merge_base->object.oid, &side->object.oid, "",
> > +                   &diff_opts);
> > +     diffcore_std(&diff_opts);
> > +
> > +     if (diff_opts.needed_rename_limit > opt->priv->renames->needed_limit)
> > +             opt->priv->renames->needed_limit = diff_opts.needed_rename_limit;
> > +
> > +     renames->pairs[side_index] = diff_queued_diff;
> > +
> > +     diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
> > +     diff_queued_diff.nr = 0;
> > +     diff_queued_diff.queue = NULL;
> > +     diff_flush(&diff_opts);
> >  }
> >
> >  /*
> > @@ -1379,6 +1405,10 @@ void merge_switch_to_result(struct merge_options *opt,
> >                       printf("%s", sb->buf);
> >               }
> >               string_list_clear(&olist, 0);
> > +
> > +             /* Also include needed rename limit adjustment now */
> > +             diff_warn_rename_limit("merge.renamelimit",
> > +                                    opti->renames->needed_limit, 0);
>
> I suppose this new call is appropriate in this patch, since you assign
> the value inside detect_regular_renames(), but it might be good to
> describe its presence in the commit message.

Sure, I can add a note.
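The bookkeeping in the hunk above (remember the largest needed_rename_limit reported by either side's diff, then warn once in merge_switch_to_result()) boils down to a running maximum.  A sketch with hypothetical names, assuming, as the call site suggests, that a needed limit of 0 means there is nothing to report:

```c
/* Hypothetical stand-in for the needed_limit field in opt->priv->renames. */
struct rename_stats {
	int needed_limit; /* largest limit either side's diff asked for */
};

/* Record the limit one side's rename detection reported as needed. */
void record_needed_limit(struct rename_stats *stats, int side_needed)
{
	if (side_needed > stats->needed_limit)
		stats->needed_limit = side_needed;
}

/* Nonzero means the user should be told to raise merge.renameLimit. */
int should_warn_rename_limit(const struct rename_stats *stats)
{
	return stats->needed_limit > 0;
}
```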

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 04/11] merge-ort: implement compare_pairs() and collect_renames()
  2020-12-11  3:00   ` Derrick Stolee
@ 2020-12-11 18:43     ` Elijah Newren
  0 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2020-12-11 18:43 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Thu, Dec 10, 2020 at 7:00 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/9/2020 2:41 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
>
> Perhaps worth pointing out comparison to score_compare()

That comparison might be slightly misleading due to this line from
collect_renames():
+               p->score = side_index;
Since diffcore-rename has already used the percentage similarity to
determine if two files are a rename and has recorded that in a
separate field, I don't need the percentage similarity anymore.  So
the score is no longer needed at this point.  However, I needed a way
to somehow record which side of the merge each diff_filepair came from
and I can't just add a field to the diff_filepair struct (especially
since its only use is in merge-ort).  I know, I know, I'm evil.
Creating a new struct just so I could have something that contained a
diff_filepair and another auxiliary field just felt so ugly, so I just
reused this one field instead.  And I did use that field to "rank" or
"sort" the pairs, so doesn't that make it a valid "score"?  :-)

I should probably add a big comment about this just above that line.
I've meant to do that multiple different times, but oddly enough
this thought has only occurred to me while I'm out running or
otherwise away from the computer.  Until now, of course.
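To make the repurposing concrete: once diffcore-rename has decided which pairs are renames, the similarity score has served its purpose, so its slot can carry the side index instead.  A minimal sketch using a simplified, hypothetical pair struct (git's struct diff_filepair has more fields, and this is not the actual collect_renames()):

```c
#include <stddef.h>

/* Simplified, hypothetical stand-in for git's struct diff_filepair. */
struct pair {
	const char *src;      /* path in the merge base */
	const char *dst;      /* path on one side of the merge */
	unsigned short score; /* repurposed: which side (1 or 2) */
};

/*
 * Append the rename pairs found on one side to `out`, tagging each
 * with side_index.  Returns the new number of entries in `out`.
 */
size_t collect_side(struct pair *out, size_t nr,
		    const struct pair *side_pairs, size_t side_nr,
		    unsigned short side_index)
{
	size_t i;

	for (i = 0; i < side_nr; i++) {
		out[nr] = side_pairs[i];
		out[nr].score = side_index; /* overwrite the similarity */
		nr++;
	}
	return nr;
}
```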

> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  merge-ort.c | 27 +++++++++++++++++++++++++--
> >  1 file changed, 25 insertions(+), 2 deletions(-)
> >
> > diff --git a/merge-ort.c b/merge-ort.c
> > index 1ff637e57af..3cdf8124b85 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -625,7 +625,13 @@ static int process_renames(struct merge_options *opt,
> >
> >  static int compare_pairs(const void *a_, const void *b_)
> >  {
> > -     die("Not yet implemented.");
> > +     const struct diff_filepair *a = *((const struct diff_filepair **)a_);
> > +     const struct diff_filepair *b = *((const struct diff_filepair **)b_);
> > +
> > +     int cmp = strcmp(a->one->path, b->one->path);
> > +     if (cmp)
> > +             return cmp;
> > +     return a->score - b->score;
>
> Hm. I wasn't sure what would happen when subtracting these
> "unsigned short" scores, but I see that score_compare() does
> the same. Any potential for an existing, hidden bug here?

In the case of compare_pairs(), a->score and b->score have a minimum
value of 1 and a maximum value of 2 (note above where I set score to
the side index).  I believe most platforms will have an int big enough
to store the result of that subtraction.
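On the subtraction question: in C, unsigned short operands are promoted to int before arithmetic (on the usual platforms where int is wider than unsigned short), so the difference of two scores is a signed int and cannot wrap around.  A tiny sketch:

```c
/*
 * Both operands are promoted to int before subtracting, so 1 - 2
 * yields -1 rather than a large unsigned value (assuming int is wider
 * than unsigned short, as on common platforms).
 */
int score_diff(unsigned short a, unsigned short b)
{
	return a - b;
}
```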

I'm not sure why I bother with the secondary sort, though.  It shouldn't matter.

Which is probably a good thing, because strcmp(a, b) gives us
ascending order, and a - b gives us descending order.  That's messed
up.  Honestly, it doesn't matter because all I really needed from the
sort was for diff_filepairs with the same source name to be adjacent
(so that I can check for rename/rename(1to2) conflicts by comparing
adjacent pairs), but still it's annoying that the function contradicts
itself on the desired order.  And it'll trigger whenever the same path
is renamed by both sides of history, which we have a number of
testcases for in the testsuite.  So that confirms that the secondary
sort just doesn't matter.  I'll get rid of it and just use strcmp.
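With the secondary sort dropped, the comparator reduces to strcmp() on the source path, which is all that is needed to make renames of the same source adjacent.  A self-contained sketch using qsort() and a simplified, hypothetical pair struct (the real code sorts struct diff_filepair pointers); the adjacent-pair walk mirrors the 1to1/1to2 check in process_renames():

```c
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-in for struct diff_filepair. */
struct rpair {
	const char *src; /* rename source (path in merge base) */
	const char *dst; /* rename destination */
};

/* qsort() comparator over an array of pointers, like compare_pairs(). */
int compare_rpairs(const void *a_, const void *b_)
{
	const struct rpair *a = *((const struct rpair **)a_);
	const struct rpair *b = *((const struct rpair **)b_);

	return strcmp(a->src, b->src);
}

/*
 * After sorting, renames of the same source by both sides are adjacent.
 * Count rename/rename(1to1) (same destination on both sides) and
 * rename/rename(1to2) (different destinations).
 */
void count_double_renames(struct rpair **q, size_t nr,
			  int *one_to_one, int *one_to_two)
{
	size_t i;

	*one_to_one = *one_to_two = 0;
	qsort(q, nr, sizeof(*q), compare_rpairs);
	for (i = 0; i + 1 < nr; i++) {
		if (strcmp(q[i]->src, q[i + 1]->src))
			continue;
		if (!strcmp(q[i]->dst, q[i + 1]->dst))
			(*one_to_one)++;
		else
			(*one_to_two)++;
		i++; /* both entries of the pair are handled */
	}
}
```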

* Re: [PATCH 05/11] merge-ort: add basic outline for process_renames()
  2020-12-11  3:24   ` Derrick Stolee
@ 2020-12-11 20:03     ` Elijah Newren
  0 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2020-12-11 20:03 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Thu, Dec 10, 2020 at 7:24 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/9/2020 2:41 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Add code which determines which kind of special rename case each rename
> > corresponds to, but leave the handling of each type unimplemented for
> > now.  Future commits will implement each one.
> >
> > There is some tenuous resemblance to merge-recursive's
> > process_renames(), but comparing the two is very unlikely to yield any
> > insights.  merge-ort's process_renames() is a bit complex and I would
> > prefer if I could simplify it more, but it is far easier to grok than
> > merge-recursive's function of the same name in my opinion.  Plus,
> > merge-ort handles more rename conflict types than merge-recursive does.
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  merge-ort.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 97 insertions(+), 1 deletion(-)
> >
> > diff --git a/merge-ort.c b/merge-ort.c
> > index 3cdf8124b85..faec29db955 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -620,7 +620,103 @@ static int handle_content_merge(struct merge_options *opt,
> >  static int process_renames(struct merge_options *opt,
> >                          struct diff_queue_struct *renames)
> >  {
> > -     die("Not yet implemented.");
> > +     int clean_merge = 1, i;
> > +
> > +     for (i = 0; i < renames->nr; ++i) {
> > +             const char *oldpath = NULL, *newpath;
>
> This "= NULL" is not necessary, since you initialize it to
> old_ent->key unconditionally.
>
> > +             struct diff_filepair *pair = renames->queue[i];
> > +             struct conflict_info *oldinfo = NULL, *newinfo = NULL;
>
> These, too.

Oh, man, so many frustrations here.  You are, of course, correct.  The
reason it's here, though...

The code I took this from is a bit more complex due to directory
renames, cached renames from previous steps (think of rebases or
cherry-picks, where there was a rename from the old base to the new
base), and trivial directory resolutions.

In the more complex code, the initializations aren't needed either;
the variables are never used uninitialized.

BUT certain versions of gcc don't recognize that they are never used
uninitialized and throw errors saying they might be.  Newer gcc
versions recognize that everything is kosher and compiles fine, but
IIRC, the CentOS 7 version of gcc does not.  I want the code to
compile there under DEVELOPER=1 too, and I couldn't find an easy way
to restructure to make it clearer to the compiler.  So I was forced to
add these useless initializations.

...and I just didn't think to rip them out for this preliminary patch.


I can rip them out if you want, but I'll be forced to add them back
later -- in patches where code flow analysis will also suggest it
isn't needed.

> > +             struct strmap_entry *old_ent, *new_ent;
> > +             unsigned int old_sidemask;
> > +             int target_index, other_source_index;
> > +             int source_deleted, collision, type_changed;
> > +
> > +             old_ent = strmap_get_entry(&opt->priv->paths, pair->one->path);
> > +             oldpath = old_ent->key;
> > +             oldinfo = old_ent->value;
> > +
> > +             new_ent = strmap_get_entry(&opt->priv->paths, pair->two->path);
> > +             newpath = new_ent->key;
> > +             newinfo = new_ent->value;
>
> This is moving data around. I wonder if there is any possibility that
> old_ent or new_ent could be NULL here, and we should check for that?
> (The "any possibility" is probably "is there a chance of a bug in the
> earlier logic that didn't cause a failure yet, but would cause a SEGFAULT
> here?".)

Good question.  The only chance at this point in the code of this
happening is if someone has introduced a severe bug in the code
elsewhere.  Any paths that show up in rename detection have to come
from the trees, and in collect_merge_info() we walk over the full
trees and store every path from any of the trees in opt->priv->paths.

Once directory rename detection, caching of renames, and trivial
directory resolution are added to the mix, it suddenly becomes
possible for these to be NULL.  So, the version on 'ort' does have
checks.  This just represents me trying to find chunks of the code
that can be submitted upstream in a fashion where each piece makes
sense.

> > +             /*
> > +              * diff_filepairs have copies of pathnames, thus we have to
> > +              * use standard 'strcmp()' (negated) instead of '=='.
> > +              */
> > +             if (i+1 < renames->nr &&
>
> nit: I tend to prefer "i + 1".

You might have to keep reminding me; I think I've got more of these in
various places.  I'll fix it up, though.

> > +                 !strcmp(oldpath, renames->queue[i+1]->one->path)) {
> > +                     /* Handle rename/rename(1to2) or rename/rename(1to1) */
> > +                     const char *pathnames[3];
> > +
> > +                     pathnames[0] = oldpath;
> > +                     pathnames[1] = newpath;
> > +                     pathnames[2] = renames->queue[i+1]->two->path;
> > +
> > +                     if (!strcmp(pathnames[1], pathnames[2])) {
> > +                             /* Both sides renamed the same way. */
> > +                             die("Not yet implemented");
> > +
> > +                             /* We handled both renames, i.e. i+1 handled */
> > +                             i++;
> > +                             /* Move to next rename */
> > +                             continue;
> > +                     }
> > +
> > +                     /* This is a rename/rename(1to2) */
> > +                     die("Not yet implemented");
>
> Interesting that you chose to do some internal logic to split this
> case, but have both die(). Perhaps that is wise, but also this could
> have been a die() at the start, along with the pathnames[] initialization
> in a later patch that implements the 1to1 case (leaving the 1to2 case
> to die()).

Yeah, that'd also be a fair way to split up the patches.  Doing this
the way I did above allowed me to separate the conflict detection and
conflict handling patches -- later patches in the series were allowed
to focus on "this is how rename/delete is handled", "this is how
rename/rename(1to2) conflicts are handled", etc.  Your split would
mean combining the detection and handling logic for at least one of
the conflict types into the same patch.  That'd still work fine, but
it just wasn't what I came up with due to my focus on the
detection/handling split.

> > +                     i++; /* We handled both renames, i.e. i+1 handled */
> > +                     continue;
> > +             }
> > +
> > +             VERIFY_CI(oldinfo);
> > +             VERIFY_CI(newinfo);
> > +             target_index = pair->score; /* from append_rename_pairs() */
>
> Hm. I don't see append_rename_pairs() anywhere else in the codebase.

Oh, whoops, that function doesn't even exist in the 'ort' branch
anymore either.  pair->score was set in collect_renames(), in the
previous patch.  It was set to the side index.  Looks like I at least
documented it once on one side of its usage.

> Do you mean record_rename_pair()? But in that case, I don't understand
> the following assertion:
>
> > +             assert(target_index == 1 || target_index == 2);
>

See collect_renames() setting of p->score to side_index.

> > +               other_source_index = 3-target_index;
>
> nit: "3 - target_index"

Yep, like I said, you'll have to keep reminding me.  I will fix this up.
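The index arithmetic here works because the sides are numbered 1 and 2, with 0 reserved for the merge base: 3 - target_index picks the other side, and 1 << side is that side's bit in a filemask.  A sketch with hypothetical helper names:

```c
#include <assert.h>

/* Given the side a rename was detected on (1 or 2), return the other side. */
int other_side(int target_index)
{
	assert(target_index == 1 || target_index == 2);
	return 3 - target_index;
}

/*
 * Bit for an entry in a filemask/dirmask: index 0 (merge base) is 1,
 * side 1 is 2, side 2 is 4.
 */
unsigned sidemask_for(int side_index)
{
	return 1u << side_index;
}
```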

> > +             old_sidemask = (1 << other_source_index); /* 2 or 4 */
> > +             source_deleted = (oldinfo->filemask == 1);
>
> This oldinfo->filemask check made me go to the declaration to find
>
>         /*
>          * For filemask and dirmask, see tree-walk.h's struct traverse_info,
>          * particularly the documentation above the "fn" member.  Note that
>          * filemask = mask & ~dirmask from that documentation.
>          */
>         unsigned filemask:3;
>         unsigned dirmask:3;
>
> Perhaps I've missed my window to complain about this comment pointing to
> a comment in another struct definition instead of something like:

It would be better placed as a comment on patch 1 of
en/merge-ort-impl[1].  But it's definitely not too late -- that series
is still in 'seen', you reviewed an earlier round of it[2] (sadly ALSO
labelled as v2), and it was mostly waiting for you and Jonathan to
give a thumbs up on whether you were happy with the changes I made to
the series[3].  Feel free to say that my changes to that series look
okay, except that I need to update the description for filemask and
dirmask.  :-)

[1] https://lore.kernel.org/git/2568ec92c6d96dc51aff4a411900eaec8d32ce27.1607114890.git.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/75170ee7-525e-31fc-f6bd-6dfac12b00c8@gmail.com/
[3] See final portion of
https://lore.kernel.org/git/CABPp-BGcyRURykePOafjcE1z9J8U5awF=PZw1ufx+8Ow+k3j3w@mail.gmail.com/

>         /*
>          * The ith bit corresponds to whether the ith entry is a file
>          * (filemask) or a directory (dirmask). Thus, filemask & dirmask
>          * is always zero and filemask | dirmask == 7 always.
>          */

The first part of this comment looks great.  The last part is false,
though -- the ith entry might not be either a file or a directory.
For example, if the merge base had a file that both sides deleted,
filemask == 1 and dirmask == 0.

I'd be happy to use your wording before the final "and filemask |
dirmask == 7 always" bit, but I think it'd be nice to also keep a "see
also tree-walk.h..." comment.
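To illustrate the mask semantics (bit i set when entry i is a file or a directory, with 0 = merge base, 1 = side 1, 2 = side 2), here is a sketch that builds both masks from three per-entry kinds.  The enum and function are hypothetical and not part of tree-walk.h:

```c
/* Hypothetical per-entry kind; MISSING means no entry at that path. */
enum entry_kind { MISSING, IS_FILE, IS_DIR };

/* Build filemask/dirmask from entries[0] (base), [1] (side 1), [2] (side 2). */
void build_masks(const enum entry_kind entries[3],
		 unsigned *filemask, unsigned *dirmask)
{
	int i;

	*filemask = *dirmask = 0;
	for (i = 0; i < 3; i++) {
		if (entries[i] == IS_FILE)
			*filemask |= 1u << i;
		else if (entries[i] == IS_DIR)
			*dirmask |= 1u << i;
	}
}
```

The both-sides-deleted case yields filemask == 1 and dirmask == 0, so filemask | dirmask is not always 7, while filemask & dirmask is always zero because an entry cannot be both.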

> And of course, looking at this struct provides the justification for
> using "1" and "2" for the "sides" and wasting the 0th value, because
> it is consistent with the three entries here, using 0 as the base.

Yaay, looks like I didn't need to convince you after all!  At least
not on that point...

> Thus, the comment about not using the 0 position could use a reference
> to these triples. I still think that using an enum would help here.

I'm genuinely curious about your thoughts on the various other
questions I raised about that point in the earlier patches to this
series and your thoughts on my other suggestion there.

> Coming back to the line in question, "filemask == 1" _does_ mean that
> the file exists only in the base. Deleted indeed.

There's going to be a lot more of these later, especially in the basic
conflict handling series...  :-)

> > +             collision = ((newinfo->filemask & old_sidemask) != 0);
> > +             type_changed = !source_deleted &&
> > +                     (S_ISREG(oldinfo->stages[other_source_index].mode) !=
> > +                      S_ISREG(newinfo->stages[target_index].mode));
> > +             if (type_changed && collision) {
> > +                     /* special handling so later blocks can handle this */
> > +                     die("Not yet implemented");
> > +             }
> > +
> > +             assert(source_deleted || oldinfo->filemask & old_sidemask);
> > +
> > +             /* Need to check for special types of rename conflicts... */
> > +             if (collision && !source_deleted) {
> > +                     /* collision: rename/add or rename/rename(2to1) */
> > +                     die("Not yet implemented");
> > +             } else if (collision && source_deleted) {
> > +                     /* rename/add/delete or rename/rename(2to1)/delete */
>
> How did we get to three actions here? I'll probably learn when
> it is implemented.

Multiple actions that appear unrelated on one side suddenly get glued
together after the other side's rename detection.

rename/add/delete comes about like so:

Base version: file A exists
Side 1: deletes A, adds unrelated B
Side 2: renames A -> B

rename/rename(2to1)/delete comes about as follows:

Base version: file A and file B both exist.
Side 1: deletes A, renames B -> C
Side 2: renames A -> C

(You could also have rename/rename(2to1)/delete/delete if side 2 also
deleted B, but it doesn't present any additional complications for the
code.)

You can find all kinds of crazy rename cases in t6422 and t6416.
"mod6" is fun.  Perhaps I should add a comment somewhere referencing
these testcases?
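The branch structure of process_renames() quoted above can be summarized as a small decision function.  This is an illustrative sketch with hypothetical names; it only reports which case a pair falls into, and it omits the type_changed && collision pre-adjustment, which in the real code prepares such pairs for the collision branches below it:

```c
/* Hypothetical labels for the branches in process_renames(). */
enum rename_case {
	RENAME_NORMAL,
	RENAME_DELETE,                   /* rename/delete */
	RENAME_TYPECHANGE,               /* rename vs. typechange */
	RENAME_ADD_OR_2TO1,              /* rename/add or rename/rename(2to1) */
	RENAME_ADD_DELETE_OR_2TO1_DELETE /* ...plus the source was deleted */
};

enum rename_case classify_rename(int collision, int source_deleted,
				 int type_changed)
{
	if (collision)
		return source_deleted ? RENAME_ADD_DELETE_OR_2TO1_DELETE
				      : RENAME_ADD_OR_2TO1;
	if (type_changed)
		return RENAME_TYPECHANGE;
	if (source_deleted)
		return RENAME_DELETE;
	return RENAME_NORMAL;
}
```

For the rename/add/delete example (side 1 deletes A and adds B, side 2 renames A -> B), A exists only in the base (source_deleted) and side 1 has a B (collision), so classify_rename(1, 1, 0) reports RENAME_ADD_DELETE_OR_2TO1_DELETE.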


...and if you're really curious...

All of these conflict types were much worse with merge-recursive.c,
because sometimes each code path had to consider the combination of
all possible conflicts.  Thus if you were ever worried about a
rename/rename(1to2)/content conflict/file location/(D/F)/(D/F)/
appearing[4], there might need to be a single specific codepath that
handled all of those simultaneously.  merge-ort makes these more
orthogonal.  For example, one place in merge-ort handles all
directory/file conflicts, regardless of what other conflicts they are
part of, whereas in merge-recursive there was directory/file conflict
handling code shotgun blasted everywhere and probably missing from
several specific conflict types.

[4] You can read this conflict as: both sides modify a file in
conflicting ways ("content conflict"), both rename that file but to
different paths ("rename/rename(1to2)"), one side renames the
directory which the other side had renamed that file into, causing it
to possibly need a transitive rename ("file location"; for example, if
one side renamed A -> B/C, and the other side renamed B/ -> Beta/),
and each side puts a directory in the way of the other's path (each of
the "D/F", e.g. if the side that renamed B/ -> Beta/ also added a
Beta/C/ directory to be in the way of A getting renamed to end up
there).)

> > +                     die("Not yet implemented");
> > +             } else {
> > +                     /* a few different cases... */
> > +                     if (type_changed) {
> > +                             /* rename vs. typechange */
> > +                             die("Not yet implemented");
> > +                     } else if (source_deleted) {
> > +                             /* rename/delete */
> > +                             die("Not yet implemented");
> > +                     } else {
> > +                             /* normal rename */
> > +                             die("Not yet implemented");
> > +                     }
> > +             }
> > +
> > +             if (!type_changed) {
> > +                     /* Mark the original as resolved by removal */
> > +                     oldinfo->merged.is_null = 1;
> > +                     oldinfo->merged.clean = 1;
> > +             }
> > +
> > +     }
> > +
> > +     return clean_merge;
>
> I'm glad you separated out this case organization from the implementations.
> It's still a bit dense, so I'll probably need to revisit as I see you fill
> in the rest.
>
> Thanks,
> -Stolee

* Re: [PATCH 07/11] merge-ort: add implementation of both sides renaming differently
  2020-12-11  3:39   ` Derrick Stolee
@ 2020-12-11 21:56     ` Elijah Newren
  0 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2020-12-11 21:56 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Thu, Dec 10, 2020 at 7:39 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/9/2020 2:41 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Implement rename/rename(1to2) handling, i.e. both sides of history
> > renaming a file but renaming it differently.  This code replaces the
> > following from merge-recursive.c:
> >
> >   * all the 1to2 code in process_renames()
> >   * the RENAME_ONE_FILE_TO_TWO case of process_entry()
> >   * handle_rename_rename_1to2()
> >
> > Also, there is some shared code from merge-recursive.c for multiple
> > different rename cases which we will no longer need for this case (or
> > other rename cases):
> >
> >   * handle_file_collision()
> >   * setup_rename_conflict_info()
> >
> > The consolidation of five separate codepaths into one is made possible
> > by a change in design:
>
> Excellent!
>
> >                       /* This is a rename/rename(1to2) */
> > -                     die("Not yet implemented");
> > +                     clean_merge = handle_content_merge(opt,
> > +                                                        pair->one->path,
> > +                                                        &base->stages[0],
> > +                                                        &side1->stages[1],
> > +                                                        &side2->stages[2],
> > +                                                        pathnames,
> > +                                                        1 + 2 * opt->priv->call_depth,
> > +                                                        &merged);
>
> (this method currently die()s. ok)
>
> > +                     if (!clean_merge &&
> > +                         merged.mode == side1->stages[1].mode &&
> > +                         oideq(&merged.oid, &side1->stages[1].oid)) {
> > +                             was_binary_blob = 1;
> > +                     }
>
> nit: Extraneous braces?

Yeah.

> > +                     memcpy(&side1->stages[1], &merged, sizeof(merged));
> > +                     if (was_binary_blob) {
> > +                             /*
> > +                              * Getting here means we were attempting to
> > +                              * merge a binary blob.
> > +                              *
> > +                              * Since we can't merge binaries,
> > +                              * handle_content_merge() just takes one
> > +                              * side.  But we don't want to copy the
> > +                              * contents of one side to both paths.  We
> > +                              * used the contents of side1 above for
> > +                              * side1->stages, let's use the contents of
> > +                              * side2 for side2->stages below.
> > +                              */
> > +                             oidcpy(&merged.oid, &side2->stages[2].oid);
> > +                             merged.mode = side2->stages[2].mode;
> > +                     }
> > +                     memcpy(&side2->stages[2], &merged, sizeof(merged));
> > +
> > +                     side1->path_conflict = 1;
> > +                     side2->path_conflict = 1;
> > +                     /*
> > +                      * TODO: For renames we normally remove the path at the
> > +                      * old name.  It would thus seem consistent to do the
> > +                      * same for rename/rename(1to2) cases, but we haven't
> > +                      * done so traditionally and a number of the regression
> > +                      * tests now encode an expectation that the file is
> > +                      * left there at stage 1.  If we ever decide to change
> > +                      * this, add the following two lines here:
> > +                      *    base->merged.is_null = 1;
> > +                      *    base->merged.clean = 1;
> > +                      * and remove the setting of base->path_conflict to 1.
> > +                      */
> > +                     base->path_conflict = 1;
>
> I'm getting the point of the review/evening where I'm starting to gloss
> over these important details. Time to take a break (after this patch).

I'm surprised you didn't take a break after giving a talk at GitHub
Universe earlier yesterday.  I don't think I got anything done the
rest of the day after I gave my talk at GitMerge 2020.  Nice job on
your talk, by the way; I'm sending it along to some others to watch.

> > +                     path_msg(opt, oldpath, 0,
> > +                              _("CONFLICT (rename/rename): %s renamed to "
> > +                                "%s in %s and to %s in %s."),
> > +                              pathnames[0],
> > +                              pathnames[1], opt->branch1,
> > +                              pathnames[2], opt->branch2);
>
> This output differs a bit from handle_rename_rename_1to2() in
> merge-recursive.c:
>
>         output(opt, 1, _("CONFLICT (rename/rename): "
>                "Rename \"%s\"->\"%s\" in branch \"%s\" "
>                "rename \"%s\"->\"%s\" in \"%s\"%s"),
>                o->path, a->path, ci->ren1->branch,
>                o->path, b->path, ci->ren2->branch,
>                opt->priv->call_depth ? _(" (left unresolved)") : "");
>
> How much do we want to have _exact_ output matches between the
> two strategies, at least in the short term?

Good question.  I started with such a goal, and discarded it when I
discovered it was unrealistic and even actively harmful -- at least in
the extreme of exact matching in all cases.  I've already updated the
regression tests to expect differences in output and behavior in a few
different series that merged down months ago[1][2][3].
   [1] https://lore.kernel.org/git/pull.827.v3.git.git.1597098559.gitgitgadget@gmail.com/
   [2] https://lore.kernel.org/git/pull.769.v2.git.1603731704.gitgitgadget@gmail.com/
   [3] https://lore.kernel.org/git/pull.879.git.git.1602794790.gitgitgadget@gmail.com/

Of particular note from those series, at least as far as output
messages, are the following commits:
   1f3c9ba707 ("t6425: be more flexible with rename/delete conflict messages", 2020-08-10)
   2a7c16c980 ("t6422, t6426: be more flexible for add/add conflicts involving renames", 2020-08-10)
   c8c35f6a02 ("merge tests: expect slight differences in output for recursive vs. ort", 2020-10-26)

Summarizing those, I split some conflict messages into multiple
messages, changed the name of some conflict types that get reported,
avoided having some messages go to stdout while others go to stderr
(based on who added them rather than differences in severity of the
message), and even just changed some messages (because once you've
accepted the other changes, exact matching just doesn't matter).
Other commits in the series also note how I changed the merge behavior
in addition to the output in various cases.

The rename/delete conflict commit message is particularly
illustrative.  It demonstrates why exact matching of output messages
is unachievable short of keeping a completely busted code design.

We can certainly tweak individual messages if we feel something will
make them clearer, but matching isn't a goal for me.

> > @@ -1257,13 +1309,13 @@ static void process_entry(struct merge_options *opt,
> >               int side = (ci->filemask == 4) ? 2 : 1;
> >               ci->merged.result.mode = ci->stages[side].mode;
> >               oidcpy(&ci->merged.result.oid, &ci->stages[side].oid);
> > -             ci->merged.clean = !ci->df_conflict;
> > +             ci->merged.clean = !ci->df_conflict && !ci->path_conflict;
> >       } else if (ci->filemask == 1) {
> >               /* Deleted on both sides */
> >               ci->merged.is_null = 1;
> >               ci->merged.result.mode = 0;
> >               oidcpy(&ci->merged.result.oid, &null_oid);
> > -             ci->merged.clean = 1;
> > +             ci->merged.clean = !ci->path_conflict;
>
> These exist because this is the first time we assign path_conflict.
> Sure.
>
> Thanks,
> -Stolee


Thanks for all your detailed reviews!

* Re: [PATCH 02/11] merge-ort: add initial outline for basic rename detection
  2020-12-11  2:39   ` Derrick Stolee
  2020-12-11  9:40     ` Elijah Newren
@ 2020-12-13  7:47     ` Elijah Newren
  2020-12-14 14:33       ` Derrick Stolee
  1 sibling, 1 reply; 65+ messages in thread
From: Elijah Newren @ 2020-12-13  7:47 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

Hi,

Sorry for two different email responses to the same email...

Addressing the comments on this patchset means re-submitting
en/merge-ort-impl, and causing conflicts in en/merge-ort-2 and this
series en/merge-ort-3.  Since gitgitgadget will not allow me to submit
patches against a series that isn't published by Junio, I'll need to
ask Junio to temporarily drop both of these series, then later
resubmit en/merge-ort-2 after he publishes my updates to
en/merge-ort-impl.  Then when he publishes my updates to
en/merge-ort-2, I'll be able to submit my already-rebased patches for
en/merge-ort-3.

A couple extra comments below...

On Thu, Dec 10, 2020 at 6:39 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/9/2020 2:41 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  merge-ort.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++-------
> >  1 file changed, 60 insertions(+), 8 deletions(-)
> >
> > diff --git a/merge-ort.c b/merge-ort.c
> > index 90baedac407..92b765dd3f0 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -617,20 +617,72 @@ static int handle_content_merge(struct merge_options *opt,
> >
> >  /*** Function Grouping: functions related to regular rename detection ***/
> >
> > +static int process_renames(struct merge_options *opt,
> > +                        struct diff_queue_struct *renames)
> > +static int compare_pairs(const void *a_, const void *b_)
> > +/* Call diffcore_rename() to compute which files have changed on given side */
> > +static void detect_regular_renames(struct merge_options *opt,
> > +                                struct tree *merge_base,
> > +                                struct tree *side,
> > +                                unsigned side_index)
> > +static int collect_renames(struct merge_options *opt,
> > +                        struct diff_queue_struct *result,
> > +                        unsigned side_index)
>
> standard "I promise this will follow soon!" strategy, OK.
>
> >  static int detect_and_process_renames(struct merge_options *opt,
> >                                     struct tree *merge_base,
> >                                     struct tree *side1,
> >                                     struct tree *side2)
> >  {
> > -     int clean = 1;
> > +     struct diff_queue_struct combined;
> > +     struct rename_info *renames = opt->priv->renames;
>
> (Re: my concerns that we don't need 'renames' to be a pointer,
> this could easily be "renames = &opt->priv.renames;")
>
> > +     int s, clean = 1;
> > +
> > +     memset(&combined, 0, sizeof(combined));
> > +
> > +     detect_regular_renames(opt, merge_base, side1, 1);
> > +     detect_regular_renames(opt, merge_base, side2, 2);
>
> Find the renames in each side's diff.
>
> I think the use of "1" and "2" here might be better situated
> for an enum. Perhaps:
>
> enum merge_side {
>         MERGE_SIDE1 = 0,
>         MERGE_SIDE2 = 1,
> };
>
> (Note, I shift these values to 0 and 1, respectively, allowing
> us to truncate the pairs array to two entries while still
> being mentally clear.)

So, after mulling it over for a while, I created a

enum merge_side {
    MERGE_BASE = 0,
    MERGE_SIDE1 = 1,
    MERGE_SIDE2 = 2
};

and I made use of it in several places.  I just avoided going to an
extreme with it (e.g. adding another enum for masks or changing all
possibly relevant variables from ints to enum merge_side), and used it
more as a document-when-values-are-meant-to-refer-to-sides-of-the-merge
kind of thing.  Of course, this affects two previous patchsets and not
just this one, so I'll have to post a _lot_ of new patches...   :-)
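The sketch below shows how such an enum can serve both roles discussed here: as an index into a three-slot per-side array whose slot 0 (MERGE_BASE) is unused, and as a bit position in a filemask (base = 1, side 1 = 2, side 2 = 4). It is only an illustration. The helper names side_bit(), lone_side() and total_side_entries() are hypothetical and do not come from merge-ort.c.

```c
/*
 * Sketch of the enum described above.  MERGE_BASE doubles as the
 * unused slot 0 of a three-entry per-side array and as bit 0 of a
 * filemask.
 */
enum merge_side {
	MERGE_BASE = 0,
	MERGE_SIDE1 = 1,
	MERGE_SIDE2 = 2
};

/* Filemask bit for a side: base = 1, side 1 = 2, side 2 = 4. */
static unsigned side_bit(enum merge_side side)
{
	return 1u << side;
}

/*
 * Hypothetical helper: if exactly one of the two sides (and not the
 * base) has the path, return that side; otherwise return -1.
 */
static int lone_side(unsigned filemask)
{
	if (filemask == side_bit(MERGE_SIDE1))
		return MERGE_SIDE1;
	if (filemask == side_bit(MERGE_SIDE2))
		return MERGE_SIDE2;
	return -1;
}

/* Hypothetical: total entries across both sides, skipping unused slot 0. */
static int total_side_entries(const int nr[3])
{
	int s, total = 0;

	for (s = MERGE_SIDE1; s <= MERGE_SIDE2; s++)
		total += nr[s];
	return total;
}
```

Looping from MERGE_SIDE1 to MERGE_SIDE2 keeps the "slot 0 is unused" convention in one visible place instead of repeating the bare literals 1 and 2.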

> > +
> > +     ALLOC_GROW(combined.queue,
> > +                renames->pairs[1].nr + renames->pairs[2].nr,
> > +                combined.alloc);
> > +     clean &= collect_renames(opt, &combined, 1);
> > +     clean &= collect_renames(opt, &combined, 2);
>
> Magic numbers again.
>
> > +     QSORT(combined.queue, combined.nr, compare_pairs);
> > +
> > +     clean &= process_renames(opt, &combined);
>
> I need to mentally remember that "clean" is a return state,
> but _not_ a fail/success result. Even though we are using
> "&=" here, it shouldn't be "&&=" or even "if (method()) return 1;"
>
> Looking at how "clean" is used in struct merge_result, I
> wonder if there is a reason to use an "int" over a simple
> "unsigned" or even "unsigned clean:1;" You use -1 in places
> as well as a case of "mi->clean = !!resolved;"

Something I missed in my reply yesterday...

Note that mi->clean is NOT from struct merge_result.  It is from
struct merged_info, and in that struct it IS defined as "unsigned
clean:1", i.e. it is a true boolean.  The merged_info.clean field is
used to determine whether a specific path merged cleanly.

"clean" from struct merge_result is whether the entirety of the merge
was clean or not.  It's almost a boolean, but allows for a
"catastrophic problem encountered" value.  I added the following
comment:
/*
* Whether the merge is clean; possible values:
*    1: clean
*    0: not clean (merge conflicts)
*   <0: operation aborted prematurely.  (object database
*       unreadable, disk full, etc.)  Worktree may be left in an
*       inconsistent state if operation failed near the end.
*/

This also means that I either abort and return a negative value, or I
can continue treating merge_result's "clean" field as a boolean.

But again, this isn't new to this patchset; it affects the patchset
before the patchset before this one.
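The sketch below illustrates the convention in that comment: 1 means clean, 0 means conflicts, and a negative value means the operation was aborted. It assumes that per-step results are folded together with "&=" only while they are booleans, and that a negative value is returned immediately instead. On two's-complement machines 1 & -1 is 1, so folding a negative value into "clean" would silently hide the failure. combine_clean() is a hypothetical helper and is not part of merge-ort.

```c
/*
 * Hypothetical helper for the "clean" convention quoted above:
 *    1: clean, 0: conflicts, <0: aborted.
 * Boolean step results are combined with &=; a negative result is
 * propagated at once instead of being folded in.
 */
static int combine_clean(const int *results, int n)
{
	int i, clean = 1;

	for (i = 0; i < n; i++) {
		if (results[i] < 0)
			return results[i];	/* abort: propagate the error */
		clean &= results[i];	/* safe: only 0 or 1 reach here */
	}
	return clean;
}
```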

> If there is more meaning to values other than "clean" or
> "!clean", then an enum might be valuable.
>
> > +     /* Free memory for renames->pairs[] and combined */
> > +     for (s = 1; s <= 2; s++) {
> > +             free(renames->pairs[s].queue);
> > +             DIFF_QUEUE_CLEAR(&renames->pairs[s]);
> > +     }
>
> This loop is particularly unusual. Perhaps it would be
> better to do this instead:
>
>         free(renames->pairs[MERGE_SIDE1].queue);
>         free(renames->pairs[MERGE_SIDE2].queue);
>         DIFF_QUEUE_CLEAR(&renames->pairs[MERGE_SIDE1]);
>         DIFF_QUEUE_CLEAR(&renames->pairs[MERGE_SIDE2]);
>
> > +     if (combined.nr) {
> > +             int i;
> > +             for (i = 0; i < combined.nr; i++)
> > +                     diff_free_filepair(combined.queue[i]);
> > +             free(combined.queue);
> > +     }
> >
> > -     /*
> > -      * Rename detection works by detecting file similarity.  Here we use
> > -      * a really easy-to-implement scheme: files are similar IFF they have
> > -      * the same filename.  Therefore, by this scheme, there are no renames.
> > -      *
> > -      * TODO: Actually implement a real rename detection scheme.
> > -      */
> >       return clean;
>
> I notice that this change causes detect_and_process_renames() to
> change from an "unhelpful result, but success" to "die() always".
>
> I wonder if there is value in swapping the order of the patches
> to implement the static methods first. Of course, you hit the
> "unreferenced static method" problem, so maybe your strategy is
> better after all.
>
> Thanks,
> -Stolee

* Re: [PATCH 02/11] merge-ort: add initial outline for basic rename detection
  2020-12-13  7:47     ` Elijah Newren
@ 2020-12-14 14:33       ` Derrick Stolee
  2020-12-14 15:42         ` Johannes Schindelin
  2020-12-14 17:35         ` Elijah Newren
  0 siblings, 2 replies; 65+ messages in thread
From: Derrick Stolee @ 2020-12-14 14:33 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On 12/13/2020 2:47 AM, Elijah Newren wrote:
> Hi,
> 
> Sorry for two different email responses to the same email...
> 
> Addressing the comments on this patchset means re-submitting
> en/merge-ort-impl, and causing conflicts in en/merge-ort-2 and this
> series en/merge-ort-3.  Since gitgitgadget will not allow me to submit
> patches against a series that isn't published by Junio, I'll need to
> ask Junio to temporarily drop both of these series, then later
> resubmit en/merge-ort-2 after he publishes my updates to
> en/merge-ort-impl.  Then when he publishes my updates to
> en/merge-ort-2, I'll be able to submit my already-rebased patches for
> en/merge-ort-3.

Let's chat privately about perhaps creatin
 
> A couple extra comments below...


>>> +     int s, clean = 1;
>>> +
>>> +     memset(&combined, 0, sizeof(combined));
>>> +
>>> +     detect_regular_renames(opt, merge_base, side1, 1);
>>> +     detect_regular_renames(opt, merge_base, side2, 2);
>>
>> Find the renames in each side's diff.
>>
>> I think the use of "1" and "2" here might be better situated
>> for an enum. Perhaps:
>>
>> enum merge_side {
>>         MERGE_SIDE1 = 0,
>>         MERGE_SIDE2 = 1,
>> };
>>
>> (Note, I shift these values to 0 and 1, respectively, allowing
>> us to truncate the pairs array to two entries while still
>> being mentally clear.)
> 
> So, after mulling it over for a while, I created a
> 
> enum merge_side {
>     MERGE_BASE = 0,
>     MERGE_SIDE1 = 1,
>     MERGE_SIDE2 = 2
> };
> 
> and I made use of it in several places.  I just avoided going to an
> extreme with it (e.g. adding another enum for masks or changing all
> possibly relevant variables from ints to enum merge_side), and used it
> more as a document-when-values-are-meant-to-refer-to-sides-of-the-merge
> kind of thing.  Of course, this affects two previous patchsets and not
> just this one, so I'll have to post a _lot_ of new patches...   :-)

I appreciate using names for the meaning behind a numerical constant.
You mentioned in the other thread that this will eventually expand to
a list of 10 entries, which is particularly frightening if we don't
get some control over it now.

I generally prefer using types to convey meaning as well, but I'm
willing to relax on this because I believe C won't complain if you
pass a literal int into an enum-typed parameter, so the compiler
doesn't help enough in that sense.
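C does allow this. An integer argument converts implicitly to an enumeration parameter, so even a value that names no enumerator compiles without an error, and common compilers do not warn about it under their default settings. The sketch below uses a hypothetical other_side() helper to demonstrate the point. The cast to int avoids unsigned arithmetic on compilers that give this enum an unsigned underlying type.

```c
/* Hypothetical enum mirroring the one discussed in this thread. */
enum merge_side {
	MERGE_BASE = 0,
	MERGE_SIDE1 = 1,
	MERGE_SIDE2 = 2
};

/* The parameter type documents intent, but does not restrict values. */
static int other_side(enum merge_side side)
{
	return 3 - (int)side;	/* MERGE_SIDE1 <-> MERGE_SIDE2 */
}

/*
 * Both calls compile; the second passes 7, which is not a valid side,
 * and the compiler accepts it anyway.
 */
static int demo_calls(void)
{
	int a = other_side(MERGE_SIDE1);	/* 2 */
	int b = other_side(7);			/* -4 */

	return a + b;
}
```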

> Something I missed in my reply yesterday...
> 
> Note that mi->clean is NOT from struct merge_result.  It is from
> struct merged_info, and in that struct it IS defined as "unsigned
> clean:1", i.e. it is a true boolean.  The merged_info.clean field is
> used to determine whether a specific path merged cleanly.
> 
> "clean" from struct merge_result is whether the entirety of the merge
> was clean or not.  It's almost a boolean, but allows for a
> "catastrophic problem encountered" value.  I added the following
> comment:
> /*
> * Whether the merge is clean; possible values:
> *    1: clean
> *    0: not clean (merge conflicts)
> *   <0: operation aborted prematurely.  (object database
> *       unreadable, disk full, etc.)  Worktree may be left in an
> *       inconsistent state if operation failed near the end.
> */
> 
> This also means that I either abort and return a negative value, or I
> can continue treating merge_result's "clean" field as a boolean.

Having this comment helps a lot!
 
> But again, this isn't new to this patchset; it affects the patchset
> before the patchset before this one.

Right, when I had the current change checked out, I didn't see the
patch that introduced the 'clean' member (though, I _could_ have
blamed to find out). Instead, I just got confused and thought it
worth a question. Your comment prevents this question in the future.

Thanks,
-Stolee

* Re: [PATCH 02/11] merge-ort: add initial outline for basic rename detection
  2020-12-14 14:33       ` Derrick Stolee
@ 2020-12-14 15:42         ` Johannes Schindelin
  2020-12-14 16:11           ` Elijah Newren
  2020-12-14 17:35         ` Elijah Newren
  1 sibling, 1 reply; 65+ messages in thread
From: Johannes Schindelin @ 2020-12-14 15:42 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren, Elijah Newren via GitGitGadget, Git Mailing List

Hi Elijah & Stolee,

On Mon, 14 Dec 2020, Derrick Stolee wrote:

> On 12/13/2020 2:47 AM, Elijah Newren wrote:
> >
> > Sorry for two different email responses to the same email...
> >
> > Addressing the comments on this patchset means re-submitting
> > en/merge-ort-impl, and causing conflicts in en/merge-ort-2 and this
> > series en/merge-ort-3.  Since gitgitgadget will not allow me to submit
> > patches against a series that isn't published by Junio, I'll need to
> > ask Junio to temporarily drop both of these series, then later
> > resubmit en/merge-ort-2 after he publishes my updates to
> > en/merge-ort-impl.  Then when he publishes my updates to
> > en/merge-ort-2, I'll be able to submit my already-rebased patches for
> > en/merge-ort-3.
>
> Let's chat privately about perhaps creatin

Yes, I am totally willing to push up temporary branches if that helps you,
or even giving you push permissions to do that.

Ciao,
Dscho

* Re: [PATCH 02/11] merge-ort: add initial outline for basic rename detection
  2020-12-14 15:42         ` Johannes Schindelin
@ 2020-12-14 16:11           ` Elijah Newren
  2020-12-14 16:50             ` Johannes Schindelin
  0 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren @ 2020-12-14 16:11 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List

Hi,

On Mon, Dec 14, 2020 at 7:42 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Elijah & Stolee,
>
> On Mon, 14 Dec 2020, Derrick Stolee wrote:
>
> > On 12/13/2020 2:47 AM, Elijah Newren wrote:
> > >
> > > Sorry for two different email responses to the same email...
> > >
> > > Addressing the comments on this patchset means re-submitting
> > > en/merge-ort-impl, and causing conflicts in en/merge-ort-2 and this
> > > series en/merge-ort-3.  Since gitgitgadget will not allow me to submit
> > > patches against a series that isn't published by Junio, I'll need to
> > > ask Junio to temporarily drop both of these series, then later
> > > resubmit en/merge-ort-2 after he publishes my updates to
> > > en/merge-ort-impl.  Then when he publishes my updates to
> > > en/merge-ort-2, I'll be able to submit my already-rebased patches for
> > > en/merge-ort-3.
> >
> > Let's chat privately about perhaps creatin
>
> Yes, I am totally willing to push up temporary branches if that helps you,
> or even giving you push permissions to do that.
>
> Ciao,
> Dscho

Given the amount of changes left to push up, I suspect there'll be
more cases where it'd be useful.  If I could get push permissions, and
a suggested namespace to use for such temporary branches, that'd help.

In this particular case, though, one of my two fears was already
realized -- Junio jumped in and did the work of rebasing and conflict
resolving for en/merge-ort-2 and en/merge-ort-3.  I didn't want to
burden him with that extra work, but what he pushed up for
en/merge-ort-2 is identical to what I have.  So, all I have to do is
push en/merge-ort-3 with the extra changes I have in it.  So this
particular time is taken care of.

Thanks!

* [PATCH v2 00/11] merge-ort: add basic rename detection
  2020-12-09 19:41 [PATCH 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
                   ` (10 preceding siblings ...)
  2020-12-09 19:41 ` [PATCH 11/11] merge-ort: add implementation of type-changed " Elijah Newren via GitGitGadget
@ 2020-12-14 16:21 ` Elijah Newren via GitGitGadget
  2020-12-14 16:21   ` [PATCH v2 01/11] merge-ort: add basic data structures for handling renames Elijah Newren via GitGitGadget
                     ` (12 more replies)
  11 siblings, 13 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-14 16:21 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren

This series builds on en/merge-ort-2 and adds basic rename detection to
merge-ort.

Changes since v1 (all due to feedback from Stolee's reviews):

 * embedded struct rename_info directly in struct merge_options_internal (no
   longer a pointer)
 * expanded use of new enum merge_side and its new MERGE_BASE, MERGE_SIDE1,
   MERGE_SIDE2 constants
 * removed unnecessary secondary sort in compare_pairs()
 * improved commit messages
 * document p->score reuse with better comment(s)
 * space around operators
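As an illustration of the first item, the sketch below embeds rename_info directly in the internal options struct. A single zeroing allocation then initializes it, a single free() releases it, and callers take &opt->priv->renames when they want a pointer. The structures are simplified, hypothetical stand-ins, and plain calloc() stands in for git's xcalloc().

```c
#include <stdlib.h>

/* Simplified, hypothetical stand-ins for the merge-ort structures. */
struct rename_info {
	int needed_limit;
};

struct merge_options_internal {
	struct rename_info renames;	/* embedded, no separate allocation */
};

struct merge_options {
	struct merge_options_internal *priv;
};

/* One zeroing allocation also initializes the embedded rename_info. */
static int start_merge(struct merge_options *opt)
{
	opt->priv = calloc(1, sizeof(*opt->priv));
	return opt->priv ? 0 : -1;
}

/* Take the member's address when a pointer is convenient. */
static struct rename_info *renames_of(struct merge_options *opt)
{
	return &opt->priv->renames;
}

/* A single free() releases the embedded struct as well. */
static void finish_merge(struct merge_options *opt)
{
	free(opt->priv);
	opt->priv = NULL;
}

/* Round trip: returns the initial needed_limit (0), or -1 on OOM. */
static int initial_needed_limit(void)
{
	struct merge_options opt;
	int limit;

	if (start_merge(&opt))
		return -1;
	limit = renames_of(&opt)->needed_limit;
	finish_merge(&opt);
	return limit;
}
```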

Elijah Newren (11):
  merge-ort: add basic data structures for handling renames
  merge-ort: add initial outline for basic rename detection
  merge-ort: implement detect_regular_renames()
  merge-ort: implement compare_pairs() and collect_renames()
  merge-ort: add basic outline for process_renames()
  merge-ort: add implementation of both sides renaming identically
  merge-ort: add implementation of both sides renaming differently
  merge-ort: add implementation of rename collisions
  merge-ort: add implementation of rename/delete conflicts
  merge-ort: add implementation of normal rename handling
  merge-ort: add implementation of type-changed rename handling

 merge-ort.c | 445 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 429 insertions(+), 16 deletions(-)


base-commit: c5a6f65527aa3b6f5d7cf25437a88d8727ab0646
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-812%2Fnewren%2Fort-renames-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-812/newren/ort-renames-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/812

Range-diff vs v1:

  1:  ef8f315f828 !  1:  78621ca0788 merge-ort: add basic data structures for handling renames
     @@ Commit message
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## merge-ort.c ##
     -@@
     - #include "unpack-trees.h"
     - #include "xdiff-interface.h"
     +@@ merge-ort.c: enum merge_side {
     + 	MERGE_SIDE2 = 2
     + };
       
      +struct rename_info {
      +	/*
     @@ merge-ort.c: struct merge_options_internal {
      +	/*
      +	 * renames: various data relating to rename detection
      +	 */
     -+	struct rename_info *renames;
     ++	struct rename_info renames;
      +
       	/*
       	 * current_dir_name: temporary var used in collect_merge_info_callback()
       	 *
     -@@ merge-ort.c: static void merge_start(struct merge_options *opt, struct merge_result *result)
     - 
     - 	/* Initialization of opt->priv, our internal merge data */
     - 	opt->priv = xcalloc(1, sizeof(*opt->priv));
     -+	opt->priv->renames = xcalloc(1, sizeof(*opt->priv->renames));
     - 
     - 	/*
     - 	 * Although we initialize opt->priv->paths with strdup_strings=0,
  2:  b9e0e1a60b9 !  2:  d846decf40b merge-ort: add initial outline for basic rename detection
     @@ merge-ort.c: static int handle_content_merge(struct merge_options *opt,
       {
      -	int clean = 1;
      +	struct diff_queue_struct combined;
     -+	struct rename_info *renames = opt->priv->renames;
     ++	struct rename_info *renames = &opt->priv->renames;
      +	int s, clean = 1;
      +
      +	memset(&combined, 0, sizeof(combined));
      +
     -+	detect_regular_renames(opt, merge_base, side1, 1);
     -+	detect_regular_renames(opt, merge_base, side2, 2);
     ++	detect_regular_renames(opt, merge_base, side1, MERGE_SIDE1);
     ++	detect_regular_renames(opt, merge_base, side2, MERGE_SIDE2);
      +
      +	ALLOC_GROW(combined.queue,
      +		   renames->pairs[1].nr + renames->pairs[2].nr,
      +		   combined.alloc);
     -+	clean &= collect_renames(opt, &combined, 1);
     -+	clean &= collect_renames(opt, &combined, 2);
     ++	clean &= collect_renames(opt, &combined, MERGE_SIDE1);
     ++	clean &= collect_renames(opt, &combined, MERGE_SIDE2);
      +	QSORT(combined.queue, combined.nr, compare_pairs);
      +
      +	clean &= process_renames(opt, &combined);
      +
      +	/* Free memory for renames->pairs[] and combined */
     -+	for (s = 1; s <= 2; s++) {
     ++	for (s = MERGE_SIDE1; s <= MERGE_SIDE2; s++) {
      +		free(renames->pairs[s].queue);
      +		DIFF_QUEUE_CLEAR(&renames->pairs[s]);
      +	}
  3:  ba30bc8686e !  3:  620fc64032d merge-ort: implement detect_regular_renames()
     @@ Metadata
       ## Commit message ##
          merge-ort: implement detect_regular_renames()
      
     -    Based heavily on merge-recursive's get_diffpairs() function.
     +    Based heavily on merge-recursive's get_diffpairs() function, and also
     +    includes the necessary paired call to diff_warn_rename_limit() so that
     +    users will be warned if merge.renameLimit is not sufficiently large for
     +    rename detection to run.
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     @@ merge-ort.c: static void detect_regular_renames(struct merge_options *opt,
       {
      -	die("Not yet implemented.");
      +	struct diff_options diff_opts;
     -+	struct rename_info *renames = opt->priv->renames;
     ++	struct rename_info *renames = &opt->priv->renames;
      +
      +	repo_diff_setup(opt->repo, &diff_opts);
      +	diff_opts.flags.recursive = 1;
     @@ merge-ort.c: static void detect_regular_renames(struct merge_options *opt,
      +		      &diff_opts);
      +	diffcore_std(&diff_opts);
      +
     -+	if (diff_opts.needed_rename_limit > opt->priv->renames->needed_limit)
     -+		opt->priv->renames->needed_limit = diff_opts.needed_rename_limit;
     ++	if (diff_opts.needed_rename_limit > renames->needed_limit)
     ++		renames->needed_limit = diff_opts.needed_rename_limit;
      +
      +	renames->pairs[side_index] = diff_queued_diff;
      +
     @@ merge-ort.c: void merge_switch_to_result(struct merge_options *opt,
      +
      +		/* Also include needed rename limit adjustment now */
      +		diff_warn_rename_limit("merge.renamelimit",
     -+				       opti->renames->needed_limit, 0);
     ++				       opti->renames.needed_limit, 0);
       	}
       
       	merge_finalize(opt, result);
  4:  207bb9a837c !  4:  9382dc4d50b merge-ort: implement compare_pairs() and collect_renames()
     @@ merge-ort.c: static int process_renames(struct merge_options *opt,
      +	const struct diff_filepair *a = *((const struct diff_filepair **)a_);
      +	const struct diff_filepair *b = *((const struct diff_filepair **)b_);
      +
     -+	int cmp = strcmp(a->one->path, b->one->path);
     -+	if (cmp)
     -+		return cmp;
     -+	return a->score - b->score;
     ++	return strcmp(a->one->path, b->one->path);
       }
       
       /* Call diffcore_rename() to compute which files have changed on given side */
     @@ merge-ort.c: static int collect_renames(struct merge_options *opt,
      -	die("Not yet implemented.");
      +	int i, clean = 1;
      +	struct diff_queue_struct *side_pairs;
     -+	struct rename_info *renames = opt->priv->renames;
     ++	struct rename_info *renames = &opt->priv->renames;
      +
      +	side_pairs = &renames->pairs[side_index];
      +
     @@ merge-ort.c: static int collect_renames(struct merge_options *opt,
      +			diff_free_filepair(p);
      +			continue;
      +		}
     ++
     ++		/*
     ++		 * p->score comes back from diffcore_rename_extended() with
     ++		 * the similarity of the renamed file.  The similarity is
     ++		 * was used to determine that the two files were related
     ++		 * and are a rename, which we have already used, but beyond
     ++		 * that we have no use for the similarity.  So p->score is
     ++		 * now irrelevant.  However, process_renames() will need to
     ++		 * know which side of the merge this rename was associated
     ++		 * with, so overwrite p->score with that value.
     ++		 */
      +		p->score = side_index;
      +		result->queue[result->nr++] = p;
      +	}
  5:  35b070b9b7c !  5:  d20fab8d403 merge-ort: add basic outline for process_renames()
     @@ merge-ort.c: static int handle_content_merge(struct merge_options *opt,
      +		 * diff_filepairs have copies of pathnames, thus we have to
      +		 * use standard 'strcmp()' (negated) instead of '=='.
      +		 */
     -+		if (i+1 < renames->nr &&
     ++		if (i + 1 < renames->nr &&
      +		    !strcmp(oldpath, renames->queue[i+1]->one->path)) {
      +			/* Handle rename/rename(1to2) or rename/rename(1to1) */
      +			const char *pathnames[3];
     @@ merge-ort.c: static int handle_content_merge(struct merge_options *opt,
      +
      +		VERIFY_CI(oldinfo);
      +		VERIFY_CI(newinfo);
     -+		target_index = pair->score; /* from append_rename_pairs() */
     ++		target_index = pair->score; /* from collect_renames() */
      +		assert(target_index == 1 || target_index == 2);
     -+		other_source_index = 3-target_index;
     ++		other_source_index = 3 - target_index;
      +		old_sidemask = (1 << other_source_index); /* 2 or 4 */
      +		source_deleted = (oldinfo->filemask == 1);
      +		collision = ((newinfo->filemask & old_sidemask) != 0);
  6:  9c79b9f4a09 !  6:  15fff3dd0c4 merge-ort: add implementation of both sides renaming identically
     @@ merge-ort.c: static int process_renames(struct merge_options *opt,
      +				assert(side1 == side2);
      +				memcpy(&side1->stages[0], &base->stages[0],
      +				       sizeof(merged));
     -+				side1->filemask |= (1 << 0);
     ++				side1->filemask |= (1 << MERGE_BASE);
      +				/* Mark base as resolved by removal */
      +				base->merged.is_null = 1;
      +				base->merged.clean = 1;
  7:  d4595397052 !  7:  d00e26be784 merge-ort: add implementation of both sides renaming differently
     @@ merge-ort.c: static int process_renames(struct merge_options *opt,
      +							   &merged);
      +			if (!clean_merge &&
      +			    merged.mode == side1->stages[1].mode &&
     -+			    oideq(&merged.oid, &side1->stages[1].oid)) {
     ++			    oideq(&merged.oid, &side1->stages[1].oid))
      +				was_binary_blob = 1;
     -+			}
      +			memcpy(&side1->stages[1], &merged, sizeof(merged));
      +			if (was_binary_blob) {
      +				/*
  8:  ab15f85f698 =  8:  edd610321a0 merge-ort: add implementation of rename collisions
  9:  c069d34b15f !  9:  f017534243c merge-ort: add implementation of rename/delete conflicts
     @@ merge-ort.c: static int process_renames(struct merge_options *opt,
      +			 */
      +			memcpy(&newinfo->stages[0], &oldinfo->stages[0],
      +			       sizeof(newinfo->stages[0]));
     -+			newinfo->filemask |= (1 << 0);
     ++			newinfo->filemask |= (1 << MERGE_BASE);
      +			newinfo->pathnames[0] = oldpath;
       			if (type_changed) {
       				/* rename vs. typechange */
 10:  14baa5874af = 10:  22cb7110261 merge-ort: add implementation of normal rename handling
 11:  476553e2b20 = 11:  ff09ddb9caf merge-ort: add implementation of type-changed rename handling
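The range-diff for patches 4 and 5 relies on three points: pairs are sorted by source path alone, the side a rename came from is stashed in p->score, and the other side is computed as 3 - target_index with mask 1 << other_source_index (2 or 4). The sketch below models these with a hypothetical struct rename_pair that is sorted by value. The real queue holds pointers to struct diff_filepair, so its comparator dereferences one more level. The sketch depends only on entries with the same source path ending up adjacent, which a path-only comparison already guarantees.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a diff_filepair after collect_renames(). */
struct rename_pair {
	const char *old_path;
	int side;	/* 1 or 2, playing the role of p->score */
};

/* Order by source path only, as in the v2 compare_pairs(). */
static int compare_rename_pairs(const void *a_, const void *b_)
{
	const struct rename_pair *a = a_;
	const struct rename_pair *b = b_;

	return strcmp(a->old_path, b->old_path);
}

/*
 * After sorting, a path renamed on both sides appears as two adjacent
 * entries with the same old_path; count such pairs.
 */
static int count_both_sides_renamed(struct rename_pair *pairs, int nr)
{
	int i, count = 0;

	qsort(pairs, nr, sizeof(*pairs), compare_rename_pairs);
	for (i = 0; i + 1 < nr; i++) {
		if (!strcmp(pairs[i].old_path, pairs[i + 1].old_path)) {
			count++;
			i++;	/* both entries handled together */
		}
	}
	return count;
}

/* Filemask bit of the side opposite the one the rename was found on. */
static unsigned other_side_mask(int target_index)
{
	int other_source_index;

	assert(target_index == 1 || target_index == 2);
	other_source_index = 3 - target_index;
	return 1u << other_source_index;	/* 2 or 4 */
}

/* Hypothetical example input: only "b.c" was renamed on both sides. */
static int demo_sorted_pairs(void)
{
	struct rename_pair pairs[] = {
		{ "b.c", 2 }, { "a.c", 1 }, { "b.c", 1 }, { "c.c", 2 }
	};

	return count_both_sides_renamed(pairs, 4);
}
```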

-- 
gitgitgadget

* [PATCH v2 01/11] merge-ort: add basic data structures for handling renames
  2020-12-14 16:21 ` [PATCH v2 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
@ 2020-12-14 16:21   ` Elijah Newren via GitGitGadget
  2020-12-14 16:21   ` [PATCH v2 02/11] merge-ort: add initial outline for basic rename detection Elijah Newren via GitGitGadget
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-14 16:21 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

This will grow later, but we only need a few fields for basic rename
handling.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index 414e7b7eeac..1c1a7fa4bf1 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -46,6 +46,25 @@ enum merge_side {
 	MERGE_SIDE2 = 2
 };
 
+struct rename_info {
+	/*
+	 * pairs: pairing of filenames from diffcore_rename()
+	 *
+	 * Index 1 and 2 correspond to sides 1 & 2 as used in
+	 * conflict_info.stages.  Index 0 unused.
+	 */
+	struct diff_queue_struct pairs[3];
+
+	/*
+	 * needed_limit: value needed for inexact rename detection to run
+	 *
+	 * If the current rename limit wasn't high enough for inexact
+	 * rename detection to run, this records the limit needed.  Otherwise,
+	 * this value remains 0.
+	 */
+	int needed_limit;
+};
+
 struct merge_options_internal {
 	/*
 	 * paths: primary data structure in all of merge ort.
@@ -113,6 +132,11 @@ struct merge_options_internal {
 	 */
 	struct strmap output;
 
+	/*
+	 * renames: various data relating to rename detection
+	 */
+	struct rename_info renames;
+
 	/*
 	 * current_dir_name: temporary var used in collect_merge_info_callback()
 	 *
-- 
gitgitgadget


* [PATCH v2 02/11] merge-ort: add initial outline for basic rename detection
  2020-12-14 16:21 ` [PATCH v2 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
  2020-12-14 16:21   ` [PATCH v2 01/11] merge-ort: add basic data structures for handling renames Elijah Newren via GitGitGadget
@ 2020-12-14 16:21   ` Elijah Newren via GitGitGadget
  2020-12-14 16:21   ` [PATCH v2 03/11] merge-ort: implement detect_regular_renames() Elijah Newren via GitGitGadget
                     ` (10 subsequent siblings)
  12 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-14 16:21 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 60 insertions(+), 8 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 1c1a7fa4bf1..8552f5e2318 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -644,20 +644,72 @@ static int handle_content_merge(struct merge_options *opt,
 
 /*** Function Grouping: functions related to regular rename detection ***/
 
+static int process_renames(struct merge_options *opt,
+			   struct diff_queue_struct *renames)
+{
+	die("Not yet implemented.");
+}
+
+static int compare_pairs(const void *a_, const void *b_)
+{
+	die("Not yet implemented.");
+}
+
+/* Call diffcore_rename() to compute which files have changed on given side */
+static void detect_regular_renames(struct merge_options *opt,
+				   struct tree *merge_base,
+				   struct tree *side,
+				   unsigned side_index)
+{
+	die("Not yet implemented.");
+}
+
+/*
+ * Get information of all renames which occurred in 'side_pairs', discarding
+ * non-renames.
+ */
+static int collect_renames(struct merge_options *opt,
+			   struct diff_queue_struct *result,
+			   unsigned side_index)
+{
+	die("Not yet implemented.");
+}
+
 static int detect_and_process_renames(struct merge_options *opt,
 				      struct tree *merge_base,
 				      struct tree *side1,
 				      struct tree *side2)
 {
-	int clean = 1;
+	struct diff_queue_struct combined;
+	struct rename_info *renames = &opt->priv->renames;
+	int s, clean = 1;
+
+	memset(&combined, 0, sizeof(combined));
+
+	detect_regular_renames(opt, merge_base, side1, MERGE_SIDE1);
+	detect_regular_renames(opt, merge_base, side2, MERGE_SIDE2);
+
+	ALLOC_GROW(combined.queue,
+		   renames->pairs[1].nr + renames->pairs[2].nr,
+		   combined.alloc);
+	clean &= collect_renames(opt, &combined, MERGE_SIDE1);
+	clean &= collect_renames(opt, &combined, MERGE_SIDE2);
+	QSORT(combined.queue, combined.nr, compare_pairs);
+
+	clean &= process_renames(opt, &combined);
+
+	/* Free memory for renames->pairs[] and combined */
+	for (s = MERGE_SIDE1; s <= MERGE_SIDE2; s++) {
+		free(renames->pairs[s].queue);
+		DIFF_QUEUE_CLEAR(&renames->pairs[s]);
+	}
+	if (combined.nr) {
+		int i;
+		for (i = 0; i < combined.nr; i++)
+			diff_free_filepair(combined.queue[i]);
+		free(combined.queue);
+	}
 
-	/*
-	 * Rename detection works by detecting file similarity.  Here we use
-	 * a really easy-to-implement scheme: files are similar IFF they have
-	 * the same filename.  Therefore, by this scheme, there are no renames.
-	 *
-	 * TODO: Actually implement a real rename detection scheme.
-	 */
 	return clean;
 }
 
-- 
gitgitgadget



* [PATCH v2 03/11] merge-ort: implement detect_regular_renames()
  2020-12-14 16:21 ` [PATCH v2 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
  2020-12-14 16:21   ` [PATCH v2 01/11] merge-ort: add basic data structures for handling renames Elijah Newren via GitGitGadget
  2020-12-14 16:21   ` [PATCH v2 02/11] merge-ort: add initial outline for basic rename detection Elijah Newren via GitGitGadget
@ 2020-12-14 16:21   ` Elijah Newren via GitGitGadget
  2020-12-14 16:21   ` [PATCH v2 04/11] merge-ort: implement compare_pairs() and collect_renames() Elijah Newren via GitGitGadget
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-14 16:21 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

This is based heavily on merge-recursive's get_diffpairs() function, and
also includes the necessary paired call to diff_warn_rename_limit() so
that users will be warned if merge.renameLimit is not large enough for
rename detection to run.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 8552f5e2318..66f84d39b43 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -661,7 +661,33 @@ static void detect_regular_renames(struct merge_options *opt,
 				   struct tree *side,
 				   unsigned side_index)
 {
-	die("Not yet implemented.");
+	struct diff_options diff_opts;
+	struct rename_info *renames = &opt->priv->renames;
+
+	repo_diff_setup(opt->repo, &diff_opts);
+	diff_opts.flags.recursive = 1;
+	diff_opts.flags.rename_empty = 0;
+	diff_opts.detect_rename = DIFF_DETECT_RENAME;
+	diff_opts.rename_limit = opt->rename_limit;
+	if (opt->rename_limit <= 0)
+		diff_opts.rename_limit = 1000;
+	diff_opts.rename_score = opt->rename_score;
+	diff_opts.show_rename_progress = opt->show_rename_progress;
+	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
+	diff_setup_done(&diff_opts);
+	diff_tree_oid(&merge_base->object.oid, &side->object.oid, "",
+		      &diff_opts);
+	diffcore_std(&diff_opts);
+
+	if (diff_opts.needed_rename_limit > renames->needed_limit)
+		renames->needed_limit = diff_opts.needed_rename_limit;
+
+	renames->pairs[side_index] = diff_queued_diff;
+
+	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
+	diff_queued_diff.nr = 0;
+	diff_queued_diff.queue = NULL;
+	diff_flush(&diff_opts);
 }
 
 /*
@@ -1406,6 +1432,10 @@ void merge_switch_to_result(struct merge_options *opt,
 			printf("%s", sb->buf);
 		}
 		string_list_clear(&olist, 0);
+
+		/* Also include needed rename limit adjustment now */
+		diff_warn_rename_limit("merge.renamelimit",
+				       opti->renames.needed_limit, 0);
 	}
 
 	merge_finalize(opt, result);
-- 
gitgitgadget



* [PATCH v2 04/11] merge-ort: implement compare_pairs() and collect_renames()
  2020-12-14 16:21 ` [PATCH v2 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
                     ` (2 preceding siblings ...)
  2020-12-14 16:21   ` [PATCH v2 03/11] merge-ort: implement detect_regular_renames() Elijah Newren via GitGitGadget
@ 2020-12-14 16:21   ` Elijah Newren via GitGitGadget
  2020-12-14 16:21   ` [PATCH v2 05/11] merge-ort: add basic outline for process_renames() Elijah Newren via GitGitGadget
                     ` (8 subsequent siblings)
  12 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-14 16:21 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 66f84d39b43..10550c542b8 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -652,7 +652,10 @@ static int process_renames(struct merge_options *opt,
 
 static int compare_pairs(const void *a_, const void *b_)
 {
-	die("Not yet implemented.");
+	const struct diff_filepair *a = *((const struct diff_filepair **)a_);
+	const struct diff_filepair *b = *((const struct diff_filepair **)b_);
+
+	return strcmp(a->one->path, b->one->path);
 }
 
 /* Call diffcore_rename() to compute which files have changed on given side */
@@ -698,7 +701,35 @@ static int collect_renames(struct merge_options *opt,
 			   struct diff_queue_struct *result,
 			   unsigned side_index)
 {
-	die("Not yet implemented.");
+	int i, clean = 1;
+	struct diff_queue_struct *side_pairs;
+	struct rename_info *renames = &opt->priv->renames;
+
+	side_pairs = &renames->pairs[side_index];
+
+	for (i = 0; i < side_pairs->nr; ++i) {
+		struct diff_filepair *p = side_pairs->queue[i];
+
+		if (p->status != 'R') {
+			diff_free_filepair(p);
+			continue;
+		}
+
+		/*
+		 * p->score comes back from diffcore_rename_extended() with
+		 * the similarity of the renamed file.  The similarity
+		 * was used to determine that the two files were related
+		 * and are a rename, which we have already used, but beyond
+		 * that we have no use for the similarity.  So p->score is
+		 * now irrelevant.  However, process_renames() will need to
+		 * know which side of the merge this rename was associated
+		 * with, so overwrite p->score with that value.
+		 */
+		p->score = side_index;
+		result->queue[result->nr++] = p;
+	}
+
+	return clean;
 }
 
 static int detect_and_process_renames(struct merge_options *opt,
-- 
gitgitgadget



* [PATCH v2 05/11] merge-ort: add basic outline for process_renames()
  2020-12-14 16:21 ` [PATCH v2 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
                     ` (3 preceding siblings ...)
  2020-12-14 16:21   ` [PATCH v2 04/11] merge-ort: implement compare_pairs() and collect_renames() Elijah Newren via GitGitGadget
@ 2020-12-14 16:21   ` Elijah Newren via GitGitGadget
  2020-12-14 16:21   ` [PATCH v2 06/11] merge-ort: add implementation of both sides renaming identically Elijah Newren via GitGitGadget
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-14 16:21 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Add code which determines which kind of special rename case each rename
corresponds to, but leave the handling of each type unimplemented for
now.  Future commits will implement each one.

There is some tenuous resemblance to merge-recursive's
process_renames(), but comparing the two is very unlikely to yield any
insights.  merge-ort's process_renames() is a bit complex and I would
like to simplify it further, but in my opinion it is far easier to grok
than merge-recursive's function of the same name.  Plus, merge-ort
handles more rename conflict types than merge-recursive does.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 97 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 10550c542b8..ebe275ef73c 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -647,7 +647,103 @@ static int handle_content_merge(struct merge_options *opt,
 static int process_renames(struct merge_options *opt,
 			   struct diff_queue_struct *renames)
 {
-	die("Not yet implemented.");
+	int clean_merge = 1, i;
+
+	for (i = 0; i < renames->nr; ++i) {
+		const char *oldpath = NULL, *newpath;
+		struct diff_filepair *pair = renames->queue[i];
+		struct conflict_info *oldinfo = NULL, *newinfo = NULL;
+		struct strmap_entry *old_ent, *new_ent;
+		unsigned int old_sidemask;
+		int target_index, other_source_index;
+		int source_deleted, collision, type_changed;
+
+		old_ent = strmap_get_entry(&opt->priv->paths, pair->one->path);
+		oldpath = old_ent->key;
+		oldinfo = old_ent->value;
+
+		new_ent = strmap_get_entry(&opt->priv->paths, pair->two->path);
+		newpath = new_ent->key;
+		newinfo = new_ent->value;
+
+		/*
+		 * diff_filepairs have copies of pathnames, thus we have to
+		 * use standard 'strcmp()' (negated) instead of '=='.
+		 */
+		if (i + 1 < renames->nr &&
+		    !strcmp(oldpath, renames->queue[i+1]->one->path)) {
+			/* Handle rename/rename(1to2) or rename/rename(1to1) */
+			const char *pathnames[3];
+
+			pathnames[0] = oldpath;
+			pathnames[1] = newpath;
+			pathnames[2] = renames->queue[i+1]->two->path;
+
+			if (!strcmp(pathnames[1], pathnames[2])) {
+				/* Both sides renamed the same way. */
+				die("Not yet implemented");
+
+				/* We handled both renames, i.e. i+1 handled */
+				i++;
+				/* Move to next rename */
+				continue;
+			}
+
+			/* This is a rename/rename(1to2) */
+			die("Not yet implemented");
+
+			i++; /* We handled both renames, i.e. i+1 handled */
+			continue;
+		}
+
+		VERIFY_CI(oldinfo);
+		VERIFY_CI(newinfo);
+		target_index = pair->score; /* from collect_renames() */
+		assert(target_index == 1 || target_index == 2);
+		other_source_index = 3 - target_index;
+		old_sidemask = (1 << other_source_index); /* 2 or 4 */
+		source_deleted = (oldinfo->filemask == 1);
+		collision = ((newinfo->filemask & old_sidemask) != 0);
+		type_changed = !source_deleted &&
+			(S_ISREG(oldinfo->stages[other_source_index].mode) !=
+			 S_ISREG(newinfo->stages[target_index].mode));
+		if (type_changed && collision) {
+			/* special handling so later blocks can handle this */
+			die("Not yet implemented");
+		}
+
+		assert(source_deleted || oldinfo->filemask & old_sidemask);
+
+		/* Need to check for special types of rename conflicts... */
+		if (collision && !source_deleted) {
+			/* collision: rename/add or rename/rename(2to1) */
+			die("Not yet implemented");
+		} else if (collision && source_deleted) {
+			/* rename/add/delete or rename/rename(2to1)/delete */
+			die("Not yet implemented");
+		} else {
+			/* a few different cases... */
+			if (type_changed) {
+				/* rename vs. typechange */
+				die("Not yet implemented");
+			} else if (source_deleted) {
+				/* rename/delete */
+				die("Not yet implemented");
+			} else {
+				/* normal rename */
+				die("Not yet implemented");
+			}
+		}
+
+		if (!type_changed) {
+			/* Mark the original as resolved by removal */
+			oldinfo->merged.is_null = 1;
+			oldinfo->merged.clean = 1;
+		}
+
+	}
+
+	return clean_merge;
 }
 
 static int compare_pairs(const void *a_, const void *b_)
-- 
gitgitgadget



* [PATCH v2 06/11] merge-ort: add implementation of both sides renaming identically
  2020-12-14 16:21 ` [PATCH v2 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
                     ` (4 preceding siblings ...)
  2020-12-14 16:21   ` [PATCH v2 05/11] merge-ort: add basic outline for process_renames() Elijah Newren via GitGitGadget
@ 2020-12-14 16:21   ` Elijah Newren via GitGitGadget
  2020-12-14 16:21   ` [PATCH v2 07/11] merge-ort: add implementation of both sides renaming differently Elijah Newren via GitGitGadget
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-14 16:21 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Implement rename/rename(1to1) handling, i.e. both sides of history
renaming a file in the same way.  This code replaces the
following from merge-recursive.c:

  * all the 1to1 code in process_renames()
  * the RENAME_ONE_FILE_TO_ONE case of process_entry()

Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):

  * handle_rename_normal()
  * setup_rename_conflict_info()

The consolidation of four separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally.  This means we're much less likely to miss special
implementation of some kind of combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index ebe275ef73c..4034ffcf501 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -674,14 +674,31 @@ static int process_renames(struct merge_options *opt,
 		    !strcmp(oldpath, renames->queue[i+1]->one->path)) {
 			/* Handle rename/rename(1to2) or rename/rename(1to1) */
 			const char *pathnames[3];
+			struct version_info merged;
+			struct conflict_info *base, *side1, *side2;
+			unsigned was_binary_blob = 0;
 
 			pathnames[0] = oldpath;
 			pathnames[1] = newpath;
 			pathnames[2] = renames->queue[i+1]->two->path;
 
+			base = strmap_get(&opt->priv->paths, pathnames[0]);
+			side1 = strmap_get(&opt->priv->paths, pathnames[1]);
+			side2 = strmap_get(&opt->priv->paths, pathnames[2]);
+
+			VERIFY_CI(base);
+			VERIFY_CI(side1);
+			VERIFY_CI(side2);
+
 			if (!strcmp(pathnames[1], pathnames[2])) {
-				/* Both sides renamed the same way. */
-				die("Not yet implemented");
+				/* Both sides renamed the same way */
+				assert(side1 == side2);
+				memcpy(&side1->stages[0], &base->stages[0],
+				       sizeof(merged));
+				side1->filemask |= (1 << MERGE_BASE);
+				/* Mark base as resolved by removal */
+				base->merged.is_null = 1;
+				base->merged.clean = 1;
 
 				/* We handled both renames, i.e. i+1 handled */
 				i++;
-- 
gitgitgadget



* [PATCH v2 07/11] merge-ort: add implementation of both sides renaming differently
  2020-12-14 16:21 ` [PATCH v2 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
                     ` (5 preceding siblings ...)
  2020-12-14 16:21   ` [PATCH v2 06/11] merge-ort: add implementation of both sides renaming identically Elijah Newren via GitGitGadget
@ 2020-12-14 16:21   ` Elijah Newren via GitGitGadget
  2020-12-14 16:21   ` [PATCH v2 08/11] merge-ort: add implementation of rename collisions Elijah Newren via GitGitGadget
                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-14 16:21 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Implement rename/rename(1to2) handling, i.e. both sides of history
renaming a file but renaming it differently.  This code replaces the
following from merge-recursive.c:

  * all the 1to2 code in process_renames()
  * the RENAME_ONE_FILE_TO_TWO case of process_entry()
  * handle_rename_rename_1to2()

Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):

  * handle_file_collision()
  * setup_rename_conflict_info()

The consolidation of five separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally.  This means we're much less likely to miss special
implementation of some kind of combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.

To be fair, there is a _slight_ tweak to process_entry() here to make
sure that the two different paths aren't marked as clean but are left in
a conflicted state.  So process_renames() and process_entry() aren't
quite entirely orthogonal, but they are pretty close.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 54 insertions(+), 3 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 4034ffcf501..19477cfae60 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -707,7 +707,58 @@ static int process_renames(struct merge_options *opt,
 			}
 
 			/* This is a rename/rename(1to2) */
-			die("Not yet implemented");
+			clean_merge = handle_content_merge(opt,
+							   pair->one->path,
+							   &base->stages[0],
+							   &side1->stages[1],
+							   &side2->stages[2],
+							   pathnames,
+							   1 + 2 * opt->priv->call_depth,
+							   &merged);
+			if (!clean_merge &&
+			    merged.mode == side1->stages[1].mode &&
+			    oideq(&merged.oid, &side1->stages[1].oid))
+				was_binary_blob = 1;
+			memcpy(&side1->stages[1], &merged, sizeof(merged));
+			if (was_binary_blob) {
+				/*
+				 * Getting here means we were attempting to
+				 * merge a binary blob.
+				 *
+				 * Since we can't merge binaries,
+				 * handle_content_merge() just takes one
+				 * side.  But we don't want to copy the
+				 * contents of one side to both paths.  We
+				 * used the contents of side1 above for
+				 * side1->stages, let's use the contents of
+				 * side2 for side2->stages below.
+				 */
+				oidcpy(&merged.oid, &side2->stages[2].oid);
+				merged.mode = side2->stages[2].mode;
+			}
+			memcpy(&side2->stages[2], &merged, sizeof(merged));
+
+			side1->path_conflict = 1;
+			side2->path_conflict = 1;
+			/*
+			 * TODO: For renames we normally remove the path at the
+			 * old name.  It would thus seem consistent to do the
+			 * same for rename/rename(1to2) cases, but we haven't
+			 * done so traditionally and a number of the regression
+			 * tests now encode an expectation that the file is
+			 * left there at stage 1.  If we ever decide to change
+			 * this, add the following two lines here:
+			 *    base->merged.is_null = 1;
+			 *    base->merged.clean = 1;
+			 * and remove the setting of base->path_conflict to 1.
+			 */
+			base->path_conflict = 1;
+			path_msg(opt, oldpath, 0,
+				 _("CONFLICT (rename/rename): %s renamed to "
+				   "%s in %s and to %s in %s."),
+				 pathnames[0],
+				 pathnames[1], opt->branch1,
+				 pathnames[2], opt->branch2);
 
 			i++; /* We handled both renames, i.e. i+1 handled */
 			continue;
@@ -1292,13 +1343,13 @@ static void process_entry(struct merge_options *opt,
 		int side = (ci->filemask == 4) ? 2 : 1;
 		ci->merged.result.mode = ci->stages[side].mode;
 		oidcpy(&ci->merged.result.oid, &ci->stages[side].oid);
-		ci->merged.clean = !ci->df_conflict;
+		ci->merged.clean = !ci->df_conflict && !ci->path_conflict;
 	} else if (ci->filemask == 1) {
 		/* Deleted on both sides */
 		ci->merged.is_null = 1;
 		ci->merged.result.mode = 0;
 		oidcpy(&ci->merged.result.oid, &null_oid);
-		ci->merged.clean = 1;
+		ci->merged.clean = !ci->path_conflict;
 	}
 
 	/*
-- 
gitgitgadget



* [PATCH v2 08/11] merge-ort: add implementation of rename collisions
  2020-12-14 16:21 ` [PATCH v2 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
                     ` (6 preceding siblings ...)
  2020-12-14 16:21   ` [PATCH v2 07/11] merge-ort: add implementation of both sides renaming differently Elijah Newren via GitGitGadget
@ 2020-12-14 16:21   ` Elijah Newren via GitGitGadget
  2020-12-15 14:09     ` Derrick Stolee
  2020-12-14 16:21   ` [PATCH v2 09/11] merge-ort: add implementation of rename/delete conflicts Elijah Newren via GitGitGadget
                     ` (4 subsequent siblings)
  12 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-14 16:21 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Implement rename/rename(2to1) and rename/add handling, i.e. a file is
renamed into a location where another file is added (with that other
file either being a plain add or itself coming from a rename).  Note
that rename collisions can also have a special case stacked on top: the
file being renamed on one side of history is deleted on the other
(yielding either a rename/add/delete conflict or perhaps a
rename/rename(2to1)/delete[/delete] conflict).

One thing to note here is that when there is a double rename, the code
in question only handles one of them at a time; a later iteration
through the loop will handle the other.  After they've both been
handled, process_entry()'s normal add/add code can handle the collision.

This code replaces the following from merge-recursive.c:

  * all the 2to1 code in process_renames()
  * the RENAME_TWO_FILES_TO_ONE case of process_entry()
  * handle_rename_rename_2to1()
  * handle_rename_add()

Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):

  * handle_file_collision()
  * setup_rename_conflict_info()

The consolidation of six separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally.  This means we're much less likely to miss special
implementation of some kind of combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 51 insertions(+), 3 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 19477cfae60..04a16837849 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -785,10 +785,58 @@ static int process_renames(struct merge_options *opt,
 		/* Need to check for special types of rename conflicts... */
 		if (collision && !source_deleted) {
 			/* collision: rename/add or rename/rename(2to1) */
-			die("Not yet implemented");
+			const char *pathnames[3];
+			struct version_info merged;
+
+			struct conflict_info *base, *side1, *side2;
+			unsigned clean;
+
+			pathnames[0] = oldpath;
+			pathnames[other_source_index] = oldpath;
+			pathnames[target_index] = newpath;
+
+			base = strmap_get(&opt->priv->paths, pathnames[0]);
+			side1 = strmap_get(&opt->priv->paths, pathnames[1]);
+			side2 = strmap_get(&opt->priv->paths, pathnames[2]);
+
+			VERIFY_CI(base);
+			VERIFY_CI(side1);
+			VERIFY_CI(side2);
+
+			clean = handle_content_merge(opt, pair->one->path,
+						     &base->stages[0],
+						     &side1->stages[1],
+						     &side2->stages[2],
+						     pathnames,
+						     1 + 2*opt->priv->call_depth,
+						     &merged);
+
+			memcpy(&newinfo->stages[target_index], &merged,
+			       sizeof(merged));
+			if (!clean) {
+				path_msg(opt, newpath, 0,
+					 _("CONFLICT (rename involved in "
+					   "collision): rename of %s -> %s has "
+					   "content conflicts AND collides "
+					   "with another path; this may result "
+					   "in nested conflict markers."),
+					 oldpath, newpath);
+			}
 		} else if (collision && source_deleted) {
-			/* rename/add/delete or rename/rename(2to1)/delete */
-			die("Not yet implemented");
+			/*
+			 * rename/add/delete or rename/rename(2to1)/delete:
+			 * since oldpath was deleted on the side that didn't
+			 * do the rename, there's not much of a content merge
+			 * we can do for the rename.  oldinfo->merged.is_null
+			 * was already set, so we just leave things as-is so
+			 * they look like an add/add conflict.
+			 */
+
+			newinfo->path_conflict = 1;
+			path_msg(opt, newpath, 0,
+				 _("CONFLICT (rename/delete): %s renamed "
+				   "to %s in %s, but deleted in %s."),
+				 oldpath, newpath, rename_branch, delete_branch);
 		} else {
 			/* a few different cases... */
 			if (type_changed) {
-- 
gitgitgadget



* [PATCH v2 09/11] merge-ort: add implementation of rename/delete conflicts
  2020-12-14 16:21 ` [PATCH v2 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
                     ` (7 preceding siblings ...)
  2020-12-14 16:21   ` [PATCH v2 08/11] merge-ort: add implementation of rename collisions Elijah Newren via GitGitGadget
@ 2020-12-14 16:21   ` Elijah Newren via GitGitGadget
  2020-12-15 14:23     ` Derrick Stolee
  2020-12-15 14:27     ` Derrick Stolee
  2020-12-14 16:21   ` [PATCH v2 10/11] merge-ort: add implementation of normal rename handling Elijah Newren via GitGitGadget
                     ` (3 subsequent siblings)
  12 siblings, 2 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-14 16:21 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Implement rename/delete conflicts, i.e. one side renames a file and the
other deletes the file.  This code replaces the following from
merge-recursive.c:

  * the code relevant to RENAME_DELETE in process_renames()
  * the RENAME_DELETE case of process_entry()
  * handle_rename_delete()

Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):

  * handle_change_delete()
  * setup_rename_conflict_info()

The consolidation of five separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally.  This means we're much less likely to miss special
implementation of some kind of combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.

To be fair, there is a _slight_ tweak to process_entry() here, because
rename/delete cases will also trigger the modify/delete codepath.
However, we only want a modify/delete message to be printed for a
rename/delete conflict if there is a content change in the renamed file
in addition to the rename.  So process_renames() and process_entry()
aren't quite fully orthogonal, but they are pretty close.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 47 +++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 39 insertions(+), 8 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 04a16837849..4150ccc35e1 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -657,6 +657,7 @@ static int process_renames(struct merge_options *opt,
 		unsigned int old_sidemask;
 		int target_index, other_source_index;
 		int source_deleted, collision, type_changed;
+		const char *rename_branch = NULL, *delete_branch = NULL;
 
 		old_ent = strmap_get_entry(&opt->priv->paths, pair->one->path);
 		oldpath = old_ent->key;
@@ -778,6 +779,14 @@ static int process_renames(struct merge_options *opt,
 		if (type_changed && collision) {
 			/* special handling so later blocks can handle this */
 			die("Not yet implemented");
+		if (source_deleted) {
+			if (target_index == 1) {
+				rename_branch = opt->branch1;
+				delete_branch = opt->branch2;
+			} else {
+				rename_branch = opt->branch2;
+				delete_branch = opt->branch1;
+			}
 		}
 
 		assert(source_deleted || oldinfo->filemask & old_sidemask);
@@ -838,13 +847,26 @@ static int process_renames(struct merge_options *opt,
 				   "to %s in %s, but deleted in %s."),
 				 oldpath, newpath, rename_branch, delete_branch);
 		} else {
-			/* a few different cases... */
+			/*
+			 * a few different cases...start by copying the
+			 * existing stage(s) from oldinfo over the newinfo
+			 * and update the pathname(s).
+			 */
+			memcpy(&newinfo->stages[0], &oldinfo->stages[0],
+			       sizeof(newinfo->stages[0]));
+			newinfo->filemask |= (1 << MERGE_BASE);
+			newinfo->pathnames[0] = oldpath;
 			if (type_changed) {
 				/* rename vs. typechange */
 				die("Not yet implemented");
 			} else if (source_deleted) {
 				/* rename/delete */
-				die("Not yet implemented");
+				newinfo->path_conflict = 1;
+				path_msg(opt, newpath, 0,
+					 _("CONFLICT (rename/delete): %s renamed"
+					   " to %s in %s, but deleted in %s."),
+					 oldpath, newpath,
+					 rename_branch, delete_branch);
 			} else {
 				/* normal rename */
 				die("Not yet implemented");
@@ -1380,12 +1402,21 @@ static void process_entry(struct merge_options *opt,
 		modify_branch = (side == 1) ? opt->branch1 : opt->branch2;
 		delete_branch = (side == 1) ? opt->branch2 : opt->branch1;
 
-		path_msg(opt, path, 0,
-			 _("CONFLICT (modify/delete): %s deleted in %s "
-			   "and modified in %s.  Version %s of %s left "
-			   "in tree."),
-			 path, delete_branch, modify_branch,
-			 modify_branch, path);
+		if (ci->path_conflict &&
+		    oideq(&ci->stages[0].oid, &ci->stages[side].oid)) {
+			/*
+			 * This came from a rename/delete; no action to take,
+			 * but avoid printing "modify/delete" conflict notice
+			 * since the contents were not modified.
+			 */
+		} else {
+			path_msg(opt, path, 0,
+				 _("CONFLICT (modify/delete): %s deleted in %s "
+				   "and modified in %s.  Version %s of %s left "
+				   "in tree."),
+				 path, delete_branch, modify_branch,
+				 modify_branch, path);
+		}
 	} else if (ci->filemask == 2 || ci->filemask == 4) {
 		/* Added on one side */
 		int side = (ci->filemask == 4) ? 2 : 1;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH v2 10/11] merge-ort: add implementation of normal rename handling
  2020-12-14 16:21 ` [PATCH v2 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
                     ` (8 preceding siblings ...)
  2020-12-14 16:21   ` [PATCH v2 09/11] merge-ort: add implementation of rename/delete conflicts Elijah Newren via GitGitGadget
@ 2020-12-14 16:21   ` Elijah Newren via GitGitGadget
  2020-12-15 14:27     ` Derrick Stolee
  2020-12-14 16:21   ` [PATCH v2 11/11] merge-ort: add implementation of type-changed " Elijah Newren via GitGitGadget
                     ` (2 subsequent siblings)
  12 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-14 16:21 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Implement handling of normal renames.  This code replaces the following
from merge-recursive.c:

  * the code relevant to RENAME_NORMAL in process_renames()
  * the RENAME_NORMAL case of process_entry()

Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):

  * handle_rename_normal()
  * setup_rename_conflict_info()

The consolidation of four separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally.  This means we're much less likely to miss the special
handling needed for some combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.

(To be fair, the code for handling normal renames wasn't all that
complicated beforehand, but it's still much simpler now.)

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 4150ccc35e1..9aac33c8e31 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -869,7 +869,11 @@ static int process_renames(struct merge_options *opt,
 					 rename_branch, delete_branch);
 			} else {
 				/* normal rename */
-				die("Not yet implemented");
+				memcpy(&newinfo->stages[other_source_index],
+				       &oldinfo->stages[other_source_index],
+				       sizeof(newinfo->stages[0]));
+				newinfo->filemask |= (1 << other_source_index);
+				newinfo->pathnames[other_source_index] = oldpath;
 			}
 		}
 
-- 
gitgitgadget



* [PATCH v2 11/11] merge-ort: add implementation of type-changed rename handling
  2020-12-14 16:21 ` [PATCH v2 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
                     ` (9 preceding siblings ...)
  2020-12-14 16:21   ` [PATCH v2 10/11] merge-ort: add implementation of normal rename handling Elijah Newren via GitGitGadget
@ 2020-12-14 16:21   ` Elijah Newren via GitGitGadget
  2020-12-15 14:31     ` Derrick Stolee
  2020-12-15 14:34   ` [PATCH v2 00/11] merge-ort: add basic rename detection Derrick Stolee
  2020-12-15 18:27   ` [PATCH v3 " Elijah Newren via GitGitGadget
  12 siblings, 1 reply; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-14 16:21 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Implement cases where renames are involved in type changes (i.e. the
side of history that didn't rename the file changed its type from a
regular file to a symlink or submodule).  There was some code to handle
this in merge-recursive but only in the special case when the renamed
file had no content changes.  The code here works differently -- it
knows process_entry() can handle mode conflicts, so it does a few
minimal tweaks to ensure process_entry() can just finish the job as
needed.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 9aac33c8e31..11e33f56edf 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -778,7 +778,32 @@ static int process_renames(struct merge_options *opt,
 			 S_ISREG(newinfo->stages[target_index].mode));
 		if (type_changed && collision) {
 			/* special handling so later blocks can handle this */
-			die("Not yet implemented");
+			/*
+			 * if type_changed && collision are both true, then this
+			 * was really a double rename, but one side wasn't
+			 * detected due to lack of break detection.  I.e.
+			 * something like
+			 *    orig: has normal file 'foo'
+			 *    side1: renames 'foo' to 'bar', adds 'foo' symlink
+			 *    side2: renames 'foo' to 'bar'
+			 * In this case, the foo->bar rename on side1 won't be
+			 * detected because the new symlink named 'foo' is
+			 * there and we don't do break detection.  But we detect
+			 * this here because we don't want to merge the content
+			 * of the foo symlink with the foo->bar file, so we
+			 * have some logic to handle this special case.  The
+			 * easiest way to do that is make 'bar' on side1 not
+			 * be considered a colliding file but the other part
+			 * of a normal rename.  If the file is very different,
+			 * well we're going to get content merge conflicts
+			 * anyway so it doesn't hurt.  And if the colliding
+			 * file also has a different type, that'll be handled
+			 * by the content merge logic in process_entry() too.
+			 *
+			 * See also t6430, 'rename vs. rename/symlink'
+			 */
+			collision = 0;
+		}
 		if (source_deleted) {
 			if (target_index == 1) {
 				rename_branch = opt->branch1;
@@ -858,7 +883,11 @@ static int process_renames(struct merge_options *opt,
 			newinfo->pathnames[0] = oldpath;
 			if (type_changed) {
 				/* rename vs. typechange */
-				die("Not yet implemented");
+				/* Mark the original as resolved by removal */
+				memcpy(&oldinfo->stages[0].oid, &null_oid,
+				       sizeof(oldinfo->stages[0].oid));
+				oldinfo->stages[0].mode = 0;
+				oldinfo->filemask &= 0x06;
 			} else if (source_deleted) {
 				/* rename/delete */
 				newinfo->path_conflict = 1;
-- 
gitgitgadget


* Re: [PATCH 02/11] merge-ort: add initial outline for basic rename detection
  2020-12-14 16:11           ` Elijah Newren
@ 2020-12-14 16:50             ` Johannes Schindelin
  0 siblings, 0 replies; 65+ messages in thread
From: Johannes Schindelin @ 2020-12-14 16:50 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee, Elijah Newren via GitGitGadget, Git Mailing List

Hi Elijah,

On Mon, 14 Dec 2020, Elijah Newren wrote:

> On Mon, Dec 14, 2020 at 7:42 AM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > Hi Elijah & Stolee,
> >
> > On Mon, 14 Dec 2020, Derrick Stolee wrote:
> >
> > > On 12/13/2020 2:47 AM, Elijah Newren wrote:
> > > >
> > > > Sorry for two different email responses to the same email...
> > > >
> > > > Addressing the comments on this patchset means re-submitting
> > > > en/merge-ort-impl, and causing conflicts in en/merge-ort-2 and this
> > > > series en/merge-ort-3.  Since gitgitgadget will not allow me to submit
> > > > patches against a series that isn't published by Junio, I'll need to
> > > > ask Junio to temporarily drop both of these series, then later
> > > > resubmit en/merge-ort-2 after he publishes my updates to
> > > > en/merge-ort-impl.  Then when he publishes my updates to
> > > > en/merge-ort-2, I'll be able to submit my already-rebased patches for
> > > > en/merge-ort-3.
> > >
> > > Let's chat privately about perhaps creatin
> >
> > Yes, I am totally willing to push up temporary branches if that helps you,
> > or even giving you push permissions to do that.
> >
> > Ciao,
> > Dscho
>
> Given the amount of changes left to push up, I suspect there'll be
> more cases where it'd be useful.  If I could get push permissions, and
> a suggested namespace to use for such temporary branches, that'd help.

I invited you into the GitGitGadget organization.

Recently, I pushed up `hanwen/libreftable` to have a proper base branch
for myself, but I think that was probably not a good idea. I should have
opened a namespace like `temp/` or some such.

> In this particular case, though, one of my two fears was already
> realized -- Junio jumped in and did the work of rebasing and conflict
> resolving for en/merge-ort-2 and en/merge-ort-3.  I didn't want to
> burden him with that extra work, but what he pushed up for
> en/merge-ort-2 is identical to what I have.  So, all I have to do is
> push en/merge-ort-3 with the extra changes I have in it.  So this
> particular time is taken care of.

Understood.

Thanks,
Dscho


* Re: [PATCH 02/11] merge-ort: add initial outline for basic rename detection
  2020-12-14 14:33       ` Derrick Stolee
  2020-12-14 15:42         ` Johannes Schindelin
@ 2020-12-14 17:35         ` Elijah Newren
  1 sibling, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2020-12-14 17:35 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Elijah Newren via GitGitGadget, Git Mailing List

On Mon, Dec 14, 2020 at 6:33 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/13/2020 2:47 AM, Elijah Newren wrote:
> > Hi,
> >
> > Sorry for two different email responses to the same email...
> >
> > Addressing the comments on this patchset means re-submitting
> > en/merge-ort-impl, and causing conflicts in en/merge-ort-2 and this
> > series en/merge-ort-3.  Since gitgitgadget will not allow me to submit
> > patches against a series that isn't published by Junio, I'll need to
> > ask Junio to temporarily drop both of these series, then later
> > resubmit en/merge-ort-2 after he publishes my updates to
> > en/merge-ort-impl.  Then when he publishes my updates to
> > en/merge-ort-2, I'll be able to submit my already-rebased patches for
> > en/merge-ort-3.
>
> Let's chat privately about perhaps creatin
>
> > A couple extra comments below...
>
>
> >>> +     int s, clean = 1;
> >>> +
> >>> +     memset(&combined, 0, sizeof(combined));
> >>> +
> >>> +     detect_regular_renames(opt, merge_base, side1, 1);
> >>> +     detect_regular_renames(opt, merge_base, side2, 2);
> >>
> >> Find the renames in each side's diff.
> >>
> >> I think the use of "1" and "2" here might be better situated
> >> for an enum. Perhaps:
> >>
> >> enum merge_side {
> >>         MERGE_SIDE1 = 0,
> >>         MERGE_SIDE2 = 1,
> >> };
> >>
> >> (Note, I shift these values to 0 and 1, respectively, allowing
> >> us to truncate the pairs array to two entries while still
> >> being mentally clear.)
> >
> > So, after mulling it over for a while, I created a
> >
> > enum merge_side {
> >     MERGE_BASE = 0,
> >     MERGE_SIDE1 = 1,
> >     MERGE_SIDE2 = 2
> > };
> >
> > and I made use of it in several places.  I just avoided going to an
> > extreme with it (e.g. adding another enum for masks or changing all
> > possibly relevant variables from ints to enum merge_side), and used it
> > more as a document-when-values-are-meant-to-refer-to-sides-of-the-merge
> > kind of thing.  Of course, this affects two previous patchsets and not
> > just this one, so I'll have to post a _lot_ of new patches...   :-)
>
> I appreciate using names for the meaning behind a numerical constant.
> You mentioned in the other thread that this will eventually expand to
> a list of 10 entries, which is particularly frightening if we don't
> get some control over it now.
>
> I generally prefer using types to convey meaning as well, but I'm
> willing to relax on this because I believe C won't complain if you
> pass a literal int into an enum-typed parameter, so the compiler
> doesn't help enough in that sense.

Yeah, I went through my 'ort' branch with all 10 entries and did a
regex search for \b[12]\b throughout merge-ort.c, then considered each
one in turn, updating to the new enum where it made sense.  Then
backported the changes across en/merge-ort-impl and en/merge-ort-3
(and I just /submit-ted the en/merge-ort-3 updates to the list).  Took
quite a while, of course, but I feel it's in good shape.

So, take a look at the new sets of series and let me know what you think.
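A minimal sketch of the type-checking gap Stolee mentions, assuming only the `enum merge_side` values shown above; `side_bit()` is a hypothetical helper, not a function in merge-ort.c. Because C converts integers to enum types implicitly, the compiler accepts `side_bit(1)` just as readily as `side_bit(MERGE_SIDE1)`, so the enum documents intent rather than enforcing it:

```c
#include <assert.h>

/* The enum described in the reply above. */
enum merge_side {
	MERGE_BASE = 0,
	MERGE_SIDE1 = 1,
	MERGE_SIDE2 = 2
};

/*
 * Hypothetical helper: map a side of the merge to its filemask bit.
 * The parameter is enum-typed, yet C still accepts a plain int here.
 */
static unsigned side_bit(enum merge_side side)
{
	return 1u << side;
}
```

C++, by contrast, rejects an implicit int-to-enum conversion, which is the kind of help from the compiler that C does not give.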

> > Something I missed in my reply yesterday...
> >
> > Note that mi->clean is NOT from struct merge_result.  It is from
> > struct merged_info, and in that struct it IS defined as "unsigned
> > clean:1", i.e. it is a true boolean.  The merged_info.clean field is
> > used to determine whether a specific path merged cleanly.
> >
> > "clean" from struct merge_result is whether the entirety of the merge
> > was clean or not.  It's almost a boolean, but allows for a
> > "catastrophic problem encountered" value.  I added the following
> > comment:
> > /*
> > * Whether the merge is clean; possible values:
> > *    1: clean
> > *    0: not clean (merge conflicts)
> > *   <0: operation aborted prematurely.  (object database
> > *       unreadable, disk full, etc.)  Worktree may be left in an
> > *       inconsistent state if operation failed near the end.
> > */
> >
> > This also means that I either abort and return a negative value, or I
> > can continue treating merge_result's "clean" field as a boolean.
>
> Having this comment helps a lot!
>
> > But again, this isn't new to this patchset; it affects the patchset
> > before the patchset before this one.
>
> Right, when I had the current change checked out, I didn't see the
> patch that introduced the 'clean' member (though, I _could_ have
> blamed to find out). Instead, I just got confused and thought it
> worth a question. Your comment prevents this question in the future.

Yeah, definitely worth the question.  I've been buried in
merge-recursive.c & related areas so long that I've forgotten that
certain things are weird or surprising on first look.  The more of
those we can flag and document, the better.
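For readers of the tri-state comment above, here is a minimal sketch of how a caller might branch on merge_result's "clean" field. `enum merge_outcome` and `classify_merge()` are hypothetical names used only for illustration; the value ranges come from the comment itself (1 clean, 0 conflicts, negative aborted).

```c
#include <assert.h>

/* Hypothetical caller-side classification; not part of merge-ort.c. */
enum merge_outcome {
	MERGE_OUTCOME_CLEAN,
	MERGE_OUTCOME_CONFLICTED,
	MERGE_OUTCOME_ABORTED
};

/*
 * Interpret merge_result.clean as documented:
 *    1: clean
 *    0: not clean (merge conflicts)
 *   <0: operation aborted prematurely (worktree may be inconsistent)
 * The negative case is checked first so that an abort is never
 * mistaken for a "true" boolean.
 */
static enum merge_outcome classify_merge(int clean)
{
	if (clean < 0)
		return MERGE_OUTCOME_ABORTED;
	return clean ? MERGE_OUTCOME_CLEAN : MERGE_OUTCOME_CONFLICTED;
}
```

Checking `clean < 0` first matters: a plain `if (clean)` would treat an aborted merge as a success.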


* Re: [PATCH v2 08/11] merge-ort: add implementation of rename collisions
  2020-12-14 16:21   ` [PATCH v2 08/11] merge-ort: add implementation of rename collisions Elijah Newren via GitGitGadget
@ 2020-12-15 14:09     ` Derrick Stolee
  2020-12-15 16:56       ` Elijah Newren
  0 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee @ 2020-12-15 14:09 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren, Johannes Schindelin

On 12/14/2020 11:21 AM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Implement rename/rename(2to1) and rename/add handling, i.e. a file is
> renamed into a location where another file is added (with that other
> file either being a plain add or itself coming from a rename).  Note
> that rename collisions can also have a special case stacked on top: the
> file being renamed on one side of history is deleted on the other
> (yielding either a rename/add/delete conflict or perhaps a
> rename/rename(2to1)/delete[/delete] conflict).
> 
> One thing to note here is that when there is a double rename, the code
> in question only handles one of them at a time; a later iteration
> through the loop will handle the other.  After they've both been
> handled, process_entry()'s normal add/add code can handle the collision.
> 
> This code replaces the following from merge-recursive.c:
> 
>   * all the 2to1 code in process_renames()
>   * the RENAME_TWO_FILES_TO_ONE case of process_entry()
>   * handle_rename_rename_2to1()
>   * handle_rename_add()
> 
> Also, there is some shared code from merge-recursive.c for multiple
> different rename cases which we will no longer need for this case (or
> other rename cases):
> 
>   * handle_file_collision()
>   * setup_rename_conflict_info()
> 
> The consolidation of six separate codepaths into one is made possible
> by a change in design: process_renames() tweaks the conflict_info
> entries within opt->priv->paths such that process_entry() can then
> handle all the non-rename conflict types (directory/file, modify/delete,
> etc.) orthogonally.  This means we're much less likely to miss the special
> handling needed for some combination of conflict types (see
> commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
> 2020-11-18), especially commit ef52778708 ("merge tests: expect improved
> directory/file conflict handling in ort", 2020-10-26) for more details).
> That, together with letting worktree/index updating be handled
> orthogonally in the merge_switch_to_result() function, dramatically
> simplifies the code for various special rename cases.

I'm really happy that you broke out the cases earlier, and describe
them so well in the message. It makes this hunk of code really easy
to understand:

> +			const char *pathnames[3];
> +			struct version_info merged;
> +
> +			struct conflict_info *base, *side1, *side2;
> +			unsigned clean;
> +
> +			pathnames[0] = oldpath;
> +			pathnames[other_source_index] = oldpath;
> +			pathnames[target_index] = newpath;
> +
> +			base = strmap_get(&opt->priv->paths, pathnames[0]);
> +			side1 = strmap_get(&opt->priv->paths, pathnames[1]);
> +			side2 = strmap_get(&opt->priv->paths, pathnames[2]);
> +
> +			VERIFY_CI(base);
> +			VERIFY_CI(side1);
> +			VERIFY_CI(side2);
> +
> +			clean = handle_content_merge(opt, pair->one->path,
> +						     &base->stages[0],
> +						     &side1->stages[1],
> +						     &side2->stages[2],
> +						     pathnames,
> +						     1 + 2*opt->priv->call_depth,

nit: " * "

> +						     &merged);
> +
> +			memcpy(&newinfo->stages[target_index], &merged,
> +			       sizeof(merged));
> +			if (!clean) {
> +				path_msg(opt, newpath, 0,
> +					 _("CONFLICT (rename involved in "
> +					   "collision): rename of %s -> %s has "
> +					   "content conflicts AND collides "
> +					   "with another path; this may result "
> +					   "in nested conflict markers."),
> +					 oldpath, newpath);

I was briefly taken aback by "AND collides with another path" wondering if
that wording helps users understand the type of conflict here. But I can't
think of anything better, so *shrug*.

> +			}
>  		} else if (collision && source_deleted) {
> -			/* rename/add/delete or rename/rename(2to1)/delete */
> -			die("Not yet implemented");
> +			/*
> +			 * rename/add/delete or rename/rename(2to1)/delete:
> +			 * since oldpath was deleted on the side that didn't
> +			 * do the rename, there's not much of a content merge
> +			 * we can do for the rename.  oldinfo->merged.is_null
> +			 * was already set, so we just leave things as-is so
> +			 * they look like an add/add conflict.
> +			 */
> +
> +			newinfo->path_conflict = 1;
> +			path_msg(opt, newpath, 0,
> +				 _("CONFLICT (rename/delete): %s renamed "
> +				   "to %s in %s, but deleted in %s."),
> +				 oldpath, newpath, rename_branch, delete_branch);

I think this branch is added in the wrong patch. My compiler is complaining
that 'rename_branch' and 'delete_branch' are not declared (yet).

Thanks,
-Stolee


* Re: [PATCH v2 09/11] merge-ort: add implementation of rename/delete conflicts
  2020-12-14 16:21   ` [PATCH v2 09/11] merge-ort: add implementation of rename/delete conflicts Elijah Newren via GitGitGadget
@ 2020-12-15 14:23     ` Derrick Stolee
  2020-12-15 17:07       ` Elijah Newren
  2020-12-15 14:27     ` Derrick Stolee
  1 sibling, 1 reply; 65+ messages in thread
From: Derrick Stolee @ 2020-12-15 14:23 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren, Johannes Schindelin

On 12/14/2020 11:21 AM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Implement rename/delete conflicts, i.e. one side renames a file and the
> other deletes the file.  This code replaces the following from
> merge-recursive.c:
> 
>   * the code relevant to RENAME_DELETE in process_renames()
>   * the RENAME_DELETE case of process_entry()
>   * handle_rename_delete()
> 
> Also, there is some shared code from merge-recursive.c for multiple
> different rename cases which we will no longer need for this case (or
> other rename cases):
> 
>   * handle_change_delete()
>   * setup_rename_conflict_info()
> 
> The consolidation of five separate codepaths into one is made possible
> by a change in design: process_renames() tweaks the conflict_info
> entries within opt->priv->paths such that process_entry() can then
> handle all the non-rename conflict types (directory/file, modify/delete,
> etc.) orthogonally.  This means we're much less likely to miss the special
> handling needed for some combination of conflict types (see
> commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
> 2020-11-18), especially commit ef52778708 ("merge tests: expect improved
> directory/file conflict handling in ort", 2020-10-26) for more details).
> That, together with letting worktree/index updating be handled
> orthogonally in the merge_switch_to_result() function, dramatically
> simplifies the code for various special rename cases.
> 
> To be fair, there is a _slight_ tweak to process_entry() here, because
> rename/delete cases will also trigger the modify/delete codepath.
> However, we only want a modify/delete message to be printed for a
> rename/delete conflict if there is a content change in the renamed file
> in addition to the rename.  So process_renames() and process_entry()
> aren't quite fully orthogonal, but they are pretty close.

Thanks for adding this warning about the change to process_entry().
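The process_entry() tweak described above boils down to one predicate. Below is a minimal sketch of it using simplified stand-in structs; `struct oid_sketch`, `struct ci_sketch`, and `wants_modify_delete_msg()` are hypothetical (the real code uses `struct conflict_info` and `oideq()`). The modify/delete notice is skipped only when the entry was flagged by the rename/delete handling and the surviving side's contents still match the merge base.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-ins for git's object_id and conflict_info. */
struct oid_sketch {
	unsigned char hash[20];
};

struct ci_sketch {
	struct oid_sketch stages[3];	/* 0 = base, 1 = side1, 2 = side2 */
	unsigned path_conflict:1;	/* set by the rename/delete handling */
};

static int oid_equal(const struct oid_sketch *a, const struct oid_sketch *b)
{
	return !memcmp(a->hash, b->hash, sizeof(a->hash));
}

/*
 * Mirror of the condition in process_entry(): a rename/delete whose
 * renamed side still has the base contents gets no additional
 * "modify/delete" notice; anything else still gets one.
 */
static int wants_modify_delete_msg(const struct ci_sketch *ci, int side)
{
	assert(side == 1 || side == 2);
	return !(ci->path_conflict &&
		 oid_equal(&ci->stages[0], &ci->stages[side]));
}
```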

> @@ -657,6 +657,7 @@ static int process_renames(struct merge_options *opt,
>  		unsigned int old_sidemask;
>  		int target_index, other_source_index;
>  		int source_deleted, collision, type_changed;
> +		const char *rename_branch = NULL, *delete_branch = NULL;

Ah, here they are!

> +		if (source_deleted) {
> +			if (target_index == 1) {
> +				rename_branch = opt->branch1;
> +				delete_branch = opt->branch2;
> +			} else {
> +				rename_branch = opt->branch2;
> +				delete_branch = opt->branch1;
> +			}
>  		}
>  
>  		assert(source_deleted || oldinfo->filemask & old_sidemask);
> @@ -838,13 +847,26 @@ static int process_renames(struct merge_options *opt,
>  				   "to %s in %s, but deleted in %s."),
>  				 oldpath, newpath, rename_branch, delete_branch);

This context line is the previous use of rename_branch and delete_branch.
Perhaps the declarations, initialization, and first-use here are worth
their own patch?

>  		} else {
> +			/*
> +			 * a few different cases...start by copying the
> +			 * existing stage(s) from oldinfo over the newinfo
> +			 * and update the pathname(s).
> +			 */
> +			memcpy(&newinfo->stages[0], &oldinfo->stages[0],
> +			       sizeof(newinfo->stages[0]));
> +			newinfo->filemask |= (1 << MERGE_BASE);
> +			newinfo->pathnames[0] = oldpath;
>  			if (type_changed) {
>  				/* rename vs. typechange */
>  				die("Not yet implemented");
>  			} else if (source_deleted) {
>  				/* rename/delete */
> +				newinfo->path_conflict = 1;
> +				path_msg(opt, newpath, 0,
> +					 _("CONFLICT (rename/delete): %s renamed"
> +					   " to %s in %s, but deleted in %s."),
> +					 oldpath, newpath,
> +					 rename_branch, delete_branch);

Since the primary purpose of rename_branch and delete_branch appears to
be these error messages, the earlier rename/delete error message should
likely just be promoted into this patch instead of the previous one.

In fact, the error messages are the exact same, but with slightly different
lines due to wrapping:

			path_msg(opt, newpath, 0,
				 _("CONFLICT (rename/delete): %s renamed "
				   "to %s in %s, but deleted in %s."),
				 oldpath, newpath, rename_branch, delete_branch);

and

				path_msg(opt, newpath, 0,
					 _("CONFLICT (rename/delete): %s renamed"
					   " to %s in %s, but deleted in %s."),
					 oldpath, newpath,
					 rename_branch, delete_branch);

I wonder if there is a way to group these together? Perhaps the nested
if/else if/else blocks could store a "conflict state" value that says
which CONFLICT message to print after the complicated branching is done.

Alternatively, this message appears to be written in the following case:

	source_deleted && !type_changed

your if/else if/else block could be rearranged as follows:

	if (collision && !source_deleted) {
		/* collision: rename/add or rename/rename(2to1) */
	} else if (!type_changed && source_deleted) {
		/* rename/delete or rename/add/delete or rename/rename(2to1)/delete */
	} else if (!collision) {
		/* a few different cases */
	}

Of course, the thing I am missing is that copy of oldinfo->stages[0] into
newinfo->stages[0] along with changes to the filemask and pathnames! That
is likely why you need the two different markers, because the cases truly
are different in that subtle way.
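Stolee's "conflict state" idea could look roughly like the sketch below. It is hypothetical: `enum rename_notice`, `pick_notice()`, and `emit_notice()` do not exist in merge-ort.c, and `printf()` stands in for `path_msg()`. As noted just above, it only factors out the message; the stage copying still differs between the branches. The conditions mirror where this patch prints the rename/delete notice, taking `collision` as seen by the branching.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: which CONFLICT notice the branching decided on. */
enum rename_notice {
	NOTICE_NONE = 0,
	NOTICE_RENAME_DELETE
};

/*
 * Record which notice is needed; print it once afterwards.  Both
 * rename/delete paths in the patch print identical text.
 */
static enum rename_notice pick_notice(int collision, int source_deleted,
				      int type_changed)
{
	if (collision && source_deleted)
		return NOTICE_RENAME_DELETE;	/* rename/add/delete, 2to1/delete */
	if (!collision && source_deleted && !type_changed)
		return NOTICE_RENAME_DELETE;	/* plain rename/delete */
	return NOTICE_NONE;
}

static void emit_notice(enum rename_notice n, const char *oldpath,
			const char *newpath, const char *rename_branch,
			const char *delete_branch)
{
	if (n == NOTICE_RENAME_DELETE)
		printf("CONFLICT (rename/delete): %s renamed to %s in %s, "
		       "but deleted in %s.\n",
		       oldpath, newpath, rename_branch, delete_branch);
}
```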

>  				/* normal rename */
>  				die("Not yet implemented");
> @@ -1380,12 +1402,21 @@ static void process_entry(struct merge_options *opt,
>  		modify_branch = (side == 1) ? opt->branch1 : opt->branch2;
>  		delete_branch = (side == 1) ? opt->branch2 : opt->branch1;
>  
> -		path_msg(opt, path, 0,
> -			 _("CONFLICT (modify/delete): %s deleted in %s "
> -			   "and modified in %s.  Version %s of %s left "
> -			   "in tree."),
> -			 path, delete_branch, modify_branch,
> -			 modify_branch, path);
> +		if (ci->path_conflict &&
> +		    oideq(&ci->stages[0].oid, &ci->stages[side].oid)) {
> +			/*
> +			 * This came from a rename/delete; no action to take,
> +			 * but avoid printing "modify/delete" conflict notice
> +			 * since the contents were not modified.
> +			 */
> +		} else {
> +			path_msg(opt, path, 0,
> +				 _("CONFLICT (modify/delete): %s deleted in %s "
> +				   "and modified in %s.  Version %s of %s left "
> +				   "in tree."),
> +				 path, delete_branch, modify_branch,
> +				 modify_branch, path);
> +		}

Thanks,
-Stolee


* Re: [PATCH v2 09/11] merge-ort: add implementation of rename/delete conflicts
  2020-12-14 16:21   ` [PATCH v2 09/11] merge-ort: add implementation of rename/delete conflicts Elijah Newren via GitGitGadget
  2020-12-15 14:23     ` Derrick Stolee
@ 2020-12-15 14:27     ` Derrick Stolee
  1 sibling, 0 replies; 65+ messages in thread
From: Derrick Stolee @ 2020-12-15 14:27 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren, Johannes Schindelin

On 12/14/2020 11:21 AM, Elijah Newren via GitGitGadget wrote:
>  		if (type_changed && collision) {
>  			/* special handling so later blocks can handle this */
>  			die("Not yet implemented");
> +		if (source_deleted) {

I didn't catch this in my earlier message, but the closing brace
of the if (type_changed && collision) block gets squashed here,
breaking the build.

Thanks,
-Stolee


* Re: [PATCH v2 10/11] merge-ort: add implementation of normal rename handling
  2020-12-14 16:21   ` [PATCH v2 10/11] merge-ort: add implementation of normal rename handling Elijah Newren via GitGitGadget
@ 2020-12-15 14:27     ` Derrick Stolee
  0 siblings, 0 replies; 65+ messages in thread
From: Derrick Stolee @ 2020-12-15 14:27 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren, Johannes Schindelin

On 12/14/2020 11:21 AM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Implement handling of normal renames.  This code replaces the following
> from merge-recursive.c:
> 
>   * the code relevant to RENAME_NORMAL in process_renames()
>   * the RENAME_NORMAL case of process_entry()
> 
> Also, there is some shared code from merge-recursive.c for multiple
> different rename cases which we will no longer need for this case (or
> other rename cases):
> 
>   * handle_rename_normal()
>   * setup_rename_conflict_info()
> 
> The consolidation of four separate codepaths into one is made possible
> by a change in design: process_renames() tweaks the conflict_info
> entries within opt->priv->paths such that process_entry() can then
> handle all the non-rename conflict types (directory/file, modify/delete,
> etc.) orthogonally.  This means we're much less likely to miss the special
> handling needed for some combination of conflict types (see
> commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
> 2020-11-18), especially commit ef52778708 ("merge tests: expect improved
> directory/file conflict handling in ort", 2020-10-26) for more details).
> That, together with letting worktree/index updating be handled
> orthogonally in the merge_switch_to_result() function, dramatically
> simplifies the code for various special rename cases.
> 
> (To be fair, the code for handling normal renames wasn't all that
> complicated beforehand, but it's still much simpler now.)

Definitely looks simple.

> +				memcpy(&newinfo->stages[other_source_index],
> +				       &oldinfo->stages[other_source_index],
> +				       sizeof(newinfo->stages[0]));
> +				newinfo->filemask |= (1 << other_source_index);
> +				newinfo->pathnames[other_source_index] = oldpath;

I'm happy your organization brought us to a clean place.
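To see why the normal-rename case ends up so short, here is a minimal sketch with simplified stand-in structs; `struct version_sketch`, `struct info_sketch`, and `fold_normal_rename()` are hypothetical (the real code operates on `struct conflict_info`). It combines the base-stage copy that patch 09 performs before branching with the other-side copy added here. Afterwards the entry for the new path carries all three stages, so process_entry() can presumably handle it like any other path with three stages.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-ins; the real structs in merge-ort.c have more fields. */
struct version_sketch {
	unsigned char oid[20];
	unsigned mode;
};

struct info_sketch {
	struct version_sketch stages[3];	/* base, side1, side2 */
	const char *pathnames[3];
	unsigned filemask;			/* bit i set: stages[i] present */
};

/*
 * For a normal rename, the base and the side that did NOT rename still
 * have the file at oldpath; copy those stages over to the entry for
 * newpath and record where they came from.
 */
static void fold_normal_rename(struct info_sketch *newinfo,
			       const struct info_sketch *oldinfo,
			       int other_source_index, const char *oldpath)
{
	assert(other_source_index == 1 || other_source_index == 2);

	/* shared prelude for the non-collision cases (patch 09) */
	memcpy(&newinfo->stages[0], &oldinfo->stages[0],
	       sizeof(newinfo->stages[0]));
	newinfo->filemask |= 1u << 0;
	newinfo->pathnames[0] = oldpath;

	/* the normal-rename branch (this patch) */
	memcpy(&newinfo->stages[other_source_index],
	       &oldinfo->stages[other_source_index],
	       sizeof(newinfo->stages[0]));
	newinfo->filemask |= 1u << other_source_index;
	newinfo->pathnames[other_source_index] = oldpath;
}
```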

Thanks,
-Stolee


* Re: [PATCH v2 11/11] merge-ort: add implementation of type-changed rename handling
  2020-12-14 16:21   ` [PATCH v2 11/11] merge-ort: add implementation of type-changed " Elijah Newren via GitGitGadget
@ 2020-12-15 14:31     ` Derrick Stolee
  2020-12-15 17:11       ` Elijah Newren
  0 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee @ 2020-12-15 14:31 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren, Johannes Schindelin

On 12/14/2020 11:21 AM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Implement cases where renames are involved in type changes (i.e. the
> side of history that didn't rename the file changed its type from a
> regular file to a symlink or submodule).  There was some code to handle
> this in merge-recursive but only in the special case when the renamed
> file had no content changes.  The code here works differently -- it
> knows process_entry() can handle mode conflicts, so it does a few
> minimal tweaks to ensure process_entry() can just finish the job as
> needed.
> 
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  merge-ort.c | 33 +++++++++++++++++++++++++++++++--
>  1 file changed, 31 insertions(+), 2 deletions(-)
> 
> diff --git a/merge-ort.c b/merge-ort.c
> index 9aac33c8e31..11e33f56edf 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -778,7 +778,32 @@ static int process_renames(struct merge_options *opt,
>  			 S_ISREG(newinfo->stages[target_index].mode));
>  		if (type_changed && collision) {
>  			/* special handling so later blocks can handle this */

Perhaps drop this comment, or incorporate it into the lower one?

> -			die("Not yet implemented");
> +			/*
> +			 * if type_changed && collision are both true, then this
> +			 * was really a double rename, but one side wasn't
> +			 * detected due to lack of break detection.  I.e.
> +			 * something like
> +			 *    orig: has normal file 'foo'
> +			 *    side1: renames 'foo' to 'bar', adds 'foo' symlink
> +			 *    side2: renames 'foo' to 'bar'
> +			 * In this case, the foo->bar rename on side1 won't be
> +			 * detected because the new symlink named 'foo' is
> +			 * there and we don't do break detection.  But we detect
> +			 * this here because we don't want to merge the content
> +			 * of the foo symlink with the foo->bar file, so we
> +			 * have some logic to handle this special case.  The
> +			 * easiest way to do that is make 'bar' on side1 not
> +			 * be considered a colliding file but the other part
> +			 * of a normal rename.  If the file is very different,
> +			 * well we're going to get content merge conflicts
> +			 * anyway so it doesn't hurt.  And if the colliding
> +			 * file also has a different type, that'll be handled
> +			 * by the content merge logic in process_entry() too.
> +			 *
> +			 * See also t6430, 'rename vs. rename/symlink'

I appreciate the callout to a test that exercises this behavior.

> +			 */
> +			collision = 0;
> +		}

Here, we regain that closing curly brace, fixing the compiler errors from
earlier.

>  		if (source_deleted) {
>  			if (target_index == 1) {
>  				rename_branch = opt->branch1;
> @@ -858,7 +883,11 @@ static int process_renames(struct merge_options *opt,
>  			newinfo->pathnames[0] = oldpath;
>  			if (type_changed) {
>  				/* rename vs. typechange */
> -				die("Not yet implemented");
> +				/* Mark the original as resolved by removal */
> +				memcpy(&oldinfo->stages[0].oid, &null_oid,
> +				       sizeof(oldinfo->stages[0].oid));
> +				oldinfo->stages[0].mode = 0;
> +				oldinfo->filemask &= 0x06;

This matches your explanation in the comment above. I wonder if 0x06
could be less magical, but we are really deep in the weeds here already.
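One way to make 0x06 less magical, offered only as a sketch: if MERGE_BASE is bit 0 (the "1 << MERGE_BASE" usage elsewhere in the series suggests this), then clearing that bit is the same as masking with 0x06 for any 3-bit filemask. The toy_* structs, BASE_BIT and mark_base_removed() are hypothetical stand-ins for conflict_info and its stages. The values MERGE_BASE = 0 and MERGE_SIDE1 = 1 are assumed.

```c
#include <assert.h>
#include <string.h>

/* MERGE_SIDE2 = 2 appears in merge-ort.c; the other two values are assumed. */
enum merge_side { MERGE_BASE = 0, MERGE_SIDE1 = 1, MERGE_SIDE2 = 2 };

/* Toy stand-ins for a stage (oid + mode) and for conflict_info. */
struct toy_stage {
	unsigned char oid[20]; /* all zeroes plays the role of null_oid */
	unsigned int mode;
};

struct toy_info {
	struct toy_stage stages[3];
	unsigned int filemask; /* bit N set => stages[N] is present */
};

#define BASE_BIT (1u << MERGE_BASE)

/* Mark the base version as resolved by removal, naming the bit cleared. */
static void mark_base_removed(struct toy_info *info)
{
	assert(info->filemask <= 7); /* only three stages exist */
	memset(info->stages[MERGE_BASE].oid, 0,
	       sizeof(info->stages[MERGE_BASE].oid));
	info->stages[MERGE_BASE].mode = 0;
	info->filemask &= ~BASE_BIT; /* same as "&= 0x06" for 3-bit masks */
}
```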

Thanks,
-Stolee



* Re: [PATCH v2 00/11] merge-ort: add basic rename detection
  2020-12-14 16:21 ` [PATCH v2 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
                     ` (10 preceding siblings ...)
  2020-12-14 16:21   ` [PATCH v2 11/11] merge-ort: add implementation of type-changed " Elijah Newren via GitGitGadget
@ 2020-12-15 14:34   ` Derrick Stolee
  2020-12-15 22:09     ` Junio C Hamano
  2020-12-15 18:27   ` [PATCH v3 " Elijah Newren via GitGitGadget
  12 siblings, 1 reply; 65+ messages in thread
From: Derrick Stolee @ 2020-12-15 14:34 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git; +Cc: Elijah Newren, Johannes Schindelin

On 12/14/2020 11:21 AM, Elijah Newren via GitGitGadget wrote:
> This series builds on en/merge-ort-2 and adds basic rename detection to
> merge-ort.

I have now finished a full pass through this series. I find it to be
well organized with a satisfying conclusion. My comments are mostly
about nits. I tried to recommend some better organization, but failed
to find a clear way to do so, which shows that the organization here
is sound.

My only complaint is that some of the patches break compilation, so
that should definitely be fixed to assist with bisects in the future
(likely for bisects unrelated to this feature).

Thanks,
-Stolee



* Re: [PATCH v2 08/11] merge-ort: add implementation of rename collisions
  2020-12-15 14:09     ` Derrick Stolee
@ 2020-12-15 16:56       ` Elijah Newren
  0 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2020-12-15 16:56 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, Git Mailing List,
	Johannes Schindelin

On Tue, Dec 15, 2020 at 6:09 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/14/2020 11:21 AM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Implement rename/rename(2to1) and rename/add handling, i.e. a file is
> > renamed into a location where another file is added (with that other
> > file either being a plain add or itself coming from a rename).  Note
> > that rename collisions can also have a special case stacked on top: the
> > file being renamed on one side of history is deleted on the other
> > (yielding either a rename/add/delete conflict or perhaps a
> > rename/rename(2to1)/delete[/delete] conflict).
> >
> > One thing to note here is that when there is a double rename, the code
> > in question only handles one of them at a time; a later iteration
> > through the loop will handle the other.  After they've both been
> > handled, process_entry()'s normal add/add code can handle the collision.
> >
> > This code replaces the following from merge-recursive.c:
> >
> >   * all the 2to1 code in process_renames()
> >   * the RENAME_TWO_FILES_TO_ONE case of process_entry()
> >   * handle_rename_rename_2to1()
> >   * handle_rename_add()
> >
> > Also, there is some shared code from merge-recursive.c for multiple
> > different rename cases which we will no longer need for this case (or
> > other rename cases):
> >
> >   * handle_file_collision()
> >   * setup_rename_conflict_info()
> >
> > The consolidation of six separate codepaths into one is made possible
> > by a change in design: process_renames() tweaks the conflict_info
> > entries within opt->priv->paths such that process_entry() can then
> > handle all the non-rename conflict types (directory/file, modify/delete,
> > etc.) orthogonally.  This means we're much less likely to miss special
> > implementation of some kind of combination of conflict types (see
> > commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
> > 2020-11-18), especially commit ef52778708 ("merge tests: expect improved
> > directory/file conflict handling in ort", 2020-10-26) for more details).
> > That, together with letting worktree/index updating be handled
> > orthogonally in the merge_switch_to_result() function, dramatically
> > simplifies the code for various special rename cases.
>
> I'm really happy that you broke out the cases earlier, and describe
> them so well in the message. It makes this hunk of code really easy
> to understand:
>
> > +                     const char *pathnames[3];
> > +                     struct version_info merged;
> > +
> > +                     struct conflict_info *base, *side1, *side2;
> > +                     unsigned clean;
> > +
> > +                     pathnames[0] = oldpath;
> > +                     pathnames[other_source_index] = oldpath;
> > +                     pathnames[target_index] = newpath;
> > +
> > +                     base = strmap_get(&opt->priv->paths, pathnames[0]);
> > +                     side1 = strmap_get(&opt->priv->paths, pathnames[1]);
> > +                     side2 = strmap_get(&opt->priv->paths, pathnames[2]);
> > +
> > +                     VERIFY_CI(base);
> > +                     VERIFY_CI(side1);
> > +                     VERIFY_CI(side2);
> > +
> > +                     clean = handle_content_merge(opt, pair->one->path,
> > +                                                  &base->stages[0],
> > +                                                  &side1->stages[1],
> > +                                                  &side2->stages[2],
> > +                                                  pathnames,
> > +                                                  1 + 2*opt->priv->call_depth,
>
> nit: " * "

Will fix.
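For readers following the index juggling in the quoted hunk, here is a minimal sketch of how the three pathnames line up for a one-sided rename. It assumes target_index is 1 or 2 and that other_source_index is simply the other side, i.e. 3 - target_index. fill_rename_pathnames() is a hypothetical helper, not a function in merge-ort.c.

```c
#include <assert.h>

/*
 * Hypothetical helper: fill the pathnames passed to the content merge
 * for a rename made on side target_index (1 or 2).  Index 0 is the
 * merge base; the side that did not rename still uses the old path.
 */
static void fill_rename_pathnames(const char *pathnames[3], int target_index,
				  const char *oldpath, const char *newpath)
{
	int other_source_index;

	assert(target_index == 1 || target_index == 2);
	other_source_index = 3 - target_index; /* assumed derivation */

	pathnames[0] = oldpath;
	pathnames[other_source_index] = oldpath;
	pathnames[target_index] = newpath;
}
```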

> > +                                                  &merged);
> > +
> > +                     memcpy(&newinfo->stages[target_index], &merged,
> > +                            sizeof(merged));
> > +                     if (!clean) {
> > +                             path_msg(opt, newpath, 0,
> > +                                      _("CONFLICT (rename involved in "
> > +                                        "collision): rename of %s -> %s has "
> > +                                        "content conflicts AND collides "
> > +                                        "with another path; this may result "
> > +                                        "in nested conflict markers."),
> > +                                      oldpath, newpath);
>
> I was briefly taken aback by "AND collides with another path" wondering if
> that wording helps users understand the type of conflict here. But I can't
> think of anything better, so *shrug*.
>
> > +                     }
> >               } else if (collision && source_deleted) {
> > -                     /* rename/add/delete or rename/rename(2to1)/delete */
> > -                     die("Not yet implemented");
> > +                     /*
> > +                      * rename/add/delete or rename/rename(2to1)/delete:
> > +                      * since oldpath was deleted on the side that didn't
> > +                      * do the rename, there's not much of a content merge
> > +                      * we can do for the rename.  oldinfo->merged.is_null
> > +                      * was already set, so we just leave things as-is so
> > +                      * they look like an add/add conflict.
> > +                      */
> > +
> > +                     newinfo->path_conflict = 1;
> > +                     path_msg(opt, newpath, 0,
> > +                              _("CONFLICT (rename/delete): %s renamed "
> > +                                "to %s in %s, but deleted in %s."),
> > +                              oldpath, newpath, rename_branch, delete_branch);
>
> I think this branch is added in the wrong patch. My compiler is complaining
> that 'rename_branch' and 'delete_branch' are not declared (yet).

Whoops.  This used to be two separate patches, with the second half coming
after one of the later patches in the series.  But in the commit
message it seemed most natural to talk about "rename collisions", which
covers both types of conflicts here.  So I squashed them...and
broke the build.  I'll rearrange this one to come after the
rename/delete patch so that rename_branch and delete_branch will be
defined.


* Re: [PATCH v2 09/11] merge-ort: add implementation of rename/delete conflicts
  2020-12-15 14:23     ` Derrick Stolee
@ 2020-12-15 17:07       ` Elijah Newren
  0 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2020-12-15 17:07 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, Git Mailing List,
	Johannes Schindelin

On Tue, Dec 15, 2020 at 6:23 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/14/2020 11:21 AM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Implement rename/delete conflicts, i.e. one side renames a file and the
> > other deletes the file.  This code replaces the following from
> > merge-recursive.c:
> >
> >   * the code relevant to RENAME_DELETE in process_renames()
> >   * the RENAME_DELETE case of process_entry()
> >   * handle_rename_delete()
> >
> > Also, there is some shared code from merge-recursive.c for multiple
> > different rename cases which we will no longer need for this case (or
> > other rename cases):
> >
> >   * handle_change_delete()
> >   * setup_rename_conflict_info()
> >
> > The consolidation of five separate codepaths into one is made possible
> > by a change in design: process_renames() tweaks the conflict_info
> > entries within opt->priv->paths such that process_entry() can then
> > handle all the non-rename conflict types (directory/file, modify/delete,
> > etc.) orthogonally.  This means we're much less likely to miss special
> > implementation of some kind of combination of conflict types (see
> > commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
> > 2020-11-18), especially commit ef52778708 ("merge tests: expect improved
> > directory/file conflict handling in ort", 2020-10-26) for more details).
> > That, together with letting worktree/index updating be handled
> > orthogonally in the merge_switch_to_result() function, dramatically
> > simplifies the code for various special rename cases.
> >
> > To be fair, there is a _slight_ tweak to process_entry() here, because
> > rename/delete cases will also trigger the modify/delete codepath.
> > However, we only want a modify/delete message to be printed for a
> > rename/delete conflict if there is a content change in the renamed file
> > in addition to the rename.  So process_renames() and process_entry()
> > aren't quite fully orthogonal, but they are pretty close.
>
> Thanks for adding this warning about the change to process_entry().
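The tweak described above boils down to a small predicate. Here is a sketch with hypothetical names (print_modify_delete_notice(), TOY_HASH_LEN), using raw 20-byte hashes as stand-ins for struct object_id and oideq(). The modify/delete notice is skipped only when process_renames() flagged a path conflict and the content on the side that kept the file equals the base.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_HASH_LEN 20

/*
 * Hypothetical predicate: should process_entry() print the
 * "CONFLICT (modify/delete)" notice?  Not when process_renames() flagged
 * a path conflict (rename/delete) and the content on the non-deleting
 * side is unchanged from the base, i.e. the file was only renamed.
 */
static bool print_modify_delete_notice(bool path_conflict,
				       const unsigned char *base_oid,
				       const unsigned char *side_oid)
{
	bool unmodified = memcmp(base_oid, side_oid, TOY_HASH_LEN) == 0;

	return !(path_conflict && unmodified);
}
```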
>
> > @@ -657,6 +657,7 @@ static int process_renames(struct merge_options *opt,
> >               unsigned int old_sidemask;
> >               int target_index, other_source_index;
> >               int source_deleted, collision, type_changed;
> > +             const char *rename_branch = NULL, *delete_branch = NULL;
>
> Ah, here they are!
>
> > +             if (source_deleted) {
> > +                     if (target_index == 1) {
> > +                             rename_branch = opt->branch1;
> > +                             delete_branch = opt->branch2;
> > +                     } else {
> > +                             rename_branch = opt->branch2;
> > +                             delete_branch = opt->branch1;
> > +                     }
> >               }
> >
> >               assert(source_deleted || oldinfo->filemask & old_sidemask);
> > @@ -838,13 +847,26 @@ static int process_renames(struct merge_options *opt,
> >                                  "to %s in %s, but deleted in %s."),
> >                                oldpath, newpath, rename_branch, delete_branch);
>
> This context line is the previous use of rename_branch and delete_branch.
> Perhaps the declarations, initialization, and first-use here are worth
> their own patch?
>
> >               } else {
> > +                     /*
> > +                      * a few different cases...start by copying the
> > +                      * existing stage(s) from oldinfo over the newinfo
> > +                      * and update the pathname(s).
> > +                      */
> > +                     memcpy(&newinfo->stages[0], &oldinfo->stages[0],
> > +                            sizeof(newinfo->stages[0]));
> > +                     newinfo->filemask |= (1 << MERGE_BASE);
> > +                     newinfo->pathnames[0] = oldpath;
> >                       if (type_changed) {
> >                               /* rename vs. typechange */
> >                               die("Not yet implemented");
> >                       } else if (source_deleted) {
> >                               /* rename/delete */
> > +                             newinfo->path_conflict = 1;
> > +                             path_msg(opt, newpath, 0,
> > +                                      _("CONFLICT (rename/delete): %s renamed"
> > +                                        " to %s in %s, but deleted in %s."),
> > +                                      oldpath, newpath,
> > +                                      rename_branch, delete_branch);
>
> Since the primary purpose of rename_branch and delete_branch appears to
> be for these error messages, then likely the previous error message about
> a rename/delete should just be promoted into this patch instead of the
> previous.
>
> In fact, the error messages are the exact same, but with slightly different
> lines due to wrapping:
>
>                         path_msg(opt, newpath, 0,
>                                  _("CONFLICT (rename/delete): %s renamed "
>                                    "to %s in %s, but deleted in %s."),
>                                  oldpath, newpath, rename_branch, delete_branch);
>
> and
>
>                                 path_msg(opt, newpath, 0,
>                                          _("CONFLICT (rename/delete): %s renamed"
>                                            " to %s in %s, but deleted in %s."),
>                                          oldpath, newpath,
>                                          rename_branch, delete_branch);
>
> I wonder if there is a way to group these together? Perhaps the nested
> if/else if/else blocks could store a "conflict state" value that says
> which CONFLICT message to print after the complicated branching is done.
>
> Alternatively, this message appears to be written in the following case:
>
>         source_deleted && !type_changed
>
> your if/else if/else block could be rearranged as follows:
>
>         if (collision && !source_deleted)
>                 /* collision: rename/add or rename/rename(2to1) */
>         else if (!type_changed && source_deleted)
>                 /* rename/delete or rename/add/delete or rename/rename(2to1)/delete */
>         else if (!collision)
>                 /* a few different cases */
>
> Of course, the thing I am missing is that copy of oldinfo->stages[0] into
> newinfo->stages[0] along with changes to the filemask and pathnames! That
> is likely why you need the two different markers, because the cases truly
> are different in that subtle way.

Yeah, there is that subtlety and another one -- the rename/add/delete
case will also later trigger the "add/add" conflict type within
process_entries() for this same path, whereas the rename/delete case
from this patch won't.  The combination is enough of a difference that
I'm worried that trying to make both types run through the same code
block might blur the differences and pose a landmine for future folks
coming to edit the code; it'd make it too easy to break one or the
other conflict type.
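To make the branch ordering under discussion concrete, here is a sketch that classifies a single-side rename from the three flags. It follows the if/else chain quoted in this thread. The enum and classify_rename() are hypothetical, and renames made on both sides (patches 6 and 7) are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the branches of the quoted if/else chain. */
enum rename_case {
	RENAME_CASE_COLLISION,        /* rename/add or rename/rename(2to1) */
	RENAME_CASE_COLLISION_DELETE, /* rename/add/delete or rename/rename(2to1)/delete */
	RENAME_CASE_TYPECHANGE,       /* rename vs. typechange */
	RENAME_CASE_DELETE,           /* rename/delete */
	RENAME_CASE_NORMAL            /* normal rename */
};

static enum rename_case classify_rename(bool collision, bool source_deleted,
					bool type_changed)
{
	/* Typechange plus collision: really an undetected double rename. */
	if (type_changed && collision)
		collision = false;

	if (collision && !source_deleted)
		return RENAME_CASE_COLLISION;
	if (collision && source_deleted)
		return RENAME_CASE_COLLISION_DELETE;
	/* Remaining cases start by copying the base stage into newinfo. */
	if (type_changed)
		return RENAME_CASE_TYPECHANGE;
	if (source_deleted)
		return RENAME_CASE_DELETE;
	return RENAME_CASE_NORMAL;
}
```

The flags alone do not show what each branch leaves behind for process_entry(). For example, the collision/delete branch still ends in an add/add conflict, but the plain rename/delete branch does not. This is why merging the two rename/delete branches would be risky.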

If the sharing was done a different way, namely saving the basic
message in some variable before either if-block and then both places
just pass that string to path_msg() instead of both having it typed
out, then that'd probably make sense, but then we're not really saving
much.
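Here is a sketch of that alternative. It uses a hypothetical format_rename_delete() helper that writes into a caller-supplied buffer instead of calling path_msg(). The shared format string is stored once, and the rename/delete branch selection mirrors the target_index logic quoted above. In Git itself the string would also have to be marked for translation; that step is left out here.

```c
#include <assert.h>
#include <stdio.h>

/* Shared text used by both rename/delete branches. */
static const char rename_delete_fmt[] =
	"CONFLICT (rename/delete): %s renamed to %s in %s, but deleted in %s.";

/*
 * Hypothetical helper: pick the rename/delete branches from target_index
 * and format the message into buf.  Returns snprintf()'s result.
 */
static int format_rename_delete(char *buf, size_t len, int target_index,
				const char *oldpath, const char *newpath,
				const char *branch1, const char *branch2)
{
	const char *rename_branch, *delete_branch;

	assert(target_index == 1 || target_index == 2);
	rename_branch = (target_index == 1) ? branch1 : branch2;
	delete_branch = (target_index == 1) ? branch2 : branch1;

	return snprintf(buf, len, rename_delete_fmt,
			oldpath, newpath, rename_branch, delete_branch);
}
```

As noted in the reply above, this only removes the duplicated literal. The two call sites remain in separate branches.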

> >                               /* normal rename */
> >                               die("Not yet implemented");
> > @@ -1380,12 +1402,21 @@ static void process_entry(struct merge_options *opt,
> >               modify_branch = (side == 1) ? opt->branch1 : opt->branch2;
> >               delete_branch = (side == 1) ? opt->branch2 : opt->branch1;
> >
> > -             path_msg(opt, path, 0,
> > -                      _("CONFLICT (modify/delete): %s deleted in %s "
> > -                        "and modified in %s.  Version %s of %s left "
> > -                        "in tree."),
> > -                      path, delete_branch, modify_branch,
> > -                      modify_branch, path);
> > +             if (ci->path_conflict &&
> > +                 oideq(&ci->stages[0].oid, &ci->stages[side].oid)) {
> > +                     /*
> > +                      * This came from a rename/delete; no action to take,
> > +                      * but avoid printing "modify/delete" conflict notice
> > +                      * since the contents were not modified.
> > +                      */
> > +             } else {
> > +                     path_msg(opt, path, 0,
> > +                              _("CONFLICT (modify/delete): %s deleted in %s "
> > +                                "and modified in %s.  Version %s of %s left "
> > +                                "in tree."),
> > +                              path, delete_branch, modify_branch,
> > +                              modify_branch, path);
> > +             }
>
> Thanks,
> -Stolee


* Re: [PATCH v2 11/11] merge-ort: add implementation of type-changed rename handling
  2020-12-15 14:31     ` Derrick Stolee
@ 2020-12-15 17:11       ` Elijah Newren
  0 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren @ 2020-12-15 17:11 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, Git Mailing List,
	Johannes Schindelin

On Tue, Dec 15, 2020 at 6:31 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/14/2020 11:21 AM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Implement cases where renames are involved in type changes (i.e. the
> > side of history that didn't rename the file changed its type from a
> > regular file to a symlink or submodule).  There was some code to handle
> > this in merge-recursive but only in the special case when the renamed
> > file had no content changes.  The code here works differently -- it
> > knows process_entry() can handle mode conflicts, so it does a few
> > minimal tweaks to ensure process_entry() can just finish the job as
> > needed.
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  merge-ort.c | 33 +++++++++++++++++++++++++++++++--
> >  1 file changed, 31 insertions(+), 2 deletions(-)
> >
> > diff --git a/merge-ort.c b/merge-ort.c
> > index 9aac33c8e31..11e33f56edf 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -778,7 +778,32 @@ static int process_renames(struct merge_options *opt,
> >                        S_ISREG(newinfo->stages[target_index].mode));
> >               if (type_changed && collision) {
> >                       /* special handling so later blocks can handle this */
>
> Perhaps drop this comment, or incorporate it into the lower one?

Will do.

> > -                     die("Not yet implemented");
> > +                     /*
> > +                      * if type_changed && collision are both true, then this
> > +                      * was really a double rename, but one side wasn't
> > +                      * detected due to lack of break detection.  I.e.
> > +                      * something like
> > +                      *    orig: has normal file 'foo'
> > +                      *    side1: renames 'foo' to 'bar', adds 'foo' symlink
> > +                      *    side2: renames 'foo' to 'bar'
> > +                      * In this case, the foo->bar rename on side1 won't be
> > +                      * detected because the new symlink named 'foo' is
> > +                      * there and we don't do break detection.  But we detect
> > +                      * this here because we don't want to merge the content
> > +                      * of the foo symlink with the foo->bar file, so we
> > +                      * have some logic to handle this special case.  The
> > +                      * easiest way to do that is make 'bar' on side1 not
> > +                      * be considered a colliding file but the other part
> > +                      * of a normal rename.  If the file is very different,
> > +                      * well we're going to get content merge conflicts
> > +                      * anyway so it doesn't hurt.  And if the colliding
> > +                      * file also has a different type, that'll be handled
> > +                      * by the content merge logic in process_entry() too.
> > +                      *
> > +                      * See also t6430, 'rename vs. rename/symlink'
>
> I appreciate the callout to a test that exercises this behavior.
>
> > +                      */
> > +                     collision = 0;
> > +             }
>
> Here, we regain that closing curly brace, fixing the compiler errors from
> earlier.

So embarrassing.  I was pretty sure I tested the individual patches,
but maybe I somehow missed this series??  Anyway, yeah, I'll fix it
up.

>
> >               if (source_deleted) {
> >                       if (target_index == 1) {
> >                               rename_branch = opt->branch1;
> > @@ -858,7 +883,11 @@ static int process_renames(struct merge_options *opt,
> >                       newinfo->pathnames[0] = oldpath;
> >                       if (type_changed) {
> >                               /* rename vs. typechange */
> > -                             die("Not yet implemented");
> > +                             /* Mark the original as resolved by removal */
> > +                             memcpy(&oldinfo->stages[0].oid, &null_oid,
> > +                                    sizeof(oldinfo->stages[0].oid));
> > +                             oldinfo->stages[0].mode = 0;
> > +                             oldinfo->filemask &= 0x06;
>
> This matches your explanation in the comment above. I wonder if 0x06
> could be less magical, but we are really deep in the weeds here already.
>
> Thanks,
> -Stolee
>


* [PATCH v3 00/11] merge-ort: add basic rename detection
  2020-12-14 16:21 ` [PATCH v2 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
                     ` (11 preceding siblings ...)
  2020-12-15 14:34   ` [PATCH v2 00/11] merge-ort: add basic rename detection Derrick Stolee
@ 2020-12-15 18:27   ` Elijah Newren via GitGitGadget
  2020-12-15 18:27     ` [PATCH v3 01/11] merge-ort: add basic data structures for handling renames Elijah Newren via GitGitGadget
                       ` (10 more replies)
  12 siblings, 11 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-15 18:27 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren

This series builds on en/merge-ort-2 and adds basic rename detection to
merge-ort.

Changes since v2 (all due to feedback from Stolee's reviews):

 * reordered two of the patches (one depended on vars declared in another)
 * a few other adjustments to make patches individually compile (I usually
   check this; so embarrassing that I somehow missed it)

Elijah Newren (11):
  merge-ort: add basic data structures for handling renames
  merge-ort: add initial outline for basic rename detection
  merge-ort: implement detect_regular_renames()
  merge-ort: implement compare_pairs() and collect_renames()
  merge-ort: add basic outline for process_renames()
  merge-ort: add implementation of both sides renaming identically
  merge-ort: add implementation of both sides renaming differently
  merge-ort: add implementation of rename/delete conflicts
  merge-ort: add implementation of rename collisions
  merge-ort: add implementation of normal rename handling
  merge-ort: add implementation of type-changed rename handling

 merge-ort.c | 446 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 430 insertions(+), 16 deletions(-)


base-commit: c5a6f65527aa3b6f5d7cf25437a88d8727ab0646
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-812%2Fnewren%2Fort-renames-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-812/newren/ort-renames-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/812

Range-diff vs v2:

  1:  78621ca0788 =  1:  78621ca0788 merge-ort: add basic data structures for handling renames
  2:  d846decf40b =  2:  d846decf40b merge-ort: add initial outline for basic rename detection
  3:  620fc64032d =  3:  620fc64032d merge-ort: implement detect_regular_renames()
  4:  9382dc4d50b =  4:  9382dc4d50b merge-ort: implement compare_pairs() and collect_renames()
  5:  d20fab8d403 =  5:  d20fab8d403 merge-ort: add basic outline for process_renames()
  6:  15fff3dd0c4 !  6:  7ec51feb418 merge-ort: add implementation of both sides renaming identically
     @@ merge-ort.c: static int process_renames(struct merge_options *opt,
       			const char *pathnames[3];
      +			struct version_info merged;
      +			struct conflict_info *base, *side1, *side2;
     -+			unsigned was_binary_blob = 0;
       
       			pathnames[0] = oldpath;
       			pathnames[1] = newpath;
  7:  d00e26be784 !  7:  d37e2626c30 merge-ort: add implementation of both sides renaming differently
     @@ Commit message
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
       ## merge-ort.c ##
     +@@ merge-ort.c: static int process_renames(struct merge_options *opt,
     + 			const char *pathnames[3];
     + 			struct version_info merged;
     + 			struct conflict_info *base, *side1, *side2;
     ++			unsigned was_binary_blob = 0;
     + 
     + 			pathnames[0] = oldpath;
     + 			pathnames[1] = newpath;
      @@ merge-ort.c: static int process_renames(struct merge_options *opt,
       			}
       
  9:  f017534243c !  8:  6b79da5e8a4 merge-ort: add implementation of rename/delete conflicts
     @@ merge-ort.c: static int process_renames(struct merge_options *opt,
       		old_ent = strmap_get_entry(&opt->priv->paths, pair->one->path);
       		oldpath = old_ent->key;
      @@ merge-ort.c: static int process_renames(struct merge_options *opt,
     - 		if (type_changed && collision) {
       			/* special handling so later blocks can handle this */
       			die("Not yet implemented");
     + 		}
      +		if (source_deleted) {
      +			if (target_index == 1) {
      +				rename_branch = opt->branch1;
     @@ merge-ort.c: static int process_renames(struct merge_options *opt,
      +				rename_branch = opt->branch2;
      +				delete_branch = opt->branch1;
      +			}
     - 		}
     ++		}
       
       		assert(source_deleted || oldinfo->filemask & old_sidemask);
     + 
      @@ merge-ort.c: static int process_renames(struct merge_options *opt,
     - 				   "to %s in %s, but deleted in %s."),
     - 				 oldpath, newpath, rename_branch, delete_branch);
     + 			/* rename/add/delete or rename/rename(2to1)/delete */
     + 			die("Not yet implemented");
       		} else {
      -			/* a few different cases... */
      +			/*
  8:  edd610321a0 !  9:  065fc0396dc merge-ort: add implementation of rename collisions
     @@ merge-ort.c: static int process_renames(struct merge_options *opt,
      +						     &side1->stages[1],
      +						     &side2->stages[2],
      +						     pathnames,
     -+						     1 + 2*opt->priv->call_depth,
     ++						     1 + 2 * opt->priv->call_depth,
      +						     &merged);
      +
      +			memcpy(&newinfo->stages[target_index], &merged,
     @@ merge-ort.c: static int process_renames(struct merge_options *opt,
      +				   "to %s in %s, but deleted in %s."),
      +				 oldpath, newpath, rename_branch, delete_branch);
       		} else {
     - 			/* a few different cases... */
     - 			if (type_changed) {
     + 			/*
     + 			 * a few different cases...start by copying the
 10:  22cb7110261 = 10:  73426c16687 merge-ort: add implementation of normal rename handling
 11:  ff09ddb9caf ! 11:  8f4662398ab merge-ort: add implementation of type-changed rename handling
     @@ Commit message
      
       ## merge-ort.c ##
      @@ merge-ort.c: static int process_renames(struct merge_options *opt,
     + 			(S_ISREG(oldinfo->stages[other_source_index].mode) !=
       			 S_ISREG(newinfo->stages[target_index].mode));
       		if (type_changed && collision) {
     - 			/* special handling so later blocks can handle this */
     +-			/* special handling so later blocks can handle this */
      -			die("Not yet implemented");
      +			/*
     ++			 * special handling so later blocks can handle this...
     ++			 *
      +			 * if type_changed && collision are both true, then this
      +			 * was really a double rename, but one side wasn't
      +			 * detected due to lack of break detection.  I.e.
     @@ merge-ort.c: static int process_renames(struct merge_options *opt,
      +			 * See also t6430, 'rename vs. rename/symlink'
      +			 */
      +			collision = 0;
     -+		}
     + 		}
       		if (source_deleted) {
       			if (target_index == 1) {
     - 				rename_branch = opt->branch1;
      @@ merge-ort.c: static int process_renames(struct merge_options *opt,
       			newinfo->pathnames[0] = oldpath;
       			if (type_changed) {

-- 
gitgitgadget


* [PATCH v3 01/11] merge-ort: add basic data structures for handling renames
From: Elijah Newren via GitGitGadget @ 2020-12-15 18:27 UTC
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

This will grow later, but we only need a few fields for basic rename
handling.
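
As a rough illustration of the indexing convention documented in
struct rename_info below (slot 0 unused, slots 1 and 2 matching the
sides in conflict_info.stages), here is a minimal standalone sketch.
The toy_queue and toy_rename_info types are hypothetical
simplifications, not git's diff_queue_struct; the only arithmetic shown
is the pairs[1].nr + pairs[2].nr sizing that detect_and_process_renames()
uses for its combined queue.

```c
#include <assert.h>

/* Hypothetical stand-in for diff_queue_struct: only the count matters. */
struct toy_queue {
	int nr;
};

/* Hypothetical mirror of struct rename_info's layout. */
struct toy_rename_info {
	struct toy_queue pairs[3]; /* [1], [2]: merge sides; [0]: unused */
	int needed_limit;          /* stays 0 unless a higher limit was needed */
};

enum { TOY_SIDE1 = 1, TOY_SIDE2 = 2 };

/* Slots needed to hold every filepair detected on both sides. */
static int toy_combined_size(const struct toy_rename_info *ri)
{
	return ri->pairs[TOY_SIDE1].nr + ri->pairs[TOY_SIDE2].nr;
}
```

Leaving slot 0 unused presumably lets a side number index pairs[]
directly, the same way it indexes conflict_info.stages.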

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/merge-ort.c b/merge-ort.c
index 414e7b7eeac..1c1a7fa4bf1 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -46,6 +46,25 @@ enum merge_side {
 	MERGE_SIDE2 = 2
 };
 
+struct rename_info {
+	/*
+	 * pairs: pairing of filenames from diffcore_rename()
+	 *
+	 * Index 1 and 2 correspond to sides 1 & 2 as used in
+	 * conflict_info.stages.  Index 0 unused.
+	 */
+	struct diff_queue_struct pairs[3];
+
+	/*
+	 * needed_limit: value needed for inexact rename detection to run
+	 *
+	 * If the current rename limit wasn't high enough for inexact
+	 * rename detection to run, this records the limit needed.  Otherwise,
+	 * this value remains 0.
+	 */
+	int needed_limit;
+};
+
 struct merge_options_internal {
 	/*
 	 * paths: primary data structure in all of merge ort.
@@ -113,6 +132,11 @@ struct merge_options_internal {
 	 */
 	struct strmap output;
 
+	/*
+	 * renames: various data relating to rename detection
+	 */
+	struct rename_info renames;
+
 	/*
 	 * current_dir_name: temporary var used in collect_merge_info_callback()
 	 *
-- 
gitgitgadget


* [PATCH v3 02/11] merge-ort: add initial outline for basic rename detection
From: Elijah Newren via GitGitGadget @ 2020-12-15 18:27 UTC
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 60 insertions(+), 8 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 1c1a7fa4bf1..8552f5e2318 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -644,20 +644,72 @@ static int handle_content_merge(struct merge_options *opt,
 
 /*** Function Grouping: functions related to regular rename detection ***/
 
+static int process_renames(struct merge_options *opt,
+			   struct diff_queue_struct *renames)
+{
+	die("Not yet implemented.");
+}
+
+static int compare_pairs(const void *a_, const void *b_)
+{
+	die("Not yet implemented.");
+}
+
+/* Call diffcore_rename() to compute which files have changed on given side */
+static void detect_regular_renames(struct merge_options *opt,
+				   struct tree *merge_base,
+				   struct tree *side,
+				   unsigned side_index)
+{
+	die("Not yet implemented.");
+}
+
+/*
+ * Get information of all renames which occurred in 'side_pairs', discarding
+ * non-renames.
+ */
+static int collect_renames(struct merge_options *opt,
+			   struct diff_queue_struct *result,
+			   unsigned side_index)
+{
+	die("Not yet implemented.");
+}
+
 static int detect_and_process_renames(struct merge_options *opt,
 				      struct tree *merge_base,
 				      struct tree *side1,
 				      struct tree *side2)
 {
-	int clean = 1;
+	struct diff_queue_struct combined;
+	struct rename_info *renames = &opt->priv->renames;
+	int s, clean = 1;
+
+	memset(&combined, 0, sizeof(combined));
+
+	detect_regular_renames(opt, merge_base, side1, MERGE_SIDE1);
+	detect_regular_renames(opt, merge_base, side2, MERGE_SIDE2);
+
+	ALLOC_GROW(combined.queue,
+		   renames->pairs[1].nr + renames->pairs[2].nr,
+		   combined.alloc);
+	clean &= collect_renames(opt, &combined, MERGE_SIDE1);
+	clean &= collect_renames(opt, &combined, MERGE_SIDE2);
+	QSORT(combined.queue, combined.nr, compare_pairs);
+
+	clean &= process_renames(opt, &combined);
+
+	/* Free memory for renames->pairs[] and combined */
+	for (s = MERGE_SIDE1; s <= MERGE_SIDE2; s++) {
+		free(renames->pairs[s].queue);
+		DIFF_QUEUE_CLEAR(&renames->pairs[s]);
+	}
+	if (combined.nr) {
+		int i;
+		for (i = 0; i < combined.nr; i++)
+			diff_free_filepair(combined.queue[i]);
+		free(combined.queue);
+	}
 
-	/*
-	 * Rename detection works by detecting file similarity.  Here we use
-	 * a really easy-to-implement scheme: files are similar IFF they have
-	 * the same filename.  Therefore, by this scheme, there are no renames.
-	 *
-	 * TODO: Actually implement a real rename detection scheme.
-	 */
 	return clean;
 }
 
-- 
gitgitgadget


* [PATCH v3 03/11] merge-ort: implement detect_regular_renames()
From: Elijah Newren via GitGitGadget @ 2020-12-15 18:27 UTC
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

This is based heavily on merge-recursive's get_diffpairs() function.  It
also includes the necessary paired call to diff_warn_rename_limit() so
that users will be warned if merge.renameLimit is not large enough for
rename detection to run.
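
The two bits of rename-limit bookkeeping in this patch can be sketched
in isolation as follows.  The helper names are hypothetical; they only
mirror the logic in detect_regular_renames(): a non-positive configured
limit falls back to 1000, and renames->needed_limit keeps the largest
limit that either side's diffcore run reported needing.

```c
#include <assert.h>

/* Hypothetical: same defaulting as detect_regular_renames(). */
static int toy_effective_rename_limit(int configured)
{
	return configured <= 0 ? 1000 : configured;
}

/*
 * Hypothetical: keep the maximum limit requested by any side, so a
 * single warning can be issued after both sides have been diffed.
 * A value of 0 means no side ran out of rename budget.
 */
static void toy_record_needed_limit(int *needed_limit, int side_needed)
{
	if (side_needed > *needed_limit)
		*needed_limit = side_needed;
}
```

Collecting a maximum rather than warning per side matches the patch's
choice of calling diff_warn_rename_limit() once, from
merge_switch_to_result().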

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 8552f5e2318..66f84d39b43 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -661,7 +661,33 @@ static void detect_regular_renames(struct merge_options *opt,
 				   struct tree *side,
 				   unsigned side_index)
 {
-	die("Not yet implemented.");
+	struct diff_options diff_opts;
+	struct rename_info *renames = &opt->priv->renames;
+
+	repo_diff_setup(opt->repo, &diff_opts);
+	diff_opts.flags.recursive = 1;
+	diff_opts.flags.rename_empty = 0;
+	diff_opts.detect_rename = DIFF_DETECT_RENAME;
+	diff_opts.rename_limit = opt->rename_limit;
+	if (opt->rename_limit <= 0)
+		diff_opts.rename_limit = 1000;
+	diff_opts.rename_score = opt->rename_score;
+	diff_opts.show_rename_progress = opt->show_rename_progress;
+	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
+	diff_setup_done(&diff_opts);
+	diff_tree_oid(&merge_base->object.oid, &side->object.oid, "",
+		      &diff_opts);
+	diffcore_std(&diff_opts);
+
+	if (diff_opts.needed_rename_limit > renames->needed_limit)
+		renames->needed_limit = diff_opts.needed_rename_limit;
+
+	renames->pairs[side_index] = diff_queued_diff;
+
+	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
+	diff_queued_diff.nr = 0;
+	diff_queued_diff.queue = NULL;
+	diff_flush(&diff_opts);
 }
 
 /*
@@ -1406,6 +1432,10 @@ void merge_switch_to_result(struct merge_options *opt,
 			printf("%s", sb->buf);
 		}
 		string_list_clear(&olist, 0);
+
+		/* Also include needed rename limit adjustment now */
+		diff_warn_rename_limit("merge.renamelimit",
+				       opti->renames.needed_limit, 0);
 	}
 
 	merge_finalize(opt, result);
-- 
gitgitgadget


* [PATCH v3 04/11] merge-ort: implement compare_pairs() and collect_renames()
From: Elijah Newren via GitGitGadget @ 2020-12-15 18:27 UTC
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>
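
A simplified sketch of what compare_pairs() and collect_renames() do
together: keep only status 'R' pairs, stash the side index in the
score field, then qsort() the combined array of pointers by source
path.  struct toy_pair and the toy_* functions are hypothetical
stand-ins for diff_filepair and the real helpers.  Note that because
the array holds pointers, each comparator argument is a pointer to a
pointer, exactly as in compare_pairs().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much-simplified stand-in for struct diff_filepair. */
struct toy_pair {
	const char *old_path; /* role of one->path */
	const char *new_path; /* role of two->path */
	char status;          /* 'R' for a rename, anything else otherwise */
	int score;            /* overwritten with the side index */
};

/* qsort() comparator over an array of struct toy_pair pointers. */
static int toy_compare_pairs(const void *a_, const void *b_)
{
	const struct toy_pair *a = *(const struct toy_pair *const *)a_;
	const struct toy_pair *b = *(const struct toy_pair *const *)b_;

	return strcmp(a->old_path, b->old_path);
}

/* Append the renames from one side to 'out', tagging each with its side. */
static void toy_collect_renames(struct toy_pair *side, int nr, int side_index,
				struct toy_pair **out, int *out_nr)
{
	for (int i = 0; i < nr; i++) {
		if (side[i].status != 'R')
			continue; /* the real code frees non-renames here */
		side[i].score = side_index;
		out[(*out_nr)++] = &side[i];
	}
}
```

After sorting, renames of the same source path from both sides end up
adjacent, which is what lets process_renames() detect rename/rename
cases by comparing entry i with entry i+1.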

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 66f84d39b43..10550c542b8 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -652,7 +652,10 @@ static int process_renames(struct merge_options *opt,
 
 static int compare_pairs(const void *a_, const void *b_)
 {
-	die("Not yet implemented.");
+	const struct diff_filepair *a = *((const struct diff_filepair **)a_);
+	const struct diff_filepair *b = *((const struct diff_filepair **)b_);
+
+	return strcmp(a->one->path, b->one->path);
 }
 
 /* Call diffcore_rename() to compute which files have changed on given side */
@@ -698,7 +701,35 @@ static int collect_renames(struct merge_options *opt,
 			   struct diff_queue_struct *result,
 			   unsigned side_index)
 {
-	die("Not yet implemented.");
+	int i, clean = 1;
+	struct diff_queue_struct *side_pairs;
+	struct rename_info *renames = &opt->priv->renames;
+
+	side_pairs = &renames->pairs[side_index];
+
+	for (i = 0; i < side_pairs->nr; ++i) {
+		struct diff_filepair *p = side_pairs->queue[i];
+
+		if (p->status != 'R') {
+			diff_free_filepair(p);
+			continue;
+		}
+
+		/*
+		 * p->score comes back from diffcore_rename_extended() with
+		 * the similarity of the renamed file.  The similarity
+		 * was used to determine that the two files were related
+		 * and are a rename, which we have already used, but beyond
+		 * that we have no use for the similarity.  So p->score is
+		 * now irrelevant.  However, process_renames() will need to
+		 * know which side of the merge this rename was associated
+		 * with, so overwrite p->score with that value.
+		 */
+		p->score = side_index;
+		result->queue[result->nr++] = p;
+	}
+
+	return clean;
 }
 
 static int detect_and_process_renames(struct merge_options *opt,
-- 
gitgitgadget


* [PATCH v3 05/11] merge-ort: add basic outline for process_renames()
From: Elijah Newren via GitGitGadget @ 2020-12-15 18:28 UTC
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Add code which determines which kind of special rename case each rename
corresponds to, but leave the handling of each type unimplemented for
now.  Future commits will implement each one.

There is some tenuous resemblance to merge-recursive's
process_renames(), but comparing the two is very unlikely to yield any
insights.  merge-ort's process_renames() is a bit complex and I would
prefer if I could simplify it more, but it is far easier to grok than
merge-recursive's function of the same name in my opinion.  Plus,
merge-ort handles more rename conflict types than merge-recursive does.
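
The core of the classification in process_renames() is a pair of
filemask tests, where bit 0 (1) is the merge base, bit 1 (2) is side 1
and bit 2 (4) is side 2.  The following is a hypothetical standalone
sketch of those two tests only; type_changed is omitted because it
also needs file modes.

```c
#include <assert.h>

struct toy_class {
	int source_deleted; /* rename/delete style cases */
	int collision;      /* rename/add or rename/rename(2to1) style cases */
};

/*
 * Hypothetical helper mirroring the filemask tests in process_renames().
 * target_index is the side (1 or 2) on which the rename happened.
 */
static struct toy_class toy_classify(int target_index,
				     unsigned old_filemask,
				     unsigned new_filemask)
{
	struct toy_class c;
	int other_source_index = 3 - target_index;
	unsigned old_sidemask = 1u << other_source_index; /* 2 or 4 */

	assert(target_index == 1 || target_index == 2);
	/* Old path exists only in the base: the other side deleted it. */
	c.source_deleted = (old_filemask == 1);
	/* New path also exists on the other side: something collides there. */
	c.collision = (new_filemask & old_sidemask) != 0;
	return c;
}
```

For example, if side 1 renames a to b and side 2 leaves a alone, a has
mask 1|4 = 5 and b has mask 2, so neither flag is set: a normal rename.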

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 97 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 10550c542b8..ebe275ef73c 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -647,7 +647,103 @@ static int handle_content_merge(struct merge_options *opt,
 static int process_renames(struct merge_options *opt,
 			   struct diff_queue_struct *renames)
 {
-	die("Not yet implemented.");
+	int clean_merge = 1, i;
+
+	for (i = 0; i < renames->nr; ++i) {
+		const char *oldpath = NULL, *newpath;
+		struct diff_filepair *pair = renames->queue[i];
+		struct conflict_info *oldinfo = NULL, *newinfo = NULL;
+		struct strmap_entry *old_ent, *new_ent;
+		unsigned int old_sidemask;
+		int target_index, other_source_index;
+		int source_deleted, collision, type_changed;
+
+		old_ent = strmap_get_entry(&opt->priv->paths, pair->one->path);
+		oldpath = old_ent->key;
+		oldinfo = old_ent->value;
+
+		new_ent = strmap_get_entry(&opt->priv->paths, pair->two->path);
+		newpath = new_ent->key;
+		newinfo = new_ent->value;
+
+		/*
+		 * diff_filepairs have copies of pathnames, thus we have to
+		 * use standard 'strcmp()' (negated) instead of '=='.
+		 */
+		if (i + 1 < renames->nr &&
+		    !strcmp(oldpath, renames->queue[i+1]->one->path)) {
+			/* Handle rename/rename(1to2) or rename/rename(1to1) */
+			const char *pathnames[3];
+
+			pathnames[0] = oldpath;
+			pathnames[1] = newpath;
+			pathnames[2] = renames->queue[i+1]->two->path;
+
+			if (!strcmp(pathnames[1], pathnames[2])) {
+				/* Both sides renamed the same way. */
+				die("Not yet implemented");
+
+				/* We handled both renames, i.e. i+1 handled */
+				i++;
+				/* Move to next rename */
+				continue;
+			}
+
+			/* This is a rename/rename(1to2) */
+			die("Not yet implemented");
+
+			i++; /* We handled both renames, i.e. i+1 handled */
+			continue;
+		}
+
+		VERIFY_CI(oldinfo);
+		VERIFY_CI(newinfo);
+		target_index = pair->score; /* from collect_renames() */
+		assert(target_index == 1 || target_index == 2);
+		other_source_index = 3 - target_index;
+		old_sidemask = (1 << other_source_index); /* 2 or 4 */
+		source_deleted = (oldinfo->filemask == 1);
+		collision = ((newinfo->filemask & old_sidemask) != 0);
+		type_changed = !source_deleted &&
+			(S_ISREG(oldinfo->stages[other_source_index].mode) !=
+			 S_ISREG(newinfo->stages[target_index].mode));
+		if (type_changed && collision) {
+			/* special handling so later blocks can handle this */
+			die("Not yet implemented");
+		}
+
+		assert(source_deleted || oldinfo->filemask & old_sidemask);
+
+		/* Need to check for special types of rename conflicts... */
+		if (collision && !source_deleted) {
+			/* collision: rename/add or rename/rename(2to1) */
+			die("Not yet implemented");
+		} else if (collision && source_deleted) {
+			/* rename/add/delete or rename/rename(2to1)/delete */
+			die("Not yet implemented");
+		} else {
+			/* a few different cases... */
+			if (type_changed) {
+				/* rename vs. typechange */
+				die("Not yet implemented");
+			} else if (source_deleted) {
+				/* rename/delete */
+				die("Not yet implemented");
+			} else {
+				/* normal rename */
+				die("Not yet implemented");
+			}
+		}
+
+		if (!type_changed) {
+			/* Mark the original as resolved by removal */
+			oldinfo->merged.is_null = 1;
+			oldinfo->merged.clean = 1;
+		}
+
+	}
+
+	return clean_merge;
 }
 
 static int compare_pairs(const void *a_, const void *b_)
-- 
gitgitgadget


* [PATCH v3 06/11] merge-ort: add implementation of both sides renaming identically
From: Elijah Newren via GitGitGadget @ 2020-12-15 18:28 UTC
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Implement rename/rename(1to1) handling, i.e. both sides of history
renaming a file and renaming it the same way.  This code replaces the
following from merge-recursive.c:

  * all the 1to1 code in process_renames()
  * the RENAME_ONE_FILE_TO_ONE case of process_entry()

Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):

  * handle_rename_normal()
  * setup_rename_conflict_info()

The consolidation of four separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally.  This means we're much less likely to miss special
implementation of some kind of combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.
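
To see why copying the base stage is enough for the 1to1 case, consider
the filemask of the shared new path, assuming it did not exist in the
merge base.  Before the fix-up it is present only on sides 1 and 2
(mask 6); after OR-ing in 1 << MERGE_BASE it looks like a path present
in all three versions (mask 7), so process_entry() can treat it like
any other three-way entry.  The toy_* names below are hypothetical;
the enum values mirror merge-ort's MERGE_BASE/SIDE1/SIDE2.

```c
#include <assert.h>

enum { TOY_MERGE_BASE = 0, TOY_MERGE_SIDE1 = 1, TOY_MERGE_SIDE2 = 2 };

/* Hypothetical: the filemask update done for rename/rename(1to1). */
static unsigned toy_filemask_after_1to1(unsigned target_filemask)
{
	return target_filemask | (1u << TOY_MERGE_BASE);
}

/* Present in base, side 1 and side 2. */
static int toy_in_all_three(unsigned filemask)
{
	return filemask == 7;
}
```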

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index ebe275ef73c..da3715baa63 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -674,14 +674,30 @@ static int process_renames(struct merge_options *opt,
 		    !strcmp(oldpath, renames->queue[i+1]->one->path)) {
 			/* Handle rename/rename(1to2) or rename/rename(1to1) */
 			const char *pathnames[3];
+			struct version_info merged;
+			struct conflict_info *base, *side1, *side2;
 
 			pathnames[0] = oldpath;
 			pathnames[1] = newpath;
 			pathnames[2] = renames->queue[i+1]->two->path;
 
+			base = strmap_get(&opt->priv->paths, pathnames[0]);
+			side1 = strmap_get(&opt->priv->paths, pathnames[1]);
+			side2 = strmap_get(&opt->priv->paths, pathnames[2]);
+
+			VERIFY_CI(base);
+			VERIFY_CI(side1);
+			VERIFY_CI(side2);
+
 			if (!strcmp(pathnames[1], pathnames[2])) {
-				/* Both sides renamed the same way. */
-				die("Not yet implemented");
+				/* Both sides renamed the same way */
+				assert(side1 == side2);
+				memcpy(&side1->stages[0], &base->stages[0],
+				       sizeof(merged));
+				side1->filemask |= (1 << MERGE_BASE);
+				/* Mark base as resolved by removal */
+				base->merged.is_null = 1;
+				base->merged.clean = 1;
 
 				/* We handled both renames, i.e. i+1 handled */
 				i++;
-- 
gitgitgadget


* [PATCH v3 07/11] merge-ort: add implementation of both sides renaming differently
From: Elijah Newren via GitGitGadget @ 2020-12-15 18:28 UTC
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Implement rename/rename(1to2) handling, i.e. both sides of history
renaming a file but renaming it differently.  This code replaces the
following from merge-recursive.c:

  * all the 1to2 code in process_renames()
  * the RENAME_ONE_FILE_TO_TWO case of process_entry()
  * handle_rename_rename_1to2()

Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):

  * handle_file_collision()
  * setup_rename_conflict_info()

The consolidation of five separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally.  This means we're much less likely to miss special
implementation of some kind of combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.

To be fair, there is a _slight_ tweak to process_entry() here to make
sure that the two different paths aren't marked as clean but are left in
a conflicted state.  So process_renames() and process_entry() aren't
quite entirely orthogonal, but they are pretty close.
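
The was_binary_blob special case in this patch can be sketched on its
own.  If the content merge was unclean and its result is exactly side
1's version, the code infers that handle_content_merge() gave up on a
binary file and simply kept side 1; in that case side 2's path keeps
its own contents instead of receiving a copy of side 1's.  struct
toy_version and toy_fixup_1to2() are hypothetical simplifications of
version_info and the inline logic.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct version_info. */
struct toy_version {
	unsigned mode;
	char oid[41];
};

static void toy_fixup_1to2(int clean, struct toy_version merged,
			   struct toy_version *side1, struct toy_version *side2)
{
	/* Unclean and identical to side 1: assume a binary merge that took side 1. */
	int was_binary_blob = !clean &&
		merged.mode == side1->mode &&
		strcmp(merged.oid, side1->oid) == 0;

	*side1 = merged;
	if (!was_binary_blob)
		*side2 = merged; /* both new paths get the merged contents */
	/* otherwise side 2 keeps its own blob */
}
```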

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index da3715baa63..19477cfae60 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -676,6 +676,7 @@ static int process_renames(struct merge_options *opt,
 			const char *pathnames[3];
 			struct version_info merged;
 			struct conflict_info *base, *side1, *side2;
+			unsigned was_binary_blob = 0;
 
 			pathnames[0] = oldpath;
 			pathnames[1] = newpath;
@@ -706,7 +707,58 @@ static int process_renames(struct merge_options *opt,
 			}
 
 			/* This is a rename/rename(1to2) */
-			die("Not yet implemented");
+			clean_merge = handle_content_merge(opt,
+							   pair->one->path,
+							   &base->stages[0],
+							   &side1->stages[1],
+							   &side2->stages[2],
+							   pathnames,
+							   1 + 2 * opt->priv->call_depth,
+							   &merged);
+			if (!clean_merge &&
+			    merged.mode == side1->stages[1].mode &&
+			    oideq(&merged.oid, &side1->stages[1].oid))
+				was_binary_blob = 1;
+			memcpy(&side1->stages[1], &merged, sizeof(merged));
+			if (was_binary_blob) {
+				/*
+				 * Getting here means we were attempting to
+				 * merge a binary blob.
+				 *
+				 * Since we can't merge binaries,
+				 * handle_content_merge() just takes one
+				 * side.  But we don't want to copy the
+				 * contents of one side to both paths.  We
+				 * used the contents of side1 above for
+				 * side1->stages, let's use the contents of
+				 * side2 for side2->stages below.
+				 */
+				oidcpy(&merged.oid, &side2->stages[2].oid);
+				merged.mode = side2->stages[2].mode;
+			}
+			memcpy(&side2->stages[2], &merged, sizeof(merged));
+
+			side1->path_conflict = 1;
+			side2->path_conflict = 1;
+			/*
+			 * TODO: For renames we normally remove the path at the
+			 * old name.  It would thus seem consistent to do the
+			 * same for rename/rename(1to2) cases, but we haven't
+			 * done so traditionally and a number of the regression
+			 * tests now encode an expectation that the file is
+			 * left there at stage 1.  If we ever decide to change
+			 * this, add the following two lines here:
+			 *    base->merged.is_null = 1;
+			 *    base->merged.clean = 1;
+			 * and remove the setting of base->path_conflict to 1.
+			 */
+			base->path_conflict = 1;
+			path_msg(opt, oldpath, 0,
+				 _("CONFLICT (rename/rename): %s renamed to "
+				   "%s in %s and to %s in %s."),
+				 pathnames[0],
+				 pathnames[1], opt->branch1,
+				 pathnames[2], opt->branch2);
 
 			i++; /* We handled both renames, i.e. i+1 handled */
 			continue;
@@ -1291,13 +1343,13 @@ static void process_entry(struct merge_options *opt,
 		int side = (ci->filemask == 4) ? 2 : 1;
 		ci->merged.result.mode = ci->stages[side].mode;
 		oidcpy(&ci->merged.result.oid, &ci->stages[side].oid);
-		ci->merged.clean = !ci->df_conflict;
+		ci->merged.clean = !ci->df_conflict && !ci->path_conflict;
 	} else if (ci->filemask == 1) {
 		/* Deleted on both sides */
 		ci->merged.is_null = 1;
 		ci->merged.result.mode = 0;
 		oidcpy(&ci->merged.result.oid, &null_oid);
-		ci->merged.clean = 1;
+		ci->merged.clean = !ci->path_conflict;
 	}
 
 	/*
-- 
gitgitgadget


* [PATCH v3 08/11] merge-ort: add implementation of rename/delete conflicts
From: Elijah Newren via GitGitGadget @ 2020-12-15 18:28 UTC
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Implement rename/delete conflicts, i.e. one side renames a file and the
other deletes the file.  This code replaces the following from
merge-recursive.c:

  * the code relevant to RENAME_DELETE in process_renames()
  * the RENAME_DELETE case of process_entry()
  * handle_rename_delete()

Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):

  * handle_change_delete()
  * setup_rename_conflict_info()

The consolidation of five separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally.  This means we're much less likely to miss special
implementation of some kind of combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.

To be fair, there is a _slight_ tweak to process_entry() here, because
rename/delete cases will also trigger the modify/delete codepath.
However, we only want a modify/delete message to be printed for a
rename/delete conflict if there is a content change in the renamed file
in addition to the rename.  So process_renames() and process_entry()
aren't quite fully orthogonal, but they are pretty close.
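
Both decisions added here reduce to small pure functions, sketched
below with hypothetical names: choosing which branch did the rename
and which did the delete from the side the rename happened on, and
printing a modify/delete notice only if the renamed file's blob
differs from the base's (plain strings stand in for object ids).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: target_index is the side (1 or 2) that renamed the file. */
static void toy_rename_delete_branches(int target_index,
				       const char *branch1, const char *branch2,
				       const char **rename_branch,
				       const char **delete_branch)
{
	*rename_branch = (target_index == 1) ? branch1 : branch2;
	*delete_branch = (target_index == 1) ? branch2 : branch1;
}

/*
 * Hypothetical: suppress "modify/delete" for a pure rename/delete, i.e.
 * when the path came from a rename and its contents match the base.
 */
static int toy_want_modify_delete_msg(int path_conflict,
				      const char *base_oid,
				      const char *side_oid)
{
	return !(path_conflict && strcmp(base_oid, side_oid) == 0);
}
```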

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 19477cfae60..a10c3f5046f 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -657,6 +657,7 @@ static int process_renames(struct merge_options *opt,
 		unsigned int old_sidemask;
 		int target_index, other_source_index;
 		int source_deleted, collision, type_changed;
+		const char *rename_branch = NULL, *delete_branch = NULL;
 
 		old_ent = strmap_get_entry(&opt->priv->paths, pair->one->path);
 		oldpath = old_ent->key;
@@ -779,6 +780,15 @@ static int process_renames(struct merge_options *opt,
 			/* special handling so later blocks can handle this */
 			die("Not yet implemented");
 		}
+		if (source_deleted) {
+			if (target_index == 1) {
+				rename_branch = opt->branch1;
+				delete_branch = opt->branch2;
+			} else {
+				rename_branch = opt->branch2;
+				delete_branch = opt->branch1;
+			}
+		}
 
 		assert(source_deleted || oldinfo->filemask & old_sidemask);
 
@@ -790,13 +800,26 @@ static int process_renames(struct merge_options *opt,
 			/* rename/add/delete or rename/rename(2to1)/delete */
 			die("Not yet implemented");
 		} else {
-			/* a few different cases... */
+			/*
+			 * a few different cases...start by copying the
+			 * existing stage(s) from oldinfo over the newinfo
+			 * and update the pathname(s).
+			 */
+			memcpy(&newinfo->stages[0], &oldinfo->stages[0],
+			       sizeof(newinfo->stages[0]));
+			newinfo->filemask |= (1 << MERGE_BASE);
+			newinfo->pathnames[0] = oldpath;
 			if (type_changed) {
 				/* rename vs. typechange */
 				die("Not yet implemented");
 			} else if (source_deleted) {
 				/* rename/delete */
-				die("Not yet implemented");
+				newinfo->path_conflict = 1;
+				path_msg(opt, newpath, 0,
+					 _("CONFLICT (rename/delete): %s renamed"
+					   " to %s in %s, but deleted in %s."),
+					 oldpath, newpath,
+					 rename_branch, delete_branch);
 			} else {
 				/* normal rename */
 				die("Not yet implemented");
@@ -1332,12 +1355,21 @@ static void process_entry(struct merge_options *opt,
 		modify_branch = (side == 1) ? opt->branch1 : opt->branch2;
 		delete_branch = (side == 1) ? opt->branch2 : opt->branch1;
 
-		path_msg(opt, path, 0,
-			 _("CONFLICT (modify/delete): %s deleted in %s "
-			   "and modified in %s.  Version %s of %s left "
-			   "in tree."),
-			 path, delete_branch, modify_branch,
-			 modify_branch, path);
+		if (ci->path_conflict &&
+		    oideq(&ci->stages[0].oid, &ci->stages[side].oid)) {
+			/*
+			 * This came from a rename/delete; no action to take,
+			 * but avoid printing "modify/delete" conflict notice
+			 * since the contents were not modified.
+			 */
+		} else {
+			path_msg(opt, path, 0,
+				 _("CONFLICT (modify/delete): %s deleted in %s "
+				   "and modified in %s.  Version %s of %s left "
+				   "in tree."),
+				 path, delete_branch, modify_branch,
+				 modify_branch, path);
+		}
 	} else if (ci->filemask == 2 || ci->filemask == 4) {
 		/* Added on one side */
 		int side = (ci->filemask == 4) ? 2 : 1;
-- 
gitgitgadget


* [PATCH v3 09/11] merge-ort: add implementation of rename collisions
From: Elijah Newren via GitGitGadget @ 2020-12-15 18:28 UTC
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Implement rename/rename(2to1) and rename/add handling, i.e. a file is
renamed into a location where another file is added (with that other
file either being a plain add or itself coming from a rename).  Note
that rename collisions can also have a special case stacked on top: the
file being renamed on one side of history is deleted on the other
(yielding either a rename/add/delete conflict or perhaps a
rename/rename(2to1)/delete[/delete] conflict).

One thing to note here is that when there is a double rename, the code
in question only handles one of them at a time; a later iteration
through the loop will handle the other.  After they've both been
handled, process_entry()'s normal add/add code can handle the collision.

This code replaces the following from merge-recursive.c:

  * all the 2to1 code in process_renames()
  * the RENAME_TWO_FILES_TO_ONE case of process_entry()
  * handle_rename_rename_2to1()
  * handle_rename_add()

Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):

  * handle_file_collision()
  * setup_rename_conflict_info()

The consolidation of six separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally.  This means we're much less likely to miss the
special handling needed for some combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 51 insertions(+), 3 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index a10c3f5046f..1c5b2f7e3b9 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -795,10 +795,58 @@ static int process_renames(struct merge_options *opt,
 		/* Need to check for special types of rename conflicts... */
 		if (collision && !source_deleted) {
 			/* collision: rename/add or rename/rename(2to1) */
-			die("Not yet implemented");
+			const char *pathnames[3];
+			struct version_info merged;
+
+			struct conflict_info *base, *side1, *side2;
+			unsigned clean;
+
+			pathnames[0] = oldpath;
+			pathnames[other_source_index] = oldpath;
+			pathnames[target_index] = newpath;
+
+			base = strmap_get(&opt->priv->paths, pathnames[0]);
+			side1 = strmap_get(&opt->priv->paths, pathnames[1]);
+			side2 = strmap_get(&opt->priv->paths, pathnames[2]);
+
+			VERIFY_CI(base);
+			VERIFY_CI(side1);
+			VERIFY_CI(side2);
+
+			clean = handle_content_merge(opt, pair->one->path,
+						     &base->stages[0],
+						     &side1->stages[1],
+						     &side2->stages[2],
+						     pathnames,
+						     1 + 2 * opt->priv->call_depth,
+						     &merged);
+
+			memcpy(&newinfo->stages[target_index], &merged,
+			       sizeof(merged));
+			if (!clean) {
+				path_msg(opt, newpath, 0,
+					 _("CONFLICT (rename involved in "
+					   "collision): rename of %s -> %s has "
+					   "content conflicts AND collides "
+					   "with another path; this may result "
+					   "in nested conflict markers."),
+					 oldpath, newpath);
+			}
 		} else if (collision && source_deleted) {
-			/* rename/add/delete or rename/rename(2to1)/delete */
-			die("Not yet implemented");
+			/*
+			 * rename/add/delete or rename/rename(2to1)/delete:
+			 * since oldpath was deleted on the side that didn't
+			 * do the rename, there's not much of a content merge
+			 * we can do for the rename.  oldinfo->merged.is_null
+			 * was already set, so we just leave things as-is so
+			 * they look like an add/add conflict.
+			 */
+
+			newinfo->path_conflict = 1;
+			path_msg(opt, newpath, 0,
+				 _("CONFLICT (rename/delete): %s renamed "
+				   "to %s in %s, but deleted in %s."),
+				 oldpath, newpath, rename_branch, delete_branch);
 		} else {
 			/*
 			 * a few different cases...start by copying the
-- 
gitgitgadget



* [PATCH v3 10/11] merge-ort: add implementation of normal rename handling
  2020-12-15 18:27   ` [PATCH v3 " Elijah Newren via GitGitGadget
                       ` (8 preceding siblings ...)
  2020-12-15 18:28     ` [PATCH v3 09/11] merge-ort: add implementation of rename collisions Elijah Newren via GitGitGadget
@ 2020-12-15 18:28     ` Elijah Newren via GitGitGadget
  2020-12-15 18:28     ` [PATCH v3 11/11] merge-ort: add implementation of type-changed " Elijah Newren via GitGitGadget
  10 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-15 18:28 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Implement handling of normal renames.  This code replaces the following
from merge-recursive.c:

  * the code relevant to RENAME_NORMAL in process_renames()
  * the RENAME_NORMAL case of process_entry()

Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):

  * handle_rename_normal()
  * setup_rename_conflict_info()

The consolidation of four separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally.  This means we're much less likely to miss the
special handling needed for some combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.

(To be fair, the code for handling normal renames wasn't all that
complicated beforehand, but it's still much simpler now.)

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/merge-ort.c b/merge-ort.c
index 1c5b2f7e3b9..26f357e524f 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -870,7 +870,11 @@ static int process_renames(struct merge_options *opt,
 					 rename_branch, delete_branch);
 			} else {
 				/* normal rename */
-				die("Not yet implemented");
+				memcpy(&newinfo->stages[other_source_index],
+				       &oldinfo->stages[other_source_index],
+				       sizeof(newinfo->stages[0]));
+				newinfo->filemask |= (1 << other_source_index);
+				newinfo->pathnames[other_source_index] = oldpath;
 			}
 		}
 
-- 
gitgitgadget



* [PATCH v3 11/11] merge-ort: add implementation of type-changed rename handling
  2020-12-15 18:27   ` [PATCH v3 " Elijah Newren via GitGitGadget
                       ` (9 preceding siblings ...)
  2020-12-15 18:28     ` [PATCH v3 10/11] merge-ort: add implementation of normal rename handling Elijah Newren via GitGitGadget
@ 2020-12-15 18:28     ` Elijah Newren via GitGitGadget
  10 siblings, 0 replies; 65+ messages in thread
From: Elijah Newren via GitGitGadget @ 2020-12-15 18:28 UTC (permalink / raw)
  To: git
  Cc: Derrick Stolee, Elijah Newren, Johannes Schindelin, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Implement cases where renames are involved in type changes (i.e. the
side of history that didn't rename the file changed its type from a
regular file to a symlink or submodule).  There was some code to handle
this in merge-recursive but only in the special case when the renamed
file had no content changes.  The code here works differently -- it
knows process_entry() can handle mode conflicts, so it makes a few
minimal tweaks to ensure process_entry() can just finish the job as
needed.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/merge-ort.c b/merge-ort.c
index 26f357e524f..677c6a878c5 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -777,8 +777,33 @@ static int process_renames(struct merge_options *opt,
 			(S_ISREG(oldinfo->stages[other_source_index].mode) !=
 			 S_ISREG(newinfo->stages[target_index].mode));
 		if (type_changed && collision) {
-			/* special handling so later blocks can handle this */
-			die("Not yet implemented");
+			/*
+			 * special handling so later blocks can handle this...
+			 *
+			 * if type_changed && collision are both true, then this
+			 * was really a double rename, but one side wasn't
+			 * detected due to lack of break detection.  I.e.
+			 * something like
+			 *    orig: has normal file 'foo'
+			 *    side1: renames 'foo' to 'bar', adds 'foo' symlink
+			 *    side2: renames 'foo' to 'bar'
+			 * In this case, the foo->bar rename on side1 won't be
+			 * detected because the new symlink named 'foo' is
+			 * there and we don't do break detection.  But we detect
+			 * this here because we don't want to merge the content
+			 * of the foo symlink with the foo->bar file, so we
+			 * have some logic to handle this special case.  The
+			 * easiest way to do that is make 'bar' on side1 not
+			 * be considered a colliding file but the other part
+			 * of a normal rename.  If the file is very different,
+			 * well we're going to get content merge conflicts
+			 * anyway so it doesn't hurt.  And if the colliding
+			 * file also has a different type, that'll be handled
+			 * by the content merge logic in process_entry() too.
+			 *
+			 * See also t6430, 'rename vs. rename/symlink'
+			 */
+			collision = 0;
 		}
 		if (source_deleted) {
 			if (target_index == 1) {
@@ -859,7 +884,11 @@ static int process_renames(struct merge_options *opt,
 			newinfo->pathnames[0] = oldpath;
 			if (type_changed) {
 				/* rename vs. typechange */
-				die("Not yet implemented");
+				/* Mark the original as resolved by removal */
+				memcpy(&oldinfo->stages[0].oid, &null_oid,
+				       sizeof(oldinfo->stages[0].oid));
+				oldinfo->stages[0].mode = 0;
+				oldinfo->filemask &= 0x06;
 			} else if (source_deleted) {
 				/* rename/delete */
 				newinfo->path_conflict = 1;
-- 
gitgitgadget


* Re: [PATCH v2 00/11] merge-ort: add basic rename detection
  2020-12-15 14:34   ` [PATCH v2 00/11] merge-ort: add basic rename detection Derrick Stolee
@ 2020-12-15 22:09     ` Junio C Hamano
  0 siblings, 0 replies; 65+ messages in thread
From: Junio C Hamano @ 2020-12-15 22:09 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, git, Elijah Newren,
	Johannes Schindelin

Derrick Stolee <stolee@gmail.com> writes:

> On 12/14/2020 11:21 AM, Elijah Newren via GitGitGadget wrote:
>> This series builds on en/merge-ort-2 and adds basic rename detection to
>> merge-ort.
>
> I have now finished a full pass through this series. I find it to be
> well organized with a satisfying conclusion. My comments are mostly
> about nits. I tried to recommend some better organization, but failed
> to find a clear way to do so, which shows that the organization here
> is sound.
>
> My only complaint is that some of the patches break compilation, so
> that should definitely be fixed to assist with bisects in the future
> (likely for bisects unrelated to this feature).

Thanks.


end of thread, other threads:[~2020-12-15 22:12 UTC | newest]

Thread overview: 65+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-09 19:41 [PATCH 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
2020-12-09 19:41 ` [PATCH 01/11] merge-ort: add basic data structures for handling renames Elijah Newren via GitGitGadget
2020-12-11  2:03   ` Derrick Stolee
2020-12-11  9:41     ` Elijah Newren
2020-12-09 19:41 ` [PATCH 02/11] merge-ort: add initial outline for basic rename detection Elijah Newren via GitGitGadget
2020-12-11  2:39   ` Derrick Stolee
2020-12-11  9:40     ` Elijah Newren
2020-12-13  7:47     ` Elijah Newren
2020-12-14 14:33       ` Derrick Stolee
2020-12-14 15:42         ` Johannes Schindelin
2020-12-14 16:11           ` Elijah Newren
2020-12-14 16:50             ` Johannes Schindelin
2020-12-14 17:35         ` Elijah Newren
2020-12-09 19:41 ` [PATCH 03/11] merge-ort: implement detect_regular_renames() Elijah Newren via GitGitGadget
2020-12-11  2:54   ` Derrick Stolee
2020-12-11 17:38     ` Elijah Newren
2020-12-09 19:41 ` [PATCH 04/11] merge-ort: implement compare_pairs() and collect_renames() Elijah Newren via GitGitGadget
2020-12-11  3:00   ` Derrick Stolee
2020-12-11 18:43     ` Elijah Newren
2020-12-09 19:41 ` [PATCH 05/11] merge-ort: add basic outline for process_renames() Elijah Newren via GitGitGadget
2020-12-11  3:24   ` Derrick Stolee
2020-12-11 20:03     ` Elijah Newren
2020-12-09 19:41 ` [PATCH 06/11] merge-ort: add implementation of both sides renaming identically Elijah Newren via GitGitGadget
2020-12-11  3:32   ` Derrick Stolee
2020-12-09 19:41 ` [PATCH 07/11] merge-ort: add implementation of both sides renaming differently Elijah Newren via GitGitGadget
2020-12-11  3:39   ` Derrick Stolee
2020-12-11 21:56     ` Elijah Newren
2020-12-09 19:41 ` [PATCH 08/11] merge-ort: add implementation of rename collisions Elijah Newren via GitGitGadget
2020-12-09 19:41 ` [PATCH 09/11] merge-ort: add implementation of rename/delete conflicts Elijah Newren via GitGitGadget
2020-12-09 19:41 ` [PATCH 10/11] merge-ort: add implementation of normal rename handling Elijah Newren via GitGitGadget
2020-12-09 19:41 ` [PATCH 11/11] merge-ort: add implementation of type-changed " Elijah Newren via GitGitGadget
2020-12-14 16:21 ` [PATCH v2 00/11] merge-ort: add basic rename detection Elijah Newren via GitGitGadget
2020-12-14 16:21   ` [PATCH v2 01/11] merge-ort: add basic data structures for handling renames Elijah Newren via GitGitGadget
2020-12-14 16:21   ` [PATCH v2 02/11] merge-ort: add initial outline for basic rename detection Elijah Newren via GitGitGadget
2020-12-14 16:21   ` [PATCH v2 03/11] merge-ort: implement detect_regular_renames() Elijah Newren via GitGitGadget
2020-12-14 16:21   ` [PATCH v2 04/11] merge-ort: implement compare_pairs() and collect_renames() Elijah Newren via GitGitGadget
2020-12-14 16:21   ` [PATCH v2 05/11] merge-ort: add basic outline for process_renames() Elijah Newren via GitGitGadget
2020-12-14 16:21   ` [PATCH v2 06/11] merge-ort: add implementation of both sides renaming identically Elijah Newren via GitGitGadget
2020-12-14 16:21   ` [PATCH v2 07/11] merge-ort: add implementation of both sides renaming differently Elijah Newren via GitGitGadget
2020-12-14 16:21   ` [PATCH v2 08/11] merge-ort: add implementation of rename collisions Elijah Newren via GitGitGadget
2020-12-15 14:09     ` Derrick Stolee
2020-12-15 16:56       ` Elijah Newren
2020-12-14 16:21   ` [PATCH v2 09/11] merge-ort: add implementation of rename/delete conflicts Elijah Newren via GitGitGadget
2020-12-15 14:23     ` Derrick Stolee
2020-12-15 17:07       ` Elijah Newren
2020-12-15 14:27     ` Derrick Stolee
2020-12-14 16:21   ` [PATCH v2 10/11] merge-ort: add implementation of normal rename handling Elijah Newren via GitGitGadget
2020-12-15 14:27     ` Derrick Stolee
2020-12-14 16:21   ` [PATCH v2 11/11] merge-ort: add implementation of type-changed " Elijah Newren via GitGitGadget
2020-12-15 14:31     ` Derrick Stolee
2020-12-15 17:11       ` Elijah Newren
2020-12-15 14:34   ` [PATCH v2 00/11] merge-ort: add basic rename detection Derrick Stolee
2020-12-15 22:09     ` Junio C Hamano
2020-12-15 18:27   ` [PATCH v3 " Elijah Newren via GitGitGadget
2020-12-15 18:27     ` [PATCH v3 01/11] merge-ort: add basic data structures for handling renames Elijah Newren via GitGitGadget
2020-12-15 18:27     ` [PATCH v3 02/11] merge-ort: add initial outline for basic rename detection Elijah Newren via GitGitGadget
2020-12-15 18:27     ` [PATCH v3 03/11] merge-ort: implement detect_regular_renames() Elijah Newren via GitGitGadget
2020-12-15 18:27     ` [PATCH v3 04/11] merge-ort: implement compare_pairs() and collect_renames() Elijah Newren via GitGitGadget
2020-12-15 18:28     ` [PATCH v3 05/11] merge-ort: add basic outline for process_renames() Elijah Newren via GitGitGadget
2020-12-15 18:28     ` [PATCH v3 06/11] merge-ort: add implementation of both sides renaming identically Elijah Newren via GitGitGadget
2020-12-15 18:28     ` [PATCH v3 07/11] merge-ort: add implementation of both sides renaming differently Elijah Newren via GitGitGadget
2020-12-15 18:28     ` [PATCH v3 08/11] merge-ort: add implementation of rename/delete conflicts Elijah Newren via GitGitGadget
2020-12-15 18:28     ` [PATCH v3 09/11] merge-ort: add implementation of rename collisions Elijah Newren via GitGitGadget
2020-12-15 18:28     ` [PATCH v3 10/11] merge-ort: add implementation of normal rename handling Elijah Newren via GitGitGadget
2020-12-15 18:28     ` [PATCH v3 11/11] merge-ort: add implementation of type-changed " Elijah Newren via GitGitGadget

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).