git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC PATCH 0/9] Improve rename detection performance in merge recursive
@ 2017-11-10 22:21 Elijah Newren
  2017-11-10 22:21 ` [RFC PATCH 1/9] diffcore-rename: No point trying to find a match better than exact Elijah Newren
                   ` (8 more replies)
  0 siblings, 9 replies; 12+ messages in thread
From: Elijah Newren @ 2017-11-10 22:21 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

This series depends on BOTH my rename-limits and directory-detection
series
(https://public-inbox.org/git/20171110173956.25105-1-newren@gmail.com/
and
https://public-inbox.org/git/20171110190550.27059-1-newren@gmail.com/).
This series could be modified to be submitted with no dependencies on
those series, but it'd end up causing lots of merging conflicts, and
having this series depend on the other two seemed most logical to me.

This patch series tries to improve performance rename detection
performance in merge recursive where possible.  In particular, I was
guided by a specific usecase of cherry-picking a small change (by
which I mean 7 files with small modifications and one new file) in a
repo that has thousands of renames, due to some high-level directories
having been renamed.  Some of the changes should help rename detection
performance in general, but the greatest benefit will be found when
one side of history only makes a small number of changes.  For my
usecase, I dropped the time needed for the cherry-pick from over 9
minutes, down to about 16 seconds.

RFC because:

  * I believe the patch with the biggest performance improvement will
    break directory rename detection in specific, limited cases (which
    are not yet represented in the testsuite).  Should be fixable; I
    just need to implement the fix.  (The fix may reduce the
    performance improvement slightly).
    
  * This series includes changes to conflict handling for conflict
    types that involve two colliding files.  I think the new behavior
    is strictly better, but this is the kind of thing I really need to
    make sure others agree with; comments very welcome.  (Patches 2--6)

  * 16 seconds is still annoyingly slow; we should be able to do better.
    I have one or two ideas.  But since others are asking about the
    performance of small cherry-picks in large repos with lots of renames,
    I figure it's worth posting what I have so far.

Elijah Newren (9):
  diffcore-rename: No point trying to find a match better than exact
  merge-recursive: Avoid unnecessary string list lookups
  merge-recursive: New function for better colliding conflict
    resolutions
  Add testcases for improved file collision conflict handling
  merge-recursive: Fix rename/add conflict handling
  merge-recursive: Improve handling for rename/rename(2to1) conflicts
  merge-recursive: Improve handling for add/add conflicts
  merge-recursive: Accelerate rename detection
  diffcore-rename: Filter rename_src list when possible

 diff.c                               |   1 +
 diff.h                               |   7 +
 diffcore-rename.c                    |  85 ++++++-
 merge-recursive.c                    | 452 ++++++++++++++++++++++++-----------
 t/t2023-checkout-m.sh                |   2 +-
 t/t3418-rebase-continue.sh           |  27 ++-
 t/t3504-cherry-pick-rerere.sh        |  19 +-
 t/t4200-rerere.sh                    |  12 +-
 t/t6020-merge-df.sh                  |   4 +-
 t/t6024-recursive-merge.sh           |  35 +--
 t/t6025-merge-symlinks.sh            |   9 +-
 t/t6031-merge-filemode.sh            |   4 +-
 t/t6036-recursive-corner-cases.sh    |  19 +-
 t/t6042-merge-rename-corner-cases.sh | 212 +++++++++++++++-
 t/t6043-merge-rename-directories.sh  |  13 +-
 t/t7060-wtstatus.sh                  |   1 +
 t/t7064-wtstatus-pv2.sh              |   4 +-
 t/t7506-status-submodule.sh          |  11 +-
 t/t7610-mergetool.sh                 |  28 +--
 19 files changed, 722 insertions(+), 223 deletions(-)

-- 
2.15.0.46.g41dca04efb


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC PATCH 1/9] diffcore-rename: No point trying to find a match better than exact
  2017-11-10 22:21 [RFC PATCH 0/9] Improve rename detection performance in merge recursive Elijah Newren
@ 2017-11-10 22:21 ` Elijah Newren
  2017-11-10 22:21 ` [RFC PATCH 2/9] merge-recursive: Avoid unnecessary string list lookups Elijah Newren
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Elijah Newren @ 2017-11-10 22:21 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

diffcore_rename() had some code to avoid having destination paths that
already had an exact rename detected from being re-checked for other
renames.  Source paths, however, were re-checked because we wanted to
allow the possibility of detecting copies.  But if copy detection isn't
turned on, then this merely amounts to attempting to find a
better-than-exact match, which naturally ends up being an expensive
no-op.  In particular, copy detection is never turned on by the merge
recursive machinery.

In a large repository (~50k files, about 60% of which was java) that had
a number of high level directories involved in renames, this cut the time
necessary for a cherry-pick down by about 50% (from around 9 minutes to
4.5 minutes).

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 6ba6157c61..c0517058b0 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -377,11 +377,10 @@ static void record_if_better(struct diff_score m[], struct diff_score *o)
  * 1 if we need to disable inexact rename detection;
  * 2 if we would be under the limit if we were given -C instead of -C -C.
  */
-static int too_many_rename_candidates(int num_create,
+static int too_many_rename_candidates(int num_create, int num_src,
 				      struct diff_options *options)
 {
 	int rename_limit = options->rename_limit;
-	int num_src = rename_src_nr;
 	int i;
 
 	options->needed_rename_limit = 0;
@@ -446,7 +445,7 @@ void diffcore_rename(struct diff_options *options)
 	struct diff_queue_struct outq;
 	struct diff_score *mx;
 	int i, j, rename_count, skip_unmodified = 0;
-	int num_create, dst_cnt;
+	int num_create, dst_cnt, num_src;
 	struct progress *progress = NULL;
 
 	if (!minimum_score)
@@ -512,12 +511,14 @@ void diffcore_rename(struct diff_options *options)
 	 * files still remain as options for rename/copies!)
 	 */
 	num_create = (rename_dst_nr - rename_count);
+	num_src = (detect_rename == DIFF_DETECT_COPY ?
+		   rename_src_nr : rename_src_nr - rename_count);
 
 	/* All done? */
 	if (!num_create)
 		goto cleanup;
 
-	switch (too_many_rename_candidates(num_create, options)) {
+	switch (too_many_rename_candidates(num_create, num_src, options)) {
 	case 1:
 		goto cleanup;
 	case 2:
@@ -531,7 +532,7 @@ void diffcore_rename(struct diff_options *options)
 	if (options->show_rename_progress) {
 		progress = start_delayed_progress(
 				_("Performing inexact rename detection"),
-				(uint64_t)rename_dst_nr * (uint64_t)rename_src_nr);
+				(uint64_t)num_create * (uint64_t)num_src);
 	}
 
 	mx = xcalloc(st_mult(NUM_CANDIDATE_PER_DST, num_create), sizeof(*mx));
@@ -550,6 +551,10 @@ void diffcore_rename(struct diff_options *options)
 			struct diff_filespec *one = rename_src[j].p->one;
 			struct diff_score this_src;
 
+			if (one->rename_used &&
+			    detect_rename != DIFF_DETECT_COPY)
+				continue;
+
 			if (skip_unmodified &&
 			    diff_unmodified_pair(rename_src[j].p))
 				continue;
@@ -568,7 +573,7 @@ void diffcore_rename(struct diff_options *options)
 			diff_free_filespec_blob(two);
 		}
 		dst_cnt++;
-		display_progress(progress, (uint64_t)(i+1)*(uint64_t)rename_src_nr);
+		display_progress(progress, (uint64_t)dst_cnt*(uint64_t)num_src);
 	}
 	stop_progress(&progress);
 
-- 
2.15.0.46.g41dca04efb


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 2/9] merge-recursive: Avoid unnecessary string list lookups
  2017-11-10 22:21 [RFC PATCH 0/9] Improve rename detection performance in merge recursive Elijah Newren
  2017-11-10 22:21 ` [RFC PATCH 1/9] diffcore-rename: No point trying to find a match better than exact Elijah Newren
@ 2017-11-10 22:21 ` Elijah Newren
  2017-11-10 22:21 ` [RFC PATCH 3/9] merge-recursive: New function for better colliding conflict resolutions Elijah Newren
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Elijah Newren @ 2017-11-10 22:21 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

Since we're taking entries from active_cache, which is already in sorted
order with same-named entries adjacent, we can skip a lookup.  Also, we can
just use append instead of insert (avoiding the need to find where to put
the new item) and still end up with a sorted list.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
Assumed negligible performance change; I didn't even bother measuring it.
But I just happened to be looking around at this code while trying to figure
out some of the performance and figured it was at least speeding it up a
tiny bit.

 merge-recursive.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 6ef1d52f0a..d54466649e 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -451,22 +451,28 @@ static struct stage_data *insert_stage_data(const char *path,
 static struct string_list *get_unmerged(void)
 {
 	struct string_list *unmerged = xcalloc(1, sizeof(struct string_list));
+	struct string_list_item *item;
+	const char *last = NULL;
 	int i;
 
 	unmerged->strdup_strings = 1;
 
 	for (i = 0; i < active_nr; i++) {
-		struct string_list_item *item;
 		struct stage_data *e;
 		const struct cache_entry *ce = active_cache[i];
 		if (!ce_stage(ce))
 			continue;
 
-		item = string_list_lookup(unmerged, ce->name);
-		if (!item) {
-			item = string_list_insert(unmerged, ce->name);
+		if (last == NULL || strcmp(last, ce->name)) {
+			/*
+			 * active_cache is in sorted order, so we can just call
+			 * string_list_append instead of string_list_insert and
+			 * still end up with a sorted list.
+			 */
+			item = string_list_append(unmerged, ce->name);
 			item->util = xcalloc(1, sizeof(struct stage_data));
 		}
+		last = ce->name;
 		e = item->util;
 		e->stages[ce_stage(ce)].mode = ce->ce_mode;
 		oidcpy(&e->stages[ce_stage(ce)].oid, &ce->oid);
-- 
2.15.0.46.g41dca04efb


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 3/9] merge-recursive: New function for better colliding conflict resolutions
  2017-11-10 22:21 [RFC PATCH 0/9] Improve rename detection performance in merge recursive Elijah Newren
  2017-11-10 22:21 ` [RFC PATCH 1/9] diffcore-rename: No point trying to find a match better than exact Elijah Newren
  2017-11-10 22:21 ` [RFC PATCH 2/9] merge-recursive: Avoid unnecessary string list lookups Elijah Newren
@ 2017-11-10 22:21 ` Elijah Newren
  2017-11-11 16:49   ` Elijah Newren
  2017-11-10 22:21 ` [RFC PATCH 4/9] Add testcases for improved file collision conflict handling Elijah Newren
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 12+ messages in thread
From: Elijah Newren @ 2017-11-10 22:21 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

There are three conflict types that represent two (possibly entirely
unrelated) files colliding at the same location:
  * add/add
  * rename/add
  * rename/rename(2to1)

These three conflict types already share more similarity than might be
immediately apparent from their description: (1) the handling of the
rename variants already involves removing any entries from the index
corresponding to the original file names[*], thus only leaving entries
in the index for the colliding path; (2) likewise, any trace of the
original file name in the working tree is also removed.  So, in all
three cases we're left with how to represent two colliding files in both
the index and the working copy.

[*] Technically, this isn't quite true because rename/rename(2to1)
conflicts in the recursive (o->call_depth > 0) case do an "unrename"
since about seven years ago.  But even in that case, Junio felt
compelled to explain that my decision to "unrename" wasn't necessarily
the only or right answer -- search for "Comment from Junio" in t6036 for
details.

My initial motivation for looking at these three conflict types was that
if the handling of these three conflict types is the "same" (defined
more precisely in a later commit), or is at least the "same" in the
limited set of cases where a renamed file is unmodified on the side of
history where the file is not renamed, then a significant performance
improvement for rename detection during merges is possible.  However,
while that served as motivation to look at these three types of
conflicts, the actual goal of this new function is to try to improve the
handling for all three cases, not to merely make them the same.

The previous behavior for these conflict types in regards to the
working tree (assuming the file collision occurs at 'foo') was:
  * add/add does a two-way merge of the two files and records it as 'foo'.
  * rename/rename(2to1) records the two different files into two new
    uniquely named files (foo~HEAD and foo~$MERGE), while removing 'foo'
    from the working tree.
  * rename/add records the two different files into two different
    locations, recording the add at foo~$SIDE and, oddly, recording
    the rename at foo (why is the rename more important than the add?)

So, the biggest question is whether the two colliding files should be
two-way merged and recorded in place, or recorded into separate files.
If the files are similar enough, the two-way merge is probably
preferable, but if they're not similar, recording as separate files is
probably similar.  (The same logic that applies for the working
directory here would also apply to the recursive (i.e. o->call_depth >
0) case as well.)  The code handling the different types of conflicts
appear to have been written with different assumptions about whether the
colliding files would be similar.

But, rather than make an assumption about whether the two files will be
similar, why not just check?  If we simply call estimate_similarity(),
we can two-way merge the files if they are similar, and otherwise record
the two files at different locations.

Checking similarity in order to optimize worktree handling is the
primary improvement for these three conflict types.  Further
improvements will be discussed in subsequent commits that modify the
code to take advantage of this new shared function.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
Besides the use-similarity-to-determine-two-way-vs-record-two-separate
choice, I am curious if anyone else has problems with the "unrename"
behavior for the recursive case (which I'd guess never actually comes up
in practice and only cursorily appears in the testsuite because I put it
there years ago).  If anyone does, I may have a useful testcase I could
bring up for discussion on the list.  Given that Junio disagreed with
some of my reasoning on the one testcase in the testsuite that cursorily
touches this, I'm not sure it's even clear what "correct"/"optimal"
behavior for those cases is.

 diff.h            |   4 ++
 diffcore-rename.c |   6 +--
 merge-recursive.c | 118 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 125 insertions(+), 3 deletions(-)

diff --git a/diff.h b/diff.h
index aca150ba2e..adf7e92eb5 100644
--- a/diff.h
+++ b/diff.h
@@ -427,4 +427,8 @@ extern void print_stat_summary(FILE *fp, int files,
 			       int insertions, int deletions);
 extern void setup_diff_pager(struct diff_options *);
 
+extern int estimate_similarity(struct diff_filespec *src,
+			       struct diff_filespec *dst,
+			       int minimum_score);
+
 #endif /* DIFF_H */
diff --git a/diffcore-rename.c b/diffcore-rename.c
index c0517058b0..b15d9d74ef 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -127,9 +127,9 @@ struct diff_score {
 	short name_score;
 };
 
-static int estimate_similarity(struct diff_filespec *src,
-			       struct diff_filespec *dst,
-			       int minimum_score)
+int estimate_similarity(struct diff_filespec *src,
+			struct diff_filespec *dst,
+			int minimum_score)
 {
 	/* src points at a file that existed in the original tree (or
 	 * optionally a file in the destination tree) and dst points
diff --git a/merge-recursive.c b/merge-recursive.c
index d54466649e..345479ad50 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1278,6 +1278,124 @@ static struct diff_filespec *filespec_from_entry(struct diff_filespec *target,
 	return target;
 }
 
+static int handle_file_collision(struct merge_options *o,
+				 const char *collide_path,
+				 const char *prev_path1,
+				 const char *prev_path2,
+				 const char *branch1, const char *branch2,
+				 const struct object_id *a_oid, unsigned a_mode,
+				 const struct object_id *b_oid, unsigned b_mode,
+				 unsigned conflict_markers_already_present)
+{
+	struct merge_file_info mfi;
+	struct diff_filespec null, a, b;
+	int minimum_score;
+
+	/* Remove rename sources if this was rename/add or rename/rename(2to1) */
+	if (prev_path1)
+		remove_file(o, 1, prev_path1,
+			    o->call_depth || would_lose_untracked(prev_path1));
+	if (prev_path2)
+		remove_file(o, 1, prev_path2,
+			    o->call_depth || would_lose_untracked(prev_path2));
+
+	/*
+	 * Remove the collision path, if it wouldn't cause dirty contents
+	 * or an untracked file to get lost.  We'll either overwrite with
+	 * merged contents, or just write out to differently named files.
+	 */
+	if (was_dirty(o, collide_path))
+		output(o, 1, _("Refusing to lose dirty file at %s"),
+		       collide_path);
+	else if (would_lose_untracked(collide_path))
+		/*
+		 * Only way we get here is if both renames were from
+		 * a directory rename AND user had an untracked file
+		 * at the location where both files end up after the
+		 * two directory renames.  See testcase 10d of t6043.
+		 */
+		output(o, 1, _("Refusing to lose untracked file at "
+			       "%s, even though it's in the way."),
+		       collide_path);
+	else
+		/*
+		 * FIXME: It's possible that neither of the two files have
+		 * conflict markers already present, and that they're
+		 * identical, and that the current working copy happens to
+		 * match, in which case we are unnecessarily touching the
+		 * working tree file.  It's not a likely enough scenario
+		 * that I want to code up the checks for it and a better
+		 * fix is available if we restructure how unpack_trees()
+		 * and merge-recursive interoperate anyway, so punting for
+		 * now...
+		 */
+		remove_file(o, 0, collide_path, 0);
+
+	/* Store things in diff_filespecs for functions that need it */
+	memset(&a, 0, sizeof(struct diff_filespec));
+	memset(&b, 0, sizeof(struct diff_filespec));
+	null.path = a.path = b.path = (char *)collide_path;
+	oidcpy(&null.oid, &null_oid);
+	null.mode = 0;
+	oidcpy(&a.oid, a_oid);
+	a.mode = a_mode;
+	a.oid_valid = 1;
+	oidcpy(&b.oid, b_oid);
+	b.mode = b_mode;
+	b.oid_valid = 1;
+
+	/*
+	 * If the colliding files are similar enough, we can simply merge
+	 * them.  But we don't want to merge files that have conflict
+	 * markers in them already, because nested conflict markers are
+	 * too confusing.
+	 */
+	minimum_score = o->rename_score ? o->rename_score : DEFAULT_RENAME_SCORE;
+	if (!conflict_markers_already_present && minimum_score <=
+	    estimate_similarity(&a, &b, o->rename_score)) {
+		if (merge_file_1(o, &null, &a, &b, branch1, branch2, &mfi))
+			return -1;
+		if (update_file(o, mfi.clean, &mfi.oid, mfi.mode, collide_path))
+			return -1;
+		if (!mfi.clean && !o->call_depth &&
+		    update_stages(o, collide_path, NULL, &a, &b))
+			return -1;
+		return mfi.clean;
+	}
+
+	/*
+	 * Put the colliding files into different paths, and record the
+	 * updated sha1sums in the index
+	 */
+	char *new_path1 = (o->call_depth && prev_path1) ? strdup(prev_path1) :
+			  unique_path(o, collide_path, branch1);
+	char *new_path2 = (o->call_depth && prev_path2) ? strdup(prev_path2) :
+			  unique_path(o, collide_path, branch2);
+	output(o, 1, _("Renaming collisions at %s to %s and %s instead"),
+	       collide_path, new_path1, new_path2);
+
+	if (update_file(o, 0, a_oid, a_mode, new_path1))
+		return -1;
+	if (update_file(o, 0, b_oid, b_mode, new_path2))
+		return -1;
+
+	/* Update index too, making sure to get stage order correct. */
+	if (!o->call_depth) {
+		if (o->branch1 == branch1) {
+			if (update_stages(o, collide_path, NULL, &a, &b))
+				return -1;
+		} else {
+			if (update_stages(o, collide_path, NULL, &b, &a))
+				return -1;
+		}
+	}
+
+	free(new_path2);
+	free(new_path1);
+
+	return 0; /* not clean */
+}
+
 static int handle_file(struct merge_options *o,
 			struct diff_filespec *rename,
 			int stage,
-- 
2.15.0.46.g41dca04efb


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 4/9] Add testcases for improved file collision conflict handling
  2017-11-10 22:21 [RFC PATCH 0/9] Improve rename detection performance in merge recursive Elijah Newren
                   ` (2 preceding siblings ...)
  2017-11-10 22:21 ` [RFC PATCH 3/9] merge-recursive: New function for better colliding conflict resolutions Elijah Newren
@ 2017-11-10 22:21 ` Elijah Newren
  2017-11-11 16:52   ` Elijah Newren
  2017-11-10 22:21 ` [RFC PATCH 5/9] merge-recursive: Fix rename/add " Elijah Newren
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 12+ messages in thread
From: Elijah Newren @ 2017-11-10 22:21 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

Adds testcases dealing with file collisions for the following types of
conflicts:
  * add/add
  * rename/add
  * rename/rename(2to1)
These tests include expectations for new, smarter behavior provided by
handle_file_collision().  Since that function is not in use yet, the
tests are currently expected to fail.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t6036-recursive-corner-cases.sh    |   8 +-
 t/t6042-merge-rename-corner-cases.sh | 210 +++++++++++++++++++++++++++++++++++
 2 files changed, 214 insertions(+), 4 deletions(-)

diff --git a/t/t6036-recursive-corner-cases.sh b/t/t6036-recursive-corner-cases.sh
index 18aa88b5c0..8485e04b9b 100755
--- a/t/t6036-recursive-corner-cases.sh
+++ b/t/t6036-recursive-corner-cases.sh
@@ -208,11 +208,11 @@ test_expect_success 'git detects differently handled merges conflict' '
 	git cat-file -p C:new_a >>merge-me &&
 	>empty &&
 	test_must_fail git merge-file \
-		-L "Temporary merge branch 2" \
-		-L "" \
 		-L "Temporary merge branch 1" \
-		merged empty merge-me &&
-	sed -e "s/^\([<=>]\)/\1\1\1/" merged >merged-internal &&
+		-L "" \
+		-L "Temporary merge branch 2" \
+		merge-me empty merged &&
+	sed -e "s/^\([<=>]\)/\1\1\1/" merge-me >merged-internal &&
 	test $(git rev-parse :1:new_a) = $(git hash-object merged-internal)
 '
 
diff --git a/t/t6042-merge-rename-corner-cases.sh b/t/t6042-merge-rename-corner-cases.sh
index 411550d2b6..d8fe797f0d 100755
--- a/t/t6042-merge-rename-corner-cases.sh
+++ b/t/t6042-merge-rename-corner-cases.sh
@@ -575,4 +575,214 @@ test_expect_success 'rename/rename/add-dest merge still knows about conflicting
 	test ! -f c
 '
 
+test_conflicts_with_adds_and_renames() {
+	test $1 != 0 && side1=rename || side1=add
+	test $2 != 0 && side2=rename || side2=add
+
+	# Setup:
+	#          L
+	#         / \
+	#   master   ?
+	#         \ /
+	#          R
+	#
+	# Where:
+	#   Both L and R have files named 'three-unrelated' and
+	#   'three-related' which collide.  Each of the colliding files
+	#   could have been involved in a rename, in which case there was a
+	#   file named 'one-[un]related' or 'two-[un]related' that was
+	#   modified on the opposite side of history and renamed into the
+	#   collision on this side of history.
+	#
+	# Questions for both three-unrelated and three-related:
+	#   1) The index should contain both a stage 2 and stage 3 entry
+	#      for the colliding file.  Does it?
+	#   2) When renames are involved, the content merges are clean, so
+	#      the index should reflect the content merges, not merely the
+	#      version of the colliding file from the prior commit.  Does
+	#      it?
+	#
+	# Questions for three-unrelated:
+	#   3) There should be files in the worktree named
+	#      'three-unrelated~HEAD' and 'three-unrelated~R^0' with the
+	#      (content-merged) version of 'three-unrelated' from the
+	#      appropriate side of the merge.  Are they present?
+	#   4) There should be no file named 'three-unrelated' in the
+	#      working tree.  That'd make it too likely that users would
+	#      use it instead of carefully looking at both
+	#      three-unrelated~HEAD and three-unrelated~R^0.  Is it
+	#      correctly missing?
+	#
+	# Questions for three-related:
+	#   3) There should be a file in the worktree named three-related
+	#      containing the two-way merged contents of the content-merged
+	#      versions of three-related from each of the two colliding
+	#      files.  Is it present?
+	#   4) There should not be any three-related~* files in the working
+	#      tree.
+	test_expect_success "setup simple $side1/$side2 conflict" '
+		git rm -rf . &&
+		git clean -fdqx &&
+		rm -rf .git &&
+		git init &&
+
+		# Create a simple file with 10 lines
+		ten="0 1 2 3 4 5 6 7 8 9" &&
+		for i in $ten
+		do
+			echo line $i in a sample file
+		done >unrelated1_v1 &&
+		# Create a second version of same file with one more line
+		cat unrelated1_v1 >unrelated1_v2 &&
+		echo another line >>unrelated1_v2 &&
+
+		# Create an unrelated simple file with 10 lines
+		for i in $ten
+		do
+			echo line $i in another sample file
+		done >unrelated2_v1 &&
+		# Create a second version of same file with one more line
+		cat unrelated2_v1 >unrelated2_v2 &&
+		echo another line >>unrelated2_v2 &&
+
+		# Create some related files now
+		for i in $ten
+		do
+			echo Random base content line $i
+		done >related1_v1 &&
+		cp -a related1_v1 related1_v2 &&
+		echo modification >>related1_v2 &&
+
+		cp -a related1_v1 related2_v1 &&
+		echo more stuff >>related2_v1 &&
+		cp -a related2_v1 related2_v2 &&
+		echo yet more stuff >>related2_v2 &&
+
+		# Use a tag to record both these files for simple access,
+		# and clean out these untracked files
+		git tag unrelated1_v1 `git hash-object -w unrelated1_v1` &&
+		git tag unrelated1_v2 `git hash-object -w unrelated1_v2` &&
+		git tag unrelated2_v1 `git hash-object -w unrelated2_v1` &&
+		git tag unrelated2_v2 `git hash-object -w unrelated2_v2` &&
+		git tag related1_v1 `git hash-object -w related1_v1` &&
+		git tag related1_v2 `git hash-object -w related1_v2` &&
+		git tag related2_v1 `git hash-object -w related2_v1` &&
+		git tag related2_v2 `git hash-object -w related2_v2` &&
+		git clean -f &&
+
+		# Setup merge-base, consisting of files named "one-*"
+		# and "two-*" if renames were involved.
+		touch irrelevant_file &&
+		git add irrelevant_file &&
+		if [ $side1 == "rename" ]; then
+			git show unrelated1_v1 >one-unrelated &&
+			git add one-unrelated
+			git show related1_v1 >one-related &&
+			git add one-related
+		fi &&
+		if [ $side2 == "rename" ]; then
+			git show unrelated2_v1 >two-unrelated &&
+			git add two-unrelated
+			git show related2_v1 >two-related &&
+			git add two-related
+		fi &&
+		test_tick && git commit -m initial &&
+
+		git branch L &&
+		git branch R &&
+
+		# Handle the left side
+		git checkout L &&
+		if [ $side1 == "rename" ]; then
+			git mv one-unrelated three-unrelated
+			git mv one-related   three-related
+		else
+			git show unrelated1_v2 >three-unrelated &&
+			git add three-unrelated
+			git show related1_v2 >three-related &&
+			git add three-related
+		fi &&
+		if [ $side2 == "rename" ]; then
+			git show unrelated2_v2 >two-unrelated &&
+			git add two-unrelated
+			git show related2_v2 >two-related &&
+			git add two-related
+		fi &&
+		test_tick && git commit -m L &&
+
+		# Handle the right side
+		git checkout R &&
+		if [ $side1 == "rename" ]; then
+			git show unrelated1_v2 >one-unrelated &&
+			git add one-unrelated
+			git show related1_v2 >one-related &&
+			git add one-related
+		fi &&
+		if [ $side2 == "rename" ]; then
+			git mv two-unrelated three-unrelated
+			git mv two-related three-related
+		else
+			git show unrelated2_v2 >three-unrelated &&
+			git add three-unrelated
+			git show related2_v2 >three-related &&
+			git add three-related
+		fi &&
+		test_tick && git commit -m R
+	'
+
+	test_expect_failure "check simple $side1/$side2 conflict" '
+		git reset --hard &&
+		git checkout L^0 &&
+
+		# Merge must fail; there is a conflict
+		test_must_fail git merge -s recursive R^0 &&
+
+		# Make sure the index has the right number of entries
+		test 5 = $(git ls-files -s | wc -l) &&
+		test 4 = $(git ls-files -u | wc -l) &&
+
+		# Nothing should have touched irrelevant_file
+		test $(git rev-parse :0:irrelevant_file) = $(git rev-parse master:irrelevant_file) &&
+
+		# Even for renames, make sure the index contains the MERGED
+		# version of the file, not the version of the file that existed
+		# on the given side.
+		test $(git rev-parse :2:three-unrelated) = $(git rev-parse unrelated1_v2) &&
+		test $(git rev-parse :3:three-unrelated) = $(git rev-parse unrelated2_v2) &&
+		test $(git rev-parse :2:three-related)   = $(git rev-parse related1_v2) &&
+		test $(git rev-parse :3:three-related)   = $(git rev-parse related2_v2) &&
+
+		# Make sure we have the correct number of untracked files
+		test 2 = $(git ls-files -o | wc -l) &&
+
+		# Make sure each file (with merging if rename involved) is
+		# present in the working tree for the user to work with.
+		test $(git hash-object three-unrelated~HEAD) = $(git rev-parse unrelated1_v2) &&
+		test $(git hash-object three-unrelated~R^0)  = $(git rev-parse unrelated2_v2) &&
+
+		# "three-unrelated" should not exist because there is no
+		# reason to give preference to either three-unrelated~HEAD
+		# or three-unrelated~R^0
+		test ! -f three-unrelated &&
+
+		# Make sure we have the correct merged contents for three-related
+		git show related1_v1 >expected &&
+		cat <<EOF >>expected &&
+<<<<<<< HEAD
+modification
+=======
+more stuff
+yet more stuff
+>>>>>>> R^0
+EOF
+
+		test_cmp expected three-related
+	'
+}
+
+test_conflicts_with_adds_and_renames 1 1
+test_conflicts_with_adds_and_renames 1 0
+test_conflicts_with_adds_and_renames 0 1
+test_conflicts_with_adds_and_renames 0 0
+
 test_done
-- 
2.15.0.46.g41dca04efb


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 5/9] merge-recursive: Fix rename/add conflict handling
  2017-11-10 22:21 [RFC PATCH 0/9] Improve rename detection performance in merge recursive Elijah Newren
                   ` (3 preceding siblings ...)
  2017-11-10 22:21 ` [RFC PATCH 4/9] Add testcases for improved file collision conflict handling Elijah Newren
@ 2017-11-10 22:21 ` Elijah Newren
  2017-11-10 22:21 ` [RFC PATCH 6/9] merge-recursive: Improve handling for rename/rename(2to1) conflicts Elijah Newren
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Elijah Newren @ 2017-11-10 22:21 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

This makes the rename/add conflict handling make use of the new
handle_file_collision() function, which fixes several bugs and improves
things for the rename/add case significantly.  Previously, rename/add
would:

  * Not leave any higher order stage entries in the index, making it
    appear as if there were no conflict.
  * Would place the rename file at the colliding path, and move the
    added file elsewhere, which combined with the lack of higher order
    stage entries felt really odd.  It's not clear to me why the
    rename should take precedence over the add; if one should be moved
    out of the way, they both probably should.
  * In the recursive case, it would do a two way merge of the added
    file and the version of the renamed file on the renamed side,
    completely excluding modifications to the renamed file on the
    unrenamed side of history.

Using the new handle_file_collision() fixes all of these issues, and
adds smarts to allow two-way merge OR move colliding files to separate
paths depending on the similarity of the colliding files.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-recursive.c                 | 135 +++++++++++++++++++++++++++-----------
 t/t6036-recursive-corner-cases.sh |  11 ++--
 2 files changed, 104 insertions(+), 42 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 345479ad50..f29cbd1240 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -167,6 +167,7 @@ static int oid_eq(const struct object_id *a, const struct object_id *b)
 enum rename_type {
 	RENAME_NORMAL = 0,
 	RENAME_DIR,
+	RENAME_ADD,
 	RENAME_DELETE,
 	RENAME_ONE_FILE_TO_ONE,
 	RENAME_ONE_FILE_TO_TWO,
@@ -209,6 +210,7 @@ static inline void setup_rename_conflict_info(enum rename_type rename_type,
 					      struct stage_data *src_entry1,
 					      struct stage_data *src_entry2)
 {
+	int ostage1, ostage2;
 	struct rename_conflict_info *ci = xcalloc(1, sizeof(struct rename_conflict_info));
 	ci->rename_type = rename_type;
 	ci->pair1 = pair1;
@@ -226,18 +228,22 @@ static inline void setup_rename_conflict_info(enum rename_type rename_type,
 		dst_entry2->rename_conflict_info = ci;
 	}
 
-	if (rename_type == RENAME_TWO_FILES_TO_ONE) {
-		/*
-		 * For each rename, there could have been
-		 * modifications on the side of history where that
-		 * file was not renamed.
-		 */
-		int ostage1 = o->branch1 == branch1 ? 3 : 2;
-		int ostage2 = ostage1 ^ 1;
+	/*
+	 * For each rename, there could have been
+	 * modifications on the side of history where that
+	 * file was not renamed.
+	 */
+	if (rename_type == RENAME_ADD ||
+	    rename_type == RENAME_TWO_FILES_TO_ONE) {
+		ostage1 = o->branch1 == branch1 ? 3 : 2;
 
 		ci->ren1_other.path = pair1->one->path;
 		oidcpy(&ci->ren1_other.oid, &src_entry1->stages[ostage1].oid);
 		ci->ren1_other.mode = src_entry1->stages[ostage1].mode;
+	}
+
+	if (rename_type == RENAME_TWO_FILES_TO_ONE) {
+		ostage2 = ostage1 ^ 1;
 
 		ci->ren2_other.path = pair2->one->path;
 		oidcpy(&ci->ren2_other.oid, &src_entry2->stages[ostage2].oid);
@@ -1104,6 +1110,18 @@ static int merge_file_special_markers(struct merge_options *o,
 			   const char *filename2,
 			   struct merge_file_info *mfi)
 {
+	if (o->branch1 != branch1) {
+		/*
+		 * It's weird getting a reverse merge with HEAD on the bottom
+		 * and the other branch on the top.  Fix that.
+		 */
+		return merge_file_special_markers(o,
+						  one, b, a,
+						  branch2, filename2,
+						  branch1, filename1,
+						  mfi);
+	}
+
 	char *side1 = NULL;
 	char *side2 = NULL;
 	int ret;
@@ -1291,6 +1309,21 @@ static int handle_file_collision(struct merge_options *o,
 	struct diff_filespec null, a, b;
 	int minimum_score;
 
+	/*
+	 * It's easiest to get the correct things into stage 2 and 3, and
+	 * to make sure that the content merge puts HEAD before the other
+	 * branch if we just ensure that branch1 == o->branch1.  So, simply
+	 * flip arguments around if we don't have that.
+	 */
+	if (branch1 != o->branch1) {
+		return handle_file_collision(o, collide_path,
+					     prev_path2, prev_path1,
+					     branch2, branch1,
+					     b_oid, b_mode,
+					     a_oid, a_mode,
+					     conflict_markers_already_present);
+	}
+
 	/* Remove rename sources if this was rename/add or rename/rename(2to1) */
 	if (prev_path1)
 		remove_file(o, 1, prev_path1,
@@ -1396,6 +1429,36 @@ static int handle_file_collision(struct merge_options *o,
 	return 0; /* not clean */
 }
 
+static int conflict_rename_add(struct merge_options *o,
+			       struct rename_conflict_info *ci)
+{
+	/* a was renamed to c, and a separate c was added. */
+	struct diff_filespec *a = ci->pair1->one;
+	struct diff_filespec *c = ci->pair1->two;
+	char *path = c->path;
+	struct merge_file_info mfi;
+
+	int other_stage = (ci->branch1 == o->branch1 ? 3 : 2);
+
+	output(o, 1, _("CONFLICT (rename/add): "
+	       "Rename %s->%s in %s.  Added %s in %s"),
+	       a->path, c->path, ci->branch1,
+	       c->path, ci->branch2);
+
+	if (merge_file_special_markers(o, a, c, &ci->ren1_other,
+				       o->branch1, path,
+				       o->branch2, ci->ren1_other.path, &mfi))
+		return -1;
+
+	return handle_file_collision(o,
+				     c->path, a->path, NULL,
+				     ci->branch1, ci->branch2,
+				     &mfi.oid, mfi.mode,
+				     &ci->dst_entry1->stages[other_stage].oid,
+				     ci->dst_entry1->stages[other_stage].mode,
+				     !mfi.clean);
+}
+
 static int handle_file(struct merge_options *o,
 			struct diff_filespec *rename,
 			int stage,
@@ -2529,36 +2592,23 @@ static int process_renames(struct merge_options *o,
 						      0  /* update_wd    */))
 					clean_merge = -1;
 			} else if (!oid_eq(&dst_other.oid, &null_oid)) {
-				clean_merge = 0;
-				try_merge = 1;
-				output(o, 1, _("CONFLICT (rename/add): Rename %s->%s in %s. "
-				       "%s added in %s"),
-				       ren1_src, ren1_dst, branch1,
-				       ren1_dst, branch2);
-				if (o->call_depth) {
-					struct merge_file_info mfi;
-					if (merge_file_one(o, ren1_dst, &null_oid, 0,
-							   &ren1->pair->two->oid,
-							   ren1->pair->two->mode,
-							   &dst_other.oid,
-							   dst_other.mode,
-							   branch1, branch2, &mfi)) {
-						clean_merge = -1;
-						goto cleanup_and_return;
-					}
-					output(o, 1, _("Adding merged %s"), ren1_dst);
-					if (update_file(o, 0, &mfi.oid,
-							mfi.mode, ren1_dst))
-						clean_merge = -1;
-					try_merge = 0;
-				} else {
-					char *new_path = unique_path(o, ren1_dst, branch2);
-					output(o, 1, _("Adding as %s instead"), new_path);
-					if (update_file(o, 0, &dst_other.oid,
-							dst_other.mode, new_path))
-						clean_merge = -1;
-					free(new_path);
-				}
+				/*
+				 * Probably not a clean merge, but it's
+				 * premature to set clean_merge to 0 here,
+				 * because if the rename merges cleanly and
+				 * the merge exactly matches the newly added
+				 * file, then the merge will be clean.
+				 */
+				setup_rename_conflict_info(RENAME_ADD,
+							   ren1->pair,
+							   NULL,
+							   branch1,
+							   branch2,
+							   ren1->dst_entry,
+							   NULL,
+							   o,
+							   ren1->src_entry,
+							   NULL);
 			} else
 				try_merge = 1;
 
@@ -2961,6 +3011,15 @@ static int process_entry(struct merge_options *o,
 						conflict_info->branch2))
 				clean_merge = -1;
 			break;
+		case RENAME_ADD:
+			/*
+			 * Probably unclean merge, but if the renamed file
+			 * merges cleanly and the result can then be
+			 * two-way merged cleanly with the added file, I
+			 * guess it's a clean merge?
+			 */
+			clean_merge = conflict_rename_add(o, conflict_info);
+			break;
 		case RENAME_DELETE:
 			clean_merge = 0;
 			if (conflict_rename_delete(o,
diff --git a/t/t6036-recursive-corner-cases.sh b/t/t6036-recursive-corner-cases.sh
index 8485e04b9b..09accbc62a 100755
--- a/t/t6036-recursive-corner-cases.sh
+++ b/t/t6036-recursive-corner-cases.sh
@@ -168,7 +168,7 @@ test_expect_success 'setup differently handled merges of rename/add conflict' '
 	git branch B &&
 	git checkout -b C &&
 	echo 10 >>a &&
-	echo "other content" >>new_a &&
+	printf "0\n1\n2\n3\n4\n5\n6\n7\nfoobar" >new_a &&
 	git add a new_a &&
 	test_tick && git commit -m C &&
 
@@ -179,13 +179,13 @@ test_expect_success 'setup differently handled merges of rename/add conflict' '
 	git checkout B^0 &&
 	test_must_fail git merge C &&
 	git clean -f &&
+	git add new_a &&
 	test_tick && git commit -m D &&
 	git tag D &&
 
 	git checkout C^0 &&
 	test_must_fail git merge B &&
-	rm new_a~HEAD new_a &&
-	printf "Incorrectly merged content" >>new_a &&
+	printf "0\n1\n2\n3\n4\n5\n6\n7\nbad merge" >new_a &&
 	git add -u &&
 	test_tick && git commit -m E &&
 	git tag E
@@ -204,7 +204,10 @@ test_expect_success 'git detects differently handled merges conflict' '
 	test $(git rev-parse :2:new_a) = $(git rev-parse D:new_a) &&
 	test $(git rev-parse :3:new_a) = $(git rev-parse E:new_a) &&
 
-	git cat-file -p B:new_a >>merged &&
+	# Since A:a == B:new_a, the three-way merge of A:a, B:new_a, and
+	# C:a is just C:a.  Then we do a two-way merge of that with
+	# C:new_a.
+	git cat-file -p C:a >>merged &&
 	git cat-file -p C:new_a >>merge-me &&
 	>empty &&
 	test_must_fail git merge-file \
-- 
2.15.0.46.g41dca04efb


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 6/9] merge-recursive: Improve handling for rename/rename(2to1) conflicts
  2017-11-10 22:21 [RFC PATCH 0/9] Improve rename detection performance in merge recursive Elijah Newren
                   ` (4 preceding siblings ...)
  2017-11-10 22:21 ` [RFC PATCH 5/9] merge-recursive: Fix rename/add " Elijah Newren
@ 2017-11-10 22:21 ` Elijah Newren
  2017-11-10 22:21 ` [RFC PATCH 7/9] merge-recursive: Improve handling for add/add conflicts Elijah Newren
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 12+ messages in thread
From: Elijah Newren @ 2017-11-10 22:21 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

This makes the rename/rename(2to1) conflicts use the new
handle_file_collision() function.  Since that function was based
originally on the rename/rename(2to1) handling code, the main
differences here are in what was added.  In particular:

  * If the two colliding files are similar, instead of being stored
    at collide_path~HEAD and collide_path~MERGE, the files are two-way
    merged and recorded at collide_path.
  * Instead of recording the version of the renamed file that existed
    on the renamed side in the index (thus ignoring any changes that
    were made to the file on the side of history without the rename),
    we do a three-way content merge on the renamed path, then store
    that at either stage 2 or stage 3.
  * Note that if either of the three-way content merges done for each
    rename have conflicts, we do NOT try to estimate the similarity of
    the resulting two files and just automatically consider them to be
    dissimilar.  This is done to avoid foisting conflicts-of-conflicts
    on the user.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
Is it too weird to others that I potentially record a merged file with
conflict markers at both stage 2 and stage 3 in the index?  To me, it
seemed less weird than what we previously did, but I am curious what
others think of it. 

 merge-recursive.c                    | 100 +++++------------------------------
 t/t6042-merge-rename-corner-cases.sh |   2 +-
 2 files changed, 14 insertions(+), 88 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index f29cbd1240..b8108740c4 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -647,26 +647,6 @@ static int update_stages(struct merge_options *opt, const char *path,
 	return 0;
 }
 
-static int update_stages_for_stage_data(struct merge_options *opt,
-					const char *path,
-					const struct stage_data *stage_data)
-{
-	struct diff_filespec o, a, b;
-	o.mode = stage_data->stages[1].mode;
-	oidcpy(&o.oid, &stage_data->stages[1].oid);
-
-	a.mode = stage_data->stages[2].mode;
-	oidcpy(&a.oid, &stage_data->stages[2].oid);
-
-	b.mode = stage_data->stages[3].mode;
-	oidcpy(&b.oid, &stage_data->stages[3].oid);
-
-	return update_stages(opt, path,
-			     is_null_sha1(o.oid.hash) ? NULL : &o,
-			     is_null_sha1(a.oid.hash) ? NULL : &a,
-			     is_null_sha1(b.oid.hash) ? NULL : &b);
-}
-
 static void update_entry(struct stage_data *entry,
 			 struct diff_filespec *o,
 			 struct diff_filespec *a,
@@ -1598,7 +1578,6 @@ static int conflict_rename_rename_2to1(struct merge_options *o,
 	char *path = c1->path; /* == c2->path */
 	struct merge_file_info mfi_c1;
 	struct merge_file_info mfi_c2;
-	int ret;
 
 	output(o, 1, _("CONFLICT (rename/rename): "
 	       "Rename %s->%s in %s. "
@@ -1606,9 +1585,6 @@ static int conflict_rename_rename_2to1(struct merge_options *o,
 	       a->path, c1->path, ci->branch1,
 	       b->path, c2->path, ci->branch2);
 
-	remove_file(o, 1, a->path, o->call_depth || would_lose_untracked(a->path));
-	remove_file(o, 1, b->path, o->call_depth || would_lose_untracked(b->path));
-
 	if (merge_file_special_markers(o, a, c1, &ci->ren1_other,
 				       o->branch1, c1->path,
 				       o->branch2, ci->ren1_other.path, &mfi_c1) ||
@@ -1617,66 +1593,11 @@ static int conflict_rename_rename_2to1(struct merge_options *o,
 				       o->branch2, c2->path, &mfi_c2))
 		return -1;
 
-	if (o->call_depth) {
-		/*
-		 * If mfi_c1.clean && mfi_c2.clean, then it might make
-		 * sense to do a two-way merge of those results.  But, I
-		 * think in all cases, it makes sense to have the virtual
-		 * merge base just undo the renames; they can be detected
-		 * again later for the non-recursive merge.
-		 */
-		remove_file(o, 0, path, 0);
-		ret = update_file(o, 0, &mfi_c1.oid, mfi_c1.mode, a->path);
-		if (!ret)
-			ret = update_file(o, 0, &mfi_c2.oid, mfi_c2.mode,
-					  b->path);
-	} else {
-		char *new_path1 = unique_path(o, path, ci->branch1);
-		char *new_path2 = unique_path(o, path, ci->branch2);
-		output(o, 1, _("Renaming %s to %s and %s to %s instead"),
-		       a->path, new_path1, b->path, new_path2);
-		if (was_dirty(o, path))
-			output(o, 1, _("Refusing to lose dirty file at %s"),
-			       path);
-		else if (would_lose_untracked(path))
-			/*
-			 * Only way we get here is if both renames were from
-			 * a directory rename AND user had an untracked file
-			 * at the location where both files end up after the
-			 * two directory renames.  See testcase 10d of t6043.
-			 */
-			output(o, 1, _("Refusing to lose untracked file at "
-				       "%s, even though it's in the way."),
-			       path);
-		else
-			remove_file(o, 0, path, 0);
-		ret = update_file(o, 0, &mfi_c1.oid, mfi_c1.mode, new_path1);
-		if (!ret)
-			ret = update_file(o, 0, &mfi_c2.oid, mfi_c2.mode,
-					  new_path2);
-		/*
-		 * unpack_trees() actually populates the index for us for
-		 * "normal" rename/rename(2to1) situtations so that the
-		 * correct entries are at the higher stages, which would
-		 * make the call below to update_stages_for_stage_data
-		 * unnecessary.  However, if either of the renames came
-		 * from a directory rename, then unpack_trees() will not
-		 * have gotten the right data loaded into the index, so we
-		 * need to do so now.  (While it'd be tempting to move this
-		 * call to update_stages_for_stage_data() to
-		 * apply_directory_rename_modifications(), that would break
-		 * our intermediate calls to would_lose_untracked() since
-		 * those rely on the current in-memory index.  See also the
-		 * big "NOTE" in update_stages()).
-		 */
-		if (update_stages_for_stage_data(o, path, ci->dst_entry1))
-			ret = -1;
-
-		free(new_path2);
-		free(new_path1);
-	}
-
-	return ret;
+	return handle_file_collision(o, path, a->path, b->path,
+				     ci->branch1, ci->branch2,
+				     &mfi_c1.oid, mfi_c1.mode,
+				     &mfi_c2.oid, mfi_c2.mode,
+				     !mfi_c1.clean || !mfi_c2.clean);
 }
 
 /*
@@ -3034,9 +2955,14 @@ static int process_entry(struct merge_options *o,
 				clean_merge = -1;
 			break;
 		case RENAME_TWO_FILES_TO_ONE:
-			clean_merge = 0;
-			if (conflict_rename_rename_2to1(o, conflict_info))
-				clean_merge = -1;
+			/*
+			 * Probably unclean merge, but if the two renamed
+			 * files merge cleanly and the two resulting files
+			 * can then be two-way merged cleanly, I guess it's
+			 * a clean merge?
+			 */
+			clean_merge = conflict_rename_rename_2to1(o,
+								  conflict_info);
 			break;
 		default:
 			entry->processed = 0;
diff --git a/t/t6042-merge-rename-corner-cases.sh b/t/t6042-merge-rename-corner-cases.sh
index d8fe797f0d..b0b840223b 100755
--- a/t/t6042-merge-rename-corner-cases.sh
+++ b/t/t6042-merge-rename-corner-cases.sh
@@ -358,7 +358,7 @@ test_expect_success 'setup rename/rename (2to1) + modify/modify' '
 	git init &&
 
 	printf "1\n2\n3\n4\n5\n" >a &&
-	printf "5\n4\n3\n2\n1\n" >b &&
+	printf "9\n8\n7\n6\n5\n" >b &&
 	git add a b &&
 	git commit -m A &&
 	git tag A &&
-- 
2.15.0.46.g41dca04efb


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 7/9] merge-recursive: Improve handling for add/add conflicts
  2017-11-10 22:21 [RFC PATCH 0/9] Improve rename detection performance in merge recursive Elijah Newren
                   ` (5 preceding siblings ...)
  2017-11-10 22:21 ` [RFC PATCH 6/9] merge-recursive: Improve handling for rename/rename(2to1) conflicts Elijah Newren
@ 2017-11-10 22:21 ` Elijah Newren
  2017-11-10 22:21 ` [RFC PATCH 8/9] merge-recursive: Accelerate rename detection Elijah Newren
  2017-11-10 22:21 ` [RFC PATCH 9/9] diffcore-rename: Filter rename_src list when possible Elijah Newren
  8 siblings, 0 replies; 12+ messages in thread
From: Elijah Newren @ 2017-11-10 22:21 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

This makes add/add conflicts use the new handle_file_collision()
function.  This leaves the handling of the index the same, but
modifies how the working tree is handled: instead of always doing
a two-way merge of the file contents and recording them at the
collision path name, we instead first estimate the similarity of the
two files involved.  If they are not similar, we instead record the
file contents into two separate files for the user to inspect.

Several testcases had to be modified to either expect files to be
written to different locations, or for the two test colliding files
to be modified so that they were similar.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
The sheer number of tests that relied on add/add always resulting in a
two-way merge gave me a little pause.  I am assuming that the ONLY reason
the testsuite did that was just because it made it really easy to write
the tests (you wouldn't need to make the two files similar at all), not
because it was actually the behavior that was really wanted for truly
dissimilar files.  Based on that assumption, I just modified the tests
for the "new world".  But I'd like to hear others' comments on that
assumption of mine.

 merge-recursive.c                    | 24 +++++++++++++++++++-----
 t/t2023-checkout-m.sh                |  2 +-
 t/t3418-rebase-continue.sh           | 27 +++++++++++++++++++++++----
 t/t3504-cherry-pick-rerere.sh        | 19 ++++++++++++++-----
 t/t4200-rerere.sh                    | 12 ++++++------
 t/t6020-merge-df.sh                  |  4 ++--
 t/t6024-recursive-merge.sh           | 35 +++++++++++++++++++++--------------
 t/t6025-merge-symlinks.sh            |  9 ++++++---
 t/t6031-merge-filemode.sh            |  4 ++--
 t/t6042-merge-rename-corner-cases.sh |  2 +-
 t/t6043-merge-rename-directories.sh  | 13 ++++++++-----
 t/t7060-wtstatus.sh                  |  1 +
 t/t7064-wtstatus-pv2.sh              |  4 ++--
 t/t7506-status-submodule.sh          | 11 +++++++----
 t/t7610-mergetool.sh                 | 28 ++++++++++++++--------------
 15 files changed, 127 insertions(+), 68 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index b8108740c4..7bc9a2ac80 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -3029,11 +3029,25 @@ static int process_entry(struct merge_options *o,
 				clean_merge = -1;
 		}
 	} else if (a_oid && b_oid) {
-		/* Case C: Added in both (check for same permissions) and */
-		/* case D: Modified in both, but differently. */
-		clean_merge = merge_content(o, path, 0 /* file_in_way */,
-					    o_oid, o_mode, a_oid, a_mode, b_oid, b_mode,
-					    NULL);
+		if (!o_oid) {
+			/* Case C: Added in both (check for same permissions) */
+			output(o, 1,
+			       _("CONFLICT (add/add): Merge conflict in %s"),
+			       path);
+			clean_merge = handle_file_collision(o,
+							    path, NULL, NULL,
+							    o->branch1,
+							    o->branch2,
+							    a_oid, a_mode,
+							    b_oid, b_mode,
+							    0);
+		} else
+			/* case D: Modified in both, but differently. */
+			clean_merge = merge_content(o, path, 0 /* file_in_way */,
+						    o_oid, o_mode,
+						    a_oid, a_mode,
+						    b_oid, b_mode,
+						    NULL);
 	} else if (!o_oid && !a_oid && !b_oid) {
 		/*
 		 * this entry was deleted altogether. a_mode == 0 means
diff --git a/t/t2023-checkout-m.sh b/t/t2023-checkout-m.sh
index 7e18985134..2f8ea52318 100755
--- a/t/t2023-checkout-m.sh
+++ b/t/t2023-checkout-m.sh
@@ -27,7 +27,7 @@ clean_branchnames () {
 }
 
 test_expect_success '-m restores 2-way conflicted+resolved file' '
-	cp each.txt each.txt.conflicted &&
+	test_must_fail git merge-file -p each.txt~HEAD /dev/null each.txt~master >each.txt.conflicted &&
 	echo resolved >each.txt &&
 	git add each.txt &&
 	git checkout -m -- each.txt &&
diff --git a/t/t3418-rebase-continue.sh b/t/t3418-rebase-continue.sh
index fcfdd197bd..3523558421 100755
--- a/t/t3418-rebase-continue.sh
+++ b/t/t3418-rebase-continue.sh
@@ -10,10 +10,17 @@ set_fake_editor
 
 test_expect_success 'setup' '
 	test_commit "commit-new-file-F1" F1 1 &&
-	test_commit "commit-new-file-F2" F2 2 &&
+	printf "1\n2\n2\n" >F2 &&
+	git add F2 &&
+	test_tick &&
+	git commit -m "commit-new-file-F2" &&
 
 	git checkout -b topic HEAD^ &&
-	test_commit "commit-new-file-F2-on-topic-branch" F2 22 &&
+	printf "1\n2\n22\n" >F2 &&
+	git add F2 &&
+	test_tick &&
+	git commit -m "commit-new-file-F2-on-topic-branch" &&
+	git tag "commit-new-file-F2-on-topic-branch" &&
 
 	git checkout master
 '
@@ -48,7 +55,13 @@ test_expect_success 'rebase --continue can not be used with other options' '
 test_expect_success 'rebase --continue remembers merge strategy and options' '
 	rm -fr .git/rebase-* &&
 	git reset --hard commit-new-file-F2-on-topic-branch &&
-	test_commit "commit-new-file-F3-on-topic-branch" F3 32 &&
+
+	printf "1\n2\n32\n" >F3 &&
+	git add F3 &&
+	test_tick &&
+	git commit -m "commit-new-file-F3-on-topic-branch" &&
+	git tag "commit-new-file-F3-on-topic-branch" &&
+
 	test_when_finished "rm -fr test-bin funny.was.run" &&
 	mkdir test-bin &&
 	cat >test-bin/git-merge-funny <<-EOF &&
@@ -78,7 +91,13 @@ test_expect_success 'setup rerere database' '
 	rm -fr .git/rebase-* &&
 	git reset --hard commit-new-file-F3-on-topic-branch &&
 	git checkout master &&
-	test_commit "commit-new-file-F3" F3 3 &&
+
+	printf "1\n2\n3\n" >F3 &&
+	git add F3 &&
+	test_tick &&
+	git commit -m "commit-new-file-F3" &&
+	git tag "commit-new-file-F3" &&
+
 	test_config rerere.enabled true &&
 	test_must_fail git rebase -m master topic &&
 	echo "Resolved" >F2 &&
diff --git a/t/t3504-cherry-pick-rerere.sh b/t/t3504-cherry-pick-rerere.sh
index a267b2d144..444ab06ad1 100755
--- a/t/t3504-cherry-pick-rerere.sh
+++ b/t/t3504-cherry-pick-rerere.sh
@@ -4,14 +4,23 @@ test_description='cherry-pick should rerere for conflicts'
 
 . ./test-lib.sh
 
+test_basic_commit () {
+	file=$2 &&
+	printf "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n$1" >"$2" &&
+	git add "$file" &&
+	test_tick
+	git commit -m "$1" &&
+	git tag "$1"
+}
+
 test_expect_success setup '
-	test_commit foo &&
-	test_commit foo-master foo &&
-	test_commit bar-master bar &&
+	test_basic_commit foo foo &&
+	test_basic_commit foo-master foo &&
+	test_basic_commit bar-master bar &&
 
 	git checkout -b dev foo &&
-	test_commit foo-dev foo &&
-	test_commit bar-dev bar &&
+	test_basic_commit foo-dev foo &&
+	test_basic_commit bar-dev bar &&
 	git config rerere.enabled true
 '
 
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index d97d2bebc9..80451907ce 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -280,36 +280,36 @@ test_expect_success 'setup: file2 added differently in two branches' '
 	git reset --hard &&
 
 	git checkout -b fourth &&
-	echo Hallo >file2 &&
+	printf "1\n2\n3\n4\n5\n6\n7\n8\n9\nHallo" >file2 &&
 	git add file2 &&
 	test_tick &&
 	git commit -m version1 &&
 
 	git checkout third &&
-	echo Bello >file2 &&
+	printf "1\n2\n3\n4\n5\n6\n7\n8\n9\nBello" >file2 &&
 	git add file2 &&
 	test_tick &&
 	git commit -m version2 &&
 
 	test_must_fail git merge fourth &&
-	echo Cello >file2 &&
+	printf "1\n2\n3\n4\n5\n6\n7\n8\n9\nCello" >file2 &&
 	git add file2 &&
 	git commit -m resolution
 '
 
 test_expect_success 'resolution was recorded properly' '
-	echo Cello >expected &&
+	printf "1\n2\n3\n4\n5\n6\n7\n8\n9\nCello" >expected &&
 
 	git reset --hard HEAD~2 &&
 	git checkout -b fifth &&
 
-	echo Hallo >file3 &&
+	printf "1\n2\n3\n4\n5\n6\n7\n8\n9\nHallo" >file3 &&
 	git add file3 &&
 	test_tick &&
 	git commit -m version1 &&
 
 	git checkout third &&
-	echo Bello >file3 &&
+	printf "1\n2\n3\n4\n5\n6\n7\n8\n9\nBello" >file3 &&
 	git add file3 &&
 	test_tick &&
 	git commit -m version2 &&
diff --git a/t/t6020-merge-df.sh b/t/t6020-merge-df.sh
index 2af1beec5f..4ee27d2e20 100755
--- a/t/t6020-merge-df.sh
+++ b/t/t6020-merge-df.sh
@@ -62,7 +62,7 @@ test_expect_success 'setup modify/delete + directory/file conflict' '
 	# Throw in letters.txt for sorting order fun
 	# ("letters.txt" sorts between "letters" and "letters/file")
 	echo i >>letters &&
-	echo "version 2" >letters.txt &&
+	printf "1\n2\n3\n4\n5\nversion 2" >letters.txt &&
 	git add letters letters.txt &&
 	git commit -m modified &&
 
@@ -70,7 +70,7 @@ test_expect_success 'setup modify/delete + directory/file conflict' '
 	git rm letters &&
 	mkdir letters &&
 	>letters/file &&
-	echo "version 1" >letters.txt &&
+	printf "1\n2\n3\n4\n5\nversion 1" >letters.txt &&
 	git add letters letters.txt &&
 	git commit -m deleted
 '
diff --git a/t/t6024-recursive-merge.sh b/t/t6024-recursive-merge.sh
index 3f59e58dfb..354f70a66f 100755
--- a/t/t6024-recursive-merge.sh
+++ b/t/t6024-recursive-merge.sh
@@ -15,47 +15,47 @@ GIT_COMMITTER_DATE="2006-12-12 23:28:00 +0100"
 export GIT_COMMITTER_DATE
 
 test_expect_success "setup tests" '
-echo 1 > a1 &&
+printf "1\n2\n3\n4\n1\n" > a1 &&
 git add a1 &&
 GIT_AUTHOR_DATE="2006-12-12 23:00:00" git commit -m 1 a1 &&
 
 git checkout -b A master &&
-echo A > a1 &&
+printf "1\n2\n3\n4\nA\n" > a1 &&
 GIT_AUTHOR_DATE="2006-12-12 23:00:01" git commit -m A a1 &&
 
 git checkout -b B master &&
-echo B > a1 &&
+printf "1\n2\n3\n4\nB\n" > a1 &&
 GIT_AUTHOR_DATE="2006-12-12 23:00:02" git commit -m B a1 &&
 
 git checkout -b D A &&
 git rev-parse B > .git/MERGE_HEAD &&
-echo D > a1 &&
+printf "1\n2\n3\n4\nD\n" > a1 &&
 git update-index a1 &&
 GIT_AUTHOR_DATE="2006-12-12 23:00:03" git commit -m D &&
 
 git symbolic-ref HEAD refs/heads/other &&
-echo 2 > a1 &&
+printf "1\n2\n3\n4\n2\n" > a1 &&
 GIT_AUTHOR_DATE="2006-12-12 23:00:04" git commit -m 2 a1 &&
 
 git checkout -b C &&
-echo C > a1 &&
+printf "1\n2\n3\n4\nC\n" > a1 &&
 GIT_AUTHOR_DATE="2006-12-12 23:00:05" git commit -m C a1 &&
 
 git checkout -b E C &&
 git rev-parse B > .git/MERGE_HEAD &&
-echo E > a1 &&
+printf "1\n2\n3\n4\nE\n" > a1 &&
 git update-index a1 &&
 GIT_AUTHOR_DATE="2006-12-12 23:00:06" git commit -m E &&
 
 git checkout -b G E &&
 git rev-parse A > .git/MERGE_HEAD &&
-echo G > a1 &&
+printf "1\n2\n3\n4\nG\n" > a1 &&
 git update-index a1 &&
 GIT_AUTHOR_DATE="2006-12-12 23:00:07" git commit -m G &&
 
 git checkout -b F D &&
 git rev-parse C > .git/MERGE_HEAD &&
-echo F > a1 &&
+printf "1\n2\n3\n4\nF\n" > a1 &&
 git update-index a1 &&
 GIT_AUTHOR_DATE="2006-12-12 23:00:08" git commit -m F
 '
@@ -65,6 +65,10 @@ test_expect_success "combined merge conflicts" "
 "
 
 cat > expect << EOF
+1
+2
+3
+4
 <<<<<<< HEAD
 F
 =======
@@ -76,9 +80,9 @@ test_expect_success "result contains a conflict" "test_cmp expect a1"
 
 git ls-files --stage > out
 cat > expect << EOF
-100644 ec3fe2a791706733f2d8fa7ad45d9a9672031f5e 1	a1
-100644 cf84443e49e1b366fac938711ddf4be2d4d1d9e9 2	a1
-100644 fd7923529855d0b274795ae3349c5e0438333979 3	a1
+100644 a2849e152d46797b83870a479775e980d9a138b1 1	a1
+100644 b1a886a67c3f471cec4d4112a91dc6caa6e3b709 2	a1
+100644 e9d143c719e446df2bafb4b6aba8ed511ecd63b1 3	a1
 EOF
 
 test_expect_success "virtual trees were processed" "test_cmp expect out"
@@ -87,10 +91,13 @@ test_expect_success 'refuse to merge binary files' '
 	git reset --hard &&
 	printf "\0" > binary-file &&
 	git add binary-file &&
-	git commit -m binary &&
-	git checkout G &&
+	git commit -m initial-binary &&
 	printf "\0\0" > binary-file &&
 	git add binary-file &&
+	git commit -m binary1 &&
+	git checkout -b H HEAD~1 &&
+	printf "\0\0\0" > binary-file &&
+	git add binary-file &&
 	git commit -m binary2 &&
 	test_must_fail git merge F > merge.out 2> merge.err &&
 	grep "Cannot merge binary files: binary-file (HEAD vs. F)" merge.err
diff --git a/t/t6025-merge-symlinks.sh b/t/t6025-merge-symlinks.sh
index 433c4de08f..08eb85e703 100755
--- a/t/t6025-merge-symlinks.sh
+++ b/t/t6025-merge-symlinks.sh
@@ -37,7 +37,8 @@ test_must_fail git merge master'
 
 test_expect_success \
 'the merge result must be a file' '
-test -f symlink'
+test -f symlink~HEAD &&
+test -f symlink~master'
 
 test_expect_success \
 'merge master into b-file, which has a file instead of a symbolic link' '
@@ -46,7 +47,8 @@ test_must_fail git merge master'
 
 test_expect_success \
 'the merge result must be a file' '
-test -f symlink'
+test -f symlink~HEAD &&
+test -f symlink~master'
 
 test_expect_success \
 'merge b-file, which has a file instead of a symbolic link, into master' '
@@ -56,6 +58,7 @@ test_must_fail git merge b-file'
 
 test_expect_success \
 'the merge result must be a file' '
-test -f symlink'
+test -f symlink~HEAD &&
+test -f symlink~master'
 
 test_done
diff --git a/t/t6031-merge-filemode.sh b/t/t6031-merge-filemode.sh
index 7d06461f13..79e77f391f 100755
--- a/t/t6031-merge-filemode.sh
+++ b/t/t6031-merge-filemode.sh
@@ -40,12 +40,12 @@ do_one_mode resolve b1 a1
 test_expect_success 'set up mode change in both branches' '
 	git reset --hard HEAD &&
 	git checkout -b a2 master &&
-	: >file2 &&
+	echo true >file2 &&
 	H=$(git hash-object file2) &&
 	test_chmod +x file2 &&
 	git commit -m a2 &&
 	git checkout -b b2 master &&
-	: >file2 &&
+	echo true >file2 &&
 	git add file2 &&
 	git commit -m b2 &&
 	{
diff --git a/t/t6042-merge-rename-corner-cases.sh b/t/t6042-merge-rename-corner-cases.sh
index b0b840223b..f06147e6ce 100755
--- a/t/t6042-merge-rename-corner-cases.sh
+++ b/t/t6042-merge-rename-corner-cases.sh
@@ -730,7 +730,7 @@ test_conflicts_with_adds_and_renames() {
 		test_tick && git commit -m R
 	'
 
-	test_expect_failure "check simple $side1/$side2 conflict" '
+	test_expect_success "check simple $side1/$side2 conflict" '
 		git reset --hard &&
 		git checkout L^0 &&
 
diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
index 676e72e9e6..86b17cc9ed 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -886,7 +886,7 @@ test_expect_success '5b-check: Rename/delete in order to get add/add/add conflic
 
 	test 5 -eq $(git ls-files -s | wc -l) &&
 	test 2 -eq $(git ls-files -u | wc -l) &&
-	test 1 -eq $(git ls-files -o | wc -l) &&
+	test 3 -eq $(git ls-files -o | wc -l) &&
 
 	test $(git rev-parse :0:y/b) = $(git rev-parse A:z/b) &&
 	test $(git rev-parse :0:y/c) = $(git rev-parse A:z/c) &&
@@ -895,7 +895,9 @@ test_expect_success '5b-check: Rename/delete in order to get add/add/add conflic
 	test_must_fail git rev-parse :1:y/d &&
 	test $(git rev-parse :2:y/d) = $(git rev-parse B:y/d) &&
 	test $(git rev-parse :3:y/d) = $(git rev-parse C:y/d) &&
-	test -f y/d
+	test -f y/d~HEAD &&
+	test -f y/d~C^0 &&
+	test ! -f y/d
 '
 
 # Testcase 5c, Transitive rename would cause rename/rename/rename/add/add/add
@@ -960,7 +962,7 @@ test_expect_success '5c-check: Transitive rename would cause rename/rename/renam
 
 	test 9 -eq $(git ls-files -s | wc -l) &&
 	test 6 -eq $(git ls-files -u | wc -l) &&
-	test 3 -eq $(git ls-files -o | wc -l) &&
+	test 5 -eq $(git ls-files -o | wc -l) &&
 
 	test $(git rev-parse :0:y/b) = $(git rev-parse A:z/b) &&
 	test $(git rev-parse :0:y/c) = $(git rev-parse A:z/c) &&
@@ -977,8 +979,9 @@ test_expect_success '5c-check: Transitive rename would cause rename/rename/renam
 	test $(git hash-object w/d~HEAD) = $(git rev-parse A:x/d) &&
 	test $(git hash-object w/d~C^0) = $(git rev-parse C:w/d) &&
 	test ! -f x/d &&
-	test -f y/d &&
-	grep -q "<<<<" y/d &&  # conflict markers should be present
+	test ! -f y/d &&
+	test $(git hash-object y/d~HEAD) = $(git rev-parse B:y/d) &&
+	test $(git hash-object y/d~C^0) = $(git rev-parse C:y/d) &&
 	test $(git hash-object z/d) = $(git rev-parse A:x/d)
 '
 
diff --git a/t/t7060-wtstatus.sh b/t/t7060-wtstatus.sh
index 53cf42fac1..a7888f9fc3 100755
--- a/t/t7060-wtstatus.sh
+++ b/t/t7060-wtstatus.sh
@@ -214,6 +214,7 @@ EOF
 	git status --untracked-files=no >actual &&
 	test_i18ncmp expected actual &&
 	git reset --hard &&
+	git clean -f *~* &&
 	git checkout master
 '
 
diff --git a/t/t7064-wtstatus-pv2.sh b/t/t7064-wtstatus-pv2.sh
index e319fa2e84..c4d3b1d7e4 100755
--- a/t/t7064-wtstatus-pv2.sh
+++ b/t/t7064-wtstatus-pv2.sh
@@ -259,14 +259,14 @@ test_expect_success 'verify AA (add-add) conflict' '
 
 	git branch AA_A master &&
 	git checkout AA_A &&
-	echo "Branch AA_A" >conflict.txt &&
+	printf "1\n2\n3\n4\n5\n6\nBranch AA_A" >conflict.txt &&
 	OID_AA_A=$(git hash-object -t blob -- conflict.txt) &&
 	git add conflict.txt &&
 	git commit -m "branch aa_a" &&
 
 	git branch AA_B master &&
 	git checkout AA_B &&
-	echo "Branch AA_B" >conflict.txt &&
+	printf "1\n2\n3\n4\n5\n6\nBranch AA_B" >conflict.txt &&
 	OID_AA_B=$(git hash-object -t blob -- conflict.txt) &&
 	git add conflict.txt &&
 	git commit -m "branch aa_b" &&
diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh
index 9edf6572ed..0514a019be 100755
--- a/t/t7506-status-submodule.sh
+++ b/t/t7506-status-submodule.sh
@@ -240,6 +240,8 @@ test_expect_success 'status -a clean (empty submodule dir)' '
 cat >status_expect <<\EOF
 AA .gitmodules
 A  sub1
+?? .gitmodules~HEAD
+?? .gitmodules~add_sub1
 EOF
 
 test_expect_success 'status with merge conflict in .gitmodules' '
@@ -273,7 +275,7 @@ index badaa4c,44f999a..0000000
 --- a/.gitmodules
 +++ b/.gitmodules
 @@@ -1,3 -1,3 +1,9 @@@
-++<<<<<<< HEAD
+++<<<<<<< ours
  +[submodule "sub2"]
  +	path = sub2
  +	url = ../sub2
@@ -281,7 +283,7 @@ index badaa4c,44f999a..0000000
 + [submodule "sub1"]
 + 	path = sub1
 + 	url = ../sub1
-++>>>>>>> add_sub1
+++>>>>>>> theirs
 EOF
 
 cat >diff_submodule_expect <<\EOF
@@ -290,7 +292,7 @@ index badaa4c,44f999a..0000000
 --- a/.gitmodules
 +++ b/.gitmodules
 @@@ -1,3 -1,3 +1,9 @@@
-++<<<<<<< HEAD
+++<<<<<<< ours
  +[submodule "sub2"]
  +	path = sub2
  +	url = ../sub2
@@ -298,12 +300,13 @@ index badaa4c,44f999a..0000000
 + [submodule "sub1"]
 + 	path = sub1
 + 	url = ../sub1
-++>>>>>>> add_sub1
+++>>>>>>> theirs
 EOF
 
 test_expect_success 'diff with merge conflict in .gitmodules' '
 	(
 		cd super &&
+		git checkout -m .gitmodules &&
 		git diff >../diff_actual 2>&1
 	) &&
 	test_cmp diff_expect diff_actual
diff --git a/t/t7610-mergetool.sh b/t/t7610-mergetool.sh
index 1a430b9c40..71f41fc9a8 100755
--- a/t/t7610-mergetool.sh
+++ b/t/t7610-mergetool.sh
@@ -37,9 +37,9 @@ test_expect_success 'setup' '
 	git checkout -b branch1 master &&
 	git submodule update -N &&
 	echo branch1 change >file1 &&
-	echo branch1 newfile >file2 &&
+	printf "1\n2\n3\n4\n5\n6\n7\n8\n9\nbranch1 newfile\n" >file2 &&
 	echo branch1 spaced >"spaced name" &&
-	echo branch1 both added >both &&
+	printf "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\nbranch1 both added\n" >both &&
 	echo branch1 change file11 >file11 &&
 	echo branch1 change file13 >file13 &&
 	echo branch1 sub >subdir/file3 &&
@@ -84,9 +84,9 @@ test_expect_success 'setup' '
 	git checkout master &&
 	git submodule update -N &&
 	echo master updated >file1 &&
-	echo master new >file2 &&
+	printf "1\n2\n3\n4\n5\n6\n7\n8\n9\nmaster new\n" >file2 &&
 	echo master updated spaced >"spaced name" &&
-	echo master both added >both &&
+	printf "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\nmaster both added\n" >both &&
 	echo master updated file12 >file12 &&
 	echo master updated file14 >file14 &&
 	echo master new sub >subdir/file3 &&
@@ -139,7 +139,7 @@ test_expect_success 'custom mergetool' '
 	( yes "d" | git mergetool file12 >/dev/null 2>&1 ) &&
 	( yes "l" | git mergetool submod >/dev/null 2>&1 ) &&
 	test "$(cat file1)" = "master updated" &&
-	test "$(cat file2)" = "master new" &&
+	test "$(git hash-object file2)" = "$(git rev-parse master:file2)" &&
 	test "$(cat subdir/file3)" = "master new sub" &&
 	test "$(cat submod/bar)" = "branch1 submodule" &&
 	git commit -m "branch1 resolved with mergetool"
@@ -163,7 +163,7 @@ test_expect_success 'mergetool crlf' '
 	( yes "d" | git mergetool file12 >/dev/null 2>&1 ) &&
 	( yes "r" | git mergetool submod >/dev/null 2>&1 ) &&
 	test "$(printf x | cat file1 -)" = "$(printf "master updated\r\nx")" &&
-	test "$(printf x | cat file2 -)" = "$(printf "master new\r\nx")" &&
+	test "$(printf x | cat file2 -)" = "$(printf "1\r\n2\r\n3\r\n4\r\n5\r\n6\r\n7\r\n8\r\n9\r\nmaster new\r\nx")" &&
 	test "$(printf x | cat subdir/file3 -)" = "$(printf "master new sub\r\nx")" &&
 	git submodule update -N &&
 	test "$(cat submod/bar)" = "master submodule" &&
@@ -171,7 +171,7 @@ test_expect_success 'mergetool crlf' '
 '
 
 test_expect_success 'mergetool in subdir' '
-	test_when_finished "git reset --hard" &&
+	test_when_finished "git reset --hard && git clean -f" &&
 	git checkout -b test$test_count branch1 &&
 	git submodule update -N &&
 	(
@@ -183,7 +183,7 @@ test_expect_success 'mergetool in subdir' '
 '
 
 test_expect_success 'mergetool on file in parent dir' '
-	test_when_finished "git reset --hard" &&
+	test_when_finished "git reset --hard && git clean -f" &&
 	git checkout -b test$test_count branch1 &&
 	git submodule update -N &&
 	(
@@ -197,14 +197,14 @@ test_expect_success 'mergetool on file in parent dir' '
 		( yes "d" | git mergetool ../file12 >/dev/null 2>&1 ) &&
 		( yes "l" | git mergetool ../submod >/dev/null 2>&1 ) &&
 		test "$(cat ../file1)" = "master updated" &&
-		test "$(cat ../file2)" = "master new" &&
+		test "$(git hash-object ../file2)" = "$(git rev-parse master:file2)" &&
 		test "$(cat ../submod/bar)" = "branch1 submodule" &&
 		git commit -m "branch1 resolved with mergetool - subdir"
 	)
 '
 
 test_expect_success 'mergetool skips autoresolved' '
-	test_when_finished "git reset --hard" &&
+	test_when_finished "git reset --hard && git clean -f" &&
 	git checkout -b test$test_count branch1 &&
 	git submodule update -N &&
 	test_must_fail git merge master &&
@@ -217,7 +217,7 @@ test_expect_success 'mergetool skips autoresolved' '
 '
 
 test_expect_success 'mergetool merges all from subdir (rerere disabled)' '
-	test_when_finished "git reset --hard" &&
+	test_when_finished "git reset --hard && git clean -f" &&
 	git checkout -b test$test_count branch1 &&
 	test_config rerere.enabled false &&
 	(
@@ -226,7 +226,7 @@ test_expect_success 'mergetool merges all from subdir (rerere disabled)' '
 		( yes "r" | git mergetool ../submod ) &&
 		( yes "d" "d" | git mergetool --no-prompt ) &&
 		test "$(cat ../file1)" = "master updated" &&
-		test "$(cat ../file2)" = "master new" &&
+		test "$(git hash-object ../file2)" = "$(git rev-parse master:file2)" &&
 		test "$(cat file3)" = "master new sub" &&
 		( cd .. && git submodule update -N ) &&
 		test "$(cat ../submod/bar)" = "master submodule" &&
@@ -245,7 +245,7 @@ test_expect_success 'mergetool merges all from subdir (rerere enabled)' '
 		( yes "r" | git mergetool ../submod ) &&
 		( yes "d" "d" | git mergetool --no-prompt ) &&
 		test "$(cat ../file1)" = "master updated" &&
-		test "$(cat ../file2)" = "master new" &&
+		test "$(git hash-object ../file2)" = "$(git rev-parse master:file2)" &&
 		test "$(cat file3)" = "master new sub" &&
 		( cd .. && git submodule update -N ) &&
 		test "$(cat ../submod/bar)" = "master submodule" &&
@@ -631,7 +631,7 @@ test_expect_success 'custom commands override built-ins' '
 	test_config mergetool.defaults.trustExitCode true &&
 	test_must_fail git merge master &&
 	git mergetool --no-prompt --tool defaults -- both &&
-	echo master both added >expected &&
+	git cat-file -p master:both >expected &&
 	test_cmp expected both
 '
 
-- 
2.15.0.46.g41dca04efb


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 8/9] merge-recursive: Accelerate rename detection
  2017-11-10 22:21 [RFC PATCH 0/9] Improve rename detection performance in merge recursive Elijah Newren
                   ` (6 preceding siblings ...)
  2017-11-10 22:21 ` [RFC PATCH 7/9] merge-recursive: Improve handling for add/add conflicts Elijah Newren
@ 2017-11-10 22:21 ` Elijah Newren
  2017-11-10 22:21 ` [RFC PATCH 9/9] diffcore-rename: Filter rename_src list when possible Elijah Newren
  8 siblings, 0 replies; 12+ messages in thread
From: Elijah Newren @ 2017-11-10 22:21 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

If a file is unmodified on one side of history (no content changes, no
name change, and no mode change) and is renamed on the other side, then
the correct merge result is to take both the file name and the file
contents (and file mode) of the renamed file.  merge-recursive detects
this rename and gets the correct merge result.

Note that if no rename detection is done, then this will appear to the
merge machinery as two files: one that was unmodified on one side of
history and deleted on the other (thus the merge should delete it), and
one which was newly added on one side of history (thus the merge should
include it).  Thus, even if the rename wasn't detected, we still would
have ended up with the correct result.

In other words, rename detection is a waste of time for files that were
unmodified on the OTHER side of history.  We can accelerate rename
detection for merges by providing information about the other side of
history, which will allow us to remove all such rename sources from the
list of candidates we care about.

There are two gotchas:

  1) Not trying to detect renames for these types of files can result in
     rename/add conflicts being instead detected as add/add conflicts,
     and can result in rename/rename(2to1) conflicts being instead
     detected as either rename/add or add/add conflicts.  Luckily for
     us, these three types of conflicts happen to make the same changes
     to the index and working tree (what a coincidence...), so this
     isn't a significant issue; the only annoyance is that the stdout
     from the merge command will include a "CONFLICT($type)" message for
     a related conflict type instead of the precise conflict type.

  2) If there is a directory rename on one side of history AND all files
     within the directory are not merely renamed but are modified as
     well AND none of the original files in the directory are modified
     on the other side of history AND there are new files added (or
     moved into) to the original directory on that other side of
     history, then this change will prevent us from being able to detect
     that directory rename and placing the new file(s) into the
     appropriate directory.  A subsequent commit will correct this
     downside.

In one particular testcase involving a large repository and some
high-level directories having been renamed, this cut the time necessary
for a cherry-pick down by a factor of about 8 (from around 4.5 minutes
down to around 34 seconds)

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diff.c            |  1 +
 diff.h            |  3 +++
 diffcore-rename.c | 43 ++++++++++++++++++++++++++++++++++++++-
 merge-recursive.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++++------
 4 files changed, 101 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index c6597e3231..83723ab26d 100644
--- a/diff.c
+++ b/diff.c
@@ -4173,6 +4173,7 @@ void diff_setup(struct diff_options *options)
 	}
 
 	options->color_moved = diff_color_moved_default;
+	options->ignore_for_renames = NULL;
 }
 
 void diff_setup_done(struct diff_options *options)
diff --git a/diff.h b/diff.h
index adf7e92eb5..1288f36fd2 100644
--- a/diff.h
+++ b/diff.h
@@ -196,6 +196,9 @@ struct diff_options {
 	} color_moved;
 	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
 	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
+
+	/* Paths we should ignore for rename purposes */
+	struct string_list *ignore_for_renames;
 };
 
 void diff_emit_submodule_del(struct diff_options *o, const char *line);
diff --git a/diffcore-rename.c b/diffcore-rename.c
index b15d9d74ef..aa8e0e4d4a 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -437,6 +437,40 @@ static int find_renames(struct diff_score *mx, int dst_cnt, int minimum_score, i
 	return count;
 }
 
+static int handle_rename_ignores(struct diff_options *options)
+{
+	int detect_rename = options->detect_rename;
+	struct string_list *ignores = options->ignore_for_renames;
+	int ignored = 0;
+	int i, j;
+
+	/* rename_ignores onlhy relevant when we're not detecting copies */
+	if (ignores == NULL || detect_rename == DIFF_DETECT_COPY)
+		return 0;
+
+	for (i = 0, j = 0; i < ignores->nr && j < rename_src_nr;) {
+		struct diff_filespec *one = rename_src[j].p->one;
+		int cmp;
+
+		if (one->rename_used) {
+			j++;
+			continue;
+		}
+
+		cmp = strcmp(ignores->items[i].string, one->path);
+		if (cmp < 0)
+			i++;
+		else if (cmp > 0)
+			j++;
+		else {
+			one->rename_used++;
+			ignored++;
+		}
+	}
+
+	return ignored;
+}
+
 void diffcore_rename(struct diff_options *options)
 {
 	int detect_rename = options->detect_rename;
@@ -445,7 +479,7 @@ void diffcore_rename(struct diff_options *options)
 	struct diff_queue_struct outq;
 	struct diff_score *mx;
 	int i, j, rename_count, skip_unmodified = 0;
-	int num_create, dst_cnt, num_src;
+	int num_create, dst_cnt, num_src, ignore_count;
 	struct progress *progress = NULL;
 
 	if (!minimum_score)
@@ -506,6 +540,12 @@ void diffcore_rename(struct diff_options *options)
 	if (minimum_score == MAX_SCORE)
 		goto cleanup;
 
+	/*
+	 * Mark source files as used if they are found in the
+	 * ignore_for_renames list.
+	 */
+	ignore_count = handle_rename_ignores(options);
+
 	/*
 	 * Calculate how many renames are left (but all the source
 	 * files still remain as options for rename/copies!)
@@ -513,6 +553,7 @@ void diffcore_rename(struct diff_options *options)
 	num_create = (rename_dst_nr - rename_count);
 	num_src = (detect_rename == DIFF_DETECT_COPY ?
 		   rename_src_nr : rename_src_nr - rename_count);
+	num_src -= ignore_count;
 
 	/* All done? */
 	if (!num_create)
diff --git a/merge-recursive.c b/merge-recursive.c
index 7bc9a2ac80..7dceaf5e9f 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -487,6 +487,50 @@ static struct string_list *get_unmerged(void)
 	return unmerged;
 }
 
+static struct string_list *get_rename_ignore(struct string_list *unmerged)
+{
+	/*
+	 * If a file is unmodified on one side of history (no content
+	 * changes, no mode change, and no name change) and is renamed on
+	 * the other side, then the correct merge result is to take both
+	 * the file name and the file contents (and file mode) of the
+	 * renamed file.  merge-recursive detects this rename and gets the
+	 * correct merge result.
+	 *
+	 * Note that if no rename detection is done, then this will appear
+	 * to the merge machinery as two files: one that was unmodified on
+	 * one side of history and deleted on the other (thus the merge
+	 * should delete it), and one which was newly added on one side of
+	 * history (thus the merge should include it).  Thus, even if the
+	 * rename wasn't detected, we still would have ended up with the
+	 * correct result.
+	 *
+	 * In other words, rename detection is a waste of time for files
+	 * that were unmodified on the OTHER side of history.  We can
+	 * accelerate rename detection for merges by find these sets of
+	 * unmodified files, and feeding them to get_renames so it can
+	 * omit using those files as rename sources.
+	 */
+	struct string_list *rename_ignore;
+	int i;
+
+	rename_ignore = xcalloc(1, sizeof(struct string_list));
+	for (i = 0; i < unmerged->nr; i++) {
+		const char *path = unmerged->items[i].string;
+		struct stage_data *e = unmerged->items[i].util;
+		unsigned ign_head = is_null_oid(&e->stages[2].oid) &&
+		  oid_eq(&e->stages[1].oid, &e->stages[3].oid) &&
+		  e->stages[1].mode == e->stages[3].mode;
+		unsigned ign_merge = is_null_oid(&e->stages[3].oid) &&
+		  oid_eq(&e->stages[1].oid, &e->stages[2].oid) &&
+		  e->stages[1].mode == e->stages[2].mode;
+		if (ign_head || ign_merge)
+			string_list_append(rename_ignore, path);
+	}
+	return rename_ignore;
+}
+
+
 static int string_list_df_name_compare(const char *one, const char *two)
 {
 	int onelen = strlen(one);
@@ -1605,7 +1649,8 @@ static int conflict_rename_rename_2to1(struct merge_options *o,
  */
 static struct diff_queue_struct *get_diffpairs(struct merge_options *o,
 					       struct tree *o_tree,
-					       struct tree *tree)
+					       struct tree *tree,
+					       struct string_list *rename_ignore)
 {
 	struct diff_queue_struct *ret;
 	struct diff_options opts;
@@ -1614,6 +1659,7 @@ static struct diff_queue_struct *get_diffpairs(struct merge_options *o,
 	DIFF_OPT_SET(&opts, RECURSIVE);
 	DIFF_OPT_CLR(&opts, RENAME_EMPTY);
 	opts.detect_rename = DIFF_DETECT_RENAME;
+	opts.ignore_for_renames = rename_ignore;
 	opts.rename_limit = o->merge_rename_limit >= 0 ? o->merge_rename_limit :
 			    o->diff_rename_limit >= 0 ? o->diff_rename_limit :
 			    1000;
@@ -2578,6 +2624,7 @@ static struct rename_info *handle_renames(struct merge_options *o,
 					  struct tree *head,
 					  struct tree *merge,
 					  struct string_list *entries,
+					  struct string_list *rename_ignore,
 					  int *clean)
 {
 	struct rename_info *rei = xcalloc(1, sizeof(struct rename_info));
@@ -2590,8 +2637,8 @@ static struct rename_info *handle_renames(struct merge_options *o,
 	if (!o->detect_rename)
 		return NULL;
 
-	head_pairs = get_diffpairs(o, common, head);
-	merge_pairs = get_diffpairs(o, common, merge);
+	head_pairs = get_diffpairs(o, common, head, rename_ignore);
+	merge_pairs = get_diffpairs(o, common, merge, rename_ignore);
 
 	dir_re_head = get_directory_renames(head_pairs, head);
 	dir_re_merge = get_directory_renames(merge_pairs, merge);
@@ -3090,7 +3137,7 @@ int merge_trees(struct merge_options *o,
 	}
 
 	if (unmerged_cache()) {
-		struct string_list *entries;
+		struct string_list *entries, *rename_ignore;
 		struct rename_info *re_info;
 		int i;
 		/*
@@ -3105,7 +3152,9 @@ int merge_trees(struct merge_options *o,
 		get_files_dirs(o, merge);
 
 		entries = get_unmerged();
-		re_info = handle_renames(o, common, head, merge, entries, &clean);
+		rename_ignore = get_rename_ignore(entries);
+		re_info = handle_renames(o, common, head, merge, entries,
+					 rename_ignore, &clean);
 		record_df_conflict_files(o, entries);
 		if (clean < 0)
 			goto cleanup;
@@ -3131,7 +3180,7 @@ int merge_trees(struct merge_options *o,
 
 cleanup:
 		cleanup_renames(re_info);
-
+		string_list_clear(rename_ignore, 0);
 		string_list_clear(entries, 1);
 		free(entries);
 
-- 
2.15.0.46.g41dca04efb


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH 9/9] diffcore-rename: Filter rename_src list when possible
  2017-11-10 22:21 [RFC PATCH 0/9] Improve rename detection performance in merge recursive Elijah Newren
                   ` (7 preceding siblings ...)
  2017-11-10 22:21 ` [RFC PATCH 8/9] merge-recursive: Accelerate rename detection Elijah Newren
@ 2017-11-10 22:21 ` Elijah Newren
  8 siblings, 0 replies; 12+ messages in thread
From: Elijah Newren @ 2017-11-10 22:21 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren

We have to look at each entry in rename_src a total of rename_dst_nr
times.  When we're not detecting copies, any exact renames or ignorable
rename paths will just be skipped over.  While checking that these can
be skipped over is a relatively cheap check, it's still a waste of time
to do that check more than once, let alone rename_dst_nr times.  When
rename_src_nr is a few thousand times bigger than the number of relevant
sources (such as when cherry-picking a commit that only touched a
handful of files, but from a side of history that has different names
for some high level directories), this time can add up.

First make an initial pass over the rename_src array and move all the
relevant entries to the front, so that we can iterate over just those
relevant entries.

In one particular testcase involving a large repository and some
high-level directories having been renamed, this cut the time necessary
for a cherry-pick down by a factor of about 2 (from around 34 seconds
down to just under 16 seconds)

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c | 47 +++++++++++++++++++++++++++++++----------------
 1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index aa8e0e4d4a..f6fc084891 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -437,16 +437,14 @@ static int find_renames(struct diff_score *mx, int dst_cnt, int minimum_score, i
 	return count;
 }
 
-static int handle_rename_ignores(struct diff_options *options)
+static void handle_rename_ignores(struct diff_options *options)
 {
-	int detect_rename = options->detect_rename;
 	struct string_list *ignores = options->ignore_for_renames;
-	int ignored = 0;
 	int i, j;
 
 	/* rename_ignores onlhy relevant when we're not detecting copies */
-	if (ignores == NULL || detect_rename == DIFF_DETECT_COPY)
-		return 0;
+	if (ignores == NULL)
+		return;
 
 	for (i = 0, j = 0; i < ignores->nr && j < rename_src_nr;) {
 		struct diff_filespec *one = rename_src[j].p->one;
@@ -464,11 +462,27 @@ static int handle_rename_ignores(struct diff_options *options)
 			j++;
 		else {
 			one->rename_used++;
-			ignored++;
+			i++;
+			j++;
 		}
 	}
+}
+
+static int remove_renames_from_src()
+{
+	int j, new_j;
+
+	for (j = 0, new_j = 0; j < rename_src_nr; j++) {
+		if (rename_src[j].p->one->rename_used)
+			continue;
+
+		if (new_j < j)
+			memcpy(&rename_src[new_j], &rename_src[j],
+			       sizeof(struct diff_rename_src));
+		new_j++;
+	}
 
-	return ignored;
+	return new_j;
 }
 
 void diffcore_rename(struct diff_options *options)
@@ -479,7 +493,7 @@ void diffcore_rename(struct diff_options *options)
 	struct diff_queue_struct outq;
 	struct diff_score *mx;
 	int i, j, rename_count, skip_unmodified = 0;
-	int num_create, dst_cnt, num_src, ignore_count;
+	int num_create, dst_cnt, num_src;
 	struct progress *progress = NULL;
 
 	if (!minimum_score)
@@ -542,18 +556,19 @@ void diffcore_rename(struct diff_options *options)
 
 	/*
 	 * Mark source files as used if they are found in the
-	 * ignore_for_renames list.
+	 * ignore_for_renames list, and clean out files from rename_src
+	 * that we don't need to continue considering.
 	 */
-	ignore_count = handle_rename_ignores(options);
+	num_src = rename_src_nr;
+	if (detect_rename != DIFF_DETECT_COPY) {
+		handle_rename_ignores(options);
+		num_src = remove_renames_from_src();
+	}
 
 	/*
-	 * Calculate how many renames are left (but all the source
-	 * files still remain as options for rename/copies!)
+	 * Calculate how many renames are left
 	 */
 	num_create = (rename_dst_nr - rename_count);
-	num_src = (detect_rename == DIFF_DETECT_COPY ?
-		   rename_src_nr : rename_src_nr - rename_count);
-	num_src -= ignore_count;
 
 	/* All done? */
 	if (!num_create)
@@ -588,7 +603,7 @@ void diffcore_rename(struct diff_options *options)
 		for (j = 0; j < NUM_CANDIDATE_PER_DST; j++)
 			m[j].dst = -1;
 
-		for (j = 0; j < rename_src_nr; j++) {
+		for (j = 0; j < num_src; j++) {
 			struct diff_filespec *one = rename_src[j].p->one;
 			struct diff_score this_src;
 
-- 
2.15.0.46.g41dca04efb


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 3/9] merge-recursive: New function for better colliding conflict resolutions
  2017-11-10 22:21 ` [RFC PATCH 3/9] merge-recursive: New function for better colliding conflict resolutions Elijah Newren
@ 2017-11-11 16:49   ` Elijah Newren
  0 siblings, 0 replies; 12+ messages in thread
From: Elijah Newren @ 2017-11-11 16:49 UTC (permalink / raw)
  To: Git Mailing List

On Fri, Nov 10, 2017 at 2:21 PM, Elijah Newren <newren@gmail.com> wrote:
> There are three conflict types that represent two (possibly entirely
> unrelated) files colliding at the same location:
<snip>
> If the files are similar enough, the two-way merge is probably
> preferable, but if they're not similar, recording as separate files is
> probably similar.  (The same logic that applies for the working

That should read, "...recording as separate files is probably preferable".

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH 4/9] Add testcases for improved file collision conflict handling
  2017-11-10 22:21 ` [RFC PATCH 4/9] Add testcases for improved file collision conflict handling Elijah Newren
@ 2017-11-11 16:52   ` Elijah Newren
  0 siblings, 0 replies; 12+ messages in thread
From: Elijah Newren @ 2017-11-11 16:52 UTC (permalink / raw)
  To: Git Mailing List

On Fri, Nov 10, 2017 at 2:21 PM, Elijah Newren <newren@gmail.com> wrote:
> Adds testcases dealing with file collisions for the following types of
> conflicts:
>   * add/add
>   * rename/add
>   * rename/rename(2to1)
<snip>
> ---
>  t/t6036-recursive-corner-cases.sh    |   8 +-

The changes to t6036 were supposed to have been squashed into a later
commit in this series; I apparently flubbed it and squashed it into
the wrong commit.  Will fix together with whatever other feedback I
get on the series before re-posting.

>  t/t6042-merge-rename-corner-cases.sh | 210 +++++++++++++++++++++++++++++++++++
>  2 files changed, 214 insertions(+), 4 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-11-11 16:52 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-10 22:21 [RFC PATCH 0/9] Improve rename detection performance in merge recursive Elijah Newren
2017-11-10 22:21 ` [RFC PATCH 1/9] diffcore-rename: No point trying to find a match better than exact Elijah Newren
2017-11-10 22:21 ` [RFC PATCH 2/9] merge-recursive: Avoid unnecessary string list lookups Elijah Newren
2017-11-10 22:21 ` [RFC PATCH 3/9] merge-recursive: New function for better colliding conflict resolutions Elijah Newren
2017-11-11 16:49   ` Elijah Newren
2017-11-10 22:21 ` [RFC PATCH 4/9] Add testcases for improved file collision conflict handling Elijah Newren
2017-11-11 16:52   ` Elijah Newren
2017-11-10 22:21 ` [RFC PATCH 5/9] merge-recursive: Fix rename/add " Elijah Newren
2017-11-10 22:21 ` [RFC PATCH 6/9] merge-recursive: Improve handling for rename/rename(2to1) conflicts Elijah Newren
2017-11-10 22:21 ` [RFC PATCH 7/9] merge-recursive: Improve handling for add/add conflicts Elijah Newren
2017-11-10 22:21 ` [RFC PATCH 8/9] merge-recursive: Accelerate rename detection Elijah Newren
2017-11-10 22:21 ` [RFC PATCH 9/9] diffcore-rename: Filter rename_src list when possible Elijah Newren

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).