git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v5 0/4] Detection of directory renames
@ 2010-10-09 21:31 Yann Dirson
  2010-10-09 21:31 ` [PATCH v5 1/4] Introduce bulk-move detection in diffcore Yann Dirson
  0 siblings, 1 reply; 9+ messages in thread
From: Yann Dirson @ 2010-10-09 21:31 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson

Evolutions from v4 to v5:

(Jonathan Nieder) use memcpy, not strncpy overkill
(Jonathan Nieder) split 2 funcs out of diffcore_bulk_renames
(Jonathan Nieder) more comments
(Jonathan Nieder) avoid allocating often-useless filespec too early
(Jonathan Nieder) split into new func
(Ævar Arnfjörð Bjarmason) convert C99 comments intended to stay to C89 ones

(Jonathan Nieder, Ævar Arnfjörð Bjarmason) reworked style of test script
(Jonathan Nieder, Sverre Rabbelier) use compare_diff_raw, not compare_diff_patch;
	anonymize hashes and scores
(Jonathan Nieder) use "git commit" in test instead of only plumbing, and use test_tick

listed a number of additional FIXME's
changed wording "factorization" -> "bulk move"
changed --detect-dir-renames to --detect-bulk-moves and
	--hide-dir-rename-details to --hide-bulk-move-details
first attempt at showing the "/*" suffix in bulk moves, for comment
stop maintaining testcases for --hide-bulk-move-details, I'm just keeping
	the patch adding the flag in sync with other code changes anyway.


Yann Dirson (4):
  Introduce bulk-move detection in diffcore.
  Add testcases for the --detect-bulk-moves diffcore flag.
  [RFC] Only show bulkmoves in output.
  [RFC] Allow hiding renames of individual files involved in a
    directory rename.

 diff-lib.c                       |    6 +-
 diff.c                           |   12 ++
 diff.h                           |    6 +
 diffcore-rename.c                |  349 +++++++++++++++++++++++++++++++++++++-
 diffcore.h                       |    1 +
 t/t4046-diff-rename-factorize.sh |  251 +++++++++++++++++++++++++++
 tree-diff.c                      |    4 +-
 7 files changed, 617 insertions(+), 12 deletions(-)
 create mode 100755 t/t4046-diff-rename-factorize.sh

-- 
1.7.2.3

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v5 1/4] Introduce bulk-move detection in diffcore.
  2010-10-09 21:31 [PATCH v5 0/4] Detection of directory renames Yann Dirson
@ 2010-10-09 21:31 ` Yann Dirson
  2010-10-09 21:31   ` [PATCH v5 2/4] Add testcases for the --detect-bulk-moves diffcore flag Yann Dirson
  0 siblings, 1 reply; 9+ messages in thread
From: Yann Dirson @ 2010-10-09 21:31 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson

This feature tries to group together files moving from and to
identical directories - the most common case being directory renames.

It is activated by the new --detect-bulk-moves diffcore
flag.

This is only the first step, adding the basic functionnality and
adding support to raw diff output (and it is not supported in
unified-diff output yet)

The output of raw diff is displayed as "Rnnn a/* b/".

Other future developements to be made on top of this include:
* extension of unified-diff format to express this
* display as such the special case of directory move/rename
* application of such new diffs: issue a conflict, or just a warning ?
* teach git-svn (and others ?) to make use of that flag
* handle new conflict type "bulk-move/add"
* detect "directory splits" as well
* use inexact dir renames to bump score of below-threshold renames
  from/to same locations
* add yours here
---
 diff-lib.c        |    6 +-
 diff.c            |    5 +
 diff.h            |    3 +
 diffcore-rename.c |  278 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 tree-diff.c       |    4 +-
 5 files changed, 285 insertions(+), 11 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index 392ce2b..5ec3ddc 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -208,7 +208,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 						    ce_option, &dirty_submodule);
 		if (!changed && !dirty_submodule) {
 			ce_mark_uptodate(ce);
-			if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
+			if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) &&
+			    !DIFF_OPT_TST(&revs->diffopt, DETECT_BULK_MOVES))
 				continue;
 		}
 		oldmode = ce->ce_mode;
@@ -338,7 +339,8 @@ static int show_modified(struct rev_info *revs,
 
 	oldmode = old->ce_mode;
 	if (mode == oldmode && !hashcmp(sha1, old->sha1) && !dirty_submodule &&
-	    !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
+	    !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) &&
+	    !DIFF_OPT_TST(&revs->diffopt, DETECT_BULK_MOVES))
 		return 0;
 
 	diff_change(&revs->diffopt, oldmode, mode,
diff --git a/diff.c b/diff.c
index 71efa8e..64e01b3 100644
--- a/diff.c
+++ b/diff.c
@@ -3188,6 +3188,11 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		DIFF_OPT_SET(options, REVERSE_DIFF);
 	else if (!strcmp(arg, "--find-copies-harder"))
 		DIFF_OPT_SET(options, FIND_COPIES_HARDER);
+	else if (!strcmp(arg, "--detect-bulk-moves")) {
+		DIFF_OPT_SET(options, DETECT_BULK_MOVES);
+		if (!options->detect_rename)
+			options->detect_rename = DIFF_DETECT_RENAME;
+	}
 	else if (!strcmp(arg, "--follow"))
 		DIFF_OPT_SET(options, FOLLOW_RENAMES);
 	else if (!strcmp(arg, "--color"))
diff --git a/diff.h b/diff.h
index 1fd44f5..b0d6fa6 100644
--- a/diff.h
+++ b/diff.h
@@ -78,6 +78,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
 #define DIFF_OPT_IGNORE_UNTRACKED_IN_SUBMODULES (1 << 25)
 #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
 #define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
+#define DIFF_OPT_DETECT_BULK_MOVES  (1 << 28)
 
 #define DIFF_OPT_TST(opts, flag)    ((opts)->flags & DIFF_OPT_##flag)
 #define DIFF_OPT_SET(opts, flag)    ((opts)->flags |= DIFF_OPT_##flag)
@@ -265,6 +266,8 @@ extern void diffcore_fix_diff_index(struct diff_options *);
 "  -C            detect copies.\n" \
 "  --find-copies-harder\n" \
 "                try unchanged files as candidate for copy detection.\n" \
+"  --detect-bulk-moves\n" \
+"                detect wholesale directory renames.\n" \
 "  -l<n>         limit rename attempts up to <n> paths.\n" \
 "  -O<file>      reorder diffs according to the <file>.\n" \
 "  -S<string>    find filepair whose only one side contains the string.\n" \
diff --git a/diffcore-rename.c b/diffcore-rename.c
index df41be5..18a0605 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -11,9 +11,13 @@
 static struct diff_rename_dst {
 	struct diff_filespec *two;
 	struct diff_filepair *pair;
+	int i_am_not_single:1; /* does not look for a match, only here to be looked at */
 } *rename_dst;
 static int rename_dst_nr, rename_dst_alloc;
 
+/*
+ * Do a binary search of "two" in "rename_dst", inserting it if not found.
+ */
 static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two,
 						 int insert_ok)
 {
@@ -49,9 +53,36 @@ static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two,
 	rename_dst[first].two = alloc_filespec(two->path);
 	fill_filespec(rename_dst[first].two, two->sha1, two->mode);
 	rename_dst[first].pair = NULL;
+	rename_dst[first].i_am_not_single = 0;
 	return &(rename_dst[first]);
 }
 
+/*
+ * Do a binary search in "rename_dst" of any entry under "dirname".
+ */
+static struct diff_rename_dst *locate_rename_dst_dir(const char *dirname)
+{
+	int first, last;
+	int prefixlength = strlen(dirname);
+
+	first = 0;
+	last = rename_dst_nr;
+	while (last > first) {
+		int next = (last + first) >> 1;
+		struct diff_rename_dst *dst = &(rename_dst[next]);
+		int cmp = strncmp(dirname, dst->two->path, prefixlength);
+		if (!cmp)
+			return dst;
+		if (cmp < 0) {
+			last = next;
+			continue;
+		}
+		first = next+1;
+	}
+	/* not found */
+	return NULL;
+}
+
 /* Table of rename/copy src files */
 static struct diff_rename_src {
 	struct diff_filespec *one;
@@ -386,8 +417,11 @@ static int find_exact_renames(void)
 	for (i = 0; i < rename_src_nr; i++)
 		insert_file_table(&file_table, -1, i, rename_src[i].one);
 
-	for (i = 0; i < rename_dst_nr; i++)
+	for (i = 0; i < rename_dst_nr; i++) {
+		if (rename_dst[i].i_am_not_single)
+			continue;
 		insert_file_table(&file_table, 1, i, rename_dst[i].two);
+	}
 
 	/* Find the renames */
 	i = for_each_hash(&file_table, find_same_files);
@@ -414,6 +448,201 @@ static void record_if_better(struct diff_score m[], struct diff_score *o)
 		m[worst] = *o;
 }
 
+struct diff_dir_rename {
+	struct diff_filespec *one;
+	struct diff_filespec *two;
+	int discarded;
+	struct diff_dir_rename* next;
+};
+
+// FIXME: replace with JN's version
+/*
+ * Copy dirname of src into dst, suitable to append a filename without
+ * an additional "/".
+ * Only handles relative paths since there is no absolute path in a git repo.
+ * Writes "" when there is no "/" in src.
+ * May overwrite more chars than really needed, if src ends with a "/".
+ */
+static const char* copy_dirname(char* dst, const char* src)
+{
+	char* lastslash = strrchr(src, '/');
+	if (!lastslash) {
+		*dst = '\0';
+		return dst;
+	}
+	memcpy(dst, src, lastslash - src + 1);
+	dst[lastslash - src + 1] = '\0';
+
+	/* if src ends with a "/" strip the last component */
+	if (lastslash[1] == '\0') {
+		lastslash = strrchr(dst, '/');
+		if (!lastslash)
+			return strcpy(dst, ".");
+		lastslash[1] = '\0';
+	}
+
+	return dst;
+}
+
+// FIXME: investigate avoiding i_am_not_single by using a separate list
+// FIXME: we could optimize the 100%-rename case by preventing
+//        recursion to unfold what we know we would refold here.
+// FIXME: check what happens when bulk move interferes with copy+move detection
+//        (which may have selected the wrong target for move)
+// FIXME: do we want to replace linked list with sorted array ?
+// FIXME: this prototype only properly handles renaming of dirs without a subdir.
+// FIXME: leaks like hell.
+// FIXME: ideas to evaluate a similarity score, anyone ?
+//        10% * tree similarity + 90% * moved files similarity ?
+// FIXME: ideas to consider under-threshold moves as part of bulk move ?
+
+// FIXME: handle as sorted list to allow for binary search ?
+static struct diff_dir_rename* bulkmove_candidates = NULL;
+
+/* See if the fact that one_leftover exists under one_parent_path in
+ * dst tree should disqualify one_parent_path from bulkmove eligibility.
+ * Return 1 if it disqualifies, 0 if it is OK.
+ */
+static int maybe_disqualify_bulkmove(const char* one_parent_path,
+				       struct diff_rename_dst* one_leftover)
+{
+	// FIXME: should only be run if !seen
+	/* this might be a dir split, or files added
+	 * after the bulk move, or just an isolated
+	 * rename */
+	/* FIXME: should we cache this information to avoid
+	 * recomputing that result when many files trigger it ?
+	 * See eg. git.git:761e742d692.
+	 */
+	int two_idx, j, onep_len, maybe_dir_rename;
+
+	/* try to see if it is a file added after the bulk move */
+	two_idx = one_leftover - rename_dst;
+	onep_len = strlen(one_parent_path);
+	maybe_dir_rename = 1;
+
+	/* check no leftover file was already here before */
+	for (j = two_idx; j < rename_dst_nr; j++) {
+		if (strncmp(rename_dst[j].two->path,
+			    one_parent_path, onep_len))
+			break; /* exhausted directory in this direction */
+		fprintf(stderr, "[DBG] leftover file %s in %s\n",
+			rename_dst[j].two->path, one_parent_path);
+		if (rename_dst[j].i_am_not_single || /* those were already here */
+		    (rename_dst[j].pair &&
+		     !strncmp(rename_dst[j].pair->one->path,
+			      one_parent_path, onep_len) && /* renamed from here */
+		     strncmp(rename_dst[j].two->path,
+			     one_parent_path, onep_len))) { /* not to a subdir */
+			maybe_dir_rename = 0;
+			fprintf(stderr, "[DBG] ... tells not a bulk move\n");
+			break;
+		}
+		fprintf(stderr, "[DBG] ... not believed to prevent bulk move\n");
+	}
+	if (!maybe_dir_rename) return 1;
+	/* try the other direction (dup code) */
+	for (j = two_idx-1; j >= 0; j--) {
+		if (strncmp(rename_dst[j].two->path,
+			    one_parent_path, onep_len))
+			break; /* exhausted directory in this direction */
+		fprintf(stderr, "[DBG] leftover file %s in %s\n",
+			rename_dst[j].two->path, one_parent_path);
+		if (rename_dst[j].i_am_not_single || /* those were already here */
+		    (rename_dst[j].pair &&
+		     !strncmp(rename_dst[j].pair->one->path,
+			      one_parent_path, onep_len) && /* renamed from here */
+		     strncmp(rename_dst[j].two->path,
+			     one_parent_path, onep_len))) { /* not to a subdir */
+			maybe_dir_rename = 0;
+			fprintf (stderr, "[DBG] ... tells not a bulk move\n");
+			break;
+		}
+		fprintf(stderr, "[DBG] ... not believed to prevent bulk move\n");
+	}
+	if (!maybe_dir_rename) return 1;
+
+	/* Here we are in the case where a directory
+	 * content was completely moved, but files
+	 * were added to it afterwards.  Proceed as
+	 * for a simple bulk move. */
+	return 0;
+}
+
+/*
+ * Check if the rename specified by "dstpair" could cause a
+ * bulk move to be detected, record it in bulkmove_candidates if yes.
+ */
+static void check_one_bulk_move(struct diff_filepair *dstpair)
+{
+	char one_parent_path[PATH_MAX], two_parent_path[PATH_MAX];
+	struct diff_dir_rename* seen;
+
+	/* genuine new files (or believed to be so) */
+	if (!dstpair)
+		return;
+	/* dummy renames used by copy detection */
+	if (!strcmp(dstpair->one->path, dstpair->two->path))
+		return;
+
+	copy_dirname(one_parent_path, dstpair->one->path);
+	copy_dirname(two_parent_path, dstpair->two->path);
+
+	/* simple rename with no directory change */
+	if (!strcmp(one_parent_path, two_parent_path))
+		return;
+
+	/* After this commit, are there any files still under one_parent_path ?
+	 * Any file left would disqualifies this dir for bulk move.
+	 */
+	struct diff_rename_dst* one_leftover = locate_rename_dst_dir(one_parent_path);
+	if (one_leftover &&
+	    maybe_disqualify_bulkmove(one_parent_path, one_leftover))
+		return;
+
+	/* already considered ? */
+	for (seen=bulkmove_candidates; seen; seen = seen->next)
+		if (!strcmp(seen->one->path, one_parent_path)) break;
+	if (!seen) {
+		/* record potential dir rename */
+		seen = xmalloc(sizeof(*seen));
+		seen->one = alloc_filespec(one_parent_path);
+		fill_filespec(seen->one, null_sha1 /*FIXME*/, S_IFDIR);
+		seen->two = alloc_filespec(two_parent_path);
+		fill_filespec(seen->two, null_sha1 /*FIXME*/, S_IFDIR);
+		seen->discarded = 0;
+		seen->next = bulkmove_candidates;
+		bulkmove_candidates = seen;
+		fprintf(stderr, "[DBG] %s -> %s suggests possible rename from %s to %s\n",
+			dstpair->one->path,
+			dstpair->two->path,
+			one_parent_path, two_parent_path);
+		return;
+	}
+	if (seen->discarded)
+		/* already seen a rename from seen->one to some than ->two */
+		return;
+	/* check that seen entry matches this rename */
+	if (strcmp(two_parent_path, seen->two->path)) {
+		fprintf(stderr, "[DBG] discarding dir %s from renames (split into %s and %s)\n",
+			one_parent_path, two_parent_path, seen->two->path);
+		seen->discarded = 1;
+	}
+
+	/* all checks ok, we keep that entry */
+}
+
+/*
+ * Take all file renames recorded so far and check if they could cause
+ * a bulk move to be detected.
+ */
+static void diffcore_bulk_moves(void)
+{
+	int i;
+	for (i = 0; i < rename_dst_nr; i++)
+		check_one_bulk_move(rename_dst[i].pair);
+}
+
 void diffcore_rename(struct diff_options *options)
 {
 	int detect_rename = options->detect_rename;
@@ -451,13 +680,23 @@ void diffcore_rename(struct diff_options *options)
 				p->one->rename_used++;
 			register_rename_src(p->one, p->score);
 		}
-		else if (detect_rename == DIFF_DETECT_COPY) {
-			/*
-			 * Increment the "rename_used" score by
-			 * one, to indicate ourselves as a user.
-			 */
-			p->one->rename_used++;
-			register_rename_src(p->one, p->score);
+		else {
+			if (detect_rename == DIFF_DETECT_COPY) {
+				/*
+				 * Increment the "rename_used" score by
+				 * one, to indicate ourselves as a user.
+				 */
+				p->one->rename_used++;
+				register_rename_src(p->one, p->score);
+			}
+			if (DIFF_OPT_TST(options, DETECT_BULK_MOVES)) {
+				/* similarly, bulk move detection needs to
+				 * see all files from second tree, but we don't
+				 * want them to be matched against single sources.
+				 */
+				// FIXME: check interaction with --find-copies-harder
+				locate_rename_dst(p->two, 1)->i_am_not_single = 1;
+			}
 		}
 	}
 	if (rename_dst_nr == 0 || rename_src_nr == 0)
@@ -509,6 +748,8 @@ void diffcore_rename(struct diff_options *options)
 
 		if (rename_dst[i].pair)
 			continue; /* dealt with exact match already. */
+		if (rename_dst[i].i_am_not_single)
+			continue; /* not looking for a match. */
 
 		m = &mx[dst_cnt * NUM_CANDIDATE_PER_DST];
 		for (j = 0; j < NUM_CANDIDATE_PER_DST; j++)
@@ -569,7 +810,28 @@ void diffcore_rename(struct diff_options *options)
 	/* At this point, we have found some renames and copies and they
 	 * are recorded in rename_dst.  The original list is still in *q.
 	 */
+
+	/* Now possibly factorize those renames and copies. */
+	if (DIFF_OPT_TST(options, DETECT_BULK_MOVES))
+		diffcore_bulk_moves();
+
 	DIFF_QUEUE_CLEAR(&outq);
+
+	/* Now turn non-discarded bulkmove_candidates into real renames */
+	struct diff_dir_rename* candidate;
+	for (candidate=bulkmove_candidates; candidate; candidate = candidate->next) {
+		struct diff_filepair* pair;
+		if (candidate->discarded) continue;
+		/* visualize toplevel dir if needed */ //FIXME: wrong place for this ?
+		if (!*candidate->one->path)
+			candidate->one->path = "./";
+		if (!*candidate->two->path)
+			candidate->two->path = "./";
+		pair = diff_queue(&outq, candidate->one, candidate->two);
+		pair->score = MAX_SCORE;
+		pair->renamed_pair = 1;
+	}
+
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
 		struct diff_filepair *pair_to_free = NULL;
diff --git a/tree-diff.c b/tree-diff.c
index cd659c6..5d9f123 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -49,7 +49,9 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const
 		show_entry(opt, "+", t2, base, baselen);
 		return 1;
 	}
-	if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) && !hashcmp(sha1, sha2) && mode1 == mode2)
+	if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) &&
+	    !DIFF_OPT_TST(opt, DETECT_BULK_MOVES) &&
+	    !hashcmp(sha1, sha2) && mode1 == mode2)
 		return 0;
 
 	/*
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v5 2/4] Add testcases for the --detect-bulk-moves diffcore flag.
  2010-10-09 21:31 ` [PATCH v5 1/4] Introduce bulk-move detection in diffcore Yann Dirson
@ 2010-10-09 21:31   ` Yann Dirson
  2010-10-09 21:31     ` [PATCH v5 3/4] [RFC] Only show bulkmoves in output Yann Dirson
  0 siblings, 1 reply; 9+ messages in thread
From: Yann Dirson @ 2010-10-09 21:31 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson

From: Yann Dirson <ydirson@free.fr>

This notably includes a couple of tests for cases known not to be
working correctly yet.
---
 t/t4046-diff-rename-factorize.sh |  251 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 251 insertions(+), 0 deletions(-)
 create mode 100755 t/t4046-diff-rename-factorize.sh

diff --git a/t/t4046-diff-rename-factorize.sh b/t/t4046-diff-rename-factorize.sh
new file mode 100755
index 0000000..d063e25
--- /dev/null
+++ b/t/t4046-diff-rename-factorize.sh
@@ -0,0 +1,251 @@
+#!/bin/sh
+#
+# Copyright (c) 2008,2010 Yann Dirson
+# Copyright (c) 2005 Junio C Hamano
+#
+
+# TODO - missing tests:
+# * two dirs or more moving all their files to a single dir
+# * simultaneous bulkmove and rename
+# * add a new file under a dir that was moved in the same commit
+# * copy of a bulkmoved file, when scores would otherwise select the
+#   wrong one as "moved"
+
+test_description='Test rename factorization in diff engine.
+
+'
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/diff-lib.sh
+
+test_expect_success 'setup' '
+	git commit --allow-empty -m "original empty commit"
+
+	mkdir a &&
+	printf "Line %s\n" 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 >a/path0 &&
+	sed <a/path0 >a/path1 s/Line/Record/ &&
+	sed <a/path0 >a/path2 s/Line/Stuff/ &&
+	sed <a/path0 >a/path3 s/Line/Blurb/ &&
+
+	git update-index --add a/path* &&
+	test_tick &&
+	git commit -m "original set of files"
+
+	: rename the directory &&
+	mv a b &&
+	git update-index --add --remove a/path0 a/path1 a/path2 a/path3 b/path*
+'
+
+test_expect_success 'diff-index --detect-bulk-moves after directory move.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/	b/
+	:100644 100644 X X R#	a/path0	b/path0
+	:100644 100644 X X R#	a/path1	b/path1
+	:100644 100644 X X R#	a/path2	b/path2
+	:100644 100644 X X R#	a/path3	b/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+
+test_expect_success 'setup non-100% rename' '
+	echo "Line 16" >>b/path0 &&
+	mv b/path2 b/2path &&
+	rm b/path3 &&
+	echo anything >b/path100 &&
+	git update-index --add --remove b/* b/path2 b/path3
+'
+
+test_expect_success 'diff-index --detect-bulk-moves after content changes.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/	b/
+	:100644 000000 X X D#	a/path3
+	:100644 100644 X X R#	a/path2	b/2path
+	:100644 100644 X X R#	a/path0	b/path0
+	:100644 100644 X X R#	a/path1	b/path1
+	:000000 100644 X X A#	b/path100
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+
+test_expect_success 'setup bulk move that is not directory move' '
+	git reset -q --hard &&
+
+	mkdir c &&
+	(
+		for i in 0 1 2; do
+			cp a/path$i c/apath$i || exit
+		done
+	) &&
+	git update-index --add c/apath* &&
+	test_tick &&
+	git commit -m "first set of changes" &&
+
+	mv c/* a/ &&
+	git update-index --add --remove a/* c/apath0 c/apath1 c/apath2
+'
+
+test_expect_failure 'diff-index --detect-bulk-moves without full-dir rename.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	c/*	a/
+	:100644 100644 X X R#	c/apath0	a/apath0
+	:100644 100644 X X R#	c/apath1	a/apath1
+	:100644 100644 X X R#	c/apath2	a/apath2
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+
+test_expect_success 'setup bulk move to toplevel' '
+	git reset -q --hard &&
+	mv c/* . &&
+	git update-index --add --remove apath* c/apath0 c/apath1 c/apath2
+'
+
+test_expect_failure 'diff-index --detect-bulk-moves bulk move to toplevel.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	c/*	./
+	:100644 100644 X X R#	c/apath0	apath0
+	:100644 100644 X X R#	c/apath1	apath1
+	:100644 100644 X X R#	c/apath2	apath2
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+
+test_expect_success 'setup move including a subdir, with some content changes' '
+	git reset -q --hard &&
+	mv c a/ &&
+	git update-index --add --remove a/c/* c/apath0 c/apath1 c/apath2 &&
+	test_tick &&
+	git commit -m "move as subdir" &&
+
+	mv a b &&
+	echo foo >>b/c/apath0 &&
+	git update-index --add --remove a/path0 a/path1 a/path2 a/path3 \
+			a/c/apath0 a/c/apath1 a/c/apath2 b/path* b/c/apath*
+'
+
+test_expect_failure 'diff-index --detect-bulk-moves on a move including a subdir.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/	b/
+	:100644 100644 X X R#	a/c/apath0	b/c/apath0
+	:100644 100644 X X R#	a/c/apath1	b/c/apath1
+	:100644 100644 X X R#	a/c/apath2	b/c/apath2
+	:100644 100644 X X R#	a/path0	b/path0
+	:100644 100644 X X R#	a/path1	b/path1
+	:100644 100644 X X R#	a/path2	b/path2
+	:100644 100644 X X R#	a/path3	b/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+
+test_expect_success 'setup move without a subdir' '
+	git reset -q --hard &&
+	mkdir b &&
+	mv a/path* b/ &&
+	: rename files in the directory but not subdir. &&
+	git update-index --add --remove a/path0 a/path1 a/path2 a/path3 b/path*
+'
+
+test_expect_success 'moving files but not subdirs is not mistaken for dir move' '
+	cat >expected <<-EOF &&
+	:100644 100644 X X R#	a/path0	b/path0
+	:100644 100644 X X R#	a/path1	b/path1
+	:100644 100644 X X R#	a/path2	b/path2
+	:100644 100644 X X R#	a/path3	b/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+
+test_expect_success 'setup move of files and subdirs to different places' '
+	git reset -q --hard &&
+	mv a/c b &&
+	mv a d &&
+	git update-index --add --remove a/path0 a/path1 a/path2 a/path3 \
+		a/c/apath0 a/c/apath1 a/c/apath2 d/path* b/apath*
+'
+
+test_expect_failure 'moving subdirs into one dir and files into another is not mistaken for dir move' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/c/	b/
+	:100644 100644 X X R#	a/c/apath0	b/apath0
+	:100644 100644 X X R#	a/c/apath1	b/apath1
+	:100644 100644 X X R#	a/c/apath2	b/apath2
+	:100644 100644 X X R#	a/path0	d/path0
+	:100644 100644 X X R#	a/path1	d/path1
+	:100644 100644 X X R#	a/path2	d/path2
+	:100644 100644 X X R#	a/path3	d/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+
+test_expect_success 'setup move of dir with only subdirs' '
+	git reset -q --hard &&
+	mkdir a/b &&
+	mv a/path* a/b/ &&
+	git update-index --add --remove a/path0 a/path1 a/path2 a/path3 a/b/path* &&
+	test_tick &&
+	git commit -m "move all toplevel files down one level" &&
+
+	git mv a z
+'
+
+test_expect_failure 'moving a dir with no files' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/	z/
+	:040000 040000 X X R#	a/b/	z/b/
+	:040000 040000 X X R#	a/c/	z/c/
+	:100644 100644 X X R#	a/b/path0	z/b/path0
+	:100644 100644 X X R#	a/b/path1	z/b/path1
+	:100644 100644 X X R#	a/b/path2	z/b/path2
+	:100644 100644 X X R#	a/b/path3	z/b/path3
+	:100644 100644 X X R#	a/c/apath0	z/c/apath0
+	:100644 100644 X X R#	a/c/apath1	z/c/apath1
+	:100644 100644 X X R#	a/c/apath2	z/c/apath2
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+# now test moving all files from toplevel into subdir (does not hides file moves) (needs consensus on syntax)
+# Note: this is a special case of move of a dir into one of its own subdirs, which in
+# turn is a variant of new files/dirs being added into a dir after all its contents
+# are moved away
+
+test_expect_success 'setup move from toplevel to subdir' '
+	git reset -q --hard HEAD~3 &&
+	mv a/* . &&
+	git update-index --add --remove a/path0 a/path1 a/path2 a/path3 path* &&
+	test_tick &&
+	git commit -m "move all files to toplevel" &&
+
+	mkdir z &&
+	mv path* z/ &&
+	git update-index --add --remove path0 path1 path2 path3 z/path*
+'
+
+test_expect_failure '--detect-bulk-moves everything from toplevel.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	./*	z/
+	:100644 100644 X X R#	path0	z/path0
+	:100644 100644 X X R#	path1	z/path1
+	:100644 100644 X X R#	path2	z/path2
+	:100644 100644 X X R#	path3	z/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+
+test_done
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v5 3/4] [RFC] Only show bulkmoves in output.
  2010-10-09 21:31   ` [PATCH v5 2/4] Add testcases for the --detect-bulk-moves diffcore flag Yann Dirson
@ 2010-10-09 21:31     ` Yann Dirson
  2010-10-09 21:31       ` [PATCH v5 4/4] [RFC] Allow hiding renames of individual files involved in a directory rename Yann Dirson
  2010-10-10 12:39       ` [PATCH v5 3/4] [RFC] Only show bulkmoves in output Yann Dirson
  0 siblings, 2 replies; 9+ messages in thread
From: Yann Dirson @ 2010-10-09 21:31 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson

In theory we could just append those at display time (possibly diluting
the code too much ?), or before queing the diff.  In practice we cannot
do the latter easily in either case (strcat to a constant string ("./"),
nor to the paths tighly-allocated from alloc_filespec).

So I went the way of including the trailing "*" from the beginning.
Since I'm still unsure whether keeping it that way, I leave it as a
separated patch for now.

After this patch only one of the expected failures can be considered
innocuous.
---
 diffcore-rename.c                |   22 +++++++++++++++-------
 t/t4046-diff-rename-factorize.sh |   22 +++++++++++-----------
 2 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 18a0605..6d21792 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -588,6 +588,13 @@ static void check_one_bulk_move(struct diff_filepair *dstpair)
 	copy_dirname(one_parent_path, dstpair->one->path);
 	copy_dirname(two_parent_path, dstpair->two->path);
 
+	/* Visualize toplevel dir if needed.  Need it here because of
+	 * "*" handling below. */
+	if (!*one_parent_path)
+		strcpy(one_parent_path, "./");
+	if (!*two_parent_path)
+		strcpy(two_parent_path, "./");
+
 	/* simple rename with no directory change */
 	if (!strcmp(one_parent_path, two_parent_path))
 		return;
@@ -600,11 +607,16 @@ static void check_one_bulk_move(struct diff_filepair *dstpair)
 	    maybe_disqualify_bulkmove(one_parent_path, one_leftover))
 		return;
 
+	/* For output format reason we want this "*" for the general
+	 * case of bulk moves.  Most be here for alloc_filespec to
+	 * reserve enough space, and to allow for proper comparison
+	 * with those previously recorded in bulkmove_candidates. */
+	strcat(one_parent_path, "*");
+
 	/* already considered ? */
 	for (seen=bulkmove_candidates; seen; seen = seen->next)
 		if (!strcmp(seen->one->path, one_parent_path)) break;
-	if (!seen) {
-		/* record potential dir rename */
+	if (!seen) { /* record potential dir rename */
 		seen = xmalloc(sizeof(*seen));
 		seen->one = alloc_filespec(one_parent_path);
 		fill_filespec(seen->one, null_sha1 /*FIXME*/, S_IFDIR);
@@ -822,11 +834,7 @@ void diffcore_rename(struct diff_options *options)
 	for (candidate=bulkmove_candidates; candidate; candidate = candidate->next) {
 		struct diff_filepair* pair;
 		if (candidate->discarded) continue;
-		/* visualize toplevel dir if needed */ //FIXME: wrong place for this ?
-		if (!*candidate->one->path)
-			candidate->one->path = "./";
-		if (!*candidate->two->path)
-			candidate->two->path = "./";
+
 		pair = diff_queue(&outq, candidate->one, candidate->two);
 		pair->score = MAX_SCORE;
 		pair->renamed_pair = 1;
diff --git a/t/t4046-diff-rename-factorize.sh b/t/t4046-diff-rename-factorize.sh
index d063e25..9353929 100755
--- a/t/t4046-diff-rename-factorize.sh
+++ b/t/t4046-diff-rename-factorize.sh
@@ -37,7 +37,7 @@ test_expect_success 'setup' '
 
 test_expect_success 'diff-index --detect-bulk-moves after directory move.' '
 	cat >expected <<-EOF &&
-	:040000 040000 X X R#	a/	b/
+	:040000 040000 X X R#	a/*	b/
 	:100644 100644 X X R#	a/path0	b/path0
 	:100644 100644 X X R#	a/path1	b/path1
 	:100644 100644 X X R#	a/path2	b/path2
@@ -58,7 +58,7 @@ test_expect_success 'setup non-100% rename' '
 
 test_expect_success 'diff-index --detect-bulk-moves after content changes.' '
 	cat >expected <<-EOF &&
-	:040000 040000 X X R#	a/	b/
+	:040000 040000 X X R#	a/*	b/
 	:100644 000000 X X D#	a/path3
 	:100644 100644 X X R#	a/path2	b/2path
 	:100644 100644 X X R#	a/path0	b/path0
@@ -87,7 +87,7 @@ test_expect_success 'setup bulk move that is not directory move' '
 	git update-index --add --remove a/* c/apath0 c/apath1 c/apath2
 '
 
-test_expect_failure 'diff-index --detect-bulk-moves without full-dir rename.' '
+test_expect_success 'diff-index --detect-bulk-moves without full-dir rename.' '
 	cat >expected <<-EOF &&
 	:040000 040000 X X R#	c/*	a/
 	:100644 100644 X X R#	c/apath0	a/apath0
@@ -105,7 +105,7 @@ test_expect_success 'setup bulk move to toplevel' '
 	git update-index --add --remove apath* c/apath0 c/apath1 c/apath2
 '
 
-test_expect_failure 'diff-index --detect-bulk-moves bulk move to toplevel.' '
+test_expect_success 'diff-index --detect-bulk-moves bulk move to toplevel.' '
 	cat >expected <<-EOF &&
 	:040000 040000 X X R#	c/*	./
 	:100644 100644 X X R#	c/apath0	apath0
@@ -132,7 +132,7 @@ test_expect_success 'setup move including a subdir, with some content changes' '
 
 test_expect_failure 'diff-index --detect-bulk-moves on a move including a subdir.' '
 	cat >expected <<-EOF &&
-	:040000 040000 X X R#	a/	b/
+	:040000 040000 X X R#	a/*	b/
 	:100644 100644 X X R#	a/c/apath0	b/c/apath0
 	:100644 100644 X X R#	a/c/apath1	b/c/apath1
 	:100644 100644 X X R#	a/c/apath2	b/c/apath2
@@ -176,7 +176,7 @@ test_expect_success 'setup move of files and subdirs to different places' '
 
 test_expect_failure 'moving subdirs into one dir and files into another is not mistaken for dir move' '
 	cat >expected <<-EOF &&
-	:040000 040000 X X R#	a/c/	b/
+	:040000 040000 X X R#	a/c/*	b/
 	:100644 100644 X X R#	a/c/apath0	b/apath0
 	:100644 100644 X X R#	a/c/apath1	b/apath1
 	:100644 100644 X X R#	a/c/apath2	b/apath2
@@ -201,11 +201,11 @@ test_expect_success 'setup move of dir with only subdirs' '
 	git mv a z
 '
 
-test_expect_failure 'moving a dir with no files' '
+test_expect_failure 'moving a dir with no direct children files' '
 	cat >expected <<-EOF &&
-	:040000 040000 X X R#	a/	z/
-	:040000 040000 X X R#	a/b/	z/b/
-	:040000 040000 X X R#	a/c/	z/c/
+	:040000 040000 X X R#	a/*	z/
+	:040000 040000 X X R#	a/b/*	z/b/
+	:040000 040000 X X R#	a/c/*	z/c/
 	:100644 100644 X X R#	a/b/path0	z/b/path0
 	:100644 100644 X X R#	a/b/path1	z/b/path1
 	:100644 100644 X X R#	a/b/path2	z/b/path2
@@ -235,7 +235,7 @@ test_expect_success 'setup move from toplevel to subdir' '
 	git update-index --add --remove path0 path1 path2 path3 z/path*
 '
 
-test_expect_failure '--detect-bulk-moves everything from toplevel.' '
+test_expect_success '--detect-bulk-moves everything from toplevel.' '
 	cat >expected <<-EOF &&
 	:040000 040000 X X R#	./*	z/
 	:100644 100644 X X R#	path0	z/path0
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v5 4/4] [RFC] Allow hiding renames of individual files involved in a directory rename.
  2010-10-09 21:31     ` [PATCH v5 3/4] [RFC] Only show bulkmoves in output Yann Dirson
@ 2010-10-09 21:31       ` Yann Dirson
  2010-10-10 12:39       ` [PATCH v5 3/4] [RFC] Only show bulkmoves in output Yann Dirson
  1 sibling, 0 replies; 9+ messages in thread
From: Yann Dirson @ 2010-10-09 21:31 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson

Once has identified groups of bulk-moved files, and then
the --hide-bulk-move-details flag hides those of the individual renames
which carry no other information (further name change, or content changes).

This makes it much easier to a human reader to spot content changes
in a commit that also moves a whole subtree.

Important note: unified diff output is not currently useful, since the "bulk move"
headers are not yet added by --detect-bulk-moves, but the redundant renames are
really removed.
---
 diff.c            |    7 +++++
 diff.h            |    3 ++
 diffcore-rename.c |   67 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 diffcore.h        |    1 +
 4 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index 64e01b3..28f5be3 100644
--- a/diff.c
+++ b/diff.c
@@ -3193,6 +3193,13 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		if (!options->detect_rename)
 			options->detect_rename = DIFF_DETECT_RENAME;
 	}
+	else if (!strcmp(arg, "--hide-bulk-move-details")) {
+		DIFF_OPT_SET(options, HIDE_DIR_RENAME_DETAILS);
+		if (!DIFF_OPT_TST(options, DETECT_BULK_MOVES))
+			DIFF_OPT_SET(options, DETECT_BULK_MOVES);
+		if (!options->detect_rename)
+			options->detect_rename = DIFF_DETECT_RENAME;
+	}
 	else if (!strcmp(arg, "--follow"))
 		DIFF_OPT_SET(options, FOLLOW_RENAMES);
 	else if (!strcmp(arg, "--color"))
diff --git a/diff.h b/diff.h
index b0d6fa6..7f132d0 100644
--- a/diff.h
+++ b/diff.h
@@ -79,6 +79,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
 #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
 #define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
 #define DIFF_OPT_DETECT_BULK_MOVES  (1 << 28)
+#define DIFF_OPT_HIDE_DIR_RENAME_DETAILS (1 << 29)
 
 #define DIFF_OPT_TST(opts, flag)    ((opts)->flags & DIFF_OPT_##flag)
 #define DIFF_OPT_SET(opts, flag)    ((opts)->flags |= DIFF_OPT_##flag)
@@ -268,6 +269,8 @@ extern void diffcore_fix_diff_index(struct diff_options *);
 "                try unchanged files as candidate for copy detection.\n" \
 "  --detect-bulk-moves\n" \
 "                detect wholesale directory renames.\n" \
+"  --hide-bulk-move-details\n" \
+"                hide renames of individual files in a directory rename.\n" \
 "  -l<n>         limit rename attempts up to <n> paths.\n" \
 "  -O<file>      reorder diffs according to the <file>.\n" \
 "  -S<string>    find filepair whose only one side contains the string.\n" \
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 6d21792..cf15fc8 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -455,6 +455,34 @@ struct diff_dir_rename {
 	struct diff_dir_rename* next;
 };
 
+/*
+ * Marks as such file_rename if it is made uninteresting by dir_rename.
+ * Returns -1 if the file_rename is outside of the range in which given
+ * renames concerned by dir_rename are to be found (ie. end of loop),
+ * 0 otherwise.
+ */
+static int maybe_mark_uninteresting(struct diff_rename_dst* file_rename,
+				    struct diff_dir_rename* dir_rename,
+				    int one_len, int two_len)
+{
+	if (!file_rename->pair) /* file add */
+		return 0;
+	if (strncmp(file_rename->two->path,
+		    dir_rename->two->path, two_len))
+		return -1;
+	if (strncmp(file_rename->pair->one->path,
+		    dir_rename->one->path, one_len) ||
+	    !basename_same(file_rename->pair->one, file_rename->two) ||
+	    file_rename->pair->score != MAX_SCORE)
+		return 0;
+
+	file_rename->pair->uninteresting_rename = 1;
+	fprintf (stderr, "[DBG] %s* -> %s* makes %s -> %s uninteresting\n",
+		dir_rename->one->path, dir_rename->two->path,
+		file_rename->pair->one->path, file_rename->two->path);
+	return 0;
+}
+
 // FIXME: replace with JN's version
 /*
  * Copy dirname of src into dst, suitable to append a filename without
@@ -648,11 +676,43 @@ static void check_one_bulk_move(struct diff_filepair *dstpair)
  * Take all file renames recorded so far and check if they could cause
  * a bulk move to be detected.
  */
-static void diffcore_bulk_moves(void)
+static void diffcore_bulk_moves(int opt_hide_renames)
 {
 	int i;
 	for (i = 0; i < rename_dst_nr; i++)
 		check_one_bulk_move(rename_dst[i].pair);
+
+	if (opt_hide_renames) {
+		// flag as "uninteresting" those candidates hidden by dir move
+		struct diff_dir_rename* candidate;
+		for (candidate=bulkmove_candidates;
+		     candidate; candidate = candidate->next) {
+			int two_idx, i, one_len, two_len;
+			if (candidate->discarded)
+				continue;
+
+			// bisect to any entry within candidate dst dir
+			struct diff_rename_dst* two_sample =
+				locate_rename_dst_dir(candidate->two->path);
+			if (!two_sample) {
+				die ("PANIC: %s candidate of rename not in target tree (from %s)\n",
+				     candidate->two->path, candidate->one->path);
+			}
+			two_idx = two_sample - rename_dst;
+
+			// now remove extraneous 100% files inside.
+			one_len = strlen(candidate->one->path);
+			two_len = strlen(candidate->two->path);
+			for (i = two_idx; i < rename_dst_nr; i++)
+				if (maybe_mark_uninteresting (rename_dst+i, candidate,
+							      one_len, two_len) < 0)
+					break;
+			for (i = two_idx-1; i >= 0; i--)
+				if (maybe_mark_uninteresting (rename_dst+i, candidate,
+							      one_len, two_len) < 0)
+					break;
+		}
+	}
 }
 
 void diffcore_rename(struct diff_options *options)
@@ -825,7 +885,7 @@ void diffcore_rename(struct diff_options *options)
 
 	/* Now possibly factorize those renames and copies. */
 	if (DIFF_OPT_TST(options, DETECT_BULK_MOVES))
-		diffcore_bulk_moves();
+		diffcore_bulk_moves(DIFF_OPT_TST(options, HIDE_DIR_RENAME_DETAILS));
 
 	DIFF_QUEUE_CLEAR(&outq);
 
@@ -854,7 +914,8 @@ void diffcore_rename(struct diff_options *options)
 			struct diff_rename_dst *dst =
 				locate_rename_dst(p->two, 0);
 			if (dst && dst->pair) {
-				diff_q(&outq, dst->pair);
+				if (!dst->pair->uninteresting_rename)
+					diff_q(&outq, dst->pair);
 				pair_to_free = p;
 			}
 			else
diff --git a/diffcore.h b/diffcore.h
index b8f1fde..beee596 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -69,6 +69,7 @@ struct diff_filepair {
 	unsigned broken_pair : 1;
 	unsigned renamed_pair : 1;
 	unsigned is_unmerged : 1;
+	unsigned uninteresting_rename : 1;
 };
 #define DIFF_PAIR_UNMERGED(p) ((p)->is_unmerged)
 
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v5 3/4] [RFC] Only show bulkmoves in output.
  2010-10-09 21:31     ` [PATCH v5 3/4] [RFC] Only show bulkmoves in output Yann Dirson
  2010-10-09 21:31       ` [PATCH v5 4/4] [RFC] Allow hiding renames of individual files involved in a directory rename Yann Dirson
@ 2010-10-10 12:39       ` Yann Dirson
  2010-10-10 19:59         ` [PATCH 0/2] " Yann Dirson
  1 sibling, 1 reply; 9+ messages in thread
From: Yann Dirson @ 2010-10-10 12:39 UTC (permalink / raw)
  To: git

On Sat, Oct 09, 2010 at 11:31:34PM +0200, Yann Dirson wrote:
> In theory we could just append those at display time (possibly diluting
> the code too much ?), or before queing the diff.  In practice we cannot
> do the latter easily in either case (strcat to a constant string ("./"),
> nor to the paths tighly-allocated from alloc_filespec).
> 
> So I went the way of including the trailing "*" from the beginning.
> Since I'm still unsure whether keeping it that way, I leave it as a
> separated patch for now.

Thinking twice, that approach has a major problem: it will break when
a file is named "*".  Finally looks like differing this to display
time will be the most sensible solution.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 0/2] Re: [PATCH v5 3/4] [RFC] Only show bulkmoves in output.
  2010-10-10 12:39       ` [PATCH v5 3/4] [RFC] Only show bulkmoves in output Yann Dirson
@ 2010-10-10 19:59         ` Yann Dirson
  2010-10-10 19:59           ` [PATCH 1/2] " Yann Dirson
  0 siblings, 1 reply; 9+ messages in thread
From: Yann Dirson @ 2010-10-10 19:59 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson

>Thinking twice, that approach has a major problem: it will break when
>a file is named "*".  Finally looks like differing this to display
>time will be the most sensible solution.

So here is a matching implementation (patch 1/2).  At first I also
thought of moving the responsibility of the toplevel special case
(displaying toplevel dir az "./") to the formatting routines (patch
2/2), but unless there are compelling reasons to proceed that way, I
guess that approach should be dropped - correct ?

Yann Dirson (2):
  [RFC] Only show bulkmoves in output.
  [RFC] Transfer special display of toplevel dir to display-time.

 diff.c                           |   15 +++++++++++++++
 diffcore-rename.c                |    9 ++-------
 diffcore.h                       |    1 +
 t/t4046-diff-rename-factorize.sh |   22 +++++++++++-----------
 4 files changed, 29 insertions(+), 18 deletions(-)

-- 
1.7.2.3

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] [RFC] Only show bulkmoves in output.
  2010-10-10 19:59         ` [PATCH 0/2] " Yann Dirson
@ 2010-10-10 19:59           ` Yann Dirson
  2010-10-10 19:59             ` [PATCH 2/2] [RFC] Transfer special display of toplevel dir to display-time Yann Dirson
  0 siblings, 1 reply; 9+ messages in thread
From: Yann Dirson @ 2010-10-10 19:59 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson

At display time, append a "*" suffix to bulk-move sources.  Later
we will be more selective and not display it for bulk moves that are
indeed a directory rename.

Open question: "foo/*" is a valid name for a file, so if such a file
is part of a bulk move, we will see one "foo/*" entry for the bulk
move (with zeroed sha1's), and one "foo/*" for the individual file
(with its own sha1).  Is that seen as a problem ?  After all, the
zeroed sha1's are probably here to stay (or do we want to add tree
sha1's where possible ?), so it is pretty easy to tell which is which.
An alternative would be to have another way of telling it is a
bulk-move, such as a flag to M or C (eg. M100B), but there is no
precedent for this, so I can't tell for myself.

After this patch only one of the expected failures can be considered
innocuous.
---
 diff.c                           |    9 ++++++++-
 diffcore-rename.c                |    4 ++--
 diffcore.h                       |    1 +
 t/t4046-diff-rename-factorize.sh |   22 +++++++++++-----------
 4 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/diff.c b/diff.c
index 64e01b3..4de43d6 100644
--- a/diff.c
+++ b/diff.c
@@ -3471,7 +3471,14 @@ static void diff_flush_raw(struct diff_filepair *p, struct diff_options *opt)
 	if (p->status == DIFF_STATUS_COPIED ||
 	    p->status == DIFF_STATUS_RENAMED) {
 		const char *name_a, *name_b;
-		name_a = p->one->path;
+		if (p->is_bulkmove) {
+			/* append "*" to the first dirname */
+			char buf[PATH_MAX];
+			char* next = memccpy(buf, p->one->path, '\0', PATH_MAX);
+			next[-1] = '*'; *next = '\0';
+			name_a = buf;
+		} else
+			name_a = p->one->path;
 		name_b = p->two->path;
 		strip_prefix(opt->prefix_length, &name_a, &name_b);
 		write_name_quoted(name_a, opt->file, inter_name_termination);
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 18a0605..8de0d57 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -603,8 +603,7 @@ static void check_one_bulk_move(struct diff_filepair *dstpair)
 	/* already considered ? */
 	for (seen=bulkmove_candidates; seen; seen = seen->next)
 		if (!strcmp(seen->one->path, one_parent_path)) break;
-	if (!seen) {
-		/* record potential dir rename */
+	if (!seen) { /* record potential dir rename */
 		seen = xmalloc(sizeof(*seen));
 		seen->one = alloc_filespec(one_parent_path);
 		fill_filespec(seen->one, null_sha1 /*FIXME*/, S_IFDIR);
@@ -830,6 +829,7 @@ void diffcore_rename(struct diff_options *options)
 		pair = diff_queue(&outq, candidate->one, candidate->two);
 		pair->score = MAX_SCORE;
 		pair->renamed_pair = 1;
+		pair->is_bulkmove = 1;
 	}
 
 	for (i = 0; i < q->nr; i++) {
diff --git a/diffcore.h b/diffcore.h
index b8f1fde..6dab95b 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -69,6 +69,7 @@ struct diff_filepair {
 	unsigned broken_pair : 1;
 	unsigned renamed_pair : 1;
 	unsigned is_unmerged : 1;
+	unsigned is_bulkmove : 1;
 };
 #define DIFF_PAIR_UNMERGED(p) ((p)->is_unmerged)
 
diff --git a/t/t4046-diff-rename-factorize.sh b/t/t4046-diff-rename-factorize.sh
index d063e25..9353929 100755
--- a/t/t4046-diff-rename-factorize.sh
+++ b/t/t4046-diff-rename-factorize.sh
@@ -37,7 +37,7 @@ test_expect_success 'setup' '
 
 test_expect_success 'diff-index --detect-bulk-moves after directory move.' '
 	cat >expected <<-EOF &&
-	:040000 040000 X X R#	a/	b/
+	:040000 040000 X X R#	a/*	b/
 	:100644 100644 X X R#	a/path0	b/path0
 	:100644 100644 X X R#	a/path1	b/path1
 	:100644 100644 X X R#	a/path2	b/path2
@@ -58,7 +58,7 @@ test_expect_success 'setup non-100% rename' '
 
 test_expect_success 'diff-index --detect-bulk-moves after content changes.' '
 	cat >expected <<-EOF &&
-	:040000 040000 X X R#	a/	b/
+	:040000 040000 X X R#	a/*	b/
 	:100644 000000 X X D#	a/path3
 	:100644 100644 X X R#	a/path2	b/2path
 	:100644 100644 X X R#	a/path0	b/path0
@@ -87,7 +87,7 @@ test_expect_success 'setup bulk move that is not directory move' '
 	git update-index --add --remove a/* c/apath0 c/apath1 c/apath2
 '
 
-test_expect_failure 'diff-index --detect-bulk-moves without full-dir rename.' '
+test_expect_success 'diff-index --detect-bulk-moves without full-dir rename.' '
 	cat >expected <<-EOF &&
 	:040000 040000 X X R#	c/*	a/
 	:100644 100644 X X R#	c/apath0	a/apath0
@@ -105,7 +105,7 @@ test_expect_success 'setup bulk move to toplevel' '
 	git update-index --add --remove apath* c/apath0 c/apath1 c/apath2
 '
 
-test_expect_failure 'diff-index --detect-bulk-moves bulk move to toplevel.' '
+test_expect_success 'diff-index --detect-bulk-moves bulk move to toplevel.' '
 	cat >expected <<-EOF &&
 	:040000 040000 X X R#	c/*	./
 	:100644 100644 X X R#	c/apath0	apath0
@@ -132,7 +132,7 @@ test_expect_success 'setup move including a subdir, with some content changes' '
 
 test_expect_failure 'diff-index --detect-bulk-moves on a move including a subdir.' '
 	cat >expected <<-EOF &&
-	:040000 040000 X X R#	a/	b/
+	:040000 040000 X X R#	a/*	b/
 	:100644 100644 X X R#	a/c/apath0	b/c/apath0
 	:100644 100644 X X R#	a/c/apath1	b/c/apath1
 	:100644 100644 X X R#	a/c/apath2	b/c/apath2
@@ -176,7 +176,7 @@ test_expect_success 'setup move of files and subdirs to different places' '
 
 test_expect_failure 'moving subdirs into one dir and files into another is not mistaken for dir move' '
 	cat >expected <<-EOF &&
-	:040000 040000 X X R#	a/c/	b/
+	:040000 040000 X X R#	a/c/*	b/
 	:100644 100644 X X R#	a/c/apath0	b/apath0
 	:100644 100644 X X R#	a/c/apath1	b/apath1
 	:100644 100644 X X R#	a/c/apath2	b/apath2
@@ -201,11 +201,11 @@ test_expect_success 'setup move of dir with only subdirs' '
 	git mv a z
 '
 
-test_expect_failure 'moving a dir with no files' '
+test_expect_failure 'moving a dir with no direct children files' '
 	cat >expected <<-EOF &&
-	:040000 040000 X X R#	a/	z/
-	:040000 040000 X X R#	a/b/	z/b/
-	:040000 040000 X X R#	a/c/	z/c/
+	:040000 040000 X X R#	a/*	z/
+	:040000 040000 X X R#	a/b/*	z/b/
+	:040000 040000 X X R#	a/c/*	z/c/
 	:100644 100644 X X R#	a/b/path0	z/b/path0
 	:100644 100644 X X R#	a/b/path1	z/b/path1
 	:100644 100644 X X R#	a/b/path2	z/b/path2
@@ -235,7 +235,7 @@ test_expect_success 'setup move from toplevel to subdir' '
 	git update-index --add --remove path0 path1 path2 path3 z/path*
 '
 
-test_expect_failure '--detect-bulk-moves everything from toplevel.' '
+test_expect_success '--detect-bulk-moves everything from toplevel.' '
 	cat >expected <<-EOF &&
 	:040000 040000 X X R#	./*	z/
 	:100644 100644 X X R#	path0	z/path0
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] [RFC] Transfer special display of toplevel dir to display-time.
  2010-10-10 19:59           ` [PATCH 1/2] " Yann Dirson
@ 2010-10-10 19:59             ` Yann Dirson
  0 siblings, 0 replies; 9+ messages in thread
From: Yann Dirson @ 2010-10-10 19:59 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson

I have mixed feelings about this one: it makes the display code much
more messy than the rename code was, and it will need to be duplicated
for each output format.

So I suggest to finally just keep the "./" assignation in diffcore_rename.
Any better idea ?
---
 diff.c            |   22 +++++++++++++++-------
 diffcore-rename.c |    5 -----
 2 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/diff.c b/diff.c
index 4de43d6..6fe47c2 100644
--- a/diff.c
+++ b/diff.c
@@ -3471,15 +3471,23 @@ static void diff_flush_raw(struct diff_filepair *p, struct diff_options *opt)
 	if (p->status == DIFF_STATUS_COPIED ||
 	    p->status == DIFF_STATUS_RENAMED) {
 		const char *name_a, *name_b;
+		char buf[PATH_MAX];
+		name_a = p->one->path;
+		name_b = p->two->path;
 		if (p->is_bulkmove) {
-			/* append "*" to the first dirname */
-			char buf[PATH_MAX];
-			char* next = memccpy(buf, p->one->path, '\0', PATH_MAX);
-			next[-1] = '*'; *next = '\0';
-			name_a = buf;
+			if (!*name_a)
+				name_a = "./*"; /* visualize toplevel dir */
+			else {
+				/* append "*" to the first dirname */
+				char* next = memccpy(buf, name_a, '\0', PATH_MAX);
+				next[-1] = '*'; *next = '\0';
+				name_a = buf;
+			}
 		} else
-			name_a = p->one->path;
-		name_b = p->two->path;
+			if (!*name_a)
+				name_a = "./"; /* visualize toplevel dir */
+		if (!*name_b)
+			name_b = "./"; /* visualize toplevel dir */
 		strip_prefix(opt->prefix_length, &name_a, &name_b);
 		write_name_quoted(name_a, opt->file, inter_name_termination);
 		write_name_quoted(name_b, opt->file, line_termination);
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 8de0d57..a075a25 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -821,11 +821,6 @@ void diffcore_rename(struct diff_options *options)
 	for (candidate=bulkmove_candidates; candidate; candidate = candidate->next) {
 		struct diff_filepair* pair;
 		if (candidate->discarded) continue;
-		/* visualize toplevel dir if needed */ //FIXME: wrong place for this ?
-		if (!*candidate->one->path)
-			candidate->one->path = "./";
-		if (!*candidate->two->path)
-			candidate->two->path = "./";
 		pair = diff_queue(&outq, candidate->one, candidate->two);
 		pair->score = MAX_SCORE;
 		pair->renamed_pair = 1;
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2010-10-10 20:32 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-09 21:31 [PATCH v5 0/4] Detection of directory renames Yann Dirson
2010-10-09 21:31 ` [PATCH v5 1/4] Introduce bulk-move detection in diffcore Yann Dirson
2010-10-09 21:31   ` [PATCH v5 2/4] Add testcases for the --detect-bulk-moves diffcore flag Yann Dirson
2010-10-09 21:31     ` [PATCH v5 3/4] [RFC] Only show bulkmoves in output Yann Dirson
2010-10-09 21:31       ` [PATCH v5 4/4] [RFC] Allow hiding renames of individual files involved in a directory rename Yann Dirson
2010-10-10 12:39       ` [PATCH v5 3/4] [RFC] Only show bulkmoves in output Yann Dirson
2010-10-10 19:59         ` [PATCH 0/2] " Yann Dirson
2010-10-10 19:59           ` [PATCH 1/2] " Yann Dirson
2010-10-10 19:59             ` [PATCH 2/2] [RFC] Transfer special display of toplevel dir to display-time Yann Dirson

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).