git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v6 0/5] Detection of directory renames
@ 2010-10-14 23:29 Yann Dirson
  2010-10-14 23:29 ` [PATCH v6 1/5] Introduce bulk-move detection in diffcore Yann Dirson
  2010-10-15  5:17 ` [PATCH] compat: add memrchr() Jonathan Nieder
  0 siblings, 2 replies; 17+ messages in thread
From: Yann Dirson @ 2010-10-14 23:29 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson

Changes since v5:

* added more tests (one OK, one for suboptimal work only, one for a
  bug fixed)
* adjusted expected output for 'diff-index --detect-bulk-moves on a
  move including a subdir.', it is OK
* include a better copy_dirname based on Johnathan's code
* dropped the "Transfer special display of toplevel dir to display-time."
  patch posted earlier, I really think that was a bad idea.
* include 2 more patches that deal with problems with moves of deep trees,
  separated for easier review, but which should be squashed into the main
  one ultimately

Only relatively easy things are still on my immediate TODO list for
this series:

* unified diff informational output
* memrchr() implementation for portability
* cleanups (free allocated structures, split long funcs a bit more, better
  comments, etc)
* audit remaining FIXME tags

Finishing the "--hide" feature will be next.

Yann Dirson (5):
  Introduce bulk-move detection in diffcore.
  Add testcases for the --detect-bulk-moves diffcore flag.
  [RFC] Handle the simpler case of a subdir invalidating bulk move.
  [RFC] Consider all parents of a file as candidates for bulk rename.
  [WIP] Allow hiding renames of individual files involved in a
    directory rename.

 diff-lib.c                       |    6 +-
 diff.c                           |   21 ++-
 diff.h                           |    6 +
 diffcore-rename.c                |  376 +++++++++++++++++++++++++++++++++++++-
 diffcore.h                       |    2 +
 t/t4046-diff-rename-factorize.sh |  301 ++++++++++++++++++++++++++++++
 tree-diff.c                      |    4 +-
 7 files changed, 703 insertions(+), 13 deletions(-)
 create mode 100755 t/t4046-diff-rename-factorize.sh

-- 
1.7.2.3

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v6 1/5] Introduce bulk-move detection in diffcore.
  2010-10-14 23:29 [PATCH v6 0/5] Detection of directory renames Yann Dirson
@ 2010-10-14 23:29 ` Yann Dirson
  2010-10-14 23:29   ` [PATCH v6 2/5] Add testcases for the --detect-bulk-moves diffcore flag Yann Dirson
  2010-10-15  5:17 ` [PATCH] compat: add memrchr() Jonathan Nieder
  1 sibling, 1 reply; 17+ messages in thread
From: Yann Dirson @ 2010-10-14 23:29 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson, Yann Dirson

This feature tries to group together files moving from and to
identical directories - the most common case being directory renames.

It is activated by the new --detect-bulk-moves diffcore
flag.

This is only the first step, adding the basic functionnality and
adding support to raw diff output (and it is not supported in
unified-diff output yet)

The output of raw diff is displayed as "Rnnn a/* b/".  Those cannot be
confused with renames of files named "whatever/*" with a literal star
character, from the full-zero SHA1's.

Other future developements to be made on top of this include:
* extension of unified-diff format to express this
* display as such the special case of directory move/rename
* application of such new diffs: issue a conflict, or just a warning ?
* teach git-svn (and others ?) to make use of that flag
* handle new conflict type "bulk-move/add"
* detect "directory splits" as well
* use inexact dir renames to bump score of below-threshold renames
  from/to same locations
* add yours here

Signed-off-by: Yann Dirson <ydirson@free.fr>
---
 diff-lib.c        |    6 +-
 diff.c            |   14 +++-
 diff.h            |    3 +
 diffcore-rename.c |  274 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 diffcore.h        |    1 +
 tree-diff.c       |    4 +-
 6 files changed, 290 insertions(+), 12 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index 392ce2b..5ec3ddc 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -208,7 +208,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 						    ce_option, &dirty_submodule);
 		if (!changed && !dirty_submodule) {
 			ce_mark_uptodate(ce);
-			if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
+			if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) &&
+			    !DIFF_OPT_TST(&revs->diffopt, DETECT_BULK_MOVES))
 				continue;
 		}
 		oldmode = ce->ce_mode;
@@ -338,7 +339,8 @@ static int show_modified(struct rev_info *revs,
 
 	oldmode = old->ce_mode;
 	if (mode == oldmode && !hashcmp(sha1, old->sha1) && !dirty_submodule &&
-	    !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER))
+	    !DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) &&
+	    !DIFF_OPT_TST(&revs->diffopt, DETECT_BULK_MOVES))
 		return 0;
 
 	diff_change(&revs->diffopt, oldmode, mode,
diff --git a/diff.c b/diff.c
index 71efa8e..4de43d6 100644
--- a/diff.c
+++ b/diff.c
@@ -3188,6 +3188,11 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		DIFF_OPT_SET(options, REVERSE_DIFF);
 	else if (!strcmp(arg, "--find-copies-harder"))
 		DIFF_OPT_SET(options, FIND_COPIES_HARDER);
+	else if (!strcmp(arg, "--detect-bulk-moves")) {
+		DIFF_OPT_SET(options, DETECT_BULK_MOVES);
+		if (!options->detect_rename)
+			options->detect_rename = DIFF_DETECT_RENAME;
+	}
 	else if (!strcmp(arg, "--follow"))
 		DIFF_OPT_SET(options, FOLLOW_RENAMES);
 	else if (!strcmp(arg, "--color"))
@@ -3466,7 +3471,14 @@ static void diff_flush_raw(struct diff_filepair *p, struct diff_options *opt)
 	if (p->status == DIFF_STATUS_COPIED ||
 	    p->status == DIFF_STATUS_RENAMED) {
 		const char *name_a, *name_b;
-		name_a = p->one->path;
+		if (p->is_bulkmove) {
+			/* append "*" to the first dirname */
+			char buf[PATH_MAX];
+			char* next = memccpy(buf, p->one->path, '\0', PATH_MAX);
+			next[-1] = '*'; *next = '\0';
+			name_a = buf;
+		} else
+			name_a = p->one->path;
 		name_b = p->two->path;
 		strip_prefix(opt->prefix_length, &name_a, &name_b);
 		write_name_quoted(name_a, opt->file, inter_name_termination);
diff --git a/diff.h b/diff.h
index 1fd44f5..b0d6fa6 100644
--- a/diff.h
+++ b/diff.h
@@ -78,6 +78,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
 #define DIFF_OPT_IGNORE_UNTRACKED_IN_SUBMODULES (1 << 25)
 #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
 #define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
+#define DIFF_OPT_DETECT_BULK_MOVES  (1 << 28)
 
 #define DIFF_OPT_TST(opts, flag)    ((opts)->flags & DIFF_OPT_##flag)
 #define DIFF_OPT_SET(opts, flag)    ((opts)->flags |= DIFF_OPT_##flag)
@@ -265,6 +266,8 @@ extern void diffcore_fix_diff_index(struct diff_options *);
 "  -C            detect copies.\n" \
 "  --find-copies-harder\n" \
 "                try unchanged files as candidate for copy detection.\n" \
+"  --detect-bulk-moves\n" \
+"                detect wholesale directory renames.\n" \
 "  -l<n>         limit rename attempts up to <n> paths.\n" \
 "  -O<file>      reorder diffs according to the <file>.\n" \
 "  -S<string>    find filepair whose only one side contains the string.\n" \
diff --git a/diffcore-rename.c b/diffcore-rename.c
index df41be5..f252da7 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -11,9 +11,13 @@
 static struct diff_rename_dst {
 	struct diff_filespec *two;
 	struct diff_filepair *pair;
+	int i_am_not_single:1; /* does not look for a match, only here to be looked at */
 } *rename_dst;
 static int rename_dst_nr, rename_dst_alloc;
 
+/*
+ * Do a binary search of "two" in "rename_dst", inserting it if not found.
+ */
 static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two,
 						 int insert_ok)
 {
@@ -49,9 +53,36 @@ static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two,
 	rename_dst[first].two = alloc_filespec(two->path);
 	fill_filespec(rename_dst[first].two, two->sha1, two->mode);
 	rename_dst[first].pair = NULL;
+	rename_dst[first].i_am_not_single = 0;
 	return &(rename_dst[first]);
 }
 
+/*
+ * Do a binary search in "rename_dst" of any entry under "dirname".
+ */
+static struct diff_rename_dst *locate_rename_dst_dir(const char *dirname)
+{
+	int first, last;
+	int prefixlength = strlen(dirname);
+
+	first = 0;
+	last = rename_dst_nr;
+	while (last > first) {
+		int next = (last + first) >> 1;
+		struct diff_rename_dst *dst = &(rename_dst[next]);
+		int cmp = strncmp(dirname, dst->two->path, prefixlength);
+		if (!cmp)
+			return dst;
+		if (cmp < 0) {
+			last = next;
+			continue;
+		}
+		first = next+1;
+	}
+	/* not found */
+	return NULL;
+}
+
 /* Table of rename/copy src files */
 static struct diff_rename_src {
 	struct diff_filespec *one;
@@ -386,8 +417,11 @@ static int find_exact_renames(void)
 	for (i = 0; i < rename_src_nr; i++)
 		insert_file_table(&file_table, -1, i, rename_src[i].one);
 
-	for (i = 0; i < rename_dst_nr; i++)
+	for (i = 0; i < rename_dst_nr; i++) {
+		if (rename_dst[i].i_am_not_single)
+			continue;
 		insert_file_table(&file_table, 1, i, rename_dst[i].two);
+	}
 
 	/* Find the renames */
 	i = for_each_hash(&file_table, find_same_files);
@@ -414,6 +448,196 @@ static void record_if_better(struct diff_score m[], struct diff_score *o)
 		m[worst] = *o;
 }
 
+struct diff_dir_rename {
+	struct diff_filespec *one;
+	struct diff_filespec *two;
+	int discarded;
+	struct diff_dir_rename* next;
+};
+
+/*
+ * Copy dirname of src into dst, suitable to append a filename without
+ * an additional "/".
+ * Only handles relative paths since there is no absolute path in a git repo.
+ * Writes "" when there is no "/" in src.
+ * May overwrite more chars than really needed, if src ends with a "/".
+ * Supports in-place modification of src by passing dst == src.
+ */
+static const char* copy_dirname(char* dst, const char* src)
+{
+	size_t len = strlen(src);
+	const char *slash;
+
+	if (len > 0 && src[len - 1] == '/')
+		/* Trailing slash.  Ignore it. */
+		len--;
+
+	slash = memrchr(src, '/', len);
+	if (!slash) {
+		*dst = '\0';
+		return dst;
+	}
+
+	char* end = mempcpy(dst, src, slash - src + 1);
+	*end = '\0';
+	return dst;
+}
+
+// FIXME: investigate avoiding i_am_not_single by using a separate list
+// FIXME: we could optimize the 100%-rename case by preventing
+//        recursion to unfold what we know we would refold here.
+// FIXME: do we want to replace linked list with sorted array ?
+// FIXME: leaks like hell.
+// FIXME: ideas to evaluate a similarity score, anyone ?
+//        10% * tree similarity + 90% * moved files similarity ?
+// FIXME: ideas to consider under-threshold moves as part of bulk move ?
+
+// FIXME: handle as sorted list to allow for binary search ?
+static struct diff_dir_rename* bulkmove_candidates = NULL;
+
+/* See if the fact that one_leftover exists under one_parent_path in
+ * dst tree should disqualify one_parent_path from bulkmove eligibility.
+ * Return 1 if it disqualifies, 0 if it is OK.
+ */
+static int maybe_disqualify_bulkmove(const char* one_parent_path,
+				     struct diff_rename_dst* one_leftover)
+{
+	// FIXME: should only be run if !seen
+	/* this might be a dir split, or files added
+	 * after the bulk move, or just an isolated
+	 * rename */
+	/* FIXME: should we cache this information to avoid
+	 * recomputing that result when many files trigger it ?
+	 * See eg. git.git:761e742d692.
+	 */
+	int two_idx, j, onep_len, maybe_dir_rename;
+
+	/* try to see if it is a file added after the bulk move */
+	two_idx = one_leftover - rename_dst;
+	onep_len = strlen(one_parent_path);
+	maybe_dir_rename = 1;
+
+	/* check no leftover file was already here before */
+	for (j = two_idx; j < rename_dst_nr; j++) {
+		if (strncmp(rename_dst[j].two->path,
+			    one_parent_path, onep_len))
+			break; /* exhausted directory in this direction */
+		fprintf(stderr, "[DBG] leftover file %s in %s\n",
+			rename_dst[j].two->path, one_parent_path);
+		if (rename_dst[j].i_am_not_single || /* those were already here */
+		    (rename_dst[j].pair &&
+		     !strncmp(rename_dst[j].pair->one->path,
+			      one_parent_path, onep_len) && /* renamed from here */
+		     strncmp(rename_dst[j].two->path,
+			     one_parent_path, onep_len))) { /* not to a subdir */
+			maybe_dir_rename = 0;
+			fprintf(stderr, "[DBG] ... tells not a bulk move\n");
+			break;
+		}
+		fprintf(stderr, "[DBG] ... not believed to prevent bulk move\n");
+	}
+	if (!maybe_dir_rename) return 1;
+	/* try the other direction (dup code) */
+	for (j = two_idx-1; j >= 0; j--) {
+		if (strncmp(rename_dst[j].two->path,
+			    one_parent_path, onep_len))
+			break; /* exhausted directory in this direction */
+		fprintf(stderr, "[DBG] leftover file %s in %s\n",
+			rename_dst[j].two->path, one_parent_path);
+		if (rename_dst[j].i_am_not_single || /* those were already here */
+		    (rename_dst[j].pair &&
+		     !strncmp(rename_dst[j].pair->one->path,
+			      one_parent_path, onep_len) && /* renamed from here */
+		     strncmp(rename_dst[j].two->path,
+			     one_parent_path, onep_len))) { /* not to a subdir */
+			maybe_dir_rename = 0;
+			fprintf (stderr, "[DBG] ... tells not a bulk move\n");
+			break;
+		}
+		fprintf(stderr, "[DBG] ... not believed to prevent bulk move\n");
+	}
+	if (!maybe_dir_rename) return 1;
+
+	/* Here we are in the case where a directory
+	 * content was completely moved, but files
+	 * were added to it afterwards.  Proceed as
+	 * for a simple bulk move. */
+	return 0;
+}
+
+/*
+ * Check if the rename specified by "dstpair" could cause a
+ * bulk move to be detected, record it in bulkmove_candidates if yes.
+ */
+static void check_one_bulk_move(struct diff_filepair *dstpair)
+{
+	char one_parent_path[PATH_MAX], two_parent_path[PATH_MAX];
+	struct diff_dir_rename* seen;
+
+	/* genuine new files (or believed to be so) */
+	if (!dstpair)
+		return;
+	/* dummy renames used by copy detection */
+	if (!strcmp(dstpair->one->path, dstpair->two->path))
+		return;
+
+	copy_dirname(one_parent_path, dstpair->one->path);
+	copy_dirname(two_parent_path, dstpair->two->path);
+
+	/* simple rename with no directory change */
+	if (!strcmp(one_parent_path, two_parent_path))
+		return;
+
+	/* After this commit, are there any files still under one_parent_path ?
+	 * Any file left would disqualifies this dir for bulk move.
+	 */
+	struct diff_rename_dst* one_leftover = locate_rename_dst_dir(one_parent_path);
+	if (one_leftover &&
+	    maybe_disqualify_bulkmove(one_parent_path, one_leftover))
+		return;
+
+	/* already considered ? */
+	for (seen=bulkmove_candidates; seen; seen = seen->next)
+		if (!strcmp(seen->one->path, one_parent_path)) break;
+	if (!seen) { /* record potential dir rename */
+		seen = xmalloc(sizeof(*seen));
+		seen->one = alloc_filespec(one_parent_path);
+		fill_filespec(seen->one, null_sha1, S_IFDIR);
+		seen->two = alloc_filespec(two_parent_path);
+		fill_filespec(seen->two, null_sha1, S_IFDIR);
+		seen->discarded = 0;
+		seen->next = bulkmove_candidates;
+		bulkmove_candidates = seen;
+		fprintf(stderr, "[DBG] %s -> %s suggests possible bulk move from %s to %s\n",
+			dstpair->one->path,
+			dstpair->two->path,
+			one_parent_path, two_parent_path);
+		return;
+	}
+	if (seen->discarded)
+		/* already seen a rename from seen->one to some than ->two */
+		return;
+	/* check that seen entry matches this rename */
+	if (strcmp(two_parent_path, seen->two->path)) {
+		fprintf(stderr, "[DBG] discarding dir %s from bulk moves (split into %s and %s)\n",
+			one_parent_path, two_parent_path, seen->two->path);
+		seen->discarded = 1;
+	}
+
+	/* all checks ok, we keep that entry */
+}
+
+/*
+ * Take all file renames recorded so far and check if they could cause
+ * a bulk move to be detected.
+ */
+static void diffcore_bulk_moves(void)
+{
+	int i;
+	for (i = 0; i < rename_dst_nr; i++)
+		check_one_bulk_move(rename_dst[i].pair);
+}
+
 void diffcore_rename(struct diff_options *options)
 {
 	int detect_rename = options->detect_rename;
@@ -451,13 +675,23 @@ void diffcore_rename(struct diff_options *options)
 				p->one->rename_used++;
 			register_rename_src(p->one, p->score);
 		}
-		else if (detect_rename == DIFF_DETECT_COPY) {
-			/*
-			 * Increment the "rename_used" score by
-			 * one, to indicate ourselves as a user.
-			 */
-			p->one->rename_used++;
-			register_rename_src(p->one, p->score);
+		else {
+			if (detect_rename == DIFF_DETECT_COPY) {
+				/*
+				 * Increment the "rename_used" score by
+				 * one, to indicate ourselves as a user.
+				 */
+				p->one->rename_used++;
+				register_rename_src(p->one, p->score);
+			}
+			if (DIFF_OPT_TST(options, DETECT_BULK_MOVES)) {
+				/* similarly, bulk move detection needs to
+				 * see all files from second tree, but we don't
+				 * want them to be matched against single sources.
+				 */
+				// FIXME: check interaction with --find-copies-harder
+				locate_rename_dst(p->two, 1)->i_am_not_single = 1;
+			}
 		}
 	}
 	if (rename_dst_nr == 0 || rename_src_nr == 0)
@@ -509,6 +743,8 @@ void diffcore_rename(struct diff_options *options)
 
 		if (rename_dst[i].pair)
 			continue; /* dealt with exact match already. */
+		if (rename_dst[i].i_am_not_single)
+			continue; /* not looking for a match. */
 
 		m = &mx[dst_cnt * NUM_CANDIDATE_PER_DST];
 		for (j = 0; j < NUM_CANDIDATE_PER_DST; j++)
@@ -569,7 +805,29 @@ void diffcore_rename(struct diff_options *options)
 	/* At this point, we have found some renames and copies and they
 	 * are recorded in rename_dst.  The original list is still in *q.
 	 */
+
+	/* Now possibly factorize those renames and copies. */
+	if (DIFF_OPT_TST(options, DETECT_BULK_MOVES))
+		diffcore_bulk_moves();
+
 	DIFF_QUEUE_CLEAR(&outq);
+
+	/* Now turn non-discarded bulkmove_candidates into real renames */
+	struct diff_dir_rename* candidate;
+	for (candidate=bulkmove_candidates; candidate; candidate = candidate->next) {
+		struct diff_filepair* pair;
+		if (candidate->discarded) continue;
+		/* visualize toplevel dir if needed */
+		if (!*candidate->one->path)
+			candidate->one->path = "./";
+		if (!*candidate->two->path)
+			candidate->two->path = "./";
+		pair = diff_queue(&outq, candidate->one, candidate->two);
+		pair->score = MAX_SCORE;
+		pair->renamed_pair = 1;
+		pair->is_bulkmove = 1;
+	}
+
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
 		struct diff_filepair *pair_to_free = NULL;
diff --git a/diffcore.h b/diffcore.h
index b8f1fde..6dab95b 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -69,6 +69,7 @@ struct diff_filepair {
 	unsigned broken_pair : 1;
 	unsigned renamed_pair : 1;
 	unsigned is_unmerged : 1;
+	unsigned is_bulkmove : 1;
 };
 #define DIFF_PAIR_UNMERGED(p) ((p)->is_unmerged)
 
diff --git a/tree-diff.c b/tree-diff.c
index cd659c6..5d9f123 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -49,7 +49,9 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const
 		show_entry(opt, "+", t2, base, baselen);
 		return 1;
 	}
-	if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) && !hashcmp(sha1, sha2) && mode1 == mode2)
+	if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) &&
+	    !DIFF_OPT_TST(opt, DETECT_BULK_MOVES) &&
+	    !hashcmp(sha1, sha2) && mode1 == mode2)
 		return 0;
 
 	/*
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 2/5] Add testcases for the --detect-bulk-moves diffcore flag.
  2010-10-14 23:29 ` [PATCH v6 1/5] Introduce bulk-move detection in diffcore Yann Dirson
@ 2010-10-14 23:29   ` Yann Dirson
  2010-10-14 23:29     ` [PATCH v6 3/5] [RFC] Handle the simpler case of a subdir invalidating bulk move Yann Dirson
  0 siblings, 1 reply; 17+ messages in thread
From: Yann Dirson @ 2010-10-14 23:29 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson, Yann Dirson

This notably includes a couple of tests for cases known not to be
working correctly yet.

Signed-off-by: Yann Dirson <ydirson@free.fr>
---
 t/t4046-diff-rename-factorize.sh |  301 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 301 insertions(+), 0 deletions(-)
 create mode 100755 t/t4046-diff-rename-factorize.sh

diff --git a/t/t4046-diff-rename-factorize.sh b/t/t4046-diff-rename-factorize.sh
new file mode 100755
index 0000000..dca7cb7
--- /dev/null
+++ b/t/t4046-diff-rename-factorize.sh
@@ -0,0 +1,301 @@
+#!/bin/sh
+#
+# Copyright (c) 2008,2010 Yann Dirson
+# Copyright (c) 2005 Junio C Hamano
+#
+
+# TODO for dir renames:
+# * two dirs or more moving all their files to a single dir
+# * simultaneous bulkmove and rename
+
+test_description='Test rename factorization in diff engine.
+
+'
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/diff-lib.sh
+
+test_expect_success 'setup' '
+	git commit --allow-empty -m "original empty commit"
+
+	mkdir a &&
+	printf "Line %s\n" 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 >a/path0 &&
+	sed <a/path0 >a/path1 s/Line/Record/ &&
+	sed <a/path0 >a/path2 s/Line/Stuff/ &&
+	sed <a/path0 >a/path3 s/Line/Blurb/ &&
+
+	git update-index --add a/path* &&
+	test_tick &&
+	git commit -m "original set of files"
+
+	: rename the directory &&
+	mv a b &&
+	git update-index --add --remove a/path0 a/path1 a/path2 a/path3 b/path*
+'
+test_expect_success 'diff-index --detect-bulk-moves after directory move.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/*	b/
+	:100644 100644 X X R#	a/path0	b/path0
+	:100644 100644 X X R#	a/path1	b/path1
+	:100644 100644 X X R#	a/path2	b/path2
+	:100644 100644 X X R#	a/path3	b/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+
+test_expect_success 'setup non-100% rename' '
+	echo "Line 16" >>b/path0 &&
+	mv b/path2 b/2path &&
+	rm b/path3 &&
+	echo anything >b/path100 &&
+	git update-index --add --remove b/* b/path2 b/path3
+'
+test_expect_success 'diff-index --detect-bulk-moves after content changes.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/*	b/
+	:100644 000000 X X D#	a/path3
+	:100644 100644 X X R#	a/path2	b/2path
+	:100644 100644 X X R#	a/path0	b/path0
+	:100644 100644 X X R#	a/path1	b/path1
+	:000000 100644 X X A#	b/path100
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+
+test_expect_success 'setup bulk move that is not directory move' '
+	git reset -q --hard &&
+
+	mkdir c &&
+	(
+		for i in 0 1 2; do
+			cp a/path$i c/apath$i || exit
+		done
+	) &&
+	git update-index --add c/apath* &&
+	test_tick &&
+	git commit -m "first set of changes" &&
+
+	mv c/* a/ &&
+	git update-index --add --remove a/* c/apath0 c/apath1 c/apath2
+'
+test_expect_success 'diff-index --detect-bulk-moves without full-dir rename.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	c/*	a/
+	:100644 100644 X X R#	c/apath0	a/apath0
+	:100644 100644 X X R#	c/apath1	a/apath1
+	:100644 100644 X X R#	c/apath2	a/apath2
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+
+test_expect_success 'setup bulk move with new file in source dir' '
+	echo > c/anotherpath "How much wood?" &&
+	git update-index --add c/another*
+'
+test_expect_success 'diff-index --detect-bulk-moves with new file in source dir.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	c/*	a/
+	:100644 100644 X X R#	c/apath0	a/apath0
+	:100644 100644 X X R#	c/apath1	a/apath1
+	:100644 100644 X X R#	c/apath2	a/apath2
+	:000000 100644 X X A#	c/anotherpath
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+
+test_expect_success 'setup bulk move with interfering copy' '
+	rm c/anotherpath &&
+	git update-index --remove c/anotherpath &&
+	mkdir b &&
+	cp a/apath0 b/apath9 &&
+	echo >> a/apath0 "more" &&
+	git update-index --add a/apath0 b/apath9
+'
+# scores select the "wrong" one as "moved" (only a suboptimal detection)
+test_expect_failure 'diff-index --detect-bulk-moves with interfering copy.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	c/*	a/
+	:100644 100644 X X R#	c/apath0	a/apath0
+	:100644 100644 X X R#	c/apath1	a/apath1
+	:100644 100644 X X R#	c/apath2	a/apath2
+	:100644 100644 X X C#	c/apath0	b/apath9
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+
+test_expect_success 'setup bulk move to toplevel' '
+	git reset -q --hard &&
+	mv c/* . &&
+	git update-index --add --remove apath* c/apath0 c/apath1 c/apath2
+'
+test_expect_success 'diff-index --detect-bulk-moves bulk move to toplevel.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	c/*	./
+	:100644 100644 X X R#	c/apath0	apath0
+	:100644 100644 X X R#	c/apath1	apath1
+	:100644 100644 X X R#	c/apath2	apath2
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+
+test_expect_success 'setup move including a subdir, with some content changes' '
+	git reset -q --hard &&
+	mv c a/ &&
+	git update-index --add --remove a/c/* c/apath0 c/apath1 c/apath2 &&
+	test_tick &&
+	git commit -m "move as subdir" &&
+
+	mv a b &&
+	echo foo >>b/c/apath0 &&
+	git update-index --add --remove a/path0 a/path1 a/path2 a/path3 \
+			a/c/apath0 a/c/apath1 a/c/apath2 b/path* b/c/apath*
+'
+test_expect_success 'diff-index --detect-bulk-moves on a move including a subdir.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/*	b/
+	:040000 040000 X X R#	a/c/*	b/c/
+	:100644 100644 X X R#	a/c/apath0	b/c/apath0
+	:100644 100644 X X R#	a/c/apath1	b/c/apath1
+	:100644 100644 X X R#	a/c/apath2	b/c/apath2
+	:100644 100644 X X R#	a/path0	b/path0
+	:100644 100644 X X R#	a/path1	b/path1
+	:100644 100644 X X R#	a/path2	b/path2
+	:100644 100644 X X R#	a/path3	b/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+
+test_expect_success 'setup move without a subdir' '
+	git reset -q --hard &&
+	mkdir b &&
+	mv a/path* b/ &&
+	: rename files in the directory but not subdir. &&
+	git update-index --add --remove a/path0 a/path1 a/path2 a/path3 b/path*
+'
+test_expect_success 'moving files but not subdirs is not mistaken for dir move' '
+	cat >expected <<-EOF &&
+	:100644 100644 X X R#	a/path0	b/path0
+	:100644 100644 X X R#	a/path1	b/path1
+	:100644 100644 X X R#	a/path2	b/path2
+	:100644 100644 X X R#	a/path3	b/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+
+test_expect_success 'setup move of files and subdirs to different places' '
+	git reset -q --hard &&
+	mv a/c b &&
+	mv a d &&
+	git update-index --add --remove a/path0 a/path1 a/path2 a/path3 \
+		a/c/apath0 a/c/apath1 a/c/apath2 d/path* b/apath*
+'
+test_expect_success 'moving subdirs into one dir and files into another is not mistaken for dir move' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/c/*	b/
+	:100644 100644 X X R#	a/c/apath0	b/apath0
+	:100644 100644 X X R#	a/c/apath1	b/apath1
+	:100644 100644 X X R#	a/c/apath2	b/apath2
+	:100644 100644 X X R#	a/path0	d/path0
+	:100644 100644 X X R#	a/path1	d/path1
+	:100644 100644 X X R#	a/path2	d/path2
+	:100644 100644 X X R#	a/path3	d/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+
+# the same with different ordering
+test_expect_success 'setup move of files and subdirs to different places' '
+	mv d 0 &&
+	git update-index --add --remove d/path0 d/path1 d/path2 d/path3 0/path*
+'
+test_expect_success 'moving subdirs into one dir and files into another is not mistaken for dir move' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/c/*	b/
+	:100644 100644 X X R#	a/path0	0/path0
+	:100644 100644 X X R#	a/path1	0/path1
+	:100644 100644 X X R#	a/path2	0/path2
+	:100644 100644 X X R#	a/path3	0/path3
+	:100644 100644 X X R#	a/c/apath0	b/apath0
+	:100644 100644 X X R#	a/c/apath1	b/apath1
+	:100644 100644 X X R#	a/c/apath2	b/apath2
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+
+test_expect_success 'setup move of dir with only subdirs' '
+	git reset -q --hard &&
+	mkdir a/b &&
+	mv a/path* a/b/ &&
+	git update-index --add --remove a/path0 a/path1 a/path2 a/path3 a/b/path* &&
+	test_tick &&
+	git commit -m "move all toplevel files down one level" &&
+
+	git mv a z
+'
+# TODO: only a suboptimal non-detection
+test_expect_failure 'moving a dir with no direct children files' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	a/*	z/
+	:040000 040000 X X R#	a/b/*	z/b/
+	:040000 040000 X X R#	a/c/*	z/c/
+	:100644 100644 X X R#	a/b/path0	z/b/path0
+	:100644 100644 X X R#	a/b/path1	z/b/path1
+	:100644 100644 X X R#	a/b/path2	z/b/path2
+	:100644 100644 X X R#	a/b/path3	z/b/path3
+	:100644 100644 X X R#	a/c/apath0	z/c/apath0
+	:100644 100644 X X R#	a/c/apath1	z/c/apath1
+	:100644 100644 X X R#	a/c/apath2	z/c/apath2
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+# now test moving all files from toplevel into subdir (does not hides file moves) (needs consensus on syntax)
+# Note: this is a special case of move of a dir into one of its own subdirs, which in
+# turn is a variant of new files/dirs being added into a dir after all its contents
+# are moved away
+
+test_expect_success 'setup move from toplevel to subdir' '
+	git reset -q --hard HEAD~3 &&
+	mv a/* . &&
+	git update-index --add --remove a/path0 a/path1 a/path2 a/path3 path* &&
+	test_tick &&
+	git commit -m "move all files to toplevel" &&
+
+	mkdir z &&
+	mv path* z/ &&
+	git update-index --add --remove path0 path1 path2 path3 z/path*
+'
+test_expect_success '--detect-bulk-moves everything from toplevel.' '
+	cat >expected <<-EOF &&
+	:040000 040000 X X R#	./*	z/
+	:100644 100644 X X R#	path0	z/path0
+	:100644 100644 X X R#	path1	z/path1
+	:100644 100644 X X R#	path2	z/path2
+	:100644 100644 X X R#	path3	z/path3
+	EOF
+	git diff-index --detect-bulk-moves HEAD >current &&
+	grep -v "^\[DBG\] " <current >current.filtered &&
+	compare_diff_raw expected current.filtered
+'
+
+test_done
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 3/5] [RFC] Handle the simpler case of a subdir invalidating bulk move.
  2010-10-14 23:29   ` [PATCH v6 2/5] Add testcases for the --detect-bulk-moves diffcore flag Yann Dirson
@ 2010-10-14 23:29     ` Yann Dirson
  2010-10-14 23:29       ` [PATCH v6 4/5] [RFC] Consider all parents of a file as candidates for bulk rename Yann Dirson
  0 siblings, 1 reply; 17+ messages in thread
From: Yann Dirson @ 2010-10-14 23:29 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson, Yann Dirson

Doing it so simply depends on the subdir moves being noticed later,
this fixes only one of the remaining failing testcases.  Doing better
requires real handling of deep trees, which is addressed by next
patch, but add much complexity.

Signed-off-by: Yann Dirson <ydirson@free.fr>
---
 diffcore-rename.c |   59 ++++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 45 insertions(+), 14 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index f252da7..1be1af1 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -596,10 +596,52 @@ static void check_one_bulk_move(struct diff_filepair *dstpair)
 	    maybe_disqualify_bulkmove(one_parent_path, one_leftover))
 		return;
 
+	fprintf(stderr, "[] %s -> %s ?\n",dstpair->one->path, dstpair->two->path);
+
+	// FIXME: loop over successive prefixes
+	unsigned needs_adding = 1;
+
 	/* already considered ? */
-	for (seen=bulkmove_candidates; seen; seen = seen->next)
-		if (!strcmp(seen->one->path, one_parent_path)) break;
-	if (!seen) { /* record potential dir rename */
+	for (seen=bulkmove_candidates; seen; seen = seen->next) {
+		if (seen->discarded) {
+			/* already seen a rename from seen->one to some than ->two */
+			needs_adding = 0;
+			continue;
+		}
+		/* check exact dir */
+		if (!strcmp(seen->one->path, one_parent_path)) {
+			/* already added */
+			needs_adding = 0;
+			/* check that seen entry matches this rename */
+			if (strcmp(two_parent_path, seen->two->path)) {
+				fprintf(stderr, "[DBG] discarding dir %s from bulk moves (split into %s and %s)\n",
+					seen->one->path, two_parent_path, seen->two->path);
+				// FIXME: may be worth to free it instead
+				seen->discarded = 1;
+			}
+			continue;
+		}
+		if (!prefixcmp(one_parent_path, seen->one->path)) {
+			if (prefixcmp(two_parent_path, seen->two->path)) {
+				fprintf(stderr, "[DBG] discarding dir %s from bulk moves (split into %s and %s)\n",
+					seen->one->path, two_parent_path, seen->two->path);
+				// FIXME: may be worth to free it instead
+				seen->discarded = 1;
+				continue;
+			}
+		} else {
+			fprintf(stderr, "[DBG]  %s considered irrelevant for %s -> %s\n",
+				dstpair->one->path, seen->one->path, seen->two->path);
+			continue;
+		}
+
+		/* dstpair confirms seen */
+		fprintf(stderr, "[DBG] %s -> %s DOES NOT cause discard of %s -> %s\n",
+			dstpair->one->path, dstpair->two->path,
+			seen->one->path, seen->two->path);
+	}
+	if (needs_adding) { /* record potential dir rename */
+		/* all checks ok, we keep that entry */
 		seen = xmalloc(sizeof(*seen));
 		seen->one = alloc_filespec(one_parent_path);
 		fill_filespec(seen->one, null_sha1, S_IFDIR);
@@ -614,17 +656,6 @@ static void check_one_bulk_move(struct diff_filepair *dstpair)
 			one_parent_path, two_parent_path);
 		return;
 	}
-	if (seen->discarded)
-		/* already seen a rename from seen->one to some than ->two */
-		return;
-	/* check that seen entry matches this rename */
-	if (strcmp(two_parent_path, seen->two->path)) {
-		fprintf(stderr, "[DBG] discarding dir %s from bulk moves (split into %s and %s)\n",
-			one_parent_path, two_parent_path, seen->two->path);
-		seen->discarded = 1;
-	}
-
-	/* all checks ok, we keep that entry */
 }
 
 /*
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 4/5] [RFC] Consider all parents of a file as candidates for bulk rename.
  2010-10-14 23:29     ` [PATCH v6 3/5] [RFC] Handle the simpler case of a subdir invalidating bulk move Yann Dirson
@ 2010-10-14 23:29       ` Yann Dirson
  2010-10-14 23:29         ` [PATCH v6 5/5] [WIP] Allow hiding renames of individual files involved in a directory rename Yann Dirson
  2010-10-17 19:24         ` [PATCH v6.1] [RFC] Consider all parents of a file as candidates for bulk rename Yann Dirson
  0 siblings, 2 replies; 17+ messages in thread
From: Yann Dirson @ 2010-10-14 23:29 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson, Yann Dirson

We previously only looked at the immediate parents of a file involved
in a move.  Now consider upper ones as candidates too.

The way it is done here is somewhat inefficient, but at least it
handles all previously-known cases of incorrectness.

Note this patch largely puts existing code in a look, and is a bit
easier to read with -w.

Signed-off-by: Yann Dirson <ydirson@free.fr>
---
 diffcore-rename.c |  118 ++++++++++++++++++++++++++++-------------------------
 1 files changed, 63 insertions(+), 55 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 1be1af1..ff69201 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -542,7 +542,7 @@ static int maybe_disqualify_bulkmove(const char* one_parent_path,
 		if (strncmp(rename_dst[j].two->path,
 			    one_parent_path, onep_len))
 			break; /* exhausted directory in this direction */
-		fprintf(stderr, "[DBG] leftover file %s in %s\n",
+		fprintf(stderr, "[DBG] leftover file %s in '%s'\n",
 			rename_dst[j].two->path, one_parent_path);
 		if (rename_dst[j].i_am_not_single || /* those were already here */
 		    (rename_dst[j].pair &&
@@ -596,66 +596,74 @@ static void check_one_bulk_move(struct diff_filepair *dstpair)
 	    maybe_disqualify_bulkmove(one_parent_path, one_leftover))
 		return;
 
-	fprintf(stderr, "[] %s -> %s ?\n",dstpair->one->path, dstpair->two->path);
+	fprintf(stderr, "[] %s -> %s ?\n", dstpair->one->path, dstpair->two->path);
 
-	// FIXME: loop over successive prefixes
-	unsigned needs_adding = 1;
+	// loop over successive prefixes
+	// FIXME: also loop over two_parent_path prefixes ?
+	do {
+		unsigned needs_adding = 1;
 
-	/* already considered ? */
-	for (seen=bulkmove_candidates; seen; seen = seen->next) {
-		if (seen->discarded) {
-			/* already seen a rename from seen->one to some than ->two */
-			needs_adding = 0;
-			continue;
-		}
-		/* check exact dir */
-		if (!strcmp(seen->one->path, one_parent_path)) {
-			/* already added */
-			needs_adding = 0;
-			/* check that seen entry matches this rename */
-			if (strcmp(two_parent_path, seen->two->path)) {
-				fprintf(stderr, "[DBG] discarding dir %s from bulk moves (split into %s and %s)\n",
-					seen->one->path, two_parent_path, seen->two->path);
-				// FIXME: may be worth to free it instead
-				seen->discarded = 1;
-			}
-			continue;
-		}
-		if (!prefixcmp(one_parent_path, seen->one->path)) {
-			if (prefixcmp(two_parent_path, seen->two->path)) {
-				fprintf(stderr, "[DBG] discarding dir %s from bulk moves (split into %s and %s)\n",
-					seen->one->path, two_parent_path, seen->two->path);
-				// FIXME: may be worth to free it instead
-				seen->discarded = 1;
+		fprintf(stderr, "[[]] %s ...\n", one_parent_path);
+
+		/* already considered ? */
+		for (seen=bulkmove_candidates; seen; seen = seen->next) {
+			if (seen->discarded) {
+				/* already seen a rename from seen->one to some than ->two */
+				needs_adding = 0;
 				continue;
 			}
-		} else {
-			fprintf(stderr, "[DBG]  %s considered irrelevant for %s -> %s\n",
-				dstpair->one->path, seen->one->path, seen->two->path);
-			continue;
+			fprintf(stderr, "[DBG]  ? %s -> %s\n", seen->one->path, seen->two->path);
+			/* subdir of "seen" source dir ? */
+			if (!prefixcmp(one_parent_path, seen->one->path)) {
+				/* subdir of "seen" dest dir ? */
+				if (!prefixcmp(two_parent_path, seen->two->path)) {
+					if (!strcmp(seen->one->path, one_parent_path) &&
+					    !strcmp(seen->two->path, two_parent_path)) {
+						/* already added */
+						fprintf(stderr, "[DBG] already added\n");
+						needs_adding = 0;
+					} else // confirms
+						fprintf(stderr, "[DBG]  'dstpair' conforts 'seen'\n");
+				} else {
+					fprintf(stderr, "[DBG] discarding %s -> %s from bulk moves (split into %s and %s)\n",
+						seen->one->path, seen->two->path,
+						two_parent_path, seen->two->path);
+					// FIXME: may be worth to free it instead
+					seen->discarded = 1;
+					if (!strcmp(seen->one->path, one_parent_path) &&
+					    prefixcmp(seen->two->path, two_parent_path)) {
+						fprintf(stderr, "[DBG] ... and not adding self\n");
+						needs_adding = 0;
+					}
+					continue;
+				}
+			}
+			else fprintf(stderr, "[DBG]  'dstpair' unrelated to 'seen'\n");
+
+			/* dstpair confirms seen, or does not infirm */
+			fprintf(stderr, "[DBG] %s -> %s DOES NOT cause discard of %s -> %s\n",
+				dstpair->one->path, dstpair->two->path,
+				seen->one->path, seen->two->path);
+		}
+		if (needs_adding) { /* record potential dir rename */
+			/* all checks ok, we keep that entry */
+			seen = xmalloc(sizeof(*seen));
+			seen->one = alloc_filespec(one_parent_path);
+			fill_filespec(seen->one, null_sha1, S_IFDIR);
+			seen->two = alloc_filespec(two_parent_path);
+			fill_filespec(seen->two, null_sha1, S_IFDIR);
+			seen->discarded = 0;
+			seen->next = bulkmove_candidates;
+			bulkmove_candidates = seen;
+			fprintf(stderr, "[DBG] %s -> %s suggests possible bulk move from %s to %s\n",
+				dstpair->one->path,
+				dstpair->two->path,
+				one_parent_path, two_parent_path);
 		}
 
-		/* dstpair confirms seen */
-		fprintf(stderr, "[DBG] %s -> %s DOES NOT cause discard of %s -> %s\n",
-			dstpair->one->path, dstpair->two->path,
-			seen->one->path, seen->two->path);
-	}
-	if (needs_adding) { /* record potential dir rename */
-		/* all checks ok, we keep that entry */
-		seen = xmalloc(sizeof(*seen));
-		seen->one = alloc_filespec(one_parent_path);
-		fill_filespec(seen->one, null_sha1, S_IFDIR);
-		seen->two = alloc_filespec(two_parent_path);
-		fill_filespec(seen->two, null_sha1, S_IFDIR);
-		seen->discarded = 0;
-		seen->next = bulkmove_candidates;
-		bulkmove_candidates = seen;
-		fprintf(stderr, "[DBG] %s -> %s suggests possible bulk move from %s to %s\n",
-			dstpair->one->path,
-			dstpair->two->path,
-			one_parent_path, two_parent_path);
-		return;
-	}
+		/* next parent if any */
+		copy_dirname(one_parent_path, one_parent_path);
+	} while (*one_parent_path);
 }
 
 /*
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 5/5] [WIP] Allow hiding renames of individual files involved in a directory rename.
  2010-10-14 23:29       ` [PATCH v6 4/5] [RFC] Consider all parents of a file as candidates for bulk rename Yann Dirson
@ 2010-10-14 23:29         ` Yann Dirson
  2010-10-17 19:24         ` [PATCH v6.1] [RFC] Consider all parents of a file as candidates for bulk rename Yann Dirson
  1 sibling, 0 replies; 17+ messages in thread
From: Yann Dirson @ 2010-10-14 23:29 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson, Yann Dirson

Once has identified groups of bulk-moved files, and then
the --hide-bulk-move-details flag hides those of the individual renames
which carry no other information (further name change, or content changes).

This makes it much easier to a human reader to spot content changes
in a commit that also moves a whole subtree.

Important note: unified diff output is not currently useful, since the "bulk move"
headers are not yet added by --detect-bulk-moves, but the redundant renames are
really removed.

Signed-off-by: Yann Dirson <ydirson@free.fr>
---
 diff.c            |    7 +++++
 diff.h            |    3 ++
 diffcore-rename.c |   67 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 diffcore.h        |    1 +
 4 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index 4de43d6..81282bf 100644
--- a/diff.c
+++ b/diff.c
@@ -3193,6 +3193,13 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		if (!options->detect_rename)
 			options->detect_rename = DIFF_DETECT_RENAME;
 	}
+	else if (!strcmp(arg, "--hide-bulk-move-details")) {
+		DIFF_OPT_SET(options, HIDE_DIR_RENAME_DETAILS);
+		if (!DIFF_OPT_TST(options, DETECT_BULK_MOVES))
+			DIFF_OPT_SET(options, DETECT_BULK_MOVES);
+		if (!options->detect_rename)
+			options->detect_rename = DIFF_DETECT_RENAME;
+	}
 	else if (!strcmp(arg, "--follow"))
 		DIFF_OPT_SET(options, FOLLOW_RENAMES);
 	else if (!strcmp(arg, "--color"))
diff --git a/diff.h b/diff.h
index b0d6fa6..7f132d0 100644
--- a/diff.h
+++ b/diff.h
@@ -79,6 +79,7 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options *opt, void *data)
 #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
 #define DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG (1 << 27)
 #define DIFF_OPT_DETECT_BULK_MOVES  (1 << 28)
+#define DIFF_OPT_HIDE_DIR_RENAME_DETAILS (1 << 29)
 
 #define DIFF_OPT_TST(opts, flag)    ((opts)->flags & DIFF_OPT_##flag)
 #define DIFF_OPT_SET(opts, flag)    ((opts)->flags |= DIFF_OPT_##flag)
@@ -268,6 +269,8 @@ extern void diffcore_fix_diff_index(struct diff_options *);
 "                try unchanged files as candidate for copy detection.\n" \
 "  --detect-bulk-moves\n" \
 "                detect wholesale directory renames.\n" \
+"  --hide-bulk-move-details\n" \
+"                hide renames of individual files in a directory rename.\n" \
 "  -l<n>         limit rename attempts up to <n> paths.\n" \
 "  -O<file>      reorder diffs according to the <file>.\n" \
 "  -S<string>    find filepair whose only one side contains the string.\n" \
diff --git a/diffcore-rename.c b/diffcore-rename.c
index ff69201..b419953 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -456,6 +456,34 @@ struct diff_dir_rename {
 };
 
 /*
+ * Marks as such file_rename if it is made uninteresting by dir_rename.
+ * Returns -1 if the file_rename is outside of the range in which given
+ * renames concerned by dir_rename are to be found (ie. end of loop),
+ * 0 otherwise.
+ */
+static int maybe_mark_uninteresting(struct diff_rename_dst* file_rename,
+				    struct diff_dir_rename* dir_rename,
+				    int one_len, int two_len)
+{
+	if (!file_rename->pair) /* file add */
+		return 0;
+	if (strncmp(file_rename->two->path,
+		    dir_rename->two->path, two_len))
+		return -1;
+	if (strncmp(file_rename->pair->one->path,
+		    dir_rename->one->path, one_len) ||
+	    !basename_same(file_rename->pair->one, file_rename->two) ||
+	    file_rename->pair->score != MAX_SCORE)
+		return 0;
+
+	file_rename->pair->uninteresting_rename = 1;
+	fprintf (stderr, "[DBG] %s* -> %s* makes %s -> %s uninteresting\n",
+		dir_rename->one->path, dir_rename->two->path,
+		file_rename->pair->one->path, file_rename->two->path);
+	return 0;
+}
+
+/*
  * Copy dirname of src into dst, suitable to append a filename without
  * an additional "/".
  * Only handles relative paths since there is no absolute path in a git repo.
@@ -670,11 +698,43 @@ static void check_one_bulk_move(struct diff_filepair *dstpair)
  * Take all file renames recorded so far and check if they could cause
  * a bulk move to be detected.
  */
-static void diffcore_bulk_moves(void)
+static void diffcore_bulk_moves(int opt_hide_renames)
 {
 	int i;
 	for (i = 0; i < rename_dst_nr; i++)
 		check_one_bulk_move(rename_dst[i].pair);
+
+	if (opt_hide_renames) {
+		// flag as "uninteresting" those candidates hidden by dir move
+		struct diff_dir_rename* candidate;
+		for (candidate=bulkmove_candidates;
+		     candidate; candidate = candidate->next) {
+			int two_idx, i, one_len, two_len;
+			if (candidate->discarded)
+				continue;
+
+			// bisect to any entry within candidate dst dir
+			struct diff_rename_dst* two_sample =
+				locate_rename_dst_dir(candidate->two->path);
+			if (!two_sample) {
+				die ("PANIC: %s candidate of rename not in target tree (from %s)\n",
+				     candidate->two->path, candidate->one->path);
+			}
+			two_idx = two_sample - rename_dst;
+
+			// now remove extraneous 100% files inside.
+			one_len = strlen(candidate->one->path);
+			two_len = strlen(candidate->two->path);
+			for (i = two_idx; i < rename_dst_nr; i++)
+				if (maybe_mark_uninteresting (rename_dst+i, candidate,
+							      one_len, two_len) < 0)
+					break;
+			for (i = two_idx-1; i >= 0; i--)
+				if (maybe_mark_uninteresting (rename_dst+i, candidate,
+							      one_len, two_len) < 0)
+					break;
+		}
+	}
 }
 
 void diffcore_rename(struct diff_options *options)
@@ -847,7 +907,7 @@ void diffcore_rename(struct diff_options *options)
 
 	/* Now possibly factorize those renames and copies. */
 	if (DIFF_OPT_TST(options, DETECT_BULK_MOVES))
-		diffcore_bulk_moves();
+		diffcore_bulk_moves(DIFF_OPT_TST(options, HIDE_DIR_RENAME_DETAILS));
 
 	DIFF_QUEUE_CLEAR(&outq);
 
@@ -881,7 +941,8 @@ void diffcore_rename(struct diff_options *options)
 			struct diff_rename_dst *dst =
 				locate_rename_dst(p->two, 0);
 			if (dst && dst->pair) {
-				diff_q(&outq, dst->pair);
+				if (!dst->pair->uninteresting_rename)
+					diff_q(&outq, dst->pair);
 				pair_to_free = p;
 			}
 			else
diff --git a/diffcore.h b/diffcore.h
index 6dab95b..a4eb8e1 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -69,6 +69,7 @@ struct diff_filepair {
 	unsigned broken_pair : 1;
 	unsigned renamed_pair : 1;
 	unsigned is_unmerged : 1;
+	unsigned uninteresting_rename : 1;
 	unsigned is_bulkmove : 1;
 };
 #define DIFF_PAIR_UNMERGED(p) ((p)->is_unmerged)
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH] compat: add memrchr()
  2010-10-14 23:29 [PATCH v6 0/5] Detection of directory renames Yann Dirson
  2010-10-14 23:29 ` [PATCH v6 1/5] Introduce bulk-move detection in diffcore Yann Dirson
@ 2010-10-15  5:17 ` Jonathan Nieder
  2010-10-15  5:31   ` Ævar Arnfjörð Bjarmason
                     ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Jonathan Nieder @ 2010-10-15  5:17 UTC (permalink / raw)
  To: Yann Dirson; +Cc: git

Reimplement another handy convenience function from glibc.  memrchr()
searches from the end of a memory area for a particular character.  It
is similar to strrchr() but takes a length argument and is
binary-safe.

The whole-directory rename detection patch could use this to find the
last directory separator in a (possibly truncated) pathname.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Yann Dirson wrote:

> * memrchr() implementation for portability

Something like this?  Untested.

 git-compat-util.h |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index 2af8d3e..6f1020e 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -366,6 +366,9 @@ extern int git_vsnprintf(char *str, size_t maxsize,
 #define HAVE_STRCHRNUL
 #define HAVE_MEMPCPY
 #endif
+#if __GLIBC_PREREQ(2, 2)
+#define HAVE_MEMRCHR
+#endif
 #endif
 
 #ifndef HAVE_STRCHRNUL
@@ -386,6 +389,19 @@ static inline void *gitmempcpy(void *dest, const void *src, size_t n)
 }
 #endif
 
+#ifndef HAVE_MEMRCHR
+#define memrchr gitmemrchr
+static inline void *gitmemrchr(const void *s, int c, size_t n)
+{
+	const char *p = s;
+	p += n;
+	while (p != s)
+		if (*--p == c)
+			return p;
+	return NULL;
+}
+#endif
+
 extern void release_pack_memory(size_t, int);
 
 typedef void (*try_to_free_t)(size_t);
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] compat: add memrchr()
  2010-10-15  5:17 ` [PATCH] compat: add memrchr() Jonathan Nieder
@ 2010-10-15  5:31   ` Ævar Arnfjörð Bjarmason
  2010-10-15  6:06     ` Jonathan Nieder
  2010-10-15  6:57     ` Johannes Sixt
  2010-10-15  8:56   ` Ludvig Strigeus
  2010-10-15  8:56   ` [PATCH] " Erik Faye-Lund
  2 siblings, 2 replies; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-15  5:31 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Yann Dirson, git

On Fri, Oct 15, 2010 at 05:17, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Reimplement another handy convenience function from glibc.  memrchr()
> searches from the end of a memory area for a particular character.  It
> is similar to strrchr() but takes a length argument and is
> binary-safe.
>
> The whole-directory rename detection patch could use this to find the
> last directory separator in a (possibly truncated) pathname.

Maybe something like this for configure.ac:

    AC_CHECK_LIB([c], [memchr],
    [HAVE_MEMRCHR=YesPlease],
    [HAVE_MEMRCHR=])
    AC_SUBST(HAVE_MEMRCHR)

And documentation with the other HAVE_* variables at the top of the
Makefile?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] compat: add memrchr()
  2010-10-15  5:31   ` Ævar Arnfjörð Bjarmason
@ 2010-10-15  6:06     ` Jonathan Nieder
  2010-10-15 10:49       ` Ævar Arnfjörð Bjarmason
  2010-10-15  6:57     ` Johannes Sixt
  1 sibling, 1 reply; 17+ messages in thread
From: Jonathan Nieder @ 2010-10-15  6:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Yann Dirson, git

Ævar Arnfjörð Bjarmason wrote:

> Maybe something like this for configure.ac:
> 
>     AC_CHECK_LIB([c], [memchr],
>     [HAVE_MEMRCHR=YesPlease],
>     [HAVE_MEMRCHR=])
>     AC_SUBST(HAVE_MEMRCHR)
> 
> And documentation with the other HAVE_* variables at the top of the
> Makefile?

Hmm, the BSDs and plan 9 have an memrchr() apparently.   Any idea for
taking advantage of that (the makefile support part) that's less ugly
than this?

If we miss a platform, that's no big deal.  The 1-char-at-a-time
loop is not so slow, and the "#define memrchr gitmemrchr" ensures
that it would not conflict with the libc version.
---
diff --git a/Makefile b/Makefile
index 1f1ce04..fa91530 100644
--- a/Makefile
+++ b/Makefile
@@ -211,6 +211,8 @@ all::
 #
 # Define NO_REGEX if you have no or inferior regex support in your C library.
 #
+# Define HAVE_MEMRCHR if you have memrchr() in your C library.
+#
 # Define JSMIN to point to JavaScript minifier that functions as
 # a filter to have gitweb.js minified.
 #
@@ -1388,6 +1390,9 @@ endif
 ifdef NO_UINTMAX_T
 	BASIC_CFLAGS += -Duintmax_t=uint32_t
 endif
+ifdef HAVE_MEMRCHR
+	BASIC_CFLAGS += -DHAVE_MEMRCHR
+endif
 ifdef NO_SOCKADDR_STORAGE
 ifdef NO_IPV6
 	BASIC_CFLAGS += -Dsockaddr_storage=sockaddr_in
diff --git a/git-compat-util.h b/git-compat-util.h
index 6f1020e..45aaebc 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -366,7 +366,7 @@ extern int git_vsnprintf(char *str, size_t maxsize,
 #define HAVE_STRCHRNUL
 #define HAVE_MEMPCPY
 #endif
-#if __GLIBC_PREREQ(2, 2)
+#if __GLIBC_PREREQ(2, 2) && !defined(HAVE_MEMRCHR)
 #define HAVE_MEMRCHR
 #endif
 #endif

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] compat: add memrchr()
  2010-10-15  5:31   ` Ævar Arnfjörð Bjarmason
  2010-10-15  6:06     ` Jonathan Nieder
@ 2010-10-15  6:57     ` Johannes Sixt
  1 sibling, 0 replies; 17+ messages in thread
From: Johannes Sixt @ 2010-10-15  6:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Jonathan Nieder, Yann Dirson, git

Am 10/15/2010 7:31, schrieb Ævar Arnfjörð Bjarmason:
> Maybe something like this for configure.ac:
> 
>     AC_CHECK_LIB([c], [memchr],

Don't forget to change this to [memrchr] before you submit a patch.

>     [HAVE_MEMRCHR=YesPlease],
>     [HAVE_MEMRCHR=])
>     AC_SUBST(HAVE_MEMRCHR)

-- Hannes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] compat: add memrchr()
  2010-10-15  5:17 ` [PATCH] compat: add memrchr() Jonathan Nieder
  2010-10-15  5:31   ` Ævar Arnfjörð Bjarmason
@ 2010-10-15  8:56   ` Ludvig Strigeus
  2010-10-15 15:26     ` [PATCH v2] " Jonathan Nieder
  2010-10-15  8:56   ` [PATCH] " Erik Faye-Lund
  2 siblings, 1 reply; 17+ messages in thread
From: Ludvig Strigeus @ 2010-10-15  8:56 UTC (permalink / raw)
  To: git

Jonathan Nieder <jrnieder <at> gmail.com> writes:

> 
> Reimplement another handy convenience function from glibc.  memrchr()
> searches from the end of a memory area for a particular character.  It
> is similar to strrchr() but takes a length argument and is
> binary-safe.

System memrchr uses a byte comparison rather than an int comparison.

"The memchr() function scans the first n bytes of the memory area pointed to by
s for the character c. The first byte to match c (interpreted as an unsigned
character) stops the operation. "

See: http://linux.die.net/man/3/memrchr

It's a good idea to do the same thing for compatibility reasons.

/Ludde

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] compat: add memrchr()
  2010-10-15  5:17 ` [PATCH] compat: add memrchr() Jonathan Nieder
  2010-10-15  5:31   ` Ævar Arnfjörð Bjarmason
  2010-10-15  8:56   ` Ludvig Strigeus
@ 2010-10-15  8:56   ` Erik Faye-Lund
  2010-10-15  9:35     ` Johannes Sixt
  2 siblings, 1 reply; 17+ messages in thread
From: Erik Faye-Lund @ 2010-10-15  8:56 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Yann Dirson, git

On Fri, Oct 15, 2010 at 7:17 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Reimplement another handy convenience function from glibc.  memrchr()
> searches from the end of a memory area for a particular character.  It
> is similar to strrchr() but takes a length argument and is
> binary-safe.
>
> The whole-directory rename detection patch could use this to find the
> last directory separator in a (possibly truncated) pathname.
>

Are you sure this would work on Windows where both '/' and '\' are
valid directory separators?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] compat: add memrchr()
  2010-10-15  8:56   ` [PATCH] " Erik Faye-Lund
@ 2010-10-15  9:35     ` Johannes Sixt
  0 siblings, 0 replies; 17+ messages in thread
From: Johannes Sixt @ 2010-10-15  9:35 UTC (permalink / raw)
  To: kusmabite; +Cc: Jonathan Nieder, Yann Dirson, git

Am 10/15/2010 10:56, schrieb Erik Faye-Lund:
> On Fri, Oct 15, 2010 at 7:17 AM, Jonathan Nieder <jrnieder@gmail.com> wrote:
>> Reimplement another handy convenience function from glibc.  memrchr()
>> searches from the end of a memory area for a particular character.  It
>> is similar to strrchr() but takes a length argument and is
>> binary-safe.
>>
>> The whole-directory rename detection patch could use this to find the
>> last directory separator in a (possibly truncated) pathname.
>>
> 
> Are you sure this would work on Windows where both '/' and '\' are
> valid directory separators?

I'm sure this would be used only on paths that were constructed from index
or repository contents; there, the directory separator is '/' by definition.

-- Hannes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] compat: add memrchr()
  2010-10-15  6:06     ` Jonathan Nieder
@ 2010-10-15 10:49       ` Ævar Arnfjörð Bjarmason
  2010-10-15 22:27         ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-10-15 10:49 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Yann Dirson, git

On Fri, Oct 15, 2010 at 06:06, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Ævar Arnfjörð Bjarmason wrote:
>
>> Maybe something like this for configure.ac:
>>
>>     AC_CHECK_LIB([c], [memchr],
>>     [HAVE_MEMRCHR=YesPlease],
>>     [HAVE_MEMRCHR=])
>>     AC_SUBST(HAVE_MEMRCHR)
>>
>> And documentation with the other HAVE_* variables at the top of the
>> Makefile?
>
> Hmm, the BSDs and plan 9 have an memrchr() apparently.   Any idea for
> taking advantage of that (the makefile support part) that's less ugly
> than this?

Check out my "Makefile & configure: add a NO_FNMATCH flag" and
"Makefile & configure: add a NO_FNMATCH_CASEFOLD flag" patches already
in pu.

Maybe it should be a NO_* flag since that's what we use when we expect
sanity, e.g. we have NO_REGEXP=, NO_CURL and NO_GETTEXT.

Then you just need NO_MEMRCHR=UnfortunatelyYes entries in the big
if/else block for those platforms that don't have it.

But memrchr() is a GNU extension so it should probably be a HAVE_*. I
don't know.

> If we miss a platform, that's no big deal.  The 1-char-at-a-time
> loop is not so slow, and the "#define memrchr gitmemrchr" ensures
> that it would not conflict with the libc version.

Yup, probably best as a HAVE_*. In any case the autoconf probe I added
should work when memrchr is spelled properly.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2] compat: add memrchr()
  2010-10-15  8:56   ` Ludvig Strigeus
@ 2010-10-15 15:26     ` Jonathan Nieder
  0 siblings, 0 replies; 17+ messages in thread
From: Jonathan Nieder @ 2010-10-15 15:26 UTC (permalink / raw)
  To: Ludvig Strigeus
  Cc: git, Erik Faye-Lund, Yann Dirson,
	Ævar Arnfjörð Bjarmason

Reimplement another handy convenience function from glibc.  memrchr()
searches from the end of a memory area for a particular character.  It
is similar to strrchr() but takes a length argument and is
binary-safe.

The whole-directory rename detection patch could use this to find the
last '/' in a (possibly truncated) pathname.

The system memrchr() is used on glibc systems to provide a sanity
check that our code works with a non-custom implementation.  Yes, the
various BSDs have their own highly optimized memrchr(), too.  The
planned use of memrchr in git is for clarity, not speed, so it is not
obvious that the makefile+autoconf magic to use libc's implementation
on a wide variety of operating systems would be worth the time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Ludvig Strigeus wrote:

> System memrchr uses a byte comparison rather than an int comparison.
> 
> "The memchr() function scans the first n bytes of the memory area pointed to by
> s for the character c. The first byte to match c (interpreted as an unsigned
> character) stops the operation. "

Oh, right.  (strchr() uses char, memchr() uses unsigned char.)

 git-compat-util.h |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index 2af8d3e..6de9dee 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -366,6 +366,9 @@ extern int git_vsnprintf(char *str, size_t maxsize,
 #define HAVE_STRCHRNUL
 #define HAVE_MEMPCPY
 #endif
+#if __GLIBC_PREREQ(2, 2)
+#define HAVE_MEMRCHR
+#endif
 #endif
 
 #ifndef HAVE_STRCHRNUL
@@ -386,6 +389,19 @@ static inline void *gitmempcpy(void *dest, const void *src, size_t n)
 }
 #endif
 
+#ifndef HAVE_MEMRCHR
+#define memrchr gitmemrchr
+static inline void *gitmemrchr(const void *s, int c, size_t n)
+{
+	const unsigned char *p = s;
+	p += n;
+	while (p != s)
+		if (*--p == (unsigned char) c)
+			return p;
+	return NULL;
+}
+#endif
+
 extern void release_pack_memory(size_t, int);
 
 typedef void (*try_to_free_t)(size_t);
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] compat: add memrchr()
  2010-10-15 10:49       ` Ævar Arnfjörð Bjarmason
@ 2010-10-15 22:27         ` Junio C Hamano
  0 siblings, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2010-10-15 22:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Jonathan Nieder, Yann Dirson, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Maybe it should be a NO_* flag since that's what we use when we expect
> sanity, e.g. we have NO_REGEXP=, NO_CURL and NO_GETTEXT.
>
> Then you just need NO_MEMRCHR=UnfortunatelyYes entries in the big
> if/else block for those platforms that don't have it.

Hmmm, perhaps our naming convention have deteriorated over time.

I think one of the oldest one is NO_CURL, which started out as a way to
say "I do not have to go over http and dumb http transport is of no use to
me.  I _might_ have curl library installed on this system, but it does not
matter.  No thanks, please do not build git with curl support".  For some
people, it also meant "Unfortunately I do not have a working curl library
here, so I'll live with a git without http support".

On the other hand, NO_REGEX means "regrettably the platform implementation
is not good enough for our codebase, and we do need to use a replacement
implementation from compat/".

The difference is that it is not an option to build a less capable git
that lacks features that depend on REGEX.  But somehow we ended up using
the same NO_ prefix, conflating the two.  In that sense NO_GETTEXT is
named correctly, I think, as we fall back on implementation (i.e. less
capable git) that does not do i18n/l10n.

> But memrchr() is a GNU extension so it should probably be a HAVE_*. I
> don't know.

Yes, if we want to express that we positively rely on the existence of an
extension, HAVE_* does sound more correct.  It also is how the world with
autoconf/configure works.

We might want to clean up the names of variables at some point, but from a
quick glance things do not look very good.

Most NO_FOO are misnamed and should be !HAVE_FOO, but because our codebase
tend to use more common features, switching from the way we currently do,
i.e. assuming we have FOO and defining NO_FOO for oddball platforms, to
the other way around of defining HAVE_FOO for platforms where we do have FOO
and omit it for the ones that lack, may make the Makefile more cluttered.

But as I already said, switching to HAVE_FOO has one definite advantage;
it is more conventional and would mesh a lot better with the way
autoconf/configure would do things.

Here is a quick break-down of the current set I found from the 'master'
branch.

* You decline to build, or are unable to build, git with...

  NO_CURL		transport based on libcurl e.g. HTTP.
  NO_ICONV		charset re-encoding support.
  NO_IPV6		IPV6
  NO_NSEC		nanosecond resolution file timestamps
  NO_POSIX_ONLY_PROGRAMS
  NO_PYTHON		anything that depends on Python
  NO_SYMLINK_HEAD	support for original layout where .git/HEAD was a symlink
  NO_SVN_TESTS		test that spends too much time on git-svn


* We work around the lack of the platform feature FOO without loss of
  features from git (i.e. should be !HAVE_FOO):

  NO_C99_FORMAT		SZ_FMT and friends
  NO_D_INO_IN_DIRENT	(struct dirent)d.d_ino
  NO_D_TYPE_IN_DIRENT	(struct dirent)d.d_type
  NO_HSTRERROR
  NO_INET_NTOP
  NO_INET_PTON
  NO_LIBGEN_H
  NO_MEMMEM
  NO_MKDTEMP
  NO_MKSTEMPS
  NO_MMAP
  HAVE_PATHS_H		#include <paths.h>
  NO_PERL_MAKEMAKER
  NO_PREAD
  NO_REGEX
  NO_R_TO_GCC_LINKER
  NO_SETENV
  NO_SOCKADDR_STORAGE
  NO_ST_BLOCKS_IN_STRUCT_STAT
  NO_STRCASESTR
  NO_STRLCPY
  NO_STRTOK_R
  NO_STRTOULL
  NO_STRTOUMAX
  NO_SYS_SELECT_H
  NO_TRUSTABLE_FILEMODE
  NO_UINTMAX_T
  NO_UNSETENV

Unfortunately the world is not so black-and-white.  There is another class

* Depending on platform features, we use different implementation with
  almost no observable difference in behaviour (perhaps except for
  performance)

  NO_PTHREADS
	Use of threads for better machine utilization

  NO_FAST_WORKING_DIRECTORY
	Reading from object database is faster than opening a file in the
	working directory and reading from it on this platform, so we do
	the former when we know the file in the working directory is clean.
        Perhaps !HAVE_FAST_WORKING_DIRECTORY.

And yet another that is completely unrelated:

  NO_SUBDIR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v6.1] [RFC] Consider all parents of a file as candidates for bulk rename.
  2010-10-14 23:29       ` [PATCH v6 4/5] [RFC] Consider all parents of a file as candidates for bulk rename Yann Dirson
  2010-10-14 23:29         ` [PATCH v6 5/5] [WIP] Allow hiding renames of individual files involved in a directory rename Yann Dirson
@ 2010-10-17 19:24         ` Yann Dirson
  1 sibling, 0 replies; 17+ messages in thread
From: Yann Dirson @ 2010-10-17 19:24 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson, Yann Dirson

We previously only looked at the immediate parents of a file involved
in a move.  Now consider upper ones as candidates too.

The way it is done here is somewhat inefficient, but at least it
handles all previously-known cases of incorrectness.

Signed-off-by: Yann Dirson <ydirson@free.fr>
---
 diffcore-rename.c |  120 ++++++++++++++++++++++++++++-------------------------
 1 files changed, 63 insertions(+), 57 deletions(-)

There was a major flaw in the first iteration of this patch (never
looking again at discarded renames).  This iteration also adds more
explicit comments about why this piece of code should work, despite
having been written late at night and having already shown one problem
which demonstrated that a testsuite is no replacement for a good night
of sleep :)

I'm still not happy with the thing, though - the loops can surely be
made shorter by using the discarded flags previously set, and more
importantly we need to consider moves to parents of the file's
destination too, which will add a loop of its own which will have to
be made as short as possible.  All of this seems to require to sort
bulkmove_candidates so we can binary-search into it and rapidly access
related paths.

Other opinions on the spirit of this patch still gladly welcome :)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 1be1af1..a8f3a8b 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -542,7 +542,7 @@ static int maybe_disqualify_bulkmove(const char* one_parent_path,
 		if (strncmp(rename_dst[j].two->path,
 			    one_parent_path, onep_len))
 			break; /* exhausted directory in this direction */
-		fprintf(stderr, "[DBG] leftover file %s in %s\n",
+		fprintf(stderr, "[DBG] leftover file %s in '%s'\n",
 			rename_dst[j].two->path, one_parent_path);
 		if (rename_dst[j].i_am_not_single || /* those were already here */
 		    (rename_dst[j].pair &&
@@ -596,66 +596,72 @@ static void check_one_bulk_move(struct diff_filepair *dstpair)
 	    maybe_disqualify_bulkmove(one_parent_path, one_leftover))
 		return;
 
-	fprintf(stderr, "[] %s -> %s ?\n",dstpair->one->path, dstpair->two->path);
-
-	// FIXME: loop over successive prefixes
-	unsigned needs_adding = 1;
+	fprintf(stderr, "[] %s -> %s ?\n", dstpair->one->path, dstpair->two->path);
 
-	/* already considered ? */
-	for (seen=bulkmove_candidates; seen; seen = seen->next) {
-		if (seen->discarded) {
-			/* already seen a rename from seen->one to some than ->two */
-			needs_adding = 0;
-			continue;
-		}
-		/* check exact dir */
-		if (!strcmp(seen->one->path, one_parent_path)) {
-			/* already added */
-			needs_adding = 0;
-			/* check that seen entry matches this rename */
-			if (strcmp(two_parent_path, seen->two->path)) {
-				fprintf(stderr, "[DBG] discarding dir %s from bulk moves (split into %s and %s)\n",
-					seen->one->path, two_parent_path, seen->two->path);
-				// FIXME: may be worth to free it instead
-				seen->discarded = 1;
+	/* loop up one_parent_path over successive parents */
+	// FIXME: also loop over two_parent_path prefixes ?
+	// FIXME: use a more informative prefixcmp to avoid strcmp calls
+	do {
+		unsigned needs_adding = 1;
+
+		fprintf(stderr, "[[]] %s ...\n", one_parent_path);
+
+		/* already considered ? */
+		for (seen=bulkmove_candidates; seen; seen = seen->next) {
+			fprintf(stderr, "[DBG]  ? %s -> %s\n", seen->one->path, seen->two->path);
+			/* subdir of "seen" source dir ? */
+			if (!prefixcmp(one_parent_path, seen->one->path)) {
+				/* subdir of "seen" dest dir ? */
+				if (!prefixcmp(two_parent_path, seen->two->path)) {
+					if (!strcmp(seen->one->path, one_parent_path) &&
+					    !strcmp(seen->two->path, two_parent_path)) {
+						/* already added */
+						fprintf(stderr, "[DBG]  already added\n");
+						needs_adding = 0;
+					} else /* is in a subdir: confirms but still
+						* may need adding */
+						fprintf(stderr, "[DBG]  'dstpair' conforts 'seen'\n");
+				} else {
+					fprintf(stderr, "[DBG] discarding %s -> %s from bulk moves (split into %s and %s)\n",
+						seen->one->path, seen->two->path,
+						two_parent_path, seen->two->path);
+					seen->discarded = 1;
+					/* Need to discard dstpair as well, unless moving
+					 * from a strict subdir of seen->one or to a
+					 * strict subdir of seen->two */
+					if (!strcmp(seen->one->path, one_parent_path) &&
+					    prefixcmp(seen->two->path, two_parent_path)) {
+						fprintf(stderr, "[DBG] ... and not adding self\n");
+						needs_adding = 0;
+					}
+					continue;
+				}
 			}
-			continue;
+			else fprintf(stderr, "[DBG]  %s outside of %s\n",
+				     one_parent_path, seen->one->path);
+
+			/* dstpair confirms seen, or does not infirm */
+			fprintf(stderr, "[DBG] %s -> %s DOES NOT cause discard of %s -> %s\n",
+				dstpair->one->path, dstpair->two->path,
+				seen->one->path, seen->two->path);
 		}
-		if (!prefixcmp(one_parent_path, seen->one->path)) {
-			if (prefixcmp(two_parent_path, seen->two->path)) {
-				fprintf(stderr, "[DBG] discarding dir %s from bulk moves (split into %s and %s)\n",
-					seen->one->path, two_parent_path, seen->two->path);
-				// FIXME: may be worth to free it instead
-				seen->discarded = 1;
-				continue;
-			}
-		} else {
-			fprintf(stderr, "[DBG]  %s considered irrelevant for %s -> %s\n",
-				dstpair->one->path, seen->one->path, seen->two->path);
-			continue;
+		if (needs_adding) { /* record potential dir rename */
+			seen = xmalloc(sizeof(*seen));
+			seen->one = alloc_filespec(one_parent_path);
+			fill_filespec(seen->one, null_sha1, S_IFDIR);
+			seen->two = alloc_filespec(two_parent_path);
+			fill_filespec(seen->two, null_sha1, S_IFDIR);
+			seen->discarded = 0;
+			seen->next = bulkmove_candidates;
+			bulkmove_candidates = seen;
+			fprintf(stderr, "[DBG] %s -> %s suggests possible bulk move from %s to %s\n",
+				dstpair->one->path,
+				dstpair->two->path,
+				one_parent_path, two_parent_path);
 		}
-
-		/* dstpair confirms seen */
-		fprintf(stderr, "[DBG] %s -> %s DOES NOT cause discard of %s -> %s\n",
-			dstpair->one->path, dstpair->two->path,
-			seen->one->path, seen->two->path);
-	}
-	if (needs_adding) { /* record potential dir rename */
-		/* all checks ok, we keep that entry */
-		seen = xmalloc(sizeof(*seen));
-		seen->one = alloc_filespec(one_parent_path);
-		fill_filespec(seen->one, null_sha1, S_IFDIR);
-		seen->two = alloc_filespec(two_parent_path);
-		fill_filespec(seen->two, null_sha1, S_IFDIR);
-		seen->discarded = 0;
-		seen->next = bulkmove_candidates;
-		bulkmove_candidates = seen;
-		fprintf(stderr, "[DBG] %s -> %s suggests possible bulk move from %s to %s\n",
-			dstpair->one->path,
-			dstpair->two->path,
-			one_parent_path, two_parent_path);
-		return;
-	}
+		/* next parent if any */
+		copy_dirname(one_parent_path, one_parent_path);
+	} while (*one_parent_path);
 }
 
 /*
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2010-10-17 19:24 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-14 23:29 [PATCH v6 0/5] Detection of directory renames Yann Dirson
2010-10-14 23:29 ` [PATCH v6 1/5] Introduce bulk-move detection in diffcore Yann Dirson
2010-10-14 23:29   ` [PATCH v6 2/5] Add testcases for the --detect-bulk-moves diffcore flag Yann Dirson
2010-10-14 23:29     ` [PATCH v6 3/5] [RFC] Handle the simpler case of a subdir invalidating bulk move Yann Dirson
2010-10-14 23:29       ` [PATCH v6 4/5] [RFC] Consider all parents of a file as candidates for bulk rename Yann Dirson
2010-10-14 23:29         ` [PATCH v6 5/5] [WIP] Allow hiding renames of individual files involved in a directory rename Yann Dirson
2010-10-17 19:24         ` [PATCH v6.1] [RFC] Consider all parents of a file as candidates for bulk rename Yann Dirson
2010-10-15  5:17 ` [PATCH] compat: add memrchr() Jonathan Nieder
2010-10-15  5:31   ` Ævar Arnfjörð Bjarmason
2010-10-15  6:06     ` Jonathan Nieder
2010-10-15 10:49       ` Ævar Arnfjörð Bjarmason
2010-10-15 22:27         ` Junio C Hamano
2010-10-15  6:57     ` Johannes Sixt
2010-10-15  8:56   ` Ludvig Strigeus
2010-10-15 15:26     ` [PATCH v2] " Jonathan Nieder
2010-10-15  8:56   ` [PATCH] " Erik Faye-Lund
2010-10-15  9:35     ` Johannes Sixt

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).