git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 4/7] diff.c: adjust hash function signature to match hashmap expectation
  2018-04-02 22:48 [RFC PATCH " Stefan Beller
@ 2018-04-02 22:48 ` Stefan Beller
  0 siblings, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2018-04-02 22:48 UTC (permalink / raw)
  To: jonathantanmy, jacob.keller, simon; +Cc: git, Stefan Beller

This makes the follow up patch easier.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 52f6e02130..879b8a5d9d 100644
--- a/diff.c
+++ b/diff.c
@@ -707,11 +707,15 @@ struct moved_entry {
 	struct moved_entry *next_line;
 };
 
-static int moved_entry_cmp(const struct diff_options *diffopt,
-			   const struct moved_entry *a,
-			   const struct moved_entry *b,
+static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
+			   const void *entry,
+			   const void *entry_or_key,
 			   const void *keydata)
 {
+	const struct diff_options *diffopt = hashmap_cmp_fn_data;
+	const struct moved_entry *a = entry;
+	const struct moved_entry *b = entry_or_key;
+
 	return !xdiff_compare_lines(a->es->line, a->es->len,
 				    b->es->line, b->es->len,
 				    diffopt->xdl_opts);
@@ -5534,10 +5538,8 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 		if (o->color_moved) {
 			struct hashmap add_lines, del_lines;
 
-			hashmap_init(&del_lines,
-				     (hashmap_cmp_fn)moved_entry_cmp, o, 0);
-			hashmap_init(&add_lines,
-				     (hashmap_cmp_fn)moved_entry_cmp, o, 0);
+			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
+			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
 
 			add_lines_to_move_detection(o, &add_lines, &del_lines);
 			mark_color_as_moved(o, &add_lines, &del_lines);
-- 
2.17.0.484.g0c8726318c-goog


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation
@ 2018-04-24 21:03 Stefan Beller
  2018-04-24 21:03 ` [PATCH 1/7] xdiff/xdiff.h: remove unused flags Stefan Beller
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Stefan Beller @ 2018-04-24 21:03 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, simon, avarab, jacob.keller, Stefan Beller

v2:
I think I have addressed Jonathans feedback
* by using a string instead of counting the first character only.
* refined tests slightly (easier to read)
* moved white space handling for moved blocks into its own flag field,
  keeping the enum for the actual mode of move detection.

v1:
This is a re-attempt of [1], which allows the moved code detection to
ignore blanks in various modes.

patches 1-5 are refactoring, patch 6 adds all existing white space options
of regular diff to the move detection. (I am unsure about this patch,
as I presume we want to keep the option space at a minimum if possible).

The fun is in the last patch, which allows white space sensitive
languages to trust the move detection, too. Each block that is marked as
moved will have the same delta in {in-, de-}dentation.
I would think this mode might be a reasonable default eventually.

Thanks,
Stefan

[1] https://public-inbox.org/git/20171025224620.27657-1-sbeller@google.com/

Stefan Beller (7):
  xdiff/xdiff.h: remove unused flags
  xdiff/xdiffi.c: remove unneeded function declarations
  diff.c: do not pass diff options as keydata to hashmap
  diff.c: adjust hash function signature to match hashmap expectation
  diff.c: add a blocks mode for moved code detection
  diff.c: decouple white space treatment from move detection algorithm
  diff.c: add --color-moved-ignore-space-delta option

 Documentation/diff-options.txt |  25 ++++-
 diff.c                         | 128 ++++++++++++++++++----
 diff.h                         |   8 +-
 t/t4015-diff-whitespace.sh     | 192 +++++++++++++++++++++++++++++++--
 xdiff/xdiff.h                  |   8 --
 xdiff/xdiffi.c                 |  17 ---
 6 files changed, 322 insertions(+), 56 deletions(-)

-- 
2.17.0.441.gb46fe60e1d-goog


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/7] xdiff/xdiff.h: remove unused flags
  2018-04-24 21:03 [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
@ 2018-04-24 21:03 ` Stefan Beller
  2018-04-24 21:03 ` [PATCH 2/7] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2018-04-24 21:03 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, simon, avarab, jacob.keller, Stefan Beller

These flags were there since the beginning (3443546f6e (Use a *real*
built-in diff generator, 2006-03-24), but were never used. Remove them.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 xdiff/xdiff.h | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index c1937a2911..2356da5f78 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -52,14 +52,6 @@ extern "C" {
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
 
-#define XDL_MMB_READONLY (1 << 0)
-
-#define XDL_MMF_ATOMIC (1 << 0)
-
-#define XDL_BDOP_INS 1
-#define XDL_BDOP_CPY 2
-#define XDL_BDOP_INSB 3
-
 /* merge simplification levels */
 #define XDL_MERGE_MINIMAL 0
 #define XDL_MERGE_EAGER 1
-- 
2.17.0.441.gb46fe60e1d-goog


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/7] xdiff/xdiffi.c: remove unneeded function declarations
  2018-04-24 21:03 [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
  2018-04-24 21:03 ` [PATCH 1/7] xdiff/xdiff.h: remove unused flags Stefan Beller
@ 2018-04-24 21:03 ` Stefan Beller
  2018-04-24 21:03 ` [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2018-04-24 21:03 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, simon, avarab, jacob.keller, Stefan Beller

There is no need to forward-declare these functions, as they are used
after their implementation only.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 xdiff/xdiffi.c | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 0de1ef463b..3e8aff92bc 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -22,34 +22,17 @@
 
 #include "xinclude.h"
 
-
-
 #define XDL_MAX_COST_MIN 256
 #define XDL_HEUR_MIN_COST 256
 #define XDL_LINE_MAX (long)((1UL << (CHAR_BIT * sizeof(long) - 1)) - 1)
 #define XDL_SNAKE_CNT 20
 #define XDL_K_HEUR 4
 
-
-
 typedef struct s_xdpsplit {
 	long i1, i2;
 	int min_lo, min_hi;
 } xdpsplit_t;
 
-
-
-
-static long xdl_split(unsigned long const *ha1, long off1, long lim1,
-		      unsigned long const *ha2, long off2, long lim2,
-		      long *kvdf, long *kvdb, int need_min, xdpsplit_t *spl,
-		      xdalgoenv_t *xenv);
-static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1, long chg2);
-
-
-
-
-
 /*
  * See "An O(ND) Difference Algorithm and its Variations", by Eugene Myers.
  * Basically considers a "box" (off1, off2, lim1, lim2) and scan from both
-- 
2.17.0.441.gb46fe60e1d-goog


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap
  2018-04-24 21:03 [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
  2018-04-24 21:03 ` [PATCH 1/7] xdiff/xdiff.h: remove unused flags Stefan Beller
  2018-04-24 21:03 ` [PATCH 2/7] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
@ 2018-04-24 21:03 ` Stefan Beller
  2018-04-24 21:03 ` [PATCH 4/7] diff.c: adjust hash function signature to match hashmap expectation Stefan Beller
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2018-04-24 21:03 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, simon, avarab, jacob.keller, Stefan Beller

When we initialize the hashmap, we give it a pointer to the
diff_options, which it then passes along to each call of the
hashmap_cmp_fn function. There's no need to pass it a second time as
the "keydata" parameter, and our comparison functions never look at
keydata.

This was a mistake left over from an earlier round of 2e2d5ac184
(diff.c: color moved lines differently, 2017-06-30), before hashmap
learned to pass the data pointer for us.

Explanation-by: Jeff King <peff@peff.net>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 1289df4b1f..ce7bedc1b9 100644
--- a/diff.c
+++ b/diff.c
@@ -842,13 +842,13 @@ static void mark_color_as_moved(struct diff_options *o,
 		case DIFF_SYMBOL_PLUS:
 			hm = del_lines;
 			key = prepare_entry(o, n);
-			match = hashmap_get(hm, key, o);
+			match = hashmap_get(hm, key, NULL);
 			free(key);
 			break;
 		case DIFF_SYMBOL_MINUS:
 			hm = add_lines;
 			key = prepare_entry(o, n);
-			match = hashmap_get(hm, key, o);
+			match = hashmap_get(hm, key, NULL);
 			free(key);
 			break;
 		default:
-- 
2.17.0.441.gb46fe60e1d-goog


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/7] diff.c: adjust hash function signature to match hashmap expectation
  2018-04-24 21:03 [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (2 preceding siblings ...)
  2018-04-24 21:03 ` [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
@ 2018-04-24 21:03 ` Stefan Beller
  2018-04-24 21:03 ` [PATCH 5/7] diff.c: add a blocks mode for moved code detection Stefan Beller
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2018-04-24 21:03 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, simon, avarab, jacob.keller, Stefan Beller

This makes the follow up patch easier.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index ce7bedc1b9..d1bae900cd 100644
--- a/diff.c
+++ b/diff.c
@@ -707,11 +707,15 @@ struct moved_entry {
 	struct moved_entry *next_line;
 };
 
-static int moved_entry_cmp(const struct diff_options *diffopt,
-			   const struct moved_entry *a,
-			   const struct moved_entry *b,
+static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
+			   const void *entry,
+			   const void *entry_or_key,
 			   const void *keydata)
 {
+	const struct diff_options *diffopt = hashmap_cmp_fn_data;
+	const struct moved_entry *a = entry;
+	const struct moved_entry *b = entry_or_key;
+
 	return !xdiff_compare_lines(a->es->line, a->es->len,
 				    b->es->line, b->es->len,
 				    diffopt->xdl_opts);
@@ -5534,10 +5538,8 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 		if (o->color_moved) {
 			struct hashmap add_lines, del_lines;
 
-			hashmap_init(&del_lines,
-				     (hashmap_cmp_fn)moved_entry_cmp, o, 0);
-			hashmap_init(&add_lines,
-				     (hashmap_cmp_fn)moved_entry_cmp, o, 0);
+			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
+			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
 
 			add_lines_to_move_detection(o, &add_lines, &del_lines);
 			mark_color_as_moved(o, &add_lines, &del_lines);
-- 
2.17.0.441.gb46fe60e1d-goog


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/7] diff.c: add a blocks mode for moved code detection
  2018-04-24 21:03 [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (3 preceding siblings ...)
  2018-04-24 21:03 ` [PATCH 4/7] diff.c: adjust hash function signature to match hashmap expectation Stefan Beller
@ 2018-04-24 21:03 ` Stefan Beller
  2018-04-24 21:50   ` Jonathan Tan
  2018-04-24 21:03 ` [PATCH 6/7] diff.c: decouple white space treatment from move detection algorithm Stefan Beller
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Stefan Beller @ 2018-04-24 21:03 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, simon, avarab, jacob.keller, Stefan Beller

The new "blocks" mode provides a middle ground between plain and zebra.
It is as intuitive (few colors) as plain, but still has the requirement
for a minimum of lines/characters to count a block as moved.

Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
 (https://public-inbox.org/git/87o9j0uljo.fsf@evledraar.gmail.com/)
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-options.txt |  8 ++++--
 diff.c                         |  6 +++--
 diff.h                         |  5 ++--
 t/t4015-diff-whitespace.sh     | 48 +++++++++++++++++++++++++++++++++-
 4 files changed, 60 insertions(+), 7 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index e3a44f03cd..bb9f1b7cd8 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -276,10 +276,14 @@ plain::
 	that are added somewhere else in the diff. This mode picks up any
 	moved line, but it is not very useful in a review to determine
 	if a block of code was moved without permutation.
-zebra::
+blocks:
 	Blocks of moved text of at least 20 alphanumeric characters
 	are detected greedily. The detected blocks are
-	painted using either the 'color.diff.{old,new}Moved' color or
+	painted using either the 'color.diff.{old,new}Moved' color.
+	Adjacent blocks cannot be told apart.
+zebra::
+	Blocks of moved text are detected as in 'blocks' mode. The blocks
+	are painted using either the 'color.diff.{old,new}Moved' color or
 	'color.diff.{old,new}MovedAlternative'. The change between
 	the two colors indicates that a new block was detected.
 dimmed_zebra::
diff --git a/diff.c b/diff.c
index d1bae900cd..95c51c0b7d 100644
--- a/diff.c
+++ b/diff.c
@@ -271,6 +271,8 @@ static int parse_color_moved(const char *arg)
 		return COLOR_MOVED_NO;
 	else if (!strcmp(arg, "plain"))
 		return COLOR_MOVED_PLAIN;
+	else if (!strcmp(arg, "blocks"))
+		return COLOR_MOVED_BLOCKS;
 	else if (!strcmp(arg, "zebra"))
 		return COLOR_MOVED_ZEBRA;
 	else if (!strcmp(arg, "default"))
@@ -278,7 +280,7 @@ static int parse_color_moved(const char *arg)
 	else if (!strcmp(arg, "dimmed_zebra"))
 		return COLOR_MOVED_ZEBRA_DIM;
 	else
-		return error(_("color moved setting must be one of 'no', 'default', 'zebra', 'dimmed_zebra', 'plain'"));
+		return error(_("color moved setting must be one of 'no', 'default', 'blocks', 'zebra', 'dimmed_zebra', 'plain'"));
 }
 
 int git_diff_ui_config(const char *var, const char *value, void *cb)
@@ -903,7 +905,7 @@ static void mark_color_as_moved(struct diff_options *o,
 
 		block_length++;
 
-		if (flipped_block)
+		if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
 			l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 	}
 	adjust_last_block(o, n, block_length);
diff --git a/diff.h b/diff.h
index d29560f822..7bd4f182c3 100644
--- a/diff.h
+++ b/diff.h
@@ -208,8 +208,9 @@ struct diff_options {
 	enum {
 		COLOR_MOVED_NO = 0,
 		COLOR_MOVED_PLAIN = 1,
-		COLOR_MOVED_ZEBRA = 2,
-		COLOR_MOVED_ZEBRA_DIM = 3,
+		COLOR_MOVED_BLOCKS = 2,
+		COLOR_MOVED_ZEBRA = 3,
+		COLOR_MOVED_ZEBRA_DIM = 4,
 	} color_moved;
 	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
 	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 17df491a3a..45091abb19 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1223,7 +1223,7 @@ test_expect_success 'plain moved code, inside file' '
 	test_cmp expected actual
 '
 
-test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
+test_expect_success 'detect blocks of moved code' '
 	git reset --hard &&
 	cat <<-\EOF >lines.txt &&
 		long line 1
@@ -1271,6 +1271,52 @@ test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
 	test_config color.diff.newMovedDimmed "normal cyan" &&
 	test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
 	test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&
+
+	git diff HEAD --no-renames --color-moved=blocks --color |
+		grep -v "index" |
+		test_decode_color >actual &&
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
+	<BOLD>--- a/lines.txt<RESET>
+	<BOLD>+++ b/lines.txt<RESET>
+	<CYAN>@@ -1,16 +1,16 @@<RESET>
+	<MAGENTA>-long line 1<RESET>
+	<MAGENTA>-long line 2<RESET>
+	<MAGENTA>-long line 3<RESET>
+	 line 4<RESET>
+	 line 5<RESET>
+	 line 6<RESET>
+	 line 7<RESET>
+	 line 8<RESET>
+	 line 9<RESET>
+	<CYAN>+<RESET><CYAN>long line 1<RESET>
+	<CYAN>+<RESET><CYAN>long line 2<RESET>
+	<CYAN>+<RESET><CYAN>long line 3<RESET>
+	<CYAN>+<RESET><CYAN>long line 14<RESET>
+	<CYAN>+<RESET><CYAN>long line 15<RESET>
+	<CYAN>+<RESET><CYAN>long line 16<RESET>
+	 line 10<RESET>
+	 line 11<RESET>
+	 line 12<RESET>
+	 line 13<RESET>
+	<MAGENTA>-long line 14<RESET>
+	<MAGENTA>-long line 15<RESET>
+	<MAGENTA>-long line 16<RESET>
+	EOF
+	test_cmp expected actual
+
+'
+
+test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
+	# reuse setup from test before!
+	test_config color.diff.oldMoved "magenta" &&
+	test_config color.diff.newMoved "cyan" &&
+	test_config color.diff.oldMovedAlternative "blue" &&
+	test_config color.diff.newMovedAlternative "yellow" &&
+	test_config color.diff.oldMovedDimmed "normal magenta" &&
+	test_config color.diff.newMovedDimmed "normal cyan" &&
+	test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
+	test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&
 	git diff HEAD --no-renames --color-moved=dimmed_zebra --color |
 		grep -v "index" |
 		test_decode_color >actual &&
-- 
2.17.0.441.gb46fe60e1d-goog


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/7] diff.c: decouple white space treatment from move detection algorithm
  2018-04-24 21:03 [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (4 preceding siblings ...)
  2018-04-24 21:03 ` [PATCH 5/7] diff.c: add a blocks mode for moved code detection Stefan Beller
@ 2018-04-24 21:03 ` Stefan Beller
  2018-04-24 22:00   ` Jonathan Tan
  2018-04-24 21:03 ` [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option Stefan Beller
  2018-04-24 22:37 ` [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation Jonathan Tan
  7 siblings, 1 reply; 19+ messages in thread
From: Stefan Beller @ 2018-04-24 21:03 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, simon, avarab, jacob.keller, Stefan Beller

In the original implementation of the move detection logic the choice for
ignoring white space changes is the same for the move detection as it is
for the regular diff.  Some cases came up where different treatment would
have been nice.

Allow the user to specify that whitespace should be ignored differently
during detection of moved lines than during generation of added and removed
lines. This is done by providing analogs to the --ignore-space-at-eol,
-b, and -w options (namely,
  --color-moved-[no-]ignore-space-at-eol
  --color-moved-[no-]ignore-space-change
  --color-moved-[no-]ignore-all-space) that affect only the color of the
output, and making the existing --ignore-space-at-eol, -b, and -w options
no longer affect the color of the output.

As we change the default, we'll adjust the tests.

For now we do not infer any options to treat whitespaces in the move
detection from the generic white space options given to diff.
This can be tuned later to reasonable default.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-options.txt | 13 +++++
 diff.c                         | 19 ++++++-
 diff.h                         |  1 +
 t/t4015-diff-whitespace.sh     | 90 +++++++++++++++++++++++++++++++---
 4 files changed, 114 insertions(+), 9 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index bb9f1b7cd8..7b2527b9a1 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -292,6 +292,19 @@ dimmed_zebra::
 	blocks are considered interesting, the rest is uninteresting.
 --
 
+--color-moved-[no-]ignore-space-at-eol::
+	Ignore changes in whitespace at EOL when performing the move
+	detection for --color-moved.
+--color-moved-[no-]ignore-space-change::
+	Ignore changes in amount of whitespace when performing the move
+	detection for --color-moved.  This ignores whitespace
+	at line end, and considers all other sequences of one or
+	more whitespace characters to be equivalent.
+--color-moved-[no-]ignore-all-space::
+	Ignore whitespace when comparing lines when performing the move
+	detection for --color-moved.  This ignores differences even if
+	one line has whitespace where the other line has none.
+
 --word-diff[=<mode>]::
 	Show a word diff, using the <mode> to delimit changed words.
 	By default, words are delimited by whitespace; see
diff --git a/diff.c b/diff.c
index 95c51c0b7d..b5819dd538 100644
--- a/diff.c
+++ b/diff.c
@@ -717,10 +717,12 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 	const struct diff_options *diffopt = hashmap_cmp_fn_data;
 	const struct moved_entry *a = entry;
 	const struct moved_entry *b = entry_or_key;
+	unsigned flags = diffopt->color_moved_ws_handling
+			 & XDF_WHITESPACE_FLAGS;
 
 	return !xdiff_compare_lines(a->es->line, a->es->len,
 				    b->es->line, b->es->len,
-				    diffopt->xdl_opts);
+				    flags);
 }
 
 static struct moved_entry *prepare_entry(struct diff_options *o,
@@ -728,8 +730,9 @@ static struct moved_entry *prepare_entry(struct diff_options *o,
 {
 	struct moved_entry *ret = xmalloc(sizeof(*ret));
 	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
+	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
-	ret->ent.hash = xdiff_hash_string(l->line, l->len, o->xdl_opts);
+	ret->ent.hash = xdiff_hash_string(l->line, l->len, flags);
 	ret->es = l;
 	ret->next_line = NULL;
 
@@ -4638,6 +4641,18 @@ int diff_opt_parse(struct diff_options *options,
 		DIFF_XDL_SET(options, IGNORE_CR_AT_EOL);
 	else if (!strcmp(arg, "--ignore-blank-lines"))
 		DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
+	else if (!strcmp(arg, "--color-moved-no-ignore-all-space"))
+		options->color_moved_ws_handling &= ~XDF_IGNORE_WHITESPACE;
+	else if (!strcmp(arg, "--color-moved-no-ignore-space-change"))
+		options->color_moved_ws_handling &= ~XDF_IGNORE_WHITESPACE_CHANGE;
+	else if (!strcmp(arg, "--color-moved-no-ignore-space-at-eol"))
+		options->color_moved_ws_handling &= ~XDF_IGNORE_WHITESPACE_AT_EOL;
+	else if (!strcmp(arg, "--color-moved-ignore-all-space"))
+		options->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
+	else if (!strcmp(arg, "--color-moved-ignore-space-change"))
+		options->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE_CHANGE;
+	else if (!strcmp(arg, "--color-moved-ignore-space-at-eol"))
+		options->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE_AT_EOL;
 	else if (!strcmp(arg, "--indent-heuristic"))
 		DIFF_XDL_SET(options, INDENT_HEURISTIC);
 	else if (!strcmp(arg, "--no-indent-heuristic"))
diff --git a/diff.h b/diff.h
index 7bd4f182c3..de5dc68005 100644
--- a/diff.h
+++ b/diff.h
@@ -214,6 +214,7 @@ struct diff_options {
 	} color_moved;
 	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
 	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
+	int color_moved_ws_handling;
 };
 
 void diff_emit_submodule_del(struct diff_options *o, const char *line);
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 45091abb19..751fc478dd 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1441,7 +1441,10 @@ test_expect_success 'move detection ignoring whitespace ' '
 	line 4
 	line 5
 	EOF
-	git diff HEAD --no-renames --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-no-ignore-all-space \
+		--color-moved-no-ignore-space-change \
+		--color-moved-no-ignore-space-at-eol |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1465,7 +1468,10 @@ test_expect_success 'move detection ignoring whitespace ' '
 	EOF
 	test_cmp expected actual &&
 
-	git diff HEAD --no-renames -w --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-ignore-all-space \
+		--color-moved-no-ignore-space-change \
+		--color-moved-no-ignore-space-at-eol |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1505,7 +1511,10 @@ test_expect_success 'move detection ignoring whitespace changes' '
 	line 5
 	EOF
 
-	git diff HEAD --no-renames --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-no-ignore-all-space \
+		--color-moved-no-ignore-space-change \
+		--color-moved-no-ignore-space-at-eol |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1529,7 +1538,10 @@ test_expect_success 'move detection ignoring whitespace changes' '
 	EOF
 	test_cmp expected actual &&
 
-	git diff HEAD --no-renames -b --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-no-ignore-all-space \
+		--color-moved-no-ignore-space-at-eol \
+		--color-moved-ignore-space-change |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1572,7 +1584,10 @@ test_expect_success 'move detection ignoring whitespace at eol' '
 	# avoid cluttering the output with complaints about our eol whitespace
 	test_config core.whitespace -blank-at-eol &&
 
-	git diff HEAD --no-renames --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-no-ignore-all-space \
+		--color-moved-no-ignore-space-change \
+		--color-moved-no-ignore-space-at-eol |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1596,7 +1611,10 @@ test_expect_success 'move detection ignoring whitespace at eol' '
 	EOF
 	test_cmp expected actual &&
 
-	git diff HEAD --no-renames --ignore-space-at-eol --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-no-ignore-all-space \
+		--color-moved-no-ignore-space-change \
+		--color-moved-ignore-space-at-eol |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1768,7 +1786,65 @@ test_expect_success 'move detection with submodules' '
 
 	# nor did we mess with it another way
 	git diff --submodule=diff --color | test_decode_color >expect &&
-	test_cmp expect decoded_actual
+	test_cmp expect decoded_actual &&
+	rm -rf bananas &&
+	git submodule deinit bananas
+'
+
+test_expect_success 'only move detection ignores white spaces' '
+	git reset --hard &&
+	q_to_tab <<-\EOF >text.txt &&
+		a long line to exceed per-line minimum
+		another long line to exceed per-line minimum
+		original file
+	EOF
+	git add text.txt &&
+	git commit -m "add text" &&
+	q_to_tab <<-\EOF >text.txt &&
+		Qa long line to exceed per-line minimum
+		Qanother long line to exceed per-line minimum
+		new file
+	EOF
+
+	# Make sure we get a different diff using -w
+	git diff --color --color-moved -w \
+		--color-moved-no-ignore-all-space \
+		--color-moved-no-ignore-space-change \
+		--color-moved-no-ignore-space-at-eol |
+		grep -v "index" |
+		test_decode_color >actual &&
+	q_to_tab <<-\EOF >expected &&
+	<BOLD>diff --git a/text.txt b/text.txt<RESET>
+	<BOLD>--- a/text.txt<RESET>
+	<BOLD>+++ b/text.txt<RESET>
+	<CYAN>@@ -1,3 +1,3 @@<RESET>
+	 Qa long line to exceed per-line minimum<RESET>
+	 Qanother long line to exceed per-line minimum<RESET>
+	<RED>-original file<RESET>
+	<GREEN>+<RESET><GREEN>new file<RESET>
+	EOF
+	test_cmp expected actual &&
+
+	# And now ignoring white space only in the move detection
+	git diff --color --color-moved \
+		--color-moved-ignore-all-space \
+		--color-moved-ignore-space-change \
+		--color-moved-ignore-space-at-eol |
+		grep -v "index" |
+		test_decode_color >actual &&
+	q_to_tab <<-\EOF >expected &&
+	<BOLD>diff --git a/text.txt b/text.txt<RESET>
+	<BOLD>--- a/text.txt<RESET>
+	<BOLD>+++ b/text.txt<RESET>
+	<CYAN>@@ -1,3 +1,3 @@<RESET>
+	<BOLD;MAGENTA>-a long line to exceed per-line minimum<RESET>
+	<BOLD;MAGENTA>-another long line to exceed per-line minimum<RESET>
+	<RED>-original file<RESET>
+	<BOLD;YELLOW>+<RESET>Q<BOLD;YELLOW>a long line to exceed per-line minimum<RESET>
+	<BOLD;YELLOW>+<RESET>Q<BOLD;YELLOW>another long line to exceed per-line minimum<RESET>
+	<GREEN>+<RESET><GREEN>new file<RESET>
+	EOF
+	test_cmp expected actual
 '
 
 test_done
-- 
2.17.0.441.gb46fe60e1d-goog


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option
  2018-04-24 21:03 [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (5 preceding siblings ...)
  2018-04-24 21:03 ` [PATCH 6/7] diff.c: decouple white space treatment from move detection algorithm Stefan Beller
@ 2018-04-24 21:03 ` Stefan Beller
  2018-04-24 22:35   ` Jonathan Tan
  2018-04-24 22:37 ` [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation Jonathan Tan
  7 siblings, 1 reply; 19+ messages in thread
From: Stefan Beller @ 2018-04-24 21:03 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, simon, avarab, jacob.keller, Stefan Beller

This marks moved code still as blocks when their indentation level
changes uniformly.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-options.txt |  4 ++
 diff.c                         | 83 +++++++++++++++++++++++++++++++---
 diff.h                         |  2 +
 t/t4015-diff-whitespace.sh     | 54 ++++++++++++++++++++++
 4 files changed, 137 insertions(+), 6 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 7b2527b9a1..facdbc8f95 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -304,6 +304,10 @@ dimmed_zebra::
 	Ignore whitespace when comparing lines when performing the move
 	detection for --color-moved.  This ignores differences even if
 	one line has whitespace where the other line has none.
+--color-moved-[no-]ignore-space-prefix-delta::
+	Ignores whitespace when comparing lines when performing the move
+	detection for --color-moved. This ignores uniform differences
+	of white space at the beginning lines in moved blocks.
 
 --word-diff[=<mode>]::
 	Show a word diff, using the <mode> to delimit changed words.
diff --git a/diff.c b/diff.c
index b5819dd538..1227a4d2a8 100644
--- a/diff.c
+++ b/diff.c
@@ -709,6 +709,31 @@ struct moved_entry {
 	struct moved_entry *next_line;
 };
 
+struct ws_delta {
+	char *string; /* The prefix delta, which is the same in the block */
+	int direction; /* adding or removing the line? */
+	int missmatch; /* in the remainder */
+};
+#define WS_DELTA_INIT { NULL, 0, 0 }
+
+static void compute_ws_delta(const struct emitted_diff_symbol *a,
+			     const struct emitted_diff_symbol *b,
+			     struct ws_delta *out)
+{
+	const struct emitted_diff_symbol *longer =  a->len > b->len ? a : b;
+	const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a;
+	int d = longer->len - shorter->len;
+
+	out->missmatch = !memcmp(longer->line + d, shorter->line, shorter->len);
+	out->string = xmemdupz(longer->line, d);
+	out->direction = (a == longer);
+}
+
+static int compare_ws_delta(const struct ws_delta *a, const struct ws_delta *b)
+{
+	return a->direction == b->direction && !strcmp(a->string, b->string);
+}
+
 static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 			   const void *entry,
 			   const void *entry_or_key,
@@ -720,6 +745,15 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 	unsigned flags = diffopt->color_moved_ws_handling
 			 & XDF_WHITESPACE_FLAGS;
 
+	if (diffopt->color_moved_ws_handling & COLOR_MOVED_DELTA_WHITESPACES)
+		/*
+		 * As there is not specific white space config given,
+		 * we'd need to check for a new block, so ignore all
+		 * white space. The setup of the white space
+		 * configuration for the next block is done else where
+		 */
+		flags |= XDF_IGNORE_WHITESPACE;
+
 	return !xdiff_compare_lines(a->es->line, a->es->len,
 				    b->es->line, b->es->len,
 				    flags);
@@ -772,7 +806,8 @@ static void add_lines_to_move_detection(struct diff_options *o,
 }
 
 static int shrink_potential_moved_blocks(struct moved_entry **pmb,
-					 int pmb_nr)
+					 int pmb_nr,
+					 struct ws_delta **wsd)
 {
 	int lp, rp;
 
@@ -788,6 +823,10 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 
 		if (lp < pmb_nr && rp > -1 && lp < rp) {
 			pmb[lp] = pmb[rp];
+			if (*wsd) {
+				free((*wsd)[lp].string);
+				(*wsd)[lp] = (*wsd)[rp];
+			}
 			pmb[rp] = NULL;
 			rp--;
 			lp++;
@@ -837,8 +876,11 @@ static void mark_color_as_moved(struct diff_options *o,
 {
 	struct moved_entry **pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
-	int n, flipped_block = 1, block_length = 0;
 
+	struct ws_delta *wsd = NULL; /* white space deltas between pmb */
+	int wsd_alloc = 0;
+
+	int n, flipped_block = 1, block_length = 0;
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
 		struct hashmap *hm = NULL;
@@ -881,14 +923,31 @@ static void mark_color_as_moved(struct diff_options *o,
 			struct moved_entry *p = pmb[i];
 			struct moved_entry *pnext = (p && p->next_line) ?
 					p->next_line : NULL;
-			if (pnext && !hm->cmpfn(o, pnext, match, NULL)) {
-				pmb[i] = p->next_line;
+
+			if (o->color_moved_ws_handling & COLOR_MOVED_DELTA_WHITESPACES) {
+				struct ws_delta out = WS_DELTA_INIT;
+
+				if (pnext)
+					compute_ws_delta(l, pnext->es, &out);
+				if (pnext &&
+				    !hm->cmpfn(o, pnext, match, NULL) &&
+				    compare_ws_delta(&out, &wsd[i])) {
+					pmb[i] = p->next_line;
+					/* wsd[i] is the same */
+				} else {
+					pmb[i] = NULL;
+				}
+				free(out.string);
 			} else {
-				pmb[i] = NULL;
+				if (pnext && !hm->cmpfn(o, pnext, match, NULL)) {
+					pmb[i] = p->next_line;
+				} else {
+					pmb[i] = NULL;
+				}
 			}
 		}
 
-		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
+		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr, &wsd);
 
 		if (pmb_nr == 0) {
 			/*
@@ -897,6 +956,10 @@ static void mark_color_as_moved(struct diff_options *o,
 			 */
 			for (; match; match = hashmap_get_next(hm, match)) {
 				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
+				if (o->color_moved_ws_handling & COLOR_MOVED_DELTA_WHITESPACES) {
+					ALLOC_GROW(wsd, pmb_nr + 1, wsd_alloc);
+					compute_ws_delta(l, match->es, &wsd[pmb_nr]);
+				}
 				pmb[pmb_nr++] = match;
 			}
 
@@ -914,6 +977,7 @@ static void mark_color_as_moved(struct diff_options *o,
 	adjust_last_block(o, n, block_length);
 
 	free(pmb);
+	free(wsd);
 }
 
 #define DIFF_SYMBOL_MOVED_LINE_ZEBRA_MASK \
@@ -4647,12 +4711,16 @@ int diff_opt_parse(struct diff_options *options,
 		options->color_moved_ws_handling &= ~XDF_IGNORE_WHITESPACE_CHANGE;
 	else if (!strcmp(arg, "--color-moved-no-ignore-space-at-eol"))
 		options->color_moved_ws_handling &= ~XDF_IGNORE_WHITESPACE_AT_EOL;
+	else if (!strcmp(arg, "--color-moved-no-ignore-space-prefix-delta"))
+		options->color_moved_ws_handling &= ~COLOR_MOVED_DELTA_WHITESPACES;
 	else if (!strcmp(arg, "--color-moved-ignore-all-space"))
 		options->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
 	else if (!strcmp(arg, "--color-moved-ignore-space-change"))
 		options->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE_CHANGE;
 	else if (!strcmp(arg, "--color-moved-ignore-space-at-eol"))
 		options->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE_AT_EOL;
+	else if (!strcmp(arg, "--color-moved-ignore-space-prefix-delta"))
+		options->color_moved_ws_handling |= COLOR_MOVED_DELTA_WHITESPACES;
 	else if (!strcmp(arg, "--indent-heuristic"))
 		DIFF_XDL_SET(options, INDENT_HEURISTIC);
 	else if (!strcmp(arg, "--no-indent-heuristic"))
@@ -5558,6 +5626,9 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
 			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
 
+			if (o->color_moved_ws_handling & COLOR_MOVED_DELTA_WHITESPACES)
+				o->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
+
 			add_lines_to_move_detection(o, &add_lines, &del_lines);
 			mark_color_as_moved(o, &add_lines, &del_lines);
 			if (o->color_moved == COLOR_MOVED_ZEBRA_DIM)
diff --git a/diff.h b/diff.h
index de5dc68005..b00ea76c08 100644
--- a/diff.h
+++ b/diff.h
@@ -214,6 +214,8 @@ struct diff_options {
 	} color_moved;
 	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
 	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
+	/* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
+	#define COLOR_MOVED_DELTA_WHITESPACES	(1 << 22)
 	int color_moved_ws_handling;
 };
 
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 751fc478dd..3e94cf051d 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1847,4 +1847,58 @@ test_expect_success 'only move detection ignores white spaces' '
 	test_cmp expected actual
 '
 
+test_expect_success 'compare whitespace delta across moved blocks' '
+
+	git reset --hard &&
+	q_to_tab <<-\EOF >text.txt &&
+	QIndented
+	QText across
+	Qthree lines
+	QBut! <- this stands out
+	Qthis one
+	QQline did
+	Qnot adjust
+	EOF
+
+	git add text.txt &&
+	git commit -m "add text.txt" &&
+
+	q_to_tab <<-\EOF >text.txt &&
+	QQIndented
+	QQText across
+	QQthree lines
+	QQQBut! <- this stands out
+	this one
+	Qline did
+	not adjust
+	EOF
+
+	git diff --color --color-moved-ignore-space-prefix-delta |
+		grep -v "index" |
+		test_decode_color >actual &&
+
+	q_to_tab <<-\EOF >expected &&
+		<BOLD>diff --git a/text.txt b/text.txt<RESET>
+		<BOLD>--- a/text.txt<RESET>
+		<BOLD>+++ b/text.txt<RESET>
+		<CYAN>@@ -1,7 +1,7 @@<RESET>
+		<RED>-QIndented<RESET>
+		<RED>-QText across<RESET>
+		<RED>-Qthree lines<RESET>
+		<RED>-QBut! <- this stands out<RESET>
+		<RED>-Qthis one<RESET>
+		<RED>-QQline did<RESET>
+		<RED>-Qnot adjust<RESET>
+		<GREEN>+<RESET>QQ<GREEN>Indented<RESET>
+		<GREEN>+<RESET>QQ<GREEN>Text across<RESET>
+		<GREEN>+<RESET>QQ<GREEN>three lines<RESET>
+		<GREEN>+<RESET>QQQ<GREEN>But! <- this stands out<RESET>
+		<GREEN>+<RESET><GREEN>this one<RESET>
+		<GREEN>+<RESET>Q<GREEN>line did<RESET>
+		<GREEN>+<RESET><GREEN>not adjust<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_done
-- 
2.17.0.441.gb46fe60e1d-goog


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/7] diff.c: add a blocks mode for moved code detection
  2018-04-24 21:03 ` [PATCH 5/7] diff.c: add a blocks mode for moved code detection Stefan Beller
@ 2018-04-24 21:50   ` Jonathan Tan
  2018-04-24 22:37     ` Stefan Beller
  0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Tan @ 2018-04-24 21:50 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, simon, avarab, jacob.keller

On Tue, 24 Apr 2018 14:03:28 -0700
Stefan Beller <sbeller@google.com> wrote:

> Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>  (https://public-inbox.org/git/87o9j0uljo.fsf@evledraar.gmail.com/)
> Signed-off-by: Stefan Beller <sbeller@google.com>

Firstly, I don't know if this is the right solution- as written
in the linked e-mail [1], the issue might be more that the config
conflates 2 unrelated things, not that a certain intersection is
missing.

[1] https://public-inbox.org/git/87muykuij7.fsf@evledraar.gmail.com/

Optional: Probably better to put the link inline, instead of in the
trailer.

> -test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
> +test_expect_success 'detect blocks of moved code' '
>  	git reset --hard &&
>  	cat <<-\EOF >lines.txt &&
>  		long line 1
> @@ -1271,6 +1271,52 @@ test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
>  	test_config color.diff.newMovedDimmed "normal cyan" &&
>  	test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
>  	test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&

Add a comment here explaining that these colors do not appear in the
output, but merely set to recognizable values to ensure that they do not
appear in the output.

> +
> +	git diff HEAD --no-renames --color-moved=blocks --color |
> +		grep -v "index" |
> +		test_decode_color >actual &&
> +	cat <<-\EOF >expected &&
> +	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
> +	<BOLD>--- a/lines.txt<RESET>
> +	<BOLD>+++ b/lines.txt<RESET>
> +	<CYAN>@@ -1,16 +1,16 @@<RESET>
> +	<MAGENTA>-long line 1<RESET>
> +	<MAGENTA>-long line 2<RESET>
> +	<MAGENTA>-long line 3<RESET>
> +	 line 4<RESET>
> +	 line 5<RESET>
> +	 line 6<RESET>
> +	 line 7<RESET>
> +	 line 8<RESET>
> +	 line 9<RESET>
> +	<CYAN>+<RESET><CYAN>long line 1<RESET>
> +	<CYAN>+<RESET><CYAN>long line 2<RESET>
> +	<CYAN>+<RESET><CYAN>long line 3<RESET>
> +	<CYAN>+<RESET><CYAN>long line 14<RESET>
> +	<CYAN>+<RESET><CYAN>long line 15<RESET>
> +	<CYAN>+<RESET><CYAN>long line 16<RESET>
> +	 line 10<RESET>
> +	 line 11<RESET>
> +	 line 12<RESET>
> +	 line 13<RESET>
> +	<MAGENTA>-long line 14<RESET>
> +	<MAGENTA>-long line 15<RESET>
> +	<MAGENTA>-long line 16<RESET>
> +	EOF
> +	test_cmp expected actual
> +
> +'

[snip]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/7] diff.c: decouple white space treatment from move detection algorithm
  2018-04-24 21:03 ` [PATCH 6/7] diff.c: decouple white space treatment from move detection algorithm Stefan Beller
@ 2018-04-24 22:00   ` Jonathan Tan
  2018-04-24 22:19     ` Stefan Beller
  0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Tan @ 2018-04-24 22:00 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, simon, avarab, jacob.keller

On Tue, 24 Apr 2018 14:03:29 -0700
Stefan Beller <sbeller@google.com> wrote:

> As we change the default, we'll adjust the tests.

This statement is probably better written as:

  In some existing tests, options like --ignore-space-at-eol were used
  to control the color of the output. They have been updated to use
  options like --color-moved-ignore-space-at-eol instead.

> +	unsigned flags = diffopt->color_moved_ws_handling
> +			 & XDF_WHITESPACE_FLAGS;

No need for "& XDF_WHITESPACE_FLAGS".

> +	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;

Same here.

> @@ -214,6 +214,7 @@ struct diff_options {
>  	} color_moved;
>  	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
>  	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
> +	int color_moved_ws_handling;
>  };

Should the "int" be "unsigned"? I noticed that the flag-like xdl_opts is
signed, but I think it's better for flags to be unsigned. Also, document
what this stores. (And also, I would limit the bits.)

> +test_expect_success 'only move detection ignores white spaces' '
> +	git reset --hard &&
> +	q_to_tab <<-\EOF >text.txt &&
> +		a long line to exceed per-line minimum
> +		another long line to exceed per-line minimum
> +		original file
> +	EOF
> +	git add text.txt &&
> +	git commit -m "add text" &&
> +	q_to_tab <<-\EOF >text.txt &&
> +		Qa long line to exceed per-line minimum
> +		Qanother long line to exceed per-line minimum
> +		new file
> +	EOF
> +
> +	# Make sure we get a different diff using -w
> +	git diff --color --color-moved -w \
> +		--color-moved-no-ignore-all-space \
> +		--color-moved-no-ignore-space-change \
> +		--color-moved-no-ignore-space-at-eol |
> +		grep -v "index" |
> +		test_decode_color >actual &&
> +	q_to_tab <<-\EOF >expected &&
> +	<BOLD>diff --git a/text.txt b/text.txt<RESET>
> +	<BOLD>--- a/text.txt<RESET>
> +	<BOLD>+++ b/text.txt<RESET>
> +	<CYAN>@@ -1,3 +1,3 @@<RESET>
> +	 Qa long line to exceed per-line minimum<RESET>
> +	 Qanother long line to exceed per-line minimum<RESET>
> +	<RED>-original file<RESET>
> +	<GREEN>+<RESET><GREEN>new file<RESET>
> +	EOF
> +	test_cmp expected actual &&
> +
> +	# And now ignoring white space only in the move detection
> +	git diff --color --color-moved \
> +		--color-moved-ignore-all-space \
> +		--color-moved-ignore-space-change \
> +		--color-moved-ignore-space-at-eol |
> +		grep -v "index" |
> +		test_decode_color >actual &&
> +	q_to_tab <<-\EOF >expected &&
> +	<BOLD>diff --git a/text.txt b/text.txt<RESET>
> +	<BOLD>--- a/text.txt<RESET>
> +	<BOLD>+++ b/text.txt<RESET>
> +	<CYAN>@@ -1,3 +1,3 @@<RESET>
> +	<BOLD;MAGENTA>-a long line to exceed per-line minimum<RESET>
> +	<BOLD;MAGENTA>-another long line to exceed per-line minimum<RESET>
> +	<RED>-original file<RESET>
> +	<BOLD;YELLOW>+<RESET>Q<BOLD;YELLOW>a long line to exceed per-line minimum<RESET>
> +	<BOLD;YELLOW>+<RESET>Q<BOLD;YELLOW>another long line to exceed per-line minimum<RESET>
> +	<GREEN>+<RESET><GREEN>new file<RESET>
> +	EOF
> +	test_cmp expected actual
>  '

I know I suggested "per-line minimum", but I don't think there is one -
I think we only have a per-block minimum. Maybe delete "per-line" in
each of the lines.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/7] diff.c: decouple white space treatment from move detection algorithm
  2018-04-24 22:00   ` Jonathan Tan
@ 2018-04-24 22:19     ` Stefan Beller
  0 siblings, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2018-04-24 22:19 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: git, Simon Ruderich, Ævar Arnfjörð Bjarmason,
	Jacob Keller

Hi Jonathan,

On Tue, Apr 24, 2018 at 3:00 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Tue, 24 Apr 2018 14:03:29 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
>> As we change the default, we'll adjust the tests.
>
> This statement is probably better written as:
>
>   In some existing tests, options like --ignore-space-at-eol were used
>   to control the color of the output. They have been updated to use
>   options like --color-moved-ignore-space-at-eol instead.

I'll adjust that statement; thanks for helping me out with good commit
messages (even the "As we change the defaults, .." was proposed by
you in a previous round)

>
>> +     unsigned flags = diffopt->color_moved_ws_handling
>> +                      & XDF_WHITESPACE_FLAGS;
>
> No need for "& XDF_WHITESPACE_FLAGS".

This is in anticipation of the next commit, when
color_moved_ws_handling takes more flags.
I can move that over to the last commit.

>
>> +     unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
>
> Same here.

Maybe I'll just state in the commit message why we keep the masking
here.

>
>> @@ -214,6 +214,7 @@ struct diff_options {
>>       } color_moved;
>>       #define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
>>       #define COLOR_MOVED_MIN_ALNUM_COUNT 20
>> +     int color_moved_ws_handling;
>>  };
>
> Should the "int" be "unsigned"?

yes.

> I noticed that the flag-like xdl_opts is
> signed, but I think it's better for flags to be unsigned.

I can change those, too.

> Also, document
> what this stores.

ok, will document.

> (And also, I would limit the bits.)

Not sure I follow. you want to make it e.g.

  unsigned color_moved_ws_handling : 6;

?

Oh, that would actually work, as XDF_WHITESPACE_FLAGS
are in second to fifth bits.

But then we need to document why the XDF_WHITESPACE
need to be at these low positions.

>> +     q_to_tab <<-\EOF >text.txt &&
>> +             Qa long line to exceed per-line minimum
>> +             Qanother long line to exceed per-line minimum
>> +             new file

>
> I know I suggested "per-line minimum", but I don't think there is one -
> I think we only have a per-block minimum. Maybe delete "per-line" in
> each of the lines.

yeah, I guess this heuristic could also make for another setting, though
as of now I did not desire any other heuristic than you originally came up
with. Will reword the text. Thanks!

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option
  2018-04-24 21:03 ` [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option Stefan Beller
@ 2018-04-24 22:35   ` Jonathan Tan
       [not found]     ` <CAGZ79kbGkHFSS9K8KKTwNikx3Tw+m+RMLY3RAf8SW_iK9a2OJQ@mail.gmail.com>
  0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Tan @ 2018-04-24 22:35 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, simon, avarab, jacob.keller

On Tue, 24 Apr 2018 14:03:30 -0700
Stefan Beller <sbeller@google.com> wrote:

> +--color-moved-[no-]ignore-space-prefix-delta::
> +	Ignores whitespace when comparing lines when performing the move
> +	detection for --color-moved. This ignores uniform differences
> +	of white space at the beginning lines in moved blocks.

Setting this option means that moved lines may be indented or dedented,
and if they have been indented or dedented by the same amount, they are
still considered to be the same block. Maybe call this
--color-moved-allow-indentation-change.

> +struct ws_delta {
> +	char *string; /* The prefix delta, which is the same in the block */

Probably better described as "the difference between the '-' line and
the '+' line". This may be the empty string if there is no difference.

> +	int direction; /* adding or removing the line? */

What is the value when "added" and what when "removed"? Also, it is not
truly "added" or "removed", so a better way might be: 1 if the '-' line
is longer than the '+' line, and 0 otherwise. (And make sure that the
documentation is correct with respect to equal lines.)

> +	int missmatch; /* in the remainder */

s/missmatch/mismatch/
Also, what is this used for?

> +	if (diffopt->color_moved_ws_handling & COLOR_MOVED_DELTA_WHITESPACES)
> +		/*
> +		 * As there is not specific white space config given,
> +		 * we'd need to check for a new block, so ignore all
> +		 * white space. The setup of the white space
> +		 * configuration for the next block is done else where
> +		 */
> +		flags |= XDF_IGNORE_WHITESPACE;
> +
>  	return !xdiff_compare_lines(a->es->line, a->es->len,
>  				    b->es->line, b->es->len,
>  				    flags);

I wrote in [1]:

  I think we should just prohibit combining this with any of the
  whitespace ignoring flags except for the space-at-eol one. They seem
  to contradict anyway.

To elaborate, adding a specific flag that may interfere with other
user-provided flags sounds like we're unnecessarily adding combinations
that we must keep track of, and that it's both logical (from a user's
point of view) and clearer (as for the code) to just forbid such
combinations.

[1] https://public-inbox.org/git/20180402174118.d204ec0d4b9d2fa7ebd77739@google.com/

>  	struct moved_entry **pmb = NULL; /* potentially moved blocks */
>  	int pmb_nr = 0, pmb_alloc = 0;
> -	int n, flipped_block = 1, block_length = 0;
>  
> +	struct ws_delta *wsd = NULL; /* white space deltas between pmb */
> +	int wsd_alloc = 0;
> +
> +	int n, flipped_block = 1, block_length = 0;

It seems like pmb and wsd are parallel arrays - could each wsd be
embedded into the corresponding entry of pmb instead?

> --- a/diff.h
> +++ b/diff.h
> @@ -214,6 +214,8 @@ struct diff_options {
>  	} color_moved;
>  	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
>  	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
> +	/* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
> +	#define COLOR_MOVED_DELTA_WHITESPACES	(1 << 22)
>  	int color_moved_ws_handling;
>  };

Setting of DELTA_WHITESPACES should be a separate field, not as part of
ws_handling. It's fine for the ws_handling to be a bitset, since that's
how it's passed to xdiff_compare_lines(), but we don't need to do the
same for DELTA_WHITESPACES.

> +test_expect_success 'compare whitespace delta across moved blocks' '
> +
> +	git reset --hard &&
> +	q_to_tab <<-\EOF >text.txt &&
> +	QIndented
> +	QText across
> +	Qthree lines
> +	QBut! <- this stands out
> +	Qthis one
> +	QQline did
> +	Qnot adjust
> +	EOF
> +
> +	git add text.txt &&
> +	git commit -m "add text.txt" &&
> +
> +	q_to_tab <<-\EOF >text.txt &&
> +	QQIndented
> +	QQText across
> +	QQthree lines
> +	QQQBut! <- this stands out
> +	this one
> +	Qline did
> +	not adjust
> +	EOF
> +
> +	git diff --color --color-moved-ignore-space-prefix-delta |
> +		grep -v "index" |
> +		test_decode_color >actual &&
> +
> +	q_to_tab <<-\EOF >expected &&
> +		<BOLD>diff --git a/text.txt b/text.txt<RESET>
> +		<BOLD>--- a/text.txt<RESET>
> +		<BOLD>+++ b/text.txt<RESET>
> +		<CYAN>@@ -1,7 +1,7 @@<RESET>
> +		<RED>-QIndented<RESET>
> +		<RED>-QText across<RESET>
> +		<RED>-Qthree lines<RESET>
> +		<RED>-QBut! <- this stands out<RESET>
> +		<RED>-Qthis one<RESET>
> +		<RED>-QQline did<RESET>
> +		<RED>-Qnot adjust<RESET>
> +		<GREEN>+<RESET>QQ<GREEN>Indented<RESET>
> +		<GREEN>+<RESET>QQ<GREEN>Text across<RESET>
> +		<GREEN>+<RESET>QQ<GREEN>three lines<RESET>
> +		<GREEN>+<RESET>QQQ<GREEN>But! <- this stands out<RESET>
> +		<GREEN>+<RESET><GREEN>this one<RESET>
> +		<GREEN>+<RESET>Q<GREEN>line did<RESET>
> +		<GREEN>+<RESET><GREEN>not adjust<RESET>
> +	EOF
> +
> +	test_cmp expected actual
> +'

I would have expected every line except the "this stands out" line to be
colored differently than the usual RED and GREEN. Is this test output
expected?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation
  2018-04-24 21:03 [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (6 preceding siblings ...)
  2018-04-24 21:03 ` [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option Stefan Beller
@ 2018-04-24 22:37 ` Jonathan Tan
  2018-04-24 22:58   ` Stefan Beller
  7 siblings, 1 reply; 19+ messages in thread
From: Jonathan Tan @ 2018-04-24 22:37 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, simon, avarab, jacob.keller

On Tue, 24 Apr 2018 14:03:23 -0700
Stefan Beller <sbeller@google.com> wrote:

> v2:
> I think I have addressed Jonathans feedback
> * by using a string instead of counting the first character only.
> * refined tests slightly (easier to read)
> * moved white space handling for moved blocks into its own flag field,
>   keeping the enum for the actual mode of move detection.

For reference, v1 is here:
https://public-inbox.org/git/20180402224854.86922-1-sbeller@google.com/

> Stefan Beller (7):
>   xdiff/xdiff.h: remove unused flags
>   xdiff/xdiffi.c: remove unneeded function declarations
>   diff.c: do not pass diff options as keydata to hashmap
>   diff.c: adjust hash function signature to match hashmap expectation
>   diff.c: add a blocks mode for moved code detection
>   diff.c: decouple white space treatment from move detection algorithm
>   diff.c: add --color-moved-ignore-space-delta option

I'm not sure if we should add a new "blocks" mode, or if we should
modify the existing plain mode to have the minimum block length instead.
I reviewed the code as if we want the new "blocks" mode.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/7] diff.c: add a blocks mode for moved code detection
  2018-04-24 21:50   ` Jonathan Tan
@ 2018-04-24 22:37     ` Stefan Beller
  2018-04-24 22:59       ` Jonathan Tan
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Beller @ 2018-04-24 22:37 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: git, Simon Ruderich, Ævar Arnfjörð Bjarmason,
	Jacob Keller

On Tue, Apr 24, 2018 at 2:50 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Tue, 24 Apr 2018 14:03:28 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
>> Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>>  (https://public-inbox.org/git/87o9j0uljo.fsf@evledraar.gmail.com/)
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>
> Firstly, I don't know if this is the right solution- as written
> in the linked e-mail [1], the issue might be more that the config
> conflates 2 unrelated things, not that a certain intersection is
> missing.

The "plain zebra" or as I call them "blocks", has the "heuristic
for a minimum of 20 characters" and "few colors" as its defining
features, which solves that use case.

Stepping back a bit, we have different "building blocks"
at our disposal:
* move detection by line or block
* alternating blocks
* a heuristic to skip over small chunks (20 alnum chars)

These can be combined independently, so would
you expect the user to expect two options for them?
For example "--color-moved=zebra" could be split
into  "--skip-small --alternate-blocks"

Eventually we'll use various colors to inform the user
what these building blocks made of the diff.

Ævar wrote:

> Which is what I mean by the current config conflating two (to me)
> unrelated things. One is how we, via any method, detect what's moved or
> not, and the other is what color/format we use to present this to the
> user.

So instead of building blocks we rather want to split into algorithms
and presentation layer?

The presentation layer would be things like:
* use a different color for moved things
* alternate colors for adjacent blocks
* paint border of a block (dimmed zebra)

The algorithm side would be
* detect moves
* detect moves as blocks
* skip small heuristic

Am I still missing the big picture?

> [1] https://public-inbox.org/git/87muykuij7.fsf@evledraar.gmail.com/
>
> Optional: Probably better to put the link inline, instead of in the
> trailer.

ok.

>
>> -test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
>> +test_expect_success 'detect blocks of moved code' '
>>       git reset --hard &&
>>       cat <<-\EOF >lines.txt &&
>>               long line 1
>> @@ -1271,6 +1271,52 @@ test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
>>       test_config color.diff.newMovedDimmed "normal cyan" &&
>>       test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
>>       test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&
>
> Add a comment here explaining that these colors do not appear in the
> output, but merely set to recognizable values to ensure that they do not
> appear in the output.

ok.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation
  2018-04-24 22:37 ` [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation Jonathan Tan
@ 2018-04-24 22:58   ` Stefan Beller
  0 siblings, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2018-04-24 22:58 UTC (permalink / raw)
  To: Jonathan Tan
  Cc: git, Simon Ruderich, Ævar Arnfjörð Bjarmason,
	Jacob Keller

On Tue, Apr 24, 2018 at 3:37 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Tue, 24 Apr 2018 14:03:23 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
>> v2:
>> I think I have addressed Jonathans feedback
>> * by using a string instead of counting the first character only.
>> * refined tests slightly (easier to read)
>> * moved white space handling for moved blocks into its own flag field,
>>   keeping the enum for the actual mode of move detection.
>
> For reference, v1 is here:
> https://public-inbox.org/git/20180402224854.86922-1-sbeller@google.com/
>
>> Stefan Beller (7):
>>   xdiff/xdiff.h: remove unused flags
>>   xdiff/xdiffi.c: remove unneeded function declarations
>>   diff.c: do not pass diff options as keydata to hashmap
>>   diff.c: adjust hash function signature to match hashmap expectation
>>   diff.c: add a blocks mode for moved code detection
>>   diff.c: decouple white space treatment from move detection algorithm
>>   diff.c: add --color-moved-ignore-space-delta option
>
> I'm not sure if we should add a new "blocks" mode, or if we should
> modify the existing plain mode to have the minimum block length instead.
> I reviewed the code as if we want the new "blocks" mode.

Thanks for the review!

I think keeping plain is useful, see 176841f0c9 (diff.c: color
moved lines differently, plain mode, 2017-06-30)

    diff.c: color moved lines differently, plain mode

    Add the 'plain' mode for move detection of code. This omits the checking
    for adjacent blocks, so it is not as useful. If you have a lot of the
    same blocks moved in the same patch, the 'Zebra' would end up slow as it
    is O(n^2) (n is number of same blocks). So this may be useful there and
    is generally easy to add. Instead be very literal at the move detection,
    do not skip over short blocks here.


Although if we do not care about that use case we can just add heuristics to
plain.

As eluded to in Ævars email, we might want to break it up into multiple
options as well?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/7] diff.c: add a blocks mode for moved code detection
  2018-04-24 22:37     ` Stefan Beller
@ 2018-04-24 22:59       ` Jonathan Tan
  0 siblings, 0 replies; 19+ messages in thread
From: Jonathan Tan @ 2018-04-24 22:59 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, Simon Ruderich, Ævar Arnfjörð Bjarmason,
	Jacob Keller

On Tue, 24 Apr 2018 15:37:51 -0700
Stefan Beller <sbeller@google.com> wrote:

> These can be combined independently, so would
> you expect the user to expect two options for them?
> For example "--color-moved=zebra" could be split
> into  "--skip-small --alternate-blocks"

Yes, this is a good explanation. Reusing your terms below, --skip-small
controls the algorithm, and --alternate-blocks controls the presentation
layer.

> So instead of building blocks we rather want to split into algorithms
> and presentation layer?
> 
> The presentation layer would be things like:
> * use a different color for moved things
> * alternate colors for adjacent blocks
> * paint border of a block (dimmed zebra)
> 
> The algorithm side would be
> * detect moves
> * detect moves as blocks
> * skip small heuristic

Yes.

This was just brainstorming, though - this might not be the direction we
want to take in this patch. (The right solution might just be to always
use blocks - thereby simplifying the algorithm aspect.)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option
       [not found]     ` <CAGZ79kbGkHFSS9K8KKTwNikx3Tw+m+RMLY3RAf8SW_iK9a2OJQ@mail.gmail.com>
@ 2018-04-24 23:23       ` Stefan Beller
  2018-04-25  0:11         ` Jonathan Tan
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Beller @ 2018-04-24 23:23 UTC (permalink / raw)
  To: Jonathan Tan, git, Simon Ruderich,
	Ævar Arnfjörð Bjarmason, Jacob Keller

Replied to Jonathan only instead of all. My reply is below:

On Tue, Apr 24, 2018 at 3:55 PM, Stefan Beller <sbeller@google.com> wrote:
> On Tue, Apr 24, 2018 at 3:35 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
>> On Tue, 24 Apr 2018 14:03:30 -0700
>> Stefan Beller <sbeller@google.com> wrote:
>>
>>> +--color-moved-[no-]ignore-space-prefix-delta::
>>> +     Ignores whitespace when comparing lines when performing the move
>>> +     detection for --color-moved. This ignores uniform differences
>>> +     of white space at the beginning lines in moved blocks.
>>
>> Setting this option means that moved lines may be indented or dedented,
>> and if they have been indented or dedented by the same amount, they are
>> still considered to be the same block. Maybe call this
>> --color-moved-allow-indentation-change.
>
> ok, sounds good as well. I tried coming up with a name that refers to
> the block check as that is the important part.
>
>>> +struct ws_delta {
>>> +     char *string; /* The prefix delta, which is the same in the block */
>>
>> Probably better described as "the difference between the '-' line and
>> the '+' line". This may be the empty string if there is no difference.
>
> Makes sense.
>
>>
>>> +     int direction; /* adding or removing the line? */
>>
>> What is the value when "added" and what when "removed"? Also, it is not
>> truly "added" or "removed", so a better way might be: 1 if the '-' line
>> is longer than the '+' line, and 0 otherwise. (And make sure that the
>> documentation is correct with respect to equal lines.)
>>
>>> +     int missmatch; /* in the remainder */
>>
>> s/missmatch/mismatch/
>> Also, what is this used for?
>
> The mismatch should be used for (thanks for catching!)
> checking if the remainder of a line is the same, although a boolean
> may be not the correct choice. We know that the two strings
> passed into compute_ws_delta come from a complete white space
> agnostic comparison, so consider:
>
> + SP SP more TAB more
> + SP SP text TAB text
>
> - SP more TAB more
> - SP text TAB text
>
> which would mark this as a moved block. This is the feature
> working as intended, but what about
>
> + SP SP more TAB more
> + SP SP text TAB text
>
> - SP more SP more
> - SP text TAB text
>
> Note how the length of the strings is the same, hence the current
> code of
>
>     compute_ws_delta(...) {
>         int d = longer->len - shorter->len;
>         out->string = xmemdupz(longer->line, d);
>     }
>
> would work fine and not notice the "change in the remainder".
> That ought to be caught by the mismatch variable, that
> is assigned, but not used.
>
> The compare_ws_delta(a, b) needs to be extended to
>
>   !a->mismatch && !b->mismatch && existing_condition
>
> Ideally the example from above would be different depending
> on whether the other white space flags are given or not.
>
>>> +     if (diffopt->color_moved_ws_handling & COLOR_MOVED_DELTA_WHITESPACES)
>>> +             /*
>>> +              * As there is not specific white space config given,
>>> +              * we'd need to check for a new block, so ignore all
>>> +              * white space. The setup of the white space
>>> +              * configuration for the next block is done else where
>>> +              */
>>> +             flags |= XDF_IGNORE_WHITESPACE;
>>> +
>>>       return !xdiff_compare_lines(a->es->line, a->es->len,
>>>                                   b->es->line, b->es->len,
>>>                                   flags);
>>
>> I wrote in [1]:
>>
>>   I think we should just prohibit combining this with any of the
>>   whitespace ignoring flags except for the space-at-eol one. They seem
>>   to contradict anyway.
>
> As outlined above, I think there are corner cases in which they do not
> contradict. So I think the COLOR_MOVED_DELTA_WHITESPACES
> will go into its own variable, and then we can solve the corner cases
> correctly.
>
>> To elaborate, adding a specific flag that may interfere with other
>> user-provided flags sounds like we're unnecessarily adding combinations
>> that we must keep track of, and that it's both logical (from a user's
>> point of view) and clearer (as for the code) to just forbid such
>> combinations.
>
> Yes, I think you mentioned this before. Thanks for reminding!
>
>>> +     struct ws_delta *wsd = NULL; /* white space deltas between pmb */
>>> +     int wsd_alloc = 0;
>>> +
>>> +     int n, flipped_block = 1, block_length = 0;
>>
>> It seems like pmb and wsd are parallel arrays - could each wsd be
>> embedded into the corresponding entry of pmb instead?
>
> I'll explore that. It sounds like a good idea for code hygiene.
> Although if you do not intend to use this feature, then keeping it separate
> would allow for a smaller footprint in memory.
>
>>
>>> --- a/diff.h
>>> +++ b/diff.h
>>> @@ -214,6 +214,8 @@ struct diff_options {
>>>       } color_moved;
>>>       #define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
>>>       #define COLOR_MOVED_MIN_ALNUM_COUNT 20
>>> +     /* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
>>> +     #define COLOR_MOVED_DELTA_WHITESPACES   (1 << 22)
>>>       int color_moved_ws_handling;
>>>  };
>>
>> Setting of DELTA_WHITESPACES should be a separate field, not as part of
>> ws_handling. It's fine for the ws_handling to be a bitset, since that's
>> how it's passed to xdiff_compare_lines(), but we don't need to do the
>> same for DELTA_WHITESPACES.
>
> You are correct. Thanks for your patience in this series!
>
>> +     git diff --color --color-moved-ignore-space-prefix-delta |
>> +             grep -v "index" |
>> +             test_decode_color >actual &&
>
>>> +     q_to_tab <<-\EOF >expected &&
>>> +             <BOLD>diff --git a/text.txt b/text.txt<RESET>
>>> +             <BOLD>--- a/text.txt<RESET>
>>> +             <BOLD>+++ b/text.txt<RESET>
>>> +             <CYAN>@@ -1,7 +1,7 @@<RESET>
>>> +             <RED>-QIndented<RESET>
>>> +             <RED>-QText across<RESET>
>>> +             <RED>-Qthree lines<RESET>
>>> +             <RED>-QBut! <- this stands out<RESET>
>>> +             <RED>-Qthis one<RESET>
>>> +             <RED>-QQline did<RESET>
>>> +             <RED>-Qnot adjust<RESET>
>>> +             <GREEN>+<RESET>QQ<GREEN>Indented<RESET>
>>> +             <GREEN>+<RESET>QQ<GREEN>Text across<RESET>
>>> +             <GREEN>+<RESET>QQ<GREEN>three lines<RESET>
>>> +             <GREEN>+<RESET>QQQ<GREEN>But! <- this stands out<RESET>
>>> +             <GREEN>+<RESET><GREEN>this one<RESET>
>>> +             <GREEN>+<RESET>Q<GREEN>line did<RESET>
>>> +             <GREEN>+<RESET><GREEN>not adjust<RESET>
>>> +     EOF
>>> +
>>> +     test_cmp expected actual
>>> +'
>>
>> I would have expected every line except the "this stands out" line to be
>> colored differently than the usual RED and GREEN. Is this test output
>> expected?
>
> It is wrong indeed. I blindly copied the actual file once interactive testing
> confirmed it worked.
>
> The command is missing a --color-moved, as the --color-moved-whitespace-settings
> do not imply --color-moved, yet(?)
>
> Thanks,
> Stefan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option
  2018-04-24 23:23       ` Stefan Beller
@ 2018-04-25  0:11         ` Jonathan Tan
  0 siblings, 0 replies; 19+ messages in thread
From: Jonathan Tan @ 2018-04-25  0:11 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, Simon Ruderich, Ævar Arnfjörð Bjarmason,
	Jacob Keller

On Tue, 24 Apr 2018 16:23:28 -0700
Stefan Beller <sbeller@google.com> wrote:

> >> s/missmatch/mismatch/
> >> Also, what is this used for?
> >
> > The mismatch should be used for (thanks for catching!)
> > checking if the remainder of a line is the same, although a boolean
> > may be not the correct choice. We know that the two strings
> > passed into compute_ws_delta come from a complete white space
> > agnostic comparison, so consider:
> >
> > + SP SP more TAB more
> > + SP SP text TAB text
> >
> > - SP more TAB more
> > - SP text TAB text
> >
> > which would mark this as a moved block. This is the feature
> > working as intended, but what about
> >
> > + SP SP more TAB more
> > + SP SP text TAB text
> >
> > - SP more SP more
> > - SP text TAB text
> >
> > Note how the length of the strings is the same, hence the current
> > code of
> >
> >     compute_ws_delta(...) {
> >         int d = longer->len - shorter->len;
> >         out->string = xmemdupz(longer->line, d);
> >     }
> >
> > would work fine and not notice the "change in the remainder".
> > That ought to be caught by the mismatch variable, that
> > is assigned, but not used.
> >
> > The compare_ws_delta(a, b) needs to be extended to
> >
> >   !a->mismatch && !b->mismatch && existing_condition
> >
> > Ideally the example from above would be different depending
> > on whether the other white space flags are given or not.

Thanks - this gives me food for thought.

I'm starting to think that it is impossible to avoid creating our own
string comparison function that:
 - seeks to the first non-whitespace character in both strings
 - checks that both strings, from that first non-whitespace characters,
   are equal for some definition of equal (whether through strcmp or
   xdiff_compare_lines)
 - walks backwards from that first non-whitespace characters to look for
   the first non-matching whitespace character between the 2 strings

The existing diff whitespace modes (to be passed to xdiff_compare_lines)
do not give the exact result we want. For example, if
XDF_IGNORE_WHITESPACE is used (as is in this patch), lines like "a b"
and "ab " would match even though they shouldn't.

This comparison function can be used both in moved_entry_cmp() and when
constructing the ws_delta (in which case it should be made to output
whatever information is needed as out parameters or similar).

> >>> +     if (diffopt->color_moved_ws_handling & COLOR_MOVED_DELTA_WHITESPACES)
> >>> +             /*
> >>> +              * As there is not specific white space config given,
> >>> +              * we'd need to check for a new block, so ignore all
> >>> +              * white space. The setup of the white space
> >>> +              * configuration for the next block is done else where
> >>> +              */
> >>> +             flags |= XDF_IGNORE_WHITESPACE;
> >>> +
> >>>       return !xdiff_compare_lines(a->es->line, a->es->len,
> >>>                                   b->es->line, b->es->len,
> >>>                                   flags);

We can't add XDF_IGNORE_WHITESPACE here - as far as I can tell, this
means that more lines will be treated as moved than the user wants (if
the user did not set --color-moved-ignore-all-space).

> >> It seems like pmb and wsd are parallel arrays - could each wsd be
> >> embedded into the corresponding entry of pmb instead?
> >
> > I'll explore that. It sounds like a good idea for code hygiene.
> > Although if you do not intend to use this feature, then keeping it separate
> > would allow for a smaller footprint in memory.

If you're worried about memory, wsd can be embedded as a pointer.

> > The command is missing a --color-moved, as the --color-moved-whitespace-settings
> > do not imply --color-moved, yet(?)

Maybe one should imply the other (or at least a warning).

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-04-25  0:11 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-24 21:03 [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
2018-04-24 21:03 ` [PATCH 1/7] xdiff/xdiff.h: remove unused flags Stefan Beller
2018-04-24 21:03 ` [PATCH 2/7] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
2018-04-24 21:03 ` [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
2018-04-24 21:03 ` [PATCH 4/7] diff.c: adjust hash function signature to match hashmap expectation Stefan Beller
2018-04-24 21:03 ` [PATCH 5/7] diff.c: add a blocks mode for moved code detection Stefan Beller
2018-04-24 21:50   ` Jonathan Tan
2018-04-24 22:37     ` Stefan Beller
2018-04-24 22:59       ` Jonathan Tan
2018-04-24 21:03 ` [PATCH 6/7] diff.c: decouple white space treatment from move detection algorithm Stefan Beller
2018-04-24 22:00   ` Jonathan Tan
2018-04-24 22:19     ` Stefan Beller
2018-04-24 21:03 ` [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option Stefan Beller
2018-04-24 22:35   ` Jonathan Tan
     [not found]     ` <CAGZ79kbGkHFSS9K8KKTwNikx3Tw+m+RMLY3RAf8SW_iK9a2OJQ@mail.gmail.com>
2018-04-24 23:23       ` Stefan Beller
2018-04-25  0:11         ` Jonathan Tan
2018-04-24 22:37 ` [PATCHv2 0/7] Moved code detection: ignore space on uniform indentation Jonathan Tan
2018-04-24 22:58   ` Stefan Beller
  -- strict thread matches above, loose matches on Subject: below --
2018-04-02 22:48 [RFC PATCH " Stefan Beller
2018-04-02 22:48 ` [PATCH 4/7] diff.c: adjust hash function signature to match hashmap expectation Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).