git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v3 0/8] Moved code detection: ignore space on uniform indentation
@ 2018-06-22  1:57 Stefan Beller
  2018-06-22  1:57 ` [PATCH v3 1/8] xdiff/xdiff.h: remove unused flags Stefan Beller
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Stefan Beller @ 2018-06-22  1:57 UTC (permalink / raw)
  To: sbeller; +Cc: git, jacob.keller, jonathantanmy, simon

This replaces sb/diff-color-move-more and is also available at
https://github.com/stefanbeller/git/tree/color-moved-only
It applies on v2.18.0.

Move detection is a nice feature, but doesn't work well with indentation
or dedentation. Make it possible to indent/dedent code and still have
it recognized as moved code in the diff.

The fun is in the last patch, which allows white space sensitive
languages to trust the move detection, too. Each block that is marked as
moved will have the same delta in {in-, de-}dentation.
I would think this mode might be a reasonable default eventually.

Thanks,
Stefan

v3:
 This is a complete rewrite of the actual patch, with slight modifications]
 on the refactoring how to decouple the white space treatment from the
 move detection. See range-diff below.

v2: https://public-inbox.org/git/20180424210330.87861-1-sbeller@google.com/

v1: https://public-inbox.org/git/20180402224854.86922-1-sbeller@google.com/

Stefan Beller (8):
  xdiff/xdiff.h: remove unused flags
  xdiff/xdiffi.c: remove unneeded function declarations
  diff.c: do not pass diff options as keydata to hashmap
  diff.c: adjust hash function signature to match hashmap expectation
  diff.c: add a blocks mode for moved code detection
  diff.c: decouple white space treatment from move detection algorithm
  diff.c: factor advance_or_nullify out of mark_color_as_moved
  diff.c: add white space mode to move detection that allows indent
    changes

 Documentation/diff-options.txt |  29 +++-
 diff.c                         | 253 +++++++++++++++++++++++++++++----
 diff.h                         |   9 +-
 t/t4015-diff-whitespace.sh     | 202 +++++++++++++++++++++++++-
 xdiff/xdiff.h                  |   8 --
 xdiff/xdiffi.c                 |  17 ---
 6 files changed, 458 insertions(+), 60 deletions(-)

-- 
2.18.0.rc2.346.g013aa6912e-goog

git branch-diff fe0a9eaf31dd0c349ae4308498c33a5c3794b293..origin/sb/diff-color-move-more origin/master..HEAD

1:  a7a7af6b76b = 1:  7e01bd9a972 xdiff/xdiff.h: remove unused flags
2:  a7b6aaf7bc0 = 2:  46e11a99bb7 xdiff/xdiffi.c: remove unneeded function declarations
3:  d9e57cc6b05 = 3:  8fd0ce94aaf diff.c: do not pass diff options as keydata to hashmap
4:  87111ba726d = 4:  4a07b39163c diff.c: adjust hash function signature to match hashmap expectation
5:  9559b8cb456 = 5:  ef1976a301d diff.c: add a blocks mode for moved code detection
6:  41a70464209 ! 6:  a60a3f0de9d diff.c: decouple white space treatment from move detection algorithm
    @@ -7,24 +7,30 @@
         for the regular diff.  Some cases came up where different treatment would
         have been nice.
     
    -    Allow the user to specify that whitespace should be ignored differently
    +    Allow the user to specify that white space should be ignored differently
         during detection of moved lines than during generation of added and removed
         lines. This is done by providing analogs to the --ignore-space-at-eol,
    -    -b, and -w options (namely,
    -      --color-moved-[no-]ignore-space-at-eol
    -      --color-moved-[no-]ignore-space-change
    -      --color-moved-[no-]ignore-all-space) that affect only the color of the
    -    output, and making the existing --ignore-space-at-eol, -b, and -w options
    -    no longer affect the color of the output.
    +    -b, and -w options by introducing the option --color-moved-ws=<modes>
    +    with the modes named "ignore-space-at-eol", "ignore-space-change" and
    +    "ignore-all-space", which is used only during the move detection phase.
     
         As we change the default, we'll adjust the tests.
     
    -    For now we do not infer any options to treat whitespaces in the move
    +    For now we do not infer any options to treat white spaces in the move
         detection from the generic white space options given to diff.
         This can be tuned later to reasonable default.
     
    +    As we plan on adding more white space related options in a later patch,
    +    that interferes with the current white space options, use a flag field
    +    and clamp it down to  XDF_WHITESPACE_FLAGS, as that (a) allows to easily
    +    check at parse time if we give invalid combinations and (b) can reuse
    +    parts of this patch.
    +
    +    By having the white space treatment in its own option, we'll also
    +    make it easier for a later patch to have an config option for
    +    spaces in the move detection.
    +
         Signed-off-by: Stefan Beller <sbeller@google.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
     diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
     --- a/Documentation/diff-options.txt
    @@ -33,18 +39,21 @@
      	blocks are considered interesting, the rest is uninteresting.
      --
      
    -+--color-moved-[no-]ignore-space-at-eol::
    -+	Ignore changes in whitespace at EOL when performing the move
    -+	detection for --color-moved.
    -+--color-moved-[no-]ignore-space-change::
    -+	Ignore changes in amount of whitespace when performing the move
    -+	detection for --color-moved.  This ignores whitespace
    ++--color-moved-ws=<modes>::
    ++	This configures how white spaces are ignored when performing the
    ++	move detection for `--color-moved`. These modes can be given
    ++	as a comma separated list:
    +++
    ++--
    ++ignore-space-at-eol::
    ++	Ignore changes in whitespace at EOL.
    ++ignore-space-change::
    ++	Ignore changes in amount of whitespace.  This ignores whitespace
     +	at line end, and considers all other sequences of one or
     +	more whitespace characters to be equivalent.
    -+--color-moved-[no-]ignore-all-space::
    -+	Ignore whitespace when comparing lines when performing the move
    -+	detection for --color-moved.  This ignores differences even if
    -+	one line has whitespace where the other line has none.
    ++ignore-all-space::
    ++	Ignore whitespace when comparing lines. This ignores differences
    ++	even if one line has whitespace where the other line has none.
     +
      --word-diff[=<mode>]::
      	Show a word diff, using the <mode> to delimit changed words.
    @@ -53,6 +62,43 @@
     diff --git a/diff.c b/diff.c
     --- a/diff.c
     +++ b/diff.c
    +@@
    + 		return error(_("color moved setting must be one of 'no', 'default', 'blocks', 'zebra', 'dimmed_zebra', 'plain'"));
    + }
    + 
    ++static int parse_color_moved_ws(const char *arg)
    ++{
    ++	int ret = 0;
    ++	struct string_list l = STRING_LIST_INIT_DUP;
    ++	struct string_list_item *i;
    ++
    ++	string_list_split(&l, arg, ',', -1);
    ++
    ++	for_each_string_list_item(i, &l) {
    ++		struct strbuf sb = STRBUF_INIT;
    ++		strbuf_addstr(&sb, i->string);
    ++		strbuf_trim(&sb);
    ++
    ++		if (!strcmp(sb.buf, "ignore-space-change"))
    ++			ret |= XDF_IGNORE_WHITESPACE_CHANGE;
    ++		else if (!strcmp(sb.buf, "ignore-space-at-eol"))
    ++			ret |= XDF_IGNORE_WHITESPACE_AT_EOL;
    ++		else if (!strcmp(sb.buf, "ignore-all-space"))
    ++			ret |= XDF_IGNORE_WHITESPACE;
    ++		else
    ++			error(_("ignoring unknown color-moved-ws mode '%s'"), sb.buf);
    ++
    ++		strbuf_release(&sb);
    ++	}
    ++
    ++	string_list_clear(&l, 0);
    ++
    ++	return ret;
    ++}
    ++
    + int git_diff_ui_config(const char *var, const char *value, void *cb)
    + {
    + 	if (!strcmp(var, "diff.color") || !strcmp(var, "color.diff")) {
     @@
      	const struct diff_options *diffopt = hashmap_cmp_fn_data;
      	const struct moved_entry *a = entry;
    @@ -79,24 +125,14 @@
      	ret->next_line = NULL;
      
     @@
    - 		DIFF_XDL_SET(options, IGNORE_CR_AT_EOL);
    - 	else if (!strcmp(arg, "--ignore-blank-lines"))
    - 		DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
    -+	else if (!strcmp(arg, "--color-moved-no-ignore-all-space"))
    -+		options->color_moved_ws_handling &= ~XDF_IGNORE_WHITESPACE;
    -+	else if (!strcmp(arg, "--color-moved-no-ignore-space-change"))
    -+		options->color_moved_ws_handling &= ~XDF_IGNORE_WHITESPACE_CHANGE;
    -+	else if (!strcmp(arg, "--color-moved-no-ignore-space-at-eol"))
    -+		options->color_moved_ws_handling &= ~XDF_IGNORE_WHITESPACE_AT_EOL;
    -+	else if (!strcmp(arg, "--color-moved-ignore-all-space"))
    -+		options->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
    -+	else if (!strcmp(arg, "--color-moved-ignore-space-change"))
    -+		options->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE_CHANGE;
    -+	else if (!strcmp(arg, "--color-moved-ignore-space-at-eol"))
    -+		options->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE_AT_EOL;
    - 	else if (!strcmp(arg, "--indent-heuristic"))
    - 		DIFF_XDL_SET(options, INDENT_HEURISTIC);
    - 	else if (!strcmp(arg, "--no-indent-heuristic"))
    + 		if (cm < 0)
    + 			die("bad --color-moved argument: %s", arg);
    + 		options->color_moved = cm;
    ++	} else if (skip_prefix(arg, "--color-moved-ws=", &arg)) {
    ++		options->color_moved_ws_handling = parse_color_moved_ws(arg);
    + 	} else if (skip_to_optional_arg_default(arg, "--color-words", &options->word_regex, NULL)) {
    + 		options->use_color = 1;
    + 		options->word_diff = DIFF_WORDS_COLOR;
     
     diff --git a/diff.h b/diff.h
     --- a/diff.h
    @@ -113,39 +149,13 @@
     diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
     --- a/t/t4015-diff-whitespace.sh
     +++ b/t/t4015-diff-whitespace.sh
    -@@
    - 	line 4
    - 	line 5
    - 	EOF
    --	git diff HEAD --no-renames --color-moved --color |
    -+	git diff HEAD --no-renames --color-moved --color \
    -+		--color-moved-no-ignore-all-space \
    -+		--color-moved-no-ignore-space-change \
    -+		--color-moved-no-ignore-space-at-eol |
    - 		grep -v "index" |
    - 		test_decode_color >actual &&
    - 	cat <<-\EOF >expected &&
     @@
      	EOF
      	test_cmp expected actual &&
      
     -	git diff HEAD --no-renames -w --color-moved --color |
     +	git diff HEAD --no-renames --color-moved --color \
    -+		--color-moved-ignore-all-space \
    -+		--color-moved-no-ignore-space-change \
    -+		--color-moved-no-ignore-space-at-eol |
    - 		grep -v "index" |
    - 		test_decode_color >actual &&
    - 	cat <<-\EOF >expected &&
    -@@
    - 	line 5
    - 	EOF
    - 
    --	git diff HEAD --no-renames --color-moved --color |
    -+	git diff HEAD --no-renames --color-moved --color \
    -+		--color-moved-no-ignore-all-space \
    -+		--color-moved-no-ignore-space-change \
    -+		--color-moved-no-ignore-space-at-eol |
    ++		--color-moved-ws=ignore-all-space |
      		grep -v "index" |
      		test_decode_color >actual &&
      	cat <<-\EOF >expected &&
    @@ -155,21 +165,7 @@
      
     -	git diff HEAD --no-renames -b --color-moved --color |
     +	git diff HEAD --no-renames --color-moved --color \
    -+		--color-moved-no-ignore-all-space \
    -+		--color-moved-no-ignore-space-at-eol \
    -+		--color-moved-ignore-space-change |
    - 		grep -v "index" |
    - 		test_decode_color >actual &&
    - 	cat <<-\EOF >expected &&
    -@@
    - 	# avoid cluttering the output with complaints about our eol whitespace
    - 	test_config core.whitespace -blank-at-eol &&
    - 
    --	git diff HEAD --no-renames --color-moved --color |
    -+	git diff HEAD --no-renames --color-moved --color \
    -+		--color-moved-no-ignore-all-space \
    -+		--color-moved-no-ignore-space-change \
    -+		--color-moved-no-ignore-space-at-eol |
    ++		--color-moved-ws=ignore-space-change |
      		grep -v "index" |
      		test_decode_color >actual &&
      	cat <<-\EOF >expected &&
    @@ -179,9 +175,7 @@
      
     -	git diff HEAD --no-renames --ignore-space-at-eol --color-moved --color |
     +	git diff HEAD --no-renames --color-moved --color \
    -+		--color-moved-no-ignore-all-space \
    -+		--color-moved-no-ignore-space-change \
    -+		--color-moved-ignore-space-at-eol |
    ++		--color-moved-ws=ignore-space-at-eol |
      		grep -v "index" |
      		test_decode_color >actual &&
      	cat <<-\EOF >expected &&
    @@ -211,10 +205,7 @@
     +	EOF
     +
     +	# Make sure we get a different diff using -w
    -+	git diff --color --color-moved -w \
    -+		--color-moved-no-ignore-all-space \
    -+		--color-moved-no-ignore-space-change \
    -+		--color-moved-no-ignore-space-at-eol |
    ++	git diff --color --color-moved -w |
     +		grep -v "index" |
     +		test_decode_color >actual &&
     +	q_to_tab <<-\EOF >expected &&
    @@ -231,9 +222,7 @@
     +
     +	# And now ignoring white space only in the move detection
     +	git diff --color --color-moved \
    -+		--color-moved-ignore-all-space \
    -+		--color-moved-ignore-space-change \
    -+		--color-moved-ignore-space-at-eol |
    ++		--color-moved-ws=ignore-all-space,ignore-space-change,ignore-space-at-eol |
     +		grep -v "index" |
     +		test_decode_color >actual &&
     +	q_to_tab <<-\EOF >expected &&
7:  ce99fa38553 < -:  ----------- diff.c: add --color-moved-ignore-space-delta option
8:  39c5337cd9e < -:  ----------- diff: color-moved white space handling options imply color-moved
-:  ----------- > 7:  b76faee22fe diff.c: factor advance_or_nullify out of mark_color_as_moved
-:  ----------- > 8:  ab003ed7e27 diff.c: add white space mode to move detection that allows indent changes

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 1/8] xdiff/xdiff.h: remove unused flags
  2018-06-22  1:57 [PATCH v3 0/8] Moved code detection: ignore space on uniform indentation Stefan Beller
@ 2018-06-22  1:57 ` Stefan Beller
  2018-06-22  1:57 ` [PATCH v3 2/8] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Stefan Beller @ 2018-06-22  1:57 UTC (permalink / raw)
  To: sbeller; +Cc: git, jacob.keller, jonathantanmy, simon, Junio C Hamano

These flags were there since the beginning (3443546f6e (Use a *real*
built-in diff generator, 2006-03-24), but were never used. Remove them.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 xdiff/xdiff.h | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index c1937a29112..2356da5f784 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -52,14 +52,6 @@ extern "C" {
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
 
-#define XDL_MMB_READONLY (1 << 0)
-
-#define XDL_MMF_ATOMIC (1 << 0)
-
-#define XDL_BDOP_INS 1
-#define XDL_BDOP_CPY 2
-#define XDL_BDOP_INSB 3
-
 /* merge simplification levels */
 #define XDL_MERGE_MINIMAL 0
 #define XDL_MERGE_EAGER 1
-- 
2.18.0.rc2.346.g013aa6912e-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 2/8] xdiff/xdiffi.c: remove unneeded function declarations
  2018-06-22  1:57 [PATCH v3 0/8] Moved code detection: ignore space on uniform indentation Stefan Beller
  2018-06-22  1:57 ` [PATCH v3 1/8] xdiff/xdiff.h: remove unused flags Stefan Beller
@ 2018-06-22  1:57 ` Stefan Beller
  2018-06-22  1:57 ` [PATCH v3 3/8] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Stefan Beller @ 2018-06-22  1:57 UTC (permalink / raw)
  To: sbeller; +Cc: git, jacob.keller, jonathantanmy, simon, Junio C Hamano

There is no need to forward-declare these functions, as they are used
after their implementation only.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 xdiff/xdiffi.c | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 0de1ef463bf..3e8aff92bc4 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -22,34 +22,17 @@
 
 #include "xinclude.h"
 
-
-
 #define XDL_MAX_COST_MIN 256
 #define XDL_HEUR_MIN_COST 256
 #define XDL_LINE_MAX (long)((1UL << (CHAR_BIT * sizeof(long) - 1)) - 1)
 #define XDL_SNAKE_CNT 20
 #define XDL_K_HEUR 4
 
-
-
 typedef struct s_xdpsplit {
 	long i1, i2;
 	int min_lo, min_hi;
 } xdpsplit_t;
 
-
-
-
-static long xdl_split(unsigned long const *ha1, long off1, long lim1,
-		      unsigned long const *ha2, long off2, long lim2,
-		      long *kvdf, long *kvdb, int need_min, xdpsplit_t *spl,
-		      xdalgoenv_t *xenv);
-static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1, long chg2);
-
-
-
-
-
 /*
  * See "An O(ND) Difference Algorithm and its Variations", by Eugene Myers.
  * Basically considers a "box" (off1, off2, lim1, lim2) and scan from both
-- 
2.18.0.rc2.346.g013aa6912e-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 3/8] diff.c: do not pass diff options as keydata to hashmap
  2018-06-22  1:57 [PATCH v3 0/8] Moved code detection: ignore space on uniform indentation Stefan Beller
  2018-06-22  1:57 ` [PATCH v3 1/8] xdiff/xdiff.h: remove unused flags Stefan Beller
  2018-06-22  1:57 ` [PATCH v3 2/8] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
@ 2018-06-22  1:57 ` Stefan Beller
  2018-06-22  1:57 ` [PATCH v3 4/8] diff.c: adjust hash function signature to match hashmap expectation Stefan Beller
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Stefan Beller @ 2018-06-22  1:57 UTC (permalink / raw)
  To: sbeller; +Cc: git, jacob.keller, jonathantanmy, simon, Junio C Hamano

When we initialize the hashmap, we give it a pointer to the
diff_options, which it then passes along to each call of the
hashmap_cmp_fn function. There's no need to pass it a second time as
the "keydata" parameter, and our comparison functions never look at
keydata.

This was a mistake left over from an earlier round of 2e2d5ac184
(diff.c: color moved lines differently, 2017-06-30), before hashmap
learned to pass the data pointer for us.

Explanation-by: Jeff King <peff@peff.net>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 136d44b4556..112e6af2cc8 100644
--- a/diff.c
+++ b/diff.c
@@ -842,13 +842,13 @@ static void mark_color_as_moved(struct diff_options *o,
 		case DIFF_SYMBOL_PLUS:
 			hm = del_lines;
 			key = prepare_entry(o, n);
-			match = hashmap_get(hm, key, o);
+			match = hashmap_get(hm, key, NULL);
 			free(key);
 			break;
 		case DIFF_SYMBOL_MINUS:
 			hm = add_lines;
 			key = prepare_entry(o, n);
-			match = hashmap_get(hm, key, o);
+			match = hashmap_get(hm, key, NULL);
 			free(key);
 			break;
 		default:
-- 
2.18.0.rc2.346.g013aa6912e-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 4/8] diff.c: adjust hash function signature to match hashmap expectation
  2018-06-22  1:57 [PATCH v3 0/8] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (2 preceding siblings ...)
  2018-06-22  1:57 ` [PATCH v3 3/8] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
@ 2018-06-22  1:57 ` Stefan Beller
  2018-06-22  1:57 ` [PATCH v3 5/8] diff.c: add a blocks mode for moved code detection Stefan Beller
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Stefan Beller @ 2018-06-22  1:57 UTC (permalink / raw)
  To: sbeller; +Cc: git, jacob.keller, jonathantanmy, simon, Junio C Hamano

This makes the follow up patch easier.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 112e6af2cc8..2a1a21d3b7a 100644
--- a/diff.c
+++ b/diff.c
@@ -707,11 +707,15 @@ struct moved_entry {
 	struct moved_entry *next_line;
 };
 
-static int moved_entry_cmp(const struct diff_options *diffopt,
-			   const struct moved_entry *a,
-			   const struct moved_entry *b,
+static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
+			   const void *entry,
+			   const void *entry_or_key,
 			   const void *keydata)
 {
+	const struct diff_options *diffopt = hashmap_cmp_fn_data;
+	const struct moved_entry *a = entry;
+	const struct moved_entry *b = entry_or_key;
+
 	return !xdiff_compare_lines(a->es->line, a->es->len,
 				    b->es->line, b->es->len,
 				    diffopt->xdl_opts);
@@ -5541,10 +5545,8 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 		if (o->color_moved) {
 			struct hashmap add_lines, del_lines;
 
-			hashmap_init(&del_lines,
-				     (hashmap_cmp_fn)moved_entry_cmp, o, 0);
-			hashmap_init(&add_lines,
-				     (hashmap_cmp_fn)moved_entry_cmp, o, 0);
+			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
+			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
 
 			add_lines_to_move_detection(o, &add_lines, &del_lines);
 			mark_color_as_moved(o, &add_lines, &del_lines);
-- 
2.18.0.rc2.346.g013aa6912e-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 5/8] diff.c: add a blocks mode for moved code detection
  2018-06-22  1:57 [PATCH v3 0/8] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (3 preceding siblings ...)
  2018-06-22  1:57 ` [PATCH v3 4/8] diff.c: adjust hash function signature to match hashmap expectation Stefan Beller
@ 2018-06-22  1:57 ` Stefan Beller
  2018-06-22  1:57 ` [PATCH v3 6/8] diff.c: decouple white space treatment from move detection algorithm Stefan Beller
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Stefan Beller @ 2018-06-22  1:57 UTC (permalink / raw)
  To: sbeller; +Cc: git, jacob.keller, jonathantanmy, simon, Junio C Hamano

The new "blocks" mode provides a middle ground between plain and zebra.
It is as intuitive (few colors) as plain, but still has the requirement
for a minimum of lines/characters to count a block as moved.

Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
 (https://public-inbox.org/git/87o9j0uljo.fsf@evledraar.gmail.com/)
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/diff-options.txt |  8 ++++--
 diff.c                         |  6 +++--
 diff.h                         |  5 ++--
 t/t4015-diff-whitespace.sh     | 48 +++++++++++++++++++++++++++++++++-
 4 files changed, 60 insertions(+), 7 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 41064909ee4..2e20794f9ed 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -276,10 +276,14 @@ plain::
 	that are added somewhere else in the diff. This mode picks up any
 	moved line, but it is not very useful in a review to determine
 	if a block of code was moved without permutation.
-zebra::
+blocks:
 	Blocks of moved text of at least 20 alphanumeric characters
 	are detected greedily. The detected blocks are
-	painted using either the 'color.diff.{old,new}Moved' color or
+	painted using either the 'color.diff.{old,new}Moved' color.
+	Adjacent blocks cannot be told apart.
+zebra::
+	Blocks of moved text are detected as in 'blocks' mode. The blocks
+	are painted using either the 'color.diff.{old,new}Moved' color or
 	'color.diff.{old,new}MovedAlternative'. The change between
 	the two colors indicates that a new block was detected.
 dimmed_zebra::
diff --git a/diff.c b/diff.c
index 2a1a21d3b7a..b9575901eb2 100644
--- a/diff.c
+++ b/diff.c
@@ -271,6 +271,8 @@ static int parse_color_moved(const char *arg)
 		return COLOR_MOVED_NO;
 	else if (!strcmp(arg, "plain"))
 		return COLOR_MOVED_PLAIN;
+	else if (!strcmp(arg, "blocks"))
+		return COLOR_MOVED_BLOCKS;
 	else if (!strcmp(arg, "zebra"))
 		return COLOR_MOVED_ZEBRA;
 	else if (!strcmp(arg, "default"))
@@ -278,7 +280,7 @@ static int parse_color_moved(const char *arg)
 	else if (!strcmp(arg, "dimmed_zebra"))
 		return COLOR_MOVED_ZEBRA_DIM;
 	else
-		return error(_("color moved setting must be one of 'no', 'default', 'zebra', 'dimmed_zebra', 'plain'"));
+		return error(_("color moved setting must be one of 'no', 'default', 'blocks', 'zebra', 'dimmed_zebra', 'plain'"));
 }
 
 int git_diff_ui_config(const char *var, const char *value, void *cb)
@@ -903,7 +905,7 @@ static void mark_color_as_moved(struct diff_options *o,
 
 		block_length++;
 
-		if (flipped_block)
+		if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
 			l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 	}
 	adjust_last_block(o, n, block_length);
diff --git a/diff.h b/diff.h
index dedac472ca5..d8009597937 100644
--- a/diff.h
+++ b/diff.h
@@ -208,8 +208,9 @@ struct diff_options {
 	enum {
 		COLOR_MOVED_NO = 0,
 		COLOR_MOVED_PLAIN = 1,
-		COLOR_MOVED_ZEBRA = 2,
-		COLOR_MOVED_ZEBRA_DIM = 3,
+		COLOR_MOVED_BLOCKS = 2,
+		COLOR_MOVED_ZEBRA = 3,
+		COLOR_MOVED_ZEBRA_DIM = 4,
 	} color_moved;
 	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
 	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 17df491a3ab..45091abb192 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1223,7 +1223,7 @@ test_expect_success 'plain moved code, inside file' '
 	test_cmp expected actual
 '
 
-test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
+test_expect_success 'detect blocks of moved code' '
 	git reset --hard &&
 	cat <<-\EOF >lines.txt &&
 		long line 1
@@ -1271,6 +1271,52 @@ test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
 	test_config color.diff.newMovedDimmed "normal cyan" &&
 	test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
 	test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&
+
+	git diff HEAD --no-renames --color-moved=blocks --color |
+		grep -v "index" |
+		test_decode_color >actual &&
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
+	<BOLD>--- a/lines.txt<RESET>
+	<BOLD>+++ b/lines.txt<RESET>
+	<CYAN>@@ -1,16 +1,16 @@<RESET>
+	<MAGENTA>-long line 1<RESET>
+	<MAGENTA>-long line 2<RESET>
+	<MAGENTA>-long line 3<RESET>
+	 line 4<RESET>
+	 line 5<RESET>
+	 line 6<RESET>
+	 line 7<RESET>
+	 line 8<RESET>
+	 line 9<RESET>
+	<CYAN>+<RESET><CYAN>long line 1<RESET>
+	<CYAN>+<RESET><CYAN>long line 2<RESET>
+	<CYAN>+<RESET><CYAN>long line 3<RESET>
+	<CYAN>+<RESET><CYAN>long line 14<RESET>
+	<CYAN>+<RESET><CYAN>long line 15<RESET>
+	<CYAN>+<RESET><CYAN>long line 16<RESET>
+	 line 10<RESET>
+	 line 11<RESET>
+	 line 12<RESET>
+	 line 13<RESET>
+	<MAGENTA>-long line 14<RESET>
+	<MAGENTA>-long line 15<RESET>
+	<MAGENTA>-long line 16<RESET>
+	EOF
+	test_cmp expected actual
+
+'
+
+test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
+	# reuse setup from test before!
+	test_config color.diff.oldMoved "magenta" &&
+	test_config color.diff.newMoved "cyan" &&
+	test_config color.diff.oldMovedAlternative "blue" &&
+	test_config color.diff.newMovedAlternative "yellow" &&
+	test_config color.diff.oldMovedDimmed "normal magenta" &&
+	test_config color.diff.newMovedDimmed "normal cyan" &&
+	test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
+	test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&
 	git diff HEAD --no-renames --color-moved=dimmed_zebra --color |
 		grep -v "index" |
 		test_decode_color >actual &&
-- 
2.18.0.rc2.346.g013aa6912e-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 6/8] diff.c: decouple white space treatment from move detection algorithm
  2018-06-22  1:57 [PATCH v3 0/8] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (4 preceding siblings ...)
  2018-06-22  1:57 ` [PATCH v3 5/8] diff.c: add a blocks mode for moved code detection Stefan Beller
@ 2018-06-22  1:57 ` Stefan Beller
  2018-06-22  1:57 ` [PATCH v3 7/8] diff.c: factor advance_or_nullify out of mark_color_as_moved Stefan Beller
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Stefan Beller @ 2018-06-22  1:57 UTC (permalink / raw)
  To: sbeller; +Cc: git, jacob.keller, jonathantanmy, simon

In the original implementation of the move detection logic the choice for
ignoring white space changes is the same for the move detection as it is
for the regular diff.  Some cases came up where different treatment would
have been nice.

Allow the user to specify that white space should be ignored differently
during detection of moved lines than during generation of added and removed
lines. This is done by providing analogs to the --ignore-space-at-eol,
-b, and -w options by introducing the option --color-moved-ws=<modes>
with the modes named "ignore-space-at-eol", "ignore-space-change" and
"ignore-all-space", which is used only during the move detection phase.

As we change the default, we'll adjust the tests.

For now we do not infer any options to treat white spaces in the move
detection from the generic white space options given to diff.
This can be tuned later to reasonable default.

As we plan on adding more white space related options in a later patch,
that interferes with the current white space options, use a flag field
and clamp it down to  XDF_WHITESPACE_FLAGS, as that (a) allows to easily
check at parse time if we give invalid combinations and (b) can reuse
parts of this patch.

By having the white space treatment in its own option, we'll also
make it easier for a later patch to have an config option for
spaces in the move detection.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-options.txt | 16 +++++++++
 diff.c                         | 39 +++++++++++++++++++--
 diff.h                         |  1 +
 t/t4015-diff-whitespace.sh     | 64 +++++++++++++++++++++++++++++++---
 4 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 2e20794f9ed..d174ed1dd0e 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -292,6 +292,22 @@ dimmed_zebra::
 	blocks are considered interesting, the rest is uninteresting.
 --
 
+--color-moved-ws=<modes>::
+	This configures how white spaces are ignored when performing the
+	move detection for `--color-moved`. These modes can be given
+	as a comma separated list:
++
+--
+ignore-space-at-eol::
+	Ignore changes in whitespace at EOL.
+ignore-space-change::
+	Ignore changes in amount of whitespace.  This ignores whitespace
+	at line end, and considers all other sequences of one or
+	more whitespace characters to be equivalent.
+ignore-all-space::
+	Ignore whitespace when comparing lines. This ignores differences
+	even if one line has whitespace where the other line has none.
+
 --word-diff[=<mode>]::
 	Show a word diff, using the <mode> to delimit changed words.
 	By default, words are delimited by whitespace; see
diff --git a/diff.c b/diff.c
index b9575901eb2..6c523bafddf 100644
--- a/diff.c
+++ b/diff.c
@@ -283,6 +283,36 @@ static int parse_color_moved(const char *arg)
 		return error(_("color moved setting must be one of 'no', 'default', 'blocks', 'zebra', 'dimmed_zebra', 'plain'"));
 }
 
+static int parse_color_moved_ws(const char *arg)
+{
+	int ret = 0;
+	struct string_list l = STRING_LIST_INIT_DUP;
+	struct string_list_item *i;
+
+	string_list_split(&l, arg, ',', -1);
+
+	for_each_string_list_item(i, &l) {
+		struct strbuf sb = STRBUF_INIT;
+		strbuf_addstr(&sb, i->string);
+		strbuf_trim(&sb);
+
+		if (!strcmp(sb.buf, "ignore-space-change"))
+			ret |= XDF_IGNORE_WHITESPACE_CHANGE;
+		else if (!strcmp(sb.buf, "ignore-space-at-eol"))
+			ret |= XDF_IGNORE_WHITESPACE_AT_EOL;
+		else if (!strcmp(sb.buf, "ignore-all-space"))
+			ret |= XDF_IGNORE_WHITESPACE;
+		else
+			error(_("ignoring unknown color-moved-ws mode '%s'"), sb.buf);
+
+		strbuf_release(&sb);
+	}
+
+	string_list_clear(&l, 0);
+
+	return ret;
+}
+
 int git_diff_ui_config(const char *var, const char *value, void *cb)
 {
 	if (!strcmp(var, "diff.color") || !strcmp(var, "color.diff")) {
@@ -717,10 +747,12 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 	const struct diff_options *diffopt = hashmap_cmp_fn_data;
 	const struct moved_entry *a = entry;
 	const struct moved_entry *b = entry_or_key;
+	unsigned flags = diffopt->color_moved_ws_handling
+			 & XDF_WHITESPACE_FLAGS;
 
 	return !xdiff_compare_lines(a->es->line, a->es->len,
 				    b->es->line, b->es->len,
-				    diffopt->xdl_opts);
+				    flags);
 }
 
 static struct moved_entry *prepare_entry(struct diff_options *o,
@@ -728,8 +760,9 @@ static struct moved_entry *prepare_entry(struct diff_options *o,
 {
 	struct moved_entry *ret = xmalloc(sizeof(*ret));
 	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
+	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
-	ret->ent.hash = xdiff_hash_string(l->line, l->len, o->xdl_opts);
+	ret->ent.hash = xdiff_hash_string(l->line, l->len, flags);
 	ret->es = l;
 	ret->next_line = NULL;
 
@@ -4717,6 +4750,8 @@ int diff_opt_parse(struct diff_options *options,
 		if (cm < 0)
 			die("bad --color-moved argument: %s", arg);
 		options->color_moved = cm;
+	} else if (skip_prefix(arg, "--color-moved-ws=", &arg)) {
+		options->color_moved_ws_handling = parse_color_moved_ws(arg);
 	} else if (skip_to_optional_arg_default(arg, "--color-words", &options->word_regex, NULL)) {
 		options->use_color = 1;
 		options->word_diff = DIFF_WORDS_COLOR;
diff --git a/diff.h b/diff.h
index d8009597937..94d4adfe0f3 100644
--- a/diff.h
+++ b/diff.h
@@ -214,6 +214,7 @@ struct diff_options {
 	} color_moved;
 	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
 	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
+	int color_moved_ws_handling;
 };
 
 void diff_emit_submodule_del(struct diff_options *o, const char *line);
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 45091abb192..aad0870c8a1 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1465,7 +1465,8 @@ test_expect_success 'move detection ignoring whitespace ' '
 	EOF
 	test_cmp expected actual &&
 
-	git diff HEAD --no-renames -w --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-ws=ignore-all-space |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1529,7 +1530,8 @@ test_expect_success 'move detection ignoring whitespace changes' '
 	EOF
 	test_cmp expected actual &&
 
-	git diff HEAD --no-renames -b --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-ws=ignore-space-change |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1596,7 +1598,8 @@ test_expect_success 'move detection ignoring whitespace at eol' '
 	EOF
 	test_cmp expected actual &&
 
-	git diff HEAD --no-renames --ignore-space-at-eol --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-ws=ignore-space-at-eol |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1768,7 +1771,60 @@ test_expect_success 'move detection with submodules' '
 
 	# nor did we mess with it another way
 	git diff --submodule=diff --color | test_decode_color >expect &&
-	test_cmp expect decoded_actual
+	test_cmp expect decoded_actual &&
+	rm -rf bananas &&
+	git submodule deinit bananas
+'
+
+test_expect_success 'only move detection ignores white spaces' '
+	git reset --hard &&
+	q_to_tab <<-\EOF >text.txt &&
+		a long line to exceed per-line minimum
+		another long line to exceed per-line minimum
+		original file
+	EOF
+	git add text.txt &&
+	git commit -m "add text" &&
+	q_to_tab <<-\EOF >text.txt &&
+		Qa long line to exceed per-line minimum
+		Qanother long line to exceed per-line minimum
+		new file
+	EOF
+
+	# Make sure we get a different diff using -w
+	git diff --color --color-moved -w |
+		grep -v "index" |
+		test_decode_color >actual &&
+	q_to_tab <<-\EOF >expected &&
+	<BOLD>diff --git a/text.txt b/text.txt<RESET>
+	<BOLD>--- a/text.txt<RESET>
+	<BOLD>+++ b/text.txt<RESET>
+	<CYAN>@@ -1,3 +1,3 @@<RESET>
+	 Qa long line to exceed per-line minimum<RESET>
+	 Qanother long line to exceed per-line minimum<RESET>
+	<RED>-original file<RESET>
+	<GREEN>+<RESET><GREEN>new file<RESET>
+	EOF
+	test_cmp expected actual &&
+
+	# And now ignoring white space only in the move detection
+	git diff --color --color-moved \
+		--color-moved-ws=ignore-all-space,ignore-space-change,ignore-space-at-eol |
+		grep -v "index" |
+		test_decode_color >actual &&
+	q_to_tab <<-\EOF >expected &&
+	<BOLD>diff --git a/text.txt b/text.txt<RESET>
+	<BOLD>--- a/text.txt<RESET>
+	<BOLD>+++ b/text.txt<RESET>
+	<CYAN>@@ -1,3 +1,3 @@<RESET>
+	<BOLD;MAGENTA>-a long line to exceed per-line minimum<RESET>
+	<BOLD;MAGENTA>-another long line to exceed per-line minimum<RESET>
+	<RED>-original file<RESET>
+	<BOLD;YELLOW>+<RESET>Q<BOLD;YELLOW>a long line to exceed per-line minimum<RESET>
+	<BOLD;YELLOW>+<RESET>Q<BOLD;YELLOW>another long line to exceed per-line minimum<RESET>
+	<GREEN>+<RESET><GREEN>new file<RESET>
+	EOF
+	test_cmp expected actual
 '
 
 test_done
-- 
2.18.0.rc2.346.g013aa6912e-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 7/8] diff.c: factor advance_or_nullify out of mark_color_as_moved
  2018-06-22  1:57 [PATCH v3 0/8] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (5 preceding siblings ...)
  2018-06-22  1:57 ` [PATCH v3 6/8] diff.c: decouple white space treatment from move detection algorithm Stefan Beller
@ 2018-06-22  1:57 ` Stefan Beller
  2018-06-22  1:57 ` [PATCH v3 8/8] diff.c: add white space mode to move detection that allows indent changes Stefan Beller
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Stefan Beller @ 2018-06-22  1:57 UTC (permalink / raw)
  To: sbeller; +Cc: git, jacob.keller, jonathantanmy, simon

This moves the part of code that checks if we're still in a block
into its own function.  We'll need a different approach on advancing
the blocks in a later patch, so having it as a separate function will
prove useful.

While at it rename the variable `p` to `prev` to indicate that it refers
to the previous line. This is as pmb[i] was assigned in the last iteration
of the outmost for loop.

Further rename `pnext` to `cur` to indicate that this should match up with
the current line of the outmost for loop.

Also replace the advancement of pmb[i] to reuse `cur` instead of
using `p->next` (which is how the name for pnext could be explained.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/diff.c b/diff.c
index 6c523bafddf..040b46545e5 100644
--- a/diff.c
+++ b/diff.c
@@ -801,6 +801,25 @@ static void add_lines_to_move_detection(struct diff_options *o,
 	}
 }
 
+static void pmb_advance_or_null(struct diff_options *o,
+				struct moved_entry *match,
+				struct hashmap *hm,
+				struct moved_entry **pmb,
+				int pmb_nr)
+{
+	int i;
+	for (i = 0; i < pmb_nr; i++) {
+		struct moved_entry *prev = pmb[i];
+		struct moved_entry *cur = (prev && prev->next_line) ?
+				prev->next_line : NULL;
+		if (cur && !hm->cmpfn(o, cur, match, NULL)) {
+			pmb[i] = cur;
+		} else {
+			pmb[i] = NULL;
+		}
+	}
+}
+
 static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 					 int pmb_nr)
 {
@@ -875,7 +894,6 @@ static void mark_color_as_moved(struct diff_options *o,
 		struct moved_entry *key;
 		struct moved_entry *match = NULL;
 		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-		int i;
 
 		switch (l->s) {
 		case DIFF_SYMBOL_PLUS:
@@ -906,17 +924,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		if (o->color_moved == COLOR_MOVED_PLAIN)
 			continue;
 
-		/* Check any potential block runs, advance each or nullify */
-		for (i = 0; i < pmb_nr; i++) {
-			struct moved_entry *p = pmb[i];
-			struct moved_entry *pnext = (p && p->next_line) ?
-					p->next_line : NULL;
-			if (pnext && !hm->cmpfn(o, pnext, match, NULL)) {
-				pmb[i] = p->next_line;
-			} else {
-				pmb[i] = NULL;
-			}
-		}
+		pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
 
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
-- 
2.18.0.rc2.346.g013aa6912e-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 8/8] diff.c: add white space mode to move detection that allows indent changes
  2018-06-22  1:57 [PATCH v3 0/8] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (6 preceding siblings ...)
  2018-06-22  1:57 ` [PATCH v3 7/8] diff.c: factor advance_or_nullify out of mark_color_as_moved Stefan Beller
@ 2018-06-22  1:57 ` Stefan Beller
  2018-06-23 16:52   ` SZEDER Gábor
  2018-06-22 22:37 ` [PATCH v3 0/8] Moved code detection: ignore space on uniform indentation Junio C Hamano
  2018-06-29  0:19 ` [PATCH v4 0/9] " Stefan Beller
  9 siblings, 1 reply; 25+ messages in thread
From: Stefan Beller @ 2018-06-22  1:57 UTC (permalink / raw)
  To: sbeller; +Cc: git, jacob.keller, jonathantanmy, simon

The option of --color-moved has proven to be useful as observed on the
mailing list. However when refactoring sometimes the indentation changes,
for example when partitioning a functions into smaller helper functions
the code usually mostly moved around except for a decrease in indentation.

To just review the moved code ignoring the change in indentation, a mode
to ignore spaces in the move detection as implemented in a previous patch
would be enough.  However the whole move coloring as motivated in commit
2e2d5ac (diff.c: color moved lines differently, 2017-06-30), brought
up the notion of the reviewer being able to trust the move of a "block".

As there are languages such as python, which depend on proper relative
indentation for the control flow of the program, ignoring any white space
change in a block would not uphold the promises of 2e2d5ac that allows
reviewers to pay less attention to the inside of a block, as inside
the reviewer wants to assume the same program flow.

This new mode of white space ignorance will take this into account and will
only allow the same white space changes per line in each block. This patch
even allows only for the same change at the beginning of the lines.

As this is a white space mode, it is made exclusive to other white space
modes in the move detection.

This patch brings some challenges, related to the detection of blocks.
We need a white net the catch the possible moved lines, but then need to
narrow down to check if the blocks are still in tact. Consider this
example (ignoring block sizes):

 - A
 - B
 - C
 +    A
 +    B
 +    C

At the beginning of a block when checking if there is a counterpart
for A, we have to ignore all space changes. However at the following
lines we have to check if the indent change stayed the same.

Checking if the indentation change did stay the same, is done by computing
the indentation change by the difference in line length, and then assume
the change is only in the beginning of the longer line, the common tail
is the same. That is why the test contains lines like:

 - <TAB> A
 ...
 + A <TAB>
 ...

As the first line starting a block is caught using a compare function that
ignores white spaces unlike the rest of the block, where the white space
delta is taken into account for the comparison, we also have to think about
the following situation:

 - A
 - B
 -   A
 -   B
 +    A
 +    B
 +      A
 +      B

When checking if the first A (both in the + and - lines) is a start of
a block, we have to check all 'A' and record all the white space deltas
such that we can find the example above to be just one block that is
indented.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-options.txt |   5 ++
 diff.c                         | 158 ++++++++++++++++++++++++++++++++-
 diff.h                         |   3 +
 t/t4015-diff-whitespace.sh     |  90 +++++++++++++++++++
 4 files changed, 254 insertions(+), 2 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index d174ed1dd0e..d9ff0bb8a58 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -307,6 +307,11 @@ ignore-space-change::
 ignore-all-space::
 	Ignore whitespace when comparing lines. This ignores differences
 	even if one line has whitespace where the other line has none.
+allow-indentation-change::
+	Initially ignore any white spaces in the move detection, then
+	group the moved code blocks only into a block if the change in
+	whitespace is the same per line. This is incompatible with the
+	other modes.
 
 --word-diff[=<mode>]::
 	Show a word diff, using the <mode> to delimit changed words.
diff --git a/diff.c b/diff.c
index 040b46545e5..9e357111864 100644
--- a/diff.c
+++ b/diff.c
@@ -302,12 +302,18 @@ static int parse_color_moved_ws(const char *arg)
 			ret |= XDF_IGNORE_WHITESPACE_AT_EOL;
 		else if (!strcmp(sb.buf, "ignore-all-space"))
 			ret |= XDF_IGNORE_WHITESPACE;
+		else if (!strcmp(sb.buf, "allow-indentation-change"))
+			ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE;
 		else
 			error(_("ignoring unknown color-moved-ws mode '%s'"), sb.buf);
 
 		strbuf_release(&sb);
 	}
 
+	if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
+	    (ret & XDF_WHITESPACE_FLAGS))
+		die(_("color-moved-ws: allow-indentation-change cannot be combined with other white space modes"));
+
 	string_list_clear(&l, 0);
 
 	return ret;
@@ -737,7 +743,91 @@ struct moved_entry {
 	struct hashmap_entry ent;
 	const struct emitted_diff_symbol *es;
 	struct moved_entry *next_line;
+	struct ws_delta *wsd;
+};
+
+/**
+ * The struct ws_delta holds white space differences between moved lines, i.e.
+ * between '+' and '-' lines that have been detected to be a move.
+ * The string contains the difference in leading white spaces, before the
+ * rest of the line is compared using the white space config for move
+ * coloring. The current_longer indicates if the first string in the
+ * comparision is longer than the second.
+ */
+struct ws_delta {
+	char *string;
+	int current_longer : 1;
 };
+#define WS_DELTA_INIT { NULL, 0 }
+
+static int compute_ws_delta(const struct emitted_diff_symbol *a,
+			     const struct emitted_diff_symbol *b,
+			     struct ws_delta *out)
+{
+	const struct emitted_diff_symbol *longer =  a->len > b->len ? a : b;
+	const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a;
+	int d = longer->len - shorter->len;
+
+	out->string = xmemdupz(longer->line, d);
+	out->current_longer = (a == longer);
+
+	return !strncmp(longer->line + d, shorter->line, shorter->len);
+}
+
+static int cmp_in_block_with_wsd(const struct diff_options *o,
+				 const struct moved_entry *cur,
+				 const struct moved_entry *match,
+				 struct moved_entry *pmb,
+				 int n)
+{
+	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
+	int al = cur->es->len, cl = l->len;
+	const char *a = cur->es->line,
+		   *b = match->es->line,
+		   *c = l->line;
+
+	int wslen;
+
+	/*
+	 * We need to check if 'cur' is equal to 'match'.
+	 * As those are from the same (+/-) side, we do not need to adjust for
+	 * indent changes. However these were found using fuzzy matching
+	 * so we do have to check if they are equal.
+	 */
+	if (strcmp(a, b))
+		return 1;
+
+	if (!pmb->wsd)
+		/*
+		 * No white space delta was carried forward? This can happen
+		 * when we exit early in this function and do not carry
+		 * forward ws.
+		 */
+		return 1;
+
+	/*
+	 * The indent changes of the block are known and carried forward in
+	 * pmb->wsd; however we need to check if the indent changes of the
+	 * current line are still the same as before.
+	 *
+	 * To do so we need to compare 'l' to 'cur', adjusting the
+	 * one of them for the white spaces, depending which was longer.
+	 */
+
+	wslen = strlen(pmb->wsd->string);
+	if (pmb->wsd->current_longer) {
+		c += wslen;
+		cl -= wslen;
+	} else {
+		a += wslen;
+		al -= wslen;
+	}
+
+	if (strcmp(a, c))
+		return 1;
+
+	return 0;
+}
 
 static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 			   const void *entry,
@@ -750,6 +840,16 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 	unsigned flags = diffopt->color_moved_ws_handling
 			 & XDF_WHITESPACE_FLAGS;
 
+	if (diffopt->color_moved_ws_handling &
+	    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+		/*
+		 * As there is not specific white space config given,
+		 * we'd need to check for a new block, so ignore all
+		 * white space. The setup of the white space
+		 * configuration for the next block is done else where
+		 */
+		flags |= XDF_IGNORE_WHITESPACE;
+
 	return !xdiff_compare_lines(a->es->line, a->es->len,
 				    b->es->line, b->es->len,
 				    flags);
@@ -765,6 +865,7 @@ static struct moved_entry *prepare_entry(struct diff_options *o,
 	ret->ent.hash = xdiff_hash_string(l->line, l->len, flags);
 	ret->es = l;
 	ret->next_line = NULL;
+	ret->wsd = NULL;
 
 	return ret;
 }
@@ -820,6 +921,37 @@ static void pmb_advance_or_null(struct diff_options *o,
 	}
 }
 
+static void pmb_advance_or_null_multi_match(struct diff_options *o,
+					    struct moved_entry *match,
+					    struct hashmap *hm,
+					    struct moved_entry **pmb,
+					    int pmb_nr, int n)
+{
+	int i;
+	char *got_match = xcalloc(1, pmb_nr);
+
+	for (; match; match = hashmap_get_next(hm, match)) {
+		for (i = 0; i < pmb_nr; i++) {
+			struct moved_entry *prev = pmb[i];
+			struct moved_entry *cur = (prev && prev->next_line) ?
+					prev->next_line : NULL;
+			if (!cur)
+				continue;
+			if (!cmp_in_block_with_wsd(o, cur, match, pmb[i], n))
+				got_match[i] |= 1;
+		}
+	}
+
+	for (i = 0; i <pmb_nr; i++) {
+		if (got_match[i]) {
+			/* Carry the white space delta forward */
+			pmb[i]->next_line->wsd = pmb[i]->wsd;
+			pmb[i] = pmb[i]->next_line;
+		} else
+			pmb[i] = NULL;
+	}
+}
+
 static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 					 int pmb_nr)
 {
@@ -837,6 +969,10 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 
 		if (lp < pmb_nr && rp > -1 && lp < rp) {
 			pmb[lp] = pmb[rp];
+			if (pmb[rp]->wsd) {
+				free(pmb[rp]->wsd->string);
+				FREE_AND_NULL(pmb[rp]->wsd);
+			}
 			pmb[rp] = NULL;
 			rp--;
 			lp++;
@@ -924,7 +1060,11 @@ static void mark_color_as_moved(struct diff_options *o,
 		if (o->color_moved == COLOR_MOVED_PLAIN)
 			continue;
 
-		pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
+		if (o->color_moved_ws_handling &
+		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+			pmb_advance_or_null_multi_match(o, match, hm, pmb, pmb_nr, n);
+		else
+			pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
 
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
@@ -935,7 +1075,17 @@ static void mark_color_as_moved(struct diff_options *o,
 			 */
 			for (; match; match = hashmap_get_next(hm, match)) {
 				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
-				pmb[pmb_nr++] = match;
+				if (o->color_moved_ws_handling &
+				    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
+					struct ws_delta *wsd = xmalloc(sizeof(*match->wsd));
+					if (compute_ws_delta(l, match->es, wsd)) {
+						match->wsd = wsd;
+						pmb[pmb_nr++] = match;
+					} else
+						free(wsd);
+				} else {
+					pmb[pmb_nr++] = match;
+				}
 			}
 
 			flipped_block = (flipped_block + 1) % 2;
@@ -5590,6 +5740,10 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 		if (o->color_moved) {
 			struct hashmap add_lines, del_lines;
 
+			if (o->color_moved_ws_handling &
+			    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+				o->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
+
 			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
 			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
 
diff --git a/diff.h b/diff.h
index 94d4adfe0f3..a14895bb824 100644
--- a/diff.h
+++ b/diff.h
@@ -214,6 +214,9 @@ struct diff_options {
 	} color_moved;
 	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
 	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
+
+	/* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
+	#define COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE (1<<5)
 	int color_moved_ws_handling;
 };
 
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index aad0870c8a1..13b20be591e 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1827,4 +1827,94 @@ test_expect_success 'only move detection ignores white spaces' '
 	test_cmp expected actual
 '
 
+test_expect_success 'compare whitespace delta across moved blocks' '
+
+	git reset --hard &&
+	q_to_tab <<-\EOF >text.txt &&
+	QIndented
+	QText across
+	Qsome lines
+	QBut! <- this stands out
+	QAdjusting with
+	QQdifferent starting
+	Qwhite spaces
+	QAnother outlier
+	QQQIndented
+	QQQText across
+	QQQfive lines
+	QQQthat has similar lines
+	QQQto previous blocks, but with different indent
+	QQQYetQAnotherQoutlierQ
+	EOF
+
+	git add text.txt &&
+	git commit -m "add text.txt" &&
+
+	q_to_tab <<-\EOF >text.txt &&
+	QQIndented
+	QQText across
+	QQsome lines
+	QQQBut! <- this stands out
+	Adjusting with
+	Qdifferent starting
+	white spaces
+	AnotherQoutlier
+	QQIndented
+	QQText across
+	QQfive lines
+	QQthat has similar lines
+	QQto previous blocks, but with different indent
+	QQYetQAnotherQoutlier
+	EOF
+
+	git diff --color --color-moved --color-moved-ws=allow-indentation-change |
+		grep -v "index" |
+		test_decode_color >actual &&
+
+	q_to_tab <<-\EOF >expected &&
+		<BOLD>diff --git a/text.txt b/text.txt<RESET>
+		<BOLD>--- a/text.txt<RESET>
+		<BOLD>+++ b/text.txt<RESET>
+		<CYAN>@@ -1,14 +1,14 @@<RESET>
+		<BOLD;MAGENTA>-QIndented<RESET>
+		<BOLD;MAGENTA>-QText across<RESET>
+		<BOLD;MAGENTA>-Qsome lines<RESET>
+		<RED>-QBut! <- this stands out<RESET>
+		<BOLD;MAGENTA>-QAdjusting with<RESET>
+		<BOLD;MAGENTA>-QQdifferent starting<RESET>
+		<BOLD;MAGENTA>-Qwhite spaces<RESET>
+		<RED>-QAnother outlier<RESET>
+		<BOLD;MAGENTA>-QQQIndented<RESET>
+		<BOLD;MAGENTA>-QQQText across<RESET>
+		<BOLD;MAGENTA>-QQQfive lines<RESET>
+		<BOLD;MAGENTA>-QQQthat has similar lines<RESET>
+		<BOLD;MAGENTA>-QQQto previous blocks, but with different indent<RESET>
+		<RED>-QQQYetQAnotherQoutlierQ<RESET>
+		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>Indented<RESET>
+		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>Text across<RESET>
+		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>some lines<RESET>
+		<GREEN>+<RESET>QQQ<GREEN>But! <- this stands out<RESET>
+		<BOLD;CYAN>+<RESET><BOLD;CYAN>Adjusting with<RESET>
+		<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>different starting<RESET>
+		<BOLD;CYAN>+<RESET><BOLD;CYAN>white spaces<RESET>
+		<GREEN>+<RESET><GREEN>AnotherQoutlier<RESET>
+		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>Indented<RESET>
+		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>Text across<RESET>
+		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>five lines<RESET>
+		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>that has similar lines<RESET>
+		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>to previous blocks, but with different indent<RESET>
+		<GREEN>+<RESET>QQ<GREEN>YetQAnotherQoutlier<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
+test_expect_success 'compare whitespace delta incompatible with other space options' '
+	test_must_fail git diff \
+		--color-moved-ws=allow-indentation-change,ignore-all-space \
+		2>err &&
+	grep allow-indentation-change err
+'
+
+
 test_done
-- 
2.18.0.rc2.346.g013aa6912e-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 0/8] Moved code detection: ignore space on uniform indentation
  2018-06-22  1:57 [PATCH v3 0/8] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (7 preceding siblings ...)
  2018-06-22  1:57 ` [PATCH v3 8/8] diff.c: add white space mode to move detection that allows indent changes Stefan Beller
@ 2018-06-22 22:37 ` Junio C Hamano
  2018-06-29  0:19 ` [PATCH v4 0/9] " Stefan Beller
  9 siblings, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2018-06-22 22:37 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jacob.keller, jonathantanmy, simon

This seems to break the documentation build rather badly; I have a
monkey-fix queued at the tip fo the topic branch for tonight's push.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 8/8] diff.c: add white space mode to move detection that allows indent changes
  2018-06-22  1:57 ` [PATCH v3 8/8] diff.c: add white space mode to move detection that allows indent changes Stefan Beller
@ 2018-06-23 16:52   ` SZEDER Gábor
  0 siblings, 0 replies; 25+ messages in thread
From: SZEDER Gábor @ 2018-06-23 16:52 UTC (permalink / raw)
  To: Stefan Beller; +Cc: SZEDER Gábor, git, jacob.keller, jonathantanmy, simon

> diff --git a/diff.c b/diff.c
> index 040b46545e5..9e357111864 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -302,12 +302,18 @@ static int parse_color_moved_ws(const char *arg)
>  			ret |= XDF_IGNORE_WHITESPACE_AT_EOL;
>  		else if (!strcmp(sb.buf, "ignore-all-space"))
>  			ret |= XDF_IGNORE_WHITESPACE;
> +		else if (!strcmp(sb.buf, "allow-indentation-change"))
> +			ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE;
>  		else
>  			error(_("ignoring unknown color-moved-ws mode '%s'"), sb.buf);
>  
>  		strbuf_release(&sb);
>  	}
>  
> +	if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
> +	    (ret & XDF_WHITESPACE_FLAGS))
> +		die(_("color-moved-ws: allow-indentation-change cannot be combined with other white space modes"));

Note that this is a translated error message.

> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index aad0870c8a1..13b20be591e 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -1827,4 +1827,94 @@ test_expect_success 'only move detection ignores white spaces' '
>  	test_cmp expected actual
>  '
>  
> +test_expect_success 'compare whitespace delta across moved blocks' '
> +
> +	git reset --hard &&
> +	q_to_tab <<-\EOF >text.txt &&
> +	QIndented
> +	QText across
> +	Qsome lines
> +	QBut! <- this stands out
> +	QAdjusting with
> +	QQdifferent starting
> +	Qwhite spaces
> +	QAnother outlier
> +	QQQIndented
> +	QQQText across
> +	QQQfive lines
> +	QQQthat has similar lines
> +	QQQto previous blocks, but with different indent
> +	QQQYetQAnotherQoutlierQ
> +	EOF
> +
> +	git add text.txt &&
> +	git commit -m "add text.txt" &&
> +
> +	q_to_tab <<-\EOF >text.txt &&
> +	QQIndented
> +	QQText across
> +	QQsome lines
> +	QQQBut! <- this stands out
> +	Adjusting with
> +	Qdifferent starting
> +	white spaces
> +	AnotherQoutlier
> +	QQIndented
> +	QQText across
> +	QQfive lines
> +	QQthat has similar lines
> +	QQto previous blocks, but with different indent
> +	QQYetQAnotherQoutlier
> +	EOF
> +
> +	git diff --color --color-moved --color-moved-ws=allow-indentation-change |
> +		grep -v "index" |
> +		test_decode_color >actual &&

Please use an intermediate file for the output of 'git diff', because
running it upstream of a pipe hides its exit code.

> +
> +	q_to_tab <<-\EOF >expected &&
> +		<BOLD>diff --git a/text.txt b/text.txt<RESET>
> +		<BOLD>--- a/text.txt<RESET>
> +		<BOLD>+++ b/text.txt<RESET>
> +		<CYAN>@@ -1,14 +1,14 @@<RESET>
> +		<BOLD;MAGENTA>-QIndented<RESET>
> +		<BOLD;MAGENTA>-QText across<RESET>
> +		<BOLD;MAGENTA>-Qsome lines<RESET>
> +		<RED>-QBut! <- this stands out<RESET>
> +		<BOLD;MAGENTA>-QAdjusting with<RESET>
> +		<BOLD;MAGENTA>-QQdifferent starting<RESET>
> +		<BOLD;MAGENTA>-Qwhite spaces<RESET>
> +		<RED>-QAnother outlier<RESET>
> +		<BOLD;MAGENTA>-QQQIndented<RESET>
> +		<BOLD;MAGENTA>-QQQText across<RESET>
> +		<BOLD;MAGENTA>-QQQfive lines<RESET>
> +		<BOLD;MAGENTA>-QQQthat has similar lines<RESET>
> +		<BOLD;MAGENTA>-QQQto previous blocks, but with different indent<RESET>
> +		<RED>-QQQYetQAnotherQoutlierQ<RESET>
> +		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>Indented<RESET>
> +		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>Text across<RESET>
> +		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>some lines<RESET>
> +		<GREEN>+<RESET>QQQ<GREEN>But! <- this stands out<RESET>
> +		<BOLD;CYAN>+<RESET><BOLD;CYAN>Adjusting with<RESET>
> +		<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>different starting<RESET>
> +		<BOLD;CYAN>+<RESET><BOLD;CYAN>white spaces<RESET>
> +		<GREEN>+<RESET><GREEN>AnotherQoutlier<RESET>
> +		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>Indented<RESET>
> +		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>Text across<RESET>
> +		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>five lines<RESET>
> +		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>that has similar lines<RESET>
> +		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>to previous blocks, but with different indent<RESET>
> +		<GREEN>+<RESET>QQ<GREEN>YetQAnotherQoutlier<RESET>
> +	EOF
> +
> +	test_cmp expected actual
> +'
> +
> +test_expect_success 'compare whitespace delta incompatible with other space options' '
> +	test_must_fail git diff \
> +		--color-moved-ws=allow-indentation-change,ignore-all-space \
> +		2>err &&
> +	grep allow-indentation-change err

A translated error message should be checked with 'test_i18ngrep'.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v4 0/9] Moved code detection: ignore space on uniform indentation
  2018-06-22  1:57 [PATCH v3 0/8] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (8 preceding siblings ...)
  2018-06-22 22:37 ` [PATCH v3 0/8] Moved code detection: ignore space on uniform indentation Junio C Hamano
@ 2018-06-29  0:19 ` Stefan Beller
  2018-06-29  0:19   ` [PATCH v4 1/9] xdiff/xdiff.h: remove unused flags Stefan Beller
                     ` (8 more replies)
  9 siblings, 9 replies; 25+ messages in thread
From: Stefan Beller @ 2018-06-29  0:19 UTC (permalink / raw)
  To: sbeller; +Cc: git, jacob.keller, jonathantanmy, simon, bmwill

v4:
 * see range diff below
 * brought best practices to t4015 and have git not upstream of a pipe
   (new patch 3)
 * squashed in the SQUASH patches
 * fixed the translation as well.

v3:
 This is a complete rewrite of the actual patch, with slight modifications]
 on the refactoring how to decouple the white space treatment from the
 move detection. See range-diff below.
 https://public-inbox.org/git/20180622015725.219575-1-sbeller@google.com/

v2: https://public-inbox.org/git/20180424210330.87861-1-sbeller@google.com/

v1: https://public-inbox.org/git/20180402224854.86922-1-sbeller@google.com/

Stefan Beller (9):
  xdiff/xdiff.h: remove unused flags
  xdiff/xdiffi.c: remove unneeded function declarations
  t4015: avoid git as a pipe input
  diff.c: do not pass diff options as keydata to hashmap
  diff.c: adjust hash function signature to match hashmap expectation
  diff.c: add a blocks mode for moved code detection
  diff.c: decouple white space treatment from move detection algorithm
  diff.c: factor advance_or_nullify out of mark_color_as_moved
  diff.c: add white space mode to move detection that allows indent
    changes

 Documentation/diff-options.txt |  30 +++-
 diff.c                         | 253 +++++++++++++++++++++++++++++----
 diff.h                         |   9 +-
 t/t4015-diff-whitespace.sh     | 243 ++++++++++++++++++++++++++-----
 xdiff/xdiff.h                  |   8 --
 xdiff/xdiffi.c                 |  17 ---
 6 files changed, 472 insertions(+), 88 deletions(-)

-- 
2.18.0.399.gad0ab374a1-goog

1:  ace2dc2bc11 = 1:  ace2dc2bc11 xdiff/xdiff.h: remove unused flags
2:  53b3574564e = 2:  53b3574564e xdiff/xdiffi.c: remove unneeded function declarations
-:  ----------- > 3:  9b58621e0d8 t4015: avoid git as a pipe input
3:  34850b565df = 4:  be0ea0717e1 diff.c: do not pass diff options as keydata to hashmap
4:  8148e51178f = 5:  ff7e8721afa diff.c: adjust hash function signature to match hashmap expectation
5:  9d1de6a208e ! 6:  73c2801a5e3 diff.c: add a blocks mode for moved code detection
    @@ -19,7 +19,7 @@
      	moved line, but it is not very useful in a review to determine
      	if a block of code was moved without permutation.
     -zebra::
    -+blocks:
    ++blocks::
      	Blocks of moved text of at least 20 alphanumeric characters
      	are detected greedily. The detected blocks are
     -	painted using either the 'color.diff.{old,new}Moved' color or
    @@ -95,10 +95,8 @@
      	test_config color.diff.newMovedDimmed "normal cyan" &&
      	test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
      	test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&
    -+
    -+	git diff HEAD --no-renames --color-moved=blocks --color |
    -+		grep -v "index" |
    -+		test_decode_color >actual &&
    ++	git diff HEAD --no-renames --color-moved=blocks --color >actual.raw &&
    ++	grep -v "index" actual.raw | test_decode_color >actual &&
     +	cat <<-\EOF >expected &&
     +	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
     +	<BOLD>--- a/lines.txt<RESET>
    @@ -141,6 +139,16 @@
     +	test_config color.diff.newMovedDimmed "normal cyan" &&
     +	test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
     +	test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&
    - 	git diff HEAD --no-renames --color-moved=dimmed_zebra --color |
    - 		grep -v "index" |
    - 		test_decode_color >actual &&
    + 	git diff HEAD --no-renames --color-moved=dimmed_zebra --color >actual.raw &&
    + 	grep -v "index" actual.raw | test_decode_color >actual &&
    + 	cat <<-\EOF >expected &&
    +@@
    + 	7charsA
    + 	EOF
    + 
    +-	git diff HEAD --color-moved=zebra --color --no-renames | grep -v "index" | test_decode_color >actual &&
    ++	git diff HEAD --color-moved=zebra --color --no-renames >actual.raw &&
    ++	grep -v "index" actual.raw | test_decode_color >actual &&
    + 	cat >expected <<-\EOF &&
    + 	<BOLD>diff --git a/bar b/bar<RESET>
    + 	<BOLD>--- a/bar<RESET>
6:  e37bb7b1fc8 ! 7:  87a8919c260 diff.c: decouple white space treatment from move detection algorithm
    @@ -55,6 +55,7 @@
     +ignore-all-space::
     +	Ignore whitespace when comparing lines. This ignores differences
     +	even if one line has whitespace where the other line has none.
    ++--
     +
      --word-diff[=<mode>]::
      	Show a word diff, using the <mode> to delimit changed words.
    @@ -154,32 +155,32 @@
      	EOF
      	test_cmp expected actual &&
      
    --	git diff HEAD --no-renames -w --color-moved --color |
    +-	git diff HEAD --no-renames -w --color-moved --color >actual.raw &&
     +	git diff HEAD --no-renames --color-moved --color \
    -+		--color-moved-ws=ignore-all-space |
    - 		grep -v "index" |
    - 		test_decode_color >actual &&
    ++		--color-moved-ws=ignore-all-space >actual.raw &&
    + 	grep -v "index" actual.raw | test_decode_color >actual &&
      	cat <<-\EOF >expected &&
    + 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
     @@
      	EOF
      	test_cmp expected actual &&
      
    --	git diff HEAD --no-renames -b --color-moved --color |
    +-	git diff HEAD --no-renames -b --color-moved --color >actual.raw &&
     +	git diff HEAD --no-renames --color-moved --color \
    -+		--color-moved-ws=ignore-space-change |
    - 		grep -v "index" |
    - 		test_decode_color >actual &&
    ++		--color-moved-ws=ignore-space-change >actual.raw &&
    + 	grep -v "index" actual.raw | test_decode_color >actual &&
      	cat <<-\EOF >expected &&
    + 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
     @@
      	EOF
      	test_cmp expected actual &&
      
    --	git diff HEAD --no-renames --ignore-space-at-eol --color-moved --color |
    +-	git diff HEAD --no-renames --ignore-space-at-eol --color-moved --color >actual.raw &&
     +	git diff HEAD --no-renames --color-moved --color \
    -+		--color-moved-ws=ignore-space-at-eol |
    - 		grep -v "index" |
    - 		test_decode_color >actual &&
    ++		--color-moved-ws=ignore-space-at-eol >actual.raw &&
    + 	grep -v "index" actual.raw | test_decode_color >actual &&
      	cat <<-\EOF >expected &&
    + 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
     @@
      
      	# nor did we mess with it another way
7:  a58e68b8880 = 8:  25279c42f25 diff.c: factor advance_or_nullify out of mark_color_as_moved
8:  f2d78d2c672 ! 9:  305f8b27589 diff.c: add white space mode to move detection that allows indent changes
    @@ -86,9 +86,9 @@
     +	group the moved code blocks only into a block if the change in
     +	whitespace is the same per line. This is incompatible with the
     +	other modes.
    + --
      
      --word-diff[=<mode>]::
    - 	Show a word diff, using the <mode> to delimit changed words.
     
     diff --git a/diff.c b/diff.c
     --- a/diff.c
    @@ -129,7 +129,7 @@
     + */
     +struct ws_delta {
     +	char *string;
    -+	int current_longer : 1;
    ++	unsigned int current_longer : 1;
      };
     +#define WS_DELTA_INIT { NULL, 0 }
     +
    @@ -339,6 +339,30 @@
     diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
     --- a/t/t4015-diff-whitespace.sh
     +++ b/t/t4015-diff-whitespace.sh
    +@@
    + 	EOF
    + 
    + 	# Make sure we get a different diff using -w
    +-	git diff --color --color-moved -w |
    +-		grep -v "index" |
    +-		test_decode_color >actual &&
    ++	git diff --color --color-moved -w >actual.raw &&
    ++	grep -v "index" actual.raw | test_decode_color >actual &&
    + 	q_to_tab <<-\EOF >expected &&
    + 	<BOLD>diff --git a/text.txt b/text.txt<RESET>
    + 	<BOLD>--- a/text.txt<RESET>
    +@@
    + 
    + 	# And now ignoring white space only in the move detection
    + 	git diff --color --color-moved \
    +-		--color-moved-ws=ignore-all-space,ignore-space-change,ignore-space-at-eol |
    +-		grep -v "index" |
    +-		test_decode_color >actual &&
    ++		--color-moved-ws=ignore-all-space,ignore-space-change,ignore-space-at-eol >actual.raw &&
    ++	grep -v "index" actual.raw | test_decode_color >actual &&
    + 	q_to_tab <<-\EOF >expected &&
    + 	<BOLD>diff --git a/text.txt b/text.txt<RESET>
    + 	<BOLD>--- a/text.txt<RESET>
     @@
      	test_cmp expected actual
      '
    @@ -383,9 +407,8 @@
     +	QQYetQAnotherQoutlier
     +	EOF
     +
    -+	git diff --color --color-moved --color-moved-ws=allow-indentation-change |
    -+		grep -v "index" |
    -+		test_decode_color >actual &&
    ++	git diff --color --color-moved --color-moved-ws=allow-indentation-change >actual.raw &&
    ++	grep -v "index" actual.raw | test_decode_color >actual &&
     +
     +	q_to_tab <<-\EOF >expected &&
     +		<BOLD>diff --git a/text.txt b/text.txt<RESET>
    @@ -429,8 +452,7 @@
     +	test_must_fail git diff \
     +		--color-moved-ws=allow-indentation-change,ignore-all-space \
     +		2>err &&
    -+	grep allow-indentation-change err
    ++	test_i18ngrep allow-indentation-change err
     +'
    -+
     +
      test_done
9:  76db2b1328f < -:  ----------- SQUASH????? Documentation breakage emergency fix
10:  d484589fad1 < -:  ----------- SQUASH??? t/4015 GETTEXT_POISON emergency fix
11:  fceb91fc99d < -:  ----------- diff: fix a sparse 'dubious one-bit signed bitfield' error

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v4 1/9] xdiff/xdiff.h: remove unused flags
  2018-06-29  0:19 ` [PATCH v4 0/9] " Stefan Beller
@ 2018-06-29  0:19   ` Stefan Beller
  2018-06-29  0:19   ` [PATCH v4 2/9] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Stefan Beller @ 2018-06-29  0:19 UTC (permalink / raw)
  To: sbeller; +Cc: git, jacob.keller, jonathantanmy, simon, bmwill, Junio C Hamano

These flags were there since the beginning (3443546f6e (Use a *real*
built-in diff generator, 2006-03-24), but were never used. Remove them.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 xdiff/xdiff.h | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index c1937a29112..2356da5f784 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -52,14 +52,6 @@ extern "C" {
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
 
-#define XDL_MMB_READONLY (1 << 0)
-
-#define XDL_MMF_ATOMIC (1 << 0)
-
-#define XDL_BDOP_INS 1
-#define XDL_BDOP_CPY 2
-#define XDL_BDOP_INSB 3
-
 /* merge simplification levels */
 #define XDL_MERGE_MINIMAL 0
 #define XDL_MERGE_EAGER 1
-- 
2.18.0.399.gad0ab374a1-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 2/9] xdiff/xdiffi.c: remove unneeded function declarations
  2018-06-29  0:19 ` [PATCH v4 0/9] " Stefan Beller
  2018-06-29  0:19   ` [PATCH v4 1/9] xdiff/xdiff.h: remove unused flags Stefan Beller
@ 2018-06-29  0:19   ` Stefan Beller
  2018-06-29  0:19   ` [PATCH v4 3/9] t4015: avoid git as a pipe input Stefan Beller
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Stefan Beller @ 2018-06-29  0:19 UTC (permalink / raw)
  To: sbeller; +Cc: git, jacob.keller, jonathantanmy, simon, bmwill, Junio C Hamano

There is no need to forward-declare these functions, as they are used
after their implementation only.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 xdiff/xdiffi.c | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 0de1ef463bf..3e8aff92bc4 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -22,34 +22,17 @@
 
 #include "xinclude.h"
 
-
-
 #define XDL_MAX_COST_MIN 256
 #define XDL_HEUR_MIN_COST 256
 #define XDL_LINE_MAX (long)((1UL << (CHAR_BIT * sizeof(long) - 1)) - 1)
 #define XDL_SNAKE_CNT 20
 #define XDL_K_HEUR 4
 
-
-
 typedef struct s_xdpsplit {
 	long i1, i2;
 	int min_lo, min_hi;
 } xdpsplit_t;
 
-
-
-
-static long xdl_split(unsigned long const *ha1, long off1, long lim1,
-		      unsigned long const *ha2, long off2, long lim2,
-		      long *kvdf, long *kvdb, int need_min, xdpsplit_t *spl,
-		      xdalgoenv_t *xenv);
-static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1, long chg2);
-
-
-
-
-
 /*
  * See "An O(ND) Difference Algorithm and its Variations", by Eugene Myers.
  * Basically considers a "box" (off1, off2, lim1, lim2) and scan from both
-- 
2.18.0.399.gad0ab374a1-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 3/9] t4015: avoid git as a pipe input
  2018-06-29  0:19 ` [PATCH v4 0/9] " Stefan Beller
  2018-06-29  0:19   ` [PATCH v4 1/9] xdiff/xdiff.h: remove unused flags Stefan Beller
  2018-06-29  0:19   ` [PATCH v4 2/9] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
@ 2018-06-29  0:19   ` Stefan Beller
  2018-06-29  0:19   ` [PATCH v4 4/9] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Stefan Beller @ 2018-06-29  0:19 UTC (permalink / raw)
  To: sbeller; +Cc: git, jacob.keller, jonathantanmy, simon, bmwill

In t4015 we have a pattern of

    git diff [<options, related to color>] |
        grep -v "index" |
        test_decode_color >actual &&

to produce output that we want to test against. This pattern was introduced
in 86b452e2769 (diff.c: add dimming to moved line detection, 2017-06-30)
as then the focus on getting the colors right. However the pattern used
is not best practice as we do care about the exit code of Git. So let's
not have Git as the upstream of a pipe. Piping the output of grep to
some function is fine as we assume grep to be un-flawed in our test suite.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 t/t4015-diff-whitespace.sh | 50 +++++++++++++++-----------------------
 1 file changed, 20 insertions(+), 30 deletions(-)

diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 17df491a3ab..ddbc3901385 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1271,9 +1271,8 @@ test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
 	test_config color.diff.newMovedDimmed "normal cyan" &&
 	test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
 	test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&
-	git diff HEAD --no-renames --color-moved=dimmed_zebra --color |
-		grep -v "index" |
-		test_decode_color >actual &&
+	git diff HEAD --no-renames --color-moved=dimmed_zebra --color >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
 	cat <<-\EOF >expected &&
 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
 	<BOLD>--- a/lines.txt<RESET>
@@ -1315,9 +1314,8 @@ test_expect_success 'cmd option assumes configured colored-moved' '
 	test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
 	test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&
 	test_config diff.colorMoved zebra &&
-	git diff HEAD --no-renames --color-moved --color |
-		grep -v "index" |
-		test_decode_color >actual &&
+	git diff HEAD --no-renames --color-moved --color >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
 	cat <<-\EOF >expected &&
 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
 	<BOLD>--- a/lines.txt<RESET>
@@ -1395,9 +1393,8 @@ test_expect_success 'move detection ignoring whitespace ' '
 	line 4
 	line 5
 	EOF
-	git diff HEAD --no-renames --color-moved --color |
-		grep -v "index" |
-		test_decode_color >actual &&
+	git diff HEAD --no-renames --color-moved --color >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
 	cat <<-\EOF >expected &&
 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
 	<BOLD>--- a/lines.txt<RESET>
@@ -1419,9 +1416,8 @@ test_expect_success 'move detection ignoring whitespace ' '
 	EOF
 	test_cmp expected actual &&
 
-	git diff HEAD --no-renames -w --color-moved --color |
-		grep -v "index" |
-		test_decode_color >actual &&
+	git diff HEAD --no-renames -w --color-moved --color >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
 	cat <<-\EOF >expected &&
 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
 	<BOLD>--- a/lines.txt<RESET>
@@ -1459,9 +1455,8 @@ test_expect_success 'move detection ignoring whitespace changes' '
 	line 5
 	EOF
 
-	git diff HEAD --no-renames --color-moved --color |
-		grep -v "index" |
-		test_decode_color >actual &&
+	git diff HEAD --no-renames --color-moved --color >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
 	cat <<-\EOF >expected &&
 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
 	<BOLD>--- a/lines.txt<RESET>
@@ -1483,9 +1478,8 @@ test_expect_success 'move detection ignoring whitespace changes' '
 	EOF
 	test_cmp expected actual &&
 
-	git diff HEAD --no-renames -b --color-moved --color |
-		grep -v "index" |
-		test_decode_color >actual &&
+	git diff HEAD --no-renames -b --color-moved --color >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
 	cat <<-\EOF >expected &&
 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
 	<BOLD>--- a/lines.txt<RESET>
@@ -1526,9 +1520,8 @@ test_expect_success 'move detection ignoring whitespace at eol' '
 	# avoid cluttering the output with complaints about our eol whitespace
 	test_config core.whitespace -blank-at-eol &&
 
-	git diff HEAD --no-renames --color-moved --color |
-		grep -v "index" |
-		test_decode_color >actual &&
+	git diff HEAD --no-renames --color-moved --color >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
 	cat <<-\EOF >expected &&
 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
 	<BOLD>--- a/lines.txt<RESET>
@@ -1550,9 +1543,8 @@ test_expect_success 'move detection ignoring whitespace at eol' '
 	EOF
 	test_cmp expected actual &&
 
-	git diff HEAD --no-renames --ignore-space-at-eol --color-moved --color |
-		grep -v "index" |
-		test_decode_color >actual &&
+	git diff HEAD --no-renames --ignore-space-at-eol --color-moved --color >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
 	cat <<-\EOF >expected &&
 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
 	<BOLD>--- a/lines.txt<RESET>
@@ -1597,9 +1589,8 @@ test_expect_success '--color-moved block at end of diff output respects MIN_ALNU
 	irrelevant_line
 	EOF
 
-	git diff HEAD --color-moved=zebra --color --no-renames |
-		grep -v "index" |
-		test_decode_color >actual &&
+	git diff HEAD --color-moved=zebra --color --no-renames >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
 	cat >expected <<-\EOF &&
 	<BOLD>diff --git a/bar b/bar<RESET>
 	<BOLD>--- a/bar<RESET>
@@ -1636,9 +1627,8 @@ test_expect_success '--color-moved respects MIN_ALNUM_COUNT' '
 	nineteen chars 456789
 	EOF
 
-	git diff HEAD --color-moved=zebra --color --no-renames |
-		grep -v "index" |
-		test_decode_color >actual &&
+	git diff HEAD --color-moved=zebra --color --no-renames >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
 	cat >expected <<-\EOF &&
 	<BOLD>diff --git a/bar b/bar<RESET>
 	<BOLD>--- a/bar<RESET>
-- 
2.18.0.399.gad0ab374a1-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 4/9] diff.c: do not pass diff options as keydata to hashmap
  2018-06-29  0:19 ` [PATCH v4 0/9] " Stefan Beller
                     ` (2 preceding siblings ...)
  2018-06-29  0:19   ` [PATCH v4 3/9] t4015: avoid git as a pipe input Stefan Beller
@ 2018-06-29  0:19   ` Stefan Beller
  2018-06-29  0:19   ` [PATCH v4 5/9] diff.c: adjust hash function signature to match hashmap expectation Stefan Beller
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Stefan Beller @ 2018-06-29  0:19 UTC (permalink / raw)
  To: sbeller; +Cc: git, jacob.keller, jonathantanmy, simon, bmwill, Junio C Hamano

When we initialize the hashmap, we give it a pointer to the
diff_options, which it then passes along to each call of the
hashmap_cmp_fn function. There's no need to pass it a second time as
the "keydata" parameter, and our comparison functions never look at
keydata.

This was a mistake left over from an earlier round of 2e2d5ac184
(diff.c: color moved lines differently, 2017-06-30), before hashmap
learned to pass the data pointer for us.

Explanation-by: Jeff King <peff@peff.net>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 1289df4b1f9..ce7bedc1b92 100644
--- a/diff.c
+++ b/diff.c
@@ -842,13 +842,13 @@ static void mark_color_as_moved(struct diff_options *o,
 		case DIFF_SYMBOL_PLUS:
 			hm = del_lines;
 			key = prepare_entry(o, n);
-			match = hashmap_get(hm, key, o);
+			match = hashmap_get(hm, key, NULL);
 			free(key);
 			break;
 		case DIFF_SYMBOL_MINUS:
 			hm = add_lines;
 			key = prepare_entry(o, n);
-			match = hashmap_get(hm, key, o);
+			match = hashmap_get(hm, key, NULL);
 			free(key);
 			break;
 		default:
-- 
2.18.0.399.gad0ab374a1-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 5/9] diff.c: adjust hash function signature to match hashmap expectation
  2018-06-29  0:19 ` [PATCH v4 0/9] " Stefan Beller
                     ` (3 preceding siblings ...)
  2018-06-29  0:19   ` [PATCH v4 4/9] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
@ 2018-06-29  0:19   ` Stefan Beller
  2018-06-29  0:19   ` [PATCH v4 6/9] diff.c: add a blocks mode for moved code detection Stefan Beller
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 25+ messages in thread
From: Stefan Beller @ 2018-06-29  0:19 UTC (permalink / raw)
  To: sbeller; +Cc: git, jacob.keller, jonathantanmy, simon, bmwill, Junio C Hamano

This makes the follow up patch easier.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index ce7bedc1b92..d1bae900cdc 100644
--- a/diff.c
+++ b/diff.c
@@ -707,11 +707,15 @@ struct moved_entry {
 	struct moved_entry *next_line;
 };
 
-static int moved_entry_cmp(const struct diff_options *diffopt,
-			   const struct moved_entry *a,
-			   const struct moved_entry *b,
+static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
+			   const void *entry,
+			   const void *entry_or_key,
 			   const void *keydata)
 {
+	const struct diff_options *diffopt = hashmap_cmp_fn_data;
+	const struct moved_entry *a = entry;
+	const struct moved_entry *b = entry_or_key;
+
 	return !xdiff_compare_lines(a->es->line, a->es->len,
 				    b->es->line, b->es->len,
 				    diffopt->xdl_opts);
@@ -5534,10 +5538,8 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 		if (o->color_moved) {
 			struct hashmap add_lines, del_lines;
 
-			hashmap_init(&del_lines,
-				     (hashmap_cmp_fn)moved_entry_cmp, o, 0);
-			hashmap_init(&add_lines,
-				     (hashmap_cmp_fn)moved_entry_cmp, o, 0);
+			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
+			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
 
 			add_lines_to_move_detection(o, &add_lines, &del_lines);
 			mark_color_as_moved(o, &add_lines, &del_lines);
-- 
2.18.0.399.gad0ab374a1-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 6/9] diff.c: add a blocks mode for moved code detection
  2018-06-29  0:19 ` [PATCH v4 0/9] " Stefan Beller
                     ` (4 preceding siblings ...)
  2018-06-29  0:19   ` [PATCH v4 5/9] diff.c: adjust hash function signature to match hashmap expectation Stefan Beller
@ 2018-06-29  0:19   ` Stefan Beller
  2018-07-02 17:18     ` Brandon Williams
  2018-06-29  0:19   ` [PATCH v4 7/9] diff.c: decouple white space treatment from move detection algorithm Stefan Beller
                     ` (2 subsequent siblings)
  8 siblings, 1 reply; 25+ messages in thread
From: Stefan Beller @ 2018-06-29  0:19 UTC (permalink / raw)
  To: sbeller; +Cc: git, jacob.keller, jonathantanmy, simon, bmwill, Junio C Hamano

The new "blocks" mode provides a middle ground between plain and zebra.
It is as intuitive (few colors) as plain, but still has the requirement
for a minimum of lines/characters to count a block as moved.

Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
 (https://public-inbox.org/git/87o9j0uljo.fsf@evledraar.gmail.com/)
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/diff-options.txt |  8 ++++--
 diff.c                         |  6 +++--
 diff.h                         |  5 ++--
 t/t4015-diff-whitespace.sh     | 49 ++++++++++++++++++++++++++++++++--
 4 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index e3a44f03cdc..ba56169de31 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -276,10 +276,14 @@ plain::
 	that are added somewhere else in the diff. This mode picks up any
 	moved line, but it is not very useful in a review to determine
 	if a block of code was moved without permutation.
-zebra::
+blocks::
 	Blocks of moved text of at least 20 alphanumeric characters
 	are detected greedily. The detected blocks are
-	painted using either the 'color.diff.{old,new}Moved' color or
+	painted using either the 'color.diff.{old,new}Moved' color.
+	Adjacent blocks cannot be told apart.
+zebra::
+	Blocks of moved text are detected as in 'blocks' mode. The blocks
+	are painted using either the 'color.diff.{old,new}Moved' color or
 	'color.diff.{old,new}MovedAlternative'. The change between
 	the two colors indicates that a new block was detected.
 dimmed_zebra::
diff --git a/diff.c b/diff.c
index d1bae900cdc..95c51c0b7df 100644
--- a/diff.c
+++ b/diff.c
@@ -271,6 +271,8 @@ static int parse_color_moved(const char *arg)
 		return COLOR_MOVED_NO;
 	else if (!strcmp(arg, "plain"))
 		return COLOR_MOVED_PLAIN;
+	else if (!strcmp(arg, "blocks"))
+		return COLOR_MOVED_BLOCKS;
 	else if (!strcmp(arg, "zebra"))
 		return COLOR_MOVED_ZEBRA;
 	else if (!strcmp(arg, "default"))
@@ -278,7 +280,7 @@ static int parse_color_moved(const char *arg)
 	else if (!strcmp(arg, "dimmed_zebra"))
 		return COLOR_MOVED_ZEBRA_DIM;
 	else
-		return error(_("color moved setting must be one of 'no', 'default', 'zebra', 'dimmed_zebra', 'plain'"));
+		return error(_("color moved setting must be one of 'no', 'default', 'blocks', 'zebra', 'dimmed_zebra', 'plain'"));
 }
 
 int git_diff_ui_config(const char *var, const char *value, void *cb)
@@ -903,7 +905,7 @@ static void mark_color_as_moved(struct diff_options *o,
 
 		block_length++;
 
-		if (flipped_block)
+		if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
 			l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 	}
 	adjust_last_block(o, n, block_length);
diff --git a/diff.h b/diff.h
index d29560f822c..7bd4f182c33 100644
--- a/diff.h
+++ b/diff.h
@@ -208,8 +208,9 @@ struct diff_options {
 	enum {
 		COLOR_MOVED_NO = 0,
 		COLOR_MOVED_PLAIN = 1,
-		COLOR_MOVED_ZEBRA = 2,
-		COLOR_MOVED_ZEBRA_DIM = 3,
+		COLOR_MOVED_BLOCKS = 2,
+		COLOR_MOVED_ZEBRA = 3,
+		COLOR_MOVED_ZEBRA_DIM = 4,
 	} color_moved;
 	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
 	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index ddbc3901385..e54529f026d 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1223,7 +1223,7 @@ test_expect_success 'plain moved code, inside file' '
 	test_cmp expected actual
 '
 
-test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
+test_expect_success 'detect blocks of moved code' '
 	git reset --hard &&
 	cat <<-\EOF >lines.txt &&
 		long line 1
@@ -1271,6 +1271,50 @@ test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
 	test_config color.diff.newMovedDimmed "normal cyan" &&
 	test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
 	test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&
+	git diff HEAD --no-renames --color-moved=blocks --color >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
+	<BOLD>--- a/lines.txt<RESET>
+	<BOLD>+++ b/lines.txt<RESET>
+	<CYAN>@@ -1,16 +1,16 @@<RESET>
+	<MAGENTA>-long line 1<RESET>
+	<MAGENTA>-long line 2<RESET>
+	<MAGENTA>-long line 3<RESET>
+	 line 4<RESET>
+	 line 5<RESET>
+	 line 6<RESET>
+	 line 7<RESET>
+	 line 8<RESET>
+	 line 9<RESET>
+	<CYAN>+<RESET><CYAN>long line 1<RESET>
+	<CYAN>+<RESET><CYAN>long line 2<RESET>
+	<CYAN>+<RESET><CYAN>long line 3<RESET>
+	<CYAN>+<RESET><CYAN>long line 14<RESET>
+	<CYAN>+<RESET><CYAN>long line 15<RESET>
+	<CYAN>+<RESET><CYAN>long line 16<RESET>
+	 line 10<RESET>
+	 line 11<RESET>
+	 line 12<RESET>
+	 line 13<RESET>
+	<MAGENTA>-long line 14<RESET>
+	<MAGENTA>-long line 15<RESET>
+	<MAGENTA>-long line 16<RESET>
+	EOF
+	test_cmp expected actual
+
+'
+
+test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
+	# reuse setup from test before!
+	test_config color.diff.oldMoved "magenta" &&
+	test_config color.diff.newMoved "cyan" &&
+	test_config color.diff.oldMovedAlternative "blue" &&
+	test_config color.diff.newMovedAlternative "yellow" &&
+	test_config color.diff.oldMovedDimmed "normal magenta" &&
+	test_config color.diff.newMovedDimmed "normal cyan" &&
+	test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
+	test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&
 	git diff HEAD --no-renames --color-moved=dimmed_zebra --color >actual.raw &&
 	grep -v "index" actual.raw | test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1669,7 +1713,8 @@ test_expect_success '--color-moved treats adjacent blocks as separate for MIN_AL
 	7charsA
 	EOF
 
-	git diff HEAD --color-moved=zebra --color --no-renames | grep -v "index" | test_decode_color >actual &&
+	git diff HEAD --color-moved=zebra --color --no-renames >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
 	cat >expected <<-\EOF &&
 	<BOLD>diff --git a/bar b/bar<RESET>
 	<BOLD>--- a/bar<RESET>
-- 
2.18.0.399.gad0ab374a1-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 7/9] diff.c: decouple white space treatment from move detection algorithm
  2018-06-29  0:19 ` [PATCH v4 0/9] " Stefan Beller
                     ` (5 preceding siblings ...)
  2018-06-29  0:19   ` [PATCH v4 6/9] diff.c: add a blocks mode for moved code detection Stefan Beller
@ 2018-06-29  0:19   ` Stefan Beller
  2018-07-02 17:22     ` Brandon Williams
  2018-06-29  0:19   ` [PATCH v4 8/9] diff.c: factor advance_or_nullify out of mark_color_as_moved Stefan Beller
  2018-06-29  0:19   ` [PATCH v4 9/9] diff.c: add white space mode to move detection that allows indent changes Stefan Beller
  8 siblings, 1 reply; 25+ messages in thread
From: Stefan Beller @ 2018-06-29  0:19 UTC (permalink / raw)
  To: sbeller; +Cc: git, jacob.keller, jonathantanmy, simon, bmwill, Junio C Hamano

In the original implementation of the move detection logic the choice for
ignoring white space changes is the same for the move detection as it is
for the regular diff.  Some cases came up where different treatment would
have been nice.

Allow the user to specify that white space should be ignored differently
during detection of moved lines than during generation of added and removed
lines. This is done by providing analogs to the --ignore-space-at-eol,
-b, and -w options by introducing the option --color-moved-ws=<modes>
with the modes named "ignore-space-at-eol", "ignore-space-change" and
"ignore-all-space", which is used only during the move detection phase.

As we change the default, we'll adjust the tests.

For now we do not infer any options to treat white spaces in the move
detection from the generic white space options given to diff.
This can be tuned later to reasonable default.

As we plan on adding more white space related options in a later patch,
that interferes with the current white space options, use a flag field
and clamp it down to  XDF_WHITESPACE_FLAGS, as that (a) allows to easily
check at parse time if we give invalid combinations and (b) can reuse
parts of this patch.

By having the white space treatment in its own option, we'll also
make it easier for a later patch to have an config option for
spaces in the move detection.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/diff-options.txt | 17 +++++++++
 diff.c                         | 39 +++++++++++++++++++--
 diff.h                         |  1 +
 t/t4015-diff-whitespace.sh     | 64 +++++++++++++++++++++++++++++++---
 4 files changed, 115 insertions(+), 6 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index ba56169de31..80e29e39854 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -292,6 +292,23 @@ dimmed_zebra::
 	blocks are considered interesting, the rest is uninteresting.
 --
 
+--color-moved-ws=<modes>::
+	This configures how white spaces are ignored when performing the
+	move detection for `--color-moved`. These modes can be given
+	as a comma separated list:
++
+--
+ignore-space-at-eol::
+	Ignore changes in whitespace at EOL.
+ignore-space-change::
+	Ignore changes in amount of whitespace.  This ignores whitespace
+	at line end, and considers all other sequences of one or
+	more whitespace characters to be equivalent.
+ignore-all-space::
+	Ignore whitespace when comparing lines. This ignores differences
+	even if one line has whitespace where the other line has none.
+--
+
 --word-diff[=<mode>]::
 	Show a word diff, using the <mode> to delimit changed words.
 	By default, words are delimited by whitespace; see
diff --git a/diff.c b/diff.c
index 95c51c0b7df..70eeb40c5fd 100644
--- a/diff.c
+++ b/diff.c
@@ -283,6 +283,36 @@ static int parse_color_moved(const char *arg)
 		return error(_("color moved setting must be one of 'no', 'default', 'blocks', 'zebra', 'dimmed_zebra', 'plain'"));
 }
 
+static int parse_color_moved_ws(const char *arg)
+{
+	int ret = 0;
+	struct string_list l = STRING_LIST_INIT_DUP;
+	struct string_list_item *i;
+
+	string_list_split(&l, arg, ',', -1);
+
+	for_each_string_list_item(i, &l) {
+		struct strbuf sb = STRBUF_INIT;
+		strbuf_addstr(&sb, i->string);
+		strbuf_trim(&sb);
+
+		if (!strcmp(sb.buf, "ignore-space-change"))
+			ret |= XDF_IGNORE_WHITESPACE_CHANGE;
+		else if (!strcmp(sb.buf, "ignore-space-at-eol"))
+			ret |= XDF_IGNORE_WHITESPACE_AT_EOL;
+		else if (!strcmp(sb.buf, "ignore-all-space"))
+			ret |= XDF_IGNORE_WHITESPACE;
+		else
+			error(_("ignoring unknown color-moved-ws mode '%s'"), sb.buf);
+
+		strbuf_release(&sb);
+	}
+
+	string_list_clear(&l, 0);
+
+	return ret;
+}
+
 int git_diff_ui_config(const char *var, const char *value, void *cb)
 {
 	if (!strcmp(var, "diff.color") || !strcmp(var, "color.diff")) {
@@ -717,10 +747,12 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 	const struct diff_options *diffopt = hashmap_cmp_fn_data;
 	const struct moved_entry *a = entry;
 	const struct moved_entry *b = entry_or_key;
+	unsigned flags = diffopt->color_moved_ws_handling
+			 & XDF_WHITESPACE_FLAGS;
 
 	return !xdiff_compare_lines(a->es->line, a->es->len,
 				    b->es->line, b->es->len,
-				    diffopt->xdl_opts);
+				    flags);
 }
 
 static struct moved_entry *prepare_entry(struct diff_options *o,
@@ -728,8 +760,9 @@ static struct moved_entry *prepare_entry(struct diff_options *o,
 {
 	struct moved_entry *ret = xmalloc(sizeof(*ret));
 	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
+	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
-	ret->ent.hash = xdiff_hash_string(l->line, l->len, o->xdl_opts);
+	ret->ent.hash = xdiff_hash_string(l->line, l->len, flags);
 	ret->es = l;
 	ret->next_line = NULL;
 
@@ -4710,6 +4743,8 @@ int diff_opt_parse(struct diff_options *options,
 		if (cm < 0)
 			die("bad --color-moved argument: %s", arg);
 		options->color_moved = cm;
+	} else if (skip_prefix(arg, "--color-moved-ws=", &arg)) {
+		options->color_moved_ws_handling = parse_color_moved_ws(arg);
 	} else if (skip_to_optional_arg_default(arg, "--color-words", &options->word_regex, NULL)) {
 		options->use_color = 1;
 		options->word_diff = DIFF_WORDS_COLOR;
diff --git a/diff.h b/diff.h
index 7bd4f182c33..de5dc680051 100644
--- a/diff.h
+++ b/diff.h
@@ -214,6 +214,7 @@ struct diff_options {
 	} color_moved;
 	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
 	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
+	int color_moved_ws_handling;
 };
 
 void diff_emit_submodule_del(struct diff_options *o, const char *line);
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index e54529f026d..0c737a47cf8 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1460,7 +1460,8 @@ test_expect_success 'move detection ignoring whitespace ' '
 	EOF
 	test_cmp expected actual &&
 
-	git diff HEAD --no-renames -w --color-moved --color >actual.raw &&
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-ws=ignore-all-space >actual.raw &&
 	grep -v "index" actual.raw | test_decode_color >actual &&
 	cat <<-\EOF >expected &&
 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
@@ -1522,7 +1523,8 @@ test_expect_success 'move detection ignoring whitespace changes' '
 	EOF
 	test_cmp expected actual &&
 
-	git diff HEAD --no-renames -b --color-moved --color >actual.raw &&
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-ws=ignore-space-change >actual.raw &&
 	grep -v "index" actual.raw | test_decode_color >actual &&
 	cat <<-\EOF >expected &&
 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
@@ -1587,7 +1589,8 @@ test_expect_success 'move detection ignoring whitespace at eol' '
 	EOF
 	test_cmp expected actual &&
 
-	git diff HEAD --no-renames --ignore-space-at-eol --color-moved --color >actual.raw &&
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-ws=ignore-space-at-eol >actual.raw &&
 	grep -v "index" actual.raw | test_decode_color >actual &&
 	cat <<-\EOF >expected &&
 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
@@ -1757,7 +1760,60 @@ test_expect_success 'move detection with submodules' '
 
 	# nor did we mess with it another way
 	git diff --submodule=diff --color | test_decode_color >expect &&
-	test_cmp expect decoded_actual
+	test_cmp expect decoded_actual &&
+	rm -rf bananas &&
+	git submodule deinit bananas
+'
+
+test_expect_success 'only move detection ignores white spaces' '
+	git reset --hard &&
+	q_to_tab <<-\EOF >text.txt &&
+		a long line to exceed per-line minimum
+		another long line to exceed per-line minimum
+		original file
+	EOF
+	git add text.txt &&
+	git commit -m "add text" &&
+	q_to_tab <<-\EOF >text.txt &&
+		Qa long line to exceed per-line minimum
+		Qanother long line to exceed per-line minimum
+		new file
+	EOF
+
+	# Make sure we get a different diff using -w
+	git diff --color --color-moved -w |
+		grep -v "index" |
+		test_decode_color >actual &&
+	q_to_tab <<-\EOF >expected &&
+	<BOLD>diff --git a/text.txt b/text.txt<RESET>
+	<BOLD>--- a/text.txt<RESET>
+	<BOLD>+++ b/text.txt<RESET>
+	<CYAN>@@ -1,3 +1,3 @@<RESET>
+	 Qa long line to exceed per-line minimum<RESET>
+	 Qanother long line to exceed per-line minimum<RESET>
+	<RED>-original file<RESET>
+	<GREEN>+<RESET><GREEN>new file<RESET>
+	EOF
+	test_cmp expected actual &&
+
+	# And now ignoring white space only in the move detection
+	git diff --color --color-moved \
+		--color-moved-ws=ignore-all-space,ignore-space-change,ignore-space-at-eol |
+		grep -v "index" |
+		test_decode_color >actual &&
+	q_to_tab <<-\EOF >expected &&
+	<BOLD>diff --git a/text.txt b/text.txt<RESET>
+	<BOLD>--- a/text.txt<RESET>
+	<BOLD>+++ b/text.txt<RESET>
+	<CYAN>@@ -1,3 +1,3 @@<RESET>
+	<BOLD;MAGENTA>-a long line to exceed per-line minimum<RESET>
+	<BOLD;MAGENTA>-another long line to exceed per-line minimum<RESET>
+	<RED>-original file<RESET>
+	<BOLD;YELLOW>+<RESET>Q<BOLD;YELLOW>a long line to exceed per-line minimum<RESET>
+	<BOLD;YELLOW>+<RESET>Q<BOLD;YELLOW>another long line to exceed per-line minimum<RESET>
+	<GREEN>+<RESET><GREEN>new file<RESET>
+	EOF
+	test_cmp expected actual
 '
 
 test_done
-- 
2.18.0.399.gad0ab374a1-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 8/9] diff.c: factor advance_or_nullify out of mark_color_as_moved
  2018-06-29  0:19 ` [PATCH v4 0/9] " Stefan Beller
                     ` (6 preceding siblings ...)
  2018-06-29  0:19   ` [PATCH v4 7/9] diff.c: decouple white space treatment from move detection algorithm Stefan Beller
@ 2018-06-29  0:19   ` Stefan Beller
  2018-06-29  0:19   ` [PATCH v4 9/9] diff.c: add white space mode to move detection that allows indent changes Stefan Beller
  8 siblings, 0 replies; 25+ messages in thread
From: Stefan Beller @ 2018-06-29  0:19 UTC (permalink / raw)
  To: sbeller; +Cc: git, jacob.keller, jonathantanmy, simon, bmwill, Junio C Hamano

This moves the part of code that checks if we're still in a block
into its own function.  We'll need a different approach on advancing
the blocks in a later patch, so having it as a separate function will
prove useful.

While at it rename the variable `p` to `prev` to indicate that it refers
to the previous line. This is as pmb[i] was assigned in the last iteration
of the outmost for loop.

Further rename `pnext` to `cur` to indicate that this should match up with
the current line of the outmost for loop.

Also replace the advancement of pmb[i] to reuse `cur` instead of
using `p->next` (which is how the name for pnext could be explained.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff.c | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/diff.c b/diff.c
index 70eeb40c5fd..4963819e530 100644
--- a/diff.c
+++ b/diff.c
@@ -801,6 +801,25 @@ static void add_lines_to_move_detection(struct diff_options *o,
 	}
 }
 
+static void pmb_advance_or_null(struct diff_options *o,
+				struct moved_entry *match,
+				struct hashmap *hm,
+				struct moved_entry **pmb,
+				int pmb_nr)
+{
+	int i;
+	for (i = 0; i < pmb_nr; i++) {
+		struct moved_entry *prev = pmb[i];
+		struct moved_entry *cur = (prev && prev->next_line) ?
+				prev->next_line : NULL;
+		if (cur && !hm->cmpfn(o, cur, match, NULL)) {
+			pmb[i] = cur;
+		} else {
+			pmb[i] = NULL;
+		}
+	}
+}
+
 static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 					 int pmb_nr)
 {
@@ -875,7 +894,6 @@ static void mark_color_as_moved(struct diff_options *o,
 		struct moved_entry *key;
 		struct moved_entry *match = NULL;
 		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-		int i;
 
 		switch (l->s) {
 		case DIFF_SYMBOL_PLUS:
@@ -906,17 +924,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		if (o->color_moved == COLOR_MOVED_PLAIN)
 			continue;
 
-		/* Check any potential block runs, advance each or nullify */
-		for (i = 0; i < pmb_nr; i++) {
-			struct moved_entry *p = pmb[i];
-			struct moved_entry *pnext = (p && p->next_line) ?
-					p->next_line : NULL;
-			if (pnext && !hm->cmpfn(o, pnext, match, NULL)) {
-				pmb[i] = p->next_line;
-			} else {
-				pmb[i] = NULL;
-			}
-		}
+		pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
 
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
-- 
2.18.0.399.gad0ab374a1-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v4 9/9] diff.c: add white space mode to move detection that allows indent changes
  2018-06-29  0:19 ` [PATCH v4 0/9] " Stefan Beller
                     ` (7 preceding siblings ...)
  2018-06-29  0:19   ` [PATCH v4 8/9] diff.c: factor advance_or_nullify out of mark_color_as_moved Stefan Beller
@ 2018-06-29  0:19   ` Stefan Beller
  2018-07-02 17:36     ` Brandon Williams
  8 siblings, 1 reply; 25+ messages in thread
From: Stefan Beller @ 2018-06-29  0:19 UTC (permalink / raw)
  To: sbeller; +Cc: git, jacob.keller, jonathantanmy, simon, bmwill, Junio C Hamano

The option of --color-moved has proven to be useful as observed on the
mailing list. However when refactoring sometimes the indentation changes,
for example when partitioning a functions into smaller helper functions
the code usually mostly moved around except for a decrease in indentation.

To just review the moved code ignoring the change in indentation, a mode
to ignore spaces in the move detection as implemented in a previous patch
would be enough.  However the whole move coloring as motivated in commit
2e2d5ac (diff.c: color moved lines differently, 2017-06-30), brought
up the notion of the reviewer being able to trust the move of a "block".

As there are languages such as python, which depend on proper relative
indentation for the control flow of the program, ignoring any white space
change in a block would not uphold the promises of 2e2d5ac that allows
reviewers to pay less attention to the inside of a block, as inside
the reviewer wants to assume the same program flow.

This new mode of white space ignorance will take this into account and will
only allow the same white space changes per line in each block. This patch
even allows only for the same change at the beginning of the lines.

As this is a white space mode, it is made exclusive to other white space
modes in the move detection.

This patch brings some challenges, related to the detection of blocks.
We need a white net the catch the possible moved lines, but then need to
narrow down to check if the blocks are still in tact. Consider this
example (ignoring block sizes):

 - A
 - B
 - C
 +    A
 +    B
 +    C

At the beginning of a block when checking if there is a counterpart
for A, we have to ignore all space changes. However at the following
lines we have to check if the indent change stayed the same.

Checking if the indentation change did stay the same, is done by computing
the indentation change by the difference in line length, and then assume
the change is only in the beginning of the longer line, the common tail
is the same. That is why the test contains lines like:

 - <TAB> A
 ...
 + A <TAB>
 ...

As the first line starting a block is caught using a compare function that
ignores white spaces unlike the rest of the block, where the white space
delta is taken into account for the comparison, we also have to think about
the following situation:

 - A
 - B
 -   A
 -   B
 +    A
 +    B
 +      A
 +      B

When checking if the first A (both in the + and - lines) is a start of
a block, we have to check all 'A' and record all the white space deltas
such that we can find the example above to be just one block that is
indented.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/diff-options.txt |   5 ++
 diff.c                         | 158 ++++++++++++++++++++++++++++++++-
 diff.h                         |   3 +
 t/t4015-diff-whitespace.sh     |  98 ++++++++++++++++++--
 4 files changed, 256 insertions(+), 8 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 80e29e39854..143acd9417e 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -307,6 +307,11 @@ ignore-space-change::
 ignore-all-space::
 	Ignore whitespace when comparing lines. This ignores differences
 	even if one line has whitespace where the other line has none.
+allow-indentation-change::
+	Initially ignore any white spaces in the move detection, then
+	group the moved code blocks only into a block if the change in
+	whitespace is the same per line. This is incompatible with the
+	other modes.
 --
 
 --word-diff[=<mode>]::
diff --git a/diff.c b/diff.c
index 4963819e530..f51f0ac32f4 100644
--- a/diff.c
+++ b/diff.c
@@ -302,12 +302,18 @@ static int parse_color_moved_ws(const char *arg)
 			ret |= XDF_IGNORE_WHITESPACE_AT_EOL;
 		else if (!strcmp(sb.buf, "ignore-all-space"))
 			ret |= XDF_IGNORE_WHITESPACE;
+		else if (!strcmp(sb.buf, "allow-indentation-change"))
+			ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE;
 		else
 			error(_("ignoring unknown color-moved-ws mode '%s'"), sb.buf);
 
 		strbuf_release(&sb);
 	}
 
+	if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
+	    (ret & XDF_WHITESPACE_FLAGS))
+		die(_("color-moved-ws: allow-indentation-change cannot be combined with other white space modes"));
+
 	string_list_clear(&l, 0);
 
 	return ret;
@@ -737,7 +743,91 @@ struct moved_entry {
 	struct hashmap_entry ent;
 	const struct emitted_diff_symbol *es;
 	struct moved_entry *next_line;
+	struct ws_delta *wsd;
+};
+
+/**
+ * The struct ws_delta holds white space differences between moved lines, i.e.
+ * between '+' and '-' lines that have been detected to be a move.
+ * The string contains the difference in leading white spaces, before the
+ * rest of the line is compared using the white space config for move
+ * coloring. The current_longer indicates if the first string in the
+ * comparision is longer than the second.
+ */
+struct ws_delta {
+	char *string;
+	unsigned int current_longer : 1;
 };
+#define WS_DELTA_INIT { NULL, 0 }
+
+static int compute_ws_delta(const struct emitted_diff_symbol *a,
+			     const struct emitted_diff_symbol *b,
+			     struct ws_delta *out)
+{
+	const struct emitted_diff_symbol *longer =  a->len > b->len ? a : b;
+	const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a;
+	int d = longer->len - shorter->len;
+
+	out->string = xmemdupz(longer->line, d);
+	out->current_longer = (a == longer);
+
+	return !strncmp(longer->line + d, shorter->line, shorter->len);
+}
+
+static int cmp_in_block_with_wsd(const struct diff_options *o,
+				 const struct moved_entry *cur,
+				 const struct moved_entry *match,
+				 struct moved_entry *pmb,
+				 int n)
+{
+	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
+	int al = cur->es->len, cl = l->len;
+	const char *a = cur->es->line,
+		   *b = match->es->line,
+		   *c = l->line;
+
+	int wslen;
+
+	/*
+	 * We need to check if 'cur' is equal to 'match'.
+	 * As those are from the same (+/-) side, we do not need to adjust for
+	 * indent changes. However these were found using fuzzy matching
+	 * so we do have to check if they are equal.
+	 */
+	if (strcmp(a, b))
+		return 1;
+
+	if (!pmb->wsd)
+		/*
+		 * No white space delta was carried forward? This can happen
+		 * when we exit early in this function and do not carry
+		 * forward ws.
+		 */
+		return 1;
+
+	/*
+	 * The indent changes of the block are known and carried forward in
+	 * pmb->wsd; however we need to check if the indent changes of the
+	 * current line are still the same as before.
+	 *
+	 * To do so we need to compare 'l' to 'cur', adjusting the
+	 * one of them for the white spaces, depending which was longer.
+	 */
+
+	wslen = strlen(pmb->wsd->string);
+	if (pmb->wsd->current_longer) {
+		c += wslen;
+		cl -= wslen;
+	} else {
+		a += wslen;
+		al -= wslen;
+	}
+
+	if (strcmp(a, c))
+		return 1;
+
+	return 0;
+}
 
 static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 			   const void *entry,
@@ -750,6 +840,16 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 	unsigned flags = diffopt->color_moved_ws_handling
 			 & XDF_WHITESPACE_FLAGS;
 
+	if (diffopt->color_moved_ws_handling &
+	    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+		/*
+		 * As there is not specific white space config given,
+		 * we'd need to check for a new block, so ignore all
+		 * white space. The setup of the white space
+		 * configuration for the next block is done else where
+		 */
+		flags |= XDF_IGNORE_WHITESPACE;
+
 	return !xdiff_compare_lines(a->es->line, a->es->len,
 				    b->es->line, b->es->len,
 				    flags);
@@ -765,6 +865,7 @@ static struct moved_entry *prepare_entry(struct diff_options *o,
 	ret->ent.hash = xdiff_hash_string(l->line, l->len, flags);
 	ret->es = l;
 	ret->next_line = NULL;
+	ret->wsd = NULL;
 
 	return ret;
 }
@@ -820,6 +921,37 @@ static void pmb_advance_or_null(struct diff_options *o,
 	}
 }
 
+static void pmb_advance_or_null_multi_match(struct diff_options *o,
+					    struct moved_entry *match,
+					    struct hashmap *hm,
+					    struct moved_entry **pmb,
+					    int pmb_nr, int n)
+{
+	int i;
+	char *got_match = xcalloc(1, pmb_nr);
+
+	for (; match; match = hashmap_get_next(hm, match)) {
+		for (i = 0; i < pmb_nr; i++) {
+			struct moved_entry *prev = pmb[i];
+			struct moved_entry *cur = (prev && prev->next_line) ?
+					prev->next_line : NULL;
+			if (!cur)
+				continue;
+			if (!cmp_in_block_with_wsd(o, cur, match, pmb[i], n))
+				got_match[i] |= 1;
+		}
+	}
+
+	for (i = 0; i <pmb_nr; i++) {
+		if (got_match[i]) {
+			/* Carry the white space delta forward */
+			pmb[i]->next_line->wsd = pmb[i]->wsd;
+			pmb[i] = pmb[i]->next_line;
+		} else
+			pmb[i] = NULL;
+	}
+}
+
 static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 					 int pmb_nr)
 {
@@ -837,6 +969,10 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 
 		if (lp < pmb_nr && rp > -1 && lp < rp) {
 			pmb[lp] = pmb[rp];
+			if (pmb[rp]->wsd) {
+				free(pmb[rp]->wsd->string);
+				FREE_AND_NULL(pmb[rp]->wsd);
+			}
 			pmb[rp] = NULL;
 			rp--;
 			lp++;
@@ -924,7 +1060,11 @@ static void mark_color_as_moved(struct diff_options *o,
 		if (o->color_moved == COLOR_MOVED_PLAIN)
 			continue;
 
-		pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
+		if (o->color_moved_ws_handling &
+		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+			pmb_advance_or_null_multi_match(o, match, hm, pmb, pmb_nr, n);
+		else
+			pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
 
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
@@ -935,7 +1075,17 @@ static void mark_color_as_moved(struct diff_options *o,
 			 */
 			for (; match; match = hashmap_get_next(hm, match)) {
 				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
-				pmb[pmb_nr++] = match;
+				if (o->color_moved_ws_handling &
+				    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
+					struct ws_delta *wsd = xmalloc(sizeof(*match->wsd));
+					if (compute_ws_delta(l, match->es, wsd)) {
+						match->wsd = wsd;
+						pmb[pmb_nr++] = match;
+					} else
+						free(wsd);
+				} else {
+					pmb[pmb_nr++] = match;
+				}
 			}
 
 			flipped_block = (flipped_block + 1) % 2;
@@ -5583,6 +5733,10 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 		if (o->color_moved) {
 			struct hashmap add_lines, del_lines;
 
+			if (o->color_moved_ws_handling &
+			    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+				o->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
+
 			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
 			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
 
diff --git a/diff.h b/diff.h
index de5dc680051..5e6bcf09260 100644
--- a/diff.h
+++ b/diff.h
@@ -214,6 +214,9 @@ struct diff_options {
 	} color_moved;
 	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
 	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
+
+	/* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
+	#define COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE (1<<5)
 	int color_moved_ws_handling;
 };
 
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 0c737a47cf8..41facf7abf9 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1781,9 +1781,8 @@ test_expect_success 'only move detection ignores white spaces' '
 	EOF
 
 	# Make sure we get a different diff using -w
-	git diff --color --color-moved -w |
-		grep -v "index" |
-		test_decode_color >actual &&
+	git diff --color --color-moved -w >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
 	q_to_tab <<-\EOF >expected &&
 	<BOLD>diff --git a/text.txt b/text.txt<RESET>
 	<BOLD>--- a/text.txt<RESET>
@@ -1798,9 +1797,8 @@ test_expect_success 'only move detection ignores white spaces' '
 
 	# And now ignoring white space only in the move detection
 	git diff --color --color-moved \
-		--color-moved-ws=ignore-all-space,ignore-space-change,ignore-space-at-eol |
-		grep -v "index" |
-		test_decode_color >actual &&
+		--color-moved-ws=ignore-all-space,ignore-space-change,ignore-space-at-eol >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
 	q_to_tab <<-\EOF >expected &&
 	<BOLD>diff --git a/text.txt b/text.txt<RESET>
 	<BOLD>--- a/text.txt<RESET>
@@ -1816,4 +1814,92 @@ test_expect_success 'only move detection ignores white spaces' '
 	test_cmp expected actual
 '
 
+test_expect_success 'compare whitespace delta across moved blocks' '
+
+	git reset --hard &&
+	q_to_tab <<-\EOF >text.txt &&
+	QIndented
+	QText across
+	Qsome lines
+	QBut! <- this stands out
+	QAdjusting with
+	QQdifferent starting
+	Qwhite spaces
+	QAnother outlier
+	QQQIndented
+	QQQText across
+	QQQfive lines
+	QQQthat has similar lines
+	QQQto previous blocks, but with different indent
+	QQQYetQAnotherQoutlierQ
+	EOF
+
+	git add text.txt &&
+	git commit -m "add text.txt" &&
+
+	q_to_tab <<-\EOF >text.txt &&
+	QQIndented
+	QQText across
+	QQsome lines
+	QQQBut! <- this stands out
+	Adjusting with
+	Qdifferent starting
+	white spaces
+	AnotherQoutlier
+	QQIndented
+	QQText across
+	QQfive lines
+	QQthat has similar lines
+	QQto previous blocks, but with different indent
+	QQYetQAnotherQoutlier
+	EOF
+
+	git diff --color --color-moved --color-moved-ws=allow-indentation-change >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
+
+	q_to_tab <<-\EOF >expected &&
+		<BOLD>diff --git a/text.txt b/text.txt<RESET>
+		<BOLD>--- a/text.txt<RESET>
+		<BOLD>+++ b/text.txt<RESET>
+		<CYAN>@@ -1,14 +1,14 @@<RESET>
+		<BOLD;MAGENTA>-QIndented<RESET>
+		<BOLD;MAGENTA>-QText across<RESET>
+		<BOLD;MAGENTA>-Qsome lines<RESET>
+		<RED>-QBut! <- this stands out<RESET>
+		<BOLD;MAGENTA>-QAdjusting with<RESET>
+		<BOLD;MAGENTA>-QQdifferent starting<RESET>
+		<BOLD;MAGENTA>-Qwhite spaces<RESET>
+		<RED>-QAnother outlier<RESET>
+		<BOLD;MAGENTA>-QQQIndented<RESET>
+		<BOLD;MAGENTA>-QQQText across<RESET>
+		<BOLD;MAGENTA>-QQQfive lines<RESET>
+		<BOLD;MAGENTA>-QQQthat has similar lines<RESET>
+		<BOLD;MAGENTA>-QQQto previous blocks, but with different indent<RESET>
+		<RED>-QQQYetQAnotherQoutlierQ<RESET>
+		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>Indented<RESET>
+		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>Text across<RESET>
+		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>some lines<RESET>
+		<GREEN>+<RESET>QQQ<GREEN>But! <- this stands out<RESET>
+		<BOLD;CYAN>+<RESET><BOLD;CYAN>Adjusting with<RESET>
+		<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>different starting<RESET>
+		<BOLD;CYAN>+<RESET><BOLD;CYAN>white spaces<RESET>
+		<GREEN>+<RESET><GREEN>AnotherQoutlier<RESET>
+		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>Indented<RESET>
+		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>Text across<RESET>
+		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>five lines<RESET>
+		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>that has similar lines<RESET>
+		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>to previous blocks, but with different indent<RESET>
+		<GREEN>+<RESET>QQ<GREEN>YetQAnotherQoutlier<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
+test_expect_success 'compare whitespace delta incompatible with other space options' '
+	test_must_fail git diff \
+		--color-moved-ws=allow-indentation-change,ignore-all-space \
+		2>err &&
+	test_i18ngrep allow-indentation-change err
+'
+
 test_done
-- 
2.18.0.399.gad0ab374a1-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 6/9] diff.c: add a blocks mode for moved code detection
  2018-06-29  0:19   ` [PATCH v4 6/9] diff.c: add a blocks mode for moved code detection Stefan Beller
@ 2018-07-02 17:18     ` Brandon Williams
  0 siblings, 0 replies; 25+ messages in thread
From: Brandon Williams @ 2018-07-02 17:18 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jacob.keller, jonathantanmy, simon, Junio C Hamano

On 06/28, Stefan Beller wrote:
> The new "blocks" mode provides a middle ground between plain and zebra.
> It is as intuitive (few colors) as plain, but still has the requirement
> for a minimum of lines/characters to count a block as moved.
> 
> Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>  (https://public-inbox.org/git/87o9j0uljo.fsf@evledraar.gmail.com/)
> Signed-off-by: Stefan Beller <sbeller@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Documentation/diff-options.txt |  8 ++++--
>  diff.c                         |  6 +++--
>  diff.h                         |  5 ++--
>  t/t4015-diff-whitespace.sh     | 49 ++++++++++++++++++++++++++++++++--
>  4 files changed, 60 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index e3a44f03cdc..ba56169de31 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -276,10 +276,14 @@ plain::
>  	that are added somewhere else in the diff. This mode picks up any
>  	moved line, but it is not very useful in a review to determine
>  	if a block of code was moved without permutation.
> -zebra::
> +blocks::
>  	Blocks of moved text of at least 20 alphanumeric characters
>  	are detected greedily. The detected blocks are
> -	painted using either the 'color.diff.{old,new}Moved' color or
> +	painted using either the 'color.diff.{old,new}Moved' color.
> +	Adjacent blocks cannot be told apart.
> +zebra::
> +	Blocks of moved text are detected as in 'blocks' mode. The blocks
> +	are painted using either the 'color.diff.{old,new}Moved' color or
>  	'color.diff.{old,new}MovedAlternative'. The change between
>  	the two colors indicates that a new block was detected.
>  dimmed_zebra::
> diff --git a/diff.c b/diff.c
> index d1bae900cdc..95c51c0b7df 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -271,6 +271,8 @@ static int parse_color_moved(const char *arg)
>  		return COLOR_MOVED_NO;
>  	else if (!strcmp(arg, "plain"))
>  		return COLOR_MOVED_PLAIN;
> +	else if (!strcmp(arg, "blocks"))
> +		return COLOR_MOVED_BLOCKS;
>  	else if (!strcmp(arg, "zebra"))
>  		return COLOR_MOVED_ZEBRA;
>  	else if (!strcmp(arg, "default"))
> @@ -278,7 +280,7 @@ static int parse_color_moved(const char *arg)
>  	else if (!strcmp(arg, "dimmed_zebra"))
>  		return COLOR_MOVED_ZEBRA_DIM;
>  	else
> -		return error(_("color moved setting must be one of 'no', 'default', 'zebra', 'dimmed_zebra', 'plain'"));
> +		return error(_("color moved setting must be one of 'no', 'default', 'blocks', 'zebra', 'dimmed_zebra', 'plain'"));
>  }
>  
>  int git_diff_ui_config(const char *var, const char *value, void *cb)
> @@ -903,7 +905,7 @@ static void mark_color_as_moved(struct diff_options *o,
>  
>  		block_length++;
>  
> -		if (flipped_block)
> +		if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
>  			l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
>  	}
>  	adjust_last_block(o, n, block_length);
> diff --git a/diff.h b/diff.h
> index d29560f822c..7bd4f182c33 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -208,8 +208,9 @@ struct diff_options {
>  	enum {
>  		COLOR_MOVED_NO = 0,
>  		COLOR_MOVED_PLAIN = 1,
> -		COLOR_MOVED_ZEBRA = 2,
> -		COLOR_MOVED_ZEBRA_DIM = 3,
> +		COLOR_MOVED_BLOCKS = 2,
> +		COLOR_MOVED_ZEBRA = 3,
> +		COLOR_MOVED_ZEBRA_DIM = 4,
>  	} color_moved;
>  	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
>  	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index ddbc3901385..e54529f026d 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -1223,7 +1223,7 @@ test_expect_success 'plain moved code, inside file' '
>  	test_cmp expected actual
>  '
>  
> -test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
> +test_expect_success 'detect blocks of moved code' '
>  	git reset --hard &&
>  	cat <<-\EOF >lines.txt &&
>  		long line 1
> @@ -1271,6 +1271,50 @@ test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
>  	test_config color.diff.newMovedDimmed "normal cyan" &&
>  	test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
>  	test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&
> +	git diff HEAD --no-renames --color-moved=blocks --color >actual.raw &&
> +	grep -v "index" actual.raw | test_decode_color >actual &&
> +	cat <<-\EOF >expected &&
> +	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
> +	<BOLD>--- a/lines.txt<RESET>
> +	<BOLD>+++ b/lines.txt<RESET>
> +	<CYAN>@@ -1,16 +1,16 @@<RESET>
> +	<MAGENTA>-long line 1<RESET>
> +	<MAGENTA>-long line 2<RESET>
> +	<MAGENTA>-long line 3<RESET>
> +	 line 4<RESET>
> +	 line 5<RESET>
> +	 line 6<RESET>
> +	 line 7<RESET>
> +	 line 8<RESET>
> +	 line 9<RESET>
> +	<CYAN>+<RESET><CYAN>long line 1<RESET>
> +	<CYAN>+<RESET><CYAN>long line 2<RESET>
> +	<CYAN>+<RESET><CYAN>long line 3<RESET>
> +	<CYAN>+<RESET><CYAN>long line 14<RESET>
> +	<CYAN>+<RESET><CYAN>long line 15<RESET>
> +	<CYAN>+<RESET><CYAN>long line 16<RESET>
> +	 line 10<RESET>
> +	 line 11<RESET>
> +	 line 12<RESET>
> +	 line 13<RESET>
> +	<MAGENTA>-long line 14<RESET>
> +	<MAGENTA>-long line 15<RESET>
> +	<MAGENTA>-long line 16<RESET>
> +	EOF
> +	test_cmp expected actual
> +

Theres an empty line here.  Not worth fixing if its the only issue
though.

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 7/9] diff.c: decouple white space treatment from move detection algorithm
  2018-06-29  0:19   ` [PATCH v4 7/9] diff.c: decouple white space treatment from move detection algorithm Stefan Beller
@ 2018-07-02 17:22     ` Brandon Williams
  0 siblings, 0 replies; 25+ messages in thread
From: Brandon Williams @ 2018-07-02 17:22 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jacob.keller, jonathantanmy, simon, Junio C Hamano

On 06/28, Stefan Beller wrote:
> In the original implementation of the move detection logic the choice for
> ignoring white space changes is the same for the move detection as it is
> for the regular diff.  Some cases came up where different treatment would
> have been nice.
> 
> Allow the user to specify that white space should be ignored differently
> during detection of moved lines than during generation of added and removed
> lines. This is done by providing analogs to the --ignore-space-at-eol,
> -b, and -w options by introducing the option --color-moved-ws=<modes>
> with the modes named "ignore-space-at-eol", "ignore-space-change" and
> "ignore-all-space", which is used only during the move detection phase.
> 
> As we change the default, we'll adjust the tests.
> 
> For now we do not infer any options to treat white spaces in the move
> detection from the generic white space options given to diff.
> This can be tuned later to reasonable default.
> 
> As we plan on adding more white space related options in a later patch,
> that interferes with the current white space options, use a flag field
> and clamp it down to  XDF_WHITESPACE_FLAGS, as that (a) allows to easily
> check at parse time if we give invalid combinations and (b) can reuse
> parts of this patch.
> 
> By having the white space treatment in its own option, we'll also
> make it easier for a later patch to have an config option for
> spaces in the move detection.
> 
> Signed-off-by: Stefan Beller <sbeller@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Documentation/diff-options.txt | 17 +++++++++
>  diff.c                         | 39 +++++++++++++++++++--
>  diff.h                         |  1 +
>  t/t4015-diff-whitespace.sh     | 64 +++++++++++++++++++++++++++++++---
>  4 files changed, 115 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index ba56169de31..80e29e39854 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -292,6 +292,23 @@ dimmed_zebra::
>  	blocks are considered interesting, the rest is uninteresting.
>  --
>  
> +--color-moved-ws=<modes>::
> +	This configures how white spaces are ignored when performing the
> +	move detection for `--color-moved`. These modes can be given
> +	as a comma separated list:
> ++
> +--
> +ignore-space-at-eol::
> +	Ignore changes in whitespace at EOL.
> +ignore-space-change::
> +	Ignore changes in amount of whitespace.  This ignores whitespace
> +	at line end, and considers all other sequences of one or
> +	more whitespace characters to be equivalent.
> +ignore-all-space::
> +	Ignore whitespace when comparing lines. This ignores differences
> +	even if one line has whitespace where the other line has none.
> +--
> +
>  --word-diff[=<mode>]::
>  	Show a word diff, using the <mode> to delimit changed words.
>  	By default, words are delimited by whitespace; see
> diff --git a/diff.c b/diff.c
> index 95c51c0b7df..70eeb40c5fd 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -283,6 +283,36 @@ static int parse_color_moved(const char *arg)
>  		return error(_("color moved setting must be one of 'no', 'default', 'blocks', 'zebra', 'dimmed_zebra', 'plain'"));
>  }
>  
> +static int parse_color_moved_ws(const char *arg)
> +{
> +	int ret = 0;
> +	struct string_list l = STRING_LIST_INIT_DUP;
> +	struct string_list_item *i;
> +
> +	string_list_split(&l, arg, ',', -1);
> +
> +	for_each_string_list_item(i, &l) {
> +		struct strbuf sb = STRBUF_INIT;
> +		strbuf_addstr(&sb, i->string);
> +		strbuf_trim(&sb);
> +
> +		if (!strcmp(sb.buf, "ignore-space-change"))
> +			ret |= XDF_IGNORE_WHITESPACE_CHANGE;
> +		else if (!strcmp(sb.buf, "ignore-space-at-eol"))
> +			ret |= XDF_IGNORE_WHITESPACE_AT_EOL;
> +		else if (!strcmp(sb.buf, "ignore-all-space"))
> +			ret |= XDF_IGNORE_WHITESPACE;
> +		else
> +			error(_("ignoring unknown color-moved-ws mode '%s'"), sb.buf);
> +
> +		strbuf_release(&sb);
> +	}
> +
> +	string_list_clear(&l, 0);
> +
> +	return ret;
> +}
> +
>  int git_diff_ui_config(const char *var, const char *value, void *cb)
>  {
>  	if (!strcmp(var, "diff.color") || !strcmp(var, "color.diff")) {
> @@ -717,10 +747,12 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
>  	const struct diff_options *diffopt = hashmap_cmp_fn_data;
>  	const struct moved_entry *a = entry;
>  	const struct moved_entry *b = entry_or_key;
> +	unsigned flags = diffopt->color_moved_ws_handling
> +			 & XDF_WHITESPACE_FLAGS;
>  
>  	return !xdiff_compare_lines(a->es->line, a->es->len,
>  				    b->es->line, b->es->len,
> -				    diffopt->xdl_opts);
> +				    flags);
>  }
>  
>  static struct moved_entry *prepare_entry(struct diff_options *o,
> @@ -728,8 +760,9 @@ static struct moved_entry *prepare_entry(struct diff_options *o,
>  {
>  	struct moved_entry *ret = xmalloc(sizeof(*ret));
>  	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
> +	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
>  
> -	ret->ent.hash = xdiff_hash_string(l->line, l->len, o->xdl_opts);
> +	ret->ent.hash = xdiff_hash_string(l->line, l->len, flags);
>  	ret->es = l;
>  	ret->next_line = NULL;
>  
> @@ -4710,6 +4743,8 @@ int diff_opt_parse(struct diff_options *options,
>  		if (cm < 0)
>  			die("bad --color-moved argument: %s", arg);
>  		options->color_moved = cm;
> +	} else if (skip_prefix(arg, "--color-moved-ws=", &arg)) {
> +		options->color_moved_ws_handling = parse_color_moved_ws(arg);
>  	} else if (skip_to_optional_arg_default(arg, "--color-words", &options->word_regex, NULL)) {
>  		options->use_color = 1;
>  		options->word_diff = DIFF_WORDS_COLOR;
> diff --git a/diff.h b/diff.h
> index 7bd4f182c33..de5dc680051 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -214,6 +214,7 @@ struct diff_options {
>  	} color_moved;
>  	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
>  	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
> +	int color_moved_ws_handling;
>  };
>  
>  void diff_emit_submodule_del(struct diff_options *o, const char *line);
> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index e54529f026d..0c737a47cf8 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -1460,7 +1460,8 @@ test_expect_success 'move detection ignoring whitespace ' '
>  	EOF
>  	test_cmp expected actual &&
>  
> -	git diff HEAD --no-renames -w --color-moved --color >actual.raw &&
> +	git diff HEAD --no-renames --color-moved --color \
> +		--color-moved-ws=ignore-all-space >actual.raw &&
>  	grep -v "index" actual.raw | test_decode_color >actual &&
>  	cat <<-\EOF >expected &&
>  	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
> @@ -1522,7 +1523,8 @@ test_expect_success 'move detection ignoring whitespace changes' '
>  	EOF
>  	test_cmp expected actual &&
>  
> -	git diff HEAD --no-renames -b --color-moved --color >actual.raw &&
> +	git diff HEAD --no-renames --color-moved --color \
> +		--color-moved-ws=ignore-space-change >actual.raw &&
>  	grep -v "index" actual.raw | test_decode_color >actual &&
>  	cat <<-\EOF >expected &&
>  	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
> @@ -1587,7 +1589,8 @@ test_expect_success 'move detection ignoring whitespace at eol' '
>  	EOF
>  	test_cmp expected actual &&
>  
> -	git diff HEAD --no-renames --ignore-space-at-eol --color-moved --color >actual.raw &&
> +	git diff HEAD --no-renames --color-moved --color \
> +		--color-moved-ws=ignore-space-at-eol >actual.raw &&
>  	grep -v "index" actual.raw | test_decode_color >actual &&
>  	cat <<-\EOF >expected &&
>  	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
> @@ -1757,7 +1760,60 @@ test_expect_success 'move detection with submodules' '
>  
>  	# nor did we mess with it another way
>  	git diff --submodule=diff --color | test_decode_color >expect &&
> -	test_cmp expect decoded_actual
> +	test_cmp expect decoded_actual &&
> +	rm -rf bananas &&
> +	git submodule deinit bananas

Maybe you want to use a test_when_finished call for this instead of
doing this at the end of the test?  I guess this comes up as a point of
debate: Do you have each test clean up after itself or do you ensure
that subsequent tests makes sure its env is ready before testing.
Anyway this is just a suggestion.

> +'
> +
> +test_expect_success 'only move detection ignores white spaces' '
> +	git reset --hard &&
> +	q_to_tab <<-\EOF >text.txt &&
> +		a long line to exceed per-line minimum
> +		another long line to exceed per-line minimum
> +		original file
> +	EOF
> +	git add text.txt &&
> +	git commit -m "add text" &&
> +	q_to_tab <<-\EOF >text.txt &&
> +		Qa long line to exceed per-line minimum
> +		Qanother long line to exceed per-line minimum
> +		new file
> +	EOF
> +
> +	# Make sure we get a different diff using -w
> +	git diff --color --color-moved -w |
> +		grep -v "index" |
> +		test_decode_color >actual &&
> +	q_to_tab <<-\EOF >expected &&
> +	<BOLD>diff --git a/text.txt b/text.txt<RESET>
> +	<BOLD>--- a/text.txt<RESET>
> +	<BOLD>+++ b/text.txt<RESET>
> +	<CYAN>@@ -1,3 +1,3 @@<RESET>
> +	 Qa long line to exceed per-line minimum<RESET>
> +	 Qanother long line to exceed per-line minimum<RESET>
> +	<RED>-original file<RESET>
> +	<GREEN>+<RESET><GREEN>new file<RESET>
> +	EOF
> +	test_cmp expected actual &&
> +
> +	# And now ignoring white space only in the move detection
> +	git diff --color --color-moved \
> +		--color-moved-ws=ignore-all-space,ignore-space-change,ignore-space-at-eol |
> +		grep -v "index" |
> +		test_decode_color >actual &&
> +	q_to_tab <<-\EOF >expected &&
> +	<BOLD>diff --git a/text.txt b/text.txt<RESET>
> +	<BOLD>--- a/text.txt<RESET>
> +	<BOLD>+++ b/text.txt<RESET>
> +	<CYAN>@@ -1,3 +1,3 @@<RESET>
> +	<BOLD;MAGENTA>-a long line to exceed per-line minimum<RESET>
> +	<BOLD;MAGENTA>-another long line to exceed per-line minimum<RESET>
> +	<RED>-original file<RESET>
> +	<BOLD;YELLOW>+<RESET>Q<BOLD;YELLOW>a long line to exceed per-line minimum<RESET>
> +	<BOLD;YELLOW>+<RESET>Q<BOLD;YELLOW>another long line to exceed per-line minimum<RESET>
> +	<GREEN>+<RESET><GREEN>new file<RESET>
> +	EOF
> +	test_cmp expected actual
>  '
>  
>  test_done
> -- 
> 2.18.0.399.gad0ab374a1-goog
> 

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 9/9] diff.c: add white space mode to move detection that allows indent changes
  2018-06-29  0:19   ` [PATCH v4 9/9] diff.c: add white space mode to move detection that allows indent changes Stefan Beller
@ 2018-07-02 17:36     ` Brandon Williams
  2018-07-02 18:59       ` Stefan Beller
  0 siblings, 1 reply; 25+ messages in thread
From: Brandon Williams @ 2018-07-02 17:36 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jacob.keller, jonathantanmy, simon, Junio C Hamano

On 06/28, Stefan Beller wrote:
> The option of --color-moved has proven to be useful as observed on the
> mailing list. However when refactoring sometimes the indentation changes,
> for example when partitioning a functions into smaller helper functions
> the code usually mostly moved around except for a decrease in indentation.
> 
> To just review the moved code ignoring the change in indentation, a mode
> to ignore spaces in the move detection as implemented in a previous patch
> would be enough.  However the whole move coloring as motivated in commit
> 2e2d5ac (diff.c: color moved lines differently, 2017-06-30), brought
> up the notion of the reviewer being able to trust the move of a "block".
> 
> As there are languages such as python, which depend on proper relative
> indentation for the control flow of the program, ignoring any white space
> change in a block would not uphold the promises of 2e2d5ac that allows
> reviewers to pay less attention to the inside of a block, as inside
> the reviewer wants to assume the same program flow.
> 
> This new mode of white space ignorance will take this into account and will
> only allow the same white space changes per line in each block. This patch
> even allows only for the same change at the beginning of the lines.
> 
> As this is a white space mode, it is made exclusive to other white space
> modes in the move detection.
> 
> This patch brings some challenges, related to the detection of blocks.
> We need a white net the catch the possible moved lines, but then need to

s/white/wide/

> +
> +/**
> + * The struct ws_delta holds white space differences between moved lines, i.e.
> + * between '+' and '-' lines that have been detected to be a move.
> + * The string contains the difference in leading white spaces, before the
> + * rest of the line is compared using the white space config for move
> + * coloring. The current_longer indicates if the first string in the
> + * comparision is longer than the second.
> + */
> +struct ws_delta {
> +	char *string;
> +	unsigned int current_longer : 1;
>  };
> +#define WS_DELTA_INIT { NULL, 0 }
> +
> +static int compute_ws_delta(const struct emitted_diff_symbol *a,
> +			     const struct emitted_diff_symbol *b,
> +			     struct ws_delta *out)
> +{
> +	const struct emitted_diff_symbol *longer =  a->len > b->len ? a : b;
> +	const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a;
> +	int d = longer->len - shorter->len;
> +
> +	out->string = xmemdupz(longer->line, d);
> +	out->current_longer = (a == longer);
> +
> +	return !strncmp(longer->line + d, shorter->line, shorter->len);
> +}

I'm having a harder time understanding this block.  This is used to fill
a ws_delta struct by calculating the whitespace delta between two lines.
If that is the case then why doesn't this function verify that the first
'd' characters in the longer line are indeed whitespace?  Also, maybe
this is just because I'm not as familiar with the move detection code,
but how would the whitespace detection handle a line being moved from
being indented with tabs to spaces or vice versa?  Is this something
already handled and not an issue?

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v4 9/9] diff.c: add white space mode to move detection that allows indent changes
  2018-07-02 17:36     ` Brandon Williams
@ 2018-07-02 18:59       ` Stefan Beller
  0 siblings, 0 replies; 25+ messages in thread
From: Stefan Beller @ 2018-07-02 18:59 UTC (permalink / raw)
  To: Brandon Williams
  Cc: git, Jacob Keller, Jonathan Tan, Simon Ruderich, Junio C Hamano

On Mon, Jul 2, 2018 at 10:36 AM Brandon Williams <bmwill@google.com> wrote:
>
> On 06/28, Stefan Beller wrote:
> > The option of --color-moved has proven to be useful as observed on the
> > mailing list. However when refactoring sometimes the indentation changes,
> > for example when partitioning a functions into smaller helper functions
> > the code usually mostly moved around except for a decrease in indentation.
> >
> > To just review the moved code ignoring the change in indentation, a mode
> > to ignore spaces in the move detection as implemented in a previous patch
> > would be enough.  However the whole move coloring as motivated in commit
> > 2e2d5ac (diff.c: color moved lines differently, 2017-06-30), brought
> > up the notion of the reviewer being able to trust the move of a "block".
> >
> > As there are languages such as python, which depend on proper relative
> > indentation for the control flow of the program, ignoring any white space
> > change in a block would not uphold the promises of 2e2d5ac that allows
> > reviewers to pay less attention to the inside of a block, as inside
> > the reviewer wants to assume the same program flow.
> >
> > This new mode of white space ignorance will take this into account and will
> > only allow the same white space changes per line in each block. This patch
> > even allows only for the same change at the beginning of the lines.
> >
> > As this is a white space mode, it is made exclusive to other white space
> > modes in the move detection.
> >
> > This patch brings some challenges, related to the detection of blocks.
> > We need a white net the catch the possible moved lines, but then need to
>
> s/white/wide/

oops. will fix.

>
> > +
> > +/**
> > + * The struct ws_delta holds white space differences between moved lines, i.e.
> > + * between '+' and '-' lines that have been detected to be a move.
> > + * The string contains the difference in leading white spaces, before the
> > + * rest of the line is compared using the white space config for move
> > + * coloring. The current_longer indicates if the first string in the
> > + * comparision is longer than the second.
> > + */
> > +struct ws_delta {
> > +     char *string;
> > +     unsigned int current_longer : 1;
> >  };
> > +#define WS_DELTA_INIT { NULL, 0 }
> > +
> > +static int compute_ws_delta(const struct emitted_diff_symbol *a,
> > +                          const struct emitted_diff_symbol *b,
> > +                          struct ws_delta *out)
> > +{
> > +     const struct emitted_diff_symbol *longer =  a->len > b->len ? a : b;
> > +     const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a;
> > +     int d = longer->len - shorter->len;
> > +
> > +     out->string = xmemdupz(longer->line, d);
> > +     out->current_longer = (a == longer);
> > +
> > +     return !strncmp(longer->line + d, shorter->line, shorter->len);
> > +}
>
> I'm having a harder time understanding this block.  This is used to fill
> a ws_delta struct by calculating the whitespace delta between two lines.

yes.

> If that is the case then why doesn't this function verify that the first
> 'd' characters in the longer line are indeed whitespace?

This was done implicitly before as compute_ws_delta is called only
on two lines that are "equal with XDF_IGNORE_WHITESPACE".

> Also, maybe
> this is just because I'm not as familiar with the move detection code,
> but how would the whitespace detection handle a line being moved from
> being indented with tabs to spaces or vice versa?  Is this something
> already handled and not an issue?

The ws_delta stores the string (of whitespaces) that the 'shorter string'
needs to be prefixed with to create the 'longer string.
(I chose 'shorter' and 'longer' as any of them can be '+' or '-' lines)

Then later when we compare the strings (the current line in
consideration and strings that we put in hashmaps that we
know are moved lines) in cmp_in_block_with_wsd,
we (should) take this white space delta into account.

I just realize that there we do not check for the tab/space replacement
as there we would need to compare the ws_delta string to the beginning
of the 'longer' string there.

I'll add a test for the space/tab replacement as well.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2018-07-02 19:07 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-22  1:57 [PATCH v3 0/8] Moved code detection: ignore space on uniform indentation Stefan Beller
2018-06-22  1:57 ` [PATCH v3 1/8] xdiff/xdiff.h: remove unused flags Stefan Beller
2018-06-22  1:57 ` [PATCH v3 2/8] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
2018-06-22  1:57 ` [PATCH v3 3/8] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
2018-06-22  1:57 ` [PATCH v3 4/8] diff.c: adjust hash function signature to match hashmap expectation Stefan Beller
2018-06-22  1:57 ` [PATCH v3 5/8] diff.c: add a blocks mode for moved code detection Stefan Beller
2018-06-22  1:57 ` [PATCH v3 6/8] diff.c: decouple white space treatment from move detection algorithm Stefan Beller
2018-06-22  1:57 ` [PATCH v3 7/8] diff.c: factor advance_or_nullify out of mark_color_as_moved Stefan Beller
2018-06-22  1:57 ` [PATCH v3 8/8] diff.c: add white space mode to move detection that allows indent changes Stefan Beller
2018-06-23 16:52   ` SZEDER Gábor
2018-06-22 22:37 ` [PATCH v3 0/8] Moved code detection: ignore space on uniform indentation Junio C Hamano
2018-06-29  0:19 ` [PATCH v4 0/9] " Stefan Beller
2018-06-29  0:19   ` [PATCH v4 1/9] xdiff/xdiff.h: remove unused flags Stefan Beller
2018-06-29  0:19   ` [PATCH v4 2/9] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
2018-06-29  0:19   ` [PATCH v4 3/9] t4015: avoid git as a pipe input Stefan Beller
2018-06-29  0:19   ` [PATCH v4 4/9] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
2018-06-29  0:19   ` [PATCH v4 5/9] diff.c: adjust hash function signature to match hashmap expectation Stefan Beller
2018-06-29  0:19   ` [PATCH v4 6/9] diff.c: add a blocks mode for moved code detection Stefan Beller
2018-07-02 17:18     ` Brandon Williams
2018-06-29  0:19   ` [PATCH v4 7/9] diff.c: decouple white space treatment from move detection algorithm Stefan Beller
2018-07-02 17:22     ` Brandon Williams
2018-06-29  0:19   ` [PATCH v4 8/9] diff.c: factor advance_or_nullify out of mark_color_as_moved Stefan Beller
2018-06-29  0:19   ` [PATCH v4 9/9] diff.c: add white space mode to move detection that allows indent changes Stefan Beller
2018-07-02 17:36     ` Brandon Williams
2018-07-02 18:59       ` Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).