git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation
@ 2018-04-02 22:48 Stefan Beller
  2018-04-02 22:48 ` [PATCH 1/7] xdiff/xdiff.h: remove unused flags Stefan Beller
                   ` (7 more replies)
  0 siblings, 8 replies; 27+ messages in thread
From: Stefan Beller @ 2018-04-02 22:48 UTC (permalink / raw)
  To: jonathantanmy, jacob.keller, simon; +Cc: git, Stefan Beller

This is a re-attempt of [1], which allows the moved code detection to
ignore blanks in various modes.

patches 1-5 are refactoring, patch 6 adds all existing white space options
of regular diff to the move detection. (I am unsure about this patch,
as I presume we want to keep the option space at a minimum if possible).

The fun is in the last patch, which allows white space sensitive
languages to trust the move detection, too. Each block that is marked as
moved will have the same delta in {in-, de-}dentation.
I would think this mode might be a reasonable default eventually.

Thanks,
Stefan

[1] https://public-inbox.org/git/20171025224620.27657-1-sbeller@google.com/

Stefan Beller (7):
  xdiff/xdiff.h: remove unused flags
  xdiff/xdiffi.c: remove unneeded function declarations
  diff.c: do not pass diff options as keydata to hashmap
  diff.c: adjust hash function signature to match hashmap expectation
  diff.c: refactor internal representation for coloring moved code
  diff.c: decouple white space treatment for move detection from generic
    option
  diff.c: add --color-moved-ignore-space-delta option

 Documentation/diff-options.txt |  13 ++
 diff.c                         | 155 ++++++++++++---
 diff.h                         |  18 +-
 t/t4015-diff-whitespace.sh     | 341 ++++++++++++++++++++++++++++++++-
 xdiff/xdiff.h                  |   8 -
 xdiff/xdiffi.c                 |  17 --
 6 files changed, 483 insertions(+), 69 deletions(-)

-- 
2.17.0.484.g0c8726318c-goog


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/7] xdiff/xdiff.h: remove unused flags
  2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
@ 2018-04-02 22:48 ` Stefan Beller
  2018-04-02 22:48 ` [PATCH 2/7] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Stefan Beller @ 2018-04-02 22:48 UTC (permalink / raw)
  To: jonathantanmy, jacob.keller, simon; +Cc: git, Stefan Beller

These flags were there since the beginning (3443546f6e (Use a *real*
built-in diff generator, 2006-03-24), but were never used. Remove them.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 xdiff/xdiff.h | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index c1937a2911..2356da5f78 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -52,14 +52,6 @@ extern "C" {
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
 
-#define XDL_MMB_READONLY (1 << 0)
-
-#define XDL_MMF_ATOMIC (1 << 0)
-
-#define XDL_BDOP_INS 1
-#define XDL_BDOP_CPY 2
-#define XDL_BDOP_INSB 3
-
 /* merge simplification levels */
 #define XDL_MERGE_MINIMAL 0
 #define XDL_MERGE_EAGER 1
-- 
2.17.0.484.g0c8726318c-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/7] xdiff/xdiffi.c: remove unneeded function declarations
  2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
  2018-04-02 22:48 ` [PATCH 1/7] xdiff/xdiff.h: remove unused flags Stefan Beller
@ 2018-04-02 22:48 ` Stefan Beller
  2018-04-02 22:48 ` [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Stefan Beller @ 2018-04-02 22:48 UTC (permalink / raw)
  To: jonathantanmy, jacob.keller, simon; +Cc: git, Stefan Beller

There is no need to forward-declare these functions, as they are used
after their implementation only.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 xdiff/xdiffi.c | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 0de1ef463b..3e8aff92bc 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -22,34 +22,17 @@
 
 #include "xinclude.h"
 
-
-
 #define XDL_MAX_COST_MIN 256
 #define XDL_HEUR_MIN_COST 256
 #define XDL_LINE_MAX (long)((1UL << (CHAR_BIT * sizeof(long) - 1)) - 1)
 #define XDL_SNAKE_CNT 20
 #define XDL_K_HEUR 4
 
-
-
 typedef struct s_xdpsplit {
 	long i1, i2;
 	int min_lo, min_hi;
 } xdpsplit_t;
 
-
-
-
-static long xdl_split(unsigned long const *ha1, long off1, long lim1,
-		      unsigned long const *ha2, long off2, long lim2,
-		      long *kvdf, long *kvdb, int need_min, xdpsplit_t *spl,
-		      xdalgoenv_t *xenv);
-static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1, long chg2);
-
-
-
-
-
 /*
  * See "An O(ND) Difference Algorithm and its Variations", by Eugene Myers.
  * Basically considers a "box" (off1, off2, lim1, lim2) and scan from both
-- 
2.17.0.484.g0c8726318c-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap
  2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
  2018-04-02 22:48 ` [PATCH 1/7] xdiff/xdiff.h: remove unused flags Stefan Beller
  2018-04-02 22:48 ` [PATCH 2/7] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
@ 2018-04-02 22:48 ` Stefan Beller
  2018-04-06 20:04   ` Jeff King
  2018-04-02 22:48 ` [PATCH 4/7] diff.c: adjust hash function signature to match hashmap expectation Stefan Beller
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Stefan Beller @ 2018-04-02 22:48 UTC (permalink / raw)
  To: jonathantanmy, jacob.keller, simon; +Cc: git, Stefan Beller

The diff options are passed to the compare function as
'hashmap_cmp_fn_data', which are given when the hashmaps
are initialized.

A later patch will make use of the keydata to signal
different settings for comparision.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 4c59f5f5d3..52f6e02130 100644
--- a/diff.c
+++ b/diff.c
@@ -842,13 +842,13 @@ static void mark_color_as_moved(struct diff_options *o,
 		case DIFF_SYMBOL_PLUS:
 			hm = del_lines;
 			key = prepare_entry(o, n);
-			match = hashmap_get(hm, key, o);
+			match = hashmap_get(hm, key, NULL);
 			free(key);
 			break;
 		case DIFF_SYMBOL_MINUS:
 			hm = add_lines;
 			key = prepare_entry(o, n);
-			match = hashmap_get(hm, key, o);
+			match = hashmap_get(hm, key, NULL);
 			free(key);
 			break;
 		default:
-- 
2.17.0.484.g0c8726318c-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 4/7] diff.c: adjust hash function signature to match hashmap expectation
  2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (2 preceding siblings ...)
  2018-04-02 22:48 ` [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
@ 2018-04-02 22:48 ` Stefan Beller
  2018-04-02 22:48 ` [PATCH 5/7] diff.c: refactor internal representation for coloring moved code Stefan Beller
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Stefan Beller @ 2018-04-02 22:48 UTC (permalink / raw)
  To: jonathantanmy, jacob.keller, simon; +Cc: git, Stefan Beller

This makes the follow up patch easier.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 52f6e02130..879b8a5d9d 100644
--- a/diff.c
+++ b/diff.c
@@ -707,11 +707,15 @@ struct moved_entry {
 	struct moved_entry *next_line;
 };
 
-static int moved_entry_cmp(const struct diff_options *diffopt,
-			   const struct moved_entry *a,
-			   const struct moved_entry *b,
+static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
+			   const void *entry,
+			   const void *entry_or_key,
 			   const void *keydata)
 {
+	const struct diff_options *diffopt = hashmap_cmp_fn_data;
+	const struct moved_entry *a = entry;
+	const struct moved_entry *b = entry_or_key;
+
 	return !xdiff_compare_lines(a->es->line, a->es->len,
 				    b->es->line, b->es->len,
 				    diffopt->xdl_opts);
@@ -5534,10 +5538,8 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 		if (o->color_moved) {
 			struct hashmap add_lines, del_lines;
 
-			hashmap_init(&del_lines,
-				     (hashmap_cmp_fn)moved_entry_cmp, o, 0);
-			hashmap_init(&add_lines,
-				     (hashmap_cmp_fn)moved_entry_cmp, o, 0);
+			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
+			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
 
 			add_lines_to_move_detection(o, &add_lines, &del_lines);
 			mark_color_as_moved(o, &add_lines, &del_lines);
-- 
2.17.0.484.g0c8726318c-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 5/7] diff.c: refactor internal representation for coloring moved code
  2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (3 preceding siblings ...)
  2018-04-02 22:48 ` [PATCH 4/7] diff.c: adjust hash function signature to match hashmap expectation Stefan Beller
@ 2018-04-02 22:48 ` Stefan Beller
  2018-04-02 23:51   ` Jonathan Tan
  2018-04-03 19:39   ` Ævar Arnfjörð Bjarmason
  2018-04-02 22:48 ` [PATCH 6/7] diff.c: decouple white space treatment for move detection from generic option Stefan Beller
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 27+ messages in thread
From: Stefan Beller @ 2018-04-02 22:48 UTC (permalink / raw)
  To: jonathantanmy, jacob.keller, simon; +Cc: git, Stefan Beller

At the time the move coloring was implemented we thought an enum of modes
is the best to configure this feature.  However as we want to tack on new
features, the enum would grow exponentially.

Refactor the code such that features are enabled via bits. Currently we can
* activate the move detection,
* enable the block detection on top, and
* enable the dimming inside a block, though this could be done without
  block detection as well (mode "plain, dimmed")

Choose the flags to not be at bit position 2,3,4 as the next patch
will occupy these.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 27 ++++++++++++++-------------
 diff.h | 17 ++++++++++-------
 2 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/diff.c b/diff.c
index 879b8a5d9d..2052a43f7a 100644
--- a/diff.c
+++ b/diff.c
@@ -260,7 +260,7 @@ static int parse_color_moved(const char *arg)
 {
 	switch (git_parse_maybe_bool(arg)) {
 	case 0:
-		return COLOR_MOVED_NO;
+		return 0;
 	case 1:
 		return COLOR_MOVED_DEFAULT;
 	default:
@@ -268,15 +268,17 @@ static int parse_color_moved(const char *arg)
 	}
 
 	if (!strcmp(arg, "no"))
-		return COLOR_MOVED_NO;
+		return 0;
 	else if (!strcmp(arg, "plain"))
-		return COLOR_MOVED_PLAIN;
+		return COLOR_MOVED_ENABLED;
 	else if (!strcmp(arg, "zebra"))
-		return COLOR_MOVED_ZEBRA;
+		return COLOR_MOVED_ENABLED | COLOR_MOVED_FIND_BLOCKS;
 	else if (!strcmp(arg, "default"))
 		return COLOR_MOVED_DEFAULT;
 	else if (!strcmp(arg, "dimmed_zebra"))
-		return COLOR_MOVED_ZEBRA_DIM;
+		return COLOR_MOVED_ENABLED |
+		       COLOR_MOVED_FIND_BLOCKS |
+		       COLOR_MOVED_DIMMED_BLOCKS;
 	else
 		return error(_("color moved setting must be one of 'no', 'default', 'zebra', 'dimmed_zebra', 'plain'"));
 }
@@ -794,7 +796,7 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 }
 
 /*
- * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing.
+ * If o->color_moved is not set to find blocks, this function does nothing.
  *
  * Otherwise, if the last block has fewer alphanumeric characters than
  * COLOR_MOVED_MIN_ALNUM_COUNT, unset DIFF_SYMBOL_MOVED_LINE on all lines in
@@ -809,7 +811,7 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 static void adjust_last_block(struct diff_options *o, int n, int block_length)
 {
 	int i, alnum_count = 0;
-	if (o->color_moved == COLOR_MOVED_PLAIN)
+	if (!(o->color_moved & COLOR_MOVED_FIND_BLOCKS))
 		return;
 	for (i = 1; i < block_length + 1; i++) {
 		const char *c = o->emitted_symbols->buf[n - i].line;
@@ -868,7 +870,7 @@ static void mark_color_as_moved(struct diff_options *o,
 
 		l->flags |= DIFF_SYMBOL_MOVED_LINE;
 
-		if (o->color_moved == COLOR_MOVED_PLAIN)
+		if (!(o->color_moved & COLOR_MOVED_FIND_BLOCKS))
 			continue;
 
 		/* Check any potential block runs, advance each or nullify */
@@ -4697,12 +4699,11 @@ int diff_opt_parse(struct diff_options *options,
 	else if (!strcmp(arg, "--no-color"))
 		options->use_color = 0;
 	else if (!strcmp(arg, "--color-moved")) {
-		if (diff_color_moved_default)
-			options->color_moved = diff_color_moved_default;
-		if (options->color_moved == COLOR_MOVED_NO)
+		options->color_moved = diff_color_moved_default;
+		if (!options->color_moved)
 			options->color_moved = COLOR_MOVED_DEFAULT;
 	} else if (!strcmp(arg, "--no-color-moved"))
-		options->color_moved = COLOR_MOVED_NO;
+		options->color_moved = 0;
 	else if (skip_prefix(arg, "--color-moved=", &arg)) {
 		int cm = parse_color_moved(arg);
 		if (cm < 0)
@@ -5543,7 +5544,7 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 
 			add_lines_to_move_detection(o, &add_lines, &del_lines);
 			mark_color_as_moved(o, &add_lines, &del_lines);
-			if (o->color_moved == COLOR_MOVED_ZEBRA_DIM)
+			if (o->color_moved & COLOR_MOVED_DIMMED_BLOCKS)
 				dim_moved_lines(o);
 
 			hashmap_free(&add_lines, 0);
diff --git a/diff.h b/diff.h
index d29560f822..9542017986 100644
--- a/diff.h
+++ b/diff.h
@@ -205,13 +205,16 @@ struct diff_options {
 	int diff_path_counter;
 
 	struct emitted_diff_symbols *emitted_symbols;
-	enum {
-		COLOR_MOVED_NO = 0,
-		COLOR_MOVED_PLAIN = 1,
-		COLOR_MOVED_ZEBRA = 2,
-		COLOR_MOVED_ZEBRA_DIM = 3,
-	} color_moved;
-	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
+	unsigned color_moved;
+
+	#define COLOR_MOVED_ENABLED		(1 << 0)
+
+	#define COLOR_MOVED_FIND_BLOCKS		(1 << 1)
+	/* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
+
+	#define COLOR_MOVED_DIMMED_BLOCKS	(1 << 23)
+
+	#define COLOR_MOVED_DEFAULT (COLOR_MOVED_ENABLED | COLOR_MOVED_FIND_BLOCKS)
 	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
 };
 
-- 
2.17.0.484.g0c8726318c-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 6/7] diff.c: decouple white space treatment for move detection from generic option
  2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (4 preceding siblings ...)
  2018-04-02 22:48 ` [PATCH 5/7] diff.c: refactor internal representation for coloring moved code Stefan Beller
@ 2018-04-02 22:48 ` Stefan Beller
  2018-04-03  0:31   ` Jonathan Tan
  2018-04-02 22:48 ` [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option Stefan Beller
  2018-04-02 23:47 ` [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Jonathan Tan
  7 siblings, 1 reply; 27+ messages in thread
From: Stefan Beller @ 2018-04-02 22:48 UTC (permalink / raw)
  To: jonathantanmy, jacob.keller, simon; +Cc: git, Stefan Beller

In the original implementation of the move detection logic we assumed that
the choice for ignoring white space changes is the same for the move
detection as it is for the generic diff.  It turns out this is wrong.

There are a couple of things where the user wants to input their
decision into the diff machinery:

* Whether or not to ignore white space for the general diff detection.
  This is covered with the -w, -b options already.
* Whether the move detection ought to pay attention to white spaces
  in general. And if so, how are white spaces handled for the block
  detection.

One example would be reviewing a current patch under discussion, that moves
code around.  In that case, you may want to have the general diff machinery
not ignore the white spaces (i.e. exact matching), as that will show you
the diff as the patch sent. The code moved however may have another
indentation level, such that you want to ignore white spaces for the move
detection. The code may be in python, such that spaces matter for program
flow, though. That is why you'd want each block to have the same change
in white space.

This patch decouples white space treatment in the general diff machinery
from the white space treatment in the move detection.

This adds all the the options for ignoring white space that the regular
diff machinery has to the move detection, such that we can cover all the
cases that were introduced in 7a55427094 (Merge branch
'jk/diff-color-moved-fix', 2017-11-06).

The example from above (different lines in the diff with -w for the regular
diff) is covered in a test. Convert the existing tests to be more explicit
on their choice of white space behavior in the move detection as the tests
should not fail if the default is changed.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-options.txt |  13 +++
 diff.c                         |  17 +++-
 t/t4015-diff-whitespace.sh     | 150 +++++++++++++++++++++++++++++++--
 3 files changed, 171 insertions(+), 9 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index e3a44f03cd..4ca09c1977 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -288,6 +288,19 @@ dimmed_zebra::
 	blocks are considered interesting, the rest is uninteresting.
 --
 
+--color-moved-[no-]ignore-all-space::
+	Ignore whitespace when comparing lines when performing the move
+	detection for --color-moved.  This ignores differences even if
+	one line has whitespace where the other line has none.
+--color-moved-[no-]ignore-space-change::
+	Ignore changes in amount of whitespace when performing the move
+	detection for --color-moved.  This ignores whitespace
+	at line end, and considers all other sequences of one or
+	more whitespace characters to be equivalent.
+--color-moved-[no-]ignore-space-at-eol::
+	Ignore changes in whitespace at EOL when performing the move
+	detection for --color-moved.
+
 --word-diff[=<mode>]::
 	Show a word diff, using the <mode> to delimit changed words.
 	By default, words are delimited by whitespace; see
diff --git a/diff.c b/diff.c
index 2052a43f7a..5fe2930dca 100644
--- a/diff.c
+++ b/diff.c
@@ -720,7 +720,7 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 
 	return !xdiff_compare_lines(a->es->line, a->es->len,
 				    b->es->line, b->es->len,
-				    diffopt->xdl_opts);
+				    diffopt->color_moved & XDF_WHITESPACE_FLAGS);
 }
 
 static struct moved_entry *prepare_entry(struct diff_options *o,
@@ -728,8 +728,9 @@ static struct moved_entry *prepare_entry(struct diff_options *o,
 {
 	struct moved_entry *ret = xmalloc(sizeof(*ret));
 	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
+	unsigned flags = o->color_moved & XDF_WHITESPACE_FLAGS;
 
-	ret->ent.hash = xdiff_hash_string(l->line, l->len, o->xdl_opts);
+	ret->ent.hash = xdiff_hash_string(l->line, l->len, flags);
 	ret->es = l;
 	ret->next_line = NULL;
 
@@ -4638,6 +4639,18 @@ int diff_opt_parse(struct diff_options *options,
 		DIFF_XDL_SET(options, IGNORE_CR_AT_EOL);
 	else if (!strcmp(arg, "--ignore-blank-lines"))
 		DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
+	else if (!strcmp(arg, "--color-moved-no-ignore-all-space"))
+		options->color_moved &= ~XDF_IGNORE_WHITESPACE;
+	else if (!strcmp(arg, "--color-moved-no-ignore-space-change"))
+		options->color_moved &= ~XDF_IGNORE_WHITESPACE_CHANGE;
+	else if (!strcmp(arg, "--color-moved-no-ignore-space-at-eol"))
+		options->color_moved &= ~XDF_IGNORE_WHITESPACE_AT_EOL;
+	else if (!strcmp(arg, "--color-moved-ignore-all-space"))
+		options->color_moved |= XDF_IGNORE_WHITESPACE;
+	else if (!strcmp(arg, "--color-moved-ignore-space-change"))
+		options->color_moved |= XDF_IGNORE_WHITESPACE_CHANGE;
+	else if (!strcmp(arg, "--color-moved-ignore-space-at-eol"))
+		options->color_moved |= XDF_IGNORE_WHITESPACE_AT_EOL;
 	else if (!strcmp(arg, "--indent-heuristic"))
 		DIFF_XDL_SET(options, INDENT_HEURISTIC);
 	else if (!strcmp(arg, "--no-indent-heuristic"))
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 17df491a3a..38aaf4c46c 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1395,7 +1395,10 @@ test_expect_success 'move detection ignoring whitespace ' '
 	line 4
 	line 5
 	EOF
-	git diff HEAD --no-renames --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-no-ignore-all-space \
+		--color-moved-no-ignore-space-change \
+		--color-moved-no-ignore-space-at-eol |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1419,7 +1422,10 @@ test_expect_success 'move detection ignoring whitespace ' '
 	EOF
 	test_cmp expected actual &&
 
-	git diff HEAD --no-renames -w --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-ignore-all-space \
+		--color-moved-no-ignore-space-change \
+		--color-moved-no-ignore-space-at-eol |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1459,7 +1465,10 @@ test_expect_success 'move detection ignoring whitespace changes' '
 	line 5
 	EOF
 
-	git diff HEAD --no-renames --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-no-ignore-all-space \
+		--color-moved-no-ignore-space-change \
+		--color-moved-no-ignore-space-at-eol |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1483,7 +1492,10 @@ test_expect_success 'move detection ignoring whitespace changes' '
 	EOF
 	test_cmp expected actual &&
 
-	git diff HEAD --no-renames -b --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-no-ignore-all-space \
+		--color-moved-no-ignore-space-at-eol \
+		--color-moved-ignore-space-change |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1526,7 +1538,10 @@ test_expect_success 'move detection ignoring whitespace at eol' '
 	# avoid cluttering the output with complaints about our eol whitespace
 	test_config core.whitespace -blank-at-eol &&
 
-	git diff HEAD --no-renames --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-no-ignore-all-space \
+		--color-moved-no-ignore-space-change \
+		--color-moved-no-ignore-space-at-eol |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1550,7 +1565,10 @@ test_expect_success 'move detection ignoring whitespace at eol' '
 	EOF
 	test_cmp expected actual &&
 
-	git diff HEAD --no-renames --ignore-space-at-eol --color-moved --color |
+	git diff HEAD --no-renames --color-moved --color \
+		--color-moved-no-ignore-all-space \
+		--color-moved-no-ignore-space-change \
+		--color-moved-ignore-space-at-eol |
 		grep -v "index" |
 		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
@@ -1722,7 +1740,125 @@ test_expect_success 'move detection with submodules' '
 
 	# nor did we mess with it another way
 	git diff --submodule=diff --color | test_decode_color >expect &&
-	test_cmp expect decoded_actual
+	test_cmp expect decoded_actual &&
+	rm -rf bananas &&
+	git submodule deinit bananas
+'
+
+test_expect_success 'move detection only ignores white spaces' '
+	git reset --hard &&
+	q_to_tab <<-\EOF >function.c &&
+	int func()
+	{
+	Qif (foo) {
+	QQ// this part of the function
+	QQ// function will be very long
+	QQ// indeed. We must exceed both
+	QQ// per-line and number of line
+	QQ// minimums
+	QQ;
+	Q}
+	Qbaz();
+	Qbar();
+	Q// more unrelated stuff
+	}
+	EOF
+	git add function.c &&
+	git commit -m "add function.c" &&
+	q_to_tab <<-\EOF >function.c &&
+	int do_foo()
+	{
+	Q// this part of the function
+	Q// function will be very long
+	Q// indeed. We must exceed both
+	Q// per-line and number of line
+	Q// minimums
+	Q;
+	}
+
+	int func()
+	{
+	Qif (foo)
+	QQdo_foo();
+	Qbaz();
+	Qbar();
+	Q// more unrelated stuff
+	}
+	EOF
+
+	# Make sure we get a different diff using -w ("moved function header")
+	git diff --color --color-moved -w \
+		--color-moved-no-ignore-all-space \
+		--color-moved-no-ignore-space-change \
+		--color-moved-no-ignore-space-at-eol |
+		grep -v "index" |
+		test_decode_color >actual &&
+	q_to_tab <<-\EOF >expected &&
+	<BOLD>diff --git a/function.c b/function.c<RESET>
+	<BOLD>--- a/function.c<RESET>
+	<BOLD>+++ b/function.c<RESET>
+	<CYAN>@@ -1,6 +1,5 @@<RESET>
+	<BOLD;MAGENTA>-int func()<RESET>
+	<GREEN>+<RESET><GREEN>int do_foo()<RESET>
+	 {<RESET>
+	<RED>-	if (foo) {<RESET>
+	 Q// this part of the function<RESET>
+	 Q// function will be very long<RESET>
+	 Q// indeed. We must exceed both<RESET>
+	<CYAN>@@ -8,6 +7,11 @@<RESET> <RESET>int func()<RESET>
+	 Q// minimums<RESET>
+	 Q;<RESET>
+	 }<RESET>
+	<GREEN>+<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>int func()<RESET>
+	<GREEN>+<RESET><GREEN>{<RESET>
+	<GREEN>+<RESET>Q<GREEN>if (foo)<RESET>
+	<GREEN>+<RESET>QQ<GREEN>do_foo();<RESET>
+	 Qbaz();<RESET>
+	 Qbar();<RESET>
+	 Q// more unrelated stuff<RESET>
+	EOF
+	test_cmp expected actual &&
+
+	# And now ignoring white space only in the move detection
+	git diff --color --color-moved \
+		--color-moved-ignore-all-space \
+		--color-moved-ignore-space-change \
+		--color-moved-ignore-space-at-eol |
+		grep -v "index" |
+		test_decode_color >actual &&
+	q_to_tab <<-\EOF >expected &&
+	<BOLD>diff --git a/function.c b/function.c<RESET>
+	<BOLD>--- a/function.c<RESET>
+	<BOLD>+++ b/function.c<RESET>
+	<CYAN>@@ -1,13 +1,17 @@<RESET>
+	<GREEN>+<RESET><GREEN>int do_foo()<RESET>
+	<GREEN>+<RESET><GREEN>{<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>// this part of the function<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>// function will be very long<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>// indeed. We must exceed both<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>// per-line and number of line<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>// minimums<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>;<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>}<RESET>
+	<GREEN>+<RESET>
+	 int func()<RESET>
+	 {<RESET>
+	<RED>-Qif (foo) {<RESET>
+	<BOLD;MAGENTA>-QQ// this part of the function<RESET>
+	<BOLD;MAGENTA>-QQ// function will be very long<RESET>
+	<BOLD;MAGENTA>-QQ// indeed. We must exceed both<RESET>
+	<BOLD;MAGENTA>-QQ// per-line and number of line<RESET>
+	<BOLD;MAGENTA>-QQ// minimums<RESET>
+	<BOLD;MAGENTA>-QQ;<RESET>
+	<BOLD;MAGENTA>-Q}<RESET>
+	<GREEN>+<RESET>Q<GREEN>if (foo)<RESET>
+	<GREEN>+<RESET>QQ<GREEN>do_foo();<RESET>
+	 Qbaz();<RESET>
+	 Qbar();<RESET>
+	 Q// more unrelated stuff<RESET>
+	EOF
+	test_cmp expected actual
 '
 
 test_done
-- 
2.17.0.484.g0c8726318c-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option
  2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (5 preceding siblings ...)
  2018-04-02 22:48 ` [PATCH 6/7] diff.c: decouple white space treatment for move detection from generic option Stefan Beller
@ 2018-04-02 22:48 ` Stefan Beller
  2018-04-03  0:41   ` Jonathan Tan
  2018-04-02 23:47 ` [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Jonathan Tan
  7 siblings, 1 reply; 27+ messages in thread
From: Stefan Beller @ 2018-04-02 22:48 UTC (permalink / raw)
  To: jonathantanmy, jacob.keller, simon; +Cc: git, Stefan Beller

This marks moved code still as blocks when their indentation level
changes uniformly.

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c                     |  93 ++++++++++++++++--
 diff.h                     |   1 +
 t/t4015-diff-whitespace.sh | 191 +++++++++++++++++++++++++++++++++++++
 3 files changed, 278 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 5fe2930dca..9f969be588 100644
--- a/diff.c
+++ b/diff.c
@@ -709,6 +709,41 @@ struct moved_entry {
 	struct moved_entry *next_line;
 };
 
+struct ws_delta {
+	int deltachars;
+	char firstchar;
+};
+
+static void compute_ws_delta(const struct emitted_diff_symbol *a,
+			     const struct emitted_diff_symbol *b,
+			     struct ws_delta *out)
+{
+	int i;
+	const struct emitted_diff_symbol *longer = a->len > b->len ? a : b;
+
+
+	out->deltachars = 0;
+	out->firstchar = 0;
+
+	if (longer->len > 0 && isspace(longer->line[0]))
+		out->firstchar = longer->line[0];
+	else
+		return;
+
+	for (i = 0; i < a->len; i++)
+		if (a->line[i] == out->firstchar)
+			out->deltachars ++;
+
+	for (i = 0; i < b->len; i++)
+		if (b->line[i] == out->firstchar)
+			out->deltachars --;
+}
+
+static int compare_ws_delta(const struct ws_delta *a, const struct ws_delta *b)
+{
+	return  a->firstchar == b->firstchar && a->deltachars == b->deltachars;
+}
+
 static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 			   const void *entry,
 			   const void *entry_or_key,
@@ -717,10 +752,20 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 	const struct diff_options *diffopt = hashmap_cmp_fn_data;
 	const struct moved_entry *a = entry;
 	const struct moved_entry *b = entry_or_key;
+	unsigned flags = diffopt->color_moved & XDF_WHITESPACE_FLAGS;
+
+	if (diffopt->color_moved & COLOR_MOVED_DELTA_WHITESPACES)
+		/*
+		 * As there is not specific white space config given,
+		 * we'd need to check for a new block, so ignore all
+		 * white space. The setup of the white space
+		 * configuration for the next block is done else where
+		 */
+		flags |= XDF_IGNORE_WHITESPACE;
 
 	return !xdiff_compare_lines(a->es->line, a->es->len,
 				    b->es->line, b->es->len,
-				    diffopt->color_moved & XDF_WHITESPACE_FLAGS);
+				    flags);
 }
 
 static struct moved_entry *prepare_entry(struct diff_options *o,
@@ -770,7 +815,8 @@ static void add_lines_to_move_detection(struct diff_options *o,
 }
 
 static int shrink_potential_moved_blocks(struct moved_entry **pmb,
-					 int pmb_nr)
+					 int pmb_nr,
+					 struct ws_delta **wsd)
 {
 	int lp, rp;
 
@@ -786,6 +832,8 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 
 		if (lp < pmb_nr && rp > -1 && lp < rp) {
 			pmb[lp] = pmb[rp];
+			if (*wsd)
+				(*wsd)[lp] = (*wsd)[rp];
 			pmb[rp] = NULL;
 			rp--;
 			lp++;
@@ -835,8 +883,11 @@ static void mark_color_as_moved(struct diff_options *o,
 {
 	struct moved_entry **pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
-	int n, flipped_block = 1, block_length = 0;
 
+	struct ws_delta *wsd = NULL; /* white space deltas between pmb */
+	int wsd_alloc = 0;
+
+	int n, flipped_block = 1, block_length = 0;
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
 		struct hashmap *hm = NULL;
@@ -879,14 +930,30 @@ static void mark_color_as_moved(struct diff_options *o,
 			struct moved_entry *p = pmb[i];
 			struct moved_entry *pnext = (p && p->next_line) ?
 					p->next_line : NULL;
-			if (pnext && !hm->cmpfn(o, pnext, match, NULL)) {
-				pmb[i] = p->next_line;
+
+			if (o->color_moved & COLOR_MOVED_DELTA_WHITESPACES) {
+				struct ws_delta out;
+
+				if (pnext)
+					compute_ws_delta(l, pnext->es, &out);
+				if (pnext &&
+				    !hm->cmpfn(o, pnext, match, NULL) &&
+				    compare_ws_delta(&out, &wsd[i])) {
+					pmb[i] = p->next_line;
+					/* wsd[i] is the same */
+				} else {
+					pmb[i] = NULL;
+				}
 			} else {
-				pmb[i] = NULL;
+				if (pnext && !hm->cmpfn(o, pnext, match, NULL)) {
+					pmb[i] = p->next_line;
+				} else {
+					pmb[i] = NULL;
+				}
 			}
 		}
 
-		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
+		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr, &wsd);
 
 		if (pmb_nr == 0) {
 			/*
@@ -895,6 +962,10 @@ static void mark_color_as_moved(struct diff_options *o,
 			 */
 			for (; match; match = hashmap_get_next(hm, match)) {
 				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
+				if (o->color_moved & COLOR_MOVED_DELTA_WHITESPACES) {
+					ALLOC_GROW(wsd, pmb_nr + 1, wsd_alloc);
+					compute_ws_delta(l, match->es, &wsd[pmb_nr]);
+				}
 				pmb[pmb_nr++] = match;
 			}
 
@@ -912,6 +983,7 @@ static void mark_color_as_moved(struct diff_options *o,
 	adjust_last_block(o, n, block_length);
 
 	free(pmb);
+	free(wsd);
 }
 
 #define DIFF_SYMBOL_MOVED_LINE_ZEBRA_MASK \
@@ -4645,12 +4717,16 @@ int diff_opt_parse(struct diff_options *options,
 		options->color_moved &= ~XDF_IGNORE_WHITESPACE_CHANGE;
 	else if (!strcmp(arg, "--color-moved-no-ignore-space-at-eol"))
 		options->color_moved &= ~XDF_IGNORE_WHITESPACE_AT_EOL;
+	else if (!strcmp(arg, "--color-moved-no-ignore-space-delta"))
+		options->color_moved &= ~COLOR_MOVED_DELTA_WHITESPACES;
 	else if (!strcmp(arg, "--color-moved-ignore-all-space"))
 		options->color_moved |= XDF_IGNORE_WHITESPACE;
 	else if (!strcmp(arg, "--color-moved-ignore-space-change"))
 		options->color_moved |= XDF_IGNORE_WHITESPACE_CHANGE;
 	else if (!strcmp(arg, "--color-moved-ignore-space-at-eol"))
 		options->color_moved |= XDF_IGNORE_WHITESPACE_AT_EOL;
+	else if (!strcmp(arg, "--color-moved-ignore-space-delta"))
+		options->color_moved |= COLOR_MOVED_DELTA_WHITESPACES;
 	else if (!strcmp(arg, "--indent-heuristic"))
 		DIFF_XDL_SET(options, INDENT_HEURISTIC);
 	else if (!strcmp(arg, "--no-indent-heuristic"))
@@ -5555,6 +5631,9 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
 			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
 
+			if (o->color_moved & COLOR_MOVED_DELTA_WHITESPACES)
+				o->color_moved |= XDF_IGNORE_WHITESPACE;
+
 			add_lines_to_move_detection(o, &add_lines, &del_lines);
 			mark_color_as_moved(o, &add_lines, &del_lines);
 			if (o->color_moved & COLOR_MOVED_DIMMED_BLOCKS)
diff --git a/diff.h b/diff.h
index 9542017986..b8c0cf1232 100644
--- a/diff.h
+++ b/diff.h
@@ -212,6 +212,7 @@ struct diff_options {
 	#define COLOR_MOVED_FIND_BLOCKS		(1 << 1)
 	/* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
 
+	#define COLOR_MOVED_DELTA_WHITESPACES	(1 << 22)
 	#define COLOR_MOVED_DIMMED_BLOCKS	(1 << 23)
 
 	#define COLOR_MOVED_DEFAULT (COLOR_MOVED_ENABLED | COLOR_MOVED_FIND_BLOCKS)
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 38aaf4c46c..246ccadf9b 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1861,4 +1861,195 @@ test_expect_success 'move detection only ignores white spaces' '
 	test_cmp expected actual
 '
 
+test_expect_success 'compare whitespace delta across moved blocks' '
+
+	git reset --hard &&
+	q_to_tab <<-\EOF >text.txt &&
+	QIndented
+	QText
+	Qacross
+	Qfive
+	Qlines
+	QBut!
+	Qthis
+	QQone
+	Qline
+	QQdid
+	Qnot
+	QQadjust
+	EOF
+
+	git add text.txt &&
+	git commit -m "add text.txt" &&
+
+	q_to_tab <<-\EOF >text.txt &&
+	QQIndented
+	QQText
+	QQacross
+	QQfive
+	QQlines
+	QQQBut!
+	QQthis
+	QQQone
+	QQline
+	QQQdid
+	QQnot
+	QQQadjust
+	EOF
+
+	git diff --color --color-moved-ignore-space-delta |
+		grep -v "index" |
+		test_decode_color >actual &&
+
+	q_to_tab <<-\EOF >expected &&
+	<BOLD>diff --git a/text.txt b/text.txt<RESET>
+	<BOLD>--- a/text.txt<RESET>
+	<BOLD>+++ b/text.txt<RESET>
+	<CYAN>@@ -1,12 +1,12 @@<RESET>
+	<BOLD;MAGENTA>-QIndented<RESET>
+	<BOLD;MAGENTA>-QText<RESET>
+	<BOLD;MAGENTA>-Qacross<RESET>
+	<BOLD;MAGENTA>-Qfive<RESET>
+	<BOLD;MAGENTA>-Qlines<RESET>
+	<RED>-QBut!<RESET>
+	<BOLD;MAGENTA>-Qthis<RESET>
+	<BOLD;MAGENTA>-QQone<RESET>
+	<BOLD;MAGENTA>-Qline<RESET>
+	<BOLD;MAGENTA>-QQdid<RESET>
+	<BOLD;MAGENTA>-Qnot<RESET>
+	<BOLD;MAGENTA>-QQadjust<RESET>
+	<BOLD;YELLOW>+<RESET>QQ<BOLD;YELLOW>Indented<RESET>
+	<BOLD;YELLOW>+<RESET>QQ<BOLD;YELLOW>Text<RESET>
+	<BOLD;YELLOW>+<RESET>QQ<BOLD;YELLOW>across<RESET>
+	<BOLD;YELLOW>+<RESET>QQ<BOLD;YELLOW>five<RESET>
+	<BOLD;YELLOW>+<RESET>QQ<BOLD;YELLOW>lines<RESET>
+	<GREEN>+<RESET>QQQ<GREEN>But!<RESET>
+	<BOLD;YELLOW>+<RESET>QQ<BOLD;YELLOW>this<RESET>
+	<BOLD;YELLOW>+<RESET>QQQ<BOLD;YELLOW>one<RESET>
+	<BOLD;YELLOW>+<RESET>QQ<BOLD;YELLOW>line<RESET>
+	<BOLD;YELLOW>+<RESET>QQQ<BOLD;YELLOW>did<RESET>
+	<BOLD;YELLOW>+<RESET>QQ<BOLD;YELLOW>not<RESET>
+	<BOLD;YELLOW>+<RESET>QQQ<BOLD;YELLOW>adjust<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
+test_expect_success 'compare whitespace delta across moved blocks with multiple indentation levels' '
+	# alternative: "python programmers would love the move detection, too"
+
+	git reset --hard &&
+	q_to_tab <<-\EOF >test.py &&
+	class test:
+	Qdef f(x):
+	QQ"""
+	QQA simple python function
+	QQthat returns the square of a number
+	QQAlthough it may not be pythonic
+	QQ"""
+	QQdef g(x):
+	QQQ"""
+	QQQNested function that returns the same number
+	QQQWe just multiply by 1.0
+	QQQso we can write a comment about it
+	QQQas we need longer blocks
+	QQQ"""
+	QQQreturn 1.0 * x
+	QQ# Another comment for f(x)
+	QQ# also spanning multiple lines
+	QQ# to make a block
+	QQreturn g(x) * g(x)
+	Qdef h(x):
+	QQ# Another function unrelated to the previous
+	QQ# but building a block,
+	QQ# long enough to call it a block
+	QQreturn x * 1.0
+	EOF
+
+	git add test.py &&
+	git commit -m "add test.py" &&
+
+	q_to_tab <<-\EOF >test.py &&
+	class test:
+	Qdef h(x):
+	QQ# Another function unrelated to the previous
+	QQ# but building a block,
+	QQ# long enough to call it a block
+	QQreturn x * 1.0
+	def f(x):
+	Q"""
+	QA simple python function
+	Qthat returns the square of a number
+	QAlthough it may not be pythonic
+	Q"""
+	Qdef g(x):
+	QQ"""
+	QQNested function that returns the same number
+	QQWe just multiply by 1.0
+	QQso we can write a comment about it
+	QQas we need longer blocks
+	QQ"""
+	QQreturn 1.0 * x
+	Q# Another comment for f(x)
+	Q# also spanning multiple lines
+	Q# to make a block
+	Qreturn g(x) * g(x)
+	EOF
+
+	git diff --color --color-moved-ignore-space-delta |
+		grep -v "index" |
+		test_decode_color >actual &&
+
+	q_to_tab <<-\EOF >expected &&
+	<BOLD>diff --git a/test.py b/test.py<RESET>
+	<BOLD>--- a/test.py<RESET>
+	<BOLD>+++ b/test.py<RESET>
+	<CYAN>@@ -1,24 +1,24 @@<RESET>
+	 class test:<RESET>
+	<BOLD;MAGENTA>-Qdef f(x):<RESET>
+	<BOLD;MAGENTA>-QQ"""<RESET>
+	<BOLD;MAGENTA>-QQA simple python function<RESET>
+	<BOLD;MAGENTA>-QQthat returns the square of a number<RESET>
+	<BOLD;MAGENTA>-QQAlthough it may not be pythonic<RESET>
+	<BOLD;MAGENTA>-QQ"""<RESET>
+	<BOLD;MAGENTA>-QQdef g(x):<RESET>
+	<BOLD;MAGENTA>-QQQ"""<RESET>
+	<BOLD;MAGENTA>-QQQNested function that returns the same number<RESET>
+	<BOLD;MAGENTA>-QQQWe just multiply by 1.0<RESET>
+	<BOLD;MAGENTA>-QQQso we can write a comment about it<RESET>
+	<BOLD;MAGENTA>-QQQas we need longer blocks<RESET>
+	<BOLD;MAGENTA>-QQQ"""<RESET>
+	<BOLD;MAGENTA>-QQQreturn 1.0 * x<RESET>
+	<BOLD;MAGENTA>-QQ# Another comment for f(x)<RESET>
+	<BOLD;MAGENTA>-QQ# also spanning multiple lines<RESET>
+	<BOLD;MAGENTA>-QQ# to make a block<RESET>
+	<BOLD;MAGENTA>-QQreturn g(x) * g(x)<RESET>
+	 Qdef h(x):<RESET>
+	 QQ# Another function unrelated to the previous<RESET>
+	 QQ# but building a block,<RESET>
+	 QQ# long enough to call it a block<RESET>
+	 QQreturn x * 1.0<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>def f(x):<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>"""<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>A simple python function<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>that returns the square of a number<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>Although it may not be pythonic<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>"""<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>def g(x):<RESET>
+	<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>"""<RESET>
+	<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>Nested function that returns the same number<RESET>
+	<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>We just multiply by 1.0<RESET>
+	<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>so we can write a comment about it<RESET>
+	<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>as we need longer blocks<RESET>
+	<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>"""<RESET>
+	<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>return 1.0 * x<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN># Another comment for f(x)<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN># also spanning multiple lines<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN># to make a block<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>return g(x) * g(x)<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_done
-- 
2.17.0.484.g0c8726318c-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation
  2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
                   ` (6 preceding siblings ...)
  2018-04-02 22:48 ` [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option Stefan Beller
@ 2018-04-02 23:47 ` Jonathan Tan
  2018-04-03  0:03   ` Jacob Keller
  7 siblings, 1 reply; 27+ messages in thread
From: Jonathan Tan @ 2018-04-02 23:47 UTC (permalink / raw)
  To: Stefan Beller; +Cc: jacob.keller, simon, git

On Mon,  2 Apr 2018 15:48:47 -0700
Stefan Beller <sbeller@google.com> wrote:

> This is a re-attempt of [1], which allows the moved code detection to
> ignore blanks in various modes.
> 
> patches 1-5 are refactoring, patch 6 adds all existing white space options
> of regular diff to the move detection. (I am unsure about this patch,
> as I presume we want to keep the option space at a minimum if possible).

My preference is to not do this until a need has been demonstrated, but
this sounds like it could be useful one day. I'll review the patches
from the viewpoint that we do want this feature.

> The fun is in the last patch, which allows white space sensitive
> languages to trust the move detection, too. Each block that is marked as
> moved will have the same delta in {in-, de-}dentation.
> I would think this mode might be a reasonable default eventually.

This sounds like a good idea. "Trust" is probably too strong a word, but
I can see this being useful even in non-whitespace-sensitive languages
with nested blocks (like C).

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/7] diff.c: refactor internal representation for coloring moved code
  2018-04-02 22:48 ` [PATCH 5/7] diff.c: refactor internal representation for coloring moved code Stefan Beller
@ 2018-04-02 23:51   ` Jonathan Tan
  2018-04-03 18:59     ` Stefan Beller
  2018-04-06 21:28     ` Stefan Beller
  2018-04-03 19:39   ` Ævar Arnfjörð Bjarmason
  1 sibling, 2 replies; 27+ messages in thread
From: Jonathan Tan @ 2018-04-02 23:51 UTC (permalink / raw)
  To: Stefan Beller; +Cc: jacob.keller, simon, git

On Mon,  2 Apr 2018 15:48:52 -0700
Stefan Beller <sbeller@google.com> wrote:

> At the time the move coloring was implemented we thought an enum of modes
> is the best to configure this feature.  However as we want to tack on new
> features, the enum would grow exponentially.
> 
> Refactor the code such that features are enabled via bits. Currently we can
> * activate the move detection,
> * enable the block detection on top, and
> * enable the dimming inside a block, though this could be done without
>   block detection as well (mode "plain, dimmed")

Firstly, patches 1-4 are obviously correct.

As for this patch, I don't think that using flags is the right way to do
this. We are not under any size pressure for struct diff_options, and
the additional options that we plan to add (color-moved-whitespace-flags
and ignore-space-delta) can easily be additional fields instead.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation
  2018-04-02 23:47 ` [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Jonathan Tan
@ 2018-04-03  0:03   ` Jacob Keller
  2018-04-03 19:00     ` Stefan Beller
  0 siblings, 1 reply; 27+ messages in thread
From: Jacob Keller @ 2018-04-03  0:03 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Stefan Beller, Simon Ruderich, Git mailing list

On Mon, Apr 2, 2018 at 4:47 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Mon,  2 Apr 2018 15:48:47 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
>> This is a re-attempt of [1], which allows the moved code detection to
>> ignore blanks in various modes.
>>
>> patches 1-5 are refactoring, patch 6 adds all existing white space options
>> of regular diff to the move detection. (I am unsure about this patch,
>> as I presume we want to keep the option space at a minimum if possible).
>
> My preference is to not do this until a need has been demonstrated, but
> this sounds like it could be useful one day. I'll review the patches
> from the viewpoint that we do want this feature.
>
>> The fun is in the last patch, which allows white space sensitive
>> languages to trust the move detection, too. Each block that is marked as
>> moved will have the same delta in {in-, de-}dentation.
>> I would think this mode might be a reasonable default eventually.
>
> This sounds like a good idea. "Trust" is probably too strong a word, but
> I can see this being useful even in non-whitespace-sensitive languages
> with nested blocks (like C).

The ability to detect moved code despite whitespace changes would be
good, even while showing diffs with the whitespace intact.

We may not need *all* the options available, but allowing the internal
settings to enable this but have the user-visible controls be limited
should be totally fine.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/7] diff.c: decouple white space treatment for move detection from generic option
  2018-04-02 22:48 ` [PATCH 6/7] diff.c: decouple white space treatment for move detection from generic option Stefan Beller
@ 2018-04-03  0:31   ` Jonathan Tan
  0 siblings, 0 replies; 27+ messages in thread
From: Jonathan Tan @ 2018-04-03  0:31 UTC (permalink / raw)
  To: Stefan Beller; +Cc: jacob.keller, simon, git

On Mon,  2 Apr 2018 15:48:53 -0700
Stefan Beller <sbeller@google.com> wrote:

> In the original implementation of the move detection logic we assumed that
> the choice for ignoring white space changes is the same for the move
> detection as it is for the generic diff.  It turns out this is wrong.

I don't think we can say that this is wrong - just that we decided that
it would be useful to allow different whitespace handling methods when
calculating the diff and when detecting moves.

> There are a couple of things where the user wants to input their
> decision into the diff machinery:
> 
> * Whether or not to ignore white space for the general diff detection.
>   This is covered with the -w, -b options already.
> * Whether the move detection ought to pay attention to white spaces
>   in general. And if so, how are white spaces handled for the block
>   detection.
> 
> One example would be reviewing a current patch under discussion, that moves
> code around.  In that case, you may want to have the general diff machinery
> not ignore the white spaces (i.e. exact matching), as that will show you
> the diff as the patch sent. The code moved however may have another
> indentation level, such that you want to ignore white spaces for the move
> detection. The code may be in python, such that spaces matter for program
> flow, though. That is why you'd want each block to have the same change
> in white space.
> 
> This patch decouples white space treatment in the general diff machinery
> from the white space treatment in the move detection.
> 
> This adds all the the options for ignoring white space that the regular
> diff machinery has to the move detection, such that we can cover all the
> cases that were introduced in 7a55427094 (Merge branch
> 'jk/diff-color-moved-fix', 2017-11-06).

I would just write the above as follows:

  Allow the user to specify that whitespace should be ignored
  differently during detection of moved lines than during generation of
  added and removed lines. This is done by providing analogs to the
  --ignore-space-at-eol, -b, and -w options (namely,
  --color-moved-[no-]ignore-space-at-eol <fill in the rest>) that affect
  only the color of the output, and making the existing
  --ignore-space-at-eol, -b, and -w options no longer affect the color
  of the output.

(And if the above is not clear enough that this is a
backwards-incompatible change, we might need to call it out.)

> The example from above (different lines in the diff with -w for the regular
> diff) is covered in a test. Convert the existing tests to be more explicit
> on their choice of white space behavior in the move detection as the tests
> should not fail if the default is changed.

This statement is confusing to me - of course the tests should fail,
since we changed the defaults. I think it is sufficient to just mention
that whitespace handling has to be explicitly specified for the move
detection part.

> +--color-moved-[no-]ignore-all-space::
> +	Ignore whitespace when comparing lines when performing the move
> +	detection for --color-moved.  This ignores differences even if
> +	one line has whitespace where the other line has none.
> +--color-moved-[no-]ignore-space-change::
> +	Ignore changes in amount of whitespace when performing the move
> +	detection for --color-moved.  This ignores whitespace
> +	at line end, and considers all other sequences of one or
> +	more whitespace characters to be equivalent.
> +--color-moved-[no-]ignore-space-at-eol::
> +	Ignore changes in whitespace at EOL when performing the move
> +	detection for --color-moved.

Optional: I would reorder this to be in the same order as the analogous
options (--ignore-space-at-eol first, then -b, then -w).

> @@ -720,7 +720,7 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
>  
>  	return !xdiff_compare_lines(a->es->line, a->es->len,
>  				    b->es->line, b->es->len,
> -				    diffopt->xdl_opts);
> +				    diffopt->color_moved & XDF_WHITESPACE_FLAGS);
>  }
>  
>  static struct moved_entry *prepare_entry(struct diff_options *o,
> @@ -728,8 +728,9 @@ static struct moved_entry *prepare_entry(struct diff_options *o,
>  {
>  	struct moved_entry *ret = xmalloc(sizeof(*ret));
>  	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
> +	unsigned flags = o->color_moved & XDF_WHITESPACE_FLAGS;
>  
> -	ret->ent.hash = xdiff_hash_string(l->line, l->len, o->xdl_opts);
> +	ret->ent.hash = xdiff_hash_string(l->line, l->len, flags);
>  	ret->es = l;
>  	ret->next_line = NULL;

These 2 changes looked suspicious to me at first, but then I saw that
moved_entry_cmp() and prepare_entry() are only used during move
detection.

> @@ -1419,7 +1422,10 @@ test_expect_success 'move detection ignoring whitespace ' '
>  	EOF
>  	test_cmp expected actual &&
>  
> -	git diff HEAD --no-renames -w --color-moved --color |
> +	git diff HEAD --no-renames --color-moved --color \
> +		--color-moved-ignore-all-space \
> +		--color-moved-no-ignore-space-change \
> +		--color-moved-no-ignore-space-at-eol |
>  		grep -v "index" |
>  		test_decode_color >actual &&
>  	cat <<-\EOF >expected &&

This change (removal of -w) looked suspicious to me at first, but then I
saw that the "-w" was only used to trigger whitespace insensitivity
during the move detection phase. (The line in question was moved, so "-"
and "+" lines would have been generated in the diff anyway regardless of
whether whitespace was ignored.) Same as to the other test
modifications.

> +test_expect_success 'move detection only ignores white spaces' '

This seems to be "only move detection ignores white spaces", not as
written. I would title this "--color-moved-ignore-all-space only ignores
whitespace during move detection".

> +	q_to_tab <<-\EOF >function.c &&
> +	int func()
> +	{
> +	Qif (foo) {
> +	QQ// this part of the function
> +	QQ// function will be very long
> +	QQ// indeed. We must exceed both
> +	QQ// per-line and number of line
> +	QQ// minimums
> +	QQ;
> +	Q}
> +	Qbaz();
> +	Qbar();
> +	Q// more unrelated stuff
> +	}
> +	EOF

Could the sample data be more concise? In particular, a few lines should
be sufficient for before:

  a long line to exceed per-line minimum
  another long line to exceed per-line minimum
  original file

and after:

  Qa long line to exceed per-line minimum
  Qanother long line to exceed per-line minimum
  new file

Then the ordinary diff (with -w) will only have "-" and "+" for
"original file" and "new file", and the one with only
--color-moved-ignore-all-space would have "-" and "+" for all lines.

> +	# Make sure we get a different diff using -w ("moved function header")

-w is not "moved function header", maybe reword.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option
  2018-04-02 22:48 ` [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option Stefan Beller
@ 2018-04-03  0:41   ` Jonathan Tan
  2018-04-03 19:22     ` Stefan Beller
  0 siblings, 1 reply; 27+ messages in thread
From: Jonathan Tan @ 2018-04-03  0:41 UTC (permalink / raw)
  To: Stefan Beller; +Cc: jacob.keller, simon, git

On Mon,  2 Apr 2018 15:48:54 -0700
Stefan Beller <sbeller@google.com> wrote:

> +struct ws_delta {
> +	int deltachars;
> +	char firstchar;
> +};

I'll just make some overall design comments.

Shouldn't this be a string of characters (or a char* and len) and
whether it was added or removed? If you're only checking the first
character, this would not work if the other characters were different.

> @@ -717,10 +752,20 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
>  	const struct diff_options *diffopt = hashmap_cmp_fn_data;
>  	const struct moved_entry *a = entry;
>  	const struct moved_entry *b = entry_or_key;
> +	unsigned flags = diffopt->color_moved & XDF_WHITESPACE_FLAGS;
> +
> +	if (diffopt->color_moved & COLOR_MOVED_DELTA_WHITESPACES)
> +		/*
> +		 * As there is not specific white space config given,
> +		 * we'd need to check for a new block, so ignore all
> +		 * white space. The setup of the white space
> +		 * configuration for the next block is done else where
> +		 */
> +		flags |= XDF_IGNORE_WHITESPACE;
>  
>  	return !xdiff_compare_lines(a->es->line, a->es->len,
>  				    b->es->line, b->es->len,
> -				    diffopt->color_moved & XDF_WHITESPACE_FLAGS);
> +				    flags);
>  }

I think we should just prohibit combining this with any of the
whitespace ignoring flags except for the space-at-eol one. They seem to
contradict anyway.

> +test_expect_success 'compare whitespace delta across moved blocks' '
> +
> +	git reset --hard &&
> +	q_to_tab <<-\EOF >text.txt &&
> +	QIndented
> +	QText
> +	Qacross
> +	Qfive
> +	Qlines
> +	QBut!
> +	Qthis
> +	QQone
> +	Qline
> +	QQdid
> +	Qnot
> +	QQadjust
> +	EOF

Do we need 5 lines? I thought 2 would suffice. (It's the ALNUM_COUNT
that matters, as far as I know.) This makes it hard to see that the
"But!" line is the one that counts.

> +test_expect_success 'compare whitespace delta across moved blocks with multiple indentation levels' '
> +	# alternative: "python programmers would love the move detection, too"
> +
> +	git reset --hard &&
> +	q_to_tab <<-\EOF >test.py &&
> +	class test:
> +	Qdef f(x):
> +	QQ"""
> +	QQA simple python function
> +	QQthat returns the square of a number
> +	QQAlthough it may not be pythonic
> +	QQ"""
> +	QQdef g(x):
> +	QQQ"""
> +	QQQNested function that returns the same number
> +	QQQWe just multiply by 1.0
> +	QQQso we can write a comment about it
> +	QQQas we need longer blocks
> +	QQQ"""
> +	QQQreturn 1.0 * x
> +	QQ# Another comment for f(x)
> +	QQ# also spanning multiple lines
> +	QQ# to make a block
> +	QQreturn g(x) * g(x)
> +	Qdef h(x):
> +	QQ# Another function unrelated to the previous
> +	QQ# but building a block,
> +	QQ# long enough to call it a block
> +	QQreturn x * 1.0
> +	EOF

If the objective it just to show that the functions f and g are treated
as one unit despite their lines being of multiple indentation levels,
the test file could be much shorter.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/7] diff.c: refactor internal representation for coloring moved code
  2018-04-02 23:51   ` Jonathan Tan
@ 2018-04-03 18:59     ` Stefan Beller
  2018-04-06 21:28     ` Stefan Beller
  1 sibling, 0 replies; 27+ messages in thread
From: Stefan Beller @ 2018-04-03 18:59 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Jacob Keller, Simon Ruderich, git

On Mon, Apr 2, 2018 at 4:51 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Mon,  2 Apr 2018 15:48:52 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
>> At the time the move coloring was implemented we thought an enum of modes
>> is the best to configure this feature.  However as we want to tack on new
>> features, the enum would grow exponentially.
>>
>> Refactor the code such that features are enabled via bits. Currently we can
>> * activate the move detection,
>> * enable the block detection on top, and
>> * enable the dimming inside a block, though this could be done without
>>   block detection as well (mode "plain, dimmed")
>
> Firstly, patches 1-4 are obviously correct.
>
> As for this patch, I don't think that using flags is the right way to do
> this. We are not under any size pressure for struct diff_options, and
> the additional options that we plan to add (color-moved-whitespace-flags
> and ignore-space-delta) can easily be additional fields instead.

Gah! I'll give it a try.

When I came to the conclusion that the features of this series is
orthogonal to the existing code, bit fields came first to mind.

Let's see if the alternative is easier to read.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation
  2018-04-03  0:03   ` Jacob Keller
@ 2018-04-03 19:00     ` Stefan Beller
  2018-04-03 19:55       ` Jacob Keller
  0 siblings, 1 reply; 27+ messages in thread
From: Stefan Beller @ 2018-04-03 19:00 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Jonathan Tan, Simon Ruderich, Git mailing list

>>> The fun is in the last patch, which allows white space sensitive
>>> languages to trust the move detection, too. Each block that is marked as
>>> moved will have the same delta in {in-, de-}dentation.
>>> I would think this mode might be a reasonable default eventually.
>>
>> This sounds like a good idea. "Trust" is probably too strong a word, but
>> I can see this being useful even in non-whitespace-sensitive languages
>> with nested blocks (like C).
>
> The ability to detect moved code despite whitespace changes would be
> good, even while showing diffs with the whitespace intact.

That is what the last patch is about.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option
  2018-04-03  0:41   ` Jonathan Tan
@ 2018-04-03 19:22     ` Stefan Beller
  2018-04-03 20:38       ` Jonathan Tan
  0 siblings, 1 reply; 27+ messages in thread
From: Stefan Beller @ 2018-04-03 19:22 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Jacob Keller, Simon Ruderich, git

On Mon, Apr 2, 2018 at 5:41 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Mon,  2 Apr 2018 15:48:54 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
>> +struct ws_delta {
>> +     int deltachars;
>> +     char firstchar;
>> +};
>
> I'll just make some overall design comments.
>
> Shouldn't this be a string of characters (or a char* and len) and
> whether it was added or removed? If you're only checking the first
> character, this would not work if the other characters were different.

I considered diving into this, but it seemed to be too complicated for
>95 % of the use cases, which can be approximated in change of the
first character.

Because if we take a string of characters, we'd also need to take care of
tricky conversions (e.g. Are 8 white spaces equal to a tab, and if so do
we break blocks if one line converts 8 ws to a tab?)

So I would definitely pursue the string instead of change of first
character, but what are all the heuristics to put in?

Just to be clear: The string would contain only the change in
white space up front, or would we also somehow store white space
in other parts?

- # This is a sample comment
- # across multiple lines
- # maybe even a license header
+ #     This is a sample comment
+ #     across multiple lines
+ #     maybe even a license header

How about this?


>
>> @@ -717,10 +752,20 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
>>       const struct diff_options *diffopt = hashmap_cmp_fn_data;
>>       const struct moved_entry *a = entry;
>>       const struct moved_entry *b = entry_or_key;
>> +     unsigned flags = diffopt->color_moved & XDF_WHITESPACE_FLAGS;
>> +
>> +     if (diffopt->color_moved & COLOR_MOVED_DELTA_WHITESPACES)
>> +             /*
>> +              * As there is not specific white space config given,
>> +              * we'd need to check for a new block, so ignore all
>> +              * white space. The setup of the white space
>> +              * configuration for the next block is done else where
>> +              */
>> +             flags |= XDF_IGNORE_WHITESPACE;
>>
>>       return !xdiff_compare_lines(a->es->line, a->es->len,
>>                                   b->es->line, b->es->len,
>> -                                 diffopt->color_moved & XDF_WHITESPACE_FLAGS);
>> +                                 flags);
>>  }
>
> I think we should just prohibit combining this with any of the
> whitespace ignoring flags except for the space-at-eol one. They seem to
> contradict anyway.

ok, we can narrow this one down to ignore all white space.

>
>> +test_expect_success 'compare whitespace delta across moved blocks' '
>> +
>> +     git reset --hard &&
>> +     q_to_tab <<-\EOF >text.txt &&
>> +     QIndented
>> +     QText
>> +     Qacross
>> +     Qfive
>> +     Qlines
>> +     QBut!
>> +     Qthis
>> +     QQone
>> +     Qline
>> +     QQdid
>> +     Qnot
>> +     QQadjust
>> +     EOF
>
> Do we need 5 lines? I thought 2 would suffice. (It's the ALNUM_COUNT
> that matters, as far as I know.) This makes it hard to see that the
> "But!" line is the one that counts.

I did not want to go with the bare minimum as then adjusting the minimum
would be a pain as these unrelated (to the minimum) test cases would
break.

>
>> +test_expect_success 'compare whitespace delta across moved blocks with multiple indentation levels' '

>> +     EOF
>
> If the objective it just to show that the functions f and g are treated
> as one unit despite their lines being of multiple indentation levels,
> the test file could be much shorter.

yeah, I noticed that we already test that in the test above where we
have that test after the "But!", where lines ziggy-zag. Will drop this test.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/7] diff.c: refactor internal representation for coloring moved code
  2018-04-02 22:48 ` [PATCH 5/7] diff.c: refactor internal representation for coloring moved code Stefan Beller
  2018-04-02 23:51   ` Jonathan Tan
@ 2018-04-03 19:39   ` Ævar Arnfjörð Bjarmason
  2018-04-03 19:49     ` Stefan Beller
  1 sibling, 1 reply; 27+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-04-03 19:39 UTC (permalink / raw)
  To: Stefan Beller; +Cc: jonathantanmy, jacob.keller, simon, git


On Mon, Apr 02 2018, Stefan Beller wrote:

> At the time the move coloring was implemented we thought an enum of modes
> is the best to configure this feature.  However as we want to tack on new
> features, the enum would grow exponentially.
>
> Refactor the code such that features are enabled via bits. Currently we can
> * activate the move detection,
> * enable the block detection on top, and
> * enable the dimming inside a block, though this could be done without
>   block detection as well (mode "plain, dimmed")
>
> Choose the flags to not be at bit position 2,3,4 as the next patch
> will occupy these.

When I've been playing with colorMoved the thing I've found really
confusing is that the current config has confused two completely
unrelated things (at least, from a user perspective), what underlying
algorithm you use, and how the colors look.

I was helping someone at work the other day where they were trying:

    git -c color.diff.new="green bold" \
        -c color.diff.old="red bold" \
        -c color.diff.newMoved="green" \
        -c color.diff.oldMoved="red" \
        -c diff.colorMoved=plain show <commit>

But what gave better results was:

    git -c color.diff.new="green bold" \
        -c color.diff.old="red bold" \
        -c color.diff.newMoved="green" \
        -c color.diff.oldMoved="red" \
        -c diff.colorMoved=zebra \
        -c color.diff.oldMovedAlternative=red \
        -c color.diff.newMovedAlternative=green show <commit>

I don't have a public test commit to share (sorry), but I have an
internal example where "plain" will consider a thing as falling under
color.diff.old OR color.diff.oldMoved, but zebra will consider that
whole part only color.diff.old.

I see now that that might be since only the "zebra" supports the
*Alternative that it ends up "stealing" chunks from something that would
have otherwise been classified differently, so I have no idea if there's
an easy "solution", or if it's even a problem.

Sorry about being vague, I just dug this up from some old notes now
after this patch jolted my memory about it.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/7] diff.c: refactor internal representation for coloring moved code
  2018-04-03 19:39   ` Ævar Arnfjörð Bjarmason
@ 2018-04-03 19:49     ` Stefan Beller
  2018-04-03 20:44       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 27+ messages in thread
From: Stefan Beller @ 2018-04-03 19:49 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jonathan Tan, Jacob Keller, Simon Ruderich, git

On Tue, Apr 3, 2018 at 12:39 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Mon, Apr 02 2018, Stefan Beller wrote:
>
>> At the time the move coloring was implemented we thought an enum of modes
>> is the best to configure this feature.  However as we want to tack on new
>> features, the enum would grow exponentially.
>>
>> Refactor the code such that features are enabled via bits. Currently we can
>> * activate the move detection,
>> * enable the block detection on top, and
>> * enable the dimming inside a block, though this could be done without
>>   block detection as well (mode "plain, dimmed")
>>
>> Choose the flags to not be at bit position 2,3,4 as the next patch
>> will occupy these.
>
> When I've been playing with colorMoved the thing I've found really
> confusing is that the current config has confused two completely
> unrelated things (at least, from a user perspective), what underlying
> algorithm you use, and how the colors look.

Not sure I follow. The colors are in color.diff.X and the algorithm is in
diff.colorMoved, whereas some colors are reused for different algorithms?

>
> I was helping someone at work the other day where they were trying:
>
>     git -c color.diff.new="green bold" \
>         -c color.diff.old="red bold" \
>         -c color.diff.newMoved="green" \
>         -c color.diff.oldMoved="red" \
>         -c diff.colorMoved=plain show <commit>
>
> But what gave better results was:
>
>     git -c color.diff.new="green bold" \
>         -c color.diff.old="red bold" \
>         -c color.diff.newMoved="green" \
>         -c color.diff.oldMoved="red" \
>         -c diff.colorMoved=zebra \
>         -c color.diff.oldMovedAlternative=red \
>         -c color.diff.newMovedAlternative=green show <commit>
>
> I don't have a public test commit to share (sorry), but I have an
> internal example where "plain" will consider a thing as falling under
> color.diff.old OR color.diff.oldMoved, but zebra will consider that
> whole part only color.diff.old.

What do you mean by "OR" ?
Is the hunk present multiple times and colored one or the other way?
Is it colored differently in different invocations of Git?
Is one hunk mixing up both colors?

Is the hunk "small" ?
small hunks are un-colored, to avoid showing empty lines
or closing braces as moved. But plain mode ignores this heuristic.

> I see now that that might be since only the "zebra" supports the
> *Alternative that it ends up "stealing" chunks from something that would
> have otherwise been classified differently, so I have no idea if there's
> an easy "solution", or if it's even a problem.

Can you describe the issue more to see if it is a problem?
(It sounds like a problem in the documentation/UX to me already
as the docs could not tell you what to expect)

> Sorry about being vague, I just dug this up from some old notes now
> after this patch jolted my memory about it.

ok.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation
  2018-04-03 19:00     ` Stefan Beller
@ 2018-04-03 19:55       ` Jacob Keller
  0 siblings, 0 replies; 27+ messages in thread
From: Jacob Keller @ 2018-04-03 19:55 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jonathan Tan, Simon Ruderich, Git mailing list

On Tue, Apr 3, 2018 at 12:00 PM, Stefan Beller <sbeller@google.com> wrote:
>>>> The fun is in the last patch, which allows white space sensitive
>>>> languages to trust the move detection, too. Each block that is marked as
>>>> moved will have the same delta in {in-, de-}dentation.
>>>> I would think this mode might be a reasonable default eventually.
>>>
>>> This sounds like a good idea. "Trust" is probably too strong a word, but
>>> I can see this being useful even in non-whitespace-sensitive languages
>>> with nested blocks (like C).
>>
>> The ability to detect moved code despite whitespace changes would be
>> good, even while showing diffs with the whitespace intact.
>
> That is what the last patch is about.

Right. I was trying to say "I agree with the goal, even if we don't
necessarily allow every possible white-space + color-moved lines
combination" (i.e. to avoid polluting the option space too much)

Thanks,
Jake

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option
  2018-04-03 19:22     ` Stefan Beller
@ 2018-04-03 20:38       ` Jonathan Tan
  0 siblings, 0 replies; 27+ messages in thread
From: Jonathan Tan @ 2018-04-03 20:38 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jacob Keller, Simon Ruderich, git

On Tue, 3 Apr 2018 12:22:32 -0700
Stefan Beller <sbeller@google.com> wrote:

> On Mon, Apr 2, 2018 at 5:41 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> > On Mon,  2 Apr 2018 15:48:54 -0700
> > Stefan Beller <sbeller@google.com> wrote:
> >
> >> +struct ws_delta {
> >> +     int deltachars;
> >> +     char firstchar;
> >> +};
> >
> > I'll just make some overall design comments.
> >
> > Shouldn't this be a string of characters (or a char* and len) and
> > whether it was added or removed? If you're only checking the first
> > character, this would not work if the other characters were different.
> 
> I considered diving into this, but it seemed to be too complicated for
> >95 % of the use cases, which can be approximated in change of the
> first character.

It's true that most use cases can be approximated this way, but I don't
think that it's worth the approximation.

> Because if we take a string of characters, we'd also need to take care of
> tricky conversions (e.g. Are 8 white spaces equal to a tab, and if so do
> we break blocks if one line converts 8 ws to a tab?)

No conversions - spaces are spaces and tabs are tabs.

> So I would definitely pursue the string instead of change of first
> character, but what are all the heuristics to put in?

No heuristics - a few lines make a block if the same prefix (which
consists of all whitespace) was added or removed.

> Just to be clear: The string would contain only the change in
> white space up front, or would we also somehow store white space
> in other parts?

Only change in white space at the start of the line - this option only
handles space at the start of the line, right?

> - # This is a sample comment
> - # across multiple lines
> - # maybe even a license header
> + #     This is a sample comment
> + #     across multiple lines
> + #     maybe even a license header
> 
> How about this?

My understanding is that this patch does not handle this case.

> >> @@ -717,10 +752,20 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
> >>       const struct diff_options *diffopt = hashmap_cmp_fn_data;
> >>       const struct moved_entry *a = entry;
> >>       const struct moved_entry *b = entry_or_key;
> >> +     unsigned flags = diffopt->color_moved & XDF_WHITESPACE_FLAGS;
> >> +
> >> +     if (diffopt->color_moved & COLOR_MOVED_DELTA_WHITESPACES)
> >> +             /*
> >> +              * As there is not specific white space config given,
> >> +              * we'd need to check for a new block, so ignore all
> >> +              * white space. The setup of the white space
> >> +              * configuration for the next block is done else where
> >> +              */
> >> +             flags |= XDF_IGNORE_WHITESPACE;
> >>
> >>       return !xdiff_compare_lines(a->es->line, a->es->len,
> >>                                   b->es->line, b->es->len,
> >> -                                 diffopt->color_moved & XDF_WHITESPACE_FLAGS);
> >> +                                 flags);
> >>  }
> >
> > I think we should just prohibit combining this with any of the
> > whitespace ignoring flags except for the space-at-eol one. They seem to
> > contradict anyway.
> 
> ok, we can narrow this one down to ignore all white space.

What do you mean? The rationale for my comment is that I saw that you
need to specify a special flag to xdiff_compare_lines if
COLOR_MOVED_DELTA_WHITESPACES is set, which could conflict with other
flags that the user has explicitly set. So avoiding that case entirely
seems like a good idea, especially since it is logical to do so.

> >> +test_expect_success 'compare whitespace delta across moved blocks' '
> >> +
> >> +     git reset --hard &&
> >> +     q_to_tab <<-\EOF >text.txt &&
> >> +     QIndented
> >> +     QText
> >> +     Qacross
> >> +     Qfive
> >> +     Qlines
> >> +     QBut!
> >> +     Qthis
> >> +     QQone
> >> +     Qline
> >> +     QQdid
> >> +     Qnot
> >> +     QQadjust
> >> +     EOF
> >
> > Do we need 5 lines? I thought 2 would suffice. (It's the ALNUM_COUNT
> > that matters, as far as I know.) This makes it hard to see that the
> > "But!" line is the one that counts.
> 
> I did not want to go with the bare minimum as then adjusting the minimum
> would be a pain as these unrelated (to the minimum) test cases would
> break.

That is true, but it makes the test case harder to read now. If you're
worried about bumping into the minimum if we do adjust the minimum,
making the lines longer should be sufficient.

> >> +test_expect_success 'compare whitespace delta across moved blocks with multiple indentation levels' '
> 
> >> +     EOF
> >
> > If the objective it just to show that the functions f and g are treated
> > as one unit despite their lines being of multiple indentation levels,
> > the test file could be much shorter.
> 
> yeah, I noticed that we already test that in the test above where we
> have that test after the "But!", where lines ziggy-zag. Will drop this test.

OK, sounds good.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/7] diff.c: refactor internal representation for coloring moved code
  2018-04-03 19:49     ` Stefan Beller
@ 2018-04-03 20:44       ` Ævar Arnfjörð Bjarmason
  2018-04-03 21:05         ` [PATCH] diff: add a blocks mode for moved code detection Stefan Beller
  0 siblings, 1 reply; 27+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-04-03 20:44 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jonathan Tan, Jacob Keller, Simon Ruderich, git


On Tue, Apr 03 2018, Stefan Beller wrote:

> On Tue, Apr 3, 2018 at 12:39 PM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>> On Mon, Apr 02 2018, Stefan Beller wrote:
>>
>>> At the time the move coloring was implemented we thought an enum of modes
>>> is the best to configure this feature.  However as we want to tack on new
>>> features, the enum would grow exponentially.
>>>
>>> Refactor the code such that features are enabled via bits. Currently we can
>>> * activate the move detection,
>>> * enable the block detection on top, and
>>> * enable the dimming inside a block, though this could be done without
>>>   block detection as well (mode "plain, dimmed")
>>>
>>> Choose the flags to not be at bit position 2,3,4 as the next patch
>>> will occupy these.
>>
>> When I've been playing with colorMoved the thing I've found really
>> confusing is that the current config has confused two completely
>> unrelated things (at least, from a user perspective), what underlying
>> algorithm you use, and how the colors look.
>
> Not sure I follow. The colors are in color.diff.X and the algorithm is in
> diff.colorMoved, whereas some colors are reused for different algorithms?
>
>>
>> I was helping someone at work the other day where they were trying:
>>
>>     git -c color.diff.new="green bold" \
>>         -c color.diff.old="red bold" \
>>         -c color.diff.newMoved="green" \
>>         -c color.diff.oldMoved="red" \
>>         -c diff.colorMoved=plain show <commit>
>>
>> But what gave better results was:
>>
>>     git -c color.diff.new="green bold" \
>>         -c color.diff.old="red bold" \
>>         -c color.diff.newMoved="green" \
>>         -c color.diff.oldMoved="red" \
>>         -c diff.colorMoved=zebra \
>>         -c color.diff.oldMovedAlternative=red \
>>         -c color.diff.newMovedAlternative=green show <commit>
>>
>> I don't have a public test commit to share (sorry), but I have an
>> internal example where "plain" will consider a thing as falling under
>> color.diff.old OR color.diff.oldMoved, but zebra will consider that
>> whole part only color.diff.old.
>
> What do you mean by "OR" ?
> Is the hunk present multiple times and colored one or the other way?
> Is it colored differently in different invocations of Git?
> Is one hunk mixing up both colors?
>
> Is the hunk "small" ?
> small hunks are un-colored, to avoid showing empty lines
> or closing braces as moved. But plain mode ignores this heuristic.
>
>> I see now that that might be since only the "zebra" supports the
>> *Alternative that it ends up "stealing" chunks from something that would
>> have otherwise been classified differently, so I have no idea if there's
>> an easy "solution", or if it's even a problem.
>
> Can you describe the issue more to see if it is a problem?
> (It sounds like a problem in the documentation/UX to me already
> as the docs could not tell you what to expect)
>
>> Sorry about being vague, I just dug this up from some old notes now
>> after this patch jolted my memory about it.

Forget about what I said so far, sorry, that was a really shitty
report. I dug into this a bit more and here's a better one.

I still can't share the actual diff I have in front of me (internal
code).

Currently we have plain, zebra & dimmed_zebra, and zebra is the
default.

I got an internal report from someone who had, because zebra looked
crappy in his terminal, moved to "plain", and was reporting getting
worse moved diffs as a result.

I found that there's essentially a missing setting between "plain" and
"zebra", in git command terms:

    # The "plain" setting
    git -c diff.colorMoved=true \
        -c diff.colorMoved=plain \
        show <commit>

    # We don't have this, it's "plain" but with "zebra" heuristics,
    # plain_zebra?
    git -c diff.colorMoved=true \
        -c color.diff.oldMovedAlternative="bold magenta" \
        -c color.diff.newMovedAlternative="bold yellow" \
        -c diff.colorMoved=zebra \
        show <commit>

    # The "zebra" setting.
    git -c diff.colorMoved=true \
        -c diff.colorMoved=zebra \
        show <commit>

Which is what I mean by the current config conflating two (to me)
unrelated things. One is how we, via any method, detect what's moved or
not, and the other is what color/format we use to present this to the
user.

You can feed that plain_zebra invocation input where it'll color-wise
produce something that looks *almost* like "plain", but will differ (and
usually be better) in what lines it decides to show as moved, which of
course is due to *MovedAlternative.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] diff: add a blocks mode for moved code detection
  2018-04-03 20:44       ` Ævar Arnfjörð Bjarmason
@ 2018-04-03 21:05         ` Stefan Beller
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Beller @ 2018-04-03 21:05 UTC (permalink / raw)
  To: avarab; +Cc: git, jacob.keller, jonathantanmy, sbeller, simon

> Currently we have plain, zebra & dimmed_zebra, and zebra is the
> default.
>
> I got an internal report from someone who had, because zebra looked
> crappy in his terminal, moved to "plain", and was reporting getting
> worse moved diffs as a result.
>
> I found that there's essentially a missing setting between "plain" and
> "zebra", in git command terms:
>
>     # The "plain" setting
>     git -c diff.colorMoved=true \
>         -c diff.colorMoved=plain \
>         show <commit>
>
>     # We don't have this, it's "plain" but with "zebra" heuristics,
>     # plain_zebra?
>     git -c diff.colorMoved=true \
>         -c color.diff.oldMovedAlternative="bold magenta" \
>         -c color.diff.newMovedAlternative="bold yellow" \
>         -c diff.colorMoved=zebra \
>         show <commit>
>
>     # The "zebra" setting.
>     git -c diff.colorMoved=true \
>         -c diff.colorMoved=zebra \
>         show <commit>
>
> Which is what I mean by the current config conflating two (to me)
> unrelated things. One is how we, via any method, detect what's moved or
> not, and the other is what color/format we use to present this to the
> user.

Oh I see.

Reading the docs again, maybe we want to have a "blocks" mode,
that is zebra with the same color for any block?

> You can feed that plain_zebra invocation input where it'll color-wise
> produce something that looks *almost* like "plain", but will differ (and
> usually be better) in what lines it decides to show as moved, which of
> course is due to *MovedAlternative.

I would think this is close to what you want (module implementation errors,
I did not run/test this code).

One could also argue that this is *too* weak, as when there are
multiple blocks of say 15 chars adjacent, they might be one large block.

---8<----

From dde04f6afa35a313fac3575100fe83b554ec2b59 Mon Sep 17 00:00:00 2001
From: Stefan Beller <sbeller@google.com>
Date: Tue, 3 Apr 2018 14:03:01 -0700
Subject: [PATCH] diff: add a blocks mode for moved code detection

Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-options.txt | 5 +++++
 diff.c                         | 4 +++-
 diff.h                         | 5 +++--
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index c330c01ff0..abce5142d2 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -268,6 +268,11 @@ plain::
 	that are added somewhere else in the diff. This mode picks up any
 	moved line, but it is not very useful in a review to determine
 	if a block of code was moved without permutation.
+blocks:
+	Blocks of moved text of at least 20 alphanumeric characters
+	are detected greedily. The detected blocks are
+	painted using either the 'color.diff.{old,new}Moved' color.
+	Adjacent blocks cannot be told apart.
 zebra::
 	Blocks of moved text of at least 20 alphanumeric characters
 	are detected greedily. The detected blocks are
diff --git a/diff.c b/diff.c
index 21c3838b25..80dd8cbd9a 100644
--- a/diff.c
+++ b/diff.c
@@ -271,6 +271,8 @@ static int parse_color_moved(const char *arg)
 		return COLOR_MOVED_NO;
 	else if (!strcmp(arg, "plain"))
 		return COLOR_MOVED_PLAIN;
+	else if (!strcmp(arg, "blocks"))
+		return COLOR_MOVED_BLOCKS;
 	else if (!strcmp(arg, "zebra"))
 		return COLOR_MOVED_ZEBRA;
 	else if (!strcmp(arg, "default"))
@@ -899,7 +901,7 @@ static void mark_color_as_moved(struct diff_options *o,
 
 		block_length++;
 
-		if (flipped_block)
+		if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
 			l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 	}
 	adjust_last_block(o, n, block_length);
diff --git a/diff.h b/diff.h
index 6bd278aac1..3a228861d9 100644
--- a/diff.h
+++ b/diff.h
@@ -207,8 +207,9 @@ struct diff_options {
 	enum {
 		COLOR_MOVED_NO = 0,
 		COLOR_MOVED_PLAIN = 1,
-		COLOR_MOVED_ZEBRA = 2,
-		COLOR_MOVED_ZEBRA_DIM = 3,
+		COLOR_MOVED_BLOCKS = 2,
+		COLOR_MOVED_ZEBRA = 3,
+		COLOR_MOVED_ZEBRA_DIM = 4,
 	} color_moved;
 	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
 	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
-- 
2.17.0.484.g0c8726318c-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap
  2018-04-02 22:48 ` [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
@ 2018-04-06 20:04   ` Jeff King
  2018-04-06 20:41     ` Stefan Beller
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2018-04-06 20:04 UTC (permalink / raw)
  To: Stefan Beller; +Cc: jonathantanmy, jacob.keller, simon, git

On Mon, Apr 02, 2018 at 03:48:50PM -0700, Stefan Beller wrote:

> The diff options are passed to the compare function as
> 'hashmap_cmp_fn_data', which are given when the hashmaps
> are initialized.
> 
> A later patch will make use of the keydata to signal
> different settings for comparision.

I had to scratch my head here for a moment. Don't we use those options
as part of the comparison?

I took the "which" to mean "the compare function", but I think you mean
"we pass these diff options already when the hashmap is initialized".

Maybe something like this would be more clear:

  When we initialize the hashmap, we give it a pointer to the
  diff_options, which it then passes along to each call of the
  hashmap_cmp_fn function. There's no need to pass it a second time as
  the "keydata" parameter, and our comparison functions never look at
  keydata.

  This was a mistake left over from an earlier round of 2e2d5ac184
  (diff.c: color moved lines differently, 2017-06-30), before hashmap
  learned to pass the data pointer for us.

(I'm just guessing on the second paragraph based on a quick look at
git-blame and my recollection from the time).

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap
  2018-04-06 20:04   ` Jeff King
@ 2018-04-06 20:41     ` Stefan Beller
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Beller @ 2018-04-06 20:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Tan, Jacob Keller, Simon Ruderich, git

Hi Jeff,

On Fri, Apr 6, 2018 at 1:04 PM, Jeff King <peff@peff.net> wrote:
> On Mon, Apr 02, 2018 at 03:48:50PM -0700, Stefan Beller wrote:
>
>> The diff options are passed to the compare function as
>> 'hashmap_cmp_fn_data', which are given when the hashmaps
>> are initialized.
>>
>> A later patch will make use of the keydata to signal
>> different settings for comparision.
>
> I had to scratch my head here for a moment. Don't we use those options
> as part of the comparison?

Yes we do, but not as passed in here.

Stepping back a bit: There are 2 void pointers passed to the cmp function:

typedef int (*hashmap_cmp_fn)(const void *hashmap_cmp_fn_data,
      const void *entry, const void *entry_or_key,
      const void *keydata);

The hashmap_cmp_fn_data is the same pointer as we pass as
the equals_function data in

extern void hashmap_init(struct hashmap *map,
        hashmap_cmp_fn equals_function,
        const void *equals_function_data,
        size_t initial_size);

whereas the keydata is passed in either directly when calling cmp directly
or in hashmap_get_from_hash:

static inline void *hashmap_get_from_hash(const struct hashmap *map,
          unsigned int hash,
         const void *keydata)
{
        struct hashmap_entry key;
        hashmap_entry_init(&key, hash);
        return hashmap_get(map, &key, keydata);
}

It turns out we are passing the struct diff_options *o into the cmp
function twice, once via the inits equals_function_data, as well as
keydata directly. Omit the direct pass in.

This is mostly a cleanup as of now, as it turns out I do not need to
reuse the keydata field anyway.

> I took the "which" to mean "the compare function", but I think you mean
> "we pass these diff options already when the hashmap is initialized".

Oh, yes.

> Maybe something like this would be more clear:
>
>   When we initialize the hashmap, we give it a pointer to the
>   diff_options, which it then passes along to each call of the
>   hashmap_cmp_fn function. There's no need to pass it a second time as
>   the "keydata" parameter, and our comparison functions never look at
>   keydata.
>

Thanks for clearing this up, will take as-is.

>   This was a mistake left over from an earlier round of 2e2d5ac184
>   (diff.c: color moved lines differently, 2017-06-30), before hashmap
>   learned to pass the data pointer for us.

That sounds about right.

>
> (I'm just guessing on the second paragraph based on a quick look at
> git-blame and my recollection from the time).
>
> -Peff

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/7] diff.c: refactor internal representation for coloring moved code
  2018-04-02 23:51   ` Jonathan Tan
  2018-04-03 18:59     ` Stefan Beller
@ 2018-04-06 21:28     ` Stefan Beller
  2018-04-06 22:27       ` Jonathan Tan
  1 sibling, 1 reply; 27+ messages in thread
From: Stefan Beller @ 2018-04-06 21:28 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Jacob Keller, Simon Ruderich, git

On Mon, Apr 2, 2018 at 4:51 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Mon,  2 Apr 2018 15:48:52 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
>> At the time the move coloring was implemented we thought an enum of modes
>> is the best to configure this feature.  However as we want to tack on new
>> features, the enum would grow exponentially.
>>
>> Refactor the code such that features are enabled via bits. Currently we can
>> * activate the move detection,
>> * enable the block detection on top, and
>> * enable the dimming inside a block, though this could be done without
>>   block detection as well (mode "plain, dimmed")
>
> Firstly, patches 1-4 are obviously correct.
>
> As for this patch, I don't think that using flags is the right way to do
> this.

Now that I redid it another way[1], I have the impression that this was the
right approach, because it allows for a short
  if (o->color_moved)
condition. If we treat white spaces separately, then we'd have to
have implications such as:

  if (some white space option)
    the enum = default if not given explicitely.

which we do not need in case of a flags field.

[1] Keeping the enum around and having an extra variable for the
white space related configuration.

> We are not under any size pressure for struct diff_options, and
> the additional options that we plan to add (color-moved-whitespace-flags
> and ignore-space-delta) can easily be additional fields instead.

The  traditional white space flags would want to be a field and occupy
the same bits in that field for ease of implementation, and the new option
would just fit in by picking a new place in the bit field.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/7] diff.c: refactor internal representation for coloring moved code
  2018-04-06 21:28     ` Stefan Beller
@ 2018-04-06 22:27       ` Jonathan Tan
  0 siblings, 0 replies; 27+ messages in thread
From: Jonathan Tan @ 2018-04-06 22:27 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jacob Keller, Simon Ruderich, git

On Fri, 6 Apr 2018 14:28:40 -0700
Stefan Beller <sbeller@google.com> wrote:

> Now that I redid it another way[1], I have the impression that this was the
> right approach, because it allows for a short
>   if (o->color_moved)
> condition. If we treat white spaces separately, then we'd have to
> have implications such as:
> 
>   if (some white space option)
>     the enum = default if not given explicitely.
> 
> which we do not need in case of a flags field.
> 
> [1] Keeping the enum around and having an extra variable for the
> white space related configuration.

Ah, I think I see what you mean. In your way, move detection can be
activated either by explicitly setting a color-move pattern (e.g.
zebra), or by setting a move-detection whitespace option, both of which
will set bits in the uint32.

As opposed to my proposed way, where you either have to set the default
explicitly (like you describe), or write "if (o->color_moved ||
o->color_moved_whitespace_handling)" instead of "if (o->color_moved)".

I don't think such an implicit dependence (whitespace option to
color-move pattern) is reason enough to use a bitfield, and I think the
opposite actually - this dependence should be in fact explicit, not
implicit. But I'm open to opinions from others.

> > We are not under any size pressure for struct diff_options, and
> > the additional options that we plan to add (color-moved-whitespace-flags
> > and ignore-space-delta) can easily be additional fields instead.
> 
> The  traditional white space flags would want to be a field and occupy
> the same bits in that field for ease of implementation, and the new option
> would just fit in by picking a new place in the bit field.

If we were to use bitfields, this would be the way, yes.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap
  2018-04-24 21:03 [PATCHv2 " Stefan Beller
@ 2018-04-24 21:03 ` Stefan Beller
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Beller @ 2018-04-24 21:03 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, simon, avarab, jacob.keller, Stefan Beller

When we initialize the hashmap, we give it a pointer to the
diff_options, which it then passes along to each call of the
hashmap_cmp_fn function. There's no need to pass it a second time as
the "keydata" parameter, and our comparison functions never look at
keydata.

This was a mistake left over from an earlier round of 2e2d5ac184
(diff.c: color moved lines differently, 2017-06-30), before hashmap
learned to pass the data pointer for us.

Explanation-by: Jeff King <peff@peff.net>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 diff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 1289df4b1f..ce7bedc1b9 100644
--- a/diff.c
+++ b/diff.c
@@ -842,13 +842,13 @@ static void mark_color_as_moved(struct diff_options *o,
 		case DIFF_SYMBOL_PLUS:
 			hm = del_lines;
 			key = prepare_entry(o, n);
-			match = hashmap_get(hm, key, o);
+			match = hashmap_get(hm, key, NULL);
 			free(key);
 			break;
 		case DIFF_SYMBOL_MINUS:
 			hm = add_lines;
 			key = prepare_entry(o, n);
-			match = hashmap_get(hm, key, o);
+			match = hashmap_get(hm, key, NULL);
 			free(key);
 			break;
 		default:
-- 
2.17.0.441.gb46fe60e1d-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2018-04-24 21:03 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-02 22:48 [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Stefan Beller
2018-04-02 22:48 ` [PATCH 1/7] xdiff/xdiff.h: remove unused flags Stefan Beller
2018-04-02 22:48 ` [PATCH 2/7] xdiff/xdiffi.c: remove unneeded function declarations Stefan Beller
2018-04-02 22:48 ` [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap Stefan Beller
2018-04-06 20:04   ` Jeff King
2018-04-06 20:41     ` Stefan Beller
2018-04-02 22:48 ` [PATCH 4/7] diff.c: adjust hash function signature to match hashmap expectation Stefan Beller
2018-04-02 22:48 ` [PATCH 5/7] diff.c: refactor internal representation for coloring moved code Stefan Beller
2018-04-02 23:51   ` Jonathan Tan
2018-04-03 18:59     ` Stefan Beller
2018-04-06 21:28     ` Stefan Beller
2018-04-06 22:27       ` Jonathan Tan
2018-04-03 19:39   ` Ævar Arnfjörð Bjarmason
2018-04-03 19:49     ` Stefan Beller
2018-04-03 20:44       ` Ævar Arnfjörð Bjarmason
2018-04-03 21:05         ` [PATCH] diff: add a blocks mode for moved code detection Stefan Beller
2018-04-02 22:48 ` [PATCH 6/7] diff.c: decouple white space treatment for move detection from generic option Stefan Beller
2018-04-03  0:31   ` Jonathan Tan
2018-04-02 22:48 ` [PATCH 7/7] diff.c: add --color-moved-ignore-space-delta option Stefan Beller
2018-04-03  0:41   ` Jonathan Tan
2018-04-03 19:22     ` Stefan Beller
2018-04-03 20:38       ` Jonathan Tan
2018-04-02 23:47 ` [RFC PATCH 0/7] Moved code detection: ignore space on uniform indentation Jonathan Tan
2018-04-03  0:03   ` Jacob Keller
2018-04-03 19:00     ` Stefan Beller
2018-04-03 19:55       ` Jacob Keller
  -- strict thread matches above, loose matches on Subject: below --
2018-04-24 21:03 [PATCHv2 " Stefan Beller
2018-04-24 21:03 ` [PATCH 3/7] diff.c: do not pass diff options as keydata to hashmap Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).