git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
* [RFC PATCH 0/3] diff --color-moved-ws: allow mixed spaces and tabs in indentation change
@ 2018-09-24 10:06 Phillip Wood
  2018-09-24 10:06 ` [RFC PATCH 1/3] xdiff-interface: make xdl_blankline() available Phillip Wood
                   ` (5 more replies)
  0 siblings, 6 replies; 44+ messages in thread
From: Phillip Wood @ 2018-09-24 10:06 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Stefan Beller, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

When trying out the new --color-moved-ws=allow-indentation-change I
was disappointed to discover it did not work if the indentation
contains a mix of spaces and tabs. This series adds a new option that
does. It's and RFC as there are some open questions about how to
proceed, see the last patch for details. Once there's a clearer way
forward I'll reroll with some documentation changes.

Phillip Wood (3):
  xdiff-interface: make xdl_blankline() available
  diff.c: remove unused variables
  diff: add --color-moved-ws=allow-mixed-indentation-change

 diff.c                     | 124 ++++++++++++++++++++++++++++++++-----
 diff.h                     |   1 +
 t/t4015-diff-whitespace.sh |  89 ++++++++++++++++++++++++++
 xdiff-interface.c          |   5 ++
 xdiff-interface.h          |   5 ++
 5 files changed, 208 insertions(+), 16 deletions(-)

-- 
2.19.0


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [RFC PATCH 1/3] xdiff-interface: make xdl_blankline() available
  2018-09-24 10:06 [RFC PATCH 0/3] diff --color-moved-ws: allow mixed spaces and tabs in indentation change Phillip Wood
@ 2018-09-24 10:06 ` Phillip Wood
  2018-09-24 23:19   ` Stefan Beller
  2018-09-24 10:06 ` [RFC PATCH 2/3] diff.c: remove unused variables Phillip Wood
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 44+ messages in thread
From: Phillip Wood @ 2018-09-24 10:06 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Stefan Beller, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

This will be used by the move detection code.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 xdiff-interface.c | 5 +++++
 xdiff-interface.h | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/xdiff-interface.c b/xdiff-interface.c
index 9315bc0ede..eceabfa72d 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -308,6 +308,11 @@ int xdiff_compare_lines(const char *l1, long s1,
 	return xdl_recmatch(l1, s1, l2, s2, flags);
 }
 
+int xdiff_is_blankline(const char *l1, long s1, long flags)
+{
+	return xdl_blankline(l1, s1, flags);
+}
+
 int git_xmerge_style = -1;
 
 int git_xmerge_config(const char *var, const char *value, void *cb)
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 135fc05d72..d0008b016f 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -45,4 +45,9 @@ extern int xdiff_compare_lines(const char *l1, long s1,
  */
 extern unsigned long xdiff_hash_string(const char *s, size_t len, long flags);
 
+/*
+ * Returns 1 if the line is blank, taking XDF_WHITESPACE_FLAGS into account
+ */
+extern int xdiff_is_blankline(const char *s, long len, long flags);
+
 #endif
-- 
2.19.0


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [RFC PATCH 2/3] diff.c: remove unused variables
  2018-09-24 10:06 [RFC PATCH 0/3] diff --color-moved-ws: allow mixed spaces and tabs in indentation change Phillip Wood
  2018-09-24 10:06 ` [RFC PATCH 1/3] xdiff-interface: make xdl_blankline() available Phillip Wood
@ 2018-09-24 10:06 ` Phillip Wood
  2018-09-24 10:06 ` [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change Phillip Wood
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-09-24 10:06 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Stefan Beller, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

The string lengths are not used in cmp_in_block_with_wsd() so lets
remove them.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/diff.c b/diff.c
index 9393993e33..0a652e28d4 100644
--- a/diff.c
+++ b/diff.c
@@ -789,7 +789,6 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 				 int n)
 {
 	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-	int al = cur->es->len, cl = l->len;
 	const char *a = cur->es->line,
 		   *b = match->es->line,
 		   *c = l->line;
@@ -823,13 +822,10 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	 */
 
 	wslen = strlen(pmb->wsd->string);
-	if (pmb->wsd->current_longer) {
+	if (pmb->wsd->current_longer)
 		c += wslen;
-		cl -= wslen;
-	} else {
+	else
 		a += wslen;
-		al -= wslen;
-	}
 
 	if (strcmp(a, c))
 		return 1;
-- 
2.19.0


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change
  2018-09-24 10:06 [RFC PATCH 0/3] diff --color-moved-ws: allow mixed spaces and tabs in indentation change Phillip Wood
  2018-09-24 10:06 ` [RFC PATCH 1/3] xdiff-interface: make xdl_blankline() available Phillip Wood
  2018-09-24 10:06 ` [RFC PATCH 2/3] diff.c: remove unused variables Phillip Wood
@ 2018-09-24 10:06 ` Phillip Wood
  2018-09-25  1:07   ` Stefan Beller
  2018-09-24 11:03 ` [RFC PATCH 0/3] diff --color-moved-ws: allow mixed spaces and tabs in indentation change Phillip Wood
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 44+ messages in thread
From: Phillip Wood @ 2018-09-24 10:06 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Stefan Beller, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

This adds another mode for highlighting lines that have moved with an
indentation change. Unlike the existing
--color-moved-ws=allow-indentation-change setting this mode uses the
visible change in the indentation to group lines, rather than the
indentation string. This means it works with files that use a mix of
tabs and spaces for indentation and can cope with whitespace errors
where there is a space before a tab (it's the job of
--ws-error-highlight to deal with those errors, it should affect the
move detection). It will also group the lines either
side of a blank line if their indentation change matches so short
lines followed by a blank line followed by more lines with the same
indentation change will be correctly highlighted.

This is a RFC as there are a number of questions about how to proceed
from here:
 1) Do we need a second option or should this implementation replace
    --color-moved-ws=allow-indentation-change. I'm unclear if that mode
    has any advantages for some people. There seems to have been an
    intention [1] to get it working with mixes of tabs and spaces but
    nothing ever came of it.
 2) If we keep two options what should this option be called, the name
    is long and ambiguous at the moment - mixed could refer to mixed
    indentation length rather than a mix of tabs and spaces.
 3) Should we support whitespace flags with this mode?
    --ignore-space-at-eol and --ignore-cr-at eol would be fairly simple
    to support and I can see a use for them, --ignore-all-space and
    --ignore-space-change would need some changes to xdiff to allow them
    to apply only after the indentation. I think --ignore-blank-lines
    would need a bit of work to get it working as well. (Note the
    existing mode does not support any of these flags either)

[1] https://public-inbox.org/git/CAGZ79kasAqE+=7ciVrdjoRdu0UFjVBr8Ma502nw+3hZL=ebXYQ@mail.gmail.com/

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c                     | 122 +++++++++++++++++++++++++++++++++----
 diff.h                     |   1 +
 t/t4015-diff-whitespace.sh |  89 +++++++++++++++++++++++++++
 3 files changed, 199 insertions(+), 13 deletions(-)

diff --git a/diff.c b/diff.c
index 0a652e28d4..45f33daa60 100644
--- a/diff.c
+++ b/diff.c
@@ -304,7 +304,11 @@ static int parse_color_moved_ws(const char *arg)
 		else if (!strcmp(sb.buf, "ignore-all-space"))
 			ret |= XDF_IGNORE_WHITESPACE;
 		else if (!strcmp(sb.buf, "allow-indentation-change"))
-			ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE;
+			ret = COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
+			 (ret & ~COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE);
+		else if (!strcmp(sb.buf, "allow-mixed-indentation-change"))
+			ret = COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE |
+			 (ret & ~COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE);
 		else
 			error(_("ignoring unknown color-moved-ws mode '%s'"), sb.buf);
 
@@ -314,6 +318,9 @@ static int parse_color_moved_ws(const char *arg)
 	if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
 	    (ret & XDF_WHITESPACE_FLAGS))
 		die(_("color-moved-ws: allow-indentation-change cannot be combined with other white space modes"));
+	else if ((ret & COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) &&
+		 (ret & XDF_WHITESPACE_FLAGS))
+		die(_("color-moved-ws: allow-mixed-indentation-change cannot be combined with other white space modes"));
 
 	string_list_clear(&l, 0);
 
@@ -763,11 +770,65 @@ struct moved_entry {
  * comparision is longer than the second.
  */
 struct ws_delta {
-	char *string;
+	union {
+		int delta;
+		char *string;
+	};
 	unsigned int current_longer : 1;
+	unsigned int have_string : 1;
 };
 #define WS_DELTA_INIT { NULL, 0 }
 
+static int compute_mixed_ws_delta(const struct emitted_diff_symbol *a,
+				  const struct emitted_diff_symbol *b,
+				  int *delta)
+{
+	unsigned int i = 0, j = 0;
+	int la, lb;
+	int ta = a->flags & WS_TAB_WIDTH_MASK;
+	int tb = b->flags & WS_TAB_WIDTH_MASK;
+	const char *sa = a->line;
+	const char *sb = b->line;
+
+	if (xdiff_is_blankline(sa, a->len, 0) &&
+	    xdiff_is_blankline(sb, b->len, 0)) {
+		*delta = INT_MIN;
+		return 1;
+	}
+
+	/* skip any \v \f \r at start of indentation */
+	while (sa[i] == '\f' || sa[i] == '\v' ||
+	       (sa[i] == '\r' && i < a->len - 1))
+		i++;
+	while (sb[j] == '\f' || sb[j] == '\v' ||
+	       (sb[j] == '\r' && j < b->len - 1))
+		j++;
+
+	for (la = 0; ; i++) {
+		if (sa[i] == ' ')
+			la++;
+		else if (sa[i] == '\t')
+			la = ((la + ta) / ta) * ta;
+		else
+			break;
+	}
+	for (lb = 0; ; j++) {
+		if (sb[j] == ' ')
+			lb++;
+		else if (sb[j] == '\t')
+			lb = ((lb + tb) / tb) * tb;
+		else
+			break;
+	}
+	if (a->s == DIFF_SYMBOL_PLUS)
+		*delta = la - lb;
+	else
+		*delta = lb - la;
+
+	return (a->len - i == b->len - j) &&
+		!memcmp(sa + i, sb + j, a->len - i);
+}
+
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
 			     const struct emitted_diff_symbol *b,
 			     struct ws_delta *out)
@@ -778,6 +839,7 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
 
 	out->string = xmemdupz(longer->line, d);
 	out->current_longer = (a == longer);
+	out->have_string = 1;
 
 	return !strncmp(longer->line + d, shorter->line, shorter->len);
 }
@@ -820,15 +882,34 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	 * To do so we need to compare 'l' to 'cur', adjusting the
 	 * one of them for the white spaces, depending which was longer.
 	 */
+	if (o->color_moved_ws_handling &
+	    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
+		wslen = strlen(pmb->wsd->string);
+		if (pmb->wsd->current_longer)
+			c += wslen;
+		else
+			a += wslen;
 
-	wslen = strlen(pmb->wsd->string);
-	if (pmb->wsd->current_longer)
-		c += wslen;
-	else
-		a += wslen;
+		if (strcmp(a, c))
+			return 1;
 
-	if (strcmp(a, c))
-		return 1;
+		return 0;
+	} else if (o->color_moved_ws_handling &
+		   COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) {
+		int delta;
+
+		if (!compute_mixed_ws_delta(cur->es, l, &delta))
+		    return 1;
+
+		if (pmb->wsd->delta == INT_MIN) {
+			pmb->wsd->delta = delta;
+			return 0;
+		}
+
+		return !(delta == pmb->wsd->delta || delta == INT_MIN);
+	} else {
+		BUG("no color_moved_ws_allow_indentation_change set");
+	}
 
 	return 0;
 }
@@ -845,7 +926,8 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 			 & XDF_WHITESPACE_FLAGS;
 
 	if (diffopt->color_moved_ws_handling &
-	    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+	    (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
+	     COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE))
 		/*
 		 * As there is not specific white space config given,
 		 * we'd need to check for a new block, so ignore all
@@ -953,7 +1035,8 @@ static void pmb_advance_or_null_multi_match(struct diff_options *o,
 			pmb[i] = pmb[i]->next_line;
 		} else {
 			if (pmb[i]->wsd) {
-				free(pmb[i]->wsd->string);
+				if (pmb[i]->wsd->have_string)
+					free(pmb[i]->wsd->string);
 				FREE_AND_NULL(pmb[i]->wsd);
 			}
 			pmb[i] = NULL;
@@ -1066,7 +1149,8 @@ static void mark_color_as_moved(struct diff_options *o,
 			continue;
 
 		if (o->color_moved_ws_handling &
-		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+		    (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
+		     COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE))
 			pmb_advance_or_null_multi_match(o, match, hm, pmb, pmb_nr, n);
 		else
 			pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
@@ -1088,6 +1172,17 @@ static void mark_color_as_moved(struct diff_options *o,
 						pmb[pmb_nr++] = match;
 					} else
 						free(wsd);
+				} else if (o->color_moved_ws_handling &
+					   COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) {
+					int delta;
+
+					if (compute_mixed_ws_delta(l, match->es, &delta)) {
+						struct ws_delta *wsd = xmalloc(sizeof(*match->wsd));
+						wsd->delta = delta;
+						wsd->have_string = 0;
+						match->wsd = wsd;
+						pmb[pmb_nr++] = match;
+					}
 				} else {
 					pmb[pmb_nr++] = match;
 				}
@@ -5740,7 +5835,8 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 			struct hashmap add_lines, del_lines;
 
 			if (o->color_moved_ws_handling &
-			    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+			    (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
+			     COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE))
 				o->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
 
 			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
diff --git a/diff.h b/diff.h
index 5e6bcf0926..03628cda45 100644
--- a/diff.h
+++ b/diff.h
@@ -217,6 +217,7 @@ struct diff_options {
 
 	/* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
 	#define COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE (1<<5)
+	#define COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE (1<<6)
 	int color_moved_ws_handling;
 };
 
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 41facf7abf..737dbd4a42 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1902,4 +1902,93 @@ test_expect_success 'compare whitespace delta incompatible with other space opti
 	test_i18ngrep allow-indentation-change err
 '
 
+NUL=''
+test_expect_success 'compare mixed whitespace delta across moved blocks' '
+
+	git reset --hard &&
+	tr Q_ "\t " <<-EOF >text.txt &&
+	${NUL}
+	____too short without
+	${NUL}
+	____being grouped across blank line
+	${NUL}
+	context
+	lines
+	to
+	anchor
+	____Indented text to
+	_Q____be further indented by four spaces across
+	____Qseveral lines
+	QQ____These two lines have had their
+	____indentation reduced by four spaces
+	Qdifferent indentation change
+	____too short
+	EOF
+
+	git add text.txt &&
+	git commit -m "add text.txt" &&
+
+	tr Q_ "\t " <<-EOF >text.txt &&
+	context
+	lines
+	to
+	anchor
+	QIndented text to
+	QQbe further indented by four spaces across
+	Q____several lines
+	${NUL}
+	QQtoo short without
+	${NUL}
+	QQbeing grouped across blank line
+	${NUL}
+	Q_QThese two lines have had their
+	indentation reduced by four spaces
+	QQdifferent indentation change
+	__Qtoo short
+	EOF
+
+	git -c color.diff.whitespace="normal red" \
+		-c core.whitespace=space-before-tab \
+		diff --color --color-moved --ws-error-highlight=all \
+		--color-moved-ws=allow-mixed-indentation-change >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
+
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/text.txt b/text.txt<RESET>
+	<BOLD>--- a/text.txt<RESET>
+	<BOLD>+++ b/text.txt<RESET>
+	<CYAN>@@ -1,16 +1,16 @@<RESET>
+	<BOLD;MAGENTA>-<RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>    too short without<RESET>
+	<BOLD;MAGENTA>-<RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>    being grouped across blank line<RESET>
+	<BOLD;MAGENTA>-<RESET>
+	 <RESET>context<RESET>
+	 <RESET>lines<RESET>
+	 <RESET>to<RESET>
+	 <RESET>anchor<RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>    Indented text to<RESET>
+	<BOLD;MAGENTA>-<RESET><BRED> <RESET>	<BOLD;MAGENTA>    be further indented by four spaces across<RESET>
+	<BOLD;MAGENTA>-<RESET><BRED>    <RESET>	<BOLD;MAGENTA>several lines<RESET>
+	<BOLD;BLUE>-<RESET>		<BOLD;BLUE>    These two lines have had their<RESET>
+	<BOLD;BLUE>-<RESET><BOLD;BLUE>    indentation reduced by four spaces<RESET>
+	<BOLD;MAGENTA>-<RESET>	<BOLD;MAGENTA>different indentation change<RESET>
+	<RED>-<RESET><RED>    too short<RESET>
+	<BOLD;CYAN>+<RESET>	<BOLD;CYAN>Indented text to<RESET>
+	<BOLD;CYAN>+<RESET>		<BOLD;CYAN>be further indented by four spaces across<RESET>
+	<BOLD;CYAN>+<RESET>	<BOLD;CYAN>    several lines<RESET>
+	<BOLD;YELLOW>+<RESET>
+	<BOLD;YELLOW>+<RESET>		<BOLD;YELLOW>too short without<RESET>
+	<BOLD;YELLOW>+<RESET>
+	<BOLD;YELLOW>+<RESET>		<BOLD;YELLOW>being grouped across blank line<RESET>
+	<BOLD;YELLOW>+<RESET>
+	<BOLD;CYAN>+<RESET>	<BRED> <RESET>	<BOLD;CYAN>These two lines have had their<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>indentation reduced by four spaces<RESET>
+	<BOLD;YELLOW>+<RESET>		<BOLD;YELLOW>different indentation change<RESET>
+	<GREEN>+<RESET><BRED>  <RESET>	<GREEN>too short<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_done
-- 
2.19.0


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 0/3] diff --color-moved-ws: allow mixed spaces and tabs in indentation change
  2018-09-24 10:06 [RFC PATCH 0/3] diff --color-moved-ws: allow mixed spaces and tabs in indentation change Phillip Wood
                   ` (2 preceding siblings ...)
  2018-09-24 10:06 ` [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change Phillip Wood
@ 2018-09-24 11:03 ` Phillip Wood
  2018-11-16 11:03 ` [PATCH v1 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
  2018-11-23 11:16 ` [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
  5 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-09-24 11:03 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Stefan Beller, Phillip Wood

On 24/09/2018 11:06, Phillip Wood wrote:
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
> 
> When trying out the new --color-moved-ws=allow-indentation-change I
> was disappointed to discover it did not work if the indentation
> contains a mix of spaces and tabs. This series adds a new option that
> does. It's and RFC as there are some open questions about how to
> proceed, see the last patch for details. Once there's a clearer way
> forward I'll reroll with some documentation changes.
> 

I should have said that this series is based on top of fab01ec52e
("diff: fix --color-moved-ws=allow-indentation-change", 2018-09-04), the
result merges into current master without conflicts.

Best Wishes

Phillip

> Phillip Wood (3):
>   xdiff-interface: make xdl_blankline() available
>   diff.c: remove unused variables
>   diff: add --color-moved-ws=allow-mixed-indentation-change
> 
>  diff.c                     | 124 ++++++++++++++++++++++++++++++++-----
>  diff.h                     |   1 +
>  t/t4015-diff-whitespace.sh |  89 ++++++++++++++++++++++++++
>  xdiff-interface.c          |   5 ++
>  xdiff-interface.h          |   5 ++
>  5 files changed, 208 insertions(+), 16 deletions(-)
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 1/3] xdiff-interface: make xdl_blankline() available
  2018-09-24 10:06 ` [RFC PATCH 1/3] xdiff-interface: make xdl_blankline() available Phillip Wood
@ 2018-09-24 23:19   ` Stefan Beller
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Beller @ 2018-09-24 23:19 UTC (permalink / raw)
  To: Phillip Wood; +Cc: git

On Mon, Sep 24, 2018 at 3:06 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> This will be used by the move detection code.
>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>  xdiff-interface.c | 5 +++++
>  xdiff-interface.h | 5 +++++
>  2 files changed, 10 insertions(+)
>
> diff --git a/xdiff-interface.c b/xdiff-interface.c
> index 9315bc0ede..eceabfa72d 100644
> --- a/xdiff-interface.c
> +++ b/xdiff-interface.c
> @@ -308,6 +308,11 @@ int xdiff_compare_lines(const char *l1, long s1,
>         return xdl_recmatch(l1, s1, l2, s2, flags);
>  }
>
> +int xdiff_is_blankline(const char *l1, long s1, long flags)
> +{
> +       return xdl_blankline(l1, s1, flags);
> +}
> +
>  int git_xmerge_style = -1;
>
>  int git_xmerge_config(const char *var, const char *value, void *cb)
> diff --git a/xdiff-interface.h b/xdiff-interface.h
> index 135fc05d72..d0008b016f 100644
> --- a/xdiff-interface.h
> +++ b/xdiff-interface.h
> @@ -45,4 +45,9 @@ extern int xdiff_compare_lines(const char *l1, long s1,
>   */
>  extern unsigned long xdiff_hash_string(const char *s, size_t len, long flags);
>
> +/*
> + * Returns 1 if the line is blank, taking XDF_WHITESPACE_FLAGS into account

presumably in the flags field.

> + */
> +extern int xdiff_is_blankline(const char *s, long len, long flags);

We also have
    int ws_blank_line(const char *, int, int)
that looks very similar, but works slightly differently.
grep.c has
    static int is_empty_line(const char *bol, const char *eol)
    {
       while (bol < eol && isspace(*bol))
       bol++;
       return bol == eol;
    }

pretty.c also has a is_blank_line() (and some stale comment in
that file refers to it as is_empty_line, 77356122443
(pretty: make the skip_blank_lines() function public,
2016-06-22)

Would we be able to unify all these down to one or two
functions? (Maybe all can use the new xdiff function?)
It seems as if we're reinventing the wheel a couple times
in our code base.

Stefan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change
  2018-09-24 10:06 ` [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change Phillip Wood
@ 2018-09-25  1:07   ` Stefan Beller
  2018-10-09  9:50     ` Phillip Wood
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Beller @ 2018-09-25  1:07 UTC (permalink / raw)
  To: Phillip Wood, Jonathan Tan; +Cc: git

On Mon, Sep 24, 2018 at 3:06 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> This adds another mode for highlighting lines that have moved with an
> indentation change. Unlike the existing
> --color-moved-ws=allow-indentation-change setting this mode uses the
> visible change in the indentation to group lines, rather than the
> indentation string.

Wow! Thanks for putting this RFC out.
My original vision was to be useful to python users as well,
which counts 1 tab as 8 spaces IIUC.

The "visual" indentation you mention here sounds like
a tab is counted as "up to the next position of (n-1) % 8",
i.e. stop at positions 8, 16, 24... which would not be pythonic,
but useful in e.g. our code base.

> This means it works with files that use a mix of
> tabs and spaces for indentation and can cope with whitespace errors
> where there is a space before a tab

Cool!

> (it's the job of
> --ws-error-highlight to deal with those errors, it should affect the
> move detection).

Not sure I understand this side note. So --ws-error-highlight can
highlight them, but the move detection should *not*(?) be affected
by the highlighted parts, or it should do things differently on
whether  --ws-error-highlight is given?

> It will also group the lines either
> side of a blank line if their indentation change matches so short
> lines followed by a blank line followed by more lines with the same
> indentation change will be correctly highlighted.

That sounds very useful (at least for my editor, that strips
blank lines to be empty lines), but I would think this feature is
worth its own commit/patch.

I wonder how much this feature is orthogonal to the existing
problem of detecting the moved indented blocks (existing
allow-indentation-change vs the new feature discussed first
above)

>
> This is a RFC as there are a number of questions about how to proceed
> from here:
>  1) Do we need a second option or should this implementation replace
>     --color-moved-ws=allow-indentation-change. I'm unclear if that mode
>     has any advantages for some people. There seems to have been an
>     intention [1] to get it working with mixes of tabs and spaces but
>     nothing ever came of it.

Oh, yeah, I was working on that, but dropped the ball.

I am not sure what the best end goal is, or if there are many different
modes that are useful to different target audiences.
My own itch at the time was (de-/)in-dented code from refactoring
patches for git.git and JGit (so Java, C, shell); and I think not hurting
python would also be good.

ignoring the mixture of ws seems like it would also cater free text or
other more exotic languages.

What is your use case, what kind of content do you process that
this patch would help you?

I am not overly attached to the current implementation of
 --color-moved-ws=allow-indentation-change,
and I think Junio has expressed the fear of "too many options"
already in this problem space, so if possible I would extend/replace
the current option.

>  2) If we keep two options what should this option be called, the name
>     is long and ambiguous at the moment - mixed could refer to mixed
>     indentation length rather than a mix of tabs and spaces.

Let's first read the code to have an opinion, or re-state the question
from above ("What is this used for?") as I could imagine one of the
modes could be "ws-pythonic" and allow for whitespace indentation
that would have the whole block count as an indented by the same
amount, (e.g. if you wrap a couple functions in python by a class).

> +++ b/diff.c
> @@ -304,7 +304,11 @@ static int parse_color_moved_ws(const char *arg)
>                 else if (!strcmp(sb.buf, "ignore-all-space"))
>                         ret |= XDF_IGNORE_WHITESPACE;
>                 else if (!strcmp(sb.buf, "allow-indentation-change"))
> -                       ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE;
> +                       ret = COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
> +                        (ret & ~COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE);

So this RFC lets "allow-indentation-change" override
"allow-mixed-indentation-change" and vice versa. That
also solves the issue of configuring one and giving the other
as a command line option. Nice.

>         if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
>             (ret & XDF_WHITESPACE_FLAGS))
>                 die(_("color-moved-ws: allow-indentation-change cannot be combined with other white space modes"));
> +       else if ((ret & COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) &&
> +                (ret & XDF_WHITESPACE_FLAGS))
> +               die(_("color-moved-ws: allow-mixed-indentation-change cannot be combined with other white space modes"));

Do we want to open a bit mask for all indentation change options? e.g.
#define COLOR_MOVED_WS_INDENTATION_MASK \
    (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE | \
     COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE)

> @@ -763,11 +770,65 @@ struct moved_entry {
>   * comparision is longer than the second.
>   */
>  struct ws_delta {
> -       char *string;
> +       union {
> +               int delta;
> +               char *string;
> +       };
>         unsigned int current_longer : 1;
> +       unsigned int have_string : 1;
>  };
>  #define WS_DELTA_INIT { NULL, 0 }
>
> +static int compute_mixed_ws_delta(const struct emitted_diff_symbol *a,
> +                                 const struct emitted_diff_symbol *b,
> +                                 int *delta)
> +{
> +       unsigned int i = 0, j = 0;
> +       int la, lb;
> +       int ta = a->flags & WS_TAB_WIDTH_MASK;
> +       int tb = b->flags & WS_TAB_WIDTH_MASK;
> +       const char *sa = a->line;
> +       const char *sb = b->line;
> +
> +       if (xdiff_is_blankline(sa, a->len, 0) &&
> +           xdiff_is_blankline(sb, b->len, 0)) {
> +               *delta = INT_MIN;
> +               return 1;
> +       }
> +
> +       /* skip any \v \f \r at start of indentation */
> +       while (sa[i] == '\f' || sa[i] == '\v' ||
> +              (sa[i] == '\r' && i < a->len - 1))

I do not understand the use of parens here.
I would have expected all comparisons in one
block which is then &&'d to the length requirement.
But this seems to tread \r special if not at EOL.

> +               i++;
> +       while (sb[j] == '\f' || sb[j] == '\v' ||
> +              (sb[j] == '\r' && j < b->len - 1))
> +               j++;
> +
> +       for (la = 0; ; i++) {
> +               if (sa[i] == ' ')
> +                       la++;
> +               else if (sa[i] == '\t')
> +                       la = ((la + ta) / ta) * ta;

multiplication/division may be expensive,
would something like

  la = la - (la % ta) + ta;

work instead? (the modulo is a hidden division,
but at least we do not have another multiplication)

Further I'd find it slightly easier to understand as it
"fills up to the next multiple of <ta>" whereas the
divide and re-multiply trick relies on integer logic, but
that might be just me.  Maybe just add a comment.

> +               else
> +                       break;
> +       }
> +       for (lb = 0; ; j++) {
> +               if (sb[j] == ' ')
> +                       lb++;
> +               else if (sb[j] == '\t')
> +                       lb = ((lb + tb) / tb) * tb;
> +               else
> +                       break;
> +       }
> +       if (a->s == DIFF_SYMBOL_PLUS)
> +               *delta = la - lb;
> +       else
> +               *delta = lb - la;

When writing the original feature I had reasons
not to rely on the symbol, as you could have
moved things from + to - (or the other way round)
and added or removed indentation. That is what the
`current_longer` is used for. But given that you only
count here, we can have negative numbers, so it
would work either way for adding or removing indentation.

But then, why do we need to have a different sign
depending on the sign of the line?

> +
> +       return (a->len - i == b->len - j) &&
> +               !memcmp(sa + i, sb + j, a->len - i);
> +}
> +
>  static int compute_ws_delta(const struct emitted_diff_symbol *a,
>                              const struct emitted_diff_symbol *b,
>                              struct ws_delta *out)
> @@ -778,6 +839,7 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
>
>         out->string = xmemdupz(longer->line, d);
>         out->current_longer = (a == longer);
> +       out->have_string = 1;
>
>         return !strncmp(longer->line + d, shorter->line, shorter->len);
>  }
> @@ -820,15 +882,34 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
>          * To do so we need to compare 'l' to 'cur', adjusting the
>          * one of them for the white spaces, depending which was longer.
>          */

The comment above would only apply to the original mode?

> +       if (o->color_moved_ws_handling &
> +           COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
> +               wslen = strlen(pmb->wsd->string);
> +               if (pmb->wsd->current_longer)
> +                       c += wslen;
> +               else
> +                       a += wslen;
>
> -       wslen = strlen(pmb->wsd->string);
> -       if (pmb->wsd->current_longer)
> -               c += wslen;
> -       else
> -               a += wslen;
> +               if (strcmp(a, c))
> +                       return 1;

This could be "return strcmp" instead of falling
through to the last line in the function in case of 0. But this
is just indenting code that is already there.

>
> -       if (strcmp(a, c))
> -               return 1;
> +               return 0;
> +       } else if (o->color_moved_ws_handling &
> +                  COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) {
> +               int delta;
> +
> +               if (!compute_mixed_ws_delta(cur->es, l, &delta))
> +                   return 1;
> +
> +               if (pmb->wsd->delta == INT_MIN) {
> +                       pmb->wsd->delta = delta;
> +                       return 0;
> +               }
> +
> +               return !(delta == pmb->wsd->delta || delta == INT_MIN);

Most of the code here deals with jumping over empty lines, and the new
mode is just comparing the two numbers.


> +       } else {
> +               BUG("no color_moved_ws_allow_indentation_change set");

Instead of the BUG here could we have a switch/case (or if/else)
covering the complete space of delta->have_string instead?
Then we would not leave a lingering bug in the code base.

> +       }
>
>         return 0;
>  }
> @@ -845,7 +926,8 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
>                          & XDF_WHITESPACE_FLAGS;
>
>         if (diffopt->color_moved_ws_handling &
> -           COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
> +           (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
> +            COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE))
>                 /*
>                  * As there is not specific white space config given,
>                  * we'd need to check for a new block, so ignore all
> @@ -953,7 +1035,8 @@ static void pmb_advance_or_null_multi_match(struct diff_options *o,
>                         pmb[i] = pmb[i]->next_line;
>                 } else {
>                         if (pmb[i]->wsd) {
> -                               free(pmb[i]->wsd->string);
> +                               if (pmb[i]->wsd->have_string)
> +                                       free(pmb[i]->wsd->string);
>                                 FREE_AND_NULL(pmb[i]->wsd);
>                         }
>                         pmb[i] = NULL;
> @@ -1066,7 +1149,8 @@ static void mark_color_as_moved(struct diff_options *o,
>                         continue;
>
>                 if (o->color_moved_ws_handling &
> -                   COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
> +                   (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
> +                    COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE))
>                         pmb_advance_or_null_multi_match(o, match, hm, pmb, pmb_nr, n);
>                 else
>                         pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
> @@ -1088,6 +1172,17 @@ static void mark_color_as_moved(struct diff_options *o,
>                                                 pmb[pmb_nr++] = match;
>                                         } else
>                                                 free(wsd);
> +                               } else if (o->color_moved_ws_handling &
> +                                          COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) {
> +                                       int delta;
> +
> +                                       if (compute_mixed_ws_delta(l, match->es, &delta)) {
> +                                               struct ws_delta *wsd = xmalloc(sizeof(*match->wsd));
> +                                               wsd->delta = delta;
> +                                               wsd->have_string = 0;
> +                                               match->wsd = wsd;
> +                                               pmb[pmb_nr++] = match;

I would want to keep mark_color_as_moved and friends smaller, and instead
move the complexity to compute_ws_delta  which would check for the mode
in `o` instead of repeating the modes in all these function.
Just like cmp_in_block_with_wsd takes both modes into account


> +                                       }
>                                 } else {
>                                         pmb[pmb_nr++] = match;
>                                 }
> @@ -5740,7 +5835,8 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
>                         struct hashmap add_lines, del_lines;
>
>                         if (o->color_moved_ws_handling &
> -                           COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
> +                           (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
> +                            COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE))
>                                 o->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
>
>                         hashmap_init(&del_lines, moved_entry_cmp, o, 0);
> diff --git a/diff.h b/diff.h
> index 5e6bcf0926..03628cda45 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -217,6 +217,7 @@ struct diff_options {
>
>         /* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
>         #define COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE (1<<5)
> +       #define COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE (1<<6)
>         int color_moved_ws_handling;
>  };
>
> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index 41facf7abf..737dbd4a42 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -1902,4 +1902,93 @@ test_expect_success 'compare whitespace delta incompatible with other space opti
>         test_i18ngrep allow-indentation-change err
>  '
>
> +NUL=''
> +test_expect_success 'compare mixed whitespace delta across moved blocks' '
> +
> +       git reset --hard &&
> +       tr Q_ "\t " <<-EOF >text.txt &&

So this is the extended version of q_to_tab, as it also
translates _ to blank.

> +       ${NUL}

is an empty line? So maybe s/NUL/EMPTY/ ?

I think the following test cases may be useful:

    (3x_) too short without
    $EMPTY
    (4x_)  being grouped across blank line

that will be indented to

    (3x_+n) too short without
    $EMPTY
    (4x_+n)  being grouped across blank line

i.e. the current test of grouping across an empty line
always has the same indentation before and after, but we
only care about the change in indentation, such that
we should be able to have different indentation levels
before and after an empty line in the code, and
still count it as a block when they are indented the
same amount.


Is it possible for a block to start with an empty line?
How do we handle multiple adjacent empty lines?

Do we need tests for such special cases?

-

I hope this helps, as I gave the feedback above
mostly unstructured.

I'm excited about the skip blank lines mode, but
I am not quite sure about the "just count" mode,
as that is what I had originally IIRC but Jonathan
seemed to not be fond of it. Maybe he remembers
why.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change
  2018-09-25  1:07   ` Stefan Beller
@ 2018-10-09  9:50     ` Phillip Wood
  2018-10-09 21:10       ` Stefan Beller
  0 siblings, 1 reply; 44+ messages in thread
From: Phillip Wood @ 2018-10-09  9:50 UTC (permalink / raw)
  To: Stefan Beller, Phillip Wood, Jonathan Tan; +Cc: git

Hi Stefan

Thanks for all your comments on this, they've been really helpful.

On 25/09/2018 02:07, Stefan Beller wrote:
> On Mon, Sep 24, 2018 at 3:06 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>>
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> This adds another mode for highlighting lines that have moved with an
>> indentation change. Unlike the existing
>> --color-moved-ws=allow-indentation-change setting this mode uses the
>> visible change in the indentation to group lines, rather than the
>> indentation string.
> 
> Wow! Thanks for putting this RFC out.
> My original vision was to be useful to python users as well,
> which counts 1 tab as 8 spaces IIUC.
> 
> The "visual" indentation you mention here sounds like
> a tab is counted as "up to the next position of (n-1) % 8",
> i.e. stop at positions 8, 16, 24... which would not be pythonic,
> but useful in e.g. our code base.

The docs for python2 state[1]

  Leading whitespace (spaces and tabs) at the beginning of a logical
  line is used to compute the indentation level of the line, which in
  turn is used to determine the grouping of statements.

  First, tabs are replaced (from left to right) by one to eight spaces
  such that the total number of characters up to and including the
  replacement is a multiple of eight (this is intended to be the same
  rule as used by Unix). The total number of spaces preceding the
  first non-blank character then determines the line’s
  indentation. Indentation cannot be split over multiple physical
  lines using backslashes; the whitespace up to the first backslash
  determines the indentation.

As I understand it that fits with the "visual" indentation implemented
by this patch.

For python3 adds a third paragraph[2]

  Indentation is rejected as inconsistent if a source file mixes tabs
  and spaces in a way that makes the meaning dependent on the worth of
  a tab in spaces; a TabError is raised in that case.

My impression is that people generally avoid mixing tabs and spaces in
python3 code, in which case I wonder if the "visual" indentation
combined with a suitable setting for core.whitespace to highlight
erroneous tabs/spaces would be enough. (I'm not a python programmer so I
could be completely wrong on that)

In any case the more I think about it the more convinced I am that
having a move detection mode for "pythonic" indentation is a mistake. If
a line is added with dodgy indentation then it is a problem whether or
not it has been moved so I think this should be handled by the
whitespace error highlighting. This would allow a single mode for move
detection with an indentation change.

[1] https://docs.python.org/2.7/reference/lexical_analysis.html#indentation
[2] https://docs.python.org/3.7/reference/lexical_analysis.html#indentation

>> This means it works with files that use a mix of
>> tabs and spaces for indentation and can cope with whitespace errors
>> where there is a space before a tab
> 
> Cool!
> 
>> (it's the job of
>> --ws-error-highlight to deal with those errors, it should affect the
>> move detection).
> 
> Not sure I understand this side note. So --ws-error-highlight can
> highlight them, but the move detection should *not*(?) be affected
> by the highlighted parts, or it should do things differently on
> whether  --ws-error-highlight is given?

I just meant that the move detection should pretend the whitespace
errors do not exist.

>> It will also group the lines either
>> side of a blank line if their indentation change matches so short
>> lines followed by a blank line followed by more lines with the same
>> indentation change will be correctly highlighted.
> 
> That sounds very useful (at least for my editor, that strips
> blank lines to be empty lines), but I would think this feature is
> worth its own commit/patch.
> 
> I wonder how much this feature is orthogonal to the existing
> problem of detecting the moved indented blocks (existing
> allow-indentation-change vs the new feature discussed first
> above)

It only works if the blank lines get moved with the non-blank lines
around it, then it matches the normal moved behavior I think. I'd like
to have it include blank context lines where the lines either side have
the same indentation change but that is trickier to implement.

>>
>> This is a RFC as there are a number of questions about how to proceed
>> from here:
>>  1) Do we need a second option or should this implementation replace
>>     --color-moved-ws=allow-indentation-change. I'm unclear if that mode
>>     has any advantages for some people. There seems to have been an
>>     intention [1] to get it working with mixes of tabs and spaces but
>>     nothing ever came of it.
> 
> Oh, yeah, I was working on that, but dropped the ball.
> 
> I am not sure what the best end goal is, or if there are many different
> modes that are useful to different target audiences.
> My own itch at the time was (de-/)in-dented code from refactoring
> patches for git.git and JGit (so Java, C, shell); and I think not hurting
> python would also be good.

As I said above I've more or less come to the view that the correctness
of pythonic indentation is orthogonal to move detection as it affects
all additions, not just those that correspond to moved lines.

> ignoring the mixture of ws seems like it would also cater free text or
> other more exotic languages.
> 
> What is your use case, what kind of content do you process that
> this patch would help you?

I wrote this because I was re-factoring some shell code than was using a
indentation step of four spaces but with tabs in the leading indentation
which the current mode does not handle.

> I am not overly attached to the current implementation of
>  --color-moved-ws=allow-indentation-change,
> and I think Junio has expressed the fear of "too many options"
> already in this problem space, so if possible I would extend/replace
> the current option.
> 
>>  2) If we keep two options what should this option be called, the name
>>     is long and ambiguous at the moment - mixed could refer to mixed
>>     indentation length rather than a mix of tabs and spaces.
> 
> Let's first read the code to have an opinion, or re-state the question
> from above ("What is this used for?") as I could imagine one of the
> modes could be "ws-pythonic" and allow for whitespace indentation
> that would have the whole block count as an indented by the same
> amount, (e.g. if you wrap a couple functions in python by a class).
> 
>> +++ b/diff.c
>> @@ -304,7 +304,11 @@ static int parse_color_moved_ws(const char *arg)
>>                 else if (!strcmp(sb.buf, "ignore-all-space"))
>>                         ret |= XDF_IGNORE_WHITESPACE;
>>                 else if (!strcmp(sb.buf, "allow-indentation-change"))
>> -                       ret |= COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE;
>> +                       ret = COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
>> +                        (ret & ~COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE);
> 
> So this RFC lets "allow-indentation-change" override
> "allow-mixed-indentation-change" and vice versa. That
> also solves the issue of configuring one and giving the other
> as a command line option. Nice.
> 
>>         if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
>>             (ret & XDF_WHITESPACE_FLAGS))
>>                 die(_("color-moved-ws: allow-indentation-change cannot be combined with other white space modes"));
>> +       else if ((ret & COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) &&
>> +                (ret & XDF_WHITESPACE_FLAGS))
>> +               die(_("color-moved-ws: allow-mixed-indentation-change cannot be combined with other white space modes"));
> 
> Do we want to open a bit mask for all indentation change options? e.g.
> #define COLOR_MOVED_WS_INDENTATION_MASK \
>     (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE | \
>      COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE)

That's a good idea if we retain two separate modes

> 
>> @@ -763,11 +770,65 @@ struct moved_entry {
>>   * comparision is longer than the second.
>>   */
>>  struct ws_delta {
>> -       char *string;
>> +       union {
>> +               int delta;
>> +               char *string;
>> +       };
>>         unsigned int current_longer : 1;
>> +       unsigned int have_string : 1;
>>  };
>>  #define WS_DELTA_INIT { NULL, 0 }
>>
>> +static int compute_mixed_ws_delta(const struct emitted_diff_symbol *a,
>> +                                 const struct emitted_diff_symbol *b,
>> +                                 int *delta)
>> +{
>> +       unsigned int i = 0, j = 0;
>> +       int la, lb;
>> +       int ta = a->flags & WS_TAB_WIDTH_MASK;
>> +       int tb = b->flags & WS_TAB_WIDTH_MASK;
>> +       const char *sa = a->line;
>> +       const char *sb = b->line;
>> +
>> +       if (xdiff_is_blankline(sa, a->len, 0) &&
>> +           xdiff_is_blankline(sb, b->len, 0)) {
>> +               *delta = INT_MIN;
>> +               return 1;
>> +       }
>> +
>> +       /* skip any \v \f \r at start of indentation */
>> +       while (sa[i] == '\f' || sa[i] == '\v' ||
>> +              (sa[i] == '\r' && i < a->len - 1))
> 
> I do not understand the use of parens here.
> I would have expected all comparisons in one
> block which is then &&'d to the length requirement.
> But this seems to tread \r special if not at EOL.

I only want to skip '\r' if it isn't part of "\r\n" at the end of the
line. (similar to way --ignore-cr-at-eol does not ignore a trailing '\r'
on an incomplete line)

>> +               i++;
>> +       while (sb[j] == '\f' || sb[j] == '\v' ||
>> +              (sb[j] == '\r' && j < b->len - 1))
>> +               j++;
>> +
>> +       for (la = 0; ; i++) {
>> +               if (sa[i] == ' ')
>> +                       la++;
>> +               else if (sa[i] == '\t')
>> +                       la = ((la + ta) / ta) * ta;
> 
> multiplication/division may be expensive,
> would something like
> 
>   la = la - (la % ta) + ta;
> 
> work instead? (the modulo is a hidden division,
> but at least we do not have another multiplication)
> 
> Further I'd find it slightly easier to understand as it
> "fills up to the next multiple of <ta>" whereas the
> divide and re-multiply trick relies on integer logic, but
> that might be just me.  Maybe just add a comment.

I agree your version is clearer and it is marginally (~1%) faster

>> +               else
>> +                       break;
>> +       }
>> +       for (lb = 0; ; j++) {
>> +               if (sb[j] == ' ')
>> +                       lb++;
>> +               else if (sb[j] == '\t')
>> +                       lb = ((lb + tb) / tb) * tb;
>> +               else
>> +                       break;
>> +       }
>> +       if (a->s == DIFF_SYMBOL_PLUS)
>> +               *delta = la - lb;
>> +       else
>> +               *delta = lb - la;
> 
> When writing the original feature I had reasons
> not to rely on the symbol, as you could have
> moved things from + to - (or the other way round)
> and added or removed indentation. That is what the
> `current_longer` is used for. But given that you only
> count here, we can have negative numbers, so it
> would work either way for adding or removing indentation.
> 
> But then, why do we need to have a different sign
> depending on the sign of the line?

The check means that we get the same delta whichever way round the lines
are compared. I think I added this because without it the highlighting
gets broken if there is increase in indentation followed by an identical
decrease on the next line.

>> +
>> +       return (a->len - i == b->len - j) &&
>> +               !memcmp(sa + i, sb + j, a->len - i);
>> +}
>> +
>>  static int compute_ws_delta(const struct emitted_diff_symbol *a,
>>                              const struct emitted_diff_symbol *b,
>>                              struct ws_delta *out)
>> @@ -778,6 +839,7 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
>>
>>         out->string = xmemdupz(longer->line, d);
>>         out->current_longer = (a == longer);
>> +       out->have_string = 1;
>>
>>         return !strncmp(longer->line + d, shorter->line, shorter->len);
>>  }
>> @@ -820,15 +882,34 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
>>          * To do so we need to compare 'l' to 'cur', adjusting the
>>          * one of them for the white spaces, depending which was longer.
>>          */
> 
> The comment above would only apply to the original mode?

Yes that should be changed/moved

>> +       if (o->color_moved_ws_handling &
>> +           COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
>> +               wslen = strlen(pmb->wsd->string);
>> +               if (pmb->wsd->current_longer)
>> +                       c += wslen;
>> +               else
>> +                       a += wslen;
>>
>> -       wslen = strlen(pmb->wsd->string);
>> -       if (pmb->wsd->current_longer)
>> -               c += wslen;
>> -       else
>> -               a += wslen;
>> +               if (strcmp(a, c))
>> +                       return 1;
> 
> This could be "return strcmp" instead of falling
> through to the last line in the function in case of 0. But this
> is just indenting code that is already there.

As you say it's keeping the code the same, also while it does not matter
to the caller at the moment I was wary of potentially changing the sign
of the return value.

>> -       if (strcmp(a, c))
>> -               return 1;
>> +               return 0;
>> +       } else if (o->color_moved_ws_handling &
>> +                  COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) {
>> +               int delta;
>> +
>> +               if (!compute_mixed_ws_delta(cur->es, l, &delta))
>> +                   return 1;
>> +
>> +               if (pmb->wsd->delta == INT_MIN) {
>> +                       pmb->wsd->delta = delta;
>> +                       return 0;
>> +               }
>> +
>> +               return !(delta == pmb->wsd->delta || delta == INT_MIN);
> 
> Most of the code here deals with jumping over empty lines, and the new
> mode is just comparing the two numbers.
> 
>> +       } else {
>> +               BUG("no color_moved_ws_allow_indentation_change set");
> 
> Instead of the BUG here could we have a switch/case (or if/else)
> covering the complete space of delta->have_string instead?
> Then we would not leave a lingering bug in the code base.

I'm not sure what you mean, we cover all the existing
color_moved_ws_handling values, I added the BUG() call to pick up future
omissions if another mode is added. (If we go for a single mode none of
this matters)

>> +       }
>>
>>         return 0;
>>  }
>> @@ -845,7 +926,8 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
>>                          & XDF_WHITESPACE_FLAGS;
>>
>>         if (diffopt->color_moved_ws_handling &
>> -           COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
>> +           (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
>> +            COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE))
>>                 /*
>>                  * As there is not specific white space config given,
>>                  * we'd need to check for a new block, so ignore all
>> @@ -953,7 +1035,8 @@ static void pmb_advance_or_null_multi_match(struct diff_options *o,
>>                         pmb[i] = pmb[i]->next_line;
>>                 } else {
>>                         if (pmb[i]->wsd) {
>> -                               free(pmb[i]->wsd->string);
>> +                               if (pmb[i]->wsd->have_string)
>> +                                       free(pmb[i]->wsd->string);
>>                                 FREE_AND_NULL(pmb[i]->wsd);
>>                         }
>>                         pmb[i] = NULL;
>> @@ -1066,7 +1149,8 @@ static void mark_color_as_moved(struct diff_options *o,
>>                         continue;
>>
>>                 if (o->color_moved_ws_handling &
>> -                   COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
>> +                   (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
>> +                    COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE))
>>                         pmb_advance_or_null_multi_match(o, match, hm, pmb, pmb_nr, n);
>>                 else
>>                         pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
>> @@ -1088,6 +1172,17 @@ static void mark_color_as_moved(struct diff_options *o,
>>                                                 pmb[pmb_nr++] = match;
>>                                         } else
>>                                                 free(wsd);
>> +                               } else if (o->color_moved_ws_handling &
>> +                                          COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE) {
>> +                                       int delta;
>> +
>> +                                       if (compute_mixed_ws_delta(l, match->es, &delta)) {
>> +                                               struct ws_delta *wsd = xmalloc(sizeof(*match->wsd));
>> +                                               wsd->delta = delta;
>> +                                               wsd->have_string = 0;
>> +                                               match->wsd = wsd;
>> +                                               pmb[pmb_nr++] = match;
> 
> I would want to keep mark_color_as_moved and friends smaller, and instead
> move the complexity to compute_ws_delta  which would check for the mode
> in `o` instead of repeating the modes in all these function.
> Just like cmp_in_block_with_wsd takes both modes into account

That makes sense, I'll fix it.

>> +                                       }
>>                                 } else {
>>                                         pmb[pmb_nr++] = match;
>>                                 }
>> @@ -5740,7 +5835,8 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
>>                         struct hashmap add_lines, del_lines;
>>
>>                         if (o->color_moved_ws_handling &
>> -                           COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
>> +                           (COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE |
>> +                            COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE))
>>                                 o->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
>>
>>                         hashmap_init(&del_lines, moved_entry_cmp, o, 0);
>> diff --git a/diff.h b/diff.h
>> index 5e6bcf0926..03628cda45 100644
>> --- a/diff.h
>> +++ b/diff.h
>> @@ -217,6 +217,7 @@ struct diff_options {
>>
>>         /* XDF_WHITESPACE_FLAGS regarding block detection are set at 2, 3, 4 */
>>         #define COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE (1<<5)
>> +       #define COLOR_MOVED_WS_ALLOW_MIXED_INDENTATION_CHANGE (1<<6)
>>         int color_moved_ws_handling;
>>  };
>>
>> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
>> index 41facf7abf..737dbd4a42 100755
>> --- a/t/t4015-diff-whitespace.sh
>> +++ b/t/t4015-diff-whitespace.sh
>> @@ -1902,4 +1902,93 @@ test_expect_success 'compare whitespace delta incompatible with other space opti
>>         test_i18ngrep allow-indentation-change err
>>  '
>>
>> +NUL=''
>> +test_expect_success 'compare mixed whitespace delta across moved blocks' '
>> +
>> +       git reset --hard &&
>> +       tr Q_ "\t " <<-EOF >text.txt &&
> 
> So this is the extended version of q_to_tab, as it also
> translates _ to blank.

Exactly

>> +       ${NUL}
> 
> is an empty line? So maybe s/NUL/EMPTY/ ?

That might be clearer

> I think the following test cases may be useful:
> 
>     (3x_) too short without
>     $EMPTY
>     (4x_)  being grouped across blank line
> 
> that will be indented to
> 
>     (3x_+n) too short without
>     $EMPTY
>     (4x_+n)  being grouped across blank line
> 
> i.e. the current test of grouping across an empty line
> always has the same indentation before and after, but we
> only care about the change in indentation, such that
> we should be able to have different indentation levels
> before and after an empty line in the code, and
> still count it as a block when they are indented the
> same amount.

That's a good idea, thanks

> Is it possible for a block to start with an empty line?

Yes, the block in the test starts with an empty line

> How do we handle multiple adjacent empty lines?

We group them all with the moved lines. This is slightly different to
--ignore-blank-lines which has a threshold on how may blank lines it
will ignore.

> Do we need tests for such special cases?

I would probably be best, picking up changes to the behavior of unusual
corner cases is best done with a test.

> I hope this helps, as I gave the feedback above
> mostly unstructured.

It's been really useful, thank for taking the time to look through the
patch so carefully.

Best Wishes

Phillip

> I'm excited about the skip blank lines mode, but
> I am not quite sure about the "just count" mode,
> as that is what I had originally IIRC but Jonathan
> seemed to not be fond of it. Maybe he remembers
> why.
> 
> Thanks,
> Stefan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change
  2018-10-09  9:50     ` Phillip Wood
@ 2018-10-09 21:10       ` Stefan Beller
  2018-10-10 15:26         ` Phillip Wood
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Beller @ 2018-10-09 21:10 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Jonathan Tan, git

> As I said above I've more or less come to the view that the correctness
> of pythonic indentation is orthogonal to move detection as it affects
> all additions, not just those that correspond to moved lines.

Makes sense.

> > What is your use case, what kind of content do you process that
> > this patch would help you?
>
> I wrote this because I was re-factoring some shell code than was using a
> indentation step of four spaces but with tabs in the leading indentation
> which the current mode does not handle.

Ah that is good to know.

I was thinking whether we want to generalize the move detection into a more
generic "detect and fade out uninteresting things" and not just focus on white
spaces (but these are most often the uninteresting things).

Over the last year we had quite a couple of large refactorings, that
would have helped by that:
* For example the hash transition plan had a lot of patches that
  were basically s/char *sha1/struct object oid/ or some variation thereof.
* Introducing struct repository

I used the word diff to look at those patches, which helped a lot, but
maybe a mode that would allow me to mark this specific replacement
uninteresting would be even better.
Maybe this can be done as a piggyback on top of the move detection as
a "move in place, but with uninteresting pattern". The problem of this
is that the pattern needs to be accounted for when hashing the entries
into the hashmaps, which is easy when doing white spaces only.


> >> +       if (a->s == DIFF_SYMBOL_PLUS)
> >> +               *delta = la - lb;
> >> +       else
> >> +               *delta = lb - la;
> >
> > When writing the original feature I had reasons
> > not to rely on the symbol, as you could have
> > moved things from + to - (or the other way round)
> > and added or removed indentation. That is what the
> > `current_longer` is used for. But given that you only
> > count here, we can have negative numbers, so it
> > would work either way for adding or removing indentation.
> >
> > But then, why do we need to have a different sign
> > depending on the sign of the line?
>
> The check means that we get the same delta whichever way round the lines
> are compared. I think I added this because without it the highlighting
> gets broken if there is increase in indentation followed by an identical
> decrease on the next line.

But wouldn't we want to get that highlighted?
I do not quite understand the scenario, yet. Are both indented
and dedented part of the same block?


> >
> >> +       } else {
> >> +               BUG("no color_moved_ws_allow_indentation_change set");
> >
> > Instead of the BUG here could we have a switch/case (or if/else)
> > covering the complete space of delta->have_string instead?
> > Then we would not leave a lingering bug in the code base.
>
> I'm not sure what you mean, we cover all the existing
> color_moved_ws_handling values, I added the BUG() call to pick up future
> omissions if another mode is added. (If we go for a single mode none of
> this matters)

Ah, makes sense!

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change
  2018-10-09 21:10       ` Stefan Beller
@ 2018-10-10 15:26         ` Phillip Wood
  2018-10-10 18:05           ` Stefan Beller
  0 siblings, 1 reply; 44+ messages in thread
From: Phillip Wood @ 2018-10-10 15:26 UTC (permalink / raw)
  To: Stefan Beller, Phillip Wood; +Cc: Jonathan Tan, git

On 09/10/2018 22:10, Stefan Beller wrote:
>> As I said above I've more or less come to the view that the correctness
>> of pythonic indentation is orthogonal to move detection as it affects
>> all additions, not just those that correspond to moved lines.
> 
> Makes sense.

Right so are you happy for we to re-roll with a single 
allow-indentation-change mode based on my RFC?

> 
>>> What is your use case, what kind of content do you process that
>>> this patch would help you?
>>
>> I wrote this because I was re-factoring some shell code than was using a
>> indentation step of four spaces but with tabs in the leading indentation
>> which the current mode does not handle.
> 
> Ah that is good to know.
> 
> I was thinking whether we want to generalize the move detection into a more
> generic "detect and fade out uninteresting things" and not just focus on white
> spaces (but these are most often the uninteresting things).
> 
> Over the last year we had quite a couple of large refactorings, that
> would have helped by that:
> * For example the hash transition plan had a lot of patches that
>    were basically s/char *sha1/struct object oid/ or some variation thereof.
> * Introducing struct repository
> 
> I used the word diff to look at those patches, which helped a lot, but
> maybe a mode that would allow me to mark this specific replacement
> uninteresting would be even better.
> Maybe this can be done as a piggyback on top of the move detection as
> a "move in place, but with uninteresting pattern". The problem of this
> is that the pattern needs to be accounted for when hashing the entries
> into the hashmaps, which is easy when doing white spaces only.

Yes the I like the idea. Yesterday I was looking at Alban's patches to 
refactor the todo list handling for rebase -i and there are a lot of '.' 
to '->' changes which weren't particularly interesting though at least 
diff-highlight made it clear if that was the only change on a line. 
Incidentally --color-moved was very useful for looking at that series.

>>>> +       if (a->s == DIFF_SYMBOL_PLUS)
>>>> +               *delta = la - lb;
>>>> +       else
>>>> +               *delta = lb - la;
>>>
>>> When writing the original feature I had reasons
>>> not to rely on the symbol, as you could have
>>> moved things from + to - (or the other way round)
>>> and added or removed indentation. That is what the
>>> `current_longer` is used for. But given that you only
>>> count here, we can have negative numbers, so it
>>> would work either way for adding or removing indentation.
>>>
>>> But then, why do we need to have a different sign
>>> depending on the sign of the line?
>>
>> The check means that we get the same delta whichever way round the lines
>> are compared. I think I added this because without it the highlighting
>> gets broken if there is increase in indentation followed by an identical
>> decrease on the next line.
> 
> But wouldn't we want to get that highlighted?
> I do not quite understand the scenario, yet. Are both indented
> and dedented part of the same block?

With --color-moved=zebra the indented lines and the de-indented lines 
should be different colors, without the test they both ended up in the 
same block.

Best Wishes

Phillip
>>>
>>>> +       } else {
>>>> +               BUG("no color_moved_ws_allow_indentation_change set");
>>>
>>> Instead of the BUG here could we have a switch/case (or if/else)
>>> covering the complete space of delta->have_string instead?
>>> Then we would not leave a lingering bug in the code base.
>>
>> I'm not sure what you mean, we cover all the existing
>> color_moved_ws_handling values, I added the BUG() call to pick up future
>> omissions if another mode is added. (If we go for a single mode none of
>> this matters)
> 
> Ah, makes sense!
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change
  2018-10-10 15:26         ` Phillip Wood
@ 2018-10-10 18:05           ` Stefan Beller
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Beller @ 2018-10-10 18:05 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Jonathan Tan, git

On Wed, Oct 10, 2018 at 8:26 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>
> On 09/10/2018 22:10, Stefan Beller wrote:
> >> As I said above I've more or less come to the view that the correctness
> >> of pythonic indentation is orthogonal to move detection as it affects
> >> all additions, not just those that correspond to moved lines.
> >
> > Makes sense.
>
> Right so are you happy for we to re-roll with a single
> allow-indentation-change mode based on my RFC?

I am happy with that.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v1 0/9] diff --color-moved-ws fixes and enhancment
  2018-09-24 10:06 [RFC PATCH 0/3] diff --color-moved-ws: allow mixed spaces and tabs in indentation change Phillip Wood
                   ` (3 preceding siblings ...)
  2018-09-24 11:03 ` [RFC PATCH 0/3] diff --color-moved-ws: allow mixed spaces and tabs in indentation change Phillip Wood
@ 2018-11-16 11:03 ` Phillip Wood
  2018-11-16 11:03   ` [PATCH v1 1/9] diff: document --no-color-moved Phillip Wood
                     ` (8 more replies)
  2018-11-23 11:16 ` [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
  5 siblings, 9 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-16 11:03 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

When trying out the new --color-moved-ws=allow-indentation-change I
was disappointed to discover it did not work if the indentation
contains a mix of spaces and tabs. This series reworks it so that it
does.

Since the rfc this series has grown a few fixes at the beginning. The
implementation has been reworked, the last two patches correspond to a
heavily reworked version the last patch of the rfc version, all the
other patches are new.

Phillip Wood (9):
  diff: document --no-color-moved
  diff: use whitespace consistently
  diff: allow --no-color-moved-ws
  diff --color-moved-ws: demonstrate false positives
  diff --color-moved-ws: fix false positives
  diff --color-moved=zebra: be stricter with color alternation
  diff --color-moved-ws: optimize allow-indentation-change
  diff --color-moved-ws: modify allow-indentation-change
  diff --color-moved-ws: handle blank lines

 Documentation/diff-options.txt |  15 ++-
 diff.c                         | 219 +++++++++++++++++++++------------
 t/t4015-diff-whitespace.sh     |  99 ++++++++++++++-
 3 files changed, 251 insertions(+), 82 deletions(-)

-- 
2.19.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v1 1/9] diff: document --no-color-moved
  2018-11-16 11:03 ` [PATCH v1 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
@ 2018-11-16 11:03   ` Phillip Wood
  2018-11-16 11:03   ` [PATCH v1 2/9] diff: use whitespace consistently Phillip Wood
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-16 11:03 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Add documentation for --no-color-moved.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 Documentation/diff-options.txt | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 0378cd574e..151690f814 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -293,6 +293,10 @@ dimmed-zebra::
 	`dimmed_zebra` is a deprecated synonym.
 --
 
+--no-color-moved::
+	Turn off move detection. This can be used to override configuration
+	settings. It is the same as `--color-moved=no`.
+
 --color-moved-ws=<modes>::
 	This configures how white spaces are ignored when performing the
 	move detection for `--color-moved`.
-- 
2.19.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v1 2/9] diff: use whitespace consistently
  2018-11-16 11:03 ` [PATCH v1 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
  2018-11-16 11:03   ` [PATCH v1 1/9] diff: document --no-color-moved Phillip Wood
@ 2018-11-16 11:03   ` Phillip Wood
  2018-11-16 18:29     ` Stefan Beller
  2018-11-16 11:03   ` [PATCH v1 3/9] diff: allow --no-color-moved-ws Phillip Wood
                     ` (6 subsequent siblings)
  8 siblings, 1 reply; 44+ messages in thread
From: Phillip Wood @ 2018-11-16 11:03 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Most of the documentation uses 'whitespace' rather than 'white space'
or 'white spaces' convert to latter two to the former for consistency.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 Documentation/diff-options.txt | 4 ++--
 diff.c                         | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 151690f814..57a2f4cb7a 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -298,7 +298,7 @@ dimmed-zebra::
 	settings. It is the same as `--color-moved=no`.
 
 --color-moved-ws=<modes>::
-	This configures how white spaces are ignored when performing the
+	This configures how whitespace is ignored when performing the
 	move detection for `--color-moved`.
 ifdef::git-diff[]
 	It can be set by the `diff.colorMovedWS` configuration setting.
@@ -316,7 +316,7 @@ ignore-all-space::
 	Ignore whitespace when comparing lines. This ignores differences
 	even if one line has whitespace where the other line has none.
 allow-indentation-change::
-	Initially ignore any white spaces in the move detection, then
+	Initially ignore any whitespace in the move detection, then
 	group the moved code blocks only into a block if the change in
 	whitespace is the same per line. This is incompatible with the
 	other modes.
diff --git a/diff.c b/diff.c
index c29b1cce14..78cd3958f4 100644
--- a/diff.c
+++ b/diff.c
@@ -320,7 +320,7 @@ static int parse_color_moved_ws(const char *arg)
 
 	if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
 	    (ret & XDF_WHITESPACE_FLAGS))
-		die(_("color-moved-ws: allow-indentation-change cannot be combined with other white space modes"));
+		die(_("color-moved-ws: allow-indentation-change cannot be combined with other whitespace modes"));
 
 	string_list_clear(&l, 0);
 
-- 
2.19.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v1 3/9] diff: allow --no-color-moved-ws
  2018-11-16 11:03 ` [PATCH v1 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
  2018-11-16 11:03   ` [PATCH v1 1/9] diff: document --no-color-moved Phillip Wood
  2018-11-16 11:03   ` [PATCH v1 2/9] diff: use whitespace consistently Phillip Wood
@ 2018-11-16 11:03   ` Phillip Wood
  2018-11-16 11:03   ` [PATCH v1 4/9] diff --color-moved-ws: demonstrate false positives Phillip Wood
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-16 11:03 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Allow --no-color-moved-ws and --color-moved-ws=no to cancel any previous
--color-moved-ws option.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 Documentation/diff-options.txt | 7 +++++++
 diff.c                         | 6 +++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 57a2f4cb7a..e1744fa80d 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -306,6 +306,8 @@ endif::git-diff[]
 	These modes can be given as a comma separated list:
 +
 --
+no::
+	Do not ignore whitespace when performing move detection.
 ignore-space-at-eol::
 	Ignore changes in whitespace at EOL.
 ignore-space-change::
@@ -322,6 +324,11 @@ allow-indentation-change::
 	other modes.
 --
 
+--no-color-moved-ws::
+	Do not ignore whitespace when performing move detection. This can be
+	used to override configuration settings. It is the same as
+	`--color-moved-ws=no`.
+
 --word-diff[=<mode>]::
 	Show a word diff, using the <mode> to delimit changed words.
 	By default, words are delimited by whitespace; see
diff --git a/diff.c b/diff.c
index 78cd3958f4..9b9811988b 100644
--- a/diff.c
+++ b/diff.c
@@ -304,7 +304,9 @@ static int parse_color_moved_ws(const char *arg)
 		strbuf_addstr(&sb, i->string);
 		strbuf_trim(&sb);
 
-		if (!strcmp(sb.buf, "ignore-space-change"))
+		if (!strcmp(sb.buf, "no"))
+			ret = 0;
+		else if (!strcmp(sb.buf, "ignore-space-change"))
 			ret |= XDF_IGNORE_WHITESPACE_CHANGE;
 		else if (!strcmp(sb.buf, "ignore-space-at-eol"))
 			ret |= XDF_IGNORE_WHITESPACE_AT_EOL;
@@ -5008,6 +5010,8 @@ int diff_opt_parse(struct diff_options *options,
 		if (cm < 0)
 			die("bad --color-moved argument: %s", arg);
 		options->color_moved = cm;
+	} else if (!strcmp(arg, "--no-color-moved-ws")) {
+		options->color_moved_ws_handling = 0;
 	} else if (skip_prefix(arg, "--color-moved-ws=", &arg)) {
 		options->color_moved_ws_handling = parse_color_moved_ws(arg);
 	} else if (skip_to_optional_arg_default(arg, "--color-words", &options->word_regex, NULL)) {
-- 
2.19.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v1 4/9] diff --color-moved-ws: demonstrate false positives
  2018-11-16 11:03 ` [PATCH v1 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
                     ` (2 preceding siblings ...)
  2018-11-16 11:03   ` [PATCH v1 3/9] diff: allow --no-color-moved-ws Phillip Wood
@ 2018-11-16 11:03   ` Phillip Wood
  2018-11-16 11:03   ` [PATCH v1 5/9] diff --color-moved-ws: fix " Phillip Wood
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-16 11:03 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

'diff --color-moved-ws=allow-indentation-change' can highlight lines
that have internal whitespace changes rather than indentation
changes. For example in commit 1a07e59c3e ("Update messages in
preparation for i18n", 2018-07-21) the lines

-               die (_("must end with a color"));
+               die(_("must end with a color"));

are highlighted as moved when they should not be. Modify an existing
test to show the problem that will be fixed in the next commit.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 t/t4015-diff-whitespace.sh | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index a9fb226c5a..eee81a1987 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1809,7 +1809,7 @@ test_expect_success 'only move detection ignores white spaces' '
 	test_cmp expected actual
 '
 
-test_expect_success 'compare whitespace delta across moved blocks' '
+test_expect_failure 'compare whitespace delta across moved blocks' '
 
 	git reset --hard &&
 	q_to_tab <<-\EOF >text.txt &&
@@ -1827,6 +1827,7 @@ test_expect_success 'compare whitespace delta across moved blocks' '
 	QQQthat has similar lines
 	QQQto previous blocks, but with different indent
 	QQQYetQAnotherQoutlierQ
+	QLine with internal w h i t e s p a c e change
 	EOF
 
 	git add text.txt &&
@@ -1847,6 +1848,7 @@ test_expect_success 'compare whitespace delta across moved blocks' '
 	QQthat has similar lines
 	QQto previous blocks, but with different indent
 	QQYetQAnotherQoutlier
+	QLine with internal whitespace change
 	EOF
 
 	git diff --color --color-moved --color-moved-ws=allow-indentation-change >actual.raw &&
@@ -1856,7 +1858,7 @@ test_expect_success 'compare whitespace delta across moved blocks' '
 		<BOLD>diff --git a/text.txt b/text.txt<RESET>
 		<BOLD>--- a/text.txt<RESET>
 		<BOLD>+++ b/text.txt<RESET>
-		<CYAN>@@ -1,14 +1,14 @@<RESET>
+		<CYAN>@@ -1,15 +1,15 @@<RESET>
 		<BOLD;MAGENTA>-QIndented<RESET>
 		<BOLD;MAGENTA>-QText across<RESET>
 		<BOLD;MAGENTA>-Qsome lines<RESET>
@@ -1871,6 +1873,7 @@ test_expect_success 'compare whitespace delta across moved blocks' '
 		<BOLD;MAGENTA>-QQQthat has similar lines<RESET>
 		<BOLD;MAGENTA>-QQQto previous blocks, but with different indent<RESET>
 		<RED>-QQQYetQAnotherQoutlierQ<RESET>
+		<RED>-QLine with internal w h i t e s p a c e change<RESET>
 		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>Indented<RESET>
 		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>Text across<RESET>
 		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>some lines<RESET>
@@ -1885,6 +1888,7 @@ test_expect_success 'compare whitespace delta across moved blocks' '
 		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>that has similar lines<RESET>
 		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>to previous blocks, but with different indent<RESET>
 		<GREEN>+<RESET>QQ<GREEN>YetQAnotherQoutlier<RESET>
+		<GREEN>+<RESET>Q<GREEN>Line with internal whitespace change<RESET>
 	EOF
 
 	test_cmp expected actual
-- 
2.19.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v1 5/9] diff --color-moved-ws: fix false positives
  2018-11-16 11:03 ` [PATCH v1 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
                     ` (3 preceding siblings ...)
  2018-11-16 11:03   ` [PATCH v1 4/9] diff --color-moved-ws: demonstrate false positives Phillip Wood
@ 2018-11-16 11:03   ` " Phillip Wood
  2018-11-16 11:03   ` [PATCH v1 6/9] diff --color-moved=zebra: be stricter with color alternation Phillip Wood
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-16 11:03 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

'diff --color-moved-ws=allow-indentation-change' can color lines as
moved when they are in fact different. For example in commit
1a07e59c3e ("Update messages in preparation for i18n", 2018-07-21) the
lines

-               die (_("must end with a color"));
+               die(_("must end with a color"));

are colored as moved even though they are different.

This is because if there is a fuzzy match for the first line of
a potential moved block the line is marked as moved before the
potential match is checked to see if it actually matches. The fix is
to delay marking the line as moved until after we have checked that
there really is at least one matching potential moved block.

Note that the test modified in the last commit still fails because
adding an unmoved line between two moved blocks that are already
separated by unmoved lines changes the color of the block following the
addition. This should not be the case and will be fixed in the next
commit.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/diff.c b/diff.c
index 9b9811988b..53a7ab5aca 100644
--- a/diff.c
+++ b/diff.c
@@ -1104,10 +1104,10 @@ static void mark_color_as_moved(struct diff_options *o,
 			continue;
 		}
 
-		l->flags |= DIFF_SYMBOL_MOVED_LINE;
-
-		if (o->color_moved == COLOR_MOVED_PLAIN)
+		if (o->color_moved == COLOR_MOVED_PLAIN) {
+			l->flags |= DIFF_SYMBOL_MOVED_LINE;
 			continue;
+		}
 
 		if (o->color_moved_ws_handling &
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
@@ -1141,10 +1141,13 @@ static void mark_color_as_moved(struct diff_options *o,
 			block_length = 0;
 		}
 
-		block_length++;
+		if (pmb_nr) {
+			block_length++;
 
-		if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
-			l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
+			l->flags |= DIFF_SYMBOL_MOVED_LINE;
+			if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
+				l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
+		}
 	}
 	adjust_last_block(o, n, block_length);
 
-- 
2.19.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v1 6/9] diff --color-moved=zebra: be stricter with color alternation
  2018-11-16 11:03 ` [PATCH v1 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
                     ` (4 preceding siblings ...)
  2018-11-16 11:03   ` [PATCH v1 5/9] diff --color-moved-ws: fix " Phillip Wood
@ 2018-11-16 11:03   ` Phillip Wood
  2018-11-16 11:03   ` [PATCH v1 7/9] diff --color-moved-ws: optimize allow-indentation-change Phillip Wood
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-16 11:03 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Currently when using --color-moved=zebra the color of moved blocks
depends on the number of lines separating them. This means that adding
an odd number of unmoved lines between blocks that are already separated
by one or more unmoved lines will change the color of subsequent moved
blocks. This does not make much sense as the blocks were already
separated by unmoved lines and causes problems when adding lines to test
cases.

Fix this by only using the alternate colors for adjacent moved blocks.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---

Notes:
    An alternative would be to always alternate the color of blocks whether
    are not they are adjacent to each other.

 diff.c                     | 27 +++++++++++++++++++--------
 t/t4015-diff-whitespace.sh |  6 +++---
 2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/diff.c b/diff.c
index 53a7ab5aca..8c08dd68df 100644
--- a/diff.c
+++ b/diff.c
@@ -1038,26 +1038,30 @@ static int shrink_potential_moved_blocks(struct moved_block *pmb,
  * The last block consists of the (n - block_length)'th line up to but not
  * including the nth line.
  *
+ * Returns 0 if the last block is empty or is unset by this function, non zero
+ * otherwise.
+ *
  * NEEDSWORK: This uses the same heuristic as blame_entry_score() in blame.c.
  * Think of a way to unify them.
  */
-static void adjust_last_block(struct diff_options *o, int n, int block_length)
+static int adjust_last_block(struct diff_options *o, int n, int block_length)
 {
 	int i, alnum_count = 0;
 	if (o->color_moved == COLOR_MOVED_PLAIN)
-		return;
+		return block_length;
 	for (i = 1; i < block_length + 1; i++) {
 		const char *c = o->emitted_symbols->buf[n - i].line;
 		for (; *c; c++) {
 			if (!isalnum(*c))
 				continue;
 			alnum_count++;
 			if (alnum_count >= COLOR_MOVED_MIN_ALNUM_COUNT)
-				return;
+				return 1;
 		}
 	}
 	for (i = 1; i < block_length + 1; i++)
 		o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE;
+	return 0;
 }
 
 /* Find blocks of moved code, delegate actual coloring decision to helper */
@@ -1067,14 +1071,15 @@ static void mark_color_as_moved(struct diff_options *o,
 {
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
-	int n, flipped_block = 1, block_length = 0;
+	int n, flipped_block = 0, block_length = 0;
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
 		struct hashmap *hm = NULL;
 		struct moved_entry *key;
 		struct moved_entry *match = NULL;
 		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
+		enum diff_symbol last_symbol = 0;
 
 		switch (l->s) {
 		case DIFF_SYMBOL_PLUS:
@@ -1090,7 +1095,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			free(key);
 			break;
 		default:
-			flipped_block = 1;
+			flipped_block = 0;
 		}
 
 		if (!match) {
@@ -1101,10 +1106,13 @@ static void mark_color_as_moved(struct diff_options *o,
 				moved_block_clear(&pmb[i]);
 			pmb_nr = 0;
 			block_length = 0;
+			flipped_block = 0;
+			last_symbol = l->s;
 			continue;
 		}
 
 		if (o->color_moved == COLOR_MOVED_PLAIN) {
+			last_symbol = l->s;
 			l->flags |= DIFF_SYMBOL_MOVED_LINE;
 			continue;
 		}
@@ -1135,19 +1143,22 @@ static void mark_color_as_moved(struct diff_options *o,
 				}
 			}
 
-			flipped_block = (flipped_block + 1) % 2;
+			if (adjust_last_block(o, n, block_length) &&
+			    pmb_nr && last_symbol != l->s)
+				flipped_block = (flipped_block + 1) % 2;
+			else
+				flipped_block = 0;
 
-			adjust_last_block(o, n, block_length);
 			block_length = 0;
 		}
 
 		if (pmb_nr) {
 			block_length++;
-
 			l->flags |= DIFF_SYMBOL_MOVED_LINE;
 			if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
 				l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 		}
+		last_symbol = l->s;
 	}
 	adjust_last_block(o, n, block_length);
 
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index eee81a1987..fe8a2ab06e 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1802,14 +1802,14 @@ test_expect_success 'only move detection ignores white spaces' '
 	<BOLD;MAGENTA>-a long line to exceed per-line minimum<RESET>
 	<BOLD;MAGENTA>-another long line to exceed per-line minimum<RESET>
 	<RED>-original file<RESET>
-	<BOLD;YELLOW>+<RESET>Q<BOLD;YELLOW>a long line to exceed per-line minimum<RESET>
-	<BOLD;YELLOW>+<RESET>Q<BOLD;YELLOW>another long line to exceed per-line minimum<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>a long line to exceed per-line minimum<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>another long line to exceed per-line minimum<RESET>
 	<GREEN>+<RESET><GREEN>new file<RESET>
 	EOF
 	test_cmp expected actual
 '
 
-test_expect_failure 'compare whitespace delta across moved blocks' '
+test_expect_success 'compare whitespace delta across moved blocks' '
 
 	git reset --hard &&
 	q_to_tab <<-\EOF >text.txt &&
-- 
2.19.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v1 7/9] diff --color-moved-ws: optimize allow-indentation-change
  2018-11-16 11:03 ` [PATCH v1 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
                     ` (5 preceding siblings ...)
  2018-11-16 11:03   ` [PATCH v1 6/9] diff --color-moved=zebra: be stricter with color alternation Phillip Wood
@ 2018-11-16 11:03   ` Phillip Wood
  2018-11-16 20:40     ` Stefan Beller
  2018-11-16 11:03   ` [PATCH v1 8/9] diff --color-moved-ws: modify allow-indentation-change Phillip Wood
  2018-11-16 11:03   ` [PATCH v1 9/9] diff --color-moved-ws: handle blank lines Phillip Wood
  8 siblings, 1 reply; 44+ messages in thread
From: Phillip Wood @ 2018-11-16 11:03 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

When running

  git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0

cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7%
return after comparing a and b. By comparing the lengths first we can
return early in all but 0.03% of those cases without dereferencing the
string pointers. The comparison between a and c fails in 6.8% of
calls, by comparing the lengths first we reject all the failing calls
without dereferencing the string pointers.

This reduces the time to run the command above by by 42% from 14.6s to
8.5s. This is still much slower than the normal --color-moved which
takes ~0.6-0.7s to run but is a significant improvement.

The next commits will replace the current implementation with one that
works with mixed tabs and spaces in the indentation. I think it is
worth optimizing the current implementation first to enable a fair
comparison between the two implementations.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 8c08dd68df..c378ce3daf 100644
--- a/diff.c
+++ b/diff.c
@@ -829,20 +829,23 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 				 int n)
 {
 	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-	int al = cur->es->len, cl = l->len;
+	int al = cur->es->len, bl = match->es->len, cl = l->len;
 	const char *a = cur->es->line,
 		   *b = match->es->line,
 		   *c = l->line;
-
+	const char *orig_a = a;
 	int wslen;
 
 	/*
-	 * We need to check if 'cur' is equal to 'match'.
-	 * As those are from the same (+/-) side, we do not need to adjust for
-	 * indent changes. However these were found using fuzzy matching
-	 * so we do have to check if they are equal.
+	 * We need to check if 'cur' is equal to 'match'.  As those
+	 * are from the same (+/-) side, we do not need to adjust for
+	 * indent changes. However these were found using fuzzy
+	 * matching so we do have to check if they are equal. Here we
+	 * just check the lengths. We delay calling memcmp() to check
+	 * the contents until later as if the length comparison for a
+	 * and c fails we can avoid the call all together.
 	 */
-	if (strcmp(a, b))
+	if (al != bl)
 		return 1;
 
 	if (!pmb->wsd.string)
@@ -870,7 +873,7 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 		al -= wslen;
 	}
 
-	if (al != cl || memcmp(a, c, al))
+	if (al != cl || memcmp(orig_a, b, bl) || memcmp(a, c, al))
 		return 1;
 
 	return 0;
-- 
2.19.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v1 8/9] diff --color-moved-ws: modify allow-indentation-change
  2018-11-16 11:03 ` [PATCH v1 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
                     ` (6 preceding siblings ...)
  2018-11-16 11:03   ` [PATCH v1 7/9] diff --color-moved-ws: optimize allow-indentation-change Phillip Wood
@ 2018-11-16 11:03   ` Phillip Wood
  2018-11-16 21:47     ` Stefan Beller
  2018-11-16 11:03   ` [PATCH v1 9/9] diff --color-moved-ws: handle blank lines Phillip Wood
  8 siblings, 1 reply; 44+ messages in thread
From: Phillip Wood @ 2018-11-16 11:03 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Currently diff --color-moved-ws=allow-indentation-change does not
support indentation that contains a mix of tabs and spaces. For
example in commit 546f70f377 ("convert.h: drop 'extern' from function
declaration", 2018-06-30) the function parameters in the following
lines are not colored as moved [1].

-extern int stream_filter(struct stream_filter *,
-                        const char *input, size_t *isize_p,
-                        char *output, size_t *osize_p);
+int stream_filter(struct stream_filter *,
+                 const char *input, size_t *isize_p,
+                 char *output, size_t *osize_p);

This commit changes the way the indentation is handled to track the
visual size of the indentation rather than the characters in the
indentation. This has they benefit that any whitespace errors do not
interfer with the move detection (the whitespace errors will still be
highlighted according to --ws-error-highlight). During the discussion
of this feature there were concerns about the correct detection of
indentation for python. However those concerns apply whether or not
we're detecting moved lines so no attempt is made to determine if the
indentation is 'pythonic'.

[1] Note that before the commit to fix the erroneous coloring of moved
    lines each line was colored as a different block, since that commit
    they are uncolored.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---

Notes:
    Changes since rfc:
     - It now replaces the existing implementation rather than adding a new
       mode.
     - The indentation deltas are now calculated once for each line and
       cached.
     - Optimized the whitespace delta comparison to compare string lengths
       before comparing the actual strings.
     - Modified the calculation of tabs as suggested by Stefan.
     - Split out the blank line handling into a separate commit as suggest
       by Stefan.
     - Fixed some comments pointed out by Stefan.

 diff.c                     | 130 +++++++++++++++++++++----------------
 t/t4015-diff-whitespace.sh |  56 ++++++++++++++++
 2 files changed, 129 insertions(+), 57 deletions(-)

diff --git a/diff.c b/diff.c
index c378ce3daf..89559293e7 100644
--- a/diff.c
+++ b/diff.c
@@ -750,6 +750,8 @@ struct emitted_diff_symbol {
 	const char *line;
 	int len;
 	int flags;
+	int indent_off;
+	int indent_width;
 	enum diff_symbol s;
 };
 #define EMITTED_DIFF_SYMBOL_INIT {NULL}
@@ -780,44 +782,68 @@ struct moved_entry {
 	struct moved_entry *next_line;
 };
 
-/**
- * The struct ws_delta holds white space differences between moved lines, i.e.
- * between '+' and '-' lines that have been detected to be a move.
- * The string contains the difference in leading white spaces, before the
- * rest of the line is compared using the white space config for move
- * coloring. The current_longer indicates if the first string in the
- * comparision is longer than the second.
- */
-struct ws_delta {
-	char *string;
-	unsigned int current_longer : 1;
-};
-#define WS_DELTA_INIT { NULL, 0 }
-
 struct moved_block {
 	struct moved_entry *match;
-	struct ws_delta wsd;
+	int wsd; /* The whitespace delta of this block */
 };
 
 static void moved_block_clear(struct moved_block *b)
 {
-	FREE_AND_NULL(b->wsd.string);
-	b->match = NULL;
+	memset(b, 0, sizeof(*b));
+}
+
+static void fill_es_indent_data(struct emitted_diff_symbol *es)
+{
+	unsigned int off = 0;
+	int width = 0, tab_width = es->flags & WS_TAB_WIDTH_MASK;
+	const char *s = es->line;
+	const int len = es->len;
+
+	/* skip any \v \f \r at start of indentation */
+	while (s[off] == '\f' || s[off] == '\v' ||
+	       (s[off] == '\r' && off < len - 1))
+		off++;
+
+	/* calculate the visual width of indentation */
+	while(1) {
+		if (s[off] == ' ') {
+			width++;
+			off++;
+		} else if (s[off] == '\t') {
+			width += tab_width - (width % tab_width);
+			while (s[++off] == '\t')
+				width += tab_width;
+		} else {
+			break;
+		}
+	}
+
+	es->indent_off = off;
+	es->indent_width = width;
 }
 
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
-			     const struct emitted_diff_symbol *b,
-			     struct ws_delta *out)
+			    const struct emitted_diff_symbol *b,
+			    int *out)
 {
-	const struct emitted_diff_symbol *longer =  a->len > b->len ? a : b;
-	const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a;
-	int d = longer->len - shorter->len;
+	int a_len = a->len,
+	    b_len = b->len,
+	    a_off = a->indent_off,
+	    a_width = a->indent_width,
+	    b_off = b->indent_off,
+	    b_width = b->indent_width;
+	int delta;
 
-	if (strncmp(longer->line + d, shorter->line, shorter->len))
+	if (a->s == DIFF_SYMBOL_PLUS)
+		delta = a_width - b_width;
+	else
+		delta = b_width - a_width;
+
+	if (a_len - a_off != b_len - b_off ||
+	    memcmp(a->line + a_off, b->line + b_off, a_len - a_off))
 		return 0;
 
-	out->string = xmemdupz(longer->line, d);
-	out->current_longer = (a == longer);
+	*out = delta;
 
 	return 1;
 }
@@ -833,8 +859,11 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	const char *a = cur->es->line,
 		   *b = match->es->line,
 		   *c = l->line;
-	const char *orig_a = a;
-	int wslen;
+	int a_off = cur->es->indent_off,
+	    a_width = cur->es->indent_width,
+	    c_off = l->indent_off,
+	    c_width = l->indent_width;
+	int delta;
 
 	/*
 	 * We need to check if 'cur' is equal to 'match'.  As those
@@ -848,35 +877,20 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	if (al != bl)
 		return 1;
 
-	if (!pmb->wsd.string)
-		/*
-		 * The white space delta is not active? This can happen
-		 * when we exit early in this function.
-		 */
-		return 1;
-
 	/*
-	 * The indent changes of the block are known and stored in
-	 * pmb->wsd; however we need to check if the indent changes of the
-	 * current line are still the same as before.
-	 *
-	 * To do so we need to compare 'l' to 'cur', adjusting the
-	 * one of them for the white spaces, depending which was longer.
+	 * The indent changes of the block are known and stored in pmb->wsd;
+	 * however we need to check if the indent changes of the current line
+	 * match those of the current block and that the text of 'l' and 'cur'
+	 * after the indentation match.
 	 */
+	if (cur->es->s == DIFF_SYMBOL_PLUS)
+		delta = a_width - c_width;
+	else
+		delta = c_width - a_width;
 
-	wslen = strlen(pmb->wsd.string);
-	if (pmb->wsd.current_longer) {
-		c += wslen;
-		cl -= wslen;
-	} else {
-		a += wslen;
-		al -= wslen;
-	}
-
-	if (al != cl || memcmp(orig_a, b, bl) || memcmp(a, c, al))
-		return 1;
-
-	return 0;
+	return !(delta == pmb->wsd && al - a_off == cl - c_off &&
+		 !memcmp(a, b, al) && !
+		 memcmp(a + a_off, c + c_off, al - a_off));
 }
 
 static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
@@ -942,6 +956,9 @@ static void add_lines_to_move_detection(struct diff_options *o,
 			continue;
 		}
 
+		if (o->color_moved_ws_handling &
+		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+			fill_es_indent_data(&o->emitted_symbols->buf[n]);
 		key = prepare_entry(o, n);
 		if (prev_line && prev_line->es->s == o->emitted_symbols->buf[n].s)
 			prev_line->next_line = key;
@@ -1020,8 +1037,7 @@ static int shrink_potential_moved_blocks(struct moved_block *pmb,
 
 		if (lp < pmb_nr && rp > -1 && lp < rp) {
 			pmb[lp] = pmb[rp];
-			pmb[rp].match = NULL;
-			pmb[rp].wsd.string = NULL;
+			memset(&pmb[rp], 0, sizeof(pmb[rp]));
 			rp--;
 			lp++;
 		}
@@ -1141,7 +1157,7 @@ static void mark_color_as_moved(struct diff_options *o,
 							     &pmb[pmb_nr].wsd))
 						pmb[pmb_nr++].match = match;
 				} else {
-					pmb[pmb_nr].wsd.string = NULL;
+					pmb[pmb_nr].wsd = 0;
 					pmb[pmb_nr++].match = match;
 				}
 			}
@@ -1507,7 +1523,7 @@ static void emit_diff_symbol_from_struct(struct diff_options *o,
 static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
 			     const char *line, int len, unsigned flags)
 {
-	struct emitted_diff_symbol e = {line, len, flags, s};
+	struct emitted_diff_symbol e = {line, len, flags, 0, 0, s};
 
 	if (o->emitted_symbols)
 		append_emitted_diff_symbol(o, &e);
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index fe8a2ab06e..e023839ba6 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1901,4 +1901,60 @@ test_expect_success 'compare whitespace delta incompatible with other space opti
 	test_i18ngrep allow-indentation-change err
 '
 
+test_expect_success 'compare mixed whitespace delta across moved blocks' '
+
+	git reset --hard &&
+	tr Q_ "\t " <<-EOF >text.txt &&
+	____Indented text to
+	_Q____be further indented by four spaces across
+	____Qseveral lines
+	QQ____These two lines have had their
+	____indentation reduced by four spaces
+	Qdifferent indentation change
+	____too short
+	EOF
+
+	git add text.txt &&
+	git commit -m "add text.txt" &&
+
+	tr Q_ "\t " <<-EOF >text.txt &&
+	QIndented text to
+	QQbe further indented by four spaces across
+	Q____several lines
+	Q_QThese two lines have had their
+	indentation reduced by four spaces
+	QQdifferent indentation change
+	__Qtoo short
+	EOF
+
+	git -c color.diff.whitespace="normal red" \
+		-c core.whitespace=space-before-tab \
+		diff --color --color-moved --ws-error-highlight=all \
+		--color-moved-ws=allow-indentation-change >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
+
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/text.txt b/text.txt<RESET>
+	<BOLD>--- a/text.txt<RESET>
+	<BOLD>+++ b/text.txt<RESET>
+	<CYAN>@@ -1,7 +1,7 @@<RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>    Indented text to<RESET>
+	<BOLD;MAGENTA>-<RESET><BRED> <RESET>	<BOLD;MAGENTA>    be further indented by four spaces across<RESET>
+	<BOLD;MAGENTA>-<RESET><BRED>    <RESET>	<BOLD;MAGENTA>several lines<RESET>
+	<BOLD;BLUE>-<RESET>		<BOLD;BLUE>    These two lines have had their<RESET>
+	<BOLD;BLUE>-<RESET><BOLD;BLUE>    indentation reduced by four spaces<RESET>
+	<BOLD;MAGENTA>-<RESET>	<BOLD;MAGENTA>different indentation change<RESET>
+	<RED>-<RESET><RED>    too short<RESET>
+	<BOLD;CYAN>+<RESET>	<BOLD;CYAN>Indented text to<RESET>
+	<BOLD;CYAN>+<RESET>		<BOLD;CYAN>be further indented by four spaces across<RESET>
+	<BOLD;CYAN>+<RESET>	<BOLD;CYAN>    several lines<RESET>
+	<BOLD;YELLOW>+<RESET>	<BRED> <RESET>	<BOLD;YELLOW>These two lines have had their<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>indentation reduced by four spaces<RESET>
+	<BOLD;CYAN>+<RESET>		<BOLD;CYAN>different indentation change<RESET>
+	<GREEN>+<RESET><BRED>  <RESET>	<GREEN>too short<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_done
-- 
2.19.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v1 9/9] diff --color-moved-ws: handle blank lines
  2018-11-16 11:03 ` [PATCH v1 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
                     ` (7 preceding siblings ...)
  2018-11-16 11:03   ` [PATCH v1 8/9] diff --color-moved-ws: modify allow-indentation-change Phillip Wood
@ 2018-11-16 11:03   ` Phillip Wood
  2018-11-20 18:05     ` Stefan Beller
  8 siblings, 1 reply; 44+ messages in thread
From: Phillip Wood @ 2018-11-16 11:03 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

When using --color-moved-ws=allow-indentation-change allow lines with
the same indentation change to be grouped across blank lines. For now
this only works if the blank lines have been moved as well, not for
blocks that have just had their indentation changed.

This completes the changes to the implementation of
--color-moved=allow-indentation-change. Running

  git diff --color-moved=allow-indentation-change v2.18.0 v2.19.0

now takes 5.0s. This is a saving of 41% from 8.5s for the optimized
version of the previous implementation and 66% from the original which
took 14.6s.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---

Notes:
    Changes since rfc:
     - Split these changes into a separate commit.
     - Detect blank lines when processing the indentation rather than
       parsing each line twice.
     - Tweaked the test to make it harder as suggested by Stefan.
     - Added timing data to the commit message.

 diff.c                     | 34 ++++++++++++++++++++++++++++---
 t/t4015-diff-whitespace.sh | 41 ++++++++++++++++++++++++++++++++++----
 2 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 89559293e7..072b5bced6 100644
--- a/diff.c
+++ b/diff.c
@@ -792,9 +792,11 @@ static void moved_block_clear(struct moved_block *b)
 	memset(b, 0, sizeof(*b));
 }
 
+#define INDENT_BLANKLINE INT_MIN
+
 static void fill_es_indent_data(struct emitted_diff_symbol *es)
 {
-	unsigned int off = 0;
+	unsigned int off = 0, i;
 	int width = 0, tab_width = es->flags & WS_TAB_WIDTH_MASK;
 	const char *s = es->line;
 	const int len = es->len;
@@ -818,8 +820,18 @@ static void fill_es_indent_data(struct emitted_diff_symbol *es)
 		}
 	}
 
-	es->indent_off = off;
-	es->indent_width = width;
+	/* check if this line is blank */
+	for (i = off; i < len; i++)
+		if (!isspace(s[i]))
+		    break;
+
+	if (i == len) {
+		es->indent_width = INDENT_BLANKLINE;
+		es->indent_off = len;
+	} else {
+		es->indent_off = off;
+		es->indent_width = width;
+	}
 }
 
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
@@ -834,6 +846,11 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
 	    b_width = b->indent_width;
 	int delta;
 
+	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE) {
+		*out = INDENT_BLANKLINE;
+		return 1;
+	}
+
 	if (a->s == DIFF_SYMBOL_PLUS)
 		delta = a_width - b_width;
 	else
@@ -877,6 +894,10 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	if (al != bl)
 		return 1;
 
+	/* If 'l' and 'cur' are both blank then they match. */
+	if (a_width == INDENT_BLANKLINE && c_width == INDENT_BLANKLINE)
+		return 0;
+
 	/*
 	 * The indent changes of the block are known and stored in pmb->wsd;
 	 * however we need to check if the indent changes of the current line
@@ -888,6 +909,13 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	else
 		delta = c_width - a_width;
 
+	/*
+	 * If the previous lines of this block were all blank then set its
+	 * whitespace delta.
+	 */
+	if (pmb->wsd == INDENT_BLANKLINE)
+		pmb->wsd = delta;
+
 	return !(delta == pmb->wsd && al - a_off == cl - c_off &&
 		 !memcmp(a, b, al) && !
 		 memcmp(a + a_off, c + c_off, al - a_off));
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index e023839ba6..9d6f88b07f 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1901,10 +1901,20 @@ test_expect_success 'compare whitespace delta incompatible with other space opti
 	test_i18ngrep allow-indentation-change err
 '
 
+EMPTY=''
 test_expect_success 'compare mixed whitespace delta across moved blocks' '
 
 	git reset --hard &&
 	tr Q_ "\t " <<-EOF >text.txt &&
+	${EMPTY}
+	____too short without
+	${EMPTY}
+	___being grouped across blank line
+	${EMPTY}
+	context
+	lines
+	to
+	anchor
 	____Indented text to
 	_Q____be further indented by four spaces across
 	____Qseveral lines
@@ -1918,9 +1928,18 @@ test_expect_success 'compare mixed whitespace delta across moved blocks' '
 	git commit -m "add text.txt" &&
 
 	tr Q_ "\t " <<-EOF >text.txt &&
+	context
+	lines
+	to
+	anchor
 	QIndented text to
 	QQbe further indented by four spaces across
 	Q____several lines
+	${EMPTY}
+	QQtoo short without
+	${EMPTY}
+	Q_______being grouped across blank line
+	${EMPTY}
 	Q_QThese two lines have had their
 	indentation reduced by four spaces
 	QQdifferent indentation change
@@ -1937,7 +1956,16 @@ test_expect_success 'compare mixed whitespace delta across moved blocks' '
 	<BOLD>diff --git a/text.txt b/text.txt<RESET>
 	<BOLD>--- a/text.txt<RESET>
 	<BOLD>+++ b/text.txt<RESET>
-	<CYAN>@@ -1,7 +1,7 @@<RESET>
+	<CYAN>@@ -1,16 +1,16 @@<RESET>
+	<BOLD;MAGENTA>-<RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>    too short without<RESET>
+	<BOLD;MAGENTA>-<RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>   being grouped across blank line<RESET>
+	<BOLD;MAGENTA>-<RESET>
+	 <RESET>context<RESET>
+	 <RESET>lines<RESET>
+	 <RESET>to<RESET>
+	 <RESET>anchor<RESET>
 	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>    Indented text to<RESET>
 	<BOLD;MAGENTA>-<RESET><BRED> <RESET>	<BOLD;MAGENTA>    be further indented by four spaces across<RESET>
 	<BOLD;MAGENTA>-<RESET><BRED>    <RESET>	<BOLD;MAGENTA>several lines<RESET>
@@ -1948,9 +1976,14 @@ test_expect_success 'compare mixed whitespace delta across moved blocks' '
 	<BOLD;CYAN>+<RESET>	<BOLD;CYAN>Indented text to<RESET>
 	<BOLD;CYAN>+<RESET>		<BOLD;CYAN>be further indented by four spaces across<RESET>
 	<BOLD;CYAN>+<RESET>	<BOLD;CYAN>    several lines<RESET>
-	<BOLD;YELLOW>+<RESET>	<BRED> <RESET>	<BOLD;YELLOW>These two lines have had their<RESET>
-	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>indentation reduced by four spaces<RESET>
-	<BOLD;CYAN>+<RESET>		<BOLD;CYAN>different indentation change<RESET>
+	<BOLD;YELLOW>+<RESET>
+	<BOLD;YELLOW>+<RESET>		<BOLD;YELLOW>too short without<RESET>
+	<BOLD;YELLOW>+<RESET>
+	<BOLD;YELLOW>+<RESET>	<BOLD;YELLOW>       being grouped across blank line<RESET>
+	<BOLD;YELLOW>+<RESET>
+	<BOLD;CYAN>+<RESET>	<BRED> <RESET>	<BOLD;CYAN>These two lines have had their<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>indentation reduced by four spaces<RESET>
+	<BOLD;YELLOW>+<RESET>		<BOLD;YELLOW>different indentation change<RESET>
 	<GREEN>+<RESET><BRED>  <RESET>	<GREEN>too short<RESET>
 	EOF
 
-- 
2.19.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 2/9] diff: use whitespace consistently
  2018-11-16 11:03   ` [PATCH v1 2/9] diff: use whitespace consistently Phillip Wood
@ 2018-11-16 18:29     ` Stefan Beller
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Beller @ 2018-11-16 18:29 UTC (permalink / raw)
  To: Phillip Wood; +Cc: git

On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> Most of the documentation uses 'whitespace' rather than 'white space'
> or 'white spaces' convert to latter two to the former for consistency.

Makes sense; this doesn't touch docs, but also code.
$ git grep "white space" yields some other places
as well (Documentation/git-cat-file.txt and lots in t/)
But I guess we keep it to this feature for now instead
of a tree wide cleanup.

Stefan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 7/9] diff --color-moved-ws: optimize allow-indentation-change
  2018-11-16 11:03   ` [PATCH v1 7/9] diff --color-moved-ws: optimize allow-indentation-change Phillip Wood
@ 2018-11-16 20:40     ` Stefan Beller
  2018-11-17 14:52       ` Phillip Wood
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Beller @ 2018-11-16 20:40 UTC (permalink / raw)
  To: Phillip Wood; +Cc: git

On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> When running
>
>   git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0
>
> cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7%
> return after comparing a and b. By comparing the lengths first we can
> return early in all but 0.03% of those cases without dereferencing the
> string pointers. The comparison between a and c fails in 6.8% of
> calls, by comparing the lengths first we reject all the failing calls
> without dereferencing the string pointers.
>
> This reduces the time to run the command above by by 42% from 14.6s to
> 8.5s. This is still much slower than the normal --color-moved which
> takes ~0.6-0.7s to run but is a significant improvement.
>
> The next commits will replace the current implementation with one that
> works with mixed tabs and spaces in the indentation. I think it is
> worth optimizing the current implementation first to enable a fair
> comparison between the two implementations.

Up to here the series looks good and I think we could take it
as a preparatory self-standing series.

I'll read on.
Thanks,
Stefan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 8/9] diff --color-moved-ws: modify allow-indentation-change
  2018-11-16 11:03   ` [PATCH v1 8/9] diff --color-moved-ws: modify allow-indentation-change Phillip Wood
@ 2018-11-16 21:47     ` Stefan Beller
  2018-11-17 14:59       ` Phillip Wood
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Beller @ 2018-11-16 21:47 UTC (permalink / raw)
  To: Phillip Wood; +Cc: git

On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> Currently diff --color-moved-ws=allow-indentation-change does not
> support indentation that contains a mix of tabs and spaces. For
> example in commit 546f70f377 ("convert.h: drop 'extern' from function
> declaration", 2018-06-30) the function parameters in the following
> lines are not colored as moved [1].
>
> -extern int stream_filter(struct stream_filter *,
> -                        const char *input, size_t *isize_p,
> -                        char *output, size_t *osize_p);
> +int stream_filter(struct stream_filter *,
> +                 const char *input, size_t *isize_p,
> +                 char *output, size_t *osize_p);
>
> This commit changes the way the indentation is handled to track the
> visual size of the indentation rather than the characters in the
> indentation. This has they benefit that any whitespace errors do not

s/they/the/

> interfer with the move detection (the whitespace errors will still be
> highlighted according to --ws-error-highlight). During the discussion
> of this feature there were concerns about the correct detection of
> indentation for python. However those concerns apply whether or not
> we're detecting moved lines so no attempt is made to determine if the
> indentation is 'pythonic'.
>
> [1] Note that before the commit to fix the erroneous coloring of moved
>     lines each line was colored as a different block, since that commit
>     they are uncolored.
>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>
> Notes:
>     Changes since rfc:
>      - It now replaces the existing implementation rather than adding a new
>        mode.
>      - The indentation deltas are now calculated once for each line and
>        cached.
>      - Optimized the whitespace delta comparison to compare string lengths
>        before comparing the actual strings.
>      - Modified the calculation of tabs as suggested by Stefan.
>      - Split out the blank line handling into a separate commit as suggest
>        by Stefan.
>      - Fixed some comments pointed out by Stefan.
>
>  diff.c                     | 130 +++++++++++++++++++++----------------
>  t/t4015-diff-whitespace.sh |  56 ++++++++++++++++
>  2 files changed, 129 insertions(+), 57 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index c378ce3daf..89559293e7 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -750,6 +750,8 @@ struct emitted_diff_symbol {
>         const char *line;
>         int len;
>         int flags;
> +       int indent_off;
> +       int indent_width;

So this is the trick how we compute the ws related
data only once per line. :-)

On the other hand, we do not save memory by disabling
the ws detection, but I guess that is not a problem for now.

Would it make sense to have the new variables be
unsigned? (Also a comment on what they are, I
needed to read the code to understand off to be
offset into the line, where the content starts, and
width to be the visual width, as I did not recall
the RFC.)

> +static void fill_es_indent_data(struct emitted_diff_symbol *es)
> [...]

> +               if (o->color_moved_ws_handling &
> +                   COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
> +                       fill_es_indent_data(&o->emitted_symbols->buf[n]);

Nice.

By reducing the information kept around to ints, we also do not need
to alloc/free
memory for each line.

> +++ b/t/t4015-diff-whitespace.sh
> @@ -1901,4 +1901,60 @@ test_expect_success 'compare whitespace delta incompatible with other space opti
>         test_i18ngrep allow-indentation-change err
>  '
>
> +test_expect_success 'compare mixed whitespace delta across moved blocks' '

Looks good,

Thanks!
Stefan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 7/9] diff --color-moved-ws: optimize allow-indentation-change
  2018-11-16 20:40     ` Stefan Beller
@ 2018-11-17 14:52       ` Phillip Wood
  0 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-17 14:52 UTC (permalink / raw)
  To: Stefan Beller, Phillip Wood; +Cc: git

On 16/11/2018 20:40, Stefan Beller wrote:
> On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>>
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> When running
>>
>>    git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0
>>
>> cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7%
>> return after comparing a and b. By comparing the lengths first we can
>> return early in all but 0.03% of those cases without dereferencing the
>> string pointers. The comparison between a and c fails in 6.8% of
>> calls, by comparing the lengths first we reject all the failing calls
>> without dereferencing the string pointers.
>>
>> This reduces the time to run the command above by by 42% from 14.6s to
>> 8.5s. This is still much slower than the normal --color-moved which
>> takes ~0.6-0.7s to run but is a significant improvement.
>>
>> The next commits will replace the current implementation with one that
>> works with mixed tabs and spaces in the indentation. I think it is
>> worth optimizing the current implementation first to enable a fair
>> comparison between the two implementations.
> 
> Up to here the series looks good and I think we could take it
> as a preparatory self-standing series.

Thanks for looking at these, I think it makes sense to split the series 
here, the commit message for this patch may want tweaking slightly if we 
do. (I did wonder about splitting it in two when I submitted it but took 
the easy way out.)

Best Wishes

Phillip
> 
> I'll read on.
> Thanks,
> Stefan
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 8/9] diff --color-moved-ws: modify allow-indentation-change
  2018-11-16 21:47     ` Stefan Beller
@ 2018-11-17 14:59       ` Phillip Wood
  0 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-17 14:59 UTC (permalink / raw)
  To: Stefan Beller, Phillip Wood; +Cc: git

Hi Stefan

On 16/11/2018 21:47, Stefan Beller wrote:
> On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>>
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> Currently diff --color-moved-ws=allow-indentation-change does not
>> support indentation that contains a mix of tabs and spaces. For
>> example in commit 546f70f377 ("convert.h: drop 'extern' from function
>> declaration", 2018-06-30) the function parameters in the following
>> lines are not colored as moved [1].
>>
>> -extern int stream_filter(struct stream_filter *,
>> -                        const char *input, size_t *isize_p,
>> -                        char *output, size_t *osize_p);
>> +int stream_filter(struct stream_filter *,
>> +                 const char *input, size_t *isize_p,
>> +                 char *output, size_t *osize_p);
>>
>> This commit changes the way the indentation is handled to track the
>> visual size of the indentation rather than the characters in the
>> indentation. This has they benefit that any whitespace errors do not
> 
> s/they/the/

Thanks, well spotted

> 
>> interfer with the move detection (the whitespace errors will still be
>> highlighted according to --ws-error-highlight). During the discussion
>> of this feature there were concerns about the correct detection of
>> indentation for python. However those concerns apply whether or not
>> we're detecting moved lines so no attempt is made to determine if the
>> indentation is 'pythonic'.
>>
>> [1] Note that before the commit to fix the erroneous coloring of moved
>>      lines each line was colored as a different block, since that commit
>>      they are uncolored.
>>
>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>> ---
>>
>> Notes:
>>      Changes since rfc:
>>       - It now replaces the existing implementation rather than adding a new
>>         mode.
>>       - The indentation deltas are now calculated once for each line and
>>         cached.
>>       - Optimized the whitespace delta comparison to compare string lengths
>>         before comparing the actual strings.
>>       - Modified the calculation of tabs as suggested by Stefan.
>>       - Split out the blank line handling into a separate commit as suggest
>>         by Stefan.
>>       - Fixed some comments pointed out by Stefan.
>>
>>   diff.c                     | 130 +++++++++++++++++++++----------------
>>   t/t4015-diff-whitespace.sh |  56 ++++++++++++++++
>>   2 files changed, 129 insertions(+), 57 deletions(-)
>>
>> diff --git a/diff.c b/diff.c
>> index c378ce3daf..89559293e7 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -750,6 +750,8 @@ struct emitted_diff_symbol {
>>          const char *line;
>>          int len;
>>          int flags;
>> +       int indent_off;
>> +       int indent_width;
> 
> So this is the trick how we compute the ws related
> data only once per line. :-)
> 
> On the other hand, we do not save memory by disabling
> the ws detection, but I guess that is not a problem for now.

I did wonder about that, but decided the increase was small compared to 
all the strings that are copied when creating the emitted_diff_symbols. 
If we want to save memory then we should stop struct 
emitted_diff_symbol() from carrying a copy of all the strings.

> Would it make sense to have the new variables be
> unsigned? (Also a comment on what they are, I
> needed to read the code to understand off to be
> offset into the line, where the content starts, and
> width to be the visual width, as I did not recall
> the RFC.)

Yes a comment would make sense. I don't think I have a strong preference 
for signed/unsigned, I can change it if you want.

Thanks for looking at these so promptly

Best Wishes

Phillip
>> +static void fill_es_indent_data(struct emitted_diff_symbol *es)
>> [...]
> 
>> +               if (o->color_moved_ws_handling &
>> +                   COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
>> +                       fill_es_indent_data(&o->emitted_symbols->buf[n]);
> 
> Nice.
> 
> By reducing the information kept around to ints, we also do not need
> to alloc/free
> memory for each line.
> 
>> +++ b/t/t4015-diff-whitespace.sh
>> @@ -1901,4 +1901,60 @@ test_expect_success 'compare whitespace delta incompatible with other space opti
>>          test_i18ngrep allow-indentation-change err
>>   '
>>
>> +test_expect_success 'compare mixed whitespace delta across moved blocks' '
> 
> Looks good,
> 
> Thanks!
> Stefan
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 9/9] diff --color-moved-ws: handle blank lines
  2018-11-16 11:03   ` [PATCH v1 9/9] diff --color-moved-ws: handle blank lines Phillip Wood
@ 2018-11-20 18:05     ` Stefan Beller
  2018-11-21 15:49       ` Phillip Wood
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Beller @ 2018-11-20 18:05 UTC (permalink / raw)
  To: Phillip Wood; +Cc: git

On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> When using --color-moved-ws=allow-indentation-change allow lines with
> the same indentation change to be grouped across blank lines. For now
> this only works if the blank lines have been moved as well, not for
> blocks that have just had their indentation changed.
>
> This completes the changes to the implementation of
> --color-moved=allow-indentation-change. Running
>
>   git diff --color-moved=allow-indentation-change v2.18.0 v2.19.0
>
> now takes 5.0s. This is a saving of 41% from 8.5s for the optimized
> version of the previous implementation and 66% from the original which
> took 14.6s.
>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>
> Notes:
>     Changes since rfc:
>      - Split these changes into a separate commit.
>      - Detect blank lines when processing the indentation rather than
>        parsing each line twice.
>      - Tweaked the test to make it harder as suggested by Stefan.
>      - Added timing data to the commit message.
>
>  diff.c                     | 34 ++++++++++++++++++++++++++++---
>  t/t4015-diff-whitespace.sh | 41 ++++++++++++++++++++++++++++++++++----
>  2 files changed, 68 insertions(+), 7 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 89559293e7..072b5bced6 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -792,9 +792,11 @@ static void moved_block_clear(struct moved_block *b)
>         memset(b, 0, sizeof(*b));
>  }
>
> +#define INDENT_BLANKLINE INT_MIN

Answering my question from the previous patch:
This is why we need to keep the indents signed.

This patch looks quite nice to read along.

The whole series looks good to me.
Do we need to update the docs in any way?

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v1 9/9] diff --color-moved-ws: handle blank lines
  2018-11-20 18:05     ` Stefan Beller
@ 2018-11-21 15:49       ` Phillip Wood
  0 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-21 15:49 UTC (permalink / raw)
  To: Stefan Beller, Phillip Wood; +Cc: git

On 20/11/2018 18:05, Stefan Beller wrote:
> On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>>
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> When using --color-moved-ws=allow-indentation-change allow lines with
>> the same indentation change to be grouped across blank lines. For now
>> this only works if the blank lines have been moved as well, not for
>> blocks that have just had their indentation changed.
>>
>> This completes the changes to the implementation of
>> --color-moved=allow-indentation-change. Running
>>
>>    git diff --color-moved=allow-indentation-change v2.18.0 v2.19.0
>>
>> now takes 5.0s. This is a saving of 41% from 8.5s for the optimized
>> version of the previous implementation and 66% from the original which
>> took 14.6s.
>>
>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>> ---
>>
>> Notes:
>>      Changes since rfc:
>>       - Split these changes into a separate commit.
>>       - Detect blank lines when processing the indentation rather than
>>         parsing each line twice.
>>       - Tweaked the test to make it harder as suggested by Stefan.
>>       - Added timing data to the commit message.
>>
>>   diff.c                     | 34 ++++++++++++++++++++++++++++---
>>   t/t4015-diff-whitespace.sh | 41 ++++++++++++++++++++++++++++++++++----
>>   2 files changed, 68 insertions(+), 7 deletions(-)
>>
>> diff --git a/diff.c b/diff.c
>> index 89559293e7..072b5bced6 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -792,9 +792,11 @@ static void moved_block_clear(struct moved_block *b)
>>          memset(b, 0, sizeof(*b));
>>   }
>>
>> +#define INDENT_BLANKLINE INT_MIN
> 
> Answering my question from the previous patch:
> This is why we need to keep the indents signed.
> 
> This patch looks quite nice to read along.
> 
> The whole series looks good to me.

Thanks

> Do we need to update the docs in any way?

I'm not sure, at the moment it does not make any promises about the 
exact behavior of --color-moved-ws=allow-indentation-change, we could 
change it to be more explicit but I'm not sure it's worth it.

Thanks for looking over these patches, I'll post a reroll soon based on 
your comments.

Phillip

> Thanks,
> Stefan
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment
  2018-09-24 10:06 [RFC PATCH 0/3] diff --color-moved-ws: allow mixed spaces and tabs in indentation change Phillip Wood
                   ` (4 preceding siblings ...)
  2018-11-16 11:03 ` [PATCH v1 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
@ 2018-11-23 11:16 ` Phillip Wood
  2018-11-23 11:16   ` [PATCH v2 1/9] diff: document --no-color-moved Phillip Wood
                     ` (10 more replies)
  5 siblings, 11 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-23 11:16 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller, Junio C Hamano; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Thanks to Stefan for his feedback on v1. I've updated patches 2 & 8 in
response to those comments - see the range-diff below for details (the
patch numbers are off by one in the range diff, I think because the
first patch is unchanged and so it was used as the merge base by
--range-diff=<old-head>. For some reason the range-diff also includes
the notes even though I did not give --notes to format-patch)

When trying out the new --color-moved-ws=allow-indentation-change I
was disappointed to discover it did not work if the indentation
contains a mix of spaces and tabs. This series reworks it so that it
does.


Phillip Wood (9):
  diff: document --no-color-moved
  Use "whitespace" consistently
  diff: allow --no-color-moved-ws
  diff --color-moved-ws: demonstrate false positives
  diff --color-moved-ws: fix false positives
  diff --color-moved=zebra: be stricter with color alternation
  diff --color-moved-ws: optimize allow-indentation-change
  diff --color-moved-ws: modify allow-indentation-change
  diff --color-moved-ws: handle blank lines

 Documentation/diff-options.txt |  15 ++-
 Documentation/git-cat-file.txt |   8 +-
 diff.c                         | 219 +++++++++++++++++++++------------
 t/t4015-diff-whitespace.sh     |  99 ++++++++++++++-
 4 files changed, 255 insertions(+), 86 deletions(-)

Range-diff against v1:
1:  ae58ae4f29 ! 1:  4939ee371d diff: use whitespace consistently
    @@ -1,9 +1,10 @@
     Author: Phillip Wood <phillip.wood@dunelm.org.uk>
     
    -    diff: use whitespace consistently
    +    Use "whitespace" consistently
     
    -    Most of the documentation uses 'whitespace' rather than 'white space'
    -    or 'white spaces' convert to latter two to the former for consistency.
    +    Most of the messages and documentation use 'whitespace' rather than
    +    'white space' or 'white spaces' convert to latter two to the former for
    +    consistency.
     
         Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
     
    @@ -29,6 +30,39 @@
      	whitespace is the same per line. This is incompatible with the
      	other modes.
     
    + diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
    + --- a/Documentation/git-cat-file.txt
    + +++ b/Documentation/git-cat-file.txt
    +@@
    + stdin, and the SHA-1, type, and size of each object is printed on stdout. The
    + output format can be overridden using the optional `<format>` argument. If
    + either `--textconv` or `--filters` was specified, the input is expected to
    +-list the object names followed by the path name, separated by a single white
    +-space, so that the appropriate drivers can be determined.
    ++list the object names followed by the path name, separated by a single
    ++whitespace, so that the appropriate drivers can be determined.
    + 
    + OPTIONS
    + -------
    +@@
    + 	Print object information and contents for each object provided
    + 	on stdin.  May not be combined with any other options or arguments
    + 	except `--textconv` or `--filters`, in which case the input lines
    +-	also need to specify the path, separated by white space.  See the
    ++	also need to specify the path, separated by whitespace.  See the
    + 	section `BATCH OUTPUT` below for details.
    + 
    + --batch-check::
    + --batch-check=<format>::
    + 	Print object information for each object provided on stdin.  May
    + 	not be combined with any other options or arguments except
    + 	`--textconv` or `--filters`, in which case the input lines also
    +-	need to specify the path, separated by white space.  See the
    ++	need to specify the path, separated by whitespace.  See the
    + 	section `BATCH OUTPUT` below for details.
    + 
    + --batch-all-objects::
    +
      diff --git a/diff.c b/diff.c
      --- a/diff.c
      +++ b/diff.c
2:  7072bc6211 = 2:  204c7fea9d diff: allow --no-color-moved-ws
3:  ce3ad19eea = 3:  542b79b215 diff --color-moved-ws: demonstrate false positives
4:  700e0b61e7 = 4:  4ffb5c4122 diff --color-moved-ws: fix false positives
5:  9ecd8159a7 = 5:  a3a84f90c5 diff --color-moved=zebra: be stricter with color alternation
6:  1b1158b1ca = 6:  f94f2e0bae diff --color-moved-ws: optimize allow-indentation-change
7:  d8a362be6a ! 7:  fe8eb9cdbc diff --color-moved-ws: modify allow-indentation-change
    @@ -17,7 +17,7 @@
     
         This commit changes the way the indentation is handled to track the
         visual size of the indentation rather than the characters in the
    -    indentation. This has they benefit that any whitespace errors do not
    +    indentation. This has the benefit that any whitespace errors do not
         interfer with the move detection (the whitespace errors will still be
         highlighted according to --ws-error-highlight). During the discussion
         of this feature there were concerns about the correct detection of
    @@ -30,7 +30,7 @@
             they are uncolored.
     
         Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
    -    Changes since rfc:
    +    changes since rfc:
          - It now replaces the existing implementation rather than adding a new
            mode.
          - The indentation deltas are now calculated once for each line and
    @@ -49,8 +49,8 @@
      	const char *line;
      	int len;
      	int flags;
    -+	int indent_off;
    -+	int indent_width;
    ++	int indent_off;   /* Offset to first non-whitespace character */
    ++	int indent_width; /* The visual width of the indentation */
      	enum diff_symbol s;
      };
      #define EMITTED_DIFF_SYMBOL_INIT {NULL}
8:  1f7e99d45c = 8:  e600f8247c diff --color-moved-ws: handle blank lines
    
-- 
2.19.1.1690.g258b440b18


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 1/9] diff: document --no-color-moved
  2018-11-23 11:16 ` [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
@ 2018-11-23 11:16   ` Phillip Wood
  2018-11-23 11:16   ` [PATCH v2 2/9] Use "whitespace" consistently Phillip Wood
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-23 11:16 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller, Junio C Hamano; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Add documentation for --no-color-moved.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 Documentation/diff-options.txt | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 0378cd574e..151690f814 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -293,6 +293,10 @@ dimmed-zebra::
 	`dimmed_zebra` is a deprecated synonym.
 --
 
+--no-color-moved::
+	Turn off move detection. This can be used to override configuration
+	settings. It is the same as `--color-moved=no`.
+
 --color-moved-ws=<modes>::
 	This configures how white spaces are ignored when performing the
 	move detection for `--color-moved`.
-- 
2.19.1.1690.g258b440b18


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 2/9] Use "whitespace" consistently
  2018-11-23 11:16 ` [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
  2018-11-23 11:16   ` [PATCH v2 1/9] diff: document --no-color-moved Phillip Wood
@ 2018-11-23 11:16   ` Phillip Wood
  2018-11-23 11:16   ` [PATCH v2 3/9] diff: allow --no-color-moved-ws Phillip Wood
                     ` (8 subsequent siblings)
  10 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-23 11:16 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller, Junio C Hamano; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Most of the messages and documentation use 'whitespace' rather than
'white space' or 'white spaces' convert to latter two to the former for
consistency.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 Documentation/diff-options.txt | 4 ++--
 Documentation/git-cat-file.txt | 8 ++++----
 diff.c                         | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 151690f814..57a2f4cb7a 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -298,7 +298,7 @@ dimmed-zebra::
 	settings. It is the same as `--color-moved=no`.
 
 --color-moved-ws=<modes>::
-	This configures how white spaces are ignored when performing the
+	This configures how whitespace is ignored when performing the
 	move detection for `--color-moved`.
 ifdef::git-diff[]
 	It can be set by the `diff.colorMovedWS` configuration setting.
@@ -316,7 +316,7 @@ ignore-all-space::
 	Ignore whitespace when comparing lines. This ignores differences
 	even if one line has whitespace where the other line has none.
 allow-indentation-change::
-	Initially ignore any white spaces in the move detection, then
+	Initially ignore any whitespace in the move detection, then
 	group the moved code blocks only into a block if the change in
 	whitespace is the same per line. This is incompatible with the
 	other modes.
diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 74013335a1..9a2e9cdafb 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -23,8 +23,8 @@ In the second form, a list of objects (separated by linefeeds) is provided on
 stdin, and the SHA-1, type, and size of each object is printed on stdout. The
 output format can be overridden using the optional `<format>` argument. If
 either `--textconv` or `--filters` was specified, the input is expected to
-list the object names followed by the path name, separated by a single white
-space, so that the appropriate drivers can be determined.
+list the object names followed by the path name, separated by a single
+whitespace, so that the appropriate drivers can be determined.
 
 OPTIONS
 -------
@@ -79,15 +79,15 @@ OPTIONS
 	Print object information and contents for each object provided
 	on stdin.  May not be combined with any other options or arguments
 	except `--textconv` or `--filters`, in which case the input lines
-	also need to specify the path, separated by white space.  See the
+	also need to specify the path, separated by whitespace.  See the
 	section `BATCH OUTPUT` below for details.
 
 --batch-check::
 --batch-check=<format>::
 	Print object information for each object provided on stdin.  May
 	not be combined with any other options or arguments except
 	`--textconv` or `--filters`, in which case the input lines also
-	need to specify the path, separated by white space.  See the
+	need to specify the path, separated by whitespace.  See the
 	section `BATCH OUTPUT` below for details.
 
 --batch-all-objects::
diff --git a/diff.c b/diff.c
index c29b1cce14..78cd3958f4 100644
--- a/diff.c
+++ b/diff.c
@@ -320,7 +320,7 @@ static int parse_color_moved_ws(const char *arg)
 
 	if ((ret & COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) &&
 	    (ret & XDF_WHITESPACE_FLAGS))
-		die(_("color-moved-ws: allow-indentation-change cannot be combined with other white space modes"));
+		die(_("color-moved-ws: allow-indentation-change cannot be combined with other whitespace modes"));
 
 	string_list_clear(&l, 0);
 
-- 
2.19.1.1690.g258b440b18


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 3/9] diff: allow --no-color-moved-ws
  2018-11-23 11:16 ` [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
  2018-11-23 11:16   ` [PATCH v2 1/9] diff: document --no-color-moved Phillip Wood
  2018-11-23 11:16   ` [PATCH v2 2/9] Use "whitespace" consistently Phillip Wood
@ 2018-11-23 11:16   ` Phillip Wood
  2018-11-23 11:16   ` [PATCH v2 4/9] diff --color-moved-ws: demonstrate false positives Phillip Wood
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-23 11:16 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller, Junio C Hamano; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Allow --no-color-moved-ws and --color-moved-ws=no to cancel any previous
--color-moved-ws option.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 Documentation/diff-options.txt | 7 +++++++
 diff.c                         | 6 +++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 57a2f4cb7a..e1744fa80d 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -306,6 +306,8 @@ endif::git-diff[]
 	These modes can be given as a comma separated list:
 +
 --
+no::
+	Do not ignore whitespace when performing move detection.
 ignore-space-at-eol::
 	Ignore changes in whitespace at EOL.
 ignore-space-change::
@@ -322,6 +324,11 @@ allow-indentation-change::
 	other modes.
 --
 
+--no-color-moved-ws::
+	Do not ignore whitespace when performing move detection. This can be
+	used to override configuration settings. It is the same as
+	`--color-moved-ws=no`.
+
 --word-diff[=<mode>]::
 	Show a word diff, using the <mode> to delimit changed words.
 	By default, words are delimited by whitespace; see
diff --git a/diff.c b/diff.c
index 78cd3958f4..9b9811988b 100644
--- a/diff.c
+++ b/diff.c
@@ -304,7 +304,9 @@ static int parse_color_moved_ws(const char *arg)
 		strbuf_addstr(&sb, i->string);
 		strbuf_trim(&sb);
 
-		if (!strcmp(sb.buf, "ignore-space-change"))
+		if (!strcmp(sb.buf, "no"))
+			ret = 0;
+		else if (!strcmp(sb.buf, "ignore-space-change"))
 			ret |= XDF_IGNORE_WHITESPACE_CHANGE;
 		else if (!strcmp(sb.buf, "ignore-space-at-eol"))
 			ret |= XDF_IGNORE_WHITESPACE_AT_EOL;
@@ -5008,6 +5010,8 @@ int diff_opt_parse(struct diff_options *options,
 		if (cm < 0)
 			die("bad --color-moved argument: %s", arg);
 		options->color_moved = cm;
+	} else if (!strcmp(arg, "--no-color-moved-ws")) {
+		options->color_moved_ws_handling = 0;
 	} else if (skip_prefix(arg, "--color-moved-ws=", &arg)) {
 		options->color_moved_ws_handling = parse_color_moved_ws(arg);
 	} else if (skip_to_optional_arg_default(arg, "--color-words", &options->word_regex, NULL)) {
-- 
2.19.1.1690.g258b440b18


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 4/9] diff --color-moved-ws: demonstrate false positives
  2018-11-23 11:16 ` [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
                     ` (2 preceding siblings ...)
  2018-11-23 11:16   ` [PATCH v2 3/9] diff: allow --no-color-moved-ws Phillip Wood
@ 2018-11-23 11:16   ` Phillip Wood
  2018-11-23 11:16   ` [PATCH v2 5/9] diff --color-moved-ws: fix " Phillip Wood
                     ` (6 subsequent siblings)
  10 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-23 11:16 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller, Junio C Hamano; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

'diff --color-moved-ws=allow-indentation-change' can highlight lines
that have internal whitespace changes rather than indentation
changes. For example in commit 1a07e59c3e ("Update messages in
preparation for i18n", 2018-07-21) the lines

-               die (_("must end with a color"));
+               die(_("must end with a color"));

are highlighted as moved when they should not be. Modify an existing
test to show the problem that will be fixed in the next commit.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 t/t4015-diff-whitespace.sh | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index a9fb226c5a..eee81a1987 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1809,7 +1809,7 @@ test_expect_success 'only move detection ignores white spaces' '
 	test_cmp expected actual
 '
 
-test_expect_success 'compare whitespace delta across moved blocks' '
+test_expect_failure 'compare whitespace delta across moved blocks' '
 
 	git reset --hard &&
 	q_to_tab <<-\EOF >text.txt &&
@@ -1827,6 +1827,7 @@ test_expect_success 'compare whitespace delta across moved blocks' '
 	QQQthat has similar lines
 	QQQto previous blocks, but with different indent
 	QQQYetQAnotherQoutlierQ
+	QLine with internal w h i t e s p a c e change
 	EOF
 
 	git add text.txt &&
@@ -1847,6 +1848,7 @@ test_expect_success 'compare whitespace delta across moved blocks' '
 	QQthat has similar lines
 	QQto previous blocks, but with different indent
 	QQYetQAnotherQoutlier
+	QLine with internal whitespace change
 	EOF
 
 	git diff --color --color-moved --color-moved-ws=allow-indentation-change >actual.raw &&
@@ -1856,7 +1858,7 @@ test_expect_success 'compare whitespace delta across moved blocks' '
 		<BOLD>diff --git a/text.txt b/text.txt<RESET>
 		<BOLD>--- a/text.txt<RESET>
 		<BOLD>+++ b/text.txt<RESET>
-		<CYAN>@@ -1,14 +1,14 @@<RESET>
+		<CYAN>@@ -1,15 +1,15 @@<RESET>
 		<BOLD;MAGENTA>-QIndented<RESET>
 		<BOLD;MAGENTA>-QText across<RESET>
 		<BOLD;MAGENTA>-Qsome lines<RESET>
@@ -1871,6 +1873,7 @@ test_expect_success 'compare whitespace delta across moved blocks' '
 		<BOLD;MAGENTA>-QQQthat has similar lines<RESET>
 		<BOLD;MAGENTA>-QQQto previous blocks, but with different indent<RESET>
 		<RED>-QQQYetQAnotherQoutlierQ<RESET>
+		<RED>-QLine with internal w h i t e s p a c e change<RESET>
 		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>Indented<RESET>
 		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>Text across<RESET>
 		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>some lines<RESET>
@@ -1885,6 +1888,7 @@ test_expect_success 'compare whitespace delta across moved blocks' '
 		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>that has similar lines<RESET>
 		<BOLD;CYAN>+<RESET>QQ<BOLD;CYAN>to previous blocks, but with different indent<RESET>
 		<GREEN>+<RESET>QQ<GREEN>YetQAnotherQoutlier<RESET>
+		<GREEN>+<RESET>Q<GREEN>Line with internal whitespace change<RESET>
 	EOF
 
 	test_cmp expected actual
-- 
2.19.1.1690.g258b440b18


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 5/9] diff --color-moved-ws: fix false positives
  2018-11-23 11:16 ` [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
                     ` (3 preceding siblings ...)
  2018-11-23 11:16   ` [PATCH v2 4/9] diff --color-moved-ws: demonstrate false positives Phillip Wood
@ 2018-11-23 11:16   ` " Phillip Wood
  2018-11-23 11:16   ` [PATCH v2 6/9] diff --color-moved=zebra: be stricter with color alternation Phillip Wood
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-23 11:16 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller, Junio C Hamano; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

'diff --color-moved-ws=allow-indentation-change' can color lines as
moved when they are in fact different. For example in commit
1a07e59c3e ("Update messages in preparation for i18n", 2018-07-21) the
lines

-               die (_("must end with a color"));
+               die(_("must end with a color"));

are colored as moved even though they are different.

This is because if there is a fuzzy match for the first line of
a potential moved block the line is marked as moved before the
potential match is checked to see if it actually matches. The fix is
to delay marking the line as moved until after we have checked that
there really is at least one matching potential moved block.

Note that the test modified in the last commit still fails because
adding an unmoved line between two moved blocks that are already
separated by unmoved lines changes the color of the block following the
addition. This should not be the case and will be fixed in the next
commit.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/diff.c b/diff.c
index 9b9811988b..53a7ab5aca 100644
--- a/diff.c
+++ b/diff.c
@@ -1104,10 +1104,10 @@ static void mark_color_as_moved(struct diff_options *o,
 			continue;
 		}
 
-		l->flags |= DIFF_SYMBOL_MOVED_LINE;
-
-		if (o->color_moved == COLOR_MOVED_PLAIN)
+		if (o->color_moved == COLOR_MOVED_PLAIN) {
+			l->flags |= DIFF_SYMBOL_MOVED_LINE;
 			continue;
+		}
 
 		if (o->color_moved_ws_handling &
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
@@ -1141,10 +1141,13 @@ static void mark_color_as_moved(struct diff_options *o,
 			block_length = 0;
 		}
 
-		block_length++;
+		if (pmb_nr) {
+			block_length++;
 
-		if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
-			l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
+			l->flags |= DIFF_SYMBOL_MOVED_LINE;
+			if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
+				l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
+		}
 	}
 	adjust_last_block(o, n, block_length);
 
-- 
2.19.1.1690.g258b440b18


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 6/9] diff --color-moved=zebra: be stricter with color alternation
  2018-11-23 11:16 ` [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
                     ` (4 preceding siblings ...)
  2018-11-23 11:16   ` [PATCH v2 5/9] diff --color-moved-ws: fix " Phillip Wood
@ 2018-11-23 11:16   ` Phillip Wood
  2018-11-23 11:16   ` [PATCH v2 7/9] diff --color-moved-ws: optimize allow-indentation-change Phillip Wood
                     ` (4 subsequent siblings)
  10 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-23 11:16 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller, Junio C Hamano; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Currently when using --color-moved=zebra the color of moved blocks
depends on the number of lines separating them. This means that adding
an odd number of unmoved lines between blocks that are already separated
by one or more unmoved lines will change the color of subsequent moved
blocks. This does not make much sense as the blocks were already
separated by unmoved lines and causes problems when adding lines to test
cases.

Fix this by only using the alternate colors for adjacent moved blocks.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c                     | 27 +++++++++++++++++++--------
 t/t4015-diff-whitespace.sh |  6 +++---
 2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/diff.c b/diff.c
index 53a7ab5aca..8c08dd68df 100644
--- a/diff.c
+++ b/diff.c
@@ -1038,26 +1038,30 @@ static int shrink_potential_moved_blocks(struct moved_block *pmb,
  * The last block consists of the (n - block_length)'th line up to but not
  * including the nth line.
  *
+ * Returns 0 if the last block is empty or is unset by this function, non zero
+ * otherwise.
+ *
  * NEEDSWORK: This uses the same heuristic as blame_entry_score() in blame.c.
  * Think of a way to unify them.
  */
-static void adjust_last_block(struct diff_options *o, int n, int block_length)
+static int adjust_last_block(struct diff_options *o, int n, int block_length)
 {
 	int i, alnum_count = 0;
 	if (o->color_moved == COLOR_MOVED_PLAIN)
-		return;
+		return block_length;
 	for (i = 1; i < block_length + 1; i++) {
 		const char *c = o->emitted_symbols->buf[n - i].line;
 		for (; *c; c++) {
 			if (!isalnum(*c))
 				continue;
 			alnum_count++;
 			if (alnum_count >= COLOR_MOVED_MIN_ALNUM_COUNT)
-				return;
+				return 1;
 		}
 	}
 	for (i = 1; i < block_length + 1; i++)
 		o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE;
+	return 0;
 }
 
 /* Find blocks of moved code, delegate actual coloring decision to helper */
@@ -1067,14 +1071,15 @@ static void mark_color_as_moved(struct diff_options *o,
 {
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
-	int n, flipped_block = 1, block_length = 0;
+	int n, flipped_block = 0, block_length = 0;
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
 		struct hashmap *hm = NULL;
 		struct moved_entry *key;
 		struct moved_entry *match = NULL;
 		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
+		enum diff_symbol last_symbol = 0;
 
 		switch (l->s) {
 		case DIFF_SYMBOL_PLUS:
@@ -1090,7 +1095,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			free(key);
 			break;
 		default:
-			flipped_block = 1;
+			flipped_block = 0;
 		}
 
 		if (!match) {
@@ -1101,10 +1106,13 @@ static void mark_color_as_moved(struct diff_options *o,
 				moved_block_clear(&pmb[i]);
 			pmb_nr = 0;
 			block_length = 0;
+			flipped_block = 0;
+			last_symbol = l->s;
 			continue;
 		}
 
 		if (o->color_moved == COLOR_MOVED_PLAIN) {
+			last_symbol = l->s;
 			l->flags |= DIFF_SYMBOL_MOVED_LINE;
 			continue;
 		}
@@ -1135,19 +1143,22 @@ static void mark_color_as_moved(struct diff_options *o,
 				}
 			}
 
-			flipped_block = (flipped_block + 1) % 2;
+			if (adjust_last_block(o, n, block_length) &&
+			    pmb_nr && last_symbol != l->s)
+				flipped_block = (flipped_block + 1) % 2;
+			else
+				flipped_block = 0;
 
-			adjust_last_block(o, n, block_length);
 			block_length = 0;
 		}
 
 		if (pmb_nr) {
 			block_length++;
-
 			l->flags |= DIFF_SYMBOL_MOVED_LINE;
 			if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
 				l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 		}
+		last_symbol = l->s;
 	}
 	adjust_last_block(o, n, block_length);
 
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index eee81a1987..fe8a2ab06e 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1802,14 +1802,14 @@ test_expect_success 'only move detection ignores white spaces' '
 	<BOLD;MAGENTA>-a long line to exceed per-line minimum<RESET>
 	<BOLD;MAGENTA>-another long line to exceed per-line minimum<RESET>
 	<RED>-original file<RESET>
-	<BOLD;YELLOW>+<RESET>Q<BOLD;YELLOW>a long line to exceed per-line minimum<RESET>
-	<BOLD;YELLOW>+<RESET>Q<BOLD;YELLOW>another long line to exceed per-line minimum<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>a long line to exceed per-line minimum<RESET>
+	<BOLD;CYAN>+<RESET>Q<BOLD;CYAN>another long line to exceed per-line minimum<RESET>
 	<GREEN>+<RESET><GREEN>new file<RESET>
 	EOF
 	test_cmp expected actual
 '
 
-test_expect_failure 'compare whitespace delta across moved blocks' '
+test_expect_success 'compare whitespace delta across moved blocks' '
 
 	git reset --hard &&
 	q_to_tab <<-\EOF >text.txt &&
-- 
2.19.1.1690.g258b440b18


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 7/9] diff --color-moved-ws: optimize allow-indentation-change
  2018-11-23 11:16 ` [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
                     ` (5 preceding siblings ...)
  2018-11-23 11:16   ` [PATCH v2 6/9] diff --color-moved=zebra: be stricter with color alternation Phillip Wood
@ 2018-11-23 11:16   ` Phillip Wood
  2018-11-23 11:16   ` [PATCH v2 8/9] diff --color-moved-ws: modify allow-indentation-change Phillip Wood
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-23 11:16 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller, Junio C Hamano; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

When running

  git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0

cmp_in_block_with_wsd() is called 694908327 times. Of those 42.7%
return after comparing a and b. By comparing the lengths first we can
return early in all but 0.03% of those cases without dereferencing the
string pointers. The comparison between a and c fails in 6.8% of
calls, by comparing the lengths first we reject all the failing calls
without dereferencing the string pointers.

This reduces the time to run the command above by by 42% from 14.6s to
8.5s. This is still much slower than the normal --color-moved which
takes ~0.6-0.7s to run but is a significant improvement.

The next commits will replace the current implementation with one that
works with mixed tabs and spaces in the indentation. I think it is
worth optimizing the current implementation first to enable a fair
comparison between the two implementations.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 8c08dd68df..c378ce3daf 100644
--- a/diff.c
+++ b/diff.c
@@ -829,20 +829,23 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 				 int n)
 {
 	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-	int al = cur->es->len, cl = l->len;
+	int al = cur->es->len, bl = match->es->len, cl = l->len;
 	const char *a = cur->es->line,
 		   *b = match->es->line,
 		   *c = l->line;
-
+	const char *orig_a = a;
 	int wslen;
 
 	/*
-	 * We need to check if 'cur' is equal to 'match'.
-	 * As those are from the same (+/-) side, we do not need to adjust for
-	 * indent changes. However these were found using fuzzy matching
-	 * so we do have to check if they are equal.
+	 * We need to check if 'cur' is equal to 'match'.  As those
+	 * are from the same (+/-) side, we do not need to adjust for
+	 * indent changes. However these were found using fuzzy
+	 * matching so we do have to check if they are equal. Here we
+	 * just check the lengths. We delay calling memcmp() to check
+	 * the contents until later as if the length comparison for a
+	 * and c fails we can avoid the call all together.
 	 */
-	if (strcmp(a, b))
+	if (al != bl)
 		return 1;
 
 	if (!pmb->wsd.string)
@@ -870,7 +873,7 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 		al -= wslen;
 	}
 
-	if (al != cl || memcmp(a, c, al))
+	if (al != cl || memcmp(orig_a, b, bl) || memcmp(a, c, al))
 		return 1;
 
 	return 0;
-- 
2.19.1.1690.g258b440b18


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 8/9] diff --color-moved-ws: modify allow-indentation-change
  2018-11-23 11:16 ` [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
                     ` (6 preceding siblings ...)
  2018-11-23 11:16   ` [PATCH v2 7/9] diff --color-moved-ws: optimize allow-indentation-change Phillip Wood
@ 2018-11-23 11:16   ` Phillip Wood
  2018-11-23 11:16   ` [PATCH v2 9/9] diff --color-moved-ws: handle blank lines Phillip Wood
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-23 11:16 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller, Junio C Hamano; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Currently diff --color-moved-ws=allow-indentation-change does not
support indentation that contains a mix of tabs and spaces. For
example in commit 546f70f377 ("convert.h: drop 'extern' from function
declaration", 2018-06-30) the function parameters in the following
lines are not colored as moved [1].

-extern int stream_filter(struct stream_filter *,
-                        const char *input, size_t *isize_p,
-                        char *output, size_t *osize_p);
+int stream_filter(struct stream_filter *,
+                 const char *input, size_t *isize_p,
+                 char *output, size_t *osize_p);

This commit changes the way the indentation is handled to track the
visual size of the indentation rather than the characters in the
indentation. This has the benefit that any whitespace errors do not
interfer with the move detection (the whitespace errors will still be
highlighted according to --ws-error-highlight). During the discussion
of this feature there were concerns about the correct detection of
indentation for python. However those concerns apply whether or not
we're detecting moved lines so no attempt is made to determine if the
indentation is 'pythonic'.

[1] Note that before the commit to fix the erroneous coloring of moved
    lines each line was colored as a different block, since that commit
    they are uncolored.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c                     | 130 +++++++++++++++++++++----------------
 t/t4015-diff-whitespace.sh |  56 ++++++++++++++++
 2 files changed, 129 insertions(+), 57 deletions(-)

diff --git a/diff.c b/diff.c
index c378ce3daf..148503e49c 100644
--- a/diff.c
+++ b/diff.c
@@ -750,6 +750,8 @@ struct emitted_diff_symbol {
 	const char *line;
 	int len;
 	int flags;
+	int indent_off;   /* Offset to first non-whitespace character */
+	int indent_width; /* The visual width of the indentation */
 	enum diff_symbol s;
 };
 #define EMITTED_DIFF_SYMBOL_INIT {NULL}
@@ -780,44 +782,68 @@ struct moved_entry {
 	struct moved_entry *next_line;
 };
 
-/**
- * The struct ws_delta holds white space differences between moved lines, i.e.
- * between '+' and '-' lines that have been detected to be a move.
- * The string contains the difference in leading white spaces, before the
- * rest of the line is compared using the white space config for move
- * coloring. The current_longer indicates if the first string in the
- * comparision is longer than the second.
- */
-struct ws_delta {
-	char *string;
-	unsigned int current_longer : 1;
-};
-#define WS_DELTA_INIT { NULL, 0 }
-
 struct moved_block {
 	struct moved_entry *match;
-	struct ws_delta wsd;
+	int wsd; /* The whitespace delta of this block */
 };
 
 static void moved_block_clear(struct moved_block *b)
 {
-	FREE_AND_NULL(b->wsd.string);
-	b->match = NULL;
+	memset(b, 0, sizeof(*b));
+}
+
+static void fill_es_indent_data(struct emitted_diff_symbol *es)
+{
+	unsigned int off = 0;
+	int width = 0, tab_width = es->flags & WS_TAB_WIDTH_MASK;
+	const char *s = es->line;
+	const int len = es->len;
+
+	/* skip any \v \f \r at start of indentation */
+	while (s[off] == '\f' || s[off] == '\v' ||
+	       (s[off] == '\r' && off < len - 1))
+		off++;
+
+	/* calculate the visual width of indentation */
+	while(1) {
+		if (s[off] == ' ') {
+			width++;
+			off++;
+		} else if (s[off] == '\t') {
+			width += tab_width - (width % tab_width);
+			while (s[++off] == '\t')
+				width += tab_width;
+		} else {
+			break;
+		}
+	}
+
+	es->indent_off = off;
+	es->indent_width = width;
 }
 
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
-			     const struct emitted_diff_symbol *b,
-			     struct ws_delta *out)
+			    const struct emitted_diff_symbol *b,
+			    int *out)
 {
-	const struct emitted_diff_symbol *longer =  a->len > b->len ? a : b;
-	const struct emitted_diff_symbol *shorter = a->len > b->len ? b : a;
-	int d = longer->len - shorter->len;
+	int a_len = a->len,
+	    b_len = b->len,
+	    a_off = a->indent_off,
+	    a_width = a->indent_width,
+	    b_off = b->indent_off,
+	    b_width = b->indent_width;
+	int delta;
 
-	if (strncmp(longer->line + d, shorter->line, shorter->len))
+	if (a->s == DIFF_SYMBOL_PLUS)
+		delta = a_width - b_width;
+	else
+		delta = b_width - a_width;
+
+	if (a_len - a_off != b_len - b_off ||
+	    memcmp(a->line + a_off, b->line + b_off, a_len - a_off))
 		return 0;
 
-	out->string = xmemdupz(longer->line, d);
-	out->current_longer = (a == longer);
+	*out = delta;
 
 	return 1;
 }
@@ -833,8 +859,11 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	const char *a = cur->es->line,
 		   *b = match->es->line,
 		   *c = l->line;
-	const char *orig_a = a;
-	int wslen;
+	int a_off = cur->es->indent_off,
+	    a_width = cur->es->indent_width,
+	    c_off = l->indent_off,
+	    c_width = l->indent_width;
+	int delta;
 
 	/*
 	 * We need to check if 'cur' is equal to 'match'.  As those
@@ -848,35 +877,20 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	if (al != bl)
 		return 1;
 
-	if (!pmb->wsd.string)
-		/*
-		 * The white space delta is not active? This can happen
-		 * when we exit early in this function.
-		 */
-		return 1;
-
 	/*
-	 * The indent changes of the block are known and stored in
-	 * pmb->wsd; however we need to check if the indent changes of the
-	 * current line are still the same as before.
-	 *
-	 * To do so we need to compare 'l' to 'cur', adjusting the
-	 * one of them for the white spaces, depending which was longer.
+	 * The indent changes of the block are known and stored in pmb->wsd;
+	 * however we need to check if the indent changes of the current line
+	 * match those of the current block and that the text of 'l' and 'cur'
+	 * after the indentation match.
 	 */
+	if (cur->es->s == DIFF_SYMBOL_PLUS)
+		delta = a_width - c_width;
+	else
+		delta = c_width - a_width;
 
-	wslen = strlen(pmb->wsd.string);
-	if (pmb->wsd.current_longer) {
-		c += wslen;
-		cl -= wslen;
-	} else {
-		a += wslen;
-		al -= wslen;
-	}
-
-	if (al != cl || memcmp(orig_a, b, bl) || memcmp(a, c, al))
-		return 1;
-
-	return 0;
+	return !(delta == pmb->wsd && al - a_off == cl - c_off &&
+		 !memcmp(a, b, al) && !
+		 memcmp(a + a_off, c + c_off, al - a_off));
 }
 
 static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
@@ -942,6 +956,9 @@ static void add_lines_to_move_detection(struct diff_options *o,
 			continue;
 		}
 
+		if (o->color_moved_ws_handling &
+		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+			fill_es_indent_data(&o->emitted_symbols->buf[n]);
 		key = prepare_entry(o, n);
 		if (prev_line && prev_line->es->s == o->emitted_symbols->buf[n].s)
 			prev_line->next_line = key;
@@ -1020,8 +1037,7 @@ static int shrink_potential_moved_blocks(struct moved_block *pmb,
 
 		if (lp < pmb_nr && rp > -1 && lp < rp) {
 			pmb[lp] = pmb[rp];
-			pmb[rp].match = NULL;
-			pmb[rp].wsd.string = NULL;
+			memset(&pmb[rp], 0, sizeof(pmb[rp]));
 			rp--;
 			lp++;
 		}
@@ -1141,7 +1157,7 @@ static void mark_color_as_moved(struct diff_options *o,
 							     &pmb[pmb_nr].wsd))
 						pmb[pmb_nr++].match = match;
 				} else {
-					pmb[pmb_nr].wsd.string = NULL;
+					pmb[pmb_nr].wsd = 0;
 					pmb[pmb_nr++].match = match;
 				}
 			}
@@ -1507,7 +1523,7 @@ static void emit_diff_symbol_from_struct(struct diff_options *o,
 static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
 			     const char *line, int len, unsigned flags)
 {
-	struct emitted_diff_symbol e = {line, len, flags, s};
+	struct emitted_diff_symbol e = {line, len, flags, 0, 0, s};
 
 	if (o->emitted_symbols)
 		append_emitted_diff_symbol(o, &e);
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index fe8a2ab06e..e023839ba6 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1901,4 +1901,60 @@ test_expect_success 'compare whitespace delta incompatible with other space opti
 	test_i18ngrep allow-indentation-change err
 '
 
+test_expect_success 'compare mixed whitespace delta across moved blocks' '
+
+	git reset --hard &&
+	tr Q_ "\t " <<-EOF >text.txt &&
+	____Indented text to
+	_Q____be further indented by four spaces across
+	____Qseveral lines
+	QQ____These two lines have had their
+	____indentation reduced by four spaces
+	Qdifferent indentation change
+	____too short
+	EOF
+
+	git add text.txt &&
+	git commit -m "add text.txt" &&
+
+	tr Q_ "\t " <<-EOF >text.txt &&
+	QIndented text to
+	QQbe further indented by four spaces across
+	Q____several lines
+	Q_QThese two lines have had their
+	indentation reduced by four spaces
+	QQdifferent indentation change
+	__Qtoo short
+	EOF
+
+	git -c color.diff.whitespace="normal red" \
+		-c core.whitespace=space-before-tab \
+		diff --color --color-moved --ws-error-highlight=all \
+		--color-moved-ws=allow-indentation-change >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
+
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/text.txt b/text.txt<RESET>
+	<BOLD>--- a/text.txt<RESET>
+	<BOLD>+++ b/text.txt<RESET>
+	<CYAN>@@ -1,7 +1,7 @@<RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>    Indented text to<RESET>
+	<BOLD;MAGENTA>-<RESET><BRED> <RESET>	<BOLD;MAGENTA>    be further indented by four spaces across<RESET>
+	<BOLD;MAGENTA>-<RESET><BRED>    <RESET>	<BOLD;MAGENTA>several lines<RESET>
+	<BOLD;BLUE>-<RESET>		<BOLD;BLUE>    These two lines have had their<RESET>
+	<BOLD;BLUE>-<RESET><BOLD;BLUE>    indentation reduced by four spaces<RESET>
+	<BOLD;MAGENTA>-<RESET>	<BOLD;MAGENTA>different indentation change<RESET>
+	<RED>-<RESET><RED>    too short<RESET>
+	<BOLD;CYAN>+<RESET>	<BOLD;CYAN>Indented text to<RESET>
+	<BOLD;CYAN>+<RESET>		<BOLD;CYAN>be further indented by four spaces across<RESET>
+	<BOLD;CYAN>+<RESET>	<BOLD;CYAN>    several lines<RESET>
+	<BOLD;YELLOW>+<RESET>	<BRED> <RESET>	<BOLD;YELLOW>These two lines have had their<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>indentation reduced by four spaces<RESET>
+	<BOLD;CYAN>+<RESET>		<BOLD;CYAN>different indentation change<RESET>
+	<GREEN>+<RESET><BRED>  <RESET>	<GREEN>too short<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_done
-- 
2.19.1.1690.g258b440b18


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 9/9] diff --color-moved-ws: handle blank lines
  2018-11-23 11:16 ` [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
                     ` (7 preceding siblings ...)
  2018-11-23 11:16   ` [PATCH v2 8/9] diff --color-moved-ws: modify allow-indentation-change Phillip Wood
@ 2018-11-23 11:16   ` Phillip Wood
  2018-11-26 21:20   ` [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment Stefan Beller
  2019-01-08 16:22   ` Phillip Wood
  10 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-23 11:16 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller, Junio C Hamano; +Cc: Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

When using --color-moved-ws=allow-indentation-change allow lines with
the same indentation change to be grouped across blank lines. For now
this only works if the blank lines have been moved as well, not for
blocks that have just had their indentation changed.

This completes the changes to the implementation of
--color-moved=allow-indentation-change. Running

  git diff --color-moved=allow-indentation-change v2.18.0 v2.19.0

now takes 5.0s. This is a saving of 41% from 8.5s for the optimized
version of the previous implementation and 66% from the original which
took 14.6s.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c                     | 34 ++++++++++++++++++++++++++++---
 t/t4015-diff-whitespace.sh | 41 ++++++++++++++++++++++++++++++++++----
 2 files changed, 68 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 148503e49c..ef5b8c78d7 100644
--- a/diff.c
+++ b/diff.c
@@ -792,9 +792,11 @@ static void moved_block_clear(struct moved_block *b)
 	memset(b, 0, sizeof(*b));
 }
 
+#define INDENT_BLANKLINE INT_MIN
+
 static void fill_es_indent_data(struct emitted_diff_symbol *es)
 {
-	unsigned int off = 0;
+	unsigned int off = 0, i;
 	int width = 0, tab_width = es->flags & WS_TAB_WIDTH_MASK;
 	const char *s = es->line;
 	const int len = es->len;
@@ -818,8 +820,18 @@ static void fill_es_indent_data(struct emitted_diff_symbol *es)
 		}
 	}
 
-	es->indent_off = off;
-	es->indent_width = width;
+	/* check if this line is blank */
+	for (i = off; i < len; i++)
+		if (!isspace(s[i]))
+		    break;
+
+	if (i == len) {
+		es->indent_width = INDENT_BLANKLINE;
+		es->indent_off = len;
+	} else {
+		es->indent_off = off;
+		es->indent_width = width;
+	}
 }
 
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
@@ -834,6 +846,11 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
 	    b_width = b->indent_width;
 	int delta;
 
+	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE) {
+		*out = INDENT_BLANKLINE;
+		return 1;
+	}
+
 	if (a->s == DIFF_SYMBOL_PLUS)
 		delta = a_width - b_width;
 	else
@@ -877,6 +894,10 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	if (al != bl)
 		return 1;
 
+	/* If 'l' and 'cur' are both blank then they match. */
+	if (a_width == INDENT_BLANKLINE && c_width == INDENT_BLANKLINE)
+		return 0;
+
 	/*
 	 * The indent changes of the block are known and stored in pmb->wsd;
 	 * however we need to check if the indent changes of the current line
@@ -888,6 +909,13 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	else
 		delta = c_width - a_width;
 
+	/*
+	 * If the previous lines of this block were all blank then set its
+	 * whitespace delta.
+	 */
+	if (pmb->wsd == INDENT_BLANKLINE)
+		pmb->wsd = delta;
+
 	return !(delta == pmb->wsd && al - a_off == cl - c_off &&
 		 !memcmp(a, b, al) && !
 		 memcmp(a + a_off, c + c_off, al - a_off));
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index e023839ba6..9d6f88b07f 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1901,10 +1901,20 @@ test_expect_success 'compare whitespace delta incompatible with other space opti
 	test_i18ngrep allow-indentation-change err
 '
 
+EMPTY=''
 test_expect_success 'compare mixed whitespace delta across moved blocks' '
 
 	git reset --hard &&
 	tr Q_ "\t " <<-EOF >text.txt &&
+	${EMPTY}
+	____too short without
+	${EMPTY}
+	___being grouped across blank line
+	${EMPTY}
+	context
+	lines
+	to
+	anchor
 	____Indented text to
 	_Q____be further indented by four spaces across
 	____Qseveral lines
@@ -1918,9 +1928,18 @@ test_expect_success 'compare mixed whitespace delta across moved blocks' '
 	git commit -m "add text.txt" &&
 
 	tr Q_ "\t " <<-EOF >text.txt &&
+	context
+	lines
+	to
+	anchor
 	QIndented text to
 	QQbe further indented by four spaces across
 	Q____several lines
+	${EMPTY}
+	QQtoo short without
+	${EMPTY}
+	Q_______being grouped across blank line
+	${EMPTY}
 	Q_QThese two lines have had their
 	indentation reduced by four spaces
 	QQdifferent indentation change
@@ -1937,7 +1956,16 @@ test_expect_success 'compare mixed whitespace delta across moved blocks' '
 	<BOLD>diff --git a/text.txt b/text.txt<RESET>
 	<BOLD>--- a/text.txt<RESET>
 	<BOLD>+++ b/text.txt<RESET>
-	<CYAN>@@ -1,7 +1,7 @@<RESET>
+	<CYAN>@@ -1,16 +1,16 @@<RESET>
+	<BOLD;MAGENTA>-<RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>    too short without<RESET>
+	<BOLD;MAGENTA>-<RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>   being grouped across blank line<RESET>
+	<BOLD;MAGENTA>-<RESET>
+	 <RESET>context<RESET>
+	 <RESET>lines<RESET>
+	 <RESET>to<RESET>
+	 <RESET>anchor<RESET>
 	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>    Indented text to<RESET>
 	<BOLD;MAGENTA>-<RESET><BRED> <RESET>	<BOLD;MAGENTA>    be further indented by four spaces across<RESET>
 	<BOLD;MAGENTA>-<RESET><BRED>    <RESET>	<BOLD;MAGENTA>several lines<RESET>
@@ -1948,9 +1976,14 @@ test_expect_success 'compare mixed whitespace delta across moved blocks' '
 	<BOLD;CYAN>+<RESET>	<BOLD;CYAN>Indented text to<RESET>
 	<BOLD;CYAN>+<RESET>		<BOLD;CYAN>be further indented by four spaces across<RESET>
 	<BOLD;CYAN>+<RESET>	<BOLD;CYAN>    several lines<RESET>
-	<BOLD;YELLOW>+<RESET>	<BRED> <RESET>	<BOLD;YELLOW>These two lines have had their<RESET>
-	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>indentation reduced by four spaces<RESET>
-	<BOLD;CYAN>+<RESET>		<BOLD;CYAN>different indentation change<RESET>
+	<BOLD;YELLOW>+<RESET>
+	<BOLD;YELLOW>+<RESET>		<BOLD;YELLOW>too short without<RESET>
+	<BOLD;YELLOW>+<RESET>
+	<BOLD;YELLOW>+<RESET>	<BOLD;YELLOW>       being grouped across blank line<RESET>
+	<BOLD;YELLOW>+<RESET>
+	<BOLD;CYAN>+<RESET>	<BRED> <RESET>	<BOLD;CYAN>These two lines have had their<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>indentation reduced by four spaces<RESET>
+	<BOLD;YELLOW>+<RESET>		<BOLD;YELLOW>different indentation change<RESET>
 	<GREEN>+<RESET><BRED>  <RESET>	<GREEN>too short<RESET>
 	EOF
 
-- 
2.19.1.1690.g258b440b18


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment
  2018-11-23 11:16 ` [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
                     ` (8 preceding siblings ...)
  2018-11-23 11:16   ` [PATCH v2 9/9] diff --color-moved-ws: handle blank lines Phillip Wood
@ 2018-11-26 21:20   ` Stefan Beller
  2018-11-27 20:52     ` Phillip Wood
  2019-01-08 16:22   ` Phillip Wood
  10 siblings, 1 reply; 44+ messages in thread
From: Stefan Beller @ 2018-11-26 21:20 UTC (permalink / raw)
  To: Phillip Wood; +Cc: git, Junio C Hamano

On Fri, Nov 23, 2018 at 3:17 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> Thanks to Stefan for his feedback on v1. I've updated patches 2 & 8 in
> response to those comments - see the range-diff below for details (the
> patch numbers are off by one in the range diff, I think because the
> first patch is unchanged and so it was used as the merge base by
> --range-diff=<old-head>.

`git range-diff` accepts a three dotted "range" OLD...NEW
as an easy abbreviation for the arguments
"COMMON..OLD COMMON..NEW" and the common element is
computed as the last common element. It doesn't have knowledge
about where you started your topic branch.


> For some reason the range-diff also includes
> the notes even though I did not give --notes to format-patch)

This is interesting.
The existence of notes.rewrite.<command> seems to work well
with the range-diff then, as the config would trigger the copy-over
of notes and then range-diff would diff the original notes to the new
notes.

>
> When trying out the new --color-moved-ws=allow-indentation-change I
> was disappointed to discover it did not work if the indentation
> contains a mix of spaces and tabs. This series reworks it so that it
> does.
>

The range-diff looks good to me.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment
  2018-11-26 21:20   ` [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment Stefan Beller
@ 2018-11-27 20:52     ` Phillip Wood
  0 siblings, 0 replies; 44+ messages in thread
From: Phillip Wood @ 2018-11-27 20:52 UTC (permalink / raw)
  To: Stefan Beller, Phillip Wood; +Cc: git, Junio C Hamano

Hi Stefan

On 26/11/2018 21:20, Stefan Beller wrote:
> On Fri, Nov 23, 2018 at 3:17 AM Phillip Wood <phillip.wood@talktalk.net> wrote:
>>
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> Thanks to Stefan for his feedback on v1. I've updated patches 2 & 8 in
>> response to those comments - see the range-diff below for details (the
>> patch numbers are off by one in the range diff, I think because the
>> first patch is unchanged and so it was used as the merge base by
>> --range-diff=<old-head>.
> 
> `git range-diff` accepts a three dotted "range" OLD...NEW
> as an easy abbreviation for the arguments
> "COMMON..OLD COMMON..NEW" and the common element is
> computed as the last common element. It doesn't have knowledge
> about where you started your topic branch.

I was using the new --range-diff option to format-patch, I think I 
should have given --range-diff=@{u}..<old-head>.

>> For some reason the range-diff also includes
>> the notes even though I did not give --notes to format-patch)
> 
> This is interesting.
> The existence of notes.rewrite.<command> seems to work well
> with the range-diff then, as the config would trigger the copy-over
> of notes and then range-diff would diff the original notes to the new
> notes.

Yes, but I think with format-patch it should only diff the notes when 
--notes is given.

>> When trying out the new --color-moved-ws=allow-indentation-change I
>> was disappointed to discover it did not work if the indentation
>> contains a mix of spaces and tabs. This series reworks it so that it
>> does.
>>
> 
> The range-diff looks good to me.

That's good, thanks for your comments on the previous iterations.

Best Wishes

Phillip
> Thanks,
> Stefan
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment
  2018-11-23 11:16 ` [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
                     ` (9 preceding siblings ...)
  2018-11-26 21:20   ` [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment Stefan Beller
@ 2019-01-08 16:22   ` Phillip Wood
  2019-01-08 18:31     ` Junio C Hamano
  10 siblings, 1 reply; 44+ messages in thread
From: Phillip Wood @ 2019-01-08 16:22 UTC (permalink / raw)
  To: Git Mailing List, Stefan Beller, Junio C Hamano; +Cc: Phillip Wood

Hi Junio

I just wanted to check that these patches are on your radar as they 
haven't made it into pu yet.

Best Wishes for the New Year

Phillip

On 23/11/2018 11:16, Phillip Wood wrote:
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
> 
> Thanks to Stefan for his feedback on v1. I've updated patches 2 & 8 in
> response to those comments - see the range-diff below for details (the
> patch numbers are off by one in the range diff, I think because the
> first patch is unchanged and so it was used as the merge base by
> --range-diff=<old-head>. For some reason the range-diff also includes
> the notes even though I did not give --notes to format-patch)
> 
> When trying out the new --color-moved-ws=allow-indentation-change I
> was disappointed to discover it did not work if the indentation
> contains a mix of spaces and tabs. This series reworks it so that it
> does.
> 
> 
> Phillip Wood (9):
>    diff: document --no-color-moved
>    Use "whitespace" consistently
>    diff: allow --no-color-moved-ws
>    diff --color-moved-ws: demonstrate false positives
>    diff --color-moved-ws: fix false positives
>    diff --color-moved=zebra: be stricter with color alternation
>    diff --color-moved-ws: optimize allow-indentation-change
>    diff --color-moved-ws: modify allow-indentation-change
>    diff --color-moved-ws: handle blank lines
> 
>   Documentation/diff-options.txt |  15 ++-
>   Documentation/git-cat-file.txt |   8 +-
>   diff.c                         | 219 +++++++++++++++++++++------------
>   t/t4015-diff-whitespace.sh     |  99 ++++++++++++++-
>   4 files changed, 255 insertions(+), 86 deletions(-)
> 
> Range-diff against v1:
> 1:  ae58ae4f29 ! 1:  4939ee371d diff: use whitespace consistently
>      @@ -1,9 +1,10 @@
>       Author: Phillip Wood <phillip.wood@dunelm.org.uk>
>       
>      -    diff: use whitespace consistently
>      +    Use "whitespace" consistently
>       
>      -    Most of the documentation uses 'whitespace' rather than 'white space'
>      -    or 'white spaces' convert to latter two to the former for consistency.
>      +    Most of the messages and documentation use 'whitespace' rather than
>      +    'white space' or 'white spaces' convert to latter two to the former for
>      +    consistency.
>       
>           Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>       
>      @@ -29,6 +30,39 @@
>        	whitespace is the same per line. This is incompatible with the
>        	other modes.
>       
>      + diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
>      + --- a/Documentation/git-cat-file.txt
>      + +++ b/Documentation/git-cat-file.txt
>      +@@
>      + stdin, and the SHA-1, type, and size of each object is printed on stdout. The
>      + output format can be overridden using the optional `<format>` argument. If
>      + either `--textconv` or `--filters` was specified, the input is expected to
>      +-list the object names followed by the path name, separated by a single white
>      +-space, so that the appropriate drivers can be determined.
>      ++list the object names followed by the path name, separated by a single
>      ++whitespace, so that the appropriate drivers can be determined.
>      +
>      + OPTIONS
>      + -------
>      +@@
>      + 	Print object information and contents for each object provided
>      + 	on stdin.  May not be combined with any other options or arguments
>      + 	except `--textconv` or `--filters`, in which case the input lines
>      +-	also need to specify the path, separated by white space.  See the
>      ++	also need to specify the path, separated by whitespace.  See the
>      + 	section `BATCH OUTPUT` below for details.
>      +
>      + --batch-check::
>      + --batch-check=<format>::
>      + 	Print object information for each object provided on stdin.  May
>      + 	not be combined with any other options or arguments except
>      + 	`--textconv` or `--filters`, in which case the input lines also
>      +-	need to specify the path, separated by white space.  See the
>      ++	need to specify the path, separated by whitespace.  See the
>      + 	section `BATCH OUTPUT` below for details.
>      +
>      + --batch-all-objects::
>      +
>        diff --git a/diff.c b/diff.c
>        --- a/diff.c
>        +++ b/diff.c
> 2:  7072bc6211 = 2:  204c7fea9d diff: allow --no-color-moved-ws
> 3:  ce3ad19eea = 3:  542b79b215 diff --color-moved-ws: demonstrate false positives
> 4:  700e0b61e7 = 4:  4ffb5c4122 diff --color-moved-ws: fix false positives
> 5:  9ecd8159a7 = 5:  a3a84f90c5 diff --color-moved=zebra: be stricter with color alternation
> 6:  1b1158b1ca = 6:  f94f2e0bae diff --color-moved-ws: optimize allow-indentation-change
> 7:  d8a362be6a ! 7:  fe8eb9cdbc diff --color-moved-ws: modify allow-indentation-change
>      @@ -17,7 +17,7 @@
>       
>           This commit changes the way the indentation is handled to track the
>           visual size of the indentation rather than the characters in the
>      -    indentation. This has they benefit that any whitespace errors do not
>      +    indentation. This has the benefit that any whitespace errors do not
>           interfer with the move detection (the whitespace errors will still be
>           highlighted according to --ws-error-highlight). During the discussion
>           of this feature there were concerns about the correct detection of
>      @@ -30,7 +30,7 @@
>               they are uncolored.
>       
>           Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>      -    Changes since rfc:
>      +    changes since rfc:
>            - It now replaces the existing implementation rather than adding a new
>              mode.
>            - The indentation deltas are now calculated once for each line and
>      @@ -49,8 +49,8 @@
>        	const char *line;
>        	int len;
>        	int flags;
>      -+	int indent_off;
>      -+	int indent_width;
>      ++	int indent_off;   /* Offset to first non-whitespace character */
>      ++	int indent_width; /* The visual width of the indentation */
>        	enum diff_symbol s;
>        };
>        #define EMITTED_DIFF_SYMBOL_INIT {NULL}
> 8:  1f7e99d45c = 8:  e600f8247c diff --color-moved-ws: handle blank lines
>      
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment
  2019-01-08 16:22   ` Phillip Wood
@ 2019-01-08 18:31     ` Junio C Hamano
  2019-01-10  0:37       ` Stefan Beller
  0 siblings, 1 reply; 44+ messages in thread
From: Junio C Hamano @ 2019-01-08 18:31 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Git Mailing List, Stefan Beller, Phillip Wood

Phillip Wood <phillip.wood@talktalk.net> writes:

> I just wanted to check that these patches are on your radar as they
> haven't made it into pu yet.

Sorry, but they were not on my radar.  I was waiting for comments to
come in on them before doing anything, and now it is more than a
month ago X-<.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment
  2019-01-08 18:31     ` Junio C Hamano
@ 2019-01-10  0:37       ` Stefan Beller
  2019-01-10 18:39         ` Junio C Hamano
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Beller @ 2019-01-10  0:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Phillip Wood, Git Mailing List, Phillip Wood

On Tue, Jan 8, 2019 at 10:31 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Phillip Wood <phillip.wood@talktalk.net> writes:
>
> > I just wanted to check that these patches are on your radar as they
> > haven't made it into pu yet.
>
> Sorry, but they were not on my radar.  I was waiting for comments to
> come in on them before doing anything, and now it is more than a
> month ago X-<.

I have reviewed the whole series again, and still have
no comment on them, i.e. the series as-is in v2 is
    Reviewed-by: Stefan Beller <sbeller@google.com>
if that helps.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment
  2019-01-10  0:37       ` Stefan Beller
@ 2019-01-10 18:39         ` Junio C Hamano
  0 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2019-01-10 18:39 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Phillip Wood, Git Mailing List, Phillip Wood

Stefan Beller <sbeller@google.com> writes:

> On Tue, Jan 8, 2019 at 10:31 AM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Phillip Wood <phillip.wood@talktalk.net> writes:
>>
>> > I just wanted to check that these patches are on your radar as they
>> > haven't made it into pu yet.
>>
>> Sorry, but they were not on my radar.  I was waiting for comments to
>> come in on them before doing anything, and now it is more than a
>> month ago X-<.
>
> I have reviewed the whole series again, and still have
> no comment on them, i.e. the series as-is in v2 is
>     Reviewed-by: Stefan Beller <sbeller@google.com>
> if that helps.

Thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, back to index

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-24 10:06 [RFC PATCH 0/3] diff --color-moved-ws: allow mixed spaces and tabs in indentation change Phillip Wood
2018-09-24 10:06 ` [RFC PATCH 1/3] xdiff-interface: make xdl_blankline() available Phillip Wood
2018-09-24 23:19   ` Stefan Beller
2018-09-24 10:06 ` [RFC PATCH 2/3] diff.c: remove unused variables Phillip Wood
2018-09-24 10:06 ` [RFC PATCH 3/3] diff: add --color-moved-ws=allow-mixed-indentation-change Phillip Wood
2018-09-25  1:07   ` Stefan Beller
2018-10-09  9:50     ` Phillip Wood
2018-10-09 21:10       ` Stefan Beller
2018-10-10 15:26         ` Phillip Wood
2018-10-10 18:05           ` Stefan Beller
2018-09-24 11:03 ` [RFC PATCH 0/3] diff --color-moved-ws: allow mixed spaces and tabs in indentation change Phillip Wood
2018-11-16 11:03 ` [PATCH v1 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
2018-11-16 11:03   ` [PATCH v1 1/9] diff: document --no-color-moved Phillip Wood
2018-11-16 11:03   ` [PATCH v1 2/9] diff: use whitespace consistently Phillip Wood
2018-11-16 18:29     ` Stefan Beller
2018-11-16 11:03   ` [PATCH v1 3/9] diff: allow --no-color-moved-ws Phillip Wood
2018-11-16 11:03   ` [PATCH v1 4/9] diff --color-moved-ws: demonstrate false positives Phillip Wood
2018-11-16 11:03   ` [PATCH v1 5/9] diff --color-moved-ws: fix " Phillip Wood
2018-11-16 11:03   ` [PATCH v1 6/9] diff --color-moved=zebra: be stricter with color alternation Phillip Wood
2018-11-16 11:03   ` [PATCH v1 7/9] diff --color-moved-ws: optimize allow-indentation-change Phillip Wood
2018-11-16 20:40     ` Stefan Beller
2018-11-17 14:52       ` Phillip Wood
2018-11-16 11:03   ` [PATCH v1 8/9] diff --color-moved-ws: modify allow-indentation-change Phillip Wood
2018-11-16 21:47     ` Stefan Beller
2018-11-17 14:59       ` Phillip Wood
2018-11-16 11:03   ` [PATCH v1 9/9] diff --color-moved-ws: handle blank lines Phillip Wood
2018-11-20 18:05     ` Stefan Beller
2018-11-21 15:49       ` Phillip Wood
2018-11-23 11:16 ` [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment Phillip Wood
2018-11-23 11:16   ` [PATCH v2 1/9] diff: document --no-color-moved Phillip Wood
2018-11-23 11:16   ` [PATCH v2 2/9] Use "whitespace" consistently Phillip Wood
2018-11-23 11:16   ` [PATCH v2 3/9] diff: allow --no-color-moved-ws Phillip Wood
2018-11-23 11:16   ` [PATCH v2 4/9] diff --color-moved-ws: demonstrate false positives Phillip Wood
2018-11-23 11:16   ` [PATCH v2 5/9] diff --color-moved-ws: fix " Phillip Wood
2018-11-23 11:16   ` [PATCH v2 6/9] diff --color-moved=zebra: be stricter with color alternation Phillip Wood
2018-11-23 11:16   ` [PATCH v2 7/9] diff --color-moved-ws: optimize allow-indentation-change Phillip Wood
2018-11-23 11:16   ` [PATCH v2 8/9] diff --color-moved-ws: modify allow-indentation-change Phillip Wood
2018-11-23 11:16   ` [PATCH v2 9/9] diff --color-moved-ws: handle blank lines Phillip Wood
2018-11-26 21:20   ` [PATCH v2 0/9] diff --color-moved-ws fixes and enhancment Stefan Beller
2018-11-27 20:52     ` Phillip Wood
2019-01-08 16:22   ` Phillip Wood
2019-01-08 18:31     ` Junio C Hamano
2019-01-10  0:37       ` Stefan Beller
2019-01-10 18:39         ` Junio C Hamano

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox