git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling
@ 2017-08-11 22:49 Jonathan Tan
  2017-08-11 22:49 ` [RFC PATCH 1/3] diff: avoid redundantly clearing a flag Jonathan Tan
                   ` (15 more replies)
  0 siblings, 16 replies; 30+ messages in thread
From: Jonathan Tan @ 2017-08-11 22:49 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, sbeller

Note that these patches are for "next", depending on the "--color-moved"
patches.

While working on something else [1], I noticed some irregularities with
how "diff --color-moved" treats the minimum block size, occasionally
coloring blocks smaller than that as moved.

I've marked this as RFC because now I'm not sure if minimum number of
lines is the best way to handle the minimum block size. In particular,
an existing test (in t4015) assumes that 2-line blocks can be
legitimately colored as moved (and notice that it is now marked
test_expect_failure in patch 3), and in my work, I would also like the
rules to be relaxed. What do you think of also changing the rule to,
say, "minimum 10 non-whitespace characters"?

[1] https://public-inbox.org/git/cover.1502483486.git.jonathantanmy@google.com/

Jonathan Tan (3):
  diff: avoid redundantly clearing a flag
  diff: respect MIN_BLOCK_LENGTH for last block
  diff: check MIN_BLOCK_LENGTH at start of new block

 diff.c                     | 35 +++++++++++++++-----
 t/t4015-diff-whitespace.sh | 80 +++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 106 insertions(+), 9 deletions(-)

-- 
2.14.0.434.g98096fd7a8-goog


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC PATCH 1/3] diff: avoid redundantly clearing a flag
  2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan
@ 2017-08-11 22:49 ` Jonathan Tan
  2017-08-14 17:13   ` Stefan Beller
  2017-08-11 22:49 ` [RFC PATCH 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan
                   ` (14 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: Jonathan Tan @ 2017-08-11 22:49 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, sbeller

No code in diff.c sets DIFF_SYMBOL_MOVED_LINE except in
mark_color_as_moved(), so it is redundant to clear it for the current
line. Therefore, clear it only for previous lines.

This makes a refactoring in a subsequent patch easier.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index f84346b47..4965ffbc4 100644
--- a/diff.c
+++ b/diff.c
@@ -895,7 +895,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		if (!match) {
 			if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH &&
 			    o->color_moved != COLOR_MOVED_PLAIN) {
-				for (i = 0; i < block_length + 1; i++) {
+				for (i = 1; i < block_length + 1; i++) {
 					l = &o->emitted_symbols->buf[n - i];
 					l->flags &= ~DIFF_SYMBOL_MOVED_LINE;
 				}
-- 
2.14.0.434.g98096fd7a8-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 2/3] diff: respect MIN_BLOCK_LENGTH for last block
  2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan
  2017-08-11 22:49 ` [RFC PATCH 1/3] diff: avoid redundantly clearing a flag Jonathan Tan
@ 2017-08-11 22:49 ` Jonathan Tan
  2017-08-14 17:17   ` Stefan Beller
  2017-08-11 22:49 ` [RFC PATCH 3/3] diff: check MIN_BLOCK_LENGTH at start of new block Jonathan Tan
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: Jonathan Tan @ 2017-08-11 22:49 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, sbeller

Currently, MIN_BLOCK_LENGTH is only checked when diff encounters a line
that does not belong to the current block. In particular, this means
that MIN_BLOCK_LENGTH is not checked after all lines are encountered.

Perform that check.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 diff.c                     | 29 ++++++++++++++++++++++-------
 t/t4015-diff-whitespace.sh | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 4965ffbc4..95620b130 100644
--- a/diff.c
+++ b/diff.c
@@ -858,6 +858,26 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 	return rp + 1;
 }
 
+/*
+ * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing.
+ *
+ * Otherwise, if the last block has fewer lines than
+ * COLOR_MOVED_MIN_BLOCK_LENGTH, unset DIFF_SYMBOL_MOVED_LINE on all lines in
+ * that block.
+ *
+ * The last block consists of the (n - block_length)'th line up to but not
+ * including the nth line.
+ */
+static void adjust_last_block(struct diff_options *o, int n, int block_length)
+{
+	int i;
+	if (block_length >= COLOR_MOVED_MIN_BLOCK_LENGTH ||
+	    o->color_moved == COLOR_MOVED_PLAIN)
+		return;
+	for (i = 1; i < block_length + 1; i++)
+		o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE;
+}
+
 /* Find blocks of moved code, delegate actual coloring decision to helper */
 static void mark_color_as_moved(struct diff_options *o,
 				struct hashmap *add_lines,
@@ -893,13 +913,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		}
 
 		if (!match) {
-			if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH &&
-			    o->color_moved != COLOR_MOVED_PLAIN) {
-				for (i = 1; i < block_length + 1; i++) {
-					l = &o->emitted_symbols->buf[n - i];
-					l->flags &= ~DIFF_SYMBOL_MOVED_LINE;
-				}
-			}
+			adjust_last_block(o, n, block_length);
 			pmb_nr = 0;
 			block_length = 0;
 			continue;
@@ -941,6 +955,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		if (flipped_block)
 			l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 	}
+	adjust_last_block(o, n, block_length);
 
 	free(pmb);
 }
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index c3b697411..6f7758e5c 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1382,6 +1382,41 @@ EOF
 	test_cmp expected actual
 '
 
+test_expect_success '--color-moved block at end of diff output respects MIN_BLOCK_LENGTH' '
+	git reset --hard &&
+	>bar &&
+	cat <<-\EOF >foo &&
+	irrelevant_line
+	line1
+	EOF
+	git add foo bar &&
+	git commit -m x &&
+
+	cat <<-\EOF >bar &&
+	line1
+	EOF
+	cat <<-\EOF >foo &&
+	irrelevant_line
+	EOF
+
+	git diff HEAD --color-moved=zebra --no-renames | grep -v "index" | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/bar b/bar<RESET>
+	<BOLD>--- a/bar<RESET>
+	<BOLD>+++ b/bar<RESET>
+	<CYAN>@@ -0,0 +1 @@<RESET>
+	<GREEN>+<RESET><GREEN>line1<RESET>
+	<BOLD>diff --git a/foo b/foo<RESET>
+	<BOLD>--- a/foo<RESET>
+	<BOLD>+++ b/foo<RESET>
+	<CYAN>@@ -1,2 +1 @@<RESET>
+	 irrelevant_line<RESET>
+	<RED>-line1<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_expect_success 'move detection with submodules' '
 	test_create_repo bananas &&
 	echo ripe >bananas/recipe &&
-- 
2.14.0.434.g98096fd7a8-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH 3/3] diff: check MIN_BLOCK_LENGTH at start of new block
  2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan
  2017-08-11 22:49 ` [RFC PATCH 1/3] diff: avoid redundantly clearing a flag Jonathan Tan
  2017-08-11 22:49 ` [RFC PATCH 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan
@ 2017-08-11 22:49 ` Jonathan Tan
  2017-08-14 17:22   ` Stefan Beller
  2017-08-12  0:39 ` [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Junio C Hamano
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: Jonathan Tan @ 2017-08-11 22:49 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, sbeller

When noticing that the current line is not the continuation of the
current block, but the start of a new one, mark_color_as_moved() does
not check the length of the current block. Perform that check.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 diff.c                     |  6 +++++-
 t/t4015-diff-whitespace.sh | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 95620b130..04fc0d65d 100644
--- a/diff.c
+++ b/diff.c
@@ -920,7 +920,6 @@ static void mark_color_as_moved(struct diff_options *o,
 		}
 
 		l->flags |= DIFF_SYMBOL_MOVED_LINE;
-		block_length++;
 
 		if (o->color_moved == COLOR_MOVED_PLAIN)
 			continue;
@@ -950,8 +949,13 @@ static void mark_color_as_moved(struct diff_options *o,
 			}
 
 			flipped_block = (flipped_block + 1) % 2;
+
+			adjust_last_block(o, n, block_length);
+			block_length = 0;
 		}
 
+		block_length++;
+
 		if (flipped_block)
 			l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 	}
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 6f7758e5c..401dc8f08 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1015,7 +1015,7 @@ test_expect_success 'detect moved code, complete file' '
 	test_cmp expected actual
 '
 
-test_expect_success 'detect malicious moved code, inside file' '
+test_expect_failure 'detect malicious moved code, inside file' '
 	test_config color.diff.oldMoved "normal red" &&
 	test_config color.diff.newMoved "normal green" &&
 	test_config color.diff.oldMovedAlternative "blue" &&
@@ -1417,6 +1417,49 @@ test_expect_success '--color-moved block at end of diff output respects MIN_BLOC
 	test_cmp expected actual
 '
 
+test_expect_success '--color-moved treats adjacent blocks as separate for MIN_BLOCK_LENGTH' '
+	git reset --hard &&
+	cat <<-\EOF >foo &&
+	line1
+	irrelevant_line
+	line2
+	line3
+	EOF
+	>bar &&
+	git add foo bar &&
+	git commit -m x &&
+
+	cat <<-\EOF >foo &&
+	irrelevant_line
+	EOF
+	cat <<-\EOF >bar &&
+	line2
+	line3
+	line1
+	EOF
+
+	git diff HEAD --color-moved=zebra --no-renames | grep -v "index" | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/bar b/bar<RESET>
+	<BOLD>--- a/bar<RESET>
+	<BOLD>+++ b/bar<RESET>
+	<CYAN>@@ -0,0 +1,3 @@<RESET>
+	<GREEN>+<RESET><GREEN>line2<RESET>
+	<GREEN>+<RESET><GREEN>line3<RESET>
+	<GREEN>+<RESET><GREEN>line1<RESET>
+	<BOLD>diff --git a/foo b/foo<RESET>
+	<BOLD>--- a/foo<RESET>
+	<BOLD>+++ b/foo<RESET>
+	<CYAN>@@ -1,4 +1 @@<RESET>
+	<RED>-line1<RESET>
+	 irrelevant_line<RESET>
+	<RED>-line2<RESET>
+	<RED>-line3<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_expect_success 'move detection with submodules' '
 	test_create_repo bananas &&
 	echo ripe >bananas/recipe &&
-- 
2.14.0.434.g98096fd7a8-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling
  2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan
                   ` (2 preceding siblings ...)
  2017-08-11 22:49 ` [RFC PATCH 3/3] diff: check MIN_BLOCK_LENGTH at start of new block Jonathan Tan
@ 2017-08-12  0:39 ` Junio C Hamano
  2017-08-14 17:29   ` Stefan Beller
  2017-08-14 21:31 ` [PATCH v2 " Jonathan Tan
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2017-08-12  0:39 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, sbeller

Jonathan Tan <jonathantanmy@google.com> writes:

> Note that these patches are for "next", depending on the "--color-moved"
> patches.

As we have finished Git 2.14 cycle, in preparation for the next one,
the 'next' branch will be rewound and rebuilt early next week.  I do
not mind tentatively ejecting some topics that needs fix-ups out of
'next' to give them a clean restart.

My preference however is to keep sb/diff-color-move topic as-is
without replacing and fixing it with incremental updates like these
patches.

Thanks.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 1/3] diff: avoid redundantly clearing a flag
  2017-08-11 22:49 ` [RFC PATCH 1/3] diff: avoid redundantly clearing a flag Jonathan Tan
@ 2017-08-14 17:13   ` Stefan Beller
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Beller @ 2017-08-14 17:13 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git@vger.kernel.org

On Fri, Aug 11, 2017 at 3:49 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> No code in diff.c sets DIFF_SYMBOL_MOVED_LINE except in
> mark_color_as_moved(), so it is redundant to clear it for the current
> line. Therefore, clear it only for previous lines.

Oh that part. I remember debating with myself if I rather want to have
the upper bound adjusted by one less (block_length instead of
'block_length + 1'), and then add a constant to 'buf[n - i];'

The patch as implemented is fine, too.

> This makes a refactoring in a subsequent patch easier.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  diff.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/diff.c b/diff.c
> index f84346b47..4965ffbc4 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -895,7 +895,7 @@ static void mark_color_as_moved(struct diff_options *o,
>                 if (!match) {
>                         if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH &&
>                             o->color_moved != COLOR_MOVED_PLAIN) {
> -                               for (i = 0; i < block_length + 1; i++) {
> +                               for (i = 1; i < block_length + 1; i++) {
>                                         l = &o->emitted_symbols->buf[n - i];
>                                         l->flags &= ~DIFF_SYMBOL_MOVED_LINE;
>                                 }
> --
> 2.14.0.434.g98096fd7a8-goog
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 2/3] diff: respect MIN_BLOCK_LENGTH for last block
  2017-08-11 22:49 ` [RFC PATCH 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan
@ 2017-08-14 17:17   ` Stefan Beller
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Beller @ 2017-08-14 17:17 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git@vger.kernel.org

On Fri, Aug 11, 2017 at 3:49 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> Currently, MIN_BLOCK_LENGTH is only checked when diff encounters a line
> that does not belong to the current block. In particular, this means
> that MIN_BLOCK_LENGTH is not checked after all lines are encountered.
>
> Perform that check.

Thanks for spotting! This fix looks straightforward correct.
(Also thanks for factoring out the adjustment, I am tempted to
start a bike shedding discussion about its name, though. :P)

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  diff.c                     | 29 ++++++++++++++++++++++-------
>  t/t4015-diff-whitespace.sh | 35 +++++++++++++++++++++++++++++++++++
>  2 files changed, 57 insertions(+), 7 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 4965ffbc4..95620b130 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -858,6 +858,26 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb,
>         return rp + 1;
>  }
>
> +/*
> + * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing.
> + *
> + * Otherwise, if the last block has fewer lines than
> + * COLOR_MOVED_MIN_BLOCK_LENGTH, unset DIFF_SYMBOL_MOVED_LINE on all lines in
> + * that block.
> + *
> + * The last block consists of the (n - block_length)'th line up to but not
> + * including the nth line.
> + */
> +static void adjust_last_block(struct diff_options *o, int n, int block_length)
> +{
> +       int i;
> +       if (block_length >= COLOR_MOVED_MIN_BLOCK_LENGTH ||
> +           o->color_moved == COLOR_MOVED_PLAIN)
> +               return;
> +       for (i = 1; i < block_length + 1; i++)
> +               o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE;
> +}
> +
>  /* Find blocks of moved code, delegate actual coloring decision to helper */
>  static void mark_color_as_moved(struct diff_options *o,
>                                 struct hashmap *add_lines,
> @@ -893,13 +913,7 @@ static void mark_color_as_moved(struct diff_options *o,
>                 }
>
>                 if (!match) {
> -                       if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH &&
> -                           o->color_moved != COLOR_MOVED_PLAIN) {
> -                               for (i = 1; i < block_length + 1; i++) {
> -                                       l = &o->emitted_symbols->buf[n - i];
> -                                       l->flags &= ~DIFF_SYMBOL_MOVED_LINE;
> -                               }
> -                       }
> +                       adjust_last_block(o, n, block_length);
>                         pmb_nr = 0;
>                         block_length = 0;
>                         continue;
> @@ -941,6 +955,7 @@ static void mark_color_as_moved(struct diff_options *o,
>                 if (flipped_block)
>                         l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
>         }
> +       adjust_last_block(o, n, block_length);
>
>         free(pmb);
>  }
> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index c3b697411..6f7758e5c 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -1382,6 +1382,41 @@ EOF
>         test_cmp expected actual
>  '
>
> +test_expect_success '--color-moved block at end of diff output respects MIN_BLOCK_LENGTH' '
> +       git reset --hard &&
> +       >bar &&
> +       cat <<-\EOF >foo &&
> +       irrelevant_line
> +       line1
> +       EOF
> +       git add foo bar &&
> +       git commit -m x &&
> +
> +       cat <<-\EOF >bar &&
> +       line1
> +       EOF
> +       cat <<-\EOF >foo &&
> +       irrelevant_line
> +       EOF
> +
> +       git diff HEAD --color-moved=zebra --no-renames | grep -v "index" | test_decode_color >actual &&
> +       cat >expected <<-\EOF &&
> +       <BOLD>diff --git a/bar b/bar<RESET>
> +       <BOLD>--- a/bar<RESET>
> +       <BOLD>+++ b/bar<RESET>
> +       <CYAN>@@ -0,0 +1 @@<RESET>
> +       <GREEN>+<RESET><GREEN>line1<RESET>
> +       <BOLD>diff --git a/foo b/foo<RESET>
> +       <BOLD>--- a/foo<RESET>
> +       <BOLD>+++ b/foo<RESET>
> +       <CYAN>@@ -1,2 +1 @@<RESET>
> +        irrelevant_line<RESET>
> +       <RED>-line1<RESET>
> +       EOF
> +
> +       test_cmp expected actual
> +'
> +
>  test_expect_success 'move detection with submodules' '
>         test_create_repo bananas &&
>         echo ripe >bananas/recipe &&
> --
> 2.14.0.434.g98096fd7a8-goog
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 3/3] diff: check MIN_BLOCK_LENGTH at start of new block
  2017-08-11 22:49 ` [RFC PATCH 3/3] diff: check MIN_BLOCK_LENGTH at start of new block Jonathan Tan
@ 2017-08-14 17:22   ` Stefan Beller
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Beller @ 2017-08-14 17:22 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git@vger.kernel.org

On Fri, Aug 11, 2017 at 3:49 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> When noticing that the current line is not the continuation of the
> current block, but the start of a new one, mark_color_as_moved() does
> not check the length of the current block. Perform that check.

As far as I remember that behavior is intentional, as indicated by
the succeeding test.

The whole MIN_BLOCK_LENGTH thing is a hack IMHO as we did not have
a better heuristic for suppressing uninteresting "moved" lines such as closing
braces in C.

The information that a thing is moved in between two blocks is more
valuable than pointing out it is just 'new' or 'old'.

As this is changing behavior in a way that seems controversial, can you
give your motivation/example for why this behavior is better?
(Do we want to put it into an option/mode?)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling
  2017-08-12  0:39 ` [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Junio C Hamano
@ 2017-08-14 17:29   ` Stefan Beller
  2017-08-14 19:37     ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Beller @ 2017-08-14 17:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git@vger.kernel.org

On Fri, Aug 11, 2017 at 5:39 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jonathan Tan <jonathantanmy@google.com> writes:
>
>> Note that these patches are for "next", depending on the "--color-moved"
>> patches.
>
> As we have finished Git 2.14 cycle, in preparation for the next one,
> the 'next' branch will be rewound and rebuilt early next week.  I do
> not mind tentatively ejecting some topics that needs fix-ups out of
> 'next' to give them a clean restart.

These would cleanly apply on sb/diff-color-move.

>
> My preference however is to keep sb/diff-color-move topic as-is
> without replacing and fixing it with incremental updates like these
> patches.

I would have hoped to not need to reroll that topic.
Though I do find patches 1&2 valuable either on top or squashed
into "[PATCH] diff.c: color moved lines differently" and
"[PATCH] diff.c: color moved lines differently, plain mode"
respectively.

So I'd ask to pick at least patches 1&2 on top of that series, please?
(I am missing the context for *why* you preference is to not do
exactly this).

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling
  2017-08-14 17:29   ` Stefan Beller
@ 2017-08-14 19:37     ` Junio C Hamano
  2017-08-14 19:51       ` Stefan Beller
  0 siblings, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2017-08-14 19:37 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jonathan Tan, git@vger.kernel.org

Stefan Beller <sbeller@google.com> writes:

> On Fri, Aug 11, 2017 at 5:39 PM, Junio C Hamano <gitster@pobox.com> wrote:
> ...
>> My preference however is to keep sb/diff-color-move topic as-is
>> without replacing and fixing it with incremental updates like these
>> patches.
>
> I would have hoped to not need to reroll that topic.
> Though I do find patches 1&2 valuable either on top or squashed
> into "[PATCH] diff.c: color moved lines differently" and
> "[PATCH] diff.c: color moved lines differently, plain mode"
> respectively.
>
> So I'd ask to pick at least patches 1&2 on top of that series, please?

Yeah, that is exactly what I did before reading this message but
after reading your comments on the patches ;-)

> (I am missing the context for *why* you preference is to not do
> exactly this).

I see what I wrote can be misread, especially due to its lack of
",instead", that I want to keep the broken one as-is, with neither
reroll nor fixup.  That is not what I meant.

 - If you choose to squash so that the resulting history after the
   series graduates to 'master' will be simpler to read (due to lack
   of "oops, that was a mistake"), I do not mind a reroll.  

 - On the other hand, as the topic has been in 'next' for some time
   and presumably people tried it in their real daily work when
   needed, keeping what is queued as-is has a value---we have a
   fixed reference point that we can go back to to compare the code
   with and without the fix.

I do not have a strong preference, but if I were asked to choose,
I'd choose the latter.

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling
  2017-08-14 19:37     ` Junio C Hamano
@ 2017-08-14 19:51       ` Stefan Beller
  2017-08-15 17:07         ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Beller @ 2017-08-14 19:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git@vger.kernel.org

On Mon, Aug 14, 2017 at 12:37 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> On Fri, Aug 11, 2017 at 5:39 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> ...
>>> My preference however is to keep sb/diff-color-move topic as-is
>>> without replacing and fixing it with incremental updates like these
>>> patches.
>>
>> I would have hoped to not need to reroll that topic.
>> Though I do find patches 1&2 valuable either on top or squashed
>> into "[PATCH] diff.c: color moved lines differently" and
>> "[PATCH] diff.c: color moved lines differently, plain mode"
>> respectively.
>>
>> So I'd ask to pick at least patches 1&2 on top of that series, please?
>
> Yeah, that is exactly what I did before reading this message but
> after reading your comments on the patches ;-)
>
>> (I am missing the context for *why* you preference is to not do
>> exactly this).
>
> I see what I wrote can be misread, especially due to its lack of
> ",instead", that I want to keep the broken one as-is, with neither
> reroll nor fixup.  That is not what I meant.
>
>  - If you choose to squash so that the resulting history after the
>    series graduates to 'master' will be simpler to read (due to lack
>    of "oops, that was a mistake"), I do not mind a reroll.
>
>  - On the other hand, as the topic has been in 'next' for some time
>    and presumably people tried it in their real daily work when
>    needed, keeping what is queued as-is has a value---we have a
>    fixed reference point that we can go back to to compare the code
>    with and without the fix.
>
> I do not have a strong preference, but if I were asked to choose,
> I'd choose the latter.

We'll go with the latter then. Thanks!

Other reasons for the latter that I want to add:
* The patches are written 2 month apart, which may indicate that
  there was real usage and hence fixes with a more substantiated
  understanding of the new feature.
* We should not strive for "perfect" history IMHO. That is because
  commit messages provide a lot of reasoning and add a lot of value
  for understanding the code. If I were to squash and reroll, I would
  need to make sure these points are addressed in the commit
  message to have a result that is equally good.
  The history only needs to be "good-enough", which we defined to
  "bisectable on all platforms that we care about", fixups/bugfixes
  are like the cherry on the cake, it draw attention on its own.
  Not a bad thing IMHO.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling
  2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan
                   ` (3 preceding siblings ...)
  2017-08-12  0:39 ` [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Junio C Hamano
@ 2017-08-14 21:31 ` Jonathan Tan
  2017-08-14 21:31 ` [PATCH v2 1/3] diff: avoid redundantly clearing a flag Jonathan Tan
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Jonathan Tan @ 2017-08-14 21:31 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, sbeller, gitster

These patches are on sb/diff-color-move.

Patches 1 and 2 are unchanged.

It was pointed out to me that the documentation for
"--color-moved=zebra" is ambiguous, but could plausibly describe the
current behavior. I still think that the current behavior is confusing,
so I have retained patch 3, but have updated the commit message and
documentation to describe this situation. I also proceeded with updating
the test mentioned in the previous cover letter to produce the expected
result (describing the modification in the commit message).

(If we do decide that the current behavior is correct, I can send
another patch to update the documentation to be less ambiguous and maybe
update the code to not use a name like MIN_BLOCK_LENGTH, as it is the
number of adjacent moved lines that we are checking, not the length of a
block.)

Jonathan Tan (3):
  diff: avoid redundantly clearing a flag
  diff: respect MIN_BLOCK_LENGTH for last block
  diff: check MIN_BLOCK_LENGTH at start of new block

 Documentation/diff-options.txt |   6 +--
 diff.c                         |  35 ++++++++++----
 t/t4015-diff-whitespace.sh     | 106 +++++++++++++++++++++++++++++++++++------
 3 files changed, 122 insertions(+), 25 deletions(-)

-- 
2.14.1.480.gb18f417b89-goog


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 1/3] diff: avoid redundantly clearing a flag
  2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan
                   ` (4 preceding siblings ...)
  2017-08-14 21:31 ` [PATCH v2 " Jonathan Tan
@ 2017-08-14 21:31 ` Jonathan Tan
  2017-08-14 21:31 ` [PATCH v2 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Jonathan Tan @ 2017-08-14 21:31 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, sbeller, gitster

No code in diff.c sets DIFF_SYMBOL_MOVED_LINE except in
mark_color_as_moved(), so it is redundant to clear it for the current
line. Therefore, clear it only for previous lines.

This makes a refactoring in a subsequent patch easier.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 4af73a7e0..23311f9c0 100644
--- a/diff.c
+++ b/diff.c
@@ -898,7 +898,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		if (!match) {
 			if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH &&
 			    o->color_moved != COLOR_MOVED_PLAIN) {
-				for (i = 0; i < block_length + 1; i++) {
+				for (i = 1; i < block_length + 1; i++) {
 					l = &o->emitted_symbols->buf[n - i];
 					l->flags &= ~DIFF_SYMBOL_MOVED_LINE;
 				}
-- 
2.14.1.480.gb18f417b89-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 2/3] diff: respect MIN_BLOCK_LENGTH for last block
  2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan
                   ` (5 preceding siblings ...)
  2017-08-14 21:31 ` [PATCH v2 1/3] diff: avoid redundantly clearing a flag Jonathan Tan
@ 2017-08-14 21:31 ` Jonathan Tan
  2017-08-14 21:31 ` [PATCH v2 3/3] diff: check MIN_BLOCK_LENGTH at start of new block Jonathan Tan
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Jonathan Tan @ 2017-08-14 21:31 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, sbeller, gitster

Currently, MIN_BLOCK_LENGTH is only checked when diff encounters a line
that does not belong to the current block. In particular, this means
that MIN_BLOCK_LENGTH is not checked after all lines are encountered.

Perform that check.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 diff.c                     | 29 ++++++++++++++++++++++-------
 t/t4015-diff-whitespace.sh | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 23311f9c0..f598d8a3a 100644
--- a/diff.c
+++ b/diff.c
@@ -861,6 +861,26 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 	return rp + 1;
 }
 
+/*
+ * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing.
+ *
+ * Otherwise, if the last block has fewer lines than
+ * COLOR_MOVED_MIN_BLOCK_LENGTH, unset DIFF_SYMBOL_MOVED_LINE on all lines in
+ * that block.
+ *
+ * The last block consists of the (n - block_length)'th line up to but not
+ * including the nth line.
+ */
+static void adjust_last_block(struct diff_options *o, int n, int block_length)
+{
+	int i;
+	if (block_length >= COLOR_MOVED_MIN_BLOCK_LENGTH ||
+	    o->color_moved == COLOR_MOVED_PLAIN)
+		return;
+	for (i = 1; i < block_length + 1; i++)
+		o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE;
+}
+
 /* Find blocks of moved code, delegate actual coloring decision to helper */
 static void mark_color_as_moved(struct diff_options *o,
 				struct hashmap *add_lines,
@@ -896,13 +916,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		}
 
 		if (!match) {
-			if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH &&
-			    o->color_moved != COLOR_MOVED_PLAIN) {
-				for (i = 1; i < block_length + 1; i++) {
-					l = &o->emitted_symbols->buf[n - i];
-					l->flags &= ~DIFF_SYMBOL_MOVED_LINE;
-				}
-			}
+			adjust_last_block(o, n, block_length);
 			pmb_nr = 0;
 			block_length = 0;
 			continue;
@@ -944,6 +958,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		if (flipped_block)
 			l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 	}
+	adjust_last_block(o, n, block_length);
 
 	free(pmb);
 }
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index c3b697411..6f7758e5c 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1382,6 +1382,41 @@ EOF
 	test_cmp expected actual
 '
 
+test_expect_success '--color-moved block at end of diff output respects MIN_BLOCK_LENGTH' '
+	git reset --hard &&
+	>bar &&
+	cat <<-\EOF >foo &&
+	irrelevant_line
+	line1
+	EOF
+	git add foo bar &&
+	git commit -m x &&
+
+	cat <<-\EOF >bar &&
+	line1
+	EOF
+	cat <<-\EOF >foo &&
+	irrelevant_line
+	EOF
+
+	git diff HEAD --color-moved=zebra --no-renames | grep -v "index" | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/bar b/bar<RESET>
+	<BOLD>--- a/bar<RESET>
+	<BOLD>+++ b/bar<RESET>
+	<CYAN>@@ -0,0 +1 @@<RESET>
+	<GREEN>+<RESET><GREEN>line1<RESET>
+	<BOLD>diff --git a/foo b/foo<RESET>
+	<BOLD>--- a/foo<RESET>
+	<BOLD>+++ b/foo<RESET>
+	<CYAN>@@ -1,2 +1 @@<RESET>
+	 irrelevant_line<RESET>
+	<RED>-line1<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_expect_success 'move detection with submodules' '
 	test_create_repo bananas &&
 	echo ripe >bananas/recipe &&
-- 
2.14.1.480.gb18f417b89-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 3/3] diff: check MIN_BLOCK_LENGTH at start of new block
  2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan
                   ` (6 preceding siblings ...)
  2017-08-14 21:31 ` [PATCH v2 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan
@ 2017-08-14 21:31 ` Jonathan Tan
  2017-08-14 22:46   ` Stefan Beller
  2017-08-14 23:57 ` [PATCH v3 0/3] "diff --color-moved" with different heuristic Jonathan Tan
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: Jonathan Tan @ 2017-08-14 21:31 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, sbeller, gitster

The existing documentation states "If there are fewer than 3 adjacent
moved lines, they are not marked up as moved", which is ambiguous as to
whether "adjacent moved lines" must be adjacent both at the source and
at the destination, or be adjacent merely at the source or merely at the
destination. The behavior of the current code takes the latter
interpretation, but the behavior of blocks being conceptually painted as
blocks and then "unpainted" as lines is confusing to me.

Therefore, clarify the ambiguity in the documentation in the stricter
direction - a block is completely painted or not at all - and update the
code accordingly.

This requires a change in the test "detect malicious moved code, inside
file" in that the malicious change is now marked without the move
colors (because the blocks involved are too small), contrasting with
the subsequent test where the non-malicious change is marked with move
colors.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/diff-options.txt |  6 ++--
 diff.c                         |  6 +++-
 t/t4015-diff-whitespace.sh     | 71 +++++++++++++++++++++++++++++++++---------
 3 files changed, 65 insertions(+), 18 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index bc52bd0b9..1ee3ca3f6 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -257,10 +257,10 @@ zebra::
 	Blocks of moved code are detected greedily. The detected blocks are
 	painted using either the 'color.diff.{old,new}Moved' color or
 	'color.diff.{old,new}MovedAlternative'. The change between
-	the two colors indicates that a new block was detected. If there
-	are fewer than 3 adjacent moved lines, they are not marked up
+	the two colors indicates that a new block was detected. If a block
+	has fewer than 3 adjacent moved lines, it is not marked up
 	as moved, but the regular colors 'color.diff.{old,new}' will be
-	used.
+	used instead.
 dimmed_zebra::
 	Similar to 'zebra', but additional dimming of uninteresting parts
 	of moved code is performed. The bordering lines of two adjacent
diff --git a/diff.c b/diff.c
index f598d8a3a..20b784736 100644
--- a/diff.c
+++ b/diff.c
@@ -923,7 +923,6 @@ static void mark_color_as_moved(struct diff_options *o,
 		}
 
 		l->flags |= DIFF_SYMBOL_MOVED_LINE;
-		block_length++;
 
 		if (o->color_moved == COLOR_MOVED_PLAIN)
 			continue;
@@ -953,8 +952,13 @@ static void mark_color_as_moved(struct diff_options *o,
 			}
 
 			flipped_block = (flipped_block + 1) % 2;
+
+			adjust_last_block(o, n, block_length);
+			block_length = 0;
 		}
 
+		block_length++;
+
 		if (flipped_block)
 			l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 	}
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 6f7758e5c..d0613a189 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1097,13 +1097,13 @@ test_expect_success 'detect malicious moved code, inside file' '
 	 printf("World\n");<RESET>
 	 }<RESET>
 	 <RESET>
-	<BRED>-int secure_foo(struct user *u)<RESET>
-	<BRED>-{<RESET>
-	<BLUE>-if (!u->is_allowed_foo)<RESET>
-	<BLUE>-return;<RESET>
-	<BRED>-foo(u);<RESET>
-	<BLUE>-}<RESET>
-	<BLUE>-<RESET>
+	<RED>-int secure_foo(struct user *u)<RESET>
+	<RED>-{<RESET>
+	<RED>-if (!u->is_allowed_foo)<RESET>
+	<RED>-return;<RESET>
+	<RED>-foo(u);<RESET>
+	<RED>-}<RESET>
+	<RED>-<RESET>
 	 int main()<RESET>
 	 {<RESET>
 	 foo();<RESET>
@@ -1115,13 +1115,13 @@ test_expect_success 'detect malicious moved code, inside file' '
 	 printf("Hello World, but different\n");<RESET>
 	 }<RESET>
 	 <RESET>
-	<BGREEN>+<RESET><BGREEN>int secure_foo(struct user *u)<RESET>
-	<BGREEN>+<RESET><BGREEN>{<RESET>
-	<YELLOW>+<RESET><YELLOW>foo(u);<RESET>
-	<BGREEN>+<RESET><BGREEN>if (!u->is_allowed_foo)<RESET>
-	<BGREEN>+<RESET><BGREEN>return;<RESET>
-	<YELLOW>+<RESET><YELLOW>}<RESET>
-	<YELLOW>+<RESET>
+	<GREEN>+<RESET><GREEN>int secure_foo(struct user *u)<RESET>
+	<GREEN>+<RESET><GREEN>{<RESET>
+	<GREEN>+<RESET><GREEN>foo(u);<RESET>
+	<GREEN>+<RESET><GREEN>if (!u->is_allowed_foo)<RESET>
+	<GREEN>+<RESET><GREEN>return;<RESET>
+	<GREEN>+<RESET><GREEN>}<RESET>
+	<GREEN>+<RESET>
 	 int another_function()<RESET>
 	 {<RESET>
 	 bar();<RESET>
@@ -1417,6 +1417,49 @@ test_expect_success '--color-moved block at end of diff output respects MIN_BLOC
 	test_cmp expected actual
 '
 
+test_expect_success '--color-moved treats adjacent blocks as separate for MIN_BLOCK_LENGTH' '
+	git reset --hard &&
+	cat <<-\EOF >foo &&
+	line1
+	irrelevant_line
+	line2
+	line3
+	EOF
+	>bar &&
+	git add foo bar &&
+	git commit -m x &&
+
+	cat <<-\EOF >foo &&
+	irrelevant_line
+	EOF
+	cat <<-\EOF >bar &&
+	line2
+	line3
+	line1
+	EOF
+
+	git diff HEAD --color-moved=zebra --no-renames | grep -v "index" | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/bar b/bar<RESET>
+	<BOLD>--- a/bar<RESET>
+	<BOLD>+++ b/bar<RESET>
+	<CYAN>@@ -0,0 +1,3 @@<RESET>
+	<GREEN>+<RESET><GREEN>line2<RESET>
+	<GREEN>+<RESET><GREEN>line3<RESET>
+	<GREEN>+<RESET><GREEN>line1<RESET>
+	<BOLD>diff --git a/foo b/foo<RESET>
+	<BOLD>--- a/foo<RESET>
+	<BOLD>+++ b/foo<RESET>
+	<CYAN>@@ -1,4 +1 @@<RESET>
+	<RED>-line1<RESET>
+	 irrelevant_line<RESET>
+	<RED>-line2<RESET>
+	<RED>-line3<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_expect_success 'move detection with submodules' '
 	test_create_repo bananas &&
 	echo ripe >bananas/recipe &&
-- 
2.14.1.480.gb18f417b89-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 3/3] diff: check MIN_BLOCK_LENGTH at start of new block
  2017-08-14 21:31 ` [PATCH v2 3/3] diff: check MIN_BLOCK_LENGTH at start of new block Jonathan Tan
@ 2017-08-14 22:46   ` Stefan Beller
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Beller @ 2017-08-14 22:46 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git@vger.kernel.org, Junio C Hamano

On Mon, Aug 14, 2017 at 2:31 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> The existing documentation states "If there are fewer than 3 adjacent
> moved lines, they are not marked up as moved", which is ambiguous as to
> whether "adjacent moved lines" must be adjacent both at the source and
> at the destination, or be adjacent merely at the source or merely at the
> destination. The behavior of the current code takes the latter
> interpretation, but the behavior of blocks being conceptually painted as
> blocks and then "unpainted" as lines is confusing to me.
>
> Therefore, clarify the ambiguity in the documentation in the stricter
> direction - a block is completely painted or not at all - and update the
> code accordingly.
>
> This requires a change in the test "detect malicious moved code, inside
> file" in that the malicious change is now marked without the move
> colors (because the blocks involved are too small), contrasting with
> the subsequent test where the non-malicious change is marked with move
> colors.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>

I wonder if these changes ought to be a new mode
(C.f. "mountain zebra" and "imperial zebra" for slight
changes in coloring ;) or if we can settle on one true way.

The 3 lines heuristic is a bad heuristic IMHO (it works reasonable well
for little effort but the fact that we discuss this patch makes it a bad
heuristic as we discuss corner cases that are not relevant. The heuristic
originally wanted to filter out stray single braces that were "moved",
it did not want to suppress small original moved pieces of code),
which this covers up a bit.

Maybe we'll cook this in next for a while to see how people
react to it?

> ---
>  Documentation/diff-options.txt |  6 ++--
>  diff.c                         |  6 +++-
>  t/t4015-diff-whitespace.sh     | 71 +++++++++++++++++++++++++++++++++---------
>  3 files changed, 65 insertions(+), 18 deletions(-)
>
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index bc52bd0b9..1ee3ca3f6 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -257,10 +257,10 @@ zebra::
>         Blocks of moved code are detected greedily. The detected blocks are
>         painted using either the 'color.diff.{old,new}Moved' color or
>         'color.diff.{old,new}MovedAlternative'. The change between
> -       the two colors indicates that a new block was detected. If there
> -       are fewer than 3 adjacent moved lines, they are not marked up
> +       the two colors indicates that a new block was detected. If a block
> +       has fewer than 3 adjacent moved lines, it is not marked up
>         as moved, but the regular colors 'color.diff.{old,new}' will be
> -       used.
> +       used instead.

Thanks for clarifying the docs.

> diff --git a/diff.c b/diff.c
> index f598d8a3a..20b784736 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -923,7 +923,6 @@ static void mark_color_as_moved(struct diff_options *o,
>                 }
>
>                 l->flags |= DIFF_SYMBOL_MOVED_LINE;
> -               block_length++;
>
>                 if (o->color_moved == COLOR_MOVED_PLAIN)
>                         continue;
> @@ -953,8 +952,13 @@ static void mark_color_as_moved(struct diff_options *o,
>                         }
>
>                         flipped_block = (flipped_block + 1) % 2;
> +
> +                       adjust_last_block(o, n, block_length);
> +                       block_length = 0;
>                 }
>
> +               block_length++;
> +
>                 if (flipped_block)
>                         l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
>         }

This changes the algorithm in a non-obvious way.
When the min-length heuristic is strictly bound to each block,
the function can be simplified more than adding on these tweaks,

1) remove variable block_length, needing to count in the adjust function

2) assign DIFF_SYMBOL_MOVED_LINE either in
    COLOR_MOVED_PLAIN case (and continue) or later (where
    block_length is increased in this patch)

No need to do these, just as thoughts on how to reduce complexity.

> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index 6f7758e5c..d0613a189 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -1097,13 +1097,13 @@ test_expect_success 'detect malicious moved code, inside file' '

This test would not 'detect malicious moved code, inside file'  any more,
I think instead we'd rather want to have a more realistic test case,
which has more lines in it? (This test is about the block detection
not about the omission of short blocks, which was an after thought)

> +test_expect_success '--color-moved treats adjacent blocks as separate for MIN_BLOCK_LENGTH' '

Thanks for providing a test here! For testing MIN_BLOCK_LENGTH
for each block I would have imagined the tests would have a block of
length (1,)2,3(,4) lines and then we'd see that the blocks are
highlighted or not.
This only has length=1 blocks?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v3 0/3] "diff --color-moved" with different heuristic
  2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan
                   ` (7 preceding siblings ...)
  2017-08-14 21:31 ` [PATCH v2 3/3] diff: check MIN_BLOCK_LENGTH at start of new block Jonathan Tan
@ 2017-08-14 23:57 ` Jonathan Tan
  2017-08-14 23:57 ` [PATCH v3 1/3] diff: avoid redundantly clearing a flag Jonathan Tan
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Jonathan Tan @ 2017-08-14 23:57 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, sbeller, gitster

These patches are on sb/diff-color-move.

Patches 1 and 2 are unchanged.

If we're planning to cook a heuristic, we might as well try a better
one. What do you think of this? A heuristic of number of non-whitespace
characters, applied at the block level, and not dependent on the block's
neighbors.

As for the test, good point about testing the boundary conditions - I
have included another test that does that.

Jonathan Tan (3):
  diff: avoid redundantly clearing a flag
  diff: respect MIN_BLOCK_LENGTH for last block
  diff: define block by number of non-space chars

 Documentation/diff-options.txt |   8 +--
 diff.c                         |  44 +++++++++++---
 diff.h                         |   2 +-
 t/t4015-diff-whitespace.sh     | 129 +++++++++++++++++++++++++++++++++++++++--
 4 files changed, 163 insertions(+), 20 deletions(-)

-- 
2.14.1.480.gb18f417b89-goog


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v3 1/3] diff: avoid redundantly clearing a flag
  2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan
                   ` (8 preceding siblings ...)
  2017-08-14 23:57 ` [PATCH v3 0/3] "diff --color-moved" with different heuristic Jonathan Tan
@ 2017-08-14 23:57 ` Jonathan Tan
  2017-08-14 23:57 ` [PATCH v3 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Jonathan Tan @ 2017-08-14 23:57 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, sbeller, gitster

No code in diff.c sets DIFF_SYMBOL_MOVED_LINE except in
mark_color_as_moved(), so it is redundant to clear it for the current
line. Therefore, clear it only for previous lines.

This makes a refactoring in a subsequent patch easier.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 4af73a7e0..23311f9c0 100644
--- a/diff.c
+++ b/diff.c
@@ -898,7 +898,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		if (!match) {
 			if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH &&
 			    o->color_moved != COLOR_MOVED_PLAIN) {
-				for (i = 0; i < block_length + 1; i++) {
+				for (i = 1; i < block_length + 1; i++) {
 					l = &o->emitted_symbols->buf[n - i];
 					l->flags &= ~DIFF_SYMBOL_MOVED_LINE;
 				}
-- 
2.14.1.480.gb18f417b89-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 2/3] diff: respect MIN_BLOCK_LENGTH for last block
  2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan
                   ` (9 preceding siblings ...)
  2017-08-14 23:57 ` [PATCH v3 1/3] diff: avoid redundantly clearing a flag Jonathan Tan
@ 2017-08-14 23:57 ` Jonathan Tan
  2017-08-14 23:57 ` [PATCH v3 3/3] diff: define block by number of non-space chars Jonathan Tan
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 30+ messages in thread
From: Jonathan Tan @ 2017-08-14 23:57 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, sbeller, gitster

Currently, MIN_BLOCK_LENGTH is only checked when diff encounters a line
that does not belong to the current block. In particular, this means
that MIN_BLOCK_LENGTH is not checked after all lines are encountered.

Perform that check.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 diff.c                     | 29 ++++++++++++++++++++++-------
 t/t4015-diff-whitespace.sh | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 23311f9c0..f598d8a3a 100644
--- a/diff.c
+++ b/diff.c
@@ -861,6 +861,26 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 	return rp + 1;
 }
 
+/*
+ * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing.
+ *
+ * Otherwise, if the last block has fewer lines than
+ * COLOR_MOVED_MIN_BLOCK_LENGTH, unset DIFF_SYMBOL_MOVED_LINE on all lines in
+ * that block.
+ *
+ * The last block consists of the (n - block_length)'th line up to but not
+ * including the nth line.
+ */
+static void adjust_last_block(struct diff_options *o, int n, int block_length)
+{
+	int i;
+	if (block_length >= COLOR_MOVED_MIN_BLOCK_LENGTH ||
+	    o->color_moved == COLOR_MOVED_PLAIN)
+		return;
+	for (i = 1; i < block_length + 1; i++)
+		o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE;
+}
+
 /* Find blocks of moved code, delegate actual coloring decision to helper */
 static void mark_color_as_moved(struct diff_options *o,
 				struct hashmap *add_lines,
@@ -896,13 +916,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		}
 
 		if (!match) {
-			if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH &&
-			    o->color_moved != COLOR_MOVED_PLAIN) {
-				for (i = 1; i < block_length + 1; i++) {
-					l = &o->emitted_symbols->buf[n - i];
-					l->flags &= ~DIFF_SYMBOL_MOVED_LINE;
-				}
-			}
+			adjust_last_block(o, n, block_length);
 			pmb_nr = 0;
 			block_length = 0;
 			continue;
@@ -944,6 +958,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		if (flipped_block)
 			l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 	}
+	adjust_last_block(o, n, block_length);
 
 	free(pmb);
 }
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index c3b697411..6f7758e5c 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1382,6 +1382,41 @@ EOF
 	test_cmp expected actual
 '
 
+test_expect_success '--color-moved block at end of diff output respects MIN_BLOCK_LENGTH' '
+	git reset --hard &&
+	>bar &&
+	cat <<-\EOF >foo &&
+	irrelevant_line
+	line1
+	EOF
+	git add foo bar &&
+	git commit -m x &&
+
+	cat <<-\EOF >bar &&
+	line1
+	EOF
+	cat <<-\EOF >foo &&
+	irrelevant_line
+	EOF
+
+	git diff HEAD --color-moved=zebra --no-renames | grep -v "index" | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/bar b/bar<RESET>
+	<BOLD>--- a/bar<RESET>
+	<BOLD>+++ b/bar<RESET>
+	<CYAN>@@ -0,0 +1 @@<RESET>
+	<GREEN>+<RESET><GREEN>line1<RESET>
+	<BOLD>diff --git a/foo b/foo<RESET>
+	<BOLD>--- a/foo<RESET>
+	<BOLD>+++ b/foo<RESET>
+	<CYAN>@@ -1,2 +1 @@<RESET>
+	 irrelevant_line<RESET>
+	<RED>-line1<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_expect_success 'move detection with submodules' '
 	test_create_repo bananas &&
 	echo ripe >bananas/recipe &&
-- 
2.14.1.480.gb18f417b89-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 3/3] diff: define block by number of non-space chars
  2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan
                   ` (10 preceding siblings ...)
  2017-08-14 23:57 ` [PATCH v3 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan
@ 2017-08-14 23:57 ` Jonathan Tan
  2017-08-15  2:29   ` Stefan Beller
  2017-08-15 19:54   ` Junio C Hamano
  2017-08-16  1:27 ` [PATCH v4 0/3] "diff --color-moved" with yet another heuristic Jonathan Tan
                   ` (3 subsequent siblings)
  15 siblings, 2 replies; 30+ messages in thread
From: Jonathan Tan @ 2017-08-14 23:57 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, sbeller, gitster

The existing behavior of diff --color-moved=zebra does not define the
minimum size of a block at all, instead relying on a heuristic applied
later to filter out sets of adjacent moved lines that are shorter than 3
lines long. This can be confusing, because a block could thus be colored
as moved at the source but not at the destination (or vice versa),
depending on its neighbors.

Instead, teach diff that the minimum size of a block is 10
non-whitespace characters. This allows diff to still exclude
uninteresting lines appearing on their own (such as those solely
consisting of one or a few closing braces), as was the intention of the
adjacent-moved-line heuristic.

This requires a change in the test "detect malicious moved code, inside
file" in that some of its lines are no longer considered to be part of a
block, because they are too short. The malicious move can still be
observed, however.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/diff-options.txt |  8 ++--
 diff.c                         | 27 +++++++++---
 diff.h                         |  2 +-
 t/t4015-diff-whitespace.sh     | 96 +++++++++++++++++++++++++++++++++++++++---
 4 files changed, 113 insertions(+), 20 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index bc52bd0b9..d0bdc849f 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -254,13 +254,11 @@ plain::
 	moved line, but it is not very useful in a review to determine
 	if a block of code was moved without permutation.
 zebra::
-	Blocks of moved code are detected greedily. The detected blocks are
+	Blocks of moved text of at least 10 non-whitespace characters
+	are detected greedily. The detected blocks are
 	painted using either the 'color.diff.{old,new}Moved' color or
 	'color.diff.{old,new}MovedAlternative'. The change between
-	the two colors indicates that a new block was detected. If there
-	are fewer than 3 adjacent moved lines, they are not marked up
-	as moved, but the regular colors 'color.diff.{old,new}' will be
-	used.
+	the two colors indicates that a new block was detected.
 dimmed_zebra::
 	Similar to 'zebra', but additional dimming of uninteresting parts
 	of moved code is performed. The bordering lines of two adjacent
diff --git a/diff.c b/diff.c
index f598d8a3a..305ce4126 100644
--- a/diff.c
+++ b/diff.c
@@ -864,19 +864,28 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 /*
  * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing.
  *
- * Otherwise, if the last block has fewer lines than
- * COLOR_MOVED_MIN_BLOCK_LENGTH, unset DIFF_SYMBOL_MOVED_LINE on all lines in
- * that block.
+ * Otherwise, if the last block has fewer non-space characters than
+ * COLOR_MOVED_MIN_NON_SPACE_COUNT, unset DIFF_SYMBOL_MOVED_LINE on all lines
+ * in that block.
  *
  * The last block consists of the (n - block_length)'th line up to but not
  * including the nth line.
  */
 static void adjust_last_block(struct diff_options *o, int n, int block_length)
 {
-	int i;
-	if (block_length >= COLOR_MOVED_MIN_BLOCK_LENGTH ||
-	    o->color_moved == COLOR_MOVED_PLAIN)
+	int i, non_space_count = 0;
+	if (o->color_moved == COLOR_MOVED_PLAIN)
 		return;
+	for (i = 1; i < block_length + 1; i++) {
+		const char *c = o->emitted_symbols->buf[n - i].line;
+		for (; *c; c++) {
+			if (isspace(*c))
+				continue;
+			non_space_count++;
+			if (non_space_count >= COLOR_MOVED_MIN_NON_SPACE_COUNT)
+				return;
+		}
+	}
 	for (i = 1; i < block_length + 1; i++)
 		o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE;
 }
@@ -923,7 +932,6 @@ static void mark_color_as_moved(struct diff_options *o,
 		}
 
 		l->flags |= DIFF_SYMBOL_MOVED_LINE;
-		block_length++;
 
 		if (o->color_moved == COLOR_MOVED_PLAIN)
 			continue;
@@ -953,8 +961,13 @@ static void mark_color_as_moved(struct diff_options *o,
 			}
 
 			flipped_block = (flipped_block + 1) % 2;
+
+			adjust_last_block(o, n, block_length);
+			block_length = 0;
 		}
 
+		block_length++;
+
 		if (flipped_block)
 			l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 	}
diff --git a/diff.h b/diff.h
index 5755f465d..9e2fece5b 100644
--- a/diff.h
+++ b/diff.h
@@ -195,7 +195,7 @@ struct diff_options {
 		COLOR_MOVED_ZEBRA_DIM = 3,
 	} color_moved;
 	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
-	#define COLOR_MOVED_MIN_BLOCK_LENGTH 3
+	#define COLOR_MOVED_MIN_NON_SPACE_COUNT 10
 };
 
 void diff_emit_submodule_del(struct diff_options *o, const char *line);
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 6f7758e5c..d8e7b77b9 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1101,9 +1101,9 @@ test_expect_success 'detect malicious moved code, inside file' '
 	<BRED>-{<RESET>
 	<BLUE>-if (!u->is_allowed_foo)<RESET>
 	<BLUE>-return;<RESET>
-	<BRED>-foo(u);<RESET>
-	<BLUE>-}<RESET>
-	<BLUE>-<RESET>
+	<RED>-foo(u);<RESET>
+	<RED>-}<RESET>
+	<RED>-<RESET>
 	 int main()<RESET>
 	 {<RESET>
 	 foo();<RESET>
@@ -1117,11 +1117,11 @@ test_expect_success 'detect malicious moved code, inside file' '
 	 <RESET>
 	<BGREEN>+<RESET><BGREEN>int secure_foo(struct user *u)<RESET>
 	<BGREEN>+<RESET><BGREEN>{<RESET>
-	<YELLOW>+<RESET><YELLOW>foo(u);<RESET>
+	<GREEN>+<RESET><GREEN>foo(u);<RESET>
 	<BGREEN>+<RESET><BGREEN>if (!u->is_allowed_foo)<RESET>
 	<BGREEN>+<RESET><BGREEN>return;<RESET>
-	<YELLOW>+<RESET><YELLOW>}<RESET>
-	<YELLOW>+<RESET>
+	<GREEN>+<RESET><GREEN>}<RESET>
+	<GREEN>+<RESET>
 	 int another_function()<RESET>
 	 {<RESET>
 	 bar();<RESET>
@@ -1382,7 +1382,7 @@ EOF
 	test_cmp expected actual
 '
 
-test_expect_success '--color-moved block at end of diff output respects MIN_BLOCK_LENGTH' '
+test_expect_success '--color-moved block at end of diff output respects MIN_NON_SPACE_COUNT' '
 	git reset --hard &&
 	>bar &&
 	cat <<-\EOF >foo &&
@@ -1417,6 +1417,88 @@ test_expect_success '--color-moved block at end of diff output respects MIN_BLOC
 	test_cmp expected actual
 '
 
+test_expect_success '--color-moved respects MIN_NON_SPACE_COUNT' '
+	git reset --hard &&
+	cat <<-\EOF >foo &&
+	nine chars
+	irrelevant_line
+	ten chars__
+	EOF
+	>bar &&
+	git add foo bar &&
+	git commit -m x &&
+
+	cat <<-\EOF >foo &&
+	irrelevant_line
+	EOF
+	cat <<-\EOF >bar &&
+	ten chars__
+	nine chars
+	EOF
+
+	git diff HEAD --color-moved=zebra --no-renames | grep -v "index" | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/bar b/bar<RESET>
+	<BOLD>--- a/bar<RESET>
+	<BOLD>+++ b/bar<RESET>
+	<CYAN>@@ -0,0 +1,2 @@<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>ten chars__<RESET>
+	<GREEN>+<RESET><GREEN>nine chars<RESET>
+	<BOLD>diff --git a/foo b/foo<RESET>
+	<BOLD>--- a/foo<RESET>
+	<BOLD>+++ b/foo<RESET>
+	<CYAN>@@ -1,3 +1 @@<RESET>
+	<RED>-nine chars<RESET>
+	 irrelevant_line<RESET>
+	<BOLD;MAGENTA>-ten chars__<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
+test_expect_success '--color-moved treats adjacent blocks as separate for MIN_NON_SPACE_COUNT' '
+	git reset --hard &&
+	cat <<-\EOF >foo &&
+	lin1
+	irrelevant_line
+	lin2
+	lin3
+	EOF
+	>bar &&
+	git add foo bar &&
+	git commit -m x &&
+
+	cat <<-\EOF >foo &&
+	irrelevant_line
+	EOF
+	cat <<-\EOF >bar &&
+	lin2
+	lin3
+	lin1
+	EOF
+
+	git diff HEAD --color-moved=zebra --no-renames | grep -v "index" | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/bar b/bar<RESET>
+	<BOLD>--- a/bar<RESET>
+	<BOLD>+++ b/bar<RESET>
+	<CYAN>@@ -0,0 +1,3 @@<RESET>
+	<GREEN>+<RESET><GREEN>lin2<RESET>
+	<GREEN>+<RESET><GREEN>lin3<RESET>
+	<GREEN>+<RESET><GREEN>lin1<RESET>
+	<BOLD>diff --git a/foo b/foo<RESET>
+	<BOLD>--- a/foo<RESET>
+	<BOLD>+++ b/foo<RESET>
+	<CYAN>@@ -1,4 +1 @@<RESET>
+	<RED>-lin1<RESET>
+	 irrelevant_line<RESET>
+	<RED>-lin2<RESET>
+	<RED>-lin3<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_expect_success 'move detection with submodules' '
 	test_create_repo bananas &&
 	echo ripe >bananas/recipe &&
-- 
2.14.1.480.gb18f417b89-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 3/3] diff: define block by number of non-space chars
  2017-08-14 23:57 ` [PATCH v3 3/3] diff: define block by number of non-space chars Jonathan Tan
@ 2017-08-15  2:29   ` Stefan Beller
  2017-08-15 19:54   ` Junio C Hamano
  1 sibling, 0 replies; 30+ messages in thread
From: Stefan Beller @ 2017-08-15  2:29 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git@vger.kernel.org, Junio C Hamano

On Mon, Aug 14, 2017 at 4:57 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> The existing behavior of diff --color-moved=zebra does not define the
> minimum size of a block at all, instead relying on a heuristic applied
> later to filter out sets of adjacent moved lines that are shorter than 3
> lines long. This can be confusing, because a block could thus be colored
> as moved at the source but not at the destination (or vice versa),
> depending on its neighbors.
>
> Instead, teach diff that the minimum size of a block is 10
> non-whitespace characters. This allows diff to still exclude
> uninteresting lines appearing on their own (such as those solely
> consisting of one or a few closing braces), as was the intention of the
> adjacent-moved-line heuristic.

After some thought, I really like this heuristic, however allow me
a moment to bikeshed 10 as a number here.

One could think that 10 equals roughly 3 lines a 3 characters and
in C based languages the shortest meaningful lines have more than
3 characters ("i++;", "a();", "int i;" have 4 or 5 each), but I would still
think that 10 is too much, as we'd want to detect the closing braces
in their own lines.

>  dimmed_zebra::
>         Similar to 'zebra', but additional dimming of uninteresting parts
>         of moved code is performed. The bordering lines of two adjacent
> diff --git a/diff.c b/diff.c
> index f598d8a3a..305ce4126 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -864,19 +864,28 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb,
>  /*
>   * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing.
>   *
> - * Otherwise, if the last block has fewer lines than
> - * COLOR_MOVED_MIN_BLOCK_LENGTH, unset DIFF_SYMBOL_MOVED_LINE on all lines in
> - * that block.
> + * Otherwise, if the last block has fewer non-space characters than
> + * COLOR_MOVED_MIN_NON_SPACE_COUNT, unset DIFF_SYMBOL_MOVED_LINE on all lines
> + * in that block.
>   *
>   * The last block consists of the (n - block_length)'th line up to but not
>   * including the nth line.
>   */
>  static void adjust_last_block(struct diff_options *o, int n, int block_length)
>  {
> -       int i;
> -       if (block_length >= COLOR_MOVED_MIN_BLOCK_LENGTH ||
> -           o->color_moved == COLOR_MOVED_PLAIN)
> +       int i, non_space_count = 0;
> +       if (o->color_moved == COLOR_MOVED_PLAIN)
>                 return;
> +       for (i = 1; i < block_length + 1; i++) {
> +               const char *c = o->emitted_symbols->buf[n - i].line;
> +               for (; *c; c++) {
> +                       if (isspace(*c))
> +                               continue;
> +                       non_space_count++;
> +                       if (non_space_count >= COLOR_MOVED_MIN_NON_SPACE_COUNT)
> +                               return;

When we do this counting, we could count the lines ourselves here as well.
`n-block_count` should be equal to the line that has a different
(flags & (DIFF_SYMBOL_MOVED_LINE | DIFF_SYMBOL_MOVED_LINE_ALT))
pattern than those before. (although we'd also have to check for i > 0, too)
Your choice.

> +               }
> +       }
>         for (i = 1; i < block_length + 1; i++)
>                 o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE;
>  }
> @@ -923,7 +932,6 @@ static void mark_color_as_moved(struct diff_options *o,
>                 }
>
>                 l->flags |= DIFF_SYMBOL_MOVED_LINE;
> -               block_length++;
>
>                 if (o->color_moved == COLOR_MOVED_PLAIN)
>                         continue;
> @@ -953,8 +961,13 @@ static void mark_color_as_moved(struct diff_options *o,
>                         }
>
>                         flipped_block = (flipped_block + 1) % 2;
> +
> +                       adjust_last_block(o, n, block_length);
> +                       block_length = 0;
>                 }
>
> +               block_length++;
> +
>                 if (flipped_block)
>                         l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
>         }
> diff --git a/diff.h b/diff.h
> index 5755f465d..9e2fece5b 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -195,7 +195,7 @@ struct diff_options {
>                 COLOR_MOVED_ZEBRA_DIM = 3,
>         } color_moved;
>         #define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
> -       #define COLOR_MOVED_MIN_BLOCK_LENGTH 3
> +       #define COLOR_MOVED_MIN_NON_SPACE_COUNT 10
>  };
>
>  void diff_emit_submodule_del(struct diff_options *o, const char *line);
> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index 6f7758e5c..d8e7b77b9 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -1101,9 +1101,9 @@ test_expect_success 'detect malicious moved code, inside file' '
>         <BRED>-{<RESET>
>         <BLUE>-if (!u->is_allowed_foo)<RESET>
>         <BLUE>-return;<RESET>
> -       <BRED>-foo(u);<RESET>
> -       <BLUE>-}<RESET>
> -       <BLUE>-<RESET>
> +       <RED>-foo(u);<RESET>
> +       <RED>-}<RESET>
> +       <RED>-<RESET>

Here we have 2 blocks, the first has 7 character,
which we may want to detect, the second has only 1 char.

The longest "uninteresting" line in C like languages might
be "\t } else {" which has 6 non-ws characters.

Thinking of other languages (shell "fi" is uninteresting, others are
interesting,
Latex \"custom" all bets are off), I think we may want to go lower and have
COLOR_MOVED_MIN_NON_SPACE_COUNT to be about 6
(~ 2 characters on a 3 line block).

That said this is all bikeshedding, feel free to ignore.

Acked-by: Stefan Beller <sbeller@google.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling
  2017-08-14 19:51       ` Stefan Beller
@ 2017-08-15 17:07         ` Junio C Hamano
  0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2017-08-15 17:07 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jonathan Tan, git@vger.kernel.org

Stefan Beller <sbeller@google.com> writes:

>> I see what I wrote can be misread, especially due to its lack of
>> ",instead", that I want to keep the broken one as-is, with neither
>> reroll nor fixup.  That is not what I meant.
>>
>>  - If you choose to squash so that the resulting history after the
>>    series graduates to 'master' will be simpler to read (due to lack
>>    of "oops, that was a mistake"), I do not mind a reroll.
>>
>>  - On the other hand, as the topic has been in 'next' for some time
>>    and presumably people tried it in their real daily work when
>>    needed, keeping what is queued as-is has a value---we have a
>>    fixed reference point that we can go back to to compare the code
>>    with and without the fix.
>>
>> I do not have a strong preference, but if I were asked to choose,
>> I'd choose the latter.
>
> We'll go with the latter then. Thanks!
>
> Other reasons for the latter that I want to add:

Yup, we are on the same page.  You articulated what I meant in the
"On the other hand" bullet point in a better way.

Even though we generally do not tolerate stupid mistakes and design
errors to clutter the history if they are found early in the review
process while the patches are still in flight, code that have been
"in" for extended period of time and then found it has room for
improvement is a different matter.  There is a reason why we thought
it was good enough initially, and there is a reason why we later
found it needing improvement.  Doing the latter as an incremental
fix-up is a good way to leave records for both in our history.  

And to make that kind of incremental refinement useful, it helps to
keep the history clean from an initial attempt riddled with trivial
issues that are found early in the review is also important.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 3/3] diff: define block by number of non-space chars
  2017-08-14 23:57 ` [PATCH v3 3/3] diff: define block by number of non-space chars Jonathan Tan
  2017-08-15  2:29   ` Stefan Beller
@ 2017-08-15 19:54   ` Junio C Hamano
  2017-08-15 20:06     ` Stefan Beller
  1 sibling, 1 reply; 30+ messages in thread
From: Junio C Hamano @ 2017-08-15 19:54 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, sbeller

Jonathan Tan <jonathantanmy@google.com> writes:

> The existing behavior of diff --color-moved=zebra does not define the
> minimum size of a block at all, instead relying on a heuristic applied
> later to filter out sets of adjacent moved lines that are shorter than 3
> lines long. This can be confusing, because a block could thus be colored
> as moved at the source but not at the destination (or vice versa),
> depending on its neighbors.
>
> Instead, teach diff that the minimum size of a block is 10
> non-whitespace characters. This allows diff to still exclude
> uninteresting lines appearing on their own (such as those solely
> consisting of one or a few closing braces), as was the intention of the
> adjacent-moved-line heuristic.

I recall that there is a logic backed by a similar rationale in
blame.c::blame_entry_score() but over there we count alnum, not
!isspace, to judge if a block has been split into too small a piece
to be significant.  I do not know which one is better, but if there
is no strong reason, perhaps we want to unify the two, so that we
can improve both heuristics at the same time?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 3/3] diff: define block by number of non-space chars
  2017-08-15 19:54   ` Junio C Hamano
@ 2017-08-15 20:06     ` Stefan Beller
  2017-08-15 20:53       ` Junio C Hamano
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Beller @ 2017-08-15 20:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git@vger.kernel.org

On Tue, Aug 15, 2017 at 12:54 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jonathan Tan <jonathantanmy@google.com> writes:
>
>> The existing behavior of diff --color-moved=zebra does not define the
>> minimum size of a block at all, instead relying on a heuristic applied
>> later to filter out sets of adjacent moved lines that are shorter than 3
>> lines long. This can be confusing, because a block could thus be colored
>> as moved at the source but not at the destination (or vice versa),
>> depending on its neighbors.
>>
>> Instead, teach diff that the minimum size of a block is 10
>> non-whitespace characters. This allows diff to still exclude
>> uninteresting lines appearing on their own (such as those solely
>> consisting of one or a few closing braces), as was the intention of the
>> adjacent-moved-line heuristic.
>
> I recall that there is a logic backed by a similar rationale in
> blame.c::blame_entry_score() but over there we count alnum, not
> !isspace, to judge if a block has been split into too small a piece
> to be significant.  I do not know which one is better, but if there
> is no strong reason, perhaps we want to unify the two, so that we
> can improve both heuristics at the same time?

In an ideal world we would use entropy of the diffed characters as
that is a best approximation on how much "interesting" things are
going on in that particular diff.

Computing the entropy is cumbersome, but maybe ok for this
purpose (we're most likely IO bound anyway, specifically when
including the human understanding as the last part of IO).

Reasons that may influence the choice
* Non latin/ASCII characters should also work.
* Do we care about the whitespace esoteric
  programming language?

The function blame_entry_score is documented to approach
this exact problem that we are trying to solve here, so I agree
we should have a common heuristic.

Stefan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 3/3] diff: define block by number of non-space chars
  2017-08-15 20:06     ` Stefan Beller
@ 2017-08-15 20:53       ` Junio C Hamano
  0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2017-08-15 20:53 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jonathan Tan, git@vger.kernel.org

Stefan Beller <sbeller@google.com> writes:

> The function blame_entry_score is documented to approach
> this exact problem that we are trying to solve here, so I agree
> we should have a common heuristic.

While I do not think it is necessary to do such a refactoring within
the scope of this series, we should at least leave a NEEDSWORK
in-code comment next to the new heuristics code to remind those who
revisit the code of this possible small project idea, I would think.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v4 0/3] "diff --color-moved" with yet another heuristic
  2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan
                   ` (11 preceding siblings ...)
  2017-08-14 23:57 ` [PATCH v3 3/3] diff: define block by number of non-space chars Jonathan Tan
@ 2017-08-16  1:27 ` Jonathan Tan
  2017-08-16  5:55   ` Stefan Beller
  2017-08-16  1:27 ` [PATCH v4 1/3] diff: avoid redundantly clearing a flag Jonathan Tan
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 30+ messages in thread
From: Jonathan Tan @ 2017-08-16  1:27 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, sbeller, gitster

These patches are on sb/diff-color-move.

Patches 1 and 2 are unchanged, except for some line wrapping in a test
in patch 2.

This has been updated to use the same alphanumeric heuristic as blame
(20 alnum characters). I tried it out and I thought the results were
reasonable in a patch set that I'm working on (the pack-related function
refactoring one).

As for refactoring blame.c and this file, I'm not sure where best to put
the new function, so I've added a NEEDSWORK for now.

As for detecting block boundaries in adjust_last_block(), I've left it
as-is for now. I think it's clearer if the parent function provides that
information, since it already tracks that. In addition, we avoid corner
cases such as what happens if the block is at the start of the diff
output (we must ensure that we don't read off the beginning edge, for
example).

Jonathan Tan (3):
  diff: avoid redundantly clearing a flag
  diff: respect MIN_BLOCK_LENGTH for last block
  diff: define block by number of alphanumeric chars

 Documentation/diff-options.txt |   8 +-
 diff.c                         |  47 ++++++--
 diff.h                         |   2 +-
 t/t4015-diff-whitespace.sh     | 261 ++++++++++++++++++++++++++++++-----------
 4 files changed, 236 insertions(+), 82 deletions(-)

-- 
2.14.1.480.gb18f417b89-goog


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v4 1/3] diff: avoid redundantly clearing a flag
  2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan
                   ` (12 preceding siblings ...)
  2017-08-16  1:27 ` [PATCH v4 0/3] "diff --color-moved" with yet another heuristic Jonathan Tan
@ 2017-08-16  1:27 ` Jonathan Tan
  2017-08-16  1:27 ` [PATCH v4 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan
  2017-08-16  1:27 ` [PATCH v4 3/3] diff: define block by number of alphanumeric chars Jonathan Tan
  15 siblings, 0 replies; 30+ messages in thread
From: Jonathan Tan @ 2017-08-16  1:27 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, sbeller, gitster

No code in diff.c sets DIFF_SYMBOL_MOVED_LINE except in
mark_color_as_moved(), so it is redundant to clear it for the current
line. Therefore, clear it only for previous lines.

This makes a refactoring in a subsequent patch easier.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 4af73a7e0..23311f9c0 100644
--- a/diff.c
+++ b/diff.c
@@ -898,7 +898,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		if (!match) {
 			if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH &&
 			    o->color_moved != COLOR_MOVED_PLAIN) {
-				for (i = 0; i < block_length + 1; i++) {
+				for (i = 1; i < block_length + 1; i++) {
 					l = &o->emitted_symbols->buf[n - i];
 					l->flags &= ~DIFF_SYMBOL_MOVED_LINE;
 				}
-- 
2.14.1.480.gb18f417b89-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v4 2/3] diff: respect MIN_BLOCK_LENGTH for last block
  2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan
                   ` (13 preceding siblings ...)
  2017-08-16  1:27 ` [PATCH v4 1/3] diff: avoid redundantly clearing a flag Jonathan Tan
@ 2017-08-16  1:27 ` Jonathan Tan
  2017-08-16  1:27 ` [PATCH v4 3/3] diff: define block by number of alphanumeric chars Jonathan Tan
  15 siblings, 0 replies; 30+ messages in thread
From: Jonathan Tan @ 2017-08-16  1:27 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, sbeller, gitster

Currently, MIN_BLOCK_LENGTH is only checked when diff encounters a line
that does not belong to the current block. In particular, this means
that MIN_BLOCK_LENGTH is not checked after all lines are encountered.

Perform that check.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 diff.c                     | 29 ++++++++++++++++++++++-------
 t/t4015-diff-whitespace.sh | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 23311f9c0..f598d8a3a 100644
--- a/diff.c
+++ b/diff.c
@@ -861,6 +861,26 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 	return rp + 1;
 }
 
+/*
+ * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing.
+ *
+ * Otherwise, if the last block has fewer lines than
+ * COLOR_MOVED_MIN_BLOCK_LENGTH, unset DIFF_SYMBOL_MOVED_LINE on all lines in
+ * that block.
+ *
+ * The last block consists of the (n - block_length)'th line up to but not
+ * including the nth line.
+ */
+static void adjust_last_block(struct diff_options *o, int n, int block_length)
+{
+	int i;
+	if (block_length >= COLOR_MOVED_MIN_BLOCK_LENGTH ||
+	    o->color_moved == COLOR_MOVED_PLAIN)
+		return;
+	for (i = 1; i < block_length + 1; i++)
+		o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE;
+}
+
 /* Find blocks of moved code, delegate actual coloring decision to helper */
 static void mark_color_as_moved(struct diff_options *o,
 				struct hashmap *add_lines,
@@ -896,13 +916,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		}
 
 		if (!match) {
-			if (block_length < COLOR_MOVED_MIN_BLOCK_LENGTH &&
-			    o->color_moved != COLOR_MOVED_PLAIN) {
-				for (i = 1; i < block_length + 1; i++) {
-					l = &o->emitted_symbols->buf[n - i];
-					l->flags &= ~DIFF_SYMBOL_MOVED_LINE;
-				}
-			}
+			adjust_last_block(o, n, block_length);
 			pmb_nr = 0;
 			block_length = 0;
 			continue;
@@ -944,6 +958,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		if (flipped_block)
 			l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 	}
+	adjust_last_block(o, n, block_length);
 
 	free(pmb);
 }
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index c3b697411..6ae62a6bf 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1382,6 +1382,43 @@ EOF
 	test_cmp expected actual
 '
 
+test_expect_success '--color-moved block at end of diff output respects MIN_BLOCK_LENGTH' '
+	git reset --hard &&
+	>bar &&
+	cat <<-\EOF >foo &&
+	irrelevant_line
+	line1
+	EOF
+	git add foo bar &&
+	git commit -m x &&
+
+	cat <<-\EOF >bar &&
+	line1
+	EOF
+	cat <<-\EOF >foo &&
+	irrelevant_line
+	EOF
+
+	git diff HEAD --color-moved=zebra --no-renames |
+		grep -v "index" |
+		test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/bar b/bar<RESET>
+	<BOLD>--- a/bar<RESET>
+	<BOLD>+++ b/bar<RESET>
+	<CYAN>@@ -0,0 +1 @@<RESET>
+	<GREEN>+<RESET><GREEN>line1<RESET>
+	<BOLD>diff --git a/foo b/foo<RESET>
+	<BOLD>--- a/foo<RESET>
+	<BOLD>+++ b/foo<RESET>
+	<CYAN>@@ -1,2 +1 @@<RESET>
+	 irrelevant_line<RESET>
+	<RED>-line1<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_expect_success 'move detection with submodules' '
 	test_create_repo bananas &&
 	echo ripe >bananas/recipe &&
-- 
2.14.1.480.gb18f417b89-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v4 3/3] diff: define block by number of alphanumeric chars
  2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan
                   ` (14 preceding siblings ...)
  2017-08-16  1:27 ` [PATCH v4 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan
@ 2017-08-16  1:27 ` Jonathan Tan
  15 siblings, 0 replies; 30+ messages in thread
From: Jonathan Tan @ 2017-08-16  1:27 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, sbeller, gitster

The existing behavior of diff --color-moved=zebra does not define the
minimum size of a block at all, instead relying on a heuristic applied
later to filter out sets of adjacent moved lines that are shorter than 3
lines long. This can be confusing, because a block could thus be colored
as moved at the source but not at the destination (or vice versa),
depending on its neighbors.

Instead, teach diff that the minimum size of a block is 20 alphanumeric
characters, the same heuristic used by "git blame". This allows diff to
still exclude uninteresting lines appearing on their own (such as those
solely consisting of one or a few closing braces), as was the intention
of the adjacent-moved-line heuristic.

This requires a change in some tests in that some of their lines are no
longer considered to be part of a block, because they are too short.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/diff-options.txt |   8 +-
 diff.c                         |  28 +++--
 diff.h                         |   2 +-
 t/t4015-diff-whitespace.sh     | 226 ++++++++++++++++++++++++++++-------------
 4 files changed, 183 insertions(+), 81 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index bc52bd0b9..b8c881605 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -254,13 +254,11 @@ plain::
 	moved line, but it is not very useful in a review to determine
 	if a block of code was moved without permutation.
 zebra::
-	Blocks of moved code are detected greedily. The detected blocks are
+	Blocks of moved text of at least 20 alphanumeric characters
+	are detected greedily. The detected blocks are
 	painted using either the 'color.diff.{old,new}Moved' color or
 	'color.diff.{old,new}MovedAlternative'. The change between
-	the two colors indicates that a new block was detected. If there
-	are fewer than 3 adjacent moved lines, they are not marked up
-	as moved, but the regular colors 'color.diff.{old,new}' will be
-	used.
+	the two colors indicates that a new block was detected.
 dimmed_zebra::
 	Similar to 'zebra', but additional dimming of uninteresting parts
 	of moved code is performed. The bordering lines of two adjacent
diff --git a/diff.c b/diff.c
index f598d8a3a..c50fcc7ea 100644
--- a/diff.c
+++ b/diff.c
@@ -864,19 +864,31 @@ static int shrink_potential_moved_blocks(struct moved_entry **pmb,
 /*
  * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing.
  *
- * Otherwise, if the last block has fewer lines than
- * COLOR_MOVED_MIN_BLOCK_LENGTH, unset DIFF_SYMBOL_MOVED_LINE on all lines in
+ * Otherwise, if the last block has fewer alphanumeric characters than
+ * COLOR_MOVED_MIN_ALNUM_COUNT, unset DIFF_SYMBOL_MOVED_LINE on all lines in
  * that block.
  *
  * The last block consists of the (n - block_length)'th line up to but not
  * including the nth line.
+ *
+ * NEEDSWORK: This uses the same heuristic as blame_entry_score() in blame.c.
+ * Think of a way to unify them.
  */
 static void adjust_last_block(struct diff_options *o, int n, int block_length)
 {
-	int i;
-	if (block_length >= COLOR_MOVED_MIN_BLOCK_LENGTH ||
-	    o->color_moved == COLOR_MOVED_PLAIN)
+	int i, alnum_count = 0;
+	if (o->color_moved == COLOR_MOVED_PLAIN)
 		return;
+	for (i = 1; i < block_length + 1; i++) {
+		const char *c = o->emitted_symbols->buf[n - i].line;
+		for (; *c; c++) {
+			if (!isalnum(*c))
+				continue;
+			alnum_count++;
+			if (alnum_count >= COLOR_MOVED_MIN_ALNUM_COUNT)
+				return;
+		}
+	}
 	for (i = 1; i < block_length + 1; i++)
 		o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE;
 }
@@ -923,7 +935,6 @@ static void mark_color_as_moved(struct diff_options *o,
 		}
 
 		l->flags |= DIFF_SYMBOL_MOVED_LINE;
-		block_length++;
 
 		if (o->color_moved == COLOR_MOVED_PLAIN)
 			continue;
@@ -953,8 +964,13 @@ static void mark_color_as_moved(struct diff_options *o,
 			}
 
 			flipped_block = (flipped_block + 1) % 2;
+
+			adjust_last_block(o, n, block_length);
+			block_length = 0;
 		}
 
+		block_length++;
+
 		if (flipped_block)
 			l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 	}
diff --git a/diff.h b/diff.h
index 5755f465d..aca150ba2 100644
--- a/diff.h
+++ b/diff.h
@@ -195,7 +195,7 @@ struct diff_options {
 		COLOR_MOVED_ZEBRA_DIM = 3,
 	} color_moved;
 	#define COLOR_MOVED_DEFAULT COLOR_MOVED_ZEBRA
-	#define COLOR_MOVED_MIN_BLOCK_LENGTH 3
+	#define COLOR_MOVED_MIN_ALNUM_COUNT 20
 };
 
 void diff_emit_submodule_del(struct diff_options *o, const char *line);
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 6ae62a6bf..12d182dc1 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1101,9 +1101,9 @@ test_expect_success 'detect malicious moved code, inside file' '
 	<BRED>-{<RESET>
 	<BLUE>-if (!u->is_allowed_foo)<RESET>
 	<BLUE>-return;<RESET>
-	<BRED>-foo(u);<RESET>
-	<BLUE>-}<RESET>
-	<BLUE>-<RESET>
+	<RED>-foo(u);<RESET>
+	<RED>-}<RESET>
+	<RED>-<RESET>
 	 int main()<RESET>
 	 {<RESET>
 	 foo();<RESET>
@@ -1117,11 +1117,11 @@ test_expect_success 'detect malicious moved code, inside file' '
 	 <RESET>
 	<BGREEN>+<RESET><BGREEN>int secure_foo(struct user *u)<RESET>
 	<BGREEN>+<RESET><BGREEN>{<RESET>
-	<YELLOW>+<RESET><YELLOW>foo(u);<RESET>
+	<GREEN>+<RESET><GREEN>foo(u);<RESET>
 	<BGREEN>+<RESET><BGREEN>if (!u->is_allowed_foo)<RESET>
 	<BGREEN>+<RESET><BGREEN>return;<RESET>
-	<YELLOW>+<RESET><YELLOW>}<RESET>
-	<YELLOW>+<RESET>
+	<GREEN>+<RESET><GREEN>}<RESET>
+	<GREEN>+<RESET>
 	 int another_function()<RESET>
 	 {<RESET>
 	 bar();<RESET>
@@ -1182,9 +1182,9 @@ test_expect_success 'plain moved code, inside file' '
 test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
 	git reset --hard &&
 	cat <<-\EOF >lines.txt &&
-		line 1
-		line 2
-		line 3
+		long line 1
+		long line 2
+		long line 3
 		line 4
 		line 5
 		line 6
@@ -1195,9 +1195,9 @@ test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
 		line 11
 		line 12
 		line 13
-		line 14
-		line 15
-		line 16
+		long line 14
+		long line 15
+		long line 16
 	EOF
 	git add lines.txt &&
 	git commit -m "add poetry" &&
@@ -1208,12 +1208,12 @@ test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
 		line 7
 		line 8
 		line 9
-		line 1
-		line 2
-		line 3
-		line 14
-		line 15
-		line 16
+		long line 1
+		long line 2
+		long line 3
+		long line 14
+		long line 15
+		long line 16
 		line 10
 		line 11
 		line 12
@@ -1227,35 +1227,36 @@ test_expect_success 'detect permutations inside moved code -- dimmed_zebra' '
 	test_config color.diff.newMovedDimmed "normal cyan" &&
 	test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
 	test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&
-	git diff HEAD --no-renames --color-moved=dimmed_zebra| test_decode_color >actual &&
+	git diff HEAD --no-renames --color-moved=dimmed_zebra |
+		grep -v "index" |
+		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
-	<BOLD>index 47ea9c3..ba96a38 100644<RESET>
 	<BOLD>--- a/lines.txt<RESET>
 	<BOLD>+++ b/lines.txt<RESET>
 	<CYAN>@@ -1,16 +1,16 @@<RESET>
-	<BMAGENTA>-line 1<RESET>
-	<BMAGENTA>-line 2<RESET>
-	<BMAGENTA>-line 3<RESET>
+	<BMAGENTA>-long line 1<RESET>
+	<BMAGENTA>-long line 2<RESET>
+	<BMAGENTA>-long line 3<RESET>
 	 line 4<RESET>
 	 line 5<RESET>
 	 line 6<RESET>
 	 line 7<RESET>
 	 line 8<RESET>
 	 line 9<RESET>
-	<BCYAN>+<RESET><BCYAN>line 1<RESET>
-	<BCYAN>+<RESET><BCYAN>line 2<RESET>
-	<CYAN>+<RESET><CYAN>line 3<RESET>
-	<YELLOW>+<RESET><YELLOW>line 14<RESET>
-	<BYELLOW>+<RESET><BYELLOW>line 15<RESET>
-	<BYELLOW>+<RESET><BYELLOW>line 16<RESET>
+	<BCYAN>+<RESET><BCYAN>long line 1<RESET>
+	<BCYAN>+<RESET><BCYAN>long line 2<RESET>
+	<CYAN>+<RESET><CYAN>long line 3<RESET>
+	<YELLOW>+<RESET><YELLOW>long line 14<RESET>
+	<BYELLOW>+<RESET><BYELLOW>long line 15<RESET>
+	<BYELLOW>+<RESET><BYELLOW>long line 16<RESET>
 	 line 10<RESET>
 	 line 11<RESET>
 	 line 12<RESET>
 	 line 13<RESET>
-	<BMAGENTA>-line 14<RESET>
-	<BMAGENTA>-line 15<RESET>
-	<BMAGENTA>-line 16<RESET>
+	<BMAGENTA>-long line 14<RESET>
+	<BMAGENTA>-long line 15<RESET>
+	<BMAGENTA>-long line 16<RESET>
 	EOF
 	test_cmp expected actual
 '
@@ -1270,35 +1271,36 @@ test_expect_success 'cmd option assumes configured colored-moved' '
 	test_config color.diff.oldMovedAlternativeDimmed "normal blue" &&
 	test_config color.diff.newMovedAlternativeDimmed "normal yellow" &&
 	test_config diff.colorMoved zebra &&
-	git diff HEAD --no-renames --color-moved| test_decode_color >actual &&
+	git diff HEAD --no-renames --color-moved |
+		grep -v "index" |
+		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
-	<BOLD>index 47ea9c3..ba96a38 100644<RESET>
 	<BOLD>--- a/lines.txt<RESET>
 	<BOLD>+++ b/lines.txt<RESET>
 	<CYAN>@@ -1,16 +1,16 @@<RESET>
-	<MAGENTA>-line 1<RESET>
-	<MAGENTA>-line 2<RESET>
-	<MAGENTA>-line 3<RESET>
+	<MAGENTA>-long line 1<RESET>
+	<MAGENTA>-long line 2<RESET>
+	<MAGENTA>-long line 3<RESET>
 	 line 4<RESET>
 	 line 5<RESET>
 	 line 6<RESET>
 	 line 7<RESET>
 	 line 8<RESET>
 	 line 9<RESET>
-	<CYAN>+<RESET><CYAN>line 1<RESET>
-	<CYAN>+<RESET><CYAN>line 2<RESET>
-	<CYAN>+<RESET><CYAN>line 3<RESET>
-	<YELLOW>+<RESET><YELLOW>line 14<RESET>
-	<YELLOW>+<RESET><YELLOW>line 15<RESET>
-	<YELLOW>+<RESET><YELLOW>line 16<RESET>
+	<CYAN>+<RESET><CYAN>long line 1<RESET>
+	<CYAN>+<RESET><CYAN>long line 2<RESET>
+	<CYAN>+<RESET><CYAN>long line 3<RESET>
+	<YELLOW>+<RESET><YELLOW>long line 14<RESET>
+	<YELLOW>+<RESET><YELLOW>long line 15<RESET>
+	<YELLOW>+<RESET><YELLOW>long line 16<RESET>
 	 line 10<RESET>
 	 line 11<RESET>
 	 line 12<RESET>
 	 line 13<RESET>
-	<MAGENTA>-line 14<RESET>
-	<MAGENTA>-line 15<RESET>
-	<MAGENTA>-line 16<RESET>
+	<MAGENTA>-long line 14<RESET>
+	<MAGENTA>-long line 15<RESET>
+	<MAGENTA>-long line 16<RESET>
 	EOF
 	test_cmp expected actual
 '
@@ -1324,16 +1326,16 @@ line 1
 line 2
 line 3
 line 4
-line 5
-line 6
-line 7
+long line 5
+long line 6
+long line 7
 EOF
 	git add lines.txt &&
 	git commit -m "add poetry" &&
 	cat <<\EOF >lines.txt &&
-	line 5
-	line 6
-	line 7
+	long line 5
+	long line 6
+	long line 7
 line 1
 line 2
 line 3
@@ -1341,48 +1343,50 @@ line 4
 EOF
 	test_config color.diff.oldMoved "magenta" &&
 	test_config color.diff.newMoved "cyan" &&
-	git diff HEAD --no-renames --color-moved| test_decode_color >actual &&
+	git diff HEAD --no-renames --color-moved |
+		grep -v "index" |
+		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
-	<BOLD>index 734156d..eb89ead 100644<RESET>
 	<BOLD>--- a/lines.txt<RESET>
 	<BOLD>+++ b/lines.txt<RESET>
 	<CYAN>@@ -1,7 +1,7 @@<RESET>
-	<GREEN>+<RESET>	<GREEN>line 5<RESET>
-	<GREEN>+<RESET>	<GREEN>line 6<RESET>
-	<GREEN>+<RESET>	<GREEN>line 7<RESET>
+	<GREEN>+<RESET>	<GREEN>long line 5<RESET>
+	<GREEN>+<RESET>	<GREEN>long line 6<RESET>
+	<GREEN>+<RESET>	<GREEN>long line 7<RESET>
 	 line 1<RESET>
 	 line 2<RESET>
 	 line 3<RESET>
 	 line 4<RESET>
-	<RED>-line 5<RESET>
-	<RED>-line 6<RESET>
-	<RED>-line 7<RESET>
+	<RED>-long line 5<RESET>
+	<RED>-long line 6<RESET>
+	<RED>-long line 7<RESET>
 	EOF
 	test_cmp expected actual &&
 
-	git diff HEAD --no-renames -w --color-moved| test_decode_color >actual &&
+	git diff HEAD --no-renames -w --color-moved |
+		grep -v "index" |
+		test_decode_color >actual &&
 	cat <<-\EOF >expected &&
 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
-	<BOLD>index 734156d..eb89ead 100644<RESET>
 	<BOLD>--- a/lines.txt<RESET>
 	<BOLD>+++ b/lines.txt<RESET>
 	<CYAN>@@ -1,7 +1,7 @@<RESET>
-	<CYAN>+<RESET>	<CYAN>line 5<RESET>
-	<CYAN>+<RESET>	<CYAN>line 6<RESET>
-	<CYAN>+<RESET>	<CYAN>line 7<RESET>
+	<CYAN>+<RESET>	<CYAN>long line 5<RESET>
+	<CYAN>+<RESET>	<CYAN>long line 6<RESET>
+	<CYAN>+<RESET>	<CYAN>long line 7<RESET>
 	 line 1<RESET>
 	 line 2<RESET>
 	 line 3<RESET>
 	 line 4<RESET>
-	<MAGENTA>-line 5<RESET>
-	<MAGENTA>-line 6<RESET>
-	<MAGENTA>-line 7<RESET>
+	<MAGENTA>-long line 5<RESET>
+	<MAGENTA>-long line 6<RESET>
+	<MAGENTA>-long line 7<RESET>
 	EOF
 	test_cmp expected actual
 '
 
-test_expect_success '--color-moved block at end of diff output respects MIN_BLOCK_LENGTH' '
+test_expect_success '--color-moved block at end of diff output respects MIN_ALNUM_COUNT' '
 	git reset --hard &&
 	>bar &&
 	cat <<-\EOF >foo &&
@@ -1419,6 +1423,90 @@ test_expect_success '--color-moved block at end of diff output respects MIN_BLOC
 	test_cmp expected actual
 '
 
+test_expect_success '--color-moved respects MIN_ALNUM_COUNT' '
+	git reset --hard &&
+	cat <<-\EOF >foo &&
+	nineteen chars 456789
+	irrelevant_line
+	twenty chars 234567890
+	EOF
+	>bar &&
+	git add foo bar &&
+	git commit -m x &&
+
+	cat <<-\EOF >foo &&
+	irrelevant_line
+	EOF
+	cat <<-\EOF >bar &&
+	twenty chars 234567890
+	nineteen chars 456789
+	EOF
+
+	git diff HEAD --color-moved=zebra --no-renames |
+		grep -v "index" |
+		test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/bar b/bar<RESET>
+	<BOLD>--- a/bar<RESET>
+	<BOLD>+++ b/bar<RESET>
+	<CYAN>@@ -0,0 +1,2 @@<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>twenty chars 234567890<RESET>
+	<GREEN>+<RESET><GREEN>nineteen chars 456789<RESET>
+	<BOLD>diff --git a/foo b/foo<RESET>
+	<BOLD>--- a/foo<RESET>
+	<BOLD>+++ b/foo<RESET>
+	<CYAN>@@ -1,3 +1 @@<RESET>
+	<RED>-nineteen chars 456789<RESET>
+	 irrelevant_line<RESET>
+	<BOLD;MAGENTA>-twenty chars 234567890<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
+test_expect_success '--color-moved treats adjacent blocks as separate for MIN_ALNUM_COUNT' '
+	git reset --hard &&
+	cat <<-\EOF >foo &&
+	7charsA
+	irrelevant_line
+	7charsB
+	7charsC
+	EOF
+	>bar &&
+	git add foo bar &&
+	git commit -m x &&
+
+	cat <<-\EOF >foo &&
+	irrelevant_line
+	EOF
+	cat <<-\EOF >bar &&
+	7charsB
+	7charsC
+	7charsA
+	EOF
+
+	git diff HEAD --color-moved=zebra --no-renames | grep -v "index" | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/bar b/bar<RESET>
+	<BOLD>--- a/bar<RESET>
+	<BOLD>+++ b/bar<RESET>
+	<CYAN>@@ -0,0 +1,3 @@<RESET>
+	<GREEN>+<RESET><GREEN>7charsB<RESET>
+	<GREEN>+<RESET><GREEN>7charsC<RESET>
+	<GREEN>+<RESET><GREEN>7charsA<RESET>
+	<BOLD>diff --git a/foo b/foo<RESET>
+	<BOLD>--- a/foo<RESET>
+	<BOLD>+++ b/foo<RESET>
+	<CYAN>@@ -1,4 +1 @@<RESET>
+	<RED>-7charsA<RESET>
+	 irrelevant_line<RESET>
+	<RED>-7charsB<RESET>
+	<RED>-7charsC<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_expect_success 'move detection with submodules' '
 	test_create_repo bananas &&
 	echo ripe >bananas/recipe &&
-- 
2.14.1.480.gb18f417b89-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v4 0/3] "diff --color-moved" with yet another heuristic
  2017-08-16  1:27 ` [PATCH v4 0/3] "diff --color-moved" with yet another heuristic Jonathan Tan
@ 2017-08-16  5:55   ` Stefan Beller
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Beller @ 2017-08-16  5:55 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git@vger.kernel.org, Junio C Hamano

On Tue, Aug 15, 2017 at 6:27 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> These patches are on sb/diff-color-move.
>
> Patches 1 and 2 are unchanged, except for some line wrapping in a test
> in patch 2.
>
> This has been updated to use the same alphanumeric heuristic as blame
> (20 alnum characters). I tried it out and I thought the results were
> reasonable in a patch set that I'm working on (the pack-related function
> refactoring one).

ok, great!

> As for refactoring blame.c and this file, I'm not sure where best to put
> the new function, so I've added a NEEDSWORK for now.

Let's just cook this heuristic a bit and see if we want to refactor them
or tweak them differently.

> As for detecting block boundaries in adjust_last_block(), I've left it
> as-is for now. I think it's clearer if the parent function provides that
> information, since it already tracks that. In addition, we avoid corner
> cases such as what happens if the block is at the start of the diff
> output (we must ensure that we don't read off the beginning edge, for
> example).

ok.

Thanks!
Stefan


>
> Jonathan Tan (3):
>   diff: avoid redundantly clearing a flag
>   diff: respect MIN_BLOCK_LENGTH for last block
>   diff: define block by number of alphanumeric chars
>
>  Documentation/diff-options.txt |   8 +-
>  diff.c                         |  47 ++++++--
>  diff.h                         |   2 +-
>  t/t4015-diff-whitespace.sh     | 261 ++++++++++++++++++++++++++++++-----------
>  4 files changed, 236 insertions(+), 82 deletions(-)
>
> --
> 2.14.1.480.gb18f417b89-goog
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2017-08-16  5:55 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-11 22:49 [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Jonathan Tan
2017-08-11 22:49 ` [RFC PATCH 1/3] diff: avoid redundantly clearing a flag Jonathan Tan
2017-08-14 17:13   ` Stefan Beller
2017-08-11 22:49 ` [RFC PATCH 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan
2017-08-14 17:17   ` Stefan Beller
2017-08-11 22:49 ` [RFC PATCH 3/3] diff: check MIN_BLOCK_LENGTH at start of new block Jonathan Tan
2017-08-14 17:22   ` Stefan Beller
2017-08-12  0:39 ` [RFC PATCH 0/3] Fixes to "diff --color-moved" MIN_BLOCK_LENGTH handling Junio C Hamano
2017-08-14 17:29   ` Stefan Beller
2017-08-14 19:37     ` Junio C Hamano
2017-08-14 19:51       ` Stefan Beller
2017-08-15 17:07         ` Junio C Hamano
2017-08-14 21:31 ` [PATCH v2 " Jonathan Tan
2017-08-14 21:31 ` [PATCH v2 1/3] diff: avoid redundantly clearing a flag Jonathan Tan
2017-08-14 21:31 ` [PATCH v2 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan
2017-08-14 21:31 ` [PATCH v2 3/3] diff: check MIN_BLOCK_LENGTH at start of new block Jonathan Tan
2017-08-14 22:46   ` Stefan Beller
2017-08-14 23:57 ` [PATCH v3 0/3] "diff --color-moved" with different heuristic Jonathan Tan
2017-08-14 23:57 ` [PATCH v3 1/3] diff: avoid redundantly clearing a flag Jonathan Tan
2017-08-14 23:57 ` [PATCH v3 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan
2017-08-14 23:57 ` [PATCH v3 3/3] diff: define block by number of non-space chars Jonathan Tan
2017-08-15  2:29   ` Stefan Beller
2017-08-15 19:54   ` Junio C Hamano
2017-08-15 20:06     ` Stefan Beller
2017-08-15 20:53       ` Junio C Hamano
2017-08-16  1:27 ` [PATCH v4 0/3] "diff --color-moved" with yet another heuristic Jonathan Tan
2017-08-16  5:55   ` Stefan Beller
2017-08-16  1:27 ` [PATCH v4 1/3] diff: avoid redundantly clearing a flag Jonathan Tan
2017-08-16  1:27 ` [PATCH v4 2/3] diff: respect MIN_BLOCK_LENGTH for last block Jonathan Tan
2017-08-16  1:27 ` [PATCH v4 3/3] diff: define block by number of alphanumeric chars Jonathan Tan

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).