git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 00/10] diff --color-moved[-ws] speedups
@ 2021-06-14 13:04 Phillip Wood via GitGitGadget
  2021-06-14 13:04 ` [PATCH 01/10] diff --color-moved=zebra: fix alternate coloring Phillip Wood via GitGitGadget
                   ` (11 more replies)
  0 siblings, 12 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-06-14 13:04 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood

The current implementation of diff --color-moved-ws=allow-indentation-change
is considerably slower than the implementation of diff --color-moved, which
is in turn slower than a regular diff. This patch series starts with a
couple of bug fixes and then reworks the implementation of diff
--color-moved and diff --color-moved-ws=allow-indentation-change to speed
them up on large diffs. The time to run git diff --color-moved
--no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run
git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0
v2.29.0 is reduced by 88%. There is a small slowdown for commit-sized diffs
with --color-moved: the time to run git log -p --color-moved
--no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent
processors. On older processors these patches reduce the running time in all
cases that I've tested. In general, the larger the diff the larger the
speedup. As an extreme example, the time to run diff --color-moved
--color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8
minutes to 6 seconds.

Phillip Wood (10):
  diff --color-moved=zebra: fix alternate coloring
  diff --color-moved: avoid false short line matches and bad zebra
    coloring
  diff: simplify allow-indentation-change delta calculation
  diff --color-moved-ws=allow-indentation-change: simplify and optimize
  diff --color-moved: call comparison function directly
  diff --color-moved: unify moved block growth functions
  diff --color-moved: shrink potential moved blocks as we go
  diff --color-moved: stop clearing potential moved blocks
  diff --color-moved-ws=allow-indentation-change: improve hash lookups
  diff --color-moved: intern strings

 diff.c                     | 375 ++++++++++++++-----------------------
 t/t4015-diff-whitespace.sh | 137 ++++++++++++++
 2 files changed, 276 insertions(+), 236 deletions(-)


base-commit: 211eca0895794362184da2be2a2d812d070719d3
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-981%2Fphillipwood%2Fwip%2Fdiff-color-moved-tweaks-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-981/phillipwood/wip/diff-color-moved-tweaks-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/981
-- 
gitgitgadget


* [PATCH 01/10] diff --color-moved=zebra: fix alternate coloring
  2021-06-14 13:04 [PATCH 00/10] diff --color-moved[-ws] speedups Phillip Wood via GitGitGadget
@ 2021-06-14 13:04 ` Phillip Wood via GitGitGadget
  2021-06-15  3:24   ` Junio C Hamano
  2021-06-14 13:04 ` [PATCH 02/10] diff --color-moved: avoid false short line matches and bad zebra coloring Phillip Wood via GitGitGadget
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-06-14 13:04 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

b0a2ba4776 ("diff --color-moved=zebra: be stricter with color
alternation", 2018-11-23) sought to avoid using the alternate colors
unless there are two adjacent moved blocks of the same
sign. Unfortunately it contains two bugs that prevented it from fixing
the problem properly. Firstly, `last_symbol` is reset at the start of
each iteration of the loop, losing the symbol of the last line, and
secondly, when deciding whether to use the alternate color, it should
check if the current line has the same sign as the last line, not a
different sign. The combination of the two errors means that we still
use the alternate color when we should, but we also use it when we
shouldn't. This is most noticeable when using
--color-moved-ws=allow-indentation-change with hunks like

-this line gets indented
+    this line gets indented

where the post image is colored with newMovedAlternate rather than
newMoved. While this does not matter much, the next commit will change
the coloring to be correct in this case, so let's fix the bug here to
make it clear why the output is changing and add a regression test.
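
As an illustration only, the corrected alternation rule can be modelled
by the small sketch below. It is not the code in diff.c: the names
`next_flipped`, `prev_adjacent` and `enum sign` are hypothetical, and
`prev_adjacent` stands in for the combined result of
adjust_last_block() and the pmb_nr check.

```c
#include <assert.h>

/* Hypothetical, simplified model of the zebra rule; not diff.c itself. */
enum sign { SIGN_NONE = 0, SIGN_PLUS = '+', SIGN_MINUS = '-' };

/*
 * Decide whether a newly started moved block uses the alternate color.
 * prev_adjacent is nonzero if another moved block ends directly before
 * this one; last_sign is the sign remembered from that block.
 */
static int next_flipped(int flipped, int prev_adjacent,
			enum sign last_sign, enum sign cur_sign)
{
	assert(cur_sign == SIGN_PLUS || cur_sign == SIGN_MINUS);
	if (prev_adjacent && last_sign == cur_sign)
		return !flipped; /* adjacent blocks of the same sign alternate */
	return 0;                /* otherwise use the primary moved color */
}
```

With the old `!=` comparison a deletion followed directly by its
re-indented addition would flip, which is the miscoloring described
above.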

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c                     |  4 +--
 t/t4015-diff-whitespace.sh | 72 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 52c791574b71..cb068f8258c0 100644
--- a/diff.c
+++ b/diff.c
@@ -1142,6 +1142,7 @@ static void mark_color_as_moved(struct diff_options *o,
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
 	int n, flipped_block = 0, block_length = 0;
+	enum diff_symbol last_symbol = 0;
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
@@ -1149,7 +1150,6 @@ static void mark_color_as_moved(struct diff_options *o,
 		struct moved_entry *key;
 		struct moved_entry *match = NULL;
 		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-		enum diff_symbol last_symbol = 0;
 
 		switch (l->s) {
 		case DIFF_SYMBOL_PLUS:
@@ -1214,7 +1214,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			}
 
 			if (adjust_last_block(o, n, block_length) &&
-			    pmb_nr && last_symbol != l->s)
+			    pmb_nr && last_symbol == l->s)
 				flipped_block = (flipped_block + 1) % 2;
 			else
 				flipped_block = 0;
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 2c13b62d3c65..920114cd795c 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1442,6 +1442,78 @@ test_expect_success 'detect permutations inside moved code -- dimmed-zebra' '
 	test_cmp expected actual
 '
 
+test_expect_success 'zebra alternate color is only used when necessary' '
+	cat >old.txt <<-\EOF &&
+	line 1A should be marked as oldMoved newMovedAlternate
+	line 1B should be marked as oldMoved newMovedAlternate
+	unchanged
+	line 2A should be marked as oldMoved newMovedAlternate
+	line 2B should be marked as oldMoved newMovedAlternate
+	line 3A should be marked as oldMovedAlternate newMoved
+	line 3B should be marked as oldMovedAlternate newMoved
+	unchanged
+	line 4A should be marked as oldMoved newMovedAlternate
+	line 4B should be marked as oldMoved newMovedAlternate
+	line 5A should be marked as oldMovedAlternate newMoved
+	line 5B should be marked as oldMovedAlternate newMoved
+	line 6A should be marked as oldMoved newMoved
+	line 6B should be marked as oldMoved newMoved
+	EOF
+	cat >new.txt <<-\EOF &&
+	  line 1A should be marked as oldMoved newMovedAlternate
+	  line 1B should be marked as oldMoved newMovedAlternate
+	unchanged
+	  line 3A should be marked as oldMovedAlternate newMoved
+	  line 3B should be marked as oldMovedAlternate newMoved
+	  line 2A should be marked as oldMoved newMovedAlternate
+	  line 2B should be marked as oldMoved newMovedAlternate
+	unchanged
+	  line 6A should be marked as oldMoved newMoved
+	  line 6B should be marked as oldMoved newMoved
+	    line 4A should be marked as oldMoved newMovedAlternate
+	    line 4B should be marked as oldMoved newMovedAlternate
+	  line 5A should be marked as oldMovedAlternate newMoved
+	  line 5B should be marked as oldMovedAlternate newMoved
+	EOF
+	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
+		 --color-moved-ws=allow-indentation-change \
+		 old.txt new.txt >output &&
+	grep -v index output | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/old.txt b/new.txt<RESET>
+	<BOLD>--- a/old.txt<RESET>
+	<BOLD>+++ b/new.txt<RESET>
+	<CYAN>@@ -1,14 +1,14 @@<RESET>
+	<BOLD;MAGENTA>-line 1A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;MAGENTA>-line 1B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1B should be marked as oldMoved newMovedAlternate<RESET>
+	 unchanged<RESET>
+	<BOLD;MAGENTA>-line 2A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;MAGENTA>-line 2B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;BLUE>-line 3A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;BLUE>-line 3B should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3B should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2B should be marked as oldMoved newMovedAlternate<RESET>
+	 unchanged<RESET>
+	<BOLD;MAGENTA>-line 4A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;MAGENTA>-line 4B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;BLUE>-line 5A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;BLUE>-line 5B should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;MAGENTA>-line 6A should be marked as oldMoved newMoved<RESET>
+	<BOLD;MAGENTA>-line 6B should be marked as oldMoved newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6A should be marked as oldMoved newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6B should be marked as oldMoved newMoved<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5B should be marked as oldMovedAlternate newMoved<RESET>
+	EOF
+	test_cmp expected actual
+'
+
 test_expect_success 'cmd option assumes configured colored-moved' '
 	test_config color.diff.oldMoved "magenta" &&
 	test_config color.diff.newMoved "cyan" &&
-- 
gitgitgadget



* [PATCH 02/10] diff --color-moved: avoid false short line matches and bad zebra coloring
  2021-06-14 13:04 [PATCH 00/10] diff --color-moved[-ws] speedups Phillip Wood via GitGitGadget
  2021-06-14 13:04 ` [PATCH 01/10] diff --color-moved=zebra: fix alternate coloring Phillip Wood via GitGitGadget
@ 2021-06-14 13:04 ` Phillip Wood via GitGitGadget
  2021-06-14 13:04 ` [PATCH 03/10] diff: simplify allow-indentation-change delta calculation Phillip Wood via GitGitGadget
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-06-14 13:04 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

When marking moved lines it is possible for a block of potential
matched lines to extend past a change in sign when there is a sequence
of added lines whose text matches the text of a sequence of deleted
and added lines. Most of the time either `match` will be NULL or
`pmb_advance_or_null()` will fail when the loop encounters a change of
sign but there are corner cases where `match` is non-NULL and
`pmb_advance_or_null()` successfully advances the moved block despite
the change in sign.

One consequence of this is highlighting a short line as moved when it
should not be. For example

-moved line  # Correctly highlighted as moved
+short line  # Wrongly highlighted as moved
 context
+moved line  # Correctly highlighted as moved
+short line
 context
-short line

The other consequence is coloring a moved addition following a moved
deletion in the wrong color. In the example below the first "+moved
line 3" should be highlighted as newMoved not newMovedAlternate.

-moved line 1 # Correctly highlighted as oldMoved
-moved line 2 # Correctly highlighted as oldMovedAlternate
+moved line 3 # Wrongly highlighted as newMovedAlternate
 context      # Everything else is highlighted correctly
+moved line 2
+moved line 3
 context
+moved line 1
-moved line 3

These false matches are more likely when using --color-moved-ws, with
the exception of --color-moved-ws=allow-indentation-change, which ties
the sign of the current whitespace delta to the sign of the line to
avoid this problem. The fix is to check that the sign of the new line
being matched is the same as the sign of the line that started the
block of potential matches.
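
The new termination condition can be read in isolation as the predicate
sketched below. The function name `must_end_block` and the use of plain
`char` signs are assumptions made for illustration; the expression
itself mirrors the condition added to mark_color_as_moved() in this
patch.

```c
#include <assert.h>

/*
 * Hypothetical helper: return nonzero if the current block of
 * potential matches has to end at this line.
 *   pmb_nr     - number of potentially moved blocks still alive
 *   has_match  - whether the current line has any match at all
 *   line_sign  - '+' or '-' for the current line
 *   moved_sign - sign of the line that started the block
 */
static int must_end_block(int pmb_nr, int has_match,
			  char line_sign, char moved_sign)
{
	assert(line_sign == '+' || line_sign == '-');
	return pmb_nr && (!has_match || line_sign != moved_sign);
}
```

A matching line of the opposite sign now ends the block, so a short
added line can no longer ride on a block started by a deletion.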

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c                     | 17 ++++++----
 t/t4015-diff-whitespace.sh | 65 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+), 6 deletions(-)

diff --git a/diff.c b/diff.c
index cb068f8258c0..a0c43a104768 100644
--- a/diff.c
+++ b/diff.c
@@ -1142,7 +1142,7 @@ static void mark_color_as_moved(struct diff_options *o,
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
 	int n, flipped_block = 0, block_length = 0;
-	enum diff_symbol last_symbol = 0;
+	enum diff_symbol moved_symbol = 0;
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
@@ -1168,7 +1168,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			flipped_block = 0;
 		}
 
-		if (!match) {
+		if (pmb_nr && (!match || l->s != moved_symbol)) {
 			int i;
 
 			adjust_last_block(o, n, block_length);
@@ -1177,12 +1177,13 @@ static void mark_color_as_moved(struct diff_options *o,
 			pmb_nr = 0;
 			block_length = 0;
 			flipped_block = 0;
-			last_symbol = l->s;
+		}
+		if (!match) {
+			moved_symbol = 0;
 			continue;
 		}
 
 		if (o->color_moved == COLOR_MOVED_PLAIN) {
-			last_symbol = l->s;
 			l->flags |= DIFF_SYMBOL_MOVED_LINE;
 			continue;
 		}
@@ -1214,11 +1215,16 @@ static void mark_color_as_moved(struct diff_options *o,
 			}
 
 			if (adjust_last_block(o, n, block_length) &&
-			    pmb_nr && last_symbol == l->s)
+			    pmb_nr && moved_symbol == l->s)
 				flipped_block = (flipped_block + 1) % 2;
 			else
 				flipped_block = 0;
 
+			if (pmb_nr)
+				moved_symbol = l->s;
+			else
+				moved_symbol = 0;
+
 			block_length = 0;
 		}
 
@@ -1228,7 +1234,6 @@ static void mark_color_as_moved(struct diff_options *o,
 			if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
 				l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 		}
-		last_symbol = l->s;
 	}
 	adjust_last_block(o, n, block_length);
 
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 920114cd795c..3119a59f071d 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1514,6 +1514,71 @@ test_expect_success 'zebra alternate color is only used when necessary' '
 	test_cmp expected actual
 '
 
+test_expect_success 'short lines of opposite sign do not get marked as moved' '
+	cat >old.txt <<-\EOF &&
+	this line should be marked as moved
+	unchanged
+	unchanged
+	unchanged
+	unchanged
+	too short
+	this line should be marked as oldMoved newMoved
+	this line should be marked as oldMovedAlternate newMoved
+	unchanged 1
+	unchanged 2
+	unchanged 3
+	unchanged 4
+	this line should be marked as oldMoved newMoved/newMovedAlternate
+	EOF
+	cat >new.txt <<-\EOF &&
+	too short
+	unchanged
+	unchanged
+	this line should be marked as moved
+	too short
+	unchanged
+	unchanged
+	this line should be marked as oldMoved newMoved/newMovedAlternate
+	unchanged 1
+	unchanged 2
+	this line should be marked as oldMovedAlternate newMoved
+	this line should be marked as oldMoved newMoved/newMovedAlternate
+	unchanged 3
+	this line should be marked as oldMoved newMoved
+	unchanged 4
+	EOF
+	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
+		old.txt new.txt >output && cat output &&
+	grep -v index output | test_decode_color >actual &&
+	cat >expect <<-\EOF &&
+	<BOLD>diff --git a/old.txt b/new.txt<RESET>
+	<BOLD>--- a/old.txt<RESET>
+	<BOLD>+++ b/new.txt<RESET>
+	<CYAN>@@ -1,13 +1,15 @@<RESET>
+	<BOLD;MAGENTA>-this line should be marked as moved<RESET>
+	<GREEN>+<RESET><GREEN>too short<RESET>
+	 unchanged<RESET>
+	 unchanged<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as moved<RESET>
+	<GREEN>+<RESET><GREEN>too short<RESET>
+	 unchanged<RESET>
+	 unchanged<RESET>
+	<RED>-too short<RESET>
+	<BOLD;MAGENTA>-this line should be marked as oldMoved newMoved<RESET>
+	<BOLD;BLUE>-this line should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
+	 unchanged 1<RESET>
+	 unchanged 2<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
+	 unchanged 3<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMoved newMoved<RESET>
+	 unchanged 4<RESET>
+	<BOLD;MAGENTA>-this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
+	EOF
+	test_cmp expect actual
+'
+
 test_expect_success 'cmd option assumes configured colored-moved' '
 	test_config color.diff.oldMoved "magenta" &&
 	test_config color.diff.newMoved "cyan" &&
-- 
gitgitgadget



* [PATCH 03/10] diff: simplify allow-indentation-change delta calculation
  2021-06-14 13:04 [PATCH 00/10] diff --color-moved[-ws] speedups Phillip Wood via GitGitGadget
  2021-06-14 13:04 ` [PATCH 01/10] diff --color-moved=zebra: fix alternate coloring Phillip Wood via GitGitGadget
  2021-06-14 13:04 ` [PATCH 02/10] diff --color-moved: avoid false short line matches and bad zebra coloring Phillip Wood via GitGitGadget
@ 2021-06-14 13:04 ` Phillip Wood via GitGitGadget
  2021-06-14 13:04 ` [PATCH 04/10] diff --color-moved-ws=allow-indentation-change: simplify and optimize Phillip Wood via GitGitGadget
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-06-14 13:04 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Now that we reliably end a block when the sign changes, we don't need
the whitespace delta calculation to rely on the sign.
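
A minimal sketch of the simplified calculation follows. It assumes
indentation made of spaces only and ignores the blank-line case that
compute_ws_delta() handles; `indent_width` and `ws_delta` are
hypothetical names used only for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: count leading spaces (the real code also handles tabs). */
static int indent_width(const char *s)
{
	int n = 0;

	while (s[n] == ' ')
		n++;
	return n;
}

/*
 * Return 1 and store the indentation delta in *out if a and b are
 * identical after their leading spaces, otherwise return 0.
 */
static int ws_delta(const char *a, const char *b, int *out)
{
	int a_width = indent_width(a), b_width = indent_width(b);

	assert(out != NULL);
	if (strcmp(a + a_width, b + b_width))
		return 0;
	*out = a_width - b_width; /* no longer depends on the +/- sign */
	return 1;
}
```

Because a block never spans a change of sign any more, every line in a
block is compared in the same direction and a single signed delta is
enough.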

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/diff.c b/diff.c
index a0c43a104768..19c8954ec546 100644
--- a/diff.c
+++ b/diff.c
@@ -864,23 +864,17 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
 	    a_width = a->indent_width,
 	    b_off = b->indent_off,
 	    b_width = b->indent_width;
-	int delta;
 
 	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE) {
 		*out = INDENT_BLANKLINE;
 		return 1;
 	}
 
-	if (a->s == DIFF_SYMBOL_PLUS)
-		delta = a_width - b_width;
-	else
-		delta = b_width - a_width;
-
 	if (a_len - a_off != b_len - b_off ||
 	    memcmp(a->line + a_off, b->line + b_off, a_len - a_off))
 		return 0;
 
-	*out = delta;
+	*out = a_width - b_width;
 
 	return 1;
 }
@@ -924,10 +918,7 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	 * match those of the current block and that the text of 'l' and 'cur'
 	 * after the indentation match.
 	 */
-	if (cur->es->s == DIFF_SYMBOL_PLUS)
-		delta = a_width - c_width;
-	else
-		delta = c_width - a_width;
+	delta = c_width - a_width;
 
 	/*
 	 * If the previous lines of this block were all blank then set its
-- 
gitgitgadget



* [PATCH 04/10] diff --color-moved-ws=allow-indentation-change: simplify and optimize
  2021-06-14 13:04 [PATCH 00/10] diff --color-moved[-ws] speedups Phillip Wood via GitGitGadget
                   ` (2 preceding siblings ...)
  2021-06-14 13:04 ` [PATCH 03/10] diff: simplify allow-indentation-change delta calculation Phillip Wood via GitGitGadget
@ 2021-06-14 13:04 ` Phillip Wood via GitGitGadget
  2021-06-14 13:04 ` [PATCH 05/10] diff --color-moved: call comparison function directly Phillip Wood via GitGitGadget
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-06-14 13:04 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

If we already have a block of potentially moved lines then, as we move
down the diff, we need to check if the next line of each potentially
moved line matches the current line of the diff. The implementation of
--color-moved-ws=allow-indentation-change was needlessly performing
this check on all the lines in the diff that matched the current line
rather than just the current line. To exacerbate the problem, finding
all the other lines in the diff that match the current line involves a
fuzzy lookup, so we were wasting even more time performing a second
comparison to filter out the non-matching lines. Fixing this reduces
the time to run
  git diff --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
by 88% and simplifies the code.
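
The per-block check that remains after this change can be approximated
by the sketch below. It is a simplified model, not cmp_in_block_with_wsd()
itself: it works on NUL-terminated strings, treats only spaces as
indentation, returns 1 on a match (the real function returns 0), and
uses the hypothetical constant WSD_BLANK in place of INDENT_BLANKLINE.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define WSD_BLANK INT_MIN /* hypothetical stand-in for INDENT_BLANKLINE */

static int leading_spaces(const char *s)
{
	int n = 0;

	while (s[n] == ' ')
		n++;
	return n;
}

static int is_blank(const char *s)
{
	return s[leading_spaces(s)] == '\0';
}

/*
 * Return 1 if `line` continues a block whose next expected line is
 * `next`. *wsd is the block's indentation delta; it is filled in the
 * first time a non-blank line is seen.
 */
static int continues_block(const char *next, const char *line, int *wsd)
{
	int a_off = leading_spaces(next), b_off = leading_spaces(line);
	int delta;

	assert(wsd != NULL);
	if (is_blank(next) && is_blank(line))
		return 1; /* two blank lines always match */
	delta = b_off - a_off;
	if (*wsd == WSD_BLANK)
		*wsd = delta; /* block so far consisted of blank lines */
	return delta == *wsd && !strcmp(next + a_off, line + b_off);
}
```

Only the current line and the block's next line are compared, which is
the single comparison the commit message argues is sufficient.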

Before this change
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
  Time (mean ± σ):      9.978 s ±  0.042 s    [User: 9.905 s, System: 0.057 s]
  Range (min … max):    9.917 s … 10.037 s    10 runs

After this change
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
  Time (mean ± σ):      1.220 s ±  0.004 s    [User: 1.160 s, System: 0.058 s]
  Range (min … max):    1.214 s …  1.226 s    10 runs

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 65 ++++++++++++++++------------------------------------------
 1 file changed, 18 insertions(+), 47 deletions(-)

diff --git a/diff.c b/diff.c
index 19c8954ec546..5d5d168107a6 100644
--- a/diff.c
+++ b/diff.c
@@ -881,35 +881,20 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
 
 static int cmp_in_block_with_wsd(const struct diff_options *o,
 				 const struct moved_entry *cur,
-				 const struct moved_entry *match,
-				 struct moved_block *pmb,
-				 int n)
+				 const struct emitted_diff_symbol *l,
+				 struct moved_block *pmb)
 {
-	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-	int al = cur->es->len, bl = match->es->len, cl = l->len;
+	int al = cur->es->len, bl = l->len;
 	const char *a = cur->es->line,
-		   *b = match->es->line,
-		   *c = l->line;
+		   *b = l->line;
 	int a_off = cur->es->indent_off,
 	    a_width = cur->es->indent_width,
-	    c_off = l->indent_off,
-	    c_width = l->indent_width;
+	    b_off = l->indent_off,
+	    b_width = l->indent_width;
 	int delta;
 
-	/*
-	 * We need to check if 'cur' is equal to 'match'.  As those
-	 * are from the same (+/-) side, we do not need to adjust for
-	 * indent changes. However these were found using fuzzy
-	 * matching so we do have to check if they are equal. Here we
-	 * just check the lengths. We delay calling memcmp() to check
-	 * the contents until later as if the length comparison for a
-	 * and c fails we can avoid the call all together.
-	 */
-	if (al != bl)
-		return 1;
-
 	/* If 'l' and 'cur' are both blank then they match. */
-	if (a_width == INDENT_BLANKLINE && c_width == INDENT_BLANKLINE)
+	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE)
 		return 0;
 
 	/*
@@ -918,7 +903,7 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	 * match those of the current block and that the text of 'l' and 'cur'
 	 * after the indentation match.
 	 */
-	delta = c_width - a_width;
+	delta = b_width - a_width;
 
 	/*
 	 * If the previous lines of this block were all blank then set its
@@ -927,9 +912,8 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	if (pmb->wsd == INDENT_BLANKLINE)
 		pmb->wsd = delta;
 
-	return !(delta == pmb->wsd && al - a_off == cl - c_off &&
-		 !memcmp(a, b, al) && !
-		 memcmp(a + a_off, c + c_off, al - a_off));
+	return !(delta == pmb->wsd && al - a_off == bl - b_off &&
+		 !memcmp(a + a_off, b + b_off, al - a_off));
 }
 
 static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
@@ -1030,36 +1014,23 @@ static void pmb_advance_or_null(struct diff_options *o,
 }
 
 static void pmb_advance_or_null_multi_match(struct diff_options *o,
-					    struct moved_entry *match,
-					    struct hashmap *hm,
+					    struct emitted_diff_symbol *l,
 					    struct moved_block *pmb,
-					    int pmb_nr, int n)
+					    int pmb_nr)
 {
 	int i;
-	char *got_match = xcalloc(1, pmb_nr);
-
-	hashmap_for_each_entry_from(hm, match, ent) {
-		for (i = 0; i < pmb_nr; i++) {
-			struct moved_entry *prev = pmb[i].match;
-			struct moved_entry *cur = (prev && prev->next_line) ?
-					prev->next_line : NULL;
-			if (!cur)
-				continue;
-			if (!cmp_in_block_with_wsd(o, cur, match, &pmb[i], n))
-				got_match[i] |= 1;
-		}
-	}
 
 	for (i = 0; i < pmb_nr; i++) {
-		if (got_match[i]) {
+		struct moved_entry *prev = pmb[i].match;
+		struct moved_entry *cur = (prev && prev->next_line) ?
+			prev->next_line : NULL;
+		if (cur && !cmp_in_block_with_wsd(o, cur, l, &pmb[i])) {
 			/* Advance to the next line */
-			pmb[i].match = pmb[i].match->next_line;
+			pmb[i].match = cur;
 		} else {
 			moved_block_clear(&pmb[i]);
 		}
 	}
-
-	free(got_match);
 }
 
 static int shrink_potential_moved_blocks(struct moved_block *pmb,
@@ -1181,7 +1152,7 @@ static void mark_color_as_moved(struct diff_options *o,
 
 		if (o->color_moved_ws_handling &
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-			pmb_advance_or_null_multi_match(o, match, hm, pmb, pmb_nr, n);
+			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);
 		else
 			pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
 
-- 
gitgitgadget



* [PATCH 05/10] diff --color-moved: call comparison function directly
  2021-06-14 13:04 [PATCH 00/10] diff --color-moved[-ws] speedups Phillip Wood via GitGitGadget
                   ` (3 preceding siblings ...)
  2021-06-14 13:04 ` [PATCH 04/10] diff --color-moved-ws=allow-indentation-change: simplify and optimize Phillip Wood via GitGitGadget
@ 2021-06-14 13:04 ` Phillip Wood via GitGitGadget
  2021-06-14 13:04 ` [PATCH 06/10] diff --color-moved: unify moved block growth functions Phillip Wood via GitGitGadget
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-06-14 13:04 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Calling xdiff_compare_lines() directly rather than using a function
pointer from the hash map reduces the running time very slightly but,
more importantly, it will allow us to easily combine pmb_advance_or_null()
and pmb_advance_or_null_multi_match() in the next commit.

Before this change
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
  Time (mean ± σ):      1.136 s ±  0.004 s    [User: 1.079 s, System: 0.053 s]
  Range (min … max):    1.130 s …  1.141 s    10 runs

After this change
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
  Time (mean ± σ):      1.118 s ±  0.003 s    [User: 1.062 s, System: 0.053 s]
  Range (min … max):    1.114 s …  1.121 s    10 runs

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index 5d5d168107a6..c8fdfc9049bb 100644
--- a/diff.c
+++ b/diff.c
@@ -995,17 +995,20 @@ static void add_lines_to_move_detection(struct diff_options *o,
 }
 
 static void pmb_advance_or_null(struct diff_options *o,
-				struct moved_entry *match,
-				struct hashmap *hm,
+				struct emitted_diff_symbol *l,
 				struct moved_block *pmb,
 				int pmb_nr)
 {
 	int i;
+	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
+
 	for (i = 0; i < pmb_nr; i++) {
 		struct moved_entry *prev = pmb[i].match;
 		struct moved_entry *cur = (prev && prev->next_line) ?
 				prev->next_line : NULL;
-		if (cur && !hm->cmpfn(o, &cur->ent, &match->ent, NULL)) {
+		if (cur && xdiff_compare_lines(cur->es->line, cur->es->len,
+						l->line, l->len,
+						flags)) {
 			pmb[i].match = cur;
 		} else {
 			pmb[i].match = NULL;
@@ -1154,7 +1157,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
 			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);
 		else
-			pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
+			pmb_advance_or_null(o, l, pmb, pmb_nr);
 
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
-- 
gitgitgadget



* [PATCH 06/10] diff --color-moved: unify moved block growth functions
  2021-06-14 13:04 [PATCH 00/10] diff --color-moved[-ws] speedups Phillip Wood via GitGitGadget
                   ` (4 preceding siblings ...)
  2021-06-14 13:04 ` [PATCH 05/10] diff --color-moved: call comparison function directly Phillip Wood via GitGitGadget
@ 2021-06-14 13:04 ` Phillip Wood via GitGitGadget
  2021-06-14 13:04 ` [PATCH 07/10] diff --color-moved: shrink potential moved blocks as we go Phillip Wood via GitGitGadget
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-06-14 13:04 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

After the last two commits pmb_advance_or_null() and
pmb_advance_or_null_multi_match() differ only in the comparison they
perform. Let's simplify the code by combining them into a single
function.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 41 ++++++++++++-----------------------------
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/diff.c b/diff.c
index c8fdfc9049bb..de6522a3a860 100644
--- a/diff.c
+++ b/diff.c
@@ -1003,36 +1003,23 @@ static void pmb_advance_or_null(struct diff_options *o,
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
 	for (i = 0; i < pmb_nr; i++) {
+		int match;
 		struct moved_entry *prev = pmb[i].match;
 		struct moved_entry *cur = (prev && prev->next_line) ?
 				prev->next_line : NULL;
-		if (cur && xdiff_compare_lines(cur->es->line, cur->es->len,
-						l->line, l->len,
-						flags)) {
-			pmb[i].match = cur;
-		} else {
-			pmb[i].match = NULL;
-		}
-	}
-}
 
-static void pmb_advance_or_null_multi_match(struct diff_options *o,
-					    struct emitted_diff_symbol *l,
-					    struct moved_block *pmb,
-					    int pmb_nr)
-{
-	int i;
-
-	for (i = 0; i < pmb_nr; i++) {
-		struct moved_entry *prev = pmb[i].match;
-		struct moved_entry *cur = (prev && prev->next_line) ?
-			prev->next_line : NULL;
-		if (cur && !cmp_in_block_with_wsd(o, cur, l, &pmb[i])) {
-			/* Advance to the next line */
+		if (o->color_moved_ws_handling &
+		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+			match = cur &&
+				!cmp_in_block_with_wsd(o, cur, l, &pmb[i]);
+		else
+			match = cur &&
+				xdiff_compare_lines(cur->es->line, cur->es->len,
+						    l->line, l->len, flags);
+		if (match)
 			pmb[i].match = cur;
-		} else {
+		else
 			moved_block_clear(&pmb[i]);
-		}
 	}
 }
 
@@ -1153,11 +1140,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			continue;
 		}
 
-		if (o->color_moved_ws_handling &
-		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);
-		else
-			pmb_advance_or_null(o, l, pmb, pmb_nr);
+		pmb_advance_or_null(o, l, pmb, pmb_nr);
 
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
-- 
gitgitgadget



* [PATCH 07/10] diff --color-moved: shrink potential moved blocks as we go
  2021-06-14 13:04 [PATCH 00/10] diff --color-moved[-ws] speedups Phillip Wood via GitGitGadget
                   ` (5 preceding siblings ...)
  2021-06-14 13:04 ` [PATCH 06/10] diff --color-moved: unify moved block growth functions Phillip Wood via GitGitGadget
@ 2021-06-14 13:04 ` Phillip Wood via GitGitGadget
  2021-06-14 13:04 ` [PATCH 08/10] diff --color-moved: stop clearing potential moved blocks Phillip Wood via GitGitGadget
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-06-14 13:04 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Rather than setting `match` to NULL and then looping over the list of
potential matched blocks for a second time to remove blocks with no
matches, just filter out the blocks with no matches as we go.
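
As an illustration only (this is not the code in the patch), the
"filter as we go" idea can be sketched as a single pass with a read
index and a write index. `struct block`, filter_blocks() and
has_match() are hypothetical names; the sketch copies the whole
element so that any other per-block fields travel with the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a potential moved block. */
struct block {
	const char *match; /* NULL means the block stopped matching */
};

/*
 * Keep only the blocks for which keep() returns non-zero, preserving
 * their order. The survivors are compacted to the front of the array
 * in the same pass that tests them, so no second loop is needed.
 * Returns the new number of blocks.
 */
static int filter_blocks(struct block *blocks, int nr,
			 int (*keep)(const struct block *))
{
	int i, j;

	assert(nr >= 0);
	for (i = 0, j = 0; i < nr; i++)
		if (keep(&blocks[i]))
			blocks[j++] = blocks[i]; /* j never overtakes i */
	return j;
}

/* Example predicate: a block survives while it still has a match. */
static int has_match(const struct block *b)
{
	return b->match != NULL;
}
```

Because the write index never overtakes the read index, the array can
be compacted in place without a temporary copy.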

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 42 ++++++------------------------------------
 1 file changed, 6 insertions(+), 36 deletions(-)

diff --git a/diff.c b/diff.c
index de6522a3a860..f60cce654c14 100644
--- a/diff.c
+++ b/diff.c
@@ -997,12 +997,12 @@ static void add_lines_to_move_detection(struct diff_options *o,
 static void pmb_advance_or_null(struct diff_options *o,
 				struct emitted_diff_symbol *l,
 				struct moved_block *pmb,
-				int pmb_nr)
+				int *pmb_nr)
 {
-	int i;
+	int i, j;
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
-	for (i = 0; i < pmb_nr; i++) {
+	for (i = 0, j = 0; i < *pmb_nr; i++) {
 		int match;
 		struct moved_entry *prev = pmb[i].match;
 		struct moved_entry *cur = (prev && prev->next_line) ?
@@ -1017,37 +1017,9 @@ static void pmb_advance_or_null(struct diff_options *o,
 				xdiff_compare_lines(cur->es->line, cur->es->len,
 						    l->line, l->len, flags);
 		if (match)
-			pmb[i].match = cur;
-		else
-			moved_block_clear(&pmb[i]);
+			pmb[j++].match = cur;
 	}
-}
-
-static int shrink_potential_moved_blocks(struct moved_block *pmb,
-					 int pmb_nr)
-{
-	int lp, rp;
-
-	/* Shrink the set of potential block to the remaining running */
-	for (lp = 0, rp = pmb_nr - 1; lp <= rp;) {
-		while (lp < pmb_nr && pmb[lp].match)
-			lp++;
-		/* lp points at the first NULL now */
-
-		while (rp > -1 && !pmb[rp].match)
-			rp--;
-		/* rp points at the last non-NULL */
-
-		if (lp < pmb_nr && rp > -1 && lp < rp) {
-			pmb[lp] = pmb[rp];
-			memset(&pmb[rp], 0, sizeof(pmb[rp]));
-			rp--;
-			lp++;
-		}
-	}
-
-	/* Remember the number of running sets */
-	return rp + 1;
+	*pmb_nr = j;
 }
 
 /*
@@ -1140,9 +1112,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			continue;
 		}
 
-		pmb_advance_or_null(o, l, pmb, pmb_nr);
-
-		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
+		pmb_advance_or_null(o, l, pmb, &pmb_nr);
 
 		if (pmb_nr == 0) {
 			/*
-- 
gitgitgadget



* [PATCH 08/10] diff --color-moved: stop clearing potential moved blocks
  2021-06-14 13:04 [PATCH 00/10] diff --color-moved[-ws] speedups Phillip Wood via GitGitGadget
                   ` (6 preceding siblings ...)
  2021-06-14 13:04 ` [PATCH 07/10] diff --color-moved: shrink potential moved blocks as we go Phillip Wood via GitGitGadget
@ 2021-06-14 13:04 ` Phillip Wood via GitGitGadget
  2021-06-14 13:04 ` [PATCH 09/10] diff --color-moved-ws=allow-indentation-change: improve hash lookups Phillip Wood via GitGitGadget
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-06-14 13:04 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

moved_block_clear() was introduced in 74d156f4a1 ("diff
--color-moved-ws: fix double free crash", 2018-10-04) to free the
memory that was allocated when initializing a potential moved
block. However, since 21536d077f ("diff --color-moved-ws: modify
allow-indentation-change", 2018-11-23) initializing a potential moved
block no longer allocates any memory. Up until the last commit we were
relying on moved_block_clear() to set the `match` pointer to NULL when
a block stopped matching, but since that commit we no longer clear a
moved block that does not match, so it does not make sense to clear
blocks elsewhere.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/diff.c b/diff.c
index f60cce654c14..ee58373f55f8 100644
--- a/diff.c
+++ b/diff.c
@@ -807,11 +807,6 @@ struct moved_block {
 	int wsd; /* The whitespace delta of this block */
 };
 
-static void moved_block_clear(struct moved_block *b)
-{
-	memset(b, 0, sizeof(*b));
-}
-
 #define INDENT_BLANKLINE INT_MIN
 
 static void fill_es_indent_data(struct emitted_diff_symbol *es)
@@ -1093,11 +1088,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		}
 
 		if (pmb_nr && (!match || l->s != moved_symbol)) {
-			int i;
-
 			adjust_last_block(o, n, block_length);
-			for(i = 0; i < pmb_nr; i++)
-				moved_block_clear(&pmb[i]);
 			pmb_nr = 0;
 			block_length = 0;
 			flipped_block = 0;
@@ -1155,8 +1146,6 @@ static void mark_color_as_moved(struct diff_options *o,
 	}
 	adjust_last_block(o, n, block_length);
 
-	for(n = 0; n < pmb_nr; n++)
-		moved_block_clear(&pmb[n]);
 	free(pmb);
 }
 
-- 
gitgitgadget



* [PATCH 09/10] diff --color-moved-ws=allow-indentation-change: improve hash lookups
  2021-06-14 13:04 [PATCH 00/10] diff --color-moved[-ws] speedups Phillip Wood via GitGitGadget
                   ` (7 preceding siblings ...)
  2021-06-14 13:04 ` [PATCH 08/10] diff --color-moved: stop clearing potential moved blocks Phillip Wood via GitGitGadget
@ 2021-06-14 13:04 ` Phillip Wood via GitGitGadget
  2021-07-09 15:36   ` Elijah Newren
  2021-06-14 13:04 ` [PATCH 10/10] diff --color-moved: intern strings Phillip Wood via GitGitGadget
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-06-14 13:04 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

As libxdiff does not have a whitespace flag to ignore the indentation
the code for --color-moved-ws=allow-indentation-change uses
XDF_IGNORE_WHITESPACE and then filters out any hash lookups where
there are non-indentation changes. This filtering is inefficient as
we have to perform another string comparison.

By using the offset data that we have already computed to skip the
indentation we can avoid using XDF_IGNORE_WHITESPACE and safely remove
the extra checks, which improves the performance by 14% and paves the
way for the elimination of string comparisons in the next commit.
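
A minimal sketch of the idea, not the code in this patch: if the hash
and the comparison both start at the precomputed indentation offset,
two lines that differ only in their indentation land in the same
bucket and compare equal without a second filtering pass.
`struct line`, the helper names and the FNV-1a hash are assumptions
made for the illustration; they stand in for struct
emitted_diff_symbol and the xdiff helpers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct emitted_diff_symbol. */
struct line {
	const char *text;
	int len;
	int indent_off; /* offset of the first non-whitespace character */
};

/* Offset of the first character that is neither a space nor a tab. */
static int indent_offset(const char *s, int len)
{
	int off = 0;

	while (off < len && (s[off] == ' ' || s[off] == '\t'))
		off++;
	return off;
}

static struct line make_line(const char *text)
{
	struct line l;

	l.text = text;
	l.len = (int)strlen(text);
	l.indent_off = indent_offset(text, l.len);
	return l;
}

/* FNV-1a over the text after the indentation (illustrative hash). */
static unsigned hash_after_indent(const struct line *l)
{
	unsigned h = 2166136261u;
	int i;

	for (i = l->indent_off; i < l->len; i++) {
		h ^= (unsigned char)l->text[i];
		h *= 16777619u;
	}
	return h;
}

/* Non-zero when the lines are identical once indentation is skipped. */
static int same_after_indent(const struct line *a, const struct line *b)
{
	int alen = a->len - a->indent_off;
	int blen = b->len - b->indent_off;

	assert(alen >= 0 && blen >= 0);
	return alen == blen &&
	       !memcmp(a->text + a->indent_off, b->text + b->indent_off,
		       (size_t)alen);
}
```

Whitespace inside the line still counts in this sketch, so "foo(bar);"
and "foo( bar);" remain different lines.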

This change slightly increases the runtime of other --color-moved
modes. This could be avoided by using different comparison functions
for the different modes but after the changes in the next commit there
is no measurable benefit.

Before this change
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
  Time (mean ± σ):      1.116 s ±  0.005 s    [User: 1.057 s, System: 0.056 s]
  Range (min … max):    1.109 s …  1.123 s    10 runs

Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
  Time (mean ± σ):      1.216 s ±  0.005 s    [User: 1.155 s, System: 0.059 s]
  Range (min … max):    1.206 s …  1.223 s    10 runs

After this change
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
  Time (mean ± σ):      1.147 s ±  0.005 s    [User: 1.085 s, System: 0.059 s]
  Range (min … max):    1.140 s …  1.154 s    10 runs

Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
  Time (mean ± σ):      1.048 s ±  0.005 s    [User: 987.4 ms, System: 58.8 ms]
  Range (min … max):    1.043 s …  1.056 s    10 runs

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 66 +++++++++++++++++-----------------------------------------
 1 file changed, 19 insertions(+), 47 deletions(-)

diff --git a/diff.c b/diff.c
index ee58373f55f8..e6f3586b39bf 100644
--- a/diff.c
+++ b/diff.c
@@ -850,28 +850,15 @@ static void fill_es_indent_data(struct emitted_diff_symbol *es)
 }
 
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
-			    const struct emitted_diff_symbol *b,
-			    int *out)
-{
-	int a_len = a->len,
-	    b_len = b->len,
-	    a_off = a->indent_off,
-	    a_width = a->indent_width,
-	    b_off = b->indent_off,
+			    const struct emitted_diff_symbol *b)
+{
+	int a_width = a->indent_width,
 	    b_width = b->indent_width;
 
-	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE) {
-		*out = INDENT_BLANKLINE;
-		return 1;
-	}
-
-	if (a_len - a_off != b_len - b_off ||
-	    memcmp(a->line + a_off, b->line + b_off, a_len - a_off))
-		return 0;
-
-	*out = a_width - b_width;
+	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE)
+		return INDENT_BLANKLINE;
 
-	return 1;
+	return a_width - b_width;
 }
 
 static int cmp_in_block_with_wsd(const struct diff_options *o,
@@ -917,26 +904,17 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 			   const void *keydata)
 {
 	const struct diff_options *diffopt = hashmap_cmp_fn_data;
-	const struct moved_entry *a, *b;
+	const struct emitted_diff_symbol *a, *b;
 	unsigned flags = diffopt->color_moved_ws_handling
 			 & XDF_WHITESPACE_FLAGS;
 
-	a = container_of(eptr, const struct moved_entry, ent);
-	b = container_of(entry_or_key, const struct moved_entry, ent);
-
-	if (diffopt->color_moved_ws_handling &
-	    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-		/*
-		 * As there is not specific white space config given,
-		 * we'd need to check for a new block, so ignore all
-		 * white space. The setup of the white space
-		 * configuration for the next block is done else where
-		 */
-		flags |= XDF_IGNORE_WHITESPACE;
+	a = container_of(eptr, const struct moved_entry, ent)->es;
+	b = container_of(entry_or_key, const struct moved_entry, ent)->es;
 
-	return !xdiff_compare_lines(a->es->line, a->es->len,
-				    b->es->line, b->es->len,
-				    flags);
+	return !xdiff_compare_lines(a->line + a->indent_off,
+				    a->len - a->indent_off,
+				    b->line + b->indent_off,
+				    b->len - b->indent_off, flags);
 }
 
 static struct moved_entry *prepare_entry(struct diff_options *o,
@@ -945,7 +923,8 @@ static struct moved_entry *prepare_entry(struct diff_options *o,
 	struct moved_entry *ret = xmalloc(sizeof(*ret));
 	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
-	unsigned int hash = xdiff_hash_string(l->line, l->len, flags);
+	unsigned int hash = xdiff_hash_string(l->line + l->indent_off,
+					      l->len - l->indent_off, flags);
 
 	hashmap_entry_init(&ret->ent, hash);
 	ret->es = l;
@@ -1113,14 +1092,11 @@ static void mark_color_as_moved(struct diff_options *o,
 			hashmap_for_each_entry_from(hm, match, ent) {
 				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
 				if (o->color_moved_ws_handling &
-				    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
-					if (compute_ws_delta(l, match->es,
-							     &pmb[pmb_nr].wsd))
-						pmb[pmb_nr++].match = match;
-				} else {
+				    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+					pmb[pmb_nr].wsd = compute_ws_delta(l, match->es);
+				else
 					pmb[pmb_nr].wsd = 0;
-					pmb[pmb_nr++].match = match;
-				}
+				pmb[pmb_nr++].match = match;
 			}
 
 			if (adjust_last_block(o, n, block_length) &&
@@ -6240,10 +6216,6 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 		if (o->color_moved) {
 			struct hashmap add_lines, del_lines;
 
-			if (o->color_moved_ws_handling &
-			    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-				o->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
-
 			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
 			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
 
-- 
gitgitgadget



* [PATCH 10/10] diff --color-moved: intern strings
  2021-06-14 13:04 [PATCH 00/10] diff --color-moved[-ws] speedups Phillip Wood via GitGitGadget
                   ` (8 preceding siblings ...)
  2021-06-14 13:04 ` [PATCH 09/10] diff --color-moved-ws=allow-indentation-change: improve hash lookups Phillip Wood via GitGitGadget
@ 2021-06-14 13:04 ` Phillip Wood via GitGitGadget
  2021-06-16 14:24 ` [PATCH 00/10] diff --color-moved[-ws] speedups Ævar Arnfjörð Bjarmason
  2021-07-20 10:36 ` [PATCH v2 00/12] " Phillip Wood via GitGitGadget
  11 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-06-14 13:04 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Taking inspiration from xdl_classify_record() assign an id to each
addition and deletion such that lines that match for the current
--color-moved-ws mode share the same unique id. This reduces the
number of hash lookups a little (calculating the ids still involves
one hash lookup per line) but the main benefit is that when growing
blocks of potentially moved lines we can replace string comparisons
which involve chasing a pointer with a simple integer comparison.  On
a large diff this commit reduces the time to run 'diff --color-moved'
by 33% and 'diff --color-moved-ws=allow-indentation-change' by 20%.
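
To illustrate the interning idea in isolation (a sketch under
simplifying assumptions, not the implementation below): every distinct
string gets a small integer the first time it is seen, and later
equality checks compare those integers. `struct intern_table`,
intern() and the fixed-size open-addressing table are hypothetical and
chosen only to keep the example short.

```c
#include <assert.h>
#include <string.h>

/* Illustrative fixed capacity; real code would grow the table. */
#define TABLE_SIZE 64

struct intern_table {
	const char *keys[TABLE_SIZE]; /* NULL marks an empty slot */
	unsigned ids[TABLE_SIZE];
	unsigned next_id;
};

/* FNV-1a, used here only as a simple example hash. */
static unsigned hash_str(const char *s)
{
	unsigned h = 2166136261u;

	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 16777619u;
	}
	return h;
}

/*
 * Return the id for s, assigning a fresh one the first time s is
 * seen. The table keeps a pointer to s, so s must outlive the table.
 */
static unsigned intern(struct intern_table *t, const char *s)
{
	unsigned slot = hash_str(s) % TABLE_SIZE;

	/* Sketch only: stop before the table fills up completely. */
	assert(t->next_id < TABLE_SIZE - 1);

	while (t->keys[slot]) {
		if (!strcmp(t->keys[slot], s))
			return t->ids[slot]; /* seen before */
		slot = (slot + 1) % TABLE_SIZE; /* linear probing */
	}
	t->keys[slot] = s;
	t->ids[slot] = t->next_id;
	return t->next_id++;
}
```

Once every line carries such an id, "do these two lines match?"
becomes `a_id == b_id`, with no need to follow the pointer to the line
text.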

Compared to master the time to run 'git log --patch --color-moved' is
increased by 2% and 'git log --patch
--color-moved-ws=allow-indentation-change' is reduced by 14%. These
timings were performed on an i5-7200U; on an i5-3470 both commands are
faster than master. The small speed decrease on commit sized diffs is
unfortunate but I think it is small enough to be worth it for the
gains on larger diffs.

Large diff before this change:
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
  Time (mean ± σ):      1.147 s ±  0.005 s    [User: 1.085 s, System: 0.059 s]
  Range (min … max):    1.140 s …  1.154 s    10 runs

Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
  Time (mean ± σ):      1.048 s ±  0.005 s    [User: 987.4 ms, System: 58.8 ms]
  Range (min … max):    1.043 s …  1.056 s    10 runs

Large diff after this change
Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
  Time (mean ± σ):     762.7 ms ±   2.8 ms    [User: 707.5 ms, System: 53.7 ms]
  Range (min … max):   758.0 ms … 767.0 ms    10 runs

Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
  Time (mean ± σ):     831.7 ms ±   1.7 ms    [User: 776.5 ms, System: 53.3 ms]
  Range (min … max):   829.2 ms … 835.1 ms    10 runs

Small diffs on master
Benchmark #1: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --no-color-moved-ws --no-merges -n1000 v2.29.0
  Time (mean ± σ):      1.567 s ±  0.001 s    [User: 1.443 s, System: 0.121 s]
  Range (min … max):    1.566 s …  1.571 s    10 runs

Benchmark #2: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change -n1000 --no-merges v2.29.0
  Time (mean ± σ):      1.865 s ±  0.008 s    [User: 1.748 s, System: 0.112 s]
  Range (min … max):    1.857 s …  1.881 s    10 runs

Small diffs after this change
Benchmark #1: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --no-color-moved-ws --no-merges -n1000 v2.29.0
  Time (mean ± σ):      1.597 s ±  0.003 s    [User: 1.413 s, System: 0.179 s]
  Range (min … max):    1.591 s …  1.601 s    10 runs

Benchmark #2: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change -n1000 --no-merges v2.29.0
  Time (mean ± σ):      1.606 s ±  0.006 s    [User: 1.420 s, System: 0.181 s]
  Range (min … max):    1.601 s …  1.622 s    10 runs
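
For reference, the percentages quoted above appear to follow from the
mean times in these runs. A small sketch of that arithmetic, assuming
the figures are relative changes of the means (reduction_percent() is
a hypothetical helper, not part of git):

```c
#include <assert.h>

/*
 * Relative reduction in running time, in percent. A negative result
 * means the "after" run was slower than the "before" run.
 */
static double reduction_percent(double before, double after)
{
	assert(before > 0);
	return 100.0 * (before - after) / before;
}
```

With the means above, reduction_percent(1.147, 0.7627) is about 33.5
and reduction_percent(1.048, 0.8317) is about 20.6 for the large
diff; for the small diffs, reduction_percent(1.567, 1.597) is about
-1.9 (a slowdown) and reduction_percent(1.865, 1.606) is about 13.9.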

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 173 ++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 96 insertions(+), 77 deletions(-)

diff --git a/diff.c b/diff.c
index e6f3586b39bf..3260e2c60591 100644
--- a/diff.c
+++ b/diff.c
@@ -18,6 +18,7 @@
 #include "submodule-config.h"
 #include "submodule.h"
 #include "hashmap.h"
+#include "mem-pool.h"
 #include "ll-merge.h"
 #include "string-list.h"
 #include "strvec.h"
@@ -772,6 +773,7 @@ struct emitted_diff_symbol {
 	int flags;
 	int indent_off;   /* Offset to first non-whitespace character */
 	int indent_width; /* The visual width of the indentation */
+	unsigned id;
 	enum diff_symbol s;
 };
 #define EMITTED_DIFF_SYMBOL_INIT {NULL}
@@ -797,9 +799,9 @@ static void append_emitted_diff_symbol(struct diff_options *o,
 }
 
 struct moved_entry {
-	struct hashmap_entry ent;
 	const struct emitted_diff_symbol *es;
 	struct moved_entry *next_line;
+	struct moved_entry *next_match;
 };
 
 struct moved_block {
@@ -866,24 +868,24 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 				 const struct emitted_diff_symbol *l,
 				 struct moved_block *pmb)
 {
-	int al = cur->es->len, bl = l->len;
-	const char *a = cur->es->line,
-		   *b = l->line;
-	int a_off = cur->es->indent_off,
-	    a_width = cur->es->indent_width,
-	    b_off = l->indent_off,
-	    b_width = l->indent_width;
+	int a_width = cur->es->indent_width, b_width = l->indent_width;
 	int delta;
 
-	/* If 'l' and 'cur' are both blank then they match. */
-	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE)
+	/* The text of each line must match */
+	if (cur->es->id != l->id)
+		return 1;
+
+	/*
+	 * If 'l' and 'cur' are both blank then we don't need to check the
+	 * indent. We only need to check cur as we know the strings match.
+	 * */
+	if (a_width == INDENT_BLANKLINE)
 		return 0;
 
 	/*
 	 * The indent changes of the block are known and stored in pmb->wsd;
 	 * however we need to check if the indent changes of the current line
-	 * match those of the current block and that the text of 'l' and 'cur'
-	 * after the indentation match.
+	 * match those of the current block.
 	 */
 	delta = b_width - a_width;
 
@@ -894,22 +896,26 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	if (pmb->wsd == INDENT_BLANKLINE)
 		pmb->wsd = delta;
 
-	return !(delta == pmb->wsd && al - a_off == bl - b_off &&
-		 !memcmp(a + a_off, b + b_off, al - a_off));
+	return delta != pmb->wsd;
 }
 
-static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
-			   const struct hashmap_entry *eptr,
-			   const struct hashmap_entry *entry_or_key,
-			   const void *keydata)
+struct interned_diff_symbol {
+	struct hashmap_entry ent;
+	struct emitted_diff_symbol *es;
+};
+
+static int interned_diff_symbol_cmp(const void *hashmap_cmp_fn_data,
+				    const struct hashmap_entry *eptr,
+				    const struct hashmap_entry *entry_or_key,
+				    const void *keydata)
 {
 	const struct diff_options *diffopt = hashmap_cmp_fn_data;
 	const struct emitted_diff_symbol *a, *b;
 	unsigned flags = diffopt->color_moved_ws_handling
 			 & XDF_WHITESPACE_FLAGS;
 
-	a = container_of(eptr, const struct moved_entry, ent)->es;
-	b = container_of(entry_or_key, const struct moved_entry, ent)->es;
+	a = container_of(eptr, const struct interned_diff_symbol, ent)->es;
+	b = container_of(entry_or_key, const struct interned_diff_symbol, ent)->es;
 
 	return !xdiff_compare_lines(a->line + a->indent_off,
 				    a->len - a->indent_off,
@@ -917,55 +923,81 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 				    b->len - b->indent_off, flags);
 }
 
-static struct moved_entry *prepare_entry(struct diff_options *o,
-					 int line_no)
+static void prepare_entry(struct diff_options *o, struct emitted_diff_symbol *l,
+			  struct interned_diff_symbol *s)
 {
-	struct moved_entry *ret = xmalloc(sizeof(*ret));
-	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 	unsigned int hash = xdiff_hash_string(l->line + l->indent_off,
 					      l->len - l->indent_off, flags);
 
-	hashmap_entry_init(&ret->ent, hash);
-	ret->es = l;
-	ret->next_line = NULL;
-
-	return ret;
+	hashmap_entry_init(&s->ent, hash);
+	s->es = l;
 }
 
-static void add_lines_to_move_detection(struct diff_options *o,
-					struct hashmap *add_lines,
-					struct hashmap *del_lines)
+struct moved_entry_list {
+	struct moved_entry *add, *del;
+};
+
+static struct moved_entry_list *add_lines_to_move_detection(struct diff_options *o,
+							    struct mem_pool *entry_mem_pool)
 {
 	struct moved_entry *prev_line = NULL;
-
+	struct mem_pool interned_pool;
+	struct hashmap interned_map;
+	struct moved_entry_list *entry_list = NULL;
+	size_t entry_list_alloc = 0;
+	unsigned id = 0;
 	int n;
+
+	hashmap_init(&interned_map, interned_diff_symbol_cmp, o, 8096);
+	mem_pool_init(&interned_pool, 1024 * 1024);
+
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
-		struct hashmap *hm;
-		struct moved_entry *key;
+		struct interned_diff_symbol key;
+		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
+		struct interned_diff_symbol *s;
+		struct moved_entry *entry;
 
-		switch (o->emitted_symbols->buf[n].s) {
-		case DIFF_SYMBOL_PLUS:
-			hm = add_lines;
-			break;
-		case DIFF_SYMBOL_MINUS:
-			hm = del_lines;
-			break;
-		default:
+		if (l->s != DIFF_SYMBOL_PLUS && l->s != DIFF_SYMBOL_MINUS) {
 			prev_line = NULL;
 			continue;
 		}
 
 		if (o->color_moved_ws_handling &
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-			fill_es_indent_data(&o->emitted_symbols->buf[n]);
-		key = prepare_entry(o, n);
-		if (prev_line && prev_line->es->s == o->emitted_symbols->buf[n].s)
-			prev_line->next_line = key;
+			fill_es_indent_data(l);
 
-		hashmap_add(hm, &key->ent);
-		prev_line = key;
+		prepare_entry(o, l, &key);
+		s = hashmap_get_entry(&interned_map, &key, ent, &key.ent);
+		if (s) {
+			l->id = s->es->id;
+		} else {
+			l->id = id;
+			ALLOC_GROW_BY(entry_list, id, 1, entry_list_alloc);
+			hashmap_add(&interned_map,
+				    memcpy(mem_pool_alloc(&interned_pool,
+							  sizeof(key)),
+					   &key, sizeof(key)));
+		}
+		entry = mem_pool_alloc(entry_mem_pool, sizeof(*entry));
+		entry->es = l;
+		entry->next_line = NULL;
+		if (prev_line && prev_line->es->s == l->s)
+			prev_line->next_line = entry;
+		prev_line = entry;
+		if (l->s == DIFF_SYMBOL_PLUS) {
+			entry->next_match = entry_list[l->id].add;
+			entry_list[l->id].add = entry;
+		} else {
+			entry->next_match = entry_list[l->id].del;
+			entry_list[l->id].del = entry;
+		}
 	}
+
+	hashmap_clear(&interned_map);
+	mem_pool_discard(&interned_pool, 0);
+
+	return entry_list;
 }
 
 static void pmb_advance_or_null(struct diff_options *o,
@@ -974,7 +1006,6 @@ static void pmb_advance_or_null(struct diff_options *o,
 				int *pmb_nr)
 {
 	int i, j;
-	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
 	for (i = 0, j = 0; i < *pmb_nr; i++) {
 		int match;
@@ -987,9 +1018,8 @@ static void pmb_advance_or_null(struct diff_options *o,
 			match = cur &&
 				!cmp_in_block_with_wsd(o, cur, l, &pmb[i]);
 		else
-			match = cur &&
-				xdiff_compare_lines(cur->es->line, cur->es->len,
-						    l->line, l->len, flags);
+			match = cur && cur->es->id == l->id;
+
 		if (match)
 			pmb[j++].match = cur;
 	}
@@ -1034,8 +1064,7 @@ static int adjust_last_block(struct diff_options *o, int n, int block_length)
 
 /* Find blocks of moved code, delegate actual coloring decision to helper */
 static void mark_color_as_moved(struct diff_options *o,
-				struct hashmap *add_lines,
-				struct hashmap *del_lines)
+				struct moved_entry_list *entry_list)
 {
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
@@ -1044,23 +1073,15 @@ static void mark_color_as_moved(struct diff_options *o,
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
-		struct hashmap *hm = NULL;
-		struct moved_entry *key;
 		struct moved_entry *match = NULL;
 		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
 
 		switch (l->s) {
 		case DIFF_SYMBOL_PLUS:
-			hm = del_lines;
-			key = prepare_entry(o, n);
-			match = hashmap_get_entry(hm, key, ent, NULL);
-			free(key);
+			match = entry_list[l->id].del;
 			break;
 		case DIFF_SYMBOL_MINUS:
-			hm = add_lines;
-			key = prepare_entry(o, n);
-			match = hashmap_get_entry(hm, key, ent, NULL);
-			free(key);
+			match = entry_list[l->id].add;
 			break;
 		default:
 			flipped_block = 0;
@@ -1089,7 +1110,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			 * The current line is the start of a new block.
 			 * Setup the set of potential blocks.
 			 */
-			hashmap_for_each_entry_from(hm, match, ent) {
+			for (; match; match = match->next_match) {
 				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
 				if (o->color_moved_ws_handling &
 				    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
@@ -1460,7 +1481,7 @@ static void emit_diff_symbol_from_struct(struct diff_options *o,
 static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
 			     const char *line, int len, unsigned flags)
 {
-	struct emitted_diff_symbol e = {line, len, flags, 0, 0, s};
+	struct emitted_diff_symbol e = {line, len, flags, 0, 0, 0, s};
 
 	if (o->emitted_symbols)
 		append_emitted_diff_symbol(o, &e);
@@ -6214,20 +6235,18 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 
 	if (o->emitted_symbols) {
 		if (o->color_moved) {
-			struct hashmap add_lines, del_lines;
-
-			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
-			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
+			struct mem_pool entry_pool;
+			struct moved_entry_list *entry_list;
 
-			add_lines_to_move_detection(o, &add_lines, &del_lines);
-			mark_color_as_moved(o, &add_lines, &del_lines);
+			mem_pool_init(&entry_pool, 1024 * 1024);
+			entry_list = add_lines_to_move_detection(o,
+								 &entry_pool);
+			mark_color_as_moved(o, entry_list);
 			if (o->color_moved == COLOR_MOVED_ZEBRA_DIM)
 				dim_moved_lines(o);
 
-			hashmap_clear_and_free(&add_lines, struct moved_entry,
-						ent);
-			hashmap_clear_and_free(&del_lines, struct moved_entry,
-						ent);
+			mem_pool_discard(&entry_pool, 0);
+			free(entry_list);
 		}
 
 		for (i = 0; i < esm.nr; i++)
-- 
gitgitgadget


* Re: [PATCH 01/10] diff --color-moved=zerba: fix alternate coloring
  2021-06-14 13:04 ` [PATCH 01/10] diff --color-moved=zerba: fix alternate coloring Phillip Wood via GitGitGadget
@ 2021-06-15  3:24   ` Junio C Hamano
  2021-06-15 11:22     ` Phillip Wood
  0 siblings, 1 reply; 92+ messages in thread
From: Junio C Hamano @ 2021-06-15  3:24 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget; +Cc: git, Phillip Wood

"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Subject: Re: [PATCH 01/10] diff --color-moved=zerba: fix alternate coloring

Zerba?

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> b0a2ba4776 ("diff --color-moved=zebra: be stricter with color
> alternation", 2018-11-23) sought to avoid using the alternate colors
> unless there are two adjacent moved blocks of the same
> sign. Unfortunately it contains two bugs that prevented it from fixing
> the problem properly. Firstly `last_symbol` is reset at the start of
> each iteration of the loop losing the symbol of the last line and
> secondly when deciding whether to use the alternate color it should be
> checking if the current line is the same sign of the last line, not a
> different sign. The combination of the two errors means that we still
> use the alternate color when we should do but we also use it when we
> shouldn't. This is most noticeable when using
> --color-moved-ws=allow-indentation-change with hunks like
>
> -this line gets indented
> +    this line gets indented
>
> where the post image is colored with newMovedAlternate rather than
> newMoved. While this does not matter much, the next commit will change
> the coloring to be correct in this case, so let's fix the bug here to
> make it clear why the output is changing and add a regression test.
>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>  diff.c                     |  4 +--
>  t/t4015-diff-whitespace.sh | 72 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 74 insertions(+), 2 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 52c791574b71..cb068f8258c0 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -1142,6 +1142,7 @@ static void mark_color_as_moved(struct diff_options *o,
>  	struct moved_block *pmb = NULL; /* potentially moved blocks */
>  	int pmb_nr = 0, pmb_alloc = 0;
>  	int n, flipped_block = 0, block_length = 0;
> +	enum diff_symbol last_symbol = 0;
>  
>  
>  	for (n = 0; n < o->emitted_symbols->nr; n++) {
> @@ -1149,7 +1150,6 @@ static void mark_color_as_moved(struct diff_options *o,
>  		struct moved_entry *key;
>  		struct moved_entry *match = NULL;
>  		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
> -		enum diff_symbol last_symbol = 0;
>  
>  		switch (l->s) {
>  		case DIFF_SYMBOL_PLUS:
> @@ -1214,7 +1214,7 @@ static void mark_color_as_moved(struct diff_options *o,
>  			}
>  
>  			if (adjust_last_block(o, n, block_length) &&
> -			    pmb_nr && last_symbol != l->s)
> +			    pmb_nr && last_symbol == l->s)
>  				flipped_block = (flipped_block + 1) % 2;
>  			else
>  				flipped_block = 0;
> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index 2c13b62d3c65..920114cd795c 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -1442,6 +1442,78 @@ test_expect_success 'detect permutations inside moved code -- dimmed-zebra' '
>  	test_cmp expected actual
>  '
>  
> +test_expect_success 'zebra alternate color is only used when necessary' '
> +	cat >old.txt <<-\EOF &&
> +	line 1A should be marked as oldMoved newMovedAlternate
> +	line 1B should be marked as oldMoved newMovedAlternate
> +	unchanged
> +	line 2A should be marked as oldMoved newMovedAlternate
> +	line 2B should be marked as oldMoved newMovedAlternate
> +	line 3A should be marked as oldMovedAlternate newMoved
> +	line 3B should be marked as oldMovedAlternate newMoved
> +	unchanged
> +	line 4A should be marked as oldMoved newMovedAlternate
> +	line 4B should be marked as oldMoved newMovedAlternate
> +	line 5A should be marked as oldMovedAlternate newMoved
> +	line 5B should be marked as oldMovedAlternate newMoved
> +	line 6A should be marked as oldMoved newMoved
> +	line 6B should be marked as oldMoved newMoved
> +	EOF
> +	cat >new.txt <<-\EOF &&
> +	  line 1A should be marked as oldMoved newMovedAlternate
> +	  line 1B should be marked as oldMoved newMovedAlternate
> +	unchanged
> +	  line 3A should be marked as oldMovedAlternate newMoved
> +	  line 3B should be marked as oldMovedAlternate newMoved
> +	  line 2A should be marked as oldMoved newMovedAlternate
> +	  line 2B should be marked as oldMoved newMovedAlternate
> +	unchanged
> +	  line 6A should be marked as oldMoved newMoved
> +	  line 6B should be marked as oldMoved newMoved
> +	    line 4A should be marked as oldMoved newMovedAlternate
> +	    line 4B should be marked as oldMoved newMovedAlternate
> +	  line 5A should be marked as oldMovedAlternate newMoved
> +	  line 5B should be marked as oldMovedAlternate newMoved
> +	EOF
> +	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
> +		 --color-moved-ws=allow-indentation-change \
> +		 old.txt new.txt >output &&
> +	grep -v index output | test_decode_color >actual &&
> +	cat >expected <<-\EOF &&
> +	<BOLD>diff --git a/old.txt b/new.txt<RESET>
> +	<BOLD>--- a/old.txt<RESET>
> +	<BOLD>+++ b/new.txt<RESET>
> +	<CYAN>@@ -1,14 +1,14 @@<RESET>
> +	<BOLD;MAGENTA>-line 1A should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;MAGENTA>-line 1B should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1A should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1B should be marked as oldMoved newMovedAlternate<RESET>
> +	 unchanged<RESET>
> +	<BOLD;MAGENTA>-line 2A should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;MAGENTA>-line 2B should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;BLUE>-line 3A should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;BLUE>-line 3B should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3A should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3B should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2A should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2B should be marked as oldMoved newMovedAlternate<RESET>
> +	 unchanged<RESET>
> +	<BOLD;MAGENTA>-line 4A should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;MAGENTA>-line 4B should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;BLUE>-line 5A should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;BLUE>-line 5B should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;MAGENTA>-line 6A should be marked as oldMoved newMoved<RESET>
> +	<BOLD;MAGENTA>-line 6B should be marked as oldMoved newMoved<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6A should be marked as oldMoved newMoved<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6B should be marked as oldMoved newMoved<RESET>
> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4A should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4B should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5A should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5B should be marked as oldMovedAlternate newMoved<RESET>
> +	EOF
> +	test_cmp expected actual
> +'
> +
>  test_expect_success 'cmd option assumes configured colored-moved' '
>  	test_config color.diff.oldMoved "magenta" &&
>  	test_config color.diff.newMoved "cyan" &&


* Re: [PATCH 01/10] diff --color-moved=zerba: fix alternate coloring
  2021-06-15  3:24   ` Junio C Hamano
@ 2021-06-15 11:22     ` Phillip Wood
  0 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood @ 2021-06-15 11:22 UTC (permalink / raw)
  To: Junio C Hamano, Phillip Wood via GitGitGadget; +Cc: git, Phillip Wood

On 15/06/2021 04:24, Junio C Hamano wrote:
> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> Subject: Re: [PATCH 01/10] diff --color-moved=zerba: fix alternate coloring
> 
> Zerba?

They're quite rare in the wild :-/ Thanks for pointing that out, I'll fix
it when I re-roll.

Best Wishes

Phillip

>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> b0a2ba4776 ("diff --color-moved=zebra: be stricter with color
>> alternation", 2018-11-23) sought to avoid using the alternate colors
>> unless there are two adjacent moved blocks of the same
>> sign. Unfortunately it contains two bugs that prevented it from fixing
>> the problem properly. Firstly `last_symbol` is reset at the start of
>> each iteration of the loop losing the symbol of the last line and
>> secondly when deciding whether to use the alternate color it should be
>> checking if the current line is the same sign of the last line, not a
>> different sign. The combination of the two errors means that we still
>> use the alternate color when we should do but we also use it when we
>> shouldn't. This is most noticeable when using
>> --color-moved-ws=allow-indentation-change with hunks like
>>
>> -this line gets indented
>> +    this line gets indented
>>
>> where the post image is colored with newMovedAlternate rather than
>> newMoved. While this does not matter much, the next commit will change
>> the coloring to be correct in this case, so let's fix the bug here to
>> make it clear why the output is changing and add a regression test.
>>
>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>> ---
>>   diff.c                     |  4 +--
>>   t/t4015-diff-whitespace.sh | 72 ++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 74 insertions(+), 2 deletions(-)
>>
>> diff --git a/diff.c b/diff.c
>> index 52c791574b71..cb068f8258c0 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -1142,6 +1142,7 @@ static void mark_color_as_moved(struct diff_options *o,
>>   	struct moved_block *pmb = NULL; /* potentially moved blocks */
>>   	int pmb_nr = 0, pmb_alloc = 0;
>>   	int n, flipped_block = 0, block_length = 0;
>> +	enum diff_symbol last_symbol = 0;
>>   
>>   
>>   	for (n = 0; n < o->emitted_symbols->nr; n++) {
>> @@ -1149,7 +1150,6 @@ static void mark_color_as_moved(struct diff_options *o,
>>   		struct moved_entry *key;
>>   		struct moved_entry *match = NULL;
>>   		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
>> -		enum diff_symbol last_symbol = 0;
>>   
>>   		switch (l->s) {
>>   		case DIFF_SYMBOL_PLUS:
>> @@ -1214,7 +1214,7 @@ static void mark_color_as_moved(struct diff_options *o,
>>   			}
>>   
>>   			if (adjust_last_block(o, n, block_length) &&
>> -			    pmb_nr && last_symbol != l->s)
>> +			    pmb_nr && last_symbol == l->s)
>>   				flipped_block = (flipped_block + 1) % 2;
>>   			else
>>   				flipped_block = 0;
>> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
>> index 2c13b62d3c65..920114cd795c 100755
>> --- a/t/t4015-diff-whitespace.sh
>> +++ b/t/t4015-diff-whitespace.sh
>> @@ -1442,6 +1442,78 @@ test_expect_success 'detect permutations inside moved code -- dimmed-zebra' '
>>   	test_cmp expected actual
>>   '
>>   
>> +test_expect_success 'zebra alternate color is only used when necessary' '
>> +	cat >old.txt <<-\EOF &&
>> +	line 1A should be marked as oldMoved newMovedAlternate
>> +	line 1B should be marked as oldMoved newMovedAlternate
>> +	unchanged
>> +	line 2A should be marked as oldMoved newMovedAlternate
>> +	line 2B should be marked as oldMoved newMovedAlternate
>> +	line 3A should be marked as oldMovedAlternate newMoved
>> +	line 3B should be marked as oldMovedAlternate newMoved
>> +	unchanged
>> +	line 4A should be marked as oldMoved newMovedAlternate
>> +	line 4B should be marked as oldMoved newMovedAlternate
>> +	line 5A should be marked as oldMovedAlternate newMoved
>> +	line 5B should be marked as oldMovedAlternate newMoved
>> +	line 6A should be marked as oldMoved newMoved
>> +	line 6B should be marked as oldMoved newMoved
>> +	EOF
>> +	cat >new.txt <<-\EOF &&
>> +	  line 1A should be marked as oldMoved newMovedAlternate
>> +	  line 1B should be marked as oldMoved newMovedAlternate
>> +	unchanged
>> +	  line 3A should be marked as oldMovedAlternate newMoved
>> +	  line 3B should be marked as oldMovedAlternate newMoved
>> +	  line 2A should be marked as oldMoved newMovedAlternate
>> +	  line 2B should be marked as oldMoved newMovedAlternate
>> +	unchanged
>> +	  line 6A should be marked as oldMoved newMoved
>> +	  line 6B should be marked as oldMoved newMoved
>> +	    line 4A should be marked as oldMoved newMovedAlternate
>> +	    line 4B should be marked as oldMoved newMovedAlternate
>> +	  line 5A should be marked as oldMovedAlternate newMoved
>> +	  line 5B should be marked as oldMovedAlternate newMoved
>> +	EOF
>> +	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
>> +		 --color-moved-ws=allow-indentation-change \
>> +		 old.txt new.txt >output &&
>> +	grep -v index output | test_decode_color >actual &&
>> +	cat >expected <<-\EOF &&
>> +	<BOLD>diff --git a/old.txt b/new.txt<RESET>
>> +	<BOLD>--- a/old.txt<RESET>
>> +	<BOLD>+++ b/new.txt<RESET>
>> +	<CYAN>@@ -1,14 +1,14 @@<RESET>
>> +	<BOLD;MAGENTA>-line 1A should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;MAGENTA>-line 1B should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1A should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1B should be marked as oldMoved newMovedAlternate<RESET>
>> +	 unchanged<RESET>
>> +	<BOLD;MAGENTA>-line 2A should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;MAGENTA>-line 2B should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;BLUE>-line 3A should be marked as oldMovedAlternate newMoved<RESET>
>> +	<BOLD;BLUE>-line 3B should be marked as oldMovedAlternate newMoved<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3A should be marked as oldMovedAlternate newMoved<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3B should be marked as oldMovedAlternate newMoved<RESET>
>> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2A should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2B should be marked as oldMoved newMovedAlternate<RESET>
>> +	 unchanged<RESET>
>> +	<BOLD;MAGENTA>-line 4A should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;MAGENTA>-line 4B should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;BLUE>-line 5A should be marked as oldMovedAlternate newMoved<RESET>
>> +	<BOLD;BLUE>-line 5B should be marked as oldMovedAlternate newMoved<RESET>
>> +	<BOLD;MAGENTA>-line 6A should be marked as oldMoved newMoved<RESET>
>> +	<BOLD;MAGENTA>-line 6B should be marked as oldMoved newMoved<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6A should be marked as oldMoved newMoved<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6B should be marked as oldMoved newMoved<RESET>
>> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4A should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4B should be marked as oldMoved newMovedAlternate<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5A should be marked as oldMovedAlternate newMoved<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5B should be marked as oldMovedAlternate newMoved<RESET>
>> +	EOF
>> +	test_cmp expected actual
>> +'
>> +
>>   test_expect_success 'cmd option assumes configured colored-moved' '
>>   	test_config color.diff.oldMoved "magenta" &&
>>   	test_config color.diff.newMoved "cyan" &&


* Re: [PATCH 00/10] diff --color-moved[-ws] speedups
  2021-06-14 13:04 [PATCH 00/10] diff --color-moved[-ws] speedups Phillip Wood via GitGitGadget
                   ` (9 preceding siblings ...)
  2021-06-14 13:04 ` [PATCH 10/10] diff --color-moved: intern strings Phillip Wood via GitGitGadget
@ 2021-06-16 14:24 ` Ævar Arnfjörð Bjarmason
  2021-06-21 10:03   ` Phillip Wood
  2021-07-20 10:36 ` [PATCH v2 00/12] " Phillip Wood via GitGitGadget
  11 siblings, 1 reply; 92+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-16 14:24 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget; +Cc: git, Phillip Wood


On Mon, Jun 14 2021, Phillip Wood via GitGitGadget wrote:

> The current implementation of diff --color-moved-ws=allow-indentation-change
> is considerably slower than the implementation of diff --color-moved which
> is in turn slower than a regular diff. This patch series starts with a
> couple of bug fixes and then reworks the implementation of diff
> --color-moved and diff --color-moved-ws=allow-indentation-change to speed
> them up on large diffs. The time to run git diff --color-moved
> --no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run
> git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0
> v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs
> with --color-moved - the time to run git log -p --color-moved
> --no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent
> processors. On older processors these patches reduce the running time in all
> cases that I've tested. In general the larger the diff the larger the speed
> up. As an extreme example the time to run diff --color-moved
> --color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8
> minutes to 6 seconds.
>
> Phillip Wood (10):
>   diff --color-moved=zerba: fix alternate coloring
>   diff --color-moved: avoid false short line matches and bad zerba
>     coloring
>   diff: simplify allow-indentation-change delta calculation
>   diff --color-moved-ws=allow-indentation-change: simplify and optimize
>   diff --color-moved: call comparison function directly
>   diff --color-moved: unify moved block growth functions
>   diff --color-moved: shrink potential moved blocks as we go
>   diff --color-moved: stop clearing potential moved blocks
>   diff --color-moved-ws=allow-indentation-change: improve hash lookups
>   diff --color-moved: intern strings

Nice to see these land after the earlier on-list reference to them.

I skimmed these mostly, and am not familiar with this code, but didn't
see any glaring things missing. There was one existing oddity with
assigning a 0 to an "enum diff_symbol", don't we want
DIFF_SYMBOL_BINARY_DIFF_HEADER? In any case, it's just a line you touch
in 02/10, and pre-dates these changes.

One thing I would very much like to see here is a conversion of the
existing ad-hoc benchmarks you note in commit messages to something that
lives in t/perf/, it really helps future maintenance of perf-sensitive
code to be able to re-run those, and I for one find the output much
easier to read than whatever tool you're using to produce your
benchmarks.


* Re: [PATCH 00/10] diff --color-moved[-ws] speedups
  2021-06-16 14:24 ` [PATCH 00/10] diff --color-moved[-ws] speedups Ævar Arnfjörð Bjarmason
@ 2021-06-21 10:03   ` Phillip Wood
  0 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood @ 2021-06-21 10:03 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason,
	Phillip Wood via GitGitGadget
  Cc: git, Phillip Wood

On 16/06/2021 15:24, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Jun 14 2021, Phillip Wood via GitGitGadget wrote:
> 
>> The current implementation of diff --color-moved-ws=allow-indentation-change
>> is considerably slower than the implementation of diff --color-moved which
>> is in turn slower than a regular diff. This patch series starts with a
>> couple of bug fixes and then reworks the implementation of diff
>> --color-moved and diff --color-moved-ws=allow-indentation-change to speed
>> them up on large diffs. The time to run git diff --color-moved
>> --no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run
>> git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0
>> v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs
>> with --color-moved - the time to run git log -p --color-moved
>> --no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent
>> processors. On older processors these patches reduce the running time in all
>> cases that I've tested. In general the larger the diff the larger the speed
>> up. As an extreme example the time to run diff --color-moved
>> --color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8
>> minutes to 6 seconds.
>>
>> Phillip Wood (10):
>>    diff --color-moved=zerba: fix alternate coloring
>>    diff --color-moved: avoid false short line matches and bad zerba
>>      coloring
>>    diff: simplify allow-indentation-change delta calculation
>>    diff --color-moved-ws=allow-indentation-change: simplify and optimize
>>    diff --color-moved: call comparison function directly
>>    diff --color-moved: unify moved block growth functions
>>    diff --color-moved: shrink potential moved blocks as we go
>>    diff --color-moved: stop clearing potential moved blocks
>>    diff --color-moved-ws=allow-indentation-change: improve hash lookups
>>    diff --color-moved: intern strings
> 
> Nice to see these land after the earlier on-list reference to them.
> 
> I skimmed these mostly, and am not familiar with this code, but didn't
> see any glaring things missing. There was one existing oddity with
> assigning a 0 to an "enum diff_symbol", don't we want
> DIFF_SYMBOL_BINARY_DIFF_HEADER? In any case, it's just a line you touch
> in 02/10, and pre-dates these changes.

Thanks for taking a look at these patches. I take your point about the
assignment, though I don't think the actual value matters so long as it's
not DIFF_SYMBOL_PLUS or DIFF_SYMBOL_MINUS.
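
To illustrate why the initial value does not matter, here is a minimal
sketch of the same-sign check. It is deliberately simplified: the enum and
the helper use_alternate() are hypothetical stand-ins rather than the code
in diff.c, and the adjust_last_block() and pmb_nr conditions are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the relevant enum diff_symbol values. */
enum line_sym {
	SYM_NONE,	/* sentinel: no previous moved line */
	SYM_PLUS,
	SYM_MINUS
};

/*
 * Decide whether a new moved block should use the alternate color.
 * Only PLUS and MINUS lines reach this check, so any sentinel that is
 * neither of them can never compare equal to cur.
 */
static bool use_alternate(enum line_sym last, enum line_sym cur, bool flipped)
{
	if (last == cur)
		return !flipped;	/* adjacent block of the same sign: flip */
	return false;			/* otherwise use the plain moved color */
}
```

With SYM_NONE as the previous symbol the helper always returns false, which
is the behaviour wanted for the first moved block in a run.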

> One thing I would very much like to see here is a conversion of the
> existing ad-hoc benchmarks you note in commit messages to something that
> lives in t/perf/, it really helps future maintenance of perf-sensitive
> code to be able to re-run those, and I for one find the output much
> easier to read than whatever tool you're using to produce your
> benchmarks.

Adding some perf tests is a good idea. I'll do that when I reroll, which
may take a couple of weeks as I'm going offline for a while at the end
of the week. The tool I have been using is hyperfine[1]; it has been
used by a few other contributors (see `git log --grep σ` if you're
interested).

[1] https://github.com/sharkdp/hyperfine

Best Wishes

Phillip



* Re: [PATCH 09/10] diff --color-moved-ws=allow-indentation-change: improve hash lookups
  2021-06-14 13:04 ` [PATCH 09/10] diff --color-moved-ws=allow-indentation-change: improve hash lookups Phillip Wood via GitGitGadget
@ 2021-07-09 15:36   ` Elijah Newren
  0 siblings, 0 replies; 92+ messages in thread
From: Elijah Newren @ 2021-07-09 15:36 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget; +Cc: Git Mailing List, Phillip Wood

On Mon, Jun 14, 2021 at 6:06 AM Phillip Wood via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> As libxdiff does not have a whitespace flag to ignore the indentation
> the code for --color-moved-ws=allow-indentation-change uses
> XDF_IGNORE_WHITESPACE and then filters out any hash lookups where
> there are non-indentation changes. This is filtering is inefficient as

s/This is filtering is/This filtering is/


* [PATCH v2 00/12] diff --color-moved[-ws] speedups
  2021-06-14 13:04 [PATCH 00/10] diff --color-moved[-ws] speedups Phillip Wood via GitGitGadget
                   ` (10 preceding siblings ...)
  2021-06-16 14:24 ` [PATCH 00/10] diff --color-moved[-ws] speedups Ævar Arnfjörð Bjarmason
@ 2021-07-20 10:36 ` Phillip Wood via GitGitGadget
  2021-07-20 10:36   ` [PATCH v2 01/12] diff --color-moved: add perf tests Phillip Wood via GitGitGadget
                     ` (13 more replies)
  11 siblings, 14 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-07-20 10:36 UTC (permalink / raw)
  To: git; +Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Phillip Wood

Thanks to Ævar and Elijah for their comments. I've reworded the commit
messages, addressed the enum initialization issue in patch 2 (now 3) and
added some perf tests.

There are two new patches in this round. The first patch adds the perf
tests suggested by Ævar, and the penultimate patch converts the existing
code to use a designated initializer.

I've converted the benchmark results in the commit messages to use the new
tests. The percentage changes are broadly similar to the previous results,
though I ended up running them on a different computer this time.
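
For readers checking the figures, the percentage column in the perf tables
in the range-diff can be reproduced with simple arithmetic. This is a
minimal sketch, assuming the column is the relative change in elapsed time
from HEAD^ to HEAD; pct_change() is a hypothetical helper and not part of
git or t/perf.

```c
/*
 * Relative change from the "before" time to the "after" time, in percent.
 * Negative values mean the run got faster. Times are in seconds, and
 * before must be non-zero.
 */
static double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

For test 4002.3 in the "simplify and optimize" commit, the times 13.68 and
0.92 give roughly -93.3, matching the table.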

V1 cover letter:

The current implementation of diff --color-moved-ws=allow-indentation-change
is considerably slower than the implementation of diff --color-moved which
is in turn slower than a regular diff. This patch series starts with a
couple of bug fixes and then reworks the implementation of diff
--color-moved and diff --color-moved-ws=allow-indentation-change to speed
them up on large diffs. The time to run git diff --color-moved
--no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run
git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0
v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs
with --color-moved - the time to run git log -p --color-moved
--no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent
processors. On older processors these patches reduce the running time in all
cases that I've tested. In general the larger the diff the larger the speed
up. As an extreme example the time to run diff --color-moved
--color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8
minutes to 6 seconds.

Phillip Wood (12):
  diff --color-moved: add perf tests
  diff --color-moved=zebra: fix alternate coloring
  diff --color-moved: avoid false short line matches and bad zerba
    coloring
  diff: simplify allow-indentation-change delta calculation
  diff --color-moved-ws=allow-indentation-change: simplify and optimize
  diff --color-moved: call comparison function directly
  diff --color-moved: unify moved block growth functions
  diff --color-moved: shrink potential moved blocks as we go
  diff --color-moved: stop clearing potential moved blocks
  diff --color-moved-ws=allow-indentation-change: improve hash lookups
  diff: use designated initializers for emitted_diff_symbol
  diff --color-moved: intern strings

 diff.c                           | 377 ++++++++++++-------------------
 t/perf/p4002-diff-color-moved.sh |  45 ++++
 t/t4015-diff-whitespace.sh       | 137 +++++++++++
 3 files changed, 323 insertions(+), 236 deletions(-)
 create mode 100755 t/perf/p4002-diff-color-moved.sh


base-commit: 211eca0895794362184da2be2a2d812d070719d3
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-981%2Fphillipwood%2Fwip%2Fdiff-color-moved-tweaks-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-981/phillipwood/wip/diff-color-moved-tweaks-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/981

Range-diff vs v1:

  -:  ----------- >  1:  8fc8914a37b diff --color-moved: add perf tests
  1:  374dbebcbf2 !  2:  9b4e4d2674a diff --color-moved=zerba: fix alternate coloring
     @@ Metadata
      Author: Phillip Wood <phillip.wood@dunelm.org.uk>
      
       ## Commit message ##
     -    diff --color-moved=zerba: fix alternate coloring
     +    diff --color-moved=zebra: fix alternate coloring
      
          b0a2ba4776 ("diff --color-moved=zebra: be stricter with color
          alternation", 2018-11-23) sought to avoid using the alternate colors
  2:  3d02a0a91a0 !  3:  5512145c70f diff --color-moved: avoid false short line matches and bad zerba coloring
     @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
       	int pmb_nr = 0, pmb_alloc = 0;
       	int n, flipped_block = 0, block_length = 0;
      -	enum diff_symbol last_symbol = 0;
     -+	enum diff_symbol moved_symbol = 0;
     ++	enum diff_symbol moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
       
       
       	for (n = 0; n < o->emitted_symbols->nr; n++) {
     @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
      -			last_symbol = l->s;
      +		}
      +		if (!match) {
     -+			moved_symbol = 0;
     ++			moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
       			continue;
       		}
       
     @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
      +			if (pmb_nr)
      +				moved_symbol = l->s;
      +			else
     -+				moved_symbol = 0;
     ++				moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
      +
       			block_length = 0;
       		}
  3:  30f0ed44768 =  4:  93fdef30d64 diff: simplify allow-indentation-change delta calculation
  4:  ebb6eec1d92 !  5:  6b7a8aed4ec diff --color-moved-ws=allow-indentation-change: simplify and optimize
     @@ Commit message
          comparison to filter out the non-matching lines. Fixing this reduces
          time to run
            git diff --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
     -    by 88% and simplifies the code.
     +    by 93% compared to master and simplifies the code.
      
     -    Before this change
     -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
     -      Time (mean ± σ):      9.978 s ±  0.042 s    [User: 9.905 s, System: 0.057 s]
     -      Range (min … max):    9.917 s … 10.037 s    10 runs
     -
     -    After this change
     -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
     -      Time (mean ± σ):      1.220 s ±  0.004 s    [User: 1.160 s, System: 0.058 s]
     -      Range (min … max):    1.214 s …  1.226 s    10 runs
     +    Test                                                                  HEAD^              HEAD
     +    ---------------------------------------------------------------------------------------------------------------
     +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41( 0.38+0.03)   0.41(0.37+0.04)  +0.0%
     +    4002.2: diff --color-moved --no-color-moved-ws large change           0.83( 0.79+0.04)   0.82(0.79+0.02)  -1.2%
     +    4002.3: diff --color-moved-ws=allow-indentation-change large change  13.68(13.59+0.07)   0.92(0.89+0.03) -93.3%
     +    4002.4: log --no-color-moved --no-color-moved-ws                      1.31( 1.22+0.08)   1.31(1.21+0.10)  +0.0%
     +    4002.5: log --color-moved --no-color-moved-ws                         1.47( 1.40+0.07)   1.47(1.36+0.10)  +0.0%
     +    4002.6: log --color-moved-ws=allow-indentation-change                 1.87( 1.77+0.09)   1.50(1.41+0.09) -19.8%
      
          Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
      
  5:  cec0c2d04d7 !  6:  cfbdd447eee diff --color-moved: call comparison function directly
     @@ Metadata
       ## Commit message ##
          diff --color-moved: call comparison function directly
      
     -    Calling xdiff_compare_lines() directly rather than using a function
     -    pointer from the hash map reduces the time very slightly but more
     -    importantly it will allow us to easily combine pmb_advance_or_null()
     -    and pmb_advance_or_null_multi_match() in the next commit.
     +    This change will allow us to easily combine pmb_advance_or_null() and
     +    pmb_advance_or_null_multi_match() in the next commit. Calling
     +    xdiff_compare_lines() directly rather than using a function pointer
     +    from the hash map has little effect on the run time.
      
     -    Before this change
     -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
     -      Time (mean ± σ):      1.136 s ±  0.004 s    [User: 1.079 s, System: 0.053 s]
     -      Range (min … max):    1.130 s …  1.141 s    10 runs
     -
     -    After this change
     -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
     -      Time (mean ± σ):      1.118 s ±  0.003 s    [User: 1.062 s, System: 0.053 s]
     -      Range (min … max):    1.114 s …  1.121 s    10 runs
     +    Test                                                                  HEAD^             HEAD
     +    -------------------------------------------------------------------------------------------------------------
     +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41(0.37+0.04)   0.41(0.39+0.02) +0.0%
     +    4002.2: diff --color-moved --no-color-moved-ws large change           0.82(0.79+0.02)   0.83(0.79+0.03) +1.2%
     +    4002.3: diff --color-moved-ws=allow-indentation-change large change   0.92(0.89+0.03)   0.91(0.85+0.05) -1.1%
     +    4002.4: log --no-color-moved --no-color-moved-ws                      1.31(1.21+0.10)   1.33(1.22+0.10) +1.5%
     +    4002.5: log --color-moved --no-color-moved-ws                         1.47(1.36+0.10)   1.47(1.39+0.08) +0.0%
     +    4002.6: log --color-moved-ws=allow-indentation-change                 1.50(1.41+0.09)   1.51(1.42+0.09) +0.7%
      
          Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
      
  6:  050cef0081d =  7:  73ce9b54e86 diff --color-moved: unify moved block growth functions
  7:  9390e9a66eb =  8:  ef8ce0e6ebc diff --color-moved: shrink potential moved blocks as we go
  8:  1de99ac2bc3 =  9:  9d0a042eae1 diff --color-moved: stop clearing potential moved blocks
  9:  41cdedd6090 ! 10:  dd365ad115f diff --color-moved-ws=allow-indentation-change: improve hash lookups
     @@ Commit message
          As libxdiff does not have a whitespace flag to ignore the indentation
          the code for --color-moved-ws=allow-indentation-change uses
          XDF_IGNORE_WHITESPACE and then filters out any hash lookups where
     -    there are non-indentation changes. This is filtering is inefficient as
     +    there are non-indentation changes. This filtering is inefficient as
          we have to perform another string comparison.
      
          By using the offset data that we have already computed to skip the
          indentation we can avoid using XDF_IGNORE_WHITESPACE and safely remove
     -    the extra checks which improves the performance by 14% and paves the
     +    the extra checks which improves the performance by 11% and paves the
          way for the elimination of string comparisons in the next commit.
      
     -    This change slightly increases the runtime of other --color-moved
     +    This change slightly increases the run time of other --color-moved
          modes. This could be avoided by using different comparison functions
     -    for the different modes but after the changes in the next commit there
     -    is no measurable benefit.
     +    for the different modes but after the next two commits there is no
     +    measurable benefit in doing so.
      
     -    Before this change
     -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
     -      Time (mean ± σ):      1.116 s ±  0.005 s    [User: 1.057 s, System: 0.056 s]
     -      Range (min … max):    1.109 s …  1.123 s    10 runs
     -
     -    Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
     -      Time (mean ± σ):      1.216 s ±  0.005 s    [User: 1.155 s, System: 0.059 s]
     -      Range (min … max):    1.206 s …  1.223 s    10 runs
     -
     -    After this change
     -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
     -      Time (mean ± σ):      1.147 s ±  0.005 s    [User: 1.085 s, System: 0.059 s]
     -      Range (min … max):    1.140 s …  1.154 s    10 runs
     -
     -    Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
     -      Time (mean ± σ):      1.048 s ±  0.005 s    [User: 987.4 ms, System: 58.8 ms]
     -      Range (min … max):    1.043 s …  1.056 s    10 runs
     +    Test                                                                  HEAD^             HEAD
     +    --------------------------------------------------------------------------------------------------------------
     +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41(0.38+0.03)   0.41(0.36+0.04) +0.0%
     +    4002.2: diff --color-moved --no-color-moved-ws large change           0.82(0.76+0.05)   0.84(0.79+0.04) +2.4%
     +    4002.3: diff --color-moved-ws=allow-indentation-change large change   0.91(0.88+0.03)   0.81(0.74+0.06) -11.0%
     +    4002.4: log --no-color-moved --no-color-moved-ws                      1.32(1.21+0.10)   1.31(1.19+0.11) -0.8%
     +    4002.5: log --color-moved --no-color-moved-ws                         1.47(1.37+0.10)   1.47(1.36+0.11) +0.0%
     +    4002.6: log --color-moved-ws=allow-indentation-change                 1.51(1.42+0.09)   1.48(1.37+0.10) -2.0%
      
          Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
      
  -:  ----------- > 11:  c160222ab3c diff: use designated initializers for emitted_diff_symbol
 10:  220664dd907 ! 12:  753554587f9 diff --color-moved: intern strings
     @@ Commit message
          number of hash lookups a little (calculating the ids still involves
          one hash lookup per line) but the main benefit is that when growing
          blocks of potentially moved lines we can replace string comparisons
     -    which involve chasing a pointer with a simple integer comparison.  On
     -    a large diff this commit reduces the time to run 'diff --color-moved'
     -    by 33% and 'diff --color-moved-ws=allow-indentation-change' by 20%.
     +    which involve chasing a pointer with a simple integer comparison.
      
     -    Compared to master the time to run 'git log --patch --color-moved' is
     -    increased by 2% and 'git log --patch
     -    --color-moved-ws=allow-indentation-change' in reduced by 14%. These
     -    timings were performed on an i5-7200U, on an i5-3470 both commands are
     -    faster than master. The small speed decrease on commit sized diffs is
     -    unfortunate but I think it is small enough to be worth it for the
     -    gains on larger diffs.
     +    On a large diff this commit reduces the time to run
     +       diff --color-moved
     +    by 33% and
     +        diff --color-moved-ws=allow-indentation-change
     +    by 26%. Compared to master the time to run
     +        diff --color-moved-ws=allow-indentation-change
     +    is now reduced by 95% and the overhead compared to --no-color-moved is
     +    reduced to 50%.
      
     -    Large diff before this change:
     -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
     -      Time (mean ± σ):      1.147 s ±  0.005 s    [User: 1.085 s, System: 0.059 s]
     -      Range (min … max):    1.140 s …  1.154 s    10 runs
     +    Compared to the previous commit the time to run
     +        git log --patch --color-moved
     +    is increased slightly, but compared to master there is no change in
     +    run time.
      
     -    Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
     -      Time (mean ± σ):      1.048 s ±  0.005 s    [User: 987.4 ms, System: 58.8 ms]
     -      Range (min … max):    1.043 s …  1.056 s    10 runs
     +    Test                                                                  HEAD^             HEAD
     +    --------------------------------------------------------------------------------------------------------------
     +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41(0.36+0.04)   0.41(0.37+0.03)  +0.0%
     +    4002.2: diff --color-moved --no-color-moved-ws large change           0.83(0.79+0.03)   0.55(0.52+0.03) -33.7%
     +    4002.3: diff --color-moved-ws=allow-indentation-change large change   0.81(0.77+0.04)   0.60(0.55+0.05) -25.9%
     +    4002.4: log --no-color-moved --no-color-moved-ws                      1.30(1.20+0.09)   1.31(1.22+0.08)  +0.8%
     +    4002.5: log --color-moved --no-color-moved-ws                         1.46(1.35+0.11)   1.47(1.30+0.16)  +0.7%
     +    4002.6: log --color-moved-ws=allow-indentation-change                 1.46(1.38+0.07)   1.47(1.34+0.13)  +0.7%
      
     -    Large diff after this change
     -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
     -      Time (mean ± σ):     762.7 ms ±   2.8 ms    [User: 707.5 ms, System: 53.7 ms]
     -      Range (min … max):   758.0 ms … 767.0 ms    10 runs
     -
     -    Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
     -      Time (mean ± σ):     831.7 ms ±   1.7 ms    [User: 776.5 ms, System: 53.3 ms]
     -      Range (min … max):   829.2 ms … 835.1 ms    10 runs
     -
     -    Small diffs on master
     -    Benchmark #1: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --no-color-moved-ws --no-merges -n1000 v2.29.0
     -      Time (mean ± σ):      1.567 s ±  0.001 s    [User: 1.443 s, System: 0.121 s]
     -      Range (min … max):    1.566 s …  1.571 s    10 runs
     -
     -    Benchmark #2: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change -n1000 --no-merges v2.29.0
     -      Time (mean ± σ):      1.865 s ±  0.008 s    [User: 1.748 s, System: 0.112 s]
     -      Range (min … max):    1.857 s …  1.881 s    10 runs
     -
     -    Small diffs after this change
     -    Benchmark #1: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --no-color-moved-ws --no-merges -n1000 v2.29.0
     -      Time (mean ± σ):      1.597 s ±  0.003 s    [User: 1.413 s, System: 0.179 s]
     -      Range (min … max):    1.591 s …  1.601 s    10 runs
     -
     -    Benchmark #2: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change -n1000 --no-merges v2.29.0
     -      Time (mean ± σ):      1.606 s ±  0.006 s    [User: 1.420 s, System: 0.181 s]
     -      Range (min … max):    1.601 s …  1.622 s    10 runs
     +    Test                                                                  master            HEAD
     +    --------------------------------------------------------------------------------------------------------------
     +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.40( 0.36+0.03)  0.41(0.37+0.03)  +2.5%
     +    4002.2: diff --color-moved --no-color-moved-ws large change           0.82( 0.77+0.04)  0.55(0.52+0.03) -32.9%
     +    4002.3: diff --color-moved-ws=allow-indentation-change large change  14.10(14.04+0.04)  0.60(0.55+0.05) -95.7%
     +    4002.4: log --no-color-moved --no-color-moved-ws                      1.31( 1.21+0.09)  1.31(1.22+0.08)  +0.0%
     +    4002.5: log --color-moved --no-color-moved-ws                         1.47( 1.37+0.09)  1.47(1.30+0.16)  +0.0%
     +    4002.6: log --color-moved-ws=allow-indentation-change                 1.86( 1.76+0.10)  1.47(1.34+0.13) -21.0%
      
          Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
      
     @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
       				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
       				if (o->color_moved_ws_handling &
       				    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
     -@@ diff.c: static void emit_diff_symbol_from_struct(struct diff_options *o,
     - static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
     - 			     const char *line, int len, unsigned flags)
     - {
     --	struct emitted_diff_symbol e = {line, len, flags, 0, 0, s};
     -+	struct emitted_diff_symbol e = {line, len, flags, 0, 0, 0, s};
     - 
     - 	if (o->emitted_symbols)
     - 		append_emitted_diff_symbol(o, &e);
      @@ diff.c: static void diff_flush_patch_all_file_pairs(struct diff_options *o)
       
       	if (o->emitted_symbols) {

-- 
gitgitgadget


* [PATCH v2 01/12] diff --color-moved: add perf tests
  2021-07-20 10:36 ` [PATCH v2 00/12] " Phillip Wood via GitGitGadget
@ 2021-07-20 10:36   ` Phillip Wood via GitGitGadget
  2021-07-20 10:36   ` [PATCH v2 02/12] diff --color-moved=zebra: fix alternate coloring Phillip Wood via GitGitGadget
                     ` (12 subsequent siblings)
  13 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-07-20 10:36 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Add some tests so we can monitor changes to the performance of the
move detection code. The tests record the performance of a single
large diff and a sequence of smaller diffs.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 t/perf/p4002-diff-color-moved.sh | 45 ++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100755 t/perf/p4002-diff-color-moved.sh

diff --git a/t/perf/p4002-diff-color-moved.sh b/t/perf/p4002-diff-color-moved.sh
new file mode 100755
index 00000000000..ad56bcb71e4
--- /dev/null
+++ b/t/perf/p4002-diff-color-moved.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+
+test_description='Tests diff --color-moved performance'
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+if ! git rev-parse --verify v2.29.0^{commit} >/dev/null
+then
+	skip_all='skipping because tag v2.29.0 was not found'
+	test_done
+fi
+
+GIT_PAGER_IN_USE=1
+test_export GIT_PAGER_IN_USE
+
+test_perf 'diff --no-color-moved --no-color-moved-ws large change' '
+	git diff --no-color-moved --no-color-moved-ws v2.28.0 v2.29.0
+'
+
+test_perf 'diff --color-moved --no-color-moved-ws large change' '
+	git diff --color-moved=zebra --no-color-moved-ws v2.28.0 v2.29.0
+'
+
+test_perf 'diff --color-moved-ws=allow-indentation-change large change' '
+	git diff --color-moved=zebra --color-moved-ws=allow-indentation-change \
+		v2.28.0 v2.29.0
+'
+
+test_perf 'log --no-color-moved --no-color-moved-ws' '
+	git log --no-color-moved --no-color-moved-ws --no-merges --patch \
+		-n1000 v2.29.0
+'
+
+test_perf 'log --color-moved --no-color-moved-ws' '
+	git log --color-moved=zebra --no-color-moved-ws --no-merges --patch \
+		-n1000 v2.29.0
+'
+
+test_perf 'log --color-moved-ws=allow-indentation-change' '
+	git log --color-moved=zebra --color-moved-ws=allow-indentation-change \
+		--no-merges --patch -n1000 v2.29.0
+'
+
+test_done
-- 
gitgitgadget



* [PATCH v2 02/12] diff --color-moved=zebra: fix alternate coloring
  2021-07-20 10:36 ` [PATCH v2 00/12] " Phillip Wood via GitGitGadget
  2021-07-20 10:36   ` [PATCH v2 01/12] diff --color-moved: add perf tests Phillip Wood via GitGitGadget
@ 2021-07-20 10:36   ` Phillip Wood via GitGitGadget
  2021-07-20 10:36   ` [PATCH v2 03/12] diff --color-moved: avoid false short line matches and bad zebra coloring Phillip Wood via GitGitGadget
                     ` (11 subsequent siblings)
  13 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-07-20 10:36 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

b0a2ba4776 ("diff --color-moved=zebra: be stricter with color
alternation", 2018-11-23) sought to avoid using the alternate colors
unless there are two adjacent moved blocks of the same
sign. Unfortunately it contains two bugs that prevented it from fixing
the problem properly. Firstly, `last_symbol` is reset at the start of
each iteration of the loop, losing the symbol of the last line, and
secondly, when deciding whether to use the alternate color it should be
checking if the current line has the same sign as the last line, not a
different sign. The combination of the two errors means that we still
use the alternate color when we should, but we also use it when we
shouldn't. This is most noticeable when using
--color-moved-ws=allow-indentation-change with hunks like

-this line gets indented
+    this line gets indented

where the post image is colored with newMovedAlternate rather than
newMoved. While this does not matter much, the next commit will change
the coloring to be correct in this case, so let's fix the bug here to
make it clear why the output is changing, and add a regression test.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
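
The alternation rule described in the commit message can be sketched in
isolation as follows. This is only an illustration, not the code in
diff.c: `enum sign` and `next_flip()` are hypothetical names, and the
sketch assumes the caller already knows whether the new block is
adjacent to a previous moved block.

```c
/* Hypothetical stand-ins for the diff symbols used in diff.c. */
enum sign { SIGN_NONE, SIGN_PLUS, SIGN_MINUS };

/*
 * Return the flip state for a newly started moved block.
 *
 * flipped   - flip state of the previous moved block (0 or 1)
 * prev_sign - sign of the moved block this one is adjacent to, or
 *             SIGN_NONE if there is no adjacent moved block
 * cur_sign  - sign of the new block
 *
 * The alternate color is only needed to tell apart two adjacent moved
 * blocks of the same sign, so toggle in that case and reset otherwise.
 */
static int next_flip(int flipped, enum sign prev_sign, enum sign cur_sign)
{
	if (prev_sign != SIGN_NONE && prev_sign == cur_sign)
		return !flipped;
	return 0;
}
```

With this rule an added block that directly follows a deleted block
starts in the normal color, which is the behavior the new regression
test checks for.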
 diff.c                     |  4 +--
 t/t4015-diff-whitespace.sh | 72 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 52c791574b7..cb068f8258c 100644
--- a/diff.c
+++ b/diff.c
@@ -1142,6 +1142,7 @@ static void mark_color_as_moved(struct diff_options *o,
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
 	int n, flipped_block = 0, block_length = 0;
+	enum diff_symbol last_symbol = 0;
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
@@ -1149,7 +1150,6 @@ static void mark_color_as_moved(struct diff_options *o,
 		struct moved_entry *key;
 		struct moved_entry *match = NULL;
 		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-		enum diff_symbol last_symbol = 0;
 
 		switch (l->s) {
 		case DIFF_SYMBOL_PLUS:
@@ -1214,7 +1214,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			}
 
 			if (adjust_last_block(o, n, block_length) &&
-			    pmb_nr && last_symbol != l->s)
+			    pmb_nr && last_symbol == l->s)
 				flipped_block = (flipped_block + 1) % 2;
 			else
 				flipped_block = 0;
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 2c13b62d3c6..920114cd795 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1442,6 +1442,78 @@ test_expect_success 'detect permutations inside moved code -- dimmed-zebra' '
 	test_cmp expected actual
 '
 
+test_expect_success 'zebra alternate color is only used when necessary' '
+	cat >old.txt <<-\EOF &&
+	line 1A should be marked as oldMoved newMovedAlternate
+	line 1B should be marked as oldMoved newMovedAlternate
+	unchanged
+	line 2A should be marked as oldMoved newMovedAlternate
+	line 2B should be marked as oldMoved newMovedAlternate
+	line 3A should be marked as oldMovedAlternate newMoved
+	line 3B should be marked as oldMovedAlternate newMoved
+	unchanged
+	line 4A should be marked as oldMoved newMovedAlternate
+	line 4B should be marked as oldMoved newMovedAlternate
+	line 5A should be marked as oldMovedAlternate newMoved
+	line 5B should be marked as oldMovedAlternate newMoved
+	line 6A should be marked as oldMoved newMoved
+	line 6B should be marked as oldMoved newMoved
+	EOF
+	cat >new.txt <<-\EOF &&
+	  line 1A should be marked as oldMoved newMovedAlternate
+	  line 1B should be marked as oldMoved newMovedAlternate
+	unchanged
+	  line 3A should be marked as oldMovedAlternate newMoved
+	  line 3B should be marked as oldMovedAlternate newMoved
+	  line 2A should be marked as oldMoved newMovedAlternate
+	  line 2B should be marked as oldMoved newMovedAlternate
+	unchanged
+	  line 6A should be marked as oldMoved newMoved
+	  line 6B should be marked as oldMoved newMoved
+	    line 4A should be marked as oldMoved newMovedAlternate
+	    line 4B should be marked as oldMoved newMovedAlternate
+	  line 5A should be marked as oldMovedAlternate newMoved
+	  line 5B should be marked as oldMovedAlternate newMoved
+	EOF
+	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
+		 --color-moved-ws=allow-indentation-change \
+		 old.txt new.txt >output &&
+	grep -v index output | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/old.txt b/new.txt<RESET>
+	<BOLD>--- a/old.txt<RESET>
+	<BOLD>+++ b/new.txt<RESET>
+	<CYAN>@@ -1,14 +1,14 @@<RESET>
+	<BOLD;MAGENTA>-line 1A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;MAGENTA>-line 1B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1B should be marked as oldMoved newMovedAlternate<RESET>
+	 unchanged<RESET>
+	<BOLD;MAGENTA>-line 2A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;MAGENTA>-line 2B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;BLUE>-line 3A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;BLUE>-line 3B should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3B should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2B should be marked as oldMoved newMovedAlternate<RESET>
+	 unchanged<RESET>
+	<BOLD;MAGENTA>-line 4A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;MAGENTA>-line 4B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;BLUE>-line 5A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;BLUE>-line 5B should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;MAGENTA>-line 6A should be marked as oldMoved newMoved<RESET>
+	<BOLD;MAGENTA>-line 6B should be marked as oldMoved newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6A should be marked as oldMoved newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6B should be marked as oldMoved newMoved<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5B should be marked as oldMovedAlternate newMoved<RESET>
+	EOF
+	test_cmp expected actual
+'
+
 test_expect_success 'cmd option assumes configured colored-moved' '
 	test_config color.diff.oldMoved "magenta" &&
 	test_config color.diff.newMoved "cyan" &&
-- 
gitgitgadget



* [PATCH v2 03/12] diff --color-moved: avoid false short line matches and bad zebra coloring
  2021-07-20 10:36 ` [PATCH v2 00/12] " Phillip Wood via GitGitGadget
  2021-07-20 10:36   ` [PATCH v2 01/12] diff --color-moved: add perf tests Phillip Wood via GitGitGadget
  2021-07-20 10:36   ` [PATCH v2 02/12] diff --color-moved=zebra: fix alternate coloring Phillip Wood via GitGitGadget
@ 2021-07-20 10:36   ` Phillip Wood via GitGitGadget
  2021-07-20 10:36   ` [PATCH v2 04/12] diff: simplify allow-indentation-change delta calculation Phillip Wood via GitGitGadget
                     ` (10 subsequent siblings)
  13 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-07-20 10:36 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

When marking moved lines it is possible for a block of potentially
matched lines to extend past a change in sign when there is a sequence
of added lines whose text matches the text of a sequence of deleted
and added lines. Most of the time either `match` will be NULL or
`pmb_advance_or_null()` will fail when the loop encounters a change of
sign, but there are corner cases where `match` is non-NULL and
`pmb_advance_or_null()` successfully advances the moved block despite
the change in sign.

One consequence of this is highlighting a short line as moved when it
should not be. For example

-moved line  # Correctly highlighted as moved
+short line  # Wrongly highlighted as moved
 context
+moved line  # Correctly highlighted as moved
+short line
 context
-short line

The other consequence is coloring a moved addition following a moved
deletion in the wrong color. In the example below the first "+moved
line 3" should be highlighted as newMoved not newMovedAlternate.

-moved line 1 # Correctly highlighted as oldMoved
-moved line 2 # Correctly highlighted as oldMovedAlternate
+moved line 3 # Wrongly highlighted as newMovedAlternate
 context      # Everything else is highlighted correctly
+moved line 2
+moved line 3
 context
+moved line 1
-moved line 3

These false matches are more likely when using --color-moved-ws, with
the exception of --color-moved-ws=allow-indentation-change, which ties
the sign of the current whitespace delta to the sign of the line to
avoid this problem. The fix is to check that the sign of the new line
being matched is the same as the sign of the line that started the
block of potential matches.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
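
As a rough illustration of the fix, the sketch below ends a block of
potential matches as soon as the sign changes, even if the text of the
next line still has a match. It is a simplification under stated
assumptions: `struct line` and `block_length()` are hypothetical, and
the hash lookup and per-block bookkeeping done by
mark_color_as_moved() are reduced to a precomputed `has_match` flag.

```c
#include <stddef.h>

/* Hypothetical, simplified view of one emitted diff line. */
struct line {
	char sign;     /* '+', '-' or ' ' */
	int has_match; /* non-zero if the text matches a line of opposite sign */
};

/*
 * Length of the block of potentially moved lines starting at 'start'.
 * The block ends at the first line that has no match or whose sign
 * differs from the sign of the line that started the block.
 */
static size_t block_length(const struct line *lines, size_t nr, size_t start)
{
	size_t n = start;
	char block_sign;

	if (start >= nr || !lines[start].has_match)
		return 0;
	block_sign = lines[start].sign;
	while (n < nr && lines[n].has_match && lines[n].sign == block_sign)
		n++;
	return n - start;
}
```

For a "-" line followed directly by a matching "+" line this gives two
blocks of length one rather than a single block of length two, so a
short line of the opposite sign cannot ride along with a longer moved
line.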
 diff.c                     | 17 ++++++----
 t/t4015-diff-whitespace.sh | 65 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+), 6 deletions(-)

diff --git a/diff.c b/diff.c
index cb068f8258c..2b51b77fd20 100644
--- a/diff.c
+++ b/diff.c
@@ -1142,7 +1142,7 @@ static void mark_color_as_moved(struct diff_options *o,
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
 	int n, flipped_block = 0, block_length = 0;
-	enum diff_symbol last_symbol = 0;
+	enum diff_symbol moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
@@ -1168,7 +1168,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			flipped_block = 0;
 		}
 
-		if (!match) {
+		if (pmb_nr && (!match || l->s != moved_symbol)) {
 			int i;
 
 			adjust_last_block(o, n, block_length);
@@ -1177,12 +1177,13 @@ static void mark_color_as_moved(struct diff_options *o,
 			pmb_nr = 0;
 			block_length = 0;
 			flipped_block = 0;
-			last_symbol = l->s;
+		}
+		if (!match) {
+			moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
 			continue;
 		}
 
 		if (o->color_moved == COLOR_MOVED_PLAIN) {
-			last_symbol = l->s;
 			l->flags |= DIFF_SYMBOL_MOVED_LINE;
 			continue;
 		}
@@ -1214,11 +1215,16 @@ static void mark_color_as_moved(struct diff_options *o,
 			}
 
 			if (adjust_last_block(o, n, block_length) &&
-			    pmb_nr && last_symbol == l->s)
+			    pmb_nr && moved_symbol == l->s)
 				flipped_block = (flipped_block + 1) % 2;
 			else
 				flipped_block = 0;
 
+			if (pmb_nr)
+				moved_symbol = l->s;
+			else
+				moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
+
 			block_length = 0;
 		}
 
@@ -1228,7 +1234,6 @@ static void mark_color_as_moved(struct diff_options *o,
 			if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
 				l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 		}
-		last_symbol = l->s;
 	}
 	adjust_last_block(o, n, block_length);
 
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 920114cd795..3119a59f071 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1514,6 +1514,71 @@ test_expect_success 'zebra alternate color is only used when necessary' '
 	test_cmp expected actual
 '
 
+test_expect_success 'short lines of opposite sign do not get marked as moved' '
+	cat >old.txt <<-\EOF &&
+	this line should be marked as moved
+	unchanged
+	unchanged
+	unchanged
+	unchanged
+	too short
+	this line should be marked as oldMoved newMoved
+	this line should be marked as oldMovedAlternate newMoved
+	unchanged 1
+	unchanged 2
+	unchanged 3
+	unchanged 4
+	this line should be marked as oldMoved newMoved/newMovedAlternate
+	EOF
+	cat >new.txt <<-\EOF &&
+	too short
+	unchanged
+	unchanged
+	this line should be marked as moved
+	too short
+	unchanged
+	unchanged
+	this line should be marked as oldMoved newMoved/newMovedAlternate
+	unchanged 1
+	unchanged 2
+	this line should be marked as oldMovedAlternate newMoved
+	this line should be marked as oldMoved newMoved/newMovedAlternate
+	unchanged 3
+	this line should be marked as oldMoved newMoved
+	unchanged 4
+	EOF
+	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
+		old.txt new.txt >output && cat output &&
+	grep -v index output | test_decode_color >actual &&
+	cat >expect <<-\EOF &&
+	<BOLD>diff --git a/old.txt b/new.txt<RESET>
+	<BOLD>--- a/old.txt<RESET>
+	<BOLD>+++ b/new.txt<RESET>
+	<CYAN>@@ -1,13 +1,15 @@<RESET>
+	<BOLD;MAGENTA>-this line should be marked as moved<RESET>
+	<GREEN>+<RESET><GREEN>too short<RESET>
+	 unchanged<RESET>
+	 unchanged<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as moved<RESET>
+	<GREEN>+<RESET><GREEN>too short<RESET>
+	 unchanged<RESET>
+	 unchanged<RESET>
+	<RED>-too short<RESET>
+	<BOLD;MAGENTA>-this line should be marked as oldMoved newMoved<RESET>
+	<BOLD;BLUE>-this line should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
+	 unchanged 1<RESET>
+	 unchanged 2<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
+	 unchanged 3<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMoved newMoved<RESET>
+	 unchanged 4<RESET>
+	<BOLD;MAGENTA>-this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
+	EOF
+	test_cmp expect actual
+'
+
 test_expect_success 'cmd option assumes configured colored-moved' '
 	test_config color.diff.oldMoved "magenta" &&
 	test_config color.diff.newMoved "cyan" &&
-- 
gitgitgadget



* [PATCH v2 04/12] diff: simplify allow-indentation-change delta calculation
  2021-07-20 10:36 ` [PATCH v2 00/12] " Phillip Wood via GitGitGadget
                     ` (2 preceding siblings ...)
  2021-07-20 10:36   ` [PATCH v2 03/12] diff --color-moved: avoid false short line matches and bad zebra coloring Phillip Wood via GitGitGadget
@ 2021-07-20 10:36   ` Phillip Wood via GitGitGadget
  2021-07-20 10:36   ` [PATCH v2 05/12] diff --color-moved-ws=allow-indentation-change: simplify and optimize Phillip Wood via GitGitGadget
                     ` (9 subsequent siblings)
  13 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-07-20 10:36 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Now that we reliably end a block when the sign changes we don't need
the whitespace delta calculation to rely on the sign.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
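
A minimal sketch of the simplified delta calculation, assuming a
cut-down line structure: `struct indented_line` and `ws_delta()` are
hypothetical names, and the INDENT_BLANKLINE special case handled by
compute_ws_delta() is left out. The point is that the delta is now a
plain difference of indentation widths that no longer depends on the
sign of either line.

```c
#include <string.h>

/* Hypothetical, simplified stand-in for struct emitted_diff_symbol. */
struct indented_line {
	const char *text;
	int len;          /* length of text in bytes */
	int indent_off;   /* byte offset of the first non-indentation char */
	int indent_width; /* display width of the indentation */
};

/*
 * If the text after the indentation is identical, store the difference
 * in indentation width in *out and return 1, otherwise return 0.
 */
static int ws_delta(const struct indented_line *a,
		    const struct indented_line *b, int *out)
{
	int a_rest = a->len - a->indent_off;
	int b_rest = b->len - b->indent_off;

	if (a_rest != b_rest ||
	    memcmp(a->text + a->indent_off, b->text + b->indent_off, a_rest))
		return 0;

	*out = a->indent_width - b->indent_width;
	return 1;
}
```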
 diff.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/diff.c b/diff.c
index 2b51b77fd20..77c893b266a 100644
--- a/diff.c
+++ b/diff.c
@@ -864,23 +864,17 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
 	    a_width = a->indent_width,
 	    b_off = b->indent_off,
 	    b_width = b->indent_width;
-	int delta;
 
 	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE) {
 		*out = INDENT_BLANKLINE;
 		return 1;
 	}
 
-	if (a->s == DIFF_SYMBOL_PLUS)
-		delta = a_width - b_width;
-	else
-		delta = b_width - a_width;
-
 	if (a_len - a_off != b_len - b_off ||
 	    memcmp(a->line + a_off, b->line + b_off, a_len - a_off))
 		return 0;
 
-	*out = delta;
+	*out = a_width - b_width;
 
 	return 1;
 }
@@ -924,10 +918,7 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	 * match those of the current block and that the text of 'l' and 'cur'
 	 * after the indentation match.
 	 */
-	if (cur->es->s == DIFF_SYMBOL_PLUS)
-		delta = a_width - c_width;
-	else
-		delta = c_width - a_width;
+	delta = c_width - a_width;
 
 	/*
 	 * If the previous lines of this block were all blank then set its
-- 
gitgitgadget



* [PATCH v2 05/12] diff --color-moved-ws=allow-indentation-change: simplify and optimize
  2021-07-20 10:36 ` [PATCH v2 00/12] " Phillip Wood via GitGitGadget
                     ` (3 preceding siblings ...)
  2021-07-20 10:36   ` [PATCH v2 04/12] diff: simplify allow-indentation-change delta calculation Phillip Wood via GitGitGadget
@ 2021-07-20 10:36   ` Phillip Wood via GitGitGadget
  2021-07-20 10:36   ` [PATCH v2 06/12] diff --color-moved: call comparison function directly Phillip Wood via GitGitGadget
                     ` (8 subsequent siblings)
  13 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-07-20 10:36 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

If we already have a block of potentially moved lines then as we move
down the diff we need to check if the next line of each potentially
moved block matches the current line of the diff. The implementation of
--color-moved-ws=allow-indentation-change was needlessly performing
this check on all the lines in the diff that matched the current line
rather than just the current line. To exacerbate the problem, finding
all the other lines in the diff that match the current line involves a
fuzzy lookup, so we were wasting even more time performing a second
comparison to filter out the non-matching lines. Fixing this reduces
the time to run
  git diff --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
by 93% compared to master and simplifies the code.

Test                                                                  HEAD^              HEAD
---------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41( 0.38+0.03)   0.41(0.37+0.04)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.83( 0.79+0.04)   0.82(0.79+0.02)  -1.2%
4002.3: diff --color-moved-ws=allow-indentation-change large change  13.68(13.59+0.07)   0.92(0.89+0.03) -93.3%
4002.4: log --no-color-moved --no-color-moved-ws                      1.31( 1.22+0.08)   1.31(1.21+0.10)  +0.0%
4002.5: log --color-moved --no-color-moved-ws                         1.47( 1.40+0.07)   1.47(1.36+0.10)  +0.0%
4002.6: log --color-moved-ws=allow-indentation-change                 1.87( 1.77+0.09)   1.50(1.41+0.09) -19.8%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
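
The shape of the optimized loop can be sketched as below: each
potentially moved block is compared once against the current line,
instead of once per line in the diff that matches the current line.
This is an illustration only. `struct entry`, `struct block` and
`advance_blocks()` are hypothetical, and a plain strcmp() stands in for
the whitespace-delta comparison done by cmp_in_block_with_wsd().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for moved_entry and moved_block. */
struct entry {
	const char *text;
	struct entry *next_line; /* following line on the same side, or NULL */
};

struct block {
	struct entry *match; /* last matched line, NULL once the block is dead */
};

/*
 * Advance every block whose next line equals the current line and clear
 * the others. Does one comparison per block. Returns the number of
 * blocks that are still alive.
 */
static int advance_blocks(struct block *pmb, int pmb_nr, const char *cur_text)
{
	int i, alive = 0;

	for (i = 0; i < pmb_nr; i++) {
		struct entry *prev = pmb[i].match;
		struct entry *cur = prev ? prev->next_line : NULL;

		if (cur && !strcmp(cur->text, cur_text)) {
			pmb[i].match = cur; /* advance to the next line */
			alive++;
		} else {
			pmb[i].match = NULL; /* block ends here */
		}
	}
	return alive;
}
```

The cost per diff line is therefore proportional to the number of live
blocks rather than to the number of live blocks times the number of
lines sharing the current line's text.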
 diff.c | 65 ++++++++++++++++------------------------------------------
 1 file changed, 18 insertions(+), 47 deletions(-)

diff --git a/diff.c b/diff.c
index 77c893b266a..55384449170 100644
--- a/diff.c
+++ b/diff.c
@@ -881,35 +881,20 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
 
 static int cmp_in_block_with_wsd(const struct diff_options *o,
 				 const struct moved_entry *cur,
-				 const struct moved_entry *match,
-				 struct moved_block *pmb,
-				 int n)
+				 const struct emitted_diff_symbol *l,
+				 struct moved_block *pmb)
 {
-	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-	int al = cur->es->len, bl = match->es->len, cl = l->len;
+	int al = cur->es->len, bl = l->len;
 	const char *a = cur->es->line,
-		   *b = match->es->line,
-		   *c = l->line;
+		   *b = l->line;
 	int a_off = cur->es->indent_off,
 	    a_width = cur->es->indent_width,
-	    c_off = l->indent_off,
-	    c_width = l->indent_width;
+	    b_off = l->indent_off,
+	    b_width = l->indent_width;
 	int delta;
 
-	/*
-	 * We need to check if 'cur' is equal to 'match'.  As those
-	 * are from the same (+/-) side, we do not need to adjust for
-	 * indent changes. However these were found using fuzzy
-	 * matching so we do have to check if they are equal. Here we
-	 * just check the lengths. We delay calling memcmp() to check
-	 * the contents until later as if the length comparison for a
-	 * and c fails we can avoid the call all together.
-	 */
-	if (al != bl)
-		return 1;
-
 	/* If 'l' and 'cur' are both blank then they match. */
-	if (a_width == INDENT_BLANKLINE && c_width == INDENT_BLANKLINE)
+	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE)
 		return 0;
 
 	/*
@@ -918,7 +903,7 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	 * match those of the current block and that the text of 'l' and 'cur'
 	 * after the indentation match.
 	 */
-	delta = c_width - a_width;
+	delta = b_width - a_width;
 
 	/*
 	 * If the previous lines of this block were all blank then set its
@@ -927,9 +912,8 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	if (pmb->wsd == INDENT_BLANKLINE)
 		pmb->wsd = delta;
 
-	return !(delta == pmb->wsd && al - a_off == cl - c_off &&
-		 !memcmp(a, b, al) && !
-		 memcmp(a + a_off, c + c_off, al - a_off));
+	return !(delta == pmb->wsd && al - a_off == bl - b_off &&
+		 !memcmp(a + a_off, b + b_off, al - a_off));
 }
 
 static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
@@ -1030,36 +1014,23 @@ static void pmb_advance_or_null(struct diff_options *o,
 }
 
 static void pmb_advance_or_null_multi_match(struct diff_options *o,
-					    struct moved_entry *match,
-					    struct hashmap *hm,
+					    struct emitted_diff_symbol *l,
 					    struct moved_block *pmb,
-					    int pmb_nr, int n)
+					    int pmb_nr)
 {
 	int i;
-	char *got_match = xcalloc(1, pmb_nr);
-
-	hashmap_for_each_entry_from(hm, match, ent) {
-		for (i = 0; i < pmb_nr; i++) {
-			struct moved_entry *prev = pmb[i].match;
-			struct moved_entry *cur = (prev && prev->next_line) ?
-					prev->next_line : NULL;
-			if (!cur)
-				continue;
-			if (!cmp_in_block_with_wsd(o, cur, match, &pmb[i], n))
-				got_match[i] |= 1;
-		}
-	}
 
 	for (i = 0; i < pmb_nr; i++) {
-		if (got_match[i]) {
+		struct moved_entry *prev = pmb[i].match;
+		struct moved_entry *cur = (prev && prev->next_line) ?
+			prev->next_line : NULL;
+		if (cur && !cmp_in_block_with_wsd(o, cur, l, &pmb[i])) {
 			/* Advance to the next line */
-			pmb[i].match = pmb[i].match->next_line;
+			pmb[i].match = cur;
 		} else {
 			moved_block_clear(&pmb[i]);
 		}
 	}
-
-	free(got_match);
 }
 
 static int shrink_potential_moved_blocks(struct moved_block *pmb,
@@ -1181,7 +1152,7 @@ static void mark_color_as_moved(struct diff_options *o,
 
 		if (o->color_moved_ws_handling &
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-			pmb_advance_or_null_multi_match(o, match, hm, pmb, pmb_nr, n);
+			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);
 		else
 			pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v2 06/12] diff --color-moved: call comparison function directly
  2021-07-20 10:36 ` [PATCH v2 00/12] " Phillip Wood via GitGitGadget
                     ` (4 preceding siblings ...)
  2021-07-20 10:36   ` [PATCH v2 05/12] diff --color-moved-ws=allow-indentation-change: simplify and optimize Phillip Wood via GitGitGadget
@ 2021-07-20 10:36   ` Phillip Wood via GitGitGadget
  2021-07-20 10:36   ` [PATCH v2 07/12] diff --color-moved: unify moved block growth functions Phillip Wood via GitGitGadget
                     ` (7 subsequent siblings)
  13 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-07-20 10:36 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

This change will allow us to easily combine pmb_advance_or_null() and
pmb_advance_or_null_multi_match() in the next commit. Calling
xdiff_compare_lines() directly rather than using a function pointer
from the hash map has little effect on the run time.

Test                                                                  HEAD^             HEAD
-------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41(0.37+0.04)   0.41(0.39+0.02) +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.82(0.79+0.02)   0.83(0.79+0.03) +1.2%
4002.3: diff --color-moved-ws=allow-indentation-change large change   0.92(0.89+0.03)   0.91(0.85+0.05) -1.1%
4002.4: log --no-color-moved --no-color-moved-ws                      1.31(1.21+0.10)   1.33(1.22+0.10) +1.5%
4002.5: log --color-moved --no-color-moved-ws                         1.47(1.36+0.10)   1.47(1.39+0.08) +0.0%
4002.6: log --color-moved-ws=allow-indentation-change                 1.50(1.41+0.09)   1.51(1.42+0.09) +0.7%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index 55384449170..c056d917d0d 100644
--- a/diff.c
+++ b/diff.c
@@ -995,17 +995,20 @@ static void add_lines_to_move_detection(struct diff_options *o,
 }
 
 static void pmb_advance_or_null(struct diff_options *o,
-				struct moved_entry *match,
-				struct hashmap *hm,
+				struct emitted_diff_symbol *l,
 				struct moved_block *pmb,
 				int pmb_nr)
 {
 	int i;
+	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
+
 	for (i = 0; i < pmb_nr; i++) {
 		struct moved_entry *prev = pmb[i].match;
 		struct moved_entry *cur = (prev && prev->next_line) ?
 				prev->next_line : NULL;
-		if (cur && !hm->cmpfn(o, &cur->ent, &match->ent, NULL)) {
+		if (cur && xdiff_compare_lines(cur->es->line, cur->es->len,
+						l->line, l->len,
+						flags)) {
 			pmb[i].match = cur;
 		} else {
 			pmb[i].match = NULL;
@@ -1154,7 +1157,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
 			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);
 		else
-			pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
+			pmb_advance_or_null(o, l, pmb, pmb_nr);
 
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v2 07/12] diff --color-moved: unify moved block growth functions
  2021-07-20 10:36 ` [PATCH v2 00/12] " Phillip Wood via GitGitGadget
                     ` (5 preceding siblings ...)
  2021-07-20 10:36   ` [PATCH v2 06/12] diff --color-moved: call comparison function directly Phillip Wood via GitGitGadget
@ 2021-07-20 10:36   ` Phillip Wood via GitGitGadget
  2021-07-20 10:36   ` [PATCH v2 08/12] diff --color-moved: shrink potential moved blocks as we go Phillip Wood via GitGitGadget
                     ` (6 subsequent siblings)
  13 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-07-20 10:36 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

After the last two commits pmb_advance_or_null() and
pmb_advance_or_null_multi_match() differ only in the comparison they
perform. Let's simplify the code by combining them into a single
function.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 41 ++++++++++++-----------------------------
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/diff.c b/diff.c
index c056d917d0d..b03f79b626c 100644
--- a/diff.c
+++ b/diff.c
@@ -1003,36 +1003,23 @@ static void pmb_advance_or_null(struct diff_options *o,
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
 	for (i = 0; i < pmb_nr; i++) {
+		int match;
 		struct moved_entry *prev = pmb[i].match;
 		struct moved_entry *cur = (prev && prev->next_line) ?
 				prev->next_line : NULL;
-		if (cur && xdiff_compare_lines(cur->es->line, cur->es->len,
-						l->line, l->len,
-						flags)) {
-			pmb[i].match = cur;
-		} else {
-			pmb[i].match = NULL;
-		}
-	}
-}
 
-static void pmb_advance_or_null_multi_match(struct diff_options *o,
-					    struct emitted_diff_symbol *l,
-					    struct moved_block *pmb,
-					    int pmb_nr)
-{
-	int i;
-
-	for (i = 0; i < pmb_nr; i++) {
-		struct moved_entry *prev = pmb[i].match;
-		struct moved_entry *cur = (prev && prev->next_line) ?
-			prev->next_line : NULL;
-		if (cur && !cmp_in_block_with_wsd(o, cur, l, &pmb[i])) {
-			/* Advance to the next line */
+		if (o->color_moved_ws_handling &
+		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+			match = cur &&
+				!cmp_in_block_with_wsd(o, cur, l, &pmb[i]);
+		else
+			match = cur &&
+				xdiff_compare_lines(cur->es->line, cur->es->len,
+						    l->line, l->len, flags);
+		if (match)
 			pmb[i].match = cur;
-		} else {
+		else
 			moved_block_clear(&pmb[i]);
-		}
 	}
 }
 
@@ -1153,11 +1140,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			continue;
 		}
 
-		if (o->color_moved_ws_handling &
-		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);
-		else
-			pmb_advance_or_null(o, l, pmb, pmb_nr);
+		pmb_advance_or_null(o, l, pmb, pmb_nr);
 
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v2 08/12] diff --color-moved: shrink potential moved blocks as we go
  2021-07-20 10:36 ` [PATCH v2 00/12] " Phillip Wood via GitGitGadget
                     ` (6 preceding siblings ...)
  2021-07-20 10:36   ` [PATCH v2 07/12] diff --color-moved: unify moved block growth functions Phillip Wood via GitGitGadget
@ 2021-07-20 10:36   ` Phillip Wood via GitGitGadget
  2021-07-20 10:36   ` [PATCH v2 09/12] diff --color-moved: stop clearing potential moved blocks Phillip Wood via GitGitGadget
                     ` (5 subsequent siblings)
  13 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-07-20 10:36 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Rather than setting `match` to NULL and then looping over the list of
potential matched blocks for a second time to remove blocks with no
matches, just filter out the blocks with no matches as we go.
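
As a rough illustration of filtering "as we go", here is a minimal,
standalone sketch of single-pass in-place compaction. The names
struct block, has_match() and filter_blocks() are hypothetical and are
not part of diff.c. The sketch copies the whole element rather than a
single field, so any per-block state moves along with it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a potential moved block. */
struct block {
	const char *match; /* NULL once the block has stopped matching */
	int wsd;           /* per-block state that must travel with it */
};

/* Predicate used by the example: keep blocks that still match. */
static int has_match(const struct block *b)
{
	return b->match != NULL;
}

/*
 * Keep only the blocks for which keep() returns non-zero, preserving
 * their relative order, and return the new number of blocks. 'j' is
 * the write index and never runs ahead of the read index 'i'.
 */
static int filter_blocks(struct block *pmb, int nr,
			 int (*keep)(const struct block *))
{
	int i, j;

	assert(nr >= 0);
	for (i = 0, j = 0; i < nr; i++) {
		if (keep(&pmb[i]))
			pmb[j++] = pmb[i];
	}
	return j;
}
```

Calling filter_blocks(pmb, pmb_nr, has_match) returns the number of
surviving blocks, which the caller would store back into its counter.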

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 42 ++++++------------------------------------
 1 file changed, 6 insertions(+), 36 deletions(-)

diff --git a/diff.c b/diff.c
index b03f79b626c..068473c0be3 100644
--- a/diff.c
+++ b/diff.c
@@ -997,12 +997,12 @@ static void add_lines_to_move_detection(struct diff_options *o,
 static void pmb_advance_or_null(struct diff_options *o,
 				struct emitted_diff_symbol *l,
 				struct moved_block *pmb,
-				int pmb_nr)
+				int *pmb_nr)
 {
-	int i;
+	int i, j;
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
-	for (i = 0; i < pmb_nr; i++) {
+	for (i = 0, j = 0; i < *pmb_nr; i++) {
 		int match;
 		struct moved_entry *prev = pmb[i].match;
 		struct moved_entry *cur = (prev && prev->next_line) ?
@@ -1017,37 +1017,9 @@ static void pmb_advance_or_null(struct diff_options *o,
 				xdiff_compare_lines(cur->es->line, cur->es->len,
 						    l->line, l->len, flags);
 		if (match)
-			pmb[i].match = cur;
-		else
-			moved_block_clear(&pmb[i]);
+			pmb[j++].match = cur;
 	}
-}
-
-static int shrink_potential_moved_blocks(struct moved_block *pmb,
-					 int pmb_nr)
-{
-	int lp, rp;
-
-	/* Shrink the set of potential block to the remaining running */
-	for (lp = 0, rp = pmb_nr - 1; lp <= rp;) {
-		while (lp < pmb_nr && pmb[lp].match)
-			lp++;
-		/* lp points at the first NULL now */
-
-		while (rp > -1 && !pmb[rp].match)
-			rp--;
-		/* rp points at the last non-NULL */
-
-		if (lp < pmb_nr && rp > -1 && lp < rp) {
-			pmb[lp] = pmb[rp];
-			memset(&pmb[rp], 0, sizeof(pmb[rp]));
-			rp--;
-			lp++;
-		}
-	}
-
-	/* Remember the number of running sets */
-	return rp + 1;
+	*pmb_nr = j;
 }
 
 /*
@@ -1140,9 +1112,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			continue;
 		}
 
-		pmb_advance_or_null(o, l, pmb, pmb_nr);
-
-		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
+		pmb_advance_or_null(o, l, pmb, &pmb_nr);
 
 		if (pmb_nr == 0) {
 			/*
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v2 09/12] diff --color-moved: stop clearing potential moved blocks
  2021-07-20 10:36 ` [PATCH v2 00/12] " Phillip Wood via GitGitGadget
                     ` (7 preceding siblings ...)
  2021-07-20 10:36   ` [PATCH v2 08/12] diff --color-moved: shrink potential moved blocks as we go Phillip Wood via GitGitGadget
@ 2021-07-20 10:36   ` Phillip Wood via GitGitGadget
  2021-07-20 10:36   ` [PATCH v2 10/12] diff --color-moved-ws=allow-indentation-change: improve hash lookups Phillip Wood via GitGitGadget
                     ` (4 subsequent siblings)
  13 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-07-20 10:36 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

moved_block_clear() was introduced in 74d156f4a1 ("diff
--color-moved-ws: fix double free crash", 2018-10-04) to free the
memory that was allocated when initializing a potential moved
block. However, since 21536d077f ("diff --color-moved-ws: modify
allow-indentation-change", 2018-11-23) initializing a potential moved
block no longer allocates any memory. Up until the last commit we were
relying on moved_block_clear() to set the `match` pointer to NULL when
a block stopped matching, but since that commit we no longer clear a
moved block that does not match, so it does not make sense to clear
blocks elsewhere.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/diff.c b/diff.c
index 068473c0be3..4b5776a5a0a 100644
--- a/diff.c
+++ b/diff.c
@@ -807,11 +807,6 @@ struct moved_block {
 	int wsd; /* The whitespace delta of this block */
 };
 
-static void moved_block_clear(struct moved_block *b)
-{
-	memset(b, 0, sizeof(*b));
-}
-
 #define INDENT_BLANKLINE INT_MIN
 
 static void fill_es_indent_data(struct emitted_diff_symbol *es)
@@ -1093,11 +1088,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		}
 
 		if (pmb_nr && (!match || l->s != moved_symbol)) {
-			int i;
-
 			adjust_last_block(o, n, block_length);
-			for(i = 0; i < pmb_nr; i++)
-				moved_block_clear(&pmb[i]);
 			pmb_nr = 0;
 			block_length = 0;
 			flipped_block = 0;
@@ -1155,8 +1146,6 @@ static void mark_color_as_moved(struct diff_options *o,
 	}
 	adjust_last_block(o, n, block_length);
 
-	for(n = 0; n < pmb_nr; n++)
-		moved_block_clear(&pmb[n]);
 	free(pmb);
 }
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v2 10/12] diff --color-moved-ws=allow-indentation-change: improve hash lookups
  2021-07-20 10:36 ` [PATCH v2 00/12] " Phillip Wood via GitGitGadget
                     ` (8 preceding siblings ...)
  2021-07-20 10:36   ` [PATCH v2 09/12] diff --color-moved: stop clearing potential moved blocks Phillip Wood via GitGitGadget
@ 2021-07-20 10:36   ` Phillip Wood via GitGitGadget
  2021-07-20 10:36   ` [PATCH v2 11/12] diff: use designated initializers for emitted_diff_symbol Phillip Wood via GitGitGadget
                     ` (3 subsequent siblings)
  13 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-07-20 10:36 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

As libxdiff does not have a whitespace flag to ignore the indentation,
the code for --color-moved-ws=allow-indentation-change uses
XDF_IGNORE_WHITESPACE and then filters out any hash lookups where
there are non-indentation changes. This filtering is inefficient as
we have to perform another string comparison.

By using the offset data that we have already computed to skip the
indentation, we can avoid using XDF_IGNORE_WHITESPACE and safely remove
the extra checks. This improves performance by 11% and paves the
way for the elimination of string comparisons in a later commit.
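
To illustrate the idea of comparing only the text after the
indentation, here is a small standalone sketch. struct line,
make_line() and same_after_indent() are hypothetical names, and the
sketch assumes that only spaces and tabs count as indentation; the
real fill_es_indent_data() may treat other characters as whitespace
too.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a diff line with a precomputed offset. */
struct line {
	const char *text;
	size_t len;
	size_t indent_off; /* offset of the first non-indent byte */
};

/* Record the offset once so later comparisons can skip the indent. */
static struct line make_line(const char *text, size_t len)
{
	struct line l = { .text = text, .len = len };

	while (l.indent_off < len &&
	       (text[l.indent_off] == ' ' || text[l.indent_off] == '\t'))
		l.indent_off++;
	return l;
}

/* Return non-zero if the two lines are identical after their indent. */
static int same_after_indent(const struct line *a, const struct line *b)
{
	size_t an, bn;

	assert(a->indent_off <= a->len && b->indent_off <= b->len);
	an = a->len - a->indent_off;
	bn = b->len - b->indent_off;
	return an == bn &&
	       !memcmp(a->text + a->indent_off, b->text + b->indent_off, an);
}
```

With this shape a re-indented line still compares equal, while a
change after the indentation does not, so no second comparison is
needed to filter the result.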

This change slightly increases the run time of other --color-moved
modes. This could be avoided by using different comparison functions
for the different modes but after the next two commits there is no
measurable benefit in doing so.

Test                                                                  HEAD^             HEAD
--------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41(0.38+0.03)   0.41(0.36+0.04) +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.82(0.76+0.05)   0.84(0.79+0.04) +2.4%
4002.3: diff --color-moved-ws=allow-indentation-change large change   0.91(0.88+0.03)   0.81(0.74+0.06) -11.0%
4002.4: log --no-color-moved --no-color-moved-ws                      1.32(1.21+0.10)   1.31(1.19+0.11) -0.8%
4002.5: log --color-moved --no-color-moved-ws                         1.47(1.37+0.10)   1.47(1.36+0.11) +0.0%
4002.6: log --color-moved-ws=allow-indentation-change                 1.51(1.42+0.09)   1.48(1.37+0.10) -2.0%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 66 +++++++++++++++++-----------------------------------------
 1 file changed, 19 insertions(+), 47 deletions(-)

diff --git a/diff.c b/diff.c
index 4b5776a5a0a..f899083d028 100644
--- a/diff.c
+++ b/diff.c
@@ -850,28 +850,15 @@ static void fill_es_indent_data(struct emitted_diff_symbol *es)
 }
 
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
-			    const struct emitted_diff_symbol *b,
-			    int *out)
-{
-	int a_len = a->len,
-	    b_len = b->len,
-	    a_off = a->indent_off,
-	    a_width = a->indent_width,
-	    b_off = b->indent_off,
+			    const struct emitted_diff_symbol *b)
+{
+	int a_width = a->indent_width,
 	    b_width = b->indent_width;
 
-	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE) {
-		*out = INDENT_BLANKLINE;
-		return 1;
-	}
-
-	if (a_len - a_off != b_len - b_off ||
-	    memcmp(a->line + a_off, b->line + b_off, a_len - a_off))
-		return 0;
-
-	*out = a_width - b_width;
+	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE)
+		return INDENT_BLANKLINE;
 
-	return 1;
+	return a_width - b_width;
 }
 
 static int cmp_in_block_with_wsd(const struct diff_options *o,
@@ -917,26 +904,17 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 			   const void *keydata)
 {
 	const struct diff_options *diffopt = hashmap_cmp_fn_data;
-	const struct moved_entry *a, *b;
+	const struct emitted_diff_symbol *a, *b;
 	unsigned flags = diffopt->color_moved_ws_handling
 			 & XDF_WHITESPACE_FLAGS;
 
-	a = container_of(eptr, const struct moved_entry, ent);
-	b = container_of(entry_or_key, const struct moved_entry, ent);
-
-	if (diffopt->color_moved_ws_handling &
-	    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-		/*
-		 * As there is not specific white space config given,
-		 * we'd need to check for a new block, so ignore all
-		 * white space. The setup of the white space
-		 * configuration for the next block is done else where
-		 */
-		flags |= XDF_IGNORE_WHITESPACE;
+	a = container_of(eptr, const struct moved_entry, ent)->es;
+	b = container_of(entry_or_key, const struct moved_entry, ent)->es;
 
-	return !xdiff_compare_lines(a->es->line, a->es->len,
-				    b->es->line, b->es->len,
-				    flags);
+	return !xdiff_compare_lines(a->line + a->indent_off,
+				    a->len - a->indent_off,
+				    b->line + b->indent_off,
+				    b->len - b->indent_off, flags);
 }
 
 static struct moved_entry *prepare_entry(struct diff_options *o,
@@ -945,7 +923,8 @@ static struct moved_entry *prepare_entry(struct diff_options *o,
 	struct moved_entry *ret = xmalloc(sizeof(*ret));
 	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
-	unsigned int hash = xdiff_hash_string(l->line, l->len, flags);
+	unsigned int hash = xdiff_hash_string(l->line + l->indent_off,
+					      l->len - l->indent_off, flags);
 
 	hashmap_entry_init(&ret->ent, hash);
 	ret->es = l;
@@ -1113,14 +1092,11 @@ static void mark_color_as_moved(struct diff_options *o,
 			hashmap_for_each_entry_from(hm, match, ent) {
 				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
 				if (o->color_moved_ws_handling &
-				    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
-					if (compute_ws_delta(l, match->es,
-							     &pmb[pmb_nr].wsd))
-						pmb[pmb_nr++].match = match;
-				} else {
+				    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+					pmb[pmb_nr].wsd = compute_ws_delta(l, match->es);
+				else
 					pmb[pmb_nr].wsd = 0;
-					pmb[pmb_nr++].match = match;
-				}
+				pmb[pmb_nr++].match = match;
 			}
 
 			if (adjust_last_block(o, n, block_length) &&
@@ -6240,10 +6216,6 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 		if (o->color_moved) {
 			struct hashmap add_lines, del_lines;
 
-			if (o->color_moved_ws_handling &
-			    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-				o->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
-
 			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
 			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v2 11/12] diff: use designated initializers for emitted_diff_symbol
  2021-07-20 10:36 ` [PATCH v2 00/12] " Phillip Wood via GitGitGadget
                     ` (9 preceding siblings ...)
  2021-07-20 10:36   ` [PATCH v2 10/12] diff --color-moved-ws=allow-indentation-change: improve hash lookups Phillip Wood via GitGitGadget
@ 2021-07-20 10:36   ` Phillip Wood via GitGitGadget
  2021-07-20 10:36   ` [PATCH v2 12/12] diff --color-moved: intern strings Phillip Wood via GitGitGadget
                     ` (2 subsequent siblings)
  13 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-07-20 10:36 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

This makes it clearer which fields are being explicitly initialized
and will simplify the next commit where we add a new field to the
struct.
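
As a small illustration of why this helps, the sketch below uses a
hypothetical struct symbol (not the real emitted_diff_symbol) to show
that members omitted from a designated initializer are
zero-initialized, so adding a field later does not require touching
the initializer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct, loosely modelled on the one in the patch. */
struct symbol {
	const char *line;
	int len;
	int flags;
	int indent_off;
	int indent_width;
	int s;
};

static struct symbol make_symbol(const char *line, int len, int flags, int s)
{
	/*
	 * Only the named members are set explicitly; indent_off and
	 * indent_width are implicitly initialized to zero.
	 */
	struct symbol e = {
		.line = line, .len = len, .flags = flags, .s = s
	};

	assert(len >= 0);
	return e;
}
```

With a positional initializer, inserting a member before 's' would
silently shift the values; with designated initializers the
assignments stay attached to their names.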

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index f899083d028..31a20a34240 100644
--- a/diff.c
+++ b/diff.c
@@ -1460,7 +1460,9 @@ static void emit_diff_symbol_from_struct(struct diff_options *o,
 static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
 			     const char *line, int len, unsigned flags)
 {
-	struct emitted_diff_symbol e = {line, len, flags, 0, 0, s};
+	struct emitted_diff_symbol e = {
+		.line = line, .len = len, .flags = flags, .s = s
+	};
 
 	if (o->emitted_symbols)
 		append_emitted_diff_symbol(o, &e);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v2 12/12] diff --color-moved: intern strings
  2021-07-20 10:36 ` [PATCH v2 00/12] " Phillip Wood via GitGitGadget
                     ` (10 preceding siblings ...)
  2021-07-20 10:36   ` [PATCH v2 11/12] diff: use designated initializers for emitted_diff_symbol Phillip Wood via GitGitGadget
@ 2021-07-20 10:36   ` Phillip Wood via GitGitGadget
  2021-07-20 13:38   ` [PATCH v2 00/12] diff --color-moved[-ws] speedups Phillip Wood
  2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
  13 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-07-20 10:36 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Taking inspiration from xdl_classify_record(), assign an id to each
addition and deletion such that lines that match for the current
--color-moved-ws mode share the same unique id. This reduces the
number of hash lookups a little (calculating the ids still involves
one hash lookup per line) but the main benefit is that when growing
blocks of potentially moved lines we can replace string comparisons
which involve chasing a pointer with a simple integer comparison.
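
As a rough sketch of the interning idea (not the implementation in
this patch, which uses git's hashmap and mem-pool), the following
standalone example assigns a small integer id to each distinct string
using a fixed-size open-addressing table. struct intern_table,
hash_str() and intern() are hypothetical names, and the sketch assumes
NUL-terminated strings that outlive the table.

```c
#include <assert.h>
#include <string.h>

#define TABLE_SIZE 1024 /* power of two; fixed to keep the sketch short */

struct intern_table {
	const char *keys[TABLE_SIZE]; /* NULL marks an empty slot */
	unsigned ids[TABLE_SIZE];
	unsigned next_id;
};

/* FNV-1a hash, chosen here only because it is short. */
static unsigned hash_str(const char *s)
{
	unsigned h = 2166136261u;

	for (; *s; s++)
		h = (h ^ (unsigned char)*s) * 16777619u;
	return h;
}

/* Return the id for s, assigning a fresh one the first time s is seen. */
static unsigned intern(struct intern_table *t, const char *s)
{
	unsigned i = hash_str(s) & (TABLE_SIZE - 1);

	/* Linear probing: one lookup per line, as in the description. */
	while (t->keys[i]) {
		if (!strcmp(t->keys[i], s))
			return t->ids[i];
		i = (i + 1) & (TABLE_SIZE - 1);
	}

	/* Always leave one empty slot so that probing terminates. */
	assert(t->next_id < TABLE_SIZE - 1);
	t->keys[i] = s;
	t->ids[i] = t->next_id;
	return t->next_id++;
}
```

Once every line carries an id, checking whether two lines match is a
plain integer comparison such as intern(&t, a) == intern(&t, b), with
no need to touch the line contents again.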

On a large diff this commit reduces the time to run
    diff --color-moved
by 33% and
    diff --color-moved-ws=allow-indentation-change
by 26%. Compared to master the time to run
    diff --color-moved-ws=allow-indentation-change
is now reduced by 95% and the overhead compared to --no-color-moved is
reduced to 50%.

Compared to the previous commit the time to run
    git log --patch --color-moved
is increased slightly, but compared to master there is no change in
run time.

Test                                                                  HEAD^             HEAD
--------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41(0.36+0.04)   0.41(0.37+0.03)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.83(0.79+0.03)   0.55(0.52+0.03) -33.7%
4002.3: diff --color-moved-ws=allow-indentation-change large change   0.81(0.77+0.04)   0.60(0.55+0.05) -25.9%
4002.4: log --no-color-moved --no-color-moved-ws                      1.30(1.20+0.09)   1.31(1.22+0.08)  +0.8%
4002.5: log --color-moved --no-color-moved-ws                         1.46(1.35+0.11)   1.47(1.30+0.16)  +0.7%
4002.6: log --color-moved-ws=allow-indentation-change                 1.46(1.38+0.07)   1.47(1.34+0.13)  +0.7%

Test                                                                  master            HEAD
--------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.40( 0.36+0.03)  0.41(0.37+0.03)  +2.5%
4002.2: diff --color-moved --no-color-moved-ws large change           0.82( 0.77+0.04)  0.55(0.52+0.03) -32.9%
4002.3: diff --color-moved-ws=allow-indentation-change large change  14.10(14.04+0.04)  0.60(0.55+0.05) -95.7%
4002.4: log --no-color-moved --no-color-moved-ws                      1.31( 1.21+0.09)  1.31(1.22+0.08)  +0.0%
4002.5: log --color-moved --no-color-moved-ws                         1.47( 1.37+0.09)  1.47(1.30+0.16)  +0.0%
4002.6: log --color-moved-ws=allow-indentation-change                 1.86( 1.76+0.10)  1.47(1.34+0.13) -21.0%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 171 ++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 95 insertions(+), 76 deletions(-)

diff --git a/diff.c b/diff.c
index 31a20a34240..2956c8f7103 100644
--- a/diff.c
+++ b/diff.c
@@ -18,6 +18,7 @@
 #include "submodule-config.h"
 #include "submodule.h"
 #include "hashmap.h"
+#include "mem-pool.h"
 #include "ll-merge.h"
 #include "string-list.h"
 #include "strvec.h"
@@ -772,6 +773,7 @@ struct emitted_diff_symbol {
 	int flags;
 	int indent_off;   /* Offset to first non-whitespace character */
 	int indent_width; /* The visual width of the indentation */
+	unsigned id;
 	enum diff_symbol s;
 };
 #define EMITTED_DIFF_SYMBOL_INIT {NULL}
@@ -797,9 +799,9 @@ static void append_emitted_diff_symbol(struct diff_options *o,
 }
 
 struct moved_entry {
-	struct hashmap_entry ent;
 	const struct emitted_diff_symbol *es;
 	struct moved_entry *next_line;
+	struct moved_entry *next_match;
 };
 
 struct moved_block {
@@ -866,24 +868,24 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 				 const struct emitted_diff_symbol *l,
 				 struct moved_block *pmb)
 {
-	int al = cur->es->len, bl = l->len;
-	const char *a = cur->es->line,
-		   *b = l->line;
-	int a_off = cur->es->indent_off,
-	    a_width = cur->es->indent_width,
-	    b_off = l->indent_off,
-	    b_width = l->indent_width;
+	int a_width = cur->es->indent_width, b_width = l->indent_width;
 	int delta;
 
-	/* If 'l' and 'cur' are both blank then they match. */
-	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE)
+	/* The text of each line must match */
+	if (cur->es->id != l->id)
+		return 1;
+
+	/*
+	 * If 'l' and 'cur' are both blank then we don't need to check the
+	 * indent. We only need to check cur as we know the strings match.
+	 * */
+	if (a_width == INDENT_BLANKLINE)
 		return 0;
 
 	/*
 	 * The indent changes of the block are known and stored in pmb->wsd;
 	 * however we need to check if the indent changes of the current line
-	 * match those of the current block and that the text of 'l' and 'cur'
-	 * after the indentation match.
+	 * match those of the current block.
 	 */
 	delta = b_width - a_width;
 
@@ -894,22 +896,26 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	if (pmb->wsd == INDENT_BLANKLINE)
 		pmb->wsd = delta;
 
-	return !(delta == pmb->wsd && al - a_off == bl - b_off &&
-		 !memcmp(a + a_off, b + b_off, al - a_off));
+	return delta != pmb->wsd;
 }
 
-static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
-			   const struct hashmap_entry *eptr,
-			   const struct hashmap_entry *entry_or_key,
-			   const void *keydata)
+struct interned_diff_symbol {
+	struct hashmap_entry ent;
+	struct emitted_diff_symbol *es;
+};
+
+static int interned_diff_symbol_cmp(const void *hashmap_cmp_fn_data,
+				    const struct hashmap_entry *eptr,
+				    const struct hashmap_entry *entry_or_key,
+				    const void *keydata)
 {
 	const struct diff_options *diffopt = hashmap_cmp_fn_data;
 	const struct emitted_diff_symbol *a, *b;
 	unsigned flags = diffopt->color_moved_ws_handling
 			 & XDF_WHITESPACE_FLAGS;
 
-	a = container_of(eptr, const struct moved_entry, ent)->es;
-	b = container_of(entry_or_key, const struct moved_entry, ent)->es;
+	a = container_of(eptr, const struct interned_diff_symbol, ent)->es;
+	b = container_of(entry_or_key, const struct interned_diff_symbol, ent)->es;
 
 	return !xdiff_compare_lines(a->line + a->indent_off,
 				    a->len - a->indent_off,
@@ -917,55 +923,81 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 				    b->len - b->indent_off, flags);
 }
 
-static struct moved_entry *prepare_entry(struct diff_options *o,
-					 int line_no)
+static void prepare_entry(struct diff_options *o, struct emitted_diff_symbol *l,
+			  struct interned_diff_symbol *s)
 {
-	struct moved_entry *ret = xmalloc(sizeof(*ret));
-	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 	unsigned int hash = xdiff_hash_string(l->line + l->indent_off,
 					      l->len - l->indent_off, flags);
 
-	hashmap_entry_init(&ret->ent, hash);
-	ret->es = l;
-	ret->next_line = NULL;
-
-	return ret;
+	hashmap_entry_init(&s->ent, hash);
+	s->es = l;
 }
 
-static void add_lines_to_move_detection(struct diff_options *o,
-					struct hashmap *add_lines,
-					struct hashmap *del_lines)
+struct moved_entry_list {
+	struct moved_entry *add, *del;
+};
+
+static struct moved_entry_list *add_lines_to_move_detection(struct diff_options *o,
+							    struct mem_pool *entry_mem_pool)
 {
 	struct moved_entry *prev_line = NULL;
-
+	struct mem_pool interned_pool;
+	struct hashmap interned_map;
+	struct moved_entry_list *entry_list = NULL;
+	size_t entry_list_alloc = 0;
+	unsigned id = 0;
 	int n;
+
+	hashmap_init(&interned_map, interned_diff_symbol_cmp, o, 8096);
+	mem_pool_init(&interned_pool, 1024 * 1024);
+
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
-		struct hashmap *hm;
-		struct moved_entry *key;
+		struct interned_diff_symbol key;
+		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
+		struct interned_diff_symbol *s;
+		struct moved_entry *entry;
 
-		switch (o->emitted_symbols->buf[n].s) {
-		case DIFF_SYMBOL_PLUS:
-			hm = add_lines;
-			break;
-		case DIFF_SYMBOL_MINUS:
-			hm = del_lines;
-			break;
-		default:
+		if (l->s != DIFF_SYMBOL_PLUS && l->s != DIFF_SYMBOL_MINUS) {
 			prev_line = NULL;
 			continue;
 		}
 
 		if (o->color_moved_ws_handling &
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-			fill_es_indent_data(&o->emitted_symbols->buf[n]);
-		key = prepare_entry(o, n);
-		if (prev_line && prev_line->es->s == o->emitted_symbols->buf[n].s)
-			prev_line->next_line = key;
+			fill_es_indent_data(l);
 
-		hashmap_add(hm, &key->ent);
-		prev_line = key;
+		prepare_entry(o, l, &key);
+		s = hashmap_get_entry(&interned_map, &key, ent, &key.ent);
+		if (s) {
+			l->id = s->es->id;
+		} else {
+			l->id = id;
+			ALLOC_GROW_BY(entry_list, id, 1, entry_list_alloc);
+			hashmap_add(&interned_map,
+				    memcpy(mem_pool_alloc(&interned_pool,
+							  sizeof(key)),
+					   &key, sizeof(key)));
+		}
+		entry = mem_pool_alloc(entry_mem_pool, sizeof(*entry));
+		entry->es = l;
+		entry->next_line = NULL;
+		if (prev_line && prev_line->es->s == l->s)
+			prev_line->next_line = entry;
+		prev_line = entry;
+		if (l->s == DIFF_SYMBOL_PLUS) {
+			entry->next_match = entry_list[l->id].add;
+			entry_list[l->id].add = entry;
+		} else {
+			entry->next_match = entry_list[l->id].del;
+			entry_list[l->id].del = entry;
+		}
 	}
+
+	hashmap_clear(&interned_map);
+	mem_pool_discard(&interned_pool, 0);
+
+	return entry_list;
 }
 
 static void pmb_advance_or_null(struct diff_options *o,
@@ -974,7 +1006,6 @@ static void pmb_advance_or_null(struct diff_options *o,
 				int *pmb_nr)
 {
 	int i, j;
-	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
 	for (i = 0, j = 0; i < *pmb_nr; i++) {
 		int match;
@@ -987,9 +1018,8 @@ static void pmb_advance_or_null(struct diff_options *o,
 			match = cur &&
 				!cmp_in_block_with_wsd(o, cur, l, &pmb[i]);
 		else
-			match = cur &&
-				xdiff_compare_lines(cur->es->line, cur->es->len,
-						    l->line, l->len, flags);
+			match = cur && cur->es->id == l->id;
+
 		if (match)
 			pmb[j++].match = cur;
 	}
@@ -1034,8 +1064,7 @@ static int adjust_last_block(struct diff_options *o, int n, int block_length)
 
 /* Find blocks of moved code, delegate actual coloring decision to helper */
 static void mark_color_as_moved(struct diff_options *o,
-				struct hashmap *add_lines,
-				struct hashmap *del_lines)
+				struct moved_entry_list *entry_list)
 {
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
@@ -1044,23 +1073,15 @@ static void mark_color_as_moved(struct diff_options *o,
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
-		struct hashmap *hm = NULL;
-		struct moved_entry *key;
 		struct moved_entry *match = NULL;
 		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
 
 		switch (l->s) {
 		case DIFF_SYMBOL_PLUS:
-			hm = del_lines;
-			key = prepare_entry(o, n);
-			match = hashmap_get_entry(hm, key, ent, NULL);
-			free(key);
+			match = entry_list[l->id].del;
 			break;
 		case DIFF_SYMBOL_MINUS:
-			hm = add_lines;
-			key = prepare_entry(o, n);
-			match = hashmap_get_entry(hm, key, ent, NULL);
-			free(key);
+			match = entry_list[l->id].add;
 			break;
 		default:
 			flipped_block = 0;
@@ -1089,7 +1110,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			 * The current line is the start of a new block.
 			 * Setup the set of potential blocks.
 			 */
-			hashmap_for_each_entry_from(hm, match, ent) {
+			for (; match; match = match->next_match) {
 				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
 				if (o->color_moved_ws_handling &
 				    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
@@ -6216,20 +6237,18 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 
 	if (o->emitted_symbols) {
 		if (o->color_moved) {
-			struct hashmap add_lines, del_lines;
-
-			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
-			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
+			struct mem_pool entry_pool;
+			struct moved_entry_list *entry_list;
 
-			add_lines_to_move_detection(o, &add_lines, &del_lines);
-			mark_color_as_moved(o, &add_lines, &del_lines);
+			mem_pool_init(&entry_pool, 1024 * 1024);
+			entry_list = add_lines_to_move_detection(o,
+								 &entry_pool);
+			mark_color_as_moved(o, entry_list);
 			if (o->color_moved == COLOR_MOVED_ZEBRA_DIM)
 				dim_moved_lines(o);
 
-			hashmap_clear_and_free(&add_lines, struct moved_entry,
-						ent);
-			hashmap_clear_and_free(&del_lines, struct moved_entry,
-						ent);
+			mem_pool_discard(&entry_pool, 0);
+			free(entry_list);
 		}
 
 		for (i = 0; i < esm.nr; i++)
-- 
gitgitgadget


* Re: [PATCH v2 00/12] diff --color-moved[-ws] speedups
  2021-07-20 10:36 ` [PATCH v2 00/12] " Phillip Wood via GitGitGadget
                     ` (11 preceding siblings ...)
  2021-07-20 10:36   ` [PATCH v2 12/12] diff --color-moved: intern strings Phillip Wood via GitGitGadget
@ 2021-07-20 13:38   ` Phillip Wood
  2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
  13 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood @ 2021-07-20 13:38 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget, git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren

Sorry Elijah, I forgot to add you to the CC list on GitGitGadget.

Best Wishes

Phillip

On 20/07/2021 11:36, Phillip Wood via GitGitGadget wrote:
> Thanks to Ævar and Elijah for their comments, I've reworded the commit
> messages, addressed the enum initialization issue in patch 2 (now 3) and
> added some perf tests.
> 
> There are two new patches in this round. The first patch is new and adds the
> perf tests suggested by Ævar, the penultimate patch is also new and converts
> the existing code to use a designated initializer.
> 
> I've converted the benchmark results in the commit messages to use the new
> tests, the percentage changes are broadly similar to the previous results
> though I ended up running them on a different computer this time.
> 
> V1 cover letter:
> 
> The current implementation of diff --color-moved-ws=allow-indentation-change
> is considerably slower than the implementation of diff --color-moved which
> is in turn slower than a regular diff. This patch series starts with a
> couple of bug fixes and then reworks the implementation of diff
> --color-moved and diff --color-moved-ws=allow-indentation-change to speed
> them up on large diffs. The time to run git diff --color-moved
> --no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run
> git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0
> v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs
> with --color-moved - the time to run git log -p --color-moved
> --no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent
> processors. On older processors these patches reduce the running time in all
> cases that I've tested. In general the larger the diff the larger the speed
> up. As an extreme example the time to run diff --color-moved
> --color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8
> minutes to 6 seconds.
> 
> Phillip Wood (12):
>    diff --color-moved: add perf tests
>    diff --color-moved=zebra: fix alternate coloring
>    diff --color-moved: avoid false short line matches and bad zerba
>      coloring
>    diff: simplify allow-indentation-change delta calculation
>    diff --color-moved-ws=allow-indentation-change: simplify and optimize
>    diff --color-moved: call comparison function directly
>    diff --color-moved: unify moved block growth functions
>    diff --color-moved: shrink potential moved blocks as we go
>    diff --color-moved: stop clearing potential moved blocks
>    diff --color-moved-ws=allow-indentation-change: improve hash lookups
>    diff: use designated initializers for emitted_diff_symbol
>    diff --color-moved: intern strings
> 
>   diff.c                           | 377 ++++++++++++-------------------
>   t/perf/p4002-diff-color-moved.sh |  45 ++++
>   t/t4015-diff-whitespace.sh       | 137 +++++++++++
>   3 files changed, 323 insertions(+), 236 deletions(-)
>   create mode 100755 t/perf/p4002-diff-color-moved.sh
> 
> 
> base-commit: 211eca0895794362184da2be2a2d812d070719d3
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-981%2Fphillipwood%2Fwip%2Fdiff-color-moved-tweaks-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-981/phillipwood/wip/diff-color-moved-tweaks-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/981
> 
> Range-diff vs v1:
> 
>    -:  ----------- >  1:  8fc8914a37b diff --color-moved: add perf tests
>    1:  374dbebcbf2 !  2:  9b4e4d2674a diff --color-moved=zerba: fix alternate coloring
>       @@ Metadata
>        Author: Phillip Wood <phillip.wood@dunelm.org.uk>
>        
>         ## Commit message ##
>       -    diff --color-moved=zerba: fix alternate coloring
>       +    diff --color-moved=zebra: fix alternate coloring
>        
>            b0a2ba4776 ("diff --color-moved=zebra: be stricter with color
>            alternation", 2018-11-23) sought to avoid using the alternate colors
>    2:  3d02a0a91a0 !  3:  5512145c70f diff --color-moved: avoid false short line matches and bad zerba coloring
>       @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
>         	int pmb_nr = 0, pmb_alloc = 0;
>         	int n, flipped_block = 0, block_length = 0;
>        -	enum diff_symbol last_symbol = 0;
>       -+	enum diff_symbol moved_symbol = 0;
>       ++	enum diff_symbol moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
>         
>         
>         	for (n = 0; n < o->emitted_symbols->nr; n++) {
>       @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
>        -			last_symbol = l->s;
>        +		}
>        +		if (!match) {
>       -+			moved_symbol = 0;
>       ++			moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
>         			continue;
>         		}
>         
>       @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
>        +			if (pmb_nr)
>        +				moved_symbol = l->s;
>        +			else
>       -+				moved_symbol = 0;
>       ++				moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
>        +
>         			block_length = 0;
>         		}
>    3:  30f0ed44768 =  4:  93fdef30d64 diff: simplify allow-indentation-change delta calculation
>    4:  ebb6eec1d92 !  5:  6b7a8aed4ec diff --color-moved-ws=allow-indentation-change: simplify and optimize
>       @@ Commit message
>            comparison to filter out the non-matching lines. Fixing this reduces
>            time to run
>              git diff --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
>       -    by 88% and simplifies the code.
>       +    by 93% compared to master and simplifies the code.
>        
>       -    Before this change
>       -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
>       -      Time (mean ± σ):      9.978 s ±  0.042 s    [User: 9.905 s, System: 0.057 s]
>       -      Range (min … max):    9.917 s … 10.037 s    10 runs
>       -
>       -    After this change
>       -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
>       -      Time (mean ± σ):      1.220 s ±  0.004 s    [User: 1.160 s, System: 0.058 s]
>       -      Range (min … max):    1.214 s …  1.226 s    10 runs
>       +    Test                                                                  HEAD^              HEAD
>       +    ---------------------------------------------------------------------------------------------------------------
>       +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41( 0.38+0.03)   0.41(0.37+0.04)  +0.0%
>       +    4002.2: diff --color-moved --no-color-moved-ws large change           0.83( 0.79+0.04)   0.82(0.79+0.02)  -1.2%
>       +    4002.3: diff --color-moved-ws=allow-indentation-change large change  13.68(13.59+0.07)   0.92(0.89+0.03) -93.3%
>       +    4002.4: log --no-color-moved --no-color-moved-ws                      1.31( 1.22+0.08)   1.31(1.21+0.10)  +0.0%
>       +    4002.5: log --color-moved --no-color-moved-ws                         1.47( 1.40+0.07)   1.47(1.36+0.10)  +0.0%
>       +    4002.6: log --color-moved-ws=allow-indentation-change                 1.87( 1.77+0.09)   1.50(1.41+0.09) -19.8%
>        
>            Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>        
>    5:  cec0c2d04d7 !  6:  cfbdd447eee diff --color-moved: call comparison function directly
>       @@ Metadata
>         ## Commit message ##
>            diff --color-moved: call comparison function directly
>        
>       -    Calling xdiff_compare_lines() directly rather than using a function
>       -    pointer from the hash map reduces the time very slightly but more
>       -    importantly it will allow us to easily combine pmb_advance_or_null()
>       -    and pmb_advance_or_null_multi_match() in the next commit.
>       +    This change will allow us to easily combine pmb_advance_or_null() and
>       +    pmb_advance_or_null_multi_match() in the next commit. Calling
>       +    xdiff_compare_lines() directly rather than using a function pointer
>       +    from the hash map has little effect on the run time.
>        
>       -    Before this change
>       -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
>       -      Time (mean ± σ):      1.136 s ±  0.004 s    [User: 1.079 s, System: 0.053 s]
>       -      Range (min … max):    1.130 s …  1.141 s    10 runs
>       -
>       -    After this change
>       -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
>       -      Time (mean ± σ):      1.118 s ±  0.003 s    [User: 1.062 s, System: 0.053 s]
>       -      Range (min … max):    1.114 s …  1.121 s    10 runs
>       +    Test                                                                  HEAD^             HEAD
>       +    -------------------------------------------------------------------------------------------------------------
>       +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41(0.37+0.04)   0.41(0.39+0.02) +0.0%
>       +    4002.2: diff --color-moved --no-color-moved-ws large change           0.82(0.79+0.02)   0.83(0.79+0.03) +1.2%
>       +    4002.3: diff --color-moved-ws=allow-indentation-change large change   0.92(0.89+0.03)   0.91(0.85+0.05) -1.1%
>       +    4002.4: log --no-color-moved --no-color-moved-ws                      1.31(1.21+0.10)   1.33(1.22+0.10) +1.5%
>       +    4002.5: log --color-moved --no-color-moved-ws                         1.47(1.36+0.10)   1.47(1.39+0.08) +0.0%
>       +    4002.6: log --color-moved-ws=allow-indentation-change                 1.50(1.41+0.09)   1.51(1.42+0.09) +0.7%
>        
>            Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>        
>    6:  050cef0081d =  7:  73ce9b54e86 diff --color-moved: unify moved block growth functions
>    7:  9390e9a66eb =  8:  ef8ce0e6ebc diff --color-moved: shrink potential moved blocks as we go
>    8:  1de99ac2bc3 =  9:  9d0a042eae1 diff --color-moved: stop clearing potential moved blocks
>    9:  41cdedd6090 ! 10:  dd365ad115f diff --color-moved-ws=allow-indentation-change: improve hash lookups
>       @@ Commit message
>            As libxdiff does not have a whitespace flag to ignore the indentation
>            the code for --color-moved-ws=allow-indentation-change uses
>            XDF_IGNORE_WHITESPACE and then filters out any hash lookups where
>       -    there are non-indentation changes. This is filtering is inefficient as
>       +    there are non-indentation changes. This filtering is inefficient as
>            we have to perform another string comparison.
>        
>            By using the offset data that we have already computed to skip the
>            indentation we can avoid using XDF_IGNORE_WHITESPACE and safely remove
>       -    the extra checks which improves the performance by 14% and paves the
>       +    the extra checks which improves the performance by 11% and paves the
>            way for the elimination of string comparisons in the next commit.
>        
>       -    This change slightly increases the runtime of other --color-moved
>       +    This change slightly increases the run time of other --color-moved
>            modes. This could be avoided by using different comparison functions
>       -    for the different modes but after the changes in the next commit there
>       -    is no measurable benefit.
>       +    for the different modes but after the next two commits there is no
>       +    measurable benefit in doing so.
>        
>       -    Before this change
>       -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
>       -      Time (mean ± σ):      1.116 s ±  0.005 s    [User: 1.057 s, System: 0.056 s]
>       -      Range (min … max):    1.109 s …  1.123 s    10 runs
>       -
>       -    Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
>       -      Time (mean ± σ):      1.216 s ±  0.005 s    [User: 1.155 s, System: 0.059 s]
>       -      Range (min … max):    1.206 s …  1.223 s    10 runs
>       -
>       -    After this change
>       -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
>       -      Time (mean ± σ):      1.147 s ±  0.005 s    [User: 1.085 s, System: 0.059 s]
>       -      Range (min … max):    1.140 s …  1.154 s    10 runs
>       -
>       -    Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
>       -      Time (mean ± σ):      1.048 s ±  0.005 s    [User: 987.4 ms, System: 58.8 ms]
>       -      Range (min … max):    1.043 s …  1.056 s    10 runs
>       +    Test                                                                  HEAD^             HEAD
>       +    --------------------------------------------------------------------------------------------------------------
>       +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41(0.38+0.03)   0.41(0.36+0.04) +0.0%
>       +    4002.2: diff --color-moved --no-color-moved-ws large change           0.82(0.76+0.05)   0.84(0.79+0.04) +2.4%
>       +    4002.3: diff --color-moved-ws=allow-indentation-change large change   0.91(0.88+0.03)   0.81(0.74+0.06) -11.0%
>       +    4002.4: log --no-color-moved --no-color-moved-ws                      1.32(1.21+0.10)   1.31(1.19+0.11) -0.8%
>       +    4002.5: log --color-moved --no-color-moved-ws                         1.47(1.37+0.10)   1.47(1.36+0.11) +0.0%
>       +    4002.6: log --color-moved-ws=allow-indentation-change                 1.51(1.42+0.09)   1.48(1.37+0.10) -2.0%
>        
>            Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>        
>    -:  ----------- > 11:  c160222ab3c diff: use designated initializers for emitted_diff_symbol
>   10:  220664dd907 ! 12:  753554587f9 diff --color-moved: intern strings
>       @@ Commit message
>            number of hash lookups a little (calculating the ids still involves
>            one hash lookup per line) but the main benefit is that when growing
>            blocks of potentially moved lines we can replace string comparisons
>       -    which involve chasing a pointer with a simple integer comparison.  On
>       -    a large diff this commit reduces the time to run 'diff --color-moved'
>       -    by 33% and 'diff --color-moved-ws=allow-indentation-change' by 20%.
>       +    which involve chasing a pointer with a simple integer comparison.
>        
>       -    Compared to master the time to run 'git log --patch --color-moved' is
>       -    increased by 2% and 'git log --patch
>       -    --color-moved-ws=allow-indentation-change' in reduced by 14%. These
>       -    timings were performed on an i5-7200U, on an i5-3470 both commands are
>       -    faster than master. The small speed decrease on commit sized diffs is
>       -    unfortunate but I think it is small enough to be worth it for the
>       -    gains on larger diffs.
>       +    On a large diff this commit reduces the time to run
>       +       diff --color-moved
>       +    by 33% and
>       +        diff --color-moved-ws=allow-indentation-change
>       +    by 26%. Compared to master the time to run
>       +        diff --color-moved-ws=allow-indentation-change
>       +    is now reduced by 95% and the overhead compared to --no-color-moved is
>       +    reduced to 50%.
>        
>       -    Large diff before this change:
>       -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
>       -      Time (mean ± σ):      1.147 s ±  0.005 s    [User: 1.085 s, System: 0.059 s]
>       -      Range (min … max):    1.140 s …  1.154 s    10 runs
>       +    Compared to the previous commit the time to run
>       +        git log --patch --color-moved
>       +    is increased slightly, but compared to master there is no change in
>       +    run time.
>        
>       -    Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
>       -      Time (mean ± σ):      1.048 s ±  0.005 s    [User: 987.4 ms, System: 58.8 ms]
>       -      Range (min … max):    1.043 s …  1.056 s    10 runs
>       +    Test                                                                  HEAD^             HEAD
>       +    --------------------------------------------------------------------------------------------------------------
>       +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41(0.36+0.04)   0.41(0.37+0.03)  +0.0%
>       +    4002.2: diff --color-moved --no-color-moved-ws large change           0.83(0.79+0.03)   0.55(0.52+0.03) -33.7%
>       +    4002.3: diff --color-moved-ws=allow-indentation-change large change   0.81(0.77+0.04)   0.60(0.55+0.05) -25.9%
>       +    4002.4: log --no-color-moved --no-color-moved-ws                      1.30(1.20+0.09)   1.31(1.22+0.08)  +0.8%
>       +    4002.5: log --color-moved --no-color-moved-ws                         1.46(1.35+0.11)   1.47(1.30+0.16)  +0.7%
>       +    4002.6: log --color-moved-ws=allow-indentation-change                 1.46(1.38+0.07)   1.47(1.34+0.13)  +0.7%
>        
>       -    Large diff after this change
>       -    Benchmark #1: bin-wrappers/git diff --diff-algorithm=myers --color-moved --no-color-moved-ws v2.28.0 v2.29.0
>       -      Time (mean ± σ):     762.7 ms ±   2.8 ms    [User: 707.5 ms, System: 53.7 ms]
>       -      Range (min … max):   758.0 ms … 767.0 ms    10 runs
>       -
>       -    Benchmark #2: bin-wrappers/git diff --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
>       -      Time (mean ± σ):     831.7 ms ±   1.7 ms    [User: 776.5 ms, System: 53.3 ms]
>       -      Range (min … max):   829.2 ms … 835.1 ms    10 runs
>       -
>       -    Small diffs on master
>       -    Benchmark #1: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --no-color-moved-ws --no-merges -n1000 v2.29.0
>       -      Time (mean ± σ):      1.567 s ±  0.001 s    [User: 1.443 s, System: 0.121 s]
>       -      Range (min … max):    1.566 s …  1.571 s    10 runs
>       -
>       -    Benchmark #2: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change -n1000 --no-merges v2.29.0
>       -      Time (mean ± σ):      1.865 s ±  0.008 s    [User: 1.748 s, System: 0.112 s]
>       -      Range (min … max):    1.857 s …  1.881 s    10 runs
>       -
>       -    Small diffs after this change
>       -    Benchmark #1: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --no-color-moved-ws --no-merges -n1000 v2.29.0
>       -      Time (mean ± σ):      1.597 s ±  0.003 s    [User: 1.413 s, System: 0.179 s]
>       -      Range (min … max):    1.591 s …  1.601 s    10 runs
>       -
>       -    Benchmark #2: bin-wrappers/git log -p --diff-algorithm=myers --color-moved --color-moved-ws=allow-indentation-change -n1000 --no-merges v2.29.0
>       -      Time (mean ± σ):      1.606 s ±  0.006 s    [User: 1.420 s, System: 0.181 s]
>       -      Range (min … max):    1.601 s …  1.622 s    10 runs
>       +    Test                                                                  master            HEAD
>       +    --------------------------------------------------------------------------------------------------------------
>       +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.40( 0.36+0.03)  0.41(0.37+0.03)  +2.5%
>       +    4002.2: diff --color-moved --no-color-moved-ws large change           0.82( 0.77+0.04)  0.55(0.52+0.03) -32.9%
>       +    4002.3: diff --color-moved-ws=allow-indentation-change large change  14.10(14.04+0.04)  0.60(0.55+0.05) -95.7%
>       +    4002.4: log --no-color-moved --no-color-moved-ws                      1.31( 1.21+0.09)  1.31(1.22+0.08)  +0.0%
>       +    4002.5: log --color-moved --no-color-moved-ws                         1.47( 1.37+0.09)  1.47(1.30+0.16)  +0.0%
>       +    4002.6: log --color-moved-ws=allow-indentation-change                 1.86( 1.76+0.10)  1.47(1.34+0.13) -21.0%
>        
>            Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>        
>       @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
>         				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
>         				if (o->color_moved_ws_handling &
>         				    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
>       -@@ diff.c: static void emit_diff_symbol_from_struct(struct diff_options *o,
>       - static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
>       - 			     const char *line, int len, unsigned flags)
>       - {
>       --	struct emitted_diff_symbol e = {line, len, flags, 0, 0, s};
>       -+	struct emitted_diff_symbol e = {line, len, flags, 0, 0, 0, s};
>       -
>       - 	if (o->emitted_symbols)
>       - 		append_emitted_diff_symbol(o, &e);
>        @@ diff.c: static void diff_flush_patch_all_file_pairs(struct diff_options *o)
>         
>         	if (o->emitted_symbols) {
> 


* [PATCH v3 00/15] diff --color-moved[-ws] speedups
  2021-07-20 10:36 ` [PATCH v2 00/12] " Phillip Wood via GitGitGadget
                     ` (12 preceding siblings ...)
  2021-07-20 13:38   ` [PATCH v2 00/12] diff --color-moved[-ws] speedups Phillip Wood
@ 2021-10-27 12:04   ` Phillip Wood via GitGitGadget
  2021-10-27 12:04     ` [PATCH v3 01/15] diff --color-moved: add perf tests Phillip Wood via GitGitGadget
                       ` (16 more replies)
  13 siblings, 17 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-10-27 12:04 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood

Sorry it has taken so long to post this re-roll. Changes since V2:

 * Patches 1-3 are new and fix an existing bug.
 * Patch 8 includes Peff's unused parameter fix.
 * Patch 11 has been updated to fix a bug in V2.
 * Patch 13 has an expanded commit message explaining a change in behavior
   for lines starting with a form-feed.
 * Updated benchmark results.

The bug fix in patch 3 degrades the performance, but by the end of the
series the timings are the same as V2 - see the range diff.

V2 cover letter:

Thanks to Ævar and Elijah for their comments, I've reworded the commit
messages, addressed the enum initialization issue in patch 2 (now 3) and
added some perf tests.

There are two new patches in this round. The first patch is new and adds the
perf tests suggested by Ævar, the penultimate patch is also new and converts
the existing code to use a designated initializer.

I've converted the benchmark results in the commit messages to use the new
tests, the percentage changes are broadly similar to the previous results
though I ended up running them on a different computer this time.

V1 cover letter:

The current implementation of diff --color-moved-ws=allow-indentation-change
is considerably slower than the implementation of diff --color-moved which
is in turn slower than a regular diff. This patch series starts with a
couple of bug fixes and then reworks the implementation of diff
--color-moved and diff --color-moved-ws=allow-indentation-change to speed
them up on large diffs. The time to run git diff --color-moved
--no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run
git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0
v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs
with --color-moved - the time to run git log -p --color-moved
--no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent
processors. On older processors these patches reduce the running time in all
cases that I've tested. In general the larger the diff the larger the speed
up. As an extreme example the time to run diff --color-moved
--color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8
minutes to 6 seconds.
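
Much of the gain on large diffs comes from the final patch, which interns
each line so that later comparisons are integer comparisons rather than
string comparisons. The following is only a minimal standalone sketch of
that idea and not the code in diff.c: every name in it (intern_table,
intern(), struct line, match_list, index_lines()) is hypothetical, the
hash is plain FNV-1a, and the bucket count is fixed for brevity. It
assigns each distinct line text a small id and threads the lines onto
per-id "add" and "del" lists, so finding the candidates for a moved line
becomes an array index.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is not the code from diff.c. */
#define NBUCKETS 1024	/* fixed bucket count keeps the sketch short */

struct interned {
	const char *str;	/* borrowed, must outlive the table */
	unsigned id;
	struct interned *next;	/* hash-bucket chain */
};

struct intern_table {
	struct interned *bucket[NBUCKETS];
	unsigned next_id;
};

/* FNV-1a; any reasonable string hash would do here. */
static unsigned hash_str(const char *s)
{
	unsigned h = 2166136261u;

	for (; *s; s++)
		h = (h ^ (unsigned char)*s) * 16777619u;
	return h;
}

/* Return the id for 's', assigning the next free id on first sight. */
static unsigned intern(struct intern_table *t, const char *s)
{
	struct interned **b, *e;

	assert(t && s);
	b = &t->bucket[hash_str(s) % NBUCKETS];
	for (e = *b; e; e = e->next)
		if (!strcmp(e->str, s))
			return e->id;

	e = malloc(sizeof(*e));
	if (!e)
		abort();
	e->str = s;
	e->id = t->next_id++;
	e->next = *b;
	*b = e;
	return e->id;
}

static void intern_table_clear(struct intern_table *t)
{
	size_t i;

	for (i = 0; i < NBUCKETS; i++) {
		while (t->bucket[i]) {
			struct interned *e = t->bucket[i];

			t->bucket[i] = e->next;
			free(e);
		}
	}
}

struct line {
	char sign;			/* '+' or '-' */
	const char *text;
	unsigned id;			/* filled in by index_lines() */
	struct line *next_match;	/* next line with same id and sign */
};

struct match_list {
	struct line *add, *del;
};

/*
 * Assign an id to every line and push it onto the add or del list for
 * that id. Returns an array indexed by id which the caller must free().
 */
static struct match_list *index_lines(struct line *lines, size_t nr)
{
	struct intern_table t = { { NULL }, 0 };
	/* there can be at most 'nr' distinct ids */
	struct match_list *lists = calloc(nr ? nr : 1, sizeof(*lists));
	size_t i;

	if (!lists)
		abort();
	for (i = 0; i < nr; i++) {
		struct line *l = &lines[i];
		struct line **head;

		l->id = intern(&t, l->text);
		head = l->sign == '+' ? &lists[l->id].add : &lists[l->id].del;
		l->next_match = *head;
		*head = l;
	}
	intern_table_clear(&t);	/* only the ids are needed from here on */
	return lists;
}
```

With such an index, the deleted lines that might match an added line l
are the list starting at lists[l->id].del, and two lines have the same
content exactly when their ids are equal. The hash table itself is only
needed while the ids are being assigned.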

Phillip Wood (15):
  diff --color-moved: add perf tests
  diff --color-moved: clear all flags on blocks that are too short
  diff --color-moved: factor out function
  diff --color-moved: rewind when discarding pmb
  diff --color-moved=zebra: fix alternate coloring
  diff --color-moved: avoid false short line matches and bad zerba
    coloring
  diff: simplify allow-indentation-change delta calculation
  diff --color-moved-ws=allow-indentation-change: simplify and optimize
  diff --color-moved: call comparison function directly
  diff --color-moved: unify moved block growth functions
  diff --color-moved: shrink potential moved blocks as we go
  diff --color-moved: stop clearing potential moved blocks
  diff --color-moved-ws=allow-indentation-change: improve hash lookups
  diff: use designated initializers for emitted_diff_symbol
  diff --color-moved: intern strings

 diff.c                           | 431 +++++++++++++------------------
 t/perf/p4002-diff-color-moved.sh |  45 ++++
 t/t4015-diff-whitespace.sh       | 205 ++++++++++++++-
 3 files changed, 425 insertions(+), 256 deletions(-)
 create mode 100755 t/perf/p4002-diff-color-moved.sh


base-commit: 211eca0895794362184da2be2a2d812d070719d3
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-981%2Fphillipwood%2Fwip%2Fdiff-color-moved-tweaks-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-981/phillipwood/wip/diff-color-moved-tweaks-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/981

Range-diff vs v2:

  1:  8fc8914a37b =  1:  8fc8914a37b diff --color-moved: add perf tests
  -:  ----------- >  2:  e9daed2360c diff --color-moved: clear all flags on blocks that are too short
  -:  ----------- >  3:  658aec2670c diff --color-moved: factor out function
  -:  ----------- >  4:  a30f52d7f15 diff --color-moved: rewind when discarding pmb
  2:  9b4e4d2674a !  5:  1dde206b7b1 diff --color-moved=zebra: fix alternate coloring
     @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
       		switch (l->s) {
       		case DIFF_SYMBOL_PLUS:
      @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
     - 			}
     + 							    &pmb, &pmb_alloc,
     + 							    &pmb_nr);
       
     - 			if (adjust_last_block(o, n, block_length) &&
     --			    pmb_nr && last_symbol != l->s)
     -+			    pmb_nr && last_symbol == l->s)
     +-			if (contiguous && pmb_nr && last_symbol != l->s)
     ++			if (contiguous && pmb_nr && last_symbol == l->s)
       				flipped_block = (flipped_block + 1) % 2;
       			else
       				flipped_block = 0;
  3:  5512145c70f !  6:  2717ff500d2 diff --color-moved: avoid false short line matches and bad zerba coloring
     @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
      +		if (pmb_nr && (!match || l->s != moved_symbol)) {
       			int i;
       
     - 			adjust_last_block(o, n, block_length);
     + 			if (!adjust_last_block(o, n, block_length) &&
      @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
       			pmb_nr = 0;
       			block_length = 0;
     @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
       			continue;
       		}
      @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
     - 			}
     + 							    &pmb, &pmb_alloc,
     + 							    &pmb_nr);
       
     - 			if (adjust_last_block(o, n, block_length) &&
     --			    pmb_nr && last_symbol == l->s)
     -+			    pmb_nr && moved_symbol == l->s)
     +-			if (contiguous && pmb_nr && last_symbol == l->s)
     ++			if (contiguous && pmb_nr && moved_symbol == l->s)
       				flipped_block = (flipped_block + 1) % 2;
       			else
       				flipped_block = 0;
  4:  93fdef30d64 =  7:  f96fa71d53c diff: simplify allow-indentation-change delta calculation
  5:  6b7a8aed4ec !  8:  324b689c915 diff --color-moved-ws=allow-indentation-change: simplify and optimize
     @@ Commit message
            git diff --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
          by 93% compared to master and simplifies the code.
      
     -    Test                                                                  HEAD^              HEAD
     +    Test                                                                 HEAD^               HEAD
          ---------------------------------------------------------------------------------------------------------------
     -    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41( 0.38+0.03)   0.41(0.37+0.04)  +0.0%
     -    4002.2: diff --color-moved --no-color-moved-ws large change           0.83( 0.79+0.04)   0.82(0.79+0.02)  -1.2%
     -    4002.3: diff --color-moved-ws=allow-indentation-change large change  13.68(13.59+0.07)   0.92(0.89+0.03) -93.3%
     -    4002.4: log --no-color-moved --no-color-moved-ws                      1.31( 1.22+0.08)   1.31(1.21+0.10)  +0.0%
     -    4002.5: log --color-moved --no-color-moved-ws                         1.47( 1.40+0.07)   1.47(1.36+0.10)  +0.0%
     -    4002.6: log --color-moved-ws=allow-indentation-change                 1.87( 1.77+0.09)   1.50(1.41+0.09) -19.8%
     +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.35+0.03)   0.38(0.35+0.03)  +0.0%
     +    4002.2: diff --color-moved --no-color-moved-ws large change           0.86 (0.80+0.06)   0.87(0.83+0.04)  +1.2%
     +    4002.3: diff --color-moved-ws=allow-indentation-change large change  19.01(18.93+0.06)   0.97(0.92+0.04) -94.9%
     +    4002.4: log --no-color-moved --no-color-moved-ws                      1.16 (1.06+0.09)   1.17(1.06+0.10)  +0.9%
     +    4002.5: log --color-moved --no-color-moved-ws                         1.32 (1.25+0.07)   1.32(1.24+0.08)  +0.0%
     +    4002.6: log --color-moved-ws=allow-indentation-change                 1.71 (1.64+0.06)   1.36(1.25+0.10) -20.5%
      
     +    Test                                                                 master              HEAD
     +    ---------------------------------------------------------------------------------------------------------------
     +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.33+0.05)   0.38(0.35+0.03)  +0.0%
     +    4002.2: diff --color-moved --no-color-moved-ws large change           0.80 (0.75+0.04)   0.87(0.83+0.04)  +8.7%
     +    4002.3: diff --color-moved-ws=allow-indentation-change large change  14.20(14.15+0.05)   0.97(0.92+0.04) -93.2%
     +    4002.4: log --no-color-moved --no-color-moved-ws                      1.15 (1.05+0.09)   1.17(1.06+0.10)  +1.7%
     +    4002.5: log --color-moved --no-color-moved-ws                         1.30 (1.19+0.11)   1.32(1.24+0.08)  +1.5%
     +    4002.6: log --color-moved-ws=allow-indentation-change                 1.70 (1.63+0.06)   1.36(1.25+0.10) -20.0%
     +
     +    Helped-by: Jeff King <peff@peff.net>
          Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
      
       ## diff.c ##
      @@ diff.c: static int compute_ws_delta(const struct emitted_diff_symbol *a,
     + 	return 1;
     + }
       
     - static int cmp_in_block_with_wsd(const struct diff_options *o,
     - 				 const struct moved_entry *cur,
     +-static int cmp_in_block_with_wsd(const struct diff_options *o,
     +-				 const struct moved_entry *cur,
      -				 const struct moved_entry *match,
      -				 struct moved_block *pmb,
      -				 int n)
     -+				 const struct emitted_diff_symbol *l,
     -+				 struct moved_block *pmb)
     - {
     +-{
      -	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
      -	int al = cur->es->len, bl = match->es->len, cl = l->len;
     ++static int cmp_in_block_with_wsd(const struct moved_entry *cur,
     ++				 const struct emitted_diff_symbol *l,
     ++				 struct moved_block *pmb)
     ++{
      +	int al = cur->es->len, bl = l->len;
       	const char *a = cur->es->line,
      -		   *b = match->es->line,
     @@ diff.c: static void pmb_advance_or_null(struct diff_options *o,
      +		struct moved_entry *prev = pmb[i].match;
      +		struct moved_entry *cur = (prev && prev->next_line) ?
      +			prev->next_line : NULL;
     -+		if (cur && !cmp_in_block_with_wsd(o, cur, l, &pmb[i])) {
     ++		if (cur && !cmp_in_block_with_wsd(cur, l, &pmb[i])) {
       			/* Advance to the next line */
      -			pmb[i].match = pmb[i].match->next_line;
      +			pmb[i].match = cur;
  6:  cfbdd447eee !  9:  f142f33276a diff --color-moved: call comparison function directly
     @@ Commit message
      
          Test                                                                  HEAD^             HEAD
          -------------------------------------------------------------------------------------------------------------
     -    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41(0.37+0.04)   0.41(0.39+0.02) +0.0%
     -    4002.2: diff --color-moved --no-color-moved-ws large change           0.82(0.79+0.02)   0.83(0.79+0.03) +1.2%
     -    4002.3: diff --color-moved-ws=allow-indentation-change large change   0.92(0.89+0.03)   0.91(0.85+0.05) -1.1%
     -    4002.4: log --no-color-moved --no-color-moved-ws                      1.31(1.21+0.10)   1.33(1.22+0.10) +1.5%
     -    4002.5: log --color-moved --no-color-moved-ws                         1.47(1.36+0.10)   1.47(1.39+0.08) +0.0%
     -    4002.6: log --color-moved-ws=allow-indentation-change                 1.50(1.41+0.09)   1.51(1.42+0.09) +0.7%
     +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38(0.35+0.03)   0.38(0.32+0.06) +0.0%
     +    4002.2: diff --color-moved --no-color-moved-ws large change           0.87(0.83+0.04)   0.87(0.80+0.06) +0.0%
     +    4002.3: diff --color-moved-ws=allow-indentation-change large change   0.97(0.92+0.04)   0.97(0.93+0.04) +0.0%
     +    4002.4: log --no-color-moved --no-color-moved-ws                      1.17(1.06+0.10)   1.16(1.10+0.05) -0.9%
     +    4002.5: log --color-moved --no-color-moved-ws                         1.32(1.24+0.08)   1.31(1.22+0.09) -0.8%
     +    4002.6: log --color-moved-ws=allow-indentation-change                 1.36(1.25+0.10)   1.35(1.25+0.10) -0.7%
      
          Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
      
  7:  73ce9b54e86 ! 10:  8f3ea865dd3 diff --color-moved: unify moved block growth functions
     @@ diff.c: static void pmb_advance_or_null(struct diff_options *o,
      -		struct moved_entry *prev = pmb[i].match;
      -		struct moved_entry *cur = (prev && prev->next_line) ?
      -			prev->next_line : NULL;
     --		if (cur && !cmp_in_block_with_wsd(o, cur, l, &pmb[i])) {
     +-		if (cur && !cmp_in_block_with_wsd(cur, l, &pmb[i])) {
      -			/* Advance to the next line */
      +		if (o->color_moved_ws_handling &
      +		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
      +			match = cur &&
     -+				!cmp_in_block_with_wsd(o, cur, l, &pmb[i]);
     ++				!cmp_in_block_with_wsd(cur, l, &pmb[i]);
      +		else
      +			match = cur &&
      +				xdiff_compare_lines(cur->es->line, cur->es->len,
  8:  ef8ce0e6ebc ! 11:  078c04d4a66 diff --color-moved: shrink potential moved blocks as we go
     @@ diff.c: static void add_lines_to_move_detection(struct diff_options *o,
       		struct moved_entry *prev = pmb[i].match;
       		struct moved_entry *cur = (prev && prev->next_line) ?
      @@ diff.c: static void pmb_advance_or_null(struct diff_options *o,
     + 			match = cur &&
       				xdiff_compare_lines(cur->es->line, cur->es->len,
       						    l->line, l->len, flags);
     - 		if (match)
     +-		if (match)
      -			pmb[i].match = cur;
      -		else
      -			moved_block_clear(&pmb[i]);
     -+			pmb[j++].match = cur;
     - 	}
     +-	}
      -}
      -
      -static int shrink_potential_moved_blocks(struct moved_block *pmb,
     @@ diff.c: static void pmb_advance_or_null(struct diff_options *o,
      -			memset(&pmb[rp], 0, sizeof(pmb[rp]));
      -			rp--;
      -			lp++;
     --		}
     --	}
     ++		if (match) {
     ++			pmb[j] = pmb[i];
     ++			pmb[j++].match = cur;
     + 		}
     + 	}
      -
      -	/* Remember the number of running sets */
      -	return rp + 1;
      +	*pmb_nr = j;
       }
       
     - /*
     + static void fill_potential_moved_blocks(struct diff_options *o,
      @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
       			continue;
       		}
     @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
      +		pmb_advance_or_null(o, l, pmb, &pmb_nr);
       
       		if (pmb_nr == 0) {
     - 			/*
     + 			int contiguous = adjust_last_block(o, n, block_length);
  9:  9d0a042eae1 ! 12:  618371471a0 diff --color-moved: stop clearing potential moved blocks
     @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
       		if (pmb_nr && (!match || l->s != moved_symbol)) {
      -			int i;
      -
     - 			adjust_last_block(o, n, block_length);
     + 			if (!adjust_last_block(o, n, block_length) &&
     + 			    block_length > 1) {
     + 				/*
     +@@ diff.c: static void mark_color_as_moved(struct diff_options *o,
     + 				match = NULL;
     + 				n -= block_length;
     + 			}
      -			for(i = 0; i < pmb_nr; i++)
      -				moved_block_clear(&pmb[i]);
       			pmb_nr = 0;
 10:  dd365ad115f ! 13:  6a8e9a2724d diff --color-moved-ws=allow-indentation-change: improve hash lookups
     @@ Commit message
          for the different modes but after the next two commits there is no
          measurable benefit in doing so.
      
     +    There is a change in behavior for lines that begin with a form-feed or
     +    vertical-tab character. Since b46054b374 ("xdiff: use
     +    git-compat-util", 2019-04-11) xdiff does not treat '\f' or '\v' as
     +    whitespace characters. This means that lines starting with those
     +    characters are never considered to be blank and never match a line
     +    that does not start with the same character. After this patch a line
     +    matching "^[\f\v\r]*[ \t]*$" is considered to be blank by
     +    --color-moved-ws=allow-indentation-change and lines beginning
     +    "^[\f\v\r]*[ \t]*" can match another line if the suffixes match. This
     +    changes the output of git show for d18f76dccf ("compat/regex: use the
     +    regex engine from gawk for compat", 2010-08-17) as some lines in the
     +    pre-image before a moved block that contain '\f' are now considered
     +    moved as well as they match a blank line before the moved lines in the
     +    post-image. This commit updates one of the tests to reflect this
     +    change.
     +
          Test                                                                  HEAD^             HEAD
          --------------------------------------------------------------------------------------------------------------
     -    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41(0.38+0.03)   0.41(0.36+0.04) +0.0%
     -    4002.2: diff --color-moved --no-color-moved-ws large change           0.82(0.76+0.05)   0.84(0.79+0.04) +2.4%
     -    4002.3: diff --color-moved-ws=allow-indentation-change large change   0.91(0.88+0.03)   0.81(0.74+0.06) -11.0%
     -    4002.4: log --no-color-moved --no-color-moved-ws                      1.32(1.21+0.10)   1.31(1.19+0.11) -0.8%
     -    4002.5: log --color-moved --no-color-moved-ws                         1.47(1.37+0.10)   1.47(1.36+0.11) +0.0%
     -    4002.6: log --color-moved-ws=allow-indentation-change                 1.51(1.42+0.09)   1.48(1.37+0.10) -2.0%
     +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38(0.33+0.05)   0.38(0.33+0.05)  +0.0%
     +    4002.2: diff --color-moved --no-color-moved-ws large change           0.86(0.82+0.04)   0.88(0.84+0.04)  +2.3%
     +    4002.3: diff --color-moved-ws=allow-indentation-change large change   0.97(0.94+0.03)   0.86(0.81+0.05) -11.3%
     +    4002.4: log --no-color-moved --no-color-moved-ws                      1.16(1.07+0.09)   1.16(1.06+0.09)  +0.0%
     +    4002.5: log --color-moved --no-color-moved-ws                         1.32(1.26+0.06)   1.33(1.27+0.05)  +0.8%
     +    4002.6: log --color-moved-ws=allow-indentation-change                 1.35(1.29+0.06)   1.33(1.24+0.08)  -1.5%
      
          Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
      
     @@ diff.c: static void fill_es_indent_data(struct emitted_diff_symbol *es)
      +	return a_width - b_width;
       }
       
     - static int cmp_in_block_with_wsd(const struct diff_options *o,
     + static int cmp_in_block_with_wsd(const struct moved_entry *cur,
      @@ diff.c: static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
       			   const void *keydata)
       {
     @@ diff.c: static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
       
      -	a = container_of(eptr, const struct moved_entry, ent);
      -	b = container_of(entry_or_key, const struct moved_entry, ent);
     --
     ++	a = container_of(eptr, const struct moved_entry, ent)->es;
     ++	b = container_of(entry_or_key, const struct moved_entry, ent)->es;
     + 
      -	if (diffopt->color_moved_ws_handling &
      -	    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
      -		/*
     @@ diff.c: static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
      -		 * configuration for the next block is done else where
      -		 */
      -		flags |= XDF_IGNORE_WHITESPACE;
     -+	a = container_of(eptr, const struct moved_entry, ent)->es;
     -+	b = container_of(entry_or_key, const struct moved_entry, ent)->es;
     - 
     +-
      -	return !xdiff_compare_lines(a->es->line, a->es->len,
      -				    b->es->line, b->es->len,
      -				    flags);
     @@ diff.c: static struct moved_entry *prepare_entry(struct diff_options *o,
       
       	hashmap_entry_init(&ret->ent, hash);
       	ret->es = l;
     -@@ diff.c: static void mark_color_as_moved(struct diff_options *o,
     - 			hashmap_for_each_entry_from(hm, match, ent) {
     - 				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
     - 				if (o->color_moved_ws_handling &
     --				    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
     --					if (compute_ws_delta(l, match->es,
     --							     &pmb[pmb_nr].wsd))
     --						pmb[pmb_nr++].match = match;
     --				} else {
     -+				    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
     -+					pmb[pmb_nr].wsd = compute_ws_delta(l, match->es);
     -+				else
     - 					pmb[pmb_nr].wsd = 0;
     --					pmb[pmb_nr++].match = match;
     --				}
     -+				pmb[pmb_nr++].match = match;
     - 			}
     +@@ diff.c: static void fill_potential_moved_blocks(struct diff_options *o,
     + 	hashmap_for_each_entry_from(hm, match, ent) {
     + 		ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
     + 		if (o->color_moved_ws_handling &
     +-		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
     +-			if (compute_ws_delta(l, match->es, &(pmb[pmb_nr]).wsd))
     +-				pmb[pmb_nr++].match = match;
     +-		} else {
     ++		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
     ++			pmb[pmb_nr].wsd = compute_ws_delta(l, match->es);
     ++		else
     + 			pmb[pmb_nr].wsd = 0;
     +-			pmb[pmb_nr++].match = match;
     +-		}
     ++		pmb[pmb_nr++].match = match;
     + 	}
       
     - 			if (adjust_last_block(o, n, block_length) &&
     + 	*pmb_p = pmb;
      @@ diff.c: static void diff_flush_patch_all_file_pairs(struct diff_options *o)
       		if (o->color_moved) {
       			struct hashmap add_lines, del_lines;
     @@ diff.c: static void diff_flush_patch_all_file_pairs(struct diff_options *o)
       			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
       			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
       
     +
     + ## t/t4015-diff-whitespace.sh ##
     +@@ t/t4015-diff-whitespace.sh: EMPTY=''
     + test_expect_success 'compare mixed whitespace delta across moved blocks' '
     + 
     + 	git reset --hard &&
     +-	tr Q_ "\t " <<-EOF >text.txt &&
     +-	${EMPTY}
     +-	____too short without
     +-	${EMPTY}
     ++	tr "^|Q_" "\f\v\t " <<-EOF >text.txt &&
     ++	^__
     ++	|____too short without
     ++	^
     + 	___being grouped across blank line
     + 	${EMPTY}
     + 	context
     +@@ t/t4015-diff-whitespace.sh: test_expect_success 'compare mixed whitespace delta across moved blocks' '
     + 	git add text.txt &&
     + 	git commit -m "add text.txt" &&
     + 
     +-	tr Q_ "\t " <<-EOF >text.txt &&
     ++	tr "^|Q_" "\f\v\t " <<-EOF >text.txt &&
     + 	context
     + 	lines
     + 	to
     +@@ t/t4015-diff-whitespace.sh: test_expect_success 'compare mixed whitespace delta across moved blocks' '
     + 	${EMPTY}
     + 	QQtoo short without
     + 	${EMPTY}
     +-	Q_______being grouped across blank line
     ++	^Q_______being grouped across blank line
     + 	${EMPTY}
     + 	Q_QThese two lines have had their
     + 	indentation reduced by four spaces
     +@@ t/t4015-diff-whitespace.sh: test_expect_success 'compare mixed whitespace delta across moved blocks' '
     + 		-c core.whitespace=space-before-tab \
     + 		diff --color --color-moved --ws-error-highlight=all \
     + 		--color-moved-ws=allow-indentation-change >actual.raw &&
     +-	grep -v "index" actual.raw | test_decode_color >actual &&
     ++	grep -v "index" actual.raw | tr "\f\v" "^|" | test_decode_color >actual &&
     + 
     + 	cat <<-\EOF >expected &&
     + 	<BOLD>diff --git a/text.txt b/text.txt<RESET>
     + 	<BOLD>--- a/text.txt<RESET>
     + 	<BOLD>+++ b/text.txt<RESET>
     + 	<CYAN>@@ -1,16 +1,16 @@<RESET>
     +-	<BOLD;MAGENTA>-<RESET>
     +-	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>    too short without<RESET>
     +-	<BOLD;MAGENTA>-<RESET>
     ++	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>^<RESET><BRED>  <RESET>
     ++	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>|    too short without<RESET>
     ++	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>^<RESET>
     + 	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>   being grouped across blank line<RESET>
     + 	<BOLD;MAGENTA>-<RESET>
     + 	 <RESET>context<RESET>
     +@@ t/t4015-diff-whitespace.sh: test_expect_success 'compare mixed whitespace delta across moved blocks' '
     + 	<BOLD;YELLOW>+<RESET>
     + 	<BOLD;YELLOW>+<RESET>		<BOLD;YELLOW>too short without<RESET>
     + 	<BOLD;YELLOW>+<RESET>
     +-	<BOLD;YELLOW>+<RESET>	<BOLD;YELLOW>       being grouped across blank line<RESET>
     ++	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>^	       being grouped across blank line<RESET>
     + 	<BOLD;YELLOW>+<RESET>
     + 	<BOLD;CYAN>+<RESET>	<BRED> <RESET>	<BOLD;CYAN>These two lines have had their<RESET>
     + 	<BOLD;CYAN>+<RESET><BOLD;CYAN>indentation reduced by four spaces<RESET>
 11:  c160222ab3c = 14:  ef98a6e7015 diff: use designated initializers for emitted_diff_symbol
 12:  753554587f9 ! 15:  ae78c05f08d diff --color-moved: intern strings
     @@ Commit message
          number of hash lookups a little (calculating the ids still involves
          one hash lookup per line) but the main benefit is that when growing
          blocks of potentially moved lines we can replace string comparisons
     -    which involve chasing a pointer with a simple integer comparison.
     +    which involve chasing a pointer with a simple integer comparison. On a
     +    large diff this commit reduces the time to run 'diff --color-moved' by
     +    37% compared to the previous commit and 31% compared to master, for
     +    'diff --color-moved-ws=allow-indentation-change' the reduction is 28%
     +    compared to the previous commit and 96% compared to master. There is
     +    little change in the performance of 'git log --patch' as the diffs are
     +    smaller.
      
     -    On a large diff this commit reduces the time to run
     -       diff --color-moved
     -    by 33% and
     -        diff --color-moved-ws=allow-indentation-change
     -    by 26%. Compared to master the time to run
     -        diff --color-moved-ws=allow-indentation-change
     -    is now reduced by 95% and the overhead compared to --no-color-moved is
     -    reduced to 50%.
     +    Test                                                                 HEAD^               HEAD
     +    ---------------------------------------------------------------------------------------------------------------
     +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38(0.33+0.05)    0.38(0.33+0.05)  +0.0%
     +    4002.2: diff --color-moved --no-color-moved-ws large change           0.88(0.81+0.06)    0.55(0.50+0.04) -37.5%
     +    4002.3: diff --color-moved-ws=allow-indentation-change large change   0.85(0.79+0.06)    0.61(0.54+0.06) -28.2%
     +    4002.4: log --no-color-moved --no-color-moved-ws                      1.16(1.07+0.08)    1.15(1.09+0.05)  -0.9%
     +    4002.5: log --color-moved --no-color-moved-ws                         1.31(1.22+0.08)    1.29(1.19+0.09)  -1.5%
     +    4002.6: log --color-moved-ws=allow-indentation-change                 1.32(1.24+0.08)    1.31(1.18+0.13)  -0.8%
      
     -    Compared to the previous commit the time to run
     -        git log --patch --color-moved
     -    is increased slightly, but compared to master there is no change in
     -    run time.
     -
     -    Test                                                                  HEAD^             HEAD
     -    --------------------------------------------------------------------------------------------------------------
     -    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.41(0.36+0.04)   0.41(0.37+0.03)  +0.0%
     -    4002.2: diff --color-moved --no-color-moved-ws large change           0.83(0.79+0.03)   0.55(0.52+0.03) -33.7%
     -    4002.3: diff --color-moved-ws=allow-indentation-change large change   0.81(0.77+0.04)   0.60(0.55+0.05) -25.9%
     -    4002.4: log --no-color-moved --no-color-moved-ws                      1.30(1.20+0.09)   1.31(1.22+0.08)  +0.8%
     -    4002.5: log --color-moved --no-color-moved-ws                         1.46(1.35+0.11)   1.47(1.30+0.16)  +0.7%
     -    4002.6: log --color-moved-ws=allow-indentation-change                 1.46(1.38+0.07)   1.47(1.34+0.13)  +0.7%
     -
     -    Test                                                                  master            HEAD
     -    --------------------------------------------------------------------------------------------------------------
     -    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.40( 0.36+0.03)  0.41(0.37+0.03)  +2.5%
     -    4002.2: diff --color-moved --no-color-moved-ws large change           0.82( 0.77+0.04)  0.55(0.52+0.03) -32.9%
     -    4002.3: diff --color-moved-ws=allow-indentation-change large change  14.10(14.04+0.04)  0.60(0.55+0.05) -95.7%
     -    4002.4: log --no-color-moved --no-color-moved-ws                      1.31( 1.21+0.09)  1.31(1.22+0.08)  +0.0%
     -    4002.5: log --color-moved --no-color-moved-ws                         1.47( 1.37+0.09)  1.47(1.30+0.16)  +0.0%
     -    4002.6: log --color-moved-ws=allow-indentation-change                 1.86( 1.76+0.10)  1.47(1.34+0.13) -21.0%
     +    Test                                                                 master              HEAD
     +    ---------------------------------------------------------------------------------------------------------------
     +    4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.33+0.05)   0.38(0.33+0.05)  +0.0%
     +    4002.2: diff --color-moved --no-color-moved-ws large change           0.80 (0.75+0.04)   0.55(0.50+0.04) -31.2%
     +    4002.3: diff --color-moved-ws=allow-indentation-change large change  14.20(14.15+0.05)   0.61(0.54+0.06) -95.7%
     +    4002.4: log --no-color-moved --no-color-moved-ws                      1.15 (1.05+0.09)   1.15(1.09+0.05)  +0.0%
     +    4002.5: log --color-moved --no-color-moved-ws                         1.30 (1.19+0.11)   1.29(1.19+0.09)  -0.8%
     +    4002.6: log --color-moved-ws=allow-indentation-change                 1.70 (1.63+0.06)   1.31(1.18+0.13) -22.9%
      
          Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
      
     @@ diff.c: static void append_emitted_diff_symbol(struct diff_options *o,
       };
       
       struct moved_block {
     -@@ diff.c: static int cmp_in_block_with_wsd(const struct diff_options *o,
     +@@ diff.c: static int cmp_in_block_with_wsd(const struct moved_entry *cur,
       				 const struct emitted_diff_symbol *l,
       				 struct moved_block *pmb)
       {
     @@ diff.c: static int cmp_in_block_with_wsd(const struct diff_options *o,
       	 */
       	delta = b_width - a_width;
       
     -@@ diff.c: static int cmp_in_block_with_wsd(const struct diff_options *o,
     +@@ diff.c: static int cmp_in_block_with_wsd(const struct moved_entry *cur,
       	if (pmb->wsd == INDENT_BLANKLINE)
       		pmb->wsd = delta;
       
     @@ diff.c: static void pmb_advance_or_null(struct diff_options *o,
       		int match;
      @@ diff.c: static void pmb_advance_or_null(struct diff_options *o,
       			match = cur &&
     - 				!cmp_in_block_with_wsd(o, cur, l, &pmb[i]);
     + 				!cmp_in_block_with_wsd(cur, l, &pmb[i]);
       		else
      -			match = cur &&
      -				xdiff_compare_lines(cur->es->line, cur->es->len,
      -						    l->line, l->len, flags);
      +			match = cur && cur->es->id == l->id;
      +
     - 		if (match)
     + 		if (match) {
     + 			pmb[j] = pmb[i];
       			pmb[j++].match = cur;
     - 	}
     +@@ diff.c: static void pmb_advance_or_null(struct diff_options *o,
     + }
     + 
     + static void fill_potential_moved_blocks(struct diff_options *o,
     +-					struct hashmap *hm,
     + 					struct moved_entry *match,
     + 					struct emitted_diff_symbol *l,
     + 					struct moved_block **pmb_p,
     +@@ diff.c: static void fill_potential_moved_blocks(struct diff_options *o,
     + 	 * The current line is the start of a new block.
     + 	 * Setup the set of potential blocks.
     + 	 */
     +-	hashmap_for_each_entry_from(hm, match, ent) {
     ++	for (; match; match = match->next_match) {
     + 		ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
     + 		if (o->color_moved_ws_handling &
     + 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
      @@ diff.c: static int adjust_last_block(struct diff_options *o, int n, int block_length)
       
       /* Find blocks of moved code, delegate actual coloring decision to helper */
     @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
       		default:
       			flipped_block = 0;
      @@ diff.c: static void mark_color_as_moved(struct diff_options *o,
     - 			 * The current line is the start of a new block.
     - 			 * Setup the set of potential blocks.
     - 			 */
     --			hashmap_for_each_entry_from(hm, match, ent) {
     -+			for (; match; match = match->next_match) {
     - 				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
     - 				if (o->color_moved_ws_handling &
     - 				    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
     + 				 */
     + 				n -= block_length;
     + 			else
     +-				fill_potential_moved_blocks(o, hm, match, l,
     ++				fill_potential_moved_blocks(o, match, l,
     + 							    &pmb, &pmb_alloc,
     + 							    &pmb_nr);
     + 
      @@ diff.c: static void diff_flush_patch_all_file_pairs(struct diff_options *o)
       
       	if (o->emitted_symbols) {
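
The commit message for patch 13 in the range-diff above says that a line matching "^[\f\v\r]*[ \t]*$" is now treated as blank by --color-moved-ws=allow-indentation-change. The following is a minimal sketch of that classification only. It is not the code in diff.c; the name demo_is_blank is hypothetical, and it assumes the caller passes the line without its trailing newline.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from diff.c.
 * Return 1 if line[0..len) matches ^[\f\v\r]*[ \t]*$, otherwise 0.
 * The caller is assumed to have stripped any trailing newline.
 */
int demo_is_blank(const char *line, size_t len)
{
	size_t i = 0;

	/* Skip the leading run of form-feed, vertical-tab and CR. */
	while (i < len &&
	       (line[i] == '\f' || line[i] == '\v' || line[i] == '\r'))
		i++;
	/* Skip the run of spaces and tabs that follows. */
	while (i < len && (line[i] == ' ' || line[i] == '\t'))
		i++;
	/* The line is blank only if nothing is left over. */
	return i == len;
}
```

Under this sketch a form-feed followed by two spaces counts as blank, which corresponds to the "^__" line added to the t4015 test, while a space followed by a form-feed does not, because the pattern only allows '\f' before the spaces and tabs.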

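Patch 15 describes interning strings so that growing a block of potentially moved lines compares integer ids (cur->es->id == l->id) instead of line contents. As a rough illustration of the idea, and not the implementation in diff.c, the sketch below assigns ids with a small fixed-size open-addressing table. Every name in it (demo_interner, demo_intern, DEMO_TABLE_SIZE) is hypothetical, and the FNV-1a hash is an assumption chosen for brevity.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_TABLE_SIZE 64 /* hypothetical fixed capacity, a power of two */

/* Must be zero-initialized before first use. */
struct demo_interner {
	const char *lines[DEMO_TABLE_SIZE]; /* borrowed pointers, NULL = empty */
	size_t lens[DEMO_TABLE_SIZE];
	unsigned ids[DEMO_TABLE_SIZE];
	unsigned next_id;
};

/* FNV-1a over len bytes; any reasonable hash would do here. */
static unsigned demo_hash(const char *s, size_t len)
{
	unsigned h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= (unsigned char)s[i];
		h *= 16777619u;
	}
	return h;
}

/*
 * Return the id for a line, assigning a fresh one the first time the
 * line is seen. Identical lines always get the same id. Ids start at 1;
 * 0 is returned only when the table is full.
 */
unsigned demo_intern(struct demo_interner *t, const char *line, size_t len)
{
	unsigned slot = demo_hash(line, len) & (DEMO_TABLE_SIZE - 1);
	unsigned probes;

	for (probes = 0; probes < DEMO_TABLE_SIZE; probes++) {
		if (!t->lines[slot]) {
			/* First sighting: remember the line and mint an id. */
			t->lines[slot] = line;
			t->lens[slot] = len;
			t->ids[slot] = ++t->next_id;
			return t->ids[slot];
		}
		if (t->lens[slot] == len &&
		    !memcmp(t->lines[slot], line, len))
			return t->ids[slot];
		/* Linear probing: try the next slot, wrapping around. */
		slot = (slot + 1) & (DEMO_TABLE_SIZE - 1);
	}
	return 0;
}
```

Each line still costs one hash lookup when its id is calculated, as the commit message notes, but every later comparison between two lines is a single integer test that does not touch the line buffers.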
-- 
gitgitgadget


* [PATCH v3 01/15] diff --color-moved: add perf tests
  2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
@ 2021-10-27 12:04     ` Phillip Wood via GitGitGadget
  2021-10-28 21:32       ` Junio C Hamano
  2021-10-27 12:04     ` [PATCH v3 02/15] diff --color-moved: clear all flags on blocks that are too short Phillip Wood via GitGitGadget
                       ` (15 subsequent siblings)
  16 siblings, 1 reply; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-10-27 12:04 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason, Elijah Newren

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Add some tests so we can monitor changes to the performance of the
move detection code. The tests record the performance of a single
large diff and a sequence of smaller diffs.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 t/perf/p4002-diff-color-moved.sh | 45 ++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100755 t/perf/p4002-diff-color-moved.sh

diff --git a/t/perf/p4002-diff-color-moved.sh b/t/perf/p4002-diff-color-moved.sh
new file mode 100755
index 00000000000..ad56bcb71e4
--- /dev/null
+++ b/t/perf/p4002-diff-color-moved.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+
+test_description='Tests diff --color-moved performance'
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+if ! git rev-parse --verify v2.29.0^{commit} >/dev/null
+then
+	skip_all='skipping because tag v2.29.0 was not found'
+	test_done
+fi
+
+GIT_PAGER_IN_USE=1
+test_export GIT_PAGER_IN_USE
+
+test_perf 'diff --no-color-moved --no-color-moved-ws large change' '
+	git diff --no-color-moved --no-color-moved-ws v2.28.0 v2.29.0
+'
+
+test_perf 'diff --color-moved --no-color-moved-ws large change' '
+	git diff --color-moved=zebra --no-color-moved-ws v2.28.0 v2.29.0
+'
+
+test_perf 'diff --color-moved-ws=allow-indentation-change large change' '
+	git diff --color-moved=zebra --color-moved-ws=allow-indentation-change \
+		v2.28.0 v2.29.0
+'
+
+test_perf 'log --no-color-moved --no-color-moved-ws' '
+	git log --no-color-moved --no-color-moved-ws --no-merges --patch \
+		-n1000 v2.29.0
+'
+
+test_perf 'log --color-moved --no-color-moved-ws' '
+	git log --color-moved=zebra --no-color-moved-ws --no-merges --patch \
+		-n1000 v2.29.0
+'
+
+test_perf 'log --color-moved-ws=allow-indentation-change' '
+	git log --color-moved=zebra --color-moved-ws=allow-indentation-change \
+		--no-merges --patch -n1000 v2.29.0
+'
+
+test_done
-- 
gitgitgadget



* [PATCH v3 02/15] diff --color-moved: clear all flags on blocks that are too short
  2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
  2021-10-27 12:04     ` [PATCH v3 01/15] diff --color-moved: add perf tests Phillip Wood via GitGitGadget
@ 2021-10-27 12:04     ` Phillip Wood via GitGitGadget
  2021-10-27 12:04     ` [PATCH v3 03/15] diff --color-moved: factor out function Phillip Wood via GitGitGadget
                       ` (14 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-10-27 12:04 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason, Elijah Newren

From: Phillip Wood <phillip.wood@dunelm.org.uk>

If a block of potentially moved lines is not long enough then the
DIFF_SYMBOL_MOVED_LINE flag is cleared on the matching lines so they
are not marked as moved. To avoid problems when we start rewinding
after an unsuccessful match in a couple of commits' time, make sure all
the move-related flags are cleared, not just DIFF_SYMBOL_MOVED_LINE.
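
As a rough illustration of the difference between clearing one flag and
clearing the whole mask, here is a minimal standalone sketch. The
SKETCH_* names and bit values are hypothetical stand-ins, not the real
definitions from diff.c; only the shape of the loop follows the patch.

```c
#include <assert.h>

/* Hypothetical flag values for illustration only; the real ones live in diff.c. */
enum {
	SKETCH_MOVED_LINE     = 1 << 0,
	SKETCH_MOVED_LINE_ALT = 1 << 1,
	SKETCH_UNRELATED_FLAG = 1 << 2
};
#define SKETCH_ZEBRA_MASK (SKETCH_MOVED_LINE | SKETCH_MOVED_LINE_ALT)

/*
 * Clear every move-related bit on the block_length entries that end
 * just before index n, leaving unrelated bits untouched.
 */
void sketch_clear_block(unsigned *flags, int n, int block_length)
{
	int i;

	assert(block_length >= 0 && block_length <= n);
	for (i = 1; i < block_length + 1; i++)
		flags[n - i] &= ~(unsigned)SKETCH_ZEBRA_MASK;
}
```

Clearing only SKETCH_MOVED_LINE would leave a stale "alternate" bit on
the second entry of a discarded block, which is the kind of leftover
state the commit message wants to avoid before rewinding is introduced.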

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index 52c791574b7..bd8e4ec9757 100644
--- a/diff.c
+++ b/diff.c
@@ -1114,6 +1114,8 @@ static int shrink_potential_moved_blocks(struct moved_block *pmb,
  * NEEDSWORK: This uses the same heuristic as blame_entry_score() in blame.c.
  * Think of a way to unify them.
  */
+#define DIFF_SYMBOL_MOVED_LINE_ZEBRA_MASK \
+  (DIFF_SYMBOL_MOVED_LINE | DIFF_SYMBOL_MOVED_LINE_ALT)
 static int adjust_last_block(struct diff_options *o, int n, int block_length)
 {
 	int i, alnum_count = 0;
@@ -1130,7 +1132,7 @@ static int adjust_last_block(struct diff_options *o, int n, int block_length)
 		}
 	}
 	for (i = 1; i < block_length + 1; i++)
-		o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE;
+		o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE_ZEBRA_MASK;
 	return 0;
 }
 
@@ -1237,8 +1239,6 @@ static void mark_color_as_moved(struct diff_options *o,
 	free(pmb);
 }
 
-#define DIFF_SYMBOL_MOVED_LINE_ZEBRA_MASK \
-  (DIFF_SYMBOL_MOVED_LINE | DIFF_SYMBOL_MOVED_LINE_ALT)
 static void dim_moved_lines(struct diff_options *o)
 {
 	int n;
-- 
gitgitgadget



* [PATCH v3 03/15] diff --color-moved: factor out function
  2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
  2021-10-27 12:04     ` [PATCH v3 01/15] diff --color-moved: add perf tests Phillip Wood via GitGitGadget
  2021-10-27 12:04     ` [PATCH v3 02/15] diff --color-moved: clear all flags on blocks that are too short Phillip Wood via GitGitGadget
@ 2021-10-27 12:04     ` Phillip Wood via GitGitGadget
  2021-10-28 21:51       ` Junio C Hamano
  2021-10-27 12:04     ` [PATCH v3 04/15] diff --color-moved: rewind when discarding pmb Phillip Wood via GitGitGadget
                       ` (13 subsequent siblings)
  16 siblings, 1 reply; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-10-27 12:04 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

This code is quite heavily indented and having it in its own function
simplifies an upcoming change.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 51 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 17 deletions(-)

diff --git a/diff.c b/diff.c
index bd8e4ec9757..09af94e018c 100644
--- a/diff.c
+++ b/diff.c
@@ -1098,6 +1098,38 @@ static int shrink_potential_moved_blocks(struct moved_block *pmb,
 	return rp + 1;
 }
 
+static void fill_potential_moved_blocks(struct diff_options *o,
+					struct hashmap *hm,
+					struct moved_entry *match,
+					struct emitted_diff_symbol *l,
+					struct moved_block **pmb_p,
+					int *pmb_alloc_p, int *pmb_nr_p)
+
+{
+	struct moved_block *pmb = *pmb_p;
+	int pmb_alloc = *pmb_alloc_p, pmb_nr = *pmb_nr_p;
+
+	/*
+	 * The current line is the start of a new block.
+	 * Setup the set of potential blocks.
+	 */
+	hashmap_for_each_entry_from(hm, match, ent) {
+		ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
+		if (o->color_moved_ws_handling &
+		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
+			if (compute_ws_delta(l, match->es, &(pmb[pmb_nr]).wsd))
+				pmb[pmb_nr++].match = match;
+		} else {
+			pmb[pmb_nr].wsd = 0;
+			pmb[pmb_nr++].match = match;
+		}
+	}
+
+	*pmb_p = pmb;
+	*pmb_alloc_p = pmb_alloc;
+	*pmb_nr_p = pmb_nr;
+}
+
 /*
  * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing.
  *
@@ -1198,23 +1230,8 @@ static void mark_color_as_moved(struct diff_options *o,
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
 		if (pmb_nr == 0) {
-			/*
-			 * The current line is the start of a new block.
-			 * Setup the set of potential blocks.
-			 */
-			hashmap_for_each_entry_from(hm, match, ent) {
-				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
-				if (o->color_moved_ws_handling &
-				    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
-					if (compute_ws_delta(l, match->es,
-							     &pmb[pmb_nr].wsd))
-						pmb[pmb_nr++].match = match;
-				} else {
-					pmb[pmb_nr].wsd = 0;
-					pmb[pmb_nr++].match = match;
-				}
-			}
-
+			fill_potential_moved_blocks(
+				o, hm, match, l, &pmb, &pmb_alloc, &pmb_nr);
 			if (adjust_last_block(o, n, block_length) &&
 			    pmb_nr && last_symbol != l->s)
 				flipped_block = (flipped_block + 1) % 2;
-- 
gitgitgadget



* [PATCH v3 04/15] diff --color-moved: rewind when discarding pmb
  2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
                       ` (2 preceding siblings ...)
  2021-10-27 12:04     ` [PATCH v3 03/15] diff --color-moved: factor out function Phillip Wood via GitGitGadget
@ 2021-10-27 12:04     ` Phillip Wood via GitGitGadget
  2021-10-27 12:04     ` [PATCH v3 05/15] diff --color-moved=zebra: fix alternate coloring Phillip Wood via GitGitGadget
                       ` (12 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-10-27 12:04 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

diff --color-moved colors the two sides of the diff separately. It
walks through the diff and tries to find matches on the other side of
the diff for the current line. When it finds one or more matches it
starts a "potential moved block" (pmb) and marks the current line as
moved. Then as it walks through the diff it only looks for matches for
the current line in the lines following those in the pmb. When none of
the lines in the pmb match it checks how long the match is and if it
is too short it unmarks the lines as matched and goes back to finding
all the lines that match the current line. As the process of finding
matching lines restarts from the end of the block that was too short
it is possible to miss the start of a matching block on one side but
not the other. In the test added here "-two" would not be colored as
moved but "+two" would be.

Fix this by rewinding the current line when we reach the end of a
block that is too short. This is quadratic in the length of the
discarded block. While the discarded blocks are quite short, on a large
diff this still has a significant impact on the performance of
--color-moved-ws=allow-indentation-change. The following commits
optimize the performance of the --color-moved machinery which
mitigates the performance impact of this commit. After the
optimization this commit has a negligible impact on performance.
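
The rewind can be illustrated with a toy model that is much simpler
than diff.c. Everything here is an assumption made for the sketch: the
lines of one side are a flat array, a fixed SKETCH_MIN_BLOCK length
replaces the alphanumeric-count heuristic, and all sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MIN_BLOCK 3  /* hypothetical stand-in for the alnum-count heuristic */
#define SKETCH_MAX_LINES 64

/*
 * Set moved[i] to 1 when cur[i] belongs to a run of at least
 * SKETCH_MIN_BLOCK consecutive lines that also appears consecutively
 * in other[], and to 0 otherwise.
 */
void sketch_mark_moved(const char **cur, int cur_nr,
		       const char **other, int other_nr, int *moved)
{
	int cand[SKETCH_MAX_LINES]; /* last matched index in other[] per candidate */
	int cand_nr = 0, block_length = 0, n, i, j;

	assert(other_nr <= SKETCH_MAX_LINES);
	memset(moved, 0, cur_nr * sizeof(*moved));

	for (n = 0; n < cur_nr; n++) {
		/* Advance each candidate by one line, keeping those that still match. */
		for (i = j = 0; i < cand_nr; i++) {
			int next = cand[i] + 1;

			if (next < other_nr && !strcmp(other[next], cur[n]))
				cand[j++] = next;
		}
		cand_nr = j;

		if (!cand_nr) {
			/* Any block in progress ended just before line n. */
			if (block_length && block_length < SKETCH_MIN_BLOCK) {
				for (i = 1; i <= block_length; i++)
					moved[n - i] = 0;
				if (block_length > 1) {
					/* Rewind: the loop increment lands on the block's second line. */
					n -= block_length;
					block_length = 0;
					continue;
				}
			}
			block_length = 0;
			/* Start new candidates from every match of the current line. */
			for (i = 0; i < other_nr; i++)
				if (!strcmp(other[i], cur[n]))
					cand[cand_nr++] = i;
		}
		if (cand_nr) {
			moved[n] = 1;
			block_length++;
		}
	}
	/* A short block that runs to the end of the input is discarded without rewinding. */
	if (block_length && block_length < SKETCH_MIN_BLOCK)
		for (i = 1; i <= block_length; i++)
			moved[cur_nr - i] = 0;
}
```

With cur = one two three four five and other = one two x two three
four five, the block "one two" is too short and is discarded. Because
the scan then restarts at "two" rather than at "three", the longer
block "two three four five" is found from its first line.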

Test                                                                 HEAD^               HEAD
------------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.33+0.05)    0.39 (0.34+0.04)  +2.6%
4002.2: diff --color-moved --no-color-moved-ws large change           0.80 (0.76+0.03)    0.86 (0.82+0.04)  +7.5%
4002.3: diff --color-moved-ws=allow-indentation-change large change  14.22(14.17+0.04)   19.01(18.93+0.05) +33.7%
4002.4: log --no-color-moved --no-color-moved-ws                      1.16 (1.06+0.09)    1.16 (1.07+0.07)  +0.0%
4002.5: log --color-moved --no-color-moved-ws                         1.31 (1.22+0.09)    1.32 (1.22+0.09)  +0.8%
4002.6: log --color-moved-ws=allow-indentation-change                 1.71 (1.61+0.09)    1.72 (1.63+0.08)  +0.6%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c                     | 28 ++++++++++++++++++-----
 t/t4015-diff-whitespace.sh | 46 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index 09af94e018c..1e1b5127d15 100644
--- a/diff.c
+++ b/diff.c
@@ -1205,7 +1205,15 @@ static void mark_color_as_moved(struct diff_options *o,
 		if (!match) {
 			int i;
 
-			adjust_last_block(o, n, block_length);
+			if (!adjust_last_block(o, n, block_length) &&
+			    block_length > 1) {
+				/*
+				 * Rewind in case there is another match
+				 * starting at the second line of the block
+				 */
+				match = NULL;
+				n -= block_length;
+			}
 			for(i = 0; i < pmb_nr; i++)
 				moved_block_clear(&pmb[i]);
 			pmb_nr = 0;
@@ -1230,10 +1238,20 @@ static void mark_color_as_moved(struct diff_options *o,
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
 		if (pmb_nr == 0) {
-			fill_potential_moved_blocks(
-				o, hm, match, l, &pmb, &pmb_alloc, &pmb_nr);
-			if (adjust_last_block(o, n, block_length) &&
-			    pmb_nr && last_symbol != l->s)
+			int contiguous = adjust_last_block(o, n, block_length);
+
+			if (!contiguous && block_length > 1)
+				/*
+				 * Rewind in case there is another match
+				 * starting at the second line of the block
+				 */
+				n -= block_length;
+			else
+				fill_potential_moved_blocks(o, hm, match, l,
+							    &pmb, &pmb_alloc,
+							    &pmb_nr);
+
+			if (contiguous && pmb_nr && last_symbol != l->s)
 				flipped_block = (flipped_block + 1) % 2;
 			else
 				flipped_block = 0;
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 2c13b62d3c6..308dc136596 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1833,6 +1833,52 @@ test_expect_success '--color-moved treats adjacent blocks as separate for MIN_AL
 	test_cmp expected actual
 '
 
+test_expect_success '--color-moved rewinds for MIN_ALNUM_COUNT' '
+	git reset --hard &&
+	test_write_lines >file \
+		A B C one two three four five six seven D E F G H I J &&
+	git add file &&
+	test_write_lines >file \
+		one two A B C D E F G H I J two three four five six seven &&
+	git diff --color-moved=zebra -- file &&
+
+	git diff --color-moved=zebra --color -- file >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/file b/file<RESET>
+	<BOLD>--- a/file<RESET>
+	<BOLD>+++ b/file<RESET>
+	<CYAN>@@ -1,13 +1,8 @@<RESET>
+	<GREEN>+<RESET><GREEN>one<RESET>
+	<GREEN>+<RESET><GREEN>two<RESET>
+	 A<RESET>
+	 B<RESET>
+	 C<RESET>
+	<RED>-one<RESET>
+	<BOLD;MAGENTA>-two<RESET>
+	<BOLD;MAGENTA>-three<RESET>
+	<BOLD;MAGENTA>-four<RESET>
+	<BOLD;MAGENTA>-five<RESET>
+	<BOLD;MAGENTA>-six<RESET>
+	<BOLD;MAGENTA>-seven<RESET>
+	 D<RESET>
+	 E<RESET>
+	 F<RESET>
+	<CYAN>@@ -15,3 +10,9 @@<RESET> <RESET>G<RESET>
+	 H<RESET>
+	 I<RESET>
+	 J<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>two<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>three<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>four<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>five<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>six<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>seven<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_expect_success 'move detection with submodules' '
 	test_create_repo bananas &&
 	echo ripe >bananas/recipe &&
-- 
gitgitgadget



* [PATCH v3 05/15] diff --color-moved=zebra: fix alternate coloring
  2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
                       ` (3 preceding siblings ...)
  2021-10-27 12:04     ` [PATCH v3 04/15] diff --color-moved: rewind when discarding pmb Phillip Wood via GitGitGadget
@ 2021-10-27 12:04     ` Phillip Wood via GitGitGadget
  2021-10-27 12:04     ` [PATCH v3 06/15] diff --color-moved: avoid false short line matches and bad zerba coloring Phillip Wood via GitGitGadget
                       ` (11 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-10-27 12:04 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

b0a2ba4776 ("diff --color-moved=zebra: be stricter with color
alternation", 2018-11-23) sought to avoid using the alternate colors
unless there are two adjacent moved blocks of the same
sign. Unfortunately it contains two bugs that prevented it from fixing
the problem properly. Firstly, `last_symbol` is reset at the start of
each iteration of the loop, losing the symbol of the last line.
Secondly, when deciding whether to use the alternate color it should
check whether the current line has the same sign as the last line, not
a different sign. The combination of the two errors means that we still
use the alternate color when we should, but we also use it when we
shouldn't. This is most noticeable when using
--color-moved-ws=allow-indentation-change with hunks like

-this line gets indented
+    this line gets indented

where the post image is colored with newMovedAlternate rather than
newMoved. While this does not matter much, the next commit will change
the coloring to be correct in this case, so let's fix the bug here to
make it clear why the output is changing and add a regression test.
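
The corrected decision can be written as a small pure function. This
is only a sketch: signs are modelled as '+' and '-' characters rather
than enum diff_symbol values, `contiguous` is assumed to mean that the
previous block was long enough to be kept, and the function name is
hypothetical.

```c
#include <assert.h>

/*
 * Return the new value of flipped_block when a new moved block starts.
 * The alternate color is toggled only when the new block directly
 * follows a kept block, has candidates, and has the same sign as the
 * previous line.
 */
int sketch_next_flipped(int flipped_block, int contiguous, int pmb_nr,
			char last_sign, char cur_sign)
{
	assert(cur_sign == '+' || cur_sign == '-');
	if (contiguous && pmb_nr && last_sign == cur_sign)
		return (flipped_block + 1) % 2;
	return 0;
}
```

For the indentation hunk quoted above, the "+" line follows a "-"
line, so the signs differ and the function returns 0, which
corresponds to newMoved rather than newMovedAlternate.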

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c                     |  4 +--
 t/t4015-diff-whitespace.sh | 72 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 1e1b5127d15..53f0df75329 100644
--- a/diff.c
+++ b/diff.c
@@ -1176,6 +1176,7 @@ static void mark_color_as_moved(struct diff_options *o,
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
 	int n, flipped_block = 0, block_length = 0;
+	enum diff_symbol last_symbol = 0;
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
@@ -1183,7 +1184,6 @@ static void mark_color_as_moved(struct diff_options *o,
 		struct moved_entry *key;
 		struct moved_entry *match = NULL;
 		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-		enum diff_symbol last_symbol = 0;
 
 		switch (l->s) {
 		case DIFF_SYMBOL_PLUS:
@@ -1251,7 +1251,7 @@ static void mark_color_as_moved(struct diff_options *o,
 							    &pmb, &pmb_alloc,
 							    &pmb_nr);
 
-			if (contiguous && pmb_nr && last_symbol != l->s)
+			if (contiguous && pmb_nr && last_symbol == l->s)
 				flipped_block = (flipped_block + 1) % 2;
 			else
 				flipped_block = 0;
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 308dc136596..4e0fd76c6c5 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1442,6 +1442,78 @@ test_expect_success 'detect permutations inside moved code -- dimmed-zebra' '
 	test_cmp expected actual
 '
 
+test_expect_success 'zebra alternate color is only used when necessary' '
+	cat >old.txt <<-\EOF &&
+	line 1A should be marked as oldMoved newMovedAlternate
+	line 1B should be marked as oldMoved newMovedAlternate
+	unchanged
+	line 2A should be marked as oldMoved newMovedAlternate
+	line 2B should be marked as oldMoved newMovedAlternate
+	line 3A should be marked as oldMovedAlternate newMoved
+	line 3B should be marked as oldMovedAlternate newMoved
+	unchanged
+	line 4A should be marked as oldMoved newMovedAlternate
+	line 4B should be marked as oldMoved newMovedAlternate
+	line 5A should be marked as oldMovedAlternate newMoved
+	line 5B should be marked as oldMovedAlternate newMoved
+	line 6A should be marked as oldMoved newMoved
+	line 6B should be marked as oldMoved newMoved
+	EOF
+	cat >new.txt <<-\EOF &&
+	  line 1A should be marked as oldMoved newMovedAlternate
+	  line 1B should be marked as oldMoved newMovedAlternate
+	unchanged
+	  line 3A should be marked as oldMovedAlternate newMoved
+	  line 3B should be marked as oldMovedAlternate newMoved
+	  line 2A should be marked as oldMoved newMovedAlternate
+	  line 2B should be marked as oldMoved newMovedAlternate
+	unchanged
+	  line 6A should be marked as oldMoved newMoved
+	  line 6B should be marked as oldMoved newMoved
+	    line 4A should be marked as oldMoved newMovedAlternate
+	    line 4B should be marked as oldMoved newMovedAlternate
+	  line 5A should be marked as oldMovedAlternate newMoved
+	  line 5B should be marked as oldMovedAlternate newMoved
+	EOF
+	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
+		 --color-moved-ws=allow-indentation-change \
+		 old.txt new.txt >output &&
+	grep -v index output | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/old.txt b/new.txt<RESET>
+	<BOLD>--- a/old.txt<RESET>
+	<BOLD>+++ b/new.txt<RESET>
+	<CYAN>@@ -1,14 +1,14 @@<RESET>
+	<BOLD;MAGENTA>-line 1A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;MAGENTA>-line 1B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1B should be marked as oldMoved newMovedAlternate<RESET>
+	 unchanged<RESET>
+	<BOLD;MAGENTA>-line 2A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;MAGENTA>-line 2B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;BLUE>-line 3A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;BLUE>-line 3B should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3B should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2B should be marked as oldMoved newMovedAlternate<RESET>
+	 unchanged<RESET>
+	<BOLD;MAGENTA>-line 4A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;MAGENTA>-line 4B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;BLUE>-line 5A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;BLUE>-line 5B should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;MAGENTA>-line 6A should be marked as oldMoved newMoved<RESET>
+	<BOLD;MAGENTA>-line 6B should be marked as oldMoved newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6A should be marked as oldMoved newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6B should be marked as oldMoved newMoved<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5B should be marked as oldMovedAlternate newMoved<RESET>
+	EOF
+	test_cmp expected actual
+'
+
 test_expect_success 'cmd option assumes configured colored-moved' '
 	test_config color.diff.oldMoved "magenta" &&
 	test_config color.diff.newMoved "cyan" &&
-- 
gitgitgadget



* [PATCH v3 06/15] diff --color-moved: avoid false short line matches and bad zerba coloring
  2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
                       ` (4 preceding siblings ...)
  2021-10-27 12:04     ` [PATCH v3 05/15] diff --color-moved=zebra: fix alternate coloring Phillip Wood via GitGitGadget
@ 2021-10-27 12:04     ` Phillip Wood via GitGitGadget
  2021-10-27 12:04     ` [PATCH v3 07/15] diff: simplify allow-indentation-change delta calculation Phillip Wood via GitGitGadget
                       ` (10 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-10-27 12:04 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

When marking moved lines it is possible for a block of potential
matched lines to extend past a change in sign when there is a sequence
of added lines whose text matches the text of a sequence of deleted
and added lines. Most of the time either `match` will be NULL or
`pmb_advance_or_null()` will fail when the loop encounters a change of
sign but there are corner cases where `match` is non-NULL and
`pmb_advance_or_null()` successfully advances the moved block despite
the change in sign.

One consequence of this is highlighting a short line as moved when it
should not be. For example

-moved line  # Correctly highlighted as moved
+short line  # Wrongly highlighted as moved
 context
+moved line  # Correctly highlighted as moved
+short line
 context
-short line

The other consequence is coloring a moved addition following a moved
deletion in the wrong color. In the example below the first "+moved
line 3" should be highlighted as newMoved not newMovedAlternate.

-moved line 1 # Correctly highlighted as oldMoved
-moved line 2 # Correctly highlighted as oldMovedAlternate
+moved line 3 # Wrongly highlighted as newMovedAlternate
 context      # Everything else is highlighted correctly
+moved line 2
+moved line 3
 context
+moved line 1
-moved line 3

These false matches are more likely when using --color-moved-ws, with
the exception of --color-moved-ws=allow-indentation-change, which ties
the sign of the current whitespace delta to the sign of the line to
avoid this problem. The fix is to check that the sign of the new line
being matched is the same as the sign of the line that started the
block of potential matches.
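
The new end-of-block condition can be sketched in isolation. The
function name is hypothetical and signs are modelled as '+' and '-'
characters; the expression mirrors the condition used in the patch.

```c
#include <assert.h>

/*
 * Return non-zero when the block of potential matches must end at the
 * current line: either nothing matches the line, or the line has a
 * different sign from the line that started the block.
 */
int sketch_block_ends(int pmb_nr, int has_match, char line_sign,
		      char block_sign)
{
	assert(pmb_nr >= 0);
	return pmb_nr && (!has_match || line_sign != block_sign);
}
```

In the first example above, "+short line" has a match but follows a
block started by a "-" line, so the block ends there instead of
growing across the change in sign.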

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c                     | 17 ++++++----
 t/t4015-diff-whitespace.sh | 65 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+), 6 deletions(-)

diff --git a/diff.c b/diff.c
index 53f0df75329..efba2789354 100644
--- a/diff.c
+++ b/diff.c
@@ -1176,7 +1176,7 @@ static void mark_color_as_moved(struct diff_options *o,
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
 	int n, flipped_block = 0, block_length = 0;
-	enum diff_symbol last_symbol = 0;
+	enum diff_symbol moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
@@ -1202,7 +1202,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			flipped_block = 0;
 		}
 
-		if (!match) {
+		if (pmb_nr && (!match || l->s != moved_symbol)) {
 			int i;
 
 			if (!adjust_last_block(o, n, block_length) &&
@@ -1219,12 +1219,13 @@ static void mark_color_as_moved(struct diff_options *o,
 			pmb_nr = 0;
 			block_length = 0;
 			flipped_block = 0;
-			last_symbol = l->s;
+		}
+		if (!match) {
+			moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
 			continue;
 		}
 
 		if (o->color_moved == COLOR_MOVED_PLAIN) {
-			last_symbol = l->s;
 			l->flags |= DIFF_SYMBOL_MOVED_LINE;
 			continue;
 		}
@@ -1251,11 +1252,16 @@ static void mark_color_as_moved(struct diff_options *o,
 							    &pmb, &pmb_alloc,
 							    &pmb_nr);
 
-			if (contiguous && pmb_nr && last_symbol == l->s)
+			if (contiguous && pmb_nr && moved_symbol == l->s)
 				flipped_block = (flipped_block + 1) % 2;
 			else
 				flipped_block = 0;
 
+			if (pmb_nr)
+				moved_symbol = l->s;
+			else
+				moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
+
 			block_length = 0;
 		}
 
@@ -1265,7 +1271,6 @@ static void mark_color_as_moved(struct diff_options *o,
 			if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
 				l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 		}
-		last_symbol = l->s;
 	}
 	adjust_last_block(o, n, block_length);
 
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 4e0fd76c6c5..15782c879d2 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1514,6 +1514,71 @@ test_expect_success 'zebra alternate color is only used when necessary' '
 	test_cmp expected actual
 '
 
+test_expect_success 'short lines of opposite sign do not get marked as moved' '
+	cat >old.txt <<-\EOF &&
+	this line should be marked as moved
+	unchanged
+	unchanged
+	unchanged
+	unchanged
+	too short
+	this line should be marked as oldMoved newMoved
+	this line should be marked as oldMovedAlternate newMoved
+	unchanged 1
+	unchanged 2
+	unchanged 3
+	unchanged 4
+	this line should be marked as oldMoved newMoved/newMovedAlternate
+	EOF
+	cat >new.txt <<-\EOF &&
+	too short
+	unchanged
+	unchanged
+	this line should be marked as moved
+	too short
+	unchanged
+	unchanged
+	this line should be marked as oldMoved newMoved/newMovedAlternate
+	unchanged 1
+	unchanged 2
+	this line should be marked as oldMovedAlternate newMoved
+	this line should be marked as oldMoved newMoved/newMovedAlternate
+	unchanged 3
+	this line should be marked as oldMoved newMoved
+	unchanged 4
+	EOF
+	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
+		old.txt new.txt >output && cat output &&
+	grep -v index output | test_decode_color >actual &&
+	cat >expect <<-\EOF &&
+	<BOLD>diff --git a/old.txt b/new.txt<RESET>
+	<BOLD>--- a/old.txt<RESET>
+	<BOLD>+++ b/new.txt<RESET>
+	<CYAN>@@ -1,13 +1,15 @@<RESET>
+	<BOLD;MAGENTA>-this line should be marked as moved<RESET>
+	<GREEN>+<RESET><GREEN>too short<RESET>
+	 unchanged<RESET>
+	 unchanged<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as moved<RESET>
+	<GREEN>+<RESET><GREEN>too short<RESET>
+	 unchanged<RESET>
+	 unchanged<RESET>
+	<RED>-too short<RESET>
+	<BOLD;MAGENTA>-this line should be marked as oldMoved newMoved<RESET>
+	<BOLD;BLUE>-this line should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
+	 unchanged 1<RESET>
+	 unchanged 2<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
+	 unchanged 3<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMoved newMoved<RESET>
+	 unchanged 4<RESET>
+	<BOLD;MAGENTA>-this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
+	EOF
+	test_cmp expect actual
+'
+
 test_expect_success 'cmd option assumes configured colored-moved' '
 	test_config color.diff.oldMoved "magenta" &&
 	test_config color.diff.newMoved "cyan" &&
-- 
gitgitgadget



* [PATCH v3 07/15] diff: simplify allow-indentation-change delta calculation
  2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
                       ` (5 preceding siblings ...)
  2021-10-27 12:04     ` [PATCH v3 06/15] diff --color-moved: avoid false short line matches and bad zerba coloring Phillip Wood via GitGitGadget
@ 2021-10-27 12:04     ` Phillip Wood via GitGitGadget
  2021-10-27 12:04     ` [PATCH v3 08/15] diff --color-moved-ws=allow-indentation-change: simplify and optimize Phillip Wood via GitGitGadget
                       ` (9 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-10-27 12:04 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Now that we reliably end a block when the sign changes we don't need
the whitespace delta calculation to rely on the sign.
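
A minimal sketch of a sign-independent delta follows. It is a
simplification under stated assumptions: indentation is counted as
leading spaces only, blank lines (INDENT_BLANKLINE in diff.c) are not
modelled, and the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Count leading spaces; a simplification that ignores tabs. */
int sketch_indent_width(const char *line)
{
	int w = 0;

	assert(line);
	while (line[w] == ' ')
		w++;
	return w;
}

/*
 * If a and b have the same text after their indentation, store the
 * indentation difference in *out and return 1; otherwise return 0.
 * The delta is always a minus b, whichever side of the diff a is on.
 */
int sketch_ws_delta(const char *a, const char *b, int *out)
{
	int a_width = sketch_indent_width(a);
	int b_width = sketch_indent_width(b);

	if (strcmp(a + a_width, b + b_width))
		return 0;
	*out = a_width - b_width;
	return 1;
}
```

Because a block now never spans a change of sign, every line in it is
compared in the same direction, so a single convention for the delta
is sufficient.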

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/diff.c b/diff.c
index efba2789354..9aff167be27 100644
--- a/diff.c
+++ b/diff.c
@@ -864,23 +864,17 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
 	    a_width = a->indent_width,
 	    b_off = b->indent_off,
 	    b_width = b->indent_width;
-	int delta;
 
 	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE) {
 		*out = INDENT_BLANKLINE;
 		return 1;
 	}
 
-	if (a->s == DIFF_SYMBOL_PLUS)
-		delta = a_width - b_width;
-	else
-		delta = b_width - a_width;
-
 	if (a_len - a_off != b_len - b_off ||
 	    memcmp(a->line + a_off, b->line + b_off, a_len - a_off))
 		return 0;
 
-	*out = delta;
+	*out = a_width - b_width;
 
 	return 1;
 }
@@ -924,10 +918,7 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	 * match those of the current block and that the text of 'l' and 'cur'
 	 * after the indentation match.
 	 */
-	if (cur->es->s == DIFF_SYMBOL_PLUS)
-		delta = a_width - c_width;
-	else
-		delta = c_width - a_width;
+	delta = c_width - a_width;
 
 	/*
 	 * If the previous lines of this block were all blank then set its
-- 
gitgitgadget



* [PATCH v3 08/15] diff --color-moved-ws=allow-indentation-change: simplify and optimize
  2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
                       ` (6 preceding siblings ...)
  2021-10-27 12:04     ` [PATCH v3 07/15] diff: simplify allow-indentation-change delta calculation Phillip Wood via GitGitGadget
@ 2021-10-27 12:04     ` Phillip Wood via GitGitGadget
  2021-10-27 12:04     ` [PATCH v3 09/15] diff --color-moved: call comparison function directly Phillip Wood via GitGitGadget
                       ` (8 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-10-27 12:04 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

If we already have a block of potentially moved lines then as we move
down the diff we need to check if the next line of each potentially
moved line matches the current line of the diff. The implementation of
--color-moved-ws=allow-indentation-change was needlessly performing
this check on all the lines in the diff that matched the current line
rather than just the current line. To exacerbate the problem, finding
all the other lines in the diff that match the current line involves a
fuzzy lookup, so we were wasting even more time performing a second
comparison to filter out the non-matching lines. Fixing this reduces
the time to run
  git diff --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
by 93% compared to master and simplifies the code.
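
The idea of comparing only the current line against the next line of
each potential block can be sketched as below. This is a toy model and
not the code from this patch: the other side of the diff is a flat
array of strings, indentation is counted as leading spaces, and the
struct and function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

struct sketch_block {
	int next; /* index in other[] of the line expected to come next */
	int wsd;  /* indentation delta shared by every line of the block */
};

/* Count leading spaces; a simplification that ignores tabs. */
int sketch_indent(const char *line)
{
	int w = 0;

	while (line[w] == ' ')
		w++;
	return w;
}

/*
 * Keep only the blocks whose next line matches the current line l with
 * the block's indentation delta. Each block costs one comparison per
 * line, however many other lines in the diff happen to match l.
 * Returns the number of blocks kept, compacted to the front of pmb[].
 */
int sketch_advance_blocks(struct sketch_block *pmb, int pmb_nr,
			  const char **other, int other_nr, const char *l)
{
	int i, kept = 0, l_width = sketch_indent(l);

	assert(pmb_nr >= 0);
	for (i = 0; i < pmb_nr; i++) {
		const char *cand;
		int c_width;

		if (pmb[i].next >= other_nr)
			continue;
		cand = other[pmb[i].next];
		c_width = sketch_indent(cand);
		if (l_width - c_width != pmb[i].wsd ||
		    strcmp(l + l_width, cand + c_width))
			continue;
		pmb[kept] = pmb[i];
		pmb[kept].next++;
		kept++;
	}
	return kept;
}
```

The work per line is proportional to the number of live blocks rather
than to the number of lines in the whole diff that match the current
line, which is the property the timings below reflect.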

Test                                                                 HEAD^               HEAD
---------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.35+0.03)   0.38(0.35+0.03)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.86 (0.80+0.06)   0.87(0.83+0.04)  +1.2%
4002.3: diff --color-moved-ws=allow-indentation-change large change  19.01(18.93+0.06)   0.97(0.92+0.04) -94.9%
4002.4: log --no-color-moved --no-color-moved-ws                      1.16 (1.06+0.09)   1.17(1.06+0.10)  +0.9%
4002.5: log --color-moved --no-color-moved-ws                         1.32 (1.25+0.07)   1.32(1.24+0.08)  +0.0%
4002.6: log --color-moved-ws=allow-indentation-change                 1.71 (1.64+0.06)   1.36(1.25+0.10) -20.5%

Test                                                                 master              HEAD
---------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.33+0.05)   0.38(0.35+0.03)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.80 (0.75+0.04)   0.87(0.83+0.04)  +8.7%
4002.3: diff --color-moved-ws=allow-indentation-change large change  14.20(14.15+0.05)   0.97(0.92+0.04) -93.2%
4002.4: log --no-color-moved --no-color-moved-ws                      1.15 (1.05+0.09)   1.17(1.06+0.10)  +1.7%
4002.5: log --color-moved --no-color-moved-ws                         1.30 (1.19+0.11)   1.32(1.24+0.08)  +1.5%
4002.6: log --color-moved-ws=allow-indentation-change                 1.70 (1.63+0.06)   1.36(1.25+0.10) -20.0%

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 70 +++++++++++++++++-----------------------------------------
 1 file changed, 20 insertions(+), 50 deletions(-)

diff --git a/diff.c b/diff.c
index 9aff167be27..78a486021ab 100644
--- a/diff.c
+++ b/diff.c
@@ -879,37 +879,21 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
 	return 1;
 }
 
-static int cmp_in_block_with_wsd(const struct diff_options *o,
-				 const struct moved_entry *cur,
-				 const struct moved_entry *match,
-				 struct moved_block *pmb,
-				 int n)
-{
-	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-	int al = cur->es->len, bl = match->es->len, cl = l->len;
+static int cmp_in_block_with_wsd(const struct moved_entry *cur,
+				 const struct emitted_diff_symbol *l,
+				 struct moved_block *pmb)
+{
+	int al = cur->es->len, bl = l->len;
 	const char *a = cur->es->line,
-		   *b = match->es->line,
-		   *c = l->line;
+		   *b = l->line;
 	int a_off = cur->es->indent_off,
 	    a_width = cur->es->indent_width,
-	    c_off = l->indent_off,
-	    c_width = l->indent_width;
+	    b_off = l->indent_off,
+	    b_width = l->indent_width;
 	int delta;
 
-	/*
-	 * We need to check if 'cur' is equal to 'match'.  As those
-	 * are from the same (+/-) side, we do not need to adjust for
-	 * indent changes. However these were found using fuzzy
-	 * matching so we do have to check if they are equal. Here we
-	 * just check the lengths. We delay calling memcmp() to check
-	 * the contents until later as if the length comparison for a
-	 * and c fails we can avoid the call all together.
-	 */
-	if (al != bl)
-		return 1;
-
 	/* If 'l' and 'cur' are both blank then they match. */
-	if (a_width == INDENT_BLANKLINE && c_width == INDENT_BLANKLINE)
+	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE)
 		return 0;
 
 	/*
@@ -918,7 +902,7 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	 * match those of the current block and that the text of 'l' and 'cur'
 	 * after the indentation match.
 	 */
-	delta = c_width - a_width;
+	delta = b_width - a_width;
 
 	/*
 	 * If the previous lines of this block were all blank then set its
@@ -927,9 +911,8 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	if (pmb->wsd == INDENT_BLANKLINE)
 		pmb->wsd = delta;
 
-	return !(delta == pmb->wsd && al - a_off == cl - c_off &&
-		 !memcmp(a, b, al) && !
-		 memcmp(a + a_off, c + c_off, al - a_off));
+	return !(delta == pmb->wsd && al - a_off == bl - b_off &&
+		 !memcmp(a + a_off, b + b_off, al - a_off));
 }
 
 static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
@@ -1030,36 +1013,23 @@ static void pmb_advance_or_null(struct diff_options *o,
 }
 
 static void pmb_advance_or_null_multi_match(struct diff_options *o,
-					    struct moved_entry *match,
-					    struct hashmap *hm,
+					    struct emitted_diff_symbol *l,
 					    struct moved_block *pmb,
-					    int pmb_nr, int n)
+					    int pmb_nr)
 {
 	int i;
-	char *got_match = xcalloc(1, pmb_nr);
-
-	hashmap_for_each_entry_from(hm, match, ent) {
-		for (i = 0; i < pmb_nr; i++) {
-			struct moved_entry *prev = pmb[i].match;
-			struct moved_entry *cur = (prev && prev->next_line) ?
-					prev->next_line : NULL;
-			if (!cur)
-				continue;
-			if (!cmp_in_block_with_wsd(o, cur, match, &pmb[i], n))
-				got_match[i] |= 1;
-		}
-	}
 
 	for (i = 0; i < pmb_nr; i++) {
-		if (got_match[i]) {
+		struct moved_entry *prev = pmb[i].match;
+		struct moved_entry *cur = (prev && prev->next_line) ?
+			prev->next_line : NULL;
+		if (cur && !cmp_in_block_with_wsd(cur, l, &pmb[i])) {
 			/* Advance to the next line */
-			pmb[i].match = pmb[i].match->next_line;
+			pmb[i].match = cur;
 		} else {
 			moved_block_clear(&pmb[i]);
 		}
 	}
-
-	free(got_match);
 }
 
 static int shrink_potential_moved_blocks(struct moved_block *pmb,
@@ -1223,7 +1193,7 @@ static void mark_color_as_moved(struct diff_options *o,
 
 		if (o->color_moved_ws_handling &
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-			pmb_advance_or_null_multi_match(o, match, hm, pmb, pmb_nr, n);
+			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);
 		else
 			pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v3 09/15] diff --color-moved: call comparison function directly
  2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
                       ` (7 preceding siblings ...)
  2021-10-27 12:04     ` [PATCH v3 08/15] diff --color-moved-ws=allow-indentation-change: simplify and optimize Phillip Wood via GitGitGadget
@ 2021-10-27 12:04     ` Phillip Wood via GitGitGadget
  2021-10-27 12:04     ` [PATCH v3 10/15] diff --color-moved: unify moved block growth functions Phillip Wood via GitGitGadget
                       ` (7 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-10-27 12:04 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

This change will allow us to easily combine pmb_advance_or_null() and
pmb_advance_or_null_multi_match() in the next commit. Calling
xdiff_compare_lines() directly rather than using a function pointer
from the hash map has little effect on the run time.

Test                                                                  HEAD^             HEAD
-------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38(0.35+0.03)   0.38(0.32+0.06) +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.87(0.83+0.04)   0.87(0.80+0.06) +0.0%
4002.3: diff --color-moved-ws=allow-indentation-change large change   0.97(0.92+0.04)   0.97(0.93+0.04) +0.0%
4002.4: log --no-color-moved --no-color-moved-ws                      1.17(1.06+0.10)   1.16(1.10+0.05) -0.9%
4002.5: log --color-moved --no-color-moved-ws                         1.32(1.24+0.08)   1.31(1.22+0.09) -0.8%
4002.6: log --color-moved-ws=allow-indentation-change                 1.36(1.25+0.10)   1.35(1.25+0.10) -0.7%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index 78a486021ab..22e0edac173 100644
--- a/diff.c
+++ b/diff.c
@@ -994,17 +994,20 @@ static void add_lines_to_move_detection(struct diff_options *o,
 }
 
 static void pmb_advance_or_null(struct diff_options *o,
-				struct moved_entry *match,
-				struct hashmap *hm,
+				struct emitted_diff_symbol *l,
 				struct moved_block *pmb,
 				int pmb_nr)
 {
 	int i;
+	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
+
 	for (i = 0; i < pmb_nr; i++) {
 		struct moved_entry *prev = pmb[i].match;
 		struct moved_entry *cur = (prev && prev->next_line) ?
 				prev->next_line : NULL;
-		if (cur && !hm->cmpfn(o, &cur->ent, &match->ent, NULL)) {
+		if (cur && xdiff_compare_lines(cur->es->line, cur->es->len,
+						l->line, l->len,
+						flags)) {
 			pmb[i].match = cur;
 		} else {
 			pmb[i].match = NULL;
@@ -1195,7 +1198,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
 			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);
 		else
-			pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
+			pmb_advance_or_null(o, l, pmb, pmb_nr);
 
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v3 10/15] diff --color-moved: unify moved block growth functions
  2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
                       ` (8 preceding siblings ...)
  2021-10-27 12:04     ` [PATCH v3 09/15] diff --color-moved: call comparison function directly Phillip Wood via GitGitGadget
@ 2021-10-27 12:04     ` Phillip Wood via GitGitGadget
  2021-10-27 12:04     ` [PATCH v3 11/15] diff --color-moved: shrink potential moved blocks as we go Phillip Wood via GitGitGadget
                       ` (6 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-10-27 12:04 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

After the last two commits pmb_advance_or_null() and
pmb_advance_or_null_multi_match() differ only in the comparison they
perform. Let's simplify the code by combining them into a single
function.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 41 ++++++++++++-----------------------------
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/diff.c b/diff.c
index 22e0edac173..51f092e724e 100644
--- a/diff.c
+++ b/diff.c
@@ -1002,36 +1002,23 @@ static void pmb_advance_or_null(struct diff_options *o,
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
 	for (i = 0; i < pmb_nr; i++) {
+		int match;
 		struct moved_entry *prev = pmb[i].match;
 		struct moved_entry *cur = (prev && prev->next_line) ?
 				prev->next_line : NULL;
-		if (cur && xdiff_compare_lines(cur->es->line, cur->es->len,
-						l->line, l->len,
-						flags)) {
-			pmb[i].match = cur;
-		} else {
-			pmb[i].match = NULL;
-		}
-	}
-}
 
-static void pmb_advance_or_null_multi_match(struct diff_options *o,
-					    struct emitted_diff_symbol *l,
-					    struct moved_block *pmb,
-					    int pmb_nr)
-{
-	int i;
-
-	for (i = 0; i < pmb_nr; i++) {
-		struct moved_entry *prev = pmb[i].match;
-		struct moved_entry *cur = (prev && prev->next_line) ?
-			prev->next_line : NULL;
-		if (cur && !cmp_in_block_with_wsd(cur, l, &pmb[i])) {
-			/* Advance to the next line */
+		if (o->color_moved_ws_handling &
+		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+			match = cur &&
+				!cmp_in_block_with_wsd(cur, l, &pmb[i]);
+		else
+			match = cur &&
+				xdiff_compare_lines(cur->es->line, cur->es->len,
+						    l->line, l->len, flags);
+		if (match)
 			pmb[i].match = cur;
-		} else {
+		else
 			moved_block_clear(&pmb[i]);
-		}
 	}
 }
 
@@ -1194,11 +1181,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			continue;
 		}
 
-		if (o->color_moved_ws_handling &
-		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);
-		else
-			pmb_advance_or_null(o, l, pmb, pmb_nr);
+		pmb_advance_or_null(o, l, pmb, pmb_nr);
 
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v3 11/15] diff --color-moved: shrink potential moved blocks as we go
  2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
                       ` (9 preceding siblings ...)
  2021-10-27 12:04     ` [PATCH v3 10/15] diff --color-moved: unify moved block growth functions Phillip Wood via GitGitGadget
@ 2021-10-27 12:04     ` Phillip Wood via GitGitGadget
  2021-10-27 12:04     ` [PATCH v3 12/15] diff --color-moved: stop clearing potential moved blocks Phillip Wood via GitGitGadget
                       ` (5 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-10-27 12:04 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Rather than setting `match` to NULL and then looping over the list of
potential moved blocks a second time to remove the blocks with no
matches, just filter out the blocks with no matches as we go.
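
The single-pass filtering can be sketched in isolation as follows.
This is only an illustration: `struct block`, `next_id` and
`keep_matching()` are hypothetical names, and the match test is
reduced to an integer comparison so that the compaction idiom stands
out.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct moved_block. */
struct block {
	int next_id; /* id of the line this block expects next */
	int wsd;     /* payload that must travel with the block */
};

/*
 * Compact the array in one pass: blocks whose next_id equals cur_id
 * are copied down to slot j, everything else is dropped. Order is
 * preserved. Returns the number of surviving blocks.
 */
size_t keep_matching(struct block *b, size_t nr, int cur_id)
{
	size_t i, j;

	for (i = 0, j = 0; i < nr; i++) {
		if (b[i].next_id == cur_id)
			b[j++] = b[i];
	}
	return j;
}
```

Because j never runs ahead of i, copying b[i] to b[j] cannot overwrite
an entry that has not been examined yet. Unlike a two-pointer shrink
that moves entries in from the end of the array, this form keeps the
surviving blocks in their original order.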

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 44 ++++++++------------------------------------
 1 file changed, 8 insertions(+), 36 deletions(-)

diff --git a/diff.c b/diff.c
index 51f092e724e..626fd47aa0e 100644
--- a/diff.c
+++ b/diff.c
@@ -996,12 +996,12 @@ static void add_lines_to_move_detection(struct diff_options *o,
 static void pmb_advance_or_null(struct diff_options *o,
 				struct emitted_diff_symbol *l,
 				struct moved_block *pmb,
-				int pmb_nr)
+				int *pmb_nr)
 {
-	int i;
+	int i, j;
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
-	for (i = 0; i < pmb_nr; i++) {
+	for (i = 0, j = 0; i < *pmb_nr; i++) {
 		int match;
 		struct moved_entry *prev = pmb[i].match;
 		struct moved_entry *cur = (prev && prev->next_line) ?
@@ -1015,38 +1015,12 @@ static void pmb_advance_or_null(struct diff_options *o,
 			match = cur &&
 				xdiff_compare_lines(cur->es->line, cur->es->len,
 						    l->line, l->len, flags);
-		if (match)
-			pmb[i].match = cur;
-		else
-			moved_block_clear(&pmb[i]);
-	}
-}
-
-static int shrink_potential_moved_blocks(struct moved_block *pmb,
-					 int pmb_nr)
-{
-	int lp, rp;
-
-	/* Shrink the set of potential block to the remaining running */
-	for (lp = 0, rp = pmb_nr - 1; lp <= rp;) {
-		while (lp < pmb_nr && pmb[lp].match)
-			lp++;
-		/* lp points at the first NULL now */
-
-		while (rp > -1 && !pmb[rp].match)
-			rp--;
-		/* rp points at the last non-NULL */
-
-		if (lp < pmb_nr && rp > -1 && lp < rp) {
-			pmb[lp] = pmb[rp];
-			memset(&pmb[rp], 0, sizeof(pmb[rp]));
-			rp--;
-			lp++;
+		if (match) {
+			pmb[j] = pmb[i];
+			pmb[j++].match = cur;
 		}
 	}
-
-	/* Remember the number of running sets */
-	return rp + 1;
+	*pmb_nr = j;
 }
 
 static void fill_potential_moved_blocks(struct diff_options *o,
@@ -1181,9 +1155,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			continue;
 		}
 
-		pmb_advance_or_null(o, l, pmb, pmb_nr);
-
-		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
+		pmb_advance_or_null(o, l, pmb, &pmb_nr);
 
 		if (pmb_nr == 0) {
 			int contiguous = adjust_last_block(o, n, block_length);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v3 12/15] diff --color-moved: stop clearing potential moved blocks
  2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
                       ` (10 preceding siblings ...)
  2021-10-27 12:04     ` [PATCH v3 11/15] diff --color-moved: shrink potential moved blocks as we go Phillip Wood via GitGitGadget
@ 2021-10-27 12:04     ` Phillip Wood via GitGitGadget
  2021-10-27 12:04     ` [PATCH v3 13/15] diff --color-moved-ws=allow-indentation-change: improve hash lookups Phillip Wood via GitGitGadget
                       ` (4 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-10-27 12:04 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

moved_block_clear() was introduced in 74d156f4a1 ("diff
--color-moved-ws: fix double free crash", 2018-10-04) to free the
memory that was allocated when initializing a potential moved
block. However since 21536d077f ("diff --color-moved-ws: modify
allow-indentation-change", 2018-11-23) initializing a potential moved
block no longer allocates any memory. Up until the last commit we were
relying on moved_block_clear() to set the `match` pointer to NULL when
a block stopped matching, but since that commit we do not clear a
moved block that does not match so it does not make sense to clear
them elsewhere.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/diff.c b/diff.c
index 626fd47aa0e..ffbe09937bc 100644
--- a/diff.c
+++ b/diff.c
@@ -807,11 +807,6 @@ struct moved_block {
 	int wsd; /* The whitespace delta of this block */
 };
 
-static void moved_block_clear(struct moved_block *b)
-{
-	memset(b, 0, sizeof(*b));
-}
-
 #define INDENT_BLANKLINE INT_MIN
 
 static void fill_es_indent_data(struct emitted_diff_symbol *es)
@@ -1128,8 +1123,6 @@ static void mark_color_as_moved(struct diff_options *o,
 		}
 
 		if (pmb_nr && (!match || l->s != moved_symbol)) {
-			int i;
-
 			if (!adjust_last_block(o, n, block_length) &&
 			    block_length > 1) {
 				/*
@@ -1139,8 +1132,6 @@ static void mark_color_as_moved(struct diff_options *o,
 				match = NULL;
 				n -= block_length;
 			}
-			for(i = 0; i < pmb_nr; i++)
-				moved_block_clear(&pmb[i]);
 			pmb_nr = 0;
 			block_length = 0;
 			flipped_block = 0;
@@ -1193,8 +1184,6 @@ static void mark_color_as_moved(struct diff_options *o,
 	}
 	adjust_last_block(o, n, block_length);
 
-	for(n = 0; n < pmb_nr; n++)
-		moved_block_clear(&pmb[n]);
 	free(pmb);
 }
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v3 13/15] diff --color-moved-ws=allow-indentation-change: improve hash lookups
  2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
                       ` (11 preceding siblings ...)
  2021-10-27 12:04     ` [PATCH v3 12/15] diff --color-moved: stop clearing potential moved blocks Phillip Wood via GitGitGadget
@ 2021-10-27 12:04     ` Phillip Wood via GitGitGadget
  2021-10-27 12:04     ` [PATCH v3 14/15] diff: use designated initializers for emitted_diff_symbol Phillip Wood via GitGitGadget
                       ` (3 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-10-27 12:04 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

As libxdiff does not have a whitespace flag to ignore the indentation,
the code for --color-moved-ws=allow-indentation-change uses
XDF_IGNORE_WHITESPACE and then filters out any hash lookups where
there are non-indentation changes. This filtering is inefficient as
we have to perform another string comparison.

By using the offset data that we have already computed to skip the
indentation, we can avoid using XDF_IGNORE_WHITESPACE and safely remove
the extra checks, which improves the performance by 11% and paves the
way for the elimination of string comparisons in the next commit.
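
To make the idea concrete, here is a small self-contained sketch of
hashing and comparing only the part of a line after its indentation.
It is an illustration under assumptions, not the code in diff.c: the
`struct line` type and the three helpers are hypothetical, the hash is
plain FNV-1a rather than xdiff_hash_string(), and only spaces and tabs
are treated as indentation.

```c
#include <string.h>

/* Hypothetical, simplified stand-in for struct emitted_diff_symbol. */
struct line {
	const char *text;
	int len;
	int indent_off; /* offset of the first non-indentation byte */
};

/* Count leading spaces and tabs (simplification: ignores \f, \v, \r). */
int indent_offset(const char *s, int len)
{
	int i = 0;

	while (i < len && (s[i] == ' ' || s[i] == '\t'))
		i++;
	return i;
}

/* FNV-1a over the text after the indentation; not xdiff_hash_string(). */
unsigned hash_after_indent(const struct line *l)
{
	unsigned h = 2166136261u;
	int i;

	for (i = l->indent_off; i < l->len; i++) {
		h ^= (unsigned char)l->text[i];
		h *= 16777619u;
	}
	return h;
}

/* Returns 1 if the two lines are identical once indentation is skipped. */
int equal_after_indent(const struct line *a, const struct line *b)
{
	int an = a->len - a->indent_off;
	int bn = b->len - b->indent_off;

	return an == bn &&
	       !memcmp(a->text + a->indent_off, b->text + b->indent_off, an);
}
```

Since the hash and the comparison both start at the same offset, two
lines that differ only in their indentation land in the same bucket
and compare equal, so no second filtering comparison is needed.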

This change slightly increases the run time of other --color-moved
modes. This could be avoided by using different comparison functions
for the different modes but after the next two commits there is no
measurable benefit in doing so.

There is a change in behavior for lines that begin with a form-feed or
vertical-tab character. Since b46054b374 ("xdiff: use
git-compat-util", 2019-04-11) xdiff does not treat '\f' or '\v' as
whitespace characters. This means that lines starting with those
characters are never considered to be blank and never match a line
that does not start with the same character. After this patch a line
matching "^[\f\v\r]*[ \t]*$" is considered to be blank by
--color-moved-ws=allow-indentation-change and lines beginning
"^[\f\v\r]*[ \t]*" can match another line if the suffixes match. This
changes the output of git show for d18f76dccf ("compat/regex: use the
regex engine from gawk for compat", 2010-08-17) as some lines in the
pre-image before a moved block that contain '\f' are now considered
moved as well as they match a blank line before the moved lines in the
post-image. This commit updates one of the tests to reflect this
change.

Test                                                                  HEAD^             HEAD
--------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38(0.33+0.05)   0.38(0.33+0.05)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.86(0.82+0.04)   0.88(0.84+0.04)  +2.3%
4002.3: diff --color-moved-ws=allow-indentation-change large change   0.97(0.94+0.03)   0.86(0.81+0.05) -11.3%
4002.4: log --no-color-moved --no-color-moved-ws                      1.16(1.07+0.09)   1.16(1.06+0.09)  +0.0%
4002.5: log --color-moved --no-color-moved-ws                         1.32(1.26+0.06)   1.33(1.27+0.05)  +0.8%
4002.6: log --color-moved-ws=allow-indentation-change                 1.35(1.29+0.06)   1.33(1.24+0.08)  -1.5%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c                     | 65 +++++++++++---------------------------
 t/t4015-diff-whitespace.sh | 22 ++++++-------
 2 files changed, 30 insertions(+), 57 deletions(-)

diff --git a/diff.c b/diff.c
index ffbe09937bc..2085c063675 100644
--- a/diff.c
+++ b/diff.c
@@ -850,28 +850,15 @@ static void fill_es_indent_data(struct emitted_diff_symbol *es)
 }
 
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
-			    const struct emitted_diff_symbol *b,
-			    int *out)
-{
-	int a_len = a->len,
-	    b_len = b->len,
-	    a_off = a->indent_off,
-	    a_width = a->indent_width,
-	    b_off = b->indent_off,
+			    const struct emitted_diff_symbol *b)
+{
+	int a_width = a->indent_width,
 	    b_width = b->indent_width;
 
-	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE) {
-		*out = INDENT_BLANKLINE;
-		return 1;
-	}
-
-	if (a_len - a_off != b_len - b_off ||
-	    memcmp(a->line + a_off, b->line + b_off, a_len - a_off))
-		return 0;
-
-	*out = a_width - b_width;
+	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE)
+		return INDENT_BLANKLINE;
 
-	return 1;
+	return a_width - b_width;
 }
 
 static int cmp_in_block_with_wsd(const struct moved_entry *cur,
@@ -916,26 +903,17 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 			   const void *keydata)
 {
 	const struct diff_options *diffopt = hashmap_cmp_fn_data;
-	const struct moved_entry *a, *b;
+	const struct emitted_diff_symbol *a, *b;
 	unsigned flags = diffopt->color_moved_ws_handling
 			 & XDF_WHITESPACE_FLAGS;
 
-	a = container_of(eptr, const struct moved_entry, ent);
-	b = container_of(entry_or_key, const struct moved_entry, ent);
+	a = container_of(eptr, const struct moved_entry, ent)->es;
+	b = container_of(entry_or_key, const struct moved_entry, ent)->es;
 
-	if (diffopt->color_moved_ws_handling &
-	    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-		/*
-		 * As there is not specific white space config given,
-		 * we'd need to check for a new block, so ignore all
-		 * white space. The setup of the white space
-		 * configuration for the next block is done else where
-		 */
-		flags |= XDF_IGNORE_WHITESPACE;
-
-	return !xdiff_compare_lines(a->es->line, a->es->len,
-				    b->es->line, b->es->len,
-				    flags);
+	return !xdiff_compare_lines(a->line + a->indent_off,
+				    a->len - a->indent_off,
+				    b->line + b->indent_off,
+				    b->len - b->indent_off, flags);
 }
 
 static struct moved_entry *prepare_entry(struct diff_options *o,
@@ -944,7 +922,8 @@ static struct moved_entry *prepare_entry(struct diff_options *o,
 	struct moved_entry *ret = xmalloc(sizeof(*ret));
 	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
-	unsigned int hash = xdiff_hash_string(l->line, l->len, flags);
+	unsigned int hash = xdiff_hash_string(l->line + l->indent_off,
+					      l->len - l->indent_off, flags);
 
 	hashmap_entry_init(&ret->ent, hash);
 	ret->es = l;
@@ -1036,13 +1015,11 @@ static void fill_potential_moved_blocks(struct diff_options *o,
 	hashmap_for_each_entry_from(hm, match, ent) {
 		ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
 		if (o->color_moved_ws_handling &
-		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
-			if (compute_ws_delta(l, match->es, &(pmb[pmb_nr]).wsd))
-				pmb[pmb_nr++].match = match;
-		} else {
+		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+			pmb[pmb_nr].wsd = compute_ws_delta(l, match->es);
+		else
 			pmb[pmb_nr].wsd = 0;
-			pmb[pmb_nr++].match = match;
-		}
+		pmb[pmb_nr++].match = match;
 	}
 
 	*pmb_p = pmb;
@@ -6276,10 +6253,6 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 		if (o->color_moved) {
 			struct hashmap add_lines, del_lines;
 
-			if (o->color_moved_ws_handling &
-			    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-				o->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
-
 			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
 			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
 
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 15782c879d2..50d0cf486be 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -2206,10 +2206,10 @@ EMPTY=''
 test_expect_success 'compare mixed whitespace delta across moved blocks' '
 
 	git reset --hard &&
-	tr Q_ "\t " <<-EOF >text.txt &&
-	${EMPTY}
-	____too short without
-	${EMPTY}
+	tr "^|Q_" "\f\v\t " <<-EOF >text.txt &&
+	^__
+	|____too short without
+	^
 	___being grouped across blank line
 	${EMPTY}
 	context
@@ -2228,7 +2228,7 @@ test_expect_success 'compare mixed whitespace delta across moved blocks' '
 	git add text.txt &&
 	git commit -m "add text.txt" &&
 
-	tr Q_ "\t " <<-EOF >text.txt &&
+	tr "^|Q_" "\f\v\t " <<-EOF >text.txt &&
 	context
 	lines
 	to
@@ -2239,7 +2239,7 @@ test_expect_success 'compare mixed whitespace delta across moved blocks' '
 	${EMPTY}
 	QQtoo short without
 	${EMPTY}
-	Q_______being grouped across blank line
+	^Q_______being grouped across blank line
 	${EMPTY}
 	Q_QThese two lines have had their
 	indentation reduced by four spaces
@@ -2251,16 +2251,16 @@ test_expect_success 'compare mixed whitespace delta across moved blocks' '
 		-c core.whitespace=space-before-tab \
 		diff --color --color-moved --ws-error-highlight=all \
 		--color-moved-ws=allow-indentation-change >actual.raw &&
-	grep -v "index" actual.raw | test_decode_color >actual &&
+	grep -v "index" actual.raw | tr "\f\v" "^|" | test_decode_color >actual &&
 
 	cat <<-\EOF >expected &&
 	<BOLD>diff --git a/text.txt b/text.txt<RESET>
 	<BOLD>--- a/text.txt<RESET>
 	<BOLD>+++ b/text.txt<RESET>
 	<CYAN>@@ -1,16 +1,16 @@<RESET>
-	<BOLD;MAGENTA>-<RESET>
-	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>    too short without<RESET>
-	<BOLD;MAGENTA>-<RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>^<RESET><BRED>  <RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>|    too short without<RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>^<RESET>
 	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>   being grouped across blank line<RESET>
 	<BOLD;MAGENTA>-<RESET>
 	 <RESET>context<RESET>
@@ -2280,7 +2280,7 @@ test_expect_success 'compare mixed whitespace delta across moved blocks' '
 	<BOLD;YELLOW>+<RESET>
 	<BOLD;YELLOW>+<RESET>		<BOLD;YELLOW>too short without<RESET>
 	<BOLD;YELLOW>+<RESET>
-	<BOLD;YELLOW>+<RESET>	<BOLD;YELLOW>       being grouped across blank line<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>^	       being grouped across blank line<RESET>
 	<BOLD;YELLOW>+<RESET>
 	<BOLD;CYAN>+<RESET>	<BRED> <RESET>	<BOLD;CYAN>These two lines have had their<RESET>
 	<BOLD;CYAN>+<RESET><BOLD;CYAN>indentation reduced by four spaces<RESET>
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v3 14/15] diff: use designated initializers for emitted_diff_symbol
  2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
                       ` (12 preceding siblings ...)
  2021-10-27 12:04     ` [PATCH v3 13/15] diff --color-moved-ws=allow-indentation-change: improve hash lookups Phillip Wood via GitGitGadget
@ 2021-10-27 12:04     ` Phillip Wood via GitGitGadget
  2021-10-27 12:04     ` [PATCH v3 15/15] diff --color-moved: intern strings Phillip Wood via GitGitGadget
                       ` (2 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-10-27 12:04 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

This makes it clearer which fields are being explicitly initialized
and will simplify the next commit where we add a new field to the
struct.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 2085c063675..9ef88d7665a 100644
--- a/diff.c
+++ b/diff.c
@@ -1497,7 +1497,9 @@ static void emit_diff_symbol_from_struct(struct diff_options *o,
 static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
 			     const char *line, int len, unsigned flags)
 {
-	struct emitted_diff_symbol e = {line, len, flags, 0, 0, s};
+	struct emitted_diff_symbol e = {
+		.line = line, .len = len, .flags = flags, .s = s
+	};
 
 	if (o->emitted_symbols)
 		append_emitted_diff_symbol(o, &e);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v3 15/15] diff --color-moved: intern strings
  2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
                       ` (13 preceding siblings ...)
  2021-10-27 12:04     ` [PATCH v3 14/15] diff: use designated initializers for emitted_diff_symbol Phillip Wood via GitGitGadget
@ 2021-10-27 12:04     ` Phillip Wood via GitGitGadget
  2021-10-27 13:28     ` [PATCH v3 00/15] diff --color-moved[-ws] speedups Phillip Wood
  2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-10-27 12:04 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Taking inspiration from xdl_classify_record(), assign an id to each
addition and deletion such that lines that match for the current
--color-moved-ws mode share the same unique id. This reduces the
number of hash lookups a little (calculating the ids still involves
one hash lookup per line), but the main benefit is that when growing
blocks of potentially moved lines we can replace string comparisons,
which involve chasing a pointer, with a simple integer comparison. On a
large diff this commit reduces the time to run 'diff --color-moved' by
37% compared to the previous commit and 31% compared to master. For
'diff --color-moved-ws=allow-indentation-change' the reduction is 28%
compared to the previous commit and 96% compared to master. There is
little change in the performance of 'git log --patch' as the diffs are
smaller.
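
The general interning technique can be sketched as below. This is a
toy version under stated assumptions, not the implementation in this
patch: `struct interner`, `interner_init()` and `intern()` are
hypothetical names, the table is a fixed-size open-addressing array
that must never fill up, and lines are compared with strcmp() rather
than with the whitespace-aware comparison that diff.c uses.

```c
#include <string.h>

/* Power of two; the sketch assumes fewer distinct strings than this. */
#define TABLE_SIZE 64

struct interner {
	const char *keys[TABLE_SIZE]; /* NULL marks an empty slot */
	unsigned ids[TABLE_SIZE];
	unsigned next_id;
};

void interner_init(struct interner *in)
{
	int i;

	for (i = 0; i < TABLE_SIZE; i++)
		in->keys[i] = NULL;
	in->next_id = 0;
}

/* FNV-1a string hash, used only to pick a starting slot. */
static unsigned hash_str(const char *s)
{
	unsigned h = 2166136261u;

	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 16777619u;
	}
	return h;
}

/*
 * Return the id for `s`, allocating a new one the first time the
 * string is seen. Equal strings always get the same id. The caller
 * must keep `s` alive for as long as the interner is in use.
 */
unsigned intern(struct interner *in, const char *s)
{
	unsigned i = hash_str(s) & (TABLE_SIZE - 1);

	while (in->keys[i]) {
		if (!strcmp(in->keys[i], s))
			return in->ids[i];
		i = (i + 1) & (TABLE_SIZE - 1); /* linear probing */
	}
	in->keys[i] = s;
	in->ids[i] = in->next_id++;
	return in->ids[i];
}
```

Once every line carries such an id, checking whether two lines match
is a single integer comparison, and the string data only has to be
touched once per line when the id is assigned.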

Test                                                                 HEAD^               HEAD
---------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38(0.33+0.05)    0.38(0.33+0.05)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.88(0.81+0.06)    0.55(0.50+0.04) -37.5%
4002.3: diff --color-moved-ws=allow-indentation-change large change   0.85(0.79+0.06)    0.61(0.54+0.06) -28.2%
4002.4: log --no-color-moved --no-color-moved-ws                      1.16(1.07+0.08)    1.15(1.09+0.05)  -0.9%
4002.5: log --color-moved --no-color-moved-ws                         1.31(1.22+0.08)    1.29(1.19+0.09)  -1.5%
4002.6: log --color-moved-ws=allow-indentation-change                 1.32(1.24+0.08)    1.31(1.18+0.13)  -0.8%

Test                                                                 master              HEAD
---------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.33+0.05)   0.38(0.33+0.05)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.80 (0.75+0.04)   0.55(0.50+0.04) -31.2%
4002.3: diff --color-moved-ws=allow-indentation-change large change  14.20(14.15+0.05)   0.61(0.54+0.06) -95.7%
4002.4: log --no-color-moved --no-color-moved-ws                      1.15 (1.05+0.09)   1.15(1.09+0.05)  +0.0%
4002.5: log --color-moved --no-color-moved-ws                         1.30 (1.19+0.11)   1.29(1.19+0.09)  -0.8%
4002.6: log --color-moved-ws=allow-indentation-change                 1.70 (1.63+0.06)   1.31(1.18+0.13) -22.9%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 174 +++++++++++++++++++++++++++++++--------------------------
 1 file changed, 96 insertions(+), 78 deletions(-)

diff --git a/diff.c b/diff.c
index 9ef88d7665a..c28c56c1283 100644
--- a/diff.c
+++ b/diff.c
@@ -18,6 +18,7 @@
 #include "submodule-config.h"
 #include "submodule.h"
 #include "hashmap.h"
+#include "mem-pool.h"
 #include "ll-merge.h"
 #include "string-list.h"
 #include "strvec.h"
@@ -772,6 +773,7 @@ struct emitted_diff_symbol {
 	int flags;
 	int indent_off;   /* Offset to first non-whitespace character */
 	int indent_width; /* The visual width of the indentation */
+	unsigned id;
 	enum diff_symbol s;
 };
 #define EMITTED_DIFF_SYMBOL_INIT {NULL}
@@ -797,9 +799,9 @@ static void append_emitted_diff_symbol(struct diff_options *o,
 }
 
 struct moved_entry {
-	struct hashmap_entry ent;
 	const struct emitted_diff_symbol *es;
 	struct moved_entry *next_line;
+	struct moved_entry *next_match;
 };
 
 struct moved_block {
@@ -865,24 +867,24 @@ static int cmp_in_block_with_wsd(const struct moved_entry *cur,
 				 const struct emitted_diff_symbol *l,
 				 struct moved_block *pmb)
 {
-	int al = cur->es->len, bl = l->len;
-	const char *a = cur->es->line,
-		   *b = l->line;
-	int a_off = cur->es->indent_off,
-	    a_width = cur->es->indent_width,
-	    b_off = l->indent_off,
-	    b_width = l->indent_width;
+	int a_width = cur->es->indent_width, b_width = l->indent_width;
 	int delta;
 
-	/* If 'l' and 'cur' are both blank then they match. */
-	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE)
+	/* The text of each line must match */
+	if (cur->es->id != l->id)
+		return 1;
+
+	/*
+	 * If 'l' and 'cur' are both blank then we don't need to check the
+	 * indent. We only need to check cur as we know the strings match.
+	 * */
+	if (a_width == INDENT_BLANKLINE)
 		return 0;
 
 	/*
 	 * The indent changes of the block are known and stored in pmb->wsd;
 	 * however we need to check if the indent changes of the current line
-	 * match those of the current block and that the text of 'l' and 'cur'
-	 * after the indentation match.
+	 * match those of the current block.
 	 */
 	delta = b_width - a_width;
 
@@ -893,22 +895,26 @@ static int cmp_in_block_with_wsd(const struct moved_entry *cur,
 	if (pmb->wsd == INDENT_BLANKLINE)
 		pmb->wsd = delta;
 
-	return !(delta == pmb->wsd && al - a_off == bl - b_off &&
-		 !memcmp(a + a_off, b + b_off, al - a_off));
+	return delta != pmb->wsd;
 }
 
-static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
-			   const struct hashmap_entry *eptr,
-			   const struct hashmap_entry *entry_or_key,
-			   const void *keydata)
+struct interned_diff_symbol {
+	struct hashmap_entry ent;
+	struct emitted_diff_symbol *es;
+};
+
+static int interned_diff_symbol_cmp(const void *hashmap_cmp_fn_data,
+				    const struct hashmap_entry *eptr,
+				    const struct hashmap_entry *entry_or_key,
+				    const void *keydata)
 {
 	const struct diff_options *diffopt = hashmap_cmp_fn_data;
 	const struct emitted_diff_symbol *a, *b;
 	unsigned flags = diffopt->color_moved_ws_handling
 			 & XDF_WHITESPACE_FLAGS;
 
-	a = container_of(eptr, const struct moved_entry, ent)->es;
-	b = container_of(entry_or_key, const struct moved_entry, ent)->es;
+	a = container_of(eptr, const struct interned_diff_symbol, ent)->es;
+	b = container_of(entry_or_key, const struct interned_diff_symbol, ent)->es;
 
 	return !xdiff_compare_lines(a->line + a->indent_off,
 				    a->len - a->indent_off,
@@ -916,55 +922,81 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 				    b->len - b->indent_off, flags);
 }
 
-static struct moved_entry *prepare_entry(struct diff_options *o,
-					 int line_no)
+static void prepare_entry(struct diff_options *o, struct emitted_diff_symbol *l,
+			  struct interned_diff_symbol *s)
 {
-	struct moved_entry *ret = xmalloc(sizeof(*ret));
-	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 	unsigned int hash = xdiff_hash_string(l->line + l->indent_off,
 					      l->len - l->indent_off, flags);
 
-	hashmap_entry_init(&ret->ent, hash);
-	ret->es = l;
-	ret->next_line = NULL;
-
-	return ret;
+	hashmap_entry_init(&s->ent, hash);
+	s->es = l;
 }
 
-static void add_lines_to_move_detection(struct diff_options *o,
-					struct hashmap *add_lines,
-					struct hashmap *del_lines)
+struct moved_entry_list {
+	struct moved_entry *add, *del;
+};
+
+static struct moved_entry_list *add_lines_to_move_detection(struct diff_options *o,
+							    struct mem_pool *entry_mem_pool)
 {
 	struct moved_entry *prev_line = NULL;
-
+	struct mem_pool interned_pool;
+	struct hashmap interned_map;
+	struct moved_entry_list *entry_list = NULL;
+	size_t entry_list_alloc = 0;
+	unsigned id = 0;
 	int n;
+
+	hashmap_init(&interned_map, interned_diff_symbol_cmp, o, 8096);
+	mem_pool_init(&interned_pool, 1024 * 1024);
+
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
-		struct hashmap *hm;
-		struct moved_entry *key;
+		struct interned_diff_symbol key;
+		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
+		struct interned_diff_symbol *s;
+		struct moved_entry *entry;
 
-		switch (o->emitted_symbols->buf[n].s) {
-		case DIFF_SYMBOL_PLUS:
-			hm = add_lines;
-			break;
-		case DIFF_SYMBOL_MINUS:
-			hm = del_lines;
-			break;
-		default:
+		if (l->s != DIFF_SYMBOL_PLUS && l->s != DIFF_SYMBOL_MINUS) {
 			prev_line = NULL;
 			continue;
 		}
 
 		if (o->color_moved_ws_handling &
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-			fill_es_indent_data(&o->emitted_symbols->buf[n]);
-		key = prepare_entry(o, n);
-		if (prev_line && prev_line->es->s == o->emitted_symbols->buf[n].s)
-			prev_line->next_line = key;
+			fill_es_indent_data(l);
 
-		hashmap_add(hm, &key->ent);
-		prev_line = key;
+		prepare_entry(o, l, &key);
+		s = hashmap_get_entry(&interned_map, &key, ent, &key.ent);
+		if (s) {
+			l->id = s->es->id;
+		} else {
+			l->id = id;
+			ALLOC_GROW_BY(entry_list, id, 1, entry_list_alloc);
+			hashmap_add(&interned_map,
+				    memcpy(mem_pool_alloc(&interned_pool,
+							  sizeof(key)),
+					   &key, sizeof(key)));
+		}
+		entry = mem_pool_alloc(entry_mem_pool, sizeof(*entry));
+		entry->es = l;
+		entry->next_line = NULL;
+		if (prev_line && prev_line->es->s == l->s)
+			prev_line->next_line = entry;
+		prev_line = entry;
+		if (l->s == DIFF_SYMBOL_PLUS) {
+			entry->next_match = entry_list[l->id].add;
+			entry_list[l->id].add = entry;
+		} else {
+			entry->next_match = entry_list[l->id].del;
+			entry_list[l->id].del = entry;
+		}
 	}
+
+	hashmap_clear(&interned_map);
+	mem_pool_discard(&interned_pool, 0);
+
+	return entry_list;
 }
 
 static void pmb_advance_or_null(struct diff_options *o,
@@ -973,7 +1005,6 @@ static void pmb_advance_or_null(struct diff_options *o,
 				int *pmb_nr)
 {
 	int i, j;
-	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
 	for (i = 0, j = 0; i < *pmb_nr; i++) {
 		int match;
@@ -986,9 +1017,8 @@ static void pmb_advance_or_null(struct diff_options *o,
 			match = cur &&
 				!cmp_in_block_with_wsd(cur, l, &pmb[i]);
 		else
-			match = cur &&
-				xdiff_compare_lines(cur->es->line, cur->es->len,
-						    l->line, l->len, flags);
+			match = cur && cur->es->id == l->id;
+
 		if (match) {
 			pmb[j] = pmb[i];
 			pmb[j++].match = cur;
@@ -998,7 +1028,6 @@ static void pmb_advance_or_null(struct diff_options *o,
 }
 
 static void fill_potential_moved_blocks(struct diff_options *o,
-					struct hashmap *hm,
 					struct moved_entry *match,
 					struct emitted_diff_symbol *l,
 					struct moved_block **pmb_p,
@@ -1012,7 +1041,7 @@ static void fill_potential_moved_blocks(struct diff_options *o,
 	 * The current line is the start of a new block.
 	 * Setup the set of potential blocks.
 	 */
-	hashmap_for_each_entry_from(hm, match, ent) {
+	for (; match; match = match->next_match) {
 		ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
 		if (o->color_moved_ws_handling &
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
@@ -1067,8 +1096,7 @@ static int adjust_last_block(struct diff_options *o, int n, int block_length)
 
 /* Find blocks of moved code, delegate actual coloring decision to helper */
 static void mark_color_as_moved(struct diff_options *o,
-				struct hashmap *add_lines,
-				struct hashmap *del_lines)
+				struct moved_entry_list *entry_list)
 {
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
@@ -1077,23 +1105,15 @@ static void mark_color_as_moved(struct diff_options *o,
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
-		struct hashmap *hm = NULL;
-		struct moved_entry *key;
 		struct moved_entry *match = NULL;
 		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
 
 		switch (l->s) {
 		case DIFF_SYMBOL_PLUS:
-			hm = del_lines;
-			key = prepare_entry(o, n);
-			match = hashmap_get_entry(hm, key, ent, NULL);
-			free(key);
+			match = entry_list[l->id].del;
 			break;
 		case DIFF_SYMBOL_MINUS:
-			hm = add_lines;
-			key = prepare_entry(o, n);
-			match = hashmap_get_entry(hm, key, ent, NULL);
-			free(key);
+			match = entry_list[l->id].add;
 			break;
 		default:
 			flipped_block = 0;
@@ -1135,7 +1155,7 @@ static void mark_color_as_moved(struct diff_options *o,
 				 */
 				n -= block_length;
 			else
-				fill_potential_moved_blocks(o, hm, match, l,
+				fill_potential_moved_blocks(o, match, l,
 							    &pmb, &pmb_alloc,
 							    &pmb_nr);
 
@@ -6253,20 +6273,18 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 
 	if (o->emitted_symbols) {
 		if (o->color_moved) {
-			struct hashmap add_lines, del_lines;
-
-			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
-			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
+			struct mem_pool entry_pool;
+			struct moved_entry_list *entry_list;
 
-			add_lines_to_move_detection(o, &add_lines, &del_lines);
-			mark_color_as_moved(o, &add_lines, &del_lines);
+			mem_pool_init(&entry_pool, 1024 * 1024);
+			entry_list = add_lines_to_move_detection(o,
+								 &entry_pool);
+			mark_color_as_moved(o, entry_list);
 			if (o->color_moved == COLOR_MOVED_ZEBRA_DIM)
 				dim_moved_lines(o);
 
-			hashmap_clear_and_free(&add_lines, struct moved_entry,
-						ent);
-			hashmap_clear_and_free(&del_lines, struct moved_entry,
-						ent);
+			mem_pool_discard(&entry_pool, 0);
+			free(entry_list);
 		}
 
 		for (i = 0; i < esm.nr; i++)
-- 
gitgitgadget


* Re: [PATCH v3 00/15] diff --color-moved[-ws] speedups
  2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
                       ` (14 preceding siblings ...)
  2021-10-27 12:04     ` [PATCH v3 15/15] diff --color-moved: intern strings Phillip Wood via GitGitGadget
@ 2021-10-27 13:28     ` Phillip Wood
  2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood @ 2021-10-27 13:28 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget, git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren

On 27/10/2021 13:04, Phillip Wood via GitGitGadget wrote:
> [...]
>   * Patches 1-3 are new and fix an existing bug.

Sorry, that should be "Patches 2-4".

> [...]
> The bug fix in patch 3 degrades the performance, but by the end of the
> series the timings are the same as V2 - see the range diff.

It is patch 4 that degrades the performance, not patch 3.

Best Wishes

Phillip


* Re: [PATCH v3 01/15] diff --color-moved: add perf tests
  2021-10-27 12:04     ` [PATCH v3 01/15] diff --color-moved: add perf tests Phillip Wood via GitGitGadget
@ 2021-10-28 21:32       ` Junio C Hamano
  2021-10-29 10:24         ` Phillip Wood
  0 siblings, 1 reply; 92+ messages in thread
From: Junio C Hamano @ 2021-10-28 21:32 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget
  Cc: git, Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood

"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> Add some tests so we can monitor changes to the performance of the
> move detection code. The tests record the performance of a single
> large diff and a sequence of smaller diffs.

"A single large diff" meaning...?

> +if ! git rev-parse --verify v2.29.0^{commit} >/dev/null
> +then
> +	skip_all='skipping because tag v2.29.0 was not found'
> +	test_done
> +fi

Hmph.  So this is designed only to be run in a clone of git.git with
that tag (and a bit of history, at least to v2.28.0 and 1000 commits)?

I am asking primarily because this seems to be the first instance of
a test that hardcodes the dependency on our history, instead of
allowing the tester to use their favourite history by using the
GIT_PERF_LARGE_REPO and GIT_PERF_REPO environment variables.

The intention of the tests themselves looks quite clear.  Thanks.

> +GIT_PAGER_IN_USE=1
> +test_export GIT_PAGER_IN_USE
> +
> +test_perf 'diff --no-color-moved --no-color-moved-ws large change' '
> +	git diff --no-color-moved --no-color-moved-ws v2.28.0 v2.29.0
> +'
> +
> +test_perf 'diff --color-moved --no-color-moved-ws large change' '
> +	git diff --color-moved=zebra --no-color-moved-ws v2.28.0 v2.29.0
> +'
> +
> +test_perf 'diff --color-moved-ws=allow-indentation-change large change' '
> +	git diff --color-moved=zebra --color-moved-ws=allow-indentation-change \
> +		v2.28.0 v2.29.0
> +'
> +
> +test_perf 'log --no-color-moved --no-color-moved-ws' '
> +	git log --no-color-moved --no-color-moved-ws --no-merges --patch \
> +		-n1000 v2.29.0
> +'
> +
> +test_perf 'log --color-moved --no-color-moved-ws' '
> +	git log --color-moved=zebra --no-color-moved-ws --no-merges --patch \
> +		-n1000 v2.29.0
> +'
> +
> +test_perf 'log --color-moved-ws=allow-indentation-change' '
> +	git log --color-moved=zebra --color-moved-ws=allow-indentation-change \
> +		--no-merges --patch -n1000 v2.29.0
> +'
> +
> +test_done


* Re: [PATCH v3 03/15] diff --color-moved: factor out function
  2021-10-27 12:04     ` [PATCH v3 03/15] diff --color-moved: factor out function Phillip Wood via GitGitGadget
@ 2021-10-28 21:51       ` Junio C Hamano
  2021-10-29 10:35         ` Phillip Wood
  0 siblings, 1 reply; 92+ messages in thread
From: Junio C Hamano @ 2021-10-28 21:51 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget
  Cc: git, Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood

"Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> This code is quite heavily indented and having it in its own function
> simplifies an upcoming change.

And this should show as "moved" lines correctly in the output from
"log -p --color-moved -w"?

... not really.  There is an unfortunate artificial line wrapping in
the original, which was unwrapped by this move, so the blocks do not
exactly match.

> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>  diff.c | 51 ++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 34 insertions(+), 17 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index bd8e4ec9757..09af94e018c 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -1098,6 +1098,38 @@ static int shrink_potential_moved_blocks(struct moved_block *pmb,
>  	return rp + 1;
>  }
>  
> +static void fill_potential_moved_blocks(struct diff_options *o,
> +					struct hashmap *hm,
> +					struct moved_entry *match,
> +					struct emitted_diff_symbol *l,
> +					struct moved_block **pmb_p,
> +					int *pmb_alloc_p, int *pmb_nr_p)
> +
> +{
> +	struct moved_block *pmb = *pmb_p;
> +	int pmb_alloc = *pmb_alloc_p, pmb_nr = *pmb_nr_p;
> +
> +	/*
> +	 * The current line is the start of a new block.
> +	 * Setup the set of potential blocks.
> +	 */
> +	hashmap_for_each_entry_from(hm, match, ent) {
> +		ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
> +		if (o->color_moved_ws_handling &
> +		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
> +			if (compute_ws_delta(l, match->es, &(pmb[pmb_nr]).wsd))
> +				pmb[pmb_nr++].match = match;
> +		} else {
> +			pmb[pmb_nr].wsd = 0;
> +			pmb[pmb_nr++].match = match;
> +		}
> +	}
> +
> +	*pmb_p = pmb;
> +	*pmb_alloc_p = pmb_alloc;
> +	*pmb_nr_p = pmb_nr;
> +}
> +
>  /*
>   * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing.
>   *
> @@ -1198,23 +1230,8 @@ static void mark_color_as_moved(struct diff_options *o,
>  		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
>  
>  		if (pmb_nr == 0) {
> -			/*
> -			 * The current line is the start of a new block.
> -			 * Setup the set of potential blocks.
> -			 */
> -			hashmap_for_each_entry_from(hm, match, ent) {
> -				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
> -				if (o->color_moved_ws_handling &
> -				    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
> -					if (compute_ws_delta(l, match->es,
> -							     &pmb[pmb_nr].wsd))
> -						pmb[pmb_nr++].match = match;
> -				} else {
> -					pmb[pmb_nr].wsd = 0;
> -					pmb[pmb_nr++].match = match;
> -				}
> -			}
> -
> +			fill_potential_moved_blocks(
> +				o, hm, match, l, &pmb, &pmb_alloc, &pmb_nr);
>  			if (adjust_last_block(o, n, block_length) &&
>  			    pmb_nr && last_symbol != l->s)
>  				flipped_block = (flipped_block + 1) % 2;


* Re: [PATCH v3 01/15] diff --color-moved: add perf tests
  2021-10-28 21:32       ` Junio C Hamano
@ 2021-10-29 10:24         ` Phillip Wood
  2021-10-29 11:06           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 92+ messages in thread
From: Phillip Wood @ 2021-10-29 10:24 UTC (permalink / raw)
  To: Junio C Hamano, Phillip Wood via GitGitGadget
  Cc: git, Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren

Hi Junio

On 28/10/2021 22:32, Junio C Hamano wrote:
> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> Add some tests so we can monitor changes to the performance of the
>> move detection code. The tests record the performance of a single
>> large diff and a sequence of smaller diffs.
> 
> "A single large diff" meaning...?

The diff of two commits that are far apart in the history, so they have
lots of changes between them.

>> +if ! git rev-parse --verify v2.29.0^{commit} >/dev/null
>> +then
>> +	skip_all='skipping because tag v2.29.0 was not found'
>> +	test_done
>> +fi
> 
> Hmph.  So this is designed only to be run in a clone of git.git with
> that tag (and a bit of history, at least to v2.28.0 and 1000 commits)?
> 
> I am asking primarily because this seems to be the first instance of
> a test that hardcodes the dependency on our history, instead of
> allowing the tester to use their favourite history by using the
> GIT_PERF_LARGE_REPO and GIT_PERF_REPO environment variables.

p3404-rebase-interactive does the same thing. The aim is to have a
repeatable test rather than just using whatever commit HEAD happens to
be pointing at when the test is run as the starting point. If you have
any ideas for doing that another way I'm happy to change it.

> The intention of the tests themselves looks quite clear.  Thanks.

Thanks

Phillip

>> +GIT_PAGER_IN_USE=1
>> +test_export GIT_PAGER_IN_USE
>> +
>> +test_perf 'diff --no-color-moved --no-color-moved-ws large change' '
>> +	git diff --no-color-moved --no-color-moved-ws v2.28.0 v2.29.0
>> +'
>> +
>> +test_perf 'diff --color-moved --no-color-moved-ws large change' '
>> +	git diff --color-moved=zebra --no-color-moved-ws v2.28.0 v2.29.0
>> +'
>> +
>> +test_perf 'diff --color-moved-ws=allow-indentation-change large change' '
>> +	git diff --color-moved=zebra --color-moved-ws=allow-indentation-change \
>> +		v2.28.0 v2.29.0
>> +'
>> +
>> +test_perf 'log --no-color-moved --no-color-moved-ws' '
>> +	git log --no-color-moved --no-color-moved-ws --no-merges --patch \
>> +		-n1000 v2.29.0
>> +'
>> +
>> +test_perf 'log --color-moved --no-color-moved-ws' '
>> +	git log --color-moved=zebra --no-color-moved-ws --no-merges --patch \
>> +		-n1000 v2.29.0
>> +'
>> +
>> +test_perf 'log --color-moved-ws=allow-indentation-change' '
>> +	git log --color-moved=zebra --color-moved-ws=allow-indentation-change \
>> +		--no-merges --patch -n1000 v2.29.0
>> +'
>> +
>> +test_done



* Re: [PATCH v3 03/15] diff --color-moved: factor out function
  2021-10-28 21:51       ` Junio C Hamano
@ 2021-10-29 10:35         ` Phillip Wood
  0 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood @ 2021-10-29 10:35 UTC (permalink / raw)
  To: Junio C Hamano, Phillip Wood via GitGitGadget
  Cc: git, Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren

Hi Junio

On 28/10/2021 22:51, Junio C Hamano wrote:
> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> This code is quite heavily indented and having it in its own function
>> simplifies an upcoming change.
> 
> And this should show as "moved" lines correctly in the output from
> "log -p --color-moved -w"?

Rather than "-w", one can use "--color-moved-ws=allow-indentation-change"
to highlight moved lines where the indentation has changed. It took me a
while to realize why "-w" does not do anything here, but it is because
the lines are moved as well as having their indentation changed.

> ... not really.  There is an unfortunate artificial line wrapping in
> the original, which was unwrapped by this move, so the blocks do not
> exactly match.

Yes, that's a shame. It seemed overkill to have one commit moving the
code as is and then another reformatting it. All of the moved lines
apart from the one that is unwrapped are highlighted with
"--color-moved-ws=allow-indentation-change".

Best Wishes

Phillip

>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>> ---
>>   diff.c | 51 ++++++++++++++++++++++++++++++++++-----------------
>>   1 file changed, 34 insertions(+), 17 deletions(-)
>>
>> diff --git a/diff.c b/diff.c
>> index bd8e4ec9757..09af94e018c 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -1098,6 +1098,38 @@ static int shrink_potential_moved_blocks(struct moved_block *pmb,
>>   	return rp + 1;
>>   }
>>   
>> +static void fill_potential_moved_blocks(struct diff_options *o,
>> +					struct hashmap *hm,
>> +					struct moved_entry *match,
>> +					struct emitted_diff_symbol *l,
>> +					struct moved_block **pmb_p,
>> +					int *pmb_alloc_p, int *pmb_nr_p)
>> +
>> +{
>> +	struct moved_block *pmb = *pmb_p;
>> +	int pmb_alloc = *pmb_alloc_p, pmb_nr = *pmb_nr_p;
>> +
>> +	/*
>> +	 * The current line is the start of a new block.
>> +	 * Setup the set of potential blocks.
>> +	 */
>> +	hashmap_for_each_entry_from(hm, match, ent) {
>> +		ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
>> +		if (o->color_moved_ws_handling &
>> +		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
>> +			if (compute_ws_delta(l, match->es, &(pmb[pmb_nr]).wsd))
>> +				pmb[pmb_nr++].match = match;
>> +		} else {
>> +			pmb[pmb_nr].wsd = 0;
>> +			pmb[pmb_nr++].match = match;
>> +		}
>> +	}
>> +
>> +	*pmb_p = pmb;
>> +	*pmb_alloc_p = pmb_alloc;
>> +	*pmb_nr_p = pmb_nr;
>> +}
>> +
>>   /*
>>    * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing.
>>    *
>> @@ -1198,23 +1230,8 @@ static void mark_color_as_moved(struct diff_options *o,
>>   		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
>>   
>>   		if (pmb_nr == 0) {
>> -			/*
>> -			 * The current line is the start of a new block.
>> -			 * Setup the set of potential blocks.
>> -			 */
>> -			hashmap_for_each_entry_from(hm, match, ent) {
>> -				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
>> -				if (o->color_moved_ws_handling &
>> -				    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
>> -					if (compute_ws_delta(l, match->es,
>> -							     &pmb[pmb_nr].wsd))
>> -						pmb[pmb_nr++].match = match;
>> -				} else {
>> -					pmb[pmb_nr].wsd = 0;
>> -					pmb[pmb_nr++].match = match;
>> -				}
>> -			}
>> -
>> +			fill_potential_moved_blocks(
>> +				o, hm, match, l, &pmb, &pmb_alloc, &pmb_nr);
>>   			if (adjust_last_block(o, n, block_length) &&
>>   			    pmb_nr && last_symbol != l->s)
>>   				flipped_block = (flipped_block + 1) % 2;



* Re: [PATCH v3 01/15] diff --color-moved: add perf tests
  2021-10-29 10:24         ` Phillip Wood
@ 2021-10-29 11:06           ` Ævar Arnfjörð Bjarmason
  2021-11-10 11:05             ` Phillip Wood
  0 siblings, 1 reply; 92+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-10-29 11:06 UTC (permalink / raw)
  To: phillip.wood
  Cc: Junio C Hamano, Phillip Wood via GitGitGadget, git, Elijah Newren


On Fri, Oct 29 2021, Phillip Wood wrote:

> Hi Junio
>
> On 28/10/2021 22:32, Junio C Hamano wrote:
>> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> 
>>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>>
>>> Add some tests so we can monitor changes to the performance of the
>>> move detection code. The tests record the performance of a single
>>> large diff and a sequence of smaller diffs.
>> "A single large diff" meaning...?
>
> The diff of two commits that are far apart in the history so have lots
> of changes between them
>
>>> +if ! git rev-parse --verify v2.29.0^{commit} >/dev/null
>>> +then
>>> +	skip_all='skipping because tag v2.29.0 was not found'
>>> +	test_done
>>> +fi
>> Hmph.  So this is designed only to be run in a clone of git.git with
>> that tag (and a bit of history, at least to v2.28.0 and 1000 commits)?
>> I am asking primarily because this seems to be the first instance of
>> a test that hardcodes the dependency on our history, instead of
>> allowing the tester to use their favourite history by using the
>> GIT_PERF_LARGE_REPO and GIT_PERF_REPO environment variables.
>
> p3404-rebase-interactive does the same thing. The aim is to have a
> repeatable test rather than just using whatever commit HEAD happens to 
> be pointing at when the test is run as the starting point, if you have
> any ideas for doing that another way I'm happy to change it.

I don't know if it's worth it here, but the following would work:

 1. List all tags in the repository, sorted in reverse order, so e.g.:

    git tag -l 'v*.0' --sort=version:refname

    (The glob can be configurable as an env variable, or we could fall
    back)

 2. Go down that list and find the first pair that matches some limit, I
    think say the first "major" release with 500 commits would qualify

 3. Make it a GIT_PERF_LARGE_REPO test

We've got some perf tests that do similar things. I think you'd find
that with something like this you should be able to hand the perf test a
path to git.git, or linux.git, and probably any "major" repository as
long as it follows a common "we tag our releases at some interval"
pattern.

Or perhaps more simply:

 1. Note the number of commits in the history, per "git rev-list HEAD |
    wc -l".

 2. Then round that down to the nearest 10^x, so for a 250k commit
    repository round down to 100k and diff say the 90k..100kth commits,
    for git.git which has 60k that would be 10k, and the diff is commits
    9k..10k.

It means you'll get a "bump" eventually when say git.git crosses 100k
commits, but it will probably be stable for any measurement anyone cares
to do, and means that you can get "realistic" measurements for diffing a
big chunk of history from anything from a tiny repository with >=10
commits, to something truly gargantuan where you'd end up diffing say
900k..1m.
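
For illustration only, the arithmetic in the second suggestion could be
sketched in C as below. The names round_down_pow10(), commit_range and
pick_diff_range() are hypothetical, and the sketch assumes that commits
are numbered from the oldest one, which the description above does not
spell out.

```c
#include <assert.h>

/* Largest power of ten that is <= n; n must be at least 1. */
static unsigned long round_down_pow10(unsigned long n)
{
	unsigned long p = 1;

	assert(n >= 1);
	/* Test n / p rather than p * 10 <= n so that p cannot overflow */
	while (n / p >= 10)
		p *= 10;
	return p;
}

struct commit_range {
	unsigned long from, to; /* positions counted from the oldest commit (assumed) */
};

/* Pick the last tenth below the rounded-down commit count, e.g. 90k..100k. */
static struct commit_range pick_diff_range(unsigned long nr_commits)
{
	struct commit_range r;

	r.to = round_down_pow10(nr_commits);
	r.from = r.to - r.to / 10;
	return r;
}
```

With 250k commits this picks 90k..100k, and with 60k commits it picks
9k..10k, matching the examples above. Below 10 commits the range is
empty.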
     



* Re: [PATCH v3 01/15] diff --color-moved: add perf tests
  2021-10-29 11:06           ` Ævar Arnfjörð Bjarmason
@ 2021-11-10 11:05             ` Phillip Wood
  0 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood @ 2021-11-10 11:05 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, phillip.wood
  Cc: Junio C Hamano, Phillip Wood via GitGitGadget, git, Elijah Newren

Hi Ævar

On 29/10/2021 12:06, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Oct 29 2021, Phillip Wood wrote:
> 
>> Hi Junio
>>
>> On 28/10/2021 22:32, Junio C Hamano wrote:
>>> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>
>>>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>>>
>>>> Add some tests so we can monitor changes to the performance of the
>>>> move detection code. The tests record the performance of a single
>>>> large diff and a sequence of smaller diffs.
>>> "A single large diff" meaning...?
>>
>> The diff of two commits that are far apart in the history so have lots
>> of changes between them
>>
>>>> +if ! git rev-parse --verify v2.29.0^{commit} >/dev/null
>>>> +then
>>>> +	skip_all='skipping because tag v2.29.0 was not found'
>>>> +	test_done
>>>> +fi
>>> Hmph.  So this is designed only to be run in a clone of git.git with
>>> that tag (and a bit of history, at least to v2.28.0 and 1000 commits)?
>>> I am asking primarily because this seems to be the first instance of
>>> a test that hardcodes the dependency on our history, instead of
>>> allowing the tester to use their favourite history by using the
>>> GIT_PERF_LARGE_REPO and GIT_PERF_REPO environment variables.
>>
>> p3404-rebase-interactive does the same thing. The aim is to have a
>> repeatable test rather than just using whatever commit HEAD happens to
>> be pointing at when the test is run as the starting point, if you have
>> any ideas for doing that another way I'm happy to change it.
> 
> I don't know if it's worth it here, but the following would work:
> 
>   1. List all tags in the repository, sorted in reverse order, so e.g.:
> 
>      git tag -l 'v*.0' --sort=version:refname
> 
>      (The glob can be configurable as an env variable, or we could fall
>      back)
> 
>   2. Go down that list and find the first pair that matches some limit, I
>      think say the first "major" release with 500 commits would qualify
> 
>   3. Make it a GIT_PERF_LARGE_REPO test
> 
> We've got some perf tests that do similar things. I think you'd find
> that with something like this you should be able to hand the perf test a
> path to git.git, or linux.git, and probably any "major" repository as
> long as it follows a common "we tag our releases at some interval"
> pattern.
> 
> Or perhaps more simply:
> 
>   1. Note the number of commits in the history, per "git rev-list HEAD |
>      wc -l".
> 
>   2. Then round that down to the nearest 10^x, so for a 250k commit
>     repository round down to 100k and diff say the 90k..100kth commits,
>     for git.git which has 60k that would be 10k, and the diff is commits
>     9k..10k.
> 
> It means you'll get a "bump" eventually when say git.git crosses 100k
> commits, but it will probably be stable for any measurement anyone cares
> to do, and means that you can get "realistic" measurements for diffing a
> big chunk of history from anything from a tiny repository with >=10
> commits, to something truly gargantuan where you'd end up diffing say
> 900k..1m.

Thanks for the suggestions. I was quite tempted by the second idea, but
in the end I couldn't face rerunning the perf tests and updating all the
commit messages again. I've added a couple of environment variables to
allow the revs in the diff commands to be customized.

Best Wishes

Phillip




* [PATCH v4 00/15] diff --color-moved[-ws] speedups
  2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
                       ` (15 preceding siblings ...)
  2021-10-27 13:28     ` [PATCH v3 00/15] diff --color-moved[-ws] speedups Phillip Wood
@ 2021-11-16  9:49     ` Phillip Wood via GitGitGadget
  2021-11-16  9:49       ` [PATCH v4 01/15] diff --color-moved: add perf tests Phillip Wood via GitGitGadget
                         ` (16 more replies)
  16 siblings, 17 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-11-16  9:49 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood

Changes since V3:

 * Patch 1 now allows the user to choose different endpoints for the diff
   perf tests to facilitate testing with different repositories.
 * Fixed the alignment of the perf results column headers in a couple of
   patches.

Changes since V2:

 * Patches 1-3 are new and fix an existing bug.
 * Patch 8 includes Peff's unused parameter fix.
 * Patch 11 has been updated to fix a bug in V2.
 * Patch 13 has an expanded commit message explaining a change in behavior
   for lines starting with a form-feed.
 * Updated benchmark results.

The bug fix in patch 3 degrades the performance, but by the end of the
series the timings are the same as V2 - see the range diff.

V2 cover letter:

Thanks to Ævar and Elijah for their comments. I've reworded the commit
messages, addressed the enum initialization issue in patch 2 (now 3) and
added some perf tests.

There are two new patches in this round. The first patch adds the perf
tests suggested by Ævar; the penultimate patch converts the existing code
to use a designated initializer.

I've converted the benchmark results in the commit messages to use the new
tests, the percentage changes are broadly similar to the previous results
though I ended up running them on a different computer this time.

V1 cover letter:

The current implementation of diff --color-moved-ws=allow-indentation-change
is considerably slower than the implementation of diff --color-moved which
is in turn slower than a regular diff. This patch series starts with a
couple of bug fixes and then reworks the implementation of diff
--color-moved and diff --color-moved-ws=allow-indentation-change to speed
them up on large diffs. The time to run git diff --color-moved
--no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run
git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0
v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs
with --color-moved - the time to run git log -p --color-moved
--no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent
processors. On older processors these patches reduce the running time in all
cases that I've tested. In general the larger the diff the larger the speed
up. As an extreme example the time to run diff --color-moved
--color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8
minutes to 6 seconds.

Phillip Wood (15):
  diff --color-moved: add perf tests
  diff --color-moved: clear all flags on blocks that are too short
  diff --color-moved: factor out function
  diff --color-moved: rewind when discarding pmb
  diff --color-moved=zebra: fix alternate coloring
  diff --color-moved: avoid false short line matches and bad zerba
    coloring
  diff: simplify allow-indentation-change delta calculation
  diff --color-moved-ws=allow-indentation-change: simplify and optimize
  diff --color-moved: call comparison function directly
  diff --color-moved: unify moved block growth functions
  diff --color-moved: shrink potential moved blocks as we go
  diff --color-moved: stop clearing potential moved blocks
  diff --color-moved-ws=allow-indentation-change: improve hash lookups
  diff: use designated initializers for emitted_diff_symbol
  diff --color-moved: intern strings

 diff.c                           | 431 +++++++++++++------------------
 t/perf/p4002-diff-color-moved.sh |  57 ++++
 t/t4015-diff-whitespace.sh       | 205 ++++++++++++++-
 3 files changed, 437 insertions(+), 256 deletions(-)
 create mode 100755 t/perf/p4002-diff-color-moved.sh


base-commit: 211eca0895794362184da2be2a2d812d070719d3
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-981%2Fphillipwood%2Fwip%2Fdiff-color-moved-tweaks-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-981/phillipwood/wip/diff-color-moved-tweaks-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/981

Range-diff vs v3:

  1:  8fc8914a37b !  1:  48ee03cf52a diff --color-moved: add perf tests
     @@ Commit message
          diff --color-moved: add perf tests
      
          Add some tests so we can monitor changes to the performance of the
     -    move detection code. The tests record the performance of a single
     -    large diff and a sequence of smaller diffs.
     +    move detection code. The tests record the performance --color-moved
     +    and --color-moved-ws=allow-indentation-change for a large diff and a
     +    sequence of smaller diffs. The range of commits used for the large
     +    diff can be customized by exporting TEST_REV_A and TEST_REV_B when
     +    running the test.
      
          Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
      
     @@ t/perf/p4002-diff-color-moved.sh (new)
      +
      +test_perf_default_repo
      +
     -+if ! git rev-parse --verify v2.29.0^{commit} >/dev/null
     ++# The endpoints of the diff can be customized by setting TEST_REV_A
     ++# and TEST_REV_B in the environment when running this test.
     ++
     ++rev="${TEST_REV_A:-v2.28.0}"
     ++if ! rev_a="$(git rev-parse --quiet --verify "$rev")"
     ++then
     ++	skip_all="skipping because '$rev' was not found. \
     ++		  Use TEST_REV_A and TEST_REV_B to set the revs to use"
     ++	test_done
     ++fi
     ++rev="${TEST_REV_B:-v2.29.0}"
     ++if ! rev_b="$(git rev-parse --quiet --verify "$rev")"
      +then
     -+	skip_all='skipping because tag v2.29.0 was not found'
     ++	skip_all="skipping because '$rev' was not found. \
     ++		  Use TEST_REV_A and TEST_REV_B to set the revs to use"
      +	test_done
      +fi
      +
      +GIT_PAGER_IN_USE=1
     -+test_export GIT_PAGER_IN_USE
     ++test_export GIT_PAGER_IN_USE rev_a rev_b
      +
      +test_perf 'diff --no-color-moved --no-color-moved-ws large change' '
     -+	git diff --no-color-moved --no-color-moved-ws v2.28.0 v2.29.0
     ++	git diff --no-color-moved --no-color-moved-ws $rev_a $rev_b
      +'
      +
      +test_perf 'diff --color-moved --no-color-moved-ws large change' '
     -+	git diff --color-moved=zebra --no-color-moved-ws v2.28.0 v2.29.0
     ++	git diff --color-moved=zebra --no-color-moved-ws $rev_a $rev_b
      +'
      +
      +test_perf 'diff --color-moved-ws=allow-indentation-change large change' '
      +	git diff --color-moved=zebra --color-moved-ws=allow-indentation-change \
     -+		v2.28.0 v2.29.0
     ++		$rev_a $rev_b
      +'
      +
      +test_perf 'log --no-color-moved --no-color-moved-ws' '
      +	git log --no-color-moved --no-color-moved-ws --no-merges --patch \
     -+		-n1000 v2.29.0
     ++		-n1000 $rev_b
      +'
      +
      +test_perf 'log --color-moved --no-color-moved-ws' '
      +	git log --color-moved=zebra --no-color-moved-ws --no-merges --patch \
     -+		-n1000 v2.29.0
     ++		-n1000 $rev_b
      +'
      +
      +test_perf 'log --color-moved-ws=allow-indentation-change' '
      +	git log --color-moved=zebra --color-moved-ws=allow-indentation-change \
     -+		--no-merges --patch -n1000 v2.29.0
     ++		--no-merges --patch -n1000 $rev_b
      +'
      +
      +test_done
  2:  e9daed2360c =  2:  47c652716e8 diff --color-moved: clear all flags on blocks that are too short
  3:  658aec2670c =  3:  99e38ba9de9 diff --color-moved: factor out function
  4:  a30f52d7f15 !  4:  9ca71db61ae diff --color-moved: rewind when discarding pmb
     @@ Commit message
          mitigates the performance impact of this commit. After the
          optimization this commit has a negligible impact on performance.
      
     -    Test                                                                 HEAD^               HEAD
     -    ------------------------------------------------------------------------------------------------------------------
     +    Test                                                                  HEAD^               HEAD
     +    -----------------------------------------------------------------------------------------------------------------
          4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.33+0.05)    0.39 (0.34+0.04)  +2.6%
          4002.2: diff --color-moved --no-color-moved-ws large change           0.80 (0.76+0.03)    0.86 (0.82+0.04)  +7.5%
          4002.3: diff --color-moved-ws=allow-indentation-change large change  14.22(14.17+0.04)   19.01(18.93+0.05) +33.7%
  5:  1dde206b7b1 =  5:  56bb69af36e diff --color-moved=zebra: fix alternate coloring
  6:  2717ff500d2 =  6:  10b11526206 diff --color-moved: avoid false short line matches and bad zerba coloring
  7:  f96fa71d53c =  7:  c2e7b347257 diff: simplify allow-indentation-change delta calculation
  8:  324b689c915 !  8:  d7bbc0041e0 diff --color-moved-ws=allow-indentation-change: simplify and optimize
     @@ Commit message
            git diff --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
          by 93% compared to master and simplifies the code.
      
     -    Test                                                                 HEAD^               HEAD
     +    Test                                                                  HEAD^              HEAD
          ---------------------------------------------------------------------------------------------------------------
          4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.35+0.03)   0.38(0.35+0.03)  +0.0%
          4002.2: diff --color-moved --no-color-moved-ws large change           0.86 (0.80+0.06)   0.87(0.83+0.04)  +1.2%
     @@ Commit message
          4002.5: log --color-moved --no-color-moved-ws                         1.32 (1.25+0.07)   1.32(1.24+0.08)  +0.0%
          4002.6: log --color-moved-ws=allow-indentation-change                 1.71 (1.64+0.06)   1.36(1.25+0.10) -20.5%
      
     -    Test                                                                 master              HEAD
     +    Test                                                                  master             HEAD
          ---------------------------------------------------------------------------------------------------------------
          4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.33+0.05)   0.38(0.35+0.03)  +0.0%
          4002.2: diff --color-moved --no-color-moved-ws large change           0.80 (0.75+0.04)   0.87(0.83+0.04)  +8.7%
  9:  f142f33276a =  9:  c3e5dce1910 diff --color-moved: call comparison function directly
 10:  8f3ea865dd3 = 10:  9eb8cecd52a diff --color-moved: unify moved block growth functions
 11:  078c04d4a66 = 11:  35e204e1578 diff --color-moved: shrink potential moved blocks as we go
 12:  618371471a0 = 12:  ec329e7946d diff --color-moved: stop clearing potential moved blocks
 13:  6a8e9a2724d = 13:  6ec94134aaf diff --color-moved-ws=allow-indentation-change: improve hash lookups
 14:  ef98a6e7015 = 14:  d44c5d734c3 diff: use designated initializers for emitted_diff_symbol
 15:  ae78c05f08d ! 15:  5177f669423 diff --color-moved: intern strings
     @@ Commit message
          little change in the performance of 'git log --patch' as the diffs are
          smaller.
      
     -    Test                                                                 HEAD^               HEAD
     +    Test                                                                  HEAD^              HEAD
          ---------------------------------------------------------------------------------------------------------------
          4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38(0.33+0.05)    0.38(0.33+0.05)  +0.0%
          4002.2: diff --color-moved --no-color-moved-ws large change           0.88(0.81+0.06)    0.55(0.50+0.04) -37.5%
     @@ Commit message
          4002.5: log --color-moved --no-color-moved-ws                         1.31(1.22+0.08)    1.29(1.19+0.09)  -1.5%
          4002.6: log --color-moved-ws=allow-indentation-change                 1.32(1.24+0.08)    1.31(1.18+0.13)  -0.8%
      
     -    Test                                                                 master              HEAD
     +    Test                                                                  master             HEAD
          ---------------------------------------------------------------------------------------------------------------
          4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.33+0.05)   0.38(0.33+0.05)  +0.0%
          4002.2: diff --color-moved --no-color-moved-ws large change           0.80 (0.75+0.04)   0.55(0.50+0.04) -31.2%

-- 
gitgitgadget


* [PATCH v4 01/15] diff --color-moved: add perf tests
  2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
@ 2021-11-16  9:49       ` Phillip Wood via GitGitGadget
  2021-11-16  9:49       ` [PATCH v4 02/15] diff --color-moved: clear all flags on blocks that are too short Phillip Wood via GitGitGadget
                         ` (15 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-11-16  9:49 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Add some tests so we can monitor changes to the performance of the
move detection code. The tests record the performance of --color-moved
and --color-moved-ws=allow-indentation-change for a large diff and a
sequence of smaller diffs. The range of commits used for the large
diff can be customized by exporting TEST_REV_A and TEST_REV_B when
running the test.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 t/perf/p4002-diff-color-moved.sh | 57 ++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)
 create mode 100755 t/perf/p4002-diff-color-moved.sh

diff --git a/t/perf/p4002-diff-color-moved.sh b/t/perf/p4002-diff-color-moved.sh
new file mode 100755
index 00000000000..ab2af931c04
--- /dev/null
+++ b/t/perf/p4002-diff-color-moved.sh
@@ -0,0 +1,57 @@
+#!/bin/sh
+
+test_description='Tests diff --color-moved performance'
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+# The endpoints of the diff can be customized by setting TEST_REV_A
+# and TEST_REV_B in the environment when running this test.
+
+rev="${TEST_REV_A:-v2.28.0}"
+if ! rev_a="$(git rev-parse --quiet --verify "$rev")"
+then
+	skip_all="skipping because '$rev' was not found. \
+		  Use TEST_REV_A and TEST_REV_B to set the revs to use"
+	test_done
+fi
+rev="${TEST_REV_B:-v2.29.0}"
+if ! rev_b="$(git rev-parse --quiet --verify "$rev")"
+then
+	skip_all="skipping because '$rev' was not found. \
+		  Use TEST_REV_A and TEST_REV_B to set the revs to use"
+	test_done
+fi
+
+GIT_PAGER_IN_USE=1
+test_export GIT_PAGER_IN_USE rev_a rev_b
+
+test_perf 'diff --no-color-moved --no-color-moved-ws large change' '
+	git diff --no-color-moved --no-color-moved-ws $rev_a $rev_b
+'
+
+test_perf 'diff --color-moved --no-color-moved-ws large change' '
+	git diff --color-moved=zebra --no-color-moved-ws $rev_a $rev_b
+'
+
+test_perf 'diff --color-moved-ws=allow-indentation-change large change' '
+	git diff --color-moved=zebra --color-moved-ws=allow-indentation-change \
+		$rev_a $rev_b
+'
+
+test_perf 'log --no-color-moved --no-color-moved-ws' '
+	git log --no-color-moved --no-color-moved-ws --no-merges --patch \
+		-n1000 $rev_b
+'
+
+test_perf 'log --color-moved --no-color-moved-ws' '
+	git log --color-moved=zebra --no-color-moved-ws --no-merges --patch \
+		-n1000 $rev_b
+'
+
+test_perf 'log --color-moved-ws=allow-indentation-change' '
+	git log --color-moved=zebra --color-moved-ws=allow-indentation-change \
+		--no-merges --patch -n1000 $rev_b
+'
+
+test_done
-- 
gitgitgadget



* [PATCH v4 02/15] diff --color-moved: clear all flags on blocks that are too short
  2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
  2021-11-16  9:49       ` [PATCH v4 01/15] diff --color-moved: add perf tests Phillip Wood via GitGitGadget
@ 2021-11-16  9:49       ` Phillip Wood via GitGitGadget
  2021-11-16  9:49       ` [PATCH v4 03/15] diff --color-moved: factor out function Phillip Wood via GitGitGadget
                         ` (14 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-11-16  9:49 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

If a block of potentially moved lines is not long enough then the
DIFF_SYMBOL_MOVED_LINE flag is cleared on the matching lines so they
are not marked as moved. To avoid problems when we start rewinding
after an unsuccessful match in a couple of commits' time, make sure all
the move related flags are cleared, not just DIFF_SYMBOL_MOVED_LINE.
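
As a rough illustration of the masking described above, here is a
minimal standalone sketch. The SKETCH_* names and bit values are
hypothetical stand-ins (the real flags are defined in diff.c), and
sketch_clear_block() only mirrors the shape of the loop in
adjust_last_block(); it is not the function itself.

```c
#include <assert.h>

/* Hypothetical bit values; the real flags are defined in diff.c. */
enum {
	SKETCH_MOVED_LINE     = 1 << 0,
	SKETCH_MOVED_LINE_ALT = 1 << 1,
	SKETCH_OTHER_FLAG     = 1 << 2
};
#define SKETCH_ZEBRA_MASK (SKETCH_MOVED_LINE | SKETCH_MOVED_LINE_ALT)

/*
 * Clear every move related bit on the block_length entries that
 * precede index n, leaving unrelated bits untouched.
 */
static void sketch_clear_block(unsigned *flags, int n, int block_length)
{
	int i;

	assert(block_length >= 0 && block_length <= n);
	for (i = 1; i < block_length + 1; i++)
		flags[n - i] &= ~(unsigned)SKETCH_ZEBRA_MASK;
}
```

Clearing only SKETCH_MOVED_LINE would leave the ALT bit set on a
discarded line, which is the kind of stale state a later rewind could
trip over.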

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index 52c791574b7..bd8e4ec9757 100644
--- a/diff.c
+++ b/diff.c
@@ -1114,6 +1114,8 @@ static int shrink_potential_moved_blocks(struct moved_block *pmb,
  * NEEDSWORK: This uses the same heuristic as blame_entry_score() in blame.c.
  * Think of a way to unify them.
  */
+#define DIFF_SYMBOL_MOVED_LINE_ZEBRA_MASK \
+  (DIFF_SYMBOL_MOVED_LINE | DIFF_SYMBOL_MOVED_LINE_ALT)
 static int adjust_last_block(struct diff_options *o, int n, int block_length)
 {
 	int i, alnum_count = 0;
@@ -1130,7 +1132,7 @@ static int adjust_last_block(struct diff_options *o, int n, int block_length)
 		}
 	}
 	for (i = 1; i < block_length + 1; i++)
-		o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE;
+		o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE_ZEBRA_MASK;
 	return 0;
 }
 
@@ -1237,8 +1239,6 @@ static void mark_color_as_moved(struct diff_options *o,
 	free(pmb);
 }
 
-#define DIFF_SYMBOL_MOVED_LINE_ZEBRA_MASK \
-  (DIFF_SYMBOL_MOVED_LINE | DIFF_SYMBOL_MOVED_LINE_ALT)
 static void dim_moved_lines(struct diff_options *o)
 {
 	int n;
-- 
gitgitgadget



* [PATCH v4 03/15] diff --color-moved: factor out function
  2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
  2021-11-16  9:49       ` [PATCH v4 01/15] diff --color-moved: add perf tests Phillip Wood via GitGitGadget
  2021-11-16  9:49       ` [PATCH v4 02/15] diff --color-moved: clear all flags on blocks that are too short Phillip Wood via GitGitGadget
@ 2021-11-16  9:49       ` Phillip Wood via GitGitGadget
  2021-11-16  9:49       ` [PATCH v4 04/15] diff --color-moved: rewind when discarding pmb Phillip Wood via GitGitGadget
                         ` (13 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-11-16  9:49 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

This code is quite heavily indented and having it in its own function
simplifies an upcoming change.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 51 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 17 deletions(-)

diff --git a/diff.c b/diff.c
index bd8e4ec9757..09af94e018c 100644
--- a/diff.c
+++ b/diff.c
@@ -1098,6 +1098,38 @@ static int shrink_potential_moved_blocks(struct moved_block *pmb,
 	return rp + 1;
 }
 
+static void fill_potential_moved_blocks(struct diff_options *o,
+					struct hashmap *hm,
+					struct moved_entry *match,
+					struct emitted_diff_symbol *l,
+					struct moved_block **pmb_p,
+					int *pmb_alloc_p, int *pmb_nr_p)
+
+{
+	struct moved_block *pmb = *pmb_p;
+	int pmb_alloc = *pmb_alloc_p, pmb_nr = *pmb_nr_p;
+
+	/*
+	 * The current line is the start of a new block.
+	 * Setup the set of potential blocks.
+	 */
+	hashmap_for_each_entry_from(hm, match, ent) {
+		ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
+		if (o->color_moved_ws_handling &
+		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
+			if (compute_ws_delta(l, match->es, &(pmb[pmb_nr]).wsd))
+				pmb[pmb_nr++].match = match;
+		} else {
+			pmb[pmb_nr].wsd = 0;
+			pmb[pmb_nr++].match = match;
+		}
+	}
+
+	*pmb_p = pmb;
+	*pmb_alloc_p = pmb_alloc;
+	*pmb_nr_p = pmb_nr;
+}
+
 /*
  * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing.
  *
@@ -1198,23 +1230,8 @@ static void mark_color_as_moved(struct diff_options *o,
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
 		if (pmb_nr == 0) {
-			/*
-			 * The current line is the start of a new block.
-			 * Setup the set of potential blocks.
-			 */
-			hashmap_for_each_entry_from(hm, match, ent) {
-				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
-				if (o->color_moved_ws_handling &
-				    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
-					if (compute_ws_delta(l, match->es,
-							     &pmb[pmb_nr].wsd))
-						pmb[pmb_nr++].match = match;
-				} else {
-					pmb[pmb_nr].wsd = 0;
-					pmb[pmb_nr++].match = match;
-				}
-			}
-
+			fill_potential_moved_blocks(
+				o, hm, match, l, &pmb, &pmb_alloc, &pmb_nr);
 			if (adjust_last_block(o, n, block_length) &&
 			    pmb_nr && last_symbol != l->s)
 				flipped_block = (flipped_block + 1) % 2;
-- 
gitgitgadget



* [PATCH v4 04/15] diff --color-moved: rewind when discarding pmb
  2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
                         ` (2 preceding siblings ...)
  2021-11-16  9:49       ` [PATCH v4 03/15] diff --color-moved: factor out function Phillip Wood via GitGitGadget
@ 2021-11-16  9:49       ` Phillip Wood via GitGitGadget
  2021-11-16  9:49       ` [PATCH v4 05/15] diff --color-moved=zebra: fix alternate coloring Phillip Wood via GitGitGadget
                         ` (12 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-11-16  9:49 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

diff --color-moved colors the two sides of the diff separately. It
walks through the diff and tries to find matches on the other side of
the diff for the current line. When it finds one or more matches it
starts a "potential moved block" (pmb) and marks the current line as
moved. Then as it walks through the diff it only looks for matches for
the current line in the lines following those in the pmb. When none of
the lines in the pmb match it checks how long the match is and if it
is too short it unmarks the lines as matched and goes back to finding
all the lines that match the current line. As the process of finding
matching lines restarts from the end of the block that was too short
it is possible to miss the start of a matching block on one side but
not the other. In the test added here "-two" would not be colored as
moved but "+two" would be.

Fix this by rewinding the current line when we reach the end of a
block that is too short. This is quadratic in the length of the
discarded block. While the discarded blocks are quite short, on a large
diff this still has a significant impact on the performance of
--color-moved-ws=allow-indentation-change. The following commits
optimize the performance of the --color-moved machinery which
mitigates the performance impact of this commit. After the
optimization this commit has a negligible impact on performance.

Test                                                                  HEAD^               HEAD
-----------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.33+0.05)    0.39 (0.34+0.04)  +2.6%
4002.2: diff --color-moved --no-color-moved-ws large change           0.80 (0.76+0.03)    0.86 (0.82+0.04)  +7.5%
4002.3: diff --color-moved-ws=allow-indentation-change large change  14.22(14.17+0.04)   19.01(18.93+0.05) +33.7%
4002.4: log --no-color-moved --no-color-moved-ws                      1.16 (1.06+0.09)    1.16 (1.07+0.07)  +0.0%
4002.5: log --color-moved --no-color-moved-ws                         1.31 (1.22+0.09)    1.32 (1.22+0.09)  +0.8%
4002.6: log --color-moved-ws=allow-indentation-change                 1.71 (1.61+0.09)    1.72 (1.63+0.08)  +0.6%
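
The rewind described in this message can be modelled with a small
standalone sketch. It is a simplified model under stated assumptions,
not the code in diff.c: lines are plain ints, the "too short" test
counts lines (min_len) instead of alphanumeric characters, a linear
scan replaces the hashmap lookup, and every sketch_* / SKETCH_* name is
hypothetical. With rewind set to 0 it reproduces the missed "-two" from
the test below; with rewind set to 1 the block is found.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX 64

/*
 * Set moved[n] for every line of cur[] that is part of a run of at
 * least min_len consecutive lines also found consecutively in other[].
 * pmb[] holds, for each candidate block, the index in other[] of the
 * line matched last.
 */
static void sketch_mark_moved(const int *cur, int cur_nr,
			      const int *other, int other_nr,
			      int min_len, int rewind, char *moved)
{
	int pmb[SKETCH_MAX];
	int pmb_nr = 0, block_length = 0, n, i;

	assert(cur_nr >= 0 && other_nr >= 0 && other_nr <= SKETCH_MAX);
	memset(moved, 0, (size_t)cur_nr);

	/* The extra iteration with n == cur_nr flushes the final block. */
	for (n = 0; n <= cur_nr; n++) {
		int kept = 0;

		/* Keep only the candidates whose next line matches cur[n]. */
		for (i = 0; n < cur_nr && i < pmb_nr; i++)
			if (pmb[i] + 1 < other_nr &&
			    other[pmb[i] + 1] == cur[n])
				pmb[kept++] = pmb[i] + 1;
		pmb_nr = kept;
		if (pmb_nr) {
			moved[n] = 1;
			block_length++;
			continue;
		}

		/* The block (if any) has ended; discard it if too short. */
		if (block_length && block_length < min_len) {
			for (i = 1; i <= block_length; i++)
				moved[n - i] = 0;
			if (rewind && block_length > 1) {
				/* n++ in the loop lands on the block's second line */
				n -= block_length;
				block_length = 0;
				continue;
			}
		}
		block_length = 0;
		if (n == cur_nr)
			break;

		/* Start a new block from every match of cur[n] in other[]. */
		for (i = 0; i < other_nr; i++)
			if (other[i] == cur[n])
				pmb[pmb_nr++] = i;
		if (pmb_nr) {
			moved[n] = 1;
			block_length = 1;
		}
	}
}
```

Each rewind restarts one line after the start of the discarded block,
so the scan always makes progress, at the cost of re-examining the
discarded lines.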

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c                     | 28 ++++++++++++++++++-----
 t/t4015-diff-whitespace.sh | 46 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index 09af94e018c..1e1b5127d15 100644
--- a/diff.c
+++ b/diff.c
@@ -1205,7 +1205,15 @@ static void mark_color_as_moved(struct diff_options *o,
 		if (!match) {
 			int i;
 
-			adjust_last_block(o, n, block_length);
+			if (!adjust_last_block(o, n, block_length) &&
+			    block_length > 1) {
+				/*
+				 * Rewind in case there is another match
+				 * starting at the second line of the block
+				 */
+				match = NULL;
+				n -= block_length;
+			}
 			for(i = 0; i < pmb_nr; i++)
 				moved_block_clear(&pmb[i]);
 			pmb_nr = 0;
@@ -1230,10 +1238,20 @@ static void mark_color_as_moved(struct diff_options *o,
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
 		if (pmb_nr == 0) {
-			fill_potential_moved_blocks(
-				o, hm, match, l, &pmb, &pmb_alloc, &pmb_nr);
-			if (adjust_last_block(o, n, block_length) &&
-			    pmb_nr && last_symbol != l->s)
+			int contiguous = adjust_last_block(o, n, block_length);
+
+			if (!contiguous && block_length > 1)
+				/*
+				 * Rewind in case there is another match
+				 * starting at the second line of the block
+				 */
+				n -= block_length;
+			else
+				fill_potential_moved_blocks(o, hm, match, l,
+							    &pmb, &pmb_alloc,
+							    &pmb_nr);
+
+			if (contiguous && pmb_nr && last_symbol != l->s)
 				flipped_block = (flipped_block + 1) % 2;
 			else
 				flipped_block = 0;
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 2c13b62d3c6..308dc136596 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1833,6 +1833,52 @@ test_expect_success '--color-moved treats adjacent blocks as separate for MIN_AL
 	test_cmp expected actual
 '
 
+test_expect_success '--color-moved rewinds for MIN_ALNUM_COUNT' '
+	git reset --hard &&
+	test_write_lines >file \
+		A B C one two three four five six seven D E F G H I J &&
+	git add file &&
+	test_write_lines >file \
+		one two A B C D E F G H I J two three four five six seven &&
+	git diff --color-moved=zebra -- file &&
+
+	git diff --color-moved=zebra --color -- file >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/file b/file<RESET>
+	<BOLD>--- a/file<RESET>
+	<BOLD>+++ b/file<RESET>
+	<CYAN>@@ -1,13 +1,8 @@<RESET>
+	<GREEN>+<RESET><GREEN>one<RESET>
+	<GREEN>+<RESET><GREEN>two<RESET>
+	 A<RESET>
+	 B<RESET>
+	 C<RESET>
+	<RED>-one<RESET>
+	<BOLD;MAGENTA>-two<RESET>
+	<BOLD;MAGENTA>-three<RESET>
+	<BOLD;MAGENTA>-four<RESET>
+	<BOLD;MAGENTA>-five<RESET>
+	<BOLD;MAGENTA>-six<RESET>
+	<BOLD;MAGENTA>-seven<RESET>
+	 D<RESET>
+	 E<RESET>
+	 F<RESET>
+	<CYAN>@@ -15,3 +10,9 @@<RESET> <RESET>G<RESET>
+	 H<RESET>
+	 I<RESET>
+	 J<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>two<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>three<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>four<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>five<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>six<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>seven<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_expect_success 'move detection with submodules' '
 	test_create_repo bananas &&
 	echo ripe >bananas/recipe &&
-- 
gitgitgadget



* [PATCH v4 05/15] diff --color-moved=zebra: fix alternate coloring
  2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
                         ` (3 preceding siblings ...)
  2021-11-16  9:49       ` [PATCH v4 04/15] diff --color-moved: rewind when discarding pmb Phillip Wood via GitGitGadget
@ 2021-11-16  9:49       ` Phillip Wood via GitGitGadget
  2021-11-22 13:34         ` Johannes Schindelin
  2021-11-16  9:49       ` [PATCH v4 06/15] diff --color-moved: avoid false short line matches and bad zerba coloring Phillip Wood via GitGitGadget
                         ` (11 subsequent siblings)
  16 siblings, 1 reply; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-11-16  9:49 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

b0a2ba4776 ("diff --color-moved=zebra: be stricter with color
alternation", 2018-11-23) sought to avoid using the alternate colors
unless there are two adjacent moved blocks of the same
sign. Unfortunately it contains two bugs that prevented it from fixing
the problem properly. Firstly `last_symbol` is reset at the start of
each iteration of the loop losing the symbol of the last line and
secondly when deciding whether to use the alternate color it should be
checking if the current line is the same sign as the last line, not a
different sign. The combination of the two errors means that we still
use the alternate color when we should, but we also use it when we
shouldn't. This is most noticeable when using
--color-moved-ws=allow-indentation-change with hunks like

-this line gets indented
+    this line gets indented

where the post image is colored with newMovedAlternate rather than
newMoved. While this does not matter much, the next commit will change
the coloring to be correct in this case, so let's fix the bug here to
make it clear why the output is changing and add a regression test.
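
The corrected condition can be isolated in a small sketch. The enum and
the function name are hypothetical stand-ins for illustration; the
condition itself is the one the diff below introduces, with
last_symbol assumed to survive across loop iterations.

```c
#include <assert.h>

/* Hypothetical stand-ins for the plus/minus diff symbols. */
enum sketch_symbol { SKETCH_NONE = 0, SKETCH_PLUS, SKETCH_MINUS };

/*
 * Return the flipped_block value for a block starting at the current
 * line. The color only alternates when the new block directly follows
 * a valid moved block (contiguous), has at least one candidate match
 * (pmb_nr) and has the same sign as the previous line.
 */
static int sketch_next_flipped(int flipped_block, int contiguous,
			       int pmb_nr,
			       enum sketch_symbol last_symbol,
			       enum sketch_symbol cur_symbol)
{
	assert(flipped_block == 0 || flipped_block == 1);
	if (contiguous && pmb_nr && last_symbol == cur_symbol)
		return (flipped_block + 1) % 2;
	return 0;
}
```

A removed line followed by its re-indented added line has different
signs, so this returns 0 and the post image keeps the non-alternate
color.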

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c                     |  4 +--
 t/t4015-diff-whitespace.sh | 72 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 1e1b5127d15..53f0df75329 100644
--- a/diff.c
+++ b/diff.c
@@ -1176,6 +1176,7 @@ static void mark_color_as_moved(struct diff_options *o,
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
 	int n, flipped_block = 0, block_length = 0;
+	enum diff_symbol last_symbol = 0;
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
@@ -1183,7 +1184,6 @@ static void mark_color_as_moved(struct diff_options *o,
 		struct moved_entry *key;
 		struct moved_entry *match = NULL;
 		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-		enum diff_symbol last_symbol = 0;
 
 		switch (l->s) {
 		case DIFF_SYMBOL_PLUS:
@@ -1251,7 +1251,7 @@ static void mark_color_as_moved(struct diff_options *o,
 							    &pmb, &pmb_alloc,
 							    &pmb_nr);
 
-			if (contiguous && pmb_nr && last_symbol != l->s)
+			if (contiguous && pmb_nr && last_symbol == l->s)
 				flipped_block = (flipped_block + 1) % 2;
 			else
 				flipped_block = 0;
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 308dc136596..4e0fd76c6c5 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1442,6 +1442,78 @@ test_expect_success 'detect permutations inside moved code -- dimmed-zebra' '
 	test_cmp expected actual
 '
 
+test_expect_success 'zebra alternate color is only used when necessary' '
+	cat >old.txt <<-\EOF &&
+	line 1A should be marked as oldMoved newMovedAlternate
+	line 1B should be marked as oldMoved newMovedAlternate
+	unchanged
+	line 2A should be marked as oldMoved newMovedAlternate
+	line 2B should be marked as oldMoved newMovedAlternate
+	line 3A should be marked as oldMovedAlternate newMoved
+	line 3B should be marked as oldMovedAlternate newMoved
+	unchanged
+	line 4A should be marked as oldMoved newMovedAlternate
+	line 4B should be marked as oldMoved newMovedAlternate
+	line 5A should be marked as oldMovedAlternate newMoved
+	line 5B should be marked as oldMovedAlternate newMoved
+	line 6A should be marked as oldMoved newMoved
+	line 6B should be marked as oldMoved newMoved
+	EOF
+	cat >new.txt <<-\EOF &&
+	  line 1A should be marked as oldMoved newMovedAlternate
+	  line 1B should be marked as oldMoved newMovedAlternate
+	unchanged
+	  line 3A should be marked as oldMovedAlternate newMoved
+	  line 3B should be marked as oldMovedAlternate newMoved
+	  line 2A should be marked as oldMoved newMovedAlternate
+	  line 2B should be marked as oldMoved newMovedAlternate
+	unchanged
+	  line 6A should be marked as oldMoved newMoved
+	  line 6B should be marked as oldMoved newMoved
+	    line 4A should be marked as oldMoved newMovedAlternate
+	    line 4B should be marked as oldMoved newMovedAlternate
+	  line 5A should be marked as oldMovedAlternate newMoved
+	  line 5B should be marked as oldMovedAlternate newMoved
+	EOF
+	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
+		 --color-moved-ws=allow-indentation-change \
+		 old.txt new.txt >output &&
+	grep -v index output | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/old.txt b/new.txt<RESET>
+	<BOLD>--- a/old.txt<RESET>
+	<BOLD>+++ b/new.txt<RESET>
+	<CYAN>@@ -1,14 +1,14 @@<RESET>
+	<BOLD;MAGENTA>-line 1A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;MAGENTA>-line 1B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1B should be marked as oldMoved newMovedAlternate<RESET>
+	 unchanged<RESET>
+	<BOLD;MAGENTA>-line 2A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;MAGENTA>-line 2B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;BLUE>-line 3A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;BLUE>-line 3B should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3B should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2B should be marked as oldMoved newMovedAlternate<RESET>
+	 unchanged<RESET>
+	<BOLD;MAGENTA>-line 4A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;MAGENTA>-line 4B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;BLUE>-line 5A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;BLUE>-line 5B should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;MAGENTA>-line 6A should be marked as oldMoved newMoved<RESET>
+	<BOLD;MAGENTA>-line 6B should be marked as oldMoved newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6A should be marked as oldMoved newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6B should be marked as oldMoved newMoved<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5B should be marked as oldMovedAlternate newMoved<RESET>
+	EOF
+	test_cmp expected actual
+'
+
 test_expect_success 'cmd option assumes configured colored-moved' '
 	test_config color.diff.oldMoved "magenta" &&
 	test_config color.diff.newMoved "cyan" &&
-- 
gitgitgadget



* [PATCH v4 06/15] diff --color-moved: avoid false short line matches and bad zebra coloring
  2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
                         ` (4 preceding siblings ...)
  2021-11-16  9:49       ` [PATCH v4 05/15] diff --color-moved=zebra: fix alternate coloring Phillip Wood via GitGitGadget
@ 2021-11-16  9:49       ` Phillip Wood via GitGitGadget
  2021-11-22 14:18         ` Johannes Schindelin
  2021-11-16  9:49       ` [PATCH v4 07/15] diff: simplify allow-indentation-change delta calculation Phillip Wood via GitGitGadget
                         ` (10 subsequent siblings)
  16 siblings, 1 reply; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-11-16  9:49 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

When marking moved lines it is possible for a block of potential
matched lines to extend past a change in sign when there is a sequence
of added lines whose text matches the text of a sequence of deleted
and added lines. Most of the time either `match` will be NULL or
`pmb_advance_or_null()` will fail when the loop encounters a change of
sign, but there are corner cases where `match` is non-NULL and
`pmb_advance_or_null()` successfully advances the moved block despite
the change in sign.

One consequence of this is highlighting a short line as moved when it
should not be. For example

-moved line  # Correctly highlighted as moved
+short line  # Wrongly highlighted as moved
 context
+moved line  # Correctly highlighted as moved
+short line
 context
-short line

The other consequence is coloring a moved addition following a moved
deletion in the wrong color. In the example below the first "+moved
line 3" should be highlighted as newMoved not newMovedAlternate.

-moved line 1 # Correctly highlighted as oldMoved
-moved line 2 # Correctly highlighted as oldMovedAlternate
+moved line 3 # Wrongly highlighted as newMovedAlternate
 context      # Everything else is highlighted correctly
+moved line 2
+moved line 3
 context
+moved line 1
-moved line 3

These false matches are more likely when using --color-moved-ws with
the exception of --color-moved-ws=allow-indentation-change which ties
the sign of the current whitespace delta to the sign of the line to
avoid this problem. The fix is to check that the sign of the new line
being matched is the same as the sign of the line that started the
block of potential matches.
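
A minimal sketch of that check, assuming a simplified model in which
each line carries only a sign. `enum sign` and `must_end_block()` are
hypothetical names used for illustration; the real code works on
`enum diff_symbol` values inside mark_color_as_moved().

```c
#include <assert.h>

/* Hypothetical stand-in for the '+' and '-' diff symbols. */
enum sign { SIGN_NONE, SIGN_PLUS, SIGN_MINUS };

/*
 * Return 1 if the current run of potential moved blocks has to end at
 * this line: there is a run in progress (pmb_nr != 0) and either the
 * line has no match at all or its sign differs from the sign of the
 * line that started the run.
 */
int must_end_block(int pmb_nr, int has_match, enum sign line,
		   enum sign moved)
{
	assert(pmb_nr >= 0);
	return pmb_nr && (!has_match || line != moved);
}
```

In this model a '+' line whose text happens to match the continuation
of a block started by a '-' line ends that block instead of extending
it.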

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c                     | 17 ++++++----
 t/t4015-diff-whitespace.sh | 65 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+), 6 deletions(-)

diff --git a/diff.c b/diff.c
index 53f0df75329..efba2789354 100644
--- a/diff.c
+++ b/diff.c
@@ -1176,7 +1176,7 @@ static void mark_color_as_moved(struct diff_options *o,
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
 	int n, flipped_block = 0, block_length = 0;
-	enum diff_symbol last_symbol = 0;
+	enum diff_symbol moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
@@ -1202,7 +1202,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			flipped_block = 0;
 		}
 
-		if (!match) {
+		if (pmb_nr && (!match || l->s != moved_symbol)) {
 			int i;
 
 			if (!adjust_last_block(o, n, block_length) &&
@@ -1219,12 +1219,13 @@ static void mark_color_as_moved(struct diff_options *o,
 			pmb_nr = 0;
 			block_length = 0;
 			flipped_block = 0;
-			last_symbol = l->s;
+		}
+		if (!match) {
+			moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
 			continue;
 		}
 
 		if (o->color_moved == COLOR_MOVED_PLAIN) {
-			last_symbol = l->s;
 			l->flags |= DIFF_SYMBOL_MOVED_LINE;
 			continue;
 		}
@@ -1251,11 +1252,16 @@ static void mark_color_as_moved(struct diff_options *o,
 							    &pmb, &pmb_alloc,
 							    &pmb_nr);
 
-			if (contiguous && pmb_nr && last_symbol == l->s)
+			if (contiguous && pmb_nr && moved_symbol == l->s)
 				flipped_block = (flipped_block + 1) % 2;
 			else
 				flipped_block = 0;
 
+			if (pmb_nr)
+				moved_symbol = l->s;
+			else
+				moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
+
 			block_length = 0;
 		}
 
@@ -1265,7 +1271,6 @@ static void mark_color_as_moved(struct diff_options *o,
 			if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
 				l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 		}
-		last_symbol = l->s;
 	}
 	adjust_last_block(o, n, block_length);
 
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 4e0fd76c6c5..15782c879d2 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1514,6 +1514,71 @@ test_expect_success 'zebra alternate color is only used when necessary' '
 	test_cmp expected actual
 '
 
+test_expect_success 'short lines of opposite sign do not get marked as moved' '
+	cat >old.txt <<-\EOF &&
+	this line should be marked as moved
+	unchanged
+	unchanged
+	unchanged
+	unchanged
+	too short
+	this line should be marked as oldMoved newMoved
+	this line should be marked as oldMovedAlternate newMoved
+	unchanged 1
+	unchanged 2
+	unchanged 3
+	unchanged 4
+	this line should be marked as oldMoved newMoved/newMovedAlternate
+	EOF
+	cat >new.txt <<-\EOF &&
+	too short
+	unchanged
+	unchanged
+	this line should be marked as moved
+	too short
+	unchanged
+	unchanged
+	this line should be marked as oldMoved newMoved/newMovedAlternate
+	unchanged 1
+	unchanged 2
+	this line should be marked as oldMovedAlternate newMoved
+	this line should be marked as oldMoved newMoved/newMovedAlternate
+	unchanged 3
+	this line should be marked as oldMoved newMoved
+	unchanged 4
+	EOF
+	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
+		old.txt new.txt >output && cat output &&
+	grep -v index output | test_decode_color >actual &&
+	cat >expect <<-\EOF &&
+	<BOLD>diff --git a/old.txt b/new.txt<RESET>
+	<BOLD>--- a/old.txt<RESET>
+	<BOLD>+++ b/new.txt<RESET>
+	<CYAN>@@ -1,13 +1,15 @@<RESET>
+	<BOLD;MAGENTA>-this line should be marked as moved<RESET>
+	<GREEN>+<RESET><GREEN>too short<RESET>
+	 unchanged<RESET>
+	 unchanged<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as moved<RESET>
+	<GREEN>+<RESET><GREEN>too short<RESET>
+	 unchanged<RESET>
+	 unchanged<RESET>
+	<RED>-too short<RESET>
+	<BOLD;MAGENTA>-this line should be marked as oldMoved newMoved<RESET>
+	<BOLD;BLUE>-this line should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
+	 unchanged 1<RESET>
+	 unchanged 2<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
+	 unchanged 3<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMoved newMoved<RESET>
+	 unchanged 4<RESET>
+	<BOLD;MAGENTA>-this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
+	EOF
+	test_cmp expect actual
+'
+
 test_expect_success 'cmd option assumes configured colored-moved' '
 	test_config color.diff.oldMoved "magenta" &&
 	test_config color.diff.newMoved "cyan" &&
-- 
gitgitgadget



* [PATCH v4 07/15] diff: simplify allow-indentation-change delta calculation
  2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
                         ` (5 preceding siblings ...)
  2021-11-16  9:49       ` [PATCH v4 06/15] diff --color-moved: avoid false short line matches and bad zebra coloring Phillip Wood via GitGitGadget
@ 2021-11-16  9:49       ` Phillip Wood via GitGitGadget
  2021-11-16  9:49       ` [PATCH v4 08/15] diff --color-moved-ws=allow-indentation-change: simplify and optimize Phillip Wood via GitGitGadget
                         ` (9 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-11-16  9:49 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Now that we reliably end a block when the sign changes we don't need
the whitespace delta calculation to rely on the sign.
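
To illustrate what a sign-independent delta looks like, here is a
simplified sketch. It is an assumption-laden model: indentation is
counted as leading spaces only (no tab handling, no blank-line special
case), and `leading_spaces()` and `ws_delta()` are hypothetical
helpers, not the functions in diff.c.

```c
#include <assert.h>
#include <string.h>

/* Count leading spaces; tabs are not handled in this sketch. */
int leading_spaces(const char *s)
{
	int n = 0;

	while (s[n] == ' ')
		n++;
	return n;
}

/*
 * If 'a' and 'b' have the same text after their indentation, store
 * the indentation difference in '*out' and return 1, otherwise
 * return 0. The delta is always "a minus b", whatever the sign of the
 * diff lines the strings came from.
 */
int ws_delta(const char *a, const char *b, int *out)
{
	int a_off = leading_spaces(a), b_off = leading_spaces(b);

	assert(out != NULL);
	if (strcmp(a + a_off, b + b_off))
		return 0;
	*out = a_off - b_off;
	return 1;
}
```

Because a block never spans a change of sign any more, every delta in
a block is computed in the same direction, so the direction no longer
has to be chosen according to the sign.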

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/diff.c b/diff.c
index efba2789354..9aff167be27 100644
--- a/diff.c
+++ b/diff.c
@@ -864,23 +864,17 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
 	    a_width = a->indent_width,
 	    b_off = b->indent_off,
 	    b_width = b->indent_width;
-	int delta;
 
 	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE) {
 		*out = INDENT_BLANKLINE;
 		return 1;
 	}
 
-	if (a->s == DIFF_SYMBOL_PLUS)
-		delta = a_width - b_width;
-	else
-		delta = b_width - a_width;
-
 	if (a_len - a_off != b_len - b_off ||
 	    memcmp(a->line + a_off, b->line + b_off, a_len - a_off))
 		return 0;
 
-	*out = delta;
+	*out = a_width - b_width;
 
 	return 1;
 }
@@ -924,10 +918,7 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	 * match those of the current block and that the text of 'l' and 'cur'
 	 * after the indentation match.
 	 */
-	if (cur->es->s == DIFF_SYMBOL_PLUS)
-		delta = a_width - c_width;
-	else
-		delta = c_width - a_width;
+	delta = c_width - a_width;
 
 	/*
 	 * If the previous lines of this block were all blank then set its
-- 
gitgitgadget



* [PATCH v4 08/15] diff --color-moved-ws=allow-indentation-change: simplify and optimize
  2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
                         ` (6 preceding siblings ...)
  2021-11-16  9:49       ` [PATCH v4 07/15] diff: simplify allow-indentation-change delta calculation Phillip Wood via GitGitGadget
@ 2021-11-16  9:49       ` Phillip Wood via GitGitGadget
  2021-11-23 14:51         ` Johannes Schindelin
  2021-11-16  9:49       ` [PATCH v4 09/15] diff --color-moved: call comparison function directly Phillip Wood via GitGitGadget
                         ` (8 subsequent siblings)
  16 siblings, 1 reply; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-11-16  9:49 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

If we already have a block of potentially moved lines then as we move
down the diff we need to check if the next line of each potentially
moved line matches the current line of the diff. The implementation of
--color-moved-ws=allow-indentation-change was needlessly performing
this check on all the lines in the diff that matched the current line
rather than just the current line. To exacerbate the problem, finding
all the other lines in the diff that match the current line involves a
fuzzy lookup, so we were wasting even more time performing a second
comparison to filter out the non-matching lines. Fixing this reduces
the time to run
  git diff --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
by 93% compared to master and simplifies the code.
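
The single comparison that remains per potential moved block can be
sketched as follows. This is a simplified model, not the function in
the patch: `struct line` and `struct block` are hypothetical, cut-down
stand-ins for the real structures, and the blank-line handling is left
out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified view of a diff line. */
struct line {
	const char *text;
	int len;          /* length of text */
	int indent_off;   /* offset of the first non-blank byte */
	int indent_width; /* width of the indentation in columns */
};

/* Hypothetical, simplified potential moved block. */
struct block {
	int wsd; /* whitespace delta shared by all lines of the block */
};

/*
 * Compare the next line of a block ('cur') with the current line of
 * the diff ('l'). Returns 0 if they match, non-zero otherwise: the
 * indentation change must equal the delta of the block and the text
 * after the indentation must be identical.
 */
int cmp_next_line(const struct line *cur, const struct line *l,
		  const struct block *b)
{
	int delta = l->indent_width - cur->indent_width;

	assert(cur->indent_off <= cur->len && l->indent_off <= l->len);
	return !(delta == b->wsd &&
		 cur->len - cur->indent_off == l->len - l->indent_off &&
		 !memcmp(cur->text + cur->indent_off,
			 l->text + l->indent_off,
			 cur->len - cur->indent_off));
}
```

The point of the change is that this comparison runs once per block
against the current line only, rather than once per block for every
line in the diff that fuzzily matches the current line.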

Test                                                                  HEAD^              HEAD
---------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.35+0.03)   0.38(0.35+0.03)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.86 (0.80+0.06)   0.87(0.83+0.04)  +1.2%
4002.3: diff --color-moved-ws=allow-indentation-change large change  19.01(18.93+0.06)   0.97(0.92+0.04) -94.9%
4002.4: log --no-color-moved --no-color-moved-ws                      1.16 (1.06+0.09)   1.17(1.06+0.10)  +0.9%
4002.5: log --color-moved --no-color-moved-ws                         1.32 (1.25+0.07)   1.32(1.24+0.08)  +0.0%
4002.6: log --color-moved-ws=allow-indentation-change                 1.71 (1.64+0.06)   1.36(1.25+0.10) -20.5%

Test                                                                  master             HEAD
---------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.33+0.05)   0.38(0.35+0.03)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.80 (0.75+0.04)   0.87(0.83+0.04)  +8.7%
4002.3: diff --color-moved-ws=allow-indentation-change large change  14.20(14.15+0.05)   0.97(0.92+0.04) -93.2%
4002.4: log --no-color-moved --no-color-moved-ws                      1.15 (1.05+0.09)   1.17(1.06+0.10)  +1.7%
4002.5: log --color-moved --no-color-moved-ws                         1.30 (1.19+0.11)   1.32(1.24+0.08)  +1.5%
4002.6: log --color-moved-ws=allow-indentation-change                 1.70 (1.63+0.06)   1.36(1.25+0.10) -20.0%

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 70 +++++++++++++++++-----------------------------------------
 1 file changed, 20 insertions(+), 50 deletions(-)

diff --git a/diff.c b/diff.c
index 9aff167be27..78a486021ab 100644
--- a/diff.c
+++ b/diff.c
@@ -879,37 +879,21 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
 	return 1;
 }
 
-static int cmp_in_block_with_wsd(const struct diff_options *o,
-				 const struct moved_entry *cur,
-				 const struct moved_entry *match,
-				 struct moved_block *pmb,
-				 int n)
-{
-	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-	int al = cur->es->len, bl = match->es->len, cl = l->len;
+static int cmp_in_block_with_wsd(const struct moved_entry *cur,
+				 const struct emitted_diff_symbol *l,
+				 struct moved_block *pmb)
+{
+	int al = cur->es->len, bl = l->len;
 	const char *a = cur->es->line,
-		   *b = match->es->line,
-		   *c = l->line;
+		   *b = l->line;
 	int a_off = cur->es->indent_off,
 	    a_width = cur->es->indent_width,
-	    c_off = l->indent_off,
-	    c_width = l->indent_width;
+	    b_off = l->indent_off,
+	    b_width = l->indent_width;
 	int delta;
 
-	/*
-	 * We need to check if 'cur' is equal to 'match'.  As those
-	 * are from the same (+/-) side, we do not need to adjust for
-	 * indent changes. However these were found using fuzzy
-	 * matching so we do have to check if they are equal. Here we
-	 * just check the lengths. We delay calling memcmp() to check
-	 * the contents until later as if the length comparison for a
-	 * and c fails we can avoid the call all together.
-	 */
-	if (al != bl)
-		return 1;
-
 	/* If 'l' and 'cur' are both blank then they match. */
-	if (a_width == INDENT_BLANKLINE && c_width == INDENT_BLANKLINE)
+	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE)
 		return 0;
 
 	/*
@@ -918,7 +902,7 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	 * match those of the current block and that the text of 'l' and 'cur'
 	 * after the indentation match.
 	 */
-	delta = c_width - a_width;
+	delta = b_width - a_width;
 
 	/*
 	 * If the previous lines of this block were all blank then set its
@@ -927,9 +911,8 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	if (pmb->wsd == INDENT_BLANKLINE)
 		pmb->wsd = delta;
 
-	return !(delta == pmb->wsd && al - a_off == cl - c_off &&
-		 !memcmp(a, b, al) && !
-		 memcmp(a + a_off, c + c_off, al - a_off));
+	return !(delta == pmb->wsd && al - a_off == bl - b_off &&
+		 !memcmp(a + a_off, b + b_off, al - a_off));
 }
 
 static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
@@ -1030,36 +1013,23 @@ static void pmb_advance_or_null(struct diff_options *o,
 }
 
 static void pmb_advance_or_null_multi_match(struct diff_options *o,
-					    struct moved_entry *match,
-					    struct hashmap *hm,
+					    struct emitted_diff_symbol *l,
 					    struct moved_block *pmb,
-					    int pmb_nr, int n)
+					    int pmb_nr)
 {
 	int i;
-	char *got_match = xcalloc(1, pmb_nr);
-
-	hashmap_for_each_entry_from(hm, match, ent) {
-		for (i = 0; i < pmb_nr; i++) {
-			struct moved_entry *prev = pmb[i].match;
-			struct moved_entry *cur = (prev && prev->next_line) ?
-					prev->next_line : NULL;
-			if (!cur)
-				continue;
-			if (!cmp_in_block_with_wsd(o, cur, match, &pmb[i], n))
-				got_match[i] |= 1;
-		}
-	}
 
 	for (i = 0; i < pmb_nr; i++) {
-		if (got_match[i]) {
+		struct moved_entry *prev = pmb[i].match;
+		struct moved_entry *cur = (prev && prev->next_line) ?
+			prev->next_line : NULL;
+		if (cur && !cmp_in_block_with_wsd(cur, l, &pmb[i])) {
 			/* Advance to the next line */
-			pmb[i].match = pmb[i].match->next_line;
+			pmb[i].match = cur;
 		} else {
 			moved_block_clear(&pmb[i]);
 		}
 	}
-
-	free(got_match);
 }
 
 static int shrink_potential_moved_blocks(struct moved_block *pmb,
@@ -1223,7 +1193,7 @@ static void mark_color_as_moved(struct diff_options *o,
 
 		if (o->color_moved_ws_handling &
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-			pmb_advance_or_null_multi_match(o, match, hm, pmb, pmb_nr, n);
+			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);
 		else
 			pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
 
-- 
gitgitgadget



* [PATCH v4 09/15] diff --color-moved: call comparison function directly
  2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
                         ` (7 preceding siblings ...)
  2021-11-16  9:49       ` [PATCH v4 08/15] diff --color-moved-ws=allow-indentation-change: simplify and optimize Phillip Wood via GitGitGadget
@ 2021-11-16  9:49       ` Phillip Wood via GitGitGadget
  2021-11-23 15:09         ` Johannes Schindelin
  2021-11-16  9:49       ` [PATCH v4 10/15] diff --color-moved: unify moved block growth functions Phillip Wood via GitGitGadget
                         ` (7 subsequent siblings)
  16 siblings, 1 reply; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-11-16  9:49 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

This change will allow us to easily combine pmb_advance_or_null() and
pmb_advance_or_null_multi_match() in the next commit. Calling
xdiff_compare_lines() directly rather than using a function pointer
from the hash map has little effect on the run time.

Test                                                                  HEAD^             HEAD
-------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38(0.35+0.03)   0.38(0.32+0.06) +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.87(0.83+0.04)   0.87(0.80+0.06) +0.0%
4002.3: diff --color-moved-ws=allow-indentation-change large change   0.97(0.92+0.04)   0.97(0.93+0.04) +0.0%
4002.4: log --no-color-moved --no-color-moved-ws                      1.17(1.06+0.10)   1.16(1.10+0.05) -0.9%
4002.5: log --color-moved --no-color-moved-ws                         1.32(1.24+0.08)   1.31(1.22+0.09) -0.8%
4002.6: log --color-moved-ws=allow-indentation-change                 1.36(1.25+0.10)   1.35(1.25+0.10) -0.7%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index 78a486021ab..22e0edac173 100644
--- a/diff.c
+++ b/diff.c
@@ -994,17 +994,20 @@ static void add_lines_to_move_detection(struct diff_options *o,
 }
 
 static void pmb_advance_or_null(struct diff_options *o,
-				struct moved_entry *match,
-				struct hashmap *hm,
+				struct emitted_diff_symbol *l,
 				struct moved_block *pmb,
 				int pmb_nr)
 {
 	int i;
+	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
+
 	for (i = 0; i < pmb_nr; i++) {
 		struct moved_entry *prev = pmb[i].match;
 		struct moved_entry *cur = (prev && prev->next_line) ?
 				prev->next_line : NULL;
-		if (cur && !hm->cmpfn(o, &cur->ent, &match->ent, NULL)) {
+		if (cur && xdiff_compare_lines(cur->es->line, cur->es->len,
+						l->line, l->len,
+						flags)) {
 			pmb[i].match = cur;
 		} else {
 			pmb[i].match = NULL;
@@ -1195,7 +1198,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
 			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);
 		else
-			pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
+			pmb_advance_or_null(o, l, pmb, pmb_nr);
 
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
-- 
gitgitgadget



* [PATCH v4 10/15] diff --color-moved: unify moved block growth functions
  2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
                         ` (8 preceding siblings ...)
  2021-11-16  9:49       ` [PATCH v4 09/15] diff --color-moved: call comparison function directly Phillip Wood via GitGitGadget
@ 2021-11-16  9:49       ` Phillip Wood via GitGitGadget
  2021-11-16  9:49       ` [PATCH v4 11/15] diff --color-moved: shrink potential moved blocks as we go Phillip Wood via GitGitGadget
                         ` (6 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-11-16  9:49 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

After the last two commits pmb_advance_or_null() and
pmb_advance_or_null_multi_match() differ only in the comparison they
perform. Let's simplify the code by combining them into a single
function.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 41 ++++++++++++-----------------------------
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/diff.c b/diff.c
index 22e0edac173..51f092e724e 100644
--- a/diff.c
+++ b/diff.c
@@ -1002,36 +1002,23 @@ static void pmb_advance_or_null(struct diff_options *o,
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
 	for (i = 0; i < pmb_nr; i++) {
+		int match;
 		struct moved_entry *prev = pmb[i].match;
 		struct moved_entry *cur = (prev && prev->next_line) ?
 				prev->next_line : NULL;
-		if (cur && xdiff_compare_lines(cur->es->line, cur->es->len,
-						l->line, l->len,
-						flags)) {
-			pmb[i].match = cur;
-		} else {
-			pmb[i].match = NULL;
-		}
-	}
-}
 
-static void pmb_advance_or_null_multi_match(struct diff_options *o,
-					    struct emitted_diff_symbol *l,
-					    struct moved_block *pmb,
-					    int pmb_nr)
-{
-	int i;
-
-	for (i = 0; i < pmb_nr; i++) {
-		struct moved_entry *prev = pmb[i].match;
-		struct moved_entry *cur = (prev && prev->next_line) ?
-			prev->next_line : NULL;
-		if (cur && !cmp_in_block_with_wsd(cur, l, &pmb[i])) {
-			/* Advance to the next line */
+		if (o->color_moved_ws_handling &
+		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+			match = cur &&
+				!cmp_in_block_with_wsd(cur, l, &pmb[i]);
+		else
+			match = cur &&
+				xdiff_compare_lines(cur->es->line, cur->es->len,
+						    l->line, l->len, flags);
+		if (match)
 			pmb[i].match = cur;
-		} else {
+		else
 			moved_block_clear(&pmb[i]);
-		}
 	}
 }
 
@@ -1194,11 +1181,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			continue;
 		}
 
-		if (o->color_moved_ws_handling &
-		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);
-		else
-			pmb_advance_or_null(o, l, pmb, pmb_nr);
+		pmb_advance_or_null(o, l, pmb, pmb_nr);
 
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
-- 
gitgitgadget



* [PATCH v4 11/15] diff --color-moved: shrink potential moved blocks as we go
  2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
                         ` (9 preceding siblings ...)
  2021-11-16  9:49       ` [PATCH v4 10/15] diff --color-moved: unify moved block growth functions Phillip Wood via GitGitGadget
@ 2021-11-16  9:49       ` Phillip Wood via GitGitGadget
  2021-11-16  9:49       ` [PATCH v4 12/15] diff --color-moved: stop clearing potential moved blocks Phillip Wood via GitGitGadget
                         ` (5 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-11-16  9:49 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Rather than setting `match` to NULL and then looping over the list of
potential matched blocks for a second time to remove blocks with no
matches, just filter out the blocks with no matches as we go.
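
The filtering idiom can be sketched in isolation like this. The
`struct block` with a `keep` flag and the function `filter_blocks()`
are hypothetical; in the patch the decision to keep a block comes from
the line comparison rather than from a stored flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified potential moved block. */
struct block {
	int id;
	int keep; /* non-zero if the block still matches */
};

/*
 * Compact the array in place in a single pass: 'i' reads every
 * element, 'j' is the position of the next survivor. Returns the
 * number of blocks that are left.
 */
int filter_blocks(struct block *pmb, int pmb_nr)
{
	int i, j;

	assert(pmb != NULL || pmb_nr == 0);
	for (i = 0, j = 0; i < pmb_nr; i++) {
		if (pmb[i].keep)
			pmb[j++] = pmb[i];
	}
	return j;
}
```

Since `j` never gets ahead of `i`, no surviving element is overwritten
before it has been read, and in this sketch the survivors keep their
relative order.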

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 44 ++++++++------------------------------------
 1 file changed, 8 insertions(+), 36 deletions(-)

diff --git a/diff.c b/diff.c
index 51f092e724e..626fd47aa0e 100644
--- a/diff.c
+++ b/diff.c
@@ -996,12 +996,12 @@ static void add_lines_to_move_detection(struct diff_options *o,
 static void pmb_advance_or_null(struct diff_options *o,
 				struct emitted_diff_symbol *l,
 				struct moved_block *pmb,
-				int pmb_nr)
+				int *pmb_nr)
 {
-	int i;
+	int i, j;
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
-	for (i = 0; i < pmb_nr; i++) {
+	for (i = 0, j = 0; i < *pmb_nr; i++) {
 		int match;
 		struct moved_entry *prev = pmb[i].match;
 		struct moved_entry *cur = (prev && prev->next_line) ?
@@ -1015,38 +1015,12 @@ static void pmb_advance_or_null(struct diff_options *o,
 			match = cur &&
 				xdiff_compare_lines(cur->es->line, cur->es->len,
 						    l->line, l->len, flags);
-		if (match)
-			pmb[i].match = cur;
-		else
-			moved_block_clear(&pmb[i]);
-	}
-}
-
-static int shrink_potential_moved_blocks(struct moved_block *pmb,
-					 int pmb_nr)
-{
-	int lp, rp;
-
-	/* Shrink the set of potential block to the remaining running */
-	for (lp = 0, rp = pmb_nr - 1; lp <= rp;) {
-		while (lp < pmb_nr && pmb[lp].match)
-			lp++;
-		/* lp points at the first NULL now */
-
-		while (rp > -1 && !pmb[rp].match)
-			rp--;
-		/* rp points at the last non-NULL */
-
-		if (lp < pmb_nr && rp > -1 && lp < rp) {
-			pmb[lp] = pmb[rp];
-			memset(&pmb[rp], 0, sizeof(pmb[rp]));
-			rp--;
-			lp++;
+		if (match) {
+			pmb[j] = pmb[i];
+			pmb[j++].match = cur;
 		}
 	}
-
-	/* Remember the number of running sets */
-	return rp + 1;
+	*pmb_nr = j;
 }
 
 static void fill_potential_moved_blocks(struct diff_options *o,
@@ -1181,9 +1155,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			continue;
 		}
 
-		pmb_advance_or_null(o, l, pmb, pmb_nr);
-
-		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
+		pmb_advance_or_null(o, l, pmb, &pmb_nr);
 
 		if (pmb_nr == 0) {
 			int contiguous = adjust_last_block(o, n, block_length);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v4 12/15] diff --color-moved: stop clearing potential moved blocks
  2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
                         ` (10 preceding siblings ...)
  2021-11-16  9:49       ` [PATCH v4 11/15] diff --color-moved: shrink potential moved blocks as we go Phillip Wood via GitGitGadget
@ 2021-11-16  9:49       ` Phillip Wood via GitGitGadget
  2021-11-16  9:49       ` [PATCH v4 13/15] diff --color-moved-ws=allow-indentation-change: improve hash lookups Phillip Wood via GitGitGadget
                         ` (4 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-11-16  9:49 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

moved_block_clear() was introduced in 74d156f4a1 ("diff
--color-moved-ws: fix double free crash", 2018-10-04) to free the
memory that was allocated when initializing a potential moved
block. However, since 21536d077f ("diff --color-moved-ws: modify
allow-indentation-change", 2018-11-23) initializing a potential moved
block no longer allocates any memory. Up until the last commit we were
relying on moved_block_clear() to set the `match` pointer to NULL when
a block stopped matching, but since that commit we do not clear a
moved block that does not match, so it does not make sense to clear
moved blocks elsewhere.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/diff.c b/diff.c
index 626fd47aa0e..ffbe09937bc 100644
--- a/diff.c
+++ b/diff.c
@@ -807,11 +807,6 @@ struct moved_block {
 	int wsd; /* The whitespace delta of this block */
 };
 
-static void moved_block_clear(struct moved_block *b)
-{
-	memset(b, 0, sizeof(*b));
-}
-
 #define INDENT_BLANKLINE INT_MIN
 
 static void fill_es_indent_data(struct emitted_diff_symbol *es)
@@ -1128,8 +1123,6 @@ static void mark_color_as_moved(struct diff_options *o,
 		}
 
 		if (pmb_nr && (!match || l->s != moved_symbol)) {
-			int i;
-
 			if (!adjust_last_block(o, n, block_length) &&
 			    block_length > 1) {
 				/*
@@ -1139,8 +1132,6 @@ static void mark_color_as_moved(struct diff_options *o,
 				match = NULL;
 				n -= block_length;
 			}
-			for(i = 0; i < pmb_nr; i++)
-				moved_block_clear(&pmb[i]);
 			pmb_nr = 0;
 			block_length = 0;
 			flipped_block = 0;
@@ -1193,8 +1184,6 @@ static void mark_color_as_moved(struct diff_options *o,
 	}
 	adjust_last_block(o, n, block_length);
 
-	for(n = 0; n < pmb_nr; n++)
-		moved_block_clear(&pmb[n]);
 	free(pmb);
 }
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v4 13/15] diff --color-moved-ws=allow-indentation-change: improve hash lookups
  2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
                         ` (11 preceding siblings ...)
  2021-11-16  9:49       ` [PATCH v4 12/15] diff --color-moved: stop clearing potential moved blocks Phillip Wood via GitGitGadget
@ 2021-11-16  9:49       ` Phillip Wood via GitGitGadget
  2021-11-16  9:49       ` [PATCH v4 14/15] diff: use designated initializers for emitted_diff_symbol Phillip Wood via GitGitGadget
                         ` (3 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-11-16  9:49 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

As libxdiff does not have a whitespace flag to ignore the indentation,
the code for --color-moved-ws=allow-indentation-change uses
XDF_IGNORE_WHITESPACE and then filters out any hash lookups where
there are non-indentation changes. This filtering is inefficient as
we have to perform another string comparison.

By using the offset data that we have already computed to skip the
indentation we can avoid using XDF_IGNORE_WHITESPACE and safely remove
the extra checks, which improves performance by 11% and paves the
way for the elimination of string comparisons in the next commit.
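
As a rough illustration of the idea (not the code in this patch), the
sketch below hashes and compares only the part of a line that follows
its indentation. `struct line`, make_line(), hash_suffix() and
suffix_equal() are hypothetical names; the FNV-1a hash merely stands in
for xdiff_hash_string(), and only spaces and tabs are treated as
indentation here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the few emitted_diff_symbol fields used here. */
struct line {
	const char *text;
	size_t len;
	size_t indent_off; /* offset of the first non-indentation byte */
};

/* Build a line and record where its indentation (spaces/tabs) ends. */
struct line make_line(const char *text)
{
	struct line l = { .text = text, .len = strlen(text) };

	while (l.indent_off < l.len &&
	       (text[l.indent_off] == ' ' || text[l.indent_off] == '\t'))
		l.indent_off++;
	assert(l.indent_off <= l.len);
	return l;
}

/* FNV-1a over the text after the indentation (assumed hash function). */
unsigned hash_suffix(const struct line *l)
{
	unsigned h = 2166136261u;
	size_t i;

	for (i = l->indent_off; i < l->len; i++) {
		h ^= (unsigned char)l->text[i];
		h *= 16777619u;
	}
	return h;
}

/* Two lines match when the text after their indentation is identical. */
int suffix_equal(const struct line *a, const struct line *b)
{
	size_t a_len = a->len - a->indent_off;
	size_t b_len = b->len - b->indent_off;

	return a_len == b_len &&
	       !memcmp(a->text + a->indent_off, b->text + b->indent_off, a_len);
}
```

With this scheme a tab-indented line and a space-indented line land in
the same hash bucket and compare equal in a single pass, so no second
string comparison is needed to reject whitespace changes inside the
line.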

This change slightly increases the run time of other --color-moved
modes. This could be avoided by using different comparison functions
for the different modes but after the next two commits there is no
measurable benefit in doing so.

There is a change in behavior for lines that begin with a form-feed or
vertical-tab character. Since b46054b374 ("xdiff: use
git-compat-util", 2019-04-11) xdiff does not treat '\f' or '\v' as
whitespace characters. This means that lines starting with those
characters are never considered to be blank and never match a line
that does not start with the same character. After this patch a line
matching "^[\f\v\r]*[ \t]*$" is considered to be blank by
--color-moved-ws=allow-indentation-change and lines beginning
"^[\f\v\r]*[ \t]*" can match another line if the suffixes match. This
changes the output of git show for d18f76dccf ("compat/regex: use the
regex engine from gawk for compat", 2010-08-17): some lines in the
pre-image before a moved block that contain '\f' are now considered
moved as well, because they match a blank line before the moved lines
in the post-image. This commit updates one of the tests to reflect
this change.

Test                                                                  HEAD^             HEAD
--------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38(0.33+0.05)   0.38(0.33+0.05)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.86(0.82+0.04)   0.88(0.84+0.04)  +2.3%
4002.3: diff --color-moved-ws=allow-indentation-change large change   0.97(0.94+0.03)   0.86(0.81+0.05) -11.3%
4002.4: log --no-color-moved --no-color-moved-ws                      1.16(1.07+0.09)   1.16(1.06+0.09)  +0.0%
4002.5: log --color-moved --no-color-moved-ws                         1.32(1.26+0.06)   1.33(1.27+0.05)  +0.8%
4002.6: log --color-moved-ws=allow-indentation-change                 1.35(1.29+0.06)   1.33(1.24+0.08)  -1.5%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c                     | 65 +++++++++++---------------------------
 t/t4015-diff-whitespace.sh | 22 ++++++-------
 2 files changed, 30 insertions(+), 57 deletions(-)

diff --git a/diff.c b/diff.c
index ffbe09937bc..2085c063675 100644
--- a/diff.c
+++ b/diff.c
@@ -850,28 +850,15 @@ static void fill_es_indent_data(struct emitted_diff_symbol *es)
 }
 
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
-			    const struct emitted_diff_symbol *b,
-			    int *out)
-{
-	int a_len = a->len,
-	    b_len = b->len,
-	    a_off = a->indent_off,
-	    a_width = a->indent_width,
-	    b_off = b->indent_off,
+			    const struct emitted_diff_symbol *b)
+{
+	int a_width = a->indent_width,
 	    b_width = b->indent_width;
 
-	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE) {
-		*out = INDENT_BLANKLINE;
-		return 1;
-	}
-
-	if (a_len - a_off != b_len - b_off ||
-	    memcmp(a->line + a_off, b->line + b_off, a_len - a_off))
-		return 0;
-
-	*out = a_width - b_width;
+	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE)
+		return INDENT_BLANKLINE;
 
-	return 1;
+	return a_width - b_width;
 }
 
 static int cmp_in_block_with_wsd(const struct moved_entry *cur,
@@ -916,26 +903,17 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 			   const void *keydata)
 {
 	const struct diff_options *diffopt = hashmap_cmp_fn_data;
-	const struct moved_entry *a, *b;
+	const struct emitted_diff_symbol *a, *b;
 	unsigned flags = diffopt->color_moved_ws_handling
 			 & XDF_WHITESPACE_FLAGS;
 
-	a = container_of(eptr, const struct moved_entry, ent);
-	b = container_of(entry_or_key, const struct moved_entry, ent);
+	a = container_of(eptr, const struct moved_entry, ent)->es;
+	b = container_of(entry_or_key, const struct moved_entry, ent)->es;
 
-	if (diffopt->color_moved_ws_handling &
-	    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-		/*
-		 * As there is not specific white space config given,
-		 * we'd need to check for a new block, so ignore all
-		 * white space. The setup of the white space
-		 * configuration for the next block is done else where
-		 */
-		flags |= XDF_IGNORE_WHITESPACE;
-
-	return !xdiff_compare_lines(a->es->line, a->es->len,
-				    b->es->line, b->es->len,
-				    flags);
+	return !xdiff_compare_lines(a->line + a->indent_off,
+				    a->len - a->indent_off,
+				    b->line + b->indent_off,
+				    b->len - b->indent_off, flags);
 }
 
 static struct moved_entry *prepare_entry(struct diff_options *o,
@@ -944,7 +922,8 @@ static struct moved_entry *prepare_entry(struct diff_options *o,
 	struct moved_entry *ret = xmalloc(sizeof(*ret));
 	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
-	unsigned int hash = xdiff_hash_string(l->line, l->len, flags);
+	unsigned int hash = xdiff_hash_string(l->line + l->indent_off,
+					      l->len - l->indent_off, flags);
 
 	hashmap_entry_init(&ret->ent, hash);
 	ret->es = l;
@@ -1036,13 +1015,11 @@ static void fill_potential_moved_blocks(struct diff_options *o,
 	hashmap_for_each_entry_from(hm, match, ent) {
 		ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
 		if (o->color_moved_ws_handling &
-		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
-			if (compute_ws_delta(l, match->es, &(pmb[pmb_nr]).wsd))
-				pmb[pmb_nr++].match = match;
-		} else {
+		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+			pmb[pmb_nr].wsd = compute_ws_delta(l, match->es);
+		else
 			pmb[pmb_nr].wsd = 0;
-			pmb[pmb_nr++].match = match;
-		}
+		pmb[pmb_nr++].match = match;
 	}
 
 	*pmb_p = pmb;
@@ -6276,10 +6253,6 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 		if (o->color_moved) {
 			struct hashmap add_lines, del_lines;
 
-			if (o->color_moved_ws_handling &
-			    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-				o->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
-
 			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
 			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
 
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 15782c879d2..50d0cf486be 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -2206,10 +2206,10 @@ EMPTY=''
 test_expect_success 'compare mixed whitespace delta across moved blocks' '
 
 	git reset --hard &&
-	tr Q_ "\t " <<-EOF >text.txt &&
-	${EMPTY}
-	____too short without
-	${EMPTY}
+	tr "^|Q_" "\f\v\t " <<-EOF >text.txt &&
+	^__
+	|____too short without
+	^
 	___being grouped across blank line
 	${EMPTY}
 	context
@@ -2228,7 +2228,7 @@ test_expect_success 'compare mixed whitespace delta across moved blocks' '
 	git add text.txt &&
 	git commit -m "add text.txt" &&
 
-	tr Q_ "\t " <<-EOF >text.txt &&
+	tr "^|Q_" "\f\v\t " <<-EOF >text.txt &&
 	context
 	lines
 	to
@@ -2239,7 +2239,7 @@ test_expect_success 'compare mixed whitespace delta across moved blocks' '
 	${EMPTY}
 	QQtoo short without
 	${EMPTY}
-	Q_______being grouped across blank line
+	^Q_______being grouped across blank line
 	${EMPTY}
 	Q_QThese two lines have had their
 	indentation reduced by four spaces
@@ -2251,16 +2251,16 @@ test_expect_success 'compare mixed whitespace delta across moved blocks' '
 		-c core.whitespace=space-before-tab \
 		diff --color --color-moved --ws-error-highlight=all \
 		--color-moved-ws=allow-indentation-change >actual.raw &&
-	grep -v "index" actual.raw | test_decode_color >actual &&
+	grep -v "index" actual.raw | tr "\f\v" "^|" | test_decode_color >actual &&
 
 	cat <<-\EOF >expected &&
 	<BOLD>diff --git a/text.txt b/text.txt<RESET>
 	<BOLD>--- a/text.txt<RESET>
 	<BOLD>+++ b/text.txt<RESET>
 	<CYAN>@@ -1,16 +1,16 @@<RESET>
-	<BOLD;MAGENTA>-<RESET>
-	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>    too short without<RESET>
-	<BOLD;MAGENTA>-<RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>^<RESET><BRED>  <RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>|    too short without<RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>^<RESET>
 	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>   being grouped across blank line<RESET>
 	<BOLD;MAGENTA>-<RESET>
 	 <RESET>context<RESET>
@@ -2280,7 +2280,7 @@ test_expect_success 'compare mixed whitespace delta across moved blocks' '
 	<BOLD;YELLOW>+<RESET>
 	<BOLD;YELLOW>+<RESET>		<BOLD;YELLOW>too short without<RESET>
 	<BOLD;YELLOW>+<RESET>
-	<BOLD;YELLOW>+<RESET>	<BOLD;YELLOW>       being grouped across blank line<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>^	       being grouped across blank line<RESET>
 	<BOLD;YELLOW>+<RESET>
 	<BOLD;CYAN>+<RESET>	<BRED> <RESET>	<BOLD;CYAN>These two lines have had their<RESET>
 	<BOLD;CYAN>+<RESET><BOLD;CYAN>indentation reduced by four spaces<RESET>
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v4 14/15] diff: use designated initializers for emitted_diff_symbol
  2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
                         ` (12 preceding siblings ...)
  2021-11-16  9:49       ` [PATCH v4 13/15] diff --color-moved-ws=allow-indentation-change: improve hash lookups Phillip Wood via GitGitGadget
@ 2021-11-16  9:49       ` Phillip Wood via GitGitGadget
  2021-11-16  9:49       ` [PATCH v4 15/15] diff --color-moved: intern strings Phillip Wood via GitGitGadget
                         ` (2 subsequent siblings)
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-11-16  9:49 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

This makes it clearer which fields are being explicitly initialized
and will simplify the next commit where we add a new field to the
struct.
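
To illustrate why this helps (a sketch with a hypothetical cut-down
`struct symbol`, not the real struct emitted_diff_symbol): once a new
member is inserted before the last one, the old positional initializer
silently stores its final value in the wrong member, while the
designated form keeps working and zero-initializes everything it does
not name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down struct; `id` is the newly inserted member. */
struct symbol {
	const char *line;
	int len;
	int flags;
	int indent_off;
	int indent_width;
	unsigned id; /* new member, added before `s` */
	int s;
};

/* Old positional list, written before `id` existed: `s` now lands in `id`. */
struct symbol make_positional(const char *line, int len, int flags, int s)
{
	struct symbol e = {line, len, flags, 0, 0, s};
	return e;
}

/* Designated initializers are unaffected by the new member. */
struct symbol make_designated(const char *line, int len, int flags, int s)
{
	struct symbol e = {
		.line = line, .len = len, .flags = flags, .s = s
	};
	return e;
}
```

Members that are not named in a designated initializer list are
zero-initialized, so `indent_off`, `indent_width` and `id` all start
at 0 in make_designated().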

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 2085c063675..9ef88d7665a 100644
--- a/diff.c
+++ b/diff.c
@@ -1497,7 +1497,9 @@ static void emit_diff_symbol_from_struct(struct diff_options *o,
 static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
 			     const char *line, int len, unsigned flags)
 {
-	struct emitted_diff_symbol e = {line, len, flags, 0, 0, s};
+	struct emitted_diff_symbol e = {
+		.line = line, .len = len, .flags = flags, .s = s
+	};
 
 	if (o->emitted_symbols)
 		append_emitted_diff_symbol(o, &e);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [PATCH v4 15/15] diff --color-moved: intern strings
  2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
                         ` (13 preceding siblings ...)
  2021-11-16  9:49       ` [PATCH v4 14/15] diff: use designated initializers for emitted_diff_symbol Phillip Wood via GitGitGadget
@ 2021-11-16  9:49       ` Phillip Wood via GitGitGadget
  2021-12-08 12:30       ` [PATCH v4 00/15] diff --color-moved[-ws] speedups Johannes Schindelin
  2021-12-09 10:29       ` [PATCH v5 " Phillip Wood via GitGitGadget
  16 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-11-16  9:49 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Taking inspiration from xdl_classify_record(), assign an id to each
addition and deletion such that lines that match for the current
--color-moved-ws mode share the same unique id. This reduces the
number of hash lookups a little (calculating the ids still involves
one hash lookup per line) but the main benefit is that when growing
blocks of potentially moved lines we can replace string comparisons,
which involve chasing a pointer, with a simple integer comparison. On a
large diff this commit reduces the time to run 'diff --color-moved' by
37% compared to the previous commit and 31% compared to master. For
'diff --color-moved-ws=allow-indentation-change' the reduction is 28%
compared to the previous commit and 96% compared to master. There is
little change in the performance of 'git log --patch' as the diffs are
smaller.
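
The general interning technique can be sketched as follows. This is a
simplified, fixed-size illustration under stated assumptions, not the
implementation below: `struct intern_table`, intern_init(), intern()
and intern_release() are hypothetical names, lines are compared with
plain strcmp() rather than according to the --color-moved-ws mode, and
the table never grows.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct intern_slot {
	const char *text; /* NULL marks an empty slot */
	unsigned id;
};

struct intern_table {
	struct intern_slot *slots;
	size_t nr_slots; /* must be a power of two */
	unsigned next_id;
};

/* FNV-1a string hash (assumed; any reasonable hash would do). */
static unsigned hash_str(const char *s)
{
	unsigned h = 2166136261u;

	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 16777619u;
	}
	return h;
}

/* Returns 0 on success, -1 if the allocation failed. */
int intern_init(struct intern_table *t, size_t nr_slots)
{
	assert(nr_slots && !(nr_slots & (nr_slots - 1)));
	t->slots = calloc(nr_slots, sizeof(*t->slots));
	t->nr_slots = nr_slots;
	t->next_id = 0;
	return t->slots ? 0 : -1;
}

/*
 * Return the id for `text`, assigning the next free id on first sight.
 * The table stores the pointer, so `text` must outlive the table.
 */
unsigned intern(struct intern_table *t, const char *text)
{
	size_t mask = t->nr_slots - 1;
	size_t i = hash_str(text) & mask;

	/* Linear probing: stop at a matching string or an empty slot. */
	while (t->slots[i].text) {
		if (!strcmp(t->slots[i].text, text))
			return t->slots[i].id;
		i = (i + 1) & mask;
	}
	/* Keep one slot empty so that probing always terminates. */
	assert((size_t)t->next_id + 1 < t->nr_slots);
	t->slots[i].text = text;
	t->slots[i].id = t->next_id;
	return t->next_id++;
}

void intern_release(struct intern_table *t)
{
	free(t->slots);
	t->slots = NULL;
}
```

Once every line carries such an id, checking whether two lines match
reduces to `a_id == b_id`, and the ids can index an array directly
because they are assigned densely starting from 0.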

Test                                                                  HEAD^              HEAD
---------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38(0.33+0.05)    0.38(0.33+0.05)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.88(0.81+0.06)    0.55(0.50+0.04) -37.5%
4002.3: diff --color-moved-ws=allow-indentation-change large change   0.85(0.79+0.06)    0.61(0.54+0.06) -28.2%
4002.4: log --no-color-moved --no-color-moved-ws                      1.16(1.07+0.08)    1.15(1.09+0.05)  -0.9%
4002.5: log --color-moved --no-color-moved-ws                         1.31(1.22+0.08)    1.29(1.19+0.09)  -1.5%
4002.6: log --color-moved-ws=allow-indentation-change                 1.32(1.24+0.08)    1.31(1.18+0.13)  -0.8%

Test                                                                  master             HEAD
---------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.33+0.05)   0.38(0.33+0.05)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.80 (0.75+0.04)   0.55(0.50+0.04) -31.2%
4002.3: diff --color-moved-ws=allow-indentation-change large change  14.20(14.15+0.05)   0.61(0.54+0.06) -95.7%
4002.4: log --no-color-moved --no-color-moved-ws                      1.15 (1.05+0.09)   1.15(1.09+0.05)  +0.0%
4002.5: log --color-moved --no-color-moved-ws                         1.30 (1.19+0.11)   1.29(1.19+0.09)  -0.8%
4002.6: log --color-moved-ws=allow-indentation-change                 1.70 (1.63+0.06)   1.31(1.18+0.13) -22.9%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 174 +++++++++++++++++++++++++++++++--------------------------
 1 file changed, 96 insertions(+), 78 deletions(-)

diff --git a/diff.c b/diff.c
index 9ef88d7665a..c28c56c1283 100644
--- a/diff.c
+++ b/diff.c
@@ -18,6 +18,7 @@
 #include "submodule-config.h"
 #include "submodule.h"
 #include "hashmap.h"
+#include "mem-pool.h"
 #include "ll-merge.h"
 #include "string-list.h"
 #include "strvec.h"
@@ -772,6 +773,7 @@ struct emitted_diff_symbol {
 	int flags;
 	int indent_off;   /* Offset to first non-whitespace character */
 	int indent_width; /* The visual width of the indentation */
+	unsigned id;
 	enum diff_symbol s;
 };
 #define EMITTED_DIFF_SYMBOL_INIT {NULL}
@@ -797,9 +799,9 @@ static void append_emitted_diff_symbol(struct diff_options *o,
 }
 
 struct moved_entry {
-	struct hashmap_entry ent;
 	const struct emitted_diff_symbol *es;
 	struct moved_entry *next_line;
+	struct moved_entry *next_match;
 };
 
 struct moved_block {
@@ -865,24 +867,24 @@ static int cmp_in_block_with_wsd(const struct moved_entry *cur,
 				 const struct emitted_diff_symbol *l,
 				 struct moved_block *pmb)
 {
-	int al = cur->es->len, bl = l->len;
-	const char *a = cur->es->line,
-		   *b = l->line;
-	int a_off = cur->es->indent_off,
-	    a_width = cur->es->indent_width,
-	    b_off = l->indent_off,
-	    b_width = l->indent_width;
+	int a_width = cur->es->indent_width, b_width = l->indent_width;
 	int delta;
 
-	/* If 'l' and 'cur' are both blank then they match. */
-	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE)
+	/* The text of each line must match */
+	if (cur->es->id != l->id)
+		return 1;
+
+	/*
+	 * If 'l' and 'cur' are both blank then we don't need to check the
+	 * indent. We only need to check cur as we know the strings match.
+	 * */
+	if (a_width == INDENT_BLANKLINE)
 		return 0;
 
 	/*
 	 * The indent changes of the block are known and stored in pmb->wsd;
 	 * however we need to check if the indent changes of the current line
-	 * match those of the current block and that the text of 'l' and 'cur'
-	 * after the indentation match.
+	 * match those of the current block.
 	 */
 	delta = b_width - a_width;
 
@@ -893,22 +895,26 @@ static int cmp_in_block_with_wsd(const struct moved_entry *cur,
 	if (pmb->wsd == INDENT_BLANKLINE)
 		pmb->wsd = delta;
 
-	return !(delta == pmb->wsd && al - a_off == bl - b_off &&
-		 !memcmp(a + a_off, b + b_off, al - a_off));
+	return delta != pmb->wsd;
 }
 
-static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
-			   const struct hashmap_entry *eptr,
-			   const struct hashmap_entry *entry_or_key,
-			   const void *keydata)
+struct interned_diff_symbol {
+	struct hashmap_entry ent;
+	struct emitted_diff_symbol *es;
+};
+
+static int interned_diff_symbol_cmp(const void *hashmap_cmp_fn_data,
+				    const struct hashmap_entry *eptr,
+				    const struct hashmap_entry *entry_or_key,
+				    const void *keydata)
 {
 	const struct diff_options *diffopt = hashmap_cmp_fn_data;
 	const struct emitted_diff_symbol *a, *b;
 	unsigned flags = diffopt->color_moved_ws_handling
 			 & XDF_WHITESPACE_FLAGS;
 
-	a = container_of(eptr, const struct moved_entry, ent)->es;
-	b = container_of(entry_or_key, const struct moved_entry, ent)->es;
+	a = container_of(eptr, const struct interned_diff_symbol, ent)->es;
+	b = container_of(entry_or_key, const struct interned_diff_symbol, ent)->es;
 
 	return !xdiff_compare_lines(a->line + a->indent_off,
 				    a->len - a->indent_off,
@@ -916,55 +922,81 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 				    b->len - b->indent_off, flags);
 }
 
-static struct moved_entry *prepare_entry(struct diff_options *o,
-					 int line_no)
+static void prepare_entry(struct diff_options *o, struct emitted_diff_symbol *l,
+			  struct interned_diff_symbol *s)
 {
-	struct moved_entry *ret = xmalloc(sizeof(*ret));
-	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 	unsigned int hash = xdiff_hash_string(l->line + l->indent_off,
 					      l->len - l->indent_off, flags);
 
-	hashmap_entry_init(&ret->ent, hash);
-	ret->es = l;
-	ret->next_line = NULL;
-
-	return ret;
+	hashmap_entry_init(&s->ent, hash);
+	s->es = l;
 }
 
-static void add_lines_to_move_detection(struct diff_options *o,
-					struct hashmap *add_lines,
-					struct hashmap *del_lines)
+struct moved_entry_list {
+	struct moved_entry *add, *del;
+};
+
+static struct moved_entry_list *add_lines_to_move_detection(struct diff_options *o,
+							    struct mem_pool *entry_mem_pool)
 {
 	struct moved_entry *prev_line = NULL;
-
+	struct mem_pool interned_pool;
+	struct hashmap interned_map;
+	struct moved_entry_list *entry_list = NULL;
+	size_t entry_list_alloc = 0;
+	unsigned id = 0;
 	int n;
+
+	hashmap_init(&interned_map, interned_diff_symbol_cmp, o, 8096);
+	mem_pool_init(&interned_pool, 1024 * 1024);
+
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
-		struct hashmap *hm;
-		struct moved_entry *key;
+		struct interned_diff_symbol key;
+		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
+		struct interned_diff_symbol *s;
+		struct moved_entry *entry;
 
-		switch (o->emitted_symbols->buf[n].s) {
-		case DIFF_SYMBOL_PLUS:
-			hm = add_lines;
-			break;
-		case DIFF_SYMBOL_MINUS:
-			hm = del_lines;
-			break;
-		default:
+		if (l->s != DIFF_SYMBOL_PLUS && l->s != DIFF_SYMBOL_MINUS) {
 			prev_line = NULL;
 			continue;
 		}
 
 		if (o->color_moved_ws_handling &
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-			fill_es_indent_data(&o->emitted_symbols->buf[n]);
-		key = prepare_entry(o, n);
-		if (prev_line && prev_line->es->s == o->emitted_symbols->buf[n].s)
-			prev_line->next_line = key;
+			fill_es_indent_data(l);
 
-		hashmap_add(hm, &key->ent);
-		prev_line = key;
+		prepare_entry(o, l, &key);
+		s = hashmap_get_entry(&interned_map, &key, ent, &key.ent);
+		if (s) {
+			l->id = s->es->id;
+		} else {
+			l->id = id;
+			ALLOC_GROW_BY(entry_list, id, 1, entry_list_alloc);
+			hashmap_add(&interned_map,
+				    memcpy(mem_pool_alloc(&interned_pool,
+							  sizeof(key)),
+					   &key, sizeof(key)));
+		}
+		entry = mem_pool_alloc(entry_mem_pool, sizeof(*entry));
+		entry->es = l;
+		entry->next_line = NULL;
+		if (prev_line && prev_line->es->s == l->s)
+			prev_line->next_line = entry;
+		prev_line = entry;
+		if (l->s == DIFF_SYMBOL_PLUS) {
+			entry->next_match = entry_list[l->id].add;
+			entry_list[l->id].add = entry;
+		} else {
+			entry->next_match = entry_list[l->id].del;
+			entry_list[l->id].del = entry;
+		}
 	}
+
+	hashmap_clear(&interned_map);
+	mem_pool_discard(&interned_pool, 0);
+
+	return entry_list;
 }
 
 static void pmb_advance_or_null(struct diff_options *o,
@@ -973,7 +1005,6 @@ static void pmb_advance_or_null(struct diff_options *o,
 				int *pmb_nr)
 {
 	int i, j;
-	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
 	for (i = 0, j = 0; i < *pmb_nr; i++) {
 		int match;
@@ -986,9 +1017,8 @@ static void pmb_advance_or_null(struct diff_options *o,
 			match = cur &&
 				!cmp_in_block_with_wsd(cur, l, &pmb[i]);
 		else
-			match = cur &&
-				xdiff_compare_lines(cur->es->line, cur->es->len,
-						    l->line, l->len, flags);
+			match = cur && cur->es->id == l->id;
+
 		if (match) {
 			pmb[j] = pmb[i];
 			pmb[j++].match = cur;
@@ -998,7 +1028,6 @@ static void pmb_advance_or_null(struct diff_options *o,
 }
 
 static void fill_potential_moved_blocks(struct diff_options *o,
-					struct hashmap *hm,
 					struct moved_entry *match,
 					struct emitted_diff_symbol *l,
 					struct moved_block **pmb_p,
@@ -1012,7 +1041,7 @@ static void fill_potential_moved_blocks(struct diff_options *o,
 	 * The current line is the start of a new block.
 	 * Setup the set of potential blocks.
 	 */
-	hashmap_for_each_entry_from(hm, match, ent) {
+	for (; match; match = match->next_match) {
 		ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
 		if (o->color_moved_ws_handling &
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
@@ -1067,8 +1096,7 @@ static int adjust_last_block(struct diff_options *o, int n, int block_length)
 
 /* Find blocks of moved code, delegate actual coloring decision to helper */
 static void mark_color_as_moved(struct diff_options *o,
-				struct hashmap *add_lines,
-				struct hashmap *del_lines)
+				struct moved_entry_list *entry_list)
 {
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
@@ -1077,23 +1105,15 @@ static void mark_color_as_moved(struct diff_options *o,
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
-		struct hashmap *hm = NULL;
-		struct moved_entry *key;
 		struct moved_entry *match = NULL;
 		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
 
 		switch (l->s) {
 		case DIFF_SYMBOL_PLUS:
-			hm = del_lines;
-			key = prepare_entry(o, n);
-			match = hashmap_get_entry(hm, key, ent, NULL);
-			free(key);
+			match = entry_list[l->id].del;
 			break;
 		case DIFF_SYMBOL_MINUS:
-			hm = add_lines;
-			key = prepare_entry(o, n);
-			match = hashmap_get_entry(hm, key, ent, NULL);
-			free(key);
+			match = entry_list[l->id].add;
 			break;
 		default:
 			flipped_block = 0;
@@ -1135,7 +1155,7 @@ static void mark_color_as_moved(struct diff_options *o,
 				 */
 				n -= block_length;
 			else
-				fill_potential_moved_blocks(o, hm, match, l,
+				fill_potential_moved_blocks(o, match, l,
 							    &pmb, &pmb_alloc,
 							    &pmb_nr);
 
@@ -6253,20 +6273,18 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 
 	if (o->emitted_symbols) {
 		if (o->color_moved) {
-			struct hashmap add_lines, del_lines;
-
-			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
-			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
+			struct mem_pool entry_pool;
+			struct moved_entry_list *entry_list;
 
-			add_lines_to_move_detection(o, &add_lines, &del_lines);
-			mark_color_as_moved(o, &add_lines, &del_lines);
+			mem_pool_init(&entry_pool, 1024 * 1024);
+			entry_list = add_lines_to_move_detection(o,
+								 &entry_pool);
+			mark_color_as_moved(o, entry_list);
 			if (o->color_moved == COLOR_MOVED_ZEBRA_DIM)
 				dim_moved_lines(o);
 
-			hashmap_clear_and_free(&add_lines, struct moved_entry,
-						ent);
-			hashmap_clear_and_free(&del_lines, struct moved_entry,
-						ent);
+			mem_pool_discard(&entry_pool, 0);
+			free(entry_list);
 		}
 
 		for (i = 0; i < esm.nr; i++)
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* Re: [PATCH v4 05/15] diff --color-moved=zebra: fix alternate coloring
  2021-11-16  9:49       ` [PATCH v4 05/15] diff --color-moved=zebra: fix alternate coloring Phillip Wood via GitGitGadget
@ 2021-11-22 13:34         ` Johannes Schindelin
  0 siblings, 0 replies; 92+ messages in thread
From: Johannes Schindelin @ 2021-11-22 13:34 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget
  Cc: git, Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

Hi Phillip,

On Tue, 16 Nov 2021, Phillip Wood via GitGitGadget wrote:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> b0a2ba4776 ("diff --color-moved=zebra: be stricter with color
> alternation", 2018-11-23) sought to avoid using the alternate colors
> unless there are two adjacent moved blocks of the same
> sign. Unfortunately it contains two bugs that prevented it from fixing
> the problem properly. Firstly `last_symbol` is reset at the start of
> each iteration of the loop losing the symbol of the last line and
> secondly when deciding whether to use the alternate color it should be
> checking if the current line is the same sign as the last line, not a
> different sign. The combination of the two errors means that we still
> use the alternate color when we should, but we also use it when we
> shouldn't. This is most noticeable when using
> --color-moved-ws=allow-indentation-change with hunks like
>
> -this line gets indented
> +    this line gets indented
>
> where the post image is colored with newMovedAlternate rather than
> newMoved. While this does not matter much, the next commit will change
> the coloring to be correct in this case, so let's fix the bug here to
> make it clear why the output is changing and add a regression test.

What an excellent commit message!

Thank you,
Dscho

>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>  diff.c                     |  4 +--
>  t/t4015-diff-whitespace.sh | 72 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 74 insertions(+), 2 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 1e1b5127d15..53f0df75329 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -1176,6 +1176,7 @@ static void mark_color_as_moved(struct diff_options *o,
>  	struct moved_block *pmb = NULL; /* potentially moved blocks */
>  	int pmb_nr = 0, pmb_alloc = 0;
>  	int n, flipped_block = 0, block_length = 0;
> +	enum diff_symbol last_symbol = 0;
>
>
>  	for (n = 0; n < o->emitted_symbols->nr; n++) {
> @@ -1183,7 +1184,6 @@ static void mark_color_as_moved(struct diff_options *o,
>  		struct moved_entry *key;
>  		struct moved_entry *match = NULL;
>  		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
> -		enum diff_symbol last_symbol = 0;
>
>  		switch (l->s) {
>  		case DIFF_SYMBOL_PLUS:
> @@ -1251,7 +1251,7 @@ static void mark_color_as_moved(struct diff_options *o,
>  							    &pmb, &pmb_alloc,
>  							    &pmb_nr);
>
> -			if (contiguous && pmb_nr && last_symbol != l->s)
> +			if (contiguous && pmb_nr && last_symbol == l->s)
>  				flipped_block = (flipped_block + 1) % 2;
>  			else
>  				flipped_block = 0;
> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index 308dc136596..4e0fd76c6c5 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -1442,6 +1442,78 @@ test_expect_success 'detect permutations inside moved code -- dimmed-zebra' '
>  	test_cmp expected actual
>  '
>
> +test_expect_success 'zebra alternate color is only used when necessary' '
> +	cat >old.txt <<-\EOF &&
> +	line 1A should be marked as oldMoved newMovedAlternate
> +	line 1B should be marked as oldMoved newMovedAlternate
> +	unchanged
> +	line 2A should be marked as oldMoved newMovedAlternate
> +	line 2B should be marked as oldMoved newMovedAlternate
> +	line 3A should be marked as oldMovedAlternate newMoved
> +	line 3B should be marked as oldMovedAlternate newMoved
> +	unchanged
> +	line 4A should be marked as oldMoved newMovedAlternate
> +	line 4B should be marked as oldMoved newMovedAlternate
> +	line 5A should be marked as oldMovedAlternate newMoved
> +	line 5B should be marked as oldMovedAlternate newMoved
> +	line 6A should be marked as oldMoved newMoved
> +	line 6B should be marked as oldMoved newMoved
> +	EOF
> +	cat >new.txt <<-\EOF &&
> +	  line 1A should be marked as oldMoved newMovedAlternate
> +	  line 1B should be marked as oldMoved newMovedAlternate
> +	unchanged
> +	  line 3A should be marked as oldMovedAlternate newMoved
> +	  line 3B should be marked as oldMovedAlternate newMoved
> +	  line 2A should be marked as oldMoved newMovedAlternate
> +	  line 2B should be marked as oldMoved newMovedAlternate
> +	unchanged
> +	  line 6A should be marked as oldMoved newMoved
> +	  line 6B should be marked as oldMoved newMoved
> +	    line 4A should be marked as oldMoved newMovedAlternate
> +	    line 4B should be marked as oldMoved newMovedAlternate
> +	  line 5A should be marked as oldMovedAlternate newMoved
> +	  line 5B should be marked as oldMovedAlternate newMoved
> +	EOF
> +	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
> +		 --color-moved-ws=allow-indentation-change \
> +		 old.txt new.txt >output &&
> +	grep -v index output | test_decode_color >actual &&
> +	cat >expected <<-\EOF &&
> +	<BOLD>diff --git a/old.txt b/new.txt<RESET>
> +	<BOLD>--- a/old.txt<RESET>
> +	<BOLD>+++ b/new.txt<RESET>
> +	<CYAN>@@ -1,14 +1,14 @@<RESET>
> +	<BOLD;MAGENTA>-line 1A should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;MAGENTA>-line 1B should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1A should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1B should be marked as oldMoved newMovedAlternate<RESET>
> +	 unchanged<RESET>
> +	<BOLD;MAGENTA>-line 2A should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;MAGENTA>-line 2B should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;BLUE>-line 3A should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;BLUE>-line 3B should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3A should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3B should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2A should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2B should be marked as oldMoved newMovedAlternate<RESET>
> +	 unchanged<RESET>
> +	<BOLD;MAGENTA>-line 4A should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;MAGENTA>-line 4B should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;BLUE>-line 5A should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;BLUE>-line 5B should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;MAGENTA>-line 6A should be marked as oldMoved newMoved<RESET>
> +	<BOLD;MAGENTA>-line 6B should be marked as oldMoved newMoved<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6A should be marked as oldMoved newMoved<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6B should be marked as oldMoved newMoved<RESET>
> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4A should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4B should be marked as oldMoved newMovedAlternate<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5A should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5B should be marked as oldMovedAlternate newMoved<RESET>
> +	EOF
> +	test_cmp expected actual
> +'
> +
>  test_expect_success 'cmd option assumes configured colored-moved' '
>  	test_config color.diff.oldMoved "magenta" &&
>  	test_config color.diff.newMoved "cyan" &&
> --
> gitgitgadget
>
>

* Re: [PATCH v4 06/15] diff --color-moved: avoid false short line matches and bad zerba coloring
  2021-11-16  9:49       ` [PATCH v4 06/15] diff --color-moved: avoid false short line matches and bad zerba coloring Phillip Wood via GitGitGadget
@ 2021-11-22 14:18         ` Johannes Schindelin
  2021-11-22 19:00           ` Phillip Wood
  0 siblings, 1 reply; 92+ messages in thread
From: Johannes Schindelin @ 2021-11-22 14:18 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget
  Cc: git, Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren

Hi Phillip,

The commit's oneline has a typo: zerba instead of zebra.

On Tue, 16 Nov 2021, Phillip Wood via GitGitGadget wrote:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> When marking moved lines it is possible for a block of potential
> matched lines to extend past a change in sign when there is a sequence
> of added lines whose text matches the text of a sequence of deleted
> and added lines. Most of the time either `match` will be NULL or
> `pmb_advance_or_null()` will fail when the loop encounters a change of
> sign but there are corner cases where `match` is non-NULL and
> `pmb_advance_or_null()` successfully advances the moved block despite
> the change in sign.
>
> One consequence of this is highlighting a short line as moved when it
> should not be. For example
>
> -moved line  # Correctly highlighted as moved
> +short line  # Wrongly highlighted as moved
>  context
> +moved line  # Correctly highlighted as moved
> +short line
>  context
> -short line
>
> The other consequence is coloring a moved addition following a moved
> deletion in the wrong color. In the example below the first "+moved
> line 3" should be highlighted as newMoved not newMovedAlternate.
>
> -moved line 1 # Correctly highlighted as oldMoved
> -moved line 2 # Correctly highlighted as oldMovedAlternate
> +moved line 3 # Wrongly highlighted as newMovedAlternate
>  context      # Everything else is highlighted correctly
> +moved line 2
> +moved line 3
>  context
> +moved line 1
> -moved line 3
>
> These false matches are more likely when using --color-moved-ws with
> the exception of --color-moved-ws=allow-indentation-change which ties
> the sign of the current whitespace delta to the sign of the line to
> avoid this problem. The fix is to check that the sign of the new line
> being matched is the same as the sign of the line that started the
> block of potential matches.
>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>  diff.c                     | 17 ++++++----
>  t/t4015-diff-whitespace.sh | 65 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 76 insertions(+), 6 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 53f0df75329..efba2789354 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -1176,7 +1176,7 @@ static void mark_color_as_moved(struct diff_options *o,
>  	struct moved_block *pmb = NULL; /* potentially moved blocks */
>  	int pmb_nr = 0, pmb_alloc = 0;
>  	int n, flipped_block = 0, block_length = 0;
> -	enum diff_symbol last_symbol = 0;
> +	enum diff_symbol moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;

The exact value does not matter, as long as it is different from whatever
the next line will have, of course.

>
>
>  	for (n = 0; n < o->emitted_symbols->nr; n++) {
> @@ -1202,7 +1202,7 @@ static void mark_color_as_moved(struct diff_options *o,
>  			flipped_block = 0;
>  		}
>
> -		if (!match) {
> +		if (pmb_nr && (!match || l->s != moved_symbol)) {
>  			int i;
>
>  			if (!adjust_last_block(o, n, block_length) &&
> @@ -1219,12 +1219,13 @@ static void mark_color_as_moved(struct diff_options *o,
>  			pmb_nr = 0;
>  			block_length = 0;
>  			flipped_block = 0;
> -			last_symbol = l->s;
> +		}

This is one of those instances where I dislike having the patch in a
static mail. I so want to have a _convenient_ way to expand the diff
context, to have a look around.

So I went over to
https://github.com/gitgitgadget/git/pull/981/commits/10b11526206d3b515ba08ac80ccf09ecb7a03420
to get the convenience I need for a pleasant reviewing experience.

In this instance, the `continue` that dropped out of that conditional
block gave me pause.

My understanding is that the diff makes it essentially a lot harder to
understand what is done here: this conditional block did two things, it
re-set the possibly-moved-block, and it skipped to the next loop
iteration. With this patch, we now re-set the possibly-moved-block in more
cases, but still skip to the next loop iteration under the same condition
as before:

> +		if (!match) {
> +			moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
>  			continue;
>  		}

However, after reading the commit message, I would have expected the
condition above to read `if (!match || l->s != moved_symbol)` instead of
`if (!match)`. Could you help me understand what I am missing?

>
>  		if (o->color_moved == COLOR_MOVED_PLAIN) {
> -			last_symbol = l->s;
>  			l->flags |= DIFF_SYMBOL_MOVED_LINE;
>  			continue;
>  		}

I want to make sure that I understand why the `last_symbol` assignment
could be removed without any `moved_symbol` assignment in its place. But I
don't, I still do not see why we do not need a `moved_symbol = l->s;`
assignment here.

Unless, that is, we extended the `!match` condition above to also cover
the case where `l->s != moved_symbol`.

> @@ -1251,11 +1252,16 @@ static void mark_color_as_moved(struct diff_options *o,
>  							    &pmb, &pmb_alloc,
>  							    &pmb_nr);
>
> -			if (contiguous && pmb_nr && last_symbol == l->s)
> +			if (contiguous && pmb_nr && moved_symbol == l->s)
>  				flipped_block = (flipped_block + 1) % 2;

This is totally not your fault, but I really wish we could have the much
simpler and much easier to understand `flipped_block = !flipped_block`
here.
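
For what it is worth, here is a minimal sketch of why the two spellings
agree, assuming `flipped_block` only ever holds 0 or 1 (which is what the
surrounding code appears to guarantee). The helper names `flip_mod()` and
`flip_not()` are made up for illustration and do not exist in diff.c:

```c
#include <assert.h>

/* The spelling used in the patch: toggle between 0 and 1 via modulo. */
static int flip_mod(int flipped_block)
{
	assert(flipped_block == 0 || flipped_block == 1);
	return (flipped_block + 1) % 2;
}

/* The simpler spelling: logical negation yields exactly 0 or 1 in C. */
static int flip_not(int flipped_block)
{
	assert(flipped_block == 0 || flipped_block == 1);
	return !flipped_block;
}
```

The two only coincide on the inputs 0 and 1; for any other value the
modulo form and the negation form can give different results, so the
simplification relies on that invariant.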

>  			else
>  				flipped_block = 0;
>
> +			if (pmb_nr)
> +				moved_symbol = l->s;
> +			else
> +				moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
> +
>  			block_length = 0;
>  		}
>
> @@ -1265,7 +1271,6 @@ static void mark_color_as_moved(struct diff_options *o,
>  			if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
>  				l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
>  		}
> -		last_symbol = l->s;

That makes sense: we only set `moved_symbol` when `pmb_nr` had been 0 now,
and don't want it to be overridden.

As I said, I do not quite understand this patch yet, and am looking for
your guidance to wrap my head around it.

Thank you for working on this!
Dscho

>  	}
>  	adjust_last_block(o, n, block_length);
>
> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
> index 4e0fd76c6c5..15782c879d2 100755
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -1514,6 +1514,71 @@ test_expect_success 'zebra alternate color is only used when necessary' '
>  	test_cmp expected actual
>  '
>
> +test_expect_success 'short lines of opposite sign do not get marked as moved' '
> +	cat >old.txt <<-\EOF &&
> +	this line should be marked as moved
> +	unchanged
> +	unchanged
> +	unchanged
> +	unchanged
> +	too short
> +	this line should be marked as oldMoved newMoved
> +	this line should be marked as oldMovedAlternate newMoved
> +	unchanged 1
> +	unchanged 2
> +	unchanged 3
> +	unchanged 4
> +	this line should be marked as oldMoved newMoved/newMovedAlternate
> +	EOF
> +	cat >new.txt <<-\EOF &&
> +	too short
> +	unchanged
> +	unchanged
> +	this line should be marked as moved
> +	too short
> +	unchanged
> +	unchanged
> +	this line should be marked as oldMoved newMoved/newMovedAlternate
> +	unchanged 1
> +	unchanged 2
> +	this line should be marked as oldMovedAlternate newMoved
> +	this line should be marked as oldMoved newMoved/newMovedAlternate
> +	unchanged 3
> +	this line should be marked as oldMoved newMoved
> +	unchanged 4
> +	EOF
> +	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
> +		old.txt new.txt >output && cat output &&
> +	grep -v index output | test_decode_color >actual &&
> +	cat >expect <<-\EOF &&
> +	<BOLD>diff --git a/old.txt b/new.txt<RESET>
> +	<BOLD>--- a/old.txt<RESET>
> +	<BOLD>+++ b/new.txt<RESET>
> +	<CYAN>@@ -1,13 +1,15 @@<RESET>
> +	<BOLD;MAGENTA>-this line should be marked as moved<RESET>
> +	<GREEN>+<RESET><GREEN>too short<RESET>
> +	 unchanged<RESET>
> +	 unchanged<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as moved<RESET>
> +	<GREEN>+<RESET><GREEN>too short<RESET>
> +	 unchanged<RESET>
> +	 unchanged<RESET>
> +	<RED>-too short<RESET>
> +	<BOLD;MAGENTA>-this line should be marked as oldMoved newMoved<RESET>
> +	<BOLD;BLUE>-this line should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
> +	 unchanged 1<RESET>
> +	 unchanged 2<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMovedAlternate newMoved<RESET>
> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
> +	 unchanged 3<RESET>
> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMoved newMoved<RESET>
> +	 unchanged 4<RESET>
> +	<BOLD;MAGENTA>-this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
> +	EOF
> +	test_cmp expect actual
> +'
> +
>  test_expect_success 'cmd option assumes configured colored-moved' '
>  	test_config color.diff.oldMoved "magenta" &&
>  	test_config color.diff.newMoved "cyan" &&
> --
> gitgitgadget
>
>

* Re: [PATCH v4 06/15] diff --color-moved: avoid false short line matches and bad zerba coloring
  2021-11-22 14:18         ` Johannes Schindelin
@ 2021-11-22 19:00           ` Phillip Wood
  2021-11-22 21:54             ` Johannes Schindelin
  0 siblings, 1 reply; 92+ messages in thread
From: Phillip Wood @ 2021-11-22 19:00 UTC (permalink / raw)
  To: Johannes Schindelin, Phillip Wood via GitGitGadget
  Cc: git, Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren

Hi Dscho

Thanks ever so much for taking a detailed look at this series.

On 22/11/2021 14:18, Johannes Schindelin wrote:
> Hi Phillip,
> 
> The commit's oneline has a typo: zerba instead of zebra.

Sigh, I thought I'd fixed that

> On Tue, 16 Nov 2021, Phillip Wood via GitGitGadget wrote:
> 
>> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>>
>> When marking moved lines it is possible for a block of potential
>> matched lines to extend past a change in sign when there is a sequence
>> of added lines whose text matches the text of a sequence of deleted
>> and added lines. Most of the time either `match` will be NULL or
>> `pmb_advance_or_null()` will fail when the loop encounters a change of
>> sign but there are corner cases where `match` is non-NULL and
>> `pmb_advance_or_null()` successfully advances the moved block despite
>> the change in sign.
>>
>> One consequence of this is highlighting a short line as moved when it
>> should not be. For example
>>
>> -moved line  # Correctly highlighted as moved
>> +short line  # Wrongly highlighted as moved
>>   context
>> +moved line  # Correctly highlighted as moved
>> +short line
>>   context
>> -short line
>>
>> The other consequence is coloring a moved addition following a moved
>> deletion in the wrong color. In the example below the first "+moved
>> line 3" should be highlighted as newMoved not newMovedAlternate.
>>
>> -moved line 1 # Correctly highlighted as oldMoved
>> -moved line 2 # Correctly highlighted as oldMovedAlternate
>> +moved line 3 # Wrongly highlighted as newMovedAlternate
>>   context      # Everything else is highlighted correctly
>> +moved line 2
>> +moved line 3
>>   context
>> +moved line 1
>> -moved line 3
>>
>> These false matches are more likely when using --color-moved-ws with
>> the exception of --color-moved-ws=allow-indentation-change which ties
>> the sign of the current whitespace delta to the sign of the line to
>> avoid this problem. The fix is to check that the sign of the new line
>> being matched is the same as the sign of the line that started the
>> block of potential matches.
>>
>> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
>> ---
>>   diff.c                     | 17 ++++++----
>>   t/t4015-diff-whitespace.sh | 65 ++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 76 insertions(+), 6 deletions(-)
>>
>> diff --git a/diff.c b/diff.c
>> index 53f0df75329..efba2789354 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -1176,7 +1176,7 @@ static void mark_color_as_moved(struct diff_options *o,
>>   	struct moved_block *pmb = NULL; /* potentially moved blocks */
>>   	int pmb_nr = 0, pmb_alloc = 0;
>>   	int n, flipped_block = 0, block_length = 0;
>> -	enum diff_symbol last_symbol = 0;
>> +	enum diff_symbol moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
> 
> The exact value does not matter, as long as it is different from whatever
> the next line will have, of course.
> 
>>
>>
>>   	for (n = 0; n < o->emitted_symbols->nr; n++) {
>> @@ -1202,7 +1202,7 @@ static void mark_color_as_moved(struct diff_options *o,
>>   			flipped_block = 0;
>>   		}
>>
>> -		if (!match) {
>> +		if (pmb_nr && (!match || l->s != moved_symbol)) {
>>   			int i;
>>
>>   			if (!adjust_last_block(o, n, block_length) &&
>> @@ -1219,12 +1219,13 @@ static void mark_color_as_moved(struct diff_options *o,
>>   			pmb_nr = 0;
>>   			block_length = 0;
>>   			flipped_block = 0;
>> -			last_symbol = l->s;
>> +		}
> 
> This is one of those instances where I dislike having the patch in a
> static mail. I so want to have a _convenient_ way to expand the diff
> context, to have a look around.
> 
> So I went over to
> https://github.com/gitgitgadget/git/pull/981/commits/10b11526206d3b515ba08ac80ccf09ecb7a03420
> to get the convenience I need for a pleasant reviewing experience.
> 
> In this instance, the `continue` that dropped out of that conditional
> block gave me pause.
> 
> My understanding is that the diff makes it essentially a lot harder to
> understand what is done here: this conditional block did two things, it
> re-set the possibly-moved-block, and it skipped to the next loop
> iteration. With this patch, we now re-set the possibly-moved-block in more
> cases, but still skip to the next loop iteration under the same condition
> as before:
> 
>> +		if (!match) {
>> +			moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
>>   			continue;
>>   		}
> 
> However, after reading the commit message, I would have expected the
> condition above to read `if (!match || l->s != moved_symbol)` instead of
> `if (!match)`. Could you help me understand what I am missing?

If there is a match we want to carry on executing the body of the loop 
to start a new block of moved lines. moved_symbol will be updated at the 
end of the loop.
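
As a rough sketch of those two decisions, under a deliberately simplified
model: the names `enum sym`, `struct step` and `classify()` are
hypothetical and do not exist in diff.c, and only the two conditions
themselves are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the diff_symbol values that matter here. */
enum sym { SYM_NONE, SYM_PLUS, SYM_MINUS };

struct step {
	bool reset_block; /* drop the current potentially moved blocks */
	bool skip_line;   /* `continue` without starting a new block */
};

/* Mirrors the two conditions in the patched loop, nothing more. */
static struct step classify(int pmb_nr, bool match, enum sym line,
			    enum sym moved_symbol)
{
	struct step s;

	assert(pmb_nr >= 0);
	/* First conditional: end the block on no match or a sign change. */
	s.reset_block = pmb_nr && (!match || line != moved_symbol);
	/* Second conditional: only a missing match skips the line. */
	s.skip_line = !match;
	return s;
}
```

With a live block, a match and a change of sign,
`classify(1, true, SYM_PLUS, SYM_MINUS)` reports a reset but no skip, so
the rest of the loop body still runs and can start a new block.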

>>
>>   		if (o->color_moved == COLOR_MOVED_PLAIN) {
>> -			last_symbol = l->s;
>>   			l->flags |= DIFF_SYMBOL_MOVED_LINE;
>>   			continue;
>>   		}
> 
> I want to make sure that I understand why the `last_symbol` assignment
> could be removed without any `moved_symbol` assignment in its place. But I
> don't, I still do not see why we do not need a `moved_symbol = l->s;`
> assignment here.

I had to think about it but I think the answer is that COLOR_MOVED_PLAIN 
does not care about moved_symbol - it is only used by the zebra coloring 
modes.

> Unless, that is, we extended the `!match` condition above to also cover
> the case where `l->s != moved_symbol`.
> 
>> @@ -1251,11 +1252,16 @@ static void mark_color_as_moved(struct diff_options *o,
>>   							    &pmb, &pmb_alloc,
>>   							    &pmb_nr);
>>
>> -			if (contiguous && pmb_nr && last_symbol == l->s)
>> +			if (contiguous && pmb_nr && moved_symbol == l->s)
>>   				flipped_block = (flipped_block + 1) % 2;
> 
> This is totally not your fault, but I really wish we could have the much
> simpler and much easier to understand `flipped_block = !flipped_block`
> here.

It's partially my fault - I should have simplified it when I moved that 
line in b0a2ba4776 ("diff --color-moved=zebra: be stricter with color 
alternation", 2018-11-23)

>>   			else
>>   				flipped_block = 0;
>>
>> +			if (pmb_nr)
>> +				moved_symbol = l->s;
>> +			else
>> +				moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
>> +

This is where we update moved_symbol when it did not match l->s above.

>>   			block_length = 0;
>>   		}
>>
>> @@ -1265,7 +1271,6 @@ static void mark_color_as_moved(struct diff_options *o,
>>   			if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
>>   				l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
>>   		}
>> -		last_symbol = l->s;
> 
> That makes sense: we only set `moved_symbol` when `pmb_nr` had been 0 now,
> and don't want it to be overridden.
>
> As I said, I do not quite understand this patch yet, and am looking for
> your guidance to wrap my head around it.
> 
> Thank you for working on this!

Thanks for looking at it, I hope these comments help, let me know if 
I've failed to explain well enough.

Best Wishes

Phillip

> Dscho
> 
>>   	}
>>   	adjust_last_block(o, n, block_length);
>>
>> diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
>> index 4e0fd76c6c5..15782c879d2 100755
>> --- a/t/t4015-diff-whitespace.sh
>> +++ b/t/t4015-diff-whitespace.sh
>> @@ -1514,6 +1514,71 @@ test_expect_success 'zebra alternate color is only used when necessary' '
>>   	test_cmp expected actual
>>   '
>>
>> +test_expect_success 'short lines of opposite sign do not get marked as moved' '
>> +	cat >old.txt <<-\EOF &&
>> +	this line should be marked as moved
>> +	unchanged
>> +	unchanged
>> +	unchanged
>> +	unchanged
>> +	too short
>> +	this line should be marked as oldMoved newMoved
>> +	this line should be marked as oldMovedAlternate newMoved
>> +	unchanged 1
>> +	unchanged 2
>> +	unchanged 3
>> +	unchanged 4
>> +	this line should be marked as oldMoved newMoved/newMovedAlternate
>> +	EOF
>> +	cat >new.txt <<-\EOF &&
>> +	too short
>> +	unchanged
>> +	unchanged
>> +	this line should be marked as moved
>> +	too short
>> +	unchanged
>> +	unchanged
>> +	this line should be marked as oldMoved newMoved/newMovedAlternate
>> +	unchanged 1
>> +	unchanged 2
>> +	this line should be marked as oldMovedAlternate newMoved
>> +	this line should be marked as oldMoved newMoved/newMovedAlternate
>> +	unchanged 3
>> +	this line should be marked as oldMoved newMoved
>> +	unchanged 4
>> +	EOF
>> +	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
>> +		old.txt new.txt >output && cat output &&
>> +	grep -v index output | test_decode_color >actual &&
>> +	cat >expect <<-\EOF &&
>> +	<BOLD>diff --git a/old.txt b/new.txt<RESET>
>> +	<BOLD>--- a/old.txt<RESET>
>> +	<BOLD>+++ b/new.txt<RESET>
>> +	<CYAN>@@ -1,13 +1,15 @@<RESET>
>> +	<BOLD;MAGENTA>-this line should be marked as moved<RESET>
>> +	<GREEN>+<RESET><GREEN>too short<RESET>
>> +	 unchanged<RESET>
>> +	 unchanged<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as moved<RESET>
>> +	<GREEN>+<RESET><GREEN>too short<RESET>
>> +	 unchanged<RESET>
>> +	 unchanged<RESET>
>> +	<RED>-too short<RESET>
>> +	<BOLD;MAGENTA>-this line should be marked as oldMoved newMoved<RESET>
>> +	<BOLD;BLUE>-this line should be marked as oldMovedAlternate newMoved<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
>> +	 unchanged 1<RESET>
>> +	 unchanged 2<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMovedAlternate newMoved<RESET>
>> +	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
>> +	 unchanged 3<RESET>
>> +	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMoved newMoved<RESET>
>> +	 unchanged 4<RESET>
>> +	<BOLD;MAGENTA>-this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
>> +	EOF
>> +	test_cmp expect actual
>> +'
>> +
>>   test_expect_success 'cmd option assumes configured colored-moved' '
>>   	test_config color.diff.oldMoved "magenta" &&
>>   	test_config color.diff.newMoved "cyan" &&
>> --
>> gitgitgadget
>>
>>


* Re: [PATCH v4 06/15] diff --color-moved: avoid false short line matches and bad zerba coloring
  2021-11-22 19:00           ` Phillip Wood
@ 2021-11-22 21:54             ` Johannes Schindelin
  0 siblings, 0 replies; 92+ messages in thread
From: Johannes Schindelin @ 2021-11-22 21:54 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Phillip Wood via GitGitGadget, git,
	Ævar Arnfjörð Bjarmason, Elijah Newren

Hi Phillip,

On Mon, 22 Nov 2021, Phillip Wood wrote:

[... a good explanation...]

> On 22/11/2021 14:18, Johannes Schindelin wrote:
>
> > As I said, I do not quite understand this patch yet, and am looking
> > for your guidance to wrap my head around it.
>
> Thanks for looking at it, I hope these comments help, let me know if I've
> failed to explain well enough.

Yes, thank you, I think I understand enough now to say that the patch
looks good to me.

Thanks,
Dscho

* Re: [PATCH v4 08/15] diff --color-moved-ws=allow-indentation-change: simplify and optimize
  2021-11-16  9:49       ` [PATCH v4 08/15] diff --color-moved-ws=allow-indentation-change: simplify and optimize Phillip Wood via GitGitGadget
@ 2021-11-23 14:51         ` Johannes Schindelin
  0 siblings, 0 replies; 92+ messages in thread
From: Johannes Schindelin @ 2021-11-23 14:51 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget
  Cc: git, Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

Hi Phillip,

tl;dr: the patch looks good to me (it is a bit tricky to review, though,
but that is not your fault, it is our code review process' fault).

On Tue, 16 Nov 2021, Phillip Wood via GitGitGadget wrote:

>   git diff --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
> by 93% compared to master and simplifies the code.

Nice!

> diff --git a/diff.c b/diff.c
> index 9aff167be27..78a486021ab 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -879,37 +879,21 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
>  	return 1;
>  }
>
> -static int cmp_in_block_with_wsd(const struct diff_options *o,
> -				 const struct moved_entry *cur,
> -				 const struct moved_entry *match,
> -				 struct moved_block *pmb,
> -				 int n)
> -{
> -	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
> -	int al = cur->es->len, bl = match->es->len, cl = l->len;
> +static int cmp_in_block_with_wsd(const struct moved_entry *cur,
> +				 const struct emitted_diff_symbol *l,
> +				 struct moved_block *pmb)
> +{
> +	int al = cur->es->len, bl = l->len;

Once I realized that the old `b` was removed and the old `c` became the
new `b`, it was a breeze to validate this hunk.

>  	const char *a = cur->es->line,
> -		   *b = match->es->line,
> -		   *c = l->line;
> +		   *b = l->line;
>  	int a_off = cur->es->indent_off,
>  	    a_width = cur->es->indent_width,
> -	    c_off = l->indent_off,
> -	    c_width = l->indent_width;
> +	    b_off = l->indent_off,
> +	    b_width = l->indent_width;
>  	int delta;
>
> -	/*
> -	 * We need to check if 'cur' is equal to 'match'.  As those
> -	 * are from the same (+/-) side, we do not need to adjust for
> -	 * indent changes. However these were found using fuzzy
> -	 * matching so we do have to check if they are equal. Here we
> -	 * just check the lengths. We delay calling memcmp() to check
> -	 * the contents until later as if the length comparison for a
> -	 * and c fails we can avoid the call all together.
> -	 */
> -	if (al != bl)
> -		return 1;

The commit message really helped understanding why this is not needed.
Thank you!

> -
>  	/* If 'l' and 'cur' are both blank then they match. */
> -	if (a_width == INDENT_BLANKLINE && c_width == INDENT_BLANKLINE)
> +	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE)
>  		return 0;
>
>  	/*
> @@ -918,7 +902,7 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
>  	 * match those of the current block and that the text of 'l' and 'cur'
>  	 * after the indentation match.
>  	 */
> -	delta = c_width - a_width;
> +	delta = b_width - a_width;
>
>  	/*
>  	 * If the previous lines of this block were all blank then set its
> @@ -927,9 +911,8 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
>  	if (pmb->wsd == INDENT_BLANKLINE)
>  		pmb->wsd = delta;
>
> -	return !(delta == pmb->wsd && al - a_off == cl - c_off &&
> -		 !memcmp(a, b, al) && !
> -		 memcmp(a + a_off, c + c_off, al - a_off));
> +	return !(delta == pmb->wsd && al - a_off == bl - b_off &&
> +		 !memcmp(a + a_off, b + b_off, al - a_off));
>  }

Once again, I am sad that we have no better platform to do our code
contribution and review. Whatever you can say about GitHub's UI, it is
better than static diffs in mails.

But you used GitGitGadget, and I finally broke down and wrote a script
that allows me to magic my way from the mail into the correct commit in
the GitGitGadget PR in the browser. It is still a shell script (at some
stage, I will need to extend the script to be much smarter than any shell
script can be, and probably convert it to node.js, but not today).

This helped me verify that there are no left-over references to the old
`b`. So all is good!
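
To illustrate the comparison that remains, here is a stripped-down
sketch. It assumes a hypothetical `struct line` in place of
`struct emitted_diff_symbol`, ignores the blank-line special case and the
`pmb->wsd` bookkeeping, and returns non-zero on a match, whereas the real
`cmp_in_block_with_wsd()` returns 0 on a match:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct emitted_diff_symbol. */
struct line {
	const char *text;
	int len;          /* length of text in bytes */
	int indent_off;   /* byte offset of the first non-blank character */
	int indent_width; /* width of the indentation in columns */
};

/*
 * Non-zero if 'b' is 'a' re-indented by exactly 'expected_delta'
 * columns: the indentation change must match and the text after
 * the indentation must be identical.
 */
static int same_after_indent(const struct line *a, const struct line *b,
			     int expected_delta)
{
	int delta, a_rest, b_rest;

	assert(a && b);
	delta = b->indent_width - a->indent_width;
	a_rest = a->len - a->indent_off;
	b_rest = b->len - b->indent_off;

	return delta == expected_delta && a_rest == b_rest &&
	       !memcmp(a->text + a->indent_off, b->text + b->indent_off,
		       a_rest);
}
```

Only one `memcmp()` is left, and it runs only once the indentation delta
and the remaining lengths already agree.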

>
>  static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
> @@ -1030,36 +1013,23 @@ static void pmb_advance_or_null(struct diff_options *o,
>  }
>
>  static void pmb_advance_or_null_multi_match(struct diff_options *o,
> -					    struct moved_entry *match,
> -					    struct hashmap *hm,
> +					    struct emitted_diff_symbol *l,
>  					    struct moved_block *pmb,
> -					    int pmb_nr, int n)
> +					    int pmb_nr)
>  {
>  	int i;
> -	char *got_match = xcalloc(1, pmb_nr);
> -
> -	hashmap_for_each_entry_from(hm, match, ent) {
> -		for (i = 0; i < pmb_nr; i++) {
> -			struct moved_entry *prev = pmb[i].match;
> -			struct moved_entry *cur = (prev && prev->next_line) ?
> -					prev->next_line : NULL;
> -			if (!cur)
> -				continue;
> -			if (!cmp_in_block_with_wsd(o, cur, match, &pmb[i], n))
> -				got_match[i] |= 1;
> -		}
> -	}
>
>  	for (i = 0; i < pmb_nr; i++) {
> -		if (got_match[i]) {
> +		struct moved_entry *prev = pmb[i].match;
> +		struct moved_entry *cur = (prev && prev->next_line) ?
> +			prev->next_line : NULL;
> +		if (cur && !cmp_in_block_with_wsd(cur, l, &pmb[i])) {
>  			/* Advance to the next line */
> -			pmb[i].match = pmb[i].match->next_line;
> +			pmb[i].match = cur;
>  		} else {
>  			moved_block_clear(&pmb[i]);
>  		}
>  	}
> -
> -	free(got_match);

Even got rid of an allocation. Very nice.

>  }
>
>  static int shrink_potential_moved_blocks(struct moved_block *pmb,
> @@ -1223,7 +1193,7 @@ static void mark_color_as_moved(struct diff_options *o,
>
>  		if (o->color_moved_ws_handling &
>  		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
> -			pmb_advance_or_null_multi_match(o, match, hm, pmb, pmb_nr, n);
> +			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);

Again, magic button to the rescue! And I can verify that `l` is assigned
to `&o->emitted_symbols->buf[n]`, so: the patch does the correct thing.

Thank you,
Dscho

>  		else
>  			pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
>
> --
> gitgitgadget
>
>


* Re: [PATCH v4 09/15] diff --color-moved: call comparison function directly
  2021-11-16  9:49       ` [PATCH v4 09/15] diff --color-moved: call comparison function directly Phillip Wood via GitGitGadget
@ 2021-11-23 15:09         ` Johannes Schindelin
  0 siblings, 0 replies; 92+ messages in thread
From: Johannes Schindelin @ 2021-11-23 15:09 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget
  Cc: git, Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood, Phillip Wood

Hi Phillip,

On Tue, 16 Nov 2021, Phillip Wood via GitGitGadget wrote:

> From: Phillip Wood <phillip.wood@dunelm.org.uk>
>
> This change will allow us to easily combine pmb_advance_or_null() and
> pmb_advance_or_null_multi_match() in the next commit. Calling
> xdiff_compare_lines() directly rather than using a function pointer
> from the hash map has little effect on the run time.

Good. I verified that the function `moved_entry_cmp()`
(https://github.com/gitgitgadget/git/blob/c3e5dce191/diff.c#L918-L944)
calls `xdiff_compare_lines()`, and it is this function that is used for
both `del_lines` and `add_lines` which would have been passed as `hm`:
https://github.com/gitgitgadget/git/blob/c3e5dce191/diff.c#L6339-L6340

>
> Test                                                                  HEAD^             HEAD
> -------------------------------------------------------------------------------------------------------------
> 4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38(0.35+0.03)   0.38(0.32+0.06) +0.0%
> 4002.2: diff --color-moved --no-color-moved-ws large change           0.87(0.83+0.04)   0.87(0.80+0.06) +0.0%
> 4002.3: diff --color-moved-ws=allow-indentation-change large change   0.97(0.92+0.04)   0.97(0.93+0.04) +0.0%
> 4002.4: log --no-color-moved --no-color-moved-ws                      1.17(1.06+0.10)   1.16(1.10+0.05) -0.9%
> 4002.5: log --color-moved --no-color-moved-ws                         1.32(1.24+0.08)   1.31(1.22+0.09) -0.8%
> 4002.6: log --color-moved-ws=allow-indentation-change                 1.36(1.25+0.10)   1.35(1.25+0.10) -0.7%

Honestly, I would have expected an improvement, given that
`moved_entry_cmp()` has to do a few things before it can call
`xdiff_compare_lines()`.
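
To make the difference between the two call styles concrete, here is a
minimal sketch. It is not the code from diff.c: sketch_lines_equal() and
sketch_entry_cmp() are made-up stand-ins, and the only point they
illustrate is the one visible in the hunk below, namely that the
hashmap-style callback reports a match with 0 (hence the `!` in the old
code) while the direct comparison reports a match with a non-zero value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a direct line comparison: returns
 * non-zero when the two lines are byte-for-byte identical.
 */
static int sketch_lines_equal(const char *a, size_t alen,
			      const char *b, size_t blen)
{
	assert(a && b);
	return alen == blen && !memcmp(a, b, alen);
}

/*
 * Hypothetical stand-in for a hashmap comparison callback: like
 * strcmp(), it returns 0 when the entries match, so it is the
 * logical negation of the direct comparison above.
 */
static int sketch_entry_cmp(const char *a, size_t alen,
			    const char *b, size_t blen)
{
	return !sketch_lines_equal(a, alen, b, blen);
}
```

A caller that switches from the callback to the direct comparison
therefore has to drop the negation, which is what the patch does.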

I love your attention to detail, providing performance numbers in the
commit message to prove that it at least has no negative impact on the
speed.

Thanks,
Dscho

>
> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> ---
>  diff.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 78a486021ab..22e0edac173 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -994,17 +994,20 @@ static void add_lines_to_move_detection(struct diff_options *o,
>  }
>
>  static void pmb_advance_or_null(struct diff_options *o,
> -				struct moved_entry *match,
> -				struct hashmap *hm,
> +				struct emitted_diff_symbol *l,
>  				struct moved_block *pmb,
>  				int pmb_nr)
>  {
>  	int i;
> +	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
> +
>  	for (i = 0; i < pmb_nr; i++) {
>  		struct moved_entry *prev = pmb[i].match;
>  		struct moved_entry *cur = (prev && prev->next_line) ?
>  				prev->next_line : NULL;
> -		if (cur && !hm->cmpfn(o, &cur->ent, &match->ent, NULL)) {
> +		if (cur && xdiff_compare_lines(cur->es->line, cur->es->len,
> +						l->line, l->len,
> +						flags)) {
>  			pmb[i].match = cur;
>  		} else {
>  			pmb[i].match = NULL;
> @@ -1195,7 +1198,7 @@ static void mark_color_as_moved(struct diff_options *o,
>  		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
>  			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);
>  		else
> -			pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
> +			pmb_advance_or_null(o, l, pmb, pmb_nr);
>
>  		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
>
> --
> gitgitgadget
>
>


* Re: [PATCH v4 00/15] diff --color-moved[-ws] speedups
  2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
                         ` (14 preceding siblings ...)
  2021-11-16  9:49       ` [PATCH v4 15/15] diff --color-moved: intern strings Phillip Wood via GitGitGadget
@ 2021-12-08 12:30       ` Johannes Schindelin
  2021-12-09 10:29       ` [PATCH v5 " Phillip Wood via GitGitGadget
  16 siblings, 0 replies; 92+ messages in thread
From: Johannes Schindelin @ 2021-12-08 12:30 UTC (permalink / raw)
  To: Phillip Wood via GitGitGadget
  Cc: git, Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Phillip Wood

Hi Phillip,

On Tue, 16 Nov 2021, Phillip Wood via GitGitGadget wrote:

> Changes since V3:
>
>  * Patch 1 now allows the user to choose different endpoints for the diff
>    perf tests to facilitate testing with different repositories.
>  * Fixed the alignment of the perf results column headers in a couple of
>    patches.

I finished reading over the patches in this iteration. Although I did not
have the mental bandwidth to review in particular the last patch in the
series in detail, the regression tests seem comprehensive enough for me to
be confident in the correctness.

So: from my side, this patch series is good to go!

Thank you,
Dscho


* [PATCH v5 00/15] diff --color-moved[-ws] speedups
  2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
                         ` (15 preceding siblings ...)
  2021-12-08 12:30       ` [PATCH v4 00/15] diff --color-moved[-ws] speedups Johannes Schindelin
@ 2021-12-09 10:29       ` Phillip Wood via GitGitGadget
  2021-12-09 10:29         ` [PATCH v5 01/15] diff --color-moved: add perf tests Phillip Wood via GitGitGadget
                           ` (14 more replies)
  16 siblings, 15 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-12-09 10:29 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Johannes Schindelin, Phillip Wood

Thanks to Dscho for his comments on V3. Changes since V4:

 * Fixed a typo in the commit message to patch 6

Changes since V3:

 * Patch 1 now allows the user to choose different endpoints for the diff
   perf tests to facilitate testing with different repositories.
 * Fixed the alignment of the perf results column headers in a couple of
   patches.

Changes since V2:

 * Patches 1-3 are new and fix an existing bug.
 * Patch 8 includes Peff's unused parameter fix.
 * Patch 11 has been updated to fix a bug fix in V2.
 * Patch 13 has an expanded commit message explaining a change in behavior
   for lines starting with a form-feed.
 * Updated benchmark results.

The bug fix in patch 3 degrades the performance, but by the end of the
series the timings are the same as V2 - see the range diff.

V2 Cover Letter: Thanks to Ævar and Elijah for their comments, I've reworded
the commit messages, addressed the enum initialization issue in patch 2 (now
3) and added some perf tests.

There are two new patches in this round. The first patch is new and adds the
perf tests suggested by Ævar, the penultimate patch is also new and converts
the existing code to use a designated initializer.

I've converted the benchmark results in the commit messages to use the new
tests, the percentage changes are broadly similar to the previous results
though I ended up running them on a different computer this time.

V1 cover letter:

The current implementation of diff --color-moved-ws=allow-indentation-change
is considerably slower than the implementation of diff --color-moved which
is in turn slower than a regular diff. This patch series starts with a
couple of bug fixes and then reworks the implementation of diff
--color-moved and diff --color-moved-ws=allow-indentation-change to speed
them up on large diffs. The time to run git diff --color-moved
--no-color-moved-ws v2.28.0 v2.29.0 is reduced by 33% and the time to run
git diff --color-moved --color-moved-ws=allow-indentation-change v2.28.0
v2.29.0 is reduced by 88%. There is a small slowdown for commit sized diffs
with --color-moved - the time to run git log -p --color-moved
--no-color-moved-ws --no-merges -n1000 v2.29.0 is increased by 2% on recent
processors. On older processors these patches reduce the running time in all
cases that I've tested. In general the larger the diff the larger the speed
up. As an extreme example the time to run diff --color-moved
--color-moved-ws=allow-indentation-change v2.25.0 v2.30.0 goes down from 8
minutes to 6 seconds.

Phillip Wood (15):
  diff --color-moved: add perf tests
  diff --color-moved: clear all flags on blocks that are too short
  diff --color-moved: factor out function
  diff --color-moved: rewind when discarding pmb
  diff --color-moved=zebra: fix alternate coloring
  diff --color-moved: avoid false short line matches and bad zebra
    coloring
  diff: simplify allow-indentation-change delta calculation
  diff --color-moved-ws=allow-indentation-change: simplify and optimize
  diff --color-moved: call comparison function directly
  diff --color-moved: unify moved block growth functions
  diff --color-moved: shrink potential moved blocks as we go
  diff --color-moved: stop clearing potential moved blocks
  diff --color-moved-ws=allow-indentation-change: improve hash lookups
  diff: use designated initializers for emitted_diff_symbol
  diff --color-moved: intern strings

 diff.c                           | 431 +++++++++++++------------------
 t/perf/p4002-diff-color-moved.sh |  57 ++++
 t/t4015-diff-whitespace.sh       | 205 ++++++++++++++-
 3 files changed, 437 insertions(+), 256 deletions(-)
 create mode 100755 t/perf/p4002-diff-color-moved.sh


base-commit: 211eca0895794362184da2be2a2d812d070719d3
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-981%2Fphillipwood%2Fwip%2Fdiff-color-moved-tweaks-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-981/phillipwood/wip/diff-color-moved-tweaks-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/981

Range-diff vs v4:

  1:  48ee03cf52a =  1:  48ee03cf52a diff --color-moved: add perf tests
  2:  47c652716e8 =  2:  47c652716e8 diff --color-moved: clear all flags on blocks that are too short
  3:  99e38ba9de9 =  3:  99e38ba9de9 diff --color-moved: factor out function
  4:  9ca71db61ae =  4:  9ca71db61ae diff --color-moved: rewind when discarding pmb
  5:  56bb69af36e =  5:  56bb69af36e diff --color-moved=zebra: fix alternate coloring
  6:  10b11526206 !  6:  ed62b980225 diff --color-moved: avoid false short line matches and bad zerba coloring
     @@ Metadata
      Author: Phillip Wood <phillip.wood@dunelm.org.uk>
      
       ## Commit message ##
     -    diff --color-moved: avoid false short line matches and bad zerba coloring
     +    diff --color-moved: avoid false short line matches and bad zebra coloring
      
          When marking moved lines it is possible for a block of potential
          matched lines to extend past a change in sign when there is a sequence
  7:  c2e7b347257 =  7:  b8db6a1af7d diff: simplify allow-indentation-change delta calculation
  8:  d7bbc0041e0 =  8:  eeb633063b7 diff --color-moved-ws=allow-indentation-change: simplify and optimize
  9:  c3e5dce1910 =  9:  fb413cab3a8 diff --color-moved: call comparison function directly
 10:  9eb8cecd52a = 10:  ec8764082d5 diff --color-moved: unify moved block growth functions
 11:  35e204e1578 = 11:  6199a014547 diff --color-moved: shrink potential moved blocks as we go
 12:  ec329e7946d = 12:  1db84490ee4 diff --color-moved: stop clearing potential moved blocks
 13:  6ec94134aaf = 13:  3e769bab78c diff --color-moved-ws=allow-indentation-change: improve hash lookups
 14:  d44c5d734c3 = 14:  b8869659664 diff: use designated initializers for emitted_diff_symbol
 15:  5177f669423 = 15:  350fa55ce5e diff --color-moved: intern strings

-- 
gitgitgadget


* [PATCH v5 01/15] diff --color-moved: add perf tests
  2021-12-09 10:29       ` [PATCH v5 " Phillip Wood via GitGitGadget
@ 2021-12-09 10:29         ` Phillip Wood via GitGitGadget
  2021-12-09 10:29         ` [PATCH v5 02/15] diff --color-moved: clear all flags on blocks that are too short Phillip Wood via GitGitGadget
                           ` (13 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-12-09 10:29 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Johannes Schindelin, Phillip Wood,
	Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Add some tests so we can monitor changes to the performance of the
move detection code. The tests record the performance of --color-moved
and --color-moved-ws=allow-indentation-change for a large diff and a
sequence of smaller diffs. The range of commits used for the large
diff can be customized by exporting TEST_REV_A and TEST_REV_B when
running the test.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 t/perf/p4002-diff-color-moved.sh | 57 ++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)
 create mode 100755 t/perf/p4002-diff-color-moved.sh

diff --git a/t/perf/p4002-diff-color-moved.sh b/t/perf/p4002-diff-color-moved.sh
new file mode 100755
index 00000000000..ab2af931c04
--- /dev/null
+++ b/t/perf/p4002-diff-color-moved.sh
@@ -0,0 +1,57 @@
+#!/bin/sh
+
+test_description='Tests diff --color-moved performance'
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+# The endpoints of the diff can be customized by setting TEST_REV_A
+# and TEST_REV_B in the environment when running this test.
+
+rev="${TEST_REV_A:-v2.28.0}"
+if ! rev_a="$(git rev-parse --quiet --verify "$rev")"
+then
+	skip_all="skipping because '$rev' was not found. \
+		  Use TEST_REV_A and TEST_REV_B to set the revs to use"
+	test_done
+fi
+rev="${TEST_REV_B:-v2.29.0}"
+if ! rev_b="$(git rev-parse --quiet --verify "$rev")"
+then
+	skip_all="skipping because '$rev' was not found. \
+		  Use TEST_REV_A and TEST_REV_B to set the revs to use"
+	test_done
+fi
+
+GIT_PAGER_IN_USE=1
+test_export GIT_PAGER_IN_USE rev_a rev_b
+
+test_perf 'diff --no-color-moved --no-color-moved-ws large change' '
+	git diff --no-color-moved --no-color-moved-ws $rev_a $rev_b
+'
+
+test_perf 'diff --color-moved --no-color-moved-ws large change' '
+	git diff --color-moved=zebra --no-color-moved-ws $rev_a $rev_b
+'
+
+test_perf 'diff --color-moved-ws=allow-indentation-change large change' '
+	git diff --color-moved=zebra --color-moved-ws=allow-indentation-change \
+		$rev_a $rev_b
+'
+
+test_perf 'log --no-color-moved --no-color-moved-ws' '
+	git log --no-color-moved --no-color-moved-ws --no-merges --patch \
+		-n1000 $rev_b
+'
+
+test_perf 'log --color-moved --no-color-moved-ws' '
+	git log --color-moved=zebra --no-color-moved-ws --no-merges --patch \
+		-n1000 $rev_b
+'
+
+test_perf 'log --color-moved-ws=allow-indentation-change' '
+	git log --color-moved=zebra --color-moved-ws=allow-indentation-change \
+		--no-merges --patch -n1000 $rev_b
+'
+
+test_done
-- 
gitgitgadget



* [PATCH v5 02/15] diff --color-moved: clear all flags on blocks that are too short
  2021-12-09 10:29       ` [PATCH v5 " Phillip Wood via GitGitGadget
  2021-12-09 10:29         ` [PATCH v5 01/15] diff --color-moved: add perf tests Phillip Wood via GitGitGadget
@ 2021-12-09 10:29         ` Phillip Wood via GitGitGadget
  2021-12-09 10:29         ` [PATCH v5 03/15] diff --color-moved: factor out function Phillip Wood via GitGitGadget
                           ` (12 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-12-09 10:29 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Johannes Schindelin, Phillip Wood,
	Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

If a block of potentially moved lines is not long enough then the
DIFF_SYMBOL_MOVED_LINE flag is cleared on the matching lines so they
are not marked as moved. To avoid problems when we start rewinding
after an unsuccessful match in a couple of commits' time, make sure all
the move related flags are cleared, not just DIFF_SYMBOL_MOVED_LINE.
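
As a rough illustration of clearing both bits at once (a sketch only:
the SKETCH_* bit values and sketch_clear_block() are invented for this
example and do not match the real flag values in diff.c):

```c
#include <assert.h>

/* Hypothetical bit values, chosen only for this illustration. */
#define SKETCH_MOVED_LINE     (1u << 0)
#define SKETCH_MOVED_LINE_ALT (1u << 1)
#define SKETCH_UNRELATED      (1u << 2)
#define SKETCH_MOVED_MASK (SKETCH_MOVED_LINE | SKETCH_MOVED_LINE_ALT)

/*
 * Clear every move-related bit on the block_length entries that
 * end just before index n, leaving unrelated bits untouched.
 */
static void sketch_clear_block(unsigned *flags, int n, int block_length)
{
	int i;

	assert(block_length <= n);
	for (i = 1; i < block_length + 1; i++)
		flags[n - i] &= ~SKETCH_MOVED_MASK;
}
```

Clearing with only SKETCH_MOVED_LINE would leave the alternate bit set
on a discarded block, which is the kind of stale state that could
confuse a later pass over the same lines.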

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index 52c791574b7..bd8e4ec9757 100644
--- a/diff.c
+++ b/diff.c
@@ -1114,6 +1114,8 @@ static int shrink_potential_moved_blocks(struct moved_block *pmb,
  * NEEDSWORK: This uses the same heuristic as blame_entry_score() in blame.c.
  * Think of a way to unify them.
  */
+#define DIFF_SYMBOL_MOVED_LINE_ZEBRA_MASK \
+  (DIFF_SYMBOL_MOVED_LINE | DIFF_SYMBOL_MOVED_LINE_ALT)
 static int adjust_last_block(struct diff_options *o, int n, int block_length)
 {
 	int i, alnum_count = 0;
@@ -1130,7 +1132,7 @@ static int adjust_last_block(struct diff_options *o, int n, int block_length)
 		}
 	}
 	for (i = 1; i < block_length + 1; i++)
-		o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE;
+		o->emitted_symbols->buf[n - i].flags &= ~DIFF_SYMBOL_MOVED_LINE_ZEBRA_MASK;
 	return 0;
 }
 
@@ -1237,8 +1239,6 @@ static void mark_color_as_moved(struct diff_options *o,
 	free(pmb);
 }
 
-#define DIFF_SYMBOL_MOVED_LINE_ZEBRA_MASK \
-  (DIFF_SYMBOL_MOVED_LINE | DIFF_SYMBOL_MOVED_LINE_ALT)
 static void dim_moved_lines(struct diff_options *o)
 {
 	int n;
-- 
gitgitgadget



* [PATCH v5 03/15] diff --color-moved: factor out function
  2021-12-09 10:29       ` [PATCH v5 " Phillip Wood via GitGitGadget
  2021-12-09 10:29         ` [PATCH v5 01/15] diff --color-moved: add perf tests Phillip Wood via GitGitGadget
  2021-12-09 10:29         ` [PATCH v5 02/15] diff --color-moved: clear all flags on blocks that are too short Phillip Wood via GitGitGadget
@ 2021-12-09 10:29         ` Phillip Wood via GitGitGadget
  2021-12-09 10:29         ` [PATCH v5 04/15] diff --color-moved: rewind when discarding pmb Phillip Wood via GitGitGadget
                           ` (11 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-12-09 10:29 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Johannes Schindelin, Phillip Wood,
	Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

This code is quite heavily indented and having it in its own function
simplifies an upcoming change.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 51 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 34 insertions(+), 17 deletions(-)

diff --git a/diff.c b/diff.c
index bd8e4ec9757..09af94e018c 100644
--- a/diff.c
+++ b/diff.c
@@ -1098,6 +1098,38 @@ static int shrink_potential_moved_blocks(struct moved_block *pmb,
 	return rp + 1;
 }
 
+static void fill_potential_moved_blocks(struct diff_options *o,
+					struct hashmap *hm,
+					struct moved_entry *match,
+					struct emitted_diff_symbol *l,
+					struct moved_block **pmb_p,
+					int *pmb_alloc_p, int *pmb_nr_p)
+
+{
+	struct moved_block *pmb = *pmb_p;
+	int pmb_alloc = *pmb_alloc_p, pmb_nr = *pmb_nr_p;
+
+	/*
+	 * The current line is the start of a new block.
+	 * Setup the set of potential blocks.
+	 */
+	hashmap_for_each_entry_from(hm, match, ent) {
+		ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
+		if (o->color_moved_ws_handling &
+		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
+			if (compute_ws_delta(l, match->es, &(pmb[pmb_nr]).wsd))
+				pmb[pmb_nr++].match = match;
+		} else {
+			pmb[pmb_nr].wsd = 0;
+			pmb[pmb_nr++].match = match;
+		}
+	}
+
+	*pmb_p = pmb;
+	*pmb_alloc_p = pmb_alloc;
+	*pmb_nr_p = pmb_nr;
+}
+
 /*
  * If o->color_moved is COLOR_MOVED_PLAIN, this function does nothing.
  *
@@ -1198,23 +1230,8 @@ static void mark_color_as_moved(struct diff_options *o,
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
 		if (pmb_nr == 0) {
-			/*
-			 * The current line is the start of a new block.
-			 * Setup the set of potential blocks.
-			 */
-			hashmap_for_each_entry_from(hm, match, ent) {
-				ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
-				if (o->color_moved_ws_handling &
-				    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
-					if (compute_ws_delta(l, match->es,
-							     &pmb[pmb_nr].wsd))
-						pmb[pmb_nr++].match = match;
-				} else {
-					pmb[pmb_nr].wsd = 0;
-					pmb[pmb_nr++].match = match;
-				}
-			}
-
+			fill_potential_moved_blocks(
+				o, hm, match, l, &pmb, &pmb_alloc, &pmb_nr);
 			if (adjust_last_block(o, n, block_length) &&
 			    pmb_nr && last_symbol != l->s)
 				flipped_block = (flipped_block + 1) % 2;
-- 
gitgitgadget



* [PATCH v5 04/15] diff --color-moved: rewind when discarding pmb
  2021-12-09 10:29       ` [PATCH v5 " Phillip Wood via GitGitGadget
                           ` (2 preceding siblings ...)
  2021-12-09 10:29         ` [PATCH v5 03/15] diff --color-moved: factor out function Phillip Wood via GitGitGadget
@ 2021-12-09 10:29         ` Phillip Wood via GitGitGadget
  2021-12-09 10:29         ` [PATCH v5 05/15] diff --color-moved=zebra: fix alternate coloring Phillip Wood via GitGitGadget
                           ` (10 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-12-09 10:29 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Johannes Schindelin, Phillip Wood,
	Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

diff --color-moved colors the two sides of the diff separately. It
walks through the diff and tries to find matches on the other side of
the diff for the current line. When it finds one or more matches it
starts a "potential moved block" (pmb) and marks the current line as
moved. Then as it walks through the diff it only looks for matches for
the current line in the lines following those in the pmb. When none of
the lines in the pmb match it checks how long the match is and if it
is too short it unmarks the lines as matched and goes back to finding
all the lines that match the current line. As the process of finding
matching lines restarts from the end of the block that was too short
it is possible to miss the start of a matching block on one side but
not the other. In the test added here "-two" would not be colored as
moved but "+two" would be.

Fix this by rewinding the current line when we reach the end of a
block that is too short. This is quadratic in the length of the
discarded block. While the discarded blocks are quite short on a large
diff this still has a significant impact on the performance of
--color-moved-ws=allow-indentation-change. The following commits
optimize the performance of the --color-moved machinery which
mitigates the performance impact of this commit. After the
optimization this commit has a negligible impact on performance.
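
The rewind arithmetic can be sketched as follows. This is an
illustration only, not the code in diff.c: sketch_next_index() and its
parameters are made-up names, and the sketch assumes the discarded
block occupies lines n - block_length through n - 1 and that the
enclosing loop advances with n++.

```c
#include <assert.h>

/*
 * Return the index of the line the loop examines next.
 * block_kept is non-zero when the block was long enough to stay
 * marked as moved.
 */
static int sketch_next_index(int n, int block_length, int block_kept)
{
	assert(block_length >= 0 && block_length <= n);
	if (!block_kept && block_length > 1)
		n -= block_length; /* back to the first line of the block */
	return n + 1; /* the loop's n++ lands on the second line */
}
```

Each rewind restarts one line later than the previous attempt, so a
discarded block of k lines can be rescanned up to k - 1 times, which is
where the quadratic cost comes from.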

Test                                                                  HEAD^               HEAD
-----------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.33+0.05)    0.39 (0.34+0.04)  +2.6%
4002.2: diff --color-moved --no-color-moved-ws large change           0.80 (0.76+0.03)    0.86 (0.82+0.04)  +7.5%
4002.3: diff --color-moved-ws=allow-indentation-change large change  14.22(14.17+0.04)   19.01(18.93+0.05) +33.7%
4002.4: log --no-color-moved --no-color-moved-ws                      1.16 (1.06+0.09)    1.16 (1.07+0.07)  +0.0%
4002.5: log --color-moved --no-color-moved-ws                         1.31 (1.22+0.09)    1.32 (1.22+0.09)  +0.8%
4002.6: log --color-moved-ws=allow-indentation-change                 1.71 (1.61+0.09)    1.72 (1.63+0.08)  +0.6%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c                     | 28 ++++++++++++++++++-----
 t/t4015-diff-whitespace.sh | 46 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index 09af94e018c..1e1b5127d15 100644
--- a/diff.c
+++ b/diff.c
@@ -1205,7 +1205,15 @@ static void mark_color_as_moved(struct diff_options *o,
 		if (!match) {
 			int i;
 
-			adjust_last_block(o, n, block_length);
+			if (!adjust_last_block(o, n, block_length) &&
+			    block_length > 1) {
+				/*
+				 * Rewind in case there is another match
+				 * starting at the second line of the block
+				 */
+				match = NULL;
+				n -= block_length;
+			}
 			for(i = 0; i < pmb_nr; i++)
 				moved_block_clear(&pmb[i]);
 			pmb_nr = 0;
@@ -1230,10 +1238,20 @@ static void mark_color_as_moved(struct diff_options *o,
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
 		if (pmb_nr == 0) {
-			fill_potential_moved_blocks(
-				o, hm, match, l, &pmb, &pmb_alloc, &pmb_nr);
-			if (adjust_last_block(o, n, block_length) &&
-			    pmb_nr && last_symbol != l->s)
+			int contiguous = adjust_last_block(o, n, block_length);
+
+			if (!contiguous && block_length > 1)
+				/*
+				 * Rewind in case there is another match
+				 * starting at the second line of the block
+				 */
+				n -= block_length;
+			else
+				fill_potential_moved_blocks(o, hm, match, l,
+							    &pmb, &pmb_alloc,
+							    &pmb_nr);
+
+			if (contiguous && pmb_nr && last_symbol != l->s)
 				flipped_block = (flipped_block + 1) % 2;
 			else
 				flipped_block = 0;
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 2c13b62d3c6..308dc136596 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1833,6 +1833,52 @@ test_expect_success '--color-moved treats adjacent blocks as separate for MIN_AL
 	test_cmp expected actual
 '
 
+test_expect_success '--color-moved rewinds for MIN_ALNUM_COUNT' '
+	git reset --hard &&
+	test_write_lines >file \
+		A B C one two three four five six seven D E F G H I J &&
+	git add file &&
+	test_write_lines >file \
+		one two A B C D E F G H I J two three four five six seven &&
+	git diff --color-moved=zebra -- file &&
+
+	git diff --color-moved=zebra --color -- file >actual.raw &&
+	grep -v "index" actual.raw | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/file b/file<RESET>
+	<BOLD>--- a/file<RESET>
+	<BOLD>+++ b/file<RESET>
+	<CYAN>@@ -1,13 +1,8 @@<RESET>
+	<GREEN>+<RESET><GREEN>one<RESET>
+	<GREEN>+<RESET><GREEN>two<RESET>
+	 A<RESET>
+	 B<RESET>
+	 C<RESET>
+	<RED>-one<RESET>
+	<BOLD;MAGENTA>-two<RESET>
+	<BOLD;MAGENTA>-three<RESET>
+	<BOLD;MAGENTA>-four<RESET>
+	<BOLD;MAGENTA>-five<RESET>
+	<BOLD;MAGENTA>-six<RESET>
+	<BOLD;MAGENTA>-seven<RESET>
+	 D<RESET>
+	 E<RESET>
+	 F<RESET>
+	<CYAN>@@ -15,3 +10,9 @@<RESET> <RESET>G<RESET>
+	 H<RESET>
+	 I<RESET>
+	 J<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>two<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>three<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>four<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>five<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>six<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>seven<RESET>
+	EOF
+
+	test_cmp expected actual
+'
+
 test_expect_success 'move detection with submodules' '
 	test_create_repo bananas &&
 	echo ripe >bananas/recipe &&
-- 
gitgitgadget



* [PATCH v5 05/15] diff --color-moved=zebra: fix alternate coloring
  2021-12-09 10:29       ` [PATCH v5 " Phillip Wood via GitGitGadget
                           ` (3 preceding siblings ...)
  2021-12-09 10:29         ` [PATCH v5 04/15] diff --color-moved: rewind when discarding pmb Phillip Wood via GitGitGadget
@ 2021-12-09 10:29         ` Phillip Wood via GitGitGadget
  2021-12-09 10:30         ` [PATCH v5 06/15] diff --color-moved: avoid false short line matches and bad zebra coloring Phillip Wood via GitGitGadget
                           ` (9 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-12-09 10:29 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Johannes Schindelin, Phillip Wood,
	Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

b0a2ba4776 ("diff --color-moved=zebra: be stricter with color
alternation", 2018-11-23) sought to avoid using the alternate colors
unless there are two adjacent moved blocks of the same
sign. Unfortunately it contains two bugs that prevented it from fixing
the problem properly. Firstly `last_symbol` is reset at the start of
each iteration of the loop losing the symbol of the last line and
secondly when deciding whether to use the alternate color it should be
checking if the current line is the same sign as the last line, not a
different sign. The combination of the two errors means that we still
use the alternate color when we should do but we also use it when we
shouldn't. This is most noticeable when using
--color-moved-ws=allow-indentation-change with hunks like

-this line gets indented
+    this line gets indented

where the post image is colored with newMovedAlternate rather than
newMoved. While this does not matter much, the next commit will change
the coloring to be correct in this case, so let's fix the bug here to
make it clear why the output is changing and add a regression test.
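
The corrected decision can be sketched in isolation like this. It is a
simplified illustration, not the code in diff.c: the enum and
sketch_next_flipped() are invented names, and the caller is assumed to
carry last_symbol across loop iterations rather than resetting it.

```c
#include <assert.h>

/* Hypothetical stand-ins for the diff symbol kinds. */
enum sketch_symbol { SKETCH_NONE, SKETCH_PLUS, SKETCH_MINUS };

/*
 * Decide whether a new block uses the alternate color. The color
 * only flips when the new block directly follows a kept block, has
 * at least one candidate match, and has the same sign as the
 * previous line; otherwise it falls back to the primary color.
 */
static int sketch_next_flipped(int flipped_block, int contiguous, int pmb_nr,
			       enum sketch_symbol last_symbol,
			       enum sketch_symbol cur_symbol)
{
	assert(flipped_block == 0 || flipped_block == 1);
	if (contiguous && pmb_nr && last_symbol == cur_symbol)
		return (flipped_block + 1) % 2;
	return 0;
}
```

With the old `!=` test, a removed line followed by its re-indented
added counterpart would flip to the alternate color even though the
two blocks are on opposite sides of the diff.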

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c                     |  4 +--
 t/t4015-diff-whitespace.sh | 72 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/diff.c b/diff.c
index 1e1b5127d15..53f0df75329 100644
--- a/diff.c
+++ b/diff.c
@@ -1176,6 +1176,7 @@ static void mark_color_as_moved(struct diff_options *o,
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
 	int n, flipped_block = 0, block_length = 0;
+	enum diff_symbol last_symbol = 0;
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
@@ -1183,7 +1184,6 @@ static void mark_color_as_moved(struct diff_options *o,
 		struct moved_entry *key;
 		struct moved_entry *match = NULL;
 		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-		enum diff_symbol last_symbol = 0;
 
 		switch (l->s) {
 		case DIFF_SYMBOL_PLUS:
@@ -1251,7 +1251,7 @@ static void mark_color_as_moved(struct diff_options *o,
 							    &pmb, &pmb_alloc,
 							    &pmb_nr);
 
-			if (contiguous && pmb_nr && last_symbol != l->s)
+			if (contiguous && pmb_nr && last_symbol == l->s)
 				flipped_block = (flipped_block + 1) % 2;
 			else
 				flipped_block = 0;
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 308dc136596..4e0fd76c6c5 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1442,6 +1442,78 @@ test_expect_success 'detect permutations inside moved code -- dimmed-zebra' '
 	test_cmp expected actual
 '
 
+test_expect_success 'zebra alternate color is only used when necessary' '
+	cat >old.txt <<-\EOF &&
+	line 1A should be marked as oldMoved newMovedAlternate
+	line 1B should be marked as oldMoved newMovedAlternate
+	unchanged
+	line 2A should be marked as oldMoved newMovedAlternate
+	line 2B should be marked as oldMoved newMovedAlternate
+	line 3A should be marked as oldMovedAlternate newMoved
+	line 3B should be marked as oldMovedAlternate newMoved
+	unchanged
+	line 4A should be marked as oldMoved newMovedAlternate
+	line 4B should be marked as oldMoved newMovedAlternate
+	line 5A should be marked as oldMovedAlternate newMoved
+	line 5B should be marked as oldMovedAlternate newMoved
+	line 6A should be marked as oldMoved newMoved
+	line 6B should be marked as oldMoved newMoved
+	EOF
+	cat >new.txt <<-\EOF &&
+	  line 1A should be marked as oldMoved newMovedAlternate
+	  line 1B should be marked as oldMoved newMovedAlternate
+	unchanged
+	  line 3A should be marked as oldMovedAlternate newMoved
+	  line 3B should be marked as oldMovedAlternate newMoved
+	  line 2A should be marked as oldMoved newMovedAlternate
+	  line 2B should be marked as oldMoved newMovedAlternate
+	unchanged
+	  line 6A should be marked as oldMoved newMoved
+	  line 6B should be marked as oldMoved newMoved
+	    line 4A should be marked as oldMoved newMovedAlternate
+	    line 4B should be marked as oldMoved newMovedAlternate
+	  line 5A should be marked as oldMovedAlternate newMoved
+	  line 5B should be marked as oldMovedAlternate newMoved
+	EOF
+	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
+		 --color-moved-ws=allow-indentation-change \
+		 old.txt new.txt >output &&
+	grep -v index output | test_decode_color >actual &&
+	cat >expected <<-\EOF &&
+	<BOLD>diff --git a/old.txt b/new.txt<RESET>
+	<BOLD>--- a/old.txt<RESET>
+	<BOLD>+++ b/new.txt<RESET>
+	<CYAN>@@ -1,14 +1,14 @@<RESET>
+	<BOLD;MAGENTA>-line 1A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;MAGENTA>-line 1B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 1B should be marked as oldMoved newMovedAlternate<RESET>
+	 unchanged<RESET>
+	<BOLD;MAGENTA>-line 2A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;MAGENTA>-line 2B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;BLUE>-line 3A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;BLUE>-line 3B should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 3B should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>  line 2B should be marked as oldMoved newMovedAlternate<RESET>
+	 unchanged<RESET>
+	<BOLD;MAGENTA>-line 4A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;MAGENTA>-line 4B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;BLUE>-line 5A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;BLUE>-line 5B should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;MAGENTA>-line 6A should be marked as oldMoved newMoved<RESET>
+	<BOLD;MAGENTA>-line 6B should be marked as oldMoved newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6A should be marked as oldMoved newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 6B should be marked as oldMoved newMoved<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4A should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>    line 4B should be marked as oldMoved newMovedAlternate<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5A should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>  line 5B should be marked as oldMovedAlternate newMoved<RESET>
+	EOF
+	test_cmp expected actual
+'
+
 test_expect_success 'cmd option assumes configured colored-moved' '
 	test_config color.diff.oldMoved "magenta" &&
 	test_config color.diff.newMoved "cyan" &&
-- 
gitgitgadget



* [PATCH v5 06/15] diff --color-moved: avoid false short line matches and bad zebra coloring
  2021-12-09 10:29       ` [PATCH v5 " Phillip Wood via GitGitGadget
                           ` (4 preceding siblings ...)
  2021-12-09 10:29         ` [PATCH v5 05/15] diff --color-moved=zebra: fix alternate coloring Phillip Wood via GitGitGadget
@ 2021-12-09 10:30         ` Phillip Wood via GitGitGadget
  2021-12-09 10:30         ` [PATCH v5 07/15] diff: simplify allow-indentation-change delta calculation Phillip Wood via GitGitGadget
                           ` (8 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-12-09 10:30 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Johannes Schindelin, Phillip Wood,
	Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

When marking moved lines it is possible for a block of potential
matched lines to extend past a change in sign when there is a sequence
of added lines whose text matches the text of a sequence of deleted
and added lines. Most of the time either `match` will be NULL or
`pmb_advance_or_null()` will fail when the loop encounters a change of
sign, but there are corner cases where `match` is non-NULL and
`pmb_advance_or_null()` successfully advances the moved block despite
the change in sign.

One consequence of this is highlighting a short line as moved when it
should not be. For example:

-moved line  # Correctly highlighted as moved
+short line  # Wrongly highlighted as moved
 context
+moved line  # Correctly highlighted as moved
+short line
 context
-short line

The other consequence is coloring a moved addition following a moved
deletion in the wrong color. In the example below, the first "+moved
line 3" should be highlighted as newMoved, not newMovedAlternate.

-moved line 1 # Correctly highlighted as oldMoved
-moved line 2 # Correctly highlighted as oldMovedAlternate
+moved line 3 # Wrongly highlighted as newMovedAlternate
 context      # Everything else is highlighted correctly
+moved line 2
+moved line 3
 context
+moved line 1
-moved line 3

These false matches are more likely when using --color-moved-ws, with
the exception of --color-moved-ws=allow-indentation-change, which ties
the sign of the current whitespace delta to the sign of the line to
avoid this problem. The fix is to check that the sign of the new line
being matched is the same as the sign of the line that started the
block of potential matches.
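
As a rough illustration (a minimal sketch, not the code in diff.c: the
enum and the helper name below are made up for this example), the new
check amounts to a single predicate that is evaluated before a line is
allowed to extend the current block:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for git's enum diff_symbol. */
enum line_sign { SIGN_NONE, SIGN_PLUS, SIGN_MINUS };

/*
 * Hypothetical helper: decide whether the current block of potential
 * matches has to be ended before this line is considered.
 * `block_sign` is the sign of the line that started the block.
 */
static bool must_end_block(int pmb_nr, bool has_match,
			   enum line_sign line, enum line_sign block_sign)
{
	assert(pmb_nr >= 0);

	/* Nothing to end if there are no potential moved blocks. */
	if (!pmb_nr)
		return false;

	/* End on a line with no match, or on a change of sign. */
	return !has_match || line != block_sign;
}
```

With this predicate a line of the opposite sign always ends the block,
even if its text would otherwise match the next line of the block.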

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c                     | 17 ++++++----
 t/t4015-diff-whitespace.sh | 65 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+), 6 deletions(-)

diff --git a/diff.c b/diff.c
index 53f0df75329..efba2789354 100644
--- a/diff.c
+++ b/diff.c
@@ -1176,7 +1176,7 @@ static void mark_color_as_moved(struct diff_options *o,
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
 	int n, flipped_block = 0, block_length = 0;
-	enum diff_symbol last_symbol = 0;
+	enum diff_symbol moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
@@ -1202,7 +1202,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			flipped_block = 0;
 		}
 
-		if (!match) {
+		if (pmb_nr && (!match || l->s != moved_symbol)) {
 			int i;
 
 			if (!adjust_last_block(o, n, block_length) &&
@@ -1219,12 +1219,13 @@ static void mark_color_as_moved(struct diff_options *o,
 			pmb_nr = 0;
 			block_length = 0;
 			flipped_block = 0;
-			last_symbol = l->s;
+		}
+		if (!match) {
+			moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
 			continue;
 		}
 
 		if (o->color_moved == COLOR_MOVED_PLAIN) {
-			last_symbol = l->s;
 			l->flags |= DIFF_SYMBOL_MOVED_LINE;
 			continue;
 		}
@@ -1251,11 +1252,16 @@ static void mark_color_as_moved(struct diff_options *o,
 							    &pmb, &pmb_alloc,
 							    &pmb_nr);
 
-			if (contiguous && pmb_nr && last_symbol == l->s)
+			if (contiguous && pmb_nr && moved_symbol == l->s)
 				flipped_block = (flipped_block + 1) % 2;
 			else
 				flipped_block = 0;
 
+			if (pmb_nr)
+				moved_symbol = l->s;
+			else
+				moved_symbol = DIFF_SYMBOL_BINARY_DIFF_HEADER;
+
 			block_length = 0;
 		}
 
@@ -1265,7 +1271,6 @@ static void mark_color_as_moved(struct diff_options *o,
 			if (flipped_block && o->color_moved != COLOR_MOVED_BLOCKS)
 				l->flags |= DIFF_SYMBOL_MOVED_LINE_ALT;
 		}
-		last_symbol = l->s;
 	}
 	adjust_last_block(o, n, block_length);
 
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 4e0fd76c6c5..15782c879d2 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1514,6 +1514,71 @@ test_expect_success 'zebra alternate color is only used when necessary' '
 	test_cmp expected actual
 '
 
+test_expect_success 'short lines of opposite sign do not get marked as moved' '
+	cat >old.txt <<-\EOF &&
+	this line should be marked as moved
+	unchanged
+	unchanged
+	unchanged
+	unchanged
+	too short
+	this line should be marked as oldMoved newMoved
+	this line should be marked as oldMovedAlternate newMoved
+	unchanged 1
+	unchanged 2
+	unchanged 3
+	unchanged 4
+	this line should be marked as oldMoved newMoved/newMovedAlternate
+	EOF
+	cat >new.txt <<-\EOF &&
+	too short
+	unchanged
+	unchanged
+	this line should be marked as moved
+	too short
+	unchanged
+	unchanged
+	this line should be marked as oldMoved newMoved/newMovedAlternate
+	unchanged 1
+	unchanged 2
+	this line should be marked as oldMovedAlternate newMoved
+	this line should be marked as oldMoved newMoved/newMovedAlternate
+	unchanged 3
+	this line should be marked as oldMoved newMoved
+	unchanged 4
+	EOF
+	test_expect_code 1 git diff --no-index --color --color-moved=zebra \
+		old.txt new.txt >output && cat output &&
+	grep -v index output | test_decode_color >actual &&
+	cat >expect <<-\EOF &&
+	<BOLD>diff --git a/old.txt b/new.txt<RESET>
+	<BOLD>--- a/old.txt<RESET>
+	<BOLD>+++ b/new.txt<RESET>
+	<CYAN>@@ -1,13 +1,15 @@<RESET>
+	<BOLD;MAGENTA>-this line should be marked as moved<RESET>
+	<GREEN>+<RESET><GREEN>too short<RESET>
+	 unchanged<RESET>
+	 unchanged<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as moved<RESET>
+	<GREEN>+<RESET><GREEN>too short<RESET>
+	 unchanged<RESET>
+	 unchanged<RESET>
+	<RED>-too short<RESET>
+	<BOLD;MAGENTA>-this line should be marked as oldMoved newMoved<RESET>
+	<BOLD;BLUE>-this line should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
+	 unchanged 1<RESET>
+	 unchanged 2<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMovedAlternate newMoved<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
+	 unchanged 3<RESET>
+	<BOLD;CYAN>+<RESET><BOLD;CYAN>this line should be marked as oldMoved newMoved<RESET>
+	 unchanged 4<RESET>
+	<BOLD;MAGENTA>-this line should be marked as oldMoved newMoved/newMovedAlternate<RESET>
+	EOF
+	test_cmp expect actual
+'
+
 test_expect_success 'cmd option assumes configured colored-moved' '
 	test_config color.diff.oldMoved "magenta" &&
 	test_config color.diff.newMoved "cyan" &&
-- 
gitgitgadget



* [PATCH v5 07/15] diff: simplify allow-indentation-change delta calculation
  2021-12-09 10:29       ` [PATCH v5 " Phillip Wood via GitGitGadget
                           ` (5 preceding siblings ...)
  2021-12-09 10:30         ` [PATCH v5 06/15] diff --color-moved: avoid false short line matches and bad zebra coloring Phillip Wood via GitGitGadget
@ 2021-12-09 10:30         ` Phillip Wood via GitGitGadget
  2021-12-09 10:30         ` [PATCH v5 08/15] diff --color-moved-ws=allow-indentation-change: simplify and optimize Phillip Wood via GitGitGadget
                           ` (7 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-12-09 10:30 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Johannes Schindelin, Phillip Wood,
	Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Now that we reliably end a block when the sign changes, we don't need
the whitespace delta calculation to rely on the sign.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/diff.c b/diff.c
index efba2789354..9aff167be27 100644
--- a/diff.c
+++ b/diff.c
@@ -864,23 +864,17 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
 	    a_width = a->indent_width,
 	    b_off = b->indent_off,
 	    b_width = b->indent_width;
-	int delta;
 
 	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE) {
 		*out = INDENT_BLANKLINE;
 		return 1;
 	}
 
-	if (a->s == DIFF_SYMBOL_PLUS)
-		delta = a_width - b_width;
-	else
-		delta = b_width - a_width;
-
 	if (a_len - a_off != b_len - b_off ||
 	    memcmp(a->line + a_off, b->line + b_off, a_len - a_off))
 		return 0;
 
-	*out = delta;
+	*out = a_width - b_width;
 
 	return 1;
 }
@@ -924,10 +918,7 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	 * match those of the current block and that the text of 'l' and 'cur'
 	 * after the indentation match.
 	 */
-	if (cur->es->s == DIFF_SYMBOL_PLUS)
-		delta = a_width - c_width;
-	else
-		delta = c_width - a_width;
+	delta = c_width - a_width;
 
 	/*
 	 * If the previous lines of this block were all blank then set its
-- 
gitgitgadget



* [PATCH v5 08/15] diff --color-moved-ws=allow-indentation-change: simplify and optimize
  2021-12-09 10:29       ` [PATCH v5 " Phillip Wood via GitGitGadget
                           ` (6 preceding siblings ...)
  2021-12-09 10:30         ` [PATCH v5 07/15] diff: simplify allow-indentation-change delta calculation Phillip Wood via GitGitGadget
@ 2021-12-09 10:30         ` Phillip Wood via GitGitGadget
  2021-12-09 10:30         ` [PATCH v5 09/15] diff --color-moved: call comparison function directly Phillip Wood via GitGitGadget
                           ` (6 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-12-09 10:30 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Johannes Schindelin, Phillip Wood,
	Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

If we already have a block of potentially moved lines then as we move
down the diff we need to check if the next line of each potentially
moved line matches the current line of the diff. The implementation of
--color-moved-ws=allow-indentation-change was needlessly performing
this check on all the lines in the diff that matched the current line
rather than just the current line. To exacerbate the problem, finding
all the other lines in the diff that match the current line involves a
fuzzy lookup, so we were wasting even more time performing a second
comparison to filter out the non-matching lines. Fixing this reduces
the time to run
  git diff --color-moved-ws=allow-indentation-change v2.28.0 v2.29.0
by 93% compared to master and simplifies the code.

Test                                                                  HEAD^              HEAD
---------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.35+0.03)   0.38(0.35+0.03)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.86 (0.80+0.06)   0.87(0.83+0.04)  +1.2%
4002.3: diff --color-moved-ws=allow-indentation-change large change  19.01(18.93+0.06)   0.97(0.92+0.04) -94.9%
4002.4: log --no-color-moved --no-color-moved-ws                      1.16 (1.06+0.09)   1.17(1.06+0.10)  +0.9%
4002.5: log --color-moved --no-color-moved-ws                         1.32 (1.25+0.07)   1.32(1.24+0.08)  +0.0%
4002.6: log --color-moved-ws=allow-indentation-change                 1.71 (1.64+0.06)   1.36(1.25+0.10) -20.5%

Test                                                                  master             HEAD
---------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.33+0.05)   0.38(0.35+0.03)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.80 (0.75+0.04)   0.87(0.83+0.04)  +8.7%
4002.3: diff --color-moved-ws=allow-indentation-change large change  14.20(14.15+0.05)   0.97(0.92+0.04) -93.2%
4002.4: log --no-color-moved --no-color-moved-ws                      1.15 (1.05+0.09)   1.17(1.06+0.10)  +1.7%
4002.5: log --color-moved --no-color-moved-ws                         1.30 (1.19+0.11)   1.32(1.24+0.08)  +1.5%
4002.6: log --color-moved-ws=allow-indentation-change                 1.70 (1.63+0.06)   1.36(1.25+0.10) -20.0%
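
The single comparison per block described above can be sketched in
isolation. This is a simplified approximation rather than the code in
diff.c: `struct sketch_line`, `make_line()` and `continues_block()` are
hypothetical names, indentation is measured in spaces only, and the
case where only one of the two lines is blank is rejected explicitly.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define INDENT_BLANKLINE INT_MIN /* same sentinel value as in diff.c */

/* Simplified stand-in for struct emitted_diff_symbol. */
struct sketch_line {
	const char *text;
	int len;          /* length of text */
	int indent_off;   /* offset of the first non-space byte */
	int indent_width; /* indentation width, or INDENT_BLANKLINE */
};

/* Hypothetical helper: counts leading spaces only, no tab handling. */
static struct sketch_line make_line(const char *text)
{
	struct sketch_line l = { text, (int)strlen(text), 0, 0 };

	while (l.indent_off < l.len && text[l.indent_off] == ' ')
		l.indent_off++;
	l.indent_width = l.indent_off == l.len ?
		INDENT_BLANKLINE : l.indent_off;
	return l;
}

/*
 * Return 1 if the current diff line `l` continues a block whose next
 * expected line is `cur` and whose whitespace delta is `*wsd`.
 */
static int continues_block(const struct sketch_line *cur,
			   const struct sketch_line *l, int *wsd)
{
	int delta;

	assert(cur && l && wsd);

	/* Two blank lines always match. */
	if (cur->indent_width == INDENT_BLANKLINE &&
	    l->indent_width == INDENT_BLANKLINE)
		return 1;
	/* Only one of them is blank: no match (handled explicitly here). */
	if (cur->indent_width == INDENT_BLANKLINE ||
	    l->indent_width == INDENT_BLANKLINE)
		return 0;

	delta = l->indent_width - cur->indent_width;

	/* A block of blank lines so far adopts the first real delta. */
	if (*wsd == INDENT_BLANKLINE)
		*wsd = delta;

	/* Same delta as the block and same text after the indentation. */
	return delta == *wsd &&
	       cur->len - cur->indent_off == l->len - l->indent_off &&
	       !memcmp(cur->text + cur->indent_off, l->text + l->indent_off,
		       cur->len - cur->indent_off);
}
```

Each potential block costs one such call per diff line, independent of
how many other lines in the diff happen to match the current line.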

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 70 +++++++++++++++++-----------------------------------------
 1 file changed, 20 insertions(+), 50 deletions(-)

diff --git a/diff.c b/diff.c
index 9aff167be27..78a486021ab 100644
--- a/diff.c
+++ b/diff.c
@@ -879,37 +879,21 @@ static int compute_ws_delta(const struct emitted_diff_symbol *a,
 	return 1;
 }
 
-static int cmp_in_block_with_wsd(const struct diff_options *o,
-				 const struct moved_entry *cur,
-				 const struct moved_entry *match,
-				 struct moved_block *pmb,
-				 int n)
-{
-	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
-	int al = cur->es->len, bl = match->es->len, cl = l->len;
+static int cmp_in_block_with_wsd(const struct moved_entry *cur,
+				 const struct emitted_diff_symbol *l,
+				 struct moved_block *pmb)
+{
+	int al = cur->es->len, bl = l->len;
 	const char *a = cur->es->line,
-		   *b = match->es->line,
-		   *c = l->line;
+		   *b = l->line;
 	int a_off = cur->es->indent_off,
 	    a_width = cur->es->indent_width,
-	    c_off = l->indent_off,
-	    c_width = l->indent_width;
+	    b_off = l->indent_off,
+	    b_width = l->indent_width;
 	int delta;
 
-	/*
-	 * We need to check if 'cur' is equal to 'match'.  As those
-	 * are from the same (+/-) side, we do not need to adjust for
-	 * indent changes. However these were found using fuzzy
-	 * matching so we do have to check if they are equal. Here we
-	 * just check the lengths. We delay calling memcmp() to check
-	 * the contents until later as if the length comparison for a
-	 * and c fails we can avoid the call all together.
-	 */
-	if (al != bl)
-		return 1;
-
 	/* If 'l' and 'cur' are both blank then they match. */
-	if (a_width == INDENT_BLANKLINE && c_width == INDENT_BLANKLINE)
+	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE)
 		return 0;
 
 	/*
@@ -918,7 +902,7 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	 * match those of the current block and that the text of 'l' and 'cur'
 	 * after the indentation match.
 	 */
-	delta = c_width - a_width;
+	delta = b_width - a_width;
 
 	/*
 	 * If the previous lines of this block were all blank then set its
@@ -927,9 +911,8 @@ static int cmp_in_block_with_wsd(const struct diff_options *o,
 	if (pmb->wsd == INDENT_BLANKLINE)
 		pmb->wsd = delta;
 
-	return !(delta == pmb->wsd && al - a_off == cl - c_off &&
-		 !memcmp(a, b, al) && !
-		 memcmp(a + a_off, c + c_off, al - a_off));
+	return !(delta == pmb->wsd && al - a_off == bl - b_off &&
+		 !memcmp(a + a_off, b + b_off, al - a_off));
 }
 
 static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
@@ -1030,36 +1013,23 @@ static void pmb_advance_or_null(struct diff_options *o,
 }
 
 static void pmb_advance_or_null_multi_match(struct diff_options *o,
-					    struct moved_entry *match,
-					    struct hashmap *hm,
+					    struct emitted_diff_symbol *l,
 					    struct moved_block *pmb,
-					    int pmb_nr, int n)
+					    int pmb_nr)
 {
 	int i;
-	char *got_match = xcalloc(1, pmb_nr);
-
-	hashmap_for_each_entry_from(hm, match, ent) {
-		for (i = 0; i < pmb_nr; i++) {
-			struct moved_entry *prev = pmb[i].match;
-			struct moved_entry *cur = (prev && prev->next_line) ?
-					prev->next_line : NULL;
-			if (!cur)
-				continue;
-			if (!cmp_in_block_with_wsd(o, cur, match, &pmb[i], n))
-				got_match[i] |= 1;
-		}
-	}
 
 	for (i = 0; i < pmb_nr; i++) {
-		if (got_match[i]) {
+		struct moved_entry *prev = pmb[i].match;
+		struct moved_entry *cur = (prev && prev->next_line) ?
+			prev->next_line : NULL;
+		if (cur && !cmp_in_block_with_wsd(cur, l, &pmb[i])) {
 			/* Advance to the next line */
-			pmb[i].match = pmb[i].match->next_line;
+			pmb[i].match = cur;
 		} else {
 			moved_block_clear(&pmb[i]);
 		}
 	}
-
-	free(got_match);
 }
 
 static int shrink_potential_moved_blocks(struct moved_block *pmb,
@@ -1223,7 +1193,7 @@ static void mark_color_as_moved(struct diff_options *o,
 
 		if (o->color_moved_ws_handling &
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-			pmb_advance_or_null_multi_match(o, match, hm, pmb, pmb_nr, n);
+			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);
 		else
 			pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
 
-- 
gitgitgadget



* [PATCH v5 09/15] diff --color-moved: call comparison function directly
  2021-12-09 10:29       ` [PATCH v5 " Phillip Wood via GitGitGadget
                           ` (7 preceding siblings ...)
  2021-12-09 10:30         ` [PATCH v5 08/15] diff --color-moved-ws=allow-indentation-change: simplify and optimize Phillip Wood via GitGitGadget
@ 2021-12-09 10:30         ` Phillip Wood via GitGitGadget
  2021-12-09 10:30         ` [PATCH v5 10/15] diff --color-moved: unify moved block growth functions Phillip Wood via GitGitGadget
                           ` (5 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-12-09 10:30 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Johannes Schindelin, Phillip Wood,
	Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

This change will allow us to easily combine pmb_advance_or_null() and
pmb_advance_or_null_multi_match() in the next commit. Calling
xdiff_compare_lines() directly rather than using a function pointer
from the hash map has little effect on the run time.

Test                                                                  HEAD^             HEAD
-------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38(0.35+0.03)   0.38(0.32+0.06) +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.87(0.83+0.04)   0.87(0.80+0.06) +0.0%
4002.3: diff --color-moved-ws=allow-indentation-change large change   0.97(0.92+0.04)   0.97(0.93+0.04) +0.0%
4002.4: log --no-color-moved --no-color-moved-ws                      1.17(1.06+0.10)   1.16(1.10+0.05) -0.9%
4002.5: log --color-moved --no-color-moved-ws                         1.32(1.24+0.08)   1.31(1.22+0.09) -0.8%
4002.6: log --color-moved-ws=allow-indentation-change                 1.36(1.25+0.10)   1.35(1.25+0.10) -0.7%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index 78a486021ab..22e0edac173 100644
--- a/diff.c
+++ b/diff.c
@@ -994,17 +994,20 @@ static void add_lines_to_move_detection(struct diff_options *o,
 }
 
 static void pmb_advance_or_null(struct diff_options *o,
-				struct moved_entry *match,
-				struct hashmap *hm,
+				struct emitted_diff_symbol *l,
 				struct moved_block *pmb,
 				int pmb_nr)
 {
 	int i;
+	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
+
 	for (i = 0; i < pmb_nr; i++) {
 		struct moved_entry *prev = pmb[i].match;
 		struct moved_entry *cur = (prev && prev->next_line) ?
 				prev->next_line : NULL;
-		if (cur && !hm->cmpfn(o, &cur->ent, &match->ent, NULL)) {
+		if (cur && xdiff_compare_lines(cur->es->line, cur->es->len,
+						l->line, l->len,
+						flags)) {
 			pmb[i].match = cur;
 		} else {
 			pmb[i].match = NULL;
@@ -1195,7 +1198,7 @@ static void mark_color_as_moved(struct diff_options *o,
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
 			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);
 		else
-			pmb_advance_or_null(o, match, hm, pmb, pmb_nr);
+			pmb_advance_or_null(o, l, pmb, pmb_nr);
 
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
-- 
gitgitgadget



* [PATCH v5 10/15] diff --color-moved: unify moved block growth functions
  2021-12-09 10:29       ` [PATCH v5 " Phillip Wood via GitGitGadget
                           ` (8 preceding siblings ...)
  2021-12-09 10:30         ` [PATCH v5 09/15] diff --color-moved: call comparison function directly Phillip Wood via GitGitGadget
@ 2021-12-09 10:30         ` Phillip Wood via GitGitGadget
  2021-12-09 10:30         ` [PATCH v5 11/15] diff --color-moved: shrink potential moved blocks as we go Phillip Wood via GitGitGadget
                           ` (4 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-12-09 10:30 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Johannes Schindelin, Phillip Wood,
	Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

After the last two commits pmb_advance_or_null() and
pmb_advance_or_null_multi_match() differ only in the comparison they
perform. Let's simplify the code by combining them into a single
function.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 41 ++++++++++++-----------------------------
 1 file changed, 12 insertions(+), 29 deletions(-)

diff --git a/diff.c b/diff.c
index 22e0edac173..51f092e724e 100644
--- a/diff.c
+++ b/diff.c
@@ -1002,36 +1002,23 @@ static void pmb_advance_or_null(struct diff_options *o,
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
 	for (i = 0; i < pmb_nr; i++) {
+		int match;
 		struct moved_entry *prev = pmb[i].match;
 		struct moved_entry *cur = (prev && prev->next_line) ?
 				prev->next_line : NULL;
-		if (cur && xdiff_compare_lines(cur->es->line, cur->es->len,
-						l->line, l->len,
-						flags)) {
-			pmb[i].match = cur;
-		} else {
-			pmb[i].match = NULL;
-		}
-	}
-}
 
-static void pmb_advance_or_null_multi_match(struct diff_options *o,
-					    struct emitted_diff_symbol *l,
-					    struct moved_block *pmb,
-					    int pmb_nr)
-{
-	int i;
-
-	for (i = 0; i < pmb_nr; i++) {
-		struct moved_entry *prev = pmb[i].match;
-		struct moved_entry *cur = (prev && prev->next_line) ?
-			prev->next_line : NULL;
-		if (cur && !cmp_in_block_with_wsd(cur, l, &pmb[i])) {
-			/* Advance to the next line */
+		if (o->color_moved_ws_handling &
+		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+			match = cur &&
+				!cmp_in_block_with_wsd(cur, l, &pmb[i]);
+		else
+			match = cur &&
+				xdiff_compare_lines(cur->es->line, cur->es->len,
+						    l->line, l->len, flags);
+		if (match)
 			pmb[i].match = cur;
-		} else {
+		else
 			moved_block_clear(&pmb[i]);
-		}
 	}
 }
 
@@ -1194,11 +1181,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			continue;
 		}
 
-		if (o->color_moved_ws_handling &
-		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-			pmb_advance_or_null_multi_match(o, l, pmb, pmb_nr);
-		else
-			pmb_advance_or_null(o, l, pmb, pmb_nr);
+		pmb_advance_or_null(o, l, pmb, pmb_nr);
 
 		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
 
-- 
gitgitgadget



* [PATCH v5 11/15] diff --color-moved: shrink potential moved blocks as we go
  2021-12-09 10:29       ` [PATCH v5 " Phillip Wood via GitGitGadget
                           ` (9 preceding siblings ...)
  2021-12-09 10:30         ` [PATCH v5 10/15] diff --color-moved: unify moved block growth functions Phillip Wood via GitGitGadget
@ 2021-12-09 10:30         ` Phillip Wood via GitGitGadget
  2021-12-09 10:30         ` [PATCH v5 12/15] diff --color-moved: stop clearing potential moved blocks Phillip Wood via GitGitGadget
                           ` (3 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-12-09 10:30 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Johannes Schindelin, Phillip Wood,
	Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Rather than setting `match` to NULL and then looping over the list of
potential matched blocks for a second time to remove blocks with no
matches, just filter out the blocks with no matches as we go.
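
The filtering pattern can be shown on its own with a small sketch. The
names below (`struct sketch_block`, `keep_matching_blocks()`) are
hypothetical and the comparison is a plain strcmp() rather than the
whitespace-aware comparisons used in diff.c; only the two-index
compaction mirrors what the patch does.

```c
#include <assert.h>
#include <string.h>

/*
 * Illustrative stand-in for struct moved_block: `next` points at the
 * line the block expects to see next in a NULL-terminated array.
 */
struct sketch_block {
	const char *const *next;
};

/*
 * Keep only the blocks whose expected next line equals `line`,
 * advancing the survivors and compacting the array in place.
 * On return *nr holds the number of surviving blocks.
 */
static void keep_matching_blocks(struct sketch_block *pmb, int *nr,
				 const char *line)
{
	int i, j;

	assert(pmb && nr && *nr >= 0 && line);

	for (i = 0, j = 0; i < *nr; i++) {
		const char *cur = *pmb[i].next;

		if (cur && !strcmp(cur, line)) {
			pmb[j] = pmb[i];  /* move survivor down */
			pmb[j++].next++;  /* and advance it by one line */
		}
	}
	*nr = j;
}
```

One pass over the array is enough, and the surviving blocks keep their
relative order because `j` never overtakes `i`.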

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 44 ++++++++------------------------------------
 1 file changed, 8 insertions(+), 36 deletions(-)

diff --git a/diff.c b/diff.c
index 51f092e724e..626fd47aa0e 100644
--- a/diff.c
+++ b/diff.c
@@ -996,12 +996,12 @@ static void add_lines_to_move_detection(struct diff_options *o,
 static void pmb_advance_or_null(struct diff_options *o,
 				struct emitted_diff_symbol *l,
 				struct moved_block *pmb,
-				int pmb_nr)
+				int *pmb_nr)
 {
-	int i;
+	int i, j;
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
-	for (i = 0; i < pmb_nr; i++) {
+	for (i = 0, j = 0; i < *pmb_nr; i++) {
 		int match;
 		struct moved_entry *prev = pmb[i].match;
 		struct moved_entry *cur = (prev && prev->next_line) ?
@@ -1015,38 +1015,12 @@ static void pmb_advance_or_null(struct diff_options *o,
 			match = cur &&
 				xdiff_compare_lines(cur->es->line, cur->es->len,
 						    l->line, l->len, flags);
-		if (match)
-			pmb[i].match = cur;
-		else
-			moved_block_clear(&pmb[i]);
-	}
-}
-
-static int shrink_potential_moved_blocks(struct moved_block *pmb,
-					 int pmb_nr)
-{
-	int lp, rp;
-
-	/* Shrink the set of potential block to the remaining running */
-	for (lp = 0, rp = pmb_nr - 1; lp <= rp;) {
-		while (lp < pmb_nr && pmb[lp].match)
-			lp++;
-		/* lp points at the first NULL now */
-
-		while (rp > -1 && !pmb[rp].match)
-			rp--;
-		/* rp points at the last non-NULL */
-
-		if (lp < pmb_nr && rp > -1 && lp < rp) {
-			pmb[lp] = pmb[rp];
-			memset(&pmb[rp], 0, sizeof(pmb[rp]));
-			rp--;
-			lp++;
+		if (match) {
+			pmb[j] = pmb[i];
+			pmb[j++].match = cur;
 		}
 	}
-
-	/* Remember the number of running sets */
-	return rp + 1;
+	*pmb_nr = j;
 }
 
 static void fill_potential_moved_blocks(struct diff_options *o,
@@ -1181,9 +1155,7 @@ static void mark_color_as_moved(struct diff_options *o,
 			continue;
 		}
 
-		pmb_advance_or_null(o, l, pmb, pmb_nr);
-
-		pmb_nr = shrink_potential_moved_blocks(pmb, pmb_nr);
+		pmb_advance_or_null(o, l, pmb, &pmb_nr);
 
 		if (pmb_nr == 0) {
 			int contiguous = adjust_last_block(o, n, block_length);
-- 
gitgitgadget



* [PATCH v5 12/15] diff --color-moved: stop clearing potential moved blocks
  2021-12-09 10:29       ` [PATCH v5 " Phillip Wood via GitGitGadget
                           ` (10 preceding siblings ...)
  2021-12-09 10:30         ` [PATCH v5 11/15] diff --color-moved: shrink potential moved blocks as we go Phillip Wood via GitGitGadget
@ 2021-12-09 10:30         ` Phillip Wood via GitGitGadget
  2021-12-09 10:30         ` [PATCH v5 13/15] diff --color-moved-ws=allow-indentation-change: improve hash lookups Phillip Wood via GitGitGadget
                           ` (2 subsequent siblings)
  14 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-12-09 10:30 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Johannes Schindelin, Phillip Wood,
	Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

moved_block_clear() was introduced in 74d156f4a1 ("diff
--color-moved-ws: fix double free crash", 2018-10-04) to free the
memory that was allocated when initializing a potential moved
block. However, since 21536d077f ("diff --color-moved-ws: modify
allow-indentation-change", 2018-11-23) initializing a potential moved
block no longer allocates any memory. Up until the previous commit we
relied on moved_block_clear() to set the `match` pointer to NULL when
a block stopped matching. Since that commit, blocks that stop matching
are dropped from the array rather than cleared, so it no longer makes
sense to clear moved blocks elsewhere either.
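
As an illustration of why no separate clearing step is needed, here is
a minimal standalone sketch of compacting an array of blocks in place.
The names `block`, `alive` and `keep_matching` are hypothetical
stand-ins chosen for this example; this is not the code in diff.c.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct moved_block. */
struct block {
	int id;    /* identifies the block in this example */
	int alive; /* non-zero if the block still matches */
};

/*
 * Copy the blocks that still match to the front of the array and
 * return the new count. Entries at or beyond the returned count are
 * never read again, so there is no need to zero them.
 */
static size_t keep_matching(struct block *b, size_t nr)
{
	size_t i, j;

	for (i = 0, j = 0; i < nr; i++)
		if (b[i].alive)
			b[j++] = b[i];
	return j;
}
```

A caller would replace its element count with the return value, much
as the previous patch stores `j` in `*pmb_nr`.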

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/diff.c b/diff.c
index 626fd47aa0e..ffbe09937bc 100644
--- a/diff.c
+++ b/diff.c
@@ -807,11 +807,6 @@ struct moved_block {
 	int wsd; /* The whitespace delta of this block */
 };
 
-static void moved_block_clear(struct moved_block *b)
-{
-	memset(b, 0, sizeof(*b));
-}
-
 #define INDENT_BLANKLINE INT_MIN
 
 static void fill_es_indent_data(struct emitted_diff_symbol *es)
@@ -1128,8 +1123,6 @@ static void mark_color_as_moved(struct diff_options *o,
 		}
 
 		if (pmb_nr && (!match || l->s != moved_symbol)) {
-			int i;
-
 			if (!adjust_last_block(o, n, block_length) &&
 			    block_length > 1) {
 				/*
@@ -1139,8 +1132,6 @@ static void mark_color_as_moved(struct diff_options *o,
 				match = NULL;
 				n -= block_length;
 			}
-			for(i = 0; i < pmb_nr; i++)
-				moved_block_clear(&pmb[i]);
 			pmb_nr = 0;
 			block_length = 0;
 			flipped_block = 0;
@@ -1193,8 +1184,6 @@ static void mark_color_as_moved(struct diff_options *o,
 	}
 	adjust_last_block(o, n, block_length);
 
-	for(n = 0; n < pmb_nr; n++)
-		moved_block_clear(&pmb[n]);
 	free(pmb);
 }
 
-- 
gitgitgadget



* [PATCH v5 13/15] diff --color-moved-ws=allow-indentation-change: improve hash lookups
  2021-12-09 10:29       ` [PATCH v5 " Phillip Wood via GitGitGadget
                           ` (11 preceding siblings ...)
  2021-12-09 10:30         ` [PATCH v5 12/15] diff --color-moved: stop clearing potential moved blocks Phillip Wood via GitGitGadget
@ 2021-12-09 10:30         ` Phillip Wood via GitGitGadget
  2021-12-09 10:30         ` [PATCH v5 14/15] diff: use designated initializers for emitted_diff_symbol Phillip Wood via GitGitGadget
  2021-12-09 10:30         ` [PATCH v5 15/15] diff --color-moved: intern strings Phillip Wood via GitGitGadget
  14 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-12-09 10:30 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Johannes Schindelin, Phillip Wood,
	Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

As libxdiff does not have a whitespace flag to ignore indentation,
the code for --color-moved-ws=allow-indentation-change uses
XDF_IGNORE_WHITESPACE and then filters out any hash lookups where
there are non-indentation changes. This filtering is inefficient as
it requires another string comparison.

By using the offset data that we have already computed to skip the
indentation, we can avoid using XDF_IGNORE_WHITESPACE and safely remove
the extra checks. This improves the performance by 11% and paves the
way for the elimination of string comparisons later in this series.
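
The idea of hashing and comparing only the text after the indentation
can be sketched as follows. This is a simplified standalone example,
not the diff.c implementation: `struct line`, `fill_indent_off`,
`hash_suffix` and `suffix_equal` are hypothetical names, the hash is
FNV-1a rather than xdiff's hash, and only spaces and tabs are treated
as indentation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct emitted_diff_symbol. */
struct line {
	const char *text;
	size_t len;
	size_t indent_off; /* offset of the first non-indentation byte */
};

/* Record where the indentation (spaces and tabs only, here) ends. */
static void fill_indent_off(struct line *l)
{
	size_t off = 0;

	while (off < l->len && (l->text[off] == ' ' || l->text[off] == '\t'))
		off++;
	l->indent_off = off;
}

/* FNV-1a over the bytes after the indentation. */
static unsigned hash_suffix(const struct line *l)
{
	unsigned h = 2166136261u;
	size_t i;

	for (i = l->indent_off; i < l->len; i++) {
		h ^= (unsigned char)l->text[i];
		h *= 16777619u;
	}
	return h;
}

/* Two lines are move candidates if their suffixes are identical. */
static int suffix_equal(const struct line *a, const struct line *b)
{
	size_t a_len = a->len - a->indent_off;
	size_t b_len = b->len - b->indent_off;

	return a_len == b_len &&
	       !memcmp(a->text + a->indent_off, b->text + b->indent_off, a_len);
}
```

Because the hash and the comparison both start at `indent_off`, lines
that differ only in indentation land in the same bucket and compare
equal, so no second filtering pass over the candidates is required.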

This change slightly increases the run time of other --color-moved
modes. This could be avoided by using different comparison functions
for the different modes but after the next two commits there is no
measurable benefit in doing so.

There is a change in behavior for lines that begin with a form-feed or
vertical-tab character. Since b46054b374 ("xdiff: use
git-compat-util", 2019-04-11) xdiff does not treat '\f' or '\v' as
whitespace characters. This means that lines starting with those
characters are never considered to be blank and never match a line
that does not start with the same character. After this patch a line
matching "^[\f\v\r]*[ \t]*$" is considered to be blank by
--color-moved-ws=allow-indentation-change, and lines beginning with a
prefix matching "^[\f\v\r]*[ \t]*" can match another line if the
suffixes match. This changes the output of git show for d18f76dccf
("compat/regex: use the regex engine from gawk for compat",
2010-08-17): some lines in the pre-image before a moved block that
contain '\f' are now considered moved as well, because they match a
blank line before the moved lines in the post-image. This commit
updates one of the tests to reflect this change.
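
The two patterns quoted above can be read as the following small
sketch. It implements the regular expressions exactly as written in
this message and assumes the line is passed without its trailing
newline; `indent_prefix_len` and `is_blank_line` are hypothetical
helper names, not functions from diff.c.

```c
#include <stddef.h>

/*
 * Return the length of the longest prefix of 'line' that matches
 * "^[\f\v\r]*[ \t]*".
 */
static size_t indent_prefix_len(const char *line, size_t len)
{
	size_t i = 0;

	while (i < len &&
	       (line[i] == '\f' || line[i] == '\v' || line[i] == '\r'))
		i++;
	while (i < len && (line[i] == ' ' || line[i] == '\t'))
		i++;
	return i;
}

/* A line is blank if the prefix above covers the whole line. */
static int is_blank_line(const char *line, size_t len)
{
	return indent_prefix_len(line, len) == len;
}
```

Under this reading a line consisting of a form-feed followed by two
spaces is blank, while a line that has text after the prefix is not
and is compared by the suffix after that prefix.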

Test                                                                  HEAD^             HEAD
--------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38(0.33+0.05)   0.38(0.33+0.05)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.86(0.82+0.04)   0.88(0.84+0.04)  +2.3%
4002.3: diff --color-moved-ws=allow-indentation-change large change   0.97(0.94+0.03)   0.86(0.81+0.05) -11.3%
4002.4: log --no-color-moved --no-color-moved-ws                      1.16(1.07+0.09)   1.16(1.06+0.09)  +0.0%
4002.5: log --color-moved --no-color-moved-ws                         1.32(1.26+0.06)   1.33(1.27+0.05)  +0.8%
4002.6: log --color-moved-ws=allow-indentation-change                 1.35(1.29+0.06)   1.33(1.24+0.08)  -1.5%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c                     | 65 +++++++++++---------------------------
 t/t4015-diff-whitespace.sh | 22 ++++++-------
 2 files changed, 30 insertions(+), 57 deletions(-)

diff --git a/diff.c b/diff.c
index ffbe09937bc..2085c063675 100644
--- a/diff.c
+++ b/diff.c
@@ -850,28 +850,15 @@ static void fill_es_indent_data(struct emitted_diff_symbol *es)
 }
 
 static int compute_ws_delta(const struct emitted_diff_symbol *a,
-			    const struct emitted_diff_symbol *b,
-			    int *out)
-{
-	int a_len = a->len,
-	    b_len = b->len,
-	    a_off = a->indent_off,
-	    a_width = a->indent_width,
-	    b_off = b->indent_off,
+			    const struct emitted_diff_symbol *b)
+{
+	int a_width = a->indent_width,
 	    b_width = b->indent_width;
 
-	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE) {
-		*out = INDENT_BLANKLINE;
-		return 1;
-	}
-
-	if (a_len - a_off != b_len - b_off ||
-	    memcmp(a->line + a_off, b->line + b_off, a_len - a_off))
-		return 0;
-
-	*out = a_width - b_width;
+	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE)
+		return INDENT_BLANKLINE;
 
-	return 1;
+	return a_width - b_width;
 }
 
 static int cmp_in_block_with_wsd(const struct moved_entry *cur,
@@ -916,26 +903,17 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 			   const void *keydata)
 {
 	const struct diff_options *diffopt = hashmap_cmp_fn_data;
-	const struct moved_entry *a, *b;
+	const struct emitted_diff_symbol *a, *b;
 	unsigned flags = diffopt->color_moved_ws_handling
 			 & XDF_WHITESPACE_FLAGS;
 
-	a = container_of(eptr, const struct moved_entry, ent);
-	b = container_of(entry_or_key, const struct moved_entry, ent);
+	a = container_of(eptr, const struct moved_entry, ent)->es;
+	b = container_of(entry_or_key, const struct moved_entry, ent)->es;
 
-	if (diffopt->color_moved_ws_handling &
-	    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-		/*
-		 * As there is not specific white space config given,
-		 * we'd need to check for a new block, so ignore all
-		 * white space. The setup of the white space
-		 * configuration for the next block is done else where
-		 */
-		flags |= XDF_IGNORE_WHITESPACE;
-
-	return !xdiff_compare_lines(a->es->line, a->es->len,
-				    b->es->line, b->es->len,
-				    flags);
+	return !xdiff_compare_lines(a->line + a->indent_off,
+				    a->len - a->indent_off,
+				    b->line + b->indent_off,
+				    b->len - b->indent_off, flags);
 }
 
 static struct moved_entry *prepare_entry(struct diff_options *o,
@@ -944,7 +922,8 @@ static struct moved_entry *prepare_entry(struct diff_options *o,
 	struct moved_entry *ret = xmalloc(sizeof(*ret));
 	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
-	unsigned int hash = xdiff_hash_string(l->line, l->len, flags);
+	unsigned int hash = xdiff_hash_string(l->line + l->indent_off,
+					      l->len - l->indent_off, flags);
 
 	hashmap_entry_init(&ret->ent, hash);
 	ret->es = l;
@@ -1036,13 +1015,11 @@ static void fill_potential_moved_blocks(struct diff_options *o,
 	hashmap_for_each_entry_from(hm, match, ent) {
 		ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
 		if (o->color_moved_ws_handling &
-		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) {
-			if (compute_ws_delta(l, match->es, &(pmb[pmb_nr]).wsd))
-				pmb[pmb_nr++].match = match;
-		} else {
+		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
+			pmb[pmb_nr].wsd = compute_ws_delta(l, match->es);
+		else
 			pmb[pmb_nr].wsd = 0;
-			pmb[pmb_nr++].match = match;
-		}
+		pmb[pmb_nr++].match = match;
 	}
 
 	*pmb_p = pmb;
@@ -6276,10 +6253,6 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 		if (o->color_moved) {
 			struct hashmap add_lines, del_lines;
 
-			if (o->color_moved_ws_handling &
-			    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-				o->color_moved_ws_handling |= XDF_IGNORE_WHITESPACE;
-
 			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
 			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
 
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 15782c879d2..50d0cf486be 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -2206,10 +2206,10 @@ EMPTY=''
 test_expect_success 'compare mixed whitespace delta across moved blocks' '
 
 	git reset --hard &&
-	tr Q_ "\t " <<-EOF >text.txt &&
-	${EMPTY}
-	____too short without
-	${EMPTY}
+	tr "^|Q_" "\f\v\t " <<-EOF >text.txt &&
+	^__
+	|____too short without
+	^
 	___being grouped across blank line
 	${EMPTY}
 	context
@@ -2228,7 +2228,7 @@ test_expect_success 'compare mixed whitespace delta across moved blocks' '
 	git add text.txt &&
 	git commit -m "add text.txt" &&
 
-	tr Q_ "\t " <<-EOF >text.txt &&
+	tr "^|Q_" "\f\v\t " <<-EOF >text.txt &&
 	context
 	lines
 	to
@@ -2239,7 +2239,7 @@ test_expect_success 'compare mixed whitespace delta across moved blocks' '
 	${EMPTY}
 	QQtoo short without
 	${EMPTY}
-	Q_______being grouped across blank line
+	^Q_______being grouped across blank line
 	${EMPTY}
 	Q_QThese two lines have had their
 	indentation reduced by four spaces
@@ -2251,16 +2251,16 @@ test_expect_success 'compare mixed whitespace delta across moved blocks' '
 		-c core.whitespace=space-before-tab \
 		diff --color --color-moved --ws-error-highlight=all \
 		--color-moved-ws=allow-indentation-change >actual.raw &&
-	grep -v "index" actual.raw | test_decode_color >actual &&
+	grep -v "index" actual.raw | tr "\f\v" "^|" | test_decode_color >actual &&
 
 	cat <<-\EOF >expected &&
 	<BOLD>diff --git a/text.txt b/text.txt<RESET>
 	<BOLD>--- a/text.txt<RESET>
 	<BOLD>+++ b/text.txt<RESET>
 	<CYAN>@@ -1,16 +1,16 @@<RESET>
-	<BOLD;MAGENTA>-<RESET>
-	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>    too short without<RESET>
-	<BOLD;MAGENTA>-<RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>^<RESET><BRED>  <RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>|    too short without<RESET>
+	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>^<RESET>
 	<BOLD;MAGENTA>-<RESET><BOLD;MAGENTA>   being grouped across blank line<RESET>
 	<BOLD;MAGENTA>-<RESET>
 	 <RESET>context<RESET>
@@ -2280,7 +2280,7 @@ test_expect_success 'compare mixed whitespace delta across moved blocks' '
 	<BOLD;YELLOW>+<RESET>
 	<BOLD;YELLOW>+<RESET>		<BOLD;YELLOW>too short without<RESET>
 	<BOLD;YELLOW>+<RESET>
-	<BOLD;YELLOW>+<RESET>	<BOLD;YELLOW>       being grouped across blank line<RESET>
+	<BOLD;YELLOW>+<RESET><BOLD;YELLOW>^	       being grouped across blank line<RESET>
 	<BOLD;YELLOW>+<RESET>
 	<BOLD;CYAN>+<RESET>	<BRED> <RESET>	<BOLD;CYAN>These two lines have had their<RESET>
 	<BOLD;CYAN>+<RESET><BOLD;CYAN>indentation reduced by four spaces<RESET>
-- 
gitgitgadget



* [PATCH v5 14/15] diff: use designated initializers for emitted_diff_symbol
  2021-12-09 10:29       ` [PATCH v5 " Phillip Wood via GitGitGadget
                           ` (12 preceding siblings ...)
  2021-12-09 10:30         ` [PATCH v5 13/15] diff --color-moved-ws=allow-indentation-change: improve hash lookups Phillip Wood via GitGitGadget
@ 2021-12-09 10:30         ` Phillip Wood via GitGitGadget
  2021-12-09 10:30         ` [PATCH v5 15/15] diff --color-moved: intern strings Phillip Wood via GitGitGadget
  14 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-12-09 10:30 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Johannes Schindelin, Phillip Wood,
	Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

This makes it clearer which fields are being explicitly initialized
and will simplify the next commit where we add a new field to the
struct.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 2085c063675..9ef88d7665a 100644
--- a/diff.c
+++ b/diff.c
@@ -1497,7 +1497,9 @@ static void emit_diff_symbol_from_struct(struct diff_options *o,
 static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
 			     const char *line, int len, unsigned flags)
 {
-	struct emitted_diff_symbol e = {line, len, flags, 0, 0, s};
+	struct emitted_diff_symbol e = {
+		.line = line, .len = len, .flags = flags, .s = s
+	};
 
 	if (o->emitted_symbols)
 		append_emitted_diff_symbol(o, &e);
-- 
gitgitgadget



* [PATCH v5 15/15] diff --color-moved: intern strings
  2021-12-09 10:29       ` [PATCH v5 " Phillip Wood via GitGitGadget
                           ` (13 preceding siblings ...)
  2021-12-09 10:30         ` [PATCH v5 14/15] diff: use designated initializers for emitted_diff_symbol Phillip Wood via GitGitGadget
@ 2021-12-09 10:30         ` Phillip Wood via GitGitGadget
  14 siblings, 0 replies; 92+ messages in thread
From: Phillip Wood via GitGitGadget @ 2021-12-09 10:30 UTC (permalink / raw)
  To: git
  Cc: Phillip Wood, Ævar Arnfjörð Bjarmason,
	Elijah Newren, Phillip Wood, Johannes Schindelin, Phillip Wood,
	Phillip Wood

From: Phillip Wood <phillip.wood@dunelm.org.uk>

Taking inspiration from xdl_classify_record(), assign an id to each
addition and deletion such that lines that match for the current
--color-moved-ws mode share the same unique id. This reduces the
number of hash lookups a little (calculating the ids still involves
one hash lookup per line), but the main benefit is that when growing
blocks of potentially moved lines we can replace string comparisons,
which involve chasing a pointer, with a simple integer comparison. On
a large diff this commit reduces the time to run 'diff --color-moved'
by 37% compared to the previous commit and 31% compared to master. For
'diff --color-moved-ws=allow-indentation-change' the reduction is 28%
compared to the previous commit and 96% compared to master. There is
little change in the performance of 'git log --patch' as the diffs are
smaller.
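
The interning idea can be sketched in isolation as below. This is a
deliberately small standalone example and not the diff.c code: the
fixed-size table, the FNV-1a hash and the names `intern_table`,
`hash_str` and `intern` are assumptions made for illustration, and
whole NUL-terminated strings are interned rather than the text after
the indentation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 64 /* power of two; the sketch holds < 63 strings */

/* Hypothetical interning table: open addressing, linear probing. */
struct intern_table {
	const char *key[TABLE_SIZE]; /* NULL marks an empty slot */
	unsigned id[TABLE_SIZE];
	unsigned next_id;            /* number of distinct strings seen */
};

static unsigned hash_str(const char *s)
{
	unsigned h = 2166136261u; /* FNV-1a, not xdiff's hash */

	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 16777619u;
	}
	return h;
}

/*
 * Return the id for 's', allocating a new one the first time the
 * string is seen. This is the only place strings are compared.
 */
static unsigned intern(struct intern_table *t, const char *s)
{
	unsigned slot = hash_str(s) & (TABLE_SIZE - 1);

	assert(t->next_id < TABLE_SIZE - 1); /* keep one slot empty */
	while (t->key[slot]) {
		if (!strcmp(t->key[slot], s))
			return t->id[slot];
		slot = (slot + 1) & (TABLE_SIZE - 1);
	}
	t->key[slot] = s;
	t->id[slot] = t->next_id;
	return t->next_id++;
}
```

With a zero-initialized table, each line is interned once up front.
Checking whether two lines match afterwards is a plain integer
comparison of their ids, with no further access to the line text.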

Test                                                                  HEAD^              HEAD
---------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38(0.33+0.05)    0.38(0.33+0.05)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.88(0.81+0.06)    0.55(0.50+0.04) -37.5%
4002.3: diff --color-moved-ws=allow-indentation-change large change   0.85(0.79+0.06)    0.61(0.54+0.06) -28.2%
4002.4: log --no-color-moved --no-color-moved-ws                      1.16(1.07+0.08)    1.15(1.09+0.05)  -0.9%
4002.5: log --color-moved --no-color-moved-ws                         1.31(1.22+0.08)    1.29(1.19+0.09)  -1.5%
4002.6: log --color-moved-ws=allow-indentation-change                 1.32(1.24+0.08)    1.31(1.18+0.13)  -0.8%

Test                                                                  master             HEAD
---------------------------------------------------------------------------------------------------------------
4002.1: diff --no-color-moved --no-color-moved-ws large change        0.38 (0.33+0.05)   0.38(0.33+0.05)  +0.0%
4002.2: diff --color-moved --no-color-moved-ws large change           0.80 (0.75+0.04)   0.55(0.50+0.04) -31.2%
4002.3: diff --color-moved-ws=allow-indentation-change large change  14.20(14.15+0.05)   0.61(0.54+0.06) -95.7%
4002.4: log --no-color-moved --no-color-moved-ws                      1.15 (1.05+0.09)   1.15(1.09+0.05)  +0.0%
4002.5: log --color-moved --no-color-moved-ws                         1.30 (1.19+0.11)   1.29(1.19+0.09)  -0.8%
4002.6: log --color-moved-ws=allow-indentation-change                 1.70 (1.63+0.06)   1.31(1.18+0.13) -22.9%

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
---
 diff.c | 174 +++++++++++++++++++++++++++++++--------------------------
 1 file changed, 96 insertions(+), 78 deletions(-)

diff --git a/diff.c b/diff.c
index 9ef88d7665a..c28c56c1283 100644
--- a/diff.c
+++ b/diff.c
@@ -18,6 +18,7 @@
 #include "submodule-config.h"
 #include "submodule.h"
 #include "hashmap.h"
+#include "mem-pool.h"
 #include "ll-merge.h"
 #include "string-list.h"
 #include "strvec.h"
@@ -772,6 +773,7 @@ struct emitted_diff_symbol {
 	int flags;
 	int indent_off;   /* Offset to first non-whitespace character */
 	int indent_width; /* The visual width of the indentation */
+	unsigned id;
 	enum diff_symbol s;
 };
 #define EMITTED_DIFF_SYMBOL_INIT {NULL}
@@ -797,9 +799,9 @@ static void append_emitted_diff_symbol(struct diff_options *o,
 }
 
 struct moved_entry {
-	struct hashmap_entry ent;
 	const struct emitted_diff_symbol *es;
 	struct moved_entry *next_line;
+	struct moved_entry *next_match;
 };
 
 struct moved_block {
@@ -865,24 +867,24 @@ static int cmp_in_block_with_wsd(const struct moved_entry *cur,
 				 const struct emitted_diff_symbol *l,
 				 struct moved_block *pmb)
 {
-	int al = cur->es->len, bl = l->len;
-	const char *a = cur->es->line,
-		   *b = l->line;
-	int a_off = cur->es->indent_off,
-	    a_width = cur->es->indent_width,
-	    b_off = l->indent_off,
-	    b_width = l->indent_width;
+	int a_width = cur->es->indent_width, b_width = l->indent_width;
 	int delta;
 
-	/* If 'l' and 'cur' are both blank then they match. */
-	if (a_width == INDENT_BLANKLINE && b_width == INDENT_BLANKLINE)
+	/* The text of each line must match */
+	if (cur->es->id != l->id)
+		return 1;
+
+	/*
+	 * If 'l' and 'cur' are both blank then we don't need to check the
+	 * indent. We only need to check cur as we know the strings match.
+	 * */
+	if (a_width == INDENT_BLANKLINE)
 		return 0;
 
 	/*
 	 * The indent changes of the block are known and stored in pmb->wsd;
 	 * however we need to check if the indent changes of the current line
-	 * match those of the current block and that the text of 'l' and 'cur'
-	 * after the indentation match.
+	 * match those of the current block.
 	 */
 	delta = b_width - a_width;
 
@@ -893,22 +895,26 @@ static int cmp_in_block_with_wsd(const struct moved_entry *cur,
 	if (pmb->wsd == INDENT_BLANKLINE)
 		pmb->wsd = delta;
 
-	return !(delta == pmb->wsd && al - a_off == bl - b_off &&
-		 !memcmp(a + a_off, b + b_off, al - a_off));
+	return delta != pmb->wsd;
 }
 
-static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
-			   const struct hashmap_entry *eptr,
-			   const struct hashmap_entry *entry_or_key,
-			   const void *keydata)
+struct interned_diff_symbol {
+	struct hashmap_entry ent;
+	struct emitted_diff_symbol *es;
+};
+
+static int interned_diff_symbol_cmp(const void *hashmap_cmp_fn_data,
+				    const struct hashmap_entry *eptr,
+				    const struct hashmap_entry *entry_or_key,
+				    const void *keydata)
 {
 	const struct diff_options *diffopt = hashmap_cmp_fn_data;
 	const struct emitted_diff_symbol *a, *b;
 	unsigned flags = diffopt->color_moved_ws_handling
 			 & XDF_WHITESPACE_FLAGS;
 
-	a = container_of(eptr, const struct moved_entry, ent)->es;
-	b = container_of(entry_or_key, const struct moved_entry, ent)->es;
+	a = container_of(eptr, const struct interned_diff_symbol, ent)->es;
+	b = container_of(entry_or_key, const struct interned_diff_symbol, ent)->es;
 
 	return !xdiff_compare_lines(a->line + a->indent_off,
 				    a->len - a->indent_off,
@@ -916,55 +922,81 @@ static int moved_entry_cmp(const void *hashmap_cmp_fn_data,
 				    b->len - b->indent_off, flags);
 }
 
-static struct moved_entry *prepare_entry(struct diff_options *o,
-					 int line_no)
+static void prepare_entry(struct diff_options *o, struct emitted_diff_symbol *l,
+			  struct interned_diff_symbol *s)
 {
-	struct moved_entry *ret = xmalloc(sizeof(*ret));
-	struct emitted_diff_symbol *l = &o->emitted_symbols->buf[line_no];
 	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 	unsigned int hash = xdiff_hash_string(l->line + l->indent_off,
 					      l->len - l->indent_off, flags);
 
-	hashmap_entry_init(&ret->ent, hash);
-	ret->es = l;
-	ret->next_line = NULL;
-
-	return ret;
+	hashmap_entry_init(&s->ent, hash);
+	s->es = l;
 }
 
-static void add_lines_to_move_detection(struct diff_options *o,
-					struct hashmap *add_lines,
-					struct hashmap *del_lines)
+struct moved_entry_list {
+	struct moved_entry *add, *del;
+};
+
+static struct moved_entry_list *add_lines_to_move_detection(struct diff_options *o,
+							    struct mem_pool *entry_mem_pool)
 {
 	struct moved_entry *prev_line = NULL;
-
+	struct mem_pool interned_pool;
+	struct hashmap interned_map;
+	struct moved_entry_list *entry_list = NULL;
+	size_t entry_list_alloc = 0;
+	unsigned id = 0;
 	int n;
+
+	hashmap_init(&interned_map, interned_diff_symbol_cmp, o, 8096);
+	mem_pool_init(&interned_pool, 1024 * 1024);
+
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
-		struct hashmap *hm;
-		struct moved_entry *key;
+		struct interned_diff_symbol key;
+		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
+		struct interned_diff_symbol *s;
+		struct moved_entry *entry;
 
-		switch (o->emitted_symbols->buf[n].s) {
-		case DIFF_SYMBOL_PLUS:
-			hm = add_lines;
-			break;
-		case DIFF_SYMBOL_MINUS:
-			hm = del_lines;
-			break;
-		default:
+		if (l->s != DIFF_SYMBOL_PLUS && l->s != DIFF_SYMBOL_MINUS) {
 			prev_line = NULL;
 			continue;
 		}
 
 		if (o->color_moved_ws_handling &
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
-			fill_es_indent_data(&o->emitted_symbols->buf[n]);
-		key = prepare_entry(o, n);
-		if (prev_line && prev_line->es->s == o->emitted_symbols->buf[n].s)
-			prev_line->next_line = key;
+			fill_es_indent_data(l);
 
-		hashmap_add(hm, &key->ent);
-		prev_line = key;
+		prepare_entry(o, l, &key);
+		s = hashmap_get_entry(&interned_map, &key, ent, &key.ent);
+		if (s) {
+			l->id = s->es->id;
+		} else {
+			l->id = id;
+			ALLOC_GROW_BY(entry_list, id, 1, entry_list_alloc);
+			hashmap_add(&interned_map,
+				    memcpy(mem_pool_alloc(&interned_pool,
+							  sizeof(key)),
+					   &key, sizeof(key)));
+		}
+		entry = mem_pool_alloc(entry_mem_pool, sizeof(*entry));
+		entry->es = l;
+		entry->next_line = NULL;
+		if (prev_line && prev_line->es->s == l->s)
+			prev_line->next_line = entry;
+		prev_line = entry;
+		if (l->s == DIFF_SYMBOL_PLUS) {
+			entry->next_match = entry_list[l->id].add;
+			entry_list[l->id].add = entry;
+		} else {
+			entry->next_match = entry_list[l->id].del;
+			entry_list[l->id].del = entry;
+		}
 	}
+
+	hashmap_clear(&interned_map);
+	mem_pool_discard(&interned_pool, 0);
+
+	return entry_list;
 }
 
 static void pmb_advance_or_null(struct diff_options *o,
@@ -973,7 +1005,6 @@ static void pmb_advance_or_null(struct diff_options *o,
 				int *pmb_nr)
 {
 	int i, j;
-	unsigned flags = o->color_moved_ws_handling & XDF_WHITESPACE_FLAGS;
 
 	for (i = 0, j = 0; i < *pmb_nr; i++) {
 		int match;
@@ -986,9 +1017,8 @@ static void pmb_advance_or_null(struct diff_options *o,
 			match = cur &&
 				!cmp_in_block_with_wsd(cur, l, &pmb[i]);
 		else
-			match = cur &&
-				xdiff_compare_lines(cur->es->line, cur->es->len,
-						    l->line, l->len, flags);
+			match = cur && cur->es->id == l->id;
+
 		if (match) {
 			pmb[j] = pmb[i];
 			pmb[j++].match = cur;
@@ -998,7 +1028,6 @@ static void pmb_advance_or_null(struct diff_options *o,
 }
 
 static void fill_potential_moved_blocks(struct diff_options *o,
-					struct hashmap *hm,
 					struct moved_entry *match,
 					struct emitted_diff_symbol *l,
 					struct moved_block **pmb_p,
@@ -1012,7 +1041,7 @@ static void fill_potential_moved_blocks(struct diff_options *o,
 	 * The current line is the start of a new block.
 	 * Setup the set of potential blocks.
 	 */
-	hashmap_for_each_entry_from(hm, match, ent) {
+	for (; match; match = match->next_match) {
 		ALLOC_GROW(pmb, pmb_nr + 1, pmb_alloc);
 		if (o->color_moved_ws_handling &
 		    COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE)
@@ -1067,8 +1096,7 @@ static int adjust_last_block(struct diff_options *o, int n, int block_length)
 
 /* Find blocks of moved code, delegate actual coloring decision to helper */
 static void mark_color_as_moved(struct diff_options *o,
-				struct hashmap *add_lines,
-				struct hashmap *del_lines)
+				struct moved_entry_list *entry_list)
 {
 	struct moved_block *pmb = NULL; /* potentially moved blocks */
 	int pmb_nr = 0, pmb_alloc = 0;
@@ -1077,23 +1105,15 @@ static void mark_color_as_moved(struct diff_options *o,
 
 
 	for (n = 0; n < o->emitted_symbols->nr; n++) {
-		struct hashmap *hm = NULL;
-		struct moved_entry *key;
 		struct moved_entry *match = NULL;
 		struct emitted_diff_symbol *l = &o->emitted_symbols->buf[n];
 
 		switch (l->s) {
 		case DIFF_SYMBOL_PLUS:
-			hm = del_lines;
-			key = prepare_entry(o, n);
-			match = hashmap_get_entry(hm, key, ent, NULL);
-			free(key);
+			match = entry_list[l->id].del;
 			break;
 		case DIFF_SYMBOL_MINUS:
-			hm = add_lines;
-			key = prepare_entry(o, n);
-			match = hashmap_get_entry(hm, key, ent, NULL);
-			free(key);
+			match = entry_list[l->id].add;
 			break;
 		default:
 			flipped_block = 0;
@@ -1135,7 +1155,7 @@ static void mark_color_as_moved(struct diff_options *o,
 				 */
 				n -= block_length;
 			else
-				fill_potential_moved_blocks(o, hm, match, l,
+				fill_potential_moved_blocks(o, match, l,
 							    &pmb, &pmb_alloc,
 							    &pmb_nr);
 
@@ -6253,20 +6273,18 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 
 	if (o->emitted_symbols) {
 		if (o->color_moved) {
-			struct hashmap add_lines, del_lines;
-
-			hashmap_init(&del_lines, moved_entry_cmp, o, 0);
-			hashmap_init(&add_lines, moved_entry_cmp, o, 0);
+			struct mem_pool entry_pool;
+			struct moved_entry_list *entry_list;
 
-			add_lines_to_move_detection(o, &add_lines, &del_lines);
-			mark_color_as_moved(o, &add_lines, &del_lines);
+			mem_pool_init(&entry_pool, 1024 * 1024);
+			entry_list = add_lines_to_move_detection(o,
+								 &entry_pool);
+			mark_color_as_moved(o, entry_list);
 			if (o->color_moved == COLOR_MOVED_ZEBRA_DIM)
 				dim_moved_lines(o);
 
-			hashmap_clear_and_free(&add_lines, struct moved_entry,
-						ent);
-			hashmap_clear_and_free(&del_lines, struct moved_entry,
-						ent);
+			mem_pool_discard(&entry_pool, 0);
+			free(entry_list);
 		}
 
 		for (i = 0; i < esm.nr; i++)
-- 
gitgitgadget


end of thread, other threads:[~2021-12-09 10:30 UTC | newest]

Thread overview: 92+ messages
2021-06-14 13:04 [PATCH 00/10] diff --color-moved[-ws] speedups Phillip Wood via GitGitGadget
2021-06-14 13:04 ` [PATCH 01/10] diff --color-moved=zerba: fix alternate coloring Phillip Wood via GitGitGadget
2021-06-15  3:24   ` Junio C Hamano
2021-06-15 11:22     ` Phillip Wood
2021-06-14 13:04 ` [PATCH 02/10] diff --color-moved: avoid false short line matches and bad zerba coloring Phillip Wood via GitGitGadget
2021-06-14 13:04 ` [PATCH 03/10] diff: simplify allow-indentation-change delta calculation Phillip Wood via GitGitGadget
2021-06-14 13:04 ` [PATCH 04/10] diff --color-moved-ws=allow-indentation-change: simplify and optimize Phillip Wood via GitGitGadget
2021-06-14 13:04 ` [PATCH 05/10] diff --color-moved: call comparison function directly Phillip Wood via GitGitGadget
2021-06-14 13:04 ` [PATCH 06/10] diff --color-moved: unify moved block growth functions Phillip Wood via GitGitGadget
2021-06-14 13:04 ` [PATCH 07/10] diff --color-moved: shrink potential moved blocks as we go Phillip Wood via GitGitGadget
2021-06-14 13:04 ` [PATCH 08/10] diff --color-moved: stop clearing potential moved blocks Phillip Wood via GitGitGadget
2021-06-14 13:04 ` [PATCH 09/10] diff --color-moved-ws=allow-indentation-change: improve hash lookups Phillip Wood via GitGitGadget
2021-07-09 15:36   ` Elijah Newren
2021-06-14 13:04 ` [PATCH 10/10] diff --color-moved: intern strings Phillip Wood via GitGitGadget
2021-06-16 14:24 ` [PATCH 00/10] diff --color-moved[-ws] speedups Ævar Arnfjörð Bjarmason
2021-06-21 10:03   ` Phillip Wood
2021-07-20 10:36 ` [PATCH v2 00/12] " Phillip Wood via GitGitGadget
2021-07-20 10:36   ` [PATCH v2 01/12] diff --color-moved: add perf tests Phillip Wood via GitGitGadget
2021-07-20 10:36   ` [PATCH v2 02/12] diff --color-moved=zebra: fix alternate coloring Phillip Wood via GitGitGadget
2021-07-20 10:36   ` [PATCH v2 03/12] diff --color-moved: avoid false short line matches and bad zerba coloring Phillip Wood via GitGitGadget
2021-07-20 10:36   ` [PATCH v2 04/12] diff: simplify allow-indentation-change delta calculation Phillip Wood via GitGitGadget
2021-07-20 10:36   ` [PATCH v2 05/12] diff --color-moved-ws=allow-indentation-change: simplify and optimize Phillip Wood via GitGitGadget
2021-07-20 10:36   ` [PATCH v2 06/12] diff --color-moved: call comparison function directly Phillip Wood via GitGitGadget
2021-07-20 10:36   ` [PATCH v2 07/12] diff --color-moved: unify moved block growth functions Phillip Wood via GitGitGadget
2021-07-20 10:36   ` [PATCH v2 08/12] diff --color-moved: shrink potential moved blocks as we go Phillip Wood via GitGitGadget
2021-07-20 10:36   ` [PATCH v2 09/12] diff --color-moved: stop clearing potential moved blocks Phillip Wood via GitGitGadget
2021-07-20 10:36   ` [PATCH v2 10/12] diff --color-moved-ws=allow-indentation-change: improve hash lookups Phillip Wood via GitGitGadget
2021-07-20 10:36   ` [PATCH v2 11/12] diff: use designated initializers for emitted_diff_symbol Phillip Wood via GitGitGadget
2021-07-20 10:36   ` [PATCH v2 12/12] diff --color-moved: intern strings Phillip Wood via GitGitGadget
2021-07-20 13:38   ` [PATCH v2 00/12] diff --color-moved[-ws] speedups Phillip Wood
2021-10-27 12:04   ` [PATCH v3 00/15] " Phillip Wood via GitGitGadget
2021-10-27 12:04     ` [PATCH v3 01/15] diff --color-moved: add perf tests Phillip Wood via GitGitGadget
2021-10-28 21:32       ` Junio C Hamano
2021-10-29 10:24         ` Phillip Wood
2021-10-29 11:06           ` Ævar Arnfjörð Bjarmason
2021-11-10 11:05             ` Phillip Wood
2021-10-27 12:04     ` [PATCH v3 02/15] diff --color-moved: clear all flags on blocks that are too short Phillip Wood via GitGitGadget
2021-10-27 12:04     ` [PATCH v3 03/15] diff --color-moved: factor out function Phillip Wood via GitGitGadget
2021-10-28 21:51       ` Junio C Hamano
2021-10-29 10:35         ` Phillip Wood
2021-10-27 12:04     ` [PATCH v3 04/15] diff --color-moved: rewind when discarding pmb Phillip Wood via GitGitGadget
2021-10-27 12:04     ` [PATCH v3 05/15] diff --color-moved=zebra: fix alternate coloring Phillip Wood via GitGitGadget
2021-10-27 12:04     ` [PATCH v3 06/15] diff --color-moved: avoid false short line matches and bad zerba coloring Phillip Wood via GitGitGadget
2021-10-27 12:04     ` [PATCH v3 07/15] diff: simplify allow-indentation-change delta calculation Phillip Wood via GitGitGadget
2021-10-27 12:04     ` [PATCH v3 08/15] diff --color-moved-ws=allow-indentation-change: simplify and optimize Phillip Wood via GitGitGadget
2021-10-27 12:04     ` [PATCH v3 09/15] diff --color-moved: call comparison function directly Phillip Wood via GitGitGadget
2021-10-27 12:04     ` [PATCH v3 10/15] diff --color-moved: unify moved block growth functions Phillip Wood via GitGitGadget
2021-10-27 12:04     ` [PATCH v3 11/15] diff --color-moved: shrink potential moved blocks as we go Phillip Wood via GitGitGadget
2021-10-27 12:04     ` [PATCH v3 12/15] diff --color-moved: stop clearing potential moved blocks Phillip Wood via GitGitGadget
2021-10-27 12:04     ` [PATCH v3 13/15] diff --color-moved-ws=allow-indentation-change: improve hash lookups Phillip Wood via GitGitGadget
2021-10-27 12:04     ` [PATCH v3 14/15] diff: use designated initializers for emitted_diff_symbol Phillip Wood via GitGitGadget
2021-10-27 12:04     ` [PATCH v3 15/15] diff --color-moved: intern strings Phillip Wood via GitGitGadget
2021-10-27 13:28     ` [PATCH v3 00/15] diff --color-moved[-ws] speedups Phillip Wood
2021-11-16  9:49     ` [PATCH v4 " Phillip Wood via GitGitGadget
2021-11-16  9:49       ` [PATCH v4 01/15] diff --color-moved: add perf tests Phillip Wood via GitGitGadget
2021-11-16  9:49       ` [PATCH v4 02/15] diff --color-moved: clear all flags on blocks that are too short Phillip Wood via GitGitGadget
2021-11-16  9:49       ` [PATCH v4 03/15] diff --color-moved: factor out function Phillip Wood via GitGitGadget
2021-11-16  9:49       ` [PATCH v4 04/15] diff --color-moved: rewind when discarding pmb Phillip Wood via GitGitGadget
2021-11-16  9:49       ` [PATCH v4 05/15] diff --color-moved=zebra: fix alternate coloring Phillip Wood via GitGitGadget
2021-11-22 13:34         ` Johannes Schindelin
2021-11-16  9:49       ` [PATCH v4 06/15] diff --color-moved: avoid false short line matches and bad zerba coloring Phillip Wood via GitGitGadget
2021-11-22 14:18         ` Johannes Schindelin
2021-11-22 19:00           ` Phillip Wood
2021-11-22 21:54             ` Johannes Schindelin
2021-11-16  9:49       ` [PATCH v4 07/15] diff: simplify allow-indentation-change delta calculation Phillip Wood via GitGitGadget
2021-11-16  9:49       ` [PATCH v4 08/15] diff --color-moved-ws=allow-indentation-change: simplify and optimize Phillip Wood via GitGitGadget
2021-11-23 14:51         ` Johannes Schindelin
2021-11-16  9:49       ` [PATCH v4 09/15] diff --color-moved: call comparison function directly Phillip Wood via GitGitGadget
2021-11-23 15:09         ` Johannes Schindelin
2021-11-16  9:49       ` [PATCH v4 10/15] diff --color-moved: unify moved block growth functions Phillip Wood via GitGitGadget
2021-11-16  9:49       ` [PATCH v4 11/15] diff --color-moved: shrink potential moved blocks as we go Phillip Wood via GitGitGadget
2021-11-16  9:49       ` [PATCH v4 12/15] diff --color-moved: stop clearing potential moved blocks Phillip Wood via GitGitGadget
2021-11-16  9:49       ` [PATCH v4 13/15] diff --color-moved-ws=allow-indentation-change: improve hash lookups Phillip Wood via GitGitGadget
2021-11-16  9:49       ` [PATCH v4 14/15] diff: use designated initializers for emitted_diff_symbol Phillip Wood via GitGitGadget
2021-11-16  9:49       ` [PATCH v4 15/15] diff --color-moved: intern strings Phillip Wood via GitGitGadget
2021-12-08 12:30       ` [PATCH v4 00/15] diff --color-moved[-ws] speedups Johannes Schindelin
2021-12-09 10:29       ` [PATCH v5 " Phillip Wood via GitGitGadget
2021-12-09 10:29         ` [PATCH v5 01/15] diff --color-moved: add perf tests Phillip Wood via GitGitGadget
2021-12-09 10:29         ` [PATCH v5 02/15] diff --color-moved: clear all flags on blocks that are too short Phillip Wood via GitGitGadget
2021-12-09 10:29         ` [PATCH v5 03/15] diff --color-moved: factor out function Phillip Wood via GitGitGadget
2021-12-09 10:29         ` [PATCH v5 04/15] diff --color-moved: rewind when discarding pmb Phillip Wood via GitGitGadget
2021-12-09 10:29         ` [PATCH v5 05/15] diff --color-moved=zebra: fix alternate coloring Phillip Wood via GitGitGadget
2021-12-09 10:30         ` [PATCH v5 06/15] diff --color-moved: avoid false short line matches and bad zebra coloring Phillip Wood via GitGitGadget
2021-12-09 10:30         ` [PATCH v5 07/15] diff: simplify allow-indentation-change delta calculation Phillip Wood via GitGitGadget
2021-12-09 10:30         ` [PATCH v5 08/15] diff --color-moved-ws=allow-indentation-change: simplify and optimize Phillip Wood via GitGitGadget
2021-12-09 10:30         ` [PATCH v5 09/15] diff --color-moved: call comparison function directly Phillip Wood via GitGitGadget
2021-12-09 10:30         ` [PATCH v5 10/15] diff --color-moved: unify moved block growth functions Phillip Wood via GitGitGadget
2021-12-09 10:30         ` [PATCH v5 11/15] diff --color-moved: shrink potential moved blocks as we go Phillip Wood via GitGitGadget
2021-12-09 10:30         ` [PATCH v5 12/15] diff --color-moved: stop clearing potential moved blocks Phillip Wood via GitGitGadget
2021-12-09 10:30         ` [PATCH v5 13/15] diff --color-moved-ws=allow-indentation-change: improve hash lookups Phillip Wood via GitGitGadget
2021-12-09 10:30         ` [PATCH v5 14/15] diff: use designated initializers for emitted_diff_symbol Phillip Wood via GitGitGadget
2021-12-09 10:30         ` [PATCH v5 15/15] diff --color-moved: intern strings Phillip Wood via GitGitGadget

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git
