* [PATCH 1/5] t4015: refactor --color-moved whitespace test
2017-10-19 20:23 ` [PATCH 0/5] fix "diff --color-moved --ignore-space-at-eol" Jeff King
@ 2017-10-19 20:24 ` Jeff King
2017-10-19 20:56 ` Stefan Beller
2017-10-19 20:25 ` [PATCH 2/5] t4015: check "negative" case for "-w --color-moved" Jeff King
` (3 subsequent siblings)
4 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2017-10-19 20:24 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Stefan Beller, git, orgads
In preparation for testing several different whitespace
options, let's split out the setup and cleanup steps of the
whitespace test.
While we're here, let's also switch to using "<<-" to indent
our here-documents properly, and use q_to_tab to more
explicitly mark where we expect whitespace to appear.
Signed-off-by: Jeff King <peff@peff.net>
---
t/t4015-diff-whitespace.sh | 49 +++++++++++++++++++++++++++-------------------
1 file changed, 29 insertions(+), 20 deletions(-)
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 87083f728f..164b502405 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1318,30 +1318,34 @@ test_expect_success 'no effect from --color-moved with --word-diff' '
test_cmp expect actual
'
-test_expect_success 'move detection ignoring whitespace ' '
+test_expect_success 'set up whitespace tests' '
git reset --hard &&
- cat <<\EOF >lines.txt &&
-line 1
-line 2
-line 3
-line 4
-long line 5
-long line 6
-long line 7
-EOF
- git add lines.txt &&
- git commit -m "add poetry" &&
- cat <<\EOF >lines.txt &&
+ # Note that these lines have no leading or trailing whitespace.
+ cat <<-\EOF >lines.txt &&
+ line 1
+ line 2
+ line 3
+ line 4
long line 5
long line 6
long line 7
-line 1
-line 2
-line 3
-line 4
-EOF
- test_config color.diff.oldMoved "magenta" &&
- test_config color.diff.newMoved "cyan" &&
+ EOF
+ git add lines.txt &&
+ git commit -m "add poetry" &&
+ git config color.diff.oldMoved "magenta" &&
+ git config color.diff.newMoved "cyan"
+'
+
+test_expect_success 'move detection ignoring whitespace ' '
+ q_to_tab <<-\EOF >lines.txt &&
+ Qlong line 5
+ Qlong line 6
+ Qlong line 7
+ line 1
+ line 2
+ line 3
+ line 4
+ EOF
git diff HEAD --no-renames --color-moved --color |
grep -v "index" |
test_decode_color >actual &&
@@ -1385,6 +1389,11 @@ EOF
test_cmp expected actual
'
+test_expect_success 'clean up whitespace-test colors' '
+ git config --unset color.diff.oldMoved &&
+ git config --unset color.diff.newMoved
+'
+
test_expect_success '--color-moved block at end of diff output respects MIN_ALNUM_COUNT' '
git reset --hard &&
>bar &&
--
2.15.0.rc1.560.g5f0609e481
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 1/5] t4015: refactor --color-moved whitespace test
2017-10-19 20:24 ` [PATCH 1/5] t4015: refactor --color-moved whitespace test Jeff King
@ 2017-10-19 20:56 ` Stefan Beller
2017-10-19 21:10 ` Jeff King
0 siblings, 1 reply; 34+ messages in thread
From: Stefan Beller @ 2017-10-19 20:56 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git@vger.kernel.org, Orgad Shaneh
On Thu, Oct 19, 2017 at 1:24 PM, Jeff King <peff@peff.net> wrote:
>
> +test_expect_success 'clean up whitespace-test colors' '
> + git config --unset color.diff.oldMoved &&
> + git config --unset color.diff.newMoved
> +'
This could be part of the previous test as
test_config color.diff.oldMoved "magenta" &&
test_config color.diff.newMoved "cyan" &&
in the beginning. (That way we also do not pollute the setup,
but keeping it test local).
With or without this nit, this is
Reviewed-by: Stefan Beller <sbeller@google.com>
Thanks,
Stefan
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 1/5] t4015: refactor --color-moved whitespace test
2017-10-19 20:56 ` Stefan Beller
@ 2017-10-19 21:10 ` Jeff King
0 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2017-10-19 21:10 UTC (permalink / raw)
To: Stefan Beller; +Cc: Junio C Hamano, git@vger.kernel.org, Orgad Shaneh
On Thu, Oct 19, 2017 at 01:56:55PM -0700, Stefan Beller wrote:
> On Thu, Oct 19, 2017 at 1:24 PM, Jeff King <peff@peff.net> wrote:
>
> >
> > +test_expect_success 'clean up whitespace-test colors' '
> > + git config --unset color.diff.oldMoved &&
> > + git config --unset color.diff.newMoved
> > +'
>
> This could be part of the previous test as
>
> test_config color.diff.oldMoved "magenta" &&
> test_config color.diff.newMoved "cyan" &&
>
> in the beginning. (That way we also do not pollute the setup,
> but keeping it test local).
It used to be in the previous test as test_config, but part of the setup
refactoring is to keep these common bits across several tests. Hence we
"git config" in the setup step, and we must "git config --unset" here in
the cleanup. There's only the one test between setup/cleanup right now,
but the point is to add more.
The other alternative (besides repeating ourselves) would be to put all
those common bits into a function and call the function at the top of
each test.
-Peff
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 2/5] t4015: check "negative" case for "-w --color-moved"
2017-10-19 20:23 ` [PATCH 0/5] fix "diff --color-moved --ignore-space-at-eol" Jeff King
2017-10-19 20:24 ` [PATCH 1/5] t4015: refactor --color-moved whitespace test Jeff King
@ 2017-10-19 20:25 ` Jeff King
2017-10-19 20:54 ` Stefan Beller
2017-10-19 20:26 ` [PATCH 3/5] t4015: test the output of "diff --color-moved -b" Jeff King
` (2 subsequent siblings)
4 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2017-10-19 20:25 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Stefan Beller, git, orgads
We test that lines with whitespace changes are not found by
"--color-moved" by default, but are found if "-w" is added.
Let's add one more twist: a line that has non-whitespace
changes should not be marked as a pure move.
This is perhaps an obvious case for us to get right (and we
do), but as we add more whitespace tests, they will form a
pattern of "make sure this case is a move and this other
case is not".
Note that we have to add a line to our moved block, since
having a too-small block doesn't trigger the "moved"
heuristics. And we also add a line of context to ensure
that there's more context lines than moved lines (so the
diff shows us moving the lines up, rather than moving the
context down).
Signed-off-by: Jeff King <peff@peff.net>
---
t/t4015-diff-whitespace.sh | 26 ++++++++++++++++++--------
1 file changed, 18 insertions(+), 8 deletions(-)
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 164b502405..503c9bc7f3 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1326,9 +1326,11 @@ test_expect_success 'set up whitespace tests' '
line 2
line 3
line 4
- long line 5
+ line 5
long line 6
long line 7
+ long line 8
+ long line 9
EOF
git add lines.txt &&
git commit -m "add poetry" &&
@@ -1338,13 +1340,15 @@ test_expect_success 'set up whitespace tests' '
test_expect_success 'move detection ignoring whitespace ' '
q_to_tab <<-\EOF >lines.txt &&
- Qlong line 5
Qlong line 6
Qlong line 7
+ Qlong line 8
+ Qchanged long line 9
line 1
line 2
line 3
line 4
+ line 5
EOF
git diff HEAD --no-renames --color-moved --color |
grep -v "index" |
@@ -1353,17 +1357,20 @@ test_expect_success 'move detection ignoring whitespace ' '
<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
<BOLD>--- a/lines.txt<RESET>
<BOLD>+++ b/lines.txt<RESET>
- <CYAN>@@ -1,7 +1,7 @@<RESET>
- <GREEN>+<RESET> <GREEN>long line 5<RESET>
+ <CYAN>@@ -1,9 +1,9 @@<RESET>
<GREEN>+<RESET> <GREEN>long line 6<RESET>
<GREEN>+<RESET> <GREEN>long line 7<RESET>
+ <GREEN>+<RESET> <GREEN>long line 8<RESET>
+ <GREEN>+<RESET> <GREEN>changed long line 9<RESET>
line 1<RESET>
line 2<RESET>
line 3<RESET>
line 4<RESET>
- <RED>-long line 5<RESET>
+ line 5<RESET>
<RED>-long line 6<RESET>
<RED>-long line 7<RESET>
+ <RED>-long line 8<RESET>
+ <RED>-long line 9<RESET>
EOF
test_cmp expected actual &&
@@ -1374,17 +1381,20 @@ test_expect_success 'move detection ignoring whitespace ' '
<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
<BOLD>--- a/lines.txt<RESET>
<BOLD>+++ b/lines.txt<RESET>
- <CYAN>@@ -1,7 +1,7 @@<RESET>
- <CYAN>+<RESET> <CYAN>long line 5<RESET>
+ <CYAN>@@ -1,9 +1,9 @@<RESET>
<CYAN>+<RESET> <CYAN>long line 6<RESET>
<CYAN>+<RESET> <CYAN>long line 7<RESET>
+ <CYAN>+<RESET> <CYAN>long line 8<RESET>
+ <GREEN>+<RESET> <GREEN>changed long line 9<RESET>
line 1<RESET>
line 2<RESET>
line 3<RESET>
line 4<RESET>
- <MAGENTA>-long line 5<RESET>
+ line 5<RESET>
<MAGENTA>-long line 6<RESET>
<MAGENTA>-long line 7<RESET>
+ <MAGENTA>-long line 8<RESET>
+ <RED>-long line 9<RESET>
EOF
test_cmp expected actual
'
--
2.15.0.rc1.560.g5f0609e481
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 2/5] t4015: check "negative" case for "-w --color-moved"
2017-10-19 20:25 ` [PATCH 2/5] t4015: check "negative" case for "-w --color-moved" Jeff King
@ 2017-10-19 20:54 ` Stefan Beller
0 siblings, 0 replies; 34+ messages in thread
From: Stefan Beller @ 2017-10-19 20:54 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git@vger.kernel.org, Orgad Shaneh
On Thu, Oct 19, 2017 at 1:25 PM, Jeff King <peff@peff.net> wrote:
> We test that lines with whitespace changes are not found by
> "--color-moved" by default, but are found if "-w" is added.
> Let's add one more twist: a line that has non-whitespace
> changes should not be marked as a pure move.
>
> This is perhaps an obvious case for us to get right (and we
> do), but as we add more whitespace tests, they will form a
> pattern of "make sure this case is a move and this other
> case is not".
>
> Note that we have to add a line to our moved block, since
> having a too-small block doesn't trigger the "moved"
> heuristics. And we also add a line of context to ensure
> that there's more context lines than moved lines (so the
> diff shows us moving the lines up, rather than moving the
> context down).
>
> Signed-off-by: Jeff King <peff@peff.net>
This patch is
Reviewed-by: Stefan Beller <sbeller@google.com>
Thanks!
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 3/5] t4015: test the output of "diff --color-moved -b"
2017-10-19 20:23 ` [PATCH 0/5] fix "diff --color-moved --ignore-space-at-eol" Jeff King
2017-10-19 20:24 ` [PATCH 1/5] t4015: refactor --color-moved whitespace test Jeff King
2017-10-19 20:25 ` [PATCH 2/5] t4015: check "negative" case for "-w --color-moved" Jeff King
@ 2017-10-19 20:26 ` Jeff King
2017-10-19 21:03 ` Stefan Beller
2017-10-19 20:29 ` [PATCH 4/5] diff: fix whitespace-skipping with --color-moved Jeff King
2017-10-19 20:31 ` [PATCH 5/5] diff: handle NULs in get_string_hash() Jeff King
4 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2017-10-19 20:26 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Stefan Beller, git, orgads
Commit fa5ba2c1dd (diff: fix infinite loop with
--color-moved --ignore-space-change, 2017-10-12) added a
test to make sure that "--color-moved -b" doesn't run
forever, but the test in question doesn't actually have any
moved lines in it.
Let's scrap that test and add a variant of the existing
"--color-moved -w" test, but this time we'll check that we
find the move with whitespace changes, but not arbitrary
whitespace additions.
Signed-off-by: Jeff King <peff@peff.net>
---
t/t4015-diff-whitespace.sh | 73 ++++++++++++++++++++++++++++++++++++++++------
1 file changed, 64 insertions(+), 9 deletions(-)
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 503c9bc7f3..1f54816c6b 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1399,6 +1399,70 @@ test_expect_success 'move detection ignoring whitespace ' '
test_cmp expected actual
'
+test_expect_success 'move detection ignoring whitespace changes' '
+ git reset --hard &&
+ # Lines 6-8 have a space change, but 9 is new whitespace
+ q_to_tab <<-\EOF >lines.txt &&
+ longQline 6
+ longQline 7
+ longQline 8
+ long liQne 9
+ line 1
+ line 2
+ line 3
+ line 4
+ line 5
+ EOF
+
+ git diff HEAD --no-renames --color-moved --color |
+ grep -v "index" |
+ test_decode_color >actual &&
+ cat <<-\EOF >expected &&
+ <BOLD>diff --git a/lines.txt b/lines.txt<RESET>
+ <BOLD>--- a/lines.txt<RESET>
+ <BOLD>+++ b/lines.txt<RESET>
+ <CYAN>@@ -1,9 +1,9 @@<RESET>
+ <GREEN>+<RESET><GREEN>long line 6<RESET>
+ <GREEN>+<RESET><GREEN>long line 7<RESET>
+ <GREEN>+<RESET><GREEN>long line 8<RESET>
+ <GREEN>+<RESET><GREEN>long li ne 9<RESET>
+ line 1<RESET>
+ line 2<RESET>
+ line 3<RESET>
+ line 4<RESET>
+ line 5<RESET>
+ <RED>-long line 6<RESET>
+ <RED>-long line 7<RESET>
+ <RED>-long line 8<RESET>
+ <RED>-long line 9<RESET>
+ EOF
+ test_cmp expected actual &&
+
+ git diff HEAD --no-renames -b --color-moved --color |
+ grep -v "index" |
+ test_decode_color >actual &&
+ cat <<-\EOF >expected &&
+ <BOLD>diff --git a/lines.txt b/lines.txt<RESET>
+ <BOLD>--- a/lines.txt<RESET>
+ <BOLD>+++ b/lines.txt<RESET>
+ <CYAN>@@ -1,9 +1,9 @@<RESET>
+ <CYAN>+<RESET><CYAN>long line 6<RESET>
+ <CYAN>+<RESET><CYAN>long line 7<RESET>
+ <CYAN>+<RESET><CYAN>long line 8<RESET>
+ <GREEN>+<RESET><GREEN>long li ne 9<RESET>
+ line 1<RESET>
+ line 2<RESET>
+ line 3<RESET>
+ line 4<RESET>
+ line 5<RESET>
+ <MAGENTA>-long line 6<RESET>
+ <MAGENTA>-long line 7<RESET>
+ <MAGENTA>-long line 8<RESET>
+ <RED>-long line 9<RESET>
+ EOF
+ test_cmp expected actual
+'
+
test_expect_success 'clean up whitespace-test colors' '
git config --unset color.diff.oldMoved &&
git config --unset color.diff.newMoved
@@ -1549,13 +1613,4 @@ test_expect_success 'move detection with submodules' '
test_cmp expect decoded_actual
'
-test_expect_success 'move detection with whitespace changes' '
- test_when_finished "git reset --hard" &&
- test_seq 10 >test &&
- git add test &&
- sed s/3/42/ <test >test.tmp &&
- mv test.tmp test &&
- git -c diff.colormoved diff --ignore-space-change -- test
-'
-
test_done
--
2.15.0.rc1.560.g5f0609e481
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 3/5] t4015: test the output of "diff --color-moved -b"
2017-10-19 20:26 ` [PATCH 3/5] t4015: test the output of "diff --color-moved -b" Jeff King
@ 2017-10-19 21:03 ` Stefan Beller
2017-10-19 21:14 ` Jeff King
0 siblings, 1 reply; 34+ messages in thread
From: Stefan Beller @ 2017-10-19 21:03 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git@vger.kernel.org, Orgad Shaneh
On Thu, Oct 19, 2017 at 1:26 PM, Jeff King <peff@peff.net> wrote:
> Commit fa5ba2c1dd (diff: fix infinite loop with
> --color-moved --ignore-space-change, 2017-10-12) added a
> test to make sure that "--color-moved -b" doesn't run
> forever, but the test in question doesn't actually have any
> moved lines in it.
>
> Let's scrap that test and add a variant of the existing
> "--color-moved -w" test, but this time we'll check that we
> find the move with whitespace changes, but not arbitrary
> whitespace additions.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> +
> + git diff HEAD --no-renames --color-moved --color |
> + grep -v "index" |
> + test_decode_color >actual &&
The -v index grep makes it future proof (for a new hash)!
I like that. What I do not like is the pipe at the end of
git-diff as we certainly want to inspect the return code
(or for a segfault ;)
> + git diff HEAD --no-renames -b --color-moved --color |
> + grep -v "index" |
> + test_decode_color >actual &&
Do we need (or rather want) to give --color additionally,
instead of having --color-moved imply it? I guess this is
extra carefulness from the "color always" series?
> test_expect_success 'clean up whitespace-test colors' '
> git config --unset color.diff.oldMoved &&
> git config --unset color.diff.newMoved
> @@ -1549,13 +1613,4 @@ test_expect_success 'move detection with submodules' '
> test_cmp expect decoded_actual
> '
>
> -test_expect_success 'move detection with whitespace changes' '
> - test_when_finished "git reset --hard" &&
> - test_seq 10 >test &&
> - git add test &&
> - sed s/3/42/ <test >test.tmp &&
> - mv test.tmp test &&
> - git -c diff.colormoved diff --ignore-space-change -- test
> -'
Ok, this is covered above in testing for the unchanged lines (1-5)
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 3/5] t4015: test the output of "diff --color-moved -b"
2017-10-19 21:03 ` Stefan Beller
@ 2017-10-19 21:14 ` Jeff King
0 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2017-10-19 21:14 UTC (permalink / raw)
To: Stefan Beller; +Cc: Junio C Hamano, git@vger.kernel.org, Orgad Shaneh
On Thu, Oct 19, 2017 at 02:03:54PM -0700, Stefan Beller wrote:
> > +
> > + git diff HEAD --no-renames --color-moved --color |
> > + grep -v "index" |
> > + test_decode_color >actual &&
>
> The -v index grep makes it future proof (for a new hash)!
> I like that. What I do not like is the pipe at the end of
> git-diff as we certainly want to inspect the return code
> (or for a segfault ;)
I can claim neither credit nor failure here. These lines are copied
straight from your existing test. :)
> > + git diff HEAD --no-renames -b --color-moved --color |
> > + grep -v "index" |
> > + test_decode_color >actual &&
>
> Do we need (or rather want) to give --color additionally,
> instead of having --color-moved imply it? I guess this is
> extra carefulness from the "color always" series?
I do think --color-moved probably ought to imply --color (or at least
--color=auto). But AFAIK it doesn't do so now, so we need --color (and
if it implied "auto", we'd have to overcome that anyway).
But again, this comes straight from the existing test.
> > -test_expect_success 'move detection with whitespace changes' '
> > - test_when_finished "git reset --hard" &&
> > - test_seq 10 >test &&
> > - git add test &&
> > - sed s/3/42/ <test >test.tmp &&
> > - mv test.tmp test &&
> > - git -c diff.colormoved diff --ignore-space-change -- test
> > -'
>
> Ok, this is covered above in testing for the unchanged lines (1-5)
Yes, though I think it's actually the _changed_ lines which do it (the
problem was in next_byte(), so any diff with "--color-moved -b", whether
it had moves or not, would infinite loop as long as it had lines without
whitespace at the end).
-Peff
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 4/5] diff: fix whitespace-skipping with --color-moved
2017-10-19 20:23 ` [PATCH 0/5] fix "diff --color-moved --ignore-space-at-eol" Jeff King
` (2 preceding siblings ...)
2017-10-19 20:26 ` [PATCH 3/5] t4015: test the output of "diff --color-moved -b" Jeff King
@ 2017-10-19 20:29 ` Jeff King
2017-10-19 21:15 ` Stefan Beller
2017-10-20 7:23 ` Simon Ruderich
2017-10-19 20:31 ` [PATCH 5/5] diff: handle NULs in get_string_hash() Jeff King
4 siblings, 2 replies; 34+ messages in thread
From: Jeff King @ 2017-10-19 20:29 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Stefan Beller, git, orgads
The code for handling whitespace with --color-moved
represents partial strings as a pair of pointers. There are
two possible conventions for the end pointer:
1. It points to the byte right after the end of the
string.
2. It points to the final byte of the string.
But we seem to use both conventions in the code:
a. we assign the initial pointers from the NUL-terminated
string using (1)
b. we eat trailing whitespace by checking the second
pointer for isspace(), which needs (2)
c. the next_byte() function checks for end-of-string with
"if (cp > endp)", which is (2)
d. in next_byte() we skip past internal whitespace with
"while (cp < end)", which is (1)
This creates fewer bugs than you might think, because there
are some subtle interactions. Because of (a) and (c), we
always return the NUL-terminator from next_byte(). But all
of the callers of next_byte() happen to handle that
gracefully.
Because of the mismatch between (d) and (c), next_byte()
could accidentally return a whitespace character right at
endp. But because of the interaction of (a) and (b), we fail
to actually chomp trailing whitespace, meaning our endp
_always_ points to a NUL, canceling out the problem.
But that does leave (b) as a real bug: when ignoring
whitespace only at the end-of-line, we don't correctly trim
it, and fail to match up lines.
We can fix the whole thing by moving consistently to one
convention. Since convention (1) is idiomatic in our code
base, we'll pick that one.
The existing "-w" and "-b" tests continue to pass, and a new
"--ignore-space-at-eol" shows off the breakage we're fixing.
Signed-off-by: Jeff King <peff@peff.net>
---
diff.c | 15 +++++++----
t/t4015-diff-whitespace.sh | 67 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 77 insertions(+), 5 deletions(-)
diff --git a/diff.c b/diff.c
index 6fd288420b..09081a207c 100644
--- a/diff.c
+++ b/diff.c
@@ -712,7 +712,7 @@ static int next_byte(const char **cp, const char **endp,
{
int retval;
- if (*cp > *endp)
+ if (*cp >= *endp)
return -1;
if (isspace(**cp)) {
@@ -729,7 +729,12 @@ static int next_byte(const char **cp, const char **endp,
if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE)) {
while (*cp < *endp && isspace(**cp))
(*cp)++;
- /* return the first non-ws character via the usual below */
+ /*
+ * return the first non-ws character via the usual
+ * below, unless we ate all of the bytes
+ */
+ if (*cp >= *endp)
+ return -1;
}
}
@@ -750,9 +755,9 @@ static int moved_entry_cmp(const struct diff_options *diffopt,
return a->es->len != b->es->len || memcmp(ap, bp, a->es->len);
if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_AT_EOL)) {
- while (ae > ap && isspace(*ae))
+ while (ae > ap && isspace(ae[-1]))
ae--;
- while (be > bp && isspace(*be))
+ while (be > bp && isspace(be[-1]))
be--;
}
@@ -775,7 +780,7 @@ static unsigned get_string_hash(struct emitted_diff_symbol *es, struct diff_opti
int c;
strbuf_reset(&sb);
- while (ae > ap && isspace(*ae))
+ while (ae > ap && isspace(ae[-1]))
ae--;
while ((c = next_byte(&ap, &ae, o)) > 0)
strbuf_addch(&sb, c);
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 1f54816c6b..eb9431da7d 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1463,6 +1463,73 @@ test_expect_success 'move detection ignoring whitespace changes' '
test_cmp expected actual
'
+test_expect_failure 'move detection ignoring whitespace at eol' '
+ git reset --hard &&
+ # Lines 6-9 have new eol whitespace, but 9 also has it in the middle
+ q_to_tab <<-\EOF >lines.txt &&
+ long line 6Q
+ long line 7Q
+ long line 8Q
+ longQline 9Q
+ line 1
+ line 2
+ line 3
+ line 4
+ line 5
+ EOF
+
+ # avoid cluttering the output with complaints about our eol whitespace
+ test_config core.whitespace -blank-at-eol &&
+
+ git diff HEAD --no-renames --color-moved --color |
+ grep -v "index" |
+ test_decode_color >actual &&
+ cat <<-\EOF >expected &&
+ <BOLD>diff --git a/lines.txt b/lines.txt<RESET>
+ <BOLD>--- a/lines.txt<RESET>
+ <BOLD>+++ b/lines.txt<RESET>
+ <CYAN>@@ -1,9 +1,9 @@<RESET>
+ <GREEN>+<RESET><GREEN>long line 6 <RESET>
+ <GREEN>+<RESET><GREEN>long line 7 <RESET>
+ <GREEN>+<RESET><GREEN>long line 8 <RESET>
+ <GREEN>+<RESET><GREEN>long line 9 <RESET>
+ line 1<RESET>
+ line 2<RESET>
+ line 3<RESET>
+ line 4<RESET>
+ line 5<RESET>
+ <RED>-long line 6<RESET>
+ <RED>-long line 7<RESET>
+ <RED>-long line 8<RESET>
+ <RED>-long line 9<RESET>
+ EOF
+ test_cmp expected actual &&
+
+ git diff HEAD --no-renames --ignore-space-at-eol --color-moved --color |
+ grep -v "index" |
+ test_decode_color >actual &&
+ cat <<-\EOF >expected &&
+ <BOLD>diff --git a/lines.txt b/lines.txt<RESET>
+ <BOLD>--- a/lines.txt<RESET>
+ <BOLD>+++ b/lines.txt<RESET>
+ <CYAN>@@ -1,9 +1,9 @@<RESET>
+ <CYAN>+<RESET><CYAN>long line 6 <RESET>
+ <CYAN>+<RESET><CYAN>long line 7 <RESET>
+ <CYAN>+<RESET><CYAN>long line 8 <RESET>
+ <GREEN>+<RESET><GREEN>long line 9 <RESET>
+ line 1<RESET>
+ line 2<RESET>
+ line 3<RESET>
+ line 4<RESET>
+ line 5<RESET>
+ <MAGENTA>-long line 6<RESET>
+ <MAGENTA>-long line 7<RESET>
+ <MAGENTA>-long line 8<RESET>
+ <RED>-long line 9<RESET>
+ EOF
+ test_cmp expected actual
+'
+
test_expect_success 'clean up whitespace-test colors' '
git config --unset color.diff.oldMoved &&
git config --unset color.diff.newMoved
--
2.15.0.rc1.560.g5f0609e481
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 4/5] diff: fix whitespace-skipping with --color-moved
2017-10-19 20:29 ` [PATCH 4/5] diff: fix whitespace-skipping with --color-moved Jeff King
@ 2017-10-19 21:15 ` Stefan Beller
2017-10-19 21:19 ` Jeff King
2017-10-20 7:23 ` Simon Ruderich
1 sibling, 1 reply; 34+ messages in thread
From: Stefan Beller @ 2017-10-19 21:15 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git@vger.kernel.org, Orgad Shaneh
On Thu, Oct 19, 2017 at 1:29 PM, Jeff King <peff@peff.net> wrote:
> The code for handling whitespace with --color-moved
> represents partial strings as a pair of pointers. There are
> two possible conventions for the end pointer:
>
> 1. It points to the byte right after the end of the
> string.
>
> 2. It points to the final byte of the string.
>
> But we seem to use both conventions in the code:
>
> a. we assign the initial pointers from the NUL-terminated
> string using (1)
>
> b. we eat trailing whitespace by checking the second
> pointer for isspace(), which needs (2)
>
> c. the next_byte() function checks for end-of-string with
> "if (cp > endp)", which is (2)
>
> d. in next_byte() we skip past internal whitespace with
> "while (cp < end)", which is (1)
>
> This creates fewer bugs than you might think, because there
> are some subtle interactions. Because of (a) and (c), we
> always return the NUL-terminator from next_byte(). But all
> of the callers of next_byte() happen to handle that
> gracefully.
>
> Because of the mismatch between (d) and (c), next_byte()
> could accidentally return a whitespace character right at
> endp. But because of the interaction of (a) and (b), we fail
> to actually chomp trailing whitespace, meaning our endp
> _always_ points to a NUL, canceling out the problem.
>
> But that does leave (b) as a real bug: when ignoring
> whitespace only at the end-of-line, we don't correctly trim
> it, and fail to match up lines.
>
> We can fix the whole thing by moving consistently to one
> convention. Since convention (1) is idiomatic in our code
> base, we'll pick that one.
>
> The existing "-w" and "-b" tests continue to pass, and a new
> "--ignore-space-at-eol" shows off the breakage we're fixing.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff.c | 15 +++++++----
> t/t4015-diff-whitespace.sh | 67 ++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 77 insertions(+), 5 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 6fd288420b..09081a207c 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -712,7 +712,7 @@ static int next_byte(const char **cp, const char **endp,
> {
> int retval;
>
> - if (*cp > *endp)
> + if (*cp >= *endp)
> return -1;
This converts (c) from (2) to (1).
> while (*cp < *endp && isspace(**cp))
> (*cp)++;
> - /* return the first non-ws character via the usual below */
> + /*
> + * return the first non-ws character via the usual
> + * below, unless we ate all of the bytes
> + */
> + if (*cp >= *endp)
> + return -1;
This fixes the mismatch between (d) and (c).
When I wrote the code, I did not follow proper commenting style
by capitalizing the sentence start and putting periods. :(
(Well it was a single line comment, which I have seen to not
follow style occasionally unlike any multi line comment. Anyway
no need to fix it here and now, just pointing out my bad code
in the beginning.)
> - while (ae > ap && isspace(*ae))
> + while (ae > ap && isspace(ae[-1]))
> ae--;
> - while (be > bp && isspace(*be))
> + while (be > bp && isspace(be[-1]))
> be--;
...
> - while (ae > ap && isspace(*ae))
> + while (ae > ap && isspace(ae[-1]))
These fixes convert (b) to (1), so we're all on (1).
As we check for strict endpointer > firstpointer
(and not >=), the check of -1 is fine, too.
> @@ -1463,6 +1463,73 @@ test_expect_success 'move detection ignoring whitespace changes' '
> test_cmp expected actual
> '
>
> +test_expect_failure 'move detection ignoring whitespace at eol' '
> + git reset --hard &&
> + # Lines 6-9 have new eol whitespace, but 9 also has it in the middle
> + q_to_tab <<-\EOF >lines.txt &&
> + long line 6Q
> + long line 7Q
> + long line 8Q
> + longQline 9Q
> + line 1
> + line 2
> + line 3
> + line 4
> + line 5
> + EOF
> +
> + # avoid cluttering the output with complaints about our eol whitespace
> + test_config core.whitespace -blank-at-eol &&
We avoid the eol space change as we want to test the move detection
without interference. Do we want to test it with that as well?
> + git diff HEAD --no-renames --color-moved --color |
> + grep -v "index" |
> + test_decode_color >actual &&
..
> + git diff HEAD --no-renames --ignore-space-at-eol --color-moved --color |
> + grep -v "index" |
> + test_decode_color >actual &&
..
> + <GREEN>+<RESET><GREEN>long line 9 <RESET>
ok, we also have no interference with space changes,
which we assume is orthogonal.
The commit message really enlightened me,
Thanks!
Stefan
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 4/5] diff: fix whitespace-skipping with --color-moved
2017-10-19 21:15 ` Stefan Beller
@ 2017-10-19 21:19 ` Jeff King
0 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2017-10-19 21:19 UTC (permalink / raw)
To: Stefan Beller; +Cc: Junio C Hamano, git@vger.kernel.org, Orgad Shaneh
On Thu, Oct 19, 2017 at 02:15:12PM -0700, Stefan Beller wrote:
> > +test_expect_failure 'move detection ignoring whitespace at eol' '
> > + git reset --hard &&
> > + # Lines 6-9 have new eol whitespace, but 9 also has it in the middle
> > + q_to_tab <<-\EOF >lines.txt &&
> > + long line 6Q
> > + long line 7Q
> > + long line 8Q
> > + longQline 9Q
> > + line 1
> > + line 2
> > + line 3
> > + line 4
> > + line 5
> > + EOF
> > +
> > + # avoid cluttering the output with complaints about our eol whitespace
> > + test_config core.whitespace -blank-at-eol &&
>
> We avoid the eol space change as we want to test the move detection
> without interference. Do we want to test it with that as well?
I don't think it creates interference. It's just that the expected
output becomes more like:
<CYAN>+<RESET><CYAN>long line 6<RESET><BLUE> </RESET>
and the extra coloring just made it harder for me to read and write the
test. :)
I agree it might be worth checking the interaction between whitespace
coloring and --color-moved, but I think we'd want it to be more thorough
(e.g., covering the beginning of line with indent-with-non-tab or
similar).
> The commit message really enlightened me,
> Thanks!
Thanks for reviewing.
-Peff
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 4/5] diff: fix whitespace-skipping with --color-moved
2017-10-19 20:29 ` [PATCH 4/5] diff: fix whitespace-skipping with --color-moved Jeff King
2017-10-19 21:15 ` Stefan Beller
@ 2017-10-20 7:23 ` Simon Ruderich
2017-10-20 22:37 ` Jeff King
1 sibling, 1 reply; 34+ messages in thread
From: Simon Ruderich @ 2017-10-20 7:23 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, Stefan Beller, git, orgads
On Thu, Oct 19, 2017 at 04:29:26PM -0400, Jeff King wrote:
> [snip]
>
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -1463,6 +1463,73 @@ test_expect_success 'move detection ignoring whitespace changes' '
> test_cmp expected actual
> '
>
> +test_expect_failure 'move detection ignoring whitespace at eol' '
Shouldn't this be test_expect_success? According to the commit
message "and a new "--ignore-space-at-eol" shows off the breakage
we're fixing.". I didn't actually run the code so I don't know if
the test fails or not.
Regards
Simon
--
+ privacy is necessary
+ using gnupg http://gnupg.org
+ public key id: 0x92FEFDB7E44C32F9
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 4/5] diff: fix whitespace-skipping with --color-moved
2017-10-20 7:23 ` Simon Ruderich
@ 2017-10-20 22:37 ` Jeff King
0 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2017-10-20 22:37 UTC (permalink / raw)
To: Simon Ruderich; +Cc: Junio C Hamano, Stefan Beller, git, orgads
On Fri, Oct 20, 2017 at 09:23:54AM +0200, Simon Ruderich wrote:
> On Thu, Oct 19, 2017 at 04:29:26PM -0400, Jeff King wrote:
> > [snip]
> >
> > --- a/t/t4015-diff-whitespace.sh
> > +++ b/t/t4015-diff-whitespace.sh
> > @@ -1463,6 +1463,73 @@ test_expect_success 'move detection ignoring whitespace changes' '
> > test_cmp expected actual
> > '
> >
> > +test_expect_failure 'move detection ignoring whitespace at eol' '
>
> Shouldn't this be test_expect_success? According to the commit
> message "and a new "--ignore-space-at-eol" shows off the breakage
> we're fixing.". I didn't actually run the code so I don't know if
> the test fails or not.
Thanks, good catch. I had originally added all the tests in a single
commit, with the intent of flipping this failure to success. But when I
shifted it to just get added here, I accidentally lost that flip.
-Peff
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 5/5] diff: handle NULs in get_string_hash()
2017-10-19 20:23 ` [PATCH 0/5] fix "diff --color-moved --ignore-space-at-eol" Jeff King
` (3 preceding siblings ...)
2017-10-19 20:29 ` [PATCH 4/5] diff: fix whitespace-skipping with --color-moved Jeff King
@ 2017-10-19 20:31 ` Jeff King
2017-10-19 21:31 ` Stefan Beller
4 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2017-10-19 20:31 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Stefan Beller, git, orgads
For computing moved lines, we feed the characters of each
line into a hash. When we've been asked to ignore
whitespace, then we pick each character using next_byte(),
which returns -1 on end-of-string, which it determines using
the start/end pointers we feed it.
However our check of its return value treats "0" the same as
"-1", meaning we'd quit if the string has an embedded NUL.
This is unlikely to ever come up in practice since our line
boundaries generally come from calling strlen() in the first
place.
But it was a bit surprising to me as a reader of the
next_byte() code. And it's possible that we may one day feed
this function with more exotic input, which otherwise works
with arbitrary ptr/len pairs.
Signed-off-by: Jeff King <peff@peff.net>
---
I noticed that we make an extra copy of each line here, just to feed it
to memihash! I guess "-w" is not a critical-performance code path, but
this could be fixed if we could do memhash() incrementally (e.g., by
putting the FNV state into a struct and letting callers "add" to it
incrementally). Maybe an interesting #leftoverbits, though I'd want to
see timing tests that show it's worth doing.
diff.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/diff.c b/diff.c
index 09081a207c..c4a669ffa8 100644
--- a/diff.c
+++ b/diff.c
@@ -782,7 +782,7 @@ static unsigned get_string_hash(struct emitted_diff_symbol *es, struct diff_opti
strbuf_reset(&sb);
while (ae > ap && isspace(ae[-1]))
ae--;
- while ((c = next_byte(&ap, &ae, o)) > 0)
+ while ((c = next_byte(&ap, &ae, o)) >= 0)
strbuf_addch(&sb, c);
return memhash(sb.buf, sb.len);
--
2.15.0.rc1.560.g5f0609e481
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 5/5] diff: handle NULs in get_string_hash()
2017-10-19 20:31 ` [PATCH 5/5] diff: handle NULs in get_string_hash() Jeff King
@ 2017-10-19 21:31 ` Stefan Beller
2017-10-19 21:39 ` Jeff King
0 siblings, 1 reply; 34+ messages in thread
From: Stefan Beller @ 2017-10-19 21:31 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git@vger.kernel.org, Orgad Shaneh
On Thu, Oct 19, 2017 at 1:31 PM, Jeff King <peff@peff.net> wrote:
> For computing moved lines, we feed the characters of each
> line into a hash. When we've been asked to ignore
> whitespace, then we pick each character using next_byte(),
> which returns -1 on end-of-string, which it determines using
> the start/end pointers we feed it.
>
> However our check of its return value treats "0" the same as
> "-1", meaning we'd quit if the string has an embedded NUL.
I agree. The code looks correct.
> This is unlikely to ever come up in practice since our line
> boundaries generally come from calling strlen() in the first
> place.
get_string_hash is called from
prepare_entry, which in turn is called from
add_lines_to_move_detection or mark_color_as_moved
diff_flush_patch_all_file_pairs
that constructs the arguments in
diff_flush_patch
run_diff
run_diff_cmd
builtin_diff (part "/* Crazy xdl interfaces.. */")
xdi_diff_outf( fn_out_consume as arg!)
xdi_diff
xdl_diff
xdl_call_hunk_func
-> fn_out_consume(cb, line, len)
xdl_call_hunk_func however uses pointer arithmetic instead
of strlen. So I think this sentence is not a good idea to put in
the commit message.
It may not occur in practice, due to binary files detection using
NUL as a signal, but conceptually our move-colored(!) diffs
should be compatible with NULs with this patch now.
> But it was a bit surprising to me as a reader of the
> next_byte() code. And it's possible that we may one day feed
> this function with more exotic input, which otherwise works
> with arbitrary ptr/len pairs.
Good point.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I noticed that we make an extra copy of each line here, just to feed it
> to memihash! I guess "-w" is not a critical-performance code path, but
> this could be fixed if we could do memhash() incrementally (e.g., by
> putting the FNV state into a struct and letting callers "add" to it
> incrementally). Maybe an interesting #leftoverbits, though I'd want to
> see timing tests that show it's worth doing.
>
I agree.
Thanks,
Stefan
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 5/5] diff: handle NULs in get_string_hash()
2017-10-19 21:31 ` Stefan Beller
@ 2017-10-19 21:39 ` Jeff King
2017-10-19 21:50 ` Stefan Beller
0 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2017-10-19 21:39 UTC (permalink / raw)
To: Stefan Beller; +Cc: Junio C Hamano, git@vger.kernel.org, Orgad Shaneh
On Thu, Oct 19, 2017 at 02:31:20PM -0700, Stefan Beller wrote:
> > This is unlikely to ever come up in practice since our line
> > boundaries generally come from calling strlen() in the first
> > place.
>
> get_string_hash is called from
> prepare_entry, which in turn is called from
> add_lines_to_move_detection or mark_color_as_moved
> diff_flush_patch_all_file_pairs
>
> that constructs the arguments in
> diff_flush_patch
> run_diff
> run_diff_cmd
> builtin_diff (part "/* Crazy xdl interfaces.. */")
> xdi_diff_outf( fn_out_consume as arg!)
> xdi_diff
> xdl_diff
> xdl_call_hunk_func
> -> fn_out_consume(cb, line, len)
>
> xdl_call_hunk_func however uses pointer arithmetic instead
> of strlen. So I think this sentence is not a good idea to put in
> the commit message.
>
> It may not occur in practice, due to binary files detection using
> NUL as a signal, but conceptually our move-colored(!) diffs
> should be compatible with NULs with this patch now.
Good catch. I just skimmed over all the diff_emit_* functions, which use
strlen(). But at the bottom is emit_add_line(), which has a real length
marker from xdiff.
I agree we wouldn't see NULs in general, but this is maybe fixing "diff
--color-moved -a". I dunno. It's probably not worth digging too much,
since it seems like the right thing to do regardless.
-Peff
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 5/5] diff: handle NULs in get_string_hash()
2017-10-19 21:39 ` Jeff King
@ 2017-10-19 21:50 ` Stefan Beller
0 siblings, 0 replies; 34+ messages in thread
From: Stefan Beller @ 2017-10-19 21:50 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git@vger.kernel.org, Orgad Shaneh
On Thu, Oct 19, 2017 at 2:39 PM, Jeff King <peff@peff.net> wrote:
> On Thu, Oct 19, 2017 at 02:31:20PM -0700, Stefan Beller wrote:
>
>> > This is unlikely to ever come up in practice since our line
>> > boundaries generally come from calling strlen() in the first
>> > place.
>>
>> get_string_hash is called from
>> prepare_entry, which in turn is called from
>> add_lines_to_move_detection or mark_color_as_moved
>> diff_flush_patch_all_file_pairs
>>
>> that constructs the arguments in
>> diff_flush_patch
>> run_diff
>> run_diff_cmd
>> builtin_diff (part "/* Crazy xdl interfaces.. */")
>> xdi_diff_outf( fn_out_consume as arg!)
>> xdi_diff
>> xdl_diff
>> xdl_call_hunk_func
>> -> fn_out_consume(cb, line, len)
>>
>> xdl_call_hunk_func however uses pointer arithmetic instead
>> of strlen. So I think this sentence is not a good idea to put in
>> the commit message.
>>
>> It may not occur in practice, due to binary files detection using
>> NUL as a signal, but conceptually our move-colored(!) diffs
>> should be compatible with NULs with this patch now.
>
> Good catch. I just skimmed over all the diff_emit_* functions, which use
> strlen(). But at the bottom is emit_add_line(), which has a real length
> marker from xdiff.
Doh!
There is diff_emit_submodule_* but I presume you meant
emit_diff_* actually as there the strlen is legit, as we generate
known non-NUL data to print.
The submodule output may get mangled in diff_emit_submodule_{add,del}
as the input is coming from user data.
> I agree we wouldn't see NULs in general, but this is maybe fixing "diff
> --color-moved -a". I dunno. It's probably not worth digging too much,
> since it seems like the right thing to do regardless.
I agree, that this digging seems not worth it; I was just agitated at
the seemingly incorrect commit message.
Thanks,
Stefan
>
> -Peff
^ permalink raw reply [flat|nested] 34+ messages in thread