git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
* Out of memory with diff.colormoved enabled
@ 2017-10-12 19:53 Orgad Shaneh
  2017-10-12 20:05 ` Jeff King
  0 siblings, 1 reply; 34+ messages in thread
From: Orgad Shaneh @ 2017-10-12 19:53 UTC (permalink / raw)
  To: git

Hi,

git version 2.15.0.rc0 (from debian sid package)

There is an infinite loop when colormoved is used with --ignore-space-change:

git init
seq 20 > test
git add test
sed -i 's/9/42/' test
git -c diff.colormoved diff --ignore-space-change -- test

- Orgad

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Out of memory with diff.colormoved enabled
  2017-10-12 19:53 Out of memory with diff.colormoved enabled Orgad Shaneh
@ 2017-10-12 20:05 ` Jeff King
  2017-10-12 22:39   ` Stefan Beller
  2017-10-12 23:33   ` [PATCH] diff.c: increment buffer pointer in all code path Stefan Beller
  0 siblings, 2 replies; 34+ messages in thread
From: Jeff King @ 2017-10-12 20:05 UTC (permalink / raw)
  To: Orgad Shaneh; +Cc: Stefan Beller, git

On Thu, Oct 12, 2017 at 10:53:23PM +0300, Orgad Shaneh wrote:

> There is an infinite loop when colormoved is used with --ignore-space-change:
> 
> git init
> seq 20 > test
> git add test
> sed -i 's/9/42/' test
> git -c diff.colormoved diff --ignore-space-change -- test

Thanks for an easy reproduction recipe.

It looks like the problem is that next_byte() doesn't make any forward
progress in the buffer with --ignore-space-change. We try to convert
whitespace into a single space (I'm not sure why, but I'm not very
familiar with this part of the code). But if there's no space, then the
"cp" pointer never gets advanced.

This fixes it, but I have no idea if it's doing the right thing:

diff --git a/diff.c b/diff.c
index 69f03570ad..e8dedc7357 100644
--- a/diff.c
+++ b/diff.c
@@ -713,13 +713,17 @@ static int next_byte(const char **cp, const char **endp,
 		return -1;
 
 	if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_CHANGE)) {
-		while (*cp < *endp && isspace(**cp))
+		int saw_whitespace = 0;
+		while (*cp < *endp && isspace(**cp)) {
 			(*cp)++;
+			saw_whitespace = 1;
+		}
 		/*
 		 * After skipping a couple of whitespaces, we still have to
 		 * account for one space.
 		 */
-		return (int)' ';
+		if (saw_whitespace)
+			return (int)' ';
 	}
 
 	if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE)) {

I guess it would be equally correct to not enter that if-block unless
isspace(**cp).

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Out of memory with diff.colormoved enabled
  2017-10-12 20:05 ` Jeff King
@ 2017-10-12 22:39   ` Stefan Beller
  2017-10-12 23:33   ` [PATCH] diff.c: increment buffer pointer in all code path Stefan Beller
  1 sibling, 0 replies; 34+ messages in thread
From: Stefan Beller @ 2017-10-12 22:39 UTC (permalink / raw)
  To: Jeff King; +Cc: Orgad Shaneh, git

On Thu, Oct 12, 2017 at 1:05 PM, Jeff King <peff@peff.net> wrote:
> On Thu, Oct 12, 2017 at 10:53:23PM +0300, Orgad Shaneh wrote:
>
>> There is an infinite loop when colormoved is used with --ignore-space-change:
>>
>> git init
>> seq 20 > test
>> git add test
>> sed -i 's/9/42/' test
>> git -c diff.colormoved diff --ignore-space-change -- test
>
> Thanks for an easy reproduction recipe.

Thanks here as well!

> It looks like the problem is that next_byte() doesn't make any forward
> progress in the buffer with --ignore-space-change. We try to convert
> whitespace into a single space

> (I'm not sure why, but I'm not very
> familiar with this part of the code).

(on why you don't feel familiar)
Because it is quite new, and you weren't one of the main reviewers.
2e2d5ac184 (diff.c: color moved lines differently, 2017-06-30) was also very
large, such that it is easy to overlook. Though I remember Junio and me
discussing the next_byte part quite vividly. :/

(On why it is the way it is)
Consider the three strings
    one SP word
    one TAB word
    oneword

The first two are equal, but the third is different with
`--ignore-space-change` given. To achieve that goal,
the easiest thing to do (in my mind) was to replace
any sequence of blank characters by "a standard
space" sequence. That will make all strings with different
white space sequences compare equal, but the one
without blanks will be different.

> But if there's no space, then the
> "cp" pointer never gets advanced.

Right, because of the early return, skipping the increment of *cp

> This fixes it, but I have no idea if it's doing the right thing:
>
> diff --git a/diff.c b/diff.c
> index 69f03570ad..e8dedc7357 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -713,13 +713,17 @@ static int next_byte(const char **cp, const char **endp,
>                 return -1;
>
>         if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_CHANGE)) {
> -               while (*cp < *endp && isspace(**cp))
> +               int saw_whitespace = 0;
> +               while (*cp < *endp && isspace(**cp)) {
>                         (*cp)++;
> +                       saw_whitespace = 1;
> +               }
>                 /*
>                  * After skipping a couple of whitespaces, we still have to
>                  * account for one space.
>                  */
> -               return (int)' ';
> +               if (saw_whitespace)
> +                       return (int)' ';

The "else" is implicit and it falls through to
the standard case at the end of the function,
incrementing *cp, returning the character *cp
pointed at prior to being incremented.

That sounds correct.

>         }
>
>         if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE)) {
>
> I guess it would be equally correct to not enter that if-block unless
> isspace(**cp).

This also sounds correct.

>
> -Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH] diff.c: increment buffer pointer in all code path
  2017-10-12 20:05 ` Jeff King
  2017-10-12 22:39   ` Stefan Beller
@ 2017-10-12 23:33   ` Stefan Beller
  2017-10-13  0:18     ` Jeff King
  1 sibling, 1 reply; 34+ messages in thread
From: Stefan Beller @ 2017-10-12 23:33 UTC (permalink / raw)
  To: peff; +Cc: git, orgads, sbeller

The added test would hang up Git due to an infinite loop. The function
`next_byte()` doesn't make any forward progress in the buffer with
`--ignore-space-change`.

Fix this by only returning early when there was actual white space
to be covered, fall back to the default case at the end of the function
when there is no white space.

Reported-by: Orgad Shaneh <orgads@gmail.com>
Debugged-by: Jeff King <peff@peff.net>
Signed-off-by: Stefan Beller <sbeller@google.com>
---

Peff, feel free to take ownership here. I merely made it to a patch.

Thanks,
Stefan

 diff.c                     | 12 ++++++++----
 t/t4015-diff-whitespace.sh |  8 ++++++++
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index 69f03570ad..6fe84e6994 100644
--- a/diff.c
+++ b/diff.c
@@ -713,13 +713,17 @@ static int next_byte(const char **cp, const char **endp,
 		return -1;
 
 	if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_CHANGE)) {
-		while (*cp < *endp && isspace(**cp))
+		int saw_whitespace = 0;
+		while (*cp < *endp && isspace(**cp)) {
 			(*cp)++;
+			saw_whitespace = 1;
+		}
 		/*
-		 * After skipping a couple of whitespaces, we still have to
-		 * account for one space.
+		 * After skipping a couple of whitespaces,
+		 * we still have to account for one space.
 		 */
-		return (int)' ';
+		if (saw_whitespace)
+			return (int)' ';
 	}
 
 	if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE)) {
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index bd0f75d9f7..c088ae86af 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1530,4 +1530,12 @@ test_expect_success 'move detection with submodules' '
 	test_cmp expect decoded_actual
 '
 
+test_expect_success 'move detection with whitespace changes' '
+	test_seq 10 > test &&
+	git add test &&
+	sed -i "s/3/42/" test &&
+	git -c diff.colormoved diff --ignore-space-change -- test &&
+	git reset --hard
+'
+
 test_done
-- 
2.14.0.rc0.3.g6c2e499285


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] diff.c: increment buffer pointer in all code path
  2017-10-12 23:33   ` [PATCH] diff.c: increment buffer pointer in all code path Stefan Beller
@ 2017-10-13  0:18     ` Jeff King
  2017-10-13  0:20       ` Jeff King
  0 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2017-10-13  0:18 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, orgads

On Thu, Oct 12, 2017 at 04:33:22PM -0700, Stefan Beller wrote:

> Peff, feel free to take ownership here. I merely made it to a patch.

I think compared to my original, it makes sense to actually wrap the
whole thing with a check for the whitespace. You can do it just in the
IGNORE_WHITESPACE_CHANGE conditional, but I think it makes more sense to
cover both. Like the patch below (view with "-w" to see the logic more
clearly).

I also tweaked the test to remove "sed -i", which isn't portable, and
fixed up a few style nits.

-- >8 --
Subject: diff: fix infinite loop with --color-moved --ignore-space-change

The --color-moved code uses next_byte() to advance through
the blob contents. When the user has asked to ignore
whitespace changes, we try to collapse any whitespace change
down to a single space.

However, we enter the conditional block whenever we see the
IGNORE_WHITESPACE_CHANGE flag, even if the next byte isn't
whitespace.

This means that the combination of "--color-moved and
--ignore-space-change" was completely broken. Worse, because
we return from next_byte() without having advanced our
pointer, the function makes no forward progress in the
buffer and loops infinitely.

Fix this by entering the conditional only when we actually
see whitespace. We can apply this also to the
IGNORE_WHITESPACE change. That code path isn't buggy
(because it falls through to returning the next
non-whitespace byte), but it makes the logic more clear if
we only bother to look at whitespace flags after seeing that
the next byte is whitespace.

Reported-by: Orgad Shaneh <orgads@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 diff.c                     | 28 +++++++++++++++-------------
 t/t4015-diff-whitespace.sh |  9 +++++++++
 2 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/diff.c b/diff.c
index 69f03570ad..d76bb937c1 100644
--- a/diff.c
+++ b/diff.c
@@ -712,20 +712,22 @@ static int next_byte(const char **cp, const char **endp,
 	if (*cp > *endp)
 		return -1;
 
-	if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_CHANGE)) {
-		while (*cp < *endp && isspace(**cp))
-			(*cp)++;
-		/*
-		 * After skipping a couple of whitespaces, we still have to
-		 * account for one space.
-		 */
-		return (int)' ';
-	}
+	if (isspace(**cp)) {
+		if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_CHANGE)) {
+			while (*cp < *endp && isspace(**cp))
+				(*cp)++;
+			/*
+			 * After skipping a couple of whitespaces,
+			 * we still have to account for one space.
+			 */
+			return (int)' ';
+		}
 
-	if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE)) {
-		while (*cp < *endp && isspace(**cp))
-			(*cp)++;
-		/* return the first non-ws character via the usual below */
+		if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE)) {
+			while (*cp < *endp && isspace(**cp))
+				(*cp)++;
+			/* return the first non-ws character via the usual below */
+		}
 	}
 
 	retval = (unsigned char)(**cp);
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index bd0f75d9f7..87083f728f 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1530,4 +1530,13 @@ test_expect_success 'move detection with submodules' '
 	test_cmp expect decoded_actual
 '
 
+test_expect_success 'move detection with whitespace changes' '
+	test_when_finished "git reset --hard" &&
+	test_seq 10 >test &&
+	git add test &&
+	sed s/3/42/ <test >test.tmp &&
+	mv test.tmp test &&
+	git -c diff.colormoved diff --ignore-space-change -- test
+'
+
 test_done
-- 
2.15.0.rc1.381.g94fac273d4


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] diff.c: increment buffer pointer in all code path
  2017-10-13  0:18     ` Jeff King
@ 2017-10-13  0:20       ` Jeff King
  2017-10-13  0:24         ` Stefan Beller
  2017-10-19  5:04         ` Jeff King
  0 siblings, 2 replies; 34+ messages in thread
From: Jeff King @ 2017-10-13  0:20 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, orgads

On Thu, Oct 12, 2017 at 08:18:37PM -0400, Jeff King wrote:

> Fix this by entering the conditional only when we actually
> see whitespace. We can apply this also to the
> IGNORE_WHITESPACE change. That code path isn't buggy
> (because it falls through to returning the next
> non-whitespace byte), but it makes the logic more clear if
> we only bother to look at whitespace flags after seeing that
> the next byte is whitespace.

I think there actually _is_ a bug in that code path, but it's unrelated
to this one. If you have whitespace at the end of the buffer, then we'd
advance *cp until it matches *endp, and then return whatever is at *endp
(which is nonsense, or probably a NUL) rather than returning "-1".

I'm out of time for tonight and not familiar enough with the color-moved
code to come up with a reasonable test case quickly, but maybe you can
see if that can trigger bad behavior?

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] diff.c: increment buffer pointer in all code path
  2017-10-13  0:20       ` Jeff King
@ 2017-10-13  0:24         ` Stefan Beller
  2017-10-19  5:04         ` Jeff King
  1 sibling, 0 replies; 34+ messages in thread
From: Stefan Beller @ 2017-10-13  0:24 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Orgad Shaneh

On Thu, Oct 12, 2017 at 5:20 PM, Jeff King <peff@peff.net> wrote:
> On Thu, Oct 12, 2017 at 08:18:37PM -0400, Jeff King wrote:
>
>> Fix this by entering the conditional only when we actually
>> see whitespace. We can apply this also to the
>> IGNORE_WHITESPACE change. That code path isn't buggy
>> (because it falls through to returning the next
>> non-whitespace byte), but it makes the logic more clear if
>> we only bother to look at whitespace flags after seeing that
>> the next byte is whitespace.
>
> I think there actually _is_ a bug in that code path, but it's unrelated
> to this one. If you have whitespace at the end of the buffer, then we'd
> advance *cp until it matches *endp, and then return whatever is at *endp
> (which is nonsense, or probably a NUL) rather than returning "-1".

Good catch! This plays together interestingly with
IGNORE_WHITESPACE_AT_EOL, too. If that is set the endp is just put
further to the front, so we'd actually compare white spaces after endp.

If that flag is not set, we'd get NUL or nonsense.

> I'm out of time for tonight and not familiar enough with the color-moved
> code to come up with a reasonable test case quickly, but maybe you can
> see if that can trigger bad behavior?
>
> -Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] diff.c: increment buffer pointer in all code path
  2017-10-13  0:20       ` Jeff King
  2017-10-13  0:24         ` Stefan Beller
@ 2017-10-19  5:04         ` Jeff King
  2017-10-19  5:24           ` Jeff King
  1 sibling, 1 reply; 34+ messages in thread
From: Jeff King @ 2017-10-19  5:04 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, orgads

On Thu, Oct 12, 2017 at 08:20:57PM -0400, Jeff King wrote:

> On Thu, Oct 12, 2017 at 08:18:37PM -0400, Jeff King wrote:
> 
> > Fix this by entering the conditional only when we actually
> > see whitespace. We can apply this also to the
> > IGNORE_WHITESPACE change. That code path isn't buggy
> > (because it falls through to returning the next
> > non-whitespace byte), but it makes the logic more clear if
> > we only bother to look at whitespace flags after seeing that
> > the next byte is whitespace.
> 
> I think there actually _is_ a bug in that code path, but it's unrelated
> to this one. If you have whitespace at the end of the buffer, then we'd
> advance *cp until it matches *endp, and then return whatever is at *endp
> (which is nonsense, or probably a NUL) rather than returning "-1".
> 
> I'm out of time for tonight and not familiar enough with the color-moved
> code to come up with a reasonable test case quickly, but maybe you can
> see if that can trigger bad behavior?

I found a few moments to follow up on this. I'm not sure if it's a bug
or intentional behavior. It's definitely easy to trigger next_byte()
reading *endp (even without ignoring whitespace, we always dereference
it).

This function is always operating on the buffer from an
emitted_diff_symbol, which is NUL-terminated. So *endp is always NUL. So
we'll return "0" at the end of the string.

There are two callers. The first is get_string_hash(), which does:

  while ((c = next_byte(&ap, &ae, o)) > 0)

so we'll break out on the NUL byte. But it also sometimes shrinks the
end-pointer "ae" to skip trailing whitespace.  Or at least it tries to.
It does:

   const char *ap = es->line, *ae = es->line + es->len;
   ...
   while (ae > ap && isspace(*ae))
           ae--;

but AFAICT that loop will never trigger, since *ae will always point to
NUL to begin with. But it points to the expectation that our end-pointer
does not follow the usual one-past-the-end convention, but rather is
meant to point to the actual last character in the string.

The same problem is repeated in the other caller, moved_entry_cmp(),
which tries to eat trailing whitespace for IGNORE_WHITESPACE_AT_EOL. But
it uses the same loop that starts on NUL and will never trigger.

It then goes on to call next_byte(). The extra NUL there should not
cause any problem, because we are just checking for equality between the
two strings (so if they both return NUL at the same time, good; if not,
then we know they are different).

So. That leaves me with:

  - I'm unclear on whether next_byte() is meant to return that trailing
    NUL or not. I don't think it causes any bugs, but it certainly
    confused me for a function to take a cp/endp pair of pointers, and
    then dereference endp. It might be worth either fixing or clarifying
    with a comment.

  - Those loops to eat trailing whitespace are doing nothing. I'm not
    sure if that all works out because next_byte() eats whitespaces or
    not (I think not, because it doesn't eat whitespace for the
    IGNORE_WHITESPACE_AT_EOL case). But I'm not quite sure what a test
    would look like.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] diff.c: increment buffer pointer in all code path
  2017-10-19  5:04         ` Jeff King
@ 2017-10-19  5:24           ` Jeff King
  2017-10-19  5:30             ` Junio C Hamano
  2017-10-19 19:53             ` [PATCH] diff.c: increment buffer pointer in all code path Stefan Beller
  0 siblings, 2 replies; 34+ messages in thread
From: Jeff King @ 2017-10-19  5:24 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, orgads

On Thu, Oct 19, 2017 at 01:04:59AM -0400, Jeff King wrote:

> So. That leaves me with:
> 
>   - I'm unclear on whether next_byte() is meant to return that trailing
>     NUL or not. I don't think it causes any bugs, but it certainly
>     confused me for a function to take a cp/endp pair of pointers, and
>     then dereference endp. It might be worth either fixing or clarifying
>     with a comment.
> 
>   - Those loops to eat trailing whitespace are doing nothing. I'm not
>     sure if that all works out because next_byte() eats whitespaces or
>     not (I think not, because it doesn't eat whitespace for the
>     IGNORE_WHITESPACE_AT_EOL case). But I'm not quite sure what a test
>     would look like.

I had trouble constructing a test at first, but I think my test lines
just weren't long enough to trigger the movement heuristics. If I switch
to something besides seq, I can do:

  # any input that has reasonably sized lines
  look e | head -50 >file
  git add file
  
  perl -i -ne '
    # pick up lines 20-25 to move to line 40, and
    # add some trailing whitespace to them
    if ($. >= 20 && $. <= 25) {
      s/$/     /;
      $hold .= $_;
    } else {
      print $hold if ($. == 40);
      print;
    }
  ' file

  git diff --color-moved --ignore-space-at-eol

I think that _should_ show the block as moved, but it doesn't. But if I
apply this patch:

diff --git a/diff.c b/diff.c
index 93dccd1817..375d9cf447 100644
--- a/diff.c
+++ b/diff.c
@@ -743,8 +743,8 @@ static int moved_entry_cmp(const struct diff_options *diffopt,
 			   const struct moved_entry *b,
 			   const void *keydata)
 {
-	const char *ap = a->es->line, *ae = a->es->line + a->es->len;
-	const char *bp = b->es->line, *be = b->es->line + b->es->len;
+	const char *ap = a->es->line, *ae = a->es->line + a->es->len - 1;
+	const char *bp = b->es->line, *be = b->es->line + b->es->len - 1;
 
 	if (!(diffopt->xdl_opts & XDF_WHITESPACE_FLAGS))
 		return a->es->len != b->es->len  || memcmp(ap, bp, a->es->len);
@@ -771,7 +771,7 @@ static unsigned get_string_hash(struct emitted_diff_symbol *es, struct diff_opti
 {
 	if (o->xdl_opts & XDF_WHITESPACE_FLAGS) {
 		static struct strbuf sb = STRBUF_INIT;
-		const char *ap = es->line, *ae = es->line + es->len;
+		const char *ap = es->line, *ae = es->line + es->len - 1;
 		int c;
 
 		strbuf_reset(&sb);

it does. It just adjusts our "end pointer" to point to the last valid
character in the string (rather than one past), which seems to be the
convention that those loops (and next_byte) expect.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] diff.c: increment buffer pointer in all code path
  2017-10-19  5:24           ` Jeff King
@ 2017-10-19  5:30             ` Junio C Hamano
  2017-10-19  5:32               ` Junio C Hamano
  2017-10-19  5:42               ` Jeff King
  2017-10-19 19:53             ` [PATCH] diff.c: increment buffer pointer in all code path Stefan Beller
  1 sibling, 2 replies; 34+ messages in thread
From: Junio C Hamano @ 2017-10-19  5:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, git, orgads

Jeff King <peff@peff.net> writes:

> it does. It just adjusts our "end pointer" to point to the last valid
> character in the string (rather than one past), which seems to be the
> convention that those loops (and next_byte) expect.

Yeah I am not sure if I like this comparison at the beginning of the
function:

        static int next_byte(const char **cp, const char **endp,
                             const struct diff_options *diffopt)
        {
                int retval;

                if (*cp > *endp)
                        return -1;

but it says endp _is_ part of valid input, contrary to my intuition.

And your change to the initialization of ae/be in moved_entry_cmp()
makes it consistent with it, I think.

But doesn't it mean ae computation in get_string_hash() also needs a
massaging?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] diff.c: increment buffer pointer in all code path
  2017-10-19  5:30             ` Junio C Hamano
@ 2017-10-19  5:32               ` Junio C Hamano
  2017-10-19  5:32                 ` Jeff King
  2017-10-19  5:42               ` Jeff King
  1 sibling, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2017-10-19  5:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, git, orgads

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> it does. It just adjusts our "end pointer" to point to the last valid
>> character in the string (rather than one past), which seems to be the
>> convention that those loops (and next_byte) expect.
>
> Yeah I am not sure if I like this comparison at the beginning of the
> function:
>
>         static int next_byte(const char **cp, const char **endp,
>                              const struct diff_options *diffopt)
>         {
>                 int retval;
>
>                 if (*cp > *endp)
>                         return -1;
>
> but it says endp _is_ part of valid input, contrary to my intuition.
>
> And your change to the initialization of ae/be in moved_entry_cmp()
> makes it consistent with it, I think.
>
> But doesn't it mean ae computation in get_string_hash() also needs a
> massaging?

Ah, forget the last two lines.  You do do the massaging in your
patch.

My preference actually is to fix next_byte to follow the usual "when
we end, it points one past the valid region", though.

Thanks for digging it through.




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] diff.c: increment buffer pointer in all code path
  2017-10-19  5:32               ` Junio C Hamano
@ 2017-10-19  5:32                 ` Jeff King
  0 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2017-10-19  5:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, git, orgads

On Thu, Oct 19, 2017 at 02:32:04PM +0900, Junio C Hamano wrote:

> > Yeah I am not sure if I like this comparison at the beginning of the
> > function:
> >
> >         static int next_byte(const char **cp, const char **endp,
> >                              const struct diff_options *diffopt)
> >         {
> >                 int retval;
> >
> >                 if (*cp > *endp)
> >                         return -1;
> >
> > but it says endp _is_ part of valid input, contrary to my intuition.
> >
> > And your change to the initialization of ae/be in moved_entry_cmp()
> > makes it consistent with it, I think.
> >
> > But doesn't it mean ae computation in get_string_hash() also needs a
> > massaging?
> 
> Ah, forget the last two lines.  You do do the massaging in your
> patch.

I was just replying so. :)

> My preference actually is to fix next_byte to follow the usual "when
> we end, it points one past the valid region", though.

Yeah, I think that is my preference, too.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] diff.c: increment buffer pointer in all code path
  2017-10-19  5:30             ` Junio C Hamano
  2017-10-19  5:32               ` Junio C Hamano
@ 2017-10-19  5:42               ` Jeff King
  2017-10-19 19:55                 ` Stefan Beller
  2017-10-19 20:23                 ` [PATCH 0/5] fix "diff --color-moved --ignore-space-at-eol" Jeff King
  1 sibling, 2 replies; 34+ messages in thread
From: Jeff King @ 2017-10-19  5:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, git, orgads

On Thu, Oct 19, 2017 at 02:30:08PM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > it does. It just adjusts our "end pointer" to point to the last valid
> > character in the string (rather than one past), which seems to be the
> > convention that those loops (and next_byte) expect.
> 
> Yeah I am not sure if I like this comparison at the beginning of the
> function:
> 
>         static int next_byte(const char **cp, const char **endp,
>                              const struct diff_options *diffopt)
>         {
>                 int retval;
> 
>                 if (*cp > *endp)
>                         return -1;
> 
> but it says endp _is_ part of valid input, contrary to my intuition.

Actually, I think even this function is confused about its convention.
In the line you quote, we clearly treat *endp as part of the input. But
later we do:

  while (*cp < *endp && isspace(**cp))
	  (*cp)++;

meaning that we'd fail to soak up whitespace at *endp. That wouldn't be
so bad if not for the other bug which fails to eat whitespace at endp in
the first place. :)

So I think the right fix is this:

diff --git a/diff.c b/diff.c
index 6fd288420b..09081a207c 100644
--- a/diff.c
+++ b/diff.c
@@ -712,7 +712,7 @@ static int next_byte(const char **cp, const char **endp,
 {
 	int retval;
 
-	if (*cp > *endp)
+	if (*cp >= *endp)
 		return -1;
 
 	if (isspace(**cp)) {
@@ -729,7 +729,12 @@ static int next_byte(const char **cp, const char **endp,
 		if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE)) {
 			while (*cp < *endp && isspace(**cp))
 				(*cp)++;
-			/* return the first non-ws character via the usual below */
+			/*
+			 * return the first non-ws character via the usual
+			 * below, unless we ate all of the bytes
+			 */
+			if (*cp >= *endp)
+				return -1;
 		}
 	}
 
@@ -750,9 +755,9 @@ static int moved_entry_cmp(const struct diff_options *diffopt,
 		return a->es->len != b->es->len  || memcmp(ap, bp, a->es->len);
 
 	if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_AT_EOL)) {
-		while (ae > ap && isspace(*ae))
+		while (ae > ap && isspace(ae[-1]))
 			ae--;
-		while (be > bp && isspace(*be))
+		while (be > bp && isspace(be[-1]))
 			be--;
 	}
 
@@ -775,7 +780,7 @@ static unsigned get_string_hash(struct emitted_diff_symbol *es, struct diff_opti
 		int c;
 
 		strbuf_reset(&sb);
-		while (ae > ap && isspace(*ae))
+		while (ae > ap && isspace(ae[-1]))
 			ae--;
 		while ((c = next_byte(&ap, &ae, o)) > 0)
 			strbuf_addch(&sb, c);

It's late here, so I'll wait for comments from Stefan and then try to
wrap it up with a commit message and test tomorrow.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] diff.c: increment buffer pointer in all code path
  2017-10-19  5:24           ` Jeff King
  2017-10-19  5:30             ` Junio C Hamano
@ 2017-10-19 19:53             ` Stefan Beller
  2017-10-19 19:55               ` Jeff King
  1 sibling, 1 reply; 34+ messages in thread
From: Stefan Beller @ 2017-10-19 19:53 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Orgad Shaneh

On Wed, Oct 18, 2017 at 10:24 PM, Jeff King <peff@peff.net> wrote:
> On Thu, Oct 19, 2017 at 01:04:59AM -0400, Jeff King wrote:
>
>> So. That leaves me with:
>>
>>   - I'm unclear on whether next_byte() is meant to return that trailing
>>     NUL or not. I don't think it causes any bugs, but it certainly
>>     confused me for a function to take a cp/endp pair of pointers, and
>>     then dereference endp. It might be worth either fixing or clarifying
>>     with a comment.
>>
>>   - Those loops to eat trailing whitespace are doing nothing. I'm not
>>     sure if that all works out because next_byte() eats whitespaces or
>>     not (I think not, because it doesn't eat whitespace for the
>>     IGNORE_WHITESPACE_AT_EOL case). But I'm not quite sure what a test
>>     would look like.
>
> I had trouble constructing a test at first, but I think my test lines
> just weren't long enough to trigger the movement heuristics. If I switch
> to something besides seq, I can do:
>
>   # any input that has reasonably sized lines
>   look e | head -50 >file
>   git add file
>
>   perl -i -ne '
>     # pick up lines 20-25 to move to line 40, and
>     # add some trailing whitespace to them
>     if ($. >= 20 && $. <= 25) {
>       s/$/     /;
>       $hold .= $_;
>     } else {
>       print $hold if ($. == 40);
>       print;
>     }
>   ' file
>
>   git diff --color-moved --ignore-space-at-eol
>
> I think that _should_ show the block as moved, but it doesn't. But if I
> apply this patch:
>
> diff --git a/diff.c b/diff.c
> index 93dccd1817..375d9cf447 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -743,8 +743,8 @@ static int moved_entry_cmp(const struct diff_options *diffopt,
>                            const struct moved_entry *b,
>                            const void *keydata)
>  {
> -       const char *ap = a->es->line, *ae = a->es->line + a->es->len;
> -       const char *bp = b->es->line, *be = b->es->line + b->es->len;
> +       const char *ap = a->es->line, *ae = a->es->line + a->es->len - 1;
> +       const char *bp = b->es->line, *be = b->es->line + b->es->len - 1;
>
>         if (!(diffopt->xdl_opts & XDF_WHITESPACE_FLAGS))
>                 return a->es->len != b->es->len  || memcmp(ap, bp, a->es->len);
> @@ -771,7 +771,7 @@ static unsigned get_string_hash(struct emitted_diff_symbol *es, struct diff_opti
>  {
>         if (o->xdl_opts & XDF_WHITESPACE_FLAGS) {
>                 static struct strbuf sb = STRBUF_INIT;
> -               const char *ap = es->line, *ae = es->line + es->len;
> +               const char *ap = es->line, *ae = es->line + es->len - 1;
>                 int c;
>
>                 strbuf_reset(&sb);
>
> it does. It just adjusts our "end pointer" to point to the last valid
> character in the string (rather than one past),

Thanks for spotting. I can send a proper patch with tests if you'd like.


> which seems to be the
> convention that those loops (and next_byte) expect.

I'll look at that again.

Thanks for poking!
Stefan

>
> -Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] diff.c: increment buffer pointer in all code path
  2017-10-19  5:42               ` Jeff King
@ 2017-10-19 19:55                 ` Stefan Beller
  2017-10-19 20:23                 ` [PATCH 0/5] fix "diff --color-moved --ignore-space-at-eol" Jeff King
  1 sibling, 0 replies; 34+ messages in thread
From: Stefan Beller @ 2017-10-19 19:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Orgad Shaneh

On Wed, Oct 18, 2017 at 10:42 PM, Jeff King <peff@peff.net> wrote:

>
> So I think the right fix is this:
>
[...]
>
> It's late here, so I'll wait for comments from Stefan and then try to
> wrap it up with a commit message and test tomorrow.
>
> -Peff

I agree that this is better and looks correct.
Thanks for offering to send a patch. I'd be happy to review it.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] diff.c: increment buffer pointer in all code path
  2017-10-19 19:53             ` [PATCH] diff.c: increment buffer pointer in all code path Stefan Beller
@ 2017-10-19 19:55               ` Jeff King
  0 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2017-10-19 19:55 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, Orgad Shaneh

On Thu, Oct 19, 2017 at 12:53:03PM -0700, Stefan Beller wrote:

> > @@ -771,7 +771,7 @@ static unsigned get_string_hash(struct emitted_diff_symbol *es, struct diff_opti
> >  {
> >         if (o->xdl_opts & XDF_WHITESPACE_FLAGS) {
> >                 static struct strbuf sb = STRBUF_INIT;
> > -               const char *ap = es->line, *ae = es->line + es->len;
> > +               const char *ap = es->line, *ae = es->line + es->len - 1;
> >                 int c;
> >
> >                 strbuf_reset(&sb);
> >
> > it does. It just adjusts our "end pointer" to point to the last valid
> > character in the string (rather than one past),
> 
> Thanks for spotting. I can send a proper patch with tests if you'd like.

I'm putting the finishing touches on a series that should be out in a
few minutes. Hold off until then.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 0/5] fix "diff --color-moved --ignore-space-at-eol"
  2017-10-19  5:42               ` Jeff King
  2017-10-19 19:55                 ` Stefan Beller
@ 2017-10-19 20:23                 ` Jeff King
  2017-10-19 20:24                   ` [PATCH 1/5] t4015: refactor --color-moved whitespace test Jeff King
                                     ` (4 more replies)
  1 sibling, 5 replies; 34+ messages in thread
From: Jeff King @ 2017-10-19 20:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, git, orgads

On Thu, Oct 19, 2017 at 01:42:46AM -0400, Jeff King wrote:

> It's late here, so I'll wait for comments from Stefan and then try to
> wrap it up with a commit message and test tomorrow.

Here it is.

The fix is in patch 4. The earlier ones are just beefing up the test
coverage, and the last one is a minor cleanup we can do on top.

  [1/5]: t4015: refactor --color-moved whitespace test
  [2/5]: t4015: check "negative" case for "-w --color-moved"
  [3/5]: t4015: test the output of "diff --color-moved -b"
  [4/5]: diff: fix whitespace-skipping with --color-moved
  [5/5]: diff: handle NULs in get_string_hash()

 diff.c                     |  17 ++--
 t/t4015-diff-whitespace.sh | 213 +++++++++++++++++++++++++++++++++++++--------
 2 files changed, 188 insertions(+), 42 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/5] t4015: refactor --color-moved whitespace test
  2017-10-19 20:23                 ` [PATCH 0/5] fix "diff --color-moved --ignore-space-at-eol" Jeff King
@ 2017-10-19 20:24                   ` Jeff King
  2017-10-19 20:56                     ` Stefan Beller
  2017-10-19 20:25                   ` [PATCH 2/5] t4015: check "negative" case for "-w --color-moved" Jeff King
                                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2017-10-19 20:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, git, orgads

In preparation for testing several different whitespace
options, let's split out the setup and cleanup steps of the
whitespace test.

While we're here, let's also switch to using "<<-" to indent
our here-documents properly, and use q_to_tab to more
explicitly mark where we expect whitespace to appear.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t4015-diff-whitespace.sh | 49 +++++++++++++++++++++++++++-------------------
 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 87083f728f..164b502405 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1318,30 +1318,34 @@ test_expect_success 'no effect from --color-moved with --word-diff' '
 	test_cmp expect actual
 '
 
-test_expect_success 'move detection ignoring whitespace ' '
+test_expect_success 'set up whitespace tests' '
 	git reset --hard &&
-	cat <<\EOF >lines.txt &&
-line 1
-line 2
-line 3
-line 4
-long line 5
-long line 6
-long line 7
-EOF
-	git add lines.txt &&
-	git commit -m "add poetry" &&
-	cat <<\EOF >lines.txt &&
+	# Note that these lines have no leading or trailing whitespace.
+	cat <<-\EOF >lines.txt &&
+	line 1
+	line 2
+	line 3
+	line 4
 	long line 5
 	long line 6
 	long line 7
-line 1
-line 2
-line 3
-line 4
-EOF
-	test_config color.diff.oldMoved "magenta" &&
-	test_config color.diff.newMoved "cyan" &&
+	EOF
+	git add lines.txt &&
+	git commit -m "add poetry" &&
+	git config color.diff.oldMoved "magenta" &&
+	git config color.diff.newMoved "cyan"
+'
+
+test_expect_success 'move detection ignoring whitespace ' '
+	q_to_tab <<-\EOF >lines.txt &&
+	Qlong line 5
+	Qlong line 6
+	Qlong line 7
+	line 1
+	line 2
+	line 3
+	line 4
+	EOF
 	git diff HEAD --no-renames --color-moved --color |
 		grep -v "index" |
 		test_decode_color >actual &&
@@ -1385,6 +1389,11 @@ EOF
 	test_cmp expected actual
 '
 
+test_expect_success 'clean up whitespace-test colors' '
+	git config --unset color.diff.oldMoved &&
+	git config --unset color.diff.newMoved
+'
+
 test_expect_success '--color-moved block at end of diff output respects MIN_ALNUM_COUNT' '
 	git reset --hard &&
 	>bar &&
-- 
2.15.0.rc1.560.g5f0609e481


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 2/5] t4015: check "negative" case for "-w --color-moved"
  2017-10-19 20:23                 ` [PATCH 0/5] fix "diff --color-moved --ignore-space-at-eol" Jeff King
  2017-10-19 20:24                   ` [PATCH 1/5] t4015: refactor --color-moved whitespace test Jeff King
@ 2017-10-19 20:25                   ` Jeff King
  2017-10-19 20:54                     ` Stefan Beller
  2017-10-19 20:26                   ` [PATCH 3/5] t4015: test the output of "diff --color-moved -b" Jeff King
                                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2017-10-19 20:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, git, orgads

We test that lines with whitespace changes are not found by
"--color-moved" by default, but are found if "-w" is added.
Let's add one more twist: a line that has non-whitespace
changes should not be marked as a pure move.

This is perhaps an obvious case for us to get right (and we
do), but as we add more whitespace tests, they will form a
pattern of "make sure this case is a move and this other
case is not".

Note that we have to add a line to our moved block, since
having a too-small block doesn't trigger the "moved"
heuristics.  And we also add a line of context to ensure
that there's more context lines than moved lines (so the
diff shows us moving the lines up, rather than moving the
context down).

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t4015-diff-whitespace.sh | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 164b502405..503c9bc7f3 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1326,9 +1326,11 @@ test_expect_success 'set up whitespace tests' '
 	line 2
 	line 3
 	line 4
-	long line 5
+	line 5
 	long line 6
 	long line 7
+	long line 8
+	long line 9
 	EOF
 	git add lines.txt &&
 	git commit -m "add poetry" &&
@@ -1338,13 +1340,15 @@ test_expect_success 'set up whitespace tests' '
 
 test_expect_success 'move detection ignoring whitespace ' '
 	q_to_tab <<-\EOF >lines.txt &&
-	Qlong line 5
 	Qlong line 6
 	Qlong line 7
+	Qlong line 8
+	Qchanged long line 9
 	line 1
 	line 2
 	line 3
 	line 4
+	line 5
 	EOF
 	git diff HEAD --no-renames --color-moved --color |
 		grep -v "index" |
@@ -1353,17 +1357,20 @@ test_expect_success 'move detection ignoring whitespace ' '
 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
 	<BOLD>--- a/lines.txt<RESET>
 	<BOLD>+++ b/lines.txt<RESET>
-	<CYAN>@@ -1,7 +1,7 @@<RESET>
-	<GREEN>+<RESET>	<GREEN>long line 5<RESET>
+	<CYAN>@@ -1,9 +1,9 @@<RESET>
 	<GREEN>+<RESET>	<GREEN>long line 6<RESET>
 	<GREEN>+<RESET>	<GREEN>long line 7<RESET>
+	<GREEN>+<RESET>	<GREEN>long line 8<RESET>
+	<GREEN>+<RESET>	<GREEN>changed long line 9<RESET>
 	 line 1<RESET>
 	 line 2<RESET>
 	 line 3<RESET>
 	 line 4<RESET>
-	<RED>-long line 5<RESET>
+	 line 5<RESET>
 	<RED>-long line 6<RESET>
 	<RED>-long line 7<RESET>
+	<RED>-long line 8<RESET>
+	<RED>-long line 9<RESET>
 	EOF
 	test_cmp expected actual &&
 
@@ -1374,17 +1381,20 @@ test_expect_success 'move detection ignoring whitespace ' '
 	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
 	<BOLD>--- a/lines.txt<RESET>
 	<BOLD>+++ b/lines.txt<RESET>
-	<CYAN>@@ -1,7 +1,7 @@<RESET>
-	<CYAN>+<RESET>	<CYAN>long line 5<RESET>
+	<CYAN>@@ -1,9 +1,9 @@<RESET>
 	<CYAN>+<RESET>	<CYAN>long line 6<RESET>
 	<CYAN>+<RESET>	<CYAN>long line 7<RESET>
+	<CYAN>+<RESET>	<CYAN>long line 8<RESET>
+	<GREEN>+<RESET>	<GREEN>changed long line 9<RESET>
 	 line 1<RESET>
 	 line 2<RESET>
 	 line 3<RESET>
 	 line 4<RESET>
-	<MAGENTA>-long line 5<RESET>
+	 line 5<RESET>
 	<MAGENTA>-long line 6<RESET>
 	<MAGENTA>-long line 7<RESET>
+	<MAGENTA>-long line 8<RESET>
+	<RED>-long line 9<RESET>
 	EOF
 	test_cmp expected actual
 '
-- 
2.15.0.rc1.560.g5f0609e481


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 3/5] t4015: test the output of "diff --color-moved -b"
  2017-10-19 20:23                 ` [PATCH 0/5] fix "diff --color-moved --ignore-space-at-eol" Jeff King
  2017-10-19 20:24                   ` [PATCH 1/5] t4015: refactor --color-moved whitespace test Jeff King
  2017-10-19 20:25                   ` [PATCH 2/5] t4015: check "negative" case for "-w --color-moved" Jeff King
@ 2017-10-19 20:26                   ` Jeff King
  2017-10-19 21:03                     ` Stefan Beller
  2017-10-19 20:29                   ` [PATCH 4/5] diff: fix whitespace-skipping with --color-moved Jeff King
  2017-10-19 20:31                   ` [PATCH 5/5] diff: handle NULs in get_string_hash() Jeff King
  4 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2017-10-19 20:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, git, orgads

Commit fa5ba2c1dd (diff: fix infinite loop with
--color-moved --ignore-space-change, 2017-10-12) added a
test to make sure that "--color-moved -b" doesn't run
forever, but the test in question doesn't actually have any
moved lines in it.

Let's scrap that test and add a variant of the existing
"--color-moved -w" test, but this time we'll check that we
find the move with whitespace changes, but not arbitrary
whitespace additions.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t4015-diff-whitespace.sh | 73 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 64 insertions(+), 9 deletions(-)

diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 503c9bc7f3..1f54816c6b 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1399,6 +1399,70 @@ test_expect_success 'move detection ignoring whitespace ' '
 	test_cmp expected actual
 '
 
+test_expect_success 'move detection ignoring whitespace changes' '
+	git reset --hard &&
+	# Lines 6-8 have a space change, but 9 is new whitespace
+	q_to_tab <<-\EOF >lines.txt &&
+	longQline 6
+	longQline 7
+	longQline 8
+	long liQne 9
+	line 1
+	line 2
+	line 3
+	line 4
+	line 5
+	EOF
+
+	git diff HEAD --no-renames --color-moved --color |
+		grep -v "index" |
+		test_decode_color >actual &&
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
+	<BOLD>--- a/lines.txt<RESET>
+	<BOLD>+++ b/lines.txt<RESET>
+	<CYAN>@@ -1,9 +1,9 @@<RESET>
+	<GREEN>+<RESET><GREEN>long	line 6<RESET>
+	<GREEN>+<RESET><GREEN>long	line 7<RESET>
+	<GREEN>+<RESET><GREEN>long	line 8<RESET>
+	<GREEN>+<RESET><GREEN>long li	ne 9<RESET>
+	 line 1<RESET>
+	 line 2<RESET>
+	 line 3<RESET>
+	 line 4<RESET>
+	 line 5<RESET>
+	<RED>-long line 6<RESET>
+	<RED>-long line 7<RESET>
+	<RED>-long line 8<RESET>
+	<RED>-long line 9<RESET>
+	EOF
+	test_cmp expected actual &&
+
+	git diff HEAD --no-renames -b --color-moved --color |
+		grep -v "index" |
+		test_decode_color >actual &&
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
+	<BOLD>--- a/lines.txt<RESET>
+	<BOLD>+++ b/lines.txt<RESET>
+	<CYAN>@@ -1,9 +1,9 @@<RESET>
+	<CYAN>+<RESET><CYAN>long	line 6<RESET>
+	<CYAN>+<RESET><CYAN>long	line 7<RESET>
+	<CYAN>+<RESET><CYAN>long	line 8<RESET>
+	<GREEN>+<RESET><GREEN>long li	ne 9<RESET>
+	 line 1<RESET>
+	 line 2<RESET>
+	 line 3<RESET>
+	 line 4<RESET>
+	 line 5<RESET>
+	<MAGENTA>-long line 6<RESET>
+	<MAGENTA>-long line 7<RESET>
+	<MAGENTA>-long line 8<RESET>
+	<RED>-long line 9<RESET>
+	EOF
+	test_cmp expected actual
+'
+
 test_expect_success 'clean up whitespace-test colors' '
 	git config --unset color.diff.oldMoved &&
 	git config --unset color.diff.newMoved
@@ -1549,13 +1613,4 @@ test_expect_success 'move detection with submodules' '
 	test_cmp expect decoded_actual
 '
 
-test_expect_success 'move detection with whitespace changes' '
-	test_when_finished "git reset --hard" &&
-	test_seq 10 >test &&
-	git add test &&
-	sed s/3/42/ <test >test.tmp &&
-	mv test.tmp test &&
-	git -c diff.colormoved diff --ignore-space-change -- test
-'
-
 test_done
-- 
2.15.0.rc1.560.g5f0609e481


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 4/5] diff: fix whitespace-skipping with --color-moved
  2017-10-19 20:23                 ` [PATCH 0/5] fix "diff --color-moved --ignore-space-at-eol" Jeff King
                                     ` (2 preceding siblings ...)
  2017-10-19 20:26                   ` [PATCH 3/5] t4015: test the output of "diff --color-moved -b" Jeff King
@ 2017-10-19 20:29                   ` Jeff King
  2017-10-19 21:15                     ` Stefan Beller
  2017-10-20  7:23                     ` Simon Ruderich
  2017-10-19 20:31                   ` [PATCH 5/5] diff: handle NULs in get_string_hash() Jeff King
  4 siblings, 2 replies; 34+ messages in thread
From: Jeff King @ 2017-10-19 20:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, git, orgads

The code for handling whitespace with --color-moved
represents partial strings as a pair of pointers. There are
two possible conventions for the end pointer:

  1. It points to the byte right after the end of the
     string.

  2. It points to the final byte of the string.

But we seem to use both conventions in the code:

  a. we assign the initial pointers from the NUL-terminated
     string using (1)

  b. we eat trailing whitespace by checking the second
     pointer for isspace(), which needs (2)

  c. the next_byte() function checks for end-of-string with
     "if (cp > endp)", which is (2)

  d. in next_byte() we skip past internal whitespace with
     "while (cp < end)", which is (1)

This creates fewer bugs than you might think, because there
are some subtle interactions. Because of (a) and (c), we
always return the NUL-terminator from next_byte(). But all
of the callers of next_byte() happen to handle that
gracefully.

Because of the mismatch between (d) and (c), next_byte()
could accidentally return a whitespace character right at
endp. But because of the interaction of (a) and (b), we fail
to actually chomp trailing whitespace, meaning our endp
_always_ points to a NUL, canceling out the problem.

But that does leave (b) as a real bug: when ignoring
whitespace only at the end-of-line, we don't correctly trim
it, and fail to match up lines.

We can fix the whole thing by moving consistently to one
convention. Since convention (1) is idiomatic in our code
base, we'll pick that one.

The existing "-w" and "-b" tests continue to pass, and a new
"--ignore-space-at-eol" shows off the breakage we're fixing.

Signed-off-by: Jeff King <peff@peff.net>
---
 diff.c                     | 15 +++++++----
 t/t4015-diff-whitespace.sh | 67 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index 6fd288420b..09081a207c 100644
--- a/diff.c
+++ b/diff.c
@@ -712,7 +712,7 @@ static int next_byte(const char **cp, const char **endp,
 {
 	int retval;
 
-	if (*cp > *endp)
+	if (*cp >= *endp)
 		return -1;
 
 	if (isspace(**cp)) {
@@ -729,7 +729,12 @@ static int next_byte(const char **cp, const char **endp,
 		if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE)) {
 			while (*cp < *endp && isspace(**cp))
 				(*cp)++;
-			/* return the first non-ws character via the usual below */
+			/*
+			 * return the first non-ws character via the usual
+			 * below, unless we ate all of the bytes
+			 */
+			if (*cp >= *endp)
+				return -1;
 		}
 	}
 
@@ -750,9 +755,9 @@ static int moved_entry_cmp(const struct diff_options *diffopt,
 		return a->es->len != b->es->len  || memcmp(ap, bp, a->es->len);
 
 	if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_AT_EOL)) {
-		while (ae > ap && isspace(*ae))
+		while (ae > ap && isspace(ae[-1]))
 			ae--;
-		while (be > bp && isspace(*be))
+		while (be > bp && isspace(be[-1]))
 			be--;
 	}
 
@@ -775,7 +780,7 @@ static unsigned get_string_hash(struct emitted_diff_symbol *es, struct diff_opti
 		int c;
 
 		strbuf_reset(&sb);
-		while (ae > ap && isspace(*ae))
+		while (ae > ap && isspace(ae[-1]))
 			ae--;
 		while ((c = next_byte(&ap, &ae, o)) > 0)
 			strbuf_addch(&sb, c);
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 1f54816c6b..eb9431da7d 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1463,6 +1463,73 @@ test_expect_success 'move detection ignoring whitespace changes' '
 	test_cmp expected actual
 '
 
+test_expect_failure 'move detection ignoring whitespace at eol' '
+	git reset --hard &&
+	# Lines 6-9 have new eol whitespace, but 9 also has it in the middle
+	q_to_tab <<-\EOF >lines.txt &&
+	long line 6Q
+	long line 7Q
+	long line 8Q
+	longQline 9Q
+	line 1
+	line 2
+	line 3
+	line 4
+	line 5
+	EOF
+
+	# avoid cluttering the output with complaints about our eol whitespace
+	test_config core.whitespace -blank-at-eol &&
+
+	git diff HEAD --no-renames --color-moved --color |
+		grep -v "index" |
+		test_decode_color >actual &&
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
+	<BOLD>--- a/lines.txt<RESET>
+	<BOLD>+++ b/lines.txt<RESET>
+	<CYAN>@@ -1,9 +1,9 @@<RESET>
+	<GREEN>+<RESET><GREEN>long line 6	<RESET>
+	<GREEN>+<RESET><GREEN>long line 7	<RESET>
+	<GREEN>+<RESET><GREEN>long line 8	<RESET>
+	<GREEN>+<RESET><GREEN>long	line 9	<RESET>
+	 line 1<RESET>
+	 line 2<RESET>
+	 line 3<RESET>
+	 line 4<RESET>
+	 line 5<RESET>
+	<RED>-long line 6<RESET>
+	<RED>-long line 7<RESET>
+	<RED>-long line 8<RESET>
+	<RED>-long line 9<RESET>
+	EOF
+	test_cmp expected actual &&
+
+	git diff HEAD --no-renames --ignore-space-at-eol --color-moved --color |
+		grep -v "index" |
+		test_decode_color >actual &&
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
+	<BOLD>--- a/lines.txt<RESET>
+	<BOLD>+++ b/lines.txt<RESET>
+	<CYAN>@@ -1,9 +1,9 @@<RESET>
+	<CYAN>+<RESET><CYAN>long line 6	<RESET>
+	<CYAN>+<RESET><CYAN>long line 7	<RESET>
+	<CYAN>+<RESET><CYAN>long line 8	<RESET>
+	<GREEN>+<RESET><GREEN>long	line 9	<RESET>
+	 line 1<RESET>
+	 line 2<RESET>
+	 line 3<RESET>
+	 line 4<RESET>
+	 line 5<RESET>
+	<MAGENTA>-long line 6<RESET>
+	<MAGENTA>-long line 7<RESET>
+	<MAGENTA>-long line 8<RESET>
+	<RED>-long line 9<RESET>
+	EOF
+	test_cmp expected actual
+'
+
 test_expect_success 'clean up whitespace-test colors' '
 	git config --unset color.diff.oldMoved &&
 	git config --unset color.diff.newMoved
-- 
2.15.0.rc1.560.g5f0609e481


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 5/5] diff: handle NULs in get_string_hash()
  2017-10-19 20:23                 ` [PATCH 0/5] fix "diff --color-moved --ignore-space-at-eol" Jeff King
                                     ` (3 preceding siblings ...)
  2017-10-19 20:29                   ` [PATCH 4/5] diff: fix whitespace-skipping with --color-moved Jeff King
@ 2017-10-19 20:31                   ` Jeff King
  2017-10-19 21:31                     ` Stefan Beller
  4 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2017-10-19 20:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, git, orgads

For computing moved lines, we feed the characters of each
line into a hash. When we've been asked to ignore
whitespace, then we pick each character using next_byte(),
which returns -1 on end-of-string, which it determines using
the start/end pointers we feed it.

However our check of its return value treats "0" the same as
"-1", meaning we'd quit if the string has an embedded NUL.
This is unlikely to ever come up in practice since our line
boundaries generally come from calling strlen() in the first
place.

But it was a bit surprising to me as a reader of the
next_byte() code. And it's possible that we may one day feed
this function with more exotic input, which otherwise works
with arbitrary ptr/len pairs.

Signed-off-by: Jeff King <peff@peff.net>
---
I noticed that we make an extra copy of each line here, just to feed it
to memihash! I guess "-w" is not a critical-performance code path, but
this could be fixed if we could do memhash() incrementally (e.g., by
putting the FNV state into a struct and letting callers "add" to it
incrementally). Maybe an interesting #leftoverbits, though I'd want to
see timing tests that show it's worth doing.

 diff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index 09081a207c..c4a669ffa8 100644
--- a/diff.c
+++ b/diff.c
@@ -782,7 +782,7 @@ static unsigned get_string_hash(struct emitted_diff_symbol *es, struct diff_opti
 		strbuf_reset(&sb);
 		while (ae > ap && isspace(ae[-1]))
 			ae--;
-		while ((c = next_byte(&ap, &ae, o)) > 0)
+		while ((c = next_byte(&ap, &ae, o)) >= 0)
 			strbuf_addch(&sb, c);
 
 		return memhash(sb.buf, sb.len);
-- 
2.15.0.rc1.560.g5f0609e481

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 2/5] t4015: check "negative" case for "-w --color-moved"
  2017-10-19 20:25                   ` [PATCH 2/5] t4015: check "negative" case for "-w --color-moved" Jeff King
@ 2017-10-19 20:54                     ` Stefan Beller
  0 siblings, 0 replies; 34+ messages in thread
From: Stefan Beller @ 2017-10-19 20:54 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Orgad Shaneh

On Thu, Oct 19, 2017 at 1:25 PM, Jeff King <peff@peff.net> wrote:
> We test that lines with whitespace changes are not found by
> "--color-moved" by default, but are found if "-w" is added.
> Let's add one more twist: a line that has non-whitespace
> changes should not be marked as a pure move.
>
> This is perhaps an obvious case for us to get right (and we
> do), but as we add more whitespace tests, they will form a
> pattern of "make sure this case is a move and this other
> case is not".
>
> Note that we have to add a line to our moved block, since
> having a too-small block doesn't trigger the "moved"
> heuristics.  And we also add a line of context to ensure
> that there's more context lines than moved lines (so the
> diff shows us moving the lines up, rather than moving the
> context down).
>
> Signed-off-by: Jeff King <peff@peff.net>

This patch is
Reviewed-by: Stefan Beller <sbeller@google.com>

Thanks!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/5] t4015: refactor --color-moved whitespace test
  2017-10-19 20:24                   ` [PATCH 1/5] t4015: refactor --color-moved whitespace test Jeff King
@ 2017-10-19 20:56                     ` Stefan Beller
  2017-10-19 21:10                       ` Jeff King
  0 siblings, 1 reply; 34+ messages in thread
From: Stefan Beller @ 2017-10-19 20:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Orgad Shaneh

On Thu, Oct 19, 2017 at 1:24 PM, Jeff King <peff@peff.net> wrote:

>
> +test_expect_success 'clean up whitespace-test colors' '
> +       git config --unset color.diff.oldMoved &&
> +       git config --unset color.diff.newMoved
> +'

This could be part of the previous test as

  test_config color.diff.oldMoved "magenta" &&
  test_config color.diff.newMoved "cyan" &&

in the beginning. (That way we also do not pollute the setup,
but keeping it test local).

With or without this nit, this is
Reviewed-by: Stefan Beller <sbeller@google.com>

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/5] t4015: test the output of "diff --color-moved -b"
  2017-10-19 20:26                   ` [PATCH 3/5] t4015: test the output of "diff --color-moved -b" Jeff King
@ 2017-10-19 21:03                     ` Stefan Beller
  2017-10-19 21:14                       ` Jeff King
  0 siblings, 1 reply; 34+ messages in thread
From: Stefan Beller @ 2017-10-19 21:03 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Orgad Shaneh

On Thu, Oct 19, 2017 at 1:26 PM, Jeff King <peff@peff.net> wrote:
> Commit fa5ba2c1dd (diff: fix infinite loop with
> --color-moved --ignore-space-change, 2017-10-12) added a
> test to make sure that "--color-moved -b" doesn't run
> forever, but the test in question doesn't actually have any
> moved lines in it.
>
> Let's scrap that test and add a variant of the existing
> "--color-moved -w" test, but this time we'll check that we
> find the move with whitespace changes, but not arbitrary
> whitespace additions.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---

> +
> +       git diff HEAD --no-renames --color-moved --color |
> +               grep -v "index" |
> +               test_decode_color >actual &&

The -v index grep makes it future proof (for a new hash)!
I like that. What I do not like is the pipe at the end of
git-diff as we certainly want to inspect the return code
(or for a segfault ;)

> +       git diff HEAD --no-renames -b --color-moved --color |
> +               grep -v "index" |
> +               test_decode_color >actual &&

Do we need (or rather want) to give --color additionally,
instead of having --color-moved imply it? I guess this is
extra carefulness from the "color always" series?


>  test_expect_success 'clean up whitespace-test colors' '
>         git config --unset color.diff.oldMoved &&
>         git config --unset color.diff.newMoved
> @@ -1549,13 +1613,4 @@ test_expect_success 'move detection with submodules' '
>         test_cmp expect decoded_actual
>  '
>
> -test_expect_success 'move detection with whitespace changes' '
> -       test_when_finished "git reset --hard" &&
> -       test_seq 10 >test &&
> -       git add test &&
> -       sed s/3/42/ <test >test.tmp &&
> -       mv test.tmp test &&
> -       git -c diff.colormoved diff --ignore-space-change -- test
> -'

Ok, this is covered above in testing for the unchanged lines (1-5)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/5] t4015: refactor --color-moved whitespace test
  2017-10-19 20:56                     ` Stefan Beller
@ 2017-10-19 21:10                       ` Jeff King
  0 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2017-10-19 21:10 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, git, Orgad Shaneh

On Thu, Oct 19, 2017 at 01:56:55PM -0700, Stefan Beller wrote:

> On Thu, Oct 19, 2017 at 1:24 PM, Jeff King <peff@peff.net> wrote:
> 
> >
> > +test_expect_success 'clean up whitespace-test colors' '
> > +       git config --unset color.diff.oldMoved &&
> > +       git config --unset color.diff.newMoved
> > +'
> 
> This could be part of the previous test as
> 
>   test_config color.diff.oldMoved "magenta" &&
>   test_config color.diff.newMoved "cyan" &&
> 
> in the beginning. (That way we also do not pollute the setup,
> but keeping it test local).

It used to be in the previous test as test_config, but part of the setup
refactoring is to keep these common bits across several tests.  Hence we
"git config" in the setup step, and we must "git config --unset" here in
the cleanup. There's only the one test between setup/cleanup right now,
but the point is to add more.

The other alternative (besides repeating ourselves) would be to put all
those common bits into a function and call the function at the top of
each test.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/5] t4015: test the output of "diff --color-moved -b"
  2017-10-19 21:03                     ` Stefan Beller
@ 2017-10-19 21:14                       ` Jeff King
  0 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2017-10-19 21:14 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, git, Orgad Shaneh

On Thu, Oct 19, 2017 at 02:03:54PM -0700, Stefan Beller wrote:

> > +
> > +       git diff HEAD --no-renames --color-moved --color |
> > +               grep -v "index" |
> > +               test_decode_color >actual &&
> 
> The -v index grep makes it future proof (for a new hash)!
> I like that. What I do not like is the pipe at the end of
> git-diff as we certainly want to inspect the return code
> (or for a segfault ;)

I can claim neither credit nor failure here. These lines are copied
straight from your existing test. :)

> > +       git diff HEAD --no-renames -b --color-moved --color |
> > +               grep -v "index" |
> > +               test_decode_color >actual &&
> 
> Do we need (or rather want) to give --color additionally,
> instead of having --color-moved imply it? I guess this is
> extra carefulness from the "color always" series?

I do think --color-moved probably ought to imply --color (or at least
--color=auto). But AFAIK it doesn't do so now, so we need --color (and
if it implied "auto", we'd have to overcome that anyway).

But again, this comes straight from the existing test.

> > -test_expect_success 'move detection with whitespace changes' '
> > -       test_when_finished "git reset --hard" &&
> > -       test_seq 10 >test &&
> > -       git add test &&
> > -       sed s/3/42/ <test >test.tmp &&
> > -       mv test.tmp test &&
> > -       git -c diff.colormoved diff --ignore-space-change -- test
> > -'
> 
> Ok, this is covered above in testing for the unchanged lines (1-5)

Yes, though I think it's actually the _changed_ lines which do it (the
problem was in next_byte(), so any diff with "--color-moved -b", whether
it had moves or not, would infinite loop as long as it had lines without
whitespace at the end).

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] diff: fix whitespace-skipping with --color-moved
  2017-10-19 20:29                   ` [PATCH 4/5] diff: fix whitespace-skipping with --color-moved Jeff King
@ 2017-10-19 21:15                     ` Stefan Beller
  2017-10-19 21:19                       ` Jeff King
  2017-10-20  7:23                     ` Simon Ruderich
  1 sibling, 1 reply; 34+ messages in thread
From: Stefan Beller @ 2017-10-19 21:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Orgad Shaneh

On Thu, Oct 19, 2017 at 1:29 PM, Jeff King <peff@peff.net> wrote:
> The code for handling whitespace with --color-moved
> represents partial strings as a pair of pointers. There are
> two possible conventions for the end pointer:
>
>   1. It points to the byte right after the end of the
>      string.
>
>   2. It points to the final byte of the string.
>
> But we seem to use both conventions in the code:
>
>   a. we assign the initial pointers from the NUL-terminated
>      string using (1)
>
>   b. we eat trailing whitespace by checking the second
>      pointer for isspace(), which needs (2)
>
>   c. the next_byte() function checks for end-of-string with
>      "if (cp > endp)", which is (2)
>
>   d. in next_byte() we skip past internal whitespace with
>      "while (cp < end)", which is (1)
>
> This creates fewer bugs than you might think, because there
> are some subtle interactions. Because of (a) and (c), we
> always return the NUL-terminator from next_byte(). But all
> of the callers of next_byte() happen to handle that
> gracefully.
>
> Because of the mismatch between (d) and (c), next_byte()
> could accidentally return a whitespace character right at
> endp. But because of the interaction of (a) and (b), we fail
> to actually chomp trailing whitespace, meaning our endp
> _always_ points to a NUL, canceling out the problem.
>
> But that does leave (b) as a real bug: when ignoring
> whitespace only at the end-of-line, we don't correctly trim
> it, and fail to match up lines.
>
> We can fix the whole thing by moving consistently to one
> convention. Since convention (1) is idiomatic in our code
> base, we'll pick that one.
>
> The existing "-w" and "-b" tests continue to pass, and a new
> "--ignore-space-at-eol" shows off the breakage we're fixing.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  diff.c                     | 15 +++++++----
>  t/t4015-diff-whitespace.sh | 67 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 77 insertions(+), 5 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 6fd288420b..09081a207c 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -712,7 +712,7 @@ static int next_byte(const char **cp, const char **endp,
>  {
>         int retval;
>
> -       if (*cp > *endp)
> +       if (*cp >= *endp)
>                 return -1;

This converts (c) from (2) to (1).

>                         while (*cp < *endp && isspace(**cp))
>                                 (*cp)++;
> -                       /* return the first non-ws character via the usual below */
> +                       /*
> +                        * return the first non-ws character via the usual
> +                        * below, unless we ate all of the bytes
> +                        */
> +                       if (*cp >= *endp)
> +                               return -1;

This fixes the mismatch between (d) and (c).

When I wrote the code, I did not follow proper commenting style
by capitalizing the sentence start and putting periods. :(
(Well it was a single line comment, which I have seen to not
follow style occasionally unlike any multi line comment. Anyway
no need to fix it here and now, just pointing out my bad code
in the beginning.)

> -               while (ae > ap && isspace(*ae))
> +               while (ae > ap && isspace(ae[-1]))
>                         ae--;
> -               while (be > bp && isspace(*be))
> +               while (be > bp && isspace(be[-1]))
>                         be--;
...
> -               while (ae > ap && isspace(*ae))
> +               while (ae > ap && isspace(ae[-1]))

These fixes convert (b) to (1), so we're all on (1).
As we check for strict endpointer > firstpointer
(and not >=), the check of -1 is fine, too.


> @@ -1463,6 +1463,73 @@ test_expect_success 'move detection ignoring whitespace changes' '
>         test_cmp expected actual
>  '
>
> +test_expect_failure 'move detection ignoring whitespace at eol' '
> +       git reset --hard &&
> +       # Lines 6-9 have new eol whitespace, but 9 also has it in the middle
> +       q_to_tab <<-\EOF >lines.txt &&
> +       long line 6Q
> +       long line 7Q
> +       long line 8Q
> +       longQline 9Q
> +       line 1
> +       line 2
> +       line 3
> +       line 4
> +       line 5
> +       EOF
> +
> +       # avoid cluttering the output with complaints about our eol whitespace
> +       test_config core.whitespace -blank-at-eol &&

We avoid the eol space change as we want to test the move detection
without interference. Do we want to test it with that as well?

> +       git diff HEAD --no-renames --color-moved --color |
> +               grep -v "index" |
> +               test_decode_color >actual &&
..
> +       git diff HEAD --no-renames --ignore-space-at-eol --color-moved --color |
> +               grep -v "index" |
> +               test_decode_color >actual &&
..
> +       <GREEN>+<RESET><GREEN>long      line 9  <RESET>

ok, we also have no interference with space changes,
which we assume is orthogonal.

The commit message really enlightened me,
Thanks!

Stefan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] diff: fix whitespace-skipping with --color-moved
  2017-10-19 21:15                     ` Stefan Beller
@ 2017-10-19 21:19                       ` Jeff King
  0 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2017-10-19 21:19 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, git, Orgad Shaneh

On Thu, Oct 19, 2017 at 02:15:12PM -0700, Stefan Beller wrote:

> > +test_expect_failure 'move detection ignoring whitespace at eol' '
> > +       git reset --hard &&
> > +       # Lines 6-9 have new eol whitespace, but 9 also has it in the middle
> > +       q_to_tab <<-\EOF >lines.txt &&
> > +       long line 6Q
> > +       long line 7Q
> > +       long line 8Q
> > +       longQline 9Q
> > +       line 1
> > +       line 2
> > +       line 3
> > +       line 4
> > +       line 5
> > +       EOF
> > +
> > +       # avoid cluttering the output with complaints about our eol whitespace
> > +       test_config core.whitespace -blank-at-eol &&
> 
> We avoid the eol space change as we want to test the move detection
> without interference. Do we want to test it with that as well?

I don't think it creates interference. It's just that the expected
output becomes more like:

  <CYAN>+<RESET><CYAN>long line 6<RESET><BLUE> </RESET>

and the extra coloring just made it harder for me to read and write the
test. :)

I agree it might be worth checking the interaction between whitespace
coloring and --color-moved, but I think we'd want it to be more thorough
(e.g., covering the beginning of line with indent-with-non-tab or
similar).

> The commit message really enlightened me,
> Thanks!

Thanks for reviewing.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/5] diff: handle NULs in get_string_hash()
  2017-10-19 20:31                   ` [PATCH 5/5] diff: handle NULs in get_string_hash() Jeff King
@ 2017-10-19 21:31                     ` Stefan Beller
  2017-10-19 21:39                       ` Jeff King
  0 siblings, 1 reply; 34+ messages in thread
From: Stefan Beller @ 2017-10-19 21:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Orgad Shaneh

On Thu, Oct 19, 2017 at 1:31 PM, Jeff King <peff@peff.net> wrote:
> For computing moved lines, we feed the characters of each
> line into a hash. When we've been asked to ignore
> whitespace, then we pick each character using next_byte(),
> which returns -1 on end-of-string, which it determines using
> the start/end pointers we feed it.
>
> However our check of its return value treats "0" the same as
> "-1", meaning we'd quit if the string has an embedded NUL.

I agree. The code looks correct.

> This is unlikely to ever come up in practice since our line
> boundaries generally come from calling strlen() in the first
> place.

get_string_hash is called from
 prepare_entry, which in turn is called from
  add_lines_to_move_detection or mark_color_as_moved
   diff_flush_patch_all_file_pairs

that constructs the arguments in
diff_flush_patch
 run_diff
  run_diff_cmd
   builtin_diff (part "/* Crazy xdl interfaces.. */")
    xdi_diff_outf( fn_out_consume as arg!)
     xdi_diff
      xdl_diff
       xdl_call_hunk_func
        -> fn_out_consume(cb, line, len)

xdl_call_hunk_func however uses pointer arithmetic instead
of strlen. So I think this sentence is not a good idea to put in
the commit message.

It may not occur in practice, due to binary files detection using
NUL as a signal, but conceptually our move-colored(!) diffs
should be compatible with NULs with this patch now.

> But it was a bit surprising to me as a reader of the
> next_byte() code. And it's possible that we may one day feed
> this function with more exotic input, which otherwise works
> with arbitrary ptr/len pairs.

Good point.

>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I noticed that we make an extra copy of each line here, just to feed it
> to memihash! I guess "-w" is not a critical-performance code path, but
> this could be fixed if we could do memhash() incrementally (e.g., by
> putting the FNV state into a struct and letting callers "add" to it
> incrementally). Maybe an interesting #leftoverbits, though I'd want to
> see timing tests that show it's worth doing.
>

I agree.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/5] diff: handle NULs in get_string_hash()
  2017-10-19 21:31                     ` Stefan Beller
@ 2017-10-19 21:39                       ` Jeff King
  2017-10-19 21:50                         ` Stefan Beller
  0 siblings, 1 reply; 34+ messages in thread
From: Jeff King @ 2017-10-19 21:39 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, git, Orgad Shaneh

On Thu, Oct 19, 2017 at 02:31:20PM -0700, Stefan Beller wrote:

> > This is unlikely to ever come up in practice since our line
> > boundaries generally come from calling strlen() in the first
> > place.
> 
> get_string_hash is called from
>  prepare_entry, which in turn is called from
>   add_lines_to_move_detection or mark_color_as_moved
>    diff_flush_patch_all_file_pairs
> 
> that constructs the arguments in
> diff_flush_patch
>  run_diff
>   run_diff_cmd
>    builtin_diff (part "/* Crazy xdl interfaces.. */")
>     xdi_diff_outf( fn_out_consume as arg!)
>      xdi_diff
>       xdl_diff
>        xdl_call_hunk_func
>         -> fn_out_consume(cb, line, len)
> 
> xdl_call_hunk_func however uses pointer arithmetic instead
> of strlen. So I think this sentence is not a good idea to put in
> the commit message.
> 
> It may not occur in practice, due to binary files detection using
> NUL as a signal, but conceptually our move-colored(!) diffs
> should be compatible with NULs with this patch now.

Good catch. I just skimmed over all the diff_emit_* functions, which use
strlen(). But at the bottom is emit_add_line(), which has a real length
marker from xdiff.

I agree we wouldn't see NULs in general, but this is maybe fixing "diff
--color-moved -a". I dunno. It's probably not worth digging too much,
since it seems like the right thing to do regardless.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/5] diff: handle NULs in get_string_hash()
  2017-10-19 21:39                       ` Jeff King
@ 2017-10-19 21:50                         ` Stefan Beller
  0 siblings, 0 replies; 34+ messages in thread
From: Stefan Beller @ 2017-10-19 21:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Orgad Shaneh

On Thu, Oct 19, 2017 at 2:39 PM, Jeff King <peff@peff.net> wrote:
> On Thu, Oct 19, 2017 at 02:31:20PM -0700, Stefan Beller wrote:
>
>> > This is unlikely to ever come up in practice since our line
>> > boundaries generally come from calling strlen() in the first
>> > place.
>>
>> get_string_hash is called from
>>  prepare_entry, which in turn is called from
>>   add_lines_to_move_detection or mark_color_as_moved
>>    diff_flush_patch_all_file_pairs
>>
>> that constructs the arguments in
>> diff_flush_patch
>>  run_diff
>>   run_diff_cmd
>>    builtin_diff (part "/* Crazy xdl interfaces.. */")
>>     xdi_diff_outf( fn_out_consume as arg!)
>>      xdi_diff
>>       xdl_diff
>>        xdl_call_hunk_func
>>         -> fn_out_consume(cb, line, len)
>>
>> xdl_call_hunk_func however uses pointer arithmetic instead
>> of strlen. So I think this sentence is not a good idea to put in
>> the commit message.
>>
>> It may not occur in practice, due to binary files detection using
>> NUL as a signal, but conceptually our move-colored(!) diffs
>> should be compatible with NULs with this patch now.
>
> Good catch. I just skimmed over all the diff_emit_* functions, which use
> strlen(). But at the bottom is emit_add_line(), which has a real length
> marker from xdiff.

Doh!
There is diff_emit_submodule_* but I presume you meant
emit_diff_* actually as there the strlen is legit, as we generate
known non-NUL data to print.

The submodule output may get mangled in diff_emit_submodule_{add,del}
as the input is coming from user data.

> I agree we wouldn't see NULs in general, but this is maybe fixing "diff
> --color-moved -a". I dunno. It's probably not worth digging too much,
> since it seems like the right thing to do regardless.

I agree, that this digging seems not worth it; I was just agitated at
the seemingly incorrect commit message.

Thanks,
Stefan

>
> -Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] diff: fix whitespace-skipping with --color-moved
  2017-10-19 20:29                   ` [PATCH 4/5] diff: fix whitespace-skipping with --color-moved Jeff King
  2017-10-19 21:15                     ` Stefan Beller
@ 2017-10-20  7:23                     ` Simon Ruderich
  2017-10-20 22:37                       ` Jeff King
  1 sibling, 1 reply; 34+ messages in thread
From: Simon Ruderich @ 2017-10-20  7:23 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Stefan Beller, git, orgads

On Thu, Oct 19, 2017 at 04:29:26PM -0400, Jeff King wrote:
> [snip]
>
> --- a/t/t4015-diff-whitespace.sh
> +++ b/t/t4015-diff-whitespace.sh
> @@ -1463,6 +1463,73 @@ test_expect_success 'move detection ignoring whitespace changes' '
>  	test_cmp expected actual
>  '
>
> +test_expect_failure 'move detection ignoring whitespace at eol' '

Shouldn't this be test_expect_success? According to the commit
message "and a new "--ignore-space-at-eol" shows off the breakage
we're fixing.". I didn't actually run the code so I don't know if
the test fails or not.

Regards
Simon
-- 
+ privacy is necessary
+ using gnupg http://gnupg.org
+ public key id: 0x92FEFDB7E44C32F9

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] diff: fix whitespace-skipping with --color-moved
  2017-10-20  7:23                     ` Simon Ruderich
@ 2017-10-20 22:37                       ` Jeff King
  0 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2017-10-20 22:37 UTC (permalink / raw)
  To: Simon Ruderich; +Cc: Junio C Hamano, Stefan Beller, git, orgads

On Fri, Oct 20, 2017 at 09:23:54AM +0200, Simon Ruderich wrote:

> On Thu, Oct 19, 2017 at 04:29:26PM -0400, Jeff King wrote:
> > [snip]
> >
> > --- a/t/t4015-diff-whitespace.sh
> > +++ b/t/t4015-diff-whitespace.sh
> > @@ -1463,6 +1463,73 @@ test_expect_success 'move detection ignoring whitespace changes' '
> >  	test_cmp expected actual
> >  '
> >
> > +test_expect_failure 'move detection ignoring whitespace at eol' '
> 
> Shouldn't this be test_expect_success? According to the commit
> message "and a new "--ignore-space-at-eol" shows off the breakage
> we're fixing.". I didn't actually run the code so I don't know if
> the test fails or not.

Thanks, good catch. I had originally added all the tests in a single
commit, with the intent of flipping this failure to success. But when I
shifted it to just get added here, I accidentally lost that flip.

-Peff

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, back to index

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-12 19:53 Out of memory with diff.colormoved enabled Orgad Shaneh
2017-10-12 20:05 ` Jeff King
2017-10-12 22:39   ` Stefan Beller
2017-10-12 23:33   ` [PATCH] diff.c: increment buffer pointer in all code path Stefan Beller
2017-10-13  0:18     ` Jeff King
2017-10-13  0:20       ` Jeff King
2017-10-13  0:24         ` Stefan Beller
2017-10-19  5:04         ` Jeff King
2017-10-19  5:24           ` Jeff King
2017-10-19  5:30             ` Junio C Hamano
2017-10-19  5:32               ` Junio C Hamano
2017-10-19  5:32                 ` Jeff King
2017-10-19  5:42               ` Jeff King
2017-10-19 19:55                 ` Stefan Beller
2017-10-19 20:23                 ` [PATCH 0/5] fix "diff --color-moved --ignore-space-at-eol" Jeff King
2017-10-19 20:24                   ` [PATCH 1/5] t4015: refactor --color-moved whitespace test Jeff King
2017-10-19 20:56                     ` Stefan Beller
2017-10-19 21:10                       ` Jeff King
2017-10-19 20:25                   ` [PATCH 2/5] t4015: check "negative" case for "-w --color-moved" Jeff King
2017-10-19 20:54                     ` Stefan Beller
2017-10-19 20:26                   ` [PATCH 3/5] t4015: test the output of "diff --color-moved -b" Jeff King
2017-10-19 21:03                     ` Stefan Beller
2017-10-19 21:14                       ` Jeff King
2017-10-19 20:29                   ` [PATCH 4/5] diff: fix whitespace-skipping with --color-moved Jeff King
2017-10-19 21:15                     ` Stefan Beller
2017-10-19 21:19                       ` Jeff King
2017-10-20  7:23                     ` Simon Ruderich
2017-10-20 22:37                       ` Jeff King
2017-10-19 20:31                   ` [PATCH 5/5] diff: handle NULs in get_string_hash() Jeff King
2017-10-19 21:31                     ` Stefan Beller
2017-10-19 21:39                       ` Jeff King
2017-10-19 21:50                         ` Stefan Beller
2017-10-19 19:53             ` [PATCH] diff.c: increment buffer pointer in all code path Stefan Beller
2017-10-19 19:55               ` Jeff King

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox