git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Stefan Beller <sbeller@google.com>,
	git@vger.kernel.org, orgads@gmail.com
Subject: [PATCH 4/5] diff: fix whitespace-skipping with --color-moved
Date: Thu, 19 Oct 2017 16:29:26 -0400
Message-ID: <20171019202926.irldca42wqosmxrd@sigill.intra.peff.net> (raw)
In-Reply-To: <20171019202326.grovyfsragl2d7xx@sigill.intra.peff.net>

The code for handling whitespace with --color-moved
represents partial strings as a pair of pointers. There are
two possible conventions for the end pointer:

  1. It points to the byte right after the end of the
     string.

  2. It points to the final byte of the string.

But we seem to use both conventions in the code:

  a. we assign the initial pointers from the NUL-terminated
     string using (1)

  b. we eat trailing whitespace by checking the second
     pointer for isspace(), which needs (2)

  c. the next_byte() function checks for end-of-string with
     "if (cp > endp)", which is (2)

  d. in next_byte() we skip past internal whitespace with
     "while (cp < end)", which is (1)

This creates fewer bugs than you might think, because there
are some subtle interactions. Because of (a) and (c), we
always return the NUL-terminator from next_byte(). But all
of the callers of next_byte() happen to handle that
gracefully.

Because of the mismatch between (d) and (c), next_byte()
could accidentally return a whitespace character right at
endp. But because of the interaction of (a) and (b), we fail
to actually chomp trailing whitespace, meaning our endp
_always_ points to a NUL, canceling out the problem.

But that does leave (b) as a real bug: when ignoring
whitespace only at the end-of-line, we don't correctly trim
it, and fail to match up lines.

We can fix the whole thing by moving consistently to one
convention. Since convention (1) is idiomatic in our code
base, we'll pick that one.

The existing "-w" and "-b" tests continue to pass, and a new
"--ignore-space-at-eol" shows off the breakage we're fixing.

Signed-off-by: Jeff King <peff@peff.net>
---
 diff.c                     | 15 +++++++----
 t/t4015-diff-whitespace.sh | 67 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index 6fd288420b..09081a207c 100644
--- a/diff.c
+++ b/diff.c
@@ -712,7 +712,7 @@ static int next_byte(const char **cp, const char **endp,
 {
 	int retval;
 
-	if (*cp > *endp)
+	if (*cp >= *endp)
 		return -1;
 
 	if (isspace(**cp)) {
@@ -729,7 +729,12 @@ static int next_byte(const char **cp, const char **endp,
 		if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE)) {
 			while (*cp < *endp && isspace(**cp))
 				(*cp)++;
-			/* return the first non-ws character via the usual below */
+			/*
+			 * return the first non-ws character via the usual
+			 * below, unless we ate all of the bytes
+			 */
+			if (*cp >= *endp)
+				return -1;
 		}
 	}
 
@@ -750,9 +755,9 @@ static int moved_entry_cmp(const struct diff_options *diffopt,
 		return a->es->len != b->es->len  || memcmp(ap, bp, a->es->len);
 
 	if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_AT_EOL)) {
-		while (ae > ap && isspace(*ae))
+		while (ae > ap && isspace(ae[-1]))
 			ae--;
-		while (be > bp && isspace(*be))
+		while (be > bp && isspace(be[-1]))
 			be--;
 	}
 
@@ -775,7 +780,7 @@ static unsigned get_string_hash(struct emitted_diff_symbol *es, struct diff_opti
 		int c;
 
 		strbuf_reset(&sb);
-		while (ae > ap && isspace(*ae))
+		while (ae > ap && isspace(ae[-1]))
 			ae--;
 		while ((c = next_byte(&ap, &ae, o)) > 0)
 			strbuf_addch(&sb, c);
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 1f54816c6b..eb9431da7d 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -1463,6 +1463,73 @@ test_expect_success 'move detection ignoring whitespace changes' '
 	test_cmp expected actual
 '
 
+test_expect_failure 'move detection ignoring whitespace at eol' '
+	git reset --hard &&
+	# Lines 6-9 have new eol whitespace, but 9 also has it in the middle
+	q_to_tab <<-\EOF >lines.txt &&
+	long line 6Q
+	long line 7Q
+	long line 8Q
+	longQline 9Q
+	line 1
+	line 2
+	line 3
+	line 4
+	line 5
+	EOF
+
+	# avoid cluttering the output with complaints about our eol whitespace
+	test_config core.whitespace -blank-at-eol &&
+
+	git diff HEAD --no-renames --color-moved --color |
+		grep -v "index" |
+		test_decode_color >actual &&
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
+	<BOLD>--- a/lines.txt<RESET>
+	<BOLD>+++ b/lines.txt<RESET>
+	<CYAN>@@ -1,9 +1,9 @@<RESET>
+	<GREEN>+<RESET><GREEN>long line 6	<RESET>
+	<GREEN>+<RESET><GREEN>long line 7	<RESET>
+	<GREEN>+<RESET><GREEN>long line 8	<RESET>
+	<GREEN>+<RESET><GREEN>long	line 9	<RESET>
+	 line 1<RESET>
+	 line 2<RESET>
+	 line 3<RESET>
+	 line 4<RESET>
+	 line 5<RESET>
+	<RED>-long line 6<RESET>
+	<RED>-long line 7<RESET>
+	<RED>-long line 8<RESET>
+	<RED>-long line 9<RESET>
+	EOF
+	test_cmp expected actual &&
+
+	git diff HEAD --no-renames --ignore-space-at-eol --color-moved --color |
+		grep -v "index" |
+		test_decode_color >actual &&
+	cat <<-\EOF >expected &&
+	<BOLD>diff --git a/lines.txt b/lines.txt<RESET>
+	<BOLD>--- a/lines.txt<RESET>
+	<BOLD>+++ b/lines.txt<RESET>
+	<CYAN>@@ -1,9 +1,9 @@<RESET>
+	<CYAN>+<RESET><CYAN>long line 6	<RESET>
+	<CYAN>+<RESET><CYAN>long line 7	<RESET>
+	<CYAN>+<RESET><CYAN>long line 8	<RESET>
+	<GREEN>+<RESET><GREEN>long	line 9	<RESET>
+	 line 1<RESET>
+	 line 2<RESET>
+	 line 3<RESET>
+	 line 4<RESET>
+	 line 5<RESET>
+	<MAGENTA>-long line 6<RESET>
+	<MAGENTA>-long line 7<RESET>
+	<MAGENTA>-long line 8<RESET>
+	<RED>-long line 9<RESET>
+	EOF
+	test_cmp expected actual
+'
+
 test_expect_success 'clean up whitespace-test colors' '
 	git config --unset color.diff.oldMoved &&
 	git config --unset color.diff.newMoved
-- 
2.15.0.rc1.560.g5f0609e481


  parent reply index

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-12 19:53 Out of memory with diff.colormoved enabled Orgad Shaneh
2017-10-12 20:05 ` Jeff King
2017-10-12 22:39   ` Stefan Beller
2017-10-12 23:33   ` [PATCH] diff.c: increment buffer pointer in all code path Stefan Beller
2017-10-13  0:18     ` Jeff King
2017-10-13  0:20       ` Jeff King
2017-10-13  0:24         ` Stefan Beller
2017-10-19  5:04         ` Jeff King
2017-10-19  5:24           ` Jeff King
2017-10-19  5:30             ` Junio C Hamano
2017-10-19  5:32               ` Junio C Hamano
2017-10-19  5:32                 ` Jeff King
2017-10-19  5:42               ` Jeff King
2017-10-19 19:55                 ` Stefan Beller
2017-10-19 20:23                 ` [PATCH 0/5] fix "diff --color-moved --ignore-space-at-eol" Jeff King
2017-10-19 20:24                   ` [PATCH 1/5] t4015: refactor --color-moved whitespace test Jeff King
2017-10-19 20:56                     ` Stefan Beller
2017-10-19 21:10                       ` Jeff King
2017-10-19 20:25                   ` [PATCH 2/5] t4015: check "negative" case for "-w --color-moved" Jeff King
2017-10-19 20:54                     ` Stefan Beller
2017-10-19 20:26                   ` [PATCH 3/5] t4015: test the output of "diff --color-moved -b" Jeff King
2017-10-19 21:03                     ` Stefan Beller
2017-10-19 21:14                       ` Jeff King
2017-10-19 20:29                   ` Jeff King [this message]
2017-10-19 21:15                     ` [PATCH 4/5] diff: fix whitespace-skipping with --color-moved Stefan Beller
2017-10-19 21:19                       ` Jeff King
2017-10-20  7:23                     ` Simon Ruderich
2017-10-20 22:37                       ` Jeff King
2017-10-19 20:31                   ` [PATCH 5/5] diff: handle NULs in get_string_hash() Jeff King
2017-10-19 21:31                     ` Stefan Beller
2017-10-19 21:39                       ` Jeff King
2017-10-19 21:50                         ` Stefan Beller
2017-10-19 19:53             ` [PATCH] diff.c: increment buffer pointer in all code path Stefan Beller
2017-10-19 19:55               ` Jeff King

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171019202926.irldca42wqosmxrd@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=orgads@gmail.com \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox