From: Stefan Beller <sbeller@google.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
"git@vger.kernel.org" <git@vger.kernel.org>,
Orgad Shaneh <orgads@gmail.com>
Subject: Re: [PATCH 4/5] diff: fix whitespace-skipping with --color-moved
Date: Thu, 19 Oct 2017 14:15:12 -0700
Message-ID: <CAGZ79kZhCoFe1P_gwigzrgzzcqRunrUWGOAgy9xvS2ugNeVLmA@mail.gmail.com>
In-Reply-To: <20171019202926.irldca42wqosmxrd@sigill.intra.peff.net>
On Thu, Oct 19, 2017 at 1:29 PM, Jeff King <peff@peff.net> wrote:
> The code for handling whitespace with --color-moved
> represents partial strings as a pair of pointers. There are
> two possible conventions for the end pointer:
>
> 1. It points to the byte right after the end of the
> string.
>
> 2. It points to the final byte of the string.
>
> But we seem to use both conventions in the code:
>
> a. we assign the initial pointers from the NUL-terminated
> string using (1)
>
> b. we eat trailing whitespace by checking the second
> pointer for isspace(), which needs (2)
>
> c. the next_byte() function checks for end-of-string with
> "if (cp > endp)", which is (2)
>
> d. in next_byte() we skip past internal whitespace with
> "while (cp < end)", which is (1)
>
> This creates fewer bugs than you might think, because there
> are some subtle interactions. Because of (a) and (c), we
> always return the NUL-terminator from next_byte(). But all
> of the callers of next_byte() happen to handle that
> gracefully.
>
> Because of the mismatch between (d) and (c), next_byte()
> could accidentally return a whitespace character right at
> endp. But because of the interaction of (a) and (b), we fail
> to actually chomp trailing whitespace, meaning our endp
> _always_ points to a NUL, canceling out the problem.
>
> But that does leave (b) as a real bug: when ignoring
> whitespace only at the end-of-line, we don't correctly trim
> it, and fail to match up lines.
>
> We can fix the whole thing by moving consistently to one
> convention. Since convention (1) is idiomatic in our code
> base, we'll pick that one.
>
> The existing "-w" and "-b" tests continue to pass, and a new
> "--ignore-space-at-eol" shows off the breakage we're fixing.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff.c | 15 +++++++----
> t/t4015-diff-whitespace.sh | 67 ++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 77 insertions(+), 5 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 6fd288420b..09081a207c 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -712,7 +712,7 @@ static int next_byte(const char **cp, const char **endp,
> {
> int retval;
>
> - if (*cp > *endp)
> + if (*cp >= *endp)
> return -1;
This converts (c) from (2) to (1).
> while (*cp < *endp && isspace(**cp))
> (*cp)++;
> - /* return the first non-ws character via the usual below */
> + /*
> + * return the first non-ws character via the usual
> + * below, unless we ate all of the bytes
> + */
> + if (*cp >= *endp)
> + return -1;
This fixes the mismatch between (d) and (c).
When I wrote the code, I did not follow the proper comment style
of capitalizing the start of the sentence and ending it with a
period. :(
(Well, it was a single-line comment, and I have seen those deviate
from the style occasionally, unlike multi-line comments. Anyway,
there is no need to fix it here and now; I am just pointing out my
own bad code from the beginning.)
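
To make the effect of the two next_byte() hunks quoted above concrete,
here is a minimal sketch under convention (1). It is not the function
from diff.c: next_byte_sketch() is a hypothetical, simplified stand-in
that always skips whitespace (roughly the "-w" case) and takes no diff
options.

```c
#include <assert.h>
#include <ctype.h>

/*
 * Hypothetical, simplified stand-in for next_byte(). It treats endp
 * as pointing one past the last byte of the string (convention 1)
 * and skips every whitespace byte it meets.
 */
static int next_byte_sketch(const char **cp, const char *endp)
{
	assert(cp && *cp && endp);

	/* Nothing left: with a half-open range, cp == endp is the end. */
	if (*cp >= endp)
		return -1;

	while (*cp < endp && isspace((unsigned char)**cp))
		(*cp)++;

	/* We may have eaten all of the remaining bytes. */
	if (*cp >= endp)
		return -1;

	return (unsigned char)*(*cp)++;
}
```

With the range covering "a b \t" (end pointer one past the tab), three
calls yield 'a', 'b' and then -1. The second ">=" check is what keeps a
trailing run of whitespace from leaking out as a returned byte.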
> - while (ae > ap && isspace(*ae))
> + while (ae > ap && isspace(ae[-1]))
> ae--;
> - while (be > bp && isspace(*be))
> + while (be > bp && isspace(be[-1]))
> be--;
...
> - while (ae > ap && isspace(*ae))
> + while (ae > ap && isspace(ae[-1]))
These fixes convert (b) to (1), so we're all on (1).
As the loop condition is the strict end pointer > start pointer
(and not >=), the [-1] lookbehind is fine, too: it never reads
before the start of the string.
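
The trimming loops quoted above can be sketched in isolation like this.
The helper names trim_trailing_ws() and len_ignoring_eol_ws() are made
up for illustration and do not exist in diff.c; the only thing taken
from the patch is the loop shape.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the new one-past-the-end pointer for
 * the half-open range [start, end) after dropping trailing
 * whitespace.
 */
static const char *trim_trailing_ws(const char *start, const char *end)
{
	assert(start <= end);

	/*
	 * "end > start" guarantees that end[-1] lies inside the range,
	 * so the lookbehind is safe even when the range is empty.
	 */
	while (end > start && isspace((unsigned char)end[-1]))
		end--;
	return end;
}

/* Length of [start, end) once trailing whitespace is ignored. */
static size_t len_ignoring_eol_ws(const char *start, const char *end)
{
	return (size_t)(trim_trailing_ws(start, end) - start);
}
```

For "long line 6" followed by a tab this gives 11, and for an
all-whitespace or empty range it gives 0. With the old isspace(*end)
check and an end pointer sitting on the NUL, the loop would stop
immediately, which is the (a)/(b) interaction from the commit message.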
> @@ -1463,6 +1463,73 @@ test_expect_success 'move detection ignoring whitespace changes' '
> test_cmp expected actual
> '
>
> +test_expect_failure 'move detection ignoring whitespace at eol' '
> + git reset --hard &&
> + # Lines 6-9 have new eol whitespace, but 9 also has it in the middle
> + q_to_tab <<-\EOF >lines.txt &&
> + long line 6Q
> + long line 7Q
> + long line 8Q
> + longQline 9Q
> + line 1
> + line 2
> + line 3
> + line 4
> + line 5
> + EOF
> +
> + # avoid cluttering the output with complaints about our eol whitespace
> + test_config core.whitespace -blank-at-eol &&
We turn off the eol whitespace highlighting because we want to test
the move detection without interference. Do we want to test it with
the highlighting enabled as well?
> + git diff HEAD --no-renames --color-moved --color |
> + grep -v "index" |
> + test_decode_color >actual &&
..
> + git diff HEAD --no-renames --ignore-space-at-eol --color-moved --color |
> + grep -v "index" |
> + test_decode_color >actual &&
..
> + <GREEN>+<RESET><GREEN>long line 9 <RESET>
OK, so here too the whitespace changes do not interfere, and we
assume the two are orthogonal.
The commit message really enlightened me,
Thanks!
Stefan