git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
From: Jeff King <peff@peff.net>
To: Stefan Beller <sbeller@google.com>
Cc: git@vger.kernel.org, orgads@gmail.com
Subject: Re: [PATCH] diff.c: increment buffer pointer in all code path
Date: Thu, 19 Oct 2017 01:04:59 -0400
Message-ID: <20171019050459.p2cx63yrxfwq4ta3@sigill.intra.peff.net> (raw)
In-Reply-To: <20171013002057.froqi54olmhmah6b@sigill.intra.peff.net>

On Thu, Oct 12, 2017 at 08:20:57PM -0400, Jeff King wrote:

> On Thu, Oct 12, 2017 at 08:18:37PM -0400, Jeff King wrote:
> 
> > Fix this by entering the conditional only when we actually
> > see whitespace. We can apply this also to the
> > IGNORE_WHITESPACE change. That code path isn't buggy
> > (because it falls through to returning the next
> > non-whitespace byte), but it makes the logic more clear if
> > we only bother to look at whitespace flags after seeing that
> > the next byte is whitespace.
> 
> I think there actually _is_ a bug in that code path, but it's unrelated
> to this one. If you have whitespace at the end of the buffer, then we'd
> advance *cp until it matches *endp, and then return whatever is at *endp
> (which is nonsense, or probably a NUL) rather than returning "-1".
> 
> I'm out of time for tonight and not familiar enough with the color-moved
> code to come up with a reasonable test case quickly, but maybe you can
> see if that can trigger bad behavior?

I found a few moments to follow up on this. I'm not sure if it's a bug
or intentional behavior. It's definitely easy to trigger next_byte()
reading *endp (even without ignoring whitespace, we always dereference
it).

This function is always operating on the buffer from an
emitted_diff_symbol, which is NUL-terminated. So *endp is always NUL. So
we'll return "0" at the end of the string.

There are two callers. The first is get_string_hash(), which does:

  while ((c = next_byte(&ap, &ae, o)) > 0)

so we'll break out on the NUL byte. But it also sometimes shrinks the
end-pointer "ae" to skip trailing whitespace.  Or at least it tries to.
It does:

   const char *ap = es->line, *ae = es->line + es->len;
   ...
   while (ae > ap && isspace(*ae))
           ae--;

but AFAICT that loop will never trigger, since *ae will always point to
NUL to begin with. But it points to the expectation that our end-pointer
does not follow the usual one-past-the-end convention, but rather is
meant to point to the actual last character in the string.

The same problem is repeated in the other caller, moved_entry_cmp(),
which tries to eat trailing whitespace for IGNORE_WHITESPACE_AT_EOL. But
it uses the same loop that starts on NUL and will never trigger.

It then goes on to call next_byte(). The extra NUL there should not
cause any problem, because we are just checking for equality between the
two strings (so if they both return NUL at the same time, good; if not,
then we know they are different).

So. That leaves me with:

  - I'm unclear on whether next_byte() is meant to return that trailing
    NUL or not. I don't think it causes any bugs, but it certainly
    confused me for a function to take a cp/endp pair of pointers, and
    then dereference endp. It might be worth either fixing or clarifying
    with a comment.

  - Those loops to eat trailing whitespace are doing nothing. I'm not
    sure if that all works out because next_byte() eats whitespaces or
    not (I think not, because it doesn't eat whitespace for the
    IGNORE_WHITESPACE_AT_EOL case). But I'm not quite sure what a test
    would look like.

-Peff

  parent reply index

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-12 19:53 Out of memory with diff.colormoved enabled Orgad Shaneh
2017-10-12 20:05 ` Jeff King
2017-10-12 22:39   ` Stefan Beller
2017-10-12 23:33   ` [PATCH] diff.c: increment buffer pointer in all code path Stefan Beller
2017-10-13  0:18     ` Jeff King
2017-10-13  0:20       ` Jeff King
2017-10-13  0:24         ` Stefan Beller
2017-10-19  5:04         ` Jeff King [this message]
2017-10-19  5:24           ` Jeff King
2017-10-19  5:30             ` Junio C Hamano
2017-10-19  5:32               ` Junio C Hamano
2017-10-19  5:32                 ` Jeff King
2017-10-19  5:42               ` Jeff King
2017-10-19 19:55                 ` Stefan Beller
2017-10-19 20:23                 ` [PATCH 0/5] fix "diff --color-moved --ignore-space-at-eol" Jeff King
2017-10-19 20:24                   ` [PATCH 1/5] t4015: refactor --color-moved whitespace test Jeff King
2017-10-19 20:56                     ` Stefan Beller
2017-10-19 21:10                       ` Jeff King
2017-10-19 20:25                   ` [PATCH 2/5] t4015: check "negative" case for "-w --color-moved" Jeff King
2017-10-19 20:54                     ` Stefan Beller
2017-10-19 20:26                   ` [PATCH 3/5] t4015: test the output of "diff --color-moved -b" Jeff King
2017-10-19 21:03                     ` Stefan Beller
2017-10-19 21:14                       ` Jeff King
2017-10-19 20:29                   ` [PATCH 4/5] diff: fix whitespace-skipping with --color-moved Jeff King
2017-10-19 21:15                     ` Stefan Beller
2017-10-19 21:19                       ` Jeff King
2017-10-20  7:23                     ` Simon Ruderich
2017-10-20 22:37                       ` Jeff King
2017-10-19 20:31                   ` [PATCH 5/5] diff: handle NULs in get_string_hash() Jeff King
2017-10-19 21:31                     ` Stefan Beller
2017-10-19 21:39                       ` Jeff King
2017-10-19 21:50                         ` Stefan Beller
2017-10-19 19:53             ` [PATCH] diff.c: increment buffer pointer in all code path Stefan Beller
2017-10-19 19:55               ` Jeff King

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171019050459.p2cx63yrxfwq4ta3@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=orgads@gmail.com \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox