git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Jeff King <peff@peff.net>
Cc: Orgad Shaneh <orgads@gmail.com>, git <git@vger.kernel.org>
Subject: Re: Out of memory with diff.colormoved enabled
Date: Thu, 12 Oct 2017 15:39:03 -0700
Message-ID: <CAGZ79kZ=v0GiBfNPK=iRp7Hi=zG=KUL6+jHuc3N5CRqtECt7fw@mail.gmail.com> (raw)
In-Reply-To: <20171012200536.m6oz4zrjcze3yw4i@sigill.intra.peff.net>

On Thu, Oct 12, 2017 at 1:05 PM, Jeff King <peff@peff.net> wrote:
> On Thu, Oct 12, 2017 at 10:53:23PM +0300, Orgad Shaneh wrote:
>
>> There is an infinite loop when colormoved is used with --ignore-space-change:
>>
>> git init
>> seq 20 > test
>> git add test
>> sed -i 's/9/42/' test
>> git -c diff.colormoved diff --ignore-space-change -- test
>
> Thanks for an easy reproduction recipe.

Thanks here as well!

> It looks like the problem is that next_byte() doesn't make any forward
> progress in the buffer with --ignore-space-change. We try to convert
> whitespace into a single space

> (I'm not sure why, but I'm not very
> familiar with this part of the code).

(on why you don't feel familiar)
Because it is quite new, and you weren't one of the main reviewers.
2e2d5ac184 (diff.c: color moved lines differently, 2017-06-30) was also very
large, such that it is easy to overlook. Though I remember Junio and me
discussing the next_byte part quite vividly. :/

(On why it is the way it is)
Consider the three strings
    one SP word
    one TAB word
    oneword

The first two are equal, but the third is different with
`--ignore-space-change` given. To achieve that goal,
the easiest thing to do (in my mind) was to replace
any sequence of blank characters by "a standard
space" sequence. That will make all strings with different
white space sequences compare equal, but the one
without blanks will be different.

> But if there's no space, then the
> "cp" pointer never gets advanced.

Right, because of the early return, skipping the increment of *cp

> This fixes it, but I have no idea if it's doing the right thing:
>
> diff --git a/diff.c b/diff.c
> index 69f03570ad..e8dedc7357 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -713,13 +713,17 @@ static int next_byte(const char **cp, const char **endp,
>                 return -1;
>
>         if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE_CHANGE)) {
> -               while (*cp < *endp && isspace(**cp))
> +               int saw_whitespace = 0;
> +               while (*cp < *endp && isspace(**cp)) {
>                         (*cp)++;
> +                       saw_whitespace = 1;
> +               }
>                 /*
>                  * After skipping a couple of whitespaces, we still have to
>                  * account for one space.
>                  */
> -               return (int)' ';
> +               if (saw_whitespace)
> +                       return (int)' ';

The "else" is implicit and it falls through to
the standard case at the end of the function,
incrementing *cp, returning the character *cp
pointed at prior to being incremented.

That sounds correct.

>         }
>
>         if (DIFF_XDL_TST(diffopt, IGNORE_WHITESPACE)) {
>
> I guess it would be equally correct to not enter that if-block unless
> isspace(**cp).

This also sounds correct.

>
> -Peff

  reply index

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-12 19:53 Orgad Shaneh
2017-10-12 20:05 ` Jeff King
2017-10-12 22:39   ` Stefan Beller [this message]
2017-10-12 23:33   ` [PATCH] diff.c: increment buffer pointer in all code path Stefan Beller
2017-10-13  0:18     ` Jeff King
2017-10-13  0:20       ` Jeff King
2017-10-13  0:24         ` Stefan Beller
2017-10-19  5:04         ` Jeff King
2017-10-19  5:24           ` Jeff King
2017-10-19  5:30             ` Junio C Hamano
2017-10-19  5:32               ` Junio C Hamano
2017-10-19  5:32                 ` Jeff King
2017-10-19  5:42               ` Jeff King
2017-10-19 19:55                 ` Stefan Beller
2017-10-19 20:23                 ` [PATCH 0/5] fix "diff --color-moved --ignore-space-at-eol" Jeff King
2017-10-19 20:24                   ` [PATCH 1/5] t4015: refactor --color-moved whitespace test Jeff King
2017-10-19 20:56                     ` Stefan Beller
2017-10-19 21:10                       ` Jeff King
2017-10-19 20:25                   ` [PATCH 2/5] t4015: check "negative" case for "-w --color-moved" Jeff King
2017-10-19 20:54                     ` Stefan Beller
2017-10-19 20:26                   ` [PATCH 3/5] t4015: test the output of "diff --color-moved -b" Jeff King
2017-10-19 21:03                     ` Stefan Beller
2017-10-19 21:14                       ` Jeff King
2017-10-19 20:29                   ` [PATCH 4/5] diff: fix whitespace-skipping with --color-moved Jeff King
2017-10-19 21:15                     ` Stefan Beller
2017-10-19 21:19                       ` Jeff King
2017-10-20  7:23                     ` Simon Ruderich
2017-10-20 22:37                       ` Jeff King
2017-10-19 20:31                   ` [PATCH 5/5] diff: handle NULs in get_string_hash() Jeff King
2017-10-19 21:31                     ` Stefan Beller
2017-10-19 21:39                       ` Jeff King
2017-10-19 21:50                         ` Stefan Beller
2017-10-19 19:53             ` [PATCH] diff.c: increment buffer pointer in all code path Stefan Beller
2017-10-19 19:55               ` Jeff King

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGZ79kZ=v0GiBfNPK=iRp7Hi=zG=KUL6+jHuc3N5CRqtECt7fw@mail.gmail.com' \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=orgads@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox