git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v2] line-range: Fix infinite loop bug with degenerate regex
@ 2022-12-05 19:36 Lars Kellogg-Stedman
  2022-12-07  4:33 ` Eric Sunshine
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Lars Kellogg-Stedman @ 2022-12-05 19:36 UTC (permalink / raw)
  To: git; +Cc: Lars Kellogg-Stedman

When the -L argument to "git log" is passed the degenerate regular
expression "$" (as in "-L :$:line-range.c"), this results in an
infinite loop in find_funcname_matching_regexp() (the function
iterates through the file correctly, but when it reaches the end of
the file it matches $ against the empty string, "", and at that points
loops forever).

Modify the loop condition from while (1) to while (*start) so that the
loop exits when start is the empty string. In this case, "git log" exits
with the error:

    fatal: -L parameter '$' starting at line 1: no match

Originally reported in <https://stackoverflow.com/q/74690545/147356>.

Signed-off-by: Lars Kellogg-Stedman <lars@oddbit.com>
---
 line-range.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/line-range.c b/line-range.c
index 955a8a9535..bdcb810485 100644
--- a/line-range.c
+++ b/line-range.c
@@ -135,7 +135,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
 {
 	int reg_error;
 	regmatch_t match[1];
-	while (1) {
+	while (*start) {
 		const char *bol, *eol;
 		reg_error = regexec(regexp, start, 1, match, 0);
 		if (reg_error == REG_NOMATCH)
@@ -161,6 +161,8 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
 			return bol;
 		start = eol;
 	}
+
+    return NULL;
 }
 
 static const char *parse_range_funcname(
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] line-range: Fix infinite loop bug with degenerate regex
  2022-12-05 19:36 [PATCH v2] line-range: Fix infinite loop bug with degenerate regex Lars Kellogg-Stedman
@ 2022-12-07  4:33 ` Eric Sunshine
  2022-12-07  4:52 ` Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Eric Sunshine @ 2022-12-07  4:33 UTC (permalink / raw)
  To: Lars Kellogg-Stedman; +Cc: git

On Mon, Dec 5, 2022 at 2:45 PM Lars Kellogg-Stedman <lars@oddbit.com> wrote:
> line-range: Fix infinite loop bug with degenerate regex

s/Fix/fix/

> When the -L argument to "git log" is passed the degenerate regular
> expression "$" (as in "-L :$:line-range.c"), this results in an
> infinite loop in find_funcname_matching_regexp() (the function
> iterates through the file correctly, but when it reaches the end of
> the file it matches $ against the empty string, "", and at that points
> loops forever).

s/points/point/

> Modify the loop condition from while (1) to while (*start) so that the
> loop exits when start is the empty string. In this case, "git log" exits
> with the error:
>
>     fatal: -L parameter '$' starting at line 1: no match

I've never really used `-L:funcname:`, so it took me by surprise that
this would be reported as a fatal error. However, now that I've read
through the code and put a bit of thought into degenerate cases, I
can't come up with any better way to report it. Okay.

> Originally reported in <https://stackoverflow.com/q/74690545/147356>.
>
> Signed-off-by: Lars Kellogg-Stedman <lars@oddbit.com>
> ---
> diff --git a/line-range.c b/line-range.c
> @@ -135,7 +135,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
>  {
>         int reg_error;
>         regmatch_t match[1];
> -       while (1) {
> +       while (*start) {
>                 const char *bol, *eol;
>                 reg_error = regexec(regexp, start, 1, match, 0);
>                 if (reg_error == REG_NOMATCH)
> @@ -161,6 +161,8 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
>                         return bol;
>                 start = eol;
>         }
> +
> +    return NULL;
>  }

I'm not particularly familiar with this code (even though I touched it
years ago), so I can't say whether this is the best fix, but it seems
reasonable now that I've studied the code and put some thought into
degenerate cases. I had wondered if it would be possible to catch
these degenerate cases in some general way, but whatever ideas I had
seemed too special-purpose; your solution here is straightforward and
probably more robust than special-purpose code.

Style-wise, I'd probably drop the blank line before `return NULL`.

Please do re-roll once more with a test which verifies that this bug
has been fixed. Here's a test you can add to the bottom of
t/t4211-line-log.sh for that purpose (be sure to re-indent it with
TABs which Gmail mushes out):

    test_expect_success 'degenerate -L:$: does not hang' '
        >content-immaterial &&
        git add content-immaterial &&
        git commit -m immaterial &&
        test_must_fail git log -L:$:content-immaterial
    '

And here's my sign-off which you can insert above your own sign-off if
you include the above test.

    Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] line-range: Fix infinite loop bug with degenerate regex
  2022-12-05 19:36 [PATCH v2] line-range: Fix infinite loop bug with degenerate regex Lars Kellogg-Stedman
  2022-12-07  4:33 ` Eric Sunshine
@ 2022-12-07  4:52 ` Ævar Arnfjörð Bjarmason
  2022-12-07  5:29   ` Junio C Hamano
                     ` (2 more replies)
  2022-12-11  1:53 ` [PATCH v3] line-range: Fix infinite loop bug with degenerate '$' regex Lars Kellogg-Stedman
  2022-12-19 22:48 ` [PATCH v4] line-range: fix infinite loop bug with " Lars Kellogg-Stedman
  3 siblings, 3 replies; 15+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-07  4:52 UTC (permalink / raw)
  To: Lars Kellogg-Stedman; +Cc: git


On Mon, Dec 05 2022, Lars Kellogg-Stedman wrote:

> When the -L argument to "git log" is passed the degenerate regular
> expression "$" (as in "-L :$:line-range.c"), this results in an
> infinite loop in find_funcname_matching_regexp() (the function
> iterates through the file correctly, but when it reaches the end of
> the file it matches $ against the empty string, "", and at that points
> loops forever).
>
> Modify the loop condition from while (1) to while (*start) so that the
> loop exits when start is the empty string. In this case, "git log" exits
> with the error:
>
>     fatal: -L parameter '$' starting at line 1: no match
>
> Originally reported in <https://stackoverflow.com/q/74690545/147356>.
>
> Signed-off-by: Lars Kellogg-Stedman <lars@oddbit.com>
> ---
>  line-range.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/line-range.c b/line-range.c
> index 955a8a9535..bdcb810485 100644
> --- a/line-range.c
> +++ b/line-range.c
> @@ -135,7 +135,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
>  {
>  	int reg_error;
>  	regmatch_t match[1];
> -	while (1) {
> +	while (*start) {
>  		const char *bol, *eol;
>  		reg_error = regexec(regexp, start, 1, match, 0);
>  		if (reg_error == REG_NOMATCH)
> @@ -161,6 +161,8 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
>  			return bol;
>  		start = eol;
>  	}
> +
> +    return NULL;
>  }
>  
>  static const char *parse_range_funcname(

We really should fix this, but why not just count this as a match,
rather than erroring out?

That we're mixing up whether '$' always matches here with our iteration
loop is our own internal bug, we shouldn't error out on a '$'. It's just
a regex that happens to match everything.

It's perfectly OK to provide a zero-width regex in general, e.g. try:

     git config --get-regexp '$'

Or for a non-git tool:

	grep '$' FILE

I think the real bug here is that we are pointing our regexec() at the
haystack of multiple lines as an optimization.

Then in "determine extent of line matched" and "is it a funcname line"
assuming that a positive match must be non-zero-width. But that's just
because the optimization is leaky.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] line-range: Fix infinite loop bug with degenerate regex
  2022-12-07  4:52 ` Ævar Arnfjörð Bjarmason
@ 2022-12-07  5:29   ` Junio C Hamano
  2022-12-07 20:30   ` SZEDER Gábor
  2022-12-09 19:16   ` Lars Kellogg-Stedman
  2 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2022-12-07  5:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Lars Kellogg-Stedman, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> It's perfectly OK to provide a zero-width regex in general, e.g. try:
> Or for a non-git tool:
> ...

The same story goes for "^" or even ".*", I suspect.

> I think the real bug here is that we are pointing our regexec() at the
> haystack of multiple lines as an optimization.
>
> Then in "determine extent of line matched" and "is it a funcname line"
> assuming that a positive match must be non-zero-width. But that's just
> because the optimization is leaky.

Good thinking.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] line-range: Fix infinite loop bug with degenerate regex
  2022-12-07  4:52 ` Ævar Arnfjörð Bjarmason
  2022-12-07  5:29   ` Junio C Hamano
@ 2022-12-07 20:30   ` SZEDER Gábor
  2022-12-09 19:16   ` Lars Kellogg-Stedman
  2 siblings, 0 replies; 15+ messages in thread
From: SZEDER Gábor @ 2022-12-07 20:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Lars Kellogg-Stedman, git

On Wed, Dec 07, 2022 at 05:52:04AM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Dec 05 2022, Lars Kellogg-Stedman wrote:
> 
> > When the -L argument to "git log" is passed the degenerate regular
> > expression "$" (as in "-L :$:line-range.c"), this results in an
> > infinite loop in find_funcname_matching_regexp() (the function
> > iterates through the file correctly, but when it reaches the end of
> > the file it matches $ against the empty string, "", and at that points
> > loops forever).
> >
> > Modify the loop condition from while (1) to while (*start) so that the
> > loop exits when start is the empty string. In this case, "git log" exits
> > with the error:
> >
> >     fatal: -L parameter '$' starting at line 1: no match
> >
> > Originally reported in <https://stackoverflow.com/q/74690545/147356>.
> >
> > Signed-off-by: Lars Kellogg-Stedman <lars@oddbit.com>
> > ---
> >  line-range.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/line-range.c b/line-range.c
> > index 955a8a9535..bdcb810485 100644
> > --- a/line-range.c
> > +++ b/line-range.c
> > @@ -135,7 +135,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
> >  {
> >  	int reg_error;
> >  	regmatch_t match[1];
> > -	while (1) {
> > +	while (*start) {
> >  		const char *bol, *eol;
> >  		reg_error = regexec(regexp, start, 1, match, 0);
> >  		if (reg_error == REG_NOMATCH)
> > @@ -161,6 +161,8 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
> >  			return bol;
> >  		start = eol;
> >  	}
> > +
> > +    return NULL;
> >  }
> >  
> >  static const char *parse_range_funcname(
> 
> We really should fix this, but why not just count this as a match,
> rather than erroring out?
> 
> That we're mixing up whether '$' always matches here with our iteration
> loop is our own internal bug, we shouldn't error out on a '$'. It's just
> a regex that happens to match everything.

Indeed.  The description of '-L...' in the manpage of 'git log' (and
'blame') states:

    If :<funcname> is given in place of <start> and <end>, it is a
    regular expression that denotes the range from the first funcname
    line that matches <funcname>, up to the next funcname line.

So, if we use '-L' with a funcname regex that matches every line, be
it '$', '^' or '.*', then this would mean that it should denote the
range from the first funcname line in that file to the second.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] line-range: Fix infinite loop bug with degenerate regex
  2022-12-07  4:52 ` Ævar Arnfjörð Bjarmason
  2022-12-07  5:29   ` Junio C Hamano
  2022-12-07 20:30   ` SZEDER Gábor
@ 2022-12-09 19:16   ` Lars Kellogg-Stedman
  2 siblings, 0 replies; 15+ messages in thread
From: Lars Kellogg-Stedman @ 2022-12-09 19:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

On Wed, Dec 07, 2022 at 05:52:04AM +0100, Ævar Arnfjörð Bjarmason wrote:
> We really should fix this, but why not just count this as a match,
> rather than erroring out?
> 
> That we're mixing up whether '$' always matches here with our iteration
> loop is our own internal bug, we shouldn't error out on a '$'. It's just
> a regex that happens to match everything.

Thanks, that makes a lot of sense. I'll try to rework things this
weekend.

Cheers,

-- 
Lars Kellogg-Stedman <lars@oddbit.com> | larsks @ {irc,twitter,github}
http://blog.oddbit.com/                | N1LKS

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3] line-range: Fix infinite loop bug with degenerate '$' regex
  2022-12-05 19:36 [PATCH v2] line-range: Fix infinite loop bug with degenerate regex Lars Kellogg-Stedman
  2022-12-07  4:33 ` Eric Sunshine
  2022-12-07  4:52 ` Ævar Arnfjörð Bjarmason
@ 2022-12-11  1:53 ` Lars Kellogg-Stedman
  2022-12-11  3:32   ` Junio C Hamano
  2022-12-19 22:48 ` [PATCH v4] line-range: fix infinite loop bug with " Lars Kellogg-Stedman
  3 siblings, 1 reply; 15+ messages in thread
From: Lars Kellogg-Stedman @ 2022-12-11  1:53 UTC (permalink / raw)
  To: git; +Cc: Lars Kellogg-Stedman

When the -L argument to "git log" is passed the degenerate regular
expression "$" (as in "-L :$:line-range.c"), this results in an
infinite loop in find_funcname_matching_regexp().

Modify find_funcname_matching_regexp to correctly match the entire line
instead of the zero-width match at eol and update the loop condition to
prevent an infinite loop in the event of other undiscovered corner cases.

Originally reported in <https://stackoverflow.com/q/74690545/147356>.

Signed-off-by: Lars Kellogg-Stedman <lars@oddbit.com>
---
This modifies my earlier patch so that instead of failing, the
regular expression '$' correctly matches any function name (and so
always returns the first function in the named file). This makes '$'
behave the same as '.' and '^'.

The primary change is that we pre-decrement the beginning-of-line
marker ('bol') before comparing it to '\n'. In the case of '$', where
we start with bol == eol, this ensures that bol will find the
beginning of the line on which the match occurred.

 line-range.c        |  6 +++---
 t/t4211-line-log.sh | 22 ++++++++++++++++++++++
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/line-range.c b/line-range.c
index 955a8a9535..b0e26e7f9d 100644
--- a/line-range.c
+++ b/line-range.c
@@ -135,7 +135,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
 {
 	int reg_error;
 	regmatch_t match[1];
-	while (1) {
+	while (*start) {
 		const char *bol, *eol;
 		reg_error = regexec(regexp, start, 1, match, 0);
 		if (reg_error == REG_NOMATCH)
@@ -148,8 +148,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
 		/* determine extent of line matched */
 		bol = start+match[0].rm_so;
 		eol = start+match[0].rm_eo;
-		while (bol > start && *bol != '\n')
-			bol--;
+		while (bol > start && *--bol != '\n');
 		if (*bol == '\n')
 			bol++;
 		while (*eol && *eol != '\n')
@@ -161,6 +160,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
 			return bol;
 		start = eol;
 	}
+	return NULL;
 }
 
 static const char *parse_range_funcname(
diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh
index ac9e4d0928..19db07a8df 100755
--- a/t/t4211-line-log.sh
+++ b/t/t4211-line-log.sh
@@ -315,4 +315,26 @@ test_expect_success 'line-log with --before' '
 	test_cmp expect actual
 '
 
+test_expect_success 'setup tests for zero-width regular expressions' '
+	cat > expect <<-EOF
+	Modify func1() in file.c
+	Add func1() and func2() in file.c
+	EOF
+'
+
+test_expect_success 'zero-width regex $ matches any function name' '
+	git log --format="%s" --no-patch "-L:$:file.c" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'zero-width regex ^ matches any function name' '
+	git log --format="%s" --no-patch "-L:^:file.c" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'zero-width regex . matches any function name' '
+	git log --format="%s" --no-patch "-L:.:file.c" >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] line-range: Fix infinite loop bug with degenerate '$' regex
  2022-12-11  1:53 ` [PATCH v3] line-range: Fix infinite loop bug with degenerate '$' regex Lars Kellogg-Stedman
@ 2022-12-11  3:32   ` Junio C Hamano
  2022-12-11  3:34     ` Junio C Hamano
  2022-12-14 14:53     ` Lars Kellogg-Stedman
  0 siblings, 2 replies; 15+ messages in thread
From: Junio C Hamano @ 2022-12-11  3:32 UTC (permalink / raw)
  To: Lars Kellogg-Stedman; +Cc: git

Lars Kellogg-Stedman <lars@oddbit.com> writes:

> When the -L argument to "git log" is passed the degenerate regular
> expression "$" (as in "-L :$:line-range.c"), this results in an
> infinite loop in find_funcname_matching_regexp().

Is "matching an empty line" the only way a regular expression can be
a "degenerate" one?  If not, perhaps being a bit more explicit would
help the readers, e.g.

    ... a regular expression that matches any line, even an empty
    one, such as "-L :$:line-range.c", this results in ...

> The primary change is that we pre-decrement the beginning-of-line
> marker ('bol') before comparing it to '\n'. In the case of '$', where
> we start with bol == eol, this ensures that bol will find the
> beginning of the line on which the match occurred.

This clear explanation probably deserves to be in the commit log
message proper.

> @@ -148,8 +148,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
>  		/* determine extent of line matched */
>  		bol = start+match[0].rm_so;
>  		eol = start+match[0].rm_eo;
> -		while (bol > start && *bol != '\n')
> -			bol--;
> +		while (bol > start && *--bol != '\n');

Please write it on two lines, and highlight an empty body of the
loop, like so

		while (condition)
			; /* nothing */

I think the intention of the above loop is to decrement bol
sufficiently and make it point at the terminating LF at the end
of the previous line, and then the next increment here

>  		if (*bol == '\n')
>  			bol++;

compensates to bring bol back to point at the beginning of line.

In the iteration of the loop when bol == start + 1, we inspect
*--bol (i.e.  *start).  If it is LF, we exit the loop, so bol ==
start and *bol == '\n'.  If it is not LF, we iterate one more time,
notice bol == start and exit the loop, so bol == start and *bol !=
'\n'.  bol will never go below start, so dereference *bol to see it
value where should be safe without OOB read.

> @@ -161,6 +160,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
>  			return bol;
>  		start = eol;
>  	}
> +	return NULL;
>  }

What is this change about?  Isn't the above an endless loop without
break, from which the only way for the control to leave it is by
returning?

> diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh
> index ac9e4d0928..19db07a8df 100755
> --- a/t/t4211-line-log.sh
> +++ b/t/t4211-line-log.sh
> @@ -315,4 +315,26 @@ test_expect_success 'line-log with --before' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'setup tests for zero-width regular expressions' '
> +	cat > expect <<-EOF

Style: lose the SP after "cat >".

> +	Modify func1() in file.c
> +	Add func1() and func2() in file.c
> +	EOF
> +'
> +
> +test_expect_success 'zero-width regex $ matches any function name' '
> +	git log --format="%s" --no-patch "-L:$:file.c" >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'zero-width regex ^ matches any function name' '
> +	git log --format="%s" --no-patch "-L:^:file.c" >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'zero-width regex . matches any function name' '

"." is one character wide, not zero-width.  Did you mean ".*"?

> +	git log --format="%s" --no-patch "-L:.:file.c" >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] line-range: Fix infinite loop bug with degenerate '$' regex
  2022-12-11  3:32   ` Junio C Hamano
@ 2022-12-11  3:34     ` Junio C Hamano
  2022-12-14 14:53     ` Lars Kellogg-Stedman
  1 sibling, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2022-12-11  3:34 UTC (permalink / raw)
  To: Lars Kellogg-Stedman; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Lars Kellogg-Stedman <lars@oddbit.com> writes:
>
>> When the -L argument to "git log" is passed the degenerate regular
>> expression "$" (as in "-L :$:line-range.c"), this results in an
>> infinite loop in find_funcname_matching_regexp().
>
> Is "matching an empty line" the only way a regular expression can be
> a "degenerate" one?  If not, perhaps being a bit more explicit would
> help the readers, e.g.
>
>     ... a regular expression that matches any line, even an empty
>     one, such as "-L :$:line-range.c", this results in ...

I forgot to point out that the above comment on "degenerate" applies
equally to the commit title (which matters more).  Also, please do
not upcase the first word after "<area>:" on the title line.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] line-range: Fix infinite loop bug with degenerate '$' regex
  2022-12-11  3:32   ` Junio C Hamano
  2022-12-11  3:34     ` Junio C Hamano
@ 2022-12-14 14:53     ` Lars Kellogg-Stedman
  2022-12-18  1:33       ` Junio C Hamano
  1 sibling, 1 reply; 15+ messages in thread
From: Lars Kellogg-Stedman @ 2022-12-14 14:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Sun, Dec 11, 2022 at 12:32:38PM +0900, Junio C Hamano wrote:
> Lars Kellogg-Stedman <lars@oddbit.com> writes:
> 
> > When the -L argument to "git log" is passed the degenerate regular
> > expression "$" (as in "-L :$:line-range.c"), this results in an
> > infinite loop in find_funcname_matching_regexp().
> 
> Is "matching an empty line" the only way a regular expression can be
> a "degenerate" one?  If not, perhaps being a bit more explicit would
> help the readers, e.g.

"Matching an empty line" isn't the issue (and if I implied that it
was, that's my bad). The issue only crops up when matching the
zero-width regular expression at the *end* of a line.

I'll update the subject and description to be more clear...

> This clear explanation probably deserves to be in the commit log
> message proper.

...including adding the extended description to the commit message. Some projects
object if commit messages get too wordy.

> Please write it on two lines, and highlight an empty body of the
> loop, like so

Will do.

> > @@ -161,6 +160,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
> >  			return bol;
> >  		start = eol;
> >  	}
> > +	return NULL;
> >  }
> 
> What is this change about?  Isn't the above an endless loop without
> break, from which the only way for the control to leave it is by
> returning?

It's not an endless loop without break; this change modified the loop
condition:

> -       while (1) {
> +       while (*start) {

That would have prevented the infinite loop in the first place. I'm
happy to drop this change if you prefer, but if this condition had
been in place originally we wouldn't have had the infinite loop bug
(we would still have had incorrect behavior, but it would have been
perhaps easier for an end user to diagnose).

> Style: lose the SP after "cat >".

Will do.

-- 
Lars Kellogg-Stedman <lars@oddbit.com> | larsks @ {irc,twitter,github}
http://blog.oddbit.com/                | N1LKS

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] line-range: Fix infinite loop bug with degenerate '$' regex
  2022-12-14 14:53     ` Lars Kellogg-Stedman
@ 2022-12-18  1:33       ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2022-12-18  1:33 UTC (permalink / raw)
  To: Lars Kellogg-Stedman; +Cc: git

Lars Kellogg-Stedman <lars@oddbit.com> writes:

>> > @@ -161,6 +160,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
>> >  			return bol;
>> >  		start = eol;
>> >  	}
>> > +	return NULL;
>> >  }
>> 
>> What is this change about?  Isn't the above an endless loop without
>> break, from which the only way for the control to leave it is by
>> returning?
>
> It's not an endless loop without break; this change modified the loop
> condition:

Ah, thanks, that's clear, and sorry for the noise, and thanks.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v4] line-range: fix infinite loop bug with '$' regex
  2022-12-05 19:36 [PATCH v2] line-range: Fix infinite loop bug with degenerate regex Lars Kellogg-Stedman
                   ` (2 preceding siblings ...)
  2022-12-11  1:53 ` [PATCH v3] line-range: Fix infinite loop bug with degenerate '$' regex Lars Kellogg-Stedman
@ 2022-12-19 22:48 ` Lars Kellogg-Stedman
  2022-12-19 22:55   ` Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 15+ messages in thread
From: Lars Kellogg-Stedman @ 2022-12-19 22:48 UTC (permalink / raw)
  To: git; +Cc: Lars Kellogg-Stedman

When the -L argument to "git log" is passed the zero-width regular
expression "$" (as in "-L :$:line-range.c"), this results in an
infinite loop in find_funcname_matching_regexp().

Modify find_funcname_matching_regexp to correctly match the entire line
instead of the zero-width match at eol and update the loop condition to
prevent an infinite loop in the event of other undiscovered corner cases.

The primary change is that we pre-decrement the beginning-of-line marker
('bol') before comparing it to '\n'. In the case of '$', where we match the
'\n' at the end of the line and start the loop with bol == eol, this
ensures that bol will find the beginning of the line on which the match
occurred.

Originally reported in <https://stackoverflow.com/q/74690545/147356>.

Signed-off-by: Lars Kellogg-Stedman <lars@oddbit.com>
---
 line-range.c        |  7 ++++---
 t/t4211-line-log.sh | 22 ++++++++++++++++++++++
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/line-range.c b/line-range.c
index 955a8a9535..47bf0d6f1a 100644
--- a/line-range.c
+++ b/line-range.c
@@ -135,7 +135,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
 {
 	int reg_error;
 	regmatch_t match[1];
-	while (1) {
+	while (*start) {
 		const char *bol, *eol;
 		reg_error = regexec(regexp, start, 1, match, 0);
 		if (reg_error == REG_NOMATCH)
@@ -148,8 +148,8 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
 		/* determine extent of line matched */
 		bol = start+match[0].rm_so;
 		eol = start+match[0].rm_eo;
-		while (bol > start && *bol != '\n')
-			bol--;
+		while (bol > start && *--bol != '\n')
+			; /* nothing */
 		if (*bol == '\n')
 			bol++;
 		while (*eol && *eol != '\n')
@@ -161,6 +161,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
 			return bol;
 		start = eol;
 	}
+	return NULL;
 }
 
 static const char *parse_range_funcname(
diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh
index ac9e4d0928..c6540e822f 100755
--- a/t/t4211-line-log.sh
+++ b/t/t4211-line-log.sh
@@ -315,4 +315,26 @@ test_expect_success 'line-log with --before' '
 	test_cmp expect actual
 '
 
+test_expect_success 'setup tests for zero-width regular expressions' '
+	cat >expect <<-EOF
+	Modify func1() in file.c
+	Add func1() and func2() in file.c
+	EOF
+'
+
+test_expect_success 'zero-width regex $ matches any function name' '
+	git log --format="%s" --no-patch "-L:$:file.c" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'zero-width regex ^ matches any function name' '
+	git log --format="%s" --no-patch "-L:^:file.c" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'zero-width regex .* matches any function name' '
+	git log --format="%s" --no-patch "-L:.*:file.c" >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] line-range: fix infinite loop bug with '$' regex
  2022-12-19 22:48 ` [PATCH v4] line-range: fix infinite loop bug with " Lars Kellogg-Stedman
@ 2022-12-19 22:55   ` Ævar Arnfjörð Bjarmason
  2022-12-20  0:00     ` Eric Sunshine
  0 siblings, 1 reply; 15+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-19 22:55 UTC (permalink / raw)
  To: Lars Kellogg-Stedman; +Cc: git


On Mon, Dec 19 2022, Lars Kellogg-Stedman wrote:

> When the -L argument to "git log" is passed the zero-width regular
> expression "$" (as in "-L :$:line-range.c"), this results in an
> infinite loop in find_funcname_matching_regexp().
>
> Modify find_funcname_matching_regexp to correctly match the entire line
> instead of the zero-width match at eol and update the loop condition to
> prevent an infinite loop in the event of other undiscovered corner cases.
>
> The primary change is that we pre-decrement the beginning-of-line marker
> ('bol') before comparing it to '\n'. In the case of '$', where we match the
> '\n' at the end of the line and start the loop with bol == eol, this
> ensures that bol will find the beginning of the line on which the match
> occurred.
>
> Originally reported in <https://stackoverflow.com/q/74690545/147356>.
>
> Signed-off-by: Lars Kellogg-Stedman <lars@oddbit.com>
> ---
>  line-range.c        |  7 ++++---
>  t/t4211-line-log.sh | 22 ++++++++++++++++++++++
>  2 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/line-range.c b/line-range.c
> index 955a8a9535..47bf0d6f1a 100644
> --- a/line-range.c
> +++ b/line-range.c
> @@ -135,7 +135,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
>  {
>  	int reg_error;
>  	regmatch_t match[1];
> -	while (1) {
> +	while (*start) {
>  		const char *bol, *eol;
>  		reg_error = regexec(regexp, start, 1, match, 0);
>  		if (reg_error == REG_NOMATCH)
> @@ -148,8 +148,8 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
>  		/* determine extent of line matched */
>  		bol = start+match[0].rm_so;
>  		eol = start+match[0].rm_eo;
> -		while (bol > start && *bol != '\n')
> -			bol--;
> +		while (bol > start && *--bol != '\n')
> +			; /* nothing */
>  		if (*bol == '\n')
>  			bol++;
>  		while (*eol && *eol != '\n')
> @@ -161,6 +161,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
>  			return bol;
>  		start = eol;
>  	}
> +	return NULL;
>  }
>  
>  static const char *parse_range_funcname(
> diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh
> index ac9e4d0928..c6540e822f 100755
> --- a/t/t4211-line-log.sh
> +++ b/t/t4211-line-log.sh
> @@ -315,4 +315,26 @@ test_expect_success 'line-log with --before' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'setup tests for zero-width regular expressions' '
> +	cat >expect <<-EOF
> +	Modify func1() in file.c
> +	Add func1() and func2() in file.c
> +	EOF
> +'
> +
> +test_expect_success 'zero-width regex $ matches any function name' '
> +	git log --format="%s" --no-patch "-L:$:file.c" >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'zero-width regex ^ matches any function name' '
> +	git log --format="%s" --no-patch "-L:^:file.c" >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'zero-width regex .* matches any function name' '
> +	git log --format="%s" --no-patch "-L:.*:file.c" >actual &&
> +	test_cmp expect actual
> +'
> +

Untested, but as most of this is repeated & would fail if the "setup"
test is skipped, a one-off function would be better, e.g.:
	
	test_zerowidth_regex () {
		local rx="$1" &&
		
		test_expect_success "zero-width regex '$rx' matches any function name" '
			cat >expect <<-\EOF &&
			Modify func1() in file.c
			Add func1() and func2() in file.c
			EOF
			git log --format="%s" --no-patch -L:"$rx":file.c >actual &&
			test_cmp expect actual
		'
	}
	
	test_zerowidth_regex '$'
	test_zerowidth_regex '^'
	test_zerowidth_regex '.*'

The change of direction for the fix itself looks good to me (re my
earlier feedback on a previous round), i.e. that we should fix our own
code, not forbid '$' in regexes.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] line-range: fix infinite loop bug with '$' regex
  2022-12-19 22:55   ` Ævar Arnfjörð Bjarmason
@ 2022-12-20  0:00     ` Eric Sunshine
  2022-12-20  1:07       ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Sunshine @ 2022-12-20  0:00 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Lars Kellogg-Stedman, git

On Mon, Dec 19, 2022 at 6:00 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> On Mon, Dec 19 2022, Lars Kellogg-Stedman wrote:
> > +test_expect_success 'setup tests for zero-width regular expressions' '
> > +     cat >expect <<-EOF
> > +     Modify func1() in file.c
> > +     Add func1() and func2() in file.c
> > +     EOF
> > +'
> > +
> > +test_expect_success 'zero-width regex $ matches any function name' '
> > +     git log --format="%s" --no-patch "-L:$:file.c" >actual &&
> > +     test_cmp expect actual
> > +'
>
> Untested, but as most of this is repeated & would fail if the "setup"
> test is skipped, a one-off function would be better, e.g.:
>
>         test_zerowidth_regex () {
>                 local rx="$1" &&
>                 [...]
>         }
>         test_zerowidth_regex '$'
>         test_zerowidth_regex '^'
>         test_zerowidth_regex '.*'
>
> The change of direction for the fix itself looks good to me (re my
> earlier feedback on a previous round), i.e. that we should fix our own
> code, not forbid '$' in regexes.

It's subjective, of course, but the patch seems "good enough" as-is
and the tests are easy to understand. Therefore, can you clarify for
Lars and other reviewers if you're merely presenting an alternative
approach, or if you consider your suggestion "blocking" and expect a
reroll.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] line-range: fix infinite loop bug with '$' regex
  2022-12-20  0:00     ` Eric Sunshine
@ 2022-12-20  1:07       ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2022-12-20  1:07 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Ævar Arnfjörð Bjarmason, Lars Kellogg-Stedman, git

Eric Sunshine <sunshine@sunshineco.com> writes:

>> The change of direction for the fix itself looks good to me (re my
>> earlier feedback on a previous round), i.e. that we should fix our own
>> code, not forbid '$' in regexes.
>
> It's subjective, of course, but the patch seems "good enough" as-is
> and the tests are easy to understand. Therefore, can you clarify for
> Lars and other reviewers if you're merely presenting an alternative
> approach, or if you consider your suggestion "blocking" and expect a
> reroll.

Thanks for saying this.

I personally felt that the original was easier to follow---given
that these tests do not have too many moving parts that may need to
be changed (in which case abstracting common parts out would allow
us to maintain only one copy which is a big win), being able to read
from top to bottom without having to read a function and recall what
the parameter was etc.


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-12-20  1:08 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-05 19:36 [PATCH v2] line-range: Fix infinite loop bug with degenerate regex Lars Kellogg-Stedman
2022-12-07  4:33 ` Eric Sunshine
2022-12-07  4:52 ` Ævar Arnfjörð Bjarmason
2022-12-07  5:29   ` Junio C Hamano
2022-12-07 20:30   ` SZEDER Gábor
2022-12-09 19:16   ` Lars Kellogg-Stedman
2022-12-11  1:53 ` [PATCH v3] line-range: Fix infinite loop bug with degenerate '$' regex Lars Kellogg-Stedman
2022-12-11  3:32   ` Junio C Hamano
2022-12-11  3:34     ` Junio C Hamano
2022-12-14 14:53     ` Lars Kellogg-Stedman
2022-12-18  1:33       ` Junio C Hamano
2022-12-19 22:48 ` [PATCH v4] line-range: fix infinite loop bug with " Lars Kellogg-Stedman
2022-12-19 22:55   ` Ævar Arnfjörð Bjarmason
2022-12-20  0:00     ` Eric Sunshine
2022-12-20  1:07       ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).