git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* Bug: combined diff with --ignore-matching-lines
@ 2022-05-14  9:32 René Scharfe
  2022-05-23 18:31 ` [PATCH] diff: fix a segfault in >2 tree -I<regex> and --output=<file> Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 6+ messages in thread
From: René Scharfe @ 2022-05-14  9:32 UTC (permalink / raw)
  To: Git List
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Michał Kępień

Hi all,

git diff segfaults when it's asked to produce a combined diff and ignore
certain lines with --ignore-matching-lines/-I, e.g.:

   $ git diff -I DEF_VER v2.33.3 v2.33.3^@
   zsh: segmentation fault  ./git-diff -I DEF_VER v2.33.3 v2.33.3^@

That's because combine-diff.c::diff_tree_combined() copies a diffopt
without making a deep copy of the ignore_regex array and frees it, then
later tries to use it.

The segfault can be fixed by adding "diffopt.no_free = 1;" or reverting
c45dc9cf30 (diff: plug memory leak from regcomp() on {log,diff} -I,
2021-02-11).

But even with that the only thing the command ignores is the option -I;
the GIT-VERSION-GEN changes in the middle should have been omitted:

   $ git diff -I DEF_VER v2.33.3 v2.33.3^@
   diff --cc Documentation/RelNotes/2.33.3.txt
   index 0000000000,0000000000..e2bada12a1
   new file mode 100644
   --- /dev/null
   +++ b/Documentation/RelNotes/2.33.3.txt
   @@@ -1,0 -1,0 +1,4 @@@
   ++Git Documentation/RelNotes/2.33.3.txt Release Notes
   ++=========================
   ++
   ++This release merges up the fixes that appear in v2.33.3.
   diff --cc GIT-VERSION-GEN
   index d81eab5f00,e7efe58866..86a3a2870c
   --- a/GIT-VERSION-GEN
   +++ b/GIT-VERSION-GEN
   @@@ -1,7 -1,7 +1,7 @@@
     #!/bin/sh

     GVF=GIT-VERSION-FILE
   - DEF_VER=v2.33.2
    -DEF_VER=v2.32.2
   ++DEF_VER=v2.33.3

     LF='
     '
   diff --cc RelNotes
   index 8e79de2efe,4ac68388c3..899139d9ec
   --- a/RelNotes
   +++ b/RelNotes
   @@@ -1,1 -1,1 +1,1 @@@
   - Documentation/RelNotes/2.33.2.txt
    -Documentation/RelNotes/2.32.2.txt
   ++Documentation/RelNotes/2.33.3.txt

Just setting need_generic_pathscan is not enough.  Ideas?

René


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] diff: fix a segfault in >2 tree -I<regex> and --output=<file>
  2022-05-14  9:32 Bug: combined diff with --ignore-matching-lines René Scharfe
@ 2022-05-23 18:31 ` Ævar Arnfjörð Bjarmason
  2022-05-23 20:08   ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-05-23 18:31 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, René Scharfe, Michał Kępień,
	Ævar Arnfjörð Bjarmason

Fix a regression in c45dc9cf30 (diff: plug memory leak from regcomp()
on {log,diff} -I, 2021-02-11), as noted in [1] there was a logic error
where we'd free the regex too soon.

Now we'll ensure that diff_free() can be called repeatedly
instead. We'd ultimately like to do away with the "no_free" confusion
surrounding it, and to attempt to free() things only once, as outlined
in [2]. But in the meantime this will fix the segfault.

Since the rest of diff_free() was already safe re-invoke this only
affected the -I<regex> and --output=<file> options. In both cases our
behavior before wasn't very sensible. When producing a combined diff
we'll go through combined-diff.c, which doesn't handle many of the
options that the corresponding diff.c codepaths do.

Thus we're here testing that -I<regex> is ignored in this case, and
likewise for --output=<file>, but since this is what we were doing
before c45dc9cf30 let's accept it for now.

1. https://lore.kernel.org/git/a6a14213-bc82-d6fb-43dd-5a423c40a4f8@web.de/
2. https://lore.kernel.org/git/220520.86pmk81a9z.gmgdl@evledraar.gmail.com/

Reported-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

On Sat, May 14 2022, René Scharfe wrote:

> Hi all,
>
> git diff segfaults when it's asked to produce a combined diff and ignore
> certain lines with --ignore-matching-lines/-I, e.g.:
>
>    $ git diff -I DEF_VER v2.33.3 v2.33.3^@
>    zsh: segmentation fault  ./git-diff -I DEF_VER v2.33.3 v2.33.3^@

 diff.c                  |  9 ++++++---
 t/t4013-diff-various.sh | 15 +++++++++++++++
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index e71cf758861..183c9f21305 100644
--- a/diff.c
+++ b/diff.c
@@ -6432,8 +6432,10 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 
 static void diff_free_file(struct diff_options *options)
 {
-	if (options->close_file)
-		fclose(options->file);
+	if (!options->close_file)
+		return;
+	options->close_file = 0;
+	fclose(options->file);
 }
 
 static void diff_free_ignore_regex(struct diff_options *options)
@@ -6444,7 +6446,8 @@ static void diff_free_ignore_regex(struct diff_options *options)
 		regfree(options->ignore_regex[i]);
 		free(options->ignore_regex[i]);
 	}
-	free(options->ignore_regex);
+	options->ignore_regex_nr = 0;
+	FREE_AND_NULL(options->ignore_regex);
 }
 
 void diff_free(struct diff_options *options)
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 056e922164d..b556d185f53 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -614,4 +614,19 @@ test_expect_success 'diff -I<regex>: detect malformed regex' '
 	test_i18ngrep "invalid regex given to -I: " error
 '
 
+test_expect_success 'diff -I<regex>: combined diff does not segfault' '
+	revs="HEAD~2 HEAD~ HEAD" &&
+	git diff $revs >expect &&
+	git diff -I . $revs >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'diff --output=<file>: combined diff does not segfault' '
+	revs="HEAD~2 HEAD~ HEAD" &&
+	git diff --output=expect.file $revs >expect.out &&
+	git diff $revs >actual &&
+	test_cmp expect.out actual &&
+	test_must_be_empty expect.file
+'
+
 test_done
-- 
2.36.1.1038.gde12b200317


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] diff: fix a segfault in >2 tree -I<regex> and --output=<file>
  2022-05-23 18:31 ` [PATCH] diff: fix a segfault in >2 tree -I<regex> and --output=<file> Ævar Arnfjörð Bjarmason
@ 2022-05-23 20:08   ` Junio C Hamano
  2022-05-24 11:38     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2022-05-23 20:08 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, René Scharfe, Michał Kępień

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Fix a regression in c45dc9cf30 (diff: plug memory leak from regcomp()
> on {log,diff} -I, 2021-02-11), as noted in [1] there was a logic error
> where we'd free the regex too soon.
>
> Now we'll ensure that diff_free() can be called repeatedly
> instead. We'd ultimately like to do away with the "no_free" confusion
> surrounding it, and to attempt to free() things only once, as outlined
> in [2]. But in the meantime this will fix the segfault.

Hmph, repeated calls to diff_free_file() now closes the file upon
the first call.  I would have expected that such a resource would be
released when all the references go away, i.e. upon the last call.
The same thing for the ignore-regex array.

Clearing the "options->close_file" bit, and using FREE_AND_NULL(),
would hide a breakage that could be caused by this change, doesn't
it, because any use-after-release will say "ah, no need to close the
file" and "oh, there is no regex".  The former is not so worrisome,
but the latter may be---we may no longer have regex because the
first call to diff_free_ignore_regex() has cleared it and the code
that wants to use the regex, if exists, would happily say "oh, there
is no regex", instead of exposing the use-after-release breakage to
a segfault.

> Thus we're here testing that -I<regex> is ignored in this case, and
> likewise for --output=<file>, but since this is what we were doing
> before c45dc9cf30 let's accept it for now.

It is true that the result of applying this patch is equivalent to
c45dc9cf (diff: plug memory leak from regcomp() on {log,diff} -I,
2021-02-11), but doesn't that merely point at the commit as the
source of behaviour breakage?  With ignore-regex leaking before that
commit, wouldn't we have been using ignore-regex with combined diff
machinery?

Sorry, but I am failing to convince myself that this is not sweep
the issue under the rug.

> 1. https://lore.kernel.org/git/a6a14213-bc82-d6fb-43dd-5a423c40a4f8@web.de/
> 2. https://lore.kernel.org/git/220520.86pmk81a9z.gmgdl@evledraar.gmail.com/
>
> Reported-by: René Scharfe <l.s.r@web.de>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>
> On Sat, May 14 2022, René Scharfe wrote:
>
>> Hi all,
>>
>> git diff segfaults when it's asked to produce a combined diff and ignore
>> certain lines with --ignore-matching-lines/-I, e.g.:
>>
>>    $ git diff -I DEF_VER v2.33.3 v2.33.3^@
>>    zsh: segmentation fault  ./git-diff -I DEF_VER v2.33.3 v2.33.3^@
>
>  diff.c                  |  9 ++++++---
>  t/t4013-diff-various.sh | 15 +++++++++++++++
>  2 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index e71cf758861..183c9f21305 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -6432,8 +6432,10 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
>  
>  static void diff_free_file(struct diff_options *options)
>  {
> -	if (options->close_file)
> -		fclose(options->file);
> +	if (!options->close_file)
> +		return;
> +	options->close_file = 0;
> +	fclose(options->file);
>  }
>  
>  static void diff_free_ignore_regex(struct diff_options *options)
> @@ -6444,7 +6446,8 @@ static void diff_free_ignore_regex(struct diff_options *options)
>  		regfree(options->ignore_regex[i]);
>  		free(options->ignore_regex[i]);
>  	}
> -	free(options->ignore_regex);
> +	options->ignore_regex_nr = 0;
> +	FREE_AND_NULL(options->ignore_regex);
>  }
>  
>  void diff_free(struct diff_options *options)
> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
> index 056e922164d..b556d185f53 100755
> --- a/t/t4013-diff-various.sh
> +++ b/t/t4013-diff-various.sh
> @@ -614,4 +614,19 @@ test_expect_success 'diff -I<regex>: detect malformed regex' '
>  	test_i18ngrep "invalid regex given to -I: " error
>  '
>  
> +test_expect_success 'diff -I<regex>: combined diff does not segfault' '
> +	revs="HEAD~2 HEAD~ HEAD" &&
> +	git diff $revs >expect &&
> +	git diff -I . $revs >actual &&
> +	test_cmp expect actual

And indeed this casts such a broken behaviour in stone.

> +'
> +
> +test_expect_success 'diff --output=<file>: combined diff does not segfault' '
> +	revs="HEAD~2 HEAD~ HEAD" &&
> +	git diff --output=expect.file $revs >expect.out &&
> +	git diff $revs >actual &&
> +	test_cmp expect.out actual &&
> +	test_must_be_empty expect.file

So is this one.

> +'
> +
>  test_done


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] diff: fix a segfault in >2 tree -I<regex> and --output=<file>
  2022-05-23 20:08   ` Junio C Hamano
@ 2022-05-24 11:38     ` Ævar Arnfjörð Bjarmason
  2022-05-24 19:38       ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-05-24 11:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, René Scharfe, Michał Kępień


On Mon, May 23 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Fix a regression in c45dc9cf30 (diff: plug memory leak from regcomp()
>> on {log,diff} -I, 2021-02-11), as noted in [1] there was a logic error
>> where we'd free the regex too soon.
>>
>> Now we'll ensure that diff_free() can be called repeatedly
>> instead. We'd ultimately like to do away with the "no_free" confusion
>> surrounding it, and to attempt to free() things only once, as outlined
>> in [2]. But in the meantime this will fix the segfault.
>
> Hmph, repeated calls to diff_free_file() now closes the file upon
> the first call.  I would have expected that such a resource would be
> released when all the references go away, i.e. upon the last call.
> The same thing for the ignore-regex array.

Yes, that would be much more sensible. But as noted:

    When producing a combined diff we'll go through combined-diff.c,
    which doesn't handle many of the options that the corresponding
    diff.c codepaths do.

I.e. the "right" thing to do in this case would require a much more
involved fix. We've somehow ended up not supporting --output=<file>, -I
and probably many other options in the combined-diff mode, which both in
testing and in this part of the implementation seems to have become an
afterthought.

So before any changes of mine we silently ignore those options, and in
those particular cases the "right" thing to do if we're not growing new
features would probably be to error out early if they were provided in
the combined diff mode.

But as a minimal fix just tailoring diff_free() towards the
not-combined-diff.c case seems to be the smallest & most correct thing
to do for now to address the segfault & the immediate issue.

> Clearing the "options->close_file" bit, and using FREE_AND_NULL(),
> would hide a breakage that could be caused by this change, doesn't
> it, because any use-after-release will say "ah, no need to close the
> file" and "oh, there is no regex".  The former is not so worrisome,
> but the latter may be---we may no longer have regex because the
> first call to diff_free_ignore_regex() has cleared it and the code
> that wants to use the regex, if exists, would happily say "oh, there
> is no regex", instead of exposing the use-after-release breakage to
> a segfault.

Yes, this wouldn't make much sense if we were supporting the file output
and -I in the combined-diff.c case, but AFAICT the two cases are:


 1. The "normal" diff case, where we set those up once, and diff_free()
    them once.

 2. The "combined-diff.c" case, where we might call diff_free() N times,
    but it's all to produce the diff itself, not for those options.


>> Thus we're here testing that -I<regex> is ignored in this case, and
>> likewise for --output=<file>, but since this is what we were doing
>> before c45dc9cf30 let's accept it for now.
>
> It is true that the result of applying this patch is equivalent to
> c45dc9cf (diff: plug memory leak from regcomp() on {log,diff} -I,
> 2021-02-11), but doesn't that merely point at the commit as the
> source of behaviour breakage?  With ignore-regex leaking before that
> commit, wouldn't we have been using ignore-regex with combined diff
> machinery?

No, because -I never did anything with the combined diff machinery,
neither did --output.

> Sorry, but I am failing to convince myself that this is not sweep
> the issue under the rug.

I think that's a fair summary, much of it was already under the rug,
we're sweeping some of the remainin parts under it :)

I think that whole combined-diff interaction really needs to be fix, not
just for the diff_free() case, but e.g. we should either error out or
support options that we're silently ignoring now.

But as noted in
https://lore.kernel.org/git/220520.86pmk81a9z.gmgdl@evledraar.gmail.com/
I do have patches queued up locally that form a better basis for fixing
these issues. I.e. once we fix this segfault and have
release_revisions() it'll be easy to get rid of that "no_free" case in
diff_free().

>> [...]
>>  void diff_free(struct diff_options *options)
>> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
>> index 056e922164d..b556d185f53 100755
>> --- a/t/t4013-diff-various.sh
>> +++ b/t/t4013-diff-various.sh
>> @@ -614,4 +614,19 @@ test_expect_success 'diff -I<regex>: detect malformed regex' '
>>  	test_i18ngrep "invalid regex given to -I: " error
>>  '
>>  
>> +test_expect_success 'diff -I<regex>: combined diff does not segfault' '
>> +	revs="HEAD~2 HEAD~ HEAD" &&
>> +	git diff $revs >expect &&
>> +	git diff -I . $revs >actual &&
>> +	test_cmp expect actual
>
> And indeed this casts such a broken behaviour in stone.
>
>> +'
>> +
>> +test_expect_success 'diff --output=<file>: combined diff does not segfault' '
>> +	revs="HEAD~2 HEAD~ HEAD" &&
>> +	git diff --output=expect.file $revs >expect.out &&
>> +	git diff $revs >actual &&
>> +	test_cmp expect.out actual &&
>> +	test_must_be_empty expect.file
>
> So is this one.

I was on the fence about adding these tests, since I expected you to
comment on this aspect of them. I.e. we could just ignore the output
here and narrowly see if we segfault.

But since we had no tests at all for this before, and intentional or not
this behavior of combined-diff is long-standing behavior (that nobody
seems to have noticed or cared about) I think it's good to have tests
that check the "expected" (as in what we did before my c45dc9cf30)
output.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] diff: fix a segfault in >2 tree -I<regex> and --output=<file>
  2022-05-24 11:38     ` Ævar Arnfjörð Bjarmason
@ 2022-05-24 19:38       ` Junio C Hamano
  2022-05-24 20:17         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2022-05-24 19:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, René Scharfe, Michał Kępień

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I.e. the "right" thing to do in this case would require a much more
> involved fix. We've somehow ended up not supporting --output=<file>, -I
> and probably many other options in the combined-diff mode, which both in
> testing and in this part of the implementation seems to have become an
> afterthought.

OK, a hopefully final question.

How much less involved is it to add a new code (without doing
anything in this patch) to detect and die on the combination of
combined-diff with these two options, so that we can document the
fact that we do not support them?  It would give us much better way
forward than leaving the command silently ignore and give result
that is not in line with what was asked, wouldn't it?  That way, the
much more involved "fix" will turn into a change to add a missing
feature.

Thanks.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] diff: fix a segfault in >2 tree -I<regex> and --output=<file>
  2022-05-24 19:38       ` Junio C Hamano
@ 2022-05-24 20:17         ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 6+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-05-24 20:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, René Scharfe, Michał Kępień


On Tue, May 24 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> I.e. the "right" thing to do in this case would require a much more
>> involved fix. We've somehow ended up not supporting --output=<file>, -I
>> and probably many other options in the combined-diff mode, which both in
>> testing and in this part of the implementation seems to have become an
>> afterthought.
>
> OK, a hopefully final question.
>
> How much less involved is it to add a new code (without doing
> anything in this patch)

...yeah, I think for this one it makes sense to narrowly focus on the
segfault...

> to detect and die on the combination of
> combined-diff with these two options, so that we can document the
> fact that we do not support them?  It would give us much better way
> forward than leaving the command silently ignore and give result
> that is not in line with what was asked, wouldn't it?  That way, the
> much more involved "fix" will turn into a change to add a missing
> feature.

I think not much, it's rather trivial for the case where we invoke "git
diff", I.e. just adding something to the "builtin_diff_combined()"
branch in builtin/diff.c to detect these two cases specifically.

I haven't looked in any depth into how we might reach code in
combine-diff.c through other means, and if any of it can set these two
indirectly somewhere else (i.e. other things that take diff options).

I also wonder if I'm just wrong in my assessment that it's a Bad Thing
that we take some of these without ever doing anything with them in some
modes, e.g.:

	git log --oneline -I foo

This will never do anything with that "-I foo" by definition, but would
as soon as you add -p, should we error without -p (or other diff-showing
options).

The same goes for range-diff, format-patch, --remerge-diff and any
number of other things where we take the full set of options, but only
do something with a limited subset of them.

It is helpful in some cases if we were more anal about it, e.g. when I
was wondering why -I didn't do anything with the combined diff, but also
handy for scripting and one-liners if you can tweak the command-line
back & forth without it being so strict.

So I don't know. Maybe I'm just trying to talk myself out of pulling on
that (bound to be long) thread, but I'm coming more around to this just
being a non-issue beyond the narrow and needed fix for diff_free() in
particular.

I.e. the more general approach of chasing down options that don't do
anything for a given "diff mode". We might still want to error on some
particular ones, such as -I with the combined diff (but not with
--oneline, or whatever).

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-05-24 20:30 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-14  9:32 Bug: combined diff with --ignore-matching-lines René Scharfe
2022-05-23 18:31 ` [PATCH] diff: fix a segfault in >2 tree -I<regex> and --output=<file> Ævar Arnfjörð Bjarmason
2022-05-23 20:08   ` Junio C Hamano
2022-05-24 11:38     ` Ævar Arnfjörð Bjarmason
2022-05-24 19:38       ` Junio C Hamano
2022-05-24 20:17         ` Ævar Arnfjörð Bjarmason

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).