git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, "René Scharfe" <l.s.r@web.de>,
	"Michał Kępień" <michal@isc.org>
Subject: Re: [PATCH] diff: fix a segfault in >2 tree -I<regex> and --output=<file>
Date: Tue, 24 May 2022 13:38:16 +0200	[thread overview]
Message-ID: <220524.86leurw3my.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <xmqqleusqaff.fsf@gitster.g>


On Mon, May 23 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Fix a regression in c45dc9cf30 (diff: plug memory leak from regcomp()
>> on {log,diff} -I, 2021-02-11), as noted in [1] there was a logic error
>> where we'd free the regex too soon.
>>
>> Now we'll ensure that diff_free() can be called repeatedly
>> instead. We'd ultimately like to do away with the "no_free" confusion
>> surrounding it, and to attempt to free() things only once, as outlined
>> in [2]. But in the meantime this will fix the segfault.
>
> Hmph, repeated calls to diff_free_file() now closes the file upon
> the first call.  I would have expected that such a resource would be
> released when all the references go away, i.e. upon the last call.
> The same thing for the ignore-regex array.

Yes, that would be much more sensible. But as noted:

    When producing a combined diff we'll go through combined-diff.c,
    which doesn't handle many of the options that the corresponding
    diff.c codepaths do.

I.e. the "right" thing to do in this case would require a much more
involved fix. We've somehow ended up not supporting --output=<file>, -I
and probably many other options in the combined-diff mode, which both in
testing and in this part of the implementation seems to have become an
afterthought.

So before any changes of mine we silently ignore those options, and in
those particular cases the "right" thing to do if we're not growing new
features would probably be to error out early if they were provided in
the combined diff mode.

But as a minimal fix just tailoring diff_free() towards the
not-combined-diff.c case seems to be the smallest & most correct thing
to do for now to address the segfault & the immediate issue.

> Clearing the "options->close_file" bit, and using FREE_AND_NULL(),
> would hide a breakage that could be caused by this change, doesn't
> it, because any use-after-release will say "ah, no need to close the
> file" and "oh, there is no regex".  The former is not so worrisome,
> but the latter may be---we may no longer have regex because the
> first call to diff_free_ignore_regex() has cleared it and the code
> that wants to use the regex, if exists, would happily say "oh, there
> is no regex", instead of exposing the use-after-release breakage to
> a segfault.

Yes, this wouldn't make much sense if we were supporting the file output
and -I in the combined-diff.c case, but AFAICT the two cases are:


 1. The "normal" diff case, where we set those up once, and diff_free()
    them once.

 2. The "combined-diff.c" case, where we might call diff_free() N times,
    but it's all to produce the diff itself, not for those options.


>> Thus we're here testing that -I<regex> is ignored in this case, and
>> likewise for --output=<file>, but since this is what we were doing
>> before c45dc9cf30 let's accept it for now.
>
> It is true that the result of applying this patch is equivalent to
> c45dc9cf (diff: plug memory leak from regcomp() on {log,diff} -I,
> 2021-02-11), but doesn't that merely point at the commit as the
> source of behaviour breakage?  With ignore-regex leaking before that
> commit, wouldn't we have been using ignore-regex with combined diff
> machinery?

No, because -I never did anything with the combined diff machinery,
neither did --output.

> Sorry, but I am failing to convince myself that this is not sweep
> the issue under the rug.

I think that's a fair summary, much of it was already under the rug,
we're sweeping some of the remainin parts under it :)

I think that whole combined-diff interaction really needs to be fix, not
just for the diff_free() case, but e.g. we should either error out or
support options that we're silently ignoring now.

But as noted in
https://lore.kernel.org/git/220520.86pmk81a9z.gmgdl@evledraar.gmail.com/
I do have patches queued up locally that form a better basis for fixing
these issues. I.e. once we fix this segfault and have
release_revisions() it'll be easy to get rid of that "no_free" case in
diff_free().

>> [...]
>>  void diff_free(struct diff_options *options)
>> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
>> index 056e922164d..b556d185f53 100755
>> --- a/t/t4013-diff-various.sh
>> +++ b/t/t4013-diff-various.sh
>> @@ -614,4 +614,19 @@ test_expect_success 'diff -I<regex>: detect malformed regex' '
>>  	test_i18ngrep "invalid regex given to -I: " error
>>  '
>>  
>> +test_expect_success 'diff -I<regex>: combined diff does not segfault' '
>> +	revs="HEAD~2 HEAD~ HEAD" &&
>> +	git diff $revs >expect &&
>> +	git diff -I . $revs >actual &&
>> +	test_cmp expect actual
>
> And indeed this casts such a broken behaviour in stone.
>
>> +'
>> +
>> +test_expect_success 'diff --output=<file>: combined diff does not segfault' '
>> +	revs="HEAD~2 HEAD~ HEAD" &&
>> +	git diff --output=expect.file $revs >expect.out &&
>> +	git diff $revs >actual &&
>> +	test_cmp expect.out actual &&
>> +	test_must_be_empty expect.file
>
> So is this one.

I was on the fence about adding these tests, since I expected you to
comment on this aspect of them. I.e. we could just ignore the output
here and narrowly see if we segfault.

But since we had no tests at all for this before, and intentional or not
this behavior of combined-diff is long-standing behavior (that nobody
seems to have noticed or cared about) I think it's good to have tests
that check the "expected" (as in what we did before my c45dc9cf30)
output.

  reply	other threads:[~2022-05-24 11:51 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-14  9:32 Bug: combined diff with --ignore-matching-lines René Scharfe
2022-05-23 18:31 ` [PATCH] diff: fix a segfault in >2 tree -I<regex> and --output=<file> Ævar Arnfjörð Bjarmason
2022-05-23 20:08   ` Junio C Hamano
2022-05-24 11:38     ` Ævar Arnfjörð Bjarmason [this message]
2022-05-24 19:38       ` Junio C Hamano
2022-05-24 20:17         ` Ævar Arnfjörð Bjarmason
2022-06-18 11:12           ` René Scharfe
2022-06-18 11:12           ` [PATCH 1/2] combine-diff: abort if --ignore-matching-lines is given René Scharfe
2022-06-21 15:35             ` Junio C Hamano
2022-06-21 15:58               ` René Scharfe
2022-06-21 16:55                 ` Junio C Hamano
2022-06-18 11:12           ` [PATCH 2/2] combine-diff: abort if --output " René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=220524.86leurw3my.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=michal@isc.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).