git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "René Scharfe" <l.s.r@web.de>
Cc: Junio C Hamano <gitster@pobox.com>,
	Git List <git@vger.kernel.org>, Taylor Blau <me@ttaylorr.com>,
	Christian Couder <chriscool@tuxfamily.org>,
	Jeff King <peff@peff.net>
Subject: Re: [PATCH v2 3/3] Revert "pack-objects: lazily set up "struct rev_info", don't leak"
Date: Mon, 28 Nov 2022 16:27:57 +0100
Message-ID: <221128.861qpnkrd3.gmgdl@evledraar.gmail.com>
In-Reply-To: <9bea523d-93d1-953b-a136-3f00844c880a@web.de>


On Mon, Nov 28 2022, René Scharfe wrote:

> Am 28.11.2022 um 13:24 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Mon, Nov 28 2022, Ævar Arnfjörð Bjarmason wrote:
>>
>> René:
>>
>>> On Mon, Nov 28 2022, René Scharfe wrote:
>>>
>>>> Am 28.11.2022 um 11:03 schrieb Junio C Hamano:
>>>>> René Scharfe <l.s.r@web.de> writes:
>>>>>
>>>>>> This reverts commit 5cb28270a1ff94a0a23e67b479bbbec3bc993518.
>>>>>>
>>>>>> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
>>>>>> 2022-03-28) avoided leaking rev_info allocations in many cases by
>>>>>> calling repo_init_revisions() only when the .filter member was actually
>>>>>> needed, but then still leaking it.  That was fixed later by 2108fe4a19
>>>>>> (revisions API users: add straightforward release_revisions(),
>>>>>> 2022-04-13), making the reverted commit unnecessary.
>>>>>
>>>>> Hmph, with this merged, 'seen' breaks linux-leaks job in a strange
>>>>> way.
>>>>>
>>>>> https://github.com/git/git/actions/runs/3563546608/jobs/5986458300#step:5:3917
>>>>>
>>>>> Does anybody want to help looking into it?
>>>
>>> [I see we crossed E-Mails]:
>>> https://lore.kernel.org/git/221128.868rjvmi3l.gmgdl@evledraar.gmail.com/
>>>
>>>> The patch exposes that release_revisions() leaks the diffopt allocations
>>>> as we're yet to address the TODO added by 54c8a7c379 (revisions API: add
>>>> a TODO for diff_free(&revs->diffopt), 2022-04-14).
>>>
>>> That's correct, and we have that leak in various places in our codebase,
>>> but per the above side-thread I think this is primarily exposing that
>>> we're setting up the "struct rev_info" with your change when we don't
>>> need to. Why can't we just skip it?
>>>
>>> Yeah, if we do set it up we'll run into an outstanding leak, and that
>>> should also be fixed (I have some local patches...), but the other cases
>>> I know of where we'll leak that data is where we're actually using the
>>> "struct rev_info".
>>>
>>> I haven't tried tearing your change apart to poke at it myself, and
>>> maybe there's some really good reason for why you can't separate getting
>>> rid of the J.5.7 dependency and removing the lazy-init.
>>>
>>>> The patch below plugs it locally.
>>>>
>>>> --- >8 ---
>>>> Subject: [PATCH 4/3] fixup! revision: free diffopt in release_revisions()
>>>>
>>>> Signed-off-by: René Scharfe <l.s.r@web.de>
>>>> ---
>>>>  builtin/pack-objects.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
>>>> index 3e74fbb0cd..a47a3f0fba 100644
>>>> --- a/builtin/pack-objects.c
>>>> +++ b/builtin/pack-objects.c
>>>> @@ -4462,6 +4462,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>>>>  	} else {
>>>>  		get_object_list(&revs, rp.nr, rp.v);
>>>>  	}
>>>> +	diff_free(&revs.diffopt);
>>>>  	release_revisions(&revs);
>>>>  	cleanup_preferred_base();
>>>>  	if (include_tag && nr_result)
>>>
>>> So, the main motivation for the change was paranoia that a compiler or
>>> platform might show up without J.5.7 support and that would bite us, but
>>> we're now adding a double-free-in-waiting?
>>>
>>> I think we're both a bit paranoid, but clearly have different
>>> paranoia-priorities :)
>>>
>>> If we do end up with some hack like this instead of fixing the
>>> underlying problem I'd much prefer that such a hack just be an UNLEAK()
>>> here.
>>>
>>> I.e. we have a destructor for "revs.*" already, let's not bypass it and
>>> start freeing things from under it, which will result in a double-free
>>> if we forget this callsite once the TODO in 54c8a7c379 is addressed.
>>>
>>> As you'd see if you made release_revisions() simply call
>>> diff_free(&revs.diffopt) doing so would reveal some really gnarly edge
>>> cases.
>>>
>>> I haven't dug into this one, but offhand I'm not confident in saying
>>> that this isn't exposing us to some aspect of that gnarliness (maybe
>>> not, it's been a while since I looked).
>>>
>>> (IIRC some of the most gnarly edge cases will only show up as CI
>>> failures on Windows, to do with the ordering of when we'll fclose()
>>> files hanging off that "diffopt").
>>
>> This squashed into 3/3 seems to me to be a proper fix to a change that
>> wants to refactor the code for non-J.5.7 compatibility. I.e. this just
>> does the data<->fp casting part of the change, without refactoring the
>> "lazy init".
>
> That works, but lazy code is more complicated and there is no benefit
> here -- eager allocations are not noticeably slow or big.  Laziness
> hides leaks in corners, i.e. requiring invocations with uncommon
> options to trigger them.

Yes, sometimes it's easier to just set everything up at the
beginning. As for hiding leaks, I think the empirical data here goes
against that, i.e. your change introduced a leak.

I don't think it's realistic that we'll have the side that assigns to
"have_revs" drift from the corresponding code in cmd_pack_objects().

>> But I think you should check this a bit more carefully. Your 3/3 says
>> that your change "mak[es] the reverted commit unnecessary"
>
> No, it says that _your_ change 2108fe4a19 (revisions API users: add
> straightforward release_revisions(), 2022-04-13) made it unnecessary.

Yes, I'm saying that's not correct, because if you run the command that
5cb28270a1 prominently notes, we'll now leak with this revert:

	echo e83c5163316f89bfbde7d9ab23ca2e25604af290 | ./git pack-objects initial

But yes, 5cb28270a1 by itself didn't add release_revisions(); that came
shortly afterwards in 2108fe4a19.

>> , but as I
>> noted if you'd run the command that commit shows, you'd have seen you're
>> re-introducing the leak it fixed. So I wonder what else has been missed
>> here.
>
> 5cb28270a1 (pack-objects: lazily set up "struct rev_info", don't leak,
> 2022-03-28) did not plug the leak.  It only moved it to the corner that
> handles the --filter option.

I think we're using "the leak" here differently. I mean callstacks that
LeakSanitizer emits & tests we have that do & don't pass with
SANITIZE=leak.

But yes, there may be multiple paths through a function, some of which
leak, some of which don't. I'm not saying that the entire set of API
features that builtin/pack-objects.c uses in the revision API is
leak-free.
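
To make the "multiple paths" point concrete, here is a minimal sketch of
a function that only leaks on one of its paths. Everything in it is
hypothetical: toy_alloc(), toy_free() and toy_command() are stand-ins
that track slots in a small static pool, not anything from the revision
API, and no real heap memory is involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tracked "allocator": a tiny static pool, no real heap. */
#define TOY_SLOTS 4
static char toy_pool[TOY_SLOTS][32];
static int toy_in_use[TOY_SLOTS];

static void *toy_alloc(void)
{
	size_t i;

	for (i = 0; i < TOY_SLOTS; i++) {
		if (!toy_in_use[i]) {
			toy_in_use[i] = 1;
			return toy_pool[i];
		}
	}
	assert(!"toy pool exhausted");
	return NULL;
}

static void toy_free(void *p)
{
	size_t i;

	for (i = 0; i < TOY_SLOTS; i++)
		if (p == toy_pool[i])
			toy_in_use[i] = 0;
}

/* Number of slots that are currently handed out. */
static int toy_live(void)
{
	int n = 0;
	size_t i;

	for (i = 0; i < TOY_SLOTS; i++)
		n += toy_in_use[i];
	return n;
}

/*
 * Hypothetical command with two paths. The common path releases what
 * it takes; the "use_filter" path takes a second slot and never gives
 * it back. Returns how many slots this one call left behind.
 */
int toy_command(int use_filter)
{
	int before = toy_live();
	void *common = toy_alloc();

	if (use_filter)
		(void)toy_alloc(); /* never released: leaks on this path only */
	toy_free(common);
	return toy_live() - before;
}
```

A test that only ever calls toy_command(0) sees nothing left behind;
the leftover slot only shows up once something exercises
toy_command(1).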

> That leak is only interesting to Git developers and harmless for users.
> But if the goal is to become free of trivial leaks in order to allow
> using tools like LeakSanitizer to find real ones then pushing them into
> the shadows not yet reached by our test coverage won't help for long.

It's clearly helping in this case, as our CI had multiple failing tests.

>> I vaguely recall that one reason I ended up with that J.5.7 dependency
>> was because there was an objection to mocking up the "struct option" as
>> I'm doing here. I.e. here we assume that the
>> opt_parse_list_objects_filter() is only ever going to care about the
>> "value" member.
>
> It's probably fine, but unnecessarily complicated compared to calling
> repo_init_revisions() eagerly.

I'm leaving aside the question of whether we should go for some version
of the refactoring in your 3/3.

What I am saying is that such refactoring should be split up from the
more narrow bug fix to the existing code. I.e. this as a replacement for
your 3/3 is all that's needed to pass the test you're adding in 2/3.

-- >8 --
From: =?UTF-8?q?Ren=C3=A9=20Scharfe?= <l.s.r@web.de>
Subject: [PATCH] pack-objects: support multiple --filter options again

5cb28270a1f (pack-objects: lazily set up "struct rev_info", don't
leak, 2022-03-28) broke support for multiple --filter options by
calling repo_init_revisions() every time "--filter" was seen. Instead
we should only do so the first time, and subsequently append to the
existing filter data.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/pack-objects.c                 | 3 ++-
 t/t5317-pack-objects-filter-objects.sh | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 573d0b20b76..c702c09dd45 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4158,7 +4158,8 @@ static struct list_objects_filter_options *po_filter_revs_init(void *value)
 {
 	struct po_filter_data *data = value;
 
-	repo_init_revisions(the_repository, &data->revs, NULL);
+	if (!data->have_revs)
+		repo_init_revisions(the_repository, &data->revs, NULL);
 	data->have_revs = 1;
 
 	return &data->revs.filter;
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 25faebaada8..5b707d911b5 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -265,7 +265,7 @@ test_expect_success 'verify normal and blob:limit packfiles have same commits/tr
 	test_cmp expected observed
 '
 
-test_expect_failure 'verify small limit and big limit results in small limit' '
+test_expect_success 'verify small limit and big limit results in small limit' '
 	git -C r2 ls-files -s large.1000 >ls_files_result &&
 	test_parse_ls_files_stage_oids <ls_files_result |
 	sort >expected &&
-- 
2.39.0.rc0.993.g0c499e58e3b
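
For illustration only, the effect of the one-line guard above can be
modelled outside of Git. This is a sketch under loud assumptions: the
toy_* structs and functions are hypothetical stand-ins for "struct
po_filter_data", repo_init_revisions() and the option callback, and the
filter specs are just example strings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real Git structs. */
#define TOY_MAX_FILTERS 8

struct toy_revs {
	const char *filters[TOY_MAX_FILTERS];
	size_t nr;
};

struct toy_filter_data {
	struct toy_revs revs;
	int have_revs;
};

/* Stand-in for repo_init_revisions(): resets all state. */
static void toy_init_revs(struct toy_revs *revs)
{
	revs->nr = 0;
}

/* Old behavior: re-initialize on every --filter, dropping earlier ones. */
static struct toy_revs *toy_filter_init_always(struct toy_filter_data *data)
{
	toy_init_revs(&data->revs);
	data->have_revs = 1;
	return &data->revs;
}

/* Guarded behavior: initialize on first use only, then append. */
static struct toy_revs *toy_filter_init_once(struct toy_filter_data *data)
{
	if (!data->have_revs)
		toy_init_revs(&data->revs);
	data->have_revs = 1;
	return &data->revs;
}

static void toy_add_filter(struct toy_revs *revs, const char *spec)
{
	assert(revs->nr < TOY_MAX_FILTERS);
	revs->filters[revs->nr++] = spec;
}

/* Simulate seeing two --filter options; return how many survive. */
size_t toy_parse_two_filters(int guarded)
{
	struct toy_filter_data data = { 0 };
	const char *specs[] = { "blob:limit=1", "blob:limit=1g" };
	size_t i;

	for (i = 0; i < 2; i++) {
		struct toy_revs *revs = guarded ?
			toy_filter_init_once(&data) :
			toy_filter_init_always(&data);
		toy_add_filter(revs, specs[i]);
	}
	return data.revs.nr;
}
```

In this model toy_parse_two_filters(0) keeps only the last filter,
while toy_parse_two_filters(1) keeps both, which is the behavior the
flipped test in t5317 is meant to check.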

