From: "René Scharfe" <l.s.r@web.de>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Elijah Newren <newren@gmail.com>,
Junio C Hamano <gitster@pobox.com>,
Git Mailing List <git@vger.kernel.org>, Thomas Hurst <tom@hur.st>
Subject: Re: Corrupt name-rev output
Date: Tue, 17 May 2022 18:20:58 +0200 [thread overview]
Message-ID: <dece627d-ccf8-2375-0c50-c59637e561d3@web.de> (raw)
In-Reply-To: <220517.86sfp878zr.gmgdl@evledraar.gmail.com>
Am 17.05.22 um 12:15 schrieb Ævar Arnfjörð Bjarmason:
>
> On Fri, Apr 22 2022, René Scharfe wrote:
>
>> Am 21.04.22 um 19:55 schrieb René Scharfe:
>>> Am 21.04.22 um 04:11 schrieb Elijah Newren:
>>>
>>>> Reverting 2d53975488 fixes the problem.
>>>
>>> That's a good band-aid.
>> Or perhaps it's all we need. I can't replicate the original reduction
>> of peak memory usage for the Chromium repo anymore. In fact, the very
>> next commit, 079f970971 (name-rev: sort tip names before applying,
>> 2020-02-05), reduced the number of times free(3) is called there from
>> 44245 to 5, and 3656f84278 (name-rev: prefer shorter names over
>> following merges, 2021-12-04) brought that number down to zero.
>>
>> I can't reproduce the issue with the hardenedBSD repo, by the way, but
>> e.g. with 'git name-rev 58b82150da' in the Linux repo.
>>
>> --- >8 ---
>> Subject: [PATCH] Revert "name-rev: release unused name strings"
>>
>> This reverts commit 2d53975488df195e1431c3f90bfb5b60018d5bf6.
>>
>> 3656f84278 (name-rev: prefer shorter names over following merges,
>> 2021-12-04) broke the assumption of 2d53975488 (name-rev: release unused
>> name strings, 2020-02-04) that a better name for a child is a better
>> name for all of its ancestors as well, because it added a penalty for
>> generation > 0. This leads to strings being free(3)'d that are still
>> needed.
>>
>> 079f970971 (name-rev: sort tip names before applying, 2020-02-05)
>> already reduced the number of free(3) calls for the use case that
>> motivated the original patch (name-rev --all in the Chromium repository)
>> from ca. 44000 to 5, and 3656f84278 eliminated even those few. So this
>> revert won't affect name-rev's performance on that particular repo.
>>
>> Reported-by: Thomas Hurst <tom@hur.st>
>> Helped-by: Elijah Newren <newren@gmail.com>
>> Signed-off-by: René Scharfe <l.s.r@web.de>
>> ---
>> builtin/name-rev.c | 21 +++++----------------
>> 1 file changed, 5 insertions(+), 16 deletions(-)
>>
>> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
>> index c59b5699fe..02ea9d1633 100644
>> --- a/builtin/name-rev.c
>> +++ b/builtin/name-rev.c
>> @@ -18,7 +18,7 @@
>> #define CUTOFF_DATE_SLOP 86400
>>
>> struct rev_name {
>> - char *tip_name;
>> + const char *tip_name;
>> timestamp_t taggerdate;
>> int generation;
>> int distance;
>> @@ -84,7 +84,7 @@ static int commit_is_before_cutoff(struct commit *commit)
>>
>> static int is_valid_rev_name(const struct rev_name *name)
>> {
>> - return name && (name->generation || name->tip_name);
>> + return name && name->tip_name;
>> }
>>
>> static struct rev_name *get_commit_rev_name(const struct commit *commit)
>> @@ -146,20 +146,9 @@ static struct rev_name *create_or_update_name(struct commit *commit,
>> {
>> struct rev_name *name = commit_rev_name_at(&rev_names, commit);
>>
>> - if (is_valid_rev_name(name)) {
>> - if (!is_better_name(name, taggerdate, generation, distance, from_tag))
>> - return NULL;
>> -
>> - /*
>> - * This string might still be shared with ancestors
>> - * (generation > 0). We can release it here regardless,
>> - * because the new name that has just won will be better
>> - * for them as well, so name_rev() will replace these
>> - * stale pointers when it processes the parents.
>> - */
>> - if (!name->generation)
>> - free(name->tip_name);
>> - }
>> + if (is_valid_rev_name(name) &&
>> + !is_better_name(name, taggerdate, generation, distance, from_tag))
>> + return NULL;
>>
>> name->taggerdate = taggerdate;
>> name->generation = generation;
>
> I haven't dug into whether it's a false positive, but with this change
> GCC's -fanalyzer has started complaining about a potential NULL
> dereference:
>
> builtin/name-rev.c:230:50: error: dereference of NULL ‘name’ [CWE-476] [-Werror=analyzer-null-dereference]
> 230 | generation = name->generation + 1;
>
> This "fixes" it, and passes all tests, but presumably a better fix
> involves the callers of get_commit_rev_name() (or that function itself)
> deciding if they're OK with NULL here earlier?:
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index 02ea9d16330..1d3a620ac72 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -209,6 +209,7 @@ static void name_rev(struct commit *start_commit,
> struct rev_name *name = get_commit_rev_name(commit);
> struct commit_list *parents;
> int parent_number = 1;
> + assert(name);
>
> parents_to_queue_nr = 0;
>
>
It's a false positive AFAICS because we set the name of the first commit
to go into the queue with create_or_update_name(), then go through its
parents and set their name as well before putting them into the queue to
traverse ancestors recursively. I.e. everything we pull from the queue
must have a non-NULL name, right?
This begs the question why a free(3) call would make a difference to the
analyzer -- it doesn't affect NULL pointers after all.
Another good question is whether we should remove the validity check and
use commit_rev_name_peek() directly instead of get_commit_rev_name() in
the loop since we know it must be valid. This might save a few cycles
and perhaps calm down the analyzer.
René
prev parent reply other threads:[~2022-05-17 16:21 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-17 16:20 Corrupt name-rev output Thomas Hurst
2022-04-20 3:13 ` Junio C Hamano
2022-04-21 2:11 ` Elijah Newren
2022-04-21 17:55 ` René Scharfe
2022-04-22 18:44 ` René Scharfe
2022-04-23 16:47 ` Junio C Hamano
2022-05-17 10:15 ` Ævar Arnfjörð Bjarmason
2022-05-17 16:20 ` René Scharfe [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=dece627d-ccf8-2375-0c50-c59637e561d3@web.de \
--to=l.s.r@web.de \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=newren@gmail.com \
--cc=tom@hur.st \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).