Re: Whether to keep using UNLEAK() in built-ins

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Junio C Hamano <gitster@pobox.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, "Jeff King" <peff@peff.net>,
	"Andrzej Hunt" <ajrhunt@google.com>,
	"Martin Ågren" <martin.agren@gmail.com>
Subject: Re: Whether to keep using UNLEAK() in built-ins
Date: Fri, 18 Feb 2022 10:19:41 -0800	[thread overview]
Message-ID: <xmqqwnhs11he.fsf@gitster.g> (raw)
In-Reply-To: <220218.861r00ib86.gmgdl@evledraar.gmail.com> ("Ævar Arnfjörð Bjarmason"'s message of "Fri, 18 Feb 2022 13:35:12 +0100")

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>>> Specifically, the cited commit turned an existing strbuf_release()
>>> on &err into UNLEAK().  If that and the other strbuf (sb) were so
>>> easily releasable, why didn't we do so back then already?
>>
>> I suspect that the answer to the above question is because these
>> allocations are in the top-level cmd_commit() function, which is
>> never called recursively or repeatedly as a subroutine.  The only
>> significant thing that happens after we return from it is to exit.
>>
>> In such a code path, marking a variable as UNLEAK() is a better
>> thing to do than calling strbuf_release().  Both will work as a way
>> to squelch sanitizers from reporting a leak that does not matter,
>> but calling strbuf_release() means we'd spend extra cycles to return
>> pieces of memory to the pool, even though we know that the pool
>> itself will be cleaned immediately later at exit.
>>
>> We already have UNLEAK to tell sanitizers not to distract us from
>> spotting and plugging real leaks by reporting these apparent leaks
>> that do not matter.  It is of somewhat dubious value to do a "we
>> care too much about pleasing sanitizer and spend extra cycles at
>> runtime while real users are doing real work" change.

> Per https://lore.kernel.org/git/87a6k8daeu.fsf@evledraar.gmail.com/ the
> real goal I have in mind here is to use the built-ins as a stand-in for
> testing whether the underlying APIs are leak-free.
>
> Because of that having to reason about UNLEAK() doing the right thing or
> not is just unneeded distraction. Yes for a "struct strbuf" it won't
> matter, but most of what we UNLEAK() is more complex stuff like "struct
> rev_info". We won't really make headway making revision.c not leak
> memory without using "git log" et al as the test subjects for whether
> that API leaks.

I have to say that you have a wrong goal and wrong priority.  The
number of UNLEAK in cmd_foo() functions is not even a poor
approximation of our progress.

Imagine that a subsystem that are repeatedly set-up and used during
a single program invocation was leaky.  Let's say a program calls
diff_setup() to prepare the diff_options structure, feeds two things
to compare, and calls diff_flush() to show the comparison result,
but let's assume this sequence leaks some resources.

Now cmd_diff() may be such a program that does a setup, feeds two
trees, calls diff_flush() and exits.  If we didn't do anything to
it, diff_options may "leak".  Marking it with UNLEAK may be a good
measure, if we want to keep the leak checker from reporting a leak
that does not matter in practice so that we can concentrate on
plugging real leaks that matter.

But consider cmd_log(), running something like "git log -p".  It
iterates over commits, does the <setup, compare two trees, flush>
repeatedly for each commit it encounters during the walk on the same
diff_options structure.  Now, the leak in the code path does matter.
If diff_flush() is left leaky, it will show up in the leak checker's
output, and that is reporting real leaks that matter.  The thing is,
the <setup, compare, flush> sequence whose leak matters is not in
cmd_log(); it is much deeper in the revision walking machinery.  We
do not want to paper over such leaks with UNLEAK.

See the difference?  The number of UNLEAK OUTside built-ins does
matter.  We do not want to have UNLEAK there to hide leaks in
possible callees that may be called arbitrary number of times in
arbitrary order.  Compared to that, UNLEAK in cmd_foo() to mark an
on-stack variable that was used only once without even a recursion
is purely to squelch the leak checker from reporting leaks that does
not matter.  For simple things like strbuf on stack of the top-level
cmd_foo() functions, we could even leave strbuf_release() not called
and leave the resource reclamation to _exit(2).  That would cause
the leak checkers to report them and distract us, and UNLEAK is a
better way to squelch the distraction without wasting extra cycles
at runtime to actually free them.

So, don't look at "built-ins as a stand-in".  It is a wrong
direction to go in to let the "leak checker" tail wag the dog.

Thanks.

next prev parent reply	other threads:[~2022-02-18 18:19 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-16  8:21 [PATCH 0/2] commit: trivial leak fix, add 2 tests to linux-leaks CI Ævar Arnfjörð Bjarmason
2022-02-16  8:21 ` [PATCH 1/2] commit: fix "author_ident" leak Ævar Arnfjörð Bjarmason
2022-02-16 17:59   ` Junio C Hamano
2022-02-16  8:21 ` [PATCH 2/2] commit: use strbuf_release() instead of UNLEAK() Ævar Arnfjörð Bjarmason
2022-02-16 18:03   ` Junio C Hamano
2022-02-16 18:30     ` Junio C Hamano
2022-02-18 12:35       ` Whether to keep using UNLEAK() in built-ins (was: [PATCH 2/2] commit: use strbuf_release() instead of UNLEAK()) Ævar Arnfjörð Bjarmason
2022-02-18 18:19         ` Junio C Hamano [this message]
2022-02-18 19:31           ` Whether to keep using UNLEAK() in built-ins Ævar Arnfjörð Bjarmason
2022-05-12 22:51 ` [PATCH] commit: fix "author_ident" leak Junio C Hamano
2022-05-17 13:48   ` Ævar Arnfjörð Bjarmason
2022-05-18 16:30     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqwnhs11he.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=ajrhunt@google.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=martin.agren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).