git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Andrzej Hunt <andrzej@ahunt.org>
Cc: git@vger.kernel.org, "Jeff King" <peff@peff.net>,
	"Lénaïc Huard" <lenaic@lhuard.fr>,
	"Derrick Stolee" <dstolee@microsoft.com>
Subject: Re: UNLEAK(), leak checking in the default tests etc.
Date: Thu, 10 Jun 2021 12:56:55 +0200
Message-ID: <87y2bi0vvl.fsf@evledraar.gmail.com>
In-Reply-To: <fcb0eaee-6ae1-f2cc-51d5-103eea64532a@ahunt.org>


On Wed, Jun 09 2021, Andrzej Hunt wrote:

> On 09/06/2021 16:38, Ævar Arnfjörð Bjarmason wrote:
>> [In-Reply-To
>> <a74bbcae7363df03bf8e93167d9274d16dc807f3.1615747662.git.gitgitgadget@gmail.com>,
>> but intentionally breaking threading for a new topic]
>> On Sun, Mar 14 2021, Andrzej Hunt via GitGitGadget wrote:
>> 
>>> Most of these pointers can safely be freed when cmd_clone() completes,
>>> therefore we make sure to free them. The one exception is that we
>>> have to UNLEAK(repo) because it can point either to argv[0], or a
>>> malloc'd string returned by absolute_pathdup().
>> I ran into this when manually checking with valgrind and discovered
>> that you need SANITIZERS for -DSUPPRESS_ANNOTATED_LEAKS to squash it.
>> I wonder if that shouldn't be in DEVOPTS (or even a default under
>> DEVELOPER=1). I.e. you don't need any other special compile flags, just
>> a compiled git that you then run under valgrind to spot this.
>
> I'm not familiar with git's development conventions/philosophy, but my
> 2c is that it's better not to enable it by default in order to
> minimise divergence from the code that users are running. OTOH it's
> not a major difference in behaviour so perhaps that's not a concern
> here.
>
> More significantly: I get the impression it's easier to do leak
> checking using LSAN, which requires recompiling git anyway - at which
> point you get the flag for free - so how often will people actually
> perform leak checking with Valgrind in the first place?

*Nod*, I didn't investigate the runtime penalty you and Jeff point
out. In any case, it seems that can also be done with valgrind exclusion
rules and/or manually ignoring these cases in the test wrapper.

>> 
>>>   builtin/clone.c | 14 ++++++++++----
>>>   1 file changed, 10 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/builtin/clone.c b/builtin/clone.c
>>> index 51e844a2de0a..952fe3d8fc88 100644
>>> --- a/builtin/clone.c
>>> +++ b/builtin/clone.c
>>> @@ -964,10 +964,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>>>   {
>>>   	int is_bundle = 0, is_local;
>>>   	const char *repo_name, *repo, *work_tree, *git_dir;
>>> -	char *path, *dir, *display_repo = NULL;
>>> +	char *path = NULL, *dir, *display_repo = NULL;
>>>   	int dest_exists, real_dest_exists = 0;
>>>   	const struct ref *refs, *remote_head;
>>> -	const struct ref *remote_head_points_at;
>>> +	struct ref *remote_head_points_at = NULL;
>>>   	const struct ref *our_head_points_at;
>>>   	struct ref *mapped_refs;
>>>   	const struct ref *ref;
>>> @@ -1017,9 +1017,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>>>   	repo_name = argv[0];
>>>     	path = get_repo_path(repo_name, &is_bundle);
>>> -	if (path)
>>> +	if (path) {
>>> +		FREE_AND_NULL(path);
>>>   		repo = absolute_pathdup(repo_name);
>>> -	else if (strchr(repo_name, ':')) {
>>> +	} else if (strchr(repo_name, ':')) {
>>>   		repo = repo_name;
>>>   		display_repo = transport_anonymize_url(repo);
>>>   	} else
>> In this case it seems better to just have a:
>>      int repo_heap = 0;
>> Then set "repo_heap = 1" in that absolute_pathdup(repo_name) branch,
>> and...
>> 
>>> @@ -1393,6 +1394,11 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>>>   	strbuf_release(&reflog_msg);
>>>   	strbuf_release(&branch_top);
>>>   	strbuf_release(&key);
>>> +	free_refs(mapped_refs);
>>> +	free_refs(remote_head_points_at);
>>> +	free(dir);
>>> +	free(path);
>>> +	UNLEAK(repo);
>> Here do:
>>      if (repo_heap)
>>          free(repo);
>> 
>
> Although this is possible, I don't think it's worth it: if UNLEAK
> already exists, we might as well use it here to make the code
> simpler. And UNLEAK is unlikely to go away anytime soon
> because... (continued below)
>
>> But maybe there's some other out of the box way to make leak checking
>> Just Work without special flags in this case. I'm just noting this one
>> because it ended up being the only one that leaked unless I compiled
>> with -DSUPPRESS_ANNOTATED_LEAKS. I was fixing some leaks in the bundle
>> code.
>
> There are trickier examples where a cmd_* function has a complex
> struct on the stack, and correctly clearing all allocated memory
> pointed to by its members (or in turn further children with
> potentially multiple levels of indirection) is a lot of work - and
> that work doesn't actually benefit the user in any way. In other
> words, we either need to be able to use UNLEAK to suppress certain
> classes of uninteresting memory leaks - which allows us to focus on
> the interesting/real leaks - or someone has to spend a lot of time
> doing cleanup by hand (and/or someone has to implement a bunch of new
> cleanup functions)).
>
> In your example above, the UNLEAK can be avoided at the cost of one
> additional tracking variable - but in many other cases avoiding an 
> UNLEAK is much more expensive. It's certainly valid to debate the
> merits of the UNLEAK here, but that won't remove the need for UNLEAK's 
> existence in general.
>
> (The most common example that I remember is where cmd_* has a
> rev_info, and AFAICT there's no one-liner to clean that up. Using
> UNLEAK is honestly the best approach there. I don't think I've
> actually submitted any patches doing this, but I have a few in my
> local backlog.)
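
For reference, the tracking-variable alternative to UNLEAK(repo) that I
suggested above would look roughly like this. It's a minimal standalone
sketch, not a patch against builtin/clone.c: dup_path(), pick_repo() and
release_repo() are made-up names, with dup_path() standing in for
absolute_pathdup().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for absolute_pathdup(): returns a malloc'd copy. */
static char *dup_path(const char *path)
{
	size_t len = strlen(path) + 1;
	char *copy = malloc(len);

	if (!copy)
		abort();
	memcpy(copy, path, len);
	return copy;
}

/*
 * Hypothetical, pared-down version of the branch in cmd_clone(): "repo"
 * either borrows repo_name (think argv[0]) or owns a fresh heap copy,
 * and *repo_heap records which of the two it is.
 */
static const char *pick_repo(const char *repo_name, int is_local_path,
			     int *repo_heap)
{
	if (is_local_path) {
		*repo_heap = 1;
		return dup_path(repo_name);
	}
	*repo_heap = 0;
	return repo_name;
}

/* Cleanup: free only what we own; the cast drops the const qualifier. */
static void release_repo(const char *repo, int repo_heap)
{
	if (repo_heap)
		free((char *)repo);
}
```

The end of the function would then call something like
release_repo(repo, repo_heap) where the patch has UNLEAK(repo).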

The thing I was patching happened to be making rev_info * not leak. I
probably didn't cover some more complex cases, but some simple cases
seem relatively easy.

I.e. it just doesn't have a release() function, and at least the things
I was looking at (bundle.c code) were relatively easy cases where we
were just missing a loop to free() data from some struct.
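
To illustrate the kind of "missing loop" I mean, here's a minimal sketch
of a release function. The struct is hypothetical (it is neither rev_info
nor anything in bundle.c), and name_list_add() / name_list_release() are
names I made up for the example:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct that owns an array of heap-allocated strings. */
struct name_list {
	char **names;
	size_t nr;
};

/* Append a heap copy of "name"; aborts on allocation failure. */
static void name_list_add(struct name_list *list, const char *name)
{
	size_t len = strlen(name) + 1;
	char **grown = realloc(list->names, (list->nr + 1) * sizeof(*grown));
	char *copy = malloc(len);

	if (!grown || !copy)
		abort();
	memcpy(copy, name, len);
	list->names = grown;
	list->names[list->nr++] = copy;
}

/*
 * The "missing loop": free each element, then the array itself, then
 * reset the struct so it can be reused or released a second time.
 */
static void name_list_release(struct name_list *list)
{
	size_t i;

	for (i = 0; i < list->nr; i++)
		free(list->names[i]);
	free(list->names);
	list->names = NULL;
	list->nr = 0;
}
```

Resetting the members at the end is a design choice I'd assume here so
that calling the release function twice is harmless.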

But yes, I agree that free()-ing just before we exit() is rather useless
in itself. The reason I wanted it is that it's a useful (although not
perfect) proxy for checking whether the APIs a one-off command uses
would leak when used as libraries, where we may be processing N items,
later doing other work etc.

We should probably eventually have a s/free/end_free()/g and imitate
perl(1)'s PERL_DESTRUCT_LEVEL option. I.e. you can globally configure
perl to run in a mode that assumes a one-off command, in that case
you'll just let the OS handle the cleanup, or one where you care about
memory leaks because you're using it e.g. as an embedded library.
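
A rough sketch of what I have in mind, with the caveat that nothing like
this exists in git today: the GIT_DESTRUCT_LEVEL environment variable and
the destruct_level global are invented for the example, and only the
end_free() name comes from the paragraph above.

```c
#include <stdlib.h>

/*
 * Hypothetical knob modelled loosely on PERL_DESTRUCT_LEVEL: 0 means
 * "one-off command, let the OS reclaim memory at exit", non-zero means
 * "really free, we care about leaks".
 */
static int destruct_level;

/* Read the level from a made-up environment variable. */
static void init_destruct_level(void)
{
	const char *value = getenv("GIT_DESTRUCT_LEVEL");

	destruct_level = value ? atoi(value) : 0;
}

/*
 * For end-of-program cleanup only. Returns 1 if the memory was freed,
 * 0 if it was deliberately left for the OS to reclaim.
 */
static int end_free(void *ptr)
{
	if (!destruct_level)
		return 0;
	free(ptr);
	return 1;
}
```

A leak-checking run would then set the level to non-zero, while normal
one-off invocations would skip the cleanup work entirely.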

But maybe it's not even worth it. In Perl the main benefit is that it's
a programming language with DESTROY handlers etc., so destruction can
often be expensive; turning it off entirely can also be buggy, imagine
relying on destructors to free temporary files etc.

We have that issue in theory with the interaction of atexit() handlers
and e.g. things that would behave differently at a distance if certain
things were free()'d already, but in practice we probably don't.

But maybe it's not even worth pursuing. Have you (or anyone else) tried
e.g. benchmarking git's tests or t/perf tests where free() is defined to
be some noop stub? I'd expect it not to matter, but maybe I'm wrong...

>> Anyway, getting to the "default tests" point. I fixed a memory leak, and
>> wanted it tested that the specific command doesn't leak in git's
>> default tests.
>> Do we have such a thing, if not why not?
>> The closest I got to getting this was:
>>      GIT_VALGRIND_MODE=memcheck \
>>      GIT_VALGRIND_OPTIONS="--leak-check=full --errors-for-leak-kinds=definite --error-exitcode=123" \
>>      <SOME TEST> --valgrind
>
> It's easy to perform leak-checking runs *if* you're OK recompiling
> with LSAN, instead of using valgrind. My usual recipe for running
> against a range of tests is something like:

I thought valgrind would be a better approach since we might rely on it
just being there, so we could run some known-good commands that don't
leak even in a "normal" test run, but...

>   make SANITIZE=address,leak \
>     ASAN_OPTIONS="detect_leaks=1:abort_on_error=1" CFLAGS="-Og -g" \
>     T="\$(wildcard t00[0-9][0-9]-*.sh)" test
>
> Additionally: I usually specify CC=clang, although gcc+LSAN has mostly
> been stable enough in my experience so you might be able to skip that.
> (I've found ASAN+LSAN to be more stable than LSAN by itself, which is
> why I specify address+leak, but adding ASAN in turn requires
> overriding ASAN_OPTIONS to reenable leak checking.)
>
> I don't know whether or not Valgrind is more/less effective at finding
> leaks, so being able to run the test suite under valgrind would be
> nice for comparison purposes though.

I didn't know how to set that up; that seems easy enough.

This works for me:

    make CC=clang SANITIZE=address,leak CFLAGS="-O0 -g"
    (cd t && make ASAN_OPTIONS="<what you said>" [...])

I.e. it's just SANITIZE & the flags that are important at compile-time.
You doubtless knew that; I'm noting it mainly for my own notes & others
following along.

I ran it, noted the failing tests, produced a giant GIT_SKIP_TESTS list
and hacked ci/ to run that as a new linux-clang-SANITIZE job. That messy
WIP code is currently running at:
https://github.com/avar/git/runs/2793150092

Wouldn't it be a good idea to have such a job and slowly work on the
exclusion list?

E.g. I saw that t0004 failed, which was trivially fixed with a single
strbuf_release(), and we could guard against regressions.

Anyway, I can submit some cleaned-up patches for that. I was just
fishing for whether there was some good reason not to do it, since there
seemed to have been interest in leak fixes, but it hadn't made it into
CI / some "blessed" GIT_TEST_* mode or whatever. I.e. maybe the reports
were unstable or unreliable...

