git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Andrei Rybak <rybak.a.v@gmail.com>
Subject: Re: [PATCH v3 1/3] bundle cmd: stop leaking memory from parse_options_cmd_bundle()
Date: Thu, 1 Jul 2021 11:41:21 -0400	[thread overview]
Message-ID: <YN3iIaovvG7XgLQP@coredump.intra.peff.net> (raw)
In-Reply-To: <87v95vdxrc.fsf@evledraar.gmail.com>

On Wed, Jun 30, 2021 at 08:00:50PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > So this code is perfectly fine and unavoidable. In the case you were
> > touching it was foo() that was calling die() directly, so we could work
> > around it with some conditionals. But from the leak-checker's
> > perspective the two cases are the same.
> 
> I've got you to blame for introducing SANITIZE=*. Now I've got these
> leak checkers all mixed up :)
> 
> Yes you're right, FWIW I (re-)wrote this commit message just before
> sending and should have said "a heap leak checker" instead of
> "SANITIZE=leak", I really meant valgrind.

Ah, OK, that makes more sense.

I think the distinction here isn't "heap leak checker" (since valgrind
does still look at the whole memory space to find references), but
rather the treatment of "still reachable" items. I think LSan/ASan
ignore these entirely.

> I originally ended with the "we are about to die" tracking because I was
> tracing things with valgrind, which does complain about this sort of
> thing. I.e.:
>     
>      24 bytes in 1 blocks are still reachable in loss record 8 of 21
>         at 0x48356AF: malloc (vg_replace_malloc.c:298)
>         by 0x4837DE7: realloc (vg_replace_malloc.c:826)
>         by 0x3C06F1: xrealloc (wrapper.c:126)
>         by 0x380EC9: strbuf_grow (strbuf.c:98)
>         by 0x381A14: strbuf_add (strbuf.c:297)
>         by 0x20ADC5: strbuf_addstr (strbuf.h:304)
>         by 0x20B66D: prefix_filename (abspath.c:277)
>         by 0x13CDC6: parse_options_cmd_bundle (bundle.c:55)
>         by 0x13D565: cmd_bundle_unbundle (bundle.c:170)
>         by 0x13D829: cmd_bundle (bundle.c:214)
>         by 0x1279F4: run_builtin (git.c:461)
>         by 0x127DFB: handle_builtin (git.c:714)
> 
> Re what I mentioned/asked in
> https://lore.kernel.org/git/87czsv2idy.fsf@evledraar.gmail.com/ I was
> experimenting with doing leak checking in the tests.
> 
> In this case we have 21 in total under --show-leak-kinds=all (and it was
> 20 in v2 of this series).

IMHO these "still reachable" leaks are not interesting. They'll fall
into one of two categories:

  - allocations still on the stack when we exit() without unwinding.
    These are always uninteresting, since by definition we are exiting
    the process.

  - globals that hang on to allocations (which will still be present
    even if we exit the program by returning up through main()). Most of
    these will be items we intend to last for the program length (e.g.,
    fields in the_repository).

    It's _possible_ that some of these could be interesting. E.g., a
    program might have two phases: in the first, we rely on some
    subsystem which uses global variables to cache some data (say,
    parsed config values in remote.c or userdiff.c), and in the second
    phase we no longer use that subsystem. We could be instructing the
    subsystem to free up the memory after the first phase. But in
    practice this is awkward, because the program-level code doesn't
    know about subsystem allocations (or even which subsystems might
    have been recursively triggered).

    And while still-reachable tracking could be used to find
    opportunities like this, I think there's a better approach to
    finding these. If subsystems avoid global-variable caches and
    instead stick their allocations into context structs, then the
    memory is associated with those structs. So for example, if
    userdiff attached its storage to a diff_options, which is attached
    to a rev_info, then any "leaking" is all predicated on the rev_info
    (which the main program either cleans up, or annotates with UNLEAK).

> Maybe if we do end up with a test mode for this we shouldn't care about
> checkers like valgrind and only cater to the SANITIZE=leak mode.

I do find SANITIZE=leak to be a more useful tool in general, but I'm not
at all against using valgrind if people find it convenient. I just think
"reachable" leaks are not interesting enough to be part of what we're
searching for when we run the test suite.

> I'm still partial to the idea that we'll get the most win out of
> something that we can run in the tests by default, i.e. we'll only need
> to have a valgrind on the system & have someone run "make test" to run a
> (limited set of) tests with this.
> 
> Whereas SANITIZE=leak is always a dev-only feature people may not have
> on all the time.

That is exactly why I think "SANITIZE=leak" is better: it is easier for
people to run. valgrind is _extremely_ slow. It's also not available
everywhere; ASan/LSan aren't either, but they're pretty standard in
clang and gcc these days.

It is nice that valgrind can run on an un-instrumented binary, but I
sort of assume that anybody running "make test" will be able to build
the binary, since "make" is going to want to do that. I.e., I think
"make SANITIZE=leak test" is already doing what we want (we just need to
further annotate known-failures).

-Peff

  reply	other threads:[~2021-07-01 15:41 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-17 11:21 [PATCH 0/3] bundle.c: remove "ref_list" in favor of string-list.c API Ævar Arnfjörð Bjarmason
2021-06-17 11:21 ` [PATCH 1/3] bundle cmd: stop leaking memory from parse_options_cmd_bundle() Ævar Arnfjörð Bjarmason
2021-06-17 11:21 ` [PATCH 2/3] bundle.c: use a temporary variable for OIDs and names Ævar Arnfjörð Bjarmason
2021-06-17 11:21 ` [PATCH 3/3] bundle: remove "ref_list" in favor of string-list.c API Ævar Arnfjörð Bjarmason
2021-06-19  2:12   ` Andrei Rybak
2021-06-21 15:16 ` [PATCH v2 0/3] bundle.c: " Ævar Arnfjörð Bjarmason
2021-06-21 15:16   ` [PATCH v2 1/3] bundle cmd: stop leaking memory from parse_options_cmd_bundle() Ævar Arnfjörð Bjarmason
2021-06-24 16:54     ` Jeff King
2021-06-24 19:28       ` Ævar Arnfjörð Bjarmason
2021-06-21 15:16   ` [PATCH v2 2/3] bundle.c: use a temporary variable for OIDs and names Ævar Arnfjörð Bjarmason
2021-06-21 15:16   ` [PATCH v2 3/3] bundle: remove "ref_list" in favor of string-list.c API Ævar Arnfjörð Bjarmason
2021-06-24 17:11     ` Jeff King
2021-06-24 19:31       ` Ævar Arnfjörð Bjarmason
2021-06-29  1:02       ` Junio C Hamano
2021-06-24 17:14   ` [PATCH v2 0/3] bundle.c: " Jeff King
2021-06-30 14:06   ` [PATCH v3 0/3] Ævar Arnfjörð Bjarmason
2021-06-30 14:06     ` [PATCH v3 1/3] bundle cmd: stop leaking memory from parse_options_cmd_bundle() Ævar Arnfjörð Bjarmason
2021-06-30 17:26       ` Jeff King
2021-06-30 18:00         ` Ævar Arnfjörð Bjarmason
2021-07-01 15:41           ` Jeff King [this message]
2021-06-30 14:06     ` [PATCH v3 2/3] bundle.c: use a temporary variable for OIDs and names Ævar Arnfjörð Bjarmason
2021-06-30 14:06     ` [PATCH v3 3/3] bundle: remove "ref_list" in favor of string-list.c API Ævar Arnfjörð Bjarmason
2021-06-30 21:18       ` Junio C Hamano
2021-06-30 17:34     ` [PATCH v3 0/3] Jeff King
2021-06-30 17:45       ` Jeff King
2021-06-30 18:00         ` Ævar Arnfjörð Bjarmason
2021-07-01 10:53           ` Ævar Arnfjörð Bjarmason
2021-07-02  9:57     ` [PATCH v4 0/3] bundle.c: remove "ref_list" in favor of string-list.c API Ævar Arnfjörð Bjarmason
2021-07-02  9:57       ` [PATCH v4 1/3] bundle cmd: stop leaking memory from parse_options_cmd_bundle() Ævar Arnfjörð Bjarmason
2021-07-02  9:57       ` [PATCH v4 2/3] bundle.c: use a temporary variable for OIDs and names Ævar Arnfjörð Bjarmason
2021-07-02  9:57       ` [PATCH v4 3/3] bundle: remove "ref_list" in favor of string-list.c API Ævar Arnfjörð Bjarmason
2021-07-03 10:52       ` [PATCH v4 0/3] bundle.c: " Jeff King
2021-07-03 11:28         ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YN3iIaovvG7XgLQP@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=rybak.a.v@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).