git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: phillip.wood@dunelm.org.uk
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Phillip Wood <phillip.wood123@gmail.com>,
	Jeff King <peff@peff.net>,
	Derrick Stolee <derrickstolee@github.com>,
	Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 13/17] rebase: don't leak on "--abort"
Date: Sat, 05 Nov 2022 13:01:42 +0100	[thread overview]
Message-ID: <221105.86eduhy3l8.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <f9df4dfc-6024-deb0-5277-bab96c81a875@dunelm.org.uk>


On Fri, Nov 04 2022, Phillip Wood wrote:

> On 03/11/2022 17:06, Ævar Arnfjörð Bjarmason wrote:
>> Fix a leak in the recent 6159e7add49 (rebase --abort: improve reflog
>> message, 2022-10-12). Before that commit we'd strbuf_release() the
>> reflog message we were formatting, but when that code was refactored
>> to use "ropts.head_msg" the strbuf_release() was omitted.
>> Ideally the three users of "ropts" in cmd_rebase() should use
>> different "ropts" variables, in practice they're completely separate,
>> as this and the other user in the "switch" statement will "goto
>> cleanup", which won't touch "ropts".
>> The third caller after the "switch" is then unreachable if we take
>> these two branches, so all of them are getting a "{ 0 }" init'd
>> "ropts".
>> So it's OK that we're leaving a stale pointer in "ropts.head_msg",
>> cleaning it up was our responsibility, and it won't be used again.
>> [...]
>> --- a/builtin/rebase.c
>> +++ b/builtin/rebase.c
>> @@ -1320,6 +1320,7 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>>   		if (reset_head(the_repository, &ropts) < 0)
>>   			die(_("could not move back to %s"),
>>   			    oid_to_hex(&options.orig_head->object.oid));
>> +		strbuf_release(&head_msg);
>>   		remove_branch_state(the_repository, 0);
>>   		ret = finish_rebase(&options);
>>   		goto cleanup;
>
> This looks fine. One could argue that the leak is not a "real" leak as
> we're about to exit but the omission of strbuf_release() was 
> unintentional and fix is nice and simple.

What I've been targeting in this and past leak fix topics is roughly
what valgrind calls[1] "definitely lost" and "indirectly lost", and
which -fsanitize=leak calls "that thing I die on whan I run into it" :)

So, you and I know we're "about to exit" as we're in cmd_rebase() and
about to return to common-main.c etc, but it's harder to convince a
compiler of that.

Maybe I'm just touchy, sorry. It just feels like every time I submit a
leak fix topic there's some rehash of the "why do this at all?" or "why
this one?"  discussion.

To summarize (maybe too much for just this reply, but I hope to link
back to this in the future):

 * Historically we've said "that's just a built-in, let the OS take care
   of it!"

 * I agree with that in principle. It's pretty pointless to free()
   things in cmd_*() in the abstract. We're "about to exit", even if
   valgrind etc. call it "definitely lost"[1].

 * But, it would be nice to have git's APIs not leak, so we could call
   them in loops, replace some of our own sub-process invocations
   etc[2].

 * If we had 100% test coverage for those libraries we could just test
   that, and get those leak-free, and ignore the built-ins.

 * We don't have that, and probably never will. E.g. "git log" and
   friends are *the* things that stress revision.[ch]. So to assert that
   the API is leak-free we really need to assert that its users
   are. Ditto for pretty much any other API we carry.

 * Still, we could just say "if something's alloc'd in cmd_*() don't
   care about it". Both LSAN and valgrind support "suppressions" to
   ignore certain functions, patterns etc.

 * But that doesn't really work either, as e.g. "struct rev_info" would
   seem to come from such a cmd_*() as far as ownership goes...

 * ..."but you could have an even more clever suppressions mechanism,
   and only ignore those things that are malloc'd by cmd_*(). But not
   e.g. a malloc in revision.c in a struct member in a struct that's on
   the stack in a cmd_*()", one might say.

   Yeah, you could do that, e.g. in this case we'll call strbuf_addf()
   which makes a beeline to malloc(). Sufficiently clever
   post-processing of leak stacktraces could probably make the
   distinction.

   But then you run into the problem of e.g. how the the log family and
   others use "mailmap". I.e. in that case (and many others) the cmd_*()
   will malloc(), but "hand it off" to the API, whose job we know it is
   to free() it.

   But we only know because of knowledge about the API, an automated
   system would care, still, it's not unworkable. We could go through
   those, list the exceptions etc. etc.

That's a lot of context, but I think brings us up-to-date on this
one-line change.

So, could we do things differently here? Definitely. And if the codebase
was in a state where e.g. builtin/*.c had ~1500 UNLEAK() already I'd
probably try to extend that, rather than fighting that uphill battle.

But that's not the case, we have around ~30 of them, fewer after this
topic. By far the majority of builtin/* is caring about directly
recahable leaks already.

E.g. just above this context in another "case" arm we're doing the exact
"variable on stack, use it, then release()", just with a "struct
string_list".

So I think this is the right direction to go in, both in this patch & in
general.

1. https://valgrind.org/docs/manual/mc-manual.html#mc-manual.leaks
2. Even the assumption that cmd_*() is a one-off is shaky, e.g. for
   cases where we shell out to "git commit" we're close now to being
   able to just call cmd_commit(), but let's leave that aside for now).

   So, we're pretty close to cmd_*() being a funny API that happens to
   take strvec arguments, which is going to be the shortest path to
   eliminating sub-process invocation in many cases. I.e. as opposed to
   refactoring the cmd_*() into a "real" library, that takes a struct
   with options, doesn't call parse_options() etc.

  reply	other threads:[~2022-11-05 12:43 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-03 17:05 [PATCH 00/17] leak fixes: use existing constructors & other trivia Ævar Arnfjörð Bjarmason
2022-11-03 17:06 ` [PATCH 01/17] tests: mark tests as passing with SANITIZE=leak Ævar Arnfjörð Bjarmason
2022-11-03 17:06 ` [PATCH 02/17] {reset,merge}: call discard_index() before returning Ævar Arnfjörð Bjarmason
2022-11-03 17:06 ` [PATCH 03/17] commit: discard partial cache before (re-)reading it Ævar Arnfjörð Bjarmason
2022-11-03 17:06 ` [PATCH 04/17] read-cache.c: clear and free "sparse_checkout_patterns" Ævar Arnfjörð Bjarmason
2022-11-03 17:06 ` [PATCH 05/17] dir.c: free "ident" and "exclude_per_dir" in "struct untracked_cache" Ævar Arnfjörð Bjarmason
2022-11-03 17:06 ` [PATCH 06/17] built-ins & libs & helpers: add/move destructors, fix leaks Ævar Arnfjörð Bjarmason
2022-11-03 17:06 ` [PATCH 07/17] unpack-file: fix ancient leak in create_temp_file() Ævar Arnfjörð Bjarmason
2022-11-03 17:06 ` [PATCH 08/17] revision API: call graph_clear() in release_revisions() Ævar Arnfjörð Bjarmason
2022-11-03 17:06 ` [PATCH 09/17] ls-files: fix a --with-tree memory leak Ævar Arnfjörð Bjarmason
2022-11-03 17:06 ` [PATCH 10/17] sequencer.c: fix "opts->strategy" leak in read_strategy_opts() Ævar Arnfjörð Bjarmason
2022-11-04 14:50   ` Phillip Wood
2022-11-04 21:44     ` Taylor Blau
2022-11-05 12:43       ` Ævar Arnfjörð Bjarmason
2022-11-08 15:00     ` Phillip Wood
2022-11-08 15:26       ` Ævar Arnfjörð Bjarmason
2022-11-03 17:06 ` [PATCH 11/17] connected.c: free the "struct packed_git" Ævar Arnfjörð Bjarmason
2022-11-03 17:06 ` [PATCH 12/17] sequencer.c: fix a pick_commits() leak Ævar Arnfjörð Bjarmason
2022-11-03 17:06 ` [PATCH 13/17] rebase: don't leak on "--abort" Ævar Arnfjörð Bjarmason
2022-11-04 14:42   ` Phillip Wood
2022-11-05 12:01     ` Ævar Arnfjörð Bjarmason [this message]
2022-11-03 17:06 ` [PATCH 14/17] sequencer.c: fix sequencer_continue() leak Ævar Arnfjörð Bjarmason
2022-11-03 17:06 ` [PATCH 15/17] cherry-pick: free "struct replay_opts" members Ævar Arnfjörð Bjarmason
2022-11-03 17:06 ` [PATCH 16/17] revert: fix parse_options_concat() leak Ævar Arnfjörð Bjarmason
2022-11-03 17:06 ` [PATCH 17/17] built-ins: use free() not UNLEAK() if trivial, rm dead code Ævar Arnfjörð Bjarmason
2022-11-04 15:20 ` [PATCH 00/17] leak fixes: use existing constructors & other trivia Phillip Wood
2022-11-05 12:46   ` Ævar Arnfjörð Bjarmason
2022-11-07  9:46     ` Phillip Wood
2022-11-08 18:17 ` [PATCH v2 00/15] " Ævar Arnfjörð Bjarmason
2022-11-08 18:17   ` [PATCH v2 01/15] tests: mark tests as passing with SANITIZE=leak Ævar Arnfjörð Bjarmason
2022-11-08 18:17   ` [PATCH v2 02/15] {reset,merge}: call discard_index() before returning Ævar Arnfjörð Bjarmason
2022-11-08 18:17   ` [PATCH v2 03/15] commit: discard partial cache before (re-)reading it Ævar Arnfjörð Bjarmason
2022-11-08 18:17   ` [PATCH v2 04/15] read-cache.c: clear and free "sparse_checkout_patterns" Ævar Arnfjörð Bjarmason
2022-11-08 18:17   ` [PATCH v2 05/15] dir.c: free "ident" and "exclude_per_dir" in "struct untracked_cache" Ævar Arnfjörð Bjarmason
2022-11-08 18:17   ` [PATCH v2 06/15] built-ins & libs & helpers: add/move destructors, fix leaks Ævar Arnfjörð Bjarmason
2022-11-08 18:17   ` [PATCH v2 07/15] unpack-file: fix ancient leak in create_temp_file() Ævar Arnfjörð Bjarmason
2022-11-08 18:17   ` [PATCH v2 08/15] revision API: call graph_clear() in release_revisions() Ævar Arnfjörð Bjarmason
2022-11-08 18:17   ` [PATCH v2 09/15] ls-files: fix a --with-tree memory leak Ævar Arnfjörð Bjarmason
2022-11-08 18:17   ` [PATCH v2 10/15] sequencer.c: fix "opts->strategy" leak in read_strategy_opts() Ævar Arnfjörð Bjarmason
2022-11-08 18:17   ` [PATCH v2 11/15] connected.c: free the "struct packed_git" Ævar Arnfjörð Bjarmason
2022-11-08 18:17   ` [PATCH v2 12/15] rebase: don't leak on "--abort" Ævar Arnfjörð Bjarmason
2022-11-08 18:17   ` [PATCH v2 13/15] cherry-pick: free "struct replay_opts" members Ævar Arnfjörð Bjarmason
2022-11-08 18:17   ` [PATCH v2 14/15] revert: fix parse_options_concat() leak Ævar Arnfjörð Bjarmason
2022-11-08 18:17   ` [PATCH v2 15/15] built-ins: use free() not UNLEAK() if trivial, rm dead code Ævar Arnfjörð Bjarmason
2022-11-08 20:54   ` [PATCH v2 00/15] leak fixes: use existing constructors & other trivia Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=221105.86eduhy3l8.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=phillip.wood123@gmail.com \
    --cc=phillip.wood@dunelm.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).