git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] commit: trivial leak fix, add 2 tests to linux-leaks CI
@ 2022-02-16  8:21 Ævar Arnfjörð Bjarmason
  2022-02-16  8:21 ` [PATCH 1/2] commit: fix "author_ident" leak Ævar Arnfjörð Bjarmason
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-16  8:21 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason

This trivial series fixes a leak in "git commit", removes two UNLEAK()
in favor of simply calling strbuf_release(), and marks two tests that
previously hit that UNLEAK() being run in the linux-leaks CI job.

Ævar Arnfjörð Bjarmason (2):
  commit: fix "author_ident" leak
  commit: use strbuf_release() instead of UNLEAK()

 builtin/commit.c                 | 13 ++++++++-----
 t/t2203-add-intent.sh            |  1 +
 t/t7011-skip-worktree-reading.sh |  1 +
 3 files changed, 10 insertions(+), 5 deletions(-)

-- 
2.35.1.1028.g2d2d4be19de


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/2] commit: fix "author_ident" leak
  2022-02-16  8:21 [PATCH 0/2] commit: trivial leak fix, add 2 tests to linux-leaks CI Ævar Arnfjörð Bjarmason
@ 2022-02-16  8:21 ` Ævar Arnfjörð Bjarmason
  2022-02-16 17:59   ` Junio C Hamano
  2022-02-16  8:21 ` [PATCH 2/2] commit: use strbuf_release() instead of UNLEAK() Ævar Arnfjörð Bjarmason
  2022-05-12 22:51 ` [PATCH] commit: fix "author_ident" leak Junio C Hamano
  2 siblings, 1 reply; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-16  8:21 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason

Fix a leak in cmd_commit(), since 4c28e4ada03 (commit: die before
asking to edit the log message, 2010-12-20) we have been freeing the
"author_ident" "struct strbuf", but not in the case where
prepare_to_commit() returns non-zero.

This fixes a leak demonstrated by e.g. "t3505-cherry-pick-empty.sh",
but unfortunately we cannot mark it or other affected tests as passing
now with "TEST_PASSES_SANITIZE_LEAK=true" as we'll need to fix many
other memory leaks before doing so.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/commit.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 6b99ac276d8..696b3527adf 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1689,6 +1689,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 	struct commit *current_head = NULL;
 	struct commit_extra_header *extra = NULL;
 	struct strbuf err = STRBUF_INIT;
+	int ret = 0;
 
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_commit_usage, builtin_commit_options);
@@ -1723,8 +1724,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 	   running hooks, writing the trees, and interacting with the user.  */
 	if (!prepare_to_commit(index_file, prefix,
 			       current_head, &s, &author_ident)) {
+		ret = 1;
 		rollback_index_files();
-		return 1;
+		goto cleanup;
 	}
 
 	/* Determine parents */
@@ -1822,7 +1824,6 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		rollback_index_files();
 		die(_("failed to write commit object"));
 	}
-	strbuf_release(&author_ident);
 	free_commit_extra_headers(extra);
 
 	if (update_head_with_reflog(current_head, &oid, reflog_msg, &sb,
@@ -1863,7 +1864,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 
 	apply_autostash(git_path_merge_autostash(the_repository));
 
+cleanup:
+	strbuf_release(&author_ident);
 	UNLEAK(err);
 	UNLEAK(sb);
-	return 0;
+	return ret;
 }
-- 
2.35.1.1028.g2d2d4be19de


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/2] commit: use strbuf_release() instead of UNLEAK()
  2022-02-16  8:21 [PATCH 0/2] commit: trivial leak fix, add 2 tests to linux-leaks CI Ævar Arnfjörð Bjarmason
  2022-02-16  8:21 ` [PATCH 1/2] commit: fix "author_ident" leak Ævar Arnfjörð Bjarmason
@ 2022-02-16  8:21 ` Ævar Arnfjörð Bjarmason
  2022-02-16 18:03   ` Junio C Hamano
  2022-05-12 22:51 ` [PATCH] commit: fix "author_ident" leak Junio C Hamano
  2 siblings, 1 reply; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-16  8:21 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason

Convert the UNLEAK() added in 0e5bba53af7 (add UNLEAK annotation for
reducing leak false positives, 2017-09-08) to release the memory using
strbuf_release() instead.

The tests being marked as passing with
"TEST_PASSES_SANITIZE_LEAK=true" already passed before due to the
UNLEAK(), but now they really don't leak memory, so let's mark them as
such.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/commit.c                 | 4 ++--
 t/t2203-add-intent.sh            | 1 +
 t/t7011-skip-worktree-reading.sh | 1 +
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 696b3527adf..c38ae2b7656 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1866,7 +1866,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 
 cleanup:
 	strbuf_release(&author_ident);
-	UNLEAK(err);
-	UNLEAK(sb);
+	strbuf_release(&err);
+	strbuf_release(&sb);
 	return ret;
 }
diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
index db7ca559986..ebf58db2d18 100755
--- a/t/t2203-add-intent.sh
+++ b/t/t2203-add-intent.sh
@@ -2,6 +2,7 @@
 
 test_description='Intent to add'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'intent to add' '
diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh
index 1761a2b1b99..4adac5acd57 100755
--- a/t/t7011-skip-worktree-reading.sh
+++ b/t/t7011-skip-worktree-reading.sh
@@ -5,6 +5,7 @@
 
 test_description='skip-worktree bit test'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 cat >expect.full <<EOF
-- 
2.35.1.1028.g2d2d4be19de


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/2] commit: fix "author_ident" leak
  2022-02-16  8:21 ` [PATCH 1/2] commit: fix "author_ident" leak Ævar Arnfjörð Bjarmason
@ 2022-02-16 17:59   ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2022-02-16 17:59 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Fix a leak in cmd_commit(), since 4c28e4ada03 (commit: die before
> asking to edit the log message, 2010-12-20) we have been freeing the
> "author_ident" "struct strbuf", but not in the case where

It took me an actual reading of the code to see that the above
refers to one and the same thing (i.e. "author_ident variable of
type struct strbuf").  I think it should be sufficient and clearer
to just mention "author_ident" here.

> prepare_to_commit() returns non-zero.

Good eyes.  

> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/commit.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/commit.c b/builtin/commit.c
> index 6b99ac276d8..696b3527adf 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1689,6 +1689,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
>  	struct commit *current_head = NULL;
>  	struct commit_extra_header *extra = NULL;
>  	struct strbuf err = STRBUF_INIT;
> +	int ret = 0;
>  
>  	if (argc == 2 && !strcmp(argv[1], "-h"))
>  		usage_with_options(builtin_commit_usage, builtin_commit_options);
> @@ -1723,8 +1724,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
>  	   running hooks, writing the trees, and interacting with the user.  */
>  	if (!prepare_to_commit(index_file, prefix,
>  			       current_head, &s, &author_ident)) {
> +		ret = 1;
>  		rollback_index_files();
> -		return 1;
> +		goto cleanup;
>  	}
>  
>  	/* Determine parents */
> @@ -1822,7 +1824,6 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
>  		rollback_index_files();
>  		die(_("failed to write commit object"));
>  	}
> -	strbuf_release(&author_ident);
>  	free_commit_extra_headers(extra);

Hmph, if we hit one of the two die() after this point before we
reach the "cleanup" label, author_ident will be left on the stack,
which we may want to UNLEAK()?

I am wondering if prepare_to_commit(), which is the one that is
responsible for allocating and using the information in the strbuf,
should be the one who is responsible for cleaning it when it failed
to do its thing, but I do not think it is a good idea, because the
caller MUST release it in the success case anyway.  So dealing with
the releasing here does make sense.

By jumping to the cleanup label, we not just release author_ident,
but we start unleak'ing err and sb as well, which shouldn't be a
problem, hopefully.

Will queue.

Thanks.

>  	if (update_head_with_reflog(current_head, &oid, reflog_msg, &sb,
> @@ -1863,7 +1864,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
>  
>  	apply_autostash(git_path_merge_autostash(the_repository));
>  
> +cleanup:
> +	strbuf_release(&author_ident);
>  	UNLEAK(err);
>  	UNLEAK(sb);
> -	return 0;
> +	return ret;
>  }

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] commit: use strbuf_release() instead of UNLEAK()
  2022-02-16  8:21 ` [PATCH 2/2] commit: use strbuf_release() instead of UNLEAK() Ævar Arnfjörð Bjarmason
@ 2022-02-16 18:03   ` Junio C Hamano
  2022-02-16 18:30     ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2022-02-16 18:03 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Convert the UNLEAK() added in 0e5bba53af7 (add UNLEAK annotation for
> reducing leak false positives, 2017-09-08) to release the memory using
> strbuf_release() instead.
>
> The tests being marked as passing with
> "TEST_PASSES_SANITIZE_LEAK=true" already passed before due to the
> UNLEAK(), but now they really don't leak memory, so let's mark them as
> such.

That smells like a brave move.

Specifically, the cited commit turned an existing strbuf_release()
on &err into UNLEAK().  If that and the other strbuf (sb) were so
easily releasable, why didn't we do so back then already?

> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/commit.c                 | 4 ++--
>  t/t2203-add-intent.sh            | 1 +
>  t/t7011-skip-worktree-reading.sh | 1 +
>  3 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/commit.c b/builtin/commit.c
> index 696b3527adf..c38ae2b7656 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1866,7 +1866,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
>  
>  cleanup:
>  	strbuf_release(&author_ident);
> -	UNLEAK(err);
> -	UNLEAK(sb);
> +	strbuf_release(&err);
> +	strbuf_release(&sb);
>  	return ret;
>  }

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] commit: use strbuf_release() instead of UNLEAK()
  2022-02-16 18:03   ` Junio C Hamano
@ 2022-02-16 18:30     ` Junio C Hamano
  2022-02-18 12:35       ` Whether to keep using UNLEAK() in built-ins (was: [PATCH 2/2] commit: use strbuf_release() instead of UNLEAK()) Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2022-02-16 18:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Convert the UNLEAK() added in 0e5bba53af7 (add UNLEAK annotation for
>> reducing leak false positives, 2017-09-08) to release the memory using
>> strbuf_release() instead.
>>
>> The tests being marked as passing with
>> "TEST_PASSES_SANITIZE_LEAK=true" already passed before due to the
>> UNLEAK(), but now they really don't leak memory, so let's mark them as
>> such.
>
> That smells like a brave move.
>
> Specifically, the cited commit turned an existing strbuf_release()
> on &err into UNLEAK().  If that and the other strbuf (sb) were so
> easily releasable, why didn't we do so back then already?

I suspect that the answer to the above question is because these
allocations are in the top-level cmd_commit() function, which is
never called recursively or repeatedly as a subroutine.  The only
significant thing that happens after we return from it is to exit.

In such a code path, marking a variable as UNLEAK() is a better
thing to do than calling strbuf_release().  Both will work as a way
to squelch sanitizers from reporting a leak that does not matter,
but calling strbuf_release() means we'd spend extra cycles to return
pieces of memory to the pool, even though we know that the pool
itself will be cleaned immediately later at exit.

We already have UNLEAK to tell sanitizers not to distract us from
spotting and plugging real leaks by reporting these apparent leaks
that do not matter.  It is of somewhat dubious value to do a "we
care too much about pleasing sanitizer and spend extra cycles at
runtime while real users are doing real work" change.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Whether to keep using UNLEAK() in built-ins (was: [PATCH 2/2] commit: use strbuf_release() instead of UNLEAK())
  2022-02-16 18:30     ` Junio C Hamano
@ 2022-02-18 12:35       ` Ævar Arnfjörð Bjarmason
  2022-02-18 18:19         ` Whether to keep using UNLEAK() in built-ins Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-18 12:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Andrzej Hunt, Martin Ågren


On Wed, Feb 16 2022, Junio C Hamano wrote:

[CC-ing some people using/interested in UNLEAK()]

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>>
>>> Convert the UNLEAK() added in 0e5bba53af7 (add UNLEAK annotation for
>>> reducing leak false positives, 2017-09-08) to release the memory using
>>> strbuf_release() instead.
>>>
>>> The tests being marked as passing with
>>> "TEST_PASSES_SANITIZE_LEAK=true" already passed before due to the
>>> UNLEAK(), but now they really don't leak memory, so let's mark them as
>>> such.
>>
>> That smells like a brave move.
>>
>> Specifically, the cited commit turned an existing strbuf_release()
>> on &err into UNLEAK().  If that and the other strbuf (sb) were so
>> easily releasable, why didn't we do so back then already?
>
> I suspect that the answer to the above question is because these
> allocations are in the top-level cmd_commit() function, which is
> never called recursively or repeatedly as a subroutine.  The only
> significant thing that happens after we return from it is to exit.
>
> In such a code path, marking a variable as UNLEAK() is a better
> thing to do than calling strbuf_release().  Both will work as a way
> to squelch sanitizers from reporting a leak that does not matter,
> but calling strbuf_release() means we'd spend extra cycles to return
> pieces of memory to the pool, even though we know that the pool
> itself will be cleaned immediately later at exit.
>
> We already have UNLEAK to tell sanitizers not to distract us from
> spotting and plugging real leaks by reporting these apparent leaks
> that do not matter.  It is of somewhat dubious value to do a "we
> care too much about pleasing sanitizer and spend extra cycles at
> runtime while real users are doing real work" change.

We've had several discussions about the utility of UNLEAK() as I've been
submitting these patches, and I thought that if we weren't 100% on the
same page it was at least clear what I was going for here.

Per https://lore.kernel.org/git/87a6k8daeu.fsf@evledraar.gmail.com/ the
real goal I have in mind here is to use the built-ins as a stand-in for
testing whether the underlying APIs are leak-free.

Because of that having to reason about UNLEAK() doing the right thing or
not is just unneeded distraction. Yes for a "struct strbuf" it won't
matter, but most of what we UNLEAK() is more complex stuff like "struct
rev_info". We won't really make headway making revision.c not leak
memory without using "git log" et al as the test subjects for whether
that API leaks.

We are also freeing most of this already even for built-ins, e.g. see
(not all of these are applicable obviously, but per the numbers enough
are to make the point):

    $ git grep '\bstrbuf_release\(' -- builtin | wc -l
    456
    $ git grep '\bUNLEAK\(' -- builtin | wc -l
    29

So one goal I've got with these patches is to eventually get rid of
UNLEAK() entirely. We only use it in these few instances:

    $ git grep '\bUNLEAK\(' -- '*.c' | wc -l
    31

Per the above we'd want to convert any that deal with the complex
structures that are the big source of leaks to doing real releases for
testing the APIs. That'll leave only a handful of remaining legitimate
uses, which I don't think are worth keeping some UNLEAK() API around
for, v.s. just freeing them.

I think we've also somewhat been talking past each other in past
exchanges (including me with Jeff King) about what you call "real
leaks".

When I'm referring to memory leaks I'm talking about them in the more
inclusive sense explained in this valgrind documentation:
https://valgrind.org/docs/manual/mc-manual.html#mc-manual.leaks;
Basically "a malloc() not followed by a free() is a leak".

But you and Jeff King have (I think) commonly used that term to exclude
what's called "still reachable" in that table. Those *are* memory leaks,
but as explained in that table just ones that arguably don't matter. Or
at least ones most people tracing leaks don't want reported by default.

UNLEAK() is just a mechanism for moving a memory leak from another
category into "still reachable". That will make LSAN blind to it, and
valgrind in many cases due to it being a heap tracer. But it & other
memory leaking tools *will* report even on those if given the right
options.

So I've found it useful to get rid of UNLEAK() in some cases for those,
because even if LSAN runs clean it's useful to run valgrind in its more
pedantic tracing modes to see even the "still unreachable", and if we
should care about it or not.

Some of the leaks in those category *are* "real leaks". I.e. you can
have an ever-growing structure with global reach that's always "still
reachable", but when using the API would eventually OOM you.

So while it's a useful heuristic for spotting "will be cleaned up at
exit anyway" you can't rely on it, so I've been looking at them anyway.

So being able to just free() them so I can permanently ignore them as I
fix more leaks would be useful to me, so I hope you'll agree on just
talking this as-is, thanks! :)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Whether to keep using UNLEAK() in built-ins
  2022-02-18 12:35       ` Whether to keep using UNLEAK() in built-ins (was: [PATCH 2/2] commit: use strbuf_release() instead of UNLEAK()) Ævar Arnfjörð Bjarmason
@ 2022-02-18 18:19         ` Junio C Hamano
  2022-02-18 19:31           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2022-02-18 18:19 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Andrzej Hunt, Martin Ågren

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>>> Specifically, the cited commit turned an existing strbuf_release()
>>> on &err into UNLEAK().  If that and the other strbuf (sb) were so
>>> easily releasable, why didn't we do so back then already?
>>
>> I suspect that the answer to the above question is because these
>> allocations are in the top-level cmd_commit() function, which is
>> never called recursively or repeatedly as a subroutine.  The only
>> significant thing that happens after we return from it is to exit.
>>
>> In such a code path, marking a variable as UNLEAK() is a better
>> thing to do than calling strbuf_release().  Both will work as a way
>> to squelch sanitizers from reporting a leak that does not matter,
>> but calling strbuf_release() means we'd spend extra cycles to return
>> pieces of memory to the pool, even though we know that the pool
>> itself will be cleaned immediately later at exit.
>>
>> We already have UNLEAK to tell sanitizers not to distract us from
>> spotting and plugging real leaks by reporting these apparent leaks
>> that do not matter.  It is of somewhat dubious value to do a "we
>> care too much about pleasing sanitizer and spend extra cycles at
>> runtime while real users are doing real work" change.

> Per https://lore.kernel.org/git/87a6k8daeu.fsf@evledraar.gmail.com/ the
> real goal I have in mind here is to use the built-ins as a stand-in for
> testing whether the underlying APIs are leak-free.
>
> Because of that having to reason about UNLEAK() doing the right thing or
> not is just unneeded distraction. Yes for a "struct strbuf" it won't
> matter, but most of what we UNLEAK() is more complex stuff like "struct
> rev_info". We won't really make headway making revision.c not leak
> memory without using "git log" et al as the test subjects for whether
> that API leaks.

I have to say that you have a wrong goal and wrong priority.  The
number of UNLEAK in cmd_foo() functions is not even a poor
approximation of our progress.

Imagine that a subsystem that are repeatedly set-up and used during
a single program invocation was leaky.  Let's say a program calls
diff_setup() to prepare the diff_options structure, feeds two things
to compare, and calls diff_flush() to show the comparison result,
but let's assume this sequence leaks some resources.

Now cmd_diff() may be such a program that does a setup, feeds two
trees, calls diff_flush() and exits.  If we didn't do anything to
it, diff_options may "leak".  Marking it with UNLEAK may be a good
measure, if we want to keep the leak checker from reporting a leak
that does not matter in practice so that we can concentrate on
plugging real leaks that matter.

But consider cmd_log(), running something like "git log -p".  It
iterates over commits, does the <setup, compare two trees, flush>
repeatedly for each commit it encounters during the walk on the same
diff_options structure.  Now, the leak in the code path does matter.
If diff_flush() is left leaky, it will show up in the leak checker's
output, and that is reporting real leaks that matter.  The thing is,
the <setup, compare, flush> sequence whose leak matters is not in
cmd_log(); it is much deeper in the revision walking machinery.  We
do not want to paper over such leaks with UNLEAK.

See the difference?  The number of UNLEAK OUTside built-ins does
matter.  We do not want to have UNLEAK there to hide leaks in
possible callees that may be called arbitrary number of times in
arbitrary order.  Compared to that, UNLEAK in cmd_foo() to mark an
on-stack variable that was used only once without even a recursion
is purely to squelch the leak checker from reporting leaks that does
not matter.  For simple things like strbuf on stack of the top-level
cmd_foo() functions, we could even leave strbuf_release() not called
and leave the resource reclamation to _exit(2).  That would cause
the leak checkers to report them and distract us, and UNLEAK is a
better way to squelch the distraction without wasting extra cycles
at runtime to actually free them.

So, don't look at "built-ins as a stand-in".  It is a wrong
direction to go in to let the "leak checker" tail wag the dog.

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Whether to keep using UNLEAK() in built-ins
  2022-02-18 18:19         ` Whether to keep using UNLEAK() in built-ins Junio C Hamano
@ 2022-02-18 19:31           ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-18 19:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Andrzej Hunt, Martin Ågren


On Fri, Feb 18 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>>>> Specifically, the cited commit turned an existing strbuf_release()
>>>> on &err into UNLEAK().  If that and the other strbuf (sb) were so
>>>> easily releasable, why didn't we do so back then already?
>>>
>>> I suspect that the answer to the above question is because these
>>> allocations are in the top-level cmd_commit() function, which is
>>> never called recursively or repeatedly as a subroutine.  The only
>>> significant thing that happens after we return from it is to exit.
>>>
>>> In such a code path, marking a variable as UNLEAK() is a better
>>> thing to do than calling strbuf_release().  Both will work as a way
>>> to squelch sanitizers from reporting a leak that does not matter,
>>> but calling strbuf_release() means we'd spend extra cycles to return
>>> pieces of memory to the pool, even though we know that the pool
>>> itself will be cleaned immediately later at exit.
>>>
>>> We already have UNLEAK to tell sanitizers not to distract us from
>>> spotting and plugging real leaks by reporting these apparent leaks
>>> that do not matter.  It is of somewhat dubious value to do a "we
>>> care too much about pleasing sanitizer and spend extra cycles at
>>> runtime while real users are doing real work" change.
>
>> Per https://lore.kernel.org/git/87a6k8daeu.fsf@evledraar.gmail.com/ the
>> real goal I have in mind here is to use the built-ins as a stand-in for
>> testing whether the underlying APIs are leak-free.
>>
>> Because of that having to reason about UNLEAK() doing the right thing or
>> not is just unneeded distraction. Yes for a "struct strbuf" it won't
>> matter, but most of what we UNLEAK() is more complex stuff like "struct
>> rev_info". We won't really make headway making revision.c not leak
>> memory without using "git log" et al as the test subjects for whether
>> that API leaks.
>
> I have to say that you have a wrong goal and wrong priority.  The
> number of UNLEAK in cmd_foo() functions is not even a poor
> approximation of our progress.

To clarify I'm not saying it's an approximation of how close we are to
getting rid of memory leaks. I think I'll know better than anyone what
the current state of shoveling shit uphill that is.

I was suggesting that these's a trivial number of these, and mostly we
strbuf_release() already, and that on modern libc's free() is pretty
much free anyway, so worrying about using UNLEAK() v.s. just freeing
before exit is some combination of a premature optimization and
needlessly retaining a special-case for the built-ins.

> Imagine that a subsystem that are repeatedly set-up and used during
> a single program invocation was leaky.  Let's say a program calls
> diff_setup() to prepare the diff_options structure, feeds two things
> to compare, and calls diff_flush() to show the comparison result,
> but let's assume this sequence leaks some resources.
>
> Now cmd_diff() may be such a program that does a setup, feeds two
> trees, calls diff_flush() and exits.  If we didn't do anything to
> it, diff_options may "leak".  Marking it with UNLEAK may be a good
> measure, if we want to keep the leak checker from reporting a leak
> that does not matter in practice so that we can concentrate on
> plugging real leaks that matter.
>
> But consider cmd_log(), running something like "git log -p".  It
> iterates over commits, does the <setup, compare two trees, flush>
> repeatedly for each commit it encounters during the walk on the same
> diff_options structure.  Now, the leak in the code path does matter.
> If diff_flush() is left leaky, it will show up in the leak checker's
> output, and that is reporting real leaks that matter.  The thing is,
> the <setup, compare, flush> sequence whose leak matters is not in
> cmd_log(); it is much deeper in the revision walking machinery.  We
> do not want to paper over such leaks with UNLEAK.
>
> See the difference?  The number of UNLEAK OUTside built-ins does
> matter.  We do not want to have UNLEAK there to hide leaks in
> possible callees that may be called arbitrary number of times in
> arbitrary order.  Compared to that, UNLEAK in cmd_foo() to mark an
> on-stack variable that was used only once without even a recursion
> is purely to squelch the leak checker from reporting leaks that does
> not matter.  For simple things like strbuf on stack of the top-level
> cmd_foo() functions, we could even leave strbuf_release() not called
> and leave the resource reclamation to _exit(2).  That would cause
> the leak checkers to report them and distract us, and UNLEAK is a
> better way to squelch the distraction without wasting extra cycles
> at runtime to actually free them.
>
> So, don't look at "built-ins as a stand-in".  It is a wrong
> direction to go in to let the "leak checker" tail wag the dog.

Yes, I see the difference and I fully accept the point you're making.

I just don't find it to be a useful distinction in practice, because for
e.g. "git log" we do have certain uses of the API which are only
performed by the top-level part of a "git log --whatever".

So if we take the view that e.g. the "struct rev_info rev" should be
UNLEAK()'d because "we're about to exit anyway" then as soon as we *do*
use that part of the API we'll run into the previously unfixed and
hidden leak.

Which is why as noted in the linked-to-above
https://lore.kernel.org/git/87a6k8daeu.fsf@evledraar.gmail.com/ I think
it's useful to fix leaks even in built-ins, because they're useful
canaries for those APIs.

Now, in this case does it really matter? No, it doesn't currently. But I
think the direction we should be heading in is turning more of these
cmd_whatever() into properly functioning libraries, as e.g. John Cai's
just-sent series to do that for "reflog delete" does:
https://lore.kernel.org/git/pull.1218.git.git.1645209647.gitgitgadget@gmail.com/

Once we do that the UNLEAK() here will absolutely need to be changed to
a strbuf_release() or equivalent, since if you make two commits with
that new library we'll have leaked memory.

So I think it makes sense to just fix leaks everywhere if it's easy
without worrying about the distinction, particularly since I haven't
seen that "wasting cycles" concern matter in practice.

If it did I'd think adding a "GIT_DESTRUCT_LEVEL" as I suggested in the
linked E-Mail would make more sense, since e.g. you could also use that
if you knew you were walking 100 commits & exiting, which can be useful
to further reduce free() churn, and is how a similar global in the Perl
API (PERL_DESTRUCT_LEVEL) works.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] commit: fix "author_ident" leak
  2022-02-16  8:21 [PATCH 0/2] commit: trivial leak fix, add 2 tests to linux-leaks CI Ævar Arnfjörð Bjarmason
  2022-02-16  8:21 ` [PATCH 1/2] commit: fix "author_ident" leak Ævar Arnfjörð Bjarmason
  2022-02-16  8:21 ` [PATCH 2/2] commit: use strbuf_release() instead of UNLEAK() Ævar Arnfjörð Bjarmason
@ 2022-05-12 22:51 ` Junio C Hamano
  2022-05-17 13:48   ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2022-05-12 22:51 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason

Since 4c28e4ada03 (commit: die before asking to edit the log
message, 2010-12-20), we have been "leaking" the "author_ident" when
prepare_to_commit() fails.  Instead of returning from right there,
introduce an exit status variable and jump to the clean-up label
at the end.

Instead of explicitly releasing the resource with strbuf_release(),
mark the variable with UNLEAK() at the end, together with two other
variables that are already marked as such.  If this were in a
utility function that is called number of times, but these are
different, we should explicitly release resources that grow
proportionally to the size of the problem being solved, but
cmd_commit() is like main() and there is no point in spending extra
cycles to release individual pieces of resource at the end, just
before process exit will clean everything for us for free anyway.

This fixes a leak demonstrated by e.g. "t3505-cherry-pick-empty.sh",
but unfortunately we cannot mark it or other affected tests as passing
now with "TEST_PASSES_SANITIZE_LEAK=true" as we'll need to fix many
other memory leaks before doing so.

Incidentally there are two tests that always passes the leak checker
with or without this change.  Mark them as such.

This is based on an earlier patch by Ævar, but takes a different
approach that is more maintainable.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/commit.c                 | 9 ++++++---
 t/t2203-add-intent.sh            | 1 +
 t/t7011-skip-worktree-reading.sh | 1 +
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index b9ed0374e3..4e8b3d3251 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1688,6 +1688,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 	struct commit *current_head = NULL;
 	struct commit_extra_header *extra = NULL;
 	struct strbuf err = STRBUF_INIT;
+	int ret = 0;
 
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_commit_usage, builtin_commit_options);
@@ -1722,8 +1723,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 	   running hooks, writing the trees, and interacting with the user.  */
 	if (!prepare_to_commit(index_file, prefix,
 			       current_head, &s, &author_ident)) {
+		ret = 1;
 		rollback_index_files();
-		return 1;
+		goto cleanup;
 	}
 
 	/* Determine parents */
@@ -1821,7 +1823,6 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 		rollback_index_files();
 		die(_("failed to write commit object"));
 	}
-	strbuf_release(&author_ident);
 	free_commit_extra_headers(extra);
 
 	if (update_head_with_reflog(current_head, &oid, reflog_msg, &sb,
@@ -1862,7 +1863,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 
 	apply_autostash(git_path_merge_autostash(the_repository));
 
+cleanup:
+	UNLEAK(author_ident);
 	UNLEAK(err);
 	UNLEAK(sb);
-	return 0;
+	return ret;
 }
diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
index db7ca55998..ebf58db2d1 100755
--- a/t/t2203-add-intent.sh
+++ b/t/t2203-add-intent.sh
@@ -2,6 +2,7 @@
 
 test_description='Intent to add'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'intent to add' '
diff --git a/t/t7011-skip-worktree-reading.sh b/t/t7011-skip-worktree-reading.sh
index 1761a2b1b9..4adac5acd5 100755
--- a/t/t7011-skip-worktree-reading.sh
+++ b/t/t7011-skip-worktree-reading.sh
@@ -5,6 +5,7 @@
 
 test_description='skip-worktree bit test'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 cat >expect.full <<EOF
-- 
2.36.1-338-g1c7f76a54c


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] commit: fix "author_ident" leak
  2022-05-12 22:51 ` [PATCH] commit: fix "author_ident" leak Junio C Hamano
@ 2022-05-17 13:48   ` Ævar Arnfjörð Bjarmason
  2022-05-18 16:30     ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-05-17 13:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git


On Thu, May 12 2022, Junio C Hamano wrote:

> Since 4c28e4ada03 (commit: die before asking to edit the log
> message, 2010-12-20), we have been "leaking" the "author_ident" when
> prepare_to_commit() fails.  Instead of returning from right there,
> introduce an exit status variable and jump to the clean-up label
> at the end.
>
> Instead of explicitly releasing the resource with strbuf_release(),
> mark the variable with UNLEAK() at the end, together with two other
> variables that are already marked as such.  If this were in a
> utility function that is called number of times, but these are
> different, we should explicitly release resources that grow
> proportionally to the size of the problem being solved, but
> cmd_commit() is like main() and there is no point in spending extra
> cycles to release individual pieces of resource at the end, just
> before process exit will clean everything for us for free anyway.
>
> This fixes a leak demonstrated by e.g. "t3505-cherry-pick-empty.sh",
> but unfortunately we cannot mark it or other affected tests as passing
> now with "TEST_PASSES_SANITIZE_LEAK=true" as we'll need to fix many
> other memory leaks before doing so.
>
> Incidentally there are two tests that always passes the leak checker
> with or without this change.  Mark them as such.
>
> This is based on an earlier patch by Ævar, but takes a different
> approach that is more maintainable.

We've talked about UNLEAK() v.s. strbuf_release() elsewhere, so let's
leave that aside. I know your preferences in that area.

But even accounting for that, I don't see what the "more maintainable"
here refers to. The approach I suggested would s/UNLEAK/strbuf_release/
in the 4th hunk, but otherwise be equivalent.

Surely both are about as easy to maintain. To the extent that there's
any difference at all I'd think the strbuf_release() would pull ahead,
as it's guaranteed to do the right thing with all of our memory analysis
tooling (some of which will have a noop UNLEAK()).

Just a small question, I see this is in "next" already, and I'm fine
with this change either way.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] commit: fix "author_ident" leak
  2022-05-17 13:48   ` Ævar Arnfjörð Bjarmason
@ 2022-05-18 16:30     ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2022-05-18 16:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> But even accounting for that, I don't see what the "more maintainable"
> here refers to. The approach I suggested would s/UNLEAK/strbuf_release/
> in the 4th hunk, but otherwise be equivalent.

Judicious use of UNLEAK() has documentation value to tell readers
which use of pointer variables need to be explicitly released, and
which pointer variables can just implicitly released by going out of
scope at the end.  There are also a few other small added benefits
(they do not have to be changed when the helper needed to do the
real release changes, and not doing an explicit real release on
resources when there no need to is conceptually cleaner).  But they
are icing on the cake.

Sorry for a late reply---I go offline every other Tuesday and
yesterday was such a Tuesday.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-05-18 16:37 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-16  8:21 [PATCH 0/2] commit: trivial leak fix, add 2 tests to linux-leaks CI Ævar Arnfjörð Bjarmason
2022-02-16  8:21 ` [PATCH 1/2] commit: fix "author_ident" leak Ævar Arnfjörð Bjarmason
2022-02-16 17:59   ` Junio C Hamano
2022-02-16  8:21 ` [PATCH 2/2] commit: use strbuf_release() instead of UNLEAK() Ævar Arnfjörð Bjarmason
2022-02-16 18:03   ` Junio C Hamano
2022-02-16 18:30     ` Junio C Hamano
2022-02-18 12:35       ` Whether to keep using UNLEAK() in built-ins (was: [PATCH 2/2] commit: use strbuf_release() instead of UNLEAK()) Ævar Arnfjörð Bjarmason
2022-02-18 18:19         ` Whether to keep using UNLEAK() in built-ins Junio C Hamano
2022-02-18 19:31           ` Ævar Arnfjörð Bjarmason
2022-05-12 22:51 ` [PATCH] commit: fix "author_ident" leak Junio C Hamano
2022-05-17 13:48   ` Ævar Arnfjörð Bjarmason
2022-05-18 16:30     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).