LSan curiosity in t1300

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* LSan curiosity in t1300
@ 2022-09-08  5:29 Jeff King
  2022-09-08 15:42 ` [PATCH] tests: opt "git --config-env" test out of SANITIZE=leak Ævar Arnfjörð Bjarmason
  2022-09-08 18:03 ` LSan curiosity in t1300 Junio C Hamano
  0 siblings, 2 replies; 8+ messages in thread
From: Jeff King @ 2022-09-08  5:29 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason

If I run:

  make SANITIZE=leak GIT_TEST_PASSING_SANITIZE_LEAK=true CFLAGS=-O0

locally, then all tests pass or are skipped. But if switch to -O2, then
t1300 starts failing!

The command that supposedly leaks is this:

  git --config-env=foo.flag= config --bool foo.flag

and the backtrace looks like:

  Direct leak of 9 byte(s) in 1 object(s) allocated from:
      #0 0x56195ae6dc62 in __interceptor_malloc (git+0x78c62) (BuildId: a5cfeb484fd14b6120fa26ff364fa2313fd23168)
      #1 0x56195b0ad6b6 in do_xmalloc wrapper.c
      #2 0x56195b0ad7bd in xmemdupz (git+0x2b87bd) (BuildId: a5cfeb484fd14b6120fa26ff364fa2313fd23168)
      #3 0x56195af6f8d0 in git_config_push_env (git+0x17a8d0) (BuildId: a5cfeb484fd14b6120fa26ff364fa2313fd23168)
      #4 0x56195ae71689 in handle_options git.c
      #5 0x56195ae7060f in cmd_main (git+0x7b60f) (BuildId: a5cfeb484fd14b6120fa26ff364fa2313fd23168)
      #6 0x56195af370c6 in main (git+0x1420c6) (BuildId: a5cfeb484fd14b6120fa26ff364fa2313fd23168)
      #7 0x7f45b9029209 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
      #8 0x7f45b90292bb in __libc_start_main csu/../csu/libc-start.c:389:3
      #9 0x56195ae420c0 in _start (git+0x4d0c0) (BuildId: a5cfeb484fd14b6120fa26ff364fa2313fd23168)

But here's the weird part. The function looks like this:

  void git_config_push_env(const char *spec)
  {
          char *key;
          const char *env_name;
          const char *env_value;

          env_name = strrchr(spec, '=');
          if (!env_name)
                  die(_("invalid config format: %s"), spec);
          key = xmemdupz(spec, env_name - spec);
          env_name++;
          if (!*env_name)
                  die(_("missing environment variable name for configuration '%.*s'"),
                      (int)(env_name - spec - 1), spec);

          env_value = getenv(env_name);
          if (!env_value)
                  die(_("missing environment variable '%s' for configuration '%.*s'"),
                      env_name, (int)(env_name - spec - 1), spec);

          git_config_push_split_parameter(key, env_value);
          free(key);
  }

And it is complaining that we've leaked "key". But as you can see, we
always free it. The problem is that for this particular invocation we
die("missing environment variable name"), so of course we don't make it
to the free(). Normally this is OK, though. The "key" variable is still
on the stack, so the leak-checker should realize that it's still
reachable.

But if you run this in a debugger, you'll find that under -O2 the "key"
variable has been optimized out! So LSan is producing the wrong result
due to the optimization. It doesn't know that "key" is conceptually
still reachable.

Maybe this is a known problem, but it was certainly surprising to me.
I'm not sure if we should do anything about it or not. It doesn't seem
to trigger in CI, even though I don't see us taking any steps there to
use -O0 or similar. So we can perhaps ignore it for now, and this
message can serve as a warning. But if we think LSan isn't reliable
under higher optimizations, we could perhaps teach the Makefile to
prefer -O0 when it sees SANITIZE=leak.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] tests: opt "git --config-env" test out of SANITIZE=leak
  2022-09-08  5:29 LSan curiosity in t1300 Jeff King
@ 2022-09-08 15:42 ` Ævar Arnfjörð Bjarmason
  2022-09-08 18:40   ` Jeff King
  2022-09-08 18:03 ` LSan curiosity in t1300 Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-08 15:42 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

In 676063016ad (leak tests: mark some config tests as passing with
SANITIZE=leak, 2021-10-31) t1300*.sh as a whole was marked as passing
with SANITIZE=leak, but as reported[1] certain compiler options and
optimization levels will re-arrange the generated code to have the
sanitizer fire.

This issue is reproducible with e.g. clang v12 with CFLAGS=-O2. The
issue is that -fsanitize=leak will combine with other optimization
options, and as we eliminate functions, variables etc. the
-fsanitize=leak criteria for considering something a leak might
effectively change.

This is arguably not a compiler issue or bug, as the very notion of a
variable being "in scope" as far as the leak checker is concerned is
fuzzy at best in the face of compiler optimizations.

A similar issue came up before related to the config.c code in
[2]. There the consensus seemed to be to avoid hacks in the C code to
work around these compiler edge cases. In this case however skipping
one test is easy enough. We can deal with these "!SANITIZE_LEAK"
issues later, when we have fewer overall leaks.

1. https://lore.kernel.org/git/Yxl91jfycCo7O7Pp@coredump.intra.peff.net/
2. https://lore.kernel.org/git/patch-v3-5.6-9a44204c4c9-20211022T175227Z-avarab@gmail.com/
---

On Thu, Sep 08 2022, Jeff King wrote:

> If I run:
>
>   make SANITIZE=leak GIT_TEST_PASSING_SANITIZE_LEAK=true CFLAGS=-O0
>
> locally, then all tests pass or are skipped. But if switch to -O2, then
> t1300 starts failing!
>
> The command that supposedly leaks is this:
>
>   git --config-env=foo.flag= config --bool foo.flag
>
> and the backtrace looks like:
>
>   Direct leak of 9 byte(s) in 1 object(s) allocated from:
>       #0 0x56195ae6dc62 in __interceptor_malloc (git+0x78c62) (BuildId: a5cfeb484fd14b6120fa26ff364fa2313fd23168)
>       #1 0x56195b0ad6b6 in do_xmalloc wrapper.c
>       #2 0x56195b0ad7bd in xmemdupz (git+0x2b87bd) (BuildId: a5cfeb484fd14b6120fa26ff364fa2313fd23168)
>       #3 0x56195af6f8d0 in git_config_push_env (git+0x17a8d0) (BuildId: a5cfeb484fd14b6120fa26ff364fa2313fd23168)
>       #4 0x56195ae71689 in handle_options git.c
>       #5 0x56195ae7060f in cmd_main (git+0x7b60f) (BuildId: a5cfeb484fd14b6120fa26ff364fa2313fd23168)
>       #6 0x56195af370c6 in main (git+0x1420c6) (BuildId: a5cfeb484fd14b6120fa26ff364fa2313fd23168)
>       #7 0x7f45b9029209 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
>       #8 0x7f45b90292bb in __libc_start_main csu/../csu/libc-start.c:389:3
>       #9 0x56195ae420c0 in _start (git+0x4d0c0) (BuildId: a5cfeb484fd14b6120fa26ff364fa2313fd23168)
>
> But here's the weird part. The function looks like this:
>
>   void git_config_push_env(const char *spec)
>   {
>           char *key;
>           const char *env_name;
>           const char *env_value;
>   
>           env_name = strrchr(spec, '=');
>           if (!env_name)
>                   die(_("invalid config format: %s"), spec);
>           key = xmemdupz(spec, env_name - spec);
>           env_name++;
>           if (!*env_name)
>                   die(_("missing environment variable name for configuration '%.*s'"),
>                       (int)(env_name - spec - 1), spec);
>   
>           env_value = getenv(env_name);
>           if (!env_value)
>                   die(_("missing environment variable '%s' for configuration '%.*s'"),
>                       env_name, (int)(env_name - spec - 1), spec);
>   
>           git_config_push_split_parameter(key, env_value);
>           free(key);
>   }
>
> And it is complaining that we've leaked "key". But as you can see, we
> always free it. The problem is that for this particular invocation we
> die("missing environment variable name"), so of course we don't make it
> to the free(). Normally this is OK, though. The "key" variable is still
> on the stack, so the leak-checker should realize that it's still
> reachable.

[Re-arranged your E-Mail a bit for the purposes of the flow of this
reply]

> Maybe this is a known problem, but it was certainly surprising to me.

It is surprising, but it's a (at least to me) known "issue". See
e.g. https://lore.kernel.org/git/patch-v3-5.6-9a44204c4c9-20211022T175227Z-avarab@gmail.com/
for a previous discussion between us. I took the consensus there to be
that we shouldn't bend over backwards to worry about this.

I have been diligently ensuring that I don't mark tests as leak-free
if they "pass", but then "fail" on higher optimization levels.

But I see this one slipped through because I locally changed my
default compiler from "clang" to "gcc" (again) a while ago, and thus
missed testing clang at higher optimization levels. Usually it's GCC
that's unhappy about this sort of thing.

> But if you run this in a debugger, you'll find that under -O2 the "key"
> variable has been optimized out! So LSan is producing the wrong result
> due to the optimization. It doesn't know that "key" is conceptually
> still reachable.
> [...]

I don't know if we (or the compiler implementors) can call it the
"wrong" result. Just as some warnings only pop up depending on
optimization levels, this behavior also depends on it.

I've seen quite a few of these around die() codepaths. Presumably some
re-arranging/eliminaton of the code around that NORETURN codepath...

> [...]
> I'm not sure if we should do anything about it or not. It doesn't seem
> to trigger in CI, even though I don't see us taking any steps there to
> use -O0 or similar. So we can perhaps ignore it for now, and this
> message can serve as a warning. But if we think LSan isn't reliable
> under higher optimizations, we could perhaps teach the Makefile to
> prefer -O0 when it sees SANITIZE=leak.

I think it's good to not force -O0 with SANITIZE=leak, but this is
annoying, so let's do the below change. With it the tests pass for me
with clang under -O2/-O3.

FWIW if you play with the "check" mode you'll probably also find that
we have some tests that are "flaky" as far as the leak checking is
concerned ("t2012 t6415 t6435", I think, at least I skip those locally
when using it). IIRC some of that also depends on -On.

 t/t1300-config.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index c6661e61af5..a4953881db6 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1398,7 +1398,7 @@ test_expect_success 'git --config-env with missing value' '
 	grep "invalid config format: config" error
 '

-test_expect_success 'git --config-env fails with invalid parameters' '
+test_expect_success !SANITIZE_LEAK 'git --config-env fails with invalid parameters' '
 	test_must_fail git --config-env=foo.flag config --bool foo.flag 2>error &&
 	test_i18ngrep "invalid config format: foo.flag" error &&
 	test_must_fail git --config-env=foo.flag= config --bool foo.flag 2>error &&
-- 
2.37.3.1492.gbb5af15b83b

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: LSan curiosity in t1300
  2022-09-08  5:29 LSan curiosity in t1300 Jeff King
  2022-09-08 15:42 ` [PATCH] tests: opt "git --config-env" test out of SANITIZE=leak Ævar Arnfjörð Bjarmason
@ 2022-09-08 18:03 ` Junio C Hamano
  2022-09-08 18:15   ` Jeff King
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2022-09-08 18:03 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Ævar Arnfjörð Bjarmason

Jeff King <peff@peff.net> writes:

> And it is complaining that we've leaked "key". But as you can see, we
> always free it. The problem is that for this particular invocation we
> die("missing environment variable name"), so of course we don't make it
> to the free(). Normally this is OK, though. The "key" variable is still
> on the stack, so the leak-checker should realize that it's still
> reachable.
>
> But if you run this in a debugger, you'll find that under -O2 the "key"
> variable has been optimized out! So LSan is producing the wrong result
> due to the optimization. It doesn't know that "key" is conceptually
> still reachable.

Yikes.  

Is that related to the cause of the other patch from you, i.e.
wanting to get "registers" but unable to?  If they manage to get the
value of the register that holds 'key', would this disappear, I
wonder?

> Maybe this is a known problem, but it was certainly surprising to me.
> I'm not sure if we should do anything about it or not. It doesn't seem
> to trigger in CI, even though I don't see us taking any steps there to
> use -O0 or similar. So we can perhaps ignore it for now, and this
> message can serve as a warning. But if we think LSan isn't reliable
> under higher optimizations, we could perhaps teach the Makefile to
> prefer -O0 when it sees SANITIZE=leak.

Nice note to leave here, I agree.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: LSan curiosity in t1300
  2022-09-08 18:03 ` LSan curiosity in t1300 Junio C Hamano
@ 2022-09-08 18:15   ` Jeff King
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2022-09-08 18:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ævar Arnfjörð Bjarmason

On Thu, Sep 08, 2022 at 11:03:05AM -0700, Junio C Hamano wrote:

> > But if you run this in a debugger, you'll find that under -O2 the "key"
> > variable has been optimized out! So LSan is producing the wrong result
> > due to the optimization. It doesn't know that "key" is conceptually
> > still reachable.
> 
> Yikes.  
> 
> Is that related to the cause of the other patch from you, i.e.
> wanting to get "registers" but unable to?  If they manage to get the
> value of the register that holds 'key', would this disappear, I
> wonder?

I don't think so. t1300 fails consistently for me with -O2, and it
doesn't have the "register" message (and never came up as one of the
racy losers). I think the "register" thing is mostly just a harmless
warning that doesn't change the output, but simply confuses our "did
lsan produce any logs" check.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tests: opt "git --config-env" test out of SANITIZE=leak
  2022-09-08 15:42 ` [PATCH] tests: opt "git --config-env" test out of SANITIZE=leak Ævar Arnfjörð Bjarmason
@ 2022-09-08 18:40   ` Jeff King
  2022-09-23  8:28     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2022-09-08 18:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano

On Thu, Sep 08, 2022 at 05:42:54PM +0200, Ævar Arnfjörð Bjarmason wrote:

> This is arguably not a compiler issue or bug, as the very notion of a
> variable being "in scope" as far as the leak checker is concerned is
> fuzzy at best in the face of compiler optimizations.
> 
> A similar issue came up before related to the config.c code in
> [2]. There the consensus seemed to be to avoid hacks in the C code to
> work around these compiler edge cases. In this case however skipping
> one test is easy enough. We can deal with these "!SANITIZE_LEAK"
> issues later, when we have fewer overall leaks.

IMHO this is the wrong approach. It is playing whack-a-mole with
individual false positives, when we know the root cause is compiler
optimizations. We should be invoking the compiler in a way that gives us
the most reliable result.

In the long run, when all leaks are plugged, we'd want to ditch the
whole SANITIZE_LEAK system anyway. So we are better off preventing false
positives than trying to gloss over them.

> > Maybe this is a known problem, but it was certainly surprising to me.
> 
> It is surprising, but it's a (at least to me) known "issue". See
> e.g. https://lore.kernel.org/git/patch-v3-5.6-9a44204c4c9-20211022T175227Z-avarab@gmail.com/
> for a previous discussion between us. I took the consensus there to be
> that we shouldn't bend over backwards to worry about this.

Thanks, I'd forgotten about that conversation. I stand by my comments
there that we shouldn't be changing the code. But I really think just
using -O0 for the leak-check is a better solution than trying to label
false positives.

> But I see this one slipped through because I locally changed my
> default compiler from "clang" to "gcc" (again) a while ago, and thus
> missed testing clang at higher optimization levels. Usually it's GCC
> that's unhappy about this sort of thing.

FWIW, I found this using clang-14, because the gcc on my system (12.2.0)
does not seem to find actual leaks. It couldn't see any of the ones in
t1060 that I fixed yesterday.

> > But if you run this in a debugger, you'll find that under -O2 the "key"
> > variable has been optimized out! So LSan is producing the wrong result
> > due to the optimization. It doesn't know that "key" is conceptually
> > still reachable.
> > [...]
> 
> I don't know if we (or the compiler implementors) can call it the
> "wrong" result. Just as some warnings only pop up depending on
> optimization levels, this behavior also depends on it.

I think it's different than the case of warnings. In those cases the
optimizations let the compiler see more of what's going on in the code,
which lets them find a warn-able offense that they wouldn't otherwise
see (e.g., inlining a function lets the compiler see an invariant).
Those are real problems in the code that -O0 simply doesn't catch.

But this is different: there is no problem in the code. It is just that
the optimization of removing the variable does not work well with how
the leak-checker works. There is nothing we can do in the code to fix
it; the C conceptual model of variable scoping is all we have to work
with, and according to that, there is no leak. C's "as if" rule means
that the compiler can change the program as long as the behavior is the
same. And it is, from the perspective of the rest of the program. The
issue is that the leak-checker is itself a sort of undefined behavior;
it expects to poke at random parts of the stack at any moment and find a
consistent state, which is not something a normal C program can do.

> I've seen quite a few of these around die() codepaths. Presumably some
> re-arranging/eliminaton of the code around that NORETURN codepath...

The compiler probably does this kind of re-arranging and elimination all
the time. The reason it comes up around die() is that it means we invoke
the leak-checker mid-function. Even though the "key" variable here was
eliminated, the compiler would need to arrange that the pointer value is
_somewhere_ so that when we hit the end of the function we can call that
free(). But when we exit in the middle, we end up doing the leak-check
in a funny state.

It may be that this is influenced by the NORETURN flag. Presumably "key"
is held in a register, and the call to die(), knowing that it does not
need to return, may choose not to preserve that register. So turning
NORETURN into a noop _might_ help, but only because it's giving the
optimizer less information to work with (and it might not if the
optimizer can figure it out on its own via inline functions, for
example). The root cause is the optimizer, so disabling it is the
most complete fix.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tests: opt "git --config-env" test out of SANITIZE=leak
  2022-09-08 18:40   ` Jeff King
@ 2022-09-23  8:28     ` Ævar Arnfjörð Bjarmason
  2022-09-23 21:01       ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-23  8:28 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano


On Thu, Sep 08 2022, Jeff King wrote:

> On Thu, Sep 08, 2022 at 05:42:54PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> This is arguably not a compiler issue or bug, as the very notion of a
>> variable being "in scope" as far as the leak checker is concerned is
>> fuzzy at best in the face of compiler optimizations.
>> 
>> A similar issue came up before related to the config.c code in
>> [2]. There the consensus seemed to be to avoid hacks in the C code to
>> work around these compiler edge cases. In this case however skipping
>> one test is easy enough. We can deal with these "!SANITIZE_LEAK"
>> issues later, when we have fewer overall leaks.
>
> IMHO this is the wrong approach. It is playing whack-a-mole with
> individual false positives, when we know the root cause is compiler
> optimizations. We should be invoking the compiler in a way that gives us
> the most reliable result.
>
> In the long run, when all leaks are plugged, we'd want to ditch the
> whole SANITIZE_LEAK system anyway. So we are better off preventing false
> positives than trying to gloss over them.

In the long run when all leaks are plugged I'd prefer to be able to
compile a git with CFLAGS=-O3 and -fsanitize=leak, and have it run "git
config" without erroring out about a leak.

So I'd really prefer to keep this patch as-is. I'd agree with you if the
"whack-a-mole" was painful, but in this case it's really not. I think
it's just this one edge-case in the whole codebase (out of the things
marked leak-free).

That's inching us closer to what I think should be the end goal,
i.e. this SANITIZE=leak is just a transition mechanism, so having it
enforce -O0 would mean that we can't have -fsanitize=leak Just Work in
the future (both for the tests & when manually compiled).

>> I don't know if we (or the compiler implementors) can call it the
>> "wrong" result. Just as some warnings only pop up depending on
>> optimization levels, this behavior also depends on it.
>
> I think it's different than the case of warnings. In those cases the
> optimizations let the compiler see more of what's going on in the code,
> which lets them find a warn-able offense that they wouldn't otherwise
> see (e.g., inlining a function lets the compiler see an invariant).
> Those are real problems in the code that -O0 simply doesn't catch.
>
> But this is different: there is no problem in the code. It is just that
> the optimization of removing the variable does not work well with how
> the leak-checker works. There is nothing we can do in the code to fix
> it; the C conceptual model of variable scoping is all we have to work
> with, and according to that, there is no leak. C's "as if" rule means
> that the compiler can change the program as long as the behavior is the
> same. And it is, from the perspective of the rest of the program. The
> issue is that the leak-checker is itself a sort of undefined behavior;
> it expects to poke at random parts of the stack at any moment and find a
> consistent state, which is not something a normal C program can do.

This is just from vague recollection, and I couldn't find an example now
with some light digging (which was just seeing if any of the tests with
a "!SANITIZE_LEAK" prereq passed on -O0, but not -O3 with clang), but I
think I've run into cases where higher optimization levels have also
spotted genuine leaks that I've fixed.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tests: opt "git --config-env" test out of SANITIZE=leak
  2022-09-23  8:28     ` Ævar Arnfjörð Bjarmason
@ 2022-09-23 21:01       ` Jeff King
  2022-09-26  8:56         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2022-09-23 21:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano

On Fri, Sep 23, 2022 at 10:28:29AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > In the long run, when all leaks are plugged, we'd want to ditch the
> > whole SANITIZE_LEAK system anyway. So we are better off preventing false
> > positives than trying to gloss over them.
> 
> In the long run when all leaks are plugged I'd prefer to be able to
> compile a git with CFLAGS=-O3 and -fsanitize=leak, and have it run "git
> config" without erroring out about a leak.

Why? Do you want to run a leak-checking copy of Git all the time? If so,
I have bad news for you performance-wise. Running the tests marked as
leak-passing with -O2 (but not -fsanitize=leak) takes ~101s of CPU.
Running with -O0 takes ~111s. Running with -fsanitize=leak takes ~241s.
So the improvement from compiler optimizations is not a big win there,
relatively speaking.

Or are you thinking that -O3 reveals new information we would not find
under other optimization levels? I don't think this is the case. While
that does sometimes find new opportunities for static analysis (via
inlining code, etc), I don't think it helps with run-time analysis. And
as we've seen here, it actively makes things _worse_ by introducing
false positives.

> So I'd really prefer to keep this patch as-is. I'd agree with you if the
> "whack-a-mole" was painful, but in this case it's really not. I think
> it's just this one edge-case in the whole codebase (out of the things
> marked leak-free).

Is it just this one spot? This is already the second one we've discussed
on the list, and I think you indicated there were more spots where you
intentionally held back on setting TEST_PASSES_SANITIZE_LEAK when you
saw hits under higher optimization levels.

It really is a potential problem anywhere we'd call a NORETURN function,
because the compiler (rightly) realizes there is no point in making sure
we can call a later free() that we'll never reach.

> That's inching us closer to what I think should be the end goal,
> i.e. this SANITIZE=leak is just a transition mechanism, so having it
> enforce -O0 would mean that we can't have -fsanitize=leak Just Work in
> the future (both for the tests & when manually compiled).

I think TEST_PASSES_SANITIZE_LEAK is a transition mechanism. But we'll
always want SANITIZE=leak, and I don't think it makes sense for ordinary
builds to use it.

> This is just from vague recollection, and I couldn't find an example now
> with some light digging (which was just seeing if any of the tests with
> a "!SANITIZE_LEAK" prereq passed on -O0, but not -O3 with clang), but I
> think I've run into cases where higher optimization levels have also
> spotted genuine leaks that I've fixed.

If you have a counter-example, I'm all ears, but I have trouble
imagining how -O3 could actually find a leak not present in -O0. Again,
because we're testing runtime behavior, and the optimizations should not
be changing the visible behavior of our program (and the problem is the
leak checker itself is testing "invisible" things that may not actually
be important, like the state of a variable on the stack when we call
exit()).

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] tests: opt "git --config-env" test out of SANITIZE=leak
  2022-09-23 21:01       ` Jeff King
@ 2022-09-26  8:56         ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 8+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-26  8:56 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano

On Fri, Sep 23 2022, Jeff King wrote:

> On Fri, Sep 23, 2022 at 10:28:29AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > In the long run, when all leaks are plugged, we'd want to ditch the
>> > whole SANITIZE_LEAK system anyway. So we are better off preventing false
>> > positives than trying to gloss over them.
>> 
>> In the long run when all leaks are plugged I'd prefer to be able to
>> compile a git with CFLAGS=-O3 and -fsanitize=leak, and have it run "git
>> config" without erroring out about a leak.
>
> Why? Do you want to run a leak-checking copy of Git all the time? 

Not all the time, yes, and I've been getting closer to compiling my
daily driver with it..

> If so,
> I have bad news for you performance-wise. Running the tests marked as
> leak-passing with -O2 (but not -fsanitize=leak) takes ~101s of CPU.
> Running with -O0 takes ~111s. Running with -fsanitize=leak takes ~241s.
> So the improvement from compiler optimizations is not a big win there,
> relatively speaking.

Yeah, I know it sucks, but I use it for interactive use, "git push" and
the like, so I usually don't care if it's ~2x slower. I even run with
SANITIZE=address sometimes.

Wanting to have non-O0 there is less about thinking the higher -On
helps, and more to avoid it being a special snowflake, i.e. if I run
with -O2 -g usually I'd like to just add -fsanitize=leak to that, and
not have to also change the optimization level.

> Or are you thinking that -O3 reveals new information we would not find
> under other optimization levels? I don't think this is the case. While
> that does sometimes find new opportunities for static analysis (via
> inlining code, etc), I don't think it helps with run-time analysis. And
> as we've seen here, it actively makes things _worse_ by introducing
> false positives.

I think this (and to your "conter-example" below) is correct on your
part. I.e. I don't see a good reason for why this would happen.

I have been able to reliably reproduce some leaks as being flaky (and
have avoided adding them to the tests). I wonder if that explains it,
i.e. there was another underlying issue, and the optimization level
happened to trigger some race (or whatever was going on there, I haven't
looked in any depth into those either...).

>> So I'd really prefer to keep this patch as-is. I'd agree with you if the
>> "whack-a-mole" was painful, but in this case it's really not. I think
>> it's just this one edge-case in the whole codebase (out of the things
>> marked leak-free).
>
> Is it just this one spot? This is already the second one we've discussed
> on the list, and I think you indicated there were more spots where you
> intentionally held back on setting TEST_PASSES_SANITIZE_LEAK when you
> saw hits under higher optimization levels.

Probably not the only spot, but the only spot under the
TEST_PASSES_SANITIZE_LEAK umbrella.

> It really is a potential problem anywhere we'd call a NORETURN function,
> because the compiler (rightly) realizes there is no point in making sure
> we can call a later free() that we'll never reach.

FWIW I'm not sure it's NORETURN, and haven't had time to dig...

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-09-26  9:08 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-08  5:29 LSan curiosity in t1300 Jeff King
2022-09-08 15:42 ` [PATCH] tests: opt "git --config-env" test out of SANITIZE=leak Ævar Arnfjörð Bjarmason
2022-09-08 18:40   ` Jeff King
2022-09-23  8:28     ` Ævar Arnfjörð Bjarmason
2022-09-23 21:01       ` Jeff King
2022-09-26  8:56         ` Ævar Arnfjörð Bjarmason
2022-09-08 18:03 ` LSan curiosity in t1300 Junio C Hamano
2022-09-08 18:15   ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).