git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] die routine: change recursion limit from 1 to 1024
@ 2017-06-19 22:00 Ævar Arnfjörð Bjarmason
  2017-06-19 22:08 ` Stefan Beller
  2017-06-20 15:54 ` [PATCH] die routine: change recursion limit from 1 to 1024 Jeff King
  0 siblings, 2 replies; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-19 22:00 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Brandon Casey, Jeff King,
	Ævar Arnfjörð Bjarmason

Change the recursion limit for the default die routine from a *very*
low 1 to 1024. This ensures that infinite recursions are broken, but
doesn't lose error messages.

The intent of the existing code, as explained in commit
cd163d4b4e ("usage.c: detect recursion in die routines and bail out
immediately", 2012-11-14), is to break infinite recursion in cases
where the die routine itself dies.

However, doing that very aggressively by immediately printing out
"recursion detected in die handler" if we've already called die() once
means that threaded invocations of git can go through the following
flow:

 1. Start a bunch of threads

 2. The threads start invoking die(), pretty much at the same time.

 3. The first die() invocation will be in the middle of trying to
    print out its message by the time another thread dies; that other
    thread then runs into the recursion limit and dies with "recursion
    detected in die handler".

 4. Due to a race condition the initial error may never get printed
    before the "recursion detected" thread calls exit() and aborts the
    program.
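A minimal C sketch of why step 3 goes wrong, assuming nothing beyond the pre-patch `return dying++;` check shown in the diff: a single static counter cannot tell a second thread's die() apart from a die() inside the die routine. The helper names are hypothetical, and the two calls are serialized so the result is deterministic.

```c
/*
 * Hypothetical stand-in for the pre-patch check in usage.c: one static
 * counter that reports "recursing" on every call after the first,
 * whether that call came from inside the die routine or from another
 * thread that is dying independently.
 */
static int one_shot_is_recursing(void)
{
	static int dying;

	return dying++;
}

/*
 * Two independent die() calls and no recursion at all. In the real race
 * they interleave, but the counter still sees two increments.
 */
static int second_caller_is_misflagged(void)
{
	int first = one_shot_is_recursing();  /* 0: allowed to print */
	int second = one_shot_is_recursing(); /* 1: "recursion detected" */

	return first == 0 && second != 0;
}
```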

An example of this is running a threaded grep against e.g. linux.git:

    git grep -P --threads=4 '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$'

With the current version of git this will print some combination of
the PCRE failures that caused the abort and "recursion detected"
errors. Some invocations print out multiple "recursion detected"
errors with no PCRE error at all!

Now, git-grep could make use of the pluggable error facility added in
commit c19a490e37 ("usage: allow pluggable die-recursion checks",
2013-04-16).

That should be done for git-grep in particular, because both before
and after this change it may print out the exact same error from each
of the N threads it starts, and those errors should be de-duplicated.

But let's start by improving the default behavior shared across all of
git. Right now the common case is not an infinite recursion in the
handler, but us losing error messages by default because we're overly
paranoid about our recursion check.

So let's just set the recursion limit to a number higher than the
number of threads we're ever likely to spawn. Now we won't lose
errors, and if we have a recursing die handler we'll still die within
microseconds.
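A small sketch of the patched check, with the counter and limit passed in (a hypothetical refactoring for testability) so its threshold can be exercised directly. It mirrors `return dying++ > recursion_limit;` from the diff below.

```c
/*
 * Mirrors "return dying++ > recursion_limit;". The comparison sees the
 * value from before the increment.
 */
static int is_recursing(int *dying, int limit)
{
	return (*dying)++ > limit;
}

/* How many calls get through before the check first reports recursion. */
static int calls_allowed(int limit)
{
	int dying = 0;
	int n = 0;

	while (!is_recursing(&dying, limit))
		n++;
	return n;
}
```

With a limit of 0 this reduces to the old `return dying++;` (one call allowed). With 1024 it allows 1025 calls, because `>` compares the pre-increment value; `>=` would make the limit exact, though either is fine for breaking an infinite recursion.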

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 usage.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/usage.c b/usage.c
index 2f87ca69a8..1c198d4882 100644
--- a/usage.c
+++ b/usage.c
@@ -44,7 +44,9 @@ static void warn_builtin(const char *warn, va_list params)
 static int die_is_recursing_builtin(void)
 {
 	static int dying;
-	return dying++;
+	static int recursion_limit = 1024;
+
+	return dying++ > recursion_limit;
 }
 
 /* If we are in a dlopen()ed .so write to a global variable would segfault
-- 
2.13.1.518.g3df882009


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] die routine: change recursion limit from 1 to 1024
  2017-06-19 22:00 [PATCH] die routine: change recursion limit from 1 to 1024 Ævar Arnfjörð Bjarmason
@ 2017-06-19 22:08 ` Stefan Beller
  2017-06-19 22:32   ` Ævar Arnfjörð Bjarmason
  2017-06-20 15:54 ` [PATCH] die routine: change recursion limit from 1 to 1024 Jeff King
  1 sibling, 1 reply; 17+ messages in thread
From: Stefan Beller @ 2017-06-19 22:08 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git@vger.kernel.org, Junio C Hamano, Brandon Casey, Jeff King

> Now, git-grep could make use of the pluggable error facility added in
> commit c19a490e37 ("usage: allow pluggable die-recursion checks",
> 2013-04-16).

I think we should do that instead (though I have not looked at the downsides
of this), because...
>
> So let's just set the recursion limit to a number higher than the
> number of threads we're ever likely to spawn. Now we won't lose
> errors, and if we have a recursing die handler we'll still die within
> microseconds.

how are we handling access to that global variable?
Do we need to hold a mutex to be correct? or rather hope that
it works across threads, not counting on it, because each thread
individually would count up to 1024?

I would prefer if we kept the number as low as "at most
one screen of lines".

> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  usage.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/usage.c b/usage.c
> index 2f87ca69a8..1c198d4882 100644
> --- a/usage.c
> +++ b/usage.c
> @@ -44,7 +44,9 @@ static void warn_builtin(const char *warn, va_list params)
>  static int die_is_recursing_builtin(void)
>  {
>         static int dying;
> -       return dying++;
> +       static int recursion_limit = 1024;
> +
> +       return dying++ > recursion_limit;
>  }
>
>  /* If we are in a dlopen()ed .so write to a global variable would segfault
> --
> 2.13.1.518.g3df882009
>


* Re: [PATCH] die routine: change recursion limit from 1 to 1024
  2017-06-19 22:08 ` Stefan Beller
@ 2017-06-19 22:32   ` Ævar Arnfjörð Bjarmason
  2017-06-19 22:38     ` Stefan Beller
  2017-06-21 20:47     ` [PATCH v2] die(): stop hiding errors due to overzealous recursion guard Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-19 22:32 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git@vger.kernel.org, Junio C Hamano, Brandon Casey, Jeff King


On Mon, Jun 19 2017, Stefan Beller jotted:

>> Now, git-grep could make use of the pluggable error facility added in
>> commit c19a490e37 ("usage: allow pluggable die-recursion checks",
>> 2013-04-16).
>
> I think we should do that instead (though I have not looked at the downsides
> of this), because...

It makes sense to do that in addition to what I'm doing here (or if the
approach I'm taking doesn't make sense, some other patch to fix the
general issue in the default handler).

I'm going to try to get around to fixing the grep behavior in a
follow-up patch; this one fixes the overzealous recursion detection
in the default handler, which needlessly causes other issues.

>>
>> So let's just set the recursion limit to a number higher than the
>> number of threads we're ever likely to spawn. Now we won't lose
>> errors, and if we have a recursing die handler we'll still die within
>> microseconds.
>
> how are we handling access to that global variable?
> Do we need to hold a mutex to be correct? or rather hope that
> it works across threads, not counting on it, because each thread
> individually would count up to 1024?

It's not guarded by a mutex, and both the ++ here and the reads from
it are racy.

However, that's fine for its stated purpose: even if racy increments
lose some updates, others will get through, which is good enough for
detecting infinite recursion. We don't really care whether we die at
exactly 1 or exactly 1024.
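If exact counting ever did matter, a mutex is not the only option. A minimal sketch, assuming a C11 compiler with `<stdatomic.h>` (which git did not require at the time); the names are hypothetical. Note that a shared atomic counter still lumps every thread's die() together, so it fixes lost updates but not the recursion-versus-race ambiguity.

```c
#include <stdatomic.h>

/*
 * Hypothetical race-free version of the counter. atomic_fetch_add()
 * returns the value from before the addition, so this keeps the
 * "dying++ > limit" semantics without losing concurrent updates.
 */
static atomic_int dying_counter;

static int is_recursing_atomic(int limit)
{
	return atomic_fetch_add(&dying_counter, 1) > limit;
}
```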

> I would prefer if we kept the number as low as "at most
> one screen of lines".

In practice this is the case in git, because the programs that would
encounter this spawn fewer than a screenful of threads, assuming (as
is the case) that each thread prints at most one error.

The semantics of that aren't changed with this patch, the difference is
that you're going to get e.g. N repeats of a meaningful error instead of
N repeats of either the meaningful error OR "recursion detected in die
handler", depending on your luck.

I.e. in current git (after a few runs to get an unlucky one):

    $ git grep -P --threads=10 '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$'
    fatal: recursion detected in die handler
    fatal: recursion detected in die handler
    fatal: recursion detected in die handler

Or if you're lucky at least one of these will be the actual error:

    $ git grep -P --threads=10 '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$'
    fatal: recursion detected in die handler
    fatal: pcre_exec failed with error code -8
    fatal: recursion detected in die handler
    fatal: recursion detected in die handler
    fatal: recursion detected in die handler

But with this change:

    $ ~/g/git/git-grep -P --threads=10 '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$'
    fatal: pcre2_jit_match failed with error code -47: match limit exceeded
    fatal: pcre2_jit_match failed with error code -47: match limit exceeded
    fatal: pcre2_jit_match failed with error code -47: match limit exceeded

(The error message is different because I compiled with PCRE v2 locally,
instead of the system PCRE v1, but that doesn't matter for this example)

Over 1000 runs, this is how that breaks down on my machine without
this patch. I've replaced the recursion error with just "R" and the
PCRE error with "P", and sorted the combinations in descending order
of occurrence; lines without a "P" printed only the recursion error:

    $ (for i in {1..1000}; do git grep -P --threads=10 '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$' 2>&1|perl -pe 's/^fatal: r.*/R/; s/^fatal: p.*/P/' | tr '\n' ' '; echo; done)|sort|uniq -c|sort -nr|head -n 10
        247 R R
        136 P R R
        122 P R
        112 R R R
        108 R
         59 R P R
         54 R P
         54 P
         31 P R R R
         21 R R P

There's a long tail of other combinations that I've omitted. As this
shows, in >10% of cases (467 of the 1000 runs in the rows above) we
don't print out any meaningful error at all. But with this change:

    $ (for i in {1..1000}; do ~/g/git/git-grep -P --threads=10 '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$' 2>&1|perl -pe 's/^fatal: r.*/R/; s/^fatal: p.*/P/' | tr '\n' ' '; echo; done)|sort|uniq -c|sort -nr|head -n 10
        377 P P
        358 P P P
        192 P
         63 P P P P
          8 P P P P P
          2 P P P P P P

We will always show a meaningful error, but may of course do so
multiple times. That is a subject for a fix in git-grep in particular;
the point here, again, is to fix the general case for the default
handler.

Something something sorry about the long mail didn't have time to write
a shorter one :)

>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  usage.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/usage.c b/usage.c
>> index 2f87ca69a8..1c198d4882 100644
>> --- a/usage.c
>> +++ b/usage.c
>> @@ -44,7 +44,9 @@ static void warn_builtin(const char *warn, va_list params)
>>  static int die_is_recursing_builtin(void)
>>  {
>>         static int dying;
>> -       return dying++;
>> +       static int recursion_limit = 1024;
>> +
>> +       return dying++ > recursion_limit;
>>  }
>>
>>  /* If we are in a dlopen()ed .so write to a global variable would segfault
>> --
>> 2.13.1.518.g3df882009
>>


* Re: [PATCH] die routine: change recursion limit from 1 to 1024
  2017-06-19 22:32   ` Ævar Arnfjörð Bjarmason
@ 2017-06-19 22:38     ` Stefan Beller
  2017-06-21 20:47     ` [PATCH v2] die(): stop hiding errors due to overzealous recursion guard Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 17+ messages in thread
From: Stefan Beller @ 2017-06-19 22:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git@vger.kernel.org, Junio C Hamano, Brandon Casey, Jeff King

On Mon, Jun 19, 2017 at 3:32 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Mon, Jun 19 2017, Stefan Beller jotted:
>
>>> Now, git-grep could make use of the pluggable error facility added in
>>> commit c19a490e37 ("usage: allow pluggable die-recursion checks",
>>> 2013-04-16).
>>
>> I think we should do that instead (though I have not looked at the downsides
>> of this), because...
>
> It makes sense to do that in addition to what I'm doing here (or if the
> approach I'm taking doesn't make sense, some other patch to fix the
> general issue in the default handler).
>
> I'm going to try to get around to fixing the grep behavior in a
> follow-up patch; this one fixes the overzealous recursion detection
> in the default handler, which needlessly causes other issues.
>
>>>
>>> So let's just set the recursion limit to a number higher than the
>>> number of threads we're ever likely to spawn. Now we won't lose
>>> errors, and if we have a recursing die handler we'll still die within
>>> microseconds.
>>
>> how are we handling access to that global variable?
>> Do we need to hold a mutex to be correct? or rather hope that
>> it works across threads, not counting on it, because each thread
>> individually would count up to 1024?
>
> It's not guarded by a mutex, and both the ++ here and the reads from
> it are racy.
>
> However, that's fine for its stated purpose: even if racy increments
> lose some updates, others will get through, which is good enough for
> detecting infinite recursion. We don't really care whether we die at
> exactly 1 or exactly 1024.
>
>> I would prefer if we kept the number as low as "at most
>> one screen of lines".
>
> In practice this is the case in git, because the programs that would
> encounter this spawn fewer than a screenful of threads, assuming (as
> is the case) that each thread prints at most one error.
>
> The semantics of that aren't changed with this patch, the difference is
> that you're going to get e.g. N repeats of a meaningful error instead of
> N repeats of either the meaningful error OR "recursion detected in die
> handler", depending on your luck.
>
> I.e. in current git (after a few runs to get an unlucky one):
>
>     $ git grep -P --threads=10 '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$'
>     fatal: recursion detected in die handler
>     fatal: recursion detected in die handler
>     fatal: recursion detected in die handler
>
> Or if you're lucky at least one of these will be the actual error:
>
>     $ git grep -P --threads=10 '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$'
>     fatal: recursion detected in die handler
>     fatal: pcre_exec failed with error code -8
>     fatal: recursion detected in die handler
>     fatal: recursion detected in die handler
>     fatal: recursion detected in die handler
>
> But with this change:
>
>     $ ~/g/git/git-grep -P --threads=10 '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$'
>     fatal: pcre2_jit_match failed with error code -47: match limit exceeded
>     fatal: pcre2_jit_match failed with error code -47: match limit exceeded
>     fatal: pcre2_jit_match failed with error code -47: match limit exceeded
>
> (The error message is different because I compiled with PCRE v2 locally,
> instead of the system PCRE v1, but that doesn't matter for this example)
>
> Over 1000 runs, this is how that breaks down on my machine without
> this patch. I've replaced the recursion error with just "R" and the
> PCRE error with "P", and sorted the combinations in descending order
> of occurrence; lines without a "P" printed only the recursion error:
>
>     $ (for i in {1..1000}; do git grep -P --threads=10 '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$' 2>&1|perl -pe 's/^fatal: r.*/R/; s/^fatal: p.*/P/' | tr '\n' ' '; echo; done)|sort|uniq -c|sort -nr|head -n 10
>         247 R R
>         136 P R R
>         122 P R
>         112 R R R
>         108 R
>          59 R P R
>          54 R P
>          54 P
>          31 P R R R
>          21 R R P
>
> There's a long tail of other combinations that I've omitted. As this
> shows, in >10% of cases (467 of the 1000 runs in the rows above) we
> don't print out any meaningful error at all. But with this change:
>
>     $ (for i in {1..1000}; do ~/g/git/git-grep -P --threads=10 '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$' 2>&1|perl -pe 's/^fatal: r.*/R/; s/^fatal: p.*/P/' | tr '\n' ' '; echo; done)|sort|uniq -c|sort -nr|head -n 10
>         377 P P
>         358 P P P
>         192 P
>          63 P P P P
>           8 P P P P P
>           2 P P P P P P
>
> We will always show a meaningful error, but may of course do so
> multiple times. That is a subject for a fix in git-grep in particular;
> the point here, again, is to fix the general case for the default
> handler.
>
> Something something sorry about the long mail didn't have time to write
> a shorter one :)
>

Actually this convinced me (and it would be lovely to have such observations
in the commit message).

Thanks,
Stefan


* Re: [PATCH] die routine: change recursion limit from 1 to 1024
  2017-06-19 22:00 [PATCH] die routine: change recursion limit from 1 to 1024 Ævar Arnfjörð Bjarmason
  2017-06-19 22:08 ` Stefan Beller
@ 2017-06-20 15:54 ` Jeff King
  2017-06-20 16:15   ` Jeff King
  2017-06-20 18:49   ` Ævar Arnfjörð Bjarmason
  1 sibling, 2 replies; 17+ messages in thread
From: Jeff King @ 2017-06-20 15:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Brandon Casey

On Mon, Jun 19, 2017 at 10:00:36PM +0000, Ævar Arnfjörð Bjarmason wrote:

> Change the recursion limit for the default die routine from a *very*
> low 1 to 1024. This ensures that infinite recursions are broken, but
> doesn't lose error messages.
> 
> The intent of the existing code, as explained in commit
> cd163d4b4e ("usage.c: detect recursion in die routines and bail out
> immediately", 2012-11-14), is to break infinite recursion in cases
> where the die routine itself dies.

I agree that was the original intent, but I think it also does something
else. Anytime die() recurses, even a single level, we're going to cover
up the original failure with the one that happened inside die(), which
is almost certainly the less interesting of the two.

E.g., if I

  die_errno("unable to open %s", filename);

and then the die handler calls malloc() and fails, you'd much rather see
that first message than "out of memory".

To be fair, "die handler is recursing" is _also_ not helpful, but at
least it's clear that this is a bug (and IMHO it should be marked with
BUG()). Saying "out of memory" tells you about the second error, but it
doesn't tell you that we've masked the first error. So it may lead to
more confusion in the long run.

I wonder if we can get the best of both, though. Can we make the logic
more like:

  if (!dying) {
	/* ok, normal */
	return 0;
  } else if (dying < 1024) {
	/* only show the warning once */
	if (dying == 1)
		warning("I heard you liked errors, so I put a die() in your die()");
	return 0; /* don't bail yet */
  } else {
	BUG("recursion detected in die handler");
  }
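A testable C rendering of that logic, assuming `dying` is read before it is incremented (the snippet leaves the increment implicit). The enum and function names are hypothetical; real code would call warning() and BUG() rather than return a verdict.

```c
enum die_verdict {
	DIE_NORMAL,   /* first call: proceed quietly */
	DIE_WARN,     /* second call: proceed, but warn once */
	DIE_CONTINUE, /* later calls below the limit: proceed */
	DIE_BUG       /* limit reached: treat as a bug */
};

static enum die_verdict check_dying(int *dying, int limit)
{
	int seen = (*dying)++; /* value before this call */

	if (!seen)
		return DIE_NORMAL;
	if (seen < limit)
		return seen == 1 ? DIE_WARN : DIE_CONTINUE;
	return DIE_BUG;
}
```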

> Now, git-grep could make use of the pluggable error facility added in
> commit c19a490e37 ("usage: allow pluggable die-recursion checks",
> 2013-04-16).

Yeah, I think this is a bug in git-grep and should be fixed, independent
of this commit. You should be able to use as a template the callbacks
added by the child of c19a490e37:

  1ece66bc9 (run-command: use thread-aware die_is_recursing routine,
  2013-04-16)
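For illustration only, one way a thread-aware check can work: give each thread its own counter, so concurrent die() calls in different threads no longer look like recursion, while a die() inside the die routine on the same thread still does. This sketch uses C11 `_Thread_local` and a hypothetical function name; it is not necessarily how 1ece66bc9 implements its routine.

```c
/*
 * Hypothetical per-thread recursion check. Each thread gets its own
 * copy of thread_dying, so only repeated calls on the same thread are
 * reported as recursion.
 */
static _Thread_local int thread_dying;

static int die_is_recursing_per_thread(void)
{
	return thread_dying++ != 0;
}
```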

-Peff


* Re: [PATCH] die routine: change recursion limit from 1 to 1024
  2017-06-20 15:54 ` [PATCH] die routine: change recursion limit from 1 to 1024 Jeff King
@ 2017-06-20 16:15   ` Jeff King
  2017-06-20 18:49   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 17+ messages in thread
From: Jeff King @ 2017-06-20 16:15 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Brandon Casey

On Tue, Jun 20, 2017 at 11:54:59AM -0400, Jeff King wrote:

> > Now, git-grep could make use of the pluggable error facility added in
> > commit c19a490e37 ("usage: allow pluggable die-recursion checks",
> > 2013-04-16).
> 
> Yeah, I think this is a bug in git-grep and should be fixed, independent
> of this commit. You should be able to use as a template the callbacks
> added by the child of c19a490e37:
> 
>   1ece66bc9 (run-command: use thread-aware die_is_recursing routine,
>   2013-04-16)

To clarify, I think anytime we spawn worker threads that might run die()
we probably need to be installing not just a custom recursion handler
but a custom die function.

It's weird to see:

  $ git grep ...
  fatal: some error
  fatal: some error
  fatal: some error

Or even:

  $ git grep ...
  fatal: some error
  some actual results
  more actual results

I'm not sure what the _right_ thing is there, but it probably involves
recording the per-thread errors in individual buffers, waiting for the
master to pthread_join() them all, and then doing something like:

  /* this also covers the case of only having 1 error */
  int all_errors_identical = 1;
  for (i = 1; i < nr_errors; i++) {
	if (strcmp(errors[i], errors[i-1])) {
		all_errors_identical = 0;
		break;
	}
  }

  if (all_errors_identical) {
	/* just show it */
	die("%s", errors[0]);
  } else {
	for (i = 0; i < nr_errors; i++)
		error("%s", errors[i]);
	die("multiple errors encountered");
  }

I don't know if we'd want to actually get into the details of what
"multiple errors" means. It isn't _all_ of the possible errors, because
each thread stopped running when it hit the die(). But it's also not
just one error.

Actually, I guess we could just pick one error and show only that one.
That would most closely match the non-threaded case. And it's way
simpler than what I wrote above.

Hrm. I guess you could even do that without buffering if you allow the
thread to take down the whole process. The only thing you'd need to do
is teach die() to take a mutex so that we don't racily show multiple
errors.

That seems like the best option (I almost just deleted my entire email,
but maybe the thought process in leading there is useful).

-Peff


* Re: [PATCH] die routine: change recursion limit from 1 to 1024
  2017-06-20 15:54 ` [PATCH] die routine: change recursion limit from 1 to 1024 Jeff King
  2017-06-20 16:15   ` Jeff King
@ 2017-06-20 18:49   ` Ævar Arnfjörð Bjarmason
  2017-06-20 19:05     ` Jeff King
  2017-06-21  8:12     ` Simon Ruderich
  1 sibling, 2 replies; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-20 18:49 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Brandon Casey


On Tue, Jun 20 2017, Jeff King jotted:

> On Mon, Jun 19, 2017 at 10:00:36PM +0000, Ævar Arnfjörð Bjarmason wrote:
>
>> Change the recursion limit for the default die routine from a *very*
>> low 1 to 1024. This ensures that infinite recursions are broken, but
>> doesn't lose error messages.
>>
>> The intent of the existing code, as explained in commit
>> cd163d4b4e ("usage.c: detect recursion in die routines and bail out
>> immediately", 2012-11-14), is to break infinite recursion in cases
>> where the die routine itself dies.
>
> I agree that was the original intent, but I think it also does something
> else. Anytime die() recurses, even a single level, we're going to cover
> up the original failure with the one that happened inside die(), which
> is almost certainly the less interesting of the two.
>
> E.g., if I
>
>   die_errno("unable to open %s", filename);
>
> and then the die handler calls malloc() and fails, you'd much rather see
> that first message than "out of memory".
>
> To be fair, "die handler is recursing" is _also_ not helpful, but at
> least it's clear that this is a bug (and IMHO it should be marked with
> BUG()). Saying "out of memory" tells you about the second error, but it
> doesn't tell you that we've masked the first error. So it may lead to
> more confusion in the long run.
>
> I wonder if we can get the best of both, though. Can we make the logic
> more like:
>
>   if (!dying) {
> 	/* ok, normal */
> 	return 0;
>   } else if (dying < 1024) {
> 	/* only show the warning once */
> 	if (dying == 1)
> 		warning("I heard you liked errors, so I put a die() in your die()");
> 	return 0; /* don't bail yet */
>   } else {
> 	BUG("recursion detected in die handler");
>   }

If I understand you correctly this on top:

    diff --git a/usage.c b/usage.c
    index 1c198d4882..f6d5af2bb4 100644
    --- a/usage.c
    +++ b/usage.c
    @@ -46,7 +46,19 @@ static int die_is_recursing_builtin(void)
     	static int dying;
     	static int recursion_limit = 1024;

    -	return dying++ > recursion_limit;
    +	dying++;
    +
    +	if (!dying) {
    +		/* ok, normal */
    +		return 0;
    +	} else if (dying < recursion_limit) {
    +		/* only show the warning once */
    +		if (dying == 1)
    +			warning("die() called many times. Recursion error or racy threaded death!");
    +		return 0; /* don't bail yet */
    +	} else {
    +		return 1;
    +	}
     }

     /* If we are in a dlopen()ed .so write to a global variable would segfault

Over 1000 runs that yields the following. Mostly it works and we emit
the warning, although sometimes we miss it, and we might even emit it
twice or more due to an extra race condition we have now:

    $ (for i in {1..1000}; do ~/g/git/git-grep -P --threads=10 '(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$' 2>&1|perl -pe 's/^fatal: r.*/R/; s/^fatal: p.*/P/; s/^warning.*/W/' | tr '\n' ' '; echo; done)|sort|uniq -c|sort -nr|head -n 20
        245 W P P
        222 W P P P
        212 W P
         47 P W P
         36 W P P P P
         35 P P P
         35 P P
         30 P W P P
         16 P P W
         14 W W P P
         12 P P W P
         11 W W P P P
         11 P
          8 W P W P P
          8 P P P P
          7 W P P P P P
          7 P W P P P
          6 W P W P
          4 P W P W
          3 W W P P W

I think it makes sense to apply that on top. Even though we could
print more than one warning here, it makes sense to alert the user
that we're in the middle of some racy death; it explains the multiple
lines of output they'll probably (but not always!) get.

As you can see the third most common case is that we needlessly print
out the warning, i.e. we have only one error anyway, but we can't
guarantee that, so it probably makes sense to emit it.
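One possible explanation for the `W P` rows, judging only from the fixup as written: `dying++` runs before the `dying == 1` test, so the warning fires on the very first die() even when no second call ever happens (and, for the same reason, the `!dying` branch can never be taken). A minimal sketch of the two orderings, with hypothetical function names:

```c
/* Increment first: the counter is already 1 on the first call. */
static int warns_increment_first(int *dying)
{
	(*dying)++;
	return *dying == 1;     /* true on the very first call */
}

/* Check first: compare the value from before this call. */
static int warns_check_first(int *dying)
{
	return (*dying)++ == 1; /* true on the second call */
}
```

Checking the value before incrementing would limit the warning to cases where die() really was entered a second time, whether by recursion or by a racing thread.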

To reply to your 20170620161514.ygbflanx4pldc7n7@sigill.intra.peff.net
downthread here (where you talk about setting up a custom die handler
for grep): yes, that would make sense, but as long as we're supplying
this default behavior (and not outlawing its use with pthreads), it
makes sense to get out of our own way with this recursion detection.

I think my patch (possibly with the fixup above, depending on what we
think about dupe warnings) is just fine to fix this. This is already a
super-rare edge case in grep, and to the extent that it's a problem
for anyone, it's because our paranoid recursion detector completely
hides the error. I don't think it's worth worrying about printing a
few duplicate error messages under threading for something that
almost never happens.

At least, that's starting to go beyond my bothering to hack on it :)

>> Now, git-grep could make use of the pluggable error facility added in
>> commit c19a490e37 ("usage: allow pluggable die-recursion checks",
>> 2013-04-16).
>
> Yeah, I think this is a bug in git-grep and should be fixed, independent
> of this commit. You should be able to use as a template the callbacks
> added by the child of c19a490e37:
>
>   1ece66bc9 (run-command: use thread-aware die_is_recursing routine,
>   2013-04-16)
>
> -Peff


* Re: [PATCH] die routine: change recursion limit from 1 to 1024
  2017-06-20 18:49   ` Ævar Arnfjörð Bjarmason
@ 2017-06-20 19:05     ` Jeff King
  2017-06-21  8:12     ` Simon Ruderich
  1 sibling, 0 replies; 17+ messages in thread
From: Jeff King @ 2017-06-20 19:05 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Brandon Casey

On Tue, Jun 20, 2017 at 08:49:59PM +0200, Ævar Arnfjörð Bjarmason wrote:

> As you can see the third most common case is that we needlessly print
> out the warning, i.e. we have only one error anyway, but we can't
> guarantee that, so it probably makes sense to emit it.

Right, my suggestion actually doesn't help much with the multi-threaded
case. The warning is pointless but it will still be racily shown,
because we can't tell the difference between races and recursion. So
it's better, but fundamentally still pretty ugly. And we'd want to
actually fix the real problem.

> To reply to your 20170620161514.ygbflanx4pldc7n7@sigill.intra.peff.net
> downthread here (where you talk about setting up a custom die handler
> for grep): yes, that would make sense, but as long as we're supplying
> this default behavior (and not outlawing its use with pthreads), it
> makes sense to get out of our own way with this recursion detection.

I actually was more or less proposing that we consider stock die (no
custom handler) with pthreads to be outlawed. It sometimes kind of does
the right thing, but in a racy way. Probably threads calling die()
should consider their error-handling a bit more carefully.

That said, in my other mail I arrived at "just put a mutex into die()"
which I think would remove the raciness, and give the default behavior
of "the first thread to call die will take down the whole process and
get its single error printed". That actually seems pretty reasonable.
And it makes the "recursion or race?" question go away.

I'm not quite sure how to implement it, though. I think you'd want to
take the lock in die() itself, because it would make the is-recursing
check all the way to the exit an atomic unit. But who is responsible for
unlocking then?  The point of die_routine() is that it never returns.
That's fine if it truly calls exit(), but the threaded die handler in
run-command.c does a pthread_exit(). So _somebody_ has to unlock that so
other threads don't block if they try to die().

What a mess.
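One way to sidestep the unlock question, sketched under assumptions: replace the mutex with a C11 `atomic_flag` that is never cleared. The name `claim_die_report()` is hypothetical, and this is a swapped-in technique rather than something proposed in the thread. It leaves open what the losing threads should do (for example, wait until the winner's exit() tears down the process), and because a recursive die() in the winning thread would also lose, it does not replace the recursion check.

```c
#include <stdatomic.h>

/*
 * Hypothetical "first caller wins" guard. The flag is set once and
 * never cleared, so there is no lock for anyone to release.
 * atomic_flag_test_and_set() returns the previous value, so exactly
 * one caller ever sees false.
 */
static atomic_flag die_claimed = ATOMIC_FLAG_INIT;

/* Returns 1 only for the single caller allowed to print and exit. */
static int claim_die_report(void)
{
	return !atomic_flag_test_and_set(&die_claimed);
}
```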

> I think my patch (possibly with the fixup above, depending on what we
> think about dupe warnings) is just fine to fix this. This is already a
> super-rare edge case in grep, and to the extent that it's a problem
> for anyone, it's because our paranoid recursion detector completely
> hides the error. I don't think it's worth worrying about printing a
> few duplicate error messages under threading for something that
> almost never happens.
> 
> At least, that's starting to go beyond my bothering to hack on it :)

Yes, I think your patch with my suggestion above is a strict improvement
over what's there. You'd see the warning any time you might have seen
"die is recursing", but at least you _also_ get to see the real error.

-Peff


* Re: [PATCH] die routine: change recursion limit from 1 to 1024
  2017-06-20 18:49   ` Ævar Arnfjörð Bjarmason
  2017-06-20 19:05     ` Jeff King
@ 2017-06-21  8:12     ` Simon Ruderich
  2017-06-21 10:10       ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 17+ messages in thread
From: Simon Ruderich @ 2017-06-21  8:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, git, Junio C Hamano, Brandon Casey

On Tue, Jun 20, 2017 at 08:49:59PM +0200, Ævar Arnfjörð Bjarmason wrote:
> If I understand you correctly this on top:
>
>     diff --git a/usage.c b/usage.c
>     index 1c198d4882..f6d5af2bb4 100644
>     --- a/usage.c
>     +++ b/usage.c
>     @@ -46,7 +46,19 @@ static int die_is_recursing_builtin(void)
>      	static int dying;
>      	static int recursion_limit = 1024;
>
>     -	return dying++ > recursion_limit;
>     +	dying++;
>     +
>     +	if (!dying) {

This will never trigger, as dying was incremented two lines
earlier. But I think that case is already covered by the dying <
recursion_limit branch, so we can just omit it.

>     +		/* ok, normal */
>     +		return 0;
>     +	} else if (dying < recursion_limit) {
>     +		/* only show the warning once */
>     +		if (dying == 1)
>     +			warning("die() called many times. Recursion error or racy threaded death!");
>     +		return 0; /* don't bail yet */
>     +	} else {
>     +		return 1;
>     +	}
>      }

Maybe restructure it like this:

    dying++;
    if (dying > recursion_limit)
        return 1;
    if (dying == 1)
        warning("die() called many times. Recursion error or racy threaded death!");
    return 0;
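Simon's restructuring can be tried out on its own. Below is a minimal standalone sketch, not git's usage.c. It makes three assumptions: warning() is a hypothetical stand-in for git's printf-style warning() that only counts and prints, calls_until_bail() is a hypothetical test helper, and the warning goes on the second call (dying == 2, where v2 of the patch ends up putting it) so that a single ordinary die() stays quiet.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for git's warning(): it only counts and prints,
 * so the behavior of the guard can be observed in isolation.
 */
static int warnings_emitted;

static void warning(const char *msg)
{
	warnings_emitted++;
	fprintf(stderr, "warning: %s\n", msg);
}

/* Simon's suggested structure, with the warning on the second call. */
static int die_is_recursing_builtin(void)
{
	static int dying;
	static const int recursion_limit = 1024;

	dying++;
	if (dying > recursion_limit)
		return 1;	/* recursing: bail out */
	if (dying == 2)
		warning("die() called many times. Recursion error or racy threaded death!");
	return 0;		/* keep going and print the real error */
}

/* Calls the guard until it bails; returns the 1-based call that bailed, or 0. */
static int calls_until_bail(int max_calls)
{
	int i;

	assert(max_calls > 0);
	for (i = 1; i <= max_calls; i++)
		if (die_is_recursing_builtin())
			return i;
	return 0;
}
```

In a fresh process, calls_until_bail(2000) returns 1025 and warnings_emitted ends up as 1. The guard lets 1024 calls through and warns exactly once.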

Btw. is there a reason why recursion_limit is a static variable
and not a constant/define?

Regards
Simon
-- 
+ privacy is necessary
+ using gnupg http://gnupg.org
+ public key id: 0x92FEFDB7E44C32F9

* Re: [PATCH] die routine: change recursion limit from 1 to 1024
  2017-06-21  8:12     ` Simon Ruderich
@ 2017-06-21 10:10       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-21 10:10 UTC (permalink / raw)
  To: Simon Ruderich; +Cc: Jeff King, git, Junio C Hamano, Brandon Casey


On Wed, Jun 21 2017, Simon Ruderich jotted:

> On Tue, Jun 20, 2017 at 08:49:59PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> If I understand you correctly this on top:
>>
>>     diff --git a/usage.c b/usage.c
>>     index 1c198d4882..f6d5af2bb4 100644
>>     --- a/usage.c
>>     +++ b/usage.c
>>     @@ -46,7 +46,19 @@ static int die_is_recursing_builtin(void)
>>      	static int dying;
>>      	static int recursion_limit = 1024;
>>
>>     -	return dying++ > recursion_limit;
>>     +	dying++;
>>     +
>>     +	if (!dying) {
>
> This will never trigger as dying was incremented two lines
> before. But I think it's already handled by the dying <
> recursion_limit case so we can just omit it.
>
>>     +		/* ok, normal */
>>     +		return 0;
>>     +	} else if (dying < recursion_limit) {
>>     +		/* only show the warning once */
>>     +		if (dying == 1)
>>     +			warning("die() called many times. Recursion error or racy threaded death!");
>>     +		return 0; /* don't bail yet */
>>     +	} else {
>>     +		return 1;
>>     +	}
>>      }
>
> Maybe restructure it like this:
>
>     dying++;
>     if (dying > recursion_limit)
>         return 1;
>     if (dying == 1)
>         warning("die() called many times. Recursion error or racy threaded death!");
>     return 0;

Thanks, silly mistake. Will fix.

> Btw. is there a reason why recursion_limit is a static variable
> and not a constant/define?

Nope, will make it a const. Thanks.

* [PATCH v2] die(): stop hiding errors due to overzealous recursion guard
  2017-06-19 22:32   ` Ævar Arnfjörð Bjarmason
  2017-06-19 22:38     ` Stefan Beller
@ 2017-06-21 20:47     ` Ævar Arnfjörð Bjarmason
  2017-06-21 21:12       ` Stefan Beller
                         ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-21 20:47 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Stefan Beller, Simon Ruderich,
	Ævar Arnfjörð Bjarmason

Change the recursion limit for the default die routine from a *very*
low 1 to 1024. This ensures that infinite recursions are broken, but
doesn't lose the meaningful error messages under threaded execution
where threads concurrently start to die.

The intent of the existing code, as explained in commit
cd163d4b4e ("usage.c: detect recursion in die routines and bail out
immediately", 2012-11-14), is to break infinite recursion in cases
where the die routine itself calls die(), and would thus infinitely
recurse.

However, doing that very aggressively by immediately printing out
"recursion detected in die handler" if we've already called die() once
means that threaded invocations of git can end up only printing out
the "recursion detected" error, while hiding the meaningful error.

An example of this is running a threaded grep that dies during
execution against pretty much any repo (git.git will do):

    git grep -P --threads=8 '(*LIMIT_MATCH=1)-?-?-?---$'

With the current version of git this will print some combination of
multiple PCRE failures that caused the abort and multiple "recursion
detected" errors; some invocations will print out multiple "recursion
detected" errors with no PCRE error at all!

Before this change, running the above grep command 1000 times against
git.git[1] and taking the top 20 results will on my system yield the
following distribution of actual errors ("E") and recursion
errors ("R"):

    322 E R
    306 E
    116 E R R
     65 R R
     54 R E
     49 E E
     44 R
     15 E R R R
      9 R R R
      7 R E R
      5 R R E
      3 E R R R R
      2 E E R
      1 R R R R
      1 R R R E
      1 R E R R

The exact results are obviously random and system-dependent, but this
shows the race condition in this code. Some small part of the time
we're about to print out the actual error ("E") but another thread's
recursion error beats us to it, and sometimes we print out nothing but
the recursion error.

With this change we get the following, where "W" means the new
warning indicating that we've called die() many times:

    502 E
    160 E W E
    120 E E
     53 E W
     35 E W E E
     34 W E E
     29 W E E E
     16 E E W
     16 E E E
     11 W E E E E
      7 E E W E
      4 W E
      3 W W E E
      2 E W E E E
      1 W W E
      1 W E W E
      1 E W W E E E
      1 E W W E E
      1 E W W E
      1 E W E E W

This still sucks a bit: due to a race condition still present in this
code, we're sometimes going to print out several errors, several
warnings, or two duplicate errors without the warning.

But we will never have a case where we completely hide the actual
error as we do now.

Now, git-grep could make use of the pluggable error facility added in
commit c19a490e37 ("usage: allow pluggable die-recursion checks",
2013-04-16). There's other threaded code that calls set_die_routine()
or set_die_is_recursing_routine().

But this is about fixing the general die() behavior with threading
when we don't have such a custom routine yet. Right now the common
case is not an infinite recursion in the handler, but us losing error
messages by default because we're overly paranoid about our recursion
check.

So let's just set the recursion limit to a number higher than the
number of threads we're ever likely to spawn. Now we won't lose
errors, and if we have a recursing die handler we'll still die within
microseconds.

There are race conditions in this code itself. In particular, the
"dying" variable is not protected by a mutex, so we won't necessarily
die at exactly 1024 calls, or for that matter even be able to
accurately test "dying == 2"; see the cases above where we print out
more than one "W".

But that doesn't really matter. For the recursion guard we just need
to die "soon", not at exactly 1024 calls, and for printing the correct
error and only one warning most of the time in the face of threaded
death, this is good enough and a net improvement on the current code.

1. for i in {1..1000}; do git grep -P --threads=8 '(*LIMIT_MATCH=1)-?-?-?---$' 2>&1|perl -pe 's/^fatal: r.*/R/; s/^fatal: p.*/E/; s/^warning.*/W/' | tr '\n' ' '; echo; done | sort | uniq -c | sort -nr | head -n 20

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

This replaces v1 and takes into account the feedback in this thread
(thanks everyone!).

The commit message is also much improved and includes more of the
rationale I originally gave in my reply to Stefan in 87podz8v6v.fsf@gmail.com

 usage.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/usage.c b/usage.c
index 2f87ca69a8..1ea7df9a20 100644
--- a/usage.c
+++ b/usage.c
@@ -44,7 +44,23 @@ static void warn_builtin(const char *warn, va_list params)
 static int die_is_recursing_builtin(void)
 {
 	static int dying;
-	return dying++;
+	/*
+	 * An arbitrary number x such that "a < x < b", where "a" is
+	 * "maximum number of pthreads we'll ever plausibly spawn" and
+	 * "b" is "something less than Inf", since the point is to
+	 * prevent infinite recursion.
+	 */
+	static const int recursion_limit = 1024;
+
+	dying++;
+	if (dying > recursion_limit) {
+		return 1;
+	} else if (dying == 2) {
+		warning("die() called many times. Recursion error or racy threaded death!");
+		return 0;
+	} else {
+		return 0;
+	}
 }
 
 /* If we are in a dlopen()ed .so write to a global variable would segfault
-- 
2.13.1.611.g7e3b11ae1


* Re: [PATCH v2] die(): stop hiding errors due to overzealous recursion guard
  2017-06-21 20:47     ` [PATCH v2] die(): stop hiding errors due to overzealous recursion guard Ævar Arnfjörð Bjarmason
@ 2017-06-21 21:12       ` Stefan Beller
  2017-06-21 21:21       ` Morten Welinder
  2017-06-21 21:32       ` Junio C Hamano
  2 siblings, 0 replies; 17+ messages in thread
From: Stefan Beller @ 2017-06-21 21:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git@vger.kernel.org, Junio C Hamano, Jeff King, Simon Ruderich

On Wed, Jun 21, 2017 at 1:47 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> Change the recursion limit for the default die routine from a *very*
> low 1 to 1024. This ensures that infinite recursions are broken, but
> doesn't lose the meaningful error messages under threaded execution
> where threads concurrently start to die.
>
> [...]
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---

Reviewed-by-and-found-no-nits: Stefan Beller <sbeller@google.com>
;)

>
> This replaces v1 and takes into account the feedback in this thread
> (thanks everyone!).
>
> The commit message is also much improved and includes more rationale
> originally in my reply to Stefan in 87podz8v6v.fsf@gmail.com

Thanks!
Stefan

* Re: [PATCH v2] die(): stop hiding errors due to overzealous recursion guard
  2017-06-21 20:47     ` [PATCH v2] die(): stop hiding errors due to overzealous recursion guard Ævar Arnfjörð Bjarmason
  2017-06-21 21:12       ` Stefan Beller
@ 2017-06-21 21:21       ` Morten Welinder
  2017-06-21 21:40         ` Ævar Arnfjörð Bjarmason
  2017-06-21 21:32       ` Junio C Hamano
  2 siblings, 1 reply; 17+ messages in thread
From: Morten Welinder @ 2017-06-21 21:21 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: GIT Mailing List, Junio C Hamano, Jeff King, Stefan Beller,
	Simon Ruderich

If threading is the issue, how do you get meaningful results from
reading and updating "dying" with no use of atomic types or locks?
Other than winning the implied race, of course.
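For comparison, here is a minimal sketch of the kind of counter Morten is alluding to. It uses C11 <stdatomic.h> and is not what the patch does. The names die_is_recursing_atomic(), RECURSION_LIMIT and the thread harness are hypothetical. Because atomic_fetch_add() gives every caller a distinct previous value, no increments are lost, the first caller always sees 0, and exactly one caller takes the warning branch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define RECURSION_LIMIT 1024	/* same value as the patch */

static atomic_int dying;
static atomic_int warnings_emitted;
static atomic_int bailed;

/*
 * atomic_fetch_add() returns the value *before* the increment, so each
 * caller observes a distinct count even when threads race: the first
 * caller always sees 0, exactly one caller sees 1 (the second die()),
 * and no increments are lost.
 */
static int die_is_recursing_atomic(void)
{
	int prev = atomic_fetch_add(&dying, 1);

	if (prev >= RECURSION_LIMIT)
		return 1;
	if (prev == 1) {
		atomic_fetch_add(&warnings_emitted, 1);
		fprintf(stderr, "warning: die() called many times\n");
	}
	return 0;
}

/* Thread body: simulate one die() call. */
static void *fake_die(void *arg)
{
	(void)arg;
	if (die_is_recursing_atomic())
		atomic_fetch_add(&bailed, 1);
	return NULL;
}

/* Starts nthreads threads that each "die" once, then joins them all. */
static void run_dying_threads(int nthreads)
{
	pthread_t tids[64];
	int i;

	assert(nthreads > 0 && nthreads <= 64);
	for (i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tids[i], NULL, fake_die, NULL);
		assert(rc == 0);
		(void)rc;
	}
	for (i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
}
```

After run_dying_threads(8), dying is exactly 8, warnings_emitted is 1, and no thread has bailed. The same eight calls through a plain int could lose increments, which is why "dying == 2" is unreliable in the patch as posted.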

M.


* Re: [PATCH v2] die(): stop hiding errors due to overzealous recursion guard
  2017-06-21 20:47     ` [PATCH v2] die(): stop hiding errors due to overzealous recursion guard Ævar Arnfjörð Bjarmason
  2017-06-21 21:12       ` Stefan Beller
  2017-06-21 21:21       ` Morten Welinder
@ 2017-06-21 21:32       ` Junio C Hamano
  2017-06-24 12:36         ` Jeff King
  2 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2017-06-21 21:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Stefan Beller, Simon Ruderich

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> So let's just set the recursion limit to a number higher than the
> number of threads we're ever likely to spawn. Now we won't lose
> errors, and if we have a recursing die handler we'll still die within
> microseconds.
>
> There are race conditions in this code itself. In particular, the
> "dying" variable is not protected by a mutex, so we won't necessarily
> die at exactly 1024 calls, or for that matter even be able to
> accurately test "dying == 2"; see the cases above where we print out
> more than one "W".

One case I'd be worried about would be that the race is so bad that
die-is-recursing-builtin never returns 0 even once.  Everybody will
just say "recursing" and die, without giving any useful information.

Will queue, as it is nevertheless an improvement over the current
code.

Thanks.

* Re: [PATCH v2] die(): stop hiding errors due to overzealous recursion guard
  2017-06-21 21:21       ` Morten Welinder
@ 2017-06-21 21:40         ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-21 21:40 UTC (permalink / raw)
  To: Morten Welinder
  Cc: GIT Mailing List, Junio C Hamano, Jeff King, Stefan Beller,
	Simon Ruderich


On Wed, Jun 21 2017, Morten Welinder jotted:

> If threading is the issue, how do you get meaningful results from
> reading and updating "dying" with no use of atomic types or locks?
> Other than winning the implied race, of course.

Threading isn't the issue. The issue is that we have an overzealous
recursion guard that will demonstrably cause us to lose errors in the
face of threading.

By amending the guard so that in practice we don't run into it early
enough to hide errors (see the empirical results in the commit
message), we solve *that* issue.

Both the current code and the code I'm adding here suffer from race
conditions and non-atomic updates, but for the reasons explained at the
bottom of the commit message, that's OK.

We're not relying on being able to do x++ and have x be 1, 2, 3 etc. in
the face of threading, we're just currently relying on it being larger
than 1, or with my patch eventually larger than 1024.

It is possible with my patch that we'll never take the "dying == 2" ->
"warning(..)" branch (and empirical results show that happens), but for
the purposes of the default die handler (which we really should be
overriding if we're doing threading, but sometimes we're lazy) it's
enough that it works most of the time and that we at least don't hide
real errors, which is the issue with it right now.
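Threaded code can override the default handler, as noted above. Below is a minimal sketch of a per-thread recursion check, assuming C11 _Thread_local support. It has the same int (*)(void) shape as die_is_recursing_builtin(), so it is the kind of routine that could plausibly be installed with set_die_is_recursing_routine(). The names die_is_recursing_per_thread() and the thread harness are hypothetical, and the sketch does not link against git.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical per-thread recursion check. Each thread gets its own
 * counter (C11 _Thread_local), so die() calls racing in different
 * threads never count against each other.
 */
static _Thread_local int dying;

static int die_is_recursing_per_thread(void)
{
	/* Nonzero on the second entry within the same thread. */
	return dying++ > 0;
}

/* Thread body: run the check once and store the result in *out. */
static void *check_once(void *out)
{
	*(int *)out = die_is_recursing_per_thread();
	return NULL;
}

/* Runs the check once in each of nthreads threads; returns how many saw recursion. */
static int count_recursing_threads(int nthreads)
{
	pthread_t tids[64];
	int results[64];
	int i, recursing = 0;

	assert(nthreads > 0 && nthreads <= 64);
	for (i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tids[i], NULL, check_once, &results[i]);
		assert(rc == 0);
		(void)rc;
	}
	for (i = 0; i < nthreads; i++) {
		pthread_join(tids[i], NULL);
		recursing += results[i];
	}
	return recursing;
}
```

Each thread counts only its own die() calls, so the strict limit of 1 can stay. Concurrent deaths in different threads never trip it, while a die routine that calls die() in the same thread is still caught on its second entry.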

* Re: [PATCH v2] die(): stop hiding errors due to overzealous recursion guard
  2017-06-21 21:32       ` Junio C Hamano
@ 2017-06-24 12:36         ` Jeff King
  2017-06-24 18:32           ` Junio C Hamano
From: Jeff King @ 2017-06-24 12:36 UTC
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, git, Stefan Beller,
	Simon Ruderich

On Wed, Jun 21, 2017 at 02:32:16PM -0700, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
> 
> > So let's just set the recursion limit to a number higher than the
> > number of threads we're ever likely to spawn. Now we won't lose
> > errors, and if we have a recursing die handler we'll still die within
> > microseconds.
> >
> > There are race conditions in this code itself, in particular the
> > "dying" variable is not thread mutexed, so we e.g. won't be dying at
> > exactly 1024, or for that matter even be able to accurately test
> > "dying == 2", see the cases where we print out more than one "W"
> > above.
> 
> One case I'd be worried about would be that the race is so bad that
> die-is-recursing-builtin never returns 0 even once.  Everybody will
> just say "recursing" and die, without giving any useful information.

I was trying to think how that would happen. If nobody's actually
recursing indefinitely, then the value in theory peaks at the number of
threads (modulo the fact that we're modifying a variable from multiple
threads without any locking; I'm not sure how reasonable it is to assume
in practice that sheared writes may cause us to lose an increment but
not to put nonsense into the variable). If somebody is, then one thread
may increment it to 1024 before another thread gets a chance to say
anything. But in that case, the recursion-die is our expected outcome.
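To make the "recursion-die is our expected outcome" case concrete, here is a minimal single-threaded sketch (not code from usage.c) of a die routine that itself dies. The names die_sim(), buggy_die_routine() and run_recursing_die() are hypothetical; the guard mirrors the v2 checks with the 1024 limit, and die_sim() returns instead of exiting so the counters can be inspected.

```c
#include <assert.h>
#include <stdio.h>

/* Same limit as the v2 patch. */
static const int recursion_limit = 1024;
static int dying;         /* shared counter; fine in this single-threaded demo */
static int handler_calls; /* how many times the buggy handler ran */

/* Mirrors the checks in the v2 die_is_recursing_builtin(). */
static int die_is_recursing(void)
{
	dying++;
	if (dying > recursion_limit)
		return 1;
	if (dying == 2)
		fprintf(stderr, "warning: die() called many times. "
			"Recursion error or racy threaded death!\n");
	return 0;
}

static void die_sim(void);

/* Hypothetical buggy die routine: reporting the error calls die() again. */
static void buggy_die_routine(void)
{
	handler_calls++;
	die_sim();
}

/*
 * Simplified die(). Real git prints "fatal: recursion detected in die
 * handler" and exits; this sketch returns so the counters can be checked.
 */
static void die_sim(void)
{
	if (die_is_recursing()) {
		fputs("fatal: recursion detected in die handler\n", stderr);
		return;
	}
	buggy_die_routine();
}

/* Runs one recursing die() from a clean state; returns how often the handler ran. */
static int run_recursing_die(void)
{
	dying = 0;
	handler_calls = 0;
	die_sim();
	assert(dying == recursion_limit + 1); /* the guard tripped on call 1025 */
	return handler_calls;
}
```

With the 1024 limit the buggy handler runs 1024 times and the 1025th die() bails out, so an infinitely recursing die handler still terminates after a bounded number of stack frames, and the warning is printed once along the way.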

Anyway, it might be reasonable to protect the counter with a mutex.
Like:

diff --git a/usage.c b/usage.c
index fc2b31c54b..34fef0f9fa 100644
--- a/usage.c
+++ b/usage.c
@@ -44,9 +44,19 @@ static void warn_builtin(const char *warn, va_list params)
 	vreportf("warning: ", warn, params);
 }
 
+#ifndef NO_PTHREADS
+static pthread_mutex_t recursion_mutex = PTHREAD_MUTEX_INITIALIZER;
+#define recursion_lock() pthread_mutex_lock(&recursion_mutex)
+#define recursion_unlock() pthread_mutex_unlock(&recursion_mutex)
+#else
+#define recursion_lock()
+#define recursion_unlock()
+#endif
+static int recursion_counter;
+
 static int die_is_recursing_builtin(void)
 {
-	static int dying;
+	int dying;
 	/*
 	 * Just an arbitrary number X where "a < x < b" where "a" is
 	 * "maximum number of pthreads we'll ever plausibly spawn" and
@@ -55,7 +65,10 @@ static int die_is_recursing_builtin(void)
 	 */
 	static const int recursion_limit = 1024;
 
-	dying++;
+	recursion_lock();
+	dying = ++recursion_counter;
+	recursion_unlock();
+
 	if (dying > recursion_limit) {
 		return 1;
 	} else if (dying == 2) {

I can't remember if there are problems on Windows with using constant
mutex initializers, though. If so, I guess common-main would have to
initialize it.
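The effect of the mutex can be sketched as a standalone program, assuming POSIX threads; simulate_threaded_death(), thread_dies() and warnings_printed are hypothetical names for this illustration only. Because the increment and the read of the new value both happen under the lock, every caller gets a distinct value, so exactly one of them observes 2 and prints the warning.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define RECURSION_LIMIT 1024
#define MAX_DEMO_THREADS 64

static pthread_mutex_t recursion_mutex = PTHREAD_MUTEX_INITIALIZER;
static int recursion_counter;
static int warnings_printed; /* demo-only; written by the one thread that sees 2 */

/* The locked variant from the diff above: each caller gets a distinct value. */
static int die_is_recursing(void)
{
	int dying;

	pthread_mutex_lock(&recursion_mutex);
	dying = ++recursion_counter;
	pthread_mutex_unlock(&recursion_mutex);

	if (dying > RECURSION_LIMIT)
		return 1;
	if (dying == 2) {
		fprintf(stderr, "warning: die() called many times. "
			"Recursion error or racy threaded death!\n");
		warnings_printed++;
	}
	return 0;
}

/* Thread body: "die" once and record what the guard said. */
static void *thread_dies(void *arg)
{
	int *result = arg;

	*result = die_is_recursing();
	return NULL;
}

/* Starts n threads that all die at once; returns how many may report their error. */
static int simulate_threaded_death(int n)
{
	pthread_t threads[MAX_DEMO_THREADS];
	int results[MAX_DEMO_THREADS];
	int i, started, allowed = 0;

	assert(n > 0 && n <= MAX_DEMO_THREADS);
	recursion_counter = 0; /* reset before any thread exists, so no lock needed */
	warnings_printed = 0;
	for (started = 0; started < n; started++)
		if (pthread_create(&threads[started], NULL, thread_dies,
				   &results[started]) != 0)
			break;
	for (i = 0; i < started; i++) {
		pthread_join(threads[i], NULL);
		if (results[i] == 0)
			allowed++;
	}
	return allowed;
}
```

Calling simulate_threaded_death(8) should return 8 (every thread gets 0 back and can print its own error) and leave warnings_printed at 1, whatever order the threads run in. On older systems this may need -pthread when linking.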

I left the rest of the logic as-is, but if we switched to post-increment:

  dying = recursion_counter++;

then I think the numbers around "dying" would be more natural (e.g.,
the "dying == 2" check would become "dying == 1", meaning that we
were already dying).
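A single-threaded sketch of that post-increment variant follows; the lock is omitted, and die_is_recursing_postinc(), count_allowed() and warned are hypothetical names. With post-increment, dying is the number of die() calls that came before this one. The limit check is changed to "dying >= recursion_limit" here; that adjustment is an assumption, made so that the 1025th call is still the first one to bail out, as in the v2 patch.

```c
#include <assert.h>

static const int recursion_limit = 1024;
static int recursion_counter; /* would be guarded by recursion_lock() when threaded */
static int warned;            /* demo-only: how many times the warning branch ran */

/* Post-increment variant: dying is the number of die() calls before this one. */
static int die_is_recursing_postinc(void)
{
	int dying = recursion_counter++;

	/* Assumed adjustment: ">=" keeps call 1025 as the first to bail out. */
	if (dying >= recursion_limit)
		return 1;
	if (dying == 1) /* "we were already dying" */
		warned++;
	return 0;
}

/* Calls the guard n times from a clean state; returns how many calls got 0. */
static int count_allowed(int n)
{
	int i, allowed = 0;

	assert(n >= 0);
	recursion_counter = 0;
	warned = 0;
	for (i = 0; i < n; i++)
		if (!die_is_recursing_postinc())
			allowed++;
	return allowed;
}
```

count_allowed(2) returns 2 with one warning, and count_allowed(2000) returns 1024, which matches what the pre-increment v2 code does for the same number of calls.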

To be honest, I'm not sure if it's worth giving it much more time,
though. I'd be fine with Ævar's patch as-is.

-Peff

* Re: [PATCH v2] die(): stop hiding errors due to overzealous recursion guard
  2017-06-24 12:36         ` Jeff King
@ 2017-06-24 18:32           ` Junio C Hamano
From: Junio C Hamano @ 2017-06-24 18:32 UTC
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, git, Stefan Beller,
	Simon Ruderich

Jeff King <peff@peff.net> writes:

>> One case I'd be worried about would be that the race is so bad that
>> die-is-recursing-builtin never returns 0 even once.  Everybody will
>> just say "recursing" and die, without giving any useful information.
>
> I was trying to think how that would happen. If nobody's actually
> recursing indefinitely, then the value in theory peaks at the number of
> threads (modulo the fact that we're modifying a variable from multiple
> threads without any locking; I'm not sure how reasonable it is to assume
> in practice that sheared writes may cause us to lose an increment but
> not to put nonsense into the variable). If somebody is, then one thread
> may increment it to 1024 before another thread gets a chance to say
> anything. But in that case, the recursion-die is our expected outcome.
>
> Anyway, it might be reasonable to protect the counter with a mutex.
> Like:
> ...
> To be honest, I'm not sure if it's worth giving it much more time,
> though. I'd be fine with Ævar's patch as-is.

The scenario I had in mind was three or more threads dying
simultaneously, each incrementing the dying counter before any of
them checks it, so that none of them gets to say "called many times,
error or racy threaded death!" because they all observe three (or more).

But I was misreading the code: in that case, as long as dying is
small enough, we return 0, which gives at least one of the callers
a chance to print the message that came in as "err" from its
invocation of die().
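That worst-case interleaving can be modelled deterministically. This is a sketch; simulate_worst_interleaving() and check_after_increment() are hypothetical helpers that split the v2 function into its increment and its checks. Every thread runs "dying++" before any of them reaches the checks, so each one re-reads the same final value of the shared counter.

```c
#include <assert.h>

static const int recursion_limit = 1024;
static int dying; /* shared counter, as in the v2 patch */

/* The checks from the v2 function, run after every increment has landed. */
static int check_after_increment(int *warned)
{
	if (dying > recursion_limit)
		return 1;
	if (dying == 2)
		(*warned)++;
	return 0;
}

/*
 * Worst case: all n threads increment before any of them checks.
 * Returns how many threads still get 0 back (and so print their own
 * fatal message); *warned counts the "called many times" warnings.
 */
static int simulate_worst_interleaving(int n, int *warned)
{
	int i, printed = 0;

	assert(n >= 0);
	dying = 0;
	*warned = 0;
	for (i = 0; i < n; i++)
		dying++;
	for (i = 0; i < n; i++)
		if (!check_after_increment(warned))
			printed++;
	return printed;
}
```

With three threads every caller sees 3, gets 0 back and prints its own message, and nobody prints the warning. With two, both see 2 and both warn (the multiple "W" case from the commit message). Only when the count exceeds 1024 does every caller bail out.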

So I do not think it is worth worrying about too deeply.

Thanks.

Thread overview: 17+ messages
2017-06-19 22:00 [PATCH] die routine: change recursion limit from 1 to 1024 Ævar Arnfjörð Bjarmason
2017-06-19 22:08 ` Stefan Beller
2017-06-19 22:32   ` Ævar Arnfjörð Bjarmason
2017-06-19 22:38     ` Stefan Beller
2017-06-21 20:47     ` [PATCH v2] die(): stop hiding errors due to overzealous recursion guard Ævar Arnfjörð Bjarmason
2017-06-21 21:12       ` Stefan Beller
2017-06-21 21:21       ` Morten Welinder
2017-06-21 21:40         ` Ævar Arnfjörð Bjarmason
2017-06-21 21:32       ` Junio C Hamano
2017-06-24 12:36         ` Jeff King
2017-06-24 18:32           ` Junio C Hamano
2017-06-20 15:54 ` [PATCH] die routine: change recursion limit from 1 to 1024 Jeff King
2017-06-20 16:15   ` Jeff King
2017-06-20 18:49   ` Ævar Arnfjörð Bjarmason
2017-06-20 19:05     ` Jeff King
2017-06-21  8:12     ` Simon Ruderich
2017-06-21 10:10       ` Ævar Arnfjörð Bjarmason