Is t7006-pager.sh racy?

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* Is t7006-pager.sh racy?
@ 2021-10-24  0:04 Junio C Hamano
  2021-10-24 17:03 ` SZEDER Gábor
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2021-10-24  0:04 UTC (permalink / raw)
  To: git

It seems under --stress it is fairly easy to break the said test,
especially the one near the end

    test_expect_success TTY 'git returns SIGPIPE on propagated signals from pager' '
            test_when_finished "rm pager-used trace.normal" &&
            test_config core.pager ">pager-used; test-tool sigchain" &&
            GIT_TRACE2="$(pwd)/trace.normal" &&
            export GIT_TRACE2 &&
            test_when_finished "unset GIT_TRACE2" &&

            if test_have_prereq !MINGW
            then
                    OUT=$( ((test_terminal git log; echo $? 1>&3) | :) 3>&1 ) &&
                    test_match_signal 13 "$OUT"
            else
                    test_terminal git log
            fi &&

            grep child_exit trace.normal >child-exits &&
            test_line_count = 1 child-exits &&
            grep " code:143 " child-exits &&
            test_path_is_file pager-used
    '

When it fails the code in child-exits is actually -1 not 143, a
propagated pipe.

What makes us expect that the "git log" invocation should trigger a
SIGPIPE in the first place?  test-sigchain emitting a few lines and
exiting without ever reading from its standard input?  Will it
reliably die soon enough before, or linger around until, we attempt
to write to the pager?  If it does not die quickly enough, can we
attempt to write, successfully fill the buffer between us and the
pager, and wait to notice that test-sigchain exiting with non-zero,
or something?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Is t7006-pager.sh racy?
  2021-10-24  0:04 Is t7006-pager.sh racy? Junio C Hamano
@ 2021-10-24 17:03 ` SZEDER Gábor
  2021-10-25 17:41   ` Jeff King
  2021-11-21 18:40   ` Jeff King
  0 siblings, 2 replies; 17+ messages in thread
From: SZEDER Gábor @ 2021-10-24 17:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Ævar Arnfjörð Bjarmason

On Sat, Oct 23, 2021 at 05:04:42PM -0700, Junio C Hamano wrote:
> It seems under --stress it is fairly easy to break the said test,
> especially the one near the end

I couldn't reproduce a failure with --stress, but after a cursory look
into those tests I doubt that either that test or any of the
preceeding SIGPIPE tests added in c24b7f6736 (pager: test for exit
code with and without SIGPIPE, 2021-02-02) actually check what they
are supposed to.

>     test_expect_success TTY 'git returns SIGPIPE on propagated signals from pager' '
>             test_when_finished "rm pager-used trace.normal" &&
>             test_config core.pager ">pager-used; test-tool sigchain" &&
>             GIT_TRACE2="$(pwd)/trace.normal" &&
>             export GIT_TRACE2 &&
>             test_when_finished "unset GIT_TRACE2" &&
> 
>             if test_have_prereq !MINGW
>             then
>                     OUT=$( ((test_terminal git log; echo $? 1>&3) | :) 3>&1 ) &&

This way too convoluted contruct doesn't do what we expect it to do,
see below.

>                     test_match_signal 13 "$OUT"
>             else
>                     test_terminal git log
>             fi &&
> 
>             grep child_exit trace.normal >child-exits &&
>             test_line_count = 1 child-exits &&
>             grep " code:143 " child-exits &&
>             test_path_is_file pager-used
>     '
> 
> When it fails the code in child-exits is actually -1 not 143, a
> propagated pipe.
> 
> What makes us expect that the "git log" invocation should trigger a
> SIGPIPE in the first place?

A misunderstanding, perhaps, because those 'git log' commands with
their early-exiting pagers rarely trigger SIGPIPE.

>  test-sigchain emitting a few lines and
> exiting without ever reading from its standard input?

It's 'test_terminal.perl' emitting whatever it read from 'git log' to
the "| :" pipe, which doesn't ever read from its standard input.

> Will it
> reliably die soon enough before, or linger around until, we attempt
> to write to the pager?  If it does not die quickly enough, can we
> attempt to write, successfully fill the buffer between us and the
> pager, and wait to notice that test-sigchain exiting with non-zero,
> or something?

The history used in those tests is rather small, 'git log |wc -c'
shows that it outputs merely 811 bytes, which is not enough to fill
any kind of buffers.  So I think 'git log' can only get SIGPIPE if it
is still actively producing output after the pager command has already
exited.

The first two tests in the patch below use such an early-exiting pager
that is supposed to trigger a SIGPIPE, but without that unnecessarily
clever construct from the other tests, and demonstrate that 'git log'
usually exits with success; './t7006-pager.sh --no-chain-lint -r
1,2,101,109,110 --stress' usually takes a few repetitions to fail.

The third test below sends 'test_terminal's output to a non-reading
pipe stage, and along with the debug output added to
'test-terminal.perl' demonstrate that it's 'test_terminal' that
reliably gets SIGPIPE, or at least './t7006-pager.sh --no-chain-lint
-r 1,2,101,111 --stress' didn't fail in 1000+ repetitions.

All this means that when those tests run

    OUT=$( ((test_terminal git log; echo $? 1>&3) | :) 3>&1 ) &&
    test_match_signal 13 "$OUT"

then though they intend to match a SIGPIPE in 'git log' supposedly
triggeres by the early-exiting pager, they mistakelly match the
unexpected SIGPIPE in 'test_terminal' triggered by "| :" instead.


  ---  >8  ---

diff --git a/t/t7006-pager.sh b/t/t7006-pager.sh
index 0e7cf75435..e3a2cacbb7 100755
--- a/t/t7006-pager.sh
+++ b/t/t7006-pager.sh
@@ -786,4 +786,35 @@ test_expect_success TTY 'git returns SIGPIPE on propagated signals from pager' '
 	test_path_is_file pager-used
 '
 
+echo "'git log' output size: $(git log |wc -c)"
+
+test_expect_success TTY 'early pager exit rarely triggers SIGPIPE #1' '
+	test_when_finished "rm -f pager-used" &&
+	test_config core.pager ">pager-used; head -n 1; exit 0" &&
+
+	test_terminal git log &&
+	test_path_is_file pager-used
+'
+
+test_expect_success TTY 'early pager exit rarely triggers SIGPIPE #2' '
+	test_when_finished "rm -f pager-used" &&
+	test_config core.pager ">pager-used; head -n 1; exit 0" &&
+
+	echo 0 >expect.EC &&
+	# Needs --no-chain-lint
+	( test_terminal git log ; echo $? >EC ) &&
+	test_cmp expect.EC EC &&
+	test_path_is_file pager-used
+'
+
+test_expect_success TTY '"test_terminal ... | :" triggers SIGPIPE in test_terminal' '
+	test_when_finished "rm -f pager-used" &&
+	test_config core.pager ">pager-used; head -n 1; exit 0" &&
+
+	test_terminal git log 2>err | : &&
+	grep "test-terminal.perl caught SIGPIPE" err &&
+	! grep "test-terminal.perl: waitpid() for child process" err &&
+	test_path_is_file pager-used
+'
+
 test_done
diff --git a/t/test-terminal.perl b/t/test-terminal.perl
index 1bcf01a9a4..064623031c 100755
--- a/t/test-terminal.perl
+++ b/t/test-terminal.perl
@@ -22,11 +22,14 @@ sub start_child {
 	return $pid;
 }
 
+$SIG{PIPE} = sub { die "test-terminal.perl caught SIGPIPE: $!" };
+
 # Wait for $pid to finish.
 sub finish_child {
 	# Simplified from wait_or_whine() in run-command.c.
 	my ($pid) = @_;
 
+	warn "test-terminal: waitpid() for child process";
 	my $waiting = waitpid($pid, 0);
 	if ($waiting < 0) {
 		die "waitpid failed: $!";

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Is t7006-pager.sh racy?
  2021-10-24 17:03 ` SZEDER Gábor
@ 2021-10-25 17:41   ` Jeff King
  2021-10-28 19:55     ` SZEDER Gábor
  2021-11-21 18:40   ` Jeff King
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff King @ 2021-10-25 17:41 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Junio C Hamano, git, Ævar Arnfjörð Bjarmason

On Sun, Oct 24, 2021 at 07:03:49PM +0200, SZEDER Gábor wrote:

> On Sat, Oct 23, 2021 at 05:04:42PM -0700, Junio C Hamano wrote:
> > It seems under --stress it is fairly easy to break the said test,
> > especially the one near the end
> 
> I couldn't reproduce a failure with --stress, but after a cursory look
> into those tests I doubt that either that test or any of the
> preceeding SIGPIPE tests added in c24b7f6736 (pager: test for exit
> code with and without SIGPIPE, 2021-02-02) actually check what they
> are supposed to.

Yeah, I am puzzled that they are using test_terminal in the first place
(as opposed to just "git -p"). And you are right that a raw git-log is
unlikely to be slow enough to get SIGPIPE in most cases.

My usual test for an intentional SIGPIPE is "yes". So something like:

  git -p \
    -c core.pager='exit 0' \
    -c alias.yes='!yes' \
    yes

will reliably trigger SIGPIPE from yes, which git.c will then translate
into an exit code of 141.

If you really want to see SIGPIPE from a builtin (which arguably is the
more interesting case here, though I think it behaves the same with
respect to the pager), it's a bit trickier. One way to do it is with a
command that doesn't generate output until after it gets EOF on stdin.

So something like "git log --stdin" works, but you have to contort
yourself a bit to make it race-free:

-- >8 --
# The I/O setup here is:
#
#         fifo:log-in          stdout
#   shell -----------> git-log ------> pager
#     ^                                 /
#      \-------------------------------/
#                 fifo:pager-closed
#
# The pager closes its stdin, which will give git-log SIGPIPE. But the
# tricky part is that after doing so, it signals via fifo to the shell,
# which then writes to git-log's stdin, triggering it to actually
# generate output (and get SIGPIPE).
#
# You can verify that it's race-free by inserting a "sleep 3" at the
# front of the pager command (before the exec) and seeing that the
# other processes wait (and we still get SIGPIPE).

mkfifo pager-closed
mkfifo log-in
git config core.pager 'exec 0<&-; echo ready >pager-closed; exit 0'
(git -p log --stdin <log-in; echo $? >exit-code) &

# we have to open a descriptor rather than just "echo HEAD >log-in", because
# that will give git-log an immediate EOF on its input when echo closes it, and
# we must wait until the signal from pager-closed. Likewise we cannot wait
# for that signal before the echo, because the subshell is blocking on opening
# log-in until somebody is hooked up to the write end of the pipe.
exec 9>log-in
read ok <pager-closed
echo HEAD >&9
exec 9>&-

# now we can wait for the subshell to finish and retrieve any output
# it produced
wait
cat exit-code
-- >8 --

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Is t7006-pager.sh racy?
  2021-10-25 17:41   ` Jeff King
@ 2021-10-28 19:55     ` SZEDER Gábor
  2021-10-28 22:10       ` Jeff King
  0 siblings, 1 reply; 17+ messages in thread
From: SZEDER Gábor @ 2021-10-28 19:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Ævar Arnfjörð Bjarmason

On Mon, Oct 25, 2021 at 01:41:18PM -0400, Jeff King wrote:
> On Sun, Oct 24, 2021 at 07:03:49PM +0200, SZEDER Gábor wrote:
> 
> > On Sat, Oct 23, 2021 at 05:04:42PM -0700, Junio C Hamano wrote:
> > > It seems under --stress it is fairly easy to break the said test,
> > > especially the one near the end
> > 
> > I couldn't reproduce a failure with --stress, but after a cursory look
> > into those tests I doubt that either that test or any of the
> > preceeding SIGPIPE tests added in c24b7f6736 (pager: test for exit
> > code with and without SIGPIPE, 2021-02-02) actually check what they
> > are supposed to.
> 
> Yeah, I am puzzled that they are using test_terminal in the first place
> (as opposed to just "git -p"). And you are right that a raw git-log is
> unlikely to be slow enough to get SIGPIPE in most cases.
> 
> My usual test for an intentional SIGPIPE is "yes". So something like:
> 
>   git -p \
>     -c core.pager='exit 0' \
>     -c alias.yes='!yes' \
>     yes
> 
> will reliably trigger SIGPIPE from yes, which git.c will then translate
> into an exit code of 141.

Oh, that's clever.  Alas it's not applicable to our tests, because
'yes' is not portable; 8648732e29 (t/test-lib.sh: provide a shell
implementation of the 'yes' utility, 2009-08-28).


> If you really want to see SIGPIPE from a builtin (which arguably is the
> more interesting case here, though I think it behaves the same with
> respect to the pager), it's a bit trickier. One way to do it is with a
> command that doesn't generate output until after it gets EOF on stdin.
> 
> So something like "git log --stdin" works, but you have to contort
> yourself a bit to make it race-free:
> 
> -- >8 --
> # The I/O setup here is:
> #
> #         fifo:log-in          stdout
> #   shell -----------> git-log ------> pager
> #     ^                                 /
> #      \-------------------------------/
> #                 fifo:pager-closed
> #
> # The pager closes its stdin, which will give git-log SIGPIPE. But the
> # tricky part is that after doing so, it signals via fifo to the shell,
> # which then writes to git-log's stdin, triggering it to actually
> # generate output (and get SIGPIPE).
> #
> # You can verify that it's race-free by inserting a "sleep 3" at the
> # front of the pager command (before the exec) and seeing that the
> # other processes wait (and we still get SIGPIPE).
> 
> mkfifo pager-closed
> mkfifo log-in
> git config core.pager 'exec 0<&-; echo ready >pager-closed; exit 0'
> (git -p log --stdin <log-in; echo $? >exit-code) &
> 
> # we have to open a descriptor rather than just "echo HEAD >log-in", because
> # that will give git-log an immediate EOF on its input when echo closes it, and
> # we must wait until the signal from pager-closed. Likewise we cannot wait
> # for that signal before the echo, because the subshell is blocking on opening
> # log-in until somebody is hooked up to the write end of the pipe.
> exec 9>log-in
> read ok <pager-closed
> echo HEAD >&9
> exec 9>&-
> 
> # now we can wait for the subshell to finish and retrieve any output
> # it produced
> wait
> cat exit-code
> -- >8 --

Ugh.  I think this would work reliably, but...  ugh :)

I wonder whether we could do this as a new pair of 'test-tool'
helpers, one to run the pager through the usual pager-invoking
machinery and to generate a lot of output, the other to be used as the
early-exiting pager, with a pipe between the two to ensure that the
SIGPIPE does happen.  Well, essentially the same that you outlined
above but in C instead of shell, which I somehow find less "ugh".


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Is t7006-pager.sh racy?
  2021-10-28 19:55     ` SZEDER Gábor
@ 2021-10-28 22:10       ` Jeff King
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2021-10-28 22:10 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Junio C Hamano, git, Ævar Arnfjörð Bjarmason

On Thu, Oct 28, 2021 at 09:55:27PM +0200, SZEDER Gábor wrote:

> > 
> > My usual test for an intentional SIGPIPE is "yes". So something like:
> > 
> >   git -p \
> >     -c core.pager='exit 0' \
> >     -c alias.yes='!yes' \
> >     yes
> > 
> > will reliably trigger SIGPIPE from yes, which git.c will then translate
> > into an exit code of 141.
> 
> Oh, that's clever.  Alas it's not applicable to our tests, because
> 'yes' is not portable; 8648732e29 (t/test-lib.sh: provide a shell
> implementation of the 'yes' utility, 2009-08-28).

True, though it would be easy enough to do:

  git -c alias.yes='!while true; do echo y; done' yes

> > So something like "git log --stdin" works, but you have to contort
> > yourself a bit to make it race-free:
> [...]
> Ugh.  I think this would work reliably, but...  ugh :)

Yes. :)

> I wonder whether we could do this as a new pair of 'test-tool'
> helpers, one to run the pager through the usual pager-invoking
> machinery and to generate a lot of output, the other to be used as the
> early-exiting pager, with a pipe between the two to ensure that the
> SIGPIPE does happen.  Well, essentially the same that you outlined
> above but in C instead of shell, which I somehow find less "ugh".

Maybe, though now we may be diverging from how git.c uses the pager. As
gross as what I wrote is, it is testing the code we want to test, I
think. I am not sure that the alias vs builtin thing is worth caring
about anyway, though, in which case the alias.yes thing above would be
fine.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Is t7006-pager.sh racy?
  2021-10-24 17:03 ` SZEDER Gábor
  2021-10-25 17:41   ` Jeff King
@ 2021-11-21 18:40   ` Jeff King
  2021-11-21 22:05     ` Jeff King
  2021-12-01 14:03     ` Is t7006-pager.sh racy? Ævar Arnfjörð Bjarmason
  1 sibling, 2 replies; 17+ messages in thread
From: Jeff King @ 2021-11-21 18:40 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Junio C Hamano, git, Ævar Arnfjörð Bjarmason

On Sun, Oct 24, 2021 at 07:03:49PM +0200, SZEDER Gábor wrote:

> > What makes us expect that the "git log" invocation should trigger a
> > SIGPIPE in the first place?
> 
> A misunderstanding, perhaps, because those 'git log' commands with
> their early-exiting pagers rarely trigger SIGPIPE.

I happened to be looking in this area today[1], and I think it turns out
not to be "rarely", but rather "never" for some of the tests.

The test in question sets the pager to "does-not-exist". But in that
case we will realize immediately via run-command.c that we could not run
the pager, and will not even redirect our stdout to it.

For example, doing this:

  GIT_PAGER=does-not-exist git -c alias.foo='!yes' -p foo

will never get SIGPIPE; it will just write infinitely to the original
stdout, and return success.

Whereas this:

  GIT_PAGER=false git -c alias.foo='!yes' -p foo

will reliably get SIGPIPE. But even if we used it (with a while loop to
instead of "yes" address the portability concern), the tests in t7006
would still be wrong, because they are sending test-terminal's output to
a closed pipe (so we'd still see SIGPIPE regardless of Git's behavior).
They should be sending test_terminal's output to a file or /dev/null.

It seems like this thread stalled. Ævar, were you planning to fix these
tests?

It's not too hard to swap out a "yes" as I showed above, but it further
confuses the trace2 output, because now we have the child yes (or its
shell equivalent) exiting as well.

The non-child "log --stdin" example I gave earlier in the thread avoids
that, but the fifo hackery is so horrible that I'd just as soon avoid
it. I guess yet another option is a builtin which produces infinite
output. Perhaps:

  oid=$(git rev-parse HEAD)
  while true; do echo $oid; done |
  test_terminal git -p cat-file --batch-check

That's guaranteed to get SIGPIPE eventually if the pager stops reading.
I seem to recall that test_terminal's handling of stdin is somewhat
broken, though, and would probably get in our way[2]. Possibly we could
just rip it out, as nobody is relying on it (they can't be, because it's
broken).

-Peff

[1] https://lore.kernel.org/git/YZqSBlvzz2KgOMnJ@coredump.intra.peff.net/

[2] https://lore.kernel.org/git/20190520125016.GA13474@sigill.intra.peff.net/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Is t7006-pager.sh racy?
  2021-11-21 18:40   ` Jeff King
@ 2021-11-21 22:05     ` Jeff King
  2021-11-21 22:54       ` [PATCH] t7006: clean up SIGPIPE handling in trace2 tests Jeff King
  2021-12-01 14:03     ` Is t7006-pager.sh racy? Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff King @ 2021-11-21 22:05 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Junio C Hamano, git, Ævar Arnfjörð Bjarmason

On Sun, Nov 21, 2021 at 01:40:02PM -0500, Jeff King wrote:

> It seems like this thread stalled. Ævar, were you planning to fix these
> tests?
> 
> It's not too hard to swap out a "yes" as I showed above, but it further
> confuses the trace2 output, because now we have the child yes (or its
> shell equivalent) exiting as well.
> [...]

I think it also probably doesn't test what the patch adding these tests
was interested in. Specifically, it's trying to check the case of
reporting the pager exit code while we're in a signal handler. So we
have to get the SIGPIPE in the main git process to trigger that. Which
means it has to be a builtin.

So you really do need a builtin that produces infinite output. Or at the
very least, enough that we expect it to fill any possible OS pipe
buffer. I guess the latter is actually not that hard. A 1MB
commit-message would be plenty (or if you prefer, a big file with "log
-p").

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] t7006: clean up SIGPIPE handling in trace2 tests
  2021-11-21 22:05     ` Jeff King
@ 2021-11-21 22:54       ` Jeff King
  2021-11-21 23:10         ` Jeff King
  2021-11-22  2:17         ` Junio C Hamano
  0 siblings, 2 replies; 17+ messages in thread
From: Jeff King @ 2021-11-21 22:54 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Junio C Hamano, git, Ævar Arnfjörð Bjarmason

On Sun, Nov 21, 2021 at 05:05:12PM -0500, Jeff King wrote:

> On Sun, Nov 21, 2021 at 01:40:02PM -0500, Jeff King wrote:
> 
> > It seems like this thread stalled. Ævar, were you planning to fix these
> > tests?
> > 
> > It's not too hard to swap out a "yes" as I showed above, but it further
> > confuses the trace2 output, because now we have the child yes (or its
> > shell equivalent) exiting as well.
> > [...]
> 
> I think it also probably doesn't test what the patch adding these tests
> was interested in. Specifically, it's trying to check the case of
> reporting the pager exit code while we're in a signal handler. So we
> have to get the SIGPIPE in the main git process to trigger that. Which
> means it has to be a builtin.
> 
> So you really do need a builtin that produces infinite output. Or at the
> very least, enough that we expect it to fill any possible OS pipe
> buffer. I guess the latter is actually not that hard. A 1MB
> commit-message would be plenty (or if you prefer, a big file with "log
> -p").

That's actually pretty easy to do. So I came up with the patch below, in
the interests of wrapping up this thread.

I'm not 100% sure this fixes any possible races, as the race Junio
initially reported seemed to be in the "propagated signals from pager"
test, which I don't think has these flaky-SIGPIPE problems. But I think
it's at least correcting some of the confusion. And we can see if it
happens again (I haven't been able to trigger any failures with --stress
myself).

-- >8 --
Subject: [PATCH] t7006: clean up SIGPIPE handling in trace2 tests

Comit c24b7f6736 (pager: test for exit code with and without SIGPIPE,
2021-02-02) introduced some tests that don't reliably generate SIGPIPE
where we expect it (i.e., when our pager doesn't read all of the output
from git-log).

There are two problems that somewhat cancel each other out.

First is that the output of git-log isn't very large (only around 800
bytes). So even if the pager doesn't read all of our output, it's racy
whether or not we'll actually get a SIGPIPE (we won't if we write all of
the output into the pipe buffer before the pager exits).

But we wrap git-log with test_terminal, which is supposed to propagate
the exit status of git-log. However, it doesn't always do so;
test_terminal will copy to stdout any lines that it got from our fake
pager, and it pipes to an empty command. So most of the time we are
seeing a SIGPIPE from test_terminal itself (though this is likewise
racy).

Let's try to make this more robust in two ways:

  1. We'll put a commit with a huge message at the tip of history. Since
     this is over a megabyte, it should fill the OS pipe buffer
     completely, causing git-log to keep trying to write even after the
     pager has exited.

  2. We'll redirect the output of test_terminal to /dev/null. That means
     it can never get SIGPIPE itself, and will always be giving us the
     exit code from git-log.

These two changes reveal that one of the tests was looking for the wrong
behavior. If we try to start a pager that does not exist (according to
execve()), then the error propagates from start_command() back to the
pager code as an error, and we avoid redirecting git-log's stdout to the
broken pager entirely.  Instead, it goes straight to the original stdout
(test_terminal's pty in this case), and we do not see a SIGPIPE at all.

So the test "git attempts to page to nonexisting pager command, gets
SIGPIPE" is checking the wrong outcome; it should be looking for a
successful exit (and was only confused by test_terminal's SIGPIPE).

There's a related test, "git discards nonexisting pager without
SIGPIPE", which sets the pager to a shell command which will read all
input and _then_ run a non-existing command. But that doesn't trigger
the same execve() behavior. We really do run the shell there and
redirect git-log's stdout to it. And the fact that the shell then exits
127 is not interesting. It is not different at that point than the
earlier test to check for "exit 1". So we can drop that test entirely.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t7006-pager.sh | 42 ++++++++++++++----------------------------
 1 file changed, 14 insertions(+), 28 deletions(-)

diff --git a/t/t7006-pager.sh b/t/t7006-pager.sh
index 0e7cf75435..577e51b47a 100755
--- a/t/t7006-pager.sh
+++ b/t/t7006-pager.sh
@@ -661,6 +661,13 @@ test_expect_success 'setup trace2' '
 	export GIT_TRACE2_BRIEF
 '

+test_expect_success 'setup large log output' '
+	perl -e "
+		print \"this is a long commit message\" x 50000
+	" >commit-msg &&
+	git commit --allow-empty -F commit-msg
+'
+
 test_expect_success TTY 'git returns SIGPIPE on early pager exit' '
 	test_when_finished "rm pager-used trace.normal" &&
 	test_config core.pager ">pager-used; head -n 1; exit 0" &&
@@ -670,7 +677,7 @@ test_expect_success TTY 'git returns SIGPIPE on early pager exit' '

 	if test_have_prereq !MINGW
 	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) | :) 3>&1 ) &&
+		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
 		test_match_signal 13 "$OUT"
 	else
 		test_terminal git log
@@ -691,7 +698,7 @@ test_expect_success TTY 'git returns SIGPIPE on early pager non-zero exit' '

 	if test_have_prereq !MINGW
 	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) | :) 3>&1 ) &&
+		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
 		test_match_signal 13 "$OUT"
 	else
 		test_terminal git log
@@ -712,7 +719,7 @@ test_expect_success TTY 'git discards pager non-zero exit without SIGPIPE' '

 	if test_have_prereq !MINGW
 	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) | :) 3>&1 ) &&
+		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
 		test "$OUT" -eq 0
 	else
 		test_terminal git log
@@ -724,28 +731,7 @@ test_expect_success TTY 'git discards pager non-zero exit without SIGPIPE' '
 	test_path_is_file pager-used
 '

-test_expect_success TTY 'git discards nonexisting pager without SIGPIPE' '
-	test_when_finished "rm pager-used trace.normal" &&
-	test_config core.pager "wc >pager-used; does-not-exist" &&
-	GIT_TRACE2="$(pwd)/trace.normal" &&
-	export GIT_TRACE2 &&
-	test_when_finished "unset GIT_TRACE2" &&
-
-	if test_have_prereq !MINGW
-	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) | :) 3>&1 ) &&
-		test "$OUT" -eq 0
-	else
-		test_terminal git log
-	fi &&
-
-	grep child_exit trace.normal >child-exits &&
-	test_line_count = 1 child-exits &&
-	grep " code:127 " child-exits &&
-	test_path_is_file pager-used
-'
-
-test_expect_success TTY 'git attempts to page to nonexisting pager command, gets SIGPIPE' '
+test_expect_success TTY 'git skips paging nonexisting command' '
 	test_when_finished "rm trace.normal" &&
 	test_config core.pager "does-not-exist" &&
 	GIT_TRACE2="$(pwd)/trace.normal" &&
@@ -754,8 +740,8 @@ test_expect_success TTY 'git attempts to page to nonexisting pager command, gets

 	if test_have_prereq !MINGW
 	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) | :) 3>&1 ) &&
-		test_match_signal 13 "$OUT"
+		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
+		test "$OUT" -eq 0
 	else
 		test_terminal git log
 	fi &&
@@ -774,7 +760,7 @@ test_expect_success TTY 'git returns SIGPIPE on propagated signals from pager' '

 	if test_have_prereq !MINGW
 	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) | :) 3>&1 ) &&
+		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
 		test_match_signal 13 "$OUT"
 	else
 		test_terminal git log
-- 
2.34.0.636.g3d595567a1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] t7006: clean up SIGPIPE handling in trace2 tests
  2021-11-21 22:54       ` [PATCH] t7006: clean up SIGPIPE handling in trace2 tests Jeff King
@ 2021-11-21 23:10         ` Jeff King
  2021-11-22  2:17         ` Junio C Hamano
  1 sibling, 0 replies; 17+ messages in thread
From: Jeff King @ 2021-11-21 23:10 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Junio C Hamano, git, Ævar Arnfjörð Bjarmason

On Sun, Nov 21, 2021 at 05:54:39PM -0500, Jeff King wrote:

> That's actually pretty easy to do. So I came up with the patch below, in
> the interests of wrapping up this thread.
> 
> I'm not 100% sure this fixes any possible races, as the race Junio
> initially reported seemed to be in the "propagated signals from pager"
> test, which I don't think has these flaky-SIGPIPE problems. But I think
> it's at least correcting some of the confusion. And we can see if it
> happens again (I haven't been able to trigger any failures with --stress
> myself).
> 
> -- >8 --
> Subject: [PATCH] t7006: clean up SIGPIPE handling in trace2 tests

This one is optional on top. It's just cleanup, but IMHO makes it easier
to understand what's going on.

-- >8 --
Subject: [PATCH] t7006: simplify exit-code checks for sigpipe tests

Some tests in t7006 check for a SIGPIPE result by recording $? and
comparing it with test_match_signal. Before the previous commit, the
command was on the left-hand side of a pipe, and so we had to do some
subshell trickery to extract it.

But now that this is no longer the case, we can do things much more
simply: just run the command directly, using braces to avoid wrecking
the &&-chain, and then record $?. We could almost use test_expect_code
here, but it doesn't know about test_match_signal.

Likewise, for tests which expect success (i.e., not SIGPIPE), we can
just put them in the &&-chain as usual. That even lets us get rid of the
!MINGW check, since the expectation is the same on both sides.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t7006-pager.sh | 22 +++++-----------------
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/t/t7006-pager.sh b/t/t7006-pager.sh
index 577e51b47a..851961c798 100755
--- a/t/t7006-pager.sh
+++ b/t/t7006-pager.sh
@@ -677,7 +677,7 @@ test_expect_success TTY 'git returns SIGPIPE on early pager exit' '
 
 	if test_have_prereq !MINGW
 	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
+		{ test_terminal git log >/dev/null; OUT=$?; } &&
 		test_match_signal 13 "$OUT"
 	else
 		test_terminal git log
@@ -698,7 +698,7 @@ test_expect_success TTY 'git returns SIGPIPE on early pager non-zero exit' '
 
 	if test_have_prereq !MINGW
 	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
+		{ test_terminal git log >/dev/null; OUT=$?; } &&
 		test_match_signal 13 "$OUT"
 	else
 		test_terminal git log
@@ -717,13 +717,7 @@ test_expect_success TTY 'git discards pager non-zero exit without SIGPIPE' '
 	export GIT_TRACE2 &&
 	test_when_finished "unset GIT_TRACE2" &&
 
-	if test_have_prereq !MINGW
-	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
-		test "$OUT" -eq 0
-	else
-		test_terminal git log
-	fi &&
+	test_terminal git log &&
 
 	grep child_exit trace.normal >child-exits &&
 	test_line_count = 1 child-exits &&
@@ -738,13 +732,7 @@ test_expect_success TTY 'git skips paging nonexisting command' '
 	export GIT_TRACE2 &&
 	test_when_finished "unset GIT_TRACE2" &&
 
-	if test_have_prereq !MINGW
-	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
-		test "$OUT" -eq 0
-	else
-		test_terminal git log
-	fi &&
+	test_terminal git log &&
 
 	grep child_exit trace.normal >child-exits &&
 	test_line_count = 1 child-exits &&
@@ -760,7 +748,7 @@ test_expect_success TTY 'git returns SIGPIPE on propagated signals from pager' '
 
 	if test_have_prereq !MINGW
 	then
-		OUT=$( ((test_terminal git log; echo $? 1>&3) >/dev/null) 3>&1 ) &&
+		{ test_terminal git log >/dev/null; OUT=$?; } &&
 		test_match_signal 13 "$OUT"
 	else
 		test_terminal git log
-- 
2.34.0.636.g3d595567a1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] t7006: clean up SIGPIPE handling in trace2 tests
  2021-11-21 22:54       ` [PATCH] t7006: clean up SIGPIPE handling in trace2 tests Jeff King
  2021-11-21 23:10         ` Jeff King
@ 2021-11-22  2:17         ` Junio C Hamano
  2021-11-22  4:51           ` Jeff King
  1 sibling, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2021-11-22  2:17 UTC (permalink / raw)
  To: Jeff King; +Cc: SZEDER Gábor, git, Ævar Arnfjörð Bjarmason

Jeff King <peff@peff.net> writes:

> I'm not 100% sure this fixes any possible races, as the race Junio
> initially reported seemed to be in the "propagated signals from pager"
> test, which I don't think has these flaky-SIGPIPE problems. But I think
> it's at least correcting some of the confusion. And we can see if it
> happens again (I haven't been able to trigger any failures with --stress
> myself).

Applying this (or this and the follow-up) seems to make t7006, which
used to be flaky, to consistently fail at test "git returns SIGPIPE
on propagated signals from pager" for me ;-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] t7006: clean up SIGPIPE handling in trace2 tests
  2021-11-22  2:17         ` Junio C Hamano
@ 2021-11-22  4:51           ` Jeff King
  2021-11-22  4:54             ` Jeff King
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2021-11-22  4:51 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: SZEDER Gábor, git, Ævar Arnfjörð Bjarmason

On Sun, Nov 21, 2021 at 06:17:04PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I'm not 100% sure this fixes any possible races, as the race Junio
> > initially reported seemed to be in the "propagated signals from pager"
> > test, which I don't think has these flaky-SIGPIPE problems. But I think
> > it's at least correcting some of the confusion. And we can see if it
> > happens again (I haven't been able to trigger any failures with --stress
> > myself).
> 
> Applying this (or this and the follow-up) seems to make t7006, which
> used to be flaky, to consistently fail at test "git returns SIGPIPE
> on propagated signals from pager" for me ;-)

Well, I guess it's good that we made things more consistent. :) It is
curious that you get failures and I don't, though. I wonder what the
difference is.

One curiosity is that the test does this:

	test_config core.pager ">pager-used; test-tool sigchain"

While "test-tool sigchain" will die with SIGTERM, it's the shell itself
which will waitpid() on it. And so in the end, what Git will generally
see is the same as if the shell had done "exit 143".

I wonder if the difference is between our shells. I know from previous
experience that bash will sometimes directly exec the final command in a
"-c" command, as an optimization. I don't get any difference running the
test with dash or bash, but that makes sense; the pager command is run
internally by Git via "sh -c".

Aha, that's it. If I recompile with SHELL_PATH=/bin/bash, then I see a
failure. Likewise, if I change the test like this:

diff --git a/t/t7006-pager.sh b/t/t7006-pager.sh
index 851961c798..a87ef37803 100755
--- a/t/t7006-pager.sh
+++ b/t/t7006-pager.sh
@@ -741,7 +741,7 @@ test_expect_success TTY 'git skips paging nonexisting command' '

 test_expect_success TTY 'git returns SIGPIPE on propagated signals from pager' '
 	test_when_finished "rm pager-used trace.normal" &&
-	test_config core.pager ">pager-used; test-tool sigchain" &&
+	test_config core.pager ">pager-used; exec test-tool sigchain" &&
 	GIT_TRACE2="$(pwd)/trace.normal" &&
 	export GIT_TRACE2 &&
 	test_when_finished "unset GIT_TRACE2" &&

then it fails even with dash. And that is, I think, closer to what the
test was actually trying to cover (since checking a shell's "exit 143"
is really no different than "exit 1", and we checked that earlier).

So why is it failing? It looks like trace2 reports this as code "-1"
rather than 143. I think that is because the fix from be8fc53e36 (pager:
properly log pager exit code when signalled, 2021-02-02) is incomplete.
It sets WEXITSTATUS() if the pager exited, but it doesn't handle signal
death at all. I think it needs:

diff --git a/run-command.c b/run-command.c
index f40df01c77..ef9d1d4236 100644
--- a/run-command.c
+++ b/run-command.c
@@ -555,6 +555,8 @@ static int wait_or_whine(pid_t pid, const char *argv0, int in_signal)
 	if (in_signal) {
 		if (WIFEXITED(status))
 			code = WEXITSTATUS(status);
+		else if (WIFSIGNALED(status))
+			code = 128 + WTERMSIG(status); /* see comment below */
 		return code;
 	}

-Peff

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] t7006: clean up SIGPIPE handling in trace2 tests
  2021-11-22  4:51           ` Jeff King
@ 2021-11-22  4:54             ` Jeff King
  2021-11-22  5:49               ` Junio C Hamano
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2021-11-22  4:54 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: SZEDER Gábor, git, Ævar Arnfjörð Bjarmason

On Sun, Nov 21, 2021 at 11:51:54PM -0500, Jeff King wrote:

> So why is it failing? It looks like trace2 reports this as code "-1"
> rather than 143. I think that is because the fix from be8fc53e36 (pager:
> properly log pager exit code when signalled, 2021-02-02) is incomplete.
> It sets WEXITSTATUS() if the pager exited, but it doesn't handle signal
> death at all. I think it needs:
> 
> diff --git a/run-command.c b/run-command.c
> index f40df01c77..ef9d1d4236 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -555,6 +555,8 @@ static int wait_or_whine(pid_t pid, const char *argv0, int in_signal)
>  	if (in_signal) {
>  		if (WIFEXITED(status))
>  			code = WEXITSTATUS(status);
> +		else if (WIFSIGNALED(status))
> +			code = 128 + WTERMSIG(status); /* see comment below */
>  		return code;
>  	}

I'm tempted to say that this would be clearer if the in_signal code path
just followed the main logic, like this:

diff --git a/run-command.c b/run-command.c
index f40df01c77..1f58c17b6c 100644
--- a/run-command.c
+++ b/run-command.c
@@ -552,20 +552,17 @@ static int wait_or_whine(pid_t pid, const char *argv0, int in_signal)
 
 	while ((waiting = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
 		;	/* nothing */
-	if (in_signal) {
-		if (WIFEXITED(status))
-			code = WEXITSTATUS(status);
-		return code;
-	}
 
 	if (waiting < 0) {
 		failed_errno = errno;
-		error_errno("waitpid for %s failed", argv0);
+		if (!in_signal)
+			error_errno("waitpid for %s failed", argv0);
 	} else if (waiting != pid) {
-		error("waitpid is confused (%s)", argv0);
+		if (!in_signal)
+			error("waitpid is confused (%s)", argv0);
 	} else if (WIFSIGNALED(status)) {
 		code = WTERMSIG(status);
-		if (code != SIGINT && code != SIGQUIT && code != SIGPIPE)
+		if (!in_signal && code != SIGINT && code != SIGQUIT && code != SIGPIPE)
 			error("%s died of signal %d", argv0, code);
 		/*
 		 * This return value is chosen so that code & 0xff
@@ -576,10 +573,12 @@ static int wait_or_whine(pid_t pid, const char *argv0, int in_signal)
 	} else if (WIFEXITED(status)) {
 		code = WEXITSTATUS(status);
 	} else {
-		error("waitpid is confused (%s)", argv0);
+		if (!in_signal)
+			error("waitpid is confused (%s)", argv0);
 	}
 
-	clear_child_for_cleanup(pid);
+	if (!in_signal)
+		clear_child_for_cleanup(pid);
 
 	errno = failed_errno;
 	return code;

That's a lot more tedious "if (!in_signal)" checks, but:

  - we don't have to duplicate any of the actual application logic

  - we'd now cover the extra cases for waitpid failing or returning the
    wrong pid (previously if waitpid() failed we'd still look at status,
    which could contain complete garbage!)

-Peff

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] t7006: clean up SIGPIPE handling in trace2 tests
  2021-11-22  4:54             ` Jeff King
@ 2021-11-22  5:49               ` Junio C Hamano
  2021-11-22  6:05                 ` Junio C Hamano
  2021-11-22 19:11                 ` Jeff King
  0 siblings, 2 replies; 17+ messages in thread
From: Junio C Hamano @ 2021-11-22  5:49 UTC (permalink / raw)
  To: Jeff King; +Cc: SZEDER Gábor, git, Ævar Arnfjörð Bjarmason

Jeff King <peff@peff.net> writes:

> I'm tempted to say that this would be clearer if the in_signal code path
> just followed the main logic, like this:
>
> diff --git a/run-command.c b/run-command.c
> index f40df01c77..1f58c17b6c 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -552,20 +552,17 @@ static int wait_or_whine(pid_t pid, const char *argv0, int in_signal)
>  
>  	while ((waiting = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
>  		;	/* nothing */
> -	if (in_signal) {
> -		if (WIFEXITED(status))
> -			code = WEXITSTATUS(status);
> -		return code;
> -	}
>  
>  	if (waiting < 0) {
>  		failed_errno = errno;
> -		error_errno("waitpid for %s failed", argv0);
> +		if (!in_signal)
> +			error_errno("waitpid for %s failed", argv0);
>  	} else if (waiting != pid) {
> -		error("waitpid is confused (%s)", argv0);
> +		if (!in_signal)
> +			error("waitpid is confused (%s)", argv0);
>  	} else if (WIFSIGNALED(status)) {
>  		code = WTERMSIG(status);
> -		if (code != SIGINT && code != SIGQUIT && code != SIGPIPE)
> +		if (!in_signal && code != SIGINT && code != SIGQUIT && code != SIGPIPE)
>  			error("%s died of signal %d", argv0, code);
>  		/*
>  		 * This return value is chosen so that code & 0xff
> @@ -576,10 +573,12 @@ static int wait_or_whine(pid_t pid, const char *argv0, int in_signal)
>  	} else if (WIFEXITED(status)) {
>  		code = WEXITSTATUS(status);
>  	} else {
> -		error("waitpid is confused (%s)", argv0);
> +		if (!in_signal)
> +			error("waitpid is confused (%s)", argv0);
>  	}
>  
> -	clear_child_for_cleanup(pid);
> +	if (!in_signal)
> +		clear_child_for_cleanup(pid);
>  
>  	errno = failed_errno;
>  	return code;
>
> That's a lot more tedious "if (!in_signal)" checks, but:
>
>   - we don't have to duplicate any of the actual application logic
>
>   - we'd now cover the extra cases for waitpid failing or returning the
>     wrong pid (previously if waitpid() failed we'd still look at status,
>     which could contain complete garbage!)

Yeah, the repeated "if (!in_signal)" look a bit ugly, but fixing
that "we only deal with ifexited in in_signal case" to do the right
thing would make the code even more annoying and harder to maintain.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] t7006: clean up SIGPIPE handling in trace2 tests
  2021-11-22  5:49               ` Junio C Hamano
@ 2021-11-22  6:05                 ` Junio C Hamano
  2021-11-22 19:11                 ` Jeff King
  1 sibling, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2021-11-22  6:05 UTC (permalink / raw)
  To: Jeff King; +Cc: SZEDER Gábor, git, Ævar Arnfjörð Bjarmason

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
> ...
>> That's a lot more tedious "if (!in_signal)" checks, but:
>>
>>   - we don't have to duplicate any of the actual application logic
>>
>>   - we'd now cover the extra cases for waitpid failing or returning the
>>     wrong pid (previously if waitpid() failed we'd still look at status,
>>     which could contain complete garbage!)
> ...
> Yeah, the repeated "if (!in_signal)" look a bit ugly, but fixing
> that "we only deal with ifexited in in_signal case" to do the right
> thing would make the code even more annoying and harder to maintain.

Eh, what I meant was that it would be annoying and harder to
maintain, if done inside the single "if (in_signal) { ... }" near
the beginning. (IOW, I am completely in agreement with your
reasoning above).


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] t7006: clean up SIGPIPE handling in trace2 tests
  2021-11-22  5:49               ` Junio C Hamano
  2021-11-22  6:05                 ` Junio C Hamano
@ 2021-11-22 19:11                 ` Jeff King
  2021-11-22 21:28                   ` [PATCH] run-command: unify signal and regular logic for wait_or_whine() Jeff King
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff King @ 2021-11-22 19:11 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: SZEDER Gábor, git, Ævar Arnfjörð Bjarmason

On Sun, Nov 21, 2021 at 09:49:32PM -0800, Junio C Hamano wrote:

> > That's a lot more tedious "if (!in_signal)" checks, but:
> >
> >   - we don't have to duplicate any of the actual application logic
> >
> >   - we'd now cover the extra cases for waitpid failing or returning the
> >     wrong pid (previously if waitpid() failed we'd still look at status,
> >     which could contain complete garbage!)
> 
> Yeah, the repeated "if (!in_signal)" look a bit ugly, but fixing
> that "we only deal with ifexited in in_signal case" to do the right
> thing would make the code even more annoying and harder to maintain.

OK. Let me see if I can clean this up into a full series that actually
fixes the race you saw, and breaks down the other fixes a bit more.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] run-command: unify signal and regular logic for wait_or_whine()
  2021-11-22 19:11                 ` Jeff King
@ 2021-11-22 21:28                   ` Jeff King
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2021-11-22 21:28 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: SZEDER Gábor, git, Ævar Arnfjörð Bjarmason

On Mon, Nov 22, 2021 at 02:11:18PM -0500, Jeff King wrote:

> On Sun, Nov 21, 2021 at 09:49:32PM -0800, Junio C Hamano wrote:
> 
> > > That's a lot more tedious "if (!in_signal)" checks, but:
> > >
> > >   - we don't have to duplicate any of the actual application logic
> > >
> > >   - we'd now cover the extra cases for waitpid failing or returning the
> > >     wrong pid (previously if waitpid() failed we'd still look at status,
> > >     which could contain complete garbage!)
> > 
> > Yeah, the repeated "if (!in_signal)" look a bit ugly, but fixing
> > that "we only deal with ifexited in in_signal case" to do the right
> > thing would make the code even more annoying and harder to maintain.
> 
> OK. Let me see if I can clean this up into a full series that actually
> fixes the race you saw, and breaks down the other fixes a bit more.

Hmm. So I had thought to maybe split apart the earlier patches and
interweave this one at the right spot. But really there is a mutual
dependency in the test fixes: fixing the SIGPIPE issue means that this
will start to fail (but only sometimes, if /bin/sh is bash!), but fixing
the "exec" thing means it is only testing something sometimes (because
of the pipe raciness).

So I ended up doing this as just a separate preparatory fix, which is in
the patch below. I see you picked up the earlier diff as
jk/pager-exit-fix.  This one gives much more explanation, and adjusts
the test to actually cover this case. It can be applied directly on top
of ab/pager-exit-log, replacing what's in jk/pager-exit-fix.

And then I'd apply the other two (which are currently in
jk/t7006-sigpipe-tests-fix) directly on top of that. IMHO it should all
just be a single series that graduates together, but we could do the
other test fixups separately (they are fixing races, but I think the
races err on the side of the tests doing nothing, not failing
erroneously).

-- >8 --
Subject: run-command: unify signal and regular logic for wait_or_whine()

Since 507d7804c0 (pager: don't use unsafe functions in signal handlers,
2015-09-04), we have a separate code path in wait_or_whine() for the
case that we're in a signal handler. But that code path misses some of
the cases handled by the main logic.

This was improved in be8fc53e36 (pager: properly log pager exit code
when signalled, 2021-02-02), but that covered only case: actually
returning the correct error code. But there are some other cases:

  - if waitpid() returns failure, we wouldn't notice and would look at
    uninitialized garbage in the status variable; it's not clear if it's
    possible to trigger this or not

  - if the process exited by signal, then we would still report "-1"
    rather than the correct signal code

This latter case even had a test added in be8fc53e36, but it doesn't
work reliably. It sets the pager command to:

  >pager-used; test-tool sigchain

The latter command will die by signal, but because there are multiple
commands, there will be a shell in between. And it's the shell whose
waitpid() call will see the signal death, and it will then exit with
code 143, which is what Git will see.

To make matters even more confusing, some shells (such as bash) will
realize that there's nothing for the shell to do after test-tool
finishes, and will turn it into an exec. So the test was only checking
what it thought when /bin/sh points to a shell like bash (we're relying
on the shell used internally by Git to spawn sub-commands here, so even
running the test under bash would not be enough).

This patch adjusts the tests to explicitly call "exec" in the pager
command, which produces a consistent outcome regardless of shell. Note
that without the code change in this patch it _should_ fail reliably,
but doesn't. That test, like its siblings, tries to trigger SIGPIPE in
the git-log process writing to the pager, but only do so racily. That
will be fixed in a follow-on patch.

For the code change here, we have two options:

  - we can teach the in_signal code to handle WIFSIGNALED()

  - we can stop returning early when in_signal is set, and instead
    annotate individual calls that we need to skip in this case

The former is a simpler patch, but means we're essentially duplicating
all of the logic. So instead I went with the latter. The result is a
bigger patch, and we do run the risk of new code being added but
forgetting to handle in_signal. But in the long run it seems more
maintainable.

I've skipped any non-trivial calls for the in_signal case, like calling
error(). We'll also skip the call to clear_child_for_cleanup(), as we
were before. This is arguably the wrong thing to do, since we wouldn't
want to try to clean it up again. But:

  - we can't call it as-is, because it calls free(), which we must avoid
    in a signal handler (we'd have to pass in_signal so it can skip the
    free() call)

  - we'll only go through the list of children to clean once, since our
    cleanup_children_on_signal() handler pops itself after running (and
    then re-raises, so eventually we'd just exit). So this cleanup only
    matters if a process is on the cleanup list _and_ it has a separate
    handler to clean itself up. Which is questionable in the first place
    (and AFAIK we do not do).

  - double-cleanup isn't actually that bad anyway. waitpid() will just
    return an error, which we won't even report because of in_signal.

Signed-off-by: Jeff King <peff@peff.net>
---
 run-command.c    | 19 +++++++++----------
 t/t7006-pager.sh |  2 +-
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/run-command.c b/run-command.c
index 509841bf27..350593928d 100644
--- a/run-command.c
+++ b/run-command.c
@@ -551,20 +551,17 @@ static int wait_or_whine(pid_t pid, const char *argv0, int in_signal)

 	while ((waiting = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
 		;	/* nothing */
-	if (in_signal) {
-		if (WIFEXITED(status))
-			code = WEXITSTATUS(status);
-		return code;
-	}

 	if (waiting < 0) {
 		failed_errno = errno;
-		error_errno("waitpid for %s failed", argv0);
+		if (!in_signal)
+			error_errno("waitpid for %s failed", argv0);
 	} else if (waiting != pid) {
-		error("waitpid is confused (%s)", argv0);
+		if (!in_signal)
+			error("waitpid is confused (%s)", argv0);
 	} else if (WIFSIGNALED(status)) {
 		code = WTERMSIG(status);
-		if (code != SIGINT && code != SIGQUIT && code != SIGPIPE)
+		if (!in_signal && code != SIGINT && code != SIGQUIT && code != SIGPIPE)
 			error("%s died of signal %d", argv0, code);
 		/*
 		 * This return value is chosen so that code & 0xff
@@ -575,10 +572,12 @@ static int wait_or_whine(pid_t pid, const char *argv0, int in_signal)
 	} else if (WIFEXITED(status)) {
 		code = WEXITSTATUS(status);
 	} else {
-		error("waitpid is confused (%s)", argv0);
+		if (!in_signal)
+			error("waitpid is confused (%s)", argv0);
 	}

-	clear_child_for_cleanup(pid);
+	if (!in_signal)
+		clear_child_for_cleanup(pid);

 	errno = failed_errno;
 	return code;
diff --git a/t/t7006-pager.sh b/t/t7006-pager.sh
index 0e7cf75435..032bde278e 100755
--- a/t/t7006-pager.sh
+++ b/t/t7006-pager.sh
@@ -767,7 +767,7 @@ test_expect_success TTY 'git attempts to page to nonexisting pager command, gets

 test_expect_success TTY 'git returns SIGPIPE on propagated signals from pager' '
 	test_when_finished "rm pager-used trace.normal" &&
-	test_config core.pager ">pager-used; test-tool sigchain" &&
+	test_config core.pager ">pager-used; exec test-tool sigchain" &&
 	GIT_TRACE2="$(pwd)/trace.normal" &&
 	export GIT_TRACE2 &&
 	test_when_finished "unset GIT_TRACE2" &&
-- 
2.34.0.639.g83a0ed08a8

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Is t7006-pager.sh racy?
  2021-11-21 18:40   ` Jeff King
  2021-11-21 22:05     ` Jeff King
@ 2021-12-01 14:03     ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 17+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-12-01 14:03 UTC (permalink / raw)
  To: Jeff King; +Cc: SZEDER Gábor, Junio C Hamano, git


On Sun, Nov 21 2021, Jeff King wrote:

> On Sun, Oct 24, 2021 at 07:03:49PM +0200, SZEDER Gábor wrote:
>
>> > What makes us expect that the "git log" invocation should trigger a
>> > SIGPIPE in the first place?
>> 
>> A misunderstanding, perhaps, because those 'git log' commands with
>> their early-exiting pagers rarely trigger SIGPIPE.
>
> I happened to be looking in this area today[1], and I think it turns out
> not to be "rarely", but rather "never" for some of the tests.
>
> The test in question sets the pager to "does-not-exist". But in that
> case we will realize immediately via run-command.c that we could not run
> the pager, and will not even redirect our stdout to it.
>
> For example, doing this:
>
>   GIT_PAGER=does-not-exist git -c alias.foo='!yes' -p foo
>
> will never get SIGPIPE; it will just write infinitely to the original
> stdout, and return success.
>
> Whereas this:
>
>   GIT_PAGER=false git -c alias.foo='!yes' -p foo
>
> will reliably get SIGPIPE. But even if we used it (with a while loop to
> instead of "yes" address the portability concern), the tests in t7006
> would still be wrong, because they are sending test-terminal's output to
> a closed pipe (so we'd still see SIGPIPE regardless of Git's behavior).
> They should be sending test_terminal's output to a file or /dev/null.
>
> It seems like this thread stalled. Ævar, were you planning to fix these
> tests?

Not at this point I'm not! :)

Anyway, a belated thanks for tackling this issue & sorry about the
mess. I see jk/t7006-sigpipe-tests-fix is marked already & those fixes
LGTM. This fell through the cracks for me among other E-Mail traffic.

> [...]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-12-01 14:05 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-24  0:04 Is t7006-pager.sh racy? Junio C Hamano
2021-10-24 17:03 ` SZEDER Gábor
2021-10-25 17:41   ` Jeff King
2021-10-28 19:55     ` SZEDER Gábor
2021-10-28 22:10       ` Jeff King
2021-11-21 18:40   ` Jeff King
2021-11-21 22:05     ` Jeff King
2021-11-21 22:54       ` [PATCH] t7006: clean up SIGPIPE handling in trace2 tests Jeff King
2021-11-21 23:10         ` Jeff King
2021-11-22  2:17         ` Junio C Hamano
2021-11-22  4:51           ` Jeff King
2021-11-22  4:54             ` Jeff King
2021-11-22  5:49               ` Junio C Hamano
2021-11-22  6:05                 ` Junio C Hamano
2021-11-22 19:11                 ` Jeff King
2021-11-22 21:28                   ` [PATCH] run-command: unify signal and regular logic for wait_or_whine() Jeff King
2021-12-01 14:03     ` Is t7006-pager.sh racy? Ævar Arnfjörð Bjarmason

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).