git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* t0008 hang on streaming test (OS X)
@ 2013-07-10 16:36 Brian Gernhardt
  2013-07-10 20:35 ` Antoine Pelisse
  2013-07-11 13:34 ` Jeff King
  0 siblings, 2 replies; 10+ messages in thread
From: Brian Gernhardt @ 2013-07-10 16:36 UTC (permalink / raw)
  To: git@vger.kernel.org List

The newest test in t0008 "streaming support for --stdin", seems to hang sporadically on my MacBook Pro (running 10.8.4).  The hang seems to be related to running it in parallel with other tests, as I can only reliably cause it by running with prove  and -j 3.  However, once that has hung I am able to semi-reliably have it occur by running the test separately (with the test hung in the background, using separate trash directories via the --root option).

Experimentation has led me to find that it is hanging when trying to read the 2nd response from check-ignore.

I am somewhat stuck on how to fix it.  Any ideas?

~~ Brian Gernhardt

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: t0008 hang on streaming test (OS X)
  2013-07-10 16:36 t0008 hang on streaming test (OS X) Brian Gernhardt
@ 2013-07-10 20:35 ` Antoine Pelisse
  2013-07-11 16:14   ` Brian Gernhardt
  2013-07-11 13:34 ` Jeff King
  1 sibling, 1 reply; 10+ messages in thread
From: Antoine Pelisse @ 2013-07-10 20:35 UTC (permalink / raw)
  To: Brian Gernhardt; +Cc: git@vger.kernel.org List

On Wed, Jul 10, 2013 at 6:36 PM, Brian Gernhardt
<brian@gernhardtsoftware.com> wrote:
> I am somewhat stuck on how to fix it.  Any ideas?

I don't have anything to reproduce here, but usually I start
investigating this kind of problems by attaching the hung process with
gdb to see the current state (if it's stuck in a specific state), or
to investigate the end-less loop.
That usually help finding a good starting point.

Cheers,
Antoine

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: t0008 hang on streaming test (OS X)
  2013-07-10 16:36 t0008 hang on streaming test (OS X) Brian Gernhardt
  2013-07-10 20:35 ` Antoine Pelisse
@ 2013-07-11 13:34 ` Jeff King
  2013-07-11 16:09   ` Junio C Hamano
  2013-07-11 16:13   ` t0008 hang on streaming test (OS X) Brian Gernhardt
  1 sibling, 2 replies; 10+ messages in thread
From: Jeff King @ 2013-07-11 13:34 UTC (permalink / raw)
  To: Brian Gernhardt; +Cc: git@vger.kernel.org List

On Wed, Jul 10, 2013 at 12:36:40PM -0400, Brian Gernhardt wrote:

> The newest test in t0008 "streaming support for --stdin", seems to
> hang sporadically on my MacBook Pro (running 10.8.4).  The hang seems
> to be related to running it in parallel with other tests, as I can
> only reliably cause it by running with prove  and -j 3.  However, once
> that has hung I am able to semi-reliably have it occur by running the
> test separately (with the test hung in the background, using separate
> trash directories via the --root option).

I can't replicate the hang here (on Linux) doing:

  for i in `seq 1 30`; do
      ./t0008-ignores.sh --root=/tmp/foo/$i &
  done

Do you know which test it is hanging on? You mentioned that you can
replicate it outside of "prove"; what does running with "-v" say?

The last test in t0008, with the fifos, would make me the most
suspicious. The way we do it _should_ be fine, but I'm wondering if the
shell is blocking in exec here:

  mkfifo in out &&
  (git check-ignore -n -v --stdin <in >out &) &&
  exec 9>in &&

That is, if the fifo is not opened for some reason by the backgrounded
process (there's a race, of course, but the outer shell should just
block until the sub-shell actually opens it). I wonder if the
descriptor-opening behavior of:

  cmd <in >out &

is different between shells (that is, if it backgrounds the opening of
in and out on some shells, but not on others). But then I would expect
it to fail consistently.

Just for fun, does switching the middle line there to:

  (sh -c "git check-ignore -n -v --stdin <in >out" &) &&

have any effect?

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: t0008 hang on streaming test (OS X)
  2013-07-11 13:34 ` Jeff King
@ 2013-07-11 16:09   ` Junio C Hamano
  2013-07-12 10:35     ` [PATCH] t0008: avoid SIGPIPE race condition on fifo Jeff King
  2013-07-11 16:13   ` t0008 hang on streaming test (OS X) Brian Gernhardt
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2013-07-11 16:09 UTC (permalink / raw)
  To: Jeff King; +Cc: Brian Gernhardt, git@vger.kernel.org List

Jeff King <peff@peff.net> writes:

> On Wed, Jul 10, 2013 at 12:36:40PM -0400, Brian Gernhardt wrote:
>
>> The newest test in t0008 "streaming support for --stdin", seems to
>> hang sporadically on my MacBook Pro (running 10.8.4).  The hang seems
>> to be related to running it in parallel with other tests, as I can
>> only reliably cause it by running with prove  and -j 3.  However, once
>> that has hung I am able to semi-reliably have it occur by running the
>> test separately (with the test hung in the background, using separate
>> trash directories via the --root option).
>
> I can't replicate the hang here (on Linux) doing:
>
>   for i in `seq 1 30`; do
>       ./t0008-ignores.sh --root=/tmp/foo/$i &
>   done

It seems to hang on me with bash, but not dash, here.

> Do you know which test it is hanging on? You mentioned that you can
> replicate it outside of "prove"; what does running with "-v" say?
>
> The last test in t0008, with the fifos, would make me the most
> suspicious. The way we do it _should_ be fine, but I'm wondering if the
> shell is blocking in exec here:
>
>   mkfifo in out &&
>   (git check-ignore -n -v --stdin <in >out &) &&
>   exec 9>in &&
>
> That is, if the fifo is not opened for some reason by the backgrounded
> process (there's a race, of course, but the outer shell should just
> block until the sub-shell actually opens it). I wonder if the
> descriptor-opening behavior of:
>
>   cmd <in >out &
>
> is different between shells (that is, if it backgrounds the opening of
> in and out on some shells, but not on others). But then I would expect
> it to fail consistently.
>
> Just for fun, does switching the middle line there to:
>
>   (sh -c "git check-ignore -n -v --stdin <in >out" &) &&
>
> have any effect?
>
> -Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: t0008 hang on streaming test (OS X)
  2013-07-11 13:34 ` Jeff King
  2013-07-11 16:09   ` Junio C Hamano
@ 2013-07-11 16:13   ` Brian Gernhardt
  1 sibling, 0 replies; 10+ messages in thread
From: Brian Gernhardt @ 2013-07-11 16:13 UTC (permalink / raw)
  To: Jeff King; +Cc: git@vger.kernel.org List


On Jul 11, 2013, at 9:34 AM, Jeff King <peff@peff.net> wrote:

> On Wed, Jul 10, 2013 at 12:36:40PM -0400, Brian Gernhardt wrote:
> 
>> The newest test in t0008 "streaming support for --stdin",

>> Experimentation has led me to find that it is hanging when trying to read the 2nd response from check-ignore.


> Do you know which test it is hanging on? You mentioned that you can
> replicate it outside of "prove"; what does running with "-v" say?
> 
> The last test in t0008, with the fifos, would make me the most
> suspicious.

The 2nd `read response <out` line is where it was hanging, based on a variety of echos added to the test.

~~ Brian

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: t0008 hang on streaming test (OS X)
  2013-07-10 20:35 ` Antoine Pelisse
@ 2013-07-11 16:14   ` Brian Gernhardt
  0 siblings, 0 replies; 10+ messages in thread
From: Brian Gernhardt @ 2013-07-11 16:14 UTC (permalink / raw)
  To: Antoine Pelisse; +Cc: git@vger.kernel.org List


On Jul 10, 2013, at 4:35 PM, Antoine Pelisse <apelisse@gmail.com> wrote:

> On Wed, Jul 10, 2013 at 6:36 PM, Brian Gernhardt
> <brian@gernhardtsoftware.com> wrote:
>> I am somewhat stuck on how to fix it.  Any ideas?
> 
> I don't have anything to reproduce here, but usually I start
> investigating this kind of problems by attaching the hung process with
> gdb to see the current state (if it's stuck in a specific state), or
> to investigate the end-less loop.
> That usually help finding a good starting point.

Unfortunately, the hung process is /bin/sh (aka bash).  Using the "Sample" function of Activity Monitor gave me that 100% of the time was spent in libc's _open...  Which enlightens me not at all.

~~ Brian Gernhardt

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] t0008: avoid SIGPIPE race condition on fifo
  2013-07-11 16:09   ` Junio C Hamano
@ 2013-07-12 10:35     ` Jeff King
  2013-07-12 16:23       ` Junio C Hamano
  2013-07-12 16:42       ` Brian Gernhardt
  0 siblings, 2 replies; 10+ messages in thread
From: Jeff King @ 2013-07-12 10:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brian Gernhardt, git@vger.kernel.org List

On Thu, Jul 11, 2013 at 09:09:55AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Wed, Jul 10, 2013 at 12:36:40PM -0400, Brian Gernhardt wrote:
> >
> >> The newest test in t0008 "streaming support for --stdin", seems to
> >> hang sporadically on my MacBook Pro (running 10.8.4).  The hang seems
> >> to be related to running it in parallel with other tests, as I can
> >> only reliably cause it by running with prove  and -j 3.  However, once
> >> that has hung I am able to semi-reliably have it occur by running the
> >> test separately (with the test hung in the background, using separate
> >> trash directories via the --root option).
> >
> > I can't replicate the hang here (on Linux) doing:
> >
> >   for i in `seq 1 30`; do
> >       ./t0008-ignores.sh --root=/tmp/foo/$i &
> >   done
> 
> It seems to hang on me with bash, but not dash, here.

Thanks, I was able to replicate it with bash, and like Brian, I saw it
hanging in the second read. strace revealed that we were blocked on
open("out").

The patch below should fix it. I'm still not sure why the choice of
shell matters; it may simply be a timing fluke that bash is more likely
to hit for some reason.

-- >8 --
Subject: [PATCH] t0008: avoid SIGPIPE race condition on fifo

To test check-ignore's --stdin feature, we use two fifos to
send and receive data. We carefully keep a descriptor to its
input open so that it does not receive EOF between input
lines. However, we do not do the same for its output. That
means there is a potential race condition in which
check-ignore has opened the output pipe once (when we read
the first line), and then writes the second line before we
have re-opened the pipe.

In that case, check-ignore gets a SIGPIPE and dies. The
outer shell then tries to open the output fifo but blocks
indefinitely, because there is no writer.  We can fix it by
keeping a descriptor open through the whole procedure.

This should also help if check-ignore dies for any other
reason (we would already have opened the fifo and would
therefore not block, but just get EOF on read).

However, we are technically still susceptible to
check-ignore dying early, before we have opened the fifo.
This is an unlikely race and shouldn't generally happen in
practice, though, so we can hopefully ignore it.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t0008-ignores.sh | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
index a56db80..c29342d 100755
--- a/t/t0008-ignores.sh
+++ b/t/t0008-ignores.sh
@@ -697,13 +697,21 @@ test_expect_success PIPE 'streaming support for --stdin' '
 	# shell, and then echo to the fd. We make sure to close it at
 	# the end, so that the subprocess does get EOF and dies
 	# properly.
+	#
+	# Similarly, we must keep "out" open so that check-ignore does
+	# not ever get SIGPIPE trying to write to us. Not only would that
+	# produce incorrect results, but then there would be no writer on the
+	# other end of the pipe, and we would potentially block forever trying
+	# to open it.
 	exec 9>in &&
+	exec 8<out &&
 	test_when_finished "exec 9>&-" &&
+	test_when_finished "exec 8<&-" &&
 	echo >&9 one &&
-	read response <out &&
+	read response <&8 &&
 	echo "$response" | grep "^\.gitignore:1:one	one" &&
 	echo >&9 two &&
-	read response <out &&
+	read response <&8 &&
 	echo "$response" | grep "^::	two"
 '
 
-- 
1.8.3.rc1.30.gff0fb75

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] t0008: avoid SIGPIPE race condition on fifo
  2013-07-12 10:35     ` [PATCH] t0008: avoid SIGPIPE race condition on fifo Jeff King
@ 2013-07-12 16:23       ` Junio C Hamano
  2013-07-12 20:42         ` Jeff King
  2013-07-12 16:42       ` Brian Gernhardt
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2013-07-12 16:23 UTC (permalink / raw)
  To: Jeff King; +Cc: Brian Gernhardt, git@vger.kernel.org List

Jeff King <peff@peff.net> writes:

> Subject: [PATCH] t0008: avoid SIGPIPE race condition on fifo
>
> To test check-ignore's --stdin feature, we use two fifos to
> send and receive data. We carefully keep a descriptor to its
> input open so that it does not receive EOF between input
> lines. However, we do not do the same for its output. That
> means there is a potential race condition in which
> check-ignore has opened the output pipe once (when we read
> the first line), and then writes the second line before we
> have re-opened the pipe.
>
> In that case, check-ignore gets a SIGPIPE and dies. The
> outer shell then tries to open the output fifo but blocks
> indefinitely, because there is no writer.  We can fix it by
> keeping a descriptor open through the whole procedure.

Ahh, figures.

I wish I were smart enough to figure that out immediately after
seeing the test that does funny things to "in" with "9".

Thanks.

> This should also help if check-ignore dies for any other
> reason (we would already have opened the fifo and would
> therefore not block, but just get EOF on read).
>
> However, we are technically still susceptible to
> check-ignore dying early, before we have opened the fifo.
> This is an unlikely race and shouldn't generally happen in
> practice, though, so we can hopefully ignore it.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/t0008-ignores.sh | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
> index a56db80..c29342d 100755
> --- a/t/t0008-ignores.sh
> +++ b/t/t0008-ignores.sh
> @@ -697,13 +697,21 @@ test_expect_success PIPE 'streaming support for --stdin' '
>  	# shell, and then echo to the fd. We make sure to close it at
>  	# the end, so that the subprocess does get EOF and dies
>  	# properly.
> +	#
> +	# Similarly, we must keep "out" open so that check-ignore does
> +	# not ever get SIGPIPE trying to write to us. Not only would that
> +	# produce incorrect results, but then there would be no writer on the
> +	# other end of the pipe, and we would potentially block forever trying
> +	# to open it.
>  	exec 9>in &&
> +	exec 8<out &&
>  	test_when_finished "exec 9>&-" &&
> +	test_when_finished "exec 8<&-" &&
>  	echo >&9 one &&
> -	read response <out &&
> +	read response <&8 &&
>  	echo "$response" | grep "^\.gitignore:1:one	one" &&
>  	echo >&9 two &&
> -	read response <out &&
> +	read response <&8 &&
>  	echo "$response" | grep "^::	two"
>  '

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] t0008: avoid SIGPIPE race condition on fifo
  2013-07-12 10:35     ` [PATCH] t0008: avoid SIGPIPE race condition on fifo Jeff King
  2013-07-12 16:23       ` Junio C Hamano
@ 2013-07-12 16:42       ` Brian Gernhardt
  1 sibling, 0 replies; 10+ messages in thread
From: Brian Gernhardt @ 2013-07-12 16:42 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git@vger.kernel.org List


On Jul 12, 2013, at 6:35 AM, Jeff King <peff@peff.net> wrote:

> Subject: [PATCH] t0008: avoid SIGPIPE race condition on fifo

Was able to complete a prove run with this patch.  Many thanks.

~~ Brian

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] t0008: avoid SIGPIPE race condition on fifo
  2013-07-12 16:23       ` Junio C Hamano
@ 2013-07-12 20:42         ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2013-07-12 20:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brian Gernhardt, git@vger.kernel.org List

On Fri, Jul 12, 2013 at 09:23:54AM -0700, Junio C Hamano wrote:

> > In that case, check-ignore gets a SIGPIPE and dies. The
> > outer shell then tries to open the output fifo but blocks
> > indefinitely, because there is no writer.  We can fix it by
> > keeping a descriptor open through the whole procedure.
> 
> Ahh, figures.
> 
> I wish I were smart enough to figure that out immediately after
> seeing the test that does funny things to "in" with "9".

I wish I were smart enough to have figured it out when I was writing the
funny stuff with "in" and "9" in the first place. :)

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2013-07-12 20:42 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-10 16:36 t0008 hang on streaming test (OS X) Brian Gernhardt
2013-07-10 20:35 ` Antoine Pelisse
2013-07-11 16:14   ` Brian Gernhardt
2013-07-11 13:34 ` Jeff King
2013-07-11 16:09   ` Junio C Hamano
2013-07-12 10:35     ` [PATCH] t0008: avoid SIGPIPE race condition on fifo Jeff King
2013-07-12 16:23       ` Junio C Hamano
2013-07-12 20:42         ` Jeff King
2013-07-12 16:42       ` Brian Gernhardt
2013-07-11 16:13   ` t0008 hang on streaming test (OS X) Brian Gernhardt

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).