[PATCH 0/4] fix hang in git push when pack-objects fails

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH 0/4] fix hang in git push when pack-objects fails
@ 2011-03-31 18:42 Jeff King
  2011-03-31 18:43 ` [PATCH 1/4] teach wait_or_whine a "quiet" mode Jeff King
                   ` (5 more replies)
  0 siblings, 6 replies; 48+ messages in thread
From: Jeff King @ 2011-03-31 18:42 UTC (permalink / raw)
  To: git; +Cc: Johannes Sixt

When we push over the git protocol, we spawn a pack-object process.  If
pack-objects dies prematurely (for example, because there is some repo
corruption), we are careful to clean up the sideband demuxer (if it is
being used) with finish_async. However, for an async implementation
which forks (i.e., when we have no pthreads), that means we will
waitpid() for the async process.

Meanwhile, the async sideband demuxer will continue trying to stream
data from the remote repo until it gets EOF.  Depending on what data
pack-objects actually sent, the remote repo may not actually send
anything (e.g., if we sent nothing and it is simply waiting for the
pack).  This leads to deadlock cycle in which send-pack waits on the
demuxer, the demuxer waits on the remote receive-pack, and the remote
receive-pack waits on send-pack to send the pack data.

You can test this by compiling with NO_PTHREADS=1 and running the
following script:

-- >8 --
#!/bin/sh

rm -rf parent child

git init --bare parent &&
git init child &&
cd child &&
git remote add origin ../parent &&
echo content >file &&
git add file &&
git commit -m one &&
git push origin HEAD &&
echo content >>file &&
git add file &&
git commit -m two

sha1=`git rev-parse HEAD:file`
file=`echo $sha1 | sed 's,..,&/,'`
rm -fv .git/objects/$file

git push
-- 8< --

The problem bisects to 38a81b4 (receive-pack: Wrap status reports inside
side-band-64k, 2010-02-05). In fact, at that point in time we didn't use
pthreads at all for async calls on non-win32 platforms, so even people
with pthreads are affected. For example, you can trigger it with
v1.7.0.2 even with pthreads, since we didn't use them for async code
back then.  That state continued until f6b6098 (Enable threaded async
procedures whenever pthreads is available, 2010-03-09), at which point
the problem went away for pthreads users.

So what I did was build a maint series straight on top of 38a81b4,
the source of the bug:

  [1/4]: teach wait_or_whine a "quiet" mode
  [2/4]: finish_async: be quiet when waiting for async process
  [3/4]: run-command: allow aborting async code prematurely
  [4/4]: send-pack: abort sideband demuxer on pack-objects error

The first two are refactoring so that aborting async code does not
produce a stray "child process died with signal 15" message. If they're
too invasive, they can go away and we can live with the extra message.

The third one introduces abort_async, which just kill()s the
forked process for the non-win32 case. For the win32 case, we need to
either:

  1. do nothing. I'm not 100% sure why, but the bug does not manifest
     itself with pthreads. I don't know how it behaves on win32.

  2. do the equivalent of pthread_cancel. This makes more sense to me.
     Even if this particular case doesn't have an issue in the threaded
     case, having a primitive like abort_async actually kill the thread
     is sensible.

So that fixes old versions. To fix newer ones, it needs to be merged
with f6b6098, which has a few conflicts. And then you can apply on top
of that merge:

  [5/4]: run-command: implement abort_async for pthreads

which will again break win32, as the compat wrapper doesn't implement
pthread_cancel.

If it's easier to pull than to recreate the merge, you can find the
first series based on 38a81b4 here:

  git://github.com/peff/git.git jk/maint-push-async-hang

and then the merge, with conflict resolution, and 5/4 on top here:

  git://github.com/peff/git.git jk/maint-push-async-hang-threads

If it's easier, I can also build the fix on top of v1.7.2, which was the
first release with async pthreads, and then we can either call that "old
enough", or I can backport it via cherry-pick to the source.

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH 1/4] teach wait_or_whine a "quiet" mode
  2011-03-31 18:42 [PATCH 0/4] fix hang in git push when pack-objects fails Jeff King
@ 2011-03-31 18:43 ` Jeff King
  2011-03-31 20:56   ` Johannes Sixt
  2011-03-31 18:44 ` [PATCH 2/4] finish_async: be quiet when waiting for async process Jeff King
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 48+ messages in thread
From: Jeff King @ 2011-03-31 18:43 UTC (permalink / raw)
  To: git; +Cc: Johannes Sixt

The wait_or_whine function will complain to stderr in a few
cases:

  1. We fail to actually waitpid() correctly.

  2. The child died of a signal.

  3. The child returned exit code 127, indicating a missing
     command to exec after forking.

We already have a silent_exec_failure flag to silence (3).
Let's convert that into a "quiet" flag to also silence (2).
This shouldn't result in signal failure being silent for
existing users of silent_exec_failure, since they already
will need to be checking the return code and complaining for
the case of a non-zero exit code.

For (1), it probably makes sense to always complain about a
failure to correctly wait, so let's not quiet that.

Signed-off-by: Jeff King <peff@peff.net>
---
 run-command.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/run-command.c b/run-command.c
index 0d95340..0d5626a 100644
--- a/run-command.c
+++ b/run-command.c
@@ -240,7 +240,7 @@ fail_pipe:
 	return 0;
 }
 
-static int wait_or_whine(pid_t pid, const char *argv0, int silent_exec_failure)
+static int wait_or_whine(pid_t pid, const char *argv0, int quiet)
 {
 	int status, code = -1;
 	pid_t waiting;
@@ -256,7 +256,8 @@ static int wait_or_whine(pid_t pid, const char *argv0, int silent_exec_failure)
 		error("waitpid is confused (%s)", argv0);
 	} else if (WIFSIGNALED(status)) {
 		code = WTERMSIG(status);
-		error("%s died of signal %d", argv0, code);
+		if (!quiet)
+			error("%s died of signal %d", argv0, code);
 		/*
 		 * This return value is chosen so that code & 0xff
 		 * mimics the exit code that a POSIX shell would report for
@@ -271,7 +272,7 @@ static int wait_or_whine(pid_t pid, const char *argv0, int silent_exec_failure)
 		if (code == 127) {
 			code = -1;
 			failed_errno = ENOENT;
-			if (!silent_exec_failure)
+			if (!quiet)
 				error("cannot run %s: %s", argv0,
 					strerror(ENOENT));
 		}
-- 
1.7.4.13.g8566c

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 2/4] finish_async: be quiet when waiting for async process
  2011-03-31 18:42 [PATCH 0/4] fix hang in git push when pack-objects fails Jeff King
  2011-03-31 18:43 ` [PATCH 1/4] teach wait_or_whine a "quiet" mode Jeff King
@ 2011-03-31 18:44 ` Jeff King
  2011-03-31 18:44 ` [PATCH 3/4] run-command: allow aborting async code prematurely Jeff King
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 48+ messages in thread
From: Jeff King @ 2011-03-31 18:44 UTC (permalink / raw)
  To: git; +Cc: Johannes Sixt

When we ask to finish_async, we should be quiet about things
like signal death in the async process, for two reasons:

  1. This better matches what happens on the Windows side,
     where threads are used (and what will eventually happen
     when pthreads are used on unix).

  2. We pass along the error code to the caller, who needs
     to check and produce an error message anyway for the
     case of a non-zero exit code.

Signed-off-by: Jeff King <peff@peff.net>
---
 run-command.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/run-command.c b/run-command.c
index 0d5626a..258c880 100644
--- a/run-command.c
+++ b/run-command.c
@@ -427,7 +427,7 @@ error:
 int finish_async(struct async *async)
 {
 #ifndef WIN32
-	int ret = wait_or_whine(async->pid, "child process", 0);
+	int ret = wait_or_whine(async->pid, "child process", 1);
 #else
 	DWORD ret = 0;
 	if (WaitForSingleObject(async->tid, INFINITE) != WAIT_OBJECT_0)
-- 
1.7.4.13.g8566c

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 3/4] run-command: allow aborting async code prematurely
  2011-03-31 18:42 [PATCH 0/4] fix hang in git push when pack-objects fails Jeff King
  2011-03-31 18:43 ` [PATCH 1/4] teach wait_or_whine a "quiet" mode Jeff King
  2011-03-31 18:44 ` [PATCH 2/4] finish_async: be quiet when waiting for async process Jeff King
@ 2011-03-31 18:44 ` Jeff King
  2011-04-01  9:36   ` Erik Faye-Lund
  2011-03-31 18:44 ` [PATCH 4/4] send-pack: abort sideband demuxer on pack-objects error Jeff King
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 48+ messages in thread
From: Jeff King @ 2011-03-31 18:44 UTC (permalink / raw)
  To: git; +Cc: Johannes Sixt

We provide only two abstract promitives for async code:
start and finish. Where "finish" means to wait until the
async code tells us it is done. However, it may also be
useful for us to to tell the async code to abort right away,
because whatever it was doing is no longer useful.

For a separate process, we just kill() it. For Windows, we
need to do whatever the equivalent to pthread_cancel is.

Signed-off-by: Jeff King <peff@peff.net>
---
 run-command.c |   10 ++++++++++
 run-command.h |    1 +
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/run-command.c b/run-command.c
index 258c880..f179d2a 100644
--- a/run-command.c
+++ b/run-command.c
@@ -439,6 +439,16 @@ int finish_async(struct async *async)
 	return ret;
 }
 
+void abort_async(struct async *async)
+{
+#ifndef WIN32
+	kill(async->pid, 15);
+#else
+	/* no clue */
+#endif
+	finish_async(async);
+}
+
 int run_hook(const char *index_file, const char *name, ...)
 {
 	struct child_process hook;
diff --git a/run-command.h b/run-command.h
index 65ccb1c..df2c736 100644
--- a/run-command.h
+++ b/run-command.h
@@ -83,5 +83,6 @@ struct async {
 
 int start_async(struct async *async);
 int finish_async(struct async *async);
+void abort_async(struct async *async);
 
 #endif
-- 
1.7.4.13.g8566c

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 4/4] send-pack: abort sideband demuxer on pack-objects error
  2011-03-31 18:42 [PATCH 0/4] fix hang in git push when pack-objects fails Jeff King
                   ` (2 preceding siblings ...)
  2011-03-31 18:44 ` [PATCH 3/4] run-command: allow aborting async code prematurely Jeff King
@ 2011-03-31 18:44 ` Jeff King
  2011-04-13 19:53   ` Johannes Sixt
  2011-03-31 18:45 ` [PATCH 5/4] run-command: implement abort_async for pthreads Jeff King
  2011-03-31 20:45 ` [PATCH 0/4] fix hang in git push when pack-objects fails Johannes Sixt
  5 siblings, 1 reply; 48+ messages in thread
From: Jeff King @ 2011-03-31 18:44 UTC (permalink / raw)
  To: git; +Cc: Johannes Sixt

If pack-objects dies prematurely (for example, because there
is some repo corruption), we are careful to clean up the
sideband demuxer (if it is being used) with finish_async.
However, for an async implementation which forks (i.e., not
Windows), that means we will waitpid() for the async
process.

Meanwhile, the async sideband demuxer will continue trying
to stream data from the remote repo until it gets EOF.
Depending on what data pack-objects actually sent, the
remote repo may not actually send anything (e.g., if we sent
nothing and it is simply waiting for the pack). This leads
to deadlock cycle in which send-pack waits on the demuxer,
the demuxer waits on the remote receive-pack, and the remote
receive-pack waits on send-pack to send the pack data.

To break this, let's have send-pack abort the demuxer when
it knows that it has failed to properly send the pack.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin-send-pack.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/builtin-send-pack.c b/builtin-send-pack.c
index 2478e18..cc068ff 100644
--- a/builtin-send-pack.c
+++ b/builtin-send-pack.c
@@ -516,7 +516,7 @@ int send_pack(struct send_pack_args *args,
 			for (ref = remote_refs; ref; ref = ref->next)
 				ref->status = REF_STATUS_NONE;
 			if (use_sideband)
-				finish_async(&demux);
+				abort_async(&demux);
 			return -1;
 		}
 	}
-- 
1.7.4.13.g8566c

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-03-31 18:42 [PATCH 0/4] fix hang in git push when pack-objects fails Jeff King
                   ` (3 preceding siblings ...)
  2011-03-31 18:44 ` [PATCH 4/4] send-pack: abort sideband demuxer on pack-objects error Jeff King
@ 2011-03-31 18:45 ` Jeff King
  2011-04-01  9:41   ` Erik Faye-Lund
  2011-03-31 20:45 ` [PATCH 0/4] fix hang in git push when pack-objects fails Johannes Sixt
  5 siblings, 1 reply; 48+ messages in thread
From: Jeff King @ 2011-03-31 18:45 UTC (permalink / raw)
  To: git; +Cc: Johannes Sixt

We just need to cancel the thread, instead of sending it a
signal as we do for fork()'d async sections.

Signed-off-by: Jeff King <peff@peff.net>
---
This could also just be part of the merge resolution, but I thought it
would be easier to see what is going on if I put it here.

 run-command.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/run-command.c b/run-command.c
index e31c073..46ea07d 100644
--- a/run-command.c
+++ b/run-command.c
@@ -607,7 +607,7 @@ void abort_async(struct async *async)
 #ifdef NO_PTHREADS
 	kill(async->pid, 15);
 #else
-	/* no clue */
+	pthread_cancel(async->tid);
 #endif
 	finish_async(async);
 }
-- 
1.7.4.13.g8566c

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/4] fix hang in git push when pack-objects fails
  2011-03-31 18:42 [PATCH 0/4] fix hang in git push when pack-objects fails Jeff King
                   ` (4 preceding siblings ...)
  2011-03-31 18:45 ` [PATCH 5/4] run-command: implement abort_async for pthreads Jeff King
@ 2011-03-31 20:45 ` Johannes Sixt
  2011-04-01  1:34   ` Jeff King
  5 siblings, 1 reply; 48+ messages in thread
From: Johannes Sixt @ 2011-03-31 20:45 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Donnerstag, 31. März 2011, Jeff King wrote:
>   1. do nothing. I'm not 100% sure why, but the bug does not manifest
>      itself with pthreads. I don't know how it behaves on win32.

The reason might be that with threads we carefully close file descriptors, so 
that the other end sees EOF or broken pipe and terminates, while with a 
forked sideband demux one of the channels remains open in the forked process?

-- Hannes

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/4] teach wait_or_whine a "quiet" mode
  2011-03-31 18:43 ` [PATCH 1/4] teach wait_or_whine a "quiet" mode Jeff King
@ 2011-03-31 20:56   ` Johannes Sixt
  2011-04-01  1:35     ` Jeff King
  0 siblings, 1 reply; 48+ messages in thread
From: Johannes Sixt @ 2011-03-31 20:56 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Donnerstag, 31. März 2011, Jeff King wrote:
> The wait_or_whine function will complain to stderr in a few
> cases:
>
>   1. We fail to actually waitpid() correctly.
>
>   2. The child died of a signal.
>
>   3. The child returned exit code 127, indicating a missing
>      command to exec after forking.
>
> We already have a silent_exec_failure flag to silence (3).
> Let's convert that into a "quiet" flag to also silence (2).

I'm rather negative on controlling these two error reports with the same flag 
because...

> This shouldn't result in signal failure being silent for
> existing users of silent_exec_failure, since they already
> will need to be checking the return code and complaining for
> the case of a non-zero exit code.

This reasoning is not correct. Error reporting in the routines in 
run-command.c is structured such that callers have to check the return code, 
but they do not report errors themselves. Therefore, even if callers request 
silent_exec_failure, they will not do their own error reporting if there is 
some other failure.

-- Hannes

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 0/4] fix hang in git push when pack-objects fails
  2011-03-31 20:45 ` [PATCH 0/4] fix hang in git push when pack-objects fails Johannes Sixt
@ 2011-04-01  1:34   ` Jeff King
  0 siblings, 0 replies; 48+ messages in thread
From: Jeff King @ 2011-04-01  1:34 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

On Thu, Mar 31, 2011 at 10:45:38PM +0200, Johannes Sixt wrote:

> On Donnerstag, 31. März 2011, Jeff King wrote:
> >   1. do nothing. I'm not 100% sure why, but the bug does not manifest
> >      itself with pthreads. I don't know how it behaves on win32.
> 
> The reason might be that with threads we carefully close file descriptors, so 
> that the other end sees EOF or broken pipe and terminates, while with a 
> forked sideband demux one of the channels remains open in the forked process?

That was my thought, too, except that:

  1. I don't see us closing the descriptor after we fork the async code
     off. So even if the async code carefully closed it, the main
     process would still have it open.

  2. In some cases it may not even be a file descriptor that needs
     closed, but rather a half-duplex shutdown (e.g., for git-over-tcp).
     Because the point of the async code is that it is reading from the
     remote socket.

So I'm really not sure. I can do some straces and try to investigate
further on what happens in the pthread case.  But regardless, it seems
like we should be killing off any async processes explicitly on error.

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 1/4] teach wait_or_whine a "quiet" mode
  2011-03-31 20:56   ` Johannes Sixt
@ 2011-04-01  1:35     ` Jeff King
  0 siblings, 0 replies; 48+ messages in thread
From: Jeff King @ 2011-04-01  1:35 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

On Thu, Mar 31, 2011 at 10:56:39PM +0200, Johannes Sixt wrote:

> On Donnerstag, 31. März 2011, Jeff King wrote:
> > The wait_or_whine function will complain to stderr in a few
> > cases:
> >
> >   1. We fail to actually waitpid() correctly.
> >
> >   2. The child died of a signal.
> >
> >   3. The child returned exit code 127, indicating a missing
> >      command to exec after forking.
> >
> > We already have a silent_exec_failure flag to silence (3).
> > Let's convert that into a "quiet" flag to also silence (2).
> 
> I'm rather negative on controlling these two error reports with the same flag 
> because...

Fair enough. We can add an extra flag, then, or just drop both patches
entirely. They are just trying to save the "child process killed"
message when we abort a separate-process async.

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 3/4] run-command: allow aborting async code prematurely
  2011-03-31 18:44 ` [PATCH 3/4] run-command: allow aborting async code prematurely Jeff King
@ 2011-04-01  9:36   ` Erik Faye-Lund
  2011-04-01 13:59     ` Jeff King
  0 siblings, 1 reply; 48+ messages in thread
From: Erik Faye-Lund @ 2011-04-01  9:36 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Johannes Sixt

On Thu, Mar 31, 2011 at 8:44 PM, Jeff King <peff@peff.net> wrote:
> We provide only two abstract promitives for async code:
> start and finish. Where "finish" means to wait until the
> async code tells us it is done. However, it may also be
> useful for us to to tell the async code to abort right away,
> because whatever it was doing is no longer useful.
>
> For a separate process, we just kill() it. For Windows, we
> need to do whatever the equivalent to pthread_cancel is.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  run-command.c |   10 ++++++++++
>  run-command.h |    1 +
>  2 files changed, 11 insertions(+), 0 deletions(-)
>
> diff --git a/run-command.c b/run-command.c
> index 258c880..f179d2a 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -439,6 +439,16 @@ int finish_async(struct async *async)
>        return ret;
>  }
>
> +void abort_async(struct async *async)
> +{
> +#ifndef WIN32
> +       kill(async->pid, 15);

This doesn't compile unless NO_PTHREADS is set, no?

> +#else
> +       /* no clue */
> +#endif
> +       finish_async(async);
> +}
> +

This should probably be

void abort_async(struct async *async)
{
#ifdef NO_PTHREADS
	kill(async->pid, 15);
#else
	pthread_cancel(async->tid)
#endif
	finish_async(async);
}

... and then us Windows-guys must implement something like pthread_cancel().

Or maybe not. Can pthread reliably cancel a thread while making sure
that thread isn't holding a mutex (like some thread-safe malloc
implementations do)? If not, I'm not entirely sure we can even reach
this goal.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-03-31 18:45 ` [PATCH 5/4] run-command: implement abort_async for pthreads Jeff King
@ 2011-04-01  9:41   ` Erik Faye-Lund
  2011-04-01 10:15     ` Erik Faye-Lund
  2011-04-01 14:00     ` Jeff King
  0 siblings, 2 replies; 48+ messages in thread
From: Erik Faye-Lund @ 2011-04-01  9:41 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Johannes Sixt

On Thu, Mar 31, 2011 at 8:45 PM, Jeff King <peff@peff.net> wrote:
> We just need to cancel the thread, instead of sending it a
> signal as we do for fork()'d async sections.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> This could also just be part of the merge resolution, but I thought it
> would be easier to see what is going on if I put it here.
>
>  run-command.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/run-command.c b/run-command.c
> index e31c073..46ea07d 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -607,7 +607,7 @@ void abort_async(struct async *async)
>  #ifdef NO_PTHREADS

This context-line doesn't match 3/4... Did you send out the wrong
version of that patch?

>        kill(async->pid, 15);
>  #else
> -       /* no clue */
> +       pthread_cancel(async->tid);

My worry about terminating a thread that's currently holding a mutex
(implicitly through the CRT?) still applies though...

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-04-01  9:41   ` Erik Faye-Lund
@ 2011-04-01 10:15     ` Erik Faye-Lund
  2011-04-01 17:27       ` Johannes Sixt
  2011-04-01 14:00     ` Jeff King
  1 sibling, 1 reply; 48+ messages in thread
From: Erik Faye-Lund @ 2011-04-01 10:15 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Johannes Sixt

On Fri, Apr 1, 2011 at 11:41 AM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> On Thu, Mar 31, 2011 at 8:45 PM, Jeff King <peff@peff.net> wrote:
>> We just need to cancel the thread, instead of sending it a
>> signal as we do for fork()'d async sections.
>>
>> Signed-off-by: Jeff King <peff@peff.net>
>> ---
>> This could also just be part of the merge resolution, but I thought it
>> would be easier to see what is going on if I put it here.
>>
>>  run-command.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/run-command.c b/run-command.c
>> index e31c073..46ea07d 100644
>> --- a/run-command.c
>> +++ b/run-command.c
>> @@ -607,7 +607,7 @@ void abort_async(struct async *async)
>>  #ifdef NO_PTHREADS
>
> This context-line doesn't match 3/4... Did you send out the wrong
> version of that patch?
>
>>        kill(async->pid, 15);
>>  #else
>> -       /* no clue */
>> +       pthread_cancel(async->tid);
>
> My worry about terminating a thread that's currently holding a mutex
> (implicitly through the CRT?) still applies though...
>

OK, I've read up on thread-cancellation, and this code seems correct.
pthread_cancel doesn't kill the thread right away, it just signals a
cancellation-event, which is checked for at certain
cancellation-points. A lot of the CRT functions are defined as
cancellation points, so it'll be a matter for us Win32-guys to
implement pthread_testcancel() and inject that into the
function-wrappers of the CRT functions that are marked as
cancellation-points.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 3/4] run-command: allow aborting async code prematurely
  2011-04-01  9:36   ` Erik Faye-Lund
@ 2011-04-01 13:59     ` Jeff King
  0 siblings, 0 replies; 48+ messages in thread
From: Jeff King @ 2011-04-01 13:59 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: git, Johannes Sixt

On Fri, Apr 01, 2011 at 11:36:53AM +0200, Erik Faye-Lund wrote:

> > diff --git a/run-command.c b/run-command.c
> > index 258c880..f179d2a 100644
> > --- a/run-command.c
> > +++ b/run-command.c
> > @@ -439,6 +439,16 @@ int finish_async(struct async *async)
> >        return ret;
> >  }
> >
> > +void abort_async(struct async *async)
> > +{
> > +#ifndef WIN32
> > +       kill(async->pid, 15);
> 
> This doesn't compile unless NO_PTHREADS is set, no?

Read the cover letter again. This goes on top of an old version,
pre-pthreads. Patch 5/4 modernizes it.

> This should probably be
> 
> void abort_async(struct async *async)
> {
> #ifdef NO_PTHREADS
> 	kill(async->pid, 15);
> #else
> 	pthread_cancel(async->tid)
> #endif
> 	finish_async(async);
> }

Yes, eventually it becomes that, in 5/4.

> ... and then us Windows-guys must implement something like pthread_cancel().
> 
> Or maybe not. Can pthread reliably cancel a thread while making sure
> that thread isn't holding a mutex (like some thread-safe malloc
> implementations do)? If not, I'm not entirely sure we can even reach
> this goal.

Yes, as you figured out in a later email, there are cancellation points.
However, I'm not sure it matters for this. Run-command's async code is a
very abstract primitive that can even be implemented in a separate
process. So async code must not use mutexes or assume it shares memory
with the parent. Real threaded code should use pthreads directly (and
must be optional, or we alienate non-threaded platforms).

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-04-01  9:41   ` Erik Faye-Lund
  2011-04-01 10:15     ` Erik Faye-Lund
@ 2011-04-01 14:00     ` Jeff King
  1 sibling, 0 replies; 48+ messages in thread
From: Jeff King @ 2011-04-01 14:00 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: git, Johannes Sixt

On Fri, Apr 01, 2011 at 11:41:12AM +0200, Erik Faye-Lund wrote:

> On Thu, Mar 31, 2011 at 8:45 PM, Jeff King <peff@peff.net> wrote:
> > We just need to cancel the thread, instead of sending it a
> > signal as we do for fork()'d async sections.
> >
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> > This could also just be part of the merge resolution, but I thought it
> > would be easier to see what is going on if I put it here.
> >
> >  run-command.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/run-command.c b/run-command.c
> > index e31c073..46ea07d 100644
> > --- a/run-command.c
> > +++ b/run-command.c
> > @@ -607,7 +607,7 @@ void abort_async(struct async *async)
> >  #ifdef NO_PTHREADS
> 
> This context-line doesn't match 3/4... Did you send out the wrong
> version of that patch?

No, read the cover letter. After 4/4, you have to merge the fix up to a
more modern version with pthreads, then this goes on top of the
resolution.

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-04-01 10:15     ` Erik Faye-Lund
@ 2011-04-01 17:27       ` Johannes Sixt
  2011-04-01 17:38         ` Jeff King
  2011-04-01 19:26         ` Erik Faye-Lund
  0 siblings, 2 replies; 48+ messages in thread
From: Johannes Sixt @ 2011-04-01 17:27 UTC (permalink / raw)
  To: kusmabite; +Cc: Jeff King, git

On Freitag, 1. April 2011, Erik Faye-Lund wrote:
> On Fri, Apr 1, 2011 at 11:41 AM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> > On Thu, Mar 31, 2011 at 8:45 PM, Jeff King <peff@peff.net> wrote:
> >>        kill(async->pid, 15);
> >>  #else
> >> -       /* no clue */
> >> +       pthread_cancel(async->tid);
> >
> > My worry about terminating a thread that's currently holding a mutex
> > (implicitly through the CRT?) still applies though...
>
> OK, I've read up on thread-cancellation, and this code seems correct.
> pthread_cancel doesn't kill the thread right away, it just signals a
> cancellation-event, which is checked for at certain
> cancellation-points. A lot of the CRT functions are defined as
> cancellation points, so it'll be a matter for us Win32-guys to
> implement pthread_testcancel() and inject that into the
> function-wrappers of the CRT functions that are marked as
> cancellation-points.

That's not going to happen. We cannot implement pthread_cancel() on Windows 
because it would have to be able to interrupt blocking system calls. 
(TerminateThread() is a no-no, given all the caveats about leaking system 
resources that are mentioned in the manual.)

[OK, "cannot" is a hard word. It is possible in some way, I'm sure. But that 
would mean that we implement the equivalent of Cygwin or so...]

But if I understand correctly what Jeff wrote so far, then the pthreaded case 
happens to work - by chance or by design, we don't know (yet). Perhaps we can 
get away with

-	/* no clue */
+	/* pthread_cancel(async->tid); not necessary */

-- Hannes

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-04-01 17:27       ` Johannes Sixt
@ 2011-04-01 17:38         ` Jeff King
  2011-04-01 19:26         ` Erik Faye-Lund
  1 sibling, 0 replies; 48+ messages in thread
From: Jeff King @ 2011-04-01 17:38 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: kusmabite, git

On Fri, Apr 01, 2011 at 07:27:03PM +0200, Johannes Sixt wrote:

> > OK, I've read up on thread-cancellation, and this code seems correct.
> > pthread_cancel doesn't kill the thread right away, it just signals a
> > cancellation-event, which is checked for at certain
> > cancellation-points. A lot of the CRT functions are defined as
> > cancellation points, so it'll be a matter for us Win32-guys to
> > implement pthread_testcancel() and inject that into the
> > function-wrappers of the CRT functions that are marked as
> > cancellation-points.
> 
> That's not going to happen. We cannot implement pthread_cancel() on Windows 
> because it would have to be able to interrupt blocking system calls. 
> (TerminateThread() is a no-no, given all the caveats about leaking system 
> resources that are mentioned in the manual.)
> 
> [OK, "cannot" is a hard word. It is possible in some way, I'm sure. But that 
> would mean that we implement the equivalent of Cygwin or so...]
> 
> But if I understand correctly what Jeff wrote so far, then the pthreaded case 
> happens to work - by chance or by design, we don't know (yet). Perhaps we can 
> get away with
> 
> -	/* no clue */
> +	/* pthread_cancel(async->tid); not necessary */

Yeah, I think that would probably work, but I haven't had a chance yet
to look deeper into why the pthread case doesn't hang.

I have another case, too, which is that killing a "git push" in progress
via signal will leave crufty child-processes around, still trying to
push.  One of these is the pack-objects sub-process, and the other is
(in the no-pthreads case) the sideband demuxer.

And obviously fixing that involves aborting the async process, too[1].
But we can again get away without pthread_cancel, because in the pthread
case, we can just rely on the parent process dying to take down the
thread.

-Peff

[1] Actually, my plan is to set up a signal/atexit handler to kill off
children.  Run-command callers can specify an option for "yes, this
child should be killed if I am killed". Async callers will have it
turned on automatically (since they won't even know if it's a subthread
or a different process).  So we won't actually be calling abort_async()
anyway.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-04-01 17:27       ` Johannes Sixt
  2011-04-01 17:38         ` Jeff King
@ 2011-04-01 19:26         ` Erik Faye-Lund
  2011-04-01 19:33           ` Erik Faye-Lund
  2011-04-01 19:42           ` Johannes Sixt
  1 sibling, 2 replies; 48+ messages in thread
From: Erik Faye-Lund @ 2011-04-01 19:26 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Jeff King, git

On Fri, Apr 1, 2011 at 7:27 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> On Freitag, 1. April 2011, Erik Faye-Lund wrote:
>> On Fri, Apr 1, 2011 at 11:41 AM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
>> > On Thu, Mar 31, 2011 at 8:45 PM, Jeff King <peff@peff.net> wrote:
>> >>        kill(async->pid, 15);
>> >>  #else
>> >> -       /* no clue */
>> >> +       pthread_cancel(async->tid);
>> >
>> > My worry about terminating a thread that's currently holding a mutex
>> > (implicitly through the CRT?) still applies though...
>>
>> OK, I've read up on thread-cancellation, and this code seems correct.
>> pthread_cancel doesn't kill the thread right away, it just signals a
>> cancellation-event, which is checked for at certain
>> cancellation-points. A lot of the CRT functions are defined as
>> cancellation points, so it'll be a matter for us Win32-guys to
>> implement pthread_testcancel() and inject that into the
>> function-wrappers of the CRT functions that are marked as
>> cancellation-points.
>
> That's not going to happen. We cannot implement pthread_cancel() on Windows
> because it would have to be able to interrupt blocking system calls.
> (TerminateThread() is a no-no, given all the caveats about leaking system
> resources that are mentioned in the manual.)
>

Did you read my suggestion? I was talking about implementing
cancellation-points, just like on other platforms. This should not
lead to TerminateThread, but instead a conditional ExitThread from the
thread in question.

Something like this (I've only added a cancellation-point at close,
just to illustrate what I mean):


diff --git a/compat/mingw.c b/compat/mingw.c
index 878b1de..253be14 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -302,6 +302,14 @@ int mingw_open (const char *filename, int oflags, ...)
 	return fd;
 }

+#undef close
+int mingw_close(int fd)
+{
+	int ret = close(fd);
+	pthread_testcancel();
+	return ret;
+}
+
 #undef write
 ssize_t mingw_write(int fd, const void *buf, size_t count)
 {
diff --git a/compat/mingw.h b/compat/mingw.h
index 62eccd3..3e904c8 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -175,6 +175,9 @@ int mingw_rmdir(const char *path);
 int mingw_open (const char *filename, int oflags, ...);
 #define open mingw_open

+int mingw_close(int fd);
+#define close mingw_close
+
 ssize_t mingw_write(int fd, const void *buf, size_t count);
 #define write mingw_write

diff --git a/compat/win32/pthread.c b/compat/win32/pthread.c
index 010e875..47d620b 100644
--- a/compat/win32/pthread.c
+++ b/compat/win32/pthread.c
@@ -18,6 +18,7 @@ static unsigned __stdcall win32_start_routine(void *arg)
 	pthread_t *thread = arg;
 	thread->tid = GetCurrentThreadId();
 	thread->arg = thread->start_routine(thread->arg);
+	CloseHandle(thread->cancel_event);
 	return 0;
 }

@@ -26,6 +27,9 @@ int pthread_create(pthread_t *thread, const void *unused,
 {
 	thread->arg = arg;
 	thread->start_routine = start_routine;
+	thread->cancel_event = CreateEvent(NULL, FALSE, FALSE, NULL);
+	if (!thread->cancel_event)
+		die("failed to create event");
 	thread->handle = (HANDLE)
 		_beginthreadex(NULL, 0, win32_start_routine, thread, 0, NULL);

diff --git a/compat/win32/pthread.h b/compat/win32/pthread.h
index 2e20548..7fce39d 100644
--- a/compat/win32/pthread.h
+++ b/compat/win32/pthread.h
@@ -56,6 +56,7 @@ extern int pthread_cond_broadcast(pthread_cond_t *cond);
  */
 typedef struct {
 	HANDLE handle;
+	HANDLE cancel_event;
 	void *(*start_routine)(void*);
 	void *arg;
 	DWORD tid;
@@ -96,4 +97,13 @@ static inline void *pthread_getspecific(pthread_key_t key)
 	return TlsGetValue(key);
 }

+#define pthread_cancel(a) SetEvent(a.cancel_event)
+
+static inline void pthread_testcancel(void)
+{
+	pthread_t thread = pthread_self();
+	if (WaitForSingleObject(thread.cancel_event, 0))
+		pthread_exit(NULL);
+}
+
 #endif /* PTHREAD_H */

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-04-01 19:26         ` Erik Faye-Lund
@ 2011-04-01 19:33           ` Erik Faye-Lund
  2011-04-01 19:42           ` Johannes Sixt
  1 sibling, 0 replies; 48+ messages in thread
From: Erik Faye-Lund @ 2011-04-01 19:33 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Jeff King, git

On Fri, Apr 1, 2011 at 9:26 PM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> On Fri, Apr 1, 2011 at 7:27 PM, Johannes Sixt <j6t@kdbg.org> wrote:
>> On Freitag, 1. April 2011, Erik Faye-Lund wrote:
>>> On Fri, Apr 1, 2011 at 11:41 AM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
>>> > On Thu, Mar 31, 2011 at 8:45 PM, Jeff King <peff@peff.net> wrote:
>>> >>        kill(async->pid, 15);
>>> >>  #else
>>> >> -       /* no clue */
>>> >> +       pthread_cancel(async->tid);
>>> >
>>> > My worry about terminating a thread that's currently holding a mutex
>>> > (implicitly through the CRT?) still applies though...
>>>
>>> OK, I've read up on thread-cancellation, and this code seems correct.
>>> pthread_cancel doesn't kill the thread right away, it just signals a
>>> cancellation-event, which is checked for at certain
>>> cancellation-points. A lot of the CRT functions are defined as
>>> cancellation points, so it'll be a matter for us Win32-guys to
>>> implement pthread_testcancel() and inject that into the
>>> function-wrappers of the CRT functions that are marked as
>>> cancellation-points.
>>
>> That's not going to happen. We cannot implement pthread_cancel() on Windows
>> because it would have to be able to interrupt blocking system calls.
>> (TerminateThread() is a no-no, given all the caveats about leaking system
>> resources that are mentioned in the manual.)
>>
>
> Did you read my suggestion? I was talking about implementing
> cancellation-points, just like on other platforms. This should not
> lead to TerminateThread, but instead a conditional ExitThread from the
> thread in question.
>
> Something like this (I've only added a cancellation-point at close,
> just to illustrate what I mean):
>
>
> diff --git a/compat/mingw.c b/compat/mingw.c
> index 878b1de..253be14 100644
> --- a/compat/mingw.c
> +++ b/compat/mingw.c
> @@ -302,6 +302,14 @@ int mingw_open (const char *filename, int oflags, ...)
>        return fd;
>  }
>
> +#undef close
> +int mingw_close(int fd)
> +{
> +       int ret = close(fd);
> +       pthread_testcancel();
> +       return ret;
> +}
> +
>  #undef write
>  ssize_t mingw_write(int fd, const void *buf, size_t count)
>  {
> diff --git a/compat/mingw.h b/compat/mingw.h
> index 62eccd3..3e904c8 100644
> --- a/compat/mingw.h
> +++ b/compat/mingw.h
> @@ -175,6 +175,9 @@ int mingw_rmdir(const char *path);
>  int mingw_open (const char *filename, int oflags, ...);
>  #define open mingw_open
>
> +int mingw_close(int fd);
> +#define close mingw_close
> +
>  ssize_t mingw_write(int fd, const void *buf, size_t count);
>  #define write mingw_write
>
> diff --git a/compat/win32/pthread.c b/compat/win32/pthread.c
> index 010e875..47d620b 100644
> --- a/compat/win32/pthread.c
> +++ b/compat/win32/pthread.c
> @@ -18,6 +18,7 @@ static unsigned __stdcall win32_start_routine(void *arg)
>        pthread_t *thread = arg;
>        thread->tid = GetCurrentThreadId();
>        thread->arg = thread->start_routine(thread->arg);
> +       CloseHandle(thread->cancel_event);
>        return 0;
>  }
>
> @@ -26,6 +27,9 @@ int pthread_create(pthread_t *thread, const void *unused,
>  {
>        thread->arg = arg;
>        thread->start_routine = start_routine;
> +       thread->cancel_event = CreateEvent(NULL, FALSE, FALSE, NULL);
> +       if (!thread->cancel_event)
> +               die("failed to create event");
>        thread->handle = (HANDLE)
>                _beginthreadex(NULL, 0, win32_start_routine, thread, 0, NULL);
>
> diff --git a/compat/win32/pthread.h b/compat/win32/pthread.h
> index 2e20548..7fce39d 100644
> --- a/compat/win32/pthread.h
> +++ b/compat/win32/pthread.h
> @@ -56,6 +56,7 @@ extern int pthread_cond_broadcast(pthread_cond_t *cond);
>  */
>  typedef struct {
>        HANDLE handle;
> +       HANDLE cancel_event;
>        void *(*start_routine)(void*);
>        void *arg;
>        DWORD tid;
> @@ -96,4 +97,13 @@ static inline void *pthread_getspecific(pthread_key_t key)
>        return TlsGetValue(key);
>  }
>
> +#define pthread_cancel(a) SetEvent(a.cancel_event)
> +
> +static inline void pthread_testcancel(void)
> +{
> +       pthread_t thread = pthread_self();
> +       if (WaitForSingleObject(thread.cancel_event, 0))

That should be:
-	if (WaitForSingleObject(thread.cancel_event, 0))
+	if (WaitForSingleObject(thread.cancel_event, 0) == WAIT_OBJECT_0)
...Of course.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-04-01 19:26         ` Erik Faye-Lund
  2011-04-01 19:33           ` Erik Faye-Lund
@ 2011-04-01 19:42           ` Johannes Sixt
  2011-04-01 19:57             ` Erik Faye-Lund
  1 sibling, 1 reply; 48+ messages in thread
From: Johannes Sixt @ 2011-04-01 19:42 UTC (permalink / raw)
  To: kusmabite; +Cc: Jeff King, git

On Freitag, 1. April 2011, Erik Faye-Lund wrote:
> On Fri, Apr 1, 2011 at 7:27 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> > On Freitag, 1. April 2011, Erik Faye-Lund wrote:
> >> OK, I've read up on thread-cancellation, and this code seems correct.
> >> pthread_cancel doesn't kill the thread right away, it just signals a
> >> cancellation-event, which is checked for at certain
> >> cancellation-points. A lot of the CRT functions are defined as
> >> cancellation points, so it'll be a matter for us Win32-guys to
> >> implement pthread_testcancel() and inject that into the
> >> function-wrappers of the CRT functions that are marked as
> >> cancellation-points.
> >
> > That's not going to happen. We cannot implement pthread_cancel() on
> > Windows because it would have to be able to interrupt blocking system
> > calls. (TerminateThread() is a no-no, given all the caveats about leaking
> > system resources that are mentioned in the manual.)
>
> Did you read my suggestion?

Yes, I did.

> I was talking about implementing 
> cancellation-points, just like on other platforms. This should not
> lead to TerminateThread, but instead a conditional ExitThread from the
> thread in question.
>
> Something like this (I've only added a cancellation-point at close,
> just to illustrate what I mean):

But this does not help the case at hand in any way. How would you interrupt a 
thread that is blocked in ReadFile()? The point of pthread_cancel() is that 
it interrupts blocked system calls, something that you cannot achieve if you 
want to keep using MS's C runtime.

-- Hannes

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-04-01 19:42           ` Johannes Sixt
@ 2011-04-01 19:57             ` Erik Faye-Lund
  2011-04-01 20:05               ` Jeff King
  2011-04-01 20:18               ` Johannes Sixt
  0 siblings, 2 replies; 48+ messages in thread
From: Erik Faye-Lund @ 2011-04-01 19:57 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Jeff King, git

On Fri, Apr 1, 2011 at 9:42 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> On Freitag, 1. April 2011, Erik Faye-Lund wrote:
>> On Fri, Apr 1, 2011 at 7:27 PM, Johannes Sixt <j6t@kdbg.org> wrote:
>> > On Freitag, 1. April 2011, Erik Faye-Lund wrote:
>> >> OK, I've read up on thread-cancellation, and this code seems correct.
>> >> pthread_cancel doesn't kill the thread right away, it just signals a
>> >> cancellation-event, which is checked for at certain
>> >> cancellation-points. A lot of the CRT functions are defined as
>> >> cancellation points, so it'll be a matter for us Win32-guys to
>> >> implement pthread_testcancel() and inject that into the
>> >> function-wrappers of the CRT functions that are marked as
>> >> cancellation-points.
>> >
>> > That's not going to happen. We cannot implement pthread_cancel() on
>> > Windows because it would have to be able to interrupt blocking system
>> > calls. (TerminateThread() is a no-no, given all the caveats about leaking
>> > system resources that are mentioned in the manual.)
>>
>> Did you read my suggestion?
>
> Yes, I did.
>
>> I was talking about implementing
>> cancellation-points, just like on other platforms. This should not
>> lead to TerminateThread, but instead a conditional ExitThread from the
>> thread in question.
>>
>> Something like this (I've only added a cancellation-point at close,
>> just to illustrate what I mean):
>
> But this does not help the case at hand in any way. How would you interrupt a
> thread that is blocked in ReadFile()? The point of pthread_cancel() is that
> it interrupts blocked system calls

There is no mention of such a guarantee in POSIX (section 2.9.5 Thread
Cancellation), so relying on that is undefined behavior.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-04-01 19:57             ` Erik Faye-Lund
@ 2011-04-01 20:05               ` Jeff King
  2011-04-01 20:13                 ` Erik Faye-Lund
  2011-04-01 20:18               ` Johannes Sixt
  1 sibling, 1 reply; 48+ messages in thread
From: Jeff King @ 2011-04-01 20:05 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: Johannes Sixt, git

On Fri, Apr 01, 2011 at 09:57:14PM +0200, Erik Faye-Lund wrote:

> > But this does not help the case at hand in any way. How would you interrupt a
> > thread that is blocked in ReadFile()? The point of pthread_cancel() is that
> > it interrupts blocked system calls
> 
> There is no mention of such a guarantee in POSIX (section 2.9.5 Thread
> Cancellation), so relying on that is undefined behavior.

Eh? My pthreads(7) says that read() is required to be a cancellation
point acrroding to POSIX. I didn't dig up the actual reference in the
standard, though.

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-04-01 20:05               ` Jeff King
@ 2011-04-01 20:13                 ` Erik Faye-Lund
  2011-04-01 20:17                   ` Jeff King
  2011-04-01 20:36                   ` Johannes Sixt
  0 siblings, 2 replies; 48+ messages in thread
From: Erik Faye-Lund @ 2011-04-01 20:13 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Sixt, git

On Fri, Apr 1, 2011 at 10:05 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Apr 01, 2011 at 09:57:14PM +0200, Erik Faye-Lund wrote:
>
>> > But this does not help the case at hand in any way. How would you interrupt a
>> > thread that is blocked in ReadFile()? The point of pthread_cancel() is that
>> > it interrupts blocked system calls
>>
>> There is no mention of such a guarantee in POSIX (section 2.9.5 Thread
>> Cancellation), so relying on that is undefined behavior.
>
> Eh? My pthreads(7) says that read() is required to be a cancellation
> point acrroding to POSIX. I didn't dig up the actual reference in the
> standard, though.

I don't understand where the implementor would get that from after
reading through it, but if there's something I've missed we can fix it
by replacing my pthread_cancel with this, no?

static inline int pthread_cancel(pthread_t thread)
{
	SetEvent(thread.cancel_event);
	CancelSynchronousIo(thread.handle);
}

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-04-01 20:13                 ` Erik Faye-Lund
@ 2011-04-01 20:17                   ` Jeff King
  2011-04-01 20:18                     ` Jeff King
  2011-04-01 20:34                     ` Erik Faye-Lund
  2011-04-01 20:36                   ` Johannes Sixt
  1 sibling, 2 replies; 48+ messages in thread
From: Jeff King @ 2011-04-01 20:17 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: Johannes Sixt, git

On Fri, Apr 01, 2011 at 10:13:23PM +0200, Erik Faye-Lund wrote:

> On Fri, Apr 1, 2011 at 10:05 PM, Jeff King <peff@peff.net> wrote:
> > On Fri, Apr 01, 2011 at 09:57:14PM +0200, Erik Faye-Lund wrote:
> >
> >> > But this does not help the case at hand in any way. How would you interrupt a
> >> > thread that is blocked in ReadFile()? The point of pthread_cancel() is that
> >> > it interrupts blocked system calls
> >>
> >> There is no mention of such a guarantee in POSIX (section 2.9.5 Thread
> >> Cancellation), so relying on that is undefined behavior.
> >
> > Eh? My pthreads(7) says that read() is required to be a cancellation
> > point acrroding to POSIX. I didn't dig up the actual reference in the
> > standard, though.
> 
> I don't understand where the implementor would get that from after
> reading through it, but if there's something I've missed we can fix it
> by replacing my pthread_cancel with this, no?

Out of curiosity, which POSIX are you reading? My page references
POSIX.1-2001, but technically pthreads were originally defined in
POSIX.1c-1995.

> static inline int pthread_cancel(pthread_t thread)
> {
> 	SetEvent(thread.cancel_event);
> 	CancelSynchronousIo(thread.handle);
> }

There are a ton of cancellation points, not just I/O (e.g., sleep).
However, interrupting a read would probably be sufficient for git's
purposes.

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-04-01 20:17                   ` Jeff King
@ 2011-04-01 20:18                     ` Jeff King
  2011-04-01 20:34                     ` Erik Faye-Lund
  1 sibling, 0 replies; 48+ messages in thread
From: Jeff King @ 2011-04-01 20:18 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: Johannes Sixt, git

On Fri, Apr 01, 2011 at 04:17:14PM -0400, Jeff King wrote:

> > static inline int pthread_cancel(pthread_t thread)
> > {
> > 	SetEvent(thread.cancel_event);
> > 	CancelSynchronousIo(thread.handle);
> > }
> 
> There are a ton of cancellation points, not just I/O (e.g., sleep).
> However, interrupting a read would probably be sufficient for git's
> purposes.

Actually, I take that back. It would be sufficient for abort_async with
its current callers, but adding a pthread_cancel that is only partially
there may end up causing headaches down the road.

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-04-01 19:57             ` Erik Faye-Lund
  2011-04-01 20:05               ` Jeff King
@ 2011-04-01 20:18               ` Johannes Sixt
  2011-04-01 20:31                 ` Erik Faye-Lund
  1 sibling, 1 reply; 48+ messages in thread
From: Johannes Sixt @ 2011-04-01 20:18 UTC (permalink / raw)
  To: kusmabite; +Cc: Jeff King, git

On Freitag, 1. April 2011, Erik Faye-Lund wrote:
> On Fri, Apr 1, 2011 at 9:42 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> > But this does not help the case at hand in any way. How would you
> > interrupt a thread that is blocked in ReadFile()? The point of
> > pthread_cancel() is that it interrupts blocked system calls
>
> There is no mention of such a guarantee in POSIX (section 2.9.5 Thread
> Cancellation), so relying on that is undefined behavior.

In the paragraph before the bulleted list at the end of "Cancellation Points":

"...If a thread has cancelability enabled and a cancellation request is made 
with the thread as a target while the thread is suspended at a cancellation 
point, the thread shall be awakened and the cancellation request shall be 
acted upon..."

-- Hannes

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-04-01 20:18               ` Johannes Sixt
@ 2011-04-01 20:31                 ` Erik Faye-Lund
  2011-04-01 21:16                   ` Jeff King
  0 siblings, 1 reply; 48+ messages in thread
From: Erik Faye-Lund @ 2011-04-01 20:31 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Jeff King, git

On Fri, Apr 1, 2011 at 10:18 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> On Freitag, 1. April 2011, Erik Faye-Lund wrote:
>> On Fri, Apr 1, 2011 at 9:42 PM, Johannes Sixt <j6t@kdbg.org> wrote:
>> > But this does not help the case at hand in any way. How would you
>> > interrupt a thread that is blocked in ReadFile()? The point of
>> > pthread_cancel() is that it interrupts blocked system calls
>>
>> There is no mention of such a guarantee in POSIX (section 2.9.5 Thread
>> Cancellation), so relying on that is undefined behavior.
>
> In the paragraph before the bulleted list at the end of "Cancellation Points":
>
> "...If a thread has cancelability enabled and a cancellation request is made
> with the thread as a target while the thread is suspended at a cancellation
> point, the thread shall be awakened and the cancellation request shall be
> acted upon..."
>

A blocking thread and a suspended are two different matters. A
suspended thread is a thread that has been explicitly suspended by
wait, waitpid, sleep, pause etc. These functions explicitly say that
they suspend the thread ("shall suspend the calling thread until"),
while read etc does not ("shall block the calling thread until").

Similarly, making a blocking read/write fail (or terminate mid-way) is
not the same thing as awakening the thread.

I see how some people can read something like this into this section,
but I think it's pretty clear - this is not what it's talking about.
In fact, the more I read the relevant texts, the more convinced I get
that implementations that does terminate read/write strictly speaking
is in violation of the standard.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-04-01 20:17                   ` Jeff King
  2011-04-01 20:18                     ` Jeff King
@ 2011-04-01 20:34                     ` Erik Faye-Lund
  1 sibling, 0 replies; 48+ messages in thread
From: Erik Faye-Lund @ 2011-04-01 20:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Sixt, git

On Fri, Apr 1, 2011 at 10:17 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Apr 01, 2011 at 10:13:23PM +0200, Erik Faye-Lund wrote:
>
>> On Fri, Apr 1, 2011 at 10:05 PM, Jeff King <peff@peff.net> wrote:
>> > On Fri, Apr 01, 2011 at 09:57:14PM +0200, Erik Faye-Lund wrote:
>> >
>> >> > But this does not help the case at hand in any way. How would you interrupt a
>> >> > thread that is blocked in ReadFile()? The point of pthread_cancel() is that
>> >> > it interrupts blocked system calls
>> >>
>> >> There is no mention of such a guarantee in POSIX (section 2.9.5 Thread
>> >> Cancellation), so relying on that is undefined behavior.
>> >
>> > Eh? My pthreads(7) says that read() is required to be a cancellation
>> > point acrroding to POSIX. I didn't dig up the actual reference in the
>> > standard, though.
>>
>> I don't understand where the implementor would get that from after
>> reading through it, but if there's something I've missed we can fix it
>> by replacing my pthread_cancel with this, no?
>
> Out of curiosity, which POSIX are you reading? My page references
> POSIX.1-2001, but technically pthreads were originally defined in
> POSIX.1c-1995.
>

Sorry that I wasn't more clear about this:

"The Open Group Base Specifications Issue 6"
"IEEE Std 1003.1, 2004 Edition"

Available here:
http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_09.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-04-01 20:13                 ` Erik Faye-Lund
  2011-04-01 20:17                   ` Jeff King
@ 2011-04-01 20:36                   ` Johannes Sixt
  2011-04-01 20:41                     ` Erik Faye-Lund
  1 sibling, 1 reply; 48+ messages in thread
From: Johannes Sixt @ 2011-04-01 20:36 UTC (permalink / raw)
  To: kusmabite; +Cc: Jeff King, git

On Freitag, 1. April 2011, Erik Faye-Lund wrote:
> On Fri, Apr 1, 2011 at 10:05 PM, Jeff King <peff@peff.net> wrote:
> > On Fri, Apr 01, 2011 at 09:57:14PM +0200, Erik Faye-Lund wrote:
> >> > But this does not help the case at hand in any way. How would you
> >> > interrupt a thread that is blocked in ReadFile()? The point of
> >> > pthread_cancel() is that it interrupts blocked system calls
> >>
> >> There is no mention of such a guarantee in POSIX (section 2.9.5 Thread
> >> Cancellation), so relying on that is undefined behavior.
> >
> > Eh? My pthreads(7) says that read() is required to be a cancellation
> > point acrroding to POSIX. I didn't dig up the actual reference in the
> > standard, though.
>
> I don't understand where the implementor would get that from after
> reading through it, but if there's something I've missed we can fix it
> by replacing my pthread_cancel with this, no?
>
> static inline int pthread_cancel(pthread_t thread)
> {
> 	SetEvent(thread.cancel_event);
> 	CancelSynchronousIo(thread.handle);
> }

Do I deserve this? There *is* a function that does what we need. I get what I 
deserve when I don't study all 100,000 functions in the manual...

Wait a minute! I *did* study 100,000 functions. This is one of the 25,000 
functions that are new in Vista... Those I haven't studied, yet.

;)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-04-01 20:36                   ` Johannes Sixt
@ 2011-04-01 20:41                     ` Erik Faye-Lund
  0 siblings, 0 replies; 48+ messages in thread
From: Erik Faye-Lund @ 2011-04-01 20:41 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Jeff King, git

On Fri, Apr 1, 2011 at 10:36 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> On Freitag, 1. April 2011, Erik Faye-Lund wrote:
>> On Fri, Apr 1, 2011 at 10:05 PM, Jeff King <peff@peff.net> wrote:
>> > On Fri, Apr 01, 2011 at 09:57:14PM +0200, Erik Faye-Lund wrote:
>> >> > But this does not help the case at hand in any way. How would you
>> >> > interrupt a thread that is blocked in ReadFile()? The point of
>> >> > pthread_cancel() is that it interrupts blocked system calls
>> >>
>> >> There is no mention of such a guarantee in POSIX (section 2.9.5 Thread
>> >> Cancellation), so relying on that is undefined behavior.
>> >
>> > Eh? My pthreads(7) says that read() is required to be a cancellation
>> > point acrroding to POSIX. I didn't dig up the actual reference in the
>> > standard, though.
>>
>> I don't understand where the implementor would get that from after
>> reading through it, but if there's something I've missed we can fix it
>> by replacing my pthread_cancel with this, no?
>>
>> static inline int pthread_cancel(pthread_t thread)
>> {
>>       SetEvent(thread.cancel_event);
>>       CancelSynchronousIo(thread.handle);
>> }
>
> Do I deserve this? There *is* a function that does what we need. I get what I
> deserve when I don't study all 100,000 functions in the manual...
>

Hey, it's mentioned in the ReadFile documentation!

> Wait a minute! I *did* study 100,000 functions. This is one of the 25,000
> functions that are new in Vista... Those I haven't studied, yet.
>

Bah, Vista and up. That blows, we support WinXP as well! Well,
dynamically loading the function when it's there is better than doing
NOTHING, which was the previous alternative, no? :)

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-04-01 20:31                 ` Erik Faye-Lund
@ 2011-04-01 21:16                   ` Jeff King
  2011-04-02 12:27                     ` Erik Faye-Lund
  0 siblings, 1 reply; 48+ messages in thread
From: Jeff King @ 2011-04-01 21:16 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: Johannes Sixt, git

On Fri, Apr 01, 2011 at 10:31:42PM +0200, Erik Faye-Lund wrote:

> On Fri, Apr 1, 2011 at 10:18 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> > On Freitag, 1. April 2011, Erik Faye-Lund wrote:
> >> On Fri, Apr 1, 2011 at 9:42 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> >> > But this does not help the case at hand in any way. How would you
> >> > interrupt a thread that is blocked in ReadFile()? The point of
> >> > pthread_cancel() is that it interrupts blocked system calls
> >>
> >> There is no mention of such a guarantee in POSIX (section 2.9.5 Thread
> >> Cancellation), so relying on that is undefined behavior.
> >
> > In the paragraph before the bulleted list at the end of "Cancellation Points":
> >
> > "...If a thread has cancelability enabled and a cancellation request is made
> > with the thread as a target while the thread is suspended at a cancellation
> > point, the thread shall be awakened and the cancellation request shall be
> > acted upon..."
> >
> 
> A blocking thread and a suspended are two different matters. A
> suspended thread is a thread that has been explicitly suspended by
> wait, waitpid, sleep, pause etc. These functions explicitly say that
> they suspend the thread ("shall suspend the calling thread until"),
> while read etc does not ("shall block the calling thread until").
> 
> Similarly, making a blocking read/write fail (or terminate mid-way) is
> not the same thing as awakening the thread.
> 
> I see how some people can read something like this into this section,
> but I think it's pretty clear - this is not what it's talking about.
> In fact, the more I read the relevant texts, the more convinced I get
> that implementations that does terminate read/write strictly speaking
> is in violation of the standard.

What about the text right before the bit that Johannes quoted:

  The side effects of acting upon a cancellation request while suspended
  during a call of a function are the same as the side effects that may
  be seen in a single-threaded program when a call to a function is
  interrupted by a signal and the given function returns [EINTR]. Any
  such side effects occur before any cancellation cleanup handlers are
  called.

I agree it would be nicer if it explicitly said "when you are in a
function which is a cancellation point, pending cancellation requests
which are delivered are acuted upon immediately".

But it is implied to me by the surrounding text, and it's the only
sensible behavior IMHO. Plus it seems to be what at least glibc pthreads
does on Linux, so I'm going to assume that people smarter than me
thought about it and came up with the same interpretation.

My test program was:

-- >8 --
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>

void *child(void *data)
{
  char buf[32];
  int r;

  fprintf(stderr, "child reading from stdin...\n");
  r = read(0, buf, sizeof(buf));
  fprintf(stderr, "child read returned %d\n", r);
  return NULL;
}

int main(void)
{
  pthread_t t;
  void *r;

  pthread_create(&t, NULL, child, NULL);
  sleep(3);
  pthread_cancel(t);

  pthread_join(t, &r);
  if (r == PTHREAD_CANCELED)
    fprintf(stderr, "thread was canceled\n");
  else
    fprintf(stderr, "thread returned %p\n", r);

  return 0;
}
-- >8 --

If you input something before 3 seconds is up, the thread prints its
message and returns NULL. But if you let it go, the cancel interrupts
the read().

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/4] run-command: implement abort_async for pthreads
  2011-04-01 21:16                   ` Jeff King
@ 2011-04-02 12:27                     ` Erik Faye-Lund
  0 siblings, 0 replies; 48+ messages in thread
From: Erik Faye-Lund @ 2011-04-02 12:27 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Sixt, git

On Fri, Apr 1, 2011 at 11:16 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Apr 01, 2011 at 10:31:42PM +0200, Erik Faye-Lund wrote:
>
>> On Fri, Apr 1, 2011 at 10:18 PM, Johannes Sixt <j6t@kdbg.org> wrote:
>> > On Freitag, 1. April 2011, Erik Faye-Lund wrote:
>> >> On Fri, Apr 1, 2011 at 9:42 PM, Johannes Sixt <j6t@kdbg.org> wrote:
>> >> > But this does not help the case at hand in any way. How would you
>> >> > interrupt a thread that is blocked in ReadFile()? The point of
>> >> > pthread_cancel() is that it interrupts blocked system calls
>> >>
>> >> There is no mention of such a guarantee in POSIX (section 2.9.5 Thread
>> >> Cancellation), so relying on that is undefined behavior.
>> >
>> > In the paragraph before the bulleted list at the end of "Cancellation Points":
>> >
>> > "...If a thread has cancelability enabled and a cancellation request is made
>> > with the thread as a target while the thread is suspended at a cancellation
>> > point, the thread shall be awakened and the cancellation request shall be
>> > acted upon..."
>> >
>>
>> A blocking thread and a suspended are two different matters. A
>> suspended thread is a thread that has been explicitly suspended by
>> wait, waitpid, sleep, pause etc. These functions explicitly say that
>> they suspend the thread ("shall suspend the calling thread until"),
>> while read etc does not ("shall block the calling thread until").
>>
>> Similarly, making a blocking read/write fail (or terminate mid-way) is
>> not the same thing as awakening the thread.
>>
>> I see how some people can read something like this into this section,
>> but I think it's pretty clear - this is not what it's talking about.
>> In fact, the more I read the relevant texts, the more convinced I get
>> that implementations that does terminate read/write strictly speaking
>> is in violation of the standard.
>
> What about the text right before the bit that Johannes quoted:
>
>  The side effects of acting upon a cancellation request while suspended
>  during a call of a function are the same as the side effects that may
>  be seen in a single-threaded program when a call to a function is
>  interrupted by a signal and the given function returns [EINTR]. Any
>  such side effects occur before any cancellation cleanup handlers are
>  called.
>

Yeah, this is much closer, because it explicitly defines the behavior
to "fail" in the same way as when a signal interrupts a wait (which is
not simply awakening the thread). The text has the same problem for
this purpose as the one Johannes quoted; it talks about suspension,
not necessarily blocking. But it's mention of side-effects makes me
suspect that they mean blocking when they say suspension in this case,
because none of the functions that are documented as "blocking" seems
to have any side-effects in this case.

So yes, I think this is the most reasonable interpretation of this
paragraph. Thanks for making me re-read it :)

> I agree it would be nicer if it explicitly said "when you are in a
> function which is a cancellation point, pending cancellation requests
> which are delivered are acuted upon immediately".
>
> But it is implied to me by the surrounding text, and it's the only
> sensible behavior IMHO.

I tend to agree with you on this.

> Plus it seems to be what at least glibc pthreads
> does on Linux, so I'm going to assume that people smarter than me
> thought about it and came up with the same interpretation.
>
> My test program was:
>
> -- >8 --
> #include <pthread.h>
> #include <unistd.h>
> #include <stdio.h>
>
> void *child(void *data)
> {
>  char buf[32];
>  int r;
>
>  fprintf(stderr, "child reading from stdin...\n");
>  r = read(0, buf, sizeof(buf));
>  fprintf(stderr, "child read returned %d\n", r);
>  return NULL;
> }
>
> int main(void)
> {
>  pthread_t t;
>  void *r;
>
>  pthread_create(&t, NULL, child, NULL);
>  sleep(3);
>  pthread_cancel(t);
>
>  pthread_join(t, &r);
>  if (r == PTHREAD_CANCELED)
>    fprintf(stderr, "thread was canceled\n");
>  else
>    fprintf(stderr, "thread returned %p\n", r);
>
>  return 0;
> }
> -- >8 --
>
> If you input something before 3 seconds is up, the thread prints its
> message and returns NULL. But if you let it go, the cancel interrupts
> the read().
>

I'm not sure I agree that measured behavior is the same as defined
operation. But it does support the best theory we have.

So now we're left to figure out how to safely terminate a blocking
read on versions of Windows earlier than Windows Vista. Perhaps just
letting it time-out (assuming it does), and handle the cancellation at
the end of read() is acceptable (when there's no support for
CancelSynchronousIo, that is)? After all, this deadlock hasn't been
observed on threaded implementations making this issue kind of
theoretical on Windows, no? Also, this seems to be roughly how
pthreads-win32 implements cancellation...

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] send-pack: abort sideband demuxer on pack-objects error
  2011-03-31 18:44 ` [PATCH 4/4] send-pack: abort sideband demuxer on pack-objects error Jeff King
@ 2011-04-13 19:53   ` Johannes Sixt
  2011-04-14 13:54     ` Jeff King
  0 siblings, 1 reply; 48+ messages in thread
From: Johannes Sixt @ 2011-04-13 19:53 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Donnerstag, 31. März 2011, Jeff King wrote:
> If pack-objects dies prematurely (for example, because there
> is some repo corruption), we are careful to clean up the
> sideband demuxer (if it is being used) with finish_async.
> However, for an async implementation which forks (i.e., not
> Windows), that means we will waitpid() for the async
> process.
>
> Meanwhile, the async sideband demuxer will continue trying
> to stream data from the remote repo until it gets EOF.
> Depending on what data pack-objects actually sent, the
> remote repo may not actually send anything (e.g., if we sent
> nothing and it is simply waiting for the pack). This leads
> to deadlock cycle in which send-pack waits on the demuxer,
> the demuxer waits on the remote receive-pack, and the remote
> receive-pack waits on send-pack to send the pack data.

This is an indication that a writable end of the pipe between send-pack and 
receive-pack is still open. This fixes the deadlock for me without having to 
kill the demuxer explicitly:

diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 5e772c7..db32ded 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -229,6 +229,9 @@ static void print_helper_status(struct ref *ref)
 static int sideband_demux(int in, int out, void *data)
 {
 	int *fd = data;
+#ifdef NO_PTHREADS
+	close(fd[1]);
+#endif
 	int ret = recv_sideband("send-pack", fd[0], out);
 	close(out);
 	return ret;

If only I had a brilliant idea how to forge this into a re-usable pattern...

-- Hannes

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] send-pack: abort sideband demuxer on pack-objects error
  2011-04-13 19:53   ` Johannes Sixt
@ 2011-04-14 13:54     ` Jeff King
  2011-04-14 19:36       ` Johannes Sixt
  0 siblings, 1 reply; 48+ messages in thread
From: Jeff King @ 2011-04-14 13:54 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

On Wed, Apr 13, 2011 at 09:53:06PM +0200, Johannes Sixt wrote:

> > Meanwhile, the async sideband demuxer will continue trying
> > to stream data from the remote repo until it gets EOF.
> > Depending on what data pack-objects actually sent, the
> > remote repo may not actually send anything (e.g., if we sent
> > nothing and it is simply waiting for the pack). This leads
> > to deadlock cycle in which send-pack waits on the demuxer,
> > the demuxer waits on the remote receive-pack, and the remote
> > receive-pack waits on send-pack to send the pack data.
> 
> This is an indication that a writable end of the pipe between send-pack and 
> receive-pack is still open. This fixes the deadlock for me without having to 
> kill the demuxer explicitly:
> 
> diff --git a/builtin/send-pack.c b/builtin/send-pack.c
> index 5e772c7..db32ded 100644
> --- a/builtin/send-pack.c
> +++ b/builtin/send-pack.c
> @@ -229,6 +229,9 @@ static void print_helper_status(struct ref *ref)
>  static int sideband_demux(int in, int out, void *data)
>  {
>  	int *fd = data;
> +#ifdef NO_PTHREADS
> +	close(fd[1]);
> +#endif
>  	int ret = recv_sideband("send-pack", fd[0], out);
>  	close(out);
>  	return ret;
> 
> If only I had a brilliant idea how to forge this into a re-usable pattern...

Thanks for finding that. I had the notion that there was a pipe end
hanging open somewhere, but looking through the async code, I found us
closing the pipes properly. But of course I failed to check the fds
coming into send_pack.

Obviously it totally breaks the start_async abstraction if the called
code needs to care whether it forked or not. But we can use that to our
advantage, since it means start_async callers must assume the interface
is very limited.  So I think we can do something like:

  1. Async code declares which file descriptors it cares about. This
     would automatically include the pipe we give to it, of course.
     So the declared ones for a sideband demuxer would be stderr, and
     some network fd for reading.

  2. In the pthreads case, we do nothing. In the forked case, the child
     closes every descriptor except the "interesting" ones.

And that solves this problem, and the general case that async-callers
have no idea if they have just leaked pipe descriptors in the forked
case.

I'm still slightly confused, though, because I never see that descriptor
get closed in the threaded case. So I still don't understand why it
_doesn't_ deadlock with pthreads.

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] send-pack: abort sideband demuxer on pack-objects error
  2011-04-14 13:54     ` Jeff King
@ 2011-04-14 19:36       ` Johannes Sixt
  2011-04-14 20:21         ` Jeff King
  0 siblings, 1 reply; 48+ messages in thread
From: Johannes Sixt @ 2011-04-14 19:36 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Donnerstag, 14. April 2011, Jeff King wrote:
> Obviously it totally breaks the start_async abstraction if the called
> code needs to care whether it forked or not. But we can use that to our
> advantage, since it means start_async callers must assume the interface
> is very limited.  So I think we can do something like:
>
>   1. Async code declares which file descriptors it cares about. This
>      would automatically include the pipe we give to it, of course.
>      So the declared ones for a sideband demuxer would be stderr, and
>      some network fd for reading.
>
>   2. In the pthreads case, we do nothing. In the forked case, the child
>      closes every descriptor except the "interesting" ones.
>
> And that solves this problem, and the general case that async-callers
> have no idea if they have just leaked pipe descriptors in the forked
> case.

Sounds like a plan. How do you close all file descriptors? Just iterate up to 
getrlimit(RLIMIT_NOFILE)?

>
> I'm still slightly confused, though, because I never see that descriptor
> get closed in the threaded case. So I still don't understand why it
> _doesn't_ deadlock with pthreads.

In the threaded case, this fd is closed by start_command(), where it is passed 
as po.out in pack_objects(). In the fork case this is too late because a 
duplicate was already inherited to the sideband demuxer.

However, pack_objects() works differently in the stateless_rpc case: then it 
does not close fd anywhere, and I think it should be possible to construct a 
similar case that hangs even in the threaded case. And the fix could simply 
look like this:

diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 5e772c7..c8f601f 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -101,6 +101,7 @@ static int pack_objects(int fd, struct ref *refs,
 		free(buf);
 		close(po.out);
 		po.out = -1;
+		close(fd);
 	}
 
 	if (finish_command(&po))

-- Hannes

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] send-pack: abort sideband demuxer on pack-objects error
  2011-04-14 19:36       ` Johannes Sixt
@ 2011-04-14 20:21         ` Jeff King
  2011-04-14 20:43           ` Johannes Sixt
  0 siblings, 1 reply; 48+ messages in thread
From: Jeff King @ 2011-04-14 20:21 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

On Thu, Apr 14, 2011 at 09:36:25PM +0200, Johannes Sixt wrote:

> On Donnerstag, 14. April 2011, Jeff King wrote:
> > Obviously it totally breaks the start_async abstraction if the called
> > code needs to care whether it forked or not. But we can use that to our
> > advantage, since it means start_async callers must assume the interface
> > is very limited.  So I think we can do something like:
> >
> >   1. Async code declares which file descriptors it cares about. This
> >      would automatically include the pipe we give to it, of course.
> >      So the declared ones for a sideband demuxer would be stderr, and
> >      some network fd for reading.
> >
> >   2. In the pthreads case, we do nothing. In the forked case, the child
> >      closes every descriptor except the "interesting" ones.
> >
> > And that solves this problem, and the general case that async-callers
> > have no idea if they have just leaked pipe descriptors in the forked
> > case.
> 
> Sounds like a plan. How do you close all file descriptors? Just iterate up to 
> getrlimit(RLIMIT_NOFILE)?

Sadly, yes, I think that is what we would have to do. It does feel like
an awful hack. And it will interact badly with things like valgrind,
which open descriptors behind the scenes (but can properly handle
the forking).

I just don't see another way around it for the general case.  The
"usual" fix for this sort of thing is that the descriptors should have
close-on-exec set, but that doesn't work for us here, because we are
only forking.

It's sufficiently ugly (and still possible to break in the pthreads
case!) that it may be worth not worrying about the general case at all,
and just fixing this one with the explicit close.

> > I'm still slightly confused, though, because I never see that descriptor
> > get closed in the threaded case. So I still don't understand why it
> > _doesn't_ deadlock with pthreads.
> 
> In the threaded case, this fd is closed by start_command(), where it is passed 
> as po.out in pack_objects(). In the fork case this is too late because a 
> duplicate was already inherited to the sideband demuxer.

Hrm, I see the code now. That seems like an odd thing to do to me.
Doesn't it disallow:

  /* set up a command */
  const char **argv = { "some", "command" };
  struct child_process c;
  c.argv = argv;
  c.out = fd;

  /* run it */
  run_command(&c);

  /* now tack our own output to the end */
  write(fd, "foo", 3);

And even weirder, we only do the close for high file descriptors. So you
_can_ do that above if "fd" is stdout, but not with an arbitrary fd.

I guess it is neither here nor there with respect to this problem; it
clearly is not something we want to do a lot, as it doesn't seem to have
come up.

But at least it explains what's going on here in the threaded case.

> However, pack_objects() works differently in the stateless_rpc case: then it 
> does not close fd anywhere, and I think it should be possible to construct a 
> similar case that hangs even in the threaded case. And the fix could simply 
> look like this:
> 
> diff --git a/builtin/send-pack.c b/builtin/send-pack.c
> index 5e772c7..c8f601f 100644
> --- a/builtin/send-pack.c
> +++ b/builtin/send-pack.c
> @@ -101,6 +101,7 @@ static int pack_objects(int fd, struct ref *refs,
>  		free(buf);
>  		close(po.out);
>  		po.out = -1;
> +		close(fd);
>  	}
>  
>  	if (finish_command(&po))

Yeah, from my reading of the code, you are right.

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] send-pack: abort sideband demuxer on pack-objects error
  2011-04-14 20:21         ` Jeff King
@ 2011-04-14 20:43           ` Johannes Sixt
  2011-04-14 20:51             ` Jeff King
  0 siblings, 1 reply; 48+ messages in thread
From: Johannes Sixt @ 2011-04-14 20:43 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Donnerstag, 14. April 2011, Jeff King wrote:
> On Thu, Apr 14, 2011 at 09:36:25PM +0200, Johannes Sixt wrote:
> > On Donnerstag, 14. April 2011, Jeff King wrote:
> > > Obviously it totally breaks the start_async abstraction if the called
> > > code needs to care whether it forked or not. But we can use that to our
> > > advantage, since it means start_async callers must assume the interface
> > > is very limited.  So I think we can do something like:
> > >
> > >   1. Async code declares which file descriptors it cares about. This
> > >      would automatically include the pipe we give to it, of course.
> > >      So the declared ones for a sideband demuxer would be stderr, and
> > >      some network fd for reading.
> > >
> > >   2. In the pthreads case, we do nothing. In the forked case, the child
> > >      closes every descriptor except the "interesting" ones.
> > >
> > > And that solves this problem, and the general case that async-callers
> > > have no idea if they have just leaked pipe descriptors in the forked
> > > case.
> >
> > Sounds like a plan. How do you close all file descriptors? Just iterate
> > up to getrlimit(RLIMIT_NOFILE)?
>
> Sadly, yes, I think that is what we would have to do. It does feel like
> an awful hack. And it will interact badly with things like valgrind,
> which open descriptors behind the scenes (but can properly handle
> the forking).
>
> I just don't see another way around it for the general case.  The
> "usual" fix for this sort of thing is that the descriptors should have
> close-on-exec set, but that doesn't work for us here, because we are
> only forking.
>
> It's sufficiently ugly (and still possible to break in the pthreads
> case!) that it may be worth not worrying about the general case at all,
> and just fixing this one with the explicit close.
>
> > > I'm still slightly confused, though, because I never see that
> > > descriptor get closed in the threaded case. So I still don't understand
> > > why it _doesn't_ deadlock with pthreads.
> >
> > In the threaded case, this fd is closed by start_command(), where it is
> > passed as po.out in pack_objects(). In the fork case this is too late
> > because a duplicate was already inherited to the sideband demuxer.
>
> Hrm, I see the code now. That seems like an odd thing to do to me.

Why so? It's a matter of resource ownership: If you pass a positive value, you 
give away ownership; if you pass -1, you gain ownership; if you pass 0, 
ownership remains unchanged.

> Doesn't it disallow:
>
>   /* set up a command */
>   const char **argv = { "some", "command" };
>   struct child_process c;
>   c.argv = argv;
>   c.out = fd;
>
>   /* run it */
>   run_command(&c);
>
>   /* now tack our own output to the end */
>   write(fd, "foo", 3);

You would have to dup() the fd before run_command().

> And even weirder, we only do the close for high file descriptors. So you
> _can_ do that above if "fd" is stdout, but not with an arbitrary fd.

Ah, right, that's a bit dubious. The reason is that if you want to tell the 
child process to use the parent's stdout for its own stdout, you specify 0 
aka "no special treatement", i.e. just inherit from the parent, not 1. IOW, 1 
is never a sane candidate to be assigned to c.out.

-- Hannes

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] send-pack: abort sideband demuxer on pack-objects error
  2011-04-14 20:43           ` Johannes Sixt
@ 2011-04-14 20:51             ` Jeff King
  2011-04-14 21:05               ` Johannes Sixt
  2011-04-14 21:21               ` Junio C Hamano
  0 siblings, 2 replies; 48+ messages in thread
From: Jeff King @ 2011-04-14 20:51 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git

On Thu, Apr 14, 2011 at 10:43:33PM +0200, Johannes Sixt wrote:

> > > In the threaded case, this fd is closed by start_command(), where it is
> > > passed as po.out in pack_objects(). In the fork case this is too late
> > > because a duplicate was already inherited to the sideband demuxer.
> >
> > Hrm, I see the code now. That seems like an odd thing to do to me.
> 
> Why so? It's a matter of resource ownership: If you pass a positive value, you 
> give away ownership; if you pass -1, you gain ownership; if you pass 0, 
> ownership remains unchanged.

I can see how that is useful. Mostly I was just surprised, because I
wouldn't expect ownership to be transferred there.

> > Doesn't it disallow:
> >
> >   /* set up a command */
> >   const char **argv = { "some", "command" };
> >   struct child_process c;
> >   c.argv = argv;
> >   c.out = fd;
> >
> >   /* run it */
> >   run_command(&c);
> >
> >   /* now tack our own output to the end */
> >   write(fd, "foo", 3);
> 
> You would have to dup() the fd before run_command().

True. That makes it less of a big deal, because for the times that you
don't want full ownership transferred, you can work around it.

> > And even weirder, we only do the close for high file descriptors. So you
> > _can_ do that above if "fd" is stdout, but not with an arbitrary fd.
> 
> Ah, right, that's a bit dubious. The reason is that if you want to tell the 
> child process to use the parent's stdout for its own stdout, you specify 0 
> aka "no special treatement", i.e. just inherit from the parent, not 1. IOW, 1 
> is never a sane candidate to be assigned to c.out.

Fair enough.

So what do you want to do about the fd that needs closing? The options
I see are:

  1. Try for a general solution. That probably means the "close every
     descriptor in the child" hackiness that I mentioned earlier.

  2. Fix this case by having the async code close it if it was forked.
     It needs to know whether we forked, so we can:

       a. Use NO_PTHREADS. Easy and simple, though it does break
          start_async's abstraction a bit.

       b. Have start_async pass in a flag telling what happened. This
          really breaks the abstraction very similarly to (a), but it
          makes the connection more explicit.

I think I am leaning a bit towards (2a). It's simple, and it's not like
this is library code with a million unknown callers; fixing it simply
and cleanly with a nice commit message is probably sufficient.

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] send-pack: abort sideband demuxer on pack-objects error
  2011-04-14 20:51             ` Jeff King
@ 2011-04-14 21:05               ` Johannes Sixt
  2011-04-14 21:21               ` Junio C Hamano
  1 sibling, 0 replies; 48+ messages in thread
From: Johannes Sixt @ 2011-04-14 21:05 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Donnerstag, 14. April 2011, Jeff King wrote:
> So what do you want to do about the fd that needs closing? The options
> I see are:
>
>   1. Try for a general solution. That probably means the "close every
>      descriptor in the child" hackiness that I mentioned earlier.
>
>   2. Fix this case by having the async code close it if it was forked.
>      It needs to know whether we forked, so we can:
>
>        a. Use NO_PTHREADS. Easy and simple, though it does break
>           start_async's abstraction a bit.
>
>        b. Have start_async pass in a flag telling what happened. This
>           really breaks the abstraction very similarly to (a), but it
>           makes the connection more explicit.
>
> I think I am leaning a bit towards (2a). It's simple, and it's not like
> this is library code with a million unknown callers; fixing it simply
> and cleanly with a nice commit message is probably sufficient.

(2a) would be good enough for my taste, too!

-- Hannes

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 4/4] send-pack: abort sideband demuxer on pack-objects error
  2011-04-14 20:51             ` Jeff King
  2011-04-14 21:05               ` Johannes Sixt
@ 2011-04-14 21:21               ` Junio C Hamano
  2011-04-24 20:42                 ` [PATCH/RFC 1/2] send-pack --stateless-rpc: properly close the outgoing channel Johannes Sixt
  1 sibling, 1 reply; 48+ messages in thread
From: Junio C Hamano @ 2011-04-14 21:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Sixt, git

Jeff King <peff@peff.net> writes:

> I think I am leaning a bit towards (2a). It's simple, and it's not like
> this is library code with a million unknown callers; fixing it simply
> and cleanly with a nice commit message is probably sufficient.

This really sounds like a plan.  Even if we _might_ later want to go to
2b. or some other solution, we will know what pattern to grep for.

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH/RFC 1/2] send-pack --stateless-rpc: properly close the outgoing channel
  2011-04-14 21:21               ` Junio C Hamano
@ 2011-04-24 20:42                 ` Johannes Sixt
  2011-04-24 20:49                   ` [PATCH 2/2] send-pack: avoid deadlock when pack-object dies early Johannes Sixt
  2011-04-25 16:40                   ` [PATCH/RFC 1/2] send-pack --stateless-rpc: properly close the outgoing channel Jeff King
  0 siblings, 2 replies; 48+ messages in thread
From: Johannes Sixt @ 2011-04-24 20:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git

In the non-stateless-rpc case, the writable end of the channel to the
remote repo is closed by the start_command() call that runs the
pack-objects process (after pack-objects inherited a copy). But in the
--stateless-rpc case, where send-pack takes care of writing data to the
channel, this was missed.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
Am 14.04.2011 23:21, schrieb Junio C Hamano:
> Jeff King <peff@peff.net> writes:
> 
>> I think I am leaning a bit towards (2a). It's simple, and it's not like
>> this is library code with a million unknown callers; fixing it simply
>> and cleanly with a nice commit message is probably sufficient.
> 
> This really sounds like a plan.  Even if we _might_ later want to go to
> 2b. or some other solution, we will know what pattern to grep for.

Here's a 2-patch series that implements this plan. The patches go on top of
38a81b4e (receive-pack: Wrap status reports inside side-band-64k) just like
Jeff's series (jk/maint-push-async-hang).

Warning: This patch is untested. Furthermore, it does not even fix a resource
leak because the fd that is now closed in pack_objects() would be closed
later in cmd_send_pack. However, without closing the fd earlier like this,
a --stateless-rpc invocation could theoretically dead-lock just like a regular
invocation in a NO_PTHREADS build. But I also don't know how to test-drive
send-pack --stateless-rpc to construct such a case. Any hints how to do that
would be appreciated.

-- Hannes

 builtin-send-pack.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/builtin-send-pack.c b/builtin-send-pack.c
index 2478e18..089058b 100644
--- a/builtin-send-pack.c
+++ b/builtin-send-pack.c
@@ -97,6 +97,7 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
 		free(buf);
 		close(po.out);
 		po.out = -1;
+		close(fd);
 	}
 
 	if (finish_command(&po))
-- 
1.7.5.rc1.97.ge0653

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH 2/2] send-pack: avoid deadlock when pack-object dies early
  2011-04-24 20:42                 ` [PATCH/RFC 1/2] send-pack --stateless-rpc: properly close the outgoing channel Johannes Sixt
@ 2011-04-24 20:49                   ` Johannes Sixt
  2011-04-25 16:50                     ` Jeff King
  2011-04-25 16:40                   ` [PATCH/RFC 1/2] send-pack --stateless-rpc: properly close the outgoing channel Jeff King
  1 sibling, 1 reply; 48+ messages in thread
From: Johannes Sixt @ 2011-04-24 20:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git

When pack-objects dies prematurely (for example, because there is some
repo corruption), we are careful to clean up the sideband demuxer (if it
is being used) with finish_async. For an async implementation which forks
(i.e., not Windows), that means we will waitpid() for the async process.

Meanwhile, the async sideband demuxer will continue trying to stream data
from the remote repo until it gets EOF. Depending on what data
pack-objects actually sent, the remote repo may not actually send
anything, in particular, after the initial rev-exchange it is waiting for
the pack data to arrive.

The send-pack parent process closes the writable end of the channel so
that after the death of the pack-objects process all writable ends should
have been closed and the remote repo should see EOF. This does not
happen, however, because when the sideband demuxer was forked earlier, it
also inherited a writable end; it remains open and keeps the remote repo
from seeing EOF. This leads to a deadlock cycle in which send-pack waits
on the demuxer, the demuxer waits on the remote receive-pack, and the
remote receive-pack waits on send-pack to send the pack data.

To break this, we close the writable end in the demuxer, but only when
it runs in a forked async process.

Analyzed-by: Jeff King <peff@peff.net>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
This is the replacement of the series that Jeff proposed earlier. It
should also be merged into f6b60983 (Enable threaded async procedures
whenever pthreads is available). To keep bisectability, the merge commit
should replace '#ifndef ASYNC_AS_THREAD' by '#ifdef NO_PTHREADS'.

-- Hannes

 builtin-send-pack.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/builtin-send-pack.c b/builtin-send-pack.c
index 089058b..b371c79 100644
--- a/builtin-send-pack.c
+++ b/builtin-send-pack.c
@@ -376,6 +376,9 @@ static void print_helper_status(struct ref *ref)
 static int sideband_demux(int in, int out, void *data)
 {
 	int *fd = data;
+#ifndef ASYNC_AS_THREAD
+	close(fd[1]);
+#endif
 	int ret = recv_sideband("send-pack", fd[0], out);
 	close(out);
 	return ret;
-- 
1.7.5.rc1.97.ge0653

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH/RFC 1/2] send-pack --stateless-rpc: properly close the outgoing channel
  2011-04-24 20:42                 ` [PATCH/RFC 1/2] send-pack --stateless-rpc: properly close the outgoing channel Johannes Sixt
  2011-04-24 20:49                   ` [PATCH 2/2] send-pack: avoid deadlock when pack-object dies early Johannes Sixt
@ 2011-04-25 16:40                   ` Jeff King
  1 sibling, 0 replies; 48+ messages in thread
From: Jeff King @ 2011-04-25 16:40 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, git

On Sun, Apr 24, 2011 at 10:42:20PM +0200, Johannes Sixt wrote:

> In the non-stateless-rpc case, the writable end of the channel to the
> remote repo is closed by the start_command() call that runs the
> pack-objects process (after pack-objects inherited a copy). But in the
> --stateless-rpc case, where send-pack takes care of writing data to the
> channel, this was missed.
>
> [...]
>
> Warning: This patch is untested. Furthermore, it does not even fix a resource
> leak because the fd that is now closed in pack_objects() would be closed
> later in cmd_send_pack.

Note that we can also call send_pack directly from git-push via the
transport.c interface. I didn't check whether one can actually trigger
stateless-rpc this way, though; it looks like git-remote-http ends up
exec'ing a separate send-pack.

> However, without closing the fd earlier like this, a --stateless-rpc
> invocation could theoretically dead-lock just like a regular
> invocation in a NO_PTHREADS build. But I also don't know how to
> test-drive send-pack --stateless-rpc to construct such a case. Any
> hints how to do that would be appreciated.

I was able to get a hang using v1.7.5 compiled with pthreads. You need
to have a server accepting smart-http push (if you use github, you can
create an empty test repo there, which is sufficient).

And then do a modified version of the test I posted earlier:

  UPSTREAM=https://peff@github.com/peff/test.git

  git init child &&
  cd child &&
  git remote add origin $UPSTREAM &&
  echo content >file &&
  git add file &&
  git commit -m one &&
  echo content >>file &&
  git add file &&
  git commit -m two &&
  sha1=`git rev-parse HEAD:file` &&
  file=`echo $sha1 | sed 's,..,&/,'` &&
  rm -fv .git/objects/$file

where obviously you need to tweak $UPSTREAM to point to the repo you
created.  That sets up the broken repo state. You can then try "git
push" in the repo with various versions to check their behavior.

With stock v1.7.5, this will hang after pack-objects reports the fatal
error. With your patch, it exits immediately, though the output looks
like this:

  $ git push
  Password:
  Counting objects: 5, done.
  error: unable to find ea0faeb6073ff6cb085727c3647be457051e6ed7
  fatal: unable to read ea0faeb6073ff6cb085727c3647be457051e6ed7
  fatal: The remote end hung up unexpectedly
  fatal: The remote end hung up unexpectedly
  fatal: write error: Bad file descriptor

which could perhaps be a little nicer, but is probably not a big deal (I
didn't dig very deep, but presumably we should exit a little more
immediately after seeing pack-objects fail).

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] send-pack: avoid deadlock when pack-object dies early
  2011-04-24 20:49                   ` [PATCH 2/2] send-pack: avoid deadlock when pack-object dies early Johannes Sixt
@ 2011-04-25 16:50                     ` Jeff King
  2011-04-25 17:41                       ` Johannes Sixt
  2011-04-25 21:04                       ` [PATCH v2] " Johannes Sixt
  0 siblings, 2 replies; 48+ messages in thread
From: Jeff King @ 2011-04-25 16:50 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, git

On Sun, Apr 24, 2011 at 10:49:17PM +0200, Johannes Sixt wrote:

> diff --git a/builtin-send-pack.c b/builtin-send-pack.c
> index 089058b..b371c79 100644
> --- a/builtin-send-pack.c
> +++ b/builtin-send-pack.c
> @@ -376,6 +376,9 @@ static void print_helper_status(struct ref *ref)
>  static int sideband_demux(int in, int out, void *data)
>  {
>  	int *fd = data;
> +#ifndef ASYNC_AS_THREAD
> +	close(fd[1]);
> +#endif

In the comments for 1/2, you said this goes directly on 38a81b4e. But in
that commit, we use #ifndef WIN32 to decide whether or not to fork for
async code. So shouldn't this use the same test (I don't even see
ASYNC_AS_THREAD defined anywhere else)?

And of course in more modern versions, it should be NO_PTHREADS, as you
noted.

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] send-pack: avoid deadlock when pack-object dies early
  2011-04-25 16:50                     ` Jeff King
@ 2011-04-25 17:41                       ` Johannes Sixt
  2011-04-25 17:51                         ` Junio C Hamano
  2011-04-25 21:04                       ` [PATCH v2] " Johannes Sixt
  1 sibling, 1 reply; 48+ messages in thread
From: Johannes Sixt @ 2011-04-25 17:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

Am 25.04.2011 18:50, schrieb Jeff King:
> On Sun, Apr 24, 2011 at 10:49:17PM +0200, Johannes Sixt wrote:
> 
>> diff --git a/builtin-send-pack.c b/builtin-send-pack.c
>> index 089058b..b371c79 100644
>> --- a/builtin-send-pack.c
>> +++ b/builtin-send-pack.c
>> @@ -376,6 +376,9 @@ static void print_helper_status(struct ref *ref)
>>  static int sideband_demux(int in, int out, void *data)
>>  {
>>  	int *fd = data;
>> +#ifndef ASYNC_AS_THREAD
>> +	close(fd[1]);
>> +#endif
> 
> In the comments for 1/2, you said this goes directly on 38a81b4e. But in
> that commit, we use #ifndef WIN32 to decide whether or not to fork for
> async code. So shouldn't this use the same test (I don't even see
> ASYNC_AS_THREAD defined anywhere else)?

Oops, you are right. I was looking at f6b60983, the one that the
two-patch series should be merged to; there, we remove ASYNC_AS_THREAD
and replace it by NO_PTHREADS. Therefore, I assumed that we use the
former symbol to decide whether to use threaded async procedures.
Obviously, the symbol was introduced only later. Will resend.

-- Hannes

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 2/2] send-pack: avoid deadlock when pack-object dies early
  2011-04-25 17:41                       ` Johannes Sixt
@ 2011-04-25 17:51                         ` Junio C Hamano
  0 siblings, 0 replies; 48+ messages in thread
From: Junio C Hamano @ 2011-04-25 17:51 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Jeff King, git

Johannes Sixt <j6t@kdbg.org> writes:

> ... Therefore, I assumed that we use the
> former symbol to decide whether to use threaded async procedures.
> Obviously, the symbol was introduced only later. Will resend.

Thanks.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v2] send-pack: avoid deadlock when pack-object dies early
  2011-04-25 16:50                     ` Jeff King
  2011-04-25 17:41                       ` Johannes Sixt
@ 2011-04-25 21:04                       ` Johannes Sixt
  2011-04-26  8:23                         ` Jeff King
  1 sibling, 1 reply; 48+ messages in thread
From: Johannes Sixt @ 2011-04-25 21:04 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

Send-pack deadlocks in two ways when pack-object dies early (for example,
because there is some repo corruption).

The first deadlock happens with the smart push protocol (--stateless-rpc).
After the initial rev-exchange, the remote is waiting for the pack data
to arrive, and the sideband demuxer at the local side continues trying to
stream data from the remote repository until it gets EOF. Meanwhile,
send-pack (in function pack_objects()) has noticed that pack-objects did
not produce output and died. Back in send_pack(), it now tries to clean
up the sideband demuxer using finish_async(). The demuxer, however, waits
for the remote end to close down, the remote waits for pack data, and
the reason that it still waits is that send-pack forgot to close the
outgoing channel. Add the missing close() in pack_objects().

The second deadlock happens in a similar constellation when the sideband
demuxer runs in a forked process (rather than in a thread). Again, the
remote end waits for pack data to arrive, the sideband demuxer waits for
the remote to shut down, and send-pack (in the regular clean-up) waits for
the demuxer to terminate. This time, the send-pack parent process closes
the writable end of the outgoing channel (in start_command() that spawned
pack-objects) so that after the death of the pack-objects process all
writable ends should have been closed and the remote repo should see EOF.
This does not happen, however, because when the sideband demuxer was forked
earlier, it also inherited a writable end; it remains open and keeps the
remote repo from seeing EOF. To break this deadlock, close the writable end
in the demuxer.

Analyzed-by: Jeff King <peff@peff.net>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
---
Am 25.04.2011 18:50, schrieb Jeff King:
> In the comments for 1/2, you said this goes directly on 38a81b4e. But in
> that commit, we use #ifndef WIN32 to decide whether or not to fork for
> async code. So shouldn't this use the same test (I don't even see
> ASYNC_AS_THREAD defined anywhere else)?

Here's the fixed patch. I squashed both earlier patches into a single patch
because they are about the same topic, as you showed with your tests of
git-push via smart http.

Again, this should go on top of 38a81b4e. When it is merged to f6b60983 or
later, the '#ifndef WIN32' must be changed to '#ifdef NO_PTHREADS'.

-- Hannes

 builtin-send-pack.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/builtin-send-pack.c b/builtin-send-pack.c
index 2478e18..6516288 100644
--- a/builtin-send-pack.c
+++ b/builtin-send-pack.c
@@ -97,6 +97,7 @@ static int pack_objects(int fd, struct ref *refs, struct extra_have_objects *ext
 		free(buf);
 		close(po.out);
 		po.out = -1;
+		close(fd);
 	}

 	if (finish_command(&po))
@@ -375,6 +376,9 @@ static void print_helper_status(struct ref *ref)
 static int sideband_demux(int in, int out, void *data)
 {
 	int *fd = data;
+#ifndef WIN32
+	close(fd[1]);
+#endif
 	int ret = recv_sideband("send-pack", fd[0], out);
 	close(out);
 	return ret;
-- 
1.7.5.rc1.97.ge0653

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v2] send-pack: avoid deadlock when pack-object dies early
  2011-04-25 21:04                       ` [PATCH v2] " Johannes Sixt
@ 2011-04-26  8:23                         ` Jeff King
  0 siblings, 0 replies; 48+ messages in thread
From: Jeff King @ 2011-04-26  8:23 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, git

On Mon, Apr 25, 2011 at 11:04:10PM +0200, Johannes Sixt wrote:

> > In the comments for 1/2, you said this goes directly on 38a81b4e. But in
> > that commit, we use #ifndef WIN32 to decide whether or not to fork for
> > async code. So shouldn't this use the same test (I don't even see
> > ASYNC_AS_THREAD defined anywhere else)?
> 
> Here's the fixed patch. I squashed both earlier patches into a single patch
> because they are about the same topic, as you showed with your tests of
> git-push via smart http.

Thanks, this one looks good to me, and I think the squash is sensible.

> Again, this should go on top of 38a81b4e. When it is merged to f6b60983 or
> later, the '#ifndef WIN32' must be changed to '#ifdef NO_PTHREADS'.

I see Junio has an evil merge in pu that takes care of this.

I wonder if it might have been nicer to "cherry-pick -n" the old fix on
top of master, fixing up the tree to use NO_PTHREADS, and then resolving
the ensuing merge conflict in favor of the cherry-picked version. . I
guess it is just a matter of style (the fact that a few of our
archaeology tools do not find content in evil merges very well is a
downside, but that just means we should fix the tools :) ).

-Peff

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2011-04-26  8:23 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-03-31 18:42 [PATCH 0/4] fix hang in git push when pack-objects fails Jeff King
2011-03-31 18:43 ` [PATCH 1/4] teach wait_or_whine a "quiet" mode Jeff King
2011-03-31 20:56   ` Johannes Sixt
2011-04-01  1:35     ` Jeff King
2011-03-31 18:44 ` [PATCH 2/4] finish_async: be quiet when waiting for async process Jeff King
2011-03-31 18:44 ` [PATCH 3/4] run-command: allow aborting async code prematurely Jeff King
2011-04-01  9:36   ` Erik Faye-Lund
2011-04-01 13:59     ` Jeff King
2011-03-31 18:44 ` [PATCH 4/4] send-pack: abort sideband demuxer on pack-objects error Jeff King
2011-04-13 19:53   ` Johannes Sixt
2011-04-14 13:54     ` Jeff King
2011-04-14 19:36       ` Johannes Sixt
2011-04-14 20:21         ` Jeff King
2011-04-14 20:43           ` Johannes Sixt
2011-04-14 20:51             ` Jeff King
2011-04-14 21:05               ` Johannes Sixt
2011-04-14 21:21               ` Junio C Hamano
2011-04-24 20:42                 ` [PATCH/RFC 1/2] send-pack --stateless-rpc: properly close the outgoing channel Johannes Sixt
2011-04-24 20:49                   ` [PATCH 2/2] send-pack: avoid deadlock when pack-object dies early Johannes Sixt
2011-04-25 16:50                     ` Jeff King
2011-04-25 17:41                       ` Johannes Sixt
2011-04-25 17:51                         ` Junio C Hamano
2011-04-25 21:04                       ` [PATCH v2] " Johannes Sixt
2011-04-26  8:23                         ` Jeff King
2011-04-25 16:40                   ` [PATCH/RFC 1/2] send-pack --stateless-rpc: properly close the outgoing channel Jeff King
2011-03-31 18:45 ` [PATCH 5/4] run-command: implement abort_async for pthreads Jeff King
2011-04-01  9:41   ` Erik Faye-Lund
2011-04-01 10:15     ` Erik Faye-Lund
2011-04-01 17:27       ` Johannes Sixt
2011-04-01 17:38         ` Jeff King
2011-04-01 19:26         ` Erik Faye-Lund
2011-04-01 19:33           ` Erik Faye-Lund
2011-04-01 19:42           ` Johannes Sixt
2011-04-01 19:57             ` Erik Faye-Lund
2011-04-01 20:05               ` Jeff King
2011-04-01 20:13                 ` Erik Faye-Lund
2011-04-01 20:17                   ` Jeff King
2011-04-01 20:18                     ` Jeff King
2011-04-01 20:34                     ` Erik Faye-Lund
2011-04-01 20:36                   ` Johannes Sixt
2011-04-01 20:41                     ` Erik Faye-Lund
2011-04-01 20:18               ` Johannes Sixt
2011-04-01 20:31                 ` Erik Faye-Lund
2011-04-01 21:16                   ` Jeff King
2011-04-02 12:27                     ` Erik Faye-Lund
2011-04-01 14:00     ` Jeff King
2011-03-31 20:45 ` [PATCH 0/4] fix hang in git push when pack-objects fails Johannes Sixt
2011-04-01  1:34   ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).