git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / Atom feed
* (no subject)
@ 2020-11-19  8:18 唐宇奕
  2020-11-20 18:52 ` Bug report: orphaned pack-objects after killing upload-pack on [was: (no subject)] René Scharfe
  0 siblings, 1 reply; 13+ messages in thread
From: 唐宇奕 @ 2020-11-19  8:18 UTC (permalink / raw)
  To: git

Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.

What did you do before the bug happened? (Steps to reproduce your issue)
  1. linux environment, prepare a large repo to give you enough time to debug
  2. clone that repo
  3. during clone, send SIGTERM to the git-upload-pack process

What did you expect to happen? (Expected behavior)
  there is no git process left

What happened instead? (Actual behavior)
  there is a git pack-objects process which is forked by the
git-upload-pack process, it becomes a zombie

What's different between what you expected and what actually happened?
  there shouldn't exist zombie processes

Anything else you want to add:
  you may need a docker container environment to reproduce this since
mostly init process will cleanup zombie processes for you

Please review the rest of the bug report below.
You can delete any lines you don't wish to share.


[System Info]
git version:
git version 2.26.2
cpu: x86_64

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Bug report: orphaned pack-objects after killing upload-pack on [was: (no subject)]
  2020-11-19  8:18 唐宇奕
@ 2020-11-20 18:52 ` René Scharfe
  2020-11-21  0:29   ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: René Scharfe @ 2020-11-20 18:52 UTC (permalink / raw)
  To: 唐宇奕, git

Am 19.11.20 um 09:18 schrieb 唐宇奕:
> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
>
> What did you do before the bug happened? (Steps to reproduce your issue)
>   1. linux environment, prepare a large repo to give you enough time to debug
>   2. clone that repo
>   3. during clone, send SIGTERM to the git-upload-pack process
>
> What did you expect to happen? (Expected behavior)
>   there is no git process left
>
> What happened instead? (Actual behavior)
>   there is a git pack-objects process which is forked by the
> git-upload-pack process, it becomes a zombie
>
> What's different between what you expected and what actually happened?
>   there shouldn't exist zombie processes
>
> Anything else you want to add:
>   you may need a docker container environment to reproduce this since
> mostly init process will cleanup zombie processes for you
>
> Please review the rest of the bug report below.
> You can delete any lines you don't wish to share.
>
>
> [System Info]
> git version:
> git version 2.26.2
> cpu: x86_64

Have you seen this working as you expect in an earlier version?

I suspect it's a matter of turning on some flags (patch below), but I
have to admit I don't really know what I'm doing here -- and why this
hasn't been done already.

René

---
 upload-pack.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/upload-pack.c b/upload-pack.c
index 5dc8e1f844..e42dea26fa 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -321,6 +321,8 @@ static void create_pack_file(struct upload_pack_data *pack_data,
 	pack_objects.in = -1;
 	pack_objects.out = -1;
 	pack_objects.err = -1;
+	pack_objects.clean_on_exit = 1;
+	pack_objects.wait_after_clean = 1;

 	if (start_command(&pack_objects))
 		die("git upload-pack: unable to fork git-pack-objects");
--
2.29.2

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Bug report: orphaned pack-objects after killing upload-pack on [was: (no subject)]
  2020-11-20 18:52 ` Bug report: orphaned pack-objects after killing upload-pack on [was: (no subject)] René Scharfe
@ 2020-11-21  0:29   ` Jeff King
  2020-11-21 21:54     ` Bug report: orphaned pack-objects after killing upload-pack on [ Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2020-11-21  0:29 UTC (permalink / raw)
  To: René Scharfe; +Cc: 唐宇奕, git

On Fri, Nov 20, 2020 at 07:52:45PM +0100, René Scharfe wrote:

> Have you seen this working as you expect in an earlier version?
> 
> I suspect it's a matter of turning on some flags (patch below), but I
> have to admit I don't really know what I'm doing here -- and why this
> hasn't been done already.
> 
> René
> 
> ---
>  upload-pack.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/upload-pack.c b/upload-pack.c
> index 5dc8e1f844..e42dea26fa 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -321,6 +321,8 @@ static void create_pack_file(struct upload_pack_data *pack_data,
>  	pack_objects.in = -1;
>  	pack_objects.out = -1;
>  	pack_objects.err = -1;
> +	pack_objects.clean_on_exit = 1;
> +	pack_objects.wait_after_clean = 1;

Yeah, clean_on_exit seems quite reasonable to me. I suspect nobody ever
really noticed, because as soon as pack-objects starts to write out the
pack, it will get SIGPIPE or EPIPE and die. But there is no point in
letting it chug on expensive object enumeration or delta compression if
upload-pack has already exited.

I don't know that wait_after_clean is necessary. We don't need to wait
for pack-objects to fail.

On the flip side, one of the reasons I added clean_on_exit long ago was
for the similar issue on the push side, which is even worse. Here we
might just waste some CPU, but on the push side we connect pack-objects
directly to the remote socket, so it could actually complete the push
(from the server's perspective) after the local git-push has died. Or at
least I think that was possible at one point; it might not be the case
any more.

I wrote this patch ages ago, and it is still sitting close to the bottom
(if not the bottom) of my todo stack, waiting to be investigated. ;)

-- >8 --
Subject: [PATCH] send-pack: kill pack-objects helper on signal or exit

We spawn an external pack-objects process to actually send
objects to the remote side. If we are killed by a signal
during this process, the pack-objects will keep running and
complete the push, which may surprise the user. We should
take it down when we go down.

Signed-off-by: Jeff King <peff@peff.net>
---
 send-pack.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/send-pack.c b/send-pack.c
index eb4a44270b..d2701bf35c 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -85,6 +85,7 @@ static int pack_objects(int fd, struct ref *refs, struct oid_array *extra, struc
 	po.in = -1;
 	po.out = args->stateless_rpc ? -1 : fd;
 	po.git_cmd = 1;
+	po.clean_on_exit = 1;
 	if (start_command(&po))
 		die_errno("git pack-objects failed");
 
-- 
2.29.2.730.g3e418f96ba


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Bug report: orphaned pack-objects after killing upload-pack on [
  2020-11-21  0:29   ` Jeff King
@ 2020-11-21 21:54     ` Junio C Hamano
  2020-11-24  3:21       ` 唐宇奕
  2020-11-24  9:11       ` Jeff King
  0 siblings, 2 replies; 13+ messages in thread
From: Junio C Hamano @ 2020-11-21 21:54 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, 唐宇奕, git

Jeff King <peff@peff.net> writes:

> Yeah, clean_on_exit seems quite reasonable to me. I suspect nobody ever
> really noticed, because as soon as pack-objects starts to write out the
> pack, it will get SIGPIPE or EPIPE and die. But there is no point in
> letting it chug on expensive object enumeration or delta compression if
> upload-pack has already exited.
>
> I don't know that wait_after_clean is necessary. We don't need to wait
> for pack-objects to fail.
>
> On the flip side, one of the reasons I added clean_on_exit long ago was
> for the similar issue on the push side, which is even worse. Here we
> might just waste some CPU, but on the push side we connect pack-objects
> directly to the remote socket, so it could actually complete the push
> (from the server's perspective) after the local git-push has died. Or at
> least I think that was possible at one point; it might not be the case
> any more.
>
> I wrote this patch ages ago, and it is still sitting close to the bottom
> (if not the bottom) of my todo stack, waiting to be investigated. ;)

Sounds sensible.

> -- >8 --
> Subject: [PATCH] send-pack: kill pack-objects helper on signal or exit
>
> We spawn an external pack-objects process to actually send
> objects to the remote side. If we are killed by a signal
> during this process, the pack-objects will keep running and
> complete the push, which may surprise the user. We should
> take it down when we go down.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  send-pack.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/send-pack.c b/send-pack.c
> index eb4a44270b..d2701bf35c 100644
> --- a/send-pack.c
> +++ b/send-pack.c
> @@ -85,6 +85,7 @@ static int pack_objects(int fd, struct ref *refs, struct oid_array *extra, struc
>  	po.in = -1;
>  	po.out = args->stateless_rpc ? -1 : fd;
>  	po.git_cmd = 1;
> +	po.clean_on_exit = 1;
>  	if (start_command(&po))
>  		die_errno("git pack-objects failed");

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Bug report: orphaned pack-objects after killing upload-pack on [
  2020-11-21 21:54     ` Bug report: orphaned pack-objects after killing upload-pack on [ Junio C Hamano
@ 2020-11-24  3:21       ` 唐宇奕
  2020-11-24  9:11       ` Jeff King
  1 sibling, 0 replies; 13+ messages in thread
From: 唐宇奕 @ 2020-11-24  3:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, René Scharfe, git

when will this bug be fixed?

Junio C Hamano <gitster@pobox.com> 于2020年11月22日周日 上午5:54写道:
>
> Jeff King <peff@peff.net> writes:
>
> > Yeah, clean_on_exit seems quite reasonable to me. I suspect nobody ever
> > really noticed, because as soon as pack-objects starts to write out the
> > pack, it will get SIGPIPE or EPIPE and die. But there is no point in
> > letting it chug on expensive object enumeration or delta compression if
> > upload-pack has already exited.
> >
> > I don't know that wait_after_clean is necessary. We don't need to wait
> > for pack-objects to fail.
> >
> > On the flip side, one of the reasons I added clean_on_exit long ago was
> > for the similar issue on the push side, which is even worse. Here we
> > might just waste some CPU, but on the push side we connect pack-objects
> > directly to the remote socket, so it could actually complete the push
> > (from the server's perspective) after the local git-push has died. Or at
> > least I think that was possible at one point; it might not be the case
> > any more.
> >
> > I wrote this patch ages ago, and it is still sitting close to the bottom
> > (if not the bottom) of my todo stack, waiting to be investigated. ;)
>
> Sounds sensible.
>
> > -- >8 --
> > Subject: [PATCH] send-pack: kill pack-objects helper on signal or exit
> >
> > We spawn an external pack-objects process to actually send
> > objects to the remote side. If we are killed by a signal
> > during this process, the pack-objects will keep running and
> > complete the push, which may surprise the user. We should
> > take it down when we go down.
> >
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> >  send-pack.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/send-pack.c b/send-pack.c
> > index eb4a44270b..d2701bf35c 100644
> > --- a/send-pack.c
> > +++ b/send-pack.c
> > @@ -85,6 +85,7 @@ static int pack_objects(int fd, struct ref *refs, struct oid_array *extra, struc
> >       po.in = -1;
> >       po.out = args->stateless_rpc ? -1 : fd;
> >       po.git_cmd = 1;
> > +     po.clean_on_exit = 1;
> >       if (start_command(&po))
> >               die_errno("git pack-objects failed");

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Bug report: orphaned pack-objects after killing upload-pack on [
  2020-11-21 21:54     ` Bug report: orphaned pack-objects after killing upload-pack on [ Junio C Hamano
  2020-11-24  3:21       ` 唐宇奕
@ 2020-11-24  9:11       ` Jeff King
  2020-11-25 21:42         ` Junio C Hamano
  1 sibling, 1 reply; 13+ messages in thread
From: Jeff King @ 2020-11-24  9:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, 唐宇奕, git

On Sat, Nov 21, 2020 at 01:54:17PM -0800, Junio C Hamano wrote:

> > I wrote this patch ages ago, and it is still sitting close to the bottom
> > (if not the bottom) of my todo stack, waiting to be investigated. ;)
> 
> Sounds sensible.
> 
> > -- >8 --
> > Subject: [PATCH] send-pack: kill pack-objects helper on signal or exit

Looks like you picked up this patch on a topic branch

I was thinking of polishing it further with a test, but I can't think of
a good way to do one that would not be horribly racy. I even tried to
think of a racy test I could run manually (e.g., pushing and killing in
a loop) but even that is tough (even with the patch, it's racy whether
the push will succeed or not, depending on exactly when you kill the
parent, so there's no automatic way to distinguish the improved cases).

So I think it is still a good idea to do, and what I wrote earlier is
as good a we can do.

René, do you want to wrap up your similar patch for the fetch side?

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Bug report: orphaned pack-objects after killing upload-pack on [
  2020-11-24  9:11       ` Jeff King
@ 2020-11-25 21:42         ` Junio C Hamano
  2020-11-26  0:53           ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2020-11-25 21:42 UTC (permalink / raw)
  To: René Scharfe, Jeff King; +Cc: 唐宇奕, git

Jeff King <peff@peff.net> writes:

> So I think it is still a good idea to do, and what I wrote earlier is
> as good a we can do.
>
> René, do you want to wrap up your similar patch for the fetch side?

In the meantime, I may queue this on 'seen' but it cannot move
forward without a signoff.

Thanks.

--- >8 ------ >8 ------ >8 ------ >8 ------ >8 ---
From: René Scharfe <l.s.r@web.de>
Date: Fri, 20 Nov 2020 19:52:45 +0100
Subject: [PATCH] upload-pack: kill pack-objects helper on signal or exit

We spawn an external pack-objects process to actually send objects
to the remote side. If we are killed by a signal during this
process, the pack-objects will hang around as a zombie.  We should
take it down when we go down.
---
 upload-pack.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/upload-pack.c b/upload-pack.c
index 3b858eb457..d4f7192d04 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -321,6 +321,7 @@ static void create_pack_file(struct upload_pack_data *pack_data,
 	pack_objects.in = -1;
 	pack_objects.out = -1;
 	pack_objects.err = -1;
+	pack_objects.clean_on_exit = 1;
 
 	if (start_command(&pack_objects))
 		die("git upload-pack: unable to fork git-pack-objects");
-- 
2.29.2-538-g65d51b1459

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Bug report: orphaned pack-objects after killing upload-pack on [
  2020-11-25 21:42         ` Junio C Hamano
@ 2020-11-26  0:53           ` Jeff King
  2020-11-26  1:04             ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2020-11-26  0:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, 唐宇奕, git

On Wed, Nov 25, 2020 at 01:42:24PM -0800, Junio C Hamano wrote:

> In the meantime, I may queue this on 'seen' but it cannot move
> forward without a signoff.
> 
> Thanks.
> 
> --- >8 ------ >8 ------ >8 ------ >8 ------ >8 ---
> From: René Scharfe <l.s.r@web.de>
> Date: Fri, 20 Nov 2020 19:52:45 +0100
> Subject: [PATCH] upload-pack: kill pack-objects helper on signal or exit
> 
> We spawn an external pack-objects process to actually send objects
> to the remote side. If we are killed by a signal during this
> process, the pack-objects will hang around as a zombie.  We should
> take it down when we go down.

I think this is a good thing to do. I'd probably avoid the word "zombie"
here, though. The orphaned pack-objects does not become a zombie process
in the traditional Unix sense of the word, waiting to be reaped by a
parent which is not paying attention. Instead it is still running but
doing work for a caller that will never read the result.

So maybe:

  We spawn an external pack-objects process to actually send objects to
  the remote side. If we are killed by a signal during this process,
  then pack-objects may continue to run. As soon as it starts producing
  output for the pack, it will see a failure writing to upload-pack and
  exit itself. But before then, it may do significant work traversing
  the object graph, compressing deltas, etc, which will all be
  pointless. So let's make sure to kill as soon as we know that the
  caller will not read the result.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Bug report: orphaned pack-objects after killing upload-pack on [
  2020-11-26  0:53           ` Jeff King
@ 2020-11-26  1:04             ` Junio C Hamano
  2020-11-26 20:04               ` René Scharfe
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2020-11-26  1:04 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, 唐宇奕, git

Jeff King <peff@peff.net> writes:

> On Wed, Nov 25, 2020 at 01:42:24PM -0800, Junio C Hamano wrote:
>
>> In the meantime, I may queue this on 'seen' but it cannot move
>> forward without a signoff.
>>  ...
>   We spawn an external pack-objects process to actually send objects to
>   the remote side. If we are killed by a signal during this process,
>   then pack-objects may continue to run. As soon as it starts producing
>   output for the pack, it will see a failure writing to upload-pack and
>   exit itself. But before then, it may do significant work traversing
>   the object graph, compressing deltas, etc, which will all be
>   pointless. So let's make sure to kill as soon as we know that the
>   caller will not read the result.

Thanks, that reads well.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Bug report: orphaned pack-objects after killing upload-pack on [
  2020-11-26  1:04             ` Junio C Hamano
@ 2020-11-26 20:04               ` René Scharfe
  2020-11-27  4:17                 ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: René Scharfe @ 2020-11-26 20:04 UTC (permalink / raw)
  To: Junio C Hamano, Jeff King; +Cc: 唐宇奕, git

Am 26.11.20 um 02:04 schrieb Junio C Hamano:
> Jeff King <peff@peff.net> writes:
>
>> On Wed, Nov 25, 2020 at 01:42:24PM -0800, Junio C Hamano wrote:
>>
>>> In the meantime, I may queue this on 'seen' but it cannot move
>>> forward without a signoff.
>>>  ...
>>   We spawn an external pack-objects process to actually send objects to
>>   the remote side. If we are killed by a signal during this process,
>>   then pack-objects may continue to run. As soon as it starts producing
>>   output for the pack, it will see a failure writing to upload-pack and
>>   exit itself. But before then, it may do significant work traversing
>>   the object graph, compressing deltas, etc, which will all be
>>   pointless. So let's make sure to kill as soon as we know that the
>>   caller will not read the result.
>
> Thanks, that reads well.
>

The patch is trivial, you don't need my sign-off.  You could record Peff
as its author, as he contributed the most to the version in seen.

Before I could submit that one (or something similar) formally, I'd need
to understand what's happening here a lot better and witness the effect
of the patch.

I understand that the main benefit of stopping the child upon
termination of the parent is to avoid using CPU cycles on a heavy task
whose results will just go to waste.  But wouldn't the orphaned child
then become a zombie?  Init would reap it eventually, but are there
perhaps init-less deployments (containerized daemon?) where such
zombies could pile up?

For a test, winning the race condition should be easy if we cheat by
letting the child loop forever.  But I struggle even with the most
basic task: Making upload-pack invoked by clone call pack-objects.
(Feeling a bit silly.)

René

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Bug report: orphaned pack-objects after killing upload-pack on [
  2020-11-26 20:04               ` René Scharfe
@ 2020-11-27  4:17                 ` Jeff King
  2020-11-27 20:43                   ` René Scharfe
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2020-11-27  4:17 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, 唐宇奕, git

On Thu, Nov 26, 2020 at 09:04:35PM +0100, René Scharfe wrote:

> Before I could submit that one (or something similar) formally, I'd need
> to understand what's happening here a lot better and witness the effect
> of the patch.
> 
> I understand that the main benefit of stopping the child upon
> termination of the parent is to avoid using CPU cycles on a heavy task
> whose results will just go to waste.  But wouldn't the orphaned child
> then become a zombie?  Init would reap it eventually, but are there
> perhaps init-less deployments (containerized daemon?) where such
> zombies could pile up?

I think an init-less deployment like that is already broken. If we
encounter any error at all in upload-pack we may quit without reaping
all of our children. And this could never be protected against entirely;
we could be killed by SIGSEGV, SIGKILL, etc.

My understanding is container deployments often have a tiny pid-1 init
that takes care of zombie processes like this (but it's not something
I've dealt with much myself).

> For a test, winning the race condition should be easy if we cheat by
> letting the child loop forever.  But I struggle even with the most
> basic task: Making upload-pack invoked by clone call pack-objects.
> (Feeling a bit silly.)

Here's an easy reproduction. On a clone of something large-ish (by
number of objects) like linux.git:

  - make sure you don't have bitmaps on (since they make the enumerating
    phase go quickly). For linux.git it takes ~30s or so to walk the
    whole graph on my machine.

  - run "git clone --no-local -q . dst"; the "-q" is important because
    if pack-objects is writing progress to upload-pack (to get
    multiplexed over the sideband to the client), then it will notice
    pretty quickly the failure to write to stderr

  - kill just upload-pack with "pkill git-upload-pack" or whatever you
    like

  - run "ps au | grep pack-objects" (or just "top") to see pack-objects
    chugging on 100% CPU (and consuming 1GB+ of RAM)

With the patch adding clean_on_exit, that last step turns up nothing.

Now the situation above is probably pretty rare. Nobody is usually going
to kill upload-pack specifically. The more common case is when
upload-pack realizes that the client (or the network) has gone away,
because it tries to write and finds the connection gone. But what is it
writing? Most of the time it's stuff from pack-objects! So in the normal
case, pack-objects is continually writing either data or progress
reports, so it would notice for its next write.

But again, a client asking for no progress is a problem. upload-pack
will be sending keepalives every 5s or so, so it will notice client
death then. But pack-objects will keep running, not generating any
output until it starts spewing the pack.

So you could probably make the scenario above a bit more realistic by
killing the parent git-clone process. But don't use ^C; that will send
SIGINT to all of the processes. Simulate a network failure by killing
the "git clone" process specifically. This shows the same problem, and
the same improvement after the patch (though remember it may take up to
5 seconds for upload-pack to send a keepalive and notice the problem).

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Bug report: orphaned pack-objects after killing upload-pack on [
  2020-11-27  4:17                 ` Jeff King
@ 2020-11-27 20:43                   ` René Scharfe
  2020-11-28  6:30                     ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: René Scharfe @ 2020-11-27 20:43 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, 唐宇奕, git

Am 27.11.20 um 05:17 schrieb Jeff King:
> On Thu, Nov 26, 2020 at 09:04:35PM +0100, René Scharfe wrote:
>
>> Before I could submit that one (or something similar) formally, I'd need
>> to understand what's happening here a lot better and witness the effect
>> of the patch.
>>
>> I understand that the main benefit of stopping the child upon
>> termination of the parent is to avoid using CPU cycles on a heavy task
>> whose results will just go to waste.  But wouldn't the orphaned child
>> then become a zombie?  Init would reap it eventually, but are there
>> perhaps init-less deployments (containerized daemon?) where such
>> zombies could pile up?
>
> I think an init-less deployment like that is already broken. If we
> encounter any error at all in upload-pack we may quit without reaping
> all of our children. And this could never be protected against entirely;
> we could be killed by SIGSEGV, SIGKILL, etc.

That might be true, but it might still be usable if the rate of zombie
production is low enough.  And reducing it slightly might still help by
increasing the time between container restarts.  Segfaults should be
very rare, and people using kill -9 can clean up after themselves..

> My understanding is container deployments often have a tiny pid-1 init
> that takes care of zombie processes like this (but it's not something
> I've dealt with much myself).

True, e.g. https://github.com/krallin/tini, which is built into newer
Docker releases already.  So this problem is real and has an (optional)
solution.

OK, so overall the situation sounds a bit messy to me and perhaps
there's room for improvement, but I agree now that we can leave the
specialists (init, tini) to deal with our zombies.

>> For a test, winning the race condition should be easy if we cheat by
>> letting the child loop forever.  But I struggle even with the most
>> basic task: Making upload-pack invoked by clone call pack-objects.
>> (Feeling a bit silly.)
>
> Here's an easy reproduction. On a clone of something large-ish (by
> number of objects) like linux.git:
>
>   - make sure you don't have bitmaps on (since they make the enumerating
>     phase go quickly). For linux.git it takes ~30s or so to walk the
>     whole graph on my machine.
>
>   - run "git clone --no-local -q . dst"; the "-q" is important because
>     if pack-objects is writing progress to upload-pack (to get
>     multiplexed over the sideband to the client), then it will notice
>     pretty quickly the failure to write to stderr
>
>   - kill just upload-pack with "pkill git-upload-pack" or whatever you
>     like
>
>   - run "ps au | grep pack-objects" (or just "top") to see pack-objects
>     chugging on 100% CPU (and consuming 1GB+ of RAM)
>
> With the patch adding clean_on_exit, that last step turns up nothing.

I was missing --no-local, d'oh!

To win the race consistently I used this:

-- >8 --

diff --git a/run-command.c b/run-command.c
index ea4d0fb4b1..a6bf0a86dd 100644
--- a/run-command.c
+++ b/run-command.c
@@ -672,6 +672,19 @@ int start_command(struct child_process *cmd)
 	int failed_errno;
 	char *str;

+	const char *loop_argv[] = { "while :; do sleep 1; done", NULL };
+	const char *fail_cmd = getenv("GIT_DEBUG_ABANDON_CHILD");
+	const char *argv0 = cmd->argv ? cmd->argv[0] : cmd->args.v[0];
+	int fail = fail_cmd && starts_with(argv0, fail_cmd);
+
+	if (fail) {
+		fprintf(stderr, "starting endless loop instead of %s\n",
+			cmd->argv ? cmd->argv[0] : cmd->args.v[0]);
+		cmd->argv = loop_argv;
+		cmd->use_shell = 1;
+		cmd->git_cmd = 0;
+	}
+
 	if (!cmd->argv)
 		cmd->argv = cmd->args.v;
 	if (!cmd->env)
@@ -982,6 +995,9 @@ int start_command(struct child_process *cmd)
 	else if (cmd->err)
 		close(cmd->err);

+	if (fail)
+		die("abandoning child %"PRIuMAX"\n", (uintmax_t)cmd->pid);
+
 	return 0;
 }


--- 8< ---

We could build tests that always win (or lose, based on your
perspective) the race condition based on that.  Turning on clean_on_exit
is such a no-brainer that I don't see the need for one, though.

> Now the situation above is probably pretty rare. Nobody is usually going
> to kill upload-pack specifically. The more common case is when
> upload-pack realizes that the client (or the network) has gone away,
> because it tries to write and finds the connection gone. But what is it
> writing? Most of the time it's stuff from pack-objects! So in the normal
> case, pack-objects is continually writing either data or progress
> reports, so it would notice for its next write.
>
> But again, a client asking for no progress is a problem. upload-pack
> will be sending keepalives every 5s or so, so it will notice client
> death then. But pack-objects will keep running, not generating any
> output until it starts spewing the pack.
>
> So you could probably make the scenario above a bit more realistic by
> killing the parent git-clone process. But don't use ^C; that will send
> SIGINT to all of the processes. Simulate a network failure by killing
> the "git clone" process specifically. This shows the same problem, and
> the same improvement after the patch (though remember it may take up to
> 5 seconds for upload-pack to send a keepalive and notice the problem).

With the debug patch above and GIT_DEBUG_ABANDON_CHILD=git-upload-pack I
need the following patch get rid of the spawned process:

--- >8 ---

diff --git a/connect.c b/connect.c
index 8b8f56cf6d..e1b1b73ef5 100644
--- a/connect.c
+++ b/connect.c
@@ -1369,6 +1369,7 @@ struct child_process *git_connect(int fd[2], const char *url,

 		conn->use_shell = 1;
 		conn->in = conn->out = -1;
+		conn->clean_on_exit = 1;
 		if (protocol == PROTO_SSH) {
 			char *ssh_host = hostandport;
 			const char *port = NULL;

--- 8< ---

So is there a downside to clean_on_exit?  It doesn't make sense when we
start browsers or pagers, but for hooks and helpers (which are probably
the majority of started processes) cascading program termination makes
sense, no?

René


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Bug report: orphaned pack-objects after killing upload-pack on [
  2020-11-27 20:43                   ` René Scharfe
@ 2020-11-28  6:30                     ` Jeff King
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2020-11-28  6:30 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, 唐宇奕, git

On Fri, Nov 27, 2020 at 09:43:06PM +0100, René Scharfe wrote:

> > [...zombie processes...]
> OK, so overall the situation sounds a bit messy to me and perhaps
> there's room for improvement, but I agree now that we can leave the
> specialists (init, tini) to deal with our zombies.

Keep in mind we are not changing anything here, either. clean_on_exit is
kicking in when we already would be exiting without calling wait(). We
could _also_ instruct run-command to wait, but nobody seems to be
complaining about it.

> With the debug patch above and GIT_DEBUG_ABANDON_CHILD=git-upload-pack I
> need the following patch get rid of the spawned process:

I don't think that is an interesting case, though. We've been discussing
the spawn between upload-pack and its child pack-objects, but here
you're running a bogus infinite loop instead of upload-pack. It will
spin forever, because the client is expecting it to say something.

In a normal setup, a severing of the connection between the client and
upload-pack will be noticed quickly, because each one is always either
trying to read from or write to the other (the exception is while
pack-objects is processing without progress meters, but there we send a
keep-alive every 5 seconds).

If one of them were to spin in a true infinite loop due to a bug (but
not break the connection), we would wait forever. But the solution there
is a timeout on inactivity.

So...

> --- >8 ---
> 
> diff --git a/connect.c b/connect.c
> index 8b8f56cf6d..e1b1b73ef5 100644
> --- a/connect.c
> +++ b/connect.c
> @@ -1369,6 +1369,7 @@ struct child_process *git_connect(int fd[2], const char *url,
> 
>  		conn->use_shell = 1;
>  		conn->in = conn->out = -1;
> +		conn->clean_on_exit = 1;
>  		if (protocol == PROTO_SSH) {
>  			char *ssh_host = hostandport;
>  			const char *port = NULL;
> 
> --- 8< ---

I don't think there's much point here. If the client side dies, it will
close the socket, and upload-pack would notice anyway. It's only your
fake "spin without doing any I/O" patch that misbehaves.

Moreover, this wouldn't do _anything_ in many cases, because upload-pack
is going to be on the far side of a network socket. So really you'd be
killing ssh here for those cases (which then likewise would close the
socket, but this isn't necessary because ssh will also exit when it sees
us close _our_ end). And it would do nothing at all for git-over-http,
git://, etc.

> So is there a downside to clean_on_exit?  It doesn't make sense when we
> start browsers or pagers, but for hooks and helpers (which are probably
> the majority of started processes) cascading program termination makes
> sense, no?

In general, no, I suspect clean_on_exit wouldn't be hurting anything if
we used it more. But generally we get a natural cascade of termination
because exiting processes close the input and output pipes going to the
other sub-processes.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-11-28 22:55 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-19  8:18 唐宇奕
2020-11-20 18:52 ` Bug report: orphaned pack-objects after killing upload-pack on [was: (no subject)] René Scharfe
2020-11-21  0:29   ` Jeff King
2020-11-21 21:54     ` Bug report: orphaned pack-objects after killing upload-pack on [ Junio C Hamano
2020-11-24  3:21       ` 唐宇奕
2020-11-24  9:11       ` Jeff King
2020-11-25 21:42         ` Junio C Hamano
2020-11-26  0:53           ` Jeff King
2020-11-26  1:04             ` Junio C Hamano
2020-11-26 20:04               ` René Scharfe
2020-11-27  4:17                 ` Jeff King
2020-11-27 20:43                   ` René Scharfe
2020-11-28  6:30                     ` Jeff King

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for the project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git