* (no subject) @ 2020-11-19 8:18 唐宇奕 2020-11-20 18:52 ` Bug report: orphaned pack-objects after killing upload-pack on [was: (no subject)] René Scharfe 0 siblings, 1 reply; 16+ messages in thread From: 唐宇奕 @ 2020-11-19 8:18 UTC (permalink / raw) To: git Thank you for filling out a Git bug report! Please answer the following questions to help us understand your issue. What did you do before the bug happened? (Steps to reproduce your issue) 1. linux environment, prepare a large repo to give you enough time to debug 2. clone that repo 3. during clone, send SIGTERM to the git-upload-pack process What did you expect to happen? (Expected behavior) there is no git process left What happened instead? (Actual behavior) there is a git pack-objects process which is forked by the git-upload-pack process, it becomes a zombie What's different between what you expected and what actually happened? there shouldn't exist zombie processes Anything else you want to add: you may need a docker container environment to reproduce this since mostly init process will cleanup zombie processes for you Please review the rest of the bug report below. You can delete any lines you don't wish to share. [System Info] git version: git version 2.26.2 cpu: x86_64 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug report: orphaned pack-objects after killing upload-pack on [was: (no subject)] 2020-11-19 8:18 唐宇奕 @ 2020-11-20 18:52 ` René Scharfe 2020-11-21 0:29 ` Jeff King 0 siblings, 1 reply; 16+ messages in thread From: René Scharfe @ 2020-11-20 18:52 UTC (permalink / raw) To: 唐宇奕, git Am 19.11.20 um 09:18 schrieb 唐宇奕: > Thank you for filling out a Git bug report! > Please answer the following questions to help us understand your issue. > > What did you do before the bug happened? (Steps to reproduce your issue) > 1. linux environment, prepare a large repo to give you enough time to debug > 2. clone that repo > 3. during clone, send SIGTERM to the git-upload-pack process > > What did you expect to happen? (Expected behavior) > there is no git process left > > What happened instead? (Actual behavior) > there is a git pack-objects process which is forked by the > git-upload-pack process, it becomes a zombie > > What's different between what you expected and what actually happened? > there shouldn't exist zombie processes > > Anything else you want to add: > you may need a docker container environment to reproduce this since > mostly init process will cleanup zombie processes for you > > Please review the rest of the bug report below. > You can delete any lines you don't wish to share. > > > [System Info] > git version: > git version 2.26.2 > cpu: x86_64 Have you seen this working as you expect in an earlier version? I suspect it's a matter of turning on some flags (patch below), but I have to admit I don't really know what I'm doing here -- and why this hasn't been done already. René --- upload-pack.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/upload-pack.c b/upload-pack.c index 5dc8e1f844..e42dea26fa 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -321,6 +321,8 @@ static void create_pack_file(struct upload_pack_data *pack_data, pack_objects.in = -1; pack_objects.out = -1; pack_objects.err = -1; + pack_objects.clean_on_exit = 1; + pack_objects.wait_after_clean = 1; if (start_command(&pack_objects)) die("git upload-pack: unable to fork git-pack-objects"); -- 2.29.2 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Bug report: orphaned pack-objects after killing upload-pack on [was: (no subject)] 2020-11-20 18:52 ` Bug report: orphaned pack-objects after killing upload-pack on [was: (no subject)] René Scharfe @ 2020-11-21 0:29 ` Jeff King 2020-11-21 21:54 ` Bug report: orphaned pack-objects after killing upload-pack on [ Junio C Hamano 0 siblings, 1 reply; 16+ messages in thread From: Jeff King @ 2020-11-21 0:29 UTC (permalink / raw) To: René Scharfe; +Cc: 唐宇奕, git On Fri, Nov 20, 2020 at 07:52:45PM +0100, René Scharfe wrote: > Have you seen this working as you expect in an earlier version? > > I suspect it's a matter of turning on some flags (patch below), but I > have to admit I don't really know what I'm doing here -- and why this > hasn't been done already. > > René > > --- > upload-pack.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/upload-pack.c b/upload-pack.c > index 5dc8e1f844..e42dea26fa 100644 > --- a/upload-pack.c > +++ b/upload-pack.c > @@ -321,6 +321,8 @@ static void create_pack_file(struct upload_pack_data *pack_data, > pack_objects.in = -1; > pack_objects.out = -1; > pack_objects.err = -1; > + pack_objects.clean_on_exit = 1; > + pack_objects.wait_after_clean = 1; Yeah, clean_on_exit seems quite reasonable to me. I suspect nobody ever really noticed, because as soon as pack-objects starts to write out the pack, it will get SIGPIPE or EPIPE and die. But there is no point in letting it chug on expensive object enumeration or delta compression if upload-pack has already exited. I don't know that wait_after_clean is necessary. We don't need to wait for pack-objects to fail. On the flip side, one of the reasons I added clean_on_exit long ago was for the similar issue on the push side, which is even worse. Here we might just waste some CPU, but on the push side we connect pack-objects directly to the remote socket, so it could actually complete the push (from the server's perspective) after the local git-push has died. Or at least I think that was possible at one point; it might not be the case any more. I wrote this patch ages ago, and it is still sitting close to the bottom (if not the bottom) of my todo stack, waiting to be investigated. ;) -- >8 -- Subject: [PATCH] send-pack: kill pack-objects helper on signal or exit We spawn an external pack-objects process to actually send objects to the remote side. If we are killed by a signal during this process, the pack-objects will keep running and complete the push, which may surprise the user. We should take it down when we go down. Signed-off-by: Jeff King <peff@peff.net> --- send-pack.c | 1 + 1 file changed, 1 insertion(+) diff --git a/send-pack.c b/send-pack.c index eb4a44270b..d2701bf35c 100644 --- a/send-pack.c +++ b/send-pack.c @@ -85,6 +85,7 @@ static int pack_objects(int fd, struct ref *refs, struct oid_array *extra, struc po.in = -1; po.out = args->stateless_rpc ? -1 : fd; po.git_cmd = 1; + po.clean_on_exit = 1; if (start_command(&po)) die_errno("git pack-objects failed"); -- 2.29.2.730.g3e418f96ba ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Bug report: orphaned pack-objects after killing upload-pack on [ 2020-11-21 0:29 ` Jeff King @ 2020-11-21 21:54 ` Junio C Hamano 2020-11-24 3:21 ` 唐宇奕 2020-11-24 9:11 ` Jeff King 0 siblings, 2 replies; 16+ messages in thread From: Junio C Hamano @ 2020-11-21 21:54 UTC (permalink / raw) To: Jeff King; +Cc: René Scharfe, 唐宇奕, git Jeff King <peff@peff.net> writes: > Yeah, clean_on_exit seems quite reasonable to me. I suspect nobody ever > really noticed, because as soon as pack-objects starts to write out the > pack, it will get SIGPIPE or EPIPE and die. But there is no point in > letting it chug on expensive object enumeration or delta compression if > upload-pack has already exited. > > I don't know that wait_after_clean is necessary. We don't need to wait > for pack-objects to fail. > > On the flip side, one of the reasons I added clean_on_exit long ago was > for the similar issue on the push side, which is even worse. Here we > might just waste some CPU, but on the push side we connect pack-objects > directly to the remote socket, so it could actually complete the push > (from the server's perspective) after the local git-push has died. Or at > least I think that was possible at one point; it might not be the case > any more. > > I wrote this patch ages ago, and it is still sitting close to the bottom > (if not the bottom) of my todo stack, waiting to be investigated. ;) Sounds sensible. > -- >8 -- > Subject: [PATCH] send-pack: kill pack-objects helper on signal or exit > > We spawn an external pack-objects process to actually send > objects to the remote side. If we are killed by a signal > during this process, the pack-objects will keep running and > complete the push, which may surprise the user. We should > take it down when we go down. > > Signed-off-by: Jeff King <peff@peff.net> > --- > send-pack.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/send-pack.c b/send-pack.c > index eb4a44270b..d2701bf35c 100644 > --- a/send-pack.c > +++ b/send-pack.c > @@ -85,6 +85,7 @@ static int pack_objects(int fd, struct ref *refs, struct oid_array *extra, struc > po.in = -1; > po.out = args->stateless_rpc ? -1 : fd; > po.git_cmd = 1; > + po.clean_on_exit = 1; > if (start_command(&po)) > die_errno("git pack-objects failed"); ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug report: orphaned pack-objects after killing upload-pack on [ 2020-11-21 21:54 ` Bug report: orphaned pack-objects after killing upload-pack on [ Junio C Hamano @ 2020-11-24 3:21 ` 唐宇奕 2020-11-24 9:11 ` Jeff King 1 sibling, 0 replies; 16+ messages in thread From: 唐宇奕 @ 2020-11-24 3:21 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, René Scharfe, git when will this bug be fixed? Junio C Hamano <gitster@pobox.com> 于2020年11月22日周日 上午5:54写道: > > Jeff King <peff@peff.net> writes: > > > Yeah, clean_on_exit seems quite reasonable to me. I suspect nobody ever > > really noticed, because as soon as pack-objects starts to write out the > > pack, it will get SIGPIPE or EPIPE and die. But there is no point in > > letting it chug on expensive object enumeration or delta compression if > > upload-pack has already exited. > > > > I don't know that wait_after_clean is necessary. We don't need to wait > > for pack-objects to fail. > > > > On the flip side, one of the reasons I added clean_on_exit long ago was > > for the similar issue on the push side, which is even worse. Here we > > might just waste some CPU, but on the push side we connect pack-objects > > directly to the remote socket, so it could actually complete the push > > (from the server's perspective) after the local git-push has died. Or at > > least I think that was possible at one point; it might not be the case > > any more. > > > > I wrote this patch ages ago, and it is still sitting close to the bottom > > (if not the bottom) of my todo stack, waiting to be investigated. ;) > > Sounds sensible. > > > -- >8 -- > > Subject: [PATCH] send-pack: kill pack-objects helper on signal or exit > > > > We spawn an external pack-objects process to actually send > > objects to the remote side. If we are killed by a signal > > during this process, the pack-objects will keep running and > > complete the push, which may surprise the user. We should > > take it down when we go down. > > > > Signed-off-by: Jeff King <peff@peff.net> > > --- > > send-pack.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/send-pack.c b/send-pack.c > > index eb4a44270b..d2701bf35c 100644 > > --- a/send-pack.c > > +++ b/send-pack.c > > @@ -85,6 +85,7 @@ static int pack_objects(int fd, struct ref *refs, struct oid_array *extra, struc > > po.in = -1; > > po.out = args->stateless_rpc ? -1 : fd; > > po.git_cmd = 1; > > + po.clean_on_exit = 1; > > if (start_command(&po)) > > die_errno("git pack-objects failed"); ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug report: orphaned pack-objects after killing upload-pack on [ 2020-11-21 21:54 ` Bug report: orphaned pack-objects after killing upload-pack on [ Junio C Hamano 2020-11-24 3:21 ` 唐宇奕 @ 2020-11-24 9:11 ` Jeff King 2020-11-25 21:42 ` Junio C Hamano 1 sibling, 1 reply; 16+ messages in thread From: Jeff King @ 2020-11-24 9:11 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, 唐宇奕, git On Sat, Nov 21, 2020 at 01:54:17PM -0800, Junio C Hamano wrote: > > I wrote this patch ages ago, and it is still sitting close to the bottom > > (if not the bottom) of my todo stack, waiting to be investigated. ;) > > Sounds sensible. > > > -- >8 -- > > Subject: [PATCH] send-pack: kill pack-objects helper on signal or exit Looks like you picked up this patch on a topic branch I was thinking of polishing it further with a test, but I can't think of a good way to do one that would not be horribly racy. I even tried to think of a racy test I could run manually (e.g., pushing and killing in a loop) but even that is tough (even with the patch, it's racy whether the push will succeed or not, depending on exactly when you kill the parent, so there's no automatic way to distinguish the improved cases). So I think it is still a good idea to do, and what I wrote earlier is as good a we can do. René, do you want to wrap up your similar patch for the fetch side? -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug report: orphaned pack-objects after killing upload-pack on [ 2020-11-24 9:11 ` Jeff King @ 2020-11-25 21:42 ` Junio C Hamano 2020-11-26 0:53 ` Jeff King 0 siblings, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2020-11-25 21:42 UTC (permalink / raw) To: René Scharfe, Jeff King; +Cc: 唐宇奕, git Jeff King <peff@peff.net> writes: > So I think it is still a good idea to do, and what I wrote earlier is > as good a we can do. > > René, do you want to wrap up your similar patch for the fetch side? In the meantime, I may queue this on 'seen' but it cannot move forward without a signoff. Thanks. --- >8 ------ >8 ------ >8 ------ >8 ------ >8 --- From: René Scharfe <l.s.r@web.de> Date: Fri, 20 Nov 2020 19:52:45 +0100 Subject: [PATCH] upload-pack: kill pack-objects helper on signal or exit We spawn an external pack-objects process to actually send objects to the remote side. If we are killed by a signal during this process, the pack-objects will hang around as a zombie. We should take it down when we go down. --- upload-pack.c | 1 + 1 file changed, 1 insertion(+) diff --git a/upload-pack.c b/upload-pack.c index 3b858eb457..d4f7192d04 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -321,6 +321,7 @@ static void create_pack_file(struct upload_pack_data *pack_data, pack_objects.in = -1; pack_objects.out = -1; pack_objects.err = -1; + pack_objects.clean_on_exit = 1; if (start_command(&pack_objects)) die("git upload-pack: unable to fork git-pack-objects"); -- 2.29.2-538-g65d51b1459 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Bug report: orphaned pack-objects after killing upload-pack on [ 2020-11-25 21:42 ` Junio C Hamano @ 2020-11-26 0:53 ` Jeff King 2020-11-26 1:04 ` Junio C Hamano 0 siblings, 1 reply; 16+ messages in thread From: Jeff King @ 2020-11-26 0:53 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, 唐宇奕, git On Wed, Nov 25, 2020 at 01:42:24PM -0800, Junio C Hamano wrote: > In the meantime, I may queue this on 'seen' but it cannot move > forward without a signoff. > > Thanks. > > --- >8 ------ >8 ------ >8 ------ >8 ------ >8 --- > From: René Scharfe <l.s.r@web.de> > Date: Fri, 20 Nov 2020 19:52:45 +0100 > Subject: [PATCH] upload-pack: kill pack-objects helper on signal or exit > > We spawn an external pack-objects process to actually send objects > to the remote side. If we are killed by a signal during this > process, the pack-objects will hang around as a zombie. We should > take it down when we go down. I think this is a good thing to do. I'd probably avoid the word "zombie" here, though. The orphaned pack-objects does not become a zombie process in the traditional Unix sense of the word, waiting to be reaped by a parent which is not paying attention. Instead it is still running but doing work for a caller that will never read the result. So maybe: We spawn an external pack-objects process to actually send objects to the remote side. If we are killed by a signal during this process, then pack-objects may continue to run. As soon as it starts producing output for the pack, it will see a failure writing to upload-pack and exit itself. But before then, it may do significant work traversing the object graph, compressing deltas, etc, which will all be pointless. So let's make sure to kill as soon as we know that the caller will not read the result. -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug report: orphaned pack-objects after killing upload-pack on [ 2020-11-26 0:53 ` Jeff King @ 2020-11-26 1:04 ` Junio C Hamano 2020-11-26 20:04 ` René Scharfe 0 siblings, 1 reply; 16+ messages in thread From: Junio C Hamano @ 2020-11-26 1:04 UTC (permalink / raw) To: Jeff King; +Cc: René Scharfe, 唐宇奕, git Jeff King <peff@peff.net> writes: > On Wed, Nov 25, 2020 at 01:42:24PM -0800, Junio C Hamano wrote: > >> In the meantime, I may queue this on 'seen' but it cannot move >> forward without a signoff. >> ... > We spawn an external pack-objects process to actually send objects to > the remote side. If we are killed by a signal during this process, > then pack-objects may continue to run. As soon as it starts producing > output for the pack, it will see a failure writing to upload-pack and > exit itself. But before then, it may do significant work traversing > the object graph, compressing deltas, etc, which will all be > pointless. So let's make sure to kill as soon as we know that the > caller will not read the result. Thanks, that reads well. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug report: orphaned pack-objects after killing upload-pack on [ 2020-11-26 1:04 ` Junio C Hamano @ 2020-11-26 20:04 ` René Scharfe 2020-11-27 4:17 ` Jeff King 2020-12-01 12:15 ` Jeff King 0 siblings, 2 replies; 16+ messages in thread From: René Scharfe @ 2020-11-26 20:04 UTC (permalink / raw) To: Junio C Hamano, Jeff King; +Cc: 唐宇奕, git Am 26.11.20 um 02:04 schrieb Junio C Hamano: > Jeff King <peff@peff.net> writes: > >> On Wed, Nov 25, 2020 at 01:42:24PM -0800, Junio C Hamano wrote: >> >>> In the meantime, I may queue this on 'seen' but it cannot move >>> forward without a signoff. >>> ... >> We spawn an external pack-objects process to actually send objects to >> the remote side. If we are killed by a signal during this process, >> then pack-objects may continue to run. As soon as it starts producing >> output for the pack, it will see a failure writing to upload-pack and >> exit itself. But before then, it may do significant work traversing >> the object graph, compressing deltas, etc, which will all be >> pointless. So let's make sure to kill as soon as we know that the >> caller will not read the result. > > Thanks, that reads well. > The patch is trivial, you don't need my sign-off. You could record Peff as its author, as he contributed the most to the version in seen. Before I could submit that one (or something similar) formally, I'd need to understand what's happening here a lot better and witness the effect of the patch. I understand that the main benefit of stopping the child upon termination of the parent is to avoid using CPU cycles on a heavy task whose results will just go to waste. But wouldn't the orphaned child then become a zombie? Init would reap it eventually, but are there perhaps init-less deployments (containerized daemon?) where such zombies could pile up? For a test, winning the race condition should be easy if we cheat by letting the child loop forever. But I struggle even with the most basic task: Making upload-pack invoked by clone call pack-objects. (Feeling a bit silly.) René ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug report: orphaned pack-objects after killing upload-pack on [ 2020-11-26 20:04 ` René Scharfe @ 2020-11-27 4:17 ` Jeff King 2020-11-27 20:43 ` René Scharfe 2020-12-01 12:15 ` Jeff King 1 sibling, 1 reply; 16+ messages in thread From: Jeff King @ 2020-11-27 4:17 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, 唐宇奕, git On Thu, Nov 26, 2020 at 09:04:35PM +0100, René Scharfe wrote: > Before I could submit that one (or something similar) formally, I'd need > to understand what's happening here a lot better and witness the effect > of the patch. > > I understand that the main benefit of stopping the child upon > termination of the parent is to avoid using CPU cycles on a heavy task > whose results will just go to waste. But wouldn't the orphaned child > then become a zombie? Init would reap it eventually, but are there > perhaps init-less deployments (containerized daemon?) where such > zombies could pile up? I think an init-less deployment like that is already broken. If we encounter any error at all in upload-pack we may quit without reaping all of our children. And this could never be protected against entirely; we could be killed by SIGSEGV, SIGKILL, etc. My understanding is container deployments often have a tiny pid-1 init that takes care of zombie processes like this (but it's not something I've dealt with much myself). > For a test, winning the race condition should be easy if we cheat by > letting the child loop forever. But I struggle even with the most > basic task: Making upload-pack invoked by clone call pack-objects. > (Feeling a bit silly.) Here's an easy reproduction. On a clone of something large-ish (by number of objects) like linux.git: - make sure you don't have bitmaps on (since they make the enumerating phase go quickly). For linux.git it takes ~30s or so to walk the whole graph on my machine. - run "git clone --no-local -q . dst"; the "-q" is important because if pack-objects is writing progress to upload-pack (to get multiplexed over the sideband to the client), then it will notice pretty quickly the failure to write to stderr - kill just upload-pack with "pkill git-upload-pack" or whatever you like - run "ps au | grep pack-objects" (or just "top") to see pack-objects chugging on 100% CPU (and consuming 1GB+ of RAM) With the patch adding clean_on_exit, that last step turns up nothing. Now the situation above is probably pretty rare. Nobody is usually going to kill upload-pack specifically. The more common case is when upload-pack realizes that the client (or the network) has gone away, because it tries to write and finds the connection gone. But what is it writing? Most of the time it's stuff from pack-objects! So in the normal case, pack-objects is continually writing either data or progress reports, so it would notice for its next write. But again, a client asking for no progress is a problem. upload-pack will be sending keepalives every 5s or so, so it will notice client death then. But pack-objects will keep running, not generating any output until it starts spewing the pack. So you could probably make the scenario above a bit more realistic by killing the parent git-clone process. But don't use ^C; that will send SIGINT to all of the processes. Simulate a network failure by killing the "git clone" process specifically. This shows the same problem, and the same improvement after the patch (though remember it may take up to 5 seconds for upload-pack to send a keepalive and notice the problem). -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug report: orphaned pack-objects after killing upload-pack on [ 2020-11-27 4:17 ` Jeff King @ 2020-11-27 20:43 ` René Scharfe 2020-11-28 6:30 ` Jeff King 0 siblings, 1 reply; 16+ messages in thread From: René Scharfe @ 2020-11-27 20:43 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, 唐宇奕, git Am 27.11.20 um 05:17 schrieb Jeff King: > On Thu, Nov 26, 2020 at 09:04:35PM +0100, René Scharfe wrote: > >> Before I could submit that one (or something similar) formally, I'd need >> to understand what's happening here a lot better and witness the effect >> of the patch. >> >> I understand that the main benefit of stopping the child upon >> termination of the parent is to avoid using CPU cycles on a heavy task >> whose results will just go to waste. But wouldn't the orphaned child >> then become a zombie? Init would reap it eventually, but are there >> perhaps init-less deployments (containerized daemon?) where such >> zombies could pile up? > > I think an init-less deployment like that is already broken. If we > encounter any error at all in upload-pack we may quit without reaping > all of our children. And this could never be protected against entirely; > we could be killed by SIGSEGV, SIGKILL, etc. That might be true, but it might still be usable if the rate of zombie production is low enough. And reducing it slightly might still help by increasing the time between container restarts. Segfaults should be very rare, and people using kill -9 can clean up after themselves.. > My understanding is container deployments often have a tiny pid-1 init > that takes care of zombie processes like this (but it's not something > I've dealt with much myself). True, e.g. https://github.com/krallin/tini, which is built into newer Docker releases already. So this problem is real and has an (optional) solution. OK, so overall the situation sounds a bit messy to me and perhaps there's room for improvement, but I agree now that we can leave the specialists (init, tini) to deal with our zombies. >> For a test, winning the race condition should be easy if we cheat by >> letting the child loop forever. But I struggle even with the most >> basic task: Making upload-pack invoked by clone call pack-objects. >> (Feeling a bit silly.) > > Here's an easy reproduction. On a clone of something large-ish (by > number of objects) like linux.git: > > - make sure you don't have bitmaps on (since they make the enumerating > phase go quickly). For linux.git it takes ~30s or so to walk the > whole graph on my machine. > > - run "git clone --no-local -q . dst"; the "-q" is important because > if pack-objects is writing progress to upload-pack (to get > multiplexed over the sideband to the client), then it will notice > pretty quickly the failure to write to stderr > > - kill just upload-pack with "pkill git-upload-pack" or whatever you > like > > - run "ps au | grep pack-objects" (or just "top") to see pack-objects > chugging on 100% CPU (and consuming 1GB+ of RAM) > > With the patch adding clean_on_exit, that last step turns up nothing. I was missing --no-local, d'oh! To win the race consistently I used this: -- >8 -- diff --git a/run-command.c b/run-command.c index ea4d0fb4b1..a6bf0a86dd 100644 --- a/run-command.c +++ b/run-command.c @@ -672,6 +672,19 @@ int start_command(struct child_process *cmd) int failed_errno; char *str; + const char *loop_argv[] = { "while :; do sleep 1; done", NULL }; + const char *fail_cmd = getenv("GIT_DEBUG_ABANDON_CHILD"); + const char *argv0 = cmd->argv ? cmd->argv[0] : cmd->args.v[0]; + int fail = fail_cmd && starts_with(argv0, fail_cmd); + + if (fail) { + fprintf(stderr, "starting endless loop instead of %s\n", + cmd->argv ? cmd->argv[0] : cmd->args.v[0]); + cmd->argv = loop_argv; + cmd->use_shell = 1; + cmd->git_cmd = 0; + } + if (!cmd->argv) cmd->argv = cmd->args.v; if (!cmd->env) @@ -982,6 +995,9 @@ int start_command(struct child_process *cmd) else if (cmd->err) close(cmd->err); + if (fail) + die("abandoning child %"PRIuMAX"\n", (uintmax_t)cmd->pid); + return 0; } --- 8< --- We could build tests that always win (or lose, based on your perspective) the race condition based on that. Turning on clean_on_exit is such a no-brainer that I don't see the need for one, though. > Now the situation above is probably pretty rare. Nobody is usually going > to kill upload-pack specifically. The more common case is when > upload-pack realizes that the client (or the network) has gone away, > because it tries to write and finds the connection gone. But what is it > writing? Most of the time it's stuff from pack-objects! So in the normal > case, pack-objects is continually writing either data or progress > reports, so it would notice for its next write. > > But again, a client asking for no progress is a problem. upload-pack > will be sending keepalives every 5s or so, so it will notice client > death then. But pack-objects will keep running, not generating any > output until it starts spewing the pack. > > So you could probably make the scenario above a bit more realistic by > killing the parent git-clone process. But don't use ^C; that will send > SIGINT to all of the processes. Simulate a network failure by killing > the "git clone" process specifically. This shows the same problem, and > the same improvement after the patch (though remember it may take up to > 5 seconds for upload-pack to send a keepalive and notice the problem). With the debug patch above and GIT_DEBUG_ABANDON_CHILD=git-upload-pack I need the following patch get rid of the spawned process: --- >8 --- diff --git a/connect.c b/connect.c index 8b8f56cf6d..e1b1b73ef5 100644 --- a/connect.c +++ b/connect.c @@ -1369,6 +1369,7 @@ struct child_process *git_connect(int fd[2], const char *url, conn->use_shell = 1; conn->in = conn->out = -1; + conn->clean_on_exit = 1; if (protocol == PROTO_SSH) { char *ssh_host = hostandport; const char *port = NULL; --- 8< --- So is there a downside to clean_on_exit? It doesn't make sense when we start browsers or pagers, but for hooks and helpers (which are probably the majority of started processes) cascading program termination makes sense, no? René ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Bug report: orphaned pack-objects after killing upload-pack on [ 2020-11-27 20:43 ` René Scharfe @ 2020-11-28 6:30 ` Jeff King 0 siblings, 0 replies; 16+ messages in thread From: Jeff King @ 2020-11-28 6:30 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, 唐宇奕, git On Fri, Nov 27, 2020 at 09:43:06PM +0100, René Scharfe wrote: > > [...zombie processes...] > OK, so overall the situation sounds a bit messy to me and perhaps > there's room for improvement, but I agree now that we can leave the > specialists (init, tini) to deal with our zombies. Keep in mind we are not changing anything here, either. clean_on_exit is kicking in when we already would be exiting without calling wait(). We could _also_ instruct run-command to wait, but nobody seems to be complaining about it. > With the debug patch above and GIT_DEBUG_ABANDON_CHILD=git-upload-pack I > need the following patch get rid of the spawned process: I don't think that is an interesting case, though. We've been discussing the spawn between upload-pack and its child pack-objects, but here you're running a bogus infinite loop instead of upload-pack. It will spin forever, because the client is expecting it to say something. In a normal setup, a severing of the connection between the client and upload-pack will be noticed quickly, because each one is always either trying to read from or write to the other (the exception is while pack-objects is processing without progress meters, but there we send a keep-alive every 5 seconds). If one of them were to spin in a true infinite loop due to a bug (but not break the connection), we would wait forever. But the solution there is a timeout on inactivity. So... > --- >8 --- > > diff --git a/connect.c b/connect.c > index 8b8f56cf6d..e1b1b73ef5 100644 > --- a/connect.c > +++ b/connect.c > @@ -1369,6 +1369,7 @@ struct child_process *git_connect(int fd[2], const char *url, > > conn->use_shell = 1; > conn->in = conn->out = -1; > + conn->clean_on_exit = 1; > if (protocol == PROTO_SSH) { > char *ssh_host = hostandport; > const char *port = NULL; > > --- 8< --- I don't think there's much point here. If the client side dies, it will close the socket, and upload-pack would notice anyway. It's only your fake "spin without doing any I/O" patch that misbehaves. Moreover, this wouldn't do _anything_ in many cases, because upload-pack is going to be on the far side of a network socket. So really you'd be killing ssh here for those cases (which then likewise would close the socket, but this isn't necessary because ssh will also exit when it sees us close _our_ end). And it would do nothing at all for git-over-http, git://, etc. > So is there a downside to clean_on_exit? It doesn't make sense when we > start browsers or pagers, but for hooks and helpers (which are probably > the majority of started processes) cascading program termination makes > sense, no? In general, no, I suspect clean_on_exit wouldn't be hurting anything if we used it more. But generally we get a natural cascade of termination because exiting processes close the input and output pipes going to the other sub-processes. -Peff ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug report: orphaned pack-objects after killing upload-pack on [ 2020-11-26 20:04 ` René Scharfe 2020-11-27 4:17 ` Jeff King @ 2020-12-01 12:15 ` Jeff King 2020-12-02 11:45 ` René Scharfe 1 sibling, 1 reply; 16+ messages in thread From: Jeff King @ 2020-12-01 12:15 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, 唐宇奕, git On Thu, Nov 26, 2020 at 09:04:35PM +0100, René Scharfe wrote: > >> We spawn an external pack-objects process to actually send objects to > >> the remote side. If we are killed by a signal during this process, > >> then pack-objects may continue to run. As soon as it starts producing > >> output for the pack, it will see a failure writing to upload-pack and > >> exit itself. But before then, it may do significant work traversing > >> the object graph, compressing deltas, etc, which will all be > >> pointless. So let's make sure to kill as soon as we know that the > >> caller will not read the result. > > > > Thanks, that reads well. > > > The patch is trivial, you don't need my sign-off. You could record Peff > as its author, as he contributed the most to the version in seen. I didn't want this topic to be forgotten, so here it is with me as the author, my signoff, and an overview of the reproduction in the commit message. (I am perfectly happy for René to be author, but I am mainly interested in resolving the signoff issue; I agree most of the work was in the diagnosis, and I did re-type the single line all by myself ;) ). -- >8 -- Subject: [PATCH] upload-pack: kill pack-objects helper on signal or exit We spawn an external pack-objects process to actually send objects to the remote side. If we are killed by a signal during this process, then pack-objects may continue to run. As soon as it starts producing output for the pack, it will see a failure writing to upload-pack and exit itself. But before then, it may do significant work traversing the object graph, compressing deltas, etc, which will all be pointless. So let's make sure to kill as soon as we know that the caller will not read the result. There's no test here, since it's inherently racy, but here's an easy reproduction is on a large-ish repo like linux.git: - make sure you don't have pack bitmaps (since they make the enumerating phase go quickly). For linux.git it takes ~30s or so to walk the whole graph on my machine. - run "git clone --no-local -q . dst"; the "-q" is important because if pack-objects is writing progress to upload-pack (to get multiplexed over the sideband to the client), then it will notice pretty quickly the failure to write to stderr - kill the client-side clone process in another terminal (don't use ^C, as that will send SIGINT to all of the processes) - run "ps au | grep git" or similar to observe upload-pack dying within 5 seconds (it will send a keepalive that will notice the client has gone away) - but you'll still see pack-objects consuming 100% CPU (and 1GB+ of RAM) during the traversal and delta compression phases. It will exit as soon as it starts to write the pack (when it will notice that upload-pack went away). With this patch, pack-objects exits as soon as upload-pack does. Signed-off-by: Jeff King <peff@peff.net> --- upload-pack.c | 1 + 1 file changed, 1 insertion(+) diff --git a/upload-pack.c b/upload-pack.c index 5dc8e1f844..1006bebd50 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -321,6 +321,7 @@ static void create_pack_file(struct upload_pack_data *pack_data, pack_objects.in = -1; pack_objects.out = -1; pack_objects.err = -1; + pack_objects.clean_on_exit = 1; if (start_command(&pack_objects)) die("git upload-pack: unable to fork git-pack-objects"); -- 2.29.2.893.g57eb4d1d5a ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Bug report: orphaned pack-objects after killing upload-pack on [ 2020-12-01 12:15 ` Jeff King @ 2020-12-02 11:45 ` René Scharfe 2020-12-02 22:14 ` Junio C Hamano 0 siblings, 1 reply; 16+ messages in thread From: René Scharfe @ 2020-12-02 11:45 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, 唐宇奕, git Am 01.12.20 um 13:15 schrieb Jeff King: > On Thu, Nov 26, 2020 at 09:04:35PM +0100, René Scharfe wrote: >> The patch is trivial, you don't need my sign-off. You could record Peff >> as its author, as he contributed the most to the version in seen. > > I didn't want this topic to be forgotten, so here it is with me as the > author, my signoff, and an overview of the reproduction in the commit > message. Thank you! René ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug report: orphaned pack-objects after killing upload-pack on [ 2020-12-02 11:45 ` René Scharfe @ 2020-12-02 22:14 ` Junio C Hamano 0 siblings, 0 replies; 16+ messages in thread From: Junio C Hamano @ 2020-12-02 22:14 UTC (permalink / raw) To: René Scharfe; +Cc: Jeff King, 唐宇奕, git René Scharfe <l.s.r@web.de> writes: > Am 01.12.20 um 13:15 schrieb Jeff King: >> On Thu, Nov 26, 2020 at 09:04:35PM +0100, René Scharfe wrote: >>> The patch is trivial, you don't need my sign-off. You could record Peff >>> as its author, as he contributed the most to the version in seen. >> >> I didn't want this topic to be forgotten, so here it is with me as the >> author, my signoff, and an overview of the reproduction in the commit >> message. > > Thank you! Thanks, both. ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2020-12-02 22:16 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-11-19 8:18 唐宇奕 2020-11-20 18:52 ` Bug report: orphaned pack-objects after killing upload-pack on [was: (no subject)] René Scharfe 2020-11-21 0:29 ` Jeff King 2020-11-21 21:54 ` Bug report: orphaned pack-objects after killing upload-pack on [ Junio C Hamano 2020-11-24 3:21 ` 唐宇奕 2020-11-24 9:11 ` Jeff King 2020-11-25 21:42 ` Junio C Hamano 2020-11-26 0:53 ` Jeff King 2020-11-26 1:04 ` Junio C Hamano 2020-11-26 20:04 ` René Scharfe 2020-11-27 4:17 ` Jeff King 2020-11-27 20:43 ` René Scharfe 2020-11-28 6:30 ` Jeff King 2020-12-01 12:15 ` Jeff King 2020-12-02 11:45 ` René Scharfe 2020-12-02 22:14 ` Junio C Hamano
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).