From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS3215 2.6.0.0/16 X-Spam-Status: No, score=-5.0 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,NICE_REPLY_A, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from out1.vger.email (out1.vger.email [IPv6:2620:137:e000::1:20]) by dcvr.yhbt.net (Postfix) with ESMTP id 382131F4D7 for ; Tue, 3 May 2022 15:01:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237653AbiECPDf (ORCPT ); Tue, 3 May 2022 11:03:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55936 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237651AbiECPD0 (ORCPT ); Tue, 3 May 2022 11:03:26 -0400 Received: from siwi.pair.com (siwi.pair.com [209.68.5.199]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E91A539816 for ; Tue, 3 May 2022 07:59:53 -0700 (PDT) Received: from siwi.pair.com (localhost [127.0.0.1]) by siwi.pair.com (Postfix) with ESMTP id 1E49C3F47F0; Tue, 3 May 2022 10:59:53 -0400 (EDT) Received: from jeffhost-mbp.local (162-238-212-202.lightspeed.rlghnc.sbcglobal.net [162.238.212.202]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by siwi.pair.com (Postfix) with ESMTPSA id EC7843F4139; Tue, 3 May 2022 10:59:52 -0400 (EDT) Subject: Re: [PATCH] run-command: don't spam trace2_child_exit() To: Junio C Hamano , Josh Steadmon Cc: git@vger.kernel.org References: <4616d09ffa632bd2c9e308a713c4bdf2a1328c3c.1651179450.git.steadmon@google.com> From: Jeff Hostetler Message-ID: <75f62c9e-e083-d333-6339-2d12e0788400@jeffhostetler.com> Date: Tue, 3 May 2022 10:59:52 -0400 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:68.0) Gecko/20100101 Thunderbird/68.8.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On 4/28/22 5:46 PM, Junio C Hamano wrote: > Josh Steadmon writes: > >> In rare cases, wait_or_whine() cannot determine a child process's exit >> status (and will return -1 in this case). This can cause Git to issue >> trace2 child_exit events despite the fact that the child is still >> running. I'm curious what is causing the spurious return values. Could you instrument wait_or_whine() and see which of the if/else arms are causing the -1 to be returned? That routine is rather complicated and looks like it has 3 different ways that a -1 could be returned. > > Rather, we do not even know if the child is still running when it > happens, right? It is curious what "rare cases" makes the symptom > appear. Do we know? > > The patch looks OK from the "we do not know the child exited in this > case, so we shouldn't be reporting the child exit" point of view, of > course. Having one event that started a child in the log and then > having millions of events that reports the exit of the (same) child > is way too broken. With this change, we remove these phoney exit > events from the log. > > Do we know, for such a child process that caused these millions > phoney exit events, we got a real exit event at the end? Otherwise, > we'd still have a similar problem in the opposite direction, i.e. a > child has a start event recorded, many exit event discarded but the > log lacks the true exit event for the child, implying that the child > is still running because we failed to log its exit? > >> int finish_command_in_signal(struct child_process *cmd) >> { >> int ret = wait_or_whine(cmd->pid, cmd->args.v[0], 1); >> - trace2_child_exit(cmd, ret); >> + if (ret != -1) >> + trace2_child_exit(cmd, ret); >> return ret; >> } Since this is only called from pager.c and is used to setup the pager, I have to wonder if you're only getting these spurious events for the pager process or for any of the other random child processes. And whether they are received while the pager is alive and working properly, or when you're trying to quit the pager or when the pager is trying to signal eof. > > Will queue; thanks. >