git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
From: Jeff Hostetler <git@jeffhostetler.com>
To: Josh Steadmon <steadmon@google.com>, git@vger.kernel.org
Cc: gitster@pobox.com
Subject: Re: [PATCH v2] run-command: don't spam trace2_child_exit()
Date: Fri, 10 Jun 2022 11:31:30 -0400	[thread overview]
Message-ID: <e1a078cc-b1b0-5ee0-9539-77fd97dd5a78@jeffhostetler.com> (raw)
In-Reply-To: <50d872a057a558fa5519856b95abd048ddb514dc.1654625626.git.steadmon@google.com>



On 6/7/22 2:21 PM, Josh Steadmon wrote:
> In rare cases[1], wait_or_whine() cannot determine a child process's
> status (and will return -1 in this case). This can cause Git to issue
> trace2 child_exit events despite the fact that the child may still be
> running. In pathological cases, we've seen > 80 million exit events in
> our trace logs for a single child process.
> 
> Fix this by only issuing trace2 events in finish_command_in_signal() if
> we get a value other than -1 from wait_or_whine(). This can lead to
> missing child_exit events in such a case, but that is preferable to
> duplicating events on a scale that threatens to fill the user's
> filesystem with invalid trace logs.
> 
> [1]: This can happen when:
> 
> * waitpid() returns -1 and errno != EINTR
> * waitpid() returns an invalid PID
> * the status set by waitpid() has neither the WIFEXITED() nor
>    WIFSIGNALED() flags
> 
> Signed-off-by: Josh Steadmon <steadmon@google.com>
> ---
> Updated the commit message with more details about when wait_or_whine()
> can fail.
> 
>   run-command.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/run-command.c b/run-command.c
> index a8501e38ce..e0fe2418a2 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -983,7 +983,8 @@ int finish_command(struct child_process *cmd)
>   int finish_command_in_signal(struct child_process *cmd)
>   {
>   	int ret = wait_or_whine(cmd->pid, cmd->args.v[0], 1);
> -	trace2_child_exit(cmd, ret);
> +	if (ret != -1)
> +		trace2_child_exit(cmd, ret);
>   	return ret;
>   }
>   
> 
> base-commit: faa21c10d44184f616d391c158dcbb13b9c72ef3
> 


I think there are larger bugs lurking here.

finish_command_in_signal() is only called from
wait_for_pager_signal(), which is loaded up by
sigchain_push_common() when the pager is started.

There are no loops or other callers here.

We sigchain-pop our handler upon first use, so even if we
were getting millions of PIPE signals from other chilld
processes, we shouldn't be getting more than 1 message
here.  (Assuming that our pop is working and our call to
raise() doesn't cause us to be called again.)

The code is behaving as if the "s->n--" at the bottom of
sigchain_pop() isn't being performed and our raise() is
calling us again.

I originally thought it might be due to unbalanced push/pops -- which
could happen if the process were in a "push; do something; pop"
sequence and the PIPE happened in the middle of that.  But I don't
think that matters.

So I have to wonder if the sigchain code has a bug.

The Linux man-page states that signal(2) is not thread-safe
and even if it were, the sigchain code here does not have any
locking on those global data structures.

But that might not be the problem here and it's hard to say without
any of the original trace data.

Having said all of that I think it would be better to fix the
original problem something like this (not tested):

  static void wait_for_pager_signal(int signo)
  {
+	static int handled = 0;
+
+	if (handled)
+		return;
+	handled = 1;
+
	close_pager_fds();
	finish_command_in_signal(&pager_process);
	sigchain_pop(signo);
	raise(signo);
  }

Jeff


      parent reply	other threads:[~2022-06-10 15:31 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-28 20:58 [PATCH] " Josh Steadmon
2022-04-28 21:46 ` Junio C Hamano
2022-05-03 14:59   ` Jeff Hostetler
2022-05-05 19:58     ` Josh Steadmon
2022-05-10 20:37       ` Jeff Hostetler
2022-06-07 18:45         ` Josh Steadmon
2022-05-05 19:44   ` Josh Steadmon
2022-06-07 18:21 ` [PATCH v2] " Josh Steadmon
2022-06-07 22:09   ` Ævar Arnfjörð Bjarmason
2022-06-10 15:31   ` Jeff Hostetler [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e1a078cc-b1b0-5ee0-9539-77fd97dd5a78@jeffhostetler.com \
    --to=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=steadmon@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).