git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 1/1] upload-pack: fix race condition in error messages
Date: Wed, 28 Aug 2019 10:54:12 -0400	[thread overview]
Message-ID: <20190828145412.GB14432@sigill.intra.peff.net> (raw)
In-Reply-To: <20190828093408.GD8571@szeder.dev>

On Wed, Aug 28, 2019 at 11:34:08AM +0200, SZEDER Gábor wrote:

> On Tue, Aug 27, 2019 at 06:43:29PM -0700, Derrick Stolee via GitGitGadget wrote:
> > Test t5516-fetch-push.sh has a test 'deny fetch unreachable SHA1,
> > allowtipsha1inwant=true' that checks stderr for a specific error
> > string from the remote. In some build environments the error sent
> > over the remote connection gets mingled with the error from the
> > die() statement. Since both signals are being output to the same
> > file descriptor (but from parent and child processes), the output
> > we are matching with grep gets split.
> 
> In the spirit of "An error message is worth a thousand words", I think
> it's worth to include the error message causing the failure:
> 
>   error: 'grep not our ref.*64ea4c133d59fa98e86a771eda009872d6ab2886 err' didn't find a match in:
>   fatal: git upload-pack: not our ref 64ea4c13fatal: remote error: upload-pack: not our ref 63d59fa98e86a771eda009872d6ab2886
>   4ea4c133d59fa98e86a771eda009872d6ab2886
>   error: last command exited with $?=1
> 
> I tried to reproduce this specific error on Linux and macOS, but
> couldn't yet.

I suspect it depends on write() to a file not being atomic, since we
should be able to get the full message out to a single write in both
cases. It's _possible_ that stderr is fully buffered on Windows, but it
generally shouldn't be. If it is, then this might help:

diff --git a/usage.c b/usage.c
index 2fdb20086b..d6df31bc5b 100644
--- a/usage.c
+++ b/usage.c
@@ -10,13 +10,19 @@ void vreportf(const char *prefix, const char *err, va_list params)
 {
 	char msg[4096];
 	char *p;
-
-	vsnprintf(msg, sizeof(msg), err, params);
+	size_t len;
+
+	/* truncation is OK here, but make sure we have a newline */
+	len = xsnprintf(msg, sizeof(msg), "%s", prefix);
+	len += vsnprintf(msg + len, sizeof(msg) - len, err, params);
+	if (len >= sizeof(msg) - 1)
+		len--;
+	len += xsnprintf(msg + len, sizeof(msg) - len, "\n");
 	for (p = msg; *p; p++) {
 		if (iscntrl(*p) && *p != '\t' && *p != '\n')
 			*p = '?';
 	}
-	fprintf(stderr, "%s%s\n", prefix, msg);
+	write(2, msg, len);
 }
 
 static NORETURN void usage_builtin(const char *err, va_list params)

But again, I'm doubtful.

> Here you talk about reducing the risk ...
> 
> > 1. Write an error message to stderr.
> > 2. Write an error message across the connection.
> > 3. exit(1).
> > 
> > This reorders the events so the error is written entirely before
> > the client receives a message from the remote, removing the race
> > condition.
> 
> ... but here you talk about removing the race condition.
> 
> I don't understand how this change would remove the race condition.
> After all, the occasional failure is caused by two messages racing
> through different file descriptors, and merely sending them in reverse
> order doesn't change that they would still be racing.

I had the same puzzlement. I think the answer might be that in the
original, we can have two write()s happening simultaneously:

  1. upload-pack sends the packet to the client

  2. client receives packet

  3a. upload-pack then writes to stderr

  3b. simultaneously, the client writes to stderr

But by reordering, step 3a is completed before step 1.

> > +			warning("git upload-pack: not our ref %s",
> > +				oid_to_hex(&o->oid));
> >  			packet_writer_error(writer,
> >  					    "upload-pack: not our ref %s",
> >  					    oid_to_hex(&o->oid));
> > -			die("git upload-pack: not our ref %s",
> > -			    oid_to_hex(&o->oid));
> > +			exit(1);
> 
> 
> So, the error coming from the 'git fetch' command in question
> currently looks like this:
> 
>   fatal: git upload-pack: not our ref 64ea4c133d59fa98e86a771eda009872d6ab2886
>   fatal: remote error: upload-pack: not our ref 64ea4c133d59fa98e86a771eda009872d6ab2886
> 
> Changing die() to a warning() changes the prefix of the message from
> "fatal:" to "warning:", i.e. with this patch I got this:
> 
>   warning: git upload-pack: not our ref 64ea4c133d59fa98e86a771eda009872d6ab2886
>   fatal: remote error: upload-pack: not our ref 64ea4c133d59fa98e86a771eda009872d6ab2886
> 
> I don't think that "demoting" that message from fatal to warning
> matters much to users, because they would see the (arguably redundant)
> second line starting with "fatal:".

An interesting thing about this message is that many users won't see it
at all! It requires that they be connected to the stderr of upload-pack,
which happens only for local-filesystem clones, and for git-over-ssh to
a vanilla ssh session.

It won't be seen for git:// nor for https://, which go to the stderr of
git-daemon and the webserver respectively. Nor would it be seen on any
hosting site which uses something other than vanilla sshd to terminate
the connection (at GitHub for example, ssh terminates in a proxy layer
which then uses a different protocol to run upload-pack on the backend).

That's why we've been adding this sideband-level errors: so everybody
can see them. So unless the client is a 2005-era version of Git that
lacks sideband, the die message is either invisible to the user, or
redundant.

I'd almost suggest it is worth dropping the die() message altogether
here, but it does carry some value to people collecting them on the
server side[1].

Another solution is for Git to redirect the stderr of the child
upload-pack it runs, which puts all invocations on the same footing,
regardless of protocol. But that risks losing useful messages from
before the sideband starts up (e.g., ssh failures).

[1] At GitHub, we record the exit status of every Git process, and we've
    hooked die() to record the message, which is often helpful for
    post-facto debugging. I'd hate to lose it, but we could certainly
    work around it for this case, perhaps by hooking
    packet_writer_error() in a similar way.

> Unfortunately, however, while running './t5516-fetch-push.sh -r 1,79
> --stress' to try to reproduce a failure caused by those mingled
> messages, the same check only failed for a different reason so far
> (both on Linux and macOS (on Travis CI)):

There's some hand-waving argument that this should be race-free in
014ade7484 (upload-pack: send ERR packet for non-tip objects,
2019-04-13), but I am not too surprised if there is a flaw in that
logic.

This kind of race is a problem anytime upload-pack dies. And it's not
just a test issue, but a real problem for users: they might see EPIPE
instead of the error message from the server (it used to be SIGPIPE but
we recently relaxed that).

There's details and a possible path forward discussed in:

  https://public-inbox.org/git/20190305041139.GA19800@sigill.intra.peff.net/

-Peff

  reply	other threads:[~2019-08-28 14:54 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-28  1:43 [PATCH 0/1] upload-pack: fix race condition in t5516 Derrick Stolee via GitGitGadget
2019-08-28  1:43 ` [PATCH 1/1] upload-pack: fix race condition in error messages Derrick Stolee via GitGitGadget
2019-08-28  9:34   ` SZEDER Gábor
2019-08-28 14:54     ` Jeff King [this message]
2019-08-28 15:39       ` Jeff King
2019-08-28 16:15         ` SZEDER Gábor
2019-08-29 12:58           ` Derrick Stolee
2019-08-29 14:13             ` Jeff King
2019-08-29 14:27               ` Derrick Stolee
2019-08-29 14:38                 ` Jeff King
2019-08-29 21:58                   ` SZEDER Gábor
2019-08-29 22:06                     ` SZEDER Gábor
2019-08-30 12:10                       ` SZEDER Gábor
2019-09-04  5:04                         ` Jeff King
2019-09-10 12:08                           ` SZEDER Gábor

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190828145412.GB14432@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).