git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Jeff King <peff@peff.net>
Cc: "SZEDER Gábor" <szeder.dev@gmail.com>,
	"Git mailing list" <git@vger.kernel.org>,
	"Clemens Buchacher" <drizzd@gmx.net>
Subject: Re: t5570-git-daemon fails with SIGPIPE on OSX
Date: Fri, 8 Feb 2019 10:28:12 +0100 (STD)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.1902081024550.41@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <nycvar.QRO.7.76.6.1902080958190.41@tvgsbejvaqbjf.bet>

[-- Attachment #1: Type: text/plain, Size: 6837 bytes --]

Hi Peff,

On Fri, 8 Feb 2019, Johannes Schindelin wrote:

> I just had a look at the patch you provided below (for some reason, my
> previous search on public-inbox only turned up Gábor's mail to which you
> responded).
> 
> Admittedly, I do not really understand all aspects of it, but it applies,
> still, and I kicked off a stress test here:
> 
> 	https://dev.azure.com/git/git/_build/results?buildId=338
> 
> It seems that your patch fixes that t5570 flakiness on macOS, and more
> importantly, addresses an important issue on macOS.
> 
> Will play a bit more with it and keep you posted.

Alas, I was fooled. *Fooled*, I say. Apparently the --stress option makes
the script *succeed* when it fails?

I say that because I wanted to make sure that your patch fixes things and
reverted your change and started another build, which succeeded. So I
started another build, then another build, and they all succeeded. Only
then it dawned on me that I had not looked at the *logs*. And they all
still report the same issue, even with your patch:

https://dev.azure.com/git/git/_build/results?buildId=338&view=logs&jobId=51041795-01c5-57f3-5561-107b6b9e51a6&taskId=fadc714a-a906-5cf2-cc7a-335e443ad2f8&lineStart=1402&lineEnd=1505&colStart=1&colEnd=32

(You will have to scroll all the way down, or press Ctrl+End, to see that
"fetch notices corrupt pack" is failing.)

So I am afraid that your patch does not fix the issue nor does it work
around it.

Ciao,
Dscho

> On Tue, 14 Aug 2018, Jeff King wrote:
> 
> > On Mon, Aug 06, 2018 at 05:11:13PM +0200, SZEDER Gábor wrote:
> > 
> > >   - 'git upload-pack' receives the request, parses the want line,
> > >     notices the corrupt pack, responds with an 'ERR upload-pack: not
> > >     our ref' pkt-line, and die()s right away.
> > > 
> > >   - 'git fetch' finally approaches the end of the function, where it
> > >     attempts to send a done pkt-line via another send_request() call
> > >     through the now closing TCP socket.
> > > 
> > >   - What happens now seems to depend on the platform:
> > > 
> > >     - On Linux, both on my machine and on Travis CI, it shows textbook
> > >       example behaviour: write() returns with error and sets errno to
> > >       ECONNRESET.  Since it happens in write_or_die(), 'git fetch'
> > >       die()s with 'fatal: write error: Connection reset by peer', and
> > >       doesn't show the error send by 'git upload-pack'; how could it,
> > >       it doesn't even get as far to receive upload-pack's ERR
> > >       pkt-line.
> > > 
> > >       The test only checks that 'git fetch' fails, but it doesn't
> > >       check whether it failed with the right error message, so the
> > >       test still succeeds.  Had it checked the error message as well,
> > >       we most likely had noticed this issue already, it doesn't happen
> > >       all that rarely.
> > 
> > Hmm. Traditionally we did not send ERR as part of upload-pack at all. It
> > was the message you got from git-daemon if it couldn't start the
> > requested sub-process. It was only later in bdb31eada7 (upload-pack:
> > report "not our ref" to client, 2017-02-23) that we started sending
> > them. So I think that is why it does not check the error message: it is
> > not expecting that case at all (and it is not actually interesting here,
> > as the real problem is that the remote side is corrupt, but it sadly
> > does not say anything so useful).
> > 
> > I think that's somewhat tangential, though. The root of the issue is
> > this:
> > 
> > >     - On the new OSX images with XCode 9.4 on Travis CI the write()
> > >       triggers SIGPIPE right away, and 'test_must_fail' notices it and
> > >       fails the test.  I couldn't see any sign of an ECONNRESET or any
> > >       other error that we could act upon to avoid the SIGPIPE.
> > 
> > Right, as soon as we get SIGPIPE we can't offer any useful message,
> > because we're dead. I would argue that fetch should simply turn off
> > SIGPIPE entirely, and rely on getting EPIPE from write(). But since
> > we're in write_or_die(), it actually turns EPIPE back into a SIGPIPE
> > death!
> > 
> > So we'd probably also want to teach it to use a real write_in_full(),
> > and then output a more useful message in this case. write_or_die()
> > really does produce bad messages regardless, because it doesn't know
> > what it's writing to.
> > 
> > That would give us a baby step in the right direction, because at least
> > we'd always be doing a controlled die() then. And then the next step
> > would be to show the remote error message (even though it's not actually
> > useful in this case, in theory upload-pack could generate something
> > better). And that would mean turning the die() on write into an attempt
> > to drain any ERR messages before either dying or returning an error up
> > the stack.
> > 
> > I suspect the (largely untested) patch below would make your test
> > problems go away. Or instead, we could simply add sigpipe=ok to the
> > test_must_fail invocation, but I agree with you that the current
> > behavior on OS X is not ideal (the user sees no error message).
> > 
> > -Peff
> > 
> > diff --git a/fetch-pack.c b/fetch-pack.c
> > index 5714bcbddd..3e80604562 100644
> > --- a/fetch-pack.c
> > +++ b/fetch-pack.c
> > @@ -188,8 +188,10 @@ static void send_request(struct fetch_pack_args *args,
> >  	if (args->stateless_rpc) {
> >  		send_sideband(fd, -1, buf->buf, buf->len, LARGE_PACKET_MAX);
> >  		packet_flush(fd);
> > -	} else
> > -		write_or_die(fd, buf->buf, buf->len);
> > +	} else {
> > +		if (write_in_full(fd, buf->buf, buf->len) < 0)
> > +			die_errno("unable to write to remote");
> > +	}
> >  }
> >  
> >  static void insert_one_alternate_object(struct fetch_negotiator *negotiator,
> > @@ -1167,7 +1169,8 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
> >  
> >  	/* Send request */
> >  	packet_buf_flush(&req_buf);
> > -	write_or_die(fd_out, req_buf.buf, req_buf.len);
> > +	if (write_in_full(fd_out, req_buf.buf, req_buf.len) < 0)
> > +		die_errno("unable to write request to remote");
> >  
> >  	strbuf_release(&req_buf);
> >  	return ret;
> > diff --git a/pkt-line.c b/pkt-line.c
> > index a593c08aad..450d0801b1 100644
> > --- a/pkt-line.c
> > +++ b/pkt-line.c
> > @@ -88,13 +88,15 @@ static void packet_trace(const char *buf, unsigned int len, int write)
> >  void packet_flush(int fd)
> >  {
> >  	packet_trace("0000", 4, 1);
> > -	write_or_die(fd, "0000", 4);
> > +	if (write_in_full(fd, "0000", 4) < 0)
> > +		die_errno("unable to write flush packet");
> >  }
> >  
> >  void packet_delim(int fd)
> >  {
> >  	packet_trace("0001", 4, 1);
> > -	write_or_die(fd, "0001", 4);
> > +	if (write_in_full(fd, "0000", 4) < 0)
> > +		die_errno("unable to write delim packet");
> >  }
> >  
> >  int packet_flush_gently(int fd)
> > 
> > 

  reply	other threads:[~2019-02-08  9:28 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-06 15:11 t5570-git-daemon fails with SIGPIPE on OSX SZEDER Gábor
2018-08-06 15:31 ` SZEDER Gábor
2019-02-08  8:32   ` Johannes Schindelin
2019-02-08 12:54     ` SZEDER Gábor
2018-08-14 22:32 ` Jeff King
2018-08-14 22:37   ` Jeff King
2019-02-08  9:02   ` Johannes Schindelin
2019-02-08  9:28     ` Johannes Schindelin [this message]
2019-02-08 19:54       ` Jeff King
2019-03-01 15:02         ` Johannes Schindelin
2019-03-01 19:00           ` Jeff King
2019-03-02 21:21             ` Johannes Schindelin
2019-03-03 16:54               ` Jeff King
2019-03-03 16:55                 ` [PATCH 1/2] fetch: avoid calling write_or_die() Jeff King
2019-03-04 13:42                   ` Duy Nguyen
2019-03-05  4:11                     ` Jeff King
2019-03-03 16:58                 ` [PATCH 2/2] fetch: ignore SIGPIPE during network operation Jeff King
2019-03-04  1:11                   ` Junio C Hamano
2019-03-05  4:11                     ` Jeff King
2019-03-03  1:21             ` t5570-git-daemon fails with SIGPIPE on OSX Junio C Hamano
2019-03-03 14:56               ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.1902081024550.41@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=drizzd@gmx.net \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).