git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] t5500.43: make the check a bit more robust
@ 2020-10-13 14:45 Johannes Schindelin via GitGitGadget
  2020-10-13 17:53 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-10-13 14:45 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Johannes Schindelin, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

In 2b695ecd74d (t5500: count objects through stderr, not trace,
2020-05-06) we tried to ensure that the "Total 3" message could be
grepped in Git's output, even if it sometimes got chopped up into
multiple lines in the trace machinery.

However, the first instance where this mattered now goes through the
sideband machinery, where it is _still_ possible for messages to get
chopped up.

Note: we have code in `demultiplex_sideband()` _specifically_ to stitch
back together lines that were delivered in separate sideband packets.
However, this stitching fails when a primary packet is delivered in
between the two sideband packets: since a primary packet was received,
`demultiplex_sideband()` has to return that (and cannot wait for other
sideband packets, and therefore has to flush any incomplete line it
received).

This seems only to happen occasionally in the `vs-test` part of our CI
builds, i.e. with binaries built using Visual C, but not when building
with GCC or clang; The symptom is that t5500.43 fails to find a line
matching `remote: Total 3` in the `log` file, which ends in something
along these lines:

	remote: Tota
	remote: l 3 (delta 0), reused 0 (delta 0), pack-reused 0

To work around that, use some `sed` magic to re-stitch together those
split lines, after the fact.

The other test case touched by 2b695ecd74d (t5500: count objects through
stderr, not trace, 2020-05-06) is not affected by this issue because the
sideband machinery is not involved there.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    Work around flakiness in t5500.43
    
    It seems that this test became flaky only recently, although I have to
    admit that I have no idea why: the involved code does not seem to have
    changed recently at all. It should have been fixed by 
    https://lore.kernel.org/git/20200506220741.71021-1-jonathantanmy@google.com/
    , but apparently wasn't completely fixed, despite what I said in that
    thread.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-753%2Fdscho%2Funflake-t5500.43-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-753/dscho/unflake-t5500.43-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/753

 t/t5500-fetch-pack.sh | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 3557374312..5e556313af 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -400,7 +400,20 @@ test_expect_success 'in_vain not triggered before first ACK' '
 	test_commit -C myserver bar &&
 
 	git -C myclient fetch --progress origin 2>log &&
-	test_i18ngrep "remote: Total 3 " log
+	if ! test_i18ngrep "remote: Total 3 " log
+	then
+		# It is possible that the "Total 3" line is delivered in
+		# multiple sideband packets, and that a primary packet is
+		# delivered in between. When that happens, the line will be
+		# presented on multiple "remote:" lines.
+		sed "/^remote: T/{
+			:a
+			N
+			s/\nremote: //
+			ba
+		}" log >log.unsplit &&
+		test_i18ngrep "remote: Total 3 " log.unsplit
+	fi
 '
 
 test_expect_success 'in_vain resetted upon ACK' '

base-commit: d4a392452e292ff924e79ec8458611c0f679d6d4
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] t5500.43: make the check a bit more robust
  2020-10-13 14:45 [PATCH] t5500.43: make the check a bit more robust Johannes Schindelin via GitGitGadget
@ 2020-10-13 17:53 ` Junio C Hamano
  2020-10-13 18:55 ` Jeff King
  2020-10-19 19:35 ` [PATCH v2 0/3] Work around flakiness in t5500.43 Johannes Schindelin via GitGitGadget
  2 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2020-10-13 17:53 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Jonathan Tan, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> Note: we have code in `demultiplex_sideband()` _specifically_ to stitch
> back together lines that were delivered in separate sideband packets.
> However, this stitching fails when a primary packet is delivered in
> between the two sideband packets: since a primary packet was received,
> `demultiplex_sideband()` has to return that (and cannot wait for other
> sideband packets, and therefore has to flush any incomplete line it
> received).

Ouch.  Good find.

>  	git -C myclient fetch --progress origin 2>log &&
> -	test_i18ngrep "remote: Total 3 " log
> +	if ! test_i18ngrep "remote: Total 3 " log
> +	then
> +		# It is possible that the "Total 3" line is delivered in
> +		# multiple sideband packets, and that a primary packet is
> +		# delivered in between. When that happens, the line will be
> +		# presented on multiple "remote:" lines.
> +		sed "/^remote: T/{
> +			:a
> +			N
> +			s/\nremote: //
> +			ba

OK, so if we see multiple and adjacent "remote:" lines, we strip the
leading "remote: " prefix from the second and subsequent lines and
concatenate them into one giant line?  Sounds good.

Will queue.  Thanks.

> +		}" log >log.unsplit &&
> +		test_i18ngrep "remote: Total 3 " log.unsplit
> +	fi
>  '
>  
>  test_expect_success 'in_vain resetted upon ACK' '
>
> base-commit: d4a392452e292ff924e79ec8458611c0f679d6d4

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] t5500.43: make the check a bit more robust
  2020-10-13 14:45 [PATCH] t5500.43: make the check a bit more robust Johannes Schindelin via GitGitGadget
  2020-10-13 17:53 ` Junio C Hamano
@ 2020-10-13 18:55 ` Jeff King
  2020-10-13 19:07   ` Junio C Hamano
                     ` (2 more replies)
  2020-10-19 19:35 ` [PATCH v2 0/3] Work around flakiness in t5500.43 Johannes Schindelin via GitGitGadget
  2 siblings, 3 replies; 33+ messages in thread
From: Jeff King @ 2020-10-13 18:55 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Jonathan Tan, Johannes Schindelin

On Tue, Oct 13, 2020 at 02:45:23PM +0000, Johannes Schindelin via GitGitGadget wrote:

> Note: we have code in `demultiplex_sideband()` _specifically_ to stitch
> back together lines that were delivered in separate sideband packets.
> However, this stitching fails when a primary packet is delivered in
> between the two sideband packets: since a primary packet was received,
> `demultiplex_sideband()` has to return that (and cannot wait for other
> sideband packets, and therefore has to flush any incomplete line it
> received).

This sounds like a bug. We should accumulate multiple sideband 2
packets, even across data packets. And I think we _used_ to do that. The
original recv_sideband() looked roughly like this:

	while (!retval) {
		 packet_read();
		 band = buf[0] & 0xff;
		 switch (band) {
		 case 3:
		   ...stuff data into outbuf...
		   retval = SIDEBAND_ERROR_REMOTE;
		   break;
		 case 2:
		   ...print full lines, stuff remainder into outbuf...
		   break; /* note, does _not_ set retval */
		 case 1:
		   write_or_die(out, buf + 1, len);
		   break; /* likewise, does not set retval */
		 default:
		   ...complain about broken sideband...
		   retval = SIDEBAND_PROTOCOL_ERROR;
		}
	}
	if (outbuf.len)
		...print outbuf...

So we would keep looping until we see EOF, sideband 3, or a protocol
error. But notably we would not break out of the loop on bands 1 or 2,
and we do not flush band 2 until we break out of the loop.

But then in fbd76cd450 (sideband: reverse its dependency on pkt-line,
2019-01-16), the function became demultiplex_sideband(). The loop went
away, and we pump only a single packet on each call. When we see
sideband 2, we do an early return. But on sideband 1, we continue to the
cleanup: label, which flushes the scratch buffer.

I think we need to return from there without hitting that cleanup label,
like this:

diff --git a/sideband.c b/sideband.c
index 0a60662fa6..a5405b9aaa 100644
--- a/sideband.c
+++ b/sideband.c
@@ -190,7 +190,7 @@ int demultiplex_sideband(const char *me, char *buf, int len,
 		return 0;
 	case 1:
 		*sideband_type = SIDEBAND_PRIMARY;
-		break;
+		return 1;
 	default:
 		strbuf_addf(scratch, "%s%s: protocol error: bad band #%d",
 			    scratch->len ? "\n" : "", me, band);

Does that make the problem go away for you?

>     Work around flakiness in t5500.43
>     
>     It seems that this test became flaky only recently, although I have to
>     admit that I have no idea why: the involved code does not seem to have
>     changed recently at all. It should have been fixed by 
>     https://lore.kernel.org/git/20200506220741.71021-1-jonathantanmy@google.com/
>     , but apparently wasn't completely fixed, despite what I said in that
>     thread.

I think this is a real bug, and we shouldn't be changing the tests to
accommodate. Users may actually see the broken lines, and our tests are
telling us that.

I suspect it's uncommon in practice because it requires the server side
also splitting the line across two packets. And while the server-side
upload-pack might get a partial write from a child pack-objects or
whatever, that should only happen if:

  - the pipe buffer fills up. Which is hard to do since our messages
    tend to be very small. So perhaps it implies a very tiny pipe
    buffer, and/or slow scheduling of the receiving side to actually
    clean it out.

  - the writer is fully buffering its stderr writes instead of sending
    them a line at a time

The latter seems a more likely candidate for switching from gcc to
Visual C (which presumably has a different libc / CRT).  I think the
client should be handling this more robustly, but it may be worth
digging into the buffering issue if it means progress may not be
delivered in quite as timely a way as we expect it to be.

-Peff

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] t5500.43: make the check a bit more robust
  2020-10-13 18:55 ` Jeff King
@ 2020-10-13 19:07   ` Junio C Hamano
  2020-10-13 19:09   ` Jeff King
  2020-10-17  2:34   ` Johannes Schindelin
  2 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2020-10-13 19:07 UTC (permalink / raw)
  To: Jeff King
  Cc: Johannes Schindelin via GitGitGadget, git, Jonathan Tan,
	Johannes Schindelin

Jeff King <peff@peff.net> writes:

>>     Work around flakiness in t5500.43
>>     
>>     It seems that this test became flaky only recently, although I have to
>>     admit that I have no idea why: the involved code does not seem to have
>>     changed recently at all. It should have been fixed by 
>>     https://lore.kernel.org/git/20200506220741.71021-1-jonathantanmy@google.com/
>>     , but apparently wasn't completely fixed, despite what I said in that
>>     thread.
>
> I think this is a real bug, and we shouldn't be changing the tests to
> accommodate. Users may actually see the broken lines, and our tests are
> telling us that.

True; I reacted too hastily.  Will not queue the workaround ;-)


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] t5500.43: make the check a bit more robust
  2020-10-13 18:55 ` Jeff King
  2020-10-13 19:07   ` Junio C Hamano
@ 2020-10-13 19:09   ` Jeff King
  2020-10-17  3:31     ` Johannes Schindelin
  2020-10-17  2:34   ` Johannes Schindelin
  2 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2020-10-13 19:09 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Jonathan Tan, Johannes Schindelin

On Tue, Oct 13, 2020 at 02:55:15PM -0400, Jeff King wrote:

> But then in fbd76cd450 (sideband: reverse its dependency on pkt-line,
> 2019-01-16), the function became demultiplex_sideband(). The loop went
> away, and we pump only a single packet on each call. When we see
> sideband 2, we do an early return. But on sideband 1, we continue to the
> cleanup: label, which flushes the scratch buffer.
> 
> I think we need to return from there without hitting that cleanup label,
> like this:

By the way, the reason this works is that the "scratch" buffer persists
beyond individual calls to demultiplex_sideband(). So we can get away
with letting it persist between the two.

However unlike the original recv_sideband(), which broke out of the loop
on error and then flushed scratch, our post-fbd76cd450 does not do the
same. It now looks like:

int recv_sideband(const char *me, int in_stream, int out)
{
        char buf[LARGE_PACKET_MAX + 1];
        int len;
        struct strbuf scratch = STRBUF_INIT;
        enum sideband_type sideband_type;

        while (1) {
                len = packet_read(in_stream, NULL, NULL, buf, LARGE_PACKET_MAX,
                                  0);
                if (!demultiplex_sideband(me, buf, len, 0, &scratch,
                                          &sideband_type))
                        continue;
                switch (sideband_type) {
                case SIDEBAND_PRIMARY:
                        write_or_die(out, buf + 1, len - 1);
                        break;
                default: /* errors: message already written */
                        return sideband_type;
                }
        }
}

I wondered if we could ever have a case where we broke out of the loop
with data left over in "scratch" (or its buffer left allocated). I think
the answer is no. We only break out of the loop if
demultiplex_sideband() returned non-zero _and_ its not the primary
sideband. So both before and after my suggested fix, we'd have hit the
cleanup label in demultiplex_sideband(), which flushes and frees the
buffer.

I do have to say that the original loop before that commit was a lot
easier to follow, though.

Another weirdness I noticed is that it doesn't distinguish a flush
packet (expected) from a zero-byte packet (an error). But neither did
the original. I would have guessed the zero-byte packet was meant to
trigger this second conditional:

        if (len == 0) {
                *sideband_type = SIDEBAND_FLUSH;
                goto cleanup;
        }
        if (len < 1) {
                strbuf_addf(scratch,
                            "%s%s: protocol error: no band designator",
                            scratch->len ? "\n" : "", me);
                *sideband_type = SIDEBAND_PROTOCOL_ERROR;
                goto cleanup;
        }

but we'd hit the first conditional before then. We would still trigger
the second if we saw EOF while reading the packet (because we set the
length to -1 then), but then it's arguably the wrong error to be
showing.

So I think this could be improved a bit by using
packet_read_with_status() in the recv_sideband() caller.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] t5500.43: make the check a bit more robust
  2020-10-13 18:55 ` Jeff King
  2020-10-13 19:07   ` Junio C Hamano
  2020-10-13 19:09   ` Jeff King
@ 2020-10-17  2:34   ` Johannes Schindelin
  2 siblings, 0 replies; 33+ messages in thread
From: Johannes Schindelin @ 2020-10-17  2:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin via GitGitGadget, git, Jonathan Tan

Hi Peff,

On Tue, 13 Oct 2020, Jeff King wrote:

> On Tue, Oct 13, 2020 at 02:45:23PM +0000, Johannes Schindelin via GitGitGadget wrote:
>
> > Note: we have code in `demultiplex_sideband()` _specifically_ to stitch
> > back together lines that were delivered in separate sideband packets.
> > However, this stitching fails when a primary packet is delivered in
> > between the two sideband packets: since a primary packet was received,
> > `demultiplex_sideband()` has to return that (and cannot wait for other
> > sideband packets, and therefore has to flush any incomplete line it
> > received).
>
> This sounds like a bug. We should accumulate multiple sideband 2
> packets, even across data packets. And I think we _used_ to do that. The
> original recv_sideband() looked roughly like this:
>
> 	while (!retval) {
> 		 packet_read();
> 		 band = buf[0] & 0xff;
> 		 switch (band) {
> 		 case 3:
> 		   ...stuff data into outbuf...
> 		   retval = SIDEBAND_ERROR_REMOTE;
> 		   break;
> 		 case 2:
> 		   ...print full lines, stuff remainder into outbuf...
> 		   break; /* note, does _not_ set retval */
> 		 case 1:
> 		   write_or_die(out, buf + 1, len);
> 		   break; /* likewise, does not set retval */
> 		 default:
> 		   ...complain about broken sideband...
> 		   retval = SIDEBAND_PROTOCOL_ERROR;
> 		}
> 	}
> 	if (outbuf.len)
> 		...print outbuf...
>
> So we would keep looping until we see EOF, sideband 3, or a protocol
> error. But notably we would not break out of the loop on bands 1 or 2,
> and we do not flush band 2 until we break out of the loop.
>
> But then in fbd76cd450 (sideband: reverse its dependency on pkt-line,
> 2019-01-16), the function became demultiplex_sideband(). The loop went
> away, and we pump only a single packet on each call. When we see
> sideband 2, we do an early return. But on sideband 1, we continue to the
> cleanup: label, which flushes the scratch buffer.
>
> I think we need to return from there without hitting that cleanup label,
> like this:
>
> diff --git a/sideband.c b/sideband.c
> index 0a60662fa6..a5405b9aaa 100644
> --- a/sideband.c
> +++ b/sideband.c
> @@ -190,7 +190,7 @@ int demultiplex_sideband(const char *me, char *buf, int len,
>  		return 0;
>  	case 1:
>  		*sideband_type = SIDEBAND_PRIMARY;
> -		break;
> +		return 1;
>  	default:
>  		strbuf_addf(scratch, "%s%s: protocol error: bad band #%d",
>  			    scratch->len ? "\n" : "", me, band);
>
> Does that make the problem go away for you?

Yes. But it took a substantial amount of time for myself to convince
myself that this oneliner is actually correct.

>
> >     Work around flakiness in t5500.43
> >
> >     It seems that this test became flaky only recently, although I have to
> >     admit that I have no idea why: the involved code does not seem to have
> >     changed recently at all. It should have been fixed by
> >     https://lore.kernel.org/git/20200506220741.71021-1-jonathantanmy@google.com/
> >     , but apparently wasn't completely fixed, despite what I said in that
> >     thread.
>
> I think this is a real bug, and we shouldn't be changing the tests to
> accommodate. Users may actually see the broken lines, and our tests are
> telling us that.
>
> I suspect it's uncommon in practice because it requires the server side
> also splitting the line across two packets. And while the server-side
> upload-pack might get a partial write from a child pack-objects or
> whatever, that should only happen if:
>
>   - the pipe buffer fills up. Which is hard to do since our messages
>     tend to be very small. So perhaps it implies a very tiny pipe
>     buffer, and/or slow scheduling of the receiving side to actually
>     clean it out.
>
>   - the writer is fully buffering its stderr writes instead of sending
>     them a line at a time
>
> The latter seems a more likely candidate for switching from gcc to
> Visual C (which presumably has a different libc / CRT).  I think the
> client should be handling this more robustly, but it may be worth
> digging into the buffering issue if it means progress may not be
> delivered in quite as timely a way as we expect it to be.

FWIW I _think_ the issue at hand is that newer MSVC runtimes deliver
`stderr` byte-by-byte. While that is totally allowed, it is different from
how things seem to be done on Linux/Unix/Darwin.

In my tests, I frequently saw the first two or four characters of that
"Total 3" line be delivered in its own sideband packet, hence the
breakage.

Thank you for keeping me honest. I would much rather have avoided spending
several hours on this investigation, but in the end, it is something that
is better made correct than left incorrect.

Having said that it is neither a very new bug nor does it seem to matter
on Linux/macOS (and not even in Git for Windows because it is built using
mingw-w64-gcc, which uses an older MSVC runtime, which apparently does not
share that issue with newer versions), therefore I plan on sending off v2
only after v2.29.0 was published.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] t5500.43: make the check a bit more robust
  2020-10-13 19:09   ` Jeff King
@ 2020-10-17  3:31     ` Johannes Schindelin
  0 siblings, 0 replies; 33+ messages in thread
From: Johannes Schindelin @ 2020-10-17  3:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin via GitGitGadget, git, Jonathan Tan

Hi Peff,

On Tue, 13 Oct 2020, Jeff King wrote:

> On Tue, Oct 13, 2020 at 02:55:15PM -0400, Jeff King wrote:
>
> > But then in fbd76cd450 (sideband: reverse its dependency on pkt-line,
> > 2019-01-16), the function became demultiplex_sideband(). The loop went
> > away, and we pump only a single packet on each call. When we see
> > sideband 2, we do an early return. But on sideband 1, we continue to the
> > cleanup: label, which flushes the scratch buffer.
> >
> > I think we need to return from there without hitting that cleanup label,
> > like this:
>
> By the way, the reason this works is that the "scratch" buffer persists
> beyond individual calls to demultiplex_sideband(). So we can get away
> with letting it persist between the two.

The thing that threw me was that I somehow expected `recv_sideband()` to
return primary data as it is read, much like `read()` operates. And yes,
I also found the split version of the code (`recv_sideband()` contains the
`while` loop and lives in `pkt-line.c` while `demultiplex_sideband()`
contains the `scratch` handling and the `switch` between packet types and
it lives in `sideband.c`) was much harder to read than the original
version.

> However unlike the original recv_sideband(), which broke out of the loop
> on error and then flushed scratch, our post-fbd76cd450 does not do the
> same. It now looks like:
>
> int recv_sideband(const char *me, int in_stream, int out)
> {
>         char buf[LARGE_PACKET_MAX + 1];
>         int len;
>         struct strbuf scratch = STRBUF_INIT;
>         enum sideband_type sideband_type;
>
>         while (1) {
>                 len = packet_read(in_stream, NULL, NULL, buf, LARGE_PACKET_MAX,
>                                   0);
>                 if (!demultiplex_sideband(me, buf, len, 0, &scratch,
>                                           &sideband_type))
>                         continue;
>                 switch (sideband_type) {
>                 case SIDEBAND_PRIMARY:
>                         write_or_die(out, buf + 1, len - 1);
>                         break;
>                 default: /* errors: message already written */
>                         return sideband_type;
>                 }
>         }
> }
>
> I wondered if we could ever have a case where we broke out of the loop
> with data left over in "scratch" (or its buffer left allocated). I think
> the answer is no. We only break out of the loop if
> demultiplex_sideband() returned non-zero _and_ its not the primary
> sideband. So both before and after my suggested fix, we'd have hit the
> cleanup label in demultiplex_sideband(), which flushes and frees the
> buffer.

Right.

It took me quite a while to convince myself of that, too.

And since I am really worried that the complexity of the code makes it
easy to introduce a regression, I spent quite a bit of time to figure out
how to implement a good regression test for exactly this issue.

Even so, the regression test will not necessarily catch problems where the
`while` loop is abandoned without processing any unfinished sideband
message. I introduced a `BUG()` for that case, but it is quite a bit
unsatisfying that I could not come up with a better way to ensure that
this does not happen.

> I do have to say that the original loop before that commit was a lot
> easier to follow, though.
>
> Another weirdness I noticed is that it doesn't distinguish a flush
> packet (expected) from a zero-byte packet (an error). But neither did
> the original. I would have guessed the zero-byte packet was meant to
> trigger this second conditional:
>
>         if (len == 0) {
>                 *sideband_type = SIDEBAND_FLUSH;
>                 goto cleanup;
>         }
>         if (len < 1) {
>                 strbuf_addf(scratch,
>                             "%s%s: protocol error: no band designator",
>                             scratch->len ? "\n" : "", me);
>                 *sideband_type = SIDEBAND_PROTOCOL_ERROR;
>                 goto cleanup;
>         }
>
> but we'd hit the first conditional before then. We would still trigger
> the second if we saw EOF while reading the packet (because we set the
> length to -1 then), but then it's arguably the wrong error to be
> showing.
>
> So I think this could be improved a bit by using
> packet_read_with_status() in the recv_sideband() caller.

Possibly. But is it really a bug to send a zero-byte packet? It is
inefficient, sure. But I could see how it could potentially simplify code,
or serve e.g. as some sort of a "keep-alive" mechanism or whatever.

In other words, I am not sure that  we should treat this as an error, but
yes, we should probably not treat it as a flush (even if it is likely that
our current sideband users simply won't ever send empty primary packets).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 0/3] Work around flakiness in t5500.43
  2020-10-13 14:45 [PATCH] t5500.43: make the check a bit more robust Johannes Schindelin via GitGitGadget
  2020-10-13 17:53 ` Junio C Hamano
  2020-10-13 18:55 ` Jeff King
@ 2020-10-19 19:35 ` Johannes Schindelin via GitGitGadget
  2020-10-19 19:35   ` [PATCH v2 1/3] sideband: avoid reporting incomplete sideband messages Johannes Schindelin via GitGitGadget
                     ` (4 more replies)
  2 siblings, 5 replies; 33+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-10-19 19:35 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Jeff King, Johannes Schindelin, Johannes Schindelin

It seems that this test became flaky only recently, although I have to admit
that I have no idea why: the involved code does not seem to have changed
recently at all. It should have been fixed by 
https://lore.kernel.org/git/20200506220741.71021-1-jonathantanmy@google.com/
, but apparently wasn't completely fixed, despite what I said in that
thread.

Changes since v1:

 * Instead of papering over the underlying cause, the patch was completely
   changed to actually fix the bug and add a proper regression test for it
   (originally, I wanted to act according to the common notion that good
   programmers are lazy, oh my, see how well that worked out for me).
 * We now specifically watch out for future bugs where incomplete sideband
   messages would be dropped inadvertently rather than being reported at EOF
   or at encountering a flush/error packet.
 * Before sending 0-length data to demultiplex_sideband(), we now make sure
   that it is not marked as a data packet; Otherwise we now complain loudly
   about a bug.

Johannes Schindelin (3):
  sideband: avoid reporting incomplete sideband messages
  sideband: report unhandled incomplete sideband messages as bugs
  sideband: add defense against packets missing a band designator

 pkt-line.c               | 11 +++++++++--
 sideband.c               |  2 +-
 t/helper/test-pkt-line.c | 23 +++++++++++++++++++++++
 t/t0070-fundamental.sh   |  6 ++++++
 4 files changed, 39 insertions(+), 3 deletions(-)


base-commit: d4a392452e292ff924e79ec8458611c0f679d6d4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-753%2Fdscho%2Funflake-t5500.43-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-753/dscho/unflake-t5500.43-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/753

Range-diff vs v1:

 1:  d977644184 < -:  ---------- t5500.43: make the check a bit more robust
 -:  ---------- > 1:  e4ba96358b sideband: avoid reporting incomplete sideband messages
 -:  ---------- > 2:  9ffcc5b78e sideband: report unhandled incomplete sideband messages as bugs
 -:  ---------- > 3:  c61e560451 sideband: add defense against packets missing a band designator

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 1/3] sideband: avoid reporting incomplete sideband messages
  2020-10-19 19:35 ` [PATCH v2 0/3] Work around flakiness in t5500.43 Johannes Schindelin via GitGitGadget
@ 2020-10-19 19:35   ` Johannes Schindelin via GitGitGadget
  2020-10-20 20:33     ` Junio C Hamano
  2020-10-19 19:35   ` [PATCH v2 2/3] sideband: report unhandled incomplete sideband messages as bugs Johannes Schindelin via GitGitGadget
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 33+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-10-19 19:35 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, Jeff King, Johannes Schindelin, Johannes Schindelin,
	Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

In 2b695ecd74d (t5500: count objects through stderr, not trace,
2020-05-06) we tried to ensure that the "Total 3" message could be
grepped in Git's output, even if it sometimes got chopped up into
multiple lines in the trace machinery.

However, the first instance where this mattered now goes through the
sideband machinery, where it is _still_ possible for messages to get
chopped up: it *is* possible for the standard error stream to be sent
byte-for-byte and hence it can be easily interrupted. Meaning: it is
possible for the single line that we're looking for to be chopped up
into multiple sideband packets, with a primary packet being delivered
between them.

This seems to happen occasionally in the `vs-test` part of our CI
builds, i.e. with binaries built using Visual C, but not when building
with GCC or clang; The symptom is that t5500.43 fails to find a line
matching `remote: Total 3` in the `log` file, which ends in something
along these lines:

	remote: Tota
	remote: l 3 (delta 0), reused 0 (delta 0), pack-reused 0

This should not happen, though: we have code in `demultiplex_sideband()`
_specifically_ to stitch back together lines that were delivered in
separate sideband packets.

However, this stitching was broken in a subtle way in fbd76cd450
(sideband: reverse its dependency on pkt-line, 2019-01-16): before that
change, incomplete sideband lines would not be flushed upon receiving a
primary packet, but after that patch, they would be.

The subtleness of this bug comes from the fact that it is easy to get
confused by the ambiguous meaning of the `break` keyword: after writing
the primary packet contents, the `break;` in the original version of
`recv_sideband()` does _not_ break out of the `while` loop, but instead
only ends the `switch` case:

	while (!retval) {
		[...]
		switch (band) {
			[...]
		case 1:
/* Write the contents of the primary packet */
			write_or_die(out, buf + 1, len);
/* Here, we do *not* break out of the loop, `retval` is unchanged */
			break;
		[...]
	}

	if (outbuf.len) {
/* Write any remaining sideband messages lacking a trailing LF */
		strbuf_addch(&outbuf, '\n');
		xwrite(2, outbuf.buf, outbuf.len);
	}

In contrast, after fbd76cd450 (sideband: reverse its dependency on
pkt-line, 2019-01-16), the body of the `while` loop was extracted into
`demultiplex_sideband()`, crucially _including_ the logic to write
incomplete sideband messages:

	switch (band) {
	[...]
	case 1:
		*sideband_type = SIDEBAND_PRIMARY;
/* This does not break out of the loop: the loop is in the caller */
		break;
	[...]
	}

cleanup:
	[...]
/* This logic is now no longer _outside_ the loop but _inside_ */
	if (scratch->len) {
		strbuf_addch(scratch, '\n');
		xwrite(2, scratch->buf, scratch->len);
	}

The correct way to fix this is to return from `demultiplex_sideband()`
early. The caller will then write out the contents of the primary packet
and continue looping. The `scratch` buffer for incomplete sideband
messages is owned by that caller, and will continue to accumulate the
remainder(s) of those messages. The loop will only end once
`demultiplex_sideband()` returned non-zero _and_ did not indicate a
primary packet, which is the case only when we hit the `cleanup:` path,
in which we take care of flushing any unfinished sideband messages and
release the `scratch` buffer.

To ensure that this does not get broken again, we introduce a pair of
subcommands of the `pkt-line` test helper that specifically chop up the
sideband message and squeeze a primary packet into the middle.

Final note: The other test case touched by 2b695ecd74d (t5500: count
objects through stderr, not trace, 2020-05-06) is not affected by this
issue because the sideband machinery is not involved there.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 sideband.c               |  2 +-
 t/helper/test-pkt-line.c | 23 +++++++++++++++++++++++
 t/t0070-fundamental.sh   |  6 ++++++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/sideband.c b/sideband.c
index 0a60662fa6..a5405b9aaa 100644
--- a/sideband.c
+++ b/sideband.c
@@ -190,7 +190,7 @@ int demultiplex_sideband(const char *me, char *buf, int len,
 		return 0;
 	case 1:
 		*sideband_type = SIDEBAND_PRIMARY;
-		break;
+		return 1;
 	default:
 		strbuf_addf(scratch, "%s%s: protocol error: bad band #%d",
 			    scratch->len ? "\n" : "", me, band);
diff --git a/t/helper/test-pkt-line.c b/t/helper/test-pkt-line.c
index 69152958e5..0bf20642be 100644
--- a/t/helper/test-pkt-line.c
+++ b/t/helper/test-pkt-line.c
@@ -84,6 +84,25 @@ static void unpack_sideband(void)
 	}
 }
 
+static int send_split_sideband(void)
+{
+	const char *part1 = "Hello,";
+	const char *primary = "\001primary: regular output\n";
+	const char *part2 = " world!\n";
+
+	send_sideband(1, 2, part1, strlen(part1), LARGE_PACKET_MAX);
+	packet_write(1, primary, strlen(primary));
+	send_sideband(1, 2, part2, strlen(part2), LARGE_PACKET_MAX);
+	packet_response_end(1);
+
+	return 0;
+}
+
+static int receive_sideband(void)
+{
+	return recv_sideband("sideband: ", 0, 1);
+}
+
 int cmd__pkt_line(int argc, const char **argv)
 {
 	if (argc < 2)
@@ -95,6 +114,10 @@ int cmd__pkt_line(int argc, const char **argv)
 		unpack();
 	else if (!strcmp(argv[1], "unpack-sideband"))
 		unpack_sideband();
+	else if (!strcmp(argv[1], "send-split-sideband"))
+		send_split_sideband();
+	else if (!strcmp(argv[1], "receive-sideband"))
+		receive_sideband();
 	else
 		die("invalid argument '%s'", argv[1]);
 
diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
index 7b111a56fd..357201640a 100755
--- a/t/t0070-fundamental.sh
+++ b/t/t0070-fundamental.sh
@@ -34,4 +34,10 @@ test_expect_success 'check for a bug in the regex routines' '
 	test-tool regex --bug
 '
 
+test_expect_success 'incomplete sideband messages are reassembled' '
+	test-tool pkt-line send-split-sideband >split-sideband &&
+	test-tool pkt-line receive-sideband <split-sideband 2>err &&
+	grep "Hello, world" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 2/3] sideband: report unhandled incomplete sideband messages as bugs
  2020-10-19 19:35 ` [PATCH v2 0/3] Work around flakiness in t5500.43 Johannes Schindelin via GitGitGadget
  2020-10-19 19:35   ` [PATCH v2 1/3] sideband: avoid reporting incomplete sideband messages Johannes Schindelin via GitGitGadget
@ 2020-10-19 19:35   ` Johannes Schindelin via GitGitGadget
  2020-10-19 19:35   ` [PATCH v2 3/3] sideband: add defense against packets missing a band designator Johannes Schindelin via GitGitGadget
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-10-19 19:35 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, Jeff King, Johannes Schindelin, Johannes Schindelin,
	Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

It was pretty tricky to verify that incomplete sideband messages are
handled correctly by the `recv_sideband()`/`demultiplex_sideband()`
code: they have to be flushed out at the end of the loop in
`recv_sideband()`, but the actual flushing is done by the
`demultiplex_sideband()` function (which therefore has to know somehow
that the loop will be done after it returns).

To catch future bugs where incomplete sideband messages might not be
shown by mistake, let's catch that condition and report a bug.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 pkt-line.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 844c253ccd..657a702927 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -471,6 +471,9 @@ int recv_sideband(const char *me, int in_stream, int out)
 			write_or_die(out, buf + 1, len - 1);
 			break;
 		default: /* errors: message already written */
+			if (scratch.len > 0)
+				BUG("unhandled incomplete sideband: '%s'",
+				    scratch.buf);
 			return sideband_type;
 		}
 	}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 3/3] sideband: add defense against packets missing a band designator
  2020-10-19 19:35 ` [PATCH v2 0/3] Work around flakiness in t5500.43 Johannes Schindelin via GitGitGadget
  2020-10-19 19:35   ` [PATCH v2 1/3] sideband: avoid reporting incomplete sideband messages Johannes Schindelin via GitGitGadget
  2020-10-19 19:35   ` [PATCH v2 2/3] sideband: report unhandled incomplete sideband messages as bugs Johannes Schindelin via GitGitGadget
@ 2020-10-19 19:35   ` Johannes Schindelin via GitGitGadget
  2020-10-20 20:36     ` Junio C Hamano
                       ` (2 more replies)
  2020-10-23  8:50   ` [PATCH v2 0/3] Work around flakiness in t5500.43 Jeff King
  2020-10-26 16:09   ` [PATCH v3 0/2] " Johannes Schindelin via GitGitGadget
  4 siblings, 3 replies; 33+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-10-19 19:35 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, Jeff King, Johannes Schindelin, Johannes Schindelin,
	Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

While there is currently no instance of code producing this type of
packet, if the `demultiplex_sideband()` would receive a packet whose
payload is not only empty but even misses the band designator, it would
mistake it for a flush packet.

Let's defend against such a bug in the future.

Note: `demultiplex_sideband()` will treat empty flush, delim, eof and
response-end packets all alike: it will treat them as flush packets.
Since empty packets by definition are missing a band designator, this is
the best that function can do. Therefore, it would make little sense to
pass the `status` on to `demultiplex_sideband()`. Besides, fbd76cd450
(sideband: reverse its dependency on pkt-line, 2019-01-16) went out of
its way to _stop_ the code inside `demultiplex_sideband()` from relying
on anything in `pkt-line.h`; that includes the data type `enum
packet_read_status` that we would need if we were to pass `status` as a
parameter to that function.

Based on a suggestion by Jeff King.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 pkt-line.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index 657a702927..f72048f623 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -461,8 +461,10 @@ int recv_sideband(const char *me, int in_stream, int out)
 	enum sideband_type sideband_type;
 
 	while (1) {
-		len = packet_read(in_stream, NULL, NULL, buf, LARGE_PACKET_MAX,
-				  0);
+		int status = packet_read_with_status(in_stream, NULL, NULL, buf,
+						     LARGE_PACKET_MAX, &len, 0);
+		if (!len && status == PACKET_READ_NORMAL)
+			BUG("missing band designator");
 		if (!demultiplex_sideband(me, buf, len, 0, &scratch,
 					  &sideband_type))
 			continue;
@@ -520,6 +522,8 @@ enum packet_read_status packet_reader_read(struct packet_reader *reader)
 							 reader->options);
 		if (!reader->use_sideband)
 			break;
+		if (!reader->pktlen && reader->status == PACKET_READ_NORMAL)
+			BUG("missing band designator");
 		if (demultiplex_sideband(reader->me, reader->buffer,
 					 reader->pktlen, 1, &scratch,
 					 &sideband_type))
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/3] sideband: avoid reporting incomplete sideband messages
  2020-10-19 19:35   ` [PATCH v2 1/3] sideband: avoid reporting incomplete sideband messages Johannes Schindelin via GitGitGadget
@ 2020-10-20 20:33     ` Junio C Hamano
  2020-10-20 20:48       ` Johannes Schindelin
  0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2020-10-20 20:33 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Jonathan Tan, Jeff King, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> In 2b695ecd74d (t5500: count objects through stderr, not trace,
> 2020-05-06) we tried to ensure that the "Total 3" message could be
> grepped in Git's output, even if it sometimes got chopped up into
> multiple lines in the trace machinery.
> ...
> The correct way to fix this is to return from `demultiplex_sideband()`
> early. The caller will then write out the contents of the primary packet
> and continue looping. The `scratch` buffer for incomplete sideband
> messages is owned by that caller, and will continue to accumulate the
> remainder(s) of those messages. The loop will only end once
> `demultiplex_sideband()` returned non-zero _and_ did not indicate a
> primary packet, which is the case only when we hit the `cleanup:` path,
> in which we take care of flushing any unfinished sideband messages and
> release the `scratch` buffer.
>
> To ensure that this does not get broken again, we introduce a pair of
> subcommands of the `pkt-line` test helper that specifically chop up the
> sideband message and squeeze a primary packet into the middle.
>
> Final note: The other test case touched by 2b695ecd74d (t5500: count
> objects through stderr, not trace, 2020-05-06) is not affected by this
> issue because the sideband machinery is not involved there.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---

Nicely explained.  Do we want to also give credit to Peff for the
single-liner fix here, perhaps with a suggested/helped-by trailer?

>  sideband.c               |  2 +-
>  t/helper/test-pkt-line.c | 23 +++++++++++++++++++++++
>  t/t0070-fundamental.sh   |  6 ++++++
>  3 files changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/sideband.c b/sideband.c
> index 0a60662fa6..a5405b9aaa 100644
> --- a/sideband.c
> +++ b/sideband.c
> @@ -190,7 +190,7 @@ int demultiplex_sideband(const char *me, char *buf, int len,
>  		return 0;
>  	case 1:
>  		*sideband_type = SIDEBAND_PRIMARY;
> -		break;
> +		return 1;
>  	default:
>  		strbuf_addf(scratch, "%s%s: protocol error: bad band #%d",
>  			    scratch->len ? "\n" : "", me, band);

> diff --git a/t/helper/test-pkt-line.c b/t/helper/test-pkt-line.c
> index 69152958e5..0bf20642be 100644
> --- a/t/helper/test-pkt-line.c
> +++ b/t/helper/test-pkt-line.c
> @@ -84,6 +84,25 @@ static void unpack_sideband(void)
>  	}
>  }
>  
> +static int send_split_sideband(void)
> +{
> +	const char *part1 = "Hello,";
> +	const char *primary = "\001primary: regular output\n";
> +	const char *part2 = " world!\n";
> +
> +	send_sideband(1, 2, part1, strlen(part1), LARGE_PACKET_MAX);
> +	packet_write(1, primary, strlen(primary));
> +	send_sideband(1, 2, part2, strlen(part2), LARGE_PACKET_MAX);
> +	packet_response_end(1);
> +
> +	return 0;
> +}

OK.

> +static int receive_sideband(void)
> +{
> +	return recv_sideband("sideband: ", 0, 1);
> +}
> +
>  int cmd__pkt_line(int argc, const char **argv)
>  {
>  	if (argc < 2)
> @@ -95,6 +114,10 @@ int cmd__pkt_line(int argc, const char **argv)
>  		unpack();
>  	else if (!strcmp(argv[1], "unpack-sideband"))
>  		unpack_sideband();
> +	else if (!strcmp(argv[1], "send-split-sideband"))
> +		send_split_sideband();
> +	else if (!strcmp(argv[1], "receive-sideband"))
> +		receive_sideband();
>  	else
>  		die("invalid argument '%s'", argv[1]);
>  
> diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
> index 7b111a56fd..357201640a 100755
> --- a/t/t0070-fundamental.sh
> +++ b/t/t0070-fundamental.sh
> @@ -34,4 +34,10 @@ test_expect_success 'check for a bug in the regex routines' '
>  	test-tool regex --bug
>  '
>  
> +test_expect_success 'incomplete sideband messages are reassembled' '
> +	test-tool pkt-line send-split-sideband >split-sideband &&
> +	test-tool pkt-line receive-sideband <split-sideband 2>err &&
> +	grep "Hello, world" err
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 3/3] sideband: add defense against packets missing a band designator
  2020-10-19 19:35   ` [PATCH v2 3/3] sideband: add defense against packets missing a band designator Johannes Schindelin via GitGitGadget
@ 2020-10-20 20:36     ` Junio C Hamano
  2020-10-23  8:34     ` Jeff King
  2020-10-23  8:48     ` Jeff King
  2 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2020-10-20 20:36 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Jonathan Tan, Jeff King, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> While there is currently no instance of code producing this type of
> packet, if the `demultiplex_sideband()` would receive a packet whose
> payload is not only empty but even misses the band designator, it would
> mistake it for a flush packet.

Quite sensible.  Will queue all three.  Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/3] sideband: avoid reporting incomplete sideband messages
  2020-10-20 20:33     ` Junio C Hamano
@ 2020-10-20 20:48       ` Johannes Schindelin
  2020-10-20 21:33         ` Junio C Hamano
  0 siblings, 1 reply; 33+ messages in thread
From: Johannes Schindelin @ 2020-10-20 20:48 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Jonathan Tan,
	Jeff King

Hi Junio,

On Tue, 20 Oct 2020, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > In 2b695ecd74d (t5500: count objects through stderr, not trace,
> > 2020-05-06) we tried to ensure that the "Total 3" message could be
> > grepped in Git's output, even if it sometimes got chopped up into
> > multiple lines in the trace machinery.
> > ...
> > The correct way to fix this is to return from `demultiplex_sideband()`
> > early. The caller will then write out the contents of the primary packet
> > and continue looping. The `scratch` buffer for incomplete sideband
> > messages is owned by that caller, and will continue to accumulate the
> > remainder(s) of those messages. The loop will only end once
> > `demultiplex_sideband()` returned non-zero _and_ did not indicate a
> > primary packet, which is the case only when we hit the `cleanup:` path,
> > in which we take care of flushing any unfinished sideband messages and
> > release the `scratch` buffer.
> >
> > To ensure that this does not get broken again, we introduce a pair of
> > subcommands of the `pkt-line` test helper that specifically chop up the
> > sideband message and squeeze a primary packet into the middle.
> >
> > Final note: The other test case touched by 2b695ecd74d (t5500: count
> > objects through stderr, not trace, 2020-05-06) is not affected by this
> > issue because the sideband machinery is not involved there.
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
>
> Nicely explained.  Do we want to also give credit to Peff for the
> single-liner fix here, perhaps with a suggested/helped-by trailer?

Sure.

I had not added that because I had actually come up with the fix
independently in my analysis before I read Peff's reply thoroughly.

But credit where credit is due: without Peff's reply, I would not have
worked on the full fix and stayed with the work-around.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 1/3] sideband: avoid reporting incomplete sideband messages
  2020-10-20 20:48       ` Johannes Schindelin
@ 2020-10-20 21:33         ` Junio C Hamano
  0 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2020-10-20 21:33 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Johannes Schindelin via GitGitGadget, git, Jonathan Tan,
	Jeff King

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> Nicely explained.  Do we want to also give credit to Peff for the
>> single-liner fix here, perhaps with a suggested/helped-by trailer?
>
> Sure.
>
> I had not added that because I had actually come up with the fix
> independently in my analysis before I read Peff's reply thoroughly.

Ah, of course, e-mails cross all the time so these things happen.
Thanks, both.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 3/3] sideband: add defense against packets missing a band designator
  2020-10-23  8:48     ` Jeff King
@ 2020-10-23  5:36       ` Johannes Schindelin
  0 siblings, 0 replies; 33+ messages in thread
From: Johannes Schindelin @ 2020-10-23  5:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin via GitGitGadget, git, Jonathan Tan

Hi Peff,

On Fri, 23 Oct 2020, Jeff King wrote:

> On Mon, Oct 19, 2020 at 07:35:42PM +0000, Johannes Schindelin via GitGitGadget wrote:
>
> > diff --git a/pkt-line.c b/pkt-line.c
> > index 657a702927..f72048f623 100644
> > --- a/pkt-line.c
> > +++ b/pkt-line.c
> > @@ -461,8 +461,10 @@ int recv_sideband(const char *me, int in_stream, int out)
> >  	enum sideband_type sideband_type;
> >
> >  	while (1) {
> > -		len = packet_read(in_stream, NULL, NULL, buf, LARGE_PACKET_MAX,
> > -				  0);
> > +		int status = packet_read_with_status(in_stream, NULL, NULL, buf,
> > +						     LARGE_PACKET_MAX, &len, 0);
> > +		if (!len && status == PACKET_READ_NORMAL)
> > +			BUG("missing band designator");
> >  		if (!demultiplex_sideband(me, buf, len, 0, &scratch,
> >  					  &sideband_type))
>
> I also wonder if this status-check should be pushed down into
> demultiplex_sideband() by passing "status",

I tried that, but as mentioned in the commit message, fbd76cd450
(sideband: reverse its dependency on pkt-line, 2019-01-16) went out of its
way to _stop_ the code inside `demultiplex_sideband()` from relying on
anything in `pkt-line.h`. And that `PACKET_READ_NORMAL` and
`PACKET_READ_EOF` _is_ from `pkt-line.h`.

Ciao,
Dscho

> for two reasons:
>
>   1. So we don't have to repeat it (though it isn't that big)
>
>   2. The other half of this weirdness is that if we get an early EOF,
>      we'll hit the "missing sideband designator" die() message. But
>      that's not really what happened; we probably got a network hangup.
>      And we could distinguish that case by checking for status ==
>      PACKET_READ_EOF and provide a better message.
>
> Something like this (completely untested):
>
> diff --git a/sideband.c b/sideband.c
> index 0a60662fa6..6ad15ed581 100644
> --- a/sideband.c
> +++ b/sideband.c
> @@ -115,6 +115,7 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n)
>  #define DUMB_SUFFIX "        "
>
>  int demultiplex_sideband(const char *me, char *buf, int len,
> +			 enum packet_read_status status,
>  			 int die_on_error,
>  			 struct strbuf *scratch,
>  			 enum sideband_type *sideband_type)
> @@ -130,17 +131,29 @@ int demultiplex_sideband(const char *me, char *buf, int len,
>  			suffix = DUMB_SUFFIX;
>  	}
>
> -	if (len == 0) {
> -		*sideband_type = SIDEBAND_FLUSH;
> -		goto cleanup;
> -	}
> -	if (len < 1) {
> +	if (status == PACKET_READ_EOF) {
>  		strbuf_addf(scratch,
> -			    "%s%s: protocol error: no band designator",
> +			    "%s%s: protocol error: eof while reading packet",
>  			    scratch->len ? "\n" : "", me);
>  		*sideband_type = SIDEBAND_PROTOCOL_ERROR;
>  		goto cleanup;
>  	}
> +
> +	if (len < 0)
> +		BUG("negative length on non-eof packet read");
> +
> +	if (len == 0) {
> +		if (status == PACKET_READ_NORMAL) {
> +			strbuf_addf(scratch,
> +				    "%s%s protocol error: no band designator",
> +				    scratch->len ? "\n" : "", me);
> +			*sideband_type = SIDEBAND_PROTOCOL_ERROR;
> +		} else {
> +			*sideband_type = SIDEBAND_FLUSH;
> +		}
> +		goto cleanup;
> +	}
> +
>  	band = buf[0] & 0xff;
>  	buf[len] = '\0';
>  	len--;
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 3/3] sideband: add defense against packets missing a band designator
  2020-10-19 19:35   ` [PATCH v2 3/3] sideband: add defense against packets missing a band designator Johannes Schindelin via GitGitGadget
  2020-10-20 20:36     ` Junio C Hamano
@ 2020-10-23  8:34     ` Jeff King
  2020-10-23 14:44       ` Junio C Hamano
  2020-10-23  8:48     ` Jeff King
  2 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2020-10-23  8:34 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Jonathan Tan, Johannes Schindelin

On Mon, Oct 19, 2020 at 07:35:42PM +0000, Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> While there is currently no instance of code producing this type of
> packet, if the `demultiplex_sideband()` would receive a packet whose
> payload is not only empty but even misses the band designator, it would
> mistake it for a flush packet.
> 
> Let's defend against such a bug in the future.

That seems reasonable, but I'm not sure if these ought to be BUG()s.
Isn't it an indication that the other side sent us bogus input? That
likely is a bug on the other end, but I think it should be a die(), just
as we would produce for any other malformed protocol input.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 3/3] sideband: add defense against packets missing a band designator
  2020-10-19 19:35   ` [PATCH v2 3/3] sideband: add defense against packets missing a band designator Johannes Schindelin via GitGitGadget
  2020-10-20 20:36     ` Junio C Hamano
  2020-10-23  8:34     ` Jeff King
@ 2020-10-23  8:48     ` Jeff King
  2020-10-23  5:36       ` Johannes Schindelin
  2 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2020-10-23  8:48 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Jonathan Tan, Johannes Schindelin

On Mon, Oct 19, 2020 at 07:35:42PM +0000, Johannes Schindelin via GitGitGadget wrote:

> diff --git a/pkt-line.c b/pkt-line.c
> index 657a702927..f72048f623 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -461,8 +461,10 @@ int recv_sideband(const char *me, int in_stream, int out)
>  	enum sideband_type sideband_type;
>  
>  	while (1) {
> -		len = packet_read(in_stream, NULL, NULL, buf, LARGE_PACKET_MAX,
> -				  0);
> +		int status = packet_read_with_status(in_stream, NULL, NULL, buf,
> +						     LARGE_PACKET_MAX, &len, 0);
> +		if (!len && status == PACKET_READ_NORMAL)
> +			BUG("missing band designator");
>  		if (!demultiplex_sideband(me, buf, len, 0, &scratch,
>  					  &sideband_type))

I also wonder if this status-check should be pushed down into
demultiplex_sideband() by passing "status", for two reasons:

  1. So we don't have to repeat it (though it isn't that big)

  2. The other half of this weirdness is that if we get an early EOF,
     we'll hit the "missing sideband designator" die() message. But
     that's not really what happened; we probably got a network hangup.
     And we could distinguish that case by checking for status ==
     PACKET_READ_EOF and provide a better message.

Something like this (completely untested):

diff --git a/sideband.c b/sideband.c
index 0a60662fa6..6ad15ed581 100644
--- a/sideband.c
+++ b/sideband.c
@@ -115,6 +115,7 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n)
 #define DUMB_SUFFIX "        "
 
 int demultiplex_sideband(const char *me, char *buf, int len,
+			 enum packet_read_status status,
 			 int die_on_error,
 			 struct strbuf *scratch,
 			 enum sideband_type *sideband_type)
@@ -130,17 +131,29 @@ int demultiplex_sideband(const char *me, char *buf, int len,
 			suffix = DUMB_SUFFIX;
 	}
 
-	if (len == 0) {
-		*sideband_type = SIDEBAND_FLUSH;
-		goto cleanup;
-	}
-	if (len < 1) {
+	if (status == PACKET_READ_EOF) {
 		strbuf_addf(scratch,
-			    "%s%s: protocol error: no band designator",
+			    "%s%s: protocol error: eof while reading packet",
 			    scratch->len ? "\n" : "", me);
 		*sideband_type = SIDEBAND_PROTOCOL_ERROR;
 		goto cleanup;
 	}
+
+	if (len < 0)
+		BUG("negative length on non-eof packet read");
+
+	if (len == 0) {
+		if (status == PACKET_READ_NORMAL) {
+			strbuf_addf(scratch,
+				    "%s%s protocol error: no band designator",
+				    scratch->len ? "\n" : "", me);
+			*sideband_type = SIDEBAND_PROTOCOL_ERROR;
+		} else {
+			*sideband_type = SIDEBAND_FLUSH;
+		}
+		goto cleanup;
+	}
+
 	band = buf[0] & 0xff;
 	buf[len] = '\0';
 	len--;

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/3] Work around flakiness in t5500.43
  2020-10-19 19:35 ` [PATCH v2 0/3] Work around flakiness in t5500.43 Johannes Schindelin via GitGitGadget
                     ` (2 preceding siblings ...)
  2020-10-19 19:35   ` [PATCH v2 3/3] sideband: add defense against packets missing a band designator Johannes Schindelin via GitGitGadget
@ 2020-10-23  8:50   ` Jeff King
  2020-10-26 16:04     ` Johannes Schindelin
  2020-10-26 16:09   ` [PATCH v3 0/2] " Johannes Schindelin via GitGitGadget
  4 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2020-10-23  8:50 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Jonathan Tan, Johannes Schindelin

On Mon, Oct 19, 2020 at 07:35:39PM +0000, Johannes Schindelin via GitGitGadget wrote:

> Changes since v1:
> 
>  * Instead of papering over the underlying cause, the patch was completely
>    changed to actually fix the bug and add a proper regression test for it
>    (originally, I wanted to act according to the common notion that good
>    programmers are lazy, oh my, see how well that worked out for me).

Thanks for fixing this. Your explanation looks thorough and the code
looks correct. Definitely patches 1 and 2 look good to me. I left a few
comments on patch 3.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 3/3] sideband: add defense against packets missing a band designator
  2020-10-23  8:34     ` Jeff King
@ 2020-10-23 14:44       ` Junio C Hamano
  0 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2020-10-23 14:44 UTC (permalink / raw)
  To: Jeff King
  Cc: Johannes Schindelin via GitGitGadget, git, Jonathan Tan,
	Johannes Schindelin

Jeff King <peff@peff.net> writes:

> On Mon, Oct 19, 2020 at 07:35:42PM +0000, Johannes Schindelin via GitGitGadget wrote:
>
>> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>> 
>> While there is currently no instance of code producing this type of
>> packet, if the `demultiplex_sideband()` would receive a packet whose
>> payload is not only empty but even misses the band designator, it would
>> mistake it for a flush packet.
>> 
>> Let's defend against such a bug in the future.
>
> That seems reasonable, but I'm not sure if these ought to be BUG()s.
> Isn't it an indication that the other side sent us bogus input? That
> likely is a bug on the other end, but I think it should be a die(), just
> as we would produce for any other malformed protocol input.

Thanks for spotting.  I also think this was a good change, but at
this point in the code we found a problem in the data the other side
created (i.e. we diagnosed a bug on the other side), which is a
usual input error, so it should not be a BUG().  

Would this be something we can warn and ignore if the connection is
still active, I wonder, though.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 0/3] Work around flakiness in t5500.43
  2020-10-23  8:50   ` [PATCH v2 0/3] Work around flakiness in t5500.43 Jeff King
@ 2020-10-26 16:04     ` Johannes Schindelin
  0 siblings, 0 replies; 33+ messages in thread
From: Johannes Schindelin @ 2020-10-26 16:04 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin via GitGitGadget, git, Jonathan Tan

Hi,

On Fri, 23 Oct 2020, Jeff King wrote:

> On Mon, Oct 19, 2020 at 07:35:39PM +0000, Johannes Schindelin via GitGitGadget wrote:
>
> > Changes since v1:
> >
> >  * Instead of papering over the underlying cause, the patch was completely
> >    changed to actually fix the bug and add a proper regression test for it
> >    (originally, I wanted to act according to the common notion that good
> >    programmers are lazy, oh my, see how well that worked out for me).
>
> Thanks for fixing this. Your explanation looks thorough and the code
> looks correct. Definitely patches 1 and 2 look good to me. I left a few
> comments on patch 3.

Thank you.

After thinking about this for several days, I will drop patch 3/3 because
it is unclear to me how to implement what you suggested without reverting
the intention of Jonathan's patch that made `pkt-line` depend on
`sideband` instead of the other way round.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v3 0/2] Work around flakiness in t5500.43
  2020-10-19 19:35 ` [PATCH v2 0/3] Work around flakiness in t5500.43 Johannes Schindelin via GitGitGadget
                     ` (3 preceding siblings ...)
  2020-10-23  8:50   ` [PATCH v2 0/3] Work around flakiness in t5500.43 Jeff King
@ 2020-10-26 16:09   ` Johannes Schindelin via GitGitGadget
  2020-10-26 16:09     ` [PATCH v3 1/2] sideband: avoid reporting incomplete sideband messages Johannes Schindelin via GitGitGadget
                       ` (3 more replies)
  4 siblings, 4 replies; 33+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-10-26 16:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Jeff King, Johannes Schindelin, Johannes Schindelin

It seems that this test became flaky only recently, although I have to admit
that I have no idea why: the involved code does not seem to have changed
recently at all. It should have been fixed by 
https://lore.kernel.org/git/20200506220741.71021-1-jonathantanmy@google.com/
, but apparently wasn't completely fixed, despite what I said in that
thread.

Changes since v2:

 * Dropped patch 3/3 because it was only intended to be defensive
   programming, but turned out to be too hard without layering violations.

Changes since v1:

 * Instead of papering over the underlying cause, the patch was completely
   changed to actually fix the bug and add a proper regression test for it
   (originally, I wanted to act according to the common notion that good
   programmers are lazy, oh my, see how well that worked out for me).
 * We now specifically watch out for future bugs where incomplete sideband
   messages would be dropped inadvertently rather than being reported at EOF
   or at encountering a flush/error packet.
 * Before sending 0-length data to demultiplex_sideband(), we now make sure
   that it is not marked as a data packet; Otherwise we now complain loudly
   about a bug.

Johannes Schindelin (2):
  sideband: avoid reporting incomplete sideband messages
  sideband: report unhandled incomplete sideband messages as bugs

 pkt-line.c               |  3 +++
 sideband.c               |  2 +-
 t/helper/test-pkt-line.c | 23 +++++++++++++++++++++++
 t/t0070-fundamental.sh   |  6 ++++++
 4 files changed, 33 insertions(+), 1 deletion(-)


base-commit: d4a392452e292ff924e79ec8458611c0f679d6d4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-753%2Fdscho%2Funflake-t5500.43-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-753/dscho/unflake-t5500.43-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/753

Range-diff vs v2:

 1:  e4ba96358b = 1:  e4ba96358b sideband: avoid reporting incomplete sideband messages
 2:  9ffcc5b78e = 2:  9ffcc5b78e sideband: report unhandled incomplete sideband messages as bugs
 3:  c61e560451 < -:  ---------- sideband: add defense against packets missing a band designator

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v3 1/2] sideband: avoid reporting incomplete sideband messages
  2020-10-26 16:09   ` [PATCH v3 0/2] " Johannes Schindelin via GitGitGadget
@ 2020-10-26 16:09     ` Johannes Schindelin via GitGitGadget
  2020-10-26 16:09     ` [PATCH v3 2/2] sideband: report unhandled incomplete sideband messages as bugs Johannes Schindelin via GitGitGadget
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 33+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-10-26 16:09 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, Jeff King, Johannes Schindelin, Johannes Schindelin,
	Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

In 2b695ecd74d (t5500: count objects through stderr, not trace,
2020-05-06) we tried to ensure that the "Total 3" message could be
grepped in Git's output, even if it sometimes got chopped up into
multiple lines in the trace machinery.

However, the first instance where this mattered now goes through the
sideband machinery, where it is _still_ possible for messages to get
chopped up: it *is* possible for the standard error stream to be sent
byte-for-byte and hence it can be easily interrupted. Meaning: it is
possible for the single line that we're looking for to be chopped up
into multiple sideband packets, with a primary packet being delivered
between them.

This seems to happen occasionally in the `vs-test` part of our CI
builds, i.e. with binaries built using Visual C, but not when building
with GCC or clang; The symptom is that t5500.43 fails to find a line
matching `remote: Total 3` in the `log` file, which ends in something
along these lines:

	remote: Tota
	remote: l 3 (delta 0), reused 0 (delta 0), pack-reused 0

This should not happen, though: we have code in `demultiplex_sideband()`
_specifically_ to stitch back together lines that were delivered in
separate sideband packets.

However, this stitching was broken in a subtle way in fbd76cd450
(sideband: reverse its dependency on pkt-line, 2019-01-16): before that
change, incomplete sideband lines would not be flushed upon receiving a
primary packet, but after that patch, they would be.

The subtleness of this bug comes from the fact that it is easy to get
confused by the ambiguous meaning of the `break` keyword: after writing
the primary packet contents, the `break;` in the original version of
`recv_sideband()` does _not_ break out of the `while` loop, but instead
only ends the `switch` case:

	while (!retval) {
		[...]
		switch (band) {
			[...]
		case 1:
/* Write the contents of the primary packet */
			write_or_die(out, buf + 1, len);
/* Here, we do *not* break out of the loop, `retval` is unchanged */
			break;
		[...]
	}

	if (outbuf.len) {
/* Write any remaining sideband messages lacking a trailing LF */
		strbuf_addch(&outbuf, '\n');
		xwrite(2, outbuf.buf, outbuf.len);
	}

In contrast, after fbd76cd450 (sideband: reverse its dependency on
pkt-line, 2019-01-16), the body of the `while` loop was extracted into
`demultiplex_sideband()`, crucially _including_ the logic to write
incomplete sideband messages:

	switch (band) {
	[...]
	case 1:
		*sideband_type = SIDEBAND_PRIMARY;
/* This does not break out of the loop: the loop is in the caller */
		break;
	[...]
	}

cleanup:
	[...]
/* This logic is now no longer _outside_ the loop but _inside_ */
	if (scratch->len) {
		strbuf_addch(scratch, '\n');
		xwrite(2, scratch->buf, scratch->len);
	}

The correct way to fix this is to return from `demultiplex_sideband()`
early. The caller will then write out the contents of the primary packet
and continue looping. The `scratch` buffer for incomplete sideband
messages is owned by that caller, and will continue to accumulate the
remainder(s) of those messages. The loop will only end once
`demultiplex_sideband()` returned non-zero _and_ did not indicate a
primary packet, which is the case only when we hit the `cleanup:` path,
in which we take care of flushing any unfinished sideband messages and
release the `scratch` buffer.

To ensure that this does not get broken again, we introduce a pair of
subcommands of the `pkt-line` test helper that specifically chop up the
sideband message and squeeze a primary packet into the middle.

Final note: The other test case touched by 2b695ecd74d (t5500: count
objects through stderr, not trace, 2020-05-06) is not affected by this
issue because the sideband machinery is not involved there.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 sideband.c               |  2 +-
 t/helper/test-pkt-line.c | 23 +++++++++++++++++++++++
 t/t0070-fundamental.sh   |  6 ++++++
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/sideband.c b/sideband.c
index 0a60662fa6..a5405b9aaa 100644
--- a/sideband.c
+++ b/sideband.c
@@ -190,7 +190,7 @@ int demultiplex_sideband(const char *me, char *buf, int len,
 		return 0;
 	case 1:
 		*sideband_type = SIDEBAND_PRIMARY;
-		break;
+		return 1;
 	default:
 		strbuf_addf(scratch, "%s%s: protocol error: bad band #%d",
 			    scratch->len ? "\n" : "", me, band);
diff --git a/t/helper/test-pkt-line.c b/t/helper/test-pkt-line.c
index 69152958e5..0bf20642be 100644
--- a/t/helper/test-pkt-line.c
+++ b/t/helper/test-pkt-line.c
@@ -84,6 +84,25 @@ static void unpack_sideband(void)
 	}
 }
 
+static int send_split_sideband(void)
+{
+	const char *part1 = "Hello,";
+	const char *primary = "\001primary: regular output\n";
+	const char *part2 = " world!\n";
+
+	send_sideband(1, 2, part1, strlen(part1), LARGE_PACKET_MAX);
+	packet_write(1, primary, strlen(primary));
+	send_sideband(1, 2, part2, strlen(part2), LARGE_PACKET_MAX);
+	packet_response_end(1);
+
+	return 0;
+}
+
+static int receive_sideband(void)
+{
+	return recv_sideband("sideband: ", 0, 1);
+}
+
 int cmd__pkt_line(int argc, const char **argv)
 {
 	if (argc < 2)
@@ -95,6 +114,10 @@ int cmd__pkt_line(int argc, const char **argv)
 		unpack();
 	else if (!strcmp(argv[1], "unpack-sideband"))
 		unpack_sideband();
+	else if (!strcmp(argv[1], "send-split-sideband"))
+		send_split_sideband();
+	else if (!strcmp(argv[1], "receive-sideband"))
+		receive_sideband();
 	else
 		die("invalid argument '%s'", argv[1]);
 
diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
index 7b111a56fd..357201640a 100755
--- a/t/t0070-fundamental.sh
+++ b/t/t0070-fundamental.sh
@@ -34,4 +34,10 @@ test_expect_success 'check for a bug in the regex routines' '
 	test-tool regex --bug
 '
 
+test_expect_success 'incomplete sideband messages are reassembled' '
+	test-tool pkt-line send-split-sideband >split-sideband &&
+	test-tool pkt-line receive-sideband <split-sideband 2>err &&
+	grep "Hello, world" err
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3 2/2] sideband: report unhandled incomplete sideband messages as bugs
  2020-10-26 16:09   ` [PATCH v3 0/2] " Johannes Schindelin via GitGitGadget
  2020-10-26 16:09     ` [PATCH v3 1/2] sideband: avoid reporting incomplete sideband messages Johannes Schindelin via GitGitGadget
@ 2020-10-26 16:09     ` Johannes Schindelin via GitGitGadget
  2020-10-26 18:33     ` [PATCH v3 0/2] Work around flakiness in t5500.43 Junio C Hamano
  2020-10-27  6:52     ` [PATCH] sideband: diagnose more incoming packet anomalies Jeff King
  3 siblings, 0 replies; 33+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-10-26 16:09 UTC (permalink / raw)
  To: git
  Cc: Jonathan Tan, Jeff King, Johannes Schindelin, Johannes Schindelin,
	Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

It was pretty tricky to verify that incomplete sideband messages are
handled correctly by the `recv_sideband()`/`demultiplex_sideband()`
code: they have to be flushed out at the end of the loop in
`recv_sideband()`, but the actual flushing is done by the
`demultiplex_sideband()` function (which therefore has to know somehow
that the loop will be done after it returns).

To catch future bugs where incomplete sideband messages might not be
shown by mistake, let's catch that condition and report a bug.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 pkt-line.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 844c253ccd..657a702927 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -471,6 +471,9 @@ int recv_sideband(const char *me, int in_stream, int out)
 			write_or_die(out, buf + 1, len - 1);
 			break;
 		default: /* errors: message already written */
+			if (scratch.len > 0)
+				BUG("unhandled incomplete sideband: '%s'",
+				    scratch.buf);
 			return sideband_type;
 		}
 	}
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v3 0/2] Work around flakiness in t5500.43
  2020-10-26 16:09   ` [PATCH v3 0/2] " Johannes Schindelin via GitGitGadget
  2020-10-26 16:09     ` [PATCH v3 1/2] sideband: avoid reporting incomplete sideband messages Johannes Schindelin via GitGitGadget
  2020-10-26 16:09     ` [PATCH v3 2/2] sideband: report unhandled incomplete sideband messages as bugs Johannes Schindelin via GitGitGadget
@ 2020-10-26 18:33     ` Junio C Hamano
  2020-10-27  6:52     ` [PATCH] sideband: diagnose more incoming packet anomalies Jeff King
  3 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2020-10-26 18:33 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: git, Jonathan Tan, Jeff King, Johannes Schindelin

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> It seems that this test became flaky only recently, although I have to admit
> that I have no idea why: the involved code does not seem to have changed
> recently at all. It should have been fixed by 
> https://lore.kernel.org/git/20200506220741.71021-1-jonathantanmy@google.com/
> , but apparently wasn't completely fixed, despite what I said in that
> thread.
>
> Changes since v2:
>
>  * Dropped patch 3/3 because it was only intended to be defensive
>    programming, but turned out to be too hard without layering violations.

Thanks, the remaining two look identical, so let me preserve the
timestamps by just dropping the tip one from what is queued, instead
of queuing these two patches.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] sideband: diagnose more incoming packet anomalies
  2020-10-26 16:09   ` [PATCH v3 0/2] " Johannes Schindelin via GitGitGadget
                       ` (2 preceding siblings ...)
  2020-10-26 18:33     ` [PATCH v3 0/2] Work around flakiness in t5500.43 Junio C Hamano
@ 2020-10-27  6:52     ` Jeff King
  2020-10-27  7:12       ` Jeff King
  2020-10-27  7:13       ` Jeff King
  3 siblings, 2 replies; 33+ messages in thread
From: Jeff King @ 2020-10-27  6:52 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: Junio C Hamano, git, Jonathan Tan, Johannes Schindelin

On Mon, Oct 26, 2020 at 04:09:13PM +0000, Johannes Schindelin via GitGitGadget wrote:

> Changes since v2:
> 
>  * Dropped patch 3/3 because it was only intended to be defensive
>    programming, but turned out to be too hard without layering violations.

It is a bit tricky, but here's a working replacement.

-- >8 --
Subject: [PATCH] sideband: diagnose more sideband anomalies

In demultiplex_sideband(), there are two oddities when we check an
incoming packet:

  - if it has zero length, then we assume it's a flush packet. This
    means we fail to notice the difference between a real flush and a
    true zero-length packet that's missing its sideband designator. It's
    not a huge problem in practice because we'd never send a zero-length
    data packet (even our keepalives are otherwise-empty sideband-1
    packets).

    But it would be nice to detect and report the error, since it's
    likely to cause other confusion (we think the other side flushed,
    but they do not).

  - we try to detect packets missing their designator by checking for
    "if (len < 1)". But this will never trigger for "len == 0"; we've
    already detected that and left the function before then.

    It _could_ detect a negative "len" parameter. But in that case, the
    error message is wrong. The issue is not "no sideband" but rather
    "eof while reading the packet". However, this can't actually be
    triggered in practice, because neither of the two callers uses
    pkt_read's GENTLE_ON_EOF flag. Which means they'd die with "the
    remote end hung up unexpectedly" before we even get here.

    So this truly is dead code.

We can improve these cases by passing in a pkt-line status to the
demultiplexer, and by having recv_sideband() use GENTLE_ON_EOF. This
gives us two improvements:

  - we can now reliably detect flush packets, and will report a normal
    packet missing its sideband designator as an error

  - we'll report an eof with a more detailed "protocol error: eof while
    reading sideband packet", rather than the generic "the remote end
    hung up unexpectedly"

  - when we see an eof, we'll flush the sideband scratch buffer, which
    may provide some hints from the remote about why they hung up
    (though note we already flush on newlines, so it's likely that most
    such messages already made it through)

In some sense this patch goes against fbd76cd450 (sideband: reverse its
dependency on pkt-line, 2019-01-16), which caused the sideband code not
to depend on the pkt-line code. But that commit was really just trying
to deal with the circular header dependency. The two modules are
conceptually interlinked, and it was just trying to keep things
compiling. And indeed, there's a sticking point in this patch: because
pkt-line.h includes sideband.h, we can't add the reverse include we need
for the sideband code to have an "enum packet_read_status" parameter.
Nor can we forward declare it, because you can't forward declare an enum
in C. However, C does guarantee that enums fit in an int, so we can just
use that type.

One alternative would be for the callers to check themselves that they
got something sane from the pkt-line code. But besides duplicating
logic, this gets quite tricky. Any error condition requires flushing the
sideband #2 scratch buffer, which only demultiplex_sideband() knows how
to do.

Signed-off-by: Jeff King <peff@peff.net>
---
 pkt-line.c             | 14 ++++++++------
 sideband.c             | 29 ++++++++++++++++++++++-------
 sideband.h             |  6 +++++-
 t/t0070-fundamental.sh | 12 ++++++++++++
 4 files changed, 47 insertions(+), 14 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index 657a702927..d633005ef7 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -461,9 +461,11 @@ int recv_sideband(const char *me, int in_stream, int out)
 	enum sideband_type sideband_type;
 
 	while (1) {
-		len = packet_read(in_stream, NULL, NULL, buf, LARGE_PACKET_MAX,
-				  0);
-		if (!demultiplex_sideband(me, buf, len, 0, &scratch,
+		int status = packet_read_with_status(in_stream, NULL, NULL,
+						     buf, LARGE_PACKET_MAX,
+						     &len,
+						     PACKET_READ_GENTLE_ON_EOF);
+		if (!demultiplex_sideband(me, status, buf, len, 0, &scratch,
 					  &sideband_type))
 			continue;
 		switch (sideband_type) {
@@ -520,9 +522,9 @@ enum packet_read_status packet_reader_read(struct packet_reader *reader)
 							 reader->options);
 		if (!reader->use_sideband)
 			break;
-		if (demultiplex_sideband(reader->me, reader->buffer,
-					 reader->pktlen, 1, &scratch,
-					 &sideband_type))
+		if (demultiplex_sideband(reader->me, reader->status,
+					 reader->buffer, reader->pktlen, 1,
+					 &scratch, &sideband_type))
 			break;
 	}
 
diff --git a/sideband.c b/sideband.c
index a5405b9aaa..a0d3fb9652 100644
--- a/sideband.c
+++ b/sideband.c
@@ -3,6 +3,7 @@
 #include "config.h"
 #include "sideband.h"
 #include "help.h"
+#include "pkt-line.h"
 
 struct keyword_entry {
 	/*
@@ -114,7 +115,8 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n)
 #define ANSI_SUFFIX "\033[K"
 #define DUMB_SUFFIX "        "
 
-int demultiplex_sideband(const char *me, char *buf, int len,
+int demultiplex_sideband(const char *me, int status,
+			 char *buf, int len,
 			 int die_on_error,
 			 struct strbuf *scratch,
 			 enum sideband_type *sideband_type)
@@ -130,17 +132,30 @@ int demultiplex_sideband(const char *me, char *buf, int len,
 			suffix = DUMB_SUFFIX;
 	}
 
-	if (len == 0) {
-		*sideband_type = SIDEBAND_FLUSH;
-		goto cleanup;
-	}
-	if (len < 1) {
+	if (status == PACKET_READ_EOF) {
 		strbuf_addf(scratch,
-			    "%s%s: protocol error: no band designator",
+			    "%s%s: protocol error: eof while reading sideband packet",
 			    scratch->len ? "\n" : "", me);
 		*sideband_type = SIDEBAND_PROTOCOL_ERROR;
 		goto cleanup;
 	}
+
+	if (len < 0)
+		BUG("negative length on non-eof packet read");
+
+	if (len == 0) {
+		if (status == PACKET_READ_NORMAL) {
+			strbuf_addf(scratch,
+				    "%s%s: protocol error: missing sideband designator",
+				    scratch->len ? "\n" : "", me);
+			*sideband_type = SIDEBAND_PROTOCOL_ERROR;
+		} else {
+			/* covers flush, delim, etc */
+			*sideband_type = SIDEBAND_FLUSH;
+		}
+		goto cleanup;
+	}
+
 	band = buf[0] & 0xff;
 	buf[len] = '\0';
 	len--;
diff --git a/sideband.h b/sideband.h
index 227740a58e..5a25331be5 100644
--- a/sideband.h
+++ b/sideband.h
@@ -18,8 +18,12 @@ enum sideband_type {
  *
  * scratch must be a struct strbuf allocated by the caller. It is used to store
  * progress messages split across multiple packets.
+ *
+ * The "status" parameter is a pkt-line response as returned by
+ * packet_read_with_status() (e.g., PACKET_READ_NORMAL).
  */
-int demultiplex_sideband(const char *me, char *buf, int len,
+int demultiplex_sideband(const char *me, int status,
+			 char *buf, int len,
 			 int die_on_error,
 			 struct strbuf *scratch,
 			 enum sideband_type *sideband_type);
diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
index 357201640a..936030a5ef 100755
--- a/t/t0070-fundamental.sh
+++ b/t/t0070-fundamental.sh
@@ -40,4 +40,16 @@ test_expect_success 'incomplete sideband messages are reassembled' '
 	grep "Hello, world" err
 '
 
+test_expect_success 'eof on sideband message is reported' '
+	printf 1234 >input &&
+	test-tool pkt-line receive-sideband <input 2>err &&
+	test_i18ngrep "eof while reading" err
+'
+
+test_expect_success 'missing sideband designator is reported' '
+	printf 0004 >input &&
+	test-tool pkt-line receive-sideband <input 2>err &&
+	test_i18ngrep "missing sideband" err
+'
+
 test_done
-- 
2.29.1.634.g9e41dc1bf2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] sideband: diagnose more incoming packet anomalies
  2020-10-27  6:52     ` [PATCH] sideband: diagnose more incoming packet anomalies Jeff King
@ 2020-10-27  7:12       ` Jeff King
  2020-10-27 18:56         ` Junio C Hamano
  2020-10-27  7:13       ` Jeff King
  1 sibling, 1 reply; 33+ messages in thread
From: Jeff King @ 2020-10-27  7:12 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: Junio C Hamano, git, Jonathan Tan, Johannes Schindelin

On Tue, Oct 27, 2020 at 02:52:50AM -0400, Jeff King wrote:

>   - we'll report an eof with a more detailed "protocol error: eof while
>     reading sideband packet", rather than the generic "the remote end
>     hung up unexpectedly"

I wasn't sure if "eof" was too jargon-y. This is probably a message
normal users are likely to see if the network drops out on them.

So I'm open to rewording suggestions. Or even dropping the eof bits
entirely. They do give us better flushing behavior:

>   - when we see an eof, we'll flush the sideband scratch buffer, which
>     may provide some hints from the remote about why they hung up
>     (though note we already flush on newlines, so it's likely that most
>     such messages already made it through)

But I suspect that's uncommon in practice.

>  t/t0070-fundamental.sh | 12 ++++++++++++

I added this to t0070 since that's where your earlier test went. It does
feel like this isn't quite as "fundamental" as some of the other things
in that file, though.

> +test_expect_success 'missing sideband designator is reported' '
> +	printf 0004 >input &&
> +	test-tool pkt-line receive-sideband <input 2>err &&
> +	test_i18ngrep "missing sideband" err
> +'

I found it much more straight-forward to just printf the sample input,
rather than writing C code to create it, as your test did. It keeps the
input and expectation together, and makes them easier to extend or
change.

I wonder if it's worth doing this:

 t/helper/test-pkt-line.c | 16 ----------------
 t/t0070-fundamental.sh   |  7 ++++++-
 t/test-lib-functions.sh  | 12 +++++++++++-
 3 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/t/helper/test-pkt-line.c b/t/helper/test-pkt-line.c
index 5e638f0b97..0615833bec 100644
--- a/t/helper/test-pkt-line.c
+++ b/t/helper/test-pkt-line.c
@@ -84,20 +84,6 @@ static void unpack_sideband(void)
 	}
 }
 
-static int send_split_sideband(void)
-{
-	const char *part1 = "Hello,";
-	const char *primary = "\001primary: regular output\n";
-	const char *part2 = " world!\n";
-
-	send_sideband(1, 2, part1, strlen(part1), LARGE_PACKET_MAX);
-	packet_write(1, primary, strlen(primary));
-	send_sideband(1, 2, part2, strlen(part2), LARGE_PACKET_MAX);
-	packet_response_end(1);
-
-	return 0;
-}
-
 static int receive_sideband(void)
 {
 	return recv_sideband("sideband", 0, 1);
@@ -114,8 +100,6 @@ int cmd__pkt_line(int argc, const char **argv)
 		unpack();
 	else if (!strcmp(argv[1], "unpack-sideband"))
 		unpack_sideband();
-	else if (!strcmp(argv[1], "send-split-sideband"))
-		send_split_sideband();
 	else if (!strcmp(argv[1], "receive-sideband"))
 		receive_sideband();
 	else
diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
index 936030a5ef..f9fda8d610 100755
--- a/t/t0070-fundamental.sh
+++ b/t/t0070-fundamental.sh
@@ -35,7 +35,12 @@ test_expect_success 'check for a bug in the regex routines' '
 '
 
 test_expect_success 'incomplete sideband messages are reassembled' '
-	test-tool pkt-line send-split-sideband >split-sideband &&
+	{
+		packetize -2 "Hello," &&
+		packetize -1 "primary: regular output" &&
+		packetize -2 " world!" &&
+		printf 0000
+	} >split-sideband &&
 	test-tool pkt-line receive-sideband <split-sideband 2>err &&
 	grep "Hello, world" err
 '
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 8d59b90348..c90ea61747 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1427,10 +1427,20 @@ nongit () {
 # given on stdin, and that empty input becomes an empty packet, not a flush
 # packet (for that you can just print 0000 yourself).
 packetize() {
+	band=
+	bandlen=0
+	case "$1" in
+	-[123])
+		band="\\${1#-}"
+		bandlen=1
+		shift
+		;;
+	esac
+
 	if test $# -gt 0
 	then
 		packet="$*"
-		printf '%04x%s' "$((4 + ${#packet}))" "$packet"
+		printf "%04x$band%s" "$((4 + $bandlen + ${#packet}))" "$packet"
 	else
 		perl -e '
 			my $packet = do { local $/; <STDIN> };

Teaching packetize() about bands isn't strictly necessary (we could
printf the band designators ourselves and pipe it into packetize(); that
would also add the missing newlines, though those do not matter for this
test).

-Peff

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] sideband: diagnose more incoming packet anomalies
  2020-10-27  6:52     ` [PATCH] sideband: diagnose more incoming packet anomalies Jeff King
  2020-10-27  7:12       ` Jeff King
@ 2020-10-27  7:13       ` Jeff King
  2020-10-27 19:04         ` Junio C Hamano
  1 sibling, 1 reply; 33+ messages in thread
From: Jeff King @ 2020-10-27  7:13 UTC (permalink / raw)
  To: Johannes Schindelin via GitGitGadget
  Cc: Junio C Hamano, git, Jonathan Tan, Johannes Schindelin

On Tue, Oct 27, 2020 at 02:52:50AM -0400, Jeff King wrote:

> On Mon, Oct 26, 2020 at 04:09:13PM +0000, Johannes Schindelin via GitGitGadget wrote:
> 
> > Changes since v2:
> > 
> >  * Dropped patch 3/3 because it was only intended to be defensive
> >    programming, but turned out to be too hard without layering violations.
> 
> It is a bit tricky, but here's a working replacement.

And one other small cleanup, worth doing with or without my patch.

-- >8 --
Subject: [PATCH] test-pkt-line: drop colon from sideband identity

We pass "sideband: " as our identity for errors to recv_sideband(). But
it already adds the trailing colon and space. This doesn't invalidate
any tests, but it looks funny when you examine the test output.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/helper/test-pkt-line.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/helper/test-pkt-line.c b/t/helper/test-pkt-line.c
index 0bf20642be..5e638f0b97 100644
--- a/t/helper/test-pkt-line.c
+++ b/t/helper/test-pkt-line.c
@@ -100,7 +100,7 @@ static int send_split_sideband(void)
 
 static int receive_sideband(void)
 {
-	return recv_sideband("sideband: ", 0, 1);
+	return recv_sideband("sideband", 0, 1);
 }
 
 int cmd__pkt_line(int argc, const char **argv)
-- 
2.29.1.634.g9e41dc1bf2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] sideband: diagnose more incoming packet anomalies
  2020-10-27  7:12       ` Jeff King
@ 2020-10-27 18:56         ` Junio C Hamano
  2020-10-27 20:42           ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2020-10-27 18:56 UTC (permalink / raw)
  To: Jeff King
  Cc: Johannes Schindelin via GitGitGadget, git, Jonathan Tan,
	Johannes Schindelin

Jeff King <peff@peff.net> writes:

> On Tue, Oct 27, 2020 at 02:52:50AM -0400, Jeff King wrote:
>
>>   - we'll report an eof with a more detailed "protocol error: eof while
>>     reading sideband packet", rather than the generic "the remote end
>>     hung up unexpectedly"
>
> I wasn't sure if "eof" was too jargon-y. This is probably a message
> normal users are likely to see if the network drops out on them.

We were expecting to see something, and instead of seeing that
specific thing we were waiting, we saw the end-of-file?

That sounds like "the other side hung up unexpectedly" to me.  Does
it bother you for being "too generic" because it does not say what
we were expecting to see?  "unexpected disconnect while waiting for
a sideband packet"?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] sideband: diagnose more incoming packet anomalies
  2020-10-27  7:13       ` Jeff King
@ 2020-10-27 19:04         ` Junio C Hamano
  0 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2020-10-27 19:04 UTC (permalink / raw)
  To: Jeff King
  Cc: Johannes Schindelin via GitGitGadget, git, Jonathan Tan,
	Johannes Schindelin

Jeff King <peff@peff.net> writes:

> And one other small cleanup, worth doing with or without my patch.
>
> -- >8 --
> Subject: [PATCH] test-pkt-line: drop colon from sideband identity
>
> We pass "sideband: " as our identity for errors to recv_sideband(). But
> it already adds the trailing colon and space. This doesn't invalidate
> any tests, but it looks funny when you examine the test output.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---

Makes sense and it is a no-brainer to understand why it is the right
thing to do.  I'll queue it directly on top of two patches from
Dscho.

Thanks.

>  t/helper/test-pkt-line.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/helper/test-pkt-line.c b/t/helper/test-pkt-line.c
> index 0bf20642be..5e638f0b97 100644
> --- a/t/helper/test-pkt-line.c
> +++ b/t/helper/test-pkt-line.c
> @@ -100,7 +100,7 @@ static int send_split_sideband(void)
>  
>  static int receive_sideband(void)
>  {
> -	return recv_sideband("sideband: ", 0, 1);
> +	return recv_sideband("sideband", 0, 1);
>  }
>  
>  int cmd__pkt_line(int argc, const char **argv)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] sideband: diagnose more incoming packet anomalies
  2020-10-27 18:56         ` Junio C Hamano
@ 2020-10-27 20:42           ` Jeff King
  2020-10-27 21:38             ` Junio C Hamano
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2020-10-27 20:42 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Jonathan Tan,
	Johannes Schindelin

On Tue, Oct 27, 2020 at 11:56:46AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Tue, Oct 27, 2020 at 02:52:50AM -0400, Jeff King wrote:
> >
> >>   - we'll report an eof with a more detailed "protocol error: eof while
> >>     reading sideband packet", rather than the generic "the remote end
> >>     hung up unexpectedly"
> >
> > I wasn't sure if "eof" was too jargon-y. This is probably a message
> > normal users are likely to see if the network drops out on them.
> 
> We were expecting to see something, and instead of seeing that
> specific thing we were waiting, we saw the end-of-file?
> 
> That sounds like "the other side hung up unexpectedly" to me.  Does
> it bother you for being "too generic" because it does not say what
> we were expecting to see?  "unexpected disconnect while waiting for
> a sideband packet"?

Yes, saying what we were expecting is what seems like an improvement to
me. I tried coming up with something that said "hung up" but the wording
ended up awkward. Your "unexpected disconnect while waiting for a
sideband packet" is exactly what I was trying for. :)

Does this patch seem otherwise worth doing?

An alternate patch would be to keep the behavior the same and just
clarify the code a bit. Something like:

-- >8 --
Subject: [PATCH] demultiplex_sideband(): clarify corner cases

The size checks in demultiplex_sideband() are a bit subtle and
confusing:

  - we consider a zero-length packet ("0004") to be a flush packet, even
    though it's not really one. This is perhaps wrong, but it should
    never happen in our protocol, and we err on the side of history and
    leniency. We'll leave a comment indicating that we expect this case.

  - likewise we consider any flush-like packet (e.g., "0001" delim) to
    be a flush. I didn't confirm whether this is necessary for normal
    protocol usage. It may be for the everything-over-sideband mode
    introduced by 0bbc0bc574 ({fetch,upload}-pack: sideband v2 fetch
    response, 2019-01-16). Likewise let's leave a comment.

  - we check for "len < 1" to see if there's no sideband designator.
    That's confusing, because we already covered the "len == 0" case.
    What is interesting is the "len < 0" case. But that's not a missing
    sideband designator, but rather an error or EOF from the pkt-line
    code. This should never happen, though, because our callers instruct
    pkt-line to die() on EOF anyway. So let's make it more obvious
    that we're looking for a negative value here, and consider it a
    BUG() in the caller to pass us garbage.

Signed-off-by: Jeff King <peff@peff.net>
---
 sideband.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/sideband.c b/sideband.c
index 0a60662fa6..6ba1925614 100644
--- a/sideband.c
+++ b/sideband.c
@@ -130,17 +130,18 @@ int demultiplex_sideband(const char *me, char *buf, int len,
 			suffix = DUMB_SUFFIX;
 	}
 
+	if (len < 0)
+		BUG("error/eof packet passed to demultiplex_sideband");
+
 	if (len == 0) {
+		/*
+		 * we treat all flush-like packets (flush, delim, etc) and even
+		 * empty data packets as a flush
+		 */
 		*sideband_type = SIDEBAND_FLUSH;
 		goto cleanup;
 	}
-	if (len < 1) {
-		strbuf_addf(scratch,
-			    "%s%s: protocol error: no band designator",
-			    scratch->len ? "\n" : "", me);
-		*sideband_type = SIDEBAND_PROTOCOL_ERROR;
-		goto cleanup;
-	}
+
 	band = buf[0] & 0xff;
 	buf[len] = '\0';
 	len--;
-- 
2.29.1.641.gb6287e56d2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] sideband: diagnose more incoming packet anomalies
  2020-10-27 20:42           ` Jeff King
@ 2020-10-27 21:38             ` Junio C Hamano
  2020-10-28  9:33               ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2020-10-27 21:38 UTC (permalink / raw)
  To: Jeff King
  Cc: Johannes Schindelin via GitGitGadget, git, Jonathan Tan,
	Johannes Schindelin

Jeff King <peff@peff.net> writes:

> Does this patch seem otherwise worth doing?

Yeah.

FWIW, I do not find the "dependency" thing disturbing.  sideband is
an extension of the pkt-line mechansim, so it is natural that it
depends on pkt-line.  I'd also be happy if enums, structures and
calls defined in both headers are made available by just including
one of them (e.g. retire sideband.h, perhaps).

> An alternate patch would be to keep the behavior the same and just
> clarify the code a bit. Something like:

This also looks OK to me from readability's point of view, but it
does not as much help the end user who is puzzled as the real thing,
I am afraid.

Thanks.

> -- >8 --
> Subject: [PATCH] demultiplex_sideband(): clarify corner cases
>
> The size checks in demultiplex_sideband() are a bit subtle and
> confusing:
>
>   - we consider a zero-length packet ("0004") to be a flush packet, even
>     though it's not really one. This is perhaps wrong, but it should
>     never happen in our protocol, and we err on the side of history and
>     leniency. We'll leave a comment indicating that we expect this case.
>
>   - likewise we consider any flush-like packet (e.g., "0001" delim) to
>     be a flush. I didn't confirm whether this is necessary for normal
>     protocol usage. It may be for the everything-over-sideband mode
>     introduced by 0bbc0bc574 ({fetch,upload}-pack: sideband v2 fetch
>     response, 2019-01-16). Likewise let's leave a comment.
>
>   - we check for "len < 1" to see if there's no sideband designator.
>     That's confusing, because we already covered the "len == 0" case.
>     What is interesting is the "len < 0" case. But that's not a missing
>     sideband designator, but rather an error or EOF from the pkt-line
>     code. This should never happen, though, because our callers instruct
>     pkt-line to die() on EOF anyway. So let's make it more obvious
>     that we're looking for a negative value here, and consider it a
>     BUG() in the caller to pass us garbage.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  sideband.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/sideband.c b/sideband.c
> index 0a60662fa6..6ba1925614 100644
> --- a/sideband.c
> +++ b/sideband.c
> @@ -130,17 +130,18 @@ int demultiplex_sideband(const char *me, char *buf, int len,
>  			suffix = DUMB_SUFFIX;
>  	}
>  
> +	if (len < 0)
> +		BUG("error/eof packet passed to demultiplex_sideband");
> +
>  	if (len == 0) {
> +		/*
> +		 * we treat all flush-like packets (flush, delim, etc) and even
> +		 * empty data packets as a flush
> +		 */
>  		*sideband_type = SIDEBAND_FLUSH;
>  		goto cleanup;
>  	}
> -	if (len < 1) {
> -		strbuf_addf(scratch,
> -			    "%s%s: protocol error: no band designator",
> -			    scratch->len ? "\n" : "", me);
> -		*sideband_type = SIDEBAND_PROTOCOL_ERROR;
> -		goto cleanup;
> -	}
> +
>  	band = buf[0] & 0xff;
>  	buf[len] = '\0';
>  	len--;

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] sideband: diagnose more incoming packet anomalies
  2020-10-27 21:38             ` Junio C Hamano
@ 2020-10-28  9:33               ` Jeff King
  0 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2020-10-28  9:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Jonathan Tan,
	Johannes Schindelin

On Tue, Oct 27, 2020 at 02:38:12PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Does this patch seem otherwise worth doing?
> 
> Yeah.

OK, here it is, then with that one error message reworded. I dropped
"protocol error" from that one message, as well. One, because it was
getting quite long. And two, it's probably _not_ a protocol error, but
rather a network drop. So just describing what we see on the client
side is the best strategy.

> FWIW, I do not find the "dependency" thing disturbing.  sideband is
> an extension of the pkt-line mechansim, so it is natural that it
> depends on pkt-line.  I'd also be happy if enums, structures and
> calls defined in both headers are made available by just including
> one of them (e.g. retire sideband.h, perhaps).

I agree with that. Given how easy it was to work around with an int, I'm
not inclined to spend time going further at this point, but it's
something to keep in mind.

> > An alternate patch would be to keep the behavior the same and just
> > clarify the code a bit. Something like:
> 
> This also looks OK to me from readability's point of view, but it
> does not as much help the end user who is puzzled as the real thing,
> I am afraid.

I suspect these cases are rare enough that nobody minds either way. But
I do think the behavior-changing patch is making us more likely to do
the right thing. The downside is the risk of some unintended regression,
but I feel pretty good about it.

-- >8 --
Subject: [PATCH] sideband: diagnose more sideband anomalies

In demultiplex_sideband(), there are two oddities when we check an
incoming packet:

  - if it has zero length, then we assume it's a flush packet. This
    means we fail to notice the difference between a real flush and a
    true zero-length packet that's missing its sideband designator. It's
    not a huge problem in practice because we'd never send a zero-length
    data packet (even our keepalives are otherwise-empty sideband-1
    packets).

    But it would be nice to detect and report the error, since it's
    likely to cause other confusion (we think the other side flushed,
    but they do not).

  - we try to detect packets missing their designator by checking for
    "if (len < 1)". But this will never trigger for "len == 0"; we've
    already detected that and left the function before then.

    It _could_ detect a negative "len" parameter. But in that case, the
    error message is wrong. The issue is not "no sideband" but rather
    "eof while reading the packet". However, this can't actually be
    triggered in practice, because neither of the two callers uses
    pkt_read's GENTLE_ON_EOF flag. Which means they'd die with "the
    remote end hung up unexpectedly" before we even get here.

    So this truly is dead code.

We can improve these cases by passing in a pkt-line status to the
demultiplexer, and by having recv_sideband() use GENTLE_ON_EOF. This
gives us two improvements:

  - we can now reliably detect flush packets, and will report a normal
    packet missing its sideband designator as an error

  - we'll report an eof with a more detailed "protocol error: eof while
    reading sideband packet", rather than the generic "the remote end
    hung up unexpectedly"

  - when we see an eof, we'll flush the sideband scratch buffer, which
    may provide some hints from the remote about why they hung up
    (though note we already flush on newlines, so it's likely that most
    such messages already made it through)

In some sense this patch goes against fbd76cd450 (sideband: reverse its
dependency on pkt-line, 2019-01-16), which caused the sideband code not
to depend on the pkt-line code. But that commit was really just trying
to deal with the circular header dependency. The two modules are
conceptually interlinked, and it was just trying to keep things
compiling. And indeed, there's a sticking point in this patch: because
pkt-line.h includes sideband.h, we can't add the reverse include we need
for the sideband code to have an "enum packet_read_status" parameter.
Nor can we forward declare it, because you can't forward declare an enum
in C. However, C does guarantee that enums fit in an int, so we can just
use that type.

One alternative would be for the callers to check themselves that they
got something sane from the pkt-line code. But besides duplicating
logic, this gets quite tricky. Any error condition requires flushing the
sideband #2 scratch buffer, which only demultiplex_sideband() knows how
to do.

Signed-off-by: Jeff King <peff@peff.net>
---
 pkt-line.c             | 14 ++++++++------
 sideband.c             | 29 ++++++++++++++++++++++-------
 sideband.h             |  6 +++++-
 t/t0070-fundamental.sh | 12 ++++++++++++
 4 files changed, 47 insertions(+), 14 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index 657a702927..d633005ef7 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -461,9 +461,11 @@ int recv_sideband(const char *me, int in_stream, int out)
 	enum sideband_type sideband_type;
 
 	while (1) {
-		len = packet_read(in_stream, NULL, NULL, buf, LARGE_PACKET_MAX,
-				  0);
-		if (!demultiplex_sideband(me, buf, len, 0, &scratch,
+		int status = packet_read_with_status(in_stream, NULL, NULL,
+						     buf, LARGE_PACKET_MAX,
+						     &len,
+						     PACKET_READ_GENTLE_ON_EOF);
+		if (!demultiplex_sideband(me, status, buf, len, 0, &scratch,
 					  &sideband_type))
 			continue;
 		switch (sideband_type) {
@@ -520,9 +522,9 @@ enum packet_read_status packet_reader_read(struct packet_reader *reader)
 							 reader->options);
 		if (!reader->use_sideband)
 			break;
-		if (demultiplex_sideband(reader->me, reader->buffer,
-					 reader->pktlen, 1, &scratch,
-					 &sideband_type))
+		if (demultiplex_sideband(reader->me, reader->status,
+					 reader->buffer, reader->pktlen, 1,
+					 &scratch, &sideband_type))
 			break;
 	}
 
diff --git a/sideband.c b/sideband.c
index a5405b9aaa..6f9e026732 100644
--- a/sideband.c
+++ b/sideband.c
@@ -3,6 +3,7 @@
 #include "config.h"
 #include "sideband.h"
 #include "help.h"
+#include "pkt-line.h"
 
 struct keyword_entry {
 	/*
@@ -114,7 +115,8 @@ static void maybe_colorize_sideband(struct strbuf *dest, const char *src, int n)
 #define ANSI_SUFFIX "\033[K"
 #define DUMB_SUFFIX "        "
 
-int demultiplex_sideband(const char *me, char *buf, int len,
+int demultiplex_sideband(const char *me, int status,
+			 char *buf, int len,
 			 int die_on_error,
 			 struct strbuf *scratch,
 			 enum sideband_type *sideband_type)
@@ -130,17 +132,30 @@ int demultiplex_sideband(const char *me, char *buf, int len,
 			suffix = DUMB_SUFFIX;
 	}
 
-	if (len == 0) {
-		*sideband_type = SIDEBAND_FLUSH;
-		goto cleanup;
-	}
-	if (len < 1) {
+	if (status == PACKET_READ_EOF) {
 		strbuf_addf(scratch,
-			    "%s%s: protocol error: no band designator",
+			    "%s%s: unexpected disconnect while reading sideband packet",
 			    scratch->len ? "\n" : "", me);
 		*sideband_type = SIDEBAND_PROTOCOL_ERROR;
 		goto cleanup;
 	}
+
+	if (len < 0)
+		BUG("negative length on non-eof packet read");
+
+	if (len == 0) {
+		if (status == PACKET_READ_NORMAL) {
+			strbuf_addf(scratch,
+				    "%s%s: protocol error: missing sideband designator",
+				    scratch->len ? "\n" : "", me);
+			*sideband_type = SIDEBAND_PROTOCOL_ERROR;
+		} else {
+			/* covers flush, delim, etc */
+			*sideband_type = SIDEBAND_FLUSH;
+		}
+		goto cleanup;
+	}
+
 	band = buf[0] & 0xff;
 	buf[len] = '\0';
 	len--;
diff --git a/sideband.h b/sideband.h
index 227740a58e..5a25331be5 100644
--- a/sideband.h
+++ b/sideband.h
@@ -18,8 +18,12 @@ enum sideband_type {
  *
  * scratch must be a struct strbuf allocated by the caller. It is used to store
  * progress messages split across multiple packets.
+ *
+ * The "status" parameter is a pkt-line response as returned by
+ * packet_read_with_status() (e.g., PACKET_READ_NORMAL).
  */
-int demultiplex_sideband(const char *me, char *buf, int len,
+int demultiplex_sideband(const char *me, int status,
+			 char *buf, int len,
 			 int die_on_error,
 			 struct strbuf *scratch,
 			 enum sideband_type *sideband_type);
diff --git a/t/t0070-fundamental.sh b/t/t0070-fundamental.sh
index 357201640a..8d59905ef0 100755
--- a/t/t0070-fundamental.sh
+++ b/t/t0070-fundamental.sh
@@ -40,4 +40,16 @@ test_expect_success 'incomplete sideband messages are reassembled' '
 	grep "Hello, world" err
 '
 
+test_expect_success 'eof on sideband message is reported' '
+	printf 1234 >input &&
+	test-tool pkt-line receive-sideband <input 2>err &&
+	test_i18ngrep "unexpected disconnect" err
+'
+
+test_expect_success 'missing sideband designator is reported' '
+	printf 0004 >input &&
+	test-tool pkt-line receive-sideband <input 2>err &&
+	test_i18ngrep "missing sideband" err
+'
+
 test_done
-- 
2.29.1.641.gb6287e56d2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2020-10-28 23:00 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-13 14:45 [PATCH] t5500.43: make the check a bit more robust Johannes Schindelin via GitGitGadget
2020-10-13 17:53 ` Junio C Hamano
2020-10-13 18:55 ` Jeff King
2020-10-13 19:07   ` Junio C Hamano
2020-10-13 19:09   ` Jeff King
2020-10-17  3:31     ` Johannes Schindelin
2020-10-17  2:34   ` Johannes Schindelin
2020-10-19 19:35 ` [PATCH v2 0/3] Work around flakiness in t5500.43 Johannes Schindelin via GitGitGadget
2020-10-19 19:35   ` [PATCH v2 1/3] sideband: avoid reporting incomplete sideband messages Johannes Schindelin via GitGitGadget
2020-10-20 20:33     ` Junio C Hamano
2020-10-20 20:48       ` Johannes Schindelin
2020-10-20 21:33         ` Junio C Hamano
2020-10-19 19:35   ` [PATCH v2 2/3] sideband: report unhandled incomplete sideband messages as bugs Johannes Schindelin via GitGitGadget
2020-10-19 19:35   ` [PATCH v2 3/3] sideband: add defense against packets missing a band designator Johannes Schindelin via GitGitGadget
2020-10-20 20:36     ` Junio C Hamano
2020-10-23  8:34     ` Jeff King
2020-10-23 14:44       ` Junio C Hamano
2020-10-23  8:48     ` Jeff King
2020-10-23  5:36       ` Johannes Schindelin
2020-10-23  8:50   ` [PATCH v2 0/3] Work around flakiness in t5500.43 Jeff King
2020-10-26 16:04     ` Johannes Schindelin
2020-10-26 16:09   ` [PATCH v3 0/2] " Johannes Schindelin via GitGitGadget
2020-10-26 16:09     ` [PATCH v3 1/2] sideband: avoid reporting incomplete sideband messages Johannes Schindelin via GitGitGadget
2020-10-26 16:09     ` [PATCH v3 2/2] sideband: report unhandled incomplete sideband messages as bugs Johannes Schindelin via GitGitGadget
2020-10-26 18:33     ` [PATCH v3 0/2] Work around flakiness in t5500.43 Junio C Hamano
2020-10-27  6:52     ` [PATCH] sideband: diagnose more incoming packet anomalies Jeff King
2020-10-27  7:12       ` Jeff King
2020-10-27 18:56         ` Junio C Hamano
2020-10-27 20:42           ` Jeff King
2020-10-27 21:38             ` Junio C Hamano
2020-10-28  9:33               ` Jeff King
2020-10-27  7:13       ` Jeff King
2020-10-27 19:04         ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).