git@vger.kernel.org mailing list mirror (one of many)
* do people find t5504.8 flaky?
@ 2019-04-23  2:45 Junio C Hamano
  2019-04-23  3:02 ` Jeff King
  2019-04-29 13:36 ` Johannes Schindelin
  0 siblings, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2019-04-23  2:45 UTC (permalink / raw)
  To: git

I have been seeing occasional failures of t5504-fetch-receive-strict
test on the cc/replace-graft-peel-tags topic, but it seems that the
fork point of that topic from the mainline already fails the same
step #8, only less frequently.

The push is rejected as expected, but the remote side that receives
the "push" fails, and the local side does not leave the output we
expect when the test fails.



* Re: do people find t5504.8 flaky?
  2019-04-23  2:45 do people find t5504.8 flaky? Junio C Hamano
@ 2019-04-23  3:02 ` Jeff King
  2019-11-13  0:07   ` SZEDER Gábor
  2019-04-29 13:36 ` Johannes Schindelin
  1 sibling, 1 reply; 9+ messages in thread
From: Jeff King @ 2019-04-23  3:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, Apr 23, 2019 at 11:45:17AM +0900, Junio C Hamano wrote:

> I have been seeing occasional failures of t5504-fetch-receive-strict
> test on the cc/replace-graft-peel-tags topic, but it seems that the
> fork point of that topic from the mainline already fails the same
> step #8, only less frequently.
> 
> The push is rejected as expected, but the remote side that receives
> the "push" fails, and the local side does not leave the output we
> expect when the test fails.

No, I haven't seen it fail, nor does running with --stress turn up
anything. But looking at the test I would not be at all surprised if we
have races around error hangups. I believe that index-pack will die
unceremoniously as soon as something fails to pass its fsck check.

The client will keep sending data, and may hit a SIGPIPE (or the network
equivalent), depending on how much slack there is in the buffers,
whether we hit the problem as a base object or after we receive
everything and start resolving deltas, etc.
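
For illustration, a minimal standalone sketch of that failure mode
(nothing git-specific; just a writer whose reader has gone away, with
SIGPIPE ignored so the error surfaces as EPIPE instead of a signal
death):

  #include <errno.h>
  #include <signal.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  int main(void)
  {
          int fd[2];
          char buf[4096];

          memset(buf, 0, sizeof(buf));
          signal(SIGPIPE, SIG_IGN);       /* see EPIPE instead of dying */
          if (pipe(fd) < 0)
                  return 1;
          close(fd[0]);                   /* the "remote" hangs up */
          while (write(fd[1], buf, sizeof(buf)) >= 0)
                  ;                       /* keep sending, as the client would */
          printf("write failed: %s\n", strerror(errno));
          return 0;
  }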

I think after seeing a fatal error we probably ought to consider pumping
the rest of the bytes from the client to /dev/null. That's wasteful, but
it's the only clean way to get a message back, I think. It would also
give us the opportunity to complain about other objects, too, if there
are multiple (it might make sense to abort before resolving deltas,
though; at that point we have all of the data and that phase is very CPU
intensive).
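
Something of roughly this shape, as a sketch of the idea only (a
hypothetical helper, not actual index-pack code):

  #include <unistd.h>

  /*
   * After a fatal fsck error, keep reading and discarding whatever
   * pack data the client still has in flight, so it never hits
   * SIGPIPE and can still read the error report we send back.
   */
  static void drain_input(int in_fd)
  {
          char buf[8192];

          while (read(in_fd, buf, sizeof(buf)) > 0)
                  ;       /* discard */
  }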

-Peff


* Re: do people find t5504.8 flaky?
  2019-04-23  2:45 do people find t5504.8 flaky? Junio C Hamano
  2019-04-23  3:02 ` Jeff King
@ 2019-04-29 13:36 ` Johannes Schindelin
  1 sibling, 0 replies; 9+ messages in thread
From: Johannes Schindelin @ 2019-04-29 13:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi Junio,

On Tue, 23 Apr 2019, Junio C Hamano wrote:

> I have been seeing occasional failures of t5504-fetch-receive-strict
> test on the cc/replace-graft-peel-tags topic, but it seems that the
> fork point of that topic from the mainline already fails the same
> step #8, only less frequently.

Sounds familiar. IIRC it fails on Azure Pipelines from time to time, most
often in linux-clang, less often in linux32. But I don't have hard data on
that; I am still trying to get a grip on making the CI builds more robust
without having to ask you to integrate azure-pipelines.yml changes into
*all* branches.

Currently, my idea is to have two different Azure Pipelines set up: one
that performs the continuous testing using azure-pipelines.yml on your
maint, master, next and pu (do you want jch, too?) [*1*], and one that has
a separate build definition and tries to work around many (all?) known
issues with existing branches, e.g. coping with the known SIGPIPE issues
in the git-daemon tests on macOS that *have* been addressed in master but
not in some of the other branches that are accumulated into pu [*2*]. This
latter build is *not* using azure-pipelines.yml, but a build definition
that is maintained directly in Azure Pipelines ("Visual Designer" is what
this feature is called in Azure Pipelines).

Ciao,
Dscho

Footnote *1*: https://dev.azure.com/gitgitgadget/git/_build?definitionId=6

Footnote *2*: https://dev.azure.com/gitgitgadget/git/_build?definitionId=4


* Re: do people find t5504.8 flaky?
  2019-04-23  3:02 ` Jeff King
@ 2019-11-13  0:07   ` SZEDER Gábor
  2019-11-13  1:03     ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: SZEDER Gábor @ 2019-11-13  0:07 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Johannes Schindelin

On Mon, Apr 22, 2019 at 11:02:54PM -0400, Jeff King wrote:
> On Tue, Apr 23, 2019 at 11:45:17AM +0900, Junio C Hamano wrote:
> 
> > I have been seeing occasional failures of t5504-fetch-receive-strict
> > test on the cc/replace-graft-peel-tags topic, but it seems that the
> > fork point of that topic from the mainline already fails the same
> > step #8, only less frequently.
> > 
> > The push is rejected as expected, but the remote side that receives
> > the "push" fails, and the local side does not leave the output we
> > expect when the test fails.

I've seen it fail a few times on Travis CI, but it's rare, much rarer
than our "avarage" flaky test failures.

The subsequent test t5504.9 is flaky as well: the two tests are
essentially the same, they only differ in the configuration variable
that enables the fsck checks.

> No, I haven't seen it fail, nor does running with --stress turn up
> anything.

I can reproduce the failure fairly quickly with '-r 1,8 --stress' (and
nr of jobs = 4x cores).

FWIW, I enabled GIT_TRACE_PACKET and the relevant part of the failure
looks like this [1]:

  + test_must_fail env GIT_TRACE_PACKET=/home/szeder/src/git/t/trash directory.t5504-fetch-receive-strict.stress-8/trace-packet git push --porcelain dst master:refs/heads/test
  remote: fatal: object of unexpected type        
  error: remote unpack failed: unpack-objects abnormal exit
  error: failed to push some refs to 'dst'
  + cat trace-packet
  packet: receive-pack> 0000000000000000000000000000000000000000 capabilities^{}\0report-status delete-refs side-band-64k quiet atomic ofs-delta agent=git/2.24.0.1.g52e0cf1d06
  packet: receive-pack> 0000
  packet:         push< 0000000000000000000000000000000000000000 capabilities^{}\0report-status delete-refs side-band-64k quiet atomic ofs-delta agent=git/2.24.0.1.g52e0cf1d06
  packet:         push< 0000
  packet:         push> 0000000000000000000000000000000000000000 a7943252b7679bec6b9679dbc7863c08610ac2bc refs/heads/test\0 report-status side-band-64k quiet agent=git/2.24.0.1.g52e0cf1d06
  packet:         push> 0000
  packet: receive-pack< 0000000000000000000000000000000000000000 a7943252b7679bec6b9679dbc7863c08610ac2bc refs/heads/test\0 report-status side-band-64k quiet agent=git/2.24.0.1.g52e0cf1d06
  packet: receive-pack< 0000
  packet:     sideband< \2fatal: object of unexpected type
  packet: receive-pack> unpack unpack-objects abnormal exit
  packet: receive-pack> ng refs/heads/test unpacker error
  packet: receive-pack> 0000
  packet:     sideband< \10028unpack unpack-objects abnormal exit0026ng refs/heads/test unpacker error0000
  packet: receive-pack> 0000
  packet:     sideband< 0000
  packet:         push< unpack unpack-objects abnormal exit
  + test_cmp exp act
  --- exp 2019-11-12 23:40:33.131679990 +0000
  +++ act 2019-11-12 23:40:33.203680114 +0000
  @@ -1,2 +0,0 @@
  -To dst
  -!      refs/heads/master:refs/heads/test       [remote rejected] (unpacker error)
  error: last command exited with $?=1
  not ok 8 - push with receive.fsckobjects

Note that 'sideband< 0000' is not the last packet.

For comparison, here is the packet trace from a successful test run:

  + cat trace-packet
  packet: receive-pack> 0000000000000000000000000000000000000000 capabilities^{}\0report-status delete-refs side-band-64k quiet atomic ofs-delta agent=git/2.24.0.1.g52e0cf1d06
  packet: receive-pack> 0000
  packet:         push< 0000000000000000000000000000000000000000 capabilities^{}\0report-status delete-refs side-band-64k quiet atomic ofs-delta agent=git/2.24.0.1.g52e0cf1d06
  packet:         push< 0000
  packet:         push> 0000000000000000000000000000000000000000 38af865a0f3f0170ef7a18edcb3088d3f7961b21 refs/heads/test\0 report-status side-band-64k quiet agent=git/2.24.0.1.g52e0cf1d06
  packet:         push> 0000
  packet: receive-pack< 0000000000000000000000000000000000000000 38af865a0f3f0170ef7a18edcb3088d3f7961b21 refs/heads/test\0 report-status side-band-64k quiet agent=git/2.24.0.1.g52e0cf1d06
  packet: receive-pack< 0000
  packet:     sideband< \2fatal: object of unexpected type
  packet: receive-pack> unpack unpack-objects abnormal exit
  packet: receive-pack> ng refs/heads/test unpacker error
  packet: receive-pack> 0000
  packet:     sideband< \10028unpack unpack-objects abnormal exit0026ng refs/heads/test unpacker error0000
  packet:         push< unpack unpack-objects abnormal exit
  packet:         push< ng refs/heads/test unpacker error
  packet:         push< 0000
  packet: receive-pack> 0000
  packet:     sideband< 0000

Note that 'sideband< 0000' is the final packet.

Whether this confirms Peff's theories below, I don't know; sideband
always makes me dizzy :)

FWIW, I could reproduce the failure on ef7e93d908 (do not override
receive-pack errors, 2012-02-13) as well, i.e. on the commit that
started checking 'git push's output.

Hope it helps.


[1] Note the lack of a dozen or so '-x' trace lines from
    'test_must_fail' and 'test_cmp' ;)  Current WIP patch at:

    https://github.com/szeder/git/commit/52e0cf1d0695c107142e36905dfdbaceacdacf8c

> But looking at the test I would not be at all surprised if we
> have races around error hangups. I believe that index-pack will die
> unceremoniously as soon as something fails to pass its fsck check.
> 
> The client will keep sending data, and may hit a SIGPIPE (or the network
> equivalent), depending on how much slack there is in the buffers,
> whether we hit the problem as a base object or after we receive
> everything and start resolving deltas, etc.
> 
> I think after seeing a fatal error we probably ought to consider pumping
> the rest of the bytes from the client to /dev/null. That's wasteful, but
> it's the only clean way to get a message back, I think. It would also
> give us the opportunity to complain about other objects, too, if there
> are multiple (it might make sense to abort before resolving deltas,
> though; at that point we have all of the data and that phase is very CPU
> intensive).
> 
> -Peff


* Re: do people find t5504.8 flaky?
  2019-11-13  0:07   ` SZEDER Gábor
@ 2019-11-13  1:03     ` Jeff King
  2019-11-13  2:07       ` Jeff King
  2019-11-13  3:47       ` Junio C Hamano
  0 siblings, 2 replies; 9+ messages in thread
From: Jeff King @ 2019-11-13  1:03 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git, Johannes Schindelin

On Wed, Nov 13, 2019 at 01:07:47AM +0100, SZEDER Gábor wrote:

> > No, I haven't seen it fail, nor does running with --stress turn up
> > anything.
> 
> I can reproduce the failure fairly quickly with '-r 1,8 --stress' (and
> nr of jobs = 4x cores).

I was able to reproduce with that, too (I wonder why I had so much
trouble before? Maybe "-r 1,8" simply gives many more opportunities per
second).

Running with "-x", I don't see any sign of a premature SIGPIPE death,
which is good:

  + git push --porcelain dst master:refs/heads/test
  remote: fatal: object of unexpected type        
  error: remote unpack failed: unpack-objects abnormal exit
  error: failed to push some refs to 'dst'
  + exit_code=1

The "exit_code" assignment there is from test_must_fail. And looking at
the code, I think we handle a remote hangup well, because the writer is
actually a pack-objects sub-process. So we hit this code in send-pack.c:

          rc = finish_command(&po);
          if (rc) {
                  /*
                   * For a normal non-zero exit, we assume pack-objects wrote
                   * something useful to stderr. For death by signal, though,
                   * we should mention it to the user. The exception is SIGPIPE
                   * (141), because that's a normal occurrence if the remote end
                   * hangs up (and we'll report that by trying to read the unpack
                   * status).
                   */
                  if (rc > 128 && rc != 141)
                          error("pack-objects died of signal %d", rc - 128);
                  return -1;
          }

But that error return might explain things.

The difference in your traces seems to be that in the failing case we
exit immediately after seeing the unpack error:

>   packet:     sideband< \2fatal: object of unexpected type
>   packet: receive-pack> unpack unpack-objects abnormal exit
>   packet: receive-pack> ng refs/heads/test unpacker error
>   packet: receive-pack> 0000
>   packet:     sideband< \10028unpack unpack-objects abnormal exit0026ng refs/heads/test unpacker error0000
>   packet: receive-pack> 0000
>   packet:     sideband< 0000
>   packet:         push< unpack unpack-objects abnormal exit
>   + test_cmp exp act
>   --- exp 2019-11-12 23:40:33.131679990 +0000
>   +++ act 2019-11-12 23:40:33.203680114 +0000
>   @@ -1,2 +0,0 @@
>   -To dst
>   -!      refs/heads/master:refs/heads/test       [remote rejected] (unpacker error)
>   error: last command exited with $?=1
>   not ok 8 - push with receive.fsckobjects

Note that "receive-pack" sent us the "ng refs/heads/test" line. And our
sideband demuxer even saw it! But it never made it to push.

Whereas in the successful case:

>   packet:     sideband< \2fatal: object of unexpected type
>   packet: receive-pack> unpack unpack-objects abnormal exit
>   packet: receive-pack> ng refs/heads/test unpacker error
>   packet: receive-pack> 0000
>   packet:     sideband< \10028unpack unpack-objects abnormal exit0026ng refs/heads/test unpacker error0000
>   packet:         push< unpack unpack-objects abnormal exit
>   packet:         push< ng refs/heads/test unpacker error
>   packet:         push< 0000
>   packet: receive-pack> 0000
>   packet:     sideband< 0000

We do see "push" getting it.

So for some reason we're deciding not to read the status from our
sideband demuxer. And that is probably because of this code in
send_pack():

                  if (pack_objects(out, remote_refs, extra_have, args) < 0) {
                          for (ref = remote_refs; ref; ref = ref->next)
                                  ref->status = REF_STATUS_NONE;
                          if (args->stateless_rpc)
                                  close(out);
                          if (git_connection_is_socket(conn))
                                  shutdown(fd[0], SHUT_WR);
  
                          /*
                           * Do not even bother with the return value; we know we
                           * are failing, and just want the error() side effects.
                           */
                          if (status_report)
                                  receive_unpack_status(&reader);
  
                          if (use_sideband) {
                                  close(demux.out);
                                  finish_async(&demux);
                          }
                          fd[1] = -1;
                          return -1;
                  }

If we saw a pack-objects error, then we do bother to receive the unpack
status, but we don't stick around to get the individual ref status, and
everything stays as REF_STATUS_NONE.

And looking in transport_print_push_status(), NONE is skipped when
printing. We probably ought to be seeding it with a different status
there, but there's not really one for "hey, _we_ failed". Let's hack it
to use REMOTE_REJECT, but put in our own message, like so:

diff --git a/send-pack.c b/send-pack.c
index 34c77cbb1a..f6d609a244 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -565,8 +565,10 @@ int send_pack(struct send_pack_args *args,
 
 	if (need_pack_data && cmds_sent) {
 		if (pack_objects(out, remote_refs, extra_have, args) < 0) {
-			for (ref = remote_refs; ref; ref = ref->next)
-				ref->status = REF_STATUS_NONE;
+			for (ref = remote_refs; ref; ref = ref->next) {
+				ref->status = REF_STATUS_REMOTE_REJECT;
+				ref->remote_status = xstrdup("pack-objects error");
+			}
 			if (args->stateless_rpc)
 				close(out);
 			if (git_connection_is_socket(conn))

Now when the stress-test fails, I get:

  --- exp	2019-11-13 00:48:53.269943875 +0000
  +++ act	2019-11-13 00:48:53.317943957 +0000
  @@ -1,2 +1,2 @@
   To dst
  -!	refs/heads/master:refs/heads/test	[remote rejected] (unpacker error)
  +!	refs/heads/master:refs/heads/test	[remote rejected] (pack-objects error)

So we're definitely on the right track! Of course we could say "unpacker
error" and the test failure would go away. But it feels a bit like
lying, since the remote didn't tell us that.

So really, what we want for this case is to just get the remote status,
like so:

diff --git a/send-pack.c b/send-pack.c
index 34c77cbb1a..d4db965b96 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -565,19 +565,19 @@ int send_pack(struct send_pack_args *args,
 
 	if (need_pack_data && cmds_sent) {
 		if (pack_objects(out, remote_refs, extra_have, args) < 0) {
-			for (ref = remote_refs; ref; ref = ref->next)
-				ref->status = REF_STATUS_NONE;
 			if (args->stateless_rpc)
 				close(out);
 			if (git_connection_is_socket(conn))
 				shutdown(fd[0], SHUT_WR);
 
 			/*
 			 * Do not even bother with the return value; we know we
-			 * are failing, and just want the error() side effects.
+			 * are failing, and just want the error() side effects,
+			 * as well as marking refs with the remote status (if
+			 * we get one).
 			 */
 			if (status_report)
-				receive_unpack_status(&reader);
+				receive_status(&reader, remote_refs);
 
 			if (use_sideband) {
 				close(demux.out);

I was worried at first that we might make things worse for the case that
the network has hung up completely (which would cause pack-objects to
fail, but also cause us to not get anything from the remote). But this
is really no worse. Even in the existing code, we'd complain to stderr
about trying to read the unpack status. And then when we read the remote
ref status, as soon as we see a bad packet we just quietly stop reading
(thus leaving any unmentioned refs as EXPECTING_REPORT).
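
The shape of that behavior, as a simplified self-contained sketch
(assumed names and a made-up line format, not the actual
receive_status() code):

  #include <stdio.h>

  enum report { EXPECTING_REPORT, REPORT_OK, REPORT_NG };

  /*
   * Pre-mark every entry as "expecting a report", let reports that do
   * arrive overwrite that, and stop quietly on a bad or truncated
   * line; anything never mentioned keeps the pre-marked status.
   */
  static void read_reports(FILE *in, enum report *status, unsigned nr)
  {
          char line[256];
          unsigned i, idx;

          for (i = 0; i < nr; i++)
                  status[i] = EXPECTING_REPORT;

          while (fgets(line, sizeof(line), in)) {
                  if (sscanf(line, "ok %u", &idx) == 1 && idx < nr)
                          status[idx] = REPORT_OK;
                  else if (sscanf(line, "ng %u", &idx) == 1 && idx < nr)
                          status[idx] = REPORT_NG;
                  else
                          break;
          }
  }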

So with that second patch above, the test failure goes away for me.

-Peff


* Re: do people find t5504.8 flaky?
  2019-11-13  1:03     ` Jeff King
@ 2019-11-13  2:07       ` Jeff King
  2019-11-18 22:30         ` SZEDER Gábor
  2019-11-13  3:47       ` Junio C Hamano
  1 sibling, 1 reply; 9+ messages in thread
From: Jeff King @ 2019-11-13  2:07 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, git, Johannes Schindelin

On Tue, Nov 12, 2019 at 08:03:37PM -0500, Jeff King wrote:

> So with that second patch above, the test failure goes away for me.

After poking at the behavior around connection dropping, I've convinced
myself this is the right thing to do. So here it is with a commit
message. I can't get the flaky test to fail anymore, but please let me
know if you do.

I didn't see a good way to use stock git to simulate a hung-up
connection (because it has to happen after the advertisement, while the
client is sending the pack). So I stuck an exit(0) at the end of
parse_pack_header() in receive-pack, just to verify how things work
before and after my patch (which, spoiler alert, is exactly the same).

-- >8 --
Subject: [PATCH] send-pack: check remote ref status on pack-objects failure

When we're pushing a pack and our local pack-objects fails, we enter an
error code path that does a few things:

  1. Set the status of every ref to REF_STATUS_NONE

  2. Call receive_unpack_status() to try to get an error report from
     the other side

  3. Return an error to the caller

If pack-objects failed because the connection to the server dropped,
there's not much more we can do than report the hangup. And indeed, step
2 will try to read a packet from the other side, which will die() in the
packet-reading code with "the remote end hung up unexpectedly".

But if the connection _didn't_ die, then the most common issue is that
the remote index-pack or unpack-objects complained about our pack (we
could also have a local pack-objects error, but this ends up being the
same; we'd send an incomplete pack and the remote side would complain).

In that case we do report the error from the other side (because of step
2), but we fail to say anything further about the refs. The issue is
two-fold:

  - in step 1, the "NONE" status is not "we saw an error, so we have no
    status". It generally means "this ref did not match our refspecs, so
    we didn't try to push it". So when we print out the push status
    table, we won't mention any refs at all!

    But even if we had a status enum for "pack-objects error", we
    wouldn't want to blindly set it for every ref. For example, in a
    non-atomic push we might have rejected some refs already on the
    client side (e.g., REF_STATUS_REJECT_NODELETE) and we'd want to
    report that.

  - in step 2, we read just the unpack status. But receive-pack will
    also tell us about each ref (usually that it rejected them because
    of the unpacker error).

So a much better strategy is to leave the ref status fields as they are
(usually EXPECTING_REPORT) and then actually receive (and print) the
full per-ref status.

This case is actually covered in the test suite, as t5504.8, which
writes a pack that will be rejected by the remote unpack-objects. But
it's racy. Because our pack is small, most of the time pack-objects
manages to write the whole thing before the remote rejects it, and so it
returns success and we print out the errors from the remote. But very
occasionally (or when run under --stress) it goes slow enough to see a
failure in writing, and git-push reports nothing for the refs.

With this patch, the test should behave consistently.

There shouldn't be any downside to this approach. If we really did see
the connection drop, we'd already die in receive_unpack_status(), and
we'll continue to do so. If the connection drops _after_ we get the
unpack status but before we see any ref status, we'll still print the
remote unpacker error, but will now say "remote end hung up" instead of
returning the error up the call-stack. But as discussed, we weren't
showing anything more useful than that with the current code. And
anyway, that case is quite unlikely (the connection dropping at that
point would have to be unrelated to the pack-objects error, because of
the ordering of events).

In the future we might want to handle packet-read errors ourself instead
of dying, which would print a full ref status table even for hangups.
But in the meantime, this patch should be a strict improvement.

Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 send-pack.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/send-pack.c b/send-pack.c
index 34c77cbb1a..a7322d3278 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -565,8 +565,6 @@ int send_pack(struct send_pack_args *args,
 
 	if (need_pack_data && cmds_sent) {
 		if (pack_objects(out, remote_refs, extra_have, args) < 0) {
-			for (ref = remote_refs; ref; ref = ref->next)
-				ref->status = REF_STATUS_NONE;
 			if (args->stateless_rpc)
 				close(out);
 			if (git_connection_is_socket(conn))
@@ -574,10 +572,12 @@ int send_pack(struct send_pack_args *args,
 
 			/*
 			 * Do not even bother with the return value; we know we
-			 * are failing, and just want the error() side effects.
+			 * are failing, and just want the error() side effects,
+			 * as well as marking refs with their remote status (if
+			 * we get one).
 			 */
 			if (status_report)
-				receive_unpack_status(&reader);
+				receive_status(&reader, remote_refs);
 
 			if (use_sideband) {
 				close(demux.out);
-- 
2.24.0.739.gb5632e4929



* Re: do people find t5504.8 flaky?
  2019-11-13  1:03     ` Jeff King
  2019-11-13  2:07       ` Jeff King
@ 2019-11-13  3:47       ` Junio C Hamano
  1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2019-11-13  3:47 UTC (permalink / raw)
  To: Jeff King; +Cc: SZEDER Gábor, git, Johannes Schindelin

Jeff King <peff@peff.net> writes:

> So really, what we want for this case is to just get the remote status,
> like so:
>
> diff --git a/send-pack.c b/send-pack.c
> index 34c77cbb1a..d4db965b96 100644
> --- a/send-pack.c
> +++ b/send-pack.c
> @@ -565,19 +565,19 @@ int send_pack(struct send_pack_args *args,
>  
>  	if (need_pack_data && cmds_sent) {
>  		if (pack_objects(out, remote_refs, extra_have, args) < 0) {
> -			for (ref = remote_refs; ref; ref = ref->next)
> -				ref->status = REF_STATUS_NONE;
>  			if (args->stateless_rpc)
>  				close(out);
>  			if (git_connection_is_socket(conn))
>  				shutdown(fd[0], SHUT_WR);
>  
>  			/*
>  			 * Do not even bother with the return value; we know we
> -			 * are failing, and just want the error() side effects.
> +			 * are failing, and just want the error() side effects,
> +			 * as well as marking refs with the remote status (if
> +			 * we get one).
>  			 */
>  			if (status_report)
> -				receive_unpack_status(&reader);
> +				receive_status(&reader, remote_refs);
>  
>  			if (use_sideband) {
>  				close(demux.out);
>
> I was worried at first that we might make things worse for the case that
> the network has hung up completely (which would cause pack-objects to
> fail, but also cause us to not get anything from the remote). But this
> is really no worse. Even in the existing code, we'd complain to stderr
> about trying to read the unpack status. And then when we read the remote
> ref status, as soon as we see a bad packet we just quietly stop reading
> (thus leaving any unmentioned refs as EXPECTING_REPORT).
>
> So with that second patch above, the test failure goes away for me.

Nicely analyzed and patched.



* Re: do people find t5504.8 flaky?
  2019-11-13  2:07       ` Jeff King
@ 2019-11-18 22:30         ` SZEDER Gábor
  2019-11-18 23:25           ` Randall S. Becker
  0 siblings, 1 reply; 9+ messages in thread
From: SZEDER Gábor @ 2019-11-18 22:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Johannes Schindelin

On Tue, Nov 12, 2019 at 09:07:19PM -0500, Jeff King wrote:
> On Tue, Nov 12, 2019 at 08:03:37PM -0500, Jeff King wrote:
> 
> > So with that second patch above, the test failure goes away for me.
> 
> After poking at the behavior around connection dropping, I've convinced
> myself this is the right thing to do. So here it is with a commit
> message. I can't get the flaky test to fail anymore, but please let me
> know if you do.

FWIW neither can I.  I clocked in ~3 hours of --stress with the patch
applied, and no failures  (before it usually failed within 30s).



* RE: do people find t5504.8 flaky?
  2019-11-18 22:30         ` SZEDER Gábor
@ 2019-11-18 23:25           ` Randall S. Becker
  0 siblings, 0 replies; 9+ messages in thread
From: Randall S. Becker @ 2019-11-18 23:25 UTC (permalink / raw)
  To: 'SZEDER Gábor', 'Jeff King'
  Cc: 'Junio C Hamano', git, 'Johannes Schindelin'

On November 18, 2019 5:30 PM, SZEDER Gábor wrote
> On Tue, Nov 12, 2019 at 09:07:19PM -0500, Jeff King wrote:
> > On Tue, Nov 12, 2019 at 08:03:37PM -0500, Jeff King wrote:
> >
> > > So with that second patch above, the test failure goes away for me.
> >
> > After poking at the behavior around connection dropping, I've
> > convinced myself this is the right thing to do. So here it is with a
> > commit message. I can't get the flaky test to fail anymore, but please
> > let me know if you do.
> 
> FWIW neither can I.  I clocked in ~3 hours of --stress with the patch applied,
> and no failures  (before it usually failed within 30s).

Nothing flaky so far on either NonStop port.


