git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [Bug/Solution] Git hangs in compat/poll/poll.c on HPE NonStop
@ 2017-09-28 21:01 Randall S. Becker
  2017-09-28 22:47 ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Randall S. Becker @ 2017-09-28 21:01 UTC (permalink / raw)
  To: git

Hi Team,

After a whole lot of investigating, we (it is a large "we") have discovered
the reason for the hang we occasionally get in git-upload-pack on HPE
NonStop servers - reported here well over a year ago. This resulted from a
subtle check that the operating system does on file descriptors. When it
sees random values in pfd[i].revents, it sometimes thinks its dealing with a
TTY and well, things end badly after that. There is a non-destructive fix
that I would like to propose for this that I have already tested. Sadly, I
have no email mechanism where our repo resides for a real patch message. The
patch is based on 2.3.7 (16018ae), but should be applicable forward. We have
held off moving to a more recent version until resolving this, so that's
next on our plan.

--- a/compat/poll/poll.c
+++ b/compat/poll/poll.c
@@ -438,6 +438,10 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout)
            pfd[i].revents = happened;
            rc++;
          }
+        else
+          {
+            pfd[i].revents = 0;
+          }
       }

   return rc;

Sincerely,
Randall

-- Brief whoami: NonStop&UNIX developer since approximately
UNIX(421664400)/NonStop(211288444200000000) 
-- In my real life, I talk too much.


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Bug/Solution] Git hangs in compat/poll/poll.c on HPE NonStop
  2017-09-28 21:01 [Bug/Solution] Git hangs in compat/poll/poll.c on HPE NonStop Randall S. Becker
@ 2017-09-28 22:47 ` Junio C Hamano
  2017-09-29  8:41   ` Paolo Bonzini
  0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2017-09-28 22:47 UTC (permalink / raw)
  To: Randall S. Becker; +Cc: git, Paolo Bonzini

[jch: a patch that hopefully is applicable is attached at the end;
 I'd appreciate input from Paolo, the contributor of the original
 upstream]

"Randall S. Becker" <rsbecker@nexbridge.com> writes:

> After a whole lot of investigating, we (it is a large "we") have discovered
> the reason for the hang we occasionally get in git-upload-pack on HPE
> NonStop servers - reported here well over a year ago. This resulted from a
> subtle check that the operating system does on file descriptors. When it
> sees random values in pfd[i].revents,...

What do you mean by "random values"?  Where do these random values
come from?  The original code the patch touches is _not_ setting
anything, so it is leaving it uninitialized and that is seen by the
system?  If that is the case perhaps should we clear it before we
call into this codepath?

> .... There is a non-destructive fix
> that I would like to propose for this that I have already tested.

I am not sure in what sense this is "non-destructive"; we left the
value as it was when we didn't notice anything happening in
compute_revents().  Now we explicitly destroy the value there was
(just like we do in the case where the corresponding file descriptor
was negative).  Maybe overwriting with 0 is the right thing, but I
wouldn't call it "non-destructive", though.  Puzzling...

> --- a/compat/poll/poll.c
> +++ b/compat/poll/poll.c
> @@ -438,6 +438,10 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout)
>             pfd[i].revents = happened;
>             rc++;
>           }
> +        else
> +          {
> +            pfd[i].revents = 0;
> +          }
>        }
>
>    return rc;

FYI, the upstream gnulib rewrites this part of the code with
d42461c3 ("poll: fixes for large fds", 2015-02-20) quite a bit, and
it reads like this:

+    {
+      pfd[i].revents = (pfd[i].fd < 0
+                        ? 0
+                        : compute_revents (pfd[i].fd, pfd[i].events,
+                                           &rfds, &wfds, &efds));
+      rc += pfd[i].revents != 0;
+    }

The code before your fix (and before d42461c3) used to assign to
pfd[i].revents only when compute_revents() gave us non-zero value,
but with d42461c3 in the upstream, it pfd[i].revents is assigned
the return value from compute_revents() even if the value returned
is 0.  And rc is incremented only when that value is non-zero.

The result of applying your patch is equivalent, so after all, I
tend to think that your patch is the right fix in the context of the
code we have (i.e. the older code we borrowed from them).

I wonder if we want to refresh the borrowed copy of poll.c to a
newer snapshot someday, but that is totally a separate topic.  At
least, I now know that your fix in this patch will not be broken due
to d42461c3 updating the code in a slightly different way ;-)

Randall, I forged your Sign-off in the patch below, but please say
it is OK, after reading Documentation/SubmittingPatches.

Thanks.

-- >8 --
From: Randall S. Becker <rsbecker@nexbridge.com>
Subject: poll.c: always set revents, even if to zero.

Match what happens to pfd[i].revents when compute_revents() returns
0 to the upstream gnulib's commit d42461c3 ("poll: fixes for large
fds", 2015-02-20).  The revents field is set to 0, without
incrementing the value rc to be returned from the function.

This fixes occasional hangs in git-upload-pack on NPE NonStop.

Signed-off-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 compat/poll/poll.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/compat/poll/poll.c b/compat/poll/poll.c
index b10adc780f..ae03b74a6f 100644
--- a/compat/poll/poll.c
+++ b/compat/poll/poll.c
@@ -438,6 +438,10 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout)
 	    pfd[i].revents = happened;
 	    rc++;
 	  }
+	else
+	  {
+	    pfd[i].revents = 0;
+	  }
       }
 
   return rc;

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [Bug/Solution] Git hangs in compat/poll/poll.c on HPE NonStop
  2017-09-28 22:47 ` Junio C Hamano
@ 2017-09-29  8:41   ` Paolo Bonzini
  0 siblings, 0 replies; 3+ messages in thread
From: Paolo Bonzini @ 2017-09-29  8:41 UTC (permalink / raw)
  To: Junio C Hamano, Randall S. Becker; +Cc: git

On 29/09/2017 00:47, Junio C Hamano wrote:
> [jch: a patch that hopefully is applicable is attached at the end;
>  I'd appreciate input from Paolo, the contributor of the original
>  upstream]

I wasn't involved in upstream commit d42461c3, but Linux is also always
overwriting the revents output with zero.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

> "Randall S. Becker" <rsbecker@nexbridge.com> writes:
> 
>> After a whole lot of investigating, we (it is a large "we") have discovered
>> the reason for the hang we occasionally get in git-upload-pack on HPE
>> NonStop servers - reported here well over a year ago. This resulted from a
>> subtle check that the operating system does on file descriptors. When it
>> sees random values in pfd[i].revents,...
> 
> What do you mean by "random values"?  Where do these random values
> come from?  The original code the patch touches is _not_ setting
> anything, so it is leaving it uninitialized and that is seen by the
> system?  If that is the case perhaps should we clear it before we
> call into this codepath?
> 
>> .... There is a non-destructive fix
>> that I would like to propose for this that I have already tested.
> 
> I am not sure in what sense this is "non-destructive"; we left the
> value as it was when we didn't notice anything happening in
> compute_revents().  Now we explicitly destroy the value there was
> (just like we do in the case where the corresponding file descriptor
> was negative).  Maybe overwriting with 0 is the right thing, but I
> wouldn't call it "non-destructive", though.  Puzzling...
> 
>> --- a/compat/poll/poll.c
>> +++ b/compat/poll/poll.c
>> @@ -438,6 +438,10 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout)
>>             pfd[i].revents = happened;
>>             rc++;
>>           }
>> +        else
>> +          {
>> +            pfd[i].revents = 0;
>> +          }
>>        }
>>
>>    return rc;
> 
> FYI, the upstream gnulib rewrites this part of the code with
> d42461c3 ("poll: fixes for large fds", 2015-02-20) quite a bit, and
> it reads like this:
> 
> +    {
> +      pfd[i].revents = (pfd[i].fd < 0
> +                        ? 0
> +                        : compute_revents (pfd[i].fd, pfd[i].events,
> +                                           &rfds, &wfds, &efds));
> +      rc += pfd[i].revents != 0;
> +    }
> 
> The code before your fix (and before d42461c3) used to assign to
> pfd[i].revents only when compute_revents() gave us non-zero value,
> but with d42461c3 in the upstream, it pfd[i].revents is assigned
> the return value from compute_revents() even if the value returned
> is 0.  And rc is incremented only when that value is non-zero.
> 
> The result of applying your patch is equivalent, so after all, I
> tend to think that your patch is the right fix in the context of the
> code we have (i.e. the older code we borrowed from them).
> 
> I wonder if we want to refresh the borrowed copy of poll.c to a
> newer snapshot someday, but that is totally a separate topic.  At
> least, I now know that your fix in this patch will not be broken due
> to d42461c3 updating the code in a slightly different way ;-)
> 
> Randall, I forged your Sign-off in the patch below, but please say
> it is OK, after reading Documentation/SubmittingPatches.
> 
> Thanks.
> 
> -- >8 --
> From: Randall S. Becker <rsbecker@nexbridge.com>
> Subject: poll.c: always set revents, even if to zero.
> 
> Match what happens to pfd[i].revents when compute_revents() returns
> 0 to the upstream gnulib's commit d42461c3 ("poll: fixes for large
> fds", 2015-02-20).  The revents field is set to 0, without
> incrementing the value rc to be returned from the function.
> 
> This fixes occasional hangs in git-upload-pack on NPE NonStop.
> 
> Signed-off-by: Randall S. Becker <rsbecker@nexbridge.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  compat/poll/poll.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/compat/poll/poll.c b/compat/poll/poll.c
> index b10adc780f..ae03b74a6f 100644
> --- a/compat/poll/poll.c
> +++ b/compat/poll/poll.c
> @@ -438,6 +438,10 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout)
>  	    pfd[i].revents = happened;
>  	    rc++;
>  	  }
> +	else
> +	  {
> +	    pfd[i].revents = 0;
> +	  }
>        }
>  
>    return rc;
> 


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-09-29  8:41 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-28 21:01 [Bug/Solution] Git hangs in compat/poll/poll.c on HPE NonStop Randall S. Becker
2017-09-28 22:47 ` Junio C Hamano
2017-09-29  8:41   ` Paolo Bonzini

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).