Re: [Bug/Solution] Git hangs in compat/poll/poll.c on HPE NonStop
To
Randall S. Becker
Cc
git@vger.kernel.org
Paolo Bonzini
From
Junio C Hamano
See Also
Prev
Date
2017-09-28 22:47:17 UTC
[jch: a patch that hopefully is applicable is attached at the end;
 I'd appreciate input from Paolo, the contributor of the original
 upstream]

"Randall S. Becker" <rsbecker@nexbridge.com> writes:

> After a whole lot of investigating, we (it is a large "we") have discovered
> the reason for the hang we occasionally get in git-upload-pack on HPE
> NonStop servers - reported here well over a year ago. This resulted from a
> subtle check that the operating system does on file descriptors. When it
> sees random values in pfd[i].revents,...

What do you mean by "random values"?  Where do these random values
come from?  The original code the patch touches is _not_ setting
anything, so it is leaving it uninitialized and that is seen by the
system?  If that is the case perhaps should we clear it before we
call into this codepath?

> .... There is a non-destructive fix
> that I would like to propose for this that I have already tested.

I am not sure in what sense this is "non-destructive"; we left the
value as it was when we didn't notice anything happening in
compute_revents().  Now we explicitly destroy the value there was
(just like we do in the case where the corresponding file descriptor
was negative).  Maybe overwriting with 0 is the right thing, but I
wouldn't call it "non-destructive", though.  Puzzling...

> --- a/compat/poll/poll.c
> +++ b/compat/poll/poll.c
> @@ -438,6 +438,10 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout)
>             pfd[i].revents = happened;
>             rc++;
>           }
> +        else
> +          {
> +            pfd[i].revents = 0;
> +          }
>        }
>
>    return rc;

FYI, the upstream gnulib rewrites this part of the code with
d42461c3 ("poll: fixes for large fds", 2015-02-20) quite a bit, and
it reads like this:

+    {
+      pfd[i].revents = (pfd[i].fd < 0
+                        ? 0
+                        : compute_revents (pfd[i].fd, pfd[i].events,
+                                           &rfds, &wfds, &efds));
+      rc += pfd[i].revents != 0;
+    }

The code before your fix (and before d42461c3) used to assign to
pfd[i].revents only when compute_revents() gave us non-zero value,
but with d42461c3 in the upstream, it pfd[i].revents is assigned
the return value from compute_revents() even if the value returned
is 0.  And rc is incremented only when that value is non-zero.

The result of applying your patch is equivalent, so after all, I
tend to think that your patch is the right fix in the context of the
code we have (i.e. the older code we borrowed from them).

I wonder if we want to refresh the borrowed copy of poll.c to a
newer snapshot someday, but that is totally a separate topic.  At
least, I now know that your fix in this patch will not be broken due
to d42461c3 updating the code in a slightly different way ;-)

Randall, I forged your Sign-off in the patch below, but please say
it is OK, after reading Documentation/SubmittingPatches.

Thanks.

-- >8 --
From: Randall S. Becker <rsbecker@nexbridge.com>
Subject: poll.c: always set revents, even if to zero.

Match what happens to pfd[i].revents when compute_revents() returns
0 to the upstream gnulib's commit d42461c3 ("poll: fixes for large
fds", 2015-02-20).  The revents field is set to 0, without
incrementing the value rc to be returned from the function.

This fixes occasional hangs in git-upload-pack on NPE NonStop.

Signed-off-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 compat/poll/poll.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/compat/poll/poll.c b/compat/poll/poll.c
index b10adc780f..ae03b74a6f 100644
--- a/compat/poll/poll.c
+++ b/compat/poll/poll.c
@@ -438,6 +438,10 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout)
 	    pfd[i].revents = happened;
 	    rc++;
 	  }
+	else
+	  {
+	    pfd[i].revents = 0;
+	  }
       }
 
   return rc;