Re: [Bug/Solution] Git hangs in compat/poll/poll.c on HPE NonStop
To
Junio C Hamano
Randall S. Becker
Cc
git@vger.kernel.org
From
Paolo Bonzini
See Also
Prev Ref 1
Date
2017-09-29 08:41:23 UTC
On 29/09/2017 00:47, Junio C Hamano wrote:
> [jch: a patch that hopefully is applicable is attached at the end;
>  I'd appreciate input from Paolo, the contributor of the original
>  upstream]

I wasn't involved in upstream commit d42461c3, but Linux is also always
overwriting the revents output with zero.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

> "Randall S. Becker" <rsbecker@nexbridge.com> writes:
> 
>> After a whole lot of investigating, we (it is a large "we") have discovered
>> the reason for the hang we occasionally get in git-upload-pack on HPE
>> NonStop servers - reported here well over a year ago. This resulted from a
>> subtle check that the operating system does on file descriptors. When it
>> sees random values in pfd[i].revents,...
> 
> What do you mean by "random values"?  Where do these random values
> come from?  The original code the patch touches is _not_ setting
> anything, so it is leaving it uninitialized and that is seen by the
> system?  If that is the case perhaps should we clear it before we
> call into this codepath?
> 
>> .... There is a non-destructive fix
>> that I would like to propose for this that I have already tested.
> 
> I am not sure in what sense this is "non-destructive"; we left the
> value as it was when we didn't notice anything happening in
> compute_revents().  Now we explicitly destroy the value there was
> (just like we do in the case where the corresponding file descriptor
> was negative).  Maybe overwriting with 0 is the right thing, but I
> wouldn't call it "non-destructive", though.  Puzzling...
> 
>> --- a/compat/poll/poll.c
>> +++ b/compat/poll/poll.c
>> @@ -438,6 +438,10 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout)
>>             pfd[i].revents = happened;
>>             rc++;
>>           }
>> +        else
>> +          {
>> +            pfd[i].revents = 0;
>> +          }
>>        }
>>
>>    return rc;
> 
> FYI, the upstream gnulib rewrites this part of the code with
> d42461c3 ("poll: fixes for large fds", 2015-02-20) quite a bit, and
> it reads like this:
> 
> +    {
> +      pfd[i].revents = (pfd[i].fd < 0
> +                        ? 0
> +                        : compute_revents (pfd[i].fd, pfd[i].events,
> +                                           &rfds, &wfds, &efds));
> +      rc += pfd[i].revents != 0;
> +    }
> 
> The code before your fix (and before d42461c3) used to assign to
> pfd[i].revents only when compute_revents() gave us non-zero value,
> but with d42461c3 in the upstream, it pfd[i].revents is assigned
> the return value from compute_revents() even if the value returned
> is 0.  And rc is incremented only when that value is non-zero.
> 
> The result of applying your patch is equivalent, so after all, I
> tend to think that your patch is the right fix in the context of the
> code we have (i.e. the older code we borrowed from them).
> 
> I wonder if we want to refresh the borrowed copy of poll.c to a
> newer snapshot someday, but that is totally a separate topic.  At
> least, I now know that your fix in this patch will not be broken due
> to d42461c3 updating the code in a slightly different way ;-)
> 
> Randall, I forged your Sign-off in the patch below, but please say
> it is OK, after reading Documentation/SubmittingPatches.
> 
> Thanks.
> 
> -- >8 --
> From: Randall S. Becker <rsbecker@nexbridge.com>
> Subject: poll.c: always set revents, even if to zero.
> 
> Match what happens to pfd[i].revents when compute_revents() returns
> 0 to the upstream gnulib's commit d42461c3 ("poll: fixes for large
> fds", 2015-02-20).  The revents field is set to 0, without
> incrementing the value rc to be returned from the function.
> 
> This fixes occasional hangs in git-upload-pack on NPE NonStop.
> 
> Signed-off-by: Randall S. Becker <rsbecker@nexbridge.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  compat/poll/poll.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/compat/poll/poll.c b/compat/poll/poll.c
> index b10adc780f..ae03b74a6f 100644
> --- a/compat/poll/poll.c
> +++ b/compat/poll/poll.c
> @@ -438,6 +438,10 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout)
>  	    pfd[i].revents = happened;
>  	    rc++;
>  	  }
> +	else
> +	  {
> +	    pfd[i].revents = 0;
> +	  }
>        }
>  
>    return rc;
>