2010-07-28 07:24:09

by Andy Chittenden

[permalink] [raw]
Subject: RE: nfs client hang

resending as it seems to have been corrupted on LKML!

> The RPC client marks the socket closed. and the linger timeout is
> cancelled.  At this point, sk_shutdown should be set to zero, correct?
> I don't see an xs_error_report() call here, which would confirm that the
> socket took a trip through tcp_disconnect().

From my reading of tcp_disconnect(), it calls sk->sk_error_report(sk)
unconditionally so as there's no xs_error_report(), that surely means
the exact opposite: tcp_disconnect() wasn't called. If it's not
called, sk_shutdown is not cleared. And my revised tracing confirmed
that it was set to SEND_SHUTDOWN.

--
Andy, BlueArc Engineering


2010-07-29 10:10:19

by Andy Chittenden

[permalink] [raw]
Subject: Re: nfs client hang

On 2010-07-28 18:37, Chuck Lever wrote:
> On 07/28/10 03:24 AM, Andy Chittenden wrote:
>> resending as it seems to have been corrupted on LKML!
>>
>>> The RPC client marks the socket closed. and the linger timeout is
>>> cancelled. At this point, sk_shutdown should be set to zero, correct?
>>> I don't see an xs_error_report() call here, which would confirm that the
>>> socket took a trip through tcp_disconnect().
>> From my reading of tcp_disconnect(), it calls sk->sk_error_report(sk)
>> unconditionally so as there's no xs_error_report(), that surely means
>> the exact opposite: tcp_disconnect() wasn't called. If it's not
>> called, sk_shutdown is not cleared. And my revised tracing confirmed
>> that it was set to SEND_SHUTDOWN.
> Sorry, that's what I meant above.
>
> An xs_error_report() debugging message at that point in the log would
> confirm that the socket took a trip through tcp_disconnect(). But I
> don't see such a message.
I don't see how tcp_disconnect() gets called if the application does a
shutdown when the state is TCP_ESTABLISHED (or a myriad of other
states). It just seems to send a FIN. Should tcp_disconnect() be called?
If so, how? Alternatively, I wonder whether my patch that set
sk_shutdown to 0 in tcp_connect_init() is the correct fix after all.

--
Andy, BlueArc Engineering
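
The behaviour described above (a shutdown on an established connection just sends a FIN, and the send-shutdown state then stays on the socket) can be observed from user space. The following is only a rough sketch under stated assumptions: it uses an ordinary loopback TCP socket rather than the kernel RPC client discussed in this thread, and the helper name half_close_demo is made up for illustration. It shows that after shutdown(SHUT_WR) the peer reads EOF, further sends on the same socket fail, and the receive direction keeps working.

```python
import socket


def half_close_demo():
    """Half-close a loopback TCP connection and report what each side sees.

    Hypothetical helper for illustration only; it does not exercise the
    kernel RPC client (xs_* functions) discussed in the thread.
    Returns (peer_saw_eof, send_failed, reply).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    client.connect(listener.getsockname())
    peer, _ = listener.accept()
    peer.settimeout(5)

    try:
        # Shut down only the sending direction; the connection is
        # established, so the stack sends a FIN to the peer.
        client.shutdown(socket.SHUT_WR)

        # The peer sees the FIN as end-of-file: recv() returns b"".
        peer_saw_eof = peer.recv(1) == b""

        # The send-shutdown state persists on the client socket, so any
        # further send is rejected (EPIPE on Linux).
        try:
            client.send(b"x")
            send_failed = False
        except OSError:
            send_failed = True

        # The receive direction is unaffected by SHUT_WR.
        peer.sendall(b"pong")
        reply = client.recv(4)
    finally:
        client.close()
        peer.close()
        listener.close()

    return peer_saw_eof, send_failed, reply
```

Calling half_close_demo() on a typical Linux system should show the peer reading EOF while the client's later send is refused. That matches the point made in the message above: nothing in the shutdown path clears the shutdown state on the socket.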


2010-07-28 17:38:53

by Chuck Lever III

[permalink] [raw]
Subject: Re: nfs client hang

On 07/28/10 03:24 AM, Andy Chittenden wrote:
> resending as it seems to have been corrupted on LKML!
>
>> The RPC client marks the socket closed. and the linger timeout is
>> cancelled. At this point, sk_shutdown should be set to zero, correct?
>> I don't see an xs_error_report() call here, which would confirm that the
>> socket took a trip through tcp_disconnect().
>
> From my reading of tcp_disconnect(), it calls sk->sk_error_report(sk)
> unconditionally so as there's no xs_error_report(), that surely means
> the exact opposite: tcp_disconnect() wasn't called. If it's not
> called, sk_shutdown is not cleared. And my revised tracing confirmed
> that it was set to SEND_SHUTDOWN.

Sorry, that's what I meant above.

An xs_error_report() debugging message at that point in the log would
confirm that the socket took a trip through tcp_disconnect(). But I
don't see such a message.