2011-12-21 09:24:17

by Mkrtchyan, Tigran

Subject: Session timeout on RHEL6.2

Dear friends,

We are observing strange behavior with RHEL 6.2:

Our server lease time is 90 seconds. I can see that the client
sends a SEQUENCE every 60 seconds, and it keeps doing so for several hours (~8).
At some point the client sends a SEQUENCE after 127 seconds and
gets, as expected, EXPIRED.

At this point I have to blame myself.
The client comes back with EXCHANGE_ID using the same clientid.
We had not yet garbage-collected the clientid internally, as this happens
only after 2*LEASE_TIME, so we returned EXPIRED again. This ping-pong never ends.

This is probably mostly a bug on my side. Nevertheless, we have never observed a late
SEQUENCE with kernels newer than 2.6.39. A short packet dump is attached.

I can open a bug with RHEL if required.

Regards,
Tigran.


Attachments:
sequence.dump (3.04 kB)
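
The timing in this report can be checked with a small model. The sketch below is hypothetical (`ClientRecord` and `check_sequence` are illustrative names, not taken from any real NFS server). It assumes that each SEQUENCE renews the lease, that the lease lapses once more than LEASE_TIME passes without a renewal, and, as on the reporter's server, that the record is only garbage-collected after 2*LEASE_TIME.

```python
from __future__ import annotations

from dataclasses import dataclass

LEASE_TIME = 90  # seconds: the server lease time from the report


@dataclass
class ClientRecord:
    """Hypothetical server-side record for one client ID (illustrative only)."""

    last_renewal: float  # time (s) of the last accepted SEQUENCE

    def is_expired(self, now: float) -> bool:
        # The lease lapses once more than LEASE_TIME passes without a renewal.
        return now - self.last_renewal > LEASE_TIME

    def is_collectable(self, now: float) -> bool:
        # Assumed GC policy from the report: drop the record after 2 * LEASE_TIME.
        return now - self.last_renewal > 2 * LEASE_TIME


def check_sequence(rec: ClientRecord, now: float) -> str:
    """Model a SEQUENCE arriving at `now`: renew the lease, or report expiry."""
    if rec.is_expired(now):
        return "EXPIRED"
    rec.last_renewal = now
    return "OK"


rec = ClientRecord(last_renewal=0.0)
# Renewals every 60 s stay well inside the 90 s lease.
print([check_sequence(rec, t) for t in (60, 120, 180)])  # ['OK', 'OK', 'OK']
# A SEQUENCE 127 s after the last renewal is too late...
print(check_sequence(rec, 180 + 127))  # EXPIRED
# ...yet the stale record survives, because 127 s < 2 * 90 s.
print(rec.is_collectable(180 + 127))  # False
```

Any request arriving between 90 s and 180 s after the last renewal lands in the window where the lease is gone but the record is still there, which is where the EXCHANGE_ID ping-pong starts.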

2011-12-25 12:03:54

by Benny Halevy

Subject: Re: Session timeout on RHEL6.2

On 2011-12-25 11:47, Trond Myklebust wrote:
> There are plenty of cases where the client can be idle for hours or even
> _days_. What's the point of pinging the server all the time after
> working hours?
>
> If someone wants to code up a DESTROY_SESSION and DESTROY_CLIENTID in
> order to make it formal, then fine; however, note that we don't even do
> that on a full unmount today.
>

The heavy lifting is releasing locks and returning layouts and delegations;
sending DESTROY_{SESSION,CLIENTID} would be nice to have, but I don't think
it's the most important issue.

Benny
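
The orderly teardown discussed here can be sketched as an ordering problem: give back delegations, layouts and locks first, then destroy the sessions, and only then the client ID. This is a hypothetical model (`ClientState`, `destroy_clientid` and the string results are illustrative; open state is left out for brevity). It assumes, per our reading of RFC 5661, that a server refuses DESTROY_CLIENTID with NFS4ERR_CLIENTID_BUSY while sessions or state remain, which is why the returns have to come first.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClientState:
    """Hypothetical state a client still holds on the server (illustrative only)."""

    delegations: set[str] = field(default_factory=set)
    layouts: set[str] = field(default_factory=set)
    locks: set[str] = field(default_factory=set)
    sessions: set[str] = field(default_factory=set)


def destroy_clientid(state: ClientState) -> str:
    # Assumed server check: refuse while any session or state is left.
    if state.delegations or state.layouts or state.locks or state.sessions:
        return "NFS4ERR_CLIENTID_BUSY"
    return "NFS4_OK"


def orderly_teardown(state: ClientState) -> list[str]:
    """Return, in order, the operations an idle client would send (a sketch)."""
    ops = []
    # The "heavy lifting": hand back everything the client still holds.
    ops += [f"DELEGRETURN {d}" for d in sorted(state.delegations)]
    ops += [f"LAYOUTRETURN {lay}" for lay in sorted(state.layouts)]
    ops += [f"LOCKU {lk}" for lk in sorted(state.locks)]
    state.delegations.clear()
    state.layouts.clear()
    state.locks.clear()
    # Only then tear down the sessions and, last, the client ID itself.
    ops += [f"DESTROY_SESSION {s}" for s in sorted(state.sessions)]
    state.sessions.clear()
    ops.append(f"DESTROY_CLIENTID -> {destroy_clientid(state)}")
    return ops


st = ClientState(delegations={"d1"}, layouts={"l1"}, locks={"k1"}, sessions={"s1"})
print(destroy_clientid(st))  # NFS4ERR_CLIENTID_BUSY
for op in orderly_teardown(st):
    print(op)
```

In this model DESTROY_CLIENTID itself is a one-line check; the work lies in the returns that must precede it.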

2011-12-21 13:59:49

by Myklebust, Trond

Subject: Re: Session timeout on RHEL6.2

On Wed, 2011-12-21 at 10:24 +0100, Tigran Mkrtchyan wrote:
> Dear friends,
>
> We are observing strange behavior with RHEL 6.2:
>
> Our server lease time is 90 seconds. I can see that the client
> sends a SEQUENCE every 60 seconds, and it keeps doing so for several hours (~8).
> At some point the client sends a SEQUENCE after 127 seconds and
> gets, as expected, EXPIRED.

Why shouldn't the client be allowed to let the lease expire if nothing
is using that filesystem?

> At this point I have to blame myself.
> The client comes back with EXCHANGE_ID using the same clientid.
> We had not yet garbage-collected the clientid internally, as this happens
> only after 2*LEASE_TIME, so we returned EXPIRED again. This ping-pong never ends.
>
> This is probably mostly a bug on my side. Nevertheless, we have never observed a late
> SEQUENCE with kernels newer than 2.6.39. A short packet dump is attached.
>
> I can open a bug with RHEL if required.

I wouldn't consider that a bug.

Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-12-30 01:03:58

by Myklebust, Trond

Subject: RE: Session timeout on RHEL6.2

> -----Original Message-----
> From: Tigran Mkrtchyan [mailto:[email protected]]
> Sent: Thursday, December 29, 2011 8:21 PM
> To: Myklebust, Trond
> Cc: Benny Halevy; linux-nfs
> Subject: Re: Session timeout on RHEL6.2
>
> Hi Trond,
>
> There is a small inconsistency in your theory: to close an idle session it's enough
> to stop sending SEQUENCE, and there is no reason to re-establish
> the session as soon as the server returns EXPIRED.

I don't understand. I've never put forward any "theory" involving forcing the client to re-establish the session just because the server returns EXPIRED. It should be re-establishing sessions iff we want to access the filesystem and the server tells it that the session expired.

Trond
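
The policy Trond states in this thread, re-establishing a session iff the filesystem is needed and the server reports that the session expired, amounts to lazy recovery. The sketch below is hypothetical: `LazySessionClient`, the `send` callback and the string statuses are illustrative stand-ins rather than the Linux client's code, and operation names are used loosely (on the wire, SEQUENCE is the first operation of a COMPOUND).

```python
from __future__ import annotations

from typing import Callable


class LazySessionClient:
    """Hypothetical client that re-establishes its session only on demand."""

    def __init__(self, send: Callable[[str], str]) -> None:
        self.send = send  # assumed transport: send(op) -> status string
        self.session_valid = True
        self.log: list[str] = []  # every operation sent, in order

    def _call(self, op: str) -> str:
        self.log.append(op)
        return self.send(op)

    def access(self, op: str) -> str:
        """Run one filesystem operation, recovering the session only if needed."""
        if self.session_valid:
            status = self._call("SEQUENCE+" + op)
            if status != "EXPIRED":
                return status
            self.session_valid = False
        # Recover only because the filesystem is actually needed right now.
        self._call("EXCHANGE_ID")
        self._call("CREATE_SESSION")
        self.session_valid = True
        return self._call("SEQUENCE+" + op)


def fake_server(expired: bool) -> Callable[[str], str]:
    """Stand-in server that reports EXPIRED until a new session is created."""
    state = {"expired": expired}

    def send(op: str) -> str:
        if op.startswith("SEQUENCE") and state["expired"]:
            return "EXPIRED"
        if op == "CREATE_SESSION":
            state["expired"] = False
        return "OK"

    return send


client = LazySessionClient(fake_server(expired=True))
# Nothing is sent until access() is called; that first access triggers recovery.
print(client.access("READ"))  # OK
print(client.log)  # ['SEQUENCE+READ', 'EXCHANGE_ID', 'CREATE_SESSION', 'SEQUENCE+READ']
```

An idle client in this model costs the server nothing, and a client that never touches the mount again never re-establishes anything.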

2011-12-25 09:48:02

by Myklebust, Trond

Subject: Re: Session timeout on RHEL6.2

On Sun, 2011-12-25 at 06:37 +0200, Benny Halevy wrote:
> I'm inclined to agree. The client can certainly let the lease expire,
> and that's not a bug, but the fact that the client sent the SEQUENCE operation
> after the lease had expired indicates it might not be aware of that fact,
> and that seems to be a client bug.
>
> That said, I don't think that letting the lease expire when the client is idle
> is the most polite thing to do. Why let the server clean up after the client
> and revert to possibly un-optimized recovery paths rather than orderly
> destruction of the state by the client?

There are plenty of cases where the client can be idle for hours or even
_days_. What's the point of pinging the server all the time after
working hours?

If someone wants to code up a DESTROY_SESSION and DESTROY_CLIENTID in
order to make it formal, then fine; however, note that we don't even do
that on a full unmount today.
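
To put numbers on the cost of keeping an idle lease alive: assuming one lease-renewing SEQUENCE every 60 s per client, as seen in the original report, the count grows linearly with idle time and client count. The function below is a back-of-the-envelope sketch; the 64-hour weekend and the 1000-client figure are illustrative assumptions, not measurements.

```python
def idle_renewals(idle_hours: float, interval_s: float = 60.0, clients: int = 1) -> int:
    """Count lease-renewing SEQUENCE calls sent while `clients` idle mounts wait."""
    per_client = int(idle_hours * 3600 // interval_s)
    return per_client * clients


print(idle_renewals(8))  # 480: the ~8 hours from the report
print(idle_renewals(64))  # 3840: Friday 18:00 to Monday 10:00
print(idle_renewals(64, clients=1000))  # 3840000
```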



2011-12-25 13:25:10

by Myklebust, Trond

Subject: Re: Session timeout on RHEL6.2

On Sun, 2011-12-25 at 14:03 +0200, Benny Halevy wrote:
> The heavy lifting is releasing locks and returning layouts and delegations;
> sending DESTROY_{SESSION,CLIENTID} would be nice to have, but I don't think
> it's the most important issue.

Actually, that requirement to return state is what makes
DESTROY_CLIENTID a completely useless operation.
Forget what I said then: it's too stupid to implement...



2011-12-29 19:20:52

by Mkrtchyan, Tigran

Subject: Re: Session timeout on RHEL6.2

Hi Trond,

There is a small inconsistency in your theory: to close an idle session
it's enough to stop sending SEQUENCE, and there is no reason to
re-establish the session as soon as the server returns EXPIRED.

Tigran.

On Sun, Dec 25, 2011 at 2:25 PM, Trond Myklebust
<[email protected]> wrote:
> Actually, that requirement to return state is what makes
> DESTROY_CLIENTID a completely useless operation.
> Forget what I said then: it's too stupid to implement...

2011-12-30 11:27:38

by Mkrtchyan, Tigran

Subject: Re: Session timeout on RHEL6.2

On Fri, Dec 30, 2011 at 2:03 AM, Myklebust, Trond
<[email protected]> wrote:
>> -----Original Message-----
>> From: Tigran Mkrtchyan [mailto:[email protected]]
>> Sent: Thursday, December 29, 2011 8:21 PM
>> To: Myklebust, Trond
>> Cc: Benny Halevy; linux-nfs
>> Subject: Re: Session timeout on RHEL6.2
>>
>> Hi Trond,
>>
>> There is a small inconsistency in your theory: to close an idle session it's enough
>> to stop sending SEQUENCE, and there is no reason to re-establish
>> the session as soon as the server returns EXPIRED.
>
> I don't understand. I've never put forward any "theory" involving forcing the client to re-establish the session just because the server returns EXPIRED. It should be re-establishing sessions iff we want to access the filesystem and the server tells it that the session expired.

My apologies if I was not clear. I just wanted to say that it doesn't
look like expected client behavior.

Tigran.


2011-12-25 04:37:50

by Benny Halevy

Subject: Re: Session timeout on RHEL6.2

On 2011-12-21 22:11, Tigran Mkrtchyan wrote:
> As I said, there is a bug in the exchange_id processing (case 3) on my
> side. But it sounds strange to me that the client, after more than 8
> hours of sending only SEQUENCE, decided to send one of them later than
> the lease time, especially since we did not see this with other kernels.

I'm inclined to agree. The client can certainly let the lease expire,
and that's not a bug, but the fact that the client sent the SEQUENCE operation
after the lease had expired indicates it might not be aware of that fact,
and that seems to be a client bug.

That said, I don't think that letting the lease expire when the client is idle
is the most polite thing to do. Why let the server clean up after the client
and revert to possibly un-optimized recovery paths rather than orderly
destruction of the state by the client?

Benny
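
The point above is that a client which still sends SEQUENCE after its lease has lapsed apparently did not track the lapse itself. A client that keeps its own view of the lease can tell whether a SEQUENCE still makes sense or whether it has to start over with EXCHANGE_ID. The sketch below is hypothetical (`ClientLease` and its methods are illustrative names); the renew-at-two-thirds policy is an assumption chosen because it reproduces the 60 s interval seen with a 90 s lease.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ClientLease:
    """Hypothetical client-side view of its lease (illustrative only)."""

    lease_time: float  # seconds, as announced by the server
    last_renewal: float  # time (s) of the last successful SEQUENCE

    def next_renewal(self) -> float:
        # Assumed policy: renew after 2/3 of the lease (60 s for a 90 s lease).
        return self.last_renewal + 2 * self.lease_time / 3

    def lease_lapsed(self, now: float) -> bool:
        return now - self.last_renewal > self.lease_time

    def next_step(self, now: float) -> str:
        """Pick the first operation to send when the client needs the server."""
        if self.lease_lapsed(now):
            # The client already knows the lease is gone: start recovery with
            # EXCHANGE_ID (then CREATE_SESSION) instead of a doomed SEQUENCE.
            return "EXCHANGE_ID"
        return "SEQUENCE"


lease = ClientLease(lease_time=90, last_renewal=0)
print(lease.next_renewal())  # 60.0
print(lease.next_step(59))  # SEQUENCE
print(lease.next_step(127))  # EXCHANGE_ID
```

A real client would presumably err on the safe side and measure `last_renewal` from when it sent the renewing request, since the server's clock starts no earlier than that.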


2011-12-21 20:11:49

by Mkrtchyan, Tigran

Subject: Re: Session timeout on RHEL6.2

On Wed, Dec 21, 2011 at 2:57 PM, Trond Myklebust
<[email protected]> wrote:
> On Wed, 2011-12-21 at 10:24 +0100, Tigran Mkrtchyan wrote:
>> Dear friends,
>>
>> We are observing strange behavior with RHEL 6.2:
>>
>> Our server lease time is 90 seconds. I can see that the client
>> sends a SEQUENCE every 60 seconds, and it keeps doing so for several hours (~8).
>> At some point the client sends a SEQUENCE after 127 seconds and
>> gets, as expected, EXPIRED.
>
> Why shouldn't the client be allowed to let the lease expire if nothing
> is using that filesystem?
>
>> At this point I have to blame myself.
>> The client comes back with EXCHANGE_ID using the same clientid.
>> We had not yet garbage-collected the clientid internally, as this happens
>> only after 2*LEASE_TIME, so we returned EXPIRED again. This ping-pong never ends.
>>
>> This is probably mostly a bug on my side. Nevertheless, we have never observed a late
>> SEQUENCE with kernels newer than 2.6.39. A short packet dump is attached.
>>
>> I can open a bug with RHEL if required.
>
> I wouldn't consider that a bug.

As I said, there is a bug in the exchange_id processing (case 3) on my
side. But it sounds strange to me that the client, after more than 8
hours of sending only SEQUENCE, decided to send one of them later than
the lease time, especially since we did not see this with other kernels.

Anyway, we will update our server tomorrow with the bug fix.

Tigran.
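
On the server side, the ping-pong comes from answering EXCHANGE_ID with EXPIRED while the expired record is still waiting for garbage collection. The sketch below contrasts that behavior with a fix. It is a simplified, hypothetical model (`Server` and its methods are illustrative), not the reporter's server code, and it does not reproduce the full EXCHANGE_ID case analysis of RFC 5661; it only assumes that an EXCHANGE_ID from an owner whose lease has expired should get a fresh client ID.

```python
from __future__ import annotations

import itertools

LEASE_TIME = 90  # seconds


class Server:
    """Hypothetical client table keyed by client owner string (illustrative only)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.clients: dict[str, tuple[int, float]] = {}  # owner -> (clientid, last renewal)

    def _lease_expired(self, owner: str, now: float) -> bool:
        entry = self.clients.get(owner)
        return entry is not None and now - entry[1] > LEASE_TIME

    def _register(self, owner: str, now: float) -> int:
        # Reuse a live record, otherwise hand out a fresh client ID.
        if owner not in self.clients:
            self.clients[owner] = (next(self._ids), now)
        return self.clients[owner][0]

    def exchange_id_buggy(self, owner: str, now: float) -> str | int:
        if self._lease_expired(owner, now):
            # Bug: the stale record has not been garbage-collected yet, so the
            # client is told EXPIRED, retries EXCHANGE_ID, and the loop repeats.
            return "EXPIRED"
        return self._register(owner, now)

    def exchange_id_fixed(self, owner: str, now: float) -> str | int:
        if self._lease_expired(owner, now):
            # Assumed fix: drop the expired record now instead of waiting for GC.
            del self.clients[owner]
        return self._register(owner, now)


srv = Server()
first = srv.exchange_id_fixed("client-A", now=0)
# No renewals; the client returns 127 s later.
print(srv.exchange_id_buggy("client-A", now=127))  # EXPIRED, on every retry
print(srv.exchange_id_fixed("client-A", now=127) != first)  # True: a new client ID
```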

>
> Trond