Hi Trond,
Is there a reason that nfs4_proc_reclaim_complete() isn't wrapped
with a do while() to handle errors?
Thanks.
> On Jan 24, 2017, at 14:06, Olga Kornievskaia <[email protected]> wrote:
>=20
> Hi Trond,
>=20
> Is there a reason that nfs4_proc_reclaim_complete() isn't wrapped
> with a do while() to handle errors?
What do we not already handle correctly in nfs4_reclaim_complete_done()?
On Tue, Jan 24, 2017 at 2:12 PM, Trond Myklebust
<[email protected]> wrote:
>
>> On Jan 24, 2017, at 14:06, Olga Kornievskaia <[email protected]> wrote:
>>
>> Hi Trond,
>>
>> Is there a reason that nfs4_proc_reclaim_complete() isn't wrapped
>> with a do while() to handle errors?
>
> What do we not already handle correctly in nfs4_reclaim_complete_done()?
Could this be because when an error occurs rpc_done isn't called
(rpc_release is called)? What I see is that if RECLAIM_COMPLETE gets
an error (BAD_SESSION) the client just ignores it.
DQo+IE9uIEphbiAyNCwgMjAxNywgYXQgMTQ6NDAsIE9sZ2EgS29ybmlldnNrYWlhIDxhZ2xvQHVt
aWNoLmVkdT4gd3JvdGU6DQo+IA0KPiBPbiBUdWUsIEphbiAyNCwgMjAxNyBhdCAyOjEyIFBNLCBU
cm9uZCBNeWtsZWJ1c3QNCj4gPHRyb25kbXlAcHJpbWFyeWRhdGEuY29tPiB3cm90ZToNCj4+IA0K
Pj4+IE9uIEphbiAyNCwgMjAxNywgYXQgMTQ6MDYsIE9sZ2EgS29ybmlldnNrYWlhIDxhZ2xvQHVt
aWNoLmVkdT4gd3JvdGU6DQo+Pj4gDQo+Pj4gSGkgVHJvbmQsDQo+Pj4gDQo+Pj4gSXMgdGhlcmUg
YSByZWFzb24gdGhhdCBuZnM0X3Byb2NfcmVjbGFpbV9jb21wbGV0ZSgpIGlzbid0ICB3cmFwcGVk
DQo+Pj4gd2l0aCBhIGRvIHdoaWxlKCkgdG8gaGFuZGxlIGVycm9ycz8NCj4+IA0KPj4gV2hhdCBk
byB3ZSBub3QgYWxyZWFkeSBoYW5kbGUgY29ycmVjdGx5IGluIG5mczRfcmVjbGFpbV9jb21wbGV0
ZV9kb25lKCk/DQo+IA0KPiBDb3VsZCB0aGlzIGJlIGJlY2F1c2Ugd2hlbiBhbiBlcnJvciBvY2N1
cnMgcnBjX2RvbmUgaXNuJ3QgY2FsbGVkDQo+IChycGNfcmVsZWFzZSBpcyBjYWxsZWQpPyBXaGF0
IEkgc2VlIGlzIHRoYXQgaWYgUkVDTEFJTV9DT01QTEVURSBnZXRzDQo+IGFuIGVycm9yIChCQURf
U0VTU0lPTikgdGhlIGNsaWVudCBqdXN0IGlnbm9yZXMgaXQuDQo+IA0KDQpUaGF04oCZcyBjb3Jy
ZWN0LiBXaHkgZG8gd2UgbmVlZCB0byBoYW5kbGUgQkFEX1NFU1NJT04gdGhlcmU/IFdl4oCZcmUg
ZG9uZSB3aXRoIHN0YXRlIHJlY292ZXJ5LCBzbyBpZiB0aGUgc2VydmVyIHJlYm9vdGVkLCB3ZSBj
YW4gY2F0Y2ggdGhhdCBsYXRlci4=
On Tue, Jan 24, 2017 at 2:44 PM, Trond Myklebust
<[email protected]> wrote:
>
>> On Jan 24, 2017, at 14:40, Olga Kornievskaia <[email protected]> wrote:
>>
>> On Tue, Jan 24, 2017 at 2:12 PM, Trond Myklebust
>> <[email protected]> wrote:
>>>
>>>> On Jan 24, 2017, at 14:06, Olga Kornievskaia <[email protected]> wrote:
>>>>
>>>> Hi Trond,
>>>>
>>>> Is there a reason that nfs4_proc_reclaim_complete() isn't wrapped
>>>> with a do while() to handle errors?
>>>
>>> What do we not already handle correctly in nfs4_reclaim_complete_done()=
?
>>
>> Could this be because when an error occurs rpc_done isn't called
>> (rpc_release is called)? What I see is that if RECLAIM_COMPLETE gets
>> an error (BAD_SESSION) the client just ignores it.
>>
>
> That=E2=80=99s correct. Why do we need to handle BAD_SESSION there? We=E2=
=80=99re done with state recovery, so if the server rebooted, we can catch =
that later.
(1) don't we want to handle session errors as soon as possible?
(2) I ran into a problem (not sure yet if reproducible) where I had a
client stuck in an infinite loop of RECLAIM_COMPLETE being sent with
reply of BAD_SESSION.
yes I don't know why the client is looping but it made me look into
the fact that we are not handling session errors on reclaim complete
which I simulated by having the server return BAD_SESSION to
RECLAIM_COMPLETE and I see that client simply ignores it.
DQo+IE9uIEphbiAyNCwgMjAxNywgYXQgMTQ6NTAsIE9sZ2EgS29ybmlldnNrYWlhIDxhZ2xvQHVt
aWNoLmVkdT4gd3JvdGU6DQo+IA0KPiBPbiBUdWUsIEphbiAyNCwgMjAxNyBhdCAyOjQ0IFBNLCBU
cm9uZCBNeWtsZWJ1c3QNCj4gPHRyb25kbXlAcHJpbWFyeWRhdGEuY29tPiB3cm90ZToNCj4+IA0K
Pj4+IE9uIEphbiAyNCwgMjAxNywgYXQgMTQ6NDAsIE9sZ2EgS29ybmlldnNrYWlhIDxhZ2xvQHVt
aWNoLmVkdT4gd3JvdGU6DQo+Pj4gDQo+Pj4gT24gVHVlLCBKYW4gMjQsIDIwMTcgYXQgMjoxMiBQ
TSwgVHJvbmQgTXlrbGVidXN0DQo+Pj4gPHRyb25kbXlAcHJpbWFyeWRhdGEuY29tPiB3cm90ZToN
Cj4+Pj4gDQo+Pj4+PiBPbiBKYW4gMjQsIDIwMTcsIGF0IDE0OjA2LCBPbGdhIEtvcm5pZXZza2Fp
YSA8YWdsb0B1bWljaC5lZHU+IHdyb3RlOg0KPj4+Pj4gDQo+Pj4+PiBIaSBUcm9uZCwNCj4+Pj4+
IA0KPj4+Pj4gSXMgdGhlcmUgYSByZWFzb24gdGhhdCBuZnM0X3Byb2NfcmVjbGFpbV9jb21wbGV0
ZSgpIGlzbid0ICB3cmFwcGVkDQo+Pj4+PiB3aXRoIGEgZG8gd2hpbGUoKSB0byBoYW5kbGUgZXJy
b3JzPw0KPj4+PiANCj4+Pj4gV2hhdCBkbyB3ZSBub3QgYWxyZWFkeSBoYW5kbGUgY29ycmVjdGx5
IGluIG5mczRfcmVjbGFpbV9jb21wbGV0ZV9kb25lKCk/DQo+Pj4gDQo+Pj4gQ291bGQgdGhpcyBi
ZSBiZWNhdXNlIHdoZW4gYW4gZXJyb3Igb2NjdXJzIHJwY19kb25lIGlzbid0IGNhbGxlZA0KPj4+
IChycGNfcmVsZWFzZSBpcyBjYWxsZWQpPyBXaGF0IEkgc2VlIGlzIHRoYXQgaWYgUkVDTEFJTV9D
T01QTEVURSBnZXRzDQo+Pj4gYW4gZXJyb3IgKEJBRF9TRVNTSU9OKSB0aGUgY2xpZW50IGp1c3Qg
aWdub3JlcyBpdC4NCj4+PiANCj4+IA0KPj4gVGhhdOKAmXMgY29ycmVjdC4gV2h5IGRvIHdlIG5l
ZWQgdG8gaGFuZGxlIEJBRF9TRVNTSU9OIHRoZXJlPyBXZeKAmXJlIGRvbmUgd2l0aCBzdGF0ZSBy
ZWNvdmVyeSwgc28gaWYgdGhlIHNlcnZlciByZWJvb3RlZCwgd2UgY2FuIGNhdGNoIHRoYXQgbGF0
ZXIuDQo+IA0KPiAoMSkgZG9uJ3Qgd2Ugd2FudCB0byBoYW5kbGUgc2Vzc2lvbiBlcnJvcnMgYXMg
c29vbiBhcyBwb3NzaWJsZT8NCj4gKDIpIEkgcmFuIGludG8gYSBwcm9ibGVtIChub3Qgc3VyZSB5
ZXQgaWYgcmVwcm9kdWNpYmxlKSB3aGVyZSBJIGhhZCBhDQo+IGNsaWVudCBzdHVjayBpbiBhbiBp
bmZpbml0ZSBsb29wIG9mIFJFQ0xBSU1fQ09NUExFVEUgYmVpbmcgc2VudCB3aXRoDQo+IHJlcGx5
IG9mIEJBRF9TRVNTSU9OLg0KPiANCj4geWVzIEkgZG9uJ3Qga25vdyB3aHkgdGhlIGNsaWVudCBp
cyBsb29waW5nIGJ1dCBpdCBtYWRlIG1lIGxvb2sgaW50bw0KPiB0aGUgZmFjdCB0aGF0IHdlIGFy
ZSBub3QgaGFuZGxpbmcgc2Vzc2lvbiBlcnJvcnMgb24gcmVjbGFpbSBjb21wbGV0ZQ0KPiB3aGlj
aCBJIHNpbXVsYXRlZCBieSBoYXZpbmcgdGhlIHNlcnZlciByZXR1cm4gQkFEX1NFU1NJT04gdG8N
Cj4gUkVDTEFJTV9DT01QTEVURSBhbmQgSSBzZWUgdGhhdCBjbGllbnQgc2ltcGx5IGlnbm9yZXMg
aXQuDQoNCkl0IGRvZXNu4oCZdCBpZ25vcmUgaXQ6DQoNCnN0YXRpYyBpbnQgbmZzNDFfcmVjbGFp
bV9jb21wbGV0ZV9oYW5kbGVfZXJyb3JzKHN0cnVjdCBycGNfdGFzayAqdGFzaywgc3RydWN0IG5m
c19jbGllbnQgKmNscCkNCnsNCiAgICAgICAgc3dpdGNoKHRhc2stPnRrX3N0YXR1cykgew0KICAg
ICAgICBjYXNlIDA6DQogICAgICAgIGNhc2UgLU5GUzRFUlJfQ09NUExFVEVfQUxSRUFEWToNCiAg
ICAgICAgY2FzZSAtTkZTNEVSUl9XUk9OR19DUkVEOiAvKiBXaGF0IHRvIGRvIGhlcmU/ICovDQog
ICAgICAgICAgICAgICAgYnJlYWs7DQogICAgICAgIGNhc2UgLU5GUzRFUlJfREVMQVk6DQogICAg
ICAgICAgICAgICAgcnBjX2RlbGF5KHRhc2ssIE5GUzRfUE9MTF9SRVRSWV9NQVgpOw0KICAgICAg
ICAgICAgICAgIC8qIGZhbGwgdGhyb3VnaCAqLw0KICAgICAgICBjYXNlIC1ORlM0RVJSX1JFVFJZ
X1VOQ0FDSEVEX1JFUDoNCiAgICAgICAgICAgICAgICByZXR1cm4gLUVBR0FJTjsNCiAgICAgICAg
ZGVmYXVsdDoNCiAgICAgICAgICAgICAgICBuZnM0X3NjaGVkdWxlX2xlYXNlX3JlY292ZXJ5KGNs
cCk7DQogICAgICAgIH0NCiAgICAgICAgcmV0dXJuIDA7DQp9DQoNCklPVzogd2hhdCB0aGUgY29k
ZSBkb2VzIGlzIHNjaGVkdWxlIGFub3RoZXIgcm91bmQgb2YgbGVhc2UgcmVjb3ZlcnkuDQoNCg==
On Tue, Jan 24, 2017 at 3:35 PM, Trond Myklebust
<[email protected]> wrote:
>
>> On Jan 24, 2017, at 14:50, Olga Kornievskaia <[email protected]> wrote:
>>
>> On Tue, Jan 24, 2017 at 2:44 PM, Trond Myklebust
>> <[email protected]> wrote:
>>>
>>>> On Jan 24, 2017, at 14:40, Olga Kornievskaia <[email protected]> wrote:
>>>>
>>>> On Tue, Jan 24, 2017 at 2:12 PM, Trond Myklebust
>>>> <[email protected]> wrote:
>>>>>
>>>>>> On Jan 24, 2017, at 14:06, Olga Kornievskaia <[email protected]> wrote:
>>>>>>
>>>>>> Hi Trond,
>>>>>>
>>>>>> Is there a reason that nfs4_proc_reclaim_complete() isn't wrapped
>>>>>> with a do while() to handle errors?
>>>>>
>>>>> What do we not already handle correctly in nfs4_reclaim_complete_done=
()?
>>>>
>>>> Could this be because when an error occurs rpc_done isn't called
>>>> (rpc_release is called)? What I see is that if RECLAIM_COMPLETE gets
>>>> an error (BAD_SESSION) the client just ignores it.
>>>>
>>>
>>> That=E2=80=99s correct. Why do we need to handle BAD_SESSION there? We=
=E2=80=99re done with state recovery, so if the server rebooted, we can cat=
ch that later.
>>
>> (1) don't we want to handle session errors as soon as possible?
>> (2) I ran into a problem (not sure yet if reproducible) where I had a
>> client stuck in an infinite loop of RECLAIM_COMPLETE being sent with
>> reply of BAD_SESSION.
>>
>> yes I don't know why the client is looping but it made me look into
>> the fact that we are not handling session errors on reclaim complete
>> which I simulated by having the server return BAD_SESSION to
>> RECLAIM_COMPLETE and I see that client simply ignores it.
>
> It doesn=E2=80=99t ignore it:
>
> static int nfs41_reclaim_complete_handle_errors(struct rpc_task *task, st=
ruct nfs_client *clp)
> {
> switch(task->tk_status) {
> case 0:
> case -NFS4ERR_COMPLETE_ALREADY:
> case -NFS4ERR_WRONG_CRED: /* What to do here? */
> break;
> case -NFS4ERR_DELAY:
> rpc_delay(task, NFS4_POLL_RETRY_MAX);
> /* fall through */
> case -NFS4ERR_RETRY_UNCACHED_REP:
> return -EAGAIN;
> default:
> nfs4_schedule_lease_recovery(clp);
> }
> return 0;
> }
>
> IOW: what the code does is schedule another round of lease recovery.
We already agreed that this doesn't get called because rpc_call_done
isn't called on the error.
T24gVHVlLCAyMDE3LTAxLTI0IGF0IDE1OjQ4IC0wNTAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90
ZToNCj4gT24gVHVlLCBKYW4gMjQsIDIwMTcgYXQgMzozNSBQTSwgVHJvbmQgTXlrbGVidXN0DQo+
IDx0cm9uZG15QHByaW1hcnlkYXRhLmNvbT4gd3JvdGU6DQo+ID4gDQo+ID4gPiBPbiBKYW4gMjQs
IDIwMTcsIGF0IDE0OjUwLCBPbGdhIEtvcm5pZXZza2FpYSA8YWdsb0B1bWljaC5lZHU+DQo+ID4g
PiB3cm90ZToNCj4gPiA+IA0KPiA+ID4gT24gVHVlLCBKYW4gMjQsIDIwMTcgYXQgMjo0NCBQTSwg
VHJvbmQgTXlrbGVidXN0DQo+ID4gPiA8dHJvbmRteUBwcmltYXJ5ZGF0YS5jb20+IHdyb3RlOg0K
PiA+ID4gPiANCj4gPiA+ID4gPiBPbiBKYW4gMjQsIDIwMTcsIGF0IDE0OjQwLCBPbGdhIEtvcm5p
ZXZza2FpYSA8YWdsb0B1bWljaC5lZHU+DQo+ID4gPiA+ID4gd3JvdGU6DQo+ID4gPiA+ID4gDQo+
ID4gPiA+ID4gT24gVHVlLCBKYW4gMjQsIDIwMTcgYXQgMjoxMiBQTSwgVHJvbmQgTXlrbGVidXN0
DQo+ID4gPiA+ID4gPHRyb25kbXlAcHJpbWFyeWRhdGEuY29tPiB3cm90ZToNCj4gPiA+ID4gPiA+
IA0KPiA+ID4gPiA+ID4gPiBPbiBKYW4gMjQsIDIwMTcsIGF0IDE0OjA2LCBPbGdhIEtvcm5pZXZz
a2FpYSA8YWdsb0B1bWljaC4NCj4gPiA+ID4gPiA+ID4gZWR1PiB3cm90ZToNCj4gPiA+ID4gPiA+
ID4gDQo+ID4gPiA+ID4gPiA+IEhpIFRyb25kLA0KPiA+ID4gPiA+ID4gPiANCj4gPiA+ID4gPiA+
ID4gSXMgdGhlcmUgYSByZWFzb24gdGhhdCBuZnM0X3Byb2NfcmVjbGFpbV9jb21wbGV0ZSgpDQo+
ID4gPiA+ID4gPiA+IGlzbid0wqDCoHdyYXBwZWQNCj4gPiA+ID4gPiA+ID4gd2l0aCBhIGRvIHdo
aWxlKCkgdG8gaGFuZGxlIGVycm9ycz8NCj4gPiA+ID4gPiA+IA0KPiA+ID4gPiA+ID4gV2hhdCBk
byB3ZSBub3QgYWxyZWFkeSBoYW5kbGUgY29ycmVjdGx5IGluDQo+ID4gPiA+ID4gPiBuZnM0X3Jl
Y2xhaW1fY29tcGxldGVfZG9uZSgpPw0KPiA+ID4gPiA+IA0KPiA+ID4gPiA+IENvdWxkIHRoaXMg
YmUgYmVjYXVzZSB3aGVuIGFuIGVycm9yIG9jY3VycyBycGNfZG9uZSBpc24ndA0KPiA+ID4gPiA+
IGNhbGxlZA0KPiA+ID4gPiA+IChycGNfcmVsZWFzZSBpcyBjYWxsZWQpPyBXaGF0IEkgc2VlIGlz
IHRoYXQgaWYNCj4gPiA+ID4gPiBSRUNMQUlNX0NPTVBMRVRFIGdldHMNCj4gPiA+ID4gPiBhbiBl
cnJvciAoQkFEX1NFU1NJT04pIHRoZSBjbGllbnQganVzdCBpZ25vcmVzIGl0Lg0KPiA+ID4gPiA+
IA0KPiA+ID4gPiANCj4gPiA+ID4gVGhhdOKAmXMgY29ycmVjdC4gV2h5IGRvIHdlIG5lZWQgdG8g
aGFuZGxlIEJBRF9TRVNTSU9OIHRoZXJlPw0KPiA+ID4gPiBXZeKAmXJlIGRvbmUgd2l0aCBzdGF0
ZSByZWNvdmVyeSwgc28gaWYgdGhlIHNlcnZlciByZWJvb3RlZCwgd2UNCj4gPiA+ID4gY2FuIGNh
dGNoIHRoYXQgbGF0ZXIuDQo+ID4gPiANCj4gPiA+ICgxKSBkb24ndCB3ZSB3YW50IHRvIGhhbmRs
ZSBzZXNzaW9uIGVycm9ycyBhcyBzb29uIGFzIHBvc3NpYmxlPw0KPiA+ID4gKDIpIEkgcmFuIGlu
dG8gYSBwcm9ibGVtIChub3Qgc3VyZSB5ZXQgaWYgcmVwcm9kdWNpYmxlKSB3aGVyZSBJDQo+ID4g
PiBoYWQgYQ0KPiA+ID4gY2xpZW50IHN0dWNrIGluIGFuIGluZmluaXRlIGxvb3Agb2YgUkVDTEFJ
TV9DT01QTEVURSBiZWluZyBzZW50DQo+ID4gPiB3aXRoDQo+ID4gPiByZXBseSBvZiBCQURfU0VT
U0lPTi4NCj4gPiA+IA0KPiA+ID4geWVzIEkgZG9uJ3Qga25vdyB3aHkgdGhlIGNsaWVudCBpcyBs
b29waW5nIGJ1dCBpdCBtYWRlIG1lIGxvb2sNCj4gPiA+IGludG8NCj4gPiA+IHRoZSBmYWN0IHRo
YXQgd2UgYXJlIG5vdCBoYW5kbGluZyBzZXNzaW9uIGVycm9ycyBvbiByZWNsYWltDQo+ID4gPiBj
b21wbGV0ZQ0KPiA+ID4gd2hpY2ggSSBzaW11bGF0ZWQgYnkgaGF2aW5nIHRoZSBzZXJ2ZXIgcmV0
dXJuIEJBRF9TRVNTSU9OIHRvDQo+ID4gPiBSRUNMQUlNX0NPTVBMRVRFIGFuZCBJIHNlZSB0aGF0
IGNsaWVudCBzaW1wbHkgaWdub3JlcyBpdC4NCj4gPiANCj4gPiBJdCBkb2VzbuKAmXQgaWdub3Jl
IGl0Og0KPiA+IA0KPiA+IHN0YXRpYyBpbnQgbmZzNDFfcmVjbGFpbV9jb21wbGV0ZV9oYW5kbGVf
ZXJyb3JzKHN0cnVjdCBycGNfdGFzaw0KPiA+ICp0YXNrLCBzdHJ1Y3QgbmZzX2NsaWVudCAqY2xw
KQ0KPiA+IHsNCj4gPiDCoMKgwqDCoMKgwqDCoMKgc3dpdGNoKHRhc2stPnRrX3N0YXR1cykgew0K
PiA+IMKgwqDCoMKgwqDCoMKgwqBjYXNlIDA6DQo+ID4gwqDCoMKgwqDCoMKgwqDCoGNhc2UgLU5G
UzRFUlJfQ09NUExFVEVfQUxSRUFEWToNCj4gPiDCoMKgwqDCoMKgwqDCoMKgY2FzZSAtTkZTNEVS
Ul9XUk9OR19DUkVEOiAvKiBXaGF0IHRvIGRvIGhlcmU/ICovDQo+ID4gwqDCoMKgwqDCoMKgwqDC
oMKgwqDCoMKgwqDCoMKgwqBicmVhazsNCj4gPiDCoMKgwqDCoMKgwqDCoMKgY2FzZSAtTkZTNEVS
Ul9ERUxBWToNCj4gPiDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoHJwY19kZWxheSh0
YXNrLCBORlM0X1BPTExfUkVUUllfTUFYKTsNCj4gPiDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
oMKgwqDCoC8qIGZhbGwgdGhyb3VnaCAqLw0KPiA+IMKgwqDCoMKgwqDCoMKgwqBjYXNlIC1ORlM0
RVJSX1JFVFJZX1VOQ0FDSEVEX1JFUDoNCj4gPiDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
wqDCoHJldHVybiAtRUFHQUlOOw0KPiA+IMKgwqDCoMKgwqDCoMKgwqBkZWZhdWx0Og0KPiA+IMKg
wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgbmZzNF9zY2hlZHVsZV9sZWFzZV9yZWNvdmVy
eShjbHApOw0KPiA+IMKgwqDCoMKgwqDCoMKgwqB9DQo+ID4gwqDCoMKgwqDCoMKgwqDCoHJldHVy
biAwOw0KPiA+IH0NCj4gPiANCj4gPiBJT1c6IHdoYXQgdGhlIGNvZGUgZG9lcyBpcyBzY2hlZHVs
ZSBhbm90aGVyIHJvdW5kIG9mIGxlYXNlDQo+ID4gcmVjb3ZlcnkuDQo+IA0KPiBXZSBhbHJlYWR5
IGFncmVlZCB0aGF0IHRoaXMgZG9lc24ndCBnZXQgY2FsbGVkIGJlY2F1c2UgcnBjX2NhbGxfZG9u
ZQ0KPiBpc24ndCBjYWxsZWQgb24gdGhlIGVycm9yLg0KDQpXaGF0IGFtIEkgbWlzc2luZz8gV2h5
IHdvdWxkbid0IGl0IGdldCBjYWxsZWQ/DQoNCi0tIA0KVHJvbmQgTXlrbGVidXN0DQpMaW51eCBO
RlMgY2xpZW50IG1haW50YWluZXIsIFByaW1hcnlEYXRhDQp0cm9uZC5teWtsZWJ1c3RAcHJpbWFy
eWRhdGEuY29tDQo=
T24gVHVlLCAyMDE3LTAxLTI0IGF0IDE2OjExIC0wNTAwLCBUcm9uZCBNeWtsZWJ1c3Qgd3JvdGU6
DQo+IE9uIFR1ZSwgMjAxNy0wMS0yNCBhdCAxNTo0OCAtMDUwMCwgT2xnYSBLb3JuaWV2c2thaWEg
d3JvdGU6DQo+ID4gT24gVHVlLCBKYW4gMjQsIDIwMTcgYXQgMzozNSBQTSwgVHJvbmQgTXlrbGVi
dXN0DQo+ID4gPHRyb25kbXlAcHJpbWFyeWRhdGEuY29tPiB3cm90ZToNCj4gPiA+IA0KPiA+ID4g
PiBPbiBKYW4gMjQsIDIwMTcsIGF0IDE0OjUwLCBPbGdhIEtvcm5pZXZza2FpYSA8YWdsb0B1bWlj
aC5lZHU+DQo+ID4gPiA+IHdyb3RlOg0KPiA+ID4gPiANCj4gPiA+ID4gT24gVHVlLCBKYW4gMjQs
IDIwMTcgYXQgMjo0NCBQTSwgVHJvbmQgTXlrbGVidXN0DQo+ID4gPiA+IDx0cm9uZG15QHByaW1h
cnlkYXRhLmNvbT4gd3JvdGU6DQo+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiBPbiBKYW4gMjQsIDIw
MTcsIGF0IDE0OjQwLCBPbGdhIEtvcm5pZXZza2FpYSA8YWdsb0B1bWljaC5lZA0KPiA+ID4gPiA+
ID4gdT4NCj4gPiA+ID4gPiA+IHdyb3RlOg0KPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiBPbiBU
dWUsIEphbiAyNCwgMjAxNyBhdCAyOjEyIFBNLCBUcm9uZCBNeWtsZWJ1c3QNCj4gPiA+ID4gPiA+
IDx0cm9uZG15QHByaW1hcnlkYXRhLmNvbT4gd3JvdGU6DQo+ID4gPiA+ID4gPiA+IA0KPiA+ID4g
PiA+ID4gPiA+IE9uIEphbiAyNCwgMjAxNywgYXQgMTQ6MDYsIE9sZ2EgS29ybmlldnNrYWlhIDxh
Z2xvQHVtaWMNCj4gPiA+ID4gPiA+ID4gPiBoLg0KPiA+ID4gPiA+ID4gPiA+IGVkdT4gd3JvdGU6
DQo+ID4gPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiA+ID4gSGkgVHJvbmQsDQo+ID4gPiA+ID4g
PiA+ID4gDQo+ID4gPiA+ID4gPiA+ID4gSXMgdGhlcmUgYSByZWFzb24gdGhhdCBuZnM0X3Byb2Nf
cmVjbGFpbV9jb21wbGV0ZSgpDQo+ID4gPiA+ID4gPiA+ID4gaXNuJ3TCoMKgd3JhcHBlZA0KPiA+
ID4gPiA+ID4gPiA+IHdpdGggYSBkbyB3aGlsZSgpIHRvIGhhbmRsZSBlcnJvcnM/DQo+ID4gPiA+
ID4gPiA+IA0KPiA+ID4gPiA+ID4gPiBXaGF0IGRvIHdlIG5vdCBhbHJlYWR5IGhhbmRsZSBjb3Jy
ZWN0bHkgaW4NCj4gPiA+ID4gPiA+ID4gbmZzNF9yZWNsYWltX2NvbXBsZXRlX2RvbmUoKT8NCj4g
PiA+ID4gPiA+IA0KPiA+ID4gPiA+ID4gQ291bGQgdGhpcyBiZSBiZWNhdXNlIHdoZW4gYW4gZXJy
b3Igb2NjdXJzIHJwY19kb25lIGlzbid0DQo+ID4gPiA+ID4gPiBjYWxsZWQNCj4gPiA+ID4gPiA+
IChycGNfcmVsZWFzZSBpcyBjYWxsZWQpPyBXaGF0IEkgc2VlIGlzIHRoYXQgaWYNCj4gPiA+ID4g
PiA+IFJFQ0xBSU1fQ09NUExFVEUgZ2V0cw0KPiA+ID4gPiA+ID4gYW4gZXJyb3IgKEJBRF9TRVNT
SU9OKSB0aGUgY2xpZW50IGp1c3QgaWdub3JlcyBpdC4NCj4gPiA+ID4gPiA+IA0KPiA+ID4gPiA+
IA0KPiA+ID4gPiA+IFRoYXTigJlzIGNvcnJlY3QuIFdoeSBkbyB3ZSBuZWVkIHRvIGhhbmRsZSBC
QURfU0VTU0lPTiB0aGVyZT8NCj4gPiA+ID4gPiBXZeKAmXJlIGRvbmUgd2l0aCBzdGF0ZSByZWNv
dmVyeSwgc28gaWYgdGhlIHNlcnZlciByZWJvb3RlZCwgd2UNCj4gPiA+ID4gPiBjYW4gY2F0Y2gg
dGhhdCBsYXRlci4NCj4gPiA+ID4gDQo+ID4gPiA+ICgxKSBkb24ndCB3ZSB3YW50IHRvIGhhbmRs
ZSBzZXNzaW9uIGVycm9ycyBhcyBzb29uIGFzIHBvc3NpYmxlPw0KPiA+ID4gPiAoMikgSSByYW4g
aW50byBhIHByb2JsZW0gKG5vdCBzdXJlIHlldCBpZiByZXByb2R1Y2libGUpIHdoZXJlIEkNCj4g
PiA+ID4gaGFkIGENCj4gPiA+ID4gY2xpZW50IHN0dWNrIGluIGFuIGluZmluaXRlIGxvb3Agb2Yg
UkVDTEFJTV9DT01QTEVURSBiZWluZyBzZW50DQo+ID4gPiA+IHdpdGgNCj4gPiA+ID4gcmVwbHkg
b2YgQkFEX1NFU1NJT04uDQo+ID4gPiA+IA0KPiA+ID4gPiB5ZXMgSSBkb24ndCBrbm93IHdoeSB0
aGUgY2xpZW50IGlzIGxvb3BpbmcgYnV0IGl0IG1hZGUgbWUgbG9vaw0KPiA+ID4gPiBpbnRvDQo+
ID4gPiA+IHRoZSBmYWN0IHRoYXQgd2UgYXJlIG5vdCBoYW5kbGluZyBzZXNzaW9uIGVycm9ycyBv
biByZWNsYWltDQo+ID4gPiA+IGNvbXBsZXRlDQo+ID4gPiA+IHdoaWNoIEkgc2ltdWxhdGVkIGJ5
IGhhdmluZyB0aGUgc2VydmVyIHJldHVybiBCQURfU0VTU0lPTiB0bw0KPiA+ID4gPiBSRUNMQUlN
X0NPTVBMRVRFIGFuZCBJIHNlZSB0aGF0IGNsaWVudCBzaW1wbHkgaWdub3JlcyBpdC4NCj4gPiA+
IA0KPiA+ID4gSXQgZG9lc27igJl0IGlnbm9yZSBpdDoNCj4gPiA+IA0KPiA+ID4gc3RhdGljIGlu
dCBuZnM0MV9yZWNsYWltX2NvbXBsZXRlX2hhbmRsZV9lcnJvcnMoc3RydWN0IHJwY190YXNrDQo+
ID4gPiAqdGFzaywgc3RydWN0IG5mc19jbGllbnQgKmNscCkNCj4gPiA+IHsNCj4gPiA+IMKgwqDC
oMKgwqDCoMKgwqBzd2l0Y2godGFzay0+dGtfc3RhdHVzKSB7DQo+ID4gPiDCoMKgwqDCoMKgwqDC
oMKgY2FzZSAwOg0KPiA+ID4gwqDCoMKgwqDCoMKgwqDCoGNhc2UgLU5GUzRFUlJfQ09NUExFVEVf
QUxSRUFEWToNCj4gPiA+IMKgwqDCoMKgwqDCoMKgwqBjYXNlIC1ORlM0RVJSX1dST05HX0NSRUQ6
IC8qIFdoYXQgdG8gZG8gaGVyZT8gKi8NCj4gPiA+IMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
wqDCoMKgYnJlYWs7DQo+ID4gPiDCoMKgwqDCoMKgwqDCoMKgY2FzZSAtTkZTNEVSUl9ERUxBWToN
Cj4gPiA+IMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgcnBjX2RlbGF5KHRhc2ssIE5G
UzRfUE9MTF9SRVRSWV9NQVgpOw0KPiA+ID4gwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg
wqAvKiBmYWxsIHRocm91Z2ggKi8NCj4gPiA+IMKgwqDCoMKgwqDCoMKgwqBjYXNlIC1ORlM0RVJS
X1JFVFJZX1VOQ0FDSEVEX1JFUDoNCj4gPiA+IMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC
oMKgcmV0dXJuIC1FQUdBSU47DQo+ID4gPiDCoMKgwqDCoMKgwqDCoMKgZGVmYXVsdDoNCj4gPiA+
IMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgbmZzNF9zY2hlZHVsZV9sZWFzZV9yZWNv
dmVyeShjbHApOw0KPiA+ID4gwqDCoMKgwqDCoMKgwqDCoH0NCj4gPiA+IMKgwqDCoMKgwqDCoMKg
wqByZXR1cm4gMDsNCj4gPiA+IH0NCj4gPiA+IA0KPiA+ID4gSU9XOiB3aGF0IHRoZSBjb2RlIGRv
ZXMgaXMgc2NoZWR1bGUgYW5vdGhlciByb3VuZCBvZiBsZWFzZQ0KPiA+ID4gcmVjb3ZlcnkuDQo+
ID4gDQo+ID4gV2UgYWxyZWFkeSBhZ3JlZWQgdGhhdCB0aGlzIGRvZXNuJ3QgZ2V0IGNhbGxlZCBi
ZWNhdXNlDQo+ID4gcnBjX2NhbGxfZG9uZQ0KPiA+IGlzbid0IGNhbGxlZCBvbiB0aGUgZXJyb3Iu
DQo+IA0KPiBXaGF0IGFtIEkgbWlzc2luZz8gV2h5IHdvdWxkbid0IGl0IGdldCBjYWxsZWQ/DQoN
ClNvcnJ5LiBJIHNlZSBJIGNhdXNlZCB0aGF0IGNvbmZ1c2lvbi4gV2hlbiBJIHNhaWQgInRoYXQg
aXMgY29ycmVjdCIsIEkNCndhcyByZWZlcnJpbmcgdG8gdGhlIGZhY3QgdGhhdCB0aGUgY2xpZW50
IGlnbm9yZXMgdGhlIGVycm9yLiBBbGwgaXQNCmRvZXMgaW4gdGhlIGNhc2Ugb2YgdGhvc2UgZXJy
b3JzIGlzIHRvIHNjaGVkdWxlIHJlY292ZXJ5IGFuZCB0aGVuIGV4aXQuDQoNClRoZSBjbGllbnQg
d2lsbCBhbHdheXMgY2FsbCBycGNfY2FsbF9kb25lKCkgcHJvdmlkZWQgdGhhdCB0aGUgUlBDIGNh
bGwNCndhcyBpbml0aWFsaXNlZCBzdWNjZXNzZnVsbHkuDQoNCi0tIA0KDQoNCg0KCQ0KCQ0KDQoN
ClRyb25kIE15a2xlYnVzdA0KUHJpbmNpcGFsIFN5c3RlbSBBcmNoaXRlY3QNCjQzMDAgRWwgQ2Ft
aW5vIFJlYWwgfCBTdWl0ZSAxMDANCkxvcyBBbHRvcywgQ0HCoMKgOTQwMjINClc6IDY1MC00MjIt
MzgwMA0KQzogODAxLTkyMS00NTgzwqANCnd3dy5wcmltYXJ5ZGF0YS5jb20NCg0KDQoNCg0K
On Tue, Jan 24, 2017 at 4:21 PM, Trond Myklebust
<[email protected]> wrote:
> On Tue, 2017-01-24 at 16:11 -0500, Trond Myklebust wrote:
>> On Tue, 2017-01-24 at 15:48 -0500, Olga Kornievskaia wrote:
>> > On Tue, Jan 24, 2017 at 3:35 PM, Trond Myklebust
>> > <[email protected]> wrote:
>> > >
>> > > > On Jan 24, 2017, at 14:50, Olga Kornievskaia <[email protected]>
>> > > > wrote:
>> > > >
>> > > > On Tue, Jan 24, 2017 at 2:44 PM, Trond Myklebust
>> > > > <[email protected]> wrote:
>> > > > >
>> > > > > > On Jan 24, 2017, at 14:40, Olga Kornievskaia <[email protected]
>> > > > > > u>
>> > > > > > wrote:
>> > > > > >
>> > > > > > On Tue, Jan 24, 2017 at 2:12 PM, Trond Myklebust
>> > > > > > <[email protected]> wrote:
>> > > > > > >
>> > > > > > > > On Jan 24, 2017, at 14:06, Olga Kornievskaia <aglo@umic
>> > > > > > > > h.
>> > > > > > > > edu> wrote:
>> > > > > > > >
>> > > > > > > > Hi Trond,
>> > > > > > > >
>> > > > > > > > Is there a reason that nfs4_proc_reclaim_complete()
>> > > > > > > > isn't wrapped
>> > > > > > > > with a do while() to handle errors?
>> > > > > > >
>> > > > > > > What do we not already handle correctly in
>> > > > > > > nfs4_reclaim_complete_done()?
>> > > > > >
>> > > > > > Could this be because when an error occurs rpc_done isn't
>> > > > > > called
>> > > > > > (rpc_release is called)? What I see is that if
>> > > > > > RECLAIM_COMPLETE gets
>> > > > > > an error (BAD_SESSION) the client just ignores it.
>> > > > > >
>> > > > >
>> > > > > That=E2=80=99s correct. Why do we need to handle BAD_SESSION the=
re?
>> > > > > We=E2=80=99re done with state recovery, so if the server reboote=
d, we
>> > > > > can catch that later.
>> > > >
>> > > > (1) don't we want to handle session errors as soon as possible?
>> > > > (2) I ran into a problem (not sure yet if reproducible) where I
>> > > > had a
>> > > > client stuck in an infinite loop of RECLAIM_COMPLETE being sent
>> > > > with
>> > > > reply of BAD_SESSION.
>> > > >
>> > > > yes I don't know why the client is looping but it made me look
>> > > > into
>> > > > the fact that we are not handling session errors on reclaim
>> > > > complete
>> > > > which I simulated by having the server return BAD_SESSION to
>> > > > RECLAIM_COMPLETE and I see that client simply ignores it.
>> > >
>> > > It doesn=E2=80=99t ignore it:
>> > >
>> > > static int nfs41_reclaim_complete_handle_errors(struct rpc_task
>> > > *task, struct nfs_client *clp)
>> > > {
>> > > switch(task->tk_status) {
>> > > case 0:
>> > > case -NFS4ERR_COMPLETE_ALREADY:
>> > > case -NFS4ERR_WRONG_CRED: /* What to do here? */
>> > > break;
>> > > case -NFS4ERR_DELAY:
>> > > rpc_delay(task, NFS4_POLL_RETRY_MAX);
>> > > /* fall through */
>> > > case -NFS4ERR_RETRY_UNCACHED_REP:
>> > > return -EAGAIN;
>> > > default:
>> > > nfs4_schedule_lease_recovery(clp);
>> > > }
>> > > return 0;
>> > > }
>> > >
>> > > IOW: what the code does is schedule another round of lease
>> > > recovery.
>> >
>> > We already agreed that this doesn't get called because
>> > rpc_call_done
>> > isn't called on the error.
>>
>> What am I missing? Why wouldn't it get called?
>
> Sorry. I see I caused that confusion. When I said "that is correct", I
> was referring to the fact that the client ignores the error. All it
> does in the case of those errors is to schedule recovery and then exit.
>
> The client will always call rpc_call_done() provided that the RPC call
> was initialised successfully.
Ok. Thanks for clarifying. So something is going on with the
BAD_SESSION error loop but I just don't know what.
I have encountered either SEQUENCE getting into an infinite loop of
getting BAD_SESSION and then once I've seen this RECLAIM_COMPLETE
getting into a loop. I'll keep looking...