2017-06-19 20:36:14

by Olga Kornievskaia

[permalink] [raw]
Subject: [PATCH 1/1] NFS fix race in state recovery

An application can fail with EIO when the server reboots and the
client while doing the recovery is getting into the following
situation where it incorrectly chooses "nograce" recovery instead
of "reboot" recovery. The necessary trigger is a pnfs mount and 2 DS
reads on the same file fail with BAD_STATEID each read uses its own
lock stateid (and MDS server reboots losing state). Since server
rebooted it will fail recovery ops with BAD_SESSION and then also
STALE_ID triggering session and clientid recovery.

1. Thread1 gets BAD_STATEID initiates stateid recovery calls
nfs4_state_mark_reclaim_no_grace() which sets the cl_state->flags and
state->flags respective _NOGRACE flags.

2. State manager gets to run and it clears the _NOGRACE from the
cl_state. Calls nfs4_do_reclaim() which sends TEST_STATEID which gets
a BAD_SESSION.

3. Thread2 now gets control and it also processing its BAD_STATEID and
calls nfs4_state_mark_reclaim_no_grace() and it again sets the
state/cl_state->flags to have _NOGRACE.

4. State manager runs and continues with state recovery by sending
DESTROY_SESSION, and then CREATE_SESSION which fails with
STALE_CLIENTID. Now we are recovering the lease.

5. State manager in nfs4_recovery_handle_error for STALE_CLIENTID,
calls nfs4_state_start_reclaim_reboot() calls
nfs4_state_mark_reclaim_reboot() which has a check

/* Don't recover state that expired before the reboot */
if (test_bit(NFS_STATE_RECLAIM_NOGRACE, &state->flags)) {
clear_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags);
return 0;
}

Because the _NOGRACE was set again in Step3, we end up clearing
_RECLAIM_REBOOT and not setting _RECLAIM_REBOOT in the cl_state->flags
and choose to do "nograce" operation instead of recovery.

Proposing to introduce a new state flag indicating recovery is
going on and when set in nfs4_state_mark_reclaim_nograce() return.

Signed-off-by: Olga Kornievskaia <[email protected]>
---
fs/nfs/nfs4_fs.h | 1 +
fs/nfs/nfs4state.c | 5 +++++
2 files changed, 6 insertions(+)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index af285cc..0f20f12 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -158,6 +158,7 @@ enum {
NFS_O_RDWR_STATE, /* OPEN stateid has read/write state */
NFS_STATE_RECLAIM_REBOOT, /* OPEN stateid server rebooted */
NFS_STATE_RECLAIM_NOGRACE, /* OPEN stateid needs to recover state */
+ NFS_STATE_RECLAIM_INPROGRESS, /* OPEN stateid is in progress */
NFS_STATE_POSIX_LOCKS, /* Posix locks are supported */
NFS_STATE_RECOVERY_FAILED, /* OPEN stateid state recovery failed */
NFS_STATE_MAY_NOTIFY_LOCK, /* server may CB_NOTIFY_LOCK */
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index b34de03..4eef2eb 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1335,6 +1335,8 @@ int nfs4_state_mark_reclaim_nograce(struct nfs_client *clp, struct nfs4_state *s
{
if (!nfs4_valid_open_stateid(state))
return 0;
+ if (test_bit(NFS_STATE_RECLAIM_INPROGRESS, &state->flags))
+ return 1;
set_bit(NFS_STATE_RECLAIM_NOGRACE, &state->flags);
clear_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags);
set_bit(NFS_OWNER_RECLAIM_NOGRACE, &state->owner->so_flags);
@@ -1522,6 +1524,7 @@ static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs
continue;
atomic_inc(&state->count);
spin_unlock(&sp->so_lock);
+ set_bit(NFS_STATE_RECLAIM_INPROGRESS, &state->flags);
status = ops->recover_open(sp, state);
if (status >= 0) {
status = nfs4_reclaim_locks(state, ops);
@@ -1538,6 +1541,8 @@ static int nfs4_reclaim_open_state(struct nfs4_state_owner *sp, const struct nfs
}
clear_bit(NFS_STATE_RECLAIM_NOGRACE,
&state->flags);
+ clear_bit(NFS_STATE_RECLAIM_INPROGRESS,
+ &state->flags);
nfs4_put_open_state(state);
spin_lock(&sp->so_lock);
goto restart;
--
1.8.3.1



2017-06-20 19:46:14

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 1/1] NFS fix race in state recovery

T24gTW9uLCAyMDE3LTA2LTE5IGF0IDE2OjM2IC0wNDAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90
ZToNCj4gQW4gYXBwbGljYXRpb24gY2FuIGZhaWwgd2l0aCBFSU8gd2hlbiB0aGUgc2VydmVyIHJl
Ym9vdHMgYW5kIHRoZQ0KPiBjbGllbnQgd2hpbGUgZG9pbmcgdGhlIHJlY292ZXJ5IGlzIGdldHRp
bmcgaW50byB0aGUgZm9sbG93aW5nDQo+IHNpdHVhdGlvbiB3aGVyZSBpdCBpbmNvcnJlY3RseSBj
aG9vc2VzICJub2dyYWNlIiByZWNvdmVyeSBpbnN0ZWFkDQo+IG9mICJyZWJvb3QiIHJlY292ZXJ5
LiBUaGUgbmVjZXNzYXJ5IHRyaWdnZXIgaXMgYSBwbmZzIG1vdW50IGFuZCAyIERTDQo+IHJlYWRz
IG9uIHRoZSBzYW1lIGZpbGUgZmFpbCB3aXRoIEJBRF9TVEFURUlEIGVhY2ggcmVhZCB1c2VzIGl0
cyBvd24NCj4gbG9jayBzdGF0ZWlkIChhbmQgTURTIHNlcnZlciByZWJvb3RzIGxvc2luZyBzdGF0
ZSkuIFNpbmNlwqDCoHNlcnZlcg0KPiByZWJvb3RlZCBpdCB3aWxsIGZhaWwgcmVjb3Zlcnkgb3Bz
IHdpdGggQkFEX1NFU1NJT04gYW5kIHRoZW4gYWxzbw0KPiBTVEFMRV9JRCB0cmlnZ2VyaW5nIHNl
c3Npb24gYW5kIGNsaWVudGlkIHJlY292ZXJ5Lg0KPiANCj4gMS4gVGhyZWFkMSBnZXRzIEJBRF9T
VEFURUlEIGluaXRpYXRlcyBzdGF0ZWlkIHJlY292ZXJ5IGNhbGxzDQo+IG5mczRfc3RhdGVfbWFy
a19yZWNsYWltX25vX2dyYWNlKCkgd2hpY2ggc2V0cyB0aGUgY2xfc3RhdGUtPmZsYWdzIGFuZA0K
PiBzdGF0ZS0+ZmxhZ3MgcmVzcGVjdGl2ZSBfTk9HUkFDRSBmbGFncy4NCj4gDQo+IDIuIFN0YXRl
IG1hbmFnZXIgZ2V0cyB0byBydW4gYW5kIGl0IGNsZWFycyB0aGUgX05PR1JBQ0UgZnJvbSB0aGUN
Cj4gY2xfc3RhdGUuIENhbGxzIG5mczRfZG9fcmVjbGFpbSgpIHdoaWNoIHNlbmRzIFRFU1RfU1RB
VEVJRCB3aGljaCBnZXRzDQo+IGEgQkFEX1NFU1NJT04uDQo+IA0KPiAzLiBUaHJlYWQyIG5vdyBn
ZXRzIGNvbnRyb2wgYW5kIGl0IGFsc28gcHJvY2Vzc2luZyBpdHMgQkFEX1NUQVRFSUQNCj4gYW5k
DQo+IGNhbGxzIG5mczRfc3RhdGVfbWFya19yZWNsYWltX25vX2dyYWNlKCkgYW5kIGl0IGFnYWlu
IHNldHMgdGhlDQo+IHN0YXRlL2NsX3N0YXRlLT5mbGFncyB0byBoYXZlIF9OT0dSQUNFLg0KPiAN
Cj4gNC4gU3RhdGUgbWFuYWdlciBydW5zIGFuZCBjb250aW51ZXMgd2l0aCBzdGF0ZSByZWNvdmVy
eSBieSBzZW5kaW5nDQo+IERFU1RST1lfU0VTU0lPTiwgYW5kIHRoZW4gQ1JFQVRFX1NFU1NJT04g
d2hpY2ggZmFpbHMgd2l0aA0KPiBTVEFMRV9DTElFTlRJRC4gTm93IHdlIGFyZSByZWNvdmVyaW5n
IHRoZSBsZWFzZS4NCj4gDQo+IDUuIFN0YXRlIG1hbmFnZXIgaW4gbmZzNF9yZWNvdmVyeV9oYW5k
bGVfZXJyb3IgZm9yIFNUQUxFX0NMSUVOVElELA0KPiBjYWxscyBuZnM0X3N0YXRlX3N0YXJ0X3Jl
Y2xhaW1fcmVib290KCkgY2FsbHMNCj4gbmZzNF9zdGF0ZV9tYXJrX3JlY2xhaW1fcmVib290KCkg
d2hpY2ggaGFzIGEgY2hlY2sNCj4gDQo+IC8qIERvbid0IHJlY292ZXIgc3RhdGUgdGhhdCBleHBp
cmVkIGJlZm9yZSB0aGUgcmVib290ICovDQo+IGlmICh0ZXN0X2JpdChORlNfU1RBVEVfUkVDTEFJ
TV9OT0dSQUNFLCAmc3RhdGUtPmZsYWdzKSkgew0KPiDCoCBjbGVhcl9iaXQoTkZTX1NUQVRFX1JF
Q0xBSU1fUkVCT09ULCAmc3RhdGUtPmZsYWdzKTsNCj4gwqAgcmV0dXJuIDA7DQo+IH0NCj4gDQo+
IEJlY2F1c2UgdGhlIF9OT0dSQUNFIHdhcyBzZXQgYWdhaW4gaW4gU3RlcDMsIHdlIGVuZCB1cCBj
bGVhcmluZw0KPiBfUkVDTEFJTV9SRUJPT1QgYW5kIG5vdCBzZXR0aW5nIF9SRUNMQUlNX1JFQk9P
VCBpbiB0aGUgY2xfc3RhdGUtDQo+ID5mbGFncw0KPiBhbmQgY2hvb3NlIHRvIGRvICJub2dyYWNl
IiBvcGVyYXRpb24gaW5zdGVhZCBvZiByZWNvdmVyeS4NCj4gDQo+IFByb3Bvc2luZyB0byBpbnRy
b2R1Y2UgYSBuZXcgc3RhdGUgZmxhZyBpbmRpY2F0aW5nIHJlY292ZXJ5IGlzDQo+IGdvaW5nIG9u
IGFuZCB3aGVuIHNldCBpbiBuZnM0X3N0YXRlX21hcmtfcmVjbGFpbV9ub2dyYWNlKCkgcmV0dXJu
Lg0KPiANClRoaXMgbG9va3MgdG8gYmUgYSBjb25zZXF1ZW5jZSBvZiB0cmVhdGluZyB0aGUgZXJy
b3IgY29kZXMgZnJvbSB0aGUgRFMNCmFzIGJlaW5nIGluZGlzdGluZ3Vpc2hhYmxlIGZyb20gdGhl
IE1EUyBvbmVzLiBJIHRoaW5rIHRoZSByaWdodCBmaXgNCmhlcmUgaXMgcmF0aGVyIHRvIHRyZWF0
IHRoZSBEUyBlcnJvciBjb2RlcyBhcyBhbiBpbmRpY2F0aW9uIHRoYXQgd2UNCm5lZWQgdG8gZHJv
cCB0aGUgbGF5b3V0IGFuZCByZXRyeSB0aGUgSS9PIChlaXRoZXIgcmVzZW5kIGEgbGF5b3V0Z2V0
LA0Kb3IgcmV0cnkgdGhlIEkvTyB0aHJvdWdoIHRoZSBNRFMpLg0KDQotLSANClRyb25kIE15a2xl
YnVzdA0KTGludXggTkZTIGNsaWVudCBtYWludGFpbmVyLCBQcmltYXJ5RGF0YQ0KdHJvbmQubXlr
bGVidXN0QHByaW1hcnlkYXRhLmNvbQ0K