While running generic/089 on v4.1, I noticed the client was doing a lot of
unexpected state recovery. Some investigation shows the following exchange
on the wire:
Client                            Server
----------                        ----------
OPEN1 (owner A)    ->
OPEN2 (owner A)    ->
                   <-  OPEN1 response: state A1
                   <-  OPEN2 response: state A2
CLOSE (state A2)   ->
                   <-  CLOSE response: state A3
LOCK (state A1)    ->
                   <-  LOCK response: NFS4ERR_BAD_STATEID
The client's tracepoints show that the first OPEN's response is not handled
until after the second OPEN and the CLOSE of the state. Since both OPENs
are done with CLAIM_FH, the opendata holds references to the nfs4_state, so
it sticks around, and we incorrectly transition the nfs4_state back to
NFS_OPEN_STATE with the first OPEN's sequence number.
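To make that concrete: before this series, the check that decides whether an
OPEN reply should update the state looks roughly like the sketch below (a
paraphrase of nfs_need_update_open_stateid(), not the exact kernel code;
patch 3 reworks it). Because the NFS_OPEN_STATE bit is tested before any
comparison of the stateid itself, a delayed OPEN reply that arrives after
CLOSE has cleared that bit always appears to need an update:

static bool nfs_need_update_open_stateid(struct nfs4_state *state,
		const nfs4_stateid *stateid, nfs4_stateid *freeme)
{
	/* CLOSE cleared NFS_OPEN_STATE, so the delayed OPEN1 reply re-sets
	 * the bit here and returns true, resurrecting the closed stateid
	 * with its old sequence number. */
	if (test_and_set_bit(NFS_OPEN_STATE, &state->flags) == 0)
		return true;
	/* only now is the stateid itself examined */
	if (!nfs4_stateid_match_other(stateid, &state->open_stateid)) {
		nfs4_stateid_copy(freeme, &state->open_stateid);
		nfs_test_and_clear_all_open_stateid(state);
		return true;
	}
	/* (seqid freshness check elided) */
	return false;
}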
I investigated various ways of bringing back partial sequencing for OPENs
with the same owner, or between OPENs and CLOSE, but I didn't like
reintroducing the allocations and extra checks for the sequence ids.
I then looked at detecting this race by "noticing" holes in the state's
sequence numbers and keeping a count of those holes on the state, so that a
CLOSE could be deferred until all OPENs complete, but that seemed like too
much machinery to add to the state handling logic.
I finally ended up deciding to have the first OPEN retry if it loses the
race updating the state. Doing that, unfortunately, meant moving a bunch of
code around so that the OPEN can be re-sent when
nfs_need_update_open_stateid() returns false. The end result is a net
reduction of a few lines of code.
This race still exists, and will occur (more rarely) on generic/089 when
using CLAIM_NULL, because there is still a way for the first OPEN's
response to allocate a new nfs4_state with the old stateid and sequence
number long after that state has been closed and cleaned up by the second
OPEN and CLOSE. Fixing that may require creating a record of "pending
opens" that can be used either to defer the CLOSE or to retry the losing
OPEN. Another way may be to keep closed nfs4_states around for a while to
detect this race, with cleanup of closed states batched later.
This set doesn't try to fix that race, since it is rarely seen.
Patches 1 and 2 just open-code __update_open_stateid() and
nfs_set_open_stateid_locked() respectively. They should not change any
behavior. Patch 3 causes the OPEN to be retried if the stateid should not
be updated.
Comments and critique are welcome; I'd very much like to know if there's
any desire to fix this race for both cases.
Ben
Benjamin Coddington (3):
NFSv4: Move __update_open_stateid() into update_open_stateid()
NFSv4: Move nfs_set_open_stateid_locked into update_open_stateid()
NFSv4.1: Detect and retry after OPEN and CLOSE/DOWNGRADE race
fs/nfs/nfs4proc.c | 118 ++++++++++++++++++++++++++----------------------------
1 file changed, 56 insertions(+), 62 deletions(-)
--
2.9.3
nfs_set_open_stateid_locked() has only a single caller and can be
open-coded within update_open_stateid(). This prepares
update_open_stateid() to report whether the state's stateid needed an
update.
Signed-off-by: Benjamin Coddington <[email protected]>
---
fs/nfs/nfs4proc.c | 37 +++++++++++++++----------------------
1 file changed, 15 insertions(+), 22 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 7e0ae74ceb4f..40e431ea1bdc 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1419,27 +1419,6 @@ static void nfs_clear_open_stateid(struct nfs4_state *state,
nfs4_schedule_state_manager(state->owner->so_server->nfs_client);
}
-static void nfs_set_open_stateid_locked(struct nfs4_state *state,
- const nfs4_stateid *stateid, fmode_t fmode,
- nfs4_stateid *freeme)
-{
- switch (fmode) {
- case FMODE_READ:
- set_bit(NFS_O_RDONLY_STATE, &state->flags);
- break;
- case FMODE_WRITE:
- set_bit(NFS_O_WRONLY_STATE, &state->flags);
- break;
- case FMODE_READ|FMODE_WRITE:
- set_bit(NFS_O_RDWR_STATE, &state->flags);
- }
- if (!nfs_need_update_open_stateid(state, stateid, freeme))
- return;
- if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0)
- nfs4_stateid_copy(&state->stateid, stateid);
- nfs4_stateid_copy(&state->open_stateid, stateid);
-}
-
static int update_open_stateid(struct nfs4_state *state,
const nfs4_stateid *open_stateid,
const nfs4_stateid *delegation,
@@ -1490,7 +1469,21 @@ static int update_open_stateid(struct nfs4_state *state,
}
if (open_stateid) {
- nfs_set_open_stateid_locked(state, open_stateid, fmode, &freeme);
+ switch (fmode) {
+ case FMODE_READ:
+ set_bit(NFS_O_RDONLY_STATE, &state->flags);
+ break;
+ case FMODE_WRITE:
+ set_bit(NFS_O_WRONLY_STATE, &state->flags);
+ break;
+ case FMODE_READ|FMODE_WRITE:
+ set_bit(NFS_O_RDWR_STATE, &state->flags);
+ }
+ if (nfs_need_update_open_stateid(state, open_stateid, &freeme)) {
+ if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0)
+ nfs4_stateid_copy(&state->stateid, open_stateid);
+ nfs4_stateid_copy(&state->open_stateid, open_stateid);
+ }
ret = 1;
}
--
2.9.3
The two calls to __update_open_stateid() can be removed so that we have
only a single section protected by the so_lock and the state seqlock. This
allows us to open-code __update_open_stateid(). This is done so that a
later patch can indicate that the state's stateid was not updated because
it had previously been updated with a more recent sequence id.
Signed-off-by: Benjamin Coddington <[email protected]>
---
fs/nfs/nfs4proc.c | 71 ++++++++++++++++++++++++++-----------------------------
1 file changed, 34 insertions(+), 37 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index d90132642340..7e0ae74ceb4f 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1440,29 +1440,6 @@ static void nfs_set_open_stateid_locked(struct nfs4_state *state,
nfs4_stateid_copy(&state->open_stateid, stateid);
}
-static void __update_open_stateid(struct nfs4_state *state,
- const nfs4_stateid *open_stateid,
- const nfs4_stateid *deleg_stateid,
- fmode_t fmode,
- nfs4_stateid *freeme)
-{
- /*
- * Protect the call to nfs4_state_set_mode_locked and
- * serialise the stateid update
- */
- spin_lock(&state->owner->so_lock);
- write_seqlock(&state->seqlock);
- if (deleg_stateid != NULL) {
- nfs4_stateid_copy(&state->stateid, deleg_stateid);
- set_bit(NFS_DELEGATED_STATE, &state->flags);
- }
- if (open_stateid != NULL)
- nfs_set_open_stateid_locked(state, open_stateid, fmode, freeme);
- write_sequnlock(&state->seqlock);
- update_open_stateflags(state, fmode);
- spin_unlock(&state->owner->so_lock);
-}
-
static int update_open_stateid(struct nfs4_state *state,
const nfs4_stateid *open_stateid,
const nfs4_stateid *delegation,
@@ -1485,27 +1462,47 @@ static int update_open_stateid(struct nfs4_state *state,
spin_lock(&deleg_cur->lock);
if (rcu_dereference(nfsi->delegation) != deleg_cur ||
test_bit(NFS_DELEGATION_RETURNING, &deleg_cur->flags) ||
- (deleg_cur->type & fmode) != fmode)
- goto no_delegation_unlock;
+ (deleg_cur->type & fmode) != fmode) {
+ spin_unlock(&deleg_cur->lock);
+ deleg_cur = NULL;
+ goto no_delegation;
+ }
if (delegation == NULL)
delegation = &deleg_cur->stateid;
- else if (!nfs4_stateid_match(&deleg_cur->stateid, delegation))
- goto no_delegation_unlock;
-
- nfs_mark_delegation_referenced(deleg_cur);
- __update_open_stateid(state, open_stateid, &deleg_cur->stateid,
- fmode, &freeme);
- ret = 1;
-no_delegation_unlock:
- spin_unlock(&deleg_cur->lock);
+ else if (!nfs4_stateid_match(&deleg_cur->stateid, delegation)) {
+ spin_unlock(&deleg_cur->lock);
+ deleg_cur = NULL;
+ }
+
no_delegation:
- rcu_read_unlock();
+ /*
+ * Protect the call to nfs4_state_set_mode_locked and
+ * serialise the stateid update
+ */
+ spin_lock(&state->owner->so_lock);
+ write_seqlock(&state->seqlock);
+ if (deleg_cur) {
+ nfs_mark_delegation_referenced(deleg_cur);
+ nfs4_stateid_copy(&state->stateid, &deleg_cur->stateid);
+ set_bit(NFS_DELEGATED_STATE, &state->flags);
+ ret = 1;
+ }
- if (!ret && open_stateid != NULL) {
- __update_open_stateid(state, open_stateid, NULL, fmode, &freeme);
+ if (open_stateid) {
+ nfs_set_open_stateid_locked(state, open_stateid, fmode, &freeme);
ret = 1;
}
+
+ write_sequnlock(&state->seqlock);
+ update_open_stateflags(state, fmode);
+ spin_unlock(&state->owner->so_lock);
+
+ if (deleg_cur)
+ spin_unlock(&deleg_cur->lock);
+
+ rcu_read_unlock();
+
if (test_bit(NFS_STATE_RECLAIM_NOGRACE, &state->flags))
nfs4_schedule_state_manager(clp);
if (freeme.type != 0)
--
2.9.3
If the client issues two simultaneous OPEN calls, and the response to the
first OPEN call (sequence id 1) is delayed before updating the nfs4_state,
then the client's nfs4_state can transition through a complete lifecycle of
OPEN (state sequence id 2), and CLOSE (state sequence id 3). When the
first OPEN is finally processed, the nfs4_state is incorrectly transitioned
back to NFS_OPEN_STATE for the first OPEN (sequence id 1). Subsequent calls
to LOCK or CLOSE will receive NFS4ERR_BAD_STATEID, and trigger state
recovery.
Fix this by passing the result of nfs_need_update_open_stateid() back to
the open() path, so that the OPEN is tried again if its stateid should not
be updated.
Signed-off-by: Benjamin Coddington <[email protected]>
---
fs/nfs/nfs4proc.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 40e431ea1bdc..632136bfd3b3 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1352,9 +1352,9 @@ static void nfs_test_and_clear_all_open_stateid(struct nfs4_state *state)
static bool nfs_need_update_open_stateid(struct nfs4_state *state,
const nfs4_stateid *stateid, nfs4_stateid *freeme)
{
- if (test_and_set_bit(NFS_OPEN_STATE, &state->flags) == 0)
- return true;
if (!nfs4_stateid_match_other(stateid, &state->open_stateid)) {
+ if (test_and_set_bit(NFS_OPEN_STATE, &state->flags) == 0)
+ return true;
nfs4_stateid_copy(freeme, &state->open_stateid);
nfs_test_and_clear_all_open_stateid(state);
return true;
@@ -1468,7 +1468,8 @@ static int update_open_stateid(struct nfs4_state *state,
ret = 1;
}
- if (open_stateid) {
+ if (open_stateid &&
+ nfs_need_update_open_stateid(state, open_stateid, &freeme)) {
switch (fmode) {
case FMODE_READ:
set_bit(NFS_O_RDONLY_STATE, &state->flags);
@@ -1479,11 +1480,10 @@ static int update_open_stateid(struct nfs4_state *state,
case FMODE_READ|FMODE_WRITE:
set_bit(NFS_O_RDWR_STATE, &state->flags);
}
- if (nfs_need_update_open_stateid(state, open_stateid, &freeme)) {
- if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0)
- nfs4_stateid_copy(&state->stateid, open_stateid);
- nfs4_stateid_copy(&state->open_stateid, open_stateid);
- }
+
+ if (test_bit(NFS_DELEGATED_STATE, &state->flags) == 0)
+ nfs4_stateid_copy(&state->stateid, open_stateid);
+ nfs4_stateid_copy(&state->open_stateid, open_stateid);
ret = 1;
}
@@ -1675,8 +1675,12 @@ _nfs4_opendata_to_nfs4_state(struct nfs4_opendata *data)
goto err_put_inode;
if (data->o_res.delegation_type != 0)
nfs4_opendata_check_deleg(data, state);
- update_open_stateid(state, &data->o_res.stateid, NULL,
- data->o_arg.fmode);
+ ret = -EAGAIN;
+ if (!update_open_stateid(state, &data->o_res.stateid, NULL,
+ data->o_arg.fmode)) {
+ nfs4_put_open_state(state);
+ goto err_put_inode;
+ }
iput(inode);
out:
nfs_release_seqid(data->o_arg.seqid);
--
2.9.3
On 17 Oct 2017, at 11:49, Trond Myklebust wrote:
> On Tue, 2017-10-17 at 10:46 -0400, Benjamin Coddington wrote:
>> If the client issues two simultaneous OPEN calls, and the response to
>> the
>> first OPEN call (sequence id 1) is delayed before updating the
>> nfs4_state,
>> then the client's nfs4_state can transition through a complete
>> lifecycle of
>> OPEN (state sequence id 2), and CLOSE (state sequence id 3). When
>> the
>> first OPEN is finally processed, the nfs4_state is incorrectly
>> transitioned
>> back to NFS_OPEN_STATE for the first OPEN (sequence id
>> 1). Subsequent calls
>> to LOCK or CLOSE will receive NFS4ERR_BAD_STATEID, and trigger state
>> recovery.
>>
>> Fix this by passing back the result of need_update_open_stateid() to
>> the
>> open() path, with the result to try again if the OPEN's stateid
>> should not
>> be updated.
>>
>
> Why are we worried about the special case where the client actually
> finds the closed stateid in its cache?
Because I am hitting that case very frequently in generic/089, and I hate
how unnecessary state recovery slows everything down. I'm also seeing a
problem where generic/089 never completes, and I was trying to get this
behavior out of the way.
> In the more general case of your race, the stateid might not be found
> at all because the CLOSE completes and is processed on the client
> before it can process the reply from the delayed OPEN. If so, we really
> have no way to detect that the file has actually been closed by the
> server until we see the NFS4ERR_BAD_STATEID.
I mentioned this case in the cover letter. It's possible that the client
could retain a record of a closed stateid in order to retry an OPEN in that
case. Another approach may be to detect 'holes' in the state id sequence
and not call CLOSE until each id is processed. I think there's an existing
problem now where this race (without the CLOSE) can incorrectly update the
state flags with the result of the delayed OPEN's mode.
> Note also that section 18.2.4 says that "the server SHOULD return the
> invalid special stateid (the "other" value is zero and the "seqid"
> field is NFS4_UINT32_MAX, see Section 8.2.3)" which further ensures
> that we should not be able to match the OPEN stateid once the CLOSE is
> processed on the client.
Ah, right, so need_update_open_stateid() should handle that case. But that
doesn't mean there's no way to match the OPEN stateid after a CLOSE. We
could still keep a record of the closed state with a flag instead of an
incremented sequence id.
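Roughly like this (the flag name is made up, just to illustrate the idea):

	/* hypothetical: at CLOSE, flag the state instead of tearing it down */
	set_bit(NFS_STATE_CLOSED, &state->flags);

	/* ...so a delayed OPEN reply can be recognized and replayed: */
	if (test_bit(NFS_STATE_CLOSED, &state->flags))
		return -EAGAIN;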
So perhaps keeping track of holes in the sequence would be a better
approach, but then the OPEN response handling needs to call CLOSE in case
the OPEN fails.. that seems more complicated.
Ben
On 17 Oct 2017, at 13:42, Trond Myklebust wrote:
> On Tue, 2017-10-17 at 13:33 -0400, Benjamin Coddington wrote:
>> On 17 Oct 2017, at 11:49, Trond Myklebust wrote:
>>
>>> On Tue, 2017-10-17 at 10:46 -0400, Benjamin Coddington wrote:
>>>> If the client issues two simultaneous OPEN calls, and the
>>>> response to
>>>> the
>>>> first OPEN call (sequence id 1) is delayed before updating the
>>>> nfs4_state,
>>>> then the client's nfs4_state can transition through a complete
>>>> lifecycle of
>>>> OPEN (state sequence id 2), and CLOSE (state sequence id
>>>> 3). When
>>>> the
>>>> first OPEN is finally processed, the nfs4_state is incorrectly
>>>> transitioned
>>>> back to NFS_OPEN_STATE for the first OPEN (sequence id
>>>> 1). Subsequent calls
>>>> to LOCK or CLOSE will receive NFS4ERR_BAD_STATEID, and trigger
>>>> state
>>>> recovery.
>>>>
>>>> Fix this by passing back the result of need_update_open_stateid()
>>>> to
>>>> the
>>>> open() path, with the result to try again if the OPEN's stateid
>>>> should not
>>>> be updated.
>>>>
>>>
>>> Why are we worried about the special case where the client actually
>>> finds the closed stateid in its cache?
>>
>> Because I am hitting that case very frequently in generic/089, and I
>> hate
>> how unnecessary state recovery slows everything down. I'm also
>
> Why is it being hit. Is the client processing stuff out of order?
Yes.
>>> In the more general case of your race, the stateid might not be
>>> found
>>> at all because the CLOSE completes and is processed on the client
>>> before it can process the reply from the delayed OPEN. If so, we
>>> really
>>> have no way to detect that the file has actually been closed by the
>>> server until we see the NFS4ERR_BAD_STATEID.
>>
>> I mentioned this case in the cover letter. It's possible that the
>> client
>> could retain a record of a closed stateid in order to retry an OPEN
>> in that
>
> That would require us to retain all stateids until there are no more
> pending OPEN calls.
>
>> case. Another approach may be to detect 'holes' in the state id
>> sequence
>> and not call CLOSE until each id is processed. I think there's an
>> existing
>
> We can't know we have a hole until we know the starting value of the
> seqid, which is undefined according to RFC 5661 section 3.3.12.
Ah, yuck. I read 8.2.2:
    When such a set of locks is first created, the server returns a
    stateid with seqid value of one.
.. and went from there. Is this a conflict in the spec?
Ben
On 17 Oct 2017, at 14:26, Trond Myklebust wrote:
> On Tue, 2017-10-17 at 13:52 -0400, Benjamin Coddington wrote:
>> Ah, yuck. I read 8.2.2:
>>
>> When such a set of locks is first created, the server returns a
>> stateid with seqid value of one.
>>
>> .. and went from there. Is this a conflict in the spec?
>>
>
> Hmm... I note that the equivalent section 2.2.16 in RFC7530 has been
> changed to remove the bit about the starting value of seqid being
> undefined.
>
> So, if the spec allows us to rely on the seqid always being initialised
> to 1, then we might at least be able to detect that we have to replay
> the open.
> One way to do so might be to keep a count of the number of outstanding
> seqids, and then force OPEN_DOWNGRADE/CLOSE to wait until that number
> hits 0 (or until a state recovery is started).
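For concreteness, I imagine that bookkeeping would look something like the
sketch below (the field and waitqueue names are made up):

	/* hypothetical: count OPEN replies not yet processed on this state */
	atomic_inc(&state->n_outstanding_seqids);	/* before sending OPEN */

	if (atomic_dec_and_test(&state->n_outstanding_seqids))
		wake_up(&state->seqid_wq);		/* OPEN reply processed */

	/* CLOSE/OPEN_DOWNGRADE waits for the count to drain (or for state
	 * recovery to start): */
	wait_event(state->seqid_wq,
		   atomic_read(&state->n_outstanding_seqids) == 0);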
I can take that approach for another spin, but due to other priorities, I'll
likely not come back to this until next week. Thanks for your comments.
Ben