Suppose I open/create a file over pNFS and am given a WRITE delegation.
I then chmod the file (as rsync is wont to do) so the delegation is
immediately returned.
I then proceed to write, which triggers a LAYOUT_GET request. The
stateid for that request it taken from state->stateid, which is still
the delegation stateid. Naturally it gets NFS4ERR_BAD_STATEID.
When an OPEN is given a delegation, the delegation stateid gets copied
into ->stateid and the open stateid is left in ->open_stateid.
When the delegation is returned, this it done by inode, not open_state,
so it doesn't have easy access to reset the ->stateid. It could find
all the open_states and fix them up I guess it ....
Other than this usage in pnfs (which dates back to commit b1f69b754e and
is, I believe, incorrect) the ->stateid is *only* used to help find the
write state to recover when an error is reported and the state needs to
be recovered (... though the usage in nfs4_do_handle_exception()
introduced by 272289a3df is a bit different).
Anyway, no other code uses it when choosing a stateid to send in a
request. So no other code inadvertently uses the delegation stateid
after it has been returned.
How should this be fixed?
- have nfs_start_delegation_return() iterated over all open states
and copy the open_stateid over the stateid??
- have pnfs_update_layout() use ->open_stateid rather than ->stateid ??
I suspect that would be wrong.
- have pnfs_update_layout() use ->open_stateid if NFS_I()->delegation
is NULL ??
Something else?
The last seems easiest, but I'm not certain it is best.
Thanks,
NeilBrown
DQo+IE9uIE9jdCAxMywgMjAxNiwgYXQgMDM6MTgsIE5laWxCcm93biA8bmVpbGJAc3VzZS5jb20+
IHdyb3RlOg0KPiANCj4gDQo+IFN1cHBvc2UgSSBvcGVuL2NyZWF0ZSBhIGZpbGUgb3ZlciBwTkZT
IGFuZCBhbSBnaXZlbiBhIFdSSVRFIGRlbGVnYXRpb24uDQo+IEkgdGhlbiBjaG1vZCB0aGUgZmls
ZSAoYXMgcnN5bmMgaXMgd29udCB0byBkbykgc28gdGhlIGRlbGVnYXRpb24gaXMNCj4gaW1tZWRp
YXRlbHkgcmV0dXJuZWQuDQo+IEkgdGhlbiBwcm9jZWVkIHRvIHdyaXRlLCB3aGljaCB0cmlnZ2Vy
cyBhIExBWU9VVF9HRVQgcmVxdWVzdC4gIFRoZQ0KPiBzdGF0ZWlkIGZvciB0aGF0IHJlcXVlc3Qg
aXQgdGFrZW4gZnJvbSBzdGF0ZS0+c3RhdGVpZCwgd2hpY2ggaXMgc3RpbGwNCj4gdGhlIGRlbGVn
YXRpb24gc3RhdGVpZC4gIE5hdHVyYWxseSBpdCBnZXRzIE5GUzRFUlJfQkFEX1NUQVRFSUQuDQoN
ClVtbeKApiBXaHkgaXNu4oCZdCB0aGUgY2xpZW50IHJlY292ZXJpbmcgb3BlbiBzdGF0ZWlkcyBh
cyBwYXJ0IG9mIHRoZSBkZWxlZ2F0aW9uIHJldHVybj8gVGhhdCB3b3VsZCBhcHBlYXIgdG8gYmUg
dGhlIHJvb3QgY2F1c2UgcHJvYmxlbSBoZXJlLg0KDQpJcyB0aGlzIHRoZSBsYXRlc3QgdXBzdHJl
YW0ga2VybmVsPw0KDQpDaGVlcnMNCiAgVHJvbmQ=
On Fri, Oct 14 2016, Trond Myklebust wrote:
>> On Oct 13, 2016, at 03:18, NeilBrown <[email protected]> wrote:
>>
>>
>> Suppose I open/create a file over pNFS and am given a WRITE delegation.
>> I then chmod the file (as rsync is wont to do) so the delegation is
>> immediately returned.
>> I then proceed to write, which triggers a LAYOUT_GET request. The
>> stateid for that request it taken from state->stateid, which is still
>> the delegation stateid. Naturally it gets NFS4ERR_BAD_STATEID.
>
> Umm… Why isn’t the client recovering open stateids as part of the delegation return? That would appear to be the root cause problem here.
Ahh.. that's where it happens. Of course. Thanks.
I was thinking that where would be no opens to recover because the
delegation wasn't used to complete any other opens. But there is more
to the recovery than that.
>
> Is this the latest upstream kernel?
No, it is 3.12 based. I now see that I am just missing
Commit: 5e99b532bb95 ("nfs4: reset states to use open_stateid when returning delegation voluntarily")
Thanks!
NeilBrown