2017-02-17 18:46:21

by Benjamin Coddington

[permalink] [raw]
Subject: [PATCH v2 0/4] Skipped unlocks

Well over a year ago I made other attempts to fix the problem of NFS failing
to send an unlock when signalled. Those attempts were terrible.

Here's another, smaller version that keeps two simple fixes and splits out
the behavior for waiting on I/O completion to be different for the v3 and v4
case. For v4, we can easily ensure the unlock is always sent by having the
rpc_task perform the wait. For v3, that approach would require significant
changes to the way NLM works, so a simpler approach of skipping the wait is
used as was suggested by Trond.

I think this is probably a little late for 4.11. Comments and review are
welcomed.

since v1:
- add Christoph's reviewed-by on 1/4 and 2/4 and fixup switch
indentation on 2/4

Benjamin Coddington (4):
NFS4: remove a redundant lock range check
NFS: Move the flock open mode check into nfs_flock()
locks: Set FL_CLOSE when removing flock locks on close()
NFS: Always wait for I/O completion before unlock

fs/locks.c | 2 +-
fs/nfs/file.c | 31 ++++++++++++++++---------------
fs/nfs/nfs3proc.c | 13 +++++++++++++
fs/nfs/nfs4proc.c | 24 +++++++-----------------
fs/nfs/pagelist.c | 1 +
5 files changed, 38 insertions(+), 33 deletions(-)

--
2.9.3



2017-02-17 18:46:21

by Benjamin Coddington

[permalink] [raw]
Subject: [PATCH 1/4] NFS4: remove a redundant lock range check

flock64_to_posix_lock() is already doing this check

Signed-off-by: Benjamin Coddington <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
fs/nfs/nfs4proc.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 0a0eaecf9676..9388899e4050 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -6570,9 +6570,6 @@ nfs4_proc_lock(struct file *filp, int cmd, struct file_lock *request)
ctx = nfs_file_open_context(filp);
state = ctx->state;

- if (request->fl_start < 0 || request->fl_end < 0)
- return -EINVAL;
-
if (IS_GETLK(cmd)) {
if (state != NULL)
return nfs4_proc_getlk(state, F_GETLK, request);
--
2.9.3


2017-02-17 18:46:21

by Benjamin Coddington

[permalink] [raw]
Subject: [PATCH 2/4] NFS: Move the flock open mode check into nfs_flock()

We only need to check lock exclusive/shared types against open mode when
flock() is used on NFS, so move it into the flock-specific path instead of
checking it for all locks.

Signed-off-by: Benjamin Coddington <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
fs/nfs/file.c | 18 ++++++++++++++++--
fs/nfs/nfs4proc.c | 14 --------------
2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 26dbe8b0c10d..a490f45df4db 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -820,9 +820,23 @@ int nfs_flock(struct file *filp, int cmd, struct file_lock *fl)
if (NFS_SERVER(inode)->flags & NFS_MOUNT_LOCAL_FLOCK)
is_local = 1;

- /* We're simulating flock() locks using posix locks on the server */
- if (fl->fl_type == F_UNLCK)
+ /*
+ * VFS doesn't require the open mode to match a flock() lock's type.
+ * NFS, however, may simulate flock() locking with posix locking which
+ * requires the open mode to match the lock type.
+ */
+ switch (fl->fl_type) {
+ case F_UNLCK:
return do_unlk(filp, cmd, fl, is_local);
+ case F_RDLCK:
+ if (!(filp->f_mode & FMODE_READ))
+ return -EBADF;
+ break;
+ case F_WRLCK:
+ if (!(filp->f_mode & FMODE_WRITE))
+ return -EBADF;
+ }
+
return do_setlk(filp, cmd, fl, is_local);
}
EXPORT_SYMBOL_GPL(nfs_flock);
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 9388899e4050..91f88bfbbe79 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -6592,20 +6592,6 @@ nfs4_proc_lock(struct file *filp, int cmd, struct file_lock *request)
!test_bit(NFS_STATE_POSIX_LOCKS, &state->flags))
return -ENOLCK;

- /*
- * Don't rely on the VFS having checked the file open mode,
- * since it won't do this for flock() locks.
- */
- switch (request->fl_type) {
- case F_RDLCK:
- if (!(filp->f_mode & FMODE_READ))
- return -EBADF;
- break;
- case F_WRLCK:
- if (!(filp->f_mode & FMODE_WRITE))
- return -EBADF;
- }
-
status = nfs4_set_lock_state(state, request);
if (status != 0)
return status;
--
2.9.3


2017-02-17 18:46:21

by Benjamin Coddington

[permalink] [raw]
Subject: [PATCH 4/4] NFS: Always wait for I/O completion before unlock

NFS attempts to wait for read and write completion before unlocking in
order to ensure that the data returned was protected by the lock. When
this waiting is interrupted by a signal, the unlock may never be sent, and
messages similar to the following are seen in the kernel ring buffer:

[20.167876] Leaked locks on dev=0x0:0x2b ino=0x8dd4c3:
[20.168286] POSIX: fl_owner=ffff880078b06940 fl_flags=0x1 fl_type=0x0 fl_pid=20183
[20.168727] POSIX: fl_owner=ffff880078b06680 fl_flags=0x1 fl_type=0x0 fl_pid=20185

For NFSv3, the missing unlock will cause the server to refuse conflicting
locks indefinitely. For NFSv4, the leftover lock will be removed by the
server after the lease timeout.

This patch fixes this for NFSv3 by skipping the wait in order to
immediately send the unlock if the FL_CLOSE flag is set when signaled. For
NFSv4, this approach may cause the server to see the I/O as arriving with
an old stateid, so, for the v4 case the fix is different: the wait on I/O
completion is moved into nfs4_locku_ops' rpc_call_prepare(). This will
cause the sleep to happen in rpciod context, and a signal to the originally
waiting process will not cause the unlock to be skipped.

Signed-off-by: Benjamin Coddington <[email protected]>
---
fs/nfs/file.c | 13 -------------
fs/nfs/nfs3proc.c | 13 +++++++++++++
fs/nfs/nfs4proc.c | 7 +++++++
fs/nfs/pagelist.c | 1 +
4 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index a490f45df4db..adc67fe762e3 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -684,7 +684,6 @@ static int
do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
{
struct inode *inode = filp->f_mapping->host;
- struct nfs_lock_context *l_ctx;
int status;

/*
@@ -693,18 +692,6 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
*/
vfs_fsync(filp, 0);

- l_ctx = nfs_get_lock_context(nfs_file_open_context(filp));
- if (!IS_ERR(l_ctx)) {
- status = nfs_iocounter_wait(l_ctx);
- nfs_put_lock_context(l_ctx);
- if (status < 0)
- return status;
- }
-
- /* NOTE: special case
- * If we're signalled while cleaning up locks on process exit, we
- * still need to complete the unlock.
- */
/*
* Use local locking if mounted with "-onolock" or with appropriate
* "-olocal_lock="
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index dc925b531f32..ec3f12571c82 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -869,6 +869,19 @@ static int
nfs3_proc_lock(struct file *filp, int cmd, struct file_lock *fl)
{
struct inode *inode = file_inode(filp);
+ int status;
+ struct nfs_lock_context *l_ctx;
+
+ /* For v3, always send the unlock on FL_CLOSE */
+ if (fl->fl_type == F_UNLCK) {
+ l_ctx = nfs_get_lock_context(nfs_file_open_context(filp));
+ if (!IS_ERR(l_ctx)) {
+ status = nfs_iocounter_wait(l_ctx);
+ nfs_put_lock_context(l_ctx);
+ if (status < 0 && !(fl->fl_flags & FL_CLOSE))
+ return status;
+ }
+ }

return nlmclnt_proc(NFS_SERVER(inode)->nlm_host, cmd, fl);
}
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 91f88bfbbe79..052b97fd653d 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5906,6 +5906,7 @@ struct nfs4_unlockdata {
struct nfs_locku_res res;
struct nfs4_lock_state *lsp;
struct nfs_open_context *ctx;
+ struct nfs_lock_context *l_ctx;
struct file_lock fl;
struct nfs_server *server;
unsigned long timestamp;
@@ -5930,6 +5931,7 @@ static struct nfs4_unlockdata *nfs4_alloc_unlockdata(struct file_lock *fl,
atomic_inc(&lsp->ls_count);
/* Ensure we don't close file until we're done freeing locks! */
p->ctx = get_nfs_open_context(ctx);
+ p->l_ctx = nfs_get_lock_context(ctx);
memcpy(&p->fl, fl, sizeof(p->fl));
p->server = NFS_SERVER(inode);
return p;
@@ -5988,6 +5990,11 @@ static void nfs4_locku_prepare(struct rpc_task *task, void *data)
/* Note: exit _without_ running nfs4_locku_done */
goto out_no_action;
}
+
+ if (!IS_ERR(calldata->l_ctx)) {
+ nfs_iocounter_wait(calldata->l_ctx);
+ nfs_put_lock_context(calldata->l_ctx);
+ }
calldata->timestamp = jiffies;
if (nfs4_setup_sequence(calldata->server,
&calldata->arg.seq_args,
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 6e629b856a00..8bf3cefdaa96 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -114,6 +114,7 @@ nfs_iocounter_wait(struct nfs_lock_context *l_ctx)
return wait_on_atomic_t(&l_ctx->io_count, nfs_wait_atomic_killable,
TASK_KILLABLE);
}
+EXPORT_SYMBOL(nfs_iocounter_wait);

/*
* nfs_page_group_lock - lock the head of the page group
--
2.9.3


2017-02-17 18:46:26

by Benjamin Coddington

[permalink] [raw]
Subject: [PATCH 3/4] locks: Set FL_CLOSE when removing flock locks on close()

Set FL_CLOSE in fl_flags as in locks_remove_posix() when clearing locks.
NFS will check for this flag to ensure an unlock is sent in a following
patch.

Signed-off-by: Benjamin Coddington <[email protected]>
---
fs/locks.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/locks.c b/fs/locks.c
index 26811321d39b..af2031a1fcff 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2504,7 +2504,7 @@ locks_remove_flock(struct file *filp, struct file_lock_context *flctx)
.fl_owner = filp,
.fl_pid = current->tgid,
.fl_file = filp,
- .fl_flags = FL_FLOCK,
+ .fl_flags = FL_FLOCK | FL_CLOSE,
.fl_type = F_UNLCK,
.fl_end = OFFSET_MAX,
};
--
2.9.3


2017-02-17 19:00:44

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFS: Always wait for I/O completion before unlock

T24gRnJpLCAyMDE3LTAyLTE3IGF0IDEzOjQ2IC0wNTAwLCBCZW5qYW1pbiBDb2RkaW5ndG9uIHdy
b3RlOg0KPiBORlMgYXR0ZW1wdHMgdG8gd2FpdCBmb3IgcmVhZCBhbmQgd3JpdGUgY29tcGxldGlv
biBiZWZvcmUgdW5sb2NraW5nDQo+IGluDQo+IG9yZGVyIHRvIGVuc3VyZSB0aGF0IHRoZSBkYXRh
IHJldHVybmVkIHdhcyBwcm90ZWN0ZWQgYnkgdGhlDQo+IGxvY2suwqDCoFdoZW4NCj4gdGhpcyB3
YWl0aW5nIGlzIGludGVycnVwdGVkIGJ5IGEgc2lnbmFsLCB0aGUgdW5sb2NrIG1heSBuZXZlciBi
ZQ0KPiBzZW50LCBhbmQNCj4gbWVzc2FnZXMgc2ltaWxhciB0byB0aGUgZm9sbG93aW5nIGFyZSBz
ZWVuIGluIHRoZSBrZXJuZWwgcmluZyBidWZmZXI6DQo+IA0KPiBbMjAuMTY3ODc2XSBMZWFrZWQg
bG9ja3Mgb24gZGV2PTB4MDoweDJiIGlubz0weDhkZDRjMzoNCj4gWzIwLjE2ODI4Nl0gUE9TSVg6
IGZsX293bmVyPWZmZmY4ODAwNzhiMDY5NDAgZmxfZmxhZ3M9MHgxIGZsX3R5cGU9MHgwDQo+IGZs
X3BpZD0yMDE4Mw0KPiBbMjAuMTY4NzI3XSBQT1NJWDogZmxfb3duZXI9ZmZmZjg4MDA3OGIwNjY4
MCBmbF9mbGFncz0weDEgZmxfdHlwZT0weDANCj4gZmxfcGlkPTIwMTg1DQo+IA0KPiBGb3IgTkZT
djMsIHRoZSBtaXNzaW5nIHVubG9jayB3aWxsIGNhdXNlIHRoZSBzZXJ2ZXIgdG8gcmVmdXNlDQo+
IGNvbmZsaWN0aW5nDQo+IGxvY2tzIGluZGVmaW5pdGVseS7CoMKgRm9yIE5GU3Y0LCB0aGUgbGVm
dG92ZXIgbG9jayB3aWxsIGJlIHJlbW92ZWQgYnkNCj4gdGhlDQo+IHNlcnZlciBhZnRlciB0aGUg
bGVhc2UgdGltZW91dC4NCj4gDQo+IFRoaXMgcGF0Y2ggZml4ZXMgdGhpcyBmb3IgTkZTdjMgYnkg
c2tpcHBpbmcgdGhlIHdhaXQgaW4gb3JkZXIgdG8NCj4gaW1tZWRpYXRlbHkgc2VuZCB0aGUgdW5s
b2NrIGlmIHRoZSBGTF9DTE9TRSBmbGFnIGlzIHNldCB3aGVuDQo+IHNpZ25hbGVkLsKgwqBGb3IN
Cj4gTkZTdjQsIHRoaXMgYXBwcm9hY2ggbWF5IGNhdXNlIHRoZSBzZXJ2ZXIgdG8gc2VlIHRoZSBJ
L08gYXMgYXJyaXZpbmcNCj4gd2l0aA0KPiBhbiBvbGQgc3RhdGVpZCwgc28sIGZvciB0aGUgdjQg
Y2FzZSB0aGUgZml4IGlzIGRpZmZlcmVudDogdGhlIHdhaXQgb24NCj4gSS9PDQo+IGNvbXBsZXRp
b24gaXMgbW92ZWQgaW50byBuZnM0X2xvY2t1X29wcycgcnBjX2NhbGxfcHJlcGFyZSgpLsKgwqBU
aGlzDQo+IHdpbGwNCj4gY2F1c2UgdGhlIHNsZWVwIHRvIGhhcHBlbiBpbiBycGNpb2QgY29udGV4
dCwgYW5kIGEgc2lnbmFsIHRvIHRoZQ0KPiBvcmlnaW5hbGx5DQo+IHdhaXRpbmcgcHJvY2VzcyB3
aWxsIG5vdCBjYXVzZSB0aGUgdW5sb2NrIHRvIGJlIHNraXBwZWQuDQoNCk5BQ0suIEkvTyB3YWl0
cyBpbiBycGNpb2QgY29udGV4dHMgYXJlIE5PVCBhY2NlcHRhYmxlLiBycGNpb2QgaXMgcGFydA0K
b2YgdGhlIG1lbW9yeSByZWNsYWltIGNoYWluLCBzbyBoYXZpbmcgaXQgc2xlZXAgb24gSS9PIGlz
IGRlYWRsb2NrDQpwcm9uZS4NCg0KV2h5IGlzIHRoZXJlIGEgbmVlZCB0byB3YWl0IGZvciBJL08g
Y29tcGxldGlvbiBpbiB0aGUgZmlyc3QgcGxhY2UgaWYNCnRoZSB1c2VyIGhhcyBraWxsZWQgdGhl
IHRhc2sgdGhhdCBoZWxkIHRoZSBsb2NrPyAna2lsbCAtOScgd2lsbCBjYXVzZQ0KY29ycnVwdGlv
bjsgdGhhdCdzIGEgZmFjdCB0aGF0IG5vIGFtb3VudCBvZiBwYXBlciB3aWxsIGNvdmVyIG92ZXIu
DQoNCj4gDQo+IFNpZ25lZC1vZmYtYnk6IEJlbmphbWluIENvZGRpbmd0b24gPGJjb2RkaW5nQHJl
ZGhhdC5jb20+DQo+IC0tLQ0KPiDCoGZzL25mcy9maWxlLmPCoMKgwqDCoMKgfCAxMyAtLS0tLS0t
LS0tLS0tDQo+IMKgZnMvbmZzL25mczNwcm9jLmMgfCAxMyArKysrKysrKysrKysrDQo+IMKgZnMv
bmZzL25mczRwcm9jLmMgfMKgwqA3ICsrKysrKysNCj4gwqBmcy9uZnMvcGFnZWxpc3QuYyB8wqDC
oDEgKw0KPiDCoDQgZmlsZXMgY2hhbmdlZCwgMjEgaW5zZXJ0aW9ucygrKSwgMTMgZGVsZXRpb25z
KC0pDQo+IA0KPiBkaWZmIC0tZ2l0IGEvZnMvbmZzL2ZpbGUuYyBiL2ZzL25mcy9maWxlLmMNCj4g
aW5kZXggYTQ5MGY0NWRmNGRiLi5hZGM2N2ZlNzYyZTMgMTAwNjQ0DQo+IC0tLSBhL2ZzL25mcy9m
aWxlLmMNCj4gKysrIGIvZnMvbmZzL2ZpbGUuYw0KPiBAQCAtNjg0LDcgKzY4NCw2IEBAIHN0YXRp
YyBpbnQNCj4gwqBkb191bmxrKHN0cnVjdCBmaWxlICpmaWxwLCBpbnQgY21kLCBzdHJ1Y3QgZmls
ZV9sb2NrICpmbCwgaW50DQo+IGlzX2xvY2FsKQ0KPiDCoHsNCj4gwqAJc3RydWN0IGlub2RlICpp
bm9kZSA9IGZpbHAtPmZfbWFwcGluZy0+aG9zdDsNCj4gLQlzdHJ1Y3QgbmZzX2xvY2tfY29udGV4
dCAqbF9jdHg7DQo+IMKgCWludCBzdGF0dXM7DQo+IMKgDQo+IMKgCS8qDQo+IEBAIC02OTMsMTgg
KzY5Miw2IEBAIGRvX3VubGsoc3RydWN0IGZpbGUgKmZpbHAsIGludCBjbWQsIHN0cnVjdA0KPiBm
aWxlX2xvY2sgKmZsLCBpbnQgaXNfbG9jYWwpDQo+IMKgCcKgKi8NCj4gwqAJdmZzX2ZzeW5jKGZp
bHAsIDApOw0KPiDCoA0KPiAtCWxfY3R4ID0gbmZzX2dldF9sb2NrX2NvbnRleHQobmZzX2ZpbGVf
b3Blbl9jb250ZXh0KGZpbHApKTsNCj4gLQlpZiAoIUlTX0VSUihsX2N0eCkpIHsNCj4gLQkJc3Rh
dHVzID0gbmZzX2lvY291bnRlcl93YWl0KGxfY3R4KTsNCj4gLQkJbmZzX3B1dF9sb2NrX2NvbnRl
eHQobF9jdHgpOw0KPiAtCQlpZiAoc3RhdHVzIDwgMCkNCj4gLQkJCXJldHVybiBzdGF0dXM7DQo+
IC0JfQ0KPiAtDQo+IC0JLyogTk9URTogc3BlY2lhbCBjYXNlDQo+IC0JwqAqwqAJSWYgd2UncmUg
c2lnbmFsbGVkIHdoaWxlIGNsZWFuaW5nIHVwIGxvY2tzIG9uDQo+IHByb2Nlc3MgZXhpdCwgd2UN
Cj4gLQnCoCrCoAlzdGlsbCBuZWVkIHRvIGNvbXBsZXRlIHRoZSB1bmxvY2suDQo+IC0JwqAqLw0K
PiDCoAkvKg0KPiDCoAnCoCogVXNlIGxvY2FsIGxvY2tpbmcgaWYgbW91bnRlZCB3aXRoICItb25v
bG9jayIgb3Igd2l0aA0KPiBhcHByb3ByaWF0ZQ0KPiDCoAnCoCogIi1vbG9jYWxfbG9jaz0iDQo+
IGRpZmYgLS1naXQgYS9mcy9uZnMvbmZzM3Byb2MuYyBiL2ZzL25mcy9uZnMzcHJvYy5jDQo+IGlu
ZGV4IGRjOTI1YjUzMWYzMi4uZWMzZjEyNTcxYzgyIDEwMDY0NA0KPiAtLS0gYS9mcy9uZnMvbmZz
M3Byb2MuYw0KPiArKysgYi9mcy9uZnMvbmZzM3Byb2MuYw0KPiBAQCAtODY5LDYgKzg2OSwxOSBA
QCBzdGF0aWMgaW50DQo+IMKgbmZzM19wcm9jX2xvY2soc3RydWN0IGZpbGUgKmZpbHAsIGludCBj
bWQsIHN0cnVjdCBmaWxlX2xvY2sgKmZsKQ0KPiDCoHsNCj4gwqAJc3RydWN0IGlub2RlICppbm9k
ZSA9IGZpbGVfaW5vZGUoZmlscCk7DQo+ICsJaW50IHN0YXR1czsNCj4gKwlzdHJ1Y3QgbmZzX2xv
Y2tfY29udGV4dCAqbF9jdHg7DQo+ICsNCj4gKwkvKiBGb3IgdjMsIGFsd2F5cyBzZW5kIHRoZSB1
bmxvY2sgb24gRkxfQ0xPU0UgKi8NCj4gKwlpZiAoZmwtPmZsX3R5cGUgPT0gRl9VTkxDSykgew0K
PiArCQlsX2N0eCA9DQo+IG5mc19nZXRfbG9ja19jb250ZXh0KG5mc19maWxlX29wZW5fY29udGV4
dChmaWxwKSk7DQo+ICsJCWlmICghSVNfRVJSKGxfY3R4KSkgew0KPiArCQkJc3RhdHVzID0gbmZz
X2lvY291bnRlcl93YWl0KGxfY3R4KTsNCj4gKwkJCW5mc19wdXRfbG9ja19jb250ZXh0KGxfY3R4
KTsNCj4gKwkJCWlmIChzdGF0dXMgPCAwICYmICEoZmwtPmZsX2ZsYWdzICYNCj4gRkxfQ0xPU0Up
KQ0KPiArCQkJCXJldHVybiBzdGF0dXM7DQo+ICsJCX0NCj4gKwl9DQo+IMKgDQo+IMKgCXJldHVy
biBubG1jbG50X3Byb2MoTkZTX1NFUlZFUihpbm9kZSktPm5sbV9ob3N0LCBjbWQsIGZsKTsNCj4g
wqB9DQo+IGRpZmYgLS1naXQgYS9mcy9uZnMvbmZzNHByb2MuYyBiL2ZzL25mcy9uZnM0cHJvYy5j
DQo+IGluZGV4IDkxZjg4YmZiYmU3OS4uMDUyYjk3ZmQ2NTNkIDEwMDY0NA0KPiAtLS0gYS9mcy9u
ZnMvbmZzNHByb2MuYw0KPiArKysgYi9mcy9uZnMvbmZzNHByb2MuYw0KPiBAQCAtNTkwNiw2ICs1
OTA2LDcgQEAgc3RydWN0IG5mczRfdW5sb2NrZGF0YSB7DQo+IMKgCXN0cnVjdCBuZnNfbG9ja3Vf
cmVzIHJlczsNCj4gwqAJc3RydWN0IG5mczRfbG9ja19zdGF0ZSAqbHNwOw0KPiDCoAlzdHJ1Y3Qg
bmZzX29wZW5fY29udGV4dCAqY3R4Ow0KPiArCXN0cnVjdCBuZnNfbG9ja19jb250ZXh0ICpsX2N0
eDsNCj4gwqAJc3RydWN0IGZpbGVfbG9jayBmbDsNCj4gwqAJc3RydWN0IG5mc19zZXJ2ZXIgKnNl
cnZlcjsNCj4gwqAJdW5zaWduZWQgbG9uZyB0aW1lc3RhbXA7DQo+IEBAIC01OTMwLDYgKzU5MzEs
NyBAQCBzdGF0aWMgc3RydWN0IG5mczRfdW5sb2NrZGF0YQ0KPiAqbmZzNF9hbGxvY191bmxvY2tk
YXRhKHN0cnVjdCBmaWxlX2xvY2sgKmZsLA0KPiDCoAlhdG9taWNfaW5jKCZsc3AtPmxzX2NvdW50
KTsNCj4gwqAJLyogRW5zdXJlIHdlIGRvbid0IGNsb3NlIGZpbGUgdW50aWwgd2UncmUgZG9uZSBm
cmVlaW5nDQo+IGxvY2tzISAqLw0KPiDCoAlwLT5jdHggPSBnZXRfbmZzX29wZW5fY29udGV4dChj
dHgpOw0KPiArCXAtPmxfY3R4ID0gbmZzX2dldF9sb2NrX2NvbnRleHQoY3R4KTsNCj4gwqAJbWVt
Y3B5KCZwLT5mbCwgZmwsIHNpemVvZihwLT5mbCkpOw0KPiDCoAlwLT5zZXJ2ZXIgPSBORlNfU0VS
VkVSKGlub2RlKTsNCj4gwqAJcmV0dXJuIHA7DQo+IEBAIC01OTg4LDYgKzU5OTAsMTEgQEAgc3Rh
dGljIHZvaWQgbmZzNF9sb2NrdV9wcmVwYXJlKHN0cnVjdCBycGNfdGFzaw0KPiAqdGFzaywgdm9p
ZCAqZGF0YSkNCj4gwqAJCS8qIE5vdGU6IGV4aXQgX3dpdGhvdXRfIHJ1bm5pbmcgbmZzNF9sb2Nr
dV9kb25lICovDQo+IMKgCQlnb3RvIG91dF9ub19hY3Rpb247DQo+IMKgCX0NCj4gKw0KPiArCWlm
ICghSVNfRVJSKGNhbGxkYXRhLT5sX2N0eCkpIHsNCj4gKwkJbmZzX2lvY291bnRlcl93YWl0KGNh
bGxkYXRhLT5sX2N0eCk7DQo+ICsJCW5mc19wdXRfbG9ja19jb250ZXh0KGNhbGxkYXRhLT5sX2N0
eCk7DQo+ICsJfQ0KPiDCoAljYWxsZGF0YS0+dGltZXN0YW1wID0gamlmZmllczsNCj4gwqAJaWYg
KG5mczRfc2V0dXBfc2VxdWVuY2UoY2FsbGRhdGEtPnNlcnZlciwNCj4gwqAJCQkJJmNhbGxkYXRh
LT5hcmcuc2VxX2FyZ3MsDQo+IGRpZmYgLS1naXQgYS9mcy9uZnMvcGFnZWxpc3QuYyBiL2ZzL25m
cy9wYWdlbGlzdC5jDQo+IGluZGV4IDZlNjI5Yjg1NmEwMC4uOGJmM2NlZmRhYTk2IDEwMDY0NA0K
PiAtLS0gYS9mcy9uZnMvcGFnZWxpc3QuYw0KPiArKysgYi9mcy9uZnMvcGFnZWxpc3QuYw0KPiBA
QCAtMTE0LDYgKzExNCw3IEBAIG5mc19pb2NvdW50ZXJfd2FpdChzdHJ1Y3QgbmZzX2xvY2tfY29u
dGV4dA0KPiAqbF9jdHgpDQo+IMKgCXJldHVybiB3YWl0X29uX2F0b21pY190KCZsX2N0eC0+aW9f
Y291bnQsDQo+IG5mc193YWl0X2F0b21pY19raWxsYWJsZSwNCj4gwqAJCQlUQVNLX0tJTExBQkxF
KTsNCj4gwqB9DQo+ICtFWFBPUlRfU1lNQk9MKG5mc19pb2NvdW50ZXJfd2FpdCk7DQo+IMKgDQo+
IMKgLyoNCj4gwqAgKiBuZnNfcGFnZV9ncm91cF9sb2NrIC0gbG9jayB0aGUgaGVhZCBvZiB0aGUg
cGFnZSBncm91cA0KLS0gDQoNCg0KDQoJDQoJDQoNCg0KVHJvbmQgTXlrbGVidXN0DQpQcmluY2lw
YWwgU3lzdGVtIEFyY2hpdGVjdA0KNDMwMCBFbCBDYW1pbm8gUmVhbCB8IFN1aXRlIDEwMA0KTG9z
IEFsdG9zLCBDQcKgwqA5NDAyMg0KVzogNjUwLTQyMi0zODAwDQpDOiA4MDEtOTIxLTQ1ODPCoA0K
d3d3LnByaW1hcnlkYXRhLmNvbQ0KDQoNCg0KDQo=

2017-02-17 19:15:24

by Benjamin Coddington

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFS: Always wait for I/O completion before unlock

On 17 Feb 2017, at 14:00, Trond Myklebust wrote:

> On Fri, 2017-02-17 at 13:46 -0500, Benjamin Coddington wrote:
>> NFS attempts to wait for read and write completion before unlocking
>> in
>> order to ensure that the data returned was protected by the
>> lock.  When
>> this waiting is interrupted by a signal, the unlock may never be
>> sent, and
>> messages similar to the following are seen in the kernel ring buffer:
>>
>> [20.167876] Leaked locks on dev=0x0:0x2b ino=0x8dd4c3:
>> [20.168286] POSIX: fl_owner=ffff880078b06940 fl_flags=0x1 fl_type=0x0
>> fl_pid=20183
>> [20.168727] POSIX: fl_owner=ffff880078b06680 fl_flags=0x1 fl_type=0x0
>> fl_pid=20185
>>
>> For NFSv3, the missing unlock will cause the server to refuse
>> conflicting
>> locks indefinitely.  For NFSv4, the leftover lock will be removed by
>> the
>> server after the lease timeout.
>>
>> This patch fixes this for NFSv3 by skipping the wait in order to
>> immediately send the unlock if the FL_CLOSE flag is set when
>> signaled.  For
>> NFSv4, this approach may cause the server to see the I/O as arriving
>> with
>> an old stateid, so, for the v4 case the fix is different: the wait on
>> I/O
>> completion is moved into nfs4_locku_ops' rpc_call_prepare().  This
>> will
>> cause the sleep to happen in rpciod context, and a signal to the
>> originally
>> waiting process will not cause the unlock to be skipped.
>
> NACK. I/O waits in rpciod contexts are NOT acceptable. rpciod is part
> of the memory reclaim chain, so having it sleep on I/O is deadlock
> prone.
>
> Why is there a need to wait for I/O completion in the first place if
> the user has killed the task that held the lock? 'kill -9' will cause
> corruption; that's a fact that no amount of paper will cover over.

To avoid an unnecessary recovery situation where the server asks us to resend
I/O due to an invalid stateid.

I'm fine with skipping the wait on signaled FL_CLOSE if the that's an
acceptable trade-off. I can send a v3.

Ben

>>
>> Signed-off-by: Benjamin Coddington <[email protected]>
>> ---
>>  fs/nfs/file.c     | 13 -------------
>>  fs/nfs/nfs3proc.c | 13 +++++++++++++
>>  fs/nfs/nfs4proc.c |  7 +++++++
>>  fs/nfs/pagelist.c |  1 +
>>  4 files changed, 21 insertions(+), 13 deletions(-)
>>
>> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
>> index a490f45df4db..adc67fe762e3 100644
>> --- a/fs/nfs/file.c
>> +++ b/fs/nfs/file.c
>> @@ -684,7 +684,6 @@ static int
>>  do_unlk(struct file *filp, int cmd, struct file_lock *fl, int
>> is_local)
>>  {
>>   struct inode *inode = filp->f_mapping->host;
>> - struct nfs_lock_context *l_ctx;
>>   int status;
>>  
>>   /*
>> @@ -693,18 +692,6 @@ do_unlk(struct file *filp, int cmd, struct
>> file_lock *fl, int is_local)
>>    */
>>   vfs_fsync(filp, 0);
>>  
>> - l_ctx = nfs_get_lock_context(nfs_file_open_context(filp));
>> - if (!IS_ERR(l_ctx)) {
>> - status = nfs_iocounter_wait(l_ctx);
>> - nfs_put_lock_context(l_ctx);
>> - if (status < 0)
>> - return status;
>> - }
>> -
>> - /* NOTE: special case
>> -  *  If we're signalled while cleaning up locks on
>> process exit, we
>> -  *  still need to complete the unlock.
>> -  */
>>   /*
>>    * Use local locking if mounted with "-onolock" or with
>> appropriate
>>    * "-olocal_lock="
>> diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
>> index dc925b531f32..ec3f12571c82 100644
>> --- a/fs/nfs/nfs3proc.c
>> +++ b/fs/nfs/nfs3proc.c
>> @@ -869,6 +869,19 @@ static int
>>  nfs3_proc_lock(struct file *filp, int cmd, struct file_lock *fl)
>>  {
>>   struct inode *inode = file_inode(filp);
>> + int status;
>> + struct nfs_lock_context *l_ctx;
>> +
>> + /* For v3, always send the unlock on FL_CLOSE */
>> + if (fl->fl_type == F_UNLCK) {
>> + l_ctx =
>> nfs_get_lock_context(nfs_file_open_context(filp));
>> + if (!IS_ERR(l_ctx)) {
>> + status = nfs_iocounter_wait(l_ctx);
>> + nfs_put_lock_context(l_ctx);
>> + if (status < 0 && !(fl->fl_flags &
>> FL_CLOSE))
>> + return status;
>> + }
>> + }
>>  
>>   return nlmclnt_proc(NFS_SERVER(inode)->nlm_host, cmd, fl);
>>  }
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index 91f88bfbbe79..052b97fd653d 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -5906,6 +5906,7 @@ struct nfs4_unlockdata {
>>   struct nfs_locku_res res;
>>   struct nfs4_lock_state *lsp;
>>   struct nfs_open_context *ctx;
>> + struct nfs_lock_context *l_ctx;
>>   struct file_lock fl;
>>   struct nfs_server *server;
>>   unsigned long timestamp;
>> @@ -5930,6 +5931,7 @@ static struct nfs4_unlockdata
>> *nfs4_alloc_unlockdata(struct file_lock *fl,
>>   atomic_inc(&lsp->ls_count);
>>   /* Ensure we don't close file until we're done freeing
>> locks! */
>>   p->ctx = get_nfs_open_context(ctx);
>> + p->l_ctx = nfs_get_lock_context(ctx);
>>   memcpy(&p->fl, fl, sizeof(p->fl));
>>   p->server = NFS_SERVER(inode);
>>   return p;
>> @@ -5988,6 +5990,11 @@ static void nfs4_locku_prepare(struct rpc_task
>> *task, void *data)
>>   /* Note: exit _without_ running nfs4_locku_done */
>>   goto out_no_action;
>>   }
>> +
>> + if (!IS_ERR(calldata->l_ctx)) {
>> + nfs_iocounter_wait(calldata->l_ctx);
>> + nfs_put_lock_context(calldata->l_ctx);
>> + }
>>   calldata->timestamp = jiffies;
>>   if (nfs4_setup_sequence(calldata->server,
>>   &calldata->arg.seq_args,
>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>> index 6e629b856a00..8bf3cefdaa96 100644
>> --- a/fs/nfs/pagelist.c
>> +++ b/fs/nfs/pagelist.c
>> @@ -114,6 +114,7 @@ nfs_iocounter_wait(struct nfs_lock_context
>> *l_ctx)
>>   return wait_on_atomic_t(&l_ctx->io_count,
>> nfs_wait_atomic_killable,
>>   TASK_KILLABLE);
>>  }
>> +EXPORT_SYMBOL(nfs_iocounter_wait);
>>  
>>  /*
>>   * nfs_page_group_lock - lock the head of the page group
> --
>
>
>
>
>
>
>
> Trond Myklebust
> Principal System Architect
> 4300 El Camino Real | Suite 100
> Los Altos, CA  94022
> W: 650-422-3800
> C: 801-921-4583 
> http://www.primarydata.com

2017-02-17 19:30:52

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFS: Always wait for I/O completion before unlock

T24gRnJpLCAyMDE3LTAyLTE3IGF0IDE0OjE1IC0wNTAwLCBCZW5qYW1pbiBDb2RkaW5ndG9uIHdy
b3RlOg0KPiBPbiAxNyBGZWIgMjAxNywgYXQgMTQ6MDAsIFRyb25kIE15a2xlYnVzdCB3cm90ZToN
Cj4gDQo+ID4gT24gRnJpLCAyMDE3LTAyLTE3IGF0IDEzOjQ2IC0wNTAwLCBCZW5qYW1pbiBDb2Rk
aW5ndG9uIHdyb3RlOg0KPiA+ID4gTkZTIGF0dGVtcHRzIHRvIHdhaXQgZm9yIHJlYWQgYW5kIHdy
aXRlIGNvbXBsZXRpb24gYmVmb3JlDQo+ID4gPiB1bmxvY2tpbmcNCj4gPiA+IGluDQo+ID4gPiBv
cmRlciB0byBlbnN1cmUgdGhhdCB0aGUgZGF0YSByZXR1cm5lZCB3YXMgcHJvdGVjdGVkIGJ5IHRo
ZQ0KPiA+ID4gbG9jay7CoMKgV2hlbg0KPiA+ID4gdGhpcyB3YWl0aW5nIGlzIGludGVycnVwdGVk
IGJ5IGEgc2lnbmFsLCB0aGUgdW5sb2NrIG1heSBuZXZlciBiZQ0KPiA+ID4gc2VudCwgYW5kDQo+
ID4gPiBtZXNzYWdlcyBzaW1pbGFyIHRvIHRoZSBmb2xsb3dpbmcgYXJlIHNlZW4gaW4gdGhlIGtl
cm5lbCByaW5nDQo+ID4gPiBidWZmZXI6DQo+ID4gPiANCj4gPiA+IFsyMC4xNjc4NzZdIExlYWtl
ZCBsb2NrcyBvbiBkZXY9MHgwOjB4MmIgaW5vPTB4OGRkNGMzOg0KPiA+ID4gWzIwLjE2ODI4Nl0g
UE9TSVg6IGZsX293bmVyPWZmZmY4ODAwNzhiMDY5NDAgZmxfZmxhZ3M9MHgxDQo+ID4gPiBmbF90
eXBlPTB4MA0KPiA+ID4gZmxfcGlkPTIwMTgzDQo+ID4gPiBbMjAuMTY4NzI3XSBQT1NJWDogZmxf
b3duZXI9ZmZmZjg4MDA3OGIwNjY4MCBmbF9mbGFncz0weDENCj4gPiA+IGZsX3R5cGU9MHgwDQo+
ID4gPiBmbF9waWQ9MjAxODUNCj4gPiA+IA0KPiA+ID4gRm9yIE5GU3YzLCB0aGUgbWlzc2luZyB1
bmxvY2sgd2lsbCBjYXVzZSB0aGUgc2VydmVyIHRvIHJlZnVzZQ0KPiA+ID4gY29uZmxpY3RpbmcN
Cj4gPiA+IGxvY2tzIGluZGVmaW5pdGVseS7CoMKgRm9yIE5GU3Y0LCB0aGUgbGVmdG92ZXIgbG9j
ayB3aWxsIGJlIHJlbW92ZWQNCj4gPiA+IGJ5DQo+ID4gPiB0aGUNCj4gPiA+IHNlcnZlciBhZnRl
ciB0aGUgbGVhc2UgdGltZW91dC4NCj4gPiA+IA0KPiA+ID4gVGhpcyBwYXRjaCBmaXhlcyB0aGlz
IGZvciBORlN2MyBieSBza2lwcGluZyB0aGUgd2FpdCBpbiBvcmRlciB0bw0KPiA+ID4gaW1tZWRp
YXRlbHkgc2VuZCB0aGUgdW5sb2NrIGlmIHRoZSBGTF9DTE9TRSBmbGFnIGlzIHNldCB3aGVuDQo+
ID4gPiBzaWduYWxlZC7CoMKgRm9yDQo+ID4gPiBORlN2NCwgdGhpcyBhcHByb2FjaCBtYXkgY2F1
c2UgdGhlIHNlcnZlciB0byBzZWUgdGhlIEkvTyBhcw0KPiA+ID4gYXJyaXZpbmcNCj4gPiA+IHdp
dGgNCj4gPiA+IGFuIG9sZCBzdGF0ZWlkLCBzbywgZm9yIHRoZSB2NCBjYXNlIHRoZSBmaXggaXMg
ZGlmZmVyZW50OiB0aGUNCj4gPiA+IHdhaXQgb24NCj4gPiA+IEkvTw0KPiA+ID4gY29tcGxldGlv
biBpcyBtb3ZlZCBpbnRvIG5mczRfbG9ja3Vfb3BzJw0KPiA+ID4gcnBjX2NhbGxfcHJlcGFyZSgp
LsKgwqBUaGlzDQo+ID4gPiB3aWxsDQo+ID4gPiBjYXVzZSB0aGUgc2xlZXAgdG8gaGFwcGVuIGlu
IHJwY2lvZCBjb250ZXh0LCBhbmQgYSBzaWduYWwgdG8gdGhlDQo+ID4gPiBvcmlnaW5hbGx5DQo+
ID4gPiB3YWl0aW5nIHByb2Nlc3Mgd2lsbCBub3QgY2F1c2UgdGhlIHVubG9jayB0byBiZSBza2lw
cGVkLg0KPiA+IA0KPiA+IE5BQ0suIEkvTyB3YWl0cyBpbiBycGNpb2QgY29udGV4dHMgYXJlIE5P
VCBhY2NlcHRhYmxlLiBycGNpb2QgaXMNCj4gPiBwYXJ0DQo+ID4gb2YgdGhlIG1lbW9yeSByZWNs
YWltIGNoYWluLCBzbyBoYXZpbmcgaXQgc2xlZXAgb24gSS9PIGlzIGRlYWRsb2NrDQo+ID4gcHJv
bmUuDQo+ID4gDQo+ID4gV2h5IGlzIHRoZXJlIGEgbmVlZCB0byB3YWl0IGZvciBJL08gY29tcGxl
dGlvbiBpbiB0aGUgZmlyc3QgcGxhY2UNCj4gPiBpZg0KPiA+IHRoZSB1c2VyIGhhcyBraWxsZWQg
dGhlIHRhc2sgdGhhdCBoZWxkIHRoZSBsb2NrPyAna2lsbCAtOScgd2lsbA0KPiA+IGNhdXNlDQo+
ID4gY29ycnVwdGlvbjsgdGhhdCdzIGEgZmFjdCB0aGF0IG5vIGFtb3VudCBvZiBwYXBlciB3aWxs
IGNvdmVyIG92ZXIuDQo+IA0KPiBUbyBhdm9pZCBhbiB1bm5lY2Vzc2FyeSByZWNvdmVyeSBzaXR1
YXRpb24gd2hlcmUgdGhlIHNlcnZlciBhc2tzIHVzDQo+IHRvIHJlc2VuZA0KPiBJL08gZHVlIHRv
IGFuIGludmFsaWQgc3RhdGVpZC4NCj4gDQoNCkkgYWdyZWUgd2Ugc2hvdWxkbid0IHJlY292ZXIg
aW4gdGhpcyBzaXR1YXRpb24uIEl0IHdvdWxkIGJlIGJldHRlciB0bw0KamV0dGlzb24gdGhlIGZh
aWxlZCB3cml0ZSwgYW5kIGludmFsaWRhdGUgdGhlIHBhZ2UuIENhbiB3ZSBtYWtlIHVzZSBvZg0K
bmZzX3diX3BhZ2VfY2FuY2VsKCkgdG9nZXRoZXIgd2l0aCBnZW5lcmljX2Vycm9yX3JlbW92ZV9w
YWdlKCk/DQoNCi0tIA0KVHJvbmQgTXlrbGVidXN0DQpMaW51eCBORlMgY2xpZW50IG1haW50YWlu
ZXIsIFByaW1hcnlEYXRhDQp0cm9uZC5teWtsZWJ1c3RAcHJpbWFyeWRhdGEuY29tDQo=

2017-02-17 20:10:56

by Benjamin Coddington

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFS: Always wait for I/O completion before unlock

On 17 Feb 2017, at 14:30, Trond Myklebust wrote:
> On Fri, 2017-02-17 at 14:15 -0500, Benjamin Coddington wrote:
>> To avoid an unnecessary recovery situation where the server asks us to
>> resend I/O due to an invalid stateid.
>
> I agree we shouldn't recover in this situation. It would be better to
> jettison the failed write, and invalidate the page. Can we make use of
> nfs_wb_page_cancel() together with generic_error_remove_page()?

I'll take a look..

Ben

2017-02-21 13:56:30

by Benjamin Coddington

[permalink] [raw]
Subject: Re: [PATCH 4/4] NFS: Always wait for I/O completion before unlock

On 17 Feb 2017, at 14:30, Trond Myklebust wrote:

> On Fri, 2017-02-17 at 14:15 -0500, Benjamin Coddington wrote:
>> On 17 Feb 2017, at 14:00, Trond Myklebust wrote:
>>
>>> On Fri, 2017-02-17 at 13:46 -0500, Benjamin Coddington wrote:
>>>> NFS attempts to wait for read and write completion before
>>>> unlocking
>>>> in
>>>> order to ensure that the data returned was protected by the
>>>> lock.  When
>>>> this waiting is interrupted by a signal, the unlock may never be
>>>> sent, and
>>>> messages similar to the following are seen in the kernel ring
>>>> buffer:
>>>>
>>>> [20.167876] Leaked locks on dev=0x0:0x2b ino=0x8dd4c3:
>>>> [20.168286] POSIX: fl_owner=ffff880078b06940 fl_flags=0x1
>>>> fl_type=0x0
>>>> fl_pid=20183
>>>> [20.168727] POSIX: fl_owner=ffff880078b06680 fl_flags=0x1
>>>> fl_type=0x0
>>>> fl_pid=20185
>>>>
>>>> For NFSv3, the missing unlock will cause the server to refuse
>>>> conflicting
>>>> locks indefinitely.  For NFSv4, the leftover lock will be removed
>>>> by
>>>> the
>>>> server after the lease timeout.
>>>>
>>>> This patch fixes this for NFSv3 by skipping the wait in order to
>>>> immediately send the unlock if the FL_CLOSE flag is set when
>>>> signaled.  For
>>>> NFSv4, this approach may cause the server to see the I/O as
>>>> arriving
>>>> with
>>>> an old stateid, so, for the v4 case the fix is different: the
>>>> wait on
>>>> I/O
>>>> completion is moved into nfs4_locku_ops'
>>>> rpc_call_prepare().  This
>>>> will
>>>> cause the sleep to happen in rpciod context, and a signal to the
>>>> originally
>>>> waiting process will not cause the unlock to be skipped.
>>>
>>> NACK. I/O waits in rpciod contexts are NOT acceptable. rpciod is
>>> part
>>> of the memory reclaim chain, so having it sleep on I/O is deadlock
>>> prone.
>>>
>>> Why is there a need to wait for I/O completion in the first place
>>> if
>>> the user has killed the task that held the lock? 'kill -9' will
>>> cause
>>> corruption; that's a fact that no amount of paper will cover over.
>>
>> To avoid an unnecessary recovery situation where the server asks us
>> to resend
>> I/O due to an invalid stateid.
>>
>
> I agree we shouldn't recover in this situation. It would be better to
> jettison the failed write, and invalidate the page. Can we make use of
> nfs_wb_page_cancel() together with generic_error_remove_page()?

Probably we can piggy-back on NFS_LOCK_LOST, then -EIO would get passed up
and the page would make it into generic_error_remove_page(). Any
outstanding writes are likely already transmitted or scheduled to be
tranmitted by now, and the error recovery path for incorrect stateids
doesn't intersect with nfs_wb_page_cancel(), rather it re-schedules the RPC.

But, after looking at this further, I'm not sure how much work should be
done here. It's a fairly unlikely situation already, and if we assert that
a fatal signal means writes don't have to complete at all, I don't see the
harm in having them complete outside the lock. Adding extra complexity to
bypass recovery for this specific situation would be optimal, but
unnecessary.

Ben