2017-08-18 07:13:35

by NeilBrown

Subject: [PATCH 0/8] Assorted NFS patches.

Hi,
this is an ad-hoc collection of NFS patches that have been
waiting for me to act on.

Some have been posted to the list before but didn't get a response
(maybe people were busy). A couple have been revised a little.
Some are new.

Some fix real bugs (hanging mounts, unflushed data, wrong
credentials for CLOSE), others are just clean-ups.

Thanks,
NeilBrown


---

NeilBrown (8):
SUNRPC: ECONNREFUSED should cause a rebind.
NFSv4: don't let hanging mounts block other mounts
NFS: don't expect errors from mempool_alloc().
NFS: flush out dirty data on file fput().
SUNRPC: remove some dead code.
NFS: flush data when locking a file to ensure cache coherence for mmap.
NFS: remove error handling for callers of rpc_new_task()
NFSv4.1: don't use machine credentials for CLOSE when using 'sec=sys'


fs/lockd/clntproc.c | 4 ---
fs/nfs/file.c | 17 ++++++++---
fs/nfs/nfs42proc.c | 2 -
fs/nfs/nfs4_fs.h | 11 +++++++
fs/nfs/nfs4client.c | 2 +
fs/nfs/nfs4proc.c | 62 ++++------------------------------------
fs/nfs/pagelist.c | 5 ---
fs/nfs/unlink.c | 9 +-----
fs/nfs/write.c | 8 +----
net/sunrpc/auth_gss/auth_gss.c | 3 +-
net/sunrpc/clnt.c | 22 +++++---------
net/sunrpc/rpcb_clnt.c | 6 ----
net/sunrpc/svc.c | 6 ----
13 files changed, 43 insertions(+), 114 deletions(-)

--
Signature



2017-08-18 07:13:47

by NeilBrown

Subject: [PATCH 2/8] NFSv4: don't let hanging mounts block other mounts

If you try an NFSv4 mount from an inaccessible server, it will hang as
you would expect.
If you then try an NFSv4 mount from a different accessible server,
it will also hang. This is not expected.

The second mount is blocked in
nfs4_init_client()
-> nfs4_discover_server_trunking()
-> nfs40_discover_server_trunking()
-> nfs40_walk_client_list()
-> nfs4_match_client()
-> nfs_wait_client_init_complete()
It is waiting for the first mount to complete so that it can then
see if the two servers are really one and the same.

It is not necessary to wait here when an nfs_client cl_cons_state is
NFS_CS_INITING. Such a client will, after changing cl_cons_state, call
nfs4_discover_server_trunking() itself. So if the current client just
skips those clients, trunking will still be discovered if necessary.

I am unsure of the situation with NFS_CS_SESSION_INITING, but I suspect
that the comment "Wait for CREATE_SESSION to finish" implies that
it is only clients in NFS_CS_SESSION_INITING that need to be waited for.

Signed-off-by: NeilBrown <[email protected]>
---
fs/nfs/nfs4client.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index e9bea90dc017..d8b9b7ff19a9 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -482,7 +482,7 @@ static int nfs4_match_client(struct nfs_client *pos, struct nfs_client *new,
* remaining fields in "pos", especially the client
* ID and serverowner fields. Wait for CREATE_SESSION
* to finish. */
- if (pos->cl_cons_state > NFS_CS_READY) {
+ if (pos->cl_cons_state == NFS_CS_SESSION_INITING) {
atomic_inc(&pos->cl_count);
spin_unlock(&nn->nfs_client_lock);
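
For reference (not part of the patch), a sketch of the cl_cons_state
values involved, as defined in include/linux/nfs_fs_sb.h in this era:

/* The old test 'cl_cons_state > NFS_CS_READY' waited for clients in
 * either initialising state; the patched test waits only for clients
 * that are part-way through CREATE_SESSION.
 */
enum {
	NFS_CS_READY = 0,		/* ready to be used */
	NFS_CS_INITING = 1,		/* busy initialising */
	NFS_CS_SESSION_INITING = 2,	/* busy initialising session */
};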




2017-08-18 07:13:41

by NeilBrown

Subject: [PATCH 1/8] SUNRPC: ECONNREFUSED should cause a rebind.

If you
- mount an NFSv3 filesystem
- do some file locking which requires the server
to make a GRANT call back
- unmount
- mount again and do the same locking

then the second attempt at locking suffers a 30 second delay.
Unmounting and remounting causes lockd to stop and restart,
which causes it to bind to a new port.
The server still thinks the old port is valid and gets ECONNREFUSED
when trying to contact it.
ECONNREFUSED should be seen as a hard error that is not worth
retrying. Rebinding is the only reasonable response.

This patch forces a rebind if that makes sense.

Signed-off-by: NeilBrown <[email protected]>
---
net/sunrpc/clnt.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 2e49d1f892b7..69a9e5953744 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1903,6 +1903,14 @@ call_connect_status(struct rpc_task *task)
task->tk_status = 0;
switch (status) {
case -ECONNREFUSED:
+ /* A positive refusal suggests a rebind is needed. */
+ if (RPC_IS_SOFTCONN(task))
+ break;
+ if (clnt->cl_autobind) {
+ rpc_force_rebind(clnt);
+ task->tk_action = call_bind;
+ return;
+ }
case -ECONNRESET:
case -ECONNABORTED:
case -ENETUNREACH:
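
For context, rpc_force_rebind() only marks the transport as unbound on
autobinding clients; a simplified sketch of its effect (not the exact
upstream source):

/* Simplified sketch: clearing XPRT_BOUND makes the next call_bind
 * step query rpcbind again instead of reusing the stale port that
 * lockd was bound to before the remount.
 */
void rpc_force_rebind(struct rpc_clnt *clnt)
{
	if (clnt->cl_autobind)
		xprt_clear_bound(clnt->cl_xprt);
}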



2017-08-18 07:13:53

by NeilBrown

Subject: [PATCH 3/8] NFS: don't expect errors from mempool_alloc().

Commit fbe77c30e9ab ("NFS: move rw_mode to nfs_pageio_header")
reintroduced some pointless code that commit 518662e0fcb9 ("NFS: fix
usage of mempools.") had recently removed.

Remove it again.

Cc: Benjamin Coddington <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
---
fs/nfs/write.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index b1af5dee5e0a..e833179d0d49 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -102,10 +102,8 @@ static struct nfs_pgio_header *nfs_writehdr_alloc(void)
{
struct nfs_pgio_header *p = mempool_alloc(nfs_wdata_mempool, GFP_NOIO);

- if (p) {
- memset(p, 0, sizeof(*p));
- p->rw_mode = FMODE_WRITE;
- }
+ memset(p, 0, sizeof(*p));
+ p->rw_mode = FMODE_WRITE;
return p;
}
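
For background (not from the patch): mempool_alloc() with a gfp mask
that allows direct reclaim, such as GFP_NOIO, sleeps until a pool
element is available rather than failing, which is what makes the NULL
check dead code.  A minimal sketch of that guarantee, using a
hypothetical pool and element type:

#include <linux/mempool.h>
#include <linux/string.h>

/* 'example_hdr' and 'example_pool' are hypothetical. */
struct example_hdr { unsigned int flags; };
static mempool_t *example_pool;

static struct example_hdr *example_alloc(void)
{
	struct example_hdr *p = mempool_alloc(example_pool, GFP_NOIO);

	/* No NULL check: the allocation either succeeded outright or
	 * slept until a reserved element returned to the pool. */
	memset(p, 0, sizeof(*p));
	return p;
}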




2017-08-18 07:13:59

by NeilBrown

Subject: [PATCH 4/8] NFS: flush out dirty data on file fput().

Any dirty NFS page holds an s_active reference on the superblock,
because page_private() references an nfs_page, which references an
open context, which references the superblock.

So if there are any dirty pages when the filesystem is unmounted, the
unmount will act like a "lazy" unmount and not call ->kill_sb().
Background write-back can then write out the pages *after* the filesystem
unmount has apparently completed.

This contrasts with other filesystems which do not hold extra s_active
references, so ->kill_sb() is reliably called on unmount, and
generic_shutdown_super() will call sync_filesystem() to flush
everything out before the unmount completes.

When open/write/close is used to modify files, the final close causes
f_op->flush to be called, which flushes all dirty pages. However if
open/mmap/close/modify-memory/unmap is used, dirty pages can remain in
memory after the application has dropped all references to the file.
Similarly if a loop-mount is done with an NFS file, there is no final
flush when the loop device is destroyed and the file can have dirty
data after the last "close".

Also, a loop-back mount of a device does not "close" the file when the
loop device is destroyed. This can leave dirty page cache pages too.

Fix this by calling vfs_fsync() in nfs_file_release (aka
f_op->release()). This means that on the final unmap of a file (or
destruction of a loop device), all changes are flushed, and ensures that
when unmount is requested there will be no dirty pages to delay the
final unmount.

Without this patch, it is not safe to stop or disconnect the NFS
server after all clients have unmounted. They need to unmount and
call "sync".

Signed-off-by: NeilBrown <[email protected]>
---
fs/nfs/file.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index af330c31f627..aa883d8b24e6 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -81,6 +81,12 @@ nfs_file_release(struct inode *inode, struct file *filp)
{
dprintk("NFS: release(%pD2)\n", filp);

+ if (filp->f_mode & FMODE_WRITE)
+ /* Ensure dirty mmapped pages are flushed
+ * so there will be no dirty pages to
+ * prevent an unmount from completing.
+ */
+ vfs_fsync(filp, 0);
nfs_inc_stats(inode, NFSIOS_VFSRELEASE);
nfs_file_clear_open_context(filp);
return 0;
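
To illustrate the failure mode, a hypothetical user-space reproducer
(not from the thread; the mount path is made up):

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/mnt/nfs/testfile", O_RDWR);	/* hypothetical path */
	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_SHARED, fd, 0);

	memcpy(p, "dirty", 5);	/* dirties a page cache page */
	munmap(p, 4096);	/* no msync(): the page stays dirty */
	close(fd);		/* with the patch, f_op->release flushes here */
	return 0;
}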



2017-08-18 07:14:05

by NeilBrown

Subject: [PATCH 5/8] SUNRPC: remove some dead code.

RPC_TASK_NO_RETRANS_TIMEOUT is set when cl_noretranstimeo
is set, which happens when RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT is set,
which happens when NFS_CS_NO_RETRANS_TIMEOUT is set.

This flag means "don't resend on a timeout, only resend if the
connection gets broken for some reason".

cl_discrtry is set when RPC_CLNT_CREATE_DISCRTRY is set, which
happens when NFS_CS_DISCRTRY is set.

This flag means "always disconnect before resending".

NFS_CS_NO_RETRANS_TIMEOUT and NFS_CS_DISCRTRY are both only set
in nfs4_init_client(), and it always sets both.

So we will never have a situation where only one of the flags is set.
So this code, which tests if timeout retransmits are allowed, and
disconnection is required, will never run.

So it makes sense to remove this code as it cannot be tested and
could confuse people reading the code (like me).

(alternatively we could leave it there with a comment saying
it is never actually used).

Signed-off-by: NeilBrown <[email protected]>
---
net/sunrpc/clnt.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 69a9e5953744..2ad827db2704 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2147,10 +2147,6 @@ call_status(struct rpc_task *task)
rpc_delay(task, 3*HZ);
case -ETIMEDOUT:
task->tk_action = call_timeout;
- if (!(task->tk_flags & RPC_TASK_NO_RETRANS_TIMEOUT)
- && task->tk_client->cl_discrtry)
- xprt_conditional_disconnect(req->rq_xprt,
- req->rq_connect_cookie);
break;
case -ECONNREFUSED:
case -ECONNRESET:



2017-08-18 07:14:11

by NeilBrown

Subject: [PATCH 6/8] NFS: flush data when locking a file to ensure cache coherence for mmap.

When a byte range lock (or flock) is taken out on an NFS file, the
validity of the cached data is checked and the inode is marked
NFS_INODE_INVALID_DATA. However the cached data isn't flushed from
the page cache.

This is sufficient for future read() requests or mmap() requests as
they call nfs_revalidate_mapping() which performs the flush if
necessary.

However an existing mapping is not affected. Accessing data through
that mapping will continue to return old data even though the inode is
marked NFS_INODE_INVALID_DATA.

This can easily be confirmed using the 'nfs' tool in
git://github.com/okirch/twopence-nfs.git
and running

nfs coherence FILENAME
on one client, and
nfs coherence -r FILENAME
on another client.

It appears that prior to Linux 2.6.0 this worked correctly.

However commit:

http://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/?id=ca9268fe3ddd075714005adecd4afbd7f9ab87d0

removed the call to invalidate_inode_pages() from nfs_zap_caches(). I
haven't tested this code, but inspection suggests that prior to this
commit, file locking would invalidate all inode pages.

This patch adds a call to nfs_revalidate_mapping() after a
successful SETLK so that invalid data is flushed. With this patch the
above test passes. To minimize impact (and possibly avoid a GETATTR
call) this only happens if the mapping might be mapped into
userspace.

Cc: Olaf Kirch <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
---
fs/nfs/file.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index aa883d8b24e6..d67cd3dbfa67 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -750,15 +750,18 @@ do_setlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
goto out;

/*
- * Revalidate the cache if the server has time stamps granular
- * enough to detect subsecond changes. Otherwise, clear the
- * cache to prevent missing any changes.
+ * Invalidate cache to prevent missing any changes. If
+ * the file is mapped, clear the page cache as well so
+ * those mappings will be loaded.
*
* This makes locking act as a cache coherency point.
*/
nfs_sync_mapping(filp->f_mapping);
- if (!NFS_PROTO(inode)->have_delegation(inode, FMODE_READ))
+ if (!NFS_PROTO(inode)->have_delegation(inode, FMODE_READ)) {
nfs_zap_caches(inode);
+ if (mapping_mapped(filp->f_mapping))
+ nfs_revalidate_mapping(inode, filp->f_mapping);
+ }
out:
return status;
}
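
The behaviour being fixed can be pictured with a reader that holds a
long-lived mapping and takes a lock before each read; a hedged
user-space sketch (not the 'nfs' tool itself; the path is made up):

#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/mnt/nfs/shared", O_RDWR);	/* hypothetical path */
	char *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);

	flock(fd, LOCK_EX);	/* locking is the cache coherency point */
	/* With the patch, the pages backing 'p' are invalidated at lock
	 * time, so this read sees writes made by another client. */
	printf("%.16s\n", p);
	flock(fd, LOCK_UN);
	munmap(p, 4096);
	close(fd);
	return 0;
}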



2017-08-18 07:14:19

by NeilBrown

Subject: [PATCH 7/8] NFS: remove error handling for callers of rpc_new_task()

Since commit 62b2417e84ba ("sunrpc: don't check for failure from
mempool_alloc()"), rpc_new_task() cannot fail, so functions that fail
only when it fails also cannot fail.
This means we no longer need to check for errors from:

rpc_run_task()
rpc_run_bc_task()
__nlm_async_call()
nfs4_do_unlck()
_nfs41_proc_sequence()
_nfs41_free_stateid()
nfs_async_rename()
rpc_call_null()
rpc_call_null_helper()
rpcb_call_async()

Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: NeilBrown <[email protected]>


Hi,
if you think there might ever be a need to allow
rpc_new_task() to fail in the future, we might not
want to take this approach.


Signed-off-by: NeilBrown <[email protected]>
---
fs/lockd/clntproc.c | 4 ---
fs/nfs/nfs42proc.c | 2 -
fs/nfs/nfs4proc.c | 62 ++++------------------------------------
fs/nfs/pagelist.c | 5 ---
fs/nfs/unlink.c | 9 +-----
fs/nfs/write.c | 2 -
net/sunrpc/auth_gss/auth_gss.c | 3 +-
net/sunrpc/clnt.c | 10 ------
net/sunrpc/rpcb_clnt.c | 6 ----
net/sunrpc/svc.c | 6 ----
10 files changed, 8 insertions(+), 101 deletions(-)

diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index 066ac313ae5c..fefed4a8d9c8 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -367,8 +367,6 @@ static int nlm_do_async_call(struct nlm_rqst *req, u32 proc, struct rpc_message
struct rpc_task *task;

task = __nlm_async_call(req, proc, msg, tk_ops);
- if (IS_ERR(task))
- return PTR_ERR(task);
rpc_put_task(task);
return 0;
}
@@ -412,8 +410,6 @@ static int nlmclnt_async_call(struct rpc_cred *cred, struct nlm_rqst *req, u32 p
int err;

task = __nlm_async_call(req, proc, &msg, tk_ops);
- if (IS_ERR(task))
- return PTR_ERR(task);
err = rpc_wait_for_completion_task(task);
rpc_put_task(task);
return err;
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 6c2db51e67a7..0062a73973ee 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -463,8 +463,6 @@ int nfs42_proc_layoutstats_generic(struct nfs_server *server,
}
nfs4_init_sequence(&data->args.seq_args, &data->res.seq_res, 0);
task = rpc_run_task(&task_setup);
- if (IS_ERR(task))
- return PTR_ERR(task);
rpc_put_task(task);
return 0;
}
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index d90132642340..df4a8724e85d 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -977,12 +977,8 @@ static int nfs4_call_sync_sequence(struct rpc_clnt *clnt,
};

task = rpc_run_task(&task_setup);
- if (IS_ERR(task))
- ret = PTR_ERR(task);
- else {
- ret = task->tk_status;
- rpc_put_task(task);
- }
+ ret = task->tk_status;
+ rpc_put_task(task);
return ret;
}

@@ -2024,8 +2020,6 @@ static int _nfs4_proc_open_confirm(struct nfs4_opendata *data)
if (data->is_recover)
nfs4_set_sequence_privileged(&data->c_arg.seq_args);
task = rpc_run_task(&task_setup_data);
- if (IS_ERR(task))
- return PTR_ERR(task);
status = rpc_wait_for_completion_task(task);
if (status != 0) {
data->cancelled = true;
@@ -2191,8 +2185,6 @@ static int nfs4_run_open_task(struct nfs4_opendata *data, int isrecover)
data->is_recover = true;
}
task = rpc_run_task(&task_setup_data);
- if (IS_ERR(task))
- return PTR_ERR(task);
status = rpc_wait_for_completion_task(task);
if (status != 0) {
data->cancelled = true;
@@ -3213,8 +3205,6 @@ int nfs4_do_close(struct nfs4_state *state, gfp_t gfp_mask, int wait)
msg.rpc_resp = &calldata->res;
task_setup_data.callback_data = calldata;
task = rpc_run_task(&task_setup_data);
- if (IS_ERR(task))
- return PTR_ERR(task);
status = 0;
if (wait)
status = rpc_wait_for_completion_task(task);
@@ -5536,10 +5526,6 @@ int nfs4_proc_setclientid(struct nfs_client *clp, u32 program,
clp->cl_rpcclient->cl_auth->au_ops->au_name,
clp->cl_owner_id);
task = rpc_run_task(&task_setup_data);
- if (IS_ERR(task)) {
- status = PTR_ERR(task);
- goto out;
- }
status = task->tk_status;
if (setclientid.sc_cred) {
clp->cl_acceptor = rpcauth_stringify_acceptor(setclientid.sc_cred);
@@ -5755,8 +5741,6 @@ static int _nfs4_proc_delegreturn(struct inode *inode, struct rpc_cred *cred, co
msg.rpc_argp = &data->args;
msg.rpc_resp = &data->res;
task = rpc_run_task(&task_setup_data);
- if (IS_ERR(task))
- return PTR_ERR(task);
if (!issync)
goto out;
status = rpc_wait_for_completion_task(task);
@@ -6035,9 +6019,6 @@ static int nfs4_proc_unlck(struct nfs4_state *state, int cmd, struct file_lock *
if (IS_ERR(seqid))
goto out;
task = nfs4_do_unlck(request, nfs_file_open_context(request->fl_file), lsp, seqid);
- status = PTR_ERR(task);
- if (IS_ERR(task))
- goto out;
status = rpc_wait_for_completion_task(task);
rpc_put_task(task);
out:
@@ -6193,8 +6174,7 @@ static void nfs4_lock_release(void *calldata)
struct rpc_task *task;
task = nfs4_do_unlck(&data->fl, data->ctx, data->lsp,
data->arg.lock_seqid);
- if (!IS_ERR(task))
- rpc_put_task_async(task);
+ rpc_put_task_async(task);
dprintk("%s: cancelling lock!\n", __func__);
} else
nfs_free_seqid(data->arg.lock_seqid);
@@ -6263,8 +6243,6 @@ static int _nfs4_do_setlk(struct nfs4_state *state, int cmd, struct file_lock *f
} else
data->arg.new_lock = 1;
task = rpc_run_task(&task_setup_data);
- if (IS_ERR(task))
- return PTR_ERR(task);
ret = rpc_wait_for_completion_task(task);
if (ret == 0) {
ret = data->rpc_status;
@@ -7222,11 +7200,8 @@ int nfs4_proc_bind_one_conn_to_session(struct rpc_clnt *clnt,
args.dir = NFS4_CDFC4_FORE;

task = rpc_run_task(&task_setup_data);
- if (!IS_ERR(task)) {
- status = task->tk_status;
- rpc_put_task(task);
- } else
- status = PTR_ERR(task);
+ status = task->tk_status;
+ rpc_put_task(task);
trace_nfs4_bind_conn_to_session(clp, status);
if (status == 0) {
if (memcmp(res.sessionid.data,
@@ -7573,8 +7548,6 @@ static int _nfs4_proc_exchange_id(struct nfs_client *clp, struct rpc_cred *cred,
task_setup_data.callback_data = calldata;

task = rpc_run_task(&task_setup_data);
- if (IS_ERR(task))
- return PTR_ERR(task);

status = calldata->rpc_status;

@@ -7799,9 +7772,6 @@ int nfs4_proc_get_lease_time(struct nfs_client *clp, struct nfs_fsinfo *fsinfo)
nfs4_set_sequence_privileged(&args.la_seq_args);
task = rpc_run_task(&task_setup);

- if (IS_ERR(task))
- return PTR_ERR(task);
-
status = task->tk_status;
rpc_put_task(task);
return status;
@@ -8149,10 +8119,7 @@ static int nfs41_proc_async_sequence(struct nfs_client *clp, struct rpc_cred *cr
if ((renew_flags & NFS4_RENEW_TIMEOUT) == 0)
return -EAGAIN;
task = _nfs41_proc_sequence(clp, cred, false);
- if (IS_ERR(task))
- ret = PTR_ERR(task);
- else
- rpc_put_task_async(task);
+ rpc_put_task_async(task);
dprintk("<-- %s status=%d\n", __func__, ret);
return ret;
}
@@ -8163,15 +8130,10 @@ static int nfs4_proc_sequence(struct nfs_client *clp, struct rpc_cred *cred)
int ret;

task = _nfs41_proc_sequence(clp, cred, true);
- if (IS_ERR(task)) {
- ret = PTR_ERR(task);
- goto out;
- }
ret = rpc_wait_for_completion_task(task);
if (!ret)
ret = task->tk_status;
rpc_put_task(task);
-out:
dprintk("<-- %s status=%d\n", __func__, ret);
return ret;
}
@@ -8280,10 +8242,6 @@ static int nfs41_proc_reclaim_complete(struct nfs_client *clp,
msg.rpc_resp = &calldata->res;
task_setup_data.callback_data = calldata;
task = rpc_run_task(&task_setup_data);
- if (IS_ERR(task)) {
- status = PTR_ERR(task);
- goto out;
- }
status = rpc_wait_for_completion_task(task);
if (status == 0)
status = task->tk_status;
@@ -8515,8 +8473,6 @@ nfs4_proc_layoutget(struct nfs4_layoutget *lgp, long *timeout, gfp_t gfp_flags)
nfs4_init_sequence(&lgp->args.seq_args, &lgp->res.seq_res, 0);

task = rpc_run_task(&task_setup_data);
- if (IS_ERR(task))
- return ERR_CAST(task);
status = rpc_wait_for_completion_task(task);
if (status == 0) {
status = nfs4_layoutget_handle_exception(task, lgp, &exception);
@@ -8632,8 +8588,6 @@ int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp, bool sync)
}
nfs4_init_sequence(&lrp->args.seq_args, &lrp->res.seq_res, 1);
task = rpc_run_task(&task_setup_data);
- if (IS_ERR(task))
- return PTR_ERR(task);
if (sync)
status = task->tk_status;
trace_nfs4_layoutreturn(lrp->args.inode, &lrp->args.stateid, status);
@@ -8779,8 +8733,6 @@ nfs4_proc_layoutcommit(struct nfs4_layoutcommit_data *data, bool sync)
}
nfs4_init_sequence(&data->args.seq_args, &data->res.seq_res, 1);
task = rpc_run_task(&task_setup_data);
- if (IS_ERR(task))
- return PTR_ERR(task);
if (sync)
status = task->tk_status;
trace_nfs4_layoutcommit(data->args.inode, &data->args.stateid, status);
@@ -9106,8 +9058,6 @@ static int nfs41_free_stateid(struct nfs_server *server,
struct rpc_task *task;

task = _nfs41_free_stateid(server, stateid, cred, is_recovery);
- if (IS_ERR(task))
- return PTR_ERR(task);
rpc_put_task(task);
return 0;
}
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index de9066a92c0d..12c1465c1472 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -646,17 +646,12 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
(unsigned long long)hdr->args.offset);

task = rpc_run_task(&task_setup_data);
- if (IS_ERR(task)) {
- ret = PTR_ERR(task);
- goto out;
- }
if (how & FLUSH_SYNC) {
ret = rpc_wait_for_completion_task(task);
if (ret == 0)
ret = task->tk_status;
}
rpc_put_task(task);
-out:
return ret;
}
EXPORT_SYMBOL_GPL(nfs_initiate_pgio);
diff --git a/fs/nfs/unlink.c b/fs/nfs/unlink.c
index e3949d93085c..3cb88c073fde 100644
--- a/fs/nfs/unlink.c
+++ b/fs/nfs/unlink.c
@@ -108,8 +108,7 @@ static void nfs_do_call_unlink(struct nfs_unlinkdata *data)

task_setup_data.rpc_client = NFS_CLIENT(dir);
task = rpc_run_task(&task_setup_data);
- if (!IS_ERR(task))
- rpc_put_task_async(task);
+ rpc_put_task_async(task);
}

static int nfs_call_unlink(struct dentry *dentry, struct nfs_unlinkdata *data)
@@ -497,12 +496,6 @@ nfs_sillyrename(struct inode *dir, struct dentry *dentry)
/* run the rename task, undo unlink if it fails */
task = nfs_async_rename(dir, dir, dentry, sdentry,
nfs_complete_sillyrename);
- if (IS_ERR(task)) {
- error = -EBUSY;
- nfs_cancel_async_unlink(dentry);
- goto out_dput;
- }
-
/* wait for the RPC task to complete, unless a SIGKILL intervenes */
error = rpc_wait_for_completion_task(task);
if (error == 0)
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index e833179d0d49..c2924d3f172a 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1691,8 +1691,6 @@ int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data,
NFS_SP4_MACH_CRED_COMMIT, &task_setup_data.rpc_client, &msg);

task = rpc_run_task(&task_setup_data);
- if (IS_ERR(task))
- return PTR_ERR(task);
if (how & FLUSH_SYNC)
rpc_wait_for_completion_task(task);
rpc_put_task(task);
diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 9463af4b32e8..bfcf5e96483c 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -1232,8 +1232,7 @@ gss_destroying_context(struct rpc_cred *cred)
get_rpccred(cred);

task = rpc_call_null(gss_auth->client, cred, RPC_TASK_ASYNC|RPC_TASK_SOFT);
- if (!IS_ERR(task))
- rpc_put_task(task);
+ rpc_put_task(task);

put_rpccred(cred);
return 1;
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 2ad827db2704..e1d96f52e97f 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1080,8 +1080,6 @@ int rpc_call_sync(struct rpc_clnt *clnt, const struct rpc_message *msg, int flag
}

task = rpc_run_task(&task_setup_data);
- if (IS_ERR(task))
- return PTR_ERR(task);
status = task->tk_status;
rpc_put_task(task);
return status;
@@ -1110,8 +1108,6 @@ rpc_call_async(struct rpc_clnt *clnt, const struct rpc_message *msg, int flags,
};

task = rpc_run_task(&task_setup_data);
- if (IS_ERR(task))
- return PTR_ERR(task);
rpc_put_task(task);
return 0;
}
@@ -2590,8 +2586,6 @@ int rpc_clnt_test_and_add_xprt(struct rpc_clnt *clnt,
RPC_TASK_SOFT|RPC_TASK_SOFTCONN|RPC_TASK_ASYNC,
&rpc_cb_add_xprt_call_ops, data);
put_rpccred(cred);
- if (IS_ERR(task))
- return PTR_ERR(task);
rpc_put_task(task);
return 1;
}
@@ -2637,10 +2631,6 @@ int rpc_clnt_setup_test_and_add_xprt(struct rpc_clnt *clnt,
RPC_TASK_SOFT | RPC_TASK_SOFTCONN,
NULL, NULL);
put_rpccred(cred);
- if (IS_ERR(task)) {
- status = PTR_ERR(task);
- goto out_err;
- }
status = task->tk_status;
rpc_put_task(task);

diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index ea0676f199c8..57a955583360 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -782,12 +782,6 @@ void rpcb_getport_async(struct rpc_task *task)

child = rpcb_call_async(rpcb_clnt, map, proc);
rpc_release_client(rpcb_clnt);
- if (IS_ERR(child)) {
- /* rpcb_map_release() has freed the arguments */
- dprintk("RPC: %5u %s: rpc_run_task failed\n",
- task->tk_pid, __func__);
- return;
- }

xprt->stat.bind_count++;
rpc_put_task(child);
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 85ce0db5b0a6..10c2182f70aa 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1508,16 +1508,10 @@ bc_svc_process(struct svc_serv *serv, struct rpc_rqst *req,
/* Finally, send the reply synchronously */
memcpy(&req->rq_snd_buf, &rqstp->rq_res, sizeof(req->rq_snd_buf));
task = rpc_run_bc_task(req);
- if (IS_ERR(task)) {
- error = PTR_ERR(task);
- goto out;
- }
-
WARN_ON_ONCE(atomic_read(&task->tk_count) != 1);
error = task->tk_status;
rpc_put_task(task);

-out:
dprintk("svc: %s(), error=%d\n", __func__, error);
return error;
}
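
The resulting caller shape, distilled from the hunks above into a
representative sketch:

/* Sketch: since rpc_run_task() can no longer return an ERR_PTR, a
 * synchronous caller just reads the task status and drops its
 * reference, with no error path for task creation.
 */
static int example_sync_call(struct rpc_task_setup *setup)
{
	struct rpc_task *task = rpc_run_task(setup);
	int status = task->tk_status;

	rpc_put_task(task);
	return status;
}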



2017-08-18 07:14:24

by NeilBrown

Subject: [PATCH 8/8] NFSv4.1: don't use machine credentials for CLOSE when using 'sec=sys'

An NFSv4.1 client might close a file after the user who opened it has
logged off. In this case the user's credentials may no longer be
valid, if they are e.g. kerberos credentials that have expired.

NFSv4.1 has a mechanism to allow the client to use machine credentials
to close a file. However, due to a shortcoming in the RFC, a CLOSE
with those credentials may not be possible if the file in question
isn't exported to the same security flavor - the required PUTFH must
be rejected when this is the case.

Specifically if a server and client support kerberos in general and
have used it to form a machine credential, but the file is only
exported to "sec=sys", a PUTFH with the machine credentials will fail,
so CLOSE is not possible.

As RPC_AUTH_UNIX (used by sec=sys) credentials can never expire, there
is no value in using the machine credential in place of them.
So in that case, just use the user's credentials for CLOSE etc., as
you would in NFSv4.0.

Signed-off-by: Neil Brown <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
---
fs/nfs/nfs4_fs.h | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 40bd05f05e74..ac4f10b7f6c1 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -303,6 +303,17 @@ _nfs4_state_protect(struct nfs_client *clp, unsigned long sp4_mode,
struct rpc_cred *newcred = NULL;
rpc_authflavor_t flavor;

+ if (sp4_mode == NFS_SP4_MACH_CRED_CLEANUP ||
+ sp4_mode == NFS_SP4_MACH_CRED_PNFS_CLEANUP) {
+ /* Using machine creds for cleanup operations
+ * is only relevant if the client credentials
+ * might expire. So don't bother for
+ * RPC_AUTH_UNIX. If file was only exported to
+ * sec=sys, the PUTFH would fail anyway.
+ */
+ if ((*clntp)->cl_auth->au_flavor == RPC_AUTH_UNIX)
+ return false;
+ }
if (test_bit(sp4_mode, &clp->cl_sp4_flags)) {
spin_lock(&clp->cl_lock);
if (clp->cl_machine_cred != NULL)



2017-08-18 14:40:47

by Chuck Lever III

Subject: Re: [PATCH 1/8] SUNRPC: ECONNREFUSED should cause a rebind.


> On Aug 18, 2017, at 3:12 AM, NeilBrown <[email protected]> wrote:
>
> If you
> - mount and NFSv3 filesystem
> - do some file locking which requires the server
> to make a GRANT call back
> - unmount
> - mount again and do the same locking
>
> then the second attempt at locking suffers a 30 second delay.
> Unmounting and remounting causes lockd to stop and restart,
> which causes it to bind to a new port.
> The server still thinks the old port is valid and gets ECONNREFUSED
> when trying to contact it.
> ECONNREFUSED should be seen as a hard error that is not worth
> retrying. Rebinding is the only reasonable response.
>
> This patch forces a rebind if that makes sense.

I also reported this problem recently, and found a similar
(but rather lockd-centric) solution.

Reviewed-by: Chuck Lever <[email protected]>


> Signed-off-by: NeilBrown <[email protected]>
> ---
> net/sunrpc/clnt.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 2e49d1f892b7..69a9e5953744 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -1903,6 +1903,14 @@ call_connect_status(struct rpc_task *task)
> task->tk_status = 0;
> switch (status) {
> case -ECONNREFUSED:
> + /* A positive refusal suggests a rebind is needed. */
> + if (RPC_IS_SOFTCONN(task))
> + break;
> + if (clnt->cl_autobind) {
> + rpc_force_rebind(clnt);
> + task->tk_action = call_bind;
> + return;
> + }
> case -ECONNRESET:
> case -ECONNABORTED:
> case -ENETUNREACH:

--
Chuck Lever




2017-08-18 19:05:57

by Trond Myklebust

Subject: Re: [PATCH 2/8] NFSv4: don't let hanging mounts block other mounts

On Fri, 2017-08-18 at 17:12 +1000, NeilBrown wrote:
> If you try an NFSv4 mount from an inaccessible server, it will hang
> as you would expect.
> If you then try an NFSv4 mount from a different accessible server,
> it will also hang.  This is not expected.
>
> The second mount is blocked in
>   nfs4_init_client()
>   -> nfs4_discover_server_trunking()
>   -> nfs40_discover_server_trunking()
>   -> nfs40_walk_client_list()
>   -> nfs4_match_client()
>   -> nfs_wait_client_init_complete()
> It is waiting for the first mount to complete so that it can then
> see if the two servers are really one and the same.
>
> It is not necessary to wait here when an nfs_client cl_cons_state is
> NFS_CS_INITING.  Such a client will, after changing cl_cons_state,
> call nfs4_discover_server_trunking() itself.  So if the current
> client just skips those clients, trunking will still be discovered
> if necessary.
>
> I am unsure of the situation with NFS_CS_SESSION_INITING, but I
> suspect that the comment "Wait for CREATE_SESSION to finish" implies
> that it is only clients in NFS_CS_SESSION_INITING that need to be
> waited for.
>
> Signed-off-by: NeilBrown <[email protected]>
> ---
>  fs/nfs/nfs4client.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> index e9bea90dc017..d8b9b7ff19a9 100644
> --- a/fs/nfs/nfs4client.c
> +++ b/fs/nfs/nfs4client.c
> @@ -482,7 +482,7 @@ static int nfs4_match_client(struct nfs_client *pos, struct nfs_client *new,
>  	 * remaining fields in "pos", especially the client
>  	 * ID and serverowner fields.  Wait for CREATE_SESSION
>  	 * to finish. */
> -	if (pos->cl_cons_state > NFS_CS_READY) {
> +	if (pos->cl_cons_state == NFS_CS_SESSION_INITING) {
>  		atomic_inc(&pos->cl_count);
>  		spin_unlock(&nn->nfs_client_lock);

This could cause us to declare a false positive match with a client
that is uninitialised.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2017-08-18 19:15:15

by Trond Myklebust

Subject: Re: [PATCH 4/8] NFS: flush out dirty data on file fput().

On Fri, 2017-08-18 at 17:12 +1000, NeilBrown wrote:
> Any dirty NFS page holds an s_active reference on the superblock,
> because page_private() references an nfs_page, which references an
> open context, which references the superblock.
>
> So if there are any dirty pages when the filesystem is unmounted, the
> unmount will act like a "lazy" unmount and not call ->kill_sb().
> Background write-back can then write out the pages *after* the
> filesystem unmount has apparently completed.
>
> This contrasts with other filesystems which do not hold extra
> s_active references, so ->kill_sb() is reliably called on unmount,
> and generic_shutdown_super() will call sync_filesystem() to flush
> everything out before the unmount completes.
>
> When open/write/close is used to modify files, the final close causes
> f_op->flush to be called, which flushes all dirty pages.  However if
> open/mmap/close/modify-memory/unmap is used, dirty pages can remain
> in memory after the application has dropped all references to the
> file.
> Similarly if a loop-mount is done with an NFS file, there is no final
> flush when the loop device is destroyed and the file can have dirty
> data after the last "close".
>
> Also, a loop-back mount of a device does not "close" the file when
> the loop device is destroyed.  This can leave dirty page cache pages
> too.
>
> Fix this by calling vfs_fsync() in nfs_file_release (aka
> f_op->release()).  This means that on the final unmap of a file (or
> destruction of a loop device), all changes are flushed, and ensures
> that when unmount is requested there will be no dirty pages to delay
> the final unmount.
>
> Without this patch, it is not safe to stop or disconnect the NFS
> server after all clients have unmounted.  They need to unmount and
> call "sync".
>
> Signed-off-by: NeilBrown <[email protected]>
> ---
>  fs/nfs/file.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index af330c31f627..aa883d8b24e6 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -81,6 +81,12 @@ nfs_file_release(struct inode *inode, struct file *filp)
>  {
>  	dprintk("NFS: release(%pD2)\n", filp);
>
> +	if (filp->f_mode & FMODE_WRITE)
> +		/* Ensure dirty mmapped pages are flushed
> +		 * so there will be no dirty pages to
> +		 * prevent an unmount from completing.
> +		 */
> +		vfs_fsync(filp, 0);
>  	nfs_inc_stats(inode, NFSIOS_VFSRELEASE);
>  	nfs_file_clear_open_context(filp);
>  	return 0;

The right fix here is to ensure that umount() flushes the dirty data,
and that it aborts the umount attempt if the flush fails.  Otherwise, a
signal to the above fsync call will cause the problem to reoccur.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2017-08-19 00:50:43

by NeilBrown

Subject: Re: [PATCH 2/8] NFSv4: don't let hanging mounts block other mounts

On Fri, Aug 18 2017, Trond Myklebust wrote:

> On Fri, 2017-08-18 at 17:12 +1000, NeilBrown wrote:
>> If you try an NFSv4 mount from an inaccessible server, it will hang
>> as
>> you would expect.
>> If you then try an NFSv4 mount from a different accessible server,
>> it will also hang. This is not expected.
>>
>> The second mount is blocked in
>> nfs4_init_client()
>> -> nfs4_discover_server_trunking()
>> -> nfs40_discover_server_trunking()
>> -> nfs40_walk_client_list()
>> -> nfs4_match_client()
>> -> nfs_wait_client_init_complete()
>> It is waiting for the first mount to complete so that it can then
>> see if the two servers are really one and the same.
>>
>> It is not necessary to wait here when an nfs_client cl_cons_state is
>> NFS_CS_INITING. Such a client will, after changing cl_cons_state,
>> call
>> nfs4_discover_server_trunking() itself. So if the current client
>> just
>> skips those clients, trunking will still be discovered if necessary.
>>
>> I am unsure of the situation with NFS_CS_SESSION_INITING, but I suspect
>> that the comment "Wait for CREATE_SESSION to finish" implies that
>> it is only clients in NFS_CS_SESSION_INITING that need to be waited
>> for.
>>
>> Signed-off-by: NeilBrown <[email protected]>
>> ---
>> fs/nfs/nfs4client.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
>> index e9bea90dc017..d8b9b7ff19a9 100644
>> --- a/fs/nfs/nfs4client.c
>> +++ b/fs/nfs/nfs4client.c
>> @@ -482,7 +482,7 @@ static int nfs4_match_client(struct
>> nfs_client *pos, struct nfs_client *new,
>> * remaining fields in "pos", especially the client
>> * ID and serverowner fields. Wait for CREATE_SESSION
>> * to finish. */
>> - if (pos->cl_cons_state > NFS_CS_READY) {
>> + if (pos->cl_cons_state == NFS_CS_SESSION_INITING) {
>> atomic_inc(&pos->cl_count);
>> spin_unlock(&nn->nfs_client_lock);
>
> This could cause us to declare a false positive match with a client
> that is uninitialised.

Thanks for the review.

A positive match is reported by returning zero.
If cl_cons_state is not NFS_CS_READY, nfs4_match_client() will not
return zero.

So I don't see how a false positive is possible.

A false negative might be possible with an uninitialized client, but
once that client is initialised, it will call walk_client_list and find
the match. Won't it?

Thanks,
NeilBrown



2017-08-19 01:02:20

by NeilBrown

Subject: Re: [PATCH 4/8] NFS: flush out dirty data on file fput().

On Fri, Aug 18 2017, Trond Myklebust wrote:

> On Fri, 2017-08-18 at 17:12 +1000, NeilBrown wrote:
>> Any dirty NFS page holds an s_active reference on the superblock,
>> because page_private() references an nfs_page, which references an
>> open context, which references the superblock.
>>
>> So if there are any dirty pages when the filesystem is unmounted, the
>> unmount will act like a "lazy" unmount and not call ->kill_sb().
>> Background write-back can then write out the pages *after* the
>> filesystem
>> unmount has apparently completed.
>>
>> This contrasts with other filesystems which do not hold extra
>> s_active
>> references, so ->kill_sb() is reliably called on unmount, and
>> generic_shutdown_super() will call sync_filesystem() to flush
>> everything out before the unmount completes.
>>
>> When open/write/close is used to modify files, the final close causes
>> f_op->flush to be called, which flushes all dirty pages. However if
>> open/mmap/close/modify-memory/unmap is used, dirty pages can remain
>> in
>> memory after the application has dropped all references to the file.
>> Similarly if a loop-mount is done with an NFS file, there is no final
>> flush when the loop device is destroyed and the file can have dirty
>> data after the last "close".
>>
>> Also, a loop-back mount of a device does not "close" the file when
>> the
>> loop device is destroyed. This can leave dirty page cache pages too.
>>
>> Fix this by calling vfs_fsync() in nfs_file_release (aka
>> f_op->release()). This means that on the final unmap of a file (or
>> destruction of a loop device), all changes are flushed, and ensures
>> that
>> when unmount is requested there will be no dirty pages to delay the
>> final unmount.
>>
>> Without this patch, it is not safe to stop or disconnect the NFS
>> server after all clients have unmounted. They need to unmount and
>> call "sync".
>>
>> Signed-off-by: NeilBrown <[email protected]>
>> ---
>> fs/nfs/file.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
>> index af330c31f627..aa883d8b24e6 100644
>> --- a/fs/nfs/file.c
>> +++ b/fs/nfs/file.c
>> @@ -81,6 +81,12 @@ nfs_file_release(struct inode *inode, struct file
>> *filp)
>> {
>> dprintk("NFS: release(%pD2)\n", filp);
>>
>> + if (filp->f_mode & FMODE_WRITE)
>> + /* Ensure dirty mmapped pages are flushed
>> + * so there will be no dirty pages to
>> + * prevent an unmount from completing.
>> + */
>> + vfs_fsync(filp, 0);
>> nfs_inc_stats(inode, NFSIOS_VFSRELEASE);
>> nfs_file_clear_open_context(filp);
>> return 0;
>
> The right fix here is to ensure that umount() flushes the dirty data,
> and that it aborts the umount attempt if the flush fails. Otherwise, a
> signal to the above fsync call will cause the problem to reoccur.

The only way to ensure that umount flushes dirty data is to revert
commit 1daef0a86837 ("NFS: Clean up nfs_sb_active/nfs_sb_deactive").
As long as we are taking extra references to s_active, umount won't do
what it ought to do.

I do wonder what we should do when you try to unmount a filesystem
mounted from an inaccessible server.
Blocking is awkward for scripts to work with. Failing with EBUSY might
be misleading.
I don't think that using MNT_FORCE guarantees success does it? It would
need to discard any dirty data and abort any other background tasks.

I think I would be in favor of EBUSY rather than a hang, and of making
MNT_FORCE a silver bullet providing there are no open file descriptors
on the filesystem.

Thanks,
NeilBrown



2017-08-19 01:27:26

by Trond Myklebust

Subject: Re: [PATCH 4/8] NFS: flush out dirty data on file fput().

On Sat, 2017-08-19 at 11:02 +1000, NeilBrown wrote:
> On Fri, Aug 18 2017, Trond Myklebust wrote:
>
> > On Fri, 2017-08-18 at 17:12 +1000, NeilBrown wrote:
> > > Any dirty NFS page holds an s_active reference on the superblock,
> > > because page_private() references an nfs_page, which references
> > > an open context, which references the superblock.
> > >
> > > So if there are any dirty pages when the filesystem is unmounted,
> > > the unmount will act like a "lazy" unmount and not call
> > > ->kill_sb().
> > > Background write-back can then write out the pages *after* the
> > > filesystem unmount has apparently completed.
> > >
> > > This contrasts with other filesystems which do not hold extra
> > > s_active references, so ->kill_sb() is reliably called on
> > > unmount, and generic_shutdown_super() will call sync_filesystem()
> > > to flush everything out before the unmount completes.
> > >
> > > When open/write/close is used to modify files, the final close
> > > causes f_op->flush to be called, which flushes all dirty pages.
> > > However if open/mmap/close/modify-memory/unmap is used, dirty
> > > pages can remain in memory after the application has dropped all
> > > references to the file.
> > > Similarly if a loop-mount is done with an NFS file, there is no
> > > final flush when the loop device is destroyed and the file can
> > > have dirty data after the last "close".
> > >
> > > Also, a loop-back mount of a device does not "close" the file
> > > when the loop device is destroyed.  This can leave dirty page
> > > cache pages too.
> > >
> > > Fix this by calling vfs_fsync() in nfs_file_release (aka
> > > f_op->release()).  This means that on the final unmap of a file
> > > (or destruction of a loop device), all changes are flushed, and
> > > ensures that when unmount is requested there will be no dirty
> > > pages to delay the final unmount.
> > >
> > > Without this patch, it is not safe to stop or disconnect the NFS
> > > server after all clients have unmounted.  They need to unmount
> > > and call "sync".
> > >
> > > Signed-off-by: NeilBrown <[email protected]>
> > > ---
> > >  fs/nfs/file.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> > > index af330c31f627..aa883d8b24e6 100644
> > > --- a/fs/nfs/file.c
> > > +++ b/fs/nfs/file.c
> > > @@ -81,6 +81,12 @@ nfs_file_release(struct inode *inode, struct file *filp)
> > >  {
> > >  	dprintk("NFS: release(%pD2)\n", filp);
> > >
> > > +	if (filp->f_mode & FMODE_WRITE)
> > > +		/* Ensure dirty mmapped pages are flushed
> > > +		 * so there will be no dirty pages to
> > > +		 * prevent an unmount from completing.
> > > +		 */
> > > +		vfs_fsync(filp, 0);
> > >  	nfs_inc_stats(inode, NFSIOS_VFSRELEASE);
> > >  	nfs_file_clear_open_context(filp);
> > >  	return 0;
> >
> > The right fix here is to ensure that umount() flushes the dirty
> > data, and that it aborts the umount attempt if the flush fails.
> > Otherwise, a signal to the above fsync call will cause the problem
> > to reoccur.
>
> The only way to ensure that umount flushes dirty data is to revert
> commit 1daef0a86837 ("NFS: Clean up nfs_sb_active/nfs_sb_deactive").
> As long as we are taking extra references to s_active, umount won't
> do what it ought to do.

Reverting that patch is not sufficient. You'd still need to call
sync_filesystem(), and abort the umount if the latter fails.

> I do wonder what we should do when you try to unmount a filesystem
> mounted from an inaccessible server.
> Blocking is awkward for scripts to work with.  Failing with EBUSY
> might be misleading.
> I don't think that using MNT_FORCE guarantees success does it?  It
> would need to discard any dirty data and abort any other background
> tasks.

MNT_FORCE will only abort any outstanding RPC calls. It is constrained
by the fact that we don't serialise umount and mount, and that the
filesystem on which we are operating may be mounted in several places
in the main namespace and indeed in several private namespaces to boot.

> I think I would be in favor of EBUSY rather than a hang, and of
> making MNT_FORCE a silver bullet providing there are no open file
> descriptors on the filesystem.

All other filesystems trigger sync_filesystem(), and that's what the
VFS expects us to do. The problem is the VFS assumes that call will
always succeed, and won't ever return an error.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2017-08-21 01:36:08

by NeilBrown

Subject: Re: [PATCH 4/8] NFS: flush out dirty data on file fput().

On Sat, Aug 19 2017, Trond Myklebust wrote:

> On Sat, 2017-08-19 at 11:02 +1000, NeilBrown wrote:
>> On Fri, Aug 18 2017, Trond Myklebust wrote:
>>
>> > On Fri, 2017-08-18 at 17:12 +1000, NeilBrown wrote:
>> > > Any dirty NFS page holds an s_active reference on the superblock,
>> > > because page_private() references an nfs_page, which references
>> > > an
>> > > open context, which references the superblock.
>> > >
>> > > So if there are any dirty pages when the filesystem is unmounted,
>> > > the
>> > > unmount will act like a "lazy" unmount and not call ->kill_sb().
>> > > Background write-back can then write out the pages *after* the
>> > > filesystem
>> > > unmount has apparently completed.
>> > >
>> > > This contrasts with other filesystems which do not hold extra
>> > > s_active
>> > > references, so ->kill_sb() is reliably called on unmount, and
>> > > generic_shutdown_super() will call sync_filesystem() to flush
>> > > everything out before the unmount completes.
>> > >
>> > > When open/write/close is used to modify files, the final close
>> > > causes
>> > > f_op->flush to be called, which flushes all dirty pages. However
>> > > if
>> > > open/mmap/close/modify-memory/unmap is used, dirty pages can
>> > > remain
>> > > in
>> > > memory after the application has dropped all references to the
>> > > file.
>> > > Similarly if a loop-mount is done with a NFS file, there is no
>> > > final
>> > > flush when the loop device is destroyed and the file can have
>> > > dirty
>> > > data after the last "close".
>> > >
>> > > Also, a loop-back mount of a device does not "close" the file
>> > > when
>> > > the
>> > > loop device is destroyed. This can leave dirty page cache pages
>> > > too.
>> > >
>> > > Fix this by calling vfs_fsync() in nfs_file_release (aka
>> > > f_op->release()). This means that on the final unmap of a file
>> > > (or
>> > > destruction of a loop device), all changes are flushed, and
>> > > ensures
>> > > that
>> > > when unmount is requested there will be no dirty pages to delay
>> > > the
>> > > final unmount.
>> > >
>> > > Without this patch, it is not safe to stop or disconnect the NFS
>> > > server after all clients have unmounted. They need to unmount
>> > > and
>> > > call "sync".
>> > >
>> > > Signed-off-by: NeilBrown <[email protected]>
>> > > ---
>> > > fs/nfs/file.c | 6 ++++++
>> > > 1 file changed, 6 insertions(+)
>> > >
>> > > diff --git a/fs/nfs/file.c b/fs/nfs/file.c
>> > > index af330c31f627..aa883d8b24e6 100644
>> > > --- a/fs/nfs/file.c
>> > > +++ b/fs/nfs/file.c
>> > > @@ -81,6 +81,12 @@ nfs_file_release(struct inode *inode, struct
>> > > file
>> > > *filp)
>> > > {
>> > > dprintk("NFS: release(%pD2)\n", filp);
>> > >
>> > > + if (filp->f_mode & FMODE_WRITE)
>> > > + /* Ensure dirty mmapped pages are flushed
>> > > + * so there will be no dirty pages to
>> > > + * prevent an unmount from completing.
>> > > + */
>> > > + vfs_fsync(filp, 0);
>> > > nfs_inc_stats(inode, NFSIOS_VFSRELEASE);
>> > > nfs_file_clear_open_context(filp);
>> > > return 0;
>> >
>> > The right fix here is to ensure that umount() flushes the dirty
>> > data,
>> > and that it aborts the umount attempt if the flush fails.
>> > Otherwise, a
>> > signal to the above fsync call will cause the problem to reoccur.
>>
>> The only way to ensure that umount flushes dirty data is to revert
>> commit 1daef0a86837 ("NFS: Clean up nfs_sb_active/nfs_sb_deactive").
>> As long as we are taking extra references to s_active, umount won't
>> do
>> what it ought to do.
>>
>
> Reverting that patch is not sufficient. You'd still need to call
> sync_filesystem(), and abort the umount if the latter fails.

generic_shutdown_super() (which nfs_kill_super() calls) already calls
sync_filesystem(). However sync_filesystem() cannot fail - it just
asks bdi workers to flush data out, then waits for completion.

If the server is working, this should complete correctly. If it isn't
this will block indefinitely in the same way that a "sync" following the
umount will block indefinitely today.

So while I agree that reverting this isn't sufficient, I think it might
be a step in the right direction.

I've been experimenting with what happens when the server goes down and
you try to unmount. With NFSv4, the state manager can still be trying
to send a RENEW and not want to let go until it gets an answer. I
haven't traced it all through yet to understand why or how significant
this is.


>
>> I do wonder what we should do when you try to unmount a filesystem
>> mounted from an inaccessible server.
>> Blocking is awkward for scripts to work with. Failing with EBUSY
>> might
>> be misleading.
>> I don't think that using MNT_FORCE guarantees success does it? It
>> would
>> need to discard any dirty data and abort any other background tasks.
>
> MNT_FORCE will only abort any outstanding RPC calls. It is constrained
> by the fact that we don't serialise umount and mount, and that the
> filesystem on which we are operating may be mounted in several places
> in the main namespace and indeed in several private namespaces to
> boot.

Yes, that is a problem.
If we could arrange that SIGKILL *always* kills the process so that it
drops the reference to the filesystem, we could possibly delay the
MNT_FORCE handling until deactivate_super(), but at present we sometimes
need MNT_FORCE to allow KILLed processes to die, and while there are
processes with file descriptors, umount will not get as far as
deactivate_super().
(We need filemap_fdatawait_range() to be killable to do better.)

>
>> I think I would be in favor of EBUSY rather than a hang, and of
>> making
>> MNT_FORCE a silver bullet providing there are no open file
>> descriptors
>> on the filesystem.
>
> All other filesystems trigger sync_filesystem(), and that's what the
> VFS expects us to do. The problem is the VFS assumes that call will
> always succeed, and won't ever return an error.

Yes, it is not easy to find an "ideal" solution.

Given that fact, and given that the present patch does address an easily
demonstrated symptom, would you accept it as a small step in a good
direction? When the server is still accessible, and when no-one kills
processes at awkward times, it definitely helps.

Even with a SIGKILL, I think the WRITEs and the COMMIT will be sent, but
the COMMIT won't be waited for...

Thanks,
NeilBrown



2017-08-23 11:12:43

by Jeff Layton

Subject: Re: [PATCH 4/8] NFS: flush out dirty data on file fput().

On Mon, 2017-08-21 at 11:35 +1000, NeilBrown wrote:
> On Sat, Aug 19 2017, Trond Myklebust wrote:
>
> > On Sat, 2017-08-19 at 11:02 +1000, NeilBrown wrote:
> > > On Fri, Aug 18 2017, Trond Myklebust wrote:
> > >
> > > > On Fri, 2017-08-18 at 17:12 +1000, NeilBrown wrote:
> > > > > Any dirty NFS page holds an s_active reference on the superblock,
> > > > > because page_private() references an nfs_page, which references
> > > > > an
> > > > > open context, which references the superblock.
> > > > >
> > > > > So if there are any dirty pages when the filesystem is unmounted,
> > > > > the
> > > > > unmount will act like a "lazy" unmount and not call ->kill_sb().
> > > > > Background write-back can then write out the pages *after* the
> > > > > filesystem
> > > > > unmount has apparently completed.
> > > > >
> > > > > This contrasts with other filesystems which do not hold extra
> > > > > s_active
> > > > > references, so ->kill_sb() is reliably called on unmount, and
> > > > > generic_shutdown_super() will call sync_filesystem() to flush
> > > > > everything out before the unmount completes.
> > > > >
> > > > > When open/write/close is used to modify files, the final close
> > > > > causes
> > > > > f_op->flush to be called, which flushes all dirty pages. However
> > > > > if
> > > > > open/mmap/close/modify-memory/unmap is used, dirty pages can
> > > > > remain
> > > > > in
> > > > > memory after the application has dropped all references to the
> > > > > file.
> > > > > Similarly if a loop-mount is done with an NFS file, there is no
> > > > > final
> > > > > flush when the loop device is destroyed and the file can have
> > > > > dirty
> > > > > data after the last "close".
> > > > >
> > > > > Also, a loop-back mount of a device does not "close" the file
> > > > > when
> > > > > the
> > > > > loop device is destroyed. This can leave dirty page cache pages
> > > > > too.
> > > > >
> > > > > Fix this by calling vfs_fsync() in nfs_file_release (aka
> > > > > f_op->release()). This means that on the final unmap of a file
> > > > > (or
> > > > > destruction of a loop device), all changes are flushed, and
> > > > > ensures
> > > > > that
> > > > > when unmount is requested there will be no dirty pages to delay
> > > > > the
> > > > > final unmount.
> > > > >
> > > > > Without this patch, it is not safe to stop or disconnect the NFS
> > > > > server after all clients have unmounted. They need to unmount
> > > > > and
> > > > > call "sync".
> > > > >
> > > > > Signed-off-by: NeilBrown <[email protected]>
> > > > > ---
> > > > > fs/nfs/file.c | 6 ++++++
> > > > > 1 file changed, 6 insertions(+)
> > > > >
> > > > > diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> > > > > index af330c31f627..aa883d8b24e6 100644
> > > > > --- a/fs/nfs/file.c
> > > > > +++ b/fs/nfs/file.c
> > > > > @@ -81,6 +81,12 @@ nfs_file_release(struct inode *inode, struct
> > > > > file
> > > > > *filp)
> > > > > {
> > > > > dprintk("NFS: release(%pD2)\n", filp);
> > > > >
> > > > > + if (filp->f_mode & FMODE_WRITE)
> > > > > + /* Ensure dirty mmapped pages are flushed
> > > > > + * so there will be no dirty pages to
> > > > > + * prevent an unmount from completing.
> > > > > + */
> > > > > + vfs_fsync(filp, 0);
> > > > > nfs_inc_stats(inode, NFSIOS_VFSRELEASE);
> > > > > nfs_file_clear_open_context(filp);
> > > > > return 0;
> > > >
> > > > The right fix here is to ensure that umount() flushes the dirty
> > > > data,
> > > > and that it aborts the umount attempt if the flush fails.
> > > > Otherwise, a
> > > > signal to the above fsync call will cause the problem to reoccur.
> > >
> > > The only way to ensure that umount flushes dirty data is to revert
> > > commit 1daef0a86837 ("NFS: Clean up nfs_sb_active/nfs_sb_deactive").
> > > As long as we are taking extra references to s_active, umount won't
> > > do
> > > what it ought to do.
> > >
> >
> > Reverting that patch is not sufficient. You'd still need to call
> > sync_filesystem(), and abort the umount if the latter fails.
>
> generic_shutdown_super() (which nfs_kill_super() calls) already calls
> sync_filesystem(). However sync_filesystem() cannot fail - it just
> asks bdi workers to flush data out, then waits for completion.
>
> If the server is working, this should complete correctly. If it isn't
> this will block indefinitely in the same way that a "sync" following the
> umount will block indefinitely today.
>
> So while I agree that reverting this isn't sufficient, I think it might
> be a step in the right direction.
>
> I've been experimenting with what happens when the server goes down and
> you try to unmount. With NFSv4, the state manager can still be trying
> to send a RENEW and not want to let go until it gets an answer. I
> haven't traced it all through yet to understand why or how significant
> this is.
>
>
> >
> > > I do wonder what we should do when you try to unmount a filesystem
> > > mounted from an inaccessible server.
> > > Blocking is awkward for scripts to work with. Failing with EBUSY
> > > might
> > > be misleading.
> > > I don't think that using MNT_FORCE guarantees success does it? It
> > > would
> > > need to discard any dirty data and abort any other background tasks.
> >
> > MNT_FORCE will only abort any outstanding RPC calls. It is constrained
> > by the fact that we don't serialise umount and mount, and that the
> > filesystem on which we are operating may be mounted in several places
> > in the main namespace and indeed in several private namespaces to
> > boot.
>
> Yes, that is a problem.
> If we could arrange that SIGKILL *always* kills the process so that it
> drops the reference to the filesystem, we could possibly delay the
> MNT_FORCE handling until deactivate_super(), but at present we sometimes
> need MNT_FORCE to allow KILLed processes to die, and while there are
> processes with file descriptors, umount will not get as far as
> deactivate_super().
> (We need filemap_fdatawait_range() to be killable to do better.)
>

Indeed. I think we need to consider variants of these that are killable,
and replace the existing calls with the killable variants in the
appropriate places. That's a long term (and difficult) project though...

> >
> > > I think I would be in favor of EBUSY rather than a hang, and of
> > > making
> > > MNT_FORCE a silver bullet providing there are no open file
> > > descriptors
> > > on the filesystem.
> >
> > All other filesystems trigger sync_filesystem(), and that's what the
> > VFS expects us to do. The problem is the VFS assumes that call will
> > always succeed, and won't ever return an error.
>
> Yes, it is not easy to find an "ideal" solution.
>
> Given that fact, and given that the present patch does address an easily
> demonstrated symptom, would you accept it as a small step in a good
> direction? When the server is still accessible, and when no-one kills
> processes at awkward times, it definitely helps.
>
> Even with a SIGKILL, I think the WRITEs and the COMMIT will be sent, but
> the COMMIT won't be waited for...
>
> Thanks,
> NeilBrown

--
Jeff Layton <[email protected]>