This set addresses a few NFS related locking issues as well as a couple
minor cleanups. It does a bit of plumbing to make sure we can complete
unlocks after close when the fl_file may be unavailable, and it attempts to
always release locks even when a wait for outstanding IO is interrupted
before sending an unlock. This can orphan a lock on a server, which might
mean a delay for a matching lock on NFSv4, or require a server restart to
clean up that lock on NFSv3.
Changes for v4:
- combine tests for fl_type into single switch on 02/10
Changes for v3:
- update a comment on open mode checking for flock() locks
Changes for v2:
- the change to move do_vfs_lock() to a shared inline has been removed
- added patch to set FL_CLOSE for flock unlock on close (08/10)
- instead of keeping a copied list of deferred file_locks to unlock,
just send an unlock for the entire file if FL_CLOSE is set.
Benjamin Coddington (10):
NFS4: remove a redundant lock range check
NFS: Move the flock open mode check into nfs_flock()
NFS: Pass nfs_open_context instead of file to the lock procs
NFSv4: Pass nfs_open_context instead of nfs4_state to
nfs4_proc_unlck()
lockd: Plumb nfs_open_context into nlm client unlock
lockd: Send the inode to nlmclnt_setlockargs()
lockd: do_vfs_lock() only needs the inode
locks: Set FL_CLOSE when removing flock locks on close()
NFS: Deferred unlocks - always unlock on FL_CLOSE
NFS: cleanup do_vfs_lock()
fs/lockd/clntproc.c | 41 ++++++++++++++++---------------
fs/locks.c | 2 +-
fs/nfs/file.c | 58 ++++++++++++++++++++++++++++++++-----------
fs/nfs/inode.c | 22 ++++++++++++++++
fs/nfs/nfs3proc.c | 6 +---
fs/nfs/nfs4proc.c | 45 ++++++++++------------------------
fs/nfs/pagelist.c | 8 ++++-
fs/nfs/proc.c | 6 +---
include/linux/lockd/bind.h | 3 +-
include/linux/nfs_fs.h | 3 ++
include/linux/nfs_xdr.h | 2 +-
11 files changed, 116 insertions(+), 80 deletions(-)
flock64_to_posix_lock() is already doing this check
Signed-off-by: Benjamin Coddington <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
---
fs/nfs/nfs4proc.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 8981803..2231d7d 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -6108,9 +6108,6 @@ nfs4_proc_lock(struct file *filp, int cmd, struct file_lock *request)
ctx = nfs_file_open_context(filp);
state = ctx->state;
- if (request->fl_start < 0 || request->fl_end < 0)
- return -EINVAL;
-
if (IS_GETLK(cmd)) {
if (state != NULL)
return nfs4_proc_getlk(state, F_GETLK, request);
--
1.7.1
nfs4_proc_unlck() acquires the nfs_open_context from the file_lock's
fl_file->private_data, but we will not always have a referenced fl_file in this
path. Instead, pass along the nfs_open_context from the call to
nfs4_proc_lock(). This allows us to use nfs4_proc_unlck() without a
valid reference in fl_file.
Signed-off-by: Benjamin Coddington <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
---
fs/nfs/nfs4proc.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 784ba4e..279c8b3 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5656,8 +5656,10 @@ static struct rpc_task *nfs4_do_unlck(struct file_lock *fl,
return rpc_run_task(&task_setup_data);
}
-static int nfs4_proc_unlck(struct nfs4_state *state, int cmd, struct file_lock *request)
+static int nfs4_proc_unlck(struct nfs_open_context *ctx,
+ int cmd, struct file_lock *request)
{
+ struct nfs4_state *state = ctx->state;
struct inode *inode = state->inode;
struct nfs4_state_owner *sp = state->owner;
struct nfs_inode *nfsi = NFS_I(inode);
@@ -5693,7 +5695,7 @@ static int nfs4_proc_unlck(struct nfs4_state *state, int cmd, struct file_lock *
status = -ENOMEM;
if (IS_ERR(seqid))
goto out;
- task = nfs4_do_unlck(request, nfs_file_open_context(request->fl_file), lsp, seqid);
+ task = nfs4_do_unlck(request, ctx, lsp, seqid);
status = PTR_ERR(task);
if (IS_ERR(task))
goto out;
@@ -6117,7 +6119,7 @@ nfs4_proc_lock(struct nfs_open_context *ctx, int cmd, struct file_lock *request)
if (request->fl_type == F_UNLCK) {
if (state != NULL)
- return nfs4_proc_unlck(state, cmd, request);
+ return nfs4_proc_unlck(ctx, cmd, request);
return 0;
}
--
1.7.1
We only need to check lock exclusive/shared types against open mode when
flock() is used on NFS, so move it into the flock-specific path instead of
checking it for all locks.
Signed-off-by: Benjamin Coddington <[email protected]>
---
fs/nfs/file.c | 20 +++++++++++++++++---
fs/nfs/nfs4proc.c | 13 -------------
2 files changed, 17 insertions(+), 16 deletions(-)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 93e2364..8b16a08 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -890,9 +890,23 @@ int nfs_flock(struct file *filp, int cmd, struct file_lock *fl)
if (NFS_SERVER(inode)->flags & NFS_MOUNT_LOCAL_FLOCK)
is_local = 1;
- /* We're simulating flock() locks using posix locks on the server */
- if (fl->fl_type == F_UNLCK)
- return do_unlk(filp, cmd, fl, is_local);
+ /*
+ * VFS doesn't require the open mode to match a flock() lock's type.
+ * NFS, however, may simulate flock() locking with posix locking which
+ * requires the open mode to match the lock type.
+ */
+ switch (fl->fl_type) {
+ case F_UNLCK:
+ return do_unlk(filp, cmd, fl, is_local);
+ case F_RDLCK:
+ if (!(filp->f_mode & FMODE_READ))
+ return -EBADF;
+ break;
+ case F_WRLCK:
+ if (!(filp->f_mode & FMODE_WRITE))
+ return -EBADF;
+ }
+
return do_setlk(filp, cmd, fl, is_local);
}
EXPORT_SYMBOL_GPL(nfs_flock);
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 2231d7d..05ea1e1 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -6125,19 +6125,6 @@ nfs4_proc_lock(struct file *filp, int cmd, struct file_lock *request)
if (state == NULL)
return -ENOLCK;
- /*
- * Don't rely on the VFS having checked the file open mode,
- * since it won't do this for flock() locks.
- */
- switch (request->fl_type) {
- case F_RDLCK:
- if (!(filp->f_mode & FMODE_READ))
- return -EBADF;
- break;
- case F_WRLCK:
- if (!(filp->f_mode & FMODE_WRITE))
- return -EBADF;
- }
do {
status = nfs4_proc_setlk(state, cmd, request);
--
1.7.1
We only use the file struct pointer to obtain the nfs_open_context (or the
inode for v2/v3), so just pass that instead. This allows us to use the
lock procedure without a reference to a struct file which may not be
available while releasing locks after a file close.
Signed-off-by: Benjamin Coddington <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
---
fs/nfs/file.c | 9 ++++++---
fs/nfs/nfs3proc.c | 4 ++--
fs/nfs/nfs4proc.c | 4 +---
fs/nfs/proc.c | 4 ++--
include/linux/nfs_xdr.h | 2 +-
5 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 8b16a08..20a0541 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -711,6 +711,7 @@ static int
do_getlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
{
struct inode *inode = filp->f_mapping->host;
+ struct nfs_open_context *ctx = nfs_file_open_context(filp);
int status = 0;
unsigned int saved_type = fl->fl_type;
@@ -728,7 +729,7 @@ do_getlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
if (is_local)
goto out_noconflict;
- status = NFS_PROTO(inode)->lock(filp, cmd, fl);
+ status = NFS_PROTO(inode)->lock(ctx, cmd, fl);
out:
return status;
out_noconflict:
@@ -745,6 +746,7 @@ static int
do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
{
struct inode *inode = filp->f_mapping->host;
+ struct nfs_open_context *ctx = nfs_file_open_context(filp);
struct nfs_lock_context *l_ctx;
int status;
@@ -771,7 +773,7 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
* "-olocal_lock="
*/
if (!is_local)
- status = NFS_PROTO(inode)->lock(filp, cmd, fl);
+ status = NFS_PROTO(inode)->lock(ctx, cmd, fl);
else
status = do_vfs_lock(filp, fl);
return status;
@@ -786,6 +788,7 @@ static int
do_setlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
{
struct inode *inode = filp->f_mapping->host;
+ struct nfs_open_context *ctx = nfs_file_open_context(filp);
int status;
/*
@@ -801,7 +804,7 @@ do_setlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
* "-olocal_lock="
*/
if (!is_local)
- status = NFS_PROTO(inode)->lock(filp, cmd, fl);
+ status = NFS_PROTO(inode)->lock(ctx, cmd, fl);
else
status = do_vfs_lock(filp, fl);
if (status < 0)
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index cb28cce..a10e147 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -866,9 +866,9 @@ static void nfs3_proc_commit_setup(struct nfs_commit_data *data, struct rpc_mess
}
static int
-nfs3_proc_lock(struct file *filp, int cmd, struct file_lock *fl)
+nfs3_proc_lock(struct nfs_open_context *ctx, int cmd, struct file_lock *fl)
{
- struct inode *inode = file_inode(filp);
+ struct inode *inode = d_inode(ctx->dentry);
return nlmclnt_proc(NFS_SERVER(inode)->nlm_host, cmd, fl);
}
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 05ea1e1..784ba4e 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -6097,15 +6097,13 @@ static int nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock *
}
static int
-nfs4_proc_lock(struct file *filp, int cmd, struct file_lock *request)
+nfs4_proc_lock(struct nfs_open_context *ctx, int cmd, struct file_lock *request)
{
- struct nfs_open_context *ctx;
struct nfs4_state *state;
unsigned long timeout = NFS4_LOCK_MINTIMEOUT;
int status;
/* verify open state */
- ctx = nfs_file_open_context(filp);
state = ctx->state;
if (IS_GETLK(cmd)) {
diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
index b417bbc..185deb9 100644
--- a/fs/nfs/proc.c
+++ b/fs/nfs/proc.c
@@ -634,9 +634,9 @@ nfs_proc_commit_setup(struct nfs_commit_data *data, struct rpc_message *msg)
}
static int
-nfs_proc_lock(struct file *filp, int cmd, struct file_lock *fl)
+nfs_proc_lock(struct nfs_open_context *ctx, int cmd, struct file_lock *fl)
{
- struct inode *inode = file_inode(filp);
+ struct inode *inode = d_inode(ctx->dentry);
return nlmclnt_proc(NFS_SERVER(inode)->nlm_host, cmd, fl);
}
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 11bbae4..58c4682 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1555,7 +1555,7 @@ struct nfs_rpc_ops {
void (*commit_setup) (struct nfs_commit_data *, struct rpc_message *);
void (*commit_rpc_prepare)(struct rpc_task *, struct nfs_commit_data *);
int (*commit_done) (struct rpc_task *, struct nfs_commit_data *);
- int (*lock)(struct file *, int, struct file_lock *);
+ int (*lock)(struct nfs_open_context *, int, struct file_lock *);
int (*lock_check_bounds)(const struct file_lock *);
void (*clear_acl_cache)(struct inode *);
void (*close_context)(struct nfs_open_context *ctx, int);
--
1.7.1
Instead of using fl_file to reference the inode in nlmclnt_setlockargs() to
set the filehandle, send the inode as an argument. That way, we can use
nlmclnt_proc without having a valid fl_file, which can happen if we are
releasing locks after a close.
Signed-off-by: Benjamin Coddington <[email protected]>
---
fs/lockd/clntproc.c | 11 ++++++-----
1 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index 72b65c4..91aa483 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -122,14 +122,15 @@ static struct nlm_lockowner *nlm_find_lockowner(struct nlm_host *host, fl_owner_
/*
* Initialize arguments for TEST/LOCK/UNLOCK/CANCEL calls
*/
-static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl)
+static void nlmclnt_setlockargs(struct nlm_rqst *req, struct file_lock *fl,
+ struct inode *inode)
{
struct nlm_args *argp = &req->a_args;
struct nlm_lock *lock = &argp->lock;
char *nodename = req->a_host->h_rpcclnt->cl_nodename;
nlmclnt_next_cookie(&argp->cookie);
- memcpy(&lock->fh, NFS_FH(file_inode(fl->fl_file)), sizeof(struct nfs_fh));
+ memcpy(&lock->fh, NFS_FH(inode), sizeof(struct nfs_fh));
lock->caller = nodename;
lock->oh.data = req->a_owner;
lock->oh.len = snprintf(req->a_owner, sizeof(req->a_owner), "%u@%s",
@@ -171,7 +172,7 @@ int nlmclnt_proc(struct nfs_open_context *ctx, int cmd, struct file_lock *fl)
return -ENOMEM;
}
/* Set up the argument struct */
- nlmclnt_setlockargs(call, fl);
+ nlmclnt_setlockargs(call, fl, inode);
if (IS_SETLK(cmd) || IS_SETLKW(cmd)) {
if (fl->fl_type != F_UNLCK) {
@@ -619,7 +620,7 @@ nlmclnt_reclaim(struct nlm_host *host, struct file_lock *fl,
req->a_host = host;
/* Set up the argument struct */
- nlmclnt_setlockargs(req, fl);
+ nlmclnt_setlockargs(req, fl, file_inode(fl->fl_file));
req->a_args.reclaim = 1;
status = nlmclnt_call(nfs_file_cred(fl->fl_file), req, NLMPROC_LOCK);
@@ -746,7 +747,7 @@ static int nlmclnt_cancel(struct nlm_host *host, int block, struct file_lock *fl
return -ENOMEM;
req->a_flags = RPC_TASK_ASYNC;
- nlmclnt_setlockargs(req, fl);
+ nlmclnt_setlockargs(req, fl, file_inode(fl->fl_file));
req->a_args.block = block;
atomic_inc(&req->a_count);
--
1.7.1
We cannot depend on a file_lock->fl_file still being around while
performing unlocks for NFSv2 and NFSv3. Pass along the nfs_open_context to
provide the necessary inode and credential to complete the unlock.
Signed-off-by: Benjamin Coddington <[email protected]>
---
fs/lockd/clntproc.c | 16 ++++++++++------
fs/nfs/nfs3proc.c | 4 +---
fs/nfs/proc.c | 4 +---
include/linux/lockd/bind.h | 3 ++-
4 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index 1129520..72b65c4 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -25,7 +25,8 @@
static int nlmclnt_test(struct nlm_rqst *, struct file_lock *);
static int nlmclnt_lock(struct nlm_rqst *, struct file_lock *);
-static int nlmclnt_unlock(struct nlm_rqst *, struct file_lock *);
+static int nlmclnt_unlock(struct nfs_open_context *nfs,
+ struct nlm_rqst *, struct file_lock *);
static int nlm_stat_to_errno(__be32 stat);
static void nlmclnt_locks_init_private(struct file_lock *fl, struct nlm_host *host);
static int nlmclnt_cancel(struct nlm_host *, int , struct file_lock *);
@@ -147,13 +148,15 @@ static void nlmclnt_release_lockargs(struct nlm_rqst *req)
/**
* nlmclnt_proc - Perform a single client-side lock request
- * @host: address of a valid nlm_host context representing the NLM server
+ * @ctx: address of the nfs_open_context for lock to operate upon
* @cmd: fcntl-style file lock operation to perform
* @fl: address of arguments for the lock operation
*
*/
-int nlmclnt_proc(struct nlm_host *host, int cmd, struct file_lock *fl)
+int nlmclnt_proc(struct nfs_open_context *ctx, int cmd, struct file_lock *fl)
{
+ struct inode *inode = d_inode(ctx->dentry);
+ struct nlm_host *host = NFS_SERVER(inode)->nlm_host;
struct nlm_rqst *call;
int status;
@@ -175,7 +178,7 @@ int nlmclnt_proc(struct nlm_host *host, int cmd, struct file_lock *fl)
call->a_args.block = IS_SETLKW(cmd) ? 1 : 0;
status = nlmclnt_lock(call, fl);
} else
- status = nlmclnt_unlock(call, fl);
+ status = nlmclnt_unlock(ctx, call, fl);
} else if (IS_GETLK(cmd))
status = nlmclnt_test(call, fl);
else
@@ -646,7 +649,8 @@ nlmclnt_reclaim(struct nlm_host *host, struct file_lock *fl,
* UNLOCK: remove an existing lock
*/
static int
-nlmclnt_unlock(struct nlm_rqst *req, struct file_lock *fl)
+nlmclnt_unlock(struct nfs_open_context *ctx,
+ struct nlm_rqst *req, struct file_lock *fl)
{
struct nlm_host *host = req->a_host;
struct nlm_res *resp = &req->a_res;
@@ -669,7 +673,7 @@ nlmclnt_unlock(struct nlm_rqst *req, struct file_lock *fl)
}
atomic_inc(&req->a_count);
- status = nlmclnt_async_call(nfs_file_cred(fl->fl_file), req,
+ status = nlmclnt_async_call(ctx->cred, req,
NLMPROC_UNLOCK, &nlmclnt_unlock_ops);
if (status < 0)
goto out;
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index a10e147..2941ab7 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -868,9 +868,7 @@ static void nfs3_proc_commit_setup(struct nfs_commit_data *data, struct rpc_mess
static int
nfs3_proc_lock(struct nfs_open_context *ctx, int cmd, struct file_lock *fl)
{
- struct inode *inode = d_inode(ctx->dentry);
-
- return nlmclnt_proc(NFS_SERVER(inode)->nlm_host, cmd, fl);
+ return nlmclnt_proc(ctx, cmd, fl);
}
static int nfs3_have_delegation(struct inode *inode, fmode_t flags)
diff --git a/fs/nfs/proc.c b/fs/nfs/proc.c
index 185deb9..17d892e 100644
--- a/fs/nfs/proc.c
+++ b/fs/nfs/proc.c
@@ -636,9 +636,7 @@ nfs_proc_commit_setup(struct nfs_commit_data *data, struct rpc_message *msg)
static int
nfs_proc_lock(struct nfs_open_context *ctx, int cmd, struct file_lock *fl)
{
- struct inode *inode = d_inode(ctx->dentry);
-
- return nlmclnt_proc(NFS_SERVER(inode)->nlm_host, cmd, fl);
+ return nlmclnt_proc(ctx, cmd, fl);
}
/* Helper functions for NFS lock bounds checking */
diff --git a/include/linux/lockd/bind.h b/include/linux/lockd/bind.h
index 4d24d64..1d62e85 100644
--- a/include/linux/lockd/bind.h
+++ b/include/linux/lockd/bind.h
@@ -18,6 +18,7 @@
/* Dummy declarations */
struct svc_rqst;
+struct nfs_open_context;
/*
* This is the set of functions for lockd->nfsd communication
@@ -52,7 +53,7 @@ struct nlmclnt_initdata {
extern struct nlm_host *nlmclnt_init(const struct nlmclnt_initdata *nlm_init);
extern void nlmclnt_done(struct nlm_host *host);
-extern int nlmclnt_proc(struct nlm_host *host, int cmd,
+extern int nlmclnt_proc(struct nfs_open_context *ctx, int cmd,
struct file_lock *fl);
extern int lockd_up(struct net *net);
extern void lockd_down(struct net *net);
--
1.7.1
Send the inode instead of the file to do_vfs_lock() by using
locks_lock_inode_wait(). This allows an unlock operation to complete
after a close when there may not be an fl_file.
Signed-off-by: Benjamin Coddington <[email protected]>
---
fs/lockd/clntproc.c | 13 +++++++------
1 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index 91aa483..fa1d83b 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -476,9 +476,9 @@ static void nlmclnt_locks_init_private(struct file_lock *fl, struct nlm_host *ho
fl->fl_ops = &nlmclnt_lock_ops;
}
-static int do_vfs_lock(struct file_lock *fl)
+static int do_vfs_lock(struct inode *inode, struct file_lock *fl)
{
- return locks_lock_file_wait(fl->fl_file, fl);
+ return locks_lock_inode_wait(inode, fl);
}
/*
@@ -508,6 +508,7 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
struct nlm_host *host = req->a_host;
struct nlm_res *resp = &req->a_res;
struct nlm_wait *block = NULL;
+ struct inode *inode = file_inode(fl->fl_file);
unsigned char fl_flags = fl->fl_flags;
unsigned char fl_type;
int status = -ENOLCK;
@@ -517,7 +518,7 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
req->a_args.state = nsm_local_state;
fl->fl_flags |= FL_ACCESS;
- status = do_vfs_lock(fl);
+ status = do_vfs_lock(inode, fl);
fl->fl_flags = fl_flags;
if (status < 0)
goto out;
@@ -567,7 +568,7 @@ again:
}
/* Ensure the resulting lock will get added to granted list */
fl->fl_flags |= FL_SLEEP;
- if (do_vfs_lock(fl) < 0)
+ if (do_vfs_lock(inode, fl) < 0)
printk(KERN_WARNING "%s: VFS is out of sync with lock manager!\n", __func__);
up_read(&host->h_rwsem);
fl->fl_flags = fl_flags;
@@ -597,7 +598,7 @@ out_unlock:
fl_type = fl->fl_type;
fl->fl_type = F_UNLCK;
down_read(&host->h_rwsem);
- do_vfs_lock(fl);
+ do_vfs_lock(inode, fl);
up_read(&host->h_rwsem);
fl->fl_type = fl_type;
fl->fl_flags = fl_flags;
@@ -665,7 +666,7 @@ nlmclnt_unlock(struct nfs_open_context *ctx,
*/
fl->fl_flags |= FL_EXISTS;
down_read(&host->h_rwsem);
- status = do_vfs_lock(fl);
+ status = do_vfs_lock(d_inode(ctx->dentry), fl);
up_read(&host->h_rwsem);
fl->fl_flags = fl_flags;
if (status == -ENOENT) {
--
1.7.1
NFS unlock procedures will wait for IO to complete before sending an
unlock. In the case that this wait is interrupted, an unlock may never be
sent if the unlock is in the close path.
On NFSv3, this lost lock will then cause other clients to be blocked from
acquiring a conflicting lock indefinitely. On NFSv4, the nfs4_lock_state
may never be released resulting in the use of invalid stateids for lock
operations after a subsequent open by the same process.
Fix this by setting a flag on the lock context to send an unlock for the
entire file as soon as outstanding IO for that lock context has completed.
A call into NFS_PROTO(inode)->lock() for both posix and flock style locks
is made no matter which original lock type was held, since the FL_EXISTS
check will return the operation early for a non-existent lock.
Signed-off-by: Benjamin Coddington <[email protected]>
---
fs/nfs/file.c | 26 +++++++++++++++++++++-----
fs/nfs/inode.c | 22 ++++++++++++++++++++++
fs/nfs/pagelist.c | 8 ++++++--
include/linux/nfs_fs.h | 3 +++
4 files changed, 52 insertions(+), 7 deletions(-)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 20a0541..3644fed 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -743,6 +743,22 @@ static int do_vfs_lock(struct file *file, struct file_lock *fl)
}
static int
+defer_close_unlk(struct inode *inode, struct nfs_lock_context *l_ctx)
+{
+ struct nfs_io_counter *c = &l_ctx->io_count;
+ int status = 0;
+
+ if (test_bit(NFS_LOCK_CLOSE_UNLOCK, &l_ctx->flags))
+ return -EINPROGRESS;
+
+ if (atomic_read(&c->io_count) != 0) {
+ set_bit(NFS_LOCK_CLOSE_UNLOCK, &l_ctx->flags);
+ status = -EINPROGRESS;
+ }
+ return status;
+}
+
+static int
do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
{
struct inode *inode = filp->f_mapping->host;
@@ -758,16 +774,16 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
l_ctx = nfs_get_lock_context(nfs_file_open_context(filp));
if (!IS_ERR(l_ctx)) {
- status = nfs_iocounter_wait(&l_ctx->io_count);
+ if (fl->fl_flags & FL_CLOSE)
+ status = defer_close_unlk(inode, l_ctx);
+ else
+ status = nfs_iocounter_wait(&l_ctx->io_count);
+
nfs_put_lock_context(l_ctx);
if (status < 0)
return status;
}
- /* NOTE: special case
- * If we're signalled while cleaning up locks on process exit, we
- * still need to complete the unlock.
- */
/*
* Use local locking if mounted with "-onolock" or with appropriate
* "-olocal_lock="
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 31b0a52..065c8a9 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -699,6 +699,7 @@ static void nfs_init_lock_context(struct nfs_lock_context *l_ctx)
l_ctx->lockowner.l_owner = current->files;
l_ctx->lockowner.l_pid = current->tgid;
INIT_LIST_HEAD(&l_ctx->list);
+ l_ctx->flags = 0;
nfs_iocounter_init(&l_ctx->io_count);
}
@@ -759,6 +760,27 @@ void nfs_put_lock_context(struct nfs_lock_context *l_ctx)
}
EXPORT_SYMBOL_GPL(nfs_put_lock_context);
+void nfs_unlock_lock_context(struct nfs_lock_context *l_ctx)
+{
+ struct inode *inode = d_inode(l_ctx->open_context->dentry);
+ struct file_lock fl = {
+ .fl_type = F_UNLCK,
+ .fl_start = 0,
+ .fl_end = OFFSET_MAX,
+ .fl_owner = l_ctx->lockowner.l_owner,
+ .fl_pid = l_ctx->lockowner.l_pid,
+ };
+
+ fl.fl_flags = FL_POSIX | FL_CLOSE;
+ NFS_PROTO(inode)->lock(l_ctx->open_context, F_SETLK, &fl);
+ if (fl.fl_ops)
+ fl.fl_ops->fl_release_private(&fl);
+ fl.fl_flags = FL_FLOCK | FL_CLOSE;
+ NFS_PROTO(inode)->lock(l_ctx->open_context, F_SETLK, &fl);
+ if (fl.fl_ops)
+ fl.fl_ops->fl_release_private(&fl);
+}
+
/**
* nfs_close_context - Common close_context() routine NFSv2/v3
* @ctx: pointer to context
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index fe3ddd2..f914fbe 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -108,9 +108,13 @@ nfs_iocounter_inc(struct nfs_io_counter *c)
}
static void
-nfs_iocounter_dec(struct nfs_io_counter *c)
+nfs_iocounter_dec(struct nfs_lock_context *l_ctx)
{
+ struct nfs_io_counter *c = &l_ctx->io_count;
+
if (atomic_dec_and_test(&c->io_count)) {
+ if (test_and_clear_bit(NFS_LOCK_CLOSE_UNLOCK, &l_ctx->flags))
+ nfs_unlock_lock_context(l_ctx);
clear_bit(NFS_IO_INPROGRESS, &c->flags);
smp_mb__after_atomic();
wake_up_bit(&c->flags, NFS_IO_INPROGRESS);
@@ -431,7 +435,7 @@ static void nfs_clear_request(struct nfs_page *req)
req->wb_page = NULL;
}
if (l_ctx != NULL) {
- nfs_iocounter_dec(&l_ctx->io_count);
+ nfs_iocounter_dec(l_ctx);
nfs_put_lock_context(l_ctx);
req->wb_lock_context = NULL;
}
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index c0e9614..b105144 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -72,6 +72,8 @@ struct nfs_lock_context {
struct nfs_open_context *open_context;
struct nfs_lockowner lockowner;
struct nfs_io_counter io_count;
+#define NFS_LOCK_CLOSE_UNLOCK 0
+ unsigned long flags;
};
struct nfs4_state;
@@ -373,6 +375,7 @@ extern void nfs_file_set_open_context(struct file *filp, struct nfs_open_context
extern void nfs_file_clear_open_context(struct file *flip);
extern struct nfs_lock_context *nfs_get_lock_context(struct nfs_open_context *ctx);
extern void nfs_put_lock_context(struct nfs_lock_context *l_ctx);
+extern void nfs_unlock_lock_context(struct nfs_lock_context *l_ctx);
extern u64 nfs_compat_user_ino64(u64 fileid);
extern void nfs_fattr_init(struct nfs_fattr *fattr);
extern void nfs_fattr_set_barrier(struct nfs_fattr *fattr);
--
1.7.1
Use FL_CLOSE in fl_flags as in locks_remove_posix() when clearing locks.
NFS will depend on this flag to properly defer an unlock until IO under the
current lock has completed.
Signed-off-by: Benjamin Coddington <[email protected]>
---
fs/locks.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 0d2b326..9aea07a 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2423,7 +2423,7 @@ locks_remove_flock(struct file *filp, struct file_lock_context *flctx)
.fl_owner = filp,
.fl_pid = current->tgid,
.fl_file = filp,
- .fl_flags = FL_FLOCK,
+ .fl_flags = FL_FLOCK | FL_CLOSE,
.fl_type = F_UNLCK,
.fl_end = OFFSET_MAX,
};
--
1.7.1
The three versions of do_vfs_lock() only called locks_lock_inode_wait(), so
just call that directly.
Signed-off-by: Benjamin Coddington <[email protected]>
---
fs/lockd/clntproc.c | 13 ++++---------
fs/nfs/file.c | 9 ++-------
fs/nfs/nfs4proc.c | 17 +++++++----------
3 files changed, 13 insertions(+), 26 deletions(-)
diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index fa1d83b..58145db 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -476,11 +476,6 @@ static void nlmclnt_locks_init_private(struct file_lock *fl, struct nlm_host *ho
fl->fl_ops = &nlmclnt_lock_ops;
}
-static int do_vfs_lock(struct inode *inode, struct file_lock *fl)
-{
- return locks_lock_inode_wait(inode, fl);
-}
-
/*
* LOCK: Try to create a lock
*
@@ -518,7 +513,7 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
req->a_args.state = nsm_local_state;
fl->fl_flags |= FL_ACCESS;
- status = do_vfs_lock(inode, fl);
+ status = locks_lock_inode_wait(inode, fl);
fl->fl_flags = fl_flags;
if (status < 0)
goto out;
@@ -568,7 +563,7 @@ again:
}
/* Ensure the resulting lock will get added to granted list */
fl->fl_flags |= FL_SLEEP;
- if (do_vfs_lock(inode, fl) < 0)
+ if (locks_lock_inode_wait(inode, fl) < 0)
printk(KERN_WARNING "%s: VFS is out of sync with lock manager!\n", __func__);
up_read(&host->h_rwsem);
fl->fl_flags = fl_flags;
@@ -598,7 +593,7 @@ out_unlock:
fl_type = fl->fl_type;
fl->fl_type = F_UNLCK;
down_read(&host->h_rwsem);
- do_vfs_lock(inode, fl);
+ locks_lock_inode_wait(inode, fl);
up_read(&host->h_rwsem);
fl->fl_type = fl_type;
fl->fl_flags = fl_flags;
@@ -666,7 +661,7 @@ nlmclnt_unlock(struct nfs_open_context *ctx,
*/
fl->fl_flags |= FL_EXISTS;
down_read(&host->h_rwsem);
- status = do_vfs_lock(d_inode(ctx->dentry), fl);
+ status = locks_lock_inode_wait(d_inode(ctx->dentry), fl);
up_read(&host->h_rwsem);
fl->fl_flags = fl_flags;
if (status == -ENOENT) {
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 3644fed..5842d0f 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -737,11 +737,6 @@ out_noconflict:
goto out;
}
-static int do_vfs_lock(struct file *file, struct file_lock *fl)
-{
- return locks_lock_file_wait(file, fl);
-}
-
static int
defer_close_unlk(struct inode *inode, struct nfs_lock_context *l_ctx)
{
@@ -791,7 +786,7 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
if (!is_local)
status = NFS_PROTO(inode)->lock(ctx, cmd, fl);
else
- status = do_vfs_lock(filp, fl);
+ status = locks_lock_file_wait(filp, fl);
return status;
}
@@ -822,7 +817,7 @@ do_setlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
if (!is_local)
status = NFS_PROTO(inode)->lock(ctx, cmd, fl);
else
- status = do_vfs_lock(filp, fl);
+ status = locks_lock_file_wait(filp, fl);
if (status < 0)
goto out;
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 279c8b3..02c246f 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5510,11 +5510,6 @@ static int nfs4_proc_getlk(struct nfs4_state *state, int cmd, struct file_lock *
return err;
}
-static int do_vfs_lock(struct inode *inode, struct file_lock *fl)
-{
- return locks_lock_inode_wait(inode, fl);
-}
-
struct nfs4_unlockdata {
struct nfs_locku_args arg;
struct nfs_locku_res res;
@@ -5567,7 +5562,8 @@ static void nfs4_locku_done(struct rpc_task *task, void *data)
switch (task->tk_status) {
case 0:
renew_lease(calldata->server, calldata->timestamp);
- do_vfs_lock(calldata->lsp->ls_state->inode, &calldata->fl);
+ locks_lock_inode_wait(calldata->lsp->ls_state->inode,
+ &calldata->fl);
if (nfs4_update_lock_stateid(calldata->lsp,
&calldata->res.stateid))
break;
@@ -5677,7 +5673,7 @@ static int nfs4_proc_unlck(struct nfs_open_context *ctx,
mutex_lock(&sp->so_delegreturn_mutex);
/* Exclude nfs4_reclaim_open_stateid() - note nesting! */
down_read(&nfsi->rwsem);
- if (do_vfs_lock(inode, request) == -ENOENT) {
+ if (locks_lock_inode_wait(inode, request) == -ENOENT) {
up_read(&nfsi->rwsem);
mutex_unlock(&sp->so_delegreturn_mutex);
goto out;
@@ -5818,7 +5814,8 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
data->timestamp);
if (data->arg.new_lock) {
data->fl.fl_flags &= ~(FL_SLEEP | FL_ACCESS);
- if (do_vfs_lock(lsp->ls_state->inode, &data->fl) < 0) {
+ if (locks_lock_inode_wait(lsp->ls_state->inode,
+ &data->fl) < 0) {
rpc_restart_call_prepare(task);
break;
}
@@ -6060,7 +6057,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
if (status != 0)
goto out;
request->fl_flags |= FL_ACCESS;
- status = do_vfs_lock(state->inode, request);
+ status = locks_lock_inode_wait(state->inode, request);
if (status < 0)
goto out;
down_read(&nfsi->rwsem);
@@ -6068,7 +6065,7 @@ static int _nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock
/* Yes: cache locks! */
/* ...but avoid races with delegation recall... */
request->fl_flags = fl_flags & ~FL_SLEEP;
- status = do_vfs_lock(state->inode, request);
+ status = locks_lock_inode_wait(state->inode, request);
up_read(&nfsi->rwsem);
goto out;
}
--
1.7.1
Looks good,
Reviewed-by: Christoph Hellwig <[email protected]>
Looks fine, although the switch label indentation is a little
unfortunate.
Reviewed-by: Christoph Hellwig <[email protected]>
Looks fine,
Reviewed-by: Christoph Hellwig <[email protected]>
Looks fine,
Reviewed-by: Christoph Hellwig <[email protected]>
Looks fine,
Reviewed-by: Christoph Hellwig <[email protected]>
I'd use 'pass' instead of 'send' in the patch description, but otherwise
this looks fine to me:
Reviewed-by: Christoph Hellwig <[email protected]>
Looks fine,
Reviewed-by: Christoph Hellwig <[email protected]>
On Wed, Dec 30, 2015 at 08:14:04AM -0500, Benjamin Coddington wrote:
> Use FL_CLOSE in fl_flags as in locks_remove_posix() when clearing locks.
> NFS will depend on this flag to properly defer an unlock until IO under the
> current lock has completed.
I wish we had a different op for releasing locks vs acquiring them,
but for now this looks fine to me:
Reviewed-by: Christoph Hellwig <[email protected]>
Looks fine,
Reviewed-by: Christoph Hellwig <[email protected]>
On Wed, Dec 30, 2015 at 8:14 AM, Benjamin Coddington
<[email protected]> wrote:
> NFS unlock procedures will wait for IO to complete before sending an
> unlock. In the case that this wait is interrupted, an unlock may never be
> sent if the unlock is in the close path.
>
> On NFSv3, this lost lock will then cause other clients to be blocked from
> acquiring a conflicting lock indefinitely. On NFSv4, the nfs4_lock_state
> may never be released resulting in the use of invalid stateids for lock
> operations after a subsequent open by the same process.
>
> Fix this by setting a flag on the lock context to send an unlock for the
> entire file as soon as outstanding IO for that lock context has completed.
> A call into NFS_PROTO(inode)->lock() for both posix and flock style locks
> is made no matter which original lock type was held, since the FL_EXISTS
> check will return the operation early for a non-existent lock.
>
> Signed-off-by: Benjamin Coddington <[email protected]>
> ---
> fs/nfs/file.c | 26 +++++++++++++++++++++-----
> fs/nfs/inode.c | 22 ++++++++++++++++++++++
> fs/nfs/pagelist.c | 8 ++++++--
> include/linux/nfs_fs.h | 3 +++
> 4 files changed, 52 insertions(+), 7 deletions(-)
>
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index 20a0541..3644fed 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -743,6 +743,22 @@ static int do_vfs_lock(struct file *file, struct file_lock *fl)
> }
>
> static int
> +defer_close_unlk(struct inode *inode, struct nfs_lock_context *l_ctx)
> +{
> + struct nfs_io_counter *c = &l_ctx->io_count;
> + int status = 0;
> +
> + if (test_bit(NFS_LOCK_CLOSE_UNLOCK, &l_ctx->flags))
> + return -EINPROGRESS;
Why is this needed? Normally, the process should set FL_CLOSE only
once per fl_owner_t.
> +
> + if (atomic_read(&c->io_count) != 0) {
> + set_bit(NFS_LOCK_CLOSE_UNLOCK, &l_ctx->flags);
What guarantees atomicity between io_count being non-zero and the
setting of NFS_LOCK_CLOSE_UNLOCK?
> + status = -EINPROGRESS;
> + }
> + return status;
> +}
> +
> +static int
> do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
> {
> struct inode *inode = filp->f_mapping->host;
> @@ -758,16 +774,16 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
>
> l_ctx = nfs_get_lock_context(nfs_file_open_context(filp));
> if (!IS_ERR(l_ctx)) {
> - status = nfs_iocounter_wait(&l_ctx->io_count);
> + if (fl->fl_flags & FL_CLOSE)
> + status = defer_close_unlk(inode, l_ctx);
> + else
> + status = nfs_iocounter_wait(&l_ctx->io_count);
> +
> nfs_put_lock_context(l_ctx);
> if (status < 0)
> return status;
Question: instead of deferring unlock, why can't we simply ignore the
return value of nfs_iocounter_wait(), just like we do for the return
value of vfs_fsync() in this function?
> }
>
> - /* NOTE: special case
> - * If we're signalled while cleaning up locks on process exit, we
> - * still need to complete the unlock.
> - */
> /*
> * Use local locking if mounted with "-onolock" or with appropriate
> * "-olocal_lock="
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index 31b0a52..065c8a9 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -699,6 +699,7 @@ static void nfs_init_lock_context(struct nfs_lock_context *l_ctx)
> l_ctx->lockowner.l_owner = current->files;
> l_ctx->lockowner.l_pid = current->tgid;
> INIT_LIST_HEAD(&l_ctx->list);
> + l_ctx->flags = 0;
> nfs_iocounter_init(&l_ctx->io_count);
> }
>
> @@ -759,6 +760,27 @@ void nfs_put_lock_context(struct nfs_lock_context *l_ctx)
> }
> EXPORT_SYMBOL_GPL(nfs_put_lock_context);
>
> +void nfs_unlock_lock_context(struct nfs_lock_context *l_ctx)
> +{
> + struct inode *inode = d_inode(l_ctx->open_context->dentry);
> + struct file_lock fl = {
> + .fl_type = F_UNLCK,
> + .fl_start = 0,
> + .fl_end = OFFSET_MAX,
> + .fl_owner = l_ctx->lockowner.l_owner,
> + .fl_pid = l_ctx->lockowner.l_pid,
> + };
> +
> + fl.fl_flags = FL_POSIX | FL_CLOSE;
> + NFS_PROTO(inode)->lock(l_ctx->open_context, F_SETLK, &fl);
> + if (fl.fl_ops)
> + fl.fl_ops->fl_release_private(&fl);
> + fl.fl_flags = FL_FLOCK | FL_CLOSE;
> + NFS_PROTO(inode)->lock(l_ctx->open_context, F_SETLK, &fl);
> + if (fl.fl_ops)
> + fl.fl_ops->fl_release_private(&fl);
This is releasing both POSIX and flock() locks, but they all use
different fl_owner_t values. Each lock context therefore corresponds
to _either_ a POSIX _or_ a flock() lock, or an OFD lock. Furthermore,
you can end up with a double free here because you don't reset
fl.fl_ops->fl_release_private to NULL before calling into the flock()
case.
Also, what about the case where this is a local lock (i.e. we mounted
with -onolock or -olocal_lock=)?
> +}
> +
> /**
> * nfs_close_context - Common close_context() routine NFSv2/v3
> * @ctx: pointer to context
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index fe3ddd2..f914fbe 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -108,9 +108,13 @@ nfs_iocounter_inc(struct nfs_io_counter *c)
> }
>
> static void
> -nfs_iocounter_dec(struct nfs_io_counter *c)
> +nfs_iocounter_dec(struct nfs_lock_context *l_ctx)
> {
> + struct nfs_io_counter *c = &l_ctx->io_count;
> +
> if (atomic_dec_and_test(&c->io_count)) {
> + if (test_and_clear_bit(NFS_LOCK_CLOSE_UNLOCK, &l_ctx->flags))
> + nfs_unlock_lock_context(l_ctx);
What guarantees that you are in a context where you are allowed to
make synchronous RPC calls here? AFAICS you are almost always calling
from an asynchronous I/O path callback.
> clear_bit(NFS_IO_INPROGRESS, &c->flags);
> smp_mb__after_atomic();
> wake_up_bit(&c->flags, NFS_IO_INPROGRESS);
BTW: nfs_iocounter_dec() could use a cleanup. Now that we have
wait_on_atomic_t(), there is no need for the NFS_IO_INPROGRESS bit.
> @@ -431,7 +435,7 @@ static void nfs_clear_request(struct nfs_page *req)
> req->wb_page = NULL;
> }
> if (l_ctx != NULL) {
> - nfs_iocounter_dec(&l_ctx->io_count);
> + nfs_iocounter_dec(l_ctx);
> nfs_put_lock_context(l_ctx);
> req->wb_lock_context = NULL;
> }
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index c0e9614..b105144 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -72,6 +72,8 @@ struct nfs_lock_context {
> struct nfs_open_context *open_context;
> struct nfs_lockowner lockowner;
> struct nfs_io_counter io_count;
> +#define NFS_LOCK_CLOSE_UNLOCK 0
> + unsigned long flags;
> };
>
> struct nfs4_state;
> @@ -373,6 +375,7 @@ extern void nfs_file_set_open_context(struct file *filp, struct nfs_open_context
> extern void nfs_file_clear_open_context(struct file *flip);
> extern struct nfs_lock_context *nfs_get_lock_context(struct nfs_open_context *ctx);
> extern void nfs_put_lock_context(struct nfs_lock_context *l_ctx);
> +extern void nfs_unlock_lock_context(struct nfs_lock_context *l_ctx);
> extern u64 nfs_compat_user_ino64(u64 fileid);
> extern void nfs_fattr_init(struct nfs_fattr *fattr);
> extern void nfs_fattr_set_barrier(struct nfs_fattr *fattr);
> --
> 1.7.1
On Mon, 4 Jan 2016, Trond Myklebust wrote:
> On Wed, Dec 30, 2015 at 8:14 AM, Benjamin Coddington
> <[email protected]> wrote:
> > NFS unlock procedures will wait for IO to complete before sending an
> > unlock. In the case that this wait is interrupted, an unlock may never be
> > sent if the unlock is in the close path.
> >
> > On NFSv3, this lost lock will then cause other clients to be blocked from
> > acquiring a conflicting lock indefinitely. On NFSv4, the nfs4_lock_state
> > may never be released resulting in the use of invalid stateids for lock
> > operations after a subsequent open by the same process.
> >
> > Fix this by setting a flag on the lock context to send an unlock for the
> > entire file as soon as outstanding IO for that lock context has completed.
> > A call into NFS_PROTO(inode)->lock() for both posix and flock style locks
> > is made no matter which original lock type was held, since the FL_EXISTS
> > check will return the operation early for a non-existent lock.
> >
> > Signed-off-by: Benjamin Coddington <[email protected]>
> > ---
> > fs/nfs/file.c | 26 +++++++++++++++++++++-----
> > fs/nfs/inode.c | 22 ++++++++++++++++++++++
> > fs/nfs/pagelist.c | 8 ++++++--
> > include/linux/nfs_fs.h | 3 +++
> > 4 files changed, 52 insertions(+), 7 deletions(-)
> >
> > diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> > index 20a0541..3644fed 100644
> > --- a/fs/nfs/file.c
> > +++ b/fs/nfs/file.c
> > @@ -743,6 +743,22 @@ static int do_vfs_lock(struct file *file, struct file_lock *fl)
> > }
> >
> > static int
> > +defer_close_unlk(struct inode *inode, struct nfs_lock_context *l_ctx)
> > +{
> > + struct nfs_io_counter *c = &l_ctx->io_count;
> > + int status = 0;
> > +
> > + if (test_bit(NFS_LOCK_CLOSE_UNLOCK, &l_ctx->flags))
> > + return -EINPROGRESS;
>
> Why is this needed? Normally, the process should set FL_CLOSE only
> once per fl_owner_t.
That is true; this is unnecessary. I'll remove it.
> > +
> > + if (atomic_read(&c->io_count) != 0) {
> > + set_bit(NFS_LOCK_CLOSE_UNLOCK, &l_ctx->flags);
>
> What guarantees atomicity between io_count being non-zero and the
> setting of NFS_LOCK_CLOSE_UNLOCK?
Nothing, which is a mistake. I'd removed the bits that ensured atomicity on
a previous version that would do the unlock on the release of the
nfs_lock_context, but ended up scrapping it. I'll add them back.
Aside: an approach that does the unlock on the last placement of the lock
context seems like it might have less spinning and checking in the iocounter
functions. I discarded that because the open context shares lock context
reference accounting and I thought that separating them would be more work,
however it is an unexpected optimization so maybe they should be separated
for clarity; I may look at it again.
> > + status = -EINPROGRESS;
> > + }
> > + return status;
> > +}
> > +
> > +static int
> > do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
> > {
> > struct inode *inode = filp->f_mapping->host;
> > @@ -758,16 +774,16 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
> >
> > l_ctx = nfs_get_lock_context(nfs_file_open_context(filp));
> > if (!IS_ERR(l_ctx)) {
> > - status = nfs_iocounter_wait(&l_ctx->io_count);
> > + if (fl->fl_flags & FL_CLOSE)
> > + status = defer_close_unlk(inode, l_ctx);
> > + else
> > + status = nfs_iocounter_wait(&l_ctx->io_count);
> > +
> > nfs_put_lock_context(l_ctx);
> > if (status < 0)
> > return status;
>
> Question: instead of deferring unlock, why can't we simply ignore the
> return value of nfs_iocounter_wait(), just like we do for the return
> value of vfs_fsync() in this function?
Wouldn't that allow an interrupt in nfs_iocounter_wait() to send the unlock
before I/O completes, which could have the server ask us to resend the I/O
due to an invalid lock stateid?
> > }
> >
> > - /* NOTE: special case
> > - * If we're signalled while cleaning up locks on process exit, we
> > - * still need to complete the unlock.
> > - */
> > /*
> > * Use local locking if mounted with "-onolock" or with appropriate
> > * "-olocal_lock="
> > diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> > index 31b0a52..065c8a9 100644
> > --- a/fs/nfs/inode.c
> > +++ b/fs/nfs/inode.c
> > @@ -699,6 +699,7 @@ static void nfs_init_lock_context(struct nfs_lock_context *l_ctx)
> > l_ctx->lockowner.l_owner = current->files;
> > l_ctx->lockowner.l_pid = current->tgid;
> > INIT_LIST_HEAD(&l_ctx->list);
> > + l_ctx->flags = 0;
> > nfs_iocounter_init(&l_ctx->io_count);
> > }
> >
> > @@ -759,6 +760,27 @@ void nfs_put_lock_context(struct nfs_lock_context *l_ctx)
> > }
> > EXPORT_SYMBOL_GPL(nfs_put_lock_context);
> >
> > +void nfs_unlock_lock_context(struct nfs_lock_context *l_ctx)
> > +{
> > + struct inode *inode = d_inode(l_ctx->open_context->dentry);
> > + struct file_lock fl = {
> > + .fl_type = F_UNLCK,
> > + .fl_start = 0,
> > + .fl_end = OFFSET_MAX,
> > + .fl_owner = l_ctx->lockowner.l_owner,
> > + .fl_pid = l_ctx->lockowner.l_pid,
> > + };
> > +
> > + fl.fl_flags = FL_POSIX | FL_CLOSE;
> > + NFS_PROTO(inode)->lock(l_ctx->open_context, F_SETLK, &fl);
> > + if (fl.fl_ops)
> > + fl.fl_ops->fl_release_private(&fl);
> > + fl.fl_flags = FL_FLOCK | FL_CLOSE;
> > + NFS_PROTO(inode)->lock(l_ctx->open_context, F_SETLK, &fl);
> > + if (fl.fl_ops)
> > + fl.fl_ops->fl_release_private(&fl);
>
> This is releasing both POSIX and flock() locks, but they all use
> different fl_owner_t values. Each lock context therefore corresponds
> to _either_ a POSIX _or_ a flock() lock, or an OFD lock.
This detail I had completely missed. I can carry the lock type along on the
lock context and perform the correct unlock procedure from that.
> to _either_ a POSIX _or_ a flock() lock, or an OFD lock. Furthermore,
> you can end up with a double free here because you don't reset
> fl.fl_ops->fl_release_private to NULL before calling into the flock()
> case.
Yes, what an idiot am I. :)
> Also, what about the case where this is a local lock (i.e. we mounted
> with -onolock or -olocal_lock=)?
Then there's no need at all to defer an unlock, so we should check for it
before setting the flag to defer. That check will be added.
> > +}
> > +
> > /**
> > * nfs_close_context - Common close_context() routine NFSv2/v3
> > * @ctx: pointer to context
> > diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> > index fe3ddd2..f914fbe 100644
> > --- a/fs/nfs/pagelist.c
> > +++ b/fs/nfs/pagelist.c
> > @@ -108,9 +108,13 @@ nfs_iocounter_inc(struct nfs_io_counter *c)
> > }
> >
> > static void
> > -nfs_iocounter_dec(struct nfs_io_counter *c)
> > +nfs_iocounter_dec(struct nfs_lock_context *l_ctx)
> > {
> > + struct nfs_io_counter *c = &l_ctx->io_count;
> > +
> > if (atomic_dec_and_test(&c->io_count)) {
> > + if (test_and_clear_bit(NFS_LOCK_CLOSE_UNLOCK, &l_ctx->flags))
> > + nfs_unlock_lock_context(l_ctx);
>
> What guarantees that you are in a context where you are allowed to
> make synchronous RPC calls here? AFAICS you are almost always calling
> from an asynchronous I/O path callback.
Good point - the lock procedures could skip waiting for FL_CLOSE.
> > clear_bit(NFS_IO_INPROGRESS, &c->flags);
> > smp_mb__after_atomic();
> > wake_up_bit(&c->flags, NFS_IO_INPROGRESS);
>
> BTW: nfs_iocounter_dec() could use a cleanup. Now that we have
> wait_on_atomic_t(), there is no need for the NFS_IO_INPROGRESS bit.
I will attempt this, but please let me know if you consider the review of
my work to be much more work overall than if you were to fix it.. I really
appreciate the review and your time, and Happy New Year to you as well!
Ben
>
> > @@ -431,7 +435,7 @@ static void nfs_clear_request(struct nfs_page *req)
> > req->wb_page = NULL;
> > }
> > if (l_ctx != NULL) {
> > - nfs_iocounter_dec(&l_ctx->io_count);
> > + nfs_iocounter_dec(l_ctx);
> > nfs_put_lock_context(l_ctx);
> > req->wb_lock_context = NULL;
> > }
> > diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> > index c0e9614..b105144 100644
> > --- a/include/linux/nfs_fs.h
> > +++ b/include/linux/nfs_fs.h
> > @@ -72,6 +72,8 @@ struct nfs_lock_context {
> > struct nfs_open_context *open_context;
> > struct nfs_lockowner lockowner;
> > struct nfs_io_counter io_count;
> > +#define NFS_LOCK_CLOSE_UNLOCK 0
> > + unsigned long flags;
> > };
> >
> > struct nfs4_state;
> > @@ -373,6 +375,7 @@ extern void nfs_file_set_open_context(struct file *filp, struct nfs_open_context
> > extern void nfs_file_clear_open_context(struct file *flip);
> > extern struct nfs_lock_context *nfs_get_lock_context(struct nfs_open_context *ctx);
> > extern void nfs_put_lock_context(struct nfs_lock_context *l_ctx);
> > +extern void nfs_unlock_lock_context(struct nfs_lock_context *l_ctx);
> > extern u64 nfs_compat_user_ino64(u64 fileid);
> > extern void nfs_fattr_init(struct nfs_fattr *fattr);
> > extern void nfs_fattr_set_barrier(struct nfs_fattr *fattr);
> > --
> > 1.7.1
>