2023-03-03 12:16:11

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 0/7] lockd: fix races that can result in stuck filelocks

I sent the first patch in this series the other day, but didn't get any
responses. Since then I've had time to follow up on the client-side part
of this problem, which eventually also pointed out yet another bug on
the server side. There are also a couple of cleanup patches in here too,
and a patch to add some tracepoints that I found useful while diagnosing
this.

With this set on both client and server, I'm now able to run Yongcheng's
test for an hour straight with no stuck locks.

Jeff Layton (7):
lockd: purge resources held on behalf of nlm clients when shutting
down
lockd: remove 2 unused helper functions
lockd: move struct nlm_wait to lockd.h
lockd: fix races in client GRANTED_MSG wait logic
lockd: server should unlock lock if client rejects the grant
nfs: move nfs_fhandle_hash to common include file
lockd: add some client-side tracepoints

fs/lockd/Makefile | 6 ++-
fs/lockd/clntlock.c | 58 +++++++++++---------------
fs/lockd/clntproc.c | 42 ++++++++++++++-----
fs/lockd/host.c | 1 +
fs/lockd/svclock.c | 21 ++++++++--
fs/lockd/trace.c | 3 ++
fs/lockd/trace.h | 83 +++++++++++++++++++++++++++++++++++++
fs/nfs/internal.h | 15 -------
include/linux/lockd/lockd.h | 29 ++++++-------
include/linux/nfs.h | 20 +++++++++
10 files changed, 200 insertions(+), 78 deletions(-)
create mode 100644 fs/lockd/trace.c
create mode 100644 fs/lockd/trace.h

--
2.39.2



2023-03-03 12:16:13

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 3/7] lockd: move struct nlm_wait to lockd.h

...and drop the unused b_reclaim field.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/lockd/clntlock.c | 12 ------------
include/linux/lockd/lockd.h | 11 ++++++++++-
2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/fs/lockd/clntlock.c b/fs/lockd/clntlock.c
index 82b19a30e0f0..464cb15c1a06 100644
--- a/fs/lockd/clntlock.c
+++ b/fs/lockd/clntlock.c
@@ -29,18 +29,6 @@ static int reclaimer(void *ptr);
* client perspective.
*/

-/*
- * This is the representation of a blocked client lock.
- */
-struct nlm_wait {
- struct list_head b_list; /* linked list */
- wait_queue_head_t b_wait; /* where to wait on */
- struct nlm_host * b_host;
- struct file_lock * b_lock; /* local file lock */
- unsigned short b_reclaim; /* got to reclaim lock */
- __be32 b_status; /* grant callback status */
-};
-
static LIST_HEAD(nlm_blocked);
static DEFINE_SPINLOCK(nlm_blocked_lock);

diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h
index 26c2aed31a0c..0eec760fcc05 100644
--- a/include/linux/lockd/lockd.h
+++ b/include/linux/lockd/lockd.h
@@ -121,7 +121,16 @@ struct nlm_lockowner {
uint32_t pid;
};

-struct nlm_wait;
+/*
+ * This is the representation of a blocked client lock.
+ */
+struct nlm_wait {
+ struct list_head b_list; /* linked list */
+ wait_queue_head_t b_wait; /* where to wait on */
+ struct nlm_host * b_host;
+ struct file_lock * b_lock; /* local file lock */
+ __be32 b_status; /* grant callback status */
+};

/*
* Memory chunk for NLM client RPC request.
--
2.39.2


2023-03-03 12:16:14

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 5/7] lockd: server should unlock lock if client rejects the grant

Currently lockd just dequeues the block and ignores it if the client
sends a GRANT_RES with a status of nlm_lck_denied. That status is an
indicator that the client has rejected the lock, so the right thing to
do is to unlock the lock we were trying to grant.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2063818
Reported-by: Yongcheng Yang <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/lockd/svclock.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index 4e30f3c50970..c43ccdf28ed9 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -954,19 +954,32 @@ void
nlmsvc_grant_reply(struct nlm_cookie *cookie, __be32 status)
{
struct nlm_block *block;
+ struct file_lock *fl;
+ int error;

dprintk("grant_reply: looking for cookie %x, s=%d \n",
*(unsigned int *)(cookie->data), status);
if (!(block = nlmsvc_find_block(cookie)))
return;

- if (status == nlm_lck_denied_grace_period) {
+ switch (status) {
+ case nlm_lck_denied_grace_period:
/* Try again in a couple of seconds */
nlmsvc_insert_block(block, 10 * HZ);
- } else {
+ break;
+ case nlm_lck_denied:
+ /* Client doesn't want it, just unlock it */
+ nlmsvc_unlink_block(block);
+ fl = &block->b_call->a_args.lock.fl;
+ fl->fl_type = F_UNLCK;
+ error = vfs_lock_file(fl->fl_file, F_SETLK, fl, NULL);
+ if (error)
+ pr_warn("lockd: unable to unlock lock rejected by client!\n");
+ break;
+ default:
/*
- * Lock is now held by client, or has been rejected.
- * In both cases, the block should be removed.
+ * Either it was accepted or the status makes no sense
+ * just unlink it either way.
*/
nlmsvc_unlink_block(block);
}
--
2.39.2


2023-03-03 12:16:14

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 4/7] lockd: fix races in client GRANTED_MSG wait logic

After the wait for a grant is done (for whatever reason), nlmclnt_block
updates the status of the nlm_rqst with the status of the block. At the
point it does this, however, the block is still queued its status could
change at any time.

This is particularly a problem when the waiting task is signaled during
the wait. We can end up giving up on the lock just before the GRANTED_MSG
callback comes in, and accept it even though the lock request gets back
an error, leaving a dangling lock on the server.

Since the nlm_wait never lives beyond the end of nlmclnt_lock, put it on
the stack and add functions to allow us to enqueue and dequeue the
block. Enqueue it just before the lock/wait loop, and dequeue it
just after we exit the loop instead of waiting until the end of
the function. Also, scrape the status at the time that we dequeue it to
ensure that it's final.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2063818
Reported-by: Yongcheng Yang <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/lockd/clntlock.c | 42 ++++++++++++++++++-------------------
fs/lockd/clntproc.c | 28 ++++++++++++++++---------
include/linux/lockd/lockd.h | 8 ++++---
3 files changed, 44 insertions(+), 34 deletions(-)

diff --git a/fs/lockd/clntlock.c b/fs/lockd/clntlock.c
index 464cb15c1a06..c374ee072db3 100644
--- a/fs/lockd/clntlock.c
+++ b/fs/lockd/clntlock.c
@@ -82,41 +82,42 @@ void nlmclnt_done(struct nlm_host *host)
}
EXPORT_SYMBOL_GPL(nlmclnt_done);

+void nlmclnt_prepare_block(struct nlm_wait *block, struct nlm_host *host, struct file_lock *fl)
+{
+ block->b_host = host;
+ block->b_lock = fl;
+ init_waitqueue_head(&block->b_wait);
+ block->b_status = nlm_lck_blocked;
+}
+
/*
* Queue up a lock for blocking so that the GRANTED request can see it
*/
-struct nlm_wait *nlmclnt_prepare_block(struct nlm_host *host, struct file_lock *fl)
+void nlmclnt_queue_block(struct nlm_wait *block)
{
- struct nlm_wait *block;
-
- block = kmalloc(sizeof(*block), GFP_KERNEL);
- if (block != NULL) {
- block->b_host = host;
- block->b_lock = fl;
- init_waitqueue_head(&block->b_wait);
- block->b_status = nlm_lck_blocked;
-
- spin_lock(&nlm_blocked_lock);
- list_add(&block->b_list, &nlm_blocked);
- spin_unlock(&nlm_blocked_lock);
- }
- return block;
+ spin_lock(&nlm_blocked_lock);
+ list_add(&block->b_list, &nlm_blocked);
+ spin_unlock(&nlm_blocked_lock);
}

-void nlmclnt_finish_block(struct nlm_wait *block)
+/*
+ * Dequeue the block and return its final status
+ */
+__be32 nlmclnt_dequeue_block(struct nlm_wait *block)
{
- if (block == NULL)
- return;
+ __be32 status;
+
spin_lock(&nlm_blocked_lock);
list_del(&block->b_list);
+ status = block->b_status;
spin_unlock(&nlm_blocked_lock);
- kfree(block);
+ return status;
}

/*
* Block on a lock
*/
-int nlmclnt_block(struct nlm_wait *block, struct nlm_rqst *req, long timeout)
+int nlmclnt_wait(struct nlm_wait *block, struct nlm_rqst *req, long timeout)
{
long ret;

@@ -142,7 +143,6 @@ int nlmclnt_block(struct nlm_wait *block, struct nlm_rqst *req, long timeout)
/* Reset the lock status after a server reboot so we resend */
if (block->b_status == nlm_lck_denied_grace_period)
block->b_status = nlm_lck_blocked;
- req->a_res.status = block->b_status;
return 0;
}

diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index 16b4de868cd2..a14c9110719c 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -516,9 +516,10 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
const struct cred *cred = nfs_file_cred(fl->fl_file);
struct nlm_host *host = req->a_host;
struct nlm_res *resp = &req->a_res;
- struct nlm_wait *block = NULL;
+ struct nlm_wait block;
unsigned char fl_flags = fl->fl_flags;
unsigned char fl_type;
+ __be32 b_status;
int status = -ENOLCK;

if (nsm_monitor(host) < 0)
@@ -531,31 +532,41 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
if (status < 0)
goto out;

- block = nlmclnt_prepare_block(host, fl);
+ nlmclnt_prepare_block(&block, host, fl);
again:
/*
* Initialise resp->status to a valid non-zero value,
* since 0 == nlm_lck_granted
*/
resp->status = nlm_lck_blocked;
- for(;;) {
+
+ /*
+ * A GRANTED callback can come at any time -- even before the reply
+ * to the LOCK request arrives, so we queue the wait before
+ * requesting the lock.
+ */
+ nlmclnt_queue_block(&block);
+ for (;;) {
/* Reboot protection */
fl->fl_u.nfs_fl.state = host->h_state;
status = nlmclnt_call(cred, req, NLMPROC_LOCK);
if (status < 0)
break;
/* Did a reclaimer thread notify us of a server reboot? */
- if (resp->status == nlm_lck_denied_grace_period)
+ if (resp->status == nlm_lck_denied_grace_period)
continue;
if (resp->status != nlm_lck_blocked)
break;
/* Wait on an NLM blocking lock */
- status = nlmclnt_block(block, req, NLMCLNT_POLL_TIMEOUT);
+ status = nlmclnt_wait(&block, req, NLMCLNT_POLL_TIMEOUT);
if (status < 0)
break;
- if (resp->status != nlm_lck_blocked)
+ if (block.b_status != nlm_lck_blocked)
break;
}
+ b_status = nlmclnt_dequeue_block(&block);
+ if (resp->status == nlm_lck_blocked)
+ resp->status = b_status;

/* if we were interrupted while blocking, then cancel the lock request
* and exit
@@ -564,7 +575,7 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
if (!req->a_args.block)
goto out_unlock;
if (nlmclnt_cancel(host, req->a_args.block, fl) == 0)
- goto out_unblock;
+ goto out;
}

if (resp->status == nlm_granted) {
@@ -593,8 +604,6 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
status = -ENOLCK;
else
status = nlm_stat_to_errno(resp->status);
-out_unblock:
- nlmclnt_finish_block(block);
out:
nlmclnt_release_call(req);
return status;
@@ -602,7 +611,6 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
/* Fatal error: ensure that we remove the lock altogether */
dprintk("lockd: lock attempt ended in fatal error.\n"
" Attempting to unlock.\n");
- nlmclnt_finish_block(block);
fl_type = fl->fl_type;
fl->fl_type = F_UNLCK;
down_read(&host->h_rwsem);
diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h
index 0eec760fcc05..7452fb88ecd4 100644
--- a/include/linux/lockd/lockd.h
+++ b/include/linux/lockd/lockd.h
@@ -211,9 +211,11 @@ struct nlm_rqst * nlm_alloc_call(struct nlm_host *host);
int nlm_async_call(struct nlm_rqst *, u32, const struct rpc_call_ops *);
int nlm_async_reply(struct nlm_rqst *, u32, const struct rpc_call_ops *);
void nlmclnt_release_call(struct nlm_rqst *);
-struct nlm_wait * nlmclnt_prepare_block(struct nlm_host *host, struct file_lock *fl);
-void nlmclnt_finish_block(struct nlm_wait *block);
-int nlmclnt_block(struct nlm_wait *block, struct nlm_rqst *req, long timeout);
+void nlmclnt_prepare_block(struct nlm_wait *block, struct nlm_host *host,
+ struct file_lock *fl);
+void nlmclnt_queue_block(struct nlm_wait *block);
+__be32 nlmclnt_dequeue_block(struct nlm_wait *block);
+int nlmclnt_wait(struct nlm_wait *block, struct nlm_rqst *req, long timeout);
__be32 nlmclnt_grant(const struct sockaddr *addr,
const struct nlm_lock *lock);
void nlmclnt_recovery(struct nlm_host *);
--
2.39.2


2023-03-03 12:16:15

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 7/7] lockd: add some client-side tracepoints

Signed-off-by: Jeff Layton <[email protected]>
---
fs/lockd/Makefile | 6 ++--
fs/lockd/clntlock.c | 4 +++
fs/lockd/clntproc.c | 14 ++++++++
fs/lockd/trace.c | 3 ++
fs/lockd/trace.h | 83 +++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 108 insertions(+), 2 deletions(-)
create mode 100644 fs/lockd/trace.c
create mode 100644 fs/lockd/trace.h

diff --git a/fs/lockd/Makefile b/fs/lockd/Makefile
index 6d5e83ed4476..ac9f9d84510e 100644
--- a/fs/lockd/Makefile
+++ b/fs/lockd/Makefile
@@ -3,10 +3,12 @@
# Makefile for the linux lock manager stuff
#

+ccflags-y += -I$(src) # needed for trace events
+
obj-$(CONFIG_LOCKD) += lockd.o

-lockd-objs-y := clntlock.o clntproc.o clntxdr.o host.o svc.o svclock.o \
- svcshare.o svcproc.o svcsubs.o mon.o xdr.o
+lockd-objs-y += clntlock.o clntproc.o clntxdr.o host.o svc.o svclock.o \
+ svcshare.o svcproc.o svcsubs.o mon.o trace.o xdr.o
lockd-objs-$(CONFIG_LOCKD_V4) += clnt4xdr.o xdr4.o svc4proc.o
lockd-objs-$(CONFIG_PROC_FS) += procfs.o
lockd-objs := $(lockd-objs-y)
diff --git a/fs/lockd/clntlock.c b/fs/lockd/clntlock.c
index c374ee072db3..e3972aa3045a 100644
--- a/fs/lockd/clntlock.c
+++ b/fs/lockd/clntlock.c
@@ -14,9 +14,12 @@
#include <linux/nfs_fs.h>
#include <linux/sunrpc/addr.h>
#include <linux/sunrpc/svc.h>
+#include <linux/sunrpc/svc_xprt.h>
#include <linux/lockd/lockd.h>
#include <linux/kthread.h>

+#include "trace.h"
+
#define NLMDBG_FACILITY NLMDBG_CLIENT

/*
@@ -186,6 +189,7 @@ __be32 nlmclnt_grant(const struct sockaddr *addr, const struct nlm_lock *lock)
res = nlm_granted;
}
spin_unlock(&nlm_blocked_lock);
+ trace_nlmclnt_grant(lock, addr, svc_addr_len(addr), res);
return res;
}

diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index a14c9110719c..fba6c7fa7474 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -20,6 +20,8 @@
#include <linux/sunrpc/svc.h>
#include <linux/lockd/lockd.h>

+#include "trace.h"
+
#define NLMDBG_FACILITY NLMDBG_CLIENT
#define NLMCLNT_GRACE_WAIT (5*HZ)
#define NLMCLNT_POLL_TIMEOUT (30*HZ)
@@ -451,6 +453,9 @@ nlmclnt_test(struct nlm_rqst *req, struct file_lock *fl)
status = nlm_stat_to_errno(req->a_res.status);
}
out:
+ trace_nlmclnt_test(&req->a_args.lock,
+ (const struct sockaddr *)&req->a_host->h_addr,
+ req->a_host->h_addrlen, req->a_res.status);
nlmclnt_release_call(req);
return status;
}
@@ -605,10 +610,16 @@ nlmclnt_lock(struct nlm_rqst *req, struct file_lock *fl)
else
status = nlm_stat_to_errno(resp->status);
out:
+ trace_nlmclnt_lock(&req->a_args.lock,
+ (const struct sockaddr *)&req->a_host->h_addr,
+ req->a_host->h_addrlen, req->a_res.status);
nlmclnt_release_call(req);
return status;
out_unlock:
/* Fatal error: ensure that we remove the lock altogether */
+ trace_nlmclnt_lock(&req->a_args.lock,
+ (const struct sockaddr *)&req->a_host->h_addr,
+ req->a_host->h_addrlen, req->a_res.status);
dprintk("lockd: lock attempt ended in fatal error.\n"
" Attempting to unlock.\n");
fl_type = fl->fl_type;
@@ -704,6 +715,9 @@ nlmclnt_unlock(struct nlm_rqst *req, struct file_lock *fl)
/* What to do now? I'm out of my depth... */
status = -ENOLCK;
out:
+ trace_nlmclnt_unlock(&req->a_args.lock,
+ (const struct sockaddr *)&req->a_host->h_addr,
+ req->a_host->h_addrlen, req->a_res.status);
nlmclnt_release_call(req);
return status;
}
diff --git a/fs/lockd/trace.c b/fs/lockd/trace.c
new file mode 100644
index 000000000000..d9a6ff6e673c
--- /dev/null
+++ b/fs/lockd/trace.c
@@ -0,0 +1,3 @@
+// SPDX-License-Identifier: GPL-2.0
+#define CREATE_TRACE_POINTS
+#include "trace.h"
diff --git a/fs/lockd/trace.h b/fs/lockd/trace.h
new file mode 100644
index 000000000000..b79d154d5e66
--- /dev/null
+++ b/fs/lockd/trace.h
@@ -0,0 +1,83 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM lockd
+
+#if !defined(_TRACE_LOCKD_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_LOCKD_H
+
+#include <linux/tracepoint.h>
+#include <linux/crc32.h>
+#include <linux/nfs.h>
+#include <linux/lockd/lockd.h>
+
+#define show_nlm_status(val) \
+ __print_symbolic(val, \
+ { NLM_LCK_GRANTED, "LCK_GRANTED" }, \
+ { NLM_LCK_DENIED, "LCK_DENIED" }, \
+ { NLM_LCK_DENIED_NOLOCKS, "LCK_DENIED_NOLOCKS" }, \
+ { NLM_LCK_BLOCKED, "LCK_BLOCKED" }, \
+ { NLM_LCK_DENIED_GRACE_PERIOD, "LCK_DENIED_GRACE_PERIOD" }, \
+ { NLM_DEADLCK, "DEADLCK" }, \
+ { NLM_ROFS, "ROFS" }, \
+ { NLM_STALE_FH, "STALE_FH" }, \
+ { NLM_FBIG, "FBIG" }, \
+ { NLM_FAILED, "FAILED" })
+
+DECLARE_EVENT_CLASS(nlmclnt_lock_event,
+ TP_PROTO(
+ const struct nlm_lock *lock,
+ const struct sockaddr *addr,
+ unsigned int addrlen,
+ __be32 status
+ ),
+
+ TP_ARGS(lock, addr, addrlen, status),
+
+ TP_STRUCT__entry(
+ __field(u32, oh)
+ __field(u32, svid)
+ __field(u32, fh)
+ __field(u32, status)
+ __field(u64, start)
+ __field(u64, len)
+ __sockaddr(addr, addrlen)
+ ),
+
+ TP_fast_assign(
+ __entry->oh = ~crc32_le(0xffffffff, lock->oh.data, lock->oh.len);
+ __entry->svid = lock->svid;
+ __entry->fh = nfs_fhandle_hash(&lock->fh);
+ __entry->start = lock->lock_start;
+ __entry->len = lock->lock_len;
+ __entry->status = be32_to_cpu(status);
+ __assign_sockaddr(addr, addr, addrlen);
+ ),
+
+ TP_printk(
+ "addr=%pISpc oh=0x%08x svid=0x%08x fh=0x%08x start=%llu len=%llu status=%s",
+ __get_sockaddr(addr), __entry->oh, __entry->svid,
+ __entry->fh, __entry->start, __entry->len,
+ show_nlm_status(__entry->status)
+ )
+);
+
+#define DEFINE_NLMCLNT_EVENT(name) \
+ DEFINE_EVENT(nlmclnt_lock_event, name, \
+ TP_PROTO( \
+ const struct nlm_lock *lock, \
+ const struct sockaddr *addr, \
+ unsigned int addrlen, \
+ __be32 status \
+ ), \
+ TP_ARGS(lock, addr, addrlen, status))
+
+DEFINE_NLMCLNT_EVENT(nlmclnt_test);
+DEFINE_NLMCLNT_EVENT(nlmclnt_lock);
+DEFINE_NLMCLNT_EVENT(nlmclnt_unlock);
+DEFINE_NLMCLNT_EVENT(nlmclnt_grant);
+#endif /* _TRACE_LOCKD_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE trace
+#include <trace/define_trace.h>
--
2.39.2


2023-03-03 12:16:16

by Jeff Layton

[permalink] [raw]
Subject: [PATCH 6/7] nfs: move nfs_fhandle_hash to common include file

lockd needs to be able to hash filehandles for tracepoints. Move
nfs_fhandle_hash to a common nfs include file.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/internal.h | 15 ---------------
include/linux/nfs.h | 20 ++++++++++++++++++++
2 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 2a65fe2a63ab..10fb5e7573eb 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -846,27 +846,12 @@ u64 nfs_timespec_to_change_attr(const struct timespec64 *ts)
}

#ifdef CONFIG_CRC32
-/**
- * nfs_fhandle_hash - calculate the crc32 hash for the filehandle
- * @fh - pointer to filehandle
- *
- * returns a crc32 hash for the filehandle that is compatible with
- * the one displayed by "wireshark".
- */
-static inline u32 nfs_fhandle_hash(const struct nfs_fh *fh)
-{
- return ~crc32_le(0xFFFFFFFF, &fh->data[0], fh->size);
-}
static inline u32 nfs_stateid_hash(const nfs4_stateid *stateid)
{
return ~crc32_le(0xFFFFFFFF, &stateid->other[0],
NFS4_STATEID_OTHER_SIZE);
}
#else
-static inline u32 nfs_fhandle_hash(const struct nfs_fh *fh)
-{
- return 0;
-}
static inline u32 nfs_stateid_hash(nfs4_stateid *stateid)
{
return 0;
diff --git a/include/linux/nfs.h b/include/linux/nfs.h
index b06375e88e58..ceb70a926b95 100644
--- a/include/linux/nfs.h
+++ b/include/linux/nfs.h
@@ -10,6 +10,7 @@

#include <linux/sunrpc/msg_prot.h>
#include <linux/string.h>
+#include <linux/crc32.h>
#include <uapi/linux/nfs.h>

/*
@@ -44,4 +45,23 @@ enum nfs3_stable_how {
/* used by direct.c to mark verf as invalid */
NFS_INVALID_STABLE_HOW = -1
};
+
+#ifdef CONFIG_CRC32
+/**
+ * nfs_fhandle_hash - calculate the crc32 hash for the filehandle
+ * @fh - pointer to filehandle
+ *
+ * returns a crc32 hash for the filehandle that is compatible with
+ * the one displayed by "wireshark".
+ */
+static inline u32 nfs_fhandle_hash(const struct nfs_fh *fh)
+{
+ return ~crc32_le(0xFFFFFFFF, &fh->data[0], fh->size);
+}
+#else /* CONFIG_CRC32 */
+static inline u32 nfs_fhandle_hash(const struct nfs_fh *fh)
+{
+ return 0;
+}
+#endif /* CONFIG_CRC32 */
#endif /* _LINUX_NFS_H */
--
2.39.2


2023-03-03 14:42:14

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 0/7] lockd: fix races that can result in stuck filelocks



> On Mar 3, 2023, at 7:15 AM, Jeff Layton <[email protected]> wrote:
>
> I sent the first patch in this series the other day, but didn't get any
> responses.

We'll have to work out who will take which patches in this set.
Once fully reviewed, I can take the set if the client maintainers
send Acks for 2-4 and 6-7.

nfsd-next for v6.4 is not yet open. I can work on setting that up
today.


> Since then I've had time to follow up on the client-side part
> of this problem, which eventually also pointed out yet another bug on
> the server side. There are also a couple of cleanup patches in here too,
> and a patch to add some tracepoints that I found useful while diagnosing
> this.
>
> With this set on both client and server, I'm now able to run Yongcheng's
> test for an hour straight with no stuck locks.
>
> Jeff Layton (7):
> lockd: purge resources held on behalf of nlm clients when shutting
> down
> lockd: remove 2 unused helper functions
> lockd: move struct nlm_wait to lockd.h
> lockd: fix races in client GRANTED_MSG wait logic
> lockd: server should unlock lock if client rejects the grant
> nfs: move nfs_fhandle_hash to common include file
> lockd: add some client-side tracepoints
>
> fs/lockd/Makefile | 6 ++-
> fs/lockd/clntlock.c | 58 +++++++++++---------------
> fs/lockd/clntproc.c | 42 ++++++++++++++-----
> fs/lockd/host.c | 1 +
> fs/lockd/svclock.c | 21 ++++++++--
> fs/lockd/trace.c | 3 ++
> fs/lockd/trace.h | 83 +++++++++++++++++++++++++++++++++++++
> fs/nfs/internal.h | 15 -------
> include/linux/lockd/lockd.h | 29 ++++++-------
> include/linux/nfs.h | 20 +++++++++
> 10 files changed, 200 insertions(+), 78 deletions(-)
> create mode 100644 fs/lockd/trace.c
> create mode 100644 fs/lockd/trace.h
>
> --
> 2.39.2
>

--
Chuck Lever



2023-03-03 18:11:53

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 0/7] lockd: fix races that can result in stuck filelocks



> On Mar 3, 2023, at 9:41 AM, Chuck Lever III <[email protected]> wrote:
>
>
>
>> On Mar 3, 2023, at 7:15 AM, Jeff Layton <[email protected]> wrote:
>>
>> I sent the first patch in this series the other day, but didn't get any
>> responses.
>
> We'll have to work out who will take which patches in this set.
> Once fully reviewed, I can take the set if the client maintainers
> send Acks for 2-4 and 6-7.
>
> nfsd-next for v6.4 is not yet open. I can work on setting that up
> today.
>
>
>> Since then I've had time to follow up on the client-side part
>> of this problem, which eventually also pointed out yet another bug on
>> the server side. There are also a couple of cleanup patches in here too,
>> and a patch to add some tracepoints that I found useful while diagnosing
>> this.
>>
>> With this set on both client and server, I'm now able to run Yongcheng's
>> test for an hour straight with no stuck locks.
>>
>> Jeff Layton (7):
>> lockd: purge resources held on behalf of nlm clients when shutting
>> down
>> lockd: remove 2 unused helper functions
>> lockd: move struct nlm_wait to lockd.h
>> lockd: fix races in client GRANTED_MSG wait logic
>> lockd: server should unlock lock if client rejects the grant
>> nfs: move nfs_fhandle_hash to common include file
>> lockd: add some client-side tracepoints
>>
>> fs/lockd/Makefile | 6 ++-
>> fs/lockd/clntlock.c | 58 +++++++++++---------------
>> fs/lockd/clntproc.c | 42 ++++++++++++++-----
>> fs/lockd/host.c | 1 +
>> fs/lockd/svclock.c | 21 ++++++++--
>> fs/lockd/trace.c | 3 ++
>> fs/lockd/trace.h | 83 +++++++++++++++++++++++++++++++++++++
>> fs/nfs/internal.h | 15 -------
>> include/linux/lockd/lockd.h | 29 ++++++-------
>> include/linux/nfs.h | 20 +++++++++
>> 10 files changed, 200 insertions(+), 78 deletions(-)
>> create mode 100644 fs/lockd/trace.c
>> create mode 100644 fs/lockd/trace.h
>>
>> --
>> 2.39.2

I've opened nfsd-next for v6.4 and applied these. I can drop any
that the client maintainers wish to take through their tree or
would prefer to reject.

Noted that several of these had checkpatch.pl warnings or errors.
I fixed up the issues before applying them.


--
Chuck Lever



2023-03-12 15:33:48

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 0/7] lockd: fix races that can result in stuck filelocks

On Fri, Mar 3, 2023 at 4:54 PM Chuck Lever III <[email protected]> wrote:
>
>
>
> > On Mar 3, 2023, at 7:15 AM, Jeff Layton <[email protected]> wrote:
> >
> > I sent the first patch in this series the other day, but didn't get any
> > responses.
>
> We'll have to work out who will take which patches in this set.
> Once fully reviewed, I can take the set if the client maintainers
> send Acks for 2-4 and 6-7.
>
> nfsd-next for v6.4 is not yet open. I can work on setting that up
> today.
>
>
> > Since then I've had time to follow up on the client-side part
> > of this problem, which eventually also pointed out yet another bug on
> > the server side. There are also a couple of cleanup patches in here too,
> > and a patch to add some tracepoints that I found useful while diagnosing
> > this.
> >
> > With this set on both client and server, I'm now able to run Yongcheng's
> > test for an hour straight with no stuck locks.

My nfstest_lock test occasionally gets into an endless wait loop for the lock in
one of the optests.

AFAIK, this started happening after I upgraded my client machine to v5.15.88.
Does this seem related to the client bug fixes in this patch set?

If so, is this bug a regression? and why are the fixes aimed for v6.4?

Thanks,
Amir.

2023-03-12 16:44:41

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 0/7] lockd: fix races that can result in stuck filelocks



> On Mar 12, 2023, at 11:33 AM, Amir Goldstein <[email protected]> wrote:
>
> On Fri, Mar 3, 2023 at 4:54 PM Chuck Lever III <[email protected]> wrote:
>>
>>
>>
>>> On Mar 3, 2023, at 7:15 AM, Jeff Layton <[email protected]> wrote:
>>>
>>> I sent the first patch in this series the other day, but didn't get any
>>> responses.
>>
>> We'll have to work out who will take which patches in this set.
>> Once fully reviewed, I can take the set if the client maintainers
>> send Acks for 2-4 and 6-7.
>>
>> nfsd-next for v6.4 is not yet open. I can work on setting that up
>> today.
>>
>>
>>> Since then I've had time to follow up on the client-side part
>>> of this problem, which eventually also pointed out yet another bug on
>>> the server side. There are also a couple of cleanup patches in here too,
>>> and a patch to add some tracepoints that I found useful while diagnosing
>>> this.
>>>
>>> With this set on both client and server, I'm now able to run Yongcheng's
>>> test for an hour straight with no stuck locks.
>
> My nfstest_lock test occasionally gets into an endless wait loop for the lock in
> one of the optests.
>
> AFAIK, this started happening after I upgraded my client machine to v5.15.88.
> Does this seem related to the client bug fixes in this patch set?

I will let Jeff tackle that question. He did not add a Fixes:
tag, so it's difficult to say off-hand.


> If so, is this bug a regression?

If your test misbehavior is related to these fixes, then probably
yes. But this is the first I've heard of a longer-term problem.


> and why are the fixes aimed for v6.4?

Because these are test failures, not failures of non-artificial
workloads. Also because we haven't heard reports, potential or
otherwise, of a regression, until now.

Since they are test failures only, there doesn't seem to be an
urgency to get them into 6.3-rc. I prefer to let these sit in
-next for a bit, therefore. As we are well aware, patches that
go into -rc are rather aggressively pulled into stable, and I
would like to have some confidence that these fixes do not
introduce further problems.

You are welcome to petition for faster integration. It would
help if you had a positive test result to share with us.

--
Chuck Lever


2023-03-13 10:45:32

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 0/7] lockd: fix races that can result in stuck filelocks

On Sun, 2023-03-12 at 17:33 +0200, Amir Goldstein wrote:
> On Fri, Mar 3, 2023 at 4:54 PM Chuck Lever III <[email protected]> wrote:
> >
> >
> >
> > > On Mar 3, 2023, at 7:15 AM, Jeff Layton <[email protected]> wrote:
> > >
> > > I sent the first patch in this series the other day, but didn't get any
> > > responses.
> >
> > We'll have to work out who will take which patches in this set.
> > Once fully reviewed, I can take the set if the client maintainers
> > send Acks for 2-4 and 6-7.
> >
> > nfsd-next for v6.4 is not yet open. I can work on setting that up
> > today.
> >
> >
> > > Since then I've had time to follow up on the client-side part
> > > of this problem, which eventually also pointed out yet another bug on
> > > the server side. There are also a couple of cleanup patches in here too,
> > > and a patch to add some tracepoints that I found useful while diagnosing
> > > this.
> > >
> > > With this set on both client and server, I'm now able to run Yongcheng's
> > > test for an hour straight with no stuck locks.
>
> My nfstest_lock test occasionally gets into an endless wait loop for the lock in
> one of the optests.
>
> AFAIK, this started happening after I upgraded my client machine to v5.15.88.
> Does this seem related to the client bug fixes in this patch set?
>
> If so, is this bug a regression? and why are the fixes aimed for v6.4?
>

Most of this (lockd) code hasn't changed in well over a decade, so if
this is a regression then it's a very old one. I suppose it's possible
that this regressed after the BKL was removed from this code, but that
was a long time ago now and I'm not sure I can identify a commit that
this fixes.

I'm fine with this going in sooner than v6.4, but given that this has
been broken so long, I didn't see the need to rush.

Cheers,
--
Jeff Layton <[email protected]>

2023-03-13 15:14:56

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH 0/7] lockd: fix races that can result in stuck filelocks

On Mon, Mar 13, 2023 at 12:45 PM Jeff Layton <[email protected]> wrote:
>
> On Sun, 2023-03-12 at 17:33 +0200, Amir Goldstein wrote:
> > On Fri, Mar 3, 2023 at 4:54 PM Chuck Lever III <[email protected]> wrote:
> > >
> > >
> > >
> > > > On Mar 3, 2023, at 7:15 AM, Jeff Layton <[email protected]> wrote:
> > > >
> > > > I sent the first patch in this series the other day, but didn't get any
> > > > responses.
> > >
> > > We'll have to work out who will take which patches in this set.
> > > Once fully reviewed, I can take the set if the client maintainers
> > > send Acks for 2-4 and 6-7.
> > >
> > > nfsd-next for v6.4 is not yet open. I can work on setting that up
> > > today.
> > >
> > >
> > > > Since then I've had time to follow up on the client-side part
> > > > of this problem, which eventually also pointed out yet another bug on
> > > > the server side. There are also a couple of cleanup patches in here too,
> > > > and a patch to add some tracepoints that I found useful while diagnosing
> > > > this.
> > > >
> > > > With this set on both client and server, I'm now able to run Yongcheng's
> > > > test for an hour straight with no stuck locks.
> >
> > My nfstest_lock test occasionally gets into an endless wait loop for the lock in
> > one of the optests.

I forgot to mention that the regression is only with nfsversion=3!
Is anyone else running nfstest_lock with nfsversion=3?

> >
> > AFAIK, this started happening after I upgraded my client machine to v5.15.88.
> > Does this seem related to the client bug fixes in this patch set?
> >
> > If so, is this bug a regression? and why are the fixes aimed for v6.4?
> >
>
> Most of this (lockd) code hasn't changed in well over a decade, so if
> this is a regression then it's a very old one. I suppose it's possible
> that this regressed after the BKL was removed from this code, but that
> was a long time ago now and I'm not sure I can identify a commit that
> this fixes.
>
> I'm fine with this going in sooner than v6.4, but given that this has
> been broken so long, I didn't see the need to rush.
>

I don't know what is the relation of the optest regression that I am
experiencing and the client and server bugs mentioned in this patch set.
I just re-tested optest01 with several combinations of client-server kernels.
I rebooted both client and server before each test.
The results are a bit odd:

client server optest01 result
------------------------------------------------------
5.10.109 5.10.109 optest01 completes successfully after <30s
5.15.88 5.15.88 optest01 never completes (see attached log)
5.15.88 5.10.109 optest01 never completes
5.15.88+ [*] 5.15.88 optest01 never completes
5.15.88+ 5.10.109 optest01 never completes
5.15.88+ 5.15.88+ optest01 completes successfully after ~300s [**]

Unless I missed something with the tests, it looks like
1.a. There was a regressions in client from 5.10.109..5.15.88
1.b. The regression is manifested with both 5.10 and 5.15 servers
2.a. The patches improve the situation (from infinite to 30s per wait)...
2.b. ...but only when applied to both client and server and...
2.c. The situation is still a lot worse than 5.10 client with 5.10 server

Attached also the NFS[D] Kconfig which is identical for the tested
5.10 and 5.15 kernels.

Do you need me to provide any traces or any other info?

Thanks,
Amir.

[*] 5.15.88+ stands for 5.15.88 + the patches in this set, which all
apply cleanly
[**] The test takes 300s because every single 30s wait takes the entire 30s:

DBG1: 15:21:47.118095 - Unlock file (F_UNLCK, F_SETLK) off=0 len=0
range(0, 18446744073709551615)
DBG3: 15:21:47.119832 - Wait up to 30 secs to check if blocked
lock has been granted @253.87
DBG3: 15:21:48.121296 - Check if blocked lock has been granted @254.87
...
DBG3: 15:22:14.158314 - Check if blocked lock has been granted @280.90
DBG3: 15:22:15.017594 - Getting results from blocked lock @281.76
DBG1: 15:22:15.017832 - Unlock file (F_UNLCK, F_SETLK) off=0 len=0
range(0, 18446744073709551615) on second process @281.76
PASS: Locking byte range (72 passed, 0 failed)


Attachments:
optest01.nfsver3.linux-5.15.88.log.gz (32.20 kB)
5.15.88.NFS.config (671.00 B)
Download all attachments

2023-03-13 19:20:42

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 0/7] lockd: fix races that can result in stuck filelocks

On Mon, 2023-03-13 at 17:14 +0200, Amir Goldstein wrote:
> On Mon, Mar 13, 2023 at 12:45 PM Jeff Layton <[email protected]> wrote:
> >
> > On Sun, 2023-03-12 at 17:33 +0200, Amir Goldstein wrote:
> > > On Fri, Mar 3, 2023 at 4:54 PM Chuck Lever III <[email protected]> wrote:
> > > >
> > > >
> > > >
> > > > > On Mar 3, 2023, at 7:15 AM, Jeff Layton <[email protected]> wrote:
> > > > >
> > > > > I sent the first patch in this series the other day, but didn't get any
> > > > > responses.
> > > >
> > > > We'll have to work out who will take which patches in this set.
> > > > Once fully reviewed, I can take the set if the client maintainers
> > > > send Acks for 2-4 and 6-7.
> > > >
> > > > nfsd-next for v6.4 is not yet open. I can work on setting that up
> > > > today.
> > > >
> > > >
> > > > > Since then I've had time to follow up on the client-side part
> > > > > of this problem, which eventually also pointed out yet another bug on
> > > > > the server side. There are also a couple of cleanup patches in here too,
> > > > > and a patch to add some tracepoints that I found useful while diagnosing
> > > > > this.
> > > > >
> > > > > With this set on both client and server, I'm now able to run Yongcheng's
> > > > > test for an hour straight with no stuck locks.
> > >
> > > My nfstest_lock test occasionally gets into an endless wait loop for the lock in
> > > one of the optests.
>
> I forgot to mention that the regression is only with nfsversion=3!
> Is anyone else running nfstest_lock with nfsversion=3?
>
> > >
> > > AFAIK, this started happening after I upgraded my client machine to v5.15.88.
> > > Does this seem related to the client bug fixes in this patch set?
> > >
> > > If so, is this bug a regression? and why are the fixes aimed for v6.4?
> > >
> >
> > Most of this (lockd) code hasn't changed in well over a decade, so if
> > this is a regression then it's a very old one. I suppose it's possible
> > that this regressed after the BKL was removed from this code, but that
> > was a long time ago now and I'm not sure I can identify a commit that
> > this fixes.
> >
> > I'm fine with this going in sooner than v6.4, but given that this has
> > been broken so long, I didn't see the need to rush.
> >
>
> I don't know what is the relation of the optest regression that I am
> experiencing and the client and server bugs mentioned in this patch set.
> I just re-tested optest01 with several combinations of client-server kernels.
> I rebooted both client and server before each test.
> The results are a bit odd:
>
> client server optest01 result
> ------------------------------------------------------
> 5.10.109 5.10.109 optest01 completes successfully after <30s
> 5.15.88 5.15.88 optest01 never completes (see attached log)
> 5.15.88 5.10.109 optest01 never completes
> 5.15.88+ [*] 5.15.88 optest01 never completes
> 5.15.88+ 5.10.109 optest01 never completes
> 5.15.88+ 5.15.88+ optest01 completes successfully after ~300s [**]
>
> Unless I missed something with the tests, it looks like
> 1.a. There was a regressions in client from 5.10.109..5.15.88
> 1.b. The regression is manifested with both 5.10 and 5.15 servers
> 2.a. The patches improve the situation (from infinite to 30s per wait)...
> 2.b. ...but only when applied to both client and server and...
> 2.c. The situation is still a lot worse than 5.10 client with 5.10 server
>
> Attached also the NFS[D] Kconfig which is identical for the tested
> 5.10 and 5.15 kernels.
>
> Do you need me to provide any traces or any other info?
>
> Thanks,
> Amir.
>
> [*] 5.15.88+ stands for 5.15.88 + the patches in this set, which all
> apply cleanly
> [**] The test takes 300s because every single 30s wait takes the entire 30s:
>
> DBG1: 15:21:47.118095 - Unlock file (F_UNLCK, F_SETLK) off=0 len=0
> range(0, 18446744073709551615)
> DBG3: 15:21:47.119832 - Wait up to 30 secs to check if blocked
> lock has been granted @253.87
> DBG3: 15:21:48.121296 - Check if blocked lock has been granted @254.87
> ...
> DBG3: 15:22:14.158314 - Check if blocked lock has been granted @280.90
> DBG3: 15:22:15.017594 - Getting results from blocked lock @281.76
> DBG1: 15:22:15.017832 - Unlock file (F_UNLCK, F_SETLK) off=0 len=0
> range(0, 18446744073709551615) on second process @281.76
> PASS: Locking byte range (72 passed, 0 failed)

This sounds like a different problem than what this patchset fixes. This
patchset is really all about signal handling during the wait for a lock.
That sounds more like the wait is just not completing?

I just kicked off this test in nfstests with vers=3 and I think I see
the same 30s stalls. Coincidentally:

#define NLMCLNT_POLL_TIMEOUT (30*HZ)

So it does look like something may be going wrong with the lock granting
mechanism. I'll need to do a bit of investigation to figure out what's
going on.

--
Jeff Layton <[email protected]>