2024-04-24 00:19:59

by Chuck Lever

[permalink] [raw]
Subject: [PATCH 0/2] Revert NFSD callback client retransmit fixes

Read the patch descriptions for what these two patches do.

I've convinced myself that going back to the v6.8 behavior is
better than leaving these applied. I plan to send them to Linus
for v6.9-rc. See the nfsd-fixes branch in:

https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git


---

Chuck Lever (2):
Revert "NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down"
Revert "NFSD: Convert the callback workqueue to use delayed_work"


fs/nfsd/nfs4callback.c | 26 +++++---------------------
fs/nfsd/state.h | 2 +-
2 files changed, 6 insertions(+), 22 deletions(-)

--
Chuck Lever



2024-04-24 00:20:06

by Chuck Lever

[permalink] [raw]
Subject: [PATCH 1/2] Revert "NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down"

From: Chuck Lever <[email protected]>

The reverted commit attempted to enable NFSD to retransmit pending
callback operations if an NFS client disconnects, but
unintentionally introduces a hazardous behavior regression if the
client becomes permanently unreachable while callback operations are
still pending.

A disconnect can occur due to network partition or if the NFS server
needs to force the NFS client to retransmit (for example, if a GSS
window under-run occurs).

Reverting the commit will make NFSD behave the same as it did in
v6.8 and before. Pending callback operations are permanently lost if
the client connection is terminated before the client receives them.

For some callback operations, this loss is not harmful.

However, for CB_RECALL, the loss means a delegation might be revoked
unnecessarily. For CB_OFFLOAD, pending COPY operations will never
complete unless the NFS client subsequently sends an OFFLOAD_STATUS
operation, which the Linux NFS client does not currently implement.

These issues still need to be addressed somehow.

Reported-by: Dai Ngo <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218735
Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfsd/nfs4callback.c | 20 ++------------------
1 file changed, 2 insertions(+), 18 deletions(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index e440f72b9d4e..d153af81f406 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -986,14 +986,6 @@ static bool nfsd4_queue_cb(struct nfsd4_callback *cb)
return queue_delayed_work(callback_wq, &cb->cb_work, 0);
}

-static void nfsd4_queue_cb_delayed(struct nfsd4_callback *cb,
- unsigned long msecs)
-{
- trace_nfsd_cb_queue(cb->cb_clp, cb);
- queue_delayed_work(callback_wq, &cb->cb_work,
- msecs_to_jiffies(msecs));
-}
-
static void nfsd41_cb_inflight_begin(struct nfs4_client *clp)
{
atomic_inc(&clp->cl_cb_inflight);
@@ -1502,16 +1494,8 @@ nfsd4_run_cb_work(struct work_struct *work)

clnt = clp->cl_cb_client;
if (!clnt) {
- if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
- nfsd41_destroy_cb(cb);
- else {
- /*
- * XXX: Ideally, we could wait for the client to
- * reconnect, but I haven't figured out how
- * to do that yet.
- */
- nfsd4_queue_cb_delayed(cb, 25);
- }
+ /* Callback channel broken, or client killed; give up: */
+ nfsd41_destroy_cb(cb);
return;
}




2024-04-24 00:20:11

by Chuck Lever

[permalink] [raw]
Subject: [PATCH 2/2] Revert "NFSD: Convert the callback workqueue to use delayed_work"

From: Chuck Lever <[email protected]>

This commit was a pre-requisite for commit c1ccfcf1a9bf ("NFSD:
Reschedule CB operations when backchannel rpc_clnt is shut down"),
which has already been reverted.

Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfsd/nfs4callback.c | 6 +++---
fs/nfsd/state.h | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index d153af81f406..6aff46701aa1 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -983,7 +983,7 @@ static struct workqueue_struct *callback_wq;
static bool nfsd4_queue_cb(struct nfsd4_callback *cb)
{
trace_nfsd_cb_queue(cb->cb_clp, cb);
- return queue_delayed_work(callback_wq, &cb->cb_work, 0);
+ return queue_work(callback_wq, &cb->cb_work);
}

static void nfsd41_cb_inflight_begin(struct nfs4_client *clp)
@@ -1482,7 +1482,7 @@ static void
nfsd4_run_cb_work(struct work_struct *work)
{
struct nfsd4_callback *cb =
- container_of(work, struct nfsd4_callback, cb_work.work);
+ container_of(work, struct nfsd4_callback, cb_work);
struct nfs4_client *clp = cb->cb_clp;
struct rpc_clnt *clnt;
int flags;
@@ -1528,7 +1528,7 @@ void nfsd4_init_cb(struct nfsd4_callback *cb, struct nfs4_client *clp,
cb->cb_msg.rpc_argp = cb;
cb->cb_msg.rpc_resp = cb;
cb->cb_ops = ops;
- INIT_DELAYED_WORK(&cb->cb_work, nfsd4_run_cb_work);
+ INIT_WORK(&cb->cb_work, nfsd4_run_cb_work);
cb->cb_status = 0;
cb->cb_need_restart = false;
cb->cb_holds_slot = false;
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 01c6f3445646..2ed0fcf879fd 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -68,7 +68,7 @@ struct nfsd4_callback {
struct nfs4_client *cb_clp;
struct rpc_message cb_msg;
const struct nfsd4_callback_ops *cb_ops;
- struct delayed_work cb_work;
+ struct work_struct cb_work;
int cb_seq_status;
int cb_status;
bool cb_need_restart;