2022-08-08 13:55:26

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 0/7] Wait for DELEGRETURN before returning NFS4ERR_DELAY

This RFC series adds logic to the Linux NFS server to make it wait a
few moments for clients to return a delegation before replying with
NFS4ERR_DELAY. There are two main benefits:

- This improves responsiveness when a delegated file is accessed
from another client
- This removes an unnecessary latency if a client has neglected to
return a delegation before attempting a RENAME or SETATTR

This series is incomplete:
- There are likely other operations that can benefit (eg. OPEN)

This series applies against v5.19.

Changes since v2:
- Wake immediately after client sends DELEGRETURN
- More tracepoint improvements

Changes since RFC:
- Eliminate spurious console messages on the server
- Wait for DELEGRETURN for RENAME operations
- Add CB done tracepoints
- Rework the retry loop

---

Chuck Lever (7):
NFSD: Instrument fh_verify()
NFSD: Replace dprintk() call site in fh_verify()
NFSD: Trace NFSv4 COMPOUND tags
NFSD: Add tracepoints to report NFSv4 callback completions
NFSD: Add a mechanism to wait for a DELEGRETURN
NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY
NFSD: Make nfsd4_rename() wait before returning NFS4ERR_DELAY


fs/nfsd/nfs4layouts.c | 2 +-
fs/nfsd/nfs4proc.c | 56 +++++++++++---
fs/nfsd/nfs4state.c | 35 +++++++++
fs/nfsd/nfsd.h | 1 +
fs/nfsd/nfsfh.c | 13 +---
fs/nfsd/trace.h | 171 ++++++++++++++++++++++++++++++++++++++++--
fs/nfsd/xdr4.h | 2 +
7 files changed, 251 insertions(+), 29 deletions(-)

--
Chuck Lever


2022-08-08 13:55:37

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 3/7] NFSD: Trace NFSv4 COMPOUND tags

The Linux NFSv4 client implementation does not use COMPOUND tags,
but the Solaris and MacOS implementations do, and so does pynfs.
Record these eye-catchers in the server's trace buffer to annotate
client requests while troubleshooting.

Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfsd/nfs4proc.c | 2 +-
fs/nfsd/trace.h | 21 ++++++++++++++-------
2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 3895eb52d2b1..e622e1c5a8e1 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2678,7 +2678,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp)

rqstp->rq_lease_breaker = (void **)&cstate->clp;

- trace_nfsd_compound(rqstp, args->opcnt);
+ trace_nfsd_compound(rqstp, args->tag, args->taglen, args->opcnt);
while (!status && resp->opcnt < args->opcnt) {
op = &args->ops[resp->opcnt++];

diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index b6f3c1366a82..c5f3b5740bbf 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -84,19 +84,26 @@ DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
{ NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" })

TRACE_EVENT(nfsd_compound,
- TP_PROTO(const struct svc_rqst *rqst,
- u32 args_opcnt),
- TP_ARGS(rqst, args_opcnt),
+ TP_PROTO(
+ const struct svc_rqst *rqst,
+ const char *tag,
+ u32 taglen,
+ u32 opcnt
+ ),
+ TP_ARGS(rqst, tag, taglen, opcnt),
TP_STRUCT__entry(
__field(u32, xid)
- __field(u32, args_opcnt)
+ __field(u32, opcnt)
+ __string_len(tag, tag, taglen)
),
TP_fast_assign(
__entry->xid = be32_to_cpu(rqst->rq_xid);
- __entry->args_opcnt = args_opcnt;
+ __entry->opcnt = opcnt;
+ __assign_str_len(tag, tag, taglen);
),
- TP_printk("xid=0x%08x opcnt=%u",
- __entry->xid, __entry->args_opcnt)
+ TP_printk("xid=0x%08x opcnt=%u tag=%s",
+ __entry->xid, __entry->opcnt, __get_str(tag)
+ )
)

TRACE_EVENT(nfsd_compound_status,


2022-08-08 13:55:37

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 4/7] NFSD: Add tracepoints to report NFSv4 callback completions

Wireshark has always been lousy about dissecting NFSv4 callbacks,
especially NFSv4.0 backchannel requests. Add tracepoints so we
can surgically capture these events in the trace log.

Tracepoints are time-stamped and ordered so that we can now observe
the timing relationship between a CB_RECALL Reply and the client's
DELEGRETURN Call. Example:

nfsd-1153 [002] 211.986391: nfsd_cb_recall: addr=192.168.1.67:45767 client 62ea82e4:fee7492a stateid 00000003:00000001

nfsd-1153 [002] 212.095634: nfsd_compound: xid=0x0000002c opcnt=2
nfsd-1153 [002] 212.095647: nfsd_compound_status: op=1/2 OP_PUTFH status=0
nfsd-1153 [002] 212.095658: nfsd_file_put: hash=0xf72 inode=0xffff9291148c7410 ref=3 flags=HASHED|REFERENCED may=READ file=0xffff929103b3ea00
nfsd-1153 [002] 212.095661: nfsd_compound_status: op=2/2 OP_DELEGRETURN status=0
kworker/u25:8-148 [002] 212.096713: nfsd_cb_recall_done: client 62ea82e4:fee7492a stateid 00000003:00000001 status=0

Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfsd/nfs4layouts.c | 2 +-
fs/nfsd/nfs4proc.c | 3 +++
fs/nfsd/nfs4state.c | 4 ++++
fs/nfsd/trace.h | 39 +++++++++++++++++++++++++++++++++++++++
4 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 2c05692a9abf..3564d1c6f610 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -658,7 +658,7 @@ nfsd4_cb_layout_done(struct nfsd4_callback *cb, struct rpc_task *task)
ktime_t now, cutoff;
const struct nfsd4_layout_ops *ops;

-
+ trace_nfsd_cb_layout_done(&ls->ls_stid.sc_stateid, task);
switch (task->tk_status) {
case 0:
case -NFS4ERR_DELAY:
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index e622e1c5a8e1..14f07d3d6c48 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1666,6 +1666,9 @@ static void nfsd4_cb_offload_release(struct nfsd4_callback *cb)
static int nfsd4_cb_offload_done(struct nfsd4_callback *cb,
struct rpc_task *task)
{
+ struct nfsd4_copy *copy = container_of(cb, struct nfsd4_copy, cp_cb);
+
+ trace_nfsd_cb_offload_done(&copy->cp_res.cb_stateid, task);
return 1;
}

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 9409a0dc1b76..0cf5a4bb36df 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -357,6 +357,8 @@ nfsd4_cb_notify_lock_prepare(struct nfsd4_callback *cb)
static int
nfsd4_cb_notify_lock_done(struct nfsd4_callback *cb, struct rpc_task *task)
{
+ trace_nfsd_cb_notify_lock_done(&zero_stateid, task);
+
/*
* Since this is just an optimization, we don't try very hard if it
* turns out not to succeed. We'll requeue it on NFS4ERR_DELAY, and
@@ -4715,6 +4717,8 @@ static int nfsd4_cb_recall_done(struct nfsd4_callback *cb,
{
struct nfs4_delegation *dp = cb_to_delegation(cb);

+ trace_nfsd_cb_recall_done(&dp->dl_stid.sc_stateid, task);
+
if (dp->dl_stid.sc_type == NFS4_CLOSED_DELEG_STID ||
dp->dl_stid.sc_type == NFS4_REVOKED_DELEG_STID)
return 1;
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index c5f3b5740bbf..6809d92d4cc5 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -1239,6 +1239,45 @@ TRACE_EVENT(nfsd_cb_offload,
__entry->fh_hash, __entry->count, __entry->status)
);

+DECLARE_EVENT_CLASS(nfsd_cb_done_class,
+ TP_PROTO(
+ const stateid_t *stp,
+ const struct rpc_task *task
+ ),
+ TP_ARGS(stp, task),
+ TP_STRUCT__entry(
+ __field(u32, cl_boot)
+ __field(u32, cl_id)
+ __field(u32, si_id)
+ __field(u32, si_generation)
+ __field(int, status)
+ ),
+ TP_fast_assign(
+ __entry->cl_boot = stp->si_opaque.so_clid.cl_boot;
+ __entry->cl_id = stp->si_opaque.so_clid.cl_id;
+ __entry->si_id = stp->si_opaque.so_id;
+ __entry->si_generation = stp->si_generation;
+ __entry->status = task->tk_status;
+ ),
+ TP_printk("client %08x:%08x stateid %08x:%08x status=%d",
+ __entry->cl_boot, __entry->cl_id, __entry->si_id,
+ __entry->si_generation, __entry->status
+ )
+);
+
+#define DEFINE_NFSD_CB_DONE_EVENT(name) \
+DEFINE_EVENT(nfsd_cb_done_class, name, \
+ TP_PROTO( \
+ const stateid_t *stp, \
+ const struct rpc_task *task \
+ ), \
+ TP_ARGS(stp, task))
+
+DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_recall_done);
+DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_notify_lock_done);
+DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_layout_done);
+DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_offload_done);
+
#endif /* _NFSD_TRACE_H */

#undef TRACE_INCLUDE_PATH


2022-08-08 13:55:38

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 2/7] NFSD: Replace dprintk() call site in fh_verify()

Record permission errors in the trace log. Note that the new trace
event is conditional, so it will only record non-zero return values
from nfsd_permission().

Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfsd/nfsfh.c | 8 +-------
fs/nfsd/trace.h | 46 ++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 45 insertions(+), 9 deletions(-)

diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 5e2ed4b2a925..877da093ed2d 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -392,13 +392,7 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
skip_pseudoflavor_check:
/* Finally, check access permissions. */
error = nfsd_permission(rqstp, exp, dentry, access);
-
- if (error) {
- dprintk("fh_verify: %pd2 permission failure, "
- "acc=%x, error=%d\n",
- dentry,
- access, ntohl(error));
- }
+ trace_nfsd_fh_verify_err(rqstp, fhp, type, access, error);
out:
if (error == nfserr_stale)
nfsd_stats_fh_stale_inc(exp);
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 8467fd8f94c2..b6f3c1366a82 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -211,13 +211,55 @@ TRACE_EVENT(nfsd_fh_verify,
__entry->type = type;
__entry->access = access;
),
- TP_printk("xid=0x%08x fh_hash=0x%08x inode=%p type=%s access=%s",
- __entry->xid, __entry->fh_hash, __entry->inode,
+ TP_printk("xid=0x%08x fh_hash=0x%08x type=%s access=%s",
+ __entry->xid, __entry->fh_hash,
show_fs_file_type(__entry->type),
show_nfsd_may_flags(__entry->access)
)
);

+TRACE_EVENT_CONDITION(nfsd_fh_verify_err,
+ TP_PROTO(
+ const struct svc_rqst *rqstp,
+ const struct svc_fh *fhp,
+ umode_t type,
+ int access,
+ __be32 error
+ ),
+ TP_ARGS(rqstp, fhp, type, access, error),
+ TP_CONDITION(error),
+ TP_STRUCT__entry(
+ __field(unsigned int, netns_ino)
+ __sockaddr(server, rqstp->rq_xprt->xpt_remotelen)
+ __sockaddr(client, rqstp->rq_xprt->xpt_remotelen)
+ __field(u32, xid)
+ __field(u32, fh_hash)
+ __field(void *, inode)
+ __field(unsigned long, type)
+ __field(unsigned long, access)
+ __field(int, error)
+ ),
+ TP_fast_assign(
+ __entry->netns_ino = SVC_NET(rqstp)->ns.inum;
+ __assign_sockaddr(server, &rqstp->rq_xprt->xpt_local,
+ rqstp->rq_xprt->xpt_locallen);
+ __assign_sockaddr(client, &rqstp->rq_xprt->xpt_remote,
+ rqstp->rq_xprt->xpt_remotelen);
+ __entry->xid = be32_to_cpu(rqstp->rq_xid);
+ __entry->fh_hash = knfsd_fh_hash(&fhp->fh_handle);
+ __entry->inode = d_inode(fhp->fh_dentry);
+ __entry->type = type;
+ __entry->access = access;
+ __entry->error = be32_to_cpu(error);
+ ),
+ TP_printk("xid=0x%08x fh_hash=0x%08x type=%s access=%s error=%d",
+ __entry->xid, __entry->fh_hash,
+ show_fs_file_type(__entry->type),
+ show_nfsd_may_flags(__entry->access),
+ __entry->error
+ )
+);
+
DECLARE_EVENT_CLASS(nfsd_fh_err_class,
TP_PROTO(struct svc_rqst *rqstp,
struct svc_fh *fhp,


2022-08-08 13:55:38

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 5/7] NFSD: Add a mechanism to wait for a DELEGRETURN

A subsequent patch will use this mechanism to wake up an operation
that is waiting for a client to return a delegation.

The new tracepoint records whether the wait timed out or was
properly awoken by the expected DELEGRETURN.

Suggested-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfsd/nfs4state.c | 31 +++++++++++++++++++++++++++++++
fs/nfsd/nfsd.h | 1 +
fs/nfsd/trace.h | 23 +++++++++++++++++++++++
fs/nfsd/xdr4.h | 2 ++
4 files changed, 57 insertions(+)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 0cf5a4bb36df..9d5140d485d4 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4689,6 +4689,36 @@ nfs4_share_conflict(struct svc_fh *current_fh, unsigned int deny_type)
return ret;
}

+static bool nfsd4_deleg_present(struct inode *inode)
+{
+ struct file_lock_context *ctx = smp_load_acquire(&inode->i_flctx);
+
+ return ctx && !list_empty_careful(&ctx->flc_lease);
+}
+
+/**
+ * nfsd4_wait_for_delegreturn - wait for delegations to be returned
+ * @rqstp: the RPC transaction being executed
+ * @fhp: instantiated filehandle of the file being waited for
+ *
+ * The timeout prevents deadlock if all nfsd threads happen to be
+ * tied up waiting for returning delegations.
+ *
+ * Return values:
+ * %true: delegation was returned
+ * %false: timed out waiting for delegreturn
+ */
+bool nfsd4_wait_for_delegreturn(struct svc_rqst *rqstp, struct svc_fh *fhp)
+{
+ struct inode *inode = d_inode(fhp->fh_dentry);
+ long __maybe_unused timeo;
+
+ timeo = wait_var_event_timeout(inode, !nfsd4_deleg_present(inode),
+ NFSD_DELEGRETURN_TIMEOUT);
+ trace_nfsd_delegreturn_wakeup(rqstp, fhp, timeo);
+ return timeo > 0;
+}
+
static void nfsd4_cb_recall_prepare(struct nfsd4_callback *cb)
{
struct nfs4_delegation *dp = cb_to_delegation(cb);
@@ -6703,6 +6733,7 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
if (status)
goto put_stateid;

+ wake_up_var(d_inode(cstate->current_fh.fh_dentry));
destroy_delegation(dp);
put_stateid:
nfs4_put_stid(&dp->dl_stid);
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 9a8b09afc173..05abfb6a0271 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -341,6 +341,7 @@ void nfsd_lockd_shutdown(void);

#define NFSD_LAUNDROMAT_MINTIMEOUT 1 /* seconds */
#define NFSD_COURTESY_CLIENT_TIMEOUT (24 * 60 * 60) /* seconds */
+#define NFSD_DELEGRETURN_TIMEOUT (HZ / 34) /* 30ms */

/*
* The following attributes are currently not supported by the NFSv4 server:
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 6809d92d4cc5..a88f4303fbd9 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -538,6 +538,29 @@ DEFINE_NFSD_COPY_ERR_EVENT(clone_file_range_err);
#include "filecache.h"
#include "vfs.h"

+TRACE_EVENT(nfsd_delegreturn_wakeup,
+ TP_PROTO(
+ const struct svc_rqst *rqstp,
+ const struct svc_fh *fhp,
+ long timeo
+ ),
+ TP_ARGS(rqstp, fhp, timeo),
+ TP_STRUCT__entry(
+ __field(u32, xid)
+ __field(u32, fh_hash)
+ __field(long, timeo)
+ ),
+ TP_fast_assign(
+ __entry->xid = be32_to_cpu(rqstp->rq_xid);
+ __entry->fh_hash = knfsd_fh_hash(&fhp->fh_handle);
+ __entry->timeo = timeo;
+ ),
+ TP_printk("xid=0x%08x fh_hash=0x%08x%s",
+ __entry->xid, __entry->fh_hash,
+ __entry->timeo == 0 ? " (timed out)" : ""
+ )
+);
+
DECLARE_EVENT_CLASS(nfsd_stateid_class,
TP_PROTO(stateid_t *stp),
TP_ARGS(stp),
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 7b744011f2d3..7323433746f1 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -788,6 +788,8 @@ extern __be32 nfsd4_destroy_clientid(struct svc_rqst *, struct nfsd4_compound_st
union nfsd4_op_u *u);
__be32 nfsd4_reclaim_complete(struct svc_rqst *, struct nfsd4_compound_state *,
union nfsd4_op_u *u);
+extern bool nfsd4_wait_for_delegreturn(struct svc_rqst *rqstp,
+ struct svc_fh *fhp);
extern __be32 nfsd4_process_open1(struct nfsd4_compound_state *,
struct nfsd4_open *open, struct nfsd_net *nn);
extern __be32 nfsd4_process_open2(struct svc_rqst *rqstp,


2022-08-08 13:55:45

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 6/7] NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY

nfsd_setattr() can kick off a CB_RECALL (via
notify_change() -> break_lease()) if a delegation is present. Before
returning NFS4ERR_DELAY, give the client holding that delegation a
chance to return it and then retry the nfsd_setattr() again, once.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354
Tested-by: Igor Mammedov <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfsd/nfs4proc.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 14f07d3d6c48..5b18a71f1043 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1142,7 +1142,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
{
struct nfsd4_setattr *setattr = &u->setattr;
__be32 status = nfs_ok;
- int err;
+ int err, retries;

if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
status = nfs4_preprocess_stateid_op(rqstp, cstate,
@@ -1173,8 +1173,21 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
&setattr->sa_label);
if (status)
goto out;
- status = nfsd_setattr(rqstp, &cstate->current_fh, &setattr->sa_iattr,
- 0, (time64_t)0);
+
+ retries = 1;
+ do {
+ status = nfsd_setattr(rqstp, &cstate->current_fh,
+ &setattr->sa_iattr, 0, (time64_t)0);
+ if (status != nfserr_jukebox)
+ break;
+ if (!retries--)
+ break;
+ if (!nfsd4_wait_for_delegreturn(rqstp, &cstate->current_fh))
+ break;
+
+ fh_clear_pre_post_attrs(&cstate->current_fh);
+ } while (1);
+
out:
fh_drop_write(&cstate->current_fh);
return status;


2022-08-08 13:55:45

by Chuck Lever III

[permalink] [raw]
Subject: [PATCH v3 7/7] NFSD: Make nfsd4_rename() wait before returning NFS4ERR_DELAY

nfsd_rename() can kick off a CB_RECALL (via
vfs_rename() -> leases_conflict()) if a delegation is present.
Before returning NFS4ERR_DELAY, give the client holding that
delegation a chance to return it and then retry the nfsd_rename()
again, once.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354
Tested-by: Igor Mammedov <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfsd/nfs4proc.c | 32 ++++++++++++++++++++++++--------
1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 5b18a71f1043..ed3c6ef875c3 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1056,17 +1056,33 @@ nfsd4_rename(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
{
struct nfsd4_rename *rename = &u->rename;
__be32 status;
+ int retries;

if (opens_in_grace(SVC_NET(rqstp)))
return nfserr_grace;
- status = nfsd_rename(rqstp, &cstate->save_fh, rename->rn_sname,
- rename->rn_snamelen, &cstate->current_fh,
- rename->rn_tname, rename->rn_tnamelen);
- if (status)
- return status;
- set_change_info(&rename->rn_sinfo, &cstate->current_fh);
- set_change_info(&rename->rn_tinfo, &cstate->save_fh);
- return nfs_ok;
+
+ retries = 1;
+ do {
+ status = nfsd_rename(rqstp, &cstate->save_fh, rename->rn_sname,
+ rename->rn_snamelen, &cstate->current_fh,
+ rename->rn_tname, rename->rn_tnamelen);
+ if (status == nfs_ok) {
+ set_change_info(&rename->rn_sinfo, &cstate->current_fh);
+ set_change_info(&rename->rn_tinfo, &cstate->save_fh);
+ break;
+ }
+ if (status != nfserr_jukebox)
+ break;
+ if (!retries--)
+ break;
+ if (!nfsd4_wait_for_delegreturn(rqstp, &cstate->current_fh))
+ break;
+
+ fh_clear_pre_post_attrs(&cstate->save_fh);
+ fh_clear_pre_post_attrs(&cstate->current_fh);
+ } while (1);
+
+ return status;
}

static __be32


2022-08-08 18:59:36

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v3 0/7] Wait for DELEGRETURN before returning NFS4ERR_DELAY

On Mon, 2022-08-08 at 09:52 -0400, Chuck Lever wrote:
> This RFC series adds logic to the Linux NFS server to make it wait a
> few moments for clients to return a delegation before replying with
> NFS4ERR_DELAY. There are two main benefits:
>
> - This improves responsiveness when a delegated file is accessed
> from another client
> - This removes an unnecessary latency if a client has neglected to
> return a delegation before attempting a RENAME or SETATTR
>
> This series is incomplete:
> - There are likely other operations that can benefit (eg. OPEN)
>
> This series applies against v5.19.
>
> Changes since v2:
> - Wake immediately after client sends DELEGRETURN
> - More tracepoint improvements
>
> Changes since RFC:
> - Eliminate spurious console messages on the server
> - Wait for DELEGRETURN for RENAME operations
^^^^
Does REMOVE need similar treatment? Does the Apple client return a
delegation before attempting to unlink?

> - Add CB done tracepoints
> - Rework the retry loop
>
> ---
>
> Chuck Lever (7):
> NFSD: Instrument fh_verify()
> NFSD: Replace dprintk() call site in fh_verify()
> NFSD: Trace NFSv4 COMPOUND tags
> NFSD: Add tracepoints to report NFSv4 callback completions
> NFSD: Add a mechanism to wait for a DELEGRETURN
> NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY
> NFSD: Make nfsd4_rename() wait before returning NFS4ERR_DELAY
>
>
> fs/nfsd/nfs4layouts.c | 2 +-
> fs/nfsd/nfs4proc.c | 56 +++++++++++---
> fs/nfsd/nfs4state.c | 35 +++++++++
> fs/nfsd/nfsd.h | 1 +
> fs/nfsd/nfsfh.c | 13 +---
> fs/nfsd/trace.h | 171 ++++++++++++++++++++++++++++++++++++++++--
> fs/nfsd/xdr4.h | 2 +
> 7 files changed, 251 insertions(+), 29 deletions(-)
>
> --
> Chuck Lever
>

The new tracepoints are nice and the patchset makes sense. You can add:

Reviewed-by: Jeff Layton <[email protected]>

2022-08-08 20:07:35

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v3 0/7] Wait for DELEGRETURN before returning NFS4ERR_DELAY



> On Aug 8, 2022, at 2:48 PM, Jeff Layton <[email protected]> wrote:
>
> On Mon, 2022-08-08 at 09:52 -0400, Chuck Lever wrote:
>> This RFC series adds logic to the Linux NFS server to make it wait a
>> few moments for clients to return a delegation before replying with
>> NFS4ERR_DELAY. There are two main benefits:
>>
>> - This improves responsiveness when a delegated file is accessed
>> from another client
>> - This removes an unnecessary latency if a client has neglected to
>> return a delegation before attempting a RENAME or SETATTR
>>
>> This series is incomplete:
>> - There are likely other operations that can benefit (eg. OPEN)

> Does REMOVE need similar treatment? Does the Apple client return a
> delegation before attempting to unlink?

It's certainly plausible that REMOVE might trigger a delegation recall,
but I haven't yet seen this happen on the wire.


--
Chuck Lever



2022-08-08 21:51:18

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v3 0/7] Wait for DELEGRETURN before returning NFS4ERR_DELAY


> On Aug 8, 2022, at 4:06 PM, Chuck Lever III <[email protected]> wrote:
>
>> On Aug 8, 2022, at 2:48 PM, Jeff Layton <[email protected]> wrote:
>>
>> On Mon, 2022-08-08 at 09:52 -0400, Chuck Lever wrote:
>>> This RFC series adds logic to the Linux NFS server to make it wait a
>>> few moments for clients to return a delegation before replying with
>>> NFS4ERR_DELAY. There are two main benefits:
>>>
>>> - This improves responsiveness when a delegated file is accessed
>>> from another client
>>> - This removes an unnecessary latency if a client has neglected to
>>> return a delegation before attempting a RENAME or SETATTR
>>>
>>> This series is incomplete:
>>> - There are likely other operations that can benefit (eg. OPEN)
>
>> Does REMOVE need similar treatment? Does the Apple client return a
>> delegation before attempting to unlink?
>
> It's certainly plausible that REMOVE might trigger a delegation recall,
> but I haven't yet seen this happen on the wire.

I started playing with REMOVE and DELEG15a, and ran into an
interesting problem.

REMOVE is passed the filehandle of the parent and the name of
the entry to be removed. nfsd4_remove() doesn't have an inode
or filehandle of the object to be removed. nfsd_unlink() does
acquire the inode of that object, though.

And, I guess if we want to avoid NFSv3 operations returning
JUKEBOX on delegated files too, the "wait for DELEGRETURN"
really ought to be called from the fs/nfsd/vfs.c functions,
not from the NFSv4 specific functions in fs/nfsd/nfs4proc.c.


--
Chuck Lever