This RFC series adds logic to the Linux NFS server to make it wait a
few moments for clients to return a delegation before replying with
NFS4ERR_DELAY. There are two main benefits:
- This improves responsiveness when a delegated file is accessed
from another client
- This removes an unnecessary latency if a client has neglected to
return a delegation before attempting a RENAME or SETATTR
This series is incomplete:
- There are likely other operations that can benefit (eg. OPEN)
This series applies against v5.19.
Changes since v2:
- Wake immediately after client sends DELEGRETURN
- More tracepoint improvements
Changes since RFC:
- Eliminate spurious console messages on the server
- Wait for DELEGRETURN for RENAME operations
- Add CB done tracepoints
- Rework the retry loop
---
Chuck Lever (7):
NFSD: Instrument fh_verify()
NFSD: Replace dprintk() call site in fh_verify()
NFSD: Trace NFSv4 COMPOUND tags
NFSD: Add tracepoints to report NFSv4 callback completions
NFSD: Add a mechanism to wait for a DELEGRETURN
NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY
NFSD: Make nfsd4_rename() wait before returning NFS4ERR_DELAY
fs/nfsd/nfs4layouts.c | 2 +-
fs/nfsd/nfs4proc.c | 56 +++++++++++---
fs/nfsd/nfs4state.c | 35 +++++++++
fs/nfsd/nfsd.h | 1 +
fs/nfsd/nfsfh.c | 13 +---
fs/nfsd/trace.h | 171 ++++++++++++++++++++++++++++++++++++++++--
fs/nfsd/xdr4.h | 2 +
7 files changed, 251 insertions(+), 29 deletions(-)
--
Chuck Lever
The Linux NFSv4 client implementation does not use COMPOUND tags,
but the Solaris and MacOS implementations do, and so does pynfs.
Record these eye-catchers in the server's trace buffer to annotate
client requests while troubleshooting.
Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfsd/nfs4proc.c | 2 +-
fs/nfsd/trace.h | 21 ++++++++++++++-------
2 files changed, 15 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 3895eb52d2b1..e622e1c5a8e1 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2678,7 +2678,7 @@ nfsd4_proc_compound(struct svc_rqst *rqstp)
rqstp->rq_lease_breaker = (void **)&cstate->clp;
- trace_nfsd_compound(rqstp, args->opcnt);
+ trace_nfsd_compound(rqstp, args->tag, args->taglen, args->opcnt);
while (!status && resp->opcnt < args->opcnt) {
op = &args->ops[resp->opcnt++];
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index b6f3c1366a82..c5f3b5740bbf 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -84,19 +84,26 @@ DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
{ NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" })
TRACE_EVENT(nfsd_compound,
- TP_PROTO(const struct svc_rqst *rqst,
- u32 args_opcnt),
- TP_ARGS(rqst, args_opcnt),
+ TP_PROTO(
+ const struct svc_rqst *rqst,
+ const char *tag,
+ u32 taglen,
+ u32 opcnt
+ ),
+ TP_ARGS(rqst, tag, taglen, opcnt),
TP_STRUCT__entry(
__field(u32, xid)
- __field(u32, args_opcnt)
+ __field(u32, opcnt)
+ __string_len(tag, tag, taglen)
),
TP_fast_assign(
__entry->xid = be32_to_cpu(rqst->rq_xid);
- __entry->args_opcnt = args_opcnt;
+ __entry->opcnt = opcnt;
+ __assign_str_len(tag, tag, taglen);
),
- TP_printk("xid=0x%08x opcnt=%u",
- __entry->xid, __entry->args_opcnt)
+ TP_printk("xid=0x%08x opcnt=%u tag=%s",
+ __entry->xid, __entry->opcnt, __get_str(tag)
+ )
)
TRACE_EVENT(nfsd_compound_status,
Wireshark has always been lousy about dissecting NFSv4 callbacks,
especially NFSv4.0 backchannel requests. Add tracepoints so we
can surgically capture these events in the trace log.
Tracepoints are time-stamped and ordered so that we can now observe
the timing relationship between a CB_RECALL Reply and the client's
DELEGRETURN Call. Example:
nfsd-1153 [002] 211.986391: nfsd_cb_recall: addr=192.168.1.67:45767 client 62ea82e4:fee7492a stateid 00000003:00000001
nfsd-1153 [002] 212.095634: nfsd_compound: xid=0x0000002c opcnt=2
nfsd-1153 [002] 212.095647: nfsd_compound_status: op=1/2 OP_PUTFH status=0
nfsd-1153 [002] 212.095658: nfsd_file_put: hash=0xf72 inode=0xffff9291148c7410 ref=3 flags=HASHED|REFERENCED may=READ file=0xffff929103b3ea00
nfsd-1153 [002] 212.095661: nfsd_compound_status: op=2/2 OP_DELEGRETURN status=0
kworker/u25:8-148 [002] 212.096713: nfsd_cb_recall_done: client 62ea82e4:fee7492a stateid 00000003:00000001 status=0
Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfsd/nfs4layouts.c | 2 +-
fs/nfsd/nfs4proc.c | 3 +++
fs/nfsd/nfs4state.c | 4 ++++
fs/nfsd/trace.h | 39 +++++++++++++++++++++++++++++++++++++++
4 files changed, 47 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 2c05692a9abf..3564d1c6f610 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -658,7 +658,7 @@ nfsd4_cb_layout_done(struct nfsd4_callback *cb, struct rpc_task *task)
ktime_t now, cutoff;
const struct nfsd4_layout_ops *ops;
-
+ trace_nfsd_cb_layout_done(&ls->ls_stid.sc_stateid, task);
switch (task->tk_status) {
case 0:
case -NFS4ERR_DELAY:
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index e622e1c5a8e1..14f07d3d6c48 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1666,6 +1666,9 @@ static void nfsd4_cb_offload_release(struct nfsd4_callback *cb)
static int nfsd4_cb_offload_done(struct nfsd4_callback *cb,
struct rpc_task *task)
{
+ struct nfsd4_copy *copy = container_of(cb, struct nfsd4_copy, cp_cb);
+
+ trace_nfsd_cb_offload_done(©->cp_res.cb_stateid, task);
return 1;
}
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 9409a0dc1b76..0cf5a4bb36df 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -357,6 +357,8 @@ nfsd4_cb_notify_lock_prepare(struct nfsd4_callback *cb)
static int
nfsd4_cb_notify_lock_done(struct nfsd4_callback *cb, struct rpc_task *task)
{
+ trace_nfsd_cb_notify_lock_done(&zero_stateid, task);
+
/*
* Since this is just an optimization, we don't try very hard if it
* turns out not to succeed. We'll requeue it on NFS4ERR_DELAY, and
@@ -4715,6 +4717,8 @@ static int nfsd4_cb_recall_done(struct nfsd4_callback *cb,
{
struct nfs4_delegation *dp = cb_to_delegation(cb);
+ trace_nfsd_cb_recall_done(&dp->dl_stid.sc_stateid, task);
+
if (dp->dl_stid.sc_type == NFS4_CLOSED_DELEG_STID ||
dp->dl_stid.sc_type == NFS4_REVOKED_DELEG_STID)
return 1;
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index c5f3b5740bbf..6809d92d4cc5 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -1239,6 +1239,45 @@ TRACE_EVENT(nfsd_cb_offload,
__entry->fh_hash, __entry->count, __entry->status)
);
+DECLARE_EVENT_CLASS(nfsd_cb_done_class,
+ TP_PROTO(
+ const stateid_t *stp,
+ const struct rpc_task *task
+ ),
+ TP_ARGS(stp, task),
+ TP_STRUCT__entry(
+ __field(u32, cl_boot)
+ __field(u32, cl_id)
+ __field(u32, si_id)
+ __field(u32, si_generation)
+ __field(int, status)
+ ),
+ TP_fast_assign(
+ __entry->cl_boot = stp->si_opaque.so_clid.cl_boot;
+ __entry->cl_id = stp->si_opaque.so_clid.cl_id;
+ __entry->si_id = stp->si_opaque.so_id;
+ __entry->si_generation = stp->si_generation;
+ __entry->status = task->tk_status;
+ ),
+ TP_printk("client %08x:%08x stateid %08x:%08x status=%d",
+ __entry->cl_boot, __entry->cl_id, __entry->si_id,
+ __entry->si_generation, __entry->status
+ )
+);
+
+#define DEFINE_NFSD_CB_DONE_EVENT(name) \
+DEFINE_EVENT(nfsd_cb_done_class, name, \
+ TP_PROTO( \
+ const stateid_t *stp, \
+ const struct rpc_task *task \
+ ), \
+ TP_ARGS(stp, task))
+
+DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_recall_done);
+DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_notify_lock_done);
+DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_layout_done);
+DEFINE_NFSD_CB_DONE_EVENT(nfsd_cb_offload_done);
+
#endif /* _NFSD_TRACE_H */
#undef TRACE_INCLUDE_PATH
Record permission errors in the trace log. Note that the new trace
event is conditional, so it will only record non-zero return values
from nfsd_permission().
Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfsd/nfsfh.c | 8 +-------
fs/nfsd/trace.h | 46 ++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 45 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 5e2ed4b2a925..877da093ed2d 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -392,13 +392,7 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
skip_pseudoflavor_check:
/* Finally, check access permissions. */
error = nfsd_permission(rqstp, exp, dentry, access);
-
- if (error) {
- dprintk("fh_verify: %pd2 permission failure, "
- "acc=%x, error=%d\n",
- dentry,
- access, ntohl(error));
- }
+ trace_nfsd_fh_verify_err(rqstp, fhp, type, access, error);
out:
if (error == nfserr_stale)
nfsd_stats_fh_stale_inc(exp);
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 8467fd8f94c2..b6f3c1366a82 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -211,13 +211,55 @@ TRACE_EVENT(nfsd_fh_verify,
__entry->type = type;
__entry->access = access;
),
- TP_printk("xid=0x%08x fh_hash=0x%08x inode=%p type=%s access=%s",
- __entry->xid, __entry->fh_hash, __entry->inode,
+ TP_printk("xid=0x%08x fh_hash=0x%08x type=%s access=%s",
+ __entry->xid, __entry->fh_hash,
show_fs_file_type(__entry->type),
show_nfsd_may_flags(__entry->access)
)
);
+TRACE_EVENT_CONDITION(nfsd_fh_verify_err,
+ TP_PROTO(
+ const struct svc_rqst *rqstp,
+ const struct svc_fh *fhp,
+ umode_t type,
+ int access,
+ __be32 error
+ ),
+ TP_ARGS(rqstp, fhp, type, access, error),
+ TP_CONDITION(error),
+ TP_STRUCT__entry(
+ __field(unsigned int, netns_ino)
+ __sockaddr(server, rqstp->rq_xprt->xpt_remotelen)
+ __sockaddr(client, rqstp->rq_xprt->xpt_remotelen)
+ __field(u32, xid)
+ __field(u32, fh_hash)
+ __field(void *, inode)
+ __field(unsigned long, type)
+ __field(unsigned long, access)
+ __field(int, error)
+ ),
+ TP_fast_assign(
+ __entry->netns_ino = SVC_NET(rqstp)->ns.inum;
+ __assign_sockaddr(server, &rqstp->rq_xprt->xpt_local,
+ rqstp->rq_xprt->xpt_locallen);
+ __assign_sockaddr(client, &rqstp->rq_xprt->xpt_remote,
+ rqstp->rq_xprt->xpt_remotelen);
+ __entry->xid = be32_to_cpu(rqstp->rq_xid);
+ __entry->fh_hash = knfsd_fh_hash(&fhp->fh_handle);
+ __entry->inode = d_inode(fhp->fh_dentry);
+ __entry->type = type;
+ __entry->access = access;
+ __entry->error = be32_to_cpu(error);
+ ),
+ TP_printk("xid=0x%08x fh_hash=0x%08x type=%s access=%s error=%d",
+ __entry->xid, __entry->fh_hash,
+ show_fs_file_type(__entry->type),
+ show_nfsd_may_flags(__entry->access),
+ __entry->error
+ )
+);
+
DECLARE_EVENT_CLASS(nfsd_fh_err_class,
TP_PROTO(struct svc_rqst *rqstp,
struct svc_fh *fhp,
A subsequent patch will use this mechanism to wake up an operation
that is waiting for a client to return a delegation.
The new tracepoint records whether the wait timed out or was
properly awoken by the expected DELEGRETURN.
Suggested-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfsd/nfs4state.c | 31 +++++++++++++++++++++++++++++++
fs/nfsd/nfsd.h | 1 +
fs/nfsd/trace.h | 23 +++++++++++++++++++++++
fs/nfsd/xdr4.h | 2 ++
4 files changed, 57 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 0cf5a4bb36df..9d5140d485d4 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4689,6 +4689,36 @@ nfs4_share_conflict(struct svc_fh *current_fh, unsigned int deny_type)
return ret;
}
+static bool nfsd4_deleg_present(struct inode *inode)
+{
+ struct file_lock_context *ctx = smp_load_acquire(&inode->i_flctx);
+
+ return ctx && !list_empty_careful(&ctx->flc_lease);
+}
+
+/**
+ * nfsd4_wait_for_delegreturn - wait for delegations to be returned
+ * @rqstp: the RPC transaction being executed
+ * @fhp: instantiated filehandle of the file being waited for
+ *
+ * The timeout prevents deadlock if all nfsd threads happen to be
+ * tied up waiting for returning delegations.
+ *
+ * Return values:
+ * %true: delegation was returned
+ * %false: timed out waiting for delegreturn
+ */
+bool nfsd4_wait_for_delegreturn(struct svc_rqst *rqstp, struct svc_fh *fhp)
+{
+ struct inode *inode = d_inode(fhp->fh_dentry);
+ long __maybe_unused timeo;
+
+ timeo = wait_var_event_timeout(inode, !nfsd4_deleg_present(inode),
+ NFSD_DELEGRETURN_TIMEOUT);
+ trace_nfsd_delegreturn_wakeup(rqstp, fhp, timeo);
+ return timeo > 0;
+}
+
static void nfsd4_cb_recall_prepare(struct nfsd4_callback *cb)
{
struct nfs4_delegation *dp = cb_to_delegation(cb);
@@ -6703,6 +6733,7 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
if (status)
goto put_stateid;
+ wake_up_var(d_inode(cstate->current_fh.fh_dentry));
destroy_delegation(dp);
put_stateid:
nfs4_put_stid(&dp->dl_stid);
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 9a8b09afc173..05abfb6a0271 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -341,6 +341,7 @@ void nfsd_lockd_shutdown(void);
#define NFSD_LAUNDROMAT_MINTIMEOUT 1 /* seconds */
#define NFSD_COURTESY_CLIENT_TIMEOUT (24 * 60 * 60) /* seconds */
+#define NFSD_DELEGRETURN_TIMEOUT (HZ / 34) /* 30ms */
/*
* The following attributes are currently not supported by the NFSv4 server:
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 6809d92d4cc5..a88f4303fbd9 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -538,6 +538,29 @@ DEFINE_NFSD_COPY_ERR_EVENT(clone_file_range_err);
#include "filecache.h"
#include "vfs.h"
+TRACE_EVENT(nfsd_delegreturn_wakeup,
+ TP_PROTO(
+ const struct svc_rqst *rqstp,
+ const struct svc_fh *fhp,
+ long timeo
+ ),
+ TP_ARGS(rqstp, fhp, timeo),
+ TP_STRUCT__entry(
+ __field(u32, xid)
+ __field(u32, fh_hash)
+ __field(long, timeo)
+ ),
+ TP_fast_assign(
+ __entry->xid = be32_to_cpu(rqstp->rq_xid);
+ __entry->fh_hash = knfsd_fh_hash(&fhp->fh_handle);
+ __entry->timeo = timeo;
+ ),
+ TP_printk("xid=0x%08x fh_hash=0x%08x%s",
+ __entry->xid, __entry->fh_hash,
+ __entry->timeo == 0 ? " (timed out)" : ""
+ )
+);
+
DECLARE_EVENT_CLASS(nfsd_stateid_class,
TP_PROTO(stateid_t *stp),
TP_ARGS(stp),
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 7b744011f2d3..7323433746f1 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -788,6 +788,8 @@ extern __be32 nfsd4_destroy_clientid(struct svc_rqst *, struct nfsd4_compound_st
union nfsd4_op_u *u);
__be32 nfsd4_reclaim_complete(struct svc_rqst *, struct nfsd4_compound_state *,
union nfsd4_op_u *u);
+extern bool nfsd4_wait_for_delegreturn(struct svc_rqst *rqstp,
+ struct svc_fh *fhp);
extern __be32 nfsd4_process_open1(struct nfsd4_compound_state *,
struct nfsd4_open *open, struct nfsd_net *nn);
extern __be32 nfsd4_process_open2(struct svc_rqst *rqstp,
nfsd_setattr() can kick off a CB_RECALL (via
notify_change() -> break_lease()) if a delegation is present. Before
returning NFS4ERR_DELAY, give the client holding that delegation a
chance to return it and then retry the nfsd_setattr() again, once.
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354
Tested-by: Igor Mammedov <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfsd/nfs4proc.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 14f07d3d6c48..5b18a71f1043 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1142,7 +1142,7 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
{
struct nfsd4_setattr *setattr = &u->setattr;
__be32 status = nfs_ok;
- int err;
+ int err, retries;
if (setattr->sa_iattr.ia_valid & ATTR_SIZE) {
status = nfs4_preprocess_stateid_op(rqstp, cstate,
@@ -1173,8 +1173,21 @@ nfsd4_setattr(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
&setattr->sa_label);
if (status)
goto out;
- status = nfsd_setattr(rqstp, &cstate->current_fh, &setattr->sa_iattr,
- 0, (time64_t)0);
+
+ retries = 1;
+ do {
+ status = nfsd_setattr(rqstp, &cstate->current_fh,
+ &setattr->sa_iattr, 0, (time64_t)0);
+ if (status != nfserr_jukebox)
+ break;
+ if (!retries--)
+ break;
+ if (!nfsd4_wait_for_delegreturn(rqstp, &cstate->current_fh))
+ break;
+
+ fh_clear_pre_post_attrs(&cstate->current_fh);
+ } while (1);
+
out:
fh_drop_write(&cstate->current_fh);
return status;
nfsd_rename() can kick off a CB_RECALL (via
vfs_rename() -> leases_conflict()) if a delegation is present.
Before returning NFS4ERR_DELAY, give the client holding that
delegation a chance to return it and then retry the nfsd_rename()
again, once.
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=354
Tested-by: Igor Mammedov <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfsd/nfs4proc.c | 32 ++++++++++++++++++++++++--------
1 file changed, 24 insertions(+), 8 deletions(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 5b18a71f1043..ed3c6ef875c3 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1056,17 +1056,33 @@ nfsd4_rename(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
{
struct nfsd4_rename *rename = &u->rename;
__be32 status;
+ int retries;
if (opens_in_grace(SVC_NET(rqstp)))
return nfserr_grace;
- status = nfsd_rename(rqstp, &cstate->save_fh, rename->rn_sname,
- rename->rn_snamelen, &cstate->current_fh,
- rename->rn_tname, rename->rn_tnamelen);
- if (status)
- return status;
- set_change_info(&rename->rn_sinfo, &cstate->current_fh);
- set_change_info(&rename->rn_tinfo, &cstate->save_fh);
- return nfs_ok;
+
+ retries = 1;
+ do {
+ status = nfsd_rename(rqstp, &cstate->save_fh, rename->rn_sname,
+ rename->rn_snamelen, &cstate->current_fh,
+ rename->rn_tname, rename->rn_tnamelen);
+ if (status == nfs_ok) {
+ set_change_info(&rename->rn_sinfo, &cstate->current_fh);
+ set_change_info(&rename->rn_tinfo, &cstate->save_fh);
+ break;
+ }
+ if (status != nfserr_jukebox)
+ break;
+ if (!retries--)
+ break;
+ if (!nfsd4_wait_for_delegreturn(rqstp, &cstate->current_fh))
+ break;
+
+ fh_clear_pre_post_attrs(&cstate->save_fh);
+ fh_clear_pre_post_attrs(&cstate->current_fh);
+ } while (1);
+
+ return status;
}
static __be32
On Mon, 2022-08-08 at 09:52 -0400, Chuck Lever wrote:
> This RFC series adds logic to the Linux NFS server to make it wait a
> few moments for clients to return a delegation before replying with
> NFS4ERR_DELAY. There are two main benefits:
>
> - This improves responsiveness when a delegated file is accessed
> from another client
> - This removes an unnecessary latency if a client has neglected to
> return a delegation before attempting a RENAME or SETATTR
>
> This series is incomplete:
> - There are likely other operations that can benefit (eg. OPEN)
>
> This series applies against v5.19.
>
> Changes since v2:
> - Wake immediately after client sends DELEGRETURN
> - More tracepoint improvements
>
> Changes since RFC:
> - Eliminate spurious console messages on the server
> - Wait for DELEGRETURN for RENAME operations
^^^^
Does REMOVE need similar treatment? Does the Apple client return a
delegation before attempting to unlink?
> - Add CB done tracepoints
> - Rework the retry loop
>
> ---
>
> Chuck Lever (7):
> NFSD: Instrument fh_verify()
> NFSD: Replace dprintk() call site in fh_verify()
> NFSD: Trace NFSv4 COMPOUND tags
> NFSD: Add tracepoints to report NFSv4 callback completions
> NFSD: Add a mechanism to wait for a DELEGRETURN
> NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY
> NFSD: Make nfsd4_rename() wait before returning NFS4ERR_DELAY
>
>
> fs/nfsd/nfs4layouts.c | 2 +-
> fs/nfsd/nfs4proc.c | 56 +++++++++++---
> fs/nfsd/nfs4state.c | 35 +++++++++
> fs/nfsd/nfsd.h | 1 +
> fs/nfsd/nfsfh.c | 13 +---
> fs/nfsd/trace.h | 171 ++++++++++++++++++++++++++++++++++++++++--
> fs/nfsd/xdr4.h | 2 +
> 7 files changed, 251 insertions(+), 29 deletions(-)
>
> --
> Chuck Lever
>
The new tracepoints are nice and the patchset makes sense. You can add:
Reviewed-by: Jeff Layton <[email protected]>
> On Aug 8, 2022, at 2:48 PM, Jeff Layton <[email protected]> wrote:
>
> On Mon, 2022-08-08 at 09:52 -0400, Chuck Lever wrote:
>> This RFC series adds logic to the Linux NFS server to make it wait a
>> few moments for clients to return a delegation before replying with
>> NFS4ERR_DELAY. There are two main benefits:
>>
>> - This improves responsiveness when a delegated file is accessed
>> from another client
>> - This removes an unnecessary latency if a client has neglected to
>> return a delegation before attempting a RENAME or SETATTR
>>
>> This series is incomplete:
>> - There are likely other operations that can benefit (eg. OPEN)
> Does REMOVE need similar treatment? Does the Apple client return a
> delegation before attempting to unlink?
It's certainly plausible that REMOVE might trigger a delegation recall,
but I haven't yet seen this happen on the wire.
--
Chuck Lever
> On Aug 8, 2022, at 4:06 PM, Chuck Lever III <[email protected]> wrote:
>
>> On Aug 8, 2022, at 2:48 PM, Jeff Layton <[email protected]> wrote:
>>
>> On Mon, 2022-08-08 at 09:52 -0400, Chuck Lever wrote:
>>> This RFC series adds logic to the Linux NFS server to make it wait a
>>> few moments for clients to return a delegation before replying with
>>> NFS4ERR_DELAY. There are two main benefits:
>>>
>>> - This improves responsiveness when a delegated file is accessed
>>> from another client
>>> - This removes an unnecessary latency if a client has neglected to
>>> return a delegation before attempting a RENAME or SETATTR
>>>
>>> This series is incomplete:
>>> - There are likely other operations that can benefit (eg. OPEN)
>
>> Does REMOVE need similar treatment? Does the Apple client return a
>> delegation before attempting to unlink?
>
> It's certainly plausible that REMOVE might trigger a delegation recall,
> but I haven't yet seen this happen on the wire.
I started playing with REMOVE and DELEG15a, and ran into an
interesting problem.
REMOVE is passed the filehandle of the parent and the name of
the entry to be removed. nfsd4_remove() doesn't have an inode
or filehandle of the object to be removed. nfsd_unlink() does
acquire the inode of that object, though.
And, I guess if we want to avoid NFSv3 operations returning
JUKEBOX on delegated files too, the "wait for DELEGRETURN"
really ought to be called from the fs/nfsd/vfs.c functions,
not from the NFSv4 specific functions in fs/nfsd/nfs4proc.c.
--
Chuck Lever