NFSv4.1 adds a new GET_DIR_DELEGATION operation, to allow clients
to request a delegation on a directory. If the client holds a directory
delegation, then it knows that nothing will change the dentries in it
until it has been recalled.
Last year, Rick Macklem gave a talk at the NFS Bakeathon on his
implementation of directory delegations for FreeBSD [1], and showed that
it can greatly improve LOOKUP-heavy workloads. There is also some
earlier work by CITI [2] that showed similar results. The SMB protocol
also has a similar sort of construct, and they have also seen great
performance improvements from it.
This patchset adds minimal support for directory delegations to the VFS
layer, and and plumbs support for GET_DIR_DELEGATION handling into nfsd.
It also has a set of patches for the NFS client. The VFS and server-side
bits all seem to work now and do the right thing. The client side is
still very much a WIP since it relies more on policy and heuristics.
What I've found is that without directory delegations in play, the
client usually revalidates dentries by issuing a GETATTR for the parent.
If it holds a delegation on the parent, it can avoid that GETATTR, which
is a nice benefit.
But...the catch is that the Linux NFS client defaults to caching
dentries for 30-60s before revalidating them. The attrcache timeout
quickly ends up as 60s on directories that aren't changing, so for a
delegation to be useful, you need to be revalidating dentries in the
same directory over a rather long span of time. Even then, at best it'll
be optimizing away one GETATTR per inode every 60s or so.
I've done a little ad-hoc testing with kernel builds, but they don't
show much benefit from this. Testing with them both enabled and
disabled, the kernel builds finished within 5 seconds of one another on
a 12.5 minute build, which is insignificant.
The GETATTR load seems to be reduced by about 7% with dir delegs
enabled, which is nice, but GETATTRs are usually pretty lightweight
anyway, and we do have the overhead of dir delegations to deal with.
There is a lot of room for improvement that could make this more
compelling:
1) The client's logic for fetching a delegation is probably too
primitive. Maybe we need to request one only after every 3-5
revalidations?
2) The client uses the same "REFERENCED" timeout for dir delegations
as it does for clients. It could probably benefit from holding them
longer by default. I'm not sure how best to do that.
3) The protocol allows for clients and server to use asynchronous
notifications to avoid issuing delegation breaks when there are
changes to a delegated directory. That's not implemented yet,
but it could change the calculus here for workloads where multiple
clients are accessing and changing child dentries.
4) A GET_DIR_DELEGATION+READDIR compound could be worthwhile. If we have
all of the dentries in the directory, then we could (in principle)
satisfy any negtive dentry lookup w/o going to the server. If they're
all in the correct order on d_subdirs, we could satisfy a readdir
via dcache_readdir without going to the server.
5) The server could probably avoid handing them out if the inode changed
<60s in the past?
For the VFS and NFSD bits, I mainly want to get some early feedback.
Does this seem like a reasonable approach? Anyone see major problems
with handling directory delegations in the VFS codepaths?
The NFS client bits are quite a bit more rough. Is there a better
heuristic for when to request a dir delegation? I could use a bit of
guidance here.
[1]: https://www.youtube.com/watch?v=DdFyH3BN5pI
[2]: https://linux-nfs.org/wiki/index.php/CITI_Experience_with_Directory_Delegations
Signed-off-by: Jeff Layton <[email protected]>
---
Jeff Layton (24):
filelock: push the S_ISREG check down to ->setlease handlers
filelock: add a lm_set_conflict lease_manager callback
vfs: add try_break_deleg calls for parents to vfs_{link,rename,unlink}
vfs: allow mkdir to wait for delegation break on parent
vfs: allow rmdir to wait for delegation break on parent
vfs: break parent dir delegations in open(..., O_CREAT) codepath
vfs: make vfs_create break delegations on parent directory
vfs: make vfs_mknod break delegations on parent directory
filelock: lift the ban on directory leases in generic_setlease
nfsd: allow filecache to hold S_IFDIR files
nfsd: allow DELEGRETURN on directories
nfsd: encoders and decoders for GET_DIR_DELEGATION
nfsd: check for delegation conflicts vs. the same client
nfsd: wire up GET_DIR_DELEGATION handling
nfs: fix nfs_stateid_hash prototype when CONFIG_CRC32 isn't set
nfs: remove unused NFS_CALL macro
nfs: add cache_validity to the nfs_inode_event tracepoints
nfs: add a tracepoint to nfs_inode_detach_delegation_locked
nfs: new tracepoint in nfs_delegation_need_return
nfs: new tracepoint in match_stateid operation
nfs: add a GDD_GETATTR rpc operation
nfs: skip dentry revalidation when parent dir has a delegation
nfs: optionally request a delegation on GETATTR
nfs: add a module parameter to disable directory delegations
drivers/base/devtmpfs.c | 6 +-
fs/cachefiles/namei.c | 2 +-
fs/ecryptfs/inode.c | 8 +--
fs/init.c | 4 +-
fs/locks.c | 12 +++-
fs/namei.c | 101 ++++++++++++++++++++++++++++------
fs/nfs/delegation.c | 5 ++
fs/nfs/dir.c | 20 +++++++
fs/nfs/internal.h | 2 +-
fs/nfs/nfs4file.c | 2 +
fs/nfs/nfs4proc.c | 55 ++++++++++++++++++-
fs/nfs/nfs4trace.h | 104 +++++++++++++++++++++++++++++++++++
fs/nfs/nfs4xdr.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++
fs/nfs/nfstrace.h | 8 ++-
fs/nfsd/filecache.c | 37 +++++++++++--
fs/nfsd/filecache.h | 2 +
fs/nfsd/nfs3proc.c | 2 +-
fs/nfsd/nfs4proc.c | 49 +++++++++++++++++
fs/nfsd/nfs4recover.c | 6 +-
fs/nfsd/nfs4state.c | 112 +++++++++++++++++++++++++++++++++++++-
fs/nfsd/nfs4xdr.c | 72 +++++++++++++++++++++++-
fs/nfsd/state.h | 5 ++
fs/nfsd/vfs.c | 13 +++--
fs/nfsd/vfs.h | 2 +-
fs/nfsd/xdr4.h | 8 +++
fs/open.c | 2 +-
fs/overlayfs/overlayfs.h | 8 +--
fs/smb/client/cifsfs.c | 3 +
fs/smb/server/vfs.c | 8 +--
include/linux/filelock.h | 10 ++++
include/linux/fs.h | 11 ++--
include/linux/nfs4.h | 6 ++
include/linux/nfs_fs.h | 1 +
include/linux/nfs_fs_sb.h | 1 +
include/linux/nfs_xdr.h | 9 +--
net/unix/af_unix.c | 2 +-
36 files changed, 754 insertions(+), 80 deletions(-)
---
base-commit: b80a250a20975e30ff8635dd75d6d20bf05405a1
change-id: 20240215-dir-deleg-e212210ba9d4
Best regards,
--
Jeff Layton <[email protected]>
fh_verify only allows you to filter on a single type of inode, so have
nfsd4_delegreturn not filter by type. Once fh_verify returns, do the
appropriate check of the type and return an error if it's not a regular
file or directory.
Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfs4state.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 17d09d72632b..c52e807f9672 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -7425,12 +7425,24 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
struct nfs4_delegation *dp;
stateid_t *stateid = &dr->dr_stateid;
struct nfs4_stid *s;
+ umode_t mode;
__be32 status;
struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
- if ((status = fh_verify(rqstp, &cstate->current_fh, S_IFREG, 0)))
+ if ((status = fh_verify(rqstp, &cstate->current_fh, 0, 0)))
return status;
+ mode = d_inode(cstate->current_fh.fh_dentry)->i_mode & S_IFMT;
+ switch(mode) {
+ case S_IFREG:
+ case S_IFDIR:
+ break;
+ case S_IFLNK:
+ return nfserr_symlink;
+ default:
+ return nfserr_inval;
+ }
+
status = nfsd4_lookup_stateid(cstate, stateid, SC_TYPE_DELEG, 0, &s, nn);
if (status)
goto out;
--
2.44.0
Add a tracepoint in the function that decides whether to return a
delegation to the server.
Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/delegation.c | 2 ++
fs/nfs/nfs4trace.h | 47 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 49 insertions(+)
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index a331a2dbae12..88dbec4739d3 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -571,6 +571,8 @@ static bool nfs_delegation_need_return(struct nfs_delegation *delegation)
{
bool ret = false;
+ trace_nfs_delegation_need_return(delegation);
+
if (test_and_clear_bit(NFS_DELEGATION_RETURN, &delegation->flags))
ret = true;
else if (test_bit(NFS_DELEGATION_RETURN_IF_CLOSED, &delegation->flags)) {
diff --git a/fs/nfs/nfs4trace.h b/fs/nfs/nfs4trace.h
index b4ebac1961cc..e5adfb755dc7 100644
--- a/fs/nfs/nfs4trace.h
+++ b/fs/nfs/nfs4trace.h
@@ -14,6 +14,8 @@
#include <trace/misc/fs.h>
#include <trace/misc/nfs.h>
+#include "delegation.h"
+
#define show_nfs_fattr_flags(valid) \
__print_flags((unsigned long)valid, "|", \
{ NFS_ATTR_FATTR_TYPE, "TYPE" }, \
@@ -928,6 +930,51 @@ DEFINE_NFS4_SET_DELEGATION_EVENT(nfs4_set_delegation);
DEFINE_NFS4_SET_DELEGATION_EVENT(nfs4_reclaim_delegation);
DEFINE_NFS4_SET_DELEGATION_EVENT(nfs4_detach_delegation);
+#define show_delegation_flags(flags) \
+ __print_flags(flags, "|", \
+ { BIT(NFS_DELEGATION_NEED_RECLAIM), "NEED_RECLAIM" }, \
+ { BIT(NFS_DELEGATION_RETURN), "RETURN" }, \
+ { BIT(NFS_DELEGATION_RETURN_IF_CLOSED), "RETURN_IF_CLOSED" }, \
+ { BIT(NFS_DELEGATION_REFERENCED), "REFERENCED" }, \
+ { BIT(NFS_DELEGATION_RETURNING), "RETURNING" }, \
+ { BIT(NFS_DELEGATION_REVOKED), "REVOKED" }, \
+ { BIT(NFS_DELEGATION_TEST_EXPIRED), "TEST_EXPIRED" }, \
+ { BIT(NFS_DELEGATION_INODE_FREEING), "INODE_FREEING" }, \
+ { BIT(NFS_DELEGATION_RETURN_DELAYED), "RETURN_DELAYED" })
+
+DECLARE_EVENT_CLASS(nfs4_delegation_event,
+ TP_PROTO(
+ const struct nfs_delegation *delegation
+ ),
+
+ TP_ARGS(delegation),
+
+ TP_STRUCT__entry(
+ __field(u32, fhandle)
+ __field(unsigned int, fmode)
+ __field(unsigned long, flags)
+ ),
+
+ TP_fast_assign(
+ __entry->fhandle = nfs_fhandle_hash(NFS_FH(delegation->inode));
+ __entry->fmode = delegation->type;
+ __entry->flags = delegation->flags;
+ ),
+
+ TP_printk(
+ "fhandle=0x%08x fmode=%s flags=%s",
+ __entry->fhandle, show_fs_fmode_flags(__entry->fmode),
+ show_delegation_flags(__entry->flags)
+ )
+);
+#define DEFINE_NFS4_DELEGATION_EVENT(name) \
+ DEFINE_EVENT(nfs4_delegation_event, name, \
+ TP_PROTO( \
+ const struct nfs_delegation *delegation \
+ ), \
+ TP_ARGS(delegation))
+DEFINE_NFS4_DELEGATION_EVENT(nfs_delegation_need_return);
+
TRACE_EVENT(nfs4_delegreturn_exit,
TP_PROTO(
const struct nfs4_delegreturnargs *args,
--
2.44.0
We expect directory delegations to most helpful when their children have
frequently revalidated dentries. The way we usually revalidate them is
to check that the dentry's d_time field matches the change attribute on
the parent. Most dentry revalidations that have to issue an RPC end up
as a GETATTR on the wire for the parent, if the parent hasn't changed,
then the dentry is fine.
Add a new NFS_INO_GDD_GETATTR flag to the nfsi->flags field. Whenever we
validate a dentry vs. its parent, set that flag on the parent. Then,
when we next do a GETATTR vs. the parent, test and clear the flag and do
a GDD_GETATTR if it was set. If the the GET_DIR_DELEGATION succeeds, set
the delegation in the inode.
Finally, a lot of servers don't support GET_DIR_DELEGATION, so add a
NFS_CAP_* bit to track whether the server supports it, and enable it by
default on v4.1+, and retry without a delegation if we get back an
ENOTSUPP.
Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/delegation.c | 1 +
fs/nfs/dir.c | 8 ++++++++
fs/nfs/nfs4proc.c | 45 ++++++++++++++++++++++++++++++++++++++++++---
include/linux/nfs_fs.h | 1 +
include/linux/nfs_fs_sb.h | 1 +
5 files changed, 53 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index 88dbec4739d3..14c14bd03465 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -356,6 +356,7 @@ nfs_detach_delegation_locked(struct nfs_inode *nfsi,
delegation->inode = NULL;
rcu_assign_pointer(nfsi->delegation, NULL);
spin_unlock(&delegation->lock);
+ clear_bit(NFS_INO_GDD_GETATTR, &nfsi->flags);
return delegation;
}
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 061648c73116..930fe7e14914 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1504,12 +1504,20 @@ static int nfs_dentry_verify_change(struct inode *dir, struct dentry *dentry)
static int nfs_check_verifier(struct inode *dir, struct dentry *dentry,
int rcu_walk)
{
+ struct nfs_inode *nfsi = NFS_I(dir);
+
if (IS_ROOT(dentry))
return 1;
if (NFS_SERVER(dir)->flags & NFS_MOUNT_LOOKUP_CACHE_NONE)
return 0;
if (!nfs_dentry_verify_change(dir, dentry))
return 0;
+ /*
+ * The dentry matches the directory's change attribute, so
+ * we're likely revalidating here. Flag the dir so that we
+ * ask for a delegation on the next getattr.
+ */
+ set_bit(NFS_INO_GDD_GETATTR, &nfsi->flags);
/* Revalidate nfsi->cache_change_attribute before we declare a match */
if (nfs_mapping_need_revalidate_inode(dir)) {
if (rcu_walk)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index d518388bd0d6..3dbe9a18c9be 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4318,6 +4318,21 @@ static int nfs4_get_referral(struct rpc_clnt *client, struct inode *dir,
return status;
}
+static bool should_request_dir_deleg(struct inode *inode)
+{
+ if (!inode)
+ return false;
+ if (!S_ISDIR(inode->i_mode))
+ return false;
+ if (!nfs_server_capable(inode, NFS_CAP_GET_DIR_DELEG))
+ return false;
+ if (!test_and_clear_bit(NFS_INO_GDD_GETATTR, &(NFS_I(inode)->flags)))
+ return false;
+ if (nfs4_have_delegation(inode, FMODE_READ))
+ return false;
+ return true;
+}
+
static int _nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle,
struct nfs_fattr *fattr, struct inode *inode)
{
@@ -4331,11 +4346,12 @@ static int _nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle,
.server = server,
};
struct rpc_message msg = {
- .rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_GETATTR],
.rpc_argp = &args,
.rpc_resp = &res,
};
unsigned short task_flags = 0;
+ bool gdd;
+ int status;
if (nfs4_has_session(server->nfs_client))
task_flags = RPC_TASK_MOVEABLE;
@@ -4344,11 +4360,32 @@ static int _nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle,
if (inode && (server->flags & NFS_MOUNT_SOFTREVAL))
task_flags |= RPC_TASK_TIMEOUT;
+retry:
+ gdd = should_request_dir_deleg(inode);
+ if (gdd)
+ msg.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_GDD_GETATTR];
+ else
+ msg.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_GETATTR];
+
nfs4_bitmap_copy_adjust(bitmask, nfs4_bitmask(server, fattr->label), inode, 0);
nfs_fattr_init(fattr);
nfs4_init_sequence(&args.seq_args, &res.seq_res, 0, 0);
- return nfs4_do_call_sync(server->client, server, &msg,
- &args.seq_args, &res.seq_res, task_flags);
+ status = nfs4_do_call_sync(server->client, server, &msg,
+ &args.seq_args, &res.seq_res, task_flags);
+ switch (status) {
+ case 0:
+ if (gdd && res.nf_status == GDD4_OK)
+ status = nfs_inode_set_delegation(inode, current_cred(), FMODE_READ,
+ &res.deleg, 0);
+ break;
+ case -ENOTSUPP:
+ /* If the server doesn't support GET_DIR_DELEGATION, retry without it */
+ if (gdd) {
+ server->caps &= ~NFS_CAP_GET_DIR_DELEG;
+ goto retry;
+ }
+ }
+ return status;
}
int nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle,
@@ -10552,6 +10589,7 @@ static const struct nfs4_minor_version_ops nfs_v4_1_minor_ops = {
.minor_version = 1,
.init_caps = NFS_CAP_READDIRPLUS
| NFS_CAP_ATOMIC_OPEN
+ | NFS_CAP_GET_DIR_DELEG
| NFS_CAP_POSIX_LOCK
| NFS_CAP_STATEID_NFSV41
| NFS_CAP_ATOMIC_OPEN_V1
@@ -10578,6 +10616,7 @@ static const struct nfs4_minor_version_ops nfs_v4_2_minor_ops = {
.minor_version = 2,
.init_caps = NFS_CAP_READDIRPLUS
| NFS_CAP_ATOMIC_OPEN
+ | NFS_CAP_GET_DIR_DELEG
| NFS_CAP_POSIX_LOCK
| NFS_CAP_STATEID_NFSV41
| NFS_CAP_ATOMIC_OPEN_V1
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index f5ce7b101146..5140e4dac98a 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -318,6 +318,7 @@ struct nfs4_copy_state {
#define NFS_INO_LAYOUTCOMMITTING (10) /* layoutcommit inflight */
#define NFS_INO_LAYOUTSTATS (11) /* layoutstats inflight */
#define NFS_INO_ODIRECT (12) /* I/O setting is O_DIRECT */
+#define NFS_INO_GDD_GETATTR (13) /* Ask for dir deleg on next GETATTR */
static inline struct nfs_inode *NFS_I(const struct inode *inode)
{
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 92de074e63b9..0ab744cf52d7 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -278,6 +278,7 @@ struct nfs_server {
#define NFS_CAP_LGOPEN (1U << 5)
#define NFS_CAP_CASE_INSENSITIVE (1U << 6)
#define NFS_CAP_CASE_PRESERVING (1U << 7)
+#define NFS_CAP_GET_DIR_DELEG (1U << 13)
#define NFS_CAP_POSIX_LOCK (1U << 14)
#define NFS_CAP_UIDGID_NOMAP (1U << 15)
#define NFS_CAP_STATEID_NFSV41 (1U << 16)
--
2.44.0