2020-03-11 19:57:26

by Frank van der Linden

[permalink] [raw]
Subject: [PATCH 00/13] client side user xattr (RFC8276) support

This patchset implements the client side for NFS user extended attributes,
as defined in RFC8726.

This was originally posted as an RFC in:

https://patchwork.kernel.org/cover/11143565/

Patch 1 is shared with the server side patch, posted
separately.

Most comments in there still apply, except that:

1. Client side caching is now included in this patch set.
2. As per the discussion, user extended attributes are enabled if
the client and server support them (e.g. they support 4.2 and
advertise the user extended attribute FATTR). There are no longer
options to switch them off on either the client or the server.
3. The code is no longer conditioned on a config option.
4. The number of patches has been reduced somewhat by merging
smaller, related ones.

The client side caching is implemented through a per-inode hash table,
which is allocated on demand. See fs/nfs/nfs42xattr.c for details.

This has been tested as follows:

* Linux client and server:
* Test all corner cases (XATTR_SIZE_*)
* Test all failure cases (no xattr, setxattr with different or
invalid flags, etc).
* Verify the content of xattrs across several operations.
* Use KASAN and KMEMLEAK for a longer mix of testruns to verify
that there were no leaks (after unmounting the filesystem).
* Stress tested caching, trying to run the client out of memory.

* Tested against the FreeBSD-current implementation as well, which works
(after I fixed 2 bugs in that implementation, which I'm sending out to
them too).

* Not tested: RDMA (I couldn't get a setup going).

Frank van der Linden (13):
nfs,nfsd: NFSv4.2 extended attribute protocol definitions
nfs: add client side only definitions for user xattrs
NFSv4.2: query the server for extended attribute support
NFSv4.2: define limits and sizes for user xattr handling
NFSv4.2: add client side XDR handling for extended attributes
nfs: define nfs_access_get_cached function
NFSv4.2: query the extended attribute access bits
nfs: modify update_changeattr to deal with regular files
nfs: define and use the NFS_INO_INVALID_XATTR flag
nfs: make the buf_to_pages_noslab function available to the nfs code
NFSv4.2: add the extended attribute proc functions.
NFSv4.2: hook in the user extended attribute handlers
NFSv4.2: add client side xattr caching.

fs/nfs/Makefile | 1 +
fs/nfs/client.c | 19 +-
fs/nfs/dir.c | 24 +-
fs/nfs/inode.c | 16 +-
fs/nfs/internal.h | 28 ++
fs/nfs/nfs42.h | 24 +
fs/nfs/nfs42proc.c | 248 ++++++++++
fs/nfs/nfs42xattr.c | 1083 +++++++++++++++++++++++++++++++++++++++++++
fs/nfs/nfs42xdr.c | 442 ++++++++++++++++++
fs/nfs/nfs4_fs.h | 5 +
fs/nfs/nfs4client.c | 31 ++
fs/nfs/nfs4proc.c | 248 ++++++++--
fs/nfs/nfs4super.c | 10 +
fs/nfs/nfs4xdr.c | 29 ++
fs/nfs/nfstrace.h | 3 +-
include/linux/nfs4.h | 25 +
include/linux/nfs_fs.h | 12 +
include/linux/nfs_fs_sb.h | 6 +
include/linux/nfs_xdr.h | 60 ++-
include/uapi/linux/nfs4.h | 3 +
include/uapi/linux/nfs_fs.h | 1 +
21 files changed, 2276 insertions(+), 42 deletions(-)
create mode 100644 fs/nfs/nfs42xattr.c

--
2.16.6


2020-03-11 19:57:27

by Frank van der Linden

[permalink] [raw]
Subject: [PATCH 06/13] nfs: define nfs_access_get_cached function

The only consumer of nfs_access_get_cached_rcu and nfs_access_cached
calls these static functions in order to first try RCU access, and
then locked access.

Combine them in to a single function, and call that. Make this function
available to the rest of the NFS code.

Signed-off-by: Frank van der Linden <[email protected]>
---
fs/nfs/dir.c | 20 ++++++++++++++++----
include/linux/nfs_fs.h | 2 ++
2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 193d6fb363b7..a0b564781e3e 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2433,7 +2433,7 @@ static struct nfs_access_entry *nfs_access_search_rbtree(struct inode *inode, co
return NULL;
}

-static int nfs_access_get_cached(struct inode *inode, const struct cred *cred, struct nfs_access_entry *res, bool may_block)
+static int nfs_access_get_cached_locked(struct inode *inode, const struct cred *cred, struct nfs_access_entry *res, bool may_block)
{
struct nfs_inode *nfsi = NFS_I(inode);
struct nfs_access_entry *cache;
@@ -2506,6 +2506,20 @@ static int nfs_access_get_cached_rcu(struct inode *inode, const struct cred *cre
return err;
}

+int nfs_access_get_cached(struct inode *inode, const struct cred *cred, struct
+nfs_access_entry *res, bool may_block)
+{
+ int status;
+
+ status = nfs_access_get_cached_rcu(inode, cred, res);
+ if (status != 0)
+ status = nfs_access_get_cached_locked(inode, cred, res,
+ may_block);
+
+ return status;
+}
+EXPORT_SYMBOL_GPL(nfs_access_get_cached);
+
static void nfs_access_add_rbtree(struct inode *inode, struct nfs_access_entry *set)
{
struct nfs_inode *nfsi = NFS_I(inode);
@@ -2620,9 +2634,7 @@ static int nfs_do_access(struct inode *inode, const struct cred *cred, int mask)

trace_nfs_access_enter(inode);

- status = nfs_access_get_cached_rcu(inode, cred, &cache);
- if (status != 0)
- status = nfs_access_get_cached(inode, cred, &cache, may_block);
+ status = nfs_access_get_cached(inode, cred, &cache, may_block);
if (status == 0)
goto out_cached;

diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 442458e94ab5..e86e7a747092 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -491,6 +491,8 @@ extern int nfs_instantiate(struct dentry *dentry, struct nfs_fh *fh,
struct nfs_fattr *fattr, struct nfs4_label *label);
extern int nfs_may_open(struct inode *inode, const struct cred *cred, int openflags);
extern void nfs_access_zap_cache(struct inode *inode);
+extern int nfs_access_get_cached(struct inode *inode, const struct cred *cred, struct nfs_access_entry *res,
+ bool may_block);

/*
* linux/fs/nfs/symlink.c
--
2.16.6

2020-03-11 19:57:27

by Frank van der Linden

[permalink] [raw]
Subject: [PATCH 10/13] nfs: make the buf_to_pages_noslab function available to the nfs code

Make the buf_to_pages_noslab function available to the rest of the NFS
code. Rename it to nfs4_buf_to_pages_noslab to be consistent.

This will be used later in the NFSv4.2 xattr code.

Signed-off-by: Frank van der Linden <[email protected]>
---
fs/nfs/internal.h | 3 +++
fs/nfs/nfs4proc.c | 4 ++--
2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 68f235a571e1..1e3a7e119c93 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -317,6 +317,9 @@ extern const u32 nfs42_maxlistxattrs_overhead;
extern const struct rpc_procinfo nfs4_procedures[];
#endif

+extern int nfs4_buf_to_pages_noslab(const void *buf, size_t buflen,
+ struct page **pages);
+
#ifdef CONFIG_NFS_V4_SECURITY_LABEL
extern struct nfs4_label *nfs4_label_alloc(struct nfs_server *server, gfp_t flags);
static inline struct nfs4_label *
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 725ae64f62f7..aee3a1c97def 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5553,7 +5553,7 @@ static inline int nfs4_server_supports_acls(struct nfs_server *server)
*/
#define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE)

-static int buf_to_pages_noslab(const void *buf, size_t buflen,
+int nfs4_buf_to_pages_noslab(const void *buf, size_t buflen,
struct page **pages)
{
struct page *newpage, **spages;
@@ -5795,7 +5795,7 @@ static int __nfs4_proc_set_acl(struct inode *inode, const void *buf, size_t bufl
return -EOPNOTSUPP;
if (npages > ARRAY_SIZE(pages))
return -ERANGE;
- i = buf_to_pages_noslab(buf, buflen, arg.acl_pages);
+ i = nfs4_buf_to_pages_noslab(buf, buflen, arg.acl_pages);
if (i < 0)
return i;
nfs4_inode_make_writeable(inode);
--
2.16.6

2020-03-11 19:57:28

by Frank van der Linden

[permalink] [raw]
Subject: [PATCH 05/13] NFSv4.2: add client side XDR handling for extended attributes

Define the argument and response structures that will be used for
RFC 8276 extended attribute RPC calls, and implement the necessary
functions to encode/decode the extended attribute operations.

Signed-off-by: Frank van der Linden <[email protected]>
---
fs/nfs/nfs42xdr.c | 368 ++++++++++++++++++++++++++++++++++++++++++++++++
fs/nfs/nfs4xdr.c | 6 +
include/linux/nfs_xdr.h | 59 +++++++-
3 files changed, 432 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index 6712daa9d85b..ac45a1027523 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -407,6 +407,212 @@ static void encode_layouterror(struct xdr_stream *xdr,
encode_device_error(xdr, &args->errors[0]);
}

+#ifdef CONFIG_NFS_V4_2
+static void encode_setxattr(struct xdr_stream *xdr,
+ const struct nfs42_setxattrargs *arg,
+ struct compound_hdr *hdr)
+{
+ __be32 *p;
+
+ BUILD_BUG_ON(XATTR_CREATE != SETXATTR4_CREATE);
+ BUILD_BUG_ON(XATTR_REPLACE != SETXATTR4_REPLACE);
+
+ encode_op_hdr(xdr, OP_SETXATTR, decode_setxattr_maxsz, hdr);
+ p = reserve_space(xdr, 4);
+ *p = cpu_to_be32(arg->xattr_flags);
+ encode_string(xdr, strlen(arg->xattr_name), arg->xattr_name);
+ p = reserve_space(xdr, 4);
+ *p = cpu_to_be32(arg->xattr_len);
+ if (arg->xattr_len)
+ xdr_write_pages(xdr, arg->xattr_pages, 0, arg->xattr_len);
+}
+
+static int decode_setxattr(struct xdr_stream *xdr,
+ struct nfs4_change_info *cinfo)
+{
+ int status;
+
+ status = decode_op_hdr(xdr, OP_SETXATTR);
+ if (status)
+ goto out;
+ status = decode_change_info(xdr, cinfo);
+out:
+ return status;
+}
+
+
+static void encode_getxattr(struct xdr_stream *xdr, const char *name,
+ struct compound_hdr *hdr)
+{
+ encode_op_hdr(xdr, OP_GETXATTR, decode_getxattr_maxsz, hdr);
+ encode_string(xdr, strlen(name), name);
+}
+
+static int decode_getxattr(struct xdr_stream *xdr,
+ struct nfs42_getxattrres *res,
+ struct rpc_rqst *req)
+{
+ int status;
+ __be32 *p;
+ u32 len, rdlen;
+
+ status = decode_op_hdr(xdr, OP_GETXATTR);
+ if (status)
+ return status;
+
+ p = xdr_inline_decode(xdr, 4);
+ if (unlikely(!p))
+ return -EIO;
+
+ len = be32_to_cpup(p);
+ if (len > req->rq_rcv_buf.page_len)
+ return -ERANGE;
+
+ res->xattr_len = len;
+
+ if (len > 0) {
+ rdlen = xdr_read_pages(xdr, len);
+ if (rdlen < len)
+ return -EIO;
+ }
+
+ return 0;
+}
+
+static void encode_removexattr(struct xdr_stream *xdr, const char *name,
+ struct compound_hdr *hdr)
+{
+ encode_op_hdr(xdr, OP_REMOVEXATTR, decode_removexattr_maxsz, hdr);
+ encode_string(xdr, strlen(name), name);
+}
+
+
+static int decode_removexattr(struct xdr_stream *xdr,
+ struct nfs4_change_info *cinfo)
+{
+ int status;
+
+ status = decode_op_hdr(xdr, OP_REMOVEXATTR);
+ if (status)
+ goto out;
+
+ status = decode_change_info(xdr, cinfo);
+out:
+ return status;
+}
+
+static void encode_listxattrs(struct xdr_stream *xdr,
+ const struct nfs42_listxattrsargs *arg,
+ struct compound_hdr *hdr)
+{
+ __be32 *p;
+
+ encode_op_hdr(xdr, OP_LISTXATTRS, decode_listxattrs_maxsz + 1, hdr);
+
+ p = reserve_space(xdr, 12);
+ if (unlikely(!p))
+ return;
+
+ p = xdr_encode_hyper(p, arg->cookie);
+ /*
+ * RFC 8276 says to specify the full max length of the LISTXATTRS
+ * XDR reply. Count is set to the XDR length of the names array
+ * plus the EOF marker. So, add the cookie and the names count.
+ */
+ *p = cpu_to_be32(arg->count + 8 + 4);
+}
+
+static int decode_listxattrs(struct xdr_stream *xdr,
+ struct nfs42_listxattrsres *res)
+{
+ int status;
+ __be32 *p;
+ u32 count, len, ulen;
+ size_t left, copied;
+ char *buf;
+
+ status = decode_op_hdr(xdr, OP_LISTXATTRS);
+ if (status) {
+ /*
+ * Special case: for LISTXATTRS, NFS4ERR_TOOSMALL
+ * should be translated to ERANGE.
+ */
+ if (status == -ETOOSMALL)
+ status = -ERANGE;
+ goto out;
+ }
+
+ p = xdr_inline_decode(xdr, 8);
+ if (unlikely(!p))
+ return -EIO;
+
+ xdr_decode_hyper(p, &res->cookie);
+
+ p = xdr_inline_decode(xdr, 4);
+ if (unlikely(!p))
+ return -EIO;
+
+ left = res->xattr_len;
+ buf = res->xattr_buf;
+
+ count = be32_to_cpup(p);
+ copied = 0;
+
+ /*
+ * We have asked for enough room to encode the maximum number
+ * of possible attribute names, so everything should fit.
+ *
+ * But, don't rely on that assumption. Just decode entries
+ * until they don't fit anymore, just in case the server did
+ * something odd.
+ */
+ while (count--) {
+ p = xdr_inline_decode(xdr, 4);
+ if (unlikely(!p))
+ return -EIO;
+
+ len = be32_to_cpup(p);
+ if (len > (XATTR_NAME_MAX - XATTR_USER_PREFIX_LEN)) {
+ status = -ERANGE;
+ goto out;
+ }
+
+ p = xdr_inline_decode(xdr, len);
+ if (unlikely(!p))
+ return -EIO;
+
+ ulen = len + XATTR_USER_PREFIX_LEN + 1;
+ if (buf) {
+ if (ulen > left) {
+ status = -ERANGE;
+ goto out;
+ }
+
+ memcpy(buf, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN);
+ memcpy(buf + XATTR_USER_PREFIX_LEN, p, len);
+
+ buf[ulen - 1] = 0;
+ buf += ulen;
+ left -= ulen;
+ }
+ copied += ulen;
+ }
+
+ p = xdr_inline_decode(xdr, 4);
+ if (unlikely(!p))
+ return -EIO;
+
+ res->eof = be32_to_cpup(p);
+ res->copied = copied;
+
+out:
+ if (status == -ERANGE && res->xattr_len == XATTR_LIST_MAX)
+ status = -E2BIG;
+
+ return status;
+}
+#endif
+
/*
* Encode ALLOCATE request
*/
@@ -1062,4 +1268,166 @@ static int nfs4_xdr_dec_layouterror(struct rpc_rqst *rqstp,
return status;
}

+#ifdef CONFIG_NFS_V4_2
+static void nfs4_xdr_enc_setxattr(struct rpc_rqst *req, struct xdr_stream *xdr,
+ const void *data)
+{
+ const struct nfs42_setxattrargs *args = data;
+ struct compound_hdr hdr = {
+ .minorversion = nfs4_xdr_minorversion(&args->seq_args),
+ };
+
+ encode_compound_hdr(xdr, req, &hdr);
+ encode_sequence(xdr, &args->seq_args, &hdr);
+ encode_putfh(xdr, args->fh, &hdr);
+ encode_setxattr(xdr, args, &hdr);
+ encode_nops(&hdr);
+}
+
+static int nfs4_xdr_dec_setxattr(struct rpc_rqst *req, struct xdr_stream *xdr,
+ void *data)
+{
+ struct nfs42_setxattrres *res = data;
+ struct compound_hdr hdr;
+ int status;
+
+ status = decode_compound_hdr(xdr, &hdr);
+ if (status)
+ goto out;
+ status = decode_sequence(xdr, &res->seq_res, req);
+ if (status)
+ goto out;
+ status = decode_putfh(xdr);
+ if (status)
+ goto out;
+
+ status = decode_setxattr(xdr, &res->cinfo);
+out:
+ return status;
+}
+
+static void nfs4_xdr_enc_getxattr(struct rpc_rqst *req, struct xdr_stream *xdr,
+ const void *data)
+{
+ const struct nfs42_getxattrargs *args = data;
+ struct compound_hdr hdr = {
+ .minorversion = nfs4_xdr_minorversion(&args->seq_args),
+ };
+ size_t plen;
+
+ encode_compound_hdr(xdr, req, &hdr);
+ encode_sequence(xdr, &args->seq_args, &hdr);
+ encode_putfh(xdr, args->fh, &hdr);
+ encode_getxattr(xdr, args->xattr_name, &hdr);
+
+ plen = args->xattr_len ? args->xattr_len : XATTR_SIZE_MAX;
+
+ rpc_prepare_reply_pages(req, args->xattr_pages, 0, plen,
+ hdr.replen);
+ req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
+
+ encode_nops(&hdr);
+}
+
+static int nfs4_xdr_dec_getxattr(struct rpc_rqst *rqstp,
+ struct xdr_stream *xdr, void *data)
+{
+ struct nfs42_getxattrres *res = data;
+ struct compound_hdr hdr;
+ int status;
+
+ status = decode_compound_hdr(xdr, &hdr);
+ if (status)
+ goto out;
+ status = decode_sequence(xdr, &res->seq_res, rqstp);
+ if (status)
+ goto out;
+ status = decode_putfh(xdr);
+ if (status)
+ goto out;
+ status = decode_getxattr(xdr, res, rqstp);
+out:
+ return status;
+}
+
+static void nfs4_xdr_enc_listxattrs(struct rpc_rqst *req,
+ struct xdr_stream *xdr, const void *data)
+{
+ const struct nfs42_listxattrsargs *args = data;
+ struct compound_hdr hdr = {
+ .minorversion = nfs4_xdr_minorversion(&args->seq_args),
+ };
+
+ encode_compound_hdr(xdr, req, &hdr);
+ encode_sequence(xdr, &args->seq_args, &hdr);
+ encode_putfh(xdr, args->fh, &hdr);
+ encode_listxattrs(xdr, args, &hdr);
+
+ rpc_prepare_reply_pages(req, args->xattr_pages, 0, args->count,
+ hdr.replen);
+ req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
+
+ encode_nops(&hdr);
+}
+
+static int nfs4_xdr_dec_listxattrs(struct rpc_rqst *rqstp,
+ struct xdr_stream *xdr, void *data)
+{
+ struct nfs42_listxattrsres *res = data;
+ struct compound_hdr hdr;
+ int status;
+
+ xdr_set_scratch_buffer(xdr, page_address(res->scratch), PAGE_SIZE);
+
+ status = decode_compound_hdr(xdr, &hdr);
+ if (status)
+ goto out;
+ status = decode_sequence(xdr, &res->seq_res, rqstp);
+ if (status)
+ goto out;
+ status = decode_putfh(xdr);
+ if (status)
+ goto out;
+ status = decode_listxattrs(xdr, res);
+out:
+ return status;
+}
+
+static void nfs4_xdr_enc_removexattr(struct rpc_rqst *req,
+ struct xdr_stream *xdr, const void *data)
+{
+ const struct nfs42_removexattrargs *args = data;
+ struct compound_hdr hdr = {
+ .minorversion = nfs4_xdr_minorversion(&args->seq_args),
+ };
+
+ encode_compound_hdr(xdr, req, &hdr);
+ encode_sequence(xdr, &args->seq_args, &hdr);
+ encode_putfh(xdr, args->fh, &hdr);
+ encode_removexattr(xdr, args->xattr_name, &hdr);
+ encode_nops(&hdr);
+}
+
+static int nfs4_xdr_dec_removexattr(struct rpc_rqst *req,
+ struct xdr_stream *xdr, void *data)
+{
+ struct nfs42_removexattrres *res = data;
+ struct compound_hdr hdr;
+ int status;
+
+ status = decode_compound_hdr(xdr, &hdr);
+ if (status)
+ goto out;
+ status = decode_sequence(xdr, &res->seq_res, req);
+ if (status)
+ goto out;
+ status = decode_putfh(xdr);
+ if (status)
+ goto out;
+
+ status = decode_removexattr(xdr, &res->cinfo);
+out:
+ return status;
+}
+#endif
#endif /* __LINUX_FS_NFS_NFS4_2XDR_H */
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index bebc087a1433..9e1ced791789 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -7479,6 +7479,8 @@ static struct {
{ NFS4ERR_SYMLINK, -ELOOP },
{ NFS4ERR_OP_ILLEGAL, -EOPNOTSUPP },
{ NFS4ERR_DEADLOCK, -EDEADLK },
+ { NFS4ERR_NOXATTR, -ENODATA },
+ { NFS4ERR_XATTR2BIG, -E2BIG },
{ -1, -EIO }
};

@@ -7607,6 +7609,10 @@ const struct rpc_procinfo nfs4_procedures[] = {
PROC42(COPY_NOTIFY, enc_copy_notify, dec_copy_notify),
PROC(LOOKUPP, enc_lookupp, dec_lookupp),
PROC42(LAYOUTERROR, enc_layouterror, dec_layouterror),
+ PROC42(GETXATTR, enc_getxattr, dec_getxattr),
+ PROC42(SETXATTR, enc_setxattr, dec_setxattr),
+ PROC42(LISTXATTRS, enc_listxattrs, dec_listxattrs),
+ PROC42(REMOVEXATTR, enc_removexattr, dec_removexattr),
};

static unsigned int nfs_version4_counts[ARRAY_SIZE(nfs4_procedures)];
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 5076fe42c693..685deed805e8 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1497,7 +1497,64 @@ struct nfs42_seek_res {
u32 sr_eof;
u64 sr_offset;
};
-#endif
+
+struct nfs42_setxattrargs {
+ struct nfs4_sequence_args seq_args;
+ struct nfs_fh *fh;
+ const char *xattr_name;
+ u32 xattr_flags;
+ size_t xattr_len;
+ struct page **xattr_pages;
+};
+
+struct nfs42_setxattrres {
+ struct nfs4_sequence_res seq_res;
+ struct nfs4_change_info cinfo;
+};
+
+struct nfs42_getxattrargs {
+ struct nfs4_sequence_args seq_args;
+ struct nfs_fh *fh;
+ const char *xattr_name;
+ size_t xattr_len;
+ struct page **xattr_pages;
+};
+
+struct nfs42_getxattrres {
+ struct nfs4_sequence_res seq_res;
+ size_t xattr_len;
+};
+
+struct nfs42_listxattrsargs {
+ struct nfs4_sequence_args seq_args;
+ struct nfs_fh *fh;
+ u32 count;
+ u64 cookie;
+ struct page **xattr_pages;
+};
+
+struct nfs42_listxattrsres {
+ struct nfs4_sequence_res seq_res;
+ struct page *scratch;
+ void *xattr_buf;
+ size_t xattr_len;
+ u64 cookie;
+ bool eof;
+ size_t copied;
+};
+
+struct nfs42_removexattrargs {
+ struct nfs4_sequence_args seq_args;
+ struct nfs_fh *fh;
+ const char *xattr_name;
+};
+
+struct nfs42_removexattrres {
+ struct nfs4_sequence_res seq_res;
+ struct nfs4_change_info cinfo;
+};
+
+#endif /* CONFIG_NFS_V4_2 */

struct nfs_page;

--
2.16.6

2020-03-11 19:57:31

by Frank van der Linden

[permalink] [raw]
Subject: [PATCH 01/13] nfs,nfsd: NFSv4.2 extended attribute protocol definitions

Add definitions for the new operations, errors and flags as defined
in RFC 8276 (File System Extended Attributes in NFSv4).

Signed-off-by: Frank van der Linden <[email protected]>
---
include/linux/nfs4.h | 20 ++++++++++++++++++++
include/uapi/linux/nfs4.h | 3 +++
2 files changed, 23 insertions(+)

diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 82d8fb422092..350aeda0c48c 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -150,6 +150,12 @@ enum nfs_opnum4 {
OP_WRITE_SAME = 70,
OP_CLONE = 71,

+ /* xattr support (RFC8726) */
+ OP_GETXATTR = 72,
+ OP_SETXATTR = 73,
+ OP_LISTXATTRS = 74,
+ OP_REMOVEXATTR = 75,
+
OP_ILLEGAL = 10044,
};

@@ -280,6 +286,10 @@ enum nfsstat4 {
NFS4ERR_WRONG_LFS = 10092,
NFS4ERR_BADLABEL = 10093,
NFS4ERR_OFFLOAD_NO_REQS = 10094,
+
+ /* xattr (RFC8276) */
+ NFS4ERR_NOXATTR = 10095,
+ NFS4ERR_XATTR2BIG = 10096,
};

static inline bool seqid_mutating_err(u32 err)
@@ -452,6 +462,7 @@ enum change_attr_type4 {
#define FATTR4_WORD2_CHANGE_ATTR_TYPE (1UL << 15)
#define FATTR4_WORD2_SECURITY_LABEL (1UL << 16)
#define FATTR4_WORD2_MODE_UMASK (1UL << 17)
+#define FATTR4_WORD2_XATTR_SUPPORT (1UL << 18)

/* MDS threshold bitmap bits */
#define THRESHOLD_RD (1UL << 0)
@@ -700,4 +711,13 @@ struct nl4_server {
struct nfs42_netaddr nl4_addr; /* NL4_NETADDR */
} u;
};
+
+/*
+ * Options for setxattr. These match the flags for setxattr(2).
+ */
+enum nfs4_setxattr_options {
+ SETXATTR4_EITHER = 0,
+ SETXATTR4_CREATE = 1,
+ SETXATTR4_REPLACE = 2,
+};
#endif
diff --git a/include/uapi/linux/nfs4.h b/include/uapi/linux/nfs4.h
index 8572930cf5b0..bf197e99b98f 100644
--- a/include/uapi/linux/nfs4.h
+++ b/include/uapi/linux/nfs4.h
@@ -33,6 +33,9 @@
#define NFS4_ACCESS_EXTEND 0x0008
#define NFS4_ACCESS_DELETE 0x0010
#define NFS4_ACCESS_EXECUTE 0x0020
+#define NFS4_ACCESS_XAREAD 0x0040
+#define NFS4_ACCESS_XAWRITE 0x0080
+#define NFS4_ACCESS_XALIST 0x0100

#define NFS4_FH_PERSISTENT 0x0000
#define NFS4_FH_NOEXPIRE_WITH_OPEN 0x0001
--
2.16.6

2020-03-11 19:57:39

by Frank van der Linden

[permalink] [raw]
Subject: [PATCH 13/13] NFSv4.2: add client side xattr caching.

Implement client side caching for NFSv4.2 extended attributes. The cache
is a per-inode hashtable, with name/value entries. There is one special
entry for the listxattr cache.

NFS inodes have a pointer to a cache structure. The cache structure is
allocated on demand, freed when the cache is invalidated.

Memory shrinkers keep the size in check. Large entries (> PAGE_SIZE)
are collected by a separate shrinker, and freed more aggressively
than others.

Signed-off-by: Frank van der Linden <[email protected]>
---
fs/nfs/Makefile | 1 +
fs/nfs/inode.c | 9 +-
fs/nfs/internal.h | 20 +
fs/nfs/nfs42proc.c | 12 +
fs/nfs/nfs42xattr.c | 1083 +++++++++++++++++++++++++++++++++++++++++++
fs/nfs/nfs4proc.c | 42 +-
fs/nfs/nfs4super.c | 10 +
include/linux/nfs_fs.h | 6 +
include/uapi/linux/nfs_fs.h | 1 +
9 files changed, 1177 insertions(+), 7 deletions(-)
create mode 100644 fs/nfs/nfs42xattr.c

diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 2433c3e03cfa..191b3e9aa232 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -31,6 +31,7 @@ nfsv4-$(CONFIG_NFS_USE_LEGACY_DNS) += cache_lib.o
nfsv4-$(CONFIG_SYSCTL) += nfs4sysctl.o
nfsv4-$(CONFIG_NFS_V4_1) += pnfs.o pnfs_dev.o pnfs_nfs.o
nfsv4-$(CONFIG_NFS_V4_2) += nfs42proc.o
+nfsv4-$(CONFIG_NFS_V4_2) += nfs42xattr.o

obj-$(CONFIG_PNFS_FILE_LAYOUT) += filelayout/
obj-$(CONFIG_PNFS_BLOCK) += blocklayout/
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index d2be152796ef..9d4952d2306b 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -194,6 +194,7 @@ bool nfs_check_cache_invalid(struct inode *inode, unsigned long flags)

return nfs_check_cache_invalid_not_delegated(inode, flags);
}
+EXPORT_SYMBOL_GPL(nfs_check_cache_invalid);

static void nfs_set_cache_invalid(struct inode *inode, unsigned long flags)
{
@@ -235,11 +236,13 @@ static void nfs_zap_caches_locked(struct inode *inode)
| NFS_INO_INVALID_DATA
| NFS_INO_INVALID_ACCESS
| NFS_INO_INVALID_ACL
+ | NFS_INO_INVALID_XATTR
| NFS_INO_REVAL_PAGECACHE);
} else
nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR
| NFS_INO_INVALID_ACCESS
| NFS_INO_INVALID_ACL
+ | NFS_INO_INVALID_XATTR
| NFS_INO_REVAL_PAGECACHE);
nfs_zap_label_cache_locked(nfsi);
}
@@ -1885,7 +1888,8 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
if (!(have_writers || have_delegation)) {
invalid |= NFS_INO_INVALID_DATA
| NFS_INO_INVALID_ACCESS
- | NFS_INO_INVALID_ACL;
+ | NFS_INO_INVALID_ACL
+ | NFS_INO_INVALID_XATTR;
/* Force revalidate of all attributes */
save_cache_validity |= NFS_INO_INVALID_CTIME
| NFS_INO_INVALID_MTIME
@@ -2084,6 +2088,9 @@ struct inode *nfs_alloc_inode(struct super_block *sb)
#if IS_ENABLED(CONFIG_NFS_V4)
nfsi->nfs4_acl = NULL;
#endif /* CONFIG_NFS_V4 */
+#ifdef CONFIG_NFS_V4_2
+ nfsi->xattr_cache = NULL;
+#endif
return &nfsi->vfs_inode;
}
EXPORT_SYMBOL_GPL(nfs_alloc_inode);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 1e3a7e119c93..67b8e4f7c554 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -575,6 +575,26 @@ extern void nfs4_test_session_trunk(struct rpc_clnt *clnt,
struct rpc_xprt *xprt,
void *data);

+#ifdef CONFIG_NFS_V4_2
+extern int __init nfs4_xattr_cache_init(void);
+extern void nfs4_xattr_cache_exit(void);
+extern void nfs4_xattr_cache_add(struct inode *inode, const char *name,
+ const char *buf, struct page **pages,
+ ssize_t buflen);
+extern void nfs4_xattr_cache_remove(struct inode *inode, const char *name);
+extern ssize_t nfs4_xattr_cache_get(struct inode *inode, const char *name,
+ char *buf, ssize_t buflen);
+extern void nfs4_xattr_cache_set_list(struct inode *inode, const char *buf,
+ ssize_t buflen);
+extern ssize_t nfs4_xattr_cache_list(struct inode *inode, char *buf,
+ ssize_t buflen);
+extern void nfs4_xattr_cache_zap(struct inode *inode);
+#else
+static inline void nfs4_xattr_cache_zap(struct inode *inode)
+{
+}
+#endif
+
static inline struct inode *nfs_igrab_and_active(struct inode *inode)
{
inode = igrab(inode);
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 8c2e52bc986a..e200522469af 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -1182,6 +1182,18 @@ static ssize_t _nfs42_proc_getxattr(struct inode *inode, const char *name,
if (ret < 0)
return ret;

+ /*
+ * Normally, the caching is done one layer up, but for successful
+ * RPCS, always cache the result here, even if the caller was
+ * just querying the length, or if the reply was too big for
+ * the caller. This avoids a second RPC in the case of the
+ * common query-alloc-retrieve cycle for xattrs.
+ *
+ * Note that xattr_len is always capped to XATTR_SIZE_MAX.
+ */
+
+ nfs4_xattr_cache_add(inode, name, NULL, pages, res.xattr_len);
+
if (buflen) {
if (res.xattr_len > buflen)
return -ERANGE;
diff --git a/fs/nfs/nfs42xattr.c b/fs/nfs/nfs42xattr.c
new file mode 100644
index 000000000000..23fdab977a2a
--- /dev/null
+++ b/fs/nfs/nfs42xattr.c
@@ -0,0 +1,1083 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2019, 2020 Amazon.com, Inc. or its affiliates. All rights reserved.
+ *
+ * User extended attribute client side cache functions.
+ *
+ * Author: Frank van der Linden <[email protected]>
+ */
+#include <linux/errno.h>
+#include <linux/nfs_fs.h>
+#include <linux/hashtable.h>
+#include <linux/refcount.h>
+#include <uapi/linux/xattr.h>
+
+#include "nfs4_fs.h"
+#include "internal.h"
+
+/*
+ * User extended attributes client side caching is implemented by having
+ * a cache structure attached to NFS inodes. This structure is allocated
+ * when needed, and freed when the cache is zapped.
+ *
+ * The cache structure contains as hash table of entries, and a pointer
+ * to a special-cased entry for the listxattr cache.
+ *
+ * Accessing and allocating / freeing the caches is done via reference
+ * counting. The cache entries use a similar refcounting scheme.
+ *
+ * This makes freeing a cache, both from the shrinker and from the
+ * zap cache path, easy. It also means that, in current use cases,
+ * the large majority of inodes will not waste any memory, as they
+ * will never have any user extended attributes assigned to them.
+ *
+ * Attribute entries are hashed in to a simple hash table. They are
+ * also part of an LRU.
+ *
+ * There are three shrinkers.
+ *
+ * Two shrinkers deal with the cache entries themselves: one for
+ * large entries (> PAGE_SIZE), and one for smaller entries. The
+ * shrinker for the larger entries works more aggressively than
+ * those for the smaller entries.
+ *
+ * The other shrinker frees the cache structures themselves.
+ */
+
+/*
+ * 64 buckets is a good default. There is likely no reasonable
+ * workload that uses more than even 64 user extended attributes.
+ * You can certainly add a lot more - but you get what you ask for
+ * in those circumstances.
+ */
+#define NFS4_XATTR_HASH_SIZE 64
+
+#define NFSDBG_FACILITY NFSDBG_XATTRCACHE
+
+struct nfs4_xattr_cache;
+struct nfs4_xattr_entry;
+
+struct nfs4_xattr_bucket {
+ spinlock_t lock;
+ struct hlist_head hlist;
+ struct nfs4_xattr_cache *cache;
+ bool draining;
+};
+
+struct nfs4_xattr_cache {
+ struct kref ref;
+ spinlock_t hash_lock; /* protects hashtable and lru */
+ struct nfs4_xattr_bucket buckets[NFS4_XATTR_HASH_SIZE];
+ struct list_head lru;
+ struct list_head dispose;
+ atomic_long_t nent;
+ spinlock_t listxattr_lock;
+ struct inode *inode;
+ struct nfs4_xattr_entry *listxattr;
+ struct work_struct work;
+};
+
+struct nfs4_xattr_entry {
+ struct kref ref;
+ struct hlist_node hnode;
+ struct list_head lru;
+ struct list_head dispose;
+ char *xattr_name;
+ void *xattr_value;
+ size_t xattr_size;
+ struct nfs4_xattr_bucket *bucket;
+ uint32_t flags;
+};
+
+#define NFS4_XATTR_ENTRY_EXTVAL 0x0001
+
+/*
+ * LRU list of NFS inodes that have xattr caches.
+ */
+static struct list_lru nfs4_xattr_cache_lru;
+static struct list_lru nfs4_xattr_entry_lru;
+static struct list_lru nfs4_xattr_large_entry_lru;
+
+static struct kmem_cache *nfs4_xattr_cache_cachep;
+
+static struct workqueue_struct *nfs4_xattr_cache_wq;
+
+/*
+ * Hashing helper functions.
+ */
+static void
+nfs4_xattr_hash_init(struct nfs4_xattr_cache *cache)
+{
+ unsigned int i;
+
+ for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
+ INIT_HLIST_HEAD(&cache->buckets[i].hlist);
+ spin_lock_init(&cache->buckets[i].lock);
+ cache->buckets[i].cache = cache;
+ cache->buckets[i].draining = false;
+ }
+}
+
+/*
+ * Locking order:
+ * 1. inode i_lock or bucket lock
+ * 2. list_lru lock (taken by list_lru_* functions)
+ */
+
+/*
+ * Wrapper functions to add a cache entry to the right LRU.
+ */
+static bool
+nfs4_xattr_entry_lru_add(struct nfs4_xattr_entry *entry)
+{
+ struct list_lru *lru;
+
+ lru = (entry->flags & NFS4_XATTR_ENTRY_EXTVAL) ?
+ &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
+
+ return list_lru_add(lru, &entry->lru);
+}
+
+static bool
+nfs4_xattr_entry_lru_del(struct nfs4_xattr_entry *entry)
+{
+ struct list_lru *lru;
+
+ lru = (entry->flags & NFS4_XATTR_ENTRY_EXTVAL) ?
+ &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
+
+ return list_lru_del(lru, &entry->lru);
+}
+
+/*
+ * This function allocates cache entries. They are the normal
+ * extended attribute name/value pairs, but may also be a listxattr
+ * cache. Those allocations use the same entry so that they can be
+ * treated as one by the memory shrinker.
+ *
+ * xattr cache entries are allocated together with names. If the
+ * value fits in to one page with the entry structure and the name,
+ * it will also be part of the same allocation (kmalloc). This is
+ * expected to be the vast majority of cases. Larger allocations
+ * have a value pointer that is allocated separately by kvmalloc.
+ *
+ * Parameters:
+ *
+ * @name: Name of the extended attribute. NULL for listxattr cache
+ * entry.
+ * @value: Value of attribute, or listxattr cache. NULL if the
+ * value is to be copied from pages instead.
+ * @pages: Pages to copy the value from, if not NULL. Passed in to
+ * make it easier to copy the value after an RPC, even if
+ * the value will not be passed up to application (e.g.
+ * for a 'query' getxattr with NULL buffer).
+ * @len: Length of the value. Can be 0 for zero-length attribues.
+ * @value and @pages will be NULL if @len is 0.
+ */
+static struct nfs4_xattr_entry *
+nfs4_xattr_alloc_entry(const char *name, const void *value,
+ struct page **pages, size_t len)
+{
+ struct nfs4_xattr_entry *entry;
+ void *valp;
+ char *namep;
+ size_t alloclen, slen;
+ char *buf;
+ uint32_t flags;
+
+ BUILD_BUG_ON(sizeof(struct nfs4_xattr_entry) +
+ XATTR_NAME_MAX + 1 > PAGE_SIZE);
+
+ alloclen = sizeof(struct nfs4_xattr_entry);
+ if (name != NULL) {
+ slen = strlen(name) + 1;
+ alloclen += slen;
+ } else
+ slen = 0;
+
+ if (alloclen + len <= PAGE_SIZE) {
+ alloclen += len;
+ flags = 0;
+ } else {
+ flags = NFS4_XATTR_ENTRY_EXTVAL;
+ }
+
+ buf = kmalloc(alloclen, GFP_KERNEL_ACCOUNT | GFP_NOFS);
+ if (buf == NULL)
+ return NULL;
+ entry = (struct nfs4_xattr_entry *)buf;
+
+ if (name != NULL) {
+ namep = buf + sizeof(struct nfs4_xattr_entry);
+ memcpy(namep, name, slen);
+ } else {
+ namep = NULL;
+ }
+
+
+ if (flags & NFS4_XATTR_ENTRY_EXTVAL) {
+ valp = kvmalloc(len, GFP_KERNEL_ACCOUNT | GFP_NOFS);
+ if (valp == NULL) {
+ kfree(buf);
+ return NULL;
+ }
+ } else if (len != 0) {
+ valp = buf + sizeof(struct nfs4_xattr_entry) + slen;
+ } else
+ valp = NULL;
+
+ if (valp != NULL) {
+ if (value != NULL)
+ memcpy(valp, value, len);
+ else
+ _copy_from_pages(valp, pages, 0, len);
+ }
+
+ entry->flags = flags;
+ entry->xattr_value = valp;
+ kref_init(&entry->ref);
+ entry->xattr_name = namep;
+ entry->xattr_size = len;
+ entry->bucket = NULL;
+ INIT_LIST_HEAD(&entry->lru);
+ INIT_LIST_HEAD(&entry->dispose);
+ INIT_HLIST_NODE(&entry->hnode);
+
+ return entry;
+}
+
+static void
+nfs4_xattr_free_entry(struct nfs4_xattr_entry *entry)
+{
+ if (entry->flags & NFS4_XATTR_ENTRY_EXTVAL)
+ kvfree(entry->xattr_value);
+ kfree(entry);
+}
+
+static void
+nfs4_xattr_free_entry_cb(struct kref *kref)
+{
+ struct nfs4_xattr_entry *entry;
+
+ entry = container_of(kref, struct nfs4_xattr_entry, ref);
+
+ if (WARN_ON(!list_empty(&entry->lru)))
+ return;
+
+ nfs4_xattr_free_entry(entry);
+}
+
+static void
+nfs4_xattr_free_cache_cb(struct kref *kref)
+{
+ struct nfs4_xattr_cache *cache;
+ int i;
+
+ cache = container_of(kref, struct nfs4_xattr_cache, ref);
+
+ for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
+ if (WARN_ON(!hlist_empty(&cache->buckets[i].hlist)))
+ return;
+ cache->buckets[i].draining = false;
+ }
+
+ cache->listxattr = NULL;
+
+ kmem_cache_free(nfs4_xattr_cache_cachep, cache);
+
+}
+
+static struct nfs4_xattr_cache *
+nfs4_xattr_alloc_cache(void)
+{
+ struct nfs4_xattr_cache *cache;
+
+ cache = kmem_cache_alloc(nfs4_xattr_cache_cachep,
+ GFP_KERNEL_ACCOUNT | GFP_NOFS);
+ if (cache == NULL)
+ return NULL;
+
+ kref_init(&cache->ref);
+ atomic_long_set(&cache->nent, 0);
+
+ return cache;
+}
+
+/*
+ * Set the listxattr cache, which is a special-cased cache entry.
+ * The special value ERR_PTR(-ESTALE) is used to indicate that
+ * the cache is being drained - this prevents a new listxattr
+ * cache from being added to what is now a stale cache.
+ */
+static int
+nfs4_xattr_set_listcache(struct nfs4_xattr_cache *cache,
+ struct nfs4_xattr_entry *new)
+{
+ struct nfs4_xattr_entry *old;
+ int ret = 1;
+
+ spin_lock(&cache->listxattr_lock);
+
+ old = cache->listxattr;
+
+ if (old == ERR_PTR(-ESTALE)) {
+ ret = 0;
+ goto out;
+ }
+
+ cache->listxattr = new;
+ if (new != NULL && new != ERR_PTR(-ESTALE))
+ nfs4_xattr_entry_lru_add(new);
+
+ if (old != NULL) {
+ nfs4_xattr_entry_lru_del(old);
+ kref_put(&old->ref, nfs4_xattr_free_entry_cb);
+ }
+out:
+ spin_unlock(&cache->listxattr_lock);
+
+ return ret;
+}
+
+/*
+ * Unlink a cache from its parent inode, clearing out an invalid
+ * cache. Must be called with i_lock held.
+ */
+static struct nfs4_xattr_cache *
+nfs4_xattr_cache_unlink(struct inode *inode)
+{
+ struct nfs_inode *nfsi;
+ struct nfs4_xattr_cache *oldcache;
+
+ nfsi = NFS_I(inode);
+
+ oldcache = nfsi->xattr_cache;
+ if (oldcache != NULL) {
+ list_lru_del(&nfs4_xattr_cache_lru, &oldcache->lru);
+ oldcache->inode = NULL;
+ }
+ nfsi->xattr_cache = NULL;
+ nfsi->cache_validity &= ~NFS_INO_INVALID_XATTR;
+
+ return oldcache;
+
+}
+
+/*
+ * Discard a cache. Usually called by a worker, since walking all
+ * the entries can take up some cycles that we don't want to waste
+ * in the I/O path. Can also be called from the shrinker callback.
+ *
+ * The cache is dead, it has already been unlinked from its inode,
+ * and no longer appears on the cache LRU list.
+ *
+ * Mark all buckets as draining, so that no new entries are added. This
+ * could still happen in the unlikely, but possible case that another
+ * thread had grabbed a reference before it was unlinked from the inode,
+ * and is still holding it for an add operation.
+ *
+ * Remove all entries from the LRU lists, so that there is no longer
+ * any way to 'find' this cache. Then, remove the entries from the hash
+ * table.
+ *
+ * At that point, the cache will remain empty and can be freed when the final
+ * reference drops, which is very likely the kref_put at the end of
+ * this function, or the one called immediately afterwards in the
+ * shrinker callback.
+ */
+static void
+nfs4_xattr_discard_cache(struct nfs4_xattr_cache *cache)
+{
+ unsigned int i;
+ struct nfs4_xattr_entry *entry;
+ struct nfs4_xattr_bucket *bucket;
+ struct hlist_node *n;
+
+ nfs4_xattr_set_listcache(cache, ERR_PTR(-ESTALE));
+
+ for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
+ bucket = &cache->buckets[i];
+
+ spin_lock(&bucket->lock);
+ bucket->draining = true;
+ hlist_for_each_entry_safe(entry, n, &bucket->hlist, hnode) {
+ nfs4_xattr_entry_lru_del(entry);
+ hlist_del_init(&entry->hnode);
+ kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+ }
+ spin_unlock(&bucket->lock);
+ }
+
+ atomic_long_set(&cache->nent, 0);
+
+ kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+}
+
+static void
+nfs4_xattr_discard_cache_worker(struct work_struct *work)
+{
+ struct nfs4_xattr_cache *cache = container_of(work,
+ struct nfs4_xattr_cache, work);
+
+ nfs4_xattr_discard_cache(cache);
+}
+
+static void
+nfs4_xattr_reap_cache(struct nfs4_xattr_cache *cache)
+{
+ queue_work(nfs4_xattr_cache_wq, &cache->work);
+}
+
+/*
+ * Get a referenced copy of the cache structure. Avoid doing allocs
+ * while holding i_lock. Which means that we do some optimistic allocation,
+ * and might have to free the result in rare cases.
+ *
+ * This function only checks the NFS_INO_INVALID_XATTR cache validity bit
+ * and acts accordingly, replacing the cache when needed. For the read case
+ * (!add), this means that the caller must make sure that the cache
+ * is valid before caling this function. getxattr and listxattr call
+ * revalidate_inode to do this. The attribute cache timeout (for the
+ * non-delegated case) is expected to be dealt with in the revalidate
+ * call.
+ */
+
+static struct nfs4_xattr_cache *
+nfs4_xattr_get_cache(struct inode *inode, int add)
+{
+ struct nfs_inode *nfsi;
+ struct nfs4_xattr_cache *cache, *oldcache, *newcache;
+
+ nfsi = NFS_I(inode);
+
+ cache = oldcache = NULL;
+
+ spin_lock(&inode->i_lock);
+
+ if (nfsi->cache_validity & NFS_INO_INVALID_XATTR)
+ oldcache = nfs4_xattr_cache_unlink(inode);
+ else
+ cache = nfsi->xattr_cache;
+
+ if (cache != NULL)
+ kref_get(&cache->ref);
+
+ spin_unlock(&inode->i_lock);
+
+ if (add && cache == NULL) {
+ newcache = NULL;
+
+ cache = nfs4_xattr_alloc_cache();
+ if (cache == NULL)
+ goto out;
+
+ spin_lock(&inode->i_lock);
+ if (nfsi->cache_validity & NFS_INO_INVALID_XATTR) {
+ /*
+ * The cache was invalidated again. Give up,
+ * since what we want to enter is now likely
+ * outdated anyway.
+ */
+ spin_unlock(&inode->i_lock);
+ kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+ cache = NULL;
+ goto out;
+ }
+
+ /*
+ * Check if someone beat us to it.
+ */
+ if (nfsi->xattr_cache != NULL) {
+ newcache = nfsi->xattr_cache;
+ kref_get(&newcache->ref);
+ } else {
+ kref_get(&cache->ref);
+ nfsi->xattr_cache = cache;
+ cache->inode = inode;
+ list_lru_add(&nfs4_xattr_cache_lru, &cache->lru);
+ }
+
+ spin_unlock(&inode->i_lock);
+
+ /*
+ * If there was a race, throw away the cache we just
+ * allocated, and use the new one allocated by someone
+ * else.
+ */
+ if (newcache != NULL) {
+ kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+ cache = newcache;
+ }
+ }
+
+out:
+ /*
+ * Discarding an old cache is done via a workqueue.
+ */
+ if (oldcache != NULL)
+ nfs4_xattr_reap_cache(oldcache);
+
+ return cache;
+}
+
+static inline struct nfs4_xattr_bucket *
+nfs4_xattr_hash_bucket(struct nfs4_xattr_cache *cache, const char *name)
+{
+ return &cache->buckets[jhash(name, strlen(name), 0) &
+ (ARRAY_SIZE(cache->buckets) - 1)];
+}
+
+static struct nfs4_xattr_entry *
+nfs4_xattr_get_entry(struct nfs4_xattr_bucket *bucket, const char *name)
+{
+ struct nfs4_xattr_entry *entry;
+
+ entry = NULL;
+
+ hlist_for_each_entry(entry, &bucket->hlist, hnode) {
+ if (!strcmp(entry->xattr_name, name))
+ break;
+ }
+
+ return entry;
+}
+
+static int
+nfs4_xattr_hash_add(struct nfs4_xattr_cache *cache,
+ struct nfs4_xattr_entry *entry)
+{
+ struct nfs4_xattr_bucket *bucket;
+ struct nfs4_xattr_entry *oldentry = NULL;
+ int ret = 1;
+
+ bucket = nfs4_xattr_hash_bucket(cache, entry->xattr_name);
+ entry->bucket = bucket;
+
+ spin_lock(&bucket->lock);
+
+ if (bucket->draining) {
+ ret = 0;
+ goto out;
+ }
+
+ oldentry = nfs4_xattr_get_entry(bucket, entry->xattr_name);
+ if (oldentry != NULL) {
+ hlist_del_init(&oldentry->hnode);
+ nfs4_xattr_entry_lru_del(oldentry);
+ } else {
+ atomic_long_inc(&cache->nent);
+ }
+
+ hlist_add_head(&entry->hnode, &bucket->hlist);
+ nfs4_xattr_entry_lru_add(entry);
+
+out:
+ spin_unlock(&bucket->lock);
+
+ if (oldentry != NULL)
+ kref_put(&oldentry->ref, nfs4_xattr_free_entry_cb);
+
+ return ret;
+}
+
+static void
+nfs4_xattr_hash_remove(struct nfs4_xattr_cache *cache, const char *name)
+{
+ struct nfs4_xattr_bucket *bucket;
+ struct nfs4_xattr_entry *entry;
+
+ bucket = nfs4_xattr_hash_bucket(cache, name);
+
+ spin_lock(&bucket->lock);
+
+ entry = nfs4_xattr_get_entry(bucket, name);
+ if (entry != NULL) {
+ hlist_del_init(&entry->hnode);
+ nfs4_xattr_entry_lru_del(entry);
+ atomic_long_dec(&cache->nent);
+ }
+
+ spin_unlock(&bucket->lock);
+
+ if (entry != NULL)
+ kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+}
+
+static struct nfs4_xattr_entry *
+nfs4_xattr_hash_find(struct nfs4_xattr_cache *cache, const char *name)
+{
+ struct nfs4_xattr_bucket *bucket;
+ struct nfs4_xattr_entry *entry;
+
+ bucket = nfs4_xattr_hash_bucket(cache, name);
+
+ spin_lock(&bucket->lock);
+
+ entry = nfs4_xattr_get_entry(bucket, name);
+ if (entry != NULL)
+ kref_get(&entry->ref);
+
+ spin_unlock(&bucket->lock);
+
+ return entry;
+}
+
+/*
+ * Entry point to retrieve an entry from the cache.
+ */
+ssize_t nfs4_xattr_cache_get(struct inode *inode, const char *name, char *buf,
+ ssize_t buflen)
+{
+ struct nfs4_xattr_cache *cache;
+ struct nfs4_xattr_entry *entry;
+ ssize_t ret;
+
+ cache = nfs4_xattr_get_cache(inode, 0);
+ if (cache == NULL)
+ return -ENOENT;
+
+ ret = 0;
+ entry = nfs4_xattr_hash_find(cache, name);
+
+ if (entry != NULL) {
+ dprintk("%s: cache hit '%s', len %lu\n", __func__,
+ entry->xattr_name, (unsigned long)entry->xattr_size);
+ if (buflen == 0) {
+ /* Length probe only */
+ ret = entry->xattr_size;
+ } else if (buflen < entry->xattr_size)
+ ret = -ERANGE;
+ else {
+ memcpy(buf, entry->xattr_value, entry->xattr_size);
+ ret = entry->xattr_size;
+ }
+ kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+ } else {
+ dprintk("%s: cache miss '%s'\n", __func__, name);
+ ret = -ENOENT;
+ }
+
+ kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+
+ return ret;
+}
+
+/*
+ * Retrieve a cached list of xattrs from the cache.
+ */
+ssize_t nfs4_xattr_cache_list(struct inode *inode, char *buf, ssize_t buflen)
+{
+ struct nfs4_xattr_cache *cache;
+ struct nfs4_xattr_entry *entry;
+ ssize_t ret;
+
+ cache = nfs4_xattr_get_cache(inode, 0);
+ if (cache == NULL)
+ return -ENOENT;
+
+ spin_lock(&cache->listxattr_lock);
+
+ entry = cache->listxattr;
+
+ if (entry != NULL && entry != ERR_PTR(-ESTALE)) {
+ if (buflen == 0) {
+ /* Length probe only */
+ ret = entry->xattr_size;
+ } else if (entry->xattr_size > buflen)
+ ret = -ERANGE;
+ else {
+ memcpy(buf, entry->xattr_value, entry->xattr_size);
+ ret = entry->xattr_size;
+ }
+ } else {
+ ret = -ENOENT;
+ }
+
+ spin_unlock(&cache->listxattr_lock);
+
+ kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+
+ return ret;
+}
+
+/*
+ * Add an xattr to the cache.
+ *
+ * This also invalidates the xattr list cache.
+ */
+void nfs4_xattr_cache_add(struct inode *inode, const char *name,
+ const char *buf, struct page **pages, ssize_t buflen)
+{
+ struct nfs4_xattr_cache *cache;
+ struct nfs4_xattr_entry *entry;
+
+ dprintk("%s: add '%s' len %lu\n", __func__,
+ name, (unsigned long)buflen);
+
+ cache = nfs4_xattr_get_cache(inode, 1);
+ if (cache == NULL)
+ return;
+
+ entry = nfs4_xattr_alloc_entry(name, buf, pages, buflen);
+ if (entry == NULL)
+ goto out;
+
+ (void)nfs4_xattr_set_listcache(cache, NULL);
+
+ if (!nfs4_xattr_hash_add(cache, entry))
+ kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+
+out:
+ kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+}
+
+
+/*
+ * Remove an xattr from the cache.
+ *
+ * This also invalidates the xattr list cache.
+ */
+void nfs4_xattr_cache_remove(struct inode *inode, const char *name)
+{
+ struct nfs4_xattr_cache *cache;
+
+ dprintk("%s: remove '%s'\n", __func__, name);
+
+ cache = nfs4_xattr_get_cache(inode, 0);
+ if (cache == NULL)
+ return;
+
+ (void)nfs4_xattr_set_listcache(cache, NULL);
+ nfs4_xattr_hash_remove(cache, name);
+
+ kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+}
+
+/*
+ * Cache listxattr output, replacing any possible old one.
+ */
+void nfs4_xattr_cache_set_list(struct inode *inode, const char *buf,
+ ssize_t buflen)
+{
+ struct nfs4_xattr_cache *cache;
+ struct nfs4_xattr_entry *entry;
+
+ cache = nfs4_xattr_get_cache(inode, 1);
+ if (cache == NULL)
+ return;
+
+ entry = nfs4_xattr_alloc_entry(NULL, buf, NULL, buflen);
+ if (entry == NULL)
+ goto out;
+
+ /*
+ * This is just there to be able to get to bucket->cache,
+ * which is obviously the same for all buckets, so just
+ * use bucket 0.
+ */
+ entry->bucket = &cache->buckets[0];
+
+ if (!nfs4_xattr_set_listcache(cache, entry))
+ kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+
+out:
+ kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+}
+
+/*
+ * Zap the entire cache. Called when an inode is evicted.
+ */
+void nfs4_xattr_cache_zap(struct inode *inode)
+{
+ struct nfs4_xattr_cache *oldcache;
+
+ spin_lock(&inode->i_lock);
+ oldcache = nfs4_xattr_cache_unlink(inode);
+ spin_unlock(&inode->i_lock);
+
+ if (oldcache)
+ nfs4_xattr_discard_cache(oldcache);
+}
+
+/*
+ * The entry LRU is shrunk more aggressively than the cache LRU,
+ * by settings @seeks to 1.
+ *
+ * Cache structures are freed only when they've become empty, after
+ * pruning all but one entry.
+ */
+
+static unsigned long nfs4_xattr_cache_count(struct shrinker *shrink,
+ struct shrink_control *sc);
+static unsigned long nfs4_xattr_entry_count(struct shrinker *shrink,
+ struct shrink_control *sc);
+static unsigned long nfs4_xattr_cache_scan(struct shrinker *shrink,
+ struct shrink_control *sc);
+static unsigned long nfs4_xattr_entry_scan(struct shrinker *shrink,
+ struct shrink_control *sc);
+
+static struct shrinker nfs4_xattr_cache_shrinker = {
+ .count_objects = nfs4_xattr_cache_count,
+ .scan_objects = nfs4_xattr_cache_scan,
+ .seeks = DEFAULT_SEEKS,
+ .flags = SHRINKER_MEMCG_AWARE,
+};
+
+static struct shrinker nfs4_xattr_entry_shrinker = {
+ .count_objects = nfs4_xattr_entry_count,
+ .scan_objects = nfs4_xattr_entry_scan,
+ .seeks = DEFAULT_SEEKS,
+ .batch = 512,
+ .flags = SHRINKER_MEMCG_AWARE,
+};
+
+static struct shrinker nfs4_xattr_large_entry_shrinker = {
+ .count_objects = nfs4_xattr_entry_count,
+ .scan_objects = nfs4_xattr_entry_scan,
+ .seeks = 1,
+ .batch = 512,
+ .flags = SHRINKER_MEMCG_AWARE,
+};
+
+static enum lru_status
+cache_lru_isolate(struct list_head *item,
+ struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
+{
+ struct list_head *dispose = arg;
+ struct inode *inode;
+ struct nfs4_xattr_cache *cache = container_of(item,
+ struct nfs4_xattr_cache, lru);
+
+ if (atomic_long_read(&cache->nent) > 1)
+ return LRU_SKIP;
+
+ /*
+ * If a cache structure is on the LRU list, we know that
+ * its inode is valid. Try to lock it to break the link.
+ * Since we're inverting the lock order here, only try.
+ */
+ inode = cache->inode;
+
+ if (!spin_trylock(&inode->i_lock))
+ return LRU_SKIP;
+
+ kref_get(&cache->ref);
+
+ cache->inode = NULL;
+ NFS_I(inode)->xattr_cache = NULL;
+ NFS_I(inode)->cache_validity &= ~NFS_INO_INVALID_XATTR;
+ list_lru_isolate(lru, &cache->lru);
+
+ spin_unlock(&inode->i_lock);
+
+ list_add_tail(&cache->dispose, dispose);
+ return LRU_REMOVED;
+}
+
+static unsigned long
+nfs4_xattr_cache_scan(struct shrinker *shrink, struct shrink_control *sc)
+{
+ LIST_HEAD(dispose);
+ unsigned long freed;
+ struct nfs4_xattr_cache *cache;
+
+ freed = list_lru_shrink_walk(&nfs4_xattr_cache_lru, sc,
+ cache_lru_isolate, &dispose);
+ while (!list_empty(&dispose)) {
+ cache = list_first_entry(&dispose, struct nfs4_xattr_cache,
+ dispose);
+ list_del_init(&cache->dispose);
+ nfs4_xattr_discard_cache(cache);
+ kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+ }
+
+ return freed;
+}
+
+
+static unsigned long
+nfs4_xattr_cache_count(struct shrinker *shrink, struct shrink_control *sc)
+{
+ unsigned long count;
+
+ count = list_lru_count(&nfs4_xattr_cache_lru);
+ return vfs_pressure_ratio(count);
+}
+
+static enum lru_status
+entry_lru_isolate(struct list_head *item,
+ struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
+{
+ struct list_head *dispose = arg;
+ struct nfs4_xattr_bucket *bucket;
+ struct nfs4_xattr_cache *cache;
+ struct nfs4_xattr_entry *entry = container_of(item,
+ struct nfs4_xattr_entry, lru);
+
+ bucket = entry->bucket;
+ cache = bucket->cache;
+
+ /*
+ * Unhook the entry from its parent (either a cache bucket
+ * or a cache structure if it's a listxattr buf), so that
+ * it's no longer found. Then add it to the isolate list,
+ * to be freed later.
+ *
+ * In both cases, we're reverting lock order, so use
+ * trylock and skip the entry if we can't get the lock.
+ */
+ if (entry->xattr_name != NULL) {
+ /* Regular cache entry */
+ if (!spin_trylock(&bucket->lock))
+ return LRU_SKIP;
+
+ kref_get(&entry->ref);
+
+ hlist_del_init(&entry->hnode);
+ atomic_long_dec(&cache->nent);
+ list_lru_isolate(lru, &entry->lru);
+
+ spin_unlock(&bucket->lock);
+ } else {
+ /* Listxattr cache entry */
+ if (!spin_trylock(&cache->listxattr_lock))
+ return LRU_SKIP;
+
+ kref_get(&entry->ref);
+
+ cache->listxattr = NULL;
+ list_lru_isolate(lru, &entry->lru);
+
+ spin_unlock(&cache->listxattr_lock);
+ }
+
+ list_add_tail(&entry->dispose, dispose);
+ return LRU_REMOVED;
+}
+
+static unsigned long
+nfs4_xattr_entry_scan(struct shrinker *shrink, struct shrink_control *sc)
+{
+ LIST_HEAD(dispose);
+ unsigned long freed;
+ struct nfs4_xattr_entry *entry;
+ struct list_lru *lru;
+
+ lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
+ &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
+
+ freed = list_lru_shrink_walk(lru, sc, entry_lru_isolate, &dispose);
+
+ while (!list_empty(&dispose)) {
+ entry = list_first_entry(&dispose, struct nfs4_xattr_entry,
+ dispose);
+ list_del_init(&entry->dispose);
+
+ /*
+ * Drop two references: the one that we just grabbed
+ * in entry_lru_isolate, and the one that was set
+ * when the entry was first allocated.
+ */
+ kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+ kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+ }
+
+ return freed;
+}
+
+static unsigned long
+nfs4_xattr_entry_count(struct shrinker *shrink, struct shrink_control *sc)
+{
+ unsigned long count;
+ struct list_lru *lru;
+
+ lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
+ &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
+
+ count = list_lru_count(lru);
+ return vfs_pressure_ratio(count);
+}
+
+
+static void nfs4_xattr_cache_init_once(void *p)
+{
+ struct nfs4_xattr_cache *cache = (struct nfs4_xattr_cache *)p;
+
+ spin_lock_init(&cache->listxattr_lock);
+ atomic_long_set(&cache->nent, 0);
+ nfs4_xattr_hash_init(cache);
+ cache->listxattr = NULL;
+ INIT_WORK(&cache->work, nfs4_xattr_discard_cache_worker);
+ INIT_LIST_HEAD(&cache->lru);
+ INIT_LIST_HEAD(&cache->dispose);
+}
+
+int __init nfs4_xattr_cache_init(void)
+{
+ int ret = 0;
+
+ nfs4_xattr_cache_cachep = kmem_cache_create("nfs4_xattr_cache_cache",
+ sizeof(struct nfs4_xattr_cache), 0,
+ (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ nfs4_xattr_cache_init_once);
+ if (nfs4_xattr_cache_cachep == NULL)
+ return -ENOMEM;
+
+ ret = list_lru_init_memcg(&nfs4_xattr_large_entry_lru,
+ &nfs4_xattr_large_entry_shrinker);
+ if (ret)
+ goto out4;
+
+ ret = list_lru_init_memcg(&nfs4_xattr_entry_lru,
+ &nfs4_xattr_entry_shrinker);
+ if (ret)
+ goto out3;
+
+ ret = list_lru_init_memcg(&nfs4_xattr_cache_lru,
+ &nfs4_xattr_cache_shrinker);
+ if (ret)
+ goto out2;
+
+ nfs4_xattr_cache_wq = alloc_workqueue("nfs4_xattr", WQ_MEM_RECLAIM, 0);
+ if (nfs4_xattr_cache_wq == NULL)
+ goto out1;
+
+ ret = register_shrinker(&nfs4_xattr_cache_shrinker);
+ if (ret)
+ goto out0;
+
+ ret = register_shrinker(&nfs4_xattr_entry_shrinker);
+ if (ret)
+ goto out;
+
+ ret = register_shrinker(&nfs4_xattr_large_entry_shrinker);
+ if (!ret)
+ return 0;
+
+ unregister_shrinker(&nfs4_xattr_entry_shrinker);
+out:
+ unregister_shrinker(&nfs4_xattr_cache_shrinker);
+out0:
+ destroy_workqueue(nfs4_xattr_cache_wq);
+out1:
+ list_lru_destroy(&nfs4_xattr_cache_lru);
+out2:
+ list_lru_destroy(&nfs4_xattr_entry_lru);
+out3:
+ list_lru_destroy(&nfs4_xattr_large_entry_lru);
+out4:
+ kmem_cache_destroy(nfs4_xattr_cache_cachep);
+
+ return ret;
+}
+
+void nfs4_xattr_cache_exit(void)
+{
+ unregister_shrinker(&nfs4_xattr_entry_shrinker);
+ unregister_shrinker(&nfs4_xattr_cache_shrinker);
+ list_lru_destroy(&nfs4_xattr_entry_lru);
+ list_lru_destroy(&nfs4_xattr_cache_lru);
+ kmem_cache_destroy(nfs4_xattr_cache_cachep);
+ destroy_workqueue(nfs4_xattr_cache_wq);
+}
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 6df94857f5bb..079c1ac84cee 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7459,6 +7459,7 @@ static int nfs4_xattr_set_nfs4_user(const struct xattr_handler *handler,
size_t buflen, int flags)
{
struct nfs_access_entry cache;
+ int ret;

if (!nfs_server_capable(inode, NFS_CAP_XATTR))
return -EOPNOTSUPP;
@@ -7477,10 +7478,17 @@ static int nfs4_xattr_set_nfs4_user(const struct xattr_handler *handler,
return -EACCES;
}

- if (buf == NULL)
- return nfs42_proc_removexattr(inode, key);
- else
- return nfs42_proc_setxattr(inode, key, buf, buflen, flags);
+ if (buf == NULL) {
+ ret = nfs42_proc_removexattr(inode, key);
+ if (!ret)
+ nfs4_xattr_cache_remove(inode, key);
+ } else {
+ ret = nfs42_proc_setxattr(inode, key, buf, buflen, flags);
+ if (!ret)
+ nfs4_xattr_cache_add(inode, key, buf, NULL, buflen);
+ }
+
+ return ret;
}

static int nfs4_xattr_get_nfs4_user(const struct xattr_handler *handler,
@@ -7488,6 +7496,7 @@ static int nfs4_xattr_get_nfs4_user(const struct xattr_handler *handler,
const char *key, void *buf, size_t buflen)
{
struct nfs_access_entry cache;
+ ssize_t ret;

if (!nfs_server_capable(inode, NFS_CAP_XATTR))
return -EOPNOTSUPP;
@@ -7497,7 +7506,17 @@ static int nfs4_xattr_get_nfs4_user(const struct xattr_handler *handler,
return -EACCES;
}

- return nfs42_proc_getxattr(inode, key, buf, buflen);
+ ret = nfs_revalidate_inode(NFS_SERVER(inode), inode);
+ if (ret)
+ return ret;
+
+ ret = nfs4_xattr_cache_get(inode, key, buf, buflen);
+ if (ret >= 0 || (ret < 0 && ret != -ENOENT))
+ return ret;
+
+ ret = nfs42_proc_getxattr(inode, key, buf, buflen);
+
+ return ret;
}

static ssize_t
@@ -7505,7 +7524,7 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char *list, size_t list_len)
{
u64 cookie;
bool eof;
- int ret, size;
+ ssize_t ret, size;
char *buf;
size_t buflen;
struct nfs_access_entry cache;
@@ -7518,6 +7537,14 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char *list, size_t list_len)
return 0;
}

+ ret = nfs_revalidate_inode(NFS_SERVER(inode), inode);
+ if (ret)
+ return ret;
+
+ ret = nfs4_xattr_cache_list(inode, list, list_len);
+ if (ret >= 0 || (ret < 0 && ret != -ENOENT))
+ return ret;
+
cookie = 0;
eof = false;
buflen = list_len ? list_len : XATTR_LIST_MAX;
@@ -7537,6 +7564,9 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char *list, size_t list_len)
size += ret;
}

+ if (list_len)
+ nfs4_xattr_cache_set_list(inode, list, size);
+
return size;
}

diff --git a/fs/nfs/nfs4super.c b/fs/nfs/nfs4super.c
index 1475f932d7da..0c1ab846b83d 100644
--- a/fs/nfs/nfs4super.c
+++ b/fs/nfs/nfs4super.c
@@ -69,6 +69,7 @@ static void nfs4_evict_inode(struct inode *inode)
pnfs_destroy_layout(NFS_I(inode));
/* First call standard NFS clear_inode() code */
nfs_clear_inode(inode);
+ nfs4_xattr_cache_zap(inode);
}

struct nfs_referral_count {
@@ -268,6 +269,12 @@ static int __init init_nfs_v4(void)
if (err)
goto out1;

+#ifdef CONFIG_NFS_V4_2
+ err = nfs4_xattr_cache_init();
+ if (err)
+ goto out2;
+#endif
+
err = nfs4_register_sysctl();
if (err)
goto out2;
@@ -288,6 +295,9 @@ static void __exit exit_nfs_v4(void)
nfs4_pnfs_v3_ds_connect_unload();

unregister_nfs_version(&nfs_v4);
+#ifdef CONFIG_NFS_V4_2
+ nfs4_xattr_cache_exit();
+#endif
nfs4_unregister_sysctl();
nfs_idmap_quit();
nfs_dns_resolver_destroy();
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 1fcfef670a4a..c08cc22d9c32 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -102,6 +102,8 @@ struct nfs_delegation;

struct posix_acl;

+struct nfs4_xattr_cache;
+
/*
* nfs fs inode data in memory
*/
@@ -188,6 +190,10 @@ struct nfs_inode {
struct fscache_cookie *fscache;
#endif
struct inode vfs_inode;
+
+#ifdef CONFIG_NFS_V4_2
+ struct nfs4_xattr_cache *xattr_cache;
+#endif
};

struct nfs4_copy_state {
diff --git a/include/uapi/linux/nfs_fs.h b/include/uapi/linux/nfs_fs.h
index 7bcc8cd6831d..3afe3767c55d 100644
--- a/include/uapi/linux/nfs_fs.h
+++ b/include/uapi/linux/nfs_fs.h
@@ -56,6 +56,7 @@
#define NFSDBG_PNFS 0x1000
#define NFSDBG_PNFS_LD 0x2000
#define NFSDBG_STATE 0x4000
+#define NFSDBG_XATTRCACHE 0x8000
#define NFSDBG_ALL 0xFFFF


--
2.16.6

2020-03-11 19:57:45

by Frank van der Linden

[permalink] [raw]
Subject: [PATCH 08/13] nfs: modify update_changeattr to deal with regular files

Until now, change attributes in change_info form were only returned by
directory operations. However, they are also used for the RFC 8276
extended attribute operations, which work on both directories
and regular files. Modify update_changeattr to deal:

* Rename it to nfs4_update_changeattr and make it non-static.
* Don't always use INO_INVALID_DATA, this isn't needed for a
directory that only had its extended attributes changed by us.
* Existing callers now always pass in INO_INVALID_DATA.

For the current callers of this function, behavior is unchanged.

Signed-off-by: Frank van der Linden <[email protected]>
---
fs/nfs/nfs4_fs.h | 5 ++++
fs/nfs/nfs4proc.c | 70 ++++++++++++++++++++++++++++++++++---------------------
2 files changed, 49 insertions(+), 26 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 8be1ba7c62bb..42c9f237dc61 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -322,6 +322,11 @@ extern int update_open_stateid(struct nfs4_state *state,

extern int nfs4_proc_get_lease_time(struct nfs_client *clp,
struct nfs_fsinfo *fsinfo);
+extern void nfs4_update_changeattr(struct inode *dir,
+ struct nfs4_change_info *cinfo,
+ unsigned long timestamp,
+ unsigned long cache_validity);
+
#if defined(CONFIG_NFS_V4_1)
extern int nfs41_sequence_done(struct rpc_task *, struct nfs4_sequence_res *);
extern int nfs4_proc_create_session(struct nfs_client *, const struct cred *);
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index da72237c6933..31d65bbf4a23 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1157,37 +1157,48 @@ nfs4_dec_nlink_locked(struct inode *inode)
}

static void
-update_changeattr_locked(struct inode *dir, struct nfs4_change_info *cinfo,
+nfs4_update_changeattr_locked(struct inode *inode,
+ struct nfs4_change_info *cinfo,
unsigned long timestamp, unsigned long cache_validity)
{
- struct nfs_inode *nfsi = NFS_I(dir);
+ struct nfs_inode *nfsi = NFS_I(inode);

nfsi->cache_validity |= NFS_INO_INVALID_CTIME
| NFS_INO_INVALID_MTIME
- | NFS_INO_INVALID_DATA
| cache_validity;
- if (cinfo->atomic && cinfo->before == inode_peek_iversion_raw(dir)) {
+
+ if (cinfo->atomic && cinfo->before == inode_peek_iversion_raw(inode)) {
nfsi->cache_validity &= ~NFS_INO_REVAL_PAGECACHE;
nfsi->attrtimeo_timestamp = jiffies;
} else {
- nfs_force_lookup_revalidate(dir);
- if (cinfo->before != inode_peek_iversion_raw(dir))
+ if (S_ISDIR(inode->i_mode)) {
+ nfsi->cache_validity |= NFS_INO_INVALID_DATA;
+ nfs_force_lookup_revalidate(inode);
+ } else {
+ if (!NFS_PROTO(inode)->have_delegation(inode,
+ FMODE_READ))
+ nfsi->cache_validity |= NFS_INO_REVAL_PAGECACHE;
+ }
+
+ if (cinfo->before != inode_peek_iversion_raw(inode))
nfsi->cache_validity |= NFS_INO_INVALID_ACCESS |
- NFS_INO_INVALID_ACL;
+ NFS_INO_INVALID_ACL;
}
- inode_set_iversion_raw(dir, cinfo->after);
+ inode_set_iversion_raw(inode, cinfo->after);
nfsi->read_cache_jiffies = timestamp;
nfsi->attr_gencount = nfs_inc_attr_generation_counter();
nfsi->cache_validity &= ~NFS_INO_INVALID_CHANGE;
- nfs_fscache_invalidate(dir);
+
+ if (nfsi->cache_validity & NFS_INO_INVALID_DATA)
+ nfs_fscache_invalidate(inode);
}

-static void
-update_changeattr(struct inode *dir, struct nfs4_change_info *cinfo,
+void
+nfs4_update_changeattr(struct inode *dir, struct nfs4_change_info *cinfo,
unsigned long timestamp, unsigned long cache_validity)
{
spin_lock(&dir->i_lock);
- update_changeattr_locked(dir, cinfo, timestamp, cache_validity);
+ nfs4_update_changeattr_locked(dir, cinfo, timestamp, cache_validity);
spin_unlock(&dir->i_lock);
}

@@ -2643,8 +2654,9 @@ static int _nfs4_proc_open(struct nfs4_opendata *data,
data->file_created = true;
if (data->file_created ||
inode_peek_iversion_raw(dir) != o_res->cinfo.after)
- update_changeattr(dir, &o_res->cinfo,
- o_res->f_attr->time_start, 0);
+ nfs4_update_changeattr(dir, &o_res->cinfo,
+ o_res->f_attr->time_start,
+ NFS_INO_INVALID_DATA);
}
if ((o_res->rflags & NFS4_OPEN_RESULT_LOCKTYPE_POSIX) == 0)
server->caps &= ~NFS_CAP_POSIX_LOCK;
@@ -4543,7 +4555,8 @@ _nfs4_proc_remove(struct inode *dir, const struct qstr *name, u32 ftype)
status = nfs4_call_sync(server->client, server, &msg, &args.seq_args, &res.seq_res, 1);
if (status == 0) {
spin_lock(&dir->i_lock);
- update_changeattr_locked(dir, &res.cinfo, timestamp, 0);
+ nfs4_update_changeattr_locked(dir, &res.cinfo, timestamp,
+ NFS_INO_INVALID_DATA);
/* Removing a directory decrements nlink in the parent */
if (ftype == NF4DIR && dir->i_nlink > 2)
nfs4_dec_nlink_locked(dir);
@@ -4627,8 +4640,9 @@ static int nfs4_proc_unlink_done(struct rpc_task *task, struct inode *dir)
&data->timeout) == -EAGAIN)
return 0;
if (task->tk_status == 0)
- update_changeattr(dir, &res->cinfo,
- res->dir_attr->time_start, 0);
+ nfs4_update_changeattr(dir, &res->cinfo,
+ res->dir_attr->time_start,
+ NFS_INO_INVALID_DATA);
return 1;
}

@@ -4672,16 +4686,18 @@ static int nfs4_proc_rename_done(struct rpc_task *task, struct inode *old_dir,
if (task->tk_status == 0) {
if (new_dir != old_dir) {
/* Note: If we moved a directory, nlink will change */
- update_changeattr(old_dir, &res->old_cinfo,
+ nfs4_update_changeattr(old_dir, &res->old_cinfo,
res->old_fattr->time_start,
- NFS_INO_INVALID_OTHER);
- update_changeattr(new_dir, &res->new_cinfo,
+ NFS_INO_INVALID_OTHER |
+ NFS_INO_INVALID_DATA);
+ nfs4_update_changeattr(new_dir, &res->new_cinfo,
res->new_fattr->time_start,
- NFS_INO_INVALID_OTHER);
+ NFS_INO_INVALID_OTHER |
+ NFS_INO_INVALID_DATA);
} else
- update_changeattr(old_dir, &res->old_cinfo,
+ nfs4_update_changeattr(old_dir, &res->old_cinfo,
res->old_fattr->time_start,
- 0);
+ NFS_INO_INVALID_DATA);
}
return 1;
}
@@ -4722,7 +4738,8 @@ static int _nfs4_proc_link(struct inode *inode, struct inode *dir, const struct

status = nfs4_call_sync(server->client, server, &msg, &arg.seq_args, &res.seq_res, 1);
if (!status) {
- update_changeattr(dir, &res.cinfo, res.fattr->time_start, 0);
+ nfs4_update_changeattr(dir, &res.cinfo, res.fattr->time_start,
+ NFS_INO_INVALID_DATA);
status = nfs_post_op_update_inode(inode, res.fattr);
if (!status)
nfs_setsecurity(inode, res.fattr, res.label);
@@ -4800,8 +4817,9 @@ static int nfs4_do_create(struct inode *dir, struct dentry *dentry, struct nfs4_
&data->arg.seq_args, &data->res.seq_res, 1);
if (status == 0) {
spin_lock(&dir->i_lock);
- update_changeattr_locked(dir, &data->res.dir_cinfo,
- data->res.fattr->time_start, 0);
+ nfs4_update_changeattr_locked(dir, &data->res.dir_cinfo,
+ data->res.fattr->time_start,
+ NFS_INO_INVALID_DATA);
/* Creating a directory bumps nlink in the parent */
if (data->arg.ftype == NF4DIR)
nfs4_inc_nlink_locked(dir);
--
2.16.6

2020-03-11 19:57:52

by Frank van der Linden

[permalink] [raw]
Subject: [PATCH 09/13] nfs: define and use the NFS_INO_INVALID_XATTR flag

Define the NFS_INO_INVALID_XATTR flag, to be used for the NFSv4.2 xattr
cache, and use it where appropriate.

No functional change as yet.

Signed-off-by: Frank van der Linden <[email protected]>
---
fs/nfs/inode.c | 7 ++++++-
fs/nfs/nfs4proc.c | 3 ++-
fs/nfs/nfstrace.h | 3 ++-
include/linux/nfs_fs.h | 1 +
4 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 11bf15800ac9..d2be152796ef 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -205,7 +205,8 @@ static void nfs_set_cache_invalid(struct inode *inode, unsigned long flags)
flags &= ~NFS_INO_INVALID_OTHER;
flags &= ~(NFS_INO_INVALID_CHANGE
| NFS_INO_INVALID_SIZE
- | NFS_INO_REVAL_PAGECACHE);
+ | NFS_INO_REVAL_PAGECACHE
+ | NFS_INO_INVALID_XATTR);
}

if (inode->i_mapping->nrpages == 0)
@@ -535,6 +536,8 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr, st
inode->i_gid = fattr->gid;
else if (nfs_server_capable(inode, NFS_CAP_OWNER_GROUP))
nfs_set_cache_invalid(inode, NFS_INO_INVALID_OTHER);
+ if (nfs_server_capable(inode, NFS_CAP_XATTR))
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_XATTR);
if (fattr->valid & NFS_ATTR_FATTR_BLOCKS_USED)
inode->i_blocks = fattr->du.nfs2.blocks;
if (fattr->valid & NFS_ATTR_FATTR_SPACE_USED) {
@@ -1365,6 +1368,8 @@ static void nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr *fattr)
inode_set_iversion_raw(inode, fattr->change_attr);
if (S_ISDIR(inode->i_mode))
nfs_set_cache_invalid(inode, NFS_INO_INVALID_DATA);
+ else if (nfs_server_capable(inode, NFS_CAP_XATTR))
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_XATTR);
}
/* If we have atomic WCC data, we may update some attributes */
ts = inode->i_ctime;
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 31d65bbf4a23..725ae64f62f7 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1182,7 +1182,8 @@ nfs4_update_changeattr_locked(struct inode *inode,

if (cinfo->before != inode_peek_iversion_raw(inode))
nfsi->cache_validity |= NFS_INO_INVALID_ACCESS |
- NFS_INO_INVALID_ACL;
+ NFS_INO_INVALID_ACL |
+ NFS_INO_INVALID_XATTR;
}
inode_set_iversion_raw(inode, cinfo->after);
nfsi->read_cache_jiffies = timestamp;
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index a9588d19a5ae..b941ad1199e5 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -59,7 +59,8 @@ TRACE_DEFINE_ENUM(NFS_INO_INVALID_OTHER);
{ NFS_INO_INVALID_CTIME, "INVALID_CTIME" }, \
{ NFS_INO_INVALID_MTIME, "INVALID_MTIME" }, \
{ NFS_INO_INVALID_SIZE, "INVALID_SIZE" }, \
- { NFS_INO_INVALID_OTHER, "INVALID_OTHER" })
+ { NFS_INO_INVALID_OTHER, "INVALID_OTHER" }, \
+ { NFS_INO_INVALID_XATTR, "INVALID_XATTR" })

TRACE_DEFINE_ENUM(NFS_INO_ADVISE_RDPLUS);
TRACE_DEFINE_ENUM(NFS_INO_STALE);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index e86e7a747092..1fcfef670a4a 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -233,6 +233,7 @@ struct nfs4_copy_state {
#define NFS_INO_INVALID_OTHER BIT(12) /* other attrs are invalid */
#define NFS_INO_DATA_INVAL_DEFER \
BIT(13) /* Deferred cache invalidation */
+#define NFS_INO_INVALID_XATTR BIT(14) /* xattrs are invalid */

#define NFS_INO_INVALID_ATTR (NFS_INO_INVALID_CHANGE \
| NFS_INO_INVALID_CTIME \
--
2.16.6

2020-03-12 19:07:19

by Mkrtchyan, Tigran

[permalink] [raw]
Subject: Re: [PATCH 00/13] client side user xattr (RFC8276) support


I have applied the patchset and run simple test against dCache nfs server:

root@anahit xattr-test]# df -h .
Filesystem Size Used Avail Use% Mounted on
ani:/ 213G 179G 35G 84% /mnt
[root@anahit xattr-test]#
[root@anahit xattr-test]# touch file.txt
[root@anahit xattr-test]# attr -l file.txt
[root@anahit xattr-test]# attr -s key1 -V value1 file.txt
Attribute "key1" set to a 6 byte value for file.txt:
value1
[root@anahit xattr-test]# attr -s key2 -V value2 file.txt
Attribute "key2" set to a 6 byte value for file.txt:
value2
[root@anahit xattr-test]# attr -l file.txt
Attribute "user.key1" has a 6 byte value for file.txt
Attribute "user.key2" has a 6 byte value for file.txt
[root@anahit xattr-test]# attr -g key1 file.txt
Attribute "key1" had a 6 byte value for file.txt:
value1
[root@anahit xattr-test]# attr -g key2 file.txt
Attribute "key2" had a 6 byte value for file.txt:
value2
[root@anahit xattr-test]# getfattr -n user.key1 file.txt
# file: file.txt
user.key1="value1"

[root@anahit xattr-test]# getfattr -n user.key2 file.txt
# file: file.txt
user.key2="value2"

[root@anahit xattr-test]# attr -r key1 file.txt
[root@anahit xattr-test]# attr -r key2 file.txt
[root@anahit xattr-test]# attr -l file.txt
[root@anahit xattr-test]#


At lease a dirrerent implementation in addition to linux server works as expected.

Tested-by: "Tigran Mkrtchyan" <[email protected]>


Tigran.

----- Original Message -----
> From: "Frank van der Linden" <[email protected]>
> To: "Trond Myklebust" <[email protected]>, "Anna Schumaker" <[email protected]>, "linux-nfs"
> <[email protected]>
> Cc: "Frank van der Linden" <[email protected]>
> Sent: Wednesday, March 11, 2020 8:56:00 PM
> Subject: [PATCH 00/13] client side user xattr (RFC8276) support

> This patchset implements the client side for NFS user extended attributes,
> as defined in RFC8726.
>
> This was originally posted as an RFC in:
>
> https://patchwork.kernel.org/cover/11143565/
>
> Patch 1 is shared with the server side patch, posted
> separately.
>
> Most comments in there still apply, except that:
>
> 1. Client side caching is now included in this patch set.
> 2. As per the discussion, user extended attributes are enabled if
> the client and server support them (e.g. they support 4.2 and
> advertise the user extended attribute FATTR). There are no longer
> options to switch them off on either the client or the server.
> 3. The code is no longer conditioned on a config option.
> 4. The number of patches has been reduced somewhat by merging
> smaller, related ones.
>
> The client side caching is implemented through a per-inode hash table,
> which is allocated on demand. See fs/nfs/nfs42xattr.c for details.
>
> This has been tested as follows:
>
> * Linux client and server:
> * Test all corner cases (XATTR_SIZE_*)
> * Test all failure cases (no xattr, setxattr with different or
> invalid flags, etc).
> * Verify the content of xattrs across several operations.
> * Use KASAN and KMEMLEAK for a longer mix of testruns to verify
> that there were no leaks (after unmounting the filesystem).
> * Stress tested caching, trying to run the client out of memory.
>
> * Tested against the FreeBSD-current implementation as well, which works
> (after I fixed 2 bugs in that implementation, which I'm sending out to
> them too).
>
> * Not tested: RDMA (I couldn't get a setup going).
>
> Frank van der Linden (13):
> nfs,nfsd: NFSv4.2 extended attribute protocol definitions
> nfs: add client side only definitions for user xattrs
> NFSv4.2: query the server for extended attribute support
> NFSv4.2: define limits and sizes for user xattr handling
> NFSv4.2: add client side XDR handling for extended attributes
> nfs: define nfs_access_get_cached function
> NFSv4.2: query the extended attribute access bits
> nfs: modify update_changeattr to deal with regular files
> nfs: define and use the NFS_INO_INVALID_XATTR flag
> nfs: make the buf_to_pages_noslab function available to the nfs code
> NFSv4.2: add the extended attribute proc functions.
> NFSv4.2: hook in the user extended attribute handlers
> NFSv4.2: add client side xattr caching.
>
> fs/nfs/Makefile | 1 +
> fs/nfs/client.c | 19 +-
> fs/nfs/dir.c | 24 +-
> fs/nfs/inode.c | 16 +-
> fs/nfs/internal.h | 28 ++
> fs/nfs/nfs42.h | 24 +
> fs/nfs/nfs42proc.c | 248 ++++++++++
> fs/nfs/nfs42xattr.c | 1083 +++++++++++++++++++++++++++++++++++++++++++
> fs/nfs/nfs42xdr.c | 442 ++++++++++++++++++
> fs/nfs/nfs4_fs.h | 5 +
> fs/nfs/nfs4client.c | 31 ++
> fs/nfs/nfs4proc.c | 248 ++++++++--
> fs/nfs/nfs4super.c | 10 +
> fs/nfs/nfs4xdr.c | 29 ++
> fs/nfs/nfstrace.h | 3 +-
> include/linux/nfs4.h | 25 +
> include/linux/nfs_fs.h | 12 +
> include/linux/nfs_fs_sb.h | 6 +
> include/linux/nfs_xdr.h | 60 ++-
> include/uapi/linux/nfs4.h | 3 +
> include/uapi/linux/nfs_fs.h | 1 +
> 21 files changed, 2276 insertions(+), 42 deletions(-)
> create mode 100644 fs/nfs/nfs42xattr.c
>
> --
> 2.16.6

2020-03-12 20:10:46

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 00/13] client side user xattr (RFC8276) support

Hi Frank,

On Wed, Mar 11, 2020 at 3:57 PM Frank van der Linden
<[email protected]> wrote:
>
> This patchset implements the client side for NFS user extended attributes,
> as defined in RFC8726.
>
> This was originally posted as an RFC in:
>
> https://patchwork.kernel.org/cover/11143565/
>
> Patch 1 is shared with the server side patch, posted
> separately.
>
> Most comments in there still apply, except that:
>
> 1. Client side caching is now included in this patch set.
> 2. As per the discussion, user extended attributes are enabled if
> the client and server support them (e.g. they support 4.2 and
> advertise the user extended attribute FATTR). There are no longer
> options to switch them off on either the client or the server.
> 3. The code is no longer conditioned on a config option.
> 4. The number of patches has been reduced somewhat by merging
> smaller, related ones.
>
> The client side caching is implemented through a per-inode hash table,
> which is allocated on demand. See fs/nfs/nfs42xattr.c for details.
>
> This has been tested as follows:
>
> * Linux client and server:
> * Test all corner cases (XATTR_SIZE_*)
> * Test all failure cases (no xattr, setxattr with different or
> invalid flags, etc).
> * Verify the content of xattrs across several operations.
> * Use KASAN and KMEMLEAK for a longer mix of testruns to verify
> that there were no leaks (after unmounting the filesystem).
> * Stress tested caching, trying to run the client out of memory.

I'm curious if you've tried xfstests with your patches? There are a
handful of tests using xattrs that might be good to check with, too:

anna@gouda % grep xattr -l tests/generic/[0-9][0-9][0-9]
tests/generic/037
tests/generic/062
tests/generic/066
tests/generic/093
tests/generic/117
tests/generic/337
tests/generic/377
tests/generic/403
tests/generic/425
tests/generic/454
tests/generic/489
tests/generic/523
tests/generic/529
tests/generic/556

Thanks,
Anna


>
> * Tested against the FreeBSD-current implementation as well, which works
> (after I fixed 2 bugs in that implementation, which I'm sending out to
> them too).
>
> * Not tested: RDMA (I couldn't get a setup going).
>
> Frank van der Linden (13):
> nfs,nfsd: NFSv4.2 extended attribute protocol definitions
> nfs: add client side only definitions for user xattrs
> NFSv4.2: query the server for extended attribute support
> NFSv4.2: define limits and sizes for user xattr handling
> NFSv4.2: add client side XDR handling for extended attributes
> nfs: define nfs_access_get_cached function
> NFSv4.2: query the extended attribute access bits
> nfs: modify update_changeattr to deal with regular files
> nfs: define and use the NFS_INO_INVALID_XATTR flag
> nfs: make the buf_to_pages_noslab function available to the nfs code
> NFSv4.2: add the extended attribute proc functions.
> NFSv4.2: hook in the user extended attribute handlers
> NFSv4.2: add client side xattr caching.
>
> fs/nfs/Makefile | 1 +
> fs/nfs/client.c | 19 +-
> fs/nfs/dir.c | 24 +-
> fs/nfs/inode.c | 16 +-
> fs/nfs/internal.h | 28 ++
> fs/nfs/nfs42.h | 24 +
> fs/nfs/nfs42proc.c | 248 ++++++++++
> fs/nfs/nfs42xattr.c | 1083 +++++++++++++++++++++++++++++++++++++++++++
> fs/nfs/nfs42xdr.c | 442 ++++++++++++++++++
> fs/nfs/nfs4_fs.h | 5 +
> fs/nfs/nfs4client.c | 31 ++
> fs/nfs/nfs4proc.c | 248 ++++++++--
> fs/nfs/nfs4super.c | 10 +
> fs/nfs/nfs4xdr.c | 29 ++
> fs/nfs/nfstrace.h | 3 +-
> include/linux/nfs4.h | 25 +
> include/linux/nfs_fs.h | 12 +
> include/linux/nfs_fs_sb.h | 6 +
> include/linux/nfs_xdr.h | 60 ++-
> include/uapi/linux/nfs4.h | 3 +
> include/uapi/linux/nfs_fs.h | 1 +
> 21 files changed, 2276 insertions(+), 42 deletions(-)
> create mode 100644 fs/nfs/nfs42xattr.c
>
> --
> 2.16.6
>

2020-03-12 20:37:05

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 10/13] nfs: make the buf_to_pages_noslab function available to the nfs code

Hi Frank,

On Wed, 2020-03-11 at 19:56 +0000, Frank van der Linden wrote:
> Make the buf_to_pages_noslab function available to the rest of the NFS
> code. Rename it to nfs4_buf_to_pages_noslab to be consistent.
>
> This will be used later in the NFSv4.2 xattr code.
>
> Signed-off-by: Frank van der Linden <[email protected]>
> ---
> fs/nfs/internal.h | 3 +++
> fs/nfs/nfs4proc.c | 4 ++--
> 2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 68f235a571e1..1e3a7e119c93 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -317,6 +317,9 @@ extern const u32 nfs42_maxlistxattrs_overhead;
> extern const struct rpc_procinfo nfs4_procedures[];
> #endif
>
> +extern int nfs4_buf_to_pages_noslab(const void *buf, size_t buflen,
> + struct page **pages);
> +

Please put this in nfs4_fs.h, as its only callers are nfs4proc.c and nfs42proc.c

Thanks,
Anna

> #ifdef CONFIG_NFS_V4_SECURITY_LABEL
> extern struct nfs4_label *nfs4_label_alloc(struct nfs_server *server, gfp_t
> flags);
> static inline struct nfs4_label *
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 725ae64f62f7..aee3a1c97def 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -5553,7 +5553,7 @@ static inline int nfs4_server_supports_acls(struct
> nfs_server *server)
> */
> #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE)
>
> -static int buf_to_pages_noslab(const void *buf, size_t buflen,
> +int nfs4_buf_to_pages_noslab(const void *buf, size_t buflen,
> struct page **pages)
> {
> struct page *newpage, **spages;
> @@ -5795,7 +5795,7 @@ static int __nfs4_proc_set_acl(struct inode *inode,
> const void *buf, size_t bufl
> return -EOPNOTSUPP;
> if (npages > ARRAY_SIZE(pages))
> return -ERANGE;
> - i = buf_to_pages_noslab(buf, buflen, arg.acl_pages);
> + i = nfs4_buf_to_pages_noslab(buf, buflen, arg.acl_pages);
> if (i < 0)
> return i;
> nfs4_inode_make_writeable(inode);

2020-03-12 20:40:22

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 13/13] NFSv4.2: add client side xattr caching.

Hi Frank,

On Wed, 2020-03-11 at 19:56 +0000, Frank van der Linden wrote:
> Implement client side caching for NFSv4.2 extended attributes. The cache
> is a per-inode hashtable, with name/value entries. There is one special
> entry for the listxattr cache.
>
> NFS inodes have a pointer to a cache structure. The cache structure is
> allocated on demand, freed when the cache is invalidated.
>
> Memory shrinkers keep the size in check. Large entries (> PAGE_SIZE)
> are collected by a separate shrinker, and freed more aggressively
> than others.
>
> Signed-off-by: Frank van der Linden <[email protected]>
> ---
> fs/nfs/Makefile | 1 +
> fs/nfs/inode.c | 9 +-
> fs/nfs/internal.h | 20 +
> fs/nfs/nfs42proc.c | 12 +
> fs/nfs/nfs42xattr.c | 1083
> +++++++++++++++++++++++++++++++++++++++++++
> fs/nfs/nfs4proc.c | 42 +-
> fs/nfs/nfs4super.c | 10 +
> include/linux/nfs_fs.h | 6 +
> include/uapi/linux/nfs_fs.h | 1 +
> 9 files changed, 1177 insertions(+), 7 deletions(-)
> create mode 100644 fs/nfs/nfs42xattr.c
>
> diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
> index 2433c3e03cfa..191b3e9aa232 100644
> --- a/fs/nfs/Makefile
> +++ b/fs/nfs/Makefile
> @@ -31,6 +31,7 @@ nfsv4-$(CONFIG_NFS_USE_LEGACY_DNS) += cache_lib.o
> nfsv4-$(CONFIG_SYSCTL) += nfs4sysctl.o
> nfsv4-$(CONFIG_NFS_V4_1) += pnfs.o pnfs_dev.o pnfs_nfs.o
> nfsv4-$(CONFIG_NFS_V4_2) += nfs42proc.o
> +nfsv4-$(CONFIG_NFS_V4_2) += nfs42xattr.o
>
> obj-$(CONFIG_PNFS_FILE_LAYOUT) += filelayout/
> obj-$(CONFIG_PNFS_BLOCK) += blocklayout/
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index d2be152796ef..9d4952d2306b 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -194,6 +194,7 @@ bool nfs_check_cache_invalid(struct inode *inode, unsigned
> long flags)
>
> return nfs_check_cache_invalid_not_delegated(inode, flags);
> }
> +EXPORT_SYMBOL_GPL(nfs_check_cache_invalid);
>
> static void nfs_set_cache_invalid(struct inode *inode, unsigned long flags)
> {
> @@ -235,11 +236,13 @@ static void nfs_zap_caches_locked(struct inode *inode)
> | NFS_INO_INVALID_DATA
> | NFS_INO_INVALID_ACCESS
> | NFS_INO_INVALID_ACL
> + | NFS_INO_INVALID_XATTR
> | NFS_INO_REVAL_PAGECACHE);
> } else
> nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR
> | NFS_INO_INVALID_ACCESS
> | NFS_INO_INVALID_ACL
> + | NFS_INO_INVALID_XATTR
> | NFS_INO_REVAL_PAGECACHE);
> nfs_zap_label_cache_locked(nfsi);
> }
> @@ -1885,7 +1888,8 @@ static int nfs_update_inode(struct inode *inode, struct
> nfs_fattr *fattr)
> if (!(have_writers || have_delegation)) {
> invalid |= NFS_INO_INVALID_DATA
> | NFS_INO_INVALID_ACCESS
> - | NFS_INO_INVALID_ACL;
> + | NFS_INO_INVALID_ACL
> + | NFS_INO_INVALID_XATTR;
> /* Force revalidate of all attributes */
> save_cache_validity |= NFS_INO_INVALID_CTIME
> | NFS_INO_INVALID_MTIME
> @@ -2084,6 +2088,9 @@ struct inode *nfs_alloc_inode(struct super_block *sb)
> #if IS_ENABLED(CONFIG_NFS_V4)
> nfsi->nfs4_acl = NULL;
> #endif /* CONFIG_NFS_V4 */
> +#ifdef CONFIG_NFS_V4_2
> + nfsi->xattr_cache = NULL;
> +#endif
> return &nfsi->vfs_inode;
> }
> EXPORT_SYMBOL_GPL(nfs_alloc_inode);
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 1e3a7e119c93..67b8e4f7c554 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -575,6 +575,26 @@ extern void nfs4_test_session_trunk(struct rpc_clnt
> *clnt,
> struct rpc_xprt *xprt,
> void *data);
>
> +#ifdef CONFIG_NFS_V4_2
> +extern int __init nfs4_xattr_cache_init(void);
> +extern void nfs4_xattr_cache_exit(void);
> +extern void nfs4_xattr_cache_add(struct inode *inode, const char *name,
> + const char *buf, struct page **pages,
> + ssize_t buflen);
> +extern void nfs4_xattr_cache_remove(struct inode *inode, const char *name);
> +extern ssize_t nfs4_xattr_cache_get(struct inode *inode, const char *name,
> + char *buf, ssize_t buflen);
> +extern void nfs4_xattr_cache_set_list(struct inode *inode, const char *buf,
> + ssize_t buflen);
> +extern ssize_t nfs4_xattr_cache_list(struct inode *inode, char *buf,
> + ssize_t buflen);
> +extern void nfs4_xattr_cache_zap(struct inode *inode);
> +#else
> +static inline void nfs4_xattr_cache_zap(struct inode *inode)
> +{
> +}
> +#endif
> +

Same thing with these functions. The generic client doesn't need to know about
them, so please move it into nfs4_fs.h instead.

Thanks,
Anna

> static inline struct inode *nfs_igrab_and_active(struct inode *inode)
> {
> inode = igrab(inode);
> diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
> index 8c2e52bc986a..e200522469af 100644
> --- a/fs/nfs/nfs42proc.c
> +++ b/fs/nfs/nfs42proc.c
> @@ -1182,6 +1182,18 @@ static ssize_t _nfs42_proc_getxattr(struct inode
> *inode, const char *name,
> if (ret < 0)
> return ret;
>
> + /*
> + * Normally, the caching is done one layer up, but for successful
> + * RPCS, always cache the result here, even if the caller was
> + * just querying the length, or if the reply was too big for
> + * the caller. This avoids a second RPC in the case of the
> + * common query-alloc-retrieve cycle for xattrs.
> + *
> + * Note that xattr_len is always capped to XATTR_SIZE_MAX.
> + */
> +
> + nfs4_xattr_cache_add(inode, name, NULL, pages, res.xattr_len);
> +
> if (buflen) {
> if (res.xattr_len > buflen)
> return -ERANGE;
> diff --git a/fs/nfs/nfs42xattr.c b/fs/nfs/nfs42xattr.c
> new file mode 100644
> index 000000000000..23fdab977a2a
> --- /dev/null
> +++ b/fs/nfs/nfs42xattr.c
> @@ -0,0 +1,1083 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Copyright 2019, 2020 Amazon.com, Inc. or its affiliates. All rights
> reserved.
> + *
> + * User extended attribute client side cache functions.
> + *
> + * Author: Frank van der Linden <[email protected]>
> + */
> +#include <linux/errno.h>
> +#include <linux/nfs_fs.h>
> +#include <linux/hashtable.h>
> +#include <linux/refcount.h>
> +#include <uapi/linux/xattr.h>
> +
> +#include "nfs4_fs.h"
> +#include "internal.h"
> +
> +/*
> + * User extended attributes client side caching is implemented by having
> + * a cache structure attached to NFS inodes. This structure is allocated
> + * when needed, and freed when the cache is zapped.
> + *
> + * The cache structure contains as hash table of entries, and a pointer
> + * to a special-cased entry for the listxattr cache.
> + *
> + * Accessing and allocating / freeing the caches is done via reference
> + * counting. The cache entries use a similar refcounting scheme.
> + *
> + * This makes freeing a cache, both from the shrinker and from the
> + * zap cache path, easy. It also means that, in current use cases,
> + * the large majority of inodes will not waste any memory, as they
> + * will never have any user extended attributes assigned to them.
> + *
> + * Attribute entries are hashed in to a simple hash table. They are
> + * also part of an LRU.
> + *
> + * There are three shrinkers.
> + *
> + * Two shrinkers deal with the cache entries themselves: one for
> + * large entries (> PAGE_SIZE), and one for smaller entries. The
> + * shrinker for the larger entries works more aggressively than
> + * those for the smaller entries.
> + *
> + * The other shrinker frees the cache structures themselves.
> + */
> +
> +/*
> + * 64 buckets is a good default. There is likely no reasonable
> + * workload that uses more than even 64 user extended attributes.
> + * You can certainly add a lot more - but you get what you ask for
> + * in those circumstances.
> + */
> +#define NFS4_XATTR_HASH_SIZE 64
> +
> +#define NFSDBG_FACILITY NFSDBG_XATTRCACHE
> +
> +struct nfs4_xattr_cache;
> +struct nfs4_xattr_entry;
> +
> +struct nfs4_xattr_bucket {
> + spinlock_t lock;
> + struct hlist_head hlist;
> + struct nfs4_xattr_cache *cache;
> + bool draining;
> +};
> +
> +struct nfs4_xattr_cache {
> + struct kref ref;
> + spinlock_t hash_lock; /* protects hashtable and lru */
> + struct nfs4_xattr_bucket buckets[NFS4_XATTR_HASH_SIZE];
> + struct list_head lru;
> + struct list_head dispose;
> + atomic_long_t nent;
> + spinlock_t listxattr_lock;
> + struct inode *inode;
> + struct nfs4_xattr_entry *listxattr;
> + struct work_struct work;
> +};
> +
> +struct nfs4_xattr_entry {
> + struct kref ref;
> + struct hlist_node hnode;
> + struct list_head lru;
> + struct list_head dispose;
> + char *xattr_name;
> + void *xattr_value;
> + size_t xattr_size;
> + struct nfs4_xattr_bucket *bucket;
> + uint32_t flags;
> +};
> +
> +#define NFS4_XATTR_ENTRY_EXTVAL 0x0001
> +
> +/*
> + * LRU list of NFS inodes that have xattr caches.
> + */
> +static struct list_lru nfs4_xattr_cache_lru;
> +static struct list_lru nfs4_xattr_entry_lru;
> +static struct list_lru nfs4_xattr_large_entry_lru;
> +
> +static struct kmem_cache *nfs4_xattr_cache_cachep;
> +
> +static struct workqueue_struct *nfs4_xattr_cache_wq;
> +
> +/*
> + * Hashing helper functions.
> + */
> +static void
> +nfs4_xattr_hash_init(struct nfs4_xattr_cache *cache)
> +{
> + unsigned int i;
> +
> + for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
> + INIT_HLIST_HEAD(&cache->buckets[i].hlist);
> + spin_lock_init(&cache->buckets[i].lock);
> + cache->buckets[i].cache = cache;
> + cache->buckets[i].draining = false;
> + }
> +}
> +
> +/*
> + * Locking order:
> + * 1. inode i_lock or bucket lock
> + * 2. list_lru lock (taken by list_lru_* functions)
> + */
> +
> +/*
> + * Wrapper functions to add a cache entry to the right LRU.
> + */
> +static bool
> +nfs4_xattr_entry_lru_add(struct nfs4_xattr_entry *entry)
> +{
> + struct list_lru *lru;
> +
> + lru = (entry->flags & NFS4_XATTR_ENTRY_EXTVAL) ?
> + &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
> +
> + return list_lru_add(lru, &entry->lru);
> +}
> +
> +static bool
> +nfs4_xattr_entry_lru_del(struct nfs4_xattr_entry *entry)
> +{
> + struct list_lru *lru;
> +
> + lru = (entry->flags & NFS4_XATTR_ENTRY_EXTVAL) ?
> + &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
> +
> + return list_lru_del(lru, &entry->lru);
> +}
> +
> +/*
> + * This function allocates cache entries. They are the normal
> + * extended attribute name/value pairs, but may also be a listxattr
> + * cache. Those allocations use the same entry so that they can be
> + * treated as one by the memory shrinker.
> + *
> + * xattr cache entries are allocated together with names. If the
> + * value fits in to one page with the entry structure and the name,
> + * it will also be part of the same allocation (kmalloc). This is
> + * expected to be the vast majority of cases. Larger allocations
> + * have a value pointer that is allocated separately by kvmalloc.
> + *
> + * Parameters:
> + *
> + * @name: Name of the extended attribute. NULL for listxattr cache
> + * entry.
> + * @value: Value of attribute, or listxattr cache. NULL if the
> + * value is to be copied from pages instead.
> + * @pages: Pages to copy the value from, if not NULL. Passed in to
> + * make it easier to copy the value after an RPC, even if
> + * the value will not be passed up to application (e.g.
> + * for a 'query' getxattr with NULL buffer).
> + * @len: Length of the value. Can be 0 for zero-length attribues.
> + * @value and @pages will be NULL if @len is 0.
> + */
> +static struct nfs4_xattr_entry *
> +nfs4_xattr_alloc_entry(const char *name, const void *value,
> + struct page **pages, size_t len)
> +{
> + struct nfs4_xattr_entry *entry;
> + void *valp;
> + char *namep;
> + size_t alloclen, slen;
> + char *buf;
> + uint32_t flags;
> +
> + BUILD_BUG_ON(sizeof(struct nfs4_xattr_entry) +
> + XATTR_NAME_MAX + 1 > PAGE_SIZE);
> +
> + alloclen = sizeof(struct nfs4_xattr_entry);
> + if (name != NULL) {
> + slen = strlen(name) + 1;
> + alloclen += slen;
> + } else
> + slen = 0;
> +
> + if (alloclen + len <= PAGE_SIZE) {
> + alloclen += len;
> + flags = 0;
> + } else {
> + flags = NFS4_XATTR_ENTRY_EXTVAL;
> + }
> +
> + buf = kmalloc(alloclen, GFP_KERNEL_ACCOUNT | GFP_NOFS);
> + if (buf == NULL)
> + return NULL;
> + entry = (struct nfs4_xattr_entry *)buf;
> +
> + if (name != NULL) {
> + namep = buf + sizeof(struct nfs4_xattr_entry);
> + memcpy(namep, name, slen);
> + } else {
> + namep = NULL;
> + }
> +
> +
> + if (flags & NFS4_XATTR_ENTRY_EXTVAL) {
> + valp = kvmalloc(len, GFP_KERNEL_ACCOUNT | GFP_NOFS);
> + if (valp == NULL) {
> + kfree(buf);
> + return NULL;
> + }
> + } else if (len != 0) {
> + valp = buf + sizeof(struct nfs4_xattr_entry) + slen;
> + } else
> + valp = NULL;
> +
> + if (valp != NULL) {
> + if (value != NULL)
> + memcpy(valp, value, len);
> + else
> + _copy_from_pages(valp, pages, 0, len);
> + }
> +
> + entry->flags = flags;
> + entry->xattr_value = valp;
> + kref_init(&entry->ref);
> + entry->xattr_name = namep;
> + entry->xattr_size = len;
> + entry->bucket = NULL;
> + INIT_LIST_HEAD(&entry->lru);
> + INIT_LIST_HEAD(&entry->dispose);
> + INIT_HLIST_NODE(&entry->hnode);
> +
> + return entry;
> +}
> +
> +static void
> +nfs4_xattr_free_entry(struct nfs4_xattr_entry *entry)
> +{
> + if (entry->flags & NFS4_XATTR_ENTRY_EXTVAL)
> + kvfree(entry->xattr_value);
> + kfree(entry);
> +}
> +
> +static void
> +nfs4_xattr_free_entry_cb(struct kref *kref)
> +{
> + struct nfs4_xattr_entry *entry;
> +
> + entry = container_of(kref, struct nfs4_xattr_entry, ref);
> +
> + if (WARN_ON(!list_empty(&entry->lru)))
> + return;
> +
> + nfs4_xattr_free_entry(entry);
> +}
> +
> +static void
> +nfs4_xattr_free_cache_cb(struct kref *kref)
> +{
> + struct nfs4_xattr_cache *cache;
> + int i;
> +
> + cache = container_of(kref, struct nfs4_xattr_cache, ref);
> +
> + for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
> + if (WARN_ON(!hlist_empty(&cache->buckets[i].hlist)))
> + return;
> + cache->buckets[i].draining = false;
> + }
> +
> + cache->listxattr = NULL;
> +
> + kmem_cache_free(nfs4_xattr_cache_cachep, cache);
> +
> +}
> +
> +static struct nfs4_xattr_cache *
> +nfs4_xattr_alloc_cache(void)
> +{
> + struct nfs4_xattr_cache *cache;
> +
> + cache = kmem_cache_alloc(nfs4_xattr_cache_cachep,
> + GFP_KERNEL_ACCOUNT | GFP_NOFS);
> + if (cache == NULL)
> + return NULL;
> +
> + kref_init(&cache->ref);
> + atomic_long_set(&cache->nent, 0);
> +
> + return cache;
> +}
> +
> +/*
> + * Set the listxattr cache, which is a special-cased cache entry.
> + * The special value ERR_PTR(-ESTALE) is used to indicate that
> + * the cache is being drained - this prevents a new listxattr
> + * cache from being added to what is now a stale cache.
> + */
> +static int
> +nfs4_xattr_set_listcache(struct nfs4_xattr_cache *cache,
> + struct nfs4_xattr_entry *new)
> +{
> + struct nfs4_xattr_entry *old;
> + int ret = 1;
> +
> + spin_lock(&cache->listxattr_lock);
> +
> + old = cache->listxattr;
> +
> + if (old == ERR_PTR(-ESTALE)) {
> + ret = 0;
> + goto out;
> + }
> +
> + cache->listxattr = new;
> + if (new != NULL && new != ERR_PTR(-ESTALE))
> + nfs4_xattr_entry_lru_add(new);
> +
> + if (old != NULL) {
> + nfs4_xattr_entry_lru_del(old);
> + kref_put(&old->ref, nfs4_xattr_free_entry_cb);
> + }
> +out:
> + spin_unlock(&cache->listxattr_lock);
> +
> + return ret;
> +}
> +
> +/*
> + * Unlink a cache from its parent inode, clearing out an invalid
> + * cache. Must be called with i_lock held.
> + */
> +static struct nfs4_xattr_cache *
> +nfs4_xattr_cache_unlink(struct inode *inode)
> +{
> + struct nfs_inode *nfsi;
> + struct nfs4_xattr_cache *oldcache;
> +
> + nfsi = NFS_I(inode);
> +
> + oldcache = nfsi->xattr_cache;
> + if (oldcache != NULL) {
> + list_lru_del(&nfs4_xattr_cache_lru, &oldcache->lru);
> + oldcache->inode = NULL;
> + }
> + nfsi->xattr_cache = NULL;
> + nfsi->cache_validity &= ~NFS_INO_INVALID_XATTR;
> +
> + return oldcache;
> +
> +}
> +
> +/*
> + * Discard a cache. Usually called by a worker, since walking all
> + * the entries can take up some cycles that we don't want to waste
> + * in the I/O path. Can also be called from the shrinker callback.
> + *
> + * The cache is dead, it has already been unlinked from its inode,
> + * and no longer appears on the cache LRU list.
> + *
> + * Mark all buckets as draining, so that no new entries are added. This
> + * could still happen in the unlikely, but possible case that another
> + * thread had grabbed a reference before it was unlinked from the inode,
> + * and is still holding it for an add operation.
> + *
> + * Remove all entries from the LRU lists, so that there is no longer
> + * any way to 'find' this cache. Then, remove the entries from the hash
> + * table.
> + *
> + * At that point, the cache will remain empty and can be freed when the final
> + * reference drops, which is very likely the kref_put at the end of
> + * this function, or the one called immediately afterwards in the
> + * shrinker callback.
> + */
> +static void
> +nfs4_xattr_discard_cache(struct nfs4_xattr_cache *cache)
> +{
> + unsigned int i;
> + struct nfs4_xattr_entry *entry;
> + struct nfs4_xattr_bucket *bucket;
> + struct hlist_node *n;
> +
> + nfs4_xattr_set_listcache(cache, ERR_PTR(-ESTALE));
> +
> + for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
> + bucket = &cache->buckets[i];
> +
> + spin_lock(&bucket->lock);
> + bucket->draining = true;
> + hlist_for_each_entry_safe(entry, n, &bucket->hlist, hnode) {
> + nfs4_xattr_entry_lru_del(entry);
> + hlist_del_init(&entry->hnode);
> + kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> + }
> + spin_unlock(&bucket->lock);
> + }
> +
> + atomic_long_set(&cache->nent, 0);
> +
> + kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +}
> +
> +static void
> +nfs4_xattr_discard_cache_worker(struct work_struct *work)
> +{
> + struct nfs4_xattr_cache *cache = container_of(work,
> + struct nfs4_xattr_cache, work);
> +
> + nfs4_xattr_discard_cache(cache);
> +}
> +
> +static void
> +nfs4_xattr_reap_cache(struct nfs4_xattr_cache *cache)
> +{
> + queue_work(nfs4_xattr_cache_wq, &cache->work);
> +}
> +
> +/*
> + * Get a referenced copy of the cache structure. Avoid doing allocs
> + * while holding i_lock. Which means that we do some optimistic allocation,
> + * and might have to free the result in rare cases.
> + *
> + * This function only checks the NFS_INO_INVALID_XATTR cache validity bit
> + * and acts accordingly, replacing the cache when needed. For the read case
> + * (!add), this means that the caller must make sure that the cache
> + * is valid before caling this function. getxattr and listxattr call
> + * revalidate_inode to do this. The attribute cache timeout (for the
> + * non-delegated case) is expected to be dealt with in the revalidate
> + * call.
> + */
> +
> +static struct nfs4_xattr_cache *
> +nfs4_xattr_get_cache(struct inode *inode, int add)
> +{
> + struct nfs_inode *nfsi;
> + struct nfs4_xattr_cache *cache, *oldcache, *newcache;
> +
> + nfsi = NFS_I(inode);
> +
> + cache = oldcache = NULL;
> +
> + spin_lock(&inode->i_lock);
> +
> + if (nfsi->cache_validity & NFS_INO_INVALID_XATTR)
> + oldcache = nfs4_xattr_cache_unlink(inode);
> + else
> + cache = nfsi->xattr_cache;
> +
> + if (cache != NULL)
> + kref_get(&cache->ref);
> +
> + spin_unlock(&inode->i_lock);
> +
> + if (add && cache == NULL) {
> + newcache = NULL;
> +
> + cache = nfs4_xattr_alloc_cache();
> + if (cache == NULL)
> + goto out;
> +
> + spin_lock(&inode->i_lock);
> + if (nfsi->cache_validity & NFS_INO_INVALID_XATTR) {
> + /*
> + * The cache was invalidated again. Give up,
> + * since what we want to enter is now likely
> + * outdated anyway.
> + */
> + spin_unlock(&inode->i_lock);
> + kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> + cache = NULL;
> + goto out;
> + }
> +
> + /*
> + * Check if someone beat us to it.
> + */
> + if (nfsi->xattr_cache != NULL) {
> + newcache = nfsi->xattr_cache;
> + kref_get(&newcache->ref);
> + } else {
> + kref_get(&cache->ref);
> + nfsi->xattr_cache = cache;
> + cache->inode = inode;
> + list_lru_add(&nfs4_xattr_cache_lru, &cache->lru);
> + }
> +
> + spin_unlock(&inode->i_lock);
> +
> + /*
> + * If there was a race, throw away the cache we just
> + * allocated, and use the new one allocated by someone
> + * else.
> + */
> + if (newcache != NULL) {
> + kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> + cache = newcache;
> + }
> + }
> +
> +out:
> + /*
> + * Discarding an old cache is done via a workqueue.
> + */
> + if (oldcache != NULL)
> + nfs4_xattr_reap_cache(oldcache);
> +
> + return cache;
> +}
> +
> +static inline struct nfs4_xattr_bucket *
> +nfs4_xattr_hash_bucket(struct nfs4_xattr_cache *cache, const char *name)
> +{
> + return &cache->buckets[jhash(name, strlen(name), 0) &
> + (ARRAY_SIZE(cache->buckets) - 1)];
> +}
> +
> +static struct nfs4_xattr_entry *
> +nfs4_xattr_get_entry(struct nfs4_xattr_bucket *bucket, const char *name)
> +{
> + struct nfs4_xattr_entry *entry;
> +
> + entry = NULL;
> +
> + hlist_for_each_entry(entry, &bucket->hlist, hnode) {
> + if (!strcmp(entry->xattr_name, name))
> + break;
> + }
> +
> + return entry;
> +}
> +
> +static int
> +nfs4_xattr_hash_add(struct nfs4_xattr_cache *cache,
> + struct nfs4_xattr_entry *entry)
> +{
> + struct nfs4_xattr_bucket *bucket;
> + struct nfs4_xattr_entry *oldentry = NULL;
> + int ret = 1;
> +
> + bucket = nfs4_xattr_hash_bucket(cache, entry->xattr_name);
> + entry->bucket = bucket;
> +
> + spin_lock(&bucket->lock);
> +
> + if (bucket->draining) {
> + ret = 0;
> + goto out;
> + }
> +
> + oldentry = nfs4_xattr_get_entry(bucket, entry->xattr_name);
> + if (oldentry != NULL) {
> + hlist_del_init(&oldentry->hnode);
> + nfs4_xattr_entry_lru_del(oldentry);
> + } else {
> + atomic_long_inc(&cache->nent);
> + }
> +
> + hlist_add_head(&entry->hnode, &bucket->hlist);
> + nfs4_xattr_entry_lru_add(entry);
> +
> +out:
> + spin_unlock(&bucket->lock);
> +
> + if (oldentry != NULL)
> + kref_put(&oldentry->ref, nfs4_xattr_free_entry_cb);
> +
> + return ret;
> +}
> +
> +static void
> +nfs4_xattr_hash_remove(struct nfs4_xattr_cache *cache, const char *name)
> +{
> + struct nfs4_xattr_bucket *bucket;
> + struct nfs4_xattr_entry *entry;
> +
> + bucket = nfs4_xattr_hash_bucket(cache, name);
> +
> + spin_lock(&bucket->lock);
> +
> + entry = nfs4_xattr_get_entry(bucket, name);
> + if (entry != NULL) {
> + hlist_del_init(&entry->hnode);
> + nfs4_xattr_entry_lru_del(entry);
> + atomic_long_dec(&cache->nent);
> + }
> +
> + spin_unlock(&bucket->lock);
> +
> + if (entry != NULL)
> + kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +}
> +
> +static struct nfs4_xattr_entry *
> +nfs4_xattr_hash_find(struct nfs4_xattr_cache *cache, const char *name)
> +{
> + struct nfs4_xattr_bucket *bucket;
> + struct nfs4_xattr_entry *entry;
> +
> + bucket = nfs4_xattr_hash_bucket(cache, name);
> +
> + spin_lock(&bucket->lock);
> +
> + entry = nfs4_xattr_get_entry(bucket, name);
> + if (entry != NULL)
> + kref_get(&entry->ref);
> +
> + spin_unlock(&bucket->lock);
> +
> + return entry;
> +}
> +
> +/*
> + * Entry point to retrieve an entry from the cache.
> + */
> +ssize_t nfs4_xattr_cache_get(struct inode *inode, const char *name, char
> *buf,
> + ssize_t buflen)
> +{
> + struct nfs4_xattr_cache *cache;
> + struct nfs4_xattr_entry *entry;
> + ssize_t ret;
> +
> + cache = nfs4_xattr_get_cache(inode, 0);
> + if (cache == NULL)
> + return -ENOENT;
> +
> + ret = 0;
> + entry = nfs4_xattr_hash_find(cache, name);
> +
> + if (entry != NULL) {
> + dprintk("%s: cache hit '%s', len %lu\n", __func__,
> + entry->xattr_name, (unsigned long)entry->xattr_size);
> + if (buflen == 0) {
> + /* Length probe only */
> + ret = entry->xattr_size;
> + } else if (buflen < entry->xattr_size)
> + ret = -ERANGE;
> + else {
> + memcpy(buf, entry->xattr_value, entry->xattr_size);
> + ret = entry->xattr_size;
> + }
> + kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> + } else {
> + dprintk("%s: cache miss '%s'\n", __func__, name);
> + ret = -ENOENT;
> + }
> +
> + kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +
> + return ret;
> +}
> +
> +/*
> + * Retrieve a cached list of xattrs from the cache.
> + */
> +ssize_t nfs4_xattr_cache_list(struct inode *inode, char *buf, ssize_t buflen)
> +{
> + struct nfs4_xattr_cache *cache;
> + struct nfs4_xattr_entry *entry;
> + ssize_t ret;
> +
> + cache = nfs4_xattr_get_cache(inode, 0);
> + if (cache == NULL)
> + return -ENOENT;
> +
> + spin_lock(&cache->listxattr_lock);
> +
> + entry = cache->listxattr;
> +
> + if (entry != NULL && entry != ERR_PTR(-ESTALE)) {
> + if (buflen == 0) {
> + /* Length probe only */
> + ret = entry->xattr_size;
> + } else if (entry->xattr_size > buflen)
> + ret = -ERANGE;
> + else {
> + memcpy(buf, entry->xattr_value, entry->xattr_size);
> + ret = entry->xattr_size;
> + }
> + } else {
> + ret = -ENOENT;
> + }
> +
> + spin_unlock(&cache->listxattr_lock);
> +
> + kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +
> + return ret;
> +}
> +
> +/*
> + * Add an xattr to the cache.
> + *
> + * This also invalidates the xattr list cache.
> + */
> +void nfs4_xattr_cache_add(struct inode *inode, const char *name,
> + const char *buf, struct page **pages, ssize_t buflen)
> +{
> + struct nfs4_xattr_cache *cache;
> + struct nfs4_xattr_entry *entry;
> +
> + dprintk("%s: add '%s' len %lu\n", __func__,
> + name, (unsigned long)buflen);
> +
> + cache = nfs4_xattr_get_cache(inode, 1);
> + if (cache == NULL)
> + return;
> +
> + entry = nfs4_xattr_alloc_entry(name, buf, pages, buflen);
> + if (entry == NULL)
> + goto out;
> +
> + (void)nfs4_xattr_set_listcache(cache, NULL);
> +
> + if (!nfs4_xattr_hash_add(cache, entry))
> + kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +
> +out:
> + kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +}
> +
> +
> +/*
> + * Remove an xattr from the cache.
> + *
> + * This also invalidates the xattr list cache.
> + */
> +void nfs4_xattr_cache_remove(struct inode *inode, const char *name)
> +{
> + struct nfs4_xattr_cache *cache;
> +
> + dprintk("%s: remove '%s'\n", __func__, name);
> +
> + cache = nfs4_xattr_get_cache(inode, 0);
> + if (cache == NULL)
> + return;
> +
> + (void)nfs4_xattr_set_listcache(cache, NULL);
> + nfs4_xattr_hash_remove(cache, name);
> +
> + kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +}
> +
> +/*
> + * Cache listxattr output, replacing any possible old one.
> + */
> +void nfs4_xattr_cache_set_list(struct inode *inode, const char *buf,
> + ssize_t buflen)
> +{
> + struct nfs4_xattr_cache *cache;
> + struct nfs4_xattr_entry *entry;
> +
> + cache = nfs4_xattr_get_cache(inode, 1);
> + if (cache == NULL)
> + return;
> +
> + entry = nfs4_xattr_alloc_entry(NULL, buf, NULL, buflen);
> + if (entry == NULL)
> + goto out;
> +
> + /*
> + * This is just there to be able to get to bucket->cache,
> + * which is obviously the same for all buckets, so just
> + * use bucket 0.
> + */
> + entry->bucket = &cache->buckets[0];
> +
> + if (!nfs4_xattr_set_listcache(cache, entry))
> + kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +
> +out:
> + kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +}
> +
> +/*
> + * Zap the entire cache. Called when an inode is evicted.
> + */
> +void nfs4_xattr_cache_zap(struct inode *inode)
> +{
> + struct nfs4_xattr_cache *oldcache;
> +
> + spin_lock(&inode->i_lock);
> + oldcache = nfs4_xattr_cache_unlink(inode);
> + spin_unlock(&inode->i_lock);
> +
> + if (oldcache)
> + nfs4_xattr_discard_cache(oldcache);
> +}
> +
> +/*
> + * The entry LRU is shrunk more aggressively than the cache LRU,
> + * by settings @seeks to 1.
> + *
> + * Cache structures are freed only when they've become empty, after
> + * pruning all but one entry.
> + */
> +
> +static unsigned long nfs4_xattr_cache_count(struct shrinker *shrink,
> + struct shrink_control *sc);
> +static unsigned long nfs4_xattr_entry_count(struct shrinker *shrink,
> + struct shrink_control *sc);
> +static unsigned long nfs4_xattr_cache_scan(struct shrinker *shrink,
> + struct shrink_control *sc);
> +static unsigned long nfs4_xattr_entry_scan(struct shrinker *shrink,
> + struct shrink_control *sc);
> +
> +static struct shrinker nfs4_xattr_cache_shrinker = {
> + .count_objects = nfs4_xattr_cache_count,
> + .scan_objects = nfs4_xattr_cache_scan,
> + .seeks = DEFAULT_SEEKS,
> + .flags = SHRINKER_MEMCG_AWARE,
> +};
> +
> +static struct shrinker nfs4_xattr_entry_shrinker = {
> + .count_objects = nfs4_xattr_entry_count,
> + .scan_objects = nfs4_xattr_entry_scan,
> + .seeks = DEFAULT_SEEKS,
> + .batch = 512,
> + .flags = SHRINKER_MEMCG_AWARE,
> +};
> +
> +static struct shrinker nfs4_xattr_large_entry_shrinker = {
> + .count_objects = nfs4_xattr_entry_count,
> + .scan_objects = nfs4_xattr_entry_scan,
> + .seeks = 1,
> + .batch = 512,
> + .flags = SHRINKER_MEMCG_AWARE,
> +};
> +
> +static enum lru_status
> +cache_lru_isolate(struct list_head *item,
> + struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
> +{
> + struct list_head *dispose = arg;
> + struct inode *inode;
> + struct nfs4_xattr_cache *cache = container_of(item,
> + struct nfs4_xattr_cache, lru);
> +
> + if (atomic_long_read(&cache->nent) > 1)
> + return LRU_SKIP;
> +
> + /*
> + * If a cache structure is on the LRU list, we know that
> + * its inode is valid. Try to lock it to break the link.
> + * Since we're inverting the lock order here, only try.
> + */
> + inode = cache->inode;
> +
> + if (!spin_trylock(&inode->i_lock))
> + return LRU_SKIP;
> +
> + kref_get(&cache->ref);
> +
> + cache->inode = NULL;
> + NFS_I(inode)->xattr_cache = NULL;
> + NFS_I(inode)->cache_validity &= ~NFS_INO_INVALID_XATTR;
> + list_lru_isolate(lru, &cache->lru);
> +
> + spin_unlock(&inode->i_lock);
> +
> + list_add_tail(&cache->dispose, dispose);
> + return LRU_REMOVED;
> +}
> +
> +static unsigned long
> +nfs4_xattr_cache_scan(struct shrinker *shrink, struct shrink_control *sc)
> +{
> + LIST_HEAD(dispose);
> + unsigned long freed;
> + struct nfs4_xattr_cache *cache;
> +
> + freed = list_lru_shrink_walk(&nfs4_xattr_cache_lru, sc,
> + cache_lru_isolate, &dispose);
> + while (!list_empty(&dispose)) {
> + cache = list_first_entry(&dispose, struct nfs4_xattr_cache,
> + dispose);
> + list_del_init(&cache->dispose);
> + nfs4_xattr_discard_cache(cache);
> + kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> + }
> +
> + return freed;
> +}
> +
> +
> +static unsigned long
> +nfs4_xattr_cache_count(struct shrinker *shrink, struct shrink_control *sc)
> +{
> + unsigned long count;
> +
> + count = list_lru_count(&nfs4_xattr_cache_lru);
> + return vfs_pressure_ratio(count);
> +}
> +
> +static enum lru_status
> +entry_lru_isolate(struct list_head *item,
> + struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
> +{
> + struct list_head *dispose = arg;
> + struct nfs4_xattr_bucket *bucket;
> + struct nfs4_xattr_cache *cache;
> + struct nfs4_xattr_entry *entry = container_of(item,
> + struct nfs4_xattr_entry, lru);
> +
> + bucket = entry->bucket;
> + cache = bucket->cache;
> +
> + /*
> + * Unhook the entry from its parent (either a cache bucket
> + * or a cache structure if it's a listxattr buf), so that
> + * it's no longer found. Then add it to the isolate list,
> + * to be freed later.
> + *
> + * In both cases, we're reverting lock order, so use
> + * trylock and skip the entry if we can't get the lock.
> + */
> + if (entry->xattr_name != NULL) {
> + /* Regular cache entry */
> + if (!spin_trylock(&bucket->lock))
> + return LRU_SKIP;
> +
> + kref_get(&entry->ref);
> +
> + hlist_del_init(&entry->hnode);
> + atomic_long_dec(&cache->nent);
> + list_lru_isolate(lru, &entry->lru);
> +
> + spin_unlock(&bucket->lock);
> + } else {
> + /* Listxattr cache entry */
> + if (!spin_trylock(&cache->listxattr_lock))
> + return LRU_SKIP;
> +
> + kref_get(&entry->ref);
> +
> + cache->listxattr = NULL;
> + list_lru_isolate(lru, &entry->lru);
> +
> + spin_unlock(&cache->listxattr_lock);
> + }
> +
> + list_add_tail(&entry->dispose, dispose);
> + return LRU_REMOVED;
> +}
> +
> +static unsigned long
> +nfs4_xattr_entry_scan(struct shrinker *shrink, struct shrink_control *sc)
> +{
> + LIST_HEAD(dispose);
> + unsigned long freed;
> + struct nfs4_xattr_entry *entry;
> + struct list_lru *lru;
> +
> + lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
> + &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
> +
> + freed = list_lru_shrink_walk(lru, sc, entry_lru_isolate, &dispose);
> +
> + while (!list_empty(&dispose)) {
> + entry = list_first_entry(&dispose, struct nfs4_xattr_entry,
> + dispose);
> + list_del_init(&entry->dispose);
> +
> + /*
> + * Drop two references: the one that we just grabbed
> + * in entry_lru_isolate, and the one that was set
> + * when the entry was first allocated.
> + */
> + kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> + kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> + }
> +
> + return freed;
> +}
> +
> +static unsigned long
> +nfs4_xattr_entry_count(struct shrinker *shrink, struct shrink_control *sc)
> +{
> + unsigned long count;
> + struct list_lru *lru;
> +
> + lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
> + &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
> +
> + count = list_lru_count(lru);
> + return vfs_pressure_ratio(count);
> +}
> +
> +
> +static void nfs4_xattr_cache_init_once(void *p)
> +{
> + struct nfs4_xattr_cache *cache = (struct nfs4_xattr_cache *)p;
> +
> + spin_lock_init(&cache->listxattr_lock);
> + atomic_long_set(&cache->nent, 0);
> + nfs4_xattr_hash_init(cache);
> + cache->listxattr = NULL;
> + INIT_WORK(&cache->work, nfs4_xattr_discard_cache_worker);
> + INIT_LIST_HEAD(&cache->lru);
> + INIT_LIST_HEAD(&cache->dispose);
> +}
> +
> +int __init nfs4_xattr_cache_init(void)
> +{
> + int ret = 0;
> +
> + nfs4_xattr_cache_cachep = kmem_cache_create("nfs4_xattr_cache_cache",
> + sizeof(struct nfs4_xattr_cache), 0,
> + (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT),
> + nfs4_xattr_cache_init_once);
> + if (nfs4_xattr_cache_cachep == NULL)
> + return -ENOMEM;
> +
> + ret = list_lru_init_memcg(&nfs4_xattr_large_entry_lru,
> + &nfs4_xattr_large_entry_shrinker);
> + if (ret)
> + goto out4;
> +
> + ret = list_lru_init_memcg(&nfs4_xattr_entry_lru,
> + &nfs4_xattr_entry_shrinker);
> + if (ret)
> + goto out3;
> +
> + ret = list_lru_init_memcg(&nfs4_xattr_cache_lru,
> + &nfs4_xattr_cache_shrinker);
> + if (ret)
> + goto out2;
> +
> + nfs4_xattr_cache_wq = alloc_workqueue("nfs4_xattr", WQ_MEM_RECLAIM, 0);
> + if (nfs4_xattr_cache_wq == NULL)
> + goto out1;
> +
> + ret = register_shrinker(&nfs4_xattr_cache_shrinker);
> + if (ret)
> + goto out0;
> +
> + ret = register_shrinker(&nfs4_xattr_entry_shrinker);
> + if (ret)
> + goto out;
> +
> + ret = register_shrinker(&nfs4_xattr_large_entry_shrinker);
> + if (!ret)
> + return 0;
> +
> + unregister_shrinker(&nfs4_xattr_entry_shrinker);
> +out:
> + unregister_shrinker(&nfs4_xattr_cache_shrinker);
> +out0:
> + destroy_workqueue(nfs4_xattr_cache_wq);
> +out1:
> + list_lru_destroy(&nfs4_xattr_cache_lru);
> +out2:
> + list_lru_destroy(&nfs4_xattr_entry_lru);
> +out3:
> + list_lru_destroy(&nfs4_xattr_large_entry_lru);
> +out4:
> + kmem_cache_destroy(nfs4_xattr_cache_cachep);
> +
> + return ret;
> +}
> +
> +void nfs4_xattr_cache_exit(void)
> +{
> + unregister_shrinker(&nfs4_xattr_entry_shrinker);
> + unregister_shrinker(&nfs4_xattr_cache_shrinker);
> + list_lru_destroy(&nfs4_xattr_entry_lru);
> + list_lru_destroy(&nfs4_xattr_cache_lru);
> + kmem_cache_destroy(nfs4_xattr_cache_cachep);
> + destroy_workqueue(nfs4_xattr_cache_wq);
> +}
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 6df94857f5bb..079c1ac84cee 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -7459,6 +7459,7 @@ static int nfs4_xattr_set_nfs4_user(const struct
> xattr_handler *handler,
> size_t buflen, int flags)
> {
> struct nfs_access_entry cache;
> + int ret;
>
> if (!nfs_server_capable(inode, NFS_CAP_XATTR))
> return -EOPNOTSUPP;
> @@ -7477,10 +7478,17 @@ static int nfs4_xattr_set_nfs4_user(const struct
> xattr_handler *handler,
> return -EACCES;
> }
>
> - if (buf == NULL)
> - return nfs42_proc_removexattr(inode, key);
> - else
> - return nfs42_proc_setxattr(inode, key, buf, buflen, flags);
> + if (buf == NULL) {
> + ret = nfs42_proc_removexattr(inode, key);
> + if (!ret)
> + nfs4_xattr_cache_remove(inode, key);
> + } else {
> + ret = nfs42_proc_setxattr(inode, key, buf, buflen, flags);
> + if (!ret)
> + nfs4_xattr_cache_add(inode, key, buf, NULL, buflen);
> + }
> +
> + return ret;
> }
>
> static int nfs4_xattr_get_nfs4_user(const struct xattr_handler *handler,
> @@ -7488,6 +7496,7 @@ static int nfs4_xattr_get_nfs4_user(const struct
> xattr_handler *handler,
> const char *key, void *buf, size_t buflen)
> {
> struct nfs_access_entry cache;
> + ssize_t ret;
>
> if (!nfs_server_capable(inode, NFS_CAP_XATTR))
> return -EOPNOTSUPP;
> @@ -7497,7 +7506,17 @@ static int nfs4_xattr_get_nfs4_user(const struct
> xattr_handler *handler,
> return -EACCES;
> }
>
> - return nfs42_proc_getxattr(inode, key, buf, buflen);
> + ret = nfs_revalidate_inode(NFS_SERVER(inode), inode);
> + if (ret)
> + return ret;
> +
> + ret = nfs4_xattr_cache_get(inode, key, buf, buflen);
> + if (ret >= 0 || (ret < 0 && ret != -ENOENT))
> + return ret;
> +
> + ret = nfs42_proc_getxattr(inode, key, buf, buflen);
> +
> + return ret;
> }
>
> static ssize_t
> @@ -7505,7 +7524,7 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char
> *list, size_t list_len)
> {
> u64 cookie;
> bool eof;
> - int ret, size;
> + ssize_t ret, size;
> char *buf;
> size_t buflen;
> struct nfs_access_entry cache;
> @@ -7518,6 +7537,14 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char
> *list, size_t list_len)
> return 0;
> }
>
> + ret = nfs_revalidate_inode(NFS_SERVER(inode), inode);
> + if (ret)
> + return ret;
> +
> + ret = nfs4_xattr_cache_list(inode, list, list_len);
> + if (ret >= 0 || (ret < 0 && ret != -ENOENT))
> + return ret;
> +
> cookie = 0;
> eof = false;
> buflen = list_len ? list_len : XATTR_LIST_MAX;
> @@ -7537,6 +7564,9 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char
> *list, size_t list_len)
> size += ret;
> }
>
> + if (list_len)
> + nfs4_xattr_cache_set_list(inode, list, size);
> +
> return size;
> }
>
> diff --git a/fs/nfs/nfs4super.c b/fs/nfs/nfs4super.c
> index 1475f932d7da..0c1ab846b83d 100644
> --- a/fs/nfs/nfs4super.c
> +++ b/fs/nfs/nfs4super.c
> @@ -69,6 +69,7 @@ static void nfs4_evict_inode(struct inode *inode)
> pnfs_destroy_layout(NFS_I(inode));
> /* First call standard NFS clear_inode() code */
> nfs_clear_inode(inode);
> + nfs4_xattr_cache_zap(inode);
> }
>
> struct nfs_referral_count {
> @@ -268,6 +269,12 @@ static int __init init_nfs_v4(void)
> if (err)
> goto out1;
>
> +#ifdef CONFIG_NFS_V4_2
> + err = nfs4_xattr_cache_init();
> + if (err)
> + goto out2;
> +#endif
> +
> err = nfs4_register_sysctl();
> if (err)
> goto out2;
> @@ -288,6 +295,9 @@ static void __exit exit_nfs_v4(void)
> nfs4_pnfs_v3_ds_connect_unload();
>
> unregister_nfs_version(&nfs_v4);
> +#ifdef CONFIG_NFS_V4_2
> + nfs4_xattr_cache_exit();
> +#endif
> nfs4_unregister_sysctl();
> nfs_idmap_quit();
> nfs_dns_resolver_destroy();
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index 1fcfef670a4a..c08cc22d9c32 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -102,6 +102,8 @@ struct nfs_delegation;
>
> struct posix_acl;
>
> +struct nfs4_xattr_cache;
> +
> /*
> * nfs fs inode data in memory
> */
> @@ -188,6 +190,10 @@ struct nfs_inode {
> struct fscache_cookie *fscache;
> #endif
> struct inode vfs_inode;
> +
> +#ifdef CONFIG_NFS_V4_2
> + struct nfs4_xattr_cache *xattr_cache;
> +#endif
> };
>
> struct nfs4_copy_state {
> diff --git a/include/uapi/linux/nfs_fs.h b/include/uapi/linux/nfs_fs.h
> index 7bcc8cd6831d..3afe3767c55d 100644
> --- a/include/uapi/linux/nfs_fs.h
> +++ b/include/uapi/linux/nfs_fs.h
> @@ -56,6 +56,7 @@
> #define NFSDBG_PNFS 0x1000
> #define NFSDBG_PNFS_LD 0x2000
> #define NFSDBG_STATE 0x4000
> +#define NFSDBG_XATTRCACHE 0x8000
> #define NFSDBG_ALL 0xFFFF
>
>

2020-03-12 20:49:47

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 13/13] NFSv4.2: add client side xattr caching.

On Wed, 2020-03-11 at 19:56 +0000, Frank van der Linden wrote:
> Implement client side caching for NFSv4.2 extended attributes. The cache
> is a per-inode hashtable, with name/value entries. There is one special
> entry for the listxattr cache.
>
> NFS inodes have a pointer to a cache structure. The cache structure is
> allocated on demand, freed when the cache is invalidated.
>
> Memory shrinkers keep the size in check. Large entries (> PAGE_SIZE)
> are collected by a separate shrinker, and freed more aggressively
> than others.
>
> Signed-off-by: Frank van der Linden <[email protected]>
> ---
> fs/nfs/Makefile | 1 +
> fs/nfs/inode.c | 9 +-
> fs/nfs/internal.h | 20 +
> fs/nfs/nfs42proc.c | 12 +
> fs/nfs/nfs42xattr.c | 1083
> +++++++++++++++++++++++++++++++++++++++++++
> fs/nfs/nfs4proc.c | 42 +-
> fs/nfs/nfs4super.c | 10 +
> include/linux/nfs_fs.h | 6 +
> include/uapi/linux/nfs_fs.h | 1 +
> 9 files changed, 1177 insertions(+), 7 deletions(-)
> create mode 100644 fs/nfs/nfs42xattr.c
>
> diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
> index 2433c3e03cfa..191b3e9aa232 100644
> --- a/fs/nfs/Makefile
> +++ b/fs/nfs/Makefile
> @@ -31,6 +31,7 @@ nfsv4-$(CONFIG_NFS_USE_LEGACY_DNS) += cache_lib.o
> nfsv4-$(CONFIG_SYSCTL) += nfs4sysctl.o
> nfsv4-$(CONFIG_NFS_V4_1) += pnfs.o pnfs_dev.o pnfs_nfs.o
> nfsv4-$(CONFIG_NFS_V4_2) += nfs42proc.o
> +nfsv4-$(CONFIG_NFS_V4_2) += nfs42xattr.o

Oh, you should also be able to combine the two CONFIG_NFS_V4_2 lines here:
nfsv4-$(CONFIG_NFS_V4_2) += nfs42proc.o nfs42xattr.o

>
> obj-$(CONFIG_PNFS_FILE_LAYOUT) += filelayout/
> obj-$(CONFIG_PNFS_BLOCK) += blocklayout/
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index d2be152796ef..9d4952d2306b 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -194,6 +194,7 @@ bool nfs_check_cache_invalid(struct inode *inode, unsigned
> long flags)
>
> return nfs_check_cache_invalid_not_delegated(inode, flags);
> }
> +EXPORT_SYMBOL_GPL(nfs_check_cache_invalid);
>
> static void nfs_set_cache_invalid(struct inode *inode, unsigned long flags)
> {
> @@ -235,11 +236,13 @@ static void nfs_zap_caches_locked(struct inode *inode)
> | NFS_INO_INVALID_DATA
> | NFS_INO_INVALID_ACCESS
> | NFS_INO_INVALID_ACL
> + | NFS_INO_INVALID_XATTR
> | NFS_INO_REVAL_PAGECACHE);
> } else
> nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR
> | NFS_INO_INVALID_ACCESS
> | NFS_INO_INVALID_ACL
> + | NFS_INO_INVALID_XATTR
> | NFS_INO_REVAL_PAGECACHE);
> nfs_zap_label_cache_locked(nfsi);
> }
> @@ -1885,7 +1888,8 @@ static int nfs_update_inode(struct inode *inode, struct
> nfs_fattr *fattr)
> if (!(have_writers || have_delegation)) {
> invalid |= NFS_INO_INVALID_DATA
> | NFS_INO_INVALID_ACCESS
> - | NFS_INO_INVALID_ACL;
> + | NFS_INO_INVALID_ACL
> + | NFS_INO_INVALID_XATTR;
> /* Force revalidate of all attributes */
> save_cache_validity |= NFS_INO_INVALID_CTIME
> | NFS_INO_INVALID_MTIME
> @@ -2084,6 +2088,9 @@ struct inode *nfs_alloc_inode(struct super_block *sb)
> #if IS_ENABLED(CONFIG_NFS_V4)
> nfsi->nfs4_acl = NULL;
> #endif /* CONFIG_NFS_V4 */
> +#ifdef CONFIG_NFS_V4_2
> + nfsi->xattr_cache = NULL;
> +#endif
> return &nfsi->vfs_inode;
> }
> EXPORT_SYMBOL_GPL(nfs_alloc_inode);
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 1e3a7e119c93..67b8e4f7c554 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -575,6 +575,26 @@ extern void nfs4_test_session_trunk(struct rpc_clnt
> *clnt,
> struct rpc_xprt *xprt,
> void *data);
>
> +#ifdef CONFIG_NFS_V4_2
> +extern int __init nfs4_xattr_cache_init(void);
> +extern void nfs4_xattr_cache_exit(void);
> +extern void nfs4_xattr_cache_add(struct inode *inode, const char *name,
> + const char *buf, struct page **pages,
> + ssize_t buflen);
> +extern void nfs4_xattr_cache_remove(struct inode *inode, const char *name);
> +extern ssize_t nfs4_xattr_cache_get(struct inode *inode, const char *name,
> + char *buf, ssize_t buflen);
> +extern void nfs4_xattr_cache_set_list(struct inode *inode, const char *buf,
> + ssize_t buflen);
> +extern ssize_t nfs4_xattr_cache_list(struct inode *inode, char *buf,
> + ssize_t buflen);
> +extern void nfs4_xattr_cache_zap(struct inode *inode);
> +#else
> +static inline void nfs4_xattr_cache_zap(struct inode *inode)
> +{
> +}
> +#endif
> +
> static inline struct inode *nfs_igrab_and_active(struct inode *inode)
> {
> inode = igrab(inode);
> diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
> index 8c2e52bc986a..e200522469af 100644
> --- a/fs/nfs/nfs42proc.c
> +++ b/fs/nfs/nfs42proc.c
> @@ -1182,6 +1182,18 @@ static ssize_t _nfs42_proc_getxattr(struct inode
> *inode, const char *name,
> if (ret < 0)
> return ret;
>
> + /*
> + * Normally, the caching is done one layer up, but for successful
> + * RPCS, always cache the result here, even if the caller was
> + * just querying the length, or if the reply was too big for
> + * the caller. This avoids a second RPC in the case of the
> + * common query-alloc-retrieve cycle for xattrs.
> + *
> + * Note that xattr_len is always capped to XATTR_SIZE_MAX.
> + */
> +
> + nfs4_xattr_cache_add(inode, name, NULL, pages, res.xattr_len);
> +
> if (buflen) {
> if (res.xattr_len > buflen)
> return -ERANGE;
> diff --git a/fs/nfs/nfs42xattr.c b/fs/nfs/nfs42xattr.c
> new file mode 100644
> index 000000000000..23fdab977a2a
> --- /dev/null
> +++ b/fs/nfs/nfs42xattr.c
> @@ -0,0 +1,1083 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Copyright 2019, 2020 Amazon.com, Inc. or its affiliates. All rights
> reserved.
> + *
> + * User extended attribute client side cache functions.
> + *
> + * Author: Frank van der Linden <[email protected]>
> + */
> +#include <linux/errno.h>
> +#include <linux/nfs_fs.h>
> +#include <linux/hashtable.h>
> +#include <linux/refcount.h>
> +#include <uapi/linux/xattr.h>
> +
> +#include "nfs4_fs.h"
> +#include "internal.h"
> +
> +/*
> + * User extended attributes client side caching is implemented by having
> + * a cache structure attached to NFS inodes. This structure is allocated
> + * when needed, and freed when the cache is zapped.
> + *
> + * The cache structure contains as hash table of entries, and a pointer
> + * to a special-cased entry for the listxattr cache.
> + *
> + * Accessing and allocating / freeing the caches is done via reference
> + * counting. The cache entries use a similar refcounting scheme.
> + *
> + * This makes freeing a cache, both from the shrinker and from the
> + * zap cache path, easy. It also means that, in current use cases,
> + * the large majority of inodes will not waste any memory, as they
> + * will never have any user extended attributes assigned to them.
> + *
> + * Attribute entries are hashed in to a simple hash table. They are
> + * also part of an LRU.
> + *
> + * There are three shrinkers.
> + *
> + * Two shrinkers deal with the cache entries themselves: one for
> + * large entries (> PAGE_SIZE), and one for smaller entries. The
> + * shrinker for the larger entries works more aggressively than
> + * those for the smaller entries.
> + *
> + * The other shrinker frees the cache structures themselves.
> + */
> +
> +/*
> + * 64 buckets is a good default. There is likely no reasonable
> + * workload that uses more than even 64 user extended attributes.
> + * You can certainly add a lot more - but you get what you ask for
> + * in those circumstances.
> + */
> +#define NFS4_XATTR_HASH_SIZE 64
> +
> +#define NFSDBG_FACILITY NFSDBG_XATTRCACHE
> +
> +struct nfs4_xattr_cache;
> +struct nfs4_xattr_entry;
> +
> +struct nfs4_xattr_bucket {
> + spinlock_t lock;
> + struct hlist_head hlist;
> + struct nfs4_xattr_cache *cache;
> + bool draining;
> +};
> +
> +struct nfs4_xattr_cache {
> + struct kref ref;
> + spinlock_t hash_lock; /* protects hashtable and lru */
> + struct nfs4_xattr_bucket buckets[NFS4_XATTR_HASH_SIZE];
> + struct list_head lru;
> + struct list_head dispose;
> + atomic_long_t nent;
> + spinlock_t listxattr_lock;
> + struct inode *inode;
> + struct nfs4_xattr_entry *listxattr;
> + struct work_struct work;
> +};
> +
> +struct nfs4_xattr_entry {
> + struct kref ref;
> + struct hlist_node hnode;
> + struct list_head lru;
> + struct list_head dispose;
> + char *xattr_name;
> + void *xattr_value;
> + size_t xattr_size;
> + struct nfs4_xattr_bucket *bucket;
> + uint32_t flags;
> +};
> +
> +#define NFS4_XATTR_ENTRY_EXTVAL 0x0001
> +
> +/*
> + * LRU list of NFS inodes that have xattr caches.
> + */
> +static struct list_lru nfs4_xattr_cache_lru;
> +static struct list_lru nfs4_xattr_entry_lru;
> +static struct list_lru nfs4_xattr_large_entry_lru;
> +
> +static struct kmem_cache *nfs4_xattr_cache_cachep;
> +
> +static struct workqueue_struct *nfs4_xattr_cache_wq;
> +
> +/*
> + * Hashing helper functions.
> + */
> +static void
> +nfs4_xattr_hash_init(struct nfs4_xattr_cache *cache)
> +{
> + unsigned int i;
> +
> + for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
> + INIT_HLIST_HEAD(&cache->buckets[i].hlist);
> + spin_lock_init(&cache->buckets[i].lock);
> + cache->buckets[i].cache = cache;
> + cache->buckets[i].draining = false;
> + }
> +}
> +
> +/*
> + * Locking order:
> + * 1. inode i_lock or bucket lock
> + * 2. list_lru lock (taken by list_lru_* functions)
> + */
> +
> +/*
> + * Wrapper functions to add a cache entry to the right LRU.
> + */
> +static bool
> +nfs4_xattr_entry_lru_add(struct nfs4_xattr_entry *entry)
> +{
> + struct list_lru *lru;
> +
> + lru = (entry->flags & NFS4_XATTR_ENTRY_EXTVAL) ?
> + &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
> +
> + return list_lru_add(lru, &entry->lru);
> +}
> +
> +static bool
> +nfs4_xattr_entry_lru_del(struct nfs4_xattr_entry *entry)
> +{
> + struct list_lru *lru;
> +
> + lru = (entry->flags & NFS4_XATTR_ENTRY_EXTVAL) ?
> + &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
> +
> + return list_lru_del(lru, &entry->lru);
> +}
> +
> +/*
> + * This function allocates cache entries. They are the normal
> + * extended attribute name/value pairs, but may also be a listxattr
> + * cache. Those allocations use the same entry so that they can be
> + * treated as one by the memory shrinker.
> + *
> + * xattr cache entries are allocated together with names. If the
> + * value fits in to one page with the entry structure and the name,
> + * it will also be part of the same allocation (kmalloc). This is
> + * expected to be the vast majority of cases. Larger allocations
> + * have a value pointer that is allocated separately by kvmalloc.
> + *
> + * Parameters:
> + *
> + * @name: Name of the extended attribute. NULL for listxattr cache
> + * entry.
> + * @value: Value of attribute, or listxattr cache. NULL if the
> + * value is to be copied from pages instead.
> + * @pages: Pages to copy the value from, if not NULL. Passed in to
> + * make it easier to copy the value after an RPC, even if
> + * the value will not be passed up to application (e.g.
> + * for a 'query' getxattr with NULL buffer).
> + * @len: Length of the value. Can be 0 for zero-length attribues.
> + * @value and @pages will be NULL if @len is 0.
> + */
> +static struct nfs4_xattr_entry *
> +nfs4_xattr_alloc_entry(const char *name, const void *value,
> + struct page **pages, size_t len)
> +{
> + struct nfs4_xattr_entry *entry;
> + void *valp;
> + char *namep;
> + size_t alloclen, slen;
> + char *buf;
> + uint32_t flags;
> +
> + BUILD_BUG_ON(sizeof(struct nfs4_xattr_entry) +
> + XATTR_NAME_MAX + 1 > PAGE_SIZE);
> +
> + alloclen = sizeof(struct nfs4_xattr_entry);
> + if (name != NULL) {
> + slen = strlen(name) + 1;
> + alloclen += slen;
> + } else
> + slen = 0;
> +
> + if (alloclen + len <= PAGE_SIZE) {
> + alloclen += len;
> + flags = 0;
> + } else {
> + flags = NFS4_XATTR_ENTRY_EXTVAL;
> + }
> +
> + buf = kmalloc(alloclen, GFP_KERNEL_ACCOUNT | GFP_NOFS);
> + if (buf == NULL)
> + return NULL;
> + entry = (struct nfs4_xattr_entry *)buf;
> +
> + if (name != NULL) {
> + namep = buf + sizeof(struct nfs4_xattr_entry);
> + memcpy(namep, name, slen);
> + } else {
> + namep = NULL;
> + }
> +
> +
> + if (flags & NFS4_XATTR_ENTRY_EXTVAL) {
> + valp = kvmalloc(len, GFP_KERNEL_ACCOUNT | GFP_NOFS);
> + if (valp == NULL) {
> + kfree(buf);
> + return NULL;
> + }
> + } else if (len != 0) {
> + valp = buf + sizeof(struct nfs4_xattr_entry) + slen;
> + } else
> + valp = NULL;
> +
> + if (valp != NULL) {
> + if (value != NULL)
> + memcpy(valp, value, len);
> + else
> + _copy_from_pages(valp, pages, 0, len);
> + }
> +
> + entry->flags = flags;
> + entry->xattr_value = valp;
> + kref_init(&entry->ref);
> + entry->xattr_name = namep;
> + entry->xattr_size = len;
> + entry->bucket = NULL;
> + INIT_LIST_HEAD(&entry->lru);
> + INIT_LIST_HEAD(&entry->dispose);
> + INIT_HLIST_NODE(&entry->hnode);
> +
> + return entry;
> +}
> +
> +static void
> +nfs4_xattr_free_entry(struct nfs4_xattr_entry *entry)
> +{
> + if (entry->flags & NFS4_XATTR_ENTRY_EXTVAL)
> + kvfree(entry->xattr_value);
> + kfree(entry);
> +}
> +
> +static void
> +nfs4_xattr_free_entry_cb(struct kref *kref)
> +{
> + struct nfs4_xattr_entry *entry;
> +
> + entry = container_of(kref, struct nfs4_xattr_entry, ref);
> +
> + if (WARN_ON(!list_empty(&entry->lru)))
> + return;
> +
> + nfs4_xattr_free_entry(entry);
> +}
> +
> +static void
> +nfs4_xattr_free_cache_cb(struct kref *kref)
> +{
> + struct nfs4_xattr_cache *cache;
> + int i;
> +
> + cache = container_of(kref, struct nfs4_xattr_cache, ref);
> +
> + for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
> + if (WARN_ON(!hlist_empty(&cache->buckets[i].hlist)))
> + return;
> + cache->buckets[i].draining = false;
> + }
> +
> + cache->listxattr = NULL;
> +
> + kmem_cache_free(nfs4_xattr_cache_cachep, cache);
> +
> +}
> +
> +static struct nfs4_xattr_cache *
> +nfs4_xattr_alloc_cache(void)
> +{
> + struct nfs4_xattr_cache *cache;
> +
> + cache = kmem_cache_alloc(nfs4_xattr_cache_cachep,
> + GFP_KERNEL_ACCOUNT | GFP_NOFS);
> + if (cache == NULL)
> + return NULL;
> +
> + kref_init(&cache->ref);
> + atomic_long_set(&cache->nent, 0);
> +
> + return cache;
> +}
> +
> +/*
> + * Set the listxattr cache, which is a special-cased cache entry.
> + * The special value ERR_PTR(-ESTALE) is used to indicate that
> + * the cache is being drained - this prevents a new listxattr
> + * cache from being added to what is now a stale cache.
> + */
> +static int
> +nfs4_xattr_set_listcache(struct nfs4_xattr_cache *cache,
> + struct nfs4_xattr_entry *new)
> +{
> + struct nfs4_xattr_entry *old;
> + int ret = 1;
> +
> + spin_lock(&cache->listxattr_lock);
> +
> + old = cache->listxattr;
> +
> + if (old == ERR_PTR(-ESTALE)) {
> + ret = 0;
> + goto out;
> + }
> +
> + cache->listxattr = new;
> + if (new != NULL && new != ERR_PTR(-ESTALE))
> + nfs4_xattr_entry_lru_add(new);
> +
> + if (old != NULL) {
> + nfs4_xattr_entry_lru_del(old);
> + kref_put(&old->ref, nfs4_xattr_free_entry_cb);
> + }
> +out:
> + spin_unlock(&cache->listxattr_lock);
> +
> + return ret;
> +}
> +
> +/*
> + * Unlink a cache from its parent inode, clearing out an invalid
> + * cache. Must be called with i_lock held.
> + */
> +static struct nfs4_xattr_cache *
> +nfs4_xattr_cache_unlink(struct inode *inode)
> +{
> + struct nfs_inode *nfsi;
> + struct nfs4_xattr_cache *oldcache;
> +
> + nfsi = NFS_I(inode);
> +
> + oldcache = nfsi->xattr_cache;
> + if (oldcache != NULL) {
> + list_lru_del(&nfs4_xattr_cache_lru, &oldcache->lru);
> + oldcache->inode = NULL;
> + }
> + nfsi->xattr_cache = NULL;
> + nfsi->cache_validity &= ~NFS_INO_INVALID_XATTR;
> +
> + return oldcache;
> +
> +}
> +
> +/*
> + * Discard a cache. Usually called by a worker, since walking all
> + * the entries can take up some cycles that we don't want to waste
> + * in the I/O path. Can also be called from the shrinker callback.
> + *
> + * The cache is dead, it has already been unlinked from its inode,
> + * and no longer appears on the cache LRU list.
> + *
> + * Mark all buckets as draining, so that no new entries are added. This
> + * could still happen in the unlikely, but possible case that another
> + * thread had grabbed a reference before it was unlinked from the inode,
> + * and is still holding it for an add operation.
> + *
> + * Remove all entries from the LRU lists, so that there is no longer
> + * any way to 'find' this cache. Then, remove the entries from the hash
> + * table.
> + *
> + * At that point, the cache will remain empty and can be freed when the final
> + * reference drops, which is very likely the kref_put at the end of
> + * this function, or the one called immediately afterwards in the
> + * shrinker callback.
> + */
> +static void
> +nfs4_xattr_discard_cache(struct nfs4_xattr_cache *cache)
> +{
> + unsigned int i;
> + struct nfs4_xattr_entry *entry;
> + struct nfs4_xattr_bucket *bucket;
> + struct hlist_node *n;
> +
> + nfs4_xattr_set_listcache(cache, ERR_PTR(-ESTALE));
> +
> + for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
> + bucket = &cache->buckets[i];
> +
> + spin_lock(&bucket->lock);
> + bucket->draining = true;
> + hlist_for_each_entry_safe(entry, n, &bucket->hlist, hnode) {
> + nfs4_xattr_entry_lru_del(entry);
> + hlist_del_init(&entry->hnode);
> + kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> + }
> + spin_unlock(&bucket->lock);
> + }
> +
> + atomic_long_set(&cache->nent, 0);
> +
> + kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +}
> +
> +static void
> +nfs4_xattr_discard_cache_worker(struct work_struct *work)
> +{
> + struct nfs4_xattr_cache *cache = container_of(work,
> + struct nfs4_xattr_cache, work);
> +
> + nfs4_xattr_discard_cache(cache);
> +}
> +
> +static void
> +nfs4_xattr_reap_cache(struct nfs4_xattr_cache *cache)
> +{
> + queue_work(nfs4_xattr_cache_wq, &cache->work);
> +}
> +
> +/*
> + * Get a referenced copy of the cache structure. Avoid doing allocs
> + * while holding i_lock. Which means that we do some optimistic allocation,
> + * and might have to free the result in rare cases.
> + *
> + * This function only checks the NFS_INO_INVALID_XATTR cache validity bit
> + * and acts accordingly, replacing the cache when needed. For the read case
> + * (!add), this means that the caller must make sure that the cache
> + * is valid before caling this function. getxattr and listxattr call
> + * revalidate_inode to do this. The attribute cache timeout (for the
> + * non-delegated case) is expected to be dealt with in the revalidate
> + * call.
> + */
> +
> +static struct nfs4_xattr_cache *
> +nfs4_xattr_get_cache(struct inode *inode, int add)
> +{
> + struct nfs_inode *nfsi;
> + struct nfs4_xattr_cache *cache, *oldcache, *newcache;
> +
> + nfsi = NFS_I(inode);
> +
> + cache = oldcache = NULL;
> +
> + spin_lock(&inode->i_lock);
> +
> + if (nfsi->cache_validity & NFS_INO_INVALID_XATTR)
> + oldcache = nfs4_xattr_cache_unlink(inode);
> + else
> + cache = nfsi->xattr_cache;
> +
> + if (cache != NULL)
> + kref_get(&cache->ref);
> +
> + spin_unlock(&inode->i_lock);
> +
> + if (add && cache == NULL) {
> + newcache = NULL;
> +
> + cache = nfs4_xattr_alloc_cache();
> + if (cache == NULL)
> + goto out;
> +
> + spin_lock(&inode->i_lock);
> + if (nfsi->cache_validity & NFS_INO_INVALID_XATTR) {
> + /*
> + * The cache was invalidated again. Give up,
> + * since what we want to enter is now likely
> + * outdated anyway.
> + */
> + spin_unlock(&inode->i_lock);
> + kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> + cache = NULL;
> + goto out;
> + }
> +
> + /*
> + * Check if someone beat us to it.
> + */
> + if (nfsi->xattr_cache != NULL) {
> + newcache = nfsi->xattr_cache;
> + kref_get(&newcache->ref);
> + } else {
> + kref_get(&cache->ref);
> + nfsi->xattr_cache = cache;
> + cache->inode = inode;
> + list_lru_add(&nfs4_xattr_cache_lru, &cache->lru);
> + }
> +
> + spin_unlock(&inode->i_lock);
> +
> + /*
> + * If there was a race, throw away the cache we just
> + * allocated, and use the new one allocated by someone
> + * else.
> + */
> + if (newcache != NULL) {
> + kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> + cache = newcache;
> + }
> + }
> +
> +out:
> + /*
> + * Discarding an old cache is done via a workqueue.
> + */
> + if (oldcache != NULL)
> + nfs4_xattr_reap_cache(oldcache);
> +
> + return cache;
> +}
> +
> +static inline struct nfs4_xattr_bucket *
> +nfs4_xattr_hash_bucket(struct nfs4_xattr_cache *cache, const char *name)
> +{
> + return &cache->buckets[jhash(name, strlen(name), 0) &
> + (ARRAY_SIZE(cache->buckets) - 1)];
> +}
> +
> +static struct nfs4_xattr_entry *
> +nfs4_xattr_get_entry(struct nfs4_xattr_bucket *bucket, const char *name)
> +{
> + struct nfs4_xattr_entry *entry;
> +
> + entry = NULL;
> +
> + hlist_for_each_entry(entry, &bucket->hlist, hnode) {
> + if (!strcmp(entry->xattr_name, name))
> + break;
> + }
> +
> + return entry;
> +}
> +
> +static int
> +nfs4_xattr_hash_add(struct nfs4_xattr_cache *cache,
> + struct nfs4_xattr_entry *entry)
> +{
> + struct nfs4_xattr_bucket *bucket;
> + struct nfs4_xattr_entry *oldentry = NULL;
> + int ret = 1;
> +
> + bucket = nfs4_xattr_hash_bucket(cache, entry->xattr_name);
> + entry->bucket = bucket;
> +
> + spin_lock(&bucket->lock);
> +
> + if (bucket->draining) {
> + ret = 0;
> + goto out;
> + }
> +
> + oldentry = nfs4_xattr_get_entry(bucket, entry->xattr_name);
> + if (oldentry != NULL) {
> + hlist_del_init(&oldentry->hnode);
> + nfs4_xattr_entry_lru_del(oldentry);
> + } else {
> + atomic_long_inc(&cache->nent);
> + }
> +
> + hlist_add_head(&entry->hnode, &bucket->hlist);
> + nfs4_xattr_entry_lru_add(entry);
> +
> +out:
> + spin_unlock(&bucket->lock);
> +
> + if (oldentry != NULL)
> + kref_put(&oldentry->ref, nfs4_xattr_free_entry_cb);
> +
> + return ret;
> +}
> +
> +static void
> +nfs4_xattr_hash_remove(struct nfs4_xattr_cache *cache, const char *name)
> +{
> + struct nfs4_xattr_bucket *bucket;
> + struct nfs4_xattr_entry *entry;
> +
> + bucket = nfs4_xattr_hash_bucket(cache, name);
> +
> + spin_lock(&bucket->lock);
> +
> + entry = nfs4_xattr_get_entry(bucket, name);
> + if (entry != NULL) {
> + hlist_del_init(&entry->hnode);
> + nfs4_xattr_entry_lru_del(entry);
> + atomic_long_dec(&cache->nent);
> + }
> +
> + spin_unlock(&bucket->lock);
> +
> + if (entry != NULL)
> + kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +}
> +
> +static struct nfs4_xattr_entry *
> +nfs4_xattr_hash_find(struct nfs4_xattr_cache *cache, const char *name)
> +{
> + struct nfs4_xattr_bucket *bucket;
> + struct nfs4_xattr_entry *entry;
> +
> + bucket = nfs4_xattr_hash_bucket(cache, name);
> +
> + spin_lock(&bucket->lock);
> +
> + entry = nfs4_xattr_get_entry(bucket, name);
> + if (entry != NULL)
> + kref_get(&entry->ref);
> +
> + spin_unlock(&bucket->lock);
> +
> + return entry;
> +}
> +
> +/*
> + * Entry point to retrieve an entry from the cache.
> + */
> +ssize_t nfs4_xattr_cache_get(struct inode *inode, const char *name, char
> *buf,
> + ssize_t buflen)
> +{
> + struct nfs4_xattr_cache *cache;
> + struct nfs4_xattr_entry *entry;
> + ssize_t ret;
> +
> + cache = nfs4_xattr_get_cache(inode, 0);
> + if (cache == NULL)
> + return -ENOENT;
> +
> + ret = 0;
> + entry = nfs4_xattr_hash_find(cache, name);
> +
> + if (entry != NULL) {
> + dprintk("%s: cache hit '%s', len %lu\n", __func__,
> + entry->xattr_name, (unsigned long)entry->xattr_size);
> + if (buflen == 0) {
> + /* Length probe only */
> + ret = entry->xattr_size;
> + } else if (buflen < entry->xattr_size)
> + ret = -ERANGE;
> + else {
> + memcpy(buf, entry->xattr_value, entry->xattr_size);
> + ret = entry->xattr_size;
> + }
> + kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> + } else {
> + dprintk("%s: cache miss '%s'\n", __func__, name);
> + ret = -ENOENT;
> + }
> +
> + kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +
> + return ret;
> +}
> +
> +/*
> + * Retrieve a cached list of xattrs from the cache.
> + */
> +ssize_t nfs4_xattr_cache_list(struct inode *inode, char *buf, ssize_t buflen)
> +{
> + struct nfs4_xattr_cache *cache;
> + struct nfs4_xattr_entry *entry;
> + ssize_t ret;
> +
> + cache = nfs4_xattr_get_cache(inode, 0);
> + if (cache == NULL)
> + return -ENOENT;
> +
> + spin_lock(&cache->listxattr_lock);
> +
> + entry = cache->listxattr;
> +
> + if (entry != NULL && entry != ERR_PTR(-ESTALE)) {
> + if (buflen == 0) {
> + /* Length probe only */
> + ret = entry->xattr_size;
> + } else if (entry->xattr_size > buflen)
> + ret = -ERANGE;
> + else {
> + memcpy(buf, entry->xattr_value, entry->xattr_size);
> + ret = entry->xattr_size;
> + }
> + } else {
> + ret = -ENOENT;
> + }
> +
> + spin_unlock(&cache->listxattr_lock);
> +
> + kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +
> + return ret;
> +}
> +
> +/*
> + * Add an xattr to the cache.
> + *
> + * This also invalidates the xattr list cache.
> + */
> +void nfs4_xattr_cache_add(struct inode *inode, const char *name,
> + const char *buf, struct page **pages, ssize_t buflen)
> +{
> + struct nfs4_xattr_cache *cache;
> + struct nfs4_xattr_entry *entry;
> +
> + dprintk("%s: add '%s' len %lu\n", __func__,
> + name, (unsigned long)buflen);
> +
> + cache = nfs4_xattr_get_cache(inode, 1);
> + if (cache == NULL)
> + return;
> +
> + entry = nfs4_xattr_alloc_entry(name, buf, pages, buflen);
> + if (entry == NULL)
> + goto out;
> +
> + (void)nfs4_xattr_set_listcache(cache, NULL);
> +
> + if (!nfs4_xattr_hash_add(cache, entry))
> + kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +
> +out:
> + kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +}
> +
> +
> +/*
> + * Remove an xattr from the cache.
> + *
> + * This also invalidates the xattr list cache.
> + */
> +void nfs4_xattr_cache_remove(struct inode *inode, const char *name)
> +{
> + struct nfs4_xattr_cache *cache;
> +
> + dprintk("%s: remove '%s'\n", __func__, name);
> +
> + cache = nfs4_xattr_get_cache(inode, 0);
> + if (cache == NULL)
> + return;
> +
> + (void)nfs4_xattr_set_listcache(cache, NULL);
> + nfs4_xattr_hash_remove(cache, name);
> +
> + kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +}
> +
> +/*
> + * Cache listxattr output, replacing any possible old one.
> + */
> +void nfs4_xattr_cache_set_list(struct inode *inode, const char *buf,
> + ssize_t buflen)
> +{
> + struct nfs4_xattr_cache *cache;
> + struct nfs4_xattr_entry *entry;
> +
> + cache = nfs4_xattr_get_cache(inode, 1);
> + if (cache == NULL)
> + return;
> +
> + entry = nfs4_xattr_alloc_entry(NULL, buf, NULL, buflen);
> + if (entry == NULL)
> + goto out;
> +
> + /*
> + * This is just there to be able to get to bucket->cache,
> + * which is obviously the same for all buckets, so just
> + * use bucket 0.
> + */
> + entry->bucket = &cache->buckets[0];
> +
> + if (!nfs4_xattr_set_listcache(cache, entry))
> + kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +
> +out:
> + kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +}
> +
> +/*
> + * Zap the entire cache. Called when an inode is evicted.
> + */
> +void nfs4_xattr_cache_zap(struct inode *inode)
> +{
> + struct nfs4_xattr_cache *oldcache;
> +
> + spin_lock(&inode->i_lock);
> + oldcache = nfs4_xattr_cache_unlink(inode);
> + spin_unlock(&inode->i_lock);
> +
> + if (oldcache)
> + nfs4_xattr_discard_cache(oldcache);
> +}
> +
> +/*
> + * The entry LRU is shrunk more aggressively than the cache LRU,
> + * by settings @seeks to 1.
> + *
> + * Cache structures are freed only when they've become empty, after
> + * pruning all but one entry.
> + */
> +
> +static unsigned long nfs4_xattr_cache_count(struct shrinker *shrink,
> + struct shrink_control *sc);
> +static unsigned long nfs4_xattr_entry_count(struct shrinker *shrink,
> + struct shrink_control *sc);
> +static unsigned long nfs4_xattr_cache_scan(struct shrinker *shrink,
> + struct shrink_control *sc);
> +static unsigned long nfs4_xattr_entry_scan(struct shrinker *shrink,
> + struct shrink_control *sc);
> +
> +static struct shrinker nfs4_xattr_cache_shrinker = {
> + .count_objects = nfs4_xattr_cache_count,
> + .scan_objects = nfs4_xattr_cache_scan,
> + .seeks = DEFAULT_SEEKS,
> + .flags = SHRINKER_MEMCG_AWARE,
> +};
> +
> +static struct shrinker nfs4_xattr_entry_shrinker = {
> + .count_objects = nfs4_xattr_entry_count,
> + .scan_objects = nfs4_xattr_entry_scan,
> + .seeks = DEFAULT_SEEKS,
> + .batch = 512,
> + .flags = SHRINKER_MEMCG_AWARE,
> +};
> +
> +static struct shrinker nfs4_xattr_large_entry_shrinker = {
> + .count_objects = nfs4_xattr_entry_count,
> + .scan_objects = nfs4_xattr_entry_scan,
> + .seeks = 1,
> + .batch = 512,
> + .flags = SHRINKER_MEMCG_AWARE,
> +};
> +
> +static enum lru_status
> +cache_lru_isolate(struct list_head *item,
> + struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
> +{
> + struct list_head *dispose = arg;
> + struct inode *inode;
> + struct nfs4_xattr_cache *cache = container_of(item,
> + struct nfs4_xattr_cache, lru);
> +
> + if (atomic_long_read(&cache->nent) > 1)
> + return LRU_SKIP;
> +
> + /*
> + * If a cache structure is on the LRU list, we know that
> + * its inode is valid. Try to lock it to break the link.
> + * Since we're inverting the lock order here, only try.
> + */
> + inode = cache->inode;
> +
> + if (!spin_trylock(&inode->i_lock))
> + return LRU_SKIP;
> +
> + kref_get(&cache->ref);
> +
> + cache->inode = NULL;
> + NFS_I(inode)->xattr_cache = NULL;
> + NFS_I(inode)->cache_validity &= ~NFS_INO_INVALID_XATTR;
> + list_lru_isolate(lru, &cache->lru);
> +
> + spin_unlock(&inode->i_lock);
> +
> + list_add_tail(&cache->dispose, dispose);
> + return LRU_REMOVED;
> +}
> +
> +static unsigned long
> +nfs4_xattr_cache_scan(struct shrinker *shrink, struct shrink_control *sc)
> +{
> + LIST_HEAD(dispose);
> + unsigned long freed;
> + struct nfs4_xattr_cache *cache;
> +
> + freed = list_lru_shrink_walk(&nfs4_xattr_cache_lru, sc,
> + cache_lru_isolate, &dispose);
> + while (!list_empty(&dispose)) {
> + cache = list_first_entry(&dispose, struct nfs4_xattr_cache,
> + dispose);
> + list_del_init(&cache->dispose);
> + nfs4_xattr_discard_cache(cache);
> + kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> + }
> +
> + return freed;
> +}
> +
> +
> +static unsigned long
> +nfs4_xattr_cache_count(struct shrinker *shrink, struct shrink_control *sc)
> +{
> + unsigned long count;
> +
> + count = list_lru_count(&nfs4_xattr_cache_lru);
> + return vfs_pressure_ratio(count);
> +}
> +
> +static enum lru_status
> +entry_lru_isolate(struct list_head *item,
> + struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
> +{
> + struct list_head *dispose = arg;
> + struct nfs4_xattr_bucket *bucket;
> + struct nfs4_xattr_cache *cache;
> + struct nfs4_xattr_entry *entry = container_of(item,
> + struct nfs4_xattr_entry, lru);
> +
> + bucket = entry->bucket;
> + cache = bucket->cache;
> +
> + /*
> + * Unhook the entry from its parent (either a cache bucket
> + * or a cache structure if it's a listxattr buf), so that
> + * it's no longer found. Then add it to the isolate list,
> + * to be freed later.
> + *
> + * In both cases, we're reverting lock order, so use
> + * trylock and skip the entry if we can't get the lock.
> + */
> + if (entry->xattr_name != NULL) {
> + /* Regular cache entry */
> + if (!spin_trylock(&bucket->lock))
> + return LRU_SKIP;
> +
> + kref_get(&entry->ref);
> +
> + hlist_del_init(&entry->hnode);
> + atomic_long_dec(&cache->nent);
> + list_lru_isolate(lru, &entry->lru);
> +
> + spin_unlock(&bucket->lock);
> + } else {
> + /* Listxattr cache entry */
> + if (!spin_trylock(&cache->listxattr_lock))
> + return LRU_SKIP;
> +
> + kref_get(&entry->ref);
> +
> + cache->listxattr = NULL;
> + list_lru_isolate(lru, &entry->lru);
> +
> + spin_unlock(&cache->listxattr_lock);
> + }
> +
> + list_add_tail(&entry->dispose, dispose);
> + return LRU_REMOVED;
> +}
> +
> +static unsigned long
> +nfs4_xattr_entry_scan(struct shrinker *shrink, struct shrink_control *sc)
> +{
> + LIST_HEAD(dispose);
> + unsigned long freed;
> + struct nfs4_xattr_entry *entry;
> + struct list_lru *lru;
> +
> + lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
> + &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
> +
> + freed = list_lru_shrink_walk(lru, sc, entry_lru_isolate, &dispose);
> +
> + while (!list_empty(&dispose)) {
> + entry = list_first_entry(&dispose, struct nfs4_xattr_entry,
> + dispose);
> + list_del_init(&entry->dispose);
> +
> + /*
> + * Drop two references: the one that we just grabbed
> + * in entry_lru_isolate, and the one that was set
> + * when the entry was first allocated.
> + */
> + kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> + kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> + }
> +
> + return freed;
> +}
> +
> +static unsigned long
> +nfs4_xattr_entry_count(struct shrinker *shrink, struct shrink_control *sc)
> +{
> + unsigned long count;
> + struct list_lru *lru;
> +
> + lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
> + &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
> +
> + count = list_lru_count(lru);
> + return vfs_pressure_ratio(count);
> +}
> +
> +
> +static void nfs4_xattr_cache_init_once(void *p)
> +{
> + struct nfs4_xattr_cache *cache = (struct nfs4_xattr_cache *)p;
> +
> + spin_lock_init(&cache->listxattr_lock);
> + atomic_long_set(&cache->nent, 0);
> + nfs4_xattr_hash_init(cache);
> + cache->listxattr = NULL;
> + INIT_WORK(&cache->work, nfs4_xattr_discard_cache_worker);
> + INIT_LIST_HEAD(&cache->lru);
> + INIT_LIST_HEAD(&cache->dispose);
> +}
> +
> +int __init nfs4_xattr_cache_init(void)
> +{
> + int ret = 0;
> +
> + nfs4_xattr_cache_cachep = kmem_cache_create("nfs4_xattr_cache_cache",
> + sizeof(struct nfs4_xattr_cache), 0,
> + (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT),
> + nfs4_xattr_cache_init_once);
> + if (nfs4_xattr_cache_cachep == NULL)
> + return -ENOMEM;
> +
> + ret = list_lru_init_memcg(&nfs4_xattr_large_entry_lru,
> + &nfs4_xattr_large_entry_shrinker);
> + if (ret)
> + goto out4;
> +
> + ret = list_lru_init_memcg(&nfs4_xattr_entry_lru,
> + &nfs4_xattr_entry_shrinker);
> + if (ret)
> + goto out3;
> +
> + ret = list_lru_init_memcg(&nfs4_xattr_cache_lru,
> + &nfs4_xattr_cache_shrinker);
> + if (ret)
> + goto out2;
> +
> + nfs4_xattr_cache_wq = alloc_workqueue("nfs4_xattr", WQ_MEM_RECLAIM, 0);
> + if (nfs4_xattr_cache_wq == NULL)
> + goto out1;
> +
> + ret = register_shrinker(&nfs4_xattr_cache_shrinker);
> + if (ret)
> + goto out0;
> +
> + ret = register_shrinker(&nfs4_xattr_entry_shrinker);
> + if (ret)
> + goto out;
> +
> + ret = register_shrinker(&nfs4_xattr_large_entry_shrinker);
> + if (!ret)
> + return 0;
> +
> + unregister_shrinker(&nfs4_xattr_entry_shrinker);
> +out:
> + unregister_shrinker(&nfs4_xattr_cache_shrinker);
> +out0:
> + destroy_workqueue(nfs4_xattr_cache_wq);
> +out1:
> + list_lru_destroy(&nfs4_xattr_cache_lru);
> +out2:
> + list_lru_destroy(&nfs4_xattr_entry_lru);
> +out3:
> + list_lru_destroy(&nfs4_xattr_large_entry_lru);
> +out4:
> + kmem_cache_destroy(nfs4_xattr_cache_cachep);
> +
> + return ret;
> +}
> +
> +void nfs4_xattr_cache_exit(void)
> +{
> + unregister_shrinker(&nfs4_xattr_entry_shrinker);
> + unregister_shrinker(&nfs4_xattr_cache_shrinker);
> + list_lru_destroy(&nfs4_xattr_entry_lru);
> + list_lru_destroy(&nfs4_xattr_cache_lru);
> + kmem_cache_destroy(nfs4_xattr_cache_cachep);
> + destroy_workqueue(nfs4_xattr_cache_wq);
> +}
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 6df94857f5bb..079c1ac84cee 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -7459,6 +7459,7 @@ static int nfs4_xattr_set_nfs4_user(const struct
> xattr_handler *handler,
> size_t buflen, int flags)
> {
> struct nfs_access_entry cache;
> + int ret;
>
> if (!nfs_server_capable(inode, NFS_CAP_XATTR))
> return -EOPNOTSUPP;
> @@ -7477,10 +7478,17 @@ static int nfs4_xattr_set_nfs4_user(const struct
> xattr_handler *handler,
> return -EACCES;
> }
>
> - if (buf == NULL)
> - return nfs42_proc_removexattr(inode, key);
> - else
> - return nfs42_proc_setxattr(inode, key, buf, buflen, flags);
> + if (buf == NULL) {
> + ret = nfs42_proc_removexattr(inode, key);
> + if (!ret)
> + nfs4_xattr_cache_remove(inode, key);
> + } else {
> + ret = nfs42_proc_setxattr(inode, key, buf, buflen, flags);
> + if (!ret)
> + nfs4_xattr_cache_add(inode, key, buf, NULL, buflen);
> + }
> +
> + return ret;
> }
>
> static int nfs4_xattr_get_nfs4_user(const struct xattr_handler *handler,
> @@ -7488,6 +7496,7 @@ static int nfs4_xattr_get_nfs4_user(const struct
> xattr_handler *handler,
> const char *key, void *buf, size_t buflen)
> {
> struct nfs_access_entry cache;
> + ssize_t ret;
>
> if (!nfs_server_capable(inode, NFS_CAP_XATTR))
> return -EOPNOTSUPP;
> @@ -7497,7 +7506,17 @@ static int nfs4_xattr_get_nfs4_user(const struct
> xattr_handler *handler,
> return -EACCES;
> }
>
> - return nfs42_proc_getxattr(inode, key, buf, buflen);
> + ret = nfs_revalidate_inode(NFS_SERVER(inode), inode);
> + if (ret)
> + return ret;
> +
> + ret = nfs4_xattr_cache_get(inode, key, buf, buflen);
> + if (ret >= 0 || (ret < 0 && ret != -ENOENT))
> + return ret;
> +
> + ret = nfs42_proc_getxattr(inode, key, buf, buflen);
> +
> + return ret;
> }
>
> static ssize_t
> @@ -7505,7 +7524,7 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char
> *list, size_t list_len)
> {
> u64 cookie;
> bool eof;
> - int ret, size;
> + ssize_t ret, size;
> char *buf;
> size_t buflen;
> struct nfs_access_entry cache;
> @@ -7518,6 +7537,14 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char
> *list, size_t list_len)
> return 0;
> }
>
> + ret = nfs_revalidate_inode(NFS_SERVER(inode), inode);
> + if (ret)
> + return ret;
> +
> + ret = nfs4_xattr_cache_list(inode, list, list_len);
> + if (ret >= 0 || (ret < 0 && ret != -ENOENT))
> + return ret;
> +
> cookie = 0;
> eof = false;
> buflen = list_len ? list_len : XATTR_LIST_MAX;
> @@ -7537,6 +7564,9 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char
> *list, size_t list_len)
> size += ret;
> }
>
> + if (list_len)
> + nfs4_xattr_cache_set_list(inode, list, size);
> +
> return size;
> }
>
> diff --git a/fs/nfs/nfs4super.c b/fs/nfs/nfs4super.c
> index 1475f932d7da..0c1ab846b83d 100644
> --- a/fs/nfs/nfs4super.c
> +++ b/fs/nfs/nfs4super.c
> @@ -69,6 +69,7 @@ static void nfs4_evict_inode(struct inode *inode)
> pnfs_destroy_layout(NFS_I(inode));
> /* First call standard NFS clear_inode() code */
> nfs_clear_inode(inode);
> + nfs4_xattr_cache_zap(inode);
> }
>
> struct nfs_referral_count {
> @@ -268,6 +269,12 @@ static int __init init_nfs_v4(void)
> if (err)
> goto out1;
>
> +#ifdef CONFIG_NFS_V4_2
> + err = nfs4_xattr_cache_init();
> + if (err)
> + goto out2;
> +#endif
> +
> err = nfs4_register_sysctl();
> if (err)
> goto out2;
> @@ -288,6 +295,9 @@ static void __exit exit_nfs_v4(void)
> nfs4_pnfs_v3_ds_connect_unload();
>
> unregister_nfs_version(&nfs_v4);
> +#ifdef CONFIG_NFS_V4_2
> + nfs4_xattr_cache_exit();
> +#endif
> nfs4_unregister_sysctl();
> nfs_idmap_quit();
> nfs_dns_resolver_destroy();
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index 1fcfef670a4a..c08cc22d9c32 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -102,6 +102,8 @@ struct nfs_delegation;
>
> struct posix_acl;
>
> +struct nfs4_xattr_cache;
> +
> /*
> * nfs fs inode data in memory
> */
> @@ -188,6 +190,10 @@ struct nfs_inode {
> struct fscache_cookie *fscache;
> #endif
> struct inode vfs_inode;
> +
> +#ifdef CONFIG_NFS_V4_2
> + struct nfs4_xattr_cache *xattr_cache;
> +#endif
> };
>
> struct nfs4_copy_state {
> diff --git a/include/uapi/linux/nfs_fs.h b/include/uapi/linux/nfs_fs.h
> index 7bcc8cd6831d..3afe3767c55d 100644
> --- a/include/uapi/linux/nfs_fs.h
> +++ b/include/uapi/linux/nfs_fs.h
> @@ -56,6 +56,7 @@
> #define NFSDBG_PNFS 0x1000
> #define NFSDBG_PNFS_LD 0x2000
> #define NFSDBG_STATE 0x4000
> +#define NFSDBG_XATTRCACHE 0x8000
> #define NFSDBG_ALL 0xFFFF
>
>

2020-03-12 20:50:23

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH 05/13] NFSv4.2: add client side XDR handling for extended attributes

Hi Frank,

On Wed, 2020-03-11 at 19:56 +0000, Frank van der Linden wrote:
> Define the argument and response structures that will be used for
> RFC 8276 extended attribute RPC calls, and implement the necessary
> functions to encode/decode the extended attribute operations.
>
> Signed-off-by: Frank van der Linden <[email protected]>
> ---
> fs/nfs/nfs42xdr.c | 368
> ++++++++++++++++++++++++++++++++++++++++++++++++
> fs/nfs/nfs4xdr.c | 6 +
> include/linux/nfs_xdr.h | 59 +++++++-
> 3 files changed, 432 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
> index 6712daa9d85b..ac45a1027523 100644
> --- a/fs/nfs/nfs42xdr.c
> +++ b/fs/nfs/nfs42xdr.c
> @@ -407,6 +407,212 @@ static void encode_layouterror(struct xdr_stream *xdr,
> encode_device_error(xdr, &args->errors[0]);
> }
>
> +#ifdef CONFIG_NFS_V4_2

This file is only compiled when CONFIG_NFS_V4_2=y, so there is no need for the
#ifdef here.

Anna

> +static void encode_setxattr(struct xdr_stream *xdr,
> + const struct nfs42_setxattrargs *arg,
> + struct compound_hdr *hdr)
> +{
> + __be32 *p;
> +
> + BUILD_BUG_ON(XATTR_CREATE != SETXATTR4_CREATE);
> + BUILD_BUG_ON(XATTR_REPLACE != SETXATTR4_REPLACE);
> +
> + encode_op_hdr(xdr, OP_SETXATTR, decode_setxattr_maxsz, hdr);
> + p = reserve_space(xdr, 4);
> + *p = cpu_to_be32(arg->xattr_flags);
> + encode_string(xdr, strlen(arg->xattr_name), arg->xattr_name);
> + p = reserve_space(xdr, 4);
> + *p = cpu_to_be32(arg->xattr_len);
> + if (arg->xattr_len)
> + xdr_write_pages(xdr, arg->xattr_pages, 0, arg->xattr_len);
> +}
> +
> +static int decode_setxattr(struct xdr_stream *xdr,
> + struct nfs4_change_info *cinfo)
> +{
> + int status;
> +
> + status = decode_op_hdr(xdr, OP_SETXATTR);
> + if (status)
> + goto out;
> + status = decode_change_info(xdr, cinfo);
> +out:
> + return status;
> +}
> +
> +
> +static void encode_getxattr(struct xdr_stream *xdr, const char *name,
> + struct compound_hdr *hdr)
> +{
> + encode_op_hdr(xdr, OP_GETXATTR, decode_getxattr_maxsz, hdr);
> + encode_string(xdr, strlen(name), name);
> +}
> +
> +static int decode_getxattr(struct xdr_stream *xdr,
> + struct nfs42_getxattrres *res,
> + struct rpc_rqst *req)
> +{
> + int status;
> + __be32 *p;
> + u32 len, rdlen;
> +
> + status = decode_op_hdr(xdr, OP_GETXATTR);
> + if (status)
> + return status;
> +
> + p = xdr_inline_decode(xdr, 4);
> + if (unlikely(!p))
> + return -EIO;
> +
> + len = be32_to_cpup(p);
> + if (len > req->rq_rcv_buf.page_len)
> + return -ERANGE;
> +
> + res->xattr_len = len;
> +
> + if (len > 0) {
> + rdlen = xdr_read_pages(xdr, len);
> + if (rdlen < len)
> + return -EIO;
> + }
> +
> + return 0;
> +}
> +
> +static void encode_removexattr(struct xdr_stream *xdr, const char *name,
> + struct compound_hdr *hdr)
> +{
> + encode_op_hdr(xdr, OP_REMOVEXATTR, decode_removexattr_maxsz, hdr);
> + encode_string(xdr, strlen(name), name);
> +}
> +
> +
> +static int decode_removexattr(struct xdr_stream *xdr,
> + struct nfs4_change_info *cinfo)
> +{
> + int status;
> +
> + status = decode_op_hdr(xdr, OP_REMOVEXATTR);
> + if (status)
> + goto out;
> +
> + status = decode_change_info(xdr, cinfo);
> +out:
> + return status;
> +}
> +
> +static void encode_listxattrs(struct xdr_stream *xdr,
> + const struct nfs42_listxattrsargs *arg,
> + struct compound_hdr *hdr)
> +{
> + __be32 *p;
> +
> + encode_op_hdr(xdr, OP_LISTXATTRS, decode_listxattrs_maxsz + 1, hdr);
> +
> + p = reserve_space(xdr, 12);
> + if (unlikely(!p))
> + return;
> +
> + p = xdr_encode_hyper(p, arg->cookie);
> + /*
> + * RFC 8276 says to specify the full max length of the LISTXATTRS
> + * XDR reply. Count is set to the XDR length of the names array
> + * plus the EOF marker. So, add the cookie and the names count.
> + */
> + *p = cpu_to_be32(arg->count + 8 + 4);
> +}
> +
> +static int decode_listxattrs(struct xdr_stream *xdr,
> + struct nfs42_listxattrsres *res)
> +{
> + int status;
> + __be32 *p;
> + u32 count, len, ulen;
> + size_t left, copied;
> + char *buf;
> +
> + status = decode_op_hdr(xdr, OP_LISTXATTRS);
> + if (status) {
> + /*
> + * Special case: for LISTXATTRS, NFS4ERR_TOOSMALL
> + * should be translated to ERANGE.
> + */
> + if (status == -ETOOSMALL)
> + status = -ERANGE;
> + goto out;
> + }
> +
> + p = xdr_inline_decode(xdr, 8);
> + if (unlikely(!p))
> + return -EIO;
> +
> + xdr_decode_hyper(p, &res->cookie);
> +
> + p = xdr_inline_decode(xdr, 4);
> + if (unlikely(!p))
> + return -EIO;
> +
> + left = res->xattr_len;
> + buf = res->xattr_buf;
> +
> + count = be32_to_cpup(p);
> + copied = 0;
> +
> + /*
> + * We have asked for enough room to encode the maximum number
> + * of possible attribute names, so everything should fit.
> + *
> + * But, don't rely on that assumption. Just decode entries
> + * until they don't fit anymore, just in case the server did
> + * something odd.
> + */
> + while (count--) {
> + p = xdr_inline_decode(xdr, 4);
> + if (unlikely(!p))
> + return -EIO;
> +
> + len = be32_to_cpup(p);
> + if (len > (XATTR_NAME_MAX - XATTR_USER_PREFIX_LEN)) {
> + status = -ERANGE;
> + goto out;
> + }
> +
> + p = xdr_inline_decode(xdr, len);
> + if (unlikely(!p))
> + return -EIO;
> +
> + ulen = len + XATTR_USER_PREFIX_LEN + 1;
> + if (buf) {
> + if (ulen > left) {
> + status = -ERANGE;
> + goto out;
> + }
> +
> + memcpy(buf, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN);
> + memcpy(buf + XATTR_USER_PREFIX_LEN, p, len);
> +
> + buf[ulen - 1] = 0;
> + buf += ulen;
> + left -= ulen;
> + }
> + copied += ulen;
> + }
> +
> + p = xdr_inline_decode(xdr, 4);
> + if (unlikely(!p))
> + return -EIO;
> +
> + res->eof = be32_to_cpup(p);
> + res->copied = copied;
> +
> +out:
> + if (status == -ERANGE && res->xattr_len == XATTR_LIST_MAX)
> + status = -E2BIG;
> +
> + return status;
> +}
> +#endif
> +
> /*
> * Encode ALLOCATE request
> */
> @@ -1062,4 +1268,166 @@ static int nfs4_xdr_dec_layouterror(struct rpc_rqst
> *rqstp,
> return status;
> }
>
> +#ifdef CONFIG_NFS_V4_2
> +static void nfs4_xdr_enc_setxattr(struct rpc_rqst *req, struct xdr_stream
> *xdr,
> + const void *data)
> +{
> + const struct nfs42_setxattrargs *args = data;
> + struct compound_hdr hdr = {
> + .minorversion = nfs4_xdr_minorversion(&args->seq_args),
> + };
> +
> + encode_compound_hdr(xdr, req, &hdr);
> + encode_sequence(xdr, &args->seq_args, &hdr);
> + encode_putfh(xdr, args->fh, &hdr);
> + encode_setxattr(xdr, args, &hdr);
> + encode_nops(&hdr);
> +}
> +
> +static int nfs4_xdr_dec_setxattr(struct rpc_rqst *req, struct xdr_stream
> *xdr,
> + void *data)
> +{
> + struct nfs42_setxattrres *res = data;
> + struct compound_hdr hdr;
> + int status;
> +
> + status = decode_compound_hdr(xdr, &hdr);
> + if (status)
> + goto out;
> + status = decode_sequence(xdr, &res->seq_res, req);
> + if (status)
> + goto out;
> + status = decode_putfh(xdr);
> + if (status)
> + goto out;
> +
> + status = decode_setxattr(xdr, &res->cinfo);
> +out:
> + return status;
> +}
> +
> +static void nfs4_xdr_enc_getxattr(struct rpc_rqst *req, struct xdr_stream
> *xdr,
> + const void *data)
> +{
> + const struct nfs42_getxattrargs *args = data;
> + struct compound_hdr hdr = {
> + .minorversion = nfs4_xdr_minorversion(&args->seq_args),
> + };
> + size_t plen;
> +
> + encode_compound_hdr(xdr, req, &hdr);
> + encode_sequence(xdr, &args->seq_args, &hdr);
> + encode_putfh(xdr, args->fh, &hdr);
> + encode_getxattr(xdr, args->xattr_name, &hdr);
> +
> + plen = args->xattr_len ? args->xattr_len : XATTR_SIZE_MAX;
> +
> + rpc_prepare_reply_pages(req, args->xattr_pages, 0, plen,
> + hdr.replen);
> + req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
> +
> + encode_nops(&hdr);
> +}
> +
> +static int nfs4_xdr_dec_getxattr(struct rpc_rqst *rqstp,
> + struct xdr_stream *xdr, void *data)
> +{
> + struct nfs42_getxattrres *res = data;
> + struct compound_hdr hdr;
> + int status;
> +
> + status = decode_compound_hdr(xdr, &hdr);
> + if (status)
> + goto out;
> + status = decode_sequence(xdr, &res->seq_res, rqstp);
> + if (status)
> + goto out;
> + status = decode_putfh(xdr);
> + if (status)
> + goto out;
> + status = decode_getxattr(xdr, res, rqstp);
> +out:
> + return status;
> +}
> +
> +static void nfs4_xdr_enc_listxattrs(struct rpc_rqst *req,
> + struct xdr_stream *xdr, const void *data)
> +{
> + const struct nfs42_listxattrsargs *args = data;
> + struct compound_hdr hdr = {
> + .minorversion = nfs4_xdr_minorversion(&args->seq_args),
> + };
> +
> + encode_compound_hdr(xdr, req, &hdr);
> + encode_sequence(xdr, &args->seq_args, &hdr);
> + encode_putfh(xdr, args->fh, &hdr);
> + encode_listxattrs(xdr, args, &hdr);
> +
> + rpc_prepare_reply_pages(req, args->xattr_pages, 0, args->count,
> + hdr.replen);
> + req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
> +
> + encode_nops(&hdr);
> +}
> +
> +static int nfs4_xdr_dec_listxattrs(struct rpc_rqst *rqstp,
> + struct xdr_stream *xdr, void *data)
> +{
> + struct nfs42_listxattrsres *res = data;
> + struct compound_hdr hdr;
> + int status;
> +
> + xdr_set_scratch_buffer(xdr, page_address(res->scratch), PAGE_SIZE);
> +
> + status = decode_compound_hdr(xdr, &hdr);
> + if (status)
> + goto out;
> + status = decode_sequence(xdr, &res->seq_res, rqstp);
> + if (status)
> + goto out;
> + status = decode_putfh(xdr);
> + if (status)
> + goto out;
> + status = decode_listxattrs(xdr, res);
> +out:
> + return status;
> +}
> +
> +static void nfs4_xdr_enc_removexattr(struct rpc_rqst *req,
> + struct xdr_stream *xdr, const void *data)
> +{
> + const struct nfs42_removexattrargs *args = data;
> + struct compound_hdr hdr = {
> + .minorversion = nfs4_xdr_minorversion(&args->seq_args),
> + };
> +
> + encode_compound_hdr(xdr, req, &hdr);
> + encode_sequence(xdr, &args->seq_args, &hdr);
> + encode_putfh(xdr, args->fh, &hdr);
> + encode_removexattr(xdr, args->xattr_name, &hdr);
> + encode_nops(&hdr);
> +}
> +
> +static int nfs4_xdr_dec_removexattr(struct rpc_rqst *req,
> + struct xdr_stream *xdr, void *data)
> +{
> + struct nfs42_removexattrres *res = data;
> + struct compound_hdr hdr;
> + int status;
> +
> + status = decode_compound_hdr(xdr, &hdr);
> + if (status)
> + goto out;
> + status = decode_sequence(xdr, &res->seq_res, req);
> + if (status)
> + goto out;
> + status = decode_putfh(xdr);
> + if (status)
> + goto out;
> +
> + status = decode_removexattr(xdr, &res->cinfo);
> +out:
> + return status;
> +}
> +#endif
> #endif /* __LINUX_FS_NFS_NFS4_2XDR_H */
> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> index bebc087a1433..9e1ced791789 100644
> --- a/fs/nfs/nfs4xdr.c
> +++ b/fs/nfs/nfs4xdr.c
> @@ -7479,6 +7479,8 @@ static struct {
> { NFS4ERR_SYMLINK, -ELOOP },
> { NFS4ERR_OP_ILLEGAL, -EOPNOTSUPP },
> { NFS4ERR_DEADLOCK, -EDEADLK },
> + { NFS4ERR_NOXATTR, -ENODATA },
> + { NFS4ERR_XATTR2BIG, -E2BIG },
> { -1, -EIO }
> };
>
> @@ -7607,6 +7609,10 @@ const struct rpc_procinfo nfs4_procedures[] = {
> PROC42(COPY_NOTIFY, enc_copy_notify, dec_copy_notify),
> PROC(LOOKUPP, enc_lookupp, dec_lookupp),
> PROC42(LAYOUTERROR, enc_layouterror, dec_layouterror),
> + PROC42(GETXATTR, enc_getxattr, dec_getxattr),
> + PROC42(SETXATTR, enc_setxattr, dec_setxattr),
> + PROC42(LISTXATTRS, enc_listxattrs, dec_listxattrs),
> + PROC42(REMOVEXATTR, enc_removexattr, dec_removexattr),
> };
>
> static unsigned int nfs_version4_counts[ARRAY_SIZE(nfs4_procedures)];
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index 5076fe42c693..685deed805e8 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1497,7 +1497,64 @@ struct nfs42_seek_res {
> u32 sr_eof;
> u64 sr_offset;
> };
> -#endif
> +
> +struct nfs42_setxattrargs {
> + struct nfs4_sequence_args seq_args;
> + struct nfs_fh *fh;
> + const char *xattr_name;
> + u32 xattr_flags;
> + size_t xattr_len;
> + struct page **xattr_pages;
> +};
> +
> +struct nfs42_setxattrres {
> + struct nfs4_sequence_res seq_res;
> + struct nfs4_change_info cinfo;
> +};
> +
> +struct nfs42_getxattrargs {
> + struct nfs4_sequence_args seq_args;
> + struct nfs_fh *fh;
> + const char *xattr_name;
> + size_t xattr_len;
> + struct page **xattr_pages;
> +};
> +
> +struct nfs42_getxattrres {
> + struct nfs4_sequence_res seq_res;
> + size_t xattr_len;
> +};
> +
> +struct nfs42_listxattrsargs {
> + struct nfs4_sequence_args seq_args;
> + struct nfs_fh *fh;
> + u32 count;
> + u64 cookie;
> + struct page **xattr_pages;
> +};
> +
> +struct nfs42_listxattrsres {
> + struct nfs4_sequence_res seq_res;
> + struct page *scratch;
> + void *xattr_buf;
> + size_t xattr_len;
> + u64 cookie;
> + bool eof;
> + size_t copied;
> +};
> +
> +struct nfs42_removexattrargs {
> + struct nfs4_sequence_args seq_args;
> + struct nfs_fh *fh;
> + const char *xattr_name;
> +};
> +
> +struct nfs42_removexattrres {
> + struct nfs4_sequence_res seq_res;
> + struct nfs4_change_info cinfo;
> +};
> +
> +#endif /* CONFIG_NFS_V4_2 */
>
> struct nfs_page;
>

2020-03-16 15:52:10

by Frank van der Linden

[permalink] [raw]
Subject: Re: [PATCH 00/13] client side user xattr (RFC8276) support

Hi Anna,

On Thu, Mar 12, 2020 at 04:09:51PM -0400, Anna Schumaker wrote:
> I'm curious if you've tried xfstests with your patches? There are a
> handful of tests using xattrs that might be good to check with, too:
>
> anna@gouda % grep xattr -l tests/generic/[0-9][0-9][0-9]
> tests/generic/037
> tests/generic/062
> tests/generic/066
> tests/generic/093
> tests/generic/117
> tests/generic/337
> tests/generic/377
> tests/generic/403
> tests/generic/425
> tests/generic/454
> tests/generic/489
> tests/generic/523
> tests/generic/529
> tests/generic/556
>
> Thanks,
> Anna

I did run xfstests, and it looks fine, meaning that the differences between
actual and expected output are expected. I can rerun the xattr
specific ones and post the report. Will do that with v2.

- Frank

2020-03-17 23:04:35

by Frank van der Linden

[permalink] [raw]
Subject: Re: [PATCH 00/13] client side user xattr (RFC8276) support

On Thu, Mar 12, 2020 at 04:09:51PM -0400, Anna Schumaker wrote:
> I'm curious if you've tried xfstests with your patches? There are a
> handful of tests using xattrs that might be good to check with, too:
>
> anna@gouda % grep xattr -l tests/generic/[0-9][0-9][0-9]
> tests/generic/037
> tests/generic/062
> tests/generic/066
> tests/generic/093
> tests/generic/117
> tests/generic/337
> tests/generic/377
> tests/generic/403
> tests/generic/425
> tests/generic/454
> tests/generic/489
> tests/generic/523
> tests/generic/529
> tests/generic/556
>
> Thanks,
> Anna

I ran did a "check -nfs -g quick" run of xfstests-dev. The following tests
were applicable to extended attributes:

generic/020 fail Doesn't compute MAX_ATTR right for NFS, passes
after fixing that.
generic/037 pass
generic/062 fail It unconditionally expects the "system" and
"trusted" namespaces to be there too, not
easily fixed.
generic/066 pass
generic/093 fail Capabilities use the "security" namespace, can't
work on NFS.
generic/097 fail "trusted" namespace explicitly used, can't work
on NFS.
generic/103 fail fallocate fails on NFS, not xattr related
generic/117 pass
generic/377 fail Doesn't expect the "system.nfs4acl" attribute to
show up in listxattr. Can be fixed by filtering
out only "user" namespace xattrs.
generic/403 fail Uses the "trusted" namespace, but does not really
need it. Works if converted to the "user" namespace.
generic/454 pass
generic/523 pass


In other words, there were no problems with the patches themselves, but
xfstests will need some work to work properly.

I can send a few simple fixes in for xfstests, but a few need a bit more
work, specifically the ones that expected certain xattr namespaces to be
there. Right now there is a "_require_attr" check function, that probably
needs to be split up in to "_require_attr_user, _require_attr_system", etc
functions, which is a bit more work.

- Frank

2020-03-19 14:40:27

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 00/13] client side user xattr (RFC8276) support

On Tue, Mar 17, 2020 at 11:03:39PM +0000, Frank van der Linden wrote:
> On Thu, Mar 12, 2020 at 04:09:51PM -0400, Anna Schumaker wrote:
> > I'm curious if you've tried xfstests with your patches? There are a
> > handful of tests using xattrs that might be good to check with, too:
> >
> > anna@gouda % grep xattr -l tests/generic/[0-9][0-9][0-9]
> > tests/generic/037
> > tests/generic/062
> > tests/generic/066
> > tests/generic/093
> > tests/generic/117
> > tests/generic/337
> > tests/generic/377
> > tests/generic/403
> > tests/generic/425
> > tests/generic/454
> > tests/generic/489
> > tests/generic/523
> > tests/generic/529
> > tests/generic/556
> >
> > Thanks,
> > Anna
>
> I ran did a "check -nfs -g quick" run of xfstests-dev. The following tests
> were applicable to extended attributes:
>
> generic/020 fail Doesn't compute MAX_ATTR right for NFS, passes
> after fixing that.
> generic/037 pass
> generic/062 fail It unconditionally expects the "system" and
> "trusted" namespaces to be there too, not
> easily fixed.
> generic/066 pass
> generic/093 fail Capabilities use the "security" namespace, can't
> work on NFS.
> generic/097 fail "trusted" namespace explicitly used, can't work
> on NFS.
> generic/103 fail fallocate fails on NFS, not xattr related
> generic/117 pass
> generic/377 fail Doesn't expect the "system.nfs4acl" attribute to
> show up in listxattr. Can be fixed by filtering
> out only "user" namespace xattrs.
> generic/403 fail Uses the "trusted" namespace, but does not really
> need it. Works if converted to the "user" namespace.
> generic/454 pass
> generic/523 pass
>
>
> In other words, there were no problems with the patches themselves, but
> xfstests will need some work to work properly.
>
> I can send a few simple fixes in for xfstests, but a few need a bit more
> work, specifically the ones that expected certain xattr namespaces to be
> there. Right now there is a "_require_attr" check function, that probably
> needs to be split up in to "_require_attr_user, _require_attr_system", etc
> functions, which is a bit more work.

I just took a quick look at common/attr and all I see in _require_attrs
is:

attr -s "user.xfstests" -V "attr" $TEST_DIR/syscalltest

What am I missing?

--b.