2020-03-25 23:11:30

by Frank van der Linden

[permalink] [raw]
Subject: [PATCH v2 00/13] NFS client user xattr (RFC8276) support

v1 is here: https://www.spinics.net/lists/linux-nfs/msg76740.html

v2:

* Move nfs4 specific definitions to nfs_fs4.h
* Squash some patches together to avoid unused function warnings
when bisecting.
* Made determining server support a two-step process. First,
the extended attribute FATTR is verified to be supported, then
the value of the attributed is queried in fsinfo to determine
support.
* Fixed up Makefile to remove an unneeded extra line.

This was tested as before (using my own stress tests), but also
with xfstests-dev. No issues were found, but xfstests needs some
fixes to correctly run the applicable xattr tests on NFS. I
have those changes, but need to send them out.

I also tested stress-ng-xattr with 1000 workers on the client
side, running for 8 hours without problems (except for noting
that the session tbl_lock can become quite hot when doing
NFS operations by 1000 threads, but that's a separate issue).


Frank van der Linden (13):
nfs,nfsd: NFSv4.2 extended attribute protocol definitions
nfs: add client side only definitions for user xattrs
NFSv4.2: define limits and sizes for user xattr handling
NFSv4.2: query the server for extended attribute support
NFSv4.2: add client side XDR handling for extended attributes
nfs: define nfs_access_get_cached function
NFSv4.2: query the extended attribute access bits
nfs: modify update_changeattr to deal with regular files
nfs: define and use the NFS_INO_INVALID_XATTR flag
nfs: make the buf_to_pages_noslab function available to the nfs code
NFSv4.2: add the extended attribute proc functions.
NFSv4.2: hook in the user extended attribute handlers
NFSv4.2: add client side xattr caching.

fs/nfs/Makefile | 2 +-
fs/nfs/client.c | 22 +-
fs/nfs/dir.c | 24 +-
fs/nfs/inode.c | 16 +-
fs/nfs/nfs42.h | 24 +
fs/nfs/nfs42proc.c | 248 ++++++++
fs/nfs/nfs42xattr.c | 1083 +++++++++++++++++++++++++++++++++++
fs/nfs/nfs42xdr.c | 438 ++++++++++++++
fs/nfs/nfs4_fs.h | 35 ++
fs/nfs/nfs4client.c | 31 +
fs/nfs/nfs4proc.c | 237 +++++++-
fs/nfs/nfs4super.c | 10 +
fs/nfs/nfs4xdr.c | 31 +
fs/nfs/nfstrace.h | 3 +-
include/linux/nfs4.h | 25 +
include/linux/nfs_fs.h | 12 +
include/linux/nfs_fs_sb.h | 6 +
include/linux/nfs_xdr.h | 60 +-
include/uapi/linux/nfs4.h | 3 +
include/uapi/linux/nfs_fs.h | 1 +
20 files changed, 2269 insertions(+), 42 deletions(-)
create mode 100644 fs/nfs/nfs42xattr.c

--
2.17.2


2020-03-25 23:11:41

by Frank van der Linden

[permalink] [raw]
Subject: [PATCH v2 06/13] nfs: define nfs_access_get_cached function

The only consumer of nfs_access_get_cached_rcu and nfs_access_cached
calls these static functions in order to first try RCU access, and
then locked access.

Combine them in to a single function, and call that. Make this function
available to the rest of the NFS code.

Signed-off-by: Frank van der Linden <[email protected]>
---
fs/nfs/dir.c | 20 ++++++++++++++++----
include/linux/nfs_fs.h | 2 ++
2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 193d6fb363b7..a0b564781e3e 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2433,7 +2433,7 @@ static struct nfs_access_entry *nfs_access_search_rbtree(struct inode *inode, co
return NULL;
}

-static int nfs_access_get_cached(struct inode *inode, const struct cred *cred, struct nfs_access_entry *res, bool may_block)
+static int nfs_access_get_cached_locked(struct inode *inode, const struct cred *cred, struct nfs_access_entry *res, bool may_block)
{
struct nfs_inode *nfsi = NFS_I(inode);
struct nfs_access_entry *cache;
@@ -2506,6 +2506,20 @@ static int nfs_access_get_cached_rcu(struct inode *inode, const struct cred *cre
return err;
}

+int nfs_access_get_cached(struct inode *inode, const struct cred *cred, struct
+nfs_access_entry *res, bool may_block)
+{
+ int status;
+
+ status = nfs_access_get_cached_rcu(inode, cred, res);
+ if (status != 0)
+ status = nfs_access_get_cached_locked(inode, cred, res,
+ may_block);
+
+ return status;
+}
+EXPORT_SYMBOL_GPL(nfs_access_get_cached);
+
static void nfs_access_add_rbtree(struct inode *inode, struct nfs_access_entry *set)
{
struct nfs_inode *nfsi = NFS_I(inode);
@@ -2620,9 +2634,7 @@ static int nfs_do_access(struct inode *inode, const struct cred *cred, int mask)

trace_nfs_access_enter(inode);

- status = nfs_access_get_cached_rcu(inode, cred, &cache);
- if (status != 0)
- status = nfs_access_get_cached(inode, cred, &cache, may_block);
+ status = nfs_access_get_cached(inode, cred, &cache, may_block);
if (status == 0)
goto out_cached;

diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 442458e94ab5..e86e7a747092 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -491,6 +491,8 @@ extern int nfs_instantiate(struct dentry *dentry, struct nfs_fh *fh,
struct nfs_fattr *fattr, struct nfs4_label *label);
extern int nfs_may_open(struct inode *inode, const struct cred *cred, int openflags);
extern void nfs_access_zap_cache(struct inode *inode);
+extern int nfs_access_get_cached(struct inode *inode, const struct cred *cred, struct nfs_access_entry *res,
+ bool may_block);

/*
* linux/fs/nfs/symlink.c
--
2.17.2

2020-03-25 23:11:42

by Frank van der Linden

[permalink] [raw]
Subject: [PATCH v2 09/13] nfs: define and use the NFS_INO_INVALID_XATTR flag

Define the NFS_INO_INVALID_XATTR flag, to be used for the NFSv4.2 xattr
cache, and use it where appropriate.

No functional change as yet.

Signed-off-by: Frank van der Linden <[email protected]>
---
fs/nfs/inode.c | 7 ++++++-
fs/nfs/nfs4proc.c | 3 ++-
fs/nfs/nfstrace.h | 3 ++-
include/linux/nfs_fs.h | 1 +
4 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 11bf15800ac9..d2be152796ef 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -205,7 +205,8 @@ static void nfs_set_cache_invalid(struct inode *inode, unsigned long flags)
flags &= ~NFS_INO_INVALID_OTHER;
flags &= ~(NFS_INO_INVALID_CHANGE
| NFS_INO_INVALID_SIZE
- | NFS_INO_REVAL_PAGECACHE);
+ | NFS_INO_REVAL_PAGECACHE
+ | NFS_INO_INVALID_XATTR);
}

if (inode->i_mapping->nrpages == 0)
@@ -535,6 +536,8 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr, st
inode->i_gid = fattr->gid;
else if (nfs_server_capable(inode, NFS_CAP_OWNER_GROUP))
nfs_set_cache_invalid(inode, NFS_INO_INVALID_OTHER);
+ if (nfs_server_capable(inode, NFS_CAP_XATTR))
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_XATTR);
if (fattr->valid & NFS_ATTR_FATTR_BLOCKS_USED)
inode->i_blocks = fattr->du.nfs2.blocks;
if (fattr->valid & NFS_ATTR_FATTR_SPACE_USED) {
@@ -1365,6 +1368,8 @@ static void nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr *fattr)
inode_set_iversion_raw(inode, fattr->change_attr);
if (S_ISDIR(inode->i_mode))
nfs_set_cache_invalid(inode, NFS_INO_INVALID_DATA);
+ else if (nfs_server_capable(inode, NFS_CAP_XATTR))
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_XATTR);
}
/* If we have atomic WCC data, we may update some attributes */
ts = inode->i_ctime;
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 0bb2241028f0..441bdbb7f247 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1183,7 +1183,8 @@ nfs4_update_changeattr_locked(struct inode *inode,

if (cinfo->before != inode_peek_iversion_raw(inode))
nfsi->cache_validity |= NFS_INO_INVALID_ACCESS |
- NFS_INO_INVALID_ACL;
+ NFS_INO_INVALID_ACL |
+ NFS_INO_INVALID_XATTR;
}
inode_set_iversion_raw(inode, cinfo->after);
nfsi->read_cache_jiffies = timestamp;
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index a9588d19a5ae..b941ad1199e5 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -59,7 +59,8 @@ TRACE_DEFINE_ENUM(NFS_INO_INVALID_OTHER);
{ NFS_INO_INVALID_CTIME, "INVALID_CTIME" }, \
{ NFS_INO_INVALID_MTIME, "INVALID_MTIME" }, \
{ NFS_INO_INVALID_SIZE, "INVALID_SIZE" }, \
- { NFS_INO_INVALID_OTHER, "INVALID_OTHER" })
+ { NFS_INO_INVALID_OTHER, "INVALID_OTHER" }, \
+ { NFS_INO_INVALID_XATTR, "INVALID_XATTR" })

TRACE_DEFINE_ENUM(NFS_INO_ADVISE_RDPLUS);
TRACE_DEFINE_ENUM(NFS_INO_STALE);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index e86e7a747092..1fcfef670a4a 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -233,6 +233,7 @@ struct nfs4_copy_state {
#define NFS_INO_INVALID_OTHER BIT(12) /* other attrs are invalid */
#define NFS_INO_DATA_INVAL_DEFER \
BIT(13) /* Deferred cache invalidation */
+#define NFS_INO_INVALID_XATTR BIT(14) /* xattrs are invalid */

#define NFS_INO_INVALID_ATTR (NFS_INO_INVALID_CHANGE \
| NFS_INO_INVALID_CTIME \
--
2.17.2

2020-03-25 23:11:43

by Frank van der Linden

[permalink] [raw]
Subject: [PATCH v2 13/13] NFSv4.2: add client side xattr caching.

Implement client side caching for NFSv4.2 extended attributes. The cache
is a per-inode hashtable, with name/value entries. There is one special
entry for the listxattr cache.

NFS inodes have a pointer to a cache structure. The cache structure is
allocated on demand, freed when the cache is invalidated.

Memory shrinkers keep the size in check. Large entries (> PAGE_SIZE)
are collected by a separate shrinker, and freed more aggressively
than others.

Signed-off-by: Frank van der Linden <[email protected]>
---
fs/nfs/Makefile | 2 +-
fs/nfs/inode.c | 9 +-
fs/nfs/nfs42proc.c | 12 +
fs/nfs/nfs42xattr.c | 1083 +++++++++++++++++++++++++++++++++++
fs/nfs/nfs4_fs.h | 22 +
fs/nfs/nfs4proc.c | 42 +-
fs/nfs/nfs4super.c | 10 +
include/linux/nfs_fs.h | 6 +
include/uapi/linux/nfs_fs.h | 1 +
9 files changed, 1179 insertions(+), 8 deletions(-)
create mode 100644 fs/nfs/nfs42xattr.c

diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 2433c3e03cfa..22d11fdc6deb 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -30,7 +30,7 @@ nfsv4-y := nfs4proc.o nfs4xdr.o nfs4state.o nfs4renewd.o nfs4super.o nfs4file.o
nfsv4-$(CONFIG_NFS_USE_LEGACY_DNS) += cache_lib.o
nfsv4-$(CONFIG_SYSCTL) += nfs4sysctl.o
nfsv4-$(CONFIG_NFS_V4_1) += pnfs.o pnfs_dev.o pnfs_nfs.o
-nfsv4-$(CONFIG_NFS_V4_2) += nfs42proc.o
+nfsv4-$(CONFIG_NFS_V4_2) += nfs42proc.o nfs42xattr.o

obj-$(CONFIG_PNFS_FILE_LAYOUT) += filelayout/
obj-$(CONFIG_PNFS_BLOCK) += blocklayout/
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index d2be152796ef..9d4952d2306b 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -194,6 +194,7 @@ bool nfs_check_cache_invalid(struct inode *inode, unsigned long flags)

return nfs_check_cache_invalid_not_delegated(inode, flags);
}
+EXPORT_SYMBOL_GPL(nfs_check_cache_invalid);

static void nfs_set_cache_invalid(struct inode *inode, unsigned long flags)
{
@@ -235,11 +236,13 @@ static void nfs_zap_caches_locked(struct inode *inode)
| NFS_INO_INVALID_DATA
| NFS_INO_INVALID_ACCESS
| NFS_INO_INVALID_ACL
+ | NFS_INO_INVALID_XATTR
| NFS_INO_REVAL_PAGECACHE);
} else
nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR
| NFS_INO_INVALID_ACCESS
| NFS_INO_INVALID_ACL
+ | NFS_INO_INVALID_XATTR
| NFS_INO_REVAL_PAGECACHE);
nfs_zap_label_cache_locked(nfsi);
}
@@ -1885,7 +1888,8 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
if (!(have_writers || have_delegation)) {
invalid |= NFS_INO_INVALID_DATA
| NFS_INO_INVALID_ACCESS
- | NFS_INO_INVALID_ACL;
+ | NFS_INO_INVALID_ACL
+ | NFS_INO_INVALID_XATTR;
/* Force revalidate of all attributes */
save_cache_validity |= NFS_INO_INVALID_CTIME
| NFS_INO_INVALID_MTIME
@@ -2084,6 +2088,9 @@ struct inode *nfs_alloc_inode(struct super_block *sb)
#if IS_ENABLED(CONFIG_NFS_V4)
nfsi->nfs4_acl = NULL;
#endif /* CONFIG_NFS_V4 */
+#ifdef CONFIG_NFS_V4_2
+ nfsi->xattr_cache = NULL;
+#endif
return &nfsi->vfs_inode;
}
EXPORT_SYMBOL_GPL(nfs_alloc_inode);
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 8c2e52bc986a..e200522469af 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -1182,6 +1182,18 @@ static ssize_t _nfs42_proc_getxattr(struct inode *inode, const char *name,
if (ret < 0)
return ret;

+ /*
+ * Normally, the caching is done one layer up, but for successful
+ * RPCS, always cache the result here, even if the caller was
+ * just querying the length, or if the reply was too big for
+ * the caller. This avoids a second RPC in the case of the
+ * common query-alloc-retrieve cycle for xattrs.
+ *
+ * Note that xattr_len is always capped to XATTR_SIZE_MAX.
+ */
+
+ nfs4_xattr_cache_add(inode, name, NULL, pages, res.xattr_len);
+
if (buflen) {
if (res.xattr_len > buflen)
return -ERANGE;
diff --git a/fs/nfs/nfs42xattr.c b/fs/nfs/nfs42xattr.c
new file mode 100644
index 000000000000..23fdab977a2a
--- /dev/null
+++ b/fs/nfs/nfs42xattr.c
@@ -0,0 +1,1083 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2019, 2020 Amazon.com, Inc. or its affiliates. All rights reserved.
+ *
+ * User extended attribute client side cache functions.
+ *
+ * Author: Frank van der Linden <[email protected]>
+ */
+#include <linux/errno.h>
+#include <linux/nfs_fs.h>
+#include <linux/hashtable.h>
+#include <linux/refcount.h>
+#include <uapi/linux/xattr.h>
+
+#include "nfs4_fs.h"
+#include "internal.h"
+
+/*
+ * User extended attributes client side caching is implemented by having
+ * a cache structure attached to NFS inodes. This structure is allocated
+ * when needed, and freed when the cache is zapped.
+ *
+ * The cache structure contains as hash table of entries, and a pointer
+ * to a special-cased entry for the listxattr cache.
+ *
+ * Accessing and allocating / freeing the caches is done via reference
+ * counting. The cache entries use a similar refcounting scheme.
+ *
+ * This makes freeing a cache, both from the shrinker and from the
+ * zap cache path, easy. It also means that, in current use cases,
+ * the large majority of inodes will not waste any memory, as they
+ * will never have any user extended attributes assigned to them.
+ *
+ * Attribute entries are hashed in to a simple hash table. They are
+ * also part of an LRU.
+ *
+ * There are three shrinkers.
+ *
+ * Two shrinkers deal with the cache entries themselves: one for
+ * large entries (> PAGE_SIZE), and one for smaller entries. The
+ * shrinker for the larger entries works more aggressively than
+ * those for the smaller entries.
+ *
+ * The other shrinker frees the cache structures themselves.
+ */
+
+/*
+ * 64 buckets is a good default. There is likely no reasonable
+ * workload that uses more than even 64 user extended attributes.
+ * You can certainly add a lot more - but you get what you ask for
+ * in those circumstances.
+ */
+#define NFS4_XATTR_HASH_SIZE 64
+
+#define NFSDBG_FACILITY NFSDBG_XATTRCACHE
+
+struct nfs4_xattr_cache;
+struct nfs4_xattr_entry;
+
+struct nfs4_xattr_bucket {
+ spinlock_t lock;
+ struct hlist_head hlist;
+ struct nfs4_xattr_cache *cache;
+ bool draining;
+};
+
+struct nfs4_xattr_cache {
+ struct kref ref;
+ spinlock_t hash_lock; /* protects hashtable and lru */
+ struct nfs4_xattr_bucket buckets[NFS4_XATTR_HASH_SIZE];
+ struct list_head lru;
+ struct list_head dispose;
+ atomic_long_t nent;
+ spinlock_t listxattr_lock;
+ struct inode *inode;
+ struct nfs4_xattr_entry *listxattr;
+ struct work_struct work;
+};
+
+struct nfs4_xattr_entry {
+ struct kref ref;
+ struct hlist_node hnode;
+ struct list_head lru;
+ struct list_head dispose;
+ char *xattr_name;
+ void *xattr_value;
+ size_t xattr_size;
+ struct nfs4_xattr_bucket *bucket;
+ uint32_t flags;
+};
+
+#define NFS4_XATTR_ENTRY_EXTVAL 0x0001
+
+/*
+ * LRU list of NFS inodes that have xattr caches.
+ */
+static struct list_lru nfs4_xattr_cache_lru;
+static struct list_lru nfs4_xattr_entry_lru;
+static struct list_lru nfs4_xattr_large_entry_lru;
+
+static struct kmem_cache *nfs4_xattr_cache_cachep;
+
+static struct workqueue_struct *nfs4_xattr_cache_wq;
+
+/*
+ * Hashing helper functions.
+ */
+static void
+nfs4_xattr_hash_init(struct nfs4_xattr_cache *cache)
+{
+ unsigned int i;
+
+ for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
+ INIT_HLIST_HEAD(&cache->buckets[i].hlist);
+ spin_lock_init(&cache->buckets[i].lock);
+ cache->buckets[i].cache = cache;
+ cache->buckets[i].draining = false;
+ }
+}
+
+/*
+ * Locking order:
+ * 1. inode i_lock or bucket lock
+ * 2. list_lru lock (taken by list_lru_* functions)
+ */
+
+/*
+ * Wrapper functions to add a cache entry to the right LRU.
+ */
+static bool
+nfs4_xattr_entry_lru_add(struct nfs4_xattr_entry *entry)
+{
+ struct list_lru *lru;
+
+ lru = (entry->flags & NFS4_XATTR_ENTRY_EXTVAL) ?
+ &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
+
+ return list_lru_add(lru, &entry->lru);
+}
+
+static bool
+nfs4_xattr_entry_lru_del(struct nfs4_xattr_entry *entry)
+{
+ struct list_lru *lru;
+
+ lru = (entry->flags & NFS4_XATTR_ENTRY_EXTVAL) ?
+ &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
+
+ return list_lru_del(lru, &entry->lru);
+}
+
+/*
+ * This function allocates cache entries. They are the normal
+ * extended attribute name/value pairs, but may also be a listxattr
+ * cache. Those allocations use the same entry so that they can be
+ * treated as one by the memory shrinker.
+ *
+ * xattr cache entries are allocated together with names. If the
+ * value fits in to one page with the entry structure and the name,
+ * it will also be part of the same allocation (kmalloc). This is
+ * expected to be the vast majority of cases. Larger allocations
+ * have a value pointer that is allocated separately by kvmalloc.
+ *
+ * Parameters:
+ *
+ * @name: Name of the extended attribute. NULL for listxattr cache
+ * entry.
+ * @value: Value of attribute, or listxattr cache. NULL if the
+ * value is to be copied from pages instead.
+ * @pages: Pages to copy the value from, if not NULL. Passed in to
+ * make it easier to copy the value after an RPC, even if
+ * the value will not be passed up to application (e.g.
+ * for a 'query' getxattr with NULL buffer).
+ * @len: Length of the value. Can be 0 for zero-length attribues.
+ * @value and @pages will be NULL if @len is 0.
+ */
+static struct nfs4_xattr_entry *
+nfs4_xattr_alloc_entry(const char *name, const void *value,
+ struct page **pages, size_t len)
+{
+ struct nfs4_xattr_entry *entry;
+ void *valp;
+ char *namep;
+ size_t alloclen, slen;
+ char *buf;
+ uint32_t flags;
+
+ BUILD_BUG_ON(sizeof(struct nfs4_xattr_entry) +
+ XATTR_NAME_MAX + 1 > PAGE_SIZE);
+
+ alloclen = sizeof(struct nfs4_xattr_entry);
+ if (name != NULL) {
+ slen = strlen(name) + 1;
+ alloclen += slen;
+ } else
+ slen = 0;
+
+ if (alloclen + len <= PAGE_SIZE) {
+ alloclen += len;
+ flags = 0;
+ } else {
+ flags = NFS4_XATTR_ENTRY_EXTVAL;
+ }
+
+ buf = kmalloc(alloclen, GFP_KERNEL_ACCOUNT | GFP_NOFS);
+ if (buf == NULL)
+ return NULL;
+ entry = (struct nfs4_xattr_entry *)buf;
+
+ if (name != NULL) {
+ namep = buf + sizeof(struct nfs4_xattr_entry);
+ memcpy(namep, name, slen);
+ } else {
+ namep = NULL;
+ }
+
+
+ if (flags & NFS4_XATTR_ENTRY_EXTVAL) {
+ valp = kvmalloc(len, GFP_KERNEL_ACCOUNT | GFP_NOFS);
+ if (valp == NULL) {
+ kfree(buf);
+ return NULL;
+ }
+ } else if (len != 0) {
+ valp = buf + sizeof(struct nfs4_xattr_entry) + slen;
+ } else
+ valp = NULL;
+
+ if (valp != NULL) {
+ if (value != NULL)
+ memcpy(valp, value, len);
+ else
+ _copy_from_pages(valp, pages, 0, len);
+ }
+
+ entry->flags = flags;
+ entry->xattr_value = valp;
+ kref_init(&entry->ref);
+ entry->xattr_name = namep;
+ entry->xattr_size = len;
+ entry->bucket = NULL;
+ INIT_LIST_HEAD(&entry->lru);
+ INIT_LIST_HEAD(&entry->dispose);
+ INIT_HLIST_NODE(&entry->hnode);
+
+ return entry;
+}
+
+static void
+nfs4_xattr_free_entry(struct nfs4_xattr_entry *entry)
+{
+ if (entry->flags & NFS4_XATTR_ENTRY_EXTVAL)
+ kvfree(entry->xattr_value);
+ kfree(entry);
+}
+
+static void
+nfs4_xattr_free_entry_cb(struct kref *kref)
+{
+ struct nfs4_xattr_entry *entry;
+
+ entry = container_of(kref, struct nfs4_xattr_entry, ref);
+
+ if (WARN_ON(!list_empty(&entry->lru)))
+ return;
+
+ nfs4_xattr_free_entry(entry);
+}
+
+static void
+nfs4_xattr_free_cache_cb(struct kref *kref)
+{
+ struct nfs4_xattr_cache *cache;
+ int i;
+
+ cache = container_of(kref, struct nfs4_xattr_cache, ref);
+
+ for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
+ if (WARN_ON(!hlist_empty(&cache->buckets[i].hlist)))
+ return;
+ cache->buckets[i].draining = false;
+ }
+
+ cache->listxattr = NULL;
+
+ kmem_cache_free(nfs4_xattr_cache_cachep, cache);
+
+}
+
+static struct nfs4_xattr_cache *
+nfs4_xattr_alloc_cache(void)
+{
+ struct nfs4_xattr_cache *cache;
+
+ cache = kmem_cache_alloc(nfs4_xattr_cache_cachep,
+ GFP_KERNEL_ACCOUNT | GFP_NOFS);
+ if (cache == NULL)
+ return NULL;
+
+ kref_init(&cache->ref);
+ atomic_long_set(&cache->nent, 0);
+
+ return cache;
+}
+
+/*
+ * Set the listxattr cache, which is a special-cased cache entry.
+ * The special value ERR_PTR(-ESTALE) is used to indicate that
+ * the cache is being drained - this prevents a new listxattr
+ * cache from being added to what is now a stale cache.
+ */
+static int
+nfs4_xattr_set_listcache(struct nfs4_xattr_cache *cache,
+ struct nfs4_xattr_entry *new)
+{
+ struct nfs4_xattr_entry *old;
+ int ret = 1;
+
+ spin_lock(&cache->listxattr_lock);
+
+ old = cache->listxattr;
+
+ if (old == ERR_PTR(-ESTALE)) {
+ ret = 0;
+ goto out;
+ }
+
+ cache->listxattr = new;
+ if (new != NULL && new != ERR_PTR(-ESTALE))
+ nfs4_xattr_entry_lru_add(new);
+
+ if (old != NULL) {
+ nfs4_xattr_entry_lru_del(old);
+ kref_put(&old->ref, nfs4_xattr_free_entry_cb);
+ }
+out:
+ spin_unlock(&cache->listxattr_lock);
+
+ return ret;
+}
+
+/*
+ * Unlink a cache from its parent inode, clearing out an invalid
+ * cache. Must be called with i_lock held.
+ */
+static struct nfs4_xattr_cache *
+nfs4_xattr_cache_unlink(struct inode *inode)
+{
+ struct nfs_inode *nfsi;
+ struct nfs4_xattr_cache *oldcache;
+
+ nfsi = NFS_I(inode);
+
+ oldcache = nfsi->xattr_cache;
+ if (oldcache != NULL) {
+ list_lru_del(&nfs4_xattr_cache_lru, &oldcache->lru);
+ oldcache->inode = NULL;
+ }
+ nfsi->xattr_cache = NULL;
+ nfsi->cache_validity &= ~NFS_INO_INVALID_XATTR;
+
+ return oldcache;
+
+}
+
+/*
+ * Discard a cache. Usually called by a worker, since walking all
+ * the entries can take up some cycles that we don't want to waste
+ * in the I/O path. Can also be called from the shrinker callback.
+ *
+ * The cache is dead, it has already been unlinked from its inode,
+ * and no longer appears on the cache LRU list.
+ *
+ * Mark all buckets as draining, so that no new entries are added. This
+ * could still happen in the unlikely, but possible case that another
+ * thread had grabbed a reference before it was unlinked from the inode,
+ * and is still holding it for an add operation.
+ *
+ * Remove all entries from the LRU lists, so that there is no longer
+ * any way to 'find' this cache. Then, remove the entries from the hash
+ * table.
+ *
+ * At that point, the cache will remain empty and can be freed when the final
+ * reference drops, which is very likely the kref_put at the end of
+ * this function, or the one called immediately afterwards in the
+ * shrinker callback.
+ */
+static void
+nfs4_xattr_discard_cache(struct nfs4_xattr_cache *cache)
+{
+ unsigned int i;
+ struct nfs4_xattr_entry *entry;
+ struct nfs4_xattr_bucket *bucket;
+ struct hlist_node *n;
+
+ nfs4_xattr_set_listcache(cache, ERR_PTR(-ESTALE));
+
+ for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
+ bucket = &cache->buckets[i];
+
+ spin_lock(&bucket->lock);
+ bucket->draining = true;
+ hlist_for_each_entry_safe(entry, n, &bucket->hlist, hnode) {
+ nfs4_xattr_entry_lru_del(entry);
+ hlist_del_init(&entry->hnode);
+ kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+ }
+ spin_unlock(&bucket->lock);
+ }
+
+ atomic_long_set(&cache->nent, 0);
+
+ kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+}
+
+static void
+nfs4_xattr_discard_cache_worker(struct work_struct *work)
+{
+ struct nfs4_xattr_cache *cache = container_of(work,
+ struct nfs4_xattr_cache, work);
+
+ nfs4_xattr_discard_cache(cache);
+}
+
+static void
+nfs4_xattr_reap_cache(struct nfs4_xattr_cache *cache)
+{
+ queue_work(nfs4_xattr_cache_wq, &cache->work);
+}
+
+/*
+ * Get a referenced copy of the cache structure. Avoid doing allocs
+ * while holding i_lock. Which means that we do some optimistic allocation,
+ * and might have to free the result in rare cases.
+ *
+ * This function only checks the NFS_INO_INVALID_XATTR cache validity bit
+ * and acts accordingly, replacing the cache when needed. For the read case
+ * (!add), this means that the caller must make sure that the cache
+ * is valid before caling this function. getxattr and listxattr call
+ * revalidate_inode to do this. The attribute cache timeout (for the
+ * non-delegated case) is expected to be dealt with in the revalidate
+ * call.
+ */
+
+static struct nfs4_xattr_cache *
+nfs4_xattr_get_cache(struct inode *inode, int add)
+{
+ struct nfs_inode *nfsi;
+ struct nfs4_xattr_cache *cache, *oldcache, *newcache;
+
+ nfsi = NFS_I(inode);
+
+ cache = oldcache = NULL;
+
+ spin_lock(&inode->i_lock);
+
+ if (nfsi->cache_validity & NFS_INO_INVALID_XATTR)
+ oldcache = nfs4_xattr_cache_unlink(inode);
+ else
+ cache = nfsi->xattr_cache;
+
+ if (cache != NULL)
+ kref_get(&cache->ref);
+
+ spin_unlock(&inode->i_lock);
+
+ if (add && cache == NULL) {
+ newcache = NULL;
+
+ cache = nfs4_xattr_alloc_cache();
+ if (cache == NULL)
+ goto out;
+
+ spin_lock(&inode->i_lock);
+ if (nfsi->cache_validity & NFS_INO_INVALID_XATTR) {
+ /*
+ * The cache was invalidated again. Give up,
+ * since what we want to enter is now likely
+ * outdated anyway.
+ */
+ spin_unlock(&inode->i_lock);
+ kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+ cache = NULL;
+ goto out;
+ }
+
+ /*
+ * Check if someone beat us to it.
+ */
+ if (nfsi->xattr_cache != NULL) {
+ newcache = nfsi->xattr_cache;
+ kref_get(&newcache->ref);
+ } else {
+ kref_get(&cache->ref);
+ nfsi->xattr_cache = cache;
+ cache->inode = inode;
+ list_lru_add(&nfs4_xattr_cache_lru, &cache->lru);
+ }
+
+ spin_unlock(&inode->i_lock);
+
+ /*
+ * If there was a race, throw away the cache we just
+ * allocated, and use the new one allocated by someone
+ * else.
+ */
+ if (newcache != NULL) {
+ kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+ cache = newcache;
+ }
+ }
+
+out:
+ /*
+ * Discarding an old cache is done via a workqueue.
+ */
+ if (oldcache != NULL)
+ nfs4_xattr_reap_cache(oldcache);
+
+ return cache;
+}
+
+static inline struct nfs4_xattr_bucket *
+nfs4_xattr_hash_bucket(struct nfs4_xattr_cache *cache, const char *name)
+{
+ return &cache->buckets[jhash(name, strlen(name), 0) &
+ (ARRAY_SIZE(cache->buckets) - 1)];
+}
+
+static struct nfs4_xattr_entry *
+nfs4_xattr_get_entry(struct nfs4_xattr_bucket *bucket, const char *name)
+{
+ struct nfs4_xattr_entry *entry;
+
+ entry = NULL;
+
+ hlist_for_each_entry(entry, &bucket->hlist, hnode) {
+ if (!strcmp(entry->xattr_name, name))
+ break;
+ }
+
+ return entry;
+}
+
+static int
+nfs4_xattr_hash_add(struct nfs4_xattr_cache *cache,
+ struct nfs4_xattr_entry *entry)
+{
+ struct nfs4_xattr_bucket *bucket;
+ struct nfs4_xattr_entry *oldentry = NULL;
+ int ret = 1;
+
+ bucket = nfs4_xattr_hash_bucket(cache, entry->xattr_name);
+ entry->bucket = bucket;
+
+ spin_lock(&bucket->lock);
+
+ if (bucket->draining) {
+ ret = 0;
+ goto out;
+ }
+
+ oldentry = nfs4_xattr_get_entry(bucket, entry->xattr_name);
+ if (oldentry != NULL) {
+ hlist_del_init(&oldentry->hnode);
+ nfs4_xattr_entry_lru_del(oldentry);
+ } else {
+ atomic_long_inc(&cache->nent);
+ }
+
+ hlist_add_head(&entry->hnode, &bucket->hlist);
+ nfs4_xattr_entry_lru_add(entry);
+
+out:
+ spin_unlock(&bucket->lock);
+
+ if (oldentry != NULL)
+ kref_put(&oldentry->ref, nfs4_xattr_free_entry_cb);
+
+ return ret;
+}
+
+static void
+nfs4_xattr_hash_remove(struct nfs4_xattr_cache *cache, const char *name)
+{
+ struct nfs4_xattr_bucket *bucket;
+ struct nfs4_xattr_entry *entry;
+
+ bucket = nfs4_xattr_hash_bucket(cache, name);
+
+ spin_lock(&bucket->lock);
+
+ entry = nfs4_xattr_get_entry(bucket, name);
+ if (entry != NULL) {
+ hlist_del_init(&entry->hnode);
+ nfs4_xattr_entry_lru_del(entry);
+ atomic_long_dec(&cache->nent);
+ }
+
+ spin_unlock(&bucket->lock);
+
+ if (entry != NULL)
+ kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+}
+
+static struct nfs4_xattr_entry *
+nfs4_xattr_hash_find(struct nfs4_xattr_cache *cache, const char *name)
+{
+ struct nfs4_xattr_bucket *bucket;
+ struct nfs4_xattr_entry *entry;
+
+ bucket = nfs4_xattr_hash_bucket(cache, name);
+
+ spin_lock(&bucket->lock);
+
+ entry = nfs4_xattr_get_entry(bucket, name);
+ if (entry != NULL)
+ kref_get(&entry->ref);
+
+ spin_unlock(&bucket->lock);
+
+ return entry;
+}
+
+/*
+ * Entry point to retrieve an entry from the cache.
+ */
+ssize_t nfs4_xattr_cache_get(struct inode *inode, const char *name, char *buf,
+ ssize_t buflen)
+{
+ struct nfs4_xattr_cache *cache;
+ struct nfs4_xattr_entry *entry;
+ ssize_t ret;
+
+ cache = nfs4_xattr_get_cache(inode, 0);
+ if (cache == NULL)
+ return -ENOENT;
+
+ ret = 0;
+ entry = nfs4_xattr_hash_find(cache, name);
+
+ if (entry != NULL) {
+ dprintk("%s: cache hit '%s', len %lu\n", __func__,
+ entry->xattr_name, (unsigned long)entry->xattr_size);
+ if (buflen == 0) {
+ /* Length probe only */
+ ret = entry->xattr_size;
+ } else if (buflen < entry->xattr_size)
+ ret = -ERANGE;
+ else {
+ memcpy(buf, entry->xattr_value, entry->xattr_size);
+ ret = entry->xattr_size;
+ }
+ kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+ } else {
+ dprintk("%s: cache miss '%s'\n", __func__, name);
+ ret = -ENOENT;
+ }
+
+ kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+
+ return ret;
+}
+
+/*
+ * Retrieve a cached list of xattrs from the cache.
+ */
+ssize_t nfs4_xattr_cache_list(struct inode *inode, char *buf, ssize_t buflen)
+{
+ struct nfs4_xattr_cache *cache;
+ struct nfs4_xattr_entry *entry;
+ ssize_t ret;
+
+ cache = nfs4_xattr_get_cache(inode, 0);
+ if (cache == NULL)
+ return -ENOENT;
+
+ spin_lock(&cache->listxattr_lock);
+
+ entry = cache->listxattr;
+
+ if (entry != NULL && entry != ERR_PTR(-ESTALE)) {
+ if (buflen == 0) {
+ /* Length probe only */
+ ret = entry->xattr_size;
+ } else if (entry->xattr_size > buflen)
+ ret = -ERANGE;
+ else {
+ memcpy(buf, entry->xattr_value, entry->xattr_size);
+ ret = entry->xattr_size;
+ }
+ } else {
+ ret = -ENOENT;
+ }
+
+ spin_unlock(&cache->listxattr_lock);
+
+ kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+
+ return ret;
+}
+
+/*
+ * Add an xattr to the cache.
+ *
+ * This also invalidates the xattr list cache.
+ */
+void nfs4_xattr_cache_add(struct inode *inode, const char *name,
+ const char *buf, struct page **pages, ssize_t buflen)
+{
+ struct nfs4_xattr_cache *cache;
+ struct nfs4_xattr_entry *entry;
+
+ dprintk("%s: add '%s' len %lu\n", __func__,
+ name, (unsigned long)buflen);
+
+ cache = nfs4_xattr_get_cache(inode, 1);
+ if (cache == NULL)
+ return;
+
+ entry = nfs4_xattr_alloc_entry(name, buf, pages, buflen);
+ if (entry == NULL)
+ goto out;
+
+ (void)nfs4_xattr_set_listcache(cache, NULL);
+
+ if (!nfs4_xattr_hash_add(cache, entry))
+ kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+
+out:
+ kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+}
+
+
+/*
+ * Remove an xattr from the cache.
+ *
+ * This also invalidates the xattr list cache.
+ */
+void nfs4_xattr_cache_remove(struct inode *inode, const char *name)
+{
+ struct nfs4_xattr_cache *cache;
+
+ dprintk("%s: remove '%s'\n", __func__, name);
+
+ cache = nfs4_xattr_get_cache(inode, 0);
+ if (cache == NULL)
+ return;
+
+ (void)nfs4_xattr_set_listcache(cache, NULL);
+ nfs4_xattr_hash_remove(cache, name);
+
+ kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+}
+
+/*
+ * Cache listxattr output, replacing any possible old one.
+ */
+void nfs4_xattr_cache_set_list(struct inode *inode, const char *buf,
+ ssize_t buflen)
+{
+ struct nfs4_xattr_cache *cache;
+ struct nfs4_xattr_entry *entry;
+
+ cache = nfs4_xattr_get_cache(inode, 1);
+ if (cache == NULL)
+ return;
+
+ entry = nfs4_xattr_alloc_entry(NULL, buf, NULL, buflen);
+ if (entry == NULL)
+ goto out;
+
+ /*
+ * This is just there to be able to get to bucket->cache,
+ * which is obviously the same for all buckets, so just
+ * use bucket 0.
+ */
+ entry->bucket = &cache->buckets[0];
+
+ if (!nfs4_xattr_set_listcache(cache, entry))
+ kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+
+out:
+ kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+}
+
+/*
+ * Zap the entire cache. Called when an inode is evicted.
+ */
+void nfs4_xattr_cache_zap(struct inode *inode)
+{
+ struct nfs4_xattr_cache *oldcache;
+
+ spin_lock(&inode->i_lock);
+ oldcache = nfs4_xattr_cache_unlink(inode);
+ spin_unlock(&inode->i_lock);
+
+ if (oldcache)
+ nfs4_xattr_discard_cache(oldcache);
+}
+
+/*
+ * The entry LRU is shrunk more aggressively than the cache LRU,
+ * by settings @seeks to 1.
+ *
+ * Cache structures are freed only when they've become empty, after
+ * pruning all but one entry.
+ */
+
+static unsigned long nfs4_xattr_cache_count(struct shrinker *shrink,
+ struct shrink_control *sc);
+static unsigned long nfs4_xattr_entry_count(struct shrinker *shrink,
+ struct shrink_control *sc);
+static unsigned long nfs4_xattr_cache_scan(struct shrinker *shrink,
+ struct shrink_control *sc);
+static unsigned long nfs4_xattr_entry_scan(struct shrinker *shrink,
+ struct shrink_control *sc);
+
+static struct shrinker nfs4_xattr_cache_shrinker = {
+ .count_objects = nfs4_xattr_cache_count,
+ .scan_objects = nfs4_xattr_cache_scan,
+ .seeks = DEFAULT_SEEKS,
+ .flags = SHRINKER_MEMCG_AWARE,
+};
+
+static struct shrinker nfs4_xattr_entry_shrinker = {
+ .count_objects = nfs4_xattr_entry_count,
+ .scan_objects = nfs4_xattr_entry_scan,
+ .seeks = DEFAULT_SEEKS,
+ .batch = 512,
+ .flags = SHRINKER_MEMCG_AWARE,
+};
+
+static struct shrinker nfs4_xattr_large_entry_shrinker = {
+ .count_objects = nfs4_xattr_entry_count,
+ .scan_objects = nfs4_xattr_entry_scan,
+ .seeks = 1,
+ .batch = 512,
+ .flags = SHRINKER_MEMCG_AWARE,
+};
+
+static enum lru_status
+cache_lru_isolate(struct list_head *item,
+ struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
+{
+ struct list_head *dispose = arg;
+ struct inode *inode;
+ struct nfs4_xattr_cache *cache = container_of(item,
+ struct nfs4_xattr_cache, lru);
+
+ if (atomic_long_read(&cache->nent) > 1)
+ return LRU_SKIP;
+
+ /*
+ * If a cache structure is on the LRU list, we know that
+ * its inode is valid. Try to lock it to break the link.
+ * Since we're inverting the lock order here, only try.
+ */
+ inode = cache->inode;
+
+ if (!spin_trylock(&inode->i_lock))
+ return LRU_SKIP;
+
+ kref_get(&cache->ref);
+
+ cache->inode = NULL;
+ NFS_I(inode)->xattr_cache = NULL;
+ NFS_I(inode)->cache_validity &= ~NFS_INO_INVALID_XATTR;
+ list_lru_isolate(lru, &cache->lru);
+
+ spin_unlock(&inode->i_lock);
+
+ list_add_tail(&cache->dispose, dispose);
+ return LRU_REMOVED;
+}
+
+static unsigned long
+nfs4_xattr_cache_scan(struct shrinker *shrink, struct shrink_control *sc)
+{
+ LIST_HEAD(dispose);
+ unsigned long freed;
+ struct nfs4_xattr_cache *cache;
+
+ freed = list_lru_shrink_walk(&nfs4_xattr_cache_lru, sc,
+ cache_lru_isolate, &dispose);
+ while (!list_empty(&dispose)) {
+ cache = list_first_entry(&dispose, struct nfs4_xattr_cache,
+ dispose);
+ list_del_init(&cache->dispose);
+ nfs4_xattr_discard_cache(cache);
+ kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+ }
+
+ return freed;
+}
+
+
+static unsigned long
+nfs4_xattr_cache_count(struct shrinker *shrink, struct shrink_control *sc)
+{
+ unsigned long count;
+
+ count = list_lru_count(&nfs4_xattr_cache_lru);
+ return vfs_pressure_ratio(count);
+}
+
+static enum lru_status
+entry_lru_isolate(struct list_head *item,
+ struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
+{
+ struct list_head *dispose = arg;
+ struct nfs4_xattr_bucket *bucket;
+ struct nfs4_xattr_cache *cache;
+ struct nfs4_xattr_entry *entry = container_of(item,
+ struct nfs4_xattr_entry, lru);
+
+ bucket = entry->bucket;
+ cache = bucket->cache;
+
+ /*
+ * Unhook the entry from its parent (either a cache bucket
+ * or a cache structure if it's a listxattr buf), so that
+ * it's no longer found. Then add it to the isolate list,
+ * to be freed later.
+ *
+ * In both cases, we're reverting lock order, so use
+ * trylock and skip the entry if we can't get the lock.
+ */
+ if (entry->xattr_name != NULL) {
+ /* Regular cache entry */
+ if (!spin_trylock(&bucket->lock))
+ return LRU_SKIP;
+
+ kref_get(&entry->ref);
+
+ hlist_del_init(&entry->hnode);
+ atomic_long_dec(&cache->nent);
+ list_lru_isolate(lru, &entry->lru);
+
+ spin_unlock(&bucket->lock);
+ } else {
+ /* Listxattr cache entry */
+ if (!spin_trylock(&cache->listxattr_lock))
+ return LRU_SKIP;
+
+ kref_get(&entry->ref);
+
+ cache->listxattr = NULL;
+ list_lru_isolate(lru, &entry->lru);
+
+ spin_unlock(&cache->listxattr_lock);
+ }
+
+ list_add_tail(&entry->dispose, dispose);
+ return LRU_REMOVED;
+}
+
+static unsigned long
+nfs4_xattr_entry_scan(struct shrinker *shrink, struct shrink_control *sc)
+{
+ LIST_HEAD(dispose);
+ unsigned long freed;
+ struct nfs4_xattr_entry *entry;
+ struct list_lru *lru;
+
+ lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
+ &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
+
+ freed = list_lru_shrink_walk(lru, sc, entry_lru_isolate, &dispose);
+
+ while (!list_empty(&dispose)) {
+ entry = list_first_entry(&dispose, struct nfs4_xattr_entry,
+ dispose);
+ list_del_init(&entry->dispose);
+
+ /*
+ * Drop two references: the one that we just grabbed
+ * in entry_lru_isolate, and the one that was set
+ * when the entry was first allocated.
+ */
+ kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+ kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+ }
+
+ return freed;
+}
+
+static unsigned long
+nfs4_xattr_entry_count(struct shrinker *shrink, struct shrink_control *sc)
+{
+ unsigned long count;
+ struct list_lru *lru;
+
+ lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
+ &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
+
+ count = list_lru_count(lru);
+ return vfs_pressure_ratio(count);
+}
+
+
+static void nfs4_xattr_cache_init_once(void *p)
+{
+ struct nfs4_xattr_cache *cache = (struct nfs4_xattr_cache *)p;
+
+ spin_lock_init(&cache->listxattr_lock);
+ atomic_long_set(&cache->nent, 0);
+ nfs4_xattr_hash_init(cache);
+ cache->listxattr = NULL;
+ INIT_WORK(&cache->work, nfs4_xattr_discard_cache_worker);
+ INIT_LIST_HEAD(&cache->lru);
+ INIT_LIST_HEAD(&cache->dispose);
+}
+
+int __init nfs4_xattr_cache_init(void)
+{
+ int ret = 0;
+
+ nfs4_xattr_cache_cachep = kmem_cache_create("nfs4_xattr_cache_cache",
+ sizeof(struct nfs4_xattr_cache), 0,
+ (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+ nfs4_xattr_cache_init_once);
+ if (nfs4_xattr_cache_cachep == NULL)
+ return -ENOMEM;
+
+ ret = list_lru_init_memcg(&nfs4_xattr_large_entry_lru,
+ &nfs4_xattr_large_entry_shrinker);
+ if (ret)
+ goto out4;
+
+ ret = list_lru_init_memcg(&nfs4_xattr_entry_lru,
+ &nfs4_xattr_entry_shrinker);
+ if (ret)
+ goto out3;
+
+ ret = list_lru_init_memcg(&nfs4_xattr_cache_lru,
+ &nfs4_xattr_cache_shrinker);
+ if (ret)
+ goto out2;
+
+ nfs4_xattr_cache_wq = alloc_workqueue("nfs4_xattr", WQ_MEM_RECLAIM, 0);
+ if (nfs4_xattr_cache_wq == NULL)
+ goto out1;
+
+ ret = register_shrinker(&nfs4_xattr_cache_shrinker);
+ if (ret)
+ goto out0;
+
+ ret = register_shrinker(&nfs4_xattr_entry_shrinker);
+ if (ret)
+ goto out;
+
+ ret = register_shrinker(&nfs4_xattr_large_entry_shrinker);
+ if (!ret)
+ return 0;
+
+ unregister_shrinker(&nfs4_xattr_entry_shrinker);
+out:
+ unregister_shrinker(&nfs4_xattr_cache_shrinker);
+out0:
+ destroy_workqueue(nfs4_xattr_cache_wq);
+out1:
+ list_lru_destroy(&nfs4_xattr_cache_lru);
+out2:
+ list_lru_destroy(&nfs4_xattr_entry_lru);
+out3:
+ list_lru_destroy(&nfs4_xattr_large_entry_lru);
+out4:
+ kmem_cache_destroy(nfs4_xattr_cache_cachep);
+
+ return ret;
+}
+
+void nfs4_xattr_cache_exit(void)
+{
+ unregister_shrinker(&nfs4_xattr_entry_shrinker);
+ unregister_shrinker(&nfs4_xattr_cache_shrinker);
+ list_lru_destroy(&nfs4_xattr_entry_lru);
+ list_lru_destroy(&nfs4_xattr_cache_lru);
+ kmem_cache_destroy(nfs4_xattr_cache_cachep);
+ destroy_workqueue(nfs4_xattr_cache_wq);
+}
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index da18d0254c75..6601c0054809 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -624,12 +624,34 @@ static inline bool nfs4_state_match_open_stateid_other(const struct nfs4_state *
nfs4_stateid_match_other(&state->open_stateid, stateid);
}

+/* nfs42xattr.c */
+#ifdef CONFIG_NFS_V4_2
+extern int __init nfs4_xattr_cache_init(void);
+extern void nfs4_xattr_cache_exit(void);
+extern void nfs4_xattr_cache_add(struct inode *inode, const char *name,
+ const char *buf, struct page **pages,
+ ssize_t buflen);
+extern void nfs4_xattr_cache_remove(struct inode *inode, const char *name);
+extern ssize_t nfs4_xattr_cache_get(struct inode *inode, const char *name,
+ char *buf, ssize_t buflen);
+extern void nfs4_xattr_cache_set_list(struct inode *inode, const char *buf,
+ ssize_t buflen);
+extern ssize_t nfs4_xattr_cache_list(struct inode *inode, char *buf,
+ ssize_t buflen);
+extern void nfs4_xattr_cache_zap(struct inode *inode);
#else
+static inline void nfs4_xattr_cache_zap(struct inode *inode)
+{
+}
+#endif /* CONFIG_NFS_V4_2 */
+
+#else /* CONFIG_NFS_V4 */

#define nfs4_close_state(a, b) do { } while (0)
#define nfs4_close_sync(a, b) do { } while (0)
#define nfs4_state_protect(a, b, c, d) do { } while (0)
#define nfs4_state_protect_write(a, b, c, d) do { } while (0)

+
#endif /* CONFIG_NFS_V4 */
#endif /* __LINUX_FS_NFS_NFS4_FS.H */
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 33bbd4c90416..257f4a0397a0 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7450,6 +7450,7 @@ static int nfs4_xattr_set_nfs4_user(const struct xattr_handler *handler,
size_t buflen, int flags)
{
struct nfs_access_entry cache;
+ int ret;

if (!nfs_server_capable(inode, NFS_CAP_XATTR))
return -EOPNOTSUPP;
@@ -7468,10 +7469,17 @@ static int nfs4_xattr_set_nfs4_user(const struct xattr_handler *handler,
return -EACCES;
}

- if (buf == NULL)
- return nfs42_proc_removexattr(inode, key);
- else
- return nfs42_proc_setxattr(inode, key, buf, buflen, flags);
+ if (buf == NULL) {
+ ret = nfs42_proc_removexattr(inode, key);
+ if (!ret)
+ nfs4_xattr_cache_remove(inode, key);
+ } else {
+ ret = nfs42_proc_setxattr(inode, key, buf, buflen, flags);
+ if (!ret)
+ nfs4_xattr_cache_add(inode, key, buf, NULL, buflen);
+ }
+
+ return ret;
}

static int nfs4_xattr_get_nfs4_user(const struct xattr_handler *handler,
@@ -7479,6 +7487,7 @@ static int nfs4_xattr_get_nfs4_user(const struct xattr_handler *handler,
const char *key, void *buf, size_t buflen)
{
struct nfs_access_entry cache;
+ ssize_t ret;

if (!nfs_server_capable(inode, NFS_CAP_XATTR))
return -EOPNOTSUPP;
@@ -7488,7 +7497,17 @@ static int nfs4_xattr_get_nfs4_user(const struct xattr_handler *handler,
return -EACCES;
}

- return nfs42_proc_getxattr(inode, key, buf, buflen);
+ ret = nfs_revalidate_inode(NFS_SERVER(inode), inode);
+ if (ret)
+ return ret;
+
+ ret = nfs4_xattr_cache_get(inode, key, buf, buflen);
+ if (ret >= 0 || (ret < 0 && ret != -ENOENT))
+ return ret;
+
+ ret = nfs42_proc_getxattr(inode, key, buf, buflen);
+
+ return ret;
}

static ssize_t
@@ -7496,7 +7515,7 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char *list, size_t list_len)
{
u64 cookie;
bool eof;
- int ret, size;
+ ssize_t ret, size;
char *buf;
size_t buflen;
struct nfs_access_entry cache;
@@ -7509,6 +7528,14 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char *list, size_t list_len)
return 0;
}

+ ret = nfs_revalidate_inode(NFS_SERVER(inode), inode);
+ if (ret)
+ return ret;
+
+ ret = nfs4_xattr_cache_list(inode, list, list_len);
+ if (ret >= 0 || (ret < 0 && ret != -ENOENT))
+ return ret;
+
cookie = 0;
eof = false;
buflen = list_len ? list_len : XATTR_LIST_MAX;
@@ -7528,6 +7555,9 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char *list, size_t list_len)
size += ret;
}

+ if (list_len)
+ nfs4_xattr_cache_set_list(inode, list, size);
+
return size;
}

diff --git a/fs/nfs/nfs4super.c b/fs/nfs/nfs4super.c
index 1475f932d7da..0c1ab846b83d 100644
--- a/fs/nfs/nfs4super.c
+++ b/fs/nfs/nfs4super.c
@@ -69,6 +69,7 @@ static void nfs4_evict_inode(struct inode *inode)
pnfs_destroy_layout(NFS_I(inode));
/* First call standard NFS clear_inode() code */
nfs_clear_inode(inode);
+ nfs4_xattr_cache_zap(inode);
}

struct nfs_referral_count {
@@ -268,6 +269,12 @@ static int __init init_nfs_v4(void)
if (err)
goto out1;

+#ifdef CONFIG_NFS_V4_2
+ err = nfs4_xattr_cache_init();
+ if (err)
+ goto out2;
+#endif
+
err = nfs4_register_sysctl();
if (err)
goto out2;
@@ -288,6 +295,9 @@ static void __exit exit_nfs_v4(void)
nfs4_pnfs_v3_ds_connect_unload();

unregister_nfs_version(&nfs_v4);
+#ifdef CONFIG_NFS_V4_2
+ nfs4_xattr_cache_exit();
+#endif
nfs4_unregister_sysctl();
nfs_idmap_quit();
nfs_dns_resolver_destroy();
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 1fcfef670a4a..c08cc22d9c32 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -102,6 +102,8 @@ struct nfs_delegation;

struct posix_acl;

+struct nfs4_xattr_cache;
+
/*
* nfs fs inode data in memory
*/
@@ -188,6 +190,10 @@ struct nfs_inode {
struct fscache_cookie *fscache;
#endif
struct inode vfs_inode;
+
+#ifdef CONFIG_NFS_V4_2
+ struct nfs4_xattr_cache *xattr_cache;
+#endif
};

struct nfs4_copy_state {
diff --git a/include/uapi/linux/nfs_fs.h b/include/uapi/linux/nfs_fs.h
index 7bcc8cd6831d..3afe3767c55d 100644
--- a/include/uapi/linux/nfs_fs.h
+++ b/include/uapi/linux/nfs_fs.h
@@ -56,6 +56,7 @@
#define NFSDBG_PNFS 0x1000
#define NFSDBG_PNFS_LD 0x2000
#define NFSDBG_STATE 0x4000
+#define NFSDBG_XATTRCACHE 0x8000
#define NFSDBG_ALL 0xFFFF


--
2.17.2

2020-03-25 23:11:47

by Frank van der Linden

[permalink] [raw]
Subject: [PATCH v2 04/13] NFSv4.2: query the server for extended attribute support

Query the server for extended attribute support, and record it
as the NFS_CAP_XATTR flag in the server capabilities.

Signed-off-by: Frank van der Linden <[email protected]>
---
fs/nfs/client.c | 3 +++
fs/nfs/nfs4proc.c | 3 ++-
fs/nfs/nfs4xdr.c | 25 +++++++++++++++++++++++++
include/linux/nfs_fs_sb.h | 1 +
include/linux/nfs_xdr.h | 1 +
5 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index eef39a4ec114..234635ef307e 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -808,6 +808,9 @@ static void nfs_server_set_fsinfo(struct nfs_server *server,
XATTR_SIZE_MAX);
server->lxasize = min_t(unsigned int, raw_max_rpc_payload,
nfs42_listxattr_xdrsize(XATTR_LIST_MAX));
+
+ if (fsinfo->xattr_support)
+ server->caps |= NFS_CAP_XATTR;
#endif
}

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 69b7ab7a5815..11eac1b46749 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -256,6 +256,7 @@ const u32 nfs4_fsinfo_bitmap[3] = { FATTR4_WORD0_MAXFILESIZE
| FATTR4_WORD1_FS_LAYOUT_TYPES,
FATTR4_WORD2_LAYOUT_BLKSIZE
| FATTR4_WORD2_CLONE_BLKSIZE
+ | FATTR4_WORD2_XATTR_SUPPORT
};

const u32 nfs4_fs_locations_bitmap[3] = {
@@ -3737,7 +3738,7 @@ static void nfs4_close_context(struct nfs_open_context *ctx, int is_sync)

#define FATTR4_WORD1_NFS40_MASK (2*FATTR4_WORD1_MOUNTED_ON_FILEID - 1UL)
#define FATTR4_WORD2_NFS41_MASK (2*FATTR4_WORD2_SUPPATTR_EXCLCREAT - 1UL)
-#define FATTR4_WORD2_NFS42_MASK (2*FATTR4_WORD2_MODE_UMASK - 1UL)
+#define FATTR4_WORD2_NFS42_MASK (2*FATTR4_WORD2_XATTR_SUPPORT - 1UL)

static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *fhandle)
{
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 47817ef0aadb..9e1b07640e9a 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -4201,6 +4201,26 @@ static int decode_attr_time_modify(struct xdr_stream *xdr, uint32_t *bitmap, str
return status;
}

+static int decode_attr_xattrsupport(struct xdr_stream *xdr, uint32_t *bitmap,
+ uint32_t *res)
+{
+ __be32 *p;
+
+ *res = 0;
+ if (unlikely(bitmap[2] & (FATTR4_WORD2_XATTR_SUPPORT - 1U)))
+ return -EIO;
+ if (likely(bitmap[2] & FATTR4_WORD2_XATTR_SUPPORT)) {
+ p = xdr_inline_decode(xdr, 4);
+ if (unlikely(!p))
+ return -EIO;
+ *res = be32_to_cpup(p);
+ bitmap[2] &= ~FATTR4_WORD2_XATTR_SUPPORT;
+ }
+ dprintk("%s: XATTR support=%s\n", __func__,
+ *res == 0 ? "false" : "true");
+ return 0;
+}
+
static int verify_attr_len(struct xdr_stream *xdr, unsigned int savep, uint32_t attrlen)
{
unsigned int attrwords = XDR_QUADLEN(attrlen);
@@ -4855,6 +4875,11 @@ static int decode_fsinfo(struct xdr_stream *xdr, struct nfs_fsinfo *fsinfo)
if (status)
goto xdr_error;

+ status = decode_attr_xattrsupport(xdr, bitmap,
+ &fsinfo->xattr_support);
+ if (status)
+ goto xdr_error;
+
status = verify_attr_len(xdr, savep, attrlen);
xdr_error:
dprintk("%s: xdr returned %d!\n", __func__, -status);
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 128e01acb4ca..7eae72a8762e 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -286,5 +286,6 @@ struct nfs_server {
#define NFS_CAP_OFFLOAD_CANCEL (1U << 25)
#define NFS_CAP_LAYOUTERROR (1U << 26)
#define NFS_CAP_COPY_NOTIFY (1U << 27)
+#define NFS_CAP_XATTR (1U << 28)

#endif
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 94c77ed55ce1..6fb1f2b65db3 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -149,6 +149,7 @@ struct nfs_fsinfo {
__u32 layouttype[NFS_MAX_LAYOUT_TYPES]; /* supported pnfs layout driver */
__u32 blksize; /* preferred pnfs io block size */
__u32 clone_blksize; /* granularity of a CLONE operation */
+ __u32 xattr_support; /* User xattrs supported */
};

struct nfs_fsstat {
--
2.17.2

2020-03-25 23:12:04

by Frank van der Linden

[permalink] [raw]
Subject: [PATCH v2 10/13] nfs: make the buf_to_pages_noslab function available to the nfs code

Make the buf_to_pages_noslab function available to the rest of the NFS
code. Rename it to nfs4_buf_to_pages_noslab to be consistent.

This will be used later in the NFSv4.2 xattr code.

Signed-off-by: Frank van der Linden <[email protected]>
---
fs/nfs/nfs4_fs.h | 2 ++
fs/nfs/nfs4proc.c | 4 ++--
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index f82c3e884662..da18d0254c75 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -326,6 +326,8 @@ extern void nfs4_update_changeattr(struct inode *dir,
struct nfs4_change_info *cinfo,
unsigned long timestamp,
unsigned long cache_validity);
+extern int nfs4_buf_to_pages_noslab(const void *buf, size_t buflen,
+ struct page **pages);

#if defined(CONFIG_NFS_V4_1)
extern int nfs41_sequence_done(struct rpc_task *, struct nfs4_sequence_res *);
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 441bdbb7f247..c1193766cfe5 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5544,7 +5544,7 @@ static inline int nfs4_server_supports_acls(struct nfs_server *server)
*/
#define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE)

-static int buf_to_pages_noslab(const void *buf, size_t buflen,
+int nfs4_buf_to_pages_noslab(const void *buf, size_t buflen,
struct page **pages)
{
struct page *newpage, **spages;
@@ -5786,7 +5786,7 @@ static int __nfs4_proc_set_acl(struct inode *inode, const void *buf, size_t bufl
return -EOPNOTSUPP;
if (npages > ARRAY_SIZE(pages))
return -ERANGE;
- i = buf_to_pages_noslab(buf, buflen, arg.acl_pages);
+ i = nfs4_buf_to_pages_noslab(buf, buflen, arg.acl_pages);
if (i < 0)
return i;
nfs4_inode_make_writeable(inode);
--
2.17.2

2020-03-26 19:03:53

by Mkrtchyan, Tigran

[permalink] [raw]
Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support

Hi Frank.

----- Original Message -----
> From: "Frank van der Linden" <[email protected]>
> To: "linux-nfs" <[email protected]>, "Anna Schumaker" <[email protected]>, "Trond Myklebust"
> <[email protected]>
> Cc: "Frank van der Linden" <[email protected]>
> Sent: Thursday, March 26, 2020 12:10:38 AM
> Subject: [PATCH v2 00/13] NFS client user xattr (RFC8276) support

> v1 is here: https://www.spinics.net/lists/linux-nfs/msg76740.html
>
> v2:
>
> * Move nfs4 specific definitions to nfs_fs4.h
> * Squash some patches together to avoid unused function warnings
> when bisecting.
> * Made determining server support a two-step process. First,
> the extended attribute FATTR is verified to be supported, then
> the value of the attributed is queried in fsinfo to determine
> support.

The new patchset looks broken to me.

Client quiryes for supported attributes and gets xattr_support bit set:

Mar 26 11:27:07 ani.desy.de kernel: decode_attr_supported: bitmask=fcffbfff:40fdbe3e:00040800

However, the attribute never queries, but client makes is decision:

Mar 26 11:27:07 ani.desy.de kernel: decode_attr_xattrsupport: XATTR support=false

The packets can be found here: https://sas.desy.de/index.php/s/GEPiBxPg3eR4aGA

Can you provide packets of your mount/umount round.

Regards,
Tigran.


> * Fixed up Makefile to remove an unneeded extra line.
>
> This was tested as before (using my own stress tests), but also
> with xfstests-dev. No issues were found, but xfstests needs some
> fixes to correctly run the applicable xattr tests on NFS. I
> have those changes, but need to send them out.
>
> I also tested stress-ng-xattr with 1000 workers on the client
> side, running for 8 hours without problems (except for noting
> that the session tbl_lock can become quite hot when doing
> NFS operations by 1000 threads, but that's a separate issue).
>
>
> Frank van der Linden (13):
> nfs,nfsd: NFSv4.2 extended attribute protocol definitions
> nfs: add client side only definitions for user xattrs
> NFSv4.2: define limits and sizes for user xattr handling
> NFSv4.2: query the server for extended attribute support
> NFSv4.2: add client side XDR handling for extended attributes
> nfs: define nfs_access_get_cached function
> NFSv4.2: query the extended attribute access bits
> nfs: modify update_changeattr to deal with regular files
> nfs: define and use the NFS_INO_INVALID_XATTR flag
> nfs: make the buf_to_pages_noslab function available to the nfs code
> NFSv4.2: add the extended attribute proc functions.
> NFSv4.2: hook in the user extended attribute handlers
> NFSv4.2: add client side xattr caching.
>
> fs/nfs/Makefile | 2 +-
> fs/nfs/client.c | 22 +-
> fs/nfs/dir.c | 24 +-
> fs/nfs/inode.c | 16 +-
> fs/nfs/nfs42.h | 24 +
> fs/nfs/nfs42proc.c | 248 ++++++++
> fs/nfs/nfs42xattr.c | 1083 +++++++++++++++++++++++++++++++++++
> fs/nfs/nfs42xdr.c | 438 ++++++++++++++
> fs/nfs/nfs4_fs.h | 35 ++
> fs/nfs/nfs4client.c | 31 +
> fs/nfs/nfs4proc.c | 237 +++++++-
> fs/nfs/nfs4super.c | 10 +
> fs/nfs/nfs4xdr.c | 31 +
> fs/nfs/nfstrace.h | 3 +-
> include/linux/nfs4.h | 25 +
> include/linux/nfs_fs.h | 12 +
> include/linux/nfs_fs_sb.h | 6 +
> include/linux/nfs_xdr.h | 60 +-
> include/uapi/linux/nfs4.h | 3 +
> include/uapi/linux/nfs_fs.h | 1 +
> 20 files changed, 2269 insertions(+), 42 deletions(-)
> create mode 100644 fs/nfs/nfs42xattr.c
>
> --
> 2.17.2

2020-03-26 19:45:01

by Frank van der Linden

[permalink] [raw]
Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support

On Thu, Mar 26, 2020 at 08:03:13PM +0100, Mkrtchyan, Tigran wrote:
> Hi Frank.
>
> The new patchset looks broken to me.
>
> Client quiryes for supported attributes and gets xattr_support bit set:
>
> Mar 26 11:27:07 ani.desy.de kernel: decode_attr_supported: bitmask=fcffbfff:40fdbe3e:00040800
>
> However, the attribute never queries, but client makes is decision:
>
> Mar 26 11:27:07 ani.desy.de kernel: decode_attr_xattrsupport: XATTR support=false
>
> The packets can be found here: https://sas.desy.de/index.php/s/GEPiBxPg3eR4aGA
>
> Can you provide packets of your mount/umount round.
>
> Regards,
> Tigran.

Hi Tigran,

This patchset works against the server side patches, and also works against
a FreeBSD-current server.

Is this failure happening with your Java server?

Let me look at your packet dump and compare it with a working scenario.

- Frank

2020-03-26 23:16:46

by Frank van der Linden

[permalink] [raw]
Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support

On Thu, Mar 26, 2020 at 08:03:13PM +0100, Mkrtchyan, Tigran wrote:
> The new patchset looks broken to me.
>
> Client quiryes for supported attributes and gets xattr_support bit set:
>
> Mar 26 11:27:07 ani.desy.de kernel: decode_attr_supported: bitmask=fcffbfff:40fdbe3e:00040800
>
> However, the attribute never queries, but client makes is decision:
>
> Mar 26 11:27:07 ani.desy.de kernel: decode_attr_xattrsupport: XATTR support=false
>
> The packets can be found here: https://sas.desy.de/index.php/s/GEPiBxPg3eR4aGA
>
> Can you provide packets of your mount/umount round.

Hi Tigran,

I looked at your packet dump. It seems like your server only supports 4.1,
not 4.2. xattr support builds on 4.2 (within the rules laid out in
RFC 8178).

So, the fsinfo client call, which is just a GETATTR, masks out the 4.2
fattr bits from server->attr_mask, and just uses the 4.1 bits. Meaning that
xattr_support is not included, and defaults to false.

The packet dump also indicates that your server advertises the xattr_support
fattr as supported, even though it's in a 4.1 session, which would not
be correct.

- Frank

2020-03-27 07:52:07

by Mkrtchyan, Tigran

[permalink] [raw]
Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support



----- Original Message -----
> From: "Frank van der Linden" <[email protected]>
> To: "Tigran Mkrtchyan" <[email protected]>
> Cc: "linux-nfs" <[email protected]>, "Anna Schumaker" <[email protected]>, "Trond Myklebust"
> <[email protected]>
> Sent: Friday, March 27, 2020 12:16:02 AM
> Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support

> On Thu, Mar 26, 2020 at 08:03:13PM +0100, Mkrtchyan, Tigran wrote:
>> The new patchset looks broken to me.
>>
>> Client quiryes for supported attributes and gets xattr_support bit set:
>>
>> Mar 26 11:27:07 ani.desy.de kernel: decode_attr_supported:
>> bitmask=fcffbfff:40fdbe3e:00040800
>>
>> However, the attribute never queries, but client makes is decision:
>>
>> Mar 26 11:27:07 ani.desy.de kernel: decode_attr_xattrsupport: XATTR
>> support=false
>>
>> The packets can be found here: https://sas.desy.de/index.php/s/GEPiBxPg3eR4aGA
>>
>> Can you provide packets of your mount/umount round.
>
> Hi Tigran,
>
> I looked at your packet dump. It seems like your server only supports 4.1,
> not 4.2. xattr support builds on 4.2 (within the rules laid out in
> RFC 8178).

That's right, this is what rfc8276 says:

Note that the XDR code contained in this document depends on types
from the NFSv4.2 nfs4_prot.x file [RFC7863]. This includes both nfs
types that end with a 4, such as nfs_cookie4, count4, etc., as well
as more-generic types, such as opaque and bool.

However, xattr support doesn't relays on any functionality provided by v4.2. Moreover,
all used data structures (nfs_cookie4, component4, change_info4) defined in nfsv4.0.
Thus there are no reasons why even v4.0 server can't support xattrs.

>
> So, the fsinfo client call, which is just a GETATTR, masks out the 4.2
> fattr bits from server->attr_mask, and just uses the 4.1 bits. Meaning that
> xattr_support is not included, and defaults to false.

I was expecting something like this, but was unable to find the place where this
masking is happening.

Tigran.

>
> The packet dump also indicates that your server advertises the xattr_support
> fattr as supported, even though it's in a 4.1 session, which would not
> be correct.
>
> - Frank

2020-06-08 15:59:31

by Mkrtchyan, Tigran

[permalink] [raw]
Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support


Dear maintainers,

are those changes considered for 5.8?

Thanks,
Tigran.

----- Original Message -----
> From: "Tigran Mkrtchyan" <[email protected]>
> To: "Frank van der Linden" <[email protected]>
> Cc: "linux-nfs" <[email protected]>, "Anna Schumaker" <[email protected]>, "Trond Myklebust"
> <[email protected]>
> Sent: Saturday, April 4, 2020 2:18:54 PM
> Subject: Re:[PATCH v2 00/13] NFS client user xattr (RFC8276) support

> After adding 'minimal' 4.2 implementation to our server, the patchset works as
> expected.
> Thanks, Tigran.
>
> -------- Original message --------
> From: "Mkrtchyan, Tigran" <[email protected]>
> Date: Fri, Mar 27, 2020, 08:51
> To: Frank van der Linden <[email protected]>
> Cc: linux-nfs <[email protected]>, Anna Schumaker
> <[email protected]>, Trond Myklebust <[email protected]>
> Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support
>
>
> ----- Original Message -----
>> From: "Frank van der Linden" <[email protected]>
>> To: "Tigran Mkrtchyan" <[email protected]>
>> Cc: "linux-nfs" <[email protected]>, "Anna Schumaker"
>> <[email protected]>, "Trond Myklebust"
>> <[email protected]>
>> Sent: Friday, March 27, 2020 12:16:02 AM
>> Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support
>
>> On Thu, Mar 26, 2020 at 08:03:13PM +0100, Mkrtchyan, Tigran wrote:
>>> The new patchset looks broken to me.
>>>
>>> Client quiryes for supported attributes and gets xattr_support bit set:
>>>
>>> Mar 26 11:27:07 ani.desy.de kernel: decode_attr_supported:
>>> bitmask=fcffbfff:40fdbe3e:00040800
>>>
>>> However, the attribute never queries, but client makes is decision:
>>>
>>> Mar 26 11:27:07 ani.desy.de kernel: decode_attr_xattrsupport: XATTR
>>> support=false
>>>
>>> The packets can be found here: https://sas.desy.de/index.php/s/GEPiBxPg3eR4aGA
>>>
>>> Can you provide packets of your mount/umount round.
>>
>> Hi Tigran,
>>
>> I looked at your packet dump. It seems like your server only supports 4.1,
>> not 4.2. xattr support builds on 4.2 (within the rules laid out in
>> RFC 8178).
>
> That's right, this is what rfc8276 says:
>
> Note that the XDR code contained in this document depends on types
> from the NFSv4.2 nfs4_prot.x file [RFC7863]. This includes both nfs
> types that end with a 4, such as nfs_cookie4, count4, etc., as well
> as more-generic types, such as opaque and bool.
>
> However, xattr support doesn't relays on any functionality provided by v4.2.
> Moreover,
> all used data structures (nfs_cookie4, component4, change_info4) defined in
> nfsv4.0.
> Thus there are no reasons why even v4.0 server can't support xattrs.
>
>>
>> So, the fsinfo client call, which is just a GETATTR, masks out the 4.2
>> fattr bits from server->attr_mask, and just uses the 4.1 bits. Meaning that
>> xattr_support is not included, and defaults to false.
>
> I was expecting something like this, but was unable to find the place where this
> masking is happening.
>
> Tigran.
>
>>
>> The packet dump also indicates that your server advertises the xattr_support
>> fattr as supported, even though it's in a 4.1 session, which would not
>> be correct.
>>
> > - Frank

2020-06-08 16:17:26

by Anna Schumaker

[permalink] [raw]
Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support

Hi Tigran,


On Mon, Jun 8, 2020 at 11:59 AM Mkrtchyan, Tigran
<[email protected]> wrote:
>
>
> Dear maintainers,
>
> are those changes considered for 5.8?

My understanding is that these patches will be targeting 5.9.

Anna
>
> Thanks,
> Tigran.
>
> ----- Original Message -----
> > From: "Tigran Mkrtchyan" <[email protected]>
> > To: "Frank van der Linden" <[email protected]>
> > Cc: "linux-nfs" <[email protected]>, "Anna Schumaker" <[email protected]>, "Trond Myklebust"
> > <[email protected]>
> > Sent: Saturday, April 4, 2020 2:18:54 PM
> > Subject: Re:[PATCH v2 00/13] NFS client user xattr (RFC8276) support
>
> > After adding 'minimal' 4.2 implementation to our server, the patchset works as
> > expected.
> > Thanks, Tigran.
> >
> > -------- Original message --------
> > From: "Mkrtchyan, Tigran" <[email protected]>
> > Date: Fri, Mar 27, 2020, 08:51
> > To: Frank van der Linden <[email protected]>
> > Cc: linux-nfs <[email protected]>, Anna Schumaker
> > <[email protected]>, Trond Myklebust <[email protected]>
> > Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support
> >
> >
> > ----- Original Message -----
> >> From: "Frank van der Linden" <[email protected]>
> >> To: "Tigran Mkrtchyan" <[email protected]>
> >> Cc: "linux-nfs" <[email protected]>, "Anna Schumaker"
> >> <[email protected]>, "Trond Myklebust"
> >> <[email protected]>
> >> Sent: Friday, March 27, 2020 12:16:02 AM
> >> Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support
> >
> >> On Thu, Mar 26, 2020 at 08:03:13PM +0100, Mkrtchyan, Tigran wrote:
> >>> The new patchset looks broken to me.
> >>>
> >>> Client quiryes for supported attributes and gets xattr_support bit set:
> >>>
> >>> Mar 26 11:27:07 ani.desy.de kernel: decode_attr_supported:
> >>> bitmask=fcffbfff:40fdbe3e:00040800
> >>>
> >>> However, the attribute never queries, but client makes is decision:
> >>>
> >>> Mar 26 11:27:07 ani.desy.de kernel: decode_attr_xattrsupport: XATTR
> >>> support=false
> >>>
> >>> The packets can be found here: https://sas.desy.de/index.php/s/GEPiBxPg3eR4aGA
> >>>
> >>> Can you provide packets of your mount/umount round.
> >>
> >> Hi Tigran,
> >>
> >> I looked at your packet dump. It seems like your server only supports 4.1,
> >> not 4.2. xattr support builds on 4.2 (within the rules laid out in
> >> RFC 8178).
> >
> > That's right, this is what rfc8276 says:
> >
> > Note that the XDR code contained in this document depends on types
> > from the NFSv4.2 nfs4_prot.x file [RFC7863]. This includes both nfs
> > types that end with a 4, such as nfs_cookie4, count4, etc., as well
> > as more-generic types, such as opaque and bool.
> >
> > However, xattr support doesn't relays on any functionality provided by v4.2.
> > Moreover,
> > all used data structures (nfs_cookie4, component4, change_info4) defined in
> > nfsv4.0.
> > Thus there are no reasons why even v4.0 server can't support xattrs.
> >
> >>
> >> So, the fsinfo client call, which is just a GETATTR, masks out the 4.2
> >> fattr bits from server->attr_mask, and just uses the 4.1 bits. Meaning that
> >> xattr_support is not included, and defaults to false.
> >
> > I was expecting something like this, but was unable to find the place where this
> > masking is happening.
> >
> > Tigran.
> >
> >>
> >> The packet dump also indicates that your server advertises the xattr_support
> >> fattr as supported, even though it's in a 4.1 session, which would not
> >> be correct.
> >>
> > > - Frank

2020-06-08 16:41:08

by Mkrtchyan, Tigran

[permalink] [raw]
Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support

Thanks for clarification.

Tigran.

----- Original Message -----
> From: "Anna Schumaker" <[email protected]>
> To: "Tigran Mkrtchyan" <[email protected]>
> Cc: "linux-nfs" <[email protected]>, "Trond Myklebust" <[email protected]>, "Frank van der Linden"
> <[email protected]>
> Sent: Monday, June 8, 2020 6:15:06 PM
> Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support

> Hi Tigran,
>
>
> On Mon, Jun 8, 2020 at 11:59 AM Mkrtchyan, Tigran
> <[email protected]> wrote:
>>
>>
>> Dear maintainers,
>>
>> are those changes considered for 5.8?
>
> My understanding is that these patches will be targeting 5.9.
>
> Anna
>>
>> Thanks,
>> Tigran.
>>
>> ----- Original Message -----
>> > From: "Tigran Mkrtchyan" <[email protected]>
>> > To: "Frank van der Linden" <[email protected]>
>> > Cc: "linux-nfs" <[email protected]>, "Anna Schumaker"
>> > <[email protected]>, "Trond Myklebust"
>> > <[email protected]>
>> > Sent: Saturday, April 4, 2020 2:18:54 PM
>> > Subject: Re:[PATCH v2 00/13] NFS client user xattr (RFC8276) support
>>
>> > After adding 'minimal' 4.2 implementation to our server, the patchset works as
>> > expected.
>> > Thanks, Tigran.
>> >
>> > -------- Original message --------
>> > From: "Mkrtchyan, Tigran" <[email protected]>
>> > Date: Fri, Mar 27, 2020, 08:51
>> > To: Frank van der Linden <[email protected]>
>> > Cc: linux-nfs <[email protected]>, Anna Schumaker
>> > <[email protected]>, Trond Myklebust <[email protected]>
>> > Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support
>> >
>> >
>> > ----- Original Message -----
>> >> From: "Frank van der Linden" <[email protected]>
>> >> To: "Tigran Mkrtchyan" <[email protected]>
>> >> Cc: "linux-nfs" <[email protected]>, "Anna Schumaker"
>> >> <[email protected]>, "Trond Myklebust"
>> >> <[email protected]>
>> >> Sent: Friday, March 27, 2020 12:16:02 AM
>> >> Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support
>> >
>> >> On Thu, Mar 26, 2020 at 08:03:13PM +0100, Mkrtchyan, Tigran wrote:
>> >>> The new patchset looks broken to me.
>> >>>
>> >>> Client quiryes for supported attributes and gets xattr_support bit set:
>> >>>
>> >>> Mar 26 11:27:07 ani.desy.de kernel: decode_attr_supported:
>> >>> bitmask=fcffbfff:40fdbe3e:00040800
>> >>>
>> >>> However, the attribute never queries, but client makes is decision:
>> >>>
>> >>> Mar 26 11:27:07 ani.desy.de kernel: decode_attr_xattrsupport: XATTR
>> >>> support=false
>> >>>
>> >>> The packets can be found here: https://sas.desy.de/index.php/s/GEPiBxPg3eR4aGA
>> >>>
>> >>> Can you provide packets of your mount/umount round.
>> >>
>> >> Hi Tigran,
>> >>
>> >> I looked at your packet dump. It seems like your server only supports 4.1,
>> >> not 4.2. xattr support builds on 4.2 (within the rules laid out in
>> >> RFC 8178).
>> >
>> > That's right, this is what rfc8276 says:
>> >
>> > Note that the XDR code contained in this document depends on types
>> > from the NFSv4.2 nfs4_prot.x file [RFC7863]. This includes both nfs
>> > types that end with a 4, such as nfs_cookie4, count4, etc., as well
>> > as more-generic types, such as opaque and bool.
>> >
>> > However, xattr support doesn't relays on any functionality provided by v4.2.
>> > Moreover,
>> > all used data structures (nfs_cookie4, component4, change_info4) defined in
>> > nfsv4.0.
>> > Thus there are no reasons why even v4.0 server can't support xattrs.
>> >
>> >>
>> >> So, the fsinfo client call, which is just a GETATTR, masks out the 4.2
>> >> fattr bits from server->attr_mask, and just uses the 4.1 bits. Meaning that
>> >> xattr_support is not included, and defaults to false.
>> >
>> > I was expecting something like this, but was unable to find the place where this
>> > masking is happening.
>> >
>> > Tigran.
>> >
>> >>
>> >> The packet dump also indicates that your server advertises the xattr_support
>> >> fattr as supported, even though it's in a 4.1 session, which would not
>> >> be correct.
>> >>
> > > > - Frank

2020-06-08 16:49:26

by Frank van der Linden

[permalink] [raw]
Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support

On Mon, Jun 08, 2020 at 06:37:29PM +0200, Mkrtchyan, Tigran wrote:
> Thanks for clarification.
>
> Tigran.
>
> ----- Original Message -----
> > From: "Anna Schumaker" <[email protected]>
> > To: "Tigran Mkrtchyan" <[email protected]>
> > Cc: "linux-nfs" <[email protected]>, "Trond Myklebust" <[email protected]>, "Frank van der Linden"
> > <[email protected]>
> > Sent: Monday, June 8, 2020 6:15:06 PM
> > Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support
>
> > Hi Tigran,
> >
> >
> > On Mon, Jun 8, 2020 at 11:59 AM Mkrtchyan, Tigran
> > <[email protected]> wrote:
> >>
> >>
> >> Dear maintainers,
> >>
> >> are those changes considered for 5.8?
> >
> > My understanding is that these patches will be targeting 5.9.
> >
> > Anna

Hi Tigran,

Since there is one remaining open issue on the server side changes that needs
signoff from the general fs maintainers, I agreed with Bruce & Chuck to target
5.9, as the 5.8 merge window is currently already open, and all the activity
is centered around it.

For the client side, there are no open issues that were flagged, so from my
side it's all ready to go - except for me to post a v3 rebased to
the latest tree, which is easy to do.

In other words, I think it's ok for the client side to go in to 5.8, but it
probably makes more sense to have it go in to the same version as the server
side, so that's what I proposed to Anna & Trond.

- Frank

2020-06-08 16:53:29

by Mkrtchyan, Tigran

[permalink] [raw]
Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support


Hi Frank,

----- Original Message -----
> From: "Frank van der Linden" <[email protected]>
> To: "Tigran Mkrtchyan" <[email protected]>
> Cc: "Anna Schumaker" <[email protected]>, "linux-nfs" <[email protected]>, "Trond Myklebust"
> <[email protected]>
> Sent: Monday, June 8, 2020 6:47:42 PM
> Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support

> On Mon, Jun 08, 2020 at 06:37:29PM +0200, Mkrtchyan, Tigran wrote:
>> Thanks for clarification.
>>
>> Tigran.
>>
>> ----- Original Message -----
>> > From: "Anna Schumaker" <[email protected]>
>> > To: "Tigran Mkrtchyan" <[email protected]>
>> > Cc: "linux-nfs" <[email protected]>, "Trond Myklebust"
>> > <[email protected]>, "Frank van der Linden"
>> > <[email protected]>
>> > Sent: Monday, June 8, 2020 6:15:06 PM
>> > Subject: Re: [PATCH v2 00/13] NFS client user xattr (RFC8276) support
>>
>> > Hi Tigran,
>> >
>> >
>> > On Mon, Jun 8, 2020 at 11:59 AM Mkrtchyan, Tigran
>> > <[email protected]> wrote:
>> >>
>> >>
>> >> Dear maintainers,
>> >>
>> >> are those changes considered for 5.8?
>> >
>> > My understanding is that these patches will be targeting 5.9.
>> >
>> > Anna
>
> Hi Tigran,
>
> Since there is one remaining open issue on the server side changes that needs
> signoff from the general fs maintainers, I agreed with Bruce & Chuck to target
> 5.9, as the 5.8 merge window is currently already open, and all the activity
> is centered around it.
>
> For the client side, there are no open issues that were flagged, so from my
> side it's all ready to go - except for me to post a v3 rebased to
> the latest tree, which is easy to do.
>
> In other words, I think it's ok for the client side to go in to 5.8, but it
> probably makes more sense to have it go in to the same version as the server
> side, so that's what I proposed to Anna & Trond.


Makes sense.

Thanks,
Tigran.

>
> - Frank