2022-10-28 15:05:49

by Chuck Lever

[permalink] [raw]
Subject: [PATCH v7 00/14] A course adjustment, for sure...

I'm proposing this series for v6.2 (for-next).

For quite some time, we've been encouraged to disable filecache
garbage collection for NFSv4 files, and I think I found a surgical
way to do just that. That is presented in "NFSD: Add an NFSD_FILE_GC
flag to enable nfsd_file garbage collection".

The other major change in this set is reworking the file_hashtbl to
resize itself dynamically. This reduces the average size of its
bucket chains, greatly speeding up hash insertion, which holds the
state_lock.

Comments and opinions are welcome. I think this is the last version
of this series...? Jeff's fixes will follow or be squashed into this
one.


Changes since v6:
- Reviewed-by tags have been updated
- Renamed nfs4_file hash helper functions

Changes since v5:
- Wrap hash insertion with inode->i_lock
- Replace hashfn and friends with in-built rhashtable functions
- Add a tracepoint to report delegation return

Changes since v4:
- Addressed some comments in the GC patch; more to come
- Split clean-ups out of the rhashtable patch, reordered the series
- Removed state_lock from the rhashtable helpers

Changes since v3:
- the new filehandle alias check was still not right

Changes since v2:
- Converted nfs4_file_rhashtbl to nfs4_file_rhltable
- Addressed most or all other review comments

Changes since RFC:
- checking nfs4_files for inode aliases is now done only on hash
insertion
- the nfs4_file reference count is now bumped only while the RCU
read lock is held
- comments and function names have been revised and clarified

---

Chuck Lever (14):
NFSD: Pass the target nfsd_file to nfsd_commit()
NFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file immediately"
NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection
NFSD: Clean up nfs4_preprocess_stateid_op() call sites
NFSD: Trace stateids returned via DELEGRETURN
NFSD: Trace delegation revocations
NFSD: Use const pointers as parameters to fh_ helpers
NFSD: Update file_hashtbl() helpers
NFSD: Clean up nfsd4_init_file()
NFSD: Add a nfsd4_file_hash_remove() helper
NFSD: Clean up find_or_add_file()
NFSD: Refactor find_file()
NFSD: Allocate an rhashtable for nfs4_file objects
NFSD: Use rhashtable for managing nfs4_file objects


fs/nfsd/filecache.c | 81 ++++++++++++++-------
fs/nfsd/filecache.h | 4 +-
fs/nfsd/nfs3proc.c | 10 ++-
fs/nfsd/nfs4proc.c | 42 +++++------
fs/nfsd/nfs4state.c | 166 +++++++++++++++++++++++++-------------------
fs/nfsd/nfsfh.h | 10 +--
fs/nfsd/state.h | 5 +-
fs/nfsd/trace.h | 59 +++++++++++++++-
fs/nfsd/vfs.c | 19 ++---
fs/nfsd/vfs.h | 3 +-
10 files changed, 250 insertions(+), 149 deletions(-)

--
Chuck Lever



2022-10-28 15:06:03

by Chuck Lever

[permalink] [raw]
Subject: [PATCH v7 12/14] NFSD: Refactor find_file()

find_file() is now the only caller of find_file_locked(), so just
fold these two together.

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
---
fs/nfsd/nfs4state.c | 36 +++++++++++++++---------------------
1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index b1988a46fb9b..2b694d693be5 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4667,31 +4667,24 @@ move_to_close_lru(struct nfs4_ol_stateid *s, struct net *net)
nfs4_put_stid(&last->st_stid);
}

-/* search file_hashtbl[] for file */
-static struct nfs4_file *
-find_file_locked(const struct svc_fh *fh, unsigned int hashval)
+static noinline_for_stack struct nfs4_file *
+nfsd4_file_hash_lookup(const struct svc_fh *fhp)
{
- struct nfs4_file *fp;
+ unsigned int hashval = file_hashval(fhp);
+ struct nfs4_file *fi;

- hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash,
- lockdep_is_held(&state_lock)) {
- if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) {
- if (refcount_inc_not_zero(&fp->fi_ref))
- return fp;
+ rcu_read_lock();
+ hlist_for_each_entry_rcu(fi, &file_hashtbl[hashval], fi_hash,
+ lockdep_is_held(&state_lock)) {
+ if (fh_match(&fi->fi_fhandle, &fhp->fh_handle)) {
+ if (refcount_inc_not_zero(&fi->fi_ref)) {
+ rcu_read_unlock();
+ return fi;
+ }
}
}
- return NULL;
-}
-
-static struct nfs4_file * find_file(struct svc_fh *fh)
-{
- struct nfs4_file *fp;
- unsigned int hashval = file_hashval(fh);
-
- rcu_read_lock();
- fp = find_file_locked(fh, hashval);
rcu_read_unlock();
- return fp;
+ return NULL;
}

/*
@@ -4742,9 +4735,10 @@ nfs4_share_conflict(struct svc_fh *current_fh, unsigned int deny_type)
struct nfs4_file *fp;
__be32 ret = nfs_ok;

- fp = find_file(current_fh);
+ fp = nfsd4_file_hash_lookup(current_fh);
if (!fp)
return ret;
+
/* Check for conflicting share reservations */
spin_lock(&fp->fi_lock);
if (fp->fi_share_deny & deny_type)



2022-10-28 15:06:17

by Chuck Lever

[permalink] [raw]
Subject: [PATCH v7 03/14] NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection

NFSv4 operations manage the lifetime of nfsd_file items they use by
means of NFSv4 OPEN and CLOSE. Hence there's no need for them to be
garbage collected.

Introduce a mechanism to enable garbage collection for nfsd_file
items used only by NFSv2/3 callers.

Note that the change in nfsd_file_put() ensures that both CLOSE and
DELEGRETURN will actually close out and free an nfsd_file on last
reference of a non-garbage-collected file.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=394
Suggested-by: Trond Myklebust <[email protected]>
Tested-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
---
fs/nfsd/filecache.c | 63 +++++++++++++++++++++++++++++++++++++++++++++------
fs/nfsd/filecache.h | 3 ++
fs/nfsd/nfs3proc.c | 4 ++-
fs/nfsd/trace.h | 3 ++
fs/nfsd/vfs.c | 4 ++-
5 files changed, 64 insertions(+), 13 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index beb41a507623..965c7b13fddc 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -63,6 +63,7 @@ struct nfsd_file_lookup_key {
struct net *net;
const struct cred *cred;
unsigned char need;
+ bool gc;
enum nfsd_file_lookup_type type;
};

@@ -162,6 +163,8 @@ static int nfsd_file_obj_cmpfn(struct rhashtable_compare_arg *arg,
return 1;
if (!nfsd_match_cred(nf->nf_cred, key->cred))
return 1;
+ if (!!test_bit(NFSD_FILE_GC, &nf->nf_flags) != key->gc)
+ return 1;
if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0)
return 1;
break;
@@ -297,6 +300,8 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may)
nf->nf_flags = 0;
__set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
__set_bit(NFSD_FILE_PENDING, &nf->nf_flags);
+ if (key->gc)
+ __set_bit(NFSD_FILE_GC, &nf->nf_flags);
nf->nf_inode = key->inode;
/* nf_ref is pre-incremented for hash table */
refcount_set(&nf->nf_ref, 2);
@@ -428,16 +433,27 @@ nfsd_file_put_noref(struct nfsd_file *nf)
}
}

+static void
+nfsd_file_unhash_and_put(struct nfsd_file *nf)
+{
+ if (nfsd_file_unhash(nf))
+ nfsd_file_put_noref(nf);
+}
+
void
nfsd_file_put(struct nfsd_file *nf)
{
might_sleep();

- nfsd_file_lru_add(nf);
- if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) {
+ if (test_bit(NFSD_FILE_GC, &nf->nf_flags))
+ nfsd_file_lru_add(nf);
+ else if (refcount_read(&nf->nf_ref) == 2)
+ nfsd_file_unhash_and_put(nf);
+
+ if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
nfsd_file_flush(nf);
nfsd_file_put_noref(nf);
- } else if (nf->nf_file) {
+ } else if (nf->nf_file && test_bit(NFSD_FILE_GC, &nf->nf_flags)) {
nfsd_file_put_noref(nf);
nfsd_file_schedule_laundrette();
} else
@@ -1017,12 +1033,14 @@ nfsd_file_is_cached(struct inode *inode)

static __be32
nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
- unsigned int may_flags, struct nfsd_file **pnf, bool open)
+ unsigned int may_flags, struct nfsd_file **pnf,
+ bool open, bool want_gc)
{
struct nfsd_file_lookup_key key = {
.type = NFSD_FILE_KEY_FULL,
.need = may_flags & NFSD_FILE_MAY_MASK,
.net = SVC_NET(rqstp),
+ .gc = want_gc,
};
bool open_retry = true;
struct nfsd_file *nf;
@@ -1117,14 +1135,35 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
* then unhash.
*/
if (status != nfs_ok || key.inode->i_nlink == 0)
- if (nfsd_file_unhash(nf))
- nfsd_file_put_noref(nf);
+ nfsd_file_unhash_and_put(nf);
clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags);
smp_mb__after_atomic();
wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING);
goto out;
}

+/**
+ * nfsd_file_acquire_gc - Get a struct nfsd_file with an open file
+ * @rqstp: the RPC transaction being executed
+ * @fhp: the NFS filehandle of the file to be opened
+ * @may_flags: NFSD_MAY_ settings for the file
+ * @pnf: OUT: new or found "struct nfsd_file" object
+ *
+ * The nfsd_file object returned by this API is reference-counted
+ * and garbage-collected. The object is retained for a few
+ * seconds after the final nfsd_file_put() in case the caller
+ * wants to re-use it.
+ *
+ * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in
+ * network byte order is returned.
+ */
+__be32
+nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
+ unsigned int may_flags, struct nfsd_file **pnf)
+{
+ return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true, true);
+}
+
/**
* nfsd_file_acquire - Get a struct nfsd_file with an open file
* @rqstp: the RPC transaction being executed
@@ -1132,6 +1171,10 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
* @may_flags: NFSD_MAY_ settings for the file
* @pnf: OUT: new or found "struct nfsd_file" object
*
+ * The nfsd_file_object returned by this API is reference-counted
+ * but not garbage-collected. The object is unhashed after the
+ * final nfsd_file_put().
+ *
* Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in
* network byte order is returned.
*/
@@ -1139,7 +1182,7 @@ __be32
nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
unsigned int may_flags, struct nfsd_file **pnf)
{
- return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true);
+ return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true, false);
}

/**
@@ -1149,6 +1192,10 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
* @may_flags: NFSD_MAY_ settings for the file
* @pnf: OUT: new or found "struct nfsd_file" object
*
+ * The nfsd_file_object returned by this API is reference-counted
+ * but not garbage-collected. The object is released immediately
+ * one RCU grace period after the final nfsd_file_put().
+ *
* Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in
* network byte order is returned.
*/
@@ -1156,7 +1203,7 @@ __be32
nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
unsigned int may_flags, struct nfsd_file **pnf)
{
- return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, false);
+ return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, false, false);
}

/*
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 6b012ea4bd9d..b7efb2c3ddb1 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -38,6 +38,7 @@ struct nfsd_file {
#define NFSD_FILE_HASHED (0)
#define NFSD_FILE_PENDING (1)
#define NFSD_FILE_REFERENCED (2)
+#define NFSD_FILE_GC (3)
unsigned long nf_flags;
struct inode *nf_inode; /* don't deref */
refcount_t nf_ref;
@@ -55,6 +56,8 @@ void nfsd_file_put(struct nfsd_file *nf);
struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
void nfsd_file_close_inode_sync(struct inode *inode);
bool nfsd_file_is_cached(struct inode *inode);
+__be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
+ unsigned int may_flags, struct nfsd_file **nfp);
__be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
unsigned int may_flags, struct nfsd_file **nfp);
__be32 nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index ff2920546333..d01b29aba662 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -772,8 +772,8 @@ nfsd3_proc_commit(struct svc_rqst *rqstp)
(unsigned long long) argp->offset);

fh_copy(&resp->fh, &argp->fh);
- resp->status = nfsd_file_acquire(rqstp, &resp->fh, NFSD_MAY_WRITE |
- NFSD_MAY_NOT_BREAK_LEASE, &nf);
+ resp->status = nfsd_file_acquire_gc(rqstp, &resp->fh, NFSD_MAY_WRITE |
+ NFSD_MAY_NOT_BREAK_LEASE, &nf);
if (resp->status)
goto out;
resp->status = nfsd_commit(rqstp, &resp->fh, nf, argp->offset,
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 06a96e955bd0..b065a4b1e0dc 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -814,7 +814,8 @@ DEFINE_CLID_EVENT(confirmed_r);
__print_flags(val, "|", \
{ 1 << NFSD_FILE_HASHED, "HASHED" }, \
{ 1 << NFSD_FILE_PENDING, "PENDING" }, \
- { 1 << NFSD_FILE_REFERENCED, "REFERENCED"})
+ { 1 << NFSD_FILE_REFERENCED, "REFERENCED"}, \
+ { 1 << NFSD_FILE_GC, "GC"})

DECLARE_EVENT_CLASS(nfsd_file_class,
TP_PROTO(struct nfsd_file *nf),
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 91c34cef11d8..df1830a1a295 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1084,7 +1084,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
__be32 err;

trace_nfsd_read_start(rqstp, fhp, offset, *count);
- err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
+ err = nfsd_file_acquire_gc(rqstp, fhp, NFSD_MAY_READ, &nf);
if (err)
return err;

@@ -1116,7 +1116,7 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,

trace_nfsd_write_start(rqstp, fhp, offset, *cnt);

- err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_WRITE, &nf);
+ err = nfsd_file_acquire_gc(rqstp, fhp, NFSD_MAY_WRITE, &nf);
if (err)
goto out;




2022-10-28 15:06:17

by Chuck Lever

[permalink] [raw]
Subject: [PATCH v7 01/14] NFSD: Pass the target nfsd_file to nfsd_commit()

In a moment I'm going to introduce separate nfsd_file types, one of
which is garbage-collected; the other, not. The garbage-collected
variety is to be used by NFSv2 and v3, and the non-garbage-collected
variety is to be used by NFSv4.

nfsd_commit() is invoked by both NFSv3 and NFSv4 consumers. We want
nfsd_commit() to find and use the correct variety of cached
nfsd_file object for the NFS version that is in use.

Signed-off-by: Chuck Lever <[email protected]>
Tested-by: Jeff Layton <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
---
fs/nfsd/nfs3proc.c | 10 +++++++++-
fs/nfsd/nfs4proc.c | 11 ++++++++++-
fs/nfsd/vfs.c | 15 ++++-----------
fs/nfsd/vfs.h | 3 ++-
4 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
index 923d9a80df92..ff2920546333 100644
--- a/fs/nfsd/nfs3proc.c
+++ b/fs/nfsd/nfs3proc.c
@@ -13,6 +13,7 @@
#include "cache.h"
#include "xdr3.h"
#include "vfs.h"
+#include "filecache.h"

#define NFSDDBG_FACILITY NFSDDBG_PROC

@@ -763,6 +764,7 @@ nfsd3_proc_commit(struct svc_rqst *rqstp)
{
struct nfsd3_commitargs *argp = rqstp->rq_argp;
struct nfsd3_commitres *resp = rqstp->rq_resp;
+ struct nfsd_file *nf;

dprintk("nfsd: COMMIT(3) %s %u@%Lu\n",
SVCFH_fmt(&argp->fh),
@@ -770,8 +772,14 @@ nfsd3_proc_commit(struct svc_rqst *rqstp)
(unsigned long long) argp->offset);

fh_copy(&resp->fh, &argp->fh);
- resp->status = nfsd_commit(rqstp, &resp->fh, argp->offset,
+ resp->status = nfsd_file_acquire(rqstp, &resp->fh, NFSD_MAY_WRITE |
+ NFSD_MAY_NOT_BREAK_LEASE, &nf);
+ if (resp->status)
+ goto out;
+ resp->status = nfsd_commit(rqstp, &resp->fh, nf, argp->offset,
argp->count, resp->verf);
+ nfsd_file_put(nf);
+out:
return rpc_success;
}

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 8beb2bc4c328..42db62413890 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -731,10 +731,19 @@ nfsd4_commit(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
union nfsd4_op_u *u)
{
struct nfsd4_commit *commit = &u->commit;
+ struct nfsd_file *nf;
+ __be32 status;

- return nfsd_commit(rqstp, &cstate->current_fh, commit->co_offset,
+ status = nfsd_file_acquire(rqstp, &cstate->current_fh, NFSD_MAY_WRITE |
+ NFSD_MAY_NOT_BREAK_LEASE, &nf);
+ if (status != nfs_ok)
+ return status;
+
+ status = nfsd_commit(rqstp, &cstate->current_fh, nf, commit->co_offset,
commit->co_count,
(__be32 *)commit->co_verf.data);
+ nfsd_file_put(nf);
+ return status;
}

static __be32
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index f650afedd67f..91c34cef11d8 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1132,6 +1132,7 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
* nfsd_commit - Commit pending writes to stable storage
* @rqstp: RPC request being processed
* @fhp: NFS filehandle
+ * @nf: target file
* @offset: raw offset from beginning of file
* @count: raw count of bytes to sync
* @verf: filled in with the server's current write verifier
@@ -1148,19 +1149,13 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
* An nfsstat value in network byte order.
*/
__be32
-nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, u64 offset,
- u32 count, __be32 *verf)
+nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
+ u64 offset, u32 count, __be32 *verf)
{
+ __be32 err = nfs_ok;
u64 maxbytes;
loff_t start, end;
struct nfsd_net *nn;
- struct nfsd_file *nf;
- __be32 err;
-
- err = nfsd_file_acquire(rqstp, fhp,
- NFSD_MAY_WRITE|NFSD_MAY_NOT_BREAK_LEASE, &nf);
- if (err)
- goto out;

/*
* Convert the client-provided (offset, count) range to a
@@ -1201,8 +1196,6 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp, u64 offset,
} else
nfsd_copy_write_verifier(verf, nn);

- nfsd_file_put(nf);
-out:
return err;
}

diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 120521bc7b24..9744b041105b 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -88,7 +88,8 @@ __be32 nfsd_access(struct svc_rqst *, struct svc_fh *, u32 *, u32 *);
__be32 nfsd_create_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp,
struct svc_fh *resfhp, struct nfsd_attrs *iap);
__be32 nfsd_commit(struct svc_rqst *rqst, struct svc_fh *fhp,
- u64 offset, u32 count, __be32 *verf);
+ struct nfsd_file *nf, u64 offset, u32 count,
+ __be32 *verf);
#ifdef CONFIG_NFSD_V4
__be32 nfsd_getxattr(struct svc_rqst *rqstp, struct svc_fh *fhp,
char *name, void **bufp, int *lenp);



2022-10-28 15:06:35

by Chuck Lever

[permalink] [raw]
Subject: [PATCH v7 09/14] NFSD: Clean up nfsd4_init_file()

Name this function more consistently. I'm going to use nfsd4_file_
and nfsd4_file_hash_ for these helpers.

Change the @fh parameter to be const pointer for better type safety.

Finally, move the hash insertion operation to the caller. This is
typical for most other "init_object" type helpers, and it is where
most of the other nfs4_file hash table operations are located.

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
---
fs/nfsd/nfs4state.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 60f1aa2c5442..3132e4844ef8 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4262,11 +4262,9 @@ static struct nfs4_file *nfsd4_alloc_file(void)
}

/* OPEN Share state helper functions */
-static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval,
- struct nfs4_file *fp)
-{
- lockdep_assert_held(&state_lock);

+static void nfsd4_file_init(const struct svc_fh *fh, struct nfs4_file *fp)
+{
refcount_set(&fp->fi_ref, 1);
spin_lock_init(&fp->fi_lock);
INIT_LIST_HEAD(&fp->fi_stateids);
@@ -4284,7 +4282,6 @@ static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval,
INIT_LIST_HEAD(&fp->fi_lo_states);
atomic_set(&fp->fi_lo_recalls, 0);
#endif
- hlist_add_head_rcu(&fp->fi_hash, &file_hashtbl[hashval]);
}

void
@@ -4702,7 +4699,8 @@ static struct nfs4_file *insert_file(struct nfs4_file *new, struct svc_fh *fh,
fp->fi_aliased = alias_found = true;
}
if (likely(ret == NULL)) {
- nfsd4_init_file(fh, hashval, new);
+ nfsd4_file_init(fh, new);
+ hlist_add_head_rcu(&new->fi_hash, &file_hashtbl[hashval]);
new->fi_aliased = alias_found;
ret = new;
}



2022-10-28 15:06:35

by Chuck Lever

[permalink] [raw]
Subject: [PATCH v7 02/14] NFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file immediately"

This reverts commit 5e138c4a750dc140d881dab4a8804b094bbc08d2.

That commit attempted to make files available to other users as soon
as all NFSv4 clients were done with them, rather than waiting until
the filecache LRU had garbage collected them.

It gets the reference counting wrong, for one thing.

But it also misses that DELEGRETURN should release a file in the
same fashion. In fact, any nfsd_file_put() on an file held open
by an NFSv4 client needs potentially to release the file
immediately...

Clear the way for implementing that idea.

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
---
fs/nfsd/filecache.c | 18 ------------------
fs/nfsd/filecache.h | 1 -
fs/nfsd/nfs4state.c | 4 ++--
3 files changed, 2 insertions(+), 21 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 29a62db155fb..beb41a507623 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -444,24 +444,6 @@ nfsd_file_put(struct nfsd_file *nf)
nfsd_file_put_noref(nf);
}

-/**
- * nfsd_file_close - Close an nfsd_file
- * @nf: nfsd_file to close
- *
- * If this is the final reference for @nf, free it immediately.
- * This reflects an on-the-wire CLOSE or DELEGRETURN into the
- * VFS and exported filesystem.
- */
-void nfsd_file_close(struct nfsd_file *nf)
-{
- nfsd_file_put(nf);
- if (refcount_dec_if_one(&nf->nf_ref)) {
- nfsd_file_unhash(nf);
- nfsd_file_lru_remove(nf);
- nfsd_file_free(nf);
- }
-}
-
struct nfsd_file *
nfsd_file_get(struct nfsd_file *nf)
{
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 357832bac736..6b012ea4bd9d 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -52,7 +52,6 @@ void nfsd_file_cache_shutdown(void);
int nfsd_file_cache_start_net(struct net *net);
void nfsd_file_cache_shutdown_net(struct net *net);
void nfsd_file_put(struct nfsd_file *nf);
-void nfsd_file_close(struct nfsd_file *nf);
struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
void nfsd_file_close_inode_sync(struct inode *inode);
bool nfsd_file_is_cached(struct inode *inode);
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 4e718500a00c..c829b828b6fd 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -831,9 +831,9 @@ static void __nfs4_file_put_access(struct nfs4_file *fp, int oflag)
swap(f2, fp->fi_fds[O_RDWR]);
spin_unlock(&fp->fi_lock);
if (f1)
- nfsd_file_close(f1);
+ nfsd_file_put(f1);
if (f2)
- nfsd_file_close(f2);
+ nfsd_file_put(f2);
}
}




2022-10-28 15:06:36

by Chuck Lever

[permalink] [raw]
Subject: [PATCH v7 10/14] NFSD: Add a nfsd4_file_hash_remove() helper

Refactor to relocate hash deletion operation to a helper function
that is close to most other nfs4_file data structure operations.

The "noinline" annotation will become useful in a moment when the
hlist_del_rcu() is replaced with a more complex rhash remove
operation. It also guarantees that hash remove operations can be
traced with "-p function -l remove_nfs4_file_locked".

This also simplifies the organization of forward declarations: the
to-be-added rhashtable and its param structure will be defined
/after/ put_nfs4_file().

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
---
fs/nfsd/nfs4state.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 3132e4844ef8..504455cb80e9 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -84,6 +84,7 @@ static bool check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
static void nfs4_free_ol_stateid(struct nfs4_stid *stid);
void nfsd4_end_grace(struct nfsd_net *nn);
static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps);
+static void nfsd4_file_hash_remove(struct nfs4_file *fi);

/* Locking: */

@@ -591,7 +592,7 @@ put_nfs4_file(struct nfs4_file *fi)
might_lock(&state_lock);

if (refcount_dec_and_lock(&fi->fi_ref, &state_lock)) {
- hlist_del_rcu(&fi->fi_hash);
+ nfsd4_file_hash_remove(fi);
spin_unlock(&state_lock);
WARN_ON_ONCE(!list_empty(&fi->fi_clnt_odstate));
WARN_ON_ONCE(!list_empty(&fi->fi_delegations));
@@ -4734,6 +4735,11 @@ find_or_add_file(struct nfs4_file *new, struct svc_fh *fh)
return insert_file(new, fh, hashval);
}

+static noinline_for_stack void nfsd4_file_hash_remove(struct nfs4_file *fi)
+{
+ hlist_del_rcu(&fi->fi_hash);
+}
+
/*
* Called to check deny when READ with all zero stateid or
* WRITE with all zero or all one stateid



2022-10-28 15:07:42

by Chuck Lever

[permalink] [raw]
Subject: [PATCH v7 05/14] NFSD: Trace stateids returned via DELEGRETURN

Handing out a delegation stateid is recorded with the
nfsd_deleg_read tracepoint, but there isn't a matching tracepoint
for recording when the stateid is returned.

Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfsd/nfs4state.c | 1 +
fs/nfsd/trace.h | 1 +
2 files changed, 2 insertions(+)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index c829b828b6fd..93cfae7cd391 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -6901,6 +6901,7 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
if (status)
goto put_stateid;

+ trace_nfsd_deleg_return(stateid);
wake_up_var(d_inode(cstate->current_fh.fh_dentry));
destroy_delegation(dp);
put_stateid:
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index b065a4b1e0dc..477c2b035872 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -601,6 +601,7 @@ DEFINE_STATEID_EVENT(layout_recall_release);

DEFINE_STATEID_EVENT(open);
DEFINE_STATEID_EVENT(deleg_read);
+DEFINE_STATEID_EVENT(deleg_return);
DEFINE_STATEID_EVENT(deleg_recall);

DECLARE_EVENT_CLASS(nfsd_stateseqid_class,



2022-10-28 15:08:32

by Chuck Lever

[permalink] [raw]
Subject: [PATCH v7 14/14] NFSD: Use rhashtable for managing nfs4_file objects

fh_match() is costly, especially when filehandles are large (as is
the case for NFSv4). It needs to be used sparingly when searching
data structures. Unfortunately, with common workloads, I see
multiple thousands of objects stored in file_hashtbl[], which has
just 256 buckets, making its bucket hash chains quite lengthy.

Walking long hash chains with the state_lock held blocks other
activity that needs that lock. Sizable hash chains are a common
occurrance once the server has handed out some delegations, for
example -- IIUC, each delegated file is held open on the server by
an nfs4_file object.

To help mitigate the cost of searching with fh_match(), replace the
nfs4_file hash table with an rhashtable, which can dynamically
resize its bucket array to minimize hash chain length.

The result of this modification is an improvement in the latency of
NFSv4 operations, and the reduction of nfsd CPU utilization due to
eliminating the cost of multiple calls to fh_match() and reducing
the CPU cache misses incurred while walking long hash chains in the
nfs4_file hash table.

Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
---
fs/nfsd/nfs4state.c | 77 ++++++++++++++++++++++++++-------------------------
fs/nfsd/state.h | 4 ---
2 files changed, 40 insertions(+), 41 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index c2ef2db9c84c..c78b66e678dd 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -591,11 +591,8 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu)
void
put_nfs4_file(struct nfs4_file *fi)
{
- might_lock(&state_lock);
-
- if (refcount_dec_and_lock(&fi->fi_ref, &state_lock)) {
+ if (refcount_dec_and_test(&fi->fi_ref)) {
nfsd4_file_hash_remove(fi);
- spin_unlock(&state_lock);
WARN_ON_ONCE(!list_empty(&fi->fi_clnt_odstate));
WARN_ON_ONCE(!list_empty(&fi->fi_delegations));
call_rcu(&fi->fi_rcu, nfsd4_free_file_rcu);
@@ -709,20 +706,6 @@ static unsigned int ownerstr_hashval(struct xdr_netobj *ownername)
return ret & OWNER_HASH_MASK;
}

-/* hash table for nfs4_file */
-#define FILE_HASH_BITS 8
-#define FILE_HASH_SIZE (1 << FILE_HASH_BITS)
-
-static unsigned int file_hashval(const struct svc_fh *fh)
-{
- struct inode *inode = d_inode(fh->fh_dentry);
-
- /* XXX: why not (here & in file cache) use inode? */
- return (unsigned int)hash_long(inode->i_ino, FILE_HASH_BITS);
-}
-
-static struct hlist_head file_hashtbl[FILE_HASH_SIZE];
-
static struct rhltable nfs4_file_rhltable ____cacheline_aligned_in_smp;

static const struct rhashtable_params nfs4_file_rhash_params = {
@@ -4687,12 +4670,14 @@ move_to_close_lru(struct nfs4_ol_stateid *s, struct net *net)
static noinline_for_stack struct nfs4_file *
nfsd4_file_hash_lookup(const struct svc_fh *fhp)
{
- unsigned int hashval = file_hashval(fhp);
+ struct inode *inode = d_inode(fhp->fh_dentry);
+ struct rhlist_head *tmp, *list;
struct nfs4_file *fi;

rcu_read_lock();
- hlist_for_each_entry_rcu(fi, &file_hashtbl[hashval], fi_hash,
- lockdep_is_held(&state_lock)) {
+ list = rhltable_lookup(&nfs4_file_rhltable, &inode,
+ nfs4_file_rhash_params);
+ rhl_for_each_entry_rcu(fi, tmp, list, fi_rlist) {
if (fh_match(&fi->fi_fhandle, &fhp->fh_handle)) {
if (refcount_inc_not_zero(&fi->fi_ref)) {
rcu_read_unlock();
@@ -4706,40 +4691,56 @@ nfsd4_file_hash_lookup(const struct svc_fh *fhp)

/*
* On hash insertion, identify entries with the same inode but
- * distinct filehandles. They will all be in the same hash bucket
- * because nfs4_file's are hashed by the address in the fi_inode
- * field.
+ * distinct filehandles. They will all be on the list returned
+ * by rhltable_lookup().
+ *
+ * inode->i_lock prevents racing insertions from adding an entry
+ * for the same inode/fhp pair twice.
*/
static noinline_for_stack struct nfs4_file *
nfsd4_file_hash_insert(struct nfs4_file *new, const struct svc_fh *fhp)
{
- unsigned int hashval = file_hashval(fhp);
+ struct inode *inode = d_inode(fhp->fh_dentry);
+ struct rhlist_head *tmp, *list;
struct nfs4_file *ret = NULL;
bool alias_found = false;
struct nfs4_file *fi;
+ int err;

- spin_lock(&state_lock);
- hlist_for_each_entry_rcu(fi, &file_hashtbl[hashval], fi_hash,
- lockdep_is_held(&state_lock)) {
+ rcu_read_lock();
+ spin_lock(&inode->i_lock);
+
+ list = rhltable_lookup(&nfs4_file_rhltable, &inode,
+ nfs4_file_rhash_params);
+ rhl_for_each_entry_rcu(fi, tmp, list, fi_rlist) {
if (fh_match(&fi->fi_fhandle, &fhp->fh_handle)) {
if (refcount_inc_not_zero(&fi->fi_ref))
ret = fi;
- } else if (d_inode(fhp->fh_dentry) == fi->fi_inode)
+ } else
fi->fi_aliased = alias_found = true;
}
- if (likely(ret == NULL)) {
- nfsd4_file_init(fhp, new);
- hlist_add_head_rcu(&new->fi_hash, &file_hashtbl[hashval]);
- new->fi_aliased = alias_found;
- ret = new;
- }
- spin_unlock(&state_lock);
+ if (ret)
+ goto out_unlock;
+
+ nfsd4_file_init(fhp, new);
+ err = rhltable_insert(&nfs4_file_rhltable, &new->fi_rlist,
+ nfs4_file_rhash_params);
+ if (err)
+ goto out_unlock;
+
+ new->fi_aliased = alias_found;
+ ret = new;
+
+out_unlock:
+ spin_unlock(&inode->i_lock);
+ rcu_read_unlock();
return ret;
}

static noinline_for_stack void nfsd4_file_hash_remove(struct nfs4_file *fi)
{
- hlist_del_rcu(&fi->fi_hash);
+ rhltable_remove(&nfs4_file_rhltable, &fi->fi_rlist,
+ nfs4_file_rhash_params);
}

/*
@@ -5629,6 +5630,8 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
* If not found, create the nfs4_file struct
*/
fp = nfsd4_file_hash_insert(open->op_file, current_fh);
+ if (unlikely(!fp))
+ return nfserr_jukebox;
if (fp != open->op_file) {
status = nfs4_check_deleg(cl, open, &dp);
if (status)
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 190fc7e418a4..eadd7f465bf5 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -536,16 +536,12 @@ struct nfs4_clnt_odstate {
* inode can have multiple filehandles associated with it, so there is
* (potentially) a many to one relationship between this struct and struct
* inode.
- *
- * These are hashed by filehandle in the file_hashtbl, which is protected by
- * the global state_lock spinlock.
*/
struct nfs4_file {
refcount_t fi_ref;
struct inode * fi_inode;
bool fi_aliased;
spinlock_t fi_lock;
- struct hlist_node fi_hash; /* hash on fi_fhandle */
struct rhlist_head fi_rlist;
struct list_head fi_stateids;
union {



2022-10-31 16:40:36

by Jeffrey Layton

[permalink] [raw]
Subject: Re: [PATCH v7 03/14] NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection

On Fri, 2022-10-28 at 10:46 -0400, Chuck Lever wrote:
> NFSv4 operations manage the lifetime of nfsd_file items they use by
> means of NFSv4 OPEN and CLOSE. Hence there's no need for them to be
> garbage collected.
>
> Introduce a mechanism to enable garbage collection for nfsd_file
> items used only by NFSv2/3 callers.
>
> Note that the change in nfsd_file_put() ensures that both CLOSE and
> DELEGRETURN will actually close out and free an nfsd_file on last
> reference of a non-garbage-collected file.
>
> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=394
> Suggested-by: Trond Myklebust <[email protected]>
> Tested-by: Jeff Layton <[email protected]>
> Signed-off-by: Chuck Lever <[email protected]>
> Reviewed-by: NeilBrown <[email protected]>
> ---
> fs/nfsd/filecache.c | 63 +++++++++++++++++++++++++++++++++++++++++++++------
> fs/nfsd/filecache.h | 3 ++
> fs/nfsd/nfs3proc.c | 4 ++-
> fs/nfsd/trace.h | 3 ++
> fs/nfsd/vfs.c | 4 ++-
> 5 files changed, 64 insertions(+), 13 deletions(-)
>
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index beb41a507623..965c7b13fddc 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -63,6 +63,7 @@ struct nfsd_file_lookup_key {
> struct net *net;
> const struct cred *cred;
> unsigned char need;
> + bool gc;
> enum nfsd_file_lookup_type type;
> };
>
> @@ -162,6 +163,8 @@ static int nfsd_file_obj_cmpfn(struct rhashtable_compare_arg *arg,
> return 1;
> if (!nfsd_match_cred(nf->nf_cred, key->cred))
> return 1;
> + if (!!test_bit(NFSD_FILE_GC, &nf->nf_flags) != key->gc)
> + return 1;
> if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0)
> return 1;
> break;
> @@ -297,6 +300,8 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may)
> nf->nf_flags = 0;
> __set_bit(NFSD_FILE_HASHED, &nf->nf_flags);
> __set_bit(NFSD_FILE_PENDING, &nf->nf_flags);
> + if (key->gc)
> + __set_bit(NFSD_FILE_GC, &nf->nf_flags);
> nf->nf_inode = key->inode;
> /* nf_ref is pre-incremented for hash table */
> refcount_set(&nf->nf_ref, 2);
> @@ -428,16 +433,27 @@ nfsd_file_put_noref(struct nfsd_file *nf)
> }
> }
>
> +static void
> +nfsd_file_unhash_and_put(struct nfsd_file *nf)
> +{
> + if (nfsd_file_unhash(nf))
> + nfsd_file_put_noref(nf);
> +}
> +
> void
> nfsd_file_put(struct nfsd_file *nf)
> {
> might_sleep();
>
> - nfsd_file_lru_add(nf);
> - if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) {
> + if (test_bit(NFSD_FILE_GC, &nf->nf_flags))
> + nfsd_file_lru_add(nf);
> + else if (refcount_read(&nf->nf_ref) == 2)
> + nfsd_file_unhash_and_put(nf);
> +
> + if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
> nfsd_file_flush(nf);
> nfsd_file_put_noref(nf);
> - } else if (nf->nf_file) {
> + } else if (nf->nf_file && test_bit(NFSD_FILE_GC, &nf->nf_flags)) {
> nfsd_file_put_noref(nf);
> nfsd_file_schedule_laundrette();
> } else
> @@ -1017,12 +1033,14 @@ nfsd_file_is_cached(struct inode *inode)
>
> static __be32
> nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> - unsigned int may_flags, struct nfsd_file **pnf, bool open)
> + unsigned int may_flags, struct nfsd_file **pnf,
> + bool open, bool want_gc)
> {
> struct nfsd_file_lookup_key key = {
> .type = NFSD_FILE_KEY_FULL,
> .need = may_flags & NFSD_FILE_MAY_MASK,
> .net = SVC_NET(rqstp),
> + .gc = want_gc,
> };
> bool open_retry = true;
> struct nfsd_file *nf;
> @@ -1117,14 +1135,35 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> * then unhash.
> */
> if (status != nfs_ok || key.inode->i_nlink == 0)
> - if (nfsd_file_unhash(nf))
> - nfsd_file_put_noref(nf);
> + nfsd_file_unhash_and_put(nf);
> clear_bit_unlock(NFSD_FILE_PENDING, &nf->nf_flags);
> smp_mb__after_atomic();
> wake_up_bit(&nf->nf_flags, NFSD_FILE_PENDING);
> goto out;
> }
>
> +/**
> + * nfsd_file_acquire_gc - Get a struct nfsd_file with an open file
> + * @rqstp: the RPC transaction being executed
> + * @fhp: the NFS filehandle of the file to be opened
> + * @may_flags: NFSD_MAY_ settings for the file
> + * @pnf: OUT: new or found "struct nfsd_file" object
> + *
> + * The nfsd_file object returned by this API is reference-counted
> + * and garbage-collected. The object is retained for a few
> + * seconds after the final nfsd_file_put() in case the caller
> + * wants to re-use it.
> + *
> + * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in
> + * network byte order is returned.
> + */
> +__be32
> +nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
> + unsigned int may_flags, struct nfsd_file **pnf)
> +{
> + return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true, true);
> +}
> +
> /**
> * nfsd_file_acquire - Get a struct nfsd_file with an open file
> * @rqstp: the RPC transaction being executed
> @@ -1132,6 +1171,10 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> * @may_flags: NFSD_MAY_ settings for the file
> * @pnf: OUT: new or found "struct nfsd_file" object
> *
> + * The nfsd_file_object returned by this API is reference-counted
> + * but not garbage-collected. The object is unhashed after the
> + * final nfsd_file_put().
> + *
> * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in
> * network byte order is returned.
> */
> @@ -1139,7 +1182,7 @@ __be32
> nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> unsigned int may_flags, struct nfsd_file **pnf)
> {
> - return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true);
> + return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, true, false);
> }
>
> /**
> @@ -1149,6 +1192,10 @@ nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> * @may_flags: NFSD_MAY_ settings for the file
> * @pnf: OUT: new or found "struct nfsd_file" object
> *
> + * The nfsd_file_object returned by this API is reference-counted
> + * but not garbage-collected. The object is released immediately
> + * one RCU grace period after the final nfsd_file_put().
> + *
> * Returns nfs_ok and sets @pnf on success; otherwise an nfsstat in
> * network byte order is returned.
> */
> @@ -1156,7 +1203,7 @@ __be32
> nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> unsigned int may_flags, struct nfsd_file **pnf)
> {
> - return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, false);
> + return nfsd_file_do_acquire(rqstp, fhp, may_flags, pnf, false, false);
> }
>
> /*
> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> index 6b012ea4bd9d..b7efb2c3ddb1 100644
> --- a/fs/nfsd/filecache.h
> +++ b/fs/nfsd/filecache.h
> @@ -38,6 +38,7 @@ struct nfsd_file {
> #define NFSD_FILE_HASHED (0)
> #define NFSD_FILE_PENDING (1)
> #define NFSD_FILE_REFERENCED (2)
> +#define NFSD_FILE_GC (3)
> unsigned long nf_flags;
> struct inode *nf_inode; /* don't deref */
> refcount_t nf_ref;
> @@ -55,6 +56,8 @@ void nfsd_file_put(struct nfsd_file *nf);
> struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
> void nfsd_file_close_inode_sync(struct inode *inode);
> bool nfsd_file_is_cached(struct inode *inode);
> +__be32 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
> + unsigned int may_flags, struct nfsd_file **nfp);
> __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> unsigned int may_flags, struct nfsd_file **nfp);
> __be32 nfsd_file_create(struct svc_rqst *rqstp, struct svc_fh *fhp,
> diff --git a/fs/nfsd/nfs3proc.c b/fs/nfsd/nfs3proc.c
> index ff2920546333..d01b29aba662 100644
> --- a/fs/nfsd/nfs3proc.c
> +++ b/fs/nfsd/nfs3proc.c
> @@ -772,8 +772,8 @@ nfsd3_proc_commit(struct svc_rqst *rqstp)
> (unsigned long long) argp->offset);
>
> fh_copy(&resp->fh, &argp->fh);
> - resp->status = nfsd_file_acquire(rqstp, &resp->fh, NFSD_MAY_WRITE |
> - NFSD_MAY_NOT_BREAK_LEASE, &nf);
> + resp->status = nfsd_file_acquire_gc(rqstp, &resp->fh, NFSD_MAY_WRITE |
> + NFSD_MAY_NOT_BREAK_LEASE, &nf);
> if (resp->status)
> goto out;
> resp->status = nfsd_commit(rqstp, &resp->fh, nf, argp->offset,
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index 06a96e955bd0..b065a4b1e0dc 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -814,7 +814,8 @@ DEFINE_CLID_EVENT(confirmed_r);
> __print_flags(val, "|", \
> { 1 << NFSD_FILE_HASHED, "HASHED" }, \
> { 1 << NFSD_FILE_PENDING, "PENDING" }, \
> - { 1 << NFSD_FILE_REFERENCED, "REFERENCED"})
> + { 1 << NFSD_FILE_REFERENCED, "REFERENCED"}, \
> + { 1 << NFSD_FILE_GC, "GC"})
>
> DECLARE_EVENT_CLASS(nfsd_file_class,
> TP_PROTO(struct nfsd_file *nf),
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 91c34cef11d8..df1830a1a295 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1084,7 +1084,7 @@ __be32 nfsd_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> __be32 err;
>
> trace_nfsd_read_start(rqstp, fhp, offset, *count);
> - err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_READ, &nf);
> + err = nfsd_file_acquire_gc(rqstp, fhp, NFSD_MAY_READ, &nf);
> if (err)
> return err;
>
> @@ -1116,7 +1116,7 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
>
> trace_nfsd_write_start(rqstp, fhp, offset, *cnt);
>
> - err = nfsd_file_acquire(rqstp, fhp, NFSD_MAY_WRITE, &nf);
> + err = nfsd_file_acquire_gc(rqstp, fhp, NFSD_MAY_WRITE, &nf);
> if (err)
> goto out;
>
>
>

Reviewed-by: Jeff Layton <[email protected]>

2022-10-31 16:40:56

by Jeffrey Layton

[permalink] [raw]
Subject: Re: [PATCH v7 05/14] NFSD: Trace stateids returned via DELEGRETURN

On Fri, 2022-10-28 at 10:47 -0400, Chuck Lever wrote:
> Handing out a delegation stateid is recorded with the
> nfsd_deleg_read tracepoint, but there isn't a matching tracepoint
> for recording when the stateid is returned.
>
> Signed-off-by: Chuck Lever <[email protected]>
> ---
> fs/nfsd/nfs4state.c | 1 +
> fs/nfsd/trace.h | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index c829b828b6fd..93cfae7cd391 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -6901,6 +6901,7 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> if (status)
> goto put_stateid;
>
> + trace_nfsd_deleg_return(stateid);
> wake_up_var(d_inode(cstate->current_fh.fh_dentry));
> destroy_delegation(dp);
> put_stateid:
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index b065a4b1e0dc..477c2b035872 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -601,6 +601,7 @@ DEFINE_STATEID_EVENT(layout_recall_release);
>
> DEFINE_STATEID_EVENT(open);
> DEFINE_STATEID_EVENT(deleg_read);
> +DEFINE_STATEID_EVENT(deleg_return);
> DEFINE_STATEID_EVENT(deleg_recall);
>
> DECLARE_EVENT_CLASS(nfsd_stateseqid_class,
>
>

Reviewed-by: Jeff Layton <[email protected]>

2022-10-31 16:41:02

by Jeffrey Layton

[permalink] [raw]
Subject: Re: [PATCH v7 09/14] NFSD: Clean up nfsd4_init_file()

On Fri, 2022-10-28 at 10:47 -0400, Chuck Lever wrote:
> Name this function more consistently. I'm going to use nfsd4_file_
> and nfsd4_file_hash_ for these helpers.
>
> Change the @fh parameter to be const pointer for better type safety.
>
> Finally, move the hash insertion operation to the caller. This is
> typical for most other "init_object" type helpers, and it is where
> most of the other nfs4_file hash table operations are located.
>
> Signed-off-by: Chuck Lever <[email protected]>
> Reviewed-by: NeilBrown <[email protected]>
> ---
> fs/nfsd/nfs4state.c | 10 ++++------
> 1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 60f1aa2c5442..3132e4844ef8 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -4262,11 +4262,9 @@ static struct nfs4_file *nfsd4_alloc_file(void)
> }
>
> /* OPEN Share state helper functions */
> -static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval,
> - struct nfs4_file *fp)
> -{
> - lockdep_assert_held(&state_lock);
>
> +static void nfsd4_file_init(const struct svc_fh *fh, struct nfs4_file *fp)
> +{
> refcount_set(&fp->fi_ref, 1);
> spin_lock_init(&fp->fi_lock);
> INIT_LIST_HEAD(&fp->fi_stateids);
> @@ -4284,7 +4282,6 @@ static void nfsd4_init_file(struct svc_fh *fh, unsigned int hashval,
> INIT_LIST_HEAD(&fp->fi_lo_states);
> atomic_set(&fp->fi_lo_recalls, 0);
> #endif
> - hlist_add_head_rcu(&fp->fi_hash, &file_hashtbl[hashval]);
> }
>
> void
> @@ -4702,7 +4699,8 @@ static struct nfs4_file *insert_file(struct nfs4_file *new, struct svc_fh *fh,
> fp->fi_aliased = alias_found = true;
> }
> if (likely(ret == NULL)) {
> - nfsd4_init_file(fh, hashval, new);
> + nfsd4_file_init(fh, new);
> + hlist_add_head_rcu(&new->fi_hash, &file_hashtbl[hashval]);
> new->fi_aliased = alias_found;
> ret = new;
> }
>
>

Reviewed-by: Jeff Layton <[email protected]>

2022-10-31 16:42:32

by Jeffrey Layton

[permalink] [raw]
Subject: Re: [PATCH v7 10/14] NFSD: Add a nfsd4_file_hash_remove() helper

On Fri, 2022-10-28 at 10:47 -0400, Chuck Lever wrote:
> Refactor to relocate hash deletion operation to a helper function
> that is close to most other nfs4_file data structure operations.
>
> The "noinline" annotation will become useful in a moment when the
> hlist_del_rcu() is replaced with a more complex rhash remove
> operation. It also guarantees that hash remove operations can be
> traced with "-p function -l remove_nfs4_file_locked".
>
> This also simplifies the organization of forward declarations: the
> to-be-added rhashtable and its param structure will be defined
> /after/ put_nfs4_file().
>
> Signed-off-by: Chuck Lever <[email protected]>
> Reviewed-by: NeilBrown <[email protected]>
> ---
> fs/nfsd/nfs4state.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 3132e4844ef8..504455cb80e9 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -84,6 +84,7 @@ static bool check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
> static void nfs4_free_ol_stateid(struct nfs4_stid *stid);
> void nfsd4_end_grace(struct nfsd_net *nn);
> static void _free_cpntf_state_locked(struct nfsd_net *nn, struct nfs4_cpntf_state *cps);
> +static void nfsd4_file_hash_remove(struct nfs4_file *fi);
>
> /* Locking: */
>
> @@ -591,7 +592,7 @@ put_nfs4_file(struct nfs4_file *fi)
> might_lock(&state_lock);
>
> if (refcount_dec_and_lock(&fi->fi_ref, &state_lock)) {
> - hlist_del_rcu(&fi->fi_hash);
> + nfsd4_file_hash_remove(fi);
> spin_unlock(&state_lock);
> WARN_ON_ONCE(!list_empty(&fi->fi_clnt_odstate));
> WARN_ON_ONCE(!list_empty(&fi->fi_delegations));
> @@ -4734,6 +4735,11 @@ find_or_add_file(struct nfs4_file *new, struct svc_fh *fh)
> return insert_file(new, fh, hashval);
> }
>
> +static noinline_for_stack void nfsd4_file_hash_remove(struct nfs4_file *fi)
> +{
> + hlist_del_rcu(&fi->fi_hash);
> +}
> +
> /*
> * Called to check deny when READ with all zero stateid or
> * WRITE with all zero or all one stateid
>
>

Reviewed-by: Jeff Layton <[email protected]>

2022-10-31 16:54:07

by Jeffrey Layton

[permalink] [raw]
Subject: Re: [PATCH v7 12/14] NFSD: Refactor find_file()

On Fri, 2022-10-28 at 10:47 -0400, Chuck Lever wrote:
> find_file() is now the only caller of find_file_locked(), so just
> fold these two together.
>
> Signed-off-by: Chuck Lever <[email protected]>
> Reviewed-by: NeilBrown <[email protected]>
> ---
> fs/nfsd/nfs4state.c | 36 +++++++++++++++---------------------
> 1 file changed, 15 insertions(+), 21 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index b1988a46fb9b..2b694d693be5 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -4667,31 +4667,24 @@ move_to_close_lru(struct nfs4_ol_stateid *s, struct net *net)
> nfs4_put_stid(&last->st_stid);
> }
>
> -/* search file_hashtbl[] for file */
> -static struct nfs4_file *
> -find_file_locked(const struct svc_fh *fh, unsigned int hashval)
> +static noinline_for_stack struct nfs4_file *
> +nfsd4_file_hash_lookup(const struct svc_fh *fhp)
> {
> - struct nfs4_file *fp;
> + unsigned int hashval = file_hashval(fhp);
> + struct nfs4_file *fi;
>
> - hlist_for_each_entry_rcu(fp, &file_hashtbl[hashval], fi_hash,
> - lockdep_is_held(&state_lock)) {
> - if (fh_match(&fp->fi_fhandle, &fh->fh_handle)) {
> - if (refcount_inc_not_zero(&fp->fi_ref))
> - return fp;
> + rcu_read_lock();
> + hlist_for_each_entry_rcu(fi, &file_hashtbl[hashval], fi_hash,
> + lockdep_is_held(&state_lock)) {
> + if (fh_match(&fi->fi_fhandle, &fhp->fh_handle)) {
> + if (refcount_inc_not_zero(&fi->fi_ref)) {
> + rcu_read_unlock();
> + return fi;
> + }
> }
> }
> - return NULL;
> -}
> -
> -static struct nfs4_file * find_file(struct svc_fh *fh)
> -{
> - struct nfs4_file *fp;
> - unsigned int hashval = file_hashval(fh);
> -
> - rcu_read_lock();
> - fp = find_file_locked(fh, hashval);
> rcu_read_unlock();
> - return fp;
> + return NULL;
> }
>
> /*
> @@ -4742,9 +4735,10 @@ nfs4_share_conflict(struct svc_fh *current_fh, unsigned int deny_type)
> struct nfs4_file *fp;
> __be32 ret = nfs_ok;
>
> - fp = find_file(current_fh);
> + fp = nfsd4_file_hash_lookup(current_fh);
> if (!fp)
> return ret;
> +
> /* Check for conflicting share reservations */
> spin_lock(&fp->fi_lock);
> if (fp->fi_share_deny & deny_type)
>
>

Reviewed-by: Jeff Layton <[email protected]>

2022-10-31 17:11:00

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v7 14/14] NFSD: Use rhashtable for managing nfs4_file objects

On Fri, 2022-10-28 at 10:48 -0400, Chuck Lever wrote:
> fh_match() is costly, especially when filehandles are large (as is
> the case for NFSv4). It needs to be used sparingly when searching
> data structures. Unfortunately, with common workloads, I see
> multiple thousands of objects stored in file_hashtbl[], which has
> just 256 buckets, making its bucket hash chains quite lengthy.
>
> Walking long hash chains with the state_lock held blocks other
> activity that needs that lock. Sizable hash chains are a common
> occurrance once the server has handed out some delegations, for
> example -- IIUC, each delegated file is held open on the server by
> an nfs4_file object.
>
> To help mitigate the cost of searching with fh_match(), replace the
> nfs4_file hash table with an rhashtable, which can dynamically
> resize its bucket array to minimize hash chain length.
>
> The result of this modification is an improvement in the latency of
> NFSv4 operations, and the reduction of nfsd CPU utilization due to
> eliminating the cost of multiple calls to fh_match() and reducing
> the CPU cache misses incurred while walking long hash chains in the
> nfs4_file hash table.
>
> Signed-off-by: Chuck Lever <[email protected]>
> Reviewed-by: NeilBrown <[email protected]>
> ---
> fs/nfsd/nfs4state.c | 77 ++++++++++++++++++++++++++-------------------------
> fs/nfsd/state.h | 4 ---
> 2 files changed, 40 insertions(+), 41 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index c2ef2db9c84c..c78b66e678dd 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -591,11 +591,8 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu)
> void
> put_nfs4_file(struct nfs4_file *fi)
> {
> - might_lock(&state_lock);
> -
> - if (refcount_dec_and_lock(&fi->fi_ref, &state_lock)) {
> + if (refcount_dec_and_test(&fi->fi_ref)) {
> nfsd4_file_hash_remove(fi);
> - spin_unlock(&state_lock);
> WARN_ON_ONCE(!list_empty(&fi->fi_clnt_odstate));
> WARN_ON_ONCE(!list_empty(&fi->fi_delegations));
> call_rcu(&fi->fi_rcu, nfsd4_free_file_rcu);
> @@ -709,20 +706,6 @@ static unsigned int ownerstr_hashval(struct xdr_netobj *ownername)
> return ret & OWNER_HASH_MASK;
> }
>
> -/* hash table for nfs4_file */
> -#define FILE_HASH_BITS 8
> -#define FILE_HASH_SIZE (1 << FILE_HASH_BITS)
> -
> -static unsigned int file_hashval(const struct svc_fh *fh)
> -{
> - struct inode *inode = d_inode(fh->fh_dentry);
> -
> - /* XXX: why not (here & in file cache) use inode? */
> - return (unsigned int)hash_long(inode->i_ino, FILE_HASH_BITS);
> -}
> -
> -static struct hlist_head file_hashtbl[FILE_HASH_SIZE];
> -
> static struct rhltable nfs4_file_rhltable ____cacheline_aligned_in_smp;
>
> static const struct rhashtable_params nfs4_file_rhash_params = {
> @@ -4687,12 +4670,14 @@ move_to_close_lru(struct nfs4_ol_stateid *s, struct net *net)
> static noinline_for_stack struct nfs4_file *
> nfsd4_file_hash_lookup(const struct svc_fh *fhp)
> {
> - unsigned int hashval = file_hashval(fhp);
> + struct inode *inode = d_inode(fhp->fh_dentry);
> + struct rhlist_head *tmp, *list;
> struct nfs4_file *fi;
>
> rcu_read_lock();
> - hlist_for_each_entry_rcu(fi, &file_hashtbl[hashval], fi_hash,
> - lockdep_is_held(&state_lock)) {
> + list = rhltable_lookup(&nfs4_file_rhltable, &inode,
> + nfs4_file_rhash_params);
> + rhl_for_each_entry_rcu(fi, tmp, list, fi_rlist) {
> if (fh_match(&fi->fi_fhandle, &fhp->fh_handle)) {
> if (refcount_inc_not_zero(&fi->fi_ref)) {
> rcu_read_unlock();
> @@ -4706,40 +4691,56 @@ nfsd4_file_hash_lookup(const struct svc_fh *fhp)
>
> /*
> * On hash insertion, identify entries with the same inode but
> - * distinct filehandles. They will all be in the same hash bucket
> - * because nfs4_file's are hashed by the address in the fi_inode
> - * field.
> + * distinct filehandles. They will all be on the list returned
> + * by rhltable_lookup().
> + *
> + * inode->i_lock prevents racing insertions from adding an entry
> + * for the same inode/fhp pair twice.
> */
> static noinline_for_stack struct nfs4_file *
> nfsd4_file_hash_insert(struct nfs4_file *new, const struct svc_fh *fhp)
> {
> - unsigned int hashval = file_hashval(fhp);
> + struct inode *inode = d_inode(fhp->fh_dentry);
> + struct rhlist_head *tmp, *list;
> struct nfs4_file *ret = NULL;
> bool alias_found = false;
> struct nfs4_file *fi;
> + int err;
>
> - spin_lock(&state_lock);
> - hlist_for_each_entry_rcu(fi, &file_hashtbl[hashval], fi_hash,
> - lockdep_is_held(&state_lock)) {
> + rcu_read_lock();
> + spin_lock(&inode->i_lock);
> +
> + list = rhltable_lookup(&nfs4_file_rhltable, &inode,
> + nfs4_file_rhash_params);
> + rhl_for_each_entry_rcu(fi, tmp, list, fi_rlist) {
> if (fh_match(&fi->fi_fhandle, &fhp->fh_handle)) {
> if (refcount_inc_not_zero(&fi->fi_ref))
> ret = fi;
> - } else if (d_inode(fhp->fh_dentry) == fi->fi_inode)
> + } else
> fi->fi_aliased = alias_found = true;
> }
> - if (likely(ret == NULL)) {
> - nfsd4_file_init(fhp, new);
> - hlist_add_head_rcu(&new->fi_hash, &file_hashtbl[hashval]);
> - new->fi_aliased = alias_found;
> - ret = new;
> - }
> - spin_unlock(&state_lock);
> + if (ret)
> + goto out_unlock;
> +
> + nfsd4_file_init(fhp, new);
> + err = rhltable_insert(&nfs4_file_rhltable, &new->fi_rlist,
> + nfs4_file_rhash_params);
> + if (err)
> + goto out_unlock;
> +
> + new->fi_aliased = alias_found;
> + ret = new;
> +
> +out_unlock:
> + spin_unlock(&inode->i_lock);
> + rcu_read_unlock();
> return ret;
> }
>
> static noinline_for_stack void nfsd4_file_hash_remove(struct nfs4_file *fi)
> {
> - hlist_del_rcu(&fi->fi_hash);
> + rhltable_remove(&nfs4_file_rhltable, &fi->fi_rlist,
> + nfs4_file_rhash_params);
> }
>
> /*
> @@ -5629,6 +5630,8 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
> * If not found, create the nfs4_file struct
> */
> fp = nfsd4_file_hash_insert(open->op_file, current_fh);
> + if (unlikely(!fp))
> + return nfserr_jukebox;
> if (fp != open->op_file) {
> status = nfs4_check_deleg(cl, open, &dp);
> if (status)
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index 190fc7e418a4..eadd7f465bf5 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -536,16 +536,12 @@ struct nfs4_clnt_odstate {
> * inode can have multiple filehandles associated with it, so there is
> * (potentially) a many to one relationship between this struct and struct
> * inode.
> - *
> - * These are hashed by filehandle in the file_hashtbl, which is protected by
> - * the global state_lock spinlock.
> */
> struct nfs4_file {
> refcount_t fi_ref;
> struct inode * fi_inode;
> bool fi_aliased;
> spinlock_t fi_lock;
> - struct hlist_node fi_hash; /* hash on fi_fhandle */
> struct rhlist_head fi_rlist;
> struct list_head fi_stateids;
> union {
>
>

You can add this to 13 and 14:

Reviewed-by: Jeff Layton <[email protected]>

That said, I'd probably prefer to see the two squashed together, since
you're adding unused infrastructure in 13.


2022-10-31 17:13:31

by Jeffrey Layton

[permalink] [raw]
Subject: Re: [PATCH v7 14/14] NFSD: Use rhashtable for managing nfs4_file objects

On Mon, 2022-10-31 at 12:54 -0400, Jeff Layton wrote:
> On Fri, 2022-10-28 at 10:48 -0400, Chuck Lever wrote:
> > fh_match() is costly, especially when filehandles are large (as is
> > the case for NFSv4). It needs to be used sparingly when searching
> > data structures. Unfortunately, with common workloads, I see
> > multiple thousands of objects stored in file_hashtbl[], which has
> > just 256 buckets, making its bucket hash chains quite lengthy.
> >
> > Walking long hash chains with the state_lock held blocks other
> > activity that needs that lock. Sizable hash chains are a common
> > occurrance once the server has handed out some delegations, for
> > example -- IIUC, each delegated file is held open on the server by
> > an nfs4_file object.
> >
> > To help mitigate the cost of searching with fh_match(), replace the
> > nfs4_file hash table with an rhashtable, which can dynamically
> > resize its bucket array to minimize hash chain length.
> >
> > The result of this modification is an improvement in the latency of
> > NFSv4 operations, and the reduction of nfsd CPU utilization due to
> > eliminating the cost of multiple calls to fh_match() and reducing
> > the CPU cache misses incurred while walking long hash chains in the
> > nfs4_file hash table.
> >
> > Signed-off-by: Chuck Lever <[email protected]>
> > Reviewed-by: NeilBrown <[email protected]>
> > ---
> > fs/nfsd/nfs4state.c | 77 ++++++++++++++++++++++++++-------------------------
> > fs/nfsd/state.h | 4 ---
> > 2 files changed, 40 insertions(+), 41 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index c2ef2db9c84c..c78b66e678dd 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -591,11 +591,8 @@ static void nfsd4_free_file_rcu(struct rcu_head *rcu)
> > void
> > put_nfs4_file(struct nfs4_file *fi)
> > {
> > - might_lock(&state_lock);
> > -
> > - if (refcount_dec_and_lock(&fi->fi_ref, &state_lock)) {
> > + if (refcount_dec_and_test(&fi->fi_ref)) {
> > nfsd4_file_hash_remove(fi);
> > - spin_unlock(&state_lock);
> > WARN_ON_ONCE(!list_empty(&fi->fi_clnt_odstate));
> > WARN_ON_ONCE(!list_empty(&fi->fi_delegations));
> > call_rcu(&fi->fi_rcu, nfsd4_free_file_rcu);
> > @@ -709,20 +706,6 @@ static unsigned int ownerstr_hashval(struct xdr_netobj *ownername)
> > return ret & OWNER_HASH_MASK;
> > }
> >
> > -/* hash table for nfs4_file */
> > -#define FILE_HASH_BITS 8
> > -#define FILE_HASH_SIZE (1 << FILE_HASH_BITS)
> > -
> > -static unsigned int file_hashval(const struct svc_fh *fh)
> > -{
> > - struct inode *inode = d_inode(fh->fh_dentry);
> > -
> > - /* XXX: why not (here & in file cache) use inode? */
> > - return (unsigned int)hash_long(inode->i_ino, FILE_HASH_BITS);
> > -}
> > -
> > -static struct hlist_head file_hashtbl[FILE_HASH_SIZE];
> > -
> > static struct rhltable nfs4_file_rhltable ____cacheline_aligned_in_smp;
> >
> > static const struct rhashtable_params nfs4_file_rhash_params = {
> > @@ -4687,12 +4670,14 @@ move_to_close_lru(struct nfs4_ol_stateid *s, struct net *net)
> > static noinline_for_stack struct nfs4_file *
> > nfsd4_file_hash_lookup(const struct svc_fh *fhp)
> > {
> > - unsigned int hashval = file_hashval(fhp);
> > + struct inode *inode = d_inode(fhp->fh_dentry);
> > + struct rhlist_head *tmp, *list;
> > struct nfs4_file *fi;
> >
> > rcu_read_lock();
> > - hlist_for_each_entry_rcu(fi, &file_hashtbl[hashval], fi_hash,
> > - lockdep_is_held(&state_lock)) {
> > + list = rhltable_lookup(&nfs4_file_rhltable, &inode,
> > + nfs4_file_rhash_params);
> > + rhl_for_each_entry_rcu(fi, tmp, list, fi_rlist) {
> > if (fh_match(&fi->fi_fhandle, &fhp->fh_handle)) {
> > if (refcount_inc_not_zero(&fi->fi_ref)) {
> > rcu_read_unlock();
> > @@ -4706,40 +4691,56 @@ nfsd4_file_hash_lookup(const struct svc_fh *fhp)
> >
> > /*
> > * On hash insertion, identify entries with the same inode but
> > - * distinct filehandles. They will all be in the same hash bucket
> > - * because nfs4_file's are hashed by the address in the fi_inode
> > - * field.
> > + * distinct filehandles. They will all be on the list returned
> > + * by rhltable_lookup().
> > + *
> > + * inode->i_lock prevents racing insertions from adding an entry
> > + * for the same inode/fhp pair twice.
> > */
> > static noinline_for_stack struct nfs4_file *
> > nfsd4_file_hash_insert(struct nfs4_file *new, const struct svc_fh *fhp)
> > {
> > - unsigned int hashval = file_hashval(fhp);
> > + struct inode *inode = d_inode(fhp->fh_dentry);
> > + struct rhlist_head *tmp, *list;
> > struct nfs4_file *ret = NULL;
> > bool alias_found = false;
> > struct nfs4_file *fi;
> > + int err;
> >
> > - spin_lock(&state_lock);
> > - hlist_for_each_entry_rcu(fi, &file_hashtbl[hashval], fi_hash,
> > - lockdep_is_held(&state_lock)) {
> > + rcu_read_lock();
> > + spin_lock(&inode->i_lock);
> > +
> > + list = rhltable_lookup(&nfs4_file_rhltable, &inode,
> > + nfs4_file_rhash_params);
> > + rhl_for_each_entry_rcu(fi, tmp, list, fi_rlist) {
> > if (fh_match(&fi->fi_fhandle, &fhp->fh_handle)) {
> > if (refcount_inc_not_zero(&fi->fi_ref))
> > ret = fi;
> > - } else if (d_inode(fhp->fh_dentry) == fi->fi_inode)
> > + } else
> > fi->fi_aliased = alias_found = true;
> > }
> > - if (likely(ret == NULL)) {
> > - nfsd4_file_init(fhp, new);
> > - hlist_add_head_rcu(&new->fi_hash, &file_hashtbl[hashval]);
> > - new->fi_aliased = alias_found;
> > - ret = new;
> > - }
> > - spin_unlock(&state_lock);
> > + if (ret)
> > + goto out_unlock;
> > +
> > + nfsd4_file_init(fhp, new);
> > + err = rhltable_insert(&nfs4_file_rhltable, &new->fi_rlist,
> > + nfs4_file_rhash_params);
> > + if (err)
> > + goto out_unlock;
> > +
> > + new->fi_aliased = alias_found;
> > + ret = new;
> > +
> > +out_unlock:
> > + spin_unlock(&inode->i_lock);
> > + rcu_read_unlock();
> > return ret;
> > }
> >
> > static noinline_for_stack void nfsd4_file_hash_remove(struct nfs4_file *fi)
> > {
> > - hlist_del_rcu(&fi->fi_hash);
> > + rhltable_remove(&nfs4_file_rhltable, &fi->fi_rlist,
> > + nfs4_file_rhash_params);
> > }
> >
> > /*
> > @@ -5629,6 +5630,8 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
> > * If not found, create the nfs4_file struct
> > */
> > fp = nfsd4_file_hash_insert(open->op_file, current_fh);
> > + if (unlikely(!fp))
> > + return nfserr_jukebox;
> > if (fp != open->op_file) {
> > status = nfs4_check_deleg(cl, open, &dp);
> > if (status)
> > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> > index 190fc7e418a4..eadd7f465bf5 100644
> > --- a/fs/nfsd/state.h
> > +++ b/fs/nfsd/state.h
> > @@ -536,16 +536,12 @@ struct nfs4_clnt_odstate {
> > * inode can have multiple filehandles associated with it, so there is
> > * (potentially) a many to one relationship between this struct and struct
> > * inode.
> > - *
> > - * These are hashed by filehandle in the file_hashtbl, which is protected by
> > - * the global state_lock spinlock.
> > */
> > struct nfs4_file {
> > refcount_t fi_ref;
> > struct inode * fi_inode;
> > bool fi_aliased;
> > spinlock_t fi_lock;
> > - struct hlist_node fi_hash; /* hash on fi_fhandle */
> > struct rhlist_head fi_rlist;
> > struct list_head fi_stateids;
> > union {
> >
> >
>
> You can add this to 13 and 14:
>
> Reviewed-by: Jeff Layton <[email protected]>
>
> That said, I'd probably prefer to see the two squashed together, since
> you're adding unused infrastructure in 13.
>

Erm, maybe make that my kernel.org address instead? I try to use that
for upstream work.

--
Jeff Layton <[email protected]>