2022-06-12 12:55:14

by Wang Yugui

Subject: [RPC] nfsd: NFSv4 close a file completely

NFSv4 needs to close a file completely (no lingering open) when it does
a CLOSE or DELEGRETURN.

When there are multiple NFSv4 OPENs of the same file from different
clients, we need to check the reference count before closing. The
following reference-count check changes the behavior of NFSv3
nfsd_rename()/nfsd_unlink() too.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=387
Signed-off-by: Wang Yugui <[email protected]>
---
TO-CHECK:
1) Is the NFSv3 nfsd_rename()/nfsd_unlink() behavior change OK?
2) Can we get better performance than nfsd_file_close_inode_sync()?
3) Replacing nfsd_file_close_inode_sync() with nfsd_file_close_inode() in
   nfsd4_delegreturn() => 'Text file busy' for about 4s
4) Reference-count check: refcount_read(&nf->nf_ref) <= 1 or == 0?
   nfsd_file_alloc() does refcount_set(&nf->nf_ref, 1); (see the sketch below)
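
To make item 4 concrete, here is an illustrative helper (the helper is
made up for discussion; it is not part of this patch). Because
nfsd_file_alloc() starts nf_ref at 1 to represent the hash table's own
reference, an entry with no other users still reads 1, so the test is
<= 1 rather than == 0:

/* Illustration only: "unused" means nobody but the filecache hash
 * itself still holds a reference. */
static bool nfsd_file_unused(const struct nfsd_file *nf)
{
	return refcount_read(&nf->nf_ref) <= 1;
}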

fs/nfsd/filecache.c | 2 +-
fs/nfsd/nfs4state.c | 4 ++++
2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 9cb2d590c036..8890a8fa7402 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -512,7 +512,7 @@ __nfsd_file_close_inode(struct inode *inode, unsigned int hashval,

spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
hlist_for_each_entry_safe(nf, tmp, &nfsd_file_hashtbl[hashval].nfb_head, nf_node) {
- if (inode == nf->nf_inode)
+ if (inode == nf->nf_inode && refcount_read(&nf->nf_ref) <= 1)
nfsd_file_unhash_and_release_locked(nf, dispose);
}
spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 9409a0dc1b76..be4b480f5914 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -6673,6 +6673,8 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,

/* put reference from nfs4_preprocess_seqid_op */
nfs4_put_stid(&stp->st_stid);
+
+ nfsd_file_close_inode_sync(d_inode(cstate->current_fh.fh_dentry));
out:
return status;
}
@@ -6702,6 +6704,8 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
destroy_delegation(dp);
put_stateid:
nfs4_put_stid(&dp->dl_stid);
+
+ nfsd_file_close_inode_sync(d_inode(cstate->current_fh.fh_dentry));
out:
return status;
}
--
2.36.1


2022-06-12 18:42:13

by Chuck Lever III

Subject: Re: [RPC] nfsd: NFSv4 close a file completely



> On Jun 12, 2022, at 3:22 AM, Wang Yugui <[email protected]> wrote:
>
> NFSv4 needs to close a file completely (no lingering open) when it does
> a CLOSE or DELEGRETURN.
>
> When there are multiple NFSv4 OPENs of the same file from different
> clients, we need to check the reference count before closing. The
> following reference-count check changes the behavior of NFSv3
> nfsd_rename()/nfsd_unlink() too.
>
> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=387
> Signed-off-by: Wang Yugui <[email protected]>
> ---
> TO-CHECK:
> 1) Is the NFSv3 nfsd_rename()/nfsd_unlink() behavior change OK?
> 2) Can we get better performance than nfsd_file_close_inode_sync()?
> 3) Replacing nfsd_file_close_inode_sync() with nfsd_file_close_inode() in
>    nfsd4_delegreturn() => 'Text file busy' for about 4s
> 4) Reference-count check: refcount_read(&nf->nf_ref) <= 1 or == 0?
>    nfsd_file_alloc() does refcount_set(&nf->nf_ref, 1);
>
> fs/nfsd/filecache.c | 2 +-
> fs/nfsd/nfs4state.c | 4 ++++
> 2 files changed, 5 insertions(+), 1 deletion(-)

I suppose I owe you (and Frank) a progress report on #386. I've fixed
the LRU algorithm and added some observability features to measure
how the fix impacts the cache's efficiency for NFSv3 workloads.

These new features show that the hit rate and average age of cache
items go down after the fix is applied. I'm trying to understand
if I've done something wrong or if the fix is supposed to do that.

To handle the case of hundreds of thousands of open files more
efficiently, I'd like to convert the filecache to use rhashtable.
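
As a rough sketch of what such a conversion might look like (names and
layout below are my assumptions, not the actual patches; note too that
several nfsd_file objects can share one inode, so duplicate keys would
need rhltable or a per-inode list in practice):

#include <linux/rhashtable.h>

struct nfsd_file_entry {			/* hypothetical wrapper */
	struct rhash_head	nf_rhash;	/* hash linkage */
	struct inode		*nf_inode;	/* lookup key */
	/* ... the rest of struct nfsd_file ... */
};

static const struct rhashtable_params nfsd_file_rhash_params = {
	.key_len		= sizeof_field(struct nfsd_file_entry, nf_inode),
	.key_offset		= offsetof(struct nfsd_file_entry, nf_inode),
	.head_offset		= offsetof(struct nfsd_file_entry, nf_rhash),
	.automatic_shrinking	= true,
};

static struct rhashtable nfsd_file_rhash_tbl;

/* rhashtable_init(&nfsd_file_rhash_tbl, &nfsd_file_rhash_params) runs at
 * cache start-up; per-inode lookup and insert then become: */
static struct nfsd_file_entry *nfsd_file_find(struct inode *inode)
{
	return rhashtable_lookup_fast(&nfsd_file_rhash_tbl, &inode,
				      nfsd_file_rhash_params);
}

static int nfsd_file_insert(struct nfsd_file_entry *nf)
{
	return rhashtable_insert_fast(&nfsd_file_rhash_tbl, &nf->nf_rhash,
				      nfsd_file_rhash_params);
}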


> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 9cb2d590c036..8890a8fa7402 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -512,7 +512,7 @@ __nfsd_file_close_inode(struct inode *inode, unsigned int hashval,
>
> spin_lock(&nfsd_file_hashtbl[hashval].nfb_lock);
> hlist_for_each_entry_safe(nf, tmp, &nfsd_file_hashtbl[hashval].nfb_head, nf_node) {
> - if (inode == nf->nf_inode)
> + if (inode == nf->nf_inode && refcount_read(&nf->nf_ref) <= 1)
> nfsd_file_unhash_and_release_locked(nf, dispose);
> }
> spin_unlock(&nfsd_file_hashtbl[hashval].nfb_lock);
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 9409a0dc1b76..be4b480f5914 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -6673,6 +6673,8 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>
> /* put reference from nfs4_preprocess_seqid_op */
> nfs4_put_stid(&stp->st_stid);
> +
> + nfsd_file_close_inode_sync(d_inode(cstate->current_fh.fh_dentry));

IIUC this closes /all/ nfsd_file objects that refer to the same inode.
CLOSE is supposed to release only the state held by a particular user
on a particular client. This is like closing a file descriptor; you
release the resources associated with that file descriptor, and other
users of the inode are not supposed to be affected. Thus using an
inode-based API in nfsd4_close/delegreturn seems like the wrong
approach.
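
In user-space terms the analogy looks like this (illustration only,
nothing NFSD-specific): closing one descriptor must not disturb another
descriptor open on the same inode.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd1 = open("/etc/hostname", O_RDONLY);	/* first opener  */
	int fd2 = open("/etc/hostname", O_RDONLY);	/* second opener */
	char buf[64];
	ssize_t n;

	if (fd1 < 0 || fd2 < 0)
		return 1;

	close(fd1);			/* analogue of CLOSE for one stateid */

	n = read(fd2, buf, sizeof(buf));	/* fd2 is unaffected */
	printf("read after closing fd1: %zd bytes\n", n);
	close(fd2);
	return 0;
}

An inode-wide close is the opposite of this: it would be like yanking
every descriptor on the file, including ones held by other openers.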


> out:
> return status;
> }
> @@ -6702,6 +6704,8 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> destroy_delegation(dp);
> put_stateid:
> nfs4_put_stid(&dp->dl_stid);
> +
> + nfsd_file_close_inode_sync(d_inode(cstate->current_fh.fh_dentry));
> out:
> return status;
> }
> --
> 2.36.1
>

--
Chuck Lever



2022-06-15 15:31:41

by Wang Yugui

Subject: Re: [RPC] nfsd: NFSv4 close a file completely

Hi,

> > On Jun 12, 2022, at 3:22 AM, Wang Yugui <[email protected]> wrote:
> >
> > NFSv4 needs to close a file completely (no lingering open) when it does
> > a CLOSE or DELEGRETURN.
> >
> > When there are multiple NFSv4 OPENs of the same file from different
> > clients, we need to check the reference count before closing. The
> > following reference-count check changes the behavior of NFSv3
> > nfsd_rename()/nfsd_unlink() too.
> >
> > Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=387
> > Signed-off-by: Wang Yugui <[email protected]>
> > ---
> > TO-CHECK:
> > 1) Is the NFSv3 nfsd_rename()/nfsd_unlink() behavior change OK?
> > 2) Can we get better performance than nfsd_file_close_inode_sync()?
> > 3) Replacing nfsd_file_close_inode_sync() with nfsd_file_close_inode() in
> >    nfsd4_delegreturn() => 'Text file busy' for about 4s
> > 4) Reference-count check: refcount_read(&nf->nf_ref) <= 1 or == 0?
> >    nfsd_file_alloc() does refcount_set(&nf->nf_ref, 1);
> >
> > fs/nfsd/filecache.c | 2 +-
> > fs/nfsd/nfs4state.c | 4 ++++
> > 2 files changed, 5 insertions(+), 1 deletion(-)
>
> I suppose I owe you (and Frank) a progress report on #386. I've fixed
> the LRU algorithm and added some observability features to measure
> how the fix impacts the cache's efficiency for NFSv3 workloads.
>
> These new features show that the hit rate and average age of cache
> items go down after the fix is applied. I'm trying to understand
> if I've done something wrong or if the fix is supposed to do that.
>
> To handle the case of hundreds of thousands of open files more
> efficiently, I'd like to convert the filecache to use rhashtable.

A question about the coming rhashtable conversion.

Right now, multiple nfsd exports share one cache pool.

With the coming rhashtable, could each nfsd export use a private cache
pool to improve scale-out?

Best Regards
Wang Yugui ([email protected])
2022/06/15


2022-06-15 15:40:29

by Chuck Lever III

Subject: Re: [RPC] nfsd: NFSv4 close a file completely



> On Jun 15, 2022, at 11:28 AM, Wang Yugui <[email protected]> wrote:
>
> Hi,
>
>>> On Jun 12, 2022, at 3:22 AM, Wang Yugui <[email protected]> wrote:
>>>
>>> NFSv4 needs to close a file completely (no lingering open) when it does
>>> a CLOSE or DELEGRETURN.
>>>
>>> When there are multiple NFSv4 OPENs of the same file from different
>>> clients, we need to check the reference count before closing. The
>>> following reference-count check changes the behavior of NFSv3
>>> nfsd_rename()/nfsd_unlink() too.
>>>
>>> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=387
>>> Signed-off-by: Wang Yugui <[email protected]>
>>> ---
>>> TO-CHECK:
>>> 1) Is the NFSv3 nfsd_rename()/nfsd_unlink() behavior change OK?
>>> 2) Can we get better performance than nfsd_file_close_inode_sync()?
>>> 3) Replacing nfsd_file_close_inode_sync() with nfsd_file_close_inode() in
>>>    nfsd4_delegreturn() => 'Text file busy' for about 4s
>>> 4) Reference-count check: refcount_read(&nf->nf_ref) <= 1 or == 0?
>>>    nfsd_file_alloc() does refcount_set(&nf->nf_ref, 1);
>>>
>>> fs/nfsd/filecache.c | 2 +-
>>> fs/nfsd/nfs4state.c | 4 ++++
>>> 2 files changed, 5 insertions(+), 1 deletion(-)
>>
>> I suppose I owe you (and Frank) a progress report on #386. I've fixed
>> the LRU algorithm and added some observability features to measure
>> how the fix impacts the cache's efficiency for NFSv3 workloads.
>>
>> These new features show that the hit rate and average age of cache
>> items go down after the fix is applied. I'm trying to understand
>> if I've done something wrong or if the fix is supposed to do that.
>>
>> To handle the case of hundreds of thousands of open files more
>> efficiently, I'd like to convert the filecache to use rhashtable.
>
> A question about the coming rhashtable conversion.
>
> Right now, multiple nfsd exports share one cache pool.
>
> With the coming rhashtable, could each nfsd export use a private cache
> pool to improve scale-out?

That seems like a premature optimization. We don't know that the hashtable,
under normal (ie, non-generic/531) workloads, is a scaling problem.

However, I am considering (in the future) creating separate filecaches
for each nfsd_net.


--
Chuck Lever



2022-06-15 19:32:22

by Chuck Lever III

Subject: Re: [RPC] nfsd: NFSv4 close a file completely


> On Jun 15, 2022, at 11:39 AM, Chuck Lever III <[email protected]> wrote:
>
>> On Jun 15, 2022, at 11:28 AM, Wang Yugui <[email protected]> wrote:
>>
>> A question about the coming rhashtable conversion.
>>
>> Right now, multiple nfsd exports share one cache pool.
>>
>> With the coming rhashtable, could each nfsd export use a private cache
>> pool to improve scale-out?
>
> That seems like a premature optimization. We don't know that the hashtable,
> under normal (ie, non-generic/531) workloads, is a scaling problem.
>
> However, I am considering (in the future) creating separate filecaches
> for each nfsd_net.

So I'm not rejecting your idea outright. To expand on this a little:

Just a single rhashtable will enable better scaling, and so will fixing
the LRU issues; both are planned for my current set of fixes.
It's not clear to me that pathological workloads like generic/531 on
NFSv4 are common, so it's quite possible that just these two changes
will be enough for realistic workloads for the time being.

My near term goal for generic/531 is to prevent it from crashing NFSD.
Hopefully we can look at ways to enable that test to pass more often,
and fail gracefully when it doesn't pass. The issue is how the server
behaves when it can't open more files, which is somewhat separate from
the data structure efficiency issues you and Frank pointed out.

I'd like to get the above fixes ready for merge by the next window. So
I'm trying to narrow the changes in this set of fixes to make sure
they will be solid in a couple of months. It will be a heavier lift to
go from just one to two filecaches per server. After that, it will
likely be easier to go from two filecaches to multiple filecaches, but
I feel that's down the road.

In the medium term, supporting separate filecaches for NFSv3 and NFSv4
files is worth considering. NFSv3 nfsd_file items need to be managed
automatically and can be subject to a shrinker since there's no client
impact on releasing a cached filecache item.

NFSv4 nfsd_file items manage themselves (via OPEN/CLOSE) so an LRU isn't
really needed there (and isn't terribly effective anyway). A shrinker
can't easily release NFSv4 nfsd_file items without the server losing
state, and clients have to recover in that case.
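
As a sketch of that split (the flag and helper names below are
assumptions for illustration, not the current filecache code), a
reclaim pass would skip anything pinned by NFSv4 state and release only
entries whose last user is the cache itself:

/* NFSD_FILE_V4_PINNED is a hypothetical flag, set while an NFSv4 OPEN
 * or delegation stateid holds this entry. */
static bool nfsd_file_may_reclaim(const struct nfsd_file *nf)
{
	if (test_bit(NFSD_FILE_V4_PINNED, &nf->nf_flags))
		return false;			/* would drop NFSv4 state */
	return refcount_read(&nf->nf_ref) <= 1;	/* cache is last user */
}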

And, it turns out that the current filecache capacity-limiting mechanism
forces NFSv3 items out of the filecache in favor of NFSv4 items when
the cache has more than NFSD_FILE_LRU_LIMIT items in it. IMO that's
obviously undesirable behavior for common mixed-version workloads.


--
Chuck Lever