2022-05-31 11:33:36

by Wang Yugui

Subject: [PATCH v2] nfsd: serialize filecache garbage collector

When many (>NFSD_FILE_LRU_THRESHOLD) files are kept open, as in
xfstests generic/531, the nfsd processes run under high CPU load,
and nfsd_file_gc() (the nfsd filecache garbage collector) wastes a lot
of CPU time.

Running nfsd_file_gc() concurrently is almost meaningless, so serialize it.

Signed-off-by: Wang Yugui <[email protected]>
---
Changes since v1:
- Add static to 'atomic_t nfsd_file_gc_running'.
  Thanks to the kernel test robot <[email protected]>.

fs/nfsd/filecache.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index f172412447f5..28a8f8d6d235 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -471,10 +471,15 @@ nfsd_file_lru_walk_list(struct shrink_control *sc)
 	return ret;
 }

+/* concurrency nfsd_file_gc() is almost meaningless, so serialize it. */
+static atomic_t nfsd_file_gc_running = ATOMIC_INIT(0);
 static void
 nfsd_file_gc(void)
 {
-	nfsd_file_lru_walk_list(NULL);
+	if(atomic_cmpxchg(&nfsd_file_gc_running, 0, 1) == 0) {
+		nfsd_file_lru_walk_list(NULL);
+		atomic_set(&nfsd_file_gc_running, 0);
+	}
 }

 static void
--
2.36.1



2022-05-31 22:30:43

by Wang Yugui

Subject: Re: [PATCH v2] nfsd: serialize filecache garbage collector

Hi,

> > On May 31, 2022, at 6:34 AM, Wang Yugui <[email protected]> wrote:
> >
> > When many (>NFSD_FILE_LRU_THRESHOLD) files are kept open, as in
> > xfstests generic/531, the nfsd processes run under high CPU load,
> > and nfsd_file_gc() (the nfsd filecache garbage collector) wastes a lot
> > of CPU time.
>
> Over the past few days, I've been able to reproduce a lot of bad
> behavior with generic/531. My test client has 12 physical CPU
> cores, and my lab network is 56Gb InfiniBand.
>
> Unfortunately this patch doesn't really begin to address it. For
> example, with this patch applied, CPU idle is in single digits
> on the NFS server that exports the test's scratch device, and
> that server can still get into a soft lock-up. IMO that is
> because this change works around the underlying problem but
> makes no attempt to root-cause or address that issue.
>
> I agree that the NFS server's behavior needs attention, but I'm
> not inclined to apply this particular patch as it is.

Yes, this patch is specific to xfstests generic/531.

In xfstests generic/531, when many (>500K) files are kept open, deleting
a file also triggers an LRU walk (and a CPU soft lockup).

Is adding to a big LRU still fast, while removing a random entry is
very slow?

Best Regards
Wang Yugui ([email protected])
2022/05/31


2022-06-01 20:09:33

by Chuck Lever III

Subject: Re: [PATCH v2] nfsd: serialize filecache garbage collector



> On May 31, 2022, at 6:34 AM, Wang Yugui <[email protected]> wrote:
>
> When many (>NFSD_FILE_LRU_THRESHOLD) files are kept open, as in
> xfstests generic/531, the nfsd processes run under high CPU load,
> and nfsd_file_gc() (the nfsd filecache garbage collector) wastes a lot
> of CPU time.

Over the past few days, I've been able to reproduce a lot of bad
behavior with generic/531. My test client has 12 physical CPU
cores, and my lab network is 56Gb InfiniBand.

Unfortunately this patch doesn't really begin to address it. For
example, with this patch applied, CPU idle is in single digits
on the NFS server that exports the test's scratch device, and
that server can still get into a soft lock-up. IMO that is
because this change works around the underlying problem but
makes no attempt to root-cause or address that issue.

I agree that the NFS server's behavior needs attention, but I'm
not inclined to apply this particular patch as it is.


> Running nfsd_file_gc() concurrently is almost meaningless, so serialize it.
>
> Signed-off-by: Wang Yugui <[email protected]>
> ---
> Changes since v1:
> - add static to 'atomic_t nfsd_file_gc_running'.
> thanks for kernel test robot <[email protected]>
>
> fs/nfsd/filecache.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index f172412447f5..28a8f8d6d235 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -471,10 +471,15 @@ nfsd_file_lru_walk_list(struct shrink_control *sc)
> 	return ret;
> }
>
> +/* concurrency nfsd_file_gc() is almost meaningless, so serialize it. */
> +static atomic_t nfsd_file_gc_running = ATOMIC_INIT(0);
> static void
> nfsd_file_gc(void)
> {
> -	nfsd_file_lru_walk_list(NULL);
> +	if(atomic_cmpxchg(&nfsd_file_gc_running, 0, 1) == 0) {
> +		nfsd_file_lru_walk_list(NULL);
> +		atomic_set(&nfsd_file_gc_running, 0);
> +	}
> }
>
> static void
> --
> 2.36.1
>

--
Chuck Lever