2022-11-07 17:15:27

by Jeffrey Layton

Subject: [PATCH] nfsd: remove dedicated nfsd_filecache_wq

There's no clear benefit to allocating our own over just using the
system_wq. This also fixes a minor bug in nfsd_file_cache_init(). In the
current code, if allocating the wq fails, then the nfsd_file_rhash_tbl
is leaked.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/filecache.c | 13 +------------
1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 1e76b0d3b83a..59e06d68d20c 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -66,8 +66,6 @@ struct nfsd_fcache_disposal {
struct list_head freeme;
};

-static struct workqueue_struct *nfsd_filecache_wq __read_mostly;
-
static struct kmem_cache *nfsd_file_slab;
static struct kmem_cache *nfsd_file_mark_slab;
static struct list_lru nfsd_file_lru;
@@ -564,7 +562,7 @@ nfsd_file_list_add_disposal(struct list_head *files, struct net *net)
spin_lock(&l->lock);
list_splice_tail_init(files, &l->freeme);
spin_unlock(&l->lock);
- queue_work(nfsd_filecache_wq, &l->work);
+ queue_work(system_wq, &l->work);
}

static void
@@ -855,11 +853,6 @@ nfsd_file_cache_init(void)
if (ret)
return ret;

- ret = -ENOMEM;
- nfsd_filecache_wq = alloc_workqueue("nfsd_filecache", 0, 0);
- if (!nfsd_filecache_wq)
- goto out;
-
nfsd_file_slab = kmem_cache_create("nfsd_file",
sizeof(struct nfsd_file), 0, 0, NULL);
if (!nfsd_file_slab) {
@@ -917,8 +910,6 @@ nfsd_file_cache_init(void)
nfsd_file_slab = NULL;
kmem_cache_destroy(nfsd_file_mark_slab);
nfsd_file_mark_slab = NULL;
- destroy_workqueue(nfsd_filecache_wq);
- nfsd_filecache_wq = NULL;
rhashtable_destroy(&nfsd_file_rhash_tbl);
goto out;
}
@@ -1034,8 +1025,6 @@ nfsd_file_cache_shutdown(void)
fsnotify_wait_marks_destroyed();
kmem_cache_destroy(nfsd_file_mark_slab);
nfsd_file_mark_slab = NULL;
- destroy_workqueue(nfsd_filecache_wq);
- nfsd_filecache_wq = NULL;
rhashtable_destroy(&nfsd_file_rhash_tbl);

for_each_possible_cpu(i) {
--
2.38.1

2022-11-07 17:32:10

by Jeffrey Layton

Subject: Re: [PATCH] nfsd: remove dedicated nfsd_filecache_wq

On Mon, 2022-11-07 at 12:10 -0500, Jeff Layton wrote:
> There's no clear benefit to allocating our own over just using the
> system_wq. This also fixes a minor bug in nfsd_file_cache_init(). In the
> current code, if allocating the wq fails, then the nfsd_file_rhash_tbl
> is leaked.
>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/nfsd/filecache.c | 13 +------------
> 1 file changed, 1 insertion(+), 12 deletions(-)
>

I'm playing with this and it seems to be ok, but reading further into
the workqueue doc, it says this:

* A wq serves as a domain for forward progress guarantee
(``WQ_MEM_RECLAIM``, flush and work item attributes. Work items
which are not involved in memory reclaim and don't need to be
flushed as a part of a group of work items, and don't require any
special attribute, can use one of the system wq. There is no
difference in execution characteristics between using a dedicated wq
and a system wq.

These jobs are involved in mem reclaim however, due to the shrinker.
OTOH, the existing nfsd_filecache_wq doesn't set WQ_MEM_RECLAIM.

In any case, we aren't flushing the work or anything as part of mem
reclaim, so maybe the above bullet point doesn't apply here?
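
FWIW, if we did decide this work is part of the reclaim path, the
alternative would be to keep the dedicated wq and tag it, rather than
dropping it. A rough sketch of that (not part of this patch, and
possibly overkill since we never flush this wq from reclaim context):

        /* Hypothetical: keep the dedicated wq, but give it a rescuer
         * thread by marking it as participating in memory reclaim. */
        nfsd_filecache_wq = alloc_workqueue("nfsd_filecache",
                                            WQ_MEM_RECLAIM, 0);
        if (!nfsd_filecache_wq)
                goto out;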

> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 1e76b0d3b83a..59e06d68d20c 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -66,8 +66,6 @@ struct nfsd_fcache_disposal {
> struct list_head freeme;
> };
>
> -static struct workqueue_struct *nfsd_filecache_wq __read_mostly;
> -
> static struct kmem_cache *nfsd_file_slab;
> static struct kmem_cache *nfsd_file_mark_slab;
> static struct list_lru nfsd_file_lru;
> @@ -564,7 +562,7 @@ nfsd_file_list_add_disposal(struct list_head *files, struct net *net)
> spin_lock(&l->lock);
> list_splice_tail_init(files, &l->freeme);
> spin_unlock(&l->lock);
> - queue_work(nfsd_filecache_wq, &l->work);
> + queue_work(system_wq, &l->work);
> }
>
> static void
> @@ -855,11 +853,6 @@ nfsd_file_cache_init(void)
> if (ret)
> return ret;
>
> - ret = -ENOMEM;
> - nfsd_filecache_wq = alloc_workqueue("nfsd_filecache", 0, 0);
> - if (!nfsd_filecache_wq)
> - goto out;
> -
> nfsd_file_slab = kmem_cache_create("nfsd_file",
> sizeof(struct nfsd_file), 0, 0, NULL);
> if (!nfsd_file_slab) {
> @@ -917,8 +910,6 @@ nfsd_file_cache_init(void)
> nfsd_file_slab = NULL;
> kmem_cache_destroy(nfsd_file_mark_slab);
> nfsd_file_mark_slab = NULL;
> - destroy_workqueue(nfsd_filecache_wq);
> - nfsd_filecache_wq = NULL;
> rhashtable_destroy(&nfsd_file_rhash_tbl);
> goto out;
> }
> @@ -1034,8 +1025,6 @@ nfsd_file_cache_shutdown(void)
> fsnotify_wait_marks_destroyed();
> kmem_cache_destroy(nfsd_file_mark_slab);
> nfsd_file_mark_slab = NULL;
> - destroy_workqueue(nfsd_filecache_wq);
> - nfsd_filecache_wq = NULL;
> rhashtable_destroy(&nfsd_file_rhash_tbl);
>
> for_each_possible_cpu(i) {

--
Jeff Layton <[email protected]>

2022-11-07 18:27:46

by Chuck Lever

Subject: Re: [PATCH] nfsd: remove dedicated nfsd_filecache_wq



> On Nov 7, 2022, at 12:28 PM, Jeff Layton <[email protected]> wrote:
>
> On Mon, 2022-11-07 at 12:10 -0500, Jeff Layton wrote:
>> There's no clear benefit to allocating our own over just using the
>> system_wq. This also fixes a minor bug in nfsd_file_cache_init(). In the
>> current code, if allocating the wq fails, then the nfsd_file_rhash_tbl
>> is leaked.
>>
>> Signed-off-by: Jeff Layton <[email protected]>
>> ---
>> fs/nfsd/filecache.c | 13 +------------
>> 1 file changed, 1 insertion(+), 12 deletions(-)
>>
>
> I'm playing with this and it seems to be ok, but reading further into
> the workqueue doc, it says this:
>
> * A wq serves as a domain for forward progress guarantee
> (``WQ_MEM_RECLAIM``, flush and work item attributes. Work items
> which are not involved in memory reclaim and don't need to be
> flushed as a part of a group of work items, and don't require any
> special attribute, can use one of the system wq. There is no
> difference in execution characteristics between using a dedicated wq
> and a system wq.
>
> These jobs are involved in mem reclaim however, due to the shrinker.
> OTOH, the existing nfsd_filecache_wq doesn't set WQ_MEM_RECLAIM.
>
> In any case, we aren't flushing the work or anything as part of mem
> reclaim, so maybe the above bullet point doesn't apply here?

In the steady state, deferring writeback seems like the right
thing to do, and I don't see the need for a special WQ for that
case -- hence nfsd_file_schedule_laundrette() can use the
system_wq.
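
For anyone following along, that path is already basically just a
delayed work item queued on the system_wq (paraphrasing from memory,
so the guard condition is elided and the names may be slightly off):

        static void
        nfsd_file_schedule_laundrette(void)
        {
                /* steady-state GC: no dedicated wq involved */
                queue_delayed_work(system_wq, &nfsd_filecache_laundrette,
                                   NFSD_LAUNDRETTE_DELAY);
        }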

That might explain the dual WQ arrangement in the current code.

But I'd feel better if the shrinker skipped files that require
writeback to avoid a potential deadlock scenario for some
filesystems.


>> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>> index 1e76b0d3b83a..59e06d68d20c 100644
>> --- a/fs/nfsd/filecache.c
>> +++ b/fs/nfsd/filecache.c
>> @@ -66,8 +66,6 @@ struct nfsd_fcache_disposal {
>> struct list_head freeme;
>> };
>>
>> -static struct workqueue_struct *nfsd_filecache_wq __read_mostly;
>> -
>> static struct kmem_cache *nfsd_file_slab;
>> static struct kmem_cache *nfsd_file_mark_slab;
>> static struct list_lru nfsd_file_lru;
>> @@ -564,7 +562,7 @@ nfsd_file_list_add_disposal(struct list_head *files, struct net *net)
>> spin_lock(&l->lock);
>> list_splice_tail_init(files, &l->freeme);
>> spin_unlock(&l->lock);
>> - queue_work(nfsd_filecache_wq, &l->work);
>> + queue_work(system_wq, &l->work);
>> }
>>
>> static void
>> @@ -855,11 +853,6 @@ nfsd_file_cache_init(void)
>> if (ret)
>> return ret;
>>
>> - ret = -ENOMEM;
>> - nfsd_filecache_wq = alloc_workqueue("nfsd_filecache", 0, 0);
>> - if (!nfsd_filecache_wq)
>> - goto out;
>> -
>> nfsd_file_slab = kmem_cache_create("nfsd_file",
>> sizeof(struct nfsd_file), 0, 0, NULL);
>> if (!nfsd_file_slab) {
>> @@ -917,8 +910,6 @@ nfsd_file_cache_init(void)
>> nfsd_file_slab = NULL;
>> kmem_cache_destroy(nfsd_file_mark_slab);
>> nfsd_file_mark_slab = NULL;
>> - destroy_workqueue(nfsd_filecache_wq);
>> - nfsd_filecache_wq = NULL;
>> rhashtable_destroy(&nfsd_file_rhash_tbl);
>> goto out;
>> }
>> @@ -1034,8 +1025,6 @@ nfsd_file_cache_shutdown(void)
>> fsnotify_wait_marks_destroyed();
>> kmem_cache_destroy(nfsd_file_mark_slab);
>> nfsd_file_mark_slab = NULL;
>> - destroy_workqueue(nfsd_filecache_wq);
>> - nfsd_filecache_wq = NULL;
>> rhashtable_destroy(&nfsd_file_rhash_tbl);
>>
>> for_each_possible_cpu(i) {
>
> --
> Jeff Layton <[email protected]>

--
Chuck Lever

2022-11-07 18:53:12

by Jeffrey Layton

Subject: Re: [PATCH] nfsd: remove dedicated nfsd_filecache_wq

On Mon, 2022-11-07 at 18:16 +0000, Chuck Lever III wrote:
>
> > On Nov 7, 2022, at 12:28 PM, Jeff Layton <[email protected]> wrote:
> >
> > On Mon, 2022-11-07 at 12:10 -0500, Jeff Layton wrote:
> > > There's no clear benefit to allocating our own over just using the
> > > system_wq. This also fixes a minor bug in nfsd_file_cache_init(). In the
> > > current code, if allocating the wq fails, then the nfsd_file_rhash_tbl
> > > is leaked.
> > >
> > > Signed-off-by: Jeff Layton <[email protected]>
> > > ---
> > > fs/nfsd/filecache.c | 13 +------------
> > > 1 file changed, 1 insertion(+), 12 deletions(-)
> > >
> >
> > I'm playing with this and it seems to be ok, but reading further into
> > the workqueue doc, it says this:
> >
> > * A wq serves as a domain for forward progress guarantee
> > (``WQ_MEM_RECLAIM``, flush and work item attributes. Work items
> > which are not involved in memory reclaim and don't need to be
> > flushed as a part of a group of work items, and don't require any
> > special attribute, can use one of the system wq. There is no
> > difference in execution characteristics between using a dedicated wq
> > and a system wq.
> >
> > These jobs are involved in mem reclaim however, due to the shrinker.
> > OTOH, the existing nfsd_filecache_wq doesn't set WQ_MEM_RECLAIM.
> >
> > In any case, we aren't flushing the work or anything as part of mem
> > reclaim, so maybe the above bullet point doesn't apply here?
>
> In the steady state, deferring writeback seems like the right
> thing to do, and I don't see the need for a special WQ for that
> case -- hence nfsd_file_schedule_laundrette() can use the
> system_wq.
>
> That might explain the dual WQ arrangement in the current code.
>

True. Looking through the changelog, the dedicated workqueue was added
by Trond in 9542e6a643fc ("nfsd: Containerise filecache laundrette"). I
assume he was concerned about reclaim.

Trond, if we keep the dedicated workqueue for the laundrette, does it
also need WQ_MEM_RECLAIM?


> But I'd feel better if the shrinker skipped files that require
> writeback to avoid a potential deadlock scenario for some
> filesystems.
>

It already does this via the nfsd_file_check_writeback call in
nfsd_file_lru_cb.
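
i.e. roughly this in the LRU walk callback (paraphrasing the current
code from memory, so the exact signature, tracepoints, and return
values may differ):

        static enum lru_status
        nfsd_file_lru_cb(struct list_head *item, struct list_lru_one *lru,
                         spinlock_t *lock, void *arg)
        {
                struct nfsd_file *nf = list_entry(item, struct nfsd_file,
                                                  nf_lru);

                /* Skip anything still dirty or under writeback so the
                 * shrinker never waits on (or kicks off) I/O. */
                if (nfsd_file_check_writeback(nf))
                        return LRU_SKIP;

                /* ...otherwise unhash and queue it for disposal... */
                return LRU_REMOVED;
        }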


>
> > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > index 1e76b0d3b83a..59e06d68d20c 100644
> > > --- a/fs/nfsd/filecache.c
> > > +++ b/fs/nfsd/filecache.c
> > > @@ -66,8 +66,6 @@ struct nfsd_fcache_disposal {
> > > struct list_head freeme;
> > > };
> > >
> > > -static struct workqueue_struct *nfsd_filecache_wq __read_mostly;
> > > -
> > > static struct kmem_cache *nfsd_file_slab;
> > > static struct kmem_cache *nfsd_file_mark_slab;
> > > static struct list_lru nfsd_file_lru;
> > > @@ -564,7 +562,7 @@ nfsd_file_list_add_disposal(struct list_head *files, struct net *net)
> > > spin_lock(&l->lock);
> > > list_splice_tail_init(files, &l->freeme);
> > > spin_unlock(&l->lock);
> > > - queue_work(nfsd_filecache_wq, &l->work);
> > > + queue_work(system_wq, &l->work);
> > > }
> > >
> > > static void
> > > @@ -855,11 +853,6 @@ nfsd_file_cache_init(void)
> > > if (ret)
> > > return ret;
> > >
> > > - ret = -ENOMEM;
> > > - nfsd_filecache_wq = alloc_workqueue("nfsd_filecache", 0, 0);
> > > - if (!nfsd_filecache_wq)
> > > - goto out;
> > > -
> > > nfsd_file_slab = kmem_cache_create("nfsd_file",
> > > sizeof(struct nfsd_file), 0, 0, NULL);
> > > if (!nfsd_file_slab) {
> > > @@ -917,8 +910,6 @@ nfsd_file_cache_init(void)
> > > nfsd_file_slab = NULL;
> > > kmem_cache_destroy(nfsd_file_mark_slab);
> > > nfsd_file_mark_slab = NULL;
> > > - destroy_workqueue(nfsd_filecache_wq);
> > > - nfsd_filecache_wq = NULL;
> > > rhashtable_destroy(&nfsd_file_rhash_tbl);
> > > goto out;
> > > }
> > > @@ -1034,8 +1025,6 @@ nfsd_file_cache_shutdown(void)
> > > fsnotify_wait_marks_destroyed();
> > > kmem_cache_destroy(nfsd_file_mark_slab);
> > > nfsd_file_mark_slab = NULL;
> > > - destroy_workqueue(nfsd_filecache_wq);
> > > - nfsd_filecache_wq = NULL;
> > > rhashtable_destroy(&nfsd_file_rhash_tbl);
> > >
> > > for_each_possible_cpu(i) {
> >
> > --
> > Jeff Layton <[email protected]>
>
> --
> Chuck Lever
>
>
>

--
Jeff Layton <[email protected]>

2022-11-07 19:11:25

by Trond Myklebust

Subject: Re: [PATCH] nfsd: remove dedicated nfsd_filecache_wq



> On Nov 7, 2022, at 13:45, Jeff Layton <[email protected]> wrote:
>
> On Mon, 2022-11-07 at 18:16 +0000, Chuck Lever III wrote:
>>
>>> On Nov 7, 2022, at 12:28 PM, Jeff Layton <[email protected]> wrote:
>>>
>>> On Mon, 2022-11-07 at 12:10 -0500, Jeff Layton wrote:
>>>> There's no clear benefit to allocating our own over just using the
>>>> system_wq. This also fixes a minor bug in nfsd_file_cache_init(). In the
>>>> current code, if allocating the wq fails, then the nfsd_file_rhash_tbl
>>>> is leaked.
>>>>
>>>> Signed-off-by: Jeff Layton <[email protected]>
>>>> ---
>>>> fs/nfsd/filecache.c | 13 +------------
>>>> 1 file changed, 1 insertion(+), 12 deletions(-)
>>>>
>>>
>>> I'm playing with this and it seems to be ok, but reading further into
>>> the workqueue doc, it says this:
>>>
>>> * A wq serves as a domain for forward progress guarantee
>>> (``WQ_MEM_RECLAIM``, flush and work item attributes. Work items
>>> which are not involved in memory reclaim and don't need to be
>>> flushed as a part of a group of work items, and don't require any
>>> special attribute, can use one of the system wq. There is no
>>> difference in execution characteristics between using a dedicated wq
>>> and a system wq.
>>>
>>> These jobs are involved in mem reclaim however, due to the shrinker.
>>> OTOH, the existing nfsd_filecache_wq doesn't set WQ_MEM_RECLAIM.
>>>
>>> In any case, we aren't flushing the work or anything as part of mem
>>> reclaim, so maybe the above bullet point doesn't apply here?
>>
>> In the steady state, deferring writeback seems like the right
>> thing to do, and I don't see the need for a special WQ for that
>> case -- hence nfsd_file_schedule_laundrette() can use the
>> system_wq.
>>
>> That might explain the dual WQ arrangement in the current code.
>>
>
> True. Looking through the changelog, the dedicated workqueue was added
> by Trond in 9542e6a643fc ("nfsd: Containerise filecache laundrette"). I
> assume he was concerned about reclaim.

It is about separation of concerns. The issue is to ensure that when you have multiple containers each running their own set of knfsd servers, then you won’t end up with one server bottlenecking the cleanup of all the containers. It is of particular interest to do this if one of these containers is actually re-exporting NFS.
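
Concretely, there is one disposal object per net namespace, each with
its own work item and its own freeme list, so each container's cleanup
gets queued and drained independently (sketch from memory; the lookup
of the per-net object may not be exactly as written):

        struct nfsd_fcache_disposal {
                struct work_struct work;
                spinlock_t lock;
                struct list_head freeme;
        };

        static void
        nfsd_file_list_add_disposal(struct list_head *files, struct net *net)
        {
                struct nfsd_net *nn = net_generic(net, nfsd_net_id);
                struct nfsd_fcache_disposal *l = nn->fcache_disposal;

                spin_lock(&l->lock);
                list_splice_tail_init(files, &l->freeme);
                spin_unlock(&l->lock);
                queue_work(nfsd_filecache_wq, &l->work);
        }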

>
> Trond, if we keep the dedicated workqueue for the laundrette, does it
> also need WQ_MEM_RECLAIM?

In theory, the files on that list are supposed to have already been flushed, so I don’t think it is needed.


_________________________________
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]

2022-11-07 20:49:16

by Jeffrey Layton

Subject: Re: [PATCH] nfsd: remove dedicated nfsd_filecache_wq

On Mon, 2022-11-07 at 19:06 +0000, Trond Myklebust wrote:
>
> > On Nov 7, 2022, at 13:45, Jeff Layton <[email protected]> wrote:
> >
> > On Mon, 2022-11-07 at 18:16 +0000, Chuck Lever III wrote:
> > >
> > > > On Nov 7, 2022, at 12:28 PM, Jeff Layton <[email protected]> wrote:
> > > >
> > > > On Mon, 2022-11-07 at 12:10 -0500, Jeff Layton wrote:
> > > > > There's no clear benefit to allocating our own over just using the
> > > > > system_wq. This also fixes a minor bug in nfsd_file_cache_init(). In the
> > > > > current code, if allocating the wq fails, then the nfsd_file_rhash_tbl
> > > > > is leaked.
> > > > >
> > > > > Signed-off-by: Jeff Layton <[email protected]>
> > > > > ---
> > > > > fs/nfsd/filecache.c | 13 +------------
> > > > > 1 file changed, 1 insertion(+), 12 deletions(-)
> > > > >
> > > >
> > > > I'm playing with this and it seems to be ok, but reading further into
> > > > the workqueue doc, it says this:
> > > >
> > > > * A wq serves as a domain for forward progress guarantee
> > > > (``WQ_MEM_RECLAIM``, flush and work item attributes. Work items
> > > > which are not involved in memory reclaim and don't need to be
> > > > flushed as a part of a group of work items, and don't require any
> > > > special attribute, can use one of the system wq. There is no
> > > > difference in execution characteristics between using a dedicated wq
> > > > and a system wq.
> > > >
> > > > These jobs are involved in mem reclaim however, due to the shrinker.
> > > > OTOH, the existing nfsd_filecache_wq doesn't set WQ_MEM_RECLAIM.
> > > >
> > > > In any case, we aren't flushing the work or anything as part of mem
> > > > reclaim, so maybe the above bullet point doesn't apply here?
> > >
> > > In the steady state, deferring writeback seems like the right
> > > thing to do, and I don't see the need for a special WQ for that
> > > case -- hence nfsd_file_schedule_laundrette() can use the
> > > system_wq.
> > >
> > > That might explain the dual WQ arrangement in the current code.
> > >
> >
> > True. Looking through the changelog, the dedicated workqueue was added
> > by Trond in 9542e6a643fc ("nfsd: Containerise filecache laundrette"). I
> > assume he was concerned about reclaim.
>
> It is about separation of concerns. The issue is to ensure that when you have multiple containers each running their own set of knfsd servers, then you won’t end up with one server bottlenecking the cleanup of all the containers. It is of particular interest to do this if one of these containers is actually re-exporting NFS.
>

I'm not sure that having a separate workqueue really prevents that these
days though, does it? They all use the same kthreads. We're not using
flush_workqueue. What scenario do we prevent by having a dedicated
workqueue?
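
To put it concretely: the dedicated wq is allocated with no flags and
the default max_active (no WQ_MEM_RECLAIM rescuer, not WQ_UNBOUND), so
as far as I can tell these two end up on exactly the same shared
per-cpu kworker pools:

        queue_work(nfsd_filecache_wq, &l->work);        /* today */
        queue_work(system_wq, &l->work);                /* with this patch */

The dedicated wq only buys us a separate flush/attribute domain, and
we never flush it.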

> >
> > Trond, if we keep the dedicated workqueue for the laundrette, does it
> > also need WQ_MEM_RECLAIM?
>
> In theory, the files on that list are supposed to have already been flushed, so I don’t think it is needed.
>

Ok.

--
Jeff Layton <[email protected]>