2022-05-28 01:38:13

by Chuck Lever III

Subject: filecache LRU performance regression

Hi Frank-

Bruce recently reminded me about this issue. Is there a bugzilla somewhere?
Do you have a reproducer I can try?


--
Chuck Lever





2022-05-28 18:31:31

by Frank van der Linden

Subject: Re: filecache LRU performance regression

On Fri, May 27, 2022 at 06:59:47PM +0000, Chuck Lever III wrote:
>
>
> Hi Frank-
>
> Bruce recently reminded me about this issue. Is there a bugzilla somewhere?
> Do you have a reproducer I can try?

Hi Chuck,

The easiest way to reproduce the issue is to run generic/531 over an
NFSv4 mount, using a system with a larger number of CPUs on the client
side (or just scaling the test up manually - it has a calculation based
on the number of CPUs).

The test will take a long time to finish. I initially described the
details here:

https://lore.kernel.org/linux-nfs/20200608192122.GA19171@dev-dsk-fllinden-2c-c1893d73.us-west-2.amazon.com/

Since then, it was also reported here:

https://lore.kernel.org/all/[email protected]/T/#m8c3e4173696e17a9d5903d2a619550f352314d20

I posted an experimental patch, but it's actually not quite correct
(although I think the idea behind it makes sense):

https://lore.kernel.org/linux-nfs/[email protected]/T/#m869aa427f125afee2af9a89d569c6b98e12e516f

The basic problem from the initial email I sent:

> So here's what happens: for NFSv4, files that are associated with an
> open stateid can stick around for a long time, as long as there's no
> CLOSE done on them. That's what's happening here. Also, since those files
> have a refcount of >= 2 (one for the hash table, one for being pointed to
> by the state), they are never eligible for removal from the file cache.
> Worse, since the code calls nfsd_file_gc inline if the upper bound is crossed
> (8192), every single operation that calls nfsd_file_acquire will end up
> walking the entire LRU, trying to free files, and failing every time.
> Walking a list with millions of files every single time isn't great.
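
To make the failure mode concrete, the walk amounts to something like this
(simplified, hypothetical pseudo-kernel C; not the actual fs/nfsd/filecache.c
code, the names are made up):

#include <linux/list.h>
#include <linux/atomic.h>

/* Hypothetical, simplified version of the filecache LRU walk. */
struct cached_file {
	struct list_head	lru;
	atomic_t		refcount;
};

static void gc_walk_lru(struct list_head *head)
{
	struct cached_file *file, *next;

	list_for_each_entry_safe(file, next, head, lru) {
		/*
		 * Every NFSv4 file held open by a stateid keeps at
		 * least two references (hash table + state), so this
		 * test fails for all of them and nothing is freed.
		 */
		if (atomic_read(&file->refcount) > 1)
			continue;
		list_del(&file->lru);
		/* ... free the entry ... */
	}
}

With millions of entries on the LRU, and the walk run inline from
nfsd_file_acquire() whenever the cache is over its 8192 bound, every
request pays for a full pass that frees nothing.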

I guess the basic issues here are:

1) Should these NFSv4 files be in the filecache at all? They are readily
available from the state, no need for additional caching.
2) In general: can state ever be gracefully retired if the client still
has an OPEN?

- Frank

2022-05-30 03:11:40

by Wang Yugui

Subject: Re: filecache LRU performance regression

Hi,

>
> > On May 28, 2022, at 10:32 PM, Wang Yugui <[email protected]> wrote:
> >
> > Hi,
> >
> >>> On May 27, 2022, at 4:37 PM, Frank van der Linden <[email protected]> wrote:
> >>>
> >>> On Fri, May 27, 2022 at 06:59:47PM +0000, Chuck Lever III wrote:
> >>>>
> >>>> Hi Frank-
> >>>>
> >>>> Bruce recently reminded me about this issue. Is there a bugzilla somewhere?
> >>>> Do you have a reproducer I can try?
> >>>
> >>> Hi Chuck,
> >>>
> >>> The easiest way to reproduce the issue is to run generic/531 over an
> >>> NFSv4 mount, using a system with a larger number of CPUs on the client
> >>> side (or just scaling the test up manually - it has a calculation based
> >>> on the number of CPUs).
> >>>
> >>> The test will take a long time to finish. I initially described the
> >>> details here:
> >>>
> >>> https://lore.kernel.org/linux-nfs/20200608192122.GA19171@dev-dsk-fllinden-2c-c1893d73.us-west-2.amazon.com/
> >>>
> >>> Since then, it was also reported here:
> >>>
> >>> https://lore.kernel.org/all/[email protected]/T/#m8c3e4173696e17a9d5903d2a619550f352314d20
> >>
> >> Thanks for the summary. So, there isn't a bugzilla tracking this
> >> issue? If not, please create one here:
> >>
> >> https://bugzilla.linux-nfs.org/
> >>
> >> Then we don't have to keep asking for a repeat summary ;-)
> >>
> >>
> >>> I posted an experimental patch, but it's actually not quite correct
> >>> (although I think the idea behind it makes sense):
> >>>
> >>> https://lore.kernel.org/linux-nfs/[email protected]/T/#m869aa427f125afee2af9a89d569c6b98e12e516f
> >>
> >> A handful of random comments:
> >>
> >> - nfsd_file_put() is now significantly different than it used
> >> to be, so that part of the patch will have to be updated in
> >> order to apply to v5.18+
> >
> > When there are many open files (more than NFSD_FILE_LRU_THRESHOLD),
> > nfsd_file_gc() wastes a lot of CPU time.
>
> Thanks for the suggestion. I agree that CPU and
> memory bandwidth are not being used effectively
> by the filecache garbage collector.
>
>
> > Can we serialize nfsd_file_gc() for v5.18+ as a first step?
>
> If I understand Frank's problem statement correctly,
> garbage collection during an nfsd_file_put() under
> an NFSv4-only workload walks the length of the LRU
> list and finds nothing to evict 100% of the time.
> That seems like a bug, and fixing it might give us
> the most improvement in this area.
>
>
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index 3d944ca..6abefd9 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -63,6 +63,8 @@ static struct delayed_work nfsd_filecache_laundrette;
> >
> > static void nfsd_file_gc(void);
> >
> > +static atomic_t nfsd_file_gc_queue_delayed = ATOMIC_INIT(0);
> > +
> > static void
> > nfsd_file_schedule_laundrette(void)
> > {
> > @@ -71,8 +73,10 @@ nfsd_file_schedule_laundrette(void)
> >  	if (count == 0 || test_bit(NFSD_FILE_SHUTDOWN, &nfsd_file_lru_flags))
> >  		return;
> >
> > -	queue_delayed_work(system_wq, &nfsd_filecache_laundrette,
> > +	if (atomic_cmpxchg(&nfsd_file_gc_queue_delayed, 0, 1) == 0) {
> > +		queue_delayed_work(system_wq, &nfsd_filecache_laundrette,
> >  			NFSD_LAUNDRETTE_DELAY);
>
> I might misunderstand what this is doing exactly.
> I'm sure there's a preferred idiom in the kernel
> for not queuing a new work item when one is already
> running, so that an open-coded cmpxchg is not
> necessary.

Thanks. I dropped the queue_delayed_work related change and reposted it.
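
For what it's worth, if I read the workqueue code correctly,
queue_delayed_work() already refuses to queue the laundrette again while it
is still pending (it returns false in that case); the extra flag would only
matter once the work has actually started running. So the already-pending
case needs no cmpxchg, roughly:

	/* Inside nfsd_file_schedule_laundrette(), untested sketch: */
	if (!queue_delayed_work(system_wq, &nfsd_filecache_laundrette,
				NFSD_LAUNDRETTE_DELAY))
		return;	/* still pending, nothing more to do */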

> It might be better to allocate a specific workqueue
> for filecache garbage collection, and limit the
> maximum number of active work items allowed on that
> queue to 1. One benefit of that might be reducing
> the number of threads contending for the filecache
> data structures.

This way, the number of threads doing filecache garbage collection is
reduced, but the filecache is still accessed by all nfsd threads to
fetch and store entries.
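
If I understand the suggestion, the dedicated queue would look roughly like
the following (untested sketch, the names are made up):

/* A dedicated, single-threaded GC workqueue; hypothetical names. */
static struct workqueue_struct *nfsd_filecache_gc_wq;

static int nfsd_filecache_gc_wq_init(void)
{
	/*
	 * An ordered workqueue executes at most one work item at a
	 * time, so GC concurrency is capped at 1.
	 */
	nfsd_filecache_gc_wq = alloc_ordered_workqueue("nfsd_filecache_gc", 0);
	if (!nfsd_filecache_gc_wq)
		return -ENOMEM;
	return 0;
}

The laundrette would then be queued to nfsd_filecache_gc_wq instead of
system_wq.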

Best Regards
Wang Yugui ([email protected])
2022/05/30

> If GC is capped like this, maybe create one GC
> workqueue per nfsd_net so GC activity in one
> namespace does not starve filecache GC in other
> namespaces.
>
> Before I would take patches like this, though,
> performance data demonstrating a problem and some
> improvement should be presented separately or as
> part of the patch descriptions.

> If you repost, start a separate thread and cc:
>
> M: Tejun Heo <[email protected]>
> R: Lai Jiangshan <[email protected]>
>
> to get review by workqueue domain experts.
>
>