It's apparent that a number of distributions and their customers
remain on long-term stable kernels. We are aware of the scalability
problems and other bugs in NFSD in kernels between v5.4 and v6.1.
To address the filecache and other scalability problems in those
kernels, I'm preparing backported patches of NFSD fixes for several
popular LTS kernels. These backports are destined for the official
LTS kernel branches so that distributions can easily integrate them
into their products.
Once this effort is complete, Greg and Sasha will continue to be
responsible for backporting NFSD-related fixes from upstream into
the LTS kernels.
Here's a status update.
---
I've pushed the NFSD backports to branches in this repo:
https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
If you are able, I encourage you to pull these, review them or try
them out, and report any issues or successes. I'm currently using
the NFS workflows in kdevops as the testing platform, but am
planning to include other tests.
LTS v5.15.y
Work on backporting the NFSD file cache fixes to v5.15.y is now
complete. Subsequent fixes and changes will go through the usual
stable@ process.
LTS v5.10.y
I've updated nfsd-5.10.y to include the patches and fixes that
were included in nfsd-5.15.y. You can find these patches in the
"nfsd-5.10.y" branch in the above repo.
This week I intend to set up CI testing for this branch.
--
Chuck Lever
On Mon, May 6, 2024 at 4:25 PM Chuck Lever <[email protected]> wrote:
>
> It's apparent that a number of distributions and their customers
> remain on long-term stable kernels. We are aware of the scalability
> problems and other bugs in NFSD in kernels between v5.4 and v6.1.
>
Chuck,
Are you able to share a partial list of scalability problems that were
fixed by this backport series?
Specifically, my interest is in the list of improvements to 5.15.y.
Thanks,
Amir.
On Wed, May 08, 2024 at 02:29:05PM +0300, Amir Goldstein wrote:
> On Mon, May 6, 2024 at 4:25 PM Chuck Lever <[email protected]> wrote:
> >
> > It's apparent that a number of distributions and their customers
> > remain on long-term stable kernels. We are aware of the scalability
> > problems and other bugs in NFSD in kernels between v5.4 and v6.1.
> >
>
> Chuck,
>
> Are you able to share a partial list of scalability problems that were
> fixed by this backport series?
>
> Specifically, my interest is in the list of improvements to 5.15.y.
In broad strokes:
- The garbage collection mechanism was rewritten to keep the LRU
list short, and to keep sweeps productive. This was the main issue
as the LRU used to have hundreds of thousands of files on it and
one sweep would take so much CPU it was reported as a soft lockup.
- NFSv4 OPEN files are no longer garbage collected, so once an NFSv4
CLOSE is processed, local accessors have immediate access to the file.
- The filecache hash table is converted to an rhltable so that it
can efficiently manage many more open files.
- Various fixes prevent writeback of garbage-collected files from
bogging down.
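
To make the first two points concrete, here is a toy user-space
sketch of a "second chance" sweep. It is not the kernel code: the
toy_* names, the fixed-size slot array, and the nfsv4_open flag are
all assumptions made for illustration, and the real logic lives in
fs/nfsd/filecache.c. The idea it shows is that a sweep evicts only
entries that were not used since the previous sweep, and never
touches entries held by NFSv4 OPEN state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CACHE_SLOTS 8

/* Hypothetical, simplified cache entry; not the kernel's struct nfsd_file. */
struct toy_file {
	bool cached;      /* slot currently holds a cached file */
	bool referenced;  /* used since the previous sweep */
	bool nfsv4_open;  /* held by NFSv4 OPEN state: exempt from GC */
};

struct toy_cache {
	struct toy_file slot[TOY_CACHE_SLOTS];
};

/* Insert a file into slot i; a freshly added file counts as recently used. */
void toy_add(struct toy_cache *c, size_t i, bool nfsv4_open)
{
	assert(c != NULL && i < TOY_CACHE_SLOTS);
	c->slot[i].cached = true;
	c->slot[i].referenced = true;
	c->slot[i].nfsv4_open = nfsv4_open;
}

/* Record a use of slot i so that the next sweep spares it. */
void toy_touch(struct toy_cache *c, size_t i)
{
	assert(c != NULL && i < TOY_CACHE_SLOTS);
	if (c->slot[i].cached)
		c->slot[i].referenced = true;
}

/*
 * One garbage-collection sweep. Entries used since the last sweep get
 * their flag cleared and survive; idle entries are evicted. Entries
 * marked nfsv4_open are skipped entirely. Returns the eviction count.
 */
size_t toy_gc_sweep(struct toy_cache *c)
{
	size_t evicted = 0;

	assert(c != NULL);
	for (size_t i = 0; i < TOY_CACHE_SLOTS; i++) {
		struct toy_file *f = &c->slot[i];

		if (!f->cached || f->nfsv4_open)
			continue;
		if (f->referenced) {
			f->referenced = false;	/* second chance */
			continue;
		}
		f->cached = false;		/* idle for a full interval */
		evicted++;
	}
	return evicted;
}
```

With periodic sweeps like this, an idle entry survives at most two
sweep intervals, which is one way to keep the list short and each
sweep productive.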
Starting with commit 0369b53886ec ("NFSD: Report filecache LRU
size"), have a look at the commits that touch fs/nfsd/filecache.c.
There are also a handful of important bug fixes that precede that
commit.
--
Chuck Lever