2022-02-24 01:30:44

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH v7 00/21] Readdir improvements

From: Trond Myklebust <[email protected]>

The current NFS readdir code will always try to maximise the amount of
readahead it performs on the assumption that we can cache anything that
isn't immediately read by the process.
There are several cases where this assumption breaks down, including
when the 'ls -l' heuristic kicks in to try to force use of readdirplus
as a batch replacement for lookup/getattr.

This series also implement Ben's page cache filter to ensure that we can
improve the ability to share cached data between processes that are
reading the same directory at the same time, and to avoid live-locks
when the directory is simultaneously changing.

--
v2: Remove reset of dtsize when NFS_INO_FORCE_READDIR is set
v3: Avoid excessive window shrinking in uncached_readdir case
v4: Track 'ls -l' cache hit/miss statistics
Improved algorithm for falling back to uncached readdir
Skip readdirplus when files are being written to
v5: bugfixes
Skip readdirplus when the acdirmax/acregmax values are low
Request a full XDR buffer when doing READDIRPLUS
v6: Add tracing
Don't have lookup request readdirplus when it won't help
v7: Implement Ben's page cache filter
Reduce the use of uncached readdir
Change indexing of the page cache to improve seekdir() performance.

Trond Myklebust (21):
NFS: constify nfs_server_capable() and nfs_have_writebacks()
NFS: Trace lookup revalidation failure
NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context
NFS: Calculate page offsets algorithmically
NFS: Store the change attribute in the directory page cache
NFS: If the cookie verifier changes, we must invalidate the page cache
NFS: Don't re-read the entire page cache to find the next cookie
NFS: Adjust the amount of readahead performed by NFS readdir
NFS: Simplify nfs_readdir_xdr_to_array()
NFS: Reduce use of uncached readdir
NFS: Improve heuristic for readdirplus
NFS: Don't ask for readdirplus unless it can help nfs_getattr()
NFSv4: Ask for a full XDR buffer of readdir goodness
NFS: Readdirplus can't help lookup for case insensitive filesystems
NFS: Don't request readdirplus when revalidation was forced
NFS: Add basic readdir tracing
NFS: Trace effects of readdirplus on the dcache
NFS: Trace effects of the readdirplus heuristic
NFS: Convert readdir page cache to use a cookie based index
NFS: Fix up forced readdirplus
NFS: Remove unnecessary cache invalidations for directories

fs/nfs/dir.c | 450 ++++++++++++++++++++++++-----------------
fs/nfs/inode.c | 46 ++---
fs/nfs/internal.h | 4 +-
fs/nfs/nfs3xdr.c | 7 +-
fs/nfs/nfs4proc.c | 2 -
fs/nfs/nfs4xdr.c | 6 +-
fs/nfs/nfstrace.h | 122 ++++++++++-
include/linux/nfs_fs.h | 19 +-
8 files changed, 421 insertions(+), 235 deletions(-)

--
2.35.1


2022-02-24 15:16:03

by David Wysochanski

[permalink] [raw]
Subject: Re: [PATCH v7 00/21] Readdir improvements

On Wed, Feb 23, 2022 at 4:24 PM <[email protected]> wrote:
>
> From: Trond Myklebust <[email protected]>
>
> The current NFS readdir code will always try to maximise the amount of
> readahead it performs on the assumption that we can cache anything that
> isn't immediately read by the process.
> There are several cases where this assumption breaks down, including
> when the 'ls -l' heuristic kicks in to try to force use of readdirplus
> as a batch replacement for lookup/getattr.
>
> This series also implement Ben's page cache filter to ensure that we can
> improve the ability to share cached data between processes that are
> reading the same directory at the same time, and to avoid live-locks
> when the directory is simultaneously changing.
>
> --
> v2: Remove reset of dtsize when NFS_INO_FORCE_READDIR is set
> v3: Avoid excessive window shrinking in uncached_readdir case
> v4: Track 'ls -l' cache hit/miss statistics
> Improved algorithm for falling back to uncached readdir
> Skip readdirplus when files are being written to
> v5: bugfixes
> Skip readdirplus when the acdirmax/acregmax values are low
> Request a full XDR buffer when doing READDIRPLUS
> v6: Add tracing
> Don't have lookup request readdirplus when it won't help
> v7: Implement Ben's page cache filter
> Reduce the use of uncached readdir
> Change indexing of the page cache to improve seekdir() performance.
>
> Trond Myklebust (21):
> NFS: constify nfs_server_capable() and nfs_have_writebacks()
> NFS: Trace lookup revalidation failure
> NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context
> NFS: Calculate page offsets algorithmically
> NFS: Store the change attribute in the directory page cache
> NFS: If the cookie verifier changes, we must invalidate the page cache
> NFS: Don't re-read the entire page cache to find the next cookie
> NFS: Adjust the amount of readahead performed by NFS readdir
> NFS: Simplify nfs_readdir_xdr_to_array()
> NFS: Reduce use of uncached readdir
> NFS: Improve heuristic for readdirplus
> NFS: Don't ask for readdirplus unless it can help nfs_getattr()
> NFSv4: Ask for a full XDR buffer of readdir goodness
> NFS: Readdirplus can't help lookup for case insensitive filesystems
> NFS: Don't request readdirplus when revalidation was forced
> NFS: Add basic readdir tracing
> NFS: Trace effects of readdirplus on the dcache
> NFS: Trace effects of the readdirplus heuristic
> NFS: Convert readdir page cache to use a cookie based index
> NFS: Fix up forced readdirplus
> NFS: Remove unnecessary cache invalidations for directories
>
> fs/nfs/dir.c | 450 ++++++++++++++++++++++++-----------------
> fs/nfs/inode.c | 46 ++---
> fs/nfs/internal.h | 4 +-
> fs/nfs/nfs3xdr.c | 7 +-
> fs/nfs/nfs4proc.c | 2 -
> fs/nfs/nfs4xdr.c | 6 +-
> fs/nfs/nfstrace.h | 122 ++++++++++-
> include/linux/nfs_fs.h | 19 +-
> 8 files changed, 421 insertions(+), 235 deletions(-)
>
> --
> 2.35.1
>

Trond I have been following your work here with periodic tests though
not fully following all the patches content. As you know this is a tricky
area and seems to be a hotspot area for customers that use NFS, with
many scenarios that may go wrong. Thanks for your work, which now
includes even some tracepoints and Ben's page cache filler.

This patchset seems to be the best of all the ones so far. My initial
tests (listings when modifying as well as idle directories) indicate
that the issue that Gonzalo reported on Jan 14th [1] looks to be fixed
by this set, but I'll let him confirm. I'll do some more testing and
let you know if there's anything else I find. If there're some
scenarios (mount options, servers, etc) you need more testing on, let
us know and we'll try to make that happen.

[1] [PATCH] NFS: limit block size reported for directories

2022-02-24 15:20:26

by David Wysochanski

[permalink] [raw]
Subject: Re: [PATCH v7 00/21] Readdir improvements

On Wed, Feb 23, 2022 at 4:24 PM <[email protected]> wrote:
>
> From: Trond Myklebust <[email protected]>
>
> The current NFS readdir code will always try to maximise the amount of
> readahead it performs on the assumption that we can cache anything that
> isn't immediately read by the process.
> There are several cases where this assumption breaks down, including
> when the 'ls -l' heuristic kicks in to try to force use of readdirplus
> as a batch replacement for lookup/getattr.
>
> This series also implement Ben's page cache filter to ensure that we can
> improve the ability to share cached data between processes that are
> reading the same directory at the same time, and to avoid live-locks
> when the directory is simultaneously changing.
>
> --
> v2: Remove reset of dtsize when NFS_INO_FORCE_READDIR is set
> v3: Avoid excessive window shrinking in uncached_readdir case
> v4: Track 'ls -l' cache hit/miss statistics
> Improved algorithm for falling back to uncached readdir
> Skip readdirplus when files are being written to
> v5: bugfixes
> Skip readdirplus when the acdirmax/acregmax values are low
> Request a full XDR buffer when doing READDIRPLUS
> v6: Add tracing
> Don't have lookup request readdirplus when it won't help
> v7: Implement Ben's page cache filter
> Reduce the use of uncached readdir
> Change indexing of the page cache to improve seekdir() performance.
>
> Trond Myklebust (21):
> NFS: constify nfs_server_capable() and nfs_have_writebacks()
> NFS: Trace lookup revalidation failure
> NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context
> NFS: Calculate page offsets algorithmically
> NFS: Store the change attribute in the directory page cache
> NFS: If the cookie verifier changes, we must invalidate the page cache
> NFS: Don't re-read the entire page cache to find the next cookie
> NFS: Adjust the amount of readahead performed by NFS readdir
> NFS: Simplify nfs_readdir_xdr_to_array()
> NFS: Reduce use of uncached readdir
> NFS: Improve heuristic for readdirplus
> NFS: Don't ask for readdirplus unless it can help nfs_getattr()
> NFSv4: Ask for a full XDR buffer of readdir goodness
> NFS: Readdirplus can't help lookup for case insensitive filesystems
> NFS: Don't request readdirplus when revalidation was forced
> NFS: Add basic readdir tracing
> NFS: Trace effects of readdirplus on the dcache
> NFS: Trace effects of the readdirplus heuristic
> NFS: Convert readdir page cache to use a cookie based index
> NFS: Fix up forced readdirplus
> NFS: Remove unnecessary cache invalidations for directories
>
> fs/nfs/dir.c | 450 ++++++++++++++++++++++++-----------------
> fs/nfs/inode.c | 46 ++---
> fs/nfs/internal.h | 4 +-
> fs/nfs/nfs3xdr.c | 7 +-
> fs/nfs/nfs4proc.c | 2 -
> fs/nfs/nfs4xdr.c | 6 +-
> fs/nfs/nfstrace.h | 122 ++++++++++-
> include/linux/nfs_fs.h | 19 +-
> 8 files changed, 421 insertions(+), 235 deletions(-)
>
> --
> 2.35.1
>

I don't see this pushed this to your 'testing' branch, but I applied
manually after resetting at 3f58e709c162

2022-02-25 12:15:11

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v7 00/21] Readdir improvements

On Thu, 2022-02-24 at 07:25 -0500, David Wysochanski wrote:
> On Wed, Feb 23, 2022 at 4:24 PM <[email protected]> wrote:
> >
> > From: Trond Myklebust <[email protected]>
> >
> > The current NFS readdir code will always try to maximise the amount
> > of
> > readahead it performs on the assumption that we can cache anything
> > that
> > isn't immediately read by the process.
> > There are several cases where this assumption breaks down,
> > including
> > when the 'ls -l' heuristic kicks in to try to force use of
> > readdirplus
> > as a batch replacement for lookup/getattr.
> >
> > This series also implement Ben's page cache filter to ensure that
> > we can
> > improve the ability to share cached data between processes that are
> > reading the same directory at the same time, and to avoid live-
> > locks
> > when the directory is simultaneously changing.
> >
> > --
> > v2: Remove reset of dtsize when NFS_INO_FORCE_READDIR is set
> > v3: Avoid excessive window shrinking in uncached_readdir case
> > v4: Track 'ls -l' cache hit/miss statistics
> >     Improved algorithm for falling back to uncached readdir
> >     Skip readdirplus when files are being written to
> > v5: bugfixes
> >     Skip readdirplus when the acdirmax/acregmax values are low
> >     Request a full XDR buffer when doing READDIRPLUS
> > v6: Add tracing
> >     Don't have lookup request readdirplus when it won't help
> > v7: Implement Ben's page cache filter
> >     Reduce the use of uncached readdir
> >     Change indexing of the page cache to improve seekdir()
> > performance.
> >
> > Trond Myklebust (21):
> >   NFS: constify nfs_server_capable() and nfs_have_writebacks()
> >   NFS: Trace lookup revalidation failure
> >   NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context
> >   NFS: Calculate page offsets algorithmically
> >   NFS: Store the change attribute in the directory page cache
> >   NFS: If the cookie verifier changes, we must invalidate the page
> > cache
> >   NFS: Don't re-read the entire page cache to find the next cookie
> >   NFS: Adjust the amount of readahead performed by NFS readdir
> >   NFS: Simplify nfs_readdir_xdr_to_array()
> >   NFS: Reduce use of uncached readdir
> >   NFS: Improve heuristic for readdirplus
> >   NFS: Don't ask for readdirplus unless it can help nfs_getattr()
> >   NFSv4: Ask for a full XDR buffer of readdir goodness
> >   NFS: Readdirplus can't help lookup for case insensitive
> > filesystems
> >   NFS: Don't request readdirplus when revalidation was forced
> >   NFS: Add basic readdir tracing
> >   NFS: Trace effects of readdirplus on the dcache
> >   NFS: Trace effects of the readdirplus heuristic
> >   NFS: Convert readdir page cache to use a cookie based index
> >   NFS: Fix up forced readdirplus
> >   NFS: Remove unnecessary cache invalidations for directories
> >
> >  fs/nfs/dir.c           | 450 ++++++++++++++++++++++++-------------
> > ----
> >  fs/nfs/inode.c         |  46 ++---
> >  fs/nfs/internal.h      |   4 +-
> >  fs/nfs/nfs3xdr.c       |   7 +-
> >  fs/nfs/nfs4proc.c      |   2 -
> >  fs/nfs/nfs4xdr.c       |   6 +-
> >  fs/nfs/nfstrace.h      | 122 ++++++++++-
> >  include/linux/nfs_fs.h |  19 +-
> >  8 files changed, 421 insertions(+), 235 deletions(-)
> >
> > --
> > 2.35.1
> >
>
> Trond I have been following your work here with periodic tests though
> not fully following all the patches content.  As you know this is a
> tricky
> area and seems to be a hotspot area for customers that use NFS, with
> many scenarios that may go wrong.  Thanks for your work, which now
> includes even some tracepoints and Ben's page cache filler.
>
> This patchset seems to be the best of all the ones so far.  My
> initial
> tests (listings when modifying as well as idle directories) indicate
> that the issue that Gonzalo reported on Jan 14th [1] looks to be
> fixed
> by this set, but I'll let him confirm.  I'll do some more testing and
> let you know if there's anything else I find.  If there're some
> scenarios (mount options, servers, etc) you need more testing on, let
> us know and we'll try to make that happen.
>
> [1] [PATCH] NFS: limit block size reported for directories
>

Thanks Dave! I very much appreciate your testing the patches. This is a
very complex area due to the interplay of readdir/readdirplus, the
dcache and the attribute cache, and so it is key to get as much
statistics as possible. Thanks again to you and Red Hat for
contributing to that effort.

Thanks also to Ben for his comments and reviews. His insights are key
to the progress we appear to be making.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]