2020-11-21 13:31:04

by David Wysochanski

[permalink] [raw]
Subject: [PATCH v1 0/13] Convert NFS to new netfs and fscache APIs

These patches update the NFS client to use the new netfs and fscache
APIs and are at:
https://github.com/DaveWysochanskiRH/kernel.git
https://github.com/DaveWysochanskiRH/kernel/commit/94e9633d98a5542ea384b1095290ac6f915fc917
https://github.com/DaveWysochanskiRH/kernel/releases/tag/fscache-iter-nfs-20201120

The patches are based on David Howells fscache-iter tree at
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter

The first 6 patches refactor some of the NFS read code to facilitate
re-use, the next 6 patches do the conversion to the new API, and the
last patch is a somewhat awkward fix for a problem seen in final
testing.

Per David Howell's recent post, note that the new fscache API is
divided into two separate APIs, a 'netfs' API and an 'fscache' API.
The netfs API was done to help simplify the IO paths of network
filesystems, and can be called even when fscache is disabled, thus
simplifing both readpage and readahead implementations. However,
for now these NFS conversion patches only call the netfs API when
fscache is enabled, similar to the existing NFS code.

Trond and Anna, I would appreciate your guidance on this patchset.
At least I would like your feedback regarding the direction
you would like these patches to go, in particular, the following
items:

1. Whether you are ok with using the netfs API unconditionally even
when fscache is disabled, or prefer this "least invasive to NFS"
approach. Note the unconditional use of the netfs API is the
recommended approach per David's post and the AFS and CEPH
implementations, but I was unsure if you would accept this
approach or would prefer to minimize changes to NFS. Note if
we keep the current approach to minimize NFS changes, we will
have to address some problems with page unlocking such as with
patch 13 in the series.

2. Whether to keep the NFS fscache implementation as "read only"
or if we add write through support. Today we only enable fscache
when a file is open read-only and disable / invalidate when a file
is open for write.

Still TODO
1. Address known issues (lockdep, page unlocking), depending on
what is decided as far as implementation direction
a) nfs_issue_op: takes rcu_read_lock but may calls nfs_page_alloc()
with GFP_KERNEL which may sleep (dhowells noted this in a review)
b) nfs_refresh_inode() takes inode->i_lock but may call
__fscache_invalidate() which may sleep (found with lockdep)
2. Fixup NFS fscache stats (NFSIOS_FSCACHE_*)
* Compare with netfs stats and determine if still needed
3. Cleanup dfprintks and/or convert to tracepoints
4. Further tests (see "Not tested yet")

Checks run
1. checkpatch: PASS*
* a few warnings, mostly trivial fixups, some unrelated to this set
2. kernel builds with each patch: PASS
* each patch in series built cleanly which ensure bisection

Tests run
1. Custom NFS+fscache unit tests for basic operation: PASS*
* no oops or data corruptions
* Some op counts are a bit off but these are mostly due
to statistics not implemented properly (NFSIOS_FSCACHE_*)
2. cthon04: PASS (test options "-b -g -s -l", fsc,vers=3,4.0,4.1,4.2,sec=sys)
* No failures or oopses for any version or test options
3. iozone tests (fsc,vers=3,4.0,4.1,4.2,sec=sys): PASS
* No failures or oopses
4. kernel build (fsc,vers=3,4.1,4.2): PASS*
* all builds finish without errors or data corruption
* one lockdep "scheduling while atomic" fired with NFS41 and
was due to one an fscache invalidation code path (known issue 'b' above)
5. xfstests/generic (fsc,vers=4.2, nofsc,vers=4.2): PASS*
* generic/013 (pass but triggers i_lock lockdep warning known issue 'a' above)
* NOTE: The following tests failed with errors, but they
also fail on vanilla 5.10-rc4 so are not related to this
patchset
* generic/074 (lockep invalid wait context - nfs_free_request())
* generic/538 (short read)
* generic/551 (pread: Unknown error 524, Data verification fail)
* generic/568 (ERROR: File grew from 4096 B to 8192 B when writing to the fallocated range)

Not tested yet:
* error injections (for example, connection disruptions, server errors during IO, etc)
* pNFS
* many process mixed read/write on same file
* performance
Dave Wysochanski (13):
NFS: Clean up nfs_readpage() and nfs_readpages()
NFS: In nfs_readpage() only increment NFSIOS_READPAGES when read
succeeds
NFS: Refactor nfs_readpage() and nfs_readpage_async() to use
nfs_readdesc
NFS: Call readpage_async_filler() from nfs_readpage_async()
NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()
NFS: Allow internal use of read structs and functions
NFS: Convert fscache_acquire_cookie and fscache_relinquish_cookie
NFS: Convert fscache_enable_cookie and fscache_disable_cookie
NFS: Convert fscache invalidation and update aux_data and i_size
NFS: Convert to the netfs API and nfs_readpage to use netfs_readpage
NFS: Convert readpage to readahead and use netfs_readahead for fscache
NFS: Allow NFS use of new fscache API in build
NFS: Ensure proper page unlocking when reads fail with retryable
errors

fs/nfs/Kconfig | 2 +-
fs/nfs/direct.c | 2 +
fs/nfs/file.c | 22 ++--
fs/nfs/fscache-index.c | 94 --------------
fs/nfs/fscache.c | 315 ++++++++++++++++++++-------------------------
fs/nfs/fscache.h | 132 +++++++------------
fs/nfs/inode.c | 4 +-
fs/nfs/internal.h | 8 ++
fs/nfs/nfs4proc.c | 2 +-
fs/nfs/pagelist.c | 2 +
fs/nfs/read.c | 248 ++++++++++++++++-------------------
fs/nfs/write.c | 3 +-
include/linux/nfs_fs.h | 5 +-
include/linux/nfs_iostat.h | 2 +-
include/linux/nfs_page.h | 1 +
include/linux/nfs_xdr.h | 1 +
16 files changed, 339 insertions(+), 504 deletions(-)

--
1.8.3.1


2020-11-21 16:19:27

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v1 0/13] Convert NFS to new netfs and fscache APIs

Hi Dave-

> On Nov 21, 2020, at 8:29 AM, Dave Wysochanski <[email protected]> wrote:
>
> These patches update the NFS client to use the new netfs and fscache
> APIs and are at:
> https://github.com/DaveWysochanskiRH/kernel.git
> https://github.com/DaveWysochanskiRH/kernel/commit/94e9633d98a5542ea384b1095290ac6f915fc917
> https://github.com/DaveWysochanskiRH/kernel/releases/tag/fscache-iter-nfs-20201120
>
> The patches are based on David Howells fscache-iter tree at
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
>
> The first 6 patches refactor some of the NFS read code to facilitate
> re-use, the next 6 patches do the conversion to the new API, and the
> last patch is a somewhat awkward fix for a problem seen in final
> testing.
>
> Per David Howell's recent post, note that the new fscache API is
> divided into two separate APIs, a 'netfs' API and an 'fscache' API.
> The netfs API was done to help simplify the IO paths of network
> filesystems, and can be called even when fscache is disabled, thus
> simplifing both readpage and readahead implementations. However,
> for now these NFS conversion patches only call the netfs API when
> fscache is enabled, similar to the existing NFS code.
>
> Trond and Anna, I would appreciate your guidance on this patchset.
> At least I would like your feedback regarding the direction
> you would like these patches to go, in particular, the following
> items:
>
> 1. Whether you are ok with using the netfs API unconditionally even
> when fscache is disabled, or prefer this "least invasive to NFS"
> approach. Note the unconditional use of the netfs API is the
> recommended approach per David's post and the AFS and CEPH
> implementations, but I was unsure if you would accept this
> approach or would prefer to minimize changes to NFS. Note if
> we keep the current approach to minimize NFS changes, we will
> have to address some problems with page unlocking such as with
> patch 13 in the series.
>
> 2. Whether to keep the NFS fscache implementation as "read only"
> or if we add write through support. Today we only enable fscache
> when a file is open read-only and disable / invalidate when a file
> is open for write.
>
> Still TODO
> 1. Address known issues (lockdep, page unlocking), depending on
> what is decided as far as implementation direction
> a) nfs_issue_op: takes rcu_read_lock but may calls nfs_page_alloc()
> with GFP_KERNEL which may sleep (dhowells noted this in a review)
> b) nfs_refresh_inode() takes inode->i_lock but may call
> __fscache_invalidate() which may sleep (found with lockdep)
> 2. Fixup NFS fscache stats (NFSIOS_FSCACHE_*)
> * Compare with netfs stats and determine if still needed
> 3. Cleanup dfprintks and/or convert to tracepoints
> 4. Further tests (see "Not tested yet")

Can you say whether your approach has any performance impact?
In particular, what comparative benchmarks have been run?


> Checks run
> 1. checkpatch: PASS*
> * a few warnings, mostly trivial fixups, some unrelated to this set
> 2. kernel builds with each patch: PASS
> * each patch in series built cleanly which ensure bisection
>
> Tests run
> 1. Custom NFS+fscache unit tests for basic operation: PASS*
> * no oops or data corruptions
> * Some op counts are a bit off but these are mostly due
> to statistics not implemented properly (NFSIOS_FSCACHE_*)
> 2. cthon04: PASS (test options "-b -g -s -l", fsc,vers=3,4.0,4.1,4.2,sec=sys)
> * No failures or oopses for any version or test options
> 3. iozone tests (fsc,vers=3,4.0,4.1,4.2,sec=sys): PASS
> * No failures or oopses
> 4. kernel build (fsc,vers=3,4.1,4.2): PASS*
> * all builds finish without errors or data corruption
> * one lockdep "scheduling while atomic" fired with NFS41 and
> was due to one an fscache invalidation code path (known issue 'b' above)
> 5. xfstests/generic (fsc,vers=4.2, nofsc,vers=4.2): PASS*
> * generic/013 (pass but triggers i_lock lockdep warning known issue 'a' above)
> * NOTE: The following tests failed with errors, but they
> also fail on vanilla 5.10-rc4 so are not related to this
> patchset
> * generic/074 (lockep invalid wait context - nfs_free_request())
> * generic/538 (short read)
> * generic/551 (pread: Unknown error 524, Data verification fail)
> * generic/568 (ERROR: File grew from 4096 B to 8192 B when writing to the fallocated range)
>
> Not tested yet:
> * error injections (for example, connection disruptions, server errors during IO, etc)
> * pNFS
> * many process mixed read/write on same file
> * performance
> Dave Wysochanski (13):
> NFS: Clean up nfs_readpage() and nfs_readpages()
> NFS: In nfs_readpage() only increment NFSIOS_READPAGES when read
> succeeds
> NFS: Refactor nfs_readpage() and nfs_readpage_async() to use
> nfs_readdesc
> NFS: Call readpage_async_filler() from nfs_readpage_async()
> NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()
> NFS: Allow internal use of read structs and functions
> NFS: Convert fscache_acquire_cookie and fscache_relinquish_cookie
> NFS: Convert fscache_enable_cookie and fscache_disable_cookie
> NFS: Convert fscache invalidation and update aux_data and i_size
> NFS: Convert to the netfs API and nfs_readpage to use netfs_readpage
> NFS: Convert readpage to readahead and use netfs_readahead for fscache
> NFS: Allow NFS use of new fscache API in build
> NFS: Ensure proper page unlocking when reads fail with retryable
> errors
>
> fs/nfs/Kconfig | 2 +-
> fs/nfs/direct.c | 2 +
> fs/nfs/file.c | 22 ++--
> fs/nfs/fscache-index.c | 94 --------------
> fs/nfs/fscache.c | 315 ++++++++++++++++++++-------------------------
> fs/nfs/fscache.h | 132 +++++++------------
> fs/nfs/inode.c | 4 +-
> fs/nfs/internal.h | 8 ++
> fs/nfs/nfs4proc.c | 2 +-
> fs/nfs/pagelist.c | 2 +
> fs/nfs/read.c | 248 ++++++++++++++++-------------------
> fs/nfs/write.c | 3 +-
> include/linux/nfs_fs.h | 5 +-
> include/linux/nfs_iostat.h | 2 +-
> include/linux/nfs_page.h | 1 +
> include/linux/nfs_xdr.h | 1 +
> 16 files changed, 339 insertions(+), 504 deletions(-)
>
> --
> 1.8.3.1
>

--
Chuck Lever



2020-11-21 17:03:58

by David Wysochanski

[permalink] [raw]
Subject: Re: [PATCH v1 0/13] Convert NFS to new netfs and fscache APIs

On Sat, Nov 21, 2020 at 11:14 AM Chuck Lever <[email protected]> wrote:
>
> Hi Dave-
>
> > On Nov 21, 2020, at 8:29 AM, Dave Wysochanski <[email protected]> wrote:
> >
> > These patches update the NFS client to use the new netfs and fscache
> > APIs and are at:
> > https://github.com/DaveWysochanskiRH/kernel.git
> > https://github.com/DaveWysochanskiRH/kernel/commit/94e9633d98a5542ea384b1095290ac6f915fc917
> > https://github.com/DaveWysochanskiRH/kernel/releases/tag/fscache-iter-nfs-20201120
> >
> > The patches are based on David Howells fscache-iter tree at
> > https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
> >
> > The first 6 patches refactor some of the NFS read code to facilitate
> > re-use, the next 6 patches do the conversion to the new API, and the
> > last patch is a somewhat awkward fix for a problem seen in final
> > testing.
> >
> > Per David Howell's recent post, note that the new fscache API is
> > divided into two separate APIs, a 'netfs' API and an 'fscache' API.
> > The netfs API was done to help simplify the IO paths of network
> > filesystems, and can be called even when fscache is disabled, thus
> > simplifing both readpage and readahead implementations. However,
> > for now these NFS conversion patches only call the netfs API when
> > fscache is enabled, similar to the existing NFS code.
> >
> > Trond and Anna, I would appreciate your guidance on this patchset.
> > At least I would like your feedback regarding the direction
> > you would like these patches to go, in particular, the following
> > items:
> >
> > 1. Whether you are ok with using the netfs API unconditionally even
> > when fscache is disabled, or prefer this "least invasive to NFS"
> > approach. Note the unconditional use of the netfs API is the
> > recommended approach per David's post and the AFS and CEPH
> > implementations, but I was unsure if you would accept this
> > approach or would prefer to minimize changes to NFS. Note if
> > we keep the current approach to minimize NFS changes, we will
> > have to address some problems with page unlocking such as with
> > patch 13 in the series.
> >
> > 2. Whether to keep the NFS fscache implementation as "read only"
> > or if we add write through support. Today we only enable fscache
> > when a file is open read-only and disable / invalidate when a file
> > is open for write.
> >
> > Still TODO
> > 1. Address known issues (lockdep, page unlocking), depending on
> > what is decided as far as implementation direction
> > a) nfs_issue_op: takes rcu_read_lock but may calls nfs_page_alloc()
> > with GFP_KERNEL which may sleep (dhowells noted this in a review)
> > b) nfs_refresh_inode() takes inode->i_lock but may call
> > __fscache_invalidate() which may sleep (found with lockdep)
> > 2. Fixup NFS fscache stats (NFSIOS_FSCACHE_*)
> > * Compare with netfs stats and determine if still needed
> > 3. Cleanup dfprintks and/or convert to tracepoints
> > 4. Further tests (see "Not tested yet")
>
> Can you say whether your approach has any performance impact?
No I cannot.

> In particular, what comparative benchmarks have been run?
>
No comparisons so far, but note the last bullet - "performance".

Are you wondering about performance with/without fscache for this
series, or the old vs new fscache, or something else?


>
> > Checks run
> > 1. checkpatch: PASS*
> > * a few warnings, mostly trivial fixups, some unrelated to this set
> > 2. kernel builds with each patch: PASS
> > * each patch in series built cleanly which ensure bisection
> >
> > Tests run
> > 1. Custom NFS+fscache unit tests for basic operation: PASS*
> > * no oops or data corruptions
> > * Some op counts are a bit off but these are mostly due
> > to statistics not implemented properly (NFSIOS_FSCACHE_*)
> > 2. cthon04: PASS (test options "-b -g -s -l", fsc,vers=3,4.0,4.1,4.2,sec=sys)
> > * No failures or oopses for any version or test options
> > 3. iozone tests (fsc,vers=3,4.0,4.1,4.2,sec=sys): PASS
> > * No failures or oopses
> > 4. kernel build (fsc,vers=3,4.1,4.2): PASS*
> > * all builds finish without errors or data corruption
> > * one lockdep "scheduling while atomic" fired with NFS41 and
> > was due to one an fscache invalidation code path (known issue 'b' above)
> > 5. xfstests/generic (fsc,vers=4.2, nofsc,vers=4.2): PASS*
> > * generic/013 (pass but triggers i_lock lockdep warning known issue 'a' above)
> > * NOTE: The following tests failed with errors, but they
> > also fail on vanilla 5.10-rc4 so are not related to this
> > patchset
> > * generic/074 (lockep invalid wait context - nfs_free_request())
> > * generic/538 (short read)
> > * generic/551 (pread: Unknown error 524, Data verification fail)
> > * generic/568 (ERROR: File grew from 4096 B to 8192 B when writing to the fallocated range)
> >
> > Not tested yet:
> > * error injections (for example, connection disruptions, server errors during IO, etc)
> > * pNFS
> > * many process mixed read/write on same file
> > * performance
> > Dave Wysochanski (13):
> > NFS: Clean up nfs_readpage() and nfs_readpages()
> > NFS: In nfs_readpage() only increment NFSIOS_READPAGES when read
> > succeeds
> > NFS: Refactor nfs_readpage() and nfs_readpage_async() to use
> > nfs_readdesc
> > NFS: Call readpage_async_filler() from nfs_readpage_async()
> > NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()
> > NFS: Allow internal use of read structs and functions
> > NFS: Convert fscache_acquire_cookie and fscache_relinquish_cookie
> > NFS: Convert fscache_enable_cookie and fscache_disable_cookie
> > NFS: Convert fscache invalidation and update aux_data and i_size
> > NFS: Convert to the netfs API and nfs_readpage to use netfs_readpage
> > NFS: Convert readpage to readahead and use netfs_readahead for fscache
> > NFS: Allow NFS use of new fscache API in build
> > NFS: Ensure proper page unlocking when reads fail with retryable
> > errors
> >
> > fs/nfs/Kconfig | 2 +-
> > fs/nfs/direct.c | 2 +
> > fs/nfs/file.c | 22 ++--
> > fs/nfs/fscache-index.c | 94 --------------
> > fs/nfs/fscache.c | 315 ++++++++++++++++++++-------------------------
> > fs/nfs/fscache.h | 132 +++++++------------
> > fs/nfs/inode.c | 4 +-
> > fs/nfs/internal.h | 8 ++
> > fs/nfs/nfs4proc.c | 2 +-
> > fs/nfs/pagelist.c | 2 +
> > fs/nfs/read.c | 248 ++++++++++++++++-------------------
> > fs/nfs/write.c | 3 +-
> > include/linux/nfs_fs.h | 5 +-
> > include/linux/nfs_iostat.h | 2 +-
> > include/linux/nfs_page.h | 1 +
> > include/linux/nfs_xdr.h | 1 +
> > 16 files changed, 339 insertions(+), 504 deletions(-)
> >
> > --
> > 1.8.3.1
> >
>
> --
> Chuck Lever
>
>
>

2020-11-21 17:17:38

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v1 0/13] Convert NFS to new netfs and fscache APIs



> On Nov 21, 2020, at 12:01 PM, David Wysochanski <[email protected]> wrote:
>
> On Sat, Nov 21, 2020 at 11:14 AM Chuck Lever <[email protected]> wrote:
>>
>> Hi Dave-
>>
>>> On Nov 21, 2020, at 8:29 AM, Dave Wysochanski <[email protected]> wrote:
>>>
>>> These patches update the NFS client to use the new netfs and fscache
>>> APIs and are at:
>>> https://github.com/DaveWysochanskiRH/kernel.git
>>> https://github.com/DaveWysochanskiRH/kernel/commit/94e9633d98a5542ea384b1095290ac6f915fc917
>>> https://github.com/DaveWysochanskiRH/kernel/releases/tag/fscache-iter-nfs-20201120
>>>
>>> The patches are based on David Howells fscache-iter tree at
>>> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
>>>
>>> The first 6 patches refactor some of the NFS read code to facilitate
>>> re-use, the next 6 patches do the conversion to the new API, and the
>>> last patch is a somewhat awkward fix for a problem seen in final
>>> testing.
>>>
>>> Per David Howell's recent post, note that the new fscache API is
>>> divided into two separate APIs, a 'netfs' API and an 'fscache' API.
>>> The netfs API was done to help simplify the IO paths of network
>>> filesystems, and can be called even when fscache is disabled, thus
>>> simplifing both readpage and readahead implementations. However,
>>> for now these NFS conversion patches only call the netfs API when
>>> fscache is enabled, similar to the existing NFS code.
>>>
>>> Trond and Anna, I would appreciate your guidance on this patchset.
>>> At least I would like your feedback regarding the direction
>>> you would like these patches to go, in particular, the following
>>> items:
>>>
>>> 1. Whether you are ok with using the netfs API unconditionally even
>>> when fscache is disabled, or prefer this "least invasive to NFS"
>>> approach. Note the unconditional use of the netfs API is the
>>> recommended approach per David's post and the AFS and CEPH
>>> implementations, but I was unsure if you would accept this
>>> approach or would prefer to minimize changes to NFS. Note if
>>> we keep the current approach to minimize NFS changes, we will
>>> have to address some problems with page unlocking such as with
>>> patch 13 in the series.
>>>
>>> 2. Whether to keep the NFS fscache implementation as "read only"
>>> or if we add write through support. Today we only enable fscache
>>> when a file is open read-only and disable / invalidate when a file
>>> is open for write.
>>>
>>> Still TODO
>>> 1. Address known issues (lockdep, page unlocking), depending on
>>> what is decided as far as implementation direction
>>> a) nfs_issue_op: takes rcu_read_lock but may calls nfs_page_alloc()
>>> with GFP_KERNEL which may sleep (dhowells noted this in a review)
>>> b) nfs_refresh_inode() takes inode->i_lock but may call
>>> __fscache_invalidate() which may sleep (found with lockdep)
>>> 2. Fixup NFS fscache stats (NFSIOS_FSCACHE_*)
>>> * Compare with netfs stats and determine if still needed
>>> 3. Cleanup dfprintks and/or convert to tracepoints
>>> 4. Further tests (see "Not tested yet")
>>
>> Can you say whether your approach has any performance impact?
> No I cannot.
>
>> In particular, what comparative benchmarks have been run?
>>
> No comparisons so far, but note the last bullet - "performance".
>
> Are you wondering about performance with/without fscache for this
> series, or the old vs new fscache, or something else?

I'd like to have some explicit performance-related merge worthiness
criteria. For example: "No performance regressions, and here's how
we're going to determine that we're good: fio / iozone / yada with
NFS/TCP and NFS/RDMA on 100GbE; for very little additional CPU
cost measured via perf xyzzy. Also some benchmark that measures lock
contention."

We haven't been especially careful about this in the past when
reworking the client's primary I/O paths. Nothing unreasonable, but
it should be stated up front where we want to end up.

Another approach might be: we're going to start by making fscache
opt-in. As confidence increases over time and good performance is
demonstrated, then we'll unify the fscache and non-cached I/O paths.


>>> Checks run
>>> 1. checkpatch: PASS*
>>> * a few warnings, mostly trivial fixups, some unrelated to this set
>>> 2. kernel builds with each patch: PASS
>>> * each patch in series built cleanly which ensure bisection
>>>
>>> Tests run
>>> 1. Custom NFS+fscache unit tests for basic operation: PASS*
>>> * no oops or data corruptions
>>> * Some op counts are a bit off but these are mostly due
>>> to statistics not implemented properly (NFSIOS_FSCACHE_*)
>>> 2. cthon04: PASS (test options "-b -g -s -l", fsc,vers=3,4.0,4.1,4.2,sec=sys)
>>> * No failures or oopses for any version or test options
>>> 3. iozone tests (fsc,vers=3,4.0,4.1,4.2,sec=sys): PASS
>>> * No failures or oopses
>>> 4. kernel build (fsc,vers=3,4.1,4.2): PASS*
>>> * all builds finish without errors or data corruption
>>> * one lockdep "scheduling while atomic" fired with NFS41 and
>>> was due to one an fscache invalidation code path (known issue 'b' above)
>>> 5. xfstests/generic (fsc,vers=4.2, nofsc,vers=4.2): PASS*
>>> * generic/013 (pass but triggers i_lock lockdep warning known issue 'a' above)
>>> * NOTE: The following tests failed with errors, but they
>>> also fail on vanilla 5.10-rc4 so are not related to this
>>> patchset
>>> * generic/074 (lockep invalid wait context - nfs_free_request())
>>> * generic/538 (short read)
>>> * generic/551 (pread: Unknown error 524, Data verification fail)
>>> * generic/568 (ERROR: File grew from 4096 B to 8192 B when writing to the fallocated range)
>>>
>>> Not tested yet:
>>> * error injections (for example, connection disruptions, server errors during IO, etc)
>>> * pNFS
>>> * many process mixed read/write on same file
>>> * performance
>>> Dave Wysochanski (13):
>>> NFS: Clean up nfs_readpage() and nfs_readpages()
>>> NFS: In nfs_readpage() only increment NFSIOS_READPAGES when read
>>> succeeds
>>> NFS: Refactor nfs_readpage() and nfs_readpage_async() to use
>>> nfs_readdesc
>>> NFS: Call readpage_async_filler() from nfs_readpage_async()
>>> NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()
>>> NFS: Allow internal use of read structs and functions
>>> NFS: Convert fscache_acquire_cookie and fscache_relinquish_cookie
>>> NFS: Convert fscache_enable_cookie and fscache_disable_cookie
>>> NFS: Convert fscache invalidation and update aux_data and i_size
>>> NFS: Convert to the netfs API and nfs_readpage to use netfs_readpage
>>> NFS: Convert readpage to readahead and use netfs_readahead for fscache
>>> NFS: Allow NFS use of new fscache API in build
>>> NFS: Ensure proper page unlocking when reads fail with retryable
>>> errors
>>>
>>> fs/nfs/Kconfig | 2 +-
>>> fs/nfs/direct.c | 2 +
>>> fs/nfs/file.c | 22 ++--
>>> fs/nfs/fscache-index.c | 94 --------------
>>> fs/nfs/fscache.c | 315 ++++++++++++++++++++-------------------------
>>> fs/nfs/fscache.h | 132 +++++++------------
>>> fs/nfs/inode.c | 4 +-
>>> fs/nfs/internal.h | 8 ++
>>> fs/nfs/nfs4proc.c | 2 +-
>>> fs/nfs/pagelist.c | 2 +
>>> fs/nfs/read.c | 248 ++++++++++++++++-------------------
>>> fs/nfs/write.c | 3 +-
>>> include/linux/nfs_fs.h | 5 +-
>>> include/linux/nfs_iostat.h | 2 +-
>>> include/linux/nfs_page.h | 1 +
>>> include/linux/nfs_xdr.h | 1 +
>>> 16 files changed, 339 insertions(+), 504 deletions(-)
>>>
>>> --
>>> 1.8.3.1
>>>
>>
>> --
>> Chuck Lever

--
Chuck Lever



2020-11-21 18:30:54

by David Wysochanski

[permalink] [raw]
Subject: Re: [PATCH v1 0/13] Convert NFS to new netfs and fscache APIs

On Sat, Nov 21, 2020 at 12:16 PM Chuck Lever <[email protected]> wrote:
>
>
>
> > On Nov 21, 2020, at 12:01 PM, David Wysochanski <[email protected]> wrote:
> >
> > On Sat, Nov 21, 2020 at 11:14 AM Chuck Lever <[email protected]> wrote:
> >>
> >> Hi Dave-
> >>
> >>> On Nov 21, 2020, at 8:29 AM, Dave Wysochanski <[email protected]> wrote:
> >>>
> >>> These patches update the NFS client to use the new netfs and fscache
> >>> APIs and are at:
> >>> https://github.com/DaveWysochanskiRH/kernel.git
> >>> https://github.com/DaveWysochanskiRH/kernel/commit/94e9633d98a5542ea384b1095290ac6f915fc917
> >>> https://github.com/DaveWysochanskiRH/kernel/releases/tag/fscache-iter-nfs-20201120
> >>>
> >>> The patches are based on David Howells fscache-iter tree at
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
> >>>
> >>> The first 6 patches refactor some of the NFS read code to facilitate
> >>> re-use, the next 6 patches do the conversion to the new API, and the
> >>> last patch is a somewhat awkward fix for a problem seen in final
> >>> testing.
> >>>
> >>> Per David Howell's recent post, note that the new fscache API is
> >>> divided into two separate APIs, a 'netfs' API and an 'fscache' API.
> >>> The netfs API was done to help simplify the IO paths of network
> >>> filesystems, and can be called even when fscache is disabled, thus
> >>> simplifing both readpage and readahead implementations. However,
> >>> for now these NFS conversion patches only call the netfs API when
> >>> fscache is enabled, similar to the existing NFS code.
> >>>
> >>> Trond and Anna, I would appreciate your guidance on this patchset.
> >>> At least I would like your feedback regarding the direction
> >>> you would like these patches to go, in particular, the following
> >>> items:
> >>>
> >>> 1. Whether you are ok with using the netfs API unconditionally even
> >>> when fscache is disabled, or prefer this "least invasive to NFS"
> >>> approach. Note the unconditional use of the netfs API is the
> >>> recommended approach per David's post and the AFS and CEPH
> >>> implementations, but I was unsure if you would accept this
> >>> approach or would prefer to minimize changes to NFS. Note if
> >>> we keep the current approach to minimize NFS changes, we will
> >>> have to address some problems with page unlocking such as with
> >>> patch 13 in the series.
> >>>
> >>> 2. Whether to keep the NFS fscache implementation as "read only"
> >>> or if we add write through support. Today we only enable fscache
> >>> when a file is open read-only and disable / invalidate when a file
> >>> is open for write.
> >>>
> >>> Still TODO
> >>> 1. Address known issues (lockdep, page unlocking), depending on
> >>> what is decided as far as implementation direction
> >>> a) nfs_issue_op: takes rcu_read_lock but may calls nfs_page_alloc()
> >>> with GFP_KERNEL which may sleep (dhowells noted this in a review)
> >>> b) nfs_refresh_inode() takes inode->i_lock but may call
> >>> __fscache_invalidate() which may sleep (found with lockdep)
> >>> 2. Fixup NFS fscache stats (NFSIOS_FSCACHE_*)
> >>> * Compare with netfs stats and determine if still needed
> >>> 3. Cleanup dfprintks and/or convert to tracepoints
> >>> 4. Further tests (see "Not tested yet")
> >>
> >> Can you say whether your approach has any performance impact?
> > No I cannot.
> >
> >> In particular, what comparative benchmarks have been run?
> >>
> > No comparisons so far, but note the last bullet - "performance".
> >
> > Are you wondering about performance with/without fscache for this
> > series, or the old vs new fscache, or something else?
>
> I'd like to have some explicit performance-related merge worthiness
> criteria. For example: "No performance regressions, and here's how
> we're going to determine that we're good: fio / iozone / yada with
> NFS/TCP and NFS/RDMA on 100GbE; for very little additional CPU
> cost measured via perf xyzzy. Also some benchmark that measures lock
> contention."
>
> We haven't been especially careful about this in the past when
> reworking the client's primary I/O paths. Nothing unreasonable, but
> it should be stated up front where we want to end up.
>
Makes sense.

> Another approach might be: we're going to start by making fscache
> opt-in. As confidence increases over time and good performance is
> demonstrated, then we'll unify the fscache and non-cached I/O paths.
>
It sounds like what you want is what I've done in this first implementation.
This implementation takes a "least invasive to NFS" approach, staying
with the old fscache on/off logic, even though this was not ideal or what
was recommended as the end game for the netfs API.

>
> >>> Checks run
> >>> 1. checkpatch: PASS*
> >>> * a few warnings, mostly trivial fixups, some unrelated to this set
> >>> 2. kernel builds with each patch: PASS
> >>> * each patch in series built cleanly which ensure bisection
> >>>
> >>> Tests run
> >>> 1. Custom NFS+fscache unit tests for basic operation: PASS*
> >>> * no oops or data corruptions
> >>> * Some op counts are a bit off but these are mostly due
> >>> to statistics not implemented properly (NFSIOS_FSCACHE_*)
> >>> 2. cthon04: PASS (test options "-b -g -s -l", fsc,vers=3,4.0,4.1,4.2,sec=sys)
> >>> * No failures or oopses for any version or test options
> >>> 3. iozone tests (fsc,vers=3,4.0,4.1,4.2,sec=sys): PASS
> >>> * No failures or oopses
> >>> 4. kernel build (fsc,vers=3,4.1,4.2): PASS*
> >>> * all builds finish without errors or data corruption
> >>> * one lockdep "scheduling while atomic" fired with NFS41 and
> >>> was due to one an fscache invalidation code path (known issue 'b' above)
> >>> 5. xfstests/generic (fsc,vers=4.2, nofsc,vers=4.2): PASS*
> >>> * generic/013 (pass but triggers i_lock lockdep warning known issue 'a' above)
> >>> * NOTE: The following tests failed with errors, but they
> >>> also fail on vanilla 5.10-rc4 so are not related to this
> >>> patchset
> >>> * generic/074 (lockep invalid wait context - nfs_free_request())
> >>> * generic/538 (short read)
> >>> * generic/551 (pread: Unknown error 524, Data verification fail)
> >>> * generic/568 (ERROR: File grew from 4096 B to 8192 B when writing to the fallocated range)
> >>>
> >>> Not tested yet:
> >>> * error injections (for example, connection disruptions, server errors during IO, etc)
> >>> * pNFS
> >>> * many process mixed read/write on same file
> >>> * performance
> >>> Dave Wysochanski (13):
> >>> NFS: Clean up nfs_readpage() and nfs_readpages()
> >>> NFS: In nfs_readpage() only increment NFSIOS_READPAGES when read
> >>> succeeds
> >>> NFS: Refactor nfs_readpage() and nfs_readpage_async() to use
> >>> nfs_readdesc
> >>> NFS: Call readpage_async_filler() from nfs_readpage_async()
> >>> NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()
> >>> NFS: Allow internal use of read structs and functions
> >>> NFS: Convert fscache_acquire_cookie and fscache_relinquish_cookie
> >>> NFS: Convert fscache_enable_cookie and fscache_disable_cookie
> >>> NFS: Convert fscache invalidation and update aux_data and i_size
> >>> NFS: Convert to the netfs API and nfs_readpage to use netfs_readpage
> >>> NFS: Convert readpage to readahead and use netfs_readahead for fscache
> >>> NFS: Allow NFS use of new fscache API in build
> >>> NFS: Ensure proper page unlocking when reads fail with retryable
> >>> errors
> >>>
> >>> fs/nfs/Kconfig | 2 +-
> >>> fs/nfs/direct.c | 2 +
> >>> fs/nfs/file.c | 22 ++--
> >>> fs/nfs/fscache-index.c | 94 --------------
> >>> fs/nfs/fscache.c | 315 ++++++++++++++++++++-------------------------
> >>> fs/nfs/fscache.h | 132 +++++++------------
> >>> fs/nfs/inode.c | 4 +-
> >>> fs/nfs/internal.h | 8 ++
> >>> fs/nfs/nfs4proc.c | 2 +-
> >>> fs/nfs/pagelist.c | 2 +
> >>> fs/nfs/read.c | 248 ++++++++++++++++-------------------
> >>> fs/nfs/write.c | 3 +-
> >>> include/linux/nfs_fs.h | 5 +-
> >>> include/linux/nfs_iostat.h | 2 +-
> >>> include/linux/nfs_page.h | 1 +
> >>> include/linux/nfs_xdr.h | 1 +
> >>> 16 files changed, 339 insertions(+), 504 deletions(-)
> >>>
> >>> --
> >>> 1.8.3.1
> >>>
> >>
> >> --
> >> Chuck Lever
>
> --
> Chuck Lever
>
>
>

2020-11-21 18:51:17

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v1 0/13] Convert NFS to new netfs and fscache APIs



> On Nov 21, 2020, at 1:28 PM, David Wysochanski <[email protected]> wrote:
>
> On Sat, Nov 21, 2020 at 12:16 PM Chuck Lever <[email protected]> wrote:
>>
>>
>>
>>> On Nov 21, 2020, at 12:01 PM, David Wysochanski <[email protected]> wrote:
>>>
>>> On Sat, Nov 21, 2020 at 11:14 AM Chuck Lever <[email protected]> wrote:
>>>>
>>>> Hi Dave-
>>>>
>>>>> On Nov 21, 2020, at 8:29 AM, Dave Wysochanski <[email protected]> wrote:
>>>>>
>>>>> These patches update the NFS client to use the new netfs and fscache
>>>>> APIs and are at:
>>>>> https://github.com/DaveWysochanskiRH/kernel.git
>>>>> https://github.com/DaveWysochanskiRH/kernel/commit/94e9633d98a5542ea384b1095290ac6f915fc917
>>>>> https://github.com/DaveWysochanskiRH/kernel/releases/tag/fscache-iter-nfs-20201120
>>>>>
>>>>> The patches are based on David Howells fscache-iter tree at
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
>>>>>
>>>>> The first 6 patches refactor some of the NFS read code to facilitate
>>>>> re-use, the next 6 patches do the conversion to the new API, and the
>>>>> last patch is a somewhat awkward fix for a problem seen in final
>>>>> testing.
>>>>>
>>>>> Per David Howell's recent post, note that the new fscache API is
>>>>> divided into two separate APIs, a 'netfs' API and an 'fscache' API.
>>>>> The netfs API was done to help simplify the IO paths of network
>>>>> filesystems, and can be called even when fscache is disabled, thus
>>>>> simplifing both readpage and readahead implementations. However,
>>>>> for now these NFS conversion patches only call the netfs API when
>>>>> fscache is enabled, similar to the existing NFS code.
>>>>>
>>>>> Trond and Anna, I would appreciate your guidance on this patchset.
>>>>> At least I would like your feedback regarding the direction
>>>>> you would like these patches to go, in particular, the following
>>>>> items:
>>>>>
>>>>> 1. Whether you are ok with using the netfs API unconditionally even
>>>>> when fscache is disabled, or prefer this "least invasive to NFS"
>>>>> approach. Note the unconditional use of the netfs API is the
>>>>> recommended approach per David's post and the AFS and CEPH
>>>>> implementations, but I was unsure if you would accept this
>>>>> approach or would prefer to minimize changes to NFS. Note if
>>>>> we keep the current approach to minimize NFS changes, we will
>>>>> have to address some problems with page unlocking such as with
>>>>> patch 13 in the series.
>>>>>
>>>>> 2. Whether to keep the NFS fscache implementation as "read only"
>>>>> or if we add write through support. Today we only enable fscache
>>>>> when a file is open read-only and disable / invalidate when a file
>>>>> is open for write.
>>>>>
>>>>> Still TODO
>>>>> 1. Address known issues (lockdep, page unlocking), depending on
>>>>> what is decided as far as implementation direction
>>>>> a) nfs_issue_op: takes rcu_read_lock but may calls nfs_page_alloc()
>>>>> with GFP_KERNEL which may sleep (dhowells noted this in a review)
>>>>> b) nfs_refresh_inode() takes inode->i_lock but may call
>>>>> __fscache_invalidate() which may sleep (found with lockdep)
>>>>> 2. Fixup NFS fscache stats (NFSIOS_FSCACHE_*)
>>>>> * Compare with netfs stats and determine if still needed
>>>>> 3. Cleanup dfprintks and/or convert to tracepoints
>>>>> 4. Further tests (see "Not tested yet")
>>>>
>>>> Can you say whether your approach has any performance impact?
>>> No I cannot.
>>>
>>>> In particular, what comparative benchmarks have been run?
>>>>
>>> No comparisons so far, but note the last bullet - "performance".
>>>
>>> Are you wondering about performance with/without fscache for this
>>> series, or the old vs new fscache, or something else?
>>
>> I'd like to have some explicit performance-related merge worthiness
>> criteria. For example: "No performance regressions, and here's how
>> we're going to determine that we're good: fio / iozone / yada with
>> NFS/TCP and NFS/RDMA on 100GbE; for very little additional CPU
>> cost measured via perf xyzzy. Also some benchmark that measures lock
>> contention."
>>
>> We haven't been especially careful about this in the past when
>> reworking the client's primary I/O paths. Nothing unreasonable, but
>> it should be stated up front where we want to end up.
>>
> Makes sense.
>
>> Another approach might be: we're going to start by making fscache
>> opt-in. As confidence increases over time and good performance is
>> demonstrated, then we'll unify the fscache and non-cached I/O paths.
>>
> It sounds like what you want is what I've done in this first implementation.

My understanding is that you haven't decided whether to take the
opt-in approach or to convert the primary I/O paths right now.


> This implementation takes a "least invasive to NFS" approach, staying
> with the old fscache on/off logic, even though this was not ideal or what
> was recommended as the end game for the netfs API.

I'm not taking a position on your two approaches, but I would like
that when the time comes to take the approach that involves full
integration, we should have an agreed-upon set of performance
goals. I don't want that integration to cause our high performance
environments (NFS/RDMA, for instance) to lose out from that
integration.

Btw, it isn't clear yet that we need to use the fscache APIs to
introduce proper huge page support. One of our current efforts is
to convert xdr_buf from the use of an array of struct page pointers
to an array of struct bio_vec pointers. In that case each entry in
the array carries the size of the thing that the entry points to,
which today is always PAGE_SIZE, but in the future could be much
larger.


>>>>> Checks run
>>>>> 1. checkpatch: PASS*
>>>>> * a few warnings, mostly trivial fixups, some unrelated to this set
>>>>> 2. kernel builds with each patch: PASS
>>>>> * each patch in series built cleanly which ensure bisection
>>>>>
>>>>> Tests run
>>>>> 1. Custom NFS+fscache unit tests for basic operation: PASS*
>>>>> * no oops or data corruptions
>>>>> * Some op counts are a bit off but these are mostly due
>>>>> to statistics not implemented properly (NFSIOS_FSCACHE_*)
>>>>> 2. cthon04: PASS (test options "-b -g -s -l", fsc,vers=3,4.0,4.1,4.2,sec=sys)
>>>>> * No failures or oopses for any version or test options
>>>>> 3. iozone tests (fsc,vers=3,4.0,4.1,4.2,sec=sys): PASS
>>>>> * No failures or oopses
>>>>> 4. kernel build (fsc,vers=3,4.1,4.2): PASS*
>>>>> * all builds finish without errors or data corruption
>>>>> * one lockdep "scheduling while atomic" fired with NFS41 and
>>>>> was due to one an fscache invalidation code path (known issue 'b' above)
>>>>> 5. xfstests/generic (fsc,vers=4.2, nofsc,vers=4.2): PASS*
>>>>> * generic/013 (pass but triggers i_lock lockdep warning known issue 'a' above)
>>>>> * NOTE: The following tests failed with errors, but they
>>>>> also fail on vanilla 5.10-rc4 so are not related to this
>>>>> patchset
>>>>> * generic/074 (lockep invalid wait context - nfs_free_request())
>>>>> * generic/538 (short read)
>>>>> * generic/551 (pread: Unknown error 524, Data verification fail)
>>>>> * generic/568 (ERROR: File grew from 4096 B to 8192 B when writing to the fallocated range)
>>>>>
>>>>> Not tested yet:
>>>>> * error injections (for example, connection disruptions, server errors during IO, etc)
>>>>> * pNFS
>>>>> * many process mixed read/write on same file
>>>>> * performance
>>>>> Dave Wysochanski (13):
>>>>> NFS: Clean up nfs_readpage() and nfs_readpages()
>>>>> NFS: In nfs_readpage() only increment NFSIOS_READPAGES when read
>>>>> succeeds
>>>>> NFS: Refactor nfs_readpage() and nfs_readpage_async() to use
>>>>> nfs_readdesc
>>>>> NFS: Call readpage_async_filler() from nfs_readpage_async()
>>>>> NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()
>>>>> NFS: Allow internal use of read structs and functions
>>>>> NFS: Convert fscache_acquire_cookie and fscache_relinquish_cookie
>>>>> NFS: Convert fscache_enable_cookie and fscache_disable_cookie
>>>>> NFS: Convert fscache invalidation and update aux_data and i_size
>>>>> NFS: Convert to the netfs API and nfs_readpage to use netfs_readpage
>>>>> NFS: Convert readpage to readahead and use netfs_readahead for fscache
>>>>> NFS: Allow NFS use of new fscache API in build
>>>>> NFS: Ensure proper page unlocking when reads fail with retryable
>>>>> errors
>>>>>
>>>>> fs/nfs/Kconfig | 2 +-
>>>>> fs/nfs/direct.c | 2 +
>>>>> fs/nfs/file.c | 22 ++--
>>>>> fs/nfs/fscache-index.c | 94 --------------
>>>>> fs/nfs/fscache.c | 315 ++++++++++++++++++++-------------------------
>>>>> fs/nfs/fscache.h | 132 +++++++------------
>>>>> fs/nfs/inode.c | 4 +-
>>>>> fs/nfs/internal.h | 8 ++
>>>>> fs/nfs/nfs4proc.c | 2 +-
>>>>> fs/nfs/pagelist.c | 2 +
>>>>> fs/nfs/read.c | 248 ++++++++++++++++-------------------
>>>>> fs/nfs/write.c | 3 +-
>>>>> include/linux/nfs_fs.h | 5 +-
>>>>> include/linux/nfs_iostat.h | 2 +-
>>>>> include/linux/nfs_page.h | 1 +
>>>>> include/linux/nfs_xdr.h | 1 +
>>>>> 16 files changed, 339 insertions(+), 504 deletions(-)
>>>>>
>>>>> --
>>>>> 1.8.3.1
>>>>>
>>>>
>>>> --
>>>> Chuck Lever
>>
>> --
>> Chuck Lever

--
Chuck Lever



2020-11-30 13:09:48

by David Wysochanski

[permalink] [raw]
Subject: Re: [PATCH v1 0/13] Convert NFS to new netfs and fscache APIs

On Sat, Nov 21, 2020 at 8:29 AM Dave Wysochanski <[email protected]> wrote:
>
> These patches update the NFS client to use the new netfs and fscache
> APIs and are at:
> https://github.com/DaveWysochanskiRH/kernel.git
> https://github.com/DaveWysochanskiRH/kernel/commit/94e9633d98a5542ea384b1095290ac6f915fc917
> https://github.com/DaveWysochanskiRH/kernel/releases/tag/fscache-iter-nfs-20201120
>
> The patches are based on David Howells fscache-iter tree at
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
>
> The first 6 patches refactor some of the NFS read code to facilitate
> re-use, the next 6 patches do the conversion to the new API, and the
> last patch is a somewhat awkward fix for a problem seen in final
> testing.
>
> Per David Howell's recent post, note that the new fscache API is
> divided into two separate APIs, a 'netfs' API and an 'fscache' API.
> The netfs API was done to help simplify the IO paths of network
> filesystems, and can be called even when fscache is disabled, thus
> simplifing both readpage and readahead implementations. However,
> for now these NFS conversion patches only call the netfs API when
> fscache is enabled, similar to the existing NFS code.
>
> Trond and Anna, I would appreciate your guidance on this patchset.
> At least I would like your feedback regarding the direction
> you would like these patches to go, in particular, the following
> items:
>
> 1. Whether you are ok with using the netfs API unconditionally even
> when fscache is disabled, or prefer this "least invasive to NFS"
> approach. Note the unconditional use of the netfs API is the
> recommended approach per David's post and the AFS and CEPH
> implementations, but I was unsure if you would accept this
> approach or would prefer to minimize changes to NFS. Note if
> we keep the current approach to minimize NFS changes, we will
> have to address some problems with page unlocking such as with
> patch 13 in the series.
>
> 2. Whether to keep the NFS fscache implementation as "read only"
> or if we add write through support. Today we only enable fscache
> when a file is open read-only and disable / invalidate when a file
> is open for write.
>
> Still TODO
> 1. Address known issues (lockdep, page unlocking), depending on
> what is decided as far as implementation direction
> a) nfs_issue_op: takes rcu_read_lock but may calls nfs_page_alloc()
> with GFP_KERNEL which may sleep (dhowells noted this in a review)
> b) nfs_refresh_inode() takes inode->i_lock but may call
> __fscache_invalidate() which may sleep (found with lockdep)
> 2. Fixup NFS fscache stats (NFSIOS_FSCACHE_*)
> * Compare with netfs stats and determine if still needed
> 3. Cleanup dfprintks and/or convert to tracepoints
> 4. Further tests (see "Not tested yet")
>
> Checks run
> 1. checkpatch: PASS*
> * a few warnings, mostly trivial fixups, some unrelated to this set
> 2. kernel builds with each patch: PASS
> * each patch in series built cleanly which ensure bisection
>
> Tests run
> 1. Custom NFS+fscache unit tests for basic operation: PASS*
> * no oops or data corruptions
> * Some op counts are a bit off but these are mostly due
> to statistics not implemented properly (NFSIOS_FSCACHE_*)
> 2. cthon04: PASS (test options "-b -g -s -l", fsc,vers=3,4.0,4.1,4.2,sec=sys)
> * No failures or oopses for any version or test options
> 3. iozone tests (fsc,vers=3,4.0,4.1,4.2,sec=sys): PASS
> * No failures or oopses
> 4. kernel build (fsc,vers=3,4.1,4.2): PASS*
> * all builds finish without errors or data corruption
> * one lockdep "scheduling while atomic" fired with NFS41 and
> was due to one an fscache invalidation code path (known issue 'b' above)
> 5. xfstests/generic (fsc,vers=4.2, nofsc,vers=4.2): PASS*
> * generic/013 (pass but triggers i_lock lockdep warning known issue 'a' above)
> * NOTE: The following tests failed with errors, but they
> also fail on vanilla 5.10-rc4 so are not related to this
> patchset
> * generic/074 (lockep invalid wait context - nfs_free_request())
> * generic/538 (short read)
> * generic/551 (pread: Unknown error 524, Data verification fail)
> * generic/568 (ERROR: File grew from 4096 B to 8192 B when writing to the fallocated range)
>

FYI, Anna asked about test coverage last week during the chat with
David Howells about fscache.
Since then, I ran xfstests generic on NFS versions 3, 4.0, 4.1, and
4.2 with the following configurations:
- 5.10.0-rc4 without 'fsc' (fscache not enabled)
- 5.10.0-rc4-94e9633d98a5+ (these patches) without 'fsc' (fscache not enabled)
- 5.10.0-rc4-94e9633d98a5+ (these patches) with 'fsc' (fscache enabled)

In all of the above, a small number of tests failed with the patched
kernels also failed with the vanilla kernel, so no new failures
occurred.
If you want the full text file output for the test runs just ask, but
here's a short summary:

5.10.0-rc4/xfstests-generic-NFS3-nofsc-5.10.0-rc4.txt:generic/053 -
output mismatch (see
/root/git/xfstests-dev/results//nfsnofsc/generic/053.out.bad)
5.10.0-rc4/xfstests-generic-NFS3-nofsc-5.10.0-rc4.txt:generic/089
26888s ... - output mismatch (see
/root/git/xfstests-dev/results//nfsnofsc/generic/089.out.bad)
5.10.0-rc4/xfstests-generic-NFS3-nofsc-5.10.0-rc4.txt:generic/105 -
output mismatch (see
/root/git/xfstests-dev/results//nfsnofsc/generic/105.out.bad)
5.10.0-rc4/xfstests-generic-NFS3-nofsc-5.10.0-rc4.txt:generic/258 1s
... [failed, exit status 1]- output mismatch (see
/root/git/xfstests-dev/results//nfsnofsc/generic/258.out.bad)
5.10.0-rc4/xfstests-generic-NFS3-nofsc-5.10.0-rc4.txt:generic/294 -
output mismatch (see
/root/git/xfstests-dev/results//nfsnofsc/generic/294.out.bad)
5.10.0-rc4/xfstests-generic-NFS3-nofsc-5.10.0-rc4.txt:generic/318 -
output mismatch (see
/root/git/xfstests-dev/results//nfsnofsc/generic/318.out.bad)
5.10.0-rc4/xfstests-generic-NFS3-nofsc-5.10.0-rc4.txt:generic/319 -
output mismatch (see
/root/git/xfstests-dev/results//nfsnofsc/generic/319.out.bad)
5.10.0-rc4/xfstests-generic-NFS3-nofsc-5.10.0-rc4.txt:generic/444 -
output mismatch (see
/root/git/xfstests-dev/results//nfsnofsc/generic/444.out.bad)
5.10.0-rc4/xfstests-generic-NFS3-nofsc-5.10.0-rc4.txt:generic/467 36s
... - output mismatch (see
/root/git/xfstests-dev/results//nfsnofsc/generic/467.out.bad)
5.10.0-rc4/xfstests-generic-NFS3-nofsc-5.10.0-rc4.txt:generic/477 23s
... - output mismatch (see
/root/git/xfstests-dev/results//nfsnofsc/generic/477.out.bad)
5.10.0-rc4/xfstests-generic-NFS3-nofsc-5.10.0-rc4.txt:generic/529 -
output mismatch (see
/root/git/xfstests-dev/results//nfsnofsc/generic/529.out.bad)
5.10.0-rc4/xfstests-generic-NFS3-nofsc-5.10.0-rc4.txt:generic/551
7535s ... - output mismatch (see
/root/git/xfstests-dev/results//nfsnofsc/generic/551.out.bad)
5.10.0-rc4/xfstests-generic-NFS40-nofsc-5.10.0-rc4.txt:generic/294
- output mismatch (see
/root/git/xfstests-dev/results//nfs40nofsc/generic/294.out.bad)
5.10.0-rc4/xfstests-generic-NFS40-nofsc-5.10.0-rc4.txt:generic/426 36s
... - output mismatch (see
/root/git/xfstests-dev/results//nfs40nofsc/generic/426.out.bad)
5.10.0-rc4/xfstests-generic-NFS40-nofsc-5.10.0-rc4.txt:generic/467 4s
... - output mismatch (see
/root/git/xfstests-dev/results//nfs40nofsc/generic/467.out.bad)
5.10.0-rc4/xfstests-generic-NFS40-nofsc-5.10.0-rc4.txt:generic/477 3s
... - output mismatch (see
/root/git/xfstests-dev/results//nfs40nofsc/generic/477.out.bad)
5.10.0-rc4/xfstests-generic-NFS40-nofsc-5.10.0-rc4.txt:generic/551
7535s ... - output mismatch (see
/root/git/xfstests-dev/results//nfs40nofsc/generic/551.out.bad)
5.10.0-rc4/xfstests-generic-NFS41-nofsc-5.10.0-rc4.txt:generic/294
- output mismatch (see
/root/git/xfstests-dev/results//nfs41nofsc/generic/294.out.bad)
5.10.0-rc4/xfstests-generic-NFS41-nofsc-5.10.0-rc4.txt:generic/551
7535s ... - output mismatch (see
/root/git/xfstests-dev/results//nfs41nofsc/generic/551.out.bad)
5.10.0-rc4/xfstests-generic-NFS42-nofsc-5.10.0-rc4.txt:generic/294
- output mismatch (see
/root/git/xfstests-dev/results//nfs42nofsc/generic/294.out.bad)
5.10.0-rc4/xfstests-generic-NFS42-nofsc-5.10.0-rc4.txt:generic/538 30s
... - output mismatch (see
/root/git/xfstests-dev/results//nfs42nofsc/generic/538.out.bad)
5.10.0-rc4/xfstests-generic-NFS42-nofsc-5.10.0-rc4.txt:generic/551
7535s ... - output mismatch (see
/root/git/xfstests-dev/results//nfs42nofsc/generic/551.out.bad)
5.10.0-rc4/xfstests-generic-NFS42-nofsc-5.10.0-rc4.txt:generic/568
- output mismatch (see
/root/git/xfstests-dev/results//nfs42nofsc/generic/568.out.bad)



> Not tested yet:
> * error injections (for example, connection disruptions, server errors during IO, etc)
> * pNFS
> * many process mixed read/write on same file
> * performance
> Dave Wysochanski (13):
> NFS: Clean up nfs_readpage() and nfs_readpages()
> NFS: In nfs_readpage() only increment NFSIOS_READPAGES when read
> succeeds
> NFS: Refactor nfs_readpage() and nfs_readpage_async() to use
> nfs_readdesc
> NFS: Call readpage_async_filler() from nfs_readpage_async()
> NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()
> NFS: Allow internal use of read structs and functions
> NFS: Convert fscache_acquire_cookie and fscache_relinquish_cookie
> NFS: Convert fscache_enable_cookie and fscache_disable_cookie
> NFS: Convert fscache invalidation and update aux_data and i_size
> NFS: Convert to the netfs API and nfs_readpage to use netfs_readpage
> NFS: Convert readpage to readahead and use netfs_readahead for fscache
> NFS: Allow NFS use of new fscache API in build
> NFS: Ensure proper page unlocking when reads fail with retryable
> errors
>
> fs/nfs/Kconfig | 2 +-
> fs/nfs/direct.c | 2 +
> fs/nfs/file.c | 22 ++--
> fs/nfs/fscache-index.c | 94 --------------
> fs/nfs/fscache.c | 315 ++++++++++++++++++++-------------------------
> fs/nfs/fscache.h | 132 +++++++------------
> fs/nfs/inode.c | 4 +-
> fs/nfs/internal.h | 8 ++
> fs/nfs/nfs4proc.c | 2 +-
> fs/nfs/pagelist.c | 2 +
> fs/nfs/read.c | 248 ++++++++++++++++-------------------
> fs/nfs/write.c | 3 +-
> include/linux/nfs_fs.h | 5 +-
> include/linux/nfs_iostat.h | 2 +-
> include/linux/nfs_page.h | 1 +
> include/linux/nfs_xdr.h | 1 +
> 16 files changed, 339 insertions(+), 504 deletions(-)
>
> --
> 1.8.3.1
>

2020-11-30 13:21:06

by David Wysochanski

[permalink] [raw]
Subject: Re: [PATCH v1 0/13] Convert NFS to new netfs and fscache APIs

On Sat, Nov 21, 2020 at 1:48 PM Chuck Lever <[email protected]> wrote:
>
>
>
> > On Nov 21, 2020, at 1:28 PM, David Wysochanski <[email protected]> wrote:
> >
> > On Sat, Nov 21, 2020 at 12:16 PM Chuck Lever <[email protected]> wrote:
> >>
> >>
> >>
> >>> On Nov 21, 2020, at 12:01 PM, David Wysochanski <[email protected]> wrote:
> >>>
> >>> On Sat, Nov 21, 2020 at 11:14 AM Chuck Lever <[email protected]> wrote:
> >>>>
> >>>> Hi Dave-
> >>>>
> >>>>> On Nov 21, 2020, at 8:29 AM, Dave Wysochanski <[email protected]> wrote:
> >>>>>
> >>>>> These patches update the NFS client to use the new netfs and fscache
> >>>>> APIs and are at:
> >>>>> https://github.com/DaveWysochanskiRH/kernel.git
> >>>>> https://github.com/DaveWysochanskiRH/kernel/commit/94e9633d98a5542ea384b1095290ac6f915fc917
> >>>>> https://github.com/DaveWysochanskiRH/kernel/releases/tag/fscache-iter-nfs-20201120
> >>>>>
> >>>>> The patches are based on David Howells fscache-iter tree at
> >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-iter
> >>>>>
> >>>>> The first 6 patches refactor some of the NFS read code to facilitate
> >>>>> re-use, the next 6 patches do the conversion to the new API, and the
> >>>>> last patch is a somewhat awkward fix for a problem seen in final
> >>>>> testing.
> >>>>>
> >>>>> Per David Howell's recent post, note that the new fscache API is
> >>>>> divided into two separate APIs, a 'netfs' API and an 'fscache' API.
> >>>>> The netfs API was done to help simplify the IO paths of network
> >>>>> filesystems, and can be called even when fscache is disabled, thus
> >>>>> simplifing both readpage and readahead implementations. However,
> >>>>> for now these NFS conversion patches only call the netfs API when
> >>>>> fscache is enabled, similar to the existing NFS code.
> >>>>>
> >>>>> Trond and Anna, I would appreciate your guidance on this patchset.
> >>>>> At least I would like your feedback regarding the direction
> >>>>> you would like these patches to go, in particular, the following
> >>>>> items:
> >>>>>
> >>>>> 1. Whether you are ok with using the netfs API unconditionally even
> >>>>> when fscache is disabled, or prefer this "least invasive to NFS"
> >>>>> approach. Note the unconditional use of the netfs API is the
> >>>>> recommended approach per David's post and the AFS and CEPH
> >>>>> implementations, but I was unsure if you would accept this
> >>>>> approach or would prefer to minimize changes to NFS. Note if
> >>>>> we keep the current approach to minimize NFS changes, we will
> >>>>> have to address some problems with page unlocking such as with
> >>>>> patch 13 in the series.
> >>>>>
> >>>>> 2. Whether to keep the NFS fscache implementation as "read only"
> >>>>> or if we add write through support. Today we only enable fscache
> >>>>> when a file is open read-only and disable / invalidate when a file
> >>>>> is open for write.
> >>>>>
> >>>>> Still TODO
> >>>>> 1. Address known issues (lockdep, page unlocking), depending on
> >>>>> what is decided as far as implementation direction
> >>>>> a) nfs_issue_op: takes rcu_read_lock but may calls nfs_page_alloc()
> >>>>> with GFP_KERNEL which may sleep (dhowells noted this in a review)
> >>>>> b) nfs_refresh_inode() takes inode->i_lock but may call
> >>>>> __fscache_invalidate() which may sleep (found with lockdep)
> >>>>> 2. Fixup NFS fscache stats (NFSIOS_FSCACHE_*)
> >>>>> * Compare with netfs stats and determine if still needed
> >>>>> 3. Cleanup dfprintks and/or convert to tracepoints
> >>>>> 4. Further tests (see "Not tested yet")
> >>>>
> >>>> Can you say whether your approach has any performance impact?
> >>> No I cannot.
> >>>
> >>>> In particular, what comparative benchmarks have been run?
> >>>>
> >>> No comparisons so far, but note the last bullet - "performance".
> >>>
> >>> Are you wondering about performance with/without fscache for this
> >>> series, or the old vs new fscache, or something else?
> >>
> >> I'd like to have some explicit performance-related merge worthiness
> >> criteria. For example: "No performance regressions, and here's how
> >> we're going to determine that we're good: fio / iozone / yada with
> >> NFS/TCP and NFS/RDMA on 100GbE; for very little additional CPU
> >> cost measured via perf xyzzy. Also some benchmark that measures lock
> >> contention."
> >>
> >> We haven't been especially careful about this in the past when
> >> reworking the client's primary I/O paths. Nothing unreasonable, but
> >> it should be stated up front where we want to end up.
> >>
> > Makes sense.
> >
> >> Another approach might be: we're going to start by making fscache
> >> opt-in. As confidence increases over time and good performance is
> >> demonstrated, then we'll unify the fscache and non-cached I/O paths.
> >>
> > It sounds like what you want is what I've done in this first implementation.
>
> My understanding is that you haven't decided whether to take the
> opt-in approach or to convert the primary I/O paths right now.
>

I posted this approach because I thought it would be the least path to
getting the new fscache changes merged.

I have another patch in progress that converts by using netfs API
unconditionally.
However I didn't think it made sense to post that until I received feedback
on this set and approach vs the more invasive approach.

>
> > This implementation takes a "least invasive to NFS" approach, staying
> > with the old fscache on/off logic, even though this was not ideal or what
> > was recommended as the end game for the netfs API.
>
> I'm not taking a position on your two approaches, but I would like
> that when the time comes to take the approach that involves full
> integration, we should have an agreed-upon set of performance
> goals. I don't want that integration to cause our high performance
> environments (NFS/RDMA, for instance) to lose out from that
> integration.
>
> Btw, it isn't clear yet that we need to use the fscache APIs to
> introduce proper huge page support. One of our current efforts is
> to convert xdr_buf from the use of an array of struct page pointers
> to an array of struct bio_vec pointers. In that case each entry in
> the array carries the size of the thing that the entry points to,
> which today is always PAGE_SIZE, but in the future could be much
> larger.
>
>
> >>>>> Checks run
> >>>>> 1. checkpatch: PASS*
> >>>>> * a few warnings, mostly trivial fixups, some unrelated to this set
> >>>>> 2. kernel builds with each patch: PASS
> >>>>> * each patch in series built cleanly which ensure bisection
> >>>>>
> >>>>> Tests run
> >>>>> 1. Custom NFS+fscache unit tests for basic operation: PASS*
> >>>>> * no oops or data corruptions
> >>>>> * Some op counts are a bit off but these are mostly due
> >>>>> to statistics not implemented properly (NFSIOS_FSCACHE_*)
> >>>>> 2. cthon04: PASS (test options "-b -g -s -l", fsc,vers=3,4.0,4.1,4.2,sec=sys)
> >>>>> * No failures or oopses for any version or test options
> >>>>> 3. iozone tests (fsc,vers=3,4.0,4.1,4.2,sec=sys): PASS
> >>>>> * No failures or oopses
> >>>>> 4. kernel build (fsc,vers=3,4.1,4.2): PASS*
> >>>>> * all builds finish without errors or data corruption
> >>>>> * one lockdep "scheduling while atomic" fired with NFS41 and
> >>>>> was due to one an fscache invalidation code path (known issue 'b' above)
> >>>>> 5. xfstests/generic (fsc,vers=4.2, nofsc,vers=4.2): PASS*
> >>>>> * generic/013 (pass but triggers i_lock lockdep warning known issue 'a' above)
> >>>>> * NOTE: The following tests failed with errors, but they
> >>>>> also fail on vanilla 5.10-rc4 so are not related to this
> >>>>> patchset
> >>>>> * generic/074 (lockep invalid wait context - nfs_free_request())
> >>>>> * generic/538 (short read)
> >>>>> * generic/551 (pread: Unknown error 524, Data verification fail)
> >>>>> * generic/568 (ERROR: File grew from 4096 B to 8192 B when writing to the fallocated range)
> >>>>>
> >>>>> Not tested yet:
> >>>>> * error injections (for example, connection disruptions, server errors during IO, etc)
> >>>>> * pNFS
> >>>>> * many process mixed read/write on same file
> >>>>> * performance
> >>>>> Dave Wysochanski (13):
> >>>>> NFS: Clean up nfs_readpage() and nfs_readpages()
> >>>>> NFS: In nfs_readpage() only increment NFSIOS_READPAGES when read
> >>>>> succeeds
> >>>>> NFS: Refactor nfs_readpage() and nfs_readpage_async() to use
> >>>>> nfs_readdesc
> >>>>> NFS: Call readpage_async_filler() from nfs_readpage_async()
> >>>>> NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()
> >>>>> NFS: Allow internal use of read structs and functions
> >>>>> NFS: Convert fscache_acquire_cookie and fscache_relinquish_cookie
> >>>>> NFS: Convert fscache_enable_cookie and fscache_disable_cookie
> >>>>> NFS: Convert fscache invalidation and update aux_data and i_size
> >>>>> NFS: Convert to the netfs API and nfs_readpage to use netfs_readpage
> >>>>> NFS: Convert readpage to readahead and use netfs_readahead for fscache
> >>>>> NFS: Allow NFS use of new fscache API in build
> >>>>> NFS: Ensure proper page unlocking when reads fail with retryable
> >>>>> errors
> >>>>>
> >>>>> fs/nfs/Kconfig | 2 +-
> >>>>> fs/nfs/direct.c | 2 +
> >>>>> fs/nfs/file.c | 22 ++--
> >>>>> fs/nfs/fscache-index.c | 94 --------------
> >>>>> fs/nfs/fscache.c | 315 ++++++++++++++++++++-------------------------
> >>>>> fs/nfs/fscache.h | 132 +++++++------------
> >>>>> fs/nfs/inode.c | 4 +-
> >>>>> fs/nfs/internal.h | 8 ++
> >>>>> fs/nfs/nfs4proc.c | 2 +-
> >>>>> fs/nfs/pagelist.c | 2 +
> >>>>> fs/nfs/read.c | 248 ++++++++++++++++-------------------
> >>>>> fs/nfs/write.c | 3 +-
> >>>>> include/linux/nfs_fs.h | 5 +-
> >>>>> include/linux/nfs_iostat.h | 2 +-
> >>>>> include/linux/nfs_page.h | 1 +
> >>>>> include/linux/nfs_xdr.h | 1 +
> >>>>> 16 files changed, 339 insertions(+), 504 deletions(-)
> >>>>>
> >>>>> --
> >>>>> 1.8.3.1
> >>>>>
> >>>>
> >>>> --
> >>>> Chuck Lever
> >>
> >> --
> >> Chuck Lever
>
> --
> Chuck Lever
>
>
>