2022-07-22 18:24:23

by Jeffrey Layton

Subject: [PATCH 0/3] nfs: fix -ENOSPC DIO write regression

Boyang reported that xfstest generic/476 would never complete when run
against a filesystem that was "too small".

What I found was that we would end up trying to issue a large DIO write
that would come back short. The kernel would then follow up and try to
write out the rest and get back -ENOSPC. It would then try to issue a
commit, which would then try to reissue the writes, and around it would
go.

This patchset seems to fix it. Unfortunately, I'm not positive which
patch _broke_ this as it seems to have happened quite some time ago.

Jeff Layton (3):
  nfs: add new nfs_direct_req tracepoint events
  nfs: always check dreq->error after a commit
  nfs: only issue commit in DIO codepath if we have uncommitted data

 fs/nfs/direct.c         | 50 +++++++++--------------------
 fs/nfs/internal.h       | 33 ++++++++++++++++++++
 fs/nfs/nfstrace.h       | 69 +++++++++++++++++++++++++++++++++++++++++
 fs/nfs/write.c          | 48 +++++++++++++++++-----------
 include/linux/nfs_xdr.h |  1 +
 5 files changed, 148 insertions(+), 53 deletions(-)

--
2.36.1
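
The retry loop described in the cover letter can be illustrated with a toy user-space model. This is only a sketch and not the kernel code: ToyServer, dio_write, the check_error_after_commit flag, and max_rounds are all hypothetical names invented for illustration, and the commit is modelled as a no-op. The flag loosely stands in for the idea in the second patch title (check the recorded error after a commit before reissuing writes). Returning the short byte count when some data was written is this model's choice, following ordinary write(2) semantics.

```python
import errno


class ToyServer:
    """Hypothetical stand-in for an NFS server with limited free space."""

    def __init__(self, free):
        self.free = free  # bytes still available on the export

    def write(self, nbytes):
        """Return bytes written (possibly short), or -ENOSPC once full."""
        if self.free == 0:
            return -errno.ENOSPC
        done = min(nbytes, self.free)
        self.free -= done
        return done


def dio_write(server, nbytes, check_error_after_commit, max_rounds=8):
    """Model one direct write of nbytes.

    Returns (result, rounds). result is the byte count, a negative errno,
    or None if the model was still retrying after max_rounds.
    """
    written = 0
    error = 0
    for rounds in range(1, max_rounds + 1):
        # Write phase: issue (or reissue) writes for the outstanding range.
        while written < nbytes:
            ret = server.write(nbytes - written)
            if ret < 0:
                error = ret
                break
            written += ret
        if written == nbytes:
            return written, rounds
        # Commit phase, modelled as a no-op on the toy server.
        if check_error_after_commit and error:
            # Stop here: report the short count, or the error if
            # nothing was written at all.
            return (written if written else error), rounds
        # Without the check, the remainder is requeued and we go around.
    return None, max_rounds
```

With 4096 bytes free and an 8192-byte request, dio_write(ToyServer(4096), 8192, True) stops after one round with a short count of 4096, while the same call with the flag set to False is still retrying when max_rounds is reached.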


2022-07-24 19:33:33

by Trond Myklebust

Subject: Re: [PATCH 0/3] nfs: fix -ENOSPC DIO write regression

On Fri, 2022-07-22 at 14:12 -0400, Jeff Layton wrote:
> Boyang reported that xfstest generic/476 would never complete when run
> against a filesystem that was "too small".
>
> What I found was that we would end up trying to issue a large DIO write
> that would come back short. The kernel would then follow up and try to
> write out the rest and get back -ENOSPC. It would then try to issue a
> commit, which would then try to reissue the writes, and around it would
> go.
>
> This patchset seems to fix it. Unfortunately, I'm not positive which
> patch _broke_ this as it seems to have happened quite some time ago.
>
> Jeff Layton (3):
>   nfs: add new nfs_direct_req tracepoint events
>   nfs: always check dreq->error after a commit
>   nfs: only issue commit in DIO codepath if we have uncommitted data
>
>  fs/nfs/direct.c         | 50 +++++++++--------------------
>  fs/nfs/internal.h       | 33 ++++++++++++++++++++
>  fs/nfs/nfstrace.h       | 69 +++++++++++++++++++++++++++++++++++++++++
>  fs/nfs/write.c          | 48 +++++++++++++++++-----------
>  include/linux/nfs_xdr.h |  1 +
>  5 files changed, 148 insertions(+), 53 deletions(-)
>

With this series applied, I'm seeing things like xfstests generic/013
looping forever.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2022-07-24 20:19:35

by Trond Myklebust

Subject: Re: [PATCH 0/3] nfs: fix -ENOSPC DIO write regression

On Sun, 2022-07-24 at 19:10 +0000, Trond Myklebust wrote:
> On Fri, 2022-07-22 at 14:12 -0400, Jeff Layton wrote:
> > Boyang reported that xfstest generic/476 would never complete when run
> > against a filesystem that was "too small".
> >
> > What I found was that we would end up trying to issue a large DIO write
> > that would come back short. The kernel would then follow up and try to
> > write out the rest and get back -ENOSPC. It would then try to issue a
> > commit, which would then try to reissue the writes, and around it would
> > go.
> >
> > This patchset seems to fix it. Unfortunately, I'm not positive which
> > patch _broke_ this as it seems to have happened quite some time ago.
> >
> > Jeff Layton (3):
> >   nfs: add new nfs_direct_req tracepoint events
> >   nfs: always check dreq->error after a commit
> >   nfs: only issue commit in DIO codepath if we have uncommitted data
> >
> >  fs/nfs/direct.c         | 50 +++++++++--------------------
> >  fs/nfs/internal.h       | 33 ++++++++++++++++++++
> >  fs/nfs/nfstrace.h       | 69 +++++++++++++++++++++++++++++++++++++++++
> >  fs/nfs/write.c          | 48 +++++++++++++++++-----------
> >  include/linux/nfs_xdr.h |  1 +
> >  5 files changed, 148 insertions(+), 53 deletions(-)
> >
>
> With this series applied, I'm seeing things like xfstests generic/013
> looping forever.
>

Sorry, false alarm... That turned out to be due to an interesting
readahead config issue.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2022-08-04 09:10:49

by Boyang Xue

Subject: Re: [PATCH 0/3] nfs: fix -ENOSPC DIO write regression

Hi Jeff,

Thanks for fixing this! I have been running tests against this patchset
for several days, and the results are all good. generic/476 now
typically completes within 30 minutes. The tests are:

For verifying generic/476:
xfstests-multihost-nfsv3-over-ext4
xfstests-multihost-nfsv3-over-feature-ext4
xfstests-multihost-nfsv3-over-feature-xfs
xfstests-multihost-nfsv3-over-xfs
xfstests-multihost-nfsv4.0-over-ext4
xfstests-multihost-nfsv4.0-over-feature-ext4
xfstests-multihost-nfsv4.0-over-feature-xfs
xfstests-multihost-nfsv4.0-over-xfs
xfstests-multihost-nfsv4.1-over-ext4
xfstests-multihost-nfsv4.1-over-feature-ext4
xfstests-multihost-nfsv4.1-over-feature-xfs
xfstests-multihost-nfsv4.1-over-xfs
xfstests-multihost-nfsv4.2-over-ext4
xfstests-multihost-nfsv4.2-over-feature-ext4
xfstests-multihost-nfsv4.2-over-feature-xfs
xfstests-multihost-nfsv4.2-over-xfs
xfstests-localhost-nfsv3
xfstests-localhost-nfsv4.0
xfstests-localhost-nfsv4.1
xfstests-localhost-nfsv4.2

Regression tests:
ltp-nfsv{3,4.0,4.1,4.2}
pjd-test: nfs
nfs-connectathon
nfs-sanity-check

All tests were run on x86_64, aarch64, ppc64le, and s390x (only
partially on s390x, due to some config issues).

Hope this helps.

Thanks,
Boyang

On Sat, Jul 23, 2022 at 2:12 AM Jeff Layton <[email protected]> wrote:
>
> Boyang reported that xfstest generic/476 would never complete when run
> against a filesystem that was "too small".
>
> What I found was that we would end up trying to issue a large DIO write
> that would come back short. The kernel would then follow up and try to
> write out the rest and get back -ENOSPC. It would then try to issue a
> commit, which would then try to reissue the writes, and around it would
> go.
>
> This patchset seems to fix it. Unfortunately, I'm not positive which
> patch _broke_ this as it seems to have happened quite some time ago.
>
> Jeff Layton (3):
>   nfs: add new nfs_direct_req tracepoint events
>   nfs: always check dreq->error after a commit
>   nfs: only issue commit in DIO codepath if we have uncommitted data
>
>  fs/nfs/direct.c         | 50 +++++++++--------------------
>  fs/nfs/internal.h       | 33 ++++++++++++++++++++
>  fs/nfs/nfstrace.h       | 69 +++++++++++++++++++++++++++++++++++++++++
>  fs/nfs/write.c          | 48 +++++++++++++++++-----------
>  include/linux/nfs_xdr.h |  1 +
>  5 files changed, 148 insertions(+), 53 deletions(-)
>
> --
> 2.36.1
>