Boyang reported that xfstest generic/476 would never complete when run
against a filesystem that was "too small".

What I found was that we would end up trying to issue a large DIO write
that would come back short. The kernel would then follow up and try to
write out the rest and get back -ENOSPC. It would then try to issue a
commit, which would then try to reissue the writes, and around it would
go.

This patchset seems to fix it. Unfortunately, I'm not positive which
patch _broke_ this as it seems to have happened quite some time ago.

Jeff Layton (3):
  nfs: add new nfs_direct_req tracepoint events
  nfs: always check dreq->error after a commit
  nfs: only issue commit in DIO codepath if we have uncommitted data

 fs/nfs/direct.c         | 50 +++++++++--------------------
 fs/nfs/internal.h       | 33 ++++++++++++++++++++
 fs/nfs/nfstrace.h       | 69 +++++++++++++++++++++++++++++++++++++++++
 fs/nfs/write.c          | 48 +++++++++++++++++-----------
 include/linux/nfs_xdr.h |  1 +
 5 files changed, 148 insertions(+), 53 deletions(-)
--
2.36.1
On Fri, 2022-07-22 at 14:12 -0400, Jeff Layton wrote:
> Boyang reported that xfstest generic/476 would never complete when run
> against a filesystem that was "too small".
>
> What I found was that we would end up trying to issue a large DIO write
> that would come back short. The kernel would then follow up and try to
> write out the rest and get back -ENOSPC. It would then try to issue a
> commit, which would then try to reissue the writes, and around it would
> go.
>
> This patchset seems to fix it. Unfortunately, I'm not positive which
> patch _broke_ this as it seems to have happened quite some time ago.
>
> Jeff Layton (3):
> nfs: add new nfs_direct_req tracepoint events
> nfs: always check dreq->error after a commit
> nfs: only issue commit in DIO codepath if we have uncommitted data
>
> fs/nfs/direct.c | 50 +++++++++--------------------
> fs/nfs/internal.h | 33 ++++++++++++++++++++
> fs/nfs/nfstrace.h | 69 +++++++++++++++++++++++++++++++++++++++++
> fs/nfs/write.c | 48 +++++++++++++++++-----------
> include/linux/nfs_xdr.h | 1 +
> 5 files changed, 148 insertions(+), 53 deletions(-)
>
With this series applied, I'm seeing things like xfstests generic/013
looping forever.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
On Sun, 2022-07-24 at 19:10 +0000, Trond Myklebust wrote:
> With this series applied, I'm seeing things like xfstests generic/013
> looping forever.
>
Sorry, false alarm... That turned out to be due to an interesting
readahead config issue.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
Hi Jeff,
Thanks for fixing this! I have been running tests against this patchset
for several days, and all the results are good: generic/476 now
typically completes within 30 minutes. The tests are:
For verifying generic/476:
xfstests-multihost-nfsv3-over-ext4
xfstests-multihost-nfsv3-over-feature-ext4
xfstests-multihost-nfsv3-over-feature-xfs
xfstests-multihost-nfsv3-over-xfs
xfstests-multihost-nfsv4.0-over-ext4
xfstests-multihost-nfsv4.0-over-feature-ext4
xfstests-multihost-nfsv4.0-over-feature-xfs
xfstests-multihost-nfsv4.0-over-xfs
xfstests-multihost-nfsv4.1-over-ext4
xfstests-multihost-nfsv4.1-over-feature-ext4
xfstests-multihost-nfsv4.1-over-feature-xfs
xfstests-multihost-nfsv4.1-over-xfs
xfstests-multihost-nfsv4.2-over-ext4
xfstests-multihost-nfsv4.2-over-feature-ext4
xfstests-multihost-nfsv4.2-over-feature-xfs
xfstests-multihost-nfsv4.2-over-xfs
xfstests-localhost-nfsv3
xfstests-localhost-nfsv4.0
xfstests-localhost-nfsv4.1
xfstests-localhost-nfsv4.2
Regression tests:
ltp-nfsv{3,4.0,4.1,4.2}
pjd-test: nfs
nfs-connectathon
nfs-sanity-check
All tests were run on x86_64, aarch64, ppc64le, and s390x (only
partially, due to some config issues).
Hope this helps.
Thanks,
Boyang