These fixes fix two regressions reported by Red Hat QE. This first
issue reported was that writing a file larger than the available space
would result in a client hang instead of returning -ENOSPC. The second
issue reported was that the client was returning -EIO instead of -EDQUOT
when a quota is exceeded.
On further investigation, I found that in the first issue the client
wasn't really hung. Instead it was performing all of the requested
writes before it would eventually return -ENOSPC. The dd command was
writing 400TB, which would have taken a few weeks to complete.
Adjusting the 'bs' and 'count' arguments so that the resulting file
would be much smaller (but still enough to fill up the space on the NFS
server) showed that the dd command would indeed complete with -ENOSPC.
But on older kernels even the 400TB dd would return -ENOSPC quickly.
In the second issue, I found that adding 'conv=fsync' would make the dd
command return -EDQUOT. So what was happening with the original
command was that it was doing a close() without first calling fsync()
and the close() was returning -EIO.
I bisected both of these down to commit aded8d7b54f2 ("NFS: Don't
inadvertently clear writeback errors"). HOWEVER, as a test I reverted
that commit and it wasn't sufficient to bring back the old behavior. I
turns out that was due to commit 6fbda89b257f ("NFS: Replace custom
error reporting mechanism with generic one"). This is why I listed the
second commit in the 'Fixes:' tag of both of my patches.
The first patch fixes the -EDQUOT issue and the second one fixes the
-ENOSPC issue.
-Scott
Scott Mayhew (2):
nfs: ensure correct writeback errors are returned on close()
nfs: nfs_file_write() should check for writeback errors
fs/nfs/file.c | 15 +++++++++++----
fs/nfs/nfs4file.c | 3 ++-
2 files changed, 13 insertions(+), 5 deletions(-)
--
2.25.4
nfs_wb_all() calls filemap_write_and_wait(), which uses
filemap_check_errors() to determine the error to return.
filemap_check_errors() only looks at the mapping->flags and will
therefore only return either -ENOSPC or -EIO. To ensure that the
correct error is returned on close(), nfs{,4}_file_flush() should call
file_check_and_advance_wb_err() which looks at the errseq value in
mapping->wb_err.
Fixes: 6fbda89b257f ("NFS: Replace custom error reporting mechanism with
generic one")
Signed-off-by: Scott Mayhew <[email protected]>
---
fs/nfs/file.c | 3 ++-
fs/nfs/nfs4file.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index f96367a2463e..eeef6580052f 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -148,7 +148,8 @@ nfs_file_flush(struct file *file, fl_owner_t id)
return 0;
/* Flush writes to the server and return any errors */
- return nfs_wb_all(inode);
+ nfs_wb_all(inode);
+ return file_check_and_advance_wb_err(file);
}
ssize_t
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 8e5d6223ddd3..77bf9c12734c 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -125,7 +125,8 @@ nfs4_file_flush(struct file *file, fl_owner_t id)
return filemap_fdatawrite(file->f_mapping);
/* Flush writes to the server and return any errors */
- return nfs_wb_all(inode);
+ nfs_wb_all(inode);
+ return file_check_and_advance_wb_err(file);
}
#ifdef CONFIG_NFS_V4_2
--
2.25.4
The NFS_CONTEXT_ERROR_WRITE flag (as well as the check of said flag) was
removed by commit 6fbda89b257f. The absence of an error check allows
writes to be continually queued up for a server that may no longer be
able to handle them. Fix it by adding an error check using the generic
error reporting functions.
Fixes: 6fbda89b257f ("NFS: Replace custom error reporting mechanism with
generic one")
Signed-off-by: Scott Mayhew <[email protected]>
---
fs/nfs/file.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index eeef6580052f..c6496f21d1e2 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -588,12 +588,14 @@ static const struct vm_operations_struct nfs_file_vm_ops = {
.page_mkwrite = nfs_vm_page_mkwrite,
};
-static int nfs_need_check_write(struct file *filp, struct inode *inode)
+static int nfs_need_check_write(struct file *filp, struct inode *inode,
+ int error)
{
struct nfs_open_context *ctx;
ctx = nfs_file_open_context(filp);
- if (nfs_ctx_key_to_expire(ctx, inode))
+ if (nfs_error_is_fatal_on_server(error) ||
+ nfs_ctx_key_to_expire(ctx, inode))
return 1;
return 0;
}
@@ -604,6 +606,8 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file_inode(file);
unsigned long written = 0;
ssize_t result;
+ errseq_t since;
+ int error;
result = nfs_key_timeout_notify(file, inode);
if (result)
@@ -628,6 +632,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
if (iocb->ki_pos > i_size_read(inode))
nfs_revalidate_mapping(inode, file->f_mapping);
+ since = filemap_sample_wb_err(file->f_mapping);
nfs_start_io_write(inode);
result = generic_write_checks(iocb, from);
if (result > 0) {
@@ -646,7 +651,8 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
goto out;
/* Return error values */
- if (nfs_need_check_write(file, inode)) {
+ error = filemap_check_wb_err(file->f_mapping, since);
+ if (nfs_need_check_write(file, inode, error)) {
int err = nfs_wb_all(inode);
if (err < 0)
result = err;
--
2.25.4
On Fri, 2020-07-31 at 13:46 -0400, Scott Mayhew wrote:
> nfs_wb_all() calls filemap_write_and_wait(), which uses
> filemap_check_errors() to determine the error to return.
> filemap_check_errors() only looks at the mapping->flags and will
> therefore only return either -ENOSPC or -EIO. To ensure that the
> correct error is returned on close(), nfs{,4}_file_flush() should
> call
> file_check_and_advance_wb_err() which looks at the errseq value in
> mapping->wb_err.
>
> Fixes: 6fbda89b257f ("NFS: Replace custom error reporting mechanism
> with
> generic one")
> Signed-off-by: Scott Mayhew <[email protected]>
> ---
> fs/nfs/file.c | 3 ++-
> fs/nfs/nfs4file.c | 3 ++-
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index f96367a2463e..eeef6580052f 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -148,7 +148,8 @@ nfs_file_flush(struct file *file, fl_owner_t id)
> return 0;
>
> /* Flush writes to the server and return any errors */
> - return nfs_wb_all(inode);
> + nfs_wb_all(inode);
> + return file_check_and_advance_wb_err(file);
> }
>
> ssize_t
> diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
> index 8e5d6223ddd3..77bf9c12734c 100644
> --- a/fs/nfs/nfs4file.c
> +++ b/fs/nfs/nfs4file.c
> @@ -125,7 +125,8 @@ nfs4_file_flush(struct file *file, fl_owner_t id)
> return filemap_fdatawrite(file->f_mapping);
>
> /* Flush writes to the server and return any errors */
> - return nfs_wb_all(inode);
> + nfs_wb_all(inode);
> + return file_check_and_advance_wb_err(file);
> }
>
> #ifdef CONFIG_NFS_V4_2
I don't think this one is correct. The contract with POSIX is that we
always deliver the error on fsync(). If we call
file_check_and_advance_wb_err() here in nfs_file_flush(), then that
means we eat the error before it can get delivered to fsync().
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]
On Fri, 31 Jul 2020, Trond Myklebust wrote:
> On Fri, 2020-07-31 at 13:46 -0400, Scott Mayhew wrote:
> > nfs_wb_all() calls filemap_write_and_wait(), which uses
> > filemap_check_errors() to determine the error to return.
> > filemap_check_errors() only looks at the mapping->flags and will
> > therefore only return either -ENOSPC or -EIO. To ensure that the
> > correct error is returned on close(), nfs{,4}_file_flush() should
> > call
> > file_check_and_advance_wb_err() which looks at the errseq value in
> > mapping->wb_err.
> >
> > Fixes: 6fbda89b257f ("NFS: Replace custom error reporting mechanism
> > with
> > generic one")
> > Signed-off-by: Scott Mayhew <[email protected]>
> > ---
> > fs/nfs/file.c | 3 ++-
> > fs/nfs/nfs4file.c | 3 ++-
> > 2 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> > index f96367a2463e..eeef6580052f 100644
> > --- a/fs/nfs/file.c
> > +++ b/fs/nfs/file.c
> > @@ -148,7 +148,8 @@ nfs_file_flush(struct file *file, fl_owner_t id)
> > return 0;
> >
> > /* Flush writes to the server and return any errors */
> > - return nfs_wb_all(inode);
> > + nfs_wb_all(inode);
> > + return file_check_and_advance_wb_err(file);
> > }
> >
> > ssize_t
> > diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
> > index 8e5d6223ddd3..77bf9c12734c 100644
> > --- a/fs/nfs/nfs4file.c
> > +++ b/fs/nfs/nfs4file.c
> > @@ -125,7 +125,8 @@ nfs4_file_flush(struct file *file, fl_owner_t id)
> > return filemap_fdatawrite(file->f_mapping);
> >
> > /* Flush writes to the server and return any errors */
> > - return nfs_wb_all(inode);
> > + nfs_wb_all(inode);
> > + return file_check_and_advance_wb_err(file);
> > }
> >
> > #ifdef CONFIG_NFS_V4_2
>
> I don't think this one is correct. The contract with POSIX is that we
> always deliver the error on fsync(). If we call
> file_check_and_advance_wb_err() here in nfs_file_flush(), then that
> means we eat the error before it can get delivered to fsync().
I was looking at callers of the flush f_op and the only one I saw was
filp_close(), so I assumed that there wouldn't be any other calls to
fsync() for that struct file... I guess that's not the case if the file
descriptor was duplicated though.
Would a solution using filemap_sample_wb_err() & filemap_check_wb_err()
be acceptable (like in the 2nd patch)?
-Scott
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]
>
>
On Fri, 2020-07-31 at 15:43 -0400, Scott Mayhew wrote:
> On Fri, 31 Jul 2020, Trond Myklebust wrote:
>
> > On Fri, 2020-07-31 at 13:46 -0400, Scott Mayhew wrote:
> > > nfs_wb_all() calls filemap_write_and_wait(), which uses
> > > filemap_check_errors() to determine the error to return.
> > > filemap_check_errors() only looks at the mapping->flags and will
> > > therefore only return either -ENOSPC or -EIO. To ensure that the
> > > correct error is returned on close(), nfs{,4}_file_flush() should
> > > call
> > > file_check_and_advance_wb_err() which looks at the errseq value
> > > in
> > > mapping->wb_err.
> > >
> > > Fixes: 6fbda89b257f ("NFS: Replace custom error reporting
> > > mechanism
> > > with
> > > generic one")
> > > Signed-off-by: Scott Mayhew <[email protected]>
> > > ---
> > > fs/nfs/file.c | 3 ++-
> > > fs/nfs/nfs4file.c | 3 ++-
> > > 2 files changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> > > index f96367a2463e..eeef6580052f 100644
> > > --- a/fs/nfs/file.c
> > > +++ b/fs/nfs/file.c
> > > @@ -148,7 +148,8 @@ nfs_file_flush(struct file *file, fl_owner_t
> > > id)
> > > return 0;
> > >
> > > /* Flush writes to the server and return any errors */
> > > - return nfs_wb_all(inode);
> > > + nfs_wb_all(inode);
> > > + return file_check_and_advance_wb_err(file);
> > > }
> > >
> > > ssize_t
> > > diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
> > > index 8e5d6223ddd3..77bf9c12734c 100644
> > > --- a/fs/nfs/nfs4file.c
> > > +++ b/fs/nfs/nfs4file.c
> > > @@ -125,7 +125,8 @@ nfs4_file_flush(struct file *file, fl_owner_t
> > > id)
> > > return filemap_fdatawrite(file->f_mapping);
> > >
> > > /* Flush writes to the server and return any errors */
> > > - return nfs_wb_all(inode);
> > > + nfs_wb_all(inode);
> > > + return file_check_and_advance_wb_err(file);
> > > }
> > >
> > > #ifdef CONFIG_NFS_V4_2
> >
> > I don't think this one is correct. The contract with POSIX is that
> > we
> > always deliver the error on fsync(). If we call
> > file_check_and_advance_wb_err() here in nfs_file_flush(), then that
> > means we eat the error before it can get delivered to fsync().
>
> I was looking at callers of the flush f_op and the only one I saw was
> filp_close(), so I assumed that there wouldn't be any other calls to
> fsync() for that struct file... I guess that's not the case if the
> file
> descriptor was duplicated though.
>
> Would a solution using filemap_sample_wb_err() &
> filemap_check_wb_err()
> be acceptable (like in the 2nd patch)?
>
I think that would be more appropriate, yes.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]