2023-01-09 05:53:00

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH 00/11] Remove AS_EIO and AS_ENOSPC

Finish the work of converting every user to the "new" wb_err
infrastructure. This will clash with Christoph's patch series to remove
folio_write_one(), so I'll redo this after that patch series goes in.

Matthew Wilcox (Oracle) (11):
memory-failure: Remove comment referencing AS_EIO
filemap: Remove filemap_check_and_keep_errors()
f2fs: Convert f2fs_wait_on_node_pages_writeback() to errseq
fuse: Convert fuse_flush() to use file_check_and_advance_wb_err()
page-writeback: Convert folio_write_one() to use an errseq
filemap: Convert filemap_write_and_wait_range() to use errseq
filemap: Convert filemap_fdatawait_range() to errseq
cifs: Remove call to filemap_check_wb_err()
mm: Remove AS_EIO and AS_ENOSPC
mm: Remove filemap_fdatawait_range_keep_errors()
mm: Remove filemap_fdatawait_keep_errors()

block/bdev.c | 8 +--
fs/btrfs/extent_io.c | 6 +--
fs/cifs/file.c | 8 ++-
fs/f2fs/data.c | 2 +-
fs/f2fs/node.c | 4 +-
fs/fs-writeback.c | 7 +--
fs/fuse/file.c | 3 +-
fs/jbd2/commit.c | 12 ++---
fs/xfs/scrub/bmap.c | 2 +-
include/linux/pagemap.h | 23 ++------
mm/filemap.c | 113 +++++++---------------------------------
mm/memory-failure.c | 28 ----------
mm/page-writeback.c | 17 +++---
13 files changed, 51 insertions(+), 182 deletions(-)

--
2.35.1


2023-01-09 05:53:38

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH 03/11] f2fs: Convert f2fs_wait_on_node_pages_writeback() to errseq

Convert from the old filemap_check_errors() to the errseq infrastructure.
This means we will not report any previously-occurring error, and we will
not clear any previously-occurring error.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
fs/f2fs/node.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index dde4c0458704..a87b5515c681 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -2055,12 +2055,14 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info *sbi,
unsigned int seq_id)
{
+ struct address_space *mapping = NODE_MAPPING(sbi);
struct fsync_node_entry *fn;
struct page *page;
struct list_head *head = &sbi->fsync_node_list;
unsigned long flags;
unsigned int cur_seq_id = 0;
int ret2, ret = 0;
+ errseq_t since = filemap_sample_wb_err(mapping);

while (seq_id && cur_seq_id < seq_id) {
spin_lock_irqsave(&sbi->fsync_node_lock, flags);
@@ -2088,7 +2090,7 @@ int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info *sbi,
break;
}

- ret2 = filemap_check_errors(NODE_MAPPING(sbi));
+ ret2 = filemap_check_wb_err(mapping, since);
if (!ret)
ret = ret2;

--
2.35.1

2023-01-09 05:53:55

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH 05/11] page-writeback: Convert folio_write_one() to use an errseq

Use the errseq infrastructure to detect an error due to writing
back this folio instead of the old error checking code.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/page-writeback.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index ad608ef2a243..491b70dad994 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2610,15 +2610,12 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
*
* The folio must be locked by the caller and will be unlocked upon return.
*
- * Note that the mapping's AS_EIO/AS_ENOSPC flags will be cleared when this
- * function returns.
- *
* Return: %0 on success, negative error code otherwise
*/
int folio_write_one(struct folio *folio)
{
struct address_space *mapping = folio->mapping;
- int ret = 0;
+ int err = 0;
struct writeback_control wbc = {
.sync_mode = WB_SYNC_ALL,
.nr_to_write = folio_nr_pages(folio),
@@ -2629,18 +2626,20 @@ int folio_write_one(struct folio *folio)
folio_wait_writeback(folio);

if (folio_clear_dirty_for_io(folio)) {
+ errseq_t since = filemap_sample_wb_err(mapping);
+
folio_get(folio);
- ret = mapping->a_ops->writepage(&folio->page, &wbc);
- if (ret == 0)
+ err = mapping->a_ops->writepage(&folio->page, &wbc);
+ if (!err) {
folio_wait_writeback(folio);
+ err = filemap_check_wb_err(mapping, since);
+ }
folio_put(folio);
} else {
folio_unlock(folio);
}

- if (!ret)
- ret = filemap_check_errors(mapping);
- return ret;
+ return err;
}
EXPORT_SYMBOL(folio_write_one);

--
2.35.1

2023-01-09 05:55:30

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH 01/11] memory-failure: Remove comment referencing AS_EIO

The EIO is now reported to every caller which has the file open on
the next operation which returns an error. We obviously cannot check
whether the user took action correctly on that error, but we can remove
this comment as wb_err is never cleared, unlike AS_EIO.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
mm/memory-failure.c | 28 ----------------------------
1 file changed, 28 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index c77a9e37e27e..1a1c66f7e5dd 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -992,34 +992,6 @@ static int me_pagecache_dirty(struct page_state *ps, struct page *p)
* who check the mapping.
* This way the application knows that something went
* wrong with its dirty file data.
- *
- * There's one open issue:
- *
- * The EIO will be only reported on the next IO
- * operation and then cleared through the IO map.
- * Normally Linux has two mechanisms to pass IO error
- * first through the AS_EIO flag in the address space
- * and then through the PageError flag in the page.
- * Since we drop pages on memory failure handling the
- * only mechanism open to use is through AS_AIO.
- *
- * This has the disadvantage that it gets cleared on
- * the first operation that returns an error, while
- * the PageError bit is more sticky and only cleared
- * when the page is reread or dropped. If an
- * application assumes it will always get error on
- * fsync, but does other operations on the fd before
- * and the page is dropped between then the error
- * will not be properly reported.
- *
- * This can already happen even without hwpoisoned
- * pages: first on metadata IO errors (which only
- * report through AS_EIO) or when the page is dropped
- * at the wrong time.
- *
- * So right now we assume that the application DTRT on
- * the first EIO, but we're not worse than other parts
- * of the kernel.
*/
mapping_set_error(mapping, -EIO);
}
--
2.35.1

2023-01-09 05:57:44

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH 08/11] cifs: Remove call to filemap_check_wb_err()

filemap_write_and_wait() now calls filemap_check_wb_err(), so we cannot
glean any additional information by calling it ourselves. It may also
be misleading as it will pick up on any errors since the beginning of
time which may well be since before this program opened the file.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
fs/cifs/file.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 22dfc1f8b4f1..7e7ee26cf77d 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3042,14 +3042,12 @@ int cifs_flush(struct file *file, fl_owner_t id)
int rc = 0;

if (file->f_mode & FMODE_WRITE)
- rc = filemap_write_and_wait(inode->i_mapping);
+ rc = filemap_write_and_wait(file->f_mapping);

cifs_dbg(FYI, "Flush inode %p file %p rc %d\n", inode, file, rc);
- if (rc) {
- /* get more nuanced writeback errors */
- rc = filemap_check_wb_err(file->f_mapping, 0);
+ if (rc)
trace_cifs_flush_err(inode->i_ino, rc);
- }
+
return rc;
}

--
2.35.1

2023-01-09 05:59:00

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH 04/11] fuse: Convert fuse_flush() to use file_check_and_advance_wb_err()

As with fsync, use the newer file_check_and_advance_wb_err() instead
of filemap_check_errors().

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
fs/fuse/file.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 875314ee6f59..7174646ddf09 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -500,11 +500,10 @@ static int fuse_flush(struct file *file, fl_owner_t id)
fuse_sync_writes(inode);
inode_unlock(inode);

- err = filemap_check_errors(file->f_mapping);
+ err = file_check_and_advance_wb_err(file);
if (err)
return err;

- err = 0;
if (fm->fc->no_flush)
goto inval_attr_out;

--
2.35.1

2023-01-09 06:11:55

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH 07/11] filemap: Convert filemap_fdatawait_range() to errseq

This removes the unwelcome behaviour of clearing the
errors in the address space, which is the same behaviour as
filemap_fdatawait_range_keep_errors(), so unify the two functions.
We can also get rid of filemap_fdatawait_keep_errors() as it is now the
same as filemap_fdatawait()

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
include/linux/pagemap.h | 7 +++---
mm/filemap.c | 51 +----------------------------------------
2 files changed, 5 insertions(+), 53 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 29e1f9e76eb6..985fd47739f4 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -33,16 +33,17 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
int write_inode_now(struct inode *, int sync);
int filemap_fdatawrite(struct address_space *);
int filemap_flush(struct address_space *);
-int filemap_fdatawait_keep_errors(struct address_space *mapping);
int filemap_fdatawait_range(struct address_space *, loff_t lstart, loff_t lend);
-int filemap_fdatawait_range_keep_errors(struct address_space *mapping,
- loff_t start_byte, loff_t end_byte);

static inline int filemap_fdatawait(struct address_space *mapping)
{
return filemap_fdatawait_range(mapping, 0, LLONG_MAX);
}

+#define filemap_fdatawait_range_keep_errors(mapping, start, end) \
+ filemap_fdatawait_range(mapping, start, end)
+#define filemap_fdatawait_keep_errors(mapping) filemap_fdatawait(mapping)
+
bool filemap_range_has_page(struct address_space *, loff_t lstart, loff_t lend);
int filemap_write_and_wait_range(struct address_space *mapping,
loff_t lstart, loff_t lend);
diff --git a/mm/filemap.c b/mm/filemap.c
index c72b2e1140d7..887520db115a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -526,43 +526,17 @@ static void __filemap_fdatawait_range(struct address_space *mapping,
* in the given range and wait for all of them. Check error status of
* the address space and return it.
*
- * Since the error status of the address space is cleared by this function,
- * callers are responsible for checking the return value and handling and/or
- * reporting the error.
- *
* Return: error status of the address space.
*/
int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte,
loff_t end_byte)
-{
- __filemap_fdatawait_range(mapping, start_byte, end_byte);
- return filemap_check_errors(mapping);
-}
-EXPORT_SYMBOL(filemap_fdatawait_range);
-
-/**
- * filemap_fdatawait_range_keep_errors - wait for writeback to complete
- * @mapping: address space structure to wait for
- * @start_byte: offset in bytes where the range starts
- * @end_byte: offset in bytes where the range ends (inclusive)
- *
- * Walk the list of under-writeback pages of the given address space in the
- * given range and wait for all of them. Unlike filemap_fdatawait_range(),
- * this function does not clear error status of the address space.
- *
- * Use this function if callers don't handle errors themselves. Expected
- * call sites are system-wide / filesystem-wide data flushers: e.g. sync(2),
- * fsfreeze(8)
- */
-int filemap_fdatawait_range_keep_errors(struct address_space *mapping,
- loff_t start_byte, loff_t end_byte)
{
errseq_t since = filemap_sample_wb_err(mapping);

__filemap_fdatawait_range(mapping, start_byte, end_byte);
return filemap_check_wb_err(mapping, since);
}
-EXPORT_SYMBOL(filemap_fdatawait_range_keep_errors);
+EXPORT_SYMBOL(filemap_fdatawait_range);

/**
* file_fdatawait_range - wait for writeback to complete
@@ -589,29 +563,6 @@ int file_fdatawait_range(struct file *file, loff_t start_byte, loff_t end_byte)
}
EXPORT_SYMBOL(file_fdatawait_range);

-/**
- * filemap_fdatawait_keep_errors - wait for writeback without clearing errors
- * @mapping: address space structure to wait for
- *
- * Walk the list of under-writeback pages of the given address space
- * and wait for all of them. Unlike filemap_fdatawait(), this function
- * does not clear error status of the address space.
- *
- * Use this function if callers don't handle errors themselves. Expected
- * call sites are system-wide / filesystem-wide data flushers: e.g. sync(2),
- * fsfreeze(8)
- *
- * Return: error status of the address space.
- */
-int filemap_fdatawait_keep_errors(struct address_space *mapping)
-{
- errseq_t since = filemap_sample_wb_err(mapping);
-
- __filemap_fdatawait_range(mapping, 0, LLONG_MAX);
- return filemap_check_wb_err(mapping, since);
-}
-EXPORT_SYMBOL(filemap_fdatawait_keep_errors);
-
/* Returns true if writeback might be needed or already in progress. */
static bool mapping_needs_writeback(struct address_space *mapping)
{
--
2.35.1

2023-01-09 06:13:30

by Matthew Wilcox

[permalink] [raw]
Subject: [PATCH 10/11] mm: Remove filemap_fdatawait_range_keep_errors()

This function is now the same as filemap_fdatawait_range(), so change
both callers to use that instead.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
fs/jbd2/commit.c | 12 +++++-------
include/linux/pagemap.h | 2 --
2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 4810438b7856..36aa1b117a50 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -221,11 +221,10 @@ EXPORT_SYMBOL(jbd2_submit_inode_data);
int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode)
{
if (!jinode || !(jinode->i_flags & JI_WAIT_DATA) ||
- !jinode->i_vfs_inode || !jinode->i_vfs_inode->i_mapping)
+ !jinode->i_vfs_inode || !jinode->i_vfs_inode->i_mapping)
return 0;
- return filemap_fdatawait_range_keep_errors(
- jinode->i_vfs_inode->i_mapping, jinode->i_dirty_start,
- jinode->i_dirty_end);
+ return filemap_fdatawait_range(jinode->i_vfs_inode->i_mapping,
+ jinode->i_dirty_start, jinode->i_dirty_end);
}
EXPORT_SYMBOL(jbd2_wait_inode_data);

@@ -270,9 +269,8 @@ int jbd2_journal_finish_inode_data_buffers(struct jbd2_inode *jinode)
{
struct address_space *mapping = jinode->i_vfs_inode->i_mapping;

- return filemap_fdatawait_range_keep_errors(mapping,
- jinode->i_dirty_start,
- jinode->i_dirty_end);
+ return filemap_fdatawait_range(mapping, jinode->i_dirty_start,
+ jinode->i_dirty_end);
}

/*
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 573b8cce3a85..7fe2a5ec1c12 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -40,8 +40,6 @@ static inline int filemap_fdatawait(struct address_space *mapping)
return filemap_fdatawait_range(mapping, 0, LLONG_MAX);
}

-#define filemap_fdatawait_range_keep_errors(mapping, start, end) \
- filemap_fdatawait_range(mapping, start, end)
#define filemap_fdatawait_keep_errors(mapping) filemap_fdatawait(mapping)

bool filemap_range_has_page(struct address_space *, loff_t lstart, loff_t lend);
--
2.35.1

2023-01-09 15:05:35

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 08/11] cifs: Remove call to filemap_check_wb_err()

On Mon, 2023-01-09 at 05:18 +0000, Matthew Wilcox (Oracle) wrote:
> filemap_write_and_wait() now calls filemap_check_wb_err(), so we cannot
> glean any additional information by calling it ourselves. It may also
> be misleading as it will pick up on any errors since the beginning of
> time which may well be since before this program opened the file.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> fs/cifs/file.c | 8 +++-----
> 1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> index 22dfc1f8b4f1..7e7ee26cf77d 100644
> --- a/fs/cifs/file.c
> +++ b/fs/cifs/file.c
> @@ -3042,14 +3042,12 @@ int cifs_flush(struct file *file, fl_owner_t id)
> int rc = 0;
>
> if (file->f_mode & FMODE_WRITE)
> - rc = filemap_write_and_wait(inode->i_mapping);
> + rc = filemap_write_and_wait(file->f_mapping);

If we're calling ->flush, then the file is being closed. Should this
just be?
rc = file_write_and_wait(file);

It's not like we need to worry about corrupting ->f_wb_err at that
point.

>
> cifs_dbg(FYI, "Flush inode %p file %p rc %d\n", inode, file rc);
> - if (rc) {
> - /* get more nuanced writeback errors */
> - rc = filemap_check_wb_err(file->f_mapping, 0);
> + if (rc)
> trace_cifs_flush_err(inode->i_ino, rc);
> - }
> +
> return rc;
> }
>

--
Jeff Layton <[email protected]>

2023-01-09 15:42:32

by Jeffrey Layton

[permalink] [raw]
Subject: Re: [PATCH 00/11] Remove AS_EIO and AS_ENOSPC

On Mon, 2023-01-09 at 05:18 +0000, Matthew Wilcox (Oracle) wrote:
> Finish the work of converting every user to the "new" wb_err
> infrastructure. This will clash with Christoph's patch series to remove
> folio_write_one(), so I'll redo this after that patch series goes in.
>
> Matthew Wilcox (Oracle) (11):
> memory-failure: Remove comment referencing AS_EIO
> filemap: Remove filemap_check_and_keep_errors()
> f2fs: Convert f2fs_wait_on_node_pages_writeback() to errseq
> fuse: Convert fuse_flush() to use file_check_and_advance_wb_err()
> page-writeback: Convert folio_write_one() to use an errseq
> filemap: Convert filemap_write_and_wait_range() to use errseq
> filemap: Convert filemap_fdatawait_range() to errseq
> cifs: Remove call to filemap_check_wb_err()
> mm: Remove AS_EIO and AS_ENOSPC
> mm: Remove filemap_fdatawait_range_keep_errors()

> mm: Remove filemap_fdatawait_keep_errors()
>
> block/bdev.c | 8 +--
> fs/btrfs/extent_io.c | 6 +--
> fs/cifs/file.c | 8 ++-
> fs/f2fs/data.c | 2 +-
> fs/f2fs/node.c | 4 +-
> fs/fs-writeback.c | 7 +--
> fs/fuse/file.c | 3 +-
> fs/jbd2/commit.c | 12 ++---
> fs/xfs/scrub/bmap.c | 2 +-
> include/linux/pagemap.h | 23 ++------
> mm/filemap.c | 113 +++++++---------------------------------
> mm/memory-failure.c | 28 ----------
> mm/page-writeback.c | 17 +++---
> 13 files changed, 51 insertions(+), 182 deletions(-)
>

Now that I got my understanding straight about unseen errors, I think
this looks good. I think it'd be best to avoid advancing f_wb_err in
->flush operations but the rest of this looks fine.

You can add this to all but #4 (and that one should be easily fixable).

Reviewed-by: Jeff Layton <[email protected]>

2023-01-09 15:43:20

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 08/11] cifs: Remove call to filemap_check_wb_err()

On Mon, 2023-01-09 at 09:42 -0500, Jeff Layton wrote:
> On Mon, 2023-01-09 at 05:18 +0000, Matthew Wilcox (Oracle) wrote:
> > filemap_write_and_wait() now calls filemap_check_wb_err(), so we cannot
> > glean any additional information by calling it ourselves. It may also
> > be misleading as it will pick up on any errors since the beginning of
> > time which may well be since before this program opened the file.
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> > ---
> > fs/cifs/file.c | 8 +++-----
> > 1 file changed, 3 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> > index 22dfc1f8b4f1..7e7ee26cf77d 100644
> > --- a/fs/cifs/file.c
> > +++ b/fs/cifs/file.c
> > @@ -3042,14 +3042,12 @@ int cifs_flush(struct file *file, fl_owner_t id)
> > int rc = 0;
> >
> > if (file->f_mode & FMODE_WRITE)
> > - rc = filemap_write_and_wait(inode->i_mapping);
> > + rc = filemap_write_and_wait(file->f_mapping);
>
> If we're calling ->flush, then the file is being closed. Should this
> just be?
> rc = file_write_and_wait(file);
>
> It's not like we need to worry about corrupting ->f_wb_err at that
> point.
>

OTOH, I suppose it is possible for there to be racing fsync syscall with
a filp_close, and in that case advancing the f_wb_err might be a bad
idea, particularly since a lot of places ignore the return from
filp_close. It's probably best to _not_ advance the f_wb_err on ->flush
calls.

That said...wonder if we ought to consider making filp_close and ->flush
void return functions. There's no POSIX requirement to flush all of the
data on close(), so an application really shouldn't rely on seeing
writeback errors returned there since it's not reliable.

If you care about writeback errors, you have to call fsync -- full stop.

> >
> > cifs_dbg(FYI, "Flush inode %p file %p rc %d\n", inode, file rc);
> > - if (rc) {
> > - /* get more nuanced writeback errors */
> > - rc = filemap_check_wb_err(file->f_mapping, 0);
> > + if (rc)
> > trace_cifs_flush_err(inode->i_ino, rc);
> > - }
> > +
> > return rc;
> > }
> >
>

--
Jeff Layton <[email protected]>

2023-01-09 15:45:14

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 08/11] cifs: Remove call to filemap_check_wb_err()

On Mon, Jan 09, 2023 at 10:14:12AM -0500, Jeff Layton wrote:
> On Mon, 2023-01-09 at 09:42 -0500, Jeff Layton wrote:
> > On Mon, 2023-01-09 at 05:18 +0000, Matthew Wilcox (Oracle) wrote:
> > > filemap_write_and_wait() now calls filemap_check_wb_err(), so we cannot
> > > glean any additional information by calling it ourselves. It may also
> > > be misleading as it will pick up on any errors since the beginning of
> > > time which may well be since before this program opened the file.
> > >
> > > Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> > > ---
> > > fs/cifs/file.c | 8 +++-----
> > > 1 file changed, 3 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> > > index 22dfc1f8b4f1..7e7ee26cf77d 100644
> > > --- a/fs/cifs/file.c
> > > +++ b/fs/cifs/file.c
> > > @@ -3042,14 +3042,12 @@ int cifs_flush(struct file *file, fl_owner_t id)
> > > int rc = 0;
> > >
> > > if (file->f_mode & FMODE_WRITE)
> > > - rc = filemap_write_and_wait(inode->i_mapping);
> > > + rc = filemap_write_and_wait(file->f_mapping);
> >
> > If we're calling ->flush, then the file is being closed. Should this
> > just be?
> > rc = file_write_and_wait(file);
> >
> > It's not like we need to worry about corrupting ->f_wb_err at that
> > point.
> >
>
> OTOH, I suppose it is possible for there to be racing fsync syscall with
> a filp_close, and in that case advancing the f_wb_err might be a bad
> idea, particularly since a lot of places ignore the return from
> filp_close. It's probably best to _not_ advance the f_wb_err on ->flush
> calls.

There's only so much we can do to protect an application from itself.
If it's racing an fsync() against close(), it might get an EBADF from
fsync(), or end up fsyncing entirely the wrong file due to a close();
open(); associating the fd with a different file.

> That said...wonder if we ought to consider making filp_close and ->flush
> void return functions. There's no POSIX requirement to flush all of the
> data on close(), so an application really shouldn't rely on seeing
> writeback errors returned there since it's not reliable.
>
> If you care about writeback errors, you have to call fsync -- full stop.

Yes, most filesystems do not writeback dirty data on close().
Applications can't depend on that behaviour. Interestingly, if you read
https://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html
really carefully, it says:

If an I/O error occurred while reading from or writing to the file
system during close(), it may return -1 with errno set to [EIO];
if this error is returned, the state of fildes is unspecified.

So if we return an error, userspace doesn't know if this fd is still
open or not! This feels like poor underspecification on POSIX's part
(and probably stems from a historical unwillingness to declare any
vendor's implementation as "not Unix").

2023-01-09 15:46:59

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 08/11] cifs: Remove call to filemap_check_wb_err()

On Mon, Jan 09, 2023 at 09:42:36AM -0500, Jeff Layton wrote:
> On Mon, 2023-01-09 at 05:18 +0000, Matthew Wilcox (Oracle) wrote:
> > filemap_write_and_wait() now calls filemap_check_wb_err(), so we cannot
> > glean any additional information by calling it ourselves. It may also
> > be misleading as it will pick up on any errors since the beginning of
> > time which may well be since before this program opened the file.
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> > ---
> > fs/cifs/file.c | 8 +++-----
> > 1 file changed, 3 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> > index 22dfc1f8b4f1..7e7ee26cf77d 100644
> > --- a/fs/cifs/file.c
> > +++ b/fs/cifs/file.c
> > @@ -3042,14 +3042,12 @@ int cifs_flush(struct file *file, fl_owner_t id)
> > int rc = 0;
> >
> > if (file->f_mode & FMODE_WRITE)
> > - rc = filemap_write_and_wait(inode->i_mapping);
> > + rc = filemap_write_and_wait(file->f_mapping);
>
> If we're calling ->flush, then the file is being closed. Should this
> just be?
> rc = file_write_and_wait(file);
>
> It's not like we need to worry about corrupting ->f_wb_err at that
> point.

Yes, I think you're right, and then this is a standalone patch that can
go in this cycle, perhaps.

2023-01-09 16:09:19

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 04/11] fuse: Convert fuse_flush() to use file_check_and_advance_wb_err()

On Mon, 2023-01-09 at 05:18 +0000, Matthew Wilcox (Oracle) wrote:
> As with fsync, use the newer file_check_and_advance_wb_err() instead
> of filemap_check_errors().
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
> fs/fuse/file.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 875314ee6f59..7174646ddf09 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -500,11 +500,10 @@ static int fuse_flush(struct file *file, fl_owner_t id)
> fuse_sync_writes(inode);
> inode_unlock(inode);
>
> - err = filemap_check_errors(file->f_mapping);
> + err = file_check_and_advance_wb_err(file);
> if (err)
> return err;
>
>

I think it'd be best to not advance the f_wb_err here. ->flush is called
on filp_close which is mainly close() syscalls, but there are some other
callers too, and an error reported by ->flush can be discarded.



>
> - err = 0;
> if (fm->fc->no_flush)
> goto inval_attr_out;
>

--
Jeff Layton <[email protected]>

2023-01-09 16:11:37

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH 08/11] cifs: Remove call to filemap_check_wb_err()

On Mon, 2023-01-09 at 15:30 +0000, Matthew Wilcox wrote:
> On Mon, Jan 09, 2023 at 10:14:12AM -0500, Jeff Layton wrote:
> > On Mon, 2023-01-09 at 09:42 -0500, Jeff Layton wrote:
> > > On Mon, 2023-01-09 at 05:18 +0000, Matthew Wilcox (Oracle) wrote:
> > > > filemap_write_and_wait() now calls filemap_check_wb_err(), so we cannot
> > > > glean any additional information by calling it ourselves. It may also
> > > > be misleading as it will pick up on any errors since the beginning of
> > > > time which may well be since before this program opened the file.
> > > >
> > > > Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> > > > ---
> > > > fs/cifs/file.c | 8 +++-----
> > > > 1 file changed, 3 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/fs/cifs/file.c b/fs/cifs/file.c
> > > > index 22dfc1f8b4f1..7e7ee26cf77d 100644
> > > > --- a/fs/cifs/file.c
> > > > +++ b/fs/cifs/file.c
> > > > @@ -3042,14 +3042,12 @@ int cifs_flush(struct file *file, fl_owner_t id)
> > > > int rc = 0;
> > > >
> > > > if (file->f_mode & FMODE_WRITE)
> > > > - rc = filemap_write_and_wait(inode->i_mapping);
> > > > + rc = filemap_write_and_wait(file->f_mapping);
> > >
> > > If we're calling ->flush, then the file is being closed. Should this
> > > just be?
> > > rc = file_write_and_wait(file);
> > >
> > > It's not like we need to worry about corrupting ->f_wb_err at that
> > > point.
> > >
> >
> > OTOH, I suppose it is possible for there to be racing fsync syscall with
> > a filp_close, and in that case advancing the f_wb_err might be a bad
> > idea, particularly since a lot of places ignore the return from
> > filp_close. It's probably best to _not_ advance the f_wb_err on ->flush
> > calls.
>
> There's only so much we can do to protect an application from itself.
> If it's racing an fsync() against close(), it might get an EBADF from
> fsync(), or end up fsyncing entirely the wrong file due to a close();
> open(); associating the fd with a different file.
>

close() is not the worry, as it does return the error from ->flush. It's
the other callers of filp_close that concern me. A lot of those are
dealing with special "internal" struct files, but not all of them.

There's no benefit to advancing f_wb_err in ->flush, so I think we ought
to just avoid it. I think we don't even want to mark it SEEN in that
case either, so filemap_check_wb_err ought to be fine.

> > That said...wonder if we ought to consider making filp_close and ->flush
> > void return functions. There's no POSIX requirement to flush all of the
> > data on close(), so an application really shouldn't rely on seeing
> > writeback errors returned there since it's not reliable.
> >
> > If you care about writeback errors, you have to call fsync -- full stop.
>
> Yes, most filesystems do not writeback dirty data on close().
> Applications can't depend on that behaviour. Interestingly, if you read
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html
> really carefully, it says:
>
> If an I/O error occurred while reading from or writing to the file
> system during close(), it may return -1 with errno set to [EIO];
> if this error is returned, the state of fildes is unspecified.
>
> So if we return an error, userspace doesn't know if this fd is still
> open or not! This feels like poor underspecification on POSIX's part
> (and probably stems from a historical unwillingness to declare any
> vendor's implementation as "not Unix").
>

It's a mess. On Linux we even?tear down the fd on EINTR and EAGAIN, so
retrying a close() on failure will always get you a EBADF. The only sane
thing for userland to do is to just ignore errors on close (aside from
maybe EBADF).
--
Jeff Layton <[email protected]>

2023-01-12 08:35:05

by Christoph Hellwig

[permalink] [raw]

2023-01-12 08:47:21

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 00/11] Remove AS_EIO and AS_ENOSPC

On Mon, Jan 09, 2023 at 05:18:12AM +0000, Matthew Wilcox (Oracle) wrote:
> Finish the work of converting every user to the "new" wb_err
> infrastructure. This will clash with Christoph's patch series to remove
> folio_write_one(), so I'll redo this after that patch series goes in.

Based on the conflicts with the btrfs tree picking up the btrfs part
of this series I think the best way forward is:

- I'll resubmit my series without the two btrfs patches and the final
patch to move folio_write_one into jfs
- that last patch will be delayed until 6.4 and rebased on top of
your series

While this keeps folio_write_one/write_one_page around for one more
merge window it should entangle all the work nicely. We'll just need
to watch out that no new users appear.