2017-03-08 16:55:14

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 0/9] mm/fs: get PG_error out of the writeback reporting business

v2:
- still ClearPageError during __filemap_fdatawait_range
- clear AS_* errors when reporting errors during write initiation
- set mapping errors when launder_page fails
- set mapping errors when writeback fails during migration
- set mapping errors when DAX writeback fails
- Documentation patch to give guidance about writeback errors

Here is v2 of this set. The main difference is some new patches to
ensure that mapping errors get set in a few rather obscure places when
writeback fails, and a patch (based on Jan's suggestion) to clear out
the address space errors when initiating writeback fails with -EIO. I
also left the code clearing PG_error in __filemap_fdatawait_range. We
may want to remove that eventually, but we need to ensure that it gets
cleared in some way when writeback fails.

I've done a bit of testing with this (mostly xfstests on xfs), and it
seems to work ok AFAICT.

Original cover letter follows:

------------------------------8<-----------------------------

I recently did some work to wire up -ENOSPC handling in ceph, and found
I could get back -EIO errors in some cases when I should have instead
gotten -ENOSPC. The problem was that the ceph writeback code would set
PG_error on a writeback error, and that error would clobber the mapping
error.

While I fixed that problem by simply not setting that bit on errors,
that led me down a rabbit hole of looking at how PG_error is being
handled in the kernel.

This patch series is a few fixes for things that I 100% noticed by
inspection. I don't have a great way to test these since they involve
error handling. I can certainly doctor up a kernel to inject errors
in this code and test by hand however if these look plausible up front.

Jeff Layton (9):
mm: fix mapping_set_error call in me_pagecache_dirty
mm: drop "wait" parameter from write_one_page
mm: clear any AS_* errors when returning error on any fsync or close
nilfs2: set the mapping error when calling SetPageError on writeback
dax: set error in mapping when writeback fails
mm: set mapping error when launder_pages fails
mm: ensure that we set mapping error if writeout() fails
mm: don't TestClearPageError in __filemap_fdatawait_range
Documentation: document what to do on a writeback error

Documentation/filesystems/vfs.txt | 7 +++++++
fs/dax.c | 4 +++-
fs/exofs/dir.c | 2 +-
fs/ext2/dir.c | 2 +-
fs/jfs/jfs_metapage.c | 4 ++--
fs/minix/dir.c | 2 +-
fs/nilfs2/segment.c | 1 +
fs/sysv/dir.c | 2 +-
fs/ufs/dir.c | 2 +-
include/linux/mm.h | 2 +-
mm/filemap.c | 40 ++++++++++++++++++++++-----------------
mm/memory-failure.c | 2 +-
mm/migrate.c | 6 +++++-
mm/page-writeback.c | 14 +++++++-------
mm/truncate.c | 6 +++++-
15 files changed, 60 insertions(+), 36 deletions(-)

--
2.9.3


2017-03-08 16:38:07

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 7/9] mm: ensure that we set mapping error if writeout() fails

If writepage fails during a page migration, then we need to ensure that
fsync will see it by flagging the mapping.

Signed-off-by: Jeff Layton <[email protected]>
---
mm/migrate.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 9a0897a14d37..a9c0b46865b7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -789,7 +789,11 @@ static int writeout(struct address_space *mapping, struct page *page)
/* unlocked. Relock */
lock_page(page);

- return (rc < 0) ? -EIO : -EAGAIN;
+ if (rc < 0) {
+ mapping_set_error(mapping, rc);
+ return -EIO;
+ }
+ return -EAGAIN;
}

/*
--
2.9.3

2017-03-08 16:38:10

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 1/9] mm: fix mapping_set_error call in me_pagecache_dirty

The error code should be negative. Since this ends up in the default
case anyway, this is harmless, but it's less confusing to negate it.

Signed-off-by: Jeff Layton <[email protected]>
---
mm/memory-failure.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 27f7210e7fab..4b56e53e5378 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -674,7 +674,7 @@ static int me_pagecache_dirty(struct page *p, unsigned long pfn)
* the first EIO, but we're not worse than other parts
* of the kernel.
*/
- mapping_set_error(mapping, EIO);
+ mapping_set_error(mapping, -EIO);
}

return me_pagecache_clean(p, pfn);
--
2.9.3

2017-03-08 16:38:09

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 2/9] mm: drop "wait" parameter from write_one_page

The callers all set it to 1. Also, make it clear that this function will
not set any sort of AS_* error, and that the caller must do so if
necessary. No existing caller uses this on normal files, so none of them
need it.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/exofs/dir.c | 2 +-
fs/ext2/dir.c | 2 +-
fs/jfs/jfs_metapage.c | 4 ++--
fs/minix/dir.c | 2 +-
fs/sysv/dir.c | 2 +-
fs/ufs/dir.c | 2 +-
include/linux/mm.h | 2 +-
mm/page-writeback.c | 14 +++++++-------
8 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/exofs/dir.c b/fs/exofs/dir.c
index 42f9a0a0c4ca..e163ed980c20 100644
--- a/fs/exofs/dir.c
+++ b/fs/exofs/dir.c
@@ -72,7 +72,7 @@ static int exofs_commit_chunk(struct page *page, loff_t pos, unsigned len)
set_page_dirty(page);

if (IS_DIRSYNC(dir))
- err = write_one_page(page, 1);
+ err = write_one_page(page);
else
unlock_page(page);

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index d9650c9508e4..e2709695b177 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -100,7 +100,7 @@ static int ext2_commit_chunk(struct page *page, loff_t pos, unsigned len)
}

if (IS_DIRSYNC(dir)) {
- err = write_one_page(page, 1);
+ err = write_one_page(page);
if (!err)
err = sync_inode_metadata(dir, 1);
} else {
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index 489aaa1403e5..744fa3c079e6 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -711,7 +711,7 @@ void force_metapage(struct metapage *mp)
get_page(page);
lock_page(page);
set_page_dirty(page);
- write_one_page(page, 1);
+ write_one_page(page);
clear_bit(META_forcewrite, &mp->flag);
put_page(page);
}
@@ -756,7 +756,7 @@ void release_metapage(struct metapage * mp)
set_page_dirty(page);
if (test_bit(META_sync, &mp->flag)) {
clear_bit(META_sync, &mp->flag);
- write_one_page(page, 1);
+ write_one_page(page);
lock_page(page); /* write_one_page unlocks the page */
}
} else if (mp->lsn) /* discard_metapage doesn't remove it */
diff --git a/fs/minix/dir.c b/fs/minix/dir.c
index 7edc9b395700..baa9721f1299 100644
--- a/fs/minix/dir.c
+++ b/fs/minix/dir.c
@@ -57,7 +57,7 @@ static int dir_commit_chunk(struct page *page, loff_t pos, unsigned len)
mark_inode_dirty(dir);
}
if (IS_DIRSYNC(dir))
- err = write_one_page(page, 1);
+ err = write_one_page(page);
else
unlock_page(page);
return err;
diff --git a/fs/sysv/dir.c b/fs/sysv/dir.c
index 5bdae85ceef7..f5191cb2c947 100644
--- a/fs/sysv/dir.c
+++ b/fs/sysv/dir.c
@@ -45,7 +45,7 @@ static int dir_commit_chunk(struct page *page, loff_t pos, unsigned len)
mark_inode_dirty(dir);
}
if (IS_DIRSYNC(dir))
- err = write_one_page(page, 1);
+ err = write_one_page(page);
else
unlock_page(page);
return err;
diff --git a/fs/ufs/dir.c b/fs/ufs/dir.c
index de01b8f2aa78..48609f1d9580 100644
--- a/fs/ufs/dir.c
+++ b/fs/ufs/dir.c
@@ -53,7 +53,7 @@ static int ufs_commit_chunk(struct page *page, loff_t pos, unsigned len)
mark_inode_dirty(dir);
}
if (IS_DIRSYNC(dir))
- err = write_one_page(page, 1);
+ err = write_one_page(page);
else
unlock_page(page);
return err;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0d65dd72c0f4..0ac8fe63769a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2164,7 +2164,7 @@ extern void filemap_map_pages(struct vm_fault *vmf,
extern int filemap_page_mkwrite(struct vm_fault *vmf);

/* mm/page-writeback.c */
-int write_one_page(struct page *page, int wait);
+int write_one_page(struct page *page);
void task_dirty_inc(struct task_struct *tsk);

/* readahead.c */
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index d8ac2a7fb9e7..de0dbf12e2c1 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2361,15 +2361,16 @@ int do_writepages(struct address_space *mapping, struct writeback_control *wbc)
}

/**
- * write_one_page - write out a single page and optionally wait on I/O
+ * write_one_page - write out a single page and wait on I/O
* @page: the page to write
- * @wait: if true, wait on writeout
*
* The page must be locked by the caller and will be unlocked upon return.
*
- * write_one_page() returns a negative error code if I/O failed.
+ * write_one_page() returns a negative error code if I/O failed. Note that
+ * the address_space is not marked for error. The caller must do this if
+ * needed.
*/
-int write_one_page(struct page *page, int wait)
+int write_one_page(struct page *page)
{
struct address_space *mapping = page->mapping;
int ret = 0;
@@ -2380,13 +2381,12 @@ int write_one_page(struct page *page, int wait)

BUG_ON(!PageLocked(page));

- if (wait)
- wait_on_page_writeback(page);
+ wait_on_page_writeback(page);

if (clear_page_dirty_for_io(page)) {
get_page(page);
ret = mapping->a_ops->writepage(page, &wbc);
- if (ret == 0 && wait) {
+ if (ret == 0) {
wait_on_page_writeback(page);
if (PageError(page))
ret = -EIO;
--
2.9.3

2017-03-08 16:42:09

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 3/9] mm: clear any AS_* errors when returning error on any fsync or close

Currently we don't clear the address space error when there is a -EIO
error on fsynci, due to writeback initiation failure. If writes fail
with -EIO and the mapping is flagged with an AS_EIO or AS_ENOSPC error,
then we can end up returning errors on two fsync calls, even when a
write between them succeeded (or there was no write).

Ensure that we also clear out any mapping errors when initiating
writeback fails with -EIO in filemap_write_and_wait and
filemap_write_and_wait_range.

Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
mm/filemap.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 1694623a6289..fc123b9833e1 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -488,7 +488,7 @@ EXPORT_SYMBOL(filemap_fdatawait);

int filemap_write_and_wait(struct address_space *mapping)
{
- int err = 0;
+ int err;

if ((!dax_mapping(mapping) && mapping->nrpages) ||
(dax_mapping(mapping) && mapping->nrexceptional)) {
@@ -499,10 +499,18 @@ int filemap_write_and_wait(struct address_space *mapping)
* But the -EIO is special case, it may indicate the worst
* thing (e.g. bug) happened, so we avoid waiting for it.
*/
- if (err != -EIO) {
+ if (likely(err != -EIO)) {
int err2 = filemap_fdatawait(mapping);
if (!err)
err = err2;
+ } else {
+ /*
+ * Clear the error in the address space since we're
+ * returning an error here. -EIO takes precedence over
+ * everything else though, so we can just discard
+ * the return here.
+ */
+ filemap_check_errors(mapping);
}
} else {
err = filemap_check_errors(mapping);
@@ -537,6 +545,14 @@ int filemap_write_and_wait_range(struct address_space *mapping,
lstart, lend);
if (!err)
err = err2;
+ } else {
+ /*
+ * Clear the error in the address space since we're
+ * returning an error here. -EIO takes precedence over
+ * everything else though, so we can just discard
+ * the return here.
+ */
+ filemap_check_errors(mapping);
}
} else {
err = filemap_check_errors(mapping);
--
2.9.3

2017-03-08 16:42:07

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 8/9] mm: don't TestClearPageError in __filemap_fdatawait_range

The -EIO returned here can end up overriding whatever error is marked in
the address space. This means that an -ENOSPC error (AS_ENOSPC) can end
up being turned into -EIO if a page gets PG_error set on it during error
handling.

Read errors are also sometimes tracked on a per-page level using
PG_error. Suppose we have a read error on a page, and then that page is
subsequently dirtied by overwriting the whole page. Writeback doesn't
clear PG_error, so we can then end up successfully writing back that
page and still return -EIO on fsync.

Worse yet, PG_error is cleared during a sync() syscall, but the -EIO
return from this code is silently discarded. Any subsystem that is
relying on PG_error to report errors during fsync or close is already
broken due to this. All you need is a stray sync() call on the box
at the wrong time and you've lost the error.

Since the handling of the PG_error flag is somewhat inconsistent across
subsystems, let's just rely on marking the address space when there are
writeback errors. Change the TestClearPageError call to ClearPageError,
and make __filemap_fdatawait_range a void return function.

Signed-off-by: Jeff Layton <[email protected]>
---
mm/filemap.c | 20 +++++---------------
1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index fc123b9833e1..150559e94f8a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -376,17 +376,16 @@ int filemap_flush(struct address_space *mapping)
}
EXPORT_SYMBOL(filemap_flush);

-static int __filemap_fdatawait_range(struct address_space *mapping,
+static void __filemap_fdatawait_range(struct address_space *mapping,
loff_t start_byte, loff_t end_byte)
{
pgoff_t index = start_byte >> PAGE_SHIFT;
pgoff_t end = end_byte >> PAGE_SHIFT;
struct pagevec pvec;
int nr_pages;
- int ret = 0;

if (end_byte < start_byte)
- goto out;
+ return;

pagevec_init(&pvec, 0);
while ((index <= end) &&
@@ -403,14 +402,11 @@ static int __filemap_fdatawait_range(struct address_space *mapping,
continue;

wait_on_page_writeback(page);
- if (TestClearPageError(page))
- ret = -EIO;
+ ClearPageError(page);
}
pagevec_release(&pvec);
cond_resched();
}
-out:
- return ret;
}

/**
@@ -430,14 +426,8 @@ static int __filemap_fdatawait_range(struct address_space *mapping,
int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte,
loff_t end_byte)
{
- int ret, ret2;
-
- ret = __filemap_fdatawait_range(mapping, start_byte, end_byte);
- ret2 = filemap_check_errors(mapping);
- if (!ret)
- ret = ret2;
-
- return ret;
+ __filemap_fdatawait_range(mapping, start_byte, end_byte);
+ return filemap_check_errors(mapping);
}
EXPORT_SYMBOL(filemap_fdatawait_range);

--
2.9.3

2017-03-08 16:55:07

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 6/9] mm: set mapping error when launder_pages fails

If launder_page fails, then we hit a problem writing back some inode
data. Ensure that we communicate that fact in a subsequent fsync since
another task could still have it open for write.

Signed-off-by: Jeff Layton <[email protected]>
---
mm/truncate.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index 6263affdef88..29ae420a5bf9 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -594,11 +594,15 @@ invalidate_complete_page2(struct address_space *mapping, struct page *page)

static int do_launder_page(struct address_space *mapping, struct page *page)
{
+ int ret;
+
if (!PageDirty(page))
return 0;
if (page->mapping != mapping || mapping->a_ops->launder_page == NULL)
return 0;
- return mapping->a_ops->launder_page(page);
+ ret = mapping->a_ops->launder_page(page);
+ mapping_set_error(mapping, ret);
+ return ret;
}

/**
--
2.9.3

2017-03-08 16:55:15

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 9/9] Documentation: document what to do on a writeback error

There's no real guidance on this for filesystem authors, so add a
paragraph to vfs.txt that explains how this should be handled.

Signed-off-by: Jeff Layton <[email protected]>
---
Documentation/filesystems/vfs.txt | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 569211703721..527370fbab39 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -577,6 +577,13 @@ should clear PG_Dirty and set PG_Writeback. It can be actually
written at any point after PG_Dirty is clear. Once it is known to be
safe, PG_Writeback is cleared.

+If there is an error during writeback, then the address_space should be
+marked with an AS_EIO or AS_ENOSPC error, in order to ensure that the
+error will be reported to the application at fsync or close. Most
+writepage callers will do this automatically if writepage returns an
+error, but writepages implementations generally need to ensure this
+themselves.
+
Writeback makes use of a writeback_control structure...

struct address_space_operations
--
2.9.3

2017-03-08 16:55:06

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 5/9] dax: set error in mapping when writeback fails

In order to get proper error codes from fsync, we must set an error in
the mapping range when writeback fails.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/dax.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index de622d4282a6..a601137286ed 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -892,8 +892,10 @@ int dax_writeback_mapping_range(struct address_space *mapping,

ret = dax_writeback_one(bdev, mapping, indices[i],
pvec.pages[i]);
- if (ret < 0)
+ if (ret < 0) {
+ mapping_set_error(mapping, ret);
return ret;
+ }
}
}
return 0;
--
2.9.3

2017-03-08 16:55:04

by Jeff Layton

[permalink] [raw]
Subject: [PATCH v2 4/9] nilfs2: set the mapping error when calling SetPageError on writeback

In a later patch, we're going to want to make the fsync codepath not do
a TestClearPageError call as that can override the error set in the
address space. To do that though, we need to ensure that filesystems
that are relying on the PG_error bit for reporting writeback errors
also set an error in the address space.

Ensure that this is set in nilfs2.

Cc: Ryusuke Konishi <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/nilfs2/segment.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index febed1217b3f..612d4b446793 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1745,6 +1745,7 @@ static void nilfs_end_page_io(struct page *page, int err)
} else {
__set_page_dirty_nobuffers(page);
SetPageError(page);
+ mapping_set_error(page_mapping(page), err);
}

end_page_writeback(page);
--
2.9.3

2017-03-08 18:02:22

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v2 6/9] mm: set mapping error when launder_pages fails

On Wed, 2017-03-08 at 11:29 -0500, Jeff Layton wrote:
> If launder_page fails, then we hit a problem writing back some inode
> data. Ensure that we communicate that fact in a subsequent fsync
> since
> another task could still have it open for write.
>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
>  mm/truncate.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 6263affdef88..29ae420a5bf9 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -594,11 +594,15 @@ invalidate_complete_page2(struct address_space
> *mapping, struct page *page)
>  
>  static int do_launder_page(struct address_space *mapping, struct
> page *page)
>  {
> + int ret;
> +
>   if (!PageDirty(page))
>   return 0;
>   if (page->mapping != mapping || mapping->a_ops->launder_page
> == NULL)
>   return 0;
> - return mapping->a_ops->launder_page(page);
> + ret = mapping->a_ops->launder_page(page);
> + mapping_set_error(mapping, ret);
> + return ret;
>  }
>  
>  /**

No. At that layer, you don't know that this is a page error. In the NFS
case, it could, for instance, just as well be a fatal signal.

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]

2017-03-08 18:47:30

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 6/9] mm: set mapping error when launder_pages fails

On Wed, 2017-03-08 at 18:01 +0000, Trond Myklebust wrote:
> On Wed, 2017-03-08 at 11:29 -0500, Jeff Layton wrote:
> > If launder_page fails, then we hit a problem writing back some inode
> > data. Ensure that we communicate that fact in a subsequent fsync
> > since
> > another task could still have it open for write.
> >
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> >  mm/truncate.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/truncate.c b/mm/truncate.c
> > index 6263affdef88..29ae420a5bf9 100644
> > --- a/mm/truncate.c
> > +++ b/mm/truncate.c
> > @@ -594,11 +594,15 @@ invalidate_complete_page2(struct address_space
> > *mapping, struct page *page)
> >  
> >  static int do_launder_page(struct address_space *mapping, struct
> > page *page)
> >  {
> > + int ret;
> > +
> >   if (!PageDirty(page))
> >   return 0;
> >   if (page->mapping != mapping || mapping->a_ops->launder_page
> > == NULL)
> >   return 0;
> > - return mapping->a_ops->launder_page(page);
> > + ret = mapping->a_ops->launder_page(page);
> > + mapping_set_error(mapping, ret);
> > + return ret;
> >  }
> >  
> >  /**
>
> No. At that layer, you don't know that this is a page error. In the NFS
> case, it could, for instance, just as well be a fatal signal.
>

Ok...don't we have the same problem with writepage then? Most of the
writepage callers will set an error in the mapping if writepage returns
any sort of error? A fatal signal in that codepath could cause the same
problem, it seems. We don't dip into direct reclaim so much anymore, so
maybe signals aren't an issue there?

The alternative here would be to push this down into the callers. I
worry a bit though about getting this right across filesystems though.
It'd be preferable it if we could keep the mapping_set_error call in
generic VFS code instead, but if not then I'll just plan to do that.

Thanks,
--
Jeff Layton <[email protected]>

2017-03-08 19:20:59

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH v2 6/9] mm: set mapping error when launder_pages fails

On Wed, 2017-03-08 at 13:38 -0500, Jeff Layton wrote:
> On Wed, 2017-03-08 at 18:01 +0000, Trond Myklebust wrote:
> > On Wed, 2017-03-08 at 11:29 -0500, Jeff Layton wrote:
> > > If launder_page fails, then we hit a problem writing back some
> > > inode
> > > data. Ensure that we communicate that fact in a subsequent fsync
> > > since
> > > another task could still have it open for write.
> > >
> > > Signed-off-by: Jeff Layton <[email protected]>
> > > ---
> > >  mm/truncate.c | 6 +++++-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/truncate.c b/mm/truncate.c
> > > index 6263affdef88..29ae420a5bf9 100644
> > > --- a/mm/truncate.c
> > > +++ b/mm/truncate.c
> > > @@ -594,11 +594,15 @@ invalidate_complete_page2(struct
> > > address_space
> > > *mapping, struct page *page)
> > >  
> > >  static int do_launder_page(struct address_space *mapping, struct
> > > page *page)
> > >  {
> > > + int ret;
> > > +
> > >   if (!PageDirty(page))
> > >   return 0;
> > >   if (page->mapping != mapping || mapping->a_ops-
> > > >launder_page 
> > > == NULL)
> > >   return 0;
> > > - return mapping->a_ops->launder_page(page);
> > > + ret = mapping->a_ops->launder_page(page);
> > > + mapping_set_error(mapping, ret);
> > > + return ret;
> > >  }
> > >  
> > >  /**
> >
> > No. At that layer, you don't know that this is a page error. In the
> > NFS
> > case, it could, for instance, just as well be a fatal signal.
> >
>
> Ok...don't we have the same problem with writepage then? Most of the
> writepage callers will set an error in the mapping if writepage
> returns
> any sort of error? A fatal signal in that codepath could cause the
> same
> problem, it seems. We don't dip into direct reclaim so much anymore,
> so
> maybe signals aren't an issue there?

If writepage() fails due to a signal, then it has the option of marking
the page as dirty and returning AOP_WRITEPAGE_ACTIVATE. That's not
possible for launder_page().

> The alternative here would be to push this down into the callers. I
> worry a bit though about getting this right across filesystems
> though.
> It'd be preferable it if we could keep the mapping_set_error call in
> generic VFS code instead, but if not then I'll just plan to do that.
>



--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]

2017-03-08 22:45:52

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH v2 6/9] mm: set mapping error when launder_pages fails

On Thu, Mar 09 2017, Trond Myklebust wrote:

> On Wed, 2017-03-08 at 11:29 -0500, Jeff Layton wrote:
>> If launder_page fails, then we hit a problem writing back some inode
>> data. Ensure that we communicate that fact in a subsequent fsync
>> since
>> another task could still have it open for write.
>>
>> Signed-off-by: Jeff Layton <[email protected]>
>> ---
>>  mm/truncate.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/truncate.c b/mm/truncate.c
>> index 6263affdef88..29ae420a5bf9 100644
>> --- a/mm/truncate.c
>> +++ b/mm/truncate.c
>> @@ -594,11 +594,15 @@ invalidate_complete_page2(struct address_space
>> *mapping, struct page *page)
>>  
>>  static int do_launder_page(struct address_space *mapping, struct
>> page *page)
>>  {
>> + int ret;
>> +
>>   if (!PageDirty(page))
>>   return 0;
>>   if (page->mapping != mapping || mapping->a_ops->launder_page
>> == NULL)
>>   return 0;
>> - return mapping->a_ops->launder_page(page);
>> + ret = mapping->a_ops->launder_page(page);
>> + mapping_set_error(mapping, ret);
>> + return ret;
>>  }
>>  
>>  /**
>
> No. At that layer, you don't know that this is a page error. In the NFS
> case, it could, for instance, just as well be a fatal signal.
>

In that case, would 'ret' be ERESTARTSYS or EAGAIN or similar?
Should mapping_set_error() ignore those?

Thanks,
NeilBrown

> --
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> [email protected]


Attachments:
signature.asc (832.00 B)

2017-03-08 23:16:18

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH v2 3/9] mm: clear any AS_* errors when returning error on any fsync or close

On Thu, Mar 09 2017, Jeff Layton wrote:

> Currently we don't clear the address space error when there is a -EIO
> error on fsynci, due to writeback initiation failure. If writes fail
> with -EIO and the mapping is flagged with an AS_EIO or AS_ENOSPC error,
> then we can end up returning errors on two fsync calls, even when a
> write between them succeeded (or there was no write).
>
> Ensure that we also clear out any mapping errors when initiating
> writeback fails with -EIO in filemap_write_and_wait and
> filemap_write_and_wait_range.

This change appears to assume that filemap_write_and_wait* is only
called from fsync() (or similar) and the return status is always
checked.

A __must_check annotation might be helpful.

It would catch v9_fs_file_lock(), afs_setattr() and others.

While I think your change is probably heading in the right direction,
there seem to be some loose ends still.

Thanks,
NeilBrown


>
> Suggested-by: Jan Kara <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> mm/filemap.c | 20 ++++++++++++++++++--
> 1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 1694623a6289..fc123b9833e1 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -488,7 +488,7 @@ EXPORT_SYMBOL(filemap_fdatawait);
>
> int filemap_write_and_wait(struct address_space *mapping)
> {
> - int err = 0;
> + int err;
>
> if ((!dax_mapping(mapping) && mapping->nrpages) ||
> (dax_mapping(mapping) && mapping->nrexceptional)) {
> @@ -499,10 +499,18 @@ int filemap_write_and_wait(struct address_space *mapping)
> * But the -EIO is special case, it may indicate the worst
> * thing (e.g. bug) happened, so we avoid waiting for it.
> */
> - if (err != -EIO) {
> + if (likely(err != -EIO)) {
> int err2 = filemap_fdatawait(mapping);
> if (!err)
> err = err2;
> + } else {
> + /*
> + * Clear the error in the address space since we're
> + * returning an error here. -EIO takes precedence over
> + * everything else though, so we can just discard
> + * the return here.
> + */
> + filemap_check_errors(mapping);
> }
> } else {
> err = filemap_check_errors(mapping);
> @@ -537,6 +545,14 @@ int filemap_write_and_wait_range(struct address_space *mapping,
> lstart, lend);
> if (!err)
> err = err2;
> + } else {
> + /*
> + * Clear the error in the address space since we're
> + * returning an error here. -EIO takes precedence over
> + * everything else though, so we can just discard
> + * the return here.
> + */
> + filemap_check_errors(mapping);
> }
> } else {
> err = filemap_check_errors(mapping);
> --
> 2.9.3


Attachments:
signature.asc (832.00 B)

2017-03-09 00:10:23

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 3/9] mm: clear any AS_* errors when returning error on any fsync or close

On Thu, 2017-03-09 at 08:23 +1100, NeilBrown wrote:
> On Thu, Mar 09 2017, Jeff Layton wrote:
>
> > Currently we don't clear the address space error when there is a -EIO
> > error on fsynci, due to writeback initiation failure. If writes fail
> > with -EIO and the mapping is flagged with an AS_EIO or AS_ENOSPC error,
> > then we can end up returning errors on two fsync calls, even when a
> > write between them succeeded (or there was no write).
> >
> > Ensure that we also clear out any mapping errors when initiating
> > writeback fails with -EIO in filemap_write_and_wait and
> > filemap_write_and_wait_range.
>
> This change appears to assume that filemap_write_and_wait* is only
> called from fsync() (or similar) and the return status is always
> checked.
>
> A __must_check annotation might be helpful.
>

Yes, good idea.

> It would catch v9_fs_file_lock(), afs_setattr() and others.
>

Ouch -- good catch.

Actually, those look like bugs in the code as it exists today. If some
background page writeback fails, but no write initiation fails on that
call, then those callers are discarding errors that should have been
reported at fsync.

> While I think your change is probably heading in the right direction,
> there seem to be some loose ends still.
>

Yes...I probably should be prefacing all of these patches with [RFC] at
this point.

I think I'm starting to grasp the problem (and its scope), but we might
have to think about how to approach this more strategically. Given that
we have this wrong in so many places, I think that probably means that
the interfaces we have make it easy to do so. I need to consider how to
correct that.

>
>
> >
> > Suggested-by: Jan Kara <[email protected]>
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > mm/filemap.c | 20 ++++++++++++++++++--
> > 1 file changed, 18 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 1694623a6289..fc123b9833e1 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -488,7 +488,7 @@ EXPORT_SYMBOL(filemap_fdatawait);
> >
> > int filemap_write_and_wait(struct address_space *mapping)
> > {
> > - int err = 0;
> > + int err;
> >
> > if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > (dax_mapping(mapping) && mapping->nrexceptional)) {
> > @@ -499,10 +499,18 @@ int filemap_write_and_wait(struct address_space *mapping)
> > * But the -EIO is special case, it may indicate the worst
> > * thing (e.g. bug) happened, so we avoid waiting for it.
> > */
> > - if (err != -EIO) {
> > + if (likely(err != -EIO)) {
> > int err2 = filemap_fdatawait(mapping);
> > if (!err)
> > err = err2;
> > + } else {
> > + /*
> > + * Clear the error in the address space since we're
> > + * returning an error here. -EIO takes precedence over
> > + * everything else though, so we can just discard
> > + * the return here.
> > + */
> > + filemap_check_errors(mapping);
> > }
> > } else {
> > err = filemap_check_errors(mapping);
> > @@ -537,6 +545,14 @@ int filemap_write_and_wait_range(struct address_space *mapping,
> > lstart, lend);
> > if (!err)
> > err = err2;
> > + } else {
> > + /*
> > + * Clear the error in the address space since we're
> > + * returning an error here. -EIO takes precedence over
> > + * everything else though, so we can just discard
> > + * the return here.
> > + */
> > + filemap_check_errors(mapping);
> > }
> > } else {
> > err = filemap_check_errors(mapping);
> > --
> > 2.9.3

--
Jeff Layton <[email protected]>

2017-03-10 00:06:53

by Ross Zwisler

[permalink] [raw]
Subject: Re: [PATCH v2 1/9] mm: fix mapping_set_error call in me_pagecache_dirty

On Wed, Mar 08, 2017 at 11:29:26AM -0500, Jeff Layton wrote:
> The error code should be negative. Since this ends up in the default
> case anyway, this is harmless, but it's less confusing to negate it.
>
> Signed-off-by: Jeff Layton <[email protected]>

Reviewed-by: Ross Zwisler <[email protected]>

2017-03-10 00:07:10

by Ross Zwisler

[permalink] [raw]
Subject: Re: [PATCH v2 2/9] mm: drop "wait" parameter from write_one_page

On Wed, Mar 08, 2017 at 11:29:27AM -0500, Jeff Layton wrote:
> The callers all set it to 1. Also, make it clear that this function will
> not set any sort of AS_* error, and that the caller must do so if
> necessary. No existing caller uses this on normal files, so none of them
> need it.
>
> Signed-off-by: Jeff Layton <[email protected]>

Reviewed-by: Ross Zwisler <[email protected]>

2017-03-10 00:09:43

by Ross Zwisler

[permalink] [raw]
Subject: Re: [PATCH v2 3/9] mm: clear any AS_* errors when returning error on any fsync or close

On Wed, Mar 08, 2017 at 11:29:28AM -0500, Jeff Layton wrote:
> Currently we don't clear the address space error when there is a -EIO
> error on fsynci, due to writeback initiation failure. If writes fail
fsync

> with -EIO and the mapping is flagged with an AS_EIO or AS_ENOSPC error,
> then we can end up returning errors on two fsync calls, even when a
> write between them succeeded (or there was no write).
>
> Ensure that we also clear out any mapping errors when initiating
> writeback fails with -EIO in filemap_write_and_wait and
> filemap_write_and_wait_range.
>
> Suggested-by: Jan Kara <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> mm/filemap.c | 20 ++++++++++++++++++--
> 1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 1694623a6289..fc123b9833e1 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -488,7 +488,7 @@ EXPORT_SYMBOL(filemap_fdatawait);
>
> int filemap_write_and_wait(struct address_space *mapping)
> {
> - int err = 0;
> + int err;
>
> if ((!dax_mapping(mapping) && mapping->nrpages) ||
> (dax_mapping(mapping) && mapping->nrexceptional)) {
> @@ -499,10 +499,18 @@ int filemap_write_and_wait(struct address_space *mapping)
> * But the -EIO is special case, it may indicate the worst
> * thing (e.g. bug) happened, so we avoid waiting for it.
> */
> - if (err != -EIO) {
> + if (likely(err != -EIO)) {

The above two cleanup changes were made only to filemap_write_and_wait(), but
should also probably be done to filemap_write_and_wait_range() to keep them as
consistent as possible?

2017-03-10 00:21:11

by Ross Zwisler

[permalink] [raw]
Subject: Re: [PATCH v2 5/9] dax: set error in mapping when writeback fails

On Wed, Mar 08, 2017 at 11:29:30AM -0500, Jeff Layton wrote:
> In order to get proper error codes from fsync, we must set an error in
> the mapping range when writeback fails.
>
> Signed-off-by: Jeff Layton <[email protected]>

Yep, paired with the changes to filmap_write_and_wait() and
filemap_write_and_wait_range(), this seems fine.

Reviewed-by: Ross Zwisler <[email protected]>

2017-03-10 03:08:58

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 3/9] mm: clear any AS_* errors when returning error on any fsync or close

On Thu, 2017-03-09 at 17:09 -0700, Ross Zwisler wrote:
> On Wed, Mar 08, 2017 at 11:29:28AM -0500, Jeff Layton wrote:
> > Currently we don't clear the address space error when there is a -EIO
> > error on fsynci, due to writeback initiation failure. If writes fail
>
> fsync
>
> > with -EIO and the mapping is flagged with an AS_EIO or AS_ENOSPC error,
> > then we can end up returning errors on two fsync calls, even when a
> > write between them succeeded (or there was no write).
> >
> > Ensure that we also clear out any mapping errors when initiating
> > writeback fails with -EIO in filemap_write_and_wait and
> > filemap_write_and_wait_range.
> >
> > Suggested-by: Jan Kara <[email protected]>
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > mm/filemap.c | 20 ++++++++++++++++++--
> > 1 file changed, 18 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 1694623a6289..fc123b9833e1 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -488,7 +488,7 @@ EXPORT_SYMBOL(filemap_fdatawait);
> >
> > int filemap_write_and_wait(struct address_space *mapping)
> > {
> > - int err = 0;
> > + int err;
> >
> > if ((!dax_mapping(mapping) && mapping->nrpages) ||
> > (dax_mapping(mapping) && mapping->nrexceptional)) {
> > @@ -499,10 +499,18 @@ int filemap_write_and_wait(struct address_space *mapping)
> > * But the -EIO is special case, it may indicate the worst
> > * thing (e.g. bug) happened, so we avoid waiting for it.
> > */
> > - if (err != -EIO) {
> > + if (likely(err != -EIO)) {
>
> The above two cleanup changes were made only to filemap_write_and_wait(), but
> should also probably be done to filemap_write_and_wait_range() to keep them as
> consistent as possible?

Thanks, I fixed that in the patch in my tree. Unfortunately, as Neil
pointed out, there is a bigger problem here...

There are a lot of callers of the filemap_write_and_wait* functions
that never check the return code at all, and some others that call this
from codepaths that where we can't report errors properly. Yet, the
mapping error gets cleared out anyway, which means that fsync will
probably never see it.

So while I doubt this patch will make anything worse, I think we have
to look at fixing those problems first. We need to ensure that when
filemap_check_errors is called, that we're in a codepath where we can
actually report the error to something that can interpret it properly.
Basically, only in write, fsync, msync or close codepaths. For the
others, we need to use something like filemap_fdatawait_keep_errors so
that we don't end up dropping writeback errors onto the floor.

I'm going to look at fixing that up first (maybe as a preliminary
series to this one). There are a lot of callers though, and I don't see
a way around having to go and review all of these callsites
individually. Maybe it's be best to just lift the filemap_check_errors
calls higher in the call stack to ensure that? Not sure...

Anyway...I'm first trying to collect a list of what I think needs
fixing here, and figure out how to break all of this up into manageable
pieces and order it sanely.
--
Jeff Layton <[email protected]>