LinuxLists.cc - [PATCH 04/33] NFS: Fix nfs_wb_page() to always exit with an error or a clean page

2008-04-19 20:50:31

Subject: [PATCH 04/33] NFS: Fix nfs_wb_page() to always exit with an error or a clean page

It is possible for nfs_wb_page() to sometimes exit with 0 return value, yet
the page is left in a dirty state.
For instance in the case where the server rebooted, and the COMMIT request
failed, then all the previously "clean" pages which were cached by the
server, but were not guaranteed to have been writted out to disk,
have to be redirtied and resent to the server.
The fix is to have nfs_wb_page_priority() check that the page is clean
before it exits...

This fixes a condition that triggers the BUG_ON(PagePrivate(page)) in
nfs_create_request() when we're in the nfs_readpage() path.

Also eliminate a redundant BUG_ON(!PageLocked(page)) while we're at it. It
turns out that clear_page_dirty_for_io() has the exact same test.

Signed-off-by: Trond Myklebust <[email protected]>
---

fs/nfs/write.c | 23 ++++++++++++-----------
1 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ce40cad..997b42a 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1493,18 +1493,19 @@ static int nfs_wb_page_priority(struct inode *inode, struct page *page,
};
int ret;

- BUG_ON(!PageLocked(page));
- if (clear_page_dirty_for_io(page)) {
- ret = nfs_writepage_locked(page, &wbc);
+ do {
+ if (clear_page_dirty_for_io(page)) {
+ ret = nfs_writepage_locked(page, &wbc);
+ if (ret < 0)
+ goto out_error;
+ } else if (!PagePrivate(page))
+ break;
+ ret = nfs_sync_mapping_wait(page->mapping, &wbc, how);
if (ret < 0)
- goto out;
- }
- if (!PagePrivate(page))
- return 0;
- ret = nfs_sync_mapping_wait(page->mapping, &wbc, how);
- if (ret >= 0)
- return 0;
-out:
+ goto out_error;
+ } while (PagePrivate(page));
+ return 0;
+out_error:
__mark_inode_dirty(inode, I_DIRTY_PAGES);
return ret;
}

2008-04-21 18:55:33

by Chuck Lever III

[permalink] [raw]

Subject: Re: [PATCH 04/33] NFS: Fix nfs_wb_page() to always exit with an error or a clean page

Hi Trond-

On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
> It is possible for nfs_wb_page() to sometimes exit with 0 return
> value, yet
> the page is left in a dirty state.
> For instance in the case where the server rebooted, and the COMMIT
> request
> failed, then all the previously "clean" pages which were cached by the
> server, but were not guaranteed to have been writted out to disk,
> have to be redirtied and resent to the server.
> The fix is to have nfs_wb_page_priority() check that the page is clean
> before it exits...

We have somewhat similar logic in nfs_wb_page_cancel and
__nfs_write_mapping. How is it that these two are not affected by the
"server reboot / commit failed" scenario? Do they simply retry until
the resent write succeeds?

> This fixes a condition that triggers the BUG_ON(PagePrivate(page)) in
> nfs_create_request() when we're in the nfs_readpage() path.

Seems like there's more at stake here than just triggering a BUG.

In the launder_page path, dirty data could be lost if nfs_wb_page
returns zero, but hasn't successfully flushed the data to the server,
right?

In the nfs_flush_incompatible path, you would lose write ordering at
the least... couldn't that potentially result in data corruption?

I don't recall the test case that triggered the BUG. Do we have a
test that is run often (eg. fsx or fsx-odirect) that exercises this
path?

> Also eliminate a redundant BUG_ON(!PageLocked(page)) while we're at
> it. It
> turns out that clear_page_dirty_for_io() has the exact same test.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
>
> fs/nfs/write.c | 23 ++++++++++++-----------
> 1 files changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index ce40cad..997b42a 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1493,18 +1493,19 @@ static int nfs_wb_page_priority(struct inode
> *inode, struct page *page,
> };
> int ret;
>
> - BUG_ON(!PageLocked(page));
> - if (clear_page_dirty_for_io(page)) {
> - ret = nfs_writepage_locked(page, &wbc);
> + do {
> + if (clear_page_dirty_for_io(page)) {
> + ret = nfs_writepage_locked(page, &wbc);
> + if (ret < 0)
> + goto out_error;
> + } else if (!PagePrivate(page))
> + break;
> + ret = nfs_sync_mapping_wait(page->mapping, &wbc, how);
> if (ret < 0)
> - goto out;
> - }
> - if (!PagePrivate(page))
> - return 0;
> - ret = nfs_sync_mapping_wait(page->mapping, &wbc, how);
> - if (ret >= 0)
> - return 0;
> -out:
> + goto out_error;
> + } while (PagePrivate(page));
> + return 0;
> +out_error:
> __mark_inode_dirty(inode, I_DIRTY_PAGES);
> return ret;
> }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs"
> in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2008-04-22 00:14:09

by Myklebust, Trond

[permalink] [raw]

Subject: Re: [PATCH 04/33] NFS: Fix nfs_wb_page() to always exit with an error or a clean page

On Mon, 2008-04-21 at 14:53 -0400, Chuck Lever wrote:
> Hi Trond-
>
> On Apr 19, 2008, at 4:40 PM, Trond Myklebust wrote:
> > It is possible for nfs_wb_page() to sometimes exit with 0 return
> > value, yet
> > the page is left in a dirty state.
> > For instance in the case where the server rebooted, and the COMMIT
> > request
> > failed, then all the previously "clean" pages which were cached by the
> > server, but were not guaranteed to have been writted out to disk,
> > have to be redirtied and resent to the server.
> > The fix is to have nfs_wb_page_priority() check that the page is clean
> > before it exits...
>
> We have somewhat similar logic in nfs_wb_page_cancel and
> __nfs_write_mapping. How is it that these two are not affected by the
> "server reboot / commit failed" scenario? Do they simply retry until
> the resent write succeeds?

Who said they're not affected?

The difference is that nfs_wb_page_cancel() will attempt to clear all
requests that are not committed instead of redirtying them.

OTOH, nfs_write_mapping() can and will leave dirty pages under certain
circumstances.

> > This fixes a condition that triggers the BUG_ON(PagePrivate(page)) in
> > nfs_create_request() when we're in the nfs_readpage() path.
>
> Seems like there's more at stake here than just triggering a BUG.
>
> In the launder_page path, dirty data could be lost if nfs_wb_page
> returns zero, but hasn't successfully flushed the data to the server,
> right?
>
> In the nfs_flush_incompatible path, you would lose write ordering at
> the least... couldn't that potentially result in data corruption?

Yes.

> I don't recall the test case that triggered the BUG. Do we have a
> test that is run often (eg. fsx or fsx-odirect) that exercises this
> path?

You would have to either cause the RPC call to fail due to something
like an ENOMEM error, or you would have to have a server reboot at the
'right' moment.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com