2010-11-23 14:11:54

by Nick Piggin

[permalink] [raw]
Subject: [patch 4/7] fs: preserve inode dirty bits on failed metadata writeback

Otherwise we think the inode is clean even if syncing failed.

Signed-off-by: Nick Piggin <[email protected]>

Index: linux-2.6/fs/fs-writeback.c
===================================================================
--- linux-2.6.orig/fs/fs-writeback.c 2010-11-23 22:28:22.000000000 +1100
+++ linux-2.6/fs/fs-writeback.c 2010-11-23 22:57:40.000000000 +1100
@@ -447,15 +447,25 @@ writeback_single_inode(struct inode *ino
spin_lock(&inode_lock);
dirty = inode->i_state & I_DIRTY;
inode->i_state &= ~(I_DIRTY_SYNC | I_DIRTY_DATASYNC);
- spin_unlock(&inode_lock);
/* Don't write the inode if only I_DIRTY_PAGES was set */
if (dirty & (I_DIRTY_SYNC | I_DIRTY_DATASYNC)) {
- int err = write_inode(inode, wbc);
+ int err;
+
+ spin_unlock(&inode_lock);
+ err = write_inode(inode, wbc);
if (ret == 0)
ret = err;
+ spin_lock(&inode_lock);
+ if (err) {
+ /*
+ * Inode writeout failed, restore inode metadata
+ * dirty bits.
+ */
+ inode->i_state |= dirty &
+ (I_DIRTY_SYNC | I_DIRTY_DATASYNC);
+ }
}

- spin_lock(&inode_lock);
if (!inode_writeback_end(inode)) {
if (wbc->nr_to_write <= 0) {
/*


2010-11-29 14:59:38

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [patch 4/7] fs: preserve inode dirty bits on failed metadata writeback

On Wed, Nov 24, 2010 at 01:06:14AM +1100, [email protected] wrote:
> Otherwise we think the inode is clean even if syncing failed.

The patch itself looks fine, but I'm not sure it's enough. If we do
an synchronous writeout it could fail long after ->write_inode
has returned.

2010-11-30 00:08:37

by Nick Piggin

[permalink] [raw]
Subject: Re: [patch 4/7] fs: preserve inode dirty bits on failed metadata writeback

On Mon, Nov 29, 2010 at 09:59:37AM -0500, Christoph Hellwig wrote:
> On Wed, Nov 24, 2010 at 01:06:14AM +1100, [email protected] wrote:
> > Otherwise we think the inode is clean even if syncing failed.
>
> The patch itself looks fine, but I'm not sure it's enough. If we do
> an synchronous writeout it could fail long after ->write_inode
> has returned.

Oh there are lots of holes in this buggy POS. Still more that I
haven't fixed, even before you think about error cases.

But, after a *successful* ->write_inode (whether sync or async),
then the filesystem is not going to get any more, unless they
mark the inode dirty again.

I think that's fine, so long as the dirty buffers or whatever are
properly synced at sync(2) / fsync(2) time.