2008-01-15 12:50:56

by Wu Fengguang

Subject: [PATCH 09/13] writeback: requeue_io() on redirtied inode

Redirtied inodes can be seen during really fast writes.
They should be synced as soon as possible.

redirty_tail() can delay writeback of the inode by up to 30s
(the default dirty expire interval).
Kill the delay by using requeue_io() instead.
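To make the "up to 30s" concrete, here is a hypothetical userspace model of the two choices, not kernel code. It assumes three per-superblock lists named after s_dirty, s_io and s_more_io, and a 30s expire interval (dirty_expire_centisecs = 3000). A redirty_tail()-style move puts the inode back on the dirty list, where kupdate only picks it up once it has aged past the expire interval. A requeue_io()-style move parks it on the "more io" list, which is revisited on the next pass. The real redirty_tail() may keep the old timestamp when that preserves the ordering of s_dirty. This model always refreshes it, i.e. the worst case the changelog describes.

```c
#include <assert.h>

/* Hypothetical model: names mirror fs/fs-writeback.c, but this is not kernel code. */
#define MODEL_DIRTY_EXPIRE_MS 30000L /* assumed default: dirty_expire_centisecs = 3000 */

enum model_queue { MODEL_S_DIRTY, MODEL_S_IO, MODEL_S_MORE_IO };

struct model_inode {
	enum model_queue where; /* which list the inode currently sits on */
	long dirtied_when_ms;   /* timestamp compared against the expire interval */
};

/*
 * Simplified redirty_tail(): back onto the dirty list, treated as freshly
 * dirtied (worst case; the real function does not always bump the time).
 */
void model_redirty_tail(struct model_inode *ino, long now_ms)
{
	assert(ino->where == MODEL_S_IO); /* only inodes under writeback get here */
	ino->where = MODEL_S_DIRTY;
	ino->dirtied_when_ms = now_ms;
}

/* Simplified requeue_io(): park the inode for the next writeback pass. */
void model_requeue_io(struct model_inode *ino)
{
	assert(ino->where == MODEL_S_IO);
	ino->where = MODEL_S_MORE_IO;
}

/*
 * Earliest time kupdate-style writeback would look at the inode again:
 * dirty-list inodes must first age past the expire interval, while
 * inodes on the io / more-io lists are handled on the next pass.
 */
long model_eligible_at(const struct model_inode *ino, long now_ms)
{
	if (ino->where == MODEL_S_DIRTY)
		return ino->dirtied_when_ms + MODEL_DIRTY_EXPIRE_MS;
	return now_ms;
}
```

For an inode redirtied at t = 1000 ms, model_redirty_tail() makes it eligible again only at 31000 ms, while model_requeue_io() makes it eligible at once. That delay is what the patch removes.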

Cc: Michael Rubin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Fengguang Wu <[email protected]>
---
fs/fs-writeback.c | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)

--- linux-mm.orig/fs/fs-writeback.c
+++ linux-mm/fs/fs-writeback.c
@@ -294,7 +294,7 @@ __sync_single_inode(struct inode *inode,
* Someone redirtied the inode while were writing back
* the pages.
*/
- redirty_tail(inode);
+ requeue_io(inode);
} else if (atomic_read(&inode->i_count)) {
/*
* The inode is clean, inuse

--


2008-01-16 08:13:36

by David Chinner

Subject: Re: [PATCH 09/13] writeback: requeue_io() on redirtied inode

On Tue, Jan 15, 2008 at 08:36:46PM +0800, Fengguang Wu wrote:
> Redirtied inodes could be seen in really fast writes.
> They should really be synced as soon as possible.
>
> redirty_tail() could delay the inode for up to 30s.
> Kill the delay by using requeue_io() instead.

That's actually bad for anything that does delayed allocation
or updates state on data I/O completion.

e.g. XFS, when writing past EOF doing delalloc, dirties the inode
during writeout (allocation) and then updates the file size on data
I/O completion, hence dirtying the inode again.

With this change, writing the last pages out would result
in hitting this code and causing the inode to be flushed very
soon after the data write. Then, after the inode write is issued,
we get data I/O completion, which dirties the inode again,
resulting in needing to write the inode again to clean it.
i.e. it introduces a potential new and useless inode write
I/O.
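The sequence above can be reduced to a hypothetical counting model (not XFS or kernel code). It assumes the data write is issued at t = 0. The allocation during writeout dirties the inode once, and the size update at data I/O completion (io_done_ms) dirties it again. Under requeue_io() the first inode write is modelled as immediate. Under redirty_tail() it is modelled as waiting out an assumed 30s expire interval. The model ignores the pinned-inode case raised in the next paragraph.

```c
/* Hypothetical model of the delalloc double-dirty case; not kernel code. */
#define MODEL_DIRTY_EXPIRE_MS 30000L /* assumed default expire interval */

enum model_policy { MODEL_REQUEUE_IO, MODEL_REDIRTY_TAIL };

/* When the first inode write is issued, relative to the data write at t = 0. */
long model_first_inode_write(enum model_policy policy)
{
	/* requeue_io(): picked up again on the next pass, modelled as immediate */
	if (policy == MODEL_REQUEUE_IO)
		return 0;
	/* redirty_tail(): waits for the expire interval (worst case) */
	return MODEL_DIRTY_EXPIRE_MS;
}

/* Number of inode writes needed before the inode is finally clean. */
int model_inode_writes(enum model_policy policy, long io_done_ms)
{
	int writes = 1; /* the allocation dirtied the inode: one write is unavoidable */

	/*
	 * If the inode went out before data I/O completed, the size update
	 * at completion redirties it and a second inode write is needed.
	 */
	if (model_first_inode_write(policy) < io_done_ms)
		writes++;
	return writes;
}
```

With data I/O completing 10 ms after submission, the requeue_io() policy needs two inode writes and the redirty_tail() policy needs one. That extra write is the "new and useless inode write I/O".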

Also, the immediate inode write may be useless for XFS because the
inode may be pinned in memory due to async transactions
still in flight (e.g. from delalloc) so we've got two
situations where flushing the inode immediately is suboptimal.

Hence I don't think this is an optimisation that should be made
in the generic writeback code.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2008-01-17 04:23:30

by Wu Fengguang

Subject: Re: [PATCH 09/13] writeback: requeue_io() on redirtied inode

On Wed, Jan 16, 2008 at 07:13:07PM +1100, David Chinner wrote:
> On Tue, Jan 15, 2008 at 08:36:46PM +0800, Fengguang Wu wrote:
> > Redirtied inodes could be seen in really fast writes.
> > They should really be synced as soon as possible.
> >
> > redirty_tail() could delay the inode for up to 30s.
> > Kill the delay by using requeue_io() instead.
>
> That's actually bad for anything that does delayed allocation
> or updates state on data I/O completion.
>
> e.g. XFS, when writing past EOF doing delalloc, dirties the inode
> during writeout (allocation) and then updates the file size on data
> I/O completion, hence dirtying the inode again.
>
> With this change, writing the last pages out would result
> in hitting this code and causing the inode to be flushed very
> soon after the data write. Then, after the inode write is issued,
> we get data I/O completion, which dirties the inode again,
> resulting in needing to write the inode again to clean it.
> i.e. it introduces a potential new and useless inode write
> I/O.
>
> Also, the immediate inode write may be useless for XFS because the
> inode may be pinned in memory due to async transactions
> still in flight (e.g. from delalloc) so we've got two
> situations where flushing the inode immediately is suboptimal.
>
> Hence I don't think this is an optimisation that should be made
> in the generic writeback code.

Thanks for the explanation.
I can confirm that many requeue_io() calls happened for the same XFS inode:
[ 158.794562] requeue_io 328: inode 5243009 size 34647 at 03:03(hda3)
[ 158.794827] mm/page-writeback.c 668 wb_kupdate: pdflush(183) 14209 global 486 10 0 wc _M tw 1013 sk 0
[ 158.795293] requeue_io 328: inode 5243009 size 34647 at 03:03(hda3)
[ 158.795313] mm/page-writeback.c 668 wb_kupdate: pdflush(183) 14198 global 486 10 0 wc _M tw 1024 sk 0
...
[ 170.713900] requeue_io 328: inode 5243009 size 34647 at 03:03(hda3)
[ 170.713925] mm/page-writeback.c 668 wb_kupdate: pdflush(183) 14198 global 1875 0 0 wc _M tw 1024 sk 0
[ 170.813584] mm/page-writeback.c 668 wb_kupdate: pdflush(183) 14198 global 2855 0 0 wc __ tw 1024 sk 0