2008-01-15 12:51:42

by Wu Fengguang

Subject: [PATCH 00/13] writeback bug fixes and simplifications take 2

Andrew,

This patchset mainly polishes the writeback queuing policies.
The main goals are:

(1) small files should not be starved by big dirty files
(2) sync as fast as possible for not-blocked inodes/pages
- don't leave them out; no congestion_wait() in between them
(3) avoid busy iowait for blocked inodes
- retry them in the next go of s_io (maybe at the next wakeup of pdflush)

The role of the queues:

s_dirty: park for dirtied_when expiration
s_io: park for io submission
s_more_io: for big dirty inodes; they will be retried in this run of pdflush
(this ensures fairness between small and large files)
s_more_io_wait: for blocked inodes; they will be picked up in the next run of s_io


This patchset is in better shape, but still not ready for merge.
It begins with:

[PATCH 01/13] writeback: revert 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b
[PATCH 02/13] writeback: clear PAGECACHE_TAG_DIRTY for truncated page in block_write_full_page()

Introduces more_io/more_io_wait based policies:

[PATCH 03/13] writeback: introduce writeback_control.more_io
[PATCH 04/13] writeback: introduce super_block.s_more_io_wait
[PATCH 05/13] writeback: merge duplicate code into writeback_some_pages()
[PATCH 06/13] writeback: defer writeback on not-all-pages-written
[PATCH 07/13] writeback: defer writeback on locked inode
[PATCH 08/13] writeback: defer writeback on locked buffers
[PATCH 09/13] writeback: requeue_io() on redirtied inode

And finishes with some code cleanups:

[PATCH 10/13] writeback: introduce queue_dirty()
[PATCH 11/13] writeback: queue_dirty() on memory-backed bdi
[PATCH 12/13] writeback: remove redirty_tail()
[PATCH 13/13] writeback: cleanup __sync_single_inode()

Diffstat:

fs/buffer.c | 2
fs/fs-writeback.c | 121 +++++++++++++++---------------------------
fs/super.c | 1
include/linux/fs.h | 1
mm/page-writeback.c | 46 +++++++--------
5 files changed, 72 insertions(+), 99 deletions(-)

Regards,
Fengguang Wu
--


2008-01-15 18:33:21

by Michael Rubin

Subject: Re: [PATCH 00/13] writeback bug fixes and simplifications take 2

On Jan 15, 2008 4:36 AM, Fengguang Wu <[email protected]> wrote:
> Andrew,
>
> This patchset mainly polishes the writeback queuing policies.

Anyone know which tree this patchset is based on?

> The main goals are:
>
> (1) small files should not be starved by big dirty files
> (2) sync as fast as possible for not-blocked inodes/pages
> - don't leave them out; no congestion_wait() in between them
> (3) avoid busy iowait for blocked inodes
> - retry them in the next go of s_io(maybe at the next wakeup of pdflush)
>

Fengguang, do you have any specific tests for any of these cases? As I
posted earlier, I am putting together a writeback test suite for
test.kernel.org, and if you have one (even if it's an ugly shell
script) it would save me some time.

Also if you want any of mine let me know. :-)

mrubin


> The role of the queues:
>
> s_dirty: park for dirtied_when expiration
> s_io: park for io submission
> s_more_io: for big dirty inodes, they will be retried in this run of pdflush
> (it ensures fairness between small/large files)
> s_more_io_wait: for blocked inodes, they will be picked up in next run of s_io
>
>
> This patchset is in better shape, but still not ready for merge.
> It begins with:
>
> [PATCH 01/13] writeback: revert 2e6883bdf49abd0e7f0d9b6297fc3be7ebb2250b
> [PATCH 02/13] writeback: clear PAGECACHE_TAG_DIRTY for truncated page in block_write_full_page()
>
> Introduces more_io/more_io_wait based policies:
>
> [PATCH 03/13] writeback: introduce writeback_control.more_io
> [PATCH 04/13] writeback: introduce super_block.s_more_io_wait
> [PATCH 05/13] writeback: merge duplicate code into writeback_some_pages()
> [PATCH 06/13] writeback: defer writeback on not-all-pages-written
> [PATCH 07/13] writeback: defer writeback on locked inode
> [PATCH 08/13] writeback: defer writeback on locked buffers
> [PATCH 09/13] writeback: requeue_io() on redirtied inode
>
> And finishes with some code cleanups:
>
> [PATCH 10/13] writeback: introduce queue_dirty()
> [PATCH 11/13] writeback: queue_dirty() on memory-backed bdi
> [PATCH 12/13] writeback: remove redirty_tail()
> [PATCH 13/13] writeback: cleanup __sync_single_inode()
>
> Diffstat:
>
> fs/buffer.c | 2
> fs/fs-writeback.c | 121 +++++++++++++++---------------------------
> fs/super.c | 1
> include/linux/fs.h | 1
> mm/page-writeback.c | 46 +++++++--------
> 5 files changed, 72 insertions(+), 99 deletions(-)
>
> Regards,
> Fengguang Wu
> --
>

2008-01-16 02:19:06

by Wu Fengguang

Subject: Re: [PATCH 00/13] writeback bug fixes and simplifications take 2

On Tue, Jan 15, 2008 at 10:33:01AM -0800, Michael Rubin wrote:
> On Jan 15, 2008 4:36 AM, Fengguang Wu <[email protected]> wrote:
> > Andrew,
> >
> > This patchset mainly polishes the writeback queuing policies.
>
> Anyone know which tree this patchset is based on?

They are against the latest -mm tree, or 2.6.24-rc6-mm1.

> > The main goals are:
> >
> > (1) small files should not be starved by big dirty files
> > (2) sync as fast as possible for not-blocked inodes/pages
> > - don't leave them out; no congestion_wait() in between them
> > (3) avoid busy iowait for blocked inodes
> > - retry them in the next go of s_io(maybe at the next wakeup of pdflush)
> >
>
> Fengguang do you have any specific tests for any of these cases? As I
> have posted earlier I am putting together a writeback test suite for
> test.kernel.org and if you have one (even if it's an ugly shell
> script) that would save me some time.

No, I just run tests with cp/dd etc. I analyze the code and the debug
traces a lot, and I know it works in the situations I can imagine.
But dedicated test suites are good in the long term.

> Also if you want any of mine let me know. :-)

OK, thank you.

Fengguang

2008-01-18 07:52:17

by Michael Rubin

Subject: Re: [PATCH 00/13] writeback bug fixes and simplifications take 2

On Jan 15, 2008 4:36 AM, Fengguang Wu <[email protected]> wrote:
> Andrew,
>
> This patchset mainly polishes the writeback queuing policies.
> The main goals are:
>
> (1) small files should not be starved by big dirty files
> (2) sync as fast as possible for not-blocked inodes/pages
> - don't leave them out; no congestion_wait() in between them
> (3) avoid busy iowait for blocked inodes
> - retry them in the next go of s_io(maybe at the next wakeup of pdflush)
>
> The role of the queues:
>
> s_dirty: park for dirtied_when expiration
> s_io: park for io submission
> s_more_io: for big dirty inodes, they will be retried in this run of pdflush
> (it ensures fairness between small/large files)
> s_more_io_wait: for blocked inodes, they will be picked up in next run of s_io

Quick question to make sure I get this. Each queue is ordered as follows:

s_dirty - sorted by the dirtied_when field
s_io - no explicit sort key; kept in the order we want to process
them in sync_sb_inodes
s_more_io - held for later; ordered in the same manner as s_io

Is that it?

mrubin

2008-01-18 08:28:05

by Wu Fengguang

Subject: Re: [PATCH 00/13] writeback bug fixes and simplifications take 2

On Thu, Jan 17, 2008 at 11:51:51PM -0800, Michael Rubin wrote:
> On Jan 15, 2008 4:36 AM, Fengguang Wu <[email protected]> wrote:
> > Andrew,
> >
> > This patchset mainly polishes the writeback queuing policies.
> > The main goals are:
> >
> > (1) small files should not be starved by big dirty files
> > (2) sync as fast as possible for not-blocked inodes/pages
> > - don't leave them out; no congestion_wait() in between them
> > (3) avoid busy iowait for blocked inodes
> > - retry them in the next go of s_io(maybe at the next wakeup of pdflush)
> >
> > The role of the queues:
> >
> > s_dirty: park for dirtied_when expiration
> > s_io: park for io submission
> > s_more_io: for big dirty inodes, they will be retried in this run of pdflush
> > (it ensures fairness between small/large files)
> > s_more_io_wait: for blocked inodes, they will be picked up in next run of s_io
>
> Quick question to make sure I get this. Each queue is sorted as such:
>
> s_dirty - sorted by the dirtied_when field
> s_io - sorted by no explicit key but by the order we want to process
> in sync_sb_inodes
> s_more_io - held for later they are sorted in the same manner as s_io
>
> Is that it?

Yes, exactly. s_io and s_more_io can be considered one list broken up
into two, with the split point serving as the cursor for sequential
iteration. And s_more_io_wait is simply a container for blocked inodes.