2009-03-20 18:32:27

by Jos Houtman

Subject: Page Cache writeback too slow, SSD/noop scheduler/ext2

Hi,

We have hit a problem where the page-cache writeback algorithm is not
keeping up.
When memory gets low this will result in very irregular performance drops.

Our setup is as follows:
30 x quad-core machines with 64GB RAM.
These are single purpose machines running MySQL.
Kernel version: 2.6.28.7
A dedicated SSD drive for the ext2 database partition
Noop scheduler for the SSD drive.


The current hypothesis is as follows:
The wb_kupdate function does not write enough dirty pages, which allows the
number of dirty pages to grow to the dirty_background limit.
When memory is low, background_writeout() comes around and "forcefully"
writes dirty pages to disk.
This forced write fills the disk queue and starves the read calls that MySQL
is trying to do: basically killing performance for a few seconds.
This pattern repeats as soon as the cleared memory is filled again.
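
For reference, the background_writeout() loop in 2.6.28 mm/page-writeback.c
works roughly like this (a condensed sketch, not the verbatim source):

	static void background_writeout(unsigned long _min_pages)
	{
		long min_pages = _min_pages;
		struct writeback_control wbc = {
			.sync_mode	 = WB_SYNC_NONE,
			.older_than_this = NULL,	/* page age does not matter here */
			.nonblocking	 = 1,
		};

		for ( ; ; ) {
			long background_thresh, dirty_thresh;

			get_dirty_limits(&background_thresh, &dirty_thresh,
					 NULL, NULL);
			/* Keep going until we drop below the background limit. */
			if (global_page_state(NR_FILE_DIRTY) +
			    global_page_state(NR_UNSTABLE_NFS) < background_thresh
					&& min_pages <= 0)
				break;
			wbc.more_io = 0;
			wbc.encountered_congestion = 0;
			wbc.nr_to_write = MAX_WRITEBACK_PAGES;
			wbc.pages_skipped = 0;
			writeback_inodes(&wbc);
			min_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
			if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
				/* Wrote less than a full batch: back off or stop. */
				if (wbc.encountered_congestion || wbc.more_io)
					congestion_wait(WRITE, HZ/10);
				else
					break;
			}
		}
	}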

Decreasing dirty_writeback_centisecs to 100 doesn't help.

I don't know why this is, but I did some preliminary tracing with systemtap,
and it seems that most of the time the wb_kupdate calls decide to do
nothing.

Doubling /sys/block/sdb/queue/nr_requests to 256 seems to help a bit:
nr_dirty is growing more slowly.
But I am unsure of the side-effects and am afraid of worsening the
starvation problem for MySQL.


I am very much willing to work on this issue and see it fixed, but would
like to tap into the knowledge of the people here.
So:
* Have more people seen this or similar issues?
* Is the hypothesis above a viable one?
* Suggestions/pointers for further research and statistics I should measure
to improve the understanding of this problem.



With regards,

Jos


2009-03-21 10:59:26

by Andrew Morton

Subject: Re: Page Cache writeback too slow, SSD/noop scheduler/ext2

On Fri, 20 Mar 2009 19:26:06 +0100 Jos Houtman <[email protected]> wrote:

> Hi,
>
> We have hit a problem where the page-cache writeback algorithm is not
> keeping up.
> When memory gets low this will result in very irregular performance drops.
>
> Our setup is as follows:
> 30 x quad-core machines with 64GB RAM.
> These are single purpose machines running MySQL.
> Kernel version: 2.6.28.7
> A dedicated SSD drive for the ext2 database partition
> Noop scheduler for the SSD drive.
>
>
> The current hypothesis is as follows:
> The wb_kupdate function does not write enough dirty pages, which allows the
> number of dirty pages to grow to the dirty_background limit.
> When memory is low, background_writeout() comes around and "forcefully"
> writes dirty pages to disk.
> This forced write fills the disk queue and starves the read calls that MySQL
> is trying to do: basically killing performance for a few seconds.
> This pattern repeats as soon as the cleared memory is filled again.
>
> Decreasing dirty_writeback_centisecs to 100 doesn't help.
>
> I don't know why this is, but I did some preliminary tracing with systemtap,
> and it seems that most of the time the wb_kupdate calls decide to do
> nothing.
>
> Doubling /sys/block/sdb/queue/nr_requests to 256 seems to help a bit:
> nr_dirty is growing more slowly.
> But I am unsure of the side-effects and am afraid of worsening the
> starvation problem for MySQL.
>
>
> I am very much willing to work on this issue and see it fixed, but would
> like to tap into the knowledge of the people here.
> So:
> * Have more people seen this or similar issues?
> * Is the hypothesis above a viable one?
> * Suggestions/pointers for further research and statistics I should measure
> to improve the understanding of this problem.
>

I don't think that noop-iosched tries to do anything to prevent
writes-starve-reads. Do you get better behaviour from any of the other IO
schedulers?

2009-03-22 16:53:56

by Jos Houtman

Subject: RE: Page Cache writeback too slow, SSD/noop scheduler/ext2

On 3/21/09 11:53 AM, "Andrew Morton" <[email protected]> wrote:

> On Fri, 20 Mar 2009 19:26:06 +0100 Jos Houtman <[email protected]> wrote:
>
>> Hi,
>>
>> We have hit a problem where the page-cache writeback algorithm is not
>> keeping up.
>> When memory gets low this will result in very irregular performance drops.
>>
>> Our setup is as follows:
>> 30 x quad-core machines with 64GB RAM.
>> These are single purpose machines running MySQL.
>> Kernel version: 2.6.28.7
>> A dedicated SSD drive for the ext2 database partition
>> Noop scheduler for the SSD drive.
>>
>>
>> The current hypothesis is as follows:
>> The wb_kupdate function does not write enough dirty pages, which allows the
>> number of dirty pages to grow to the dirty_background limit.
>> When memory is low, background_writeout() comes around and "forcefully"
>> writes dirty pages to disk.
>> This forced write fills the disk queue and starves the read calls that MySQL
>> is trying to do: basically killing performance for a few seconds.
>> This pattern repeats as soon as the cleared memory is filled again.
>>
>> Decreasing dirty_writeback_centisecs to 100 doesn't help.
>>
>> I don't know why this is, but I did some preliminary tracing with systemtap,
>> and it seems that most of the time the wb_kupdate calls decide to do
>> nothing.
>>
>> Doubling /sys/block/sdb/queue/nr_requests to 256 seems to help a bit:
>> nr_dirty is growing more slowly.
>> But I am unsure of the side-effects and am afraid of worsening the
>> starvation problem for MySQL.
>>
>>
>> I am very much willing to work on this issue and see it fixed, but would
>> like to tap into the knowledge of the people here.
>> So:
>> * Have more people seen this or similar issues?
>> * Is the hypothesis above a viable one?
>> * Suggestions/pointers for further research and statistics I should measure
>> to improve the understanding of this problem.
>>
>
> I don't think that noop-iosched tries to do anything to prevent
> writes-starve-reads. Do you get better behaviour from any of the other IO
> schedulers?
>

I did a quick stress test, and cfq does not immediately seem to hurt
performance, although some of my colleagues have tested this in the past
with the opposite results (which is why we use noop).

But regardless of the scheduler, the real problem is the writeback algorithm
not keeping up.
We can accumulate 600K dirty pages during the day, and only ~300K are flushed
to disk during the night hours.

A quick look at the writeback algorithm led me to expect wb_kupdate() to
flush ~1024 pages every 5 seconds, which is almost 3GB per hour. It
obviously does not manage to do this in our setup.

I don't believe the speed of the SSD to be the problem: running sync
manually takes only a few minutes to flush 800K dirty pages (~3GB) to disk.

With regards,

Jos

2009-03-24 14:49:20

by Nick Piggin

Subject: Re: Page Cache writeback too slow, SSD/noop scheduler/ext2

On Monday 23 March 2009 03:53:29 Jos Houtman wrote:
> On 3/21/09 11:53 AM, "Andrew Morton" <[email protected]> wrote:
> > On Fri, 20 Mar 2009 19:26:06 +0100 Jos Houtman <[email protected]> wrote:
> >> Hi,
> >>
> >> We have hit a problem where the page-cache writeback algorithm is not
> >> keeping up.
> >> When memory gets low this will result in very irregular performance
> >> drops.
> >>
> >> Our setup is as follows:
> >> 30 x quad-core machines with 64GB RAM.
> >> These are single purpose machines running MySQL.
> >> Kernel version: 2.6.28.7
> >> A dedicated SSD drive for the ext2 database partition
> >> Noop scheduler for the SSD drive.
> >>
> >>
> >> The current hypothesis is as follows:
> >> The wb_kupdate function does not write enough dirty pages, which allows
> >> the number of dirty pages to grow to the dirty_background limit.
> >> When memory is low, background_writeout() comes around and "forcefully"
> >> writes dirty pages to disk.
> >> This forced write fills the disk queue and starves the read calls that
> >> MySQL is trying to do: basically killing performance for a few seconds.
> >> This pattern repeats as soon as the cleared memory is filled again.
> >>
> >> Decreasing dirty_writeback_centisecs to 100 doesn't help.
> >>
> >> I don't know why this is, but I did some preliminary tracing with
> >> systemtap, and it seems that most of the time the wb_kupdate calls
> >> decide to do nothing.
> >>
> >> Doubling /sys/block/sdb/queue/nr_requests to 256 seems to help a bit:
> >> nr_dirty is growing more slowly.
> >> But I am unsure of the side-effects and am afraid of worsening the
> >> starvation problem for MySQL.
> >>
> >>
> >> I am very much willing to work on this issue and see it fixed, but would
> >> like to tap into the knowledge of the people here.
> >> So:
> >> * Have more people seen this or similar issues?
> >> * Is the hypothesis above a viable one?
> >> * Suggestions/pointers for further research and statistics I should
> >> measure to improve the understanding of this problem.
> >
> > I don't think that noop-iosched tries to do anything to prevent
> > writes-starve-reads. Do you get better behaviour from any of the other
> > IO schedulers?
>
> I did a quick stress test, and cfq does not immediately seem to hurt
> performance, although some of my colleagues have tested this in the past
> with the opposite results (which is why we use noop).
>
> But regardless of the scheduler, the real problem is the writeback
> algorithm not keeping up.
> We can accumulate 600K dirty pages during the day, and only ~300K are
> flushed to disk during the night hours.
>
> A quick look at the writeback algorithm led me to expect wb_kupdate() to
> flush ~1024 pages every 5 seconds, which is almost 3GB per hour. It
> obviously does not manage to do this in our setup.
>
> I don't believe the speed of the SSD to be the problem: running sync
> manually takes only a few minutes to flush 800K dirty pages (~3GB) to disk.

kupdate surely should just continue to keep trying to write back pages
so long as there are more old pages to clean, and the queue isn't
congested. That seems to be the intention anyway: MAX_WRITEBACK_PAGES
is just the number to write back in a single call, but you see
nr_to_write is set to the number of dirty pages in the system.

On your system, what must be happening is more_io is not being set.
The logic in fs/fs-writeback.c might be busted.
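
To illustrate, the 2.6.28 wb_kupdate() loop goes roughly like this (a
condensed sketch of mm/page-writeback.c, not the verbatim source):

	static void wb_kupdate(unsigned long arg)
	{
		unsigned long oldest_jif;
		long nr_to_write;
		struct writeback_control wbc = {
			.sync_mode	 = WB_SYNC_NONE,
			.older_than_this = &oldest_jif,
			.nonblocking	 = 1,
			.for_kupdate	 = 1,
		};

		/* Only pages dirtied more than dirty_expire_interval ago
		 * (30 seconds by default) qualify for writeback. */
		oldest_jif = jiffies - dirty_expire_interval;

		/* The budget covers every dirty page in the system, not
		 * just one batch. */
		nr_to_write = global_page_state(NR_FILE_DIRTY) +
			      global_page_state(NR_UNSTABLE_NFS) +
			      (inodes_stat.nr_inodes - inodes_stat.nr_unused);

		while (nr_to_write > 0) {
			wbc.more_io = 0;
			wbc.encountered_congestion = 0;
			wbc.nr_to_write = MAX_WRITEBACK_PAGES;
			writeback_inodes(&wbc);
			if (wbc.nr_to_write > 0) {
				/* Wrote less than a full batch. */
				if (wbc.encountered_congestion || wbc.more_io)
					congestion_wait(WRITE, HZ/10);
				else
					break;	/* all the old data is written */
			}
			nr_to_write -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
		}
		/* (re-arming of the kupdate timer omitted) */
	}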

2009-03-25 05:27:30

by Fengguang Wu

Subject: Re: Page Cache writeback too slow, SSD/noop scheduler/ext2

On Wed, Mar 25, 2009 at 01:48:53AM +1100, Nick Piggin wrote:
> On Monday 23 March 2009 03:53:29 Jos Houtman wrote:
> > On 3/21/09 11:53 AM, "Andrew Morton" <[email protected]> wrote:
> > > On Fri, 20 Mar 2009 19:26:06 +0100 Jos Houtman <[email protected]> wrote:
> > >> Hi,
> > >>
> > >> We have hit a problem where the page-cache writeback algorithm is not
> > >> keeping up.
> > >> When memory gets low this will result in very irregular performance
> > >> drops.
> > >>
> > >> Our setup is as follows:
> > >> 30 x quad-core machines with 64GB RAM.
> > >> These are single purpose machines running MySQL.
> > >> Kernel version: 2.6.28.7
> > >> A dedicated SSD drive for the ext2 database partition
> > >> Noop scheduler for the SSD drive.
> > >>
> > >>
> > >> The current hypothesis is as follows:
> > >> The wb_kupdate function does not write enough dirty pages, which allows
> > >> the number of dirty pages to grow to the dirty_background limit.
> > >> When memory is low, background_writeout() comes around and "forcefully"
> > >> writes dirty pages to disk.
> > >> This forced write fills the disk queue and starves the read calls that
> > >> MySQL is trying to do: basically killing performance for a few seconds.
> > >> This pattern repeats as soon as the cleared memory is filled again.
> > >>
> > >> Decreasing dirty_writeback_centisecs to 100 doesn't help.
> > >>
> > >> I don't know why this is, but I did some preliminary tracing with
> > >> systemtap, and it seems that most of the time the wb_kupdate calls
> > >> decide to do nothing.
> > >>
> > >> Doubling /sys/block/sdb/queue/nr_requests to 256 seems to help a bit:
> > >> nr_dirty is growing more slowly.
> > >> But I am unsure of the side-effects and am afraid of worsening the
> > >> starvation problem for MySQL.
> > >>
> > >>
> > >> I am very much willing to work on this issue and see it fixed, but
> > >> would like to tap into the knowledge of the people here.
> > >> So:
> > >> * Have more people seen this or similar issues?
> > >> * Is the hypothesis above a viable one?
> > >> * Suggestions/pointers for further research and statistics I should
> > >> measure to improve the understanding of this problem.
> > >
> > > I don't think that noop-iosched tries to do anything to prevent
> > > writes-starve-reads. Do you get better behaviour from any of the other
> > > IO schedulers?
> >
> > I did a quick stress test, and cfq does not immediately seem to hurt
> > performance, although some of my colleagues have tested this in the past
> > with the opposite results (which is why we use noop).
> >
> > But regardless of the scheduler, the real problem is the writeback
> > algorithm not keeping up.
> > We can accumulate 600K dirty pages during the day, and only ~300K are
> > flushed to disk during the night hours.
> >
> > A quick look at the writeback algorithm led me to expect wb_kupdate() to
> > flush ~1024 pages every 5 seconds, which is almost 3GB per hour. It
> > obviously does not manage to do this in our setup.
> >
> > I don't believe the speed of the SSD to be the problem: running sync
> > manually takes only a few minutes to flush 800K dirty pages (~3GB) to disk.
>
> kupdate surely should just continue to keep trying to write back pages
> so long as there are more old pages to clean, and the queue isn't
> congested. That seems to be the intention anyway: MAX_WRITEBACK_PAGES
> is just the number to write back in a single call, but you see
> nr_to_write is set to the number of dirty pages in the system.
>
> On your system, what must be happening is more_io is not being set.
> The logic in fs/fs-writeback.c might be busted.

Hi Jos,

I prepared a debugging patch for 2.6.28. (I cannot observe writeback
problems on my local ext2 mount.)

You can view the states of all dirty inodes by doing

modprobe filecache
echo ls dirty > /proc/filecache
cat /proc/filecache

The 'age' field shows (jiffies - inode->dirtied_when), which may also be
useful for debugging Jeff and Ian's case (if it keeps growing, then
dirtied_when is stuck).

The detailed dirty writeback traces can be retrieved by doing

echo 1 > /proc/sys/fs/dirty_debug
sleep 6s
echo 0 > /proc/sys/fs/dirty_debug
dmesg

The dmesg trace should help identify the bug in periodic writeback.

Thanks,
Fengguang


Attachments:
filecache+writeback-debug-2.6.28.patch (39.83 kB)

2009-03-27 17:00:09

by Jos Houtman

Subject: Re: Page Cache writeback too slow, SSD/noop scheduler/ext2

Hi,

>>
>> kupdate surely should just continue to keep trying to write back pages
>> so long as there are more old pages to clean, and the queue isn't
>> congested. That seems to be the intention anyway: MAX_WRITEBACK_PAGES
>> is just the number to write back in a single call, but you see
>> nr_to_write is set to the number of dirty pages in the system.

And when it's congested it should just wait a little bit before continuing.

>> On your system, what must be happening is more_io is not being set.
>> The logic in fs/fs-writeback.c might be busted.

I don't know about more_io, but I agree that the logic seems busted.

>
> Hi Jos,
>
> I prepared a debugging patch for 2.6.28. (I cannot observe writeback
> problems on my local ext2 mount.)

Thanx for the patch, but for next time: how should I apply it?
It seems to be context-aware (@@) and broke on all the kernel versions I
tried: 2.6.28/2.6.28.7/2.6.29.

Because I saw the patch only a few hours ago and didn't want to block on
your reply, I decided to apply it manually and in the process ported it to
2.6.29.

As for the information the patch provided: it is most helpful.

Attached you will find a list of files containing dirty pages and the count
of their dirty pages; there is also a dmesg output where I trace the
writeback for 40 seconds.


I did some testing on my own using printk's, and what I saw is that the
inodes located on sdb1 (the database) would often pass
http://lxr.linux.no/linux+v2.6.29/fs/fs-writeback.c#L335
and then redirty_tail would be called. I haven't had the time to dig deeper,
but that is my primary suspect for the moment.
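
For context, the code path in question in the 2.6.29 __sync_single_inode()
looks roughly like this (a condensed sketch, not the verbatim source):

	/* A WB_SYNC_NONE pass finished, but the inode still has dirty pages. */
	inode->i_state |= I_DIRTY_PAGES;
	if (wbc->nr_to_write <= 0) {
		/* slice used up: queue for next turn */
		requeue_io(inode);
	} else {
		/*
		 * Somehow blocked (e.g. a congested queue): redirty_tail()
		 * moves the inode back to s_dirty and may reset its
		 * dirtied_when, so it waits out a whole new expire interval.
		 */
		redirty_tail(inode);
	}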


Thanx again,

Jos



Attachments:
filecache-27-march.txt (4.53 kB)
dmesg-27-march.txt (16.94 kB)

2009-03-29 02:33:21

by Fengguang Wu

Subject: Re: Page Cache writeback too slow, SSD/noop scheduler/ext2

On Sat, Mar 28, 2009 at 12:59:43AM +0800, Jos Houtman wrote:
> Hi,
>
> >>
> >> kupdate surely should just continue to keep trying to write back pages
> >> so long as there are more old pages to clean, and the queue isn't
> >> congested. That seems to be the intention anyway: MAX_WRITEBACK_PAGES
> >> is just the number to write back in a single call, but you see
> >> nr_to_write is set to the number of dirty pages in the system.
>
> And when it's congested it should just wait a little bit before continuing.
>
> >> On your system, what must be happening is more_io is not being set.
> >> The logic in fs/fs-writeback.c might be busted.
>
> I don't know about more_io, but I agree that the logic seems busted.
>
> >
> > Hi Jos,
> >
> > I prepared a debugging patch for 2.6.28. (I cannot observe writeback
> > problems on my local ext2 mount.)
>
> Thanx for the patch, but for next time: how should I apply it?
> It seems to be context-aware (@@) and broke on all the kernel versions I
> tried: 2.6.28/2.6.28.7/2.6.29.

Do you mean that the patch applies after removing " @@.*$"?

To be safe, I created the patch with quilt as well as git, for 2.6.29.

> Because I saw the patch only a few hours ago and didn't want to block on
> your reply, I decided to apply it manually and in the process ported it to
> 2.6.29.
>
> As for the information the patch provided: it is most helpful.
>
> Attached you will find a list of files containing dirty pages and the count
> of their dirty pages; there is also a dmesg output where I trace the
> writeback for 40 seconds.

They helped, thank you!

> I did some testing on my own using printk's, and what I saw is that the
> inodes located on sdb1 (the database) would often pass
> http://lxr.linux.no/linux+v2.6.29/fs/fs-writeback.c#L335
> and then redirty_tail would be called. I haven't had the time to dig
> deeper, but that is my primary suspect for the moment.

You are right. In your case, there are several big dirty files in sdb1,
and the sdb write queue is constantly (almost-)congested. The SSD write
speed is so slow that each round of sdb1 writeback begins with an
uncongested queue, which quickly fills up after some pages are written.
Hence all the inodes get redirtied because of (nr_to_write > 0).

The following quick fix should solve the slow-writeback-on-congested-SSD
problem. However, the writeback sequence is suboptimal: it syncs-and-requeues
each file until congestion (in your case after about 300-600 pages) instead
of until MAX_WRITEBACK_PAGES=1024 pages.

A more complete fix would be turning MAX_WRITEBACK_PAGES into an exact
per-file limit. It has been sitting in my todo list for quite a while...

Thanks,
Fengguang

---
fs/fs-writeback.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- mm.orig/fs/fs-writeback.c
+++ mm/fs/fs-writeback.c
@@ -325,7 +325,8 @@ __sync_single_inode(struct inode *inode,
* soon as the queue becomes uncongested.
*/
inode->i_state |= I_DIRTY_PAGES;
- if (wbc->nr_to_write <= 0) {
+ if (wbc->nr_to_write <= 0 ||
+ wbc->encountered_congestion) {
/*
* slice used up: queue for next turn
*/


Attachments:
writeback-requeue-congestion-quickfix.patch (486.00 B)

2009-03-30 15:47:38

by Jos Houtman

Subject: Re: Page Cache writeback too slow, SSD/noop scheduler/ext2


>> Thanx for the patch, but for next time: how should I apply it?
>> It seems to be context-aware (@@) and broke on all the kernel versions I
>> tried: 2.6.28/2.6.28.7/2.6.29.
>
> Do you mean that the patch applies after removing " @@.*$"?

I didn't try that, but this time it worked. So it was probably my error.

>
> You are right. In your case, there are several big dirty files in sdb1,
> and the sdb write queue is constantly (almost-)congested. The SSD write
> speed is so slow that each round of sdb1 writeback begins with an
> uncongested queue, which quickly fills up after some pages are written.
> Hence all the inodes get redirtied because of (nr_to_write > 0).
>
> The following quick fix should solve the slow-writeback-on-congested-SSD
> problem. However, the writeback sequence is suboptimal: it
> syncs-and-requeues each file until congestion (in your case after about
> 300-600 pages) instead of until MAX_WRITEBACK_PAGES=1024 pages.

Yeah, that fixed it, but performance dropped due to the more constant
congestion. So I will need to try some different IO schedulers.


Next to that, I was wondering if there are any plans to make sure that not
all dirty files are written back in the same interval.

In my case all database files are written back every 30 seconds, while I
would prefer them to be spread more evenly over the interval.

Thanks,

Jos

2009-03-31 00:29:17

by Fengguang Wu

Subject: Re: Page Cache writeback too slow, SSD/noop scheduler/ext2

On Tue, Mar 31, 2009 at 12:47:19AM +0800, Jos Houtman wrote:
>
> >> Thanx for the patch, but for next time: how should I apply it?
> >> It seems to be context-aware (@@) and broke on all the kernel versions I
> >> tried: 2.6.28/2.6.28.7/2.6.29.
> >
> > Do you mean that the patch applies after removing " @@.*$"?
>
> I didn't try that, but this time it worked. So it was probably my error.
>
> >
> > You are right. In your case, there are several big dirty files in sdb1,
> > and the sdb write queue is constantly (almost-)congested. The SSD write
> > speed is so slow that each round of sdb1 writeback begins with an
> > uncongested queue, which quickly fills up after some pages are written.
> > Hence all the inodes get redirtied because of (nr_to_write > 0).
> >
> > The following quick fix should solve the slow-writeback-on-congested-SSD
> > problem. However, the writeback sequence is suboptimal: it
> > syncs-and-requeues each file until congestion (in your case after about
> > 300-600 pages) instead of until MAX_WRITEBACK_PAGES=1024 pages.
>
> Yeah, that fixed it, but performance dropped due to the more constant
> congestion. So I will need to try some different IO schedulers.

Read performance or write performance?

> Next to that, I was wondering if there are any plans to make sure that not
> all dirty files are written back in the same interval.
>
> In my case all database files are written back every 30 seconds, while I
> would prefer them to be spread more evenly over the interval.

pdflush will wake up every 5s to sync files that were dirtied more than 30s
ago. So the writeback of inodes should be distributed (somewhat randomly)
across these 5s-interval wakeups, due to their varied dirty times.
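
The selection is purely age-based; the queueing logic in fs/fs-writeback.c
looks roughly like this (a condensed sketch, not the verbatim source):

	/*
	 * s_dirty is kept sorted by dirtied_when, so everything that was
	 * dirtied in the same burst expires, and is written back, together.
	 */
	static void move_expired_inodes(struct list_head *delaying_queue,
					struct list_head *dispatch_queue,
					unsigned long *older_than_this)
	{
		while (!list_empty(delaying_queue)) {
			struct inode *inode = list_entry(delaying_queue->prev,
						struct inode, i_list);
			if (older_than_this &&
			    time_after(inode->dirtied_when, *older_than_this))
				break;
			list_move(&inode->i_list, dispatch_queue);
		}
	}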

However, the distribution may well be uneven in many cases. There also seem
to be conflicting goals for HDD and SSD: one favors somewhat small, bursty
writeback, the other favors smooth writeback streams. I guess the better
scheme would be bursty pdflush writebacks plus IO-scheduler-level QoS.

Thanks,
Fengguang

2009-03-31 12:17:20

by Jos Houtman

Subject: Re: Page Cache writeback too slow, SSD/noop scheduler/ext2

>
> Next to that, I was wondering if there are any plans to make sure that not
> all dirty files are written back in the same interval.
>
> In my case all database files are written back every 30 seconds, while I
> would prefer them to be spread more evenly over the interval.

There's another question I have: does the writeback go through the IO
scheduler? Because no matter which IO scheduler is used or how it is tuned,
the writeback algorithm totally starves the reads.

See the URL below for an example with CFQ; deadline and noop show this
behaviour as well:
http://94.100.113.33/535450001-535500000/535451701-535451800/535451800_6_L7gt.jpeg

Is there anything I can do about this behaviour by creating a better
interleaving of the reads and writes?

Jos

2009-03-31 12:31:48

by Fengguang Wu

Subject: Re: Page Cache writeback too slow, SSD/noop scheduler/ext2

On Tue, Mar 31, 2009 at 08:16:52PM +0800, Jos Houtman wrote:
> >
> > Next to that, I was wondering if there are any plans to make sure that
> > not all dirty files are written back in the same interval.
> >
> > In my case all database files are written back every 30 seconds, while I
> > would prefer them to be spread more evenly over the interval.
>
> There's another question I have: does the writeback go through the IO
> scheduler? Because no matter which IO scheduler is used or how it is tuned,
> the writeback algorithm totally starves the reads.

I noticed this annoying writes-starve-reads problem too. I'll look into it.

> See the URL below for an example with CFQ; deadline and noop show this
> behaviour as well:
> http://94.100.113.33/535450001-535500000/535451701-535451800/535451800_6_L7gt.jpeg
>
> Is there anything I can do about this behaviour by creating a better
> interleaving of the reads and writes?

I guess it should be handled in the generic block IO layer. Once we solve
the writes-starve-reads problem, the bursty-writeback behavior becomes a
non-problem for SSD.

Thanks,
Fengguang

2009-03-31 14:10:58

by Jos Houtman

Subject: Re: Page Cache writeback too slow, SSD/noop scheduler/ext2


>> There's another question I have: does the writeback go through the IO
>> scheduler? Because no matter which IO scheduler is used or how it is
>> tuned, the writeback algorithm totally starves the reads.
>
> I noticed this annoying writes-starve-reads problem too. I'll look into it.

Thanks

>
>> Is there anything I can do about this behaviour by creating a better
>> interleaving of the reads and writes?
>
> I guess it should be handled in the generic block IO layer. Once we solve
> the writes-starve-reads problem, the bursty-writeback behavior becomes a
> non-problem for SSD.

Yeah, this was the part where I figured the IO schedulers kicked in, but
obviously I was wrong :P.

If I can do anything more to help this along, let me know.


Thanks

Jos