2011-06-20 14:16:39

by Vivek Goyal

Subject: [PATCH] cfq: Fix starvation of async writes in presence of heavy sync workload

In the presence of a heavy sync workload, CFQ can starve async writes.
If one launches multiple readers (say 16), one can notice that CFQ
can withhold dispatch of WRITEs for a very long time, say 200 or 300
seconds.

Basically, CFQ schedules an async queue but does not dispatch any
writes from it because it is waiting for existing sync requests in the
queue to finish. While it is waiting, one reader or another gets queued
up and preempts the async queue. So we did schedule the async queue but
never dispatched anything from it. This can repeat for a long time,
practically starving writers.

This patch allows an async queue to dispatch at least one request once
it gets scheduled, and denies preemption if the async queue has been
waiting for sync requests to drain and has not been able to dispatch
a request yet.

One concern with this fix is how it impacts readers in the presence
of heavy writing.

I did a test where I launch firefox, load a website, close firefox
and measure the time. I ran the test 3 times and took the average.

- Vanilla kernel time ~= 1 minute 40 seconds
- Patched kernel time ~= 1 minute 35 seconds

Basically, times have not changed much for this test. But I would
not claim that it does not impact readers' latencies at all. It
might show up in other workloads.

I think we need to fix writer starvation anyway. If this patch
causes issues, then we need to look at reducing the writer's
queue depth further to improve latencies for readers.

Reported-and-Tested-by: Tao Ma <[email protected]>
Signed-off-by: Vivek Goyal <[email protected]>
---
block/cfq-iosched.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

Index: linux-2.6/block/cfq-iosched.c
===================================================================
--- linux-2.6.orig/block/cfq-iosched.c 2011-06-10 10:05:34.660781278 -0400
+++ linux-2.6/block/cfq-iosched.c 2011-06-20 08:29:13.328186380 -0400
@@ -3315,8 +3315,15 @@ cfq_should_preempt(struct cfq_data *cfqd
 	 * if the new request is sync, but the currently running queue is
 	 * not, let the sync request have priority.
 	 */
-	if (rq_is_sync(rq) && !cfq_cfqq_sync(cfqq))
+	if (rq_is_sync(rq) && !cfq_cfqq_sync(cfqq)) {
+		/*
+		 * Allow at least one dispatch, otherwise this can repeat
+		 * and writes can be starved completely
+		 */
+		if (!cfqq->slice_dispatch)
+			return false;
 		return true;
+	}

 	if (new_cfqq->cfqg != cfqq->cfqg)
 		return false;
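
For reference, cfqq->slice_dispatch counts the requests a queue has
dispatched in its current slice (it is reset when the queue becomes the
active queue), so !cfqq->slice_dispatch means the scheduled async queue
has not dispatched anything yet. A rough sketch of where that counter is
maintained, paraphrased from cfq-iosched.c of this vintage rather than
quoted verbatim, so details may differ slightly:

	static int cfq_dispatch_requests(struct request_queue *q, int force)
	{
		struct cfq_data *cfqd = q->elevator->elevator_data;
		struct cfq_queue *cfqq;

		if (!cfqd->busy_queues)
			return 0;

		/* pick the active queue; preemption decisions land here */
		cfqq = cfq_select_queue(cfqd);
		if (!cfqq)
			return 0;

		/* dispatch one request from this cfqq, if its depth limit allows */
		if (!cfq_dispatch_request(cfqd, cfqq))
			return 0;

		/* one more request dispatched in the current slice */
		cfqq->slice_dispatch++;
		cfq_clear_cfqq_must_dispatch(cfqq);

		/*
		 * expire an async queue immediately if it has used up its
		 * slice; idle queues always expire after one dispatch round
		 */
		if (cfqd->busy_queues > 1 && ((!cfq_cfqq_sync(cfqq) &&
		    cfqq->slice_dispatch >= cfq_prio_to_maxrq(cfqd, cfqq)) ||
		    cfq_class_idle(cfqq))) {
			cfqq->slice_end = jiffies + 1;
			cfq_slice_expired(cfqd, 0);
		}

		return 1;
	}

So with the patch, a newly arriving sync request cannot preempt the
async queue between cfq_select_queue() picking it and its first
successful dispatch.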


2011-06-20 14:34:16

by Vivek Goyal

Subject: Re: [PATCH] cfq: Fix starvation of async writes in presence of heavy sync workload

On Mon, Jun 20, 2011 at 10:16:31AM -0400, Vivek Goyal wrote:
[..]
> I did a test where I launch firefox, load a website, close firefox
> and measure the time. I ran the test 3 times and took the average.
>
> - Vanilla kernel time ~= 1 minute 40 seconds
> - Patched kernel time ~= 1 minute 35 seconds
>

Forgot to mention that this was in the presence of a dd doing
writes.

dd if=/dev/zero of=zerofile bs=4K count=1M

Launching firefox takes around 25 seconds or so. Loading the first
website takes a long time. I had a quick look at blktrace, and I see
that for a long time no reads are queued at all. It looks like firefox
has a dependency on some write finishing before the next read can be
issued.

Thanks
Vivek

2011-06-20 16:14:43

by Justin TerAvest

Subject: Re: [PATCH] cfq: Fix starvation of async writes in presence of heavy sync workload

On Mon, Jun 20, 2011 at 7:16 AM, Vivek Goyal <[email protected]> wrote:
[..]
> I think we need to fix writer starvation anyway. If this patch
> causes issues, then we need to look at reducing the writer's
> queue depth further to improve latencies for readers.

Maybe we should be more specific about what it means to "fix writer starvation"

This makes the preemption logic slightly harder to understand, and I'm
concerned we'll keep making little adjustments like this to the
scheduler.


2011-06-20 16:45:12

by Vivek Goyal

Subject: Re: [PATCH] cfq: Fix starvation of async writes in presence of heavy sync workload

On Mon, Jun 20, 2011 at 09:14:18AM -0700, Justin TerAvest wrote:
> On Mon, Jun 20, 2011 at 7:16 AM, Vivek Goyal <[email protected]> wrote:
[..]
>
> Maybe we should be more specific about what it means to "fix writer starvation"
>

Tao Ma recently ran into issues with writer starvation. Here is
the lkml thread.

https://lkml.org/lkml/2011/6/9/167

I also ran some fio-based scripts launching multiple readers
and multiple buffered writers, and noticed that there are large
windows where we don't dispatch even a single request from
the async queues. That's the starvation. The period of not
dispatching a request was in the range of 200 seconds.

> This makes the preemption logic slightly harder to understand, and I'm
> concerned we'll keep making little adjustments like this to the
> scheduler.

If you have other ideas for handling this, we can definitely give
it a try.

Thanks
Vivek


2011-06-20 22:17:16

by Justin TerAvest

Subject: Re: [PATCH] cfq: Fix starvation of async writes in presence of heavy sync workload

On Mon, Jun 20, 2011 at 9:45 AM, Vivek Goyal <[email protected]> wrote:
> On Mon, Jun 20, 2011 at 09:14:18AM -0700, Justin TerAvest wrote:
>> On Mon, Jun 20, 2011 at 7:16 AM, Vivek Goyal <[email protected]> wrote:
[..]
>>
>> Maybe we should be more specific about what it means to "fix writer starvation"
>>
>
> Tao Ma recently ran into issues with writer starvation. Here is
> the lkml thread.
>
> https://lkml.org/lkml/2011/6/9/167
>
> I also ran some fio-based scripts launching multiple readers
> and multiple buffered writers, and noticed that there are large
> windows where we don't dispatch even a single request from
> the async queues. That's the starvation. The period of not
> dispatching a request was in the range of 200 seconds.

How do we establish what's acceptable? My complaint is that it's not
obvious what tradeoffs to make in the I/O scheduler.

>
>> This makes the preemption logic slightly harder to understand, and I'm
>> concerned we'll keep making little adjustments like this to the
>> scheduler.
>
> If you have other ideas for handling this, we can definitely give
> it a try.

I haven't written out a case to prove it, but it seems like other
preemption logic (like the cfq_rq_close() case) could also cause some
requests to be starved indefinitely.

I think if we want to make stronger guarantees about request
starvation, we might have to rethink how preemption works.


2011-06-20 22:33:29

by Vivek Goyal

Subject: Re: [PATCH] cfq: Fix starvation of async writes in presence of heavy sync workload

On Mon, Jun 20, 2011 at 03:16:48PM -0700, Justin TerAvest wrote:

[..]
> How do we establish what's acceptable? My complaint is that it's not
> obvious what tradeoffs to make in the I/O scheduler.
>

I think it should be driven with real workloads and some common
sense. Easily reproducible complete starvation of async requests
sounds bad enough that it needs fixing.

> >
> >> This makes the preemption logic slightly harder to understand, and I'm
> >> concerned we'll keep making little adjustments like this to the
> >> scheduler.
> >
> > If you have other ideas for handling this, we can definitely give
> > it a try.
>
> I haven't written out a case to prove it, but it seems like other
> preemption logic (like the cfq_rq_close() case) could also cause some
> requests to be starved indefinitely.

If we can easily reproduce this starvation, maybe that also needs
fixing.

>
> I think if we want to make stronger guarantees about request
> starvation, we might have to rethink how preemption works.

What's your proposal? CPU-scheduler-style, purely class-based preemption
is not going to work, for the simple reason that writes come in big
sizes without any dependencies, while reads can come in small sizes,
one at a time, because they are dependent reads.

So are you saying that write starvation is not a real problem, or are
you suggesting that overall you are not happy with the preemption logic
and want more changes there?

Thanks
Vivek

2011-06-21 02:15:21

by Shaohua Li

Subject: Re: [PATCH] cfq: Fix starvation of async writes in presence of heavy sync workload

2011/6/20 Vivek Goyal <[email protected]>:
[..]
> I think we need to fix writer starvation anyway. If this patch
> causes issues, then we need to look at reducing the writer's
> queue depth further to improve latencies for readers.
I'm afraid this can cause read latency, because cfq_dispatch_requests
doesn't check preemption. We will dispatch at least 4 requests instead
of just one. Can we add logic to force it to dispatch just one request?
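
For context, the per-round budget being referred to comes from
cfq_may_dispatch(). The sketch below paraphrases the async-relevant
parts of cfq-iosched.c from this era (trimmed and from memory, so
details may differ); with the default cfq_quantum of 8, a fully
ramped-up async queue gets a budget of quantum/2 = 4 requests per
round, which appears to be where the figure of 4 comes from:

	static bool cfq_may_dispatch(struct cfq_data *cfqd, struct cfq_queue *cfqq)
	{
		unsigned int max_dispatch;

		/* an async queue waits while sync requests are still in flight */
		if (cfqd->rq_in_flight[BLK_RW_SYNC] && !cfq_cfqq_sync(cfqq))
			return false;

		/* base budget: half of cfq_quantum, at least one request */
		max_dispatch = max_t(unsigned int, cfqd->cfq_quantum / 2, 1);
		if (cfq_class_idle(cfqq))
			max_dispatch = 1;

		/*
		 * Async queues must wait a bit before being allowed dispatch;
		 * the depth is ramped up gradually based on how long ago the
		 * last sync IO completed.
		 */
		if (!cfq_cfqq_sync(cfqq) && cfqd->cfq_latency) {
			unsigned long last_sync = jiffies - cfqd->last_delayed_sync;
			unsigned int depth;

			depth = last_sync / cfqd->cfq_slice[1];
			if (!depth && !cfqq->dispatched)
				depth = 1;
			if (depth < max_dispatch)
				max_dispatch = depth;
		}

		/* allow a dispatch while the queue is below its current budget */
		return cfqq->dispatched < max_dispatch;
	}

This also shows the wait-for-sync-drain behaviour the patch description
refers to: a selected async queue still returns false here while sync
requests are in flight, which is the window in which a newly arriving
read could preempt it.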

Thanks,
Shaohua

2011-06-21 15:26:10

by Vivek Goyal

Subject: Re: [PATCH] cfq: Fix starvation of async writes in presence of heavy sync workload

On Tue, Jun 21, 2011 at 10:15:15AM +0800, Shaohua Li wrote:
> 2011/6/20 Vivek Goyal <[email protected]>:
[..]
> I'm afraid this can cause read latency, because cfq_dispatch_requests
> doesn't check preemption. We will dispatch at least 4 requests instead
> of just one. Can we add logic to force it to dispatch just one request?

This will happen only if some other read queue does not preempt the
write queue after dispatching one request.

Anyway, agreed that with a single reader, it will not preempt the
writer, and then the writer gets to dispatch a bunch of requests.

If we want to protect against that, then we can simply expire the
writer after dispatching one request if there are other busy queues.

I could change the following code:

	/*
	 * expire an async queue immediately if it has used up its slice. idle
	 * queue always expire after 1 dispatch round.
	 */
	if (cfqd->busy_queues > 1 && ((!cfq_cfqq_sync(cfqq) &&
	    cfqq->slice_dispatch >= cfq_prio_to_maxrq(cfqd, cfqq)) ||
	    cfq_class_idle(cfqq))) {
		cfqq->slice_end = jiffies + 1;
		cfq_slice_expired(cfqd, 0);
	}

to look as follows:

	/*
	 * expire an async queue and idle queue after 1 dispatch round.
	 */
	if (cfqd->busy_queues > 1 && (!cfq_cfqq_sync(cfqq) ||
	    cfq_class_idle(cfqq))) {
		cfqq->slice_end = jiffies + 1;
		cfq_slice_expired(cfqd, 0);
	}

Will this help?

Thanks
Vivek

2011-06-22 02:08:03

by Shaohua Li

Subject: Re: [PATCH] cfq: Fix starvation of async writes in presence of heavy sync workload

On Tue, 2011-06-21 at 23:26 +0800, Vivek Goyal wrote:
> On Tue, Jun 21, 2011 at 10:15:15AM +0800, Shaohua Li wrote:
> > 2011/6/20 Vivek Goyal <[email protected]>:
[..]
> > I'm afraid this can cause read latency, because cfq_dispatch_requests
> > doesn't check preemption. We will dispatch at least 4 requests instead
> > of just one. Can we add logic to force it to dispatch just one request?
>
> This will happen only if some other read queue does not preempt the
> write queue after dispatching one request.
This could happen with multiple queues too, because NCQ disks dispatch
several requests in a short time.

> Anyway, agreed that with a single reader, it will not preempt the
> writer, and then the writer gets to dispatch a bunch of requests.
>
> If we want to protect against that, then we can simply expire the
> writer after dispatching one request if there are other busy queues.
>
> I could change the following code:
>
> 	/*
> 	 * expire an async queue immediately if it has used up its slice. idle
> 	 * queue always expire after 1 dispatch round.
> 	 */
> 	if (cfqd->busy_queues > 1 && ((!cfq_cfqq_sync(cfqq) &&
> 	    cfqq->slice_dispatch >= cfq_prio_to_maxrq(cfqd, cfqq)) ||
> 	    cfq_class_idle(cfqq))) {
> 		cfqq->slice_end = jiffies + 1;
> 		cfq_slice_expired(cfqd, 0);
> 	}
>
> to look as follows.
>
> 	/*
> 	 * expire an async queue and idle queue after 1 dispatch round.
> 	 */
> 	if (cfqd->busy_queues > 1 && (!cfq_cfqq_sync(cfqq) ||
> 	    cfq_class_idle(cfqq))) {
> 		cfqq->slice_end = jiffies + 1;
> 		cfq_slice_expired(cfqd, 0);
> 	}
Looks fine. The cfqd->busy_queues check for the async queue needs to
exclude an idle queue if there is one. That is, with only an idle queue
and an async queue, don't expire the async queue. Maybe use this:

	if (cfqd->busy_queues > 1 && cfq_class_idle(cfqq) ||
	    cfqd->busy_sync_queues > 0 && !cfq_cfqq_sync(cfqq))
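
Since && binds tighter than ||, that condition already groups the way it
reads; with explicit parentheses and dropped into the expiry check above,
the combined suggestion would look roughly like this (a sketch of the
idea under discussion, not a tested patch):

	/*
	 * Always expire an idle-class queue after one dispatch round when
	 * other queues are busy; expire an async queue after one dispatch
	 * round only when at least one sync queue is also waiting.
	 */
	if ((cfqd->busy_queues > 1 && cfq_class_idle(cfqq)) ||
	    (cfqd->busy_sync_queues > 0 && !cfq_cfqq_sync(cfqq))) {
		cfqq->slice_end = jiffies + 1;
		cfq_slice_expired(cfqd, 0);
	}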

Thanks,
Shaohua