2011-03-02 12:43:45

by Shaohua Li

Subject: cfq-iosched preempt issues

queue preemption is good for some workloads and not for others. With commit
f8ae6e3eb825, the impact is amplified. I currently have two issues with it:
1. In a multi-threaded workload, each thread runs a random read/write (for
example, mmap write) with iodepth 1. I found the queue depth gets smaller
with commit f8ae6e3eb825. The reason is that writes get preempted, so more
threads are waiting on writes while fewer threads are doing reads. This makes
the queue depth small, so performance drops a little. So in this case,
speeding up writes would speed up reads too, but we can't detect it.
2. cfq_may_dispatch doesn't limit queue depth if the queue is the sole queue.
What if there are two queues, one sync and one async? If the sync queue's
think time is small, we can treat it as the sole queue, because the sync queue
will preempt the async queue, so we don't need to care about the async queue's
latency. The issue existed before, but f8ae6e3eb825 amplifies it. Below is a
patch for it.

Any idea?

Thanks,
Shaohua
-----------------------------------------------
Subject: cfq-iosched: don't limit sync queue depth with only one such sync queue

If there are a sync and an async queue and the sync queue's think time is small,
we can ignore the sync queue's dispatch quantum. Because the sync queue will
always preempt the async queue, we don't need to care about async's latency.
In the same way, we can optimize an RT queue too to improve performance.
This fixes a performance regression in the aiostress test introduced by
commit f8ae6e3eb825. The issue exists even without the commit, but the
commit amplifies the impact.

Signed-off-by: Shaohua Li <[email protected]>
---
block/cfq-iosched.c | 91 +++++++++++++++++++++++++++++++++++++++++-----------
1 file changed, 73 insertions(+), 18 deletions(-)

Index: linux/block/cfq-iosched.c
===================================================================
--- linux.orig/block/cfq-iosched.c 2011-03-02 14:58:19.000000000 +0800
+++ linux/block/cfq-iosched.c 2011-03-02 15:48:38.000000000 +0800
@@ -1150,6 +1150,20 @@ void cfq_unlink_blkio_group(void *key, s
spin_unlock_irqrestore(cfqd->queue->queue_lock, flags);
}

+static bool cfq_have_cfqgs(struct cfq_data *cfqd)
+{
+ struct hlist_node *pos;
+ struct cfq_group *cfqg;
+ int cnt = 0;
+
+ hlist_for_each_entry(cfqg, pos, &cfqd->cfqg_list, cfqd_node) {
+ cnt++;
+ if (cnt > 1)
+ break;
+ }
+ return cnt > 1;
+}
+
#else /* GROUP_IOSCHED */
static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd, int create)
{
@@ -1169,6 +1183,12 @@ cfq_link_cfqq_cfqg(struct cfq_queue *cfq
static void cfq_release_cfq_groups(struct cfq_data *cfqd) {}
static inline void cfq_put_cfqg(struct cfq_group *cfqg) {}

+static inline bool cfq_have_cfqgs(struct cfq_data *cfqd)
+{
+ return false;
+}
+
#endif /* GROUP_IOSCHED */

/*
@@ -2381,6 +2401,57 @@ static inline bool cfq_slice_used_soon(s
return false;
}

+static unsigned int cfq_queue_max_quantum(struct cfq_data *cfqd,
+ struct cfq_queue *cfqq)
+{
+ int sync = cfq_cfqq_sync(cfqq);
+ enum wl_prio_t prio = cfqq_prio(cfqq);
+ struct cfq_group *cfqg = cfqq->cfqg;
+ int sync_queues_cnt, async_queues_cnt;
+ struct cfq_io_context *cic = RQ_CIC(cfqq->next_rq);
+
+ /* Sole queue user, no limit */
+ if (cfqd->busy_queues == 1)
+ return -1;
+
+ if (cfq_have_cfqgs(cfqd) || (!sync && prio != RT_WORKLOAD))
+ goto normal;
+
+ sync_queues_cnt = cfqg->service_trees[prio][SYNC_NOIDLE_WORKLOAD].count
+ + cfqg->service_trees[prio][SYNC_WORKLOAD].count;
+ async_queues_cnt = cfqg->service_trees[prio][ASYNC_WORKLOAD].count;
+ /*
+ * If a queue is the sole sync queue and its think time is small, we can
+ * ignore the async queue and give the sync queue no dispatch limit,
+ * because a sync queue can preempt an async queue.
+ *
+ * If the queue is RT, we don't need to check BE, because even if the
+ * queue is expired, the dispatcher will select an RT queue again next time.
+ *
+ * If the queue is BE, we don't check RT here, because the dispatcher will
+ * switch to RT next time, so we dispatch at most one extra request.
+ */
+ if (((!sync && prio == RT_WORKLOAD && sync_queues_cnt == 0 &&
+ async_queues_cnt == 1) || sync_queues_cnt == 1) &&
+ sample_valid(cic->ttime_samples) &&
+ cic->ttime_mean < cfqd->cfq_slice_idle)
+ return -1;
+normal:
+ /*
+ * We have other queues, don't allow more IO from this one
+ */
+ if (cfq_slice_used_soon(cfqd, cfqq))
+ return 0;
+ else
+ /*
+ * Normally we start throttling cfqq when cfq_quantum/2
+ * requests have been dispatched. But we can drive
+ * deeper queue depths at the beginning of slice
+ * subjected to upper limit of cfq_quantum.
+ * */
+ return cfqd->cfq_quantum;
+}
+
static bool cfq_may_dispatch(struct cfq_data *cfqd, struct cfq_queue *cfqq)
{
unsigned int max_dispatch;
@@ -2411,25 +2482,9 @@ static bool cfq_may_dispatch(struct cfq_
if (cfq_class_idle(cfqq))
return false;

- /*
- * We have other queues, don't allow more IO from this one
- */
- if (cfqd->busy_queues > 1 && cfq_slice_used_soon(cfqd, cfqq))
+ max_dispatch = cfq_queue_max_quantum(cfqd, cfqq);
+ if (max_dispatch == 0)
return false;
-
- /*
- * Sole queue user, no limit
- */
- if (cfqd->busy_queues == 1)
- max_dispatch = -1;
- else
- /*
- * Normally we start throttling cfqq when cfq_quantum/2
- * requests have been dispatched. But we can drive
- * deeper queue depths at the beginning of slice
- * subjected to upper limit of cfq_quantum.
- * */
- max_dispatch = cfqd->cfq_quantum;
}

/*


2011-03-02 16:17:53

by Jeff Moyer

Subject: Re: cfq-iosched preempt issues

Shaohua Li <[email protected]> writes:

> queue preemption is good for some workloads and not for others. With commit
> f8ae6e3eb825, the impact is amplified. I currently have two issues with it:
> 1. In a multi-threaded workload, each thread runs a random read/write (for
> example, mmap write) with iodepth 1. I found the queue depth gets smaller
> with commit f8ae6e3eb825. The reason is that writes get preempted, so more
> threads are waiting on writes while fewer threads are doing reads. This makes
> the queue depth small, so performance drops a little. So in this case,
> speeding up writes would speed up reads too, but we can't detect it.

I don't fully understand your workload. What is the aio-stress or fio
command line/config file?

> 2. cfq_may_dispatch doesn't limit queue depth if the queue is the sole queue.
> What if there are two queues, one sync and one async? If the sync queue's
> think time is small, we can treat it as the sole queue, because the sync queue
> will preempt the async queue, so we don't need to care about the async queue's
> latency. The issue existed before, but f8ae6e3eb825 amplifies it. Below is a
> patch for it.

I'm not sure I entirely agree with that reasoning. Do you have real
workloads that are regressing due to this commit, or is it just these
cooked up benchmarks?

Cheers,
Jeff

2011-03-02 20:21:25

by Vivek Goyal

Subject: Re: cfq-iosched preempt issues

On Wed, Mar 02, 2011 at 08:43:41PM +0800, Shaohua Li wrote:
> queue preemption is good for some workloads and not for others. With commit
> f8ae6e3eb825, the impact is amplified. I currently have two issues with it:
> 1. In a multi-threaded workload, each thread runs a random read/write (for
> example, mmap write) with iodepth 1. I found the queue depth gets smaller
> with commit f8ae6e3eb825. The reason is that writes get preempted, so more
> threads are waiting on writes while fewer threads are doing reads. This makes
> the queue depth small, so performance drops a little. So in this case,
> speeding up writes would speed up reads too, but we can't detect it.
> 2. cfq_may_dispatch doesn't limit queue depth if the queue is the sole queue.
> What if there are two queues, one sync and one async? If the sync queue's
> think time is small, we can treat it as the sole queue, because the sync queue
> will preempt the async queue, so we don't need to care about the async queue's
> latency. The issue existed before, but f8ae6e3eb825 amplifies it. Below is a
> patch for it.
>
> Any idea?

CFQ is already very complicated, let's try to keep it simple. Because it
is complicated, making it hierarchical for cgroups becomes even harder.

IIUC, you are saying that the cfqd->busy_queues check is not sufficient,
as it takes async queues into account as well.

So we can keep another count, say cfqd->busy_sync_queues, and if there
are no busy sync queues, allow unlimited depth. That should be a really
simple few-line change.

But let's do it only if you have a real-life workload. Similarly, we can
worry about the RT case when there is a real workload behind it.

Thanks
Vivek





>
> Thanks,
> Shaohua
> -----------------------------------------------
> Subject: cfq-iosched: don't limit sync queue depth with only one such sync queue
>
> If there are a sync and an async queue and the sync queue's think time is small,
> we can ignore the sync queue's dispatch quantum. Because the sync queue will
> always preempt the async queue, we don't need to care about async's latency.
> In the same way, we can optimize an RT queue too to improve performance.
> This fixes a performance regression in the aiostress test introduced by
> commit f8ae6e3eb825. The issue exists even without the commit, but the
> commit amplifies the impact.
>
> Signed-off-by: Shaohua Li <[email protected]>
> ---
> block/cfq-iosched.c | 91 +++++++++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 73 insertions(+), 18 deletions(-)
>
> Index: linux/block/cfq-iosched.c
> ===================================================================
> --- linux.orig/block/cfq-iosched.c 2011-03-02 14:58:19.000000000 +0800
> +++ linux/block/cfq-iosched.c 2011-03-02 15:48:38.000000000 +0800
> @@ -1150,6 +1150,20 @@ void cfq_unlink_blkio_group(void *key, s
> spin_unlock_irqrestore(cfqd->queue->queue_lock, flags);
> }
>
> +static bool cfq_have_cfqgs(struct cfq_data *cfqd)
> +{
> + struct hlist_node *pos;
> + struct cfq_group *cfqg;
> + int cnt = 0;
> +
> + hlist_for_each_entry(cfqg, pos, &cfqd->cfqg_list, cfqd_node) {
> + cnt++;
> + if (cnt > 1)
> + break;
> + }
> + return cnt > 1;
> +}
> +
> #else /* GROUP_IOSCHED */
> static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd, int create)
> {
> @@ -1169,6 +1183,12 @@ cfq_link_cfqq_cfqg(struct cfq_queue *cfq
> static void cfq_release_cfq_groups(struct cfq_data *cfqd) {}
> static inline void cfq_put_cfqg(struct cfq_group *cfqg) {}
>
> +static inline bool cfq_have_cfqgs(struct cfq_data *cfqd)
> +{
> + return false;
> +}
> +
> #endif /* GROUP_IOSCHED */
>
> /*
> @@ -2381,6 +2401,57 @@ static inline bool cfq_slice_used_soon(s
> return false;
> }
>
> +static unsigned int cfq_queue_max_quantum(struct cfq_data *cfqd,
> + struct cfq_queue *cfqq)
> +{
> + int sync = cfq_cfqq_sync(cfqq);
> + enum wl_prio_t prio = cfqq_prio(cfqq);
> + struct cfq_group *cfqg = cfqq->cfqg;
> + int sync_queues_cnt, async_queues_cnt;
> + struct cfq_io_context *cic = RQ_CIC(cfqq->next_rq);
> +
> + /* Sole queue user, no limit */
> + if (cfqd->busy_queues == 1)
> + return -1;
> +
> + if (cfq_have_cfqgs(cfqd) || (!sync && prio != RT_WORKLOAD))
> + goto normal;
> +
> + sync_queues_cnt = cfqg->service_trees[prio][SYNC_NOIDLE_WORKLOAD].count
> + + cfqg->service_trees[prio][SYNC_WORKLOAD].count;
> + async_queues_cnt = cfqg->service_trees[prio][ASYNC_WORKLOAD].count;
> + /*
> + * If a queue is the sole sync queue and its think time is small, we can
> + * ignore the async queue and give the sync queue no dispatch limit,
> + * because a sync queue can preempt an async queue.
> + *
> + * If the queue is RT, we don't need to check BE, because even if the
> + * queue is expired, the dispatcher will select an RT queue again next time.
> + *
> + * If the queue is BE, we don't check RT here, because the dispatcher will
> + * switch to RT next time, so we dispatch at most one extra request.
> + */
> + if (((!sync && prio == RT_WORKLOAD && sync_queues_cnt == 0 &&
> + async_queues_cnt == 1) || sync_queues_cnt == 1) &&
> + sample_valid(cic->ttime_samples) &&
> + cic->ttime_mean < cfqd->cfq_slice_idle)
> + return -1;
> +normal:
> + /*
> + * We have other queues, don't allow more IO from this one
> + */
> + if (cfq_slice_used_soon(cfqd, cfqq))
> + return 0;
> + else
> + /*
> + * Normally we start throttling cfqq when cfq_quantum/2
> + * requests have been dispatched. But we can drive
> + * deeper queue depths at the beginning of slice
> + * subjected to upper limit of cfq_quantum.
> + * */
> + return cfqd->cfq_quantum;
> +}
> +
> static bool cfq_may_dispatch(struct cfq_data *cfqd, struct cfq_queue *cfqq)
> {
> unsigned int max_dispatch;
> @@ -2411,25 +2482,9 @@ static bool cfq_may_dispatch(struct cfq_
> if (cfq_class_idle(cfqq))
> return false;
>
> - /*
> - * We have other queues, don't allow more IO from this one
> - */
> - if (cfqd->busy_queues > 1 && cfq_slice_used_soon(cfqd, cfqq))
> + max_dispatch = cfq_queue_max_quantum(cfqd, cfqq);
> + if (max_dispatch == 0)
> return false;
> -
> - /*
> - * Sole queue user, no limit
> - */
> - if (cfqd->busy_queues == 1)
> - max_dispatch = -1;
> - else
> - /*
> - * Normally we start throttling cfqq when cfq_quantum/2
> - * requests have been dispatched. But we can drive
> - * deeper queue depths at the beginning of slice
> - * subjected to upper limit of cfq_quantum.
> - * */
> - max_dispatch = cfqd->cfq_quantum;
> }
>
> /*

2011-03-02 21:05:39

by Jeff Moyer

Subject: Re: cfq-iosched preempt issues

Vivek Goyal <[email protected]> writes:

> On Wed, Mar 02, 2011 at 08:43:41PM +0800, Shaohua Li wrote:
>> queue preemption is good for some workloads and not for others. With commit
>> f8ae6e3eb825, the impact is amplified. I currently have two issues with it:
>> 1. In a multi-threaded workload, each thread runs a random read/write (for
>> example, mmap write) with iodepth 1. I found the queue depth gets smaller
>> with commit f8ae6e3eb825. The reason is that writes get preempted, so more
>> threads are waiting on writes while fewer threads are doing reads. This makes
>> the queue depth small, so performance drops a little. So in this case,
>> speeding up writes would speed up reads too, but we can't detect it.
>> 2. cfq_may_dispatch doesn't limit queue depth if the queue is the sole queue.
>> What if there are two queues, one sync and one async? If the sync queue's
>> think time is small, we can treat it as the sole queue, because the sync queue
>> will preempt the async queue, so we don't need to care about the async queue's
>> latency. The issue existed before, but f8ae6e3eb825 amplifies it. Below is a
>> patch for it.
>>
>> Any idea?
>
> CFQ is already very complicated, let's try to keep it simple. Because it
> is complicated, making it hierarchical for cgroups becomes even harder.
>
> IIUC, you are saying that the cfqd->busy_queues check is not sufficient,
> as it takes async queues into account as well.
>
> So we can keep another count, say cfqd->busy_sync_queues, and if there
> are no busy sync queues, allow unlimited depth. That should be a really
> simple few-line change.

That covers workload 2, but what about 1? I'm really not sure what the
workload there is.

Cheers,
Jeff

2011-03-02 21:27:40

by Vivek Goyal

Subject: Re: cfq-iosched preempt issues

On Wed, Mar 02, 2011 at 04:05:30PM -0500, Jeff Moyer wrote:
> Vivek Goyal <[email protected]> writes:
>
> > On Wed, Mar 02, 2011 at 08:43:41PM +0800, Shaohua Li wrote:
> >> queue preemption is good for some workloads and not for others. With commit
> >> f8ae6e3eb825, the impact is amplified. I currently have two issues with it:
> >> 1. In a multi-threaded workload, each thread runs a random read/write (for
> >> example, mmap write) with iodepth 1. I found the queue depth gets smaller
> >> with commit f8ae6e3eb825. The reason is that writes get preempted, so more
> >> threads are waiting on writes while fewer threads are doing reads. This makes
> >> the queue depth small, so performance drops a little. So in this case,
> >> speeding up writes would speed up reads too, but we can't detect it.
> >> 2. cfq_may_dispatch doesn't limit queue depth if the queue is the sole queue.
> >> What if there are two queues, one sync and one async? If the sync queue's
> >> think time is small, we can treat it as the sole queue, because the sync queue
> >> will preempt the async queue, so we don't need to care about the async queue's
> >> latency. The issue existed before, but f8ae6e3eb825 amplifies it. Below is a
> >> patch for it.
> >>
> >> Any idea?
> >
> > CFQ is already very complicated, let's try to keep it simple. Because it
> > is complicated, making it hierarchical for cgroups becomes even harder.
> >
> > IIUC, you are saying that the cfqd->busy_queues check is not sufficient,
> > as it takes async queues into account as well.
> >
> > So we can keep another count, say cfqd->busy_sync_queues, and if there
> > are no busy sync queues, allow unlimited depth. That should be a really
> > simple few-line change.
>
> That covers workload 2, but what about 1? I'm really not sure what the
> workload there is.

But CFQ can't track whether reads are stuck behind pending writes. And the
whole philosophy is to give READS the importance, not WRITES. So I
am not sure what we can do about the first case.

If we are really worried about performance and willing to lose isolation
in the process (read vs write isolation, or isolation across groups), then
maybe we can think of implementing another tunable, say min_queue_depth,
that tells CFQ not to idle if it is not driving min_queue_depth.

But again, this should be backed by some real workloads.

Thanks
Vivek

2011-03-02 21:47:21

by Jeff Moyer

Subject: Re: cfq-iosched preempt issues

Vivek Goyal <[email protected]> writes:

> On Wed, Mar 02, 2011 at 04:05:30PM -0500, Jeff Moyer wrote:
>> Vivek Goyal <[email protected]> writes:
>>
>> > On Wed, Mar 02, 2011 at 08:43:41PM +0800, Shaohua Li wrote:
>> >> queue preemption is good for some workloads and not for others. With commit
>> >> f8ae6e3eb825, the impact is amplified. I currently have two issues with it:
>> >> 1. In a multi-threaded workload, each thread runs a random read/write (for
>> >> example, mmap write) with iodepth 1. I found the queue depth gets smaller
>> >> with commit f8ae6e3eb825. The reason is that writes get preempted, so more
>> >> threads are waiting on writes while fewer threads are doing reads. This makes
>> >> the queue depth small, so performance drops a little. So in this case,
>> >> speeding up writes would speed up reads too, but we can't detect it.
>> >> 2. cfq_may_dispatch doesn't limit queue depth if the queue is the sole queue.
>> >> What if there are two queues, one sync and one async? If the sync queue's
>> >> think time is small, we can treat it as the sole queue, because the sync queue
>> >> will preempt the async queue, so we don't need to care about the async queue's
>> >> latency. The issue existed before, but f8ae6e3eb825 amplifies it. Below is a
>> >> patch for it.
>> >>
>> >> Any idea?
>> >
>> > CFQ is already very complicated, let's try to keep it simple. Because it
>> > is complicated, making it hierarchical for cgroups becomes even harder.
>> >
>> > IIUC, you are saying that the cfqd->busy_queues check is not sufficient,
>> > as it takes async queues into account as well.
>> >
>> > So we can keep another count, say cfqd->busy_sync_queues, and if there
>> > are no busy sync queues, allow unlimited depth. That should be a really
>> > simple few-line change.
>>
>> That covers workload 2, but what about 1? I'm really not sure what the
>> workload there is.
>
> But CFQ can't track whether reads are stuck behind pending writes. And the
> whole philosophy is to give READS the importance, not WRITES. So I
> am not sure what we can do about the first case.

OK, I suspected it might be reads backed up behind writes, but wasn't
sure. I agree that we can't tell that's happening, and it's less clear
whether we'd even want to do anything about it.

> If we are really worried about performance and willing to lose isolation
> in the process (read vs write isolation, or isolation across groups), then
> maybe we can think of implementing another tunable, say min_queue_depth,
> that tells CFQ not to idle if it is not driving min_queue_depth.

Hm, I think that would break a lot of things. ;-)

> But again, this should be backed by some real workloads.

I agree, and said as much in my initial response to Shaohua.

Cheers,
Jeff

2011-03-03 00:46:31

by Shaohua Li

Subject: Re: cfq-iosched preempt issues

On Thu, 2011-03-03 at 00:17 +0800, Jeff Moyer wrote:
> Shaohua Li <[email protected]> writes:
>
> > queue preemption is good for some workloads and not for others. With commit
> > f8ae6e3eb825, the impact is amplified. I currently have two issues with it:
> > 1. In a multi-threaded workload, each thread runs a random read/write (for
> > example, mmap write) with iodepth 1. I found the queue depth gets smaller
> > with commit f8ae6e3eb825. The reason is that writes get preempted, so more
> > threads are waiting on writes while fewer threads are doing reads. This makes
> > the queue depth small, so performance drops a little. So in this case,
> > speeding up writes would speed up reads too, but we can't detect it.
>
> I don't fully understand your workload. What is the aio-stress or fio
> command line/config file?
Ha, sure, this is a fio workload; the fio script is attached. It's
basically a simple mmap random write workload.
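
For a rough idea of the shape of such a job, a jobfile like the following would produce the described pattern. The actual attached jobfile is not reproduced here; every value below is illustrative, not Shaohua's.

```ini
; Illustrative fio job: many threads doing random I/O through mmap,
; one outstanding request each (sync engines always have iodepth 1).
[global]
ioengine=mmap        ; reads/writes go through a memory mapping
rw=randrw            ; mixed random read/write
bs=4k
numjobs=32           ; multi-threaded, as in the description
size=1g
runtime=60
directory=/mnt/test  ; hypothetical test mount point

[mmap-randrw]
```

With a job like this, the dirtied mmap pages reach the disk as async writeback, which is what competes with the reads in CFQ.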

> > 2. cfq_may_dispatch doesn't limit queue depth if the queue is the sole queue.
> > What if there are two queues, one sync and one async? If the sync queue's
> > think time is small, we can treat it as the sole queue, because the sync queue
> > will preempt the async queue, so we don't need to care about the async queue's
> > latency. The issue existed before, but f8ae6e3eb825 amplifies it. Below is a
> > patch for it.
>
> I'm not sure I entirely agree with that reasoning. Do you have real
> workloads that are regressing due to this commit, or is it just these
> cooked up benchmarks?
This is aiostress with the options '-O -l -L -s 1200m -i 1 -r 16k'.
The random read throughput is impacted by preemption.

Thanks,
Shaohua


Attachments:
jobfile (11.92 kB)

2011-03-03 00:49:33

by Shaohua Li

Subject: Re: cfq-iosched preempt issues

On Thu, 2011-03-03 at 04:21 +0800, Vivek Goyal wrote:
> On Wed, Mar 02, 2011 at 08:43:41PM +0800, Shaohua Li wrote:
> > queue preemption is good for some workloads and not for others. With commit
> > f8ae6e3eb825, the impact is amplified. I currently have two issues with it:
> > 1. In a multi-threaded workload, each thread runs a random read/write (for
> > example, mmap write) with iodepth 1. I found the queue depth gets smaller
> > with commit f8ae6e3eb825. The reason is that writes get preempted, so more
> > threads are waiting on writes while fewer threads are doing reads. This makes
> > the queue depth small, so performance drops a little. So in this case,
> > speeding up writes would speed up reads too, but we can't detect it.
> > 2. cfq_may_dispatch doesn't limit queue depth if the queue is the sole queue.
> > What if there are two queues, one sync and one async? If the sync queue's
> > think time is small, we can treat it as the sole queue, because the sync queue
> > will preempt the async queue, so we don't need to care about the async queue's
> > latency. The issue existed before, but f8ae6e3eb825 amplifies it. Below is a
> > patch for it.
> >
> > Any idea?
>
> CFQ is already very complicated, let's try to keep it simple. Because it
> is complicated, making it hierarchical for cgroups becomes even harder.
>
> IIUC, you are saying that the cfqd->busy_queues check is not sufficient,
> as it takes async queues into account as well.
>
> So we can keep another count, say cfqd->busy_sync_queues, and if there
> are no busy sync queues, allow unlimited depth. That should be a really
> simple few-line change.
Sure, this is OK too.

> But let's do it only if you have a real-life workload. Similarly, we can
> worry about the RT case when there is a real workload behind it.
aiostress is the workload, but I don't have a real workload for the RT case.

Thanks,
Shaohua

2011-03-03 01:05:38

by Shaohua Li

Subject: Re: cfq-iosched preempt issues

On Thu, 2011-03-03 at 05:27 +0800, Vivek Goyal wrote:
> On Wed, Mar 02, 2011 at 04:05:30PM -0500, Jeff Moyer wrote:
> > Vivek Goyal <[email protected]> writes:
> >
> > > On Wed, Mar 02, 2011 at 08:43:41PM +0800, Shaohua Li wrote:
> > >> queue preemption is good for some workloads and not for others. With commit
> > >> f8ae6e3eb825, the impact is amplified. I currently have two issues with it:
> > >> 1. In a multi-threaded workload, each thread runs a random read/write (for
> > >> example, mmap write) with iodepth 1. I found the queue depth gets smaller
> > >> with commit f8ae6e3eb825. The reason is that writes get preempted, so more
> > >> threads are waiting on writes while fewer threads are doing reads. This makes
> > >> the queue depth small, so performance drops a little. So in this case,
> > >> speeding up writes would speed up reads too, but we can't detect it.
> > >> 2. cfq_may_dispatch doesn't limit queue depth if the queue is the sole queue.
> > >> What if there are two queues, one sync and one async? If the sync queue's
> > >> think time is small, we can treat it as the sole queue, because the sync queue
> > >> will preempt the async queue, so we don't need to care about the async queue's
> > >> latency. The issue existed before, but f8ae6e3eb825 amplifies it. Below is a
> > >> patch for it.
> > >>
> > >> Any idea?
> > >
> > > CFQ is already very complicated, let's try to keep it simple. Because it
> > > is complicated, making it hierarchical for cgroups becomes even harder.
> > >
> > > IIUC, you are saying that the cfqd->busy_queues check is not sufficient,
> > > as it takes async queues into account as well.
> > >
> > > So we can keep another count, say cfqd->busy_sync_queues, and if there
> > > are no busy sync queues, allow unlimited depth. That should be a really
> > > simple few-line change.
> >
> > That covers workload 2, but what about 1? I'm really not sure what the
> > workload there is.
>
> But CFQ can't track whether reads are stuck behind pending writes. And the
> whole philosophy is to give READS the importance, not WRITES. So I
> am not sure what we can do about the first case.
I'm also not sure whether we should handle that case, since we should
give READs priority.

> If we are really worried about performance and willing to lose isolation
> in the process (read vs write isolation, or isolation across groups), then
> maybe we can think of implementing another tunable, say min_queue_depth,
> that tells CFQ not to idle if it is not driving min_queue_depth.
The NCQ disk gives a lot of challenges to CFQ. It is hard to utilize the
full disk queue depth without losing isolation. A tunable seems the
best option for people who don't care so much about latency.