2009-12-08 22:53:09

by Vivek Goyal

Subject: [PATCH 1/2] cfq-iosched: Get rid of cfqq wait_busy_done flag

o Get rid of the wait_busy_done flag. This flag only tells us that we were
doing wait busy on a queue and that the queue got a request, so it should be
expired. That information can easily be obtained by (cfq_cfqq_wait_busy() &&
queue_is_not_empty). So remove this flag and keep the code simple.

Signed-off-by: Vivek Goyal <[email protected]>
---
block/cfq-iosched.c | 17 ++++++++---------
1 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index cfb0b2f..276d765 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -319,7 +319,6 @@ enum cfqq_state_flags {
CFQ_CFQQ_FLAG_coop, /* cfqq is shared */
CFQ_CFQQ_FLAG_deep, /* sync cfqq experienced large depth */
CFQ_CFQQ_FLAG_wait_busy, /* Waiting for next request */
- CFQ_CFQQ_FLAG_wait_busy_done, /* Got new request. Expire the queue */
};

#define CFQ_CFQQ_FNS(name) \
@@ -348,7 +347,6 @@ CFQ_CFQQ_FNS(sync);
CFQ_CFQQ_FNS(coop);
CFQ_CFQQ_FNS(deep);
CFQ_CFQQ_FNS(wait_busy);
-CFQ_CFQQ_FNS(wait_busy_done);
#undef CFQ_CFQQ_FNS

#ifdef CONFIG_DEBUG_CFQ_IOSCHED
@@ -1574,7 +1572,6 @@ __cfq_slice_expired(struct cfq_data *cfqd, struct cfq_queue *cfqq,

cfq_clear_cfqq_wait_request(cfqq);
cfq_clear_cfqq_wait_busy(cfqq);
- cfq_clear_cfqq_wait_busy_done(cfqq);

/*
* store what was left of this slice, if the queue idled/timed out
@@ -2128,11 +2125,17 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd)

if (!cfqd->rq_queued)
return NULL;
+
+ /*
+ * We were waiting for group to get backlogged. Expire the queue
+ */
+ if (cfq_cfqq_wait_busy(cfqq) && !RB_EMPTY_ROOT(&cfqq->sort_list))
+ goto expire;
+
/*
* The active queue has run out of time, expire it and select new.
*/
- if ((cfq_slice_used(cfqq) || cfq_cfqq_wait_busy_done(cfqq))
- && !cfq_cfqq_must_dispatch(cfqq))
+ if (cfq_slice_used(cfqq) && !cfq_cfqq_must_dispatch(cfqq))
goto expire;

/*
@@ -3165,10 +3168,6 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
cfqq->last_request_pos = blk_rq_pos(rq) + blk_rq_sectors(rq);

if (cfqq == cfqd->active_queue) {
- if (cfq_cfqq_wait_busy(cfqq)) {
- cfq_clear_cfqq_wait_busy(cfqq);
- cfq_mark_cfqq_wait_busy_done(cfqq);
- }
/*
* Remember that we saw a request from this process, but
* don't start queuing just yet. Otherwise we risk seeing lots
--
1.6.2.5
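
The equivalence claimed in the changelog of 1/2 can be sketched outside the kernel. The following is a toy model only: `struct toy_queue`, `nr_queued` and all `old_*`/`new_*` helpers are hypothetical names, not kernel code, and the model leaves out slice_used and must_dispatch. It assumes wait busy always begins on an empty queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct cfq_queue. */
struct toy_queue {
	bool wait_busy;      /* models CFQ_CFQQ_FLAG_wait_busy */
	bool wait_busy_done; /* models the flag removed by this patch */
	int nr_queued;       /* > 0 models a non-empty cfqq->sort_list */
};

/* Old scheme: a new request converts wait_busy into wait_busy_done. */
static void old_enqueue(struct toy_queue *q)
{
	assert(q);
	q->nr_queued++;
	if (q->wait_busy) {
		q->wait_busy = false;
		q->wait_busy_done = true;
	}
}

/* New scheme: enqueueing a request does not touch the flags. */
static void new_enqueue(struct toy_queue *q)
{
	assert(q);
	q->nr_queued++;
}

/* Old scheme: expire once the extra flag has been set. */
static bool old_should_expire(const struct toy_queue *q)
{
	return q->wait_busy_done;
}

/* New scheme: derive the same answer from wait_busy and the backlog. */
static bool new_should_expire(const struct toy_queue *q)
{
	return q->wait_busy && q->nr_queued > 0;
}
```

Starting both models from the same state (wait_busy set, nothing queued) and enqueueing one request makes both report "expire"; a queue that was never marked wait_busy reports "do not expire" under both schemes.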


2009-12-08 22:53:06

by Vivek Goyal

Subject: [PATCH 2/2] cfq-iosched: Take care of corner cases of group losing share due to deletion

If there is a sequential reader running in a group, we wait for next request
to come in that group after slice expiry and once new request is in, we expire
the queue. Otherwise we delete the group from the service tree and the group loses
its fair share.

So far I was marking a queue as wait_busy if it had consumed its slice and
it was last queue in the group. But this condition did not cover following
two cases.

1. If a request completed and slice has not expired yet. Next request comes
in and is dispatched to disk. Now select_queue() hits and slice has expired.
This group will be deleted. Because request is still in the disk, this queue
will never get a chance to wait_busy.

2. If a request completed and slice has not expired yet. Before next request
comes in (delay due to think time), select_queue() hits and expires the
queue hence group. This queue never got a chance to wait busy.

Gui was hitting the boundary condition 1 and not getting fairness numbers
proportional to weight.

This patch puts the checks for above two conditions and improves the fairness
numbers for sequential workload on rotational media. Check in select_queue()
takes care of case 1 and additional check in should_wait_busy() takes care
of case 2.

Reported-by: Gui Jianfeng <[email protected]>
Signed-off-by: Vivek Goyal <[email protected]>
---
block/cfq-iosched.c | 55 +++++++++++++++++++++++++++++++++++++++++++++-----
1 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 276d765..50da108 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -2135,8 +2135,22 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd)
/*
* The active queue has run out of time, expire it and select new.
*/
- if (cfq_slice_used(cfqq) && !cfq_cfqq_must_dispatch(cfqq))
- goto expire;
+ if (cfq_slice_used(cfqq) && !cfq_cfqq_must_dispatch(cfqq)) {
+ /*
+ * If slice had not expired at the completion of last request
+ * we might not have turned on wait_busy flag. Don't expire
+ * the queue yet. Allow the group to get backlogged.
+ *
+ * The very fact that we have used the slice, that means we
+ * have been idling all along on this queue and it should be
+ * ok to wait for this request to complete.
+ */
+ if (cfqq->cfqg->nr_cfqq == 1 && cfqq->dispatched
+ && cfq_should_idle(cfqd, cfqq))
+ goto keep_queue;
+ else
+ goto expire;
+ }

/*
* The active queue has requests and isn't expired, allow it to
@@ -3250,6 +3264,36 @@ static void cfq_update_hw_tag(struct cfq_data *cfqd)
cfqd->hw_tag = 0;
}

+static inline bool
+cfq_should_wait_busy(struct cfq_data *cfqd, struct cfq_queue *cfqq)
+{
+ struct cfq_io_context *cic = cfqd->active_cic;
+
+ /* If there are other queues in the group, don't wait */
+ if (cfqq->cfqg->nr_cfqq > 1)
+ return false;
+
+ if (cfq_slice_used(cfqq))
+ return true;
+
+ /* if slice left is less than think time, wait busy */
+ if (cic && sample_valid(cic->ttime_samples)
+ && (cfqq->slice_end - jiffies < cic->ttime_mean))
+ return true;
+
+ /*
+ * If think time is less than a jiffy then ttime_mean=0 and above
+ * will not be true. It might happen that slice has not expired yet
+ * but will expire soon (4-5 ns) during select_queue(). To cover the
+ * case where think time is less than a jiffy, mark the queue wait
+ * busy if only 1 jiffy is left in the slice.
+ */
+ if (cfqq->slice_end - jiffies == 1)
+ return true;
+
+ return false;
+}
+
static void cfq_completed_request(struct request_queue *q, struct request *rq)
{
struct cfq_queue *cfqq = RQ_CFQQ(rq);
@@ -3288,11 +3332,10 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
}

/*
- * If this queue consumed its slice and this is last queue
- * in the group, wait for next request before we expire
- * the queue
+ * Should we wait for next request to come in before we expire
+ * the queue.
*/
- if (cfq_slice_used(cfqq) && cfqq->cfqg->nr_cfqq == 1) {
+ if (cfq_should_wait_busy(cfqd, cfqq)) {
cfqq->slice_end = jiffies + cfqd->cfq_slice_idle;
cfq_mark_cfqq_wait_busy(cfqq);
}
--
1.6.2.5
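
The order of checks in cfq_should_wait_busy() can be tried out in user space with a small model. This is a sketch under stated assumptions: `struct toy_state` and the `toy_*` functions are hypothetical, the current time is passed in as `now` instead of reading jiffies, and `toy_slice_used()` simply treats the slice as used once `now` reaches `slice_end` (the real cfq_slice_used() is not shown in this thread and may do more).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, flattened view of the fields the check reads. */
struct toy_state {
	unsigned long now;        /* stands in for jiffies */
	unsigned long slice_end;  /* models cfqq->slice_end */
	int nr_cfqq;              /* models cfqq->cfqg->nr_cfqq */
	bool ttime_valid;         /* models sample_valid(cic->ttime_samples) */
	unsigned long ttime_mean; /* models cic->ttime_mean, in jiffies */
};

/* Assumption: the slice counts as used once now reaches slice_end. */
static bool toy_slice_used(const struct toy_state *s)
{
	return s->now >= s->slice_end;
}

static bool toy_should_wait_busy(const struct toy_state *s)
{
	assert(s && s->nr_cfqq >= 1);

	/* Other queues keep the group on the service tree, don't wait. */
	if (s->nr_cfqq > 1)
		return false;

	if (toy_slice_used(s))
		return true;

	/* Slice left is less than mean think time. */
	if (s->ttime_valid && s->slice_end - s->now < s->ttime_mean)
		return true;

	/* Sub-jiffy think time: ttime_mean is 0, so catch the last jiffy. */
	if (s->slice_end - s->now == 1)
		return true;

	return false;
}
```

With one jiffy left and ttime_mean equal to 0, the think time comparison (1 < 0) fails and only the final check marks the queue wait busy, which is the case the last comment in the function describes.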

2009-12-09 13:56:35

by Jens Axboe

Subject: Re: [PATCH 2/2] cfq-iosched: Take care of corner cases of group losing share due to deletion

On Tue, Dec 08 2009, Vivek Goyal wrote:
> If there is a sequential reader running in a group, we wait for next request
> to come in that group after slice expiry and once new request is in, we expire
> the queue. Otherwise we delete the group from the service tree and the group loses
> its fair share.
>
> So far I was marking a queue as wait_busy if it had consumed its slice and
> it was last queue in the group. But this condition did not cover following
> two cases.
>
> 1. If a request completed and slice has not expired yet. Next request comes
> in and is dispatched to disk. Now select_queue() hits and slice has expired.
> This group will be deleted. Because request is still in the disk, this queue
> will never get a chance to wait_busy.
>
> 2. If a request completed and slice has not expired yet. Before next request
> comes in (delay due to think time), select_queue() hits and expires the
> queue hence group. This queue never got a chance to wait busy.
>
> Gui was hitting the boundary condition 1 and not getting fairness numbers
> proportional to weight.
>
> This patch puts the checks for above two conditions and improves the fairness
> numbers for sequential workload on rotational media. Check in select_queue()
> takes care of case 1 and additional check in should_wait_busy() takes care
> of case 2.

I think this (and 1/2) look fine, just one minor comment:

> @@ -3250,6 +3264,36 @@ static void cfq_update_hw_tag(struct cfq_data *cfqd)
> cfqd->hw_tag = 0;
> }
>
> +static inline bool
> +cfq_should_wait_busy(struct cfq_data *cfqd, struct cfq_queue *cfqq)
> +{

That's too large to inline.

--
Jens Axboe

2009-12-09 15:17:25

by Vivek Goyal

Subject: Re: [PATCH 2/2] cfq-iosched: Take care of corner cases of group losing share due to deletion

On Wed, Dec 09, 2009 at 02:56:39PM +0100, Jens Axboe wrote:
> On Tue, Dec 08 2009, Vivek Goyal wrote:
> > If there is a sequential reader running in a group, we wait for next request
> > to come in that group after slice expiry and once new request is in, we expire
> > the queue. Otherwise we delete the group from the service tree and the group loses
> > its fair share.
> >
> > So far I was marking a queue as wait_busy if it had consumed its slice and
> > it was last queue in the group. But this condition did not cover following
> > two cases.
> >
> > 1. If a request completed and slice has not expired yet. Next request comes
> > in and is dispatched to disk. Now select_queue() hits and slice has expired.
> > This group will be deleted. Because request is still in the disk, this queue
> > will never get a chance to wait_busy.
> >
> > 2. If a request completed and slice has not expired yet. Before next request
> > comes in (delay due to think time), select_queue() hits and expires the
> > queue hence group. This queue never got a chance to wait busy.
> >
> > Gui was hitting the boundary condition 1 and not getting fairness numbers
> > proportional to weight.
> >
> > This patch puts the checks for above two conditions and improves the fairness
> > numbers for sequential workload on rotational media. Check in select_queue()
> > takes care of case 1 and additional check in should_wait_busy() takes care
> > of case 2.
>
> I think this (and 1/2) look fine, just one minor comment:
>
> > @@ -3250,6 +3264,36 @@ static void cfq_update_hw_tag(struct cfq_data *cfqd)
> > cfqd->hw_tag = 0;
> > }
> >
> > +static inline bool
> > +cfq_should_wait_busy(struct cfq_data *cfqd, struct cfq_queue *cfqq)
> > +{
>
> That's too large to inline.

Hi Jens,

Please find below the new version of patch. I have removed inline from
cfq_should_wait_busy().

Please let me know if you prefer a separate posting in a new mail thread.

Thanks
Vivek


DESC
cfq-iosched: Take care of corner cases of group losing share due to deletion
EDESC

If there is a sequential reader running in a group, we wait for next request
to come in that group after slice expiry and once new request is in, we expire
the queue. Otherwise we delete the group from the service tree and the group loses
its fair share.

So far I was marking a queue as wait_busy if it had consumed its slice and
it was last queue in the group. But this condition did not cover following
two cases.

1. If a request completed and slice has not expired yet. Next request comes
in and is dispatched to disk. Now select_queue() hits and slice has expired.
This group will be deleted. Because request is still in the disk, this queue
will never get a chance to wait_busy.

2. If a request completed and slice has not expired yet. Before next request
comes in (delay due to think time), select_queue() hits and expires the
queue hence group. This queue never got a chance to wait busy.

Gui was hitting the boundary condition 1 and not getting fairness numbers
proportional to weight.

This patch puts the checks for above two conditions and improves the fairness
numbers for sequential workload on rotational media. Check in select_queue()
takes care of case 1 and additional check in should_wait_busy() takes care
of case 2.

Reported-by: Gui Jianfeng <[email protected]>
Signed-off-by: Vivek Goyal <[email protected]>
---
block/cfq-iosched.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 48 insertions(+), 6 deletions(-)

Index: linux2/block/cfq-iosched.c
===================================================================
--- linux2.orig/block/cfq-iosched.c 2009-12-08 17:44:32.000000000 -0500
+++ linux2/block/cfq-iosched.c 2009-12-09 09:54:08.000000000 -0500
@@ -2135,8 +2135,22 @@ static struct cfq_queue *cfq_select_queu
/*
* The active queue has run out of time, expire it and select new.
*/
- if (cfq_slice_used(cfqq) && !cfq_cfqq_must_dispatch(cfqq))
- goto expire;
+ if (cfq_slice_used(cfqq) && !cfq_cfqq_must_dispatch(cfqq)) {
+ /*
+ * If slice had not expired at the completion of last request
+ * we might not have turned on wait_busy flag. Don't expire
+ * the queue yet. Allow the group to get backlogged.
+ *
+ * The very fact that we have used the slice, that means we
+ * have been idling all along on this queue and it should be
+ * ok to wait for this request to complete.
+ */
+ if (cfqq->cfqg->nr_cfqq == 1 && cfqq->dispatched
+ && cfq_should_idle(cfqd, cfqq))
+ goto keep_queue;
+ else
+ goto expire;
+ }

/*
* The active queue has requests and isn't expired, allow it to
@@ -3250,6 +3264,35 @@ static void cfq_update_hw_tag(struct cfq
cfqd->hw_tag = 0;
}

+static bool cfq_should_wait_busy(struct cfq_data *cfqd, struct cfq_queue *cfqq)
+{
+ struct cfq_io_context *cic = cfqd->active_cic;
+
+ /* If there are other queues in the group, don't wait */
+ if (cfqq->cfqg->nr_cfqq > 1)
+ return false;
+
+ if (cfq_slice_used(cfqq))
+ return true;
+
+ /* if slice left is less than think time, wait busy */
+ if (cic && sample_valid(cic->ttime_samples)
+ && (cfqq->slice_end - jiffies < cic->ttime_mean))
+ return true;
+
+ /*
+ * If think time is less than a jiffy then ttime_mean=0 and above
+ * will not be true. It might happen that slice has not expired yet
+ * but will expire soon (4-5 ns) during select_queue(). To cover the
+ * case where think time is less than a jiffy, mark the queue wait
+ * busy if only 1 jiffy is left in the slice.
+ */
+ if (cfqq->slice_end - jiffies == 1)
+ return true;
+
+ return false;
+}
+
static void cfq_completed_request(struct request_queue *q, struct request *rq)
{
struct cfq_queue *cfqq = RQ_CFQQ(rq);
@@ -3288,11 +3331,10 @@ static void cfq_completed_request(struct
}

/*
- * If this queue consumed its slice and this is last queue
- * in the group, wait for next request before we expire
- * the queue
+ * Should we wait for next request to come in before we expire
+ * the queue.
*/
- if (cfq_slice_used(cfqq) && cfqq->cfqg->nr_cfqq == 1) {
+ if (cfq_should_wait_busy(cfqd, cfqq)) {
cfqq->slice_end = jiffies + cfqd->cfq_slice_idle;
cfq_mark_cfqq_wait_busy(cfqq);
}
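
The new branch in cfq_select_queue() that covers case 1 boils down to a three-way decision, sketched here as a user-space toy. `enum toy_action`, `struct toy_active` and `toy_slice_check()` are hypothetical names; `should_idle` stands in for the result of cfq_should_idle(), which is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of the slice check in select_queue(). */
enum toy_action { TOY_CONTINUE, TOY_KEEP_QUEUE, TOY_EXPIRE };

/* Hypothetical snapshot of the active queue at select_queue() time. */
struct toy_active {
	bool slice_used;    /* models cfq_slice_used(cfqq) */
	bool must_dispatch; /* models cfq_cfqq_must_dispatch(cfqq) */
	int nr_cfqq;        /* models cfqq->cfqg->nr_cfqq */
	int dispatched;     /* models cfqq->dispatched */
	bool should_idle;   /* models cfq_should_idle(cfqd, cfqq) */
};

static enum toy_action toy_slice_check(const struct toy_active *q)
{
	assert(q && q->nr_cfqq >= 1 && q->dispatched >= 0);

	/* Slice not used up (or a dispatch is owed): fall through. */
	if (!q->slice_used || q->must_dispatch)
		return TOY_CONTINUE;

	/*
	 * Last queue in the group with a request still on the disk:
	 * keep the queue so the completion path can mark it wait busy.
	 */
	if (q->nr_cfqq == 1 && q->dispatched > 0 && q->should_idle)
		return TOY_KEEP_QUEUE;

	return TOY_EXPIRE;
}
```

A sole queue in its group with an expired slice and one request in flight yields TOY_KEEP_QUEUE; the same queue with nothing in flight, or with a sibling queue in the group, yields TOY_EXPIRE.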

2009-12-09 18:50:39

by Jens Axboe

Subject: Re: [PATCH 2/2] cfq-iosched: Take care of corner cases of group losing share due to deletion

On Wed, Dec 09 2009, Vivek Goyal wrote:
> On Wed, Dec 09, 2009 at 02:56:39PM +0100, Jens Axboe wrote:
> > On Tue, Dec 08 2009, Vivek Goyal wrote:
> > > If there is a sequential reader running in a group, we wait for next request
> > > to come in that group after slice expiry and once new request is in, we expire
> > > the queue. Otherwise we delete the group from the service tree and the group loses
> > > its fair share.
> > >
> > > So far I was marking a queue as wait_busy if it had consumed its slice and
> > > it was last queue in the group. But this condition did not cover following
> > > two cases.
> > >
> > > 1. If a request completed and slice has not expired yet. Next request comes
> > > in and is dispatched to disk. Now select_queue() hits and slice has expired.
> > > This group will be deleted. Because request is still in the disk, this queue
> > > will never get a chance to wait_busy.
> > >
> > > 2. If a request completed and slice has not expired yet. Before next request
> > > comes in (delay due to think time), select_queue() hits and expires the
> > > queue hence group. This queue never got a chance to wait busy.
> > >
> > > Gui was hitting the boundary condition 1 and not getting fairness numbers
> > > proportional to weight.
> > >
> > > This patch puts the checks for above two conditions and improves the fairness
> > > numbers for sequential workload on rotational media. Check in select_queue()
> > > takes care of case 1 and additional check in should_wait_busy() takes care
> > > of case 2.
> >
> > I think this (and 1/2) look fine, just one minor comment:
> >
> > > @@ -3250,6 +3264,36 @@ static void cfq_update_hw_tag(struct cfq_data *cfqd)
> > > cfqd->hw_tag = 0;
> > > }
> > >
> > > +static inline bool
> > > +cfq_should_wait_busy(struct cfq_data *cfqd, struct cfq_queue *cfqq)
> > > +{
> >
> > That's too large to inline.
>
> Hi Jens,
>
> Please find below the new version of patch. I have removed inline from
> cfq_should_wait_busy().
>
> Please let me know if you prefer a separate posting in a new mail thread.

No problem, actually I just hand-edited your previous patch when
applying it. Sorry, should have said so!

--
Jens Axboe

2009-12-09 19:03:11

by Jeff Moyer

Subject: Re: [PATCH 1/2] cfq-iosched: Get rid of cfqq wait_busy_done flag

Vivek Goyal <[email protected]> writes:

> o Get rid of wait_busy_done flag. This flag only tells we were doing wait
> busy on a queue and that queue got request so expire it. That information
> can easily be obtained by (cfq_cfqq_wait_busy() && queue_is_not_empty). So
> remove this flag and keep code simple.
>
> Signed-off-by: Vivek Goyal <[email protected]>
Reviewed-by: Jeff Moyer <[email protected]>

2009-12-09 19:06:17

by Jeff Moyer

Subject: Re: [PATCH 2/2] cfq-iosched: Take care of corner cases of group losing share due to deletion

Vivek Goyal <[email protected]> writes:

> Reported-by: Gui Jianfeng <[email protected]>
> Signed-off-by: Vivek Goyal <[email protected]>

> + /*
> + * If think time is less than a jiffy then ttime_mean=0 and above
> + * will not be true. It might happen that slice has not expired yet
> + * but will expire soon (4-5 ns) during select_queue(). To cover the

4-5ns? I'm not sure why you chose these numbers. Is that what you saw
in testing? Anyway, I'm ok with the change, I dislike the reference to
time when a single jiffy could be up to 10ms.

Reviewed-by: Jeff Moyer <[email protected]>

2009-12-09 20:39:43

by Vivek Goyal

Subject: Re: [PATCH 2/2] cfq-iosched: Take care of corner cases of group losing share due to deletion

On Wed, Dec 09, 2009 at 02:06:10PM -0500, Jeff Moyer wrote:
> Vivek Goyal <[email protected]> writes:
>
> > Reported-by: Gui Jianfeng <[email protected]>
> > Signed-off-by: Vivek Goyal <[email protected]>
>
> > + /*
> > + * If think time is less than a jiffy then ttime_mean=0 and above
> > + * will not be true. It might happen that slice has not expired yet
> > + * but will expire soon (4-5 ns) during select_queue(). To cover the
>
> 4-5ns? I'm not sure why you chose these numbers. Is that what you saw
> in testing?

Yes, that was the number I observed in blktrace while I was debugging
the issue.

> Anyway, I'm ok with the change, I dislike the reference to
> time when a single jiffy could be up to 10ms.
>
> Reviewed-by: Jeff Moyer <[email protected]>
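
For reference, the "up to 10ms" figure follows from the tick rate: one jiffy lasts 1/HZ seconds, so HZ=100 gives 10ms per jiffy. A minimal sketch of that arithmetic, where `jiffy_usecs()` is a hypothetical helper and not a kernel function:

```c
#include <assert.h>

/* Length of one jiffy in microseconds for a given tick rate (HZ). */
static unsigned long jiffy_usecs(unsigned long hz)
{
	assert(hz > 0);
	return 1000000UL / hz;
}
```

With HZ=100 this gives 10000us (10ms), with HZ=250 it gives 4000us and with HZ=1000 it gives 1000us, so a "1 jiffy left" check covers a window several orders of magnitude wider than a few nanoseconds.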