The core block layer already has code to detect presence of command
queuing devices. We convert cfq to use that instead of re-doing the
computation.
Signed-off-by: Corrado Zoccolo <[email protected]>
---
block/cfq-iosched.c | 55 ++------------------------------------------------
1 files changed, 3 insertions(+), 52 deletions(-)
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 829d87d..a185742 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -186,14 +186,6 @@ struct cfq_data {
int sync_flight;
/*
- * queue-depth detection
- */
- int rq_queued;
- int hw_tag;
- int hw_tag_samples;
- int rq_in_driver_peak;
-
- /*
* idle window management
*/
struct timer_list idle_slice_timer;
@@ -912,7 +904,6 @@ static void cfq_remove_request(struct request *rq)
list_del_init(&rq->queuelist);
cfq_del_rq_rb(rq);
- cfqq->cfqd->rq_queued--;
if (rq_is_meta(rq)) {
WARN_ON(!cfqq->meta_pending);
cfqq->meta_pending--;
@@ -1227,7 +1218,7 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
* for devices that support queuing, otherwise we still have a problem
* with sync vs async workloads.
*/
- if (blk_queue_nonrot(cfqd->queue) && cfqd->hw_tag)
+ if (blk_queue_nonrot(cfqd->queue) && blk_queue_queuing(cfqd->queue))
return;
WARN_ON(!RB_EMPTY_ROOT(&cfqq->sort_list));
@@ -1273,7 +1264,8 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
if (cfqd->serving_type == SYNC_NOIDLE_WORKLOAD &&
service_tree_for(cfqd->serving_prio, SYNC_NOIDLE_WORKLOAD, cfqd)
->count > 0) {
- if (blk_queue_nonrot(cfqd->queue) || cfqd->hw_tag)
+ if (blk_queue_nonrot(cfqd->queue) ||
+ blk_queue_queuing(cfqd->queue))
return;
sl = min(sl, msecs_to_jiffies(CFQ_MIN_TT));
}
@@ -2462,7 +2454,6 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
{
struct cfq_io_context *cic = RQ_CIC(rq);
- cfqd->rq_queued++;
if (rq_is_meta(rq))
cfqq->meta_pending++;
@@ -2518,43 +2509,6 @@ static void cfq_insert_request(struct request_queue *q, struct request *rq)
cfq_rq_enqueued(cfqd, cfqq, rq);
}
-/*
- * Update hw_tag based on peak queue depth over 50 samples under
- * sufficient load.
- */
-static void cfq_update_hw_tag(struct cfq_data *cfqd)
-{
- struct cfq_queue *cfqq = cfqd->active_queue;
-
- if (rq_in_driver(cfqd) > cfqd->rq_in_driver_peak)
- cfqd->rq_in_driver_peak = rq_in_driver(cfqd);
-
- if (cfqd->rq_queued <= CFQ_HW_QUEUE_MIN &&
- rq_in_driver(cfqd) <= CFQ_HW_QUEUE_MIN)
- return;
-
- /*
- * If active queue hasn't enough requests and can idle, cfq might not
- * dispatch sufficient requests to hardware. Don't zero hw_tag in this
- * case
- */
- if (cfqq && cfq_cfqq_idle_window(cfqq) &&
- cfqq->dispatched + cfqq->queued[0] + cfqq->queued[1] <
- CFQ_HW_QUEUE_MIN && rq_in_driver(cfqd) < CFQ_HW_QUEUE_MIN)
- return;
-
- if (cfqd->hw_tag_samples++ < 50)
- return;
-
- if (cfqd->rq_in_driver_peak >= CFQ_HW_QUEUE_MIN)
- cfqd->hw_tag = 1;
- else
- cfqd->hw_tag = 0;
-
- cfqd->hw_tag_samples = 0;
- cfqd->rq_in_driver_peak = 0;
-}
-
static void cfq_completed_request(struct request_queue *q, struct request *rq)
{
struct cfq_queue *cfqq = RQ_CFQQ(rq);
@@ -2565,8 +2519,6 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
now = jiffies;
cfq_log_cfqq(cfqd, cfqq, "complete");
- cfq_update_hw_tag(cfqd);
-
WARN_ON(!cfqd->rq_in_driver[sync]);
WARN_ON(!cfqq->dispatched);
cfqd->rq_in_driver[sync]--;
@@ -2959,7 +2911,6 @@ static void *cfq_init_queue(struct request_queue *q)
cfqd->cfq_slice_async_rq = cfq_slice_async_rq;
cfqd->cfq_slice_idle = cfq_slice_idle;
cfqd->cfq_latency = 1;
- cfqd->hw_tag = 1;
cfqd->last_end_sync_rq = jiffies;
return cfqd;
}
--
1.6.2.5
Corrado Zoccolo <[email protected]> writes:
> The core block layer already has code to detect presence of command
> queuing devices. We convert cfq to use that instead of re-doing the
> computation.
This sounds like a good idea to me, and it appears the same number of
requests will be used to detect command queuing. I'm surprised we can't
just query each device to detect this, honestly.
One thing you missed, though, was removing the CFQ_HW_QUEUE_MIN define.
Cheers,
Jeff
On Tue, Nov 10 2009, Corrado Zoccolo wrote:
> The core block layer already has code to detect presence of command
> queuing devices. We convert cfq to use that instead of re-doing the
> computation.
There is a major difference: the CFQ variant is dynamic and the
block layer one is not. This change came from Aaron some time ago IIRC,
see commit 45333d5. It's a bit of a chicken and egg problem.
--
Jens Axboe
Jens Axboe <[email protected]> writes:
> On Tue, Nov 10 2009, Corrado Zoccolo wrote:
>> The core block layer already has code to detect presence of command
>> queuing devices. We convert cfq to use that instead of re-doing the
>> computation.
>
> There is a major difference: the CFQ variant is dynamic and the
> block layer one is not. This change came from Aaron some time ago IIRC,
> see commit 45333d5. It's a bit of a chicken and egg problem.
Really? blk_dequeue_request sure looks like it updates things
dynamically, but only one way (not queueing -> queueing). Would it make
sense to just put CFQ's logic into the block layer so that everyone uses
the same implementation? It makes little sense to have two notions of
whether or not queueing is supported for a device.
Cheers,
Jeff
On Tue, Nov 10 2009, Jeff Moyer wrote:
> Jens Axboe <[email protected]> writes:
>
> > On Tue, Nov 10 2009, Corrado Zoccolo wrote:
> >> The core block layer already has code to detect presence of command
> >> queuing devices. We convert cfq to use that instead of re-doing the
> >> computation.
> >
> > There is a major difference: the CFQ variant is dynamic and the
> > block layer one is not. This change came from Aaron some time ago IIRC,
> > see commit 45333d5. It's a bit of a chicken and egg problem.
>
> Really? blk_dequeue_request sure looks like it updates things
> dynamically, but only one way (not queueing -> queueing). Would it make
Yes of course the block layer one is dynamically on as well. The ideal
goal would be to have every driver use the block layer tagging in which
case we'd know without checking, but alas it isn't so (yet). My point is
that the CFQ variant is dynamically off as well. Corrado presents his
patch as a direct functional equivalent, which it definitely isn't.
> sense to just put CFQ's logic into the block layer so that everyone uses
> the same implementation? It makes little sense to have two notions of
> whether or not queueing is supported for a device.
The one use in the block layer cares about the static property of the
device, not the current behaviour. So I'm not sure it makes a lot of
sense to unify these. It's not really a case of code duplication either,
the block layer one is two checks and a bit. The cfq variant is a bit
more involved in that it tracks the state continually.
--
Jens Axboe
Jens Axboe <[email protected]> writes:
> On Tue, Nov 10 2009, Jeff Moyer wrote:
>> Jens Axboe <[email protected]> writes:
>>
>> > On Tue, Nov 10 2009, Corrado Zoccolo wrote:
>> >> The core block layer already has code to detect presence of command
>> >> queuing devices. We convert cfq to use that instead of re-doing the
>> >> computation.
>> >
>> > There is a major difference: the CFQ variant is dynamic and the
>> > block layer one is not. This change came from Aaron some time ago IIRC,
>> > see commit 45333d5. It's a bit of a chicken and egg problem.
>>
>> Really? blk_dequeue_request sure looks like it updates things
>> dynamically, but only one way (not queueing -> queueing). Would it make
>
> Yes of course the block layer one is dynamically on as well. The ideal
> goal would be to have every driver use the block layer tagging in which
> case we'd know without checking, but alas it isn't so (yet). My point is
> that the CFQ variant is dynamically off as well. Corrado presents his
> patch as a direct functional equivalent, which it definitely isn't.
OK. So we really want to keep track of two things:
1) What queue depth does the hardware support?
2) What is the command queue depth configured to?
That second thing can be changed by the administrator (down from or up
to the maximum value allowed by 1).
>> sense to just put CFQ's logic into the block layer so that everyone uses
>> the same implementation? It makes little sense to have two notions of
>> whether or not queueing is supported for a device.
>
> The one use in the block layer cares about the static property of the
> device, not the current behaviour. So I'm not sure it makes a lot of
> sense to unify these. It's not really a case of code duplication either,
> the block layer one is two checks and a bit. The cfq variant is a bit
> more involved in that it tracks the state continually.
Why don't we simply use the value configured via the queue_depth sysfs
file?
Cheers,
Jeff
On Tue, Nov 10 2009, Jeff Moyer wrote:
> Jens Axboe <[email protected]> writes:
>
> > On Tue, Nov 10 2009, Jeff Moyer wrote:
> >> Jens Axboe <[email protected]> writes:
> >>
> >> > On Tue, Nov 10 2009, Corrado Zoccolo wrote:
> >> >> The core block layer already has code to detect presence of command
> >> >> queuing devices. We convert cfq to use that instead of re-doing the
> >> >> computation.
> >> >
> >> > There is a major difference: the CFQ variant is dynamic and the
> >> > block layer one is not. This change came from Aaron some time ago IIRC,
> >> > see commit 45333d5. It's a bit of a chicken and egg problem.
> >>
> >> Really? blk_dequeue_request sure looks like it updates things
> >> dynamically, but only one way (not queueing -> queueing). Would it make
> >
> > Yes of course the block layer one is dynamically on as well. The ideal
> > goal would be to have every driver use the block layer tagging in which
> > case we'd know without checking, but alas it isn't so (yet). My point is
> > that the CFQ variant is dynamically off as well. Corrado presents his
> > patch as a direct functional equivalent, which it definitely isn't.
>
> OK. So we really want to keep track of two things:
> 1) What queue depth does the hardware support?
> 2) What is the command queue depth configured to?
>
> That second thing can be changed by the administrator (down from or up
> to the maximum value allowed by 1).
>
> >> sense to just put CFQ's logic into the block layer so that everyone uses
> >> the same implementation? It makes little sense to have two notions of
> >> whether or not queueing is supported for a device.
> >
> > The one use in the block layer cares about the static property of the
> > device, not the current behaviour. So I'm not sure it makes a lot of
> > sense to unify these. It's not really a case of code duplication either,
> > the block layer one is two checks and a bit. The cfq variant is a bit
> > more involved in that it tracks the state continually.
>
> Why don't we simply use the value configured via the queue_depth sysfs
> file?
First of all, that only covers SCSI. We could do that by having the tag
on/off functions set the same flag. But even for such devices, actual
tag depth is dependent upon what other devices are on the controller
(since it's often a shared map) and may not even be statically
detectable in the sense that actual depth is only really seen when the
device returns busy on a queue attempt.
In most cases it would work fine, but the dynamic detection is more
reliable. The sysfs setting is in reality a max setting.
--
Jens Axboe
Jens Axboe <[email protected]> writes:
>> Why don't we simply use the value configured via the queue_depth sysfs
>> file?
>
> First of all, that only covers SCSI. We could do that by having the tag
> on/off functions set the same flag. But even for such devices, actual
> tag depth is dependent upon what other devices are on the controller
> (since it's often a shared map) and may not even be statically
> detectable in the sense that actual depth is only really seen when the
> device returns busy on a queue attempt.
>
> In most cases it would work fine, but the dynamic detection is more
> reliable. The sysfs setting is in reality a max setting.
OK, thanks for the patient explanation, Jens!
Cheers,
Jeff
On Tue, Nov 10, 2009 at 4:14 PM, Jens Axboe <[email protected]> wrote:
> On Tue, Nov 10 2009, Corrado Zoccolo wrote:
>> The core block layer already has code to detect presence of command
>> queuing devices. We convert cfq to use that instead of re-doing the
>> computation.
>
> There is a major difference: the CFQ variant is dynamic and the
> block layer one is not. This change came from Aaron some time ago IIRC,
> see commit 45333d5. It's a bit of a chicken and egg problem.
The comment by Aaron:
CFQ's detection of queueing devices assumes a non-queuing device and
detects if the queue depth reaches a certain threshold. Under some
workloads (e.g. synchronous reads), CFQ effectively forces a unit queue
depth, thus defeating the detection logic. This leads to poor
performance on queuing hardware, since the idle window remains enabled.
makes me think that the dynamic-off detection in cfq may really be
buggy (BTW this could explain the bad results on SSD Jeff observed
before my patch set).
The problem is that once hw_tag is 0, it is difficult for it to
become 1 again, as explained by Aaron, since cfq will hardly send more
than 1 request at a time. My patch set fixes this for SSDs (the seeky
readers will still be sent without idling, and if they are enough, the
logic will see a large enough depth to reconsider the initial
decision).
So the only sound way to do the detection is to start in an
indeterminate state, in which CFQ behaves as if hw_tag = 1, and then,
if for a long observation period we never saw large depth, we switch
to hw_tag = 0, otherwise we stick to hw_tag = 1, without reconsidering
it.
I think the correct logic could be pushed into blk-core, by also
introducing an indeterminate bit.
Corrado
>
> --
> Jens Axboe
>
>
On Tue, Nov 10 2009, Corrado Zoccolo wrote:
> On Tue, Nov 10, 2009 at 4:14 PM, Jens Axboe <[email protected]> wrote:
> > On Tue, Nov 10 2009, Corrado Zoccolo wrote:
> >> The core block layer already has code to detect presence of command
> >> queuing devices. We convert cfq to use that instead of re-doing the
> >> computation.
> >
> > There is a major difference: the CFQ variant is dynamic and the
> > block layer one is not. This change came from Aaron some time ago IIRC,
> > see commit 45333d5. It's a bit of a chicken and egg problem.
>
> The comment by Aaron:
> CFQ's detection of queueing devices assumes a non-queuing device and
> detects if the queue depth reaches a certain threshold. Under some
> workloads (e.g. synchronous reads), CFQ effectively forces a unit queue
> depth, thus defeating the detection logic. This leads to poor
> performance on queuing hardware, since the idle window remains enabled.
>
> makes me think that the dynamic-off detection in cfq may really be
> buggy (BTW this could explain the bad results on SSD Jeff observed
> before my patch set).
> The problem is, that once the hw_tag is 0, it is difficult for it to
> become 1 again, as explained by Aaron, since cfq will hardly send more
> than 1 request at a time. My patch set fixes this for SSDs (the seeky
> readers will still be sent without idling, and if they are enough, the
> logic will see a large enough depth to reconsider the initial
> decision).
>
> So the only sound way to do the detection is to start in an
> indeterminate state, in which CFQ behaves as if hw_tag = 1, and then,
> if for a long observation period we never saw large depth, we switch
> to hw_tag = 0, otherwise we stick to hw_tag = 1, without reconsidering
> it.
That is probably the better way to do it; as I said earlier, it is indeed
a chicken and egg problem. Care to patch something like that up?
> I think the correct logic could be pushed to the blk-core, by
> introducing also an indeterminate bit.
And I still don't think that is a good idea. The block layer case cares
more about the capability side ("is this a good ssd?") whereas the CFQ
case incorporates process behaviour as well. I'll gladly take patches to
improve the CFQ logic.
--
Jens Axboe
On Thu, Nov 12, 2009 at 1:16 PM, Jens Axboe <[email protected]> wrote:
> On Tue, Nov 10 2009, Corrado Zoccolo wrote:
>> On Tue, Nov 10, 2009 at 4:14 PM, Jens Axboe <[email protected]> wrote:
>> > On Tue, Nov 10 2009, Corrado Zoccolo wrote:
>> >> The core block layer already has code to detect presence of command
>> >> queuing devices. We convert cfq to use that instead of re-doing the
>> >> computation.
>> >
>> > There is a major difference: the CFQ variant is dynamic and the
>> > block layer one is not. This change came from Aaron some time ago IIRC,
>> > see commit 45333d5. It's a bit of a chicken and egg problem.
>>
>> The comment by Aaron:
>> CFQ's detection of queueing devices assumes a non-queuing device and
>> detects if the queue depth reaches a certain threshold. Under some
>> workloads (e.g. synchronous reads), CFQ effectively forces a unit queue
>> depth, thus defeating the detection logic. This leads to poor
>> performance on queuing hardware, since the idle window remains enabled.
>>
>> makes me think that the dynamic-off detection in cfq may really be
>> buggy (BTW this could explain the bad results on SSD Jeff observed
>> before my patch set).
>> The problem is, that once the hw_tag is 0, it is difficult for it to
>> become 1 again, as explained by Aaron, since cfq will hardly send more
>> than 1 request at a time. My patch set fixes this for SSDs (the seeky
>> readers will still be sent without idling, and if they are enough, the
>> logic will see a large enough depth to reconsider the initial
>> decision).
>>
>> So the only sound way to do the detection is to start in an
>> indeterminate state, in which CFQ behaves as if hw_tag = 1, and then,
>> if for a long observation period we never saw large depth, we switch
>> to hw_tag = 0, otherwise we stick to hw_tag = 1, without reconsidering
>> it.
>
> That is probably the better way to do it, as I said earlier it is indeed
> a chicken and egg problem. Care to patch something like that up?
Ok.
>> I think the correct logic could be pushed to the blk-core, by
>> introducing also an indeterminate bit.
>
> And I still don't think that is a good idea. The block layer case cares
> more about the capability side ("is this a good ssd?") whereas the CFQ
> case incorporates process behaviour as well. I'll gladly take patches to
> improve the CFQ logic.
Ok, I'll work on CFQ side then.
What about other possible measurements (e.g. avg seek time could be
used to adjust the slice_idle parameter)? Should they go into cfq, or
in the block layer, or possibly in a separate library that is used by
cfq?
>
> --
> Jens Axboe
>
>
--
__________________________________________________________________________
dott. Corrado Zoccolo mailto:[email protected]
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------
On Thu, Nov 12 2009, Corrado Zoccolo wrote:
> On Thu, Nov 12, 2009 at 1:16 PM, Jens Axboe <[email protected]> wrote:
> > On Tue, Nov 10 2009, Corrado Zoccolo wrote:
> >> On Tue, Nov 10, 2009 at 4:14 PM, Jens Axboe <[email protected]> wrote:
> >> > On Tue, Nov 10 2009, Corrado Zoccolo wrote:
> >> >> The core block layer already has code to detect presence of command
> >> >> queuing devices. We convert cfq to use that instead of re-doing the
> >> >> computation.
> >> >
> >> > There is a major difference: the CFQ variant is dynamic and the
> >> > block layer one is not. This change came from Aaron some time ago IIRC,
> >> > see commit 45333d5. It's a bit of a chicken and egg problem.
> >>
> >> The comment by Aaron:
> >> CFQ's detection of queueing devices assumes a non-queuing device and
> >> detects if the queue depth reaches a certain threshold. Under some
> >> workloads (e.g. synchronous reads), CFQ effectively forces a unit queue
> >> depth, thus defeating the detection logic. This leads to poor
> >> performance on queuing hardware, since the idle window remains enabled.
> >>
> >> makes me think that the dynamic-off detection in cfq may really be
> >> buggy (BTW this could explain the bad results on SSD Jeff observed
> >> before my patch set).
> >> The problem is, that once the hw_tag is 0, it is difficult for it to
> >> become 1 again, as explained by Aaron, since cfq will hardly send more
> >> than 1 request at a time. My patch set fixes this for SSDs (the seeky
> >> readers will still be sent without idling, and if they are enough, the
> >> logic will see a large enough depth to reconsider the initial
> >> decision).
> >>
> >> So the only sound way to do the detection is to start in an
> >> indeterminate state, in which CFQ behaves as if hw_tag = 1, and then,
> >> if for a long observation period we never saw large depth, we switch
> >> to hw_tag = 0, otherwise we stick to hw_tag = 1, without reconsidering
> >> it.
> >
> > That is probably the better way to do it, as I said earlier it is indeed
> > a chicken and egg problem. Care to patch something like that up?
> Ok.
>
> >> I think the correct logic could be pushed to the blk-core, by
> >> introducing also an indeterminate bit.
> >
> > And I still don't think that is a good idea. The block layer case cares
> > more about the capability side ("is this a good ssd?") whereas the CFQ
> > case incorporates process behaviour as well. I'll gladly take patches to
> > improve the CFQ logic.
> Ok, I'll work on CFQ side then.
>
> What about other possible measurements (e.g. avg seek time could be
> used to adjust the slice_idle parameter)? Should they go into cfq, or
> in the block layer, or possibly in a separate library that is used by
> cfq?
I'd just stick it in CFQ, since that's where we use it. If there are
other uses for it, we can always migrate it to a common helper.
--
Jens Axboe