2022-11-03 16:29:52

by Paolo Valente

[permalink] [raw]
Subject: [PATCH V6 0/8] block, bfq: extend bfq to support multi-actuator drives

Hi,
this V6 differs from V5 in that it addresses the issue reported in
[2].

Here is the whole description of this patch series again. This
extension addresses the following issue. Single-LUN multi-actuator
SCSI drives, as well as all multi-actuator SATA drives appear as a
single device to the I/O subsystem [1]. Yet they address commands to
different actuators internally, as a function of Logical Block
Addressing (LBAs). A given sector is reachable by only one of the
actuators. For example, Seagate’s Serial Advanced Technology
Attachment (SATA) version contains two actuators and maps the lower
half of the SATA LBA space to the lower actuator and the upper half to
the upper actuator.

Evidently, to fully utilize actuators, no actuator must be left idle
or underutilized while there is pending I/O for it. To reach this
goal, the block layer must somehow control the load of each actuator
individually. This series enriches BFQ with such a per-actuator
control, as a first step. Then it also adds a simple mechanism for
guaranteeing that actuators with pending I/O are never left idle.

See [1] for a more detailed overview of the problem and of the
solutions implemented in this patch series. There you will also find
some preliminary performance results.

Thanks,
Paolo

[1] https://www.linaro.org/blog/budget-fair-queueing-bfq-linux-io-scheduler-optimizations-for-multi-actuator-sata-hard-drives/
[2] https://lore.kernel.org/lkml/[email protected]/T/#m3685011dacaf0801bd084b2a3194c861a4e940ba

Davide Zini (3):
block, bfq: split also async bfq_queues on a per-actuator basis
block, bfq: inject I/O to underutilized actuators
block, bfq: balance I/O injection among underutilized actuators

Federico Gavioli (1):
block, bfq: retrieve independent access ranges from request queue

Paolo Valente (4):
block, bfq: split sync bfq_queues on a per-actuator basis
block, bfq: forbid stable merging of queues associated with different
actuators
block, bfq: move io_cq-persistent bfqq data into a dedicated struct
block, bfq: turn bfqq_data into an array in bfq_io_cq

block/bfq-cgroup.c | 97 +++++----
block/bfq-iosched.c | 521 ++++++++++++++++++++++++++++++--------------
block/bfq-iosched.h | 141 +++++++++---
block/bfq-wf2q.c | 2 +-
4 files changed, 528 insertions(+), 233 deletions(-)

--
2.20.1


2022-11-03 16:30:05

by Paolo Valente

[permalink] [raw]
Subject: [PATCH V6 7/8] block, bfq: inject I/O to underutilized actuators

From: Davide Zini <[email protected]>

The main service scheme of BFQ for sync I/O is serving one sync
bfq_queue at a time, for a while. In particular, BFQ enforces this
scheme when it deems the latter necessary to boost throughput or
to preserve service guarantees. Unfortunately, when BFQ enforces
this policy, only one actuator at a time gets served for a while,
because each bfq_queue contains I/O only for one actuator. The
other actuators may remain underutilized.

Actually, BFQ may serve (inject) extra I/O, taken from other
bfq_queues, in parallel with that of the in-service queue. This
injection mechanism may provide the ground for dealing also with
the above actuator-underutilization problem. Yet BFQ does not take
the actuator load into account when choosing which queue to pick
extra I/O from. In addition, BFQ may happen to inject extra I/O
only when the in-service queue is temporarily empty.

In view of these facts, this commit extends the
injection mechanism in such a way that the latter:
(1) takes into account also the actuator load;
(2) checks such a load on each dispatch, and injects I/O for an
underutilized actuator, if there is one and there is I/O for it.

To perform the check in (2), this commit introduces a load
threshold, currently set to 4. A linear scan of each actuator is
performed, until an actuator is found for which the following two
conditions hold: the load of the actuator is below the threshold,
and there is at least one non-in-service queue that contains I/O
for that actuator. If such a pair (actuator, queue) is found, then
the head request of that queue is returned for dispatch, instead
of the head request of the in-service queue.

We have set the threshold, empirically, to the minimum possible
value for which an actuator is fully utilized, or close to be
fully utilized. By doing so, injected I/O 'steals' as few
drive-queue slots as possibile to the in-service queue. This
reduces as much as possible the probability that the service of
I/O from the in-service bfq_queue gets delayed because of slot
exhaustion, i.e., because all the slots of the drive queue are
filled with I/O injected from other queues (NCQ provides for 32
slots).

This new mechanism also counters actuator underutilization in the
case of asymmetric configurations of bfq_queues. Namely if there
are few bfq_queues containing I/O for some actuators and many
bfq_queues containing I/O for other actuators. Or if the
bfq_queues containing I/O for some actuators have lower weights
than the other bfq_queues.

Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Davide Zini <[email protected]>
---
block/bfq-cgroup.c | 2 +-
block/bfq-iosched.c | 139 +++++++++++++++++++++++++++++++++-----------
block/bfq-iosched.h | 39 ++++++++++++-
block/bfq-wf2q.c | 2 +-
4 files changed, 143 insertions(+), 39 deletions(-)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index d243c429d9c0..38ccfe55ad46 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -694,7 +694,7 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
bfq_activate_bfqq(bfqd, bfqq);
}

- if (!bfqd->in_service_queue && !bfqd->rq_in_driver)
+ if (!bfqd->in_service_queue && !bfqd->tot_rq_in_driver)
bfq_schedule_dispatch(bfqd);
/* release extra ref taken above, bfqq may happen to be freed now */
bfq_put_queue(bfqq);
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 106c8820cc5c..db91f1a651d3 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2252,6 +2252,7 @@ static void bfq_add_request(struct request *rq)

bfq_log_bfqq(bfqd, bfqq, "add_request %d", rq_is_sync(rq));
bfqq->queued[rq_is_sync(rq)]++;
+
/*
* Updating of 'bfqd->queued' is protected by 'bfqd->lock', however, it
* may be read without holding the lock in bfq_has_work().
@@ -2297,9 +2298,9 @@ static void bfq_add_request(struct request *rq)
* elapsed.
*/
if (bfqq == bfqd->in_service_queue &&
- (bfqd->rq_in_driver == 0 ||
+ (bfqd->tot_rq_in_driver == 0 ||
(bfqq->last_serv_time_ns > 0 &&
- bfqd->rqs_injected && bfqd->rq_in_driver > 0)) &&
+ bfqd->rqs_injected && bfqd->tot_rq_in_driver > 0)) &&
time_is_before_eq_jiffies(bfqq->decrease_time_jif +
msecs_to_jiffies(10))) {
bfqd->last_empty_occupied_ns = ktime_get_ns();
@@ -2323,7 +2324,7 @@ static void bfq_add_request(struct request *rq)
* will be set in case injection is performed
* on bfqq before rq is completed).
*/
- if (bfqd->rq_in_driver == 0)
+ if (bfqd->tot_rq_in_driver == 0)
bfqd->rqs_injected = false;
}
}
@@ -2421,15 +2422,18 @@ static sector_t get_sdist(sector_t last_pos, struct request *rq)
static void bfq_activate_request(struct request_queue *q, struct request *rq)
{
struct bfq_data *bfqd = q->elevator->elevator_data;
+ unsigned int act_idx = bfq_actuator_index(bfqd, rq->bio);

- bfqd->rq_in_driver++;
+ bfqd->tot_rq_in_driver++;
+ bfqd->rq_in_driver[act_idx]++;
}

static void bfq_deactivate_request(struct request_queue *q, struct request *rq)
{
struct bfq_data *bfqd = q->elevator->elevator_data;

- bfqd->rq_in_driver--;
+ bfqd->tot_rq_in_driver--;
+ bfqd->rq_in_driver[bfq_actuator_index(bfqd, rq->bio)]--;
}
#endif

@@ -2703,11 +2707,14 @@ void bfq_end_wr_async_queues(struct bfq_data *bfqd,
static void bfq_end_wr(struct bfq_data *bfqd)
{
struct bfq_queue *bfqq;
+ int i;

spin_lock_irq(&bfqd->lock);

- list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list)
- bfq_bfqq_end_wr(bfqq);
+ for (i = 0; i < bfqd->num_actuators; i++) {
+ list_for_each_entry(bfqq, &bfqd->active_list[i], bfqq_list)
+ bfq_bfqq_end_wr(bfqq);
+ }
list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list)
bfq_bfqq_end_wr(bfqq);
bfq_end_wr_async(bfqd);
@@ -3651,13 +3658,13 @@ static void bfq_update_peak_rate(struct bfq_data *bfqd, struct request *rq)
* - start a new observation interval with this dispatch
*/
if (now_ns - bfqd->last_dispatch > 100*NSEC_PER_MSEC &&
- bfqd->rq_in_driver == 0)
+ bfqd->tot_rq_in_driver == 0)
goto update_rate_and_reset;

/* Update sampling information */
bfqd->peak_rate_samples++;

- if ((bfqd->rq_in_driver > 0 ||
+ if ((bfqd->tot_rq_in_driver > 0 ||
now_ns - bfqd->last_completion < BFQ_MIN_TT)
&& !BFQ_RQ_SEEKY(bfqd, bfqd->last_position, rq))
bfqd->sequential_samples++;
@@ -3924,7 +3931,7 @@ static bool idling_needed_for_service_guarantees(struct bfq_data *bfqd,
return (bfqq->wr_coeff > 1 &&
(bfqd->wr_busy_queues <
tot_busy_queues ||
- bfqd->rq_in_driver >=
+ bfqd->tot_rq_in_driver >=
bfqq->dispatched + 4)) ||
bfq_asymmetric_scenario(bfqd, bfqq) ||
tot_busy_queues == 1;
@@ -4696,6 +4703,7 @@ bfq_choose_bfqq_for_injection(struct bfq_data *bfqd)
{
struct bfq_queue *bfqq, *in_serv_bfqq = bfqd->in_service_queue;
unsigned int limit = in_serv_bfqq->inject_limit;
+ int i;
/*
* If
* - bfqq is not weight-raised and therefore does not carry
@@ -4727,7 +4735,7 @@ bfq_choose_bfqq_for_injection(struct bfq_data *bfqd)
)
limit = 1;

- if (bfqd->rq_in_driver >= limit)
+ if (bfqd->tot_rq_in_driver >= limit)
return NULL;

/*
@@ -4742,11 +4750,12 @@ bfq_choose_bfqq_for_injection(struct bfq_data *bfqd)
* (and re-added only if it gets new requests, but then it
* is assigned again enough budget for its new backlog).
*/
- list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list)
- if (!RB_EMPTY_ROOT(&bfqq->sort_list) &&
- (in_serv_always_inject || bfqq->wr_coeff > 1) &&
- bfq_serv_to_charge(bfqq->next_rq, bfqq) <=
- bfq_bfqq_budget_left(bfqq)) {
+ for (i = 0; i < bfqd->num_actuators; i++) {
+ list_for_each_entry(bfqq, &bfqd->active_list[i], bfqq_list)
+ if (!RB_EMPTY_ROOT(&bfqq->sort_list) &&
+ (in_serv_always_inject || bfqq->wr_coeff > 1) &&
+ bfq_serv_to_charge(bfqq->next_rq, bfqq) <=
+ bfq_bfqq_budget_left(bfqq)) {
/*
* Allow for only one large in-flight request
* on non-rotational devices, for the
@@ -4771,22 +4780,69 @@ bfq_choose_bfqq_for_injection(struct bfq_data *bfqd)
else
limit = in_serv_bfqq->inject_limit;

- if (bfqd->rq_in_driver < limit) {
+ if (bfqd->tot_rq_in_driver < limit) {
bfqd->rqs_injected = true;
return bfqq;
}
}
+ }
+
+ return NULL;
+}
+
+static struct bfq_queue *
+bfq_find_active_bfqq_for_actuator(struct bfq_data *bfqd,
+ int idx)
+{
+ struct bfq_queue *bfqq = NULL;
+
+ if (bfqd->in_service_queue &&
+ bfqd->in_service_queue->actuator_idx == idx)
+ return bfqd->in_service_queue;
+
+ list_for_each_entry(bfqq, &bfqd->active_list[idx], bfqq_list) {
+ if (!RB_EMPTY_ROOT(&bfqq->sort_list) &&
+ bfq_serv_to_charge(bfqq->next_rq, bfqq) <=
+ bfq_bfqq_budget_left(bfqq)) {
+ return bfqq;
+ }
+ }

return NULL;
}

+/*
+ * Perform a linear scan of each actuator, until an actuator is found
+ * for which the following two conditions hold: the load of the
+ * actuator is below the threshold (see comments on actuator_load_threshold
+ * for details), and there is a queue that contains I/O for that
+ * actuator. On success, return that queue.
+ */
+static struct bfq_queue *
+bfq_find_bfqq_for_underused_actuator(struct bfq_data *bfqd)
+{
+ int i;
+
+ for (i = 0 ; i < bfqd->num_actuators; i++)
+ if (bfqd->rq_in_driver[i] < bfqd->actuator_load_threshold) {
+ struct bfq_queue *bfqq =
+ bfq_find_active_bfqq_for_actuator(bfqd, i);
+
+ if (bfqq)
+ return bfqq;
+ }
+
+ return NULL;
+}
+
+
/*
* Select a queue for service. If we have a current queue in service,
* check whether to continue servicing it, or retrieve and set a new one.
*/
static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
{
- struct bfq_queue *bfqq;
+ struct bfq_queue *bfqq, *inject_bfqq;
struct request *next_rq;
enum bfqq_expiration reason = BFQQE_BUDGET_TIMEOUT;

@@ -4808,6 +4864,15 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
goto expire;

check_queue:
+ /*
+ * If some actuator is underutilized, but the in-service
+ * queue does not contain I/O for that actuator, then try to
+ * inject I/O for that actuator.
+ */
+ inject_bfqq = bfq_find_bfqq_for_underused_actuator(bfqd);
+ if (inject_bfqq && inject_bfqq != bfqq)
+ return inject_bfqq;
+
/*
* This loop is rarely executed more than once. Even when it
* happens, it is much more convenient to re-execute this loop
@@ -5163,11 +5228,11 @@ static struct request *__bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)

/*
* We exploit the bfq_finish_requeue_request hook to
- * decrement rq_in_driver, but
+ * decrement tot_rq_in_driver, but
* bfq_finish_requeue_request will not be invoked on
* this request. So, to avoid unbalance, just start
- * this request, without incrementing rq_in_driver. As
- * a negative consequence, rq_in_driver is deceptively
+ * this request, without incrementing tot_rq_in_driver. As
+ * a negative consequence, tot_rq_in_driver is deceptively
* lower than it should be while this request is in
* service. This may cause bfq_schedule_dispatch to be
* invoked uselessly.
@@ -5176,7 +5241,7 @@ static struct request *__bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
* bfq_finish_requeue_request hook, if defined, is
* probably invoked also on this request. So, by
* exploiting this hook, we could 1) increment
- * rq_in_driver here, and 2) decrement it in
+ * tot_rq_in_driver here, and 2) decrement it in
* bfq_finish_requeue_request. Such a solution would
* let the value of the counter be always accurate,
* but it would entail using an extra interface
@@ -5205,7 +5270,7 @@ static struct request *__bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
* Of course, serving one request at a time may cause loss of
* throughput.
*/
- if (bfqd->strict_guarantees && bfqd->rq_in_driver > 0)
+ if (bfqd->strict_guarantees && bfqd->tot_rq_in_driver > 0)
goto exit;

bfqq = bfq_select_queue(bfqd);
@@ -5216,7 +5281,8 @@ static struct request *__bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)

if (rq) {
inc_in_driver_start_rq:
- bfqd->rq_in_driver++;
+ bfqd->rq_in_driver[bfqq->actuator_idx]++;
+ bfqd->tot_rq_in_driver++;
start_rq:
rq->rq_flags |= RQF_STARTED;
}
@@ -6289,7 +6355,7 @@ static void bfq_update_hw_tag(struct bfq_data *bfqd)
struct bfq_queue *bfqq = bfqd->in_service_queue;

bfqd->max_rq_in_driver = max_t(int, bfqd->max_rq_in_driver,
- bfqd->rq_in_driver);
+ bfqd->tot_rq_in_driver);

if (bfqd->hw_tag == 1)
return;
@@ -6300,7 +6366,7 @@ static void bfq_update_hw_tag(struct bfq_data *bfqd)
* sum is not exact, as it's not taking into account deactivated
* requests.
*/
- if (bfqd->rq_in_driver + bfqd->queued <= BFQ_HW_QUEUE_THRESHOLD)
+ if (bfqd->tot_rq_in_driver + bfqd->queued <= BFQ_HW_QUEUE_THRESHOLD)
return;

/*
@@ -6311,7 +6377,7 @@ static void bfq_update_hw_tag(struct bfq_data *bfqd)
if (bfqq && bfq_bfqq_has_short_ttime(bfqq) &&
bfqq->dispatched + bfqq->queued[0] + bfqq->queued[1] <
BFQ_HW_QUEUE_THRESHOLD &&
- bfqd->rq_in_driver < BFQ_HW_QUEUE_THRESHOLD)
+ bfqd->tot_rq_in_driver < BFQ_HW_QUEUE_THRESHOLD)
return;

if (bfqd->hw_tag_samples++ < BFQ_HW_QUEUE_SAMPLES)
@@ -6332,7 +6398,8 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)

bfq_update_hw_tag(bfqd);

- bfqd->rq_in_driver--;
+ bfqd->rq_in_driver[bfqq->actuator_idx]--;
+ bfqd->tot_rq_in_driver--;
bfqq->dispatched--;

if (!bfqq->dispatched && !bfq_bfqq_busy(bfqq)) {
@@ -6451,7 +6518,7 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)
BFQQE_NO_MORE_REQUESTS);
}

- if (!bfqd->rq_in_driver)
+ if (!bfqd->tot_rq_in_driver)
bfq_schedule_dispatch(bfqd);
}

@@ -6582,13 +6649,13 @@ static void bfq_update_inject_limit(struct bfq_data *bfqd,
* conditions to do it, or we can lower the last base value
* computed.
*
- * NOTE: (bfqd->rq_in_driver == 1) means that there is no I/O
+ * NOTE: (bfqd->tot_rq_in_driver == 1) means that there is no I/O
* request in flight, because this function is in the code
* path that handles the completion of a request of bfqq, and,
* in particular, this function is executed before
- * bfqd->rq_in_driver is decremented in such a code path.
+ * bfqd->tot_rq_in_driver is decremented in such a code path.
*/
- if ((bfqq->last_serv_time_ns == 0 && bfqd->rq_in_driver == 1) ||
+ if ((bfqq->last_serv_time_ns == 0 && bfqd->tot_rq_in_driver == 1) ||
tot_time_ns < bfqq->last_serv_time_ns) {
if (bfqq->last_serv_time_ns == 0) {
/*
@@ -6598,7 +6665,7 @@ static void bfq_update_inject_limit(struct bfq_data *bfqd,
bfqq->inject_limit = max_t(unsigned int, 1, old_limit);
}
bfqq->last_serv_time_ns = tot_time_ns;
- } else if (!bfqd->rqs_injected && bfqd->rq_in_driver == 1)
+ } else if (!bfqd->rqs_injected && bfqd->tot_rq_in_driver == 1)
/*
* No I/O injected and no request still in service in
* the drive: these are the exact conditions for
@@ -7239,7 +7306,8 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
bfqd->queue_weights_tree = RB_ROOT_CACHED;
bfqd->num_groups_with_pending_reqs = 0;

- INIT_LIST_HEAD(&bfqd->active_list);
+ INIT_LIST_HEAD(&bfqd->active_list[0]);
+ INIT_LIST_HEAD(&bfqd->active_list[1]);
INIT_LIST_HEAD(&bfqd->idle_list);
INIT_HLIST_HEAD(&bfqd->burst_list);

@@ -7284,6 +7352,9 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
ref_wr_duration[blk_queue_nonrot(bfqd->queue)];
bfqd->peak_rate = ref_rate[blk_queue_nonrot(bfqd->queue)] * 2 / 3;

+ /* see comments on the definition of next field inside bfq_data */
+ bfqd->actuator_load_threshold = 4;
+
spin_lock_init(&bfqd->lock);

/*
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 90130a893c8f..adb3ba6a9d90 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -586,7 +586,12 @@ struct bfq_data {
/* number of queued requests */
int queued;
/* number of requests dispatched and waiting for completion */
- int rq_in_driver;
+ int tot_rq_in_driver;
+ /*
+ * number of requests dispatched and waiting for completion
+ * for each actuator
+ */
+ int rq_in_driver[BFQ_MAX_ACTUATORS];

/* true if the device is non rotational and performs queueing */
bool nonrot_with_queueing;
@@ -680,8 +685,13 @@ struct bfq_data {
/* maximum budget allotted to a bfq_queue before rescheduling */
int bfq_max_budget;

- /* list of all the bfq_queues active on the device */
- struct list_head active_list;
+ /*
+ * List of all the bfq_queues active for a specific actuator
+ * on the device. Keeping active queues separate on a
+ * per-actuator basis helps implementing per-actuator
+ * injection more efficiently.
+ */
+ struct list_head active_list[BFQ_MAX_ACTUATORS];
/* list of all the bfq_queues idle on the device */
struct list_head idle_list;

@@ -816,6 +826,29 @@ struct bfq_data {
* in this device.
*/
struct blk_independent_access_range ia_ranges[BFQ_MAX_ACTUATORS];
+
+ /*
+ * If the number of I/O requests queued in the device for a
+ * given actuator is below next threshold, then the actuator
+ * is deemed as underutilized. If this condition is found to
+ * hold for some actuator upon a dispatch, but (i) the
+ * in-service queue does not contain I/O for that actuator,
+ * while (ii) some other queue does contain I/O for that
+ * actuator, then the head I/O request of the latter queue is
+ * returned (injected), instead of the head request of the
+ * currently in-service queue.
+ *
+ * We set the threshold, empirically, to the minimum possible
+ * value for which an actuator is fully utilized, or close to
+ * be fully utilized. By doing so, injected I/O 'steals' as
+ * few drive-queue slots as possibile to the in-service
+ * queue. This reduces as much as possible the probability
+ * that the service of I/O from the in-service bfq_queue gets
+ * delayed because of slot exhaustion, i.e., because all the
+ * slots of the drive queue are filled with I/O injected from
+ * other queues (NCQ provides for 32 slots).
+ */
+ unsigned int actuator_load_threshold;
};

enum bfqq_state_flags {
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index 8fc3da4c23bb..ec0273e2cd07 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -477,7 +477,7 @@ static void bfq_active_insert(struct bfq_service_tree *st,
bfqd = (struct bfq_data *)bfqg->bfqd;
#endif
if (bfqq)
- list_add(&bfqq->bfqq_list, &bfqq->bfqd->active_list);
+ list_add(&bfqq->bfqq_list, &bfqq->bfqd->active_list[bfqq->actuator_idx]);
#ifdef CONFIG_BFQ_GROUP_IOSCHED
if (bfqg != bfqd->root_group)
bfqg->active_entities++;
--
2.20.1


2022-11-03 16:30:13

by Paolo Valente

[permalink] [raw]
Subject: [PATCH V6 8/8] block, bfq: balance I/O injection among underutilized actuators

From: Davide Zini <[email protected]>

Upon the invocation of its dispatch function, BFQ returns the next I/O
request of the in-service bfq_queue, unless some exception holds. One
such exception is that there is some underutilized actuator, different
from the actuator for which the in-service queue contains I/O, and
that some other bfq_queue happens to contain I/O for such an
actuator. In this case, the next I/O request of the latter bfq_queue,
and not of the in-service bfq_queue, is returned (I/O is injected from
that bfq_queue). To find such an actuator, a linear scan, in
increasing index order, is performed among actuators.

Performing a linear scan entails a prioritization among actuators: an
underutilized actuator may be considered for injection only if all
actuators with a lower index are currently fully utilized, or if there
is no pending I/O for any lower-index actuator that happens to be
underutilized.

This commits breaks this prioritization and tends to distribute
injection uniformly across actuators. This is obtained by adding the
following condition to the linear scan: even if an actuator A is
underutilized, A is however skipped if its load is higher than that of
the next actuator.

Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Davide Zini <[email protected]>
---
block/bfq-iosched.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index db91f1a651d3..c568a5a112a7 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4813,10 +4813,16 @@ bfq_find_active_bfqq_for_actuator(struct bfq_data *bfqd,

/*
* Perform a linear scan of each actuator, until an actuator is found
- * for which the following two conditions hold: the load of the
- * actuator is below the threshold (see comments on actuator_load_threshold
- * for details), and there is a queue that contains I/O for that
- * actuator. On success, return that queue.
+ * for which the following three conditions hold: the load of the
+ * actuator is below the threshold (see comments on
+ * actuator_load_threshold for details) and lower than that of the
+ * next actuator (comments on this extra condition below), and there
+ * is a queue that contains I/O for that actuator. On success, return
+ * that queue.
+ *
+ * Performing a plain linear scan entails a prioritization among
+ * actuators. The extra condition above breaks this prioritization and
+ * tends to distribute injection uniformly across actuators.
*/
static struct bfq_queue *
bfq_find_bfqq_for_underused_actuator(struct bfq_data *bfqd)
@@ -4824,7 +4830,9 @@ bfq_find_bfqq_for_underused_actuator(struct bfq_data *bfqd)
int i;

for (i = 0 ; i < bfqd->num_actuators; i++)
- if (bfqd->rq_in_driver[i] < bfqd->actuator_load_threshold) {
+ if (bfqd->rq_in_driver[i] < bfqd->actuator_load_threshold &&
+ (i == bfqd->num_actuators - 1 ||
+ bfqd->rq_in_driver[i] < bfqd->rq_in_driver[i+1])) {
struct bfq_queue *bfqq =
bfq_find_active_bfqq_for_actuator(bfqd, i);

--
2.20.1


2022-11-03 16:30:20

by Paolo Valente

[permalink] [raw]
Subject: [PATCH V6 4/8] block, bfq: turn bfqq_data into an array in bfq_io_cq

When a bfq_queue Q is merged with another queue, several pieces of
information are saved about Q. These pieces are stored in the
bfqq_data field in the bfq_io_cq data structure of the process
associated with Q.

Yet, with a multi-actuator drive, a process may get associated with
multiple bfq_queues: one queue for each of the N actuators. Each of
these queues may undergo a merge. So, the bfq_io_cq data structure
must be able to accommodate the above information for N queues.

This commit solves this problem by turning the bfqq_data scalar field
into an array of N elements (and by changing code so as to handle
this array).

This solution is written under the assumption that bfq_queues
associated with different actuators cannot be cross-merged. This
assumption holds naturally with basic queue merging: the latter is
triggered by spatial locality, and sectors for different actuators are
not close to each other. As for stable cross-merging, the assumption
here is that it is disabled.

Signed-off-by: Gabriele Felici <[email protected]>
Signed-off-by: Gianmarco Lusvardi <[email protected]>
Signed-off-by: Giulio Barabino <[email protected]>
Signed-off-by: Emiliano Maccaferri <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-iosched.c | 72 ++++++++++++++++++++++++---------------------
block/bfq-iosched.h | 12 +++++---
2 files changed, 47 insertions(+), 37 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 01528182c0c5..f44bac054aaf 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -404,7 +404,7 @@ void bic_set_bfqq(struct bfq_io_cq *bic,
* we cancel the stable merge if
* bic->stable_merge_bfqq == bfqq.
*/
- struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
+ struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data[actuator_idx];
bic->bfqq[is_sync][actuator_idx] = bfqq;

if (bfqq && bfqq_data->stable_merge_bfqq == bfqq) {
@@ -1175,9 +1175,10 @@ static void
bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_data *bfqd,
struct bfq_io_cq *bic, bool bfq_already_existing)
{
- struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
unsigned int old_wr_coeff = 1;
bool busy = bfq_already_existing && bfq_bfqq_busy(bfqq);
+ unsigned int a_idx = bfqq->actuator_idx;
+ struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data[a_idx];

if (bfqq_data->saved_has_short_ttime)
bfq_mark_bfqq_has_short_ttime(bfqq);
@@ -1827,6 +1828,16 @@ static bool bfq_bfqq_higher_class_or_weight(struct bfq_queue *bfqq,
return bfqq_weight > in_serv_weight;
}

+/* get the index of the actuator that will serve bio */
+static unsigned int bfq_actuator_index(struct bfq_data *bfqd, struct bio *bio)
+{
+ /*
+ * Multi-actuator support not complete yet, so always return 0
+ * for the moment.
+ */
+ return 0;
+}
+
static bool bfq_better_to_idle(struct bfq_queue *bfqq);

static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd,
@@ -1881,7 +1892,9 @@ static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd,
wr_or_deserves_wr = bfqd->low_latency &&
(bfqq->wr_coeff > 1 ||
(bfq_bfqq_sync(bfqq) &&
- (bfqq->bic || RQ_BIC(rq)->bfqq_data.stably_merged) &&
+ (bfqq->bic ||
+ RQ_BIC(rq)->bfqq_data[bfq_actuator_index(bfqd, rq->bio)]
+ .stably_merged) &&
(*interactive || soft_rt)));

/*
@@ -2469,16 +2482,6 @@ static void bfq_remove_request(struct request_queue *q,

}

-/* get the index of the actuator that will serve bio */
-static unsigned int bfq_actuator_index(struct bfq_data *bfqd, struct bio *bio)
-{
- /*
- * Multi-actuator support not complete yet, so always return 0
- * for the moment.
- */
- return 0;
-}
-
static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
unsigned int nr_segs)
{
@@ -2905,7 +2908,8 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
void *io_struct, bool request, struct bfq_io_cq *bic)
{
struct bfq_queue *in_service_bfqq, *new_bfqq;
- struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
+ unsigned int a_idx = bfqq->actuator_idx;
+ struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data[a_idx];

/* if a merge has already been setup, then proceed with that first */
if (bfqq->new_bfqq)
@@ -2952,8 +2956,9 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
if (new_bfqq) {
bfqq_data->stably_merged = true;
if (new_bfqq->bic)
- new_bfqq->bic->bfqq_data.stably_merged =
- true;
+ new_bfqq->bic->bfqq_data
+ [new_bfqq->actuator_idx]
+ .stably_merged = true;
}
return new_bfqq;
} else
@@ -3052,7 +3057,9 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
static void bfq_bfqq_save_state(struct bfq_queue *bfqq)
{
struct bfq_io_cq *bic = bfqq->bic;
- struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
+ /* State must be saved for the right queue index. */
+ unsigned int a_idx = bfqq->actuator_idx;
+ struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data[a_idx];

/*
* If !bfqq->bic, the queue is already shared or its requests
@@ -3063,7 +3070,7 @@ static void bfq_bfqq_save_state(struct bfq_queue *bfqq)
return;

bfqq_data->saved_last_serv_time_ns = bfqq->last_serv_time_ns;
- bfqq_data->saved_inject_limit = bfqq->inject_limit;
+ bfqq_data->saved_inject_limit = bfqq->inject_limit;
bfqq_data->saved_decrease_time_jif = bfqq->decrease_time_jif;

bfqq_data->saved_weight = bfqq->entity.orig_weight;
@@ -5425,7 +5432,7 @@ static void bfq_exit_icq(struct io_cq *icq)
unsigned long flags;
unsigned int act_idx;
unsigned int num_actuators;
- struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
+ struct bfq_iocq_bfqq_data *bfqq_data = bic->bfqq_data;

/*
* bfqd is NULL if scheduler already exited, and in that case
@@ -5445,10 +5452,10 @@ static void bfq_exit_icq(struct io_cq *icq)
num_actuators = BFQ_MAX_ACTUATORS;
}

- if (bfqq_data->stable_merge_bfqq)
- bfq_put_stable_ref(bfqq_data->stable_merge_bfqq);
-
for (act_idx = 0; act_idx < num_actuators; act_idx++) {
+ if (bfqq_data[act_idx].stable_merge_bfqq)
+ bfq_put_stable_ref(bfqq_data[act_idx].stable_merge_bfqq);
+
bfq_exit_icq_bfqq(bic, true, act_idx);
bfq_exit_icq_bfqq(bic, false, act_idx);
}
@@ -5635,16 +5642,16 @@ bfq_do_early_stable_merge(struct bfq_data *bfqd, struct bfq_queue *bfqq,
struct bfq_io_cq *bic,
struct bfq_queue *last_bfqq_created)
{
+ unsigned int a_idx = last_bfqq_created->actuator_idx;
struct bfq_queue *new_bfqq =
bfq_setup_merge(bfqq, last_bfqq_created);
- struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;

if (!new_bfqq)
return bfqq;

if (new_bfqq->bic)
- new_bfqq->bic->bfqq_data.stably_merged = true;
- bfqq_data->stably_merged = true;
+ new_bfqq->bic->bfqq_data[a_idx].stably_merged = true;
+ bic->bfqq_data[a_idx].stably_merged = true;

/*
* Reusing merge functions. This implies that
@@ -5713,7 +5720,6 @@ static struct bfq_queue *bfq_do_or_sched_stable_merge(struct bfq_data *bfqd,
&bfqd->last_bfqq_created;

struct bfq_queue *last_bfqq_created = *source_bfqq;
- struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;

/*
* If last_bfqq_created has not been set yet, then init it. If
@@ -5775,7 +5781,8 @@ static struct bfq_queue *bfq_do_or_sched_stable_merge(struct bfq_data *bfqd,
/*
* Record the bfqq to merge to.
*/
- bfqq_data->stable_merge_bfqq = last_bfqq_created;
+ bic->bfqq_data[last_bfqq_created->actuator_idx].stable_merge_bfqq =
+ last_bfqq_created;
}
}

@@ -6696,7 +6703,7 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd,
{
unsigned int act_idx = bfq_actuator_index(bfqd, bio);
struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync, act_idx);
- struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
+ struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data[act_idx];

if (likely(bfqq && bfqq != &bfqd->oom_bfqq))
return bfqq;
@@ -6804,7 +6811,7 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
struct bfq_queue *bfqq;
bool new_queue = false;
bool bfqq_already_existing = false, split = false;
- struct bfq_iocq_bfqq_data *bfqq_data;
+ unsigned int a_idx = bfq_actuator_index(bfqd, bio);

if (unlikely(!rq->elv.icq))
return NULL;
@@ -6828,17 +6835,16 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
bfqq = bfq_get_bfqq_handle_split(bfqd, bic, bio, false, is_sync,
&new_queue);

- bfqq_data = &bic->bfqq_data;
-
if (likely(!new_queue)) {
/* If the queue was seeky for too long, break it apart. */
if (bfq_bfqq_coop(bfqq) && bfq_bfqq_split_coop(bfqq) &&
- !bfqq_data->stably_merged) {
+ !bic->bfqq_data[a_idx].stably_merged) {
struct bfq_queue *old_bfqq = bfqq;

/* Update bic before losing reference to bfqq */
if (bfq_bfqq_in_large_burst(bfqq))
- bfqq_data->saved_in_large_burst = true;
+ bic->bfqq_data[a_idx].saved_in_large_burst =
+ true;

bfqq = bfq_split_bfqq(bic, bfqq);
split = true;
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index f2e8ab91951c..e27897d66a0f 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -416,7 +416,7 @@ struct bfq_queue {
struct bfq_iocq_bfqq_data {
/*
* Snapshot of the has_short_time flag before merging; taken
- * to remember its value while the queue is merged, so as to
+ * to remember its values while the queue is merged, so as to
* be able to restore it in case of split.
*/
bool saved_has_short_ttime;
@@ -430,7 +430,7 @@ struct bfq_iocq_bfqq_data {
u64 saved_tot_idle_time;

/*
- * Same purpose as the previous fields for the value of the
+ * Same purpose as the previous fields for the values of the
* field keeping the queue's belonging to a large burst
*/
bool saved_in_large_burst;
@@ -493,8 +493,12 @@ struct bfq_io_cq {
uint64_t blkcg_serial_nr; /* the current blkcg serial */
#endif

- /* persistent data for associated synchronous process queue */
- struct bfq_iocq_bfqq_data bfqq_data;
+ /*
+ * Persistent data for associated synchronous process queues
+ * (one queue per actuator, see field bfqq above). In
+ * particular, each of these queues may undergo a merge.
+ */
+ struct bfq_iocq_bfqq_data bfqq_data[BFQ_MAX_ACTUATORS];

unsigned int requests; /* Number of requests this process has in flight */
};
--
2.20.1


2022-11-03 16:30:20

by Paolo Valente

[permalink] [raw]
Subject: [PATCH V6 6/8] block, bfq: retrieve independent access ranges from request queue

From: Federico Gavioli <[email protected]>

This patch implements the code to gather the content of the
independent_access_ranges structure from the request_queue and copy
it into the queue's bfq_data. This copy is done at queue initialization.

We copy the access ranges into the bfq_data to avoid taking the queue
lock each time we access the ranges.

This implementation, however, puts a limit to the maximum independent
ranges supported by the scheduler. Such a limit is equal to the constant
BFQ_MAX_ACTUATORS. This limit was placed to avoid the allocation of
dynamic memory.

Co-developed-by: Rory Chen <[email protected]>
Signed-off-by: Rory Chen <[email protected]>
Signed-off-by: Federico Gavioli <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-iosched.c | 54 ++++++++++++++++++++++++++++++++++++++-------
block/bfq-iosched.h | 5 +++++
2 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index c94b80e3f685..106c8820cc5c 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -1831,10 +1831,26 @@ static bool bfq_bfqq_higher_class_or_weight(struct bfq_queue *bfqq,
/* get the index of the actuator that will serve bio */
static unsigned int bfq_actuator_index(struct bfq_data *bfqd, struct bio *bio)
{
- /*
- * Multi-actuator support not complete yet, so always return 0
- * for the moment.
- */
+ struct blk_independent_access_range *iar;
+ unsigned int i;
+ sector_t end;
+
+ /* no search needed if one or zero ranges present */
+ if (bfqd->num_actuators < 2)
+ return 0;
+
+ /* bio_end_sector(bio) gives the sector after the last one */
+ end = bio_end_sector(bio) - 1;
+
+ for (i = 0; i < bfqd->num_actuators; i++) {
+ iar = &(bfqd->ia_ranges[i]);
+ if (end >= iar->sector && end < iar->sector + iar->nr_sectors)
+ return i;
+ }
+
+ WARN_ONCE(true,
+ "bfq_actuator_index: bio sector out of ranges: end=%llu\n",
+ end);
return 0;
}

@@ -2479,7 +2495,6 @@ static void bfq_remove_request(struct request_queue *q,

if (rq->cmd_flags & REQ_META)
bfqq->meta_pending--;
-
}

static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
@@ -7144,6 +7159,8 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
{
struct bfq_data *bfqd;
struct elevator_queue *eq;
+ unsigned int i;
+ struct blk_independent_access_ranges *ia_ranges = q->disk->ia_ranges;

eq = elevator_alloc(q, e);
if (!eq)
@@ -7187,10 +7204,31 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
bfqd->queue = q;

/*
- * Multi-actuator support not complete yet, default to single
- * actuator for the moment.
+ * If the disk supports multiple actuators, we copy the independent
+ * access ranges from the request queue structure.
*/
- bfqd->num_actuators = 1;
+ spin_lock_irq(&q->queue_lock);
+ if (ia_ranges) {
+ /*
+ * Check if the disk ia_ranges size exceeds the current bfq
+ * actuator limit.
+ */
+ if (ia_ranges->nr_ia_ranges > BFQ_MAX_ACTUATORS) {
+ pr_crit("nr_ia_ranges higher than act limit: iars=%d, max=%d.\n",
+ ia_ranges->nr_ia_ranges, BFQ_MAX_ACTUATORS);
+ pr_crit("Falling back to single actuator mode.\n");
+ bfqd->num_actuators = 0;
+ } else {
+ bfqd->num_actuators = ia_ranges->nr_ia_ranges;
+
+ for (i = 0; i < bfqd->num_actuators; i++)
+ bfqd->ia_ranges[i] = ia_ranges->ia_range[i];
+ }
+ } else {
+ bfqd->num_actuators = 0;
+ }
+
+ spin_unlock_irq(&q->queue_lock);

INIT_LIST_HEAD(&bfqd->dispatch);

diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index f1c2e77cbf9a..90130a893c8f 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -811,6 +811,11 @@ struct bfq_data {
*/
unsigned int num_actuators;

+ /*
+ * Disk independent access ranges for each actuator
+ * in this device.
+ */
+ struct blk_independent_access_range ia_ranges[BFQ_MAX_ACTUATORS];
};

enum bfqq_state_flags {
--
2.20.1


2022-11-03 16:44:04

by Paolo Valente

[permalink] [raw]
Subject: [PATCH V6 5/8] block, bfq: split also async bfq_queues on a per-actuator basis

From: Davide Zini <[email protected]>

Similarly to sync bfq_queues, also async bfq_queues need to be split
on a per-actuator basis.

Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Davide Zini <[email protected]>
---
block/bfq-iosched.c | 41 +++++++++++++++++++++++------------------
block/bfq-iosched.h | 8 ++++----
2 files changed, 27 insertions(+), 22 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index f44bac054aaf..c94b80e3f685 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2673,14 +2673,16 @@ static void bfq_bfqq_end_wr(struct bfq_queue *bfqq)
void bfq_end_wr_async_queues(struct bfq_data *bfqd,
struct bfq_group *bfqg)
{
- int i, j;
-
- for (i = 0; i < 2; i++)
- for (j = 0; j < IOPRIO_NR_LEVELS; j++)
- if (bfqg->async_bfqq[i][j])
- bfq_bfqq_end_wr(bfqg->async_bfqq[i][j]);
- if (bfqg->async_idle_bfqq)
- bfq_bfqq_end_wr(bfqg->async_idle_bfqq);
+ int i, j, k;
+
+ for (k = 0; k < bfqd->num_actuators; k++) {
+ for (i = 0; i < 2; i++)
+ for (j = 0; j < IOPRIO_NR_LEVELS; j++)
+ if (bfqg->async_bfqq[i][j][k])
+ bfq_bfqq_end_wr(bfqg->async_bfqq[i][j][k]);
+ if (bfqg->async_idle_bfqq[k])
+ bfq_bfqq_end_wr(bfqg->async_idle_bfqq[k]);
+ }
}

static void bfq_end_wr(struct bfq_data *bfqd)
@@ -5620,18 +5622,18 @@ static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,

static struct bfq_queue **bfq_async_queue_prio(struct bfq_data *bfqd,
struct bfq_group *bfqg,
- int ioprio_class, int ioprio)
+ int ioprio_class, int ioprio, int act_idx)
{
switch (ioprio_class) {
case IOPRIO_CLASS_RT:
- return &bfqg->async_bfqq[0][ioprio];
+ return &bfqg->async_bfqq[0][ioprio][act_idx];
case IOPRIO_CLASS_NONE:
ioprio = IOPRIO_BE_NORM;
fallthrough;
case IOPRIO_CLASS_BE:
- return &bfqg->async_bfqq[1][ioprio];
+ return &bfqg->async_bfqq[1][ioprio][act_idx];
case IOPRIO_CLASS_IDLE:
- return &bfqg->async_idle_bfqq;
+ return &bfqg->async_idle_bfqq[act_idx];
default:
return NULL;
}
@@ -5805,7 +5807,8 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,

if (!is_sync) {
async_bfqq = bfq_async_queue_prio(bfqd, bfqg, ioprio_class,
- ioprio);
+ ioprio,
+ bfq_actuator_index(bfqd, bio));
bfqq = *async_bfqq;
if (bfqq)
goto out;
@@ -7022,13 +7025,15 @@ static void __bfq_put_async_bfqq(struct bfq_data *bfqd,
*/
void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
{
- int i, j;
+ int i, j, k;

- for (i = 0; i < 2; i++)
- for (j = 0; j < IOPRIO_NR_LEVELS; j++)
- __bfq_put_async_bfqq(bfqd, &bfqg->async_bfqq[i][j]);
+ for (k = 0; k < bfqd->num_actuators; k++) {
+ for (i = 0; i < 2; i++)
+ for (j = 0; j < IOPRIO_NR_LEVELS; j++)
+ __bfq_put_async_bfqq(bfqd, &bfqg->async_bfqq[i][j][k]);

- __bfq_put_async_bfqq(bfqd, &bfqg->async_idle_bfqq);
+ __bfq_put_async_bfqq(bfqd, &bfqg->async_idle_bfqq[k]);
+ }
}

/*
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index e27897d66a0f..f1c2e77cbf9a 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -976,8 +976,8 @@ struct bfq_group {

void *bfqd;

- struct bfq_queue *async_bfqq[2][IOPRIO_NR_LEVELS];
- struct bfq_queue *async_idle_bfqq;
+ struct bfq_queue *async_bfqq[2][IOPRIO_NR_LEVELS][BFQ_MAX_ACTUATORS];
+ struct bfq_queue *async_idle_bfqq[BFQ_MAX_ACTUATORS];

struct bfq_entity *my_entity;

@@ -993,8 +993,8 @@ struct bfq_group {
struct bfq_entity entity;
struct bfq_sched_data sched_data;

- struct bfq_queue *async_bfqq[2][IOPRIO_NR_LEVELS];
- struct bfq_queue *async_idle_bfqq;
+ struct bfq_queue *async_bfqq[2][IOPRIO_NR_LEVELS][BFQ_MAX_ACTUATORS];
+ struct bfq_queue *async_idle_bfqq[BFQ_MAX_ACTUATORS];

struct rb_root rq_pos_tree;
};
--
2.20.1


2022-11-03 16:50:20

by Paolo Valente

[permalink] [raw]
Subject: [PATCH V6 2/8] block, bfq: forbid stable merging of queues associated with different actuators

If queues associated with different actuators are merged, then control
is lost on each actuator. Therefore some actuator may be
underutilized, and throughput may decrease. This problem cannot occur
with basic queue merging, because the latter is triggered by spatial
locality, and sectors for different actuators are not close to each
other. Yet it may happen with stable merging. To address this issue,
this commit prevents stable merging from occurring among queues
associated with different actuators.

Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-iosched.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 5c69394bbb65..ec4b0e70265f 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5705,9 +5705,13 @@ static struct bfq_queue *bfq_do_or_sched_stable_merge(struct bfq_data *bfqd,
* it has been set already, but too long ago, then move it
* forward to bfqq. Finally, move also if bfqq belongs to a
* different group than last_bfqq_created, or if bfqq has a
- * different ioprio or ioprio_class. If none of these
- * conditions holds true, then try an early stable merge or
- * schedule a delayed stable merge.
+ * different ioprio, ioprio_class or actuator_idx. If none of
+ * these conditions holds true, then try an early stable merge
+ * or schedule a delayed stable merge. As for the condition on
+ * actuator_idx, the reason is that, if queues associated with
+ * different actuators are merged, then control is lost on
+ * each actuator. Therefore some actuator may be
+ * underutilized, and throughput may decrease.
*
* A delayed merge is scheduled (instead of performing an
* early merge), in case bfqq might soon prove to be more
@@ -5725,7 +5729,8 @@ static struct bfq_queue *bfq_do_or_sched_stable_merge(struct bfq_data *bfqd,
bfqq->creation_time) ||
bfqq->entity.parent != last_bfqq_created->entity.parent ||
bfqq->ioprio != last_bfqq_created->ioprio ||
- bfqq->ioprio_class != last_bfqq_created->ioprio_class)
+ bfqq->ioprio_class != last_bfqq_created->ioprio_class ||
+ bfqq->actuator_idx != last_bfqq_created->actuator_idx)
*source_bfqq = bfqq;
else if (time_after_eq(last_bfqq_created->creation_time +
bfqd->bfq_burst_interval,
--
2.20.1


2022-11-03 17:06:38

by Paolo Valente

[permalink] [raw]
Subject: [PATCH V6 3/8] block, bfq: move io_cq-persistent bfqq data into a dedicated struct

With a multi-actuator drive, a process may get associated with multiple
bfq_queues: one queue for each of the N actuators. So, the bfq_io_cq
data structure must be able to accommodate its per-queue persistent
information for N queues. Currently it stores this information for
just one queue, in several scalar fields.

This is a preparatory commit for moving to accommodating persistent
information for N queues. In particular, this commit packs all the
above scalar fields into a single data structure. Then there is now
only one fieldi, in bfq_io_cq, that stores all the above information. This
scalar field will then be turned into an array by a following commit.

Suggested-by: Damien Le Moal <[email protected]>
Signed-off-by: Gianmarco Lusvardi <[email protected]>
Signed-off-by: Giulio Barabino <[email protected]>
Signed-off-by: Emiliano Maccaferri <[email protected]>
Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-iosched.c | 129 +++++++++++++++++++++++++-------------------
block/bfq-iosched.h | 52 ++++++++++--------
2 files changed, 105 insertions(+), 76 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index ec4b0e70265f..01528182c0c5 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -404,9 +404,10 @@ void bic_set_bfqq(struct bfq_io_cq *bic,
* we cancel the stable merge if
* bic->stable_merge_bfqq == bfqq.
*/
+ struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
bic->bfqq[is_sync][actuator_idx] = bfqq;

- if (bfqq && bic->stable_merge_bfqq == bfqq) {
+ if (bfqq && bfqq_data->stable_merge_bfqq == bfqq) {
/*
* Actually, these same instructions are executed also
* in bfq_setup_cooperator, in case of abort or actual
@@ -415,9 +416,9 @@ void bic_set_bfqq(struct bfq_io_cq *bic,
* did so, we would nest even more complexity in this
* function.
*/
- bfq_put_stable_ref(bic->stable_merge_bfqq);
+ bfq_put_stable_ref(bfqq_data->stable_merge_bfqq);

- bic->stable_merge_bfqq = NULL;
+ bfqq_data->stable_merge_bfqq = NULL;
}
}

@@ -1174,38 +1175,40 @@ static void
bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_data *bfqd,
struct bfq_io_cq *bic, bool bfq_already_existing)
{
+ struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
unsigned int old_wr_coeff = 1;
bool busy = bfq_already_existing && bfq_bfqq_busy(bfqq);

- if (bic->saved_has_short_ttime)
+ if (bfqq_data->saved_has_short_ttime)
bfq_mark_bfqq_has_short_ttime(bfqq);
else
bfq_clear_bfqq_has_short_ttime(bfqq);

- if (bic->saved_IO_bound)
+ if (bfqq_data->saved_IO_bound)
bfq_mark_bfqq_IO_bound(bfqq);
else
bfq_clear_bfqq_IO_bound(bfqq);

- bfqq->last_serv_time_ns = bic->saved_last_serv_time_ns;
- bfqq->inject_limit = bic->saved_inject_limit;
- bfqq->decrease_time_jif = bic->saved_decrease_time_jif;
+ bfqq->last_serv_time_ns = bfqq_data->saved_last_serv_time_ns;
+ bfqq->inject_limit = bfqq_data->saved_inject_limit;
+ bfqq->decrease_time_jif = bfqq_data->saved_decrease_time_jif;

- bfqq->entity.new_weight = bic->saved_weight;
- bfqq->ttime = bic->saved_ttime;
- bfqq->io_start_time = bic->saved_io_start_time;
- bfqq->tot_idle_time = bic->saved_tot_idle_time;
+ bfqq->entity.new_weight = bfqq_data->saved_weight;
+ bfqq->ttime = bfqq_data->saved_ttime;
+ bfqq->io_start_time = bfqq_data->saved_io_start_time;
+ bfqq->tot_idle_time = bfqq_data->saved_tot_idle_time;
/*
* Restore weight coefficient only if low_latency is on
*/
if (bfqd->low_latency) {
old_wr_coeff = bfqq->wr_coeff;
- bfqq->wr_coeff = bic->saved_wr_coeff;
+ bfqq->wr_coeff = bfqq_data->saved_wr_coeff;
}
- bfqq->service_from_wr = bic->saved_service_from_wr;
- bfqq->wr_start_at_switch_to_srt = bic->saved_wr_start_at_switch_to_srt;
- bfqq->last_wr_start_finish = bic->saved_last_wr_start_finish;
- bfqq->wr_cur_max_time = bic->saved_wr_cur_max_time;
+ bfqq->service_from_wr = bfqq_data->saved_service_from_wr;
+ bfqq->wr_start_at_switch_to_srt =
+ bfqq_data->saved_wr_start_at_switch_to_srt;
+ bfqq->last_wr_start_finish = bfqq_data->saved_last_wr_start_finish;
+ bfqq->wr_cur_max_time = bfqq_data->saved_wr_cur_max_time;

if (bfqq->wr_coeff > 1 && (bfq_bfqq_in_large_burst(bfqq) ||
time_is_before_jiffies(bfqq->last_wr_start_finish +
@@ -1878,7 +1881,7 @@ static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd,
wr_or_deserves_wr = bfqd->low_latency &&
(bfqq->wr_coeff > 1 ||
(bfq_bfqq_sync(bfqq) &&
- (bfqq->bic || RQ_BIC(rq)->stably_merged) &&
+ (bfqq->bic || RQ_BIC(rq)->bfqq_data.stably_merged) &&
(*interactive || soft_rt)));

/*
@@ -2902,6 +2905,7 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
void *io_struct, bool request, struct bfq_io_cq *bic)
{
struct bfq_queue *in_service_bfqq, *new_bfqq;
+ struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;

/* if a merge has already been setup, then proceed with that first */
if (bfqq->new_bfqq)
@@ -2923,21 +2927,21 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
* stable merging) also if bic is associated with a
* sync queue, but this bfqq is async
*/
- if (bfq_bfqq_sync(bfqq) && bic->stable_merge_bfqq &&
+ if (bfq_bfqq_sync(bfqq) && bfqq_data->stable_merge_bfqq &&
!bfq_bfqq_just_created(bfqq) &&
time_is_before_jiffies(bfqq->split_time +
msecs_to_jiffies(bfq_late_stable_merging)) &&
time_is_before_jiffies(bfqq->creation_time +
msecs_to_jiffies(bfq_late_stable_merging))) {
struct bfq_queue *stable_merge_bfqq =
- bic->stable_merge_bfqq;
+ bfqq_data->stable_merge_bfqq;
int proc_ref = min(bfqq_process_refs(bfqq),
bfqq_process_refs(stable_merge_bfqq));

/* deschedule stable merge, because done or aborted here */
bfq_put_stable_ref(stable_merge_bfqq);

- bic->stable_merge_bfqq = NULL;
+ bfqq_data->stable_merge_bfqq = NULL;

if (!idling_boosts_thr_without_issues(bfqd, bfqq) &&
proc_ref > 0) {
@@ -2946,10 +2950,10 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
bfq_setup_merge(bfqq, stable_merge_bfqq);

if (new_bfqq) {
- bic->stably_merged = true;
+ bfqq_data->stably_merged = true;
if (new_bfqq->bic)
- new_bfqq->bic->stably_merged =
- true;
+ new_bfqq->bic->bfqq_data.stably_merged =
+ true;
}
return new_bfqq;
} else
@@ -3048,6 +3052,7 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
static void bfq_bfqq_save_state(struct bfq_queue *bfqq)
{
struct bfq_io_cq *bic = bfqq->bic;
+ struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;

/*
* If !bfqq->bic, the queue is already shared or its requests
@@ -3057,18 +3062,21 @@ static void bfq_bfqq_save_state(struct bfq_queue *bfqq)
if (!bic)
return;

- bic->saved_last_serv_time_ns = bfqq->last_serv_time_ns;
- bic->saved_inject_limit = bfqq->inject_limit;
- bic->saved_decrease_time_jif = bfqq->decrease_time_jif;
-
- bic->saved_weight = bfqq->entity.orig_weight;
- bic->saved_ttime = bfqq->ttime;
- bic->saved_has_short_ttime = bfq_bfqq_has_short_ttime(bfqq);
- bic->saved_IO_bound = bfq_bfqq_IO_bound(bfqq);
- bic->saved_io_start_time = bfqq->io_start_time;
- bic->saved_tot_idle_time = bfqq->tot_idle_time;
- bic->saved_in_large_burst = bfq_bfqq_in_large_burst(bfqq);
- bic->was_in_burst_list = !hlist_unhashed(&bfqq->burst_list_node);
+ bfqq_data->saved_last_serv_time_ns = bfqq->last_serv_time_ns;
+ bfqq_data->saved_inject_limit = bfqq->inject_limit;
+ bfqq_data->saved_decrease_time_jif = bfqq->decrease_time_jif;
+
+ bfqq_data->saved_weight = bfqq->entity.orig_weight;
+ bfqq_data->saved_ttime = bfqq->ttime;
+ bfqq_data->saved_has_short_ttime =
+ bfq_bfqq_has_short_ttime(bfqq);
+ bfqq_data->saved_IO_bound = bfq_bfqq_IO_bound(bfqq);
+ bfqq_data->saved_io_start_time = bfqq->io_start_time;
+ bfqq_data->saved_tot_idle_time = bfqq->tot_idle_time;
+ bfqq_data->saved_in_large_burst = bfq_bfqq_in_large_burst(bfqq);
+ bfqq_data->was_in_burst_list =
+ !hlist_unhashed(&bfqq->burst_list_node);
+
if (unlikely(bfq_bfqq_just_created(bfqq) &&
!bfq_bfqq_in_large_burst(bfqq) &&
bfqq->bfqd->low_latency)) {
@@ -3081,17 +3089,21 @@ static void bfq_bfqq_save_state(struct bfq_queue *bfqq)
* to bfqq, so that to avoid that bfqq unjustly fails
* to enjoy weight raising if split soon.
*/
- bic->saved_wr_coeff = bfqq->bfqd->bfq_wr_coeff;
- bic->saved_wr_start_at_switch_to_srt = bfq_smallest_from_now();
- bic->saved_wr_cur_max_time = bfq_wr_duration(bfqq->bfqd);
- bic->saved_last_wr_start_finish = jiffies;
+ bfqq_data->saved_wr_coeff = bfqq->bfqd->bfq_wr_coeff;
+ bfqq_data->saved_wr_start_at_switch_to_srt =
+ bfq_smallest_from_now();
+ bfqq_data->saved_wr_cur_max_time =
+ bfq_wr_duration(bfqq->bfqd);
+ bfqq_data->saved_last_wr_start_finish = jiffies;
} else {
- bic->saved_wr_coeff = bfqq->wr_coeff;
- bic->saved_wr_start_at_switch_to_srt =
+ bfqq_data->saved_wr_coeff = bfqq->wr_coeff;
+ bfqq_data->saved_wr_start_at_switch_to_srt =
bfqq->wr_start_at_switch_to_srt;
- bic->saved_service_from_wr = bfqq->service_from_wr;
- bic->saved_last_wr_start_finish = bfqq->last_wr_start_finish;
- bic->saved_wr_cur_max_time = bfqq->wr_cur_max_time;
+ bfqq_data->saved_service_from_wr =
+ bfqq->service_from_wr;
+ bfqq_data->saved_last_wr_start_finish =
+ bfqq->last_wr_start_finish;
+ bfqq_data->saved_wr_cur_max_time = bfqq->wr_cur_max_time;
}
}

@@ -5413,6 +5425,7 @@ static void bfq_exit_icq(struct io_cq *icq)
unsigned long flags;
unsigned int act_idx;
unsigned int num_actuators;
+ struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;

/*
* bfqd is NULL if scheduler already exited, and in that case
@@ -5432,8 +5445,8 @@ static void bfq_exit_icq(struct io_cq *icq)
num_actuators = BFQ_MAX_ACTUATORS;
}

- if (bic->stable_merge_bfqq)
- bfq_put_stable_ref(bic->stable_merge_bfqq);
+ if (bfqq_data->stable_merge_bfqq)
+ bfq_put_stable_ref(bfqq_data->stable_merge_bfqq);

for (act_idx = 0; act_idx < num_actuators; act_idx++) {
bfq_exit_icq_bfqq(bic, true, act_idx);
@@ -5624,13 +5637,14 @@ bfq_do_early_stable_merge(struct bfq_data *bfqd, struct bfq_queue *bfqq,
{
struct bfq_queue *new_bfqq =
bfq_setup_merge(bfqq, last_bfqq_created);
+ struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;

if (!new_bfqq)
return bfqq;

if (new_bfqq->bic)
- new_bfqq->bic->stably_merged = true;
- bic->stably_merged = true;
+ new_bfqq->bic->bfqq_data.stably_merged = true;
+ bfqq_data->stably_merged = true;

/*
* Reusing merge functions. This implies that
@@ -5699,6 +5713,7 @@ static struct bfq_queue *bfq_do_or_sched_stable_merge(struct bfq_data *bfqd,
&bfqd->last_bfqq_created;

struct bfq_queue *last_bfqq_created = *source_bfqq;
+ struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;

/*
* If last_bfqq_created has not been set yet, then init it. If
@@ -5760,7 +5775,7 @@ static struct bfq_queue *bfq_do_or_sched_stable_merge(struct bfq_data *bfqd,
/*
* Record the bfqq to merge to.
*/
- bic->stable_merge_bfqq = last_bfqq_created;
+ bfqq_data->stable_merge_bfqq = last_bfqq_created;
}
}

@@ -6681,6 +6696,7 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd,
{
unsigned int act_idx = bfq_actuator_index(bfqd, bio);
struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync, act_idx);
+ struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;

if (likely(bfqq && bfqq != &bfqd->oom_bfqq))
return bfqq;
@@ -6694,12 +6710,12 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd,

bic_set_bfqq(bic, bfqq, is_sync, act_idx);
if (split && is_sync) {
- if ((bic->was_in_burst_list && bfqd->large_burst) ||
- bic->saved_in_large_burst)
+ if ((bfqq_data->was_in_burst_list && bfqd->large_burst) ||
+ bfqq_data->saved_in_large_burst)
bfq_mark_bfqq_in_large_burst(bfqq);
else {
bfq_clear_bfqq_in_large_burst(bfqq);
- if (bic->was_in_burst_list)
+ if (bfqq_data->was_in_burst_list)
/*
* If bfqq was in the current
* burst list before being
@@ -6788,6 +6804,7 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
struct bfq_queue *bfqq;
bool new_queue = false;
bool bfqq_already_existing = false, split = false;
+ struct bfq_iocq_bfqq_data *bfqq_data;

if (unlikely(!rq->elv.icq))
return NULL;
@@ -6811,15 +6828,17 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
bfqq = bfq_get_bfqq_handle_split(bfqd, bic, bio, false, is_sync,
&new_queue);

+ bfqq_data = &bic->bfqq_data;
+
if (likely(!new_queue)) {
/* If the queue was seeky for too long, break it apart. */
if (bfq_bfqq_coop(bfqq) && bfq_bfqq_split_coop(bfqq) &&
- !bic->stably_merged) {
+ !bfqq_data->stably_merged) {
struct bfq_queue *old_bfqq = bfqq;

/* Update bic before losing reference to bfqq */
if (bfq_bfqq_in_large_burst(bfqq))
- bic->saved_in_large_burst = true;
+ bfqq_data->saved_in_large_burst = true;

bfqq = bfq_split_bfqq(bic, bfqq);
split = true;
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index bfcbd8ea9000..f2e8ab91951c 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -411,27 +411,9 @@ struct bfq_queue {
};

/**
- * struct bfq_io_cq - per (request_queue, io_context) structure.
- */
-struct bfq_io_cq {
- /* associated io_cq structure */
- struct io_cq icq; /* must be the first member */
- /*
- * Matrix of associated process queues: first row for async
- * queues, second row sync queues. Each row contains one
- * column for each actuator. An I/O request generated by the
- * process is inserted into the queue pointed by bfqq[i][j] if
- * the request is to be served by the j-th actuator of the
- * drive, where i==0 or i==1, depending on whether the request
- * is async or sync. So there is a distinct queue for each
- * actuator.
- */
- struct bfq_queue *bfqq[2][BFQ_MAX_ACTUATORS];
- /* per (request_queue, blkcg) ioprio */
- int ioprio;
-#ifdef CONFIG_BFQ_GROUP_IOSCHED
- uint64_t blkcg_serial_nr; /* the current blkcg serial */
-#endif
+* struct bfq_data - bfqq data unique and persistent for associated bfq_io_cq
+*/
+struct bfq_iocq_bfqq_data {
/*
* Snapshot of the has_short_time flag before merging; taken
* to remember its value while the queue is merged, so as to
@@ -486,6 +468,34 @@ struct bfq_io_cq {
struct bfq_queue *stable_merge_bfqq;

bool stably_merged; /* non splittable if true */
+};
+
+/**
+ * struct bfq_io_cq - per (request_queue, io_context) structure.
+ */
+struct bfq_io_cq {
+ /* associated io_cq structure */
+ struct io_cq icq; /* must be the first member */
+ /*
+ * Matrix of associated process queues: first row for async
+ * queues, second row sync queues. Each row contains one
+ * column for each actuator. An I/O request generated by the
+ * process is inserted into the queue pointed by bfqq[i][j] if
+ * the request is to be served by the j-th actuator of the
+ * drive, where i==0 or i==1, depending on whether the request
+ * is async or sync. So there is a distinct queue for each
+ * actuator.
+ */
+ struct bfq_queue *bfqq[2][BFQ_MAX_ACTUATORS];
+ /* per (request_queue, blkcg) ioprio */
+ int ioprio;
+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+ uint64_t blkcg_serial_nr; /* the current blkcg serial */
+#endif
+
+ /* persistent data for associated synchronous process queue */
+ struct bfq_iocq_bfqq_data bfqq_data;
+
unsigned int requests; /* Number of requests this process has in flight */
};

--
2.20.1


2022-11-10 17:41:30

by Arie van der Hoeven

[permalink] [raw]
Subject: Re: [PATCH V6 8/8] block, bfq: balance I/O injection among underutilized actuators

Checking in on this series and what we can communicate to partners as to potential integration into Linux. Is 6.1 viable?

We have at least one big partner whose launch schedule is gated on these changes.

Regards, --Arie


From: Paolo Valente <[email protected]>
Sent: Thursday, November 3, 2022 9:26 AM
To: Jens Axboe <[email protected]>
Cc: [email protected] <[email protected]>; [email protected] <[email protected]>; Arie van der Hoeven <[email protected]>; Rory Chen <[email protected]>; Davide Zini <[email protected]>; Paolo Valente <[email protected]>
Subject: [PATCH V6 8/8] block, bfq: balance I/O injection among underutilized actuators


This message has originated from an External Source. Please use proper judgment and caution when opening attachments, clicking links, or responding to this email.


From: Davide Zini <[email protected]>

Upon the invocation of its dispatch function, BFQ returns the next I/O
request of the in-service bfq_queue, unless some exception holds. One
such exception is that there is some underutilized actuator, different
from the actuator for which the in-service queue contains I/O, and
that some other bfq_queue happens to contain I/O for such an
actuator. In this case, the next I/O request of the latter bfq_queue,
and not of the in-service bfq_queue, is returned (I/O is injected from
that bfq_queue). To find such an actuator, a linear scan, in
increasing index order, is performed among actuators.

Performing a linear scan entails a prioritization among actuators: an
underutilized actuator may be considered for injection only if all
actuators with a lower index are currently fully utilized, or if there
is no pending I/O for any lower-index actuator that happens to be
underutilized.

This commits breaks this prioritization and tends to distribute
injection uniformly across actuators. This is obtained by adding the
following condition to the linear scan: even if an actuator A is
underutilized, A is however skipped if its load is higher than that of
the next actuator.

Signed-off-by: Paolo Valente <[email protected]>
Signed-off-by: Davide Zini <[email protected]>
---
block/bfq-iosched.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index db91f1a651d3..c568a5a112a7 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4813,10 +4813,16 @@ bfq_find_active_bfqq_for_actuator(struct bfq_data *bfqd,

/*
* Perform a linear scan of each actuator, until an actuator is found
- * for which the following two conditions hold: the load of the
- * actuator is below the threshold (see comments on actuator_load_threshold
- * for details), and there is a queue that contains I/O for that
- * actuator. On success, return that queue.
+ * for which the following three conditions hold: the load of the
+ * actuator is below the threshold (see comments on
+ * actuator_load_threshold for details) and lower than that of the
+ * next actuator (comments on this extra condition below), and there
+ * is a queue that contains I/O for that actuator. On success, return
+ * that queue.
+ *
+ * Performing a plain linear scan entails a prioritization among
+ * actuators. The extra condition above breaks this prioritization and
+ * tends to distribute injection uniformly across actuators.
*/
static struct bfq_queue *
bfq_find_bfqq_for_underused_actuator(struct bfq_data *bfqd)
@@ -4824,7 +4830,9 @@ bfq_find_bfqq_for_underused_actuator(struct bfq_data *bfqd)
int i;

for (i = 0 ; i < bfqd->num_actuators; i++)
- if (bfqd->rq_in_driver[i] < bfqd->actuator_load_threshold) {
+ if (bfqd->rq_in_driver[i] < bfqd->actuator_load_threshold &&
+ (i == bfqd->num_actuators - 1 ||
+ bfqd->rq_in_driver[i] < bfqd->rq_in_driver[i+1])) {
struct bfq_queue *bfqq =
bfq_find_active_bfqq_for_actuator(bfqd, i);

--
2.20.1

Seagate Internal

2022-11-20 08:05:53

by Paolo Valente

[permalink] [raw]
Subject: Re: [PATCH V6 8/8] block, bfq: balance I/O injection among underutilized actuators



> Il giorno 10 nov 2022, alle ore 16:25, Arie van der Hoeven <[email protected]> ha scritto:
>
> Checking in on this series and what we can communicate to partners as to potential integration into Linux. Is 6.1 viable?

Hi Arie,
definitely too late for 6.1.

Jens, could you enqueue this for 6.2? Unless Damien, or anybody else
still sees issues to address.

Thanks,
Paolo

>
> We have at least one big partner whose launch schedule is gated on these changes.
>
> Regards, --Arie
>
>
> From: Paolo Valente <[email protected]>
> Sent: Thursday, November 3, 2022 9:26 AM
> To: Jens Axboe <[email protected]>
> Cc: [email protected] <[email protected]>; [email protected] <[email protected]>; Arie van der Hoeven <[email protected]>; Rory Chen <[email protected]>; Davide Zini <[email protected]>; Paolo Valente <[email protected]>
> Subject: [PATCH V6 8/8] block, bfq: balance I/O injection among underutilized actuators
>
>
> This message has originated from an External Source. Please use proper judgment and caution when opening attachments, clicking links, or responding to this email.
>
>
> From: Davide Zini <[email protected]>
>
> Upon the invocation of its dispatch function, BFQ returns the next I/O
> request of the in-service bfq_queue, unless some exception holds. One
> such exception is that there is some underutilized actuator, different
> from the actuator for which the in-service queue contains I/O, and
> that some other bfq_queue happens to contain I/O for such an
> actuator. In this case, the next I/O request of the latter bfq_queue,
> and not of the in-service bfq_queue, is returned (I/O is injected from
> that bfq_queue). To find such an actuator, a linear scan, in
> increasing index order, is performed among actuators.
>
> Performing a linear scan entails a prioritization among actuators: an
> underutilized actuator may be considered for injection only if all
> actuators with a lower index are currently fully utilized, or if there
> is no pending I/O for any lower-index actuator that happens to be
> underutilized.
>
> This commits breaks this prioritization and tends to distribute
> injection uniformly across actuators. This is obtained by adding the
> following condition to the linear scan: even if an actuator A is
> underutilized, A is however skipped if its load is higher than that of
> the next actuator.
>
> Signed-off-by: Paolo Valente <[email protected]>
> Signed-off-by: Davide Zini <[email protected]>
> ---
> block/bfq-iosched.c | 18 +++++++++++++-----
> 1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index db91f1a651d3..c568a5a112a7 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -4813,10 +4813,16 @@ bfq_find_active_bfqq_for_actuator(struct bfq_data *bfqd,
>
> /*
> * Perform a linear scan of each actuator, until an actuator is found
> - * for which the following two conditions hold: the load of the
> - * actuator is below the threshold (see comments on actuator_load_threshold
> - * for details), and there is a queue that contains I/O for that
> - * actuator. On success, return that queue.
> + * for which the following three conditions hold: the load of the
> + * actuator is below the threshold (see comments on
> + * actuator_load_threshold for details) and lower than that of the
> + * next actuator (comments on this extra condition below), and there
> + * is a queue that contains I/O for that actuator. On success, return
> + * that queue.
> + *
> + * Performing a plain linear scan entails a prioritization among
> + * actuators. The extra condition above breaks this prioritization and
> + * tends to distribute injection uniformly across actuators.
> */
> static struct bfq_queue *
> bfq_find_bfqq_for_underused_actuator(struct bfq_data *bfqd)
> @@ -4824,7 +4830,9 @@ bfq_find_bfqq_for_underused_actuator(struct bfq_data *bfqd)
> int i;
>
> for (i = 0 ; i < bfqd->num_actuators; i++)
> - if (bfqd->rq_in_driver[i] < bfqd->actuator_load_threshold) {
> + if (bfqd->rq_in_driver[i] < bfqd->actuator_load_threshold &&
> + (i == bfqd->num_actuators - 1 ||
> + bfqd->rq_in_driver[i] < bfqd->rq_in_driver[i+1])) {
> struct bfq_queue *bfqq =
> bfq_find_active_bfqq_for_actuator(bfqd, i);
>
> --
> 2.20.1
>
> Seagate Internal


2022-11-20 22:33:43

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH V6 8/8] block, bfq: balance I/O injection among underutilized actuators

On 11/20/22 12:29 AM, Paolo Valente wrote:
>
>
>> Il giorno 10 nov 2022, alle ore 16:25, Arie van der Hoeven <[email protected]> ha scritto:
>>
>> Checking in on this series and what we can communicate to partners as to potential integration into Linux. Is 6.1 viable?
>
> Hi Arie,
> definitely too late for 6.1.
>
> Jens, could you enqueue this for 6.2? Unless Damien, or anybody else
> still sees issues to address.

Would be nice to get Damien's feedback on the series.

--
Jens Axboe



2022-11-20 23:58:59

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH V6 8/8] block, bfq: balance I/O injection among underutilized actuators

On 11/21/22 07:06, Jens Axboe wrote:
> On 11/20/22 12:29 AM, Paolo Valente wrote:
>>
>>
>>> Il giorno 10 nov 2022, alle ore 16:25, Arie van der Hoeven <[email protected]> ha scritto:
>>>
>>> Checking in on this series and what we can communicate to partners as to potential integration into Linux. Is 6.1 viable?
>>
>> Hi Arie,
>> definitely too late for 6.1.
>>
>> Jens, could you enqueue this for 6.2? Unless Damien, or anybody else
>> still sees issues to address.
>
> Would be nice to get Damien's feedback on the series.

I will try to have a look today.

>

--
Damien Le Moal
Western Digital Research


2022-11-21 00:48:22

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH V6 3/8] block, bfq: move io_cq-persistent bfqq data into a dedicated struct

On 11/4/22 01:26, Paolo Valente wrote:
> With a multi-actuator drive, a process may get associated with multiple
> bfq_queues: one queue for each of the N actuators. So, the bfq_io_cq
> data structure must be able to accommodate its per-queue persistent
> information for N queues. Currently it stores this information for
> just one queue, in several scalar fields.
>
> This is a preparatory commit for moving to accommodating persistent
> information for N queues. In particular, this commit packs all the
> above scalar fields into a single data structure. Then there is now
> only one fieldi, in bfq_io_cq, that stores all the above information. This

s/fieldi/field

Other than that, looks OK to me.

Reviewed-by: Damien Le Moal <[email protected]>

> scalar field will then be turned into an array by a following commit.
>
> Suggested-by: Damien Le Moal <[email protected]>
> Signed-off-by: Gianmarco Lusvardi <[email protected]>
> Signed-off-by: Giulio Barabino <[email protected]>
> Signed-off-by: Emiliano Maccaferri <[email protected]>
> Signed-off-by: Paolo Valente <[email protected]>
> ---
> block/bfq-iosched.c | 129 +++++++++++++++++++++++++-------------------
> block/bfq-iosched.h | 52 ++++++++++--------
> 2 files changed, 105 insertions(+), 76 deletions(-)
>
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index ec4b0e70265f..01528182c0c5 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -404,9 +404,10 @@ void bic_set_bfqq(struct bfq_io_cq *bic,
> * we cancel the stable merge if
> * bic->stable_merge_bfqq == bfqq.
> */
> + struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
> bic->bfqq[is_sync][actuator_idx] = bfqq;
>
> - if (bfqq && bic->stable_merge_bfqq == bfqq) {
> + if (bfqq && bfqq_data->stable_merge_bfqq == bfqq) {
> /*
> * Actually, these same instructions are executed also
> * in bfq_setup_cooperator, in case of abort or actual
> @@ -415,9 +416,9 @@ void bic_set_bfqq(struct bfq_io_cq *bic,
> * did so, we would nest even more complexity in this
> * function.
> */
> - bfq_put_stable_ref(bic->stable_merge_bfqq);
> + bfq_put_stable_ref(bfqq_data->stable_merge_bfqq);
>
> - bic->stable_merge_bfqq = NULL;
> + bfqq_data->stable_merge_bfqq = NULL;
> }
> }
>
> @@ -1174,38 +1175,40 @@ static void
> bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_data *bfqd,
> struct bfq_io_cq *bic, bool bfq_already_existing)
> {
> + struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
> unsigned int old_wr_coeff = 1;
> bool busy = bfq_already_existing && bfq_bfqq_busy(bfqq);
>
> - if (bic->saved_has_short_ttime)
> + if (bfqq_data->saved_has_short_ttime)
> bfq_mark_bfqq_has_short_ttime(bfqq);
> else
> bfq_clear_bfqq_has_short_ttime(bfqq);
>
> - if (bic->saved_IO_bound)
> + if (bfqq_data->saved_IO_bound)
> bfq_mark_bfqq_IO_bound(bfqq);
> else
> bfq_clear_bfqq_IO_bound(bfqq);
>
> - bfqq->last_serv_time_ns = bic->saved_last_serv_time_ns;
> - bfqq->inject_limit = bic->saved_inject_limit;
> - bfqq->decrease_time_jif = bic->saved_decrease_time_jif;
> + bfqq->last_serv_time_ns = bfqq_data->saved_last_serv_time_ns;
> + bfqq->inject_limit = bfqq_data->saved_inject_limit;
> + bfqq->decrease_time_jif = bfqq_data->saved_decrease_time_jif;
>
> - bfqq->entity.new_weight = bic->saved_weight;
> - bfqq->ttime = bic->saved_ttime;
> - bfqq->io_start_time = bic->saved_io_start_time;
> - bfqq->tot_idle_time = bic->saved_tot_idle_time;
> + bfqq->entity.new_weight = bfqq_data->saved_weight;
> + bfqq->ttime = bfqq_data->saved_ttime;
> + bfqq->io_start_time = bfqq_data->saved_io_start_time;
> + bfqq->tot_idle_time = bfqq_data->saved_tot_idle_time;
> /*
> * Restore weight coefficient only if low_latency is on
> */
> if (bfqd->low_latency) {
> old_wr_coeff = bfqq->wr_coeff;
> - bfqq->wr_coeff = bic->saved_wr_coeff;
> + bfqq->wr_coeff = bfqq_data->saved_wr_coeff;
> }
> - bfqq->service_from_wr = bic->saved_service_from_wr;
> - bfqq->wr_start_at_switch_to_srt = bic->saved_wr_start_at_switch_to_srt;
> - bfqq->last_wr_start_finish = bic->saved_last_wr_start_finish;
> - bfqq->wr_cur_max_time = bic->saved_wr_cur_max_time;
> + bfqq->service_from_wr = bfqq_data->saved_service_from_wr;
> + bfqq->wr_start_at_switch_to_srt =
> + bfqq_data->saved_wr_start_at_switch_to_srt;
> + bfqq->last_wr_start_finish = bfqq_data->saved_last_wr_start_finish;
> + bfqq->wr_cur_max_time = bfqq_data->saved_wr_cur_max_time;
>
> if (bfqq->wr_coeff > 1 && (bfq_bfqq_in_large_burst(bfqq) ||
> time_is_before_jiffies(bfqq->last_wr_start_finish +
> @@ -1878,7 +1881,7 @@ static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd,
> wr_or_deserves_wr = bfqd->low_latency &&
> (bfqq->wr_coeff > 1 ||
> (bfq_bfqq_sync(bfqq) &&
> - (bfqq->bic || RQ_BIC(rq)->stably_merged) &&
> + (bfqq->bic || RQ_BIC(rq)->bfqq_data.stably_merged) &&
> (*interactive || soft_rt)));
>
> /*
> @@ -2902,6 +2905,7 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
> void *io_struct, bool request, struct bfq_io_cq *bic)
> {
> struct bfq_queue *in_service_bfqq, *new_bfqq;
> + struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
>
> /* if a merge has already been setup, then proceed with that first */
> if (bfqq->new_bfqq)
> @@ -2923,21 +2927,21 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
> * stable merging) also if bic is associated with a
> * sync queue, but this bfqq is async
> */
> - if (bfq_bfqq_sync(bfqq) && bic->stable_merge_bfqq &&
> + if (bfq_bfqq_sync(bfqq) && bfqq_data->stable_merge_bfqq &&
> !bfq_bfqq_just_created(bfqq) &&
> time_is_before_jiffies(bfqq->split_time +
> msecs_to_jiffies(bfq_late_stable_merging)) &&
> time_is_before_jiffies(bfqq->creation_time +
> msecs_to_jiffies(bfq_late_stable_merging))) {
> struct bfq_queue *stable_merge_bfqq =
> - bic->stable_merge_bfqq;
> + bfqq_data->stable_merge_bfqq;
> int proc_ref = min(bfqq_process_refs(bfqq),
> bfqq_process_refs(stable_merge_bfqq));
>
> /* deschedule stable merge, because done or aborted here */
> bfq_put_stable_ref(stable_merge_bfqq);
>
> - bic->stable_merge_bfqq = NULL;
> + bfqq_data->stable_merge_bfqq = NULL;
>
> if (!idling_boosts_thr_without_issues(bfqd, bfqq) &&
> proc_ref > 0) {
> @@ -2946,10 +2950,10 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
> bfq_setup_merge(bfqq, stable_merge_bfqq);
>
> if (new_bfqq) {
> - bic->stably_merged = true;
> + bfqq_data->stably_merged = true;
> if (new_bfqq->bic)
> - new_bfqq->bic->stably_merged =
> - true;
> + new_bfqq->bic->bfqq_data.stably_merged =
> + true;
> }
> return new_bfqq;
> } else
> @@ -3048,6 +3052,7 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
> static void bfq_bfqq_save_state(struct bfq_queue *bfqq)
> {
> struct bfq_io_cq *bic = bfqq->bic;
> + struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
>
> /*
> * If !bfqq->bic, the queue is already shared or its requests
> @@ -3057,18 +3062,21 @@ static void bfq_bfqq_save_state(struct bfq_queue *bfqq)
> if (!bic)
> return;
>
> - bic->saved_last_serv_time_ns = bfqq->last_serv_time_ns;
> - bic->saved_inject_limit = bfqq->inject_limit;
> - bic->saved_decrease_time_jif = bfqq->decrease_time_jif;
> -
> - bic->saved_weight = bfqq->entity.orig_weight;
> - bic->saved_ttime = bfqq->ttime;
> - bic->saved_has_short_ttime = bfq_bfqq_has_short_ttime(bfqq);
> - bic->saved_IO_bound = bfq_bfqq_IO_bound(bfqq);
> - bic->saved_io_start_time = bfqq->io_start_time;
> - bic->saved_tot_idle_time = bfqq->tot_idle_time;
> - bic->saved_in_large_burst = bfq_bfqq_in_large_burst(bfqq);
> - bic->was_in_burst_list = !hlist_unhashed(&bfqq->burst_list_node);
> + bfqq_data->saved_last_serv_time_ns = bfqq->last_serv_time_ns;
> + bfqq_data->saved_inject_limit = bfqq->inject_limit;
> + bfqq_data->saved_decrease_time_jif = bfqq->decrease_time_jif;
> +
> + bfqq_data->saved_weight = bfqq->entity.orig_weight;
> + bfqq_data->saved_ttime = bfqq->ttime;
> + bfqq_data->saved_has_short_ttime =
> + bfq_bfqq_has_short_ttime(bfqq);
> + bfqq_data->saved_IO_bound = bfq_bfqq_IO_bound(bfqq);
> + bfqq_data->saved_io_start_time = bfqq->io_start_time;
> + bfqq_data->saved_tot_idle_time = bfqq->tot_idle_time;
> + bfqq_data->saved_in_large_burst = bfq_bfqq_in_large_burst(bfqq);
> + bfqq_data->was_in_burst_list =
> + !hlist_unhashed(&bfqq->burst_list_node);
> +
> if (unlikely(bfq_bfqq_just_created(bfqq) &&
> !bfq_bfqq_in_large_burst(bfqq) &&
> bfqq->bfqd->low_latency)) {
> @@ -3081,17 +3089,21 @@ static void bfq_bfqq_save_state(struct bfq_queue *bfqq)
> * to bfqq, so that to avoid that bfqq unjustly fails
> * to enjoy weight raising if split soon.
> */
> - bic->saved_wr_coeff = bfqq->bfqd->bfq_wr_coeff;
> - bic->saved_wr_start_at_switch_to_srt = bfq_smallest_from_now();
> - bic->saved_wr_cur_max_time = bfq_wr_duration(bfqq->bfqd);
> - bic->saved_last_wr_start_finish = jiffies;
> + bfqq_data->saved_wr_coeff = bfqq->bfqd->bfq_wr_coeff;
> + bfqq_data->saved_wr_start_at_switch_to_srt =
> + bfq_smallest_from_now();
> + bfqq_data->saved_wr_cur_max_time =
> + bfq_wr_duration(bfqq->bfqd);
> + bfqq_data->saved_last_wr_start_finish = jiffies;
> } else {
> - bic->saved_wr_coeff = bfqq->wr_coeff;
> - bic->saved_wr_start_at_switch_to_srt =
> + bfqq_data->saved_wr_coeff = bfqq->wr_coeff;
> + bfqq_data->saved_wr_start_at_switch_to_srt =
> bfqq->wr_start_at_switch_to_srt;
> - bic->saved_service_from_wr = bfqq->service_from_wr;
> - bic->saved_last_wr_start_finish = bfqq->last_wr_start_finish;
> - bic->saved_wr_cur_max_time = bfqq->wr_cur_max_time;
> + bfqq_data->saved_service_from_wr =
> + bfqq->service_from_wr;
> + bfqq_data->saved_last_wr_start_finish =
> + bfqq->last_wr_start_finish;
> + bfqq_data->saved_wr_cur_max_time = bfqq->wr_cur_max_time;
> }
> }
>
> @@ -5413,6 +5425,7 @@ static void bfq_exit_icq(struct io_cq *icq)
> unsigned long flags;
> unsigned int act_idx;
> unsigned int num_actuators;
> + struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
>
> /*
> * bfqd is NULL if scheduler already exited, and in that case
> @@ -5432,8 +5445,8 @@ static void bfq_exit_icq(struct io_cq *icq)
> num_actuators = BFQ_MAX_ACTUATORS;
> }
>
> - if (bic->stable_merge_bfqq)
> - bfq_put_stable_ref(bic->stable_merge_bfqq);
> + if (bfqq_data->stable_merge_bfqq)
> + bfq_put_stable_ref(bfqq_data->stable_merge_bfqq);
>
> for (act_idx = 0; act_idx < num_actuators; act_idx++) {
> bfq_exit_icq_bfqq(bic, true, act_idx);
> @@ -5624,13 +5637,14 @@ bfq_do_early_stable_merge(struct bfq_data *bfqd, struct bfq_queue *bfqq,
> {
> struct bfq_queue *new_bfqq =
> bfq_setup_merge(bfqq, last_bfqq_created);
> + struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
>
> if (!new_bfqq)
> return bfqq;
>
> if (new_bfqq->bic)
> - new_bfqq->bic->stably_merged = true;
> - bic->stably_merged = true;
> + new_bfqq->bic->bfqq_data.stably_merged = true;
> + bfqq_data->stably_merged = true;
>
> /*
> * Reusing merge functions. This implies that
> @@ -5699,6 +5713,7 @@ static struct bfq_queue *bfq_do_or_sched_stable_merge(struct bfq_data *bfqd,
> &bfqd->last_bfqq_created;
>
> struct bfq_queue *last_bfqq_created = *source_bfqq;
> + struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
>
> /*
> * If last_bfqq_created has not been set yet, then init it. If
> @@ -5760,7 +5775,7 @@ static struct bfq_queue *bfq_do_or_sched_stable_merge(struct bfq_data *bfqd,
> /*
> * Record the bfqq to merge to.
> */
> - bic->stable_merge_bfqq = last_bfqq_created;
> + bfqq_data->stable_merge_bfqq = last_bfqq_created;
> }
> }
>
> @@ -6681,6 +6696,7 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd,
> {
> unsigned int act_idx = bfq_actuator_index(bfqd, bio);
> struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync, act_idx);
> + struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
>
> if (likely(bfqq && bfqq != &bfqd->oom_bfqq))
> return bfqq;
> @@ -6694,12 +6710,12 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd,
>
> bic_set_bfqq(bic, bfqq, is_sync, act_idx);
> if (split && is_sync) {
> - if ((bic->was_in_burst_list && bfqd->large_burst) ||
> - bic->saved_in_large_burst)
> + if ((bfqq_data->was_in_burst_list && bfqd->large_burst) ||
> + bfqq_data->saved_in_large_burst)
> bfq_mark_bfqq_in_large_burst(bfqq);
> else {
> bfq_clear_bfqq_in_large_burst(bfqq);
> - if (bic->was_in_burst_list)
> + if (bfqq_data->was_in_burst_list)
> /*
> * If bfqq was in the current
> * burst list before being
> @@ -6788,6 +6804,7 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
> struct bfq_queue *bfqq;
> bool new_queue = false;
> bool bfqq_already_existing = false, split = false;
> + struct bfq_iocq_bfqq_data *bfqq_data;
>
> if (unlikely(!rq->elv.icq))
> return NULL;
> @@ -6811,15 +6828,17 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
> bfqq = bfq_get_bfqq_handle_split(bfqd, bic, bio, false, is_sync,
> &new_queue);
>
> + bfqq_data = &bic->bfqq_data;
> +
> if (likely(!new_queue)) {
> /* If the queue was seeky for too long, break it apart. */
> if (bfq_bfqq_coop(bfqq) && bfq_bfqq_split_coop(bfqq) &&
> - !bic->stably_merged) {
> + !bfqq_data->stably_merged) {
> struct bfq_queue *old_bfqq = bfqq;
>
> /* Update bic before losing reference to bfqq */
> if (bfq_bfqq_in_large_burst(bfqq))
> - bic->saved_in_large_burst = true;
> + bfqq_data->saved_in_large_burst = true;
>
> bfqq = bfq_split_bfqq(bic, bfqq);
> split = true;
> diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
> index bfcbd8ea9000..f2e8ab91951c 100644
> --- a/block/bfq-iosched.h
> +++ b/block/bfq-iosched.h
> @@ -411,27 +411,9 @@ struct bfq_queue {
> };
>
> /**
> - * struct bfq_io_cq - per (request_queue, io_context) structure.
> - */
> -struct bfq_io_cq {
> - /* associated io_cq structure */
> - struct io_cq icq; /* must be the first member */
> - /*
> - * Matrix of associated process queues: first row for async
> - * queues, second row sync queues. Each row contains one
> - * column for each actuator. An I/O request generated by the
> - * process is inserted into the queue pointed by bfqq[i][j] if
> - * the request is to be served by the j-th actuator of the
> - * drive, where i==0 or i==1, depending on whether the request
> - * is async or sync. So there is a distinct queue for each
> - * actuator.
> - */
> - struct bfq_queue *bfqq[2][BFQ_MAX_ACTUATORS];
> - /* per (request_queue, blkcg) ioprio */
> - int ioprio;
> -#ifdef CONFIG_BFQ_GROUP_IOSCHED
> - uint64_t blkcg_serial_nr; /* the current blkcg serial */
> -#endif
> +* struct bfq_data - bfqq data unique and persistent for associated bfq_io_cq
> +*/
> +struct bfq_iocq_bfqq_data {
> /*
> * Snapshot of the has_short_time flag before merging; taken
> * to remember its value while the queue is merged, so as to
> @@ -486,6 +468,34 @@ struct bfq_io_cq {
> struct bfq_queue *stable_merge_bfqq;
>
> bool stably_merged; /* non splittable if true */
> +};
> +
> +/**
> + * struct bfq_io_cq - per (request_queue, io_context) structure.
> + */
> +struct bfq_io_cq {
> + /* associated io_cq structure */
> + struct io_cq icq; /* must be the first member */
> + /*
> + * Matrix of associated process queues: first row for async
> + * queues, second row sync queues. Each row contains one
> + * column for each actuator. An I/O request generated by the
> + * process is inserted into the queue pointed by bfqq[i][j] if
> + * the request is to be served by the j-th actuator of the
> + * drive, where i==0 or i==1, depending on whether the request
> + * is async or sync. So there is a distinct queue for each
> + * actuator.
> + */
> + struct bfq_queue *bfqq[2][BFQ_MAX_ACTUATORS];
> + /* per (request_queue, blkcg) ioprio */
> + int ioprio;
> +#ifdef CONFIG_BFQ_GROUP_IOSCHED
> + uint64_t blkcg_serial_nr; /* the current blkcg serial */
> +#endif
> +
> + /* persistent data for associated synchronous process queue */
> + struct bfq_iocq_bfqq_data bfqq_data;
> +
> unsigned int requests; /* Number of requests this process has in flight */
> };
>

--
Damien Le Moal
Western Digital Research


2022-11-21 00:54:00

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH V6 4/8] block, bfq: turn bfqq_data into an array in bfq_io_cq

On 11/4/22 01:26, Paolo Valente wrote:
> When a bfq_queue Q is merged with another queue, several pieces of
> information are saved about Q. These pieces are stored in the
> bfqq_data field in the bfq_io_cq data structure of the process
> associated with Q.
>
> Yet, with a multi-actuator drive, a process may get associated with
> multiple bfq_queues: one queue for each of the N actuators. Each of
> these queues may undergo a merge. So, the bfq_io_cq data structure
> must be able to accommodate the above information for N queues.
>
> This commit solves this problem by turning the bfqq_data scalar field
> into an array of N elements (and by changing code so as to handle
> this array).
>
> This solution is written under the assumption that bfq_queues
> associated with different actuators cannot be cross-merged. This
> assumption holds naturally with basic queue merging: the latter is
> triggered by spatial locality, and sectors for different actuators are
> not close to each other. As for stable cross-merging, the assumption

The last sector served by actuator N is very close to the first sector
served by actuator N+1 :)

So I am not sure this argument is valid. Better explanation required here
I think.

> here is that it is disabled.
>
> Signed-off-by: Gabriele Felici <[email protected]>
> Signed-off-by: Gianmarco Lusvardi <[email protected]>
> Signed-off-by: Giulio Barabino <[email protected]>
> Signed-off-by: Emiliano Maccaferri <[email protected]>
> Signed-off-by: Paolo Valente <[email protected]>
> ---
> block/bfq-iosched.c | 72 ++++++++++++++++++++++++---------------------
> block/bfq-iosched.h | 12 +++++---
> 2 files changed, 47 insertions(+), 37 deletions(-)
>
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 01528182c0c5..f44bac054aaf 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -404,7 +404,7 @@ void bic_set_bfqq(struct bfq_io_cq *bic,
> * we cancel the stable merge if
> * bic->stable_merge_bfqq == bfqq.
> */
> - struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
> + struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data[actuator_idx];
> bic->bfqq[is_sync][actuator_idx] = bfqq;
>
> if (bfqq && bfqq_data->stable_merge_bfqq == bfqq) {
> @@ -1175,9 +1175,10 @@ static void
> bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_data *bfqd,
> struct bfq_io_cq *bic, bool bfq_already_existing)
> {
> - struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
> unsigned int old_wr_coeff = 1;
> bool busy = bfq_already_existing && bfq_bfqq_busy(bfqq);
> + unsigned int a_idx = bfqq->actuator_idx;
> + struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data[a_idx];
>
> if (bfqq_data->saved_has_short_ttime)
> bfq_mark_bfqq_has_short_ttime(bfqq);
> @@ -1827,6 +1828,16 @@ static bool bfq_bfqq_higher_class_or_weight(struct bfq_queue *bfqq,
> return bfqq_weight > in_serv_weight;
> }
>
> +/* get the index of the actuator that will serve bio */
> +static unsigned int bfq_actuator_index(struct bfq_data *bfqd, struct bio *bio)
> +{
> + /*
> + * Multi-actuator support not complete yet, so always return 0
> + * for the moment.
> + */
> + return 0;
> +}

??? This was added in patch 1. Why again here ?

> +
> static bool bfq_better_to_idle(struct bfq_queue *bfqq);
>
> static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd,
> @@ -1881,7 +1892,9 @@ static void bfq_bfqq_handle_idle_busy_switch(struct bfq_data *bfqd,
> wr_or_deserves_wr = bfqd->low_latency &&
> (bfqq->wr_coeff > 1 ||
> (bfq_bfqq_sync(bfqq) &&
> - (bfqq->bic || RQ_BIC(rq)->bfqq_data.stably_merged) &&
> + (bfqq->bic ||
> + RQ_BIC(rq)->bfqq_data[bfq_actuator_index(bfqd, rq->bio)]
> + .stably_merged) &&
> (*interactive || soft_rt)));
>
> /*
> @@ -2469,16 +2482,6 @@ static void bfq_remove_request(struct request_queue *q,
>
> }
>
> -/* get the index of the actuator that will serve bio */
> -static unsigned int bfq_actuator_index(struct bfq_data *bfqd, struct bio *bio)
> -{
> - /*
> - * Multi-actuator support not complete yet, so always return 0
> - * for the moment.
> - */
> - return 0;
> -}
> -

ah. You moved it... May be add it in the right place in patch 1 then to
avoid that ? That may be due to some git/diff artifacts though.

> static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
> unsigned int nr_segs)
> {
> @@ -2905,7 +2908,8 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
> void *io_struct, bool request, struct bfq_io_cq *bic)
> {
> struct bfq_queue *in_service_bfqq, *new_bfqq;
> - struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
> + unsigned int a_idx = bfqq->actuator_idx;
> + struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data[a_idx];
>
> /* if a merge has already been setup, then proceed with that first */
> if (bfqq->new_bfqq)
> @@ -2952,8 +2956,9 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
> if (new_bfqq) {
> bfqq_data->stably_merged = true;
> if (new_bfqq->bic)
> - new_bfqq->bic->bfqq_data.stably_merged =
> - true;
> + new_bfqq->bic->bfqq_data
> + [new_bfqq->actuator_idx]
> + .stably_merged = true;

Aouch. This is really hard to read as that is really not supposed to be
split like this.

> }
> return new_bfqq;
> } else
> @@ -3052,7 +3057,9 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
> static void bfq_bfqq_save_state(struct bfq_queue *bfqq)
> {
> struct bfq_io_cq *bic = bfqq->bic;
> - struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
> + /* State must be saved for the right queue index. */

Drop this comment. It serves no purpose in my opinion.

> + unsigned int a_idx = bfqq->actuator_idx;
> + struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data[a_idx];
>
> /*
> * If !bfqq->bic, the queue is already shared or its requests
> @@ -3063,7 +3070,7 @@ static void bfq_bfqq_save_state(struct bfq_queue *bfqq)
> return;
>
> bfqq_data->saved_last_serv_time_ns = bfqq->last_serv_time_ns;
> - bfqq_data->saved_inject_limit = bfqq->inject_limit;
> + bfqq_data->saved_inject_limit = bfqq->inject_limit;
> bfqq_data->saved_decrease_time_jif = bfqq->decrease_time_jif;
>
> bfqq_data->saved_weight = bfqq->entity.orig_weight;
> @@ -5425,7 +5432,7 @@ static void bfq_exit_icq(struct io_cq *icq)
> unsigned long flags;
> unsigned int act_idx;
> unsigned int num_actuators;
> - struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
> + struct bfq_iocq_bfqq_data *bfqq_data = bic->bfqq_data;
>
> /*
> * bfqd is NULL if scheduler already exited, and in that case
> @@ -5445,10 +5452,10 @@ static void bfq_exit_icq(struct io_cq *icq)
> num_actuators = BFQ_MAX_ACTUATORS;
> }
>
> - if (bfqq_data->stable_merge_bfqq)
> - bfq_put_stable_ref(bfqq_data->stable_merge_bfqq);
> -
> for (act_idx = 0; act_idx < num_actuators; act_idx++) {
> + if (bfqq_data[act_idx].stable_merge_bfqq)
> + bfq_put_stable_ref(bfqq_data[act_idx].stable_merge_bfqq);
> +
> bfq_exit_icq_bfqq(bic, true, act_idx);
> bfq_exit_icq_bfqq(bic, false, act_idx);
> }
> @@ -5635,16 +5642,16 @@ bfq_do_early_stable_merge(struct bfq_data *bfqd, struct bfq_queue *bfqq,
> struct bfq_io_cq *bic,
> struct bfq_queue *last_bfqq_created)
> {
> + unsigned int a_idx = last_bfqq_created->actuator_idx;
> struct bfq_queue *new_bfqq =
> bfq_setup_merge(bfqq, last_bfqq_created);
> - struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
>
> if (!new_bfqq)
> return bfqq;
>
> if (new_bfqq->bic)
> - new_bfqq->bic->bfqq_data.stably_merged = true;
> - bfqq_data->stably_merged = true;
> + new_bfqq->bic->bfqq_data[a_idx].stably_merged = true;
> + bic->bfqq_data[a_idx].stably_merged = true;
>
> /*
> * Reusing merge functions. This implies that
> @@ -5713,7 +5720,6 @@ static struct bfq_queue *bfq_do_or_sched_stable_merge(struct bfq_data *bfqd,
> &bfqd->last_bfqq_created;
>
> struct bfq_queue *last_bfqq_created = *source_bfqq;
> - struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
>
> /*
> * If last_bfqq_created has not been set yet, then init it. If
> @@ -5775,7 +5781,8 @@ static struct bfq_queue *bfq_do_or_sched_stable_merge(struct bfq_data *bfqd,
> /*
> * Record the bfqq to merge to.
> */
> - bfqq_data->stable_merge_bfqq = last_bfqq_created;
> + bic->bfqq_data[last_bfqq_created->actuator_idx].stable_merge_bfqq =
> + last_bfqq_created;
> }
> }
>
> @@ -6696,7 +6703,7 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd,
> {
> unsigned int act_idx = bfq_actuator_index(bfqd, bio);
> struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync, act_idx);
> - struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data;
> + struct bfq_iocq_bfqq_data *bfqq_data = &bic->bfqq_data[act_idx];
>
> if (likely(bfqq && bfqq != &bfqd->oom_bfqq))
> return bfqq;
> @@ -6804,7 +6811,7 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
> struct bfq_queue *bfqq;
> bool new_queue = false;
> bool bfqq_already_existing = false, split = false;
> - struct bfq_iocq_bfqq_data *bfqq_data;
> + unsigned int a_idx = bfq_actuator_index(bfqd, bio);
>
> if (unlikely(!rq->elv.icq))
> return NULL;
> @@ -6828,17 +6835,16 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
> bfqq = bfq_get_bfqq_handle_split(bfqd, bic, bio, false, is_sync,
> &new_queue);
>
> - bfqq_data = &bic->bfqq_data;
> -
> if (likely(!new_queue)) {
> /* If the queue was seeky for too long, break it apart. */
> if (bfq_bfqq_coop(bfqq) && bfq_bfqq_split_coop(bfqq) &&
> - !bfqq_data->stably_merged) {
> + !bic->bfqq_data[a_idx].stably_merged) {
> struct bfq_queue *old_bfqq = bfqq;
>
> /* Update bic before losing reference to bfqq */
> if (bfq_bfqq_in_large_burst(bfqq))
> - bfqq_data->saved_in_large_burst = true;
> + bic->bfqq_data[a_idx].saved_in_large_burst =
> + true;
>
> bfqq = bfq_split_bfqq(bic, bfqq);
> split = true;
> diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
> index f2e8ab91951c..e27897d66a0f 100644
> --- a/block/bfq-iosched.h
> +++ b/block/bfq-iosched.h
> @@ -416,7 +416,7 @@ struct bfq_queue {
> struct bfq_iocq_bfqq_data {
> /*
> * Snapshot of the has_short_time flag before merging; taken
> - * to remember its value while the queue is merged, so as to
> + * to remember its values while the queue is merged, so as to
> * be able to restore it in case of split.
> */
> bool saved_has_short_ttime;
> @@ -430,7 +430,7 @@ struct bfq_iocq_bfqq_data {
> u64 saved_tot_idle_time;
>
> /*
> - * Same purpose as the previous fields for the value of the
> + * Same purpose as the previous fields for the values of the
> * field keeping the queue's belonging to a large burst
> */
> bool saved_in_large_burst;
> @@ -493,8 +493,12 @@ struct bfq_io_cq {
> uint64_t blkcg_serial_nr; /* the current blkcg serial */
> #endif
>
> - /* persistent data for associated synchronous process queue */
> - struct bfq_iocq_bfqq_data bfqq_data;
> + /*
> + * Persistent data for associated synchronous process queues
> + * (one queue per actuator, see field bfqq above). In
> + * particular, each of these queues may undergo a merge.
> + */
> + struct bfq_iocq_bfqq_data bfqq_data[BFQ_MAX_ACTUATORS];

I wonder if packing this together with struct bfq_queue would be cleaner.
That would avoid the 2 arrays you have in this struct. Something like this:

struct bfq_queue_data {
struct bfq_queue *bfqq[2];
struct bfq_iocq_bfqq_data iocq_data;
}

struct bfq_io_cq {
...
struct bfq_queue_data bfqqd[BFQ_MAX_ACTUATORS];
...
}

Thinking aloud here. That may actually make the code more complicated.

>
> unsigned int requests; /* Number of requests this process has in flight */
> };

--
Damien Le Moal
Western Digital Research


2022-11-21 00:54:54

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH V6 2/8] block, bfq: forbid stable merging of queues associated with different actuators

On 11/4/22 01:26, Paolo Valente wrote:
> If queues associated with different actuators are merged, then control
> is lost on each actuator. Therefore some actuator may be
> underutilized, and throughput may decrease. This problem cannot occur
> with basic queue merging, because the latter is triggered by spatial
> locality, and sectors for different actuators are not close to each
> other. Yet it may happen with stable merging. To address this issue,
> this commit prevents stable merging from occurring among queues
> associated with different actuators.
>
> Signed-off-by: Paolo Valente <[email protected]>
> ---
> block/bfq-iosched.c | 13 +++++++++----
> 1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 5c69394bbb65..ec4b0e70265f 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -5705,9 +5705,13 @@ static struct bfq_queue *bfq_do_or_sched_stable_merge(struct bfq_data *bfqd,
> * it has been set already, but too long ago, then move it
> * forward to bfqq. Finally, move also if bfqq belongs to a
> * different group than last_bfqq_created, or if bfqq has a
> - * different ioprio or ioprio_class. If none of these
> - * conditions holds true, then try an early stable merge or
> - * schedule a delayed stable merge.
> + * different ioprio, ioprio_class or actuator_idx. If none of
> + * these conditions holds true, then try an early stable merge
> + * or schedule a delayed stable merge. As for the condition on
> + * actuator_idx, the reason is that, if queues associated with
> + * different actuators are merged, then control is lost on
> + * each actuator. Therefore some actuator may be
> + * underutilized, and throughput may decrease.
> *
> * A delayed merge is scheduled (instead of performing an
> * early merge), in case bfqq might soon prove to be more
> @@ -5725,7 +5729,8 @@ static struct bfq_queue *bfq_do_or_sched_stable_merge(struct bfq_data *bfqd,
> bfqq->creation_time) ||
> bfqq->entity.parent != last_bfqq_created->entity.parent ||
> bfqq->ioprio != last_bfqq_created->ioprio ||
> - bfqq->ioprio_class != last_bfqq_created->ioprio_class)
> + bfqq->ioprio_class != last_bfqq_created->ioprio_class ||
> + bfqq->actuator_idx != last_bfqq_created->actuator_idx)
> *source_bfqq = bfqq;
> else if (time_after_eq(last_bfqq_created->creation_time +
> bfqd->bfq_burst_interval,

Reviewed-by: Damien Le Moal <[email protected]>

--
Damien Le Moal
Western Digital Research


2022-11-21 01:45:59

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH V6 5/8] block, bfq: split also async bfq_queues on a per-actuator basis

On 11/4/22 01:26, Paolo Valente wrote:
> From: Davide Zini <[email protected]>
>
> Similarly to sync bfq_queues, also async bfq_queues need to be split
> on a per-actuator basis.
>
> Signed-off-by: Paolo Valente <[email protected]>
> Signed-off-by: Davide Zini <[email protected]>

Reviewed-by: Damien Le Moal <[email protected]>

> ---
> block/bfq-iosched.c | 41 +++++++++++++++++++++++------------------
> block/bfq-iosched.h | 8 ++++----
> 2 files changed, 27 insertions(+), 22 deletions(-)
>
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index f44bac054aaf..c94b80e3f685 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -2673,14 +2673,16 @@ static void bfq_bfqq_end_wr(struct bfq_queue *bfqq)
> void bfq_end_wr_async_queues(struct bfq_data *bfqd,
> struct bfq_group *bfqg)
> {
> - int i, j;
> -
> - for (i = 0; i < 2; i++)
> - for (j = 0; j < IOPRIO_NR_LEVELS; j++)
> - if (bfqg->async_bfqq[i][j])
> - bfq_bfqq_end_wr(bfqg->async_bfqq[i][j]);
> - if (bfqg->async_idle_bfqq)
> - bfq_bfqq_end_wr(bfqg->async_idle_bfqq);
> + int i, j, k;
> +
> + for (k = 0; k < bfqd->num_actuators; k++) {
> + for (i = 0; i < 2; i++)
> + for (j = 0; j < IOPRIO_NR_LEVELS; j++)
> + if (bfqg->async_bfqq[i][j][k])
> + bfq_bfqq_end_wr(bfqg->async_bfqq[i][j][k]);
> + if (bfqg->async_idle_bfqq[k])
> + bfq_bfqq_end_wr(bfqg->async_idle_bfqq[k]);
> + }
> }
>
> static void bfq_end_wr(struct bfq_data *bfqd)
> @@ -5620,18 +5622,18 @@ static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>
> static struct bfq_queue **bfq_async_queue_prio(struct bfq_data *bfqd,
> struct bfq_group *bfqg,
> - int ioprio_class, int ioprio)
> + int ioprio_class, int ioprio, int act_idx)
> {
> switch (ioprio_class) {
> case IOPRIO_CLASS_RT:
> - return &bfqg->async_bfqq[0][ioprio];
> + return &bfqg->async_bfqq[0][ioprio][act_idx];
> case IOPRIO_CLASS_NONE:
> ioprio = IOPRIO_BE_NORM;
> fallthrough;
> case IOPRIO_CLASS_BE:
> - return &bfqg->async_bfqq[1][ioprio];
> + return &bfqg->async_bfqq[1][ioprio][act_idx];
> case IOPRIO_CLASS_IDLE:
> - return &bfqg->async_idle_bfqq;
> + return &bfqg->async_idle_bfqq[act_idx];
> default:
> return NULL;
> }
> @@ -5805,7 +5807,8 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
>
> if (!is_sync) {
> async_bfqq = bfq_async_queue_prio(bfqd, bfqg, ioprio_class,
> - ioprio);
> + ioprio,
> + bfq_actuator_index(bfqd, bio));
> bfqq = *async_bfqq;
> if (bfqq)
> goto out;
> @@ -7022,13 +7025,15 @@ static void __bfq_put_async_bfqq(struct bfq_data *bfqd,
> */
> void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
> {
> - int i, j;
> + int i, j, k;
>
> - for (i = 0; i < 2; i++)
> - for (j = 0; j < IOPRIO_NR_LEVELS; j++)
> - __bfq_put_async_bfqq(bfqd, &bfqg->async_bfqq[i][j]);
> + for (k = 0; k < bfqd->num_actuators; k++) {
> + for (i = 0; i < 2; i++)
> + for (j = 0; j < IOPRIO_NR_LEVELS; j++)
> + __bfq_put_async_bfqq(bfqd, &bfqg->async_bfqq[i][j][k]);
>
> - __bfq_put_async_bfqq(bfqd, &bfqg->async_idle_bfqq);
> + __bfq_put_async_bfqq(bfqd, &bfqg->async_idle_bfqq[k]);
> + }
> }
>
> /*
> diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
> index e27897d66a0f..f1c2e77cbf9a 100644
> --- a/block/bfq-iosched.h
> +++ b/block/bfq-iosched.h
> @@ -976,8 +976,8 @@ struct bfq_group {
>
> void *bfqd;
>
> - struct bfq_queue *async_bfqq[2][IOPRIO_NR_LEVELS];
> - struct bfq_queue *async_idle_bfqq;
> + struct bfq_queue *async_bfqq[2][IOPRIO_NR_LEVELS][BFQ_MAX_ACTUATORS];
> + struct bfq_queue *async_idle_bfqq[BFQ_MAX_ACTUATORS];
>
> struct bfq_entity *my_entity;
>
> @@ -993,8 +993,8 @@ struct bfq_group {
> struct bfq_entity entity;
> struct bfq_sched_data sched_data;
>
> - struct bfq_queue *async_bfqq[2][IOPRIO_NR_LEVELS];
> - struct bfq_queue *async_idle_bfqq;
> + struct bfq_queue *async_bfqq[2][IOPRIO_NR_LEVELS][BFQ_MAX_ACTUATORS];
> + struct bfq_queue *async_idle_bfqq[BFQ_MAX_ACTUATORS];
>
> struct rb_root rq_pos_tree;
> };

--
Damien Le Moal
Western Digital Research


2022-11-21 01:50:24

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH V6 7/8] block, bfq: inject I/O to underutilized actuators

On 11/4/22 01:26, Paolo Valente wrote:
> From: Davide Zini <[email protected]>
>
> The main service scheme of BFQ for sync I/O is serving one sync
> bfq_queue at a time, for a while. In particular, BFQ enforces this
> scheme when it deems the latter necessary to boost throughput or
> to preserve service guarantees. Unfortunately, when BFQ enforces
> this policy, only one actuator at a time gets served for a while,
> because each bfq_queue contains I/O only for one actuator. The
> other actuators may remain underutilized.
>
> Actually, BFQ may serve (inject) extra I/O, taken from other
> bfq_queues, in parallel with that of the in-service queue. This
> injection mechanism may provide the ground for dealing also with
> the above actuator-underutilization problem. Yet BFQ does not take
> the actuator load into account when choosing which queue to pick
> extra I/O from. In addition, BFQ may happen to inject extra I/O
> only when the in-service queue is temporarily empty.
>
> In view of these facts, this commit extends the
> injection mechanism in such a way that the latter:
> (1) takes into account also the actuator load;
> (2) checks such a load on each dispatch, and injects I/O for an
> underutilized actuator, if there is one and there is I/O for it.
>
> To perform the check in (2), this commit introduces a load
> threshold, currently set to 4. A linear scan of each actuator is
> performed, until an actuator is found for which the following two
> conditions hold: the load of the actuator is below the threshold,
> and there is at least one non-in-service queue that contains I/O
> for that actuator. If such a pair (actuator, queue) is found, then
> the head request of that queue is returned for dispatch, instead
> of the head request of the in-service queue.
>
> We have set the threshold, empirically, to the minimum possible
> value for which an actuator is fully utilized, or close to be
> fully utilized. By doing so, injected I/O 'steals' as few
> drive-queue slots as possibile to the in-service queue. This
> reduces as much as possible the probability that the service of
> I/O from the in-service bfq_queue gets delayed because of slot
> exhaustion, i.e., because all the slots of the drive queue are
> filled with I/O injected from other queues (NCQ provides for 32
> slots).
>
> This new mechanism also counters actuator underutilization in the
> case of asymmetric configurations of bfq_queues. Namely if there
> are few bfq_queues containing I/O for some actuators and many
> bfq_queues containing I/O for other actuators. Or if the
> bfq_queues containing I/O for some actuators have lower weights
> than the other bfq_queues.
>
> Signed-off-by: Paolo Valente <[email protected]>
> Signed-off-by: Davide Zini <[email protected]>

A few nits below. Otherwise, looks ok.

Reviewed-by: Damien Le Moal <[email protected]>

> ---
> block/bfq-cgroup.c | 2 +-
> block/bfq-iosched.c | 139 +++++++++++++++++++++++++++++++++-----------
> block/bfq-iosched.h | 39 ++++++++++++-
> block/bfq-wf2q.c | 2 +-
> 4 files changed, 143 insertions(+), 39 deletions(-)
>
> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
> index d243c429d9c0..38ccfe55ad46 100644
> --- a/block/bfq-cgroup.c
> +++ b/block/bfq-cgroup.c
> @@ -694,7 +694,7 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
> bfq_activate_bfqq(bfqd, bfqq);
> }
>
> - if (!bfqd->in_service_queue && !bfqd->rq_in_driver)
> + if (!bfqd->in_service_queue && !bfqd->tot_rq_in_driver)
> bfq_schedule_dispatch(bfqd);
> /* release extra ref taken above, bfqq may happen to be freed now */
> bfq_put_queue(bfqq);
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 106c8820cc5c..db91f1a651d3 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -2252,6 +2252,7 @@ static void bfq_add_request(struct request *rq)
>
> bfq_log_bfqq(bfqd, bfqq, "add_request %d", rq_is_sync(rq));
> bfqq->queued[rq_is_sync(rq)]++;
> +

whiteline change

> /*
> * Updating of 'bfqd->queued' is protected by 'bfqd->lock', however, it
> * may be read without holding the lock in bfq_has_work().
> @@ -2297,9 +2298,9 @@ static void bfq_add_request(struct request *rq)
> * elapsed.
> */
> if (bfqq == bfqd->in_service_queue &&
> - (bfqd->rq_in_driver == 0 ||
> + (bfqd->tot_rq_in_driver == 0 ||
> (bfqq->last_serv_time_ns > 0 &&
> - bfqd->rqs_injected && bfqd->rq_in_driver > 0)) &&
> + bfqd->rqs_injected && bfqd->tot_rq_in_driver > 0)) &&
> time_is_before_eq_jiffies(bfqq->decrease_time_jif +
> msecs_to_jiffies(10))) {
> bfqd->last_empty_occupied_ns = ktime_get_ns();
> @@ -2323,7 +2324,7 @@ static void bfq_add_request(struct request *rq)
> * will be set in case injection is performed
> * on bfqq before rq is completed).
> */
> - if (bfqd->rq_in_driver == 0)
> + if (bfqd->tot_rq_in_driver == 0)
> bfqd->rqs_injected = false;
> }
> }
> @@ -2421,15 +2422,18 @@ static sector_t get_sdist(sector_t last_pos, struct request *rq)
> static void bfq_activate_request(struct request_queue *q, struct request *rq)
> {
> struct bfq_data *bfqd = q->elevator->elevator_data;
> + unsigned int act_idx = bfq_actuator_index(bfqd, rq->bio);
>
> - bfqd->rq_in_driver++;
> + bfqd->tot_rq_in_driver++;
> + bfqd->rq_in_driver[act_idx]++;
> }
>
> static void bfq_deactivate_request(struct request_queue *q, struct request *rq)
> {
> struct bfq_data *bfqd = q->elevator->elevator_data;
>
> - bfqd->rq_in_driver--;
> + bfqd->tot_rq_in_driver--;
> + bfqd->rq_in_driver[bfq_actuator_index(bfqd, rq->bio)]--;
> }
> #endif
>
> @@ -2703,11 +2707,14 @@ void bfq_end_wr_async_queues(struct bfq_data *bfqd,
> static void bfq_end_wr(struct bfq_data *bfqd)
> {
> struct bfq_queue *bfqq;
> + int i;
>
> spin_lock_irq(&bfqd->lock);
>
> - list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list)
> - bfq_bfqq_end_wr(bfqq);
> + for (i = 0; i < bfqd->num_actuators; i++) {
> + list_for_each_entry(bfqq, &bfqd->active_list[i], bfqq_list)
> + bfq_bfqq_end_wr(bfqq);
> + }
> list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list)
> bfq_bfqq_end_wr(bfqq);
> bfq_end_wr_async(bfqd);
> @@ -3651,13 +3658,13 @@ static void bfq_update_peak_rate(struct bfq_data *bfqd, struct request *rq)
> * - start a new observation interval with this dispatch
> */
> if (now_ns - bfqd->last_dispatch > 100*NSEC_PER_MSEC &&
> - bfqd->rq_in_driver == 0)
> + bfqd->tot_rq_in_driver == 0)
> goto update_rate_and_reset;
>
> /* Update sampling information */
> bfqd->peak_rate_samples++;
>
> - if ((bfqd->rq_in_driver > 0 ||
> + if ((bfqd->tot_rq_in_driver > 0 ||
> now_ns - bfqd->last_completion < BFQ_MIN_TT)
> && !BFQ_RQ_SEEKY(bfqd, bfqd->last_position, rq))
> bfqd->sequential_samples++;
> @@ -3924,7 +3931,7 @@ static bool idling_needed_for_service_guarantees(struct bfq_data *bfqd,
> return (bfqq->wr_coeff > 1 &&
> (bfqd->wr_busy_queues <
> tot_busy_queues ||
> - bfqd->rq_in_driver >=
> + bfqd->tot_rq_in_driver >=
> bfqq->dispatched + 4)) ||
> bfq_asymmetric_scenario(bfqd, bfqq) ||

Nit: with all the line splits, this is really hard to read... Use the full
80 chars available please.

> tot_busy_queues == 1;
> @@ -4696,6 +4703,7 @@ bfq_choose_bfqq_for_injection(struct bfq_data *bfqd)
> {
> struct bfq_queue *bfqq, *in_serv_bfqq = bfqd->in_service_queue;
> unsigned int limit = in_serv_bfqq->inject_limit;
> + int i;

Missing blank line after this declaration.

> /*
> * If
> * - bfqq is not weight-raised and therefore does not carry
> @@ -4727,7 +4735,7 @@ bfq_choose_bfqq_for_injection(struct bfq_data *bfqd)
> )
> limit = 1;
>
> - if (bfqd->rq_in_driver >= limit)
> + if (bfqd->tot_rq_in_driver >= limit)
> return NULL;
>
> /*
> @@ -4742,11 +4750,12 @@ bfq_choose_bfqq_for_injection(struct bfq_data *bfqd)
> * (and re-added only if it gets new requests, but then it
> * is assigned again enough budget for its new backlog).
> */
> - list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list)
> - if (!RB_EMPTY_ROOT(&bfqq->sort_list) &&
> - (in_serv_always_inject || bfqq->wr_coeff > 1) &&
> - bfq_serv_to_charge(bfqq->next_rq, bfqq) <=
> - bfq_bfqq_budget_left(bfqq)) {
> + for (i = 0; i < bfqd->num_actuators; i++) {
> + list_for_each_entry(bfqq, &bfqd->active_list[i], bfqq_list)
> + if (!RB_EMPTY_ROOT(&bfqq->sort_list) &&
> + (in_serv_always_inject || bfqq->wr_coeff > 1) &&
> + bfq_serv_to_charge(bfqq->next_rq, bfqq) <=
> + bfq_bfqq_budget_left(bfqq)) {
> /*
> * Allow for only one large in-flight request
> * on non-rotational devices, for the
> @@ -4771,22 +4780,69 @@ bfq_choose_bfqq_for_injection(struct bfq_data *bfqd)
> else
> limit = in_serv_bfqq->inject_limit;
>
> - if (bfqd->rq_in_driver < limit) {
> + if (bfqd->tot_rq_in_driver < limit) {
> bfqd->rqs_injected = true;
> return bfqq;
> }
> }
> + }
> +
> + return NULL;
> +}
> +
> +static struct bfq_queue *
> +bfq_find_active_bfqq_for_actuator(struct bfq_data *bfqd,
> + int idx)

Why the line split ?

> +{
> + struct bfq_queue *bfqq = NULL;
> +
> + if (bfqd->in_service_queue &&
> + bfqd->in_service_queue->actuator_idx == idx)
> + return bfqd->in_service_queue;
> +
> + list_for_each_entry(bfqq, &bfqd->active_list[idx], bfqq_list) {
> + if (!RB_EMPTY_ROOT(&bfqq->sort_list) &&
> + bfq_serv_to_charge(bfqq->next_rq, bfqq) <=
> + bfq_bfqq_budget_left(bfqq)) {
> + return bfqq;
> + }
> + }
>
> return NULL;
> }
>
> +/*
> + * Perform a linear scan of each actuator, until an actuator is found
> + * for which the following two conditions hold: the load of the
> + * actuator is below the threshold (see comments on actuator_load_threshold
> + * for details), and there is a queue that contains I/O for that
> + * actuator. On success, return that queue.
> + */
> +static struct bfq_queue *
> +bfq_find_bfqq_for_underused_actuator(struct bfq_data *bfqd)
> +{
> + int i;
> +
> + for (i = 0 ; i < bfqd->num_actuators; i++)
> + if (bfqd->rq_in_driver[i] < bfqd->actuator_load_threshold) {
> + struct bfq_queue *bfqq =
> + bfq_find_active_bfqq_for_actuator(bfqd, i);
> +
> + if (bfqq)
> + return bfqq;
> + }

Given that the statement inside the for loop is multi-line, adding curly
brackets would be nice.

> +
> + return NULL;
> +}
> +
> +
> /*
> * Select a queue for service. If we have a current queue in service,
> * check whether to continue servicing it, or retrieve and set a new one.
> */
> static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
> {
> - struct bfq_queue *bfqq;
> + struct bfq_queue *bfqq, *inject_bfqq;
> struct request *next_rq;
> enum bfqq_expiration reason = BFQQE_BUDGET_TIMEOUT;
>
> @@ -4808,6 +4864,15 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
> goto expire;
>
> check_queue:
> + /*
> + * If some actuator is underutilized, but the in-service
> + * queue does not contain I/O for that actuator, then try to
> + * inject I/O for that actuator.
> + */
> + inject_bfqq = bfq_find_bfqq_for_underused_actuator(bfqd);
> + if (inject_bfqq && inject_bfqq != bfqq)
> + return inject_bfqq;
> +
> /*
> * This loop is rarely executed more than once. Even when it
> * happens, it is much more convenient to re-execute this loop
> @@ -5163,11 +5228,11 @@ static struct request *__bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
>
> /*
> * We exploit the bfq_finish_requeue_request hook to
> - * decrement rq_in_driver, but
> + * decrement tot_rq_in_driver, but
> * bfq_finish_requeue_request will not be invoked on
> * this request. So, to avoid unbalance, just start
> - * this request, without incrementing rq_in_driver. As
> - * a negative consequence, rq_in_driver is deceptively
> + * this request, without incrementing tot_rq_in_driver. As
> + * a negative consequence, tot_rq_in_driver is deceptively
> * lower than it should be while this request is in
> * service. This may cause bfq_schedule_dispatch to be
> * invoked uselessly.
> @@ -5176,7 +5241,7 @@ static struct request *__bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
> * bfq_finish_requeue_request hook, if defined, is
> * probably invoked also on this request. So, by
> * exploiting this hook, we could 1) increment
> - * rq_in_driver here, and 2) decrement it in
> + * tot_rq_in_driver here, and 2) decrement it in
> * bfq_finish_requeue_request. Such a solution would
> * let the value of the counter be always accurate,
> * but it would entail using an extra interface
> @@ -5205,7 +5270,7 @@ static struct request *__bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
> * Of course, serving one request at a time may cause loss of
> * throughput.
> */
> - if (bfqd->strict_guarantees && bfqd->rq_in_driver > 0)
> + if (bfqd->strict_guarantees && bfqd->tot_rq_in_driver > 0)
> goto exit;
>
> bfqq = bfq_select_queue(bfqd);
> @@ -5216,7 +5281,8 @@ static struct request *__bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
>
> if (rq) {
> inc_in_driver_start_rq:
> - bfqd->rq_in_driver++;
> + bfqd->rq_in_driver[bfqq->actuator_idx]++;
> + bfqd->tot_rq_in_driver++;
> start_rq:
> rq->rq_flags |= RQF_STARTED;
> }
> @@ -6289,7 +6355,7 @@ static void bfq_update_hw_tag(struct bfq_data *bfqd)
> struct bfq_queue *bfqq = bfqd->in_service_queue;
>
> bfqd->max_rq_in_driver = max_t(int, bfqd->max_rq_in_driver,
> - bfqd->rq_in_driver);
> + bfqd->tot_rq_in_driver);
>
> if (bfqd->hw_tag == 1)
> return;
> @@ -6300,7 +6366,7 @@ static void bfq_update_hw_tag(struct bfq_data *bfqd)
> * sum is not exact, as it's not taking into account deactivated
> * requests.
> */
> - if (bfqd->rq_in_driver + bfqd->queued <= BFQ_HW_QUEUE_THRESHOLD)
> + if (bfqd->tot_rq_in_driver + bfqd->queued <= BFQ_HW_QUEUE_THRESHOLD)
> return;
>
> /*
> @@ -6311,7 +6377,7 @@ static void bfq_update_hw_tag(struct bfq_data *bfqd)
> if (bfqq && bfq_bfqq_has_short_ttime(bfqq) &&
> bfqq->dispatched + bfqq->queued[0] + bfqq->queued[1] <
> BFQ_HW_QUEUE_THRESHOLD &&
> - bfqd->rq_in_driver < BFQ_HW_QUEUE_THRESHOLD)
> + bfqd->tot_rq_in_driver < BFQ_HW_QUEUE_THRESHOLD)
> return;
>
> if (bfqd->hw_tag_samples++ < BFQ_HW_QUEUE_SAMPLES)
> @@ -6332,7 +6398,8 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)
>
> bfq_update_hw_tag(bfqd);
>
> - bfqd->rq_in_driver--;
> + bfqd->rq_in_driver[bfqq->actuator_idx]--;
> + bfqd->tot_rq_in_driver--;
> bfqq->dispatched--;
>
> if (!bfqq->dispatched && !bfq_bfqq_busy(bfqq)) {
> @@ -6451,7 +6518,7 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)
> BFQQE_NO_MORE_REQUESTS);
> }
>
> - if (!bfqd->rq_in_driver)
> + if (!bfqd->tot_rq_in_driver)
> bfq_schedule_dispatch(bfqd);
> }
>
> @@ -6582,13 +6649,13 @@ static void bfq_update_inject_limit(struct bfq_data *bfqd,
> * conditions to do it, or we can lower the last base value
> * computed.
> *
> - * NOTE: (bfqd->rq_in_driver == 1) means that there is no I/O
> + * NOTE: (bfqd->tot_rq_in_driver == 1) means that there is no I/O
> * request in flight, because this function is in the code
> * path that handles the completion of a request of bfqq, and,
> * in particular, this function is executed before
> - * bfqd->rq_in_driver is decremented in such a code path.
> + * bfqd->tot_rq_in_driver is decremented in such a code path.
> */
> - if ((bfqq->last_serv_time_ns == 0 && bfqd->rq_in_driver == 1) ||
> + if ((bfqq->last_serv_time_ns == 0 && bfqd->tot_rq_in_driver == 1) ||
> tot_time_ns < bfqq->last_serv_time_ns) {
> if (bfqq->last_serv_time_ns == 0) {
> /*
> @@ -6598,7 +6665,7 @@ static void bfq_update_inject_limit(struct bfq_data *bfqd,
> bfqq->inject_limit = max_t(unsigned int, 1, old_limit);
> }
> bfqq->last_serv_time_ns = tot_time_ns;
> - } else if (!bfqd->rqs_injected && bfqd->rq_in_driver == 1)
> + } else if (!bfqd->rqs_injected && bfqd->tot_rq_in_driver == 1)
> /*
> * No I/O injected and no request still in service in
> * the drive: these are the exact conditions for
> @@ -7239,7 +7306,8 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
> bfqd->queue_weights_tree = RB_ROOT_CACHED;
> bfqd->num_groups_with_pending_reqs = 0;
>
> - INIT_LIST_HEAD(&bfqd->active_list);
> + INIT_LIST_HEAD(&bfqd->active_list[0]);
> + INIT_LIST_HEAD(&bfqd->active_list[1]);
> INIT_LIST_HEAD(&bfqd->idle_list);
> INIT_HLIST_HEAD(&bfqd->burst_list);
>
> @@ -7284,6 +7352,9 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
> ref_wr_duration[blk_queue_nonrot(bfqd->queue)];
> bfqd->peak_rate = ref_rate[blk_queue_nonrot(bfqd->queue)] * 2 / 3;
>
> + /* see comments on the definition of next field inside bfq_data */
> + bfqd->actuator_load_threshold = 4;
> +
> spin_lock_init(&bfqd->lock);
>
> /*
> diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
> index 90130a893c8f..adb3ba6a9d90 100644
> --- a/block/bfq-iosched.h
> +++ b/block/bfq-iosched.h
> @@ -586,7 +586,12 @@ struct bfq_data {
> /* number of queued requests */
> int queued;
> /* number of requests dispatched and waiting for completion */
> - int rq_in_driver;
> + int tot_rq_in_driver;
> + /*
> + * number of requests dispatched and waiting for completion
> + * for each actuator
> + */
> + int rq_in_driver[BFQ_MAX_ACTUATORS];
>
> /* true if the device is non rotational and performs queueing */
> bool nonrot_with_queueing;
> @@ -680,8 +685,13 @@ struct bfq_data {
> /* maximum budget allotted to a bfq_queue before rescheduling */
> int bfq_max_budget;
>
> - /* list of all the bfq_queues active on the device */
> - struct list_head active_list;
> + /*
> + * List of all the bfq_queues active for a specific actuator
> + * on the device. Keeping active queues separate on a
> + * per-actuator basis helps implementing per-actuator
> + * injection more efficiently.
> + */
> + struct list_head active_list[BFQ_MAX_ACTUATORS];
> /* list of all the bfq_queues idle on the device */
> struct list_head idle_list;
>
> @@ -816,6 +826,29 @@ struct bfq_data {
> * in this device.
> */
> struct blk_independent_access_range ia_ranges[BFQ_MAX_ACTUATORS];
> +
> + /*
> + * If the number of I/O requests queued in the device for a
> + * given actuator is below next threshold, then the actuator
> + * is deemed as underutilized. If this condition is found to
> + * hold for some actuator upon a dispatch, but (i) the
> + * in-service queue does not contain I/O for that actuator,
> + * while (ii) some other queue does contain I/O for that
> + * actuator, then the head I/O request of the latter queue is
> + * returned (injected), instead of the head request of the
> + * currently in-service queue.
> + *
> + * We set the threshold, empirically, to the minimum possible
> + * value for which an actuator is fully utilized, or close to
> + * be fully utilized. By doing so, injected I/O 'steals' as
> + * few drive-queue slots as possibile to the in-service
> + * queue. This reduces as much as possible the probability
> + * that the service of I/O from the in-service bfq_queue gets
> + * delayed because of slot exhaustion, i.e., because all the
> + * slots of the drive queue are filled with I/O injected from
> + * other queues (NCQ provides for 32 slots).
> + */
> + unsigned int actuator_load_threshold;
> };
>
> enum bfqq_state_flags {
> diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
> index 8fc3da4c23bb..ec0273e2cd07 100644
> --- a/block/bfq-wf2q.c
> +++ b/block/bfq-wf2q.c
> @@ -477,7 +477,7 @@ static void bfq_active_insert(struct bfq_service_tree *st,
> bfqd = (struct bfq_data *)bfqg->bfqd;
> #endif
> if (bfqq)
> - list_add(&bfqq->bfqq_list, &bfqq->bfqd->active_list);
> + list_add(&bfqq->bfqq_list, &bfqq->bfqd->active_list[bfqq->actuator_idx]);
> #ifdef CONFIG_BFQ_GROUP_IOSCHED
> if (bfqg != bfqd->root_group)
> bfqg->active_entities++;

--
Damien Le Moal
Western Digital Research


2022-11-21 02:17:18

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH V6 6/8] block, bfq: retrieve independent access ranges from request queue

On 11/4/22 01:26, Paolo Valente wrote:
> From: Federico Gavioli <[email protected]>
>
> This patch implements the code to gather the content of the
> independent_access_ranges structure from the request_queue and copy
> it into the queue's bfq_data. This copy is done at queue initialization.
>
> We copy the access ranges into the bfq_data to avoid taking the queue
> lock each time we access the ranges.
>
> This implementation, however, puts a limit to the maximum independent
> ranges supported by the scheduler. Such a limit is equal to the constant
> BFQ_MAX_ACTUATORS. This limit was placed to avoid the allocation of
> dynamic memory.
>
> Co-developed-by: Rory Chen <[email protected]>
> Signed-off-by: Rory Chen <[email protected]>
> Signed-off-by: Federico Gavioli <[email protected]>
> Signed-off-by: Paolo Valente <[email protected]>
> ---
> block/bfq-iosched.c | 54 ++++++++++++++++++++++++++++++++++++++-------
> block/bfq-iosched.h | 5 +++++
> 2 files changed, 51 insertions(+), 8 deletions(-)
>
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index c94b80e3f685..106c8820cc5c 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -1831,10 +1831,26 @@ static bool bfq_bfqq_higher_class_or_weight(struct bfq_queue *bfqq,
> /* get the index of the actuator that will serve bio */
> static unsigned int bfq_actuator_index(struct bfq_data *bfqd, struct bio *bio)
> {
> - /*
> - * Multi-actuator support not complete yet, so always return 0
> - * for the moment.
> - */
> + struct blk_independent_access_range *iar;
> + unsigned int i;
> + sector_t end;
> +
> + /* no search needed if one or zero ranges present */
> + if (bfqd->num_actuators < 2)
> + return 0;
> +
> + /* bio_end_sector(bio) gives the sector after the last one */
> + end = bio_end_sector(bio) - 1;
> +
> + for (i = 0; i < bfqd->num_actuators; i++) {
> + iar = &(bfqd->ia_ranges[i]);
> + if (end >= iar->sector && end < iar->sector + iar->nr_sectors)
> + return i;
> + }
> +
> + WARN_ONCE(true,
> + "bfq_actuator_index: bio sector out of ranges: end=%llu\n",
> + end);
> return 0;
> }
>
> @@ -2479,7 +2495,6 @@ static void bfq_remove_request(struct request_queue *q,
>
> if (rq->cmd_flags & REQ_META)
> bfqq->meta_pending--;
> -

whiteline change

> }
>
> static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
> @@ -7144,6 +7159,8 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
> {
> struct bfq_data *bfqd;
> struct elevator_queue *eq;
> + unsigned int i;
> + struct blk_independent_access_ranges *ia_ranges = q->disk->ia_ranges;
>
> eq = elevator_alloc(q, e);
> if (!eq)
> @@ -7187,10 +7204,31 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
> bfqd->queue = q;
>
> /*
> - * Multi-actuator support not complete yet, default to single
> - * actuator for the moment.
> + * If the disk supports multiple actuators, we copy the independent
> + * access ranges from the request queue structure.
> */
> - bfqd->num_actuators = 1;
> + spin_lock_irq(&q->queue_lock);
> + if (ia_ranges) {
> + /*
> + * Check if the disk ia_ranges size exceeds the current bfq
> + * actuator limit.
> + */
> + if (ia_ranges->nr_ia_ranges > BFQ_MAX_ACTUATORS) {
> + pr_crit("nr_ia_ranges higher than act limit: iars=%d, max=%d.\n",
> + ia_ranges->nr_ia_ranges, BFQ_MAX_ACTUATORS);
> + pr_crit("Falling back to single actuator mode.\n");
> + bfqd->num_actuators = 0;
> + } else {
> + bfqd->num_actuators = ia_ranges->nr_ia_ranges;
> +
> + for (i = 0; i < bfqd->num_actuators; i++)
> + bfqd->ia_ranges[i] = ia_ranges->ia_range[i];
> + }
> + } else {
> + bfqd->num_actuators = 0;

That is very weird. The default should be 1 actuator.
ia_ranges->nr_ia_ranges is 0 when the disk does not provide any range
information, meaning it is a regular disk with a single actuator.

> + }
> +
> + spin_unlock_irq(&q->queue_lock);
>
> INIT_LIST_HEAD(&bfqd->dispatch);
>
> diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
> index f1c2e77cbf9a..90130a893c8f 100644
> --- a/block/bfq-iosched.h
> +++ b/block/bfq-iosched.h
> @@ -811,6 +811,11 @@ struct bfq_data {
> */
> unsigned int num_actuators;
>
> + /*
> + * Disk independent access ranges for each actuator
> + * in this device.
> + */
> + struct blk_independent_access_range ia_ranges[BFQ_MAX_ACTUATORS];

I fail to see how keeping this information is useful, especially given
that this struct contains a kobj. Why not just copy the sector &
nr_sectors fields into struct bfq_queue ? That would also work for the
single actuator case as then sector will be 0 and nr_sectors will be the
disk total capacity.

I think this patch should be first in the series. That will avoid having
the empty bfq_actuator_index() function.

> };
>
> enum bfqq_state_flags {

--
Damien Le Moal
Western Digital Research


2022-12-06 08:20:03

by Paolo Valente

[permalink] [raw]
Subject: Re: [PATCH V6 4/8] block, bfq: turn bfqq_data into an array in bfq_io_cq



> Il giorno 21 nov 2022, alle ore 01:39, Damien Le Moal <[email protected]> ha scritto:
...
>>
>> bfqq = bfq_split_bfqq(bic, bfqq);
>> split = true;
>> diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
>> index f2e8ab91951c..e27897d66a0f 100644
>> --- a/block/bfq-iosched.h
>> +++ b/block/bfq-iosched.h
>> @@ -416,7 +416,7 @@ struct bfq_queue {
>> struct bfq_iocq_bfqq_data {
>> /*
>> * Snapshot of the has_short_time flag before merging; taken
>> - * to remember its value while the queue is merged, so as to
>> + * to remember its values while the queue is merged, so as to
>> * be able to restore it in case of split.
>> */
>> bool saved_has_short_ttime;
>> @@ -430,7 +430,7 @@ struct bfq_iocq_bfqq_data {
>> u64 saved_tot_idle_time;
>>
>> /*
>> - * Same purpose as the previous fields for the value of the
>> + * Same purpose as the previous fields for the values of the
>> * field keeping the queue's belonging to a large burst
>> */
>> bool saved_in_large_burst;
>> @@ -493,8 +493,12 @@ struct bfq_io_cq {
>> uint64_t blkcg_serial_nr; /* the current blkcg serial */
>> #endif
>>
>> - /* persistent data for associated synchronous process queue */
>> - struct bfq_iocq_bfqq_data bfqq_data;
>> + /*
>> + * Persistent data for associated synchronous process queues
>> + * (one queue per actuator, see field bfqq above). In
>> + * particular, each of these queues may undergo a merge.
>> + */
>> + struct bfq_iocq_bfqq_data bfqq_data[BFQ_MAX_ACTUATORS];
>
> I wonder if packing this together with struct bfq_queue would be cleaner.
> That would avoid the 2 arrays you have in this struct. Something like this:
>
> struct bfq_queue_data {
> struct bfq_queue *bfqq[2];
> struct bfq_iocq_bfqq_data iocq_data;
> }
>
> struct bfq_io_cq {
> ...
> struct bfq_queue_data bfqqd[BFQ_MAX_ACTUATORS];
> ...
> }
>
> Thinking aloud here. That may actually make the code more complicated.

I see your point, but, yes, this change would entail one more
indirection when accessing queues from io contexts.

Apart from this, I have applied all of your other suggestions here.

Thanks,
Paolo

>
>>
>> unsigned int requests; /* Number of requests this process has in flight */
>> };
>
> --
> Damien Le Moal
> Western Digital Research

2022-12-06 08:35:27

by Paolo Valente

[permalink] [raw]
Subject: Re: [PATCH V6 6/8] block, bfq: retrieve independent access ranges from request queue



> Il giorno 21 nov 2022, alle ore 02:01, Damien Le Moal <[email protected]> ha scritto:
>

...

>
>> }
>>
>> static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
>> @@ -7144,6 +7159,8 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
>> {
>> struct bfq_data *bfqd;
>> struct elevator_queue *eq;
>> + unsigned int i;
>> + struct blk_independent_access_ranges *ia_ranges = q->disk->ia_ranges;
>>
>> eq = elevator_alloc(q, e);
>> if (!eq)
>> @@ -7187,10 +7204,31 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
>> bfqd->queue = q;
>>
>> /*
>> - * Multi-actuator support not complete yet, default to single
>> - * actuator for the moment.
>> + * If the disk supports multiple actuators, we copy the independent
>> + * access ranges from the request queue structure.
>> */
>> - bfqd->num_actuators = 1;
>> + spin_lock_irq(&q->queue_lock);
>> + if (ia_ranges) {
>> + /*
>> + * Check if the disk ia_ranges size exceeds the current bfq
>> + * actuator limit.
>> + */
>> + if (ia_ranges->nr_ia_ranges > BFQ_MAX_ACTUATORS) {
>> + pr_crit("nr_ia_ranges higher than act limit: iars=%d, max=%d.\n",
>> + ia_ranges->nr_ia_ranges, BFQ_MAX_ACTUATORS);
>> + pr_crit("Falling back to single actuator mode.\n");
>> + bfqd->num_actuators = 0;
>> + } else {
>> + bfqd->num_actuators = ia_ranges->nr_ia_ranges;
>> +
>> + for (i = 0; i < bfqd->num_actuators; i++)
>> + bfqd->ia_ranges[i] = ia_ranges->ia_range[i];
>> + }
>> + } else {
>> + bfqd->num_actuators = 0;
>
> That is very weird. The default should be 1 actuator.
> ia_ranges->nr_ia_ranges is 0 when the disk does not provide any range
> information, meaning it is a regular disk with a single actuator.

Actually, IIUC this assignment to 0 seems to be done exactly when you
say that it should be done, i.e., when the disk does not provide any
range information (ia_ranges is NULL). Am I missing something else?

Once again, all other suggestions applied. I'm about to submit a V7.

Thanks,
Paolo

2022-12-06 08:53:17

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH V6 6/8] block, bfq: retrieve independent access ranges from request queue

On 12/6/22 17:06, Paolo Valente wrote:
>
>
>> Il giorno 21 nov 2022, alle ore 02:01, Damien Le Moal <[email protected]> ha scritto:
>>
>
> ...
>
>>
>>> }
>>>
>>> static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
>>> @@ -7144,6 +7159,8 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
>>> {
>>> struct bfq_data *bfqd;
>>> struct elevator_queue *eq;
>>> + unsigned int i;
>>> + struct blk_independent_access_ranges *ia_ranges = q->disk->ia_ranges;
>>>
>>> eq = elevator_alloc(q, e);
>>> if (!eq)
>>> @@ -7187,10 +7204,31 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
>>> bfqd->queue = q;
>>>
>>> /*
>>> - * Multi-actuator support not complete yet, default to single
>>> - * actuator for the moment.
>>> + * If the disk supports multiple actuators, we copy the independent
>>> + * access ranges from the request queue structure.
>>> */
>>> - bfqd->num_actuators = 1;
>>> + spin_lock_irq(&q->queue_lock);
>>> + if (ia_ranges) {
>>> + /*
>>> + * Check if the disk ia_ranges size exceeds the current bfq
>>> + * actuator limit.
>>> + */
>>> + if (ia_ranges->nr_ia_ranges > BFQ_MAX_ACTUATORS) {
>>> + pr_crit("nr_ia_ranges higher than act limit: iars=%d, max=%d.\n",
>>> + ia_ranges->nr_ia_ranges, BFQ_MAX_ACTUATORS);
>>> + pr_crit("Falling back to single actuator mode.\n");
>>> + bfqd->num_actuators = 0;
>>> + } else {
>>> + bfqd->num_actuators = ia_ranges->nr_ia_ranges;
>>> +
>>> + for (i = 0; i < bfqd->num_actuators; i++)
>>> + bfqd->ia_ranges[i] = ia_ranges->ia_range[i];
>>> + }
>>> + } else {
>>> + bfqd->num_actuators = 0;
>>
>> That is very weird. The default should be 1 actuator.
>> ia_ranges->nr_ia_ranges is 0 when the disk does not provide any range
>> information, meaning it is a regular disk with a single actuator.
>
> Actually, IIUC this assignment to 0 seems to be done exactly when you
> say that it should be done, i.e., when the disk does not provide any
> range information (ia_ranges is NULL). Am I missing something else?

No ranges reported means no extra actuators, so a single actuator an
single LBA range for the entire device. In that case, bfq should process
all IOs using bfqd->ia_ranges[0]. The get range function will always
return that range. That makes the code clean and avoids different path for
nr_ranges == 1 and nr_ranges > 1. No ?

>
> Once again, all other suggestions applied. I'm about to submit a V7.
>
> Thanks,
> Paolo
>

--
Damien Le Moal
Western Digital Research

2022-12-06 09:41:56

by Paolo Valente

[permalink] [raw]
Subject: Re: [PATCH V6 6/8] block, bfq: retrieve independent access ranges from request queue



> Il giorno 6 dic 2022, alle ore 09:29, Damien Le Moal <[email protected]> ha scritto:
>
> On 12/6/22 17:06, Paolo Valente wrote:
>>
>>
>>> Il giorno 21 nov 2022, alle ore 02:01, Damien Le Moal <[email protected]> ha scritto:
>>>
>>
>> ...
>>
>>>
>>>> }
>>>>
>>>> static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
>>>> @@ -7144,6 +7159,8 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
>>>> {
>>>> struct bfq_data *bfqd;
>>>> struct elevator_queue *eq;
>>>> + unsigned int i;
>>>> + struct blk_independent_access_ranges *ia_ranges = q->disk->ia_ranges;
>>>>
>>>> eq = elevator_alloc(q, e);
>>>> if (!eq)
>>>> @@ -7187,10 +7204,31 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
>>>> bfqd->queue = q;
>>>>
>>>> /*
>>>> - * Multi-actuator support not complete yet, default to single
>>>> - * actuator for the moment.
>>>> + * If the disk supports multiple actuators, we copy the independent
>>>> + * access ranges from the request queue structure.
>>>> */
>>>> - bfqd->num_actuators = 1;
>>>> + spin_lock_irq(&q->queue_lock);
>>>> + if (ia_ranges) {
>>>> + /*
>>>> + * Check if the disk ia_ranges size exceeds the current bfq
>>>> + * actuator limit.
>>>> + */
>>>> + if (ia_ranges->nr_ia_ranges > BFQ_MAX_ACTUATORS) {
>>>> + pr_crit("nr_ia_ranges higher than act limit: iars=%d, max=%d.\n",
>>>> + ia_ranges->nr_ia_ranges, BFQ_MAX_ACTUATORS);
>>>> + pr_crit("Falling back to single actuator mode.\n");
>>>> + bfqd->num_actuators = 0;
>>>> + } else {
>>>> + bfqd->num_actuators = ia_ranges->nr_ia_ranges;
>>>> +
>>>> + for (i = 0; i < bfqd->num_actuators; i++)
>>>> + bfqd->ia_ranges[i] = ia_ranges->ia_range[i];
>>>> + }
>>>> + } else {
>>>> + bfqd->num_actuators = 0;
>>>
>>> That is very weird. The default should be 1 actuator.
>>> ia_ranges->nr_ia_ranges is 0 when the disk does not provide any range
>>> information, meaning it is a regular disk with a single actuator.
>>
>> Actually, IIUC this assignment to 0 seems to be done exactly when you
>> say that it should be done, i.e., when the disk does not provide any
>> range information (ia_ranges is NULL). Am I missing something else?
>
> No ranges reported means no extra actuators, so a single actuator an
> single LBA range for the entire device.

I'm still confused, sorry. Where will I read sector ranges from, if
no sector range information is available (ia_ranges is NULL)?

> In that case, bfq should process
> all IOs using bfqd->ia_ranges[0]. The get range function will always
> return that range. That makes the code clean and avoids different path for
> nr_ranges == 1 and nr_ranges > 1. No ?

Apart from the above point, for which maybe there is some other
source of information for getting ranges, I see the following issue.

What you propose is to save sector information and trigger the
range-checking for loop also for the above single-actuator case. Yet
txecuting (one iteration of) that loop will will always result in
getting a 0 as index. So, what's the point is saving data and
executing code on each IO, for getting a static result that we already
know we will get?

Thanks,
Paolo

>
>>
>> Once again, all other suggestions applied. I'm about to submit a V7.
>>
>> Thanks,
>> Paolo
>>
>
> --
> Damien Le Moal
> Western Digital Research

2022-12-06 09:46:27

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH V6 6/8] block, bfq: retrieve independent access ranges from request queue

On 12/6/22 17:41, Paolo Valente wrote:
>
>
>> Il giorno 6 dic 2022, alle ore 09:29, Damien Le Moal <[email protected]> ha scritto:
>>
>> On 12/6/22 17:06, Paolo Valente wrote:
>>>
>>>
>>>> Il giorno 21 nov 2022, alle ore 02:01, Damien Le Moal <[email protected]> ha scritto:
>>>>
>>>
>>> ...
>>>
>>>>
>>>>> }
>>>>>
>>>>> static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
>>>>> @@ -7144,6 +7159,8 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
>>>>> {
>>>>> struct bfq_data *bfqd;
>>>>> struct elevator_queue *eq;
>>>>> + unsigned int i;
>>>>> + struct blk_independent_access_ranges *ia_ranges = q->disk->ia_ranges;
>>>>>
>>>>> eq = elevator_alloc(q, e);
>>>>> if (!eq)
>>>>> @@ -7187,10 +7204,31 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
>>>>> bfqd->queue = q;
>>>>>
>>>>> /*
>>>>> - * Multi-actuator support not complete yet, default to single
>>>>> - * actuator for the moment.
>>>>> + * If the disk supports multiple actuators, we copy the independent
>>>>> + * access ranges from the request queue structure.
>>>>> */
>>>>> - bfqd->num_actuators = 1;
>>>>> + spin_lock_irq(&q->queue_lock);
>>>>> + if (ia_ranges) {
>>>>> + /*
>>>>> + * Check if the disk ia_ranges size exceeds the current bfq
>>>>> + * actuator limit.
>>>>> + */
>>>>> + if (ia_ranges->nr_ia_ranges > BFQ_MAX_ACTUATORS) {
>>>>> + pr_crit("nr_ia_ranges higher than act limit: iars=%d, max=%d.\n",
>>>>> + ia_ranges->nr_ia_ranges, BFQ_MAX_ACTUATORS);
>>>>> + pr_crit("Falling back to single actuator mode.\n");
>>>>> + bfqd->num_actuators = 0;
>>>>> + } else {
>>>>> + bfqd->num_actuators = ia_ranges->nr_ia_ranges;
>>>>> +
>>>>> + for (i = 0; i < bfqd->num_actuators; i++)
>>>>> + bfqd->ia_ranges[i] = ia_ranges->ia_range[i];
>>>>> + }
>>>>> + } else {
>>>>> + bfqd->num_actuators = 0;
>>>>
>>>> That is very weird. The default should be 1 actuator.
>>>> ia_ranges->nr_ia_ranges is 0 when the disk does not provide any range
>>>> information, meaning it is a regular disk with a single actuator.
>>>
>>> Actually, IIUC this assignment to 0 seems to be done exactly when you
>>> say that it should be done, i.e., when the disk does not provide any
>>> range information (ia_ranges is NULL). Am I missing something else?
>>
>> No ranges reported means no extra actuators, so a single actuator an
>> single LBA range for the entire device.
>
> I'm still confused, sorry. Where will I read sector ranges from, if
> no sector range information is available (ia_ranges is NULL)?

start = 0 and nr_sectors = bdev_nr_sectors(bdev).
No ia_ranges to read.

>
>> In that case, bfq should process
>> all IOs using bfqd->ia_ranges[0]. The get range function will always
>> return that range. That makes the code clean and avoids different path for
>> nr_ranges == 1 and nr_ranges > 1. No ?
>
> Apart from the above point, for which maybe there is some other
> source of information for getting ranges, I see the following issue.
>
> What you propose is to save sector information and trigger the
> range-checking for loop also for the above single-actuator case. Yet
> txecuting (one iteration of) that loop will will always result in
> getting a 0 as index. So, what's the point is saving data and
> executing code on each IO, for getting a static result that we already
> know we will get?

Surely, you can add an "if (bfqd->num_actuators ==1)" optimization in
strategic places to optimize for regular devices with a single actuator,
which bfqd->num_actuators == 1 *exactly* describes. Having
"bfqd->num_actuators = 0" makes no sense to me.

But if you feel strongly about this, feel free to ignore this.

>
> Thanks,
> Paolo
>
>>
>>>
>>> Once again, all other suggestions applied. I'm about to submit a V7.
>>>
>>> Thanks,
>>> Paolo
>>>
>>
>> --
>> Damien Le Moal
>> Western Digital Research
>

--
Damien Le Moal
Western Digital Research

2022-12-06 16:03:02

by Paolo Valente

[permalink] [raw]
Subject: Re: [PATCH V6 6/8] block, bfq: retrieve independent access ranges from request queue



> Il giorno 6 dic 2022, alle ore 10:02, Damien Le Moal <[email protected]> ha scritto:
>
> On 12/6/22 17:41, Paolo Valente wrote:
>>
>>
>>> Il giorno 6 dic 2022, alle ore 09:29, Damien Le Moal <[email protected]> ha scritto:
>>>
>>> On 12/6/22 17:06, Paolo Valente wrote:
>>>>
>>>>
>>>>> Il giorno 21 nov 2022, alle ore 02:01, Damien Le Moal <[email protected]> ha scritto:
>>>>>
>>>>
>>>> ...
>>>>
>>>>>
>>>>>> }
>>>>>>
>>>>>> static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
>>>>>> @@ -7144,6 +7159,8 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
>>>>>> {
>>>>>> struct bfq_data *bfqd;
>>>>>> struct elevator_queue *eq;
>>>>>> + unsigned int i;
>>>>>> + struct blk_independent_access_ranges *ia_ranges = q->disk->ia_ranges;
>>>>>>
>>>>>> eq = elevator_alloc(q, e);
>>>>>> if (!eq)
>>>>>> @@ -7187,10 +7204,31 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
>>>>>> bfqd->queue = q;
>>>>>>
>>>>>> /*
>>>>>> - * Multi-actuator support not complete yet, default to single
>>>>>> - * actuator for the moment.
>>>>>> + * If the disk supports multiple actuators, we copy the independent
>>>>>> + * access ranges from the request queue structure.
>>>>>> */
>>>>>> - bfqd->num_actuators = 1;
>>>>>> + spin_lock_irq(&q->queue_lock);
>>>>>> + if (ia_ranges) {
>>>>>> + /*
>>>>>> + * Check if the disk ia_ranges size exceeds the current bfq
>>>>>> + * actuator limit.
>>>>>> + */
>>>>>> + if (ia_ranges->nr_ia_ranges > BFQ_MAX_ACTUATORS) {
>>>>>> + pr_crit("nr_ia_ranges higher than act limit: iars=%d, max=%d.\n",
>>>>>> + ia_ranges->nr_ia_ranges, BFQ_MAX_ACTUATORS);
>>>>>> + pr_crit("Falling back to single actuator mode.\n");
>>>>>> + bfqd->num_actuators = 0;
>>>>>> + } else {
>>>>>> + bfqd->num_actuators = ia_ranges->nr_ia_ranges;
>>>>>> +
>>>>>> + for (i = 0; i < bfqd->num_actuators; i++)
>>>>>> + bfqd->ia_ranges[i] = ia_ranges->ia_range[i];
>>>>>> + }
>>>>>> + } else {
>>>>>> + bfqd->num_actuators = 0;
>>>>>
>>>>> That is very weird. The default should be 1 actuator.
>>>>> ia_ranges->nr_ia_ranges is 0 when the disk does not provide any range
>>>>> information, meaning it is a regular disk with a single actuator.
>>>>
>>>> Actually, IIUC this assignment to 0 seems to be done exactly when you
>>>> say that it should be done, i.e., when the disk does not provide any
>>>> range information (ia_ranges is NULL). Am I missing something else?
>>>
>>> No ranges reported means no extra actuators, so a single actuator an
>>> single LBA range for the entire device.
>>
>> I'm still confused, sorry. Where will I read sector ranges from, if
>> no sector range information is available (ia_ranges is NULL)?
>
> start = 0 and nr_sectors = bdev_nr_sectors(bdev).
> No ia_ranges to read.
>

ok, thanks

>>
>>> In that case, bfq should process
>>> all IOs using bfqd->ia_ranges[0]. The get range function will always
>>> return that range. That makes the code clean and avoids different path for
>>> nr_ranges == 1 and nr_ranges > 1. No ?
>>
>> Apart from the above point, for which maybe there is some other
>> source of information for getting ranges, I see the following issue.
>>
>> What you propose is to save sector information and trigger the
>> range-checking for loop also for the above single-actuator case. Yet
>> txecuting (one iteration of) that loop will will always result in
>> getting a 0 as index. So, what's the point is saving data and
>> executing code on each IO, for getting a static result that we already
>> know we will get?
>
> Surely, you can add an "if (bfqd->num_actuators ==1)" optimization in
> strategic places to optimize for regular devices with a single actuator,
> which bfqd->num_actuators == 1 *exactly* describes. Having
> "bfqd->num_actuators = 0" makes no sense to me.
>

Ok, I see your point at last, sorry. I'll check the code, but I think
that there is no problem in moving from 0 to 1 actuators for the case
ia_ranges == NULL. I meant to separate the case "single actuator with
ia_ranges available" (num_actuators = 1), from the case "no ia_ranges
available" (num_actuators = 0). But evidently things don't work as I
thought, and using the same value (1) is ok.

Just, let me avoid setting the fields bfqd->sector and
bfqd->nr_sectors for a case where we don't use them.

Thanks,
Paolo

> But if you feel strongly about this, feel free to ignore this.
>
>>
>> Thanks,
>> Paolo
>>
>>>
>>>>
>>>> Once again, all other suggestions applied. I'm about to submit a V7.
>>>>
>>>> Thanks,
>>>> Paolo
>>>>
>>>
>>> --
>>> Damien Le Moal
>>> Western Digital Research
>>
>
> --
> Damien Le Moal
> Western Digital Research

2022-12-06 23:40:32

by Damien Le Moal

[permalink] [raw]
Subject: Re: [PATCH V6 6/8] block, bfq: retrieve independent access ranges from request queue

On 12/7/22 00:43, Paolo Valente wrote:
>>>> In that case, bfq should process
>>>> all IOs using bfqd->ia_ranges[0]. The get range function will always
>>>> return that range. That makes the code clean and avoids different path for
>>>> nr_ranges == 1 and nr_ranges > 1. No ?
>>>
>>> Apart from the above point, for which maybe there is some other
>>> source of information for getting ranges, I see the following issue.
>>>
>>> What you propose is to save sector information and trigger the
>>> range-checking for loop also for the above single-actuator case. Yet
>>> txecuting (one iteration of) that loop will will always result in
>>> getting a 0 as index. So, what's the point is saving data and
>>> executing code on each IO, for getting a static result that we already
>>> know we will get?
>>
>> Surely, you can add an "if (bfqd->num_actuators ==1)" optimization in
>> strategic places to optimize for regular devices with a single actuator,
>> which bfqd->num_actuators == 1 *exactly* describes. Having
>> "bfqd->num_actuators = 0" makes no sense to me.
>>
>
> Ok, I see your point at last, sorry. I'll check the code, but I think
> that there is no problem in moving from 0 to 1 actuators for the case
> ia_ranges == NULL. I meant to separate the case "single actuator with
> ia_ranges available" (num_actuators = 1), from the case "no ia_ranges
> available" (num_actuators = 0). But evidently things don't work as I
> thought, and using the same value (1) is ok.

Any HDD always has at least 1 actuator. Per SCSI & ATA specs, ia_range
will be present only and only if the device has *more than one actuator*.
So the case "no ia_ranges available" means "num_actuator = 1" and the
implied access range is the entire device capacity.

> Just, let me avoid setting the fields bfqd->sector and
> bfqd->nr_sectors for a case where we don't use them.

Sure. But if you do not use them thanks to "if (num_actuators == 1)"
optimizations, it would still not hurt to set these fields. That actually
could be helpful for debugging.

Overall, I think that the code should not differ much for the
num_actuators == 1 case. Any actuator search loop would simply hit the
first and only entry on the first iteration, so should not add any nasty
overhead. Such single code path would likely greatly simplify the code.


--
Damien Le Moal
Western Digital Research

2022-12-08 11:13:47

by Paolo Valente

[permalink] [raw]
Subject: Re: [PATCH V6 6/8] block, bfq: retrieve independent access ranges from request queue



> Il giorno 7 dic 2022, alle ore 00:34, Damien Le Moal <[email protected]> ha scritto:
>
>

[...]

>> Just, let me avoid setting the fields bfqd->sector and
>> bfqd->nr_sectors for a case where we don't use them.
>
> Sure. But if you do not use them thanks to "if (num_actuators == 1)"
> optimizations, it would still not hurt to set these fields. That actually
> could be helpful for debugging.
>

Got it. I'm about to send a V9 that applies this last suggestion of yours.

Thanks,
Paolo