by kernel test robot

[permalink] [raw]

Subject: Re: [PATCH V3 7/8] block, bfq: inject I/O to underutilized actuators

Hi Paolo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on linus/master v6.0 next-20220930]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Paolo-Valente/block-bfq-extend-bfq-to-support-multi-actuator-drives/20221004-174503
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: i386-randconfig-a004-20221003
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/badc4a914c2ee8880ec1f5b498588235c77e05f9
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Paolo-Valente/block-bfq-extend-bfq-to-support-multi-actuator-drives/20221004-174503
git checkout badc4a914c2ee8880ec1f5b498588235c77e05f9
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash block/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

All warnings (new ones prefixed by >>):

>> block/bfq-iosched.c:4781:19: warning: no previous prototype for function 'bfq_find_active_bfqq_for_actuator' [-Wmissing-prototypes]
struct bfq_queue *bfq_find_active_bfqq_for_actuator(struct bfq_data *bfqd,
^
block/bfq-iosched.c:4781:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
struct bfq_queue *bfq_find_active_bfqq_for_actuator(struct bfq_data *bfqd,
^
static
>> block/bfq-iosched.c:4808:19: warning: no previous prototype for function 'bfq_find_bfqq_for_underused_actuator' [-Wmissing-prototypes]
struct bfq_queue *bfq_find_bfqq_for_underused_actuator(struct bfq_data *bfqd)
^
block/bfq-iosched.c:4808:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
struct bfq_queue *bfq_find_bfqq_for_underused_actuator(struct bfq_data *bfqd)
^
static
2 warnings generated.

vim +/bfq_find_active_bfqq_for_actuator +4781 block/bfq-iosched.c

4780
> 4781 struct bfq_queue *bfq_find_active_bfqq_for_actuator(struct bfq_data *bfqd,
4782 int idx)
4783 {
4784 struct bfq_queue *bfqq = NULL;
4785
4786 if (bfqd->in_service_queue &&
4787 bfqd->in_service_queue->actuator_idx == idx)
4788 return bfqd->in_service_queue;
4789
4790 list_for_each_entry(bfqq, &bfqd->active_list[idx], bfqq_list) {
4791 if (!RB_EMPTY_ROOT(&bfqq->sort_list) &&
4792 bfq_serv_to_charge(bfqq->next_rq, bfqq) <=
4793 bfq_bfqq_budget_left(bfqq)) {
4794 return bfqq;
4795 }
4796 }
4797
4798 return NULL;
4799 }
4800
4801 /*
4802 * Perform a linear scan of each actuator, until an actuator is found
4803 * for which the following two conditions hold: the load of the
4804 * actuator is below the threshold (see comments on actuator_load_threshold
4805 * for details), and there is a queue that contains I/O for that
4806 * actuator. On success, return that queue.
4807 */
> 4808 struct bfq_queue *bfq_find_bfqq_for_underused_actuator(struct bfq_data *bfqd)
4809 {
4810 int i;
4811
4812 for (i = 0 ; i < bfqd->num_ia_ranges; i++)
4813 if (bfqd->rq_in_driver[i] < bfqd->actuator_load_threshold) {
4814 struct bfq_queue *bfqq =
4815 bfq_find_active_bfqq_for_actuator(bfqd, i);
4816
4817 if (bfqq)
4818 return bfqq;
4819 }
4820
4821 return NULL;
4822 }
4823

--
0-DAY CI Kernel Test Service
https://01.org/lkp

Attachments:

(No filename) (4.16 kB)
config (142.48 kB)
Download all attachments

2022-10-04 19:13:53

Hi Paolo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on linus/master v6.0 next-20221005]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Paolo-Valente/block-bfq-extend-bfq-to-support-multi-actuator-drives/20221004-174503
base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
config: i386-randconfig-m021-20221003
compiler: gcc-11 (Debian 11.3.0-5) 11.3.0

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>

New smatch warnings:
block/bfq-iosched.c:7009 bfq_put_async_queues() warn: inconsistent indenting

Old smatch warnings:
block/bfq-iosched.c:6133 __bfq_insert_request() warn: variable dereferenced before check 'bfqq' (see line 6129)

vim +7009 block/bfq-iosched.c

6993
6994 /*
6995 * Release all the bfqg references to its async queues. If we are
6996 * deallocating the group these queues may still contain requests, so
6997 * we reparent them to the root cgroup (i.e., the only one that will
6998 * exist for sure until all the requests on a device are gone).
6999 */
7000 void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
7001 {
7002 int i, j, k;
7003
7004 for (k = 0; k < BFQ_NUM_ACTUATORS; k++) {
7005 for (i = 0; i < 2; i++)
7006 for (j = 0; j < IOPRIO_NR_LEVELS; j++)
7007 __bfq_put_async_bfqq(bfqd, &bfqg->async_bfqq[i][j][k]);
7008
> 7009 __bfq_put_async_bfqq(bfqd, &bfqg->async_idle_bfqq[k]);
7010 }
7011 }
7012

--
0-DAY CI Kernel Test Service
https://01.org/lkp

Attachments:

(No filename) (1.87 kB)
config (184.33 kB)
Download all attachments

2022-10-11 16:13:25

by Paolo Valente

[permalink] [raw]

Subject: Re: [PATCH V3 1/8] block, bfq: split sync bfq_queues on a per-actuator basis

> Il giorno 5 ott 2022, alle ore 01:04, Damien Le Moal <[email protected]> ha scritto:
>
> On 10/4/22 18:40, Paolo Valente wrote:
>> Multi-actuator drives appear as a single device to the I/O subsystem [1].
>
> Not necessarilly. Multi-lun scsi model will show up as multiple drives
> with one actuator each.
>

Right. In that case, IIUC each LUN appears as a separate device (in
/dev/), and allows one of the actuators to be controlled separately.
So each such device has only one range in the air data structure, and
the extension in these patches is simply inactive.

That said, I'll try to address all your suggestions for this and the
other patches, and send a new version of this series.

Thank you,
Paolo

>> Yet they address commands to different actuators internally, as a
>> function of Logical Block Addressing (LBAs). A given sector is
>> reachable by only one of the actuators. For example, Seagate’s Serial
>> Advanced Technology Attachment (SATA) version contains two actuators
>> and maps the lower half of the SATA LBA space to the lower actuator
>> and the upper half to the upper actuator.
>>
>> Evidently, to fully utilize actuators, no actuator must be left idle
>> or underutilized while there is pending I/O for it. The block layer
>> must somehow control the load of each actuator individually. This
>> commit lays the ground for allowing BFQ to provide such a per-actuator
>> control.
>>
>> BFQ associates an I/O-request sync bfq_queue with each process doing
>> synchronous I/O, or with a group of processes, in case of queue
>> merging. Then BFQ serves one bfq_queue at a time. While in service, a
>> bfq_queue is emptied in request-position order. Yet the same process,
>> or group of processes, may generate I/O for different actuators. In
>> this case, different streams of I/O (each for a different actuator)
>> get all inserted into the same sync bfq_queue. So there is basically
>> no individual control on when each stream is served, i.e., on when the
>> I/O requests of the stream are picked from the bfq_queue and
>> dispatched to the drive.
>>
>> This commit enables BFQ to control the service of each actuator
>> individually for synchronous I/O, by simply splitting each sync
>> bfq_queue into N queues, one for each actuator. In other words, a sync
>> bfq_queue is now associated to a pair (process, actuator). As a
>> consequence of this split, the per-queue proportional-share policy
>> implemented by BFQ will guarantee that the sync I/O generated for each
>> actuator, by each process, receives its fair share of service.
>>
>> This is just a preparatory patch. If the I/O of the same process
>> happens to be sent to different queues, then each of these queues may
>> undergo queue merging. To handle this event, the bfq_io_cq data
>> structure must be properly extended. In addition, stable merging must
>> be disabled to avoid loss of control on individual actuators. Finally,
>> also async queues must be split. These issues are described in detail
>> and addressed in next commits. As for this commit, although multiple
>> per-process bfq_queues are provided, the I/O of each process or group
>> of processes is still sent to only one queue, regardless of the
>> actuator the I/O is for. The forwarding to distinct bfq_queues will be
>> enabled after addressing the above issues.
>>
>> [1] https://www.linaro.org/blog/budget-fair-queueing-bfq-linux-io-scheduler-optimizations-for-multi-actuator-sata-hard-drives/
>>
>> Signed-off-by: Gabriele Felici <[email protected]>
>> Signed-off-by: Paolo Valente <[email protected]>
>> ---
>> block/bfq-cgroup.c | 95 +++++++++++++++++--------------
>> block/bfq-iosched.c | 135 +++++++++++++++++++++++++++-----------------
>> block/bfq-iosched.h | 38 +++++++++----
>> 3 files changed, 164 insertions(+), 104 deletions(-)
>>
>> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
>> index 30b15a9a47c4..a745dd9d658e 100644
>> --- a/block/bfq-cgroup.c
>> +++ b/block/bfq-cgroup.c
>> @@ -705,6 +705,48 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>> bfq_put_queue(bfqq);
>> }
>>
>> +static void bfq_sync_bfqq_move(struct bfq_data *bfqd,
>> + struct bfq_queue *sync_bfqq,
>> + struct bfq_io_cq *bic,
>> + struct bfq_group *bfqg,
>> + unsigned int act_idx)
>> +{
>> + if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) {
>> + /* We are the only user of this bfqq, just move it */
>> + if (sync_bfqq->entity.sched_data != &bfqg->sched_data)
>> + bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
>> + } else {
>> + struct bfq_queue *bfqq;
>> +
>> + /*
>> + * The queue was merged to a different queue. Check
>> + * that the merge chain still belongs to the same
>> + * cgroup.
>> + */
>> + for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq)
>> + if (bfqq->entity.sched_data !=
>> + &bfqg->sched_data)
>> + break;
>> + if (bfqq) {
>> + /*
>> + * Some queue changed cgroup so the merge is
>> + * not valid anymore. We cannot easily just
>> + * cancel the merge (by clearing new_bfqq) as
>> + * there may be other processes using this
>> + * queue and holding refs to all queues below
>> + * sync_bfqq->new_bfqq. Similarly if the merge
>> + * already happened, we need to detach from
>> + * bfqq now so that we cannot merge bio to a
>> + * request from the old cgroup.
>> + */
>> + bfq_put_cooperator(sync_bfqq);
>> + bfq_release_process_ref(bfqd, sync_bfqq);
>> + bic_set_bfqq(bic, NULL, 1, act_idx);
>> + }
>> + }
>> +}
>> +
>> +
>> /**
>> * __bfq_bic_change_cgroup - move @bic to @bfqg.
>> * @bfqd: the queue descriptor.
>> @@ -719,53 +761,24 @@ static void *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
>> struct bfq_io_cq *bic,
>> struct bfq_group *bfqg)
>> {
>> - struct bfq_queue *async_bfqq = bic_to_bfqq(bic, 0);
>> - struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, 1);
>> struct bfq_entity *entity;
>> + unsigned int act_idx;
>>
>> - if (async_bfqq) {
>> - entity = &async_bfqq->entity;
>> -
>> - if (entity->sched_data != &bfqg->sched_data) {
>> - bic_set_bfqq(bic, NULL, 0);
>> - bfq_release_process_ref(bfqd, async_bfqq);
>> - }
>> - }
>> + for (act_idx = 0; act_idx < BFQ_NUM_ACTUATORS; act_idx++) {
>
> Why loop over all BFQ_NUM_ACTUATORS actuators even though this patch
> itself is not enough to support multiple actuators ?
> You then have patch 5 changing this macro to BFQ_MAX_ACTUATORS and then
> patch 6 finally introducing a nr_ia_range bfq field to indicate the
> effective number of actuators.
>
> Why not:
> 1) introduce BFQ_MAX_ACTUATORS in this patch and define the bfqq field
> using it
> 2) introduce a nr_actuators field defaultint to 1 for now and use that as
> the upper bound for actuator earch loop
>
> That would be 100% consistent with the current code (no change in
> practice) and avoid all the code churn you have in the following patches.
>
>> + struct bfq_queue *async_bfqq = bic_to_bfqq(bic, 0, act_idx);
>> + struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, 1, act_idx);
>>
>> - if (sync_bfqq) {
>> - if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) {
>> - /* We are the only user of this bfqq, just move it */
>> - if (sync_bfqq->entity.sched_data != &bfqg->sched_data)
>> - bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
>> - } else {
>> - struct bfq_queue *bfqq;
>> + if (async_bfqq) {
>> + entity = &async_bfqq->entity;
>>
>> - /*
>> - * The queue was merged to a different queue. Check
>> - * that the merge chain still belongs to the same
>> - * cgroup.
>> - */
>> - for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq)
>> - if (bfqq->entity.sched_data !=
>> - &bfqg->sched_data)
>> - break;
>> - if (bfqq) {
>> - /*
>> - * Some queue changed cgroup so the merge is
>> - * not valid anymore. We cannot easily just
>> - * cancel the merge (by clearing new_bfqq) as
>> - * there may be other processes using this
>> - * queue and holding refs to all queues below
>> - * sync_bfqq->new_bfqq. Similarly if the merge
>> - * already happened, we need to detach from
>> - * bfqq now so that we cannot merge bio to a
>> - * request from the old cgroup.
>> - */
>> - bfq_put_cooperator(sync_bfqq);
>> - bfq_release_process_ref(bfqd, sync_bfqq);
>> - bic_set_bfqq(bic, NULL, 1);
>> + if (entity->sched_data != &bfqg->sched_data) {
>> + bic_set_bfqq(bic, NULL, 0, act_idx);
>> + bfq_release_process_ref(bfqd, async_bfqq);
>> }
>> }
>> +
>> + if (sync_bfqq)
>> + bfq_sync_bfqq_move(bfqd, sync_bfqq, bic, bfqg, act_idx);
>> }
>>
>> return bfqg;
>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>> index c740b41fe0a4..c2485b599d87 100644
>> --- a/block/bfq-iosched.c
>> +++ b/block/bfq-iosched.c
>> @@ -377,14 +377,19 @@ static const unsigned long bfq_late_stable_merging = 600;
>> #define RQ_BIC(rq) ((struct bfq_io_cq *)((rq)->elv.priv[0]))
>> #define RQ_BFQQ(rq) ((rq)->elv.priv[1])
>>
>> -struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync)
>> +struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic,
>> + bool is_sync,
>> + unsigned int actuator_idx)
>> {
>> - return bic->bfqq[is_sync];
>> + return bic->bfqq[is_sync][actuator_idx];
>> }
>>
>> static void bfq_put_stable_ref(struct bfq_queue *bfqq);
>>
>> -void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync)
>> +void bic_set_bfqq(struct bfq_io_cq *bic,
>> + struct bfq_queue *bfqq,
>> + bool is_sync,
>> + unsigned int actuator_idx)
>> {
>> /*
>> * If bfqq != NULL, then a non-stable queue merge between
>> @@ -399,7 +404,7 @@ void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync)
>> * we cancel the stable merge if
>> * bic->stable_merge_bfqq == bfqq.
>> */
>> - bic->bfqq[is_sync] = bfqq;
>> + bic->bfqq[is_sync][actuator_idx] = bfqq;
>>
>> if (bfqq && bic->stable_merge_bfqq == bfqq) {
>> /*
>> @@ -672,9 +677,9 @@ static void bfq_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
>> {
>> struct bfq_data *bfqd = data->q->elevator->elevator_data;
>> struct bfq_io_cq *bic = bfq_bic_lookup(data->q);
>> - struct bfq_queue *bfqq = bic ? bic_to_bfqq(bic, op_is_sync(opf)) : NULL;
>> int depth;
>> unsigned limit = data->q->nr_requests;
>> + unsigned int act_idx;
>>
>> /* Sync reads have full depth available */
>> if (op_is_sync(opf) && !op_is_write(opf)) {
>> @@ -684,14 +689,21 @@ static void bfq_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
>> limit = (limit * depth) >> bfqd->full_depth_shift;
>> }
>>
>> - /*
>> - * Does queue (or any parent entity) exceed number of requests that
>> - * should be available to it? Heavily limit depth so that it cannot
>> - * consume more available requests and thus starve other entities.
>> - */
>> - if (bfqq && bfqq_request_over_limit(bfqq, limit))
>> - depth = 1;
>> + for (act_idx = 0; act_idx < BFQ_NUM_ACTUATORS; act_idx++) {
>> + struct bfq_queue *bfqq =
>> + bic ? bic_to_bfqq(bic, op_is_sync(opf), act_idx) : NULL;
>>
>> + /*
>> + * Does queue (or any parent entity) exceed number of
>> + * requests that should be available to it? Heavily
>> + * limit depth so that it cannot consume more
>> + * available requests and thus starve other entities.
>> + */
>> + if (bfqq && bfqq_request_over_limit(bfqq, limit)) {
>> + depth = 1;
>> + break;
>> + }
>> + }
>> bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u",
>> __func__, bfqd->wr_busy_queues, op_is_sync(opf), depth);
>> if (depth)
>> @@ -2142,7 +2154,7 @@ static void bfq_check_waker(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>> * We reset waker detection logic also if too much time has passed
>> * since the first detection. If wakeups are rare, pointless idling
>> * doesn't hurt throughput that much. The condition below makes sure
>> - * we do not uselessly idle blocking waker in more than 1/64 cases.
>> + * we do not uselessly idle blocking waker in more than 1/64 cases.
>> */
>> if (bfqd->last_completed_rq_bfqq !=
>> bfqq->tentative_waker_bfqq ||
>> @@ -2454,6 +2466,16 @@ static void bfq_remove_request(struct request_queue *q,
>>
>> }
>>
>> +/* get the index of the actuator that will serve bio */
>> +static unsigned int bfq_actuator_index(struct bfq_data *bfqd, struct bio *bio)
>> +{
>> + /*
>> + * Multi-actuator support not complete yet, so always return 0
>> + * for the moment.
>> + */
>> + return 0;
>> +}
>> +
>> static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
>> unsigned int nr_segs)
>> {
>> @@ -2478,7 +2500,8 @@ static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
>> */
>> bfq_bic_update_cgroup(bic, bio);
>>
>> - bfqd->bio_bfqq = bic_to_bfqq(bic, op_is_sync(bio->bi_opf));
>> + bfqd->bio_bfqq = bic_to_bfqq(bic, op_is_sync(bio->bi_opf),
>> + bfq_actuator_index(bfqd, bio));
>> } else {
>> bfqd->bio_bfqq = NULL;
>> }
>> @@ -3174,7 +3197,7 @@ bfq_merge_bfqqs(struct bfq_data *bfqd, struct bfq_io_cq *bic,
>> /*
>> * Merge queues (that is, let bic redirect its requests to new_bfqq)
>> */
>> - bic_set_bfqq(bic, new_bfqq, 1);
>> + bic_set_bfqq(bic, new_bfqq, 1, bfqq->actuator_idx);
>> bfq_mark_bfqq_coop(new_bfqq);
>> /*
>> * new_bfqq now belongs to at least two bics (it is a shared queue):
>> @@ -4808,11 +4831,12 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
>> */
>> if (bfq_bfqq_wait_request(bfqq) ||
>> (bfqq->dispatched != 0 && bfq_better_to_idle(bfqq))) {
>> + unsigned int act_idx = bfqq->actuator_idx;
>> struct bfq_queue *async_bfqq =
>> - bfqq->bic && bfqq->bic->bfqq[0] &&
>> - bfq_bfqq_busy(bfqq->bic->bfqq[0]) &&
>> - bfqq->bic->bfqq[0]->next_rq ?
>> - bfqq->bic->bfqq[0] : NULL;
>> + bfqq->bic && bfqq->bic->bfqq[0][act_idx] &&
>> + bfq_bfqq_busy(bfqq->bic->bfqq[0][act_idx]) &&
>> + bfqq->bic->bfqq[0][act_idx]->next_rq ?
>> + bfqq->bic->bfqq[0][act_idx] : NULL;
>> struct bfq_queue *blocked_bfqq =
>> !hlist_empty(&bfqq->woken_list) ?
>> container_of(bfqq->woken_list.first,
>> @@ -4904,7 +4928,7 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
>> icq_to_bic(async_bfqq->next_rq->elv.icq) == bfqq->bic &&
>> bfq_serv_to_charge(async_bfqq->next_rq, async_bfqq) <=
>> bfq_bfqq_budget_left(async_bfqq))
>> - bfqq = bfqq->bic->bfqq[0];
>> + bfqq = bfqq->bic->bfqq[0][act_idx];
>> else if (bfqq->waker_bfqq &&
>> bfq_bfqq_busy(bfqq->waker_bfqq) &&
>> bfqq->waker_bfqq->next_rq &&
>> @@ -5367,49 +5391,47 @@ static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
>> bfq_release_process_ref(bfqd, bfqq);
>> }
>>
>> -static void bfq_exit_icq_bfqq(struct bfq_io_cq *bic, bool is_sync)
>> +static void bfq_exit_icq_bfqq(struct bfq_io_cq *bic,
>> + bool is_sync,
>> + unsigned int actuator_idx)
>> {
>> - struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync);
>> + struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync, actuator_idx);
>> struct bfq_data *bfqd;
>>
>> if (bfqq)
>> bfqd = bfqq->bfqd; /* NULL if scheduler already exited */
>>
>> if (bfqq && bfqd) {
>> - unsigned long flags;
>> -
>> - spin_lock_irqsave(&bfqd->lock, flags);
>> bfqq->bic = NULL;
>> bfq_exit_bfqq(bfqd, bfqq);
>> - bic_set_bfqq(bic, NULL, is_sync);
>> - spin_unlock_irqrestore(&bfqd->lock, flags);
>> + bic_set_bfqq(bic, NULL, is_sync, actuator_idx);
>> }
>> }
>>
>> static void bfq_exit_icq(struct io_cq *icq)
>> {
>> struct bfq_io_cq *bic = icq_to_bic(icq);
>> + struct bfq_data *bfqd = bic_to_bfqd(bic);
>> + unsigned long flags;
>> + unsigned int act_idx;
>>
>> - if (bic->stable_merge_bfqq) {
>> - struct bfq_data *bfqd = bic->stable_merge_bfqq->bfqd;
>> -
>> - /*
>> - * bfqd is NULL if scheduler already exited, and in
>> - * that case this is the last time bfqq is accessed.
>> - */
>> - if (bfqd) {
>> - unsigned long flags;
>> + /*
>> + * bfqd is NULL if scheduler already exited, and in that case
>> + * this is the last time these queues are accessed.
>> + */
>> + if (bfqd)
>> + spin_lock_irqsave(&bfqd->lock, flags);
>>
>> - spin_lock_irqsave(&bfqd->lock, flags);
>> - bfq_put_stable_ref(bic->stable_merge_bfqq);
>> - spin_unlock_irqrestore(&bfqd->lock, flags);
>> - } else {
>> + for (act_idx = 0; act_idx < BFQ_NUM_ACTUATORS; act_idx++) {
>> + if (bic->stable_merge_bfqq)
>> bfq_put_stable_ref(bic->stable_merge_bfqq);
>> - }
>> +
>> + bfq_exit_icq_bfqq(bic, true, act_idx);
>> + bfq_exit_icq_bfqq(bic, false, act_idx);
>> }
>>
>> - bfq_exit_icq_bfqq(bic, true);
>> - bfq_exit_icq_bfqq(bic, false);
>> + if (bfqd)
>> + spin_unlock_irqrestore(&bfqd->lock, flags);
>> }
>>
>> /*
>> @@ -5486,23 +5508,25 @@ static void bfq_check_ioprio_change(struct bfq_io_cq *bic, struct bio *bio)
>>
>> bic->ioprio = ioprio;
>>
>> - bfqq = bic_to_bfqq(bic, false);
>> + bfqq = bic_to_bfqq(bic, false, bfq_actuator_index(bfqd, bio));
>> if (bfqq) {
>> bfq_release_process_ref(bfqd, bfqq);
>> bfqq = bfq_get_queue(bfqd, bio, false, bic, true);
>> - bic_set_bfqq(bic, bfqq, false);
>> + bic_set_bfqq(bic, bfqq, false, bfq_actuator_index(bfqd, bio));
>> }
>>
>> - bfqq = bic_to_bfqq(bic, true);
>> + bfqq = bic_to_bfqq(bic, true, bfq_actuator_index(bfqd, bio));
>> if (bfqq)
>> bfq_set_next_ioprio_data(bfqq, bic);
>> }
>>
>> static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>> - struct bfq_io_cq *bic, pid_t pid, int is_sync)
>> + struct bfq_io_cq *bic, pid_t pid, int is_sync,
>> + unsigned int act_idx)
>> {
>> u64 now_ns = ktime_get_ns();
>>
>> + bfqq->actuator_idx = act_idx;
>> RB_CLEAR_NODE(&bfqq->entity.rb_node);
>> INIT_LIST_HEAD(&bfqq->fifo);
>> INIT_HLIST_NODE(&bfqq->burst_list_node);
>> @@ -5741,6 +5765,7 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
>> struct bfq_group *bfqg;
>>
>> bfqg = bfq_bio_bfqg(bfqd, bio);
>> +
>> if (!is_sync) {
>> async_bfqq = bfq_async_queue_prio(bfqd, bfqg, ioprio_class,
>> ioprio);
>> @@ -5755,7 +5780,7 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
>>
>> if (bfqq) {
>> bfq_init_bfqq(bfqd, bfqq, bic, current->pid,
>> - is_sync);
>> + is_sync, bfq_actuator_index(bfqd, bio));
>> bfq_init_entity(&bfqq->entity, bfqg);
>> bfq_log_bfqq(bfqd, bfqq, "allocated");
>> } else {
>> @@ -6070,7 +6095,8 @@ static bool __bfq_insert_request(struct bfq_data *bfqd, struct request *rq)
>> * then complete the merge and redirect it to
>> * new_bfqq.
>> */
>> - if (bic_to_bfqq(RQ_BIC(rq), 1) == bfqq)
>> + if (bic_to_bfqq(RQ_BIC(rq), 1,
>> + bfq_actuator_index(bfqd, rq->bio)) == bfqq)
>> bfq_merge_bfqqs(bfqd, RQ_BIC(rq),
>> bfqq, new_bfqq);
>>
>> @@ -6624,7 +6650,7 @@ bfq_split_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq)
>> return bfqq;
>> }
>>
>> - bic_set_bfqq(bic, NULL, 1);
>> + bic_set_bfqq(bic, NULL, 1, bfqq->actuator_idx);
>>
>> bfq_put_cooperator(bfqq);
>>
>> @@ -6638,7 +6664,8 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd,
>> bool split, bool is_sync,
>> bool *new_queue)
>> {
>> - struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync);
>> + unsigned int act_idx = bfq_actuator_index(bfqd, bio);
>> + struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync, act_idx);
>>
>> if (likely(bfqq && bfqq != &bfqd->oom_bfqq))
>> return bfqq;
>> @@ -6650,7 +6677,7 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd,
>> bfq_put_queue(bfqq);
>> bfqq = bfq_get_queue(bfqd, bio, is_sync, bic, split);
>>
>> - bic_set_bfqq(bic, bfqq, is_sync);
>> + bic_set_bfqq(bic, bfqq, is_sync, act_idx);
>> if (split && is_sync) {
>> if ((bic->was_in_burst_list && bfqd->large_burst) ||
>> bic->saved_in_large_burst)
>> @@ -7092,8 +7119,10 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
>> * Our fallback bfqq if bfq_find_alloc_queue() runs into OOM issues.
>> * Grab a permanent reference to it, so that the normal code flow
>> * will not attempt to free it.
>> + * Set zero as actuator index: we will pretend that
>> + * all I/O requests are for the same actuator.
>> */
>> - bfq_init_bfqq(bfqd, &bfqd->oom_bfqq, NULL, 1, 0);
>> + bfq_init_bfqq(bfqd, &bfqd->oom_bfqq, NULL, 1, 0, 0);
>> bfqd->oom_bfqq.ref++;
>> bfqd->oom_bfqq.new_ioprio = BFQ_DEFAULT_QUEUE_IOPRIO;
>> bfqd->oom_bfqq.new_ioprio_class = IOPRIO_CLASS_BE;
>> diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
>> index ad8e513d7e87..8b5225a9e080 100644
>> --- a/block/bfq-iosched.h
>> +++ b/block/bfq-iosched.h
>> @@ -33,6 +33,8 @@
>> */
>> #define BFQ_SOFTRT_WEIGHT_FACTOR 100
>>
>> +#define BFQ_NUM_ACTUATORS 2
>> +
>> struct bfq_entity;
>>
>> /**
>> @@ -225,12 +227,14 @@ struct bfq_ttime {
>> * struct bfq_queue - leaf schedulable entity.
>> *
>> * A bfq_queue is a leaf request queue; it can be associated with an
>> - * io_context or more, if it is async or shared between cooperating
>> - * processes. @cgroup holds a reference to the cgroup, to be sure that it
>> - * does not disappear while a bfqq still references it (mostly to avoid
>> - * races between request issuing and task migration followed by cgroup
>> - * destruction).
>> - * All the fields are protected by the queue lock of the containing bfqd.
>> + * io_context or more, if it is async or shared between cooperating
>> + * processes. Besides, it contains I/O requests for only one actuator
>> + * (an io_context is associated with a different bfq_queue for each
>> + * actuator it generates I/O for). @cgroup holds a reference to the
>> + * cgroup, to be sure that it does not disappear while a bfqq still
>> + * references it (mostly to avoid races between request issuing and
>> + * task migration followed by cgroup destruction). All the fields are
>> + * protected by the queue lock of the containing bfqd.
>> */
>> struct bfq_queue {
>> /* reference counter */
>> @@ -399,6 +403,9 @@ struct bfq_queue {
>> * the woken queues when this queue exits.
>> */
>> struct hlist_head woken_list;
>> +
>> + /* index of the actuator this queue is associated with */
>> + unsigned int actuator_idx;
>> };
>>
>> /**
>> @@ -407,8 +414,17 @@ struct bfq_queue {
>> struct bfq_io_cq {
>> /* associated io_cq structure */
>> struct io_cq icq; /* must be the first member */
>> - /* array of two process queues, the sync and the async */
>> - struct bfq_queue *bfqq[2];
>> + /*
>> + * Matrix of associated process queues: first row for async
>> + * queues, second row sync queues. Each row contains one
>> + * column for each actuator. An I/O request generated by the
>> + * process is inserted into the queue pointed by bfqq[i][j] if
>> + * the request is to be served by the j-th actuator of the
>> + * drive, where i==0 or i==1, depending on whether the request
>> + * is async or sync. So there is a distinct queue for each
>> + * actuator.
>> + */
>> + struct bfq_queue *bfqq[2][BFQ_NUM_ACTUATORS];
>> /* per (request_queue, blkcg) ioprio */
>> int ioprio;
>> #ifdef CONFIG_BFQ_GROUP_IOSCHED
>> @@ -968,8 +984,10 @@ struct bfq_group {
>>
>> extern const int bfq_timeout;
>>
>> -struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync);
>> -void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync);
>> +struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync,
>> + unsigned int actuator_idx);
>> +void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync,
>> + unsigned int actuator_idx);
>> struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic);
>> void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq);
>> void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>
> --
> Damien Le Moal
> Western Digital Research

2022-10-11 23:48:20

by Damien Le Moal

[permalink] [raw]

Subject: Re: [PATCH V3 1/8] block, bfq: split sync bfq_queues on a per-actuator basis

On 10/12/22 00:45, Paolo VALENTE wrote:
>
>
>> Il giorno 5 ott 2022, alle ore 01:04, Damien Le Moal <[email protected]> ha scritto:
>>
>> On 10/4/22 18:40, Paolo Valente wrote:
>>> Multi-actuator drives appear as a single device to the I/O subsystem [1].
>>
>> Not necessarilly. Multi-lun scsi model will show up as multiple drives
>> with one actuator each.
>>
>
> Right. In that case, IIUC each LUN appears as a separate device (in
> /dev/), and allows one of the actuators to be controlled separately.
> So each such device has only one range in the air data structure, and
> the extension in these patches is simply inactive.

Yes. My comment was all about not being so general with how multi-actuator
drives are seen from the host. Your sentence should be rephrased to
something like: "Single LUN multi-actuator SCSI drives and all
multi-actuator SATA drives appear as a single device to the I/O subsystem."

No confusion possible this way :)

>
> That said, I'll try to address all your suggestions for this and the
> other patches, and send a new version of this series.
>
> Thank you,
> Paolo
>
>>> Yet they address commands to different actuators internally, as a
>>> function of Logical Block Addressing (LBAs). A given sector is
>>> reachable by only one of the actuators. For example, Seagate’s Serial
>>> Advanced Technology Attachment (SATA) version contains two actuators
>>> and maps the lower half of the SATA LBA space to the lower actuator
>>> and the upper half to the upper actuator.
>>>
>>> Evidently, to fully utilize actuators, no actuator must be left idle
>>> or underutilized while there is pending I/O for it. The block layer
>>> must somehow control the load of each actuator individually. This
>>> commit lays the ground for allowing BFQ to provide such a per-actuator
>>> control.
>>>
>>> BFQ associates an I/O-request sync bfq_queue with each process doing
>>> synchronous I/O, or with a group of processes, in case of queue
>>> merging. Then BFQ serves one bfq_queue at a time. While in service, a
>>> bfq_queue is emptied in request-position order. Yet the same process,
>>> or group of processes, may generate I/O for different actuators. In
>>> this case, different streams of I/O (each for a different actuator)
>>> get all inserted into the same sync bfq_queue. So there is basically
>>> no individual control on when each stream is served, i.e., on when the
>>> I/O requests of the stream are picked from the bfq_queue and
>>> dispatched to the drive.
>>>
>>> This commit enables BFQ to control the service of each actuator
>>> individually for synchronous I/O, by simply splitting each sync
>>> bfq_queue into N queues, one for each actuator. In other words, a sync
>>> bfq_queue is now associated to a pair (process, actuator). As a
>>> consequence of this split, the per-queue proportional-share policy
>>> implemented by BFQ will guarantee that the sync I/O generated for each
>>> actuator, by each process, receives its fair share of service.
>>>
>>> This is just a preparatory patch. If the I/O of the same process
>>> happens to be sent to different queues, then each of these queues may
>>> undergo queue merging. To handle this event, the bfq_io_cq data
>>> structure must be properly extended. In addition, stable merging must
>>> be disabled to avoid loss of control on individual actuators. Finally,
>>> also async queues must be split. These issues are described in detail
>>> and addressed in next commits. As for this commit, although multiple
>>> per-process bfq_queues are provided, the I/O of each process or group
>>> of processes is still sent to only one queue, regardless of the
>>> actuator the I/O is for. The forwarding to distinct bfq_queues will be
>>> enabled after addressing the above issues.
>>>
>>> [1] https://www.linaro.org/blog/budget-fair-queueing-bfq-linux-io-scheduler-optimizations-for-multi-actuator-sata-hard-drives/
>>>
>>> Signed-off-by: Gabriele Felici <[email protected]>
>>> Signed-off-by: Paolo Valente <[email protected]>
>>> ---
>>> block/bfq-cgroup.c | 95 +++++++++++++++++--------------
>>> block/bfq-iosched.c | 135 +++++++++++++++++++++++++++-----------------
>>> block/bfq-iosched.h | 38 +++++++++----
>>> 3 files changed, 164 insertions(+), 104 deletions(-)
>>>
>>> diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
>>> index 30b15a9a47c4..a745dd9d658e 100644
>>> --- a/block/bfq-cgroup.c
>>> +++ b/block/bfq-cgroup.c
>>> @@ -705,6 +705,48 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>>> bfq_put_queue(bfqq);
>>> }
>>>
>>> +static void bfq_sync_bfqq_move(struct bfq_data *bfqd,
>>> + struct bfq_queue *sync_bfqq,
>>> + struct bfq_io_cq *bic,
>>> + struct bfq_group *bfqg,
>>> + unsigned int act_idx)
>>> +{
>>> + if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) {
>>> + /* We are the only user of this bfqq, just move it */
>>> + if (sync_bfqq->entity.sched_data != &bfqg->sched_data)
>>> + bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
>>> + } else {
>>> + struct bfq_queue *bfqq;
>>> +
>>> + /*
>>> + * The queue was merged to a different queue. Check
>>> + * that the merge chain still belongs to the same
>>> + * cgroup.
>>> + */
>>> + for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq)
>>> + if (bfqq->entity.sched_data !=
>>> + &bfqg->sched_data)
>>> + break;
>>> + if (bfqq) {
>>> + /*
>>> + * Some queue changed cgroup so the merge is
>>> + * not valid anymore. We cannot easily just
>>> + * cancel the merge (by clearing new_bfqq) as
>>> + * there may be other processes using this
>>> + * queue and holding refs to all queues below
>>> + * sync_bfqq->new_bfqq. Similarly if the merge
>>> + * already happened, we need to detach from
>>> + * bfqq now so that we cannot merge bio to a
>>> + * request from the old cgroup.
>>> + */
>>> + bfq_put_cooperator(sync_bfqq);
>>> + bfq_release_process_ref(bfqd, sync_bfqq);
>>> + bic_set_bfqq(bic, NULL, 1, act_idx);
>>> + }
>>> + }
>>> +}
>>> +
>>> +
>>> /**
>>> * __bfq_bic_change_cgroup - move @bic to @bfqg.
>>> * @bfqd: the queue descriptor.
>>> @@ -719,53 +761,24 @@ static void *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
>>> struct bfq_io_cq *bic,
>>> struct bfq_group *bfqg)
>>> {
>>> - struct bfq_queue *async_bfqq = bic_to_bfqq(bic, 0);
>>> - struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, 1);
>>> struct bfq_entity *entity;
>>> + unsigned int act_idx;
>>>
>>> - if (async_bfqq) {
>>> - entity = &async_bfqq->entity;
>>> -
>>> - if (entity->sched_data != &bfqg->sched_data) {
>>> - bic_set_bfqq(bic, NULL, 0);
>>> - bfq_release_process_ref(bfqd, async_bfqq);
>>> - }
>>> - }
>>> + for (act_idx = 0; act_idx < BFQ_NUM_ACTUATORS; act_idx++) {
>>
>> Why loop over all BFQ_NUM_ACTUATORS actuators even though this patch
>> itself is not enough to support multiple actuators ?
>> You then have patch 5 changing this macro to BFQ_MAX_ACTUATORS and then
>> patch 6 finally introducing a nr_ia_range bfq field to indicate the
>> effective number of actuators.
>>
>> Why not:
>> 1) introduce BFQ_MAX_ACTUATORS in this patch and define the bfqq field
>> using it
>> 2) introduce a nr_actuators field defaultint to 1 for now and use that as
>> the upper bound for actuator earch loop
>>
>> That would be 100% consistent with the current code (no change in
>> practice) and avoid all the code churn you have in the following patches.
>>
>>> + struct bfq_queue *async_bfqq = bic_to_bfqq(bic, 0, act_idx);
>>> + struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, 1, act_idx);
>>>
>>> - if (sync_bfqq) {
>>> - if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) {
>>> - /* We are the only user of this bfqq, just move it */
>>> - if (sync_bfqq->entity.sched_data != &bfqg->sched_data)
>>> - bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
>>> - } else {
>>> - struct bfq_queue *bfqq;
>>> + if (async_bfqq) {
>>> + entity = &async_bfqq->entity;
>>>
>>> - /*
>>> - * The queue was merged to a different queue. Check
>>> - * that the merge chain still belongs to the same
>>> - * cgroup.
>>> - */
>>> - for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq)
>>> - if (bfqq->entity.sched_data !=
>>> - &bfqg->sched_data)
>>> - break;
>>> - if (bfqq) {
>>> - /*
>>> - * Some queue changed cgroup so the merge is
>>> - * not valid anymore. We cannot easily just
>>> - * cancel the merge (by clearing new_bfqq) as
>>> - * there may be other processes using this
>>> - * queue and holding refs to all queues below
>>> - * sync_bfqq->new_bfqq. Similarly if the merge
>>> - * already happened, we need to detach from
>>> - * bfqq now so that we cannot merge bio to a
>>> - * request from the old cgroup.
>>> - */
>>> - bfq_put_cooperator(sync_bfqq);
>>> - bfq_release_process_ref(bfqd, sync_bfqq);
>>> - bic_set_bfqq(bic, NULL, 1);
>>> + if (entity->sched_data != &bfqg->sched_data) {
>>> + bic_set_bfqq(bic, NULL, 0, act_idx);
>>> + bfq_release_process_ref(bfqd, async_bfqq);
>>> }
>>> }
>>> +
>>> + if (sync_bfqq)
>>> + bfq_sync_bfqq_move(bfqd, sync_bfqq, bic, bfqg, act_idx);
>>> }
>>>
>>> return bfqg;
>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>>> index c740b41fe0a4..c2485b599d87 100644
>>> --- a/block/bfq-iosched.c
>>> +++ b/block/bfq-iosched.c
>>> @@ -377,14 +377,19 @@ static const unsigned long bfq_late_stable_merging = 600;
>>> #define RQ_BIC(rq) ((struct bfq_io_cq *)((rq)->elv.priv[0]))
>>> #define RQ_BFQQ(rq) ((rq)->elv.priv[1])
>>>
>>> -struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync)
>>> +struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic,
>>> + bool is_sync,
>>> + unsigned int actuator_idx)
>>> {
>>> - return bic->bfqq[is_sync];
>>> + return bic->bfqq[is_sync][actuator_idx];
>>> }
>>>
>>> static void bfq_put_stable_ref(struct bfq_queue *bfqq);
>>>
>>> -void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync)
>>> +void bic_set_bfqq(struct bfq_io_cq *bic,
>>> + struct bfq_queue *bfqq,
>>> + bool is_sync,
>>> + unsigned int actuator_idx)
>>> {
>>> /*
>>> * If bfqq != NULL, then a non-stable queue merge between
>>> @@ -399,7 +404,7 @@ void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync)
>>> * we cancel the stable merge if
>>> * bic->stable_merge_bfqq == bfqq.
>>> */
>>> - bic->bfqq[is_sync] = bfqq;
>>> + bic->bfqq[is_sync][actuator_idx] = bfqq;
>>>
>>> if (bfqq && bic->stable_merge_bfqq == bfqq) {
>>> /*
>>> @@ -672,9 +677,9 @@ static void bfq_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
>>> {
>>> struct bfq_data *bfqd = data->q->elevator->elevator_data;
>>> struct bfq_io_cq *bic = bfq_bic_lookup(data->q);
>>> - struct bfq_queue *bfqq = bic ? bic_to_bfqq(bic, op_is_sync(opf)) : NULL;
>>> int depth;
>>> unsigned limit = data->q->nr_requests;
>>> + unsigned int act_idx;
>>>
>>> /* Sync reads have full depth available */
>>> if (op_is_sync(opf) && !op_is_write(opf)) {
>>> @@ -684,14 +689,21 @@ static void bfq_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
>>> limit = (limit * depth) >> bfqd->full_depth_shift;
>>> }
>>>
>>> - /*
>>> - * Does queue (or any parent entity) exceed number of requests that
>>> - * should be available to it? Heavily limit depth so that it cannot
>>> - * consume more available requests and thus starve other entities.
>>> - */
>>> - if (bfqq && bfqq_request_over_limit(bfqq, limit))
>>> - depth = 1;
>>> + for (act_idx = 0; act_idx < BFQ_NUM_ACTUATORS; act_idx++) {
>>> + struct bfq_queue *bfqq =
>>> + bic ? bic_to_bfqq(bic, op_is_sync(opf), act_idx) : NULL;
>>>
>>> + /*
>>> + * Does queue (or any parent entity) exceed number of
>>> + * requests that should be available to it? Heavily
>>> + * limit depth so that it cannot consume more
>>> + * available requests and thus starve other entities.
>>> + */
>>> + if (bfqq && bfqq_request_over_limit(bfqq, limit)) {
>>> + depth = 1;
>>> + break;
>>> + }
>>> + }
>>> bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u",
>>> __func__, bfqd->wr_busy_queues, op_is_sync(opf), depth);
>>> if (depth)
>>> @@ -2142,7 +2154,7 @@ static void bfq_check_waker(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>>> * We reset waker detection logic also if too much time has passed
>>> * since the first detection. If wakeups are rare, pointless idling
>>> * doesn't hurt throughput that much. The condition below makes sure
>>> - * we do not uselessly idle blocking waker in more than 1/64 cases.
>>> + * we do not uselessly idle blocking waker in more than 1/64 cases.
>>> */
>>> if (bfqd->last_completed_rq_bfqq !=
>>> bfqq->tentative_waker_bfqq ||
>>> @@ -2454,6 +2466,16 @@ static void bfq_remove_request(struct request_queue *q,
>>>
>>> }
>>>
>>> +/* get the index of the actuator that will serve bio */
>>> +static unsigned int bfq_actuator_index(struct bfq_data *bfqd, struct bio *bio)
>>> +{
>>> + /*
>>> + * Multi-actuator support not complete yet, so always return 0
>>> + * for the moment.
>>> + */
>>> + return 0;
>>> +}
>>> +
>>> static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
>>> unsigned int nr_segs)
>>> {
>>> @@ -2478,7 +2500,8 @@ static bool bfq_bio_merge(struct request_queue *q, struct bio *bio,
>>> */
>>> bfq_bic_update_cgroup(bic, bio);
>>>
>>> - bfqd->bio_bfqq = bic_to_bfqq(bic, op_is_sync(bio->bi_opf));
>>> + bfqd->bio_bfqq = bic_to_bfqq(bic, op_is_sync(bio->bi_opf),
>>> + bfq_actuator_index(bfqd, bio));
>>> } else {
>>> bfqd->bio_bfqq = NULL;
>>> }
>>> @@ -3174,7 +3197,7 @@ bfq_merge_bfqqs(struct bfq_data *bfqd, struct bfq_io_cq *bic,
>>> /*
>>> * Merge queues (that is, let bic redirect its requests to new_bfqq)
>>> */
>>> - bic_set_bfqq(bic, new_bfqq, 1);
>>> + bic_set_bfqq(bic, new_bfqq, 1, bfqq->actuator_idx);
>>> bfq_mark_bfqq_coop(new_bfqq);
>>> /*
>>> * new_bfqq now belongs to at least two bics (it is a shared queue):
>>> @@ -4808,11 +4831,12 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
>>> */
>>> if (bfq_bfqq_wait_request(bfqq) ||
>>> (bfqq->dispatched != 0 && bfq_better_to_idle(bfqq))) {
>>> + unsigned int act_idx = bfqq->actuator_idx;
>>> struct bfq_queue *async_bfqq =
>>> - bfqq->bic && bfqq->bic->bfqq[0] &&
>>> - bfq_bfqq_busy(bfqq->bic->bfqq[0]) &&
>>> - bfqq->bic->bfqq[0]->next_rq ?
>>> - bfqq->bic->bfqq[0] : NULL;
>>> + bfqq->bic && bfqq->bic->bfqq[0][act_idx] &&
>>> + bfq_bfqq_busy(bfqq->bic->bfqq[0][act_idx]) &&
>>> + bfqq->bic->bfqq[0][act_idx]->next_rq ?
>>> + bfqq->bic->bfqq[0][act_idx] : NULL;
>>> struct bfq_queue *blocked_bfqq =
>>> !hlist_empty(&bfqq->woken_list) ?
>>> container_of(bfqq->woken_list.first,
>>> @@ -4904,7 +4928,7 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
>>> icq_to_bic(async_bfqq->next_rq->elv.icq) == bfqq->bic &&
>>> bfq_serv_to_charge(async_bfqq->next_rq, async_bfqq) <=
>>> bfq_bfqq_budget_left(async_bfqq))
>>> - bfqq = bfqq->bic->bfqq[0];
>>> + bfqq = bfqq->bic->bfqq[0][act_idx];
>>> else if (bfqq->waker_bfqq &&
>>> bfq_bfqq_busy(bfqq->waker_bfqq) &&
>>> bfqq->waker_bfqq->next_rq &&
>>> @@ -5367,49 +5391,47 @@ static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
>>> bfq_release_process_ref(bfqd, bfqq);
>>> }
>>>
>>> -static void bfq_exit_icq_bfqq(struct bfq_io_cq *bic, bool is_sync)
>>> +static void bfq_exit_icq_bfqq(struct bfq_io_cq *bic,
>>> + bool is_sync,
>>> + unsigned int actuator_idx)
>>> {
>>> - struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync);
>>> + struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync, actuator_idx);
>>> struct bfq_data *bfqd;
>>>
>>> if (bfqq)
>>> bfqd = bfqq->bfqd; /* NULL if scheduler already exited */
>>>
>>> if (bfqq && bfqd) {
>>> - unsigned long flags;
>>> -
>>> - spin_lock_irqsave(&bfqd->lock, flags);
>>> bfqq->bic = NULL;
>>> bfq_exit_bfqq(bfqd, bfqq);
>>> - bic_set_bfqq(bic, NULL, is_sync);
>>> - spin_unlock_irqrestore(&bfqd->lock, flags);
>>> + bic_set_bfqq(bic, NULL, is_sync, actuator_idx);
>>> }
>>> }
>>>
>>> static void bfq_exit_icq(struct io_cq *icq)
>>> {
>>> struct bfq_io_cq *bic = icq_to_bic(icq);
>>> + struct bfq_data *bfqd = bic_to_bfqd(bic);
>>> + unsigned long flags;
>>> + unsigned int act_idx;
>>>
>>> - if (bic->stable_merge_bfqq) {
>>> - struct bfq_data *bfqd = bic->stable_merge_bfqq->bfqd;
>>> -
>>> - /*
>>> - * bfqd is NULL if scheduler already exited, and in
>>> - * that case this is the last time bfqq is accessed.
>>> - */
>>> - if (bfqd) {
>>> - unsigned long flags;
>>> + /*
>>> + * bfqd is NULL if scheduler already exited, and in that case
>>> + * this is the last time these queues are accessed.
>>> + */
>>> + if (bfqd)
>>> + spin_lock_irqsave(&bfqd->lock, flags);
>>>
>>> - spin_lock_irqsave(&bfqd->lock, flags);
>>> - bfq_put_stable_ref(bic->stable_merge_bfqq);
>>> - spin_unlock_irqrestore(&bfqd->lock, flags);
>>> - } else {
>>> + for (act_idx = 0; act_idx < BFQ_NUM_ACTUATORS; act_idx++) {
>>> + if (bic->stable_merge_bfqq)
>>> bfq_put_stable_ref(bic->stable_merge_bfqq);
>>> - }
>>> +
>>> + bfq_exit_icq_bfqq(bic, true, act_idx);
>>> + bfq_exit_icq_bfqq(bic, false, act_idx);
>>> }
>>>
>>> - bfq_exit_icq_bfqq(bic, true);
>>> - bfq_exit_icq_bfqq(bic, false);
>>> + if (bfqd)
>>> + spin_unlock_irqrestore(&bfqd->lock, flags);
>>> }
>>>
>>> /*
>>> @@ -5486,23 +5508,25 @@ static void bfq_check_ioprio_change(struct bfq_io_cq *bic, struct bio *bio)
>>>
>>> bic->ioprio = ioprio;
>>>
>>> - bfqq = bic_to_bfqq(bic, false);
>>> + bfqq = bic_to_bfqq(bic, false, bfq_actuator_index(bfqd, bio));
>>> if (bfqq) {
>>> bfq_release_process_ref(bfqd, bfqq);
>>> bfqq = bfq_get_queue(bfqd, bio, false, bic, true);
>>> - bic_set_bfqq(bic, bfqq, false);
>>> + bic_set_bfqq(bic, bfqq, false, bfq_actuator_index(bfqd, bio));
>>> }
>>>
>>> - bfqq = bic_to_bfqq(bic, true);
>>> + bfqq = bic_to_bfqq(bic, true, bfq_actuator_index(bfqd, bio));
>>> if (bfqq)
>>> bfq_set_next_ioprio_data(bfqq, bic);
>>> }
>>>
>>> static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>>> - struct bfq_io_cq *bic, pid_t pid, int is_sync)
>>> + struct bfq_io_cq *bic, pid_t pid, int is_sync,
>>> + unsigned int act_idx)
>>> {
>>> u64 now_ns = ktime_get_ns();
>>>
>>> + bfqq->actuator_idx = act_idx;
>>> RB_CLEAR_NODE(&bfqq->entity.rb_node);
>>> INIT_LIST_HEAD(&bfqq->fifo);
>>> INIT_HLIST_NODE(&bfqq->burst_list_node);
>>> @@ -5741,6 +5765,7 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
>>> struct bfq_group *bfqg;
>>>
>>> bfqg = bfq_bio_bfqg(bfqd, bio);
>>> +
>>> if (!is_sync) {
>>> async_bfqq = bfq_async_queue_prio(bfqd, bfqg, ioprio_class,
>>> ioprio);
>>> @@ -5755,7 +5780,7 @@ static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
>>>
>>> if (bfqq) {
>>> bfq_init_bfqq(bfqd, bfqq, bic, current->pid,
>>> - is_sync);
>>> + is_sync, bfq_actuator_index(bfqd, bio));
>>> bfq_init_entity(&bfqq->entity, bfqg);
>>> bfq_log_bfqq(bfqd, bfqq, "allocated");
>>> } else {
>>> @@ -6070,7 +6095,8 @@ static bool __bfq_insert_request(struct bfq_data *bfqd, struct request *rq)
>>> * then complete the merge and redirect it to
>>> * new_bfqq.
>>> */
>>> - if (bic_to_bfqq(RQ_BIC(rq), 1) == bfqq)
>>> + if (bic_to_bfqq(RQ_BIC(rq), 1,
>>> + bfq_actuator_index(bfqd, rq->bio)) == bfqq)
>>> bfq_merge_bfqqs(bfqd, RQ_BIC(rq),
>>> bfqq, new_bfqq);
>>>
>>> @@ -6624,7 +6650,7 @@ bfq_split_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq)
>>> return bfqq;
>>> }
>>>
>>> - bic_set_bfqq(bic, NULL, 1);
>>> + bic_set_bfqq(bic, NULL, 1, bfqq->actuator_idx);
>>>
>>> bfq_put_cooperator(bfqq);
>>>
>>> @@ -6638,7 +6664,8 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd,
>>> bool split, bool is_sync,
>>> bool *new_queue)
>>> {
>>> - struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync);
>>> + unsigned int act_idx = bfq_actuator_index(bfqd, bio);
>>> + struct bfq_queue *bfqq = bic_to_bfqq(bic, is_sync, act_idx);
>>>
>>> if (likely(bfqq && bfqq != &bfqd->oom_bfqq))
>>> return bfqq;
>>> @@ -6650,7 +6677,7 @@ static struct bfq_queue *bfq_get_bfqq_handle_split(struct bfq_data *bfqd,
>>> bfq_put_queue(bfqq);
>>> bfqq = bfq_get_queue(bfqd, bio, is_sync, bic, split);
>>>
>>> - bic_set_bfqq(bic, bfqq, is_sync);
>>> + bic_set_bfqq(bic, bfqq, is_sync, act_idx);
>>> if (split && is_sync) {
>>> if ((bic->was_in_burst_list && bfqd->large_burst) ||
>>> bic->saved_in_large_burst)
>>> @@ -7092,8 +7119,10 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
>>> * Our fallback bfqq if bfq_find_alloc_queue() runs into OOM issues.
>>> * Grab a permanent reference to it, so that the normal code flow
>>> * will not attempt to free it.
>>> + * Set zero as actuator index: we will pretend that
>>> + * all I/O requests are for the same actuator.
>>> */
>>> - bfq_init_bfqq(bfqd, &bfqd->oom_bfqq, NULL, 1, 0);
>>> + bfq_init_bfqq(bfqd, &bfqd->oom_bfqq, NULL, 1, 0, 0);
>>> bfqd->oom_bfqq.ref++;
>>> bfqd->oom_bfqq.new_ioprio = BFQ_DEFAULT_QUEUE_IOPRIO;
>>> bfqd->oom_bfqq.new_ioprio_class = IOPRIO_CLASS_BE;
>>> diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
>>> index ad8e513d7e87..8b5225a9e080 100644
>>> --- a/block/bfq-iosched.h
>>> +++ b/block/bfq-iosched.h
>>> @@ -33,6 +33,8 @@
>>> */
>>> #define BFQ_SOFTRT_WEIGHT_FACTOR 100
>>>
>>> +#define BFQ_NUM_ACTUATORS 2
>>> +
>>> struct bfq_entity;
>>>
>>> /**
>>> @@ -225,12 +227,14 @@ struct bfq_ttime {
>>> * struct bfq_queue - leaf schedulable entity.
>>> *
>>> * A bfq_queue is a leaf request queue; it can be associated with an
>>> - * io_context or more, if it is async or shared between cooperating
>>> - * processes. @cgroup holds a reference to the cgroup, to be sure that it
>>> - * does not disappear while a bfqq still references it (mostly to avoid
>>> - * races between request issuing and task migration followed by cgroup
>>> - * destruction).
>>> - * All the fields are protected by the queue lock of the containing bfqd.
>>> + * io_context or more, if it is async or shared between cooperating
>>> + * processes. Besides, it contains I/O requests for only one actuator
>>> + * (an io_context is associated with a different bfq_queue for each
>>> + * actuator it generates I/O for). @cgroup holds a reference to the
>>> + * cgroup, to be sure that it does not disappear while a bfqq still
>>> + * references it (mostly to avoid races between request issuing and
>>> + * task migration followed by cgroup destruction). All the fields are
>>> + * protected by the queue lock of the containing bfqd.
>>> */
>>> struct bfq_queue {
>>> /* reference counter */
>>> @@ -399,6 +403,9 @@ struct bfq_queue {
>>> * the woken queues when this queue exits.
>>> */
>>> struct hlist_head woken_list;
>>> +
>>> + /* index of the actuator this queue is associated with */
>>> + unsigned int actuator_idx;
>>> };
>>>
>>> /**
>>> @@ -407,8 +414,17 @@ struct bfq_queue {
>>> struct bfq_io_cq {
>>> /* associated io_cq structure */
>>> struct io_cq icq; /* must be the first member */
>>> - /* array of two process queues, the sync and the async */
>>> - struct bfq_queue *bfqq[2];
>>> + /*
>>> + * Matrix of associated process queues: first row for async
>>> + * queues, second row sync queues. Each row contains one
>>> + * column for each actuator. An I/O request generated by the
>>> + * process is inserted into the queue pointed by bfqq[i][j] if
>>> + * the request is to be served by the j-th actuator of the
>>> + * drive, where i==0 or i==1, depending on whether the request
>>> + * is async or sync. So there is a distinct queue for each
>>> + * actuator.
>>> + */
>>> + struct bfq_queue *bfqq[2][BFQ_NUM_ACTUATORS];
>>> /* per (request_queue, blkcg) ioprio */
>>> int ioprio;
>>> #ifdef CONFIG_BFQ_GROUP_IOSCHED
>>> @@ -968,8 +984,10 @@ struct bfq_group {
>>>
>>> extern const int bfq_timeout;
>>>
>>> -struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync);
>>> -void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync);
>>> +struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync,
>>> + unsigned int actuator_idx);
>>> +void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync,
>>> + unsigned int actuator_idx);
>>> struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic);
>>> void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq);
>>> void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>>
>> --
>> Damien Le Moal
>> Western Digital Research
>
>

--
Damien Le Moal
Western Digital Research