Received: by 2002:a05:6358:d09b:b0:dc:cd0c:909e with SMTP id jc27csp7107962rwb; Tue, 6 Dec 2022 00:59:48 -0800 (PST) X-Google-Smtp-Source: AA0mqf4A5BQrgPyn4CWAjRSxhYlP8ekyFdOAgvElQeT3Sel7b7IMGzUvd15ez8aV5acxCtsvynKU X-Received: by 2002:a05:6402:528f:b0:461:9cbd:8fba with SMTP id en15-20020a056402528f00b004619cbd8fbamr53749164edb.19.1670317187902; Tue, 06 Dec 2022 00:59:47 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670317187; cv=none; d=google.com; s=arc-20160816; b=lyEVKaLWgFhMbRu61F2K9lAIi45Zw6UQM/L0FUD6i8DB25paDWJdz2igvRPuSVrVlN DFzIbcc8muVQ+zE/iP772qhWf8CglX3TJcKK4s1ld/rUxD+bC9EQ2Jh8ZGlLaholpoup 7dJhlx6IeiAe9IgXbzowrGjF1br49XLsZt681Yz9Fvwda2tZ6Kw05FTLv8I1Rtw7kFo0 jhecz1V2M3Axmsv7csnq3NA21zGJ29QVcvPhSauKVaildTmCOnVY6gGkh5F31Z3eq6gd JagOQXTrpfclBoSNDhSxfYDlNSaX17yLulvBojid08HwUNrhJdDk4bGZGPKNGwnzxTxH /NtA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=y4Nf49ADSGLlGyjDOUT9QXEULpQkeBaIMdSYlxuRDIA=; b=BiNwv1YlMJ275/SJm485JYYBHm0x5bP1uZkem8UXMu2ewnQ6qRMT5Uy1qHHhE/cf0u uudAPc9Mv5XpJEfO1Gozm9qthAUvmNmKegBiamwlUzRN0hcelaB6XdIz0XYmikCc/ov8 rF49XkQrsnX6kG7xqViJAWSgby3aX/DAg8wkTyJi/B7r9GmvT8EkAYaIocpK0RdGNoNa K2+7CeG27ra4h3u90iNVFw3/PulBM1pOUBfmu514EeCqrCVQpq6Ea1pqU/A26f1/Eawi qfujXBlSH3M/kDe+i+W/Org1pWQaxvdhvZcbGxA7+Lzjz0eXEJ8Ad23tftpDPHdqQbRv oEqA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=XE8BHkKO; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n4-20020aa7c784000000b00463f8aad371si1302318eds.239.2022.12.06.00.59.28; Tue, 06 Dec 2022 00:59:47 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=XE8BHkKO; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233672AbiLFIQn (ORCPT + 79 others); Tue, 6 Dec 2022 03:16:43 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46078 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233119AbiLFIQW (ORCPT ); Tue, 6 Dec 2022 03:16:22 -0500 Received: from mail-ed1-x52c.google.com (mail-ed1-x52c.google.com [IPv6:2a00:1450:4864:20::52c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 554F21A207 for ; Tue, 6 Dec 2022 00:16:06 -0800 (PST) Received: by mail-ed1-x52c.google.com with SMTP id r26so19150545edc.10 for ; Tue, 06 Dec 2022 00:16:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=y4Nf49ADSGLlGyjDOUT9QXEULpQkeBaIMdSYlxuRDIA=; b=XE8BHkKOI5xakejcRiy6og75hE83Uq5ULuQylOdGu1eRFcfkIxzgo3Q4DR0PVpZR+w yLCRsfV9QxJV1KF97eQv5duqKb4CcDFIWL7qO/GM+OGzuRWDnod89GDZEjKkqadA6iZq TuDCW1G+EuxjL265DgfB5R/LRJepUgHMn9RhsrTfO3Bwj0oK7O01aC4+SYBAn6HfJfpN AKqcZBOm2qVh3qDDYURg1mRUwovXV3V/LV9cztZKt9pR0LllqVzs2LtS2rHI9WzZCfY8 kSCRQROv6cIeywYl11wpIzr2f2r/UXxufVEJEXLMnjQkt6snMpaQIvreUiJMwEKQFUw/ Q6+w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=y4Nf49ADSGLlGyjDOUT9QXEULpQkeBaIMdSYlxuRDIA=; b=6RHIcK+zRD9SGpGJ/y6dfi5kIQMmg31toiGy9s67zCBTDjkihyPrhDBTIaDZa4RCc6 /vasIHt9e35d0YZlkHae0+vGmHyDEhMXLYYVdNTQEM8BMTAKdR/UfzYyCfniSKbz41wi cxTsnA7fLGSc2xTND17lEtKOkA0C53NsJ2NRvmdJjVsflA4tkys34fPjJgvrp/ab2FI4 mrvtLTuApIGLtvb1PcRugxPTGgFRF2+9ezztrCKzDSSg9S9PxZdBn6p0buS0Du504TtI x3VwvJwGPEAUYMkGb/LoysK1hrAB6l5pF3hd8ksHfRehEO0wEF0CYsqPPFqs9RzvC0gb hR1Q== X-Gm-Message-State: ANoB5plkQt4UnJ/StyrywgGVT44g9a6aB+M34d/gf7W3HcLTocpU9Oz8 0BlfW0kpVAiJVoJjzssHNK73wA== X-Received: by 2002:a05:6402:538e:b0:468:ea55:ab40 with SMTP id ew14-20020a056402538e00b00468ea55ab40mr66718edb.323.1670314564631; Tue, 06 Dec 2022 00:16:04 -0800 (PST) Received: from MBP-di-Paolo.station (net-2-35-55-161.cust.vodafonedsl.it. [2.35.55.161]) by smtp.gmail.com with ESMTPSA id 24-20020a170906329800b007b29d292852sm7161944ejw.148.2022.12.06.00.16.03 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Tue, 06 Dec 2022 00:16:04 -0800 (PST) From: Paolo Valente To: Jens Axboe Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, arie.vanderhoeven@seagate.com, rory.c.chen@seagate.com, glen.valante@linaro.org, Davide Zini , Paolo Valente Subject: [PATCH V7 7/8] block, bfq: inject I/O to underutilized actuators Date: Tue, 6 Dec 2022 09:15:50 +0100 Message-Id: <20221206081551.28257-8-paolo.valente@linaro.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20221206081551.28257-1-paolo.valente@linaro.org> References: <20221206081551.28257-1-paolo.valente@linaro.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Davide Zini The main service scheme of BFQ for sync I/O is serving one sync bfq_queue at a time, for a while. In particular, BFQ enforces this scheme when it deems the latter necessary to boost throughput or to preserve service guarantees. Unfortunately, when BFQ enforces this policy, only one actuator at a time gets served for a while, because each bfq_queue contains I/O only for one actuator. The other actuators may remain underutilized. Actually, BFQ may serve (inject) extra I/O, taken from other bfq_queues, in parallel with that of the in-service queue. This injection mechanism may provide the ground for dealing also with the above actuator-underutilization problem. Yet BFQ does not take the actuator load into account when choosing which queue to pick extra I/O from. In addition, BFQ may happen to inject extra I/O only when the in-service queue is temporarily empty. In view of these facts, this commit extends the injection mechanism in such a way that the latter: (1) takes into account also the actuator load; (2) checks such a load on each dispatch, and injects I/O for an underutilized actuator, if there is one and there is I/O for it. To perform the check in (2), this commit introduces a load threshold, currently set to 4. A linear scan of each actuator is performed, until an actuator is found for which the following two conditions hold: the load of the actuator is below the threshold, and there is at least one non-in-service queue that contains I/O for that actuator. If such a pair (actuator, queue) is found, then the head request of that queue is returned for dispatch, instead of the head request of the in-service queue. We have set the threshold, empirically, to the minimum possible value for which an actuator is fully utilized, or close to be fully utilized. By doing so, injected I/O 'steals' as few drive-queue slots as possibile to the in-service queue. This reduces as much as possible the probability that the service of I/O from the in-service bfq_queue gets delayed because of slot exhaustion, i.e., because all the slots of the drive queue are filled with I/O injected from other queues (NCQ provides for 32 slots). This new mechanism also counters actuator underutilization in the case of asymmetric configurations of bfq_queues. Namely if there are few bfq_queues containing I/O for some actuators and many bfq_queues containing I/O for other actuators. Or if the bfq_queues containing I/O for some actuators have lower weights than the other bfq_queues. Signed-off-by: Paolo Valente Signed-off-by: Davide Zini --- block/bfq-cgroup.c | 2 +- block/bfq-iosched.c | 143 ++++++++++++++++++++++++++++++++------------ block/bfq-iosched.h | 39 +++++++++++- block/bfq-wf2q.c | 2 +- 4 files changed, 144 insertions(+), 42 deletions(-) diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c index 8275cdd4573f..3de4884f728d 100644 --- a/block/bfq-cgroup.c +++ b/block/bfq-cgroup.c @@ -698,7 +698,7 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq, bfq_activate_bfqq(bfqd, bfqq); } - if (!bfqd->in_service_queue && !bfqd->rq_in_driver) + if (!bfqd->in_service_queue && !bfqd->tot_rq_in_driver) bfq_schedule_dispatch(bfqd); /* release extra ref taken above, bfqq may happen to be freed now */ bfq_put_queue(bfqq); diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index 665e78035666..f4df5710b648 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -2303,9 +2303,9 @@ static void bfq_add_request(struct request *rq) * elapsed. */ if (bfqq == bfqd->in_service_queue && - (bfqd->rq_in_driver == 0 || + (bfqd->tot_rq_in_driver == 0 || (bfqq->last_serv_time_ns > 0 && - bfqd->rqs_injected && bfqd->rq_in_driver > 0)) && + bfqd->rqs_injected && bfqd->tot_rq_in_driver > 0)) && time_is_before_eq_jiffies(bfqq->decrease_time_jif + msecs_to_jiffies(10))) { bfqd->last_empty_occupied_ns = ktime_get_ns(); @@ -2329,7 +2329,7 @@ static void bfq_add_request(struct request *rq) * will be set in case injection is performed * on bfqq before rq is completed). */ - if (bfqd->rq_in_driver == 0) + if (bfqd->tot_rq_in_driver == 0) bfqd->rqs_injected = false; } } @@ -2427,15 +2427,18 @@ static sector_t get_sdist(sector_t last_pos, struct request *rq) static void bfq_activate_request(struct request_queue *q, struct request *rq) { struct bfq_data *bfqd = q->elevator->elevator_data; + unsigned int act_idx = bfq_actuator_index(bfqd, rq->bio); - bfqd->rq_in_driver++; + bfqd->tot_rq_in_driver++; + bfqd->rq_in_driver[act_idx]++; } static void bfq_deactivate_request(struct request_queue *q, struct request *rq) { struct bfq_data *bfqd = q->elevator->elevator_data; - bfqd->rq_in_driver--; + bfqd->tot_rq_in_driver--; + bfqd->rq_in_driver[bfq_actuator_index(bfqd, rq->bio)]--; } #endif @@ -2710,11 +2713,14 @@ void bfq_end_wr_async_queues(struct bfq_data *bfqd, static void bfq_end_wr(struct bfq_data *bfqd) { struct bfq_queue *bfqq; + int i; spin_lock_irq(&bfqd->lock); - list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list) - bfq_bfqq_end_wr(bfqq); + for (i = 0; i < bfqd->num_actuators; i++) { + list_for_each_entry(bfqq, &bfqd->active_list[i], bfqq_list) + bfq_bfqq_end_wr(bfqq); + } list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list) bfq_bfqq_end_wr(bfqq); bfq_end_wr_async(bfqd); @@ -3671,13 +3677,13 @@ static void bfq_update_peak_rate(struct bfq_data *bfqd, struct request *rq) * - start a new observation interval with this dispatch */ if (now_ns - bfqd->last_dispatch > 100*NSEC_PER_MSEC && - bfqd->rq_in_driver == 0) + bfqd->tot_rq_in_driver == 0) goto update_rate_and_reset; /* Update sampling information */ bfqd->peak_rate_samples++; - if ((bfqd->rq_in_driver > 0 || + if ((bfqd->tot_rq_in_driver > 0 || now_ns - bfqd->last_completion < BFQ_MIN_TT) && !BFQ_RQ_SEEKY(bfqd, bfqd->last_position, rq)) bfqd->sequential_samples++; @@ -3942,10 +3948,8 @@ static bool idling_needed_for_service_guarantees(struct bfq_data *bfqd, return false; return (bfqq->wr_coeff > 1 && - (bfqd->wr_busy_queues < - tot_busy_queues || - bfqd->rq_in_driver >= - bfqq->dispatched + 4)) || + (bfqd->wr_busy_queues < tot_busy_queues || + bfqd->tot_rq_in_driver >= bfqq->dispatched + 4)) || bfq_asymmetric_scenario(bfqd, bfqq) || tot_busy_queues == 1; } @@ -4716,6 +4720,8 @@ bfq_choose_bfqq_for_injection(struct bfq_data *bfqd) { struct bfq_queue *bfqq, *in_serv_bfqq = bfqd->in_service_queue; unsigned int limit = in_serv_bfqq->inject_limit; + int i; + /* * If * - bfqq is not weight-raised and therefore does not carry @@ -4747,7 +4753,7 @@ bfq_choose_bfqq_for_injection(struct bfq_data *bfqd) ) limit = 1; - if (bfqd->rq_in_driver >= limit) + if (bfqd->tot_rq_in_driver >= limit) return NULL; /* @@ -4762,11 +4768,12 @@ bfq_choose_bfqq_for_injection(struct bfq_data *bfqd) * (and re-added only if it gets new requests, but then it * is assigned again enough budget for its new backlog). */ - list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list) - if (!RB_EMPTY_ROOT(&bfqq->sort_list) && - (in_serv_always_inject || bfqq->wr_coeff > 1) && - bfq_serv_to_charge(bfqq->next_rq, bfqq) <= - bfq_bfqq_budget_left(bfqq)) { + for (i = 0; i < bfqd->num_actuators; i++) { + list_for_each_entry(bfqq, &bfqd->active_list[i], bfqq_list) + if (!RB_EMPTY_ROOT(&bfqq->sort_list) && + (in_serv_always_inject || bfqq->wr_coeff > 1) && + bfq_serv_to_charge(bfqq->next_rq, bfqq) <= + bfq_bfqq_budget_left(bfqq)) { /* * Allow for only one large in-flight request * on non-rotational devices, for the @@ -4791,22 +4798,69 @@ bfq_choose_bfqq_for_injection(struct bfq_data *bfqd) else limit = in_serv_bfqq->inject_limit; - if (bfqd->rq_in_driver < limit) { + if (bfqd->tot_rq_in_driver < limit) { bfqd->rqs_injected = true; return bfqq; } } + } + + return NULL; +} + +static struct bfq_queue * +bfq_find_active_bfqq_for_actuator(struct bfq_data *bfqd, int idx) +{ + struct bfq_queue *bfqq = NULL; + + if (bfqd->in_service_queue && + bfqd->in_service_queue->actuator_idx == idx) + return bfqd->in_service_queue; + + list_for_each_entry(bfqq, &bfqd->active_list[idx], bfqq_list) { + if (!RB_EMPTY_ROOT(&bfqq->sort_list) && + bfq_serv_to_charge(bfqq->next_rq, bfqq) <= + bfq_bfqq_budget_left(bfqq)) { + return bfqq; + } + } return NULL; } +/* + * Perform a linear scan of each actuator, until an actuator is found + * for which the following two conditions hold: the load of the + * actuator is below the threshold (see comments on actuator_load_threshold + * for details), and there is a queue that contains I/O for that + * actuator. On success, return that queue. + */ +static struct bfq_queue * +bfq_find_bfqq_for_underused_actuator(struct bfq_data *bfqd) +{ + int i; + + for (i = 0 ; i < bfqd->num_actuators; i++) { + if (bfqd->rq_in_driver[i] < bfqd->actuator_load_threshold) { + struct bfq_queue *bfqq = + bfq_find_active_bfqq_for_actuator(bfqd, i); + + if (bfqq) + return bfqq; + } + } + + return NULL; +} + + /* * Select a queue for service. If we have a current queue in service, * check whether to continue servicing it, or retrieve and set a new one. */ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd) { - struct bfq_queue *bfqq; + struct bfq_queue *bfqq, *inject_bfqq; struct request *next_rq; enum bfqq_expiration reason = BFQQE_BUDGET_TIMEOUT; @@ -4828,6 +4882,15 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd) goto expire; check_queue: + /* + * If some actuator is underutilized, but the in-service + * queue does not contain I/O for that actuator, then try to + * inject I/O for that actuator. + */ + inject_bfqq = bfq_find_bfqq_for_underused_actuator(bfqd); + if (inject_bfqq && inject_bfqq != bfqq) + return inject_bfqq; + /* * This loop is rarely executed more than once. Even when it * happens, it is much more convenient to re-execute this loop @@ -5183,11 +5246,11 @@ static struct request *__bfq_dispatch_request(struct blk_mq_hw_ctx *hctx) /* * We exploit the bfq_finish_requeue_request hook to - * decrement rq_in_driver, but + * decrement tot_rq_in_driver, but * bfq_finish_requeue_request will not be invoked on * this request. So, to avoid unbalance, just start - * this request, without incrementing rq_in_driver. As - * a negative consequence, rq_in_driver is deceptively + * this request, without incrementing tot_rq_in_driver. As + * a negative consequence, tot_rq_in_driver is deceptively * lower than it should be while this request is in * service. This may cause bfq_schedule_dispatch to be * invoked uselessly. @@ -5196,7 +5259,7 @@ static struct request *__bfq_dispatch_request(struct blk_mq_hw_ctx *hctx) * bfq_finish_requeue_request hook, if defined, is * probably invoked also on this request. So, by * exploiting this hook, we could 1) increment - * rq_in_driver here, and 2) decrement it in + * tot_rq_in_driver here, and 2) decrement it in * bfq_finish_requeue_request. Such a solution would * let the value of the counter be always accurate, * but it would entail using an extra interface @@ -5225,7 +5288,7 @@ static struct request *__bfq_dispatch_request(struct blk_mq_hw_ctx *hctx) * Of course, serving one request at a time may cause loss of * throughput. */ - if (bfqd->strict_guarantees && bfqd->rq_in_driver > 0) + if (bfqd->strict_guarantees && bfqd->tot_rq_in_driver > 0) goto exit; bfqq = bfq_select_queue(bfqd); @@ -5236,7 +5299,8 @@ static struct request *__bfq_dispatch_request(struct blk_mq_hw_ctx *hctx) if (rq) { inc_in_driver_start_rq: - bfqd->rq_in_driver++; + bfqd->rq_in_driver[bfqq->actuator_idx]++; + bfqd->tot_rq_in_driver++; start_rq: rq->rq_flags |= RQF_STARTED; } @@ -6304,7 +6368,7 @@ static void bfq_update_hw_tag(struct bfq_data *bfqd) struct bfq_queue *bfqq = bfqd->in_service_queue; bfqd->max_rq_in_driver = max_t(int, bfqd->max_rq_in_driver, - bfqd->rq_in_driver); + bfqd->tot_rq_in_driver); if (bfqd->hw_tag == 1) return; @@ -6315,7 +6379,7 @@ static void bfq_update_hw_tag(struct bfq_data *bfqd) * sum is not exact, as it's not taking into account deactivated * requests. */ - if (bfqd->rq_in_driver + bfqd->queued <= BFQ_HW_QUEUE_THRESHOLD) + if (bfqd->tot_rq_in_driver + bfqd->queued <= BFQ_HW_QUEUE_THRESHOLD) return; /* @@ -6326,7 +6390,7 @@ static void bfq_update_hw_tag(struct bfq_data *bfqd) if (bfqq && bfq_bfqq_has_short_ttime(bfqq) && bfqq->dispatched + bfqq->queued[0] + bfqq->queued[1] < BFQ_HW_QUEUE_THRESHOLD && - bfqd->rq_in_driver < BFQ_HW_QUEUE_THRESHOLD) + bfqd->tot_rq_in_driver < BFQ_HW_QUEUE_THRESHOLD) return; if (bfqd->hw_tag_samples++ < BFQ_HW_QUEUE_SAMPLES) @@ -6347,7 +6411,8 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd) bfq_update_hw_tag(bfqd); - bfqd->rq_in_driver--; + bfqd->rq_in_driver[bfqq->actuator_idx]--; + bfqd->tot_rq_in_driver--; bfqq->dispatched--; if (!bfqq->dispatched && !bfq_bfqq_busy(bfqq)) { @@ -6466,7 +6531,7 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd) BFQQE_NO_MORE_REQUESTS); } - if (!bfqd->rq_in_driver) + if (!bfqd->tot_rq_in_driver) bfq_schedule_dispatch(bfqd); } @@ -6597,13 +6662,13 @@ static void bfq_update_inject_limit(struct bfq_data *bfqd, * conditions to do it, or we can lower the last base value * computed. * - * NOTE: (bfqd->rq_in_driver == 1) means that there is no I/O + * NOTE: (bfqd->tot_rq_in_driver == 1) means that there is no I/O * request in flight, because this function is in the code * path that handles the completion of a request of bfqq, and, * in particular, this function is executed before - * bfqd->rq_in_driver is decremented in such a code path. + * bfqd->tot_rq_in_driver is decremented in such a code path. */ - if ((bfqq->last_serv_time_ns == 0 && bfqd->rq_in_driver == 1) || + if ((bfqq->last_serv_time_ns == 0 && bfqd->tot_rq_in_driver == 1) || tot_time_ns < bfqq->last_serv_time_ns) { if (bfqq->last_serv_time_ns == 0) { /* @@ -6613,7 +6678,7 @@ static void bfq_update_inject_limit(struct bfq_data *bfqd, bfqq->inject_limit = max_t(unsigned int, 1, old_limit); } bfqq->last_serv_time_ns = tot_time_ns; - } else if (!bfqd->rqs_injected && bfqd->rq_in_driver == 1) + } else if (!bfqd->rqs_injected && bfqd->tot_rq_in_driver == 1) /* * No I/O injected and no request still in service in * the drive: these are the exact conditions for @@ -7257,7 +7322,8 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e) bfqd->queue_weights_tree = RB_ROOT_CACHED; bfqd->num_groups_with_pending_reqs = 0; - INIT_LIST_HEAD(&bfqd->active_list); + INIT_LIST_HEAD(&bfqd->active_list[0]); + INIT_LIST_HEAD(&bfqd->active_list[1]); INIT_LIST_HEAD(&bfqd->idle_list); INIT_HLIST_HEAD(&bfqd->burst_list); @@ -7302,6 +7368,9 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e) ref_wr_duration[blk_queue_nonrot(bfqd->queue)]; bfqd->peak_rate = ref_rate[blk_queue_nonrot(bfqd->queue)] * 2 / 3; + /* see comments on the definition of next field inside bfq_data */ + bfqd->actuator_load_threshold = 4; + spin_lock_init(&bfqd->lock); /* diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h index 953980de6b4b..be725d9143fa 100644 --- a/block/bfq-iosched.h +++ b/block/bfq-iosched.h @@ -586,7 +586,12 @@ struct bfq_data { /* number of queued requests */ int queued; /* number of requests dispatched and waiting for completion */ - int rq_in_driver; + int tot_rq_in_driver; + /* + * number of requests dispatched and waiting for completion + * for each actuator + */ + int rq_in_driver[BFQ_MAX_ACTUATORS]; /* true if the device is non rotational and performs queueing */ bool nonrot_with_queueing; @@ -680,8 +685,13 @@ struct bfq_data { /* maximum budget allotted to a bfq_queue before rescheduling */ int bfq_max_budget; - /* list of all the bfq_queues active on the device */ - struct list_head active_list; + /* + * List of all the bfq_queues active for a specific actuator + * on the device. Keeping active queues separate on a + * per-actuator basis helps implementing per-actuator + * injection more efficiently. + */ + struct list_head active_list[BFQ_MAX_ACTUATORS]; /* list of all the bfq_queues idle on the device */ struct list_head idle_list; @@ -817,6 +827,29 @@ struct bfq_data { sector_t sector[BFQ_MAX_ACTUATORS]; sector_t nr_sectors[BFQ_MAX_ACTUATORS]; struct blk_independent_access_range ia_ranges[BFQ_MAX_ACTUATORS]; + + /* + * If the number of I/O requests queued in the device for a + * given actuator is below next threshold, then the actuator + * is deemed as underutilized. If this condition is found to + * hold for some actuator upon a dispatch, but (i) the + * in-service queue does not contain I/O for that actuator, + * while (ii) some other queue does contain I/O for that + * actuator, then the head I/O request of the latter queue is + * returned (injected), instead of the head request of the + * currently in-service queue. + * + * We set the threshold, empirically, to the minimum possible + * value for which an actuator is fully utilized, or close to + * be fully utilized. By doing so, injected I/O 'steals' as + * few drive-queue slots as possibile to the in-service + * queue. This reduces as much as possible the probability + * that the service of I/O from the in-service bfq_queue gets + * delayed because of slot exhaustion, i.e., because all the + * slots of the drive queue are filled with I/O injected from + * other queues (NCQ provides for 32 slots). + */ + unsigned int actuator_load_threshold; }; enum bfqq_state_flags { diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c index 8fc3da4c23bb..ec0273e2cd07 100644 --- a/block/bfq-wf2q.c +++ b/block/bfq-wf2q.c @@ -477,7 +477,7 @@ static void bfq_active_insert(struct bfq_service_tree *st, bfqd = (struct bfq_data *)bfqg->bfqd; #endif if (bfqq) - list_add(&bfqq->bfqq_list, &bfqq->bfqd->active_list); + list_add(&bfqq->bfqq_list, &bfqq->bfqd->active_list[bfqq->actuator_idx]); #ifdef CONFIG_BFQ_GROUP_IOSCHED if (bfqg != bfqd->root_group) bfqg->active_entities++; -- 2.20.1