2018-06-25 19:58:00

by Paolo Valente

Subject: [PATCH BUGFIX RESEND 0/4] bfq: fix bugs breaking bandwidth guarantees occasionally

Hi,
during some bandwidth tests, I found some occasional but severe
malfunctions (losses of bandwidth control). The first three patches in
this series fix the bugs that caused these malfunctions. The last
patch improves the name of one of the functions involved in these
bugs.

I guess these patches are appropriate for the next kernel release.

Thanks,
Paolo

Paolo Valente (4):
block, bfq: add/remove entity weights correctly
block, bfq: do not expire a queue that will deserve dispatch plugging
block, bfq: fix service being wrongly set to zero in case of
preemption
block, bfq: give a better name to bfq_bfqq_may_idle

block/bfq-iosched.c | 131 +++++++++++++++++++++++++++++++++++++++++++---------
block/bfq-iosched.h | 7 ++-
block/bfq-wf2q.c | 30 ++++++------
3 files changed, 128 insertions(+), 40 deletions(-)

--
2.16.1


2018-06-25 19:56:56

by Paolo Valente

Subject: [PATCH BUGFIX RESEND 4/4] block, bfq: give a better name to bfq_bfqq_may_idle

The actual goal of the function bfq_bfqq_may_idle is to tell whether
it is better to perform device idling (more precisely: I/O-dispatch
plugging) for the input bfq_queue, either to boost throughput or to
preserve service guarantees. This commit therefore renames the
function to bfq_better_to_idle.
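
For illustration only (not part of the patch), the toy user-space
model below, with invented names and fields, summarizes what the
renamed predicate expresses and how the stricter check
bfq_bfqq_must_idle() builds on it:

#include <stdbool.h>
#include <stdio.h>

/* Invented, simplified stand-in for struct bfq_queue. */
struct toy_queue {
        bool idling_boosts_throughput;     /* plugging helps throughput */
        bool idling_needed_for_guarantees; /* plugging protects guarantees */
        bool has_enqueued_requests;
};

/* What bfq_better_to_idle() expresses: plugging I/O dispatch is "better"
 * if it either boosts throughput or preserves the queue's guarantees. */
static bool better_to_idle(const struct toy_queue *q)
{
        return q->idling_boosts_throughput ||
               q->idling_needed_for_guarantees;
}

/* The stricter condition for actually plugging now: the queue must also
 * have no request enqueued (cf. bfq_bfqq_must_idle() in the patch). */
static bool must_idle(const struct toy_queue *q)
{
        return !q->has_enqueued_requests && better_to_idle(q);
}

int main(void)
{
        struct toy_queue q = {
                .idling_boosts_throughput = false,
                .idling_needed_for_guarantees = true,
                .has_enqueued_requests = false,
        };

        printf("better_to_idle=%d must_idle=%d\n",
               better_to_idle(&q), must_idle(&q));
        return 0;
}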

Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-iosched.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index d579cc8e0db6..41d9036b1822 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -634,7 +634,7 @@ static bool bfq_differentiated_weights(struct bfq_data *bfqd)
* The following function returns true if every queue must receive the
* same share of the throughput (this condition is used when deciding
* whether idling may be disabled, see the comments in the function
- * bfq_bfqq_may_idle()).
+ * bfq_better_to_idle()).
*
* Such a scenario occurs when:
* 1) all active queues have the same weight,
@@ -3355,7 +3355,7 @@ static bool bfq_may_expire_for_budg_timeout(struct bfq_queue *bfqq)
* issues taken into account are not trivial. We discuss these issues
* individually while introducing the variables.
*/
-static bool bfq_bfqq_may_idle(struct bfq_queue *bfqq)
+static bool bfq_better_to_idle(struct bfq_queue *bfqq)
{
struct bfq_data *bfqd = bfqq->bfqd;
bool rot_without_queueing =
@@ -3588,19 +3588,19 @@ static bool bfq_bfqq_may_idle(struct bfq_queue *bfqq)
}

/*
- * If the in-service queue is empty but the function bfq_bfqq_may_idle
+ * If the in-service queue is empty but the function bfq_better_to_idle
* returns true, then:
* 1) the queue must remain in service and cannot be expired, and
* 2) the device must be idled to wait for the possible arrival of a new
* request for the queue.
- * See the comments on the function bfq_bfqq_may_idle for the reasons
+ * See the comments on the function bfq_better_to_idle for the reasons
* why performing device idling is the best choice to boost the throughput
- * and preserve service guarantees when bfq_bfqq_may_idle itself
+ * and preserve service guarantees when bfq_better_to_idle itself
* returns true.
*/
static bool bfq_bfqq_must_idle(struct bfq_queue *bfqq)
{
- return RB_EMPTY_ROOT(&bfqq->sort_list) && bfq_bfqq_may_idle(bfqq);
+ return RB_EMPTY_ROOT(&bfqq->sort_list) && bfq_better_to_idle(bfqq);
}

/*
@@ -3686,7 +3686,7 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
* may idle after their completion, then keep it anyway.
*/
if (bfq_bfqq_wait_request(bfqq) ||
- (bfqq->dispatched != 0 && bfq_bfqq_may_idle(bfqq))) {
+ (bfqq->dispatched != 0 && bfq_better_to_idle(bfqq))) {
bfqq = NULL;
goto keep_queue;
}
@@ -4734,7 +4734,7 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)
BFQQE_BUDGET_TIMEOUT);
else if (RB_EMPTY_ROOT(&bfqq->sort_list) &&
(bfqq->dispatched == 0 ||
- !bfq_bfqq_may_idle(bfqq)))
+ !bfq_better_to_idle(bfqq)))
bfq_bfqq_expire(bfqd, bfqq, false,
BFQQE_NO_MORE_REQUESTS);
}
--
2.16.1


2018-06-25 19:57:12

by Paolo Valente

Subject: [PATCH BUGFIX RESEND 2/4] block, bfq: do not expire a queue that will deserve dispatch plugging

For some bfq_queues, BFQ plugs I/O dispatching when the queue becomes
idle, and keeps the plug until a new request of the queue arrives, or
a timeout fires. BFQ does so either to boost throughput or to preserve
service guarantees for the queue.

More precisely, for such a queue, plugging starts when the queue
happens to have either no request enqueued, or no request in flight,
that is, no request already dispatched but not yet completed.

On the opposite end, BFQ may happen to expire a queue with no request
enqueued, without doing any plugging, if the queue still has some
request in flight. Unfortunately, such a premature expiration causes
the queue to lose its chance to enjoy dispatch plugging a moment
later, i.e., when its in-flight requests finally get completed. This
breaks service guarantees for the queue.

This commit prevents BFQ from expiring an empty queue if the latter
still has in-flight requests.
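
For illustration only (not part of the patch), the toy user-space
model below, with invented names and fields, captures the decision
that this change alters in bfq_completed_request():

#include <stdbool.h>
#include <stdio.h>

/* Invented, simplified stand-in for struct bfq_queue. */
struct toy_queue {
        int enqueued;        /* requests queued, not yet dispatched */
        int dispatched;      /* requests dispatched, not yet completed */
        bool better_to_idle; /* dispatch plugging would pay off */
};

/* Old behaviour: keep the queue in service (and arm the plug timer) only
 * once nothing is in flight; an empty queue with in-flight requests
 * could thus be expired just before plugging became due. */
static bool keep_in_service_old(const struct toy_queue *q)
{
        return q->enqueued == 0 && q->dispatched == 0 && q->better_to_idle;
}

/* New behaviour: if plugging is worthwhile, keep the queue in service
 * even while requests are still in flight; the plug timer is armed only
 * when the last in-flight request completes. */
static bool keep_in_service_new(const struct toy_queue *q)
{
        return q->enqueued == 0 && q->better_to_idle;
}

int main(void)
{
        struct toy_queue q = {
                .enqueued = 0, .dispatched = 2, .better_to_idle = true,
        };

        printf("old: keep=%d  new: keep=%d\n",
               keep_in_service_old(&q), keep_in_service_new(&q));
        return 0;
}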

Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-iosched.c | 36 +++++++++++++++++++++++++++++++++---
1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 3f32e88c7e9b..4fd4f1996498 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -3597,8 +3597,14 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)

bfq_log_bfqq(bfqd, bfqq, "select_queue: already in-service queue");

+ /*
+ * Do not expire bfqq for budget timeout if bfqq may be about
+ * to enjoy device idling. The reason why, in this case, we
+ * prevent bfqq from expiring is the same as in the comments
+ * on the case where bfq_bfqq_must_idle() returns true, in
+ * bfq_completed_request().
+ */
if (bfq_may_expire_for_budg_timeout(bfqq) &&
- !bfq_bfqq_wait_request(bfqq) &&
!bfq_bfqq_must_idle(bfqq))
goto expire;

@@ -4674,8 +4680,32 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)
* or if we want to idle in case it has no pending requests.
*/
if (bfqd->in_service_queue == bfqq) {
- if (bfqq->dispatched == 0 && bfq_bfqq_must_idle(bfqq)) {
- bfq_arm_slice_timer(bfqd);
+ if (bfq_bfqq_must_idle(bfqq)) {
+ if (bfqq->dispatched == 0)
+ bfq_arm_slice_timer(bfqd);
+ /*
+ * If we get here, we do not expire bfqq, even
+ * if bfqq was in budget timeout or had no
+ * more requests (as controlled in the next
+ * conditional instructions). The reason for
+ * not expiring bfqq is as follows.
+ *
+ * Here bfqq->dispatched > 0 holds, but
+ * bfq_bfqq_must_idle() returned true. This
+ * implies that, even if no request arrives
+ * for bfqq before bfqq->dispatched reaches 0,
+ * bfqq will, however, not be expired on the
+ * completion event that causes bfqq->dispatched
+ * to reach zero. In contrast, on this event,
+ * bfqq will start enjoying device idling
+ * (I/O-dispatch plugging).
+ *
+ * But, if we expired bfqq here, bfqq would
+ * not have the chance to enjoy device idling
+ * when bfqq->dispatched finally reaches
+ * zero. This would expose bfqq to violation
+ * of its reserved service guarantees.
+ */
return;
} else if (bfq_may_expire_for_budg_timeout(bfqq))
bfq_bfqq_expire(bfqd, bfqq, false,
--
2.16.1


2018-06-25 19:58:20

by Paolo Valente

Subject: [PATCH BUGFIX RESEND 3/4] block, bfq: fix service being wrongly set to zero in case of preemption

If
- a bfq_queue Q preempts another queue, because one request of Q
arrives in time,
- but, after this preemption, Q is not the queue that is set in service,
then Q->entity.service is set to 0 when Q is eventually set in
service. But Q should have continued receiving service with its old
budget (which is why preemption has occurred) and its old service.

This commit addresses this issue by resetting service only upon real
queue expiration.
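
For illustration only (not part of the patch), the tiny user-space
sketch below, with invented names and numbers, shows the accounting at
stake: resetting service at selection time lets a preempting queue
consume a full budget again, while resetting it only on real
expiration preserves exactly the entitlement that justified the
preemption.

#include <stdio.h>

/* Invented, simplified stand-in for struct bfq_entity accounting. */
struct toy_entity {
        unsigned long budget;  /* service allotted for the current slot */
        unsigned long service; /* service already received in the slot */
};

static unsigned long budget_left(const struct toy_entity *e)
{
        return e->budget - e->service;
}

int main(void)
{
        /* Q had a 100-sector budget and had already consumed 60 of it
         * when it preempted another queue: it is entitled to 40 more. */
        struct toy_entity q = { .budget = 100, .service = 60 };

        printf("entitled to %lu more sectors\n", budget_left(&q));

        /* Old behaviour: service was reset when Q was (later) selected
         * for service, so Q could consume its whole budget again. */
        struct toy_entity old_q = q;
        old_q.service = 0;
        printf("old: may still consume %lu\n", budget_left(&old_q));

        /* Fixed behaviour: service survives until Q really expires, so
         * Q consumes only what it is still entitled to. */
        printf("new: may still consume %lu\n", budget_left(&q));
        return 0;
}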

Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-iosched.c | 34 ++++++++++++++++++++++++++++------
block/bfq-wf2q.c | 6 ------
2 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 4fd4f1996498..d579cc8e0db6 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -1382,18 +1382,30 @@ static bool bfq_bfqq_update_budg_for_activation(struct bfq_data *bfqd,
* remain unchanged after such an expiration, and the
* following statement therefore assigns to
* entity->budget the remaining budget on such an
- * expiration. For clarity, entity->service is not
- * updated on expiration in any case, and, in normal
- * operation, is reset only when bfqq is selected for
- * service (see bfq_get_next_queue).
+ * expiration.
*/
entity->budget = min_t(unsigned long,
bfq_bfqq_budget_left(bfqq),
bfqq->max_budget);

+ /*
+ * At this point, we have used entity->service to get
+ * the budget left (needed for updating
+ * entity->budget). Thus we finally can, and have to,
+ * reset entity->service. The latter must be reset
+ * because bfqq would otherwise be charged again for
+ * the service it has received during its previous
+ * service slot(s).
+ */
+ entity->service = 0;
+
return true;
}

+ /*
+ * We can finally complete expiration, by setting service to 0.
+ */
+ entity->service = 0;
entity->budget = max_t(unsigned long, bfqq->max_budget,
bfq_serv_to_charge(bfqq->next_rq, bfqq));
bfq_clear_bfqq_non_blocking_wait_rq(bfqq);
@@ -3271,11 +3283,21 @@ void bfq_bfqq_expire(struct bfq_data *bfqd,
ref = bfqq->ref;
__bfq_bfqq_expire(bfqd, bfqq);

+ if (ref == 1) /* bfqq is gone, no more actions on it */
+ return;
+
/* mark bfqq as waiting a request only if a bic still points to it */
- if (ref > 1 && !bfq_bfqq_busy(bfqq) &&
+ if (!bfq_bfqq_busy(bfqq) &&
reason != BFQQE_BUDGET_TIMEOUT &&
- reason != BFQQE_BUDGET_EXHAUSTED)
+ reason != BFQQE_BUDGET_EXHAUSTED) {
bfq_mark_bfqq_non_blocking_wait_rq(bfqq);
+ /*
+ * Not setting service to 0, because, if the next rq
+ * arrives in time, the queue will go on receiving
+ * service with this same budget (as if it never expired)
+ */
+ } else
+ entity->service = 0;
}

/*
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index 58cf38fcee05..dbc07b456059 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -1544,12 +1544,6 @@ struct bfq_queue *bfq_get_next_queue(struct bfq_data *bfqd)
entity = sd->next_in_service;
sd->in_service_entity = entity;

- /*
- * Reset the accumulator of the amount of service that
- * the entity is about to receive.
- */
- entity->service = 0;
-
/*
* If entity is no longer a candidate for next
* service, then it must be extracted from its active
--
2.16.1


2018-06-25 19:59:19

by Paolo Valente

Subject: [PATCH BUGFIX RESEND 1/4] block, bfq: add/remove entity weights correctly

To keep I/O throughput high as often as possible, BFQ performs
I/O-dispatch plugging (aka device idling) only when beneficial exactly
for throughput, or when needed for service guarantees (low latency,
fairness). An important case where the latter condition holds is when
the scenario is 'asymmetric' in terms of weights: i.e., when some
bfq_queue or whole group of queues has a higher weight, and thus has
to receive more service, than other queues or groups. Without dispatch
plugging, lower-weight queues/groups may unjustly steal bandwidth from
higher-weight queues/groups.

To detect asymmetric scenarios, BFQ checks some sufficient
conditions. One of these conditions is that active groups have
different weights. BFQ checks this condition by maintaining a special
set of the unique weights of active groups (group_weights_tree). To
this end, in bfq_active_insert/bfq_active_extract, BFQ adds/removes
the weight of a group to/from this set.

Unfortunately, the function bfq_active_extract may also be invoked for
a group that is still active (to preserve the correct update of the
next queue to serve; see the comments in the function
bfq_no_longer_next_in_service() for details). In this case, removing
the weight of the group makes the set group_weights_tree
inconsistent. Service-guarantee violations follow.

This commit addresses this issue by moving group_weights_tree
insertions from their previous location (in bfq_active_insert) into
the function __bfq_activate_entity, and by moving group_weights_tree
extractions from bfq_active_extract to when the entity that represents
a group remains thoroughly idle, i.e., with no request either enqueued
or dispatched.
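
For illustration only (not part of the patch), the user-space sketch
below replaces the rbtree-based group_weights_tree with a tiny counted
set (all names invented, no bounds checking) to show how removing the
weight of a still-active group makes the asymmetry check lie:

#include <stdbool.h>
#include <stdio.h>

#define MAX_WEIGHTS 8

/* Invented, simplified stand-in for group_weights_tree: a counted set
 * of the distinct weights of the active groups. */
struct weights_set {
        int weight[MAX_WEIGHTS];
        int refs[MAX_WEIGHTS];
        int nr;
};

static void weights_add(struct weights_set *s, int w)
{
        for (int i = 0; i < s->nr; i++) {
                if (s->weight[i] == w) {
                        s->refs[i]++;
                        return;
                }
        }
        s->weight[s->nr] = w;
        s->refs[s->nr] = 1;
        s->nr++;
}

static void weights_remove(struct weights_set *s, int w)
{
        for (int i = 0; i < s->nr; i++) {
                if (s->weight[i] != w)
                        continue;
                if (--s->refs[i] == 0) {
                        s->nr--;
                        s->weight[i] = s->weight[s->nr];
                        s->refs[i] = s->refs[s->nr];
                }
                return;
        }
}

/* Sufficient condition used to detect asymmetry: at least two distinct
 * weights among active groups. */
static bool asymmetric(const struct weights_set *s)
{
        return s->nr > 1;
}

int main(void)
{
        struct weights_set s = { .nr = 0 };

        weights_add(&s, 100); /* group A, weight 100, active */
        weights_add(&s, 500); /* group B, weight 500, active */
        printf("asymmetric=%d\n", asymmetric(&s)); /* 1: plugging needed */

        /* The bug: B is extracted from the active tree for bookkeeping
         * reasons while still being active; dropping its weight here
         * makes the set claim symmetry, and plugging is wrongly
         * disabled for higher-weight queues. */
        weights_remove(&s, 500);
        printf("asymmetric=%d\n", asymmetric(&s)); /* 0, yet B is active */
        return 0;
}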

Signed-off-by: Paolo Valente <[email protected]>
---
block/bfq-iosched.c | 45 +++++++++++++++++++++++++++++++++++++++++----
block/bfq-iosched.h | 7 +++++--
block/bfq-wf2q.c | 24 +++++++++++++-----------
3 files changed, 59 insertions(+), 17 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 495b9ddb3355..3f32e88c7e9b 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -742,8 +742,9 @@ void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_entity *entity,
* See the comments to the function bfq_weights_tree_add() for considerations
* about overhead.
*/
-void bfq_weights_tree_remove(struct bfq_data *bfqd, struct bfq_entity *entity,
- struct rb_root *root)
+void __bfq_weights_tree_remove(struct bfq_data *bfqd,
+ struct bfq_entity *entity,
+ struct rb_root *root)
{
if (!entity->weight_counter)
return;
@@ -759,6 +760,43 @@ void bfq_weights_tree_remove(struct bfq_data *bfqd, struct bfq_entity *entity,
entity->weight_counter = NULL;
}

+/*
+ * Invoke __bfq_weights_tree_remove on bfqq and all its inactive
+ * parent entities.
+ */
+void bfq_weights_tree_remove(struct bfq_data *bfqd,
+ struct bfq_queue *bfqq)
+{
+ struct bfq_entity *entity = bfqq->entity.parent;
+
+ __bfq_weights_tree_remove(bfqd, &bfqq->entity,
+ &bfqd->queue_weights_tree);
+
+ for_each_entity(entity) {
+ struct bfq_sched_data *sd = entity->my_sched_data;
+
+ if (sd->next_in_service || sd->in_service_entity) {
+ /*
+ * entity is still active, because either
+ * next_in_service or in_service_entity is not
+ * NULL (see the comments on the definition of
+ * next_in_service for details on why
+ * in_service_entity must be checked too).
+ *
+ * As a consequence, the weight of entity is
+ * not to be removed. In addition, if entity
+ * is active, then its parent entities are
+ * active as well, and thus their weights are
+ * not to be removed either. In the end, this
+ * loop must stop here.
+ */
+ break;
+ }
+ __bfq_weights_tree_remove(bfqd, entity,
+ &bfqd->group_weights_tree);
+ }
+}
+
/*
* Return expired entry, or NULL to just start from scratch in rbtree.
*/
@@ -4582,8 +4620,7 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)
*/
bfqq->budget_timeout = jiffies;

- bfq_weights_tree_remove(bfqd, &bfqq->entity,
- &bfqd->queue_weights_tree);
+ bfq_weights_tree_remove(bfqd, bfqq);
}

now_ns = ktime_get_ns();
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 0f712e03b035..a8a2e5aca4d4 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -827,8 +827,11 @@ struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic);
void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq);
void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_entity *entity,
struct rb_root *root);
-void bfq_weights_tree_remove(struct bfq_data *bfqd, struct bfq_entity *entity,
- struct rb_root *root);
+void __bfq_weights_tree_remove(struct bfq_data *bfqd,
+ struct bfq_entity *entity,
+ struct rb_root *root);
+void bfq_weights_tree_remove(struct bfq_data *bfqd,
+ struct bfq_queue *bfqq);
void bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq,
bool compensate, enum bfqq_expiration reason);
void bfq_put_queue(struct bfq_queue *bfqq);
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index 4498c43245e2..58cf38fcee05 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -499,9 +499,6 @@ static void bfq_active_insert(struct bfq_service_tree *st,
if (bfqq)
list_add(&bfqq->bfqq_list, &bfqq->bfqd->active_list);
#ifdef CONFIG_BFQ_GROUP_IOSCHED
- else /* bfq_group */
- bfq_weights_tree_add(bfqd, entity, &bfqd->group_weights_tree);
-
if (bfqg != bfqd->root_group)
bfqg->active_entities++;
#endif
@@ -601,10 +598,6 @@ static void bfq_active_extract(struct bfq_service_tree *st,
if (bfqq)
list_del(&bfqq->bfqq_list);
#ifdef CONFIG_BFQ_GROUP_IOSCHED
- else /* bfq_group */
- bfq_weights_tree_remove(bfqd, entity,
- &bfqd->group_weights_tree);
-
if (bfqg != bfqd->root_group)
bfqg->active_entities--;
#endif
@@ -799,7 +792,7 @@ __bfq_entity_update_weight_prio(struct bfq_service_tree *old_st,
if (prev_weight != new_weight) {
root = bfqq ? &bfqd->queue_weights_tree :
&bfqd->group_weights_tree;
- bfq_weights_tree_remove(bfqd, entity, root);
+ __bfq_weights_tree_remove(bfqd, entity, root);
}
entity->weight = new_weight;
/*
@@ -971,7 +964,7 @@ static void bfq_update_fin_time_enqueue(struct bfq_entity *entity,
* one of its children receives a new request.
*
* Basically, this function updates the timestamps of entity and
- * inserts entity into its active tree, ater possibly extracting it
+ * inserts entity into its active tree, after possibly extracting it
* from its idle tree.
*/
static void __bfq_activate_entity(struct bfq_entity *entity,
@@ -1015,6 +1008,16 @@ static void __bfq_activate_entity(struct bfq_entity *entity,
entity->on_st = true;
}

+#ifdef CONFIG_BFQ_GROUP_IOSCHED
+ if (!bfq_entity_to_bfqq(entity)) { /* bfq_group */
+ struct bfq_group *bfqg =
+ container_of(entity, struct bfq_group, entity);
+
+ bfq_weights_tree_add(bfqg->bfqd, entity,
+ &bfqg->bfqd->group_weights_tree);
+ }
+#endif
+
bfq_update_fin_time_enqueue(entity, st, backshifted);
}

@@ -1664,8 +1667,7 @@ void bfq_del_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq,
bfqd->busy_queues--;

if (!bfqq->dispatched)
- bfq_weights_tree_remove(bfqd, &bfqq->entity,
- &bfqd->queue_weights_tree);
+ bfq_weights_tree_remove(bfqd, bfqq);

if (bfqq->wr_coeff > 1)
bfqd->wr_busy_queues--;
--
2.16.1


2018-06-26 18:19:54

by Holger Hoffstätte

Subject: Re: [PATCH BUGFIX RESEND 0/4] bfq: fix bugs breaking bandwidth guarantees occasionally

On 06/25/18 21:55, Paolo Valente wrote:
> Hi,
> during some bandwidth tests, I found some occasional but severe
> malfunctions (losses of bandwidth control). The first three patches in
> this series fix the bugs that caused these malfunctions. The last
> patch improves the name of one of the functions involved in these
> bugs.
>
> I guess these patches are appropriate for the next kernel release.

Ran these overnight on 2 machines on top of recent BFQ and nothing
caught on fire. One funny benchmark result stood out since it gave
me (repeatedly!) 560 MB/s read bandwidth on an SSD which is rated to
do "up to 550MB/s", so I guess BFQ's bandwidth guarantees are now
really quite strong. :-)

Therefore:
Tested-by: Holger Hoffstätte <[email protected]>

cheers,
Holger

2018-06-26 20:45:07

by Oleksandr Natalenko

Subject: Re: [PATCH BUGFIX RESEND 0/4] bfq: fix bugs breaking bandwidth guarantees occasionally

Hi.

On 25.06.2018 21:55, Paolo Valente wrote:
> Hi,
> during some bandwidth tests, I found some occasional but severe
> malfunctions (losses of bandwidth control). The first three patches in
> this series fix the bugs that caused these malfunctions. The last
> patch improves the name of one of the functions involved in these
> bugs.
>
> I guess these patches are appropriate for the next kernel release.
>
> Thanks,
> Paolo
>
> Paolo Valente (4):
> block, bfq: add/remove entity weights correctly
> block, bfq: do not expire a queue that will deserve dispatch plugging
> block, bfq: fix service being wrongly set to zero in case of
> preemption
> block, bfq: give a better name to bfq_bfqq_may_idle
>
> block/bfq-iosched.c | 131 +++++++++++++++++++++++++++++++++++++++++++---------
> block/bfq-iosched.h | 7 ++-
> block/bfq-wf2q.c | 30 ++++++------
> 3 files changed, 128 insertions(+), 40 deletions(-)
>
> --
> 2.16.1

So far, no smoke or visible issues while running it on my 2 SSDs with a
daily workload applied.

Tested-by: Oleksandr Natalenko <[email protected]>

Thanks.

--
Oleksandr Natalenko (post-factum)

2018-06-29 04:21:55

by Jens Axboe

Subject: Re: [PATCH BUGFIX RESEND 0/4] bfq: fix bugs breaking bandwidth guarantees occasionally

On 6/25/18 1:55 PM, Paolo Valente wrote:
> Hi,
> during some bandwidth tests, I found some occasional but severe
> malfunctions (losses of bandwidth control). The first three patches in
> this series fix the bugs that caused these malfunctions. The last
> patch improves the name of one of the functions involved in these
> bugs.

Applied for 4.19, thanks.

--
Jens Axboe