Subject: Re: [PATCH] block, bfq: improve asymmetric scenarios detection
From: Paolo Valente
Date: Fri, 12 Oct 2018 12:01:01 +0200
To: Jens Axboe
Cc: linux-block, linux-kernel, ulf.hansson@linaro.org,
    linus.walleij@linaro.org, broonie@kernel.org,
    bfq-iosched@googlegroups.com, oleksandr@natalenko.name,
    Federico Motta
In-Reply-To: <20181012095557.13744-1-paolo.valente@linaro.org>
References: <20181012095557.13744-1-paolo.valente@linaro.org>

Hi Jens,
this is based against next, and meant for 4.20, if we are not too late
again (and, of course, if the patch is acceptable).
Thanks,
Paolo

> On 12 Oct 2018, at 11:55, Paolo Valente wrote:
> 
> From: Federico Motta
> 
> bfq defines as asymmetric a scenario where an active entity, say E
> (representing either a single bfq_queue or a group of other entities),
> has a higher weight than some other entities. If the entity E does sync
> I/O in such a scenario, then bfq plugs the dispatch of the I/O of the
> other entities in the following situation: E is in service but
> temporarily has no pending I/O request. In fact, without this plugging,
> every time E stops being temporarily idle, it may find the
> internal queues of the storage device already filled with an
> out-of-control number of extra requests, from other entities. So E may
> have to wait for the service of these extra requests, before finally
> having its own requests served. This may easily break service
> guarantees, with E getting less than its fair share of the device
> throughput. Usually, the end result is that E gets the same fraction of
> the throughput as the other entities, instead of getting more, according
> to its higher weight.
> 
> Yet there are two other, more subtle cases where E, even if its weight is
> actually equal to or even lower than the weight of any other active
> entity, may get less than its fair share of the throughput if the
> above I/O plugging is not performed:
> 1. other entities issue larger requests than E;
> 2. other entities contain more active child entities than E (or in
>    general tend to have more backlog than E).
> 
> In the first case, other entities may get more service than E because
> they get larger requests than those of E served during the temporary
> idle periods of E. In the second case, other entities get more service
> because, by having many child entities, they have many requests ready
> for dispatching while E is temporarily idle.
> 
> This commit addresses this issue by extending the definition of
> asymmetric scenario: a scenario is asymmetric when
> - active entities representing bfq_queues have differentiated weights,
>   as in the original definition
> or (inclusive)
> - one or more entities representing groups of entities are active.
> 
> This broader definition makes sure that I/O plugging will be performed
> in all the above cases, provided that there is at least one active
> group. Of course, this definition is very coarse, so it will trigger
> I/O plugging also in cases where it is not needed, such as, e.g.,
> multiple active entities with just one child each, and all with the same
> I/O-request size. The reason for this coarse definition is just that a
> finer-grained definition would be rather heavy to compute.
> 
> On the opposite end, even this new definition does not trigger I/O
> plugging in any case where there is no active group and all bfq_queues
> have the same weight. So, in these cases some unfairness may occur if
> there are asymmetries in I/O-request sizes. We made this choice because
> I/O plugging may lower throughput, and probably a user that has not
> created any group cares more about throughput than about perfect
> fairness. At any rate, as for possible applications that may care about
> service guarantees, bfq already guarantees a high responsiveness and a
> low latency to soft real-time applications automatically.
> 
> Signed-off-by: Federico Motta
> Signed-off-by: Paolo Valente
> ---
> block/bfq-iosched.c | 223 ++++++++++++++++++++++++--------------------
> block/bfq-iosched.h |  27 +++---
> block/bfq-wf2q.c    |  36 +++----
> 3 files changed, 155 insertions(+), 131 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 1a1b80dfd69d..6075100f03a5 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -624,12 +624,13 @@ void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq)
>  }
>  
>  /*
> - * Tell whether there are active queues or groups with differentiated weights.
> + * Tell whether there are active queues with different weights or
> + * active groups.
>   */
> -static bool bfq_differentiated_weights(struct bfq_data *bfqd)
> +static bool bfq_varied_queue_weights_or_active_groups(struct bfq_data *bfqd)
>  {
>  	/*
> -	 * For weights to differ, at least one of the trees must contain
> +	 * For queue weights to differ, queue_weights_tree must contain
>  	 * at least two nodes.
>  	 */
>  	return (!RB_EMPTY_ROOT(&bfqd->queue_weights_tree) &&
> @@ -637,9 +638,7 @@ static bool bfq_differentiated_weights(struct bfq_data *bfqd)
>  		 bfqd->queue_weights_tree.rb_node->rb_right)
>  #ifdef CONFIG_BFQ_GROUP_IOSCHED
>  	       ) ||
> -	       (!RB_EMPTY_ROOT(&bfqd->group_weights_tree) &&
> -		(bfqd->group_weights_tree.rb_node->rb_left ||
> -		 bfqd->group_weights_tree.rb_node->rb_right)
> +	       (bfqd->num_active_groups > 0
>  #endif
>  	       );
>  }
> @@ -657,26 +656,25 @@ static bool bfq_differentiated_weights(struct bfq_data *bfqd)
>   * 3) all active groups at the same level in the groups tree have the same
>   *    number of children.
>   *
> - * Unfortunately, keeping the necessary state for evaluating exactly the
> - * above symmetry conditions would be quite complex and time-consuming.
> - * Therefore this function evaluates, instead, the following stronger
> - * sub-conditions, for which it is much easier to maintain the needed
> - * state:
> + * Unfortunately, keeping the necessary state for evaluating exactly
> + * the last two symmetry sub-conditions above would be quite complex
> + * and time consuming. Therefore this function evaluates, instead,
> + * only the following stronger two sub-conditions, for which it is
> + * much easier to maintain the needed state:
>   * 1) all active queues have the same weight,
> - * 2) all active groups have the same weight,
> - * 3) all active groups have at most one active child each.
> - * In particular, the last two conditions are always true if hierarchical
> - * support and the cgroups interface are not enabled, thus no state needs
> - * to be maintained in this case.
> + * 2) there are no active groups.
> + * In particular, the last condition is always true if hierarchical
> + * support or the cgroups interface are not enabled, thus no state
> + * needs to be maintained in this case.
>   */
>  static bool bfq_symmetric_scenario(struct bfq_data *bfqd)
>  {
> -	return !bfq_differentiated_weights(bfqd);
> +	return !bfq_varied_queue_weights_or_active_groups(bfqd);
>  }
>  
>  /*
>   * If the weight-counter tree passed as input contains no counter for
> - * the weight of the input entity, then add that counter; otherwise just
> + * the weight of the input queue, then add that counter; otherwise just
>   * increment the existing counter.
>   *
>   * Note that weight-counter trees contain few nodes in mostly symmetric
> @@ -687,25 +685,25 @@ static bool bfq_symmetric_scenario(struct bfq_data *bfqd)
>   * In most scenarios, the rate at which nodes are created/destroyed
>   * should be low too.
>   */
> -void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_entity *entity,
> +void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>  			  struct rb_root *root)
>  {
> +	struct bfq_entity *entity = &bfqq->entity;
>  	struct rb_node **new = &(root->rb_node), *parent = NULL;
>  
>  	/*
> -	 * Do not insert if the entity is already associated with a
> +	 * Do not insert if the queue is already associated with a
>  	 * counter, which happens if:
> -	 * 1) the entity is associated with a queue,
> -	 * 2) a request arrival has caused the queue to become both
> +	 * 1) a request arrival has caused the queue to become both
>  	 *    non-weight-raised, and hence change its weight, and
>  	 *    backlogged; in this respect, each of the two events
>  	 *    causes an invocation of this function,
> -	 * 3) this is the invocation of this function caused by the
> +	 * 2) this is the invocation of this function caused by the
>  	 *    second event. This second invocation is actually useless,
>  	 *    and we handle this fact by exiting immediately. More
>  	 *    efficient or clearer solutions might possibly be adopted.
>  	 */
> -	if (entity->weight_counter)
> +	if (bfqq->weight_counter)
>  		return;
>  
>  	while (*new) {
> @@ -715,7 +713,7 @@ void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_entity *entity,
>  		parent = *new;
>  
>  		if (entity->weight == __counter->weight) {
> -			entity->weight_counter = __counter;
> +			bfqq->weight_counter = __counter;
>  			goto inc_counter;
>  		}
>  		if (entity->weight < __counter->weight)
> @@ -724,66 +722,67 @@ void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_entity *entity,
>  			new = &((*new)->rb_right);
>  	}
>  
> -	entity->weight_counter = kzalloc(sizeof(struct bfq_weight_counter),
> -					 GFP_ATOMIC);
> +	bfqq->weight_counter = kzalloc(sizeof(struct bfq_weight_counter),
> +				       GFP_ATOMIC);
>  
>  	/*
>  	 * In the unlucky event of an allocation failure, we just
> -	 * exit. This will cause the weight of entity to not be
> -	 * considered in bfq_differentiated_weights, which, in its
> -	 * turn, causes the scenario to be deemed wrongly symmetric in
> -	 * case entity's weight would have been the only weight making
> -	 * the scenario asymmetric. On the bright side, no unbalance
> -	 * will however occur when entity becomes inactive again (the
> -	 * invocation of this function is triggered by an activation
> -	 * of entity). In fact, bfq_weights_tree_remove does nothing
> -	 * if !entity->weight_counter.
> +	 * exit. This will cause the weight of queue to not be
> +	 * considered in bfq_varied_queue_weights_or_active_groups,
> +	 * which, in its turn, causes the scenario to be deemed
> +	 * wrongly symmetric in case bfqq's weight would have been
> +	 * the only weight making the scenario asymmetric. On the
> +	 * bright side, no unbalance will however occur when bfqq
> +	 * becomes inactive again (the invocation of this function
> +	 * is triggered by an activation of queue). In fact,
> +	 * bfq_weights_tree_remove does nothing if
> +	 * !bfqq->weight_counter.
>  	 */
> -	if (unlikely(!entity->weight_counter))
> +	if (unlikely(!bfqq->weight_counter))
>  		return;
>  
> -	entity->weight_counter->weight = entity->weight;
> -	rb_link_node(&entity->weight_counter->weights_node, parent, new);
> -	rb_insert_color(&entity->weight_counter->weights_node, root);
> +	bfqq->weight_counter->weight = entity->weight;
> +	rb_link_node(&bfqq->weight_counter->weights_node, parent, new);
> +	rb_insert_color(&bfqq->weight_counter->weights_node, root);
>  
>  inc_counter:
> -	entity->weight_counter->num_active++;
> +	bfqq->weight_counter->num_active++;
>  }
>  
>  /*
> - * Decrement the weight counter associated with the entity, and, if the
> + * Decrement the weight counter associated with the queue, and, if the
>   * counter reaches 0, remove the counter from the tree.
>   * See the comments to the function bfq_weights_tree_add() for considerations
>   * about overhead.
>   */
>  void __bfq_weights_tree_remove(struct bfq_data *bfqd,
> -			       struct bfq_entity *entity,
> +			       struct bfq_queue *bfqq,
>  			       struct rb_root *root)
>  {
> -	if (!entity->weight_counter)
> +	if (!bfqq->weight_counter)
>  		return;
>  
> -	entity->weight_counter->num_active--;
> -	if (entity->weight_counter->num_active > 0)
> +	bfqq->weight_counter->num_active--;
> +	if (bfqq->weight_counter->num_active > 0)
>  		goto reset_entity_pointer;
>  
> -	rb_erase(&entity->weight_counter->weights_node, root);
> -	kfree(entity->weight_counter);
> +	rb_erase(&bfqq->weight_counter->weights_node, root);
> +	kfree(bfqq->weight_counter);
>  
>  reset_entity_pointer:
> -	entity->weight_counter = NULL;
> +	bfqq->weight_counter = NULL;
>  }
>  
>  /*
> - * Invoke __bfq_weights_tree_remove on bfqq and all its inactive
> - * parent entities.
> + * Invoke __bfq_weights_tree_remove on bfqq and decrement the number
> + * of active groups for each queue's inactive parent entity.
>   */
>  void bfq_weights_tree_remove(struct bfq_data *bfqd,
>  			     struct bfq_queue *bfqq)
>  {
>  	struct bfq_entity *entity = bfqq->entity.parent;
>  
> -	__bfq_weights_tree_remove(bfqd, &bfqq->entity,
> +	__bfq_weights_tree_remove(bfqd, bfqq,
>  				  &bfqd->queue_weights_tree);
>  
>  	for_each_entity(entity) {
> @@ -797,17 +796,13 @@ void bfq_weights_tree_remove(struct bfq_data *bfqd,
>  			 * next_in_service for details on why
>  			 * in_service_entity must be checked too).
>  			 *
> -			 * As a consequence, the weight of entity is
> -			 * not to be removed. In addition, if entity
> -			 * is active, then its parent entities are
> -			 * active as well, and thus their weights are
> -			 * not to be removed either. In the end, this
> -			 * loop must stop here.
> +			 * As a consequence, its parent entities are
> +			 * active as well, and thus this loop must
> +			 * stop here.
>  			 */
>  			break;
>  		}
> -		__bfq_weights_tree_remove(bfqd, entity,
> -					  &bfqd->group_weights_tree);
> +		bfqd->num_active_groups--;
>  	}
>  }
>  
> @@ -3506,9 +3501,11 @@ static bool bfq_better_to_idle(struct bfq_queue *bfqq)
>  	 * symmetric scenario where:
>  	 * (i) each of these processes must get the same throughput as
>  	 *     the others;
> -	 * (ii) all these processes have the same I/O pattern
> -		(either sequential or random).
> -	 * In fact, in such a scenario, the drive will tend to treat
> +	 * (ii) the I/O of each process has the same properties, in
> +	 *      terms of locality (sequential or random), direction
> +	 *      (reads or writes), request sizes, greediness
> +	 *      (from I/O-bound to sporadic), and so on.
> +	 * In fact, in such a scenario, the drive tends to treat
>  	 * the requests of each of these processes in about the same
>  	 * way as the requests of the others, and thus to provide
>  	 * each of these processes with about the same throughput
> @@ -3517,18 +3514,50 @@ static bool bfq_better_to_idle(struct bfq_queue *bfqq)
>  	 * certainly needed to guarantee that bfqq receives its
>  	 * assigned fraction of the device throughput (see [1] for
>  	 * details).
> +	 * The problem is that idling may significantly reduce
> +	 * throughput with certain combinations of types of I/O and
> +	 * devices. An important example is sync random I/O, on flash
> +	 * storage with command queueing. So, unless bfqq falls in the
> +	 * above cases where idling also boosts throughput, it would
> +	 * be important to check conditions (i) and (ii) accurately,
> +	 * so as to avoid idling when not strictly needed for service
> +	 * guarantees.
> +	 *
> +	 * Unfortunately, it is extremely difficult to thoroughly
> +	 * check condition (ii). And, in case there are active groups,
> +	 * it becomes very difficult to check condition (i) too. In
> +	 * fact, if there are active groups, then, for condition (i)
> +	 * to become false, it is enough that an active group contains
> +	 * more active processes or sub-groups than some other active
> +	 * group. We address this issue with the following bi-modal
> +	 * behavior, implemented in the function
> +	 * bfq_symmetric_scenario().
>  	 *
> -	 * We address this issue by controlling, actually, only the
> -	 * symmetry sub-condition (i), i.e., provided that
> -	 * sub-condition (i) holds, idling is not performed,
> -	 * regardless of whether sub-condition (ii) holds. In other
> -	 * words, only if sub-condition (i) holds, then idling is
> +	 * If there are active groups, then the scenario is tagged as
> +	 * asymmetric, conservatively, without checking any of the
> +	 * conditions (i) and (ii). So the device is idled for bfqq.
> +	 * This behavior matches also the fact that groups are created
> +	 * exactly if controlling I/O (to preserve bandwidth and
> +	 * latency guarantees) is a primary concern.
> +	 *
> +	 * On the opposite end, if there are no active groups, then
> +	 * only condition (i) is actually controlled, i.e., provided
> +	 * that condition (i) holds, idling is not performed,
> +	 * regardless of whether condition (ii) holds. In other words,
> +	 * only if condition (i) does not hold, then idling is
>  	 * allowed, and the device tends to be prevented from queueing
> -	 * many requests, possibly of several processes. The reason
> -	 * for not controlling also sub-condition (ii) is that we
> -	 * exploit preemption to preserve guarantees in case of
> -	 * symmetric scenarios, even if (ii) does not hold, as
> -	 * explained in the next two paragraphs.
> +	 * many requests, possibly of several processes. Since there
> +	 * are no active groups, then, to control condition (i) it is
> +	 * enough to check whether all active queues have the same
> +	 * weight.
> +	 *
> +	 * Not checking condition (ii) evidently exposes bfqq to the
> +	 * risk of getting less throughput than its fair share.
> +	 * However, for queues with the same weight, a further
> +	 * mechanism, preemption, mitigates or even eliminates this
> +	 * problem. And it does so without consequences on overall
> +	 * throughput. This mechanism and its benefits are explained
> +	 * in the next three paragraphs.
>  	 *
>  	 * Even if a queue, say Q, is expired when it remains idle, Q
>  	 * can still preempt the new in-service queue if the next
> @@ -3542,11 +3571,7 @@ static bool bfq_better_to_idle(struct bfq_queue *bfqq)
>  	 * idling allows the internal queues of the device to contain
>  	 * many requests, and thus to reorder requests, we can rather
>  	 * safely assume that the internal scheduler still preserves a
> -	 * minimum of mid-term fairness. The motivation for using
> -	 * preemption instead of idling is that, by not idling,
> -	 * service guarantees are preserved without minimally
> -	 * sacrificing throughput. In other words, both a high
> -	 * throughput and its desired distribution are obtained.
> +	 * minimum of mid-term fairness.
>  	 *
>  	 * More precisely, this preemption-based, idleless approach
>  	 * provides fairness in terms of IOPS, and not sectors per
> @@ -3565,27 +3590,27 @@ static bool bfq_better_to_idle(struct bfq_queue *bfqq)
>  	 * 1024/8 times as high as the service received by the other
>  	 * queue.
>  	 *
> -	 * On the other hand, device idling is performed, and thus
> -	 * pure sector-domain guarantees are provided, for the
> -	 * following queues, which are likely to need stronger
> -	 * throughput guarantees: weight-raised queues, and queues
> -	 * with a higher weight than other queues. When such queues
> -	 * are active, sub-condition (i) is false, which triggers
> -	 * device idling.
> +	 * The motivation for using preemption instead of idling (for
> +	 * queues with the same weight) is that, by not idling,
> +	 * service guarantees are preserved (completely or at least in
> +	 * part) without minimally sacrificing throughput. And, if
> +	 * there is no active group, then the primary expectation for
> +	 * this device is probably a high throughput.
>  	 *
> -	 * According to the above considerations, the next variable is
> -	 * true (only) if sub-condition (i) holds. To compute the
> -	 * value of this variable, we not only use the return value of
> -	 * the function bfq_symmetric_scenario(), but also check
> -	 * whether bfqq is being weight-raised, because
> -	 * bfq_symmetric_scenario() does not take into account also
> -	 * weight-raised queues (see comments on
> -	 * bfq_weights_tree_add()). In particular, if bfqq is being
> -	 * weight-raised, it is important to idle only if there are
> -	 * other, non-weight-raised queues that may steal throughput
> -	 * to bfqq. Actually, we should be even more precise, and
> -	 * differentiate between interactive weight raising and
> -	 * soft real-time weight raising.
> +	 * We are now left only with explaining the additional
> +	 * compound condition that is checked below for deciding
> +	 * whether the scenario is asymmetric. To explain this
> +	 * compound condition, we need to add that the function
> +	 * bfq_symmetric_scenario checks the weights of only
> +	 * non-weight-raised queues, for efficiency reasons (see
> +	 * comments on bfq_weights_tree_add()). Then the fact that
> +	 * bfqq is weight-raised is checked explicitly here. More
> +	 * precisely, the compound condition below takes into account
> +	 * also the fact that, even if bfqq is being weight-raised,
> +	 * the scenario is still symmetric if all active queues happen
> +	 * to be weight-raised. Actually, we should be even more
> +	 * precise here, and differentiate between interactive weight
> +	 * raising and soft real-time weight raising.
>  	 *
>  	 * As a side note, it is worth considering that the above
>  	 * device-idling countermeasures may however fail in the
> @@ -5392,7 +5417,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
>  	bfqd->idle_slice_timer.function = bfq_idle_slice_timer;
>  
>  	bfqd->queue_weights_tree = RB_ROOT;
> -	bfqd->group_weights_tree = RB_ROOT;
> +	bfqd->num_active_groups = 0;
>  
>  	INIT_LIST_HEAD(&bfqd->active_list);
>  	INIT_LIST_HEAD(&bfqd->idle_list);
> diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
> index 37d627afdc2e..77651d817ecd 100644
> --- a/block/bfq-iosched.h
> +++ b/block/bfq-iosched.h
> @@ -108,15 +108,14 @@ struct bfq_sched_data {
>  };
>  
>  /**
> - * struct bfq_weight_counter - counter of the number of all active entities
> + * struct bfq_weight_counter - counter of the number of all active queues
>   * with a given weight.
>   */
>  struct bfq_weight_counter {
> -	unsigned int weight; /* weight of the entities this counter refers to */
> -	unsigned int num_active; /* nr of active entities with this weight */
> +	unsigned int weight; /* weight of the queues this counter refers to */
> +	unsigned int num_active; /* nr of active queues with this weight */
>  	/*
> -	 * Weights tree member (see bfq_data's @queue_weights_tree and
> -	 * @group_weights_tree)
> +	 * Weights tree member (see bfq_data's @queue_weights_tree)
>  	 */
>  	struct rb_node weights_node;
>  };
> @@ -151,8 +150,6 @@ struct bfq_weight_counter {
>  struct bfq_entity {
>  	/* service_tree member */
>  	struct rb_node rb_node;
> -	/* pointer to the weight counter associated with this entity */
> -	struct bfq_weight_counter *weight_counter;
>  
>  	/*
>  	 * Flag, true if the entity is on a tree (either the active or
> @@ -266,6 +263,9 @@ struct bfq_queue {
>  	/* entity representing this queue in the scheduler */
>  	struct bfq_entity entity;
>  
> +	/* pointer to the weight counter associated with this entity */
> +	struct bfq_weight_counter *weight_counter;
> +
>  	/* maximum budget allowed from the feedback mechanism */
>  	int max_budget;
>  	/* budget expiration (in jiffies) */
> @@ -449,14 +449,9 @@ struct bfq_data {
>  	 */
>  	struct rb_root queue_weights_tree;
>  	/*
> -	 * rbtree of non-queue @bfq_entity weight counters, sorted by
> -	 * weight. Used to keep track of whether all @bfq_groups have
> -	 * the same weight. The tree contains one counter for each
> -	 * distinct weight associated to some active @bfq_group (see
> -	 * the comments to the functions bfq_weights_tree_[add|remove]
> -	 * for further details).
> +	 * number of groups with requests still waiting for completion
>  	 */
> -	struct rb_root group_weights_tree;
> +	unsigned int num_active_groups;
>  
>  	/*
>  	 * Number of bfq_queues containing requests (including the
> @@ -851,10 +846,10 @@ struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync);
>  void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync);
>  struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic);
>  void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq);
> -void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_entity *entity,
> +void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq,
>  			  struct rb_root *root);
>  void __bfq_weights_tree_remove(struct bfq_data *bfqd,
> -			       struct bfq_entity *entity,
> +			       struct bfq_queue *bfqq,
>  			       struct rb_root *root);
>  void bfq_weights_tree_remove(struct bfq_data *bfqd,
>  			     struct bfq_queue *bfqq);
> diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
> index ff7c2d470bb8..476b5a90a5a4 100644
> --- a/block/bfq-wf2q.c
> +++ b/block/bfq-wf2q.c
> @@ -788,25 +788,29 @@ __bfq_entity_update_weight_prio(struct bfq_service_tree *old_st,
>  		new_weight = entity->orig_weight *
>  			     (bfqq ? bfqq->wr_coeff : 1);
>  		/*
> -		 * If the weight of the entity changes, remove the entity
> -		 * from its old weight counter (if there is a counter
> -		 * associated with the entity), and add it to the counter
> -		 * associated with its new weight.
> +		 * If the weight of the entity changes, and the entity is a
> +		 * queue, remove the entity from its old weight counter (if
> +		 * there is a counter associated with the entity).
>  		 */
>  		if (prev_weight != new_weight) {
> -			root = bfqq ? &bfqd->queue_weights_tree :
> -				      &bfqd->group_weights_tree;
> -			__bfq_weights_tree_remove(bfqd, entity, root);
> +			if (bfqq) {
> +				root = &bfqd->queue_weights_tree;
> +				__bfq_weights_tree_remove(bfqd, bfqq, root);
> +			} else
> +				bfqd->num_active_groups--;
>  		}
>  		entity->weight = new_weight;
>  		/*
> -		 * Add the entity to its weights tree only if it is
> -		 * not associated with a weight-raised queue.
> +		 * Add the entity, if it is not a weight-raised queue,
> +		 * to the counter associated with its new weight.
>  		 */
> -		if (prev_weight != new_weight &&
> -		    (bfqq ? bfqq->wr_coeff == 1 : 1))
> -			/* If we get here, root has been initialized. */
> -			bfq_weights_tree_add(bfqd, entity, root);
> +		if (prev_weight != new_weight) {
> +			if (bfqq && bfqq->wr_coeff == 1) {
> +				/* If we get here, root has been initialized. */
> +				bfq_weights_tree_add(bfqd, bfqq, root);
> +			} else
> +				bfqd->num_active_groups++;
> +		}
>  
>  		new_st->wsum += entity->weight;
>  
> @@ -1012,9 +1016,9 @@ static void __bfq_activate_entity(struct bfq_entity *entity,
>  	if (!bfq_entity_to_bfqq(entity)) { /* bfq_group */
>  		struct bfq_group *bfqg =
>  			container_of(entity, struct bfq_group, entity);
> +		struct bfq_data *bfqd = bfqg->bfqd;
>  
> -		bfq_weights_tree_add(bfqg->bfqd, entity,
> -				     &bfqd->group_weights_tree);
> +		bfqd->num_active_groups++;
>  	}
>  #endif
>  
> @@ -1692,7 +1696,7 @@ void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
>  
>  	if (!bfqq->dispatched)
>  		if (bfqq->wr_coeff == 1)
> -			bfq_weights_tree_add(bfqd, &bfqq->entity,
> +			bfq_weights_tree_add(bfqd, bfqq,
>  					     &bfqd->queue_weights_tree);
>  
>  	if (bfqq->wr_coeff > 1)
> -- 
> 2.19.1
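
P.S. For quick reference, the symmetry test that the patch converges on
boils down to the sketch below. This is illustrative only, distilled
from the hunks above: it folds the CONFIG_BFQ_GROUP_IOSCHED conditional
away and assumes the patched definitions in block/bfq-iosched.h
(queue_weights_tree holding one counter node per distinct active queue
weight, plus the new num_active_groups field); it is not a verbatim
excerpt of the patched code.

    /*
     * Sketch: the scenario is deemed symmetric only if all active
     * queues share one weight (queue_weights_tree then holds at most
     * one node) and no group has requests still in flight.
     */
    static bool bfq_symmetric_scenario_sketch(struct bfq_data *bfqd)
    {
    	/* two or more tree nodes mean differentiated queue weights */
    	bool varied_queue_weights =
    		!RB_EMPTY_ROOT(&bfqd->queue_weights_tree) &&
    		(bfqd->queue_weights_tree.rb_node->rb_left ||
    		 bfqd->queue_weights_tree.rb_node->rb_right);

    	return !(varied_queue_weights || bfqd->num_active_groups > 0);
    }

In other words, a single rbtree emptiness/shape check plus one integer
comparison replaces the old walk over a second, per-group weights tree,
which is what makes the broader asymmetry definition cheap to evaluate.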