2022-04-16 10:56:47

by Yu Kuai

Subject: [PATCH -next v2 0/5] support concurrent sync io for bfq on a special occasion

Changes in v2:
- Use a different approach to count the root group, which is much simpler.

Currently, bfq can't handle sync io concurrently as long as it is not
issued from the root group. This is because
'bfqd->num_groups_with_pending_reqs > 0' is always true in
bfq_asymmetric_scenario().

The way that a bfqg is counted into 'num_groups_with_pending_reqs':

Before this patchset:
1) root group will never be counted.
2) Count if bfqg or its child bfqgs have pending requests.
3) Don't count if bfqg and its child bfqgs complete all the requests.

After this patchset:
1) root group is counted.
2) Count if bfqg has pending requests.
This is because, for example:
if sync io is issued from cgroup /root/c1/c2 and counting stayed
subtree-based, root, c1 and c2 would all be counted into
'num_groups_with_pending_reqs', which makes it impossible to handle
sync io concurrently.

3) Don't count if bfqg completes all the requests.
This is because, for example:
t1 issues sync io on the root group, t2 and t3 issue sync io on the same
child group; num_groups_with_pending_reqs is 2 now. If the decrement
stayed subtree-based, it would still be 2 after t1 stops, and sync io
from t2 and t3 still couldn't be handled concurrently.
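
To make the new counting rule concrete, here is a minimal standalone
sketch (plain userspace C, not kernel code) of the transitions that
patch 4 applies to the per-group counter introduced in patch 3; the
struct and function names are simplified stand-ins:

#include <stdio.h>

struct bfqg_sim {	/* stands in for struct bfq_group */
	int num_entities_with_pending_reqs;
};

struct bfqd_sim {	/* stands in for struct bfq_data */
	int num_groups_with_pending_reqs;
};

/* The first entity with pending reqs makes its group counted. */
static void entity_gets_pending(struct bfqd_sim *bfqd, struct bfqg_sim *bfqg)
{
	if (!(bfqg->num_entities_with_pending_reqs++))
		bfqd->num_groups_with_pending_reqs++;
}

/* The group stops being counted when its last entity completes all reqs. */
static void entity_completes_all(struct bfqd_sim *bfqd, struct bfqg_sim *bfqg)
{
	if (!(--bfqg->num_entities_with_pending_reqs))
		bfqd->num_groups_with_pending_reqs--;
}

int main(void)
{
	struct bfqd_sim bfqd = { 0 };
	struct bfqg_sim root = { 0 }, child = { 0 };

	entity_gets_pending(&bfqd, &root);	/* t1 on root group */
	entity_gets_pending(&bfqd, &child);	/* t2 on child group */
	entity_gets_pending(&bfqd, &child);	/* t3 on same child */
	printf("counted: %d\n", bfqd.num_groups_with_pending_reqs); /* 2 */

	entity_completes_all(&bfqd, &root);	/* t1's reqs complete */
	printf("counted: %d\n", bfqd.num_groups_with_pending_reqs); /* 1 */
	return 0;
}

With the old subtree-based decrement the counter in this example would
stay at 2 after t1 stops; here it drops to 1, which is what lets the
last patch detect the single-group case.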

fio test script: startdelay is used to avoid queue merging
[global]
filename=/dev/nvme0n1
allow_mounted_write=0
ioengine=psync
direct=1
ioscheduler=bfq
offset_increment=10g
group_reporting
rw=randwrite
bs=4k

[test1]
numjobs=1

[test2]
startdelay=1
numjobs=1

[test3]
startdelay=2
numjobs=1

[test4]
startdelay=3
numjobs=1

[test5]
startdelay=4
numjobs=1

[test6]
startdelay=5
numjobs=1

[test7]
startdelay=6
numjobs=1

[test8]
startdelay=7
numjobs=1

test result:
running fio on root cgroup
v5.18-rc1: 550 MiB/s
v5.18-rc1-patched: 550 MiB/s

running fio on non-root cgroup
v5.18-rc1: 349 MiB/s
v5.18-rc1-patched: 550 MiB/s

Yu Kuai (5):
block, bfq: cleanup bfq_weights_tree add/remove apis
block, bfq: add fake weight_counter for weight-raised queue
bfq, block: record how many queues have pending requests in bfq_group
block, bfq: refactor the counting of 'num_groups_with_pending_reqs'
block, bfq: do not idle if only one cgroup is activated

block/bfq-cgroup.c | 1 +
block/bfq-iosched.c | 90 +++++++++++++++++++--------------------------
block/bfq-iosched.h | 26 ++++++-------
block/bfq-wf2q.c | 30 +++------------
4 files changed, 56 insertions(+), 91 deletions(-)

--
2.31.1


2022-04-16 10:57:54

by Yu Kuai

Subject: [PATCH -next v2 1/5] block, bfq: cleanup bfq_weights_tree add/remove apis

These functions already take 'bfqd' as the first parameter, so there is
no need to also pass 'bfqd->queue_weights_tree' as another parameter.

Signed-off-by: Yu Kuai <[email protected]>
---
block/bfq-iosched.c | 14 +++++++-------
block/bfq-iosched.h | 7 ++-----
block/bfq-wf2q.c | 16 +++++-----------
3 files changed, 14 insertions(+), 23 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 2e0dd68a3cbe..2deea2d07a1f 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -862,9 +862,9 @@ static bool bfq_asymmetric_scenario(struct bfq_data *bfqd,
* In most scenarios, the rate at which nodes are created/destroyed
* should be low too.
*/
-void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq,
- struct rb_root_cached *root)
+void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq)
{
+ struct rb_root_cached *root = &bfqd->queue_weights_tree;
struct bfq_entity *entity = &bfqq->entity;
struct rb_node **new = &(root->rb_root.rb_node), *parent = NULL;
bool leftmost = true;
@@ -936,13 +936,14 @@ void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq,
* See the comments to the function bfq_weights_tree_add() for considerations
* about overhead.
*/
-void __bfq_weights_tree_remove(struct bfq_data *bfqd,
- struct bfq_queue *bfqq,
- struct rb_root_cached *root)
+void __bfq_weights_tree_remove(struct bfq_data *bfqd, struct bfq_queue *bfqq)
{
+ struct rb_root_cached *root;
+
if (!bfqq->weight_counter)
return;

+ root = &bfqd->queue_weights_tree;
bfqq->weight_counter->num_active--;
if (bfqq->weight_counter->num_active > 0)
goto reset_entity_pointer;
@@ -1004,8 +1005,7 @@ void bfq_weights_tree_remove(struct bfq_data *bfqd,
* has no dispatched request. DO NOT use bfqq after the next
* function invocation.
*/
- __bfq_weights_tree_remove(bfqd, bfqq,
- &bfqd->queue_weights_tree);
+ __bfq_weights_tree_remove(bfqd, bfqq);
}

/*
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 3b83e3d1c2e5..072099b0c11a 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -969,11 +969,8 @@ struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync);
void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync);
struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic);
void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq);
-void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq,
- struct rb_root_cached *root);
-void __bfq_weights_tree_remove(struct bfq_data *bfqd,
- struct bfq_queue *bfqq,
- struct rb_root_cached *root);
+void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq);
+void __bfq_weights_tree_remove(struct bfq_data *bfqd, struct bfq_queue *bfqq);
void bfq_weights_tree_remove(struct bfq_data *bfqd,
struct bfq_queue *bfqq);
void bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq,
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index f8eb340381cf..a1296058c1ec 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -707,7 +707,6 @@ __bfq_entity_update_weight_prio(struct bfq_service_tree *old_st,
struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
unsigned int prev_weight, new_weight;
struct bfq_data *bfqd = NULL;
- struct rb_root_cached *root;
#ifdef CONFIG_BFQ_GROUP_IOSCHED
struct bfq_sched_data *sd;
struct bfq_group *bfqg;
@@ -770,19 +769,15 @@ __bfq_entity_update_weight_prio(struct bfq_service_tree *old_st,
* queue, remove the entity from its old weight counter (if
* there is a counter associated with the entity).
*/
- if (prev_weight != new_weight && bfqq) {
- root = &bfqd->queue_weights_tree;
- __bfq_weights_tree_remove(bfqd, bfqq, root);
- }
+ if (prev_weight != new_weight && bfqq)
+ __bfq_weights_tree_remove(bfqd, bfqq);
entity->weight = new_weight;
/*
* Add the entity, if it is not a weight-raised queue,
* to the counter associated with its new weight.
*/
- if (prev_weight != new_weight && bfqq && bfqq->wr_coeff == 1) {
- /* If we get here, root has been initialized. */
- bfq_weights_tree_add(bfqd, bfqq, root);
- }
+ if (prev_weight != new_weight && bfqq && bfqq->wr_coeff == 1)
+ bfq_weights_tree_add(bfqd, bfqq);

new_st->wsum += entity->weight;

@@ -1686,8 +1681,7 @@ void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)

if (!bfqq->dispatched)
if (bfqq->wr_coeff == 1)
- bfq_weights_tree_add(bfqd, bfqq,
- &bfqd->queue_weights_tree);
+ bfq_weights_tree_add(bfqd, bfqq);

if (bfqq->wr_coeff > 1)
bfqd->wr_busy_queues++;
--
2.31.1

2022-04-16 12:14:51

by Yu Kuai

Subject: [PATCH -next v2 4/5] block, bfq: refactor the counting of 'num_groups_with_pending_reqs'

Currently, bfq can't handle sync io concurrently as long as it is not
issued from the root group. This is because
'bfqd->num_groups_with_pending_reqs > 0' is always true in
bfq_asymmetric_scenario().

The way that a bfqg is counted into 'num_groups_with_pending_reqs':

Before this patch:
1) root group will never be counted.
2) Count if bfqg or its child bfqgs have pending requests.
3) Don't count if bfqg and its child bfqgs complete all the requests.

After this patch:
1) root group is counted.
2) Count if bfqg has pending requests.
3) Don't count if bfqg completes all the requests.

With this patch, the case where only one group has pending requests
can be detected, and the next patch will support concurrent sync io
in that case.

Signed-off-by: Yu Kuai <[email protected]>
---
block/bfq-iosched.c | 52 ++++++---------------------------------------
block/bfq-iosched.h | 18 ++++++++--------
block/bfq-wf2q.c | 13 ------------
3 files changed, 15 insertions(+), 68 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 994c6b36a5d5..39abcd95df8e 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -941,7 +941,8 @@ void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq)
#ifdef CONFIG_BFQ_GROUP_IOSCHED
if (!entity->in_groups_with_pending_reqs) {
entity->in_groups_with_pending_reqs = true;
- bfqq_group(bfqq)->num_entities_with_pending_reqs++;
+ if (!(bfqq_group(bfqq)->num_entities_with_pending_reqs++))
+ bfqd->num_groups_with_pending_reqs++;
}
#endif
}
@@ -974,7 +975,8 @@ void __bfq_weights_tree_remove(struct bfq_data *bfqd, struct bfq_queue *bfqq)
#ifdef CONFIG_BFQ_GROUP_IOSCHED
if (bfqq->entity.in_groups_with_pending_reqs) {
bfqq->entity.in_groups_with_pending_reqs = false;
- bfqq_group(bfqq)->num_entities_with_pending_reqs--;
+ if (!(--bfqq_group(bfqq)->num_entities_with_pending_reqs))
+ bfqd->num_groups_with_pending_reqs--;
}
#endif

@@ -989,48 +991,6 @@ void __bfq_weights_tree_remove(struct bfq_data *bfqd, struct bfq_queue *bfqq)
void bfq_weights_tree_remove(struct bfq_data *bfqd,
struct bfq_queue *bfqq)
{
- struct bfq_entity *entity = bfqq->entity.parent;
-
- for_each_entity(entity) {
- struct bfq_sched_data *sd = entity->my_sched_data;
-
- if (sd->next_in_service || sd->in_service_entity) {
- /*
- * entity is still active, because either
- * next_in_service or in_service_entity is not
- * NULL (see the comments on the definition of
- * next_in_service for details on why
- * in_service_entity must be checked too).
- *
- * As a consequence, its parent entities are
- * active as well, and thus this loop must
- * stop here.
- */
- break;
- }
-
- /*
- * The decrement of num_groups_with_pending_reqs is
- * not performed immediately upon the deactivation of
- * entity, but it is delayed to when it also happens
- * that the first leaf descendant bfqq of entity gets
- * all its pending requests completed. The following
- * instructions perform this delayed decrement, if
- * needed. See the comments on
- * num_groups_with_pending_reqs for details.
- */
- if (entity->in_groups_with_pending_reqs) {
- entity->in_groups_with_pending_reqs = false;
- bfqd->num_groups_with_pending_reqs--;
- }
- }
-
- /*
- * Next function is invoked last, because it causes bfqq to be
- * freed if the following holds: bfqq is not in service and
- * has no dispatched request. DO NOT use bfqq after the next
- * function invocation.
- */
__bfq_weights_tree_remove(bfqd, bfqq);
}

@@ -3710,7 +3670,7 @@ static void bfq_dispatch_remove(struct request_queue *q, struct request *rq)
* group. More precisely, for conditions (i-a) or (i-b) to become
* false because of such a group, it is not even necessary that the
* group is (still) active: it is sufficient that, even if the group
- * has become inactive, some of its descendant processes still have
+ * has become inactive, some of its processes still have
* some request already dispatched but still waiting for
* completion. In fact, requests have still to be guaranteed their
* share of the throughput even after being dispatched. In this
@@ -3719,7 +3679,7 @@ static void bfq_dispatch_remove(struct request_queue *q, struct request *rq)
* happens, the group is not considered in the calculation of whether
* the scenario is asymmetric, then the group may fail to be
* guaranteed its fair share of the throughput (basically because
- * idling may not be performed for the descendant processes of the
+ * idling may not be performed for the processes of the
* group, but it had to be). We address this issue with the following
* bi-modal behavior, implemented in the function
* bfq_asymmetric_scenario().
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 5e1a0ead2b6a..0850ca03e1d5 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -495,27 +495,27 @@ struct bfq_data {
struct rb_root_cached queue_weights_tree;

/*
- * Number of groups with at least one descendant process that
+ * Number of groups with at least one process that
* has at least one request waiting for completion. Note that
* this accounts for also requests already dispatched, but not
* yet completed. Therefore this number of groups may differ
* (be larger) than the number of active groups, as a group is
* considered active only if its corresponding entity has
- * descendant queues with at least one request queued. This
+ * queues with at least one request queued. This
* number is used to decide whether a scenario is symmetric.
* For a detailed explanation see comments on the computation
* of the variable asymmetric_scenario in the function
* bfq_better_to_idle().
*
* However, it is hard to compute this number exactly, for
- * groups with multiple descendant processes. Consider a group
- * that is inactive, i.e., that has no descendant process with
+ * groups with multiple processes. Consider a group
+ * that is inactive, i.e., that has no process with
* pending I/O inside BFQ queues. Then suppose that
* num_groups_with_pending_reqs is still accounting for this
- * group, because the group has descendant processes with some
+ * group, because the group has processes with some
* I/O request still in flight. num_groups_with_pending_reqs
* should be decremented when the in-flight request of the
- * last descendant process is finally completed (assuming that
+ * last process is finally completed (assuming that
* nothing else has changed for the group in the meantime, in
* terms of composition of the group and active/inactive state of child
* groups and processes). To accomplish this, an additional
@@ -524,7 +524,7 @@ struct bfq_data {
* we resort to the following tradeoff between simplicity and
* accuracy: for an inactive group that is still counted in
* num_groups_with_pending_reqs, we decrement
- * num_groups_with_pending_reqs when the first descendant
+ * num_groups_with_pending_reqs when the first
* process of the group remains with no request waiting for
* completion.
*
@@ -532,12 +532,12 @@ struct bfq_data {
* carefulness: to avoid multiple decrements, we flag a group,
* more precisely an entity representing a group, as still
* counted in num_groups_with_pending_reqs when it becomes
- * inactive. Then, when the first descendant queue of the
+ * inactive. Then, when the first queue of the
* entity remains with no request waiting for completion,
* num_groups_with_pending_reqs is decremented, and this flag
* is reset. After this flag is reset for the entity,
* num_groups_with_pending_reqs won't be decremented any
- * longer in case a new descendant queue of the entity remains
+ * longer in case a new queue of the entity remains
* with no request waiting for completion.
*/
unsigned int num_groups_with_pending_reqs;
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index ae12c6b2c525..e848d5d2bcdc 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -979,19 +979,6 @@ static void __bfq_activate_entity(struct bfq_entity *entity,
entity->on_st_or_in_serv = true;
}

-#ifdef CONFIG_BFQ_GROUP_IOSCHED
- if (!bfq_entity_to_bfqq(entity)) { /* bfq_group */
- struct bfq_group *bfqg =
- container_of(entity, struct bfq_group, entity);
- struct bfq_data *bfqd = bfqg->bfqd;
-
- if (!entity->in_groups_with_pending_reqs) {
- entity->in_groups_with_pending_reqs = true;
- bfqd->num_groups_with_pending_reqs++;
- }
- }
-#endif
-
bfq_update_fin_time_enqueue(entity, st, backshifted);
}

--
2.31.1

2022-04-16 13:39:21

by Yu Kuai

Subject: [PATCH -next v2 2/5] block, bfq: add fake weight_counter for weight-raised queue

A weight-raised queue is not inserted into weights_tree, which makes it
impossible to track how many queues have pending requests through
weights_tree insertion and removal. This patch adds a fake weight_counter
for weight-raised queues to do that.

Signed-off-by: Yu Kuai <[email protected]>
---
block/bfq-iosched.c | 11 +++++++++++
block/bfq-wf2q.c | 5 ++---
2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 2deea2d07a1f..a2977c938c70 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -134,6 +134,8 @@
#include "bfq-iosched.h"
#include "blk-wbt.h"

+#define BFQ_FAKE_WEIGHT_COUNTER ((void *) POISON_INUSE)
+
#define BFQ_BFQQ_FNS(name) \
void bfq_mark_bfqq_##name(struct bfq_queue *bfqq) \
{ \
@@ -884,6 +886,12 @@ void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq)
if (bfqq->weight_counter)
return;

+ if (bfqq->wr_coeff != 1) {
+ bfqq->weight_counter = BFQ_FAKE_WEIGHT_COUNTER;
+ bfqq->ref++;
+ return;
+ }
+
while (*new) {
struct bfq_weight_counter *__counter = container_of(*new,
struct bfq_weight_counter,
@@ -943,6 +951,9 @@ void __bfq_weights_tree_remove(struct bfq_data *bfqd, struct bfq_queue *bfqq)
if (!bfqq->weight_counter)
return;

+ if (bfqq->weight_counter == BFQ_FAKE_WEIGHT_COUNTER)
+ goto reset_entity_pointer;
+
root = &bfqd->queue_weights_tree;
bfqq->weight_counter->num_active--;
if (bfqq->weight_counter->num_active > 0)
diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index a1296058c1ec..ae12c6b2c525 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -776,7 +776,7 @@ __bfq_entity_update_weight_prio(struct bfq_service_tree *old_st,
* Add the entity, if it is not a weight-raised queue,
* to the counter associated with its new weight.
*/
- if (prev_weight != new_weight && bfqq && bfqq->wr_coeff == 1)
+ if (prev_weight != new_weight && bfqq)
bfq_weights_tree_add(bfqd, bfqq);

new_st->wsum += entity->weight;
@@ -1680,8 +1680,7 @@ void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
bfqd->busy_queues[bfqq->ioprio_class - 1]++;

if (!bfqq->dispatched)
- if (bfqq->wr_coeff == 1)
- bfq_weights_tree_add(bfqd, bfqq);
+ bfq_weights_tree_add(bfqd, bfqq);

if (bfqq->wr_coeff > 1)
bfqd->wr_busy_queues++;
--
2.31.1

2022-04-17 08:36:01

by Yu Kuai

Subject: [PATCH -next v2 5/5] block, bfq: do not idle if only one cgroup is activated

Now that the root group is counted into 'num_groups_with_pending_reqs',
'num_groups_with_pending_reqs > 0' is always true in
bfq_asymmetric_scenario().

Thus change the condition to 'num_groups_with_pending_reqs > 1'.

On the other hand, now that 'num_groups_with_pending_reqs' represents
how many groups have pending requests, this change enables concurrent
sync io when only one cgroup is activated.

Signed-off-by: Yu Kuai <[email protected]>
---
block/bfq-iosched.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 39abcd95df8e..7d9f94882f8e 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -846,7 +846,7 @@ static bool bfq_asymmetric_scenario(struct bfq_data *bfqd,

return varied_queue_weights || multiple_classes_busy
#ifdef CONFIG_BFQ_GROUP_IOSCHED
- || bfqd->num_groups_with_pending_reqs > 0
+ || bfqd->num_groups_with_pending_reqs > 1
#endif
;
}
--
2.31.1

2022-04-24 13:17:21

by Yu Kuai

Subject: Re: [PATCH -next v2 0/5] support concurrent sync io for bfq on a special occasion

friendly ping ...

On 2022/04/16 17:37, Yu Kuai wrote:
> Changes in v2:
> - Use a different approach to count the root group, which is much simpler.
>
> Currently, bfq can't handle sync io concurrently as long as it is not
> issued from the root group. This is because
> 'bfqd->num_groups_with_pending_reqs > 0' is always true in
> bfq_asymmetric_scenario().
>
> The way that a bfqg is counted into 'num_groups_with_pending_reqs':
>
> Before this patchset:
> 1) root group will never be counted.
> 2) Count if bfqg or its child bfqgs have pending requests.
> 3) Don't count if bfqg and its child bfqgs complete all the requests.
>
> After this patchset:
> 1) root group is counted.
> 2) Count if bfqg has pending requests.
> This is because, for example:
> if sync io is issued from cgroup /root/c1/c2 and counting stayed
> subtree-based, root, c1 and c2 would all be counted into
> 'num_groups_with_pending_reqs', which makes it impossible to handle
> sync io concurrently.
>
> 3) Don't count if bfqg completes all the requests.
> This is because, for example:
> t1 issues sync io on the root group, t2 and t3 issue sync io on the same
> child group; num_groups_with_pending_reqs is 2 now. If the decrement
> stayed subtree-based, it would still be 2 after t1 stops, and sync io
> from t2 and t3 still couldn't be handled concurrently.
>
> fio test script: startdelay is used to avoid queue merging
> [global]
> filename=/dev/nvme0n1
> allow_mounted_write=0
> ioengine=psync
> direct=1
> ioscheduler=bfq
> offset_increment=10g
> group_reporting
> rw=randwrite
> bs=4k
>
> [test1]
> numjobs=1
>
> [test2]
> startdelay=1
> numjobs=1
>
> [test3]
> startdelay=2
> numjobs=1
>
> [test4]
> startdelay=3
> numjobs=1
>
> [test5]
> startdelay=4
> numjobs=1
>
> [test6]
> startdelay=5
> numjobs=1
>
> [test7]
> startdelay=6
> numjobs=1
>
> [test8]
> startdelay=7
> numjobs=1
>
> test result:
> running fio on root cgroup
> v5.18-rc1: 550 MiB/s
> v5.18-rc1-patched: 550 MiB/s
>
> running fio on non-root cgroup
> v5.18-rc1: 349 MiB/s
> v5.18-rc1-patched: 550 MiB/s
>
> Yu Kuai (5):
> block, bfq: cleanup bfq_weights_tree add/remove apis
> block, bfq: add fake weight_counter for weight-raised queue
> bfq, block: record how many queues have pending requests in bfq_group
> block, bfq: refactor the counting of 'num_groups_with_pending_reqs'
> block, bfq: do not idle if only one cgroup is activated
>
> block/bfq-cgroup.c | 1 +
> block/bfq-iosched.c | 90 +++++++++++++++++++--------------------------
> block/bfq-iosched.h | 26 ++++++-------
> block/bfq-wf2q.c | 30 +++------------
> 4 files changed, 56 insertions(+), 91 deletions(-)
>

2022-04-25 10:54:27

by Jan Kara

Subject: Re: [PATCH -next v2 2/5] block, bfq: add fake weight_counter for weight-raised queue

On Sat 16-04-22 17:37:50, Yu Kuai wrote:
> A weight-raised queue is not inserted into weights_tree, which makes it
> impossible to track how many queues have pending requests through
> weights_tree insertion and removal. This patch adds a fake weight_counter
> for weight-raised queues to do that.
>
> Signed-off-by: Yu Kuai <[email protected]>

This is a bit hacky. I was looking into a better place where to hook to
count entities in a bfq_group with requests and I think bfq_add_bfqq_busy()
and bfq_del_bfqq_busy() are ideal for this. It also makes better sense
conceptually than hooking into weights tree handling.
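
For concreteness, a rough sketch of the suggested hook placement
(illustrative kernel-style code, not a tested patch; it reuses the
weights-tree helpers from this series and elides the existing bodies):

void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
{
	/* ... existing busy bookkeeping ... */

	/*
	 * The queue becomes busy with no request dispatched yet:
	 * start counting it (and, through the per-group counter
	 * from this series, its bfqg).
	 */
	if (!bfqq->dispatched)
		bfq_weights_tree_add(bfqd, bfqq);
}

void bfq_del_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
{
	/* ... existing busy bookkeeping ... */

	/*
	 * The queue leaves the busy tree: stop counting it, even
	 * though some of its dispatched requests may still be
	 * uncompleted.
	 */
	if (!bfqq->dispatched)
		bfq_weights_tree_remove(bfqd, bfqq);
}

The del side is where this differs from hooking into the weights tree:
a queue can stop being counted while it still has dispatched but
uncompleted requests, which is the point debated below.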

Other than this the rest of the series looks fine to me.

Honza

> ---
> block/bfq-iosched.c | 11 +++++++++++
> block/bfq-wf2q.c | 5 ++---
> 2 files changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 2deea2d07a1f..a2977c938c70 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -134,6 +134,8 @@
> #include "bfq-iosched.h"
> #include "blk-wbt.h"
>
> +#define BFQ_FAKE_WEIGHT_COUNTER ((void *) POISON_INUSE)
> +
> #define BFQ_BFQQ_FNS(name) \
> void bfq_mark_bfqq_##name(struct bfq_queue *bfqq) \
> { \
> @@ -884,6 +886,12 @@ void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq)
> if (bfqq->weight_counter)
> return;
>
> + if (bfqq->wr_coeff != 1) {
> + bfqq->weight_counter = BFQ_FAKE_WEIGHT_COUNTER;
> + bfqq->ref++;
> + return;
> + }
> +
> while (*new) {
> struct bfq_weight_counter *__counter = container_of(*new,
> struct bfq_weight_counter,
> @@ -943,6 +951,9 @@ void __bfq_weights_tree_remove(struct bfq_data *bfqd, struct bfq_queue *bfqq)
> if (!bfqq->weight_counter)
> return;
>
> + if (bfqq->weight_counter == BFQ_FAKE_WEIGHT_COUNTER)
> + goto reset_entity_pointer;
> +
> root = &bfqd->queue_weights_tree;
> bfqq->weight_counter->num_active--;
> if (bfqq->weight_counter->num_active > 0)
> diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
> index a1296058c1ec..ae12c6b2c525 100644
> --- a/block/bfq-wf2q.c
> +++ b/block/bfq-wf2q.c
> @@ -776,7 +776,7 @@ __bfq_entity_update_weight_prio(struct bfq_service_tree *old_st,
> * Add the entity, if it is not a weight-raised queue,
> * to the counter associated with its new weight.
> */
> - if (prev_weight != new_weight && bfqq && bfqq->wr_coeff == 1)
> + if (prev_weight != new_weight && bfqq)
> bfq_weights_tree_add(bfqd, bfqq);
>
> new_st->wsum += entity->weight;
> @@ -1680,8 +1680,7 @@ void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
> bfqd->busy_queues[bfqq->ioprio_class - 1]++;
>
> if (!bfqq->dispatched)
> - if (bfqq->wr_coeff == 1)
> - bfq_weights_tree_add(bfqd, bfqq);
> + bfq_weights_tree_add(bfqd, bfqq);
>
> if (bfqq->wr_coeff > 1)
> bfqd->wr_busy_queues++;
> --
> 2.31.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-04-25 11:38:39

by Jan Kara

Subject: Re: [PATCH -next v2 1/5] block, bfq: cleanup bfq_weights_tree add/remove apis

On Sat 16-04-22 17:37:49, Yu Kuai wrote:
> These functions already take 'bfqd' as the first parameter, so there is
> no need to also pass 'bfqd->queue_weights_tree' as another parameter.
>
> Signed-off-by: Yu Kuai <[email protected]>

Nice cleanup. Feel free to add:

Reviewed-by: Jan Kara <[email protected]>

Honza

> ---
> block/bfq-iosched.c | 14 +++++++-------
> block/bfq-iosched.h | 7 ++-----
> block/bfq-wf2q.c | 16 +++++-----------
> 3 files changed, 14 insertions(+), 23 deletions(-)
>
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 2e0dd68a3cbe..2deea2d07a1f 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -862,9 +862,9 @@ static bool bfq_asymmetric_scenario(struct bfq_data *bfqd,
> * In most scenarios, the rate at which nodes are created/destroyed
> * should be low too.
> */
> -void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq,
> - struct rb_root_cached *root)
> +void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq)
> {
> + struct rb_root_cached *root = &bfqd->queue_weights_tree;
> struct bfq_entity *entity = &bfqq->entity;
> struct rb_node **new = &(root->rb_root.rb_node), *parent = NULL;
> bool leftmost = true;
> @@ -936,13 +936,14 @@ void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq,
> * See the comments to the function bfq_weights_tree_add() for considerations
> * about overhead.
> */
> -void __bfq_weights_tree_remove(struct bfq_data *bfqd,
> - struct bfq_queue *bfqq,
> - struct rb_root_cached *root)
> +void __bfq_weights_tree_remove(struct bfq_data *bfqd, struct bfq_queue *bfqq)
> {
> + struct rb_root_cached *root;
> +
> if (!bfqq->weight_counter)
> return;
>
> + root = &bfqd->queue_weights_tree;
> bfqq->weight_counter->num_active--;
> if (bfqq->weight_counter->num_active > 0)
> goto reset_entity_pointer;
> @@ -1004,8 +1005,7 @@ void bfq_weights_tree_remove(struct bfq_data *bfqd,
> * has no dispatched request. DO NOT use bfqq after the next
> * function invocation.
> */
> - __bfq_weights_tree_remove(bfqd, bfqq,
> - &bfqd->queue_weights_tree);
> + __bfq_weights_tree_remove(bfqd, bfqq);
> }
>
> /*
> diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
> index 3b83e3d1c2e5..072099b0c11a 100644
> --- a/block/bfq-iosched.h
> +++ b/block/bfq-iosched.h
> @@ -969,11 +969,8 @@ struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync);
> void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq, bool is_sync);
> struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic);
> void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq);
> -void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq,
> - struct rb_root_cached *root);
> -void __bfq_weights_tree_remove(struct bfq_data *bfqd,
> - struct bfq_queue *bfqq,
> - struct rb_root_cached *root);
> +void bfq_weights_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq);
> +void __bfq_weights_tree_remove(struct bfq_data *bfqd, struct bfq_queue *bfqq);
> void bfq_weights_tree_remove(struct bfq_data *bfqd,
> struct bfq_queue *bfqq);
> void bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq,
> diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
> index f8eb340381cf..a1296058c1ec 100644
> --- a/block/bfq-wf2q.c
> +++ b/block/bfq-wf2q.c
> @@ -707,7 +707,6 @@ __bfq_entity_update_weight_prio(struct bfq_service_tree *old_st,
> struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
> unsigned int prev_weight, new_weight;
> struct bfq_data *bfqd = NULL;
> - struct rb_root_cached *root;
> #ifdef CONFIG_BFQ_GROUP_IOSCHED
> struct bfq_sched_data *sd;
> struct bfq_group *bfqg;
> @@ -770,19 +769,15 @@ __bfq_entity_update_weight_prio(struct bfq_service_tree *old_st,
> * queue, remove the entity from its old weight counter (if
> * there is a counter associated with the entity).
> */
> - if (prev_weight != new_weight && bfqq) {
> - root = &bfqd->queue_weights_tree;
> - __bfq_weights_tree_remove(bfqd, bfqq, root);
> - }
> + if (prev_weight != new_weight && bfqq)
> + __bfq_weights_tree_remove(bfqd, bfqq);
> entity->weight = new_weight;
> /*
> * Add the entity, if it is not a weight-raised queue,
> * to the counter associated with its new weight.
> */
> - if (prev_weight != new_weight && bfqq && bfqq->wr_coeff == 1) {
> - /* If we get here, root has been initialized. */
> - bfq_weights_tree_add(bfqd, bfqq, root);
> - }
> + if (prev_weight != new_weight && bfqq && bfqq->wr_coeff == 1)
> + bfq_weights_tree_add(bfqd, bfqq);
>
> new_st->wsum += entity->weight;
>
> @@ -1686,8 +1681,7 @@ void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
>
> if (!bfqq->dispatched)
> if (bfqq->wr_coeff == 1)
> - bfq_weights_tree_add(bfqd, bfqq,
> - &bfqd->queue_weights_tree);
> + bfq_weights_tree_add(bfqd, bfqq);
>
> if (bfqq->wr_coeff > 1)
> bfqd->wr_busy_queues++;
> --
> 2.31.1
>
--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-04-25 14:48:34

by Yu Kuai

Subject: Re: [PATCH -next v2 2/5] block, bfq: add fake weight_counter for weight-raised queue

On 2022/04/25 17:48, Jan Kara wrote:
> On Sat 16-04-22 17:37:50, Yu Kuai wrote:
>> A weight-raised queue is not inserted into weights_tree, which makes it
>> impossible to track how many queues have pending requests through
>> weights_tree insertion and removal. This patch adds a fake weight_counter
>> for weight-raised queues to do that.
>>
>> Signed-off-by: Yu Kuai <[email protected]>
>
> This is a bit hacky. I was looking into a better place where to hook to
> count entities in a bfq_group with requests and I think bfq_add_bfqq_busy()
> and bfq_del_bfqq_busy() are ideal for this. It also makes better sense
> conceptually than hooking into weights tree handling.
>
Hi,

bfq_del_bfqq_busy() will be called when all the reqs in the bfqq are
dispatched, however there might still be some reqs that aren't
completed yet.

Here what we want to track is how many bfqqs have pending reqs,
specifically whether the bfqq has reqs that aren't completed.

Thus I think bfq_del_bfqq_busy() is not the right place to do that.

Thanks,
Kuai
> Other than this the rest of the series looks fine to me.
>
> Honza

2022-04-25 22:38:22

by Yu Kuai

Subject: Re: [PATCH -next v2 2/5] block, bfq: add fake weight_counter for weight-raised queue

On 2022/04/25 21:34, yukuai (C) wrote:
> On 2022/04/25 17:48, Jan Kara wrote:
>> On Sat 16-04-22 17:37:50, Yu Kuai wrote:
>>> A weight-raised queue is not inserted into weights_tree, which makes it
>>> impossible to track how many queues have pending requests through
>>> weights_tree insertion and removal. This patch adds a fake weight_counter
>>> for weight-raised queues to do that.
>>>
>>> Signed-off-by: Yu Kuai <[email protected]>
>>
>> This is a bit hacky. I was looking into a better place where to hook to
>> count entities in a bfq_group with requests and I think
>> bfq_add_bfqq_busy()
>> and bfq_del_bfqq_busy() are ideal for this. It also makes better sense
>> conceptually than hooking into weights tree handling.
>>
> Hi,
>
> bfq_del_bfqq_busy() will be called when all the reqs in the bfqq are
> dispatched, however there might still be some reqs that aren't
> completed yet.
>
> Here what we want to track is how many bfqqs have pending reqs,
> specifically whether the bfqq has reqs that aren't completed.
>
> Thus I think bfq_del_bfqq_busy() is not the right place to do that.

BTW, there is a counter 'dispatched' in bfqq; how about we rename it
to 'inflight', increment it when adding a req to the bfqq, and
decrement it the same as 'dispatched'?

This way we can count the bfqq when 'inflight' goes from 0 to 1, and
stop counting it when 'inflight' drops from 1 to 0.
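
A minimal standalone sketch of the idea (plain C, not kernel code; the
'inflight' field and the helpers are hypothetical):

#include <stdio.h>

struct bfqq_sim {
	int inflight;	/* would replace bfqq->dispatched, counting
			 * from insertion into the bfqq to completion */
};

/* inc when a req is added to the bfqq; 0 -> 1 starts counting */
static void req_added(struct bfqq_sim *bfqq, int *nr_bfqqs_with_pending)
{
	if (!(bfqq->inflight++))
		(*nr_bfqqs_with_pending)++;
}

/* dec at completion, the same point as 'dispatched' today;
 * 1 -> 0 stops counting */
static void req_completed(struct bfqq_sim *bfqq, int *nr_bfqqs_with_pending)
{
	if (!(--bfqq->inflight))
		(*nr_bfqqs_with_pending)--;
}

int main(void)
{
	struct bfqq_sim q = { 0 };
	int nr_bfqqs_with_pending = 0;

	req_added(&q, &nr_bfqqs_with_pending);		/* 0 -> 1: counted */
	req_added(&q, &nr_bfqqs_with_pending);		/* 1 -> 2: no change */
	req_completed(&q, &nr_bfqqs_with_pending);	/* 2 -> 1: no change */
	req_completed(&q, &nr_bfqqs_with_pending);	/* 1 -> 0: uncounted */
	printf("bfqqs with pending reqs: %d\n", nr_bfqqs_with_pending); /* 0 */
	return 0;
}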

2022-04-26 02:00:50

by Jan Kara

Subject: Re: [PATCH -next v2 2/5] block, bfq: add fake weight_counter for weight-raised queue

On Mon 25-04-22 21:55:46, yukuai (C) wrote:
> On 2022/04/25 21:34, yukuai (C) wrote:
> > On 2022/04/25 17:48, Jan Kara wrote:
> > > On Sat 16-04-22 17:37:50, Yu Kuai wrote:
> > > > A weight-raised queue is not inserted into weights_tree, which makes it
> > > > impossible to track how many queues have pending requests through
> > > > weights_tree insertion and removal. This patch adds a fake weight_counter
> > > > for weight-raised queues to do that.
> > > >
> > > > Signed-off-by: Yu Kuai <[email protected]>
> > >
> > > This is a bit hacky. I was looking into a better place where to hook to
> > > count entities in a bfq_group with requests and I think
> > > bfq_add_bfqq_busy()
> > > and bfq_del_bfqq_busy() are ideal for this. It also makes better sense
> > > conceptually than hooking into weights tree handling.
> > >
> > Hi,
> >
> > bfq_del_bfqq_busy() will be called when all the reqs in the bfqq are
> > dispatched, however there might still be some reqs that aren't
> > completed yet.
> >
> > Here what we want to track is how many bfqqs have pending reqs,
> > specifically whether the bfqq has reqs that aren't completed.
> >
> > Thus I think bfq_del_bfqq_busy() is not the right place to do that.
>
> BTW, there is a counter 'dispatched' in bfqq; how about we rename it
> to 'inflight', increment it when adding a req to the bfqq, and
> decrement it the same as 'dispatched'?
>
> This way we can count the bfqq when 'inflight' goes from 0 to 1, and
> stop counting it when 'inflight' drops from 1 to 0.

Well, but 'dispatched' is used in quite a few places and it would require
quite some thinking to decide what impact using 'inflight' has there...
But we also have 'bfqq->entity.allocated', which is the number of requests
in some state associated with bfqq, and we could use that. But as I wrote
in my previous email, I'm not convinced it is really necessary...

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-04-26 10:23:23

by Yu Kuai

Subject: Re: [PATCH -next v2 2/5] block, bfq: add fake weight_counter for weight-raised queue

On 2022/04/26 0:16, Jan Kara wrote:
> Hello!
>
> On Mon 25-04-22 21:34:16, yukuai (C) wrote:
>> On 2022/04/25 17:48, Jan Kara wrote:
>>> On Sat 16-04-22 17:37:50, Yu Kuai wrote:
>>>> A weight-raised queue is not inserted into weights_tree, which makes it
>>>> impossible to track how many queues have pending requests through
>>>> weights_tree insertion and removal. This patch adds a fake weight_counter
>>>> for weight-raised queues to do that.
>>>>
>>>> Signed-off-by: Yu Kuai <[email protected]>
>>>
>>> This is a bit hacky. I was looking into a better place where to hook to
>>> count entities in a bfq_group with requests and I think bfq_add_bfqq_busy()
>>> and bfq_del_bfqq_busy() are ideal for this. It also makes better sense
>>> conceptually than hooking into weights tree handling.
>>
>> bfq_del_bfqq_busy() will be called when all the reqs in the bfqq are
>> dispatched, however there might still be some reqs that aren't
>> completed yet.
>>
>> Here what we want to track is how many bfqqs have pending reqs,
>> specifically whether the bfqq has reqs that aren't completed.
>>
>> Thus I think bfq_del_bfqq_busy() is not the right place to do that.
>
> Yes, I'm aware there will be a difference. But note that bfqq can stay busy
> with only dispatched requests because the logic in __bfq_bfqq_expire() will
> not call bfq_del_bfqq_busy() if idling is needed for service guarantees. So
> I think using bfq_add/del_bfqq_busy() would work OK.
Hi,

I didn't think of that before. If the bfqq stays busy after dispatching
all the requests, there are two other places where the bfqq can clear
the busy state:

1) bfq_remove_request(), bfqq has to insert a new req while it's not in
service.

2) bfq_release_process_ref(), user thread is gone / moved, or old bfqq
is gone due to merge / ioprio change.

I wonder, will bfq_del_bfqq_busy() be called immediately when requests
are completed? (It seems not to me...) For example, a user thread
issues a sync io just once and keeps running without issuing new io;
then when does the bfqq clear the busy state?

Thanks,
Kuai
>
> Honza
>

2022-04-26 12:29:07

by Jan Kara

Subject: Re: [PATCH -next v2 2/5] block, bfq: add fake weight_counter for weight-raised queue

Hello!

On Mon 25-04-22 21:34:16, yukuai (C) wrote:
> On 2022/04/25 17:48, Jan Kara wrote:
> > On Sat 16-04-22 17:37:50, Yu Kuai wrote:
> > > A weight-raised queue is not inserted into weights_tree, which makes it
> > > impossible to track how many queues have pending requests through
> > > weights_tree insertion and removal. This patch adds a fake weight_counter
> > > for weight-raised queues to do that.
> > >
> > > Signed-off-by: Yu Kuai <[email protected]>
> >
> > This is a bit hacky. I was looking into a better place where to hook to
> > count entities in a bfq_group with requests and I think bfq_add_bfqq_busy()
> > and bfq_del_bfqq_busy() are ideal for this. It also makes better sense
> > conceptually than hooking into weights tree handling.
>
> bfq_del_bfqq_busy() will be called when all the reqs in the bfqq are
> dispatched, however there might still be some reqs that aren't
> completed yet.
>
> Here what we want to track is how many bfqqs have pending reqs,
> specifically whether the bfqq has reqs that aren't completed.
>
> Thus I think bfq_del_bfqq_busy() is not the right place to do that.

Yes, I'm aware there will be a difference. But note that bfqq can stay busy
with only dispatched requests because the logic in __bfq_bfqq_expire() will
not call bfq_del_bfqq_busy() if idling is needed for service guarantees. So
I think using bfq_add/del_bfqq_busy() would work OK.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-04-26 12:54:37

by Jan Kara

Subject: Re: [PATCH -next v2 2/5] block, bfq: add fake weight_counter for weight-raised queue

On Tue 26-04-22 09:49:04, yukuai (C) wrote:
> On 2022/04/26 0:16, Jan Kara wrote:
> > Hello!
> >
> > On Mon 25-04-22 21:34:16, yukuai (C) wrote:
> > > On 2022/04/25 17:48, Jan Kara wrote:
> > > > On Sat 16-04-22 17:37:50, Yu Kuai wrote:
> > > > > A weight-raised queue is not inserted into weights_tree, which makes it
> > > > > impossible to track how many queues have pending requests through
> > > > > weights_tree insertion and removal. This patch adds a fake weight_counter
> > > > > for weight-raised queues to do that.
> > > > >
> > > > > Signed-off-by: Yu Kuai <[email protected]>
> > > >
> > > > This is a bit hacky. I was looking into a better place where to hook to
> > > > count entities in a bfq_group with requests and I think bfq_add_bfqq_busy()
> > > > and bfq_del_bfqq_busy() are ideal for this. It also makes better sense
> > > > conceptually than hooking into weights tree handling.
> > >
> > > bfq_del_bfqq_busy() will be called when all the reqs in the bfqq are
> > > dispatched, however there might still be some reqs that aren't
> > > completed yet.
> > >
> > > Here what we want to track is how many bfqqs have pending reqs,
> > > specifically whether the bfqq has reqs that aren't completed.
> > >
> > > Thus I think bfq_del_bfqq_busy() is not the right place to do that.
> >
> > Yes, I'm aware there will be a difference. But note that bfqq can stay busy
> > with only dispatched requests because the logic in __bfq_bfqq_expire() will
> > not call bfq_del_bfqq_busy() if idling is needed for service guarantees. So
> > I think using bfq_add/del_bfqq_busy() would work OK.
> Hi,
>
> I didn't think of that before. If the bfqq stays busy after dispatching
> all the requests, there are two other places where the bfqq can clear
> the busy state:
>
> 1) bfq_remove_request(), bfqq has to insert a new req while it's not in
> service.

Yes and the request then would have to be dispatched or merged. Which
generally means another bfqq from the same bfqg is currently active and
thus this should have no impact on service guarantees we are interested in.

> 2) bfq_release_process_ref(), user thread is gone / moved, or old bfqq
> is gone due to merge / ioprio change.

Yes, here there's no new IO for the bfqq so no point in maintaining any
service guarantees to it.

> I wonder, will bfq_del_bfqq_busy() be called immediately when requests
> are completed? (It seems not to me...) For example, a user thread
> issues a sync io just once and keeps running without issuing new io;
> then when does the bfqq clear the busy state?

No, when bfqq is kept busy, it will get scheduled as the in-service
queue in the future. Then what happens depends on whether it will get
more requests or not. But generally its busy state will get cleared
once it is expired for a reason other than preemption.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2022-04-26 14:35:45

by Yu Kuai

Subject: Re: [PATCH -next v2 2/5] block, bfq: add fake weight_counter for weight-raised queue

On 2022/04/26 17:15, Jan Kara wrote:
> On Tue 26-04-22 16:27:46, yukuai (C) wrote:
>> On 2022/04/26 15:40, Jan Kara wrote:
>>> On Tue 26-04-22 09:49:04, yukuai (C) wrote:
>>>> On 2022/04/26 0:16, Jan Kara wrote:
>>>>> Hello!
>>>>>
>>>>> On Mon 25-04-22 21:34:16, yukuai (C) wrote:
>>>>>> On 2022/04/25 17:48, Jan Kara wrote:
>>>>>>> On Sat 16-04-22 17:37:50, Yu Kuai wrote:
>>>>>>>> A weight-raised queue is not inserted into weights_tree, which makes it
>>>>>>>> impossible to track how many queues have pending requests through
>>>>>>>> weights_tree insertion and removal. This patch adds a fake weight_counter
>>>>>>>> for weight-raised queues to do that.
>>>>>>>>
>>>>>>>> Signed-off-by: Yu Kuai <[email protected]>
>>>>>>>
>>>>>>> This is a bit hacky. I was looking into a better place where to hook to
>>>>>>> count entities in a bfq_group with requests and I think bfq_add_bfqq_busy()
>>>>>>> and bfq_del_bfqq_busy() are ideal for this. It also makes better sense
>>>>>>> conceptually than hooking into weights tree handling.
>>>>>>
>>>>>> bfq_del_bfqq_busy() will be called when all the reqs in the bfqq are
>>>>>> dispatched, however there might still be some reqs that aren't
>>>>>> completed yet.
>>>>>>
>>>>>> Here what we want to track is how many bfqqs have pending reqs,
>>>>>> specifically whether the bfqq has reqs that aren't completed.
>>>>>>
>>>>>> Thus I think bfq_del_bfqq_busy() is not the right place to do that.
>>>>>
>>>>> Yes, I'm aware there will be a difference. But note that bfqq can stay busy
>>>>> with only dispatched requests because the logic in __bfq_bfqq_expire() will
>>>>> not call bfq_del_bfqq_busy() if idling is needed for service guarantees. So
>>>>> I think using bfq_add/del_bfqq_busy() would work OK.
>>>> Hi,
>>>>
>>>> I didn't think of that before. If the bfqq stays busy after dispatching
>>>> all the requests, there are two other places where the bfqq can clear
>>>> the busy state:
>>>>
>>>> 1) bfq_remove_request(), bfqq has to insert a new req while it's not in
>>>> service.
>>>
>>> Yes and the request then would have to be dispatched or merged. Which
>>> generally means another bfqq from the same bfqg is currently active and
>>> thus this should have no impact on service guarantees we are interested in.
>>>
>>>> 2) bfq_release_process_ref(), user thread is gone / moved, or old bfqq
>>>> is gone due to merge / ioprio change.
>>>
>>> Yes, here there's no new IO for the bfqq so no point in maintaining any
>>> service guarantees to it.
>>>
>>>> I wonder, will bfq_del_bfqq_busy() be called immediately when requests
>>>> are completed? (It seems not to me...) For example, a user thread
>>>> issues a sync io just once and keeps running without issuing new io;
>>>> then when does the bfqq clear the busy state?
>>>
>>> No, when bfqq is kept busy, it will get scheduled as the in-service
>>> queue in the future. Then what happens depends on whether it will get
>>> more requests or not. But generally its busy state will get cleared
>>> once it is expired for a reason other than preemption.
>>
>> Thanks for your explanation.
>>
>> I think in the normal case using bfq_add/del_bfqq_busy() is fine.
>>
>> There is one last situation that I'm worried about: if some disks are
>> so slow that the dispatched reqs are not completed when the bfqq is
>> rescheduled as the in-service queue, the busy state can be cleared
>> while reqs are not completed yet.
>>
>> Using bfq_del_bfqq_busy() will change behaviour in this special case;
>> do you think service guarantees will be broken?
>
> Well, I don't think so. Because slow disks don't tend to do a lot of
> internal scheduling (or have deep IO queues for that matter). Also note
> that generally bfq_select_queue() will not even expire a queue (despite it
> not having any requests to dispatch) when we should not dispatch other
> requests to maintain service guarantees. So I think service guarantees will
> be generally preserved. Obviously I could be wrong, we will not know
> until we try it :).

Thanks a lot for your explanation, I'll do some tests. And I'll send a
new version if the tests look good.

2022-04-26 16:46:01

by Paolo Valente

Subject: Re: [PATCH -next v2 2/5] block, bfq: add fake weight_counter for weight-raised queue



> On 26 Apr 2022, at 11:15, Jan Kara <[email protected]> wrote:
>
> On Tue 26-04-22 16:27:46, yukuai (C) wrote:
>> On 2022/04/26 15:40, Jan Kara wrote:
>>> On Tue 26-04-22 09:49:04, yukuai (C) wrote:
>>>> On 2022/04/26 0:16, Jan Kara wrote:
>>>>> Hello!
>>>>>
>>>>> On Mon 25-04-22 21:34:16, yukuai (C) wrote:
>>>>>> On 2022/04/25 17:48, Jan Kara wrote:
>>>>>>> On Sat 16-04-22 17:37:50, Yu Kuai wrote:
>>>>>>>> A weight-raised queue is not inserted into weights_tree, which makes it
>>>>>>>> impossible to track how many queues have pending requests through
>>>>>>>> weights_tree insertion and removal. This patch adds a fake weight_counter
>>>>>>>> for weight-raised queues to do that.
>>>>>>>>
>>>>>>>> Signed-off-by: Yu Kuai <[email protected]>
>>>>>>>
>>>>>>> This is a bit hacky. I was looking into a better place where to hook to
>>>>>>> count entities in a bfq_group with requests and I think bfq_add_bfqq_busy()
>>>>>>> and bfq_del_bfqq_busy() are ideal for this. It also makes better sense
>>>>>>> conceptually than hooking into weights tree handling.
>>>>>>
>>>>>> bfq_del_bfqq_busy() will be called when all the reqs in the bfqq are
>>>>>> dispatched, however there might still be some reqs that aren't
>>>>>> completed yet.
>>>>>>
>>>>>> Here what we want to track is how many bfqqs have pending reqs,
>>>>>> specifically whether the bfqq has reqs that aren't completed.
>>>>>>
>>>>>> Thus I think bfq_del_bfqq_busy() is not the right place to do that.
>>>>>
>>>>> Yes, I'm aware there will be a difference. But note that bfqq can stay busy
>>>>> with only dispatched requests because the logic in __bfq_bfqq_expire() will
>>>>> not call bfq_del_bfqq_busy() if idling is needed for service guarantees. So
>>>>> I think using bfq_add/del_bfqq_busy() would work OK.
>>>> Hi,
>>>>
>>>> I didn't think of that before. If the bfqq stays busy after dispatching
>>>> all the requests, there are two other places where the bfqq can clear
>>>> the busy state:
>>>>
>>>> 1) bfq_remove_request(), bfqq has to insert a new req while it's not in
>>>> service.
>>>
>>> Yes and the request then would have to be dispatched or merged. Which
>>> generally means another bfqq from the same bfqg is currently active and
>>> thus this should have no impact on service guarantees we are interested in.
>>>
>>>> 2) bfq_release_process_ref(), user thread is gone / moved, or old bfqq
>>>> is gone due to merge / ioprio change.
>>>
>>> Yes, here there's no new IO for the bfqq so no point in maintaining any
>>> service guarantees to it.
>>>
>>>> I wonder, will bfq_del_bfqq_busy() be called immediately when requests
>>>> are completed? (It seems not to me...) For example, a user thread
>>>> issues a sync io just once and keeps running without issuing new io;
>>>> then when does the bfqq clear the busy state?
>>>
>>> No, when bfqq is kept busy, it will get scheduled as the in-service
>>> queue in the future. Then what happens depends on whether it will get
>>> more requests or not. But generally its busy state will get cleared
>>> once it is expired for a reason other than preemption.
>>
>> Thanks for your explanation.
>>
>> I think in the normal case using bfq_add/del_bfqq_busy() is fine.
>>
>> There is one last situation that I'm worried about: if some disks are
>> so slow that the dispatched reqs are not completed when the bfqq is
>> rescheduled as the in-service queue, the busy state can be cleared
>> while reqs are not completed yet.
>>
>> Using bfq_del_bfqq_busy() will change behaviour in this special case;
>> do you think service guarantees will be broken?
>
> Well, I don't think so. Because slow disks don't tend to do a lot of
> internal scheduling (or have deep IO queues for that matter). Also note
> that generally bfq_select_queue() will not even expire a queue (despite it
> not having any requests to dispatch) when we should not dispatch other
> requests to maintain service guarantees. So I think service guarantees will
> be generally preserved. Obviously I could be wrong, we will not know
> until we try it :).
>

I have nothing to add ... You guys are getting better than me at BFQ :)

Thanks,
Paolo

> Honza
>
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

2022-04-26 21:03:25

by Yu Kuai

Subject: Re: [PATCH -next v2 2/5] block, bfq: add fake weight_counter for weight-raised queue

On 2022/04/26 15:40, Jan Kara wrote:
> On Tue 26-04-22 09:49:04, yukuai (C) wrote:
>> On 2022/04/26 0:16, Jan Kara wrote:
>>> Hello!
>>>
>>> On Mon 25-04-22 21:34:16, yukuai (C) wrote:
>>>> On 2022/04/25 17:48, Jan Kara wrote:
>>>>> On Sat 16-04-22 17:37:50, Yu Kuai wrote:
>>>>>> A weight-raised queue is not inserted into weights_tree, which makes it
>>>>>> impossible to track how many queues have pending requests through
>>>>>> weights_tree insertion and removal. This patch adds a fake weight_counter
>>>>>> for weight-raised queues to do that.
>>>>>>
>>>>>> Signed-off-by: Yu Kuai <[email protected]>
>>>>>
>>>>> This is a bit hacky. I was looking into a better place where to hook to
>>>>> count entities in a bfq_group with requests and I think bfq_add_bfqq_busy()
>>>>> and bfq_del_bfqq_busy() are ideal for this. It also makes better sense
>>>>> conceptually than hooking into weights tree handling.
>>>>
>>>> bfq_del_bfqq_busy() will be called when all the reqs in the bfqq are
>>>> dispatched, however there might still be some reqs that aren't
>>>> completed yet.
>>>>
>>>> Here what we want to track is how many bfqqs have pending reqs,
>>>> specifically whether the bfqq has reqs that aren't completed.
>>>>
>>>> Thus I think bfq_del_bfqq_busy() is not the right place to do that.
>>>
>>> Yes, I'm aware there will be a difference. But note that bfqq can stay busy
>>> with only dispatched requests because the logic in __bfq_bfqq_expire() will
>>> not call bfq_del_bfqq_busy() if idling is needed for service guarantees. So
>>> I think using bfq_add/del_bfqq_busy() would work OK.
>> Hi,
>>
>> I didn't think of that before. If the bfqq stays busy after dispatching
>> all the requests, there are two other places where the bfqq can clear
>> the busy state:
>>
>> 1) bfq_remove_request(), bfqq has to insert a new req while it's not in
>> service.
>
> Yes and the request then would have to be dispatched or merged. Which
> generally means another bfqq from the same bfqg is currently active and
> thus this should have no impact on service guarantees we are interested in.
>
>> 2) bfq_release_process_ref(), user thread is gone / moved, or old bfqq
>> is gone due to merge / ioprio change.
>
> Yes, here there's no new IO for the bfqq so no point in maintaining any
> service guarantees to it.
>
>> I wonder, will bfq_del_bfqq_busy() be called immediately when requests
>> are completed? (It seems not to me...) For example, a user thread
>> issues a sync io just once and keeps running without issuing new io;
>> then when does the bfqq clear the busy state?
>
> No, when bfqq is kept busy, it will get scheduled as the in-service
> queue in the future. Then what happens depends on whether it will get
> more requests or not. But generally its busy state will get cleared
> once it is expired for a reason other than preemption.

Thanks for your explanation.

I think in the normal case using bfq_add/del_bfqq_busy() is fine.

There is one last situation that I'm worried about: if some disks are
so slow that the dispatched reqs are not completed when the bfqq is
rescheduled as the in-service queue, the busy state can be cleared
while reqs are not completed yet.

Using bfq_del_bfqq_busy() will change behaviour in this special case;
do you think service guarantees will be broken?

Thanks,
Kuai

2022-04-27 09:20:46

by Jan Kara

Subject: Re: [PATCH -next v2 2/5] block, bfq: add fake weight_counter for weight-raised queue

On Tue 26-04-22 16:27:46, yukuai (C) wrote:
> On 2022/04/26 15:40, Jan Kara wrote:
> > On Tue 26-04-22 16:27:46, yukuai (C) wrote:
> > > On 2022/04/26 0:16, Jan Kara wrote:
> > > > Hello!
> > > >
> > > > On Mon 25-04-22 21:34:16, yukuai (C) wrote:
> > > > > On 2022/04/25 17:48, Jan Kara wrote:
> > > > > > On Sat 16-04-22 17:37:50, Yu Kuai wrote:
> > > > > > > A weight-raised queue is not inserted into weights_tree, which makes it
> > > > > > > impossible to track how many queues have pending requests through
> > > > > > > weights_tree insertion and removal. This patch adds a fake weight_counter
> > > > > > > for weight-raised queues to do that.
> > > > > > >
> > > > > > > Signed-off-by: Yu Kuai <[email protected]>
> > > > > >
> > > > > > This is a bit hacky. I was looking into a better place where to hook to
> > > > > > count entities in a bfq_group with requests and I think bfq_add_bfqq_busy()
> > > > > > and bfq_del_bfqq_busy() are ideal for this. It also makes better sense
> > > > > > conceptually than hooking into weights tree handling.
> > > > >
> > > > > bfq_del_bfqq_busy() will be called when all the reqs in the bfqq are
> > > > > dispatched, however there might still be some reqs that aren't
> > > > > completed yet.
> > > > >
> > > > > Here what we want to track is how many bfqqs have pending reqs,
> > > > > specifically whether the bfqq has reqs that aren't completed.
> > > > >
> > > > > Thus I think bfq_del_bfqq_busy() is not the right place to do that.
> > > >
> > > > Yes, I'm aware there will be a difference. But note that bfqq can stay busy
> > > > with only dispatched requests because the logic in __bfq_bfqq_expire() will
> > > > not call bfq_del_bfqq_busy() if idling is needed for service guarantees. So
> > > > I think using bfq_add/del_bfqq_busy() would work OK.
> > > Hi,
> > >
> > > I didn't think of that before. If the bfqq stays busy after dispatching
> > > all the requests, there are two other places where the bfqq can clear
> > > the busy state:
> > >
> > > 1) bfq_remove_request(), bfqq has to insert a new req while it's not in
> > > service.
> >
> > Yes and the request then would have to be dispatched or merged. Which
> > generally means another bfqq from the same bfqg is currently active and
> > thus this should have no impact on service guarantees we are interested in.
> >
> > > 2) bfq_release_process_ref(), user thread is gone / moved, or old bfqq
> > > is gone due to merge / ioprio change.
> >
> > Yes, here there's no new IO for the bfqq so no point in maintaining any
> > service guarantees to it.
> >
> > > I wonder, will bfq_del_bfqq_busy() be called immediately when requests
> > > are completed? (It seems not to me...) For example, a user thread
> > > issues a sync io just once and keeps running without issuing new io;
> > > then when does the bfqq clear the busy state?
> >
> > No, when bfqq is kept busy, it will get scheduled as the in-service
> > queue in the future. Then what happens depends on whether it will get
> > more requests or not. But generally its busy state will get cleared
> > once it is expired for a reason other than preemption.
>
> Thanks for your explanation.
>
> I think in the normal case using bfq_add/del_bfqq_busy() is fine.
>
> There is one last situation that I'm worried about: if some disks are
> so slow that the dispatched reqs are not completed when the bfqq is
> rescheduled as the in-service queue, the busy state can be cleared
> while reqs are not completed yet.
>
> Using bfq_del_bfqq_busy() will change behaviour in this special case;
> do you think service guarantees will be broken?

Well, I don't think so. Because slow disks don't tend to do a lot of
internal scheduling (or have deep IO queues for that matter). Also note
that generally bfq_select_queue() will not even expire a queue (despite it
not having any requests to dispatch) when we should not dispatch other
requests to maintain service guarantees. So I think service guarantees will
be generally preserved. Obviously I could be wrong, we will not know
until we try it :).

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR