2021-12-14 04:27:48

by QiuLaibin

Subject: [PATCH v2 -next] block/wbt: fix negative inflight counter when remove scsi device

Currently, wbt is disabled by setting WBT_STATE_OFF_DEFAULT in
wbt_disable_default() when the elevator is switched to bfq. When
a scsi device is later removed, wbt is re-enabled by
wbt_enable_default(). If the state flips to enabled between
wbt_wait() and wbt_track() while a write request is being
submitted, the request is wrongly treated as tracked.

The following is the scenario that triggered the problem.

T1                          T2                            T3
elevator_switch_mq
bfq_init_queue
wbt_disable_default <= Set
rwb->enable_state (OFF)
                            submit_bio
                            blk_mq_make_request
                            rq_qos_throttle
                            <= rwb->enable_state (OFF)
                                                          scsi_remove_device
                                                          sd_remove
                                                          del_gendisk
                                                          blk_unregister_queue
                                                          elv_unregister_queue
                                                          wbt_enable_default
                                                          <= Set rwb->enable_state (ON)
                            rq_qos_track
                            <= rwb->enable_state (ON)
                            ^^^^^^ this request is marked WBT_TRACKED without a
                            matching inflight increment, so wbt_done() drops
                            rqw->inflight to -1, which triggers an IO hang.

Fix this by moving wbt_enable_default() from elv_unregister_queue() to
elevator_switch_mq(), so that wbt is only re-enabled when the scheduler
is switched.

Fixes: 76a8040817b4b ("blk-wbt: make sure throttle is enabled properly")
Signed-off-by: Laibin Qiu <[email protected]>
---
block/elevator.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index ec98aed39c4f..de3cf1fa52fa 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -525,8 +525,6 @@ void elv_unregister_queue(struct request_queue *q)
 		kobject_del(&e->kobj);
 
 		e->registered = 0;
-		/* Re-enable throttling in case elevator disabled it */
-		wbt_enable_default(q);
 	}
 }
 
@@ -593,8 +591,11 @@ int elevator_switch_mq(struct request_queue *q,
 	lockdep_assert_held(&q->sysfs_lock);
 
 	if (q->elevator) {
-		if (q->elevator->registered)
+		if (q->elevator->registered) {
 			elv_unregister_queue(q);
+			/* Re-enable throttling in case elevator disabled it */
+			wbt_enable_default(q);
+		}
 
 		ioc_clear_queue(q);
 		blk_mq_sched_free_rqs(q);
--
2.22.0



2021-12-14 12:45:15

by Ming Lei

Subject: Re: [PATCH v2 -next] block/wbt: fix negative inflight counter when remove scsi device

On Tue, Dec 14, 2021 at 12:42:59PM +0800, Laibin Qiu wrote:
> Currently, wbt is disabled by setting WBT_STATE_OFF_DEFAULT in
> wbt_disable_default() when the elevator is switched to bfq. When
> a scsi device is later removed, wbt is re-enabled by
> wbt_enable_default(). If the state flips to enabled between
> wbt_wait() and wbt_track() while a write request is being
> submitted, the request is wrongly treated as tracked.
>
> The following is the scenario that triggered the problem.
>
> T1                          T2                            T3
> elevator_switch_mq
> bfq_init_queue
> wbt_disable_default <= Set
> rwb->enable_state (OFF)
>                             submit_bio
>                             blk_mq_make_request
>                             rq_qos_throttle
>                             <= rwb->enable_state (OFF)
>                                                           scsi_remove_device
>                                                           sd_remove
>                                                           del_gendisk
>                                                           blk_unregister_queue
>                                                           elv_unregister_queue
>                                                           wbt_enable_default
>                                                           <= Set rwb->enable_state (ON)
>                             rq_qos_track
>                             <= rwb->enable_state (ON)
>                             ^^^^^^ this request is marked WBT_TRACKED without a
>                             matching inflight increment, so wbt_done() drops
>                             rqw->inflight to -1, which triggers an IO hang.
>
> Fix this by moving wbt_enable_default() from elv_unregister_queue() to
> elevator_switch_mq(), so that wbt is only re-enabled when the scheduler
> is switched.
>
> Fixes: 76a8040817b4b ("blk-wbt: make sure throttle is enabled properly")
> Signed-off-by: Laibin Qiu <[email protected]>
> ---
> block/elevator.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/block/elevator.c b/block/elevator.c
> index ec98aed39c4f..de3cf1fa52fa 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -525,8 +525,6 @@ void elv_unregister_queue(struct request_queue *q)
>  		kobject_del(&e->kobj);
>  
>  		e->registered = 0;
> -		/* Re-enable throttling in case elevator disabled it */
> -		wbt_enable_default(q);
>  	}
>  }
>  
> @@ -593,8 +591,11 @@ int elevator_switch_mq(struct request_queue *q,
>  	lockdep_assert_held(&q->sysfs_lock);
>  
>  	if (q->elevator) {
> -		if (q->elevator->registered)
> +		if (q->elevator->registered) {
>  			elv_unregister_queue(q);
> +			/* Re-enable throttling in case elevator disabled it */
> +			wbt_enable_default(q);
> +		}

Please move wbt_enable_default() into bfq_exit_queue() instead. That should
be easier to follow and would fix the issue too, given that only bfq disables wbt.


Thanks,
Ming