2021-12-13 03:53:55

by QiuLaibin

Subject: [PATCH -next] block/wbt: fix negative inflight counter when remove scsi device

wbt is now disabled by setting WBT_STATE_OFF_DEFAULT in
wbt_disable_default() when the elevator is switched to bfq. When the
scsi device is later removed, wbt is re-enabled by wbt_enable_default().
If that happens between wbt_wait() and wbt_track() while a write
request is being submitted, the two calls see different enable states.

The following is the scenario that triggered the problem.

T1                        T2                            T3
elevator_switch_mq
bfq_init_queue
wbt_disable_default
<= Set rwb->enable_state (OFF)
                          submit_bio
                          blk_mq_make_request
                          rq_qos_throttle
                          <= rwb->enable_state (OFF)
                                                        scsi_remove_device
                                                        sd_remove
                                                        del_gendisk
                                                        blk_unregister_queue
                                                        elv_unregister_queue
                                                        wbt_enable_default
                                                        <= Set rwb->enable_state (ON)
                          rq_qos_track
                          <= rwb->enable_state (ON)
                          ^^^^^^ this request is marked WBT_TRACKED without an
                          inflight increment, so wbt_done() drops rqw->inflight
                          to -1, which triggers an IO hang.

Fix this by checking whether QUEUE_FLAG_REGISTERED is set, which
distinguishes the scsi device removal case.

Fixes: 76a8040817b4b ("blk-wbt: make sure throttle is enabled properly")
Signed-off-by: Laibin Qiu <[email protected]>
---
block/blk-wbt.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 3ed71b8da887..537f77bb1365 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -637,6 +637,10 @@ void wbt_enable_default(struct request_queue *q)
 {
 	struct rq_qos *rqos = wbt_rq_qos(q);

+	/* Queue not registered? Maybe shutting down... */
+	if (!blk_queue_registered(q))
+		return;
+
 	/* Throttling already enabled? */
 	if (rqos) {
 		if (RQWB(rqos)->enable_state == WBT_STATE_OFF_DEFAULT)
@@ -644,10 +648,6 @@ void wbt_enable_default(struct request_queue *q)
 		return;
 	}

-	/* Queue not registered? Maybe shutting down... */
-	if (!blk_queue_registered(q))
-		return;
-
 	if (queue_is_mq(q) && IS_ENABLED(CONFIG_BLK_WBT_MQ))
 		wbt_init(q);
 }
--
2.22.0



2021-12-13 17:16:57

by Christoph Hellwig

Subject: Re: [PATCH -next] block/wbt: fix negative inflight counter when remove scsi device

On Mon, Dec 13, 2021 at 12:09:07PM +0800, Laibin Qiu wrote:
> Submit_bio
> scsi_remove_device
> sd_remove
> del_gendisk
> blk_unregister_queue
> elv_unregister_queue
> wbt_enable_default
> <= Set rwb->enable_state (ON)
> q_qos_track
> <= rwb->enable_state (ON)
> ^^^^^^ this request will mark WBT_TRACKED without inflight add and will
> lead to drop rqw->inflight to -1 in wbt_done() which will trigger IO hung.
>
> Fix this by judge whether QUEUE_FLAG_REGISTERED is marked to distinguish
> scsi remove scene.
> Fixes: 76a8040817b4b ("blk-wbt: make sure throttle is enabled properly")
> Signed-off-by: Laibin Qiu <[email protected]>
> ---
> block/blk-wbt.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> index 3ed71b8da887..537f77bb1365 100644
> --- a/block/blk-wbt.c
> +++ b/block/blk-wbt.c
> @@ -637,6 +637,10 @@ void wbt_enable_default(struct request_queue *q)
> {
> struct rq_qos *rqos = wbt_rq_qos(q);
>
> + /* Queue not registered? Maybe shutting down... */
> + if (!blk_queue_registered(q))
> + return;

Wouldn't it make more sense to simply not call wbt_enable_default from
elv_unregister_queue?

2021-12-14 01:13:32

by Ming Lei

Subject: Re: [PATCH -next] block/wbt: fix negative inflight counter when remove scsi device

On Mon, Dec 13, 2021 at 09:16:51AM -0800, Christoph Hellwig wrote:
> On Mon, Dec 13, 2021 at 12:09:07PM +0800, Laibin Qiu wrote:
> > Submit_bio
> > scsi_remove_device
> > sd_remove
> > del_gendisk
> > blk_unregister_queue
> > elv_unregister_queue
> > wbt_enable_default
> > <= Set rwb->enable_state (ON)
> > q_qos_track
> > <= rwb->enable_state (ON)
> > ^^^^^^ this request will mark WBT_TRACKED without inflight add and will
> > lead to drop rqw->inflight to -1 in wbt_done() which will trigger IO hung.
> >
> > Fix this by judge whether QUEUE_FLAG_REGISTERED is marked to distinguish
> > scsi remove scene.
> > Fixes: 76a8040817b4b ("blk-wbt: make sure throttle is enabled properly")
> > Signed-off-by: Laibin Qiu <[email protected]>
> > ---
> > block/blk-wbt.c | 8 ++++----
> > 1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/block/blk-wbt.c b/block/blk-wbt.c
> > index 3ed71b8da887..537f77bb1365 100644
> > --- a/block/blk-wbt.c
> > +++ b/block/blk-wbt.c
> > @@ -637,6 +637,10 @@ void wbt_enable_default(struct request_queue *q)
> > {
> > struct rq_qos *rqos = wbt_rq_qos(q);
> >
> > + /* Queue not registered? Maybe shutting down... */
> > + if (!blk_queue_registered(q))
> > + return;
>
> Wouldn't it make more sense to simply not call wbt_enable_default from
> elv_unregister_queue?

wbt_disable_default() is called in bfq_init_root_group(), so wbt_enable_default
should be moved to bfq_exit_queue()?


Thanks,
Ming


2021-12-14 04:25:13

by QiuLaibin

Subject: Re: [PATCH -next] block/wbt: fix negative inflight counter when remove scsi device


On 2021/12/14 1:16, Christoph Hellwig wrote:
> On Mon, Dec 13, 2021 at 12:09:07PM +0800, Laibin Qiu wrote:
>> Submit_bio
>> scsi_remove_device
>> sd_remove
>> del_gendisk
>> blk_unregister_queue
>> elv_unregister_queue
>> wbt_enable_default
>> <= Set rwb->enable_state (ON)
>> q_qos_track
>> <= rwb->enable_state (ON)
>> ^^^^^^ this request will mark WBT_TRACKED without inflight add and will
>> lead to drop rqw->inflight to -1 in wbt_done() which will trigger IO hung.
>>
>> Fix this by judge whether QUEUE_FLAG_REGISTERED is marked to distinguish
>> scsi remove scene.
>> Fixes: 76a8040817b4b ("blk-wbt: make sure throttle is enabled properly")
>> Signed-off-by: Laibin Qiu <[email protected]>
>> ---
>> block/blk-wbt.c | 8 ++++----
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/blk-wbt.c b/block/blk-wbt.c
>> index 3ed71b8da887..537f77bb1365 100644
>> --- a/block/blk-wbt.c
>> +++ b/block/blk-wbt.c
>> @@ -637,6 +637,10 @@ void wbt_enable_default(struct request_queue *q)
>> {
>> struct rq_qos *rqos = wbt_rq_qos(q);
>>
>> + /* Queue not registered? Maybe shutting down... */
>> + if (!blk_queue_registered(q))
>> + return;
>
> Wouldn't it make more sense to simply not call wbt_enable_default from
> elv_unregister_queue?

Following your suggestion, I will post a V2.
Please take another look.

2021-12-14 08:07:09

by Christoph Hellwig

Subject: Re: [PATCH -next] block/wbt: fix negative inflight counter when remove scsi device

On Tue, Dec 14, 2021 at 09:13:10AM +0800, Ming Lei wrote:
> > Wouldn't it make more sense to simply not call wbt_enable_default from
> > elv_unregister_queue?
>
> wbt_disable_default() is called in bfq_init_root_group(), so wbt_enable_default

s/bfq_init_root_group/bfq_init_queue/

But yes, that sounds like an even better idea. Or maybe even an
elevator feature flag.