2022-06-29 05:34:47

by Liu Song

Subject: [PATCH] blk-mq: set BLK_MQ_S_STOPPED first to avoid unexpected queue work

From: Liu Song <[email protected]>

__blk_mq_delay_run_hw_queue() checks BLK_MQ_S_STOPPED first and then
queues the run work, but blk_mq_stop_hw_queue() cancels the work first
and only then sets BLK_MQ_S_STOPPED. The work can therefore be queued
between the cancel and the set_bit(), leaving it pending after
BLK_MQ_S_STOPPED has been set. Setting the bit before cancelling the
work avoids this.

Signed-off-by: Liu Song <[email protected]>
---
block/blk-mq.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 93d9d60..865915e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2258,9 +2258,9 @@ bool blk_mq_queue_stopped(struct request_queue *q)
  */
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
 {
-	cancel_delayed_work(&hctx->run_work);
-
 	set_bit(BLK_MQ_S_STOPPED, &hctx->state);
+
+	cancel_delayed_work(&hctx->run_work);
 }
 EXPORT_SYMBOL(blk_mq_stop_hw_queue);

--
1.8.3.1
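
To make the ordering in the patch description concrete, here is a minimal userspace sketch of the interleaving it targets. It is not kernel code and makes several assumptions: struct hctx_model and the model_* helpers are hypothetical stand-ins for hctx->state, hctx->run_work, __blk_mq_delay_run_hw_queue() and the two steps of blk_mq_stop_hw_queue(), and the concurrent run side is simply called between the two stop steps instead of running on another CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one hardware queue; not kernel code. */
struct hctx_model {
	bool stopped;		/* stands in for BLK_MQ_S_STOPPED */
	bool work_pending;	/* stands in for hctx->run_work being queued */
};

/* Run side as described in the patch: check the flag, then queue work. */
static void model_delay_run(struct hctx_model *h)
{
	if (h->stopped)
		return;
	h->work_pending = true;
}

/* The two steps of the stop side, split so they can be interleaved. */
static void model_cancel(struct hctx_model *h)
{
	h->work_pending = false;
}

static void model_set_stopped(struct hctx_model *h)
{
	h->stopped = true;
}

/*
 * Let the run side execute between the two stop steps.
 * Returns true if work is left pending on a stopped queue.
 */
bool run_between_stop_steps(bool set_bit_first)
{
	struct hctx_model h = { .stopped = false, .work_pending = false };

	/* Start from a running queue with no work queued. */
	assert(!h.stopped && !h.work_pending);

	if (set_bit_first) {		/* order proposed by the patch */
		model_set_stopped(&h);
		model_delay_run(&h);	/* sees the flag, queues nothing */
		model_cancel(&h);
	} else {			/* order before the patch */
		model_cancel(&h);
		model_delay_run(&h);	/* flag not set yet, queues work */
		model_set_stopped(&h);
	}
	return h.stopped && h.work_pending;
}
```

In this model, run_between_stop_steps(false) leaves work pending on a stopped queue and run_between_stop_steps(true) does not. The sketch covers only this one interleaving and says nothing about memory ordering or about work that is already executing.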


2022-06-29 18:45:36

by Bart Van Assche

Subject: Re: [PATCH] blk-mq: set BLK_MQ_S_STOPPED first to avoid unexpected queue work

On 6/28/22 22:18, Liu Song wrote:
> From: Liu Song <[email protected]>
>
> In "__blk_mq_delay_run_hw_queue", BLK_MQ_S_STOPPED is checked first,
> and then queue work, but in "blk_mq_stop_hw_queue", execute cancel
> work first and then set BLK_MQ_S_STOPPED, so there is a risk of
> queue work after setting BLK_MQ_S_STOPPED, which can be solved by
> adjusting the order.
>
> Signed-off-by: Liu Song <[email protected]>
> ---
> block/blk-mq.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 93d9d60..865915e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2258,9 +2258,9 @@ bool blk_mq_queue_stopped(struct request_queue *q)
> */
> void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
> {
> - cancel_delayed_work(&hctx->run_work);
> -
> set_bit(BLK_MQ_S_STOPPED, &hctx->state);
> +
> + cancel_delayed_work(&hctx->run_work);
> }
> EXPORT_SYMBOL(blk_mq_stop_hw_queue);

What made you come up with this patch? Source code reading or something
else? Please mention this in the patch description.

Regarding the above patch, I don't think it fixes the existing
race between blk_mq_stop_hw_queue() and __blk_mq_delay_run_hw_queue(),
not even if cancel_delayed_work_sync() were used.

The comment block above blk_mq_stop_hw_queue() clearly mentions that
this function is not guaranteed to stop the dispatching of requests
immediately. So why bother fixing existing race conditions that
do not affect what blk_mq_stop_hw_queue() guarantees?

Thanks,

Bart.
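
One possible reading of the objection above, sketched as a userspace model: if the stop side runs entirely between the run side's flag check and its queueing step, work ends up pending on a stopped queue whichever order the stop side uses. This is an assumption about which interleaving is meant, not something stated in the thread. struct hctx_model and the model_* helpers are hypothetical names, and the cancel step is modeled as a plain flag clear.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one hardware queue; not kernel code. */
struct hctx_model {
	bool stopped;		/* stands in for BLK_MQ_S_STOPPED */
	bool work_pending;	/* stands in for hctx->run_work being queued */
};

/* Run side split into its two steps: the flag check ... */
static bool model_check_not_stopped(const struct hctx_model *h)
{
	return !h->stopped;
}

/* ... and the later queueing of the work. */
static void model_queue_work(struct hctx_model *h)
{
	h->work_pending = true;
}

/* Stop side, run to completion in either order. */
static void model_stop(struct hctx_model *h, bool set_bit_first)
{
	if (set_bit_first) {
		h->stopped = true;
		h->work_pending = false;	/* cancel: nothing queued yet */
	} else {
		h->work_pending = false;	/* cancel: nothing queued yet */
		h->stopped = true;
	}
}

/*
 * Let the whole stop side execute between the run side's check and
 * its queueing step. Returns true if work is left pending on a
 * stopped queue.
 */
bool stop_between_check_and_queue(bool set_bit_first)
{
	struct hctx_model h = { .stopped = false, .work_pending = false };
	bool may_queue;

	/* Start from a running queue with no work queued. */
	assert(!h.stopped && !h.work_pending);

	may_queue = model_check_not_stopped(&h);	/* flag still clear */
	model_stop(&h, set_bit_first);			/* stop completes here */
	if (may_queue)
		model_queue_work(&h);			/* acts on a stale check */

	return h.stopped && h.work_pending;
}
```

In this model, both stop_between_check_and_queue(false) and stop_between_check_and_queue(true) leave work pending after the flag is set, because the cancel step finds nothing to cancel at the time it runs.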

2022-06-30 01:36:35

by Liu Song

Subject: Re: [PATCH] blk-mq: set BLK_MQ_S_STOPPED first to avoid unexpected queue work

>On 6/28/22 22:18, Liu Song wrote:
>> From: Liu Song <[email protected]>
>>
>> In "__blk_mq_delay_run_hw_queue", BLK_MQ_S_STOPPED is checked first,
>> and then queue work, but in "blk_mq_stop_hw_queue", execute cancel
>> work first and then set BLK_MQ_S_STOPPED, so there is a risk of
>> queue work after setting BLK_MQ_S_STOPPED, which can be solved by
>> adjusting the order.
>>
>> Signed-off-by: Liu Song <[email protected]>
>> ---
>> block/blk-mq.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 93d9d60..865915e 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -2258,9 +2258,9 @@ bool blk_mq_queue_stopped(struct request_queue *q)
>> */
>> void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
>> {
>> - cancel_delayed_work(&hctx->run_work);
>> -
>> set_bit(BLK_MQ_S_STOPPED, &hctx->state);
>> +
>> + cancel_delayed_work(&hctx->run_work);
>> }
>> EXPORT_SYMBOL(blk_mq_stop_hw_queue);
>
>What made you come up with this patch? Source code reading or something
>else? Please mention this in the patch description.

Hi,

I found this by reading the source code.
It is true that blk_mq_stop_hw_queue() does not guarantee that dispatching is blocked,
but blk_mq_stop_hw_queue() and __blk_mq_delay_run_hw_queue() handle BLK_MQ_S_STOPPED
in opposite orders.
Adjusting the order alone closes part of the race, so I think the patch is still worthwhile.

Thanks

>
>Regarding the above patch, I don't think this patch fixes the existing
>race between blk_mq_stop_hw_queue() and __blk_mq_delay_run_hw_queue(),
>not even if cancel_delayed_work_sync() would be used.
>
>The comment block above blk_mq_stop_hw_queue() clearly mentions that it
>is not guaranteed that this function stops dispatching of requests
>immediately. So why bother about fixing the existing race conditions that
>do not affect what is guaranteed by blk_mq_stop_hw_queue()?
>
>Thanks,
>
>Bart.