The depth is 62, and the wake_batch is 8. In the following situation,
the task would hang forever.
t1:                       t2:                         t3:
blk_mq_get_tag            .                           .
io_schedule               .                           .
                          elevator_switch             .
                          blk_mq_freeze_queue         .
                          blk_freeze_queue_start      .
                          blk_mq_freeze_queue_wait    .
                                                      blk_mq_submit_bio
                                                      __bio_queue_enter
Fix this issue by waking up all the waiters sleeping on tags after
freezing the queue.
Signed-off-by: Yang Yang <[email protected]>
---
block/blk-core.c | 2 --
block/blk-mq.c | 4 +++-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index a16b5abdbbf5..e1eacfad6e5b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -298,8 +298,6 @@ void blk_queue_start_drain(struct request_queue *q)
* prevent I/O from crossing blk_queue_enter().
*/
blk_freeze_queue_start(q);
- if (queue_is_mq(q))
- blk_mq_wake_waiters(q);
/* Make blk_queue_enter() reexamine the DYING flag. */
wake_up_all(&q->mq_freeze_wq);
}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4ecb9db62337..9eb3139e713a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -125,8 +125,10 @@ void blk_freeze_queue_start(struct request_queue *q)
if (++q->mq_freeze_depth == 1) {
percpu_ref_kill(&q->q_usage_counter);
mutex_unlock(&q->mq_freeze_lock);
- if (queue_is_mq(q))
+ if (queue_is_mq(q)) {
+ blk_mq_wake_waiters(q);
blk_mq_run_hw_queues(q, false);
+ }
} else {
mutex_unlock(&q->mq_freeze_lock);
}
--
2.34.1
On 5/19/24 20:38, Yang Yang wrote:
> The depth is 62, and the wake_batch is 8. In the following situation,
> the task would hang forever.
>
> t1:                       t2:                         t3:
> blk_mq_get_tag            .                           .
> io_schedule               .                           .
>                           elevator_switch             .
>                           blk_mq_freeze_queue         .
>                           blk_freeze_queue_start      .
>                           blk_mq_freeze_queue_wait    .
>                                                       blk_mq_submit_bio
>                                                       __bio_queue_enter
>
> Fix this issue by waking up all the waiters sleeping on tags after
> freezing the queue.
Shouldn't blk_mq_alloc_request() be mentioned in t1 since that is the function
that calls blk_queue_enter()?
> diff --git a/block/blk-core.c b/block/blk-core.c
> index a16b5abdbbf5..e1eacfad6e5b 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -298,8 +298,6 @@ void blk_queue_start_drain(struct request_queue *q)
> * prevent I/O from crossing blk_queue_enter().
> */
> blk_freeze_queue_start(q);
> - if (queue_is_mq(q))
> - blk_mq_wake_waiters(q);
> /* Make blk_queue_enter() reexamine the DYING flag. */
> wake_up_all(&q->mq_freeze_wq);
> }
Why has blk_queue_start_drain() been modified? I don't see any reference
in the patch description to blk_queue_start_drain(). Am I perhaps missing
something?
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 4ecb9db62337..9eb3139e713a 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -125,8 +125,10 @@ void blk_freeze_queue_start(struct request_queue *q)
> if (++q->mq_freeze_depth == 1) {
> percpu_ref_kill(&q->q_usage_counter);
> mutex_unlock(&q->mq_freeze_lock);
> - if (queue_is_mq(q))
> + if (queue_is_mq(q)) {
> + blk_mq_wake_waiters(q);
> blk_mq_run_hw_queues(q, false);
> + }
> } else {
> mutex_unlock(&q->mq_freeze_lock);
> }
Why would the above change be necessary? If the blk_queue_enter() call
by blk_mq_alloc_request() succeeds and blk_mq_get_tag() calls
io_schedule(), the task sleeping in io_schedule() will be woken up
indirectly by the blk_mq_run_hw_queues() call, because that call will
free one of the tags that the sleeping task is waiting for.
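For reference, the wake-up path I have in mind looks roughly like this
(a simplified paraphrase of block/blk-mq-tag.c and lib/sbitmap.c with
the reserved-tag handling omitted; not the verbatim source):

void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
                    unsigned int tag)
{
        /* Clearing the tag bit credits one completion to the sbitmap
         * wait queues via sbitmap_queue_clear() ->
         * sbitmap_queue_wake_up(sbq, 1); a task sleeping in
         * blk_mq_get_tag() is woken once a full wake_batch of
         * completions has accumulated. */
        sbitmap_queue_clear(&tags->bitmap_tags, tag, ctx->cpu);
}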
Thanks,
Bart.
On 2024/5/21 2:11, Bart Van Assche wrote:
> On 5/19/24 20:38, Yang Yang wrote:
>> The depth is 62, and the wake_batch is 8. In the following situation,
>> the task would hang forever.
>>
>> t1:                       t2:                         t3:
>> blk_mq_get_tag            .                           .
>> io_schedule               .                           .
>>                           elevator_switch             .
>>                           blk_mq_freeze_queue         .
>>                           blk_freeze_queue_start      .
>>                           blk_mq_freeze_queue_wait    .
>>                                                       blk_mq_submit_bio
>>                                                       __bio_queue_enter
>>
>> Fix this issue by waking up all the waiters sleeping on tags after
>> freezing the queue.
>
> Shouldn't blk_mq_alloc_request() be mentioned in t1 since that is the function
> that calls blk_queue_enter()?
t1:                              t2:                           t3:
blk_mq_submit_bio                .                             .
  __blk_mq_alloc_requests        .                             .
    blk_mq_get_tag               .                             .
      io_schedule                .                             .
                                 elevator_switch               .
                                   blk_mq_freeze_queue         .
                                     blk_freeze_queue_start    .
                                       q->mq_freeze_depth=1    .
                                     blk_mq_freeze_queue_wait  .
                                                               blk_mq_submit_bio
                                                                 __bio_queue_enter
                                                                   wait_event(!q->mq_freeze_depth)
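t3 blocks because __bio_queue_enter() waits for the freeze to end. A
simplified paraphrase of __bio_queue_enter() in block/blk-core.c
(REQ_NOWAIT and GD_DEAD handling omitted; not verbatim):

void __bio_queue_enter(struct request_queue *q, struct bio *bio)
{
        /* Once t2 has raised q->mq_freeze_depth, every new submitter
         * sleeps here until the queue is unfrozen, so t3 cannot make
         * progress while t2 is waiting for t1 to release its tag. */
        while (!blk_try_enter_queue(q, false))
                wait_event(q->mq_freeze_wq, !q->mq_freeze_depth);
}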
>
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index a16b5abdbbf5..e1eacfad6e5b 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -298,8 +298,6 @@ void blk_queue_start_drain(struct request_queue *q)
>> * prevent I/O from crossing blk_queue_enter().
>> */
>> blk_freeze_queue_start(q);
>> - if (queue_is_mq(q))
>> - blk_mq_wake_waiters(q);
>> /* Make blk_queue_enter() reexamine the DYING flag. */
>> wake_up_all(&q->mq_freeze_wq);
>> }
>
> Why has blk_queue_start_drain() been modified? I don't see any reference
> in the patch description to blk_queue_start_drain(). Am I perhaps missing
> something?
With this patch, blk_mq_wake_waiters() is called from
blk_freeze_queue_start(), which blk_queue_start_drain() already
invokes, so I thought the separate call in blk_queue_start_drain()
could be removed.
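For reference, blk_mq_wake_waiters() looks roughly like this
(paraphrased from block/blk-mq.c; details may vary across kernel
versions). It wakes every task sleeping on the tag waitqueues of each
mapped hardware queue so they re-evaluate whether to keep waiting:

void blk_mq_wake_waiters(struct request_queue *q)
{
        struct blk_mq_hw_ctx *hctx;
        unsigned long i;

        /* Kick the tag waitqueues of every mapped hardware queue,
         * including the reserved tags. */
        queue_for_each_hw_ctx(q, hctx, i)
                if (blk_mq_hw_queue_mapped(hctx))
                        blk_mq_tag_wakeup_all(hctx->tags, true);
}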
>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 4ecb9db62337..9eb3139e713a 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -125,8 +125,10 @@ void blk_freeze_queue_start(struct request_queue *q)
>> if (++q->mq_freeze_depth == 1) {
>> percpu_ref_kill(&q->q_usage_counter);
>> mutex_unlock(&q->mq_freeze_lock);
>> - if (queue_is_mq(q))
>> + if (queue_is_mq(q)) {
>> + blk_mq_wake_waiters(q);
>> blk_mq_run_hw_queues(q, false);
>> + }
>> } else {
>> mutex_unlock(&q->mq_freeze_lock);
>> }
>
> Why would the above change be necessary? If the blk_queue_enter() call
> by blk_mq_alloc_request() succeeds and blk_mq_get_tag() calls
> io_schedule(), the task sleeping in io_schedule() will be woken up
> indirectly by the blk_mq_run_hw_queues() call, because that call will
> free one of the tags that the sleeping task is waiting for.
This patch is a workaround rather than a root-cause fix. I think the
hang is caused by a lost wakeup: even after blk_mq_run_hw_queues() has
run, t1 is still waiting for a tag.
bt = 0xFFFFFF802F9C6790 -> (
    sb = (
        depth = 62,
        shift = 6,
        map_nr = 1,
        round_robin = FALSE,
        map = 0xFFFFFF803BF97000,
        alloc_hint = 0x00000049119B3F4C),
    wake_batch = 6,
    wake_index = (counter = 0),
    ws = 0xFFFFFF803BEBCA00,
    ws_active = (counter = 1),
    min_shallow_depth = 48,
    completion_cnt = (counter = 1),
    wakeup_cnt = (counter = 0))
Analyzing the coredump shows that sbq->completion_cnt = 1, and I can't
figure out why. The path blk_mq_put_tag() -> sbitmap_queue_clear() ->
sbitmap_queue_wake_up(sbq, 1) should have been taken multiple times,
and given that sbq->ws_active = 1 (a waiter is still queued),
sbq->completion_cnt should be greater than 1.
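For context, my understanding of the batching logic is roughly the
following (paraphrased from lib/sbitmap.c on kernels that use the
completion_cnt/wakeup_cnt scheme; not verbatim):

void sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr)
{
        unsigned int wake_batch = READ_ONCE(sbq->wake_batch);
        unsigned int wakeups;

        if (!atomic_read(&sbq->ws_active))
                return;

        /* Each freed tag credits completions... */
        atomic_add(nr, &sbq->completion_cnt);
        wakeups = atomic_read(&sbq->wakeup_cnt);

        /* ...but sleepers are only woken once completion_cnt runs a
         * full wake_batch ahead of wakeup_cnt. With completion_cnt
         * stuck at 1 and wake_batch = 6 as in the dump above, this
         * check never passes and t1 sleeps forever. */
        do {
                if (atomic_read(&sbq->completion_cnt) - wakeups < wake_batch)
                        return;
        } while (!atomic_try_cmpxchg(&sbq->wakeup_cnt,
                                     &wakeups, wakeups + wake_batch));

        __sbitmap_queue_wake_up(sbq, wake_batch);
}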
Looking forward to some advice from block layer experts.
Thanks.
>
> Thanks,
>
> Bart.