2023-08-13 16:38:02

by Chengming Zhou

[permalink] [raw]
Subject: [PATCH v2] blk-mq: release scheduler resource when request complete

From: Chengming Zhou <[email protected]>

Chuck reported [1] a IO hang problem on NFS exports that reside on SATA
devices and bisected to commit 615939a2ae73 ("blk-mq: defer to the normal
submission path for post-flush requests").

We analysed the IO hang problem, found there are two postflush requests
are waiting for each other.

The first postflush request completed the REQ_FSEQ_DATA sequence, so go to
the REQ_FSEQ_POSTFLUSH sequence and added in the flush pending list, but
failed to blk_kick_flush() because of the second postflush request which
is inflight waiting in scheduler queue.

The second postflush waiting in scheduler queue can't be dispatched because
the first postflush hasn't released scheduler resource even though it has
completed by itself.

Fix it by releasing scheduler resource when the first postflush request
completed, so the second postflush can be dispatched and completed, then
make blk_kick_flush() succeed.

[1] https://lore.kernel.org/all/[email protected]/

Fixes: 615939a2ae73 ("blk-mq: defer to the normal submission path for post-flush requests")
Reported-by: Chuck Lever <[email protected]>
Signed-off-by: Chengming Zhou <[email protected]>
Tested-by: Chuck Lever <[email protected]>
---
v2:
- All IO schedulers do set ->finish_request(), so remove the
check and warn on not setting when register.
---
block/blk-mq.c | 16 ++++++++++++----
block/elevator.c | 3 +++
2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f14b8669ac69..a8c63bef8ff1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -682,6 +682,14 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
}
EXPORT_SYMBOL_GPL(blk_mq_alloc_request_hctx);

+static void blk_mq_finish_request(struct request *rq)
+{
+ struct request_queue *q = rq->q;
+
+ if (rq->rq_flags & RQF_USE_SCHED)
+ q->elevator->type->ops.finish_request(rq);
+}
+
static void __blk_mq_free_request(struct request *rq)
{
struct request_queue *q = rq->q;
@@ -708,10 +716,6 @@ void blk_mq_free_request(struct request *rq)
{
struct request_queue *q = rq->q;

- if ((rq->rq_flags & RQF_USE_SCHED) &&
- q->elevator->type->ops.finish_request)
- q->elevator->type->ops.finish_request(rq);
-
if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq)))
laptop_io_completion(q->disk->bdi);

@@ -1021,6 +1025,8 @@ inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
if (blk_mq_need_time_stamp(rq))
__blk_mq_end_request_acct(rq, ktime_get_ns());

+ blk_mq_finish_request(rq);
+
if (rq->end_io) {
rq_qos_done(rq->q, rq);
if (rq->end_io(rq, error) == RQ_END_IO_FREE)
@@ -1075,6 +1081,8 @@ void blk_mq_end_request_batch(struct io_comp_batch *iob)
if (iob->need_ts)
__blk_mq_end_request_acct(rq, now);

+ blk_mq_finish_request(rq);
+
rq_qos_done(rq->q, rq);

/*
diff --git a/block/elevator.c b/block/elevator.c
index 8400e303fbcb..ac2cb3814eac 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -499,6 +499,9 @@ void elv_unregister_queue(struct request_queue *q)

int elv_register(struct elevator_type *e)
{
+ if (WARN_ON_ONCE(!e->ops.finish_request))
+ return -EINVAL;
+
/* insert_requests and dispatch_request are mandatory */
if (WARN_ON_ONCE(!e->ops.insert_requests || !e->ops.dispatch_request))
return -EINVAL;
--
2.41.0



2023-08-19 12:15:55

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v2] blk-mq: release scheduler resource when request complete

On 8/17/23 11:17 AM, Chengming Zhou wrote:
> On 2023/8/17 23:29, Chengming Zhou wrote:
>> On 2023/8/17 22:50, Bart Van Assche wrote:
>>> On 8/17/23 07:41, kernel test robot wrote:
>>>> [ 222.622837][ T2216] statistics for priority 1: i 276 m 0 d 276 c 278
>>>> [ 222.629307][ T2216] WARNING: CPU: 0 PID: 2216 at block/mq-deadline.c:680 dd_exit_sched (block/mq-deadline.c:680 (discriminator 3))
>>>
>>> The above information shows that dd_inserted_request() has been called
>>> 276 times and also that dd_finish_request() has been called 278 times.
>>
>> Thanks much for your help.
>>
>> This patch indeed introduced a regression, postflush requests will be completed
>> twice, so here dd_finish_request() is more than dd_inserted_request().
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index a8c63bef8ff1..7cd47ffc04ce 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -686,8 +686,10 @@ static void blk_mq_finish_request(struct request *rq)
>> {
>> struct request_queue *q = rq->q;
>>
>> - if (rq->rq_flags & RQF_USE_SCHED)
>> + if (rq->rq_flags & RQF_USE_SCHED) {
>> q->elevator->type->ops.finish_request(rq);
>> + rq->rq_flags &= ~RQF_USE_SCHED;
>> + }
>> }
>>
>
> I just tried to run LKP and xfstests, firstly failed to run LKP on my server
> which seems to miss some dependencies. Then I ran xfstests successfully.
>
> But xfstests generic/704 always pass and no WARN in dmesg. (I don't know why,
> maybe my server settings are some different from the test robot.)
>
> So I try to reproduce it manually. Steps:
>
> ```
> echo mq-deadline > /sys/block/sdb/queue/scheduler
>
> mkfs.ext4 /dev/sdb
> mount /dev/sdb /fs/sdb
> cd /fs/sdb
> stress-ng --symlink 4 --timeout 60
>
> echo none > /sys/block/sdb/queue/scheduler
> ```
>
> This way the WARNING in mq-deadline can be reproduced easily.
>
> Then retest with the diff, mq-deadline WARNING still happened... So there
> are still other requests which have RQF_USE_SCHED flag completed without
> being inserted into elevator.
>
> Will use some tracing and look again.

Ah missed this, thanks for doing this testing. I'll wait for an update
version. We can just fold in whatever change we need, and most likely
just push the patch to next week rather than send off a pull request for
this week. It's the only important thing pending on the block side.

--
Jens Axboe


2023-08-19 16:58:15

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v2] blk-mq: release scheduler resource when request complete

On 8/17/23 11:20 AM, Jens Axboe wrote:
> On 8/17/23 9:29 AM, Chengming Zhou wrote:
>> On 2023/8/17 22:50, Bart Van Assche wrote:
>>> On 8/17/23 07:41, kernel test robot wrote:
>>>> [ 222.622837][ T2216] statistics for priority 1: i 276 m 0 d 276 c 278
>>>> [ 222.629307][ T2216] WARNING: CPU: 0 PID: 2216 at block/mq-deadline.c:680 dd_exit_sched (block/mq-deadline.c:680 (discriminator 3))
>>>
>>> The above information shows that dd_inserted_request() has been called
>>> 276 times and also that dd_finish_request() has been called 278 times.
>>
>> Thanks much for your help.
>>
>> This patch indeed introduced a regression, postflush requests will be completed
>> twice, so here dd_finish_request() is more than dd_inserted_request().
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index a8c63bef8ff1..7cd47ffc04ce 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -686,8 +686,10 @@ static void blk_mq_finish_request(struct request *rq)
>> {
>> struct request_queue *q = rq->q;
>>
>> - if (rq->rq_flags & RQF_USE_SCHED)
>> + if (rq->rq_flags & RQF_USE_SCHED) {
>> q->elevator->type->ops.finish_request(rq);
>> + rq->rq_flags &= ~RQF_USE_SCHED;
>> + }
>> }
>>
>>
>> Clear RQF_USE_SCHED flag here should fix this problem, which should be ok
>> since finish_request() is the last callback, this flag isn't needed anymore.
>>
>> Jens, should I send this diff as another patch or resend updated v3?
>
> I don't think this is the right solution, it makes all kinds of
> assumptions on what that flag is and when it's safe to clear it. It's a
> very fragile fix, I think we need to do better than that.

Well maybe this is actually fine, since we're freeing the request now
anyway. I can fold it in the fix, I'll add a comment as well. If this is
subtle enough that it caused this issue, we definitely should have a
comment on why we're clearing this flag.

--
Jens Axboe