2023-01-11 05:05:41

by Kemeng Shi

Subject: [PATCH v3 00/14] A few bugfix and cleanup patches for blk-mq

Hi, this series contains several bugfix patches that fix potential I/O
hangs and a few cleanup patches that remove stale code and unnecessary
checks. Most changes are in the request issue and dispatch paths. Thanks.

---
V3:
-Collect Reviewed-by from Christoph
-Add new patch "blk-mq: make blk_mq_commit_rqs a general function for all
commits" suggested by Christoph
-Move patch "blk-mq: remove unncessary from_schedule parameter in
blk_mq_plug_issue_direct" forward. This is a leftover from some abandoned
work and has no functional influence, so it needs no special attention.
-Rebase the patches on the rewritten blk_mq_commit_rqs.

V2:
-Thanks to Christoph for the review. There are two fixes in v2 following
his recommendations:
1)Avoid overly long line in patch "blk-mq: avoid sleep in
blk_mq_alloc_request_hctx"
2)Check BLK_MQ_REQ_NOWAIT and BLK_MQ_REQ_RESERVED in two WARN_ON_ONCE
---

Kemeng Shi (14):
blk-mq: avoid sleep in blk_mq_alloc_request_hctx
blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx
blk-mq: wait on correct sbitmap_queue in blk_mq_mark_tag_wait
blk-mq: Fix potential io hung for shared sbitmap per tagset
blk-mq: remove unnecessary list_empty check in
blk_mq_try_issue_list_directly
blk-mq: remove unncessary from_schedule parameter in
blk_mq_plug_issue_direct
blk-mq: make blk_mq_commit_rqs a general function for all commits
blk-mq: remove unncessary error count and commit in
blk_mq_plug_issue_direct
blk-mq: use blk_mq_commit_rqs helper in blk_mq_try_issue_list_directly
blk-mq: simplify flush check in blk_mq_dispatch_rq_list
blk-mq: remove unnecessary error count and check in
blk_mq_dispatch_rq_list
blk-mq: remove set of bd->last when get driver tag for next request
fails
blk-mq: use switch/case to improve readability in
blk_mq_try_issue_list_directly
blk-mq: correct stale comment of .get_budget

block/blk-mq-sched.c | 7 +--
block/blk-mq.c | 147 ++++++++++++++++++++-----------------------
2 files changed, 71 insertions(+), 83 deletions(-)

--
2.30.0


2023-01-11 05:06:06

by Kemeng Shi

Subject: [PATCH v3 08/14] blk-mq: remove unncessary error count and commit in blk_mq_plug_issue_direct

We only need to commit explicitly in two error cases:
-did not queue everything initially scheduled to queue
-the last attempt to queue a request failed
(see comment of blk_mq_commit_rqs for more details).
Both cases can be detected from ret of the last request issued, which is
the one that ends the list walk. Remove the error count and the
unnecessary commit triggered by errors that are not covered by the cases
described above.

Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 98f6003474f2..c6c84f44c7a6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2706,11 +2706,10 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug)
struct blk_mq_hw_ctx *hctx = NULL;
struct request *rq;
int queued = 0;
- int errors = 0;
+ blk_status_t ret;

while ((rq = rq_list_pop(&plug->mq_list))) {
bool last = rq_list_empty(plug->mq_list);
- blk_status_t ret;

if (hctx != rq->mq_hctx) {
if (hctx)
@@ -2726,20 +2725,15 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug)
case BLK_STS_RESOURCE:
case BLK_STS_DEV_RESOURCE:
blk_mq_request_bypass_insert(rq, false, true);
- blk_mq_plug_commit_rqs(hctx, &queued);
- return;
+ goto out;
default:
blk_mq_end_request(rq, ret);
- errors++;
break;
}
}

- /*
- * If we didn't flush the entire list, we could have told the driver
- * there was more coming, but that turned out to be a lie.
- */
- if (errors)
+out:
+ if (ret != BLK_STS_OK)
blk_mq_plug_commit_rqs(hctx, &queued);
}

--
2.30.0

2023-01-11 05:06:14

by Kemeng Shi

Subject: [PATCH v3 10/14] blk-mq: simplify flush check in blk_mq_dispatch_rq_list

1. Remove check of needs_resource and ret == BLK_STS_DEV_RESOURCE.
For busy error BLK_STS*_RESOURCE, the request is always added
back to the list, so needs_resource will not be true and ret will
not be BLK_STS_DEV_RESOURCE if the list is empty. We can remove
these dead checks.

2. Check ret of last request instead of errors
If the list is empty, we only need to call commit_rqs explicitly
if an error happened at the last request, whose status is stored in
ret. So check ret of the last request instead of errors to avoid the
unnecessary commit_rqs triggered by errors returned from earlier
requests.

Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f45d81e20d9e..8fd25713751c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2114,9 +2114,9 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
/* If we didn't flush the entire list, we could have told the driver
* there was more coming, but that turned out to be a lie.
*/
- if ((!list_empty(list) || errors || needs_resource ||
- ret == BLK_STS_DEV_RESOURCE) && q->mq_ops->commit_rqs && queued)
- q->mq_ops->commit_rqs(hctx);
+ if ((!list_empty(list) || ret != BLK_STS_OK))
+ blk_mq_commit_rqs(hctx, queued);
+
/*
* Any items that need requeuing? Stuff them into hctx->dispatch,
* that is where we will continue on next queue run.
--
2.30.0

2023-01-11 05:13:15

by Kemeng Shi

Subject: [PATCH v3 02/14] blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx

Commit 97889f9ac24f8 ("blk-mq: remove synchronize_rcu() from
blk_mq_del_queue_tag_set()") removed the handling of TAG_SHARED in
restart, and with it shared_hctx_restart, which counted how many hardware
queues are marked for restart.
Remove the stale comment claiming that we still count the hardware queues
that need a restart.

Fixes: 97889f9ac24f ("blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set()")
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq-sched.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 23d1a90fec42..ae40cdb7a383 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -19,8 +19,7 @@
#include "blk-wbt.h"

/*
- * Mark a hardware queue as needing a restart. For shared queues, maintain
- * a count of how many hardware queues are marked for restart.
+ * Mark a hardware queue as needing a restart.
*/
void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx)
{
--
2.30.0

2023-01-11 05:14:25

by Kemeng Shi

Subject: [PATCH v3 11/14] blk-mq: remove unnecessary error count and check in blk_mq_dispatch_rq_list

blk_mq_dispatch_rq_list reports whether hctx is busy through its bool
return value: it returns true if we are not busy and can handle more, and
false otherwise. Inside blk_mq_dispatch_rq_list, errors is only used if
the list is empty, in which case we return true if (errors + queued) != 0.

There are three types of status returned for a request:
-busy error BLK_STS*_RESOURCE: the failed request is added back
to the list, so the list will not be empty.
-BLK_STS_OK: we count queued for BLK_STS_OK
-other errors: we count errors for them

If the list is empty, no request got a busy error, so (errors +
queued) equals the total number of requests in the list, which is checked
to be non-empty at the beginning of blk_mq_dispatch_rq_list. So
(errors + queued) != 0 always holds if the list is empty, and neither the
(errors + queued) != 0 check nor the errors count is needed.

Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8fd25713751c..b2133af1c846 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2039,7 +2039,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
enum prep_dispatch prep;
struct request_queue *q = hctx->queue;
struct request *rq, *nxt;
- int errors, queued;
+ int queued;
blk_status_t ret = BLK_STS_OK;
LIST_HEAD(zone_list);
bool needs_resource = false;
@@ -2050,7 +2050,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
/*
* Now process all the entries, sending them to the driver.
*/
- errors = queued = 0;
+ queued = 0;
do {
struct blk_mq_queue_data bd;

@@ -2103,7 +2103,6 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
needs_resource = true;
break;
default:
- errors++;
blk_mq_end_request(rq, ret);
}
} while (!list_empty(list));
@@ -2181,10 +2180,10 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,

blk_mq_update_dispatch_busy(hctx, true);
return false;
- } else
- blk_mq_update_dispatch_busy(hctx, false);
+ }

- return (queued + errors) != 0;
+ blk_mq_update_dispatch_busy(hctx, false);
+ return true;
}

/**
--
2.30.0
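
The counting argument in the commit message above can be checked with a small user-space model. This is only a sketch, not kernel code: `model_dispatch`, `struct dispatch_counts` and the `enum status` values are names invented for illustration, and the model ignores the budget, tag and zoned-device paths of the real function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for blk_status_t values; not the kernel's. */
enum status { STS_OK, STS_RESOURCE, STS_IOERR };

struct dispatch_counts {
	size_t queued;		/* requests accepted by the driver */
	size_t errors;		/* requests ended with a non-busy error */
	size_t remaining;	/* requests left on the list */
};

/*
 * Model of the dispatch loop: results[i] is the status the driver would
 * return for request i of a list holding n requests.
 */
static struct dispatch_counts model_dispatch(const enum status *results,
					     size_t n)
{
	struct dispatch_counts c = { 0, 0, n };
	size_t i;

	assert(n > 0);	/* mirrors the non-empty check at function entry */

	for (i = 0; i < n; i++) {
		c.remaining--;		/* request taken off the list */
		if (results[i] == STS_OK) {
			c.queued++;
		} else if (results[i] == STS_RESOURCE) {
			c.remaining++;	/* busy: request goes back on the list */
			break;
		} else {
			c.errors++;	/* request is ended, keep walking */
		}
	}
	return c;
}
```

In this model, whenever `remaining` is 0 the sum `queued + errors` equals `n`, which is non-zero by construction, so the old `(queued + errors) != 0` test can never be false on that path.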

2023-01-11 05:15:27

by Kemeng Shi

Subject: [PATCH v3 03/14] blk-mq: wait on correct sbitmap_queue in blk_mq_mark_tag_wait

In the shared queues case, we only wait on bitmap_tags if we fail to get
a driver tag. However, rq could be from breserved_tags, and then two
problems occur:
1. I/O hang if no tag is currently allocated from bitmap_tags.
2. unnecessary wakeup when a tag is freed to bitmap_tags while no tag is
freed to breserved_tags.
Wait on the bitmap that rq was allocated from to fix this.

Fixes: f906a6a0f426 ("blk-mq: improve tag waiting setup for non-shared tags")
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f2586d485be6..de0e0d70cba2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1826,7 +1826,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
struct request *rq)
{
- struct sbitmap_queue *sbq = &hctx->tags->bitmap_tags;
+ struct sbitmap_queue *sbq;
struct wait_queue_head *wq;
wait_queue_entry_t *wait;
bool ret;
@@ -1849,6 +1849,10 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
if (!list_empty_careful(&wait->entry))
return false;

+ if (blk_mq_tag_is_reserved(rq->mq_hctx->sched_tags, rq->internal_tag))
+ sbq = &hctx->tags->breserved_tags;
+ else
+ sbq = &hctx->tags->bitmap_tags;
wq = &bt_wait_ptr(sbq, hctx)->wait;

spin_lock_irq(&wq->lock);
--
2.30.0
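
The selection logic added by this patch can be sketched outside the kernel as follows. All `model_*` names are hypothetical, and the sketch assumes that a tag counts as reserved when it is below the number of reserved tags, which is how I understand `blk_mq_tag_is_reserved` to behave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sbitmap_queue. */
struct model_sbq {
	const char *name;
};

/* Hypothetical stand-in for struct blk_mq_tags. */
struct model_tags {
	unsigned int nr_reserved_tags;
	struct model_sbq bitmap_tags;
	struct model_sbq breserved_tags;
};

/* Assumption: reserved tags occupy the range [0, nr_reserved_tags). */
static bool model_tag_is_reserved(const struct model_tags *tags,
				  unsigned int tag)
{
	return tag < tags->nr_reserved_tags;
}

/* Pick the queue to wait on: the one the request's tag came from. */
static struct model_sbq *model_pick_wait_sbq(struct model_tags *tags,
					     unsigned int tag)
{
	assert(tags != NULL);

	if (model_tag_is_reserved(tags, tag))
		return &tags->breserved_tags;
	return &tags->bitmap_tags;
}
```

Waiting on the matching queue means a waiter for a reserved tag is woken by a freed reserved tag, rather than by unrelated activity on the normal tag bitmap.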

2023-01-11 05:16:17

by Kemeng Shi

Subject: [PATCH v3 06/14] blk-mq: remove unncessary from_schedule parameter in blk_mq_plug_issue_direct

Function blk_mq_plug_issue_direct tries to issue the batched requests in
the plug list to the driver directly. We only issue plug requests to the
driver if we are not called from schedule, so the from_schedule parameter
of blk_mq_plug_issue_direct is always false, and the same holds for
blk_mq_commit_rqs, which is only called from blk_mq_plug_issue_direct.
Remove the unnecessary from_schedule parameter of blk_mq_plug_issue_direct
and blk_mq_commit_rqs.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ca2be137d6db..c6cc3feb3b84 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2555,11 +2555,10 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
spin_unlock(&ctx->lock);
}

-static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int *queued,
- bool from_schedule)
+static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int *queued)
{
if (hctx->queue->mq_ops->commit_rqs) {
- trace_block_unplug(hctx->queue, *queued, !from_schedule);
+ trace_block_unplug(hctx->queue, *queued, true);
hctx->queue->mq_ops->commit_rqs(hctx);
}
*queued = 0;
@@ -2688,7 +2687,7 @@ static blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
return __blk_mq_try_issue_directly(rq->mq_hctx, rq, true, last);
}

-static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
+static void blk_mq_plug_issue_direct(struct blk_plug *plug)
{
struct blk_mq_hw_ctx *hctx = NULL;
struct request *rq;
@@ -2701,7 +2700,7 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)

if (hctx != rq->mq_hctx) {
if (hctx)
- blk_mq_commit_rqs(hctx, &queued, from_schedule);
+ blk_mq_commit_rqs(hctx, &queued);
hctx = rq->mq_hctx;
}

@@ -2713,7 +2712,7 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
case BLK_STS_RESOURCE:
case BLK_STS_DEV_RESOURCE:
blk_mq_request_bypass_insert(rq, false, true);
- blk_mq_commit_rqs(hctx, &queued, from_schedule);
+ blk_mq_commit_rqs(hctx, &queued);
return;
default:
blk_mq_end_request(rq, ret);
@@ -2727,7 +2726,7 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
* there was more coming, but that turned out to be a lie.
*/
if (errors)
- blk_mq_commit_rqs(hctx, &queued, from_schedule);
+ blk_mq_commit_rqs(hctx, &queued);
}

static void __blk_mq_flush_plug_list(struct request_queue *q,
@@ -2798,7 +2797,7 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
}

blk_mq_run_dispatch_ops(q,
- blk_mq_plug_issue_direct(plug, false));
+ blk_mq_plug_issue_direct(plug));
if (rq_list_empty(plug->mq_list))
return;
}
--
2.30.0

2023-01-11 05:16:41

by Kemeng Shi

Subject: [PATCH v3 12/14] blk-mq: remove set of bd->last when get driver tag for next request fails

Commit 113285b473824 ("blk-mq: ensure that bd->last is always set
correctly") sets last if we fail to get a driver tag for the next
request. This avoids a missed flush: we break the list walk there and
never send the final request in the list, which would normally be sent
with last set.
This code seems stale now because the flush it introduces is always
redundant:
For the case where tags are really exhausted, we send an extra flush if
we find the list is not empty after the list walk.
For the case where some tag is freed before the retry in
blk_mq_prep_dispatch_rq for the next request, we can get a tag for the
next request in the retry, and the flush already notified is not
necessary.

Just remove this stale code.

Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 24 ++----------------------
1 file changed, 2 insertions(+), 22 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index b2133af1c846..1b66a5169be2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1923,16 +1923,6 @@ static void blk_mq_update_dispatch_busy(struct blk_mq_hw_ctx *hctx, bool busy)
static void blk_mq_handle_dev_resource(struct request *rq,
struct list_head *list)
{
- struct request *next =
- list_first_entry_or_null(list, struct request, queuelist);
-
- /*
- * If an I/O scheduler has been configured and we got a driver tag for
- * the next request already, free it.
- */
- if (next)
- blk_mq_put_driver_tag(next);
-
list_add(&rq->queuelist, list);
__blk_mq_requeue_request(rq);
}
@@ -2038,7 +2028,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
{
enum prep_dispatch prep;
struct request_queue *q = hctx->queue;
- struct request *rq, *nxt;
+ struct request *rq;
int queued;
blk_status_t ret = BLK_STS_OK;
LIST_HEAD(zone_list);
@@ -2064,17 +2054,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
list_del_init(&rq->queuelist);

bd.rq = rq;
-
- /*
- * Flag last if we have no more requests, or if we have more
- * but can't assign a driver tag to it.
- */
- if (list_empty(list))
- bd.last = true;
- else {
- nxt = list_first_entry(list, struct request, queuelist);
- bd.last = !blk_mq_get_driver_tag(nxt);
- }
+ bd.last = list_empty(list);

/*
* once the request is queued to lld, no need to cover the
--
2.30.0

2023-01-11 05:16:47

by Kemeng Shi

Subject: [PATCH v3 05/14] blk-mq: remove unnecessary list_empty check in blk_mq_try_issue_list_directly

We only break the list walk if we get 'BLK_STS_*RESOURCE', and we also
count errors for 'BLK_STS_*RESOURCE'. So if the list is not empty,
errors is always non-zero and the list_empty check is unnecessary.
Removing it also saves a redundant list_empty check when the error
happened while sending the last request in the list.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2bfe83f2bcca..ca2be137d6db 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2839,8 +2839,7 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
* the driver there was more coming, but that turned out to
* be a lie.
*/
- if ((!list_empty(list) || errors) &&
- hctx->queue->mq_ops->commit_rqs && queued)
+ if (errors && hctx->queue->mq_ops->commit_rqs && queued)
hctx->queue->mq_ops->commit_rqs(hctx);
}

--
2.30.0

2023-01-11 05:16:59

by Kemeng Shi

Subject: [PATCH v3 04/14] blk-mq: Fix potential io hung for shared sbitmap per tagset

Commit f906a6a0f4268 ("blk-mq: improve tag waiting setup for non-shared
tags") marks restart for unshared tags as an improvement. At that time,
tags were only shared between queues and we could check whether tags were
shared by testing BLK_MQ_F_TAG_SHARED.
Afterwards, commit 32bc15afed04b ("blk-mq: Facilitate a shared sbitmap per
tagset") enabled tag sharing between hctxs inside a queue. We only
mark restart for shared hctxs inside a queue, which may cause an I/O hang
if no tag is currently allocated by the hctxs that are going to be marked
for restart.
Wait on sbitmap_queue instead of marking restart in the shared hctxs case
to fix this.

Fixes: 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per tagset")
Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index de0e0d70cba2..2bfe83f2bcca 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1831,7 +1831,8 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
wait_queue_entry_t *wait;
bool ret;

- if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
+ if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) &&
+ !(blk_mq_is_shared_tags(hctx->flags))) {
blk_mq_sched_mark_restart_hctx(hctx);

/*
@@ -2101,7 +2102,8 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
bool needs_restart;
/* For non-shared tags, the RESTART check will suffice */
bool no_tag = prep == PREP_DISPATCH_NO_TAG &&
- (hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED);
+ ((hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) ||
+ blk_mq_is_shared_tags(hctx->flags));

if (nr_budgets)
blk_mq_release_budgets(q, list);
--
2.30.0

2023-01-11 05:17:47

by Kemeng Shi

Subject: [PATCH v3 14/14] blk-mq: correct stale comment of .get_budget

Commit 88022d7201e96 ("blk-mq: don't handle failure in .get_budget")
removed the BLK_STS_RESOURCE return value, and we now only check whether
we can get the budget from .get_budget().
Correct the stale comment ".get_budget() returns BLK_STS_NO_RESOURCE"
to ".get_budget() fails to get the budget".

Fixes: 88022d7201e9 ("blk-mq: don't handle failure in .get_budget")
Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq-sched.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index ae40cdb7a383..06b312c69114 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -81,7 +81,7 @@ static bool blk_mq_dispatch_hctx_list(struct list_head *rq_list)
/*
* Only SCSI implements .get_budget and .put_budget, and SCSI restarts
* its queue by itself in its completion handler, so we don't need to
- * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
+ * restart queue if .get_budget() fails to get the budget.
*
* Returns -EAGAIN if hctx->dispatch was found non-empty and run_work has to
* be run again. This is necessary to avoid starving flushes.
@@ -209,7 +209,7 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
/*
* Only SCSI implements .get_budget and .put_budget, and SCSI restarts
* its queue by itself in its completion handler, so we don't need to
- * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
+ * restart queue if .get_budget() fails to get the budget.
*
* Returns -EAGAIN if hctx->dispatch was found non-empty and run_work has to
* be run again. This is necessary to avoid starving flushes.
--
2.30.0

2023-01-11 05:18:48

by Kemeng Shi

Subject: [PATCH v3 09/14] blk-mq: use blk_mq_commit_rqs helper in blk_mq_try_issue_list_directly

Call blk_mq_commit_rqs instead of accessing ->commit_rqs directly. As the
comment of blk_mq_commit_rqs explains, we only need to call it explicitly
in two cases:
-did not queue everything initially scheduled to queue
-the last attempt to queue a request failed
Both cases can be detected from ret of the last request issued, which is
the one that ends the list walk. Then we can remove the error count and
the unnecessary commit triggered by errors outside the cases described
above.

Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c6c84f44c7a6..f45d81e20d9e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2819,17 +2819,15 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
struct list_head *list)
{
int queued = 0;
- int errors = 0;
+ blk_status_t ret;

while (!list_empty(list)) {
- blk_status_t ret;
struct request *rq = list_first_entry(list, struct request,
queuelist);

list_del_init(&rq->queuelist);
ret = blk_mq_request_issue_directly(rq, list_empty(list));
if (ret != BLK_STS_OK) {
- errors++;
if (ret == BLK_STS_RESOURCE ||
ret == BLK_STS_DEV_RESOURCE) {
blk_mq_request_bypass_insert(rq, false,
@@ -2841,13 +2839,8 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
queued++;
}

- /*
- * If we didn't flush the entire list, we could have told
- * the driver there was more coming, but that turned out to
- * be a lie.
- */
- if (errors && hctx->queue->mq_ops->commit_rqs && queued)
- hctx->queue->mq_ops->commit_rqs(hctx);
+ if (ret != BLK_STS_OK)
+ blk_mq_commit_rqs(hctx, queued);
}

static bool blk_mq_attempt_bio_merge(struct request_queue *q,
--
2.30.0

2023-01-11 05:19:59

by Kemeng Shi

Subject: [PATCH v3 13/14] blk-mq: use switch/case to improve readability in blk_mq_try_issue_list_directly

Use switch/case to handle errors, as other functions do, to improve
readability in blk_mq_try_issue_list_directly.

Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1b66a5169be2..baa65a15abb5 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2806,18 +2806,22 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,

list_del_init(&rq->queuelist);
ret = blk_mq_request_issue_directly(rq, list_empty(list));
- if (ret != BLK_STS_OK) {
- if (ret == BLK_STS_RESOURCE ||
- ret == BLK_STS_DEV_RESOURCE) {
- blk_mq_request_bypass_insert(rq, false,
- list_empty(list));
- break;
- }
- blk_mq_end_request(rq, ret);
- } else
+ switch (ret) {
+ case BLK_STS_OK:
queued++;
+ break;
+ case BLK_STS_RESOURCE:
+ case BLK_STS_DEV_RESOURCE:
+ blk_mq_request_bypass_insert(rq, false,
+ list_empty(list));
+ goto out;
+ default:
+ blk_mq_end_request(rq, ret);
+ break;
+ }
}

+out:
if (ret != BLK_STS_OK)
blk_mq_commit_rqs(hctx, queued);
}
--
2.30.0

2023-01-11 05:38:15

by Kemeng Shi

Subject: [PATCH v3 07/14] blk-mq: make blk_mq_commit_rqs a general function for all commits

1. rename original blk_mq_commit_rqs to blk_mq_plug_commit_rqs, as
trace_block_unplug is only needed when we dispatch requests from the plug
list, so we need a commit_rqs wrapper for this case. Besides, this patch
adds a queued check and only commits if any request was queued,
to keep commit behavior consistent and remove unnecessary commits.
2. add new blk_mq_commit_rqs for general commits. The new
blk_mq_commit_rqs does not clear queued, as clearing queued is not
wanted generally.
3. document the rule for the unusual cases which need an explicit
commit_rqs.

Suggested-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 38 ++++++++++++++++++++++++++------------
1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c6cc3feb3b84..98f6003474f2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2007,6 +2007,29 @@ static void blk_mq_release_budgets(struct request_queue *q,
}
}

+/* blk_mq_commit_rqs and blk_mq_plug_commit_rqs notify driver using
+ * bd->last that there is no more requests. (See comment in struct
+ * blk_mq_ops for commit_rqs for details)
+ * Attention, we should explicitly call this in unusual cases:
+ * 1) did not queue everything initially scheduled to queue
+ * 2) the last attempt to queue a request failed
+ */
+static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int queued)
+{
+ if (hctx->queue->mq_ops->commit_rqs && queued) {
+ hctx->queue->mq_ops->commit_rqs(hctx);
+ }
+}
+
+static void blk_mq_plug_commit_rqs(struct blk_mq_hw_ctx *hctx, int *queued)
+{
+ if (hctx->queue->mq_ops->commit_rqs && *queued) {
+ trace_block_unplug(hctx->queue, *queued, true);
+ hctx->queue->mq_ops->commit_rqs(hctx);
+ }
+ *queued = 0;
+}
+
/*
* Returns true if we did some work AND can potentially do more.
*/
@@ -2555,15 +2578,6 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
spin_unlock(&ctx->lock);
}

-static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int *queued)
-{
- if (hctx->queue->mq_ops->commit_rqs) {
- trace_block_unplug(hctx->queue, *queued, true);
- hctx->queue->mq_ops->commit_rqs(hctx);
- }
- *queued = 0;
-}
-
static void blk_mq_bio_to_request(struct request *rq, struct bio *bio,
unsigned int nr_segs)
{
@@ -2700,7 +2714,7 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug)

if (hctx != rq->mq_hctx) {
if (hctx)
- blk_mq_commit_rqs(hctx, &queued);
+ blk_mq_plug_commit_rqs(hctx, &queued);
hctx = rq->mq_hctx;
}

@@ -2712,7 +2726,7 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug)
case BLK_STS_RESOURCE:
case BLK_STS_DEV_RESOURCE:
blk_mq_request_bypass_insert(rq, false, true);
- blk_mq_commit_rqs(hctx, &queued);
+ blk_mq_plug_commit_rqs(hctx, &queued);
return;
default:
blk_mq_end_request(rq, ret);
@@ -2726,7 +2740,7 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug)
* there was more coming, but that turned out to be a lie.
*/
if (errors)
- blk_mq_commit_rqs(hctx, &queued);
+ blk_mq_plug_commit_rqs(hctx, &queued);
}

static void __blk_mq_flush_plug_list(struct request_queue *q,
--
2.30.0
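
The difference between the two helpers can be modelled in user space as below. This is a sketch under stated assumptions: the `model_*` types and `count_commit` are invented for illustration, the optional callback stands in for `mq_ops->commit_rqs`, and the trace call of the plug variant is left out.

```c
#include <assert.h>
#include <stddef.h>

struct model_hctx;

/* Hypothetical stand-in for blk_mq_ops: the callback is optional. */
struct model_ops {
	void (*commit_rqs)(struct model_hctx *hctx);
};

/* Hypothetical stand-in for blk_mq_hw_ctx; commits counts callback runs. */
struct model_hctx {
	const struct model_ops *ops;
	int commits;
};

/* Example driver callback used to observe when a commit happens. */
static void count_commit(struct model_hctx *hctx)
{
	hctx->commits++;
}

/* General helper: commits only if something was queued, keeps queued. */
static void model_commit_rqs(struct model_hctx *hctx, int queued)
{
	if (hctx->ops->commit_rqs && queued)
		hctx->ops->commit_rqs(hctx);
}

/* Plug variant: same condition, but also resets the caller's counter. */
static void model_plug_commit_rqs(struct model_hctx *hctx, int *queued)
{
	assert(queued != NULL);

	if (hctx->ops->commit_rqs && *queued)
		hctx->ops->commit_rqs(hctx);
	*queued = 0;
}
```

The plug variant resets the counter because the plug path commits once per hardware queue and then starts counting again for the next one, while the general helper is called once at the end of a walk.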

2023-01-11 05:38:29

by Christoph Hellwig

Subject: Re: [PATCH v3 08/14] blk-mq: remove unncessary error count and commit in blk_mq_plug_issue_direct

On Wed, Jan 11, 2023 at 09:01:53PM +0800, Kemeng Shi wrote:
> We need only to explicitly commit in two error cases:
> -did not queue everything initially scheduled to queue
> -the last attempt to queue a request failed
> (see comment of blk_mq_commit_rqs for more details).
> Both cases can be checked with ret of last request which breaks list walk.
> Remove unnecessary error count and unnecessary commit triggered by error
> which is not covered by cases described above.
>
> Signed-off-by: Kemeng Shi <[email protected]>
> ---
> block/blk-mq.c | 14 ++++----------
> 1 file changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 98f6003474f2..c6c84f44c7a6 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2706,11 +2706,10 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug)
> struct blk_mq_hw_ctx *hctx = NULL;
> struct request *rq;
> int queued = 0;
> - int errors = 0;
> + blk_status_t ret;

I think we need to initialize this to BLK_STS_OK here.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <[email protected]>
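
The initialization concern raised above can be illustrated with a small user-space model. It is only a sketch, not the kernel function: `model_issue_list` and the `enum status` values are invented names, and the per-hctx commit inside the loop is left out. If the list is empty the loop body never runs, so the final check reads `ret` without any assignment unless it starts out as OK.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for blk_status_t values; not the kernel's. */
enum status { STS_OK, STS_RESOURCE, STS_IOERR };

/*
 * Model of the issue loop: results[i] is the status the driver would
 * return for request i. Returns 1 if an explicit commit would be sent
 * after the walk, 0 otherwise.
 */
static int model_issue_list(const enum status *results, size_t n)
{
	enum status ret = STS_OK;	/* initialized: the loop may not run */
	int queued = 0;
	size_t i;

	assert(results != NULL || n == 0);

	for (i = 0; i < n; i++) {
		ret = results[i];
		if (ret == STS_OK)
			queued++;
		else if (ret == STS_RESOURCE)
			break;		/* busy: stop walking the list */
		/* any other error: the request is ended, keep walking */
	}

	/* Commit only if the last attempt failed and something was queued. */
	return ret != STS_OK && queued > 0;
}
```

The model also shows the behavior change of the patch: an error in the middle of the list followed by a successful last request no longer triggers a commit, because only the status of the last issued request is examined.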

2023-01-11 05:40:46

by Kemeng Shi

Subject: [PATCH v3 01/14] blk-mq: avoid sleep in blk_mq_alloc_request_hctx

Commit 1f5bd336b9150 ("blk-mq: add blk_mq_alloc_request_hctx") added
blk_mq_alloc_request_hctx to send commands to a specific queue. If
BLK_MQ_REQ_NOWAIT is not set in tag allocation, we may switch to a different
hctx after sleeping and get a tag from an unexpected hctx. So
BLK_MQ_REQ_NOWAIT must be set in flags for blk_mq_alloc_request_hctx.
After commit 600c3b0cea784 ("blk-mq: open code __blk_mq_alloc_request in
blk_mq_alloc_request_hctx"), blk_mq_alloc_request_hctx returns -EINVAL
only if neither BLK_MQ_REQ_NOWAIT nor BLK_MQ_REQ_RESERVED is set, instead
of whenever BLK_MQ_REQ_NOWAIT is not set. So if BLK_MQ_REQ_NOWAIT is not
set and BLK_MQ_REQ_RESERVED is set, blk_mq_alloc_request_hctx could
allocate a tag from an unexpected hctx. I guess what we need here is to
return -EINVAL if either BLK_MQ_REQ_NOWAIT or BLK_MQ_REQ_RESERVED is not
set.

Currently both BLK_MQ_REQ_NOWAIT and BLK_MQ_REQ_RESERVED are set when a
specific hctx is needed in nvme_auth_submit, nvmf_connect_io_queue
and nvmf_connect_admin_queue, so this fixes a potential missed
BLK_MQ_REQ_NOWAIT in future callers.

Fixes: 600c3b0cea78 ("blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx")
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c5cf0dbca1db..f2586d485be6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -658,7 +658,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
* allocator for this for the rare use case of a command tied to
* a specific queue.
*/
- if (WARN_ON_ONCE(!(flags & (BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_RESERVED))))
+ if (WARN_ON_ONCE(!(flags & BLK_MQ_REQ_NOWAIT)) ||
+ WARN_ON_ONCE(!(flags & BLK_MQ_REQ_RESERVED)))
return ERR_PTR(-EINVAL);

if (hctx_idx >= q->nr_hw_queues)
--
2.30.0
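
The difference between the old and new flag checks is easy to miss, so here is a minimal user-space sketch of both predicates. The `REQ_*` bit values and the function names are assumptions made for this example, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not the kernel's values. */
#define REQ_NOWAIT	(1u << 0)
#define REQ_RESERVED	(1u << 1)

/* Old check: rejects only when NEITHER flag is set. */
static bool old_check_rejects(unsigned int flags)
{
	return !(flags & (REQ_NOWAIT | REQ_RESERVED));
}

/* New check: rejects when EITHER flag is missing. */
static bool new_check_rejects(unsigned int flags)
{
	return !(flags & REQ_NOWAIT) || !(flags & REQ_RESERVED);
}

/* The case from the commit message: RESERVED set, NOWAIT missing. */
static void check_flag_examples(void)
{
	assert(!old_check_rejects(REQ_RESERVED));	/* slipped through */
	assert(new_check_rejects(REQ_RESERVED));	/* now rejected */
}
```

Only the combination with both flags set passes the new check, which matches the callers listed in the commit message.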

2023-01-11 05:43:58

by Christoph Hellwig

Subject: Re: [PATCH v3 09/14] blk-mq: use blk_mq_commit_rqs helper in blk_mq_try_issue_list_directly

On Wed, Jan 11, 2023 at 09:01:54PM +0800, Kemeng Shi wrote:
> + blk_status_t ret;

Same here, I think ret needs to be initialized now.

2023-01-11 06:09:41

by Christoph Hellwig

Subject: Re: [PATCH v3 07/14] blk-mq: make blk_mq_commit_rqs a general function for all commits

On Wed, Jan 11, 2023 at 09:01:52PM +0800, Kemeng Shi wrote:
> 1. rename orignal blk_mq_commit_rqs to blk_mq_plug_commit_rqs as
> trace_block_unplug is only needed when we dispatch request from plug list.

Why? I think always having the trace even for the commit case seems
very useful for making the traces useful.

> +/* blk_mq_commit_rqs and blk_mq_plug_commit_rqs notify driver using
> + * bd->last that there is no more requests. (See comment in struct

This is not the normal kernel comment style.

> +static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int queued)
> +{
> + if (hctx->queue->mq_ops->commit_rqs && queued) {
> + hctx->queue->mq_ops->commit_rqs(hctx);
> + }

No need for the braces.

2023-01-11 06:21:28

by Christoph Hellwig

Subject: Re: [PATCH v3 10/14] blk-mq: simplify flush check in blk_mq_dispatch_rq_list

> + if ((!list_empty(list) || ret != BLK_STS_OK))

Please drop the double braces.

2023-01-11 06:50:44

by Kemeng Shi

Subject: Re: [PATCH v3 07/14] blk-mq: make blk_mq_commit_rqs a general function for all commits


Hi Christoph, thank you for taking the time to review, and sorry for the
bother of the code style problems. I will fix them in the next version.
on 1/11/2023 1:45 PM, Christoph Hellwig wrote:
> On Wed, Jan 11, 2023 at 09:01:52PM +0800, Kemeng Shi wrote:
>> 1. rename orignal blk_mq_commit_rqs to blk_mq_plug_commit_rqs as
>> trace_block_unplug is only needed when we dispatch request from plug list.
>
> Why? I think always having the trace even for the commit case seems
> very useful for making the traces useful.
I think an unplug event more likely means that the request about to be sent to
the driver was plugged and on the plug list. The current code only traces the
unplug event when dispatching requests from the plug list. If so, would it be
better to add a new event to trace commits?
>> +/* blk_mq_commit_rqs and blk_mq_plug_commit_rqs notify driver using
>> + * bd->last that there is no more requests. (See comment in struct
>
> This is not the normal kernel comment style.
>
>> +static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int queued)
>> +{
>> + if (hctx->queue->mq_ops->commit_rqs && queued) {
>> + hctx->queue->mq_ops->commit_rqs(hctx);
>> + }
>
> No need for the braces.
>

--
Best wishes
Kemeng Shi

2023-01-16 01:32:18

by Kemeng Shi

Subject: Re: [PATCH v3 07/14] blk-mq: make blk_mq_commit_rqs a general function for all commits


on 1/11/2023 2:30 PM, Kemeng Shi wrote:
>
> Hi Christoph, thank you for taking the time to review, and sorry for the
> code style problems. I will fix them in the next version.
> on 1/11/2023 1:45 PM, Christoph Hellwig wrote:
>> On Wed, Jan 11, 2023 at 09:01:52PM +0800, Kemeng Shi wrote:
>>> 1. rename original blk_mq_commit_rqs to blk_mq_plug_commit_rqs as
>>> trace_block_unplug is only needed when we dispatch request from plug list.
>>
>> Why? I think always having the trace even for the commit case seems
>> very useful for making the traces useful.
> I think an unplug event more likely means that the request about to be sent to
> the driver was plugged and on the plug list. The current code only traces the
> unplug event when dispatching requests from the plug list. If so, would it be
> better to add a new event to trace commits?
Hi Christoph, which do you prefer now: keep the unplug event limited to
commits of requests from the plug list, or trace all commits with the
unplug event? Please let me know and I will take it into account in the next
version. Thanks.
>>> +/* blk_mq_commit_rqs and blk_mq_plug_commit_rqs notify driver using
>>> + * bd->last that there is no more requests. (See comment in struct
>>
>> This is not the normal kernel comment style.
>>
>>> +static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int queued)
>>> +{
>>> + if (hctx->queue->mq_ops->commit_rqs && queued) {
>>> + hctx->queue->mq_ops->commit_rqs(hctx);
>>> + }
>>
>> No need for the braces.
>>
>

--
Best wishes
Kemeng Shi

2023-01-16 17:06:45

by Christoph Hellwig

Subject: Re: [PATCH v3 07/14] blk-mq: make blk_mq_commit_rqs a general function for all commits

On Mon, Jan 16, 2023 at 09:07:00AM +0800, Kemeng Shi wrote:
> >> Why? I think always having the trace even for the commit case seems
> >> very useful for making the traces useful.
> > > I think an unplug event more likely means that the request about to be sent to
> > > the driver was plugged and on the plug list. The current code only traces the
> > > unplug event when dispatching requests from the plug list. If so, would it be
> > > better to add a new event to trace commits?
> Hi Christoph, which do you prefer now: keep the unplug event limited to
> commits of requests from the plug list, or trace all commits with the
> unplug event? Please let me know and I will take it into account in the next
> version. Thanks.

To me always having the trace feels more useful, but let's see if Jens
has an opinion on it.

2023-01-16 18:22:04

by Jens Axboe

Subject: Re: [PATCH v3 07/14] blk-mq: make blk_mq_commit_rqs a general function for all commits

On 1/16/23 9:09 AM, Christoph Hellwig wrote:
> On Mon, Jan 16, 2023 at 09:07:00AM +0800, Kemeng Shi wrote:
>>>> Why? I think always having the trace even for the commit case seems
>>>> very useful for making the traces useful.
>>> I think an unplug event more likely means that the request about to be sent to
>>> the driver was plugged and on the plug list. The current code only traces the
>>> unplug event when dispatching requests from the plug list. If so, would it be
>>> better to add a new event to trace commits?
>> Hi Christoph, which do you prefer now: keep the unplug event limited to
>> commits of requests from the plug list, or trace all commits with the
>> unplug event? Please let me know and I will take it into account in the next
>> version. Thanks.
>
> To me always having the trace feels more useful, but let's see if Jens
> has an opinion on it.

Agree, that is probably the saner option.

--
Jens Axboe


2023-01-17 01:03:25

by Kemeng Shi

Subject: Re: [PATCH v3 07/14] blk-mq: make blk_mq_commit_rqs a general function for all commits



on 1/17/2023 12:13 AM, Jens Axboe wrote:
> On 1/16/23 9:09 AM, Christoph Hellwig wrote:
>> On Mon, Jan 16, 2023 at 09:07:00AM +0800, Kemeng Shi wrote:
>>>>> Why? I think always having the trace even for the commit case seems
>>>>> very useful for making the traces useful.
>>>> I think an unplug event more likely means that the request about to be sent to
>>>> the driver was plugged and on the plug list. The current code only traces the
>>>> unplug event when dispatching requests from the plug list. If so, would it be
>>>> better to add a new event to trace commits?
>>> Hi Christoph, which do you prefer now: keep the unplug event limited to
>>> commits of requests from the plug list, or trace all commits with the
>>> unplug event? Please let me know and I will take it into account in the next
>>> version. Thanks.
>>
>> To me always having the trace feels more useful, but let's see if Jens
>> has an opinion on it.
>
> Agree, that is probably the saner option.
>
Thanks for the replies. I will trace all commits with the unplug event in the next version.

--
Best wishes
Kemeng Shi
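
The outcome agreed on in this thread, tracing every commit with the unplug event, might look roughly like the sketch below. It is a userspace illustration, not the code from a later version of the series. The struct definitions are reduced stand-ins, the from_schedule parameter is an assumption about how the caller context would be passed in, and trace_block_unplug() here is a recording stub standing in for the real tracepoint.

```c
#include <stdbool.h>
#include <stddef.h>

struct blk_mq_hw_ctx;

/* Reduced stand-ins for the kernel structures; only the fields used here. */
struct blk_mq_ops {
	void (*commit_rqs)(struct blk_mq_hw_ctx *hctx);
};

struct request_queue {
	const struct blk_mq_ops *mq_ops;
	/* Sketch-only bookkeeping, not kernel fields. */
	int unplug_events;
	unsigned int last_depth;
	bool last_explicit;
	int commits;
};

struct blk_mq_hw_ctx {
	struct request_queue *queue;
};

/* Recording stub standing in for the block_unplug tracepoint. */
static void trace_block_unplug(struct request_queue *q, unsigned int depth,
			       bool is_explicit)
{
	q->unplug_events++;
	q->last_depth = depth;
	q->last_explicit = is_explicit;
}

/*
 * One helper for every commit: the plug path and the direct-issue paths
 * both call it, so each commit emits an unplug event before the driver
 * is notified.
 */
static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int queued,
			      bool from_schedule)
{
	if (hctx->queue->mq_ops->commit_rqs && queued) {
		trace_block_unplug(hctx->queue, queued, !from_schedule);
		hctx->queue->mq_ops->commit_rqs(hctx);
	}
}

/* Hypothetical driver callback that only counts how often it was called. */
static void count_commit(struct blk_mq_hw_ctx *hctx)
{
	hctx->queue->commits++;
}
```

With a single helper, a separate blk_mq_plug_commit_rqs would no longer be needed, since the trace is no longer specific to the plug list.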