2023-01-18 01:41:51

by Kemeng Shi

Subject: [PATCH v4 01/14] blk-mq: avoid sleep in blk_mq_alloc_request_hctx

Commit 1f5bd336b9150 ("blk-mq: add blk_mq_alloc_request_hctx") added
blk_mq_alloc_request_hctx to send commands to a specific queue. If
BLK_MQ_REQ_NOWAIT is not set in tag allocation, we may migrate to a
different hctx after sleeping and get a tag from an unexpected hctx, so
BLK_MQ_REQ_NOWAIT must be set in flags for blk_mq_alloc_request_hctx.
After commit 600c3b0cea784 ("blk-mq: open code __blk_mq_alloc_request in
blk_mq_alloc_request_hctx"), blk_mq_alloc_request_hctx returns -EINVAL
only if neither BLK_MQ_REQ_NOWAIT nor BLK_MQ_REQ_RESERVED is set, rather
than whenever BLK_MQ_REQ_NOWAIT is missing. So if BLK_MQ_REQ_NOWAIT is
not set but BLK_MQ_REQ_RESERVED is, blk_mq_alloc_request_hctx could
still allocate a tag from an unexpected hctx. What we need here is to
return -EINVAL if either BLK_MQ_REQ_NOWAIT or BLK_MQ_REQ_RESERVED is not
set.

Currently both BLK_MQ_REQ_NOWAIT and BLK_MQ_REQ_RESERVED are set when a
specific hctx is needed in nvme_auth_submit, nvmf_connect_io_queue and
nvmf_connect_admin_queue. Fix the check so that a future caller that
forgets BLK_MQ_REQ_NOWAIT is rejected.
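
For illustration, a caller then looks roughly like the sketch below
(hypothetical code, not taken from the nvme drivers; the opf value is
only an example):

        struct request *rq;

        /* both flags are mandatory for hctx-specific allocation */
        rq = blk_mq_alloc_request_hctx(q, REQ_OP_DRV_OUT,
                        BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_RESERVED, hctx_idx);
        if (IS_ERR(rq))
                return PTR_ERR(rq);
        /* leaving out either flag now fails with -EINVAL */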

Fixes: 600c3b0cea78 ("blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx")
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2c49b4151da1..9c6a6ce9eaf2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -658,7 +658,8 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
* allocator for this for the rare use case of a command tied to
* a specific queue.
*/
- if (WARN_ON_ONCE(!(flags & (BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_RESERVED))))
+ if (WARN_ON_ONCE(!(flags & BLK_MQ_REQ_NOWAIT)) ||
+ WARN_ON_ONCE(!(flags & BLK_MQ_REQ_RESERVED)))
return ERR_PTR(-EINVAL);

if (hctx_idx >= q->nr_hw_queues)
--
2.30.0


2023-01-18 01:41:56

by Kemeng Shi

Subject: [PATCH v4 10/14] blk-mq: simplify flush check in blk_mq_dispatch_rq_list

1. Remove the checks of needs_resource and ret == BLK_STS_DEV_RESOURCE.
For the busy errors BLK_STS_*RESOURCE, the request is always added
back to the list, so needs_resource cannot be true and ret cannot be
BLK_STS_DEV_RESOURCE when the list is empty. We can remove these dead
checks.

2. Check ret of the last request instead of errors.
If the list is empty, we only need to explicitly call commit_rqs
when an error happened on the last request, which is stored in ret. So
check ret of the last request instead of errors to remove the
unnecessary commit_rqs calls triggered by errors from earlier requests.

Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c945a4602849..d4fdb01475b6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2108,9 +2108,9 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
/* If we didn't flush the entire list, we could have told the driver
* there was more coming, but that turned out to be a lie.
*/
- if ((!list_empty(list) || errors || needs_resource ||
- ret == BLK_STS_DEV_RESOURCE) && q->mq_ops->commit_rqs && queued)
- q->mq_ops->commit_rqs(hctx);
+ if (!list_empty(list) || ret != BLK_STS_OK)
+ blk_mq_commit_rqs(hctx, queued, false);
+
/*
* Any items that need requeuing? Stuff them into hctx->dispatch,
* that is where we will continue on next queue run.
--
2.30.0

2023-01-18 01:42:02

by Kemeng Shi

Subject: [PATCH v4 09/14] blk-mq: use blk_mq_commit_rqs helper in blk_mq_try_issue_list_directly

Call blk_mq_commit_rqs instead of accessing ->commit_rqs directly. As
noted in the comment of blk_mq_commit_rqs, we only need to call it
explicitly in two cases:
 - we did not queue everything initially scheduled to queue
 - the last attempt to queue a request failed
Both cases can be detected from the ret of the last request, which breaks
the list walk. Then we can remove the unnecessary error count and the
unnecessary commits triggered by errors outside the cases described above.

Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4066fe5c3147..c945a4602849 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2815,17 +2815,15 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
struct list_head *list)
{
int queued = 0;
- int errors = 0;
+ blk_status_t ret = BLK_STS_OK;

while (!list_empty(list)) {
- blk_status_t ret;
struct request *rq = list_first_entry(list, struct request,
queuelist);

list_del_init(&rq->queuelist);
ret = blk_mq_request_issue_directly(rq, list_empty(list));
if (ret != BLK_STS_OK) {
- errors++;
if (ret == BLK_STS_RESOURCE ||
ret == BLK_STS_DEV_RESOURCE) {
blk_mq_request_bypass_insert(rq, false,
@@ -2837,13 +2835,8 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
queued++;
}

- /*
- * If we didn't flush the entire list, we could have told
- * the driver there was more coming, but that turned out to
- * be a lie.
- */
- if (errors && hctx->queue->mq_ops->commit_rqs && queued)
- hctx->queue->mq_ops->commit_rqs(hctx);
+ if (ret != BLK_STS_OK)
+ blk_mq_commit_rqs(hctx, queued, false);
}

static bool blk_mq_attempt_bio_merge(struct request_queue *q,
--
2.30.0

2023-01-18 01:42:05

by Kemeng Shi

Subject: [PATCH v4 13/14] blk-mq: use switch/case to improve readability in blk_mq_try_issue_list_directly

Use switch/case to handle errors, as other functions do, to improve
readability in blk_mq_try_issue_list_directly.

Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index afd4f0c3166b..bc6d7e411b1a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2802,18 +2802,22 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,

list_del_init(&rq->queuelist);
ret = blk_mq_request_issue_directly(rq, list_empty(list));
- if (ret != BLK_STS_OK) {
- if (ret == BLK_STS_RESOURCE ||
- ret == BLK_STS_DEV_RESOURCE) {
- blk_mq_request_bypass_insert(rq, false,
- list_empty(list));
- break;
- }
- blk_mq_end_request(rq, ret);
- } else
+ switch (ret) {
+ case BLK_STS_OK:
queued++;
+ break;
+ case BLK_STS_RESOURCE:
+ case BLK_STS_DEV_RESOURCE:
+ blk_mq_request_bypass_insert(rq, false,
+ list_empty(list));
+ goto out;
+ default:
+ blk_mq_end_request(rq, ret);
+ break;
+ }
}

+out:
if (ret != BLK_STS_OK)
blk_mq_commit_rqs(hctx, queued, false);
}
--
2.30.0

2023-01-18 01:42:20

by Kemeng Shi

Subject: [PATCH v4 14/14] blk-mq: correct stale comment of .get_budget

Commit 88022d7201e96 ("blk-mq: don't handle failure in .get_budget")
removed the BLK_STS_RESOURCE return value, and we now only check whether
we can get the budget from .get_budget().
Correct the stale comment ".get_budget() returns BLK_STS_NO_RESOURCE"
to ".get_budget() fails to get the budget".

Fixes: 88022d7201e9 ("blk-mq: don't handle failure in .get_budget")
Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq-sched.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index ae40cdb7a383..06b312c69114 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -81,7 +81,7 @@ static bool blk_mq_dispatch_hctx_list(struct list_head *rq_list)
/*
* Only SCSI implements .get_budget and .put_budget, and SCSI restarts
* its queue by itself in its completion handler, so we don't need to
- * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
+ * restart queue if .get_budget() fails to get the budget.
*
* Returns -EAGAIN if hctx->dispatch was found non-empty and run_work has to
* be run again. This is necessary to avoid starving flushes.
@@ -209,7 +209,7 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
/*
* Only SCSI implements .get_budget and .put_budget, and SCSI restarts
* its queue by itself in its completion handler, so we don't need to
- * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
+ * restart queue if .get_budget() fails to get the budget.
*
* Returns -EAGAIN if hctx->dispatch was found non-empty and run_work has to
* be run again. This is necessary to avoid starving flushes.
--
2.30.0

2023-01-18 01:42:30

by Kemeng Shi

Subject: [PATCH v4 05/14] blk-mq: remove unnecessary list_empty check in blk_mq_try_issue_list_directly

We only break the list walk if we get a 'BLK_STS_*RESOURCE' error, and
such errors are also counted in errors. So if the list is not empty,
errors is always non-zero and we can remove the unnecessary list_empty
check. This also drops the redundant list_empty check for the case where
the error happened while sending the last request in the list.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6eb23bbe1787..7160af49c245 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2839,8 +2839,7 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
* the driver there was more coming, but that turned out to
* be a lie.
*/
- if ((!list_empty(list) || errors) &&
- hctx->queue->mq_ops->commit_rqs && queued)
+ if (errors && hctx->queue->mq_ops->commit_rqs && queued)
hctx->queue->mq_ops->commit_rqs(hctx);
}

--
2.30.0

2023-01-18 01:42:30

by Kemeng Shi

Subject: [PATCH v4 12/14] blk-mq: remove set of bd->last when get driver tag for next request fails

Commit 113285b473824 ("blk-mq: ensure that bd->last is always set
correctly") sets last if we fail to get a driver tag for the next
request, to avoid missing a flush: we break the list walk and never
send the last request in the list, which would normally be sent with
last set.
This code seems stale now because the flush it introduces is always
redundant:
For the case that tags are really exhausted, we send an extra flush
anyway if we find the list non-empty after the list walk.
For the case that some tag is freed before the retry in
blk_mq_prep_dispatch_rq for the next request, we get a tag for the next
request on retry and the flush already signalled is not necessary.

Just remove this stale code.
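
For illustration, the dispatch loop after this change looks roughly like
the sketch below (not a literal copy of blk_mq_dispatch_rq_list; budget
and error handling are omitted):

        do {
                struct blk_mq_queue_data bd;

                rq = list_first_entry(list, struct request, queuelist);

                /* budget and driver tag for rq are both taken in here */
                prep = blk_mq_prep_dispatch_rq(rq, !nr_budgets);
                if (prep != PREP_DISPATCH_OK)
                        break;

                list_del_init(&rq->queuelist);

                bd.rq = rq;
                /* no need to peek at the next request's driver tag */
                bd.last = list_empty(list);

                ret = q->mq_ops->queue_rq(hctx, &bd);
                /* BLK_STS_*RESOURCE handling omitted in this sketch */
        } while (!list_empty(list));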

Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 24 ++----------------------
1 file changed, 2 insertions(+), 22 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 90471b5c868f..afd4f0c3166b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1923,16 +1923,6 @@ static void blk_mq_update_dispatch_busy(struct blk_mq_hw_ctx *hctx, bool busy)
static void blk_mq_handle_dev_resource(struct request *rq,
struct list_head *list)
{
- struct request *next =
- list_first_entry_or_null(list, struct request, queuelist);
-
- /*
- * If an I/O scheduler has been configured and we got a driver tag for
- * the next request already, free it.
- */
- if (next)
- blk_mq_put_driver_tag(next);
-
list_add(&rq->queuelist, list);
__blk_mq_requeue_request(rq);
}
@@ -2032,7 +2022,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
{
enum prep_dispatch prep;
struct request_queue *q = hctx->queue;
- struct request *rq, *nxt;
+ struct request *rq;
int queued;
blk_status_t ret = BLK_STS_OK;
LIST_HEAD(zone_list);
@@ -2058,17 +2048,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
list_del_init(&rq->queuelist);

bd.rq = rq;
-
- /*
- * Flag last if we have no more requests, or if we have more
- * but can't assign a driver tag to it.
- */
- if (list_empty(list))
- bd.last = true;
- else {
- nxt = list_first_entry(list, struct request, queuelist);
- bd.last = !blk_mq_get_driver_tag(nxt);
- }
+ bd.last = list_empty(list);

/*
* once the request is queued to lld, no need to cover the
--
2.30.0

2023-01-18 01:42:43

by Kemeng Shi

Subject: [PATCH v4 11/14] blk-mq: remove unnecessary error count and check in blk_mq_dispatch_rq_list

blk_mq_dispatch_rq_list reports through its bool return value whether the
hctx is busy: it returns true if we are not busy and can handle more, and
false otherwise. Inside blk_mq_dispatch_rq_list, errors is only used if the
list is empty, and we return true if the list is empty and
(errors + queued) != 0.

There are three kinds of status a request can return:
 - busy error BLK_STS_*RESOURCE: the failed request is added back to the
list, so the list will not be empty.
 - BLK_STS_OK: counted in queued.
 - any other error: counted in errors.

If the list is empty, no request got a busy error, so (errors + queued)
equals the total number of requests in the list, which was checked to be
non-empty at the beginning of blk_mq_dispatch_rq_list. So
(errors + queued) != 0 always holds when the list is empty, and the
(errors + queued) != 0 check and the errors count are not needed.

Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index d4fdb01475b6..90471b5c868f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2033,7 +2033,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
enum prep_dispatch prep;
struct request_queue *q = hctx->queue;
struct request *rq, *nxt;
- int errors, queued;
+ int queued;
blk_status_t ret = BLK_STS_OK;
LIST_HEAD(zone_list);
bool needs_resource = false;
@@ -2044,7 +2044,7 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
/*
* Now process all the entries, sending them to the driver.
*/
- errors = queued = 0;
+ queued = 0;
do {
struct blk_mq_queue_data bd;

@@ -2097,7 +2097,6 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
needs_resource = true;
break;
default:
- errors++;
blk_mq_end_request(rq, ret);
}
} while (!list_empty(list));
@@ -2175,10 +2174,10 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,

blk_mq_update_dispatch_busy(hctx, true);
return false;
- } else
- blk_mq_update_dispatch_busy(hctx, false);
+ }

- return (queued + errors) != 0;
+ blk_mq_update_dispatch_busy(hctx, false);
+ return true;
}

/**
--
2.30.0

2023-01-18 01:50:38

by Kemeng Shi

Subject: [PATCH v4 07/14] blk-mq: make blk_mq_commit_rqs a general function for all commits

1. Move blk_mq_commit_rqs forward, before the functions that need commits.
2. Add a queued check to blk_mq_commit_rqs and only commit if any request
was actually queued, to keep the commit behavior consistent and avoid
unnecessary commits.
3. Split the clearing of queued out of the old plug-path blk_mq_commit_rqs,
as it is not wanted in the general helper.
4. Sync the current callers of blk_mq_commit_rqs with the new general
blk_mq_commit_rqs.
5. Document the rule for the unusual cases which need an explicit
commit_rqs.

Suggested-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 37 +++++++++++++++++++++++--------------
1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5d146ec9f8cb..159d9163c46c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2007,6 +2007,23 @@ static void blk_mq_release_budgets(struct request_queue *q,
}
}

+/*
+ * blk_mq_commit_rqs will notify driver using bd->last that there is no
+ * more requests. (See comment in struct blk_mq_ops for commit_rqs for
+ * details)
+ * Attention, we should explicitly call this in unusual cases:
+ * 1) did not queue everything initially scheduled to queue
+ * 2) the last attempt to queue a request failed
+ */
+static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int queued,
+ bool from_schedule)
+{
+ if (hctx->queue->mq_ops->commit_rqs && queued) {
+ trace_block_unplug(hctx->queue, queued, !from_schedule);
+ hctx->queue->mq_ops->commit_rqs(hctx);
+ }
+}
+
/*
* Returns true if we did some work AND can potentially do more.
*/
@@ -2555,16 +2572,6 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
spin_unlock(&ctx->lock);
}

-static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int *queued,
- bool from_schedule)
-{
- if (hctx->queue->mq_ops->commit_rqs) {
- trace_block_unplug(hctx->queue, *queued, !from_schedule);
- hctx->queue->mq_ops->commit_rqs(hctx);
- }
- *queued = 0;
-}
-
static void blk_mq_bio_to_request(struct request *rq, struct bio *bio,
unsigned int nr_segs)
{
@@ -2700,8 +2707,10 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug)
blk_status_t ret;

if (hctx != rq->mq_hctx) {
- if (hctx)
- blk_mq_commit_rqs(hctx, &queued, false);
+ if (hctx) {
+ blk_mq_commit_rqs(hctx, queued, false);
+ queued = 0;
+ }
hctx = rq->mq_hctx;
}

@@ -2713,7 +2722,7 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug)
case BLK_STS_RESOURCE:
case BLK_STS_DEV_RESOURCE:
blk_mq_request_bypass_insert(rq, false, true);
- blk_mq_commit_rqs(hctx, &queued, false);
+ blk_mq_commit_rqs(hctx, queued, false);
return;
default:
blk_mq_end_request(rq, ret);
@@ -2727,7 +2736,7 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug)
* there was more coming, but that turned out to be a lie.
*/
if (errors)
- blk_mq_commit_rqs(hctx, &queued, false);
+ blk_mq_commit_rqs(hctx, queued, false);
}

static void __blk_mq_flush_plug_list(struct request_queue *q,
--
2.30.0

2023-01-18 01:54:50

by Kemeng Shi

Subject: [PATCH v4 08/14] blk-mq: remove unnecessary error count and commit in blk_mq_plug_issue_direct

We only need to explicitly commit in two error cases:
 - we did not queue everything initially scheduled to queue
 - the last attempt to queue a request failed
(see the comment of blk_mq_commit_rqs for more details).
Both cases can be detected from the ret of the last request, which breaks
the list walk. Remove the unnecessary error count and the unnecessary
commits triggered by errors not covered by the cases described above.

Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 159d9163c46c..4066fe5c3147 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2700,11 +2700,10 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug)
struct blk_mq_hw_ctx *hctx = NULL;
struct request *rq;
int queued = 0;
- int errors = 0;
+ blk_status_t ret = BLK_STS_OK;

while ((rq = rq_list_pop(&plug->mq_list))) {
bool last = rq_list_empty(plug->mq_list);
- blk_status_t ret;

if (hctx != rq->mq_hctx) {
if (hctx) {
@@ -2722,20 +2721,15 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug)
case BLK_STS_RESOURCE:
case BLK_STS_DEV_RESOURCE:
blk_mq_request_bypass_insert(rq, false, true);
- blk_mq_commit_rqs(hctx, queued, false);
- return;
+ goto out;
default:
blk_mq_end_request(rq, ret);
- errors++;
break;
}
}

- /*
- * If we didn't flush the entire list, we could have told the driver
- * there was more coming, but that turned out to be a lie.
- */
- if (errors)
+out:
+ if (ret != BLK_STS_OK)
blk_mq_commit_rqs(hctx, queued, false);
}

--
2.30.0

2023-01-18 02:25:49

by Kemeng Shi

Subject: [PATCH v4 03/14] blk-mq: wait on correct sbitmap_queue in blk_mq_mark_tag_wait

For the shared queues case, we only wait on bitmap_tags if we fail to get
a driver tag. However, rq could come from breserved_tags, and then two
problems occur:
1. io hang if no tag is currently allocated from bitmap_tags.
2. unnecessary wakeups when a tag is freed to bitmap_tags while no tag is
freed to breserved_tags.
Wait on the sbitmap that rq came from to fix this.

Fixes: f906a6a0f426 ("blk-mq: improve tag waiting setup for non-shared tags")
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9c6a6ce9eaf2..943f2ce1e674 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1826,7 +1826,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
struct request *rq)
{
- struct sbitmap_queue *sbq = &hctx->tags->bitmap_tags;
+ struct sbitmap_queue *sbq;
struct wait_queue_head *wq;
wait_queue_entry_t *wait;
bool ret;
@@ -1849,6 +1849,10 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
if (!list_empty_careful(&wait->entry))
return false;

+ if (blk_mq_tag_is_reserved(rq->mq_hctx->sched_tags, rq->internal_tag))
+ sbq = &hctx->tags->breserved_tags;
+ else
+ sbq = &hctx->tags->bitmap_tags;
wq = &bt_wait_ptr(sbq, hctx)->wait;

spin_lock_irq(&wq->lock);
--
2.30.0

2023-01-18 02:36:45

by Kemeng Shi

Subject: [PATCH v4 04/14] blk-mq: Fix potential io hung for shared sbitmap per tagset

Commit f906a6a0f4268 ("blk-mq: improve tag waiting setup for non-shared
tags") marks restart only for unshared tags as an improvement. At that
time, tags were only shared between queues, and we could check whether
tags are shared by testing BLK_MQ_F_TAG_SHARED.
Afterwards, commit 32bc15afed04b ("blk-mq: Facilitate a shared sbitmap per
tagset") enabled tag sharing between hctxs inside a queue. For hctxs that
share tags inside a queue we still only mark restart, which may cause an
io hang if there is no tag currently allocated by the hctx about to be
marked for restart.
Wait on the sbitmap_queue instead of marking restart for the shared-hctxs
case as well to fix this.
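
For illustration only, the combined condition used in the patch can be
read as the following hypothetical helper (not a real kernel function):

        static inline bool hctx_has_shared_tags(struct blk_mq_hw_ctx *hctx)
        {
                /* tags shared between queues, or between hctxs of one queue */
                return (hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) ||
                       blk_mq_is_shared_tags(hctx->flags);
        }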

Fixes: 32bc15afed04 ("blk-mq: Facilitate a shared sbitmap per tagset")
Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 943f2ce1e674..6eb23bbe1787 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1831,7 +1831,8 @@ static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
wait_queue_entry_t *wait;
bool ret;

- if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED)) {
+ if (!(hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) &&
+ !(blk_mq_is_shared_tags(hctx->flags))) {
blk_mq_sched_mark_restart_hctx(hctx);

/*
@@ -2101,7 +2102,8 @@ bool blk_mq_dispatch_rq_list(struct blk_mq_hw_ctx *hctx, struct list_head *list,
bool needs_restart;
/* For non-shared tags, the RESTART check will suffice */
bool no_tag = prep == PREP_DISPATCH_NO_TAG &&
- (hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED);
+ ((hctx->flags & BLK_MQ_F_TAG_QUEUE_SHARED) ||
+ blk_mq_is_shared_tags(hctx->flags));

if (nr_budgets)
blk_mq_release_budgets(q, list);
--
2.30.0

2023-01-18 02:53:25

by Kemeng Shi

Subject: [PATCH v4 06/14] blk-mq: remove unnecessary from_schedule parameter in blk_mq_plug_issue_direct

Function blk_mq_plug_issue_direct tries to issue the batched requests in
the plug list to the driver directly. We only issue plugged requests to
the driver if we do not come from the scheduler, so the from_schedule
parameter of blk_mq_plug_issue_direct is always false.
Remove the unnecessary from_schedule parameter of blk_mq_plug_issue_direct.

Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7160af49c245..5d146ec9f8cb 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2688,7 +2688,7 @@ static blk_status_t blk_mq_request_issue_directly(struct request *rq, bool last)
return __blk_mq_try_issue_directly(rq->mq_hctx, rq, true, last);
}

-static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
+static void blk_mq_plug_issue_direct(struct blk_plug *plug)
{
struct blk_mq_hw_ctx *hctx = NULL;
struct request *rq;
@@ -2701,7 +2701,7 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)

if (hctx != rq->mq_hctx) {
if (hctx)
- blk_mq_commit_rqs(hctx, &queued, from_schedule);
+ blk_mq_commit_rqs(hctx, &queued, false);
hctx = rq->mq_hctx;
}

@@ -2713,7 +2713,7 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
case BLK_STS_RESOURCE:
case BLK_STS_DEV_RESOURCE:
blk_mq_request_bypass_insert(rq, false, true);
- blk_mq_commit_rqs(hctx, &queued, from_schedule);
+ blk_mq_commit_rqs(hctx, &queued, false);
return;
default:
blk_mq_end_request(rq, ret);
@@ -2727,7 +2727,7 @@ static void blk_mq_plug_issue_direct(struct blk_plug *plug, bool from_schedule)
* there was more coming, but that turned out to be a lie.
*/
if (errors)
- blk_mq_commit_rqs(hctx, &queued, from_schedule);
+ blk_mq_commit_rqs(hctx, &queued, false);
}

static void __blk_mq_flush_plug_list(struct request_queue *q,
@@ -2798,7 +2798,7 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
}

blk_mq_run_dispatch_ops(q,
- blk_mq_plug_issue_direct(plug, false));
+ blk_mq_plug_issue_direct(plug));
if (rq_list_empty(plug->mq_list))
return;
}
--
2.30.0

2023-01-18 02:54:58

by Kemeng Shi

Subject: [PATCH v4 02/14] blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx

Commit 97889f9ac24f8 ("blk-mq: remove synchronize_rcu() from
blk_mq_del_queue_tag_set()") removed the handling of TAG_SHARED in restart,
and shared_hctx_restart, which counted how many hardware queues are marked
for restart, was removed with it.
Remove the stale comment saying that we still count hardware queues that
need a restart.

Fixes: 97889f9ac24f ("blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set()")
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kemeng Shi <[email protected]>
---
block/blk-mq-sched.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 23d1a90fec42..ae40cdb7a383 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -19,8 +19,7 @@
#include "blk-wbt.h"

/*
- * Mark a hardware queue as needing a restart. For shared queues, maintain
- * a count of how many hardware queues are marked for restart.
+ * Mark a hardware queue as needing a restart.
*/
void blk_mq_sched_mark_restart_hctx(struct blk_mq_hw_ctx *hctx)
{
--
2.30.0

2023-01-18 17:57:51

by Christoph Hellwig

Subject: Re: [PATCH v4 07/14] blk-mq: make blk_mq_commit_rqs a general function for all commits

On Wed, Jan 18, 2023 at 05:37:19PM +0800, Kemeng Shi wrote:
> +/*
> + * blk_mq_commit_rqs will notify driver using bd->last that there is no
> + * more requests. (See comment in struct blk_mq_ops for commit_rqs for
> + * details)
> + * Attention, we should explicitly call this in unusual cases:
> + * 1) did not queue everything initially scheduled to queue
> + * 2) the last attempt to queue a request failed
> + */
> +static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int queued,
> + bool from_schedule)

Isn't from_schedule always false here as well now?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

2023-01-18 17:59:53

by Christoph Hellwig

Subject: Re: [PATCH v4 12/14] blk-mq: remove set of bd->last when get driver tag for next request fails

On Wed, Jan 18, 2023 at 05:37:24PM +0800, Kemeng Shi wrote:
> Commit 113285b473824 ("blk-mq: ensure that bd->last is always set
> correctly") sets last if we fail to get a driver tag for the next
> request, to avoid missing a flush: we break the list walk and never
> send the last request in the list, which would normally be sent with
> last set.
> This code seems stale now because the flush it introduces is always
> redundant:
> For the case that tags are really exhausted, we send an extra flush
> anyway if we find the list non-empty after the list walk.
> For the case that some tag is freed before the retry in
> blk_mq_prep_dispatch_rq for the next request, we get a tag for the next
> request on retry and the flush already signalled is not necessary.

I think Ming will know this code better than me, but aren't we
losing the blk_mq_get_driver_tag call entirely here now. Where
is getting the driver tag covered now?

2023-01-18 18:20:03

by Christoph Hellwig

Subject: Re: [PATCH v4 06/14] blk-mq: remove unnecessary from_schedule parameter in blk_mq_plug_issue_direct

On Wed, Jan 18, 2023 at 05:37:18PM +0800, Kemeng Shi wrote:
> Function blk_mq_plug_issue_direct tries to issue the batched requests in
> the plug list to the driver directly. We only issue plugged requests to
> the driver if we do not come from the scheduler, so the from_schedule
> parameter of blk_mq_plug_issue_direct is always false.
> Remove the unnecessary from_schedule parameter of blk_mq_plug_issue_direct.

Looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

2023-01-19 02:17:19

by Kemeng Shi

Subject: Re: [PATCH v4 12/14] blk-mq: remove set of bd->last when get driver tag for next request fails



on 1/19/2023 1:42 AM, Christoph Hellwig wrote:
> On Wed, Jan 18, 2023 at 05:37:24PM +0800, Kemeng Shi wrote:
>> Commit 113285b473824 ("blk-mq: ensure that bd->last is always set
>> correctly") sets last if we fail to get a driver tag for the next
>> request, to avoid missing a flush: we break the list walk and never
>> send the last request in the list, which would normally be sent with
>> last set.
>> This code seems stale now because the flush it introduces is always
>> redundant:
>> For the case that tags are really exhausted, we send an extra flush
>> anyway if we find the list non-empty after the list walk.
>> For the case that some tag is freed before the retry in
>> blk_mq_prep_dispatch_rq for the next request, we get a tag for the next
>> request on retry and the flush already signalled is not necessary.
>
> I think Ming will know this code better than me, but aren't we
> losing the blk_mq_get_driver_tag call entirely here now. Where
> is getting the driver tag covered now?
>
We get the driver tag in blk_mq_prep_dispatch_rq at the beginning of the
dispatch loop, so it's fine to remove blk_mq_get_driver_tag here. Thanks.
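
For reference, a heavily abridged sketch of blk_mq_prep_dispatch_rq (the
real function also handles dispatch budgets and arms the tag wait queue):

        static enum prep_dispatch blk_mq_prep_dispatch_rq(struct request *rq,
                                                          bool need_budget)
        {
                /* dispatch-budget handling omitted in this sketch */

                if (!blk_mq_get_driver_tag(rq)) {
                        /* mark restart or wait on the tag sbitmap, then bail out */
                        return PREP_DISPATCH_NO_TAG;
                }

                return PREP_DISPATCH_OK;
        }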
--
Best wishes
Kemeng Shi

2023-01-19 02:32:47

by Kemeng Shi

Subject: Re: [PATCH v4 07/14] blk-mq: make blk_mq_commit_rqs a general function for all commits



on 1/19/2023 1:37 AM, Christoph Hellwig wrote:
> On Wed, Jan 18, 2023 at 05:37:19PM +0800, Kemeng Shi wrote:
>> +/*
>> + * blk_mq_commit_rqs will notify driver using bd->last that there is no
>> + * more requests. (See comment in struct blk_mq_ops for commit_rqs for
>> + * details)
>> + * Attention, we should explicitly call this in unusual cases:
>> + * 1) did not queue everything initially scheduled to queue
>> + * 2) the last attempt to queue a request failed
>> + */
>> +static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int queued,
>> + bool from_schedule)
>
> Isn't from_schedule always false here as well now?
Hi Christoph,
Yes, it's always false now. As blk_mq_commit_rqs is a general function
for all commits now, I kept from_schedule for commits where
from_schedule may be true in the future. We can remove from_schedule now
and add it back when it is actually needed. Either way is
acceptable to me. Please let me know which way you prefer and I
will send a new version if needed.
Thanks.
> Otherwise looks good:
>
> Reviewed-by: Christoph Hellwig <[email protected]>
>

--
Best wishes
Kemeng Shi

2023-01-28 01:55:45

by Kemeng Shi

Subject: Re: [PATCH v4 07/14] blk-mq: make blk_mq_commit_rqs a general function for all commits



on 1/19/2023 9:40 AM, Kemeng Shi wrote:
>
>
> on 1/19/2023 1:37 AM, Christoph Hellwig wrote:
>> On Wed, Jan 18, 2023 at 05:37:19PM +0800, Kemeng Shi wrote:
>>> +/*
>>> + * blk_mq_commit_rqs will notify driver using bd->last that there is no
>>> + * more requests. (See comment in struct blk_mq_ops for commit_rqs for
>>> + * details)
>>> + * Attention, we should explicitly call this in unusual cases:
>>> + * 1) did not queue everything initially scheduled to queue
>>> + * 2) the last attempt to queue a request failed
>>> + */
>>> +static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int queued,
>>> + bool from_schedule)
>>
>> Isn't from_schedule always false here as well now?
> Hi Christoph,
> Yes, it's always false now. As blk_mq_commit_rqs is a general function
> for all commits now, I kept from_schedule for commits where
> from_schedule may be true in the future. We can remove from_schedule now
> and add it back when it is actually needed. Either way is
> acceptable to me. Please let me know which way you prefer and I
> will send a new version if needed.
> Thanks.
>> Otherwise looks good:
>>
>> Reviewed-by: Christoph Hellwig <[email protected]>
>>
>
Hi Christoph,
Sorry to bother you. I'm not sure if you got this message. I plan to
collect your Reviewed-by for this patch if no further changes are needed.
Thanks.

--
Best wishes
Kemeng Shi


2023-01-28 02:03:54

by Kemeng Shi

Subject: Re: [PATCH v4 12/14] blk-mq: remove set of bd->last when get driver tag for next request fails



on 1/19/2023 9:45 AM, Kemeng Shi wrote:
>
>
> on 1/19/2023 1:42 AM, Christoph Hellwig wrote:
>> On Wed, Jan 18, 2023 at 05:37:24PM +0800, Kemeng Shi wrote:
>>> Commit 113285b473824 ("blk-mq: ensure that bd->last is always set
>>> correctly") sets last if we fail to get a driver tag for the next
>>> request, to avoid missing a flush: we break the list walk and never
>>> send the last request in the list, which would normally be sent with
>>> last set.
>>> This code seems stale now because the flush it introduces is always
>>> redundant:
>>> For the case that tags are really exhausted, we send an extra flush
>>> anyway if we find the list non-empty after the list walk.
>>> For the case that some tag is freed before the retry in
>>> blk_mq_prep_dispatch_rq for the next request, we get a tag for the next
>>> request on retry and the flush already signalled is not necessary.
>>
>> I think Ming will know this code better than me, but aren't we
>> losing the blk_mq_get_driver_tag call entirely here now. Where
>> is getting the driver tag covered now?
>>
> We get the driver tag in blk_mq_prep_dispatch_rq at the beginning of the
> dispatch loop, so it's fine to remove blk_mq_get_driver_tag here. Thanks.
>

Hi Ming and everyone familiar with the code involved, could you help with
reviewing this patch and patch "[PATCH v4 04/14] blk-mq: Fix potential
io hung for shared sbitmap per tagset" in the same patchset?
Thanks.

--
Best wishes
Kemeng Shi


2023-02-06 01:05:52

by Kemeng Shi

Subject: Re: [PATCH v4 12/14] blk-mq: remove set of bd->last when get driver tag for next request fails


on 1/28/2023 10:03 AM, Kemeng Shi wrote:
>
>
> on 1/19/2023 9:45 AM, Kemeng Shi wrote:
>>
>>
>> on 1/19/2023 1:42 AM, Christoph Hellwig wrote:
>>> On Wed, Jan 18, 2023 at 05:37:24PM +0800, Kemeng Shi wrote:
>>>> Commit 113285b473824 ("blk-mq: ensure that bd->last is always set
>>>> correctly") sets last if we fail to get a driver tag for the next
>>>> request, to avoid missing a flush: we break the list walk and never
>>>> send the last request in the list, which would normally be sent with
>>>> last set.
>>>> This code seems stale now because the flush it introduces is always
>>>> redundant:
>>>> For the case that tags are really exhausted, we send an extra flush
>>>> anyway if we find the list non-empty after the list walk.
>>>> For the case that some tag is freed before the retry in
>>>> blk_mq_prep_dispatch_rq for the next request, we get a tag for the next
>>>> request on retry and the flush already signalled is not necessary.
>>>
>>> I think Ming will know this code better than me, but aren't we
>>> losing the blk_mq_get_driver_tag call entirely here now. Where
>>> is getting the driver tag covered now?
>>>
>> We get the driver tag in blk_mq_prep_dispatch_rq at the beginning of the
>> dispatch loop, so it's fine to remove blk_mq_get_driver_tag here. Thanks.
>>
>
> Hi Ming and everyone familiar with the code involved, could you help with
> reviewing this patch and patch "[PATCH v4 04/14] blk-mq: Fix potential
> io hung for shared sbitmap per tagset" in the same patchset?
> Thanks.
>
Could anyone please help review the last two patches that do not yet have a Reviewed-by?
Thanks.

--
Best wishes
Kemeng Shi


2023-02-06 16:06:20

by Christoph Hellwig

Subject: Re: [PATCH v4 12/14] blk-mq: remove set of bd->last when get driver tag for next request fails

On Mon, Feb 06, 2023 at 09:05:25AM +0800, Kemeng Shi wrote:
> Could anyone please help review the last two patches that do not yet have a Reviewed-by?

I'm still not really convinced by this one patch.

The rest of the series looks like great small cleanups, but this one
isn't quite as trivial. Maybe you can split it from the series and
resend it with a more detailed and better explanation for the next
merge window?

2023-02-06 16:22:46

by Jens Axboe

Subject: Re: [PATCH v4 01/14] blk-mq: avoid sleep in blk_mq_alloc_request_hctx


On Wed, 18 Jan 2023 17:37:13 +0800, Kemeng Shi wrote:
> Commit 1f5bd336b9150 ("blk-mq: add blk_mq_alloc_request_hctx") add
> blk_mq_alloc_request_hctx to send commands to a specific queue. If
> BLK_MQ_REQ_NOWAIT is not set in tag allocation, we may change to different
> hctx after sleep and get tag from unexpected hctx. So BLK_MQ_REQ_NOWAIT
> must be set in flags for blk_mq_alloc_request_hctx.
> After commit 600c3b0cea784 ("blk-mq: open code __blk_mq_alloc_request in
> blk_mq_alloc_request_hctx"), blk_mq_alloc_request_hctx return -EINVAL
> if both BLK_MQ_REQ_NOWAIT and BLK_MQ_REQ_RESERVED are not set instead of
> if BLK_MQ_REQ_NOWAIT is not set. So if BLK_MQ_REQ_NOWAIT is not set and
> BLK_MQ_REQ_RESERVED is set, blk_mq_alloc_request_hctx could alloc tag
> from unexpected hctx. I guess what we need here is that return -EINVAL
> if either BLK_MQ_REQ_NOWAIT or BLK_MQ_REQ_RESERVED is not set.
>
> [...]

Applied, thanks!

[01/14] blk-mq: avoid sleep in blk_mq_alloc_request_hctx
commit: 6ee858a3d3270a68902d66bb47c151a83622535c
[02/14] blk-mq: remove stale comment for blk_mq_sched_mark_restart_hctx
commit: c31e76bcc379182fe67a82c618493b7b8868c672
[03/14] blk-mq: wait on correct sbitmap_queue in blk_mq_mark_tag_wait
commit: 98b99e9412d0cde8c7b442bf5efb09528a2ede8b
[04/14] blk-mq: Fix potential io hung for shared sbitmap per tagset
commit: 47df9ce95cd568d3f84218c4f65e9fbd4dfeda55
[05/14] blk-mq: remove unnecessary list_empty check in blk_mq_try_issue_list_directly
commit: 08e3599e7401a7eae5e68f5e2601cc4a4e53951b
[06/14] blk-mq: remove unnecessary from_schedule parameter in blk_mq_plug_issue_direct
commit: 3e368fb023ffab83404f628d02789550d79eca9c
[07/14] blk-mq: make blk_mq_commit_rqs a general function for all commits
commit: 34c9f547402f11c0241a44800574ec4fa38cccb8
[08/14] blk-mq: remove unnecessary error count and commit in blk_mq_plug_issue_direct
commit: 0d617a83e8d4d3149d76cc074d9779a3b0ee7baf
[09/14] blk-mq: use blk_mq_commit_rqs helper in blk_mq_try_issue_list_directly
commit: 984ce0a7d75b577fd84f2cc7a83e6e2d2503f90e
[10/14] blk-mq: simplify flush check in blk_mq_dispatch_rq_list
commit: e4ef2e05e0020db0d61b2cf451ef38a2bba33910
[11/14] blk-mq: remove unnecessary error count and check in blk_mq_dispatch_rq_list
commit: 4ea58fe456c21bb259a7cbf8498946f86e9b84aa
[12/14] blk-mq: remove set of bd->last when get driver tag for next request fails
commit: f1ce99f7098d9e7a322caf48eb8af05be7999827
[13/14] blk-mq: use switch/case to improve readability in blk_mq_try_issue_list_directly
commit: 27e8b2bb149aff7b7b673b46c7206f4f37c30093
[14/14] blk-mq: correct stale comment of .get_budget
commit: 01542f651a9f58a9b176c3d3dc3eefbacee53b78

Best regards,
--
Jens Axboe