2021-12-06 12:55:06

by John Garry

[permalink] [raw]
Subject: [PATCH v2 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

In [0] Kashyap reports high CPU usage for blk_mq_queue_tag_busy_iter()
and callees for shared tags.

Indeed blk_mq_queue_tag_busy_iter() would be less optimum for moving to
shared tags, but it was not optimum previously.

This series optimises by having only a single iter (per regular and resv
tags) for the shared tags, instead of an iter per HW queue.

[0] https://lore.kernel.org/linux-block/[email protected]/

Differences to v1:
- Add tested-by and reviewed-by tags
- Reformat 3/3 a bit to keep <= 80 char lines
- I kept the RB tags, so please check and let me know if not ok
with the changes

John Garry (3):
blk-mq: Drop busy_iter_fn blk_mq_hw_ctx argument
blk-mq: Delete busy_iter_fn
blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

block/blk-mq-tag.c | 65 ++++++++++++++++++++++++++++--------------
block/blk-mq-tag.h | 2 +-
block/blk-mq.c | 17 ++++++-----
include/linux/blk-mq.h | 2 --
4 files changed, 53 insertions(+), 33 deletions(-)

--
2.26.2



2021-12-06 12:55:12

by John Garry

[permalink] [raw]
Subject: [PATCH v2 1/3] blk-mq: Drop busy_iter_fn blk_mq_hw_ctx argument

The only user of blk_mq_hw_ctx blk_mq_hw_ctx argument is
blk_mq_rq_inflight().

Function blk_mq_rq_inflight() uses the hctx to find the associated request
queue to match against the request. However this same check is already
done in caller bt_iter(), so drop this check.

With that change there are no more users of busy_iter_fn blk_mq_hw_ctx
argument, so drop the argument.

Signed-off-by: John Garry <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by Hannes Reinecke <[email protected]>
Tested-by: Kashyap Desai <[email protected]>
---
block/blk-mq-tag.c | 2 +-
block/blk-mq.c | 17 ++++++++---------
include/linux/blk-mq.h | 3 +--
3 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 380e2dd31bfc..d3cf91d764d5 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -254,7 +254,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
return true;

if (rq->q == hctx->queue && rq->mq_hctx == hctx)
- ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
+ ret = iter_data->fn(rq, iter_data->data, reserved);
blk_mq_put_rq_ref(rq);
return ret;
}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 22ec21aa0c22..433fbb40dd6e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -127,8 +127,7 @@ struct mq_inflight {
unsigned int inflight[2];
};

-static bool blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
- struct request *rq, void *priv,
+static bool blk_mq_check_inflight(struct request *rq, void *priv,
bool reserved)
{
struct mq_inflight *mi = priv;
@@ -1308,14 +1307,15 @@ void blk_mq_delay_kick_requeue_list(struct request_queue *q,
}
EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);

-static bool blk_mq_rq_inflight(struct blk_mq_hw_ctx *hctx, struct request *rq,
- void *priv, bool reserved)
+static bool blk_mq_rq_inflight(struct request *rq, void *priv,
+ bool reserved)
{
/*
- * If we find a request that isn't idle and the queue matches,
- * we know the queue is busy. Return false to stop the iteration.
+ * If we find a request that isn't idle we know the queue is busy
+ * as it's checked in the iter.
+ * Return false to stop the iteration.
*/
- if (blk_mq_request_started(rq) && rq->q == hctx->queue) {
+ if (blk_mq_request_started(rq)) {
bool *busy = priv;

*busy = true;
@@ -1377,8 +1377,7 @@ void blk_mq_put_rq_ref(struct request *rq)
__blk_mq_free_request(rq);
}

-static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
- struct request *rq, void *priv, bool reserved)
+static bool blk_mq_check_expired(struct request *rq, void *priv, bool reserved)
{
unsigned long *next = priv;

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index ecdc049b52fa..17ebf29e42d8 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -470,8 +470,7 @@ struct blk_mq_queue_data {
bool last;
};

-typedef bool (busy_iter_fn)(struct blk_mq_hw_ctx *, struct request *, void *,
- bool);
+typedef bool (busy_iter_fn)(struct request *, void *, bool);
typedef bool (busy_tag_iter_fn)(struct request *, void *, bool);

/**
--
2.26.2


2021-12-06 12:55:15

by John Garry

[permalink] [raw]
Subject: [PATCH v2 2/3] blk-mq: Delete busy_iter_fn

Typedefs busy_iter_fn and busy_tag_iter_fn are now identical, so delete
busy_iter_fn to reduce duplication.

It would be nicer to delete busy_tag_iter_fn, as the name busy_iter_fn is
less specific.

However busy_tag_iter_fn is used in many different parts of the tree,
unlike busy_iter_fn which is just use in block/, so just take the
straightforward path now, so that we could rename later treewide.

Signed-off-by: John Garry <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Tested-by: Kashyap Desai <[email protected]>
---
block/blk-mq-tag.c | 6 +++---
block/blk-mq-tag.h | 2 +-
include/linux/blk-mq.h | 1 -
3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index d3cf91d764d5..58b80d4b7a07 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -215,7 +215,7 @@ void blk_mq_put_tags(struct blk_mq_tags *tags, int *tag_array, int nr_tags)

struct bt_iter_data {
struct blk_mq_hw_ctx *hctx;
- busy_iter_fn *fn;
+ busy_tag_iter_fn *fn;
void *data;
bool reserved;
};
@@ -274,7 +274,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
* bitmap_tags member of struct blk_mq_tags.
*/
static void bt_for_each(struct blk_mq_hw_ctx *hctx, struct sbitmap_queue *bt,
- busy_iter_fn *fn, void *data, bool reserved)
+ busy_tag_iter_fn *fn, void *data, bool reserved)
{
struct bt_iter_data iter_data = {
.hctx = hctx,
@@ -457,7 +457,7 @@ EXPORT_SYMBOL(blk_mq_tagset_wait_completed_request);
* called for all requests on all queues that share that tag set and not only
* for requests associated with @q.
*/
-void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
+void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_tag_iter_fn *fn,
void *priv)
{
struct blk_mq_hw_ctx *hctx;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index df787b5a23bd..5668e28be0b7 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -28,7 +28,7 @@ extern void blk_mq_tag_resize_shared_tags(struct blk_mq_tag_set *set,
extern void blk_mq_tag_update_sched_shared_tags(struct request_queue *q);

extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
-void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
+void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_tag_iter_fn *fn,
void *priv);
void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
void *priv);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 17ebf29e42d8..772f8f921526 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -470,7 +470,6 @@ struct blk_mq_queue_data {
bool last;
};

-typedef bool (busy_iter_fn)(struct request *, void *, bool);
typedef bool (busy_tag_iter_fn)(struct request *, void *, bool);

/**
--
2.26.2


2021-12-06 12:55:17

by John Garry

[permalink] [raw]
Subject: [PATCH v2 3/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

Kashyap reports high CPU usage in blk_mq_queue_tag_busy_iter() and callees
using megaraid SAS RAID card since moving to shared tags [0].

Previously, when shared tags was shared sbitmap, this function was less
than optimum since we would iter through all tags for all hctx's,
yet only ever match upto tagset depth number of rqs.

Since the change to shared tags, things are even less efficient if we have
parallel callers of blk_mq_queue_tag_busy_iter(). This is because in
bt_iter() -> blk_mq_find_and_get_req() there would be more contention on
accessing each request ref and tags->lock since they are now shared among
all HW queues.

Optimise by having separate calls to bt_for_each() for when we're using
shared tags. In this case no longer pass a hctx, as it is no longer
relevant, and teach bt_iter() about this.

Ming suggested something along the lines of this change, apart from a
different implementation.

[0] https://lore.kernel.org/linux-block/[email protected]/

Signed-off-by: John Garry <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reported-and-tested-by: Kashyap Desai <[email protected]>
---
block/blk-mq-tag.c | 59 ++++++++++++++++++++++++++++++++--------------
1 file changed, 41 insertions(+), 18 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 58b80d4b7a07..e55a6834c9a6 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -215,6 +215,7 @@ void blk_mq_put_tags(struct blk_mq_tags *tags, int *tag_array, int nr_tags)

struct bt_iter_data {
struct blk_mq_hw_ctx *hctx;
+ struct request_queue *q;
busy_tag_iter_fn *fn;
void *data;
bool reserved;
@@ -238,11 +239,18 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
{
struct bt_iter_data *iter_data = data;
struct blk_mq_hw_ctx *hctx = iter_data->hctx;
- struct blk_mq_tags *tags = hctx->tags;
+ struct request_queue *q = iter_data->q;
+ struct blk_mq_tag_set *set = q->tag_set;
bool reserved = iter_data->reserved;
+ struct blk_mq_tags *tags;
struct request *rq;
bool ret = true;

+ if (blk_mq_is_shared_tags(set->flags))
+ tags = set->shared_tags;
+ else
+ tags = hctx->tags;
+
if (!reserved)
bitnr += tags->nr_reserved_tags;
/*
@@ -253,7 +261,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
if (!rq)
return true;

- if (rq->q == hctx->queue && rq->mq_hctx == hctx)
+ if (rq->q == q && (!hctx || rq->mq_hctx == hctx))
ret = iter_data->fn(rq, iter_data->data, reserved);
blk_mq_put_rq_ref(rq);
return ret;
@@ -262,6 +270,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
/**
* bt_for_each - iterate over the requests associated with a hardware queue
* @hctx: Hardware queue to examine.
+ * @q: Request queue to examine.
* @bt: sbitmap to examine. This is either the breserved_tags member
* or the bitmap_tags member of struct blk_mq_tags.
* @fn: Pointer to the function that will be called for each request
@@ -273,14 +282,16 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
* @reserved: Indicates whether @bt is the breserved_tags member or the
* bitmap_tags member of struct blk_mq_tags.
*/
-static void bt_for_each(struct blk_mq_hw_ctx *hctx, struct sbitmap_queue *bt,
- busy_tag_iter_fn *fn, void *data, bool reserved)
+static void bt_for_each(struct blk_mq_hw_ctx *hctx, struct request_queue *q,
+ struct sbitmap_queue *bt, busy_tag_iter_fn *fn,
+ void *data, bool reserved)
{
struct bt_iter_data iter_data = {
.hctx = hctx,
.fn = fn,
.data = data,
.reserved = reserved,
+ .q = q,
};

sbitmap_for_each_set(&bt->sb, bt_iter, &iter_data);
@@ -460,9 +471,6 @@ EXPORT_SYMBOL(blk_mq_tagset_wait_completed_request);
void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_tag_iter_fn *fn,
void *priv)
{
- struct blk_mq_hw_ctx *hctx;
- int i;
-
/*
* __blk_mq_update_nr_hw_queues() updates nr_hw_queues and queue_hw_ctx
* while the queue is frozen. So we can use q_usage_counter to avoid
@@ -471,19 +479,34 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_tag_iter_fn *fn,
if (!percpu_ref_tryget(&q->q_usage_counter))
return;

- queue_for_each_hw_ctx(q, hctx, i) {
- struct blk_mq_tags *tags = hctx->tags;
-
- /*
- * If no software queues are currently mapped to this
- * hardware queue, there's nothing to check
- */
- if (!blk_mq_hw_queue_mapped(hctx))
- continue;
+ if (blk_mq_is_shared_tags(q->tag_set->flags)) {
+ struct blk_mq_tags *tags = q->tag_set->shared_tags;
+ struct sbitmap_queue *bresv = &tags->breserved_tags;
+ struct sbitmap_queue *btags = &tags->bitmap_tags;

if (tags->nr_reserved_tags)
- bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
- bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
+ bt_for_each(NULL, q, bresv, fn, priv, true);
+ bt_for_each(NULL, q, btags, fn, priv, false);
+ } else {
+ struct blk_mq_hw_ctx *hctx;
+ int i;
+
+ queue_for_each_hw_ctx(q, hctx, i) {
+ struct blk_mq_tags *tags = hctx->tags;
+ struct sbitmap_queue *bresv = &tags->breserved_tags;
+ struct sbitmap_queue *btags = &tags->bitmap_tags;
+
+ /*
+ * If no software queues are currently mapped to this
+ * hardware queue, there's nothing to check
+ */
+ if (!blk_mq_hw_queue_mapped(hctx))
+ continue;
+
+ if (tags->nr_reserved_tags)
+ bt_for_each(hctx, q, bresv, fn, priv, true);
+ bt_for_each(hctx, q, btags, fn, priv, false);
+ }
}
blk_queue_exit(q);
}
--
2.26.2


2021-12-06 19:07:04

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

On 12/6/21 5:49 AM, John Garry wrote:
> In [0] Kashyap reports high CPU usage for blk_mq_queue_tag_busy_iter()
> and callees for shared tags.
>
> Indeed blk_mq_queue_tag_busy_iter() would be less optimum for moving to
> shared tags, but it was not optimum previously.
>
> This series optimises by having only a single iter (per regular and resv
> tags) for the shared tags, instead of an iter per HW queue.
>
> [0] https://lore.kernel.org/linux-block/[email protected]/

The patch(es) are missing Fixes tags.

--
Jens Axboe


2021-12-06 19:34:36

by John Garry

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

On 06/12/2021 19:07, Jens Axboe wrote:
> On 12/6/21 5:49 AM, John Garry wrote:
>> In [0] Kashyap reports high CPU usage for blk_mq_queue_tag_busy_iter()
>> and callees for shared tags.
>>
>> Indeed blk_mq_queue_tag_busy_iter() would be less optimum for moving to
>> shared tags, but it was not optimum previously.
>>
>> This series optimises by having only a single iter (per regular and resv
>> tags) for the shared tags, instead of an iter per HW queue.
>>
>> [0]https://lore.kernel.org/linux-block/[email protected]/

Hi Jens,

> The patch(es) are missing Fixes tags.

The first two patches aren't fixes, but are general dev. As for the
last, it prob should go as a fix for 5.16, but I was not sure how you
would feel about that - it's not a trivial change, we're late in the
cycle, and Kashyap was happy for 5.17 .

Let me know if the last could be accepted as a fix and I'll re-send
separately with a fixes tag.

Thanks,
John

2021-12-06 19:48:48

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

On 12/6/21 12:34 PM, John Garry wrote:
> On 06/12/2021 19:07, Jens Axboe wrote:
>> On 12/6/21 5:49 AM, John Garry wrote:
>>> In [0] Kashyap reports high CPU usage for blk_mq_queue_tag_busy_iter()
>>> and callees for shared tags.
>>>
>>> Indeed blk_mq_queue_tag_busy_iter() would be less optimum for moving to
>>> shared tags, but it was not optimum previously.
>>>
>>> This series optimises by having only a single iter (per regular and resv
>>> tags) for the shared tags, instead of an iter per HW queue.
>>>
>>> [0]https://lore.kernel.org/linux-block/[email protected]/
>
> Hi Jens,
>
>> The patch(es) are missing Fixes tags.
>
> The first two patches aren't fixes, but are general dev. As for the
> last, it prob should go as a fix for 5.16, but I was not sure how you
> would feel about that - it's not a trivial change, we're late in the
> cycle, and Kashyap was happy for 5.17 .
>
> Let me know if the last could be accepted as a fix and I'll re-send
> separately with a fixes tag.

Regardless of whether it's going into 5.16 or 5.17 it should have a
fixes tag.

--
Jens Axboe


2021-12-06 20:00:05

by John Garry

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

On 06/12/2021 19:48, Jens Axboe wrote:
>> The first two patches aren't fixes, but are general dev. As for the
>> last, it prob should go as a fix for 5.16, but I was not sure how you
>> would feel about that - it's not a trivial change, we're late in the
>> cycle, and Kashyap was happy for 5.17 .
>>

Hi Jens,

>> Let me know if the last could be accepted as a fix and I'll re-send
>> separately with a fixes tag.
> Regardless of whether it's going into 5.16 or 5.17 it should have a
> fixes tag.

ok, so patch 3/3 would fix commit e155b0c238b2 ("blk-mq: Use shared tags
for shared sbitmap support")

I'll reply to that one with a fixes tag and I think that b4 would pick
it up in case.

Cheers,
John

2021-12-06 20:10:50

by John Garry

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

On 06/12/2021 12:49, John Garry wrote:
> Kashyap reports high CPU usage in blk_mq_queue_tag_busy_iter() and callees
> using megaraid SAS RAID card since moving to shared tags [0].
>
> Previously, when shared tags was shared sbitmap, this function was less
> than optimum since we would iter through all tags for all hctx's,
> yet only ever match upto tagset depth number of rqs.
>
> Since the change to shared tags, things are even less efficient if we have
> parallel callers of blk_mq_queue_tag_busy_iter(). This is because in
> bt_iter() -> blk_mq_find_and_get_req() there would be more contention on
> accessing each request ref and tags->lock since they are now shared among
> all HW queues.
>
> Optimise by having separate calls to bt_for_each() for when we're using
> shared tags. In this case no longer pass a hctx, as it is no longer
> relevant, and teach bt_iter() about this.
>
> Ming suggested something along the lines of this change, apart from a
> different implementation.
>
> [0]https://lore.kernel.org/linux-block/[email protected]/
>

Fixes: e155b0c238b2 ("blk-mq: Use shared tags for shared sbitmap support")

> Signed-off-by: John Garry<[email protected]>
> Reviewed-by: Hannes Reinecke<[email protected]>
> Reviewed-by: Ming Lei<[email protected]>
> Reported-and-tested-by: Kashyap Desai<[email protected]>


2021-12-07 16:18:11

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v2 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

On Mon, 6 Dec 2021 20:49:47 +0800, John Garry wrote:
> In [0] Kashyap reports high CPU usage for blk_mq_queue_tag_busy_iter()
> and callees for shared tags.
>
> Indeed blk_mq_queue_tag_busy_iter() would be less optimum for moving to
> shared tags, but it was not optimum previously.
>
> This series optimises by having only a single iter (per regular and resv
> tags) for the shared tags, instead of an iter per HW queue.
>
> [...]

Applied, thanks!

[1/3] blk-mq: Drop busy_iter_fn blk_mq_hw_ctx argument
(no commit info)
[2/3] blk-mq: Delete busy_iter_fn
(no commit info)
[3/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags
(no commit info)

Best regards,
--
Jens Axboe