2021-11-02 11:35:47

by John Garry

Subject: [PATCH RFT 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

In [0], Kashyap reports high CPU usage for blk_mq_queue_tag_busy_iter()
and callees for shared tags.

Indeed blk_mq_queue_tag_busy_iter() is less optimal since the move to
shared tags, but it was not optimal previously either.

So I would like this series tested, and also to know what is triggering
blk_mq_queue_tag_busy_iter() from userspace to cause such high CPU
loading.

As suggested by Ming, reading /proc/diskstats in a while true loop
can trigger blk_mq_queue_tag_busy_iter(); I do so in a test with 2x
separate consoles, and here are the results:

v5.15
blk_mq_queue_tag_busy_iter() 6.2%
part_stat_read_all() 6.7%

pre-v5.16 (Linus' master branch @ commit bfc484fe6abb)
blk_mq_queue_tag_busy_iter() 4.5%
part_stat_read_all() 6.2%

pre-v5.16+this series
blk_mq_queue_tag_busy_iter() not shown in top users
part_stat_read_all() 7.5%

These results are from perf top, on a system with 7x
disks, with hisi_sas, which has 16x HW queues.
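The test above can be sketched in userspace C (hypothetical helper names, added here for illustration only; fio's disk_util polling does effectively the same thing), assuming a Linux system with /proc mounted:

```c
#include <stdio.h>

/* Read the given stats file once, discarding the contents; returns 0 on
 * success, -1 if the file cannot be opened. Each read of /proc/diskstats
 * ends up in part_stat_read_all() and, pre-series, in
 * blk_mq_queue_tag_busy_iter(). */
int drain_stats_file(const char *path)
{
	char buf[8192];
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (fread(buf, 1, sizeof(buf), f) > 0)
		;
	fclose(f);
	return 0;
}

/* Poll /proc/diskstats 'iters' times; run one instance per console with
 * a large iteration count to mirror the while-true shell loop. */
int poll_diskstats(long iters)
{
	for (long i = 0; i < iters; i++)
		if (drain_stats_file("/proc/diskstats") != 0)
			return -1;
	return 0;
}
```

Two instances of this loop, one per console, approximate the while-true `cat /proc/diskstats` test described above.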

[0] https://lore.kernel.org/linux-block/[email protected]/

John Garry (3):
blk-mq: Drop busy_iter_fn blk_mq_hw_ctx argument
blk-mq: Delete busy_iter_fn
blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

block/blk-mq-tag.c | 58 +++++++++++++++++++++++++++---------------
block/blk-mq-tag.h | 2 +-
block/blk-mq.c | 17 ++++++-------
include/linux/blk-mq.h | 2 --
4 files changed, 47 insertions(+), 32 deletions(-)

--
2.17.1


2021-11-02 11:36:24

by John Garry

Subject: [PATCH RFT 3/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

Kashyap reports high CPU usage in blk_mq_queue_tag_busy_iter() and callees
with a megaraid SAS RAID card since moving to shared tags [0].

Previously, when shared tags were implemented as a shared sbitmap, this
function was less than optimal since we would iterate through all tags for
all hctx's, yet only ever match up to tagset-depth number of rqs.
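As a rough illustration of that cost (a model added here for clarity, not code from the series): on a system like the hisi_sas one in the cover letter, with 16 HW queues, the per-hctx walk covers the full tag depth 16 times over, while a single walk of the shared tags covers it once.

```c
/* Illustrative cost model: tags inspected per blk_mq_queue_tag_busy_iter()
 * call. With shared sbitmap, every hctx's walk covered the whole depth. */
long tags_inspected_per_hctx_walk(int nr_hw_queues, int tagset_depth)
{
	return (long)nr_hw_queues * tagset_depth;	/* one full walk per hctx */
}

/* With a single walk of the shared tags, the depth is covered only once. */
long tags_inspected_shared_walk(int nr_hw_queues, int tagset_depth)
{
	(void)nr_hw_queues;	/* irrelevant: there is one shared bitmap */
	return tagset_depth;
}
```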

Since the change to shared tags, things are even less efficient if we have
parallel callers of blk_mq_queue_tag_busy_iter(). This is because in
bt_iter() -> blk_mq_find_and_get_req() there would be more contention on
accessing each request ref and tags->lock since they are now shared among
all HW queues.

Optimise by having separate calls to bt_for_each() for when we're using
shared tags. In this case we no longer pass a hctx, as it is no longer
relevant, and teach bt_iter() about this.

Ming suggested something along the lines of this change, albeit with a
different implementation.

[0] https://lore.kernel.org/linux-block/[email protected]/

Signed-off-by: John Garry <[email protected]>
---
block/blk-mq-tag.c | 52 +++++++++++++++++++++++++++++++---------------
1 file changed, 35 insertions(+), 17 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index bc233ea92adf..00515933c8a8 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -215,6 +215,7 @@ void blk_mq_put_tags(struct blk_mq_tags *tags, int *tag_array, int nr_tags)

struct bt_iter_data {
struct blk_mq_hw_ctx *hctx;
+ struct request_queue *q;
busy_tag_iter_fn *fn;
void *data;
bool reserved;
@@ -238,11 +239,18 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
{
struct bt_iter_data *iter_data = data;
struct blk_mq_hw_ctx *hctx = iter_data->hctx;
- struct blk_mq_tags *tags = hctx->tags;
+ struct request_queue *q = iter_data->q;
bool reserved = iter_data->reserved;
+ struct blk_mq_tag_set *set = q->tag_set;
+ struct blk_mq_tags *tags;
struct request *rq;
bool ret = true;

+ if (blk_mq_is_shared_tags(set->flags))
+ tags = set->shared_tags;
+ else
+ tags = hctx->tags;
+
if (!reserved)
bitnr += tags->nr_reserved_tags;
/*
@@ -253,7 +261,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
if (!rq)
return true;

- if (rq->q == hctx->queue && rq->mq_hctx == hctx)
+ if (rq->q == q && (!hctx || rq->mq_hctx == hctx))
ret = iter_data->fn(rq, iter_data->data, reserved);
blk_mq_put_rq_ref(rq);
return ret;
@@ -274,13 +282,15 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
* bitmap_tags member of struct blk_mq_tags.
*/
static void bt_for_each(struct blk_mq_hw_ctx *hctx, struct sbitmap_queue *bt,
- busy_tag_iter_fn *fn, void *data, bool reserved)
+ busy_tag_iter_fn *fn, void *data, bool reserved,
+ struct request_queue *q)
{
struct bt_iter_data iter_data = {
.hctx = hctx,
.fn = fn,
.data = data,
.reserved = reserved,
+ .q = q,
};

sbitmap_for_each_set(&bt->sb, bt_iter, &iter_data);
@@ -460,9 +470,6 @@ EXPORT_SYMBOL(blk_mq_tagset_wait_completed_request);
void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_tag_iter_fn *fn,
void *priv)
{
- struct blk_mq_hw_ctx *hctx;
- int i;
-
/*
* __blk_mq_update_nr_hw_queues() updates nr_hw_queues and queue_hw_ctx
* while the queue is frozen. So we can use q_usage_counter to avoid
@@ -471,19 +478,30 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_tag_iter_fn *fn,
if (!percpu_ref_tryget(&q->q_usage_counter))
return;

- queue_for_each_hw_ctx(q, hctx, i) {
- struct blk_mq_tags *tags = hctx->tags;
-
- /*
- * If no software queues are currently mapped to this
- * hardware queue, there's nothing to check
- */
- if (!blk_mq_hw_queue_mapped(hctx))
- continue;
+ if (blk_mq_is_shared_tags(q->tag_set->flags)) {
+ struct blk_mq_tags *tags = q->tag_set->shared_tags;

if (tags->nr_reserved_tags)
- bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
- bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
+ bt_for_each(NULL, &tags->breserved_tags, fn, priv, true, q);
+ bt_for_each(NULL, &tags->bitmap_tags, fn, priv, false, q);
+ } else {
+ struct blk_mq_hw_ctx *hctx;
+ int i;
+
+ queue_for_each_hw_ctx(q, hctx, i) {
+ struct blk_mq_tags *tags = hctx->tags;
+
+ /*
+ * If no software queues are currently mapped to this
+ * hardware queue, there's nothing to check
+ */
+ if (!blk_mq_hw_queue_mapped(hctx))
+ continue;
+
+ if (tags->nr_reserved_tags)
+ bt_for_each(hctx, &tags->breserved_tags, fn, priv, true, q);
+ bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false, q);
+ }
}
blk_queue_exit(q);
}
--
2.17.1

2021-11-02 11:36:37

by John Garry

Subject: [PATCH RFT 2/3] blk-mq: Delete busy_iter_fn

Typedefs busy_iter_fn and busy_tag_iter_fn are now identical, so delete
busy_iter_fn to reduce duplication.

It would be nicer to delete busy_tag_iter_fn, as the name busy_iter_fn is
less specific.

However busy_tag_iter_fn is used in many different parts of the tree,
unlike busy_iter_fn, which is only used in block/, so just take the
straightforward path now; we can do a treewide rename later.
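As a userspace illustration of why the deletion is mechanical (a sketch with an opaque struct request, not kernel code): the two typedefs name the same function type, so any callback written against one satisfies the other.

```c
#include <stdbool.h>
#include <stddef.h>

struct request;	/* opaque here; fields are irrelevant to type identity */

/* The surviving typedef; the deleted busy_iter_fn was textually identical:
 *   typedef bool (busy_iter_fn)(struct request *, void *, bool); */
typedef bool (busy_tag_iter_fn)(struct request *, void *, bool);

/* A callback of the shared shape: counts invocations via 'data'. */
bool count_cb(struct request *rq, void *data, bool reserved)
{
	(void)rq; (void)reserved;
	(*(int *)data)++;
	return true;	/* true means keep iterating */
}

/* A caller that previously took a busy_iter_fn compiles unchanged when
 * switched to busy_tag_iter_fn, since the types were identical. */
int invoke_n(busy_tag_iter_fn *fn, int n)
{
	int count = 0;

	for (int i = 0; i < n; i++)
		fn(NULL, &count, false);
	return count;
}
```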

Signed-off-by: John Garry <[email protected]>
---
block/blk-mq-tag.c | 6 +++---
block/blk-mq-tag.h | 2 +-
include/linux/blk-mq.h | 1 -
3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 0d773c44a7ec..bc233ea92adf 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -215,7 +215,7 @@ void blk_mq_put_tags(struct blk_mq_tags *tags, int *tag_array, int nr_tags)

struct bt_iter_data {
struct blk_mq_hw_ctx *hctx;
- busy_iter_fn *fn;
+ busy_tag_iter_fn *fn;
void *data;
bool reserved;
};
@@ -274,7 +274,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
* bitmap_tags member of struct blk_mq_tags.
*/
static void bt_for_each(struct blk_mq_hw_ctx *hctx, struct sbitmap_queue *bt,
- busy_iter_fn *fn, void *data, bool reserved)
+ busy_tag_iter_fn *fn, void *data, bool reserved)
{
struct bt_iter_data iter_data = {
.hctx = hctx,
@@ -457,7 +457,7 @@ EXPORT_SYMBOL(blk_mq_tagset_wait_completed_request);
* called for all requests on all queues that share that tag set and not only
* for requests associated with @q.
*/
-void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
+void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_tag_iter_fn *fn,
void *priv)
{
struct blk_mq_hw_ctx *hctx;
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index df787b5a23bd..5668e28be0b7 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -28,7 +28,7 @@ extern void blk_mq_tag_resize_shared_tags(struct blk_mq_tag_set *set,
extern void blk_mq_tag_update_sched_shared_tags(struct request_queue *q);

extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
-void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
+void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_tag_iter_fn *fn,
void *priv);
void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
void *priv);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index da8de0d6f99b..2344c68bff35 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -479,7 +479,6 @@ struct blk_mq_queue_data {
bool last;
};

-typedef bool (busy_iter_fn)(struct request *, void *, bool);
typedef bool (busy_tag_iter_fn)(struct request *, void *, bool);

/**
--
2.17.1

2021-11-09 12:04:36

by Ming Lei

Subject: Re: [PATCH RFT 2/3] blk-mq: Delete busy_iter_fn

On Tue, Nov 02, 2021 at 07:27:34PM +0800, John Garry wrote:
> Typedefs busy_iter_fn and busy_tag_iter_fn are now identical, so delete
> busy_iter_fn to reduce duplication.
>
> It would be nicer to delete busy_tag_iter_fn, as the name busy_iter_fn is
> less specific.
>
> However busy_tag_iter_fn is used in many different parts of the tree,
> unlike busy_iter_fn which is just use in block/, so just take the
> straightforward path now, so that we could rename later treewide.
>
> Signed-off-by: John Garry <[email protected]>

Reviewed-by: Ming Lei <[email protected]>

--
Ming

2021-11-15 15:56:56

by John Garry

Subject: Re: [PATCH RFT 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

On 02/11/2021 11:27, John Garry wrote:

Hi Kashyap,

Any chance you can try this series or give an update on the issue
reported earlier?

thanks @ Ming for the reviews.

Cheers,
John

> In [0], Kashyap reports high CPU usage for blk_mq_queue_tag_busy_iter()
> and callees for shared tags.
>
> Indeed blk_mq_queue_tag_busy_iter() would be less optimum for moving to
> shared tags, but it was not optimum previously.
>
> So I would like this series tested, and also to know what is triggering
> blk_mq_queue_tag_busy_iter() from userspace to cause such high CPU
> loading.
>
> As suggested by Ming, reading /proc/diskstats in a while true loop
> can trigger blk_mq_queue_tag_busy_iter(); I do so in a test with 2x
> separate consoles, and here are the results:
>
> v5.15
> blk_mq_queue_tag_busy_iter() 6.2%
> part_stat_read_all() 6.7
>
> pre-v5.16 (Linus' master branch @ commit bfc484fe6abb)
> blk_mq_queue_tag_busy_iter() 4.5%
> part_stat_read_all() 6.2
>
> pre-v5.16+this series
> blk_mq_queue_tag_busy_iter() not shown in top users
> part_stat_read_all() 7.5%
>
> These results are from perf top, on a system with 7x
> disks, with hisi_sas which has 16x HW queues.
>
> [0] https://lore.kernel.org/linux-block/[email protected]/
>
> John Garry (3):
> blk-mq: Drop busy_iter_fn blk_mq_hw_ctx argument
> blk-mq: Delete busy_iter_fn
> blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags
>
> block/blk-mq-tag.c | 58 +++++++++++++++++++++++++++---------------
> block/blk-mq-tag.h | 2 +-
> block/blk-mq.c | 17 ++++++-------
> include/linux/blk-mq.h | 2 --
> 4 files changed, 47 insertions(+), 32 deletions(-)
>


2021-11-15 16:32:04

by Kashyap Desai

Subject: RE: [PATCH RFT 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

> Hi Kashyap,
>
> Any chance you can try this series or give an update on the issue reported
> earlier?

John -

I will try something this week and let you know.

>
> thanks @ Ming for the reviews.
>
> Cheers,
> John
>



2021-11-25 13:54:48

by Kashyap Desai

Subject: RE: [PATCH RFT 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

> -----Original Message-----
> From: Kashyap Desai [mailto:[email protected]]
> Sent: Monday, November 15, 2021 10:02 PM
> To: 'John Garry' <[email protected]>; '[email protected]'
> <[email protected]>
> Cc: '[email protected]' <[email protected]>; 'linux-
> [email protected]' <[email protected]>;
> '[email protected]'
> <[email protected]>; '[email protected]' <[email protected]>
> Subject: RE: [PATCH RFT 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter()
> for shared tags
>
> > Hi Kashyap,
> >
> > Any chance you can try this series or give an update on the issue
> > reported earlier?
>
> John -
>
> I will try something this week and let you know.

John - I tested the patchset and it looks good. The issue reported at the
below thread is fixed.
https://lore.kernel.org/linux-block/[email protected]/

Here is perf top data.

5.70% [megaraid_sas] [k] complete_cmd_fusion
3.80% [megaraid_sas] [k] megasas_build_and_issue_cmd_fusion
2.75% [kernel] [k] syscall_return_via_sysret
2.68% [kernel] [k] entry_SYSCALL_64
2.22% [kernel] [k] io_submit_one
2.19% [megaraid_sas] [k] megasas_build_ldio_fusion
1.95% [kernel] [k] llist_add_batch
1.80% [kernel] [k] llist_reverse_order
1.79% [kernel] [k] scsi_complete
1.73% [kernel] [k] scsi_queue_rq
1.66% [kernel] [k] check_preemption_disabled
1.37% [megaraid_sas] [k] megasas_queue_command
1.16% [kernel] [k] native_irq_return_iret
1.11% [kernel] [k] aio_complete_rw
1.07% [kernel] [k] read_tsc
1.06% fio [.] __fio_gettime
1.05% [kernel] [k] flush_smp_call_function_queue
1.03% [kernel] [k] blk_complete_reqs
1.00% [kernel] [k] blk_mq_free_request
0.98% [kernel] [k] sbitmap_get


I will continue testing and let you know how it goes.


Kashyap

>
> >
> > thanks @ Ming for the reviews.
> >
> > Cheers,
> > John
> >



2021-11-25 14:11:39

by John Garry

Subject: Re: [PATCH RFT 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

On 25/11/2021 13:46, Kashyap Desai wrote:
>> John -
>>
>> I will try something this week and let you know.
> John - I tested patchset and looks good. Issue reported at below thread is
> fixed.
> https://lore.kernel.org/linux-block/[email protected]/
>
> Here is perf top data.
>
> 5.70% [megaraid_sas] [k] complete_cmd_fusion
> 3.80% [megaraid_sas] [k] megasas_build_and_issue_cmd_fusion
> 2.75% [kernel] [k] syscall_return_via_sysret
> 2.68% [kernel] [k] entry_SYSCALL_64
> 2.22% [kernel] [k] io_submit_one
> 2.19% [megaraid_sas] [k] megasas_build_ldio_fusion
> 1.95% [kernel] [k] llist_add_batch
> 1.80% [kernel] [k] llist_reverse_order
> 1.79% [kernel] [k] scsi_complete
> 1.73% [kernel] [k] scsi_queue_rq
> 1.66% [kernel] [k] check_preemption_disabled
> 1.37% [megaraid_sas] [k] megasas_queue_command
> 1.16% [kernel] [k] native_irq_return_iret
> 1.11% [kernel] [k] aio_complete_rw
> 1.07% [kernel] [k] read_tsc
> 1.06% fio [.] __fio_gettime
> 1.05% [kernel] [k] flush_smp_call_function_queue
> 1.03% [kernel] [k] blk_complete_reqs
> 1.00% [kernel] [k] blk_mq_free_request
> 0.98% [kernel] [k] sbitmap_get
>
>
> I will continue testing and let you know how it goes.

ok, good to know, thanks. But I would still like to know what is
triggering blk_mq_queue_tag_busy_iter() so often. Indeed, as mentioned
in the cover letter, this function was hardly optimised before for
shared sbitmap.

And do you have any opinion on whether we would want this as a fix? The
information requested above would help explain why we would need it as a fix.

Cheers,
John

2021-11-26 11:53:40

by John Garry

Subject: Re: [PATCH RFT 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

On 26/11/2021 11:25, Kashyap Desai wrote:
>>>
>>> I will continue testing and let you know how it goes.
>> ok, good to know, thanks. But I would still like to know what is
>> triggering
>> blk_mq_queue_tag_busy_iter() so often. Indeed, as mentioned in this cover
>> letter, this function was hardly optimised before for shared sbitmap.
> If I give "--disk_util=0" option in my fio run, caller of "
> blk_mq_queue_tag_busy_iter" reduced drastically.
> As part of <fio> run, application call diskutils operations and it is almost
> same as doing "cat /proc/stats" in loop.
> Looking at fio code, it call diskstats every 250 msec. Here is sample fio
> logs -
>
> diskutil 87720 /sys/block/sdb/stat: stat read ok? 0
> diskutil 87720 update io ticks
> diskutil 87720 open stat file: /sys/block/sdb/stat
> diskutil 87720 /sys/block/sdb/stat: 127853173 0 1022829056 241827073 0 0 0 0 255 984012 241827073 0 0 0 0 0 0
>
> There is one more call trace, but not sure why it is getting executed in my
> test. Below path does not execute so frequently but it consumes cpu (not
> noticeable on my setup)
>
> kthread
> worker_thread
> process_one_work
> blk_mq_timeout_work
> blk_mq_queue_tag_busy_iter
> bt_iter
> blk_mq_find_and_get_req
> _raw_spin_lock_irqsave
> native_queued_spin_lock_slowpath
>
>

It would still be nice to know where this is coming from.

> This patch set improves above call trace even after disk_util=0 is set.
ok, fine. Thanks for testing.

So I guess that this is a regression, and you would want this series for
v5.16, right? My changes were made with v5.17 in mind.

I am not sure how Jens feels about it, since the changes are
significant. It would be a lot easier to argue for v5.16 if we got to
this point earlier in the cycle...

Anyway, it would be good to have a full review first, so please help with
that.

@Ming, can you please give feedback on 3/3 here?

BTW, I am on vacation next week and can't help progress then, so any
assistance would be good.

Thanks,
John

2021-11-26 12:02:19

by Kashyap Desai

Subject: RE: [PATCH RFT 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

> >
> >
> > I will continue testing and let you know how it goes.
>
> ok, good to know, thanks. But I would still like to know what is
> triggering
> blk_mq_queue_tag_busy_iter() so often. Indeed, as mentioned in this cover
> letter, this function was hardly optimised before for shared sbitmap.

If I give the "--disk_util=0" option in my fio run, calls to
blk_mq_queue_tag_busy_iter() reduce drastically.
As part of a fio run, the application performs disk-utilisation operations,
which is almost the same as doing "cat /proc/diskstats" in a loop.
Looking at the fio code, it reads diskstats every 250 msec. Here are sample
fio logs -

diskutil 87720 /sys/block/sdb/stat: stat read ok? 0
diskutil 87720 update io ticks
diskutil 87720 open stat file: /sys/block/sdb/stat
diskutil 87720 /sys/block/sdb/stat: 127853173 0 1022829056 241827073 0 0 0 0 255 984012 241827073 0 0 0 0 0 0
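For reference, the 17 numbers in that line follow the documented layout of /sys/block/<dev>/stat (field names per the kernel's Documentation/block/stat.rst); a minimal parser sketch:

```c
#include <stdio.h>

/* Field layout of /sys/block/<dev>/stat as documented in
 * Documentation/block/stat.rst (17 fields since v5.5, when the flush
 * counters were added). */
struct disk_stat {
	unsigned long long read_ios, read_merges, read_sectors, read_ticks;
	unsigned long long write_ios, write_merges, write_sectors, write_ticks;
	unsigned long long in_flight, io_ticks, time_in_queue;
	unsigned long long discard_ios, discard_merges, discard_sectors, discard_ticks;
	unsigned long long flush_ios, flush_ticks;
};

/* Parse one stat line; returns the number of fields converted (17 on success). */
int parse_disk_stat(const char *line, struct disk_stat *s)
{
	return sscanf(line,
		"%llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
		&s->read_ios, &s->read_merges, &s->read_sectors, &s->read_ticks,
		&s->write_ios, &s->write_merges, &s->write_sectors, &s->write_ticks,
		&s->in_flight, &s->io_ticks, &s->time_in_queue,
		&s->discard_ios, &s->discard_merges, &s->discard_sectors, &s->discard_ticks,
		&s->flush_ios, &s->flush_ticks);
}
```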

There is one more call trace, but I am not sure why it is getting executed
in my test. The path below does not execute so frequently, but it consumes
CPU (not noticeable on my setup):

kthread
worker_thread
process_one_work
blk_mq_timeout_work
blk_mq_queue_tag_busy_iter
bt_iter
blk_mq_find_and_get_req
_raw_spin_lock_irqsave
native_queued_spin_lock_slowpath
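The tags->lock contention in that trace can be modelled in userspace (a sketch only; the names and the single-mutex model are assumptions, not the kernel's implementation): several iterating threads all serialise on one lock guarding a shared tag table, just as parallel blk_mq_queue_tag_busy_iter() callers serialise on tags->lock inside blk_mq_find_and_get_req().

```c
#include <pthread.h>
#include <stdatomic.h>

#define MODEL_NR_TAGS 1024
#define MODEL_NR_THREADS 4

/* One table of busy flags and one lock shared by all iterators, mirroring
 * shared tags where every HW queue's iteration hits the same tags->lock. */
static int model_tags[MODEL_NR_TAGS];
static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int model_found;

static void *model_iter(void *arg)
{
	(void)arg;
	for (int i = 0; i < MODEL_NR_TAGS; i++) {
		pthread_mutex_lock(&model_lock);	/* the contended point */
		if (model_tags[i])
			atomic_fetch_add(&model_found, 1);
		pthread_mutex_unlock(&model_lock);
	}
	return NULL;
}

/* Mark the first nr_busy tags busy, then run the parallel iteration;
 * returns the total number of matches across all threads. */
int model_count_busy(int nr_busy)
{
	pthread_t threads[MODEL_NR_THREADS];

	for (int i = 0; i < MODEL_NR_TAGS; i++)
		model_tags[i] = i < nr_busy;
	atomic_store(&model_found, 0);
	for (int i = 0; i < MODEL_NR_THREADS; i++)
		pthread_create(&threads[i], NULL, model_iter, NULL);
	for (int i = 0; i < MODEL_NR_THREADS; i++)
		pthread_join(threads[i], NULL);
	return atomic_load(&model_found);
}
```

The series reduces this pressure by walking the shared tags once rather than once per hctx, so far fewer iterations take the lock in the first place.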


This patch set improves the above call trace even after disk_util=0 is set.

Kashyap

>
> And any opinion whether we would want this as a fix? Information requested
> above would help explain why we would need it as a fix.
>
> Cheers,
> John



2021-11-28 10:43:04

by Hannes Reinecke

Subject: Re: [PATCH RFT 2/3] blk-mq: Delete busy_iter_fn

On 11/2/21 12:27 PM, John Garry wrote:
> Typedefs busy_iter_fn and busy_tag_iter_fn are now identical, so delete
> busy_iter_fn to reduce duplication.
>
> It would be nicer to delete busy_tag_iter_fn, as the name busy_iter_fn is
> less specific.
>
> However busy_tag_iter_fn is used in many different parts of the tree,
> unlike busy_iter_fn which is just use in block/, so just take the
> straightforward path now, so that we could rename later treewide.
>
> Signed-off-by: John Garry <[email protected]>
> ---
> block/blk-mq-tag.c | 6 +++---
> block/blk-mq-tag.h | 2 +-
> include/linux/blk-mq.h | 1 -
> 3 files changed, 4 insertions(+), 5 deletions(-)
>
Reviewed-by: Hannes Reinecke <[email protected]>

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

2021-11-28 10:45:39

by Hannes Reinecke

Subject: Re: [PATCH RFT 3/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

On 11/2/21 12:27 PM, John Garry wrote:
> Kashyap reports high CPU usage in blk_mq_queue_tag_busy_iter() and callees
> using megaraid SAS RAID card since moving to shared tags [0].
>
> Previously, when shared tags was shared sbitmap, this function was less
> than optimum since we would iter through all tags for all hctx's,
> yet only ever match upto tagset depth number of rqs.
>
> Since the change to shared tags, things are even less efficient if we have
> parallel callers of blk_mq_queue_tag_busy_iter(). This is because in
> bt_iter() -> blk_mq_find_and_get_req() there would be more contention on
> accessing each request ref and tags->lock since they are now shared among
> all HW queues.
>
> Optimise by having separate calls to bt_for_each() for when we're using
> shared tags. In this case no longer pass a hctx, as it is no longer
> relevant, and teach bt_iter() about this.
>
> Ming suggested something along the lines of this change, apart from a
> different implementation.
>
> [0] https://lore.kernel.org/linux-block/[email protected]/
>
> Signed-off-by: John Garry <[email protected]>
> ---
> block/blk-mq-tag.c | 52 +++++++++++++++++++++++++++++++---------------
> 1 file changed, 35 insertions(+), 17 deletions(-)
>
Reviewed-by: Hannes Reinecke <[email protected]>

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
[email protected] +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

2021-11-29 02:21:55

by Ming Lei

Subject: Re: [PATCH RFT 3/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

On Tue, Nov 02, 2021 at 07:27:35PM +0800, John Garry wrote:
> Kashyap reports high CPU usage in blk_mq_queue_tag_busy_iter() and callees
> using megaraid SAS RAID card since moving to shared tags [0].
>
> Previously, when shared tags was shared sbitmap, this function was less
> than optimum since we would iter through all tags for all hctx's,
> yet only ever match upto tagset depth number of rqs.
>
> Since the change to shared tags, things are even less efficient if we have
> parallel callers of blk_mq_queue_tag_busy_iter(). This is because in
> bt_iter() -> blk_mq_find_and_get_req() there would be more contention on
> accessing each request ref and tags->lock since they are now shared among
> all HW queues.
>
> Optimise by having separate calls to bt_for_each() for when we're using
> shared tags. In this case no longer pass a hctx, as it is no longer
> relevant, and teach bt_iter() about this.
>
> Ming suggested something along the lines of this change, apart from a
> different implementation.
>
> [0] https://lore.kernel.org/linux-block/[email protected]/
>
> Signed-off-by: John Garry <[email protected]>

Looks fine,

Reviewed-by: Ming Lei <[email protected]>


Thanks,
Ming


2021-12-06 09:57:57

by John Garry

Subject: Re: [PATCH RFT 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

On 26/11/2021 11:51, John Garry wrote:
>> This patch set improves above call trace even after disk_util=0 is set.
> ok, fine. Thanks for testing.
>
> So I guess that this is a regression, and you would want this series for
> v5.16, right? My changes were made with v5.17 in mind.
>
> I am not sure how Jens feels about it, since the changes are
> significant. It would be a lot easier to argue for v5.16 if we got to
> this point earlier in the cycle...
>

Note: I will now resend for 5.17 and add your tested-by; please let me
know if you are unhappy about that.

> Anyway, it would be good to have full review first, so please help with
> that.
>
> @Ming, can you please give feedback on 3/3 here?

Thanks and also to Hannes for the reviews.

john

2021-12-06 10:03:12

by Kashyap Desai

Subject: RE: [PATCH RFT 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

> -----Original Message-----
> From: John Garry [mailto:[email protected]]
> Sent: Monday, December 6, 2021 3:28 PM
> To: Kashyap Desai <[email protected]>; [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH RFT 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter()
> for shared tags
>
> On 26/11/2021 11:51, John Garry wrote:
> >> This patch set improves above call trace even after disk_util=0 is set.
> > ok, fine. Thanks for testing.
> >
> > So I guess that this is a regression, and you would want this series
> > for v5.16, right? My changes were made with v5.17 in mind.
> >
> > I am not sure how Jens feels about it, since the changes are
> > significant. It would be a lot easier to argue for v5.16 if we got to
> > this point earlier in the cycle...
> >
>
> note: I will now resend for 5.17 and add your tested-by, please let me
> know if
> unhappy about that.

John - 5.17 should be OK. I am doing additional testing and so far no issue.

Tested-by: Kashyap Desai <[email protected]>

>
> > Anyway, it would be good to have full review first, so please help
> > with that.
> >
> > @Ming, can you please give feedback on 3/3 here?
>
> Thanks and also to Hannes for the reviews.
>
> john



2021-12-08 13:05:04

by John Garry

Subject: Re: [PATCH RFT 0/3] blk-mq: Optimise blk_mq_queue_tag_busy_iter() for shared tags

On 06/12/2021 10:03, Kashyap Desai wrote:
>> note: I will now resend for 5.17 and add your tested-by, please let me
>> know if
>> unhappy about that.
> John - 5.17 should be OK. I am doing additional testing and so far no issue.
>
> Tested-by: Kashyap Desai<[email protected]>
>

Hi Kashyap,

On a related topic, you mentioned in an earlier discussion another
possible regression you saw around 5.11 or 5.12, unrelated to the shared
tags transition:
https://lore.kernel.org/linux-block/[email protected]/

Did you come to any conclusion on that?

Thanks,
John