2021-10-05 10:30:35

by John Garry

Subject: [PATCH v5 00/14] blk-mq: Reduce static requests memory footprint for shared sbitmap

Currently a full set of static requests is allocated per hw queue per
tagset when shared sbitmap is used.

However, only tagset->queue_depth number of requests may be active at
any given time. As such, only tagset->queue_depth number of static
requests are required.

The same goes for using an IO scheduler, which allocates a full set of
static requests per hw queue per request queue.

This series changes shared sbitmap support to use a shared tags struct per
tagset and per request queue. Ming suggested something along those lines in
the v1 review. With shared tags, the static rqs also become shared, reducing
the number of sets of static rqs and so reducing memory usage.

Patch "blk-mq: Use shared tags for shared sbitmap support" is a bit big,
and could potentially be broken down. But then maintaining ability to
bisect becomes harder and each sub-patch would get more convoluted.

For the megaraid_sas driver on my 128-CPU arm64 system with 1x SATA disk, we
save approx. 300MB(!) [370MB -> 60MB]
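
For a rough sense of where the saving comes from, here is a tiny user-space
sketch of the footprint model; all numbers (hw queue count, depth,
per-request size) are purely illustrative and are not the real megaraid_sas
values:

  #include <stdio.h>

  /* Illustrative numbers only - not taken from the measurement above. */
  #define NR_HW_QUEUES  128   /* e.g. one hctx per CPU on the test system */
  #define QUEUE_DEPTH   256   /* tagset->queue_depth */
  #define RQ_SIZE       2048  /* sizeof(struct request) + driver payload */

  int main(void)
  {
          /* before: a full static rq set per hw queue per tagset */
          unsigned long before = (unsigned long)NR_HW_QUEUES * QUEUE_DEPTH * RQ_SIZE;
          /* after: one shared static rq set per tagset */
          unsigned long after = (unsigned long)QUEUE_DEPTH * RQ_SIZE;

          printf("before: %lu KiB, after: %lu KiB\n", before >> 10, after >> 10);
          return 0;
  }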

Baseline is 1b2d1439fc25 (block/for-next) Merge branch 'for-5.16/io_uring'
into for-next

Changes since v4:
- Add Hannes' and Ming's RB tags (thanks!) - but please check 12/14 rework
- Add patch to change "shared sbitmap" naming to "shared tags"
- Rebase "block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ"

Changes since v3:
- Fix transient error handling issue in 05/13
- Add Hannes RB tags (thanks!)

Changes since v2:
- Make blk_mq_clear_rq_mapping() static again
- Various function renaming for conciseness and consistency
- Add/refactor alloc/free map and rqs function
- Drop the new blk_mq_ops init_request method in favour of passing an
invalid HW queue index for shared sbitmap
- Add patch to not clear rq mapping for driver tags
- Remove blk_mq_init_bitmap_tags()
- Add some more RB tags (thanks!)

Changes since v1:
- Switch to use blk_mq_tags for shared sbitmap
- Add new blk_mq_ops init request callback
- Add some RB tags (thanks!)

John Garry (14):
blk-mq: Change rqs check in blk_mq_free_rqs()
block: Rename BLKDEV_MAX_RQ -> BLKDEV_DEFAULT_RQ
blk-mq: Relocate shared sbitmap resize in blk_mq_update_nr_requests()
blk-mq: Invert check in blk_mq_update_nr_requests()
blk-mq-sched: Rename blk_mq_sched_alloc_{tags -> map_and_rqs}()
blk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}()
blk-mq: Pass driver tags to blk_mq_clear_rq_mapping()
blk-mq: Don't clear driver tags own mapping
blk-mq: Add blk_mq_tag_update_sched_shared_sbitmap()
blk-mq: Add blk_mq_alloc_map_and_rqs()
blk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}()
blk-mq: Use shared tags for shared sbitmap support
blk-mq: Stop using pointers for blk_mq_tags bitmap tags
blk-mq: Change shared sbitmap naming to shared tags

block/bfq-iosched.c | 4 +-
block/blk-core.c | 6 +-
block/blk-mq-debugfs.c | 8 +-
block/blk-mq-sched.c | 118 +++++++++++-------------
block/blk-mq-sched.h | 4 +-
block/blk-mq-tag.c | 135 ++++++++++------------------
block/blk-mq-tag.h | 16 ++--
block/blk-mq.c | 198 +++++++++++++++++++++++------------------
block/blk-mq.h | 36 ++++----
block/blk.h | 2 +-
block/elevator.c | 2 +-
block/kyber-iosched.c | 4 +-
block/mq-deadline.c | 2 +-
drivers/block/rbd.c | 2 +-
include/linux/blk-mq.h | 17 ++--
include/linux/blkdev.h | 5 +-
16 files changed, 262 insertions(+), 297 deletions(-)

--
2.26.2


2021-10-05 10:30:48

by John Garry

Subject: [PATCH v5 09/14] blk-mq: Add blk_mq_tag_update_sched_shared_sbitmap()

Put the functionality to update the sched shared sbitmap size in a common
function.

Since the same formula is always used to resize, and its inputs can be
derived from the request queue argument, just pass the request queue
pointer.

Signed-off-by: John Garry <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
---
block/blk-mq-sched.c | 3 +--
block/blk-mq-tag.c | 6 ++++++
block/blk-mq-tag.h | 1 +
block/blk-mq.c | 3 +--
4 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index bdbb6c31b433..6c15f6e98e2e 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -575,8 +575,7 @@ static int blk_mq_init_sched_shared_sbitmap(struct request_queue *queue)
&queue->sched_breserved_tags;
}

- sbitmap_queue_resize(&queue->sched_bitmap_tags,
- queue->nr_requests - set->reserved_tags);
+ blk_mq_tag_update_sched_shared_sbitmap(queue);

return 0;
}
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index ff5caeb82542..55b5a226dcc0 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -634,6 +634,12 @@ void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int s
sbitmap_queue_resize(&set->__bitmap_tags, size - set->reserved_tags);
}

+void blk_mq_tag_update_sched_shared_sbitmap(struct request_queue *q)
+{
+ sbitmap_queue_resize(&q->sched_bitmap_tags,
+ q->nr_requests - q->tag_set->reserved_tags);
+}
+
/**
* blk_mq_unique_tag() - return a tag that is unique queue-wide
* @rq: request for which to compute a unique tag
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index f0a0ee758a55..a9f5f1824819 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -50,6 +50,7 @@ extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
unsigned int depth, bool can_grow);
extern void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set,
unsigned int size);
+extern void blk_mq_tag_update_sched_shared_sbitmap(struct request_queue *q);

extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 158ee7dbbd76..7e964a1c1bee 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3643,8 +3643,7 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
q->nr_requests = nr;
if (blk_mq_is_sbitmap_shared(set->flags)) {
if (q->elevator)
- sbitmap_queue_resize(&q->sched_bitmap_tags,
- nr - set->reserved_tags);
+ blk_mq_tag_update_sched_shared_sbitmap(q);
else
blk_mq_tag_resize_shared_sbitmap(set, nr);
}
--
2.26.2

2021-10-05 10:31:10

by John Garry

Subject: [PATCH v5 07/14] blk-mq: Pass driver tags to blk_mq_clear_rq_mapping()

Function blk_mq_clear_rq_mapping() will be used for shared sbitmap tags
in future, so pass a driver tags pointer instead of the tagset container
and HW queue index.

Signed-off-by: John Garry <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
---
block/blk-mq.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 9895b55dff61..1bee153e6b7f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2304,10 +2304,9 @@ static size_t order_to_size(unsigned int order)
}

/* called before freeing request pool in @tags */
-static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
- struct blk_mq_tags *tags, unsigned int hctx_idx)
+static void blk_mq_clear_rq_mapping(struct blk_mq_tags *drv_tags,
+ struct blk_mq_tags *tags)
{
- struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
struct page *page;
unsigned long flags;

@@ -2316,7 +2315,7 @@ static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
unsigned long end = start + order_to_size(page->private);
int i;

- for (i = 0; i < set->queue_depth; i++) {
+ for (i = 0; i < drv_tags->nr_tags; i++) {
struct request *rq = drv_tags->rqs[i];
unsigned long rq_addr = (unsigned long)rq;

@@ -2340,8 +2339,11 @@ static void blk_mq_clear_rq_mapping(struct blk_mq_tag_set *set,
void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
unsigned int hctx_idx)
{
+ struct blk_mq_tags *drv_tags;
struct page *page;

+ drv_tags = set->tags[hctx_idx];
+
if (tags->static_rqs && set->ops->exit_request) {
int i;

@@ -2355,7 +2357,7 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
}
}

- blk_mq_clear_rq_mapping(set, tags, hctx_idx);
+ blk_mq_clear_rq_mapping(drv_tags, tags);

while (!list_empty(&tags->page_list)) {
page = list_first_entry(&tags->page_list, struct page, lru);
--
2.26.2

2021-10-05 10:31:18

by John Garry

Subject: [PATCH v5 05/14] blk-mq-sched: Rename blk_mq_sched_alloc_{tags -> map_and_rqs}()

Function blk_mq_sched_alloc_tags() does the same as
__blk_mq_alloc_map_and_request(), so give it a similar name to be consistent.

Similarly rename label err_free_tags -> err_free_map_and_rqs.

Signed-off-by: John Garry <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
---
block/blk-mq-sched.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 2231fb0d4c35..644b6d554d72 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -515,9 +515,9 @@ void blk_mq_sched_insert_requests(struct blk_mq_hw_ctx *hctx,
percpu_ref_put(&q->q_usage_counter);
}

-static int blk_mq_sched_alloc_tags(struct request_queue *q,
- struct blk_mq_hw_ctx *hctx,
- unsigned int hctx_idx)
+static int blk_mq_sched_alloc_map_and_rqs(struct request_queue *q,
+ struct blk_mq_hw_ctx *hctx,
+ unsigned int hctx_idx)
{
struct blk_mq_tag_set *set = q->tag_set;
int ret;
@@ -609,15 +609,15 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
BLKDEV_DEFAULT_RQ);

queue_for_each_hw_ctx(q, hctx, i) {
- ret = blk_mq_sched_alloc_tags(q, hctx, i);
+ ret = blk_mq_sched_alloc_map_and_rqs(q, hctx, i);
if (ret)
- goto err_free_tags;
+ goto err_free_map_and_rqs;
}

if (blk_mq_is_sbitmap_shared(q->tag_set->flags)) {
ret = blk_mq_init_sched_shared_sbitmap(q);
if (ret)
- goto err_free_tags;
+ goto err_free_map_and_rqs;
}

ret = e->ops.init_sched(q, e);
@@ -645,7 +645,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
err_free_sbitmap:
if (blk_mq_is_sbitmap_shared(q->tag_set->flags))
blk_mq_exit_sched_shared_sbitmap(q);
-err_free_tags:
+err_free_map_and_rqs:
blk_mq_sched_free_requests(q);
blk_mq_sched_tags_teardown(q);
q->elevator = NULL;
--
2.26.2

2021-10-05 10:31:22

by John Garry

Subject: [PATCH v5 10/14] blk-mq: Add blk_mq_alloc_map_and_rqs()

Add a function to combine allocating tags and the associated requests,
and factor out common patterns to use this new function.

Some functions only call blk_mq_alloc_map_and_rqs() now, but more
functionality will be added later.

Also make blk_mq_alloc_rq_map() and blk_mq_alloc_rqs() static since they
are only used in blk-mq.c, and finally rename some functions for
conciseness and consistency with other function names:
- __blk_mq_alloc_map_and_{request -> rqs}()
- blk_mq_alloc_{map_and_requests -> set_map_and_rqs}()
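
As a quick illustration of the resulting caller pattern (a hypothetical
caller for this point in the series, not code from the patch; error handling
is trimmed):

  /* Hypothetical helper showing how the new alloc pairs with the free path. */
  static int example_alloc(struct blk_mq_tag_set *set, unsigned int hctx_idx,
                           unsigned int depth)
  {
          struct blk_mq_tags *tags;

          tags = blk_mq_alloc_map_and_rqs(set, hctx_idx, depth);
          if (!tags)
                  return -ENOMEM;

          /* ... use tags; tags->static_rqs now backs this map ... */

          blk_mq_free_rqs(set, tags, hctx_idx);
          blk_mq_free_rq_map(tags, set->flags);
          return 0;
  }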

Suggested-by: Ming Lei <[email protected]>
Signed-off-by: John Garry <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
---
block/blk-mq-sched.c | 15 +++--------
block/blk-mq-tag.c | 9 +------
block/blk-mq.c | 62 +++++++++++++++++++++++++-------------------
block/blk-mq.h | 9 ++-----
4 files changed, 42 insertions(+), 53 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 6c15f6e98e2e..d1b56bb9ac64 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -519,21 +519,12 @@ static int blk_mq_sched_alloc_map_and_rqs(struct request_queue *q,
struct blk_mq_hw_ctx *hctx,
unsigned int hctx_idx)
{
- struct blk_mq_tag_set *set = q->tag_set;
- int ret;
+ hctx->sched_tags = blk_mq_alloc_map_and_rqs(q->tag_set, hctx_idx,
+ q->nr_requests);

- hctx->sched_tags = blk_mq_alloc_rq_map(set, hctx_idx, q->nr_requests,
- set->reserved_tags, set->flags);
if (!hctx->sched_tags)
return -ENOMEM;
-
- ret = blk_mq_alloc_rqs(set, hctx->sched_tags, hctx_idx, q->nr_requests);
- if (ret) {
- blk_mq_free_rq_map(hctx->sched_tags, set->flags);
- hctx->sched_tags = NULL;
- }
-
- return ret;
+ return 0;
}

/* called in queue's release handler, tagset has gone away */
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 55b5a226dcc0..db99f1246795 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -592,7 +592,6 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
if (tdepth > tags->nr_tags) {
struct blk_mq_tag_set *set = hctx->queue->tag_set;
struct blk_mq_tags *new;
- bool ret;

if (!can_grow)
return -EINVAL;
@@ -604,15 +603,9 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
if (tdepth > MAX_SCHED_RQ)
return -EINVAL;

- new = blk_mq_alloc_rq_map(set, hctx->queue_num, tdepth,
- tags->nr_reserved_tags, set->flags);
+ new = blk_mq_alloc_map_and_rqs(set, hctx->queue_num, tdepth);
if (!new)
return -ENOMEM;
- ret = blk_mq_alloc_rqs(set, new, hctx->queue_num, tdepth);
- if (ret) {
- blk_mq_free_rq_map(new, set->flags);
- return -ENOMEM;
- }

blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
blk_mq_free_rq_map(*tagsptr, set->flags);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7e964a1c1bee..6d3664acb9f8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2385,11 +2385,11 @@ void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags)
blk_mq_free_tags(tags, flags);
}

-struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
- unsigned int hctx_idx,
- unsigned int nr_tags,
- unsigned int reserved_tags,
- unsigned int flags)
+static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
+ unsigned int hctx_idx,
+ unsigned int nr_tags,
+ unsigned int reserved_tags,
+ unsigned int flags)
{
struct blk_mq_tags *tags;
int node;
@@ -2437,8 +2437,9 @@ static int blk_mq_init_request(struct blk_mq_tag_set *set, struct request *rq,
return 0;
}

-int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
- unsigned int hctx_idx, unsigned int depth)
+static int blk_mq_alloc_rqs(struct blk_mq_tag_set *set,
+ struct blk_mq_tags *tags,
+ unsigned int hctx_idx, unsigned int depth)
{
unsigned int i, j, entries_per_page, max_order = 4;
size_t rq_size, left;
@@ -2849,25 +2850,34 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
}
}

-static bool __blk_mq_alloc_map_and_request(struct blk_mq_tag_set *set,
- int hctx_idx)
+struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
+ unsigned int hctx_idx,
+ unsigned int depth)
{
- unsigned int flags = set->flags;
- int ret = 0;
+ struct blk_mq_tags *tags;
+ int ret;

- set->tags[hctx_idx] = blk_mq_alloc_rq_map(set, hctx_idx,
- set->queue_depth, set->reserved_tags, flags);
- if (!set->tags[hctx_idx])
- return false;
+ tags = blk_mq_alloc_rq_map(set, hctx_idx, depth, set->reserved_tags,
+ set->flags);
+ if (!tags)
+ return NULL;

- ret = blk_mq_alloc_rqs(set, set->tags[hctx_idx], hctx_idx,
- set->queue_depth);
- if (!ret)
- return true;
+ ret = blk_mq_alloc_rqs(set, tags, hctx_idx, depth);
+ if (ret) {
+ blk_mq_free_rq_map(tags, set->flags);
+ return NULL;
+ }

- blk_mq_free_rq_map(set->tags[hctx_idx], flags);
- set->tags[hctx_idx] = NULL;
- return false;
+ return tags;
+}
+
+static bool __blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
+ int hctx_idx)
+{
+ set->tags[hctx_idx] = blk_mq_alloc_map_and_rqs(set, hctx_idx,
+ set->queue_depth);
+
+ return set->tags[hctx_idx];
}

static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
@@ -2912,7 +2922,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
hctx_idx = set->map[j].mq_map[i];
/* unmapped hw queue can be remapped after CPU topo changed */
if (!set->tags[hctx_idx] &&
- !__blk_mq_alloc_map_and_request(set, hctx_idx)) {
+ !__blk_mq_alloc_map_and_rqs(set, hctx_idx)) {
/*
* If tags initialization fail for some hctx,
* that hctx won't be brought online. In this
@@ -3345,7 +3355,7 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
int i;

for (i = 0; i < set->nr_hw_queues; i++) {
- if (!__blk_mq_alloc_map_and_request(set, i))
+ if (!__blk_mq_alloc_map_and_rqs(set, i))
goto out_unwind;
cond_resched();
}
@@ -3364,7 +3374,7 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
* may reduce the depth asked for, if memory is tight. set->queue_depth
* will be updated to reflect the allocated depth.
*/
-static int blk_mq_alloc_map_and_requests(struct blk_mq_tag_set *set)
+static int blk_mq_alloc_set_map_and_rqs(struct blk_mq_tag_set *set)
{
unsigned int depth;
int err;
@@ -3530,7 +3540,7 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
if (ret)
goto out_free_mq_map;

- ret = blk_mq_alloc_map_and_requests(set);
+ ret = blk_mq_alloc_set_map_and_rqs(set);
if (ret)
goto out_free_mq_map;

diff --git a/block/blk-mq.h b/block/blk-mq.h
index d08779f77a26..83585a344568 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -55,13 +55,8 @@ void blk_mq_put_rq_ref(struct request *rq);
void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
unsigned int hctx_idx);
void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags);
-struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
- unsigned int hctx_idx,
- unsigned int nr_tags,
- unsigned int reserved_tags,
- unsigned int flags);
-int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
- unsigned int hctx_idx, unsigned int depth);
+struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
+ unsigned int hctx_idx, unsigned int depth);

/*
* Internal helpers for request insertion into sw queues
--
2.26.2

2021-10-05 10:31:33

by John Garry

Subject: [PATCH v5 12/14] blk-mq: Use shared tags for shared sbitmap support

Currently we use separate sbitmap pairs and active_queues atomic_t for
shared sbitmap support.

However a full set of static requests is allocated per HW queue, which is
quite wasteful, considering that the total number of requests usable at
any given time across all HW queues is limited by the shared sbitmap depth.

As such, it is considerably more memory efficient in the case of shared
sbitmap to allocate a set of static rqs per tag set or request queue, and
not per HW queue.

So replace the sbitmap pairs and active_queues atomic_t with a shared
tags per tagset and request queue, which will hold a set of shared static
rqs.

Since there is now no valid HW queue index to be passed to the blk_mq_ops
.init_request and .exit_request callbacks, pass an invalid index token. This
changes the semantics of the APIs, such that the callback would need to
validate the HW queue index before using it. Currently no user of shared
sbitmap actually uses the HW queue index (as would be expected).
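
For a driver that did consult the HW queue index in .init_request, a hedged
sketch of what the new contract implies; the callback and its per-command
payload are hypothetical, only BLK_MQ_NO_HCTX_IDX comes from this patch:

  static int foo_init_request(struct blk_mq_tag_set *set, struct request *rq,
                              unsigned int hctx_idx, unsigned int numa_node)
  {
          struct foo_cmd *cmd = blk_mq_rq_to_pdu(rq);

          /*
           * With shared tags the static rqs are allocated once per tagset,
           * so hctx_idx may be BLK_MQ_NO_HCTX_IDX rather than a real index.
           */
          if (hctx_idx != BLK_MQ_NO_HCTX_IDX)
                  cmd->preferred_hwq = hctx_idx; /* hypothetical per-queue hint */

          return 0;
  }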

Signed-off-by: John Garry <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
---
block/blk-mq-sched.c | 82 ++++++++++++++++-----------------
block/blk-mq-tag.c | 63 ++++++++-----------------
block/blk-mq-tag.h | 6 +--
block/blk-mq.c | 101 +++++++++++++++++++++--------------------
block/blk-mq.h | 7 ++-
include/linux/blk-mq.h | 15 +++---
include/linux/blkdev.h | 3 +-
7 files changed, 125 insertions(+), 152 deletions(-)

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index d1b56bb9ac64..428da4949d80 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -519,6 +519,11 @@ static int blk_mq_sched_alloc_map_and_rqs(struct request_queue *q,
struct blk_mq_hw_ctx *hctx,
unsigned int hctx_idx)
{
+ if (blk_mq_is_sbitmap_shared(q->tag_set->flags)) {
+ hctx->sched_tags = q->shared_sbitmap_tags;
+ return 0;
+ }
+
hctx->sched_tags = blk_mq_alloc_map_and_rqs(q->tag_set, hctx_idx,
q->nr_requests);

@@ -527,61 +532,54 @@ static int blk_mq_sched_alloc_map_and_rqs(struct request_queue *q,
return 0;
}

+static void blk_mq_exit_sched_shared_sbitmap(struct request_queue *queue)
+{
+ blk_mq_free_rq_map(queue->shared_sbitmap_tags);
+ queue->shared_sbitmap_tags = NULL;
+}
+
/* called in queue's release handler, tagset has gone away */
-static void blk_mq_sched_tags_teardown(struct request_queue *q)
+static void blk_mq_sched_tags_teardown(struct request_queue *q, unsigned int flags)
{
struct blk_mq_hw_ctx *hctx;
int i;

queue_for_each_hw_ctx(q, hctx, i) {
if (hctx->sched_tags) {
- blk_mq_free_rq_map(hctx->sched_tags, hctx->flags);
+ if (!blk_mq_is_sbitmap_shared(q->tag_set->flags))
+ blk_mq_free_rq_map(hctx->sched_tags);
hctx->sched_tags = NULL;
}
}
+
+ if (blk_mq_is_sbitmap_shared(flags))
+ blk_mq_exit_sched_shared_sbitmap(q);
}

static int blk_mq_init_sched_shared_sbitmap(struct request_queue *queue)
{
struct blk_mq_tag_set *set = queue->tag_set;
- int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags);
- struct blk_mq_hw_ctx *hctx;
- int ret, i;

/*
* Set initial depth at max so that we don't need to reallocate for
* updating nr_requests.
*/
- ret = blk_mq_init_bitmaps(&queue->sched_bitmap_tags,
- &queue->sched_breserved_tags,
- MAX_SCHED_RQ, set->reserved_tags,
- set->numa_node, alloc_policy);
- if (ret)
- return ret;
-
- queue_for_each_hw_ctx(queue, hctx, i) {
- hctx->sched_tags->bitmap_tags =
- &queue->sched_bitmap_tags;
- hctx->sched_tags->breserved_tags =
- &queue->sched_breserved_tags;
- }
+ queue->shared_sbitmap_tags = blk_mq_alloc_map_and_rqs(set,
+ BLK_MQ_NO_HCTX_IDX,
+ MAX_SCHED_RQ);
+ if (!queue->shared_sbitmap_tags)
+ return -ENOMEM;

blk_mq_tag_update_sched_shared_sbitmap(queue);

return 0;
}

-static void blk_mq_exit_sched_shared_sbitmap(struct request_queue *queue)
-{
- sbitmap_queue_free(&queue->sched_bitmap_tags);
- sbitmap_queue_free(&queue->sched_breserved_tags);
-}
-
int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
{
+ unsigned int i, flags = q->tag_set->flags;
struct blk_mq_hw_ctx *hctx;
struct elevator_queue *eq;
- unsigned int i;
int ret;

if (!e) {
@@ -598,21 +596,21 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth,
BLKDEV_DEFAULT_RQ);

- queue_for_each_hw_ctx(q, hctx, i) {
- ret = blk_mq_sched_alloc_map_and_rqs(q, hctx, i);
+ if (blk_mq_is_sbitmap_shared(flags)) {
+ ret = blk_mq_init_sched_shared_sbitmap(q);
if (ret)
- goto err_free_map_and_rqs;
+ return ret;
}

- if (blk_mq_is_sbitmap_shared(q->tag_set->flags)) {
- ret = blk_mq_init_sched_shared_sbitmap(q);
+ queue_for_each_hw_ctx(q, hctx, i) {
+ ret = blk_mq_sched_alloc_map_and_rqs(q, hctx, i);
if (ret)
goto err_free_map_and_rqs;
}

ret = e->ops.init_sched(q, e);
if (ret)
- goto err_free_sbitmap;
+ goto err_free_map_and_rqs;

blk_mq_debugfs_register_sched(q);

@@ -632,12 +630,10 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)

return 0;

-err_free_sbitmap:
- if (blk_mq_is_sbitmap_shared(q->tag_set->flags))
- blk_mq_exit_sched_shared_sbitmap(q);
err_free_map_and_rqs:
blk_mq_sched_free_rqs(q);
- blk_mq_sched_tags_teardown(q);
+ blk_mq_sched_tags_teardown(q, flags);
+
q->elevator = NULL;
return ret;
}
@@ -651,9 +647,15 @@ void blk_mq_sched_free_rqs(struct request_queue *q)
struct blk_mq_hw_ctx *hctx;
int i;

- queue_for_each_hw_ctx(q, hctx, i) {
- if (hctx->sched_tags)
- blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i);
+ if (blk_mq_is_sbitmap_shared(q->tag_set->flags)) {
+ blk_mq_free_rqs(q->tag_set, q->shared_sbitmap_tags,
+ BLK_MQ_NO_HCTX_IDX);
+ } else {
+ queue_for_each_hw_ctx(q, hctx, i) {
+ if (hctx->sched_tags)
+ blk_mq_free_rqs(q->tag_set,
+ hctx->sched_tags, i);
+ }
}
}

@@ -674,8 +676,6 @@ void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e)
blk_mq_debugfs_unregister_sched(q);
if (e->type->ops.exit_sched)
e->type->ops.exit_sched(e);
- blk_mq_sched_tags_teardown(q);
- if (blk_mq_is_sbitmap_shared(flags))
- blk_mq_exit_sched_shared_sbitmap(q);
+ blk_mq_sched_tags_teardown(q, flags);
q->elevator = NULL;
}
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index a0ecc6d88f84..0e10e8404bf0 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -26,11 +26,10 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
{
if (blk_mq_is_sbitmap_shared(hctx->flags)) {
struct request_queue *q = hctx->queue;
- struct blk_mq_tag_set *set = q->tag_set;

if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) &&
!test_and_set_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
- atomic_inc(&set->active_queues_shared_sbitmap);
+ atomic_inc(&hctx->tags->active_queues);
} else {
if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state) &&
!test_and_set_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
@@ -57,14 +56,14 @@ void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
{
struct blk_mq_tags *tags = hctx->tags;
- struct request_queue *q = hctx->queue;
- struct blk_mq_tag_set *set = q->tag_set;

if (blk_mq_is_sbitmap_shared(hctx->flags)) {
+ struct request_queue *q = hctx->queue;
+
if (!test_and_clear_bit(QUEUE_FLAG_HCTX_ACTIVE,
&q->queue_flags))
return;
- atomic_dec(&set->active_queues_shared_sbitmap);
+ atomic_dec(&tags->active_queues);
} else {
if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
return;
@@ -510,38 +509,10 @@ static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
return 0;
}

-int blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *set)
-{
- int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags);
- int i, ret;
-
- ret = blk_mq_init_bitmaps(&set->__bitmap_tags, &set->__breserved_tags,
- set->queue_depth, set->reserved_tags,
- set->numa_node, alloc_policy);
- if (ret)
- return ret;
-
- for (i = 0; i < set->nr_hw_queues; i++) {
- struct blk_mq_tags *tags = set->tags[i];
-
- tags->bitmap_tags = &set->__bitmap_tags;
- tags->breserved_tags = &set->__breserved_tags;
- }
-
- return 0;
-}
-
-void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *set)
-{
- sbitmap_queue_free(&set->__bitmap_tags);
- sbitmap_queue_free(&set->__breserved_tags);
-}
-
struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
unsigned int reserved_tags,
- int node, unsigned int flags)
+ int node, int alloc_policy)
{
- int alloc_policy = BLK_MQ_FLAG_TO_ALLOC_POLICY(flags);
struct blk_mq_tags *tags;

if (total_tags > BLK_MQ_TAG_MAX) {
@@ -557,9 +528,6 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
tags->nr_reserved_tags = reserved_tags;
spin_lock_init(&tags->lock);

- if (blk_mq_is_sbitmap_shared(flags))
- return tags;
-
if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
kfree(tags);
return NULL;
@@ -567,12 +535,10 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
return tags;
}

-void blk_mq_free_tags(struct blk_mq_tags *tags, unsigned int flags)
+void blk_mq_free_tags(struct blk_mq_tags *tags)
{
- if (!blk_mq_is_sbitmap_shared(flags)) {
- sbitmap_queue_free(tags->bitmap_tags);
- sbitmap_queue_free(tags->breserved_tags);
- }
+ sbitmap_queue_free(tags->bitmap_tags);
+ sbitmap_queue_free(tags->breserved_tags);
kfree(tags);
}

@@ -603,6 +569,13 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
if (tdepth > MAX_SCHED_RQ)
return -EINVAL;

+ /*
+ * Only the sbitmap needs resizing since we allocated the max
+ * initially.
+ */
+ if (blk_mq_is_sbitmap_shared(set->flags))
+ return 0;
+
new = blk_mq_alloc_map_and_rqs(set, hctx->queue_num, tdepth);
if (!new)
return -ENOMEM;
@@ -623,12 +596,14 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,

void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int size)
{
- sbitmap_queue_resize(&set->__bitmap_tags, size - set->reserved_tags);
+ struct blk_mq_tags *tags = set->shared_sbitmap_tags;
+
+ sbitmap_queue_resize(&tags->__bitmap_tags, size - set->reserved_tags);
}

void blk_mq_tag_update_sched_shared_sbitmap(struct request_queue *q)
{
- sbitmap_queue_resize(&q->sched_bitmap_tags,
+ sbitmap_queue_resize(q->shared_sbitmap_tags->bitmap_tags,
q->nr_requests - q->tag_set->reserved_tags);
}

diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index a9f5f1824819..e43f7589b96c 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -32,16 +32,14 @@ struct blk_mq_tags {

extern struct blk_mq_tags *blk_mq_init_tags(unsigned int nr_tags,
unsigned int reserved_tags,
- int node, unsigned int flags);
-extern void blk_mq_free_tags(struct blk_mq_tags *tags, unsigned int flags);
+ int node, int alloc_policy);
+extern void blk_mq_free_tags(struct blk_mq_tags *tags);
extern int blk_mq_init_bitmaps(struct sbitmap_queue *bitmap_tags,
struct sbitmap_queue *breserved_tags,
unsigned int queue_depth,
unsigned int reserved,
int node, int alloc_policy);

-extern int blk_mq_init_shared_sbitmap(struct blk_mq_tag_set *set);
-extern void blk_mq_exit_shared_sbitmap(struct blk_mq_tag_set *set);
extern unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data);
extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
unsigned int tag);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7bf56dc3af30..019d3f7b7bb7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2346,7 +2346,10 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
struct blk_mq_tags *drv_tags;
struct page *page;

- drv_tags = set->tags[hctx_idx];
+ if (blk_mq_is_sbitmap_shared(set->flags))
+ drv_tags = set->shared_sbitmap_tags;
+ else
+ drv_tags = set->tags[hctx_idx];

if (tags->static_rqs && set->ops->exit_request) {
int i;
@@ -2375,21 +2378,20 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
}
}

-void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags)
+void blk_mq_free_rq_map(struct blk_mq_tags *tags)
{
kfree(tags->rqs);
tags->rqs = NULL;
kfree(tags->static_rqs);
tags->static_rqs = NULL;

- blk_mq_free_tags(tags, flags);
+ blk_mq_free_tags(tags);
}

static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
unsigned int hctx_idx,
unsigned int nr_tags,
- unsigned int reserved_tags,
- unsigned int flags)
+ unsigned int reserved_tags)
{
struct blk_mq_tags *tags;
int node;
@@ -2398,7 +2400,8 @@ static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
if (node == NUMA_NO_NODE)
node = set->numa_node;

- tags = blk_mq_init_tags(nr_tags, reserved_tags, node, flags);
+ tags = blk_mq_init_tags(nr_tags, reserved_tags, node,
+ BLK_MQ_FLAG_TO_ALLOC_POLICY(set->flags));
if (!tags)
return NULL;

@@ -2406,7 +2409,7 @@ static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY,
node);
if (!tags->rqs) {
- blk_mq_free_tags(tags, flags);
+ blk_mq_free_tags(tags);
return NULL;
}

@@ -2415,7 +2418,7 @@ static struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set,
node);
if (!tags->static_rqs) {
kfree(tags->rqs);
- blk_mq_free_tags(tags, flags);
+ blk_mq_free_tags(tags);
return NULL;
}

@@ -2857,14 +2860,13 @@ struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
struct blk_mq_tags *tags;
int ret;

- tags = blk_mq_alloc_rq_map(set, hctx_idx, depth, set->reserved_tags,
- set->flags);
+ tags = blk_mq_alloc_rq_map(set, hctx_idx, depth, set->reserved_tags);
if (!tags)
return NULL;

ret = blk_mq_alloc_rqs(set, tags, hctx_idx, depth);
if (ret) {
- blk_mq_free_rq_map(tags, set->flags);
+ blk_mq_free_rq_map(tags);
return NULL;
}

@@ -2874,6 +2876,12 @@ struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
static bool __blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
int hctx_idx)
{
+ if (blk_mq_is_sbitmap_shared(set->flags)) {
+ set->tags[hctx_idx] = set->shared_sbitmap_tags;
+
+ return true;
+ }
+
set->tags[hctx_idx] = blk_mq_alloc_map_and_rqs(set, hctx_idx,
set->queue_depth);

@@ -2884,14 +2892,21 @@ void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
struct blk_mq_tags *tags,
unsigned int hctx_idx)
{
- unsigned int flags = set->flags;
-
if (tags) {
blk_mq_free_rqs(set, tags, hctx_idx);
- blk_mq_free_rq_map(tags, flags);
+ blk_mq_free_rq_map(tags);
}
}

+static void __blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
+ unsigned int hctx_idx)
+{
+ if (!blk_mq_is_sbitmap_shared(set->flags))
+ blk_mq_free_map_and_rqs(set, set->tags[hctx_idx], hctx_idx);
+
+ set->tags[hctx_idx] = NULL;
+}
+
static void blk_mq_map_swqueue(struct request_queue *q)
{
unsigned int i, j, hctx_idx;
@@ -2969,10 +2984,8 @@ static void blk_mq_map_swqueue(struct request_queue *q)
* fallback in case of a new remap fails
* allocation
*/
- if (i && set->tags[i]) {
- blk_mq_free_map_and_rqs(set, set->tags[i], i);
- set->tags[i] = NULL;
- }
+ if (i)
+ __blk_mq_free_map_and_rqs(set, i);

hctx->tags = NULL;
continue;
@@ -3268,8 +3281,7 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
struct blk_mq_hw_ctx *hctx = hctxs[j];

if (hctx) {
- blk_mq_free_map_and_rqs(set, set->tags[j], j);
- set->tags[j] = NULL;
+ __blk_mq_free_map_and_rqs(set, j);
blk_mq_exit_hctx(q, set, hctx, j);
hctxs[j] = NULL;
}
@@ -3356,6 +3368,14 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
{
int i;

+ if (blk_mq_is_sbitmap_shared(set->flags)) {
+ set->shared_sbitmap_tags = blk_mq_alloc_map_and_rqs(set,
+ BLK_MQ_NO_HCTX_IDX,
+ set->queue_depth);
+ if (!set->shared_sbitmap_tags)
+ return -ENOMEM;
+ }
+
for (i = 0; i < set->nr_hw_queues; i++) {
if (!__blk_mq_alloc_map_and_rqs(set, i))
goto out_unwind;
@@ -3365,9 +3385,12 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
return 0;

out_unwind:
- while (--i >= 0) {
- blk_mq_free_map_and_rqs(set, set->tags[i], i);
- set->tags[i] = NULL;
+ while (--i >= 0)
+ __blk_mq_free_map_and_rqs(set, i);
+
+ if (blk_mq_is_sbitmap_shared(set->flags)) {
+ blk_mq_free_map_and_rqs(set, set->shared_sbitmap_tags,
+ BLK_MQ_NO_HCTX_IDX);
}

return -ENOMEM;
@@ -3548,25 +3571,11 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
if (ret)
goto out_free_mq_map;

- if (blk_mq_is_sbitmap_shared(set->flags)) {
- atomic_set(&set->active_queues_shared_sbitmap, 0);
-
- if (blk_mq_init_shared_sbitmap(set)) {
- ret = -ENOMEM;
- goto out_free_mq_rq_maps;
- }
- }
-
mutex_init(&set->tag_list_lock);
INIT_LIST_HEAD(&set->tag_list);

return 0;

-out_free_mq_rq_maps:
- for (i = 0; i < set->nr_hw_queues; i++) {
- blk_mq_free_map_and_rqs(set, set->tags[i], i);
- set->tags[i] = NULL;
- }
out_free_mq_map:
for (i = 0; i < set->nr_maps; i++) {
kfree(set->map[i].mq_map);
@@ -3598,13 +3607,13 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
{
int i, j;

- for (i = 0; i < set->nr_hw_queues; i++) {
- blk_mq_free_map_and_rqs(set, set->tags[i], i);
- set->tags[i] = NULL;
- }
+ for (i = 0; i < set->nr_hw_queues; i++)
+ __blk_mq_free_map_and_rqs(set, i);

- if (blk_mq_is_sbitmap_shared(set->flags))
- blk_mq_exit_shared_sbitmap(set);
+ if (blk_mq_is_sbitmap_shared(set->flags)) {
+ blk_mq_free_map_and_rqs(set, set->shared_sbitmap_tags,
+ BLK_MQ_NO_HCTX_IDX);
+ }

for (j = 0; j < set->nr_maps; j++) {
kfree(set->map[j].mq_map);
@@ -3642,12 +3651,6 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
if (hctx->sched_tags) {
ret = blk_mq_tag_update_depth(hctx, &hctx->sched_tags,
nr, true);
- if (blk_mq_is_sbitmap_shared(set->flags)) {
- hctx->sched_tags->bitmap_tags =
- &q->sched_bitmap_tags;
- hctx->sched_tags->breserved_tags =
- &q->sched_breserved_tags;
- }
} else {
ret = blk_mq_tag_update_depth(hctx, &hctx->tags, nr,
false);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index bcb0ca89d37a..8824ae03215a 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -54,7 +54,7 @@ void blk_mq_put_rq_ref(struct request *rq);
*/
void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
unsigned int hctx_idx);
-void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags);
+void blk_mq_free_rq_map(struct blk_mq_tags *tags);
struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
unsigned int hctx_idx, unsigned int depth);
void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
@@ -330,17 +330,16 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,

if (blk_mq_is_sbitmap_shared(hctx->flags)) {
struct request_queue *q = hctx->queue;
- struct blk_mq_tag_set *set = q->tag_set;

if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
return true;
- users = atomic_read(&set->active_queues_shared_sbitmap);
} else {
if (!test_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
return true;
- users = atomic_read(&hctx->tags->active_queues);
}

+ users = atomic_read(&hctx->tags->active_queues);
+
if (!users)
return true;

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 31cc41dfef6b..faa20a19bfcc 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -440,13 +440,11 @@ enum hctx_type {
* @flags: Zero or more BLK_MQ_F_* flags.
* @driver_data: Pointer to data owned by the block driver that created this
* tag set.
- * @active_queues_shared_sbitmap:
- * number of active request queues per tag set.
- * @__bitmap_tags: A shared tags sbitmap, used over all hctx's
- * @__breserved_tags:
- * A shared reserved tags sbitmap, used over all hctx's
* @tags: Tag sets. One tag set per hardware queue. Has @nr_hw_queues
* elements.
+ * @shared_sbitmap_tags:
+ * Shared sbitmap set of tags. Has @nr_hw_queues elements. If
+ * set, shared by all @tags.
* @tag_list_lock: Serializes tag_list accesses.
* @tag_list: List of the request queues that use this tag set. See also
* request_queue.tag_set_list.
@@ -463,12 +461,11 @@ struct blk_mq_tag_set {
unsigned int timeout;
unsigned int flags;
void *driver_data;
- atomic_t active_queues_shared_sbitmap;

- struct sbitmap_queue __bitmap_tags;
- struct sbitmap_queue __breserved_tags;
struct blk_mq_tags **tags;

+ struct blk_mq_tags *shared_sbitmap_tags;
+
struct mutex tag_list_lock;
struct list_head tag_list;
};
@@ -640,6 +637,8 @@ enum {
((policy & ((1 << BLK_MQ_F_ALLOC_POLICY_BITS) - 1)) \
<< BLK_MQ_F_ALLOC_POLICY_START_BIT)

+#define BLK_MQ_NO_HCTX_IDX (-1U)
+
struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set, void *queuedata,
struct lock_class_key *lkclass);
#define blk_mq_alloc_disk(set, queuedata) \
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0e960d74615e..cf92c13eb80e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -238,8 +238,7 @@ struct request_queue {

atomic_t nr_active_requests_shared_sbitmap;

- struct sbitmap_queue sched_bitmap_tags;
- struct sbitmap_queue sched_breserved_tags;
+ struct blk_mq_tags *shared_sbitmap_tags;

struct list_head icq_list;
#ifdef CONFIG_BLK_CGROUP
--
2.26.2

2021-10-05 10:31:46

by John Garry

Subject: [PATCH v5 14/14] blk-mq: Change shared sbitmap naming to shared tags

Now that shared sbitmap support really means shared tags, rename symbols
to match that.

Signed-off-by: John Garry <[email protected]>
---
block/blk-core.c | 2 +-
block/blk-mq-sched.c | 32 ++++++++++++++++----------------
block/blk-mq-tag.c | 18 +++++++++---------
block/blk-mq-tag.h | 4 ++--
block/blk-mq.c | 32 ++++++++++++++++----------------
block/blk-mq.h | 16 ++++++++--------
block/elevator.c | 2 +-
include/linux/blk-mq.h | 8 ++++----
include/linux/blkdev.h | 4 ++--
9 files changed, 59 insertions(+), 59 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 7c869737ce4c..532c817525de 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -536,7 +536,7 @@ struct request_queue *blk_alloc_queue(int node_id)

q->node = node_id;

- atomic_set(&q->nr_active_requests_shared_sbitmap, 0);
+ atomic_set(&q->nr_active_requests_shared_tags, 0);

timer_setup(&q->timeout, blk_rq_timed_out_timer, 0);
INIT_WORK(&q->timeout_work, blk_timeout_work);
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 428da4949d80..27312da7d638 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -519,8 +519,8 @@ static int blk_mq_sched_alloc_map_and_rqs(struct request_queue *q,
struct blk_mq_hw_ctx *hctx,
unsigned int hctx_idx)
{
- if (blk_mq_is_sbitmap_shared(q->tag_set->flags)) {
- hctx->sched_tags = q->shared_sbitmap_tags;
+ if (blk_mq_is_shared_tags(q->tag_set->flags)) {
+ hctx->sched_tags = q->sched_shared_tags;
return 0;
}

@@ -532,10 +532,10 @@ static int blk_mq_sched_alloc_map_and_rqs(struct request_queue *q,
return 0;
}

-static void blk_mq_exit_sched_shared_sbitmap(struct request_queue *queue)
+static void blk_mq_exit_sched_shared_tags(struct request_queue *queue)
{
- blk_mq_free_rq_map(queue->shared_sbitmap_tags);
- queue->shared_sbitmap_tags = NULL;
+ blk_mq_free_rq_map(queue->sched_shared_tags);
+ queue->sched_shared_tags = NULL;
}

/* called in queue's release handler, tagset has gone away */
@@ -546,17 +546,17 @@ static void blk_mq_sched_tags_teardown(struct request_queue *q, unsigned int fla

queue_for_each_hw_ctx(q, hctx, i) {
if (hctx->sched_tags) {
- if (!blk_mq_is_sbitmap_shared(q->tag_set->flags))
+ if (!blk_mq_is_shared_tags(q->tag_set->flags))
blk_mq_free_rq_map(hctx->sched_tags);
hctx->sched_tags = NULL;
}
}

- if (blk_mq_is_sbitmap_shared(flags))
- blk_mq_exit_sched_shared_sbitmap(q);
+ if (blk_mq_is_shared_tags(flags))
+ blk_mq_exit_sched_shared_tags(q);
}

-static int blk_mq_init_sched_shared_sbitmap(struct request_queue *queue)
+static int blk_mq_init_sched_shared_tags(struct request_queue *queue)
{
struct blk_mq_tag_set *set = queue->tag_set;

@@ -564,13 +564,13 @@ static int blk_mq_init_sched_shared_sbitmap(struct request_queue *queue)
* Set initial depth at max so that we don't need to reallocate for
* updating nr_requests.
*/
- queue->shared_sbitmap_tags = blk_mq_alloc_map_and_rqs(set,
+ queue->sched_shared_tags = blk_mq_alloc_map_and_rqs(set,
BLK_MQ_NO_HCTX_IDX,
MAX_SCHED_RQ);
- if (!queue->shared_sbitmap_tags)
+ if (!queue->sched_shared_tags)
return -ENOMEM;

- blk_mq_tag_update_sched_shared_sbitmap(queue);
+ blk_mq_tag_update_sched_shared_tags(queue);

return 0;
}
@@ -596,8 +596,8 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
q->nr_requests = 2 * min_t(unsigned int, q->tag_set->queue_depth,
BLKDEV_DEFAULT_RQ);

- if (blk_mq_is_sbitmap_shared(flags)) {
- ret = blk_mq_init_sched_shared_sbitmap(q);
+ if (blk_mq_is_shared_tags(flags)) {
+ ret = blk_mq_init_sched_shared_tags(q);
if (ret)
return ret;
}
@@ -647,8 +647,8 @@ void blk_mq_sched_free_rqs(struct request_queue *q)
struct blk_mq_hw_ctx *hctx;
int i;

- if (blk_mq_is_sbitmap_shared(q->tag_set->flags)) {
- blk_mq_free_rqs(q->tag_set, q->shared_sbitmap_tags,
+ if (blk_mq_is_shared_tags(q->tag_set->flags)) {
+ blk_mq_free_rqs(q->tag_set, q->sched_shared_tags,
BLK_MQ_NO_HCTX_IDX);
} else {
queue_for_each_hw_ctx(q, hctx, i) {
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 211068a5f676..72a2724a4eee 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -24,7 +24,7 @@
*/
bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
{
- if (blk_mq_is_sbitmap_shared(hctx->flags)) {
+ if (blk_mq_is_shared_tags(hctx->flags)) {
struct request_queue *q = hctx->queue;

if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags) &&
@@ -57,19 +57,19 @@ void __blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
{
struct blk_mq_tags *tags = hctx->tags;

- if (blk_mq_is_sbitmap_shared(hctx->flags)) {
+ if (blk_mq_is_shared_tags(hctx->flags)) {
struct request_queue *q = hctx->queue;

if (!test_and_clear_bit(QUEUE_FLAG_HCTX_ACTIVE,
&q->queue_flags))
return;
- atomic_dec(&tags->active_queues);
} else {
if (!test_and_clear_bit(BLK_MQ_S_TAG_ACTIVE, &hctx->state))
return;
- atomic_dec(&tags->active_queues);
}

+ atomic_dec(&tags->active_queues);
+
blk_mq_tag_wakeup_all(tags, false);
}

@@ -557,7 +557,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
* Only the sbitmap needs resizing since we allocated the max
* initially.
*/
- if (blk_mq_is_sbitmap_shared(set->flags))
+ if (blk_mq_is_shared_tags(set->flags))
return 0;

new = blk_mq_alloc_map_and_rqs(set, hctx->queue_num, tdepth);
@@ -578,16 +578,16 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
return 0;
}

-void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int size)
+void blk_mq_tag_resize_shared_tags(struct blk_mq_tag_set *set, unsigned int size)
{
- struct blk_mq_tags *tags = set->shared_sbitmap_tags;
+ struct blk_mq_tags *tags = set->shared_tags;

sbitmap_queue_resize(&tags->bitmap_tags, size - set->reserved_tags);
}

-void blk_mq_tag_update_sched_shared_sbitmap(struct request_queue *q)
+void blk_mq_tag_update_sched_shared_tags(struct request_queue *q)
{
- sbitmap_queue_resize(&q->shared_sbitmap_tags->bitmap_tags,
+ sbitmap_queue_resize(&q->sched_shared_tags->bitmap_tags,
q->nr_requests - q->tag_set->reserved_tags);
}

diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 1052d69147ba..d8ce89fa1686 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -43,9 +43,9 @@ extern void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
struct blk_mq_tags **tags,
unsigned int depth, bool can_grow);
-extern void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set,
+extern void blk_mq_tag_resize_shared_tags(struct blk_mq_tag_set *set,
unsigned int size);
-extern void blk_mq_tag_update_sched_shared_sbitmap(struct request_queue *q);
+extern void blk_mq_tag_update_sched_shared_tags(struct request_queue *q);

extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 11644c67b064..a8c437afc2c3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2228,7 +2228,7 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
blk_insert_flush(rq);
blk_mq_run_hw_queue(data.hctx, true);
} else if (plug && (q->nr_hw_queues == 1 ||
- blk_mq_is_sbitmap_shared(rq->mq_hctx->flags) ||
+ blk_mq_is_shared_tags(rq->mq_hctx->flags) ||
q->mq_ops->commit_rqs || !blk_queue_nonrot(q))) {
/*
* Use plugging if we have a ->commit_rqs() hook as well, as
@@ -2346,8 +2346,8 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
struct blk_mq_tags *drv_tags;
struct page *page;

- if (blk_mq_is_sbitmap_shared(set->flags))
- drv_tags = set->shared_sbitmap_tags;
+ if (blk_mq_is_shared_tags(set->flags))
+ drv_tags = set->shared_tags;
else
drv_tags = set->tags[hctx_idx];

@@ -2876,8 +2876,8 @@ struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
static bool __blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
int hctx_idx)
{
- if (blk_mq_is_sbitmap_shared(set->flags)) {
- set->tags[hctx_idx] = set->shared_sbitmap_tags;
+ if (blk_mq_is_shared_tags(set->flags)) {
+ set->tags[hctx_idx] = set->shared_tags;

return true;
}
@@ -2901,7 +2901,7 @@ void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
static void __blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
unsigned int hctx_idx)
{
- if (!blk_mq_is_sbitmap_shared(set->flags))
+ if (!blk_mq_is_shared_tags(set->flags))
blk_mq_free_map_and_rqs(set, set->tags[hctx_idx], hctx_idx);

set->tags[hctx_idx] = NULL;
@@ -3368,11 +3368,11 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
{
int i;

- if (blk_mq_is_sbitmap_shared(set->flags)) {
- set->shared_sbitmap_tags = blk_mq_alloc_map_and_rqs(set,
+ if (blk_mq_is_shared_tags(set->flags)) {
+ set->shared_tags = blk_mq_alloc_map_and_rqs(set,
BLK_MQ_NO_HCTX_IDX,
set->queue_depth);
- if (!set->shared_sbitmap_tags)
+ if (!set->shared_tags)
return -ENOMEM;
}

@@ -3388,8 +3388,8 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
while (--i >= 0)
__blk_mq_free_map_and_rqs(set, i);

- if (blk_mq_is_sbitmap_shared(set->flags)) {
- blk_mq_free_map_and_rqs(set, set->shared_sbitmap_tags,
+ if (blk_mq_is_shared_tags(set->flags)) {
+ blk_mq_free_map_and_rqs(set, set->shared_tags,
BLK_MQ_NO_HCTX_IDX);
}

@@ -3610,8 +3610,8 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
for (i = 0; i < set->nr_hw_queues; i++)
__blk_mq_free_map_and_rqs(set, i);

- if (blk_mq_is_sbitmap_shared(set->flags)) {
- blk_mq_free_map_and_rqs(set, set->shared_sbitmap_tags,
+ if (blk_mq_is_shared_tags(set->flags)) {
+ blk_mq_free_map_and_rqs(set, set->shared_tags,
BLK_MQ_NO_HCTX_IDX);
}

@@ -3662,11 +3662,11 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
}
if (!ret) {
q->nr_requests = nr;
- if (blk_mq_is_sbitmap_shared(set->flags)) {
+ if (blk_mq_is_shared_tags(set->flags)) {
if (q->elevator)
- blk_mq_tag_update_sched_shared_sbitmap(q);
+ blk_mq_tag_update_sched_shared_tags(q);
else
- blk_mq_tag_resize_shared_sbitmap(set, nr);
+ blk_mq_tag_resize_shared_tags(set, nr);
}
}

diff --git a/block/blk-mq.h b/block/blk-mq.h
index 8824ae03215a..171e8cdcff54 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -157,7 +157,7 @@ struct blk_mq_alloc_data {
struct blk_mq_hw_ctx *hctx;
};

-static inline bool blk_mq_is_sbitmap_shared(unsigned int flags)
+static inline bool blk_mq_is_shared_tags(unsigned int flags)
{
return flags & BLK_MQ_F_TAG_HCTX_SHARED;
}
@@ -217,24 +217,24 @@ static inline int blk_mq_get_rq_budget_token(struct request *rq)

static inline void __blk_mq_inc_active_requests(struct blk_mq_hw_ctx *hctx)
{
- if (blk_mq_is_sbitmap_shared(hctx->flags))
- atomic_inc(&hctx->queue->nr_active_requests_shared_sbitmap);
+ if (blk_mq_is_shared_tags(hctx->flags))
+ atomic_inc(&hctx->queue->nr_active_requests_shared_tags);
else
atomic_inc(&hctx->nr_active);
}

static inline void __blk_mq_dec_active_requests(struct blk_mq_hw_ctx *hctx)
{
- if (blk_mq_is_sbitmap_shared(hctx->flags))
- atomic_dec(&hctx->queue->nr_active_requests_shared_sbitmap);
+ if (blk_mq_is_shared_tags(hctx->flags))
+ atomic_dec(&hctx->queue->nr_active_requests_shared_tags);
else
atomic_dec(&hctx->nr_active);
}

static inline int __blk_mq_active_requests(struct blk_mq_hw_ctx *hctx)
{
- if (blk_mq_is_sbitmap_shared(hctx->flags))
- return atomic_read(&hctx->queue->nr_active_requests_shared_sbitmap);
+ if (blk_mq_is_shared_tags(hctx->flags))
+ return atomic_read(&hctx->queue->nr_active_requests_shared_tags);
return atomic_read(&hctx->nr_active);
}
static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
@@ -328,7 +328,7 @@ static inline bool hctx_may_queue(struct blk_mq_hw_ctx *hctx,
if (bt->sb.depth == 1)
return true;

- if (blk_mq_is_sbitmap_shared(hctx->flags)) {
+ if (blk_mq_is_shared_tags(hctx->flags)) {
struct request_queue *q = hctx->queue;

if (!test_bit(QUEUE_FLAG_HCTX_ACTIVE, &q->queue_flags))
diff --git a/block/elevator.c b/block/elevator.c
index 57be09cd7f6d..1f39f6e8ebb9 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -637,7 +637,7 @@ static struct elevator_type *elevator_get_default(struct request_queue *q)
return NULL;

if (q->nr_hw_queues != 1 &&
- !blk_mq_is_sbitmap_shared(q->tag_set->flags))
+ !blk_mq_is_shared_tags(q->tag_set->flags))
return NULL;

return elevator_get(q, "mq-deadline", false);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index faa20a19bfcc..75d75657df21 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -442,9 +442,9 @@ enum hctx_type {
* tag set.
* @tags: Tag sets. One tag set per hardware queue. Has @nr_hw_queues
* elements.
- * @shared_sbitmap_tags:
- * Shared sbitmap set of tags. Has @nr_hw_queues elements. If
- * set, shared by all @tags.
+ * @shared_tags:
+ * Shared set of tags. Has @nr_hw_queues elements. If set,
+ * shared by all @tags.
* @tag_list_lock: Serializes tag_list accesses.
* @tag_list: List of the request queues that use this tag set. See also
* request_queue.tag_set_list.
@@ -464,7 +464,7 @@ struct blk_mq_tag_set {

struct blk_mq_tags **tags;

- struct blk_mq_tags *shared_sbitmap_tags;
+ struct blk_mq_tags *shared_tags;

struct mutex tag_list_lock;
struct list_head tag_list;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index cf92c13eb80e..b19172db7eef 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -236,9 +236,9 @@ struct request_queue {
struct timer_list timeout;
struct work_struct timeout_work;

- atomic_t nr_active_requests_shared_sbitmap;
+ atomic_t nr_active_requests_shared_tags;

- struct blk_mq_tags *shared_sbitmap_tags;
+ struct blk_mq_tags *sched_shared_tags;

struct list_head icq_list;
#ifdef CONFIG_BLK_CGROUP
--
2.26.2

2021-10-05 10:31:48

by John Garry

Subject: [PATCH v5 08/14] blk-mq: Don't clear driver tags own mapping

Function blk_mq_clear_rq_mapping() is required to clear the sched tags
mappings in the driver tags rqs[].

But there is no need for driver tags to clear their own mapping, so skip
clearing the mapping in this scenario.

Signed-off-by: John Garry <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
---
block/blk-mq.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1bee153e6b7f..158ee7dbbd76 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2310,6 +2310,10 @@ static void blk_mq_clear_rq_mapping(struct blk_mq_tags *drv_tags,
struct page *page;
unsigned long flags;

+ /* There is no need to clear a driver tags own mapping */
+ if (drv_tags == tags)
+ return;
+
list_for_each_entry(page, &tags->page_list, lru) {
unsigned long start = (unsigned long)page_address(page);
unsigned long end = start + order_to_size(page->private);
--
2.26.2

2021-10-05 10:31:51

by John Garry

Subject: [PATCH v5 06/14] blk-mq-sched: Rename blk_mq_sched_free_{requests -> rqs}()

To be more concise and consistent in naming, rename
blk_mq_sched_free_requests() -> blk_mq_sched_free_rqs().

Signed-off-by: John Garry <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
---
block/blk-core.c | 2 +-
block/blk-mq-sched.c | 6 +++---
block/blk-mq-sched.h | 2 +-
block/blk.h | 2 +-
4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index cf0899a93367..7c869737ce4c 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -407,7 +407,7 @@ void blk_cleanup_queue(struct request_queue *q)
*/
mutex_lock(&q->sysfs_lock);
if (q->elevator)
- blk_mq_sched_free_requests(q);
+ blk_mq_sched_free_rqs(q);
mutex_unlock(&q->sysfs_lock);

percpu_ref_exit(&q->q_usage_counter);
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 644b6d554d72..bdbb6c31b433 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -631,7 +631,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
ret = e->ops.init_hctx(hctx, i);
if (ret) {
eq = q->elevator;
- blk_mq_sched_free_requests(q);
+ blk_mq_sched_free_rqs(q);
blk_mq_exit_sched(q, eq);
kobject_put(&eq->kobj);
return ret;
@@ -646,7 +646,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
if (blk_mq_is_sbitmap_shared(q->tag_set->flags))
blk_mq_exit_sched_shared_sbitmap(q);
err_free_map_and_rqs:
- blk_mq_sched_free_requests(q);
+ blk_mq_sched_free_rqs(q);
blk_mq_sched_tags_teardown(q);
q->elevator = NULL;
return ret;
@@ -656,7 +656,7 @@ int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e)
* called in either blk_queue_cleanup or elevator_switch, tagset
* is required for freeing requests
*/
-void blk_mq_sched_free_requests(struct request_queue *q)
+void blk_mq_sched_free_rqs(struct request_queue *q)
{
struct blk_mq_hw_ctx *hctx;
int i;
diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h
index f6a1ce9e31f3..e4d367e0b2a3 100644
--- a/block/blk-mq-sched.h
+++ b/block/blk-mq-sched.h
@@ -29,7 +29,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx);

int blk_mq_init_sched(struct request_queue *q, struct elevator_type *e);
void blk_mq_exit_sched(struct request_queue *q, struct elevator_queue *e);
-void blk_mq_sched_free_requests(struct request_queue *q);
+void blk_mq_sched_free_rqs(struct request_queue *q);

static inline bool
blk_mq_sched_bio_merge(struct request_queue *q, struct bio *bio,
diff --git a/block/blk.h b/block/blk.h
index deb8393e34ee..762af963b302 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -240,7 +240,7 @@ static inline void elevator_exit(struct request_queue *q,
{
lockdep_assert_held(&q->sysfs_lock);

- blk_mq_sched_free_requests(q);
+ blk_mq_sched_free_rqs(q);
__elevator_exit(q, e);
}

--
2.26.2

2021-10-05 10:31:57

by John Garry

Subject: [PATCH v5 13/14] blk-mq: Stop using pointers for blk_mq_tags bitmap tags

Now that we use shared tags for shared sbitmap support, we don't require
the tags sbitmap pointers, so drop them.

This essentially reverts commit 222a5ae03cdd ("blk-mq: Use pointers for
blk_mq_tags bitmap tags").

Function blk_mq_init_bitmap_tags() is also removed, since it would only be
a wrapper for blk_mq_init_bitmaps().
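
The shape of the change, reduced to the two affected fields (an illustration
only, not the full struct from blk-mq-tag.h):

  /* Before this patch (commit 222a5ae03cdd): pointers, so several
   * blk_mq_tags instances could alias one shared sbitmap pair. */
  struct blk_mq_tags_before {
          struct sbitmap_queue *bitmap_tags;
          struct sbitmap_queue *breserved_tags;
  };

  /* After this patch: embedded again, since sharing now happens one level
   * up, at the shared blk_mq_tags itself. */
  struct blk_mq_tags_after {
          struct sbitmap_queue bitmap_tags;
          struct sbitmap_queue breserved_tags;
  };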

Reviewed-by: Ming Lei <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: John Garry <[email protected]>
---
block/bfq-iosched.c | 4 +--
block/blk-mq-debugfs.c | 8 +++---
block/blk-mq-tag.c | 56 +++++++++++++++---------------------------
block/blk-mq-tag.h | 7 ++----
block/blk-mq.c | 8 +++---
block/kyber-iosched.c | 4 +--
block/mq-deadline.c | 2 +-
7 files changed, 35 insertions(+), 54 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index b5ef23f63d51..fec18118dc30 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6884,8 +6884,8 @@ static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
struct blk_mq_tags *tags = hctx->sched_tags;
unsigned int min_shallow;

- min_shallow = bfq_update_depths(bfqd, tags->bitmap_tags);
- sbitmap_queue_min_shallow_depth(tags->bitmap_tags, min_shallow);
+ min_shallow = bfq_update_depths(bfqd, &tags->bitmap_tags);
+ sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, min_shallow);
}

static int bfq_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int index)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 4b66d2776eda..4000376330c9 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -452,11 +452,11 @@ static void blk_mq_debugfs_tags_show(struct seq_file *m,
atomic_read(&tags->active_queues));

seq_puts(m, "\nbitmap_tags:\n");
- sbitmap_queue_show(tags->bitmap_tags, m);
+ sbitmap_queue_show(&tags->bitmap_tags, m);

if (tags->nr_reserved_tags) {
seq_puts(m, "\nbreserved_tags:\n");
- sbitmap_queue_show(tags->breserved_tags, m);
+ sbitmap_queue_show(&tags->breserved_tags, m);
}
}

@@ -487,7 +487,7 @@ static int hctx_tags_bitmap_show(void *data, struct seq_file *m)
if (res)
goto out;
if (hctx->tags)
- sbitmap_bitmap_show(&hctx->tags->bitmap_tags->sb, m);
+ sbitmap_bitmap_show(&hctx->tags->bitmap_tags.sb, m);
mutex_unlock(&q->sysfs_lock);

out:
@@ -521,7 +521,7 @@ static int hctx_sched_tags_bitmap_show(void *data, struct seq_file *m)
if (res)
goto out;
if (hctx->sched_tags)
- sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags->sb, m);
+ sbitmap_bitmap_show(&hctx->sched_tags->bitmap_tags.sb, m);
mutex_unlock(&q->sysfs_lock);

out:
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 0e10e8404bf0..211068a5f676 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -44,9 +44,9 @@ bool __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx)
*/
void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool include_reserve)
{
- sbitmap_queue_wake_all(tags->bitmap_tags);
+ sbitmap_queue_wake_all(&tags->bitmap_tags);
if (include_reserve)
- sbitmap_queue_wake_all(tags->breserved_tags);
+ sbitmap_queue_wake_all(&tags->breserved_tags);
}

/*
@@ -100,10 +100,10 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
WARN_ON_ONCE(1);
return BLK_MQ_NO_TAG;
}
- bt = tags->breserved_tags;
+ bt = &tags->breserved_tags;
tag_offset = 0;
} else {
- bt = tags->bitmap_tags;
+ bt = &tags->bitmap_tags;
tag_offset = tags->nr_reserved_tags;
}

@@ -149,9 +149,9 @@ unsigned int blk_mq_get_tag(struct blk_mq_alloc_data *data)
data->ctx);
tags = blk_mq_tags_from_data(data);
if (data->flags & BLK_MQ_REQ_RESERVED)
- bt = tags->breserved_tags;
+ bt = &tags->breserved_tags;
else
- bt = tags->bitmap_tags;
+ bt = &tags->bitmap_tags;

/*
* If destination hw queue is changed, fake wake up on
@@ -185,10 +185,10 @@ void blk_mq_put_tag(struct blk_mq_tags *tags, struct blk_mq_ctx *ctx,
const int real_tag = tag - tags->nr_reserved_tags;

BUG_ON(real_tag >= tags->nr_tags);
- sbitmap_queue_clear(tags->bitmap_tags, real_tag, ctx->cpu);
+ sbitmap_queue_clear(&tags->bitmap_tags, real_tag, ctx->cpu);
} else {
BUG_ON(tag >= tags->nr_reserved_tags);
- sbitmap_queue_clear(tags->breserved_tags, tag, ctx->cpu);
+ sbitmap_queue_clear(&tags->breserved_tags, tag, ctx->cpu);
}
}

@@ -339,9 +339,9 @@ static void __blk_mq_all_tag_iter(struct blk_mq_tags *tags,
WARN_ON_ONCE(flags & BT_TAG_ITER_RESERVED);

if (tags->nr_reserved_tags)
- bt_tags_for_each(tags, tags->breserved_tags, fn, priv,
+ bt_tags_for_each(tags, &tags->breserved_tags, fn, priv,
flags | BT_TAG_ITER_RESERVED);
- bt_tags_for_each(tags, tags->bitmap_tags, fn, priv, flags);
+ bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, flags);
}

/**
@@ -458,8 +458,8 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
continue;

if (tags->nr_reserved_tags)
- bt_for_each(hctx, tags->breserved_tags, fn, priv, true);
- bt_for_each(hctx, tags->bitmap_tags, fn, priv, false);
+ bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
+ bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
}
blk_queue_exit(q);
}
@@ -491,24 +491,6 @@ int blk_mq_init_bitmaps(struct sbitmap_queue *bitmap_tags,
return -ENOMEM;
}

-static int blk_mq_init_bitmap_tags(struct blk_mq_tags *tags,
- int node, int alloc_policy)
-{
- int ret;
-
- ret = blk_mq_init_bitmaps(&tags->__bitmap_tags,
- &tags->__breserved_tags,
- tags->nr_tags, tags->nr_reserved_tags,
- node, alloc_policy);
- if (ret)
- return ret;
-
- tags->bitmap_tags = &tags->__bitmap_tags;
- tags->breserved_tags = &tags->__breserved_tags;
-
- return 0;
-}
-
struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
unsigned int reserved_tags,
int node, int alloc_policy)
@@ -528,7 +510,9 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,
tags->nr_reserved_tags = reserved_tags;
spin_lock_init(&tags->lock);

- if (blk_mq_init_bitmap_tags(tags, node, alloc_policy) < 0) {
+ if (blk_mq_init_bitmaps(&tags->bitmap_tags, &tags->breserved_tags,
+ total_tags, reserved_tags, node,
+ alloc_policy) < 0) {
kfree(tags);
return NULL;
}
@@ -537,8 +521,8 @@ struct blk_mq_tags *blk_mq_init_tags(unsigned int total_tags,

void blk_mq_free_tags(struct blk_mq_tags *tags)
{
- sbitmap_queue_free(tags->bitmap_tags);
- sbitmap_queue_free(tags->breserved_tags);
+ sbitmap_queue_free(&tags->bitmap_tags);
+ sbitmap_queue_free(&tags->breserved_tags);
kfree(tags);
}

@@ -587,7 +571,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
* Don't need (or can't) update reserved tags here, they
* remain static and should never need resizing.
*/
- sbitmap_queue_resize(tags->bitmap_tags,
+ sbitmap_queue_resize(&tags->bitmap_tags,
tdepth - tags->nr_reserved_tags);
}

@@ -598,12 +582,12 @@ void blk_mq_tag_resize_shared_sbitmap(struct blk_mq_tag_set *set, unsigned int s
{
struct blk_mq_tags *tags = set->shared_sbitmap_tags;

- sbitmap_queue_resize(&tags->__bitmap_tags, size - set->reserved_tags);
+ sbitmap_queue_resize(&tags->bitmap_tags, size - set->reserved_tags);
}

void blk_mq_tag_update_sched_shared_sbitmap(struct request_queue *q)
{
- sbitmap_queue_resize(q->shared_sbitmap_tags->bitmap_tags,
+ sbitmap_queue_resize(&q->shared_sbitmap_tags->bitmap_tags,
q->nr_requests - q->tag_set->reserved_tags);
}

diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index e43f7589b96c..1052d69147ba 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -13,11 +13,8 @@ struct blk_mq_tags {

atomic_t active_queues;

- struct sbitmap_queue *bitmap_tags;
- struct sbitmap_queue *breserved_tags;
-
- struct sbitmap_queue __bitmap_tags;
- struct sbitmap_queue __breserved_tags;
+ struct sbitmap_queue bitmap_tags;
+ struct sbitmap_queue breserved_tags;

struct request **rqs;
struct request **static_rqs;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 019d3f7b7bb7..11644c67b064 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1064,14 +1064,14 @@ static inline unsigned int queued_to_index(unsigned int queued)

static bool __blk_mq_get_driver_tag(struct request *rq)
{
- struct sbitmap_queue *bt = rq->mq_hctx->tags->bitmap_tags;
+ struct sbitmap_queue *bt = &rq->mq_hctx->tags->bitmap_tags;
unsigned int tag_offset = rq->mq_hctx->tags->nr_reserved_tags;
int tag;

blk_mq_tag_busy(rq->mq_hctx);

if (blk_mq_tag_is_reserved(rq->mq_hctx->sched_tags, rq->internal_tag)) {
- bt = rq->mq_hctx->tags->breserved_tags;
+ bt = &rq->mq_hctx->tags->breserved_tags;
tag_offset = 0;
} else {
if (!hctx_may_queue(rq->mq_hctx, bt))
@@ -1114,7 +1114,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
struct sbitmap_queue *sbq;

list_del_init(&wait->entry);
- sbq = hctx->tags->bitmap_tags;
+ sbq = &hctx->tags->bitmap_tags;
atomic_dec(&sbq->ws_active);
}
spin_unlock(&hctx->dispatch_wait_lock);
@@ -1132,7 +1132,7 @@ static int blk_mq_dispatch_wake(wait_queue_entry_t *wait, unsigned mode,
static bool blk_mq_mark_tag_wait(struct blk_mq_hw_ctx *hctx,
struct request *rq)
{
- struct sbitmap_queue *sbq = hctx->tags->bitmap_tags;
+ struct sbitmap_queue *sbq = &hctx->tags->bitmap_tags;
struct wait_queue_head *wq;
wait_queue_entry_t *wait;
bool ret;
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 5808f0ea0b79..76578fdccbf2 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -451,11 +451,11 @@ static void kyber_depth_updated(struct blk_mq_hw_ctx *hctx)
{
struct kyber_queue_data *kqd = hctx->queue->elevator->elevator_data;
struct blk_mq_tags *tags = hctx->sched_tags;
- unsigned int shift = tags->bitmap_tags->sb.shift;
+ unsigned int shift = tags->bitmap_tags.sb.shift;

kqd->async_depth = (1U << shift) * KYBER_ASYNC_PERCENT / 100U;

- sbitmap_queue_min_shallow_depth(tags->bitmap_tags, kqd->async_depth);
+ sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, kqd->async_depth);
}

static int kyber_init_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_idx)
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 2001ac186f5c..85d919bf60c7 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -567,7 +567,7 @@ static void dd_depth_updated(struct blk_mq_hw_ctx *hctx)

dd->async_depth = max(1UL, 3 * q->nr_requests / 4);

- sbitmap_queue_min_shallow_depth(tags->bitmap_tags, dd->async_depth);
+ sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, dd->async_depth);
}

/* Called by blk_mq_init_hctx() and blk_mq_init_sched(). */
--
2.26.2

2021-10-05 10:32:37

by John Garry

[permalink] [raw]
Subject: [PATCH v5 11/14] blk-mq: Refactor and rename blk_mq_free_map_and_{requests->rqs}()

Refactor blk_mq_free_map_and_requests() such that it can be used at many
sites at which the tag map and rqs are freed.

Also rename to blk_mq_free_map_and_rqs(), which is shorter and matches the
alloc equivalent.

Suggested-by: Ming Lei <[email protected]>
Signed-off-by: John Garry <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
---
block/blk-mq-tag.c | 3 +--
block/blk-mq.c | 40 ++++++++++++++++++++++++----------------
block/blk-mq.h | 4 +++-
3 files changed, 28 insertions(+), 19 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index db99f1246795..a0ecc6d88f84 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -607,8 +607,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
if (!new)
return -ENOMEM;

- blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
- blk_mq_free_rq_map(*tagsptr, set->flags);
+ blk_mq_free_map_and_rqs(set, *tagsptr, hctx->queue_num);
*tagsptr = new;
} else {
/*
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6d3664acb9f8..7bf56dc3af30 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2880,15 +2880,15 @@ static bool __blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
return set->tags[hctx_idx];
}

-static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
- unsigned int hctx_idx)
+void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
+ struct blk_mq_tags *tags,
+ unsigned int hctx_idx)
{
unsigned int flags = set->flags;

- if (set->tags && set->tags[hctx_idx]) {
- blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx);
- blk_mq_free_rq_map(set->tags[hctx_idx], flags);
- set->tags[hctx_idx] = NULL;
+ if (tags) {
+ blk_mq_free_rqs(set, tags, hctx_idx);
+ blk_mq_free_rq_map(tags, flags);
}
}

@@ -2969,8 +2969,10 @@ static void blk_mq_map_swqueue(struct request_queue *q)
* fallback in case of a new remap fails
* allocation
*/
- if (i && set->tags[i])
- blk_mq_free_map_and_requests(set, i);
+ if (i && set->tags[i]) {
+ blk_mq_free_map_and_rqs(set, set->tags[i], i);
+ set->tags[i] = NULL;
+ }

hctx->tags = NULL;
continue;
@@ -3266,8 +3268,8 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
struct blk_mq_hw_ctx *hctx = hctxs[j];

if (hctx) {
- if (hctx->tags)
- blk_mq_free_map_and_requests(set, j);
+ blk_mq_free_map_and_rqs(set, set->tags[j], j);
+ set->tags[j] = NULL;
blk_mq_exit_hctx(q, set, hctx, j);
hctxs[j] = NULL;
}
@@ -3363,8 +3365,10 @@ static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
return 0;

out_unwind:
- while (--i >= 0)
- blk_mq_free_map_and_requests(set, i);
+ while (--i >= 0) {
+ blk_mq_free_map_and_rqs(set, set->tags[i], i);
+ set->tags[i] = NULL;
+ }

return -ENOMEM;
}
@@ -3559,8 +3563,10 @@ int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
return 0;

out_free_mq_rq_maps:
- for (i = 0; i < set->nr_hw_queues; i++)
- blk_mq_free_map_and_requests(set, i);
+ for (i = 0; i < set->nr_hw_queues; i++) {
+ blk_mq_free_map_and_rqs(set, set->tags[i], i);
+ set->tags[i] = NULL;
+ }
out_free_mq_map:
for (i = 0; i < set->nr_maps; i++) {
kfree(set->map[i].mq_map);
@@ -3592,8 +3598,10 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
{
int i, j;

- for (i = 0; i < set->nr_hw_queues; i++)
- blk_mq_free_map_and_requests(set, i);
+ for (i = 0; i < set->nr_hw_queues; i++) {
+ blk_mq_free_map_and_rqs(set, set->tags[i], i);
+ set->tags[i] = NULL;
+ }

if (blk_mq_is_sbitmap_shared(set->flags))
blk_mq_exit_shared_sbitmap(set);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 83585a344568..bcb0ca89d37a 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -57,7 +57,9 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
void blk_mq_free_rq_map(struct blk_mq_tags *tags, unsigned int flags);
struct blk_mq_tags *blk_mq_alloc_map_and_rqs(struct blk_mq_tag_set *set,
unsigned int hctx_idx, unsigned int depth);
-
+void blk_mq_free_map_and_rqs(struct blk_mq_tag_set *set,
+ struct blk_mq_tags *tags,
+ unsigned int hctx_idx);
/*
* Internal helpers for request insertion into sw queues
*/
--
2.26.2

2021-10-05 12:40:54

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v5 00/14] blk-mq: Reduce static requests memory footprint for shared sbitmap

On 10/5/21 4:23 AM, John Garry wrote:
> Currently a full set of static requests are allocated per hw queue per
> tagset when shared sbitmap is used.
>
> However, only tagset->queue_depth number of requests may be active at
> any given time. As such, only tagset->queue_depth number of static
> requests are required.
>
> The same goes for using an IO scheduler, which allocates a full set of
> static requests per hw queue per request queue.
>
> This series changes shared sbitmap support by using a shared tags per
> tagset and request queue. Ming suggested something along those lines in
> v1 review. In using a shared tags, the static rqs also become shared,
> reducing the number of sets of static rqs, reducing memory usage.
>
> Patch "blk-mq: Use shared tags for shared sbitmap support" is a bit big,
> and could potentially be broken down. But then maintaining ability to
> bisect becomes harder and each sub-patch would get more convoluted.
>
> For megaraid sas driver on my 128-CPU arm64 system with 1x SATA disk, we
> save approx. 300MB(!) [370MB -> 60MB]
>
> Baseline is 1b2d1439fc25 (block/for-next) Merge branch 'for-5.16/io_uring'
> into for-next

Let's get this queued up for testing, thanks John.

--
Jens Axboe

2021-10-05 13:35:18

by John Garry

[permalink] [raw]
Subject: Re: [PATCH v5 00/14] blk-mq: Reduce static requests memory footprint for shared sbitmap

On 05/10/2021 13:35, Jens Axboe wrote:
>> Baseline is 1b2d1439fc25 (block/for-next) Merge branch 'for-5.16/io_uring'
>> into for-next
> Let's get this queued up for testing, thanks John.

Cheers, appreciated

@Kashyap, You mentioned that when testing you saw a performance
regression from v5.11 -> v5.12 - any idea on that yet? Can you describe
the scenario, like IO scheduler and how many disks and the type? Does
disabling host_tagset_enable restore performance?

From checking differences between those kernels, I don't see anything
directly relevant in sbitmap support or in the megaraid sas driver.

Thanks,
John

2021-10-05 14:07:41

by Kashyap Desai

[permalink] [raw]
Subject: RE: [PATCH v5 00/14] blk-mq: Reduce static requests memory footprint for shared sbitmap

> -----Original Message-----
> From: John Garry [mailto:[email protected]]
> Sent: Tuesday, October 5, 2021 7:05 PM
> To: Jens Axboe <[email protected]>; [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH v5 00/14] blk-mq: Reduce static requests memory
> footprint for shared sbitmap
>
> On 05/10/2021 13:35, Jens Axboe wrote:
> >> Baseline is 1b2d1439fc25 (block/for-next) Merge branch
> >> 'for-5.16/io_uring'
> >> into for-next
> > Let's get this queued up for testing, thanks John.
>
> Cheers, appreciated
>
> @Kashyap, You mentioned that when testing you saw a performance
> regression from v5.11 -> v5.12 - any idea on that yet? Can you describe
> the
> scenario, like IO scheduler and how many disks and the type? Does
> disabling
> host_tagset_enable restore performance?

John - I am still working on this. System was not available due to some
other debugging.

>
> From checking differences between those kernels, I don't see anything
> directly relevant in sbitmap support or in the megaraid sas driver.
>
> Thanks,
> John



2021-10-05 16:25:30

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v5 00/14] blk-mq: Reduce static requests memory footprint for shared sbitmap

On 10/5/21 7:34 AM, John Garry wrote:
> On 05/10/2021 13:35, Jens Axboe wrote:
>>> Baseline is 1b2d1439fc25 (block/for-next) Merge branch 'for-5.16/io_uring'
>>> into for-next
>> Let's get this queued up for testing, thanks John.
>
> Cheers, appreciated
>
> @Kashyap, You mentioned that when testing you saw a performance
> regression from v5.11 -> v5.12 - any idea on that yet? Can you describe
> the scenario, like IO scheduler and how many disks and the type? Does
> disabling host_tagset_enable restore performance?

FWIW, I ran my usual peak testing on this and didn't observe any regressions.
Caveat being that a) no scheduler is involved, and b) no shared tags. But
at least that fast case/path is fine.

--
Jens Axboe

2021-10-07 20:36:15

by Kashyap Desai

[permalink] [raw]
Subject: RE: [PATCH v5 00/14] blk-mq: Reduce static requests memory footprint for shared sbitmap

> > -----Original Message-----
> > From: John Garry [mailto:[email protected]]
> > Sent: Tuesday, October 5, 2021 7:05 PM
> > To: Jens Axboe <[email protected]>; [email protected]
> > Cc: [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected]
> > Subject: Re: [PATCH v5 00/14] blk-mq: Reduce static requests memory
> > footprint for shared sbitmap
> >
> > On 05/10/2021 13:35, Jens Axboe wrote:
> > >> Baseline is 1b2d1439fc25 (block/for-next) Merge branch 'for-
> 5.16/io_uring'
> > >> into for-next
> > > Let's get this queued up for testing, thanks John.
> >
> > Cheers, appreciated
> >
> > @Kashyap, You mentioned that when testing you saw a performance
> > regression from v5.11 -> v5.12 - any idea on that yet? Can you
> > describe the scenario, like IO scheduler and how many disks and the
> > type? Does disabling host_tagset_enable restore performance?
>
> John - I am still working on this. System was not available due to some
> other
> debugging.

John -

I tested this patchset on 5.15-rc4 (master) -
https://github.com/torvalds/linux.git

#1 I noticed some performance regression with the mq-deadline scheduler which
is not related to this series. I will bisect and get more details about this
issue separately.
#2 w.r.t. this patchset, I noticed one issue related to high CPU usage in
certain cases.

I have covered the test on the same setup using the same h/w. I tested on an
Aero MegaRaid Controller.

Test #1 : Total 24 SAS SSDs in JBOD mode.
(numactl -N 1 fio 24.fio --rw=randread --bs=4k --iodepth=256 --numjobs=1 --ioscheduler=none/mq-deadline)
No performance regression is noticed using this patchset. I can get 3.1M
IOPs (max IOPs on this setup). I noticed a CPU hogging issue if the iodepth
from the application is high.

Cpu usage data from (top)
%Node1 : 6.4 us, 57.5 sy, 0.0 ni, 23.7 id, 0.0 wa, 0.0 hi, 12.4 si, 0.0 st

Perf top data -
19.11% [kernel] [k] native_queued_spin_lock_slowpath
4.72% [megaraid_sas] [k] complete_cmd_fusion
3.70% [megaraid_sas] [k] megasas_build_and_issue_cmd_fusion
2.76% [megaraid_sas] [k] megasas_build_ldio_fusion
2.16% [kernel] [k] syscall_return_via_sysret
2.16% [kernel] [k] entry_SYSCALL_64
1.87% [megaraid_sas] [k] megasas_queue_command
1.58% [kernel] [k] io_submit_one
1.53% [kernel] [k] llist_add_batch
1.51% [kernel] [k] blk_mq_find_and_get_req
1.43% [kernel] [k] llist_reverse_order
1.42% [kernel] [k] scsi_complete
1.18% [kernel] [k] blk_mq_rq_ctx_init.isra.51
1.17% [kernel] [k] _raw_spin_lock_irqsave
1.15% [kernel] [k] blk_mq_get_driver_tag
1.09% [kernel] [k] read_tsc
0.97% [kernel] [k] native_irq_return_iret
0.91% [kernel] [k] scsi_queue_rq
0.89% [kernel] [k] blk_complete_reqs

Perf top data indicates lock contention in "blk_mq_find_and_get_req" call.

1.31% 1.31% kworker/57:1H-k [kernel.vmlinux]
native_queued_spin_lock_slowpath
ret_from_fork
kthread
worker_thread
process_one_work
blk_mq_timeout_work
blk_mq_queue_tag_busy_iter
bt_iter
blk_mq_find_and_get_req
_raw_spin_lock_irqsave
native_queued_spin_lock_slowpath


Kernel v5.14 Data -

%Node1 : 8.4 us, 31.2 sy, 0.0 ni, 43.7 id, 0.0 wa, 0.0 hi, 16.8 si, 0.0 st
4.46% [kernel] [k] complete_cmd_fusion
3.69% [kernel] [k] megasas_build_and_issue_cmd_fusion
2.97% [kernel] [k] blk_mq_find_and_get_req
2.81% [kernel] [k] megasas_build_ldio_fusion
2.62% [kernel] [k] syscall_return_via_sysret
2.17% [kernel] [k] __entry_text_start
2.01% [kernel] [k] io_submit_one
1.87% [kernel] [k] scsi_queue_rq
1.77% [kernel] [k] native_queued_spin_lock_slowpath
1.76% [kernel] [k] scsi_complete
1.66% [kernel] [k] llist_reverse_order
1.63% [kernel] [k] _raw_spin_lock_irqsave
1.61% [kernel] [k] llist_add_batch
1.39% [kernel] [k] aio_complete_rw
1.37% [kernel] [k] read_tsc
1.07% [kernel] [k] blk_complete_reqs
1.07% [kernel] [k] native_irq_return_iret
1.04% [kernel] [k] __x86_indirect_thunk_rax
1.03% fio [.] __fio_gettime
1.00% [kernel] [k] flush_smp_call_function_queue


Test #2: Three VDs (each VD consists of 8 SAS SSDs).
(numactl -N 1 fio 3vd.fio --rw=randread --bs=4k --iodepth=32 --numjobs=8 --ioscheduler=none/mq-deadline)

There is a performance regression but it is not due to this patch set.
Kernel v5.11 gives 2.1M IOPs on mq-deadline but 5.15 (without this patchset)
gives 1.8M IOPs.
In this test I did not notice the CPU issue mentioned in Test-1.

In general, I noticed host_busy is incorrect once I apply this patchset. It
should not be more than can_queue, but the sysfs host_busy value is very high
when IOs are running. This issue appears only after applying this patchset.

Does this patch set only change the behavior of <shared_host_tag> enabled
drivers? Will there be any impact on the mpi3mr driver? I can test that as
well.

Kashyap

>
> >
> > From checking differences between those kernels, I don't see anything
> > directly relevant in sbitmap support or in the megaraid sas driver.
> >
> > Thanks,
> > John



2021-10-08 03:12:47

by Bart Van Assche

[permalink] [raw]
Subject: Re: [PATCH v5 00/14] blk-mq: Reduce static requests memory footprint for shared sbitmap

On 10/7/21 13:31, Kashyap Desai wrote:
> I tested this patchset on 5.15-rc4 (master) -
> https://github.com/torvalds/linux.git
>
> #1 I noticed some performance regression @mq-deadline scheduler which is not
> related to this series. I will bisect and get more detail about this issue
> separately.

Please test this patch series on top of Jens' for-next branch
(git://git.kernel.dk/linux-block). The mq-deadline performance on Jens'
for-next branch should match that of kernel v5.13.

Thanks,

Bart.

2021-10-08 08:06:38

by John Garry

[permalink] [raw]
Subject: Re: [PATCH v5 00/14] blk-mq: Reduce static requests memory footprint for shared sbitmap

On 08/10/2021 04:11, Bart Van Assche wrote:


> On 10/7/21 13:31, Kashyap Desai wrote:
>> I tested this patchset on 5.15-rc4 (master) -
>> https://github.com/torvalds/linux.git
>>
>> #1 I noticed some performance regression @mq-deadline scheduler which
>> is not
>> related to this series. I will bisect and get more detail about this
>> issue
>> separately.
>
> Please test this patch series on top of Jens' for-next branch
> (git://git.kernel.dk/linux-block). The mq-deadline performance on Jens'
> for-next branch should match that of kernel v5.13.

Kashyap also mentioned earlier that he saw the drop from v5.11 -> v5.12.

The only thing I saw possibly related there was Jan's work in
https://lore.kernel.org/linux-block/[email protected]/
to fix a performance regression witnessed on megaraid sas for slower media.

But I am quite doubtful that this is the issue.

Thanks,
John

2021-10-08 10:16:54

by John Garry

[permalink] [raw]
Subject: Re: [PATCH v5 00/14] blk-mq: Reduce static requests memory footprint for shared sbitmap

On 07/10/2021 21:31, Kashyap Desai wrote:
> Perf top data indicates lock contention in "blk_mq_find_and_get_req" call.
>
> 1.31% 1.31% kworker/57:1H-k [kernel.vmlinux]
> native_queued_spin_lock_slowpath
> ret_from_fork
> kthread
> worker_thread
> process_one_work
> blk_mq_timeout_work
> blk_mq_queue_tag_busy_iter
> bt_iter
> blk_mq_find_and_get_req
> _raw_spin_lock_irqsave
> native_queued_spin_lock_slowpath
>
>
> Kernel v5.14 Data -
>
> %Node1 : 8.4 us, 31.2 sy, 0.0 ni, 43.7 id, 0.0 wa, 0.0 hi, 16.8 si, 0.0
> st
> 4.46% [kernel] [k] complete_cmd_fusion
> 3.69% [kernel] [k] megasas_build_and_issue_cmd_fusion
> 2.97% [kernel] [k] blk_mq_find_and_get_req
> 2.81% [kernel] [k] megasas_build_ldio_fusion
> 2.62% [kernel] [k] syscall_return_via_sysret
> 2.17% [kernel] [k] __entry_text_start
> 2.01% [kernel] [k] io_submit_one
> 1.87% [kernel] [k] scsi_queue_rq
> 1.77% [kernel] [k] native_queued_spin_lock_slowpath
> 1.76% [kernel] [k] scsi_complete
> 1.66% [kernel] [k] llist_reverse_order
> 1.63% [kernel] [k] _raw_spin_lock_irqsave
> 1.61% [kernel] [k] llist_add_batch
> 1.39% [kernel] [k] aio_complete_rw
> 1.37% [kernel] [k] read_tsc
> 1.07% [kernel] [k] blk_complete_reqs
> 1.07% [kernel] [k] native_irq_return_iret
> 1.04% [kernel] [k] __x86_indirect_thunk_rax
> 1.03% fio [.] __fio_gettime
> 1.00% [kernel] [k] flush_smp_call_function_queue
>
>
> Test #2: Three VDs (each VD consist of 8 SAS SSDs).
> (numactl -N 1 fio
> 3vd.fio --rw=randread --bs=4k --iodepth=32 --numjobs=8
> --ioscheduler=none/mq-deadline)
>
> There is a performance regression but it is not due to this patch set.
> Kernel v5.11 gives 2.1M IOPs on mq-deadline but 5.15 (without this patchset)
> gives 1.8M IOPs.
> In this test I did not noticed CPU issue as mentioned in Test-1.
>
> In general, I noticed host_busy is incorrect once I apply this patchset. It
> should not be more than can_queue, but sysfs host_busy value is very high
> when IOs are running. This issue is only after applying this patchset.
>
> Is this patch set only change the behavior of <shared_host_tag> enabled
> driver ? Will there be any impact on mpi3mr driver ? I can test that as
> well.

I can see where the high value of host_busy is coming from in this
series - we incorrectly iterate the same tags once per hw queue in
blk_mq_tagset_busy_iter() - d'oh.
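
To illustrate the arithmetic with a standalone toy model (not the kernel
code itself - the struct and names below are invented for the example),
walking the same shared tag set once per hw queue multiplies the busy count:

#include <stdbool.h>
#include <stdio.h>

#define NR_HW_QUEUES	4
#define NR_TAGS		8

/* Stand-in for a tag set; in_flight marks started requests. */
struct toy_tags {
	bool in_flight[NR_TAGS];
};

static int count_busy(struct toy_tags **tags, int nr_hw_queues)
{
	int busy = 0;

	for (int i = 0; i < nr_hw_queues; i++)
		for (int t = 0; t < NR_TAGS; t++)
			if (tags[i]->in_flight[t])
				busy++;
	return busy;
}

int main(void)
{
	struct toy_tags shared = { .in_flight = { true, true, true } };
	struct toy_tags *hctx_tags[NR_HW_QUEUES];

	/* Shared tags: every hw queue points at the same tag set. */
	for (int i = 0; i < NR_HW_QUEUES; i++)
		hctx_tags[i] = &shared;

	/* Visits the same 3 busy tags once per hw queue: prints 12. */
	printf("busy (per-hctx iter) = %d\n",
	       count_busy(hctx_tags, NR_HW_QUEUES));
	/* Iterating the shared set only once gives the real value: 3. */
	printf("busy (iterate once)  = %d\n", count_busy(hctx_tags, 1));

	return 0;
}

The patch below makes blk_mq_tagset_busy_iter() take that second path when
the tagset uses shared tags.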

Please try the below patch. I have looked at other places where we may
have similar problems in looping the hw queue count for tagset->tags[],
and they look ok. But I will double-check. I think that
blk_mq_queue_tag_busy_iter() should be fine - Ming?

--->8----

From e6ecaa6d624ebb903fa773ca2a2035300b4c55c5 Mon Sep 17 00:00:00 2001
From: John Garry <[email protected]>
Date: Fri, 8 Oct 2021 10:55:11 +0100
Subject: [PATCH] blk-mq: Fix blk_mq_tagset_busy_iter() for shared tags

Since it is now possible for a tagset to share a single set of tags, the
iter function should not iterate the tags once per hw queue in that case.
Rather it should just iterate once.

Signed-off-by: John Garry <[email protected]>

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 72a2724a4eee..ef888aab81b3 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -378,9 +378,15 @@ void blk_mq_all_tag_iter(struct blk_mq_tags *tags, busy_tag_iter_fn *fn,
void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
busy_tag_iter_fn *fn, void *priv)
{
+ int nr_hw_queues;
int i;

- for (i = 0; i < tagset->nr_hw_queues; i++) {
+ if (blk_mq_is_shared_tags(tagset->flags))
+ nr_hw_queues = 1;
+ else
+ nr_hw_queues = tagset->nr_hw_queues;
+
+ for (i = 0; i < nr_hw_queues; i++) {
if (tagset->tags && tagset->tags[i])
__blk_mq_all_tag_iter(tagset->tags[i], fn, priv,
BT_TAG_ITER_STARTED);

----8<----

Thanks,
john