2017-06-26 10:21:12

by Christoph Hellwig

Subject: block: spread MSI(-X) vectors to all possible CPUs

Hi all,

this series contains the left-over block bits to spread the MSI-X
vectors over all CPUs. Thomas already rewrote and then merged the
irq bits into the tip irq/core branch, and this is the remainder.

As there are no dependencies on other block changes, adding them
to the tip tree might be easiest if Jens could ACK them.
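
For context, the driver-facing side of "spreading vectors" is the PCI_IRQ_AFFINITY
flag to pci_alloc_irq_vectors(). The snippet below is a minimal sketch of that
usage, not code from this series; the foo_* name and the vector count are made
up, only pci_alloc_irq_vectors(), pci_irq_vector() and the PCI_IRQ_* flags are
real kernel API.

/* Sketch: request one MSI-X vector per queue and let the irq core spread
 * them across CPUs (all present CPUs once the tip irq/core changes land). */
static int foo_setup_irqs(struct pci_dev *pdev, unsigned int nr_queues)
{
	int nvecs, i;

	nvecs = pci_alloc_irq_vectors(pdev, 1, nr_queues,
				      PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);
	if (nvecs < 0)
		return nvecs;

	for (i = 0; i < nvecs; i++)
		dev_info(&pdev->dev, "queue %d uses IRQ %d\n",
			 i, pci_irq_vector(pdev, i));
	return nvecs;
}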


2017-06-26 10:21:23

by Christoph Hellwig

Subject: [PATCH 1/3] blk-mq: include all present CPUs in the default queue mapping

This way we get a nice distribution independent of the current cpu
online / offline state.

Signed-off-by: Christoph Hellwig <[email protected]>
---
block/blk-mq-cpumap.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 8e61e8640e17..5eaecd40f701 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -35,7 +35,6 @@ int blk_mq_map_queues(struct blk_mq_tag_set *set)
{
unsigned int *map = set->mq_map;
unsigned int nr_queues = set->nr_hw_queues;
- const struct cpumask *online_mask = cpu_online_mask;
unsigned int i, nr_cpus, nr_uniq_cpus, queue, first_sibling;
cpumask_var_t cpus;

@@ -44,7 +43,7 @@ int blk_mq_map_queues(struct blk_mq_tag_set *set)

cpumask_clear(cpus);
nr_cpus = nr_uniq_cpus = 0;
- for_each_cpu(i, online_mask) {
+ for_each_present_cpu(i) {
nr_cpus++;
first_sibling = get_first_sibling(i);
if (!cpumask_test_cpu(first_sibling, cpus))
@@ -54,7 +53,7 @@ int blk_mq_map_queues(struct blk_mq_tag_set *set)

queue = 0;
for_each_possible_cpu(i) {
- if (!cpumask_test_cpu(i, online_mask)) {
+ if (!cpumask_test_cpu(i, cpu_present_mask)) {
map[i] = 0;
continue;
}
--
2.11.0
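
The effect of mapping over present instead of online CPUs can be illustrated
with a small userspace model. This is a simplification of the real
blk_mq_map_queues() logic (it ignores hyperthread sibling grouping); the CPU
layout and queue count are invented for the example.

#include <stdio.h>

#define NR_POSSIBLE_CPUS 8

int main(void)
{
	/* CPUs 0-5 present, 6-7 only possible (e.g. empty hotplug slots) */
	int present[NR_POSSIBLE_CPUS] = { 1, 1, 1, 1, 1, 1, 0, 0 };
	unsigned int nr_queues = 4, nr_present = 6;
	unsigned int map[NR_POSSIBLE_CPUS];

	for (unsigned int i = 0, p = 0; i < NR_POSSIBLE_CPUS; i++) {
		if (!present[i]) {
			map[i] = 0;	/* non-present CPUs fall back to queue 0 */
			continue;
		}
		map[i] = p++ * nr_queues / nr_present;
	}

	for (unsigned int i = 0; i < NR_POSSIBLE_CPUS; i++)
		printf("cpu %u -> hw queue %u\n", i, map[i]);
	return 0;
}

Because the map depends only on which CPUs are present, taking a CPU offline
and back online no longer changes the queue assignment.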

2017-06-26 10:21:39

by Christoph Hellwig

Subject: [PATCH 3/3] nvme: allocate queues for all possible CPUs

Unlike most drivers that simply pass the maximum possible vectors to
pci_alloc_irq_vectors, NVMe needs to configure the device before allocating
the vectors, so it needs a manual update for the new scheme of using
all present CPUs.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/nvme/host/pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 951042a375d6..b3dcd7abc6d7 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1525,7 +1525,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
struct pci_dev *pdev = to_pci_dev(dev->dev);
int result, nr_io_queues, size;

- nr_io_queues = num_online_cpus();
+ nr_io_queues = num_present_cpus();
result = nvme_set_queue_count(&dev->ctrl, &nr_io_queues);
if (result < 0)
return result;
--
2.11.0
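
A condensed sketch of the ordering this relies on (error handling and the rest
of nvme_setup_io_queues() are omitted; treat it as an illustration of the call
order, not the exact upstream function body):

static int nvme_setup_io_queues_sketch(struct nvme_dev *dev)
{
	struct pci_dev *pdev = to_pci_dev(dev->dev);
	int result, nr_io_queues = num_present_cpus();	/* was num_online_cpus() */

	/* step 1: negotiate the queue count with the controller first */
	result = nvme_set_queue_count(&dev->ctrl, &nr_io_queues);
	if (result < 0)
		return result;

	/* step 2: only now allocate (and spread) that many vectors */
	nr_io_queues = pci_alloc_irq_vectors(pdev, 1, nr_io_queues,
					     PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY);
	if (nr_io_queues <= 0)
		return -EIO;

	return 0;
}

This is why the driver cannot just pass the maximum possible number of vectors
to pci_alloc_irq_vectors() the way most other drivers do.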

2017-06-26 10:21:21

by Christoph Hellwig

Subject: [PATCH 2/3] blk-mq: create hctx for each present CPU

Currently we only create hctx for online CPUs, which can lead to a lot
of churn due to frequent soft offline / online operations. Instead
allocate one for each present CPU to avoid this and dramatically simplify
the code.

Signed-off-by: Christoph Hellwig <[email protected]>
---
block/blk-mq.c | 120 +++++----------------------------------------
block/blk-mq.h | 5 --
include/linux/cpuhotplug.h | 1 -
3 files changed, 11 insertions(+), 115 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index bb66c96850b1..dd390e27824d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -37,9 +37,6 @@
#include "blk-wbt.h"
#include "blk-mq-sched.h"

-static DEFINE_MUTEX(all_q_mutex);
-static LIST_HEAD(all_q_list);
-
static void blk_mq_poll_stats_start(struct request_queue *q);
static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync);
@@ -1975,8 +1972,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
INIT_LIST_HEAD(&__ctx->rq_list);
__ctx->queue = q;

- /* If the cpu isn't online, the cpu is mapped to first hctx */
- if (!cpu_online(i))
+ /* If the cpu isn't present, the cpu is mapped to first hctx */
+ if (!cpu_present(i))
continue;

hctx = blk_mq_map_queue(q, i);
@@ -2019,8 +2016,7 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
}
}

-static void blk_mq_map_swqueue(struct request_queue *q,
- const struct cpumask *online_mask)
+static void blk_mq_map_swqueue(struct request_queue *q)
{
unsigned int i, hctx_idx;
struct blk_mq_hw_ctx *hctx;
@@ -2038,13 +2034,11 @@ static void blk_mq_map_swqueue(struct request_queue *q,
}

/*
- * Map software to hardware queues
+ * Map software to hardware queues.
+ *
+ * If the cpu isn't present, the cpu is mapped to first hctx.
*/
- for_each_possible_cpu(i) {
- /* If the cpu isn't online, the cpu is mapped to first hctx */
- if (!cpumask_test_cpu(i, online_mask))
- continue;
-
+ for_each_present_cpu(i) {
hctx_idx = q->mq_map[i];
/* unmapped hw queue can be remapped after CPU topo changed */
if (!set->tags[hctx_idx] &&
@@ -2330,16 +2324,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
blk_queue_softirq_done(q, set->ops->complete);

blk_mq_init_cpu_queues(q, set->nr_hw_queues);
-
- get_online_cpus();
- mutex_lock(&all_q_mutex);
-
- list_add_tail(&q->all_q_node, &all_q_list);
blk_mq_add_queue_tag_set(set, q);
- blk_mq_map_swqueue(q, cpu_online_mask);
-
- mutex_unlock(&all_q_mutex);
- put_online_cpus();
+ blk_mq_map_swqueue(q);

if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
int ret;
@@ -2365,18 +2351,12 @@ void blk_mq_free_queue(struct request_queue *q)
{
struct blk_mq_tag_set *set = q->tag_set;

- mutex_lock(&all_q_mutex);
- list_del_init(&q->all_q_node);
- mutex_unlock(&all_q_mutex);
-
blk_mq_del_queue_tag_set(q);
-
blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
}

/* Basically redo blk_mq_init_queue with queue frozen */
-static void blk_mq_queue_reinit(struct request_queue *q,
- const struct cpumask *online_mask)
+static void blk_mq_queue_reinit(struct request_queue *q)
{
WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));

@@ -2389,76 +2369,12 @@ static void blk_mq_queue_reinit(struct request_queue *q,
* involves free and re-allocate memory, worthy doing?)
*/

- blk_mq_map_swqueue(q, online_mask);
+ blk_mq_map_swqueue(q);

blk_mq_sysfs_register(q);
blk_mq_debugfs_register_hctxs(q);
}

-/*
- * New online cpumask which is going to be set in this hotplug event.
- * Declare this cpumasks as global as cpu-hotplug operation is invoked
- * one-by-one and dynamically allocating this could result in a failure.
- */
-static struct cpumask cpuhp_online_new;
-
-static void blk_mq_queue_reinit_work(void)
-{
- struct request_queue *q;
-
- mutex_lock(&all_q_mutex);
- /*
- * We need to freeze and reinit all existing queues. Freezing
- * involves synchronous wait for an RCU grace period and doing it
- * one by one may take a long time. Start freezing all queues in
- * one swoop and then wait for the completions so that freezing can
- * take place in parallel.
- */
- list_for_each_entry(q, &all_q_list, all_q_node)
- blk_freeze_queue_start(q);
- list_for_each_entry(q, &all_q_list, all_q_node)
- blk_mq_freeze_queue_wait(q);
-
- list_for_each_entry(q, &all_q_list, all_q_node)
- blk_mq_queue_reinit(q, &cpuhp_online_new);
-
- list_for_each_entry(q, &all_q_list, all_q_node)
- blk_mq_unfreeze_queue(q);
-
- mutex_unlock(&all_q_mutex);
-}
-
-static int blk_mq_queue_reinit_dead(unsigned int cpu)
-{
- cpumask_copy(&cpuhp_online_new, cpu_online_mask);
- blk_mq_queue_reinit_work();
- return 0;
-}
-
-/*
- * Before hotadded cpu starts handling requests, new mappings must be
- * established. Otherwise, these requests in hw queue might never be
- * dispatched.
- *
- * For example, there is a single hw queue (hctx) and two CPU queues (ctx0
- * for CPU0, and ctx1 for CPU1).
- *
- * Now CPU1 is just onlined and a request is inserted into ctx1->rq_list
- * and set bit0 in pending bitmap as ctx1->index_hw is still zero.
- *
- * And then while running hw queue, blk_mq_flush_busy_ctxs() finds bit0 is set
- * in pending bitmap and tries to retrieve requests in hctx->ctxs[0]->rq_list.
- * But htx->ctxs[0] is a pointer to ctx0, so the request in ctx1->rq_list is
- * ignored.
- */
-static int blk_mq_queue_reinit_prepare(unsigned int cpu)
-{
- cpumask_copy(&cpuhp_online_new, cpu_online_mask);
- cpumask_set_cpu(cpu, &cpuhp_online_new);
- blk_mq_queue_reinit_work();
- return 0;
-}
-
static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
{
int i;
@@ -2669,7 +2585,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
blk_mq_update_queue_map(set);
list_for_each_entry(q, &set->tag_list, tag_set_list) {
blk_mq_realloc_hw_ctxs(set, q);
- blk_mq_queue_reinit(q, cpu_online_mask);
+ blk_mq_queue_reinit(q);
}

list_for_each_entry(q, &set->tag_list, tag_set_list)
@@ -2885,24 +2801,10 @@ bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
}
EXPORT_SYMBOL_GPL(blk_mq_poll);

-void blk_mq_disable_hotplug(void)
-{
- mutex_lock(&all_q_mutex);
-}
-
-void blk_mq_enable_hotplug(void)
-{
- mutex_unlock(&all_q_mutex);
-}
-
static int __init blk_mq_init(void)
{
cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
blk_mq_hctx_notify_dead);
-
- cpuhp_setup_state_nocalls(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
- blk_mq_queue_reinit_prepare,
- blk_mq_queue_reinit_dead);
return 0;
}
subsys_initcall(blk_mq_init);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index cc67b48e3551..558df56544d2 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -56,11 +56,6 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
bool at_head);
void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
struct list_head *list);
-/*
- * CPU hotplug helpers
- */
-void blk_mq_enable_hotplug(void);
-void blk_mq_disable_hotplug(void);

/*
* CPU -> queue mappings
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index c15f22c54535..7f815d915977 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -58,7 +58,6 @@ enum cpuhp_state {
CPUHP_XEN_EVTCHN_PREPARE,
CPUHP_ARM_SHMOBILE_SCU_PREPARE,
CPUHP_SH_SH3X_PREPARE,
- CPUHP_BLK_MQ_PREPARE,
CPUHP_NET_FLOW_PREPARE,
CPUHP_TOPOLOGY_PREPARE,
CPUHP_NET_IUCV_PREPARE,
--
2.11.0
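
With hctxs allocated for every present CPU, the only hotplug handling left is
the CPUHP_BLK_MQ_DEAD callback that moves requests off a dead CPU's software
queue. The sketch below mirrors the shape of the existing
blk_mq_hctx_notify_dead() rather than reproducing it verbatim:

static int blk_mq_hctx_notify_dead_sketch(unsigned int cpu,
					  struct hlist_node *node)
{
	struct blk_mq_hw_ctx *hctx =
		hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
	struct blk_mq_ctx *ctx = __blk_mq_get_ctx(hctx->queue, cpu);
	LIST_HEAD(tmp);

	/* splice any requests left on the dead CPU's software queue */
	spin_lock(&ctx->lock);
	list_splice_init(&ctx->rq_list, &tmp);
	spin_unlock(&ctx->lock);

	if (list_empty(&tmp))
		return 0;

	/* hand them to the hardware queue and kick it */
	spin_lock(&hctx->lock);
	list_splice_tail_init(&tmp, &hctx->dispatch);
	spin_unlock(&hctx->lock);

	blk_mq_run_hw_queue(hctx, true);
	return 0;
}

No online/offline remapping is needed anymore because the ctx -> hctx mapping
for present CPUs is fixed at init time.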

2017-06-27 18:26:28

by Jens Axboe

Subject: Re: block: spread MSI(-X) vectors to all possible CPUs

On 06/26/2017 04:20 AM, Christoph Hellwig wrote:
> Hi all,
>
> this series contains the left-over block bits to spread the MSI-X
> vectors over all CPUs. Thomas already rewrote and then merged the
> irq bits into the tip irq/core branch, and this is the remainder.
>
> As there are no dependencies on other block changes, adding them
> to the tip tree might be easiest if Jens could ACK them.

Looks fine to me, you can add my Reviewed-by if you want to funnel
them through the tip tree.

--
Jens Axboe

Subject: [tip:irq/core] blk-mq: Include all present CPUs in the default queue mapping

Commit-ID: 5f042e7cbd9ebd3580077dcdc21f35e68c2adf5f
Gitweb: http://git.kernel.org/tip/5f042e7cbd9ebd3580077dcdc21f35e68c2adf5f
Author: Christoph Hellwig <[email protected]>
AuthorDate: Mon, 26 Jun 2017 12:20:56 +0200
Committer: Thomas Gleixner <[email protected]>
CommitDate: Wed, 28 Jun 2017 23:00:06 +0200

blk-mq: Include all present CPUs in the default queue mapping

This way we get a nice distribution independent of the current cpu
online / offline state.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

---
block/blk-mq-cpumap.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 8e61e86..5eaecd4 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -35,7 +35,6 @@ int blk_mq_map_queues(struct blk_mq_tag_set *set)
{
unsigned int *map = set->mq_map;
unsigned int nr_queues = set->nr_hw_queues;
- const struct cpumask *online_mask = cpu_online_mask;
unsigned int i, nr_cpus, nr_uniq_cpus, queue, first_sibling;
cpumask_var_t cpus;

@@ -44,7 +43,7 @@ int blk_mq_map_queues(struct blk_mq_tag_set *set)

cpumask_clear(cpus);
nr_cpus = nr_uniq_cpus = 0;
- for_each_cpu(i, online_mask) {
+ for_each_present_cpu(i) {
nr_cpus++;
first_sibling = get_first_sibling(i);
if (!cpumask_test_cpu(first_sibling, cpus))
@@ -54,7 +53,7 @@ int blk_mq_map_queues(struct blk_mq_tag_set *set)

queue = 0;
for_each_possible_cpu(i) {
- if (!cpumask_test_cpu(i, online_mask)) {
+ if (!cpumask_test_cpu(i, cpu_present_mask)) {
map[i] = 0;
continue;
}

Subject: [tip:irq/core] blk-mq: Create hctx for each present CPU

Commit-ID: 4b855ad37194f7bdbb200ce7a1c7051fecb56a08
Gitweb: http://git.kernel.org/tip/4b855ad37194f7bdbb200ce7a1c7051fecb56a08
Author: Christoph Hellwig <[email protected]>
AuthorDate: Mon, 26 Jun 2017 12:20:57 +0200
Committer: Thomas Gleixner <[email protected]>
CommitDate: Wed, 28 Jun 2017 23:00:07 +0200

blk-mq: Create hctx for each present CPU

Currently we only create hctx for online CPUs, which can lead to a lot
of churn due to frequent soft offline / online operations. Instead
allocate one for each present CPU to avoid this and dramatically simplify
the code.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

---
block/blk-mq.c | 120 +++++----------------------------------------
block/blk-mq.h | 5 --
include/linux/cpuhotplug.h | 1 -
3 files changed, 11 insertions(+), 115 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index bb66c96..dd390e2 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -37,9 +37,6 @@
#include "blk-wbt.h"
#include "blk-mq-sched.h"

-static DEFINE_MUTEX(all_q_mutex);
-static LIST_HEAD(all_q_list);
-
static void blk_mq_poll_stats_start(struct request_queue *q);
static void blk_mq_poll_stats_fn(struct blk_stat_callback *cb);
static void __blk_mq_stop_hw_queues(struct request_queue *q, bool sync);
@@ -1975,8 +1972,8 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
INIT_LIST_HEAD(&__ctx->rq_list);
__ctx->queue = q;

- /* If the cpu isn't online, the cpu is mapped to first hctx */
- if (!cpu_online(i))
+ /* If the cpu isn't present, the cpu is mapped to first hctx */
+ if (!cpu_present(i))
continue;

hctx = blk_mq_map_queue(q, i);
@@ -2019,8 +2016,7 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set,
}
}

-static void blk_mq_map_swqueue(struct request_queue *q,
- const struct cpumask *online_mask)
+static void blk_mq_map_swqueue(struct request_queue *q)
{
unsigned int i, hctx_idx;
struct blk_mq_hw_ctx *hctx;
@@ -2038,13 +2034,11 @@ static void blk_mq_map_swqueue(struct request_queue *q,
}

/*
- * Map software to hardware queues
+ * Map software to hardware queues.
+ *
+ * If the cpu isn't present, the cpu is mapped to first hctx.
*/
- for_each_possible_cpu(i) {
- /* If the cpu isn't online, the cpu is mapped to first hctx */
- if (!cpumask_test_cpu(i, online_mask))
- continue;
-
+ for_each_present_cpu(i) {
hctx_idx = q->mq_map[i];
/* unmapped hw queue can be remapped after CPU topo changed */
if (!set->tags[hctx_idx] &&
@@ -2330,16 +2324,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
blk_queue_softirq_done(q, set->ops->complete);

blk_mq_init_cpu_queues(q, set->nr_hw_queues);
-
- get_online_cpus();
- mutex_lock(&all_q_mutex);
-
- list_add_tail(&q->all_q_node, &all_q_list);
blk_mq_add_queue_tag_set(set, q);
- blk_mq_map_swqueue(q, cpu_online_mask);
-
- mutex_unlock(&all_q_mutex);
- put_online_cpus();
+ blk_mq_map_swqueue(q);

if (!(set->flags & BLK_MQ_F_NO_SCHED)) {
int ret;
@@ -2365,18 +2351,12 @@ void blk_mq_free_queue(struct request_queue *q)
{
struct blk_mq_tag_set *set = q->tag_set;

- mutex_lock(&all_q_mutex);
- list_del_init(&q->all_q_node);
- mutex_unlock(&all_q_mutex);
-
blk_mq_del_queue_tag_set(q);
-
blk_mq_exit_hw_queues(q, set, set->nr_hw_queues);
}

/* Basically redo blk_mq_init_queue with queue frozen */
-static void blk_mq_queue_reinit(struct request_queue *q,
- const struct cpumask *online_mask)
+static void blk_mq_queue_reinit(struct request_queue *q)
{
WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));

@@ -2389,76 +2369,12 @@ static void blk_mq_queue_reinit(struct request_queue *q,
* involves free and re-allocate memory, worthy doing?)
*/

- blk_mq_map_swqueue(q, online_mask);
+ blk_mq_map_swqueue(q);

blk_mq_sysfs_register(q);
blk_mq_debugfs_register_hctxs(q);
}

-/*
- * New online cpumask which is going to be set in this hotplug event.
- * Declare this cpumasks as global as cpu-hotplug operation is invoked
- * one-by-one and dynamically allocating this could result in a failure.
- */
-static struct cpumask cpuhp_online_new;
-
-static void blk_mq_queue_reinit_work(void)
-{
- struct request_queue *q;
-
- mutex_lock(&all_q_mutex);
- /*
- * We need to freeze and reinit all existing queues. Freezing
- * involves synchronous wait for an RCU grace period and doing it
- * one by one may take a long time. Start freezing all queues in
- * one swoop and then wait for the completions so that freezing can
- * take place in parallel.
- */
- list_for_each_entry(q, &all_q_list, all_q_node)
- blk_freeze_queue_start(q);
- list_for_each_entry(q, &all_q_list, all_q_node)
- blk_mq_freeze_queue_wait(q);
-
- list_for_each_entry(q, &all_q_list, all_q_node)
- blk_mq_queue_reinit(q, &cpuhp_online_new);
-
- list_for_each_entry(q, &all_q_list, all_q_node)
- blk_mq_unfreeze_queue(q);
-
- mutex_unlock(&all_q_mutex);
-}
-
-static int blk_mq_queue_reinit_dead(unsigned int cpu)
-{
- cpumask_copy(&cpuhp_online_new, cpu_online_mask);
- blk_mq_queue_reinit_work();
- return 0;
-}
-
-/*
- * Before hotadded cpu starts handling requests, new mappings must be
- * established. Otherwise, these requests in hw queue might never be
- * dispatched.
- *
- * For example, there is a single hw queue (hctx) and two CPU queues (ctx0
- * for CPU0, and ctx1 for CPU1).
- *
- * Now CPU1 is just onlined and a request is inserted into ctx1->rq_list
- * and set bit0 in pending bitmap as ctx1->index_hw is still zero.
- *
- * And then while running hw queue, blk_mq_flush_busy_ctxs() finds bit0 is set
- * in pending bitmap and tries to retrieve requests in hctx->ctxs[0]->rq_list.
- * But htx->ctxs[0] is a pointer to ctx0, so the request in ctx1->rq_list is
- * ignored.
- */
-static int blk_mq_queue_reinit_prepare(unsigned int cpu)
-{
- cpumask_copy(&cpuhp_online_new, cpu_online_mask);
- cpumask_set_cpu(cpu, &cpuhp_online_new);
- blk_mq_queue_reinit_work();
- return 0;
-}
-
static int __blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
{
int i;
@@ -2669,7 +2585,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
blk_mq_update_queue_map(set);
list_for_each_entry(q, &set->tag_list, tag_set_list) {
blk_mq_realloc_hw_ctxs(set, q);
- blk_mq_queue_reinit(q, cpu_online_mask);
+ blk_mq_queue_reinit(q);
}

list_for_each_entry(q, &set->tag_list, tag_set_list)
@@ -2885,24 +2801,10 @@ bool blk_mq_poll(struct request_queue *q, blk_qc_t cookie)
}
EXPORT_SYMBOL_GPL(blk_mq_poll);

-void blk_mq_disable_hotplug(void)
-{
- mutex_lock(&all_q_mutex);
-}
-
-void blk_mq_enable_hotplug(void)
-{
- mutex_unlock(&all_q_mutex);
-}
-
static int __init blk_mq_init(void)
{
cpuhp_setup_state_multi(CPUHP_BLK_MQ_DEAD, "block/mq:dead", NULL,
blk_mq_hctx_notify_dead);
-
- cpuhp_setup_state_nocalls(CPUHP_BLK_MQ_PREPARE, "block/mq:prepare",
- blk_mq_queue_reinit_prepare,
- blk_mq_queue_reinit_dead);
return 0;
}
subsys_initcall(blk_mq_init);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index cc67b48..558df56 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -56,11 +56,6 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
bool at_head);
void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
struct list_head *list);
-/*
- * CPU hotplug helpers
- */
-void blk_mq_enable_hotplug(void);
-void blk_mq_disable_hotplug(void);

/*
* CPU -> queue mappings
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index c15f22c..7f815d9 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -58,7 +58,6 @@ enum cpuhp_state {
CPUHP_XEN_EVTCHN_PREPARE,
CPUHP_ARM_SHMOBILE_SCU_PREPARE,
CPUHP_SH_SH3X_PREPARE,
- CPUHP_BLK_MQ_PREPARE,
CPUHP_NET_FLOW_PREPARE,
CPUHP_TOPOLOGY_PREPARE,
CPUHP_NET_IUCV_PREPARE,

Subject: [tip:irq/core] nvme: Allocate queues for all possible CPUs

Commit-ID: 425a17cbfff933c4cca4eeef5caa5926d198dd85
Gitweb: http://git.kernel.org/tip/425a17cbfff933c4cca4eeef5caa5926d198dd85
Author: Christoph Hellwig <[email protected]>
AuthorDate: Mon, 26 Jun 2017 12:20:58 +0200
Committer: Thomas Gleixner <[email protected]>
CommitDate: Wed, 28 Jun 2017 23:00:07 +0200

nvme: Allocate queues for all possible CPUs

Unlike most drivers that simply pass the maximum possible vectors to
pci_alloc_irq_vectors, NVMe needs to configure the device before allocating
the vectors, so it needs a manual update for the new scheme of using
all present CPUs.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>

---
drivers/nvme/host/pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 951042a..b3dcd7a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1525,7 +1525,7 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
struct pci_dev *pdev = to_pci_dev(dev->dev);
int result, nr_io_queues, size;

- nr_io_queues = num_online_cpus();
+ nr_io_queues = num_present_cpus();
result = nvme_set_queue_count(&dev->ctrl, &nr_io_queues);
if (result < 0)
return result;

2017-07-02 17:47:38

by Sagi Grimberg

Subject: Re: [PATCH 2/3] blk-mq: create hctx for each present CPU

Looks good,

Reviewed-by: Sagi Grimberg <[email protected]>

2017-07-03 10:50:16

by Max Gurtovoy

Subject: Re: block: spread MSI(-X) vectors to all possible CPUs



On 6/27/2017 9:24 PM, Jens Axboe wrote:
> On 06/26/2017 04:20 AM, Christoph Hellwig wrote:
>> Hi all,
>>
>> this series contains the left-over block bits to spread the MSI-X
>> vectors over all CPUs. Thomas already rewrote and then merged the
>> irq bits into the tip irq/core branch, and this is the remainder.
>>
>> As there are no dependencies on other block changes, adding them
>> to the tip tree might be easiest if Jens could ACK them.
>
> Looks fine to me, you can add my Reviewed-by if you want to funnel
> them through the tip tree.
>

I guess we'll need to do some rebasing with my "blk-mq: map all HWQ also
in hyperthreaded system" patch.
Jens/Christoph,
how do you prefer doing it?