When a CPU goes offline, blk_mq_hctx_notify_dead() is called once for each
hctx on behalf of the offline CPU.

While blk_mq_hctx_notify_dead() splices all ctx->rq_lists[type] to
hctx->dispatch, it never checks whether the ctx is actually mapped to the
hctx.

For example, on a 4-CPU VM (with nvme), offlining CPU 2 out of the 4 CPUs
(0-3) calls blk_mq_hctx_notify_dead() once for each io queue hctx:

1st: blk_mq_ctx->cpu = 2 for blk_mq_hw_ctx->queue_num = 3
2nd: blk_mq_ctx->cpu = 2 for blk_mq_hw_ctx->queue_num = 2
3rd: blk_mq_ctx->cpu = 2 for blk_mq_hw_ctx->queue_num = 1
4th: blk_mq_ctx->cpu = 2 for blk_mq_hw_ctx->queue_num = 0

Although blk_mq_ctx->cpu = 2 is only mapped to blk_mq_hw_ctx->queue_num = 2
in this case, its ctx->rq_lists[type] is nevertheless moved to
blk_mq_hw_ctx->queue_num = 3 during the 1st call of
blk_mq_hctx_notify_dead().

Return early from blk_mq_hctx_notify_dead() if the ctx is not mapped to the
hctx, so that the requests are spliced only by the call for the mapped hctx.

Signed-off-by: Dongli Zhang <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
---
block/blk-mq.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a935483..9612746 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2219,6 +2219,10 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
 	enum hctx_type type;
 
 	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
+
+	if (!cpumask_test_cpu(cpu, hctx->cpumask))
+		return 0;
+
 	ctx = __blk_mq_get_ctx(hctx->queue, cpu);
 	type = hctx->type;
--
2.7.4
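
To make the effect of the added check easier to see, here is a minimal
userspace sketch of the scenario described in the commit message. It is not
kernel code: the toy_* types and functions are hypothetical stand-ins, the
1:1 CPU-to-hctx mapping is an assumption taken from the 4-CPU example, and
the call order (queue_num 3, 2, 1, 0) simply mirrors the order listed above.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4
#define NR_HCTX 4

/* Hypothetical stand-in for struct blk_mq_hw_ctx (model only). */
struct toy_hctx {
	unsigned int queue_num;
	unsigned long cpumask;   /* bit n set => CPU n is mapped to this hctx */
	unsigned int dispatched; /* requests spliced onto this hctx */
};

/* Hypothetical stand-in for the per-CPU struct blk_mq_ctx. */
struct toy_ctx {
	unsigned int pending;    /* requests still sitting on the ctx */
};

static bool toy_cpu_mapped(unsigned int cpu, const struct toy_hctx *hctx)
{
	return (hctx->cpumask >> cpu) & 1UL;
}

/*
 * Model of a single notify_dead call for one hctx.
 * check_mapping == false mimics the behaviour before the patch,
 * check_mapping == true mimics the early return added by the patch.
 */
static int toy_notify_dead(unsigned int cpu, struct toy_hctx *hctx,
			   struct toy_ctx *ctxs, bool check_mapping)
{
	assert(cpu < NR_CPUS);

	if (check_mapping && !toy_cpu_mapped(cpu, hctx))
		return 0;

	/* "Splice" everything pending on the dead CPU's ctx to this hctx. */
	hctx->dispatched += ctxs[cpu].pending;
	ctxs[cpu].pending = 0;
	return 0;
}

/*
 * Offline `cpu` while it holds `pending` requests, calling the model once
 * per hctx in the order 3, 2, 1, 0. Returns the queue_num of the hctx that
 * received the requests, or -1 if nothing was moved.
 */
int toy_offline_cpu(unsigned int cpu, unsigned int pending, bool check_mapping)
{
	struct toy_hctx hctxs[NR_HCTX];
	struct toy_ctx ctxs[NR_CPUS] = { { 0 } };
	int receiver = -1;

	for (unsigned int i = 0; i < NR_HCTX; i++) {
		hctxs[i].queue_num = i;
		hctxs[i].cpumask = 1UL << i; /* assumed 1:1 mapping */
		hctxs[i].dispatched = 0;
	}
	ctxs[cpu].pending = pending;

	for (int i = NR_HCTX - 1; i >= 0; i--) {
		toy_notify_dead(cpu, &hctxs[i], ctxs, check_mapping);
		if (receiver < 0 && hctxs[i].dispatched > 0)
			receiver = (int)hctxs[i].queue_num;
	}
	return receiver;
}
```

In this model, toy_offline_cpu(2, 5, false) returns 3 (the requests land on
the first hctx visited), while toy_offline_cpu(2, 5, true) returns 2 (the
hctx that CPU 2 is mapped to).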
Hi Jens,
On 04/08/2019 07:12 PM, Dongli Zhang wrote:
> When a cpu is offline, blk_mq_hctx_notify_dead() is called once for each
> hctx for the offline cpu.
>
> While blk_mq_hctx_notify_dead() is used to splice all ctx->rq_lists[type]
> to hctx->dispatch, it never checks whether the ctx is already mapped to the
> hctx.
>
> For example, on a VM (with nvme) of 4 cpu, to offline cpu 2 out of the
> 4 cpu (0-3), blk_mq_hctx_notify_dead() is called once for each io queue
> hctx:
>
> 1st: blk_mq_ctx->cpu = 2 for blk_mq_hw_ctx->queue_num = 3
> 2nd: blk_mq_ctx->cpu = 2 for blk_mq_hw_ctx->queue_num = 2
> 3rd: blk_mq_ctx->cpu = 2 for blk_mq_hw_ctx->queue_num = 1
> 4th: blk_mq_ctx->cpu = 2 for blk_mq_hw_ctx->queue_num = 0
>
> Although blk_mq_ctx->cpu = 2 is only mapped to blk_mq_hw_ctx->queue_num = 2
> in this case, its ctx->rq_lists[type] will however be moved to
> blk_mq_hw_ctx->queue_num = 3 during the 1st call of
> blk_mq_hctx_notify_dead().
>
> This patch would return and go ahead to next call of
> blk_mq_hctx_notify_dead() if ctx is not mapped to hctx.
>
> Signed-off-by: Dongli Zhang <[email protected]>
> Reviewed-by: Ming Lei <[email protected]>
Would you consider this one?
In addition to Ming's Reviewed-by, there is another Reviewed-by from Keith at
the link below.
https://lore.kernel.org/linux-block/[email protected]/
Thank you very much!
Dongli Zhang
> ---
> block/blk-mq.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index a935483..9612746 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2219,6 +2219,10 @@ static int blk_mq_hctx_notify_dead(unsigned int cpu, struct hlist_node *node)
>  	enum hctx_type type;
>
>  	hctx = hlist_entry_safe(node, struct blk_mq_hw_ctx, cpuhp_dead);
> +
> +	if (!cpumask_test_cpu(cpu, hctx->cpumask))
> +		return 0;
> +
>  	ctx = __blk_mq_get_ctx(hctx->queue, cpu);
>  	type = hctx->type;
>
>