Subject: Re: [PATCH 2/2] blk-mq: simplify queue mapping & schedule with each possisble CPU
To: Ming Lei, Jens Axboe, linux-block@vger.kernel.org, Christoph Hellwig
Cc: Christian Borntraeger, Stefan Haberland, Thomas Gleixner, linux-kernel@vger.kernel.org, Christoph Hellwig
References: <20180112025306.28004-1-ming.lei@redhat.com> <20180112025306.28004-3-ming.lei@redhat.com>
From: "jianchao.wang"
Message-ID: <0d36c16b-cb4b-6088-fdf3-2fe5d8f33cd7@oracle.com>
Date: Tue, 16 Jan 2018 18:12:09 +0800
In-Reply-To: <20180112025306.28004-3-ming.lei@redhat.com>

Hi Ming

On 01/12/2018 10:53 AM, Ming Lei wrote:
> From: Christoph Hellwig
>
> The previous patch assigns interrupt vectors to all possible CPUs, so
> now hctx can be mapped to possible CPUs. This patch applies that fact
> to simplify queue mapping & schedule so that we don't need to handle
> CPU hotplug for dealing with physical CPU plug & unplug. With this
> simplification, we can work well on physical CPU plug & unplug, which
> is a normal use case for VMs at least.
>
> Make sure we allocate blk_mq_ctx structures for all possible CPUs, and
> set hctx->numa_node for possible CPUs which are mapped to this hctx. And
> only choose the online CPUs for schedule.
>
> Reported-by: Christian Borntraeger
> Tested-by: Christian Borntraeger
> Tested-by: Stefan Haberland
> Cc: Thomas Gleixner
> Signed-off-by: Christoph Hellwig
> (merged the three into one because any single one may not work, and fix
> selecting online CPUs for scheduler)
> Signed-off-by: Ming Lei
> ---
>  block/blk-mq.c | 19 ++++++++-----------
>  1 file changed, 8 insertions(+), 11 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 8000ba6db07d..ef9beca2d117 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -440,7 +440,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
>  		blk_queue_exit(q);
>  		return ERR_PTR(-EXDEV);
>  	}
> -	cpu = cpumask_first(alloc_data.hctx->cpumask);
> +	cpu = cpumask_first_and(alloc_data.hctx->cpumask, cpu_online_mask);
>  	alloc_data.ctx = __blk_mq_get_ctx(q, cpu);
>
>  	rq = blk_mq_get_request(q, NULL, op, &alloc_data);
> @@ -1323,9 +1323,10 @@ static int blk_mq_hctx_next_cpu(struct blk_mq_hw_ctx *hctx)
>  	if (--hctx->next_cpu_batch <= 0) {
>  		int next_cpu;
>
> -		next_cpu = cpumask_next(hctx->next_cpu, hctx->cpumask);
> +		next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask,
> +				cpu_online_mask);
>  		if (next_cpu >= nr_cpu_ids)
> -			next_cpu = cpumask_first(hctx->cpumask);
> +			next_cpu = cpumask_first_and(hctx->cpumask,cpu_online_mask);

The next_cpu here could still be >= nr_cpu_ids when none of the CPUs in
hctx->cpumask is online. This can be reproduced on NVMe with a patch that
holds some requests on ctx->rq_list while a script onlines and offlines
the CPUs; a panic then occurs in __queue_work().

Maybe use cpu_possible_mask here instead: the workers in the pool of an
offlined CPU have been unbound, so it should be OK to queue work on them.
(A rough sketch of what I mean is at the end of this mail.)

Thanks
Jianchao

>
>  		hctx->next_cpu = next_cpu;
>  		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
> @@ -2219,16 +2220,11 @@ static void blk_mq_init_cpu_queues(struct request_queue *q,
>  		INIT_LIST_HEAD(&__ctx->rq_list);
>  		__ctx->queue = q;
>
> -		/* If the cpu isn't present, the cpu is mapped to first hctx */
> -		if (!cpu_present(i))
> -			continue;
> -
> -		hctx = blk_mq_map_queue(q, i);
> -
>  		/*
>  		 * Set local node, IFF we have more than one hw queue. If
>  		 * not, we remain on the home node of the device
>  		 */
> +		hctx = blk_mq_map_queue(q, i);
>  		if (nr_hw_queues > 1 && hctx->numa_node == NUMA_NO_NODE)
>  			hctx->numa_node = local_memory_node(cpu_to_node(i));
>  	}
> @@ -2285,7 +2281,7 @@ static void blk_mq_map_swqueue(struct request_queue *q)
>  	 *
>  	 * If the cpu isn't present, the cpu is mapped to first hctx.
>  	 */
> -	for_each_present_cpu(i) {
> +	for_each_possible_cpu(i) {
>  		hctx_idx = q->mq_map[i];
>  		/* unmapped hw queue can be remapped after CPU topo changed */
>  		if (!set->tags[hctx_idx] &&
> @@ -2339,7 +2335,8 @@ static void blk_mq_map_swqueue(struct request_queue *q)
>  		/*
>  		 * Initialize batch roundrobin counts
>  		 */
> -		hctx->next_cpu = cpumask_first(hctx->cpumask);
> +		hctx->next_cpu = cpumask_first_and(hctx->cpumask,
> +				cpu_online_mask);
>  		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
>  	}
> }
>
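
For clarity, here is a rough, untested sketch of the cpu_possible_mask
fallback I have in mind, written against the blk_mq_hctx_next_cpu() hunk
quoted above. Only the mask used in the fallback differs from the posted
patch; this is my suggestion, not anything in the series:

	/*
	 * Untested sketch of the suggestion above: still prefer online CPUs,
	 * but fall back to cpu_possible_mask so that next_cpu can never end
	 * up >= nr_cpu_ids when no CPU in hctx->cpumask is online. The
	 * workqueue pool of an offlined CPU is unbound, so queueing work on
	 * such a CPU should still be safe.
	 */
	if (--hctx->next_cpu_batch <= 0) {
		int next_cpu;

		next_cpu = cpumask_next_and(hctx->next_cpu, hctx->cpumask,
				cpu_online_mask);
		if (next_cpu >= nr_cpu_ids)
			next_cpu = cpumask_first_and(hctx->cpumask,
					cpu_possible_mask);

		hctx->next_cpu = next_cpu;
		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
	}

Since hctx->cpumask only ever contains possible CPUs,
cpumask_first_and(hctx->cpumask, cpu_possible_mask) is equivalent to plain
cpumask_first(hctx->cpumask); the point is simply that the result is
always a valid CPU index.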