Subject: Re: [PATCH 3/4] blk-mq: establish new mapping before cpu starts handling requests
From: Akinobu Mita
To: Ming Lei
Cc: Linux Kernel Mailing List, Jens Axboe
Date: Mon, 29 Jun 2015 23:25:44 +0900

2015-06-28 1:08 GMT+09:00 Ming Lei :
> On Sat, Jun 27, 2015 at 1:14 AM, Akinobu Mita wrote:
>> Akinobu Mita wrote:
>>> 2015-06-26 0:40 GMT+09:00 Ming Lei :
>>> > On Thu, 25 Jun 2015 21:49:43 +0900
>>> > Akinobu Mita wrote:
>>> >> For example, there is a single hw queue (hctx) and two CPU queues
>>> >> (ctx0 for CPU0, and ctx1 for CPU1). Now CPU1 has just been onlined
>>> >> and a request is inserted into ctx1->rq_list, which sets bit0 in
>>> >> the pending bitmap because ctx1->index_hw is still zero.
>>> >>
>>> >> And then while running the hw queue, flush_busy_ctxs() finds bit0
>>> >> set in the pending bitmap and tries to retrieve requests from
>>> >> hctx->ctxs[0].rq_list. But hctx->ctxs[0] is ctx0, so the request in
>>> >> ctx1->rq_list is ignored.
>>> >
>>> > Per the current design, the request should have been inserted into
>>> > ctx0 instead of ctx1, because ctx1 isn't mapped yet even though
>>> > ctx1->cpu has become ONLINE.
>>> >
>>> > So how about the following patch, which looks much simpler?
>>>
>>> OK, I'll try this patch to see if the problem disappears.
>>
>> This doesn't fix the problem. Because:
>>
>>> > diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> > index f537796..2f45b73 100644
>>> > --- a/block/blk-mq.c
>>> > +++ b/block/blk-mq.c
>>> > @@ -1034,7 +1034,12 @@ void blk_mq_insert_request(struct request *rq, bool at_head, bool run_queue,
>>> >  	struct blk_mq_ctx *ctx = rq->mq_ctx, *current_ctx;
>>> >
>>> >  	current_ctx = blk_mq_get_ctx(q);
>>> > -	if (!cpu_online(ctx->cpu))
>>> > +	/*
>>> > +	 * ctx->cpu may become ONLINE but ctx hasn't been mapped to
>>> > +	 * hctx yet because there is a tiny race window between
>>> > +	 * ctx->cpu ONLINE and doing the remap
>>> > +	 */
>>> > +	if (!blk_mq_ctx_mapped(ctx))
>>> >  		rq->mq_ctx = ctx = current_ctx;
>>
>> The process running on the just-onlined CPU1 in the above example can
>> satisfy this condition, and current_ctx will be ctx1. So the same
>> scenario can happen (the request is ignored by flush_busy_ctxs).
>
> Yeah, that is possible, and it should be a bug in blk_mq_insert_request(),
> because the function assumes that the current ctx is mapped.
>
> Then I think the approach in your 1st email of this thread may be
> a good one, and it looks like we have to do the remapping in the
> CPU UP_PREPARE notifier.

OK, we can move on to the other topic that you suggested in the other
mail thread.
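To make the race easier to see, here is a minimal userspace model of
the scenario above (plain C with simplified names; 'pending' stands in
for the ctx map bitmap and 'nr_pending' for ctx->rq_list; this is only
an illustration, not the actual kernel code):

/* Model of the lost-request race: ctx1 queues a request under a stale
 * index_hw, so the flush loop drains the wrong software queue. */
#include <stdio.h>

struct ctx {
	int index_hw;		/* position inside hctx->ctxs[] */
	int nr_pending;		/* stands in for ctx->rq_list */
};

struct hctx {
	struct ctx *ctxs[2];	/* filled in by the ctx->hctx remap */
	unsigned long pending;	/* stands in for the pending bitmap */
};

int main(void)
{
	struct ctx ctx0 = { .index_hw = 0 }, ctx1 = { .index_hw = 0 };
	struct hctx h = { .ctxs = { &ctx0, NULL }, .pending = 0 };

	/* CPU1 is onlined but the remap has not run yet, so
	 * ctx1->index_hw is still zero and hctx only knows about ctx0. */
	ctx1.nr_pending++;			/* insert into ctx1->rq_list */
	h.pending |= 1UL << ctx1.index_hw;	/* sets bit0: stale index */

	/* flush_busy_ctxs(): bit0 is set, so ctxs[0] (= ctx0) is drained,
	 * and the request sitting in ctx1 is never seen. */
	for (int bit = 0; bit < 2; bit++) {
		if (!(h.pending & (1UL << bit)) || !h.ctxs[bit])
			continue;
		printf("draining ctxs[%d]: %d request(s)\n",
		       bit, h.ctxs[bit]->nr_pending);
		h.ctxs[bit]->nr_pending = 0;
		h.pending &= ~(1UL << bit);
	}
	printf("requests left behind in ctx1: %d\n", ctx1.nr_pending);
	return 0;
}

Running it drains ctxs[0] (which is empty) and reports one request left
behind in ctx1, which is exactly the symptom described above.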
>> I found a simple alternative solution that assigns the offline CPUs
>> unique ctx->index_hw values.
>
> This solution looks simpler, but it may not be correct.

You are right. This solution needs amendments, just in case we need to
come back to this solution instead of the first approach.

>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 594eea0..a8fcfbf 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -1787,10 +1787,11 @@ static void blk_mq_map_swqueue(struct request_queue *q)
>>  	 */
>>  	queue_for_each_ctx(q, ctx, i) {
>>  		/* If the cpu isn't online, the cpu is mapped to first hctx */
>> -		if (!cpu_online(i))
>> -			continue;
>> +		if (!cpu_online(i) && cpu_possible(i))
>
> The cpu_possible() check isn't needed.

OK.

>> +			hctx = q->mq_ops->map_queue(q, 0);
>> +		else
>> +			hctx = q->mq_ops->map_queue(q, i);
>
> The above change supposes that all offline CPUs (ctxs)
> share the same 'hctx' mapped from CPU 0, and that is
> obviously wrong.
>
> All offline CPUs should have shared the 1st 'hctx' instead
> of the 'hctx' mapped from CPU 0.

Do you mean that we should use 'q->queue_hw_ctx[0]' for offline CPUs?

>>
>> -		hctx = q->mq_ops->map_queue(q, i);
>>  		cpumask_set_cpu(i, hctx->cpumask);
>
> CPU i shouldn't be set in hctx->cpumask in this approach
> if it isn't online.

OK.

>>  		ctx->index_hw = hctx->nr_ctx;
>>  		hctx->ctxs[hctx->nr_ctx++] = ctx;
>
> I am not sure the current code is ready for adding an offline 'ctx'
> into a 'hctx', and there are some observable effects, at least:
>
> - blk_mq_run_hw_queue() can run even if the hctx has no mapped 'ctx'

Is this fixed by not setting hctx->cpumask for offline CPUs?

> - the offline ctx kobject will be exposed to user space in sysfs

OK, this should be avoided.

> - blk_mq_hw_queue_mapped() may become always true
> - set->tags[i] (request entries) may not be freed even if there
>   aren't any mapped 'ctx's in one 'hctx'

Don't these only happen for the 1st hctx (i.e. 'q->queue_hw_ctx[0]')?
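For the record, in case we do come back to this solution, folding in
the points above would give something like the following untested
sketch of the blk_mq_map_swqueue() loop (it assumes all offline ctxs
should hang off q->queue_hw_ctx[0] and stay out of hctx->cpumask):

	queue_for_each_ctx(q, ctx, i) {
		/* Offline CPUs (ctxs) all share the 1st hctx, and each
		 * still gets a unique ctx->index_hw within it. */
		if (cpu_online(i)) {
			hctx = q->mq_ops->map_queue(q, i);
			/* only online CPUs join the hw queue's cpumask */
			cpumask_set_cpu(i, hctx->cpumask);
		} else {
			hctx = q->queue_hw_ctx[0];
		}

		ctx->index_hw = hctx->nr_ctx;
		hctx->ctxs[hctx->nr_ctx++] = ctx;
	}

The sysfs exposure of offline ctx kobjects, blk_mq_hw_queue_mapped(),
and the set->tags[] lifetime would still need separate handling on top
of this.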