Date: Tue, 30 Jun 2015 17:06:20 +0800
Subject: Re: [PATCH 3/4] blk-mq: establish new mapping before cpu starts handling requests
From: Ming Lei
To: Akinobu Mita
Cc: Linux Kernel Mailing List, Jens Axboe
References: <1434894751-6877-1-git-send-email-akinobu.mita@gmail.com>
 <1434894751-6877-4-git-send-email-akinobu.mita@gmail.com>
 <20150625234030.4dc99725@tom-T450>
 <1435338857.30650.6.camel@mita-ThinkPad-T540p>

On Mon, Jun 29, 2015 at 10:25 PM, Akinobu Mita wrote:
> 2015-06-28 1:08 GMT+09:00 Ming Lei :
>> On Sat, Jun 27, 2015 at 1:14 AM, Akinobu Mita wrote:
>>> Akinobu Mita wrote:
>>>> 2015-06-26 0:40 GMT+09:00 Ming Lei :
>>>> > On Thu, 25 Jun 2015 21:49:43 +0900
>>>> > Akinobu Mita wrote:
>>>> >> For example, there is a single hw queue (hctx) and two CPU queues
>>>> >> (ctx0 for CPU0, and ctx1 for CPU1). Now CPU1 is just onlined and
>>>> >> a request is inserted into ctx1->rq_list and bit0 is set in the
>>>> >> pending bitmap, as ctx1->index_hw is still zero.
>>>> >>
>>>> >> And then, while running the hw queue, flush_busy_ctxs() finds bit0
>>>> >> set in the pending bitmap and tries to retrieve requests in
>>>> >> hctx->ctxs[0].rq_list. But hctx->ctxs[0] is ctx0, so the request in
>>>> >> ctx1->rq_list is ignored.
>>>> >
>>>> > Per the current design, the request should have been inserted into ctx0
>>>> > instead of ctx1, because ctx1 isn't mapped yet even though ctx1->cpu
>>>> > has become ONLINE.
>>>> >
>>>> > So how about the following patch? It looks much simpler.
>>>>
>>>> OK, I'll try this patch to see if the problem disappears.
>>>
>>> This doesn't fix the problem. Because:
>>>
>>>> > diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>> > index f537796..2f45b73 100644
>>>> > --- a/block/blk-mq.c
>>>> > +++ b/block/blk-mq.c
>>>> > @@ -1034,7 +1034,12 @@ void blk_mq_insert_request(struct request *rq, bool at_head, bool run_queue,
>>>> > 	struct blk_mq_ctx *ctx = rq->mq_ctx, *current_ctx;
>>>> >
>>>> > 	current_ctx = blk_mq_get_ctx(q);
>>>> > -	if (!cpu_online(ctx->cpu))
>>>> > +	/*
>>>> > +	 * ctx->cpu may become ONLINE but ctx hasn't been mapped to
>>>> > +	 * hctx yet because there is a tiny race window between
>>>> > +	 * ctx->cpu ONLINE and doing the remap
>>>> > +	 */
>>>> > +	if (!blk_mq_ctx_mapped(ctx))
>>>> > 		rq->mq_ctx = ctx = current_ctx;
>>>
>>> The process running on the just-onlined CPU1 in the above example can
>>> satisfy this condition and current_ctx will be ctx1. So the same
>>> scenario can happen (the request is ignored by flush_busy_ctxs).
>>
>> Yeah, that is possible, and it should be a bug in blk_mq_insert_request(),
>> because the function assumes that the current ctx is mapped.
>>
>> Then I think the approach in your 1st email of this thread may be
>> the good one, and it looks like we have to do the remapping in the
>> CPU UP_PREPARE notifier.
>
> OK, we can move on to the other topic that you suggested in the other
> mail thread.

Why?  That is just what this patch in this thread is doing. :-)
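Just to spell out the ordering problem once more, below is a stripped-down
userspace model of the race (the structures and helpers are simplified
stand-ins, not the real blk-mq ones): because the just-onlined ctx still has
index_hw == 0, both the pending bit and the hctx->ctxs[] slot point at ctx0,
so the request sitting on ctx1 is never flushed.

#include <stdio.h>

/*
 * Stripped-down model of the race (NOT the real blk-mq structures):
 * one hw queue, two sw queues.  ctx1 belongs to the just-onlined CPU1,
 * and because the remap hasn't run yet its index_hw is still 0.
 */
struct ctx  { int index_hw; int nr_reqs; };
struct hctx { struct ctx *ctxs[2]; unsigned long pending; };

static void insert_request(struct hctx *h, struct ctx *c)
{
	c->nr_reqs++;
	h->pending |= 1UL << c->index_hw;	/* ctx1 also sets bit0 here */
}

static void flush_busy_ctxs(struct hctx *h)
{
	for (int bit = 0; bit < 2; bit++) {
		if (!(h->pending & (1UL << bit)))
			continue;
		/* bit0 maps back to ctxs[0] == ctx0, so ctx1's request is never seen */
		printf("flushing %d request(s) from ctxs[%d]\n",
		       h->ctxs[bit]->nr_reqs, bit);
		h->ctxs[bit]->nr_reqs = 0;
		h->pending &= ~(1UL << bit);
	}
}

int main(void)
{
	struct ctx ctx0 = { .index_hw = 0 };
	struct ctx ctx1 = { .index_hw = 0 };	/* not remapped yet */
	struct hctx h = { .ctxs = { &ctx0, &ctx1 } };

	insert_request(&h, &ctx1);	/* submitted from the just-onlined CPU1 */
	flush_busy_ctxs(&h);		/* only looks at ctx0, which is empty */
	printf("requests stranded on ctx1: %d\n", ctx1.nr_reqs);
	return 0;
}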
>
>>> I found a simple alternative solution that assigns the offline CPUs
>>> unique ctx->index_hw.
>>
>> This solution looks simpler, but it may not be correct.
>
> You are right. This solution needs amendments, just in case we need
> to come back to this solution instead of the first approach.
>
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 594eea0..a8fcfbf 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -1787,10 +1787,11 @@ static void blk_mq_map_swqueue(struct request_queue *q)
>>> 	 */
>>> 	queue_for_each_ctx(q, ctx, i) {
>>> 		/* If the cpu isn't online, the cpu is mapped to first hctx */
>>> -		if (!cpu_online(i))
>>> -			continue;
>>> +		if (!cpu_online(i) && cpu_possible(i))
>>
>> The cpu_possible() check isn't needed.
>
> OK.
>
>>> +			hctx = q->mq_ops->map_queue(q, 0);
>>> +		else
>>> +			hctx = q->mq_ops->map_queue(q, i);
>>
>> The above change assumes that all offline CPUs (ctxs)
>> share the same 'hctx' mapped from CPU 0, and that is
>> obviously wrong.
>>
>> All offline CPUs should have shared the 1st 'hctx' instead
>> of the 'hctx' mapped from CPU 0.
>
> Do you mean that we should use 'q->queue_hw_ctx[0]' for offline CPUs?

Yes, it is the 1st hctx, but I don't think it is a good idea to add the
unmapped ctx into the 1st hctx because it may bring side effects.

>
>>>
>>> -		hctx = q->mq_ops->map_queue(q, i);
>>> 		cpumask_set_cpu(i, hctx->cpumask);
>>
>> CPU i shouldn't have been set in hctx->cpumask in this approach
>> if it isn't online.
>
> OK.
>
>>> 		ctx->index_hw = hctx->nr_ctx;
>>> 		hctx->ctxs[hctx->nr_ctx++] = ctx;
>>
>> I am not sure the current code is ready for adding an offline 'ctx' into
>> a 'hctx', and there are some observable effects at least:
>>
>> - blk_mq_run_hw_queue() can run even if the hctx has no mapped
>>   'ctx'
>
> Is this fixed by not setting hctx->cpumask for offline CPUs?
>
>> - the offline ctx kobject will be exposed to user space in sysfs
>
> OK, this should be avoided.
>
>> - blk_mq_hw_queue_mapped() may become always true
>> - set->tags[i] (request entries) may not be freed even if there
>>   are no mapped 'ctx's in one 'hctx'
>
> Aren't these only happening for the 1st hctx (i.e. 'q->queue_hw_ctx[0]')?

I mean it may bring lots of issues if you add the unmapped ctx into
the 1st hctx by 'hctx->ctxs[hctx->nr_ctx++] = ctx', and that may make
the fix quite complicated.

So I think it will be good to do the remapping in the CPU UP_PREPARE
notifier, and I will review this patch later.

Thanks,
--
Ming Lei
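P.S. To make "do the remapping in the CPU UP_PREPARE notifier" a bit more
concrete, here is a completely untested sketch of the direction I mean.
Only the hotplug-notifier plumbing is the standard pattern;
blk_mq_remap_queues() is a made-up name standing in for whatever helper
ends up redoing the ctx-to-hctx mapping before the cpu is marked online:

#include <linux/cpu.h>
#include <linux/init.h>
#include <linux/notifier.h>

/* made-up placeholder: redo the ctx -> hctx mapping for all queues */
static void blk_mq_remap_queues(unsigned int cpu)
{
}

static int blk_mq_cpu_up_prep(struct notifier_block *nb,
			      unsigned long action, void *hcpu)
{
	unsigned int cpu = (unsigned long)hcpu;

	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_UP_PREPARE:
		/*
		 * Remap here, before the cpu is marked online, so a request
		 * submitted from that cpu right after onlining always finds
		 * a ctx that is already mapped to a hctx.
		 */
		blk_mq_remap_queues(cpu);
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block blk_mq_cpu_up_nb = {
	.notifier_call = blk_mq_cpu_up_prep,
};

static int __init blk_mq_up_prep_init(void)
{
	register_hotcpu_notifier(&blk_mq_cpu_up_nb);
	return 0;
}
subsys_initcall(blk_mq_up_prep_init);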