Date: Thu, 25 Jun 2015 23:40:30 +0800
From: Ming Lei
To: Akinobu Mita
Cc: Linux Kernel Mailing List, Jens Axboe
Subject: Re: [PATCH 3/4] blk-mq: establish new mapping before cpu starts handling requests
Message-ID: <20150625234030.4dc99725@tom-T450>
References: <1434894751-6877-1-git-send-email-akinobu.mita@gmail.com> <1434894751-6877-4-git-send-email-akinobu.mita@gmail.com>

On Thu, 25 Jun 2015 21:49:43 +0900 Akinobu Mita wrote:

> 2015-06-25 17:07 GMT+09:00 Ming Lei :
> > On Thu, Jun 25, 2015 at 10:56 AM, Akinobu Mita wrote:
> >> 2015-06-25 1:24 GMT+09:00 Ming Lei :
> >>> On Wed, Jun 24, 2015 at 10:34 PM, Akinobu Mita wrote:
> >>>> Hi Ming,
> >>>>
> >>>> 2015-06-24 18:46 GMT+09:00 Ming Lei :
> >>>>> On Sun, Jun 21, 2015 at 9:52 PM, Akinobu Mita wrote:
> >>>>>> ctx->index_hw is zero for the CPUs which have never been onlined since
> >>>>>> the block queue was initialized.  If one of those CPUs is hotadded and
> >>>>>> starts handling request before new mappings are established, pending
> >>>>>
> >>>>> Could you explain a bit what the handling request is?  The fact is that
> >>>>> blk_mq_queue_reinit() is run after all queues are put into freezing.
> >>>>
> >>>> Notifier callbacks for the CPU_ONLINE action can be run on a CPU other
> >>>> than the CPU which was just onlined.
So it is possible for a
> >>>> process running on the just-onlined CPU to insert a request and run the
> >>>> hw queue before blk_mq_queue_reinit_notify() is actually called with
> >>>> action=CPU_ONLINE.
> >>>
> >>> You are right, because blk_mq_queue_reinit_notify() is always run after
> >>> the CPU becomes UP, so there is a tiny window in which the CPU is up
> >>> but the mapping is not updated yet.  Per current design, the CPU just
> >>> onlined is still mapped to hw queue 0 until the mapping is updated by
> >>> blk_mq_queue_reinit_notify().
> >>>
> >>> But I am wondering why it is a problem and why you think flush_busy_ctxs
> >>> can't find the requests on the software queue in this situation?
> >>
> >> The problem happens when the CPU has just been onlined for the first time
> >> since the request queue was initialized.  At this time ctx->index_hw
> >> for the CPU is still zero before blk_mq_queue_reinit_notify is called.
> >>
> >> The request can be inserted to ctx->rq_list, but blk_mq_hctx_mark_pending()
> >> marks busy for the wrong bit position, as ctx->index_hw is zero.
> >
> > It isn't the wrong bit, since the CPU just onlined is still mapped to
> > hctx 0 at that time.
>
> ctx->index_hw is not the CPU queue to hw queue mapping.
> ctx->index_hw is the index in hctx->ctxs[] for this ctx.
> Each ctx in a hw queue should have a unique ctx->index_hw.

You are right, sorry for my mistake.

> This problem is reproducible with a single hw queue.  (The script in the
> cover letter can reproduce this problem with a single hw queue.)
>
> >> flush_busy_ctxs() only retrieves the requests from software queues
> >> which are marked busy.  So the request just inserted is ignored, as
> >> the corresponding bit position is not busy.
> >
> > Before making the remap in blk_mq_queue_reinit() for the CPU topo change,
> > the request queue will be put into freezing first, and all requests
> > inserted to hctx 0 should be retrieved and scheduled out.  So can the
> > request be ignored by flush_busy_ctxs()?
>
> For example, there is a single hw queue (hctx) and two CPU queues
> (ctx0 for CPU0, and ctx1 for CPU1).  Now CPU1 is just onlined and
> a request is inserted into ctx1->rq_list, and bit0 is set in the pending
> bitmap, as ctx1->index_hw is still zero.
>
> And then while running the hw queue, flush_busy_ctxs() finds bit0 is set
> in the pending bitmap and tries to retrieve requests in
> hctx->ctxs[0].rq_list.  But hctx->ctxs[0] is ctx0, so the request in
> ctx1->rq_list is ignored.

Per current design, the request should have been inserted into ctx0 instead
of ctx1, because ctx1 isn't mapped yet even though ctx1->cpu has become
ONLINE.  So how about the following patch?  It looks much simpler.

---
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f537796..2f45b73 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1034,7 +1034,12 @@ void blk_mq_insert_request(struct request *rq, bool at_head, bool run_queue,
 	struct blk_mq_ctx *ctx = rq->mq_ctx, *current_ctx;
 
 	current_ctx = blk_mq_get_ctx(q);
-	if (!cpu_online(ctx->cpu))
+	/*
+	 * ctx->cpu may become ONLINE but ctx hasn't been mapped to
+	 * hctx yet because there is a tiny race window between
+	 * ctx->cpu ONLINE and doing the remap
+	 */
+	if (!blk_mq_ctx_mapped(ctx))
 		rq->mq_ctx = ctx = current_ctx;
 
 	hctx = q->mq_ops->map_queue(q, ctx->cpu);
@@ -1063,7 +1068,7 @@ static void blk_mq_insert_requests(struct request_queue *q,
 
 		current_ctx = blk_mq_get_ctx(q);
 
-		if (!cpu_online(ctx->cpu))
+		if (!blk_mq_ctx_mapped(ctx))
 			ctx = current_ctx;
 		hctx = q->mq_ops->map_queue(q, ctx->cpu);
 
@@ -1816,13 +1821,16 @@ static void blk_mq_map_swqueue(struct request_queue *q)
 	 */
 	queue_for_each_ctx(q, ctx, i) {
 		/* If the cpu isn't online, the cpu is mapped to first hctx */
-		if (!cpu_online(i))
+		if (!cpu_online(i)) {
+			ctx->mapped = 0;
 			continue;
+		}
 
 		hctx = q->mq_ops->map_queue(q, i);
 		cpumask_set_cpu(i, hctx->cpumask);
 		cpumask_set_cpu(i, hctx->tags->cpumask);
 		ctx->index_hw = hctx->nr_ctx;
+		ctx->mapped = 1;
 		hctx->ctxs[hctx->nr_ctx++] = ctx;
 	}
 
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 6a48c4c..52819ad 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -10,7 +10,8 @@ struct blk_mq_ctx {
 	} ____cacheline_aligned_in_smp;
 
 	unsigned int		cpu;
-	unsigned int		index_hw;
+	unsigned int		index_hw : 16;
+	unsigned int		mapped : 1;
 
 	unsigned int		last_tag ____cacheline_aligned_in_smp;
 
@@ -123,4 +124,9 @@ static inline bool blk_mq_hw_queue_mapped(struct blk_mq_hw_ctx *hctx)
 	return hctx->nr_ctx && hctx->tags;
 }
 
+static inline bool blk_mq_ctx_mapped(struct blk_mq_ctx *ctx)
+{
+	return ctx->mapped;
+}
+
 #endif