From: Ming Lei
To: Akinobu Mita
Cc: Linux Kernel Mailing List, Jens Axboe
Subject: Re: [PATCH 4/4] blk-mq: fix mq_usage_counter race when switching to percpu mode
Date: Wed, 24 Jun 2015 20:35:37 +0800

On Sun, Jun 21, 2015 at 9:52 PM, Akinobu Mita wrote:
> percpu_ref_switch_to_percpu() and percpu_ref_kill() must not be
> executed at the same time, as the following scenario is otherwise
> possible:
>
> 1. q->mq_usage_counter is initialized in atomic mode.
>    (atomic counter: 1)
>
> 2. After the disk registration, a process like systemd-udev starts
>    accessing the disk, and successfully increases the refcount by
>    percpu_ref_tryget_live() in blk_mq_queue_enter().
>    (atomic counter: 2)
>
> 3. In the final stage of initialization, q->mq_usage_counter is being
>    switched to percpu mode by percpu_ref_switch_to_percpu() in
>    blk_mq_finish_init(). But if CONFIG_PREEMPT_VOLUNTARY is enabled,
>    the process is rescheduled in the middle of the switch when calling
>    wait_event() in __percpu_ref_switch_to_percpu().
>    (atomic counter: 2)

This can only happen when freezing the queue from CPU hotplug occurs
before wait_event() is run from blk_mq_finish_init(). So it looks like
we could still avoid the race by moving the addition of the queue to
all_q_list into blk_mq_register_disk(), but the mapping would need to
be updated at that time.

>
> 4. CPU hotplug handling for blk-mq calls percpu_ref_kill() to freeze
>    the request queue. q->mq_usage_counter is decreased and marked as
>    DEAD. Wait until all requests have finished.
>    (atomic counter: 1)
>
> 5. The process rescheduled in step 3 is resumed and finishes
>    all remaining work in __percpu_ref_switch_to_percpu().
>    A bias value is added to the atomic counter of q->mq_usage_counter.
>    (atomic counter: PERCPU_COUNT_BIAS + 1)
>
> 6. A request issued in step 2 is finished and q->mq_usage_counter
>    is decreased by blk_mq_queue_exit(). q->mq_usage_counter is DEAD,
>    so the atomic counter is decreased and no release handler is called.
>    (atomic counter: PERCPU_COUNT_BIAS)
>
> 7. CPU hotplug handling in step 4 will wait forever as
>    q->mq_usage_counter will never be zero.
>
> Also, percpu_ref_reinit() and percpu_ref_kill() must not be executed
> at the same time, because both functions could call
> __percpu_ref_switch_to_percpu(), which adds the bias value and
> initializes the percpu counter.
>
> Fix those races by serializing with a mutex.
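
For readers following along, here is a minimal sketch of the
serialization pattern the patch introduces, pulled out of the blk-mq
context. The usage_counter struct and helper names below are
illustrative only; the percpu_ref and mutex calls mirror what the
patch below does inside struct request_queue:

#include <linux/gfp.h>
#include <linux/mutex.h>
#include <linux/percpu-refcount.h>

/*
 * Illustrative only: a stand-alone version of the locking pattern.
 * The real patch puts the mutex in struct request_queue and takes it
 * around the existing percpu_ref calls in blk-mq.
 */
struct usage_counter {
	struct percpu_ref ref;
	struct mutex lock;	/* serializes mode switch vs. kill/reinit */
};

static void usage_release(struct percpu_ref *ref)
{
	/* in blk-mq this wakes up waiters on mq_freeze_wq */
}

static int usage_counter_init(struct usage_counter *u)
{
	mutex_init(&u->lock);
	/* start in atomic mode, as blk_mq_init_allocated_queue() does */
	return percpu_ref_init(&u->ref, usage_release,
			       PERCPU_REF_INIT_ATOMIC, GFP_KERNEL);
}

/* step 3 above: switch to percpu mode once initialization is done */
static void usage_counter_finish_init(struct usage_counter *u)
{
	mutex_lock(&u->lock);
	percpu_ref_switch_to_percpu(&u->ref);
	mutex_unlock(&u->lock);
}

/* step 4 above: the freeze path can no longer overlap with the switch */
static void usage_counter_freeze(struct usage_counter *u)
{
	mutex_lock(&u->lock);
	percpu_ref_kill(&u->ref);
	mutex_unlock(&u->lock);
}

static void usage_counter_unfreeze(struct usage_counter *u)
{
	mutex_lock(&u->lock);
	percpu_ref_reinit(&u->ref);
	mutex_unlock(&u->lock);
}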
>
> Signed-off-by: Akinobu Mita
> Cc: Jens Axboe
> ---
>  block/blk-mq-sysfs.c   | 2 ++
>  block/blk-mq.c         | 6 ++++++
>  include/linux/blkdev.h | 6 ++++++
>  3 files changed, 14 insertions(+)
>
> diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
> index d8ef3a3..af3e126 100644
> --- a/block/blk-mq-sysfs.c
> +++ b/block/blk-mq-sysfs.c
> @@ -407,7 +407,9 @@ static void blk_mq_sysfs_init(struct request_queue *q)
>  /* see blk_register_queue() */
>  void blk_mq_finish_init(struct request_queue *q)
>  {
> +	mutex_lock(&q->mq_usage_lock);
>  	percpu_ref_switch_to_percpu(&q->mq_usage_counter);
> +	mutex_unlock(&q->mq_usage_lock);
>  }
>
>  int blk_mq_register_disk(struct gendisk *disk)
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 64d93e4..62d0ef1 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -119,7 +119,9 @@ void blk_mq_freeze_queue_start(struct request_queue *q)
>  	spin_unlock_irq(q->queue_lock);
>
>  	if (freeze) {
> +		mutex_lock(&q->mq_usage_lock);
>  		percpu_ref_kill(&q->mq_usage_counter);
> +		mutex_unlock(&q->mq_usage_lock);
>  		blk_mq_run_hw_queues(q, false);
>  	}
>  }
> @@ -150,7 +152,9 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
>  	WARN_ON_ONCE(q->mq_freeze_depth < 0);
>  	spin_unlock_irq(q->queue_lock);
>  	if (wake) {
> +		mutex_lock(&q->mq_usage_lock);
>  		percpu_ref_reinit(&q->mq_usage_counter);
> +		mutex_unlock(&q->mq_usage_lock);
>  		wake_up_all(&q->mq_freeze_wq);
>  	}
>  }
> @@ -1961,6 +1965,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
>  		hctxs[i]->queue_num = i;
>  	}
>
> +	mutex_init(&q->mq_usage_lock);
> +
>  	/*
>  	 * Init percpu_ref in atomic mode so that it's faster to shutdown.
>  	 * See blk_register_queue() for details.
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 5d93a66..c5bf534 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -484,6 +484,12 @@ struct request_queue {
>  	struct rcu_head		rcu_head;
>  	wait_queue_head_t	mq_freeze_wq;
>  	struct percpu_ref	mq_usage_counter;
> +	/*
> +	 * Protect concurrent access from percpu_ref_switch_to_percpu and
> +	 * percpu_ref_kill, and access from percpu_ref_switch_to_percpu and
> +	 * percpu_ref_reinit.
> +	 */
> +	struct mutex		mq_usage_lock;
>  	struct list_head	all_q_node;
>
>  	struct blk_mq_tag_set	*tag_set;
> --
> 1.9.1
>

--
Ming Lei