Subject: Re: [PATCH v3 7/7] blk-mq: fix deadlock when reading cpu_list
From: Wanpeng Li <wanpeng.li@hotmail.com>
To: Akinobu Mita, linux-kernel@vger.kernel.org
Cc: Jens Axboe, Ming Lei
Date: Tue, 28 Jul 2015 16:10:23 +0800
In-Reply-To: <1437236903-31617-8-git-send-email-akinobu.mita@gmail.com>

On 7/19/15 12:28 AM, Akinobu Mita wrote:
> CPU hotplug handling for blk-mq (blk_mq_queue_reinit) acquires
> all_q_mutex in blk_mq_queue_reinit_notify() and then removes sysfs
> entries via blk_mq_sysfs_unregister(). Removing a sysfs entry has to
> block until the active reference of its kernfs_node drops to zero.
>
> On the other hand, reading the blk_mq_hw_sysfs_cpu sysfs entry (e.g.
> /sys/block/nullb0/mq/0/cpu_list) acquires all_q_mutex in
> blk_mq_hw_sysfs_cpus_show().
>
> If these happen at the same time, a deadlock can occur, because one
> path waits for the active reference to drop to zero while holding
> all_q_mutex, and the other tries to acquire all_q_mutex while holding
> the active reference.
>
> The reason all_q_mutex is acquired in blk_mq_hw_sysfs_cpus_show() is
> to avoid reading an incomplete hctx->cpumask. Since reading a blk-mq
> sysfs entry already acquires q->sysfs_lock, we can avoid both the
> deadlock and reading an incomplete hctx->cpumask by holding
> q->sysfs_lock while hctx->cpumask is being updated.

After your patch, dumping the ctx attrs through sysfs will be mutually
excluded with the software-to-hardware queue mapping operations, which
may confuse people.
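To make the ordering concrete, here is a minimal userspace sketch of the
two paths described above, using pthreads in place of all_q_mutex and the
kernfs active reference. hotplug_path(), sysfs_read_path(), ref_lock and
active_refs are illustrative names only, not the kernel symbols:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t all_q_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t ref_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ref_zero    = PTHREAD_COND_INITIALIZER;
static int active_refs;  /* stands in for the kernfs active reference */

/* blk_mq_queue_reinit_notify() role: take all_q_mutex, then drain readers */
static void *hotplug_path(void *arg)
{
        pthread_mutex_lock(&all_q_mutex);        /* step 1: hold all_q_mutex */
        pthread_mutex_lock(&ref_lock);
        while (active_refs > 0)                  /* step 2: wait for refs == 0 */
                pthread_cond_wait(&ref_zero, &ref_lock);
        pthread_mutex_unlock(&ref_lock);
        pthread_mutex_unlock(&all_q_mutex);
        return NULL;
}

/* blk_mq_hw_sysfs_cpus_show() role: hold a reference, then take all_q_mutex */
static void *sysfs_read_path(void *arg)
{
        pthread_mutex_lock(&ref_lock);
        active_refs++;                           /* step 1: grab a reference */
        pthread_mutex_unlock(&ref_lock);

        pthread_mutex_lock(&all_q_mutex);        /* step 2: blocks forever if the
                                                  * hotplug path already holds it */
        pthread_mutex_unlock(&all_q_mutex);

        pthread_mutex_lock(&ref_lock);
        if (--active_refs == 0)
                pthread_cond_signal(&ref_zero);  /* never reached in the bad interleaving */
        pthread_mutex_unlock(&ref_lock);
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&b, NULL, sysfs_read_path, NULL);
        pthread_create(&a, NULL, hotplug_path, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("lucky interleaving, no deadlock this time\n");
        return 0;
}

Built with gcc -pthread, this hangs whenever sysfs_read_path() grabs its
reference before hotplug_path() takes all_q_mutex; that is the window the
patch removes by no longer taking all_q_mutex in the show path.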
Regards,
Wanpeng Li

>
> Signed-off-by: Akinobu Mita
> Cc: Jens Axboe
> Cc: Ming Lei
> ---
>  block/blk-mq-sysfs.c | 4 ----
>  block/blk-mq.c       | 7 +++++++
>  2 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
> index f63b464..e0f71bf 100644
> --- a/block/blk-mq-sysfs.c
> +++ b/block/blk-mq-sysfs.c
> @@ -218,8 +218,6 @@ static ssize_t blk_mq_hw_sysfs_cpus_show(struct blk_mq_hw_ctx *hctx, char *page)
>          unsigned int i, first = 1;
>          ssize_t ret = 0;
>
> -        blk_mq_disable_hotplug();
> -
>          for_each_cpu(i, hctx->cpumask) {
>                  if (first)
>                          ret += sprintf(ret + page, "%u", i);
> @@ -229,8 +227,6 @@ static ssize_t blk_mq_hw_sysfs_cpus_show(struct blk_mq_hw_ctx *hctx, char *page)
>                  first = 0;
>          }
>
> -        blk_mq_enable_hotplug();
> -
>          ret += sprintf(ret + page, "\n");
>          return ret;
>  }
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index b931e38..1a5e7d1 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1815,6 +1815,11 @@ static void blk_mq_map_swqueue(struct request_queue *q,
>          struct blk_mq_ctx *ctx;
>          struct blk_mq_tag_set *set = q->tag_set;
>
> +        /*
> +         * Avoid others reading incomplete hctx->cpumask through sysfs
> +         */
> +        mutex_lock(&q->sysfs_lock);
> +
>          queue_for_each_hw_ctx(q, hctx, i) {
>                  cpumask_clear(hctx->cpumask);
>                  hctx->nr_ctx = 0;
> @@ -1834,6 +1839,8 @@ static void blk_mq_map_swqueue(struct request_queue *q,
>                  hctx->ctxs[hctx->nr_ctx++] = ctx;
>          }
>
> +        mutex_unlock(&q->sysfs_lock);
> +
>          queue_for_each_hw_ctx(q, hctx, i) {
>                  struct blk_mq_ctxmap *map = &hctx->ctx_map;