Received: by 2002:a05:6a10:f347:0:0:0:0 with SMTP id d7csp3616120pxu; Mon, 30 Nov 2020 06:57:52 -0800 (PST) X-Google-Smtp-Source: ABdhPJzIaehaIyTmTYlRTfF8hSATWcbi5M6IMqdIWEdldujRt9G+Uh65jLFOv16PzGi7qN1CF+7m X-Received: by 2002:a17:906:7f0b:: with SMTP id d11mr5006144ejr.7.1606748271963; Mon, 30 Nov 2020 06:57:51 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1606748271; cv=none; d=google.com; s=arc-20160816; b=xrKvy47uzR1KTXchK1+eIaVIu62xfWQDv3I6dRFpJiSfnH2OJaxQkMlprRsNS+QDSh iG5lZtxjx4eyqEUKstVueytj6vNFftF+WCRGxe9JP+269pm0NewQYkB9ryW/0Nwrd1u0 T4WMwIdQiDuyN/f1Sw8YLQU0sG/rWmUdwvlzsiWh3dRrWRzm1w7pQxkA4vCQ3KHNkj0y j57l/uUj9DpJ5p5TVCNzQKlmk/P/H6gAZOe29gxFRubDBvs70WeQ65J5paDfmpCftWHQ 7rHgIQQf15IvhkfC9BSlwMB1Qf6XJpZCpLeu8x84MbT0NmQN4M6RvxkwX/QDVYIG633r NyEQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject; bh=KfHMSwun2bk+hfn7brGiMjswJviA+e5vR0WPKlbFLcM=; b=F/vL/xtyDcBZ2ml+tvq4uIkf//z5doO4n4AvobKvJOWB/Jhin8+pgh7kba1Jlt6evE crfWEIwdu1l7prFuKccV1+DqcAlImZ05GgKF6Vd0++GraVkyFQWC8iZxRzh0q8ktf5oy cxu1HB3sKT+PhV4B62qO1E8jal8dm+XWISwy8xQ+yD/CtpPShkcxUgdBFN+Zsme+aLkE P3IxPkBOVbmxSEopXPpIthHw6pqrrTg218PStu4Jz418MVq9T3nsAJo2GrrzRx5AqB9B 2xSoVyEe6FaaYY1jeQ55j9rKoW4/fVwEXQeB/JVOlMsrZ+/RZtYGv2hJNp0EjmSndpJy nZ8A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id ay17si769454ejb.702.2020.11.30.06.57.28; Mon, 30 Nov 2020 06:57:51 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727341AbgK3OzY (ORCPT + 99 others); Mon, 30 Nov 2020 09:55:24 -0500 Received: from frasgout.his.huawei.com ([185.176.79.56]:2178 "EHLO frasgout.his.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726514AbgK3OzY (ORCPT ); Mon, 30 Nov 2020 09:55:24 -0500 Received: from fraeml708-chm.china.huawei.com (unknown [172.18.147.226]) by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4Cl7VD4LYpz67KcS; Mon, 30 Nov 2020 22:51:48 +0800 (CST) Received: from lhreml724-chm.china.huawei.com (10.201.108.75) by fraeml708-chm.china.huawei.com (10.206.15.36) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2106.2; Mon, 30 Nov 2020 15:54:40 +0100 Received: from [10.47.3.199] (10.47.3.199) by lhreml724-chm.china.huawei.com (10.201.108.75) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256) id 15.1.1913.5; Mon, 30 Nov 2020 14:54:38 +0000 Subject: Re: [PATCH V1] block: Fix use-after-free while iterating over requests To: Hannes Reinecke , Bart Van Assche , Pradeep P V K , , CC: , , Ming Lei , Kashyap Desai References: <1606402925-24420-1-git-send-email-ppvk@codeaurora.org> <693ea723-aa9e-1166-8a19-a7787f724969@huawei.com> <0c925db8-e481-5f21-b0fe-f691142b0437@suse.de> From: John Garry Message-ID: Date: Mon, 30 Nov 2020 14:54:13 +0000 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.1.2 MIME-Version: 1.0 In-Reply-To: <0c925db8-e481-5f21-b0fe-f691142b0437@suse.de> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit X-Originating-IP: [10.47.3.199] X-ClientProxiedBy: lhreml704-chm.china.huawei.com (10.201.108.53) To lhreml724-chm.china.huawei.com (10.201.108.75) X-CFilter-Loop: Reflected Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 30/11/2020 07:04, Hannes Reinecke wrote: > On 11/26/20 5:49 PM, John Garry wrote: >> On 26/11/2020 16:27, Bart Van Assche wrote: >>> On 11/26/20 7:02 AM, Pradeep P V K wrote: >>>> Observes below crash while accessing (use-after-free) request queue >>>> member of struct request. >>>> >>>> 191.784789:   <2> Unable to handle kernel paging request at virtual >>>> address ffffff81429a4440 >>>> ... >>>> 191.786174:   <2> CPU: 3 PID: 213 Comm: kworker/3:1H Tainted: G S >>>> O      5.4.61-qgki-debug-ge45de39 #1 >>>> ... >>>> 191.786226:   <2> Workqueue: kblockd blk_mq_timeout_work >>>> 191.786242:   <2> pstate: 20c00005 (nzCv daif +PAN +UAO) >>>> 191.786261:   <2> pc : bt_for_each+0x114/0x1a4 >>>> 191.786274:   <2> lr : bt_for_each+0xe0/0x1a4 >>>> ... >>>> 191.786494:   <2> Call trace: >>>> 191.786507:   <2>  bt_for_each+0x114/0x1a4 >>>> 191.786519:   <2>  blk_mq_queue_tag_busy_iter+0x60/0xd4 >>>> 191.786532:   <2>  blk_mq_timeout_work+0x54/0xe8 >>>> 191.786549:   <2>  process_one_work+0x2cc/0x568 >>>> 191.786562:   <2>  worker_thread+0x28c/0x518 >>>> 191.786577:   <2>  kthread+0x160/0x170 >>>> 191.786594:   <2>  ret_from_fork+0x10/0x18 >>>> 191.786615:   <2> Code: 0b080148 f9404929 f8685921 b4fffe01 (f9400028) >>>> 191.786630:   <2> ---[ end trace 0f1f51d79ab3f955 ]--- >>>> 191.786643:   <2> Kernel panic - not syncing: Fatal exception >>>> >>>> Fix this by updating the freed request with NULL. >>>> This could avoid accessing the already free request from other >>>> contexts while iterating over the requests. >>>> >>>> Signed-off-by: Pradeep P V K >>>> --- >>>>   block/blk-mq.c | 1 + >>>>   block/blk-mq.h | 1 + >>>>   2 files changed, 2 insertions(+) >>>> >>>> diff --git a/block/blk-mq.c b/block/blk-mq.c >>>> index 55bcee5..9996cb1 100644 >>>> --- a/block/blk-mq.c >>>> +++ b/block/blk-mq.c >>>> @@ -492,6 +492,7 @@ static void __blk_mq_free_request(struct request >>>> *rq) >>>>       blk_crypto_free_request(rq); >>>>       blk_pm_mark_last_busy(rq); >>>> +    hctx->tags->rqs[rq->tag] = NULL; >>>>       rq->mq_hctx = NULL; >>>>       if (rq->tag != BLK_MQ_NO_TAG) >>>>           blk_mq_put_tag(hctx->tags, ctx, rq->tag); >>>> diff --git a/block/blk-mq.h b/block/blk-mq.h >>>> index a52703c..8747bf1 100644 >>>> --- a/block/blk-mq.h >>>> +++ b/block/blk-mq.h >>>> @@ -224,6 +224,7 @@ static inline int >>>> __blk_mq_active_requests(struct blk_mq_hw_ctx *hctx) >>>>   static inline void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx >>>> *hctx, >>>>                          struct request *rq) >>>>   { >>>> +    hctx->tags->rqs[rq->tag] = NULL; >>>>       blk_mq_put_tag(hctx->tags, rq->mq_ctx, rq->tag); >>>>       rq->tag = BLK_MQ_NO_TAG; >>> >>> Is this perhaps a block driver bug instead of a block layer core bug? If >>> this would be a block layer core bug, it would have been reported >>> before. >> >> Isn't this the same issue which as been reported many times: >> >> https://lore.kernel.org/linux-block/20200820180335.3109216-1-ming.lei@redhat.com/ >> >> >> https://lore.kernel.org/linux-block/8376443a-ec1b-0cef-8244-ed584b96fa96@huawei.com/ >> >> >> But I never saw a crash, just kasan report. >> > And if that above were a concern, I would have thought one would need to > use a WRITE_ONCE() here; otherwise we might have a race condition where > other CPUs still see the old value, no? From further looking at the history here, Kashyap tried the same approach and preference was to not add anything to the fast path: https://lore.kernel.org/linux-block/04e2f9e8-79fa-f1cb-ab23-4a15bf3f64cc@kernel.dk/ I had another solution for this which clears any references when we free the sched tags, but never posted as it looked a bit crazy and I was busy, below. ---->8------ diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index a19cdf159b75..240804617683 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -608,7 +608,7 @@ static void blk_mq_sched_free_tags(struct blk_mq_tag_set *set, unsigned int hctx_idx) { if (hctx->sched_tags) { - blk_mq_free_rqs(set, hctx->sched_tags, hctx_idx); + blk_mq_free_rqs(set, hctx->sched_tags, hctx_idx, NULL); blk_mq_free_rq_map(hctx->sched_tags); hctx->sched_tags = NULL; } @@ -711,10 +711,9 @@ void blk_mq_sched_free_requests(struct request_queue *q) { struct blk_mq_hw_ctx *hctx; int i; queue_for_each_hw_ctx(q, hctx, i) { - if (hctx->sched_tags) - blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i); + blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i, hctx->tags); } } diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index 32d82e23b095..9622cef0c38d 100644 --- a/block/blk-mq-tag.c +++ b/block/blk-mq-tag.c @@ -515,7 +515,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx, return -ENOMEM; } - blk_mq_free_rqs(set, *tagsptr, hctx->queue_num); + blk_mq_free_rqs(set, *tagsptr, hctx->queue_num, NULL); blk_mq_free_rq_map(*tagsptr); *tagsptr = new; } else { diff --git a/block/blk-mq.c b/block/blk-mq.c index 0015a1892153..6ff815ceae34 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2256,12 +2256,12 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio) EXPORT_SYMBOL_GPL(blk_mq_submit_bio); /* only for request based dm */ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags, - unsigned int hctx_idx) + unsigned int hctx_idx, struct blk_mq_tags *references) { struct page *page; if (tags->rqs && set->ops->exit_request) { - int i; + int i, j; for (i = 0; i < tags->nr_tags; i++) { struct request *rq = tags->static_rqs[i]; @@ -2269,6 +2269,12 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags, if (!rq) continue; set->ops->exit_request(set, rq, hctx_idx); + for (j = 0; references && j < references->nr_tags; j++) { + struct request *old = cmpxchg(&references->rqs[j], rq, 0); + } tags->static_rqs[i] = NULL; } } @@ -2425,7 +2431,7 @@ int blk_mq_alloc_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags, return 0; fail: - blk_mq_free_rqs(set, tags, hctx_idx); + blk_mq_free_rqs(set, tags, hctx_idx, NULL); return -ENOMEM; } @@ -2755,7 +2761,7 @@ static void blk_mq_free_map_and_requests(struct blk_mq_tag_set *set, unsigned int hctx_idx) { if (set->tags && set->tags[hctx_idx]) { - blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx); + blk_mq_free_rqs(set, set->tags[hctx_idx], hctx_idx, NULL); blk_mq_free_rq_map(set->tags[hctx_idx]); set->tags[hctx_idx] = NULL; } diff --git a/block/blk-mq.h b/block/blk-mq.h index 863a2f3346d4..bee8c5de600b 100644 --- a/block/blk-mq.h +++ b/block/blk-mq.h @@ -52,7 +52,7 @@ struct request *blk_mq_dequeue_from_ctx(struct blk_mq_hw_ctx *hctx, * Internal helpers for allocating/freeing the request map */ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags, - unsigned int hctx_idx); + unsigned int hctx_idx, struct blk_mq_tags *references); void blk_mq_free_rq_map(struct blk_mq_tags *tags); struct blk_mq_tags *blk_mq_alloc_rq_map(struct blk_mq_tag_set *set, unsigned int hctx_idx,