Subject: Re: nvme-pci: NULL pointer dereference in nvme_dev_disable() on linux-next
To: Sagi Grimberg, Christoph Hellwig, Gerd Bayer
CC: Jens Axboe, Niklas Schnelle
From: Chao Leng
Date: Wed, 9 Nov 2022 15:41:07 +0800
References: <20221108074846.GA22674@lst.de> <65f5d26b-b0af-f9c6-e77c-e82ac969e9f9@grimberg.me>
In-Reply-To: <65f5d26b-b0af-f9c6-e77c-e82ac969e9f9@grimberg.me>
X-Mailing-List: linux-kernel@vger.kernel.org

The check "if (!ctrl->tagset)" only reduces the probability of hitting
the problem. The real cause is the race between probe and remove.
The situation is similar in the TCP and RDMA transports.
Israel tried to fix it. For details, see:
https://github.com/torvalds/linux/commit/ce1518139e6976cf19c133b555083354fdb629b8
Unfortunately, this patch was reverted.

While "probe" is in progress, remove should not be called.
Maybe we can move pci_set_drvdata to the end of nvme_probe.
Of course, a removal requested while "probe" is in progress may then
not take effect. This is why Israel's patch was reverted.

Perhaps the better option would be for "remove" to wait for "probe"
to complete, and then do the real remove.
This requires an additional mechanism.

On 2022/11/9 10:54, Sagi Grimberg wrote:
>
>> Below is the minimal fix.  I'll see if I sort out the mess that is
>> probe/reset failure vs ->remove a bit better, though.
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index f94b05c585cbc..577bacdcfee08 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -5160,6 +5160,8 @@ EXPORT_SYMBOL_GPL(nvme_start_freeze);
>>   void nvme_stop_queues(struct nvme_ctrl *ctrl)
>>   {
>> +    if (!ctrl->tagset)
>> +        return;
>>       if (!test_and_set_bit(NVME_CTRL_STOPPED, &ctrl->flags))
>>           blk_mq_quiesce_tagset(ctrl->tagset);
>>       else
>> @@ -5169,6 +5171,8 @@ EXPORT_SYMBOL_GPL(nvme_stop_queues);
>>   void nvme_start_queues(struct nvme_ctrl *ctrl)
>>   {
>> +    if (!ctrl->tagset)
>> +        return;
>>       if (test_and_clear_bit(NVME_CTRL_STOPPED, &ctrl->flags))
>>           blk_mq_unquiesce_tagset(ctrl->tagset);
>>   }
>
> Can we do that in the pci driver and not here?
>
> .
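
To illustrate the "remove waits for probe to complete" idea from the top of this mail, here is a rough user-space sketch. It is only an analogy, not driver code: pthreads stand in for the probe and remove callbacks, and all names (fake_ctrl, fake_probe, fake_remove, run_race_once) are made up for the example. It assumes probe always finishes; a failed probe would need extra handling.

```c
/*
 * User-space analogy only: pthreads stand in for the driver's probe and
 * remove callbacks. Every name here is hypothetical, not kernel API.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_ctrl {
	pthread_mutex_t lock;
	pthread_cond_t probe_done_cv;
	bool probe_done;	/* set once probe has finished */
	int *tagset;		/* NULL until probe publishes it */
	int tagset_storage;	/* backing object for the fake tagset */
};

static void fake_ctrl_init(struct fake_ctrl *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->probe_done_cv, NULL);
	c->probe_done = false;
	c->tagset = NULL;
	c->tagset_storage = 0;
}

/* "probe": publish the tagset, then announce completion. */
static void *fake_probe(void *arg)
{
	struct fake_ctrl *c = arg;

	pthread_mutex_lock(&c->lock);
	c->tagset = &c->tagset_storage;
	c->probe_done = true;
	pthread_cond_broadcast(&c->probe_done_cv);
	pthread_mutex_unlock(&c->lock);
	return NULL;
}

/* "remove": block until probe is done, only then touch the tagset. */
static void *fake_remove(void *arg)
{
	struct fake_ctrl *c = arg;

	pthread_mutex_lock(&c->lock);
	while (!c->probe_done)
		pthread_cond_wait(&c->probe_done_cv, &c->lock);
	assert(c->tagset != NULL);	/* probe has published it by now */
	*c->tagset = 1;			/* stand-in for quiescing the queues */
	c->tagset = NULL;		/* stand-in for the real teardown */
	pthread_mutex_unlock(&c->lock);
	return NULL;
}

/* Start remove before probe to provoke the race; returns 1 on clean teardown. */
static int run_race_once(void)
{
	struct fake_ctrl c;
	pthread_t probe_thread, remove_thread;
	int ok;

	fake_ctrl_init(&c);
	pthread_create(&remove_thread, NULL, fake_remove, &c);
	pthread_create(&probe_thread, NULL, fake_probe, &c);
	pthread_join(probe_thread, NULL);
	pthread_join(remove_thread, NULL);
	ok = (c.tagset == NULL && c.tagset_storage == 1);
	pthread_cond_destroy(&c.probe_done_cv);
	pthread_mutex_destroy(&c.lock);
	return ok;
}
```

In the kernel, a struct completion (complete_all() in probe, wait_for_completion() in remove) might play the role of the condition variable here. Whether that fits the nvme-pci probe/reset flow is not something this sketch settles.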