From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Casey Chen, Yuanyuan Zhong, Keith Busch, Christoph Hellwig, Sasha Levin,
	linux-nvme@lists.infradead.org
Subject: [PATCH AUTOSEL 5.13 08/19] nvme-pci: fix multiple races in nvme_setup_io_queues
Date: Thu, 22 Jul 2021 23:57:09 -0400
Message-Id: <20210723035721.531372-8-sashal@kernel.org>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210723035721.531372-1-sashal@kernel.org>
References: <20210723035721.531372-1-sashal@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Casey Chen

[ Upstream commit e4b9852a0f4afe40604afb442e3af4452722050a ]

The two paths below could overlap if we power off a drive quickly after
powering it on. There are multiple races in nvme_setup_io_queues()
because shutdown_lock is not taken and the NVMEQ_ENABLED bit is used
improperly.

nvme_reset_work()                                nvme_remove()
  nvme_setup_io_queues()                           nvme_dev_disable()
  ...                                              ...
A1  clear NVMEQ_ENABLED bit for admin queue          lock
    retry:                                       B1  nvme_suspend_io_queues()
A2    pci_free_irq() admin queue                 B2  nvme_suspend_queue() admin queue
A3    pci_free_irq_vectors()                         nvme_pci_disable()
A4    nvme_setup_irqs();                         B3    pci_free_irq_vectors()
      ...                                            unlock
A5    queue_request_irq() for admin queue
      set NVMEQ_ENABLED bit
      ...
      nvme_create_io_queues()
A6      result = queue_request_irq();
        set NVMEQ_ENABLED bit
      ...
      fail to allocate enough IO queues:
A7      nvme_suspend_io_queues()
      goto retry

If B3 runs in between A1 and A2, it will crash because the irqaction has
not yet been freed by A2. B2 is supposed to free the admin queue IRQ, but
it cannot do so because A1 has already cleared the NVMEQ_ENABLED bit.

Fix: combine A1 and A2 so the IRQ is freed as soon as the NVMEQ_ENABLED
bit is cleared.
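To illustrate the idea behind this first fix (this is not kernel code and
not part of the patch; the names and the userspace setting are invented
for the example), an atomic exchange plays the role of
test_and_clear_bit(): whichever path observes the set-to-clear transition
is the only one that frees the resource, so the free is never skipped and
never done twice.

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	/* Stand-in for the NVMEQ_ENABLED bit of the admin queue. */
	static atomic_bool admin_queue_enabled = true;

	/* Stand-in for pci_free_irq(): must run exactly once. */
	static void free_admin_irq_once(const char *path)
	{
		/*
		 * atomic_exchange() mirrors test_and_clear_bit(): only the
		 * caller that sees the old value "true" performs the free.
		 */
		if (atomic_exchange(&admin_queue_enabled, false))
			printf("%s frees the admin queue IRQ\n", path);
		else
			printf("%s: IRQ already freed, nothing to do\n", path);
	}

	int main(void)
	{
		free_admin_irq_once("setup path");     /* nvme_setup_io_queues() */
		free_admin_irq_once("teardown path");  /* nvme_dev_disable()     */
		return 0;
	}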
After fixing #1, A2 could race with B3 if A2 is freeing the IRQ while B3
is checking the irqaction. A3 could also race with B2 if B2 is freeing
the IRQ while A3 is checking the irqaction.

Fix: A2 and A3 take the lock for mutual exclusion.

A3 could race with B3 since they could run free_msi_irqs() in parallel.

Fix: A3 takes the lock for mutual exclusion.

A4 could fail to allocate all of the needed IRQ vectors if A3 and A4 are
interrupted by B3.

Fix: A4 takes the lock for mutual exclusion.

If A5/A6 happens after B2/B1, B3 will crash because the irqaction is not
NULL; it has just been allocated by A5/A6.

Fix: hold the lock around queue_request_irq() and the setting of the
NVMEQ_ENABLED bit.

A7 could get a chance to pci_free_irq() for a certain IO queue while B3
is checking the irqaction.

Fix: A7 takes the lock.

nvme_dev->online_queues also needs to be protected by shutdown_lock.
Since it is not atomic, both paths could otherwise modify it using their
own copies.

Co-developed-by: Yuanyuan Zhong
Signed-off-by: Casey Chen
Reviewed-by: Keith Busch
Signed-off-by: Christoph Hellwig
Signed-off-by: Sasha Levin
---
 drivers/nvme/host/pci.c | 66 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 58 insertions(+), 8 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 42ad75ff1348..1e704c63f1e2 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1562,6 +1562,28 @@ static void nvme_init_queue(struct nvme_queue *nvmeq, u16 qid)
 	wmb(); /* ensure the first interrupt sees the initialization */
 }
 
+/*
+ * Try getting shutdown_lock while setting up IO queues.
+ */
+static int nvme_setup_io_queues_trylock(struct nvme_dev *dev)
+{
+	/*
+	 * Give up if the lock is being held by nvme_dev_disable.
+	 */
+	if (!mutex_trylock(&dev->shutdown_lock))
+		return -ENODEV;
+
+	/*
+	 * Controller is in wrong state, fail early.
+	 */
+	if (dev->ctrl.state != NVME_CTRL_CONNECTING) {
+		mutex_unlock(&dev->shutdown_lock);
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
 static int nvme_create_queue(struct nvme_queue *nvmeq, int qid, bool polled)
 {
 	struct nvme_dev *dev = nvmeq->dev;
@@ -1590,8 +1612,11 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid, bool polled)
 		goto release_cq;
 	nvmeq->cq_vector = vector;
 
-	nvme_init_queue(nvmeq, qid);
+	result = nvme_setup_io_queues_trylock(dev);
+	if (result)
+		return result;
 
+	nvme_init_queue(nvmeq, qid);
 	if (!polled) {
 		result = queue_request_irq(nvmeq);
 		if (result < 0)
@@ -1599,10 +1624,12 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid, bool polled)
 	}
 
 	set_bit(NVMEQ_ENABLED, &nvmeq->flags);
+	mutex_unlock(&dev->shutdown_lock);
 	return result;
 
 release_sq:
 	dev->online_queues--;
+	mutex_unlock(&dev->shutdown_lock);
 	adapter_delete_sq(dev, qid);
 release_cq:
 	adapter_delete_cq(dev, qid);
@@ -2176,7 +2203,18 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	if (nr_io_queues == 0)
 		return 0;
 
-	clear_bit(NVMEQ_ENABLED, &adminq->flags);
+	/*
+	 * Free IRQ resources as soon as NVMEQ_ENABLED bit transitions
+	 * from set to unset. If there is a window before it is truly freed,
+	 * pci_free_irq_vectors() jumping into this window will crash.
+	 * And take lock to avoid racing with pci_free_irq_vectors() in
+	 * nvme_dev_disable() path.
+	 */
+	result = nvme_setup_io_queues_trylock(dev);
+	if (result)
+		return result;
+	if (test_and_clear_bit(NVMEQ_ENABLED, &adminq->flags))
+		pci_free_irq(pdev, 0, adminq);
 
 	if (dev->cmb_use_sqes) {
 		result = nvme_cmb_qdepth(dev, nr_io_queues,
@@ -2192,14 +2230,17 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 		result = nvme_remap_bar(dev, size);
 		if (!result)
 			break;
-		if (!--nr_io_queues)
-			return -ENOMEM;
+		if (!--nr_io_queues) {
+			result = -ENOMEM;
+			goto out_unlock;
+		}
 	} while (1);
 	adminq->q_db = dev->dbs;
 
  retry:
 	/* Deregister the admin queue's interrupt */
-	pci_free_irq(pdev, 0, adminq);
+	if (test_and_clear_bit(NVMEQ_ENABLED, &adminq->flags))
+		pci_free_irq(pdev, 0, adminq);
 
 	/*
 	 * If we enable msix early due to not intx, disable it again before
@@ -2208,8 +2249,10 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	pci_free_irq_vectors(pdev);
 
 	result = nvme_setup_irqs(dev, nr_io_queues);
-	if (result <= 0)
-		return -EIO;
+	if (result <= 0) {
+		result = -EIO;
+		goto out_unlock;
+	}
 	dev->num_vecs = result;
 	result = max(result - 1, 1);
@@ -2223,8 +2266,9 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	 */
 	result = queue_request_irq(adminq);
 	if (result)
-		return result;
+		goto out_unlock;
 	set_bit(NVMEQ_ENABLED, &adminq->flags);
+	mutex_unlock(&dev->shutdown_lock);
 
 	result = nvme_create_io_queues(dev);
 	if (result || dev->online_queues < 2)
@@ -2233,6 +2277,9 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 	if (dev->online_queues - 1 < dev->max_qid) {
 		nr_io_queues = dev->online_queues - 1;
 		nvme_disable_io_queues(dev);
+		result = nvme_setup_io_queues_trylock(dev);
+		if (result)
+			return result;
 		nvme_suspend_io_queues(dev);
 		goto retry;
 	}
@@ -2241,6 +2288,9 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 		dev->io_queues[HCTX_TYPE_READ],
 		dev->io_queues[HCTX_TYPE_POLL]);
 	return 0;
+out_unlock:
+	mutex_unlock(&dev->shutdown_lock);
+	return result;
 }
 
 static void nvme_del_queue_end(struct request *req, blk_status_t error)
-- 
2.30.2
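The nvme_setup_io_queues_trylock() helper added above encodes a
trylock-then-recheck pattern: the setup path proceeds only if it can take
shutdown_lock without blocking and the controller is still in the
CONNECTING state; otherwise it backs off so the teardown path keeps
exclusive ownership of the IRQ resources. A rough userspace analogue
(assumed names, pthreads in place of the kernel mutex API, not part of
the patch) looks like this:

	#include <errno.h>
	#include <pthread.h>
	#include <stdio.h>

	enum toy_state { TOY_CONNECTING, TOY_DELETING };

	struct toy_dev {
		pthread_mutex_t shutdown_lock;
		enum toy_state state;
	};

	static struct toy_dev dev = {
		.shutdown_lock = PTHREAD_MUTEX_INITIALIZER,
		.state = TOY_CONNECTING,
	};

	/* Mirrors nvme_setup_io_queues_trylock(): fail fast, never block. */
	static int setup_trylock(struct toy_dev *d)
	{
		if (pthread_mutex_trylock(&d->shutdown_lock) != 0)
			return -ENODEV;	/* teardown holds the lock, give up */

		if (d->state != TOY_CONNECTING) {
			pthread_mutex_unlock(&d->shutdown_lock);
			return -ENODEV;	/* state changed under us, give up */
		}
		return 0;		/* caller now owns shutdown_lock */
	}

	int main(void)
	{
		if (setup_trylock(&dev) == 0) {
			printf("setup may touch IRQ resources now\n");
			pthread_mutex_unlock(&dev.shutdown_lock);
		} else {
			printf("teardown in progress, abort setup\n");
		}
		return 0;
	}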