Subject: [PATCH 5/6] libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
From: Dan Williams
To: linux-nvdimm@lists.01.org
Cc: stable@vger.kernel.org, Vishal Verma, peterz@infradead.org,
	linux-kernel@vger.kernel.org
Date: Tue, 11 Jun 2019 16:26:10 -0700
Message-ID: <156029557026.419799.1378625476581339382.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <156029554317.419799.1324389595953183385.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <156029554317.419799.1324389595953183385.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.18-2-gc94f
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

A multithreaded namespace creation/destruction stress test currently
deadlocks with the following lockup signature:

 INFO: task ndctl:2924 blocked for more than 122 seconds.
       Tainted: G           OE     5.2.0-rc4+ #3382
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 ndctl           D    0  2924   1176 0x00000000
 Call Trace:
  ? __schedule+0x27e/0x780
  schedule+0x30/0xb0
  wait_nvdimm_bus_probe_idle+0x8a/0xd0 [libnvdimm]
  ? finish_wait+0x80/0x80
  uuid_store+0xe6/0x2e0 [libnvdimm]
  kernfs_fop_write+0xf0/0x1a0
  vfs_write+0xb7/0x1b0
  ksys_write+0x5c/0xd0
  do_syscall_64+0x60/0x240

 INFO: task ndctl:2923 blocked for more than 122 seconds.
       Tainted: G           OE     5.2.0-rc4+ #3382
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 ndctl           D    0  2923   1175 0x00000000
 Call Trace:
  ? __schedule+0x27e/0x780
  ? __mutex_lock+0x489/0x910
  schedule+0x30/0xb0
  schedule_preempt_disabled+0x11/0x20
  __mutex_lock+0x48e/0x910
  ? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
  ? __lock_acquire+0x23f/0x1710
  ? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
  nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
  __dax_pmem_probe+0x5e/0x210 [dax_pmem_core]
  ? nvdimm_bus_probe+0x1d0/0x2c0 [libnvdimm]
  dax_pmem_probe+0xc/0x20 [dax_pmem]
  nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
  really_probe+0xef/0x390
  driver_probe_device+0xb4/0x100

In this sequence an 'nd_dax' device is being probed and trying to take
the lock on its backing namespace to validate that the 'nd_dax' device
indeed has exclusive access to the backing namespace. Meanwhile, another
thread is trying to update the uuid property of that same backing
namespace. So one thread is in the probe path trying to acquire the
lock, and the other thread has acquired the lock and tries to flush the
probe path.

Fix this deadlock by not holding the namespace device_lock over the
wait_nvdimm_bus_probe_idle() synchronization step. In turn this requires
the device_lock to be held on entry to wait_nvdimm_bus_probe_idle() and
subsequently dropped internally to wait_nvdimm_bus_probe_idle().

Cc: <stable@vger.kernel.org>
Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation")
Cc: Vishal Verma
Signed-off-by: Dan Williams
---
 drivers/nvdimm/bus.c         | 17 +++++++++++------
 drivers/nvdimm/region_devs.c |  4 ++++
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index dfb93228d6a7..c1d26fca9c4c 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -887,10 +887,12 @@ void wait_nvdimm_bus_probe_idle(struct device *dev)
 	do {
 		if (nvdimm_bus->probe_active == 0)
 			break;
-		nvdimm_bus_unlock(&nvdimm_bus->dev);
+		nvdimm_bus_unlock(dev);
+		device_unlock(dev);
 		wait_event(nvdimm_bus->wait,
 				nvdimm_bus->probe_active == 0);
-		nvdimm_bus_lock(&nvdimm_bus->dev);
+		device_lock(dev);
+		nvdimm_bus_lock(dev);
 	} while (true);
 }
 
@@ -1017,7 +1019,7 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
 	case ND_CMD_ARS_START:
 	case ND_CMD_CLEAR_ERROR:
 	case ND_CMD_CALL:
-		dev_dbg(&nvdimm_bus->dev, "'%s' command while read-only.\n",
+		dev_dbg(dev, "'%s' command while read-only.\n",
 				nvdimm ? nvdimm_cmd_name(cmd)
 				: nvdimm_bus_cmd_name(cmd));
 		return -EPERM;
@@ -1088,7 +1090,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
 		goto out;
 	}
 
-	nvdimm_bus_lock(&nvdimm_bus->dev);
+	device_lock(dev);
+	nvdimm_bus_lock(dev);
 	rc = nd_cmd_clear_to_send(nvdimm_bus, nvdimm, func, buf);
 	if (rc)
 		goto out_unlock;
@@ -1103,7 +1106,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
 		nvdimm_account_cleared_poison(nvdimm_bus, clear_err->address,
 				clear_err->cleared);
 	}
-	nvdimm_bus_unlock(&nvdimm_bus->dev);
+	nvdimm_bus_unlock(dev);
+	device_unlock(dev);
 
 	if (copy_to_user(p, buf, buf_len))
 		rc = -EFAULT;
@@ -1112,7 +1116,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
 	return rc;
 
  out_unlock:
-	nvdimm_bus_unlock(&nvdimm_bus->dev);
+	nvdimm_bus_unlock(dev);
+	device_unlock(dev);
  out:
 	vfree(buf);
 	return rc;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4fed9ce9c2fe..a15276cdec7d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -422,10 +422,12 @@ static ssize_t available_size_show(struct device *dev,
 	 * memory nvdimm_bus_lock() is dropped, but that's userspace's
 	 * problem to not race itself.
 	 */
+	device_lock(dev);
 	nvdimm_bus_lock(dev);
 	wait_nvdimm_bus_probe_idle(dev);
 	available = nd_region_available_dpa(nd_region);
 	nvdimm_bus_unlock(dev);
+	device_unlock(dev);
 
 	return sprintf(buf, "%llu\n", available);
 }
@@ -437,10 +439,12 @@ static ssize_t max_available_extent_show(struct device *dev,
 	struct nd_region *nd_region = to_nd_region(dev);
 	unsigned long long available = 0;
 
+	device_lock(dev);
 	nvdimm_bus_lock(dev);
 	wait_nvdimm_bus_probe_idle(dev);
 	available = nd_region_allocatable_dpa(nd_region);
 	nvdimm_bus_unlock(dev);
+	device_unlock(dev);
 
 	return sprintf(buf, "%llu\n", available);
 }
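
For readers less familiar with the pattern adopted above, here is a rough,
userspace-only sketch of the same "drop both locks before sleeping, reacquire,
re-check" approach. It is illustrative, not kernel code: pthread mutexes and a
condition variable stand in for device_lock(), nvdimm_bus_lock(), and the
nvdimm_bus->wait waitqueue, and every identifier in the sketch (ns_lock,
bus_lock, wait_lock, probe_active, wait_probe_idle, probe_done) is made up for
the example.

	/*
	 * Illustrative userspace analogue only -- not kernel code. ns_lock
	 * plays the role of device_lock(dev), bus_lock the role of
	 * nvdimm_bus_lock(dev), and wait_lock + bus_idle stand in for the
	 * nvdimm_bus->wait waitqueue.
	 */
	#include <pthread.h>

	static pthread_mutex_t ns_lock   = PTHREAD_MUTEX_INITIALIZER;
	static pthread_mutex_t bus_lock  = PTHREAD_MUTEX_INITIALIZER;
	static pthread_mutex_t wait_lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t  bus_idle  = PTHREAD_COND_INITIALIZER;
	static int probe_active;

	/* Probe side: wake any waiters once a probe completes. */
	static void probe_done(void)
	{
		pthread_mutex_lock(&wait_lock);
		probe_active--;
		pthread_cond_broadcast(&bus_idle);
		pthread_mutex_unlock(&wait_lock);
	}

	/*
	 * Called with ns_lock and bus_lock held, mirroring the patched
	 * wait_nvdimm_bus_probe_idle(): both locks are dropped before
	 * sleeping so an in-flight probe that needs ns_lock can make
	 * progress, then both are reacquired and the condition re-checked.
	 */
	static void wait_probe_idle(void)
	{
		for (;;) {
			pthread_mutex_lock(&wait_lock);
			int active = probe_active;
			pthread_mutex_unlock(&wait_lock);
			if (active == 0)
				return;

			/* The fix: do not sleep while holding either lock. */
			pthread_mutex_unlock(&bus_lock);
			pthread_mutex_unlock(&ns_lock);

			pthread_mutex_lock(&wait_lock);
			while (probe_active != 0)
				pthread_cond_wait(&bus_idle, &wait_lock);
			pthread_mutex_unlock(&wait_lock);

			/* Reacquire in the original order, then re-check. */
			pthread_mutex_lock(&ns_lock);
			pthread_mutex_lock(&bus_lock);
		}
	}

The re-check after the locks are retaken is deliberate: a new probe may have
started while both locks were dropped, which is why the kernel loop keeps
iterating in do { ... } while (true) rather than waiting only once.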