From: Dan Williams
Date: Wed, 17 Jul 2019 23:39:47 -0700
Subject: Re: [PATCH v2 6/7] libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
To: Sasha Levin
Cc: linux-nvdimm, stable, Vishal Verma, Jane Chu, Peter Zijlstra, Linux Kernel Mailing List
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 17, 2019 at 7:05 PM Sasha Levin wrote:
>
> On Wed, Jul 17, 2019 at 06:08:21PM -0700, Dan Williams wrote:
> >A multithreaded namespace creation/destruction stress test currently
> >deadlocks with the following lockup signature:
> >
> >    INFO: task ndctl:2924 blocked for more than 122 seconds.
> >          Tainted: G OE 5.2.0-rc4+ #3382
> >    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >    ndctl D 0 2924 1176 0x00000000
> >    Call Trace:
> >     ? __schedule+0x27e/0x780
> >     schedule+0x30/0xb0
> >     wait_nvdimm_bus_probe_idle+0x8a/0xd0 [libnvdimm]
> >     ? finish_wait+0x80/0x80
> >     uuid_store+0xe6/0x2e0 [libnvdimm]
> >     kernfs_fop_write+0xf0/0x1a0
> >     vfs_write+0xb7/0x1b0
> >     ksys_write+0x5c/0xd0
> >     do_syscall_64+0x60/0x240
> >
> >    INFO: task ndctl:2923 blocked for more than 122 seconds.
> >          Tainted: G OE 5.2.0-rc4+ #3382
> >    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >    ndctl D 0 2923 1175 0x00000000
> >    Call Trace:
> >     ? __schedule+0x27e/0x780
> >     ? __mutex_lock+0x489/0x910
> >     schedule+0x30/0xb0
> >     schedule_preempt_disabled+0x11/0x20
> >     __mutex_lock+0x48e/0x910
> >     ? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
> >     ? __lock_acquire+0x23f/0x1710
> >     ? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
> >     nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
> >     __dax_pmem_probe+0x5e/0x210 [dax_pmem_core]
> >     ? nvdimm_bus_probe+0x1d0/0x2c0 [libnvdimm]
> >     dax_pmem_probe+0xc/0x20 [dax_pmem]
> >     nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
> >     really_probe+0xef/0x390
> >     driver_probe_device+0xb4/0x100
> >
> >In this sequence an 'nd_dax' device is being probed and trying to take
> >the lock on its backing namespace to validate that the 'nd_dax' device
> >indeed has exclusive access to the backing namespace. Meanwhile, another
> >thread is trying to update the uuid property of that same backing
> >namespace. So one thread is in the probe path trying to acquire the
> >lock, and the other thread has acquired the lock and tries to flush the
> >probe path.
> >
> >Fix this deadlock by not holding the namespace device_lock over the
> >wait_nvdimm_bus_probe_idle() synchronization step. In turn this requires
> >the device_lock to be held on entry to wait_nvdimm_bus_probe_idle() and
> >subsequently dropped internally to wait_nvdimm_bus_probe_idle().
> >
> >Cc:
> >Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation")
> >Cc: Vishal Verma
> >Tested-by: Jane Chu
> >Signed-off-by: Dan Williams
>
> Hi Dan,
>
> The way these patches are split, when we take them to stable this patch
> won't apply because it wants "libnvdimm/bus: Prepare the nd_ioctl() path
> to be re-entrant".
>
> If you were to send another iteration of this patchset, could you please
> re-order the patches so they will apply cleanly to stable? this will
> help with reducing mail exchanges later on and possibly a mis-merge into
> stable.
>
> If not, this should serve as a reference for future us to double check
> the backport.

Oh we should backport all of them. I'll tag that one for -stable as
well. It's a hard pre-requisite for the fix.
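
For anyone following the locking argument in the quoted changelog, here
is a minimal user-space sketch of the pattern it describes: the waiter
enters with the device lock held, drops it while waiting for probe
activity to drain, and retakes it before returning. This is a toy model
in Python, not the kernel code. Bus, wait_probe_idle(), run_scenario()
and the probe_active counter are hypothetical stand-ins chosen for
illustration, and a threading.Lock plays the role of the namespace
device_lock.

```python
import threading


class Bus:
    """Toy stand-in for the bus: counts probes in flight (hypothetical)."""

    def __init__(self):
        self._cond = threading.Condition()
        self.probe_active = 0

    def probe_begin(self):
        with self._cond:
            self.probe_active += 1

    def probe_end(self):
        with self._cond:
            self.probe_active -= 1
            self._cond.notify_all()

    def is_idle(self):
        with self._cond:
            return self.probe_active == 0

    def wait_idle(self):
        # Block until no probe is in flight.
        with self._cond:
            self._cond.wait_for(lambda: self.probe_active == 0)


def wait_probe_idle(bus, dev_lock):
    """Caller holds dev_lock on entry and on return.

    The lock is dropped while waiting so that a probe needing the same
    lock can finish, then retaken and the idle condition rechecked.
    """
    while not bus.is_idle():
        dev_lock.release()
        try:
            bus.wait_idle()
        finally:
            dev_lock.acquire()


def run_scenario():
    """Replay the interleaving from the changelog with two threads."""
    bus, ns_lock = Bus(), threading.Lock()
    probe_started, writer_locked = threading.Event(), threading.Event()
    log = []

    def prober():
        # Probe path: already counted as active, then needs the lock.
        bus.probe_begin()
        probe_started.set()
        writer_locked.wait()
        with ns_lock:
            log.append("probe")
        bus.probe_end()

    def writer():
        # Attribute-store path: takes the lock, then flushes probes.
        probe_started.wait()
        with ns_lock:
            writer_locked.set()
            wait_probe_idle(bus, ns_lock)
            log.append("uuid_store")

    threads = [threading.Thread(target=f, daemon=True) for f in (prober, writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    # Second element is True if either thread is still blocked.
    return log, any(t.is_alive() for t in threads)


if __name__ == "__main__":
    print(run_scenario())
```

If wait_probe_idle() kept dev_lock held across bus.wait_idle(), the
prober could never take the lock to finish, and the writer would wait
forever for the probe count to reach zero, which is the shape of the
two blocked tasks in the lockup signature. The recheck after retaking
the lock is there because a new probe may have started while the lock
was dropped.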