Date: Thu, 15 Jun 2017 17:27:29 -0500
From: Bjorn Helgaas <helgaas@kernel.org>
To: Srinath Mannam <srinath.mannam@broadcom.com>
Cc: bhelgaas@google.com, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, bcm-kernel-feedback-list@broadcom.com
Subject: Re: [RFC PATCH v2] pci: Concurrency issue in NVMe Init through PCIe switch
Message-ID: <20170615222729.GJ12735@bhelgaas-glaptop.roam.corp.google.com>
In-Reply-To: <1496135297-19680-1-git-send-email-srinath.mannam@broadcom.com>

Hi Srinath,

On Tue, May 30, 2017 at 02:38:17PM +0530, Srinath Mannam wrote:
> We found a concurrency issue in NVMe init when we initialize
> multiple NVMe devices connected over a PCIe switch.
>
> Setup details:
>  - SMP system with 8 ARMv8 cores running Linux kernel (4.11).
>  - Two NVMe cards are connected to the PCIe RC through a bridge as
>    shown in the figure below.
>
>            [RC]
>             |
>         [BRIDGE]
>             |
>        -----------
>        |         |
>      [NVMe]   [NVMe]
>
> Issue description:
> After PCIe enumeration completed, the NVMe driver probe function was
> called for both devices from two CPUs simultaneously.
> From nvme_probe, pci_enable_device_mem was called for both EPs.  This
> function calls pci_enable_bridge recursively up to the RC.

Let's refine the changelog a little bit by removing details that
aren't pertinent.  The fact that this happens with NVMe on ARMv8 is
irrelevant.  It could happen on any SMP system.  The critical thing
is that drivers for two devices, both below the same disabled bridge,
called pci_enable_device() about the same time, and both tried to
enable the bridge simultaneously.

> Inside the pci_enable_bridge function, a concurrency issue is
> observed at two places.
>
> Place 1:
> CPU 0:
> 1. Done atomic increment of dev->enable_cnt
>    in pci_enable_device_flags
> 2. Inside pci_enable_resources
> 3. Completed pci_read_config_word(dev, PCI_COMMAND, &cmd)
> 4. Ready to set PCI_COMMAND_MEMORY (0x2) in
>    pci_write_config_word(dev, PCI_COMMAND, cmd)
> CPU 1:
> 1. Check pci_is_enabled in function pci_enable_bridge,
>    and it is true
> 2. Check (!dev->is_busmaster), also true
> 3. Gone into pci_set_master
> 4. Completed pci_read_config_word(dev, PCI_COMMAND, &old_cmd)
> 5. Ready to set PCI_COMMAND_MASTER (0x4) in
>    pci_write_config_word(dev, PCI_COMMAND, cmd)
>
> At the last point, both CPUs have read the value 0 and are ready to
> write 2 and 4 respectively.
> After the last point, the final value in the PCI_COMMAND register is
> 4 instead of 6.
>
> Place 2:
> CPU 0:
> 1. Done atomic increment of dev->enable_cnt in
>    pci_enable_device_flags
>
> Signed-off-by: Srinath Mannam <srinath.mannam@broadcom.com>
> ---
> Changes since v1:
>  - Used mutex to synchronize pci_enable_bridge
>
>  drivers/pci/pci.c   | 4 ++++
>  drivers/pci/probe.c | 1 +
>  include/linux/pci.h | 1 +
>  3 files changed, 6 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b01bd5b..5bff3e7 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1347,7 +1347,9 @@ static void pci_enable_bridge(struct pci_dev *dev)
>  {
>  	struct pci_dev *bridge;
>  	int retval;
> +	struct mutex *lock = &dev->bridge_lock;
>
> +	mutex_lock(lock);

I don't think it's necessary to hold the lock until we call
pci_set_master() or pci_enable_device(), is it?  E.g., we shouldn't
need to hold the lock for "dev" while we call pci_enable_bridge() for
its upstream bridge.
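
IOW, maybe something like this (completely untested, and keeping the
bridge_lock from this patch only to show the placement):

	bridge = pci_upstream_bridge(dev);
	if (bridge)
		pci_enable_bridge(bridge);	/* dev's lock not held here */

	/* serialize only the check-and-enable of "dev" itself */
	mutex_lock(&dev->bridge_lock);
	/* ... existing pci_is_enabled()/pci_enable_device() logic ... */
	mutex_unlock(&dev->bridge_lock);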
>  	bridge = pci_upstream_bridge(dev);
>  	if (bridge)
>  		pci_enable_bridge(bridge);
> @@ -1355,6 +1357,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
>  	if (pci_is_enabled(dev)) {
>  		if (!dev->is_busmaster)
>  			pci_set_master(dev);
> +		mutex_unlock(lock);

It's not a big deal either way, but I probably would write this with a
single unlock at the end and a goto here.

>  		return;
>  	}
>
> @@ -1363,6 +1366,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
>  		dev_err(&dev->dev, "Error enabling bridge (%d), continuing\n",
>  			retval);
>  	pci_set_master(dev);
> +	mutex_unlock(lock);
>  }
>
>  static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 19c8950..1c25d1c 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -880,6 +880,7 @@ static struct pci_bus *pci_alloc_child_bus(struct pci_bus *parent,
>  	child->dev.parent = child->bridge;
>  	pci_set_bus_of_node(child);
>  	pci_set_bus_speed(child);
> +	mutex_init(&bridge->bridge_lock);
>
>  	/* Set up default resource pointers and names.. */
>  	for (i = 0; i < PCI_BRIDGE_RESOURCE_NUM; i++) {
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 33c2b0b..7e88f41 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -266,6 +266,7 @@ struct pci_dev {
>  	void		*sysdata;	/* hook for sys-specific extension */
>  	struct proc_dir_entry *procent;	/* device entry in /proc/bus/pci */
>  	struct pci_slot	*slot;		/* Physical slot this device is in */
> +	struct mutex	bridge_lock;

I don't really like adding a per-device lock just for this unusual
case.  Can you use the existing device_lock() instead?

>  	unsigned int	devfn;		/* encoded device & function index */
>  	unsigned short	vendor;
> --
> 2.7.4
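
Putting the comments above together, here's a completely untested
sketch of the direction I have in mind: reuse the driver core's
device_lock() instead of adding a new mutex, take it only around the
enable of "dev" itself, and unlock once at the end (the "out" label
is just for illustration):

static void pci_enable_bridge(struct pci_dev *dev)
{
	struct pci_dev *bridge;
	int retval;

	/* Enable the upstream path first; dev's lock is not held here */
	bridge = pci_upstream_bridge(dev);
	if (bridge)
		pci_enable_bridge(bridge);

	/* Serialize the check-and-enable of this device */
	device_lock(&dev->dev);

	if (pci_is_enabled(dev)) {
		if (!dev->is_busmaster)
			pci_set_master(dev);
		goto out;
	}

	retval = pci_enable_device(dev);
	if (retval)
		dev_err(&dev->dev, "Error enabling bridge (%d), continuing\n",
			retval);
	pci_set_master(dev);
out:
	device_unlock(&dev->dev);
}

Whether device_lock() is safe to take here, given what the driver
core may already hold on the probe path, would need a careful look.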