Date: Wed, 21 Mar 2018 23:48:09 +0800
From: Ming Lei
To: Marta Rybczynska
Cc: keith busch, axboe@fb.com, hch@lst.de, sagi@grimberg.me,
    linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
    bhelgaas@google.com, linux-pci@vger.kernel.org, Pierre-Yves Kerbrat
Subject: Re: [RFC PATCH] nvme: avoid race-conditions when enabling devices
Message-ID: <20180321154807.GD22254@ming.t460p>
References: <744877924.5841545.1521630049567.JavaMail.zimbra@kalray.eu>
    <20180321115037.GA26083@ming.t460p>
    <464125757.5843583.1521634231341.JavaMail.zimbra@kalray.eu>

On Wed, Mar 21, 2018 at 01:10:31PM +0100, Marta Rybczynska wrote:
> > On Wed, Mar 21, 2018 at 12:00:49PM +0100, Marta Rybczynska wrote:
> >> NVMe driver uses threads for the work at device reset, including enabling
> >> the PCIe device. When multiple NVMe devices are initialized, their reset
> >> works may be scheduled in parallel. Then pci_enable_device_mem can be
> >> called in parallel on multiple cores.
> >>
> >> This causes a loop of enabling of all upstream bridges in
> >> pci_enable_bridge(). pci_enable_bridge() causes multiple operations
> >> including __pci_set_master and architecture-specific functions that
> >> call ones like and pci_enable_resources(). Both __pci_set_master()
> >> and pci_enable_resources() read PCI_COMMAND field in the PCIe space
> >> and change it. This is done as read/modify/write.
> >>
> >> Imagine that the PCIe tree looks like:
> >> A - B - switch -  C - D
> >>                \- E - F
> >>
> >> D and F are two NVMe disks and all devices from B are not enabled and bus
> >> mastering is not set. If their reset work are scheduled in parallel the two
> >> modifications of PCI_COMMAND may happen in parallel without locking and the
> >> system may end up with the part of PCIe tree not enabled.
> >
> > Then looks serialized reset should be used, and I did see the commit
> > 79c48ccf2fe ("nvme-pci: serialize pci resets") fixes issue of 'failed
> > to mark controller state' in reset stress test.
> >
> > But that commit only covers case of PCI reset from sysfs attribute, and
> > maybe other cases need to be dealt with in similar way too.
> >
>
> It seems to me that the serialized reset works for multiple resets of the
> same device, doesn't it? Our problem is linked to resets of different devices
> that share the same PCIe tree.

Given that reset shouldn't be a frequent action, it might be fine to
serialize all resets from different devices.

Thanks,
Ming
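
To make the read/modify/write problem on PCI_COMMAND concrete, here is a
minimal userspace sketch in C. It is not kernel code and not the proposed
patch: struct fake_bridge, rmw_read(), rmw_write(), set_bits_locked() and the
two driver functions are hypothetical names made up for illustration, and a
pthread mutex stands in for whatever serialization the kernel would use. The
two bit values are those of PCI_COMMAND_MEMORY and PCI_COMMAND_MASTER. The
first function replays the bad interleaving deterministically; the second runs
two "reset works" in threads with the whole update under one lock.

```c
#include <pthread.h>
#include <stdint.h>

/* Same values as PCI_COMMAND_MEMORY and PCI_COMMAND_MASTER. */
#define CMD_MEMORY 0x2
#define CMD_MASTER 0x4

/* Hypothetical stand-in for a shared upstream bridge's PCI_COMMAND. */
struct fake_bridge {
	uint16_t command;
	pthread_mutex_t lock;
};

/* Unlocked read/modify/write, split in two so the race can be replayed. */
static uint16_t rmw_read(const struct fake_bridge *b)
{
	return b->command;
}

static void rmw_write(struct fake_bridge *b, uint16_t old, uint16_t bits)
{
	b->command = old | bits;
}

/* Both paths read the register before either one writes it back. */
static uint16_t replay_lost_update(void)
{
	struct fake_bridge b = { .command = 0 };
	uint16_t seen_d = rmw_read(&b);	/* path enabling D reads 0 */
	uint16_t seen_f = rmw_read(&b);	/* path enabling F reads 0 as well */

	rmw_write(&b, seen_d, CMD_MEMORY);
	rmw_write(&b, seen_f, CMD_MASTER);	/* drops the MEMORY bit */
	return b.command;
}

/* Serialized variant: read and write form one critical section. */
static void set_bits_locked(struct fake_bridge *b, uint16_t bits)
{
	pthread_mutex_lock(&b->lock);
	b->command = b->command | bits;
	pthread_mutex_unlock(&b->lock);
}

struct job {
	struct fake_bridge *bridge;
	uint16_t bits;
};

static void *reset_work(void *arg)
{
	struct job *j = arg;

	set_bits_locked(j->bridge, j->bits);
	return NULL;
}

/* Two reset works run in parallel, but the updates cannot interleave. */
static uint16_t run_serialized(void)
{
	struct fake_bridge b = { .command = 0 };
	struct job j1 = { &b, CMD_MEMORY };
	struct job j2 = { &b, CMD_MASTER };
	pthread_t t1, t2;

	pthread_mutex_init(&b.lock, NULL);
	pthread_create(&t1, NULL, reset_work, &j1);
	pthread_create(&t2, NULL, reset_work, &j2);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	pthread_mutex_destroy(&b.lock);
	return b.command;
}
```

Calling replay_lost_update() leaves only the bit written last, while
run_serialized() ends with both bits set regardless of which thread runs
first. Whether the lock should sit per bridge or cover all resets is exactly
the open question in this thread; the sketch only shows why some
serialization is needed.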