From: "Rafael J. Wysocki"
To: Pingfan Liu
Cc: linux-kernel@vger.kernel.org, Greg Kroah-Hartman, "Rafael J. Wysocki", Grygorii Strashko, Christoph Hellwig, Bjorn Helgaas, Dave Young, linux-pci@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCHv3 2/4] drivers/base: utilize device tree info to shutdown devices
Date: Fri, 06 Jul 2018 11:53:15 +0200
Message-ID: <2067910.hkxRV6zLYm@aspire.rjw.lan>
References: <1530600642-25090-1-git-send-email-kernelfans@gmail.com> <2108146.dv4EAOf6IP@aspire.rjw.lan>
X-Mailing-List: linux-kernel@vger.kernel.org

On Friday, July 6, 2018 5:02:15 AM CEST Pingfan Liu wrote:
> On Thu, Jul 5, 2018 at 6:13 PM Rafael J. Wysocki wrote:
> >
> > On Tuesday, July 3, 2018 8:50:40 AM CEST Pingfan Liu wrote:
> > > commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> > > imposes a supplier<-consumer ordering assumption on the probe process,
> > > but it turns out to break the parent<-child order in some scenarios.
> > > E.g. in PCI, a bridge is enabled by the PCI core, and the devices behind
> > > it have been probed. Then comes the bridge's module, which enables extra
> > > features (such as hotplug) on this bridge. This breaks the parent<-child
> > > order and, in some scenarios, causes a failure on "kexec -e".
> > >
> > > A detailed description of the scenario: on an IBM Power9 machine, two
> > > drivers, portdrv_pci and shpchp (a module), match PCI_CLASS_BRIDGE_PCI,
> > > but neither of them probes successfully due to some issue. In this case,
> > > the bridge is moved after its children in devices_kset. Then, on
> > > "kexec -e", an ATA disk behind the bridge cannot write back its in-flight
> > > buffers, because the bridge was shut down earlier, which cleared its
> > > BusMaster bit.
> > >
> > > It is a little hard to impose both "parent<-child" and
> > > "supplier<-consumer" order on devices_kset. Take the following scenario:
> > > step0: before a consumer is probed (note: child_a is a supplier of consumer_a)
> > > [ consumer-X, child_a, ...., child_z] [... consumer_a, ..., consumer_z, ...] supplier-X
> > >                                        ^^^^^^^^^^ affected range ^^^^^^^^^^
> > > step1: when probing, consumer-X is moved after supplier-X
> > > [ child_a, ...., child_z] [.... consumer_a, ..., consumer_z, ...] supplier-X, consumer-X
> > > step2: the children of consumer-X must be re-ordered to maintain the sequence
> > > [... consumer_a, ..., consumer_z, ....] supplier-X [consumer-X, child_a, ...., child_z]
> > > step3: consumer_a must be re-ordered to maintain the sequence
> > > [... consumer_z, ...] supplier-X [ consumer-X, child_a, consumer_a ..., child_z]
> > >
> > > Draining all out-of-order items from the "affected range" requires two
> > > nested recursions. To avoid such complicated code, this patch uses the
> > > information in the device tree instead of the order of devices_kset
> > > during shutdown: it walks the device tree and shuts down a device's
> > > children and consumers before the device itself. After this patch, the
> > > buggy commit is left hollow, pending cleanup.
> > >
> > > Cc: Greg Kroah-Hartman
> > > Cc: Rafael J. Wysocki
> > > Cc: Grygorii Strashko
> > > Cc: Christoph Hellwig
> > > Cc: Bjorn Helgaas
> > > Cc: Dave Young
> > > Cc: linux-pci@vger.kernel.org
> > > Cc: linuxppc-dev@lists.ozlabs.org
> > > Signed-off-by: Pingfan Liu
> > > ---
> > >  drivers/base/core.c    | 48 +++++++++++++++++++++++++++++++++++++++++++-----
> > >  include/linux/device.h |  1 +
> > >  2 files changed, 44 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > > index a48868f..684b994 100644
> > > --- a/drivers/base/core.c
> > > +++ b/drivers/base/core.c
> > > @@ -1446,6 +1446,7 @@ void device_initialize(struct device *dev)
> > >  	INIT_LIST_HEAD(&dev->links.consumers);
> > >  	INIT_LIST_HEAD(&dev->links.suppliers);
> > >  	dev->links.status = DL_DEV_NO_DRIVER;
> > > +	dev->shutdown = false;
> > >  }
> > >  EXPORT_SYMBOL_GPL(device_initialize);
> > >
> > > @@ -2811,7 +2812,6 @@ static void __device_shutdown(struct device *dev)
> > >  	 * lock is to be held
> > >  	 */
> > >  	parent = get_device(dev->parent);
> > > -	get_device(dev);
> >
> > Why is the get_/put_device() not needed any more?
> >
> They are moved up a layer, into device_for_each_child_shutdown().
> Since the lock is dropped inside __device_shutdown(), I resort to
> ref++ to protect the ancestor. And I think the
> get_device(dev->parent) can be deleted as well.

Wouldn't that break USB?
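To make the two constraints from the commit log concrete, here is a minimal userspace sketch (not kernel code) that checks whether a flat list, standing in for devices_kset, places every device after its parent and after each of its suppliers; walking such a list backward is then a safe shutdown order. struct model_dev and order_is_valid() are hypothetical names for illustration, and devices are plain array indices rather than struct device pointers.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEVS 64
#define MAX_SUPPLIERS 4

/* Hypothetical stand-in for struct device: only the links that matter here. */
struct model_dev {
	int parent;                     /* index of the parent, or -1 for none */
	int nr_suppliers;
	int suppliers[MAX_SUPPLIERS];   /* indices of supplier devices */
};

/*
 * Return true if list[0..n-1], assumed to be a permutation of the device
 * indices 0..n-1, puts every device after its parent and after all of its
 * suppliers.
 */
static bool order_is_valid(const struct model_dev *devs, const int *list, int n)
{
	int pos[MAX_DEVS];

	if (n < 0 || n > MAX_DEVS)
		return false;

	/* pos[d] is the position of device d in the list. */
	for (int i = 0; i < n; i++)
		pos[list[i]] = i;

	for (int d = 0; d < n; d++) {
		const struct model_dev *dev = &devs[d];

		/* parent<-child: the parent must come earlier. */
		if (dev->parent >= 0 && pos[dev->parent] > pos[d])
			return false;

		/* supplier<-consumer: every supplier must come earlier. */
		for (int k = 0; k < dev->nr_suppliers; k++)
			if (pos[dev->suppliers[k]] > pos[d])
				return false;
	}
	return true;
}
```

In the Power9 case the bridge ends up after the disk behind it, which fails the parent<-child check; the step0..step3 diagram shows how repairing one violation by moving a device can introduce another.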
> > >  	/*
> > >  	 * Make sure the device is off the kset list, in the
> > >  	 * event that dev->*->shutdown() doesn't remove it.
> > > @@ -2842,23 +2842,60 @@ static void __device_shutdown(struct device *dev)
> > >  		dev_info(dev, "shutdown\n");
> > >  		dev->driver->shutdown(dev);
> > >  	}
> > > -
> > > +	dev->shutdown = true;
> > >  	device_unlock(dev);
> > >  	if (parent)
> > >  		device_unlock(parent);
> > >
> > > -	put_device(dev);
> > >  	put_device(parent);
> > >  	spin_lock(&devices_kset->list_lock);
> > >  }
> > >
> > > +/* shutdown dev's children and consumer firstly, then itself */
> > > +static int device_for_each_child_shutdown(struct device *dev)
> >
> > Confusing name.
> >
> > What about device_shutdown_subordinate()?
> >
> Fine. My choice of words is not very precise.
>
> > > +{
> > > +	struct klist_iter i;
> > > +	struct device *child;
> > > +	struct device_link *link;
> > > +
> > > +	/* already shutdown, then skip this sub tree */
> > > +	if (dev->shutdown)
> > > +		return 0;
> > > +
> > > +	if (!dev->p)
> > > +		goto check_consumers;
> > > +
> > > +	/* there is breakage of lock in __device_shutdown(), and the redundant
> > > +	 * ref++ on srcu protected consumer is harmless since shutdown is not
> > > +	 * hot path.
> > > +	 */
> > > +	get_device(dev);
> > > +
> > > +	klist_iter_init(&dev->p->klist_children, &i);
> > > +	while ((child = next_device(&i)))
> > > +		device_for_each_child_shutdown(child);
> >
> > Why don't you use device_for_each_child() here?
> >
> OK, I will try to use it.

Well, hold on.

> > > +	klist_iter_exit(&i);
> > > +
> > > +check_consumers:
> > > +	list_for_each_entry_rcu(link, &dev->links.consumers, s_node) {
> > > +		if (!link->consumer->shutdown)
> > > +			device_for_each_child_shutdown(link->consumer);
> > > +	}
> > > +
> > > +	__device_shutdown(dev);
> > > +	put_device(dev);
> >
> > Possible reference counter imbalance AFAICS.
> >
> Yes, get_device() should come before "if (!dev->p)". Is there anything
> else I missed?

Yes, that's it.
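The traversal in the patch, with get_device() moved ahead of the "if (!dev->p)" shortcut as agreed above, can be modeled in userspace roughly as follows. This is a sketch under simplifying assumptions: struct node, shutdown_subordinate() (named after the suggested device_shutdown_subordinate()), shutdown_pos() and the fixed-size arrays are hypothetical; there is no locking, klist or SRCU, and setting the flag stands in for __device_shutdown().

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LINKS 4
#define MAX_LOG 32

/* Hypothetical stand-in for struct device: children, consumers,
 * a one-way "shutdown" flag and a reference count. */
struct node {
	struct node *children[MAX_LINKS];
	int nr_children;
	struct node *consumers[MAX_LINKS];
	int nr_consumers;
	bool shutdown;		/* one direction: false -> true */
	int refcount;
};

/* Order in which nodes were shut down, for inspection. */
static struct node *shutdown_log[MAX_LOG];
static int shutdown_count;

static void node_get(struct node *n) { n->refcount++; }
static void node_put(struct node *n) { n->refcount--; }

/* Shut down @n's children and consumers first, then @n itself. */
static void shutdown_subordinate(struct node *n)
{
	/* Already shut down: skip this subtree. */
	if (n->shutdown)
		return;

	/*
	 * Take the reference before any branching, so the unconditional
	 * node_put() below is always balanced. Taking it after a shortcut
	 * such as "if (!dev->p) goto check_consumers;" is what caused the
	 * imbalance pointed out in the review.
	 */
	node_get(n);

	for (int i = 0; i < n->nr_children; i++)
		shutdown_subordinate(n->children[i]);

	for (int i = 0; i < n->nr_consumers; i++)
		if (!n->consumers[i]->shutdown)
			shutdown_subordinate(n->consumers[i]);

	/* Stand-in for __device_shutdown(): mark and record the node. */
	n->shutdown = true;
	if (shutdown_count < MAX_LOG)
		shutdown_log[shutdown_count++] = n;

	node_put(n);
}

/* Position of @n in the shutdown log, or -1 if it was never shut down. */
static int shutdown_pos(const struct node *n)
{
	for (int i = 0; i < shutdown_count; i++)
		if (shutdown_log[i] == n)
			return i;
	return -1;
}
```

Because the flag only ever goes from false to true, a node reachable through several paths (say, as a child and as a consumer) is shut down once. The sketch assumes the links form no cycles, which device links are meant to prevent.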
> > > +	return 0;
> > > +}
> >
> > Well, instead of doing this dance, we might as well walk dpm_list here as it
> > is in the right order.
> >
> Sorry, do you mean managing dpm_list in the same way?

No, I mean to use dpm_list instead of devices_kset for shutdown. They should
be in the same order anyway if all is correct.

> > Of course, that would require dpm_list to be available for CONFIG_PM unset,
> > but it may be a better approach long term.
> >
> > > +
> > >  /**
> > >   * device_shutdown - call ->shutdown() on each device to shutdown.
> > >   */
> > >  void device_shutdown(void)
> > >  {
> > >  	struct device *dev;
> > > +	int idx;
> > >
> > > +	idx = device_links_read_lock();
> > >  	spin_lock(&devices_kset->list_lock);
> > >  	/*
> > >  	 * Walk the devices list backward, shutting down each in turn.
> > > @@ -2866,11 +2903,12 @@ void device_shutdown(void)
> > >  	 * devices offline, even as the system is shutting down.
> > >  	 */
> > >  	while (!list_empty(&devices_kset->list)) {
> > > -		dev = list_entry(devices_kset->list.prev, struct device,
> > > +		dev = list_entry(devices_kset->list.next, struct device,
> > >  				kobj.entry);
> > > -		__device_shutdown(dev);
> > > +		device_for_each_child_shutdown(dev);
> > >  	}
> > >  	spin_unlock(&devices_kset->list_lock);
> > > +	device_links_read_unlock(idx);
> > >  }
> > >
> > >  /*
> > > diff --git a/include/linux/device.h b/include/linux/device.h
> > > index 055a69d..8a0f784 100644
> > > --- a/include/linux/device.h
> > > +++ b/include/linux/device.h
> > > @@ -1003,6 +1003,7 @@ struct device {
> > >  	bool			offline:1;
> > >  	bool			of_node_reused:1;
> > >  	bool			dma_32bit_limit:1;
> > > +	bool			shutdown:1; /* one direction: false->true */
> > >  };
> > >
> > >  static inline struct device *kobj_to_dev(struct kobject *kobj)
> > >
> >
> > If the devices_kset_move_last() in really_probe() is the only problem,
> > I'd rather try to fix that one in the first place.
> >
> > Why is it needed?
> >
> I had tried, but it turned out not to be easy to achieve.
> The code is https://patchwork.kernel.org/patch/10485195/, and I gave a
> detailed description of the algorithm in this patch's commit log. In
> more detail, really_probe() faces a potential out-of-order issue like:
>
> 0th. [ consumer-X, child_a, ...., child_z] [... consumer_a, ..., consumer_z, ...] supplier-X
>      (note: child_a is a supplier of consumer_a)
>
> Fixing all the potentially out-of-order items in the affected section
> [... consumer_a, ..., consumer_z, ...] incurs two nested recursions:
> 1st, move consumer-X and its descendants after supplier-X;
> 2nd, move consumer_a after child_a;
> 3rd, the 2nd step may recreate the situation of the 0th step.
> Besides the two interleaved recursions, dropping the spin lock requires
> extra effort to keep items from disappearing from the linked list
> (which I did not implement in https://patchwork.kernel.org/patch/10485195/).
> Hence I turned to this cheaper method.

So I think that we simply need to drop the devices_kset_move_last() call from
really_probe() as it is plain incorrect and the use case for it is questionable
at best.

And the use case it is supposed to address should be addressed differently.

Thanks,
Rafael
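A toy model of the Power9 bridge scenario shows why the devices_kset_move_last() call in really_probe() is the culprit. The list starts in registration order (bridge before the disk behind it), so walking it backward shuts the disk down first. Moving just the bridge to the tail, which happens when a probe of its driver is attempted, flips that order. As far as I can tell, devices_kset_move_last() moves only the device itself, not its descendants, and the model assumes that. struct dev_list, move_last() and shutdown_order() are hypothetical stand-ins, not the kernel's list API.

```c
#include <assert.h>
#include <string.h>

#define MAX_LIST 16

/* Hypothetical stand-in for devices_kset: device IDs in list order. */
struct dev_list {
	int ids[MAX_LIST];
	int len;
};

/* Move @id to the tail, modeling devices_kset_move_last(): only the
 * device itself moves; its descendants stay where they are. */
static void move_last(struct dev_list *l, int id)
{
	for (int i = 0; i < l->len; i++) {
		if (l->ids[i] != id)
			continue;
		memmove(&l->ids[i], &l->ids[i + 1],
			(size_t)(l->len - i - 1) * sizeof(l->ids[0]));
		l->ids[l->len - 1] = id;
		return;
	}
}

/* Shutdown order: the list walked from the tail, as device_shutdown()
 * walks devices_kset without the patch under review. */
static void shutdown_order(const struct dev_list *l, int out[MAX_LIST])
{
	for (int i = 0; i < l->len; i++)
		out[i] = l->ids[l->len - 1 - i];
}
```

Without the move, registration order alone (parents are registered before their children) yields the children-first sequence that shutdown needs.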