Subject: Re: [PATCH 2/2] driver core: Fix possible supplier PM-usage counter imbalance
To: "Rafael J.
 Wysocki"
CC: Greg Kroah-Hartman, LKML, Linux PM, Ulf Hansson, Daniel Vetter, Lukas Wunner, Andrzej Hajda, Russell King - ARM Linux, Lucas Stach, Linus Walleij, Thierry Reding, Laurent Pinchart, Marek Szyprowski, linux-tegra
References: <5510642.nRbR3bcduN@aspire.rjw.lan> <9351473.C2nPJoyFsE@aspire.rjw.lan> <2ed95b05-317c-59bb-498a-b5481e54bcf6@nvidia.com> <23147304.zVnvcQtZVR@aspire.rjw.lan>
From: Jon Hunter
Message-ID: <03cb05e4-5d34-3669-1ce9-bb8710c70c95@nvidia.com>
Date: Fri, 15 Feb 2019 13:21:10 +0000
In-Reply-To: <23147304.zVnvcQtZVR@aspire.rjw.lan>
X-Mailing-List: linux-kernel@vger.kernel.org

On 15/02/2019 12:06, Rafael J. Wysocki wrote:
> On Friday, February 15, 2019 12:00:27 PM CET Jon Hunter wrote:
>> Hi Rafael,
>>
>> On 12/02/2019 12:08, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki
>>>
>>> If a stateless device link to a certain supplier with
>>> DL_FLAG_PM_RUNTIME set in the flags is added and then removed by the
>>> consumer driver's probe callback, the supplier's PM-runtime usage
>>> counter will be nonzero after that which effectively causes the
>>> supplier to remain "always on" going forward.
>>>
>>> Namely, device_link_add() called to add the link invokes
>>> device_link_rpm_prepare() which notices that the consumer driver is
>>> probing, so it increments the supplier's PM-runtime usage counter
>>> with the assumption that the link will stay around until
>>> pm_runtime_put_suppliers() is called by driver_probe_device(),
>>> but if the link goes away before that point, the supplier's
>>> PM-runtime usage counter will remain nonzero.
>>>
>>> To prevent that from happening, first rework pm_runtime_get_suppliers()
>>> and pm_runtime_put_suppliers() to use the rpm_active refcounts of device
>>> links and make the latter only drop rpm_active and the supplier's
>>> PM-runtime usage counter for each link by one, unless rpm_active is
>>> one already for it. Next, modify device_link_add() to bump up the
>>> new link's rpm_active refcount and the supplier's PM-runtime usage
>>> counter by two, to prevent pm_runtime_put_suppliers(), if it is
>>> called subsequently, from suspending the supplier prematurely (in
>>> case its PM-runtime usage counter goes down to 0 in there).
>>>
>>> Due to the way rpm_put_suppliers() works, this change does not
>>> affect runtime suspend of the consumer ends of new device links (or,
>>> generally, device links for which DL_FLAG_PM_RUNTIME has just been
>>> set).
>>>
>>> Fixes: e2f3cd831a28 ("driver core: Fix handling of runtime PM flags in device_link_add()")
>>> Reported-by: Ulf Hansson
>>> Signed-off-by: Rafael J. Wysocki
>>> ---
>>>
>>> Note that the issue had been there before commit e2f3cd831a28, but it was
>>> overlooked by that commit and this change is a fix on top of it, so make
>>> the Fixes: tag point to commit e2f3cd831a28 (instead of an earlier one
>>> that the patch will not be applicable to).
>> I noticed that yesterday's and today's -next were no longer booting on
>> one of our Tegra boards (Tegra210 Jetson TX2) because networking is
>> failing. The ethernet chip is a USB device and looking at the bootlogs I
>> can see that the Tegra XHCI driver is failing ...
>>
>> tegra-xusb 70090000.usb: xHCI host controller not responding, assume dead
>> tegra-xusb 70090000.usb: HC died; cleaning up
>>
>> The Tegra XHCI driver uses multiple power-domains and uses
>> device_link_add() to attach them. So now I am wondering if there is
>> something that we have got wrong in our implementation. However, I don't
>> see the device being probed deferred on boot or anything like that.
>>
>> The driver in question is drivers/usb/host/xhci-tegra.c and we add the
>> links in the function tegra_xusb_powerdomain_init() which is before RPM
>> is enabled. Let me know if you have any thoughts.
>
> Please try the appended patch on top of the $subject one (provided that
> reverting the $subject patch makes the problem go away).

Thanks, and yes, to confirm, reverting the $subject patch on top of -next
does make the issue go away.

> ---
>  drivers/base/power/runtime.c |    9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> Index: linux-pm/drivers/base/power/runtime.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/runtime.c
> +++ linux-pm/drivers/base/power/runtime.c
> @@ -1675,9 +1675,12 @@ void pm_runtime_put_suppliers(struct dev
>  	idx = device_links_read_lock();
>  
>  	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
> -		if (link->flags & DL_FLAG_PM_RUNTIME &&
> -		    refcount_dec_not_one(&link->rpm_active))
> -			pm_runtime_put(link->supplier);
> +		if (link->flags & DL_FLAG_PM_RUNTIME) {
> +			if (refcount_dec_not_one(&link->rpm_active))
> +				pm_runtime_put(link->supplier);
> +			else
> +				pm_request_idle(link->supplier);
> +		}
>  
>  	device_links_read_unlock(idx);
>  }

I will try this now and report back in a bit.

Cheers
Jon

-- 
nvpublic
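
For readers following the thread, here is a minimal user-space sketch of the per-link decision that the appended patch makes in pm_runtime_put_suppliers(). It is only a model under stated assumptions, not kernel code: the model_* names and both structs are hypothetical, pm_runtime_put() is reduced to a plain counter decrement, and pm_request_idle() is reduced to counting the requests. The only kernel behaviour relied upon is the documented contract of refcount_dec_not_one(), which decrements unless the value is 1 and reports whether it did.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; not the kernel's types. */
struct model_supplier {
	int usage_count;   /* models the supplier's PM-runtime usage counter */
	int idle_requests; /* counts modelled pm_request_idle() calls */
};

struct model_link {
	int rpm_active;    /* models link->rpm_active */
	struct model_supplier *supplier;
};

/* Same contract as refcount_dec_not_one(): decrement unless the value is 1. */
static bool model_dec_not_one(int *r)
{
	if (*r == 1)
		return false;
	(*r)--;
	return true;
}

/* Models the patched loop body for one link with DL_FLAG_PM_RUNTIME set. */
static void model_put_supplier(struct model_link *link)
{
	if (model_dec_not_one(&link->rpm_active))
		link->supplier->usage_count--;   /* stands in for pm_runtime_put() */
	else
		link->supplier->idle_requests++; /* stands in for pm_request_idle() */
}

/* Two consecutive puts on one link: the first drops a reference, the second cannot. */
void model_example(void)
{
	struct model_supplier sup = { .usage_count = 1, .idle_requests = 0 };
	struct model_link link = { .rpm_active = 2, .supplier = &sup };

	model_put_supplier(&link);
	assert(link.rpm_active == 1 && sup.usage_count == 0);

	model_put_supplier(&link);
	assert(link.rpm_active == 1 && sup.idle_requests == 1);
}
```

In this model the second call takes the new else branch: rpm_active is already one, so nothing is dropped, and the supplier is only asked to re-evaluate whether it can go idle. Without that branch, as in the code before the appended patch, the second call would do nothing at all for the supplier.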