Subject: Re: [PATCH 2/2] driver core: Fix possible supplier PM-usage counter imbalance
From: Jon Hunter
To: "Rafael J.
Wysocki"
CC: Greg Kroah-Hartman, LKML, Linux PM, Ulf Hansson, Daniel Vetter,
    Lukas Wunner, Andrzej Hajda, Russell King - ARM Linux, Lucas Stach,
    Linus Walleij, Thierry Reding, Laurent Pinchart, Marek Szyprowski,
    linux-tegra
References: <5510642.nRbR3bcduN@aspire.rjw.lan>
    <9351473.C2nPJoyFsE@aspire.rjw.lan>
    <2ed95b05-317c-59bb-498a-b5481e54bcf6@nvidia.com>
    <23147304.zVnvcQtZVR@aspire.rjw.lan>
    <03cb05e4-5d34-3669-1ce9-bb8710c70c95@nvidia.com>
Message-ID: <71d5b9e0-8906-eded-f8bd-9a9023e54eb9@nvidia.com>
Date: Fri, 15 Feb 2019 14:14:54 +0000
In-Reply-To: <03cb05e4-5d34-3669-1ce9-bb8710c70c95@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 15/02/2019 13:21, Jon Hunter wrote:
>
> On 15/02/2019 12:06, Rafael J. Wysocki wrote:
>> On Friday, February 15, 2019 12:00:27 PM CET Jon Hunter wrote:
>>> Hi Rafael,
>>>
>>> On 12/02/2019 12:08, Rafael J. Wysocki wrote:
>>>> From: Rafael J.
Wysocki
>>>>
>>>> If a stateless device link to a certain supplier with
>>>> DL_FLAG_PM_RUNTIME set in the flags is added and then removed by the
>>>> consumer driver's probe callback, the supplier's PM-runtime usage
>>>> counter will be nonzero after that, which effectively causes the
>>>> supplier to remain "always on" going forward.
>>>>
>>>> Namely, device_link_add() called to add the link invokes
>>>> device_link_rpm_prepare() which notices that the consumer driver is
>>>> probing, so it increments the supplier's PM-runtime usage counter
>>>> with the assumption that the link will stay around until
>>>> pm_runtime_put_suppliers() is called by driver_probe_device(),
>>>> but if the link goes away before that point, the supplier's
>>>> PM-runtime usage counter will remain nonzero.
>>>>
>>>> To prevent that from happening, first rework pm_runtime_get_suppliers()
>>>> and pm_runtime_put_suppliers() to use the rpm_active refcounts of device
>>>> links and make the latter only drop rpm_active and the supplier's
>>>> PM-runtime usage counter for each link by one, unless rpm_active is
>>>> one already for it. Next, modify device_link_add() to bump up the
>>>> new link's rpm_active refcount and the supplier's PM-runtime usage
>>>> counter by two, to prevent pm_runtime_put_suppliers(), if it is
>>>> called subsequently, from suspending the supplier prematurely (in
>>>> case its PM-runtime usage counter goes down to 0 in there).
>>>>
>>>> Due to the way rpm_put_suppliers() works, this change does not
>>>> affect runtime suspend of the consumer ends of new device links (or,
>>>> generally, device links for which DL_FLAG_PM_RUNTIME has just been
>>>> set).
>>>>
>>>> Fixes: e2f3cd831a28 ("driver core: Fix handling of runtime PM flags in device_link_add()")
>>>> Reported-by: Ulf Hansson
>>>> Signed-off-by: Rafael J.
Wysocki
>>>> ---
>>>>
>>>> Note that the issue had been there before commit e2f3cd831a28, but it was
>>>> overlooked by that commit and this change is a fix on top of it, so make
>>>> the Fixes: tag point to commit e2f3cd831a28 (instead of an earlier one
>>>> that the patch will not be applicable to).
>>> I noticed that yesterday's and today's -next were no longer booting on
>>> one of our Tegra boards (Tegra210 Jetson TX2) because networking is
>>> failing. The ethernet chip is a USB device and looking at the bootlogs I
>>> can see that the Tegra XHCI driver is failing ...
>>>
>>> tegra-xusb 70090000.usb: xHCI host controller not responding, assume dead
>>> tegra-xusb 70090000.usb: HC died; cleaning up
>>>
>>> The Tegra XHCI driver uses multiple power-domains and uses
>>> device_link_add() to attach them. So now I am wondering if there is
>>> something that we have got wrong in our implementation. However, I don't
>>> see the device being probed deferred on boot or anything like that.
>>>
>>> The driver in question is drivers/usb/host/xhci-tegra.c and we add the
>>> links in the function tegra_xusb_powerdomain_init() which is before RPM
>>> is enabled. Let me know if you have any thoughts.
>>
>> Please try the appended patch on top of the $subject one (provided that
>> reverting the $subject patch makes the problem go away).
>
> Thanks and yes to confirm, reverting the $subject patch on top of next
> does make the issue go away.
>
>> ---
>>  drivers/base/power/runtime.c |    9 ++++++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> Index: linux-pm/drivers/base/power/runtime.c
>> ===================================================================
>> --- linux-pm.orig/drivers/base/power/runtime.c
>> +++ linux-pm/drivers/base/power/runtime.c
>> @@ -1675,9 +1675,12 @@ void pm_runtime_put_suppliers(struct dev
>>  	idx = device_links_read_lock();
>>
>>  	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
>> -		if (link->flags & DL_FLAG_PM_RUNTIME &&
>> -		    refcount_dec_not_one(&link->rpm_active))
>> -			pm_runtime_put(link->supplier);
>> +		if (link->flags & DL_FLAG_PM_RUNTIME) {
>> +			if (refcount_dec_not_one(&link->rpm_active))
>> +				pm_runtime_put(link->supplier);
>> +			else
>> +				pm_request_idle(link->supplier);
>> +		}
>>
>>  	device_links_read_unlock(idx);
>>  }
>
> I will try this now and report back in a bit.

I tried this on top of next, but unfortunately the same issue still
persists and so this did not fix it. Let me know if there is any debug I
can add/enable.

Cheers
Jon

--
nvpublic