References: <5510642.nRbR3bcduN@aspire.rjw.lan> <9351473.C2nPJoyFsE@aspire.rjw.lan>
 <2ed95b05-317c-59bb-498a-b5481e54bcf6@nvidia.com>
In-Reply-To: <2ed95b05-317c-59bb-498a-b5481e54bcf6@nvidia.com>
From: "Rafael J. Wysocki"
Date: Fri, 15 Feb 2019 12:57:18 +0100
Subject: Re: [PATCH 2/2] driver core: Fix possible supplier PM-usage counter imbalance
To: Jon Hunter
Cc: "Rafael J. Wysocki", Greg Kroah-Hartman, LKML, Linux PM, Ulf Hansson,
 Daniel Vetter, Lukas Wunner, Andrzej Hajda, Russell King - ARM Linux,
 Lucas Stach, Linus Walleij, Thierry Reding, Laurent Pinchart,
 Marek Szyprowski, linux-tegra
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Feb 15, 2019 at 12:00 PM Jon Hunter wrote:
>
> Hi Rafael,
>
> On 12/02/2019 12:08, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki
> >
> > If a stateless device link to a certain supplier with
> > DL_FLAG_PM_RUNTIME set in the flags is added and then removed by the
> > consumer driver's probe callback, the supplier's PM-runtime usage
> > counter will be nonzero after that which effectively causes the
> > supplier to remain "always on" going forward.
> >
> > Namely, device_link_add() called to add the link invokes
> > device_link_rpm_prepare() which notices that the consumer driver is
> > probing, so it increments the supplier's PM-runtime usage counter
> > with the assumption that the link will stay around until
> > pm_runtime_put_suppliers() is called by driver_probe_device(),
> > but if the link goes away before that point, the supplier's
> > PM-runtime usage counter will remain nonzero.
> >
> > To prevent that from happening, first rework pm_runtime_get_suppliers()
> > and pm_runtime_put_suppliers() to use the rpm_active refcounts of device
> > links and make the latter only drop rpm_active and the supplier's
> > PM-runtime usage counter for each link by one, unless rpm_active is
> > one already for it.
> > Next, modify device_link_add() to bump up the
> > new link's rpm_active refcount and the supplier's PM-runtime usage
> > counter by two, to prevent pm_runtime_put_suppliers(), if it is
> > called subsequently, from suspending the supplier prematurely (in
> > case its PM-runtime usage counter goes down to 0 in there).
> >
> > Due to the way rpm_put_suppliers() works, this change does not
> > affect runtime suspend of the consumer ends of new device links (or,
> > generally, device links for which DL_FLAG_PM_RUNTIME has just been
> > set).
> >
> > Fixes: e2f3cd831a28 ("driver core: Fix handling of runtime PM flags in device_link_add()")
> > Reported-by: Ulf Hansson
> > Signed-off-by: Rafael J. Wysocki
> > ---
> >
> > Note that the issue had been there before commit e2f3cd831a28, but it was
> > overlooked by that commit and this change is a fix on top of it, so make
> > the Fixes: tag point to commit e2f3cd831a28 (instead of an earlier one
> > that the patch will not be applicable to).
>
> I noticed that yesterday's and today's -next were no longer booting on
> one of our Tegra boards (Tegra210 Jetson TX2) because networking is
> failing. The ethernet chip is a USB device and looking at the bootlogs I
> can see that the Tegra XHCI driver is failing ...

Is it failing because of this particular commit? That is, does reverting
the entire commit help?

> tegra-xusb 70090000.usb: xHCI host controller not responding, assume dead
> tegra-xusb 70090000.usb: HC died; cleaning up
>
> The Tegra XHCI driver uses multiple power-domains and uses
> device_link_add() to attach them. So now I am wondering if there is
> something that we have got wrong in our implementation. However, I don't
> see the device being probed deferred on boot or anything like that.

It won't be, because you use stateless links.

> The driver in question is drivers/usb/host/xhci-tegra.c and we add the
> links in the function tegra_xusb_powerdomain_init() which is before RPM
> is enabled.
> Let me know if you have any thoughts.

Well, if it breaks, then there is a bug somewhere. I'm not seeing it
now, but let's dig into this.

Since you don't pass DL_FLAG_RPM_ACTIVE to device_link_add(), the
changes related to that don't matter.

The links are not there before your probe function runs. It adds the
links and then pm_runtime_put_suppliers() sees them, but since
link->rpm_active is one for the new links, it won't do anything with
them.

Well, there is a difference, but if it matters, then something fishy is
going on IMO. Before this change, pm_runtime_put_suppliers() would do
pm_runtime_put() on the new links' suppliers and, because their
PM-runtime usage counters are both one at that point, it would actually
try to suspend the suppliers.

It should be easy enough to verify whether this really matters; stay
tuned.