Power down Thunderbolt controllers on Macs when nothing is plugged in
to save around 2W per controller.
Apple provides an ACPI-based (but nonstandard) mechanism to cut power
and signal hotplug during powerdown. The usual way to implement such
nonstandard mechanisms seems to be a struct dev_pm_domain.
E.g. vga_switcheroo uses that for Optimus GPUs which control power
with ACPI DSMs. Hence this third iteration of the series uses that
as well. In v2 a more complicated approach was employed wherein power
control was exerted by a PCIe port service driver instead.
All the prep work went into 4.9 and 4.10, shrinking this series to just
7 patches:
- The actual "meat" of the series (to borrow a term from Bjorn) is in
patches [6/7] and [7/7]. These two need an ack from Andreas.
- Patches [1/7] to [3/7] need an ack from Bjorn (and possibly Rafael or
Mika). They're fairly small and just add a bit to struct pci_dev
signifying that a device is part of a Thunderbolt daisy chain, then
use that bit to modify runtime PM for PCIe ports. I'm also cc'ing
Tomas and Amir at Intel Israel, if you guys have comments please shout.
- Patches [4/7] and [5/7] need an ack from Rafael. Their sole purpose
is to avoid a gratuitous WARN splat when assigning the struct
dev_pm_domain.
I've pushed the patches to GitHub to ease reviewing/fetching:
https://github.com/l1k/linux/commits/thunderbolt_runpm_v3
Link to the previous iteration (v2, May 2016):
http://www.spinics.net/lists/linux-pci/msg51158.html
Thanks,
Lukas
Lukas Wunner (7):
PCI: Recognize Thunderbolt devices
PCI: Allow runtime PM on Thunderbolt ports
PCI: Don't block runtime PM for Thunderbolt host hotplug ports
Revert "PM / Runtime: Remove the exported function
pm_children_suspended()"
PM: Make requirements of dev_pm_domain_set() more precise
thunderbolt: Power down controller when idle
thunderbolt: Runtime suspend NHI when idle
drivers/base/power/common.c | 15 +-
drivers/base/power/runtime.c | 3 +-
drivers/pci/pci.c | 20 ++-
drivers/pci/pci.h | 2 +
drivers/pci/probe.c | 34 +++++
drivers/thunderbolt/Kconfig | 3 +-
drivers/thunderbolt/Makefile | 4 +-
drivers/thunderbolt/nhi.c | 5 +
drivers/thunderbolt/power.c | 356 +++++++++++++++++++++++++++++++++++++++++++
drivers/thunderbolt/power.h | 37 +++++
drivers/thunderbolt/switch.c | 9 ++
drivers/thunderbolt/tb.c | 13 ++
drivers/thunderbolt/tb.h | 2 +
include/linux/pci.h | 1 +
include/linux/pm_runtime.h | 7 +
15 files changed, 500 insertions(+), 11 deletions(-)
create mode 100644 drivers/thunderbolt/power.c
create mode 100644 drivers/thunderbolt/power.h
--
2.10.2
We're about to allow runtime PM on Thunderbolt ports in
pci_bridge_d3_possible() and unblock runtime PM for Thunderbolt host
hotplug ports in pci_dev_check_d3cold(). In both cases we need to
uniquely identify if a PCI device belongs to a Thunderbolt controller.
We also have the need to detect presence of a Thunderbolt controller in
drivers/platform/x86/apple-gmux.c because dual GPU MacBook Pros cannot
switch external DP/HDMI ports between GPUs if they have Thunderbolt.
Furthermore, in multiple places in the DRM subsystem we need to detect
whether a GPU is on-board or attached with Thunderbolt. As an example,
Thunderbolt-attached GPUs shall not be registered with vga_switcheroo.
Intel uses a Vendor-Specific Extended Capability (VSEC) with ID 0x1234
on devices belonging to a Thunderbolt controller which allows us to
recognize them.
Detect presence of this VSEC on device probe and cache it in a newly
added is_thunderbolt bit in struct pci_dev which can then be queried by
pci_bridge_d3_possible(), pci_dev_check_d3cold(), apple-gmux and others.
Cc: Andreas Noever <[email protected]>
Cc: Tomas Winkler <[email protected]>
Cc: Amir Levy <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
---
drivers/pci/pci.h | 2 ++
drivers/pci/probe.c | 34 ++++++++++++++++++++++++++++++++++
include/linux/pci.h | 1 +
3 files changed, 37 insertions(+)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index cb17db2..45c2b81 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -3,6 +3,8 @@
#define PCI_FIND_CAP_TTL 48
+#define PCI_VSEC_ID_INTEL_TBT 0x1234 /* Thunderbolt */
+
extern const unsigned char pcie_link_speed[];
bool pcie_cap_has_lnkctl(const struct pci_dev *dev);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index e164b5c..891a8fa 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1206,6 +1206,37 @@ void set_pcie_hotplug_bridge(struct pci_dev *pdev)
pdev->is_hotplug_bridge = 1;
}
+static void set_pcie_vendor_specific(struct pci_dev *dev)
+{
+ int vsec = 0;
+ u32 header;
+
+ while ((vsec = pci_find_next_ext_capability(dev, vsec,
+ PCI_EXT_CAP_ID_VNDR))) {
+ pci_read_config_dword(dev, vsec + PCI_VNDR_HEADER, &header);
+
+ /* Is the device part of a Thunderbolt controller? */
+ if (dev->vendor == PCI_VENDOR_ID_INTEL &&
+ PCI_VNDR_HEADER_ID(header) == PCI_VSEC_ID_INTEL_TBT)
+ dev->is_thunderbolt = 1;
+ }
+
+ /*
+ * Is the device attached with Thunderbolt? Walk upwards and check for
+ * each encountered bridge if it's part of a Thunderbolt controller.
+ * Reaching the host bridge means dev is soldered to the mainboard.
+ */
+ if (!dev->is_thunderbolt) {
+ struct pci_dev *parent = dev;
+
+ while ((parent = pci_upstream_bridge(parent)))
+ if (parent->is_thunderbolt) {
+ dev->is_thunderbolt = 1;
+ break;
+ }
+ }
+}
+
/**
* pci_ext_cfg_is_aliased - is ext config space just an alias of std config?
* @dev: PCI device
@@ -1358,6 +1389,9 @@ int pci_setup_device(struct pci_dev *dev)
/* need to have dev->class ready */
dev->cfg_size = pci_cfg_space_size(dev);
+ /* need to have dev->cfg_size ready */
+ set_pcie_vendor_specific(dev);
+
/* "Unknown power state" */
dev->current_state = PCI_UNKNOWN;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e2d1a12..3c775e8 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -358,6 +358,7 @@ struct pci_dev {
unsigned int is_virtfn:1;
unsigned int reset_fn:1;
unsigned int is_hotplug_bridge:1;
+ unsigned int is_thunderbolt:1; /* part of Thunderbolt daisy chain */
unsigned int __aer_firmware_first_valid:1;
unsigned int __aer_firmware_first:1;
unsigned int broken_intx_masking:1;
--
2.10.2
Currently PCIe ports are only allowed to go to D3 if the BIOS is dated
2015 or newer to avoid potential issues with old chipsets. However for
Thunderbolt we know that even the oldest controller, Light Ridge (2010),
is able to suspend its ports to D3 just fine.
We're about to add runtime PM for Thunderbolt on the Mac. Apple has
released two EFI security updates in 2015 which encompass all machines
with Thunderbolt, but the achieved power saving should be made available
to users even if they haven't updated their BIOS. To this end,
special-case Thunderbolt in pci_bridge_d3_possible().
This allows the Thunderbolt controller to power down but the root port
to which the Thunderbolt controller is attached remains in D0 unless
the EFI update is installed. Users can pass pcie_port_pm=force on the
kernel command line if they cannot install the EFI update but still want
to benefit from the additional power saving of putting the root port
into D3. In practice, root ports can be suspended to D3 without issues
at least on 2012 Ivy Bridge machines.
If the BIOS cut-off date is ever lowered to 2010, the Thunderbolt
special case can be removed.
Cc: Mika Westerberg <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Andreas Noever <[email protected]>
Cc: Tomas Winkler <[email protected]>
Cc: Amir Levy <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
---
drivers/pci/pci.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index a881c0d..8ed098d 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2224,7 +2224,7 @@ void pci_config_pm_runtime_put(struct pci_dev *pdev)
* @bridge: Bridge to check
*
* This function checks if it is possible to move the bridge to D3.
- * Currently we only allow D3 for recent enough PCIe ports.
+ * Currently we only allow D3 for recent enough PCIe ports and Thunderbolt.
*/
bool pci_bridge_d3_possible(struct pci_dev *bridge)
{
@@ -2258,6 +2258,11 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
year >= 2015) {
return true;
}
+
+ /* Even the oldest 2010 Thunderbolt controller supports D3. */
+ if (bridge->is_thunderbolt)
+ return true;
+
break;
}
--
2.10.2
Hotplug ports generally block their parents from suspending to D3hot as
otherwise their interrupts couldn't be delivered.
An exception are Thunderbolt host controllers: They have a separate
GPIO pin to side-band signal plug events even if the controller is
powered down or its parent ports are suspended to D3. They can be told
apart from Thunderbolt controllers in attached devices by checking if
they're situated below a non-Thunderbolt device (typically a root port,
or the downstream port of a PCIe switch in the case of the MacPro6,1).
To enable runtime PM for Thunderbolt on the Mac, the downstream bridges
of a host controller must not block runtime PM on the upstream bridge as
power to the chip is only cut once the upstream bridge has suspended.
Amend the condition in pci_dev_check_d3cold() accordingly.
Cc: Mika Westerberg <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Andreas Noever <[email protected]>
Cc: Tomas Winkler <[email protected]>
Cc: Amir Levy <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
---
drivers/pci/pci.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8ed098d..0b03fe7 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2271,6 +2271,7 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
static int pci_dev_check_d3cold(struct pci_dev *dev, void *data)
{
+ struct pci_dev *parent, *grandparent;
bool *d3cold_ok = data;
if (/* The device needs to be allowed to go D3cold ... */
@@ -2284,7 +2285,17 @@ static int pci_dev_check_d3cold(struct pci_dev *dev, void *data)
!pci_power_manageable(dev) ||
/* Hotplug interrupts cannot be delivered if the link is down. */
- dev->is_hotplug_bridge)
+ (dev->is_hotplug_bridge &&
+
+ /*
+ * Exception: Thunderbolt host controllers have a pin to
+ * side-band signal plug events. Their hotplug ports are
+ * recognizable by having a non-Thunderbolt device as
+ * grandparent.
+ */
+ !(dev->is_thunderbolt && (parent = pci_upstream_bridge(dev)) &&
+ (grandparent = pci_upstream_bridge(parent)) &&
+ !grandparent->is_thunderbolt)))
*d3cold_ok = false;
--
2.10.2
This reverts commit 62006c1702b3b1be0c0726949e0ee0ea2326be9c which
removed pm_children_suspended() because it had only a single caller.
We're about to add a second caller, so establish the status quo ante.
Cc: Ulf Hansson <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
---
drivers/base/power/runtime.c | 3 +--
include/linux/pm_runtime.h | 7 +++++++
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 872eac4..03293c3 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -243,8 +243,7 @@ static int rpm_check_suspend_allowed(struct device *dev)
retval = -EACCES;
else if (atomic_read(&dev->power.usage_count) > 0)
retval = -EAGAIN;
- else if (!dev->power.ignore_children &&
- atomic_read(&dev->power.child_count))
+ else if (!pm_children_suspended(dev))
retval = -EBUSY;
/* Pending resume requests take precedence over suspends. */
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index ca4823e..7de2aa5c 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -66,6 +66,12 @@ static inline void pm_suspend_ignore_children(struct device *dev, bool enable)
dev->power.ignore_children = enable;
}
+static inline bool pm_children_suspended(struct device *dev)
+{
+ return dev->power.ignore_children
+ || !atomic_read(&dev->power.child_count);
+}
+
static inline void pm_runtime_get_noresume(struct device *dev)
{
atomic_inc(&dev->power.usage_count);
@@ -161,6 +167,7 @@ static inline void pm_runtime_allow(struct device *dev) {}
static inline void pm_runtime_forbid(struct device *dev) {}
static inline void pm_suspend_ignore_children(struct device *dev, bool enable) {}
+static inline bool pm_children_suspended(struct device *dev) { return false; }
static inline void pm_runtime_get_noresume(struct device *dev) {}
static inline void pm_runtime_put_noidle(struct device *dev) {}
static inline bool device_run_wake(struct device *dev) { return false; }
--
2.10.2
Since commit 989561de9b51 ("PM / Domains: add setter for dev.pm_domain")
a PM domain may only be assigned to unbound devices.
The motivation was not made explicit in the changelog other than "in the
general case that can cause problems and also [...] we can simplify code
quite a bit if we can always assume that". Rafael J. Wysocki elaborated
in a mailing list conversation that "setting a PM domain generally
changes the set of PM callbacks for the device and it may not be safe to
call it after the driver has been bound".
The concern seems to be that if a device is put to sleep and its PM
callbacks are changed, the device may end up in an undefined state or
not resume at all. The real underlying requirement is thus to ensure
that the device is awake and execution of its PM callbacks is prevented
while the PM domain is assigned. Unbound devices happen to fulfill this
requirement, but bound devices can be made to satisfy it as well:
The caller can prevent execution of PM ops with lock_system_sleep() and
by holding a runtime PM reference to the device.
Accordingly, adjust dev_pm_domain_set() to WARN only if the device is in
the midst of a system sleep transition, or runtime PM is enabled and the
device is either not active or may become inactive imminently (because
it has no active children or its refcount is zero).
The change is required to support runtime PM for Thunderbolt on the Mac,
which poses the unique issue that a child device (the NHI) needs to
assign a PM domain to its grandparent (the upstream bridge). Because
the grandparent's driver is built-in and the child's driver is a module,
the grandparent is usually already bound when the child probes,
resulting in a WARN splat when calling dev_pm_domain_set(). However the
PM core guarantees both that the grandparent is active and that system
sleep is not commenced until the child has finished probing. So in this
case it is safe to call dev_pm_domain_set() from the child's ->probe
hook and the WARN splat is entirely gratuitous.
Note that commit e79aee49bcf9 ("PM: Avoid false-positive warnings in
dev_pm_domain_set()") modified the WARN to not apply if a PM domain is
removed. This is unsafe as it allows removal of the PM domain while
the device is asleep. The present commit rectifies this.
Cc: Ulf Hansson <[email protected]>
Cc: Tomeu Vizoso <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
---
drivers/base/power/common.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/base/power/common.c b/drivers/base/power/common.c
index f6a9ad5..d02c1e0 100644
--- a/drivers/base/power/common.c
+++ b/drivers/base/power/common.c
@@ -13,6 +13,7 @@
#include <linux/pm_clock.h>
#include <linux/acpi.h>
#include <linux/pm_domain.h>
+#include <linux/pm_runtime.h>
#include "power.h"
@@ -136,8 +137,10 @@ EXPORT_SYMBOL_GPL(dev_pm_domain_detach);
* @dev: Device whose PM domain is to be set.
* @pd: PM domain to be set, or NULL.
*
- * Sets the PM domain the device belongs to. The PM domain of a device needs
- * to be set before its probe finishes (it's bound to a driver).
+ * Sets the PM domain the device belongs to. The PM domain of a device needs
+ * to be set while the device is awake. This is guaranteed during ->probe.
+ * Otherwise the caller is responsible for ensuring wakefulness, e.g. by
+ * holding a runtime PM reference as well as invoking lock_system_sleep().
*
* This function must be called with the device lock held.
*/
@@ -146,8 +149,12 @@ void dev_pm_domain_set(struct device *dev, struct dev_pm_domain *pd)
if (dev->pm_domain == pd)
return;
- WARN(pd && device_is_bound(dev),
- "PM domains can only be changed for unbound devices\n");
+ WARN(dev->power.is_prepared || dev->power.is_suspended ||
+ (pm_runtime_enabled(dev) &&
+ (dev->power.runtime_status != RPM_ACTIVE ||
+ (pm_children_suspended(dev) &&
+ !atomic_read(&dev->power.usage_count)))),
+ "PM domains can only be changed for awake devices\n");
dev->pm_domain = pd;
device_pm_check_callbacks(dev);
}
--
2.10.2
Document and implement Apple's ACPI-based (but nonstandard) pm mechanism
for Thunderbolt. Briefly, an ACPI method provided by Apple is used to
cut power to the controller. A GPE is enabled while the controller is
powered down which sideband-signals a plug event, whereupon we reinstate
power using the ACPI method.
This saves 1.7 W on machines with a Light Ridge controller and is
reported to save 4 W on Cactus Ridge 4C and Falcon Ridge 4C. (I believe
4 W includes the bus power drawn by Apple's Gigabit Ethernet adapter.)
It fixes (at least partially) a power regression introduced in 3.17 by
commit 7bc5a2bad0b8 ("ACPI: Support _OSI("Darwin") correctly").
A Thunderbolt controller appears to the OS as a set of virtual devices:
One upstream bridge, multiple downstream bridges and one NHI (Native
Host Interface). The upstream and downstream bridges represent a PCIe
switch (see definition of a switch in the PCIe spec). The NHI device is
used to manage the switch fabric. Hotplugged devices appear behind the
downstream bridges:
(Root Port) ---- Upstream Bridge --+-- Downstream Bridge 0 ---- NHI
+-- Downstream Bridge 1 --
+-- Downstream Bridge 2 --
...
Power is cut to the entire set of devices. The Linux pm model is
hierarchical and assumes that a child cannot resume before its parent.
To conform to this model, power control must be governed by the
Thunderbolt controller's topmost device, which is the upstream bridge.
The NHI and downstream bridges go to D3hot independently and the
upstream bridge goes to D3cold once all its children have suspended.
This commit only adds runtime pm for the upstream bridge. Runtime pm
for the NHI is added in a separate commit to signify its independence.
Runtime pm for the downstream bridges is handled by the pcieport driver.
Because Apple's ACPI methods are nonstandard, a struct dev_pm_domain is
used to override the PCI bus pm_ops. The thunderbolt driver binds to
the NHI, thus the dev_pm_domain is assigned to the upstream bridge when
its grandchild ->probes and evicted when it ->removes.
There are no Thunderbolt specs publicly available from Intel or Apple,
so I've included documentation to the extent that I was able to reverse-
engineer things. Documentation on the Go2Sx and Ok2Go2Sx pins is
tentative as those are missing on my Light Ridge. Apple only uses them
on Cactus Ridge 4C. Someone with such a controller needs to find out
through experimentation if the documentation is accurate and amend it if
necessary.
To maximize power saving, the controller utilizes the PM core's direct-
complete procedure, i.e. it stays suspended during the system sleep
process.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=92111
Cc: Andreas Noever <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
---
drivers/thunderbolt/Kconfig | 3 +-
drivers/thunderbolt/Makefile | 4 +-
drivers/thunderbolt/nhi.c | 3 +
drivers/thunderbolt/power.c | 347 +++++++++++++++++++++++++++++++++++++++++++
drivers/thunderbolt/power.h | 37 +++++
drivers/thunderbolt/tb.h | 2 +
6 files changed, 393 insertions(+), 3 deletions(-)
create mode 100644 drivers/thunderbolt/power.c
create mode 100644 drivers/thunderbolt/power.h
diff --git a/drivers/thunderbolt/Kconfig b/drivers/thunderbolt/Kconfig
index d35db16..41625cf 100644
--- a/drivers/thunderbolt/Kconfig
+++ b/drivers/thunderbolt/Kconfig
@@ -1,9 +1,10 @@
menuconfig THUNDERBOLT
tristate "Thunderbolt support for Apple devices"
- depends on PCI
+ depends on PCI && ACPI
depends on X86 || COMPILE_TEST
select APPLE_PROPERTIES if EFI_STUB && X86
select CRC32
+ select PM
help
Cactus Ridge Thunderbolt Controller driver
This driver is required if you want to hotplug Thunderbolt devices on
diff --git a/drivers/thunderbolt/Makefile b/drivers/thunderbolt/Makefile
index 5d1053c..b220825 100644
--- a/drivers/thunderbolt/Makefile
+++ b/drivers/thunderbolt/Makefile
@@ -1,3 +1,3 @@
obj-${CONFIG_THUNDERBOLT} := thunderbolt.o
-thunderbolt-objs := nhi.o ctl.o tb.o switch.o cap.o path.o tunnel_pci.o eeprom.o
-
+thunderbolt-objs := nhi.o ctl.o tb.o switch.o cap.o path.o tunnel_pci.o \
+ eeprom.o power.o
diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index a8c2041..88fb2fb 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -605,6 +605,8 @@ static int nhi_probe(struct pci_dev *pdev, const struct pci_device_id *id)
}
pci_set_drvdata(pdev, tb);
+ thunderbolt_power_init(tb);
+
return 0;
}
@@ -612,6 +614,7 @@ static void nhi_remove(struct pci_dev *pdev)
{
struct tb *tb = pci_get_drvdata(pdev);
struct tb_nhi *nhi = tb->nhi;
+ thunderbolt_power_fini(tb);
thunderbolt_shutdown_and_free(tb);
nhi_shutdown(nhi);
}
diff --git a/drivers/thunderbolt/power.c b/drivers/thunderbolt/power.c
new file mode 100644
index 0000000..4d7c6a0
--- /dev/null
+++ b/drivers/thunderbolt/power.c
@@ -0,0 +1,347 @@
+/*
+ * power.c - power thunderbolt controller down when idle
+ * Copyright (C) 2016 Lukas Wunner <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2) as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Apple provides the following means for power control in ACPI:
+ *
+ * * On Macs with Thunderbolt 1 Gen 1 controllers (Light Ridge, Eagle Ridge):
+ * * XRPE method ("Power Enable"), takes argument 1 or 0, toggles a GPIO pin
+ * to switch the controller on or off.
+ * * XRIN named object (alternatively _GPE), contains number of a GPE which
+ * fires as long as something is plugged in (regardless of power state).
+ * * XRIL method ("Interrupt Low"), returns 0 as long as something is
+ * plugged in, 1 otherwise.
+ * * XRIP and XRIO methods, unused by macOS driver.
+ *
+ * * On Macs with Thunderbolt 1 Gen 2 controllers (Cactus Ridge 4C):
+ * * XRIN not only fires as long as something is plugged in, but also as long
+ * as the controller's CIO switch is powered up.
+ * * XRIL method changed its meaning, it returns 0 as long as the CIO switch
+ * is powered up, 1 otherwise.
+ * * Additional SXFP method ("Force Power"), accepts only argument 0,
+ * switches the controller off. This carries out just the raw power change,
+ * unlike XRPE which disables the link on the PCIe Root Port in an orderly
+ * fashion before switching off the controller.
+ * * Additional SXLV, SXIO, SXIL methods to utilize the Go2Sx and Ok2Go2Sx
+ * pins (see background below). Apparently SXLV toggles the value given to
+ * the POC via Go2Sx (0 or 1), SXIO changes the direction (0 or 1) and SXIL
+ * returns the value received from the POC via Ok2Go2Sx.
+ * * On some Macs, additional XRST method, takes argument 1 or 0, asserts or
+ * deasserts a GPIO pin to reset the controller.
+ * * On Macs introduced 2013, XRPE was renamed TRPE.
+ *
+ * * On Macs with Thunderbolt 2 controllers (Falcon Ridge 4C and 2C):
+ * * SXLV, SXIO, SXIL methods to utilize Go2Sx and Ok2Go2Sx are gone.
+ * * On the MacPro6,1 which has multiple Thunderbolt controllers, each NHI
+ * device has a separate XRIN GPE and separate TRPE, SXFP and XRIL methods.
+ *
+ * Background:
+ *
+ * * Gen 1 controllers (Light Ridge, Eagle Ridge) had no power management
+ * and no ability to distinguish whether a DP or Thunderbolt device is
+ * plugged in. Apple put an ARM Cortex MCU (NXP LPC1112A) on the logic board
+ * which snoops on the connector lines and, depending on the type of device,
+ * sends an HPD signal to the GPU or rings the Thunderbolt XRIN doorbell
+ * interrupt. The switches for the 3.3V and 1.05V power rails to the
+ * Thunderbolt controller are toggled by a GPIO pin on the southbridge.
+ *
+ * * On gen 2 controllers (Cactus Ridge 4C), Intel integrated the MCU into the
+ * controller and called it POC. This caused a change of semantics for XRIN
+ * and XRIL. The POC is powered by a separate 3.3V rail which is active even
+ * in sleep state S4. It only draws a very small current. The regular 3.3V
+ * and 1.05V power rails are no longer controlled by the southbridge but by
+ * the POC. In other words the controller powers *itself* up and down! It is
+ * instructed to do so with the Go2Sx pin. Another pin, Ok2Go2Sx, allows the
+ * controller to indicate if it is ready to power itself down. Apple wires
+ * Go2Sx and Ok2Go2Sx to the same GPIO pin on the southbridge, hence the pin
+ * is used bidirectionally. A third pin, Force Power, is intended by Intel
+ * for debug only but Apple abuses it for XRPE/TRPE and SXFP. Perhaps it
+ * leads to larger power saving gains. They utilize Go2Sx and Ok2Go2Sx only
+ * on Cactus Ridge, presumably because the controller somehow requires that.
+ * On Falcon Ridge they forego these pins and rely solely on Force Power.
+ *
+ * Implementation Notes:
+ *
+ * * To conform to Linux' hierarchical power management model, power control
+ * is governed by the topmost PCI device of the controller, which is the
+ * upstream bridge. The controller is powered down once all child devices
+ * of the upstream bridge have suspended and its autosuspend delay has
+ * elapsed.
+ *
+ * * The autosuspend delay is user configurable via sysfs and should be lower
+ * or equal to that of the NHI since hotplug events are not acted upon if
+ * the NHI has suspended but the controller has not yet powered down.
+ * However the delay should not be zero to avoid frequent power changes
+ * (e.g. multiple times just for lspci -vv) since powering up takes 2 sec.
+ * (Powering down is almost instantaneous.)
+ */
+
+#include <linux/delay.h>
+#include <linux/pci.h>
+#include <linux/pm_runtime.h>
+
+#include "power.h"
+
+#ifdef pr_fmt
+#undef pr_fmt
+#endif
+#define pr_fmt(fmt) KBUILD_MODNAME " %s: " fmt, dev_name(dev)
+
+#define to_power(dev) container_of(dev->pm_domain, struct tb_power, pm_domain)
+
+static int upstream_prepare(struct device *dev)
+{
+ struct tb_power *power = to_power(dev);
+
+ if (pm_runtime_active(dev))
+ return 0;
+
+ /* prevent interrupts during system sleep transition */
+ if (ACPI_FAILURE(acpi_disable_gpe(NULL, power->wake_gpe))) {
+ pr_err("cannot disable wake GPE, resuming\n");
+ pm_request_resume(dev);
+ return -EAGAIN;
+ }
+
+ return DPM_DIRECT_COMPLETE;
+}
+
+static void upstream_complete(struct device *dev)
+{
+ struct tb_power *power = to_power(dev);
+
+ if (pm_runtime_active(dev))
+ return;
+
+ /*
+ * If the controller was powered down before system sleep, calling XRPE
+ * to power it up will fail on the next runtime resume. An additional
+ * call to XRPE is necessary to reset the power switch first.
+ */
+ pr_info("resetting power switch\n");
+ if (ACPI_FAILURE(acpi_execute_simple_method(power->set, NULL, 0))) {
+ pr_err("cannot call power->set method\n");
+ dev->power.runtime_error = -EIO;
+ }
+
+ if (ACPI_FAILURE(acpi_enable_gpe(NULL, power->wake_gpe))) {
+ pr_err("cannot enable wake GPE, resuming\n");
+ pm_request_resume(dev);
+ }
+}
+
+static int set_d3cold(struct pci_dev *pdev, void *ptr)
+{
+ pdev->current_state = PCI_D3cold;
+ return 0;
+}
+
+static int request_resume(struct pci_dev *pdev, void *ptr)
+{
+ WARN_ON(pm_request_resume(&pdev->dev) < 0);
+ return 0;
+}
+
+static int upstream_runtime_suspend(struct device *dev)
+{
+ struct tb_power *power = to_power(dev);
+ struct pci_dev *pdev = to_pci_dev(dev);
+ unsigned long long powered_down;
+ int ret, i;
+
+ /* children are effectively in D3cold once upstream goes to D3hot */
+ pci_walk_bus(pdev->subordinate, set_d3cold, NULL);
+
+ ret = dev->bus->pm->runtime_suspend(dev);
+ if (ret) {
+ pci_walk_bus(pdev->subordinate, request_resume, NULL);
+ return ret;
+ }
+
+ pr_info("powering down\n");
+ pdev->current_state = PCI_D3cold;
+ if (ACPI_FAILURE(acpi_execute_simple_method(power->set, NULL, 0))) {
+ pr_err("cannot call power->set method, resuming\n");
+ goto err;
+ }
+
+ /*
+ * On gen 2 controllers, the wake GPE fires as long as the controller
+ * is powered up. Poll until it's powered down before enabling the GPE.
+ */
+ for (i = 0; i < 300; i++) {
+ if (ACPI_FAILURE(acpi_evaluate_integer(power->get,
+ NULL, NULL, &powered_down))) {
+ pr_err("cannot call power->get method, resuming\n");
+ goto err;
+ }
+ if (powered_down)
+ break;
+ usleep_range(800, 1600);
+ }
+ if (!powered_down) {
+ pr_err("refused to power down, resuming\n");
+ goto err;
+ }
+
+ if (ACPI_FAILURE(acpi_enable_gpe(NULL, power->wake_gpe))) {
+ pr_err("cannot enable wake GPE, resuming\n");
+ goto err;
+ }
+
+ return 0;
+
+err:
+ acpi_execute_simple_method(power->set, NULL, 1);
+ dev->bus->pm->runtime_resume(dev);
+ pci_walk_bus(pdev->subordinate, request_resume, NULL);
+ return -EAGAIN;
+}
+
+static int upstream_runtime_resume(struct device *dev)
+{
+ struct tb_power *power = to_power(dev);
+ struct pci_dev *pdev = to_pci_dev(dev);
+ int ret;
+
+ if (!dev->power.is_prepared &&
+ ACPI_FAILURE(acpi_disable_gpe(NULL, power->wake_gpe))) {
+ pr_err("cannot disable wake GPE, disabling runtime pm\n");
+ pm_runtime_get_noresume(&power->tb->nhi->pdev->dev);
+ }
+
+ pr_info("powering up\n");
+ if (ACPI_FAILURE(acpi_execute_simple_method(power->set, NULL, 1))) {
+ pr_err("cannot call power->set method\n");
+ return -ENODEV;
+ }
+
+ ret = dev->bus->pm->runtime_resume(dev);
+
+ /* wake children to force pci_restore_state() after D3cold */
+ pci_walk_bus(pdev->subordinate, request_resume, NULL);
+
+ return ret;
+}
+
+static u32 nhi_wake(acpi_handle gpe_device, u32 gpe_number, void *ctx)
+{
+ struct device *nhi_dev = ctx;
+ WARN_ON(pm_request_resume(nhi_dev) < 0);
+ return ACPI_INTERRUPT_HANDLED;
+}
+
+static int disable_pme_poll(struct pci_dev *pdev, void *ptr)
+{
+ struct pci_bus *downstream_bus = (struct pci_bus *)ptr;
+
+ /* PME# pin is not connected, the wake GPE is used instead */
+ if (pdev->bus == downstream_bus || /* downstream bridge */
+ pdev->subordinate == downstream_bus || /* upstream bridge */
+ (pdev->bus->parent == downstream_bus &&
+ pdev->class == PCI_CLASS_SYSTEM_OTHER << 8)) /* NHI */
+ pdev->pme_poll = false;
+
+ return 0;
+}
+
+void thunderbolt_power_init(struct tb *tb)
+{
+ struct device *upstream_dev, *nhi_dev = &tb->nhi->pdev->dev;
+ struct tb_power *power = NULL;
+ struct acpi_handle *nhi_handle;
+
+ power = kzalloc(sizeof(*power), GFP_KERNEL);
+ if (!power) {
+ dev_err(nhi_dev, "cannot allocate power data\n");
+ goto err;
+ }
+
+ nhi_handle = ACPI_HANDLE(nhi_dev);
+ if (!nhi_handle) {
+ dev_err(nhi_dev, "cannot find ACPI handle\n");
+ goto err;
+ }
+
+ /* Macs introduced 2011/2012 have XRPE, 2013+ have TRPE */
+ if (ACPI_FAILURE(acpi_get_handle(nhi_handle, "XRPE", &power->set)) &&
+ ACPI_FAILURE(acpi_get_handle(nhi_handle, "TRPE", &power->set))) {
+ dev_err(nhi_dev, "cannot find power->set method\n");
+ goto err;
+ }
+
+ if (ACPI_FAILURE(acpi_get_handle(nhi_handle, "XRIL", &power->get))) {
+ dev_err(nhi_dev, "cannot find power->get method\n");
+ goto err;
+ }
+
+ if (ACPI_FAILURE(acpi_evaluate_integer(nhi_handle, "XRIN", NULL,
+ &power->wake_gpe))) {
+ dev_err(nhi_dev, "cannot find wake GPE\n");
+ goto err;
+ }
+
+ if (ACPI_FAILURE(acpi_install_gpe_handler(NULL, power->wake_gpe,
+ ACPI_GPE_LEVEL_TRIGGERED, nhi_wake, nhi_dev))) {
+ dev_err(nhi_dev, "cannot install GPE handler\n");
+ goto err;
+ }
+
+ if (!nhi_dev->parent || !nhi_dev->parent->parent) {
+ dev_err(nhi_dev, "cannot find upstream bridge\n");
+ goto err;
+ }
+ upstream_dev = nhi_dev->parent->parent;
+
+ pci_walk_bus(to_pci_dev(upstream_dev)->bus, disable_pme_poll,
+ to_pci_dev(upstream_dev)->subordinate);
+
+ power->pm_domain.ops = *upstream_dev->bus->pm;
+ power->pm_domain.ops.prepare = upstream_prepare;
+ power->pm_domain.ops.complete = upstream_complete;
+ power->pm_domain.ops.runtime_suspend = upstream_runtime_suspend;
+ power->pm_domain.ops.runtime_resume = upstream_runtime_resume;
+ power->tb = tb;
+ dev_pm_domain_set(upstream_dev, &power->pm_domain);
+
+ tb->power = power;
+
+ return;
+
+err:
+ dev_err(nhi_dev, "controller will stay powered up permanently\n");
+ kfree(power);
+}
+
+void thunderbolt_power_fini(struct tb *tb)
+{
+ struct device *nhi_dev = &tb->nhi->pdev->dev;
+ struct device *upstream_dev = nhi_dev->parent->parent;
+ struct tb_power *power = tb->power;
+
+ if (!power)
+ return;
+
+ tb->power = NULL;
+ dev_pm_domain_set(upstream_dev, NULL);
+
+ if (ACPI_FAILURE(acpi_remove_gpe_handler(NULL, power->wake_gpe,
+ nhi_wake)))
+ dev_err(nhi_dev, "cannot remove GPE handler\n");
+
+ kfree(power);
+}
diff --git a/drivers/thunderbolt/power.h b/drivers/thunderbolt/power.h
new file mode 100644
index 0000000..4ab9b29
--- /dev/null
+++ b/drivers/thunderbolt/power.h
@@ -0,0 +1,37 @@
+/*
+ * power.h - power thunderbolt controller down when idle
+ * Copyright (C) 2016 Lukas Wunner <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License (version 2) as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef THUNDERBOLT_POWER_H
+#define THUNDERBOLT_POWER_H
+
+#include <linux/acpi.h>
+#include <linux/pm_domain.h>
+
+#include "tb.h"
+
+struct tb_power {
+ struct tb *tb;
+ struct dev_pm_domain pm_domain; /* assigned to upstream bridge */
+ unsigned long long wake_gpe; /* hotplug interrupt during powerdown */
+ acpi_handle set; /* method to power controller up/down */
+ acpi_handle get; /* method to query power state */
+};
+
+void thunderbolt_power_init(struct tb *tb);
+void thunderbolt_power_fini(struct tb *tb);
+
+#endif
diff --git a/drivers/thunderbolt/tb.h b/drivers/thunderbolt/tb.h
index 61d57ba..43aed58 100644
--- a/drivers/thunderbolt/tb.h
+++ b/drivers/thunderbolt/tb.h
@@ -11,6 +11,7 @@
#include "tb_regs.h"
#include "ctl.h"
+#include "power.h"
/**
* struct tb_switch - a thunderbolt switch
@@ -103,6 +104,7 @@ struct tb {
*/
struct tb_nhi *nhi;
struct tb_ctl *ctl;
+ struct tb_power *power;
struct workqueue_struct *wq; /* ordered workqueue for plug events */
struct tb_switch *root_switch;
struct list_head tunnel_list; /* list of active PCIe tunnels */
--
2.10.2
Runtime suspend the NHI when no Thunderbolt devices have been plugged in
for 10 sec (user-configurable via autosuspend_delay_ms in sysfs).
The NHI is not able to detect plug events while suspended, it relies on
the GPE handler to resume it on hotplug.
After the NHI resumes, it takes about 700 ms until a hotplug event
appears on the RX ring. In case autosuspend_delay_ms has been reduced
to 0 by the user, we need to wait in tb_resume() to avoid going back to
sleep before we had a chance to detect a hotplugged device. A runtime
pm ref is held for the duration of tb_handle_hotplug() to keep the NHI
awake while the hotplug event is processed.
Apart from that we acquire a runtime pm ref for each newly allocated
switch (except for the root switch) and drop one when a switch is freed,
thereby ensuring the NHI stays active as long as devices are plugged in.
This behaviour is identical to the macOS driver.
Cc: Andreas Noever <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
---
drivers/thunderbolt/nhi.c | 2 ++
drivers/thunderbolt/power.c | 9 +++++++++
drivers/thunderbolt/switch.c | 9 +++++++++
drivers/thunderbolt/tb.c | 13 +++++++++++++
4 files changed, 33 insertions(+)
diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index 88fb2fb..319ed81 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -632,6 +632,8 @@ static const struct dev_pm_ops nhi_pm_ops = {
* pci-tunnels stay alive.
*/
.restore_noirq = nhi_resume_noirq,
+ .runtime_suspend = nhi_suspend_noirq,
+ .runtime_resume = nhi_resume_noirq,
};
static struct pci_device_id nhi_ids[] = {
diff --git a/drivers/thunderbolt/power.c b/drivers/thunderbolt/power.c
index 4d7c6a0..1b5f066 100644
--- a/drivers/thunderbolt/power.c
+++ b/drivers/thunderbolt/power.c
@@ -320,6 +320,12 @@ void thunderbolt_power_init(struct tb *tb)
tb->power = power;
+ pm_runtime_allow(nhi_dev);
+ pm_runtime_set_autosuspend_delay(nhi_dev, 10000);
+ pm_runtime_use_autosuspend(nhi_dev);
+ pm_runtime_mark_last_busy(nhi_dev);
+ pm_runtime_put_autosuspend(nhi_dev);
+
return;
err:
@@ -336,6 +342,9 @@ void thunderbolt_power_fini(struct tb *tb)
if (!power)
return;
+ pm_runtime_get(nhi_dev);
+ pm_runtime_forbid(nhi_dev);
+
tb->power = NULL;
dev_pm_domain_set(upstream_dev, NULL);
diff --git a/drivers/thunderbolt/switch.c b/drivers/thunderbolt/switch.c
index c6f30b1..422fe6e 100644
--- a/drivers/thunderbolt/switch.c
+++ b/drivers/thunderbolt/switch.c
@@ -5,6 +5,7 @@
*/
#include <linux/delay.h>
+#include <linux/pm_runtime.h>
#include <linux/slab.h>
#include "tb.h"
@@ -326,6 +327,11 @@ void tb_switch_free(struct tb_switch *sw)
if (!sw->is_unplugged)
tb_plug_events_active(sw, false);
+ if (sw != sw->tb->root_switch) {
+ pm_runtime_mark_last_busy(&sw->tb->nhi->pdev->dev);
+ pm_runtime_put_autosuspend(&sw->tb->nhi->pdev->dev);
+ }
+
kfree(sw->ports);
kfree(sw->drom);
kfree(sw);
@@ -420,6 +426,9 @@ struct tb_switch *tb_switch_alloc(struct tb *tb, u64 route)
if (tb_plug_events_active(sw, true))
goto err;
+ if (tb->root_switch)
+ pm_runtime_get(&tb->nhi->pdev->dev);
+
return sw;
err:
kfree(sw->ports);
diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c
index 24b6d30..a3fedf9 100644
--- a/drivers/thunderbolt/tb.c
+++ b/drivers/thunderbolt/tb.c
@@ -7,6 +7,7 @@
#include <linux/slab.h>
#include <linux/errno.h>
#include <linux/delay.h>
+#include <linux/pm_runtime.h>
#include "tb.h"
#include "tb_regs.h"
@@ -217,8 +218,11 @@ static void tb_handle_hotplug(struct work_struct *work)
{
struct tb_hotplug_event *ev = container_of(work, typeof(*ev), work);
struct tb *tb = ev->tb;
+ struct device *dev = &tb->nhi->pdev->dev;
struct tb_switch *sw;
struct tb_port *port;
+
+ pm_runtime_get(dev);
mutex_lock(&tb->lock);
if (!tb->hotplug_active)
goto out; /* during init, suspend or shutdown */
@@ -274,6 +278,8 @@ static void tb_handle_hotplug(struct work_struct *work)
out:
mutex_unlock(&tb->lock);
kfree(ev);
+ pm_runtime_mark_last_busy(dev);
+ pm_runtime_put_autosuspend(dev);
}
/**
@@ -433,4 +439,11 @@ void thunderbolt_resume(struct tb *tb)
tb->hotplug_active = true;
mutex_unlock(&tb->lock);
tb_info(tb, "resume finished\n");
+
+ /*
+ * If runtime resuming due to a hotplug event (rather than resuming
+ * from system sleep), wait for it to arrive. May take about 700 ms.
+ */
+ if (tb->nhi->pdev->dev.power.runtime_status == RPM_RESUMING)
+ msleep(1000);
}
--
2.10.2
On Sat, Dec 17, 2016 at 4:39 PM, Lukas Wunner <[email protected]> wrote:
> Document and implement Apple's ACPI-based (but nonstandard) pm mechanism
> for Thunderbolt. Briefly, an ACPI method provided by Apple is used to
> cut power to the controller. A GPE is enabled while the controller is
> powered down which sideband-signals a plug event, whereupon we reinstate
> power using the ACPI method.
>
> This saves 1.7 W on machines with a Light Ridge controller and is
> reported to save 4 W on Cactus Ridge 4C and Falcon Ridge 4C. (I believe
> 4 W includes the bus power drawn by Apple's Gigabit Ethernet adapter.)
> It fixes (at least partially) a power regression introduced in 3.17 by
> commit 7bc5a2bad0b8 ("ACPI: Support _OSI("Darwin") correctly").
>
> +++ b/drivers/thunderbolt/power.c
> @@ -0,0 +1,347 @@
> +#include <linux/delay.h>
> +#include <linux/pci.h>
> +#include <linux/pm_runtime.h>
> +
> +#include "power.h"
> +
> +#ifdef pr_fmt
> +#undef pr_fmt
> +#endif
Perhaps just define pr_fmt before any other include?
We have such check where actually default pr_fmt is defined. No need
to duplicate.
> +#define pr_fmt(fmt) KBUILD_MODNAME " %s: " fmt, dev_name(dev)
> +
> + /* prevent interrupts during system sleep transition */
> + if (ACPI_FAILURE(acpi_disable_gpe(NULL, power->wake_gpe))) {
> + pr_err("cannot disable wake GPE, resuming\n");
dev_err?
> + pm_request_resume(dev);
> + return -EAGAIN;
> + }
> +
> + return DPM_DIRECT_COMPLETE;
> +}
> + pr_info("resetting power switch\n");
> + if (ACPI_FAILURE(acpi_execute_simple_method(power->set, NULL, 0))) {
> + pr_err("cannot call power->set method\n");
> + dev->power.runtime_error = -EIO;
> + }
> +
> + if (ACPI_FAILURE(acpi_enable_gpe(NULL, power->wake_gpe))) {
> + pr_err("cannot enable wake GPE, resuming\n");
> + pm_request_resume(dev);
> + }
Ditto. pr_ -> dev_ ?
Also in the rest of code where applicable.
> + /*
> + * On gen 2 controllers, the wake GPE fires as long as the controller
> + * is powered up. Poll until it's powered down before enabling the GPE.
> + */
> + for (i = 0; i < 300; i++) {
300 is magic.
> + if (ACPI_FAILURE(acpi_evaluate_integer(power->get,
> + NULL, NULL, &powered_down))) {
> + pr_err("cannot call power->get method, resuming\n");
> + goto err;
> + }
> + if (powered_down)
> + break;
> + usleep_range(800, 1600);
Why 800? Perhaps comment on this.
> + }
> + if (!powered_down) {
> + pr_err("refused to power down, resuming\n");
> + goto err;
> + }
> +
> + if (ACPI_FAILURE(acpi_enable_gpe(NULL, power->wake_gpe))) {
> + pr_err("cannot enable wake GPE, resuming\n");
> + goto err;
> + }
> +
> + return 0;
> +
> +err:
err_resume: ?
> + acpi_execute_simple_method(power->set, NULL, 1);
> + dev->bus->pm->runtime_resume(dev);
> + pci_walk_bus(pdev->subordinate, request_resume, NULL);
> + return -EAGAIN;
> +}
> +void thunderbolt_power_init(struct tb *tb)
> +{
> + struct device *upstream_dev, *nhi_dev = &tb->nhi->pdev->dev;
> + struct tb_power *power = NULL;
> + struct acpi_handle *nhi_handle;
> +
> + power = kzalloc(sizeof(*power), GFP_KERNEL);
> + if (!power) {
> + dev_err(nhi_dev, "cannot allocate power data\n");
> + goto err;
> + }
> +
> + nhi_handle = ACPI_HANDLE(nhi_dev);
> + if (!nhi_handle) {
> + dev_err(nhi_dev, "cannot find ACPI handle\n");
> + goto err;
> + }
> +
> + /* Macs introduced 2011/2012 have XRPE, 2013+ have TRPE */
> + if (ACPI_FAILURE(acpi_get_handle(nhi_handle, "XRPE", &power->set)) &&
> + ACPI_FAILURE(acpi_get_handle(nhi_handle, "TRPE", &power->set))) {
> + dev_err(nhi_dev, "cannot find power->set method\n");
> + goto err;
> + }
> +
> + if (ACPI_FAILURE(acpi_get_handle(nhi_handle, "XRIL", &power->get))) {
> + dev_err(nhi_dev, "cannot find power->get method\n");
> + goto err;
> + }
> +
> + if (ACPI_FAILURE(acpi_evaluate_integer(nhi_handle, "XRIN", NULL,
> + &power->wake_gpe))) {
> + dev_err(nhi_dev, "cannot find wake GPE\n");
> + goto err;
> + }
> +
> + if (ACPI_FAILURE(acpi_install_gpe_handler(NULL, power->wake_gpe,
> + ACPI_GPE_LEVEL_TRIGGERED, nhi_wake, nhi_dev))) {
> + dev_err(nhi_dev, "cannot install GPE handler\n");
> + goto err;
> + }
> +
> + if (!nhi_dev->parent || !nhi_dev->parent->parent) {
> + dev_err(nhi_dev, "cannot find upstream bridge\n");
> + goto err;
> + }
> + upstream_dev = nhi_dev->parent->parent;
> +
> + pci_walk_bus(to_pci_dev(upstream_dev)->bus, disable_pme_poll,
> + to_pci_dev(upstream_dev)->subordinate);
> +
> + power->pm_domain.ops = *upstream_dev->bus->pm;
> + power->pm_domain.ops.prepare = upstream_prepare;
> + power->pm_domain.ops.complete = upstream_complete;
> + power->pm_domain.ops.runtime_suspend = upstream_runtime_suspend;
> + power->pm_domain.ops.runtime_resume = upstream_runtime_resume;
> + power->tb = tb;
> + dev_pm_domain_set(upstream_dev, &power->pm_domain);
> +
> + tb->power = power;
> +
> + return;
> +
> +err:
err_free: ?
> + dev_err(nhi_dev, "controller will stay powered up permanently\n");
> + kfree(power);
> +}
> +
> +void thunderbolt_power_fini(struct tb *tb)
> +{
> + struct device *nhi_dev = &tb->nhi->pdev->dev;
> + struct device *upstream_dev = nhi_dev->parent->parent;
> + struct tb_power *power = tb->power;
> +
> + if (!power)
> + return;
Would be the case?
> +
> + tb->power = NULL;
> + dev_pm_domain_set(upstream_dev, NULL);
> +
> + if (ACPI_FAILURE(acpi_remove_gpe_handler(NULL, power->wake_gpe,
> + nhi_wake)))
> + dev_err(nhi_dev, "cannot remove GPE handler\n");
> +
> + kfree(power);
> +}
--
With Best Regards,
Andy Shevchenko
On Mon, Dec 19, 2016 at 01:05:10AM +0200, Andy Shevchenko wrote:
> On Sat, Dec 17, 2016 at 4:39 PM, Lukas Wunner <[email protected]> wrote:
> > Document and implement Apple's ACPI-based (but nonstandard) pm mechanism
> > for Thunderbolt. Briefly, an ACPI method provided by Apple is used to
> > cut power to the controller. A GPE is enabled while the controller is
> > powered down which sideband-signals a plug event, whereupon we reinstate
> > power using the ACPI method.
> >
> > This saves 1.7 W on machines with a Light Ridge controller and is
> > reported to save 4 W on Cactus Ridge 4C and Falcon Ridge 4C. (I believe
> > 4 W includes the bus power drawn by Apple's Gigabit Ethernet adapter.)
> > It fixes (at least partially) a power regression introduced in 3.17 by
> > commit 7bc5a2bad0b8 ("ACPI: Support _OSI("Darwin") correctly").
> >
>
> > +++ b/drivers/thunderbolt/power.c
> > @@ -0,0 +1,347 @@
>
> > +#include <linux/delay.h>
> > +#include <linux/pci.h>
> > +#include <linux/pm_runtime.h>
> > +
> > +#include "power.h"
> > +
>
> > +#ifdef pr_fmt
> > +#undef pr_fmt
> > +#endif
> > +#define pr_fmt(fmt) KBUILD_MODNAME " %s: " fmt, dev_name(dev)
>
> Perhaps just define pr_fmt before any other include?
> We have such check where actually default pr_fmt is defined. No need
> to duplicate.
If I put the '#define pr_fmt(fmt)' line above all includes, I get:
include/linux/ratelimit.h: In function 'ratelimit_state_exit':
drivers/thunderbolt/power.c:93:49: error: implicit declaration of function 'dev_name'
This is caused by 6b1d174b0c27 which was introduced this August.
If I try to solve this by including <linux/device.h> before the
'#define pr_fmt(fmt)' line, I get:
drivers/thunderbolt/power.c:95:0: warning: "pr_fmt" redefined
#define pr_fmt(fmt) KBUILD_MODNAME " %s: " fmt, dev_name(dev)
^
In file included from /root/kernel/linux/include/linux/kernel.h:13:0,
from /root/kernel/linux/include/linux/list.h:8,
from /root/kernel/linux/include/linux/kobject.h:20,
from /root/kernel/linux/include/linux/device.h:17,
from /root/kernel/linux/drivers/thunderbolt/power.c:93:
include/linux/printk.h:260:0: note: this is the location of the previous definition
#define pr_fmt(fmt) fmt
^
So it seems there's no alternative to the '#undef pr_fmt'.
> > + /* prevent interrupts during system sleep transition */
> > + if (ACPI_FAILURE(acpi_disable_gpe(NULL, power->wake_gpe))) {
> > + pr_err("cannot disable wake GPE, resuming\n");
>
> dev_err?
This is intentionally pr_err for cosmetic reasons. :-)
With dev_err it would look like this in dmesg:
pcieport 0000:05:00.0: cannot disable wake GPE, resuming
With pr_err it looks like this:
thunderbolt 0000:05:00.0: cannot disable wake GPE, resuming
Thus, someone grepping for this error message will get a hint that
they have to look in drivers/thunderbolt/ rather than drivers/pci/pcie/.
The code of this PM callback is located in the thunderbolt driver,
which binds to the NHI, 0000:07:00.0. But the PM callback is
assigned to the upstream bridge, which is the grandparent of the NHI,
0000:05:00.0. The pr_fmt is crafted such that the KBUILD_MODNAME
("thunderbolt") is logged rather than "pcieport". So I use pr_*
in the PM callbacks assigned to the upstream bridge and dev_*
in thunderbolt_power_init() / _fini() (which is executed in the
context of the NHI).
This is also much nicer for end users looking at dmesg. E.g. when
the chip is suspended, it looks like this:
thunderbolt 0000:07:00.0: suspending...
thunderbolt 0000:07:00.0: stopping RX ring 0
thunderbolt 0000:07:00.0: disabling interrupt at register 0x38204 bit 0 (0x1 -> 0x0)
thunderbolt 0000:07:00.0: stopping TX ring 0
thunderbolt 0000:07:00.0: disabling interrupt at register 0x38200 bit 0 (0x1 -> 0x0)
thunderbolt 0000:07:00.0: control channel stopped
thunderbolt 0000:07:00.0: suspend finished
thunderbolt 0000:05:00.0: powering down
It would be confusing for end users if it would say here that
the pcieport is powering down.
> > + /*
> > + * On gen 2 controllers, the wake GPE fires as long as the controller
> > + * is powered up. Poll until it's powered down before enabling the GPE.
> > + */
> > + for (i = 0; i < 300; i++) {
>
> 300 is magic.
[...]
> Why 800? Perhaps comment on this.
We mimic the behaviour of the macOS driver here which polls up to
300 times with a 1 ms delay. I've now extended the comment above
in my working branch to explain this.
> > +err:
>
> err_resume: ?
Ok.
> > +err:
>
> err_free: ?
Ok.
> > +void thunderbolt_power_fini(struct tb *tb)
> > +{
> > + struct device *nhi_dev = &tb->nhi->pdev->dev;
> > + struct device *upstream_dev = nhi_dev->parent->parent;
> > + struct tb_power *power = tb->power;
> > +
>
> > + if (!power)
> > + return;
>
> Would be the case?
That would be the case if thunderbolt_power_init() failed, then we
have to skip removing the GPE handler and all that. I've now added
a comment to explain this.
I've also discovered and fixed a bug in thunderbolt_power_init(),
in the "cannot find upstream bridge" error path I have to remove
the GPE handler.
I'll wait a bit if there's further feedback and will post a
rectified version probably next week, after the merge window
has closed.
Thanks!
Lukas
On Tue, Dec 20, 2016 at 1:28 PM, Lukas Wunner <[email protected]> wrote:
> On Mon, Dec 19, 2016 at 01:05:10AM +0200, Andy Shevchenko wrote:
>> On Sat, Dec 17, 2016 at 4:39 PM, Lukas Wunner <[email protected]> wrote:
>> > Document and implement Apple's ACPI-based (but nonstandard) pm mechanism
>> > for Thunderbolt. Briefly, an ACPI method provided by Apple is used to
>> > cut power to the controller. A GPE is enabled while the controller is
>> > powered down which sideband-signals a plug event, whereupon we reinstate
>> > power using the ACPI method.
>> >
>> > This saves 1.7 W on machines with a Light Ridge controller and is
>> > reported to save 4 W on Cactus Ridge 4C and Falcon Ridge 4C. (I believe
>> > 4 W includes the bus power drawn by Apple's Gigabit Ethernet adapter.)
>> > It fixes (at least partially) a power regression introduced in 3.17 by
>> > commit 7bc5a2bad0b8 ("ACPI: Support _OSI("Darwin") correctly").
>> > +++ b/drivers/thunderbolt/power.c
>> > @@ -0,0 +1,347 @@
>>
>> > +#include <linux/delay.h>
>> > +#include <linux/pci.h>
>> > +#include <linux/pm_runtime.h>
>> > +
>> > +#include "power.h"
>> > +
>>
>> > +#ifdef pr_fmt
>> > +#undef pr_fmt
>> > +#endif
>> > +#define pr_fmt(fmt) KBUILD_MODNAME " %s: " fmt, dev_name(dev)
>>
>> Perhaps just define pr_fmt before any other include?
>> We have such check where actually default pr_fmt is defined. No need
>> to duplicate.
>
> If I put the '#define pr_fmt(fmt)' line above all includes, I get:
>
> include/linux/ratelimit.h: In function 'ratelimit_state_exit':
> drivers/thunderbolt/power.c:93:49: error: implicit declaration of function 'dev_name'
>
> This is caused by 6b1d174b0c27 which was introduced this August.
>
>
> If I try to solve this by including <linux/device.h> before the
Not before, but rather after?
printk.h defines default pr_fmt. What you need is to define it before.
> '#define pr_fmt(fmt)' line, I get:
>
> drivers/thunderbolt/power.c:95:0: warning: "pr_fmt" redefined
> #define pr_fmt(fmt) KBUILD_MODNAME " %s: " fmt, dev_name(dev)
> ^
> In file included from /root/kernel/linux/include/linux/kernel.h:13:0,
> from /root/kernel/linux/include/linux/list.h:8,
> from /root/kernel/linux/include/linux/kobject.h:20,
> from /root/kernel/linux/include/linux/device.h:17,
> from /root/kernel/linux/drivers/thunderbolt/power.c:93:
> include/linux/printk.h:260:0: note: this is the location of the previous definition
> #define pr_fmt(fmt) fmt
> ^
>
>
> So it seems there's no alternative to the '#undef pr_fmt'.
Imagine how many drivers could suffer of this. So, something is wrong
either in your code, in headers, or in both. But many drivers for now
are using cusotm pr_fmt() in a way I described.
>> > + /* prevent interrupts during system sleep transition */
>> > + if (ACPI_FAILURE(acpi_disable_gpe(NULL, power->wake_gpe))) {
>> > + pr_err("cannot disable wake GPE, resuming\n");
>>
>> dev_err?
>
> This is intentionally pr_err for cosmetic reasons. :-)
>
> With dev_err it would look like this in dmesg:
>
> pcieport 0000:05:00.0: cannot disable wake GPE, resuming
>
> With pr_err it looks like this:
>
> thunderbolt 0000:05:00.0: cannot disable wake GPE, resuming
>
> Thus, someone grepping for this error message will get a hint that
> they have to look in drivers/thunderbolt/ rather than drivers/pci/pcie/.
>
> The code of this PM callback is located in the thunderbolt driver,
> which binds to the NHI, 0000:07:00.0. But the PM callback is
> assigned to the upstream bridge, which is the grandparent of the NHI,
> 0000:05:00.0. The pr_fmt is crafted such that the KBUILD_MODNAME
> ("thunderbolt") is logged rather than "pcieport". So I use pr_*
> in the PM callbacks assigned to the upstream bridge and dev_*
> in thunderbolt_power_init() / _fini() (which is executed in the
> context of the NHI).
I understand rationale, here my question: could pcie bridge driver
replace name for the port which serves as thunderbolt?
>> > +void thunderbolt_power_fini(struct tb *tb)
>> > +{
>> > + struct device *nhi_dev = &tb->nhi->pdev->dev;
>> > + struct device *upstream_dev = nhi_dev->parent->parent;
>> > + struct tb_power *power = tb->power;
>> > +
>>
>> > + if (!power)
>> > + return;
>>
>> Would be the case?
>
> That would be the case if thunderbolt_power_init() failed, then we
> have to skip removing the GPE handler and all that. I've now added
> a comment to explain this.
And you can't do this outside because outside has no knowledge what is
tb_power is. Am I right?
--
With Best Regards,
Andy Shevchenko
On Tue, Dec 20, 2016 at 03:44:31PM +0200, Andy Shevchenko wrote:
> On Tue, Dec 20, 2016 at 1:28 PM, Lukas Wunner <[email protected]> wrote:
> > On Mon, Dec 19, 2016 at 01:05:10AM +0200, Andy Shevchenko wrote:
> >> On Sat, Dec 17, 2016 at 4:39 PM, Lukas Wunner <[email protected]> wrote:
> >> > +#include <linux/delay.h>
> >> > +#include <linux/pci.h>
> >> > +#include <linux/pm_runtime.h>
> >> > +
> >> > +#include "power.h"
> >> > +
> >> > +#ifdef pr_fmt
> >> > +#undef pr_fmt
> >> > +#endif
> >> > +#define pr_fmt(fmt) KBUILD_MODNAME " %s: " fmt, dev_name(dev)
> >>
> >> Perhaps just define pr_fmt before any other include?
> >> We have such check where actually default pr_fmt is defined. No need
> >> to duplicate.
> >
> > If I put the '#define pr_fmt(fmt)' line above all includes, I get:
> >
> > include/linux/ratelimit.h: In function 'ratelimit_state_exit':
> > drivers/thunderbolt/power.c:93:49: error: implicit declaration of function 'dev_name'
> >
> > This is caused by 6b1d174b0c27 which was introduced this August.
> >
> > If I try to solve this by including <linux/device.h> before the
>
> Not before, but rather after?
>
> printk.h defines default pr_fmt. What you need is to define it before.
If I define pr_fmt and then include <linux/printk.h> I get the error
above because <linux/printk.h> seems to include <linux/ratelimit.h>
via some sub-includes, and this defines the static inline which
expands pr_fmt and fails because dev_name() is not yet defined.
> > So it seems there's no alternative to the '#undef pr_fmt'.
>
> Imagine how many drivers could suffer of this. So, something is wrong
> either in your code, in headers, or in both. But many drivers for now
> are using cusotm pr_fmt() in a way I described.
There are already 51 files in the tree using the '#undef pr_fmt' idiom,
so this is pretty common:
# /bin/ls | egrep -v '(\.git|debian)' | xargs egrep -r '#undef pr_fmt' | wc -l
51
However what I can do is drop the '#ifdef pr_fmt', it's unnecessary,
I think I cargo-culted this from one of these 51 files.
> >> > + /* prevent interrupts during system sleep transition */
> >> > + if (ACPI_FAILURE(acpi_disable_gpe(NULL, power->wake_gpe))) {
> >> > + pr_err("cannot disable wake GPE, resuming\n");
> >>
> >> dev_err?
> >
> > This is intentionally pr_err for cosmetic reasons. :-)
> >
> > With dev_err it would look like this in dmesg:
> >
> > pcieport 0000:05:00.0: cannot disable wake GPE, resuming
> >
> > With pr_err it looks like this:
> >
> > thunderbolt 0000:05:00.0: cannot disable wake GPE, resuming
> >
> > Thus, someone grepping for this error message will get a hint that
> > they have to look in drivers/thunderbolt/ rather than drivers/pci/pcie/.
> >
> > The code of this PM callback is located in the thunderbolt driver,
> > which binds to the NHI, 0000:07:00.0. But the PM callback is
> > assigned to the upstream bridge, which is the grandparent of the NHI,
> > 0000:05:00.0. The pr_fmt is crafted such that the KBUILD_MODNAME
> > ("thunderbolt") is logged rather than "pcieport". So I use pr_*
> > in the PM callbacks assigned to the upstream bridge and dev_*
> > in thunderbolt_power_init() / _fini() (which is executed in the
> > context of the NHI).
>
> I understand rationale, here my question: could pcie bridge driver
> replace name for the port which serves as thunderbolt?
The "pcieport" string is hardcoded in drivers/pci/pcie/portdrv_pci.c
and I'd like to avoid cluttering this file with some quirk which is
specific to this Mac Thunderbolt driver.
> >> > +void thunderbolt_power_fini(struct tb *tb)
> >> > +{
> >> > + struct device *nhi_dev = &tb->nhi->pdev->dev;
> >> > + struct device *upstream_dev = nhi_dev->parent->parent;
> >> > + struct tb_power *power = tb->power;
> >> > +
> >>
> >> > + if (!power)
> >> > + return;
> >>
> >> Would be the case?
> >
> > That would be the case if thunderbolt_power_init() failed, then we
> > have to skip removing the GPE handler and all that. I've now added
> > a comment to explain this.
>
> And you can't do this outside because outside has no knowledge what is
> tb_power is. Am I right?
thunderbolt_power_fini() is called from the ->remove hook of
thunderbolt.ko. I could in principle call it conditionally
but I think clarity improves if I perform the check here because
the conditions that might lead to tb->power being NULL are
visible immediately before this function in thunderbolt_power_init().
Thanks,
Lukas