2014-01-14 23:09:08

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 0/3] PM: Mechanism to avoid resuming runtime-suspended devices during system suspend

Hi,

The following experimental series of 3 patches implements a mechanism allowing
subsystems to avoid resuming runtime-suspended devices during system suspend.

As far as the PM core goes, it introduces a new flag, power.no_suspend, that
will be set by the core for devices which can stay suspended. The idea is that
subsystems should know which devices can stay suspended over system suspend
and to allow them to tell the core about that patch [1/3] changes the calling
convention of the device PM .prepare() callback so that it can return a positive
value on success to be interpreted as "this device has been runtime-suspended
and doesn't need to be resumed" information. If .prepare() returns a positive
number for certain device, the core will set power.no_suspend and will not run
suspend callbacks for device with that flag set going forward (during this
particular system suspend transition).

However, parents may generally need to be resumed so that the suspend of their
children can be carried out, so the PM core will clear power.no_suspend for
the parents of devices whose power.no_suspend is not set (unless those parents
have power.ignore_children set).

Patch [2/3] adds a new runtime PM helper function that subsystems can use to
check whether or not a given device is runtime-suspended when .prepare() is being
executed for it.

Patch [3/3] implements the subsystem part for the ACPI PM domain, because that
is relatively straightforward. If the general approach makes sense, I'll think
about doing the same for PCI.

Comments welcome!

Thanks,
Rafael


2014-01-14 23:08:40

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 2/3] PM / runtime: Routine for checking device status during system suspend

From: Rafael J. Wysocki <[email protected]>

Add a new helper routine, pm_runtime_enabled_and_suspended(), to
allow subsystems (or PM domains) to check the runtime PM status of
devices during system suspend (possibly to avoid resuming those
devices upfront at that time).

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/power/runtime.c | 28 ++++++++++++++++++++++++++++
include/linux/pm_runtime.h | 2 ++
2 files changed, 30 insertions(+)

Index: linux-pm/include/linux/pm_runtime.h
===================================================================
--- linux-pm.orig/include/linux/pm_runtime.h
+++ linux-pm/include/linux/pm_runtime.h
@@ -53,6 +53,7 @@ extern unsigned long pm_runtime_autosusp
extern void pm_runtime_update_max_time_suspended(struct device *dev,
s64 delta_ns);
extern void pm_runtime_set_memalloc_noio(struct device *dev, bool enable);
+extern bool pm_runtime_enabled_and_suspended(struct device *dev);

static inline bool pm_children_suspended(struct device *dev)
{
@@ -161,6 +162,7 @@ static inline unsigned long pm_runtime_a
struct device *dev) { return 0; }
static inline void pm_runtime_set_memalloc_noio(struct device *dev,
bool enable){}
+static inline bool pm_runtime_enabled_and_suspended(struct device *dev) { return false };

#endif /* !CONFIG_PM_RUNTIME */

Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -1194,6 +1194,34 @@ void pm_runtime_enable(struct device *de
EXPORT_SYMBOL_GPL(pm_runtime_enable);

/**
+ * pm_runtime_enabled_and_suspended - Check runtime PM status of a device.
+ * @dev: Device to handle.
+ *
+ * This routine is to be executed during system suspend only, after
+ * device_prepare() has been executed for @dev.
+ *
+ * Return false if runtime PM is disabled for the device. Otherwise, wait
+ * for pending transitions to complete and check the runtime PM status of the
+ * device after that. Return true if it is RPM_SUSPENDED.
+ */
+bool pm_runtime_enabled_and_suspended(struct device *dev)
+{
+ unsigned long flags;
+ bool ret;
+
+ spin_lock_irqsave(&dev->power.lock, flags);
+ if (dev->power.disable_depth) {
+ ret = false;
+ } else {
+ __pm_runtime_barrier(dev);
+ ret = pm_runtime_status_suspended(dev);
+ }
+ spin_unlock_irqrestore(&dev->power.lock, flags);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enabled_and_suspended);
+
+/**
* pm_runtime_forbid - Block runtime PM of a device.
* @dev: Device to handle.
*

2014-01-14 23:08:38

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 3/3] ACPI / PM: Avoid resuming devices in ACPI PM domain during system suspend

From: Rafael J. Wysocki <[email protected]>

Rework the ACPI PM domain's PM callbacks to avoid resuming devices
during system suspend in order to modify their wakeup settings if
that isn't necessary.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/acpi/device_pm.c | 31 +++++++++++++++++++++++++++----
1 file changed, 27 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/acpi/device_pm.c
===================================================================
--- linux-pm.orig/drivers/acpi/device_pm.c
+++ linux-pm/drivers/acpi/device_pm.c
@@ -812,6 +812,13 @@ int acpi_dev_runtime_resume(struct devic
struct acpi_device *adev = acpi_dev_pm_get_node(dev);
int error;

+ /*
+ * This only matters during system suspend, if acpi_subsys_prepare()
+ * has returned 1. In that case, we may be resumed through a child
+ * runtime resume, in which case our system suspend callbacks will need
+ * to be executed, so power.no_suspend has to be cleared.
+ */
+ dev->power.no_suspend = false;
if (!adev)
return 0;

@@ -912,12 +919,28 @@ EXPORT_SYMBOL_GPL(acpi_dev_resume_early)
*/
int acpi_subsys_prepare(struct device *dev)
{
+ struct acpi_device *adev = acpi_dev_pm_get_node(dev);
+ u32 target_state;
+ int error, state;
+
+ if (!adev || !pm_runtime_enabled_and_suspended(dev))
+ return pm_generic_prepare(dev);
+
+ target_state = acpi_target_system_state();
+ error = acpi_dev_pm_get_state(dev, adev, target_state, NULL, &state);
+ if (error || state != adev->power.state
+ || device_may_wakeup(dev) != !!adev->wakeup.prepare_count) {
+ pm_runtime_resume(dev);
+ return pm_generic_prepare(dev);
+ }
/*
- * Follow PCI and resume devices suspended at run time before running
- * their system suspend callbacks.
+ * If this is a wakeup device, wakeup power has been enabled already for
+ * it during the preceding runtime suspend. Caveat: "sleep state" is
+ * one of the _DSW arguments, but that shouldn't matter for the devices
+ * using acpi_general_pm_domain.
*/
- pm_runtime_resume(dev);
- return pm_generic_prepare(dev);
+ error = pm_generic_prepare(dev);
+ return error ? error : 1;
}
EXPORT_SYMBOL_GPL(acpi_subsys_prepare);

2014-01-14 23:09:10

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 1/3] PM / sleep: Flag to avoid executing suspend callbacks for devices

From: Rafael J. Wysocki <[email protected]>

Currently, some subsystems (e.g. PCI and the ACPI PM domain) have to
resume all runtime-suspended devices during system suspend, mostly
because those devices may need to be reprogrammed due to different
wakeup settings for system sleep and runtime PM. However, at least
in some cases that isn't really necessary, because the wakeup
settings aren't really different.

The idea here is that subsystems should know whether or not it is
necessary to reprogram a given device during system suspend and they
should be able to tell the PM core about that. For this reason,
modify the PM core so that if the .prepare() callback returns a
positive value for certain device, the core will set a new
power.no_suspend flag for it. Then, if that flag is set, the core
will skip all of the subsequent suspend callbacks for that device.

However, since parents may need to be resumed so that their children
can be reprogrammed, make the PM core clear power.no_suspend for
devices that don't have power.ignore_children set and whose
children don't have power.no_suspend set.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/power/main.c | 38 ++++++++++++++++++++++++++------------
include/linux/pm.h | 1 +
2 files changed, 27 insertions(+), 12 deletions(-)

Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -918,7 +918,7 @@ static int device_suspend_noirq(struct d
pm_callback_t callback = NULL;
char *info = NULL;

- if (dev->power.syscore)
+ if (dev->power.syscore || dev->power.no_suspend)
return 0;

if (dev->pm_domain) {
@@ -1006,7 +1006,7 @@ static int device_suspend_late(struct de

__pm_runtime_disable(dev, false);

- if (dev->power.syscore)
+ if (dev->power.syscore || dev->power.no_suspend)
return 0;

if (dev->pm_domain) {
@@ -1143,8 +1143,10 @@ static int __device_suspend(struct devic
* for it, this is equivalent to the device signaling wakeup, so the
* system suspend operation should be aborted.
*/
- if (pm_runtime_barrier(dev) && device_may_wakeup(dev))
+ if (pm_runtime_barrier(dev) && device_may_wakeup(dev)) {
pm_wakeup_event(dev, 0);
+ dev->power.no_suspend = false;
+ }

if (pm_wakeup_pending()) {
async_error = -EBUSY;
@@ -1157,6 +1159,9 @@ static int __device_suspend(struct devic
dpm_watchdog_set(&wd, dev);
device_lock(dev);

+ if (dev->power.no_suspend)
+ goto End;
+
if (dev->pm_domain) {
info = "power domain ";
callback = pm_op(&dev->pm_domain->ops, state);
@@ -1205,9 +1210,13 @@ static int __device_suspend(struct devic
End:
if (!error) {
dev->power.is_suspended = true;
- if (dev->power.wakeup_path
- && dev->parent && !dev->parent->power.ignore_children)
- dev->parent->power.wakeup_path = true;
+ if (dev->parent && !dev->parent->power.ignore_children) {
+ if (dev->power.wakeup_path)
+ dev->parent->power.wakeup_path = true;
+
+ if (!dev->power.no_suspend)
+ dev->parent->power.no_suspend = false;
+ }
}

device_unlock(dev);
@@ -1307,7 +1316,7 @@ static int device_prepare(struct device
{
int (*callback)(struct device *) = NULL;
char *info = NULL;
- int error = 0;
+ int ret = 0;

if (dev->power.syscore)
return 0;
@@ -1323,6 +1332,7 @@ static int device_prepare(struct device
device_lock(dev);

dev->power.wakeup_path = device_may_wakeup(dev);
+ dev->power.no_suspend = false;

if (dev->pm_domain) {
info = "preparing power domain ";
@@ -1344,16 +1354,20 @@ static int device_prepare(struct device
}

if (callback) {
- error = callback(dev);
- suspend_report_result(callback, error);
+ ret = callback(dev);
+ suspend_report_result(callback, ret);
}

device_unlock(dev);

- if (error)
+ if (ret < 0) {
pm_runtime_put(dev);
+ } else if (ret > 0) {
+ dev->power.no_suspend = true;
+ ret = 0;
+ }

- return error;
+ return ret;
}

/**
@@ -1422,7 +1436,7 @@ EXPORT_SYMBOL_GPL(dpm_suspend_start);

void __suspend_report_result(const char *function, void *fn, int ret)
{
- if (ret)
+ if (ret < 0)
printk(KERN_ERR "%s(): %pF returns %d\n", function, fn, ret);
}
EXPORT_SYMBOL_GPL(__suspend_report_result);
Index: linux-pm/include/linux/pm.h
===================================================================
--- linux-pm.orig/include/linux/pm.h
+++ linux-pm/include/linux/pm.h
@@ -544,6 +544,7 @@ struct dev_pm_info {
bool is_suspended:1; /* Ditto */
bool ignore_children:1;
bool early_init:1; /* Owned by the PM core */
+ bool no_suspend:1;
spinlock_t lock;
#ifdef CONFIG_PM_SLEEP
struct list_head entry;

2014-01-15 13:43:08

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Update][RFC][PATCH 3/3] ACPI / PM: Avoid resuming devices in ACPI PM domain during system suspend

From: Rafael J. Wysocki <[email protected]>
Subject: ACPI / PM: Avoid resuming devices in ACPI PM domain during system suspend

Rework the ACPI PM domain's PM callbacks to avoid resuming devices
during system suspend in order to modify their wakeup settings if
that isn't necessary.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---

Changes from the previous version:

- I don't think it is really necessary to clear power.no_suspend from
acpi_dev_runtime_resume(), because __device_suspend() will run for
children before it runs for the parent, so it is sufficient to clear
power.no_suspend for the parent from there.

At the same time, since the parent can only be runtime-suspended if the
children are runtime-suspended, the children with power.no_suspend clear
have to be runtime-resumed before __device_suspend() is executed for
them and that will trigger runtime resume of the parent.

- To avoid problems related to possible differences between runtime resume
and system resume driver callbacks, use pm_runtime_resume() to resume
devices whose power.no_suspend was set during the suspend we're resuming
from.

Thanks,
Rafael

---
drivers/acpi/device_pm.c | 45 ++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 40 insertions(+), 5 deletions(-)

Index: linux-pm/drivers/acpi/device_pm.c
===================================================================
--- linux-pm.orig/drivers/acpi/device_pm.c
+++ linux-pm/drivers/acpi/device_pm.c
@@ -912,12 +912,28 @@ EXPORT_SYMBOL_GPL(acpi_dev_resume_early)
*/
int acpi_subsys_prepare(struct device *dev)
{
+ struct acpi_device *adev = acpi_dev_pm_get_node(dev);
+ u32 target_state;
+ int error, state;
+
+ if (!adev || !pm_runtime_enabled_and_suspended(dev))
+ return pm_generic_prepare(dev);
+
+ target_state = acpi_target_system_state();
+ error = acpi_dev_pm_get_state(dev, adev, target_state, NULL, &state);
+ if (error || state != adev->power.state
+ || device_may_wakeup(dev) != !!adev->wakeup.prepare_count) {
+ pm_runtime_resume(dev);
+ return pm_generic_prepare(dev);
+ }
/*
- * Follow PCI and resume devices suspended at run time before running
- * their system suspend callbacks.
+ * If this is a wakeup device, wakeup power has been enabled already for
+ * it during the preceding runtime suspend. Caveat: "sleep state" is
+ * one of the _DSW arguments, but that shouldn't matter for the devices
+ * using acpi_general_pm_domain.
*/
- pm_runtime_resume(dev);
- return pm_generic_prepare(dev);
+ error = pm_generic_prepare(dev);
+ return error ? error : 1;
}
EXPORT_SYMBOL_GPL(acpi_subsys_prepare);

@@ -945,10 +961,27 @@ EXPORT_SYMBOL_GPL(acpi_subsys_suspend_la
*/
int acpi_subsys_resume_early(struct device *dev)
{
- int ret = acpi_dev_resume_early(dev);
+ int ret;
+
+ if (dev->power.no_suspend)
+ return 0;
+
+ ret = acpi_dev_resume_early(dev);
return ret ? ret : pm_generic_resume_early(dev);
}
EXPORT_SYMBOL_GPL(acpi_subsys_resume_early);
+
+/**
+ * acpi_subsys_resume - Resume device using ACPI (if not resumed before).
+ * @dev: Device to resume.
+ *
+ * If power.no_suspend is set for @dev, run pm_runtime_resume() for it.
+ */
+int acpi_subsys_resume(struct device *dev)
+{
+ return dev->power.no_suspend ? pm_runtime_resume(dev) : 0;
+}
+EXPORT_SYMBOL_GPL(acpi_subsys_resume);
#endif /* CONFIG_PM_SLEEP */

static struct dev_pm_domain acpi_general_pm_domain = {
@@ -961,8 +994,10 @@ static struct dev_pm_domain acpi_general
.prepare = acpi_subsys_prepare,
.suspend_late = acpi_subsys_suspend_late,
.resume_early = acpi_subsys_resume_early,
+ .resume = acpi_subsys_resume,
.poweroff_late = acpi_subsys_suspend_late,
.restore_early = acpi_subsys_resume_early,
+ .restore = acpi_subsys_resume,
#endif
},
};

2014-01-16 13:26:20

by Mika Westerberg

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/3] PM / runtime: Routine for checking device status during system suspend

On Wed, Jan 15, 2014 at 12:14:46AM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Add a new helper routine, pm_runtime_enabled_and_suspended(), to
> allow subsystems (or PM domains) to check the runtime PM status of
> devices during system suspend (possibly to avoid resuming those
> devices upfront at that time).
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> drivers/base/power/runtime.c | 28 ++++++++++++++++++++++++++++
> include/linux/pm_runtime.h | 2 ++
> 2 files changed, 30 insertions(+)
>
> Index: linux-pm/include/linux/pm_runtime.h
> ===================================================================
> --- linux-pm.orig/include/linux/pm_runtime.h
> +++ linux-pm/include/linux/pm_runtime.h
> @@ -53,6 +53,7 @@ extern unsigned long pm_runtime_autosusp
> extern void pm_runtime_update_max_time_suspended(struct device *dev,
> s64 delta_ns);
> extern void pm_runtime_set_memalloc_noio(struct device *dev, bool enable);
> +extern bool pm_runtime_enabled_and_suspended(struct device *dev);
>
> static inline bool pm_children_suspended(struct device *dev)
> {
> @@ -161,6 +162,7 @@ static inline unsigned long pm_runtime_a
> struct device *dev) { return 0; }
> static inline void pm_runtime_set_memalloc_noio(struct device *dev,
> bool enable){}
> +static inline bool pm_runtime_enabled_and_suspended(struct device *dev) { return false };

The above probably doesn't compile if !CONFIG_PM_RUNTIME because of the
misplaced semicolon.

2014-01-16 15:53:10

by Rafael J. Wysocki

[permalink] [raw]
Subject: [Update][RFC][PATCH 2/3] PM / runtime: Routine for checking device status during system suspend

On Thursday, January 16, 2014 03:32:51 PM Mika Westerberg wrote:
> On Wed, Jan 15, 2014 at 12:14:46AM +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > Add a new helper routine, pm_runtime_enabled_and_suspended(), to
> > allow subsystems (or PM domains) to check the runtime PM status of
> > devices during system suspend (possibly to avoid resuming those
> > devices upfront at that time).
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
> > drivers/base/power/runtime.c | 28 ++++++++++++++++++++++++++++
> > include/linux/pm_runtime.h | 2 ++
> > 2 files changed, 30 insertions(+)
> >
> > Index: linux-pm/include/linux/pm_runtime.h
> > ===================================================================
> > --- linux-pm.orig/include/linux/pm_runtime.h
> > +++ linux-pm/include/linux/pm_runtime.h
> > @@ -53,6 +53,7 @@ extern unsigned long pm_runtime_autosusp
> > extern void pm_runtime_update_max_time_suspended(struct device *dev,
> > s64 delta_ns);
> > extern void pm_runtime_set_memalloc_noio(struct device *dev, bool enable);
> > +extern bool pm_runtime_enabled_and_suspended(struct device *dev);
> >
> > static inline bool pm_children_suspended(struct device *dev)
> > {
> > @@ -161,6 +162,7 @@ static inline unsigned long pm_runtime_a
> > struct device *dev) { return 0; }
> > static inline void pm_runtime_set_memalloc_noio(struct device *dev,
> > bool enable){}
> > +static inline bool pm_runtime_enabled_and_suspended(struct device *dev) { return false };
>
> The above probably doesn't compile if !CONFIG_PM_RUNTIME because of the
> misplaced semicolon.

Of well, indeed, thanks!

Update follows.

---
From: Rafael J. Wysocki <[email protected]>
Subject: PM / runtime: Routine for checking device status during system suspend

Add a new helper routine, pm_runtime_enabled_and_suspended(), to
allow subsystems (or PM domains) to check the runtime PM status of
devices during system suspend (possibly to avoid resuming those
devices upfront at that time).

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/power/runtime.c | 28 ++++++++++++++++++++++++++++
include/linux/pm_runtime.h | 2 ++
2 files changed, 30 insertions(+)

Index: linux-pm/include/linux/pm_runtime.h
===================================================================
--- linux-pm.orig/include/linux/pm_runtime.h
+++ linux-pm/include/linux/pm_runtime.h
@@ -53,6 +53,7 @@ extern unsigned long pm_runtime_autosusp
extern void pm_runtime_update_max_time_suspended(struct device *dev,
s64 delta_ns);
extern void pm_runtime_set_memalloc_noio(struct device *dev, bool enable);
+extern bool pm_runtime_enabled_and_suspended(struct device *dev);

static inline bool pm_children_suspended(struct device *dev)
{
@@ -161,6 +162,7 @@ static inline unsigned long pm_runtime_a
struct device *dev) { return 0; }
static inline void pm_runtime_set_memalloc_noio(struct device *dev,
bool enable){}
+static inline bool pm_runtime_enabled_and_suspended(struct device *dev) { return false; }

#endif /* !CONFIG_PM_RUNTIME */

Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -1194,6 +1194,34 @@ void pm_runtime_enable(struct device *de
EXPORT_SYMBOL_GPL(pm_runtime_enable);

/**
+ * pm_runtime_enabled_and_suspended - Check runtime PM status of a device.
+ * @dev: Device to handle.
+ *
+ * This routine is to be executed during system suspend only, after
+ * device_prepare() has been executed for @dev.
+ *
+ * Return false if runtime PM is disabled for the device. Otherwise, wait
+ * for pending transitions to complete and check the runtime PM status of the
+ * device after that. Return true if it is RPM_SUSPENDED.
+ */
+bool pm_runtime_enabled_and_suspended(struct device *dev)
+{
+ unsigned long flags;
+ bool ret;
+
+ spin_lock_irqsave(&dev->power.lock, flags);
+ if (dev->power.disable_depth) {
+ ret = false;
+ } else {
+ __pm_runtime_barrier(dev);
+ ret = pm_runtime_status_suspended(dev);
+ }
+ spin_unlock_irqrestore(&dev->power.lock, flags);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enabled_and_suspended);
+
+/**
* pm_runtime_forbid - Block runtime PM of a device.
* @dev: Device to handle.
*

2014-02-16 23:38:42

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH 2/3][Resend] PM / runtime: Routine for checking device status during system suspend

From: Rafael J. Wysocki <[email protected]>

Add a new helper routine, pm_runtime_enabled_and_suspended(), to
allow subsystems (or PM domains) to check the runtime PM status of
devices during system suspend (possibly to avoid resuming those
devices upfront at that time).

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/power/runtime.c | 28 ++++++++++++++++++++++++++++
include/linux/pm_runtime.h | 2 ++
2 files changed, 30 insertions(+)

Index: linux-pm/include/linux/pm_runtime.h
===================================================================
--- linux-pm.orig/include/linux/pm_runtime.h
+++ linux-pm/include/linux/pm_runtime.h
@@ -53,6 +53,7 @@ extern unsigned long pm_runtime_autosusp
extern void pm_runtime_update_max_time_suspended(struct device *dev,
s64 delta_ns);
extern void pm_runtime_set_memalloc_noio(struct device *dev, bool enable);
+extern bool pm_runtime_enabled_and_suspended(struct device *dev);

static inline bool pm_children_suspended(struct device *dev)
{
@@ -161,6 +162,7 @@ static inline unsigned long pm_runtime_a
struct device *dev) { return 0; }
static inline void pm_runtime_set_memalloc_noio(struct device *dev,
bool enable){}
+static inline bool pm_runtime_enabled_and_suspended(struct device *dev) { return false; }

#endif /* !CONFIG_PM_RUNTIME */

Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -1194,6 +1194,34 @@ void pm_runtime_enable(struct device *de
EXPORT_SYMBOL_GPL(pm_runtime_enable);

/**
+ * pm_runtime_enabled_and_suspended - Check runtime PM status of a device.
+ * @dev: Device to handle.
+ *
+ * This routine is to be executed during system suspend only, after
+ * device_prepare() has been executed for @dev.
+ *
+ * Return false if runtime PM is disabled for the device. Otherwise, wait
+ * for pending transitions to complete and check the runtime PM status of the
+ * device after that. Return true if it is RPM_SUSPENDED.
+ */
+bool pm_runtime_enabled_and_suspended(struct device *dev)
+{
+ unsigned long flags;
+ bool ret;
+
+ spin_lock_irqsave(&dev->power.lock, flags);
+ if (dev->power.disable_depth) {
+ ret = false;
+ } else {
+ __pm_runtime_barrier(dev);
+ ret = pm_runtime_status_suspended(dev);
+ }
+ spin_unlock_irqrestore(&dev->power.lock, flags);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enabled_and_suspended);
+
+/**
* pm_runtime_forbid - Block runtime PM of a device.
* @dev: Device to handle.
*

2014-02-16 23:38:41

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH 3/3] ACPI / PM: Avoid resuming devices in ACPI PM domain during system suspend

From: Rafael J. Wysocki <[email protected]>

Rework the ACPI PM domain's PM callbacks to avoid resuming devices
during system suspend in order to modify their wakeup settings if
that isn't necessary.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/acpi/device_pm.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/acpi/device_pm.c
===================================================================
--- linux-pm.orig/drivers/acpi/device_pm.c
+++ linux-pm/drivers/acpi/device_pm.c
@@ -900,12 +900,28 @@ EXPORT_SYMBOL_GPL(acpi_dev_resume_early)
*/
int acpi_subsys_prepare(struct device *dev)
{
+ struct acpi_device *adev = ACPI_COMPANION(dev);
+ u32 target_state;
+ int error, state;
+
+ if (!adev || !pm_runtime_enabled_and_suspended(dev))
+ return pm_generic_prepare(dev);
+
+ target_state = acpi_target_system_state();
+ error = acpi_dev_pm_get_state(dev, adev, target_state, NULL, &state);
+ if (error || state != adev->power.state
+ || device_may_wakeup(dev) != !!adev->wakeup.prepare_count) {
+ pm_runtime_resume(dev);
+ return pm_generic_prepare(dev);
+ }
/*
- * Follow PCI and resume devices suspended at run time before running
- * their system suspend callbacks.
+ * If this is a wakeup device, wakeup power has been enabled already for
+ * it during the preceding runtime suspend. Caveat: "sleep state" is
+ * one of the _DSW arguments, but that shouldn't matter for the devices
+ * using acpi_general_pm_domain.
*/
- pm_runtime_resume(dev);
- return pm_generic_prepare(dev);
+ error = pm_generic_prepare(dev);
+ return error ? error : 1;
}
EXPORT_SYMBOL_GPL(acpi_subsys_prepare);

2014-02-16 23:39:14

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 0/3] PM: Mechanism to avoid resuming runtime-suspended devices during system suspend

On Wednesday, January 15, 2014 12:12:29 AM Rafael J. Wysocki wrote:
> Hi,
>
> The following experimental series of 3 patches implements a mechanism allowing
> subsystems to avoid resuming runtime-suspended devices during system suspend.
>
> As far as the PM core goes, it introduces a new flag, power.no_suspend, that
> will be set by the core for devices which can stay suspended. The idea is that
> subsystems should know which devices can stay suspended over system suspend
> and to allow them to tell the core about that patch [1/3] changes the calling
> convention of the device PM .prepare() callback so that it can return a positive
> value on success to be interpreted as "this device has been runtime-suspended
> and doesn't need to be resumed" information. If .prepare() returns a positive
> number for certain device, the core will set power.no_suspend and will not run
> suspend callbacks for device with that flag set going forward (during this
> particular system suspend transition).
>
> However, parents may generally need to be resumed so that the suspend of their
> children can be carried out, so the PM core will clear power.no_suspend for
> the parents of devices whose power.no_suspend is not set (unless those parents
> have power.ignore_children set).
>
> Patch [2/3] adds a new runtime PM helper function that subsystems can use to
> check whether or not a given device is runtime-suspended when .prepare() is being
> executed for it.
>
> Patch [3/3] implements the subsystem part for the ACPI PM domain, because that
> is relatively straightforward. If the general approach makes sense, I'll think
> about doing the same for PCI.

I have a new version of this.

The new patch [1/3] goes farther than the previous one, because I realized that
all subsystems returning values greater from zero from their .prepare()
callbacks will want to skip .resume_noirq() and .resume_early() for the
"fast suspended" devices and all of them will likely want to run a
pm_request_resume() for those devices in their .resume(). So, if all of
them would do that anyway, it's better if the core does that for them.
Of course, that simplifies patch [3/3] quite a bit. Patch [2/3] is the
same as before.

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2014-02-16 23:39:37

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH 1/3] PM / sleep: New flag to speed up suspend-resume of suspended devices

From: Rafael J. Wysocki <[email protected]>

Currently, some subsystems (e.g. PCI and the ACPI PM domain) have to
resume all runtime-suspended devices during system suspend, mostly
because those devices may need to be reprogrammed due to different
wakeup settings for system sleep and for runtime PM. However, at
least in some cases, that isn't really necessary, because the wakeup
settings may not be really different.

The idea here is that subsystems should know whether or not it is
necessary to reprogram a given device during system suspend and they
should be able to tell the PM core about that. For this reason,
modify the PM core so that if the .prepare() callback returns a
positive value for certain device, the core will set a new
power.fast_suspend flag for it. Then, if that flag is set, the core
will skip all of the subsequent suspend callbacks for that device.
It also will skip all of the system resume callbacks for the device
during the subsequent system resume and pm_request_resume() will be
executed to trigger a runtime PM resume of the device after the
system device resume sequence has been finished.

However, since parents may need to be resumed so that their children
can be reprogrammed, make the PM core clear power.fast_suspend for
devices whose children don't have power.fast_suspend set (the
power.ignore_children flag doesn't matter here, because a parent
whose children are normally ignored for runtime PM may still need to
be accessible for their children to be prepare for system suspend
properly).

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/power/main.c | 49 +++++++++++++++++++++++++++++++---------------
include/linux/pm.h | 1
2 files changed, 35 insertions(+), 15 deletions(-)

Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -478,7 +478,7 @@ static int device_resume_noirq(struct de
TRACE_DEVICE(dev);
TRACE_RESUME(0);

- if (dev->power.syscore)
+ if (dev->power.syscore || dev->power.fast_suspend)
goto Out;

if (!dev->power.is_noirq_suspended)
@@ -599,7 +599,7 @@ static int device_resume_early(struct de
TRACE_DEVICE(dev);
TRACE_RESUME(0);

- if (dev->power.syscore)
+ if (dev->power.syscore || dev->power.fast_suspend)
goto Out;

if (!dev->power.is_late_suspended)
@@ -724,6 +724,11 @@ static int device_resume(struct device *
if (dev->power.syscore)
goto Complete;

+ if (dev->power.fast_suspend) {
+ pm_request_resume(dev);
+ goto Complete;
+ }
+
dpm_wait(dev->parent, async);
dpm_watchdog_set(&wd, dev);
device_lock(dev);
@@ -994,7 +999,7 @@ static int __device_suspend_noirq(struct
return async_error;
}

- if (dev->power.syscore)
+ if (dev->power.syscore || dev->power.fast_suspend)
return 0;

if (dev->pm_domain) {
@@ -1127,7 +1132,7 @@ static int __device_suspend_late(struct
return async_error;
}

- if (dev->power.syscore)
+ if (dev->power.syscore || dev->power.fast_suspend)
return 0;

if (dev->pm_domain) {
@@ -1296,8 +1301,10 @@ static int __device_suspend(struct devic
* for it, this is equivalent to the device signaling wakeup, so the
* system suspend operation should be aborted.
*/
- if (pm_runtime_barrier(dev) && device_may_wakeup(dev))
+ if (pm_runtime_barrier(dev) && device_may_wakeup(dev)) {
pm_wakeup_event(dev, 0);
+ dev->power.fast_suspend = false;
+ }

if (pm_wakeup_pending()) {
async_error = -EBUSY;
@@ -1310,6 +1317,9 @@ static int __device_suspend(struct devic
dpm_watchdog_set(&wd, dev);
device_lock(dev);

+ if (dev->power.fast_suspend)
+ goto End;
+
if (dev->pm_domain) {
info = "power domain ";
callback = pm_op(&dev->pm_domain->ops, state);
@@ -1358,9 +1368,14 @@ static int __device_suspend(struct devic
End:
if (!error) {
dev->power.is_suspended = true;
- if (dev->power.wakeup_path
- && dev->parent && !dev->parent->power.ignore_children)
- dev->parent->power.wakeup_path = true;
+ if (dev->parent) {
+ if (!dev->parent->power.ignore_children
+ && dev->power.wakeup_path)
+ dev->parent->power.wakeup_path = true;
+
+ if (!dev->power.fast_suspend)
+ dev->parent->power.fast_suspend = false;
+ }
}

device_unlock(dev);
@@ -1460,7 +1475,7 @@ static int device_prepare(struct device
{
int (*callback)(struct device *) = NULL;
char *info = NULL;
- int error = 0;
+ int ret = 0;

if (dev->power.syscore)
return 0;
@@ -1476,6 +1491,7 @@ static int device_prepare(struct device
device_lock(dev);

dev->power.wakeup_path = device_may_wakeup(dev);
+ dev->power.fast_suspend = false;

if (dev->pm_domain) {
info = "preparing power domain ";
@@ -1497,16 +1513,19 @@ static int device_prepare(struct device
}

if (callback) {
- error = callback(dev);
- suspend_report_result(callback, error);
+ ret = callback(dev);
+ suspend_report_result(callback, ret);
}

device_unlock(dev);

- if (error)
+ if (ret < 0) {
pm_runtime_put(dev);
-
- return error;
+ return ret;
+ } else if (ret > 0) {
+ dev->power.fast_suspend = true;
+ }
+ return 0;
}

/**
@@ -1575,7 +1594,7 @@ EXPORT_SYMBOL_GPL(dpm_suspend_start);

void __suspend_report_result(const char *function, void *fn, int ret)
{
- if (ret)
+ if (ret < 0)
printk(KERN_ERR "%s(): %pF returns %d\n", function, fn, ret);
}
EXPORT_SYMBOL_GPL(__suspend_report_result);
Index: linux-pm/include/linux/pm.h
===================================================================
--- linux-pm.orig/include/linux/pm.h
+++ linux-pm/include/linux/pm.h
@@ -546,6 +546,7 @@ struct dev_pm_info {
bool is_late_suspended:1;
bool ignore_children:1;
bool early_init:1; /* Owned by the PM core */
+ bool fast_suspend:1; /* Owned by the PM core */
spinlock_t lock;
#ifdef CONFIG_PM_SLEEP
struct list_head entry;

2014-02-18 12:59:39

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH 1/3] PM / sleep: New flag to speed up suspend-resume of suspended devices

On 17 February 2014 00:50, Rafael J. Wysocki <[email protected]> wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Currently, some subsystems (e.g. PCI and the ACPI PM domain) have to
> resume all runtime-suspended devices during system suspend, mostly
> because those devices may need to be reprogrammed due to different
> wakeup settings for system sleep and for runtime PM. However, at
> least in some cases, that isn't really necessary, because the wakeup
> settings may not be really different.
>
> The idea here is that subsystems should know whether or not it is
> necessary to reprogram a given device during system suspend and they
> should be able to tell the PM core about that. For this reason,
> modify the PM core so that if the .prepare() callback returns a
> positive value for certain device, the core will set a new
> power.fast_suspend flag for it. Then, if that flag is set, the core
> will skip all of the subsequent suspend callbacks for that device.
> It also will skip all of the system resume callbacks for the device
> during the subsequent system resume and pm_request_resume() will be
> executed to trigger a runtime PM resume of the device after the
> system device resume sequence has been finished.
>
> However, since parents may need to be resumed so that their children
> can be reprogrammed, make the PM core clear power.fast_suspend for
> devices whose children don't have power.fast_suspend set (the
> power.ignore_children flag doesn't matter here, because a parent
> whose children are normally ignored for runtime PM may still need to
> be accessible for their children to be prepare for system suspend
> properly).
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> drivers/base/power/main.c | 49 +++++++++++++++++++++++++++++++---------------
> include/linux/pm.h | 1
> 2 files changed, 35 insertions(+), 15 deletions(-)
>
> Index: linux-pm/drivers/base/power/main.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/main.c
> +++ linux-pm/drivers/base/power/main.c
> @@ -478,7 +478,7 @@ static int device_resume_noirq(struct de
> TRACE_DEVICE(dev);
> TRACE_RESUME(0);
>
> - if (dev->power.syscore)
> + if (dev->power.syscore || dev->power.fast_suspend)
> goto Out;
>
> if (!dev->power.is_noirq_suspended)
> @@ -599,7 +599,7 @@ static int device_resume_early(struct de
> TRACE_DEVICE(dev);
> TRACE_RESUME(0);
>
> - if (dev->power.syscore)
> + if (dev->power.syscore || dev->power.fast_suspend)
> goto Out;
>
> if (!dev->power.is_late_suspended)
> @@ -724,6 +724,11 @@ static int device_resume(struct device *
> if (dev->power.syscore)
> goto Complete;
>
> + if (dev->power.fast_suspend) {
> + pm_request_resume(dev);
> + goto Complete;

So, this will trigger an async request to runtime resume the device.

At device_complete(), we do pm_runtime_put() to return the reference
we fetched at device_prepare(), thus likely causing the device to be
runtime suspended again. Is that the expected sequence you need? Could
you elaborate why?

Kind regards
Ulf Hansson

> + }
> +
> dpm_wait(dev->parent, async);
> dpm_watchdog_set(&wd, dev);
> device_lock(dev);
> @@ -994,7 +999,7 @@ static int __device_suspend_noirq(struct
> return async_error;
> }
>
> - if (dev->power.syscore)
> + if (dev->power.syscore || dev->power.fast_suspend)
> return 0;
>
> if (dev->pm_domain) {
> @@ -1127,7 +1132,7 @@ static int __device_suspend_late(struct
> return async_error;
> }
>
> - if (dev->power.syscore)
> + if (dev->power.syscore || dev->power.fast_suspend)
> return 0;
>
> if (dev->pm_domain) {
> @@ -1296,8 +1301,10 @@ static int __device_suspend(struct devic
> * for it, this is equivalent to the device signaling wakeup, so the
> * system suspend operation should be aborted.
> */
> - if (pm_runtime_barrier(dev) && device_may_wakeup(dev))
> + if (pm_runtime_barrier(dev) && device_may_wakeup(dev)) {
> pm_wakeup_event(dev, 0);
> + dev->power.fast_suspend = false;
> + }
>
> if (pm_wakeup_pending()) {
> async_error = -EBUSY;
> @@ -1310,6 +1317,9 @@ static int __device_suspend(struct devic
> dpm_watchdog_set(&wd, dev);
> device_lock(dev);
>
> + if (dev->power.fast_suspend)
> + goto End;
> +
> if (dev->pm_domain) {
> info = "power domain ";
> callback = pm_op(&dev->pm_domain->ops, state);
> @@ -1358,9 +1368,14 @@ static int __device_suspend(struct devic
> End:
> if (!error) {
> dev->power.is_suspended = true;
> - if (dev->power.wakeup_path
> - && dev->parent && !dev->parent->power.ignore_children)
> - dev->parent->power.wakeup_path = true;
> + if (dev->parent) {
> + if (!dev->parent->power.ignore_children
> + && dev->power.wakeup_path)
> + dev->parent->power.wakeup_path = true;
> +
> + if (!dev->power.fast_suspend)
> + dev->parent->power.fast_suspend = false;
> + }
> }
>
> device_unlock(dev);
> @@ -1460,7 +1475,7 @@ static int device_prepare(struct device
> {
> int (*callback)(struct device *) = NULL;
> char *info = NULL;
> - int error = 0;
> + int ret = 0;
>
> if (dev->power.syscore)
> return 0;
> @@ -1476,6 +1491,7 @@ static int device_prepare(struct device
> device_lock(dev);
>
> dev->power.wakeup_path = device_may_wakeup(dev);
> + dev->power.fast_suspend = false;
>
> if (dev->pm_domain) {
> info = "preparing power domain ";
> @@ -1497,16 +1513,19 @@ static int device_prepare(struct device
> }
>
> if (callback) {
> - error = callback(dev);
> - suspend_report_result(callback, error);
> + ret = callback(dev);
> + suspend_report_result(callback, ret);
> }
>
> device_unlock(dev);
>
> - if (error)
> + if (ret < 0) {
> pm_runtime_put(dev);
> -
> - return error;
> + return ret;
> + } else if (ret > 0) {
> + dev->power.fast_suspend = true;
> + }
> + return 0;
> }
>
> /**
> @@ -1575,7 +1594,7 @@ EXPORT_SYMBOL_GPL(dpm_suspend_start);
>
> void __suspend_report_result(const char *function, void *fn, int ret)
> {
> - if (ret)
> + if (ret < 0)
> printk(KERN_ERR "%s(): %pF returns %d\n", function, fn, ret);
> }
> EXPORT_SYMBOL_GPL(__suspend_report_result);
> Index: linux-pm/include/linux/pm.h
> ===================================================================
> --- linux-pm.orig/include/linux/pm.h
> +++ linux-pm/include/linux/pm.h
> @@ -546,6 +546,7 @@ struct dev_pm_info {
> bool is_late_suspended:1;
> bool ignore_children:1;
> bool early_init:1; /* Owned by the PM core */
> + bool fast_suspend:1; /* Owned by the PM core */
> spinlock_t lock;
> #ifdef CONFIG_PM_SLEEP
> struct list_head entry;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2014-02-18 13:10:54

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/3] PM / sleep: New flag to speed up suspend-resume of suspended devices

On Tuesday, February 18, 2014 01:59:36 PM Ulf Hansson wrote:
> On 17 February 2014 00:50, Rafael J. Wysocki <[email protected]> wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > Currently, some subsystems (e.g. PCI and the ACPI PM domain) have to
> > resume all runtime-suspended devices during system suspend, mostly
> > because those devices may need to be reprogrammed due to different
> > wakeup settings for system sleep and for runtime PM. However, at
> > least in some cases, that isn't really necessary, because the wakeup
> > settings may not be really different.
> >
> > The idea here is that subsystems should know whether or not it is
> > necessary to reprogram a given device during system suspend and they
> > should be able to tell the PM core about that. For this reason,
> > modify the PM core so that if the .prepare() callback returns a
> > positive value for certain device, the core will set a new
> > power.fast_suspend flag for it. Then, if that flag is set, the core
> > will skip all of the subsequent suspend callbacks for that device.
> > It also will skip all of the system resume callbacks for the device
> > during the subsequent system resume and pm_request_resume() will be
> > executed to trigger a runtime PM resume of the device after the
> > system device resume sequence has been finished.
> >
> > However, since parents may need to be resumed so that their children
> > can be reprogrammed, make the PM core clear power.fast_suspend for
> > devices whose children don't have power.fast_suspend set (the
> > power.ignore_children flag doesn't matter here, because a parent
> > whose children are normally ignored for runtime PM may still need to
> > be accessible for their children to be prepare for system suspend
> > properly).
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
> > drivers/base/power/main.c | 49 +++++++++++++++++++++++++++++++---------------
> > include/linux/pm.h | 1
> > 2 files changed, 35 insertions(+), 15 deletions(-)
> >
> > Index: linux-pm/drivers/base/power/main.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/power/main.c
> > +++ linux-pm/drivers/base/power/main.c
> > @@ -478,7 +478,7 @@ static int device_resume_noirq(struct de
> > TRACE_DEVICE(dev);
> > TRACE_RESUME(0);
> >
> > - if (dev->power.syscore)
> > + if (dev->power.syscore || dev->power.fast_suspend)
> > goto Out;
> >
> > if (!dev->power.is_noirq_suspended)
> > @@ -599,7 +599,7 @@ static int device_resume_early(struct de
> > TRACE_DEVICE(dev);
> > TRACE_RESUME(0);
> >
> > - if (dev->power.syscore)
> > + if (dev->power.syscore || dev->power.fast_suspend)
> > goto Out;
> >
> > if (!dev->power.is_late_suspended)
> > @@ -724,6 +724,11 @@ static int device_resume(struct device *
> > if (dev->power.syscore)
> > goto Complete;
> >
> > + if (dev->power.fast_suspend) {
> > + pm_request_resume(dev);
> > + goto Complete;
>
> So, this will trigger an async request to runtime resume the device.
>
> At device_complete(), we do pm_runtime_put() to return the reference
> we fetched at device_prepare(), thus likely causing the device to be
> runtime suspended again. Is that the expected sequence you need? Could
> you elaborate why?

That pm_runtime_put() will not cause the device to be re-suspended,
because it will be executed before the resume scheduled by the
pm_request_resume() above.

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2014-02-19 17:01:23

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH 1/3] PM / sleep: New flag to speed up suspend-resume of suspended devices

On Mon, 17 Feb 2014, Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <[email protected]>
>
> Currently, some subsystems (e.g. PCI and the ACPI PM domain) have to
> resume all runtime-suspended devices during system suspend, mostly
> because those devices may need to be reprogrammed due to different
> wakeup settings for system sleep and for runtime PM. However, at
> least in some cases, that isn't really necessary, because the wakeup
> settings may not be really different.
>
> The idea here is that subsystems should know whether or not it is
> necessary to reprogram a given device during system suspend and they
> should be able to tell the PM core about that. For this reason,
> modify the PM core so that if the .prepare() callback returns a
> positive value for certain device, the core will set a new
> power.fast_suspend flag for it. Then, if that flag is set, the core
> will skip all of the subsequent suspend callbacks for that device.
> It also will skip all of the system resume callbacks for the device
> during the subsequent system resume and pm_request_resume() will be
> executed to trigger a runtime PM resume of the device after the
> system device resume sequence has been finished.

Does the PM core really need to get involved in this? Can't the
subsystem do the right thing on its own?

In the USB subsystem, the .suspend routine checks the required wakeup
settings. If they are different for runtime suspend and system
suspend, and if the device is runtime suspended, then we call
pm_runtime_resume. After that, if the device is still in runtime
suspend then we return immediately.

Of course, this addresses only the suspend side of the issue. Skipping
the resume callbacks is a separate matter, and the USB subsystem
doesn't try to do it. Still, I don't see any reason why we couldn't
take care of that as well.

> However, since parents may need to be resumed so that their children
> can be reprogrammed, make the PM core clear power.fast_suspend for
> devices whose children don't have power.fast_suspend set (the
> power.ignore_children flag doesn't matter here, because a parent
> whose children are normally ignored for runtime PM may still need to
> be accessible for their children to be prepare for system suspend
> properly).

I have run across a similar issue. It's a general problem that a
device may try to remain in runtime suspend during a system resume, but
a descendant of the device may need to perform I/O as part of its own
resume routine. A natural solution would be to use the regular runtime
PM facilities to wake up the device. But since the PM work queue is
frozen, we can't rely on pm_runtime_get or the equivalent. I'm not
sure what the best solution will be.


After a quick look, I noticed a couple of questionable things in this
patch. This is after reading just the second half...

> @@ -1296,8 +1301,10 @@ static int __device_suspend(struct devic
> * for it, this is equivalent to the device signaling wakeup, so the
> * system suspend operation should be aborted.
> */
> - if (pm_runtime_barrier(dev) && device_may_wakeup(dev))
> + if (pm_runtime_barrier(dev) && device_may_wakeup(dev)) {
> pm_wakeup_event(dev, 0);
> + dev->power.fast_suspend = false;
> + }

Is this change needed? We're aborting the sleep transition anyway,
right?

> @@ -1310,6 +1317,9 @@ static int __device_suspend(struct devic
> dpm_watchdog_set(&wd, dev);
> device_lock(dev);
>
> + if (dev->power.fast_suspend)
> + goto End;
> +

What happens if dev->power.fast_suspend gets set following the .prepare
callback, but then before __device_suspend runs, the device gets
runtime resumed?

It looks like rpm_resume needs to clear the new flag.

> @@ -1358,9 +1368,14 @@ static int __device_suspend(struct devic
> End:
> if (!error) {
> dev->power.is_suspended = true;
> - if (dev->power.wakeup_path
> - && dev->parent && !dev->parent->power.ignore_children)
> - dev->parent->power.wakeup_path = true;
> + if (dev->parent) {
> + if (!dev->parent->power.ignore_children
> + && dev->power.wakeup_path)
> + dev->parent->power.wakeup_path = true;
> +
> + if (!dev->power.fast_suspend)
> + dev->parent->power.fast_suspend = false;
> + }

On SMP systems with async suspend, this isn't safe. Two threads should
not be allowed to write to bitfields in the same structure at the same
time.

Alan Stern

2014-02-20 01:08:41

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/3] PM / sleep: New flag to speed up suspend-resume of suspended devices

On Wednesday, February 19, 2014 12:01:20 PM Alan Stern wrote:
> On Mon, 17 Feb 2014, Rafael J. Wysocki wrote:
>
> > From: Rafael J. Wysocki <[email protected]>
> >
> > Currently, some subsystems (e.g. PCI and the ACPI PM domain) have to
> > resume all runtime-suspended devices during system suspend, mostly
> > because those devices may need to be reprogrammed due to different
> > wakeup settings for system sleep and for runtime PM. However, at
> > least in some cases, that isn't really necessary, because the wakeup
> > settings may not be really different.
> >
> > The idea here is that subsystems should know whether or not it is
> > necessary to reprogram a given device during system suspend and they
> > should be able to tell the PM core about that. For this reason,
> > modify the PM core so that if the .prepare() callback returns a
> > positive value for certain device, the core will set a new
> > power.fast_suspend flag for it. Then, if that flag is set, the core
> > will skip all of the subsequent suspend callbacks for that device.
> > It also will skip all of the system resume callbacks for the device
> > during the subsequent system resume and pm_request_resume() will be
> > executed to trigger a runtime PM resume of the device after the
> > system device resume sequence has been finished.

I was worried that you wouldn't comment at all. ;-)

> Does the PM core really need to get involved in this?

Yes, it does, in my opinion, and the reason is because it may be necessary
to resume parents in order to reprogram their children and the parents and
the children may be on different bus types.

> Can't the subsystem do the right thing on its own?

No, I don't think so.

> In the USB subsystem, the .suspend routine checks the required wakeup
> settings. If they are different for runtime suspend and system
> suspend, and if the device is runtime suspended, then we call
> pm_runtime_resume. After that, if the device is still in runtime
> suspend then we return immediately.
>
> Of course, this addresses only the suspend side of the issue. Skipping
> the resume callbacks is a separate matter, and the USB subsystem
> doesn't try to do it. Still, I don't see any reason why we couldn't
> take care of that as well.

What about USB controllers that are PCI devices?

> > However, since parents may need to be resumed so that their children
> > can be reprogrammed, make the PM core clear power.fast_suspend for
> > devices whose children don't have power.fast_suspend set (the
> > power.ignore_children flag doesn't matter here, because a parent
> > whose children are normally ignored for runtime PM may still need to
> > be accessible for their children to be prepare for system suspend
> > properly).
>
> I have run across a similar issue. It's a general problem that a
> device may try to remain in runtime suspend during a system resume, but
> a descendant of the device may need to perform I/O as part of its own
> resume routine. A natural solution would be to use the regular runtime
> PM facilities to wake up the device. But since the PM work queue is
> frozen, we can't rely on pm_runtime_get or the equivalent. I'm not
> sure what the best solution will be.

We can rely on pm_runtime_get_sync(), though, which would be the right thing to
use here.

However, given that the parent's .prepare() has run already, I'm not sure
if we want runtime PM to be involved at all.

> After a quick look, I noticed a couple of questionable things in this
> patch. This is after reading just the second half...
>
> > @@ -1296,8 +1301,10 @@ static int __device_suspend(struct devic
> > * for it, this is equivalent to the device signaling wakeup, so the
> > * system suspend operation should be aborted.
> > */
> > - if (pm_runtime_barrier(dev) && device_may_wakeup(dev))
> > + if (pm_runtime_barrier(dev) && device_may_wakeup(dev)) {
> > pm_wakeup_event(dev, 0);
> > + dev->power.fast_suspend = false;
> > + }
>
> Is this change needed? We're aborting the sleep transition anyway,
> right?

Yes, we are. The goal was to ensure that power.fast_suspend would be cleared
in that case, but dpm_resume_end() should take care of this anyway.

> > @@ -1310,6 +1317,9 @@ static int __device_suspend(struct devic
> > dpm_watchdog_set(&wd, dev);
> > device_lock(dev);
> >
> > + if (dev->power.fast_suspend)
> > + goto End;
> > +
>
> What happens if dev->power.fast_suspend gets set following the .prepare
> callback, but then before __device_suspend runs, the device gets
> runtime resumed?
>
> It looks like rpm_resume needs to clear the new flag.

I thought about that and came to the conclusion that that wasn't necessary.

There simply is no reason for devices with power.fast_suspend set to be
runtime-resumed after (or even during) dpm_prepare() other than while
resuming their children, in which case power.fast_suspend is going to be
cleared for the children and then for the parents too.

That *is* a little fragile, though.

> > @@ -1358,9 +1368,14 @@ static int __device_suspend(struct devic
> > End:
> > if (!error) {
> > dev->power.is_suspended = true;
> > - if (dev->power.wakeup_path
> > - && dev->parent && !dev->parent->power.ignore_children)
> > - dev->parent->power.wakeup_path = true;
> > + if (dev->parent) {
> > + if (!dev->parent->power.ignore_children
> > + && dev->power.wakeup_path)
> > + dev->parent->power.wakeup_path = true;
> > +
> > + if (!dev->power.fast_suspend)
> > + dev->parent->power.fast_suspend = false;
> > + }
>
> On SMP systems with async suspend, this isn't safe. Two threads should
> not be allowed to write to bitfields in the same structure at the same
> time.

Do I understand correctly that your concern is about suspending two
children of the same parent in parallel and one of them modifying
power.wakeup_path and the other modifying power.fast_suspend at the
same time?

If so, that modification can be done under the parent's power.lock.

Rafael

2014-02-20 01:27:50

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/3] PM / sleep: New flag to speed up suspend-resume of suspended devices

On Thursday, February 20, 2014 02:23:30 AM Rafael J. Wysocki wrote:
> On Wednesday, February 19, 2014 12:01:20 PM Alan Stern wrote:
> > On Mon, 17 Feb 2014, Rafael J. Wysocki wrote:
> >
> > > From: Rafael J. Wysocki <[email protected]>
> > >
> > > Currently, some subsystems (e.g. PCI and the ACPI PM domain) have to
> > > resume all runtime-suspended devices during system suspend, mostly
> > > because those devices may need to be reprogrammed due to different
> > > wakeup settings for system sleep and for runtime PM. However, at
> > > least in some cases, that isn't really necessary, because the wakeup
> > > settings may not be really different.
> > >
> > > The idea here is that subsystems should know whether or not it is
> > > necessary to reprogram a given device during system suspend and they
> > > should be able to tell the PM core about that. For this reason,
> > > modify the PM core so that if the .prepare() callback returns a
> > > positive value for certain device, the core will set a new
> > > power.fast_suspend flag for it. Then, if that flag is set, the core
> > > will skip all of the subsequent suspend callbacks for that device.
> > > It also will skip all of the system resume callbacks for the device
> > > during the subsequent system resume and pm_request_resume() will be
> > > executed to trigger a runtime PM resume of the device after the
> > > system device resume sequence has been finished.
>
> I was worried that you wouldn't comment at all. ;-)
>
> > Does the PM core really need to get involved in this?
>
> Yes, it does, in my opinion, and the reason is because it may be necessary
> to resume parents in order to reprogram their children and the parents and
> the children may be on different bus types.
>
> > Can't the subsystem do the right thing on its own?
>
> No, I don't think so.
>
> > In the USB subsystem, the .suspend routine checks the required wakeup
> > settings. If they are different for runtime suspend and system
> > suspend, and if the device is runtime suspended, then we call
> > pm_runtime_resume. After that, if the device is still in runtime
> > suspend then we return immediately.
> >
> > Of course, this addresses only the suspend side of the issue. Skipping
> > the resume callbacks is a separate matter, and the USB subsystem
> > doesn't try to do it. Still, I don't see any reason why we couldn't
> > take care of that as well.
>
> What about USB controllers that are PCI devices?
>
> > > However, since parents may need to be resumed so that their children
> > > can be reprogrammed, make the PM core clear power.fast_suspend for
> > > devices whose children don't have power.fast_suspend set (the
> > > power.ignore_children flag doesn't matter here, because a parent
> > > whose children are normally ignored for runtime PM may still need to
> > > be accessible for their children to be prepare for system suspend
> > > properly).
> >
> > I have run across a similar issue. It's a general problem that a
> > device may try to remain in runtime suspend during a system resume, but
> > a descendant of the device may need to perform I/O as part of its own
> > resume routine. A natural solution would be to use the regular runtime
> > PM facilities to wake up the device. But since the PM work queue is
> > frozen, we can't rely on pm_runtime_get or the equivalent. I'm not
> > sure what the best solution will be.
>
> We can rely on pm_runtime_get_sync(), though, which would be the right thing to
> use here.
>
> However, given that the parent's .prepare() has run already, I'm not sure
> if we want runtime PM to be involved at all.
>
> > After a quick look, I noticed a couple of questionable things in this
> > patch. This is after reading just the second half...
> >
> > > @@ -1296,8 +1301,10 @@ static int __device_suspend(struct devic
> > > * for it, this is equivalent to the device signaling wakeup, so the
> > > * system suspend operation should be aborted.
> > > */
> > > - if (pm_runtime_barrier(dev) && device_may_wakeup(dev))
> > > + if (pm_runtime_barrier(dev) && device_may_wakeup(dev)) {
> > > pm_wakeup_event(dev, 0);
> > > + dev->power.fast_suspend = false;
> > > + }
> >
> > Is this change needed? We're aborting the sleep transition anyway,
> > right?
>
> Yes, we are. The goal was to ensure that power.fast_suspend would be cleared
> in that case, but dpm_resume_end() should take care of this anyway.

Sorry, I wanted to say "the clearing of it in device_prepare()", not sure why
I mentioned dpm_resume_end(). Well, maybe I'm too tired ...

Rafael

2014-02-20 17:03:40

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH 1/3] PM / sleep: New flag to speed up suspend-resume of suspended devices

On Thu, 20 Feb 2014, Rafael J. Wysocki wrote:

> On Wednesday, February 19, 2014 12:01:20 PM Alan Stern wrote:
> > On Mon, 17 Feb 2014, Rafael J. Wysocki wrote:
> >
> > > From: Rafael J. Wysocki <[email protected]>
> > >
> > > Currently, some subsystems (e.g. PCI and the ACPI PM domain) have to
> > > resume all runtime-suspended devices during system suspend, mostly
> > > because those devices may need to be reprogrammed due to different
> > > wakeup settings for system sleep and for runtime PM. However, at
> > > least in some cases, that isn't really necessary, because the wakeup
> > > settings may not be really different.
> > >
> > > The idea here is that subsystems should know whether or not it is
> > > necessary to reprogram a given device during system suspend and they
> > > should be able to tell the PM core about that. For this reason,
> > > modify the PM core so that if the .prepare() callback returns a
> > > positive value for certain device, the core will set a new
> > > power.fast_suspend flag for it. Then, if that flag is set, the core
> > > will skip all of the subsequent suspend callbacks for that device.
> > > It also will skip all of the system resume callbacks for the device
> > > during the subsequent system resume and pm_request_resume() will be
> > > executed to trigger a runtime PM resume of the device after the
> > > system device resume sequence has been finished.
>
> I was worried that you wouldn't comment at all. ;-)

Things have been pretty busy here for the last few weeks...

> > Does the PM core really need to get involved in this?
>
> Yes, it does, in my opinion, and the reason is because it may be necessary
> to resume parents in order to reprogram their children and the parents and
> the children may be on different bus types.

Hmmm. Reprogramming the children could refer to two different things:

(1) Altering the child's wakeup settings before suspending it, or

(2) Doing I/O to the child after resuming it.

(1) is something we already have to deal with. Generally, the child's
subsystem or driver would simply call pm_runtime_get_sync, or something
of the sort. The parent would then be in the RPM_ACTIVE state when its
->resume callback was invoked, so the rest of this discussion would not
apply.

(2) is more difficult. It is the reason why, in the original runtime
PM documentation, we suggested that system resume should wake up _all_
devices, including those that were in runtime suspend before the system
sleep.

If the child's subsystem or driver knows to call pm_runtime_get_sync
before starting the I/O, then it should work out okay. Trouble arises
when the subsystem/driver _assumes_ that the parent is active. This
sounds like the sort of thing that could be fixed on a case-by-case
basis: Simply remove that assumption from the child's subsystem/driver.
But I guess you want to set up a more structured solution.

Note that this patch set really addresses two distinct issues: Skipping
the ->suspend callbacks if the device is already runtime suspended, and
skipping the ->resume callbacks. In principle these are independent
decisions.

> > Can't the subsystem do the right thing on its own?
>
> No, I don't think so.

Well, certainly the subsystem could avoid doing anything in the
->suspend callbacks, by returning immediately. But skipping over the
->resume callbacks is more problematic, because of case (2) above.

> > In the USB subsystem, the .suspend routine checks the required wakeup
> > settings. If they are different for runtime suspend and system
> > suspend, and if the device is runtime suspended, then we call
> > pm_runtime_resume. After that, if the device is still in runtime
> > suspend then we return immediately.
> >
> > Of course, this addresses only the suspend side of the issue. Skipping
> > the resume callbacks is a separate matter, and the USB subsystem
> > doesn't try to do it. Still, I don't see any reason why we couldn't
> > take care of that as well.
>
> What about USB controllers that are PCI devices?

The description above applies to USB devices, not USB controllers.
For PCI-based USB controllers, we currently assume that the controller
is RPM_ACTIVE when the ->suspend callback is called.

Of course, that assumption could be removed easily enough.

> > I have run across a similar issue. It's a general problem that a
> > device may try to remain in runtime suspend during a system resume, but
> > a descendant of the device may need to perform I/O as part of its own
> > resume routine. A natural solution would be to use the regular runtime
> > PM facilities to wake up the device. But since the PM work queue is
> > frozen, we can't rely on pm_runtime_get or the equivalent. I'm not
> > sure what the best solution will be.
>
> We can rely on pm_runtime_get_sync(), though, which would be the right thing to
> use here.
>
> However, given that the parent's .prepare() has run already, I'm not sure
> if we want runtime PM to be involved at all.

This is a decision we have to make. At the start of a system resume,
we expect that the device is in a low-power state. After the ->resume
callback returns, there is a choice. The device could be back to full
power and marked RPM_ACTIVE. Or it could remain in low power and be
marked RPM_SUSPENDED. In the second alternative, it seems that runtime
PM is unavoidably involved.

Are you primarily concerned about performing runtime PM actions before
calling the ->complete callback? I don't think that will cause any
difficulties, but we could warn about it in the documentation.

> > > @@ -1310,6 +1317,9 @@ static int __device_suspend(struct devic
> > > dpm_watchdog_set(&wd, dev);
> > > device_lock(dev);
> > >
> > > + if (dev->power.fast_suspend)
> > > + goto End;
> > > +
> >
> > What happens if dev->power.fast_suspend gets set following the .prepare
> > callback, but then before __device_suspend runs, the device gets
> > runtime resumed?
> >
> > It looks like rpm_resume needs to clear the new flag.
>
> I thought about that and came to the conclusion that that wasn't necessary.
>
> There simply is no reason for devices with power.fast_suspend set to be
> runtime-resumed after (or even during) dpm_prepare() other than while
> resuming their children, in which case power.fast_suspend is going to be
> cleared for the children and then for the parents too.
>
> That *is* a little fragile, though.

It's possible for the device to get a wakeup request. If this happens,
it should cause the entire system sleep to be aborted, but depending on
that is also a little fragile.

There's also the possibility of doing I/O to the child while the parent
is in runtime suspend (with ignore_children set). These situations
tend to be idiosyncratic; it's hard to say anything about them in
general.

> > > @@ -1358,9 +1368,14 @@ static int __device_suspend(struct devic
> > > End:
> > > if (!error) {
> > > dev->power.is_suspended = true;
> > > - if (dev->power.wakeup_path
> > > - && dev->parent && !dev->parent->power.ignore_children)
> > > - dev->parent->power.wakeup_path = true;
> > > + if (dev->parent) {
> > > + if (!dev->parent->power.ignore_children
> > > + && dev->power.wakeup_path)
> > > + dev->parent->power.wakeup_path = true;
> > > +
> > > + if (!dev->power.fast_suspend)
> > > + dev->parent->power.fast_suspend = false;
> > > + }
> >
> > On SMP systems with async suspend, this isn't safe. Two threads should
> > not be allowed to write to bitfields in the same structure at the same
> > time.
>
> Do I understand correctly that your concern is about suspending two
> children of the same parent in parallel and one of them modifying
> power.wakeup_path and the other modifying power.fast_suspend at the
> same time?

Yes, that's what I meant.

> If so, that modification can be done under the parent's power.lock.

That would avoid the problem.

Overall, I'm not convinced about the need for the fast_suspend
mechanism. I'm quite certain that the PM core doesn't need to get
involved in the decision about skipping ->suspend callbacks.

The question of skipping ->resume callbacks is more complex. In the
end, it comes down to a question of leaving the device in runtime
suspend during system resume -- whether this is happens because the PM
core skips the ->resume callback or because the callback returns
immediately is unimportant.

So I think what we really need to do is discuss this last question more
fully. Case (2) can be tricky, and we may find that the device's
subsystem doesn't have enough information to predict what the
children's subsystems will do.

Alan Stern

2014-02-23 23:45:22

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/3] PM / sleep: New flag to speed up suspend-resume of suspended devices

On Thursday, February 20, 2014 12:03:37 PM Alan Stern wrote:
> On Thu, 20 Feb 2014, Rafael J. Wysocki wrote:
>
> > On Wednesday, February 19, 2014 12:01:20 PM Alan Stern wrote:
> > > On Mon, 17 Feb 2014, Rafael J. Wysocki wrote:
> > >
> > > > From: Rafael J. Wysocki <[email protected]>
> > > >
> > > > Currently, some subsystems (e.g. PCI and the ACPI PM domain) have to
> > > > resume all runtime-suspended devices during system suspend, mostly
> > > > because those devices may need to be reprogrammed due to different
> > > > wakeup settings for system sleep and for runtime PM. However, at
> > > > least in some cases, that isn't really necessary, because the wakeup
> > > > settings may not be really different.
> > > >
> > > > The idea here is that subsystems should know whether or not it is
> > > > necessary to reprogram a given device during system suspend and they
> > > > should be able to tell the PM core about that. For this reason,
> > > > modify the PM core so that if the .prepare() callback returns a
> > > > positive value for certain device, the core will set a new
> > > > power.fast_suspend flag for it. Then, if that flag is set, the core
> > > > will skip all of the subsequent suspend callbacks for that device.
> > > > It also will skip all of the system resume callbacks for the device
> > > > during the subsequent system resume and pm_request_resume() will be
> > > > executed to trigger a runtime PM resume of the device after the
> > > > system device resume sequence has been finished.
> >
> > I was worried that you wouldn't comment at all. ;-)
>
> Things have been pretty busy here for the last few weeks...
>
> > > Does the PM core really need to get involved in this?
> >
> > Yes, it does, in my opinion, and the reason is because it may be necessary
> > to resume parents in order to reprogram their children and the parents and
> > the children may be on different bus types.
>
> Hmmm. Reprogramming the children could refer to two different things:
>
> (1) Altering the child's wakeup settings before suspending it, or
>
> (2) Doing I/O to the child after resuming it.
>
> (1) is something we already have to deal with. Generally, the child's
> subsystem or driver would simply call pm_runtime_get_sync, or something
> of the sort.

Yes, it could. Except that that won't work for parents with power.ignore_children
set.

Also, it may not do that today and I'd like to introduce a mechanism by which
that optimizatiom may be enabled by subsystems/drivers when ready.

So for example today there is no guarantee that each device will be resumed
as appropriate by whoever handles its children when necessary, because that
code may expect the device to be operational when it is running (precisely
because we used to resume that device in .prepare()).

> The parent would then be in the RPM_ACTIVE state when its
> ->resume callback was invoked, so the rest of this discussion would not
> apply.
>
> (2) is more difficult. It is the reason why, in the original runtime
> PM documentation, we suggested that system resume should wake up _all_
> devices, including those that were in runtime suspend before the system
> sleep.

In my view there really is no substantial difference between (1) and (2).
After all, changing the wakeup settings of a device's children may involve
doing I/O to them.

> If the child's subsystem or driver knows to call pm_runtime_get_sync
> before starting the I/O, then it should work out okay. Trouble arises
> when the subsystem/driver _assumes_ that the parent is active.

Precisely

> This sounds like the sort of thing that could be fixed on a case-by-case
> basis: Simply remove that assumption from the child's subsystem/driver.
> But I guess you want to set up a more structured solution.

Yes. In particular, I'd like the "child" subsystem to be able to let the
core know that it is fine to leave the parent suspended.

> Note that this patch set really addresses two distinct issues: Skipping
> the ->suspend callbacks if the device is already runtime suspended, and
> skipping the ->resume callbacks. In principle these are independent
> decisions.

Yes, they are.

That said the patchset doesn't really do both. It only really is about
skipping the ->suspend callbacks if possible, but the *consequence* of that
is that *system* resume ->resume callbacks cannot be used to resume the
device any more in general. That's because ->resume_early may try to
reverse the ->suspend_late's actions and so on, so if ->suspend_late hasn't
run, it would be a bug to run ->resume_early for that device.

> > > Can't the subsystem do the right thing on its own?
> >
> > No, I don't think so.
>
> Well, certainly the subsystem could avoid doing anything in the
> ->suspend callbacks, by returning immediately.

As I said above, today there's no guaranee that things would work out correctly
in that case.

> But skipping over the ->resume callbacks is more problematic, because of case
> (2) above.
>
> > > In the USB subsystem, the .suspend routine checks the required wakeup
> > > settings. If they are different for runtime suspend and system
> > > suspend, and if the device is runtime suspended, then we call
> > > pm_runtime_resume. After that, if the device is still in runtime
> > > suspend then we return immediately.
> > >
> > > Of course, this addresses only the suspend side of the issue. Skipping
> > > the resume callbacks is a separate matter, and the USB subsystem
> > > doesn't try to do it. Still, I don't see any reason why we couldn't
> > > take care of that as well.
> >
> > What about USB controllers that are PCI devices?
>
> The description above applies to USB devices, not USB controllers.
> For PCI-based USB controllers, we currently assume that the controller
> is RPM_ACTIVE when the ->suspend callback is called.
>
> Of course, that assumption could be removed easily enough.

Well, in my view, in general, any mechanism that crosses the subsystem
boundary pretty much requires the core to be involved, this way or another.

> > > I have run across a similar issue. It's a general problem that a
> > > device may try to remain in runtime suspend during a system resume, but
> > > a descendant of the device may need to perform I/O as part of its own
> > > resume routine. A natural solution would be to use the regular runtime
> > > PM facilities to wake up the device. But since the PM work queue is
> > > frozen, we can't rely on pm_runtime_get or the equivalent. I'm not
> > > sure what the best solution will be.
> >
> > We can rely on pm_runtime_get_sync(), though, which would be the right thing to
> > use here.
> >
> > However, given that the parent's .prepare() has run already, I'm not sure
> > if we want runtime PM to be involved at all.
>
> This is a decision we have to make. At the start of a system resume,
> we expect that the device is in a low-power state. After the ->resume
> callback returns, there is a choice. The device could be back to full
> power and marked RPM_ACTIVE. Or it could remain in low power and be
> marked RPM_SUSPENDED. In the second alternative, it seems that runtime
> PM is unavoidably involved.

I agree. However, as stated above, once we've decided not to run, for
example, the ->suspend callback, we also should not attempt to run
->resume for the same device, because the latter may expect that the former
has run. That is an additional complication that needs to be taken into
account in my opinion.

Of course, if the decision is left to the subsystem, then it will know
that ->suspend has returned immediately and it will make ->resume return
immediately too, but that sounds like a thing that will be readily duplicated
by multiple subsystems just because we don't want the core to be involved.
Also there's no way to know anything about the children then.

> Are you primarily concerned about performing runtime PM actions before
> calling the ->complete callback?

No, I'm not.

> I don't think that will cause any difficulties, but we could warn about it in
> the documentation.
>
> > > > @@ -1310,6 +1317,9 @@ static int __device_suspend(struct devic
> > > > dpm_watchdog_set(&wd, dev);
> > > > device_lock(dev);
> > > >
> > > > + if (dev->power.fast_suspend)
> > > > + goto End;
> > > > +
> > >
> > > What happens if dev->power.fast_suspend gets set following the .prepare
> > > callback, but then before __device_suspend runs, the device gets
> > > runtime resumed?
> > >
> > > It looks like rpm_resume needs to clear the new flag.
> >
> > I thought about that and came to the conclusion that that wasn't necessary.
> >
> > There simply is no reason for devices with power.fast_suspend set to be
> > runtime-resumed after (or even during) dpm_prepare() other than while
> > resuming their children, in which case power.fast_suspend is going to be
> > cleared for the children and then for the parents too.
> >
> > That *is* a little fragile, though.
>
> It's possible for the device to get a wakeup request. If this happens,
> it should cause the entire system sleep to be aborted, but depending on
> that is also a little fragile.
>
> There's also the possibility of doing I/O to the child while the parent
> is in runtime suspend (with ignore_children set). These situations
> tend to be idiosyncratic; it's hard to say anything about them in
> general.

I agree. For this reason, I think that the core has no choice but to treat
power.ignore_children set as "well, that device may need to be operational
going forward".

> > > > @@ -1358,9 +1368,14 @@ static int __device_suspend(struct devic
> > > > End:
> > > > if (!error) {
> > > > dev->power.is_suspended = true;
> > > > - if (dev->power.wakeup_path
> > > > - && dev->parent && !dev->parent->power.ignore_children)
> > > > - dev->parent->power.wakeup_path = true;
> > > > + if (dev->parent) {
> > > > + if (!dev->parent->power.ignore_children
> > > > + && dev->power.wakeup_path)
> > > > + dev->parent->power.wakeup_path = true;
> > > > +
> > > > + if (!dev->power.fast_suspend)
> > > > + dev->parent->power.fast_suspend = false;
> > > > + }
> > >
> > > On SMP systems with async suspend, this isn't safe. Two threads should
> > > not be allowed to write to bitfields in the same structure at the same
> > > time.
> >
> > Do I understand correctly that your concern is about suspending two
> > children of the same parent in parallel and one of them modifying
> > power.wakeup_path and the other modifying power.fast_suspend at the
> > same time?
>
> Yes, that's what I meant.
>
> > If so, that modification can be done under the parent's power.lock.
>
> That would avoid the problem.
>
> Overall, I'm not convinced about the need for the fast_suspend
> mechanism. I'm quite certain that the PM core doesn't need to get
> involved in the decision about skipping ->suspend callbacks.

I obviously disagree. :-)

Otherwise, I wouldn't have bothered with posting this patchset ...

> The question of skipping ->resume callbacks is more complex. In the
> end, it comes down to a question of leaving the device in runtime
> suspend during system resume -- whether this is happens because the PM
> core skips the ->resume callback or because the callback returns
> immediately is unimportant.

Well, not really in my opinion ...

> So I think what we really need to do is discuss this last question more
> fully. Case (2) can be tricky, and we may find that the device's
> subsystem doesn't have enough information to predict what the
> children's subsystems will do.

Precisely. :-) And that's why I'd like to give the children's subsystem
a way to give the parent's one a clue - with some help from the core.

Rafael

2014-02-24 19:36:06

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH 1/3] PM / sleep: New flag to speed up suspend-resume of suspended devices

On Mon, 24 Feb 2014, Rafael J. Wysocki wrote:

> Also, it may not do that today and I'd like to introduce a mechanism by which
> that optimizatiom may be enabled by subsystems/drivers when ready.
>
> So for example today there is no guarantee that each device will be resumed
> as appropriate by whoever handles its children when necessary, because that
> code may expect the device to be operational when it is running (precisely
> because we used to resume that device in .prepare()).

> Yes. In particular, I'd like the "child" subsystem to be able to let the
> core know that it is fine to leave the parent suspended.

> That said the patchset doesn't really do both. It only really is about
> skipping the ->suspend callbacks if possible, but the *consequence* of that
> is that *system* resume ->resume callbacks cannot be used to resume the
> device any more in general. That's because ->resume_early may try to
> reverse the ->suspend_late's actions and so on, so if ->suspend_late hasn't
> run, it would be a bug to run ->resume_early for that device.

> I agree. For this reason, I think that the core has no choice but to treat
> power.ignore_children set as "well, that device may need to be operational
> going forward".

This discussion is getting a little messy. Let's try to clarify it.
Here is the major point:

We would like to save time during system suspend/resume by
skipping over devices that are already in runtime suspend,
whenever it is safe to do so.

Of course, the "it is safe to do so" part is what makes this difficult.
It boils down to three characteristics for each device.

(a) The device uses the same power state for runtime suspend and
system suspend. Therefore, if the device is already in runtime
suspend and the wakeup settings are correct, there is no need
for the PM core to invoke the device's ->suspend callbacks.

This requires a few comments. The matter of whether the same power
state is used for both types of suspend is generally known beforehand,
because it is decided by the subsystem or driver. The matter of
whether the wakeup settings are correct often can't be known until the
system suspend starts, because userspace can select whether or not
wakeup should be enabled during a system sleep.

Also, if the PM core is going to skip the ->suspend callbacks then it
is obliged to skip the ->resume callbacks as well (we mustn't call one
without the other). Therefore, in cases where (a) holds, the device
will necessarily emerge from the system resume in a runtime-suspended
state. This may or may not cause problems for the device's children;
see below.

(b) It's okay for the device's parent to be in runtime suspend
when the device's ->suspend callbacks are invoked.

I included this just to be thorough. In fact, I expect (b) to be true
for pretty much every device already. Or if it isn't true for some
devices, this is because of a special arrangement between the device's
subsystem and the parent's subsystem. For example, the parent might
always be runtime-resumed by its subsystem at the start of a system
suspend (which is what PCI does now; I don't know if it is necessary).

In the absence of any sort of special arrangement, if (b) wasn't true
for some device then that device would already be experiencing problems
going into system suspend. So (b) should not cause much difficulty.
And if a special arrangement is present, it is a private matter between
the two subsystems, not involving the PM core.

(c) It's okay for the device's parent to be in runtime suspend
when the device's ->resume callbacks are invoked.

Unlike (b), I expect that (c) does _not_ hold for quite a few devices
currently. The reason is historical: When runtime PM was first
implemented, we decided that all devices should emerge from system
resume in the RPM_ACTIVE state, even if they were in runtime suspend
when the system suspend started. Therefore drivers could depend on the
parent not being in runtime suspend while the device's ->resume
callback was running.

I don't think it is a good idea to perpetuate an accident of history.
Instead of adding a special mechanism to the PM core for accomodating
devices where (c) doesn't hold, I think we should fix up the drivers
so that (c) _does_ hold everywhere. Maybe you disagree.


Anyway, let's assume first that things are all fixed up, and (b) and
(c) hold for every device. This means we can go back and consider (a).

Since the "same power state for both types of suspend" answer is known
beforehand, let's concentrate on devices where it is true (other
devices will simply continue to operate as they do today). For these
devices, the driver or subsystem will have to compute a flag value --
let's call it "same_wakeup_setting". The PM core can't do this because
it doesn't understand the device-specific details of wakeup settings.

Then your proposal comes down to this:

If ->prepare returns > 0, the PM core sets the
same_wakeup_setting flag.

If same_wakeup_setting is on and the device is in runtime
suspend, the PM core skips the various ->suspend and ->resume
callbacks.

My proposal was never made explicit, but it would take a form something
like this:

During ->prepare, the subsystem sets the same_wakeup_setting
flag appropriately.

If same_wakeup_setting is on and the device is in runtime
suspend, the subsystem's ->suspend and ->resume callbacks
return immediately.

There isn't very much difference between the two proposals. Mine is a
little more flexible; for example, it allows the subsystem to return
immediately from ->suspend but have ->resume put the device back in the
RPM_ACTIVE state.


Now let's change course and suppose that (c) _doesn't_ hold for a large
selection of devices. As a simple consequence, if (c) doesn't hold for
some device then (a) can't be allowed to hold for any ancestor of that
device. (I'm disregarding the power.ignore_children flag.)

Your proposal would take the same_wakeup_setting flag (you called it
"fast_suspend"), used for answering (a), and combine it with the answer
to (c). That is, you would have ->prepare return > 0 only if (a) and
(c) both hold -- and in addition, you turn off the flag if it is off in
any child.

In practice, I suspect this means fast_suspend will end up affecting
only leaf devices: those with no children. This is partly because many
devices don't have a .prepare method; there are plenty of entries in
the device tree that don't correspond to physical devices (e.g., class
devices, or devices present only for their sysfs attributes) and hence
have no PM support at all. Even though (c) does hold for such devices,
the PM core won't realize it.

In addition, I don't like the way your proposal mixes together the
answers to (a) and (c). If they could be kept separate, I think you
could do a better job.

For instance, suppose (a) is true for some device, but (c) is false for
one of its children. Then the PM core could skip the ->suspend and
->resume callbacks for that device, and it could do a pm_runtime_resume
on the device before resuming the child. The end result would be a
single ->runtime_resume call, instead of ->runtime_resume followed by
->suspend and then ->resume.

Alan Stern

2014-02-24 23:52:56

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/3] PM / sleep: New flag to speed up suspend-resume of suspended devices

On Monday, February 24, 2014 02:36:02 PM Alan Stern wrote:
> On Mon, 24 Feb 2014, Rafael J. Wysocki wrote:
>
> > Also, it may not do that today and I'd like to introduce a mechanism by which
> > that optimizatiom may be enabled by subsystems/drivers when ready.
> >
> > So for example today there is no guarantee that each device will be resumed
> > as appropriate by whoever handles its children when necessary, because that
> > code may expect the device to be operational when it is running (precisely
> > because we used to resume that device in .prepare()).
>
> > Yes. In particular, I'd like the "child" subsystem to be able to let the
> > core know that it is fine to leave the parent suspended.
>
> > That said the patchset doesn't really do both. It only really is about
> > skipping the ->suspend callbacks if possible, but the *consequence* of that
> > is that *system* resume ->resume callbacks cannot be used to resume the
> > device any more in general. That's because ->resume_early may try to
> > reverse the ->suspend_late's actions and so on, so if ->suspend_late hasn't
> > run, it would be a bug to run ->resume_early for that device.
>
> > I agree. For this reason, I think that the core has no choice but to treat
> > power.ignore_children set as "well, that device may need to be operational
> > going forward".
>
> This discussion is getting a little messy. Let's try to clarify it.
> Here is the major point:
>
> We would like to save time during system suspend/resume by

Actually, that's not only about saving time, but also about saving energy.

> skipping over devices that are already in runtime suspend,
> whenever it is safe to do so.
>
> Of course, the "it is safe to do so" part is what makes this difficult.
> It boils down to three characteristics for each device.
>
> (a) The device uses the same power state for runtime suspend and
> system suspend. Therefore, if the device is already in runtime
> suspend and the wakeup settings are correct, there is no need
> for the PM core to invoke the device's ->suspend callbacks.
>
> This requires a few comments. The matter of whether the same power
> state is used for both types of suspend is generally known beforehand,
> because it is decided by the subsystem or driver. The matter of
> whether the wakeup settings are correct often can't be known until the
> system suspend starts, because userspace can select whether or not
> wakeup should be enabled during a system sleep.
>
> Also, if the PM core is going to skip the ->suspend callbacks then it
> is obliged to skip the ->resume callbacks as well (we mustn't call one
> without the other). Therefore, in cases where (a) holds, the device
> will necessarily emerge from the system resume in a runtime-suspended
> state.

The "emerge from system resume" requires a bit of clarification in my
opinion. Do you refer to the status of the device when user space is
thawed or earlier? If earlier, then when exactly?

> This may or may not cause problems for the device's children;
> see below.
>
> (b) It's okay for the device's parent to be in runtime suspend
> when the device's ->suspend callbacks are invoked.
>
> I included this just to be thorough. In fact, I expect (b) to be true
> for pretty much every device already.

I don't quite understand this. What if the parent is a bridge and the
child's ->suspend tries to access the child's registers? That surely won't
work if the parent is in a low-power state at that point.

> Or if it isn't true for some
> devices, this is because of a special arrangement between the device's
> subsystem and the parent's subsystem. For example, the parent might
> always be runtime-resumed by its subsystem at the start of a system
> suspend (which is what PCI does now; I don't know if it is necessary).
>
> In the absence of any sort of special arrangement, if (b) wasn't true
> for some device then that device would already be experiencing problems
> going into system suspend.

Unless, of course, its parent is a PCI device, because in that case it will
always be resumed by the PCI bus type. And if the "always resumed" is going
to be changed to "only resumed if the parent's configuration needs to change",
there may be some regressions here and there.

> So (b) should not cause much difficulty.

I disagree.

> And if a special arrangement is present, it is a private matter between
> the two subsystems, not involving the PM core.
>
> (c) It's okay for the device's parent to be in runtime suspend
> when the device's ->resume callbacks are invoked.
>
> Unlike (b), I expect that (c) does _not_ hold for quite a few devices
> currently. The reason is historical: When runtime PM was first
> implemented, we decided that all devices should emerge from system
> resume in the RPM_ACTIVE state, even if they were in runtime suspend
> when the system suspend started. Therefore drivers could depend on the
> parent not being in runtime suspend while the device's ->resume
> callback was running.

That's correct.

> I don't think it is a good idea to perpetuate an accident of history.
> Instead of adding a special mechanism to the PM core for accomodating
> devices where (c) doesn't hold, I think we should fix up the drivers
> so that (c) _does_ hold everywhere. Maybe you disagree.

I agree with the idea, but I have a certain view on how to achieve it.
Which is by allowing the "good" ones to mark themselves as "good", then
go through the ones that aren't marked as "good" yet, fix them up and
mark them as "good". Finally, when everyone is marked "good", we can
drop the marking.

> Anyway, let's assume first that things are all fixed up, and (b) and
> (c) hold for every device. This means we can go back and consider (a).
>
> Since the "same power state for both types of suspend" answer is known
> beforehand, let's concentrate on devices where it is true (other
> devices will simply continue to operate as they do today). For these
> devices, the driver or subsystem will have to compute a flag value --
> let's call it "same_wakeup_setting". The PM core can't do this because
> it doesn't understand the device-specific details of wakeup settings.
>
> Then your proposal comes down to this:
>
> If ->prepare returns > 0, the PM core sets the
> same_wakeup_setting flag.
>
> If same_wakeup_setting is on and the device is in runtime
> suspend, the PM core skips the various ->suspend and ->resume
> callbacks.
>
> My proposal was never made explicit, but it would take a form something
> like this:
>
> During ->prepare, the subsystem sets the same_wakeup_setting
> flag appropriately.
>
> If same_wakeup_setting is on and the device is in runtime
> suspend, the subsystem's ->suspend and ->resume callbacks
> return immediately.
>
> There isn't very much difference between the two proposals. Mine is a
> little more flexible; for example, it allows the subsystem to return
> immediately from ->suspend but have ->resume put the device back in the
> RPM_ACTIVE state.

Your version is fine by me, I only made the PM core set the flag, because it
would clear that flag afterward in some cases.

> Now let's change course and suppose that (c) _doesn't_ hold for a large
> selection of devices. As a simple consequence, if (c) doesn't hold for
> some device then (a) can't be allowed to hold for any ancestor of that
> device. (I'm disregarding the power.ignore_children flag.)
>
> Your proposal would take the same_wakeup_setting flag (you called it
> "fast_suspend"), used for answering (a), and combine it with the answer
> to (c). That is, you would have ->prepare return > 0 only if (a) and
> (c) both hold -- and in addition, you turn off the flag if it is off in
> any child.
>
> In practice, I suspect this means fast_suspend will end up affecting
> only leaf devices: those with no children. This is partly because many
> devices don't have a .prepare method; there are plenty of entries in
> the device tree that don't correspond to physical devices (e.g., class
> devices, or devices present only for their sysfs attributes) and hence
> have no PM support at all. Even though (c) does hold for such devices,
> the PM core won't realize it.
>
> In addition, I don't like the way your proposal mixes together the
> answers to (a) and (c).

It really doesn't and I tried to explain that in my previous message, but
that apparently wasn't clear enough.

The reasoning was basically to set fast_suspend for a device if your condition
(a) held *and* if fast_suspend was set for all of the device's children. Then
we would know that all those children would be RPM_SUSPENDED during system
resume as well and the resume part might be "streamlined" as well. There was
nothing about (c) anywhere in that patchset. :-)

> If they could be kept separate, I think you could do a better job.
>
> For instance, suppose (a) is true for some device, but (c) is false for
> one of its children. Then the PM core could skip the ->suspend and
> ->resume callbacks for that device, and it could do a pm_runtime_resume
> on the device before resuming the child. The end result would be a
> single ->runtime_resume call, instead of ->runtime_resume followed by
> ->suspend and then ->resume.

That would be a different optimization from the one I'm thinking about.

For now, I'm focusing on one problem, which is when resuming runtime-suspended
devices during system suspend may be avoided and how to make that generally
work for different parent-child arrangements.

The resume part changes in my patchset were consequences of that only.

Rafael

2014-02-25 17:08:17

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH 1/3] PM / sleep: New flag to speed up suspend-resume of suspended devices

On Tue, 25 Feb 2014, Rafael J. Wysocki wrote:

> > This discussion is getting a little messy. Let's try to clarify it.
> > Here is the major point:
> >
> > We would like to save time during system suspend/resume by
>
> Actually, that's not only about saving time, but also about saving energy.

Sure.

> > skipping over devices that are already in runtime suspend,
> > whenever it is safe to do so.
> >
> > Of course, the "it is safe to do so" part is what makes this difficult.
> > It boils down to three characteristics for each device.
> >
> > (a) The device uses the same power state for runtime suspend and
> > system suspend. Therefore, if the device is already in runtime
> > suspend and the wakeup settings are correct, there is no need
> > for the PM core to invoke the device's ->suspend callbacks.
> >
> > This requires a few comments. The matter of whether the same power
> > state is used for both types of suspend is generally known beforehand,
> > because it is decided by the subsystem or driver. The matter of
> > whether the wakeup settings are correct often can't be known until the
> > system suspend starts, because userspace can select whether or not
> > wakeup should be enabled during a system sleep.
> >
> > Also, if the PM core is going to skip the ->suspend callbacks then it
> > is obliged to skip the ->resume callbacks as well (we mustn't call one
> > without the other). Therefore, in cases where (a) holds, the device
> > will necessarily emerge from the system resume in a runtime-suspended
> > state.
>
> The "emerge from system resume" requires a bit of clarification in my
> opinion. Do you refer to the status of the device when user space is
> thawed or earlier? If earlier, then when exactly?

The status of the device when its ->resume callback returns. Or in
the context of your patch, when the ->resume callback is skipped.

> > This may or may not cause problems for the device's children;
> > see below.
> >
> > (b) It's okay for the device's parent to be in runtime suspend
> > when the device's ->suspend callbacks are invoked.
> >
> > I included this just to be thorough. In fact, I expect (b) to be true
> > for pretty much every device already.
>
> I don't quite understand this. What if the parent is a bridge and the
> child's ->suspend tries to access the child's registers? That surely won't
> work if the parent is in a low-power state at that point.

It _does_ work on all current systems -- but only because the question
never arises if the device's parent is never in runtime suspend when
the device's ->suspend callbacks are invoked.

I admit, there most likely _are_ devices that would get into trouble if
the question ever did arise.

> > In the absence of any sort of special arrangement, if (b) wasn't true
> > for some device then that device would already be experiencing problems
> > going into system suspend.
>
> Unless, of course, its parent is a PCI device, because in that case it will
> always be resumed by the PCI bus type. And if the "always resumed" is going
> to be changed to "only resumed if the parent's configuration needs to change",
> there may be some regressions here and there.

Okay, so you want to take a problem involving PCI and some other
subsystems, and solve it by getting the PM core involved. And you want
to mix it in with the idea of "fast suspend". Is that really the best
approach?

Here's another approach. Add an "okay for my parent to be
runtime-suspended when my ->suspend and ->resume callbacks are invoked"
flag to dev->power. Make pci_pm_prepare(dev) check this flag in every
child of dev; if there are any children where the flag isn't set then
call pm_runtime_resume(dev) (and print a message in the kernel log so
that you know which driver needs to be fixed).

This is appropriate because it adjusts the PCI core in a way that can
safely avoid regressions, without getting the PM core involved in
matters that should remain strictly between the PCI subsystem and the
subsystems of children of PCI devices.

It also allows the "fast suspend" change to be cleanly separated out
into a second patch. In that patch, all you do is set
dev->power.fast_suspend if ->prepare returns > 0, and then you skip
->suspend and ->resume if fast_suspend is set and the device is
runtime-suspended.

> I agree with the idea, but I have a certain view on how to achieve it.
> Which is by allowing the "good" ones to mark themselves as "good", then
> go through the ones that aren't marked as "good" yet, fix them up and
> mark them as "good". Finally, when everyone is marked "good", we can
> drop the marking.

How will you know when nothing remains unmarked?

> The reasoning was basically to set fast_suspend for a device if your condition
> (a) held *and* if fast_suspend was set for all of the device's children. Then
> we would know that all those children would be RPM_SUSPENDED during system
> resume as well and the resume part might be "streamlined" as well. There was
> nothing about (c) anywhere in that patchset. :-)

This seems like getting the answer to the wrong question. You want to
know whether the children are okay with the parent being
runtime-suspended during the child's suspend and resume callbacks, but
instead you are asking if the children can remain runtime-suspended
through the entire system suspend. Those are two different questions.

They seem related, because if the parent is in runtime suspend then the
child must also be in runtime suspend (disregarding the possibility of
ignore_children). But this is a red herring. It's entirely possible
for the child to be RPM_ACTIVE during one of these callback even if the
parent is RPM_SUSPENDED when the callback starts. The child's driver
merely needs to call pm_runtime_resume(dev) at the beginning of the
callback.

If the child's driver does this, it would be perfectly okay for the
parent to use fast_suspend without the child using it too. You could
skip calling the parent's suspend and resume callbacks, and the child
would work just fine.

> That would be a different optimization from the one I'm thinking about.
>
> For now, I'm focusing on one problem, which is when resuming runtime-suspended
> devices during system suspend may be avoided and how to make that generally
> work for different parent-child arrangements.
>
> The resume part changes in my patchset were consequences of that only.

See, I think that by considering this as a single problem, you aren't
getting down to the fundamental issues.

Alan Stern

2014-02-25 23:41:29

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/3] PM / sleep: New flag to speed up suspend-resume of suspended devices

On Tuesday, February 25, 2014 12:08:14 PM Alan Stern wrote:
> On Tue, 25 Feb 2014, Rafael J. Wysocki wrote:
>
> > > This discussion is getting a little messy. Let's try to clarify it.
> > > Here is the major point:
> > >
> > > We would like to save time during system suspend/resume by
> >
> > Actually, that's not only about saving time, but also about saving energy.
>
> Sure.
>
> > > skipping over devices that are already in runtime suspend,
> > > whenever it is safe to do so.
> > >
> > > Of course, the "it is safe to do so" part is what makes this difficult.
> > > It boils down to three characteristics for each device.
> > >
> > > (a) The device uses the same power state for runtime suspend and
> > > system suspend. Therefore, if the device is already in runtime
> > > suspend and the wakeup settings are correct, there is no need
> > > for the PM core to invoke the device's ->suspend callbacks.
> > >
> > > This requires a few comments. The matter of whether the same power
> > > state is used for both types of suspend is generally known beforehand,
> > > because it is decided by the subsystem or driver. The matter of
> > > whether the wakeup settings are correct often can't be known until the
> > > system suspend starts, because userspace can select whether or not
> > > wakeup should be enabled during a system sleep.
> > >
> > > Also, if the PM core is going to skip the ->suspend callbacks then it
> > > is obliged to skip the ->resume callbacks as well (we mustn't call one
> > > without the other). Therefore, in cases where (a) holds, the device
> > > will necessarily emerge from the system resume in a runtime-suspended
> > > state.
> >
> > The "emerge from system resume" requires a bit of clarification in my
> > opinion. Do you refer to the status of the device when user space is
> > thawed or earlier? If earlier, then when exactly?
>
> The status of the device when its ->resume callback returns. Or in
> the context of your patch, when the ->resume callback is skipped.
>
> > > This may or may not cause problems for the device's children;
> > > see below.
> > >
> > > (b) It's okay for the device's parent to be in runtime suspend
> > > when the device's ->suspend callbacks are invoked.
> > >
> > > I included this just to be thorough. In fact, I expect (b) to be true
> > > for pretty much every device already.
> >
> > I don't quite understand this. What if the parent is a bridge and the
> > child's ->suspend tries to access the child's registers? That surely won't
> > work if the parent is in a low-power state at that point.
>
> It _does_ work on all current systems -- but only because the question
> never arises if the device's parent is never in runtime suspend when
> the device's ->suspend callbacks are invoked.
>
> I admit, there most likely _are_ devices that would get into trouble if
> the question ever did arise.

Well, I kind of put that to a test by posting these two patches:

https://patchwork.kernel.org/patch/3705261/
https://patchwork.kernel.org/patch/3705271/

We'll see if they lead to any regressions, but I'm going to work on top of
them going forward anyway.

> > > In the absence of any sort of special arrangement, if (b) wasn't true
> > > for some device then that device would already be experiencing problems
> > > going into system suspend.
> >
> > Unless, of course, its parent is a PCI device, because in that case it will
> > always be resumed by the PCI bus type. And if the "always resumed" is going
> > to be changed to "only resumed if the parent's configuration needs to change",
> > there may be some regressions here and there.
>
> Okay, so you want to take a problem involving PCI and some other
> subsystems, and solve it by getting the PM core involved. And you want
> to mix it in with the idea of "fast suspend". Is that really the best
> approach?
>
> Here's another approach. Add an "okay for my parent to be
> runtime-suspended when my ->suspend and ->resume callbacks are invoked"
> flag to dev->power. Make pci_pm_prepare(dev) check this flag in every
> child of dev; if there are any children where the flag isn't set then
> call pm_runtime_resume(dev) (and print a message in the kernel log so
> that you know which driver needs to be fixed).
>
> This is appropriate because it adjusts the PCI core in a way that can
> safely avoid regressions, without getting the PM core involved in
> matters that should remain strictly between the PCI subsystem and the
> subsystems of children of PCI devices.
>
> It also allows the "fast suspend" change to be cleanly separated out
> into a second patch. In that patch, all you do is set
> dev->power.fast_suspend if ->prepare returns > 0, and then you skip
> ->suspend and ->resume if fast_suspend is set and the device is
> runtime-suspended.

Actually, on top of the two patches mentioned above (and for devices
without power.ignore_children set) the question reduces to whether or not
(i) the device itself is runtime-suspended when its .suspend() callback is
running and (ii) its power state is such that it can remain suspended.
If both (i) and (ii) are met, the device may be left suspended safely,
because if any of its children had depended on it, they would have resumed
it already.

Still, I think that something like power.fast_suspend is needed to indicate
that .suspend_late(), .suspend_noirq(), .resume_noirq() and .resume_early()
should be skipped for it (in my opinion the core may very well skip them then)
and so that .resume() knows how to handle the device.

I'll prepare a new series working along these lines.

> > I agree with the idea, but I have a certain view on how to achieve it.
> > Which is by allowing the "good" ones to mark themselves as "good", then
> > go through the ones that aren't marked as "good" yet, fix them up and
> > mark them as "good". Finally, when everyone is marked "good", we can
> > drop the marking.
>
> How will you know when nothing remains unmarked?
>
> > The reasoning was basically to set fast_suspend for a device if your condition
> > (a) held *and* if fast_suspend was set for all of the device's children. Then
> > we would know that all those children would be RPM_SUSPENDED during system
> > resume as well and the resume part might be "streamlined" as well. There was
> > nothing about (c) anywhere in that patchset. :-)
>
> This seems like getting the answer to the wrong question. You want to
> know whether the children are okay with the parent being
> runtime-suspended during the child's suspend and resume callbacks, but
> instead you are asking if the children can remain runtime-suspended
> through the entire system suspend. Those are two different questions.
>
> They seem related, because if the parent is in runtime suspend then the
> child must also be in runtime suspend (disregarding the possibility of
> ignore_children). But this is a red herring. It's entirely possible
> for the child to be RPM_ACTIVE during one of these callback even if the
> parent is RPM_SUSPENDED when the callback starts. The child's driver
> merely needs to call pm_runtime_resume(dev) at the beginning of the
> callback.
>
> If the child's driver does this, it would be perfectly okay for the
> parent to use fast_suspend without the child using it too. You could
> skip calling the parent's suspend and resume callbacks, and the child
> would work just fine.
>
> > That would be a different optimization from the one I'm thinking about.
> >
> > For now, I'm focusing on one problem, which is when resuming runtime-suspended
> > devices during system suspend may be avoided and how to make that generally
> > work for different parent-child arrangements.
> >
> > The resume part changes in my patchset were consequences of that only.
>
> See, I think that by considering this as a single problem, you aren't
> getting down to the fundamental issues.

Well, I'm not sure what exactly you mean.

I generally agree that whether or not a device may be left suspended during and
after system resume and whether or not a device may be left suspended during
system suspend are two different questions. However, when it *is* left
suspended during system suspend, then that implies certain way of handling it
during the subsequent system resume. After which it still may not be left
suspended.

Rafael

2014-02-26 16:49:09

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH 1/3] PM / sleep: New flag to speed up suspend-resume of suspended devices

On Wed, 26 Feb 2014, Rafael J. Wysocki wrote:

> > I admit, there most likely _are_ devices that would get into trouble if
> > the question ever did arise.
>
> Well, I kind of put that to a test by posting these two patches:
>
> https://patchwork.kernel.org/patch/3705261/
> https://patchwork.kernel.org/patch/3705271/
>
> We'll see if they lead to any regressions, but I'm going to work on top of
> them going forward anyway.

And here I had the impression that you wanted to avoid any regressions
from those patches...

> Actually, on top of the two patches mentioned above (and for devices
> without power.ignore_children set) the question reduces to whether or not
> (i) the device itself is runtime-suspended when its .suspend() callback is
> running and (ii) its power state is such that it can remain suspended.
> If both (i) and (ii) are met, the device may be left suspended safely,
> because if any of its children had depended on it, they would have resumed
> it already.

Does this mean you changed your mind? In an earlier email, you wrote:

> > (b) It's okay for the device's parent to be in runtime suspend
> > when the device's ->suspend callbacks are invoked.
> >
> > I included this just to be thorough. In fact, I expect (b) to be true
> > for pretty much every device already.
>
> I don't quite understand this. What if the parent is a bridge and the
> child's ->suspend tries to access the child's registers? That surely won't
> work if the parent is in a low-power state at that point.

So the answer is that if the bridge is suspended, then the child must
be suspended too and hence the child's ->suspend should _expect_
problems if it tries to access the child's registers.

(By the way, during this discussion I have had a tendency to mix up two
related concepts:

The device's ->suspend routine expects the _parent_ not to be
suspended;

The device's ->suspend routine expects the _device_ not to be
suspended.

Obviously the second implies the first. But once the second has been
fixed, the first should never cause any trouble.)

> Still, I think that something like power.fast_suspend is needed to indicate
> that .suspend_late(), .suspend_noirq(), .resume_noirq() and .resume_early()
> should be skipped for it (in my opinion the core may very well skip them then)
> and so that .resume() knows how to handle the device.

I don't follow. Why would you skip these routines without also
skipping .suspend and .resume?

> I generally agree that whether or not a device may be left suspended during and
> after system resume and whether or not a device may be left suspended during
> system suspend are two different questions. However, when it *is* left
> suspended during system suspend, then that implies certain way of handling it
> during the subsequent system resume. After which it still may not be left
> suspended.

I would prefer to say: "However, when the system suspend callbacks
_are_ skipped, that implies the corresponding system resume callbacks
must also be skipped and hence the device must remain suspended". Is
this consistent with what you meant?

As I see it, the fast_suspend implementation could lead to regressions
in two ways:

The child's ->suspend doesn't expect the parent to be
suspended.

The child's ->resume doesn't expect the parent to be
suspended.

We agree now that the first won't be a problem, because it would imply
the child is suspended too.

However, the second may indeed be a problem. I don't know how you
intend to handle it. Apply the patch, like you did for ACPI and PCI
above, and then see what happens?

A simple solution is to use fast_suspend only for devices that have no
children. But that would not be optimal.

Another possibility is always to call pm_runtime_resume(dev->parent)
before invoking dev's ->resume callback. But that might not solve the
entire problem (it wouldn't help dev's ->resume_early callback, for
instance) and it also might be sub-optimal.

Alan Stern

2014-02-26 21:29:35

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/3] PM / sleep: New flag to speed up suspend-resume of suspended devices

On Wednesday, February 26, 2014 11:49:05 AM Alan Stern wrote:
> On Wed, 26 Feb 2014, Rafael J. Wysocki wrote:
>
> > > I admit, there most likely _are_ devices that would get into trouble if
> > > the question ever did arise.
> >
> > Well, I kind of put that to a test by posting these two patches:
> >
> > https://patchwork.kernel.org/patch/3705261/
> > https://patchwork.kernel.org/patch/3705271/
> >
> > We'll see if they lead to any regressions, but I'm going to work on top of
> > them going forward anyway.
>
> And here I had the impression that you wanted to avoid any regressions
> from those patches...

In the meantime I realized that regressions from them are quite unlikely and
if they happen, they will be limited to corner cases.

The reason is (as you kind of noted below) that (assuming that runtime PM is
supported for both the child and the parent) if the child is to be accessed
in ->suspend, for example, it should be runtime-resumed beforehand and that
will cause the parent to be runtime-resumed as well (the power.ignore_children
devices are special-cased explicitly).

> > Actually, on top of the two patches mentioned above (and for devices
> > without power.ignore_children set) the question reduces to whether or not
> > (i) the device itself is runtime-suspended when its .suspend() callback is
> > running and (ii) its power state is such that it can remain suspended.
> > If both (i) and (ii) are met, the device may be left suspended safely,
> > because if any of its children had depended on it, they would have resumed
> > it already.
>
> Does this mean you changed your mind? In an earlier email, you wrote:

I've just realized that if the pm_runtime_resume() is moved from ->prepare
to ->suspend, then it is done *after* the ->suspend callbacks of the
children have been executed, which means that if the parent is runtime-suspended
at this point, the children are runtime-suspended either. Moreover, they
can't be runtime-resumed in ->suspend_late, because that is executed with
runtime PM disabled, so it is reasonalby safe to assume that they won't need
the parent going forward - until ->resume.

> > > (b) It's okay for the device's parent to be in runtime suspend
> > > when the device's ->suspend callbacks are invoked.
> > >
> > > I included this just to be thorough. In fact, I expect (b) to be true
> > > for pretty much every device already.
> >
> > I don't quite understand this. What if the parent is a bridge and the
> > child's ->suspend tries to access the child's registers? That surely won't
> > work if the parent is in a low-power state at that point.
>
> So the answer is that if the bridge is suspended, then the child must
> be suspended too and hence the child's ->suspend should _expect_
> problems if it tries to access the child's registers.

Agreed.

> (By the way, during this discussion I have had a tendency to mix up two
> related concepts:
>
> The device's ->suspend routine expects the _parent_ not to be
> suspended;
>
> The device's ->suspend routine expects the _device_ not to be
> suspended.
>
> Obviously the second implies the first. But once the second has been
> fixed, the first should never cause any trouble.)
>
> > Still, I think that something like power.fast_suspend is needed to indicate
> > that .suspend_late(), .suspend_noirq(), .resume_noirq() and .resume_early()
> > should be skipped for it (in my opinion the core may very well skip them then)
> > and so that .resume() knows how to handle the device.
>
> I don't follow. Why would you skip these routines without also
> skipping .suspend and .resume?

Because .suspend will set the flag and then it would be reasonable to call .resume,
for symmetry and to let it decide what to do (e.g. call pm_runtime_resume(dev) or
do something else, depending on the subsystem).

> > I generally agree that whether or not a device may be left suspended during and
> > after system resume and whether or not a device may be left suspended during
> > system suspend are two different questions. However, when it *is* left
> > suspended during system suspend, then that implies certain way of handling it
> > during the subsequent system resume. After which it still may not be left
> > suspended.
>
> I would prefer to say: "However, when the system suspend callbacks
> _are_ skipped, that implies the corresponding system resume callbacks
> must also be skipped and hence the device must remain suspended". Is
> this consistent with what you meant?

Yes, it is.

> As I see it, the fast_suspend implementation could lead to regressions
> in two ways:
>
> The child's ->suspend doesn't expect the parent to be
> suspended.
>
> The child's ->resume doesn't expect the parent to be
> suspended.
>
> We agree now that the first won't be a problem, because it would imply
> the child is suspended too.

Yes.

> However, the second may indeed be a problem. I don't know how you
> intend to handle it. Apply the patch, like you did for ACPI and PCI
> above, and then see what happens?

For starters, I'd just make the parent's ->resume call pm_runtime_resume(dev).
That will make the parent be ready before the child's ->resume is called.
And then it may be optimized further going forward, possibly by replacing
the pm_runtime_resume() with pm_request_resume() for some devices and by
leaving some devices in RPM_SUSPENDED.

> A simple solution is to use fast_suspend only for devices that have no
> children. But that would not be optimal.
>
> Another possibility is always to call pm_runtime_resume(dev->parent)
> before invoking dev's ->resume callback. But that might not solve the
> entire problem (it wouldn't help dev's ->resume_early callback, for
> instance) and it also might be sub-optimal.

The child's ->resume_early may be a problem indeed (or its ->resume_noirq
for that matter).

Well, if power.fast_suspend set guarantees that ->suspend_late, ->suspend_noirq,
->resume_noirq, and ->resume_early will be skipped for a device, then we may
restrict setting it for devices whose children have it set (or that have no
children). Initially, that will be equivalent to setting it for leaf devices
only, but it might be extended over time in a natural way.

Rafael

2014-02-26 22:17:09

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH 1/3] PM / sleep: New flag to speed up suspend-resume of suspended devices

On Wed, 26 Feb 2014, Rafael J. Wysocki wrote:

> > > Still, I think that something like power.fast_suspend is needed to indicate
> > > that .suspend_late(), .suspend_noirq(), .resume_noirq() and .resume_early()
> > > should be skipped for it (in my opinion the core may very well skip them then)
> > > and so that .resume() knows how to handle the device.
> >
> > I don't follow. Why would you skip these routines without also
> > skipping .suspend and .resume?
>
> Because .suspend will set the flag and then it would be reasonable to call .resume,
> for symmetry and to let it decide what to do (e.g. call pm_runtime_resume(dev) or
> do something else, depending on the subsystem).

In the original patch, ->prepare returned the flag. When it was set,
you would skip ->suspend, ->suspend_late, and ->suspend_noirq (and the
corresponding resume callbacks). Did you decide to change this?

> > However, the second may indeed be a problem. I don't know how you
> > intend to handle it. Apply the patch, like you did for ACPI and PCI
> > above, and then see what happens?
>
> For starters, I'd just make the parent's ->resume call pm_runtime_resume(dev).
> That will make the parent be ready before the child's ->resume is called.
> And then it may be optimized further going forward, possibly by replacing
> the pm_runtime_resume() with pm_request_resume() for some devices and by
> leaving some devices in RPM_SUSPENDED.

Of course, this would not be possible with the original version of the
patch, because it wouldn't invoke the parent's ->resume.

> > A simple solution is to use fast_suspend only for devices that have no
> > children. But that would not be optimal.
> >
> > Another possibility is always to call pm_runtime_resume(dev->parent)
> > before invoking dev's ->resume callback. But that might not solve the
> > entire problem (it wouldn't help dev's ->resume_early callback, for
> > instance) and it also might be sub-optimal.
>
> The child's ->resume_early may be a problem indeed (or its ->resume_noirq
> for that matter).

If the child knows about the problem beforehand, it can runtime-resume
the parent during its ->suspend.

> Well, if power.fast_suspend set guarantees that ->suspend_late, ->suspend_noirq,
> ->resume_noirq, and ->resume_early will be skipped for a device, then we may
> restrict setting it for devices whose children have it set (or that have no
> children). Initially, that will be equivalent to setting it for leaf devices
> only, but it might be extended over time in a natural way.

Initially, maybe. But it's the wrong approach in general. The right
approach is to restrict setting fast_suspend for devices whose children
don't mind their parent being suspended when their resume callbacks
run -- not for devices whose children also have fast_suspend set.

That's the point I've been trying to express all along.

Alan Stern

2014-02-26 22:58:20

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/3] PM / sleep: New flag to speed up suspend-resume of suspended devices

On Wednesday, February 26, 2014 05:17:03 PM Alan Stern wrote:
> On Wed, 26 Feb 2014, Rafael J. Wysocki wrote:
>
> > > > Still, I think that something like power.fast_suspend is needed to indicate
> > > > that .suspend_late(), .suspend_noirq(), .resume_noirq() and .resume_early()
> > > > should be skipped for it (in my opinion the core may very well skip them then)
> > > > and so that .resume() knows how to handle the device.
> > >
> > > I don't follow. Why would you skip these routines without also
> > > skipping .suspend and .resume?
> >
> > Because .suspend will set the flag and then it would be reasonable to call .resume,
> > for symmetry and to let it decide what to do (e.g. call pm_runtime_resume(dev) or
> > do something else, depending on the subsystem).
>
> In the original patch, ->prepare returned the flag. When it was set,
> you would skip ->suspend, ->suspend_late, and ->suspend_noirq (and the
> corresponding resume callbacks). Did you decide to change this?

Yes, I did.

After these patches:

https://patchwork.kernel.org/patch/3705261/
https://patchwork.kernel.org/patch/3705271/

the decision doesn't have to be made until ->suspend (I'm ingoring the
power.ignore_children set special case), because that's when
pm_runtime_resume(dev) is now called (by ACPI and PCI).

> > > However, the second may indeed be a problem. I don't know how you
> > > intend to handle it. Apply the patch, like you did for ACPI and PCI
> > > above, and then see what happens?
> >
> > For starters, I'd just make the parent's ->resume call pm_runtime_resume(dev).
> > That will make the parent be ready before the child's ->resume is called.
> > And then it may be optimized further going forward, possibly by replacing
> > the pm_runtime_resume() with pm_request_resume() for some devices and by
> > leaving some devices in RPM_SUSPENDED.
>
> Of course, this would not be possible with the original version of the
> patch, because it wouldn't invoke the parent's ->resume.

Right.

> > > A simple solution is to use fast_suspend only for devices that have no
> > > children. But that would not be optimal.
> > >
> > > Another possibility is always to call pm_runtime_resume(dev->parent)
> > > before invoking dev's ->resume callback. But that might not solve the
> > > entire problem (it wouldn't help dev's ->resume_early callback, for
> > > instance) and it also might be sub-optimal.
> >
> > The child's ->resume_early may be a problem indeed (or its ->resume_noirq
> > for that matter).
>
> If the child knows about the problem beforehand, it can runtime-resume
> the parent during its ->suspend.

Well, it even should do that in those cases. We may need to deal with children
that don't do that, though.

> > Well, if power.fast_suspend set guarantees that ->suspend_late, ->suspend_noirq,
> > ->resume_noirq, and ->resume_early will be skipped for a device, then we may
> > restrict setting it for devices whose children have it set (or that have no
> > children). Initially, that will be equivalent to setting it for leaf devices
> > only, but it might be extended over time in a natural way.
>
> Initially, maybe.

Of course initially.

> But it's the wrong approach in general.

In the long run - I agree.

> The right approach is to restrict setting fast_suspend for devices whose
> children don't mind their parent being suspended when their resume callbacks
> run -- not for devices whose children also have fast_suspend set.

I agree, but we need to know which children are OK with the parent being
suspended. Having fast_suspend set is a good indication of that. :-)

Of course, we may introduce a separate flag for that just fine if you prefer.

> That's the point I've been trying to express all along.

I see.

Rafael

2014-02-27 15:02:11

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH 1/3] PM / sleep: New flag to speed up suspend-resume of suspended devices

On Thu, 27 Feb 2014, Rafael J. Wysocki wrote:

> > If the child knows about the problem beforehand, it can runtime-resume
> > the parent during its ->suspend.
>
> Well, it even should do that in those cases. We may need to deal with children
> that don't do that, though.
>
> > > Well, if power.fast_suspend set guarantees that ->suspend_late, ->suspend_noirq,
> > > ->resume_noirq, and ->resume_early will be skipped for a device, then we may
> > > restrict setting it for devices whose children have it set (or that have no
> > > children). Initially, that will be equivalent to setting it for leaf devices
> > > only, but it might be extended over time in a natural way.
> >
> > Initially, maybe.
>
> Of course initially.
>
> > But it's the wrong approach in general.
>
> In the long run - I agree.
>
> > The right approach is to restrict setting fast_suspend for devices whose
> > children don't mind their parent being suspended when their resume callbacks
> > run -- not for devices whose children also have fast_suspend set.
>
> I agree, but we need to know which children are OK with the parent being
> suspended. Having fast_suspend set is a good indication of that. :-)
>
> Of course, we may introduce a separate flag for that just fine if you prefer.
>
> > That's the point I've been trying to express all along.
>
> I see.

Okay. I'll wait to see the next version.

Alan Stern

2014-04-24 22:24:19

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 2/3][Resend] PM / runtime: Routine for checking device status during system suspend

From: Rafael J. Wysocki <[email protected]>

Add a new helper routine, pm_runtime_enabled_and_suspended(), to
allow subsystems (or PM domains) to check the runtime PM status of
devices during system suspend (possibly to avoid resuming those
devices upfront at that time).

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/power/runtime.c | 28 ++++++++++++++++++++++++++++
include/linux/pm_runtime.h | 2 ++
2 files changed, 30 insertions(+)

Index: linux-pm/include/linux/pm_runtime.h
===================================================================
--- linux-pm.orig/include/linux/pm_runtime.h
+++ linux-pm/include/linux/pm_runtime.h
@@ -57,6 +57,7 @@ extern unsigned long pm_runtime_autosusp
extern void pm_runtime_update_max_time_suspended(struct device *dev,
s64 delta_ns);
extern void pm_runtime_set_memalloc_noio(struct device *dev, bool enable);
+extern bool pm_runtime_enabled_and_suspended(struct device *dev);

static inline bool pm_children_suspended(struct device *dev)
{
@@ -165,6 +166,7 @@ static inline unsigned long pm_runtime_a
struct device *dev) { return 0; }
static inline void pm_runtime_set_memalloc_noio(struct device *dev,
bool enable){}
+static inline bool pm_runtime_enabled_and_suspended(struct device *dev) { return false; }

#endif /* !CONFIG_PM_RUNTIME */

Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -1195,6 +1195,34 @@ void pm_runtime_enable(struct device *de
EXPORT_SYMBOL_GPL(pm_runtime_enable);

/**
+ * pm_runtime_enabled_and_suspended - Check runtime PM status of a device.
+ * @dev: Device to handle.
+ *
+ * This routine is to be executed during system suspend only, after
+ * device_prepare() has been executed for @dev.
+ *
+ * Return false if runtime PM is disabled for the device. Otherwise, wait
+ * for pending transitions to complete and check the runtime PM status of the
+ * device after that. Return true if it is RPM_SUSPENDED.
+ */
+bool pm_runtime_enabled_and_suspended(struct device *dev)
+{
+ unsigned long flags;
+ bool ret;
+
+ spin_lock_irqsave(&dev->power.lock, flags);
+ if (dev->power.disable_depth) {
+ ret = false;
+ } else {
+ __pm_runtime_barrier(dev);
+ ret = pm_runtime_status_suspended(dev);
+ }
+ spin_unlock_irqrestore(&dev->power.lock, flags);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(pm_runtime_enabled_and_suspended);
+
+/**
* pm_runtime_forbid - Block runtime PM of a device.
* @dev: Device to handle.
*

2014-04-24 22:24:16

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 1/3] PM / sleep: Flags to speed up suspend-resume of runtime-suspended devices

From: Rafael J. Wysocki <[email protected]>

Currently, some subsystems (e.g. PCI and the ACPI PM domain) have to
resume all runtime-suspended devices during system suspend, mostly
because those devices may need to be reprogrammed due to different
wakeup settings for system sleep and for runtime PM. However, at
least in some cases, that isn't really necessary, because the wakeup
settings may not be really different.

The idea here is that subsystems should know whether or not it is
necessary to reprogram a given device during system suspend and they
should be able to tell the PM core about that. For that reason, add
two new device PM flags, power.resume_not_needed and
power.use_runtime_resume, such that:

(1) If power.resume_not_needed is set for the given device and for
all of its children and the device is runtime-suspended during
device_suspend(), the remaining device's system suspend/resume
callbacks need not be executed.

(2) If power.use_runtime_resume is set for the given device and the
device is runtime-suspended in device_suspend_late(), its late/early
and noirq system suspend/resume callbacks should be skipped and
it should be resumed through pm_runtime_resume() in device_resume().

Those flags are cleared by the PM core in dpm_prepare() for all
devices. Next, subsystems (or drivers) are supposed to set
power.resume_not_needed in their ->prepare() callbacks for devices
whose remaining system suspend/resume callbacks are generally safe to
be skipped if they are runtime-suspended already. Finally, for each
runtime-suspended device with power.resume_not_needed set during
device_suspend(), its subsystem (or driver) may set the device's
power.use_runtime_resume in its ->suspend() callback without changing
the device's state, in which case the PM core will skip all of the
subsequent system suspend/resume callbacks for it and will resume it
in device_resume() using pm_runtime_resume().

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/power/main.c | 41 ++++++++++++++++++++++++++++++++++-------
include/linux/pm.h | 2 ++
2 files changed, 36 insertions(+), 7 deletions(-)

Index: linux-pm/include/linux/pm.h
===================================================================
--- linux-pm.orig/include/linux/pm.h
+++ linux-pm/include/linux/pm.h
@@ -546,6 +546,8 @@ struct dev_pm_info {
bool is_late_suspended:1;
bool ignore_children:1;
bool early_init:1; /* Owned by the PM core */
+ bool resume_not_needed:1;
+ bool use_runtime_resume:1;
spinlock_t lock;
#ifdef CONFIG_PM_SLEEP
struct list_head entry;
Index: linux-pm/drivers/base/power/main.c
===================================================================
--- linux-pm.orig/drivers/base/power/main.c
+++ linux-pm/drivers/base/power/main.c
@@ -479,7 +479,7 @@ static int device_resume_noirq(struct de
TRACE_DEVICE(dev);
TRACE_RESUME(0);

- if (dev->power.syscore)
+ if (dev->power.syscore || dev->power.use_runtime_resume)
goto Out;

if (!dev->power.is_noirq_suspended)
@@ -605,7 +605,7 @@ static int device_resume_early(struct de
TRACE_DEVICE(dev);
TRACE_RESUME(0);

- if (dev->power.syscore)
+ if (dev->power.syscore || dev->power.use_runtime_resume)
goto Out;

if (!dev->power.is_late_suspended)
@@ -735,6 +735,11 @@ static int device_resume(struct device *
if (dev->power.syscore)
goto Complete;

+ if (dev->power.use_runtime_resume) {
+ pm_runtime_resume(dev);
+ goto Complete;
+ }
+
dpm_wait(dev->parent, async);
dpm_watchdog_set(&wd, dev);
device_lock(dev);
@@ -1007,7 +1012,7 @@ static int __device_suspend_noirq(struct
goto Complete;
}

- if (dev->power.syscore)
+ if (dev->power.syscore || dev->power.use_runtime_resume)
goto Complete;

dpm_wait_for_children(dev, async);
@@ -1146,7 +1151,7 @@ static int __device_suspend_late(struct
goto Complete;
}

- if (dev->power.syscore)
+ if (dev->power.syscore || dev->power.use_runtime_resume)
goto Complete;

dpm_wait_for_children(dev, async);
@@ -1383,9 +1388,29 @@ static int __device_suspend(struct devic
End:
if (!error) {
dev->power.is_suspended = true;
- if (dev->power.wakeup_path
- && dev->parent && !dev->parent->power.ignore_children)
- dev->parent->power.wakeup_path = true;
+ if (dev->parent) {
+ spin_lock_irq(&dev->parent->power.lock);
+
+ if (dev->power.wakeup_path
+ && !dev->parent->power.ignore_children)
+ dev->parent->power.wakeup_path = true;
+
+ /*
+ * Subsystems are supposed to set resume_not_needed in
+ * their ->prepare() callbacks for devices whose
+ * remaining system suspend and resume callbacks are
+ * generally safe to be skipped if the devices are
+ * runtime-suspended.
+ *
+ * If the device's resume_not_needed is not set at this
+ * point, it has to be cleared for its parent too (even
+ * if the subsystem has set that flag for it).
+ */
+ if (!dev->power.resume_not_needed)
+ dev->parent->power.resume_not_needed = false;
+
+ spin_unlock_irq(&dev->parent->power.lock);
+ }
}

device_unlock(dev);
@@ -1553,6 +1578,8 @@ int dpm_prepare(pm_message_t state)
struct device *dev = to_device(dpm_list.next);

get_device(dev);
+ dev->power.use_runtime_resume = false;
+ dev->power.resume_not_needed = false;
mutex_unlock(&dpm_list_mtx);

error = device_prepare(dev, state);

2014-04-24 22:25:37

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 3/3] ACPI / PM: Avoid resuming devices in ACPI PM domain during system suspend

From: Rafael J. Wysocki <[email protected]>

Rework the ACPI PM domain's PM callbacks to avoid resuming devices
during system suspend (in order to modify their wakeup settings etc.)
if that isn't necessary.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/acpi/device_pm.c | 33 ++++++++++++++++++++++++++++++---
drivers/acpi/scan.c | 4 ++++
include/acpi/acpi_bus.h | 3 ++-
3 files changed, 36 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/acpi/device_pm.c
===================================================================
--- linux-pm.orig/drivers/acpi/device_pm.c
+++ linux-pm/drivers/acpi/device_pm.c
@@ -907,6 +907,7 @@ int acpi_subsys_prepare(struct device *d
if (dev->power.ignore_children)
pm_runtime_resume(dev);

+ dev->power.resume_not_needed = true;
return pm_generic_prepare(dev);
}
EXPORT_SYMBOL_GPL(acpi_subsys_prepare);
@@ -914,13 +915,39 @@ EXPORT_SYMBOL_GPL(acpi_subsys_prepare);
/**
* acpi_subsys_suspend - Run the device driver's suspend callback.
* @dev: Device to handle.
- *
- * Follow PCI and resume devices suspended at run time before running their
- * system suspend callbacks.
*/
int acpi_subsys_suspend(struct device *dev)
{
+ struct acpi_device *adev = ACPI_COMPANION(dev);
+ u32 sys_target;
+
+ if (!adev || !pm_runtime_enabled_and_suspended(dev))
+ goto out;
+
+ if (!dev->power.resume_not_needed
+ || device_may_wakeup(dev) != !!adev->wakeup.prepare_count)
+ goto resume;
+
+ sys_target = acpi_target_system_state();
+ if (sys_target != ACPI_STATE_S0) {
+ int ret, state;
+
+ if (adev->power.flags.dsw_present)
+ goto resume;
+
+ ret = acpi_dev_pm_get_state(dev, adev, sys_target, NULL, &state);
+ if (ret || state != adev->power.state)
+ goto resume;
+ }
+
+ dev->power.use_runtime_resume = true;
+ return 0;
+
+ resume:
pm_runtime_resume(dev);
+
+ out:
+ dev->power.resume_not_needed = false;
return pm_generic_suspend(dev);
}

Index: linux-pm/include/acpi/acpi_bus.h
===================================================================
--- linux-pm.orig/include/acpi/acpi_bus.h
+++ linux-pm/include/acpi/acpi_bus.h
@@ -261,7 +261,8 @@ struct acpi_device_power_flags {
u32 inrush_current:1; /* Serialize Dx->D0 */
u32 power_removed:1; /* Optimize Dx->D0 */
u32 ignore_parent:1; /* Power is independent of parent power state */
- u32 reserved:27;
+ u32 dsw_present:1; /* _DSW present? */
+ u32 reserved:26;
};

struct acpi_device_power_state {
Index: linux-pm/drivers/acpi/scan.c
===================================================================
--- linux-pm.orig/drivers/acpi/scan.c
+++ linux-pm/drivers/acpi/scan.c
@@ -1551,9 +1551,13 @@ static void acpi_bus_get_power_flags(str
*/
if (acpi_has_method(device->handle, "_PSC"))
device->power.flags.explicit_get = 1;
+
if (acpi_has_method(device->handle, "_IRC"))
device->power.flags.inrush_current = 1;

+ if (acpi_has_method(device->handle, "_DSW"))
+ device->power.flags.dsw_present = 1;
+
/*
* Enumerate supported power management states
*/

2014-04-24 22:25:36

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 0/3] PM: Mechanism to avoid resuming runtime-suspended devices during system suspend, v2

On Thursday, February 27, 2014 10:02:05 AM Alan Stern wrote:
> On Thu, 27 Feb 2014, Rafael J. Wysocki wrote:
>
> > > If the child knows about the problem beforehand, it can runtime-resume
> > > the parent during its ->suspend.
> >
> > Well, it even should do that in those cases. We may need to deal with children
> > that don't do that, though.
> >
> > > > Well, if power.fast_suspend set guarantees that ->suspend_late, ->suspend_noirq,
> > > > ->resume_noirq, and ->resume_early will be skipped for a device, then we may
> > > > restrict setting it for devices whose children have it set (or that have no
> > > > children). Initially, that will be equivalent to setting it for leaf devices
> > > > only, but it might be extended over time in a natural way.
> > >
> > > Initially, maybe.
> >
> > Of course initially.
> >
> > > But it's the wrong approach in general.
> >
> > In the long run - I agree.
> >
> > > The right approach is to restrict setting fast_suspend for devices whose
> > > children don't mind their parent being suspended when their resume callbacks
> > > run -- not for devices whose children also have fast_suspend set.
> >
> > I agree, but we need to know which children are OK with the parent being
> > suspended. Having fast_suspend set is a good indication of that. :-)
> >
> > Of course, we may introduce a separate flag for that just fine if you prefer.
> >
> > > That's the point I've been trying to express all along.
> >
> > I see.
>
> Okay. I'll wait to see the next version.

Well, that took some time, but a new version follows.

It uses two flags (see the changelog of patch [1/3]) and is reworked on top of
the changes that went into 3.15-rc. Patch [2/3] is just a resend.

Thanks,
Rafael

2014-04-25 11:28:30

by Ulf Hansson

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/3][Resend] PM / runtime: Routine for checking device status during system suspend

On 25 April 2014 00:39, Rafael J. Wysocki <[email protected]> wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Add a new helper routine, pm_runtime_enabled_and_suspended(), to
> allow subsystems (or PM domains) to check the runtime PM status of
> devices during system suspend (possibly to avoid resuming those
> devices upfront at that time).
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> drivers/base/power/runtime.c | 28 ++++++++++++++++++++++++++++
> include/linux/pm_runtime.h | 2 ++
> 2 files changed, 30 insertions(+)
>
> Index: linux-pm/include/linux/pm_runtime.h
> ===================================================================
> --- linux-pm.orig/include/linux/pm_runtime.h
> +++ linux-pm/include/linux/pm_runtime.h
> @@ -57,6 +57,7 @@ extern unsigned long pm_runtime_autosusp
> extern void pm_runtime_update_max_time_suspended(struct device *dev,
> s64 delta_ns);
> extern void pm_runtime_set_memalloc_noio(struct device *dev, bool enable);
> +extern bool pm_runtime_enabled_and_suspended(struct device *dev);
>
> static inline bool pm_children_suspended(struct device *dev)
> {
> @@ -165,6 +166,7 @@ static inline unsigned long pm_runtime_a
> struct device *dev) { return 0; }
> static inline void pm_runtime_set_memalloc_noio(struct device *dev,
> bool enable){}
> +static inline bool pm_runtime_enabled_and_suspended(struct device *dev) { return false; }
>
> #endif /* !CONFIG_PM_RUNTIME */
>
> Index: linux-pm/drivers/base/power/runtime.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/runtime.c
> +++ linux-pm/drivers/base/power/runtime.c
> @@ -1195,6 +1195,34 @@ void pm_runtime_enable(struct device *de
> EXPORT_SYMBOL_GPL(pm_runtime_enable);
>
> /**
> + * pm_runtime_enabled_and_suspended - Check runtime PM status of a device.
> + * @dev: Device to handle.
> + *
> + * This routine is to be executed during system suspend only, after
> + * device_prepare() has been executed for @dev.

Hi Rafael,

Do we really need to state the above constraints. Could we not leave
it to be used in other scenarios as well? I am not sure those would
exists though, but still. :-)

Reviewed-by: Ulf Hansson <[email protected]>

Kind regards
Ulf Hansson

> + *
> + * Return false if runtime PM is disabled for the device. Otherwise, wait
> + * for pending transitions to complete and check the runtime PM status of the
> + * device after that. Return true if it is RPM_SUSPENDED.
> + */
> +bool pm_runtime_enabled_and_suspended(struct device *dev)
> +{
> + unsigned long flags;
> + bool ret;
> +
> + spin_lock_irqsave(&dev->power.lock, flags);
> + if (dev->power.disable_depth) {
> + ret = false;
> + } else {
> + __pm_runtime_barrier(dev);
> + ret = pm_runtime_status_suspended(dev);
> + }
> + spin_unlock_irqrestore(&dev->power.lock, flags);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(pm_runtime_enabled_and_suspended);
> +
> +/**
> * pm_runtime_forbid - Block runtime PM of a device.
> * @dev: Device to handle.
> *
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html