2008-03-21 00:03:51

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

Hi,

Below is an updated version of the $subject patch. It has only been
compilation tested, but I'd like to get as much feedback as reasonably possible
at this stage to avoid redesigning things later in the process.

The most important changes from the previous one are the following:
* Callbacks to be executed with interrupts disabled are now separated from
the "regular" ones. There still are six of them, but IMHO this is what may
be necessary in the most general case (eg. a driver may have to carry out
some operations with interrupts disabled, which are different for SUSPEND,
FREEZE and HIBERNATE; the complementary "resume" callbacks are needed for
error recovery, for example)
* It occured to me that the ->freeze() and ->quiesce() callbacks were
essentially the same, so I removed the latter
* The ->recover() callback was also dropped, as ->thaw() may well be used instead
just fine (it seems)
* ->prepare() and ->complete() are now called in separate loops which causes
some funny complications with error recovery. Namely, we must remember which
devices have already been ->prepare()d, so that we call ->complete() for them
in the error path. Moreover, there may be some ->prepare()d and some
->suspend()ed devices at the same time, so if there's an error we should
->resume() the ->suspend()ed ones and ->complete() everything etc.
This may be handled in many different ways, but I decided to introduce a new
list on which the ->complete()d devices are stored. Accordingly, the
registration of new children of a device is blocked between ->prepare() and
->complete() (it was only blocked between ->prepare() and ->resume() in the
previous version).

Please have a look.

Thanks,
Rafael

---
From: Rafael J. Wysocki <[email protected]>

Introduce 'struct pm_ops' and 'struct pm_noirq_ops' representing
suspend and hibernation operations for bus types, device classes and
device types.

Modify the PM core to use 'struct pm_ops' and 'struct pm_noirq_ops'
objects, if defined, instead of the ->suspend() and ->resume() or,
respectively, ->suspend_late() and ->resume_early() callbacks that
will be considered as legacy and gradually phased out.

The main purpose of doing this is to separate suspend (aka S2RAM and
standby) callbacks from hibernation callbacks in such a way that the
new callbacks won't take arguments and the semantics of each of them
will be clearly specified. ?This has been requested for multiple
times by many people, including Linus himself, and the reason is that
within the current scheme if ->resume() is called, for example, it's
difficult to say why it's been called (ie. is it a resume from RAM or
from hibernation or a suspend/hibernation failure etc.?).

The second purpose is to make the suspend/hibernation callbacks more
flexible so that device drivers can handle more than they can within
the current scheme. For example, some drivers may need to prevent
new children of the device from being registered before their
->suspend() callbacks are executed or they may want to carry out some
operations requiring the availability of some other devices, not
directly bound via the parent-child relationship, in order to prepare
for the execution of ->suspend(), etc.

Ultimately, we'd like to stop using the freezing of tasks for suspend
and therefore the drivers' suspend/hibernation code will have to take
care of the handling of the user space during suspend/hibernation.
That, in turn, would be difficult within the current scheme, without
the new ->prepare() and ->complete() callbacks.

Not-yet-signed-off-by: Rafael J. Wysocki <[email protected]>
---

arch/x86/kernel/apm_32.c | 4
drivers/base/power/main.c | 551 +++++++++++++++++++++++++++++++++++++---------
include/linux/device.h | 8
include/linux/pm.h | 273 ++++++++++++++++++++--
kernel/power/disk.c | 14 -
kernel/power/main.c | 4
6 files changed, 708 insertions(+), 146 deletions(-)

Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -114,7 +114,9 @@ typedef struct pm_message {
int event;
} pm_message_t;

-/*
+/**
+ * struct pm_ops - device PM callbacks
+ *
* Several driver power state transitions are externally visible, affecting
* the state of pending I/O queues and (for drivers that touch hardware)
* interrupts, wakeups, DMA, and other hardware state. There may also be
@@ -122,6 +124,245 @@ typedef struct pm_message {
* to the rest of the driver stack (such as a driver that's ON gating off
* clocks which are not in active use).
*
+ * The externally visible transitions are handled with the help of the following
+ * callbacks included in this structure:
+ *
+ * @prepare: Prepare the device for the upcoming transition, but do NOT change
+ * its hardware state. Prevent new children of the device from being
+ * registered and prevent new calls to the probe method from being made
+ * after @prepare() returns. If @prepare() detects a situation it cannot
+ * handle (e.g. registration of a child already in progress), it may return
+ * -EAGAIN, so that the PM core can execute it once again (e.g. after the
+ * new child has been registered) to recover from the race condition. This
+ * method is executed for all kinds of suspend transitions and is followed
+ * by one of the suspend callbacks: @suspend(), @freeze(), or @poweroff().
+ * The PM core executes @prepare() for all devices before starting to
+ * execute suspend callbacks for any of them, so drivers may assume all of
+ * the other devices to be present and functional while @prepare() is being
+ * executed. However, they may NOT assume anything about the availability
+ * of the user space at that time.
+ *
+ * @complete: Undo the changes made by @prepare(). This method is executed for
+ * all kinds of resume transitions, following one of the resume callbacks:
+ * @resume(), @thaw(), @restore(). Also called if the state transition
+ * fails before the driver's suspend callback (@suspend(), @freeze(),
+ * @poweroff()) can be executed (e.g. if the suspend callback fails for one
+ * of the other devices that the PM core has unsucessfully attempted to
+ * suspend earlier). Also executed if a new child of the device has been
+ * registered during a successful @prepare() (in that case @prepare() will
+ * be executed once again for the device).
+ * The PM core executes @complete() after it has executed the appropriate
+ * resume callback for all devices.
+ *
+ * @suspend: Executed before putting the system into a sleep state in which the
+ * contents of main memory are preserved. Quiesce the device, put it into
+ * a low power state appropriate for the upcoming system state (such as
+ * PCI_D3hot), and enable wakeup events as appropriate.
+ *
+ * @resume: Executed after waking the system up from a sleep state in which the
+ * contents of main memory were preserved. Put the device into the
+ * appropriate state, according to the information saved in memory by the
+ * preceding @suspend(). The driver starts working again, responding to
+ * hardware events and software requests. The hardware may have gone
+ * through a power-off reset, or it may have maintained state from the
+ * previous suspend() which the driver may rely on while resuming. On most
+ * platforms, there are no restrictions on availability of resources like
+ * clocks during @resume().
+ *
+ * @freeze: Hibernation-specific, executed before creating a hibernation image.
+ * Quiesce operations so that a consistent image can be created, but do NOT
+ * otherwise put the device into a low power device state and do NOT emit
+ * system wakeup events. Save in main memory the device settings to be
+ * used by @restore() during the subsequent resume from hibernation or by
+ * the subsequent @thaw(), if the creation of the image or the restoration
+ * of main memory contents from it fails.
+ *
+ * @thaw: Hibernation-specific, executed after creating a hibernation image OR
+ * if the creation of the image fails. Also executed after a failing
+ * attempt to restore the contents of main memory from such an image.
+ * Undo the changes made by the preceding @freeze(), so the device can be
+ * operated in the same way as immediately before the call to @freeze().
+ *
+ * @poweroff: Hibernation-specific, executed after saving a hibernation image.
+ * Quiesce the device, put it into a low power state appropriate for the
+ * upcoming system state (such as PCI_D3hot), and enable wakeup events as
+ * appropriate.
+ *
+ * @restore: Hibernation-specific, executed after restoring the contents of main
+ * memory from a hibernation image. Driver starts working again,
+ * responding to hardware events and software requests. Drivers may NOT
+ * make ANY assumptions about the hardware state right prior to @restore().
+ * On most platforms, there are no restrictions on availability of
+ * resources like clocks during @restore().
+ *
+ * All of the above callbacks, except for @complete(), return error codes.
+ * However, the error codes returned by the resume operations, @resume(),
+ * @thaw(), and @restore(), are only printed in the system logs, since the PM
+ * core cannot do anything else about them.
+ */
+
+struct pm_ops {
+ int (*prepare)(struct device *dev);
+ void (*complete)(struct device *dev);
+ int (*suspend)(struct device *dev);
+ int (*resume)(struct device *dev);
+ int (*freeze)(struct device *dev);
+ int (*thaw)(struct device *dev);
+ int (*poweroff)(struct device *dev);
+ int (*restore)(struct device *dev);
+};
+
+/**
+ * struct pm_noirq_ops - device PM callbacks executed with interrupts disabled
+ *
+ * The following callbacks included in 'struct pm_noirq_ops' are executed with
+ * interrupts disabled and with the nonboot CPUs switched off:
+ *
+ * @suspend_noirq: Complete the operations of ->suspend() by carrying out any
+ * actions required for suspending the device that need interrupts to be
+ * disabled
+ *
+ * @resume_noirq: Prepare for the execution of ->resume() by carrying out any
+ * actions required for resuming the device that need interrupts to be
+ * disabled
+ *
+ * @freeze_noirq: Complete the operations of ->freeze() by carrying out any
+ * actions required for freezing the device that need interrupts to be
+ * disabled
+ *
+ * @thaw_noirq: Prepare for the execution of ->thaw() by carrying out any
+ * actions required for thawing the device that need interrupts to be
+ * disabled
+ *
+ * @poweroff_noirq: Complete the operations of ->poweroff() by carrying out any
+ * actions required for handling the device that need interrupts to be
+ * disabled
+ *
+ * @restore_noirq: Prepare for the execution of ->restore() by carrying out any
+ * actions required for restoring the operations of the device that need
+ * interrupts to be disabled
+ *
+ * All of the above callbacks, return error codes, but the error codes returned
+ * by the resume operations, @resume_noirq(), @thaw_noirq(), and
+ * @restore_noirq(), are only printed in the system logs, since the PM core
+ * cannot do anything else about them.
+ */
+
+struct pm_noirq_ops {
+ int (*suspend_noirq)(struct device *dev);
+ int (*resume_noirq)(struct device *dev);
+ int (*freeze_noirq)(struct device *dev);
+ int (*thaw_noirq)(struct device *dev);
+ int (*poweroff_noirq)(struct device *dev);
+ int (*restore_noirq)(struct device *dev);
+};
+
+/**
+ * PM_EVENT_ messages
+ *
+ * The following PM_EVENT_ messages are defined for the internal use of the PM
+ * core, in order to provide a mechanism allowing the high level suspend and
+ * hibernation code to convey the necessary information to the device PM core
+ * code:
+ *
+ * ON No transition.
+ *
+ * FREEZE System is going to hibernate, call ->prepare() and ->freeze()
+ * for all devices.
+ *
+ * SUSPEND System is going to suspend, call ->prepare() and ->suspend()
+ * for all devices.
+ *
+ * HIBERNATE Hibernation image has been saved, call ->prepare() and
+ * ->poweroff() for all devices.
+ *
+ * QUIESCE Contents of main memory are going to be restored from a (loaded)
+ * hibernation image, call ->prepare() and ->freeze() for all
+ * devices.
+ *
+ * RESUME System is resuming, call ->resume() and ->complete() for all
+ * devices.
+ *
+ * THAW Hibernation image has been created, call ->thaw() and
+ * ->complete() for all devices.
+ *
+ * RESTORE Contents of main memory have been restored from a hibernation
+ * image, call ->restore() and ->complete() for all devices.
+ *
+ * RECOVER Creation of a hibernation image or restoration of the main
+ * memory contents from a hibernation image has failed, call
+ * ->thaw() and ->complete() for all devices.
+ */
+
+#define PM_EVENT_ON 0x0000
+#define PM_EVENT_FREEZE 0x0001
+#define PM_EVENT_SUSPEND 0x0002
+#define PM_EVENT_HIBERNATE 0x0004
+#define PM_EVENT_QUIESCE 0x0008
+#define PM_EVENT_RESUME 0x0010
+#define PM_EVENT_THAW 0x0020
+#define PM_EVENT_RESTORE 0x0040
+#define PM_EVENT_RECOVER 0x0080
+
+#define PM_EVENT_SLEEP (PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE)
+
+#define PMSG_FREEZE ((struct pm_message){ .event = PM_EVENT_FREEZE, })
+#define PMSG_QUIESCE ((struct pm_message){ .event = PM_EVENT_QUIESCE, })
+#define PMSG_SUSPEND ((struct pm_message){ .event = PM_EVENT_SUSPEND, })
+#define PMSG_HIBERNATE ((struct pm_message){ .event = PM_EVENT_HIBERNATE, })
+#define PMSG_RESUME ((struct pm_message){ .event = PM_EVENT_RESUME, })
+#define PMSG_THAW ((struct pm_message){ .event = PM_EVENT_THAW, })
+#define PMSG_RESTORE ((struct pm_message){ .event = PM_EVENT_RESTORE, })
+#define PMSG_RECOVER ((struct pm_message){ .event = PM_EVENT_RECOVER, })
+#define PMSG_ON ((struct pm_message){ .event = PM_EVENT_ON, })
+
+/**
+ * Device power management states
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.
+ *
+ * DPM_ON Device is regarded as operational. Set this way
+ * initially and when ->resume(), ->thaw(), or ->restore()
+ * (or ->complete() in case of an error) is about to be
+ * called. Also set when ->prepare() fails.
+ *
+ * DPM_PREPARING Device is currently being prepared for power transition.
+ * Set when ->prepare() is about to be called for the
+ * device.
+ *
+ * DPM_OFF Device is regarded as not fully operational. Set
+ * immediately after a successful call to ->prepare(),
+ * prior to calling ->suspend(), ->freeze(), or
+ * ->poweroff() for all devices.
+ */
+
+enum dpm_state {
+ DPM_ON,
+ DPM_PREPARING,
+ DPM_OFF,
+};
+
+struct dev_pm_info {
+ pm_message_t power_state;
+ unsigned can_wakeup:1;
+ unsigned should_wakeup:1;
+ enum dpm_state status:2; /* Owned by the PM core */
+#ifdef CONFIG_PM_SLEEP
+ struct list_head entry;
+#endif
+};
+
+/*
+ * The PM_EVENT_ messages are also used by drivers implementing the legacy
+ * suspend framework, based on the ->suspend() and ->resume() callbacks common
+ * for suspend and hibernation transitions, according to the rules below.
+ */
+
+/* Necessary, because several drivers use PM_EVENT_PRETHAW */
+#define PM_EVENT_PRETHAW PM_EVENT_QUIESCE
+
+/*
* One transition is triggered by resume(), after a suspend() call; the
* message is implicit:
*
@@ -166,35 +407,11 @@ typedef struct pm_message {
* or from system low-power states such as standby or suspend-to-RAM.
*/

-#define PM_EVENT_ON 0
-#define PM_EVENT_FREEZE 1
-#define PM_EVENT_SUSPEND 2
-#define PM_EVENT_HIBERNATE 4
-#define PM_EVENT_PRETHAW 8
-
-#define PM_EVENT_SLEEP (PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE)
-
-#define PMSG_FREEZE ((struct pm_message){ .event = PM_EVENT_FREEZE, })
-#define PMSG_PRETHAW ((struct pm_message){ .event = PM_EVENT_PRETHAW, })
-#define PMSG_SUSPEND ((struct pm_message){ .event = PM_EVENT_SUSPEND, })
-#define PMSG_HIBERNATE ((struct pm_message){ .event = PM_EVENT_HIBERNATE, })
-#define PMSG_ON ((struct pm_message){ .event = PM_EVENT_ON, })
-
-struct dev_pm_info {
- pm_message_t power_state;
- unsigned can_wakeup:1;
- unsigned should_wakeup:1;
- bool sleeping:1; /* Owned by the PM core */
-#ifdef CONFIG_PM_SLEEP
- struct list_head entry;
-#endif
-};
+#ifdef CONFIG_PM_SLEEP
+extern void device_power_up(pm_message_t state);
+extern void device_resume(pm_message_t state);

extern int device_power_down(pm_message_t state);
-extern void device_power_up(void);
-extern void device_resume(void);
-
-#ifdef CONFIG_PM_SLEEP
extern int device_suspend(pm_message_t state);
extern int device_prepare_suspend(pm_message_t state);

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -48,6 +48,7 @@
*/

LIST_HEAD(dpm_active);
+static LIST_HEAD(dpm_in_transit);
static LIST_HEAD(dpm_off);
static LIST_HEAD(dpm_off_irq);
static LIST_HEAD(dpm_destroy);
@@ -55,7 +56,7 @@ static LIST_HEAD(dpm_destroy);
static DEFINE_MUTEX(dpm_list_mtx);

/* 'true' if all devices have been suspended, protected by dpm_list_mtx */
-static bool all_sleeping;
+static bool all_inactive;

/**
* device_pm_add - add a device to the list of active devices
@@ -69,22 +70,30 @@ int device_pm_add(struct device *dev)
dev->bus ? dev->bus->name : "No Bus",
kobject_name(&dev->kobj));
mutex_lock(&dpm_list_mtx);
- if ((dev->parent && dev->parent->power.sleeping) || all_sleeping) {
- if (dev->parent->power.sleeping)
- dev_warn(dev,
- "parent %s is sleeping, will not add\n",
+ if (all_inactive) {
+ dev_warn(dev, "all devices are sleeping, will not add\n");
+ goto Refuse;
+ }
+ if (dev->parent)
+ switch (dev->parent->power.status) {
+ case DPM_OFF:
+ dev_warn(dev, "parent %s is sleeping, will not add\n",
dev->parent->bus_id);
- else
- dev_warn(dev, "devices are sleeping, will not add\n");
- WARN_ON(true);
- error = -EBUSY;
- } else {
- error = dpm_sysfs_add(dev);
- if (!error)
- list_add_tail(&dev->power.entry, &dpm_active);
- }
+ goto Refuse;
+ case DPM_PREPARING:
+ dev->parent->power.status = DPM_ON;
+ break;
+ }
+ error = dpm_sysfs_add(dev);
+ if (!error)
+ list_add_tail(&dev->power.entry, &dpm_active);
+ End:
mutex_unlock(&dpm_list_mtx);
return error;
+ Refuse:
+ WARN_ON(true);
+ error = -EBUSY;
+ goto End;
}

/**
@@ -122,26 +131,184 @@ void device_pm_schedule_removal(struct d
}
EXPORT_SYMBOL_GPL(device_pm_schedule_removal);

+/**
+ * pm_op - execute the PM operation appropiate for given PM event
+ * @dev: Device.
+ * @ops: PM operations to choose from.
+ * @state: PM event message.
+ */
+static int pm_op(struct device *dev, struct pm_ops *ops, pm_message_t state)
+{
+ int error = 0;
+
+ switch (state.event) {
+#ifdef CONFIG_SUSPEND
+ case PM_EVENT_SUSPEND:
+ if (ops->suspend) {
+ error = ops->suspend(dev);
+ suspend_report_result(ops->suspend, error);
+ }
+ break;
+ case PM_EVENT_RESUME:
+ if (ops->resume) {
+ error = ops->resume(dev);
+ suspend_report_result(ops->resume, error);
+ }
+ break;
+#endif /* CONFIG_SUSPEND */
+#ifdef CONFIG_HIBERNATION
+ case PM_EVENT_FREEZE:
+ case PM_EVENT_QUIESCE:
+ if (ops->freeze) {
+ error = ops->freeze(dev);
+ suspend_report_result(ops->freeze, error);
+ }
+ break;
+ case PM_EVENT_HIBERNATE:
+ if (ops->poweroff) {
+ error = ops->poweroff(dev);
+ suspend_report_result(ops->poweroff, error);
+ }
+ break;
+ case PM_EVENT_THAW:
+ case PM_EVENT_RECOVER:
+ if (ops->thaw) {
+ error = ops->thaw(dev);
+ suspend_report_result(ops->thaw, error);
+ }
+ break;
+ case PM_EVENT_RESTORE:
+ if (ops->restore) {
+ error = ops->restore(dev);
+ suspend_report_result(ops->restore, error);
+ }
+ break;
+#endif /* CONFIG_HIBERNATION */
+ default:
+ error = -EINVAL;
+ }
+ return error;
+}
+
+/**
+ * pm_noirq_op - execute the PM operation appropiate for given PM event
+ * @dev: Device.
+ * @ops: PM operations to choose from.
+ * @state: PM event message.
+ *
+ * The operation is executed with interrupts disabled by the only remaining
+ * functional CPU in the system.
+ */
+static int pm_noirq_op(struct device *dev, struct pm_noirq_ops *ops,
+ pm_message_t state)
+{
+ int error = 0;
+
+ switch (state.event) {
+#ifdef CONFIG_SUSPEND
+ case PM_EVENT_SUSPEND:
+ if (ops->suspend_noirq) {
+ error = ops->suspend_noirq(dev);
+ suspend_report_result(ops->suspend_noirq, error);
+ }
+ break;
+ case PM_EVENT_RESUME:
+ if (ops->resume_noirq) {
+ error = ops->resume_noirq(dev);
+ suspend_report_result(ops->resume_noirq, error);
+ }
+ break;
+#endif /* CONFIG_SUSPEND */
+#ifdef CONFIG_HIBERNATION
+ case PM_EVENT_FREEZE:
+ case PM_EVENT_QUIESCE:
+ if (ops->freeze_noirq) {
+ error = ops->freeze_noirq(dev);
+ suspend_report_result(ops->freeze_noirq, error);
+ }
+ break;
+ case PM_EVENT_HIBERNATE:
+ if (ops->poweroff_noirq) {
+ error = ops->poweroff_noirq(dev);
+ suspend_report_result(ops->poweroff_noirq, error);
+ }
+ break;
+ case PM_EVENT_THAW:
+ case PM_EVENT_RECOVER:
+ if (ops->thaw_noirq) {
+ error = ops->thaw_noirq(dev);
+ suspend_report_result(ops->thaw_noirq, error);
+ }
+ break;
+ case PM_EVENT_RESTORE:
+ if (ops->restore_noirq) {
+ error = ops->restore_noirq(dev);
+ suspend_report_result(ops->restore_noirq, error);
+ }
+ break;
+#endif /* CONFIG_HIBERNATION */
+ default:
+ error = -EINVAL;
+ }
+ return error;
+}
+
+static char *pm_verb(int event)
+{
+ switch (event) {
+ case PM_EVENT_SUSPEND:
+ return "suspend";
+ case PM_EVENT_RESUME:
+ return "resume";
+ case PM_EVENT_FREEZE:
+ return "freeze";
+ case PM_EVENT_QUIESCE:
+ return "quiesce";
+ case PM_EVENT_HIBERNATE:
+ return "hibernate";
+ case PM_EVENT_THAW:
+ return "thaw";
+ case PM_EVENT_RESTORE:
+ return "restore";
+ default:
+ return "(unknown PM event)";
+ }
+}
+
+static void pm_dev_dbg(struct device *dev, pm_message_t state, char *info)
+{
+ dev_dbg(dev, "%s%s%s\n", info, pm_verb(state.event),
+ ((state.event & PM_EVENT_SLEEP) && device_may_wakeup(dev)) ?
+ ", may wakeup" : "");
+}
+
/*------------------------- Resume routines -------------------------*/

/**
- * resume_device_early - Power on one device (early resume).
+ * resume_device_noirq - Power on one device (early resume).
* @dev: Device.
+ * @state: Operation to carry out.
*
* Must be called with interrupts disabled.
*/
-static int resume_device_early(struct device *dev)
+static int resume_device_noirq(struct device *dev, pm_message_t state)
{
int error = 0;

TRACE_DEVICE(dev);
TRACE_RESUME(0);

- if (dev->bus && dev->bus->resume_early) {
- dev_dbg(dev, "EARLY resume\n");
+ if (!dev->bus)
+ goto End;
+
+ if (dev->bus->pm_noirq) {
+ pm_dev_dbg(dev, state, "EARLY ");
+ error = pm_noirq_op(dev, dev->bus->pm_noirq, state);
+ } else if (dev->bus->resume_early) {
+ pm_dev_dbg(dev, state, "legacy EARLY ");
error = dev->bus->resume_early(dev);
}
-
+ End:
TRACE_RESUME(error);
return error;
}
@@ -156,7 +323,7 @@ static int resume_device_early(struct de
*
* Must be called with interrupts disabled and only one CPU running.
*/
-static void dpm_power_up(void)
+static void dpm_power_up(pm_message_t state)
{

while (!list_empty(&dpm_off_irq)) {
@@ -164,7 +331,7 @@ static void dpm_power_up(void)
struct device *dev = to_device(entry);

list_move_tail(entry, &dpm_off);
- resume_device_early(dev);
+ resume_device_noirq(dev, state);
}
}

@@ -176,19 +343,19 @@ static void dpm_power_up(void)
*
* Must be called with interrupts disabled.
*/
-void device_power_up(void)
+void device_power_up(pm_message_t state)
{
sysdev_resume();
- dpm_power_up();
+ dpm_power_up(state);
}
EXPORT_SYMBOL_GPL(device_power_up);

/**
* resume_device - Restore state for one device.
* @dev: Device.
- *
+ * @state: Operation to carry out.
*/
-static int resume_device(struct device *dev)
+static int resume_device(struct device *dev, pm_message_t state)
{
int error = 0;

@@ -197,21 +364,40 @@ static int resume_device(struct device *

down(&dev->sem);

- if (dev->bus && dev->bus->resume) {
- dev_dbg(dev,"resuming\n");
- error = dev->bus->resume(dev);
+ if (dev->bus) {
+ if (dev->bus->pm) {
+ pm_dev_dbg(dev, state, "");
+ error = pm_op(dev, dev->bus->pm, state);
+ } else if (dev->bus->resume) {
+ pm_dev_dbg(dev, state, "legacy ");
+ error = dev->bus->resume(dev);
+ }
+ if (error)
+ goto End;
}

- if (!error && dev->type && dev->type->resume) {
- dev_dbg(dev,"resuming\n");
- error = dev->type->resume(dev);
+ if (dev->type) {
+ if (dev->type->pm) {
+ pm_dev_dbg(dev, state, "type ");
+ error = pm_op(dev, dev->type->pm, state);
+ } else if (dev->type->resume) {
+ pm_dev_dbg(dev, state, "legacy type ");
+ error = dev->type->resume(dev);
+ }
+ if (error)
+ goto End;
}

- if (!error && dev->class && dev->class->resume) {
- dev_dbg(dev,"class resume\n");
- error = dev->class->resume(dev);
+ if (dev->class) {
+ if (dev->class->pm) {
+ pm_dev_dbg(dev, state, "class ");
+ error = pm_op(dev, dev->class->pm, state);
+ } else if (dev->class->resume) {
+ pm_dev_dbg(dev, state, "legacy class ");
+ error = dev->class->resume(dev);
+ }
}
-
+ End:
up(&dev->sem);

TRACE_RESUME(error);
@@ -226,20 +412,73 @@ static int resume_device(struct device *
* went through the early resume.
*
* Take devices from the dpm_off_list, resume them,
- * and put them on the dpm_locked list.
+ * and put them on the dpm_in_transit list.
*/
-static void dpm_resume(void)
+static void dpm_resume(pm_message_t state)
{
mutex_lock(&dpm_list_mtx);
- all_sleeping = false;
while(!list_empty(&dpm_off)) {
struct list_head *entry = dpm_off.next;
struct device *dev = to_device(entry);

+ list_move_tail(entry, &dpm_in_transit);
+ mutex_unlock(&dpm_list_mtx);
+
+ resume_device(dev, state);
+
+ mutex_lock(&dpm_list_mtx);
+ }
+ mutex_unlock(&dpm_list_mtx);
+}
+
+/**
+ * complete_device - Complete a PM transition for given device
+ * @dev: Device.
+ * @state: Power transition we are completing.
+ */
+static void complete_device(struct device *dev, pm_message_t state)
+{
+ down(&dev->sem);
+
+ if (dev->bus && dev->bus->pm && dev->bus->pm->complete) {
+ pm_dev_dbg(dev, state, "completing ");
+ dev->bus->pm->complete(dev);
+ }
+
+ if (dev->type && dev->type->pm && dev->type->pm->complete) {
+ pm_dev_dbg(dev, state, "completing type ");
+ dev->type->pm->complete(dev);
+ }
+
+ if (dev->class && dev->class->pm && dev->class->pm->complete) {
+ pm_dev_dbg(dev, state, "completing class ");
+ dev->class->pm->complete(dev);
+ }
+
+ up(&dev->sem);
+}
+
+/**
+ * dpm_complete - Complete a PM transition for all devices.
+ * @state: Power transition we are completing.
+ *
+ * Take devices from the dpm_in_transit, complete the transition for each
+ * of them and put them on the dpm_active list.
+ */
+static void dpm_complete(pm_message_t state)
+{
+ mutex_lock(&dpm_list_mtx);
+ all_inactive = false;
+ while(!list_empty(&dpm_in_transit)) {
+ struct list_head *entry = dpm_in_transit.next;
+ struct device *dev = to_device(entry);
+
list_move_tail(entry, &dpm_active);
- dev->power.sleeping = false;
+ dev->power.status = DPM_ON;
mutex_unlock(&dpm_list_mtx);
- resume_device(dev);
+
+ complete_device(dev, state);
+
mutex_lock(&dpm_list_mtx);
}
mutex_unlock(&dpm_list_mtx);
@@ -267,14 +506,16 @@ static void unregister_dropped_devices(v

/**
* device_resume - Restore state of each device in system.
+ * @state: Operation to carry out.
*
* Resume all the devices, unlock them all, and allow new
* devices to be registered once again.
*/
-void device_resume(void)
+void device_resume(pm_message_t state)
{
might_sleep();
- dpm_resume();
+ dpm_resume(state);
+ dpm_complete(state);
unregister_dropped_devices();
}
EXPORT_SYMBOL_GPL(device_resume);
@@ -282,37 +523,44 @@ EXPORT_SYMBOL_GPL(device_resume);

/*------------------------- Suspend routines -------------------------*/

-static inline char *suspend_verb(u32 event)
+/**
+ * resume_event - return a PM message representing the resume event
+ * corresponding to given sleep state.
+ * @sleep_state - PM message representing a sleep state
+ */
+static pm_message_t resume_event(pm_message_t sleep_state)
{
- switch (event) {
- case PM_EVENT_SUSPEND: return "suspend";
- case PM_EVENT_FREEZE: return "freeze";
- case PM_EVENT_PRETHAW: return "prethaw";
- default: return "(unknown suspend event)";
+ switch (sleep_state.event) {
+ case PM_EVENT_SUSPEND:
+ return PMSG_RESUME;
+ case PM_EVENT_FREEZE:
+ case PM_EVENT_QUIESCE:
+ return PMSG_RECOVER;
+ case PM_EVENT_HIBERNATE:
+ return PMSG_RESTORE;
}
-}
-
-static void
-suspend_device_dbg(struct device *dev, pm_message_t state, char *info)
-{
- dev_dbg(dev, "%s%s%s\n", info, suspend_verb(state.event),
- ((state.event == PM_EVENT_SUSPEND) && device_may_wakeup(dev)) ?
- ", may wakeup" : "");
+ return PMSG_ON;
}

/**
- * suspend_device_late - Shut down one device (late suspend).
+ * suspend_device_noirq - Shut down one device (late suspend).
* @dev: Device.
- * @state: Power state device is entering.
+ * @state: PM message representing the operation to perform.
*
* This is called with interrupts off and only a single CPU running.
*/
-static int suspend_device_late(struct device *dev, pm_message_t state)
+static int suspend_device_noirq(struct device *dev, pm_message_t state)
{
int error = 0;

- if (dev->bus && dev->bus->suspend_late) {
- suspend_device_dbg(dev, state, "LATE ");
+ if (!dev->bus)
+ return 0;
+
+ if (dev->bus->pm_noirq) {
+ pm_dev_dbg(dev, state, "LATE ");
+ error = pm_noirq_op(dev, dev->bus->pm_noirq, state);
+ } else if (dev->bus->suspend_late) {
+ pm_dev_dbg(dev, state, "legacy LATE ");
error = dev->bus->suspend_late(dev, state);
suspend_report_result(dev->bus->suspend_late, error);
}
@@ -321,7 +569,7 @@ static int suspend_device_late(struct de

/**
* device_power_down - Shut down special devices.
- * @state: Power state to enter.
+ * @state: PM message representing the operation to perform.
*
* Power down devices that require interrupts to be disabled
* and move them from the dpm_off list to the dpm_off_irq list.
@@ -337,7 +585,7 @@ int device_power_down(pm_message_t state
struct list_head *entry = dpm_off.prev;
struct device *dev = to_device(entry);

- error = suspend_device_late(dev, state);
+ error = suspend_device_noirq(dev, state);
if (error) {
printk(KERN_ERR "Could not power down device %s: "
"error %d\n",
@@ -351,7 +599,7 @@ int device_power_down(pm_message_t state
if (!error)
error = sysdev_suspend(state);
if (error)
- dpm_power_up();
+ dpm_power_up(resume_event(state));
return error;
}
EXPORT_SYMBOL_GPL(device_power_down);
@@ -367,29 +615,43 @@ static int suspend_device(struct device

down(&dev->sem);

- if (dev->power.power_state.event) {
- dev_dbg(dev, "PM: suspend %d-->%d\n",
- dev->power.power_state.event, state.event);
- }
-
- if (dev->class && dev->class->suspend) {
- suspend_device_dbg(dev, state, "class ");
- error = dev->class->suspend(dev, state);
- suspend_report_result(dev->class->suspend, error);
+ if (dev->class) {
+ if (dev->class->pm) {
+ pm_dev_dbg(dev, state, "class ");
+ error = pm_op(dev, dev->class->pm, state);
+ } else if (dev->class->suspend) {
+ pm_dev_dbg(dev, state, "legacy class ");
+ error = dev->class->suspend(dev, state);
+ suspend_report_result(dev->class->suspend, error);
+ }
+ if (error)
+ goto End;
}

- if (!error && dev->type && dev->type->suspend) {
- suspend_device_dbg(dev, state, "type ");
- error = dev->type->suspend(dev, state);
- suspend_report_result(dev->type->suspend, error);
+ if (dev->type) {
+ if (dev->type->pm) {
+ pm_dev_dbg(dev, state, "type ");
+ error = pm_op(dev, dev->type->pm, state);
+ } else if (dev->type->suspend) {
+ pm_dev_dbg(dev, state, "legacy type ");
+ error = dev->type->suspend(dev, state);
+ suspend_report_result(dev->type->suspend, error);
+ }
+ if (error)
+ goto End;
}

- if (!error && dev->bus && dev->bus->suspend) {
- suspend_device_dbg(dev, state, "");
- error = dev->bus->suspend(dev, state);
- suspend_report_result(dev->bus->suspend, error);
+ if (dev->bus) {
+ if (dev->bus->pm) {
+ pm_dev_dbg(dev, state, "");
+ error = pm_op(dev, dev->bus->pm, state);
+ } else if (dev->bus->suspend) {
+ pm_dev_dbg(dev, state, "legacy ");
+ error = dev->bus->suspend(dev, state);
+ suspend_report_result(dev->bus->suspend, error);
+ }
}
-
+ End:
up(&dev->sem);

return error;
@@ -399,65 +661,140 @@ static int suspend_device(struct device
* dpm_suspend - Suspend every device.
* @state: Power state to put each device in.
*
- * Walk the dpm_locked list. Suspend each device and move it
- * to the dpm_off list.
- *
- * (For historical reasons, if it returns -EAGAIN, that used to mean
- * that the device would be called again with interrupts disabled.
- * These days, we use the "suspend_late()" callback for that, so we
- * print a warning and consider it an error).
+ * Walk the dpm_in_transit list. Suspend each device and move it to the
+ * dpm_off list.
*/
static int dpm_suspend(pm_message_t state)
{
int error = 0;

mutex_lock(&dpm_list_mtx);
- while (!list_empty(&dpm_active)) {
- struct list_head *entry = dpm_active.prev;
+ while (!list_empty(&dpm_in_transit)) {
+ struct list_head *entry = dpm_in_transit.prev;
struct device *dev = to_device(entry);

- WARN_ON(dev->parent && dev->parent->power.sleeping);
-
- dev->power.sleeping = true;
mutex_unlock(&dpm_list_mtx);
+
error = suspend_device(dev, state);
+
mutex_lock(&dpm_list_mtx);
if (error) {
printk(KERN_ERR "Could not suspend device %s: "
- "error %d%s\n",
- kobject_name(&dev->kobj),
- error,
- (error == -EAGAIN ?
- " (please convert to suspend_late)" :
- ""));
- dev->power.sleeping = false;
+ "error %d\n", kobject_name(&dev->kobj), error);
break;
}
if (!list_empty(&dev->power.entry))
list_move(&dev->power.entry, &dpm_off);
}
- if (!error)
- all_sleeping = true;
mutex_unlock(&dpm_list_mtx);
+ return error;
+}
+
+/**
+ * prepare_device - Execute the ->prepare() callback(s) for given device.
+ * @dev: Device.
+ * @state: PM operation we are preparing for.
+ */
+static int prepare_device(struct device *dev, pm_message_t state)
+{
+ int error = 0;
+
+ down(&dev->sem);
+
+ if (dev->bus && dev->bus->pm && dev->bus->pm->prepare) {
+ pm_dev_dbg(dev, state, "preparing ");
+ error = dev->bus->pm->prepare(dev);
+ suspend_report_result(dev->bus->pm->prepare, error);
+ if (error)
+ goto End;
+ }
+
+ if (dev->type && dev->type->pm && dev->type->pm->prepare) {
+ pm_dev_dbg(dev, state, "preparing type ");
+ error = dev->type->pm->prepare(dev);
+ suspend_report_result(dev->type->pm->prepare, error);
+ if (error)
+ goto End;
+ }
+
+ if (dev->class && dev->class->pm && dev->class->pm->prepare) {
+ pm_dev_dbg(dev, state, "preparing class ");
+ error = dev->class->pm->prepare(dev);
+ suspend_report_result(dev->class->pm->prepare, error);
+ }
+ End:
+ up(&dev->sem);

return error;
}

/**
+ * dpm_prepare - Prepare all devices for a PM transition.
+ * @state: Power state to put each device in.
+ *
+ * Walk the dpm_active list. Prepare each device and move it to the
+ * dpm_in_transit list.
+ */
+static int dpm_prepare(pm_message_t state)
+{
+ int error = 0;
+
+ mutex_lock(&dpm_list_mtx);
+ while (!list_empty(&dpm_active)) {
+ struct list_head *entry = dpm_active.prev;
+ struct device *dev = to_device(entry);
+
+ WARN_ON(dev->parent && dev->parent->power.status == DPM_OFF);
+ dev->power.status = DPM_PREPARING;
+ mutex_unlock(&dpm_list_mtx);
+
+ error = prepare_device(dev, state);
+
+ mutex_lock(&dpm_list_mtx);
+ if (error) {
+ dev->power.status = DPM_ON;
+ if (error == -EAGAIN)
+ continue;
+ printk(KERN_ERR "Could not prepare device %s "
+ "for suspend: error %d\n",
+ kobject_name(&dev->kobj), error);
+ goto End;
+ }
+ if (dev->power.status == DPM_ON) {
+ /* Child added during prepare_device() */
+ mutex_unlock(&dpm_list_mtx);
+
+ complete_device(dev, resume_event(state));
+
+ mutex_lock(&dpm_list_mtx);
+ continue;
+ }
+ dev->power.status = DPM_OFF;
+ if (!list_empty(&dev->power.entry))
+ list_move(&dev->power.entry, &dpm_in_transit);
+ }
+ all_inactive = true;
+ End:
+ mutex_unlock(&dpm_list_mtx);
+ return error;
+}
+
+/**
* device_suspend - Save state and stop all devices in system.
* @state: new power management state
*
- * Prevent new devices from being registered, then lock all devices
- * and suspend them.
+ * Prepare and suspend all devices.
*/
int device_suspend(pm_message_t state)
{
int error;

might_sleep();
- error = dpm_suspend(state);
+ error = dpm_prepare(state);
+ if (!error)
+ error = dpm_suspend(state);
if (error)
- device_resume();
+ device_resume(resume_event(state));
return error;
}
EXPORT_SYMBOL_GPL(device_suspend);
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -69,6 +69,9 @@ struct bus_type {
int (*resume_early)(struct device *dev);
int (*resume)(struct device *dev);

+ struct pm_ops *pm;
+ struct pm_noirq_ops *pm_noirq;
+
struct bus_type_private *p;
};

@@ -202,6 +205,8 @@ struct class {

int (*suspend)(struct device *dev, pm_message_t state);
int (*resume)(struct device *dev);
+
+ struct pm_ops *pm;
};

extern int __must_check class_register(struct class *class);
@@ -345,8 +350,11 @@ struct device_type {
struct attribute_group **groups;
int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
void (*release)(struct device *dev);
+
int (*suspend)(struct device *dev, pm_message_t state);
int (*resume)(struct device *dev);
+
+ struct pm_ops *pm;
};

/* interface for exporting device attributes */
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -224,7 +224,7 @@ static int create_image(int platform_mod
/* NOTE: device_power_up() is just a resume() for devices
* that suspended with irqs off ... no overall powerup.
*/
- device_power_up();
+ device_power_up(in_suspend ? PMSG_RECOVER : PMSG_RESTORE);
Enable_irqs:
local_irq_enable();
return error;
@@ -280,7 +280,7 @@ int hibernation_snapshot(int platform_mo
Finish:
platform_finish(platform_mode);
Resume_devices:
- device_resume();
+ device_resume(in_suspend ? PMSG_RECOVER : PMSG_RESTORE);
Resume_console:
resume_console();
Close:
@@ -301,7 +301,7 @@ static int resume_target_kernel(void)
int error;

local_irq_disable();
- error = device_power_down(PMSG_PRETHAW);
+ error = device_power_down(PMSG_QUIESCE);
if (error) {
printk(KERN_ERR "PM: Some devices failed to power down, "
"aborting resume\n");
@@ -329,7 +329,7 @@ static int resume_target_kernel(void)
swsusp_free();
restore_processor_state();
touch_softlockup_watchdog();
- device_power_up();
+ device_power_up(PMSG_THAW);
Enable_irqs:
local_irq_enable();
return error;
@@ -350,7 +350,7 @@ int hibernation_restore(int platform_mod

pm_prepare_console();
suspend_console();
- error = device_suspend(PMSG_PRETHAW);
+ error = device_suspend(PMSG_QUIESCE);
if (error)
goto Finish;

@@ -362,7 +362,7 @@ int hibernation_restore(int platform_mod
enable_nonboot_cpus();
}
platform_restore_cleanup(platform_mode);
- device_resume();
+ device_resume(PMSG_RECOVER);
Finish:
resume_console();
pm_restore_console();
@@ -419,7 +419,7 @@ int hibernation_platform_enter(void)
Finish:
hibernation_ops->finish();
Resume_devices:
- device_resume();
+ device_resume(PMSG_RESTORE);
Resume_console:
resume_console();
Close:
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -239,7 +239,7 @@ static int suspend_enter(suspend_state_t
if (!suspend_test(TEST_CORE))
error = suspend_ops->enter(state);

- device_power_up();
+ device_power_up(PMSG_RESUME);
Done:
arch_suspend_enable_irqs();
BUG_ON(irqs_disabled());
@@ -291,7 +291,7 @@ int suspend_devices_and_enter(suspend_st
if (suspend_ops->finish)
suspend_ops->finish();
Resume_devices:
- device_resume();
+ device_resume(PMSG_RESUME);
Resume_console:
resume_console();
Close:
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1208,9 +1208,9 @@ static int suspend(int vetoable)
if (err != APM_SUCCESS)
apm_error("suspend", err);
err = (err == APM_SUCCESS) ? 0 : -EIO;
- device_power_up();
+ device_power_up(PMSG_RESUME);
local_irq_enable();
- device_resume();
+ device_resume(PMSG_RESUME);
queue_event(APM_NORMAL_RESUME, NULL);
out:
spin_lock(&user_list_lock);


2008-03-21 00:21:39

by Johannes Berg

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)


> + * All of the above callbacks, except for @complete(), return error codes.
> + * However, the error codes returned by the resume operations, @resume(),
> + * @thaw(), and @restore(), are only printed in the system logs, since the PM
> + * core cannot do anything else about them.

Why bother and not just make them return void, the error printing can
most likely be done much much better in the callback since that possibly
has information on why it failed.

> + * All of the above callbacks, return error codes, but the error codes returned
> + * by the resume operations, @resume_noirq(), @thaw_noirq(), and
> + * @restore_noirq(), are only printed in the system logs, since the PM core
> + * cannot do anything else about them.

Same here, of course.

johannes


Attachments:
signature.asc (828.00 B)
This is a digitally signed message part

2008-03-21 00:27:11

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Friday, 21 of March 2008, Johannes Berg wrote:
>
> > + * All of the above callbacks, except for @complete(), return error codes.
> > + * However, the error codes returned by the resume operations, @resume(),
> > + * @thaw(), and @restore(), are only printed in the system logs, since the PM
> > + * core cannot do anything else about them.
>
> Why bother and not just make them return void, the error printing can
> most likely be done much much better in the callback since that possibly
> has information on why it failed.
>
> > + * All of the above callbacks, return error codes, but the error codes returned
> > + * by the resume operations, @resume_noirq(), @thaw_noirq(), and
> > + * @restore_noirq(), are only printed in the system logs, since the PM core
> > + * cannot do anything else about them.
>
> Same here, of course.

The errors returned are used by TRACE_RESUME(), for example.

Well, perhaps the comments should be changed instead. :-)

Thanks,
Rafael

2008-03-21 01:21:44

by Alan Stern

[permalink] [raw]
Subject: Re: [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Fri, 21 Mar 2008, Rafael J. Wysocki wrote:

> Hi,
>
> Below is an updated version of the $subject patch. It has only been
> compilation tested, but I'd like to get as much feedback as reasonably possible
> at this stage to avoid redesigning things later in the process.
>
> The most important changes from the previous one are the following:
> * Callbacks to be executed with interrupts disabled are now separated from
> the "regular" ones. There still are six of them, but IMHO this is what may
> be necessary in the most general case (eg. a driver may have to carry out
> some operations with interrupts disabled, which are different for SUSPEND,
> FREEZE and HIBERNATE; the complementary "resume" callbacks are needed for
> error recovery, for example)
> * It occured to me that the ->freeze() and ->quiesce() callbacks were
> essentially the same, so I removed the latter
> * The ->recover() callback was also dropped, as ->thaw() may well be used instead
> just fine (it seems)
> * ->prepare() and ->complete() are now called in separate loops which causes
> some funny complications with error recovery. Namely, we must remember which
> devices have already been ->prepare()d, so that we call ->complete() for them
> in the error path. Moreover, there may be some ->prepare()d and some
> ->suspend()ed devices at the same time, so if there's an error we should
> ->resume() the ->suspend()ed ones and ->complete() everything etc.
> This may be handled in many different ways, but I decided to introduce a new
> list on which the ->complete()d devices are stored. Accordingly, the
> registration of new children of a device is blocked between ->prepare() and
> ->complete() (it was only blocked between ->prepare() and ->resume() in the
> previous version).
>
> Please have a look.

I don't have time to go through this in detail now, but a few things
stand out.

One trivial problem is that your dpm_prepare and dpm_complete routines
go through the list in the wrong order.

More importantly, the situation with prepare and complete is getting
rather complicated. Now that you're adding dev->power.state, why not
go all the way? Get rid of all the different lists, keeping just
dpm_active and possibly dpm_destroy. Instead of moving devices between
different lists, just store in dev->power.state an identification for
which list the device is supposed to be on or is supposed to be moving
to. (Or else have a bunch of bitfields indicating which methods have
been called.) This has the advantage that the entries will never get
out of order, even if devices are registered at the wrong time.

There's some question about when we want the PM core to start
preventing new children from being registered. Should this start right
after prepare() returns, or should it start right before suspend() is
called? The first alternative sounds better to me. Not only does it
agree with the purpose of the prepare method, it also makes race
situations easier to handle.

Since dpm_prepare is supposed to go through the list in the forward
direction, logically the "root" of the device tree should be the first
thing "prepared". This means you should not allow parentless devices
to be registered any time after dpm_prepare has started. Is that
liable to cause problems?

There may be similar problems with complete(). A number of drivers
check during their resume method for the presence of new children
plugged in while the system was asleep. All these drivers would have
to be converted over to the new scheme if they weren't permitted to
register the new children before complete() was called. Of course,
this is easily fixed by permitting new children starting from when
resume() is called rather than when complete() is called.

Alan Stern

2008-03-21 02:15:35

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Friday, 21 of March 2008, Alan Stern wrote:
> On Fri, 21 Mar 2008, Rafael J. Wysocki wrote:
>
> > Hi,
> >
> > Below is an updated version of the $subject patch. It has only been
> > compilation tested, but I'd like to get as much feedback as reasonably possible
> > at this stage to avoid redesigning things later in the process.
> >
> > The most important changes from the previous one are the following:
> > * Callbacks to be executed with interrupts disabled are now separated from
> > the "regular" ones. There still are six of them, but IMHO this is what may
> > be necessary in the most general case (eg. a driver may have to carry out
> > some operations with interrupts disabled, which are different for SUSPEND,
> > FREEZE and HIBERNATE; the complementary "resume" callbacks are needed for
> > error recovery, for example)
> > * It occured to me that the ->freeze() and ->quiesce() callbacks were
> > essentially the same, so I removed the latter
> > * The ->recover() callback was also dropped, as ->thaw() may well be used instead
> > just fine (it seems)
> > * ->prepare() and ->complete() are now called in separate loops which causes
> > some funny complications with error recovery. Namely, we must remember which
> > devices have already been ->prepare()d, so that we call ->complete() for them
> > in the error path. Moreover, there may be some ->prepare()d and some
> > ->suspend()ed devices at the same time, so if there's an error we should
> > ->resume() the ->suspend()ed ones and ->complete() everything etc.
> > This may be handled in many different ways, but I decided to introduce a new
> > list on which the ->complete()d devices are stored. Accordingly, the
> > registration of new children of a device is blocked between ->prepare() and
> > ->complete() (it was only blocked between ->prepare() and ->resume() in the
> > previous version).
> >
> > Please have a look.
>
> I don't have time to go through this in detail now, but a few things
> stand out.
>
> One trivial problem is that your dpm_prepare and dpm_complete routines
> go through the list in the wrong order.

I'm not all so sure that the order is actually wrong. What would be the
advantage of the forward order over the current one?

> More importantly, the situation with prepare and complete is getting
> rather complicated. Now that you're adding dev->power.state, why not
> go all the way? Get rid of all the different lists, keeping just
> dpm_active and possibly dpm_destroy.

dpm_destroy is not necessary and I'm going to drop it (later).

> Instead of moving devices between different lists, just store in
> dev->power.state an identification for which list the device is supposed
> to be on or is supposed to be moving to. (Or else have a bunch of bitfields
> indicating which methods have been called.) This has the advantage that
> the entries will never get out of order, even if devices are registered at
> the wrong time.

That's correct.

> There's some question about when we want the PM core to start
> preventing new children from being registered. Should this start right
> after prepare() returns, or should it start right before suspend() is
> called? The first alternative sounds better to me. Not only does it
> agree with the purpose of the prepare method, it also makes race
> situations easier to handle.

I agree and that's implemented in the patch (ie. the registrations of new
children are blocked immediately after ->prepare() has returned).

> Since dpm_prepare is supposed to go through the list in the forward
> direction, logically the "root" of the device tree should be the first
> thing "prepared". This means you should not allow parentless devices
> to be registered any time after dpm_prepare has started. Is that
> liable to cause problems?

I'm still not seeing the advantage of the forward direction in the first place.

Although I don't see what particular problems that may cause, I think the
current approach (first, block registrations of new children for each
->prepare()d device and finally block any registrations of new devices) is
more natural.

> There may be similar problems with complete(). A number of drivers
> check during their resume method for the presence of new children
> plugged in while the system was asleep. All these drivers would have
> to be converted over to the new scheme if they weren't permitted to
> register the new children before complete() was called. Of course,
> this is easily fixed by permitting new children starting from when
> resume() is called rather than when complete() is called.

Well, the problem here was the protection of the correct ordering of the
various lists. However, if the approach with changing 'status' is adopted
instead, which seems to be better, we'll be able to unblock the registering
of new children before ->resume().

Thanks,
Rafael

2008-03-21 02:53:46

by Alan Stern

[permalink] [raw]
Subject: Re: [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Fri, 21 Mar 2008, Rafael J. Wysocki wrote:

> > One trivial problem is that your dpm_prepare and dpm_complete routines
> > go through the list in the wrong order.
>
> I'm not all so sure that the order is actually wrong. What would be the
> advantage of the forward order over the current one?

The advantage is that races no longer matter.

Suppose you're going through the list backwards and a race occurs, so
that a child is registered while the parent's prepare() is running.
That new child will of course be at the end of dpm_active, so the loop
in dpm_prepare won't see it (assuming you adopt the approach of using
only a single list). But if you go through the list forwards and a new
child is added, you will naturally see the child when you reach the end
of the list. You don't even have to go through the business about
changing the parent's state back to DPM_ON.

There are other ways of describing the situation, but they all come
down to the same thing. For instance, this is the way Ben was talking
about doing things.

> > Since dpm_prepare is supposed to go through the list in the forward
> > direction, logically the "root" of the device tree should be the first
> > thing "prepared". This means you should not allow parentless devices
> > to be registered any time after dpm_prepare has started. Is that
> > liable to cause problems?
>
> I'm still not seeing the advantage of the forward direction in the first place.
>
> Although I don't see what particular problems that may cause, I think the
> current approach (first, block registrations of new children for each
> ->prepare()d device and finally block any registrations of new devices) is
> more natural.

I suppose for parentless devices it doesn't really matter. You could
safely allow them to be registered any time up until dpm_prepare()
finishes -- which is what your patch does. Perhaps the "all_inactive"
flag should be renamed to "all_prepared".

> > There may be similar problems with complete(). A number of drivers
> > check during their resume method for the presence of new children
> > plugged in while the system was asleep. All these drivers would have
> > to be converted over to the new scheme if they weren't permitted to
> > register the new children before complete() was called. Of course,
> > this is easily fixed by permitting new children starting from when
> > resume() is called rather than when complete() is called.
>
> Well, the problem here was the protection of the correct ordering of the
> various lists. However, if the approach with changing 'status' is adopted
> instead, which seems to be better, we'll be able to unblock the registering
> of new children before ->resume().

Yep. The only thing to watch out for is in device_pm_remove(); it
would be a disaster if somehow a device was removed while it was being
prepared/suspended/resumed/completed/whatever. I know that's not
supposed to happen but there's nothing to prevent it, especially if
the device in question doesn't have a driver. No doubt you can invent
a way to allow this to happen safely.

Alan Stern

2008-03-21 08:15:56

by Sam Ravnborg

[permalink] [raw]
Subject: Re: [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

Hi Rafael.

Is it possible to extend this in some way so we avoid the
#ifdef stuff in the drivers?
We could introduce a few special sections that we discard if
PM is not in use.
We have a reliable build time infrastructure to detect
inconsistencies if needed.

Something like:
#define __suspend __section(.suspend.text)
#define __suspenddata __section(.suspend.data)

#define __hibernate __section(.hibernate.text)
#define __hibernatedata __section(.hibernate.data)

A few more tricks will be needed when we assign the functon pointers.
We have __devexit_p(*) and we may use something similar.

Sam




> Index: linux-2.6/include/linux/pm.h
> ===================================================================
> --- linux-2.6.orig/include/linux/pm.h
> +++ linux-2.6/include/linux/pm.h
> @@ -114,7 +114,9 @@ typedef struct pm_message {
> int event;
> } pm_message_t;
>
> -/*
> +/**
> + * struct pm_ops - device PM callbacks
> + *
> * Several driver power state transitions are externally visible, affecting
> * the state of pending I/O queues and (for drivers that touch hardware)
> * interrupts, wakeups, DMA, and other hardware state. There may also be
> @@ -122,6 +124,245 @@ typedef struct pm_message {
> * to the rest of the driver stack (such as a driver that's ON gating off
> * clocks which are not in active use).
> *
> + * The externally visible transitions are handled with the help of the following
> + * callbacks included in this structure:
> + *
> + * @prepare: Prepare the device for the upcoming transition, but do NOT change
> + * its hardware state. Prevent new children of the device from being
> + * registered and prevent new calls to the probe method from being made
> + * after @prepare() returns. If @prepare() detects a situation it cannot
> + * handle (e.g. registration of a child already in progress), it may return
> + * -EAGAIN, so that the PM core can execute it once again (e.g. after the
> + * new child has been registered) to recover from the race condition. This
> + * method is executed for all kinds of suspend transitions and is followed
> + * by one of the suspend callbacks: @suspend(), @freeze(), or @poweroff().
> + * The PM core executes @prepare() for all devices before starting to
> + * execute suspend callbacks for any of them, so drivers may assume all of
> + * the other devices to be present and functional while @prepare() is being
> + * executed. However, they may NOT assume anything about the availability
> + * of the user space at that time.
> + *
> + * @complete: Undo the changes made by @prepare(). This method is executed for
> + * all kinds of resume transitions, following one of the resume callbacks:
> + * @resume(), @thaw(), @restore(). Also called if the state transition
> + * fails before the driver's suspend callback (@suspend(), @freeze(),
> + * @poweroff()) can be executed (e.g. if the suspend callback fails for one
> + * of the other devices that the PM core has unsucessfully attempted to
> + * suspend earlier). Also executed if a new child of the device has been
> + * registered during a successful @prepare() (in that case @prepare() will
> + * be executed once again for the device).
> + * The PM core executes @complete() after it has executed the appropriate
> + * resume callback for all devices.
> + *
> + * @suspend: Executed before putting the system into a sleep state in which the
> + * contents of main memory are preserved. Quiesce the device, put it into
> + * a low power state appropriate for the upcoming system state (such as
> + * PCI_D3hot), and enable wakeup events as appropriate.
> + *
> + * @resume: Executed after waking the system up from a sleep state in which the
> + * contents of main memory were preserved. Put the device into the
> + * appropriate state, according to the information saved in memory by the
> + * preceding @suspend(). The driver starts working again, responding to
> + * hardware events and software requests. The hardware may have gone
> + * through a power-off reset, or it may have maintained state from the
> + * previous suspend() which the driver may rely on while resuming. On most
> + * platforms, there are no restrictions on availability of resources like
> + * clocks during @resume().
> + *
> + * @freeze: Hibernation-specific, executed before creating a hibernation image.
> + * Quiesce operations so that a consistent image can be created, but do NOT
> + * otherwise put the device into a low power device state and do NOT emit
> + * system wakeup events. Save in main memory the device settings to be
> + * used by @restore() during the subsequent resume from hibernation or by
> + * the subsequent @thaw(), if the creation of the image or the restoration
> + * of main memory contents from it fails.
> + *
> + * @thaw: Hibernation-specific, executed after creating a hibernation image OR
> + * if the creation of the image fails. Also executed after a failing
> + * attempt to restore the contents of main memory from such an image.
> + * Undo the changes made by the preceding @freeze(), so the device can be
> + * operated in the same way as immediately before the call to @freeze().
> + *
> + * @poweroff: Hibernation-specific, executed after saving a hibernation image.
> + * Quiesce the device, put it into a low power state appropriate for the
> + * upcoming system state (such as PCI_D3hot), and enable wakeup events as
> + * appropriate.
> + *
> + * @restore: Hibernation-specific, executed after restoring the contents of main
> + * memory from a hibernation image. Driver starts working again,
> + * responding to hardware events and software requests. Drivers may NOT
> + * make ANY assumptions about the hardware state right prior to @restore().
> + * On most platforms, there are no restrictions on availability of
> + * resources like clocks during @restore().
> + *
> + * All of the above callbacks, except for @complete(), return error codes.
> + * However, the error codes returned by the resume operations, @resume(),
> + * @thaw(), and @restore(), are only printed in the system logs, since the PM
> + * core cannot do anything else about them.
> + */
> +
> +struct pm_ops {
> + int (*prepare)(struct device *dev);
> + void (*complete)(struct device *dev);
> + int (*suspend)(struct device *dev);
> + int (*resume)(struct device *dev);
> + int (*freeze)(struct device *dev);
> + int (*thaw)(struct device *dev);
> + int (*poweroff)(struct device *dev);
> + int (*restore)(struct device *dev);
> +};
> +
> +/**
> + * struct pm_noirq_ops - device PM callbacks executed with interrupts disabled
> + *
> + * The following callbacks included in 'struct pm_noirq_ops' are executed with
> + * interrupts disabled and with the nonboot CPUs switched off:
> + *
> + * @suspend_noirq: Complete the operations of ->suspend() by carrying out any
> + * actions required for suspending the device that need interrupts to be
> + * disabled
> + *
> + * @resume_noirq: Prepare for the execution of ->resume() by carrying out any
> + * actions required for resuming the device that need interrupts to be
> + * disabled
> + *
> + * @freeze_noirq: Complete the operations of ->freeze() by carrying out any
> + * actions required for freezing the device that need interrupts to be
> + * disabled
> + *
> + * @thaw_noirq: Prepare for the execution of ->thaw() by carrying out any
> + * actions required for thawing the device that need interrupts to be
> + * disabled
> + *
> + * @poweroff_noirq: Complete the operations of ->poweroff() by carrying out any
> + * actions required for handling the device that need interrupts to be
> + * disabled
> + *
> + * @restore_noirq: Prepare for the execution of ->restore() by carrying out any
> + * actions required for restoring the operations of the device that need
> + * interrupts to be disabled
> + *
> + * All of the above callbacks, return error codes, but the error codes returned
> + * by the resume operations, @resume_noirq(), @thaw_noirq(), and
> + * @restore_noirq(), are only printed in the system logs, since the PM core
> + * cannot do anything else about them.
> + */
> +
> +struct pm_noirq_ops {
> + int (*suspend_noirq)(struct device *dev);
> + int (*resume_noirq)(struct device *dev);
> + int (*freeze_noirq)(struct device *dev);
> + int (*thaw_noirq)(struct device *dev);
> + int (*poweroff_noirq)(struct device *dev);
> + int (*restore_noirq)(struct device *dev);
> +};
> +
> +/**
> + * PM_EVENT_ messages
> + *
> + * The following PM_EVENT_ messages are defined for the internal use of the PM
> + * core, in order to provide a mechanism allowing the high level suspend and
> + * hibernation code to convey the necessary information to the device PM core
> + * code:
> + *
> + * ON No transition.
> + *
> + * FREEZE System is going to hibernate, call ->prepare() and ->freeze()
> + * for all devices.
> + *
> + * SUSPEND System is going to suspend, call ->prepare() and ->suspend()
> + * for all devices.
> + *
> + * HIBERNATE Hibernation image has been saved, call ->prepare() and
> + * ->poweroff() for all devices.
> + *
> + * QUIESCE Contents of main memory are going to be restored from a (loaded)
> + * hibernation image, call ->prepare() and ->freeze() for all
> + * devices.
> + *
> + * RESUME System is resuming, call ->resume() and ->complete() for all
> + * devices.
> + *
> + * THAW Hibernation image has been created, call ->thaw() and
> + * ->complete() for all devices.
> + *
> + * RESTORE Contents of main memory have been restored from a hibernation
> + * image, call ->restore() and ->complete() for all devices.
> + *
> + * RECOVER Creation of a hibernation image or restoration of the main
> + * memory contents from a hibernation image has failed, call
> + * ->thaw() and ->complete() for all devices.
> + */
> +
> +#define PM_EVENT_ON 0x0000
> +#define PM_EVENT_FREEZE 0x0001
> +#define PM_EVENT_SUSPEND 0x0002
> +#define PM_EVENT_HIBERNATE 0x0004
> +#define PM_EVENT_QUIESCE 0x0008
> +#define PM_EVENT_RESUME 0x0010
> +#define PM_EVENT_THAW 0x0020
> +#define PM_EVENT_RESTORE 0x0040
> +#define PM_EVENT_RECOVER 0x0080
> +
> +#define PM_EVENT_SLEEP (PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE)
> +
> +#define PMSG_FREEZE ((struct pm_message){ .event = PM_EVENT_FREEZE, })
> +#define PMSG_QUIESCE ((struct pm_message){ .event = PM_EVENT_QUIESCE, })
> +#define PMSG_SUSPEND ((struct pm_message){ .event = PM_EVENT_SUSPEND, })
> +#define PMSG_HIBERNATE ((struct pm_message){ .event = PM_EVENT_HIBERNATE, })
> +#define PMSG_RESUME ((struct pm_message){ .event = PM_EVENT_RESUME, })
> +#define PMSG_THAW ((struct pm_message){ .event = PM_EVENT_THAW, })
> +#define PMSG_RESTORE ((struct pm_message){ .event = PM_EVENT_RESTORE, })
> +#define PMSG_RECOVER ((struct pm_message){ .event = PM_EVENT_RECOVER, })
> +#define PMSG_ON ((struct pm_message){ .event = PM_EVENT_ON, })
> +
> +/**
> + * Device power management states
> + *
> + * These state labels are used internally by the PM core to indicate the current
> + * status of a device with respect to the PM core operations.
> + *
> + * DPM_ON Device is regarded as operational. Set this way
> + * initially and when ->resume(), ->thaw(), or ->restore()
> + * (or ->complete() in case of an error) is about to be
> + * called. Also set when ->prepare() fails.
> + *
> + * DPM_PREPARING Device is currently being prepared for power transition.
> + * Set when ->prepare() is about to be called for the
> + * device.
> + *
> + * DPM_OFF Device is regarded as not fully operational. Set
> + * immediately after a successful call to ->prepare(),
> + * prior to calling ->suspend(), ->freeze(), or
> + * ->poweroff() for all devices.
> + */
> +
> +enum dpm_state {
> + DPM_ON,
> + DPM_PREPARING,
> + DPM_OFF,
> +};
> +
> +struct dev_pm_info {
> + pm_message_t power_state;
> + unsigned can_wakeup:1;
> + unsigned should_wakeup:1;
> + enum dpm_state status:2; /* Owned by the PM core */
> +#ifdef CONFIG_PM_SLEEP
> + struct list_head entry;
> +#endif
> +};
> +
> +/*
> + * The PM_EVENT_ messages are also used by drivers implementing the legacy
> + * suspend framework, based on the ->suspend() and ->resume() callbacks common
> + * for suspend and hibernation transitions, according to the rules below.
> + */
> +
> +/* Necessary, because several drivers use PM_EVENT_PRETHAW */
> +#define PM_EVENT_PRETHAW PM_EVENT_QUIESCE
> +
> +/*
> * One transition is triggered by resume(), after a suspend() call; the
> * message is implicit:
> *
> @@ -166,35 +407,11 @@ typedef struct pm_message {
> * or from system low-power states such as standby or suspend-to-RAM.
> */
>
> -#define PM_EVENT_ON 0
> -#define PM_EVENT_FREEZE 1
> -#define PM_EVENT_SUSPEND 2
> -#define PM_EVENT_HIBERNATE 4
> -#define PM_EVENT_PRETHAW 8
> -
> -#define PM_EVENT_SLEEP (PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE)
> -
> -#define PMSG_FREEZE ((struct pm_message){ .event = PM_EVENT_FREEZE, })
> -#define PMSG_PRETHAW ((struct pm_message){ .event = PM_EVENT_PRETHAW, })
> -#define PMSG_SUSPEND ((struct pm_message){ .event = PM_EVENT_SUSPEND, })
> -#define PMSG_HIBERNATE ((struct pm_message){ .event = PM_EVENT_HIBERNATE, })
> -#define PMSG_ON ((struct pm_message){ .event = PM_EVENT_ON, })
> -
> -struct dev_pm_info {
> - pm_message_t power_state;
> - unsigned can_wakeup:1;
> - unsigned should_wakeup:1;
> - bool sleeping:1; /* Owned by the PM core */
> -#ifdef CONFIG_PM_SLEEP
> - struct list_head entry;
> -#endif
> -};
> +#ifdef CONFIG_PM_SLEEP
> +extern void device_power_up(pm_message_t state);
> +extern void device_resume(pm_message_t state);
>
> extern int device_power_down(pm_message_t state);
> -extern void device_power_up(void);
> -extern void device_resume(void);
> -
> -#ifdef CONFIG_PM_SLEEP
> extern int device_suspend(pm_message_t state);
> extern int device_prepare_suspend(pm_message_t state);
>
> Index: linux-2.6/drivers/base/power/main.c
> ===================================================================
> --- linux-2.6.orig/drivers/base/power/main.c
> +++ linux-2.6/drivers/base/power/main.c
> @@ -48,6 +48,7 @@
> */
>
> LIST_HEAD(dpm_active);
> +static LIST_HEAD(dpm_in_transit);
> static LIST_HEAD(dpm_off);
> static LIST_HEAD(dpm_off_irq);
> static LIST_HEAD(dpm_destroy);
> @@ -55,7 +56,7 @@ static LIST_HEAD(dpm_destroy);
> static DEFINE_MUTEX(dpm_list_mtx);
>
> /* 'true' if all devices have been suspended, protected by dpm_list_mtx */
> -static bool all_sleeping;
> +static bool all_inactive;
>
> /**
> * device_pm_add - add a device to the list of active devices
> @@ -69,22 +70,30 @@ int device_pm_add(struct device *dev)
> dev->bus ? dev->bus->name : "No Bus",
> kobject_name(&dev->kobj));
> mutex_lock(&dpm_list_mtx);
> - if ((dev->parent && dev->parent->power.sleeping) || all_sleeping) {
> - if (dev->parent->power.sleeping)
> - dev_warn(dev,
> - "parent %s is sleeping, will not add\n",
> + if (all_inactive) {
> + dev_warn(dev, "all devices are sleeping, will not add\n");
> + goto Refuse;
> + }
> + if (dev->parent)
> + switch (dev->parent->power.status) {
> + case DPM_OFF:
> + dev_warn(dev, "parent %s is sleeping, will not add\n",
> dev->parent->bus_id);
> - else
> - dev_warn(dev, "devices are sleeping, will not add\n");
> - WARN_ON(true);
> - error = -EBUSY;
> - } else {
> - error = dpm_sysfs_add(dev);
> - if (!error)
> - list_add_tail(&dev->power.entry, &dpm_active);
> - }
> + goto Refuse;
> + case DPM_PREPARING:
> + dev->parent->power.status = DPM_ON;
> + break;
> + }
> + error = dpm_sysfs_add(dev);
> + if (!error)
> + list_add_tail(&dev->power.entry, &dpm_active);
> + End:
> mutex_unlock(&dpm_list_mtx);
> return error;
> + Refuse:
> + WARN_ON(true);
> + error = -EBUSY;
> + goto End;
> }
>
> /**
> @@ -122,26 +131,184 @@ void device_pm_schedule_removal(struct d
> }
> EXPORT_SYMBOL_GPL(device_pm_schedule_removal);
>
> +/**
> + * pm_op - execute the PM operation appropiate for given PM event
> + * @dev: Device.
> + * @ops: PM operations to choose from.
> + * @state: PM event message.
> + */
> +static int pm_op(struct device *dev, struct pm_ops *ops, pm_message_t state)
> +{
> + int error = 0;
> +
> + switch (state.event) {
> +#ifdef CONFIG_SUSPEND
> + case PM_EVENT_SUSPEND:
> + if (ops->suspend) {
> + error = ops->suspend(dev);
> + suspend_report_result(ops->suspend, error);
> + }
> + break;
> + case PM_EVENT_RESUME:
> + if (ops->resume) {
> + error = ops->resume(dev);
> + suspend_report_result(ops->resume, error);
> + }
> + break;
> +#endif /* CONFIG_SUSPEND */
> +#ifdef CONFIG_HIBERNATION
> + case PM_EVENT_FREEZE:
> + case PM_EVENT_QUIESCE:
> + if (ops->freeze) {
> + error = ops->freeze(dev);
> + suspend_report_result(ops->freeze, error);
> + }
> + break;
> + case PM_EVENT_HIBERNATE:
> + if (ops->poweroff) {
> + error = ops->poweroff(dev);
> + suspend_report_result(ops->poweroff, error);
> + }
> + break;
> + case PM_EVENT_THAW:
> + case PM_EVENT_RECOVER:
> + if (ops->thaw) {
> + error = ops->thaw(dev);
> + suspend_report_result(ops->thaw, error);
> + }
> + break;
> + case PM_EVENT_RESTORE:
> + if (ops->restore) {
> + error = ops->restore(dev);
> + suspend_report_result(ops->restore, error);
> + }
> + break;
> +#endif /* CONFIG_HIBERNATION */
> + default:
> + error = -EINVAL;
> + }
> + return error;
> +}
> +
> +/**
> + * pm_noirq_op - execute the PM operation appropiate for given PM event
> + * @dev: Device.
> + * @ops: PM operations to choose from.
> + * @state: PM event message.
> + *
> + * The operation is executed with interrupts disabled by the only remaining
> + * functional CPU in the system.
> + */
> +static int pm_noirq_op(struct device *dev, struct pm_noirq_ops *ops,
> + pm_message_t state)
> +{
> + int error = 0;
> +
> + switch (state.event) {
> +#ifdef CONFIG_SUSPEND
> + case PM_EVENT_SUSPEND:
> + if (ops->suspend_noirq) {
> + error = ops->suspend_noirq(dev);
> + suspend_report_result(ops->suspend_noirq, error);
> + }
> + break;
> + case PM_EVENT_RESUME:
> + if (ops->resume_noirq) {
> + error = ops->resume_noirq(dev);
> + suspend_report_result(ops->resume_noirq, error);
> + }
> + break;
> +#endif /* CONFIG_SUSPEND */
> +#ifdef CONFIG_HIBERNATION
> + case PM_EVENT_FREEZE:
> + case PM_EVENT_QUIESCE:
> + if (ops->freeze_noirq) {
> + error = ops->freeze_noirq(dev);
> + suspend_report_result(ops->freeze_noirq, error);
> + }
> + break;
> + case PM_EVENT_HIBERNATE:
> + if (ops->poweroff_noirq) {
> + error = ops->poweroff_noirq(dev);
> + suspend_report_result(ops->poweroff_noirq, error);
> + }
> + break;
> + case PM_EVENT_THAW:
> + case PM_EVENT_RECOVER:
> + if (ops->thaw_noirq) {
> + error = ops->thaw_noirq(dev);
> + suspend_report_result(ops->thaw_noirq, error);
> + }
> + break;
> + case PM_EVENT_RESTORE:
> + if (ops->restore_noirq) {
> + error = ops->restore_noirq(dev);
> + suspend_report_result(ops->restore_noirq, error);
> + }
> + break;
> +#endif /* CONFIG_HIBERNATION */
> + default:
> + error = -EINVAL;
> + }
> + return error;
> +}
> +
> +static char *pm_verb(int event)
> +{
> + switch (event) {
> + case PM_EVENT_SUSPEND:
> + return "suspend";
> + case PM_EVENT_RESUME:
> + return "resume";
> + case PM_EVENT_FREEZE:
> + return "freeze";
> + case PM_EVENT_QUIESCE:
> + return "quiesce";
> + case PM_EVENT_HIBERNATE:
> + return "hibernate";
> + case PM_EVENT_THAW:
> + return "thaw";
> + case PM_EVENT_RESTORE:
> + return "restore";
> + default:
> + return "(unknown PM event)";
> + }
> +}
> +
> +static void pm_dev_dbg(struct device *dev, pm_message_t state, char *info)
> +{
> + dev_dbg(dev, "%s%s%s\n", info, pm_verb(state.event),
> + ((state.event & PM_EVENT_SLEEP) && device_may_wakeup(dev)) ?
> + ", may wakeup" : "");
> +}
> +
> /*------------------------- Resume routines -------------------------*/
>
> /**
> - * resume_device_early - Power on one device (early resume).
> + * resume_device_noirq - Power on one device (early resume).
> * @dev: Device.
> + * @state: Operation to carry out.
> *
> * Must be called with interrupts disabled.
> */
> -static int resume_device_early(struct device *dev)
> +static int resume_device_noirq(struct device *dev, pm_message_t state)
> {
> int error = 0;
>
> TRACE_DEVICE(dev);
> TRACE_RESUME(0);
>
> - if (dev->bus && dev->bus->resume_early) {
> - dev_dbg(dev, "EARLY resume\n");
> + if (!dev->bus)
> + goto End;
> +
> + if (dev->bus->pm_noirq) {
> + pm_dev_dbg(dev, state, "EARLY ");
> + error = pm_noirq_op(dev, dev->bus->pm_noirq, state);
> + } else if (dev->bus->resume_early) {
> + pm_dev_dbg(dev, state, "legacy EARLY ");
> error = dev->bus->resume_early(dev);
> }
> -
> + End:
> TRACE_RESUME(error);
> return error;
> }
> @@ -156,7 +323,7 @@ static int resume_device_early(struct de
> *
> * Must be called with interrupts disabled and only one CPU running.
> */
> -static void dpm_power_up(void)
> +static void dpm_power_up(pm_message_t state)
> {
>
> while (!list_empty(&dpm_off_irq)) {
> @@ -164,7 +331,7 @@ static void dpm_power_up(void)
> struct device *dev = to_device(entry);
>
> list_move_tail(entry, &dpm_off);
> - resume_device_early(dev);
> + resume_device_noirq(dev, state);
> }
> }
>
> @@ -176,19 +343,19 @@ static void dpm_power_up(void)
> *
> * Must be called with interrupts disabled.
> */
> -void device_power_up(void)
> +void device_power_up(pm_message_t state)
> {
> sysdev_resume();
> - dpm_power_up();
> + dpm_power_up(state);
> }
> EXPORT_SYMBOL_GPL(device_power_up);
>
> /**
> * resume_device - Restore state for one device.
> * @dev: Device.
> - *
> + * @state: Operation to carry out.
> */
> -static int resume_device(struct device *dev)
> +static int resume_device(struct device *dev, pm_message_t state)
> {
> int error = 0;
>
> @@ -197,21 +364,40 @@ static int resume_device(struct device *
>
> down(&dev->sem);
>
> - if (dev->bus && dev->bus->resume) {
> - dev_dbg(dev,"resuming\n");
> - error = dev->bus->resume(dev);
> + if (dev->bus) {
> + if (dev->bus->pm) {
> + pm_dev_dbg(dev, state, "");
> + error = pm_op(dev, dev->bus->pm, state);
> + } else if (dev->bus->resume) {
> + pm_dev_dbg(dev, state, "legacy ");
> + error = dev->bus->resume(dev);
> + }
> + if (error)
> + goto End;
> }
>
> - if (!error && dev->type && dev->type->resume) {
> - dev_dbg(dev,"resuming\n");
> - error = dev->type->resume(dev);
> + if (dev->type) {
> + if (dev->type->pm) {
> + pm_dev_dbg(dev, state, "type ");
> + error = pm_op(dev, dev->type->pm, state);
> + } else if (dev->type->resume) {
> + pm_dev_dbg(dev, state, "legacy type ");
> + error = dev->type->resume(dev);
> + }
> + if (error)
> + goto End;
> }
>
> - if (!error && dev->class && dev->class->resume) {
> - dev_dbg(dev,"class resume\n");
> - error = dev->class->resume(dev);
> + if (dev->class) {
> + if (dev->class->pm) {
> + pm_dev_dbg(dev, state, "class ");
> + error = pm_op(dev, dev->class->pm, state);
> + } else if (dev->class->resume) {
> + pm_dev_dbg(dev, state, "legacy class ");
> + error = dev->class->resume(dev);
> + }
> }
> -
> + End:
> up(&dev->sem);
>
> TRACE_RESUME(error);
> @@ -226,20 +412,73 @@ static int resume_device(struct device *
> * went through the early resume.
> *
> * Take devices from the dpm_off_list, resume them,
> - * and put them on the dpm_locked list.
> + * and put them on the dpm_in_transit list.
> */
> -static void dpm_resume(void)
> +static void dpm_resume(pm_message_t state)
> {
> mutex_lock(&dpm_list_mtx);
> - all_sleeping = false;
> while(!list_empty(&dpm_off)) {
> struct list_head *entry = dpm_off.next;
> struct device *dev = to_device(entry);
>
> + list_move_tail(entry, &dpm_in_transit);
> + mutex_unlock(&dpm_list_mtx);
> +
> + resume_device(dev, state);
> +
> + mutex_lock(&dpm_list_mtx);
> + }
> + mutex_unlock(&dpm_list_mtx);
> +}
> +
> +/**
> + * complete_device - Complete a PM transition for given device
> + * @dev: Device.
> + * @state: Power transition we are completing.
> + */
> +static void complete_device(struct device *dev, pm_message_t state)
> +{
> + down(&dev->sem);
> +
> + if (dev->bus && dev->bus->pm && dev->bus->pm->complete) {
> + pm_dev_dbg(dev, state, "completing ");
> + dev->bus->pm->complete(dev);
> + }
> +
> + if (dev->type && dev->type->pm && dev->type->pm->complete) {
> + pm_dev_dbg(dev, state, "completing type ");
> + dev->type->pm->complete(dev);
> + }
> +
> + if (dev->class && dev->class->pm && dev->class->pm->complete) {
> + pm_dev_dbg(dev, state, "completing class ");
> + dev->class->pm->complete(dev);
> + }
> +
> + up(&dev->sem);
> +}
> +
> +/**
> + * dpm_complete - Complete a PM transition for all devices.
> + * @state: Power transition we are completing.
> + *
> + * Take devices from the dpm_in_transit, complete the transition for each
> + * of them and put them on the dpm_active list.
> + */
> +static void dpm_complete(pm_message_t state)
> +{
> + mutex_lock(&dpm_list_mtx);
> + all_inactive = false;
> + while(!list_empty(&dpm_in_transit)) {
> + struct list_head *entry = dpm_in_transit.next;
> + struct device *dev = to_device(entry);
> +
> list_move_tail(entry, &dpm_active);
> - dev->power.sleeping = false;
> + dev->power.status = DPM_ON;
> mutex_unlock(&dpm_list_mtx);
> - resume_device(dev);
> +
> + complete_device(dev, state);
> +
> mutex_lock(&dpm_list_mtx);
> }
> mutex_unlock(&dpm_list_mtx);
> @@ -267,14 +506,16 @@ static void unregister_dropped_devices(v
>
> /**
> * device_resume - Restore state of each device in system.
> + * @state: Operation to carry out.
> *
> * Resume all the devices, unlock them all, and allow new
> * devices to be registered once again.
> */
> -void device_resume(void)
> +void device_resume(pm_message_t state)
> {
> might_sleep();
> - dpm_resume();
> + dpm_resume(state);
> + dpm_complete(state);
> unregister_dropped_devices();
> }
> EXPORT_SYMBOL_GPL(device_resume);
> @@ -282,37 +523,44 @@ EXPORT_SYMBOL_GPL(device_resume);
>
> /*------------------------- Suspend routines -------------------------*/
>
> -static inline char *suspend_verb(u32 event)
> +/**
> + * resume_event - return a PM message representing the resume event
> + * corresponding to given sleep state.
> + * @sleep_state - PM message representing a sleep state
> + */
> +static pm_message_t resume_event(pm_message_t sleep_state)
> {
> - switch (event) {
> - case PM_EVENT_SUSPEND: return "suspend";
> - case PM_EVENT_FREEZE: return "freeze";
> - case PM_EVENT_PRETHAW: return "prethaw";
> - default: return "(unknown suspend event)";
> + switch (sleep_state.event) {
> + case PM_EVENT_SUSPEND:
> + return PMSG_RESUME;
> + case PM_EVENT_FREEZE:
> + case PM_EVENT_QUIESCE:
> + return PMSG_RECOVER;
> + case PM_EVENT_HIBERNATE:
> + return PMSG_RESTORE;
> }
> -}
> -
> -static void
> -suspend_device_dbg(struct device *dev, pm_message_t state, char *info)
> -{
> - dev_dbg(dev, "%s%s%s\n", info, suspend_verb(state.event),
> - ((state.event == PM_EVENT_SUSPEND) && device_may_wakeup(dev)) ?
> - ", may wakeup" : "");
> + return PMSG_ON;
> }
>
> /**
> - * suspend_device_late - Shut down one device (late suspend).
> + * suspend_device_noirq - Shut down one device (late suspend).
> * @dev: Device.
> - * @state: Power state device is entering.
> + * @state: PM message representing the operation to perform.
> *
> * This is called with interrupts off and only a single CPU running.
> */
> -static int suspend_device_late(struct device *dev, pm_message_t state)
> +static int suspend_device_noirq(struct device *dev, pm_message_t state)
> {
> int error = 0;
>
> - if (dev->bus && dev->bus->suspend_late) {
> - suspend_device_dbg(dev, state, "LATE ");
> + if (!dev->bus)
> + return 0;
> +
> + if (dev->bus->pm_noirq) {
> + pm_dev_dbg(dev, state, "LATE ");
> + error = pm_noirq_op(dev, dev->bus->pm_noirq, state);
> + } else if (dev->bus->suspend_late) {
> + pm_dev_dbg(dev, state, "legacy LATE ");
> error = dev->bus->suspend_late(dev, state);
> suspend_report_result(dev->bus->suspend_late, error);
> }
> @@ -321,7 +569,7 @@ static int suspend_device_late(struct de
>
> /**
> * device_power_down - Shut down special devices.
> - * @state: Power state to enter.
> + * @state: PM message representing the operation to perform.
> *
> * Power down devices that require interrupts to be disabled
> * and move them from the dpm_off list to the dpm_off_irq list.
> @@ -337,7 +585,7 @@ int device_power_down(pm_message_t state
> struct list_head *entry = dpm_off.prev;
> struct device *dev = to_device(entry);
>
> - error = suspend_device_late(dev, state);
> + error = suspend_device_noirq(dev, state);
> if (error) {
> printk(KERN_ERR "Could not power down device %s: "
> "error %d\n",
> @@ -351,7 +599,7 @@ int device_power_down(pm_message_t state
> if (!error)
> error = sysdev_suspend(state);
> if (error)
> - dpm_power_up();
> + dpm_power_up(resume_event(state));
> return error;
> }
> EXPORT_SYMBOL_GPL(device_power_down);
> @@ -367,29 +615,43 @@ static int suspend_device(struct device
>
> down(&dev->sem);
>
> - if (dev->power.power_state.event) {
> - dev_dbg(dev, "PM: suspend %d-->%d\n",
> - dev->power.power_state.event, state.event);
> - }
> -
> - if (dev->class && dev->class->suspend) {
> - suspend_device_dbg(dev, state, "class ");
> - error = dev->class->suspend(dev, state);
> - suspend_report_result(dev->class->suspend, error);
> + if (dev->class) {
> + if (dev->class->pm) {
> + pm_dev_dbg(dev, state, "class ");
> + error = pm_op(dev, dev->class->pm, state);
> + } else if (dev->class->suspend) {
> + pm_dev_dbg(dev, state, "legacy class ");
> + error = dev->class->suspend(dev, state);
> + suspend_report_result(dev->class->suspend, error);
> + }
> + if (error)
> + goto End;
> }
>
> - if (!error && dev->type && dev->type->suspend) {
> - suspend_device_dbg(dev, state, "type ");
> - error = dev->type->suspend(dev, state);
> - suspend_report_result(dev->type->suspend, error);
> + if (dev->type) {
> + if (dev->type->pm) {
> + pm_dev_dbg(dev, state, "type ");
> + error = pm_op(dev, dev->type->pm, state);
> + } else if (dev->type->suspend) {
> + pm_dev_dbg(dev, state, "legacy type ");
> + error = dev->type->suspend(dev, state);
> + suspend_report_result(dev->type->suspend, error);
> + }
> + if (error)
> + goto End;
> }
>
> - if (!error && dev->bus && dev->bus->suspend) {
> - suspend_device_dbg(dev, state, "");
> - error = dev->bus->suspend(dev, state);
> - suspend_report_result(dev->bus->suspend, error);
> + if (dev->bus) {
> + if (dev->bus->pm) {
> + pm_dev_dbg(dev, state, "");
> + error = pm_op(dev, dev->bus->pm, state);
> + } else if (dev->bus->suspend) {
> + pm_dev_dbg(dev, state, "legacy ");
> + error = dev->bus->suspend(dev, state);
> + suspend_report_result(dev->bus->suspend, error);
> + }
> }
> -
> + End:
> up(&dev->sem);
>
> return error;
> @@ -399,65 +661,140 @@ static int suspend_device(struct device
> * dpm_suspend - Suspend every device.
> * @state: Power state to put each device in.
> *
> - * Walk the dpm_locked list. Suspend each device and move it
> - * to the dpm_off list.
> - *
> - * (For historical reasons, if it returns -EAGAIN, that used to mean
> - * that the device would be called again with interrupts disabled.
> - * These days, we use the "suspend_late()" callback for that, so we
> - * print a warning and consider it an error).
> + * Walk the dpm_in_transit list. Suspend each device and move it to the
> + * dpm_off list.
> */
> static int dpm_suspend(pm_message_t state)
> {
> int error = 0;
>
> mutex_lock(&dpm_list_mtx);
> - while (!list_empty(&dpm_active)) {
> - struct list_head *entry = dpm_active.prev;
> + while (!list_empty(&dpm_in_transit)) {
> + struct list_head *entry = dpm_in_transit.prev;
> struct device *dev = to_device(entry);
>
> - WARN_ON(dev->parent && dev->parent->power.sleeping);
> -
> - dev->power.sleeping = true;
> mutex_unlock(&dpm_list_mtx);
> +
> error = suspend_device(dev, state);
> +
> mutex_lock(&dpm_list_mtx);
> if (error) {
> printk(KERN_ERR "Could not suspend device %s: "
> - "error %d%s\n",
> - kobject_name(&dev->kobj),
> - error,
> - (error == -EAGAIN ?
> - " (please convert to suspend_late)" :
> - ""));
> - dev->power.sleeping = false;
> + "error %d\n", kobject_name(&dev->kobj), error);
> break;
> }
> if (!list_empty(&dev->power.entry))
> list_move(&dev->power.entry, &dpm_off);
> }
> - if (!error)
> - all_sleeping = true;
> mutex_unlock(&dpm_list_mtx);
> + return error;
> +}
> +
> +/**
> + * prepare_device - Execute the ->prepare() callback(s) for given device.
> + * @dev: Device.
> + * @state: PM operation we are preparing for.
> + */
> +static int prepare_device(struct device *dev, pm_message_t state)
> +{
> + int error = 0;
> +
> + down(&dev->sem);
> +
> + if (dev->bus && dev->bus->pm && dev->bus->pm->prepare) {
> + pm_dev_dbg(dev, state, "preparing ");
> + error = dev->bus->pm->prepare(dev);
> + suspend_report_result(dev->bus->pm->prepare, error);
> + if (error)
> + goto End;
> + }
> +
> + if (dev->type && dev->type->pm && dev->type->pm->prepare) {
> + pm_dev_dbg(dev, state, "preparing type ");
> + error = dev->type->pm->prepare(dev);
> + suspend_report_result(dev->type->pm->prepare, error);
> + if (error)
> + goto End;
> + }
> +
> + if (dev->class && dev->class->pm && dev->class->pm->prepare) {
> + pm_dev_dbg(dev, state, "preparing class ");
> + error = dev->class->pm->prepare(dev);
> + suspend_report_result(dev->class->pm->prepare, error);
> + }
> + End:
> + up(&dev->sem);
>
> return error;
> }
>
> /**
> + * dpm_prepare - Prepare all devices for a PM transition.
> + * @state: Power state to put each device in.
> + *
> + * Walk the dpm_active list. Prepare each device and move it to the
> + * dpm_in_transit list.
> + */
> +static int dpm_prepare(pm_message_t state)
> +{
> + int error = 0;
> +
> + mutex_lock(&dpm_list_mtx);
> + while (!list_empty(&dpm_active)) {
> + struct list_head *entry = dpm_active.prev;
> + struct device *dev = to_device(entry);
> +
> + WARN_ON(dev->parent && dev->parent->power.status == DPM_OFF);
> + dev->power.status = DPM_PREPARING;
> + mutex_unlock(&dpm_list_mtx);
> +
> + error = prepare_device(dev, state);
> +
> + mutex_lock(&dpm_list_mtx);
> + if (error) {
> + dev->power.status = DPM_ON;
> + if (error == -EAGAIN)
> + continue;
> + printk(KERN_ERR "Could not prepare device %s "
> + "for suspend: error %d\n",
> + kobject_name(&dev->kobj), error);
> + goto End;
> + }
> + if (dev->power.status == DPM_ON) {
> + /* Child added during prepare_device() */
> + mutex_unlock(&dpm_list_mtx);
> +
> + complete_device(dev, resume_event(state));
> +
> + mutex_lock(&dpm_list_mtx);
> + continue;
> + }
> + dev->power.status = DPM_OFF;
> + if (!list_empty(&dev->power.entry))
> + list_move(&dev->power.entry, &dpm_in_transit);
> + }
> + all_inactive = true;
> + End:
> + mutex_unlock(&dpm_list_mtx);
> + return error;
> +}
> +
> +/**
> * device_suspend - Save state and stop all devices in system.
> * @state: new power management state
> *
> - * Prevent new devices from being registered, then lock all devices
> - * and suspend them.
> + * Prepare and suspend all devices.
> */
> int device_suspend(pm_message_t state)
> {
> int error;
>
> might_sleep();
> - error = dpm_suspend(state);
> + error = dpm_prepare(state);
> + if (!error)
> + error = dpm_suspend(state);
> if (error)
> - device_resume();
> + device_resume(resume_event(state));
> return error;
> }
> EXPORT_SYMBOL_GPL(device_suspend);
> Index: linux-2.6/include/linux/device.h
> ===================================================================
> --- linux-2.6.orig/include/linux/device.h
> +++ linux-2.6/include/linux/device.h
> @@ -69,6 +69,9 @@ struct bus_type {
> int (*resume_early)(struct device *dev);
> int (*resume)(struct device *dev);
>
> + struct pm_ops *pm;
> + struct pm_noirq_ops *pm_noirq;
> +
> struct bus_type_private *p;
> };
>
> @@ -202,6 +205,8 @@ struct class {
>
> int (*suspend)(struct device *dev, pm_message_t state);
> int (*resume)(struct device *dev);
> +
> + struct pm_ops *pm;
> };
>
> extern int __must_check class_register(struct class *class);
> @@ -345,8 +350,11 @@ struct device_type {
> struct attribute_group **groups;
> int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
> void (*release)(struct device *dev);
> +
> int (*suspend)(struct device *dev, pm_message_t state);
> int (*resume)(struct device *dev);
> +
> + struct pm_ops *pm;
> };
>
> /* interface for exporting device attributes */
> Index: linux-2.6/kernel/power/disk.c
> ===================================================================
> --- linux-2.6.orig/kernel/power/disk.c
> +++ linux-2.6/kernel/power/disk.c
> @@ -224,7 +224,7 @@ static int create_image(int platform_mod
> /* NOTE: device_power_up() is just a resume() for devices
> * that suspended with irqs off ... no overall powerup.
> */
> - device_power_up();
> + device_power_up(in_suspend ? PMSG_RECOVER : PMSG_RESTORE);
> Enable_irqs:
> local_irq_enable();
> return error;
> @@ -280,7 +280,7 @@ int hibernation_snapshot(int platform_mo
> Finish:
> platform_finish(platform_mode);
> Resume_devices:
> - device_resume();
> + device_resume(in_suspend ? PMSG_RECOVER : PMSG_RESTORE);
> Resume_console:
> resume_console();
> Close:
> @@ -301,7 +301,7 @@ static int resume_target_kernel(void)
> int error;
>
> local_irq_disable();
> - error = device_power_down(PMSG_PRETHAW);
> + error = device_power_down(PMSG_QUIESCE);
> if (error) {
> printk(KERN_ERR "PM: Some devices failed to power down, "
> "aborting resume\n");
> @@ -329,7 +329,7 @@ static int resume_target_kernel(void)
> swsusp_free();
> restore_processor_state();
> touch_softlockup_watchdog();
> - device_power_up();
> + device_power_up(PMSG_THAW);
> Enable_irqs:
> local_irq_enable();
> return error;
> @@ -350,7 +350,7 @@ int hibernation_restore(int platform_mod
>
> pm_prepare_console();
> suspend_console();
> - error = device_suspend(PMSG_PRETHAW);
> + error = device_suspend(PMSG_QUIESCE);
> if (error)
> goto Finish;
>
> @@ -362,7 +362,7 @@ int hibernation_restore(int platform_mod
> enable_nonboot_cpus();
> }
> platform_restore_cleanup(platform_mode);
> - device_resume();
> + device_resume(PMSG_RECOVER);
> Finish:
> resume_console();
> pm_restore_console();
> @@ -419,7 +419,7 @@ int hibernation_platform_enter(void)
> Finish:
> hibernation_ops->finish();
> Resume_devices:
> - device_resume();
> + device_resume(PMSG_RESTORE);
> Resume_console:
> resume_console();
> Close:
> Index: linux-2.6/kernel/power/main.c
> ===================================================================
> --- linux-2.6.orig/kernel/power/main.c
> +++ linux-2.6/kernel/power/main.c
> @@ -239,7 +239,7 @@ static int suspend_enter(suspend_state_t
> if (!suspend_test(TEST_CORE))
> error = suspend_ops->enter(state);
>
> - device_power_up();
> + device_power_up(PMSG_RESUME);
> Done:
> arch_suspend_enable_irqs();
> BUG_ON(irqs_disabled());
> @@ -291,7 +291,7 @@ int suspend_devices_and_enter(suspend_st
> if (suspend_ops->finish)
> suspend_ops->finish();
> Resume_devices:
> - device_resume();
> + device_resume(PMSG_RESUME);
> Resume_console:
> resume_console();
> Close:
> Index: linux-2.6/arch/x86/kernel/apm_32.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/apm_32.c
> +++ linux-2.6/arch/x86/kernel/apm_32.c
> @@ -1208,9 +1208,9 @@ static int suspend(int vetoable)
> if (err != APM_SUCCESS)
> apm_error("suspend", err);
> err = (err == APM_SUCCESS) ? 0 : -EIO;
> - device_power_up();
> + device_power_up(PMSG_RESUME);
> local_irq_enable();
> - device_resume();
> + device_resume(PMSG_RESUME);
> queue_event(APM_NORMAL_RESUME, NULL);
> out:
> spin_lock(&user_list_lock);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2008-03-22 22:19:09

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Friday, 21 of March 2008, Alan Stern wrote:
> On Fri, 21 Mar 2008, Rafael J. Wysocki wrote:
>
> > > One trivial problem is that your dpm_prepare and dpm_complete routines
> > > go through the list in the wrong order.
> >
> > I'm not all so sure that the order is actually wrong. What would be the
> > advantage of the forward order over the current one?
>
> The advantage is that races no longer matter.
>
> Suppose you're going through the list backwards and a race occurs, so
> that a child is registered while the parent's prepare() is running.
> That new child will of course be at the end of dpm_active, so the loop
> in dpm_prepare won't see it (assuming you adopt the approach of using
> only a single list). But if you go through the list forwards and a new
> child is added, you will naturally see the child when you reach the end
> of the list. You don't even have to go through the business about
> changing the parent's state back to DPM_ON.
>
> There are other ways of describing the situation, but they all come
> down to the same thing. For instance, this is the way Ben was talking
> about doing things.
>
> > > Since dpm_prepare is supposed to go through the list in the forward
> > > direction, logically the "root" of the device tree should be the first
> > > thing "prepared". This means you should not allow parentless devices
> > > to be registered any time after dpm_prepare has started. Is that
> > > liable to cause problems?
> >
> > I'm still not seeing the advantage of the forward direction in the first place.
> >
> > Although I don't see what particular problems that may cause, I think the
> > current approach (first, block registrations of new children for each
> > ->prepare()d device and finally block any registrations of new devices) is
> > more natural.
>
> I suppose for parentless devices it doesn't really matter. You could
> safely allow them to be registered any time up until dpm_prepare()
> finishes -- which is what your patch does. Perhaps the "all_inactive"
> flag should be renamed to "all_prepared".
>
> > > There may be similar problems with complete(). A number of drivers
> > > check during their resume method for the presence of new children
> > > plugged in while the system was asleep. All these drivers would have
> > > to be converted over to the new scheme if they weren't permitted to
> > > register the new children before complete() was called. Of course,
> > > this is easily fixed by permitting new children starting from when
> > > resume() is called rather than when complete() is called.
> >
> > Well, the problem here was the protection of the correct ordering of the
> > various lists. However, if the approach with changing 'status' is adopted
> > instead, which seems to be better, we'll be able to unblock the registering
> > of new children before ->resume().
>
> Yep. The only thing to watch out for is in device_pm_remove(); it
> would be a disaster if somehow a device was removed while it was being
> prepared/suspended/resumed/completed/whatever. I know that's not
> supposed to happen but there's nothing to prevent it, especially if
> the device in question doesn't have a driver. No doubt you can invent
> a way to allow this to happen safely.

Well, that's a separate issue that IMO should be addressed in a separate patch.
Something like the one below comes to mind.

The comment removed by the patch is wrong IMO, because it implies that
device_add() may be called with the device semaphore held and that might
deadlock in bus_attach_device(). Thus, I think we can acquire dev->sem
in device_pm_add() and in device_pm_remove().

Thanks,
Rafael

---
drivers/base/power/main.c | 21 ++++++++++++++-------
include/linux/pm.h | 1 +
2 files changed, 15 insertions(+), 7 deletions(-)

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -41,10 +41,6 @@
* from the bottom (i.e., end) up and resumed in the opposite order.
* That way no parent will be suspended while it still has an active
* child.
- *
- * Since device_pm_add() may be called with a device semaphore held,
- * we must never try to acquire a device semaphore while holding
- * dpm_list_mutex.
*/

LIST_HEAD(dpm_active);
@@ -68,6 +64,7 @@ int device_pm_add(struct device *dev)
pr_debug("PM: Adding info for %s:%s\n",
dev->bus ? dev->bus->name : "No Bus",
kobject_name(&dev->kobj));
+ down(&dev->sem);
mutex_lock(&dpm_list_mtx);
if ((dev->parent && dev->parent->power.sleeping) || all_sleeping) {
if (dev->parent->power.sleeping)
@@ -80,10 +77,13 @@ int device_pm_add(struct device *dev)
error = -EBUSY;
} else {
error = dpm_sysfs_add(dev);
- if (!error)
+ if (!error) {
+ dev->power.present = true;
list_add_tail(&dev->power.entry, &dpm_active);
+ }
}
mutex_unlock(&dpm_list_mtx);
+ up(&dev->sem);
return error;
}

@@ -98,10 +98,13 @@ void device_pm_remove(struct device *dev
pr_debug("PM: Removing info for %s:%s\n",
dev->bus ? dev->bus->name : "No Bus",
kobject_name(&dev->kobj));
+ down(&dev->sem);
mutex_lock(&dpm_list_mtx);
dpm_sysfs_remove(dev);
+ dev->power.present = false;
list_del_init(&dev->power.entry);
mutex_unlock(&dpm_list_mtx);
+ up(&dev->sem);
}

/**
@@ -196,6 +199,8 @@ static int resume_device(struct device *
TRACE_RESUME(0);

down(&dev->sem);
+ if (!dev->power.present)
+ goto End;

if (dev->bus && dev->bus->resume) {
dev_dbg(dev,"resuming\n");
@@ -211,7 +216,7 @@ static int resume_device(struct device *
dev_dbg(dev,"class resume\n");
error = dev->class->resume(dev);
}
-
+ End:
up(&dev->sem);

TRACE_RESUME(error);
@@ -366,6 +371,8 @@ static int suspend_device(struct device
int error = 0;

down(&dev->sem);
+ if (!dev->power.present)
+ goto End;

if (dev->power.power_state.event) {
dev_dbg(dev, "PM: suspend %d-->%d\n",
@@ -389,7 +396,7 @@ static int suspend_device(struct device
error = dev->bus->suspend(dev, state);
suspend_report_result(dev->bus->suspend, error);
}
-
+ End:
up(&dev->sem);

return error;
Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -185,6 +185,7 @@ struct dev_pm_info {
unsigned can_wakeup:1;
unsigned should_wakeup:1;
bool sleeping:1; /* Owned by the PM core */
+ bool present:1; /* Owned by the PM core */
#ifdef CONFIG_PM_SLEEP
struct list_head entry;
#endif

2008-03-22 23:28:31

by Alan Stern

[permalink] [raw]
Subject: Re: [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Sat, 22 Mar 2008, Rafael J. Wysocki wrote:

> > Yep. The only thing to watch out for is in device_pm_remove(); it
> > would be a disaster if somehow a device was removed while it was being
> > prepared/suspended/resumed/completed/whatever. I know that's not
> > supposed to happen but there's nothing to prevent it, especially if
> > the device in question doesn't have a driver. No doubt you can invent
> > a way to allow this to happen safely.
>
> Well, that's a separate issue that IMO should be addressed in a separate patch.
> Something like the one below comes to mind.
>
> The comment removed by the patch is wrong IMO, because it implies that
> device_add() may be called with the device semaphore held and that might
> deadlock in bus_attach_device().

Are you talking about this comment?

> - * Since device_pm_add() may be called with a device semaphore held,
> - * we must never try to acquire a device semaphore while holding
> - * dpm_list_mutex.

It isn't wrong. device_add() may indeed be called with a device
semaphore held -- just not the semaphore for the device being added.
Quite often it is called with device's parent's semaphore held. The
implication is not that we may deadlock in bus_attach_device(); rather
it is that the order of acquisition must always be device semaphore
first, dev_list_mutex second.

> Thus, I think we can acquire dev->sem
> in device_pm_add() and in device_pm_remove().

No, you have missed the entire point. The problem doesn't exist in the
current code; it exists only if we switch over to using a single list.
Routines like dpm_suspend() won't be able to use list_for_each_entry()
to traverse the list because entries may be removed by other threads
during the traversal. Even list_for_each_entry_safe() won't work
correctly without careful attention to details.

Alan Stern

2008-03-22 23:45:24

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Sunday, 23 of March 2008, Alan Stern wrote:
> On Sat, 22 Mar 2008, Rafael J. Wysocki wrote:
>
[--snip--]
>
> No, you have missed the entire point. The problem doesn't exist in the
> current code; it exists only if we switch over to using a single list.
> Routines like dpm_suspend() won't be able to use list_for_each_entry()
> to traverse the list because entries may be removed by other threads
> during the traversal. Even list_for_each_entry_safe() won't work
> correctly without careful attention to details.

Ah, ok. Thanks for the clarification.

Doesn't it help that we traverse the list under dpm_list_mtx? Anyone who
removes an entry is required to take dpm_list_mtx that we're holding while
the list is traversed except when the callbacks are invoked.

The only problem I see is when the device currently being handled is removed
from the list by a concurrent thread. Is that you were referring to?

Rafael

2008-03-23 02:07:57

by Alan Stern

[permalink] [raw]
Subject: Re: [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Sun, 23 Mar 2008, Rafael J. Wysocki wrote:

> On Sunday, 23 of March 2008, Alan Stern wrote:
> > On Sat, 22 Mar 2008, Rafael J. Wysocki wrote:
> >
> [--snip--]
> >
> > No, you have missed the entire point. The problem doesn't exist in the
> > current code; it exists only if we switch over to using a single list.
> > Routines like dpm_suspend() won't be able to use list_for_each_entry()
> > to traverse the list because entries may be removed by other threads
> > during the traversal. Even list_for_each_entry_safe() won't work
> > correctly without careful attention to details.
>
> Ah, ok. Thanks for the clarification.
>
> Doesn't it help that we traverse the list under dpm_list_mtx? Anyone who
> removes an entry is required to take dpm_list_mtx that we're holding while
> the list is traversed except when the callbacks are invoked.

It doesn't help. What _does_ help is the fact that these traversals
are all serialized (since only one thread can carry out a system sleep
at any time).

> The only problem I see is when the device currently being handled is removed
> from the list by a concurrent thread. Is that you were referring to?

Yes, that is the problem. If you try to work around it by using
list_for_each_entry_safe() then you run into a problem when a
concurrent thread removes the device _following_ the one being handled
(or when the device being handled is the last one on the list and a
concurrent thread registers a new device, which can only happen in
dpm_prepare()).

It's not hard to fix. Just something to be aware of.

Alan Stern

P.S.: Oh yes, another related issue... We should call get_device() and
put_device() while holding dpm_list_mtx. Otherwise the device
structure might vanish when the callbacks are invoked.

2008-03-23 18:42:35

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Sunday, 23 of March 2008, Alan Stern wrote:
> On Sun, 23 Mar 2008, Rafael J. Wysocki wrote:
>
> > On Sunday, 23 of March 2008, Alan Stern wrote:
> > > On Sat, 22 Mar 2008, Rafael J. Wysocki wrote:
> > >
> > [--snip--]
> > >
> > > No, you have missed the entire point. The problem doesn't exist in the
> > > current code; it exists only if we switch over to using a single list.
> > > Routines like dpm_suspend() won't be able to use list_for_each_entry()
> > > to traverse the list because entries may be removed by other threads
> > > during the traversal. Even list_for_each_entry_safe() won't work
> > > correctly without careful attention to details.
> >
> > Ah, ok. Thanks for the clarification.
> >
> > Doesn't it help that we traverse the list under dpm_list_mtx? Anyone who
> > removes an entry is required to take dpm_list_mtx that we're holding while
> > the list is traversed except when the callbacks are invoked.
>
> It doesn't help. What _does_ help is the fact that these traversals
> are all serialized (since only one thread can carry out a system sleep
> at any time).
>
> > The only problem I see is when the device currently being handled is removed
> > from the list by a concurrent thread. Is that you were referring to?
>
> Yes, that is the problem. If you try to work around it by using
> list_for_each_entry_safe() then you run into a problem when a
> concurrent thread removes the device _following_ the one being handled
> (or when the device being handled is the last one on the list and a
> concurrent thread registers a new device, which can only happen in
> dpm_prepare()).
>
> It's not hard to fix. Just something to be aware of.

Yes, I've almost finished a new patch taking that into account. I'll send it
soon in a separate thread.

> P.S.: Oh yes, another related issue... We should call get_device() and
> put_device() while holding dpm_list_mtx. Otherwise the device
> structure might vanish when the callbacks are invoked.

Good idea.

Thanks,
Rafael

2008-03-23 21:17:19

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Friday, 21 of March 2008, Sam Ravnborg wrote:
> Hi Rafael.

Hi Sam,

> Is it possible to extend this in some way so we avoid the
> #ifdef stuff in the drivers?

Well, I'd love to do something like this.

> We could introduce a few special sections that we discard if
> PM is not in use.
> We have a reliable build time infrastructure to detect
> inconsistencies if needed.
>
> Something like:
> #define __suspend __section(.suspend.text)
> #define __suspenddata __section(.suspend.data)
>
> #define __hibernate __section(.hibernate.text)
> #define __hibernatedata __section(.hibernate.data)
>
> A few more tricks will be needed when we assign the functon pointers.
> We have __devexit_p(*) and we may use something similar.

Unfortunately, I have a little experience with linkers and I don't think I'll
be able to do anything like this in a reasonable time without any help.

Thanks,
Rafael

2008-03-25 09:52:29

by Oliver Neukum

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

Am Freitag, 21. März 2008 01:21:03 schrieb Johannes Berg:
> > + * All of the above callbacks, except for @complete(), return error codes.
> > + * However, the error codes returned by the resume operations, @resume(),
> > + * @thaw(), and @restore(), are only printed in the system logs, since the PM
> > + * core cannot do anything else about them.
>
> Why bother and not just make them return void, the error printing can
> most likely be done much much better in the callback since that possibly
> has information on why it failed.

A device that cannot wake up is unusable. Shouldn't the pm core disconnect()
such a device?

Regards
Oliver

2008-03-25 13:07:04

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Tuesday, 25 of March 2008, Oliver Neukum wrote:
> Am Freitag, 21. März 2008 01:21:03 schrieb Johannes Berg:
> > > + * All of the above callbacks, except for @complete(), return error codes.
> > > + * However, the error codes returned by the resume operations, @resume(),
> > > + * @thaw(), and @restore(), are only printed in the system logs, since the PM
> > > + * core cannot do anything else about them.
> >
> > Why bother and not just make them return void, the error printing can
> > most likely be done much much better in the callback since that possibly
> > has information on why it failed.
>
> A device that cannot wake up is unusable. Shouldn't the pm core disconnect()
> such a device?

Well, if ->resume() returns an error, the driver already knows there's a
problem and it can act upon that, at least in principle.

However, the PM core probably shouldn't try to resume the children of a failing
device. Also, if ->resume_noirq() fails, it probably is not a good idea to
call ->resume() and ->complete() for the same device and for it's children.

Thanks,
Rafael

2008-03-25 13:15:25

by Oliver Neukum

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

Am Dienstag, 25. März 2008 14:06:15 schrieb Rafael J. Wysocki:
> On Tuesday, 25 of March 2008, Oliver Neukum wrote:
> > Am Freitag, 21. März 2008 01:21:03 schrieb Johannes Berg:
> > > > + * All of the above callbacks, except for @complete(), return error codes.
> > > > + * However, the error codes returned by the resume operations, @resume(),
> > > > + * @thaw(), and @restore(), are only printed in the system logs, since the PM
> > > > + * core cannot do anything else about them.
> > >
> > > Why bother and not just make them return void, the error printing can
> > > most likely be done much much better in the callback since that possibly
> > > has information on why it failed.
> >
> > A device that cannot wake up is unusable. Shouldn't the pm core disconnect()
> > such a device?
>
> Well, if ->resume() returns an error, the driver already knows there's a
> problem and it can act upon that, at least in principle.

Then why return an error? If a driver returns an error I would assume that
to indicate an irrecoverable error.

> However, the PM core probably shouldn't try to resume the children of a failing
> device. Also, if ->resume_noirq() fails, it probably is not a good idea to
> call ->resume() and ->complete() for the same device and for it's children.

Exactly. But we need to define what happens in these cases. If we simply
ignore the errors, the drivers must be able to deal with IO to half suspended
devices.

Regards
Oliver

2008-03-25 14:19:56

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Tue, 25 Mar 2008, Oliver Neukum wrote:

> Am Dienstag, 25. März 2008 14:06:15 schrieb Rafael J. Wysocki:
> > On Tuesday, 25 of March 2008, Oliver Neukum wrote:
>
> > > A device that cannot wake up is unusable. Shouldn't the pm core disconnect()
> > > such a device?

It's not safe for the PM core to do such things unilaterally. The
decision to unregister a device should be made by the driver or the
subsystem.

(The only problem is that it's impossible to unregister a device from
within its suspend or resume methods. Perhaps there should be a way
for the driver to tell the PM core that the core should unregister the
device as soon as the method returns. I don't know if such a facility
would get used...)

> > Well, if ->resume() returns an error, the driver already knows there's a
> > problem and it can act upon that, at least in principle.
>
> Then why return an error? If a driver returns an error I would assume that
> to indicate an irrecoverable error.

The PM core prints a warning in the system log whenever an error is
returned. There isn't much more it can do.

> > However, the PM core probably shouldn't try to resume the children of a failing
> > device. Also, if ->resume_noirq() fails, it probably is not a good idea to
> > call ->resume() and ->complete() for the same device and for it's children.
>
> Exactly. But we need to define what happens in these cases. If we simply
> ignore the errors, the drivers must be able to deal with IO to half suspended
> devices.

Just so -- it's up to the drivers to deal with this sort of thing. The
PM core can't know the details of what should be done in each case.

For example, if a USB hub can't be resumed then usbcore marks all of
its descendants with USB_STATE_NOTATTACHED. When the children's resume
methods are called, they return without trying to do anything. Later
on khubd takes care of unregistering everything.

Alan Stern

2008-03-25 14:24:48

by Oliver Neukum

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

Am Dienstag, 25. M?rz 2008 15:19:45 schrieb Alan Stern:
> On Tue, 25 Mar 2008, Oliver Neukum wrote:
>
> > Am Dienstag, 25. März 2008 14:06:15 schrieb Rafael J. Wysocki:
> > > On Tuesday, 25 of March 2008, Oliver Neukum wrote:
> >
> > > > A device that cannot wake up is unusable. Shouldn't the pm core disconnect()
> > > > such a device?
>
> It's not safe for the PM core to do such things unilaterally. ?The
> decision to unregister a device should be made by the driver or the
> subsystem.

Why? You can trigger it from user space via sysfs and in many cases
suspending to disk will disconnect all devices on a bus, so I'd say a
failure to resume is just a limited subcase of a device vanishing during
sleep.

Regards
Oliver

2008-03-25 14:33:32

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Tue, 25 Mar 2008, Oliver Neukum wrote:

> Am Dienstag, 25. M?rz 2008 15:19:45 schrieb Alan Stern:

> > It's not safe for the PM core to do such things unilaterally. ?The
> > decision to unregister a device should be made by the driver or the
> > subsystem.
>
> Why? You can trigger it from user space via sysfs

How can you do that?

> and in many cases
> suspending to disk will disconnect all devices on a bus,

But these disconnects aren't done by the PM core; they are done by
individual drivers or subsystems.

> so I'd say a
> failure to resume is just a limited subcase of a device vanishing during
> sleep.

I'll go along with that. If a device vanishes during sleep, the PM
core isn't responsible for unregistering it -- the device's subsystem
is.

Alan Stern

2008-03-25 19:48:56

by Oliver Neukum

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

Am Dienstag, 25. M?rz 2008 15:33:22 schrieb Alan Stern:
> > so I'd say a
> > failure to resume is just a limited subcase of a device vanishing during
> > sleep.
>
> I'll go along with that. ?If a device vanishes during sleep, the PM
> core isn't responsible for unregistering it -- the device's subsystem
> is.

Yes, that makes sense. You are right.

Regards
Oliver

2008-03-25 20:42:19

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Tuesday, 25 of March 2008, Oliver Neukum wrote:
> Am Dienstag, 25. M?rz 2008 15:33:22 schrieb Alan Stern:
> > > so I'd say a
> > > failure to resume is just a limited subcase of a device vanishing during
> > > sleep.
> >
> > I'll go along with that. ?If a device vanishes during sleep, the PM
> > core isn't responsible for unregistering it -- the device's subsystem
> > is.
>
> Yes, that makes sense. You are right.

Still, if ->resume() returns an error, does it make sense, from the PM core's
point of view, to execute ->complete() for that device, for example?

If you think it does, that behavior should be clearly documented (I didn't
think about that before).

Thanks,
Rafael

2008-03-25 20:49:46

by Oliver Neukum

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

Am Dienstag, 25. M?rz 2008 21:41:48 schrieb Rafael J. Wysocki:
> On Tuesday, 25 of March 2008, Oliver Neukum wrote:
> > Am Dienstag, 25. M?rz 2008 15:33:22 schrieb Alan Stern:
> > > > so I'd say a
> > > > failure to resume is just a limited subcase of a device vanishing during
> > > > sleep.
> > >
> > > I'll go along with that. ?If a device vanishes during sleep, the PM
> > > core isn't responsible for unregistering it -- the device's subsystem
> > > is.
> >
> > Yes, that makes sense. You are right.
>
> Still, if ->resume() returns an error, does it make sense, from the PM core's
> point of view, to execute ->complete() for that device, for example?

IMO you must always keep the ordering invariant. If a parent returns an error
the PM core must not wake its children.

Regards
Oliver

2008-03-25 20:56:34

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Tuesday, 25 of March 2008, Oliver Neukum wrote:
> Am Dienstag, 25. M?rz 2008 21:41:48 schrieb Rafael J. Wysocki:
> > On Tuesday, 25 of March 2008, Oliver Neukum wrote:
> > > Am Dienstag, 25. M?rz 2008 15:33:22 schrieb Alan Stern:
> > > > > so I'd say a
> > > > > failure to resume is just a limited subcase of a device vanishing during
> > > > > sleep.
> > > >
> > > > I'll go along with that. ?If a device vanishes during sleep, the PM
> > > > core isn't responsible for unregistering it -- the device's subsystem
> > > > is.
> > >
> > > Yes, that makes sense. You are right.
> >
> > Still, if ->resume() returns an error, does it make sense, from the PM core's
> > point of view, to execute ->complete() for that device, for example?
>
> IMO you must always keep the ordering invariant. If a parent returns an error
> the PM core must not wake its children.

I'm agreeing here, but one of the previous Alan's comments suggests he has a
differing opinion. Alan?

I'm considering to make the PM core skip the resuming of the children of
devices that failed to resume and skip calling ->complete() for that devices
and their children.

Thanks,
Rafael

2008-03-26 14:10:22

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Tue, 25 Mar 2008, Rafael J. Wysocki wrote:

> On Tuesday, 25 of March 2008, Oliver Neukum wrote:
> > Am Dienstag, 25. M?rz 2008 21:41:48 schrieb Rafael J. Wysocki:
> > > On Tuesday, 25 of March 2008, Oliver Neukum wrote:
> > > > Am Dienstag, 25. M?rz 2008 15:33:22 schrieb Alan Stern:
> > > > > > so I'd say a
> > > > > > failure to resume is just a limited subcase of a device vanishing during
> > > > > > sleep.
> > > > >
> > > > > I'll go along with that. ?If a device vanishes during sleep, the PM
> > > > > core isn't responsible for unregistering it -- the device's subsystem
> > > > > is.
> > > >
> > > > Yes, that makes sense. You are right.
> > >
> > > Still, if ->resume() returns an error, does it make sense, from the PM core's
> > > point of view, to execute ->complete() for that device, for example?
> >
> > IMO you must always keep the ordering invariant. If a parent returns an error
> > the PM core must not wake its children.

Don't think of it that way. The PM core doesn't wake anything. It
merely notifies drivers that the system sleep is ending, so that the
drivers can wake their devices. It's up to the driver to detect
whether the parent failed to resume, in which case the driver should
take appropriate action.

The situation is no different from what happens when the user tries to
access a mounted USB disk drive after the USB cable has been unplugged.
The drivers take care of everything.

> I'm agreeing here, but one of the previous Alan's comments suggests he has a
> differing opinion. Alan?
>
> I'm considering to make the PM core skip the resuming of the children of
> devices that failed to resume and skip calling ->complete() for that devices
> and their children.

While that might in principle be a reasonable thing to do, it's
different from how the PM core has behaved in the past. I don't see
any point in making a change like that now.

Alan Stern

2008-03-26 14:24:31

by Oliver Neukum

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

Am Mittwoch, 26. M?rz 2008 15:10:01 schrieb Alan Stern:
> > > IMO you must always keep the ordering invariant. If a parent returns an error
> > > the PM core must not wake its children.
>
> Don't think of it that way. ?The PM core doesn't wake anything. ?It
> merely notifies drivers that the system sleep is ending, so that the
> drivers can wake their devices. ?It's up to the driver to detect
> whether the parent failed to resume, in which case the driver should
> take appropriate action.

How do you propose that every driver should check the power state
of its parent? Without locking the parent?

> The situation is no different from what happens when the user tries to
> access a mounted USB disk drive after the USB cable has been unplugged. ?
> The drivers take care of everything.

That completely throws away the reason to have a PM core. We've made
a guarantee to drivers that they wil not be woken unless their parents are
awake. In fact the semantics of the callbacks are defined in a way that
adding devices to a parent can be enabled. You cannot add children to a
dead parent. It's the very reason for this rewrite.

Regards
Oliver

2008-03-26 14:40:37

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Wed, 26 Mar 2008, Oliver Neukum wrote:

> Am Mittwoch, 26. M?rz 2008 15:10:01 schrieb Alan Stern:
> > > > IMO you must always keep the ordering invariant. If a parent returns an error
> > > > the PM core must not wake its children.
> >
> > Don't think of it that way. ?The PM core doesn't wake anything. ?It
> > merely notifies drivers that the system sleep is ending, so that the
> > drivers can wake their devices. ?It's up to the driver to detect
> > whether the parent failed to resume, in which case the driver should
> > take appropriate action.
>
> How do you propose that every driver should check the power state
> of its parent? Without locking the parent?

It depends entirely on the driver and subsystem; generalizations are
not possible. When appropriate, they can copy the scheme used by USB.
Or they can make up their own scheme.

> > The situation is no different from what happens when the user tries to
> > access a mounted USB disk drive after the USB cable has been unplugged. ?
> > The drivers take care of everything.
>
> That completely throws away the reason to have a PM core.

Nonsense.

> We've made
> a guarantee to drivers that they wil not be woken unless their parents are
> awake.

No we haven't. The guarantee is only that the PM core will call the
parent's resume method before calling the child's method. There is no
guarantee about whether the method succeeds or the parent is awake.

Remember, the whole purpose of this is to let drivers know when the
system is going to sleep or waking up. Proper handling of devices is
up to the drivers, not up to the core.

> In fact the semantics of the callbacks are defined in a way that
> adding devices to a parent can be enabled. You cannot add children to a
> dead parent.

Depends what you mean. In some cases it is possible -- i.e., you can
do it and the kernel won't crash, although the new children probably
won't be usable. In any case it doesn't matter; the PM core doesn't
care whether children are added to a dead parent, only drivers care.

> It's the very reason for this rewrite.

No it isn't. See Rafael's changelog ocmments; the rewrite is being
done for completely different reasons.

Alan Stern

2008-03-26 15:42:39

by Oliver Neukum

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

Am Mittwoch, 26. M?rz 2008 15:40:27 schrieb Alan Stern:
> Remember, the whole purpose of this is to let drivers know when the
> system is going to sleep or waking up. ?Proper handling of devices is
> up to the drivers, not up to the core.

Then declare these methods void. We cannot introduce methods that deliberately
ignore errors. Reporting is also better done in the drivers.

Regards
Oliver

2008-03-26 16:36:47

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Wed, 26 Mar 2008, Oliver Neukum wrote:

> Am Mittwoch, 26. M?rz 2008 15:40:27 schrieb Alan Stern:
> > Remember, the whole purpose of this is to let drivers know when the
> > system is going to sleep or waking up. ?Proper handling of devices is
> > up to the drivers, not up to the core.
>
> Then declare these methods void. We cannot introduce methods that deliberately
> ignore errors. Reporting is also better done in the drivers.

That decision is up to Rafael. Changing the methods to return void is
okay with me.

Alan Stern

2008-03-26 20:47:55

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Wednesday, 26 of March 2008, Alan Stern wrote:
> On Wed, 26 Mar 2008, Oliver Neukum wrote:
>
> > Am Mittwoch, 26. M?rz 2008 15:40:27 schrieb Alan Stern:
> > > Remember, the whole purpose of this is to let drivers know when the
> > > system is going to sleep or waking up. ?Proper handling of devices is
> > > up to the drivers, not up to the core.
> >
> > Then declare these methods void. We cannot introduce methods that deliberately
> > ignore errors. Reporting is also better done in the drivers.
>
> That decision is up to Rafael. Changing the methods to return void is
> okay with me.

But that's not what they currently do, either.

If I change the methods to void and it turns out in the future that it's
better if they return error codes, it will be rather difficult to go back and
change everything. For this reason, I'd prefer to retain the returning of
error codes.

What exactly do you whink would be wrong with using the error codes to avoid
resuming the children of devices that failed to resume?

Rafael

2008-03-27 02:48:30

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH] PM: Introduce new top level suspend and hibernation callbacks (rev. 2)

On Wed, 26 Mar 2008, Rafael J. Wysocki wrote:

> What exactly do you whink would be wrong with using the error codes to avoid
> resuming the children of devices that failed to resume?

Nothing, I guess. Drivers should work okay either way.

Alan Stern