2008-03-16 23:26:21

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 0/3] PM: Rework suspend and hibernation code for devices

Hi,

The following three patches are intended to start the redesign of the suspend
and hibernation framework for devices.

The first patch defines the new framework. Specifically, it introduces the
structure 'struct pm_ops' containing a set of hibernation and suspend
callbacks to be defined by bus types, device types, device classes and,
finally, by device drivers (the role of each callback is described in the
comment in pm.h; the documentation will be updated later). It also modifies
the code in drivers/base/power/main.c to use the new framework, if available,
and fall back to the old one otherwise (no visible functional changes should be
added).

The other two patches implement the new suspend and hibernation callbacks
for the platform and PCI bus types.

Please review and tell me what you think.

Thanks,
Rafael


--
"Premature optimization is the root of all evil." - Donald Knuth


2008-03-16 23:26:58

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 1/3] PM: Introduce new top level suspend and hibernation callbacks

From: Rafael J. Wysocki <[email protected]>

Introduce 'struct pm_ops' representing a set of suspend and
hibernation operations for bus types, device classes and device
types.

Modify the PM core to use 'struct pm_ops' objects, if defined, instead
of the ->suspend(), ->resume(), ->suspend_late(), ->resume_early()
callbacks that will be considered as legacy and gradually phased out.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---

arch/x86/kernel/apm_32.c | 4
drivers/base/power/main.c | 453 +++++++++++++++++++++++++++++++++++-----------
include/linux/device.h | 8
include/linux/pm.h | 241 +++++++++++++++++++++---
kernel/power/disk.c | 14 -
kernel/power/main.c | 4
6 files changed, 583 insertions(+), 141 deletions(-)

Index: linux-2.6/include/linux/pm.h
===================================================================
--- linux-2.6.orig/include/linux/pm.h
+++ linux-2.6/include/linux/pm.h
@@ -114,7 +114,9 @@ typedef struct pm_message {
int event;
} pm_message_t;

-/*
+/**
+ * struct pm_ops - device PM callbacks
+ *
* Several driver power state transitions are externally visible, affecting
* the state of pending I/O queues and (for drivers that touch hardware)
* interrupts, wakeups, DMA, and other hardware state. There may also be
@@ -122,6 +124,213 @@ typedef struct pm_message {
* to the rest of the driver stack (such as a driver that's ON gating off
* clocks which are not in active use).
*
+ * The externally visible transitions are handled with the help of the following
+ * callbacks included in this structure:
+ *
+ * @prepare: Prepare the device for the upcoming transition, but do NOT change
+ * its hardware state. Prevent new children of the device from being
+ * registered and prevent new calls to the probe method from being made
+ * after @prepare() returns. Also, if a concurrent child registration
+ * or a call to the probe method already in progress is detected by
+ * @prepare(), it may return -EAGAIN, so that the PM core can execute it
+ * once again (e.g. after suspending the newly registered child) to recover
+ * from the race condition. This method is executed for all kinds of
+ * suspend transitions and is immediately followed by one of the suspend
+ * callbacks: @suspend(), @freeze(), @poweroff(), or @quiesce().
+ *
+ * @complete: Undo the changes made by @prepare(). This method is executed for
+ * all kinds of resume transitions, immediately following one of the resume
+ * callbacks: @resume(), @thaw(), @restore(), or @recover(). Also executed
+ * if a suspend callback (@suspend(), @freeze(), @poweroff(), @quiesce())
+ * immediately following a successful @prepare() fails OR if a new child
+ * of the device has been registered during @prepare().
+ *
+ * @suspend: Executed before putting the system into a sleep state in which the
+ * contents of main memory are preserved. Quiesce the device, put it into
+ * a low power state appropriate for the upcoming system state (such as
+ * PCI_D3hot), and enable wakeup events as appropriate.
+ *
+ * @resume: Executed after waking the system up from a sleep state in which the
+ * contents of main memory were preserved. Put the device into the
+ * appropriate state, according to the information saved in memory by the
+ * preceding @suspend(). The driver starts working again, responding to
+ * hardware events and software requests. The hardware may have gone
+ * through a power-off reset, or it may have maintained state from the
+ * previous suspend() which the driver may rely on while resuming. On most
+ * platforms, there are no restrictions on availability of resources like
+ * clocks during @resume().
+ *
+ * @freeze: Hibernation-specific, executed before creating a hibernation image.
+ * Quiesce operations so that a consistent image can be created, but do NOT
+ * otherwise put the device into a low power device state and do NOT emit
+ * system wakeup events. Save in main memory the device settings to be
+ * used by @restore() during the subsequent resume from hibernation.
+ *
+ * @thaw: Hibernation-specific, executed after creating a hibernation image.
+ * Undo the changes made by the preceding @freeze(), so the device can be
+ * operated in the same way as immediately before the call to @freeze().
+ *
+ * @poweroff: Hibernation-specific, executed after saving a hibernation image.
+ * Quiesce the device, put it into a low power state appropriate for the
+ * upcoming system state (such as PCI_D3hot), and enable wakeup events as
+ * appropriate.
+ *
+ * @quiesce: Hibernation-specific, executed after loading a hibernation image
+ * and before restoring the contents of main memory from it. Quiesce
+ * operations so that the contents of main memory can be restored from the
+ * image in a consistent way, but do NOT otherwise put the device into a
+ * low power state and do NOT emit system wakeup events. Do NOT save any
+ * device settings in main memory.
+ *
+ * @restore: Hibernation-specific, executed after restoring the contents of main
+ * memory from a hibernation image. Driver starts working again,
+ * responding to hardware events and software requests. Do NOT make ANY
+ * assumptions about the hardware state right prior to @restore(). On most
+ * platforms, there are no restrictions on availability of resources like
+ * clocks during @restore().
+ *
+ * @recover: Hibernation-specific, executed after a failing creation of a
+ * hibernation image OR after a failing attempt to restore the contents of
+ * main memory from such an image. Undo the changes made by the preceding
+ * @freeze() or @quiesce(), so the device can be operated in the same way
+ * as immediately before the failing transition.
+ *
+ * struct pm_ops is also used for defining driver PM operations to be carried
+ * out with interrupts disabled. In this case, however, there is only one
+ * active CPU while the operations are being performed, so they are executed by
+ * the PM core in a more straightforward way. In particular, the failure of
+ * @prepare() carried out with interrupts disabled causes the entire PM
+ * transition to fail, regardless of the error code returned by it.
+ *
+ * All of the above callbacks, except for @complete(), return error codes.
+ * However, the error codes returned by resume operations, @resume(), @thaw(),
+ * @restore(), and @recover(), are only printed in the system logs, since the
+ * PM core cannot do anything else about them.
+ */
+
+struct pm_ops {
+#ifdef CONFIG_PM_SLEEP
+ int (*prepare)(struct device *dev);
+ void (*complete)(struct device *dev);
+#endif
+#ifdef CONFIG_SUSPEND
+ int (*suspend)(struct device *dev);
+ int (*resume)(struct device *dev);
+#endif
+#ifdef CONFIG_HIBERNATION
+ int (*freeze)(struct device *dev);
+ int (*thaw)(struct device *dev);
+ int (*poweroff)(struct device *dev);
+ int (*quiesce)(struct device *dev);
+ int (*restore)(struct device *dev);
+ int (*recover)(struct device *dev);
+#endif
+};
+
+/**
+ * PM_EVENT_ messages
+ *
+ * The following PM_EVENT_ messages are defined for the internal use of the PM
+ * core, in order to provide a mechanism allowing the high level suspend and
+ * hibernation code to convey the necessary information to the device PM core
+ * code:
+ *
+ * ON No transition.
+ *
+ * FREEZE System is going to hibernate, call ->prepare() and ->freeze()
+ * for all devices.
+ *
+ * SUSPEND System is going to suspend, call ->prepare() and ->suspend()
+ * for all devices.
+ *
+ * HIBERNATE Hibernation image has been saved, call ->prepare() and
+ * ->poweroff() for all devices.
+ *
+ * QUIESCE Contents of main memory are going to be restored from a (loaded)
+ * hibernation image, call ->prepare() and ->quiesce() for all
+ * devices.
+ *
+ * RESUME System is resuming, call ->resume() and ->complete() for all
+ * devices.
+ *
+ * THAW Hibernation image has been created, call ->thaw() and
+ * ->complete() for all devices.
+ *
+ * RESTORE Contents of main memory have been restored from a hibernation
+ * image, call ->restore() and ->complete() for all devices.
+ *
+ * RECOVER Creation of a hibernation image or restoration of the main
+ * memory contents from a hibernation image has failed, call
+ * ->recover() and ->complete() for all devices.
+ */
+
+#define PM_EVENT_ON 0x0000
+#define PM_EVENT_FREEZE 0x0001
+#define PM_EVENT_SUSPEND 0x0002
+#define PM_EVENT_HIBERNATE 0x0004
+#define PM_EVENT_QUIESCE 0x0008
+#define PM_EVENT_RESUME 0x0010
+#define PM_EVENT_THAW 0x0020
+#define PM_EVENT_RESTORE 0x0040
+#define PM_EVENT_RECOVER 0x0080
+
+#define PM_EVENT_SLEEP (PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE)
+
+#define PMSG_FREEZE ((struct pm_message){ .event = PM_EVENT_FREEZE, })
+#define PMSG_QUIESCE ((struct pm_message){ .event = PM_EVENT_QUIESCE, })
+#define PMSG_SUSPEND ((struct pm_message){ .event = PM_EVENT_SUSPEND, })
+#define PMSG_HIBERNATE ((struct pm_message){ .event = PM_EVENT_HIBERNATE, })
+#define PMSG_RESUME ((struct pm_message){ .event = PM_EVENT_RESUME, })
+#define PMSG_THAW ((struct pm_message){ .event = PM_EVENT_THAW, })
+#define PMSG_RESTORE ((struct pm_message){ .event = PM_EVENT_RESTORE, })
+#define PMSG_RECOVER ((struct pm_message){ .event = PM_EVENT_RECOVER, })
+#define PMSG_ON ((struct pm_message){ .event = PM_EVENT_ON, })
+
+/**
+ * Device power management states
+ *
+ * These state labels are used internally by the PM core to indicate the current
+ * status of a device with respect to the PM core operations.
+ *
+ * DPM_ON Device is regarded as operational. Set this way
+ * initially and when ->resume(), ->thaw(), ->restore(),
+ * ->recover(), or ->complete() is about to be called.
+ * Also set when ->prepare() fails.
+ *
+ * DPM_SUSPENDING Device is currently being suspended. Set when
+ * ->prepare() is about to be called.
+ *
+ * DPM_OFF Device is regarded as suspended. Set when ->suspend(),
+ * ->freeze(), ->poweroff(), or ->quiesce() is about to be
+ * called.
+ */
+
+enum dpm_state {
+ DPM_ON,
+ DPM_SUSPENDING,
+ DPM_OFF,
+};
+
+struct dev_pm_info {
+ pm_message_t power_state;
+ unsigned can_wakeup:1;
+ unsigned should_wakeup:1;
+ enum dpm_state status:2; /* Owned by the PM core */
+#ifdef CONFIG_PM_SLEEP
+ struct list_head entry;
+#endif
+};
+
+/*
+ * The PM_EVENT_ messages are also used by drivers implementing the legacy
+ * suspend framework, based on the ->suspend() and ->resume() callbacks common
+ * for suspend and hibernation transitions, according to the rules below.
+ */
+
+/* Necessary, because several drivers use PM_EVENT_PRETHAW */
+#define PM_EVENT_PRETHAW PM_EVENT_QUIESCE
+
+/*
* One transition is triggered by resume(), after a suspend() call; the
* message is implicit:
*
@@ -166,35 +375,11 @@ typedef struct pm_message {
* or from system low-power states such as standby or suspend-to-RAM.
*/

-#define PM_EVENT_ON 0
-#define PM_EVENT_FREEZE 1
-#define PM_EVENT_SUSPEND 2
-#define PM_EVENT_HIBERNATE 4
-#define PM_EVENT_PRETHAW 8
-
-#define PM_EVENT_SLEEP (PM_EVENT_SUSPEND | PM_EVENT_HIBERNATE)
-
-#define PMSG_FREEZE ((struct pm_message){ .event = PM_EVENT_FREEZE, })
-#define PMSG_PRETHAW ((struct pm_message){ .event = PM_EVENT_PRETHAW, })
-#define PMSG_SUSPEND ((struct pm_message){ .event = PM_EVENT_SUSPEND, })
-#define PMSG_HIBERNATE ((struct pm_message){ .event = PM_EVENT_HIBERNATE, })
-#define PMSG_ON ((struct pm_message){ .event = PM_EVENT_ON, })
-
-struct dev_pm_info {
- pm_message_t power_state;
- unsigned can_wakeup:1;
- unsigned should_wakeup:1;
- bool sleeping:1; /* Owned by the PM core */
-#ifdef CONFIG_PM_SLEEP
- struct list_head entry;
-#endif
-};
+#ifdef CONFIG_PM_SLEEP
+extern void device_power_up(pm_message_t state);
+extern void device_resume(pm_message_t state);

extern int device_power_down(pm_message_t state);
-extern void device_power_up(void);
-extern void device_resume(void);
-
-#ifdef CONFIG_PM_SLEEP
extern int device_suspend(pm_message_t state);
extern int device_prepare_suspend(pm_message_t state);

Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -69,22 +69,30 @@ int device_pm_add(struct device *dev)
dev->bus ? dev->bus->name : "No Bus",
kobject_name(&dev->kobj));
mutex_lock(&dpm_list_mtx);
- if ((dev->parent && dev->parent->power.sleeping) || all_sleeping) {
- if (dev->parent->power.sleeping)
- dev_warn(dev,
- "parent %s is sleeping, will not add\n",
+ if (all_sleeping) {
+ dev_warn(dev, "all devices are sleeping, will not add\n");
+ goto Refuse;
+ }
+ if (dev->parent)
+ switch (dev->parent->power.status) {
+ case DPM_OFF:
+ dev_warn(dev, "parent %s is sleeping, will not add\n",
dev->parent->bus_id);
- else
- dev_warn(dev, "devices are sleeping, will not add\n");
- WARN_ON(true);
- error = -EBUSY;
- } else {
- error = dpm_sysfs_add(dev);
- if (!error)
- list_add_tail(&dev->power.entry, &dpm_active);
- }
+ goto Refuse;
+ case DPM_SUSPENDING:
+ dev->parent->power.status = DPM_ON;
+ break;
+ }
+ error = dpm_sysfs_add(dev);
+ if (!error)
+ list_add_tail(&dev->power.entry, &dpm_active);
+ End:
mutex_unlock(&dpm_list_mtx);
return error;
+ Refuse:
+ WARN_ON(true);
+ error = -EBUSY;
+ goto End;
}

/**
@@ -122,26 +130,135 @@ void device_pm_schedule_removal(struct d
}
EXPORT_SYMBOL_GPL(device_pm_schedule_removal);

+/**
+ * pm_op - execute the PM operation appropiate for given PM event
+ * @dev: Device.
+ * @ops: PM operations to choose from.
+ * @state: PM event message.
+ */
+static int pm_op(struct device *dev, struct pm_ops *ops, pm_message_t state)
+{
+ int error = 0;
+
+ switch (state.event) {
+#ifdef CONFIG_SUSPEND
+ case PM_EVENT_SUSPEND:
+ if (ops->suspend) {
+ error = ops->suspend(dev);
+ suspend_report_result(ops->suspend, error);
+ }
+ break;
+ case PM_EVENT_RESUME:
+ if (ops->resume) {
+ error = ops->resume(dev);
+ suspend_report_result(ops->resume, error);
+ }
+ break;
+#endif /* CONFIG_SUSPEND */
+#ifdef CONFIG_HIBERNATION
+ case PM_EVENT_FREEZE:
+ if (ops->freeze) {
+ error = ops->freeze(dev);
+ suspend_report_result(ops->freeze, error);
+ }
+ break;
+ case PM_EVENT_QUIESCE:
+ if (ops->quiesce) {
+ error = ops->quiesce(dev);
+ suspend_report_result(ops->quiesce, error);
+ }
+ break;
+ case PM_EVENT_HIBERNATE:
+ if (ops->poweroff) {
+ error = ops->poweroff(dev);
+ suspend_report_result(ops->poweroff, error);
+ }
+ break;
+ case PM_EVENT_THAW:
+ if (ops->thaw) {
+ error = ops->thaw(dev);
+ suspend_report_result(ops->thaw, error);
+ }
+ break;
+ case PM_EVENT_RESTORE:
+ if (ops->restore) {
+ error = ops->restore(dev);
+ suspend_report_result(ops->restore, error);
+ }
+ break;
+ case PM_EVENT_RECOVER:
+ if (ops->recover) {
+ error = ops->recover(dev);
+ suspend_report_result(ops->recover, error);
+ }
+ break;
+#endif /* CONFIG_HIBERNATION */
+ default:
+ error = -EINVAL;
+ }
+ return error;
+}
+
+static char *pm_verb(int event)
+{
+ switch (event) {
+ case PM_EVENT_SUSPEND:
+ return "suspend";
+ case PM_EVENT_RESUME:
+ return "resume";
+ case PM_EVENT_FREEZE:
+ return "freeze";
+ case PM_EVENT_QUIESCE:
+ return "quiesce";
+ case PM_EVENT_HIBERNATE:
+ return "hibernate";
+ case PM_EVENT_THAW:
+ return "thaw";
+ case PM_EVENT_RESTORE:
+ return "restore";
+ default:
+ return "(unknown PM event)";
+ }
+}
+
+static void pm_dev_dbg(struct device *dev, pm_message_t state, char *info)
+{
+ dev_dbg(dev, "%s%s%s\n", info, pm_verb(state.event),
+ ((state.event & PM_EVENT_SLEEP) && device_may_wakeup(dev)) ?
+ ", may wakeup" : "");
+}
+
/*------------------------- Resume routines -------------------------*/

/**
- * resume_device_early - Power on one device (early resume).
+ * resume_device_noirq - Power on one device (early resume).
* @dev: Device.
+ * @state: Operation to carry out.
*
* Must be called with interrupts disabled.
*/
-static int resume_device_early(struct device *dev)
+static int resume_device_noirq(struct device *dev, pm_message_t state)
{
int error = 0;

TRACE_DEVICE(dev);
TRACE_RESUME(0);

- if (dev->bus && dev->bus->resume_early) {
- dev_dbg(dev, "EARLY resume\n");
+ if (!dev->bus)
+ goto End;
+
+ if (dev->bus->pm_noirq) {
+ struct pm_ops *ops = dev->bus->pm_noirq;
+
+ pm_dev_dbg(dev, state, "EARLY ");
+ error = pm_op(dev, ops, state);
+ if (ops->complete)
+ ops->complete(dev);
+ } else if (dev->bus->resume_early) {
+ pm_dev_dbg(dev, state, "legacy EARLY ");
error = dev->bus->resume_early(dev);
}
-
+ End:
TRACE_RESUME(error);
return error;
}
@@ -156,7 +273,7 @@ static int resume_device_early(struct de
*
* Must be called with interrupts disabled and only one CPU running.
*/
-static void dpm_power_up(void)
+static void dpm_power_up(pm_message_t state)
{

while (!list_empty(&dpm_off_irq)) {
@@ -164,7 +281,7 @@ static void dpm_power_up(void)
struct device *dev = to_device(entry);

list_move_tail(entry, &dpm_off);
- resume_device_early(dev);
+ resume_device_noirq(dev, state);
}
}

@@ -176,19 +293,46 @@ static void dpm_power_up(void)
*
* Must be called with interrupts disabled.
*/
-void device_power_up(void)
+void device_power_up(pm_message_t state)
{
sysdev_resume();
- dpm_power_up();
+ dpm_power_up(state);
}
EXPORT_SYMBOL_GPL(device_power_up);

/**
+ * complete_device - Complete a PM transition for given device
+ * @dev: Device.
+ * @state: Power transition we are completing.
+ */
+static void complete_device(struct device *dev, pm_message_t state)
+{
+ down(&dev->sem);
+
+ if (dev->bus && dev->bus->pm && dev->bus->pm->complete) {
+ pm_dev_dbg(dev, state, "completing ");
+ dev->bus->pm->complete(dev);
+ }
+
+ if (dev->type && dev->type->pm && dev->type->pm->complete) {
+ pm_dev_dbg(dev, state, "completing type ");
+ dev->type->pm->complete(dev);
+ }
+
+ if (dev->class && dev->class->pm && dev->class->pm->complete) {
+ pm_dev_dbg(dev, state, "completing class ");
+ dev->class->pm->complete(dev);
+ }
+
+ up(&dev->sem);
+}
+
+/**
* resume_device - Restore state for one device.
* @dev: Device.
- *
+ * @state: Operation to carry out.
*/
-static int resume_device(struct device *dev)
+static int resume_device(struct device *dev, pm_message_t state)
{
int error = 0;

@@ -197,21 +341,40 @@ static int resume_device(struct device *

down(&dev->sem);

- if (dev->bus && dev->bus->resume) {
- dev_dbg(dev,"resuming\n");
- error = dev->bus->resume(dev);
+ if (dev->bus) {
+ if (dev->bus->pm) {
+ pm_dev_dbg(dev, state, "");
+ error = pm_op(dev, dev->bus->pm, state);
+ } else if (dev->bus->resume) {
+ pm_dev_dbg(dev, state, "legacy ");
+ error = dev->bus->resume(dev);
+ }
+ if (error)
+ goto End;
}

- if (!error && dev->type && dev->type->resume) {
- dev_dbg(dev,"resuming\n");
- error = dev->type->resume(dev);
+ if (dev->type) {
+ if (dev->type->pm) {
+ pm_dev_dbg(dev, state, "type ");
+ error = pm_op(dev, dev->type->pm, state);
+ } else if (dev->type->resume) {
+ pm_dev_dbg(dev, state, "legacy type ");
+ error = dev->type->resume(dev);
+ }
+ if (error)
+ goto End;
}

- if (!error && dev->class && dev->class->resume) {
- dev_dbg(dev,"class resume\n");
- error = dev->class->resume(dev);
+ if (dev->class) {
+ if (dev->class->pm) {
+ pm_dev_dbg(dev, state, "class ");
+ error = pm_op(dev, dev->class->pm, state);
+ } else if (dev->class->resume) {
+ pm_dev_dbg(dev, state, "legacy class ");
+ error = dev->class->resume(dev);
+ }
}
-
+ End:
up(&dev->sem);

TRACE_RESUME(error);
@@ -226,9 +389,9 @@ static int resume_device(struct device *
* went through the early resume.
*
* Take devices from the dpm_off_list, resume them,
- * and put them on the dpm_locked list.
+ * and put them on the dpm_active list.
*/
-static void dpm_resume(void)
+static void dpm_resume(pm_message_t state)
{
mutex_lock(&dpm_list_mtx);
all_sleeping = false;
@@ -237,9 +400,10 @@ static void dpm_resume(void)
struct device *dev = to_device(entry);

list_move_tail(entry, &dpm_active);
- dev->power.sleeping = false;
+ dev->power.status = DPM_ON;
mutex_unlock(&dpm_list_mtx);
- resume_device(dev);
+ resume_device(dev, state);
+ complete_device(dev, state);
mutex_lock(&dpm_list_mtx);
}
mutex_unlock(&dpm_list_mtx);
@@ -267,14 +431,15 @@ static void unregister_dropped_devices(v

/**
* device_resume - Restore state of each device in system.
+ * @state: Operation to carry out.
*
* Resume all the devices, unlock them all, and allow new
* devices to be registered once again.
*/
-void device_resume(void)
+void device_resume(pm_message_t state)
{
might_sleep();
- dpm_resume();
+ dpm_resume(state);
unregister_dropped_devices();
}
EXPORT_SYMBOL_GPL(device_resume);
@@ -282,37 +447,52 @@ EXPORT_SYMBOL_GPL(device_resume);

/*------------------------- Suspend routines -------------------------*/

-static inline char *suspend_verb(u32 event)
+/**
+ * resume_event - return a PM message representing the resume event
+ * corresponding to given sleep state.
+ * @sleep_state - PM message representing a sleep state
+ */
+static pm_message_t resume_event(pm_message_t sleep_state)
{
- switch (event) {
- case PM_EVENT_SUSPEND: return "suspend";
- case PM_EVENT_FREEZE: return "freeze";
- case PM_EVENT_PRETHAW: return "prethaw";
- default: return "(unknown suspend event)";
+ switch (sleep_state.event) {
+ case PM_EVENT_SUSPEND:
+ return PMSG_RESUME;
+ case PM_EVENT_FREEZE:
+ case PM_EVENT_QUIESCE:
+ return PMSG_RECOVER;
+ case PM_EVENT_HIBERNATE:
+ return PMSG_RESTORE;
}
-}
-
-static void
-suspend_device_dbg(struct device *dev, pm_message_t state, char *info)
-{
- dev_dbg(dev, "%s%s%s\n", info, suspend_verb(state.event),
- ((state.event == PM_EVENT_SUSPEND) && device_may_wakeup(dev)) ?
- ", may wakeup" : "");
+ return PMSG_ON;
}

/**
- * suspend_device_late - Shut down one device (late suspend).
+ * suspend_device_noirq - Shut down one device (late suspend).
* @dev: Device.
- * @state: Power state device is entering.
+ * @state: PM message representing the operation to perform.
*
* This is called with interrupts off and only a single CPU running.
*/
-static int suspend_device_late(struct device *dev, pm_message_t state)
+static int suspend_device_noirq(struct device *dev, pm_message_t state)
{
int error = 0;

- if (dev->bus && dev->bus->suspend_late) {
- suspend_device_dbg(dev, state, "LATE ");
+ if (!dev->bus)
+ return 0;
+
+ if (dev->bus->pm_noirq) {
+ struct pm_ops *ops = dev->bus->pm_noirq;
+
+ pm_dev_dbg(dev, state, "LATE ");
+ if (ops->prepare)
+ error = ops->prepare(dev);
+ if (!error) {
+ error = pm_op(dev, ops, state);
+ if (error && ops->complete)
+ ops->complete(dev);
+ }
+ } else if (dev->bus->suspend_late) {
+ pm_dev_dbg(dev, state, "legacy LATE ");
error = dev->bus->suspend_late(dev, state);
suspend_report_result(dev->bus->suspend_late, error);
}
@@ -321,7 +501,7 @@ static int suspend_device_late(struct de

/**
* device_power_down - Shut down special devices.
- * @state: Power state to enter.
+ * @state: PM message representing the operation to perform.
*
* Power down devices that require interrupts to be disabled
* and move them from the dpm_off list to the dpm_off_irq list.
@@ -337,7 +517,7 @@ int device_power_down(pm_message_t state
struct list_head *entry = dpm_off.prev;
struct device *dev = to_device(entry);

- error = suspend_device_late(dev, state);
+ error = suspend_device_noirq(dev, state);
if (error) {
printk(KERN_ERR "Could not power down device %s: "
"error %d\n",
@@ -351,12 +531,50 @@ int device_power_down(pm_message_t state
if (!error)
error = sysdev_suspend(state);
if (error)
- dpm_power_up();
+ dpm_power_up(resume_event(state));
return error;
}
EXPORT_SYMBOL_GPL(device_power_down);

/**
+ * prepare_device - Execute the ->prepare() callback(s) for given device.
+ * @dev: Device.
+ * @state: PM operation we are preparing for.
+ */
+static int prepare_device(struct device *dev, pm_message_t state)
+{
+ int error = 0;
+
+ down(&dev->sem);
+
+ if (dev->bus && dev->bus->pm && dev->bus->pm->prepare) {
+ pm_dev_dbg(dev, state, "preparing ");
+ error = dev->bus->pm->prepare(dev);
+ suspend_report_result(dev->bus->pm->prepare, error);
+ if (error)
+ goto End;
+ }
+
+ if (dev->type && dev->type->pm && dev->type->pm->prepare) {
+ pm_dev_dbg(dev, state, "preparing type ");
+ error = dev->type->pm->prepare(dev);
+ suspend_report_result(dev->type->pm->prepare, error);
+ if (error)
+ goto End;
+ }
+
+ if (dev->class && dev->class->pm && dev->class->pm->prepare) {
+ pm_dev_dbg(dev, state, "preparing class ");
+ error = dev->class->pm->prepare(dev);
+ suspend_report_result(dev->class->pm->prepare, error);
+ }
+ End:
+ up(&dev->sem);
+
+ return error;
+}
+
+/**
* suspend_device - Save state of one device.
* @dev: Device.
* @state: Power state device is entering.
@@ -367,29 +585,43 @@ static int suspend_device(struct device

down(&dev->sem);

- if (dev->power.power_state.event) {
- dev_dbg(dev, "PM: suspend %d-->%d\n",
- dev->power.power_state.event, state.event);
- }
-
- if (dev->class && dev->class->suspend) {
- suspend_device_dbg(dev, state, "class ");
- error = dev->class->suspend(dev, state);
- suspend_report_result(dev->class->suspend, error);
+ if (dev->class) {
+ if (dev->class->pm) {
+ pm_dev_dbg(dev, state, "class ");
+ error = pm_op(dev, dev->class->pm, state);
+ } else if (dev->class->suspend) {
+ pm_dev_dbg(dev, state, "legacy class ");
+ error = dev->class->suspend(dev, state);
+ suspend_report_result(dev->class->suspend, error);
+ }
+ if (error)
+ goto End;
}

- if (!error && dev->type && dev->type->suspend) {
- suspend_device_dbg(dev, state, "type ");
- error = dev->type->suspend(dev, state);
- suspend_report_result(dev->type->suspend, error);
+ if (dev->type) {
+ if (dev->type->pm) {
+ pm_dev_dbg(dev, state, "type ");
+ error = pm_op(dev, dev->type->pm, state);
+ } else if (dev->type->suspend) {
+ pm_dev_dbg(dev, state, "legacy type ");
+ error = dev->type->suspend(dev, state);
+ suspend_report_result(dev->type->suspend, error);
+ }
+ if (error)
+ goto End;
}

- if (!error && dev->bus && dev->bus->suspend) {
- suspend_device_dbg(dev, state, "");
- error = dev->bus->suspend(dev, state);
- suspend_report_result(dev->bus->suspend, error);
+ if (dev->bus) {
+ if (dev->bus->pm) {
+ pm_dev_dbg(dev, state, "");
+ error = pm_op(dev, dev->bus->pm, state);
+ } else if (dev->bus->suspend) {
+ pm_dev_dbg(dev, state, "legacy ");
+ error = dev->bus->suspend(dev, state);
+ suspend_report_result(dev->bus->suspend, error);
+ }
}
-
+ End:
up(&dev->sem);

return error;
@@ -399,13 +631,8 @@ static int suspend_device(struct device
* dpm_suspend - Suspend every device.
* @state: Power state to put each device in.
*
- * Walk the dpm_locked list. Suspend each device and move it
+ * Walk the dpm_active list. Suspend each device and move it
* to the dpm_off list.
- *
- * (For historical reasons, if it returns -EAGAIN, that used to mean
- * that the device would be called again with interrupts disabled.
- * These days, we use the "suspend_late()" callback for that, so we
- * print a warning and consider it an error).
*/
static int dpm_suspend(pm_message_t state)
{
@@ -416,30 +643,53 @@ static int dpm_suspend(pm_message_t stat
struct list_head *entry = dpm_active.prev;
struct device *dev = to_device(entry);

- WARN_ON(dev->parent && dev->parent->power.sleeping);
+ WARN_ON(dev->parent && dev->parent->power.status == DPM_OFF);
+ dev->power.status = DPM_SUSPENDING;
+ mutex_unlock(&dpm_list_mtx);
+
+ error = prepare_device(dev, state);

- dev->power.sleeping = true;
+ mutex_lock(&dpm_list_mtx);
+ if (error) {
+ dev->power.status = DPM_ON;
+ if (error == -EAGAIN)
+ continue;
+ printk(KERN_ERR "Could not prepare device %s "
+ "for suspend: error %d\n",
+ kobject_name(&dev->kobj), error);
+ goto Unlock;
+ }
+ if (dev->power.status == DPM_ON) {
+ /* Child added during prepare_device() */
+ mutex_unlock(&dpm_list_mtx);
+
+ complete_device(dev, resume_event(state));
+
+ mutex_lock(&dpm_list_mtx);
+ continue;
+ }
+ dev->power.status = DPM_OFF;
mutex_unlock(&dpm_list_mtx);
+
error = suspend_device(dev, state);
+
mutex_lock(&dpm_list_mtx);
if (error) {
+ dev->power.status = DPM_ON;
printk(KERN_ERR "Could not suspend device %s: "
- "error %d%s\n",
- kobject_name(&dev->kobj),
- error,
- (error == -EAGAIN ?
- " (please convert to suspend_late)" :
- ""));
- dev->power.sleeping = false;
- break;
+ "error %d\n", kobject_name(&dev->kobj), error);
+ mutex_unlock(&dpm_list_mtx);
+
+ complete_device(dev, resume_event(state));
+ goto End;
}
if (!list_empty(&dev->power.entry))
list_move(&dev->power.entry, &dpm_off);
}
- if (!error)
- all_sleeping = true;
+ all_sleeping = true;
+ Unlock:
mutex_unlock(&dpm_list_mtx);
-
+ End:
return error;
}

@@ -447,8 +697,7 @@ static int dpm_suspend(pm_message_t stat
* device_suspend - Save state and stop all devices in system.
* @state: new power management state
*
- * Prevent new devices from being registered, then lock all devices
- * and suspend them.
+ * Prepare and suspend all devices.
*/
int device_suspend(pm_message_t state)
{
@@ -457,7 +706,7 @@ int device_suspend(pm_message_t state)
might_sleep();
error = dpm_suspend(state);
if (error)
- device_resume();
+ device_resume(resume_event(state));
return error;
}
EXPORT_SYMBOL_GPL(device_suspend);
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -69,6 +69,9 @@ struct bus_type {
int (*resume_early)(struct device *dev);
int (*resume)(struct device *dev);

+ struct pm_ops *pm;
+ struct pm_ops *pm_noirq;
+
struct bus_type_private *p;
};

@@ -202,6 +205,8 @@ struct class {

int (*suspend)(struct device *dev, pm_message_t state);
int (*resume)(struct device *dev);
+
+ struct pm_ops *pm;
};

extern int __must_check class_register(struct class *class);
@@ -345,8 +350,11 @@ struct device_type {
struct attribute_group **groups;
int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
void (*release)(struct device *dev);
+
int (*suspend)(struct device *dev, pm_message_t state);
int (*resume)(struct device *dev);
+
+ struct pm_ops *pm;
};

/* interface for exporting device attributes */
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -224,7 +224,7 @@ static int create_image(int platform_mod
/* NOTE: device_power_up() is just a resume() for devices
* that suspended with irqs off ... no overall powerup.
*/
- device_power_up();
+ device_power_up(in_suspend ? PMSG_RECOVER : PMSG_RESTORE);
Enable_irqs:
local_irq_enable();
return error;
@@ -280,7 +280,7 @@ int hibernation_snapshot(int platform_mo
Finish:
platform_finish(platform_mode);
Resume_devices:
- device_resume();
+ device_resume(in_suspend ? PMSG_RECOVER : PMSG_RESTORE);
Resume_console:
resume_console();
Close:
@@ -301,7 +301,7 @@ static int resume_target_kernel(void)
int error;

local_irq_disable();
- error = device_power_down(PMSG_PRETHAW);
+ error = device_power_down(PMSG_QUIESCE);
if (error) {
printk(KERN_ERR "PM: Some devices failed to power down, "
"aborting resume\n");
@@ -329,7 +329,7 @@ static int resume_target_kernel(void)
swsusp_free();
restore_processor_state();
touch_softlockup_watchdog();
- device_power_up();
+ device_power_up(PMSG_THAW);
Enable_irqs:
local_irq_enable();
return error;
@@ -350,7 +350,7 @@ int hibernation_restore(int platform_mod

pm_prepare_console();
suspend_console();
- error = device_suspend(PMSG_PRETHAW);
+ error = device_suspend(PMSG_QUIESCE);
if (error)
goto Finish;

@@ -362,7 +362,7 @@ int hibernation_restore(int platform_mod
enable_nonboot_cpus();
}
platform_restore_cleanup(platform_mode);
- device_resume();
+ device_resume(PMSG_RECOVER);
Finish:
resume_console();
pm_restore_console();
@@ -419,7 +419,7 @@ int hibernation_platform_enter(void)
Finish:
hibernation_ops->finish();
Resume_devices:
- device_resume();
+ device_resume(PMSG_RESTORE);
Resume_console:
resume_console();
Close:
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -239,7 +239,7 @@ static int suspend_enter(suspend_state_t
if (!suspend_test(TEST_CORE))
error = suspend_ops->enter(state);

- device_power_up();
+ device_power_up(PMSG_RESUME);
Done:
arch_suspend_enable_irqs();
BUG_ON(irqs_disabled());
@@ -291,7 +291,7 @@ int suspend_devices_and_enter(suspend_st
if (suspend_ops->finish)
suspend_ops->finish();
Resume_devices:
- device_resume();
+ device_resume(PMSG_RESUME);
Resume_console:
resume_console();
Close:
Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1208,9 +1208,9 @@ static int suspend(int vetoable)
if (err != APM_SUCCESS)
apm_error("suspend", err);
err = (err == APM_SUCCESS) ? 0 : -EIO;
- device_power_up();
+ device_power_up(PMSG_RESUME);
local_irq_enable();
- device_resume();
+ device_resume(PMSG_RESUME);
queue_event(APM_NORMAL_RESUME, NULL);
out:
spin_lock(&user_list_lock);

2008-03-16 23:27:20

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 2/3] PM: New suspend and hibernation callbacks for platform bus type

From: Rafael J. Wysocki <[email protected]>

Implement new suspend and hibernation callbacks for the platform bus
type.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/platform.c | 444 ++++++++++++++++++++++++++++++++++++++--
include/linux/platform_device.h | 2
2 files changed, 425 insertions(+), 21 deletions(-)

Index: linux-2.6/include/linux/platform_device.h
===================================================================
--- linux-2.6.orig/include/linux/platform_device.h
+++ linux-2.6/include/linux/platform_device.h
@@ -53,6 +53,8 @@ struct platform_driver {
int (*suspend_late)(struct platform_device *, pm_message_t state);
int (*resume_early)(struct platform_device *);
int (*resume)(struct platform_device *);
+ struct pm_ops *pm;
+ struct pm_ops *pm_noirq;
struct device_driver driver;
};

Index: linux-2.6/drivers/base/platform.c
===================================================================
--- linux-2.6.orig/drivers/base/platform.c
+++ linux-2.6/drivers/base/platform.c
@@ -560,61 +560,463 @@ static int platform_match(struct device
return (strncmp(pdev->name, drv->name, BUS_ID_SIZE) == 0);
}

-static int platform_suspend(struct device *dev, pm_message_t mesg)
+#ifdef CONFIG_PM_SLEEP
+static int platform_pm_prepare(struct device *dev)
{
+ struct platform_driver *drv;
int ret = 0;

- if (dev->driver && dev->driver->suspend)
- ret = dev->driver->suspend(dev, mesg);
+ if (!dev->driver)
+ return 0;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm && drv->pm->prepare)
+ ret = drv->pm->prepare(dev);
+
+ return ret;
+}
+
+static int platform_pm_prepare_noirq(struct device *dev)
+{
+ struct platform_driver *drv;
+ int ret = 0;
+
+ if (!dev->driver)
+ return 0;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm_noirq && drv->pm_noirq->prepare)
+ ret = drv->pm_noirq->prepare(dev);
+
+ return ret;
+}
+
+static void platform_pm_complete(struct device *dev)
+{
+ struct platform_driver *drv;
+
+ if (!dev->driver)
+ return;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm && drv->pm->complete)
+ drv->pm->complete(dev);
+}
+
+static void platform_pm_complete_noirq(struct device *dev)
+{
+ struct platform_driver *drv;
+
+ if (!dev->driver)
+ return;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm_noirq && drv->pm_noirq->complete)
+ drv->pm_noirq->complete(dev);
+}
+
+#ifdef CONFIG_SUSPEND
+static int platform_pm_suspend(struct device *dev)
+{
+ struct platform_driver *drv;
+ int ret = 0;
+
+ if (!dev->driver)
+ return 0;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm) {
+ if (drv->pm->suspend)
+ ret = drv->pm->suspend(dev);
+ } else if (dev->driver->suspend) {
+ /* Legacy mechanism */
+ ret = dev->driver->suspend(dev, PMSG_SUSPEND);
+ }
+
+ return ret;
+}
+
+static int platform_pm_suspend_noirq(struct device *dev)
+{
+ struct platform_driver *drv;
+ int ret = 0;
+
+ if (!dev->driver)
+ return 0;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm_noirq) {
+ if (drv->pm_noirq->suspend)
+ ret = drv->pm_noirq->suspend(dev);
+ } else {
+ /* Legacy mechanism */
+ struct platform_device *pdev;
+
+ pdev = container_of(dev, struct platform_device, dev);
+ if (drv->suspend_late)
+ ret = drv->suspend_late(pdev, PMSG_SUSPEND);
+ }
+
+ return ret;
+}
+
+static int platform_pm_resume(struct device *dev)
+{
+ struct platform_driver *drv;
+ int ret = 0;
+
+ if (!dev->driver)
+ return 0;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm) {
+ if (drv->pm->resume)
+ ret = drv->pm->resume(dev);
+ } else if (dev->driver->resume) {
+ /* Legacy mechanism */
+ ret = dev->driver->resume(dev);
+ }

return ret;
}

-static int platform_suspend_late(struct device *dev, pm_message_t mesg)
+static int platform_pm_resume_noirq(struct device *dev)
{
- struct platform_driver *drv = to_platform_driver(dev->driver);
- struct platform_device *pdev;
+ struct platform_driver *drv;
int ret = 0;

- pdev = container_of(dev, struct platform_device, dev);
- if (dev->driver && drv->suspend_late)
- ret = drv->suspend_late(pdev, mesg);
+ if (!dev->driver)
+ return 0;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm_noirq) {
+ if (drv->pm_noirq->resume)
+ ret = drv->pm_noirq->resume(dev);
+ } else {
+ /* Legacy mechanism */
+ struct platform_device *pdev;
+
+ pdev = container_of(dev, struct platform_device, dev);
+ if (drv->resume_early)
+ ret = drv->resume_early(pdev);
+ }

return ret;
}
+#endif /* CONFIG_SUSPEND */

-static int platform_resume_early(struct device *dev)
+#ifdef CONFIG_HIBERNATION
+static int platform_pm_freeze(struct device *dev)
{
- struct platform_driver *drv = to_platform_driver(dev->driver);
- struct platform_device *pdev;
+ struct platform_driver *drv;
int ret = 0;

- pdev = container_of(dev, struct platform_device, dev);
- if (dev->driver && drv->resume_early)
- ret = drv->resume_early(pdev);
+ if (!dev->driver)
+ return 0;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm) {
+ if (drv->pm->freeze)
+ ret = drv->pm->freeze(dev);
+ } else if (dev->driver->suspend) {
+ /* Legacy mechanism */
+ ret = dev->driver->suspend(dev, PMSG_FREEZE);
+ }

return ret;
}

-static int platform_resume(struct device *dev)
+static int platform_pm_freeze_noirq(struct device *dev)
{
+ struct platform_driver *drv;
int ret = 0;

- if (dev->driver && dev->driver->resume)
+ if (!dev->driver)
+ return 0;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm_noirq) {
+ if (drv->pm_noirq->freeze)
+ ret = drv->pm_noirq->freeze(dev);
+ } else {
+ /* Legacy mechanism */
+ struct platform_device *pdev;
+
+ pdev = container_of(dev, struct platform_device, dev);
+ if (drv->suspend_late)
+ ret = drv->suspend_late(pdev, PMSG_FREEZE);
+ }
+
+ return ret;
+}
+
+static int platform_pm_thaw(struct device *dev)
+{
+ struct platform_driver *drv;
+ int ret = 0;
+
+ if (!dev->driver)
+ return 0;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm) {
+ if (drv->pm->thaw)
+ ret = drv->pm->thaw(dev);
+ } else if (dev->driver->resume) {
+ /* Legacy mechanism */
ret = dev->driver->resume(dev);
+ }

return ret;
}

+static int platform_pm_thaw_noirq(struct device *dev)
+{
+ struct platform_driver *drv;
+ int ret = 0;
+
+ if (!dev->driver)
+ return 0;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm_noirq) {
+ if (drv->pm_noirq->thaw)
+ ret = drv->pm_noirq->thaw(dev);
+ } else {
+ /* Legacy mechanism */
+ struct platform_device *pdev;
+
+ pdev = container_of(dev, struct platform_device, dev);
+ if (drv->resume_early)
+ ret = drv->resume_early(pdev);
+ }
+
+ return ret;
+}
+
+static int platform_pm_poweroff(struct device *dev)
+{
+ struct platform_driver *drv;
+ int ret = 0;
+
+ if (!dev->driver)
+ return 0;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm) {
+ if (drv->pm->poweroff)
+ ret = drv->pm->poweroff(dev);
+ } else if (dev->driver->suspend) {
+ /* Legacy mechanism */
+ ret = dev->driver->suspend(dev, PMSG_HIBERNATE);
+ }
+
+ return ret;
+}
+
+static int platform_pm_poweroff_noirq(struct device *dev)
+{
+ struct platform_driver *drv;
+ int ret = 0;
+
+ if (!dev->driver)
+ return 0;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm_noirq) {
+ if (drv->pm_noirq->poweroff)
+ ret = drv->pm_noirq->poweroff(dev);
+ } else {
+ /* Legacy mechanism */
+ struct platform_device *pdev;
+
+ pdev = container_of(dev, struct platform_device, dev);
+ if (drv->suspend_late)
+ ret = drv->suspend_late(pdev, PMSG_HIBERNATE);
+ }
+
+ return ret;
+}
+
+static int platform_pm_quiesce(struct device *dev)
+{
+ struct platform_driver *drv;
+ int ret = 0;
+
+ if (!dev->driver)
+ return 0;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm) {
+ if (drv->pm->quiesce)
+ ret = drv->pm->quiesce(dev);
+ } else if (dev->driver->suspend) {
+ /* Legacy mechanism */
+ ret = dev->driver->suspend(dev, PMSG_QUIESCE);
+ }
+
+ return ret;
+}
+
+static int platform_pm_quiesce_noirq(struct device *dev)
+{
+ struct platform_driver *drv;
+ int ret = 0;
+
+ if (!dev->driver)
+ return 0;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm_noirq) {
+ if (drv->pm_noirq->quiesce)
+ ret = drv->pm_noirq->quiesce(dev);
+ } else {
+ /* Legacy mechanism */
+ struct platform_device *pdev;
+
+ pdev = container_of(dev, struct platform_device, dev);
+ if (drv->suspend_late)
+ ret = drv->suspend_late(pdev, PMSG_QUIESCE);
+ }
+
+ return ret;
+}
+
+static int platform_pm_restore(struct device *dev)
+{
+ struct platform_driver *drv;
+ int ret = 0;
+
+ if (!dev->driver)
+ return 0;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm) {
+ if (drv->pm->restore)
+ ret = drv->pm->restore(dev);
+ } else if (dev->driver->resume) {
+ /* Legacy mechanism */
+ ret = dev->driver->resume(dev);
+ }
+
+ return ret;
+}
+
+static int platform_pm_restore_noirq(struct device *dev)
+{
+ struct platform_driver *drv;
+ int ret = 0;
+
+ if (!dev->driver)
+ return 0;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm_noirq) {
+ if (drv->pm_noirq->restore)
+ ret = drv->pm_noirq->restore(dev);
+ } else {
+ /* Legacy mechanism */
+ struct platform_device *pdev;
+
+ pdev = container_of(dev, struct platform_device, dev);
+ if (drv->resume_early)
+ ret = drv->resume_early(pdev);
+ }
+
+ return ret;
+}
+
+static int platform_pm_recover(struct device *dev)
+{
+ struct platform_driver *drv;
+ int ret = 0;
+
+ if (!dev->driver)
+ return 0;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm) {
+ if (drv->pm->recover)
+ ret = drv->pm->recover(dev);
+ } else if (dev->driver->resume) {
+ /* Legacy mechanism */
+ ret = dev->driver->resume(dev);
+ }
+
+ return ret;
+}
+
+static int platform_pm_recover_noirq(struct device *dev)
+{
+ struct platform_driver *drv;
+ int ret = 0;
+
+ if (!dev->driver)
+ return 0;
+
+ drv = to_platform_driver(dev->driver);
+ if (drv->pm_noirq) {
+ if (drv->pm_noirq->recover)
+ ret = drv->pm_noirq->recover(dev);
+ } else {
+ /* Legacy mechanism */
+ struct platform_device *pdev;
+
+ pdev = container_of(dev, struct platform_device, dev);
+ if (drv->resume_early)
+ ret = drv->resume_early(pdev);
+ }
+
+ return ret;
+}
+#endif /* CONFIG_HIBERNATION */
+#endif /* CONFIG_PM_SLEEP */
+
+struct pm_ops platform_pm_ops = {
+#ifdef CONFIG_PM_SLEEP
+ .prepare = platform_pm_prepare,
+ .complete = platform_pm_complete,
+#endif
+#ifdef CONFIG_SUSPEND
+ .suspend = platform_pm_suspend,
+ .resume = platform_pm_resume,
+#endif
+#ifdef CONFIG_HIBERNATION
+ .freeze = platform_pm_freeze,
+ .thaw = platform_pm_thaw,
+ .poweroff = platform_pm_poweroff,
+ .quiesce = platform_pm_quiesce,
+ .restore = platform_pm_restore,
+ .recover = platform_pm_recover,
+#endif
+};
+
+struct pm_ops platform_pm_ops_noirq = {
+#ifdef CONFIG_PM_SLEEP
+ .prepare = platform_pm_prepare_noirq,
+ .complete = platform_pm_complete_noirq,
+#endif
+#ifdef CONFIG_SUSPEND
+ .suspend = platform_pm_suspend_noirq,
+ .resume = platform_pm_resume_noirq,
+#endif
+#ifdef CONFIG_HIBERNATION
+ .freeze = platform_pm_freeze_noirq,
+ .thaw = platform_pm_thaw_noirq,
+ .poweroff = platform_pm_poweroff_noirq,
+ .quiesce = platform_pm_quiesce_noirq,
+ .restore = platform_pm_restore_noirq,
+ .recover = platform_pm_recover_noirq,
+#endif
+};
+
struct bus_type platform_bus_type = {
.name = "platform",
.dev_attrs = platform_dev_attrs,
.match = platform_match,
.uevent = platform_uevent,
- .suspend = platform_suspend,
- .suspend_late = platform_suspend_late,
- .resume_early = platform_resume_early,
- .resume = platform_resume,
+ .pm = &platform_pm_ops,
+ .pm_noirq = &platform_pm_ops_noirq,
};
EXPORT_SYMBOL_GPL(platform_bus_type);

2008-03-16 23:27:35

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH 3/3] PM: New suspend and hibernation callbacks for PCI bus type

From: Rafael J. Wysocki <[email protected]>

Implement new suspend and hibernation callbacks for the PCI busi type.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/pci/pci-driver.c | 497 ++++++++++++++++++++++++++++++++++++++++++-----
include/linux/pci.h | 2
2 files changed, 453 insertions(+), 46 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -271,45 +271,78 @@ static int pci_device_remove(struct devi
return 0;
}

-static int pci_device_suspend(struct device * dev, pm_message_t state)
+static void pci_device_shutdown(struct device *dev)
{
- struct pci_dev * pci_dev = to_pci_dev(dev);
- struct pci_driver * drv = pci_dev->driver;
- int i = 0;
-
- if (drv && drv->suspend) {
- i = drv->suspend(pci_dev, state);
- suspend_report_result(drv->suspend, i);
- } else {
- pci_save_state(pci_dev);
- /*
- * mark its power state as "unknown", since we don't know if
- * e.g. the BIOS will change its device state when we suspend.
- */
- if (pci_dev->current_state == PCI_D0)
- pci_dev->current_state = PCI_UNKNOWN;
- }
- return i;
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+
+ if (drv && drv->shutdown)
+ drv->shutdown(pci_dev);
}

-static int pci_device_suspend_late(struct device * dev, pm_message_t state)
+#ifdef CONFIG_PM_SLEEP
+static int pci_pm_prepare(struct device *dev)
{
- struct pci_dev * pci_dev = to_pci_dev(dev);
- struct pci_driver * drv = pci_dev->driver;
- int i = 0;
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+ int error = 0;

- if (drv && drv->suspend_late) {
- i = drv->suspend_late(pci_dev, state);
- suspend_report_result(drv->suspend_late, i);
- }
- return i;
+ if (drv && drv->pm && drv->pm->prepare)
+ error = drv->pm->prepare(dev);
+
+ return error;
+}
+
+static int pci_pm_prepare_noirq(struct device *dev)
+{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+ int error = 0;
+
+ if (drv && drv->pm_noirq && drv->pm_noirq->prepare)
+ error = drv->pm_noirq->prepare(dev);
+
+ return error;
+}
+
+static void pci_pm_complete(struct device *dev)
+{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+
+ if (drv && drv->pm && drv->pm->complete)
+ drv->pm->complete(dev);
+}
+
+static void pci_pm_complete_noirq(struct device *dev)
+{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+
+ if (drv && drv->pm_noirq && drv->pm_noirq->complete)
+ drv->pm_noirq->complete(dev);
}

/*
- * Default resume method for devices that have no driver provided resume,
+ * Default "suspend" method for devices that have no driver provided suspend,
* or not even a driver at all.
*/
-static int pci_default_resume(struct pci_dev *pci_dev)
+static void pci_default_pm_suspend(struct pci_dev *pci_dev)
+{
+ pci_save_state(pci_dev);
+ /*
+ * mark its power state as "unknown", since we don't know if
+ * e.g. the BIOS will change its device state when we suspend.
+ */
+ if (pci_dev->current_state == PCI_D0)
+ pci_dev->current_state = PCI_UNKNOWN;
+}
+
+/*
+ * Default "resume" method for devices that have no driver provided resume,
+ * or not even a driver at all.
+ */
+static int pci_default_pm_resume(struct pci_dev *pci_dev)
{
int retval = 0;

@@ -324,40 +357,376 @@ static int pci_default_resume(struct pci
return retval;
}

-static int pci_device_resume(struct device * dev)
+#ifdef CONFIG_SUSPEND
+static int pci_pm_suspend(struct device *dev)
+{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+ int error = 0;
+
+ if (!drv) {
+ pci_default_pm_suspend(pci_dev);
+ return 0;
+ }
+
+ if (drv->pm) {
+ if (drv->pm->suspend) {
+ error = drv->pm->suspend(dev);
+ suspend_report_result(drv->pm->suspend, error);
+ } else {
+ pci_default_pm_suspend(pci_dev);
+ }
+ } else {
+ /* Legacy mechanism */
+ if (drv->suspend) {
+ error = drv->suspend(pci_dev, PMSG_SUSPEND);
+ suspend_report_result(drv->suspend, error);
+ } else {
+ pci_default_pm_suspend(pci_dev);
+ }
+ }
+
+ return error;
+}
+
+static int pci_pm_suspend_noirq(struct device *dev)
+{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+ int error = 0;
+
+ if (!drv)
+ return 0;
+
+ if (drv->pm_noirq) {
+ if (drv->pm_noirq->suspend) {
+ error = drv->pm_noirq->suspend(dev);
+ suspend_report_result(drv->pm_noirq->suspend, error);
+ }
+ } else if (drv->suspend_late) {
+ /* Legacy mechanism */
+ error = drv->suspend_late(pci_dev, PMSG_SUSPEND);
+ suspend_report_result(drv->suspend_late, error);
+ }
+
+ return error;
+}
+
+static int pci_pm_resume(struct device *dev)
{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
int error;
- struct pci_dev * pci_dev = to_pci_dev(dev);
- struct pci_driver * drv = pci_dev->driver;

- if (drv && drv->resume)
+ if (!drv)
+ return pci_default_pm_resume(pci_dev);
+
+ if (drv->pm) {
+ error = drv->pm->resume ?
+ drv->pm->resume(dev) : pci_default_pm_resume(pci_dev);
+ } else {
+ /* Legacy mechanism */
+ error = drv->resume ?
+ drv->resume(pci_dev) : pci_default_pm_resume(pci_dev);
+ }
+
+ return error;
+}
+
+static int pci_pm_resume_noirq(struct device *dev)
+{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+ int error = 0;
+
+ pci_fixup_device(pci_fixup_resume, pci_dev);
+
+ if (!drv)
+ return 0;
+
+ if (drv->pm_noirq) {
+ if (drv->pm_noirq->resume)
+ error = drv->pm_noirq->resume(dev);
+ } else if (drv->resume_early) {
+ /* Legacy mechanism */
+ error = drv->resume_early(pci_dev);
+ }
+
+ return error;
+}
+#endif /* CONFIG_SUSPEND */
+
+#ifdef CONFIG_HIBERNATION
+static int pci_pm_freeze(struct device *dev)
+{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+ int error = 0;
+
+ if (!drv) {
+ pci_default_pm_suspend(pci_dev);
+ return 0;
+ }
+
+ if (drv->pm) {
+ if (drv->pm->freeze) {
+ error = drv->pm->freeze(dev);
+ suspend_report_result(drv->pm->freeze, error);
+ } else {
+ pci_default_pm_suspend(pci_dev);
+ }
+ } else {
+ /* Legacy mechanism */
+ if (drv->suspend) {
+ error = drv->suspend(pci_dev, PMSG_FREEZE);
+ suspend_report_result(drv->suspend, error);
+ } else {
+ pci_default_pm_suspend(pci_dev);
+ }
+ }
+
+ return error;
+}
+
+static int pci_pm_freeze_noirq(struct device *dev)
+{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+ int error = 0;
+
+ if (!drv)
+ return 0;
+
+ if (drv->pm_noirq) {
+ if (drv->pm_noirq->freeze) {
+ error = drv->pm_noirq->freeze(dev);
+ suspend_report_result(drv->pm_noirq->freeze, error);
+ }
+ } else if (drv->suspend_late) {
+ /* Legacy mechanism */
+ error = drv->suspend_late(pci_dev, PMSG_FREEZE);
+ suspend_report_result(drv->suspend_late, error);
+ }
+
+ return error;
+}
+
+static int pci_pm_thaw(struct device *dev)
+{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+ int error = 0;
+
+ if (!drv)
+ return 0;
+
+ if (drv->pm) {
+ if (drv->pm->thaw)
+ error = drv->pm->thaw(dev);
+ } else if (drv->resume) {
+ /* Legacy mechanism */
error = drv->resume(pci_dev);
- else
- error = pci_default_resume(pci_dev);
+ }
+
+ return error;
+}
+
+static int pci_pm_thaw_noirq(struct device *dev)
+{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+ int error = 0;
+
+ if (!drv)
+ return 0;
+
+ if (drv->pm_noirq) {
+ if (drv->pm_noirq->thaw)
+ error = drv->pm_noirq->thaw(dev);
+ } else if (drv->resume_early) {
+ /* Legacy mechanism */
+ error = drv->resume_early(pci_dev);
+ }
+
+ return error;
+}
+
+static int pci_pm_poweroff(struct device *dev)
+{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+ int error = 0;
+
+ if (!drv)
+ return 0;
+
+ if (drv->pm) {
+ if (drv->pm->poweroff) {
+ error = drv->pm->poweroff(dev);
+ suspend_report_result(drv->pm->poweroff, error);
+ }
+ } else if (drv->suspend) {
+ /* Legacy mechanism */
+ error = drv->suspend(pci_dev, PMSG_HIBERNATE);
+ suspend_report_result(drv->suspend, error);
+ }
+
+ return error;
+}
+
+static int pci_pm_poweroff_noirq(struct device *dev)
+{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+ int error = 0;
+
+ if (!drv)
+ return 0;
+
+ if (drv->pm_noirq) {
+ if (drv->pm_noirq->poweroff) {
+ error = drv->pm_noirq->poweroff(dev);
+ suspend_report_result(drv->pm_noirq->poweroff, error);
+ }
+ } else if (drv->suspend_late) {
+ /* Legacy mechanism */
+ error = drv->suspend_late(pci_dev, PMSG_HIBERNATE);
+ suspend_report_result(drv->suspend_late, error);
+ }
+
+ return error;
+}
+
+static int pci_pm_quiesce(struct device *dev)
+{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+ int error = 0;
+
+ if (!drv)
+ return 0;
+
+ if (drv->pm) {
+ if (drv->pm->quiesce) {
+ error = drv->pm->quiesce(dev);
+ suspend_report_result(drv->pm->quiesce, error);
+ }
+ } else if (drv->suspend) {
+ /* Legacy mechanism */
+ error = drv->suspend(pci_dev, PMSG_QUIESCE);
+ suspend_report_result(drv->suspend, error);
+ }
+
+ return error;
+}
+
+static int pci_pm_quiesce_noirq(struct device *dev)
+{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+ int error = 0;
+
+ if (!drv)
+ return 0;
+
+ if (drv->pm_noirq) {
+ if (drv->pm_noirq->quiesce) {
+ error = drv->pm_noirq->quiesce(dev);
+ suspend_report_result(drv->pm_noirq->quiesce, error);
+ }
+ } else if (drv->suspend_late) {
+ /* Legacy mechanism */
+ error = drv->suspend_late(pci_dev, PMSG_QUIESCE);
+ suspend_report_result(drv->suspend_late, error);
+ }
+
+ return error;
+}
+
+static int pci_pm_restore(struct device *dev)
+{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+ int error;
+
+ if (!drv)
+ return pci_default_pm_resume(pci_dev);
+
+ if (drv->pm) {
+ error = drv->pm->restore ?
+ drv->pm->restore(dev) : pci_default_pm_resume(pci_dev);
+ } else {
+ /* Legacy mechanism */
+ error = drv->resume ?
+ drv->resume(pci_dev) : pci_default_pm_resume(pci_dev);
+ }
+
return error;
}

-static int pci_device_resume_early(struct device * dev)
+static int pci_pm_restore_noirq(struct device *dev)
{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
int error = 0;
- struct pci_dev * pci_dev = to_pci_dev(dev);
- struct pci_driver * drv = pci_dev->driver;

pci_fixup_device(pci_fixup_resume, pci_dev);

- if (drv && drv->resume_early)
+ if (!drv)
+ return 0;
+
+ if (drv->pm_noirq) {
+ if (drv->pm_noirq->restore)
+ error = drv->pm_noirq->restore(dev);
+ } else if (drv->resume_early) {
+ /* Legacy mechanism */
error = drv->resume_early(pci_dev);
+ }
+
return error;
}

-static void pci_device_shutdown(struct device *dev)
+static int pci_pm_recover(struct device *dev)
{
struct pci_dev *pci_dev = to_pci_dev(dev);
struct pci_driver *drv = pci_dev->driver;
+ int error = 0;

- if (drv && drv->shutdown)
- drv->shutdown(pci_dev);
+ if (!drv)
+ return 0;
+
+ if (drv->pm) {
+ if (drv->pm->recover)
+ error = drv->pm->recover(dev);
+ } else if (drv->resume) {
+ /* Legacy mechanism */
+ error = drv->resume(pci_dev);
+ }
+
+ return error;
+}
+
+static int pci_pm_recover_noirq(struct device *dev)
+{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+ int error = 0;
+
+ if (!drv)
+ return 0;
+
+ if (drv->pm_noirq) {
+ if (drv->pm_noirq->recover)
+ error = drv->pm_noirq->recover(dev);
+ } else if (drv->resume_early) {
+ /* Legacy mechanism */
+ error = drv->resume_early(pci_dev);
+ }
+
+ return error;
}
+#endif /* CONFIG_HIBERNATION */
+#endif /* CONFIG_PM_SLEEP */

/**
* __pci_register_driver - register a new pci driver
@@ -500,18 +869,54 @@ int pci_uevent(struct device *dev, struc
}
#endif

+struct pm_ops pci_pm_ops = {
+#ifdef CONFIG_PM_SLEEP
+ .prepare = pci_pm_prepare,
+ .complete = pci_pm_complete,
+#endif
+#ifdef CONFIG_SUSPEND
+ .suspend = pci_pm_suspend,
+ .resume = pci_pm_resume,
+#endif
+#ifdef CONFIG_HIBERNATION
+ .freeze = pci_pm_freeze,
+ .thaw = pci_pm_thaw,
+ .poweroff = pci_pm_poweroff,
+ .quiesce = pci_pm_quiesce,
+ .restore = pci_pm_restore,
+ .recover = pci_pm_recover,
+#endif
+};
+
+struct pm_ops pci_pm_ops_noirq = {
+#ifdef CONFIG_PM_SLEEP
+ .prepare = pci_pm_prepare_noirq,
+ .complete = pci_pm_complete_noirq,
+#endif
+#ifdef CONFIG_SUSPEND
+ .suspend = pci_pm_suspend_noirq,
+ .resume = pci_pm_resume_noirq,
+#endif
+#ifdef CONFIG_HIBERNATION
+ .freeze = pci_pm_freeze_noirq,
+ .thaw = pci_pm_thaw_noirq,
+ .poweroff = pci_pm_poweroff_noirq,
+ .quiesce = pci_pm_quiesce_noirq,
+ .restore = pci_pm_restore_noirq,
+ .recover = pci_pm_recover_noirq,
+#endif
+};
+
struct bus_type pci_bus_type = {
.name = "pci",
.match = pci_bus_match,
.uevent = pci_uevent,
.probe = pci_device_probe,
.remove = pci_device_remove,
- .suspend = pci_device_suspend,
- .suspend_late = pci_device_suspend_late,
- .resume_early = pci_device_resume_early,
- .resume = pci_device_resume,
.shutdown = pci_device_shutdown,
.dev_attrs = pci_dev_attrs,
+ .pm = &pci_pm_ops,
+ .pm_noirq = &pci_pm_ops_noirq,
};

static int __init pci_driver_init(void)
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -382,6 +382,8 @@ struct pci_driver {
int (*resume) (struct pci_dev *dev); /* Device woken up */
void (*shutdown) (struct pci_dev *dev);

+ struct pm_ops *pm;
+ struct pm_ops *pm_noirq;
struct pci_error_handlers *err_handler;
struct device_driver driver;
struct pci_dynids dynids;

2008-03-18 11:47:37

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH 2/3] PM: New suspend and hibernation callbacks for platform bus type

On Mon 2008-03-17 00:24:22, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Implement new suspend and hibernation callbacks for the platform bus
> type.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>

ACK... but... this is getting quite big:

> drivers/base/platform.c | 444 ++++++++++++++++++++++++++++++++++++++--
> include/linux/platform_device.h | 2

...I hope cleanups in drivers are worth it...

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
pomozte zachranit klanovicky les: http://www.ujezdskystrom.info/

2008-03-18 11:47:49

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/3] PM: New suspend and hibernation callbacks for PCI bus type

On Mon 2008-03-17 00:25:15, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Implement new suspend and hibernation callbacks for the PCI busi type.

"bus", I guess.

> Signed-off-by: Rafael J. Wysocki <[email protected]>

ACK.

> ---
> drivers/pci/pci-driver.c | 497 ++++++++++++++++++++++++++++++++++++++++++-----
> include/linux/pci.h | 2
> 2 files changed, 453 insertions(+), 46 deletions(-)

...but big, too :-(.
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
pomozte zachranit klanovicky les: http://www.ujezdskystrom.info/

2008-03-18 11:48:04

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/3] PM: Introduce new top level suspend and hibernation callbacks

Hi!

> Signed-off-by: Rafael J. Wysocki <[email protected]>

ACK.

> + * The externally visible transitions are handled with the help of the following
> + * callbacks included in this structure:
> + *
> + * @prepare: Prepare the device for the upcoming transition, but do NOT change
> + * its hardware state. Prevent new children of the device from being
> + * registered and prevent new calls to the probe method from being made
> + * after @prepare() returns. Also, if a concurrent child registration
> + * or a call to the probe method already in progress is detected by
> + * @prepare(), it may return -EAGAIN, so that the PM core can execute it
> + * once again (e.g. after suspending the newly registered child) to recover
> + * from the race condition. This method is executed for all kinds of
> + * suspend transitions and is immediately followed by one of the suspend
> + * callbacks: @suspend(), @freeze(), @poweroff(), or @quiesce().
> + *
> + * @complete: Undo the changes made by @prepare(). This method is executed for
> + * all kinds of resume transitions, immediately following one of the resume
> + * callbacks: @resume(), @thaw(), @restore(), or @recover(). Also executed
> + * if a suspend callback (@suspend(), @freeze(), @poweroff(), @quiesce())
> + * immediately following a successful @prepare() fails OR if a new child
> + * of the device has been registered during @prepare().

So... we do prepare() but it detects new child, so it returns -EAGAIN.
so we call complete() based on description above
...and then we call prepare() to suspend again?

> + * @suspend: Executed before putting the system into a sleep state in which the
> + * contents of main memory are preserved. Quiesce the device, put it into

content....is?

> + * @resume: Executed after waking the system up from a sleep state in which the
> + * contents of main memory were preserved. Put the device into the

content...is?

> + * @freeze: Hibernation-specific, executed before creating a hibernation image.
> + * Quiesce operations so that a consistent image can be created, but do NOT
> + * otherwise put the device into a low power device state and do NOT emit
> + * system wakeup events. Save in main memory the device settings to be

"Save device settings (used by @restore()) into main memory"?

> + * @quiesce: Hibernation-specific, executed after loading a hibernation image
> + * and before restoring the contents of main memory from it. Quiesce
> + * operations so that the contents of main memory can be restored from the

operations, so ?
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
pomozte zachranit klanovicky les: http://www.ujezdskystrom.info/

2008-03-19 20:48:23

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/3] PM: Introduce new top level suspend and hibernation callbacks

On Wednesday, 19 of March 2008, Pavel Machek wrote:
> Hi!

Hi,

> > > So... we do prepare() but it detects new child, so it returns -EAGAIN.
> > > so we call complete() based on description above
> > > ...and then we call prepare() to suspend again?
> >
> > You misunderstood (maybe the comment needs to be clarified as above).
>
> Yes, please :-).
>
> > If prepare() returns any error (including -EAGAIN) then complete() does
> > not get called. If prepare() returns successfully but the PM core
> > detects that a new child was added while prepare() was running, then we
> > call complete(), suspend the child, and call prepare() again.
>
> Ok.
>
> > > > + * @suspend: Executed before putting the system into a sleep state in which the
> > > > + * contents of main memory are preserved. Quiesce the device, put it into
> > >
> > > content....is?
> >
> > It's okay to use "contents" -- analogous to the table of contents in a
> > book. It's one of those weird corner cases where either alternative is
> > acceptable.
>
> Ok -- I guess I should get that english course ;-).

Thanks for the comments.

In the meantime I had an IRC chat with Ben, who wanted a couple of quite
substantial changes to be made to this patch.

First, Ben thinks that ->prepare() should be called in a separate loop for
all devices, before any of them is suspended, so that drivers can assume the
availability of the other devices during ->prepare() (for example, so that they
can use GFP_KERNEL memory allocations). Accordingly, ->complete() would
be called in a separate loop after calling ->resume() for all devices.

Next, he wants the number of noirq callbacks to be reduced.

Both of the above things make sense, so I'll rework the patch (or maybe even
all three patches) to implement them and we'll see how they will look like.

Thanks,
Rafael

2008-03-19 21:16:33

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/3] PM: Introduce new top level suspend and hibernation callbacks

On Tue 2008-03-18 17:53:40, Greg KH wrote:
> On Mon, Mar 17, 2008 at 12:22:29AM +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > Introduce 'struct pm_ops' representing a set of suspend and
> > hibernation operations for bus types, device classes and device
> > types.
>
> Ok, I must have missed the thread describing why we need to do this, so,
> why do we need to do this? What is this going to buy us in the end
> after everything is changed?

That was rather long thread where Linus flamed us for having
everything in one "suspend" callback, selecting what to do by
pm_message_t parameter. It was kind of hard to miss ;-).
Pavel


> > +struct pm_ops {
> > +#ifdef CONFIG_PM_SLEEP
> > + int (*prepare)(struct device *dev);
> > + void (*complete)(struct device *dev);
> > +#endif
> > +#ifdef CONFIG_SUSPEND
> > + int (*suspend)(struct device *dev);
> > + int (*resume)(struct device *dev);
> > +#endif
> > +#ifdef CONFIG_HIBERNATION
> > + int (*freeze)(struct device *dev);
> > + int (*thaw)(struct device *dev);
> > + int (*poweroff)(struct device *dev);
> > + int (*quiesce)(struct device *dev);
> > + int (*restore)(struct device *dev);
> > + int (*recover)(struct device *dev);
> > +#endif
>
>
> Don't ifdef stuff like this, it only causes ifdefs to be needed to the
> .c code as well for all places these structures are defined in
> drivers/busses, right?

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
pomozte zachranit klanovicky les: http://www.ujezdskystrom.info/

2008-03-19 21:17:28

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/3] PM: Introduce new top level suspend and hibernation callbacks

Hi!

> > So... we do prepare() but it detects new child, so it returns -EAGAIN.
> > so we call complete() based on description above
> > ...and then we call prepare() to suspend again?
>
> You misunderstood (maybe the comment needs to be clarified as above).

Yes, please :-).

> If prepare() returns any error (including -EAGAIN) then complete() does
> not get called. If prepare() returns successfully but the PM core
> detects that a new child was added while prepare() was running, then we
> call complete(), suspend the child, and call prepare() again.

Ok.

> > > + * @suspend: Executed before putting the system into a sleep state in which the
> > > + * contents of main memory are preserved. Quiesce the device, put it into
> >
> > content....is?
>
> It's okay to use "contents" -- analogous to the table of contents in a
> book. It's one of those weird corner cases where either alternative is
> acceptable.

Ok -- I guess I should get that english course ;-).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
pomozte zachranit klanovicky les: http://www.ujezdskystrom.info/

2008-03-19 21:32:45

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/3] PM: Introduce new top level suspend and hibernation callbacks

Hi Alan !

On Tue, 2008-03-18 at 22:44 -0400, Alan Stern wrote:

> In order to make this work, you would have to prevent new children from
> being registered starting from the time just after prepare() returns,
> rather than from the time just before suspend() is called. Nothing is
> wrong with that, but it requires a redesign of the new flags. (It's
> worth mentioning that drivers will want to have a flag that gets set
> just _before_ prepare() -- just _after_ is too late.)

I think that's the most logical semantic to expose to drivers. We've
been needing prepare() for a lot of things in the past.

The main one is that you know user space is still up & running (not only
it's not been frozen, but even in the absence of a freezer, you know
storage hasn't been stopped yet, etc...). So if your driver has tight
interaction with userspace that requires clean handling of suspend, it
can be done at that point (X with DRI is an example that could make good
use of that).

The whole GFP_KERNEL allocation is another, though we could make
GFP_KERNEL become silently GFP_NOIO when entering the suspend loop, it's
still useful for the driver to know that we are -about- to start a
suspend operation, if any significant storage is needed (for backup for
example, or if it needs to cache a firmware to ensure proper resume),
it's the right time to do it.

Now, about the issue vs. device insertion, I believe it's solvable
unless I missed something:

So it's the responsibility of bus drivers such as khubd to stop
inserting new devices in the tree from prepare(). There we agree. The
problem is about -enforcing- it in the core right ? That is, making
device addition actually fail if a bogus bus driver tries to insert a
device at the wrong time.

This does indeed have an issue of when precisely do we stick that flag
(that tells the core to fail) since before calling the bus driver
prepare() is too early and after is potentially too late.

Now, I think this is not really a problem if prepare() calls the drivers
from top to bottom: In that case, we would set the flag after the bus
driver returns from prepare(). That means that there is a small window
where a bogus bus driver can still insert a new child, but I think that
isn't an issue, specifically because we walk from parent to child:

If the sequence is:

1- prepare() returns
2- we set the "no new addition" flag
3- we start walking children

You see that the only place the where the "bad" bus driver can still
attempt to insert without failing is between 1 and 2. However, that
happens before we start prepare()'ing the children. So that means that
anything that has successfully been added -will- have it's prepare()
called, and thus becomes a normal part of the suspend/resume process.
Any attempt at adding after 2 would fail, thus we are sure the children
list is sane here (won't get new things right in between. Removal might
still happen I suppose, we'll have to look into this closely).

> In the other direction, it ought to be okay to allow new children to
> be registered during resume(). There's no need to wait for complete().

There are pros and cons here. A lot of drivers may rely on
request_firmware() and such, and will possibly deadlock or timeout if
called during resume before storage is available (i've seen that
happening with today scheme). It might be better to be safe here, but I
agree, it's generally less of a concern.

> > Next, he wants the number of noirq callbacks to be reduced.
>
> I agree. For example, there isn't much point in having prepare() and
> complete() entries for noirq.

Exactly. I think we really only need noirq variants of suspend and
resume, and possibly freeze and thaw (I'm not 100% of the later as I
haven't studied the problem of hibernate closely, but I can see a usage
scenario that might make sense).

> > Both of the above things make sense, so I'll rework the patch (or maybe even
> > all three patches) to implement them and we'll see how they will look like.

Cheers,
Ben.

2008-03-19 21:49:06

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/3] PM: New suspend and hibernation callbacks for PCI bus type

On Wednesday, 19 of March 2008, Greg KH wrote:
> On Mon, Mar 17, 2008 at 12:25:15AM +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > Implement new suspend and hibernation callbacks for the PCI busi type.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
> > drivers/pci/pci-driver.c | 497 ++++++++++++++++++++++++++++++++++++++++++-----
> > include/linux/pci.h | 2
> > 2 files changed, 453 insertions(+), 46 deletions(-)
>
> Hm, that's a lot of code added. What additional functionality did we
> just get?

The possibility to handle suspend/hibernation in a more flexible way, provided
that the drivers define the new callbacks.

> Are we going to have to create a noirq version for every driver and bus
> type now?

No, we aren't. The PCI and platform bus types already have the noirq callbacks
implemented (they are called _late and _early, but that's the same
essentially).

Thanks,
Rafael

2008-03-19 21:49:35

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/3] PM: Introduce new top level suspend and hibernation callbacks

On Wednesday, 19 of March 2008, Greg KH wrote:
> On Mon, Mar 17, 2008 at 12:22:29AM +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > Introduce 'struct pm_ops' representing a set of suspend and
> > hibernation operations for bus types, device classes and device
> > types.
>
> Ok, I must have missed the thread describing why we need to do this, so,
> why do we need to do this? What is this going to buy us in the end
> after everything is changed?

There were many threads related to that.

To summarize, the first purpose is to separate suspend (aka s2ram and standby)
callbacks from hibernation callbacks in such a way that the new callbacks won't
take arguments and the purpose of each of them will be clearly specified. This
has been requested multiple times by many people, including Linus himself,
and the reason is that within the current scheme if ->resume() is called, for
example, it's difficult to say why it's been called (ie. is it a resume from RAM or
from hibernation or a suspend/hibernation failure etc.?).

The second purpose is to make the suspend/hibernation callbacks more flexible
so that device drivers can handle more than they can within the current scheme.
For example, some drivers may need to prevent new children of the device from
being registered before their ->suspend() callbacks are executed or they may
want to carry out some operations requiring the availability of some other
devices, not directly bound via the parent-child relationship, in order to prepare
for the execution of ->suspend(), etc.

Ultimately, we'd like to stop using the freezing of tasks for suspend and
therefore the drivers' suspend/hibernation code will have to take care of
the handling of the user space during suspend/hibernation which would be
difficult within the current scheme, without the ->prepare() and ->complete()
callbacks.

> > +struct pm_ops {
> > +#ifdef CONFIG_PM_SLEEP
> > + int (*prepare)(struct device *dev);
> > + void (*complete)(struct device *dev);
> > +#endif
> > +#ifdef CONFIG_SUSPEND
> > + int (*suspend)(struct device *dev);
> > + int (*resume)(struct device *dev);
> > +#endif
> > +#ifdef CONFIG_HIBERNATION
> > + int (*freeze)(struct device *dev);
> > + int (*thaw)(struct device *dev);
> > + int (*poweroff)(struct device *dev);
> > + int (*quiesce)(struct device *dev);
> > + int (*restore)(struct device *dev);
> > + int (*recover)(struct device *dev);
> > +#endif
>
>
> Don't ifdef stuff like this, it only causes ifdefs to be needed to the
> .c code as well for all places these structures are defined in
> drivers/busses, right?

Sure, but that code won't be necessary otherwise too.

I'll remove the #ifdefs in the next version of the patch.

Thanks,
Rafael

2008-03-19 22:16:32

by Alan Stern

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/3] PM: Introduce new top level suspend and hibernation callbacks

On Wed, 19 Mar 2008, Benjamin Herrenschmidt wrote:

> Hi Alan !

Hi!

> On Tue, 2008-03-18 at 22:44 -0400, Alan Stern wrote:

> Now, about the issue vs. device insertion, I believe it's solvable
> unless I missed something:
>
> So it's the responsibility of bus drivers such as khubd to stop
> inserting new devices in the tree from prepare(). There we agree. The
> problem is about -enforcing- it in the core right ? That is, making
> device addition actually fail if a bogus bus driver tries to insert a
> device at the wrong time.
>
> This does indeed have an issue of when precisely do we stick that flag
> (that tells the core to fail) since before calling the bus driver
> prepare() is too early and after is potentially too late.
>
> Now, I think this is not really a problem if prepare() calls the drivers
> from top to bottom:

That was in fact the original plan, before Rafael changed over to
calling prepare() just prior to suspend().

> In that case, we would set the flag after the bus
> driver returns from prepare(). That means that there is a small window
> where a bogus bus driver can still insert a new child, but I think that
> isn't an issue, specifically because we walk from parent to child:
>
> If the sequence is:
>
> 1- prepare() returns
> 2- we set the "no new addition" flag
> 3- we start walking children
>
> You see that the only place the where the "bad" bus driver can still
> attempt to insert without failing is between 1 and 2. However, that
> happens before we start prepare()'ing the children. So that means that
> anything that has successfully been added -will- have it's prepare()
> called, and thus becomes a normal part of the suspend/resume process.
> Any attempt at adding after 2 would fail, thus we are sure the children
> list is sane here (won't get new things right in between. Removal might
> still happen I suppose, we'll have to look into this closely).

Your analysis is right. Removal isn't a problem; we don't care if
drivers are prohibited from registering new children when they are
unbound. In general we want to make it possible to unregister devices
at any time during the suspend-resume procedure (except of course that
a device can't be unregistered while one of its methods is running).

An important point I tried to make in the earlier email is that drivers
will want a simple way to know when it is illegal for them to register
new children. For example, suppose the registration is done by a
workqueue routine. The most reliable way for the driver to insure that
the routine won't try to register new children improperly is to have
the routine check a flag which gets set _before_ prepare() is called.

This means there would have to be _two_ flags: one set before calling
prepare() and one set afterward. Only the second would be checked by
the PM core when deciding whether to allow new child registrations.

> > In the other direction, it ought to be okay to allow new children to
> > be registered during resume(). There's no need to wait for complete().
>
> There are pros and cons here. A lot of drivers may rely on
> request_firmware() and such, and will possibly deadlock or timeout if
> called during resume before storage is available (i've seen that
> happening with today scheme). It might be better to be safe here, but I
> agree, it's generally less of a concern.

That's up to the individual driver. If it wants to wait until its
complete method runs, then fine. But if it doesn't want to wait, the
PM core shouldn't force new registrations to fail.

Alan Stern

2008-03-19 23:06:31

by Alan Stern

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/3] PM: Introduce new top level suspend and hibernation callbacks

On Wed, 19 Mar 2008, Rafael J. Wysocki wrote:

> In the meantime I had an IRC chat with Ben, who wanted a couple of quite
> substantial changes to be made to this patch.
>
> First, Ben thinks that ->prepare() should be called in a separate loop for
> all devices, before any of them is suspended, so that drivers can assume the
> availability of the other devices during ->prepare() (for example, so that they
> can use GFP_KERNEL memory allocations). Accordingly, ->complete() would
> be called in a separate loop after calling ->resume() for all devices.

In order to make this work, you would have to prevent new children from
being registered starting from the time just after prepare() returns,
rather than from the time just before suspend() is called. Nothing is
wrong with that, but it requires a redesign of the new flags. (It's
worth mentioning that drivers will want to have a flag that gets set
just _before_ prepare() -- just _after_ is too late.)

In the other direction, it ought to be okay to allow new children to
be registered during resume(). There's no need to wait for complete().

> Next, he wants the number of noirq callbacks to be reduced.

I agree. For example, there isn't much point in having prepare() and
complete() entries for noirq.

> Both of the above things make sense, so I'll rework the patch (or maybe even
> all three patches) to implement them and we'll see how they will look like.

Alan Stern

2008-03-19 23:27:21

by Greg KH

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/3] PM: Introduce new top level suspend and hibernation callbacks

On Wed, Mar 19, 2008 at 02:22:00PM +0100, Rafael J. Wysocki wrote:
> On Wednesday, 19 of March 2008, Greg KH wrote:
> > On Mon, Mar 17, 2008 at 12:22:29AM +0100, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <[email protected]>
> > >
> > > Introduce 'struct pm_ops' representing a set of suspend and
> > > hibernation operations for bus types, device classes and device
> > > types.
> >
> > Ok, I must have missed the thread describing why we need to do this, so,
> > why do we need to do this? What is this going to buy us in the end
> > after everything is changed?
>
> There were many threads related to that.
>
> To summarize, the first purpose is to separate suspend (aka s2ram and standby)
> callbacks from hibernation callbacks in such a way that the new callbacks won't
> take arguments and the purpose of each of them will be clearly specified. This
> has been requested multiple times by many people, including Linus himself,
> and the reason is that within the current scheme if ->resume() is called, for
> example, it's difficult to say why it's been called (ie. is it a resume from RAM or
> from hibernation or a suspend/hibernation failure etc.?).
>
> The second purpose is to make the suspend/hibernation callbacks more flexible
> so that device drivers can handle more than they can within the current scheme.
> For example, some drivers may need to prevent new children of the device from
> being registered before their ->suspend() callbacks are executed or they may
> want to carry out some operations requiring the availability of some other
> devices, not directly bound via the parent-child relationship, in order to prepare
> for the execution of ->suspend(), etc.
>
> Ultimately, we'd like to stop using the freezing of tasks for suspend and
> therefore the drivers' suspend/hibernation code will have to take care of
> the handling of the user space during suspend/hibernation which would be
> difficult within the current scheme, without the ->prepare() and ->complete()
> callbacks.

Ok, thanks. You might want to include this in the patch itself (hint,
hint, hint...)

greg k-h

2008-03-19 23:27:45

by Greg KH

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/3] PM: New suspend and hibernation callbacks for PCI bus type

On Wed, Mar 19, 2008 at 02:24:48PM +0100, Rafael J. Wysocki wrote:
> On Wednesday, 19 of March 2008, Greg KH wrote:
> > On Mon, Mar 17, 2008 at 12:25:15AM +0100, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <[email protected]>
> > >
> > > Implement new suspend and hibernation callbacks for the PCI busi type.
> > >
> > > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > > ---
> > > drivers/pci/pci-driver.c | 497 ++++++++++++++++++++++++++++++++++++++++++-----
> > > include/linux/pci.h | 2
> > > 2 files changed, 453 insertions(+), 46 deletions(-)
> >
> > Hm, that's a lot of code added. What additional functionality did we
> > just get?
>
> The possibility to handle suspend/hibernation in a more flexible way, provided
> that the drivers define the new callbacks.

So, we're going to have to now convert all drivers, right? Nice, I can
always use a bump up in the "number of patches submited" numbers :)

thanks,

greg k-h

2008-03-19 23:28:24

by Greg KH

[permalink] [raw]
Subject: Re: [RFC][PATCH 3/3] PM: New suspend and hibernation callbacks for PCI bus type

On Mon, Mar 17, 2008 at 12:25:15AM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Implement new suspend and hibernation callbacks for the PCI busi type.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> drivers/pci/pci-driver.c | 497 ++++++++++++++++++++++++++++++++++++++++++-----
> include/linux/pci.h | 2
> 2 files changed, 453 insertions(+), 46 deletions(-)

Hm, that's a lot of code added. What additional functionality did we
just get?

Are we going to have to create a noirq version for every driver and bus
type now?

thanks,

greg k-h

2008-03-19 23:28:54

by Greg KH

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/3] PM: Introduce new top level suspend and hibernation callbacks

On Mon, Mar 17, 2008 at 12:22:29AM +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> Introduce 'struct pm_ops' representing a set of suspend and
> hibernation operations for bus types, device classes and device
> types.

Ok, I must have missed the thread describing why we need to do this, so,
why do we need to do this? What is this going to buy us in the end
after everything is changed?

> +struct pm_ops {
> +#ifdef CONFIG_PM_SLEEP
> + int (*prepare)(struct device *dev);
> + void (*complete)(struct device *dev);
> +#endif
> +#ifdef CONFIG_SUSPEND
> + int (*suspend)(struct device *dev);
> + int (*resume)(struct device *dev);
> +#endif
> +#ifdef CONFIG_HIBERNATION
> + int (*freeze)(struct device *dev);
> + int (*thaw)(struct device *dev);
> + int (*poweroff)(struct device *dev);
> + int (*quiesce)(struct device *dev);
> + int (*restore)(struct device *dev);
> + int (*recover)(struct device *dev);
> +#endif


Don't ifdef stuff like this, it only causes ifdefs to be needed to the
.c code as well for all places these structures are defined in
drivers/busses, right?

thanks,

greg k-h

2008-03-19 23:40:19

by Alan Stern

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/3] PM: Introduce new top level suspend and hibernation callbacks

On Tue, 18 Mar 2008, Pavel Machek wrote:

> > + * @complete: Undo the changes made by @prepare(). This method is executed for
> > + * all kinds of resume transitions, immediately following one of the resume
> > + * callbacks: @resume(), @thaw(), @restore(), or @recover(). Also executed
> > + * if a suspend callback (@suspend(), @freeze(), @poweroff(), @quiesce())
> > + * immediately following a successful @prepare() fails OR if a new child
> > + * of the device has been registered during @prepare().

"during a successful call to @prepare()"?

> So... we do prepare() but it detects new child, so it returns -EAGAIN.
> so we call complete() based on description above
> ...and then we call prepare() to suspend again?

You misunderstood (maybe the comment needs to be clarified as above).

If prepare() returns any error (including -EAGAIN) then complete() does
not get called. If prepare() returns successfully but the PM core
detects that a new child was added while prepare() was running, then we
call complete(), suspend the child, and call prepare() again.

> > + * @suspend: Executed before putting the system into a sleep state in which the
> > + * contents of main memory are preserved. Quiesce the device, put it into
>
> content....is?

It's okay to use "contents" -- analogous to the table of contents in a
book. It's one of those weird corner cases where either alternative is
acceptable.

Alan Stern

2008-03-20 03:35:38

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/3] PM: Introduce new top level suspend and hibernation callbacks


> That was in fact the original plan, before Rafael changed over to
> calling prepare() just prior to suspend().

Heh, ok :-)

> Your analysis is right. Removal isn't a problem; we don't care if
> drivers are prohibited from registering new children when they are
> unbound. In general we want to make it possible to unregister devices
> at any time during the suspend-resume procedure (except of course that
> a device can't be unregistered while one of its methods is running).
>
> An important point I tried to make in the earlier email is that drivers
> will want a simple way to know when it is illegal for them to register
> new children. For example, suppose the registration is done by a
> workqueue routine. The most reliable way for the driver to insure that
> the routine won't try to register new children improperly is to have
> the routine check a flag which gets set _before_ prepare() is called.

I don't totally agree here. Drivers could have their own flag set
internally with appropriate locking. The problem with your approach is
locking. Just setting a flag is mostly useless, -unless- there
appropriate locking between setter and testers.

For example, the flags we talked about previously, the one set _after_
prepare that causes device_add() to fail, this one must have its setting
done within the same mutex/lock as device_add() internally uses to
protect the list. Or something along those lines.

Now, your idea of having the core set some other flag to tell drivers to
stop doing addition wouldn't work here because the driver would -not-
hold the core device list mutex between testing it and performing the
addition.

As a general matter, the driver would need to internally, using it's own
locking mechanisms, ensure that it will not do any further device_add,
and that's not something the core can help with. In our case, that could
be acheived by flushing the work queues and ensuring we don't queue
another one, or using some driver internal semaphore to set a "stop
adding" flag whatever...

However, I think this is mostly a non-issue, because the core _does_
provide something here that is useful for drivers who don't want to
bother with the above: The failure return from device_add. If drivers
don't want to do something akin to what I described, they can just be
made to deal gracefully with the failure from device_add that would
happen due to the core internal flag being set after the return from
prepare().

So I don't think we need anything else.

Cheers,
Ben.

2008-03-20 14:45:49

by Alan Stern

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/3] PM: Introduce new top level suspend and hibernation callbacks

On Thu, 20 Mar 2008, Benjamin Herrenschmidt wrote:

> > An important point I tried to make in the earlier email is that drivers
> > will want a simple way to know when it is illegal for them to register
> > new children. For example, suppose the registration is done by a
> > workqueue routine. The most reliable way for the driver to insure that
> > the routine won't try to register new children improperly is to have
> > the routine check a flag which gets set _before_ prepare() is called.
>
> I don't totally agree here. Drivers could have their own flag set
> internally with appropriate locking. The problem with your approach is
> locking. Just setting a flag is mostly useless, -unless- there
> appropriate locking between setter and testers.

I didn't mention locking because it seemed obvious that locking is
needed. Of course a flag alone is mostly useless.

But we already _do_ have a lock (dev->sem) and we _don't_ have a flag.
I was pointing out that some drivers will want to have a flag added.
Maybe so many drivers will want it that the flag should be added to the
PM structure, where it will be available for every device. If only a
few drivers want this new flag, then yes, they can put it in their
private data structures.

> However, I think this is mostly a non-issue, because the core _does_
> provide something here that is useful for drivers who don't want to
> bother with the above: The failure return from device_add. If drivers
> don't want to do something akin to what I described, they can just be
> made to deal gracefully with the failure from device_add that would
> happen due to the core internal flag being set after the return from
> prepare().

Relying on a registration failure to handle this sort of thing is _not_
a good idea. For example, how would the driver distinguish failure
caused by impending suspend from other sorts of failure?

Alan Stern

2008-03-20 18:26:45

by Alan Stern

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/3] PM: Introduce new top level suspend and hibernation callbacks

On Thu, 20 Mar 2008, Benjamin Herrenschmidt wrote:

> > An important point I tried to make in the earlier email is that drivers
> > will want a simple way to know when it is illegal for them to register
> > new children. For example, suppose the registration is done by a
> > workqueue routine. The most reliable way for the driver to insure that
> > the routine won't try to register new children improperly is to have
> > the routine check a flag which gets set _before_ prepare() is called.
>
> I don't totally agree here. Drivers could have their own flag set
> internally with appropriate locking. The problem with your approach is
> locking. Just setting a flag is mostly useless, -unless- there
> appropriate locking between setter and testers.

Here's an example of what I mean.

One of the things we don't want to do is bind a new driver to a device
after it has gone through the prepare() stage. Doing so would involve
calling the driver's probe() routine, which is likely to want to
register new children and who knows what else. The probe routine might
even end up running after the device was suspended! Clearly this
should be avoided.

But the user can force a binding to occur by writing the device's path
to the driver's "bind" attribute in sysfs. This means that
driver_bind() in drivers/base/bus.c will need to know whether or not
the device has gone through the prepare() stage, which means the device
structure will need to have a flag set before prepare() is called (more
precisely, the flag must be set before dev->sem is released following
the call to prepare).

Either that or else driver_bind() must always block whenever a system
sleep is in progress. That would be easier -- but it's a lot like what
the freezer would do. Which would you prefer?

Alan Stern

2008-03-20 22:34:36

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/3] PM: Introduce new top level suspend and hibernation callbacks


On Thu, 2008-03-20 at 10:45 -0400, Alan Stern wrote:
>
> I didn't mention locking because it seemed obvious that locking is
> needed. Of course a flag alone is mostly useless.
>
> But we already _do_ have a lock (dev->sem) and we _don't_ have a flag.
> I was pointing out that some drivers will want to have a flag added.
> Maybe so many drivers will want it that the flag should be added to the
> PM structure, where it will be available for every device. If only a
> few drivers want this new flag, then yes, they can put it in their
> private data structures.

What I'm saying is that a flag set prior to prepare() would be almost
impossible to get right in term of locking for the reasons I explained.

I think this should be under driver reponsibility, using it's own
internal locking. Or else, drivers can rely on failure.

> > However, I think this is mostly a non-issue, because the core _does_
> > provide something here that is useful for drivers who don't want to
> > bother with the above: The failure return from device_add. If drivers
> > don't want to do something akin to what I described, they can just be
> > made to deal gracefully with the failure from device_add that would
> > happen due to the core internal flag being set after the return from
> > prepare().
>
> Relying on a registration failure to handle this sort of thing is _not_
> a good idea. For example, how would the driver distinguish failure
> caused by impending suspend from other sorts of failure?

By the precise error code of course. But as I said, I would expect most
bus drivers to have their own proper implementation that does the right
thing, it's not like it was hard and there aren't that many bus drivers.
Relying on failure is a possibility for simple ones that don't want to
care much, in which case, there is little need to differenciate the
failure caused by suspend from other failures other than maybe printing
some garbage in the log vs. not :-)

Ben.

2008-03-20 22:36:19

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/3] PM: Introduce new top level suspend and hibernation callbacks


On Thu, 2008-03-20 at 14:26 -0400, Alan Stern wrote:
>
> One of the things we don't want to do is bind a new driver to a device
> after it has gone through the prepare() stage. Doing so would involve
> calling the driver's probe() routine, which is likely to want to
> register new children and who knows what else. The probe routine might
> even end up running after the device was suspended! Clearly this
> should be avoided.
>
> But the user can force a binding to occur by writing the device's path
> to the driver's "bind" attribute in sysfs. This means that
> driver_bind() in drivers/base/bus.c will need to know whether or not
> the device has gone through the prepare() stage, which means the device
> structure will need to have a flag set before prepare() is called (more
> precisely, the flag must be set before dev->sem is released following
> the call to prepare).
>
> Either that or else driver_bind() must always block whenever a system
> sleep is in progress. That would be easier -- but it's a lot like what
> the freezer would do. Which would you prefer?

I don't fully understand what you are saying here.

You say "bind a new driver to a device after it has gone through the
prepare() stage"

So either there was already a driver bound to that device, which got a
prepare() callback, and in which case bind() doesn't mean much. Or there
was not, in which case there was no prepare() call involved.

I suppose if you prepare(), then unbind(), then something tries to
bind()... But wouldn't the core flag set after prepare() be plenty
enough to shield against that ?

I fail to see the race you are talking about here.

Ben.

2008-03-20 22:57:23

by Alan Stern

[permalink] [raw]
Subject: Re: [RFC][PATCH 1/3] PM: Introduce new top level suspend and hibernation callbacks

On Fri, 21 Mar 2008, Benjamin Herrenschmidt wrote:

>
> On Thu, 2008-03-20 at 14:26 -0400, Alan Stern wrote:
> >
> > One of the things we don't want to do is bind a new driver to a device
> > after it has gone through the prepare() stage. Doing so would involve
> > calling the driver's probe() routine, which is likely to want to
> > register new children and who knows what else. The probe routine might
> > even end up running after the device was suspended! Clearly this
> > should be avoided.
> >
> > But the user can force a binding to occur by writing the device's path
> > to the driver's "bind" attribute in sysfs. This means that
> > driver_bind() in drivers/base/bus.c will need to know whether or not
> > the device has gone through the prepare() stage, which means the device
> > structure will need to have a flag set before prepare() is called (more
> > precisely, the flag must be set before dev->sem is released following
> > the call to prepare).
> >
> > Either that or else driver_bind() must always block whenever a system
> > sleep is in progress. That would be easier -- but it's a lot like what
> > the freezer would do. Which would you prefer?
>
> I don't fully understand what you are saying here.
>
> You say "bind a new driver to a device after it has gone through the
> prepare() stage"
>
> So either there was already a driver bound to that device, which got a
> prepare() callback, and in which case bind() doesn't mean much. Or there
> was not, in which case there was no prepare() call involved.

That's why I said "gone through the prepare() stage" instead of saying
"after prepare() was called". Yes, if there's no driver bound then
there's no method to call. But the PM core will still pass the device
to a subroutine whose job is to call the corresponding method, if one
exists.

> I suppose if you prepare(), then unbind(), then something tries to
> bind()... But wouldn't the core flag set after prepare() be plenty
> enough to shield against that ?
>
> I fail to see the race you are talking about here.

Let's say Task 0 is starting a sleep transition when Task 1 calls
driver_bind():

Task 0 Task 1
------ ------
down(&dev->sem) succeeds down(&dev->sem) blocks
Sees that dev has no driver
so there is no prepare
method to call
up(&dev->sem)
down() succeeds
Sees that dev->power.prepared
flag isn't set
Calls (drv->probe)(dev)
Sets dev->power.prepared
flag
drv tries to register a child
device and fails for no good
reason

One way to fix this is to set dev->power.prepared before releasing
dev->sem. Another way to fix it is to have driver_bind() block
(as though it were frozen) whenever a system sleep is underway.

Which do you prefer?

Alan Stern