2009-03-11 09:57:57

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH 0/10] PM: Rework suspend-resume ordering to avoid problems with shared interrupts (updated)

Hi,

The last iteration of this series of patches didn't draw comments except for
the discussion about the "wake-up" interrupts, so here's an update that I'd
like to consider as more-or-less final. I would also like to use a separate
'suspend' git tree for merging these patches, if you don't mind.

The following patches modifiy the way in which we handle disabling interrupts
during suspend and enabling them during resume. They also change the ordering
of the core suspend and hibernation code to take advantage of the new approach
to the interrupts and modify the PCI PM core to avoid a few problems.

Namely, interrupts are currently disabled on the boot CPU as soon as the
nonboot CPUs have been disabled, which doesn't allow device drivers' "late"
suspend and "early" resume callbacks to sleep. Among other things this means
they cannot execute ACPI AML routines, which leads to problems with
suspend-resume of PCI devices, as recently discussed.

1/10 modifies the [suspend|hibernation] and resume code, as well as the other
code using the device PM framework, so that device drivers will not receive
interrupts during the "late" suspend phase, although interrupts will only be
disabled on the CPU right before calling sysdev_suspend() (and analogously
during resume).

2/10 - 4/10 modify the suspend, hibernation and kexec jump code, respectively,
so that the "late" phase of suspending devices will happen before executing the
platform "prepare" callback and disabling nonboot CPUs (and analogously during
resume).

5/10 is a patch that's already in the PCI linux-next tree and I included it in
the series, because the next patches depend on it.

6/10 makes the PCI PM core use pci_set_power_state() to put devices into
D0 during early resume, which allows the platform-specific operations to be
carried out at that time, if necessary.

7/10 uses the opportunity to move pci_restore_standard_config() to pci-driver.c,
where it belongs IMO.

8/10 makes the PCI PM core code put devices into low power states during the
"late" phase of suspend which allows us to avoid a long-standing race related
to shared interrupts and to handle devices that require some platform-specific
operations to be put into low power states appropriately at the same time.
[The second rev of the patch retains the current behavior during the
"power-off" phase of hibernation, which is that the devices without drivers or
without PM support in the drivers are not power managed by the core.]

9/10 fixes pci_set_power_state() so that it doesn't return error code when
attempting to put a PCI device without PM support (either native or through the
platform) into D0 (such devices are always in D0).

10/10 makes the PCI PM core save and restore the configuration spaces of
devices that have no drivers or no PM support in the drivers during suspend and
resume, respectively.

Thanks,
Rafael


2009-03-11 09:58:18

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

From: Rafael J. Wysocki <[email protected]>

Introduce two helper functions allowing us to prevent device drivers
from getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume, respectively. These
functions make it possible to keep timer interrupts enabled while the
"late" suspend and "early" resume callbacks provided by device
drivers are being executed.

Use these functions to rework the handling of interrupts during
suspend (hibernation) and resume. Namely, interrupts will only be
disabled on the CPU right before suspending sysdevs, while device
drivers will be prevented from receiving interrupts, with the help of
the new helper function, before their "late" suspend callbacks run
(and analogously during resume).

In addition, since the device interrups are now disabled before the
CPU has turned all interrupts off and the CPU will ACK the interrupts
setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
any wake-up interrupts are pending and abort suspend if that's the
case.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
arch/x86/kernel/apm_32.c | 15 +++++--
drivers/base/power/main.c | 20 +++++-----
drivers/base/sys.c | 8 ++++
drivers/xen/manage.c | 16 ++++----
include/linux/interrupt.h | 5 ++
include/linux/irq.h | 1
kernel/irq/Makefile | 1
kernel/irq/internals.h | 1
kernel/irq/manage.c | 2 -
kernel/irq/pm.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++
kernel/kexec.c | 8 ++--
kernel/power/disk.c | 39 ++++++++++++++------
kernel/power/main.c | 17 +++++---
13 files changed, 181 insertions(+), 41 deletions(-)

Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -106,6 +106,11 @@ extern void disable_irq_nosync(unsigned
extern void disable_irq(unsigned int irq);
extern void enable_irq(unsigned int irq);

+/* The following three functions are for the core kernel use only. */
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+extern int check_wakeup_irqs(void);
+
#if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)

extern cpumask_var_t irq_default_affinity;
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -287,17 +287,19 @@ void __attribute__ ((weak)) arch_suspend
*/
static int suspend_enter(suspend_state_t state)
{
- int error = 0;
+ int error;

device_pm_lock();
- arch_suspend_disable_irqs();
- BUG_ON(!irqs_disabled());

- if ((error = device_power_down(PMSG_SUSPEND))) {
+ error = device_power_down(PMSG_SUSPEND);
+ if (error) {
printk(KERN_ERR "PM: Some devices failed to power down\n");
goto Done;
}

+ arch_suspend_disable_irqs();
+ BUG_ON(!irqs_disabled());
+
error = sysdev_suspend(PMSG_SUSPEND);
if (!error) {
if (!suspend_test(TEST_CORE))
@@ -305,11 +307,14 @@ static int suspend_enter(suspend_state_t
sysdev_resume();
}

- device_power_up(PMSG_RESUME);
- Done:
arch_suspend_enable_irqs();
BUG_ON(irqs_disabled());
+
+ device_power_up(PMSG_RESUME);
+
+ Done:
device_pm_unlock();
+
return error;
}

Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -214,7 +214,7 @@ static int create_image(int platform_mod
return error;

device_pm_lock();
- local_irq_disable();
+
/* At this point, device_suspend() has been called, but *not*
* device_power_down(). We *must* call device_power_down() now.
* Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -225,8 +225,11 @@ static int create_image(int platform_mod
if (error) {
printk(KERN_ERR "PM: Some devices failed to power down, "
"aborting hibernation\n");
- goto Enable_irqs;
+ goto Unlock;
}
+
+ local_irq_disable();
+
sysdev_suspend(PMSG_FREEZE);
if (error) {
printk(KERN_ERR "PM: Some devices failed to power down, "
@@ -252,12 +255,16 @@ static int create_image(int platform_mod
/* NOTE: device_power_up() is just a resume() for devices
* that suspended with irqs off ... no overall powerup.
*/
+
Power_up_devices:
+ local_irq_enable();
+
device_power_up(in_suspend ?
(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- Enable_irqs:
- local_irq_enable();
+
+ Unlock:
device_pm_unlock();
+
return error;
}

@@ -336,13 +343,16 @@ static int resume_target_kernel(void)
int error;

device_pm_lock();
- local_irq_disable();
+
error = device_power_down(PMSG_QUIESCE);
if (error) {
printk(KERN_ERR "PM: Some devices failed to power down, "
"aborting resume\n");
- goto Enable_irqs;
+ goto Unlock;
}
+
+ local_irq_disable();
+
sysdev_suspend(PMSG_QUIESCE);
/* We'll ignore saved state, but this gets preempt count (etc) right */
save_processor_state();
@@ -366,11 +376,16 @@ static int resume_target_kernel(void)
swsusp_free();
restore_processor_state();
touch_softlockup_watchdog();
+
sysdev_resume();
- device_power_up(PMSG_RECOVER);
- Enable_irqs:
+
local_irq_enable();
+
+ device_power_up(PMSG_RECOVER);
+
+ Unlock:
device_pm_unlock();
+
return error;
}

@@ -447,15 +462,16 @@ int hibernation_platform_enter(void)
goto Finish;

device_pm_lock();
- local_irq_disable();
+
error = device_power_down(PMSG_HIBERNATE);
if (!error) {
+ local_irq_disable();
sysdev_suspend(PMSG_HIBERNATE);
hibernation_ops->enter();
/* We should never get here */
while (1);
}
- local_irq_enable();
+
device_pm_unlock();

/*
@@ -464,12 +480,15 @@ int hibernation_platform_enter(void)
*/
Finish:
hibernation_ops->finish();
+
Resume_devices:
entering_platform_hibernation = false;
device_resume(PMSG_RESTORE);
resume_console();
+
Close:
hibernation_ops->end();
+
return error;
}

Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1190,8 +1190,10 @@ static int suspend(int vetoable)
struct apm_user *as;

device_suspend(PMSG_SUSPEND);
- local_irq_disable();
+
device_power_down(PMSG_SUSPEND);
+
+ local_irq_disable();
sysdev_suspend(PMSG_SUSPEND);

local_irq_enable();
@@ -1209,9 +1211,12 @@ static int suspend(int vetoable)
if (err != APM_SUCCESS)
apm_error("suspend", err);
err = (err == APM_SUCCESS) ? 0 : -EIO;
+
sysdev_resume();
- device_power_up(PMSG_RESUME);
local_irq_enable();
+
+ device_power_up(PMSG_RESUME);
+
device_resume(PMSG_RESUME);
queue_event(APM_NORMAL_RESUME, NULL);
spin_lock(&user_list_lock);
@@ -1228,8 +1233,9 @@ static void standby(void)
{
int err;

- local_irq_disable();
device_power_down(PMSG_SUSPEND);
+
+ local_irq_disable();
sysdev_suspend(PMSG_SUSPEND);
local_irq_enable();

@@ -1239,8 +1245,9 @@ static void standby(void)

local_irq_disable();
sysdev_resume();
- device_power_up(PMSG_RESUME);
local_irq_enable();
+
+ device_power_up(PMSG_RESUME);
}

static apm_event_t get_event(void)
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -39,12 +39,6 @@ static int xen_suspend(void *data)

BUG_ON(!irqs_disabled());

- err = device_power_down(PMSG_SUSPEND);
- if (err) {
- printk(KERN_ERR "xen_suspend: device_power_down failed: %d\n",
- err);
- return err;
- }
err = sysdev_suspend(PMSG_SUSPEND);
if (err) {
printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
@@ -69,7 +63,6 @@ static int xen_suspend(void *data)
xen_mm_unpin_all();

sysdev_resume();
- device_power_up(PMSG_RESUME);

if (!*cancelled) {
xen_irq_resume();
@@ -108,6 +101,12 @@ static void do_suspend(void)
/* XXX use normal device tree? */
xenbus_suspend();

+ err = device_power_down(PMSG_SUSPEND);
+ if (err) {
+ printk(KERN_ERR "device_power_down failed: %d\n", err);
+ goto resume_devices;
+ }
+
err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
if (err) {
printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
@@ -120,6 +119,9 @@ static void do_suspend(void)
} else
xenbus_suspend_cancel();

+ device_power_up(PMSG_RESUME);
+
+resume_devices:
device_resume(PMSG_RESUME);

/* Make sure timer events get retriggered on all CPUs */
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1454,7 +1454,6 @@ int kernel_kexec(void)
if (error)
goto Resume_devices;
device_pm_lock();
- local_irq_disable();
/* At this point, device_suspend() has been called,
* but *not* device_power_down(). We *must*
* device_power_down() now. Otherwise, drivers for
@@ -1464,8 +1463,9 @@ int kernel_kexec(void)
*/
error = device_power_down(PMSG_FREEZE);
if (error)
- goto Enable_irqs;
+ goto Unlock_pm;

+ local_irq_disable();
/* Suspend system devices */
error = sysdev_suspend(PMSG_FREEZE);
if (error)
@@ -1484,9 +1484,9 @@ int kernel_kexec(void)
if (kexec_image->preserve_context) {
sysdev_resume();
Power_up_devices:
- device_power_up(PMSG_RESTORE);
- Enable_irqs:
local_irq_enable();
+ device_power_up(PMSG_RESTORE);
+ Unlock_pm:
device_pm_unlock();
enable_nonboot_cpus();
Resume_devices:
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef void (*irq_flow_handler_t)(unsig
#define IRQ_SPURIOUS_DISABLED 0x00800000 /* IRQ was disabled by the spurious trap */
#define IRQ_MOVE_PCNTXT 0x01000000 /* IRQ migration from process context */
#define IRQ_AFFINITY_SET 0x02000000 /* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED 0x04000000 /* IRQ has gone through suspend sequence */

#ifdef CONFIG_IRQ_PER_CPU
# define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,89 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <[email protected]>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+#include "internals.h"
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+ struct irq_desc *desc;
+ int irq;
+
+ for_each_irq_desc(irq, desc) {
+ unsigned long flags;
+ bool sync = false;
+
+ spin_lock_irqsave(&desc->lock, flags);
+
+ if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
+ if (!desc->depth++) {
+ desc->status |= IRQ_DISABLED;
+ desc->chip->disable(irq);
+ sync = true;
+ }
+ desc->status |= IRQ_SUSPENDED;
+ }
+
+ spin_unlock_irqrestore(&desc->lock, flags);
+
+ if (sync)
+ synchronize_irq(irq);
+ }
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+ struct irq_desc *desc;
+ int irq;
+
+ for_each_irq_desc(irq, desc) {
+ unsigned long flags;
+
+ if (!(desc->status & IRQ_SUSPENDED))
+ continue;
+
+ spin_lock_irqsave(&desc->lock, flags);
+ desc->status &= ~IRQ_SUSPENDED;
+ __enable_irq(desc, irq);
+ spin_unlock_irqrestore(&desc->lock, flags);
+ }
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+ struct irq_desc *desc;
+ int irq;
+
+ for_each_irq_desc(irq, desc)
+ if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+ return -EBUSY;
+
+ return 0;
+}
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -215,7 +215,7 @@ void disable_irq(unsigned int irq)
}
EXPORT_SYMBOL(disable_irq);

-static void __enable_irq(struct irq_desc *desc, unsigned int irq)
+void __enable_irq(struct irq_desc *desc, unsigned int irq)
{
switch (desc->depth) {
case 0:
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -23,6 +23,7 @@
#include <linux/pm.h>
#include <linux/resume-trace.h>
#include <linux/rwsem.h>
+#include <linux/interrupt.h>

#include "../base.h"
#include "power.h"
@@ -305,7 +306,8 @@ static int resume_device_noirq(struct de
* Execute the appropriate "noirq resume" callback for all devices marked
* as DPM_OFF_IRQ.
*
- * Must be called with interrupts disabled and only one CPU running.
+ * Must be called under dpm_list_mtx. Device drivers should not receive
+ * interrupts while it's being executed.
*/
static void dpm_power_up(pm_message_t state)
{
@@ -326,14 +328,13 @@ static void dpm_power_up(pm_message_t st
* device_power_up - Turn on all devices that need special attention.
* @state: PM transition of the system being carried out.
*
- * Power on system devices, then devices that required we shut them down
- * with interrupts disabled.
- *
- * Must be called with interrupts disabled.
+ * Call the "early" resume handlers and enable device drivers to receive
+ * interrupts.
*/
void device_power_up(pm_message_t state)
{
dpm_power_up(state);
+ resume_device_irqs();
}
EXPORT_SYMBOL_GPL(device_power_up);

@@ -558,16 +559,17 @@ static int suspend_device_noirq(struct d
* device_power_down - Shut down special devices.
* @state: PM transition of the system being carried out.
*
- * Power down devices that require interrupts to be disabled.
- * Then power down system devices.
+ * Prevent device drivers from receiving interrupts and call the "late"
+ * suspend handlers.
*
- * Must be called with interrupts disabled and only one CPU running.
+ * Must be called under dpm_list_mtx.
*/
int device_power_down(pm_message_t state)
{
struct device *dev;
int error = 0;

+ suspend_device_irqs();
list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
error = suspend_device_noirq(dev, state);
if (error) {
@@ -577,7 +579,7 @@ int device_power_down(pm_message_t state
dev->power.status = DPM_OFF_IRQ;
}
if (error)
- dpm_power_up(resume_event(state));
+ device_power_up(resume_event(state));
return error;
}
EXPORT_SYMBOL_GPL(device_power_down);
Index: linux-2.6/drivers/base/sys.c
===================================================================
--- linux-2.6.orig/drivers/base/sys.c
+++ linux-2.6/drivers/base/sys.c
@@ -22,6 +22,7 @@
#include <linux/pm.h>
#include <linux/device.h>
#include <linux/mutex.h>
+#include <linux/interrupt.h>

#include "base.h"

@@ -369,6 +370,13 @@ int sysdev_suspend(pm_message_t state)
struct sysdev_driver *drv, *err_drv;
int ret;

+ pr_debug("Checking wake-up interrupts\n");
+
+ /* Return error code if there are any wake-up interrupts pending */
+ ret = check_wakeup_irqs();
+ if (ret)
+ return ret;
+
pr_debug("Suspending System Devices\n");

list_for_each_entry_reverse(cls, &system_kset->list, kset.kobj.entry) {
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -12,6 +12,7 @@ extern void compat_irq_chip_set_default_

extern int __irq_set_trigger(struct irq_desc *desc, unsigned int irq,
unsigned long flags);
+extern void __enable_irq(struct irq_desc *desc, unsigned int irq);

extern struct lock_class_key irq_desc_lock_class;
extern void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr);

2009-03-11 09:58:50

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH 3/10] PM: Change hibernation code ordering

From: Rafael J. Wysocki <[email protected]>

Change the ordering of the hibernation core code so that the platform
"prepare" callbacks are executed and the nonboot CPUs are disabled
after calling device drivers' "late suspend" methods.

This change (along with the previous analogous change of the suspend
core code) will allow us to rework the PCI PM core so that the power
state of devices is changed in the "late" phase of suspend (and
analogously in the "early" phase of resume), which in turn will allow
us to avoid the race condition where a device using shared interrupts
is put into a low power state with interrupts enabled and then an
interrupt (for another device) comes in and confuses its driver.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
kernel/power/disk.c | 109 +++++++++++++++++++++++++++++-----------------------
1 file changed, 61 insertions(+), 48 deletions(-)

Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -228,13 +228,22 @@ static int create_image(int platform_mod
goto Unlock;
}

+ error = platform_pre_snapshot(platform_mode);
+ if (error || hibernation_test(TEST_PLATFORM))
+ goto Platform_finish;
+
+ error = disable_nonboot_cpus();
+ if (error || hibernation_test(TEST_CPUS)
+ || hibernation_testmode(HIBERNATION_TEST))
+ goto Enable_cpus;
+
local_irq_disable();

sysdev_suspend(PMSG_FREEZE);
if (error) {
printk(KERN_ERR "PM: Some devices failed to power down, "
"aborting hibernation\n");
- goto Power_up_devices;
+ goto Enable_irqs;
}

if (hibernation_test(TEST_CORE))
@@ -250,15 +259,22 @@ static int create_image(int platform_mod
restore_processor_state();
if (!in_suspend)
platform_leave(platform_mode);
+
Power_up:
sysdev_resume();
/* NOTE: device_power_up() is just a resume() for devices
* that suspended with irqs off ... no overall powerup.
*/

- Power_up_devices:
+ Enable_irqs:
local_irq_enable();

+ Enable_cpus:
+ enable_nonboot_cpus();
+
+ Platform_finish:
+ platform_finish(platform_mode);
+
device_power_up(in_suspend ?
(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);

@@ -298,25 +314,9 @@ int hibernation_snapshot(int platform_mo
if (hibernation_test(TEST_DEVICES))
goto Recover_platform;

- error = platform_pre_snapshot(platform_mode);
- if (error || hibernation_test(TEST_PLATFORM))
- goto Finish;
-
- error = disable_nonboot_cpus();
- if (!error) {
- if (hibernation_test(TEST_CPUS))
- goto Enable_cpus;
-
- if (hibernation_testmode(HIBERNATION_TEST))
- goto Enable_cpus;
+ error = create_image(platform_mode);
+ /* Control returns here after successful restore */

- error = create_image(platform_mode);
- /* Control returns here after successful restore */
- }
- Enable_cpus:
- enable_nonboot_cpus();
- Finish:
- platform_finish(platform_mode);
Resume_devices:
device_resume(in_suspend ?
(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
@@ -338,7 +338,7 @@ int hibernation_snapshot(int platform_mo
* kernel.
*/

-static int resume_target_kernel(void)
+static int resume_target_kernel(bool platform_mode)
{
int error;

@@ -351,9 +351,20 @@ static int resume_target_kernel(void)
goto Unlock;
}

+ error = platform_pre_restore(platform_mode);
+ if (error)
+ goto Cleanup;
+
+ error = disable_nonboot_cpus();
+ if (error)
+ goto Enable_cpus;
+
local_irq_disable();

- sysdev_suspend(PMSG_QUIESCE);
+ error = sysdev_suspend(PMSG_QUIESCE);
+ if (error)
+ goto Enable_irqs;
+
/* We'll ignore saved state, but this gets preempt count (etc) right */
save_processor_state();
error = restore_highmem();
@@ -379,8 +390,15 @@ static int resume_target_kernel(void)

sysdev_resume();

+ Enable_irqs:
local_irq_enable();

+ Enable_cpus:
+ enable_nonboot_cpus();
+
+ Cleanup:
+ platform_restore_cleanup(platform_mode);
+
device_power_up(PMSG_RECOVER);

Unlock:
@@ -405,19 +423,10 @@ int hibernation_restore(int platform_mod
pm_prepare_console();
suspend_console();
error = device_suspend(PMSG_QUIESCE);
- if (error)
- goto Finish;
-
- error = platform_pre_restore(platform_mode);
if (!error) {
- error = disable_nonboot_cpus();
- if (!error)
- error = resume_target_kernel();
- enable_nonboot_cpus();
+ error = resume_target_kernel(platform_mode);
+ device_resume(PMSG_RECOVER);
}
- platform_restore_cleanup(platform_mode);
- device_resume(PMSG_RECOVER);
- Finish:
resume_console();
pm_restore_console();
return error;
@@ -453,34 +462,38 @@ int hibernation_platform_enter(void)
goto Resume_devices;
}

+ device_pm_lock();
+
+ error = device_power_down(PMSG_HIBERNATE);
+ if (error)
+ goto Unlock;
+
error = hibernation_ops->prepare();
if (error)
- goto Resume_devices;
+ goto Platofrm_finish;

error = disable_nonboot_cpus();
if (error)
- goto Finish;
-
- device_pm_lock();
-
- error = device_power_down(PMSG_HIBERNATE);
- if (!error) {
- local_irq_disable();
- sysdev_suspend(PMSG_HIBERNATE);
- hibernation_ops->enter();
- /* We should never get here */
- while (1);
- }
+ goto Platofrm_finish;

- device_pm_unlock();
+ local_irq_disable();
+ sysdev_suspend(PMSG_HIBERNATE);
+ hibernation_ops->enter();
+ /* We should never get here */
+ while (1);

/*
* We don't need to reenable the nonboot CPUs or resume consoles, since
* the system is going to be halted anyway.
*/
- Finish:
+ Platofrm_finish:
hibernation_ops->finish();

+ device_power_up(PMSG_RESTORE);
+
+ Unlock:
+ device_pm_unlock();
+
Resume_devices:
entering_platform_hibernation = false;
device_resume(PMSG_RESTORE);

2009-03-11 09:58:36

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH 2/10] PM: Change suspend code ordering

From: Rafael J. Wysocki <[email protected]>

Change the ordering of the suspend core code so that the platform
"prepare" callback is executed and the nonboot CPUs are disabled
after calling device drivers' "late suspend" methods.

This change will allow us to rework the PCI PM core so that the power
state of devices is changed in the "late" phase of suspend (and
analogously in the "early" phase of resume), which in turn will allow
us to avoid the race condition where a device using shared interrupts
is put into a low power state with interrupts enabled and then an
interrupt (for another device) comes in and confuses its driver.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
kernel/power/main.c | 38 ++++++++++++++++++++++----------------
1 file changed, 22 insertions(+), 16 deletions(-)

Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -297,6 +297,19 @@ static int suspend_enter(suspend_state_t
goto Done;
}

+ if (suspend_ops->prepare) {
+ error = suspend_ops->prepare();
+ if (error)
+ goto Power_up_devices;
+ }
+
+ if (suspend_test(TEST_PLATFORM))
+ goto Platfrom_finish;
+
+ error = disable_nonboot_cpus();
+ if (error || suspend_test(TEST_CPUS))
+ goto Enable_cpus;
+
arch_suspend_disable_irqs();
BUG_ON(!irqs_disabled());

@@ -310,6 +323,14 @@ static int suspend_enter(suspend_state_t
arch_suspend_enable_irqs();
BUG_ON(irqs_disabled());

+ Enable_cpus:
+ enable_nonboot_cpus();
+
+ Platfrom_finish:
+ if (suspend_ops->finish)
+ suspend_ops->finish();
+
+ Power_up_devices:
device_power_up(PMSG_RESUME);

Done:
@@ -346,23 +367,8 @@ int suspend_devices_and_enter(suspend_st
if (suspend_test(TEST_DEVICES))
goto Recover_platform;

- if (suspend_ops->prepare) {
- error = suspend_ops->prepare();
- if (error)
- goto Resume_devices;
- }
-
- if (suspend_test(TEST_PLATFORM))
- goto Finish;
+ suspend_enter(state);

- error = disable_nonboot_cpus();
- if (!error && !suspend_test(TEST_CPUS))
- suspend_enter(state);
-
- enable_nonboot_cpus();
- Finish:
- if (suspend_ops->finish)
- suspend_ops->finish();
Resume_devices:
suspend_test_start();
device_resume(PMSG_RESUME);

2009-03-11 09:59:11

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH 4/10] kexec: Change kexec jump code ordering

From: Rafael J. Wysocki <[email protected]>

Change the ordering of the kexec jump code so that the nonboot CPUs
are disabled after calling device drivers' "late suspend" methods.

This change reflects the recent modifications of the power management
code that is also used by kexec jump.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
kernel/kexec.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)

Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1450,9 +1450,6 @@ int kernel_kexec(void)
error = device_suspend(PMSG_FREEZE);
if (error)
goto Resume_console;
- error = disable_nonboot_cpus();
- if (error)
- goto Resume_devices;
device_pm_lock();
/* At this point, device_suspend() has been called,
* but *not* device_power_down(). We *must*
@@ -1463,13 +1460,15 @@ int kernel_kexec(void)
*/
error = device_power_down(PMSG_FREEZE);
if (error)
- goto Unlock_pm;
-
+ goto Resume_devices;
+ error = disable_nonboot_cpus();
+ if (error)
+ goto Enable_cpus;
local_irq_disable();
/* Suspend system devices */
error = sysdev_suspend(PMSG_FREEZE);
if (error)
- goto Power_up_devices;
+ goto Enable_irqs;
} else
#endif
{
@@ -1483,13 +1482,13 @@ int kernel_kexec(void)
#ifdef CONFIG_KEXEC_JUMP
if (kexec_image->preserve_context) {
sysdev_resume();
- Power_up_devices:
+ Enable_irqs:
local_irq_enable();
- device_power_up(PMSG_RESTORE);
- Unlock_pm:
- device_pm_unlock();
+ Enable_cpus:
enable_nonboot_cpus();
+ device_power_up(PMSG_RESTORE);
Resume_devices:
+ device_pm_unlock();
device_resume(PMSG_RESTORE);
Resume_console:
resume_console();

2009-03-11 09:59:51

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH 6/10] PCI PM: Use pci_set_power_state during early resume

From: Rafael J. Wysocki <[email protected]>

Once we have allowed timer interrupts to be enabled during the early
phase of resuming devices, we are now able to use the generic
pci_set_power_state() to put PCI devices into D0 at that time. Then,
the platform-specific PM code will have a chance to handle devices
that don't implement the native PCI PM or that require some
additional, platform-specific operations to be carried out to power
them up. Also, by doing this we can simplify the code quite a bit.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/pci/pci.c | 48 +++++++++---------------------------------------
1 file changed, 9 insertions(+), 39 deletions(-)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -426,7 +426,6 @@ static inline int platform_pci_sleep_wak
* given PCI device
* @dev: PCI device to handle.
* @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
- * @wait: If 'true', wait for the device to change its power state
*
* RETURN VALUE:
* -EINVAL if the requested state is invalid.
@@ -435,8 +434,7 @@ static inline int platform_pci_sleep_wak
* 0 if device already is in the requested state.
* 0 if device's power state has been successfully changed.
*/
-static int
-pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state, bool wait)
+static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
{
u16 pmcsr;
bool need_restore = false;
@@ -481,10 +479,8 @@ pci_raw_set_power_state(struct pci_dev *
break;
case PCI_UNKNOWN: /* Boot-up */
if ((pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D3hot
- && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET)) {
+ && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET))
need_restore = true;
- wait = true;
- }
/* Fall-through: force to D0 */
default:
pmcsr = 0;
@@ -494,9 +490,6 @@ pci_raw_set_power_state(struct pci_dev *
/* enter specified state */
pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);

- if (!wait)
- return 0;
-
/* Mandatory power management transition delays */
/* see PCI PM 1.1 5.6.1 table 18 */
if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
@@ -521,7 +514,7 @@ pci_raw_set_power_state(struct pci_dev *
if (need_restore)
pci_restore_bars(dev);

- if (wait && dev->bus->self)
+ if (dev->bus->self)
pcie_aspm_pm_state_change(dev->bus->self);

return 0;
@@ -591,7 +584,7 @@ int pci_set_power_state(struct pci_dev *
if (state == PCI_D3hot && (dev->dev_flags & PCI_DEV_FLAGS_NO_D3))
return 0;

- error = pci_raw_set_power_state(dev, state, true);
+ error = pci_raw_set_power_state(dev, state);

if (state > PCI_D0 && platform_pci_power_manageable(dev)) {
/* Allow the platform to finalize the transition */
@@ -1390,37 +1383,14 @@ void pci_allocate_cap_save_buffers(struc
*/
int pci_restore_standard_config(struct pci_dev *dev)
{
- pci_power_t prev_state;
- int error;
-
- pci_update_current_state(dev, PCI_D0);
-
- prev_state = dev->current_state;
- if (prev_state == PCI_D0)
- goto Restore;
-
- error = pci_raw_set_power_state(dev, PCI_D0, false);
- if (error)
- return error;
+ pci_update_current_state(dev, PCI_UNKNOWN);

- /*
- * This assumes that we won't get a bus in B2 or B3 from the BIOS, but
- * we've made this assumption forever and it appears to be universally
- * satisfied.
- */
- switch(prev_state) {
- case PCI_D3cold:
- case PCI_D3hot:
- mdelay(pci_pm_d3_delay);
- break;
- case PCI_D2:
- udelay(PCI_PM_D2_DELAY);
- break;
+ if (dev->current_state != PCI_D0) {
+ int error = pci_set_power_state(dev, PCI_D0);
+ if (error)
+ return error;
}

- pci_update_current_state(dev, PCI_D0);
-
- Restore:
return dev->state_saved ? pci_restore_state(dev) : 0;
}

2009-03-11 09:59:35

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH 5/10] PCI PM: Consistently use variable name "error" for pm call return values

From: Frans Pop <[email protected]>

I noticed two functions use a variable "i" to store the return value of PM
function calls while the rest of the file uses "error". As "i" normally
indicates a counter of some sort it seems better to keep this consistent.

Signed-off-by: Frans Pop <[email protected]>
Signed-off-by: Jesse Barnes <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/pci/pci-driver.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -352,17 +352,17 @@ static int pci_legacy_suspend(struct dev
{
struct pci_dev * pci_dev = to_pci_dev(dev);
struct pci_driver * drv = pci_dev->driver;
- int i = 0;
+ int error = 0;

if (drv && drv->suspend) {
pci_power_t prev = pci_dev->current_state;

pci_dev->state_saved = false;

- i = drv->suspend(pci_dev, state);
- suspend_report_result(drv->suspend, i);
- if (i)
- return i;
+ error = drv->suspend(pci_dev, state);
+ suspend_report_result(drv->suspend, error);
+ if (error)
+ return error;

if (pci_dev->state_saved)
goto Fixup;
@@ -385,20 +385,20 @@ static int pci_legacy_suspend(struct dev
Fixup:
pci_fixup_device(pci_fixup_suspend, pci_dev);

- return i;
+ return error;
}

static int pci_legacy_suspend_late(struct device *dev, pm_message_t state)
{
struct pci_dev * pci_dev = to_pci_dev(dev);
struct pci_driver * drv = pci_dev->driver;
- int i = 0;
+ int error = 0;

if (drv && drv->suspend_late) {
- i = drv->suspend_late(pci_dev, state);
- suspend_report_result(drv->suspend_late, i);
+ error = drv->suspend_late(pci_dev, state);
+ suspend_report_result(drv->suspend_late, error);
}
- return i;
+ return error;
}

static int pci_legacy_resume_early(struct device *dev)

2009-03-11 10:00:15

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH 7/10] PCI PM: Move pci_restore_standard_config to pci-driver.c

From: Rafael J. Wysocki <[email protected]>

Move pci_restore_standard_config() from pci.c to pci-driver.c and
make it static.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/pci/pci-driver.c | 17 +++++++++++++++++
drivers/pci/pci.c | 21 ---------------------
drivers/pci/pci.h | 1 -
3 files changed, 17 insertions(+), 22 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -423,6 +423,23 @@ static int pci_legacy_resume(struct devi

/* Auxiliary functions used by the new power management framework */

+/**
+ * pci_restore_standard_config - restore standard config registers of PCI device
+ * @pci_dev: PCI device to handle
+ */
+static int pci_restore_standard_config(struct pci_dev *pci_dev)
+{
+ pci_update_current_state(pci_dev, PCI_UNKNOWN);
+
+ if (pci_dev->current_state != PCI_D0) {
+ int error = pci_set_power_state(pci_dev, PCI_D0);
+ if (error)
+ return error;
+ }
+
+ return pci_dev->state_saved ? pci_restore_state(pci_dev) : 0;
+}
+
static void pci_pm_default_resume_noirq(struct pci_dev *pci_dev)
{
pci_restore_standard_config(pci_dev);
Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1374,27 +1374,6 @@ void pci_allocate_cap_save_buffers(struc
}

/**
- * pci_restore_standard_config - restore standard config registers of PCI device
- * @dev: PCI device to handle
- *
- * This function assumes that the device's configuration space is accessible.
- * If the device needs to be powered up, the function will wait for it to
- * change the state.
- */
-int pci_restore_standard_config(struct pci_dev *dev)
-{
- pci_update_current_state(dev, PCI_UNKNOWN);
-
- if (dev->current_state != PCI_D0) {
- int error = pci_set_power_state(dev, PCI_D0);
- if (error)
- return error;
- }
-
- return dev->state_saved ? pci_restore_state(dev) : 0;
-}
-
-/**
* pci_enable_ari - enable ARI forwarding if hardware support it
* @dev: the PCI device
*/
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -49,7 +49,6 @@ extern void pci_disable_enabled_device(s
extern void pci_pm_init(struct pci_dev *dev);
extern void platform_pci_wakeup_init(struct pci_dev *dev);
extern void pci_allocate_cap_save_buffers(struct pci_dev *dev);
-extern int pci_restore_standard_config(struct pci_dev *dev);

static inline bool pci_is_bridge(struct pci_dev *pci_dev)
{

2009-03-11 10:01:25

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH 10/10] PCI PM: Restore config spaces of all devices during early resume

From: Rafael J. Wysocki <[email protected]>

At present the configuration spaces of PCI devices that have no drivers
or no PM support in the drivers (either legacy or through a pm object) are not
saved during suspend and, consequently, they are not restored during resume.
This generally may lead to the state of the system being slightly inconsistent
after the resume, so it's better to save and restore the configuration spaces
of these devices as well.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/pci/pci-driver.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -516,13 +516,13 @@ static int pci_pm_suspend(struct device
if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_suspend(dev, PMSG_SUSPEND);

+ pci_dev->state_saved = false;
+
if (!pm) {
pci_pm_default_suspend(pci_dev);
goto Fixup;
}

- pci_dev->state_saved = false;
-
if (pm->suspend) {
pci_power_t prev = pci_dev->current_state;
int error;
@@ -554,8 +554,10 @@ static int pci_pm_suspend_noirq(struct d
if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_suspend_late(dev, PMSG_SUSPEND);

- if (!pm)
+ if (!pm) {
+ pci_save_state(pci_dev);
return 0;
+ }

if (pm->suspend_noirq) {
pci_power_t prev = pci_dev->current_state;
@@ -650,13 +652,13 @@ static int pci_pm_freeze(struct device *
if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_suspend(dev, PMSG_FREEZE);

+ pci_dev->state_saved = false;
+
if (!pm) {
pci_pm_default_suspend(pci_dev);
return 0;
}

- pci_dev->state_saved = false;
-
if (pm->freeze) {
int error;

@@ -738,13 +740,13 @@ static int pci_pm_poweroff(struct device
if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_suspend(dev, PMSG_HIBERNATE);

+ pci_dev->state_saved = false;
+
if (!pm) {
pci_pm_default_suspend(pci_dev);
goto Fixup;
}

- pci_dev->state_saved = false;
-
if (pm->poweroff) {
int error;

2009-03-11 10:00:41

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH 8/10] PCI PM: Put devices into low power states during late suspend (rev. 2)

From: Rafael J. Wysocki <[email protected]>

Once we have allowed timer interrupts to be enabled during the late
phase of suspending devices, we are now able to use the generic
pci_set_power_state() to put PCI devices into low power states at
that time. We can also use some related platform callbacks, like the
ones preparing devices for wake-up, during the late suspend.

Doing this will allow us to avoid the race condition where a device
using shared interrupts is put into a low power state with interrupts
enabled and then an interrupt (for another device) comes in and
confuses its driver. At the same time, devices that don't support
the native PCI PM or that require some additional, platform-specific
operations to be carried out to put them into low power states will
be handled as appropriate.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/pci/pci-driver.c | 134 ++++++++++++++++++++++++++++-------------------
1 file changed, 81 insertions(+), 53 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -352,53 +352,60 @@ static int pci_legacy_suspend(struct dev
{
struct pci_dev * pci_dev = to_pci_dev(dev);
struct pci_driver * drv = pci_dev->driver;
- int error = 0;
+
+ pci_dev->state_saved = false;

if (drv && drv->suspend) {
pci_power_t prev = pci_dev->current_state;
-
- pci_dev->state_saved = false;
+ int error;

error = drv->suspend(pci_dev, state);
suspend_report_result(drv->suspend, error);
if (error)
return error;

- if (pci_dev->state_saved)
- goto Fixup;
-
- if (pci_dev->current_state != PCI_D0
+ if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
&& pci_dev->current_state != PCI_UNKNOWN) {
WARN_ONCE(pci_dev->current_state != prev,
"PCI PM: Device state not saved by %pF\n",
drv->suspend);
- goto Fixup;
}
}

- pci_save_state(pci_dev);
- /*
- * This is for compatibility with existing code with legacy PM support.
- */
- pci_pm_set_unknown_state(pci_dev);
-
- Fixup:
pci_fixup_device(pci_fixup_suspend, pci_dev);

- return error;
+ return 0;
}

static int pci_legacy_suspend_late(struct device *dev, pm_message_t state)
{
struct pci_dev * pci_dev = to_pci_dev(dev);
struct pci_driver * drv = pci_dev->driver;
- int error = 0;

if (drv && drv->suspend_late) {
+ pci_power_t prev = pci_dev->current_state;
+ int error;
+
error = drv->suspend_late(pci_dev, state);
suspend_report_result(drv->suspend_late, error);
+ if (error)
+ return error;
+
+ if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
+ && pci_dev->current_state != PCI_UNKNOWN) {
+ WARN_ONCE(pci_dev->current_state != prev,
+ "PCI PM: Device state not saved by %pF\n",
+ drv->suspend_late);
+ return 0;
+ }
}
- return error;
+
+ if (!pci_dev->state_saved)
+ pci_save_state(pci_dev);
+
+ pci_pm_set_unknown_state(pci_dev);
+
+ return 0;
}

static int pci_legacy_resume_early(struct device *dev)
@@ -460,7 +467,6 @@ static void pci_pm_default_suspend(struc
/* Disable non-bridge devices without PM support */
if (!pci_is_bridge(pci_dev))
pci_disable_enabled_device(pci_dev);
- pci_save_state(pci_dev);
}

static bool pci_has_legacy_pm_support(struct pci_dev *pci_dev)
@@ -526,24 +532,14 @@ static int pci_pm_suspend(struct device
if (error)
return error;

- if (pci_dev->state_saved)
- goto Fixup;
-
- if (pci_dev->current_state != PCI_D0
+ if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
&& pci_dev->current_state != PCI_UNKNOWN) {
WARN_ONCE(pci_dev->current_state != prev,
"PCI PM: State of device not saved by %pF\n",
pm->suspend);
- goto Fixup;
}
}

- if (!pci_dev->state_saved) {
- pci_save_state(pci_dev);
- if (!pci_is_bridge(pci_dev))
- pci_prepare_to_sleep(pci_dev);
- }
-
Fixup:
pci_fixup_device(pci_fixup_suspend, pci_dev);

@@ -553,21 +549,41 @@ static int pci_pm_suspend(struct device
static int pci_pm_suspend_noirq(struct device *dev)
{
struct pci_dev *pci_dev = to_pci_dev(dev);
- struct device_driver *drv = dev->driver;
- int error = 0;
+ struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;

if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_suspend_late(dev, PMSG_SUSPEND);

- if (drv && drv->pm && drv->pm->suspend_noirq) {
- error = drv->pm->suspend_noirq(dev);
- suspend_report_result(drv->pm->suspend_noirq, error);
+ if (!pm)
+ return 0;
+
+ if (pm->suspend_noirq) {
+ pci_power_t prev = pci_dev->current_state;
+ int error;
+
+ error = pm->suspend_noirq(dev);
+ suspend_report_result(pm->suspend_noirq, error);
+ if (error)
+ return error;
+
+ if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
+ && pci_dev->current_state != PCI_UNKNOWN) {
+ WARN_ONCE(pci_dev->current_state != prev,
+ "PCI PM: State of device not saved by %pF\n",
+ pm->suspend_noirq);
+ return 0;
+ }
}

- if (!error)
- pci_pm_set_unknown_state(pci_dev);
+ if (!pci_dev->state_saved) {
+ pci_save_state(pci_dev);
+ if (!pci_is_bridge(pci_dev))
+ pci_prepare_to_sleep(pci_dev);
+ }

- return error;
+ pci_pm_set_unknown_state(pci_dev);
+
+ return 0;
}

static int pci_pm_resume_noirq(struct device *dev)
@@ -650,9 +666,6 @@ static int pci_pm_freeze(struct device *
return error;
}

- if (!pci_dev->state_saved)
- pci_save_state(pci_dev);
-
return 0;
}

@@ -660,20 +673,25 @@ static int pci_pm_freeze_noirq(struct de
{
struct pci_dev *pci_dev = to_pci_dev(dev);
struct device_driver *drv = dev->driver;
- int error = 0;

if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_suspend_late(dev, PMSG_FREEZE);

if (drv && drv->pm && drv->pm->freeze_noirq) {
+ int error;
+
error = drv->pm->freeze_noirq(dev);
suspend_report_result(drv->pm->freeze_noirq, error);
+ if (error)
+ return error;
}

- if (!error)
- pci_pm_set_unknown_state(pci_dev);
+ if (!pci_dev->state_saved)
+ pci_save_state(pci_dev);

- return error;
+ pci_pm_set_unknown_state(pci_dev);
+
+ return 0;
}

static int pci_pm_thaw_noirq(struct device *dev)
@@ -716,7 +734,6 @@ static int pci_pm_poweroff(struct device
{
struct pci_dev *pci_dev = to_pci_dev(dev);
struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
- int error = 0;

if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_suspend(dev, PMSG_HIBERNATE);
@@ -729,33 +746,44 @@ static int pci_pm_poweroff(struct device
pci_dev->state_saved = false;

if (pm->poweroff) {
+ int error;
+
error = pm->poweroff(dev);
suspend_report_result(pm->poweroff, error);
+ if (error)
+ return error;
}

- if (!pci_dev->state_saved && !pci_is_bridge(pci_dev))
- pci_prepare_to_sleep(pci_dev);
-
Fixup:
pci_fixup_device(pci_fixup_suspend, pci_dev);

- return error;
+ return 0;
}

static int pci_pm_poweroff_noirq(struct device *dev)
{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
struct device_driver *drv = dev->driver;
- int error = 0;

if (pci_has_legacy_pm_support(to_pci_dev(dev)))
return pci_legacy_suspend_late(dev, PMSG_HIBERNATE);

- if (drv && drv->pm && drv->pm->poweroff_noirq) {
+ if (!drv || !drv->pm)
+ return 0;
+
+ if (drv->pm->poweroff_noirq) {
+ int error;
+
error = drv->pm->poweroff_noirq(dev);
suspend_report_result(drv->pm->poweroff_noirq, error);
+ if (error)
+ return error;
}

- return error;
+ if (!pci_dev->state_saved && !pci_is_bridge(pci_dev))
+ pci_prepare_to_sleep(pci_dev);
+
+ return 0;
}

static int pci_pm_restore_noirq(struct device *dev)

2009-03-11 10:00:59

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH 9/10] PCI PM: Make pci_set_power_state() handle devices with no PM support

From: Rafael J. Wysocki <[email protected]>

There is a problem with PCI devices without any PM support (either native or
through the platform) that pci_set_power_state() always returns error code for
them, even if they are being put into D0. However, such devices are always in
D0, so pci_set_power_state() should return success when attempting to put such a
device into D0. It also should update the current_state field for these
devices as appropriate. This modification is necessary so that the standard
configuration registers of these devices are successfully restored by
pci_restore_standard_config() during the "early" phase of resume.

In addition, pci_set_power_state() should check the value of current_state
before calling the platform to change the power state of the device to avoid
doing that unnecessarily.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/pci/pci.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -439,6 +439,10 @@ static int pci_raw_set_power_state(struc
u16 pmcsr;
bool need_restore = false;

+ /* Check if we're already there */
+ if (dev->current_state == state)
+ return 0;
+
if (!dev->pm_cap)
return -EIO;

@@ -449,10 +453,7 @@ static int pci_raw_set_power_state(struc
* Can enter D0 from any state, but if we can only go deeper
* to sleep if we're already in a low power state
*/
- if (dev->current_state == state) {
- /* we're already there */
- return 0;
- } else if (state != PCI_D0 && dev->current_state <= PCI_D3cold
+ if (state != PCI_D0 && dev->current_state <= PCI_D3cold
&& dev->current_state > state) {
dev_err(&dev->dev, "invalid power transition "
"(from state %d to %d)\n", dev->current_state, state);
@@ -570,12 +571,17 @@ int pci_set_power_state(struct pci_dev *
*/
return 0;

- if (state == PCI_D0 && platform_pci_power_manageable(dev)) {
+ /* Check if we're already there */
+ if (dev->current_state == state)
+ return 0;
+
+ if (state == PCI_D0) {
/*
* Allow the platform to change the state, for example via ACPI
* _PR0, _PS0 and some such, but do not trust it.
*/
- int ret = platform_pci_set_power_state(dev, PCI_D0);
+ int ret = platform_pci_power_manageable(dev) ?
+ platform_pci_set_power_state(dev, PCI_D0) : 0;
if (!ret)
pci_update_current_state(dev, PCI_D0);
}

2009-03-11 10:40:25

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

Rafael,

On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:

> Index: linux-2.6/kernel/irq/pm.c
> ===================================================================
> --- /dev/null
> +++ linux-2.6/kernel/irq/pm.c
> @@ -0,0 +1,89 @@
> +/*
> + * linux/kernel/irq/pm.c
> + *
> + * Copyright (C) 2009 Rafael J. Wysocki <[email protected]>, Novell Inc.
> + *
> + * This file contains power management functions related to interrupts.
> + */
> +
> +#include <linux/irq.h>
> +#include <linux/module.h>
> +#include <linux/interrupt.h>
> +
> +#include "internals.h"
> +
> +/**
> + * suspend_device_irqs - disable all currently enabled interrupt lines
> + *
> + * During system-wide suspend or hibernation device interrupts need to be
> + * disabled at the chip level and this function is provided for this purpose.
> + * It disables all interrupt lines that are enabled at the moment and sets the
> + * IRQ_SUSPENDED flag for them.
> + */
> +void suspend_device_irqs(void)
> +{
> + struct irq_desc *desc;
> + int irq;
> +
> + for_each_irq_desc(irq, desc) {
> + unsigned long flags;
> + bool sync = false;
> +
> + spin_lock_irqsave(&desc->lock, flags);
> +
> + if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
> + if (!desc->depth++) {
> + desc->status |= IRQ_DISABLED;
> + desc->chip->disable(irq);
> + sync = true;
> + }
> + desc->status |= IRQ_SUSPENDED;

This flag needs to be checked in __enable_irq().

> + }
> +
> + spin_unlock_irqrestore(&desc->lock, flags);
> +
> + if (sync)
> + synchronize_irq(irq);
> + }
> +}
> +EXPORT_SYMBOL_GPL(suspend_device_irqs);

I'm not too enthusiastic about this open coded implementation of
disable_irq() with slightly different semantics.

Can we please move the fiddling with desc->* into
kernel/irq/manage.c and share the code there ?

> +/**
> + * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
> + *
> + * Enable all interrupt lines previously disabled by suspend_device_irqs() that
> + * have the IRQ_SUSPENDED flag set.
> + */
> +void resume_device_irqs(void)
> +{
> + struct irq_desc *desc;
> + int irq;
> +
> + for_each_irq_desc(irq, desc) {
> + unsigned long flags;
> +
> + if (!(desc->status & IRQ_SUSPENDED))
> + continue;
> +
> + spin_lock_irqsave(&desc->lock, flags);
> + desc->status &= ~IRQ_SUSPENDED;
> + __enable_irq(desc, irq);
> + spin_unlock_irqrestore(&desc->lock, flags);
> + }
> +}
> +EXPORT_SYMBOL_GPL(resume_device_irqs);

Ditto.

Thanks,

tglx

2009-03-11 20:59:42

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Wednesday 11 March 2009, Thomas Gleixner wrote:
> Rafael,
>
> On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
>
> > Index: linux-2.6/kernel/irq/pm.c
> > ===================================================================
> > --- /dev/null
> > +++ linux-2.6/kernel/irq/pm.c
> > @@ -0,0 +1,89 @@
> > +/*
> > + * linux/kernel/irq/pm.c
> > + *
> > + * Copyright (C) 2009 Rafael J. Wysocki <[email protected]>, Novell Inc.
> > + *
> > + * This file contains power management functions related to interrupts.
> > + */
> > +
> > +#include <linux/irq.h>
> > +#include <linux/module.h>
> > +#include <linux/interrupt.h>
> > +
> > +#include "internals.h"
> > +
> > +/**
> > + * suspend_device_irqs - disable all currently enabled interrupt lines
> > + *
> > + * During system-wide suspend or hibernation device interrupts need to be
> > + * disabled at the chip level and this function is provided for this purpose.
> > + * It disables all interrupt lines that are enabled at the moment and sets the
> > + * IRQ_SUSPENDED flag for them.
> > + */
> > +void suspend_device_irqs(void)
> > +{
> > + struct irq_desc *desc;
> > + int irq;
> > +
> > + for_each_irq_desc(irq, desc) {
> > + unsigned long flags;
> > + bool sync = false;
> > +
> > + spin_lock_irqsave(&desc->lock, flags);
> > +
> > + if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
> > + if (!desc->depth++) {
> > + desc->status |= IRQ_DISABLED;
> > + desc->chip->disable(irq);
> > + sync = true;
> > + }
> > + desc->status |= IRQ_SUSPENDED;
>
> This flag needs to be checked in __enable_irq().
>
> > + }
> > +
> > + spin_unlock_irqrestore(&desc->lock, flags);
> > +
> > + if (sync)
> > + synchronize_irq(irq);
> > + }
> > +}
> > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
>
> I'm not too enthusiastic about this open coded implementation of
> disable_irq() with slightly different semantics.

The difference in semantics is important IMO, otherwise I woulndn't have
done that. In particular, IMO, the condition should be under the spinlock IMO
and I'd rather not synchronize all interrupts we don't really disable here.

> Can we please move the fiddling with desc->* into
> kernel/irq/manage.c and share the code there ?

Can you please discuss that with Ingo? I moved that from manage.c at his
request.

Thanks,
Rafael

2009-03-11 21:15:28

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Wednesday 11 March 2009, Thomas Gleixner wrote:
> Rafael,
>
> On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
>
> > Index: linux-2.6/kernel/irq/pm.c
> > ===================================================================
> > --- /dev/null
> > +++ linux-2.6/kernel/irq/pm.c
> > @@ -0,0 +1,89 @@
> > +/*
> > + * linux/kernel/irq/pm.c
> > + *
> > + * Copyright (C) 2009 Rafael J. Wysocki <[email protected]>, Novell Inc.
> > + *
> > + * This file contains power management functions related to interrupts.
> > + */
> > +
> > +#include <linux/irq.h>
> > +#include <linux/module.h>
> > +#include <linux/interrupt.h>
> > +
> > +#include "internals.h"
> > +
> > +/**
> > + * suspend_device_irqs - disable all currently enabled interrupt lines
> > + *
> > + * During system-wide suspend or hibernation device interrupts need to be
> > + * disabled at the chip level and this function is provided for this purpose.
> > + * It disables all interrupt lines that are enabled at the moment and sets the
> > + * IRQ_SUSPENDED flag for them.
> > + */
> > +void suspend_device_irqs(void)
> > +{
> > + struct irq_desc *desc;
> > + int irq;
> > +
> > + for_each_irq_desc(irq, desc) {
> > + unsigned long flags;
> > + bool sync = false;
> > +
> > + spin_lock_irqsave(&desc->lock, flags);
> > +
> > + if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
> > + if (!desc->depth++) {
> > + desc->status |= IRQ_DISABLED;
> > + desc->chip->disable(irq);
> > + sync = true;
> > + }
> > + desc->status |= IRQ_SUSPENDED;
>
> This flag needs to be checked in __enable_irq().

[I overlooked this comment, sorry.]

Why does it?

Rafael

2009-03-11 21:38:17

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > + desc->status |= IRQ_SUSPENDED;
> >
> > This flag needs to be checked in __enable_irq().
>
> [I overlooked this comment, sorry.]
>
> Why does it?

To catch abuse and callers of enable_irq() when this flag is set.

Thanks,

tglx

2009-03-11 21:44:23

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> >
> > I'm not too enthusiastic about this open coded implementation of
> > disable_irq() with slightly different semantics.
>
> The difference in semantics is important IMO, otherwise I woulndn't have
> done that. In particular, IMO, the condition should be under the spinlock IMO
> and I'd rather not synchronize all interrupts we don't really disable here.

I don't say that the difference is not relevant. But the code is
almost the same and disable_irq() could have the sync_irq optimization
as well.

> > Can we please move the fiddling with desc->* into
> > kernel/irq/manage.c and share the code there ?
>
> Can you please discuss that with Ingo? I moved that from manage.c at his
> request.

Hmrpf. Will do. I just want to avoid that we have scattered functions
which deal with the guts of the irq code all over the place. I'm fine
with your loop in irq/pm.c, but the actual handling of the irq
internals should remain in manage.c.

I'll have a closer look how to solve this.

Thanks,

tglx

2009-03-11 21:50:31

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Wednesday 11 March 2009, Thomas Gleixner wrote:
> On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > + desc->status |= IRQ_SUSPENDED;
> > >
> > > This flag needs to be checked in __enable_irq().
> >
> > [I overlooked this comment, sorry.]
> >
> > Why does it?
>
> To catch abuse and callers of enable_irq() when this flag is set.

Hmm. This means you'd like to make enable_irq() fail if called with
IRQ_SUSPENDED set, correct?

What if someone calls irq_disable() and then irq_enable() between
suspend_device_irqs() and resume_device_irqs()? That would be pointless, but
surely not a bug? Should irq_disable() also fail if IRQ_SUSPENDED is set?

Or should __enable_irq() only fail with IRQ_SUSPENDED set for desc->depth == 1?

Rafael

2009-03-11 21:55:49

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:

> On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > > + desc->status |= IRQ_SUSPENDED;
> > > >
> > > > This flag needs to be checked in __enable_irq().
> > >
> > > [I overlooked this comment, sorry.]
> > >
> > > Why does it?
> >
> > To catch abuse and callers of enable_irq() when this flag is set.
>
> Hmm. This means you'd like to make enable_irq() fail if called with
> IRQ_SUSPENDED set, correct?
>
> What if someone calls irq_disable() and then irq_enable() between
> suspend_device_irqs() and resume_device_irqs()? That would be pointless, but
> surely not a bug? Should irq_disable() also fail if IRQ_SUSPENDED is set?

I'm not worried about nested ones.

> Or should __enable_irq() only fail with IRQ_SUSPENDED set for desc->depth == 1?

At least it needs a WARN_ON() in that case. A very prominent one.

Thanks,

tglx

2009-03-11 22:03:40

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Wednesday 11 March 2009, Thomas Gleixner wrote:
> On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > >
> > > I'm not too enthusiastic about this open coded implementation of
> > > disable_irq() with slightly different semantics.
> >
> > The difference in semantics is important IMO, otherwise I woulndn't have
> > done that. In particular, IMO, the condition should be under the spinlock IMO
> > and I'd rather not synchronize all interrupts we don't really disable here.
>
> I don't say that the difference is not relevant. But the code is
> almost the same and disable_irq() could have the sync_irq optimization
> as well.

Agreed.

> > > Can we please move the fiddling with desc->* into
> > > kernel/irq/manage.c and share the code there ?
> >
> > Can you please discuss that with Ingo? I moved that from manage.c at his
> > request.
>
> Hmrpf. Will do. I just want to avoid that we have scattered functions
> which deal with the guts of the irq code all over the place.

I understand your concern, I'd prefer to avoid that too.

> I'm fine with your loop in irq/pm.c, but the actual handling of the irq
> internals should remain in manage.c.

Well, perhaps we can add a parameter to disable_irq_nosync() telling it not
to disable the interrupt if it's a timer one? Something like

void disable_irq_nosync(unsigned int irq, bool skip_timer) etc.?

Also, it could return a value meaning whether or not the interrupt has been
actually disabled.

> I'll have a closer look how to solve this.

Thanks!

Best,
Rafael

2009-03-11 22:07:28

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Wednesday 11 March 2009, Thomas Gleixner wrote:
> On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
>
> > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > > > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > > > + desc->status |= IRQ_SUSPENDED;
> > > > >
> > > > > This flag needs to be checked in __enable_irq().
> > > >
> > > > [I overlooked this comment, sorry.]
> > > >
> > > > Why does it?
> > >
> > > To catch abuse and callers of enable_irq() when this flag is set.
> >
> > Hmm. This means you'd like to make enable_irq() fail if called with
> > IRQ_SUSPENDED set, correct?
> >
> > What if someone calls irq_disable() and then irq_enable() between
> > suspend_device_irqs() and resume_device_irqs()? That would be pointless, but
> > surely not a bug? Should irq_disable() also fail if IRQ_SUSPENDED is set?
>
> I'm not worried about nested ones.
>
> > Or should __enable_irq() only fail with IRQ_SUSPENDED set for desc->depth == 1?
>
> At least it needs a WARN_ON() in that case. A very prominent one.

I'm going to make it fail and print a warning for desc->depth == 1if IRQ_SUSPENDED is set.
Hope that's fine with everyone.

Thanks,
Rafael

2009-03-11 22:07:46

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)



On Wed, 11 Mar 2009, Thomas Gleixner wrote:
>
> I'm not worried about nested ones.

Then you shouldn't be worried about IRQ_SUSPENDED at all, since that one
increments the disabled depth count.

So _all_ disable/enable_irq calls will by definition be nested inside
IRQ_SUSPENDED.

Linus

2009-03-11 22:13:57

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Wednesday 11 March 2009, Linus Torvalds wrote:
>
> On Wed, 11 Mar 2009, Thomas Gleixner wrote:
> >
> > I'm not worried about nested ones.
>
> Then you shouldn't be worried about IRQ_SUSPENDED at all, since that one
> increments the disabled depth count.
>
> So _all_ disable/enable_irq calls will by definition be nested inside
> IRQ_SUSPENDED.

Still, if there's an unbalanced irq_enable() between suspend_device_irqs()
and resume_device_irqs(), we'll not detect it immediately, but only in
resume_device_irqs(). It would be better if the unbalanced call failed in that
case IMHO.

Rafael

2009-03-11 22:28:25

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Wed, 11 Mar 2009, Linus Torvalds wrote:
> On Wed, 11 Mar 2009, Thomas Gleixner wrote:
> >
> > I'm not worried about nested ones.
>
> Then you shouldn't be worried about IRQ_SUSPENDED at all, since that one
> increments the disabled depth count.
>
> So _all_ disable/enable_irq calls will by definition be nested inside
> IRQ_SUSPENDED.

Right, if they are in disable -> enable order.

But the stupid stray enable will be visible either by wrecking the
suspend with hard to debug failures or trigger the depth check in the
resume code. I'm burned enough by the timer failures which pop up long
after the real bug happened.

Thanks,

tglx

2009-03-11 22:48:15

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Wed, 11 Mar 2009, Thomas Gleixner wrote:
> On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > >
> > > I'm not too enthusiastic about this open coded implementation of
> > > disable_irq() with slightly different semantics.
> >
> > The difference in semantics is important IMO, otherwise I woulndn't have
> > done that. In particular, IMO, the condition should be under the spinlock IMO
> > and I'd rather not synchronize all interrupts we don't really disable here.
>
> I don't say that the difference is not relevant. But the code is
> almost the same and disable_irq() could have the sync_irq optimization
> as well.

Thought more about that. Avoiding the sync_irq() for irqs which have
no action associated is fine, but you need to catch the following case
as well:

driver code calls disable_irq_nosyc() from the handler (which is
still running)

suspend code skips the sync due to depth > 0

The sync operation is not that expensive.

Thanks,

tglx

2009-03-12 13:36:55

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Wednesday 11 March 2009, Thomas Gleixner wrote:
> On Wed, 11 Mar 2009, Thomas Gleixner wrote:
> > On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > > >
> > > > I'm not too enthusiastic about this open coded implementation of
> > > > disable_irq() with slightly different semantics.
> > >
> > > The difference in semantics is important IMO, otherwise I woulndn't have
> > > done that. In particular, IMO, the condition should be under the spinlock IMO
> > > and I'd rather not synchronize all interrupts we don't really disable here.
> >
> > I don't say that the difference is not relevant. But the code is
> > almost the same and disable_irq() could have the sync_irq optimization
> > as well.
>
> Thought more about that. Avoiding the sync_irq() for irqs which have
> no action associated is fine, but you need to catch the following case
> as well:
>
> driver code calls disable_irq_nosyc() from the handler (which is
> still running)
>
> suspend code skips the sync due to depth > 0
>
> The sync operation is not that expensive.

OK, what about this (untested, irrelevant parts skipped)?

Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,79 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <[email protected]>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+#include "internals.h"
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+ struct irq_desc *desc;
+ int irq;
+
+ for_each_irq_desc(irq, desc) {
+ unsigned long flags;
+
+ spin_lock_irqsave(&desc->lock, flags);
+ __disable_irq(desc, irq, true);
+ spin_unlock_irqrestore(&desc->lock, flags);
+ }
+
+ for_each_irq_desc(irq, desc)
+ if (desc->status & IRQ_SUSPENDED)
+ synchronize_irq(irq);
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+ struct irq_desc *desc;
+ int irq;
+
+ for_each_irq_desc(irq, desc) {
+ unsigned long flags;
+
+ if (!(desc->status & IRQ_SUSPENDED))
+ continue;
+
+ spin_lock_irqsave(&desc->lock, flags);
+ __enable_irq(desc, irq, true);
+ spin_unlock_irqrestore(&desc->lock, flags);
+ }
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+ struct irq_desc *desc;
+ int irq;
+
+ for_each_irq_desc(irq, desc)
+ if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+ return -EBUSY;
+
+ return 0;
+}
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -162,6 +162,20 @@ static inline int do_irq_select_affinity
}
#endif

+void __disable_irq(struct irq_desc *desc, unsigned int irq, bool suspend)
+{
+ if (suspend) {
+ if (desc->action && (desc->action->flags & IRQF_TIMER))
+ return;
+ desc->status |= IRQ_SUSPENDED;
+ }
+
+ if (!desc->depth++) {
+ desc->status |= IRQ_DISABLED;
+ desc->chip->disable(irq);
+ }
+}
+
/**
* disable_irq_nosync - disable an irq without waiting
* @irq: Interrupt to disable
@@ -182,10 +196,7 @@ void disable_irq_nosync(unsigned int irq
return;

spin_lock_irqsave(&desc->lock, flags);
- if (!desc->depth++) {
- desc->status |= IRQ_DISABLED;
- desc->chip->disable(irq);
- }
+ __disable_irq(desc, irq, false);
spin_unlock_irqrestore(&desc->lock, flags);
}
EXPORT_SYMBOL(disable_irq_nosync);
@@ -215,15 +226,19 @@ void disable_irq(unsigned int irq)
}
EXPORT_SYMBOL(disable_irq);

-static void __enable_irq(struct irq_desc *desc, unsigned int irq)
+void __enable_irq(struct irq_desc *desc, unsigned int irq, bool resume)
{
+ if (resume)
+ desc->status &= ~IRQ_SUSPENDED;
+
switch (desc->depth) {
case 0:
- WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
- break;
+ goto err_out;
case 1: {
unsigned int status = desc->status & ~IRQ_DISABLED;

+ if (desc->status & IRQ_SUSPENDED)
+ goto err_out;
/* Prevent probing on this irq: */
desc->status = status | IRQ_NOPROBE;
check_irq_resend(desc, irq);
@@ -232,6 +247,11 @@ static void __enable_irq(struct irq_desc
default:
desc->depth--;
}
+
+ return;
+
+ err_out:
+ WARN(true, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
}

/**
@@ -253,7 +273,7 @@ void enable_irq(unsigned int irq)
return;

spin_lock_irqsave(&desc->lock, flags);
- __enable_irq(desc, irq);
+ __enable_irq(desc, irq, false);
spin_unlock_irqrestore(&desc->lock, flags);
}
EXPORT_SYMBOL(enable_irq);
@@ -511,7 +531,7 @@ __setup_irq(unsigned int irq, struct irq
*/
if (shared && (desc->status & IRQ_SPURIOUS_DISABLED)) {
desc->status &= ~IRQ_SPURIOUS_DISABLED;
- __enable_irq(desc, irq);
+ __enable_irq(desc, irq, false);
}

spin_unlock_irqrestore(&desc->lock, flags);
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -12,6 +12,8 @@ extern void compat_irq_chip_set_default_

extern int __irq_set_trigger(struct irq_desc *desc, unsigned int irq,
unsigned long flags);
+extern void __disable_irq(struct irq_desc *desc, unsigned int irq, bool susp);
+extern void __enable_irq(struct irq_desc *desc, unsigned int irq, bool resume);

extern struct lock_class_key irq_desc_lock_class;
extern void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr);

Thanks,
Rafael

2009-03-12 21:44:05

by Rafael J. Wysocki

[permalink] [raw]
Subject: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Thursday 12 March 2009, Rafael J. Wysocki wrote:
> On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > On Wed, 11 Mar 2009, Thomas Gleixner wrote:
> > > On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > > > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > > > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > > > >
> > > > > I'm not too enthusiastic about this open coded implementation of
> > > > > disable_irq() with slightly different semantics.
> > > >
> > > > The difference in semantics is important IMO, otherwise I woulndn't have
> > > > done that. In particular, IMO, the condition should be under the spinlock IMO
> > > > and I'd rather not synchronize all interrupts we don't really disable here.
> > >
> > > I don't say that the difference is not relevant. But the code is
> > > almost the same and disable_irq() could have the sync_irq optimization
> > > as well.
> >
> > Thought more about that. Avoiding the sync_irq() for irqs which have
> > no action associated is fine, but you need to catch the following case
> > as well:
> >
> > driver code calls disable_irq_nosyc() from the handler (which is
> > still running)
> >
> > suspend code skips the sync due to depth > 0
> >
> > The sync operation is not that expensive.
>
> OK, what about this (untested, irrelevant parts skipped)?

Well, I guess I need to assume that no reaction means it's fine. ;-)

Below is the complete patch. Thomas, Ingo, please let me know it it is fine
with you.

Thanks,
Rafael

---
From: Rafael J. Wysocki <[email protected]>
Subject: PM: Rework handling of interrupts during suspend-resume (rev. 6)

Introduce two helper functions allowing us to prevent device drivers
from getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume, respectively. These
functions make it possible to keep timer interrupts enabled while the
"late" suspend and "early" resume callbacks provided by device
drivers are being executed.

Use these functions to rework the handling of interrupts during
suspend (hibernation) and resume. Namely, interrupts will only be
disabled on the CPU right before suspending sysdevs, while device
drivers will be prevented from receiving interrupts, with the help of
the new helper function, before their "late" suspend callbacks run
(and analogously during resume).

In addition, since the device interrups are now disabled before the
CPU has turned all interrupts off and the CPU will ACK the interrupts
setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
any wake-up interrupts are pending and abort suspend if that's the
case.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
arch/x86/kernel/apm_32.c | 15 ++++++--
drivers/base/power/main.c | 20 ++++++-----
drivers/base/sys.c | 8 ++++
drivers/xen/manage.c | 16 +++++----
include/linux/interrupt.h | 5 ++
include/linux/irq.h | 1
kernel/irq/Makefile | 1
kernel/irq/internals.h | 2 +
kernel/irq/manage.c | 31 +++++++++++++-----
kernel/irq/pm.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++
kernel/kexec.c | 8 ++--
kernel/power/disk.c | 39 ++++++++++++++++------
kernel/power/main.c | 17 ++++++---
13 files changed, 195 insertions(+), 47 deletions(-)

Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -106,6 +106,11 @@ extern void disable_irq_nosync(unsigned
extern void disable_irq(unsigned int irq);
extern void enable_irq(unsigned int irq);

+/* The following three functions are for the core kernel use only. */
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+extern int check_wakeup_irqs(void);
+
#if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)

extern cpumask_var_t irq_default_affinity;
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -287,17 +287,19 @@ void __attribute__ ((weak)) arch_suspend
*/
static int suspend_enter(suspend_state_t state)
{
- int error = 0;
+ int error;

device_pm_lock();
- arch_suspend_disable_irqs();
- BUG_ON(!irqs_disabled());

- if ((error = device_power_down(PMSG_SUSPEND))) {
+ error = device_power_down(PMSG_SUSPEND);
+ if (error) {
printk(KERN_ERR "PM: Some devices failed to power down\n");
goto Done;
}

+ arch_suspend_disable_irqs();
+ BUG_ON(!irqs_disabled());
+
error = sysdev_suspend(PMSG_SUSPEND);
if (!error) {
if (!suspend_test(TEST_CORE))
@@ -305,11 +307,14 @@ static int suspend_enter(suspend_state_t
sysdev_resume();
}

- device_power_up(PMSG_RESUME);
- Done:
arch_suspend_enable_irqs();
BUG_ON(irqs_disabled());
+
+ device_power_up(PMSG_RESUME);
+
+ Done:
device_pm_unlock();
+
return error;
}

Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -214,7 +214,7 @@ static int create_image(int platform_mod
return error;

device_pm_lock();
- local_irq_disable();
+
/* At this point, device_suspend() has been called, but *not*
* device_power_down(). We *must* call device_power_down() now.
* Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -225,8 +225,11 @@ static int create_image(int platform_mod
if (error) {
printk(KERN_ERR "PM: Some devices failed to power down, "
"aborting hibernation\n");
- goto Enable_irqs;
+ goto Unlock;
}
+
+ local_irq_disable();
+
sysdev_suspend(PMSG_FREEZE);
if (error) {
printk(KERN_ERR "PM: Some devices failed to power down, "
@@ -252,12 +255,16 @@ static int create_image(int platform_mod
/* NOTE: device_power_up() is just a resume() for devices
* that suspended with irqs off ... no overall powerup.
*/
+
Power_up_devices:
+ local_irq_enable();
+
device_power_up(in_suspend ?
(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- Enable_irqs:
- local_irq_enable();
+
+ Unlock:
device_pm_unlock();
+
return error;
}

@@ -336,13 +343,16 @@ static int resume_target_kernel(void)
int error;

device_pm_lock();
- local_irq_disable();
+
error = device_power_down(PMSG_QUIESCE);
if (error) {
printk(KERN_ERR "PM: Some devices failed to power down, "
"aborting resume\n");
- goto Enable_irqs;
+ goto Unlock;
}
+
+ local_irq_disable();
+
sysdev_suspend(PMSG_QUIESCE);
/* We'll ignore saved state, but this gets preempt count (etc) right */
save_processor_state();
@@ -366,11 +376,16 @@ static int resume_target_kernel(void)
swsusp_free();
restore_processor_state();
touch_softlockup_watchdog();
+
sysdev_resume();
- device_power_up(PMSG_RECOVER);
- Enable_irqs:
+
local_irq_enable();
+
+ device_power_up(PMSG_RECOVER);
+
+ Unlock:
device_pm_unlock();
+
return error;
}

@@ -447,15 +462,16 @@ int hibernation_platform_enter(void)
goto Finish;

device_pm_lock();
- local_irq_disable();
+
error = device_power_down(PMSG_HIBERNATE);
if (!error) {
+ local_irq_disable();
sysdev_suspend(PMSG_HIBERNATE);
hibernation_ops->enter();
/* We should never get here */
while (1);
}
- local_irq_enable();
+
device_pm_unlock();

/*
@@ -464,12 +480,15 @@ int hibernation_platform_enter(void)
*/
Finish:
hibernation_ops->finish();
+
Resume_devices:
entering_platform_hibernation = false;
device_resume(PMSG_RESTORE);
resume_console();
+
Close:
hibernation_ops->end();
+
return error;
}

Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1190,8 +1190,10 @@ static int suspend(int vetoable)
struct apm_user *as;

device_suspend(PMSG_SUSPEND);
- local_irq_disable();
+
device_power_down(PMSG_SUSPEND);
+
+ local_irq_disable();
sysdev_suspend(PMSG_SUSPEND);

local_irq_enable();
@@ -1209,9 +1211,12 @@ static int suspend(int vetoable)
if (err != APM_SUCCESS)
apm_error("suspend", err);
err = (err == APM_SUCCESS) ? 0 : -EIO;
+
sysdev_resume();
- device_power_up(PMSG_RESUME);
local_irq_enable();
+
+ device_power_up(PMSG_RESUME);
+
device_resume(PMSG_RESUME);
queue_event(APM_NORMAL_RESUME, NULL);
spin_lock(&user_list_lock);
@@ -1228,8 +1233,9 @@ static void standby(void)
{
int err;

- local_irq_disable();
device_power_down(PMSG_SUSPEND);
+
+ local_irq_disable();
sysdev_suspend(PMSG_SUSPEND);
local_irq_enable();

@@ -1239,8 +1245,9 @@ static void standby(void)

local_irq_disable();
sysdev_resume();
- device_power_up(PMSG_RESUME);
local_irq_enable();
+
+ device_power_up(PMSG_RESUME);
}

static apm_event_t get_event(void)
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -39,12 +39,6 @@ static int xen_suspend(void *data)

BUG_ON(!irqs_disabled());

- err = device_power_down(PMSG_SUSPEND);
- if (err) {
- printk(KERN_ERR "xen_suspend: device_power_down failed: %d\n",
- err);
- return err;
- }
err = sysdev_suspend(PMSG_SUSPEND);
if (err) {
printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
@@ -69,7 +63,6 @@ static int xen_suspend(void *data)
xen_mm_unpin_all();

sysdev_resume();
- device_power_up(PMSG_RESUME);

if (!*cancelled) {
xen_irq_resume();
@@ -108,6 +101,12 @@ static void do_suspend(void)
/* XXX use normal device tree? */
xenbus_suspend();

+ err = device_power_down(PMSG_SUSPEND);
+ if (err) {
+ printk(KERN_ERR "device_power_down failed: %d\n", err);
+ goto resume_devices;
+ }
+
err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
if (err) {
printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
@@ -120,6 +119,9 @@ static void do_suspend(void)
} else
xenbus_suspend_cancel();

+ device_power_up(PMSG_RESUME);
+
+resume_devices:
device_resume(PMSG_RESUME);

/* Make sure timer events get retriggered on all CPUs */
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1454,7 +1454,6 @@ int kernel_kexec(void)
if (error)
goto Resume_devices;
device_pm_lock();
- local_irq_disable();
/* At this point, device_suspend() has been called,
* but *not* device_power_down(). We *must*
* device_power_down() now. Otherwise, drivers for
@@ -1464,8 +1463,9 @@ int kernel_kexec(void)
*/
error = device_power_down(PMSG_FREEZE);
if (error)
- goto Enable_irqs;
+ goto Unlock_pm;

+ local_irq_disable();
/* Suspend system devices */
error = sysdev_suspend(PMSG_FREEZE);
if (error)
@@ -1484,9 +1484,9 @@ int kernel_kexec(void)
if (kexec_image->preserve_context) {
sysdev_resume();
Power_up_devices:
- device_power_up(PMSG_RESTORE);
- Enable_irqs:
local_irq_enable();
+ device_power_up(PMSG_RESTORE);
+ Unlock_pm:
device_pm_unlock();
enable_nonboot_cpus();
Resume_devices:
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef void (*irq_flow_handler_t)(unsig
#define IRQ_SPURIOUS_DISABLED 0x00800000 /* IRQ was disabled by the spurious trap */
#define IRQ_MOVE_PCNTXT 0x01000000 /* IRQ migration from process context */
#define IRQ_AFFINITY_SET 0x02000000 /* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED 0x04000000 /* IRQ has gone through suspend sequence */

#ifdef CONFIG_IRQ_PER_CPU
# define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,79 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <[email protected]>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+#include "internals.h"
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+ struct irq_desc *desc;
+ int irq;
+
+ for_each_irq_desc(irq, desc) {
+ unsigned long flags;
+
+ spin_lock_irqsave(&desc->lock, flags);
+ __disable_irq(desc, irq, true);
+ spin_unlock_irqrestore(&desc->lock, flags);
+ }
+
+ for_each_irq_desc(irq, desc)
+ if (desc->status & IRQ_SUSPENDED)
+ synchronize_irq(irq);
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+ struct irq_desc *desc;
+ int irq;
+
+ for_each_irq_desc(irq, desc) {
+ unsigned long flags;
+
+ if (!(desc->status & IRQ_SUSPENDED))
+ continue;
+
+ spin_lock_irqsave(&desc->lock, flags);
+ __enable_irq(desc, irq, true);
+ spin_unlock_irqrestore(&desc->lock, flags);
+ }
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+ struct irq_desc *desc;
+ int irq;
+
+ for_each_irq_desc(irq, desc)
+ if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+ return -EBUSY;
+
+ return 0;
+}
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -162,6 +162,20 @@ static inline int do_irq_select_affinity
}
#endif

+void __disable_irq(struct irq_desc *desc, unsigned int irq, bool suspend)
+{
+ if (suspend) {
+ if (desc->action && (desc->action->flags & IRQF_TIMER))
+ return;
+ desc->status |= IRQ_SUSPENDED;
+ }
+
+ if (!desc->depth++) {
+ desc->status |= IRQ_DISABLED;
+ desc->chip->disable(irq);
+ }
+}
+
/**
* disable_irq_nosync - disable an irq without waiting
* @irq: Interrupt to disable
@@ -182,10 +196,7 @@ void disable_irq_nosync(unsigned int irq
return;

spin_lock_irqsave(&desc->lock, flags);
- if (!desc->depth++) {
- desc->status |= IRQ_DISABLED;
- desc->chip->disable(irq);
- }
+ __disable_irq(desc, irq, false);
spin_unlock_irqrestore(&desc->lock, flags);
}
EXPORT_SYMBOL(disable_irq_nosync);
@@ -215,15 +226,21 @@ void disable_irq(unsigned int irq)
}
EXPORT_SYMBOL(disable_irq);

-static void __enable_irq(struct irq_desc *desc, unsigned int irq)
+void __enable_irq(struct irq_desc *desc, unsigned int irq, bool resume)
{
+ if (resume)
+ desc->status &= ~IRQ_SUSPENDED;
+
switch (desc->depth) {
case 0:
+ err_out:
WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
break;
case 1: {
unsigned int status = desc->status & ~IRQ_DISABLED;

+ if (desc->status & IRQ_SUSPENDED)
+ goto err_out;
/* Prevent probing on this irq: */
desc->status = status | IRQ_NOPROBE;
check_irq_resend(desc, irq);
@@ -253,7 +270,7 @@ void enable_irq(unsigned int irq)
return;

spin_lock_irqsave(&desc->lock, flags);
- __enable_irq(desc, irq);
+ __enable_irq(desc, irq, false);
spin_unlock_irqrestore(&desc->lock, flags);
}
EXPORT_SYMBOL(enable_irq);
@@ -511,7 +528,7 @@ __setup_irq(unsigned int irq, struct irq
*/
if (shared && (desc->status & IRQ_SPURIOUS_DISABLED)) {
desc->status &= ~IRQ_SPURIOUS_DISABLED;
- __enable_irq(desc, irq);
+ __enable_irq(desc, irq, false);
}

spin_unlock_irqrestore(&desc->lock, flags);
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -23,6 +23,7 @@
#include <linux/pm.h>
#include <linux/resume-trace.h>
#include <linux/rwsem.h>
+#include <linux/interrupt.h>

#include "../base.h"
#include "power.h"
@@ -305,7 +306,8 @@ static int resume_device_noirq(struct de
* Execute the appropriate "noirq resume" callback for all devices marked
* as DPM_OFF_IRQ.
*
- * Must be called with interrupts disabled and only one CPU running.
+ * Must be called under dpm_list_mtx. Device drivers should not receive
+ * interrupts while it's being executed.
*/
static void dpm_power_up(pm_message_t state)
{
@@ -326,14 +328,13 @@ static void dpm_power_up(pm_message_t st
* device_power_up - Turn on all devices that need special attention.
* @state: PM transition of the system being carried out.
*
- * Power on system devices, then devices that required we shut them down
- * with interrupts disabled.
- *
- * Must be called with interrupts disabled.
+ * Call the "early" resume handlers and enable device drivers to receive
+ * interrupts.
*/
void device_power_up(pm_message_t state)
{
dpm_power_up(state);
+ resume_device_irqs();
}
EXPORT_SYMBOL_GPL(device_power_up);

@@ -558,16 +559,17 @@ static int suspend_device_noirq(struct d
* device_power_down - Shut down special devices.
* @state: PM transition of the system being carried out.
*
- * Power down devices that require interrupts to be disabled.
- * Then power down system devices.
+ * Prevent device drivers from receiving interrupts and call the "late"
+ * suspend handlers.
*
- * Must be called with interrupts disabled and only one CPU running.
+ * Must be called under dpm_list_mtx.
*/
int device_power_down(pm_message_t state)
{
struct device *dev;
int error = 0;

+ suspend_device_irqs();
list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
error = suspend_device_noirq(dev, state);
if (error) {
@@ -577,7 +579,7 @@ int device_power_down(pm_message_t state
dev->power.status = DPM_OFF_IRQ;
}
if (error)
- dpm_power_up(resume_event(state));
+ device_power_up(resume_event(state));
return error;
}
EXPORT_SYMBOL_GPL(device_power_down);
Index: linux-2.6/drivers/base/sys.c
===================================================================
--- linux-2.6.orig/drivers/base/sys.c
+++ linux-2.6/drivers/base/sys.c
@@ -22,6 +22,7 @@
#include <linux/pm.h>
#include <linux/device.h>
#include <linux/mutex.h>
+#include <linux/interrupt.h>

#include "base.h"

@@ -369,6 +370,13 @@ int sysdev_suspend(pm_message_t state)
struct sysdev_driver *drv, *err_drv;
int ret;

+ pr_debug("Checking wake-up interrupts\n");
+
+ /* Return error code if there are any wake-up interrupts pending */
+ ret = check_wakeup_irqs();
+ if (ret)
+ return ret;
+
pr_debug("Suspending System Devices\n");

list_for_each_entry_reverse(cls, &system_kset->list, kset.kobj.entry) {
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -12,6 +12,8 @@ extern void compat_irq_chip_set_default_

extern int __irq_set_trigger(struct irq_desc *desc, unsigned int irq,
unsigned long flags);
+extern void __disable_irq(struct irq_desc *desc, unsigned int irq, bool susp);
+extern void __enable_irq(struct irq_desc *desc, unsigned int irq, bool resume);

extern struct lock_class_key irq_desc_lock_class;
extern void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr);

2009-03-13 00:42:00

by Ingo Molnar

[permalink] [raw]
Subject: Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)


* Rafael J. Wysocki <[email protected]> wrote:

> On Thursday 12 March 2009, Rafael J. Wysocki wrote:
> > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > On Wed, 11 Mar 2009, Thomas Gleixner wrote:
> > > > On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > > > > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > > > > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > > > > >
> > > > > > I'm not too enthusiastic about this open coded implementation of
> > > > > > disable_irq() with slightly different semantics.
> > > > >
> > > > > The difference in semantics is important IMO, otherwise I woulndn't have
> > > > > done that. In particular, IMO, the condition should be under the spinlock IMO
> > > > > and I'd rather not synchronize all interrupts we don't really disable here.
> > > >
> > > > I don't say that the difference is not relevant. But the code is
> > > > almost the same and disable_irq() could have the sync_irq optimization
> > > > as well.
> > >
> > > Thought more about that. Avoiding the sync_irq() for irqs which have
> > > no action associated is fine, but you need to catch the following case
> > > as well:
> > >
> > > driver code calls disable_irq_nosyc() from the handler (which is
> > > still running)
> > >
> > > suspend code skips the sync due to depth > 0
> > >
> > > The sync operation is not that expensive.
> >
> > OK, what about this (untested, irrelevant parts skipped)?
>
> Well, I guess I need to assume that no reaction means it's fine. ;-)
>
> Below is the complete patch. Thomas, Ingo, please let me know
> it it is fine with you.

looks good - but you sure want to split it up some more, right?

> 13 files changed, 195 insertions(+), 47 deletions(-)

We want the non-intrusive 'add new APIs' bits [which give most
of the linecount] separated from the 'all hell breaks lose'
functional changes ;-) Makes it easier to revert, bisect, etc.

Ingo

2009-03-13 07:18:30

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Thu, Mar 12, 2009 at 1:43 PM, Rafael J. Wysocki <[email protected]> wrote:
>
> +void __disable_irq(struct irq_desc *desc, unsigned int irq, bool suspend)
> +{
> + ? ? ? if (suspend) {
> + ? ? ? ? ? ? ? if (desc->action && (desc->action->flags & IRQF_TIMER))
> + ? ? ? ? ? ? ? ? ? ? ? return;

Don't you want "(!desc->action || ..." here to avoid enabling unused
interrupts on resume?

--
Arve Hj?nnev?g

2009-03-13 16:54:04

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Friday 13 March 2009, Arve Hj?nnev?g wrote:
> On Thu, Mar 12, 2009 at 1:43 PM, Rafael J. Wysocki <[email protected]> wrote:
> >
> > +void __disable_irq(struct irq_desc *desc, unsigned int irq, bool suspend)
> > +{
> > + if (suspend) {
> > + if (desc->action && (desc->action->flags & IRQF_TIMER))
> > + return;
>
> Don't you want "(!desc->action || ..." here to avoid enabling unused
> interrupts on resume?

Hmm, good idea, thanks.

Best,
Rafael

2009-03-13 17:11:54

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Friday 13 March 2009, Ingo Molnar wrote:
>
> * Rafael J. Wysocki <[email protected]> wrote:
>
> > On Thursday 12 March 2009, Rafael J. Wysocki wrote:
> > > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > On Wed, 11 Mar 2009, Thomas Gleixner wrote:
> > > > > On Wed, 11 Mar 2009, Rafael J. Wysocki wrote:
> > > > > > On Wednesday 11 March 2009, Thomas Gleixner wrote:
> > > > > > > > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > > > > > >
> > > > > > > I'm not too enthusiastic about this open coded implementation of
> > > > > > > disable_irq() with slightly different semantics.
> > > > > >
> > > > > > The difference in semantics is important IMO, otherwise I woulndn't have
> > > > > > done that. In particular, IMO, the condition should be under the spinlock IMO
> > > > > > and I'd rather not synchronize all interrupts we don't really disable here.
> > > > >
> > > > > I don't say that the difference is not relevant. But the code is
> > > > > almost the same and disable_irq() could have the sync_irq optimization
> > > > > as well.
> > > >
> > > > Thought more about that. Avoiding the sync_irq() for irqs which have
> > > > no action associated is fine, but you need to catch the following case
> > > > as well:
> > > >
> > > > driver code calls disable_irq_nosyc() from the handler (which is
> > > > still running)
> > > >
> > > > suspend code skips the sync due to depth > 0
> > > >
> > > > The sync operation is not that expensive.
> > >
> > > OK, what about this (untested, irrelevant parts skipped)?
> >
> > Well, I guess I need to assume that no reaction means it's fine. ;-)
> >
> > Below is the complete patch. Thomas, Ingo, please let me know
> > it it is fine with you.
>
> looks good - but you sure want to split it up some more, right?

Well, in fact I didn't think about that.

> > 13 files changed, 195 insertions(+), 47 deletions(-)
>
> We want the non-intrusive 'add new APIs' bits [which give most
> of the linecount] separated from the 'all hell breaks lose'
> functional changes ;-) Makes it easier to revert, bisect, etc.

I can split it into a patch adding the new functions under kernel/irq and
another one making the suspend code use them, but that's going
to put the new functions somewhat out of context, IMO.

Still, I'll do it if you want me to.

Thanks,
Rafael

2009-03-13 19:58:29

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Thu, 12 Mar 2009, Rafael J. Wysocki wrote:
> +/**
> + * suspend_device_irqs - disable all currently enabled interrupt lines
> + *
> + * During system-wide suspend or hibernation device interrupts need to be
> + * disabled at the chip level and this function is provided for this purpose.
> + * It disables all interrupt lines that are enabled at the moment and sets the
> + * IRQ_SUSPENDED flag for them.
> + */
> +void suspend_device_irqs(void)
> +{
> + struct irq_desc *desc;
> + int irq;
> +
> + for_each_irq_desc(irq, desc) {
> + unsigned long flags;
> +
> + spin_lock_irqsave(&desc->lock, flags);
> + __disable_irq(desc, irq, true);
> + spin_unlock_irqrestore(&desc->lock, flags);

Can we move the locking into __disable_irq ?

> + }
> +
> + for_each_irq_desc(irq, desc)
> + if (desc->status & IRQ_SUSPENDED)
> + synchronize_irq(irq);
> +}
> +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> +
> +/**
> + * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
> + *
> + * Enable all interrupt lines previously disabled by suspend_device_irqs() that
> + * have the IRQ_SUSPENDED flag set.
> + */
> +void resume_device_irqs(void)
> +{
> + struct irq_desc *desc;
> + int irq;
> +
> + for_each_irq_desc(irq, desc) {
> + unsigned long flags;
> +
> + if (!(desc->status & IRQ_SUSPENDED))
> + continue;
> +
> + spin_lock_irqsave(&desc->lock, flags);
> + __enable_irq(desc, irq, true);
> + spin_unlock_irqrestore(&desc->lock, flags);

Ditto

Thanks,

tglx

2009-03-13 21:57:20

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Friday 13 March 2009, Thomas Gleixner wrote:
> On Thu, 12 Mar 2009, Rafael J. Wysocki wrote:
> > +/**
> > + * suspend_device_irqs - disable all currently enabled interrupt lines
> > + *
> > + * During system-wide suspend or hibernation device interrupts need to be
> > + * disabled at the chip level and this function is provided for this purpose.
> > + * It disables all interrupt lines that are enabled at the moment and sets the
> > + * IRQ_SUSPENDED flag for them.
> > + */
> > +void suspend_device_irqs(void)
> > +{
> > + struct irq_desc *desc;
> > + int irq;
> > +
> > + for_each_irq_desc(irq, desc) {
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&desc->lock, flags);
> > + __disable_irq(desc, irq, true);
> > + spin_unlock_irqrestore(&desc->lock, flags);
>
> Can we move the locking into __disable_irq ?

Well, yes, but (see below)

> > + }
> > +
> > + for_each_irq_desc(irq, desc)
> > + if (desc->status & IRQ_SUSPENDED)
> > + synchronize_irq(irq);
> > +}
> > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > +
> > +/**
> > + * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
> > + *
> > + * Enable all interrupt lines previously disabled by suspend_device_irqs() that
> > + * have the IRQ_SUSPENDED flag set.
> > + */
> > +void resume_device_irqs(void)
> > +{
> > + struct irq_desc *desc;
> > + int irq;
> > +
> > + for_each_irq_desc(irq, desc) {
> > + unsigned long flags;
> > +
> > + if (!(desc->status & IRQ_SUSPENDED))
> > + continue;
> > +
> > + spin_lock_irqsave(&desc->lock, flags);
> > + __enable_irq(desc, irq, true);
> > + spin_unlock_irqrestore(&desc->lock, flags);
>
> Ditto

No, because of __setup_irq().

Thanks,
Rafael

2009-03-14 00:04:23

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Friday 13 March 2009, Thomas Gleixner wrote:
> On Thu, 12 Mar 2009, Rafael J. Wysocki wrote:
> > +/**
> > + * suspend_device_irqs - disable all currently enabled interrupt lines
> > + *
> > + * During system-wide suspend or hibernation device interrupts need to be
> > + * disabled at the chip level and this function is provided for this purpose.
> > + * It disables all interrupt lines that are enabled at the moment and sets the
> > + * IRQ_SUSPENDED flag for them.
> > + */
> > +void suspend_device_irqs(void)
> > +{
> > + struct irq_desc *desc;
> > + int irq;
> > +
> > + for_each_irq_desc(irq, desc) {
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&desc->lock, flags);
> > + __disable_irq(desc, irq, true);
> > + spin_unlock_irqrestore(&desc->lock, flags);
>
> Can we move the locking into __disable_irq ?
>
> > + }
> > +
> > + for_each_irq_desc(irq, desc)
> > + if (desc->status & IRQ_SUSPENDED)
> > + synchronize_irq(irq);
> > +}
> > +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> > +
> > +/**
> > + * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
> > + *
> > + * Enable all interrupt lines previously disabled by suspend_device_irqs() that
> > + * have the IRQ_SUSPENDED flag set.
> > + */
> > +void resume_device_irqs(void)
> > +{
> > + struct irq_desc *desc;
> > + int irq;
> > +
> > + for_each_irq_desc(irq, desc) {
> > + unsigned long flags;
> > +
> > + if (!(desc->status & IRQ_SUSPENDED))
> > + continue;
> > +
> > + spin_lock_irqsave(&desc->lock, flags);
> > + __enable_irq(desc, irq, true);
> > + spin_unlock_irqrestore(&desc->lock, flags);
>
> Ditto

Well, I guess you'd prefer something like the appended patch, but Ingo probably
won't like it since it contains additional #ifdefs in irq/manage.c . Sigh.

Thanks,
Rafael

---
From: Rafael J. Wysocki <[email protected]>
Subject: PM: Introduce functions for suspending and resuming device interrupts

Introduce two helper functions allowing us to prevent device drivers
from getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume, respectively. These
functions make it possible to keep timer interrupts enabled while the
"late" suspend and "early" resume callbacks provided by device
drivers are being executed.

These functions will be used to rework the handling of interrupts
during suspend (hibernation) and resume. Namely, interrupts will
only be disabled on the CPU right before suspending sysdevs, while
device drivers will be prevented from receiving interrupts, with the
help of the new helper function, before their "late" suspend
callbacks run (and analogously during resume).

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
include/linux/interrupt.h | 5 +++
include/linux/irq.h | 1
kernel/irq/Makefile | 1
kernel/irq/internals.h | 2 +
kernel/irq/manage.c | 45 ++++++++++++++++++++++++++++---
kernel/irq/pm.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++
6 files changed, 116 insertions(+), 4 deletions(-)

Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef void (*irq_flow_handler_t)(unsig
#define IRQ_SPURIOUS_DISABLED 0x00800000 /* IRQ was disabled by the spurious trap */
#define IRQ_MOVE_PCNTXT 0x01000000 /* IRQ migration from process context */
#define IRQ_AFFINITY_SET 0x02000000 /* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED 0x04000000 /* IRQ has gone through suspend sequence */

#ifdef CONFIG_IRQ_PER_CPU
# define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -162,6 +162,14 @@ static inline int do_irq_select_affinity
}
#endif

+static void __disable_irq(struct irq_desc *desc, unsigned int irq)
+{
+ if (!desc->depth++) {
+ desc->status |= IRQ_DISABLED;
+ desc->chip->disable(irq);
+ }
+}
+
/**
* disable_irq_nosync - disable an irq without waiting
* @irq: Interrupt to disable
@@ -182,10 +190,7 @@ void disable_irq_nosync(unsigned int irq
return;

spin_lock_irqsave(&desc->lock, flags);
- if (!desc->depth++) {
- desc->status |= IRQ_DISABLED;
- desc->chip->disable(irq);
- }
+ __disable_irq(desc, irq);
spin_unlock_irqrestore(&desc->lock, flags);
}
EXPORT_SYMBOL(disable_irq_nosync);
@@ -215,15 +220,32 @@ void disable_irq(unsigned int irq)
}
EXPORT_SYMBOL(disable_irq);

+#ifdef CONFIG_PM_SLEEP
+void suspend_irq(struct irq_desc *desc, unsigned int irq)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&desc->lock, flags);
+ if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
+ __disable_irq(desc, irq);
+ desc->status |= IRQ_SUSPENDED;
+ }
+ spin_unlock_irqrestore(&desc->lock, flags);
+}
+#endif
+
static void __enable_irq(struct irq_desc *desc, unsigned int irq)
{
switch (desc->depth) {
case 0:
+ err_out:
WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
break;
case 1: {
unsigned int status = desc->status & ~IRQ_DISABLED;

+ if (desc->status & IRQ_SUSPENDED)
+ goto err_out;
/* Prevent probing on this irq: */
desc->status = status | IRQ_NOPROBE;
check_irq_resend(desc, irq);
@@ -258,6 +280,21 @@ void enable_irq(unsigned int irq)
}
EXPORT_SYMBOL(enable_irq);

+#ifdef CONFIG_PM_SLEEP
+void resume_irq(struct irq_desc *desc, unsigned int irq)
+{
+ unsigned long flags;
+
+ if (!(desc->status & IRQ_SUSPENDED))
+ return;
+
+ spin_lock_irqsave(&desc->lock, flags);
+ desc->status &= ~IRQ_SUSPENDED;
+ __enable_irq(desc, irq);
+ spin_unlock_irqrestore(&desc->lock, flags);
+}
+#endif
+
static int set_irq_wake_real(unsigned int irq, unsigned int on)
{
struct irq_desc *desc = irq_to_desc(irq);
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -12,6 +12,8 @@ extern void compat_irq_chip_set_default_

extern int __irq_set_trigger(struct irq_desc *desc, unsigned int irq,
unsigned long flags);
+extern void suspend_irq(struct irq_desc *desc, unsigned int irq);
+extern void resume_irq(struct irq_desc *desc, unsigned int irq);

extern struct lock_class_key irq_desc_lock_class;
extern void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr);
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,66 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <[email protected]>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+#include "internals.h"
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+ struct irq_desc *desc;
+ int irq;
+
+ for_each_irq_desc(irq, desc)
+ suspend_irq(desc, irq);
+
+ for_each_irq_desc(irq, desc)
+ if (desc->status & IRQ_SUSPENDED)
+ synchronize_irq(irq);
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+ struct irq_desc *desc;
+ int irq;
+
+ for_each_irq_desc(irq, desc)
+ resume_irq(desc, irq);
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+ struct irq_desc *desc;
+ int irq;
+
+ for_each_irq_desc(irq, desc)
+ if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+ return -EBUSY;
+
+ return 0;
+}
Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -106,6 +106,11 @@ extern void disable_irq_nosync(unsigned
extern void disable_irq(unsigned int irq);
extern void enable_irq(unsigned int irq);

+/* The following three functions are for the core kernel use only. */
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+extern int check_wakeup_irqs(void);
+
#if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)

extern cpumask_var_t irq_default_affinity;

2009-03-14 07:33:56

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Fri, 13 Mar 2009, Rafael J. Wysocki wrote:
> > > + spin_unlock_irqrestore(&desc->lock, flags);
> >
> > Ditto
>
> No, because of __setup_irq().

Sorry, forgot about that. Ok. Keep the locking in pm.c then.

Thanks,

tglx

2009-03-14 10:02:20

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [update, rev. 6] Re: [PATCH 1/10] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Saturday 14 March 2009, Thomas Gleixner wrote:
> On Fri, 13 Mar 2009, Rafael J. Wysocki wrote:
> > > > + spin_unlock_irqrestore(&desc->lock, flags);
> > >
> > > Ditto
> >
> > No, because of __setup_irq().
>
> Sorry, forgot about that. Ok. Keep the locking in pm.c then.

Will do, thanks.

OK, it seems we're approaching the final version. :-)

I'm going to split the $subject patch as requested by Ingo (API changes and
functionality changes) and post the full series once again for completness.

Thanks,
Rafael