2009-03-07 10:28:49

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts

Hi,

The following patches modifiy the way in which we handle disabling interrupts
during suspend and enabling them during resume. They also change the ordering
of the core suspend and hibernation code to take advantage of the new approach
to the interrupts and modify the PCI PM core to avoid a few problems.

Namely, interrupts are currently disabled on the boot CPU as soon as the
nonboot CPUs have been disabled, which doesn't allow device drivers' "late"
suspend and "early" resume callbacks to sleep. Among other things this means
they cannot execute ACPI AML routines, which leads to problems with
suspend-resume of PCI devices, as recently discussed.

1/8 modifies the [suspend|hibernation] and resume code, as well as the other
code using the device PM framework, so that device drivers will not receive
interrupts during the "late" suspend phase, although interrupts will only be
disabled on the CPU right before calling sysdev_suspend() (and analogously
during resume).

2/8 - 4/8 modify the suspend, hibernation and kexec jump code, respectively,
so that the "late" phase of suspending devices will happen before executing the
platform "prepare" callback and disabling nonboot CPUs (and analogously during
resume).

5/8 is a patch that's already in the PCI linux-next tree and I included it in
the series, because the next patches depend on it.

6/8 makes the PCI PM core use pci_set_power_state() to put devices into
D0 during early resume, which allows the platform-specific operations to be
carried out at that time, if necessary.

7/8 uses the opportunity to move pci_restore_standard_config() to pci-driver.c,
where it belongs IMO.

8/8 makes the PCI PM core code put devices into low power states during the
"late" phase of suspend which allows us to avoid a long-standing race related
to shared interrupts and to handle devices that require some platform-specific
operations to be put into low power states appropriately at the same time.

Comments welcome.

Thanks,
Rafael


2009-03-07 10:29:08

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH][2/8] PM: Change suspend code ordering

From: Rafael J. Wysocki <[email protected]>

Change the ordering of the suspend core code so that the platform
"prepare" callback is executed and the nonboot CPUs are disabled
after calling device drivers' "late suspend" methods.

This change will allow us to rework the PCI PM core so that the power
state of devices is changed in the "late" phase of suspend (and
analogously in the "early" phase of resume), which in turn will allow
us to avoid the race condition where a device using shared interrupts
is put into a low power state with interrupts enabled and then an
interrupt (for another device) comes in and confuses its driver.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
kernel/power/main.c | 38 ++++++++++++++++++++++----------------
1 file changed, 22 insertions(+), 16 deletions(-)

Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -297,6 +297,19 @@ static int suspend_enter(suspend_state_t
goto Done;
}

+ if (suspend_ops->prepare) {
+ error = suspend_ops->prepare();
+ if (error)
+ goto Power_up_devices;
+ }
+
+ if (suspend_test(TEST_PLATFORM))
+ goto Platfrom_finish;
+
+ error = disable_nonboot_cpus();
+ if (error || suspend_test(TEST_CPUS))
+ goto Enable_cpus;
+
arch_suspend_disable_irqs();
BUG_ON(!irqs_disabled());

@@ -310,6 +323,14 @@ static int suspend_enter(suspend_state_t
arch_suspend_enable_irqs();
BUG_ON(irqs_disabled());

+ Enable_cpus:
+ enable_nonboot_cpus();
+
+ Platfrom_finish:
+ if (suspend_ops->finish)
+ suspend_ops->finish();
+
+ Power_up_devices:
device_power_up(PMSG_RESUME);

Done:
@@ -346,23 +367,8 @@ int suspend_devices_and_enter(suspend_st
if (suspend_test(TEST_DEVICES))
goto Recover_platform;

- if (suspend_ops->prepare) {
- error = suspend_ops->prepare();
- if (error)
- goto Resume_devices;
- }
-
- if (suspend_test(TEST_PLATFORM))
- goto Finish;
+ suspend_enter(state);

- error = disable_nonboot_cpus();
- if (!error && !suspend_test(TEST_CPUS))
- suspend_enter(state);
-
- enable_nonboot_cpus();
- Finish:
- if (suspend_ops->finish)
- suspend_ops->finish();
Resume_devices:
suspend_test_start();
device_resume(PMSG_RESUME);

2009-03-07 10:29:26

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH][6/8] PCI PM: Use pci_set_power_state during early resume

From: Rafael J. Wysocki <[email protected]>

Once we have allowed timer interrupts to be enabled during the early
phase of resuming devices, we are now able to use the generic
pci_set_power_state() to put PCI devices into D0 at that time. Then,
the platform-specific PM code will have a chance to handle devices
that don't implement the native PCI PM or that require some
additional, platform-specific operations to be carried out to power
them up. Also, by doing this we can simplify the code quite a bit.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/pci/pci.c | 48 +++++++++---------------------------------------
1 file changed, 9 insertions(+), 39 deletions(-)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -426,7 +426,6 @@ static inline int platform_pci_sleep_wak
* given PCI device
* @dev: PCI device to handle.
* @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
- * @wait: If 'true', wait for the device to change its power state
*
* RETURN VALUE:
* -EINVAL if the requested state is invalid.
@@ -435,8 +434,7 @@ static inline int platform_pci_sleep_wak
* 0 if device already is in the requested state.
* 0 if device's power state has been successfully changed.
*/
-static int
-pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state, bool wait)
+static int pci_raw_set_power_state(struct pci_dev *dev, pci_power_t state)
{
u16 pmcsr;
bool need_restore = false;
@@ -481,10 +479,8 @@ pci_raw_set_power_state(struct pci_dev *
break;
case PCI_UNKNOWN: /* Boot-up */
if ((pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D3hot
- && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET)) {
+ && !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET))
need_restore = true;
- wait = true;
- }
/* Fall-through: force to D0 */
default:
pmcsr = 0;
@@ -494,9 +490,6 @@ pci_raw_set_power_state(struct pci_dev *
/* enter specified state */
pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);

- if (!wait)
- return 0;
-
/* Mandatory power management transition delays */
/* see PCI PM 1.1 5.6.1 table 18 */
if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
@@ -521,7 +514,7 @@ pci_raw_set_power_state(struct pci_dev *
if (need_restore)
pci_restore_bars(dev);

- if (wait && dev->bus->self)
+ if (dev->bus->self)
pcie_aspm_pm_state_change(dev->bus->self);

return 0;
@@ -591,7 +584,7 @@ int pci_set_power_state(struct pci_dev *
if (state == PCI_D3hot && (dev->dev_flags & PCI_DEV_FLAGS_NO_D3))
return 0;

- error = pci_raw_set_power_state(dev, state, true);
+ error = pci_raw_set_power_state(dev, state);

if (state > PCI_D0 && platform_pci_power_manageable(dev)) {
/* Allow the platform to finalize the transition */
@@ -1390,37 +1383,14 @@ void pci_allocate_cap_save_buffers(struc
*/
int pci_restore_standard_config(struct pci_dev *dev)
{
- pci_power_t prev_state;
- int error;
-
- pci_update_current_state(dev, PCI_D0);
-
- prev_state = dev->current_state;
- if (prev_state == PCI_D0)
- goto Restore;
-
- error = pci_raw_set_power_state(dev, PCI_D0, false);
- if (error)
- return error;
+ pci_update_current_state(dev, PCI_UNKNOWN);

- /*
- * This assumes that we won't get a bus in B2 or B3 from the BIOS, but
- * we've made this assumption forever and it appears to be universally
- * satisfied.
- */
- switch(prev_state) {
- case PCI_D3cold:
- case PCI_D3hot:
- mdelay(pci_pm_d3_delay);
- break;
- case PCI_D2:
- udelay(PCI_PM_D2_DELAY);
- break;
+ if (dev->current_state != PCI_D0) {
+ int error = pci_set_power_state(dev, PCI_D0);
+ if (error)
+ return error;
}

- pci_update_current_state(dev, PCI_D0);
-
- Restore:
return dev->state_saved ? pci_restore_state(dev) : 0;
}

2009-03-07 10:29:42

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH][5/8] PCI PM: Consistently use variable name "error" for pm call return values

From: Frans Pop <[email protected]>

I noticed two functions use a variable "i" to store the return value of PM
function calls while the rest of the file uses "error". As "i" normally
indicates a counter of some sort it seems better to keep this consistent.

Signed-off-by: Frans Pop <[email protected]>
Signed-off-by: Jesse Barnes <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/pci/pci-driver.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -352,17 +352,17 @@ static int pci_legacy_suspend(struct dev
{
struct pci_dev * pci_dev = to_pci_dev(dev);
struct pci_driver * drv = pci_dev->driver;
- int i = 0;
+ int error = 0;

if (drv && drv->suspend) {
pci_power_t prev = pci_dev->current_state;

pci_dev->state_saved = false;

- i = drv->suspend(pci_dev, state);
- suspend_report_result(drv->suspend, i);
- if (i)
- return i;
+ error = drv->suspend(pci_dev, state);
+ suspend_report_result(drv->suspend, error);
+ if (error)
+ return error;

if (pci_dev->state_saved)
goto Fixup;
@@ -385,20 +385,20 @@ static int pci_legacy_suspend(struct dev
Fixup:
pci_fixup_device(pci_fixup_suspend, pci_dev);

- return i;
+ return error;
}

static int pci_legacy_suspend_late(struct device *dev, pm_message_t state)
{
struct pci_dev * pci_dev = to_pci_dev(dev);
struct pci_driver * drv = pci_dev->driver;
- int i = 0;
+ int error = 0;

if (drv && drv->suspend_late) {
- i = drv->suspend_late(pci_dev, state);
- suspend_report_result(drv->suspend_late, i);
+ error = drv->suspend_late(pci_dev, state);
+ suspend_report_result(drv->suspend_late, error);
}
- return i;
+ return error;
}

static int pci_legacy_resume_early(struct device *dev)

2009-03-07 10:29:57

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH][7/8] PCI PM: Move pci_restore_standard_config to pci-driver.c

From: Rafael J. Wysocki <[email protected]>

Move pci_restore_standard_config() from pci.c to pci-driver.c and
make it static.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/pci/pci-driver.c | 17 +++++++++++++++++
drivers/pci/pci.c | 21 ---------------------
drivers/pci/pci.h | 1 -
3 files changed, 17 insertions(+), 22 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -423,6 +423,23 @@ static int pci_legacy_resume(struct devi

/* Auxiliary functions used by the new power management framework */

+/**
+ * pci_restore_standard_config - restore standard config registers of PCI device
+ * @pci_dev: PCI device to handle
+ */
+static int pci_restore_standard_config(struct pci_dev *pci_dev)
+{
+ pci_update_current_state(pci_dev, PCI_UNKNOWN);
+
+ if (pci_dev->current_state != PCI_D0) {
+ int error = pci_set_power_state(pci_dev, PCI_D0);
+ if (error)
+ return error;
+ }
+
+ return pci_dev->state_saved ? pci_restore_state(pci_dev) : 0;
+}
+
static void pci_pm_default_resume_noirq(struct pci_dev *pci_dev)
{
pci_restore_standard_config(pci_dev);
Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -1374,27 +1374,6 @@ void pci_allocate_cap_save_buffers(struc
}

/**
- * pci_restore_standard_config - restore standard config registers of PCI device
- * @dev: PCI device to handle
- *
- * This function assumes that the device's configuration space is accessible.
- * If the device needs to be powered up, the function will wait for it to
- * change the state.
- */
-int pci_restore_standard_config(struct pci_dev *dev)
-{
- pci_update_current_state(dev, PCI_UNKNOWN);
-
- if (dev->current_state != PCI_D0) {
- int error = pci_set_power_state(dev, PCI_D0);
- if (error)
- return error;
- }
-
- return dev->state_saved ? pci_restore_state(dev) : 0;
-}
-
-/**
* pci_enable_ari - enable ARI forwarding if hardware support it
* @dev: the PCI device
*/
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -49,7 +49,6 @@ extern void pci_disable_enabled_device(s
extern void pci_pm_init(struct pci_dev *dev);
extern void platform_pci_wakeup_init(struct pci_dev *dev);
extern void pci_allocate_cap_save_buffers(struct pci_dev *dev);
-extern int pci_restore_standard_config(struct pci_dev *dev);

static inline bool pci_is_bridge(struct pci_dev *pci_dev)
{

2009-03-07 10:30:24

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH][8/8] PCI PM: Put devices into low power states during late suspend

From: Rafael J. Wysocki <[email protected]>

Once we have allowed timer interrupts to be enabled during the late
phase of suspending devices, we are now able to use the generic
pci_set_power_state() to put PCI devices into low power states at
that time. We can also use some related platform callbacks, like the
ones preparing devices for wake-up, during the late suspend.

Doing this will allow us to avoid the race condition where a device
using shared interrupts is put into a low power state with interrupts
enabled and then an interrupt (for another device) comes in and
confuses its driver. At the same time, devices that don't support
the native PCI PM or that require some additional, platform-specific
operations to be carried out to put them into low power states will
be handled as appropriate.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/pci/pci-driver.c | 129 ++++++++++++++++++++++++++++-------------------
1 file changed, 77 insertions(+), 52 deletions(-)

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -352,53 +352,60 @@ static int pci_legacy_suspend(struct dev
{
struct pci_dev * pci_dev = to_pci_dev(dev);
struct pci_driver * drv = pci_dev->driver;
- int error = 0;
+
+ pci_dev->state_saved = false;

if (drv && drv->suspend) {
pci_power_t prev = pci_dev->current_state;
-
- pci_dev->state_saved = false;
+ int error;

error = drv->suspend(pci_dev, state);
suspend_report_result(drv->suspend, error);
if (error)
return error;

- if (pci_dev->state_saved)
- goto Fixup;
-
- if (pci_dev->current_state != PCI_D0
+ if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
&& pci_dev->current_state != PCI_UNKNOWN) {
WARN_ONCE(pci_dev->current_state != prev,
"PCI PM: Device state not saved by %pF\n",
drv->suspend);
- goto Fixup;
}
}

- pci_save_state(pci_dev);
- /*
- * This is for compatibility with existing code with legacy PM support.
- */
- pci_pm_set_unknown_state(pci_dev);
-
- Fixup:
pci_fixup_device(pci_fixup_suspend, pci_dev);

- return error;
+ return 0;
}

static int pci_legacy_suspend_late(struct device *dev, pm_message_t state)
{
struct pci_dev * pci_dev = to_pci_dev(dev);
struct pci_driver * drv = pci_dev->driver;
- int error = 0;

if (drv && drv->suspend_late) {
+ pci_power_t prev = pci_dev->current_state;
+ int error;
+
error = drv->suspend_late(pci_dev, state);
suspend_report_result(drv->suspend_late, error);
+ if (error)
+ return error;
+
+ if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
+ && pci_dev->current_state != PCI_UNKNOWN) {
+ WARN_ONCE(pci_dev->current_state != prev,
+ "PCI PM: Device state not saved by %pF\n",
+ drv->suspend_late);
+ return 0;
+ }
}
- return error;
+
+ if (!pci_dev->state_saved)
+ pci_save_state(pci_dev);
+
+ pci_pm_set_unknown_state(pci_dev);
+
+ return 0;
}

static int pci_legacy_resume_early(struct device *dev)
@@ -460,7 +467,6 @@ static void pci_pm_default_suspend(struc
/* Disable non-bridge devices without PM support */
if (!pci_is_bridge(pci_dev))
pci_disable_enabled_device(pci_dev);
- pci_save_state(pci_dev);
}

static bool pci_has_legacy_pm_support(struct pci_dev *pci_dev)
@@ -526,24 +532,14 @@ static int pci_pm_suspend(struct device
if (error)
return error;

- if (pci_dev->state_saved)
- goto Fixup;
-
- if (pci_dev->current_state != PCI_D0
+ if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
&& pci_dev->current_state != PCI_UNKNOWN) {
WARN_ONCE(pci_dev->current_state != prev,
"PCI PM: State of device not saved by %pF\n",
pm->suspend);
- goto Fixup;
}
}

- if (!pci_dev->state_saved) {
- pci_save_state(pci_dev);
- if (!pci_is_bridge(pci_dev))
- pci_prepare_to_sleep(pci_dev);
- }
-
Fixup:
pci_fixup_device(pci_fixup_suspend, pci_dev);

@@ -553,21 +549,41 @@ static int pci_pm_suspend(struct device
static int pci_pm_suspend_noirq(struct device *dev)
{
struct pci_dev *pci_dev = to_pci_dev(dev);
- struct device_driver *drv = dev->driver;
- int error = 0;
+ struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;

if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_suspend_late(dev, PMSG_SUSPEND);

- if (drv && drv->pm && drv->pm->suspend_noirq) {
- error = drv->pm->suspend_noirq(dev);
- suspend_report_result(drv->pm->suspend_noirq, error);
+ if (!pm)
+ return 0;
+
+ if (pm->suspend_noirq) {
+ pci_power_t prev = pci_dev->current_state;
+ int error;
+
+ error = pm->suspend_noirq(dev);
+ suspend_report_result(pm->suspend_noirq, error);
+ if (error)
+ return error;
+
+ if (!pci_dev->state_saved && pci_dev->current_state != PCI_D0
+ && pci_dev->current_state != PCI_UNKNOWN) {
+ WARN_ONCE(pci_dev->current_state != prev,
+ "PCI PM: State of device not saved by %pF\n",
+ pm->suspend_noirq);
+ return 0;
+ }
}

- if (!error)
- pci_pm_set_unknown_state(pci_dev);
+ if (!pci_dev->state_saved) {
+ pci_save_state(pci_dev);
+ if (!pci_is_bridge(pci_dev))
+ pci_prepare_to_sleep(pci_dev);
+ }

- return error;
+ pci_pm_set_unknown_state(pci_dev);
+
+ return 0;
}

static int pci_pm_resume_noirq(struct device *dev)
@@ -650,9 +666,6 @@ static int pci_pm_freeze(struct device *
return error;
}

- if (!pci_dev->state_saved)
- pci_save_state(pci_dev);
-
return 0;
}

@@ -660,20 +673,25 @@ static int pci_pm_freeze_noirq(struct de
{
struct pci_dev *pci_dev = to_pci_dev(dev);
struct device_driver *drv = dev->driver;
- int error = 0;

if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_suspend_late(dev, PMSG_FREEZE);

if (drv && drv->pm && drv->pm->freeze_noirq) {
+ int error;
+
error = drv->pm->freeze_noirq(dev);
suspend_report_result(drv->pm->freeze_noirq, error);
+ if (error)
+ return error;
}

- if (!error)
- pci_pm_set_unknown_state(pci_dev);
+ if (!pci_dev->state_saved)
+ pci_save_state(pci_dev);

- return error;
+ pci_pm_set_unknown_state(pci_dev);
+
+ return 0;
}

static int pci_pm_thaw_noirq(struct device *dev)
@@ -716,7 +734,6 @@ static int pci_pm_poweroff(struct device
{
struct pci_dev *pci_dev = to_pci_dev(dev);
struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
- int error = 0;

if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_suspend(dev, PMSG_HIBERNATE);
@@ -729,33 +746,41 @@ static int pci_pm_poweroff(struct device
pci_dev->state_saved = false;

if (pm->poweroff) {
+ int error;
+
error = pm->poweroff(dev);
suspend_report_result(pm->poweroff, error);
+ if (error)
+ return error;
}

- if (!pci_dev->state_saved && !pci_is_bridge(pci_dev))
- pci_prepare_to_sleep(pci_dev);
-
Fixup:
pci_fixup_device(pci_fixup_suspend, pci_dev);

- return error;
+ return 0;
}

static int pci_pm_poweroff_noirq(struct device *dev)
{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
struct device_driver *drv = dev->driver;
- int error = 0;

if (pci_has_legacy_pm_support(to_pci_dev(dev)))
return pci_legacy_suspend_late(dev, PMSG_HIBERNATE);

if (drv && drv->pm && drv->pm->poweroff_noirq) {
+ int error;
+
error = drv->pm->poweroff_noirq(dev);
suspend_report_result(drv->pm->poweroff_noirq, error);
+ if (error)
+ return error;
}

- return error;
+ if (!pci_dev->state_saved && !pci_is_bridge(pci_dev))
+ pci_prepare_to_sleep(pci_dev);
+
+ return 0;
}

static int pci_pm_restore_noirq(struct device *dev)

2009-03-07 10:30:41

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH][4/8] kexec: Change kexec jump code ordering

From: Rafael J. Wysocki <[email protected]>

Change the ordering of the kexec jump code so that the nonboot CPUs
are disabled after calling device drivers' "late suspend" methods.

This change reflects the recent modifications of the power management
code that is also used by kexec jump.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
kernel/kexec.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)

Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1450,9 +1450,6 @@ int kernel_kexec(void)
error = device_suspend(PMSG_FREEZE);
if (error)
goto Resume_console;
- error = disable_nonboot_cpus();
- if (error)
- goto Resume_devices;
device_pm_lock();
/* At this point, device_suspend() has been called,
* but *not* device_power_down(). We *must*
@@ -1463,13 +1460,15 @@ int kernel_kexec(void)
*/
error = device_power_down(PMSG_FREEZE);
if (error)
- goto Unlock_pm;
-
+ goto Resume_devices;
+ error = disable_nonboot_cpus();
+ if (error)
+ goto Enable_cpus;
local_irq_disable();
/* Suspend system devices */
error = sysdev_suspend(PMSG_FREEZE);
if (error)
- goto Power_up_devices;
+ goto Enable_irqs;
} else
#endif
{
@@ -1483,13 +1482,13 @@ int kernel_kexec(void)
#ifdef CONFIG_KEXEC_JUMP
if (kexec_image->preserve_context) {
sysdev_resume();
- Power_up_devices:
+ Enable_irqs:
local_irq_enable();
- device_power_up(PMSG_RESTORE);
- Unlock_pm:
- device_pm_unlock();
+ Enable_cpus:
enable_nonboot_cpus();
+ device_power_up(PMSG_RESTORE);
Resume_devices:
+ device_pm_unlock();
device_resume(PMSG_RESTORE);
Resume_console:
resume_console();

2009-03-07 10:31:20

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH][3/8] PM: Change hibernation code ordering

From: Rafael J. Wysocki <[email protected]>

Change the ordering of the hibernation core code so that the platform
"prepare" callbacks are executed and the nonboot CPUs are disabled
after calling device drivers' "late suspend" methods.

This change (along with the previous analogous change of the suspend
core code) will allow us to rework the PCI PM core so that the power
state of devices is changed in the "late" phase of suspend (and
analogously in the "early" phase of resume), which in turn will allow
us to avoid the race condition where a device using shared interrupts
is put into a low power state with interrupts enabled and then an
interrupt (for another device) comes in and confuses its driver.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
kernel/power/disk.c | 109 +++++++++++++++++++++++++++++-----------------------
1 file changed, 61 insertions(+), 48 deletions(-)

Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -228,13 +228,22 @@ static int create_image(int platform_mod
goto Unlock;
}

+ error = platform_pre_snapshot(platform_mode);
+ if (error || hibernation_test(TEST_PLATFORM))
+ goto Platform_finish;
+
+ error = disable_nonboot_cpus();
+ if (error || hibernation_test(TEST_CPUS)
+ || hibernation_testmode(HIBERNATION_TEST))
+ goto Enable_cpus;
+
local_irq_disable();

sysdev_suspend(PMSG_FREEZE);
if (error) {
printk(KERN_ERR "PM: Some devices failed to power down, "
"aborting hibernation\n");
- goto Power_up_devices;
+ goto Enable_irqs;
}

if (hibernation_test(TEST_CORE))
@@ -250,15 +259,22 @@ static int create_image(int platform_mod
restore_processor_state();
if (!in_suspend)
platform_leave(platform_mode);
+
Power_up:
sysdev_resume();
/* NOTE: device_power_up() is just a resume() for devices
* that suspended with irqs off ... no overall powerup.
*/

- Power_up_devices:
+ Enable_irqs:
local_irq_enable();

+ Enable_cpus:
+ enable_nonboot_cpus();
+
+ Platform_finish:
+ platform_finish(platform_mode);
+
device_power_up(in_suspend ?
(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);

@@ -298,25 +314,9 @@ int hibernation_snapshot(int platform_mo
if (hibernation_test(TEST_DEVICES))
goto Recover_platform;

- error = platform_pre_snapshot(platform_mode);
- if (error || hibernation_test(TEST_PLATFORM))
- goto Finish;
-
- error = disable_nonboot_cpus();
- if (!error) {
- if (hibernation_test(TEST_CPUS))
- goto Enable_cpus;
-
- if (hibernation_testmode(HIBERNATION_TEST))
- goto Enable_cpus;
+ error = create_image(platform_mode);
+ /* Control returns here after successful restore */

- error = create_image(platform_mode);
- /* Control returns here after successful restore */
- }
- Enable_cpus:
- enable_nonboot_cpus();
- Finish:
- platform_finish(platform_mode);
Resume_devices:
device_resume(in_suspend ?
(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
@@ -338,7 +338,7 @@ int hibernation_snapshot(int platform_mo
* kernel.
*/

-static int resume_target_kernel(void)
+static int resume_target_kernel(bool platform_mode)
{
int error;

@@ -351,9 +351,20 @@ static int resume_target_kernel(void)
goto Unlock;
}

+ error = platform_pre_restore(platform_mode);
+ if (error)
+ goto Cleanup;
+
+ error = disable_nonboot_cpus();
+ if (error)
+ goto Enable_cpus;
+
local_irq_disable();

- sysdev_suspend(PMSG_QUIESCE);
+ error = sysdev_suspend(PMSG_QUIESCE);
+ if (error)
+ goto Enable_irqs;
+
/* We'll ignore saved state, but this gets preempt count (etc) right */
save_processor_state();
error = restore_highmem();
@@ -379,8 +390,15 @@ static int resume_target_kernel(void)

sysdev_resume();

+ Enable_irqs:
local_irq_enable();

+ Enable_cpus:
+ enable_nonboot_cpus();
+
+ Cleanup:
+ platform_restore_cleanup(platform_mode);
+
device_power_up(PMSG_RECOVER);

Unlock:
@@ -405,19 +423,10 @@ int hibernation_restore(int platform_mod
pm_prepare_console();
suspend_console();
error = device_suspend(PMSG_QUIESCE);
- if (error)
- goto Finish;
-
- error = platform_pre_restore(platform_mode);
if (!error) {
- error = disable_nonboot_cpus();
- if (!error)
- error = resume_target_kernel();
- enable_nonboot_cpus();
+ error = resume_target_kernel(platform_mode);
+ device_resume(PMSG_RECOVER);
}
- platform_restore_cleanup(platform_mode);
- device_resume(PMSG_RECOVER);
- Finish:
resume_console();
pm_restore_console();
return error;
@@ -453,34 +462,38 @@ int hibernation_platform_enter(void)
goto Resume_devices;
}

+ device_pm_lock();
+
+ error = device_power_down(PMSG_HIBERNATE);
+ if (error)
+ goto Unlock;
+
error = hibernation_ops->prepare();
if (error)
- goto Resume_devices;
+ goto Platofrm_finish;

error = disable_nonboot_cpus();
if (error)
- goto Finish;
-
- device_pm_lock();
-
- error = device_power_down(PMSG_HIBERNATE);
- if (!error) {
- local_irq_disable();
- sysdev_suspend(PMSG_HIBERNATE);
- hibernation_ops->enter();
- /* We should never get here */
- while (1);
- }
+ goto Platofrm_finish;

- device_pm_unlock();
+ local_irq_disable();
+ sysdev_suspend(PMSG_HIBERNATE);
+ hibernation_ops->enter();
+ /* We should never get here */
+ while (1);

/*
* We don't need to reenable the nonboot CPUs or resume consoles, since
* the system is going to be halted anyway.
*/
- Finish:
+ Platofrm_finish:
hibernation_ops->finish();

+ device_power_up(PMSG_RESTORE);
+
+ Unlock:
+ device_pm_unlock();
+
Resume_devices:
entering_platform_hibernation = false;
device_resume(PMSG_RESTORE);

2009-03-07 10:30:58

by Rafael J. Wysocki

[permalink] [raw]
Subject: [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)

From: Rafael J. Wysocki <[email protected]>

Introduce two helper functions allowing us to prevent device drivers
from getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume, respectively. These
functions make it possible to keep timer interrupts enabled while the
"late" suspend and "early" resume callbacks provided by device
drivers are being executed.

Use these functions to rework the handling of interrupts during
suspend (hibernation) and resume. Namely, interrupts will only be
disabled on the CPU right before suspending sysdevs, while device
drivers will be prevented from receiving interrupts, with the help of
the new helper function, before their "late" suspend callbacks run
(and analogously during resume).

In addition, since the device interrups are now disabled before the
CPU has turned all interrupts off and the CPU will ACK the interrupts
setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
any wake-up interrupts are pending and abort suspend if that's the
case.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
arch/x86/kernel/apm_32.c | 15 +++++--
drivers/base/power/main.c | 20 +++++-----
drivers/base/sys.c | 8 ++++
drivers/xen/manage.c | 16 ++++----
include/linux/interrupt.h | 5 ++
include/linux/irq.h | 1
kernel/irq/Makefile | 1
kernel/irq/internals.h | 1
kernel/irq/manage.c | 2 -
kernel/irq/pm.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++
kernel/kexec.c | 8 ++--
kernel/power/disk.c | 39 ++++++++++++++------
kernel/power/main.c | 17 +++++---
13 files changed, 181 insertions(+), 41 deletions(-)

Index: linux-2.6/include/linux/interrupt.h
===================================================================
--- linux-2.6.orig/include/linux/interrupt.h
+++ linux-2.6/include/linux/interrupt.h
@@ -106,6 +106,11 @@ extern void disable_irq_nosync(unsigned
extern void disable_irq(unsigned int irq);
extern void enable_irq(unsigned int irq);

+/* The following three functions are for the core kernel use only. */
+extern void suspend_device_irqs(void);
+extern void resume_device_irqs(void);
+extern int check_wakeup_irqs(void);
+
#if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_HARDIRQS)

extern cpumask_var_t irq_default_affinity;
Index: linux-2.6/kernel/power/main.c
===================================================================
--- linux-2.6.orig/kernel/power/main.c
+++ linux-2.6/kernel/power/main.c
@@ -287,17 +287,19 @@ void __attribute__ ((weak)) arch_suspend
*/
static int suspend_enter(suspend_state_t state)
{
- int error = 0;
+ int error;

device_pm_lock();
- arch_suspend_disable_irqs();
- BUG_ON(!irqs_disabled());

- if ((error = device_power_down(PMSG_SUSPEND))) {
+ error = device_power_down(PMSG_SUSPEND);
+ if (error) {
printk(KERN_ERR "PM: Some devices failed to power down\n");
goto Done;
}

+ arch_suspend_disable_irqs();
+ BUG_ON(!irqs_disabled());
+
error = sysdev_suspend(PMSG_SUSPEND);
if (!error) {
if (!suspend_test(TEST_CORE))
@@ -305,11 +307,14 @@ static int suspend_enter(suspend_state_t
sysdev_resume();
}

- device_power_up(PMSG_RESUME);
- Done:
arch_suspend_enable_irqs();
BUG_ON(irqs_disabled());
+
+ device_power_up(PMSG_RESUME);
+
+ Done:
device_pm_unlock();
+
return error;
}

Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -214,7 +214,7 @@ static int create_image(int platform_mod
return error;

device_pm_lock();
- local_irq_disable();
+
/* At this point, device_suspend() has been called, but *not*
* device_power_down(). We *must* call device_power_down() now.
* Otherwise, drivers for some devices (e.g. interrupt controllers)
@@ -225,8 +225,11 @@ static int create_image(int platform_mod
if (error) {
printk(KERN_ERR "PM: Some devices failed to power down, "
"aborting hibernation\n");
- goto Enable_irqs;
+ goto Unlock;
}
+
+ local_irq_disable();
+
sysdev_suspend(PMSG_FREEZE);
if (error) {
printk(KERN_ERR "PM: Some devices failed to power down, "
@@ -252,12 +255,16 @@ static int create_image(int platform_mod
/* NOTE: device_power_up() is just a resume() for devices
* that suspended with irqs off ... no overall powerup.
*/
+
Power_up_devices:
+ local_irq_enable();
+
device_power_up(in_suspend ?
(error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE);
- Enable_irqs:
- local_irq_enable();
+
+ Unlock:
device_pm_unlock();
+
return error;
}

@@ -336,13 +343,16 @@ static int resume_target_kernel(void)
int error;

device_pm_lock();
- local_irq_disable();
+
error = device_power_down(PMSG_QUIESCE);
if (error) {
printk(KERN_ERR "PM: Some devices failed to power down, "
"aborting resume\n");
- goto Enable_irqs;
+ goto Unlock;
}
+
+ local_irq_disable();
+
sysdev_suspend(PMSG_QUIESCE);
/* We'll ignore saved state, but this gets preempt count (etc) right */
save_processor_state();
@@ -366,11 +376,16 @@ static int resume_target_kernel(void)
swsusp_free();
restore_processor_state();
touch_softlockup_watchdog();
+
sysdev_resume();
- device_power_up(PMSG_RECOVER);
- Enable_irqs:
+
local_irq_enable();
+
+ device_power_up(PMSG_RECOVER);
+
+ Unlock:
device_pm_unlock();
+
return error;
}

@@ -447,15 +462,16 @@ int hibernation_platform_enter(void)
goto Finish;

device_pm_lock();
- local_irq_disable();
+
error = device_power_down(PMSG_HIBERNATE);
if (!error) {
+ local_irq_disable();
sysdev_suspend(PMSG_HIBERNATE);
hibernation_ops->enter();
/* We should never get here */
while (1);
}
- local_irq_enable();
+
device_pm_unlock();

/*
@@ -464,12 +480,15 @@ int hibernation_platform_enter(void)
*/
Finish:
hibernation_ops->finish();
+
Resume_devices:
entering_platform_hibernation = false;
device_resume(PMSG_RESTORE);
resume_console();
+
Close:
hibernation_ops->end();
+
return error;
}

Index: linux-2.6/arch/x86/kernel/apm_32.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/apm_32.c
+++ linux-2.6/arch/x86/kernel/apm_32.c
@@ -1190,8 +1190,10 @@ static int suspend(int vetoable)
struct apm_user *as;

device_suspend(PMSG_SUSPEND);
- local_irq_disable();
+
device_power_down(PMSG_SUSPEND);
+
+ local_irq_disable();
sysdev_suspend(PMSG_SUSPEND);

local_irq_enable();
@@ -1209,9 +1211,12 @@ static int suspend(int vetoable)
if (err != APM_SUCCESS)
apm_error("suspend", err);
err = (err == APM_SUCCESS) ? 0 : -EIO;
+
sysdev_resume();
- device_power_up(PMSG_RESUME);
local_irq_enable();
+
+ device_power_up(PMSG_RESUME);
+
device_resume(PMSG_RESUME);
queue_event(APM_NORMAL_RESUME, NULL);
spin_lock(&user_list_lock);
@@ -1228,8 +1233,9 @@ static void standby(void)
{
int err;

- local_irq_disable();
device_power_down(PMSG_SUSPEND);
+
+ local_irq_disable();
sysdev_suspend(PMSG_SUSPEND);
local_irq_enable();

@@ -1239,8 +1245,9 @@ static void standby(void)

local_irq_disable();
sysdev_resume();
- device_power_up(PMSG_RESUME);
local_irq_enable();
+
+ device_power_up(PMSG_RESUME);
}

static apm_event_t get_event(void)
Index: linux-2.6/drivers/xen/manage.c
===================================================================
--- linux-2.6.orig/drivers/xen/manage.c
+++ linux-2.6/drivers/xen/manage.c
@@ -39,12 +39,6 @@ static int xen_suspend(void *data)

BUG_ON(!irqs_disabled());

- err = device_power_down(PMSG_SUSPEND);
- if (err) {
- printk(KERN_ERR "xen_suspend: device_power_down failed: %d\n",
- err);
- return err;
- }
err = sysdev_suspend(PMSG_SUSPEND);
if (err) {
printk(KERN_ERR "xen_suspend: sysdev_suspend failed: %d\n",
@@ -69,7 +63,6 @@ static int xen_suspend(void *data)
xen_mm_unpin_all();

sysdev_resume();
- device_power_up(PMSG_RESUME);

if (!*cancelled) {
xen_irq_resume();
@@ -108,6 +101,12 @@ static void do_suspend(void)
/* XXX use normal device tree? */
xenbus_suspend();

+ err = device_power_down(PMSG_SUSPEND);
+ if (err) {
+ printk(KERN_ERR "device_power_down failed: %d\n", err);
+ goto resume_devices;
+ }
+
err = stop_machine(xen_suspend, &cancelled, &cpumask_of_cpu(0));
if (err) {
printk(KERN_ERR "failed to start xen_suspend: %d\n", err);
@@ -120,6 +119,9 @@ static void do_suspend(void)
} else
xenbus_suspend_cancel();

+ device_power_up(PMSG_RESUME);
+
+resume_devices:
device_resume(PMSG_RESUME);

/* Make sure timer events get retriggered on all CPUs */
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1454,7 +1454,6 @@ int kernel_kexec(void)
if (error)
goto Resume_devices;
device_pm_lock();
- local_irq_disable();
/* At this point, device_suspend() has been called,
* but *not* device_power_down(). We *must*
* device_power_down() now. Otherwise, drivers for
@@ -1464,8 +1463,9 @@ int kernel_kexec(void)
*/
error = device_power_down(PMSG_FREEZE);
if (error)
- goto Enable_irqs;
+ goto Unlock_pm;

+ local_irq_disable();
/* Suspend system devices */
error = sysdev_suspend(PMSG_FREEZE);
if (error)
@@ -1484,9 +1484,9 @@ int kernel_kexec(void)
if (kexec_image->preserve_context) {
sysdev_resume();
Power_up_devices:
- device_power_up(PMSG_RESTORE);
- Enable_irqs:
local_irq_enable();
+ device_power_up(PMSG_RESTORE);
+ Unlock_pm:
device_pm_unlock();
enable_nonboot_cpus();
Resume_devices:
Index: linux-2.6/include/linux/irq.h
===================================================================
--- linux-2.6.orig/include/linux/irq.h
+++ linux-2.6/include/linux/irq.h
@@ -65,6 +65,7 @@ typedef void (*irq_flow_handler_t)(unsig
#define IRQ_SPURIOUS_DISABLED 0x00800000 /* IRQ was disabled by the spurious trap */
#define IRQ_MOVE_PCNTXT 0x01000000 /* IRQ migration from process context */
#define IRQ_AFFINITY_SET 0x02000000 /* IRQ affinity was set from userspace*/
+#define IRQ_SUSPENDED 0x04000000 /* IRQ has gone through suspend sequence */

#ifdef CONFIG_IRQ_PER_CPU
# define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
Index: linux-2.6/kernel/irq/pm.c
===================================================================
--- /dev/null
+++ linux-2.6/kernel/irq/pm.c
@@ -0,0 +1,89 @@
+/*
+ * linux/kernel/irq/pm.c
+ *
+ * Copyright (C) 2009 Rafael J. Wysocki <[email protected]>, Novell Inc.
+ *
+ * This file contains power management functions related to interrupts.
+ */
+
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+
+#include "internals.h"
+
+/**
+ * suspend_device_irqs - disable all currently enabled interrupt lines
+ *
+ * During system-wide suspend or hibernation device interrupts need to be
+ * disabled at the chip level and this function is provided for this purpose.
+ * It disables all interrupt lines that are enabled at the moment and sets the
+ * IRQ_SUSPENDED flag for them.
+ */
+void suspend_device_irqs(void)
+{
+ struct irq_desc *desc;
+ int irq;
+
+ for_each_irq_desc(irq, desc) {
+ unsigned long flags;
+ bool sync = false;
+
+ spin_lock_irqsave(&desc->lock, flags);
+
+ if (desc->action && !(desc->action->flags & IRQF_TIMER)) {
+ if (!desc->depth++) {
+ desc->status |= IRQ_DISABLED;
+ desc->chip->disable(irq);
+ sync = true;
+ }
+ desc->status |= IRQ_SUSPENDED;
+ }
+
+ spin_unlock_irqrestore(&desc->lock, flags);
+
+ if (sync)
+ synchronize_irq(irq);
+ }
+}
+EXPORT_SYMBOL_GPL(suspend_device_irqs);
+
+/**
+ * resume_device_irqs - enable interrupt lines disabled by suspend_device_irqs()
+ *
+ * Enable all interrupt lines previously disabled by suspend_device_irqs() that
+ * have the IRQ_SUSPENDED flag set.
+ */
+void resume_device_irqs(void)
+{
+ struct irq_desc *desc;
+ int irq;
+
+ for_each_irq_desc(irq, desc) {
+ unsigned long flags;
+
+ if (!(desc->status & IRQ_SUSPENDED))
+ continue;
+
+ spin_lock_irqsave(&desc->lock, flags);
+ desc->status &= ~IRQ_SUSPENDED;
+ __enable_irq(desc, irq);
+ spin_unlock_irqrestore(&desc->lock, flags);
+ }
+}
+EXPORT_SYMBOL_GPL(resume_device_irqs);
+
+/**
+ * check_wakeup_irqs - check if any wake-up interrupts are pending
+ */
+int check_wakeup_irqs(void)
+{
+ struct irq_desc *desc;
+ int irq;
+
+ for_each_irq_desc(irq, desc)
+ if ((desc->status & IRQ_WAKEUP) && (desc->status & IRQ_PENDING))
+ return -EBUSY;
+
+ return 0;
+}
Index: linux-2.6/kernel/irq/Makefile
===================================================================
--- linux-2.6.orig/kernel/irq/Makefile
+++ linux-2.6/kernel/irq/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_GENERIC_IRQ_PROBE) += autop
obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
obj-$(CONFIG_NUMA_MIGRATE_IRQ_DESC) += numa_migrate.o
+obj-$(CONFIG_PM_SLEEP) += pm.o
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -215,7 +215,7 @@ void disable_irq(unsigned int irq)
}
EXPORT_SYMBOL(disable_irq);

-static void __enable_irq(struct irq_desc *desc, unsigned int irq)
+void __enable_irq(struct irq_desc *desc, unsigned int irq)
{
switch (desc->depth) {
case 0:
Index: linux-2.6/drivers/base/power/main.c
===================================================================
--- linux-2.6.orig/drivers/base/power/main.c
+++ linux-2.6/drivers/base/power/main.c
@@ -23,6 +23,7 @@
#include <linux/pm.h>
#include <linux/resume-trace.h>
#include <linux/rwsem.h>
+#include <linux/interrupt.h>

#include "../base.h"
#include "power.h"
@@ -305,7 +306,8 @@ static int resume_device_noirq(struct de
* Execute the appropriate "noirq resume" callback for all devices marked
* as DPM_OFF_IRQ.
*
- * Must be called with interrupts disabled and only one CPU running.
+ * Must be called under dpm_list_mtx. Device drivers should not receive
+ * interrupts while it's being executed.
*/
static void dpm_power_up(pm_message_t state)
{
@@ -326,14 +328,13 @@ static void dpm_power_up(pm_message_t st
* device_power_up - Turn on all devices that need special attention.
* @state: PM transition of the system being carried out.
*
- * Power on system devices, then devices that required we shut them down
- * with interrupts disabled.
- *
- * Must be called with interrupts disabled.
+ * Call the "early" resume handlers and enable device drivers to receive
+ * interrupts.
*/
void device_power_up(pm_message_t state)
{
dpm_power_up(state);
+ resume_device_irqs();
}
EXPORT_SYMBOL_GPL(device_power_up);

@@ -558,16 +559,17 @@ static int suspend_device_noirq(struct d
* device_power_down - Shut down special devices.
* @state: PM transition of the system being carried out.
*
- * Power down devices that require interrupts to be disabled.
- * Then power down system devices.
+ * Prevent device drivers from receiving interrupts and call the "late"
+ * suspend handlers.
*
- * Must be called with interrupts disabled and only one CPU running.
+ * Must be called under dpm_list_mtx.
*/
int device_power_down(pm_message_t state)
{
struct device *dev;
int error = 0;

+ suspend_device_irqs();
list_for_each_entry_reverse(dev, &dpm_list, power.entry) {
error = suspend_device_noirq(dev, state);
if (error) {
@@ -577,7 +579,7 @@ int device_power_down(pm_message_t state
dev->power.status = DPM_OFF_IRQ;
}
if (error)
- dpm_power_up(resume_event(state));
+ device_power_up(resume_event(state));
return error;
}
EXPORT_SYMBOL_GPL(device_power_down);
Index: linux-2.6/drivers/base/sys.c
===================================================================
--- linux-2.6.orig/drivers/base/sys.c
+++ linux-2.6/drivers/base/sys.c
@@ -22,6 +22,7 @@
#include <linux/pm.h>
#include <linux/device.h>
#include <linux/mutex.h>
+#include <linux/interrupt.h>

#include "base.h"

@@ -369,6 +370,13 @@ int sysdev_suspend(pm_message_t state)
struct sysdev_driver *drv, *err_drv;
int ret;

+ pr_debug("Checking wake-up interrupts\n");
+
+ /* Return error code if there are any wake-up interrupts pending */
+ ret = check_wakeup_irqs();
+ if (ret)
+ return ret;
+
pr_debug("Suspending System Devices\n");

list_for_each_entry_reverse(cls, &system_kset->list, kset.kobj.entry) {
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -12,6 +12,7 @@ extern void compat_irq_chip_set_default_

extern int __irq_set_trigger(struct irq_desc *desc, unsigned int irq,
unsigned long flags);
+extern void __enable_irq(struct irq_desc *desc, unsigned int irq);

extern struct lock_class_key irq_desc_lock_class;
extern void init_kstat_irqs(struct irq_desc *desc, int cpu, int nr);

2009-03-07 16:51:31

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Sat, 7 Mar 2009, Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <[email protected]>
>
> Introduce two helper functions allowing us to prevent device drivers
> from getting any interrupts (without disabling interrupts on the CPU)
> during suspend (or hibernation) and to make them start to receive
> interrupts again during the subsequent resume, respectively. These
> functions make it possible to keep timer interrupts enabled while the
> "late" suspend and "early" resume callbacks provided by device
> drivers are being executed.
>
> Use these functions to rework the handling of interrupts during
> suspend (hibernation) and resume. Namely, interrupts will only be
> disabled on the CPU right before suspending sysdevs, while device
> drivers will be prevented from receiving interrupts, with the help of
> the new helper function, before their "late" suspend callbacks run
> (and analogously during resume).
>
> In addition, since the device interrups are now disabled before the
> CPU has turned all interrupts off and the CPU will ACK the interrupts
> setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
> any wake-up interrupts are pending and abort suspend if that's the
> case.

One thing about this isn't clear: the distinction between "wake-up"
interrupts and other interrupts.

In an ideal world, the only pending interrupts during sysdev_suspend
would be wake-up interrupts, because drivers would have prevented their
devices from generating any other kind of IRQ and would have done all
the necessary synchronization as part of their suspend (_not_
suspend_late) methods. Thus there would be no need to distinguish
between wake-up and non-wake-up interrupts.

So perhaps you're worried about drivers that aren't sufficiently
clever. Or is something deeper going on?

Alan Stern

2009-03-07 17:57:01

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Saturday 07 March 2009, Alan Stern wrote:
> On Sat, 7 Mar 2009, Rafael J. Wysocki wrote:
>
> > From: Rafael J. Wysocki <[email protected]>
> >
> > Introduce two helper functions allowing us to prevent device drivers
> > from getting any interrupts (without disabling interrupts on the CPU)
> > during suspend (or hibernation) and to make them start to receive
> > interrupts again during the subsequent resume, respectively. These
> > functions make it possible to keep timer interrupts enabled while the
> > "late" suspend and "early" resume callbacks provided by device
> > drivers are being executed.
> >
> > Use these functions to rework the handling of interrupts during
> > suspend (hibernation) and resume. Namely, interrupts will only be
> > disabled on the CPU right before suspending sysdevs, while device
> > drivers will be prevented from receiving interrupts, with the help of
> > the new helper function, before their "late" suspend callbacks run
> > (and analogously during resume).
> >
> > In addition, since the device interrups are now disabled before the
> > CPU has turned all interrupts off and the CPU will ACK the interrupts
> > setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
> > any wake-up interrupts are pending and abort suspend if that's the
> > case.
>
> One thing about this isn't clear: the distinction between "wake-up"
> interrupts and other interrupts.
>
> In an ideal world, the only pending interrupts during sysdev_suspend
> would be wake-up interrupts, because drivers would have prevented their
> devices from generating any other kind of IRQ and would have done all
> the necessary synchronization as part of their suspend (_not_
> suspend_late) methods. Thus there would be no need to distinguish
> between wake-up and non-wake-up interrupts.
>
> So perhaps you're worried about drivers that aren't sufficiently
> clever. Or is something deeper going on?

Some drivers leave interrupts enabled during suspend on purpose and mark
them as "wake-up interrupts" so that the platform can abort suspend if any
of them is pending at the time the "enter suspend" hook is called (this doesn't
happen on x86 AFAICS).

However, after the $subject patch the CPU will ACK those interrupts if they
happen between suspend_device_irqs() and local_irq_disable(), so the platform
won't see them as pending. Instead, they will have IRQ_PENDING set in
desc->status, so we check if this is the case.

Thanks,
Rafael

2009-03-08 03:53:43

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Sat, 7 Mar 2009, Rafael J. Wysocki wrote:

> > One thing about this isn't clear: the distinction between "wake-up"
> > interrupts and other interrupts.
> >
> > In an ideal world, the only pending interrupts during sysdev_suspend
> > would be wake-up interrupts, because drivers would have prevented their
> > devices from generating any other kind of IRQ and would have done all
> > the necessary synchronization as part of their suspend (_not_
> > suspend_late) methods. Thus there would be no need to distinguish
> > between wake-up and non-wake-up interrupts.
> >
> > So perhaps you're worried about drivers that aren't sufficiently
> > clever. Or is something deeper going on?
>
> Some drivers leave interrupts enabled during suspend on purpose and mark
> them as "wake-up interrupts" so that the platform can abort suspend if any
> of them is pending at the time the "enter suspend" hook is called (this doesn't
> happen on x86 AFAICS).
>
> However, after the $subject patch the CPU will ACK those interrupts if they
> happen between suspend_device_irqs() and local_irq_disable(), so the platform
> won't see them as pending. Instead, they will have IRQ_PENDING set in
> desc->status, so we check if this is the case.

You didn't answer my question. Why bother to distinguish between
"wake-up" interrupts and non-"wake-up" interrupts?

In other words, why not simply abort the suspend if IRQ_PENDING is set
for _any_ interrupt during sysdev_suspend()?

Alan Stern

2009-03-08 10:01:04

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Sunday 08 March 2009, Alan Stern wrote:
> On Sat, 7 Mar 2009, Rafael J. Wysocki wrote:
>
> > > One thing about this isn't clear: the distinction between "wake-up"
> > > interrupts and other interrupts.
> > >
> > > In an ideal world, the only pending interrupts during sysdev_suspend
> > > would be wake-up interrupts, because drivers would have prevented their
> > > devices from generating any other kind of IRQ and would have done all
> > > the necessary synchronization as part of their suspend (_not_
> > > suspend_late) methods. Thus there would be no need to distinguish
> > > between wake-up and non-wake-up interrupts.
> > >
> > > So perhaps you're worried about drivers that aren't sufficiently
> > > clever. Or is something deeper going on?
> >
> > Some drivers leave interrupts enabled during suspend on purpose and mark
> > them as "wake-up interrupts" so that the platform can abort suspend if any
> > of them is pending at the time the "enter suspend" hook is called (this doesn't
> > happen on x86 AFAICS).
> >
> > However, after the $subject patch the CPU will ACK those interrupts if they
> > happen between suspend_device_irqs() and local_irq_disable(), so the platform
> > won't see them as pending. Instead, they will have IRQ_PENDING set in
> > desc->status, so we check if this is the case.
>
> You didn't answer my question. Why bother to distinguish between
> "wake-up" interrupts and non-"wake-up" interrupts?

Sorry, I thought it followed from what I wrote.

> In other words, why not simply abort the suspend if IRQ_PENDING is set
> for _any_ interrupt during sysdev_suspend()?

The "wake-up" ones are _intentionally_ left enabled, while the other ones may
be left enabled by mistake. The check is intended to prevent the current
behavior from changing (ie. suspend is aborted if any "wake-up" interrupts
are pending) and since the platforms only check for the "wake-up" interrupts,
it doesn't go any further. Moreover, I think it might introduce a regression
if it did.

Thanks,
Rafael

2009-03-08 12:37:25

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Sun, 8 Mar 2009, Rafael J. Wysocki wrote:

> > > > So perhaps you're worried about drivers that aren't sufficiently
> > > > clever. Or is something deeper going on?

> > In other words, why not simply abort the suspend if IRQ_PENDING is set
> > for _any_ interrupt during sysdev_suspend()?
>
> The "wake-up" ones are _intentionally_ left enabled, while the other ones may
> be left enabled by mistake. The check is intended to prevent the current
> behavior from changing (ie. suspend is aborted if any "wake-up" interrupts
> are pending) and since the platforms only check for the "wake-up" interrupts,
> it doesn't go any further. Moreover, I think it might introduce a regression
> if it did.

So it _is_ because you are worried about drivers that aren't
sufficiently clever. If the drivers did their job correctly then there
wouldn't be any pending non-"wake-up" interrupts to confuse matters.

Alan Stern

2009-03-08 17:21:21

by Linus Torvalds

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)



On Sat, 7 Mar 2009, Alan Stern wrote:
>
> You didn't answer my question. Why bother to distinguish between
> "wake-up" interrupts and non-"wake-up" interrupts?
>
> In other words, why not simply abort the suspend if IRQ_PENDING is set
> for _any_ interrupt during sysdev_suspend()?

.. because some drivers might not actually shut down the hardware until
they get to "suspend_late"? If even then, for that matter - a driver may
simply not care, knowing that the hardware will be powered off, and will
be re-initialized at resume.

The thinking that you have to shut your hardware down at "->suspend()"
time is a _disease_. There are literally classes of hardware out there
where that would be an outright _bug_, like for a PCI bridge device. For
many devices, "suspend()" has to be the phase where you shut down the
_external_ stuff (eg for a disk controller, it's when you'd flush and stop
your disks), but the controller itself may well be alive until later.

Linus

2009-03-08 19:28:35

by Frans Pop

[permalink] [raw]
Subject: Re: [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts

(Most CCs dropped.)

Hi Rafael,

Rafael J. Wysocki wrote:
> The following patches modifiy the way in which we handle disabling
> interrupts during suspend and enabling them during resume. They also
> change the ordering of the core suspend and hibernation code to take
> advantage of the new approach to the interrupts and modify the PCI PM
> core to avoid a few problems.

I've given this series a try on my HP 2510p. I've seen no regressions
with suspend to RAM.

Below is a diff between suspend/resume dmesg from before (based on rc5)
and after (rc7 + series) the patch, with some comments.
Nothing looks really wrong, but there are some surprising changes.

Essentially JFYI though.

Cheers,
FJP

PM: Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.00 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
Suspending console(s) (use no_console_suspend to debug)
sd 0:0:0:0: [sda] Synchronizing SCSI cache
sd 0:0:0:0: [sda] Stopping disk
ACPI handle has no context!
ACPI handle has no context!
sdhci-pci 0000:02:06.2: PME# disabled
sdhci-pci 0000:02:06.2: PCI INT C disabled
ACPI handle has no context!
ACPI handle has no context!
# Bogus: result of using wireless instead of wired networking.
+iwlagn 0000:10:00.0: PCI INT A disabled
ata2: port disabled. ignoring.
ata_piix 0000:00:1f.1: PCI INT A disabled
ehci_hcd 0000:00:1d.7: PCI INT A disabled
ehci_hcd 0000:00:1d.7: PME# disabled
uhci_hcd 0000:00:1d.2: PCI INT C disabled
uhci_hcd 0000:00:1d.1: PCI INT B disabled
uhci_hcd 0000:00:1d.0: PCI INT A disabled
HDA Intel 0000:00:1b.0: PCI INT A disabled
HDA Intel 0000:00:1b.0: power state changed by ACPI to D3
ehci_hcd 0000:00:1a.7: PCI INT C disabled
ehci_hcd 0000:00:1a.7: PME# disabled
uhci_hcd 0000:00:1a.1: PCI INT B disabled
uhci_hcd 0000:00:1a.0: PCI INT A disabled
e1000e 0000:00:19.0: PME# enabled
e1000e 0000:00:19.0: wake-up capability enabled by ACPI
e1000e 0000:00:19.0: PME# enabled
e1000e 0000:00:19.0: wake-up capability enabled by ACPI
e1000e 0000:00:19.0: PCI INT A disabled
ACPI handle has no context!
# This has moved up a bit. Looks more logical.
+ricoh-mmc: Suspending.
+ricoh-mmc: Controller is now re-enabled.
ACPI: Preparing to enter system sleep state S3
Disabling non-boot CPUs ...
CPU 1 is now offline
SMP alternatives: switching to UP code
CPU0 attaching NULL sched-domain.
CPU1 attaching NULL sched-domain.
CPU0 attaching NULL sched-domain.
CPU1 is down
-ricoh-mmc: Suspending.
-ricoh-mmc: Controller is now re-enabled.
Extended CMOS year: 2000

Back to C!
+CPU0: Thermal monitoring enabled (TM2)
Extended CMOS year: 2000
# This whole block has moved up before early config space restores.
# No changes in the block itself.
+Enabling non-boot CPUs ...
+SMP alternatives: switching to SMP code
+Booting processor 1 APIC 0x1 ip 0x6000
+Initializing CPU#1
+Calibrating delay using timer specific routine.. 2660.04 BogoMIPS (lpj=5320097)
+CPU: L1 I cache: 32K, L1 D cache: 32K
+CPU: L2 cache: 2048K
+[ds] using Core 2/Atom configuration
+CPU: Physical Processor ID: 0
+CPU: Processor Core ID: 1
+CPU1: Thermal monitoring enabled (TM2)
+CPU1: Intel(R) Core(TM)2 Duo CPU U7700 @ 1.33GHz stepping 0d
+CPU0 attaching NULL sched-domain.
+Switched to high resolution mode on CPU 1
+CPU0 attaching sched-domain:
+ domain 0: span 0-1 level MC
+ groups: 0 1
+CPU1 attaching sched-domain:
+ domain 0: span 0-1 level MC
+ groups: 1 0
+CPU1 is up
+ACPI: Waking up from system sleep state S3
pci 0000:00:02.0: restoring config space at offset 0x8 (was 0x1, writing 0x2001)
# These don't need restoring anymore?
-pci 0000:00:02.1: restoring config space at offset 0x4 (was 0x4, writing 0xe0500004)
-pci 0000:00:02.1: restoring config space at offset 0x1 (was 0x900000, writing 0x900007)
-pci 0000:00:03.0: restoring config space at offset 0xf (was 0x100, writing 0x1ff)
-pci 0000:00:03.0: restoring config space at offset 0x4 (was 0xfed12004, writing 0xe0600004)
-pci 0000:00:03.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
-pci 0000:00:03.2: restoring config space at offset 0x8 (was 0x1, writing 0x2031)
-pci 0000:00:03.2: restoring config space at offset 0x7 (was 0x1, writing 0x2021)
-pci 0000:00:03.2: restoring config space at offset 0x6 (was 0x1, writing 0x2019)
-pci 0000:00:03.2: restoring config space at offset 0x5 (was 0x1, writing 0x2011)
-pci 0000:00:03.2: restoring config space at offset 0x4 (was 0x1, writing 0x2009)
-pci 0000:00:03.2: restoring config space at offset 0x1 (was 0xb00000, writing 0xb00001)
serial 0000:00:03.3: restoring config space at offset 0xf (was 0x200, writing 0x20a)
serial 0000:00:03.3: restoring config space at offset 0x5 (was 0x0, writing 0xe0601000)
serial 0000:00:03.3: restoring config space at offset 0x4 (was 0x1, writing 0x2041)
serial 0000:00:03.3: restoring config space at offset 0x1 (was 0xb00000, writing 0xb00007)
e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
e1000e 0000:00:19.0: restoring config space at offset 0x6 (was 0x1, writing 0x2061)
e1000e 0000:00:19.0: restoring config space at offset 0x5 (was 0x0, writing 0xe0640000)
e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
# These have moved down to late resume.
-uhci_hcd 0000:00:1a.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
-uhci_hcd 0000:00:1a.0: restoring config space at offset 0x8 (was 0x1, writing 0x2081)
-uhci_hcd 0000:00:1a.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
-uhci_hcd 0000:00:1a.1: restoring config space at offset 0xf (was 0x200, writing 0x20a)
-uhci_hcd 0000:00:1a.1: restoring config space at offset 0x8 (was 0x1, writing 0x20a1)
-uhci_hcd 0000:00:1a.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
ehci_hcd 0000:00:1a.7: restoring config space at offset 0xf (was 0x300, writing 0x30b)
ehci_hcd 0000:00:1a.7: restoring config space at offset 0x4 (was 0x0, writing 0xe0641000)
ehci_hcd 0000:00:1a.7: restoring config space at offset 0x1 (was 0x2900000, writing 0x2900002)
HDA Intel 0000:00:1b.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
HDA Intel 0000:00:1b.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
HDA Intel 0000:00:1b.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100002)
pcieport-driver 0000:00:1c.0: restoring config space at offset 0xf (was 0x100, writing 0x4010a)
pcieport-driver 0000:00:1c.0: restoring config space at offset 0x9 (was 0x10001, writing 0x1fff1)
pcieport-driver 0000:00:1c.0: restoring config space at offset 0x8 (was 0x0, writing 0xfff0)
pcieport-driver 0000:00:1c.0: restoring config space at offset 0x7 (was 0x0, writing 0x200000f0)
pcieport-driver 0000:00:1c.0: restoring config space at offset 0x6 (was 0x0, writing 0x80800)
pcieport-driver 0000:00:1c.0: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
pcieport-driver 0000:00:1c.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
pcieport-driver 0000:00:1c.1: restoring config space at offset 0xf (was 0x200, writing 0x4020a)
pcieport-driver 0000:00:1c.1: restoring config space at offset 0x9 (was 0x10001, writing 0x1fff1)
pcieport-driver 0000:00:1c.1: restoring config space at offset 0x8 (was 0x0, writing 0xe000e000)
pcieport-driver 0000:00:1c.1: restoring config space at offset 0x7 (was 0x0, writing 0xf0)
pcieport-driver 0000:00:1c.1: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
pcieport-driver 0000:00:1c.1: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
# These have moved down to late resume.
-uhci_hcd 0000:00:1d.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
-uhci_hcd 0000:00:1d.0: restoring config space at offset 0x8 (was 0x1, writing 0x20c1)
-uhci_hcd 0000:00:1d.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
-uhci_hcd 0000:00:1d.1: restoring config space at offset 0xf (was 0x200, writing 0x20b)
-uhci_hcd 0000:00:1d.1: restoring config space at offset 0x8 (was 0x1, writing 0x20e1)
-uhci_hcd 0000:00:1d.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
-uhci_hcd 0000:00:1d.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
-uhci_hcd 0000:00:1d.2: restoring config space at offset 0x8 (was 0x1, writing 0x2101)
-uhci_hcd 0000:00:1d.2: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
ehci_hcd 0000:00:1d.7: restoring config space at offset 0xf (was 0x100, writing 0x10a)
ehci_hcd 0000:00:1d.7: restoring config space at offset 0x4 (was 0x0, writing 0xe0648000)
ehci_hcd 0000:00:1d.7: restoring config space at offset 0x1 (was 0x2900000, writing 0x2900002)
# These have disappeared.
-pci 0000:00:1e.0: restoring config space at offset 0x9 (was 0x10001, writing 0x83f18001)
-pci 0000:00:1e.0: restoring config space at offset 0x8 (was 0x0, writing 0xe030e010)
-pci 0000:00:1e.0: restoring config space at offset 0x7 (was 0x228000f0, writing 0x22803030)
-pci 0000:00:1e.0: restoring config space at offset 0x1 (was 0x100007, writing 0x100107)
# First two moved to late resume.
# The third already happened during late resume (duplicated).
-ata_piix 0000:00:1f.1: restoring config space at offset 0xf (was 0x100, writing 0x10a)
-ata_piix 0000:00:1f.1: restoring config space at offset 0x8 (was 0xc01, writing 0x2121)
-ata_piix 0000:00:1f.1: restoring config space at offset 0x1 (was 0x2800005, writing 0x2880005)
iwlagn 0000:10:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
iwlagn 0000:10:00.0: restoring config space at offset 0x4 (was 0x4, writing 0xe0000004)
iwlagn 0000:10:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
iwlagn 0000:10:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100006)
yenta_cardbus 0000:02:06.0: restoring config space at offset 0xf (was 0x3000100, writing 0x580010b)
yenta_cardbus 0000:02:06.0: restoring config space at offset 0xe (was 0x0, writing 0x34fc)
yenta_cardbus 0000:02:06.0: restoring config space at offset 0xd (was 0x0, writing 0x3400)
yenta_cardbus 0000:02:06.0: restoring config space at offset 0xc (was 0x0, writing 0x30fc)
yenta_cardbus 0000:02:06.0: restoring config space at offset 0xb (was 0x0, writing 0x3000)
yenta_cardbus 0000:02:06.0: restoring config space at offset 0xa (was 0x0, writing 0x87fff000)
yenta_cardbus 0000:02:06.0: restoring config space at offset 0x9 (was 0x0, writing 0x84000000)
yenta_cardbus 0000:02:06.0: restoring config space at offset 0x8 (was 0x0, writing 0x83fff000)
yenta_cardbus 0000:02:06.0: restoring config space at offset 0x7 (was 0x0, writing 0x80000000)
yenta_cardbus 0000:02:06.0: restoring config space at offset 0x6 (was 0x0, writing 0xb0060302)
yenta_cardbus 0000:02:06.0: restoring config space at offset 0x4 (was 0x0, writing 0xe0100000)
yenta_cardbus 0000:02:06.0: restoring config space at offset 0x3 (was 0x820000, writing 0x82a800)
yenta_cardbus 0000:02:06.0: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100007)
ohci1394 0000:02:06.1: restoring config space at offset 0xf (was 0x4020200, writing 0x4020205)
ohci1394 0000:02:06.1: restoring config space at offset 0x4 (was 0x0, writing 0xe0101000)
ohci1394 0000:02:06.1: restoring config space at offset 0x3 (was 0x800000, writing 0x804010)
ohci1394 0000:02:06.1: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100006)
sdhci-pci 0000:02:06.2: restoring config space at offset 0xf (was 0x300, writing 0x30a)
sdhci-pci 0000:02:06.2: restoring config space at offset 0x4 (was 0x0, writing 0xe0102000)
sdhci-pci 0000:02:06.2: restoring config space at offset 0x3 (was 0x800000, writing 0x804010)
sdhci-pci 0000:02:06.2: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100006)
# Some changes; a lot just got dropped.
-ricoh-mmc 0000:02:06.3: restoring config space at offset 0xf (was 0x300, writing 0xffffffff)
+ricoh-mmc 0000:02:06.3: restoring config space at offset 0xf (was 0x300, writing 0x30a)
-ricoh-mmc 0000:02:06.3: restoring config space at offset 0xe (was 0x0, writing 0xffffffff)
-ricoh-mmc 0000:02:06.3: restoring config space at offset 0xd (was 0x80, writing 0xffffffff)
-ricoh-mmc 0000:02:06.3: restoring config space at offset 0xc (was 0x0, writing 0xffffffff)
-ricoh-mmc 0000:02:06.3: restoring config space at offset 0xb (was 0x30c9103c, writing 0xffffffff)
-ricoh-mmc 0000:02:06.3: restoring config space at offset 0xa (was 0x0, writing 0xffffffff)
-ricoh-mmc 0000:02:06.3: restoring config space at offset 0x9 (was 0x0, writing 0xffffffff)
-ricoh-mmc 0000:02:06.3: restoring config space at offset 0x8 (was 0x0, writing 0xffffffff)
-ricoh-mmc 0000:02:06.3: restoring config space at offset 0x7 (was 0x0, writing 0xffffffff)
-ricoh-mmc 0000:02:06.3: restoring config space at offset 0x6 (was 0x0, writing 0xffffffff)
-ricoh-mmc 0000:02:06.3: restoring config space at offset 0x5 (was 0x0, writing 0xffffffff)
-ricoh-mmc 0000:02:06.3: restoring config space at offset 0x4 (was 0x0, writing 0xffffffff)
+ricoh-mmc 0000:02:06.3: restoring config space at offset 0x4 (was 0x0, writing 0xe0103000)
-ricoh-mmc 0000:02:06.3: restoring config space at offset 0x3 (was 0x800000, writing 0xffffffff)
+ricoh-mmc 0000:02:06.3: restoring config space at offset 0x3 (was 0x800000, writing 0x804010)
-ricoh-mmc 0000:02:06.3: restoring config space at offset 0x2 (was 0x8800011, writing 0xffffffff)
-ricoh-mmc 0000:02:06.3: restoring config space at offset 0x1 (was 0x2100000, writing 0xffffffff)
+ricoh-mmc 0000:02:06.3: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100006)
-ricoh-mmc 0000:02:06.3: restoring config space at offset 0x0 (was 0x8431180, writing 0xffffffff)
ricoh-mmc: Resuming.
ricoh-mmc: Controller is now disabled.
-Enabling non-boot CPUs ...
-SMP alternatives: switching to SMP code
-Booting processor 1 APIC 0x1 ip 0x6000
-Initializing CPU#1
-Calibrating delay using timer specific routine.. 2660.07 BogoMIPS (lpj=5320158)
-CPU: L1 I cache: 32K, L1 D cache: 32K
-CPU: L2 cache: 2048K
-[ds] using Core 2/Atom configuration
-CPU: Physical Processor ID: 0
-CPU: Processor Core ID: 1
-CPU1: Thermal monitoring enabled (TM2)
-x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
-CPU1: Intel(R) Core(TM)2 Duo CPU U7700 @ 1.33GHz stepping 0d
-CPU0 attaching NULL sched-domain.
-Switched to high resolution mode on CPU 1
-CPU0 attaching sched-domain:
- domain 0: span 0-1 level MC
- groups: 0 1
-CPU1 attaching sched-domain:
- domain 0: span 0-1 level MC
- groups: 1 0
-CPU1 is up
-ACPI: Waking up from system sleep state S3
ACPI: EC: non-query interrupt received, switching to interrupt mode
pci 0000:00:02.0: restoring config space at offset 0x1 (was 0x900403, writing 0x900003)
pci 0000:00:02.0: PME# disabled
pci 0000:00:02.1: PME# disabled
pci 0000:00:03.0: PME# disabled
pci 0000:00:03.2: PME# disabled
e1000e 0000:00:19.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
e1000e 0000:00:19.0: setting latency timer to 64
e1000e 0000:00:19.0: wake-up capability disabled by ACPI
e1000e 0000:00:19.0: PME# disabled
e1000e 0000:00:19.0: wake-up capability disabled by ACPI
e1000e 0000:00:19.0: PME# disabled
e1000e 0000:00:19.0: irq 26 for MSI/MSI-X
+uhci_hcd 0000:00:1a.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
+uhci_hcd 0000:00:1a.0: restoring config space at offset 0x8 (was 0x1, writing 0x2081)
+uhci_hcd 0000:00:1a.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
uhci_hcd 0000:00:1a.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
uhci_hcd 0000:00:1a.0: setting latency timer to 64
usb usb1: root hub lost power or was reset
+uhci_hcd 0000:00:1a.1: restoring config space at offset 0xf (was 0x200, writing 0x20a)
+uhci_hcd 0000:00:1a.1: restoring config space at offset 0x8 (was 0x1, writing 0x20a1)
+uhci_hcd 0000:00:1a.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
uhci_hcd 0000:00:1a.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
uhci_hcd 0000:00:1a.1: setting latency timer to 64
usb usb3: root hub lost power or was reset
ehci_hcd 0000:00:1a.7: PME# disabled
ehci_hcd 0000:00:1a.7: PCI INT C -> GSI 18 (level, low) -> IRQ 18
ehci_hcd 0000:00:1a.7: setting latency timer to 64
ehci_hcd 0000:00:1a.7: PME# disabled
# Called twice now?
HDA Intel 0000:00:1b.0: power state changed by ACPI to D0
+HDA Intel 0000:00:1b.0: power state changed by ACPI to D0
HDA Intel 0000:00:1b.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
HDA Intel 0000:00:1b.0: setting latency timer to 64
pcieport-driver 0000:00:1c.0: setting latency timer to 64
pcieport-driver 0000:00:1c.1: setting latency timer to 64
+uhci_hcd 0000:00:1d.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
+uhci_hcd 0000:00:1d.0: restoring config space at offset 0x8 (was 0x1, writing 0x20c1)
+uhci_hcd 0000:00:1d.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
uhci_hcd 0000:00:1d.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
uhci_hcd 0000:00:1d.0: setting latency timer to 64
usb usb5: root hub lost power or was reset
+uhci_hcd 0000:00:1d.1: restoring config space at offset 0xf (was 0x200, writing 0x20b)
+uhci_hcd 0000:00:1d.1: restoring config space at offset 0x8 (was 0x1, writing 0x20e1)
+uhci_hcd 0000:00:1d.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
uhci_hcd 0000:00:1d.1: PCI INT B -> GSI 22 (level, low) -> IRQ 22
uhci_hcd 0000:00:1d.1: setting latency timer to 64
usb usb6: root hub lost power or was reset
+uhci_hcd 0000:00:1d.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
+uhci_hcd 0000:00:1d.2: restoring config space at offset 0x8 (was 0x1, writing 0x2101)
+uhci_hcd 0000:00:1d.2: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
uhci_hcd 0000:00:1d.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18
uhci_hcd 0000:00:1d.2: setting latency timer to 64
usb usb7: root hub lost power or was reset
ehci_hcd 0000:00:1d.7: PME# disabled
ehci_hcd 0000:00:1d.7: PCI INT A -> GSI 20 (level, low) -> IRQ 20
ehci_hcd 0000:00:1d.7: setting latency timer to 64
ehci_hcd 0000:00:1d.7: PME# disabled
pci 0000:00:1e.0: setting latency timer to 64
+ata_piix 0000:00:1f.1: restoring config space at offset 0xf (was 0x100, writing 0x10a)
+ata_piix 0000:00:1f.1: restoring config space at offset 0x8 (was 0xc01, writing 0x2121)
ata_piix 0000:00:1f.1: restoring config space at offset 0x1 (was 0x2800005, writing 0x2880005)
ata_piix 0000:00:1f.1: PCI INT A -> GSI 16 (level, low) -> IRQ 16
ata_piix 0000:00:1f.1: setting latency timer to 64
ata2: port disabled. ignoring.
ACPI Exception (exoparg2-0445): AE_AML_PACKAGE_LIMIT, Index (000000005) is beyond end of object [20081204]
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.C2C3] (Node ffff88007e01dea0), AE_AML_PACKAGE_LIMIT
ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.C003.C0F6.C3F3._STM] (Node ffff88007e043de0), AE_AML_PACKAGE_LIMIT
ata1: ACPI set timing mode failed (status=0x300b)
# Remaining differences are bogus: result of using wireless instead of wired networking.
+iwlagn 0000:10:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
+iwlagn 0000:10:00.0: irq 27 for MSI/MSI-X
ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[19] MMIO=[e0101000-e01017ff] Max Packet=[2048] IR/IT contexts=[4/4]
sdhci-pci 0000:02:06.2: PCI INT C -> GSI 20 (level, low) -> IRQ 20
+Registered led device: iwl-phy0:radio
+Registered led device: iwl-phy0:assoc
+Registered led device: iwl-phy0:RX
+Registered led device: iwl-phy0:TX
sd 0:0:0:0: [sda] Starting disk
ata1.01: ACPI cmd ef/03:0c:00:00:00:b0 filtered out
ata1.01: ACPI cmd ef/03:40:00:00:00:b0 filtered out
ata1.00: ACPI cmd ef/03:01:00:00:00:a0 filtered out
ata1.00: ACPI cmd ef/03:45:00:00:00:a0 filtered out
ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
ata1.00: ACPI cmd b1/c1:00:00:00:00:a0 filtered out
ata1.00: ACPI cmd c6/00:10:00:00:00:a0 succeeded
-e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
-0000:00:19.0: eth0: 10/100 speed: disabling TSO
ata1.00: configured for UDMA/100
ata1.01: configured for MWDMA2
ata1.00: configured for UDMA/100
ata1.01: configured for MWDMA2
ata1: EH complete
sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors: (120 GB/111 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors: (120 GB/111 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
usb 1-1: reset full speed USB device using uhci_hcd and address 2
usb 5-2: reset full speed USB device using uhci_hcd and address 2
pci 0000:00:02.0: restoring config space at offset 0x1 (was 0x900403, writing 0x900003)
pci 0000:00:02.0: setting latency timer to 64
Restarting tasks ... done.

2009-03-08 20:40:29

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Sun, 8 Mar 2009, Linus Torvalds wrote:

> On Sat, 7 Mar 2009, Alan Stern wrote:
> >
> > You didn't answer my question. Why bother to distinguish between
> > "wake-up" interrupts and non-"wake-up" interrupts?
> >
> > In other words, why not simply abort the suspend if IRQ_PENDING is set
> > for _any_ interrupt during sysdev_suspend()?
>
> .. because some drivers might not actually shut down the hardware until
> they get to "suspend_late"? If even then, for that matter - a driver may
> simply not care, knowing that the hardware will be powered off, and will
> be re-initialized at resume.
>
> The thinking that you have to shut your hardware down at "->suspend()"
> time is a _disease_. There are literally classes of hardware out there
> where that would be an outright _bug_, like for a PCI bridge device. For
> many devices, "suspend()" has to be the phase where you shut down the
> _external_ stuff (eg for a disk controller, it's when you'd flush and stop
> your disks), but the controller itself may well be alive until later.

Yes, certainly. I agree completely.

But there is a difference between shutting down the hardware and merely
preventing it from generating interrupt requests. If a device remains
capable of generating IRQs after its driver's suspend method has run,
the driver runs the risk of having its handler called at a time when it
isn't prepared to cope correctly. Of course, this will depend on the
details of how the driver is written.

There have been examples in the past of devices that, for one reason or
another, _did_ generate IRQs at inconvenient times. The hardware or
the BIOS may have done improper initialization, for example. On a
shared IRQ this led to interrupt storms. IIRC, the solution was to add
a PCI quirk routine to disable IRQ generation at an early stage.
Didn't e100 have this problem?

Alan Stern

2009-03-08 20:51:28

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts

On Sunday 08 March 2009, Frans Pop wrote:
> (Most CCs dropped.)
>
> Hi Rafael,

Hi Frans,

> Rafael J. Wysocki wrote:
> > The following patches modifiy the way in which we handle disabling
> > interrupts during suspend and enabling them during resume. They also
> > change the ordering of the core suspend and hibernation code to take
> > advantage of the new approach to the interrupts and modify the PCI PM
> > core to avoid a few problems.
>
> I've given this series a try on my HP 2510p. I've seen no regressions
> with suspend to RAM.

Great, thanks for testing!

> Below is a diff between suspend/resume dmesg from before (based on rc5)
> and after (rc7 + series) the patch, with some comments.
> Nothing looks really wrong, but there are some surprising changes.
>
> Essentially JFYI though.
>
> Cheers,
> FJP
>
> PM: Syncing filesystems ... done.
> Freezing user space processes ... (elapsed 0.00 seconds) done.
> Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
> Suspending console(s) (use no_console_suspend to debug)
> sd 0:0:0:0: [sda] Synchronizing SCSI cache
> sd 0:0:0:0: [sda] Stopping disk
> ACPI handle has no context!
> ACPI handle has no context!
> sdhci-pci 0000:02:06.2: PME# disabled
> sdhci-pci 0000:02:06.2: PCI INT C disabled
> ACPI handle has no context!
> ACPI handle has no context!
> # Bogus: result of using wireless instead of wired networking.
> +iwlagn 0000:10:00.0: PCI INT A disabled
> ata2: port disabled. ignoring.
> ata_piix 0000:00:1f.1: PCI INT A disabled
> ehci_hcd 0000:00:1d.7: PCI INT A disabled
> ehci_hcd 0000:00:1d.7: PME# disabled
> uhci_hcd 0000:00:1d.2: PCI INT C disabled
> uhci_hcd 0000:00:1d.1: PCI INT B disabled
> uhci_hcd 0000:00:1d.0: PCI INT A disabled
> HDA Intel 0000:00:1b.0: PCI INT A disabled
> HDA Intel 0000:00:1b.0: power state changed by ACPI to D3
> ehci_hcd 0000:00:1a.7: PCI INT C disabled
> ehci_hcd 0000:00:1a.7: PME# disabled
> uhci_hcd 0000:00:1a.1: PCI INT B disabled
> uhci_hcd 0000:00:1a.0: PCI INT A disabled
> e1000e 0000:00:19.0: PME# enabled
> e1000e 0000:00:19.0: wake-up capability enabled by ACPI
> e1000e 0000:00:19.0: PME# enabled
> e1000e 0000:00:19.0: wake-up capability enabled by ACPI
> e1000e 0000:00:19.0: PCI INT A disabled
> ACPI handle has no context!
> # This has moved up a bit. Looks more logical.

This is a result of patch 2/8, intentional.

> +ricoh-mmc: Suspending.
> +ricoh-mmc: Controller is now re-enabled.
> ACPI: Preparing to enter system sleep state S3
> Disabling non-boot CPUs ...
> CPU 1 is now offline
> SMP alternatives: switching to UP code
> CPU0 attaching NULL sched-domain.
> CPU1 attaching NULL sched-domain.
> CPU0 attaching NULL sched-domain.
> CPU1 is down
> -ricoh-mmc: Suspending.
> -ricoh-mmc: Controller is now re-enabled.
> Extended CMOS year: 2000
>
> Back to C!
> +CPU0: Thermal monitoring enabled (TM2)
> Extended CMOS year: 2000
> # This whole block has moved up before early config space restores.
> # No changes in the block itself.

Yes this also is an intentional result of patch 2/8.

> +Enabling non-boot CPUs ...
> +SMP alternatives: switching to SMP code
> +Booting processor 1 APIC 0x1 ip 0x6000
> +Initializing CPU#1
> +Calibrating delay using timer specific routine.. 2660.04 BogoMIPS (lpj=5320097)
> +CPU: L1 I cache: 32K, L1 D cache: 32K
> +CPU: L2 cache: 2048K
> +[ds] using Core 2/Atom configuration
> +CPU: Physical Processor ID: 0
> +CPU: Processor Core ID: 1
> +CPU1: Thermal monitoring enabled (TM2)
> +CPU1: Intel(R) Core(TM)2 Duo CPU U7700 @ 1.33GHz stepping 0d
> +CPU0 attaching NULL sched-domain.
> +Switched to high resolution mode on CPU 1
> +CPU0 attaching sched-domain:
> + domain 0: span 0-1 level MC
> + groups: 0 1
> +CPU1 attaching sched-domain:
> + domain 0: span 0-1 level MC
> + groups: 1 0
> +CPU1 is up
> +ACPI: Waking up from system sleep state S3
> pci 0000:00:02.0: restoring config space at offset 0x8 (was 0x1, writing 0x2001)
> # These don't need restoring anymore?

I think they generally do, but the restored values may (and often are)
identical to the current ones.

> -pci 0000:00:02.1: restoring config space at offset 0x4 (was 0x4, writing 0xe0500004)
> -pci 0000:00:02.1: restoring config space at offset 0x1 (was 0x900000, writing 0x900007)
> -pci 0000:00:03.0: restoring config space at offset 0xf (was 0x100, writing 0x1ff)
> -pci 0000:00:03.0: restoring config space at offset 0x4 (was 0xfed12004, writing 0xe0600004)
> -pci 0000:00:03.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
> -pci 0000:00:03.2: restoring config space at offset 0x8 (was 0x1, writing 0x2031)
> -pci 0000:00:03.2: restoring config space at offset 0x7 (was 0x1, writing 0x2021)
> -pci 0000:00:03.2: restoring config space at offset 0x6 (was 0x1, writing 0x2019)
> -pci 0000:00:03.2: restoring config space at offset 0x5 (was 0x1, writing 0x2011)
> -pci 0000:00:03.2: restoring config space at offset 0x4 (was 0x1, writing 0x2009)
> -pci 0000:00:03.2: restoring config space at offset 0x1 (was 0xb00000, writing 0xb00001)
> serial 0000:00:03.3: restoring config space at offset 0xf (was 0x200, writing 0x20a)
> serial 0000:00:03.3: restoring config space at offset 0x5 (was 0x0, writing 0xe0601000)
> serial 0000:00:03.3: restoring config space at offset 0x4 (was 0x1, writing 0x2041)
> serial 0000:00:03.3: restoring config space at offset 0x1 (was 0xb00000, writing 0xb00007)
> e1000e 0000:00:19.0: restoring config space at offset 0xf (was 0x100, writing 0x10b)
> e1000e 0000:00:19.0: restoring config space at offset 0x6 (was 0x1, writing 0x2061)
> e1000e 0000:00:19.0: restoring config space at offset 0x5 (was 0x0, writing 0xe0640000)
> e1000e 0000:00:19.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100007)
> # These have moved down to late resume.

That's a bit strange. It looks like the registers changed after we had
restored them during "early" resume. So either we hadn't actually restored
them (it would be interesting to find out why), or they really changed (again,
it would be interesting to see why).

> -uhci_hcd 0000:00:1a.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
> -uhci_hcd 0000:00:1a.0: restoring config space at offset 0x8 (was 0x1, writing 0x2081)
> -uhci_hcd 0000:00:1a.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
> -uhci_hcd 0000:00:1a.1: restoring config space at offset 0xf (was 0x200, writing 0x20a)
> -uhci_hcd 0000:00:1a.1: restoring config space at offset 0x8 (was 0x1, writing 0x20a1)
> -uhci_hcd 0000:00:1a.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
> ehci_hcd 0000:00:1a.7: restoring config space at offset 0xf (was 0x300, writing 0x30b)
> ehci_hcd 0000:00:1a.7: restoring config space at offset 0x4 (was 0x0, writing 0xe0641000)
> ehci_hcd 0000:00:1a.7: restoring config space at offset 0x1 (was 0x2900000, writing 0x2900002)
> HDA Intel 0000:00:1b.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
> HDA Intel 0000:00:1b.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
> HDA Intel 0000:00:1b.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100002)
> pcieport-driver 0000:00:1c.0: restoring config space at offset 0xf (was 0x100, writing 0x4010a)
> pcieport-driver 0000:00:1c.0: restoring config space at offset 0x9 (was 0x10001, writing 0x1fff1)
> pcieport-driver 0000:00:1c.0: restoring config space at offset 0x8 (was 0x0, writing 0xfff0)
> pcieport-driver 0000:00:1c.0: restoring config space at offset 0x7 (was 0x0, writing 0x200000f0)
> pcieport-driver 0000:00:1c.0: restoring config space at offset 0x6 (was 0x0, writing 0x80800)
> pcieport-driver 0000:00:1c.0: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
> pcieport-driver 0000:00:1c.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
> pcieport-driver 0000:00:1c.1: restoring config space at offset 0xf (was 0x200, writing 0x4020a)
> pcieport-driver 0000:00:1c.1: restoring config space at offset 0x9 (was 0x10001, writing 0x1fff1)
> pcieport-driver 0000:00:1c.1: restoring config space at offset 0x8 (was 0x0, writing 0xe000e000)
> pcieport-driver 0000:00:1c.1: restoring config space at offset 0x7 (was 0x0, writing 0xf0)
> pcieport-driver 0000:00:1c.1: restoring config space at offset 0x3 (was 0x810000, writing 0x810010)
> pcieport-driver 0000:00:1c.1: restoring config space at offset 0x1 (was 0x100000, writing 0x100407)
> # These have moved down to late resume.

The last comment applies here too.

> -uhci_hcd 0000:00:1d.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
> -uhci_hcd 0000:00:1d.0: restoring config space at offset 0x8 (was 0x1, writing 0x20c1)
> -uhci_hcd 0000:00:1d.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
> -uhci_hcd 0000:00:1d.1: restoring config space at offset 0xf (was 0x200, writing 0x20b)
> -uhci_hcd 0000:00:1d.1: restoring config space at offset 0x8 (was 0x1, writing 0x20e1)
> -uhci_hcd 0000:00:1d.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
> -uhci_hcd 0000:00:1d.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
> -uhci_hcd 0000:00:1d.2: restoring config space at offset 0x8 (was 0x1, writing 0x2101)
> -uhci_hcd 0000:00:1d.2: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
> ehci_hcd 0000:00:1d.7: restoring config space at offset 0xf (was 0x100, writing 0x10a)
> ehci_hcd 0000:00:1d.7: restoring config space at offset 0x4 (was 0x0, writing 0xe0648000)
> ehci_hcd 0000:00:1d.7: restoring config space at offset 0x1 (was 0x2900000, writing 0x2900002)
> # These have disappeared.

Good.

> -pci 0000:00:1e.0: restoring config space at offset 0x9 (was 0x10001, writing 0x83f18001)
> -pci 0000:00:1e.0: restoring config space at offset 0x8 (was 0x0, writing 0xe030e010)
> -pci 0000:00:1e.0: restoring config space at offset 0x7 (was 0x228000f0, writing 0x22803030)
> -pci 0000:00:1e.0: restoring config space at offset 0x1 (was 0x100007, writing 0x100107)
> # First two moved to late resume.

Again, a bit strange.

> # The third already happened during late resume (duplicated).
> -ata_piix 0000:00:1f.1: restoring config space at offset 0xf (was 0x100, writing 0x10a)
> -ata_piix 0000:00:1f.1: restoring config space at offset 0x8 (was 0xc01, writing 0x2121)
> -ata_piix 0000:00:1f.1: restoring config space at offset 0x1 (was 0x2800005, writing 0x2880005)
> iwlagn 0000:10:00.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
> iwlagn 0000:10:00.0: restoring config space at offset 0x4 (was 0x4, writing 0xe0000004)
> iwlagn 0000:10:00.0: restoring config space at offset 0x3 (was 0x0, writing 0x10)
> iwlagn 0000:10:00.0: restoring config space at offset 0x1 (was 0x100000, writing 0x100006)
> yenta_cardbus 0000:02:06.0: restoring config space at offset 0xf (was 0x3000100, writing 0x580010b)
> yenta_cardbus 0000:02:06.0: restoring config space at offset 0xe (was 0x0, writing 0x34fc)
> yenta_cardbus 0000:02:06.0: restoring config space at offset 0xd (was 0x0, writing 0x3400)
> yenta_cardbus 0000:02:06.0: restoring config space at offset 0xc (was 0x0, writing 0x30fc)
> yenta_cardbus 0000:02:06.0: restoring config space at offset 0xb (was 0x0, writing 0x3000)
> yenta_cardbus 0000:02:06.0: restoring config space at offset 0xa (was 0x0, writing 0x87fff000)
> yenta_cardbus 0000:02:06.0: restoring config space at offset 0x9 (was 0x0, writing 0x84000000)
> yenta_cardbus 0000:02:06.0: restoring config space at offset 0x8 (was 0x0, writing 0x83fff000)
> yenta_cardbus 0000:02:06.0: restoring config space at offset 0x7 (was 0x0, writing 0x80000000)
> yenta_cardbus 0000:02:06.0: restoring config space at offset 0x6 (was 0x0, writing 0xb0060302)
> yenta_cardbus 0000:02:06.0: restoring config space at offset 0x4 (was 0x0, writing 0xe0100000)
> yenta_cardbus 0000:02:06.0: restoring config space at offset 0x3 (was 0x820000, writing 0x82a800)
> yenta_cardbus 0000:02:06.0: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100007)
> ohci1394 0000:02:06.1: restoring config space at offset 0xf (was 0x4020200, writing 0x4020205)
> ohci1394 0000:02:06.1: restoring config space at offset 0x4 (was 0x0, writing 0xe0101000)
> ohci1394 0000:02:06.1: restoring config space at offset 0x3 (was 0x800000, writing 0x804010)
> ohci1394 0000:02:06.1: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100006)
> sdhci-pci 0000:02:06.2: restoring config space at offset 0xf (was 0x300, writing 0x30a)
> sdhci-pci 0000:02:06.2: restoring config space at offset 0x4 (was 0x0, writing 0xe0102000)
> sdhci-pci 0000:02:06.2: restoring config space at offset 0x3 (was 0x800000, writing 0x804010)
> sdhci-pci 0000:02:06.2: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100006)
> # Some changes; a lot just got dropped.
> -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xf (was 0x300, writing 0xffffffff)
> +ricoh-mmc 0000:02:06.3: restoring config space at offset 0xf (was 0x300, writing 0x30a)
> -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xe (was 0x0, writing 0xffffffff)
> -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xd (was 0x80, writing 0xffffffff)
> -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xc (was 0x0, writing 0xffffffff)
> -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xb (was 0x30c9103c, writing 0xffffffff)
> -ricoh-mmc 0000:02:06.3: restoring config space at offset 0xa (was 0x0, writing 0xffffffff)
> -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x9 (was 0x0, writing 0xffffffff)
> -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x8 (was 0x0, writing 0xffffffff)
> -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x7 (was 0x0, writing 0xffffffff)
> -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x6 (was 0x0, writing 0xffffffff)
> -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x5 (was 0x0, writing 0xffffffff)
> -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x4 (was 0x0, writing 0xffffffff)
> +ricoh-mmc 0000:02:06.3: restoring config space at offset 0x4 (was 0x0, writing 0xe0103000)
> -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x3 (was 0x800000, writing 0xffffffff)
> +ricoh-mmc 0000:02:06.3: restoring config space at offset 0x3 (was 0x800000, writing 0x804010)
> -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x2 (was 0x8800011, writing 0xffffffff)
> -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x1 (was 0x2100000, writing 0xffffffff)
> +ricoh-mmc 0000:02:06.3: restoring config space at offset 0x1 (was 0x2100000, writing 0x2100006)
> -ricoh-mmc 0000:02:06.3: restoring config space at offset 0x0 (was 0x8431180, writing 0xffffffff)
> ricoh-mmc: Resuming.
> ricoh-mmc: Controller is now disabled.
> -Enabling non-boot CPUs ...
> -SMP alternatives: switching to SMP code
> -Booting processor 1 APIC 0x1 ip 0x6000
> -Initializing CPU#1
> -Calibrating delay using timer specific routine.. 2660.07 BogoMIPS (lpj=5320158)
> -CPU: L1 I cache: 32K, L1 D cache: 32K
> -CPU: L2 cache: 2048K
> -[ds] using Core 2/Atom configuration
> -CPU: Physical Processor ID: 0
> -CPU: Processor Core ID: 1
> -CPU1: Thermal monitoring enabled (TM2)
> -x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
> -CPU1: Intel(R) Core(TM)2 Duo CPU U7700 @ 1.33GHz stepping 0d
> -CPU0 attaching NULL sched-domain.
> -Switched to high resolution mode on CPU 1
> -CPU0 attaching sched-domain:
> - domain 0: span 0-1 level MC
> - groups: 0 1
> -CPU1 attaching sched-domain:
> - domain 0: span 0-1 level MC
> - groups: 1 0
> -CPU1 is up
> -ACPI: Waking up from system sleep state S3
> ACPI: EC: non-query interrupt received, switching to interrupt mode
> pci 0000:00:02.0: restoring config space at offset 0x1 (was 0x900403, writing 0x900003)
> pci 0000:00:02.0: PME# disabled
> pci 0000:00:02.1: PME# disabled
> pci 0000:00:03.0: PME# disabled
> pci 0000:00:03.2: PME# disabled
> e1000e 0000:00:19.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
> e1000e 0000:00:19.0: setting latency timer to 64
> e1000e 0000:00:19.0: wake-up capability disabled by ACPI
> e1000e 0000:00:19.0: PME# disabled
> e1000e 0000:00:19.0: wake-up capability disabled by ACPI
> e1000e 0000:00:19.0: PME# disabled
> e1000e 0000:00:19.0: irq 26 for MSI/MSI-X
> +uhci_hcd 0000:00:1a.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
> +uhci_hcd 0000:00:1a.0: restoring config space at offset 0x8 (was 0x1, writing 0x2081)
> +uhci_hcd 0000:00:1a.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
> uhci_hcd 0000:00:1a.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> uhci_hcd 0000:00:1a.0: setting latency timer to 64
> usb usb1: root hub lost power or was reset
> +uhci_hcd 0000:00:1a.1: restoring config space at offset 0xf (was 0x200, writing 0x20a)
> +uhci_hcd 0000:00:1a.1: restoring config space at offset 0x8 (was 0x1, writing 0x20a1)
> +uhci_hcd 0000:00:1a.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
> uhci_hcd 0000:00:1a.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
> uhci_hcd 0000:00:1a.1: setting latency timer to 64
> usb usb3: root hub lost power or was reset
> ehci_hcd 0000:00:1a.7: PME# disabled
> ehci_hcd 0000:00:1a.7: PCI INT C -> GSI 18 (level, low) -> IRQ 18
> ehci_hcd 0000:00:1a.7: setting latency timer to 64
> ehci_hcd 0000:00:1a.7: PME# disabled
> # Called twice now?
> HDA Intel 0000:00:1b.0: power state changed by ACPI to D0
> +HDA Intel 0000:00:1b.0: power state changed by ACPI to D0

Yeah, it's not nice. The problem is that pci_set_power_state() doesn't
check if the power state is already correct before calling the platform to
change it. The platform should cope with that, but it shouldn't be called
for the second time at all.

In fact I have a patch to change this behavior, but I consider it as a separate
thing.

> HDA Intel 0000:00:1b.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> HDA Intel 0000:00:1b.0: setting latency timer to 64
> pcieport-driver 0000:00:1c.0: setting latency timer to 64
> pcieport-driver 0000:00:1c.1: setting latency timer to 64
> +uhci_hcd 0000:00:1d.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
> +uhci_hcd 0000:00:1d.0: restoring config space at offset 0x8 (was 0x1, writing 0x20c1)
> +uhci_hcd 0000:00:1d.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
> uhci_hcd 0000:00:1d.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
> uhci_hcd 0000:00:1d.0: setting latency timer to 64
> usb usb5: root hub lost power or was reset
> +uhci_hcd 0000:00:1d.1: restoring config space at offset 0xf (was 0x200, writing 0x20b)
> +uhci_hcd 0000:00:1d.1: restoring config space at offset 0x8 (was 0x1, writing 0x20e1)
> +uhci_hcd 0000:00:1d.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
> uhci_hcd 0000:00:1d.1: PCI INT B -> GSI 22 (level, low) -> IRQ 22
> uhci_hcd 0000:00:1d.1: setting latency timer to 64
> usb usb6: root hub lost power or was reset
> +uhci_hcd 0000:00:1d.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
> +uhci_hcd 0000:00:1d.2: restoring config space at offset 0x8 (was 0x1, writing 0x2101)
> +uhci_hcd 0000:00:1d.2: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
> uhci_hcd 0000:00:1d.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18
> uhci_hcd 0000:00:1d.2: setting latency timer to 64
> usb usb7: root hub lost power or was reset
> ehci_hcd 0000:00:1d.7: PME# disabled
> ehci_hcd 0000:00:1d.7: PCI INT A -> GSI 20 (level, low) -> IRQ 20
> ehci_hcd 0000:00:1d.7: setting latency timer to 64
> ehci_hcd 0000:00:1d.7: PME# disabled
> pci 0000:00:1e.0: setting latency timer to 64
> +ata_piix 0000:00:1f.1: restoring config space at offset 0xf (was 0x100, writing 0x10a)
> +ata_piix 0000:00:1f.1: restoring config space at offset 0x8 (was 0xc01, writing 0x2121)
> ata_piix 0000:00:1f.1: restoring config space at offset 0x1 (was 0x2800005, writing 0x2880005)
> ata_piix 0000:00:1f.1: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> ata_piix 0000:00:1f.1: setting latency timer to 64
> ata2: port disabled. ignoring.
> ACPI Exception (exoparg2-0445): AE_AML_PACKAGE_LIMIT, Index (000000005) is beyond end of object [20081204]
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.C2C3] (Node ffff88007e01dea0), AE_AML_PACKAGE_LIMIT
> ACPI Error (psparse-0537): Method parse/execution failed [\_SB_.C003.C0F6.C3F3._STM] (Node ffff88007e043de0), AE_AML_PACKAGE_LIMIT
> ata1: ACPI set timing mode failed (status=0x300b)
> # Remaining differences are bogus: result of using wireless instead of wired networking.

OK

Thanks for the debugging work.

Best,
Rafael

2009-03-08 21:37:17

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Sunday 08 March 2009, Alan Stern wrote:
> On Sun, 8 Mar 2009, Linus Torvalds wrote:
>
> > On Sat, 7 Mar 2009, Alan Stern wrote:
> > >
> > > You didn't answer my question. Why bother to distinguish between
> > > "wake-up" interrupts and non-"wake-up" interrupts?
> > >
> > > In other words, why not simply abort the suspend if IRQ_PENDING is set
> > > for _any_ interrupt during sysdev_suspend()?
> >
> > .. because some drivers might not actually shut down the hardware until
> > they get to "suspend_late"? If even then, for that matter - a driver may
> > simply not care, knowing that the hardware will be powered off, and will
> > be re-initialized at resume.
> >
> > The thinking that you have to shut your hardware down at "->suspend()"
> > time is a _disease_. There are literally classes of hardware out there
> > where that would be an outright _bug_, like for a PCI bridge device. For
> > many devices, "suspend()" has to be the phase where you shut down the
> > _external_ stuff (eg for a disk controller, it's when you'd flush and stop
> > your disks), but the controller itself may well be alive until later.
>
> Yes, certainly. I agree completely.
>
> But there is a difference between shutting down the hardware and merely
> preventing it from generating interrupt requests. If a device remains
> capable of generating IRQs after its driver's suspend method has run,
> the driver runs the risk of having its handler called at a time when it
> isn't prepared to cope correctly. Of course, this will depend on the
> details of how the driver is written.
>
> There have been examples in the past of devices that, for one reason or
> another, _did_ generate IRQs at inconvenient times. The hardware or
> the BIOS may have done improper initialization, for example. On a
> shared IRQ this led to interrupt storms.

Well, we're now trying to fix exactly this problem. :-)

> IIRC, the solution was to add a PCI quirk routine to disable IRQ generation
> at an early stage. Didn't e100 have this problem?

I don't remember, sorry.

Thanks,
Rafael

2009-03-09 15:01:16

by Linus Torvalds

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)



On Sun, 8 Mar 2009, Alan Stern wrote:
>
> There have been examples in the past of devices that, for one reason or
> another, _did_ generate IRQs at inconvenient times. The hardware or
> the BIOS may have done improper initialization, for example. On a
> shared IRQ this led to interrupt storms. IIRC, the solution was to add
> a PCI quirk routine to disable IRQ generation at an early stage.
> Didn't e100 have this problem?

.. and this is exactly the reason why we've done all these changes.

There are tons of drivers that are unable to cope with interrupts that
happen after they've done their "pci_set_power_state(PCI_D3hot)".

With shared interrupts (and _another_ device still live), they do stupid
things like read the interrupt status register, getting all-ones (because
the device is dead), and then deciding that that means that that need to
handle the interrupt. And that goes on in a loop. Forever.

Or they do _that_ part right, but their suspend also free'd some data
structure, so now the interrupt handler will follow a NULL pointer and/or
scribble to freed memory. The source of bugs is infinite, and not fixable
(because, quite frankly, most device driver writers are very focused on
the hardware, and have a hard time thinking about it as part of the bigger
system - and even if they happen test suspend/resume, they probably won't
be testing it with shared interrupts, so it will work _for_them_ even if
it's totally broken).

So what all the PCI changes try to do is to basically not have the driver
do the "pci_set_power_state(PCI_D3)" at _all_, an do it in the PCI layer.
But more importantly, it needs to be done _after_ interrupts have been
disabled for this all to work. And, for exactly the same reason, the PCI
layer needs to wake the device up and restore its config space _before_
enabling interrupts again, and _before_ doing any ->resume calls.

And that, in turn, means that since we have all these ACPI ordering
things, and many cases want to use ACPI to wake things up, and/or have
delays etc, we end up actually wanting things like timer interrupts
working at that time - but not normal "device" interrupts. Because many
delays do need them, even as simple delays as the (fairly short, but not
"busy loop" short) one for turning the device back into PCI_D0 again.

So this literally explains all the re-ordering, and all the interrupt
games we now play in Rafael's patch-set. The _whole_ (and only) point is
to make it easier for device drivers, while also changing the environment
so that we can call ACPI and we can sleep even before the devices have
really resumed (or even early_resume'd).

Linus

2009-03-09 15:13:37

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)

On Mon, 9 Mar 2009, Linus Torvalds wrote:

> On Sun, 8 Mar 2009, Alan Stern wrote:
> >
> > There have been examples in the past of devices that, for one reason or
> > another, _did_ generate IRQs at inconvenient times. The hardware or
> > the BIOS may have done improper initialization, for example. On a
> > shared IRQ this led to interrupt storms. IIRC, the solution was to add
> > a PCI quirk routine to disable IRQ generation at an early stage.
> > Didn't e100 have this problem?
>
> .. and this is exactly the reason why we've done all these changes.
>
> There are tons of drivers that are unable to cope with interrupts that
> happen after they've done their "pci_set_power_state(PCI_D3hot)".
>
> With shared interrupts (and _another_ device still live), they do stupid
> things like read the interrupt status register, getting all-ones (because
> the device is dead), and then deciding that that means that that need to
> handle the interrupt. And that goes on in a loop. Forever.
>
> Or they do _that_ part right, but their suspend also free'd some data
> structure, so now the interrupt handler will follow a NULL pointer and/or
> scribble to freed memory. The source of bugs is infinite, and not fixable
> (because, quite frankly, most device driver writers are very focused on
> the hardware, and have a hard time thinking about it as part of the bigger
> system - and even if they happen test suspend/resume, they probably won't
> be testing it with shared interrupts, so it will work _for_them_ even if
> it's totally broken).
>
> So what all the PCI changes try to do is to basically not have the driver
> do the "pci_set_power_state(PCI_D3)" at _all_, an do it in the PCI layer.
> But more importantly, it needs to be done _after_ interrupts have been
> disabled for this all to work. And, for exactly the same reason, the PCI
> layer needs to wake the device up and restore its config space _before_
> enabling interrupts again, and _before_ doing any ->resume calls.
>
> And that, in turn, means that since we have all these ACPI ordering
> things, and many cases want to use ACPI to wake things up, and/or have
> delays etc, we end up actually wanting things like timer interrupts
> working at that time - but not normal "device" interrupts. Because many
> delays do need them, even as simple delays as the (fairly short, but not
> "busy loop" short) one for turning the device back into PCI_D0 again.
>
> So this literally explains all the re-ordering, and all the interrupt
> games we now play in Rafael's patch-set. The _whole_ (and only) point is
> to make it easier for device drivers, while also changing the environment
> so that we can call ACPI and we can sleep even before the devices have
> really resumed (or even early_resume'd).

I see. The unstated key point is this:

Unsophisticated drivers can still be expected to work if they
get an interrupt after their suspend method has run, _provided_
the device is still in D0. Likewise, unsophisticated drivers
can be expected to fail if they get an interrupt after the
device has been put in D3.

Hence you don't require drivers to disable interrupt generation in
their suspend methods, and you do prevent interrupts from being
delivered to drivers before changing device power states.

And hence you also go to some trouble to distinguish between IRQs
which might be received merely because the driver didn't bother to
suppress them vs. IRQs which indicate a genuine wakeup request.

Got it.

Alan Stern

2009-03-09 15:42:04

by Linus Torvalds

[permalink] [raw]
Subject: Re: [linux-pm] [RFC][PATCH][1/8] PM: Rework handling of interrupts during suspend-resume (rev. 5)



On Mon, 9 Mar 2009, Alan Stern wrote:
>
> I see. The unstated key point is this:
>
> Unsophisticated drivers [...]

Another key point is:

- _un_sophisticated is the norm, and anybody who expects otherwise is
living in some odd la-la-land together with his or her pink unicorn and
endless supplies of quaaludes.

The thing is, we have about a metric sh*tload of drivers, and many of them
are effectively written by people who don't really do kernel work, and are
basically unmaintained in the long run (ie they may be maintained while
written, but two years down the line they have a couple of hundred users
and nobody who really cares about it, because the original author long
since moved on to fancier hardware).

Linus

2009-03-14 08:44:51

by Frans Pop

[permalink] [raw]
Subject: Re: [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts

On Sunday 08 March 2009, Rafael J. Wysocki wrote:
> > # These don't need restoring anymore?
>
> I think they generally do, but the restored values may (and often are)
> identical to the current ones.
>
> > ? ?-pci 0000:00:02.1: restoring config space at offset 0x4 (was 0x4, writing 0xe0500004)
> > -pci 0000:00:02.1: restoring config space at offset 0x1 (was 0x900000, writing 0x900007)
> > -pci 0000:00:03.0: restoring config space at offset 0xf (was 0x100, writing 0x1ff)
> > -pci 0000:00:03.0: restoring config space at offset 0x4 (was 0xfed12004, writing 0xe0600004)
> > -pci 0000:00:03.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
> > -pci 0000:00:03.2: restoring config space at offset 0x8 (was 0x1, writing 0x2031)
> > -pci 0000:00:03.2: restoring config space at offset 0x7 (was 0x1, writing 0x2021)
> > -pci 0000:00:03.2: restoring config space at offset 0x6 (was 0x1, writing 0x2019)
> > -pci 0000:00:03.2: restoring config space at offset 0x5 (was 0x1, writing 0x2011)
> > -pci 0000:00:03.2: restoring config space at offset 0x4 (was 0x1, writing 0x2009)
> > -pci 0000:00:03.2: restoring config space at offset 0x1 (was 0xb00000, writing 0xb00001)
[...]
> > # These have moved down to late resume.
>
> That's a bit strange. ?It looks like the registers changed after we had
> restored them during "early" resume. ?So either we hadn't actually
> restored them (it would be interesting to find out why), or they really
> changed (again, it would be interesting to see why).
>
> > ? ?-uhci_hcd 0000:00:1a.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
> > ? ?-uhci_hcd 0000:00:1a.0: restoring config space at offset 0x8 (was 0x1, writing 0x2081)
> > ? ?-uhci_hcd 0000:00:1a.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
> > ? ?-uhci_hcd 0000:00:1a.1: restoring config space at offset 0xf (was 0x200, writing 0x20a)
> > ? ?-uhci_hcd 0000:00:1a.1: restoring config space at offset 0x8 (was 0x1, writing 0x20a1)
> > ? ?-uhci_hcd 0000:00:1a.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)

These changes look to have been reverted somehow with rc8 + your latest
patch set. Not sure if it's due to changes in the patches, or just an
effect of local circumstances (such as (un)suspending while the system
is docked). Or sun spots of course.

The "restoring config space" messages now look virtually the same
as for rc5, only some messages for the ricoh-mmc module are still
"missing", but I'm not worried about that.

Cheers,
FJP

2009-03-14 11:59:21

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts

On Saturday 14 March 2009, Frans Pop wrote:
> On Sunday 08 March 2009, Rafael J. Wysocki wrote:
> > > # These don't need restoring anymore?
> >
> > I think they generally do, but the restored values may (and often are)
> > identical to the current ones.
> >
> > > -pci 0000:00:02.1: restoring config space at offset 0x4 (was 0x4, writing 0xe0500004)
> > > -pci 0000:00:02.1: restoring config space at offset 0x1 (was 0x900000, writing 0x900007)
> > > -pci 0000:00:03.0: restoring config space at offset 0xf (was 0x100, writing 0x1ff)
> > > -pci 0000:00:03.0: restoring config space at offset 0x4 (was 0xfed12004, writing 0xe0600004)
> > > -pci 0000:00:03.2: restoring config space at offset 0xf (was 0x300, writing 0x30b)
> > > -pci 0000:00:03.2: restoring config space at offset 0x8 (was 0x1, writing 0x2031)
> > > -pci 0000:00:03.2: restoring config space at offset 0x7 (was 0x1, writing 0x2021)
> > > -pci 0000:00:03.2: restoring config space at offset 0x6 (was 0x1, writing 0x2019)
> > > -pci 0000:00:03.2: restoring config space at offset 0x5 (was 0x1, writing 0x2011)
> > > -pci 0000:00:03.2: restoring config space at offset 0x4 (was 0x1, writing 0x2009)
> > > -pci 0000:00:03.2: restoring config space at offset 0x1 (was 0xb00000, writing 0xb00001)
> [...]
> > > # These have moved down to late resume.
> >
> > That's a bit strange. It looks like the registers changed after we had
> > restored them during "early" resume. So either we hadn't actually
> > restored them (it would be interesting to find out why), or they really
> > changed (again, it would be interesting to see why).
> >
> > > -uhci_hcd 0000:00:1a.0: restoring config space at offset 0xf (was 0x100, writing 0x10a)
> > > -uhci_hcd 0000:00:1a.0: restoring config space at offset 0x8 (was 0x1, writing 0x2081)
> > > -uhci_hcd 0000:00:1a.0: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
> > > -uhci_hcd 0000:00:1a.1: restoring config space at offset 0xf (was 0x200, writing 0x20a)
> > > -uhci_hcd 0000:00:1a.1: restoring config space at offset 0x8 (was 0x1, writing 0x20a1)
> > > -uhci_hcd 0000:00:1a.1: restoring config space at offset 0x1 (was 0x2800000, writing 0x2800001)
>
> These changes look to have been reverted somehow with rc8 + your latest
> patch set. Not sure if it's due to changes in the patches, or just an
> effect of local circumstances (such as (un)suspending while the system
> is docked). Or sun spots of course.
>
> The "restoring config space" messages now look virtually the same
> as for rc5, only some messages for the ricoh-mmc module are still
> "missing", but I'm not worried about that.

Thanks for testing!

Could you please also test the last iteration of the $subject patch series
(just sent) with the appended patch applied on top and post dmesg output?

Rafael

---
drivers/pci/pci-driver.c | 23 +++++++++++++++++++++--
drivers/pci/pci.c | 5 +++++
2 files changed, 26 insertions(+), 2 deletions(-)

Index: linux-2.6/drivers/pci/pci.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -732,6 +732,9 @@ int
pci_save_state(struct pci_dev *dev)
{
int i;
+
+ dev_info(&dev->dev, "saving PCI config space\n");
+
/* XXX: 100% dword access ok here? */
for (i = 0; i < 16; i++)
pci_read_config_dword(dev, i * 4,&dev->saved_config_space[i]);
@@ -753,6 +756,8 @@ pci_restore_state(struct pci_dev *dev)
int i;
u32 val;

+ dev_info(&dev->dev, "restoring PCI config space\n");
+
/* PCI Express register must be restored first */
pci_restore_pcie_state(dev);

Index: linux-2.6/drivers/pci/pci-driver.c
===================================================================
--- linux-2.6.orig/drivers/pci/pci-driver.c
+++ linux-2.6/drivers/pci/pci-driver.c
@@ -438,10 +438,24 @@ static int pci_restore_standard_config(s
{
pci_update_current_state(pci_dev, PCI_UNKNOWN);

+ switch (pci_dev->current_state) {
+ case PCI_UNKNOWN:
+ case PCI_POWER_ERROR:
+ dev_info(&pci_dev->dev, "%s: unknown power state\n",
+ __FUNCTION__);
+ break;
+ default:
+ dev_info(&pci_dev->dev, "%s: power state D%d\n",
+ __FUNCTION__, pci_dev->current_state);
+ }
+
if (pci_dev->current_state != PCI_D0) {
int error = pci_set_power_state(pci_dev, PCI_D0);
- if (error)
+ if (error) {
+ dev_err(&pci_dev->dev,
+ "error %d while changing power state\n", error);
return error;
+ }
}

return pci_dev->state_saved ? pci_restore_state(pci_dev) : 0;
@@ -449,6 +463,8 @@ static int pci_restore_standard_config(s

static void pci_pm_default_resume_noirq(struct pci_dev *pci_dev)
{
+ dev_info(&pci_dev->dev, "%s: calling pci_restore_standard_config()\n",
+ __FUNCTION__);
pci_restore_standard_config(pci_dev);
pci_dev->state_saved = false;
pci_fixup_device(pci_fixup_resume_early, pci_dev);
@@ -615,8 +631,11 @@ static int pci_pm_resume(struct device *
* This is necessary for the suspend error path in which resume is
* called without restoring the standard config registers of the device.
*/
- if (pci_dev->state_saved)
+ if (pci_dev->state_saved) {
+ dev_info(dev, "%s: restoring standard PCI config registers\n",
+ __FUNCTION__);
pci_restore_standard_config(pci_dev);
+ }

if (pci_has_legacy_pm_support(pci_dev))
return pci_legacy_resume(dev);

2009-03-14 14:12:18

by Frans Pop

[permalink] [raw]
Subject: Re: [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts

On Saturday 14 March 2009, you wrote:
> Could you please also test the last iteration of the $subject patch
> series (just sent) with the appended patch applied on top and post
> dmesg output?

Here you are:
- boot
- STR with wireless networking
- STD with wireless networking
- STR with wired networking and killswitch on wireless

No problems seen :-)

Cheers,
FJP


Attachments:
(No filename) (368.00 B)
2.6.29-rc8-rjw-test.gz (13.85 kB)
Download all attachments

2009-03-14 22:31:25

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [RFC][PATCH][0/8] PM: Rework suspend-resume ordering to avoid problems with shared interrupts

On Saturday 14 March 2009, Frans Pop wrote:
> On Saturday 14 March 2009, you wrote:
> > Could you please also test the last iteration of the $subject patch
> > series (just sent) with the appended patch applied on top and post
> > dmesg output?
>
> Here you are:
> - boot
> - STR with wireless networking
> - STD with wireless networking
> - STR with wired networking and killswitch on wireless
>
> No problems seen :-)

Great, thanks for the log, it looks correct.

Best,
Rafael