2007-12-24 00:40:16

by Rafael J. Wysocki

Subject: [PATCH 0/3] PM: Do not destroy/create devices while suspended

Hi,

Some device drivers register CPU hotplug notifiers and use them to destroy
device objects when removing the corresponding CPUs and to create these objects
when adding the CPUs back.

Unfortunately, this is not the right thing to do during suspend/hibernation,
since in those cases the CPU hotplug notifiers are called after suspending
devices and before resuming them, so the operations in question are carried
out on the objects representing suspended devices, which shouldn't be
unregistered behind the PM core's back. Although right now it usually doesn't
lead to any practical complications, it will predictably deadlock if
gregkh-driver-pm-acquire-device-locks-prior-to-suspending.patch is applied.

The solution is to prevent drivers from removing/adding devices from within
CPU hotplug notifiers during suspend/hibernation using the FROZEN bit
in the notifier's action argument. The following three patches modify the
MSR, x86-64 MCE and cpuid drivers along these lines.
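
As a rough user-space illustration of the FROZEN-bit approach described above: the sketch below assumes that each *_FROZEN action is the plain action with an extra CPU_TASKS_FROZEN bit set. The constants are redefined locally with stand-in values, and classify_action() and enum dev_op are hypothetical names, not part of any driver. Because the switch lists only the non-frozen actions, the frozen variants fall through and no device is created or destroyed.

```c
/* Local stand-ins for the kernel's notifier action codes.
 * The numeric values are assumptions chosen for illustration. */
#define CPU_UP_PREPARE          0x0003
#define CPU_UP_CANCELED         0x0004
#define CPU_DEAD                0x0007
#define CPU_TASKS_FROZEN        0x0010
#define CPU_UP_PREPARE_FROZEN   (CPU_UP_PREPARE | CPU_TASKS_FROZEN)
#define CPU_UP_CANCELED_FROZEN  (CPU_UP_CANCELED | CPU_TASKS_FROZEN)
#define CPU_DEAD_FROZEN         (CPU_DEAD | CPU_TASKS_FROZEN)

/* Hypothetical result type: what a driver would do for a given action. */
enum dev_op { DEV_NONE, DEV_CREATE, DEV_DESTROY };

/* Mirrors the shape of the patched callbacks: only the non-frozen
 * actions are listed, so anything carrying CPU_TASKS_FROZEN matches
 * no case label and leaves the device object alone. */
static enum dev_op classify_action(unsigned long action)
{
	switch (action) {
	case CPU_UP_PREPARE:
		return DEV_CREATE;
	case CPU_UP_CANCELED:
	case CPU_DEAD:
		return DEV_DESTROY;
	}
	return DEV_NONE;
}
```

With these definitions, classify_action(CPU_UP_PREPARE) yields DEV_CREATE, while classify_action(CPU_UP_PREPARE_FROZEN) and classify_action(CPU_DEAD_FROZEN) both yield DEV_NONE.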

Thanks,
Rafael


2007-12-24 00:39:39

by Rafael J. Wysocki

Subject: [PATCH 3/3] PM: Do not destroy/create devices while suspended in cpuid.c

From: Rafael J. Wysocki <[email protected]>

The cpuid driver should not attempt to destroy/create a suspended
device.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
arch/x86/kernel/cpuid.c | 3 ---
1 file changed, 3 deletions(-)

Index: linux-2.6/arch/x86/kernel/cpuid.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpuid.c
+++ linux-2.6/arch/x86/kernel/cpuid.c
@@ -157,13 +157,10 @@ static int __cpuinit cpuid_class_cpu_cal

 	switch (action) {
 	case CPU_UP_PREPARE:
-	case CPU_UP_PREPARE_FROZEN:
 		err = cpuid_device_create(cpu);
 		break;
 	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
 	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
 		cpuid_device_destroy(cpu);
 		break;
 	}

2007-12-24 00:39:53

by Rafael J. Wysocki

Subject: [PATCH 2/3] PM: Do not destroy/create devices while suspended in mce_64.c

From: Rafael J. Wysocki <[email protected]>

The x86-64 MCE driver should not attempt to destroy/create a suspended
device.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
arch/x86/kernel/cpu/mcheck/mce_64.c | 2 --
1 file changed, 2 deletions(-)

Index: linux-2.6/arch/x86/kernel/cpu/mcheck/mce_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/cpu/mcheck/mce_64.c
+++ linux-2.6/arch/x86/kernel/cpu/mcheck/mce_64.c
@@ -862,11 +862,9 @@ mce_cpu_callback(struct notifier_block *

 	switch (action) {
 	case CPU_ONLINE:
-	case CPU_ONLINE_FROZEN:
 		mce_create_device(cpu);
 		break;
 	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
 		mce_remove_device(cpu);
 		break;
 	}

2007-12-24 00:40:40

by Rafael J. Wysocki

Subject: [PATCH 1/3] PM: Do not destroy/create devices while suspended in msr.c

From: Rafael J. Wysocki <[email protected]>

The MSR driver should not attempt to destroy/create a suspended
device.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
arch/x86/kernel/msr.c | 3 ---
1 file changed, 3 deletions(-)

Index: linux-2.6/arch/x86/kernel/msr.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/msr.c
+++ linux-2.6/arch/x86/kernel/msr.c
@@ -155,13 +155,10 @@ static int __cpuinit msr_class_cpu_callb

 	switch (action) {
 	case CPU_UP_PREPARE:
-	case CPU_UP_PREPARE_FROZEN:
 		err = msr_device_create(cpu);
 		break;
 	case CPU_UP_CANCELED:
-	case CPU_UP_CANCELED_FROZEN:
 	case CPU_DEAD:
-	case CPU_DEAD_FROZEN:
 		msr_device_destroy(cpu);
 		break;
 	}

2007-12-24 15:51:24

by Alan Stern

Subject: Re: [PATCH 0/3] PM: Do not destroy/create devices while suspended

On Mon, 24 Dec 2007, Rafael J. Wysocki wrote:

> Hi,
>
> Some device drivers register CPU hotplug notifiers and use them to destroy
> device objects when removing the corresponding CPUs and to create these objects
> when adding the CPUs back.
>
> Unfortunately, this is not the right thing to do during suspend/hibernation,
> since in those cases the CPU hotplug notifiers are called after suspending
> devices and before resuming them, so the operations in question are carried
> out on the objects representing suspended devices, which shouldn't be
> unregistered behind the PM core's back. Although right now it usually doesn't
> lead to any practical complications, it will predictably deadlock if
> gregkh-driver-pm-acquire-device-locks-prior-to-suspending.patch is applied.
>
> The solution is to prevent drivers from removing/adding devices from within
> CPU hotplug notifiers during suspend/hibernation using the FROZEN bit
> in the notifier's action argument. The following three patches modify the
> MSR, x86-64 MCE and cpuid drivers along these lines.

Do we need to worry about the possibility that when the system wakes up
from hibernation, the set of usable CPUs might be smaller than it was
beforehand? Is any special handling needed for this, or is it already
accounted for?

Alan Stern

2007-12-25 12:35:27

by Pavel Machek

Subject: Re: [PATCH 0/3] PM: Do not destroy/create devices while suspended

On Mon 2007-12-24 10:51:15, Alan Stern wrote:
> On Mon, 24 Dec 2007, Rafael J. Wysocki wrote:
>
> > Hi,
> >
> > Some device drivers register CPU hotplug notifiers and use them to destroy
> > device objects when removing the corresponding CPUs and to create these objects
> > when adding the CPUs back.
> >
> > Unfortunately, this is not the right thing to do during suspend/hibernation,
> > since in those cases the CPU hotplug notifiers are called after suspending
> > devices and before resuming them, so the operations in question are carried
> > out on the objects representing suspended devices, which shouldn't be
> > unregistered behind the PM core's back. Although right now it usually doesn't
> > lead to any practical complications, it will predictably deadlock if
> > gregkh-driver-pm-acquire-device-locks-prior-to-suspending.patch is applied.
> >
> > The solution is to prevent drivers from removing/adding devices from within
> > CPU hotplug notifiers during suspend/hibernation using the FROZEN bit
> > in the notifier's action argument. The following three patches modify the
> > MSR, x86-64 MCE and cpuid drivers along these lines.
>
> Do we need to worry about the possibility that when the system wakes up
> from hibernation, the set of usable CPUs might be smaller than it was
> beforehand? Is any special handling needed for this, or is it already
> accounted for?

That should not happen... but it does in some error cases... so
handling it would be a bonus.

Waking up with one cpu out of 8 is bad, but still way better than not
waking up at all ;-).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-12-25 12:35:55

by Pavel Machek

Subject: Re: [PATCH 1/3] PM: Do not destroy/create devices while suspended in msr.c

On Mon 2007-12-24 01:56:34, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> The MSR driver should not attempt to destroy/create a suspended
> device.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>

ACK.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-12-25 12:36:17

by Pavel Machek

Subject: Re: [PATCH 2/3] PM: Do not destroy/create devices while suspended in mce_64.c

On Mon 2007-12-24 01:57:17, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> The x86-64 MCE driver should not attempt to destroy/create a suspended
> device.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>

ACK.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-12-25 12:36:31

by Pavel Machek

Subject: Re: [PATCH 3/3] PM: Do not destroy/create devices while suspended in cpuid.c

On Mon 2007-12-24 01:57:57, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> The cpuid driver should not attempt to destroy/create a suspended
> device.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>

ACK.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-12-25 16:02:12

by Rafael J. Wysocki

Subject: Re: [PATCH 0/3] PM: Do not destroy/create devices while suspended

On Monday, 24 of December 2007, Alan Stern wrote:
> On Mon, 24 Dec 2007, Rafael J. Wysocki wrote:
>
> > Hi,
> >
> > Some device drivers register CPU hotplug notifiers and use them to destroy
> > device objects when removing the corresponding CPUs and to create these objects
> > when adding the CPUs back.
> >
> > Unfortunately, this is not the right thing to do during suspend/hibernation,
> > since in those cases the CPU hotplug notifiers are called after suspending
> > devices and before resuming them, so the operations in question are carried
> > out on the objects representing suspended devices, which shouldn't be
> > unregistered behind the PM core's back. Although right now it usually doesn't
> > lead to any practical complications, it will predictably deadlock if
> > gregkh-driver-pm-acquire-device-locks-prior-to-suspending.patch is applied.
> >
> > The solution is to prevent drivers from removing/adding devices from within
> > CPU hotplug notifiers during suspend/hibernation using the FROZEN bit
> > in the notifier's action argument. The following three patches modify the
> > MSR, x86-64 MCE and cpuid drivers along these lines.
>
> Do we need to worry about the possibility that when the system wakes up
> from hibernation, the set of usable CPUs might be smaller than it was
> beforehand?

This is possible in error conditions.

> Is any special handling needed for this, or is it already accounted for?

Hm, well. The cleanest thing would be to allow the drivers to remove the
device objects on CPU_UP_CANCELED_FROZEN, which means that we weren't able to
bring the CPU up during a resume, but still that will deadlock with
gregkh-driver-pm-acquire-device-locks-prior-to-suspending.patch.

Greetings,
Rafael

2007-12-25 19:34:38

by Rafael J. Wysocki

Subject: Re: [PATCH 0/3] PM: Do not destroy/create devices while suspended

On Tuesday, 25 of December 2007, Rafael J. Wysocki wrote:
> On Monday, 24 of December 2007, Alan Stern wrote:
> > On Mon, 24 Dec 2007, Rafael J. Wysocki wrote:
> >
> > > Hi,
> > >
> > > Some device drivers register CPU hotplug notifiers and use them to destroy
> > > device objects when removing the corresponding CPUs and to create these objects
> > > when adding the CPUs back.
> > >
> > > Unfortunately, this is not the right thing to do during suspend/hibernation,
> > > since in those cases the CPU hotplug notifiers are called after suspending
> > > devices and before resuming them, so the operations in question are carried
> > > out on the objects representing suspended devices, which shouldn't be
> > > unregistered behind the PM core's back. Although right now it usually doesn't
> > > lead to any practical complications, it will predictably deadlock if
> > > gregkh-driver-pm-acquire-device-locks-prior-to-suspending.patch is applied.
> > >
> > > The solution is to prevent drivers from removing/adding devices from within
> > > CPU hotplug notifiers during suspend/hibernation using the FROZEN bit
> > > in the notifier's action argument. The following three patches modify the
> > > MSR, x86-64 MCE and cpuid drivers along these lines.
> >
> > Do we need to worry about the possibility that when the system wakes up
> > from hibernation, the set of usable CPUs might be smaller than it was
> > beforehand?
>
> This is possible in error conditions.
>
> > Is any special handling needed for this, or is it already accounted for?
>
> Hm, well. The cleanest thing would be to allow the drivers to remove the
> device objects on CPU_UP_CANCELED_FROZEN, which means that we weren't able to
> bring the CPU up during a resume, but still that will deadlock with
> gregkh-driver-pm-acquire-device-locks-prior-to-suspending.patch.

Hmm. In principle, device objects may be destroyed on CPU_UP_CANCELED_FROZEN
without acquiring the device locks, since in fact we know these objects won't
be accessed concurrently at that time (the locks are already held by the PM
core, but the PM core is not going to actually access the devices before the
subsequent resume).

Comments?

Thanks,
Rafael

2007-12-26 03:34:08

by Alan Stern

Subject: Re: [PATCH 0/3] PM: Do not destroy/create devices while suspended

On Tue, 25 Dec 2007, Rafael J. Wysocki wrote:

> > > Do we need to worry about the possibility that when the system wakes up
> > > from hibernation, the set of usable CPUs might be smaller than it was
> > > beforehand?
> >
> > This is possible in error conditions.
> >
> > > Is any special handling needed for this, or is it already accounted for?
> >
> > Hm, well. The cleanest thing would be to allow the drivers to remove the
> > device objects on CPU_UP_CANCELED_FROZEN, which means that we weren't able to
> > bring the CPU up during a resume, but still that will deadlock with
> > gregkh-driver-pm-acquire-device-locks-prior-to-suspending.patch.
>
> Hmm. In principle, device objects may be destroyed on CPU_UP_CANCELED_FROZEN
> without acquiring the device locks, since in fact we know these objects won't
> be accessed concurrently at that time (the locks are already held by the PM
> core, but the PM core is not going to actually access the devices before the
> subsequent resume).

How about delaying the CPU_UP_CANCELED_FROZEN announcements until it's
really safe to send them out? That is, after all devices have been
resumed and the PM core no longer holds any of their locks. (Should
this be before or after tasks leave the freezer? -- I'm not sure.)

So the idea is to send appropriate announcements at the usual time for
CPUs that do come back up normally, and don't send anything right away
for CPUs that fail to come up. Just keep track of which ones failed,
and then later take care of them.
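
The bookkeeping suggested above could look roughly like the following user-space sketch. It is only a model under stated assumptions: MAX_CPUS, the bitmask variables, and every function name here are hypothetical, and demo_device_destroy() merely records the call instead of touching a real device. A failed CPU is only noted while devices are suspended; the destruction runs later, once it is safe.

```c
#define MAX_CPUS 8	/* hypothetical limit for this sketch */

static unsigned int failed_cpus;	/* bit n set: CPU n failed to come up */
static unsigned int destroyed_cpus;	/* bit n set: device for CPU n destroyed */

/* Stand-in for a driver's destroy routine; it only records the call. */
static void demo_device_destroy(unsigned int cpu)
{
	destroyed_cpus |= 1u << cpu;
}

/* Called while devices are still suspended: remember the failure,
 * but do not notify anyone yet. */
static void note_cpu_up_failed(unsigned int cpu)
{
	failed_cpus |= 1u << cpu;
}

/* Called once all devices have been resumed and the PM core no longer
 * holds their locks: deliver the delayed cancellations.
 * Returns the number of CPUs handled. */
static unsigned int flush_deferred_cancellations(void)
{
	unsigned int cpu, handled = 0;

	for (cpu = 0; cpu < MAX_CPUS; cpu++) {
		if (failed_cpus & (1u << cpu)) {
			demo_device_destroy(cpu);
			handled++;
		}
	}
	failed_cpus = 0;
	return handled;
}
```

Whether the flush should run before or after tasks leave the freezer is left open here, as in the message above.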

Alan Stern

2007-12-26 14:52:48

by Rafael J. Wysocki

Subject: Re: [PATCH 0/3] PM: Do not destroy/create devices while suspended

On Wednesday, 26 of December 2007, Alan Stern wrote:
> On Tue, 25 Dec 2007, Rafael J. Wysocki wrote:
>
> > > > Do we need to worry about the possibility that when the system wakes up
> > > > from hibernation, the set of usable CPUs might be smaller than it was
> > > > beforehand?
> > >
> > > This is possible in error conditions.
> > >
> > > > Is any special handling needed for this, or is it already accounted for?
> > >
> > > Hm, well. The cleanest thing would be to allow the drivers to remove the
> > > device objects on CPU_UP_CANCELED_FROZEN, which means that we weren't able to
> > > bring the CPU up during a resume, but still that will deadlock with
> > > gregkh-driver-pm-acquire-device-locks-prior-to-suspending.patch.
> >
> > Hmm. In principle, device objects may be destroyed on CPU_UP_CANCELED_FROZEN
> > without acquiring the device locks, since in fact we know these objects won't
> > be accessed concurrently at that time (the locks are already held by the PM
> > core, but the PM core is not going to actually access the devices before the
> > subsequent resume).
>
> How about delaying the CPU_UP_CANCELED_FROZEN announcements until it's
> really safe to send them out? That is, after all devices have been
> resumed and the PM core no longer holds any of their locks. (Should
> this be before or after tasks leave the freezer? -- I'm not sure.)
>
> So the idea is to send appropriate announcements at the usual time for
> CPUs that do come back up normally, and don't send anything right away
> for CPUs that fail to come up. Just keep track of which ones failed,
> and then later take care of them.

However, we don't want to execute .resume() for device objects that correspond
to the "dead" CPUs, so at a minimum we should remove them from the dpm_off
list on CPU_UP_CANCELED_FROZEN. For this purpose, we can define a
callback that will remove the device from dpm_off immediately and schedule its
destruction after all devices have been resumed.
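
A minimal user-space model of such a callback might look like the sketch below. Everything in it is an assumption made for illustration: struct demo_device, its flags, and both functions are hypothetical, and membership in dpm_off is modelled as a boolean rather than a real list. The point is only the ordering: the device leaves dpm_off at once, so .resume() is skipped for it, and its destruction is deferred until the resume pass is over.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a per-CPU device object. */
struct demo_device {
	int cpu;
	bool on_dpm_off;	/* still queued for .resume() */
	bool destroy_pending;	/* destruction scheduled for later */
	bool resumed;		/* .resume() was executed */
	bool destroyed;		/* object has been destroyed */
};

/* Hypothetical callback for CPU_UP_CANCELED_FROZEN: take the device
 * off dpm_off immediately and schedule its destruction. */
static void demo_cancel_device(struct demo_device *dev)
{
	dev->on_dpm_off = false;
	dev->destroy_pending = true;
}

/* Model of the resume pass: resume everything still on dpm_off,
 * then carry out the destructions that were scheduled earlier. */
static void demo_resume_all(struct demo_device *devs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].on_dpm_off) {
			devs[i].resumed = true;
			devs[i].on_dpm_off = false;
		}
	}
	for (i = 0; i < n; i++) {
		if (devs[i].destroy_pending) {
			devs[i].destroyed = true;
			devs[i].destroy_pending = false;
		}
	}
}
```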

Rafael