2006-02-27 22:35:31

by Chris Wright

[permalink] [raw]
Subject: [patch 10/39] [PATCH] i386/x86-64: Dont IPI to offline cpus on shutdown

-stable review patch. If anyone has any objections, please let us know.
------------------

So why are we calling smp_send_stop from machine_halt?

We don't.

Looking more closely at the bug report the problem here
is that halt -p is called which triggers not a halt but
an attempt to power off.

machine_power_off calls machine_shutdown which calls smp_send_stop.

If pm_power_off is set we should never make it out machine_power_off
to the call of do_exit. So pm_power_off must not be set in this case.
When pm_power_off is not set we expect machine_power_off to devolve
into machine_halt.

So how do we fix this?

Playing too much with smp_send_stop is dangerous because it
must also be safe to be called from panic.

It looks like the obviously correct fix is to only call
machine_shutdown when pm_power_off is defined. Doing
that will make Andi's assumption about not scheduling
true and generally simplify what must be supported.

This turns machine_power_off into a noop like machine_halt
when pm_power_off is not defined.

If the expected behavior is that sys_reboot(LINUX_REBOOT_CMD_POWER_OFF)
becomes sys_reboot(LINUX_REBOOT_CMD_HALT) if pm_power_off is NULL
this is not quite a comprehensive fix as we pass a different parameter
to the reboot notifier and we set system_state to a different value
before calling device_shutdown().

Unfortunately any fix more comprehensive I can think of is not
obviously correct. The core problem is that there is no architecture
independent way to detect if machine_power will become a noop, without
calling it.

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Chris Wright <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---

arch/i386/kernel/reboot.c | 7 ++++---
arch/x86_64/kernel/reboot.c | 10 ++++++----
2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/i386/kernel/reboot.c b/arch/i386/kernel/reboot.c
index 2fa5803..d207242 100644
--- linux-2.6.15.4.orig/arch/i386/kernel/reboot.c
+++ linux-2.6.15.4/arch/i386/kernel/reboot.c
@@ -12,6 +12,7 @@
#include <linux/efi.h>
#include <linux/dmi.h>
#include <linux/ctype.h>
+#include <linux/pm.h>
#include <asm/uaccess.h>
#include <asm/apic.h>
#include <asm/desc.h>
@@ -355,10 +356,10 @@ void machine_halt(void)

void machine_power_off(void)
{
- machine_shutdown();
-
- if (pm_power_off)
+ if (pm_power_off) {
+ machine_shutdown();
pm_power_off();
+ }
}


--- linux-2.6.15.4.orig/arch/x86_64/kernel/reboot.c
+++ linux-2.6.15.4/arch/x86_64/kernel/reboot.c
@@ -6,6 +6,7 @@
#include <linux/kernel.h>
#include <linux/ctype.h>
#include <linux/string.h>
+#include <linux/pm.h>
#include <asm/io.h>
#include <asm/kdebug.h>
#include <asm/delay.h>
@@ -154,10 +155,11 @@ void machine_halt(void)

void machine_power_off(void)
{
- if (!reboot_force) {
- machine_shutdown();
- }
- if (pm_power_off)
+ if (pm_power_off) {
+ if (!reboot_force) {
+ machine_shutdown();
+ }
pm_power_off();
+ }
}


--


2006-02-27 22:39:28

by Andi Kleen

[permalink] [raw]
Subject: Re: [patch 10/39] [PATCH] i386/x86-64: Dont IPI to offline cpus on shutdown

On Monday 27 February 2006 23:32, Chris Wright wrote:
> -stable review patch. If anyone has any objections, please let us know.
> ------------------
>
> So why are we calling smp_send_stop from machine_halt?

I don't think that one is really suitable for stable since it's
a relative obscure problem and the fix is not fully clear. Also it might
have side effects. Shouldn't be merged.

-Andi

2006-02-27 23:16:33

by Chris Wright

[permalink] [raw]
Subject: Re: [patch 10/39] [PATCH] i386/x86-64: Dont IPI to offline cpus on shutdown

* Andi Kleen ([email protected]) wrote:
> On Monday 27 February 2006 23:32, Chris Wright wrote:
> > -stable review patch. If anyone has any objections, please let us know.
> > ------------------
> >
> > So why are we calling smp_send_stop from machine_halt?
>
> I don't think that one is really suitable for stable since it's
> a relative obscure problem and the fix is not fully clear. Also it might
> have side effects. Shouldn't be merged.

This was sent in by both Andrew and Ashok, and is upstream (although Eric
notes there's more to the comprehensive solution). It allegedly solves:

http://bugzilla.kernel.org/show_bug.cgi?id=6077

Although the reporter seems to have gone silent. Unless there's some
compelling evidence otherwise, I'm happy to drop it.

thanks,
-chris

2006-02-28 07:04:28

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [patch 10/39] [PATCH] i386/x86-64: Dont IPI to offline cpus on shutdown

Chris Wright <[email protected]> writes:

> * Andi Kleen ([email protected]) wrote:
>> On Monday 27 February 2006 23:32, Chris Wright wrote:
>> > -stable review patch. If anyone has any objections, please let us know.
>> > ------------------
>> >
>> > So why are we calling smp_send_stop from machine_halt?
>>
>> I don't think that one is really suitable for stable since it's
>> a relative obscure problem and the fix is not fully clear. Also it might
>> have side effects. Shouldn't be merged.
>
> This was sent in by both Andrew and Ashok, and is upstream (although Eric
> notes there's more to the comprehensive solution). It allegedly solves:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=6077
>
> Although the reporter seems to have gone silent. Unless there's some
> compelling evidence otherwise, I'm happy to drop it.

The comprehensive fix for 2.6.15.x is to remove -p from /sbin/halt
if your machine has this problem. I have just updated the bugzilla
entry so we can remember this.

There are no security implications to this, either since this is a crash
when attempting to power off the machine.

Eric

2006-03-01 22:17:35

by Chris Wright

[permalink] [raw]
Subject: Re: [patch 10/39] [PATCH] i386/x86-64: Dont IPI to offline cpus on shutdown

* Eric W. Biederman ([email protected]) wrote:
> The comprehensive fix for 2.6.15.x is to remove -p from /sbin/halt
> if your machine has this problem. I have just updated the bugzilla
> entry so we can remember this.

fix...workaround... ;-) At any rate, I've dropped this one. Thanks
to you and Andi for reviewing.

thanks,
-chris