2008-10-31 18:19:11

by Andrey Borzenkov

[permalink] [raw]
Subject: [PATCH] Use BIOS reboot on Toshiba Portege 4000

Subject: [PATCH] Use BIOS reboot on Toshiba Portege 4000

From: Andrey Borzenkov <[email protected]>

After commit c7ffa6c26277b403920e2255d10df849bd613380 which defaults
to reboot via ACPI keyboard is dead on Toshiba Portege 4000 upon reboot.
Power off is required to revive it again. Add DMI entry to force BIOS
reboot method as it was before.

Signed-off-by: Andrey Borzenkov <[email protected]>

---

This fixes regression in 2.6.28. Commit log states:

Triple-fault and keyboard reset may assert INIT instead of RESET; however
INIT is blocked when Intel VT is enabled. This leads to a partially reset
machine when invoking emergency_restart via sysrq-b: the processor is still
working but other parts of the system are dead.

I wonder if we better check for VT bit instead of using DMI entries?

arch/x86/kernel/reboot.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)


diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index f4c93f1..f9b28b3 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -197,6 +197,14 @@ static struct dmi_system_id __initdata reboot_dmi_table[] = {
DMI_MATCH(DMI_PRODUCT_NAME, "HP Compaq"),
},
},
+ { /* Handle problems with rebooting on Toshiba Portege 4000 */
+ .callback = set_bios_reboot,
+ .ident = "Toshiba Portege 4000",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "PORTEGE 4000"),
+ },
+ },
{ }
};


Attachments:
(No filename) (1.44 kB)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments

2008-11-03 09:10:16

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] Use BIOS reboot on Toshiba Portege 4000


* Andrey Borzenkov <[email protected]> wrote:

> Subject: [PATCH] Use BIOS reboot on Toshiba Portege 4000
>
> From: Andrey Borzenkov <[email protected]>
>
> After commit c7ffa6c26277b403920e2255d10df849bd613380 which defaults
> to reboot via ACPI keyboard is dead on Toshiba Portege 4000 upon reboot.
> Power off is required to revive it again. Add DMI entry to force BIOS
> reboot method as it was before.
>
> Signed-off-by: Andrey Borzenkov <[email protected]>

Avi, i expect more boxes to be affected by this bug, and the DMI
solution just does not scale.

So could we please disable VMX from the emergency-shutdown code
instead of twiddling with the reboot method?

Something like this might work as well: iff VMX is enabled, we just do
smp_send_stop() (instead of skipping it) which should take care of
this.

Ingo

2008-11-03 10:04:19

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH] Use BIOS reboot on Toshiba Portege 4000

Ingo Molnar wrote:
>> Subject: [PATCH] Use BIOS reboot on Toshiba Portege 4000
>>
>> From: Andrey Borzenkov <[email protected]>
>>
>> After commit c7ffa6c26277b403920e2255d10df849bd613380 which defaults
>> to reboot via ACPI keyboard is dead on Toshiba Portege 4000 upon reboot.
>> Power off is required to revive it again. Add DMI entry to force BIOS
>> reboot method as it was before.
>>
>> Signed-off-by: Andrey Borzenkov <[email protected]>
>>
>
> Avi, i expect more boxes to be affected by this bug, and the DMI
> solution just does not scale.
>
> So could we please disable VMX from the emergency-shutdown code
> instead of twiddling with the reboot method?
>
> Something like this might work as well: iff VMX is enabled, we just do
> smp_send_stop() (instead of skipping it) which should take care of
> this.
>

There is already some code being worked on for kdump (which suffers from
the same symptoms), only kdump uses NMI IPIs for increased stopping
power. Eduardo, can you take a look at porting it to emergency reboot?

--
error compiling committee.c: too many arguments to function

2008-11-03 10:06:06

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] Use BIOS reboot on Toshiba Portege 4000


* Avi Kivity <[email protected]> wrote:

> Ingo Molnar wrote:
>>> Subject: [PATCH] Use BIOS reboot on Toshiba Portege 4000
>>>
>>> From: Andrey Borzenkov <[email protected]>
>>>
>>> After commit c7ffa6c26277b403920e2255d10df849bd613380 which defaults
>>> to reboot via ACPI keyboard is dead on Toshiba Portege 4000 upon reboot.
>>> Power off is required to revive it again. Add DMI entry to force BIOS
>>> reboot method as it was before.
>>>
>>> Signed-off-by: Andrey Borzenkov <[email protected]>
>>>
>>
>> Avi, i expect more boxes to be affected by this bug, and the DMI
>> solution just does not scale.
>>
>> So could we please disable VMX from the emergency-shutdown code
>> instead of twiddling with the reboot method?
>>
>> Something like this might work as well: iff VMX is enabled, we just do
>> smp_send_stop() (instead of skipping it) which should take care of
>> this.
>>
>
> There is already some code being worked on for kdump (which suffers
> from the same symptoms), only kdump uses NMI IPIs for increased
> stopping power. Eduardo, can you take a look at porting it to
> emergency reboot?

note that this is a must-have regression fix for v2.6.28.

Ingo

2008-11-03 13:34:56

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [PATCH] Use BIOS reboot on Toshiba Portege 4000

On Mon, Nov 03, 2008 at 12:02:53PM +0200, Avi Kivity wrote:
> Ingo Molnar wrote:
>>> Subject: [PATCH] Use BIOS reboot on Toshiba Portege 4000
>>>
>>> From: Andrey Borzenkov <[email protected]>
>>>
>>> After commit c7ffa6c26277b403920e2255d10df849bd613380 which defaults
>>> to reboot via ACPI keyboard is dead on Toshiba Portege 4000 upon reboot.
>>> Power off is required to revive it again. Add DMI entry to force BIOS
>>> reboot method as it was before.
>>>
>>> Signed-off-by: Andrey Borzenkov <[email protected]>
>>>
>>
>> Avi, i expect more boxes to be affected by this bug, and the DMI
>> solution just does not scale.
>>
>> So could we please disable VMX from the emergency-shutdown code
>> instead of twiddling with the reboot method?
>>
>> Something like this might work as well: iff VMX is enabled, we just do
>> smp_send_stop() (instead of skipping it) which should take care of
>> this.
>>
>
> There is already some code being worked on for kdump (which suffers from
> the same symptoms), only kdump uses NMI IPIs for increased stopping
> power. Eduardo, can you take a look at porting it to emergency reboot?

We probably need to disable vmx on all CPUs, but emergency reboot skips
native_smp_send_stop() (where we could hook a virt_disable call in).

As relying on IPIs defeats the whole point of emergency_restart, a proper
fix will need to use NMIs like the kdump code does.

--
Eduardo

2008-11-03 14:42:33

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH] Use BIOS reboot on Toshiba Portege 4000

Eduardo Habkost wrote:
> We probably need to disable vmx on all CPUs, but emergency reboot skips
> native_smp_send_stop() (where we could hook a virt_disable call in).
>
> As relying on IPIs defeats the whole point of emergency_restart, a proper
> fix will need to use NMIs like the kdump code does.
>

They should use the same code; they have a similar environment at entry
and reliability requirements for e_r are not greater than kdump's.

--
error compiling committee.c: too many arguments to function

2008-11-03 16:15:22

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] Use BIOS reboot on Toshiba Portege 4000

Avi Kivity <[email protected]> writes:

> Eduardo Habkost wrote:
>> We probably need to disable vmx on all CPUs, but emergency reboot skips
>> native_smp_send_stop() (where we could hook a virt_disable call in).
>>
>> As relying on IPIs defeats the whole point of emergency_restart, a proper
>> fix will need to use NMIs like the kdump code does.
>>
>
> They should use the same code; they have a similar environment at entry and
> reliability requirements for e_r are not greater than kdump's.

Just a sec.

I think we are confusing two issues here.

- Ordinary machine_restart which happens to call emergency_restart.
And is proceeded by machine_shutdown.

- And emergency_restart itself.

To some extent I would be a lot happier if Alt-sysrq-r did what
was necessary to get into a context where it can call machine_restart
or even better kernel_restart().
emergency_restart() is a nice idea but is broken by design.

That said. If we can turn off vmx on that one processor.
That should be enough for the cpu to triple fault and let
the BIOS do what it needs to do on that cpu i.e. outb(magic, 0x92)
and toggle a motherboard level reset?

If I read the earlier comments correctly the deep issue is
that going through ACPI to reset systems is less reliable than
doing it the classic way.

Eric

2008-11-03 17:02:19

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [PATCH] Use BIOS reboot on Toshiba Portege 4000

On Mon, Nov 03, 2008 at 08:13:07AM -0800, Eric W. Biederman wrote:
> Avi Kivity <[email protected]> writes:
>
> > Eduardo Habkost wrote:
> >> We probably need to disable vmx on all CPUs, but emergency reboot skips
> >> native_smp_send_stop() (where we could hook a virt_disable call in).
> >>
> >> As relying on IPIs defeats the whole point of emergency_restart, a proper
> >> fix will need to use NMIs like the kdump code does.
> >>
> >
> > They should use the same code; they have a similar environment at entry and
> > reliability requirements for e_r are not greater than kdump's.
>
> Just a sec.
>
> I think we are confusing two issues here.
>
> - Ordinary machine_restart which happens to call emergency_restart.
> And is proceeded by machine_shutdown.
>
> - And emergency_restart itself.
>

I am considering only emergency_restart() itself, that simply reboots
the machine. All other cases should be calling the KVM reboot notifier,
that disables virtualization on all CPUs.


> To some extent I would be a lot happier if Alt-sysrq-r did what
> was necessary to get into a context where it can call machine_restart
> or even better kernel_restart().
> emergency_restart() is a nice idea but is broken by design.

Eliminating emergency_restart() looks good in theory, but can we
eliminate it on all cases? We need something for cases when we are on
a possibly-broken state and we want to reboot the machine as reliably
as possible.


>
> That said. If we can turn off vmx on that one processor.
> That should be enough for the cpu to triple fault and let
> the BIOS do what it needs to do on that cpu i.e. outb(magic, 0x92)
> and toggle a motherboard level reset?
>
> If I read the earlier comments correctly the deep issue is
> that going through ACPI to reset systems is less reliable than
> doing it the classic way.

That's what Ingo suggested: instead of defaulting to ACPI reboot,
disable VMX before rebooting if needed and get back to a safer default.

That leads us to the NMI stuff: we need to disable VMX on all CPUs,
and we can't use IPIs if we are on a possibly-broken state.

--
Eduardo

2008-11-04 10:48:58

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH] Use BIOS reboot on Toshiba Portege 4000

Eric W. Biederman wrote:
> I think we are confusing two issues here.
>
> - Ordinary machine_restart which happens to call emergency_restart.
> And is proceeded by machine_shutdown.
>
> - And emergency_restart itself.
>
> To some extent I would be a lot happier if Alt-sysrq-r did what
> was necessary to get into a context where it can call machine_restart
> or even better kernel_restart().
> emergency_restart() is a nice idea but is broken by design.
>
>

Isn't emergency_restart() equivalent to kexec()? Both start from
indeterminite conditions.

> That said. If we can turn off vmx on that one processor.
> That should be enough for the cpu to triple fault and let
> the BIOS do what it needs to do on that cpu i.e. outb(magic, 0x92)
> and toggle a motherboard level reset?
>
>

If triple fault is wired to INIT (as it is at least on some systems; for
example one of mine) then the cpu will reset, but why will the bios
proceed to issue a motherboard reset? Won't it happily POST it's way to
boot (leaving the other cpus dead)?

> If I read the earlier comments correctly the deep issue is
> that going through ACPI to reset systems is less reliable than
> doing it the classic way.
>

It depends on the system; both are unreliable. But if we use the same
trick as with kdump (NMI SIPI + vmxoff) the choice will be orthogonal to
whether vmx is in use or not.

--
error compiling committee.c: too many arguments to function

2008-11-04 11:25:25

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH] Use BIOS reboot on Toshiba Portege 4000

Avi Kivity <[email protected]> writes:

> Eric W. Biederman wrote:
>> I think we are confusing two issues here.
>>
>> - Ordinary machine_restart which happens to call emergency_restart.
>> And is proceeded by machine_shutdown.
>>
>> - And emergency_restart itself.
>>
>> To some extent I would be a lot happier if Alt-sysrq-r did what
>> was necessary to get into a context where it can call machine_restart
>> or even better kernel_restart().
>> emergency_restart() is a nice idea but is broken by design.
>>
>>
>
> Isn't emergency_restart() equivalent to kexec()? Both start from indeterminite
> conditions.

Good point. That is a reasonable direction to evolve it on x86.
Similar to and sharing some of the same code as the kexec on panic path.

We may need to separate out emergency_restart from the normal clean
restart to make that happen. It would be pointless and silly to be
sending NMI at other cpus for example if we have cleanly shut them
down already.

>> That said. If we can turn off vmx on that one processor.
>> That should be enough for the cpu to triple fault and let
>> the BIOS do what it needs to do on that cpu i.e. outb(magic, 0x92)
>> and toggle a motherboard level reset?
>>
>>
>
> If triple fault is wired to INIT (as it is at least on some systems; for example
> one of mine) then the cpu will reset, but why will the bios proceed to issue a
> motherboard reset? Won't it happily POST it's way to boot (leaving the other
> cpus dead)?

I'm not certain. But when I was writing BIOS's it was much easier to
just toggle the reset line than to try and cope with the weird state
the machine was in. I'm pretty certain why we don't see more problems
with reboot when we leave the machine in a weird state. It is certainly
legal for a BIOS to just run the POST code though.


>> If I read the earlier comments correctly the deep issue is
>> that going through ACPI to reset systems is less reliable than
>> doing it the classic way.
>>
>
> It depends on the system; both are unreliable. But if we use the same trick as
> with kdump (NMI SIPI + vmxoff) the choice will be orthogonal to whether vmx is
> in use or not.


Yes.

Eric

2008-11-04 11:58:39

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [PATCH] Use BIOS reboot on Toshiba Portege 4000

On Tue, Nov 04, 2008 at 03:22:30AM -0800, Eric W. Biederman wrote:
> Avi Kivity <[email protected]> writes:
>
> > Eric W. Biederman wrote:
> >> I think we are confusing two issues here.
> >>
> >> - Ordinary machine_restart which happens to call emergency_restart.
> >> And is proceeded by machine_shutdown.
> >>
> >> - And emergency_restart itself.
> >>
> >> To some extent I would be a lot happier if Alt-sysrq-r did what
> >> was necessary to get into a context where it can call machine_restart
> >> or even better kernel_restart().
> >> emergency_restart() is a nice idea but is broken by design.
> >>
> >>
> >
> > Isn't emergency_restart() equivalent to kexec()? Both start from indeterminite
> > conditions.
>
> Good point. That is a reasonable direction to evolve it on x86.
> Similar to and sharing some of the same code as the kexec on panic path.
>
> We may need to separate out emergency_restart from the normal clean
> restart to make that happen. It would be pointless and silly to be
> sending NMI at other cpus for example if we have cleanly shut them
> down already.

When pulling the NMI stuff to the reboot code, I've just hit this issue:
currently, machine_ops.emergency_restart has two possible semantics:

- The function that should be called to immediately reboot the machine,
after the cleanup done by machine_restart()
- The function that should be called to reboot the machine when we are
on a possibly inconsistent state (and will do the vmx-disabling
stuff)

Currently we don't do anything special on emergency_restart() case,
so both cases are equivalent. But now we will need to differentiate
both cases.


...all this work because VMX breaks the good old reset lines. :\

--
Eduardo