2014-04-03 06:30:55

by Li, Aubrey

[permalink] [raw]
Subject: Re: FW: [BUG] x86: reboot doesn't reboot

> From: Steven Rostedt [mailto:[email protected]]
> Sent: Thursday, April 03, 2014 2:14 PM
> To: Linus Torvalds; LKML
> Cc: Ingo Molnar; H. Peter Anvin; Thomas Gleixner; Li, Aubrey; Matthew Garrett
> Subject: [BUG] x86: reboot doesn't reboot
>
> I noticed that one of my test boxes no longer does a reboot with the latest
> kernel. It goes down and shows:
>
> [ 51.589896] reboot: Restarting system
> [ 51.593567] reboot: machine restart
>
>
> And the system sounds like it turns off (the fans stop), but then I just sit there
> and wait, and wait and wait. The system never starts up again.
>
> I did a bisect and it landed on: a4f1987e4c54 "x86, reboot: Add EFI and
> CF9 reboot methods into the default list" which looks like a very likely suspect. I
> removed that and fb3bd7b19b2b, as the second commit was just a fix up of the
> first, recompiled, installed and rebooted to that kernel. I tried to reboot again
> and sure enough after posting the "machine restart" I started getting those
> "Press any key to continue."
> messages again. That's a good sign.
>
> Linus mentioned to me to use "reboot=pci" on the kernel command line, but no
> dice. That doesn't seem to make a difference.

May I know if "reboot=t" make any difference on your system with the change?

Thanks,
-Aubrey


2014-04-03 12:09:32

by Steven Rostedt

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Thu, 03 Apr 2014 14:30:50 +0800
"Li, Aubrey" <[email protected]> wrote:


> May I know if "reboot=t" make any difference on your system with the change?

It appears it does. Yes, it boots with "reboot=t" with the patches
applied, and does not reboot without it.

-- Steve

2014-04-03 13:42:00

by Steven Rostedt

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Thu, 03 Apr 2014 14:30:50 +0800
"Li, Aubrey" <[email protected]> wrote:


> May I know if "reboot=t" make any difference on your system with the change?

Is this the future fix? Or do you have patches for me to test. I'll be
happy to test any patches you have on this box. If you need to know
anything about this box, let me know and I'll run any command that
would help you.

Thanks,

-- Steve

2014-04-03 13:45:44

by Matthew Garrett

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Thu, Apr 03, 2014 at 09:41:55AM -0400, Steven Rostedt wrote:
> On Thu, 03 Apr 2014 14:30:50 +0800
> "Li, Aubrey" <[email protected]> wrote:
>
>
> > May I know if "reboot=t" make any difference on your system with the change?
>
> Is this the future fix? Or do you have patches for me to test. I'll be
> happy to test any patches you have on this box. If you need to know
> anything about this box, let me know and I'll run any command that
> would help you.

I think what this tells us is we should revert the patch and never try
0xcf9 by default.

--
Matthew Garrett | [email protected]

2014-04-03 14:11:29

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On 04/03/2014 06:41 AM, Steven Rostedt wrote:
> On Thu, 03 Apr 2014 14:30:50 +0800
> "Li, Aubrey" <[email protected]> wrote:
>
>
>> May I know if "reboot=t" make any difference on your system with the change?
>
> Is this the future fix? Or do you have patches for me to test. I'll be
> happy to test any patches you have on this box. If you need to know
> anything about this box, let me know and I'll run any command that
> would help you.
>

Could you tell which of these modes work on your box:

reboot=t
reboot=k
reboot=b (BIOS only)
reboot=a
reboot=e (EFI only)
reboot=p

-hpa

2014-04-03 14:47:25

by Steven Rostedt

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Thu, 03 Apr 2014 07:10:47 -0700
"H. Peter Anvin" <[email protected]> wrote:

> Could you tell which of these modes work on your box:
>
> reboot=t

I already stated that 't' works.

> reboot=k

Fails

> reboot=b (BIOS only)

Passed

> reboot=a

Passed

> reboot=e (EFI only)

Fails

> reboot=p

Fails

-- Steve

2014-04-03 14:50:42

by Matthew Garrett

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Thu, Apr 03, 2014 at 10:47:21AM -0400, Steven Rostedt wrote:
> > reboot=a
>
> Passed

Huh. We should be trying the acpi method twice before we try PCI at all,
unless something's gone very wrong.

--
Matthew Garrett | [email protected]

2014-04-03 15:14:56

by Steven Rostedt

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Thu, 3 Apr 2014 15:50:22 +0100
Matthew Garrett <[email protected]> wrote:

> On Thu, Apr 03, 2014 at 10:47:21AM -0400, Steven Rostedt wrote:
> > > reboot=a
> >
> > Passed
>
> Huh. We should be trying the acpi method twice before we try PCI at all,
> unless something's gone very wrong.
>

Bah, after getting this email, I tried again, and it fails on 'a'.

I may have booted into the wrong kernel, or miscounted my reboots.

Put this down as a FAIL :-(


-- Steve

2014-04-03 15:17:17

by Steven Rostedt

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Thu, 3 Apr 2014 10:47:21 -0400
Steven Rostedt <[email protected]> wrote:

> On Thu, 03 Apr 2014 07:10:47 -0700

> Passed
>
> > reboot=a

After getting Matthew's email, I tried this again and it failed to
boot. Let me rerun through the list again and I'll report back if
things have changed.

Changing options, rebooting, changing options rebooting, gets a bit
confusing. I may have rebooted the 'b' and thought it was the 'a'.

-- Steve

2014-04-03 15:21:23

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On 04/03/2014 08:17 AM, Steven Rostedt wrote:
> On Thu, 3 Apr 2014 10:47:21 -0400
> Steven Rostedt <[email protected]> wrote:
>
>> On Thu, 03 Apr 2014 07:10:47 -0700
>
>> Passed
>>
>>> reboot=a
>
> After getting Matthew's email, I tried this again and it failed to
> boot. Let me rerun through the list again and I'll report back if
> things have changed.
>
> Changing options, rebooting, changing options rebooting, gets a bit
> confusing. I may have rebooted the 'b' and thought it was the 'a'.
>

Also, can you send the dmidecode from your system? Is it a Dell?

-hpa

2014-04-03 15:22:17

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On 04/03/2014 08:20 AM, H. Peter Anvin wrote:
>
> Also, can you send the dmidecode from your system? Is it a Dell?
>

(And I assume you're booting BIOS, not EFI.)

-hpa

2014-04-03 15:35:49

by Steven Rostedt

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot


Rechecked my answers.

On Thu, 3 Apr 2014 10:47:21 -0400
Steven Rostedt <[email protected]> wrote:

> On Thu, 03 Apr 2014 07:10:47 -0700
> "H. Peter Anvin" <[email protected]> wrote:
>
> > Could you tell which of these modes work on your box:
> >
> > reboot=t
>
> I already stated that 't' works.

Yes it works.

>
> > reboot=k
>
> Fails

Still fails.

>
> > reboot=b (BIOS only)
>
> Passed

Still passes.

>
> > reboot=a
>
> Passed

Correction, this consistently fails.

>
> > reboot=e (EFI only)
>
> Fails

Still fails.

>
> > reboot=p
>
> Fails

Still fails.

-- Steve

2014-04-03 15:39:38

by Steven Rostedt

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Thu, 03 Apr 2014 08:21:55 -0700
"H. Peter Anvin" <[email protected]> wrote:

> On 04/03/2014 08:20 AM, H. Peter Anvin wrote:
> >
> > Also, can you send the dmidecode from your system? Is it a Dell?

Hmm, I didn't see this email. Note, this box is an old development box
that Intel sent me years ago.

> >
>
> (And I assume you're booting BIOS, not EFI.)

Yeah, and I'm not sure it even has EFI. I'll have to hook a monitor to
it to find out. I rather not do that ;-)

-- Steve


Attachments:
(No filename) (481.00 B)
dmidecode.gz (3.12 kB)
Download all attachments

2014-04-03 15:59:14

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On 04/03/2014 08:39 AM, Steven Rostedt wrote:
>
> Hmm, I didn't see this email. Note, this box is an old development box
> that Intel sent me years ago.
>

Preproduction system?

-hpa

2014-04-03 16:13:35

by Steven Rostedt

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Thu, 03 Apr 2014 08:58:14 -0700
"H. Peter Anvin" <[email protected]> wrote:

> On 04/03/2014 08:39 AM, Steven Rostedt wrote:
> >
> > Hmm, I didn't see this email. Note, this box is an old development box
> > that Intel sent me years ago.
> >
>
> Preproduction system?
>

Could be. Honestly, I don't remember. I use to get boxes from Intel and
AMD randomly at my doorstep. Sometimes, a representative would actually
tell me why I got it. I just took them and set them up in my test
environment. Never really questioned why. I think these boxes are from
2006 or 2007.

-- Steve

2014-04-03 23:23:39

by Li, Aubrey

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On 2014/4/4 0:13, Steven Rostedt wrote:
> On Thu, 03 Apr 2014 08:58:14 -0700
> "H. Peter Anvin" <[email protected]> wrote:
>
>> On 04/03/2014 08:39 AM, Steven Rostedt wrote:
>>>
>>> Hmm, I didn't see this email. Note, this box is an old development box
>>> that Intel sent me years ago.
>>>
>>
>> Preproduction system?
>>
>
> Could be. Honestly, I don't remember. I use to get boxes from Intel and
> AMD randomly at my doorstep. Sometimes, a representative would actually
> tell me why I got it. I just took them and set them up in my test
> environment. Never really questioned why. I think these boxes are from
> 2006 or 2007.
>

Can you please send the dmi table out?

Thanks,
-Aubrey

2014-04-03 23:40:25

by Steven Rostedt

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Fri, 04 Apr 2014 07:23:32 +0800
"Li, Aubrey" <[email protected]> wrote:

> Can you please send the dmi table out?

I already did as a gz attachment to H. Peter. You were on the Cc, did
you not receive it?

-- Steve

2014-04-03 23:53:01

by Li, Aubrey

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On 2014/4/4 7:40, Steven Rostedt wrote:
> On Fri, 04 Apr 2014 07:23:32 +0800
> "Li, Aubrey" <[email protected]> wrote:
>
>> Can you please send the dmi table out?
>
> I already did as a gz attachment to H. Peter. You were on the Cc, did
> you not receive it?
>
Oh, I got it. This is a Preproduction machine.
When reboot failed via a method (=e or =p), there are two case.

Case 1: this method do nothing, pass the attempt chance to the next method
Case 2: this method hangs the system

I want to know if CF9 is case 1 or case 2. Could you please try the following
patch *without* any reboot parameters?

(1) If we didn't see any string, then EFI hangs your box.
(2) if we see the first string but not the second one, CF9 hangs your box
(3) if we see both, couldn't be, because BIOS works on your box.

Thanks,
-Aubrey

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 654b465..4de3027 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -547,11 +547,13 @@ static void native_machine_emergency_restart(void)
u8 reboot_code = reboot_mode == REBOOT_WARM ?
0x06 : 0x0E;
u8 cf9 = inb(0xcf9) & ~reboot_code;
+ pr_info("reboot via CF9...\n");
outb(cf9|2, 0xcf9); /* Request hard reset */
udelay(50);
/* Actually do the reset */
outb(cf9|reboot_code, 0xcf9);
udelay(50);
+ pr_info("reboot via CF9 done...\n");
}
reboot_type = BOOT_BIOS;
break;


2014-04-04 00:12:42

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On 04/03/2014 04:52 PM, Li, Aubrey wrote:
> On 2014/4/4 7:40, Steven Rostedt wrote:
>> On Fri, 04 Apr 2014 07:23:32 +0800
>> "Li, Aubrey" <[email protected]> wrote:
>>
>>> Can you please send the dmi table out?
>>
>> I already did as a gz attachment to H. Peter. You were on the Cc, did
>> you not receive it?
>>
> Oh, I got it. This is a Preproduction machine.
> When reboot failed via a method (=e or =p), there are two case.
>
> Case 1: this method do nothing, pass the attempt chance to the next method
> Case 2: this method hangs the system
>
> I want to know if CF9 is case 1 or case 2. Could you please try the following
> patch *without* any reboot parameters?
>
> (1) If we didn't see any string, then EFI hangs your box.
> (2) if we see the first string but not the second one, CF9 hangs your box
> (3) if we see both, couldn't be, because BIOS works on your box.
>

Given that this machine doesn't have EFI, it seems kind of obvious, no?

-hpa

2014-04-04 01:27:55

by Li, Aubrey

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On 2014/4/4 8:12, H. Peter Anvin wrote:
> On 04/03/2014 04:52 PM, Li, Aubrey wrote:
>> On 2014/4/4 7:40, Steven Rostedt wrote:
>>> On Fri, 04 Apr 2014 07:23:32 +0800
>>> "Li, Aubrey" <[email protected]> wrote:
>>>
>>>> Can you please send the dmi table out?
>>>
>>> I already did as a gz attachment to H. Peter. You were on the Cc, did
>>> you not receive it?
>>>
>> Oh, I got it. This is a Preproduction machine.
>> When reboot failed via a method (=e or =p), there are two case.
>>
>> Case 1: this method do nothing, pass the attempt chance to the next method
>> Case 2: this method hangs the system
>>
>> I want to know if CF9 is case 1 or case 2. Could you please try the following
>> patch *without* any reboot parameters?
>>
>> (1) If we didn't see any string, then EFI hangs your box.
>> (2) if we see the first string but not the second one, CF9 hangs your box
>> (3) if we see both, couldn't be, because BIOS works on your box.
>>
>
> Given that this machine doesn't have EFI, it seems kind of obvious, no?
>
> -hpa

Yes. it should be but I want to confirm.

The current situation is,
- we have one(do we know more?) preproduction machine hangs by CF9.
- We have more than one(could be thousand known) production machine
works by CF9.

So, if I understand correctly(please correct me if I was wrong), I don't
think the justification is enough to revert the patch. The patch
includes EFI, CF9 and BIOS.

I'm open to make Steven's machine work:
(1) remove CF9
(2) add DMI table
(3) any other idea without a regression.

I prefer (2) or (3) if better because if we do (1) we will probably
receive some other regression reports.

Thanks,
-Aubrey

2014-04-04 01:41:17

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

Keep in mind we already tried CF9 in the default flow and it broke things. I'm willing to wait for reports about production machines, though, but I fully expect them.

On April 3, 2014 6:27:48 PM PDT, "Li, Aubrey" <[email protected]> wrote:
>On 2014/4/4 8:12, H. Peter Anvin wrote:
>> On 04/03/2014 04:52 PM, Li, Aubrey wrote:
>>> On 2014/4/4 7:40, Steven Rostedt wrote:
>>>> On Fri, 04 Apr 2014 07:23:32 +0800
>>>> "Li, Aubrey" <[email protected]> wrote:
>>>>
>>>>> Can you please send the dmi table out?
>>>>
>>>> I already did as a gz attachment to H. Peter. You were on the Cc,
>did
>>>> you not receive it?
>>>>
>>> Oh, I got it. This is a Preproduction machine.
>>> When reboot failed via a method (=e or =p), there are two case.
>>>
>>> Case 1: this method do nothing, pass the attempt chance to the next
>method
>>> Case 2: this method hangs the system
>>>
>>> I want to know if CF9 is case 1 or case 2. Could you please try the
>following
>>> patch *without* any reboot parameters?
>>>
>>> (1) If we didn't see any string, then EFI hangs your box.
>>> (2) if we see the first string but not the second one, CF9 hangs
>your box
>>> (3) if we see both, couldn't be, because BIOS works on your box.
>>>
>>
>> Given that this machine doesn't have EFI, it seems kind of obvious,
>no?
>>
>> -hpa
>
>Yes. it should be but I want to confirm.
>
>The current situation is,
>- we have one(do we know more?) preproduction machine hangs by CF9.
>- We have more than one(could be thousand known) production machine
>works by CF9.
>
>So, if I understand correctly(please correct me if I was wrong), I
>don't
>think the justification is enough to revert the patch. The patch
>includes EFI, CF9 and BIOS.
>
>I'm open to make Steven's machine work:
>(1) remove CF9
>(2) add DMI table
>(3) any other idea without a regression.
>
>I prefer (2) or (3) if better because if we do (1) we will probably
>receive some other regression reports.
>
>Thanks,
>-Aubrey

--
Sent from my mobile phone. Please pardon brevity and lack of formatting.

2014-04-04 02:16:15

by Steven Rostedt

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Fri, 04 Apr 2014 07:52:53 +0800
"Li, Aubrey" <[email protected]> wrote:

> On 2014/4/4 7:40, Steven Rostedt wrote:
> > On Fri, 04 Apr 2014 07:23:32 +0800
> > "Li, Aubrey" <[email protected]> wrote:
> >
> >> Can you please send the dmi table out?
> >
> > I already did as a gz attachment to H. Peter. You were on the Cc, did
> > you not receive it?
> >
> Oh, I got it. This is a Preproduction machine.
> When reboot failed via a method (=e or =p), there are two case.
>
> Case 1: this method do nothing, pass the attempt chance to the next method
> Case 2: this method hangs the system
>
> I want to know if CF9 is case 1 or case 2. Could you please try the following
> patch *without* any reboot parameters?
>
> (1) If we didn't see any string, then EFI hangs your box.
> (2) if we see the first string but not the second one, CF9 hangs your box
> (3) if we see both, couldn't be, because BIOS works on your box.

Here's the output:

[ 114.445327] reboot: Restarting system
[ 114.449002] reboot: machine restart
[ 114.453495] reboot: reboot via CF9...

-- Steve

2014-04-04 02:43:15

by Li, Aubrey

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On 2014/4/4 10:16, Steven Rostedt wrote:
> On Fri, 04 Apr 2014 07:52:53 +0800
> "Li, Aubrey" <[email protected]> wrote:
>
>> On 2014/4/4 7:40, Steven Rostedt wrote:
>>> On Fri, 04 Apr 2014 07:23:32 +0800
>>> "Li, Aubrey" <[email protected]> wrote:
>>>
>>>> Can you please send the dmi table out?
>>>
>>> I already did as a gz attachment to H. Peter. You were on the Cc, did
>>> you not receive it?
>>>
>> Oh, I got it. This is a Preproduction machine.
>> When reboot failed via a method (=e or =p), there are two case.
>>
>> Case 1: this method do nothing, pass the attempt chance to the next method
>> Case 2: this method hangs the system
>>
>> I want to know if CF9 is case 1 or case 2. Could you please try the following
>> patch *without* any reboot parameters?
>>
>> (1) If we didn't see any string, then EFI hangs your box.
>> (2) if we see the first string but not the second one, CF9 hangs your box
>> (3) if we see both, couldn't be, because BIOS works on your box.
>
> Here's the output:
>
> [ 114.445327] reboot: Restarting system
> [ 114.449002] reboot: machine restart
> [ 114.453495] reboot: reboot via CF9...
>

Thanks Steven, I got what I want.

2014-04-04 06:13:39

by Ingo Molnar

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot


* H. Peter Anvin <[email protected]> wrote:

> Keep in mind we already tried CF9 in the default flow and it broke
> things. I'm willing to wait for reports about production machines,
> though, but I fully expect them.

Typically there's nothing particularly weird about preproduction Intel
hardware when it comes to reboot methods, other than people more
willing to test them with development kernels.

Is there any system were the triple fault method does not work?

Thanks,

Ingo

2014-04-04 06:48:53

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH] x86: Try the BIOS reboot method before the PCI reboot method


* Steven Rostedt <[email protected]> wrote:

> On Thu, 03 Apr 2014 07:10:47 -0700
> "H. Peter Anvin" <[email protected]> wrote:
>
> > Could you tell which of these modes work on your box:

Basically my thinking is that the patch should be reverted, if my fix
below does not work.

I distilled your test results into:

reboot=t # triple fault ok
reboot=k # keyboard ctrl FAIL
reboot=b # BIOS ok
reboot=a # ACPI FAIL
reboot=e # EFI FAIL [system has no EFI]
reboot=p # PCI 0xcf9 FAIL

And I think it's pretty obvious that we should only try 0xcf9 as a
last resort - if at all. For some reason the 0xcf9 reboot method got
marked 'safe' - why is that? If only pci_direct_probe() had funny
extra lines /* like this */ ...

The other observation is that (on this box) we should try the 'BIOS'
method before the PCI method.

Thirdly, CF9_COND is a total misnomer - it should be something like
CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ...

[ Plus all the BOOT_ flags are total misnomers as well, why aren't
they named REBOOT_ ...? ]

Anyway, the patch below fixes the worst problems:

- it orders the actual reboot logic to follow the reboot ordering
pattern - it was in a pretty random order before for no good
reason.

- it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and
BOOT_CF9_SAFE flags to make the code more obvious.

- it tries the BIOS reboot method before the PCI reboot method.

Only build tested.

Alternatively we could just use the following reboot order:

* 1) If the FADT has the ACPI reboot register flag set, try it
* 2) If still alive, write to the keyboard controller
* 3) If still alive, write to the ACPI reboot register again
* 4) If still alive, write to the keyboard controller again
* 5) If still alive, call the EFI runtime service to reboot
* 6) If still alive, force a triple fault

I.e. eliminate the 'PCI' and 'BIOS' methods from our default sequence,
as both are documented as being able to hang some boxes.

Thanks,

Ingo

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 654b465..527dbcb 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -114,8 +114,8 @@ EXPORT_SYMBOL(machine_real_restart);
*/
static int __init set_pci_reboot(const struct dmi_system_id *d)
{
- if (reboot_type != BOOT_CF9) {
- reboot_type = BOOT_CF9;
+ if (reboot_type != BOOT_CF9_FORCE) {
+ reboot_type = BOOT_CF9_FORCE;
pr_info("%s series board detected. Selecting %s-method for reboots.\n",
d->ident, "PCI");
}
@@ -468,10 +468,15 @@ void __attribute__((weak)) mach_reboot_fixups(void)
* 6) If still alive, write to the PCI IO port 0xCF9 to reboot
* 7) If still alive, inform BIOS to do a proper reboot
*
- * If the machine is still alive at this stage, it gives up. We default to
- * following the same pattern, except that if we're still alive after (7) we'll
- * try to force a triple fault and then cycle between hitting the keyboard
- * controller and doing that
+ * If the machine is still alive at this stage, it gives up.
+ *
+ * We default to following the same pattern, except that we try
+ * (7) [BIOS] before (6) [PCI], and we add 8): try to force a
+ * triple fault and then cycle between hitting the keyboard
+ * controller and doing that.
+ *
+ * This means that this function can never return, it can misbehave
+ * by not rebooting properly and hanging.
*/
static void native_machine_emergency_restart(void)
{
@@ -492,6 +497,11 @@ static void native_machine_emergency_restart(void)
for (;;) {
/* Could also try the reset bit in the Hammer NB */
switch (reboot_type) {
+ case BOOT_ACPI:
+ acpi_reboot();
+ reboot_type = BOOT_KBD;
+ break;
+
case BOOT_KBD:
mach_reboot_fixups(); /* For board specific fixups */

@@ -509,43 +519,29 @@ static void native_machine_emergency_restart(void)
}
break;

- case BOOT_TRIPLE:
- load_idt(&no_idt);
- __asm__ __volatile__("int3");
-
- /* We're probably dead after this, but... */
- reboot_type = BOOT_KBD;
- break;
-
- case BOOT_BIOS:
- machine_real_restart(MRR_BIOS);
-
- /* We're probably dead after this, but... */
- reboot_type = BOOT_TRIPLE;
- break;
-
- case BOOT_ACPI:
- acpi_reboot();
- reboot_type = BOOT_KBD;
- break;
-
case BOOT_EFI:
if (efi_enabled(EFI_RUNTIME_SERVICES))
efi.reset_system(reboot_mode == REBOOT_WARM ?
EFI_RESET_WARM :
EFI_RESET_COLD,
EFI_SUCCESS, 0, NULL);
- reboot_type = BOOT_CF9_COND;
+ reboot_type = BOOT_BIOS;
+ break;
+
+ case BOOT_BIOS:
+ machine_real_restart(MRR_BIOS);
+
+ /* We're probably dead after this, but... */
+ reboot_type = BOOT_CF9_SAFE;
break;

- case BOOT_CF9:
+ case BOOT_CF9_FORCE:
port_cf9_safe = true;
/* Fall through */

- case BOOT_CF9_COND:
+ case BOOT_CF9_SAFE:
if (port_cf9_safe) {
- u8 reboot_code = reboot_mode == REBOOT_WARM ?
- 0x06 : 0x0E;
+ u8 reboot_code = reboot_mode == REBOOT_WARM ? 0x06 : 0x0E;
u8 cf9 = inb(0xcf9) & ~reboot_code;
outb(cf9|2, 0xcf9); /* Request hard reset */
udelay(50);
@@ -553,7 +549,15 @@ static void native_machine_emergency_restart(void)
outb(cf9|reboot_code, 0xcf9);
udelay(50);
}
- reboot_type = BOOT_BIOS;
+ reboot_type = BOOT_TRIPLE;
+ break;
+
+ case BOOT_TRIPLE:
+ load_idt(&no_idt);
+ __asm__ __volatile__("int3");
+
+ /* We're probably dead after this, but... */
+ reboot_type = BOOT_KBD;
break;
}
}
diff --git a/include/linux/reboot.h b/include/linux/reboot.h
index 9e7db9e..48bf152 100644
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -20,13 +20,13 @@ enum reboot_mode {
extern enum reboot_mode reboot_mode;

enum reboot_type {
- BOOT_TRIPLE = 't',
- BOOT_KBD = 'k',
- BOOT_BIOS = 'b',
- BOOT_ACPI = 'a',
- BOOT_EFI = 'e',
- BOOT_CF9 = 'p',
- BOOT_CF9_COND = 'q',
+ BOOT_TRIPLE = 't',
+ BOOT_KBD = 'k',
+ BOOT_BIOS = 'b',
+ BOOT_ACPI = 'a',
+ BOOT_EFI = 'e',
+ BOOT_CF9_FORCE = 'p',
+ BOOT_CF9_SAFE = 'q',
};
extern enum reboot_type reboot_type;

2014-04-04 07:53:55

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH] x86: Don't do the BIOS and PCI 0xCF9 reboot methods by default


* Ingo Molnar <[email protected]> wrote:

> Alternatively we could just use the following reboot order:
>
> * 1) If the FADT has the ACPI reboot register flag set, try it
> * 2) If still alive, write to the keyboard controller
> * 3) If still alive, write to the ACPI reboot register again
> * 4) If still alive, write to the keyboard controller again
> * 5) If still alive, call the EFI runtime service to reboot
> * 6) If still alive, force a triple fault
>
> I.e. eliminate the 'PCI' and 'BIOS' methods from our default
> sequence, as both are documented as being able to hang some boxes.

The patch below, on top of my first patch, does that.

Only lightly tested.

My expectation would be that Steve's box reboots fine by default with
the first patch already, and that it would still reboot fine with the
second patch applied as well.

Thanks,

Ingo

Signed-off-by: Ingo Molnar <[email protected]>
---

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 527dbcb..bd9b0ef 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -470,10 +470,10 @@ void __attribute__((weak)) mach_reboot_fixups(void)
*
* If the machine is still alive at this stage, it gives up.
*
- * We default to following the same pattern, except that we try
- * (7) [BIOS] before (6) [PCI], and we add 8): try to force a
- * triple fault and then cycle between hitting the keyboard
- * controller and doing that.
+ * We default to following the same pattern, except that after
+ * the 5) EFI reboot step we go straight to Linux specific
+ * step 8): try to force a triple fault and then cycle between
+ * hitting the keyboard controller and doing that.
*
* This means that this function can never return, it can misbehave
* by not rebooting properly and hanging.
@@ -525,7 +525,7 @@ static void native_machine_emergency_restart(void)
EFI_RESET_WARM :
EFI_RESET_COLD,
EFI_SUCCESS, 0, NULL);
- reboot_type = BOOT_BIOS;
+ reboot_type = BOOT_TRIPLE;
break;

case BOOT_BIOS:

Subject: [tip:x86/reboot] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

Commit-ID: d122ab8d224a59aa0c3e4ba3540c6ab99556c3e3
Gitweb: http://git.kernel.org/tip/d122ab8d224a59aa0c3e4ba3540c6ab99556c3e3
Author: Ingo Molnar <[email protected]>
AuthorDate: Fri, 4 Apr 2014 08:41:26 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 4 Apr 2014 08:48:03 +0200

[PATCH] x86: Try the BIOS reboot method before the PCI reboot method

Steve reported a reboot hang and bisected it back to this commit:

a4f1987e4c54 x86, reboot: Add EFI and CF9 reboot methods into the default list

He heroically tested all reboot methods and found the following:

reboot=t # triple fault ok
reboot=k # keyboard ctrl FAIL
reboot=b # BIOS ok
reboot=a # ACPI FAIL
reboot=e # EFI FAIL [system has no EFI]
reboot=p # PCI 0xcf9 FAIL

And I think it's pretty obvious that we should only try 0xcf9 as a
last resort - if at all.

The other observation is that (on this box) we should try the 'BIOS'
method before the PCI method.

Thirdly, CF9_COND is a total misnomer - it should be something like
CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ...

So this patch fixes the worst problems:

- it orders the actual reboot logic to follow the reboot ordering
pattern - it was in a pretty random order before for no good
reason.

- it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and
BOOT_CF9_SAFE flags to make the code more obvious.

- it tries the BIOS reboot method before the PCI reboot method.

Reported-and-bisected-by: Steven Rostedt <[email protected]>
Cc: Li Aubrey <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Matthew Garrett <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/reboot.c | 68 +++++++++++++++++++++++++-----------------------
include/linux/reboot.h | 14 +++++-----
2 files changed, 43 insertions(+), 39 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 654b465..527dbcb 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -114,8 +114,8 @@ EXPORT_SYMBOL(machine_real_restart);
*/
static int __init set_pci_reboot(const struct dmi_system_id *d)
{
- if (reboot_type != BOOT_CF9) {
- reboot_type = BOOT_CF9;
+ if (reboot_type != BOOT_CF9_FORCE) {
+ reboot_type = BOOT_CF9_FORCE;
pr_info("%s series board detected. Selecting %s-method for reboots.\n",
d->ident, "PCI");
}
@@ -468,10 +468,15 @@ void __attribute__((weak)) mach_reboot_fixups(void)
* 6) If still alive, write to the PCI IO port 0xCF9 to reboot
* 7) If still alive, inform BIOS to do a proper reboot
*
- * If the machine is still alive at this stage, it gives up. We default to
- * following the same pattern, except that if we're still alive after (7) we'll
- * try to force a triple fault and then cycle between hitting the keyboard
- * controller and doing that
+ * If the machine is still alive at this stage, it gives up.
+ *
+ * We default to following the same pattern, except that we try
+ * (7) [BIOS] before (6) [PCI], and we add 8): try to force a
+ * triple fault and then cycle between hitting the keyboard
+ * controller and doing that.
+ *
+ * This means that this function can never return, it can misbehave
+ * by not rebooting properly and hanging.
*/
static void native_machine_emergency_restart(void)
{
@@ -492,6 +497,11 @@ static void native_machine_emergency_restart(void)
for (;;) {
/* Could also try the reset bit in the Hammer NB */
switch (reboot_type) {
+ case BOOT_ACPI:
+ acpi_reboot();
+ reboot_type = BOOT_KBD;
+ break;
+
case BOOT_KBD:
mach_reboot_fixups(); /* For board specific fixups */

@@ -509,43 +519,29 @@ static void native_machine_emergency_restart(void)
}
break;

- case BOOT_TRIPLE:
- load_idt(&no_idt);
- __asm__ __volatile__("int3");
-
- /* We're probably dead after this, but... */
- reboot_type = BOOT_KBD;
- break;
-
- case BOOT_BIOS:
- machine_real_restart(MRR_BIOS);
-
- /* We're probably dead after this, but... */
- reboot_type = BOOT_TRIPLE;
- break;
-
- case BOOT_ACPI:
- acpi_reboot();
- reboot_type = BOOT_KBD;
- break;
-
case BOOT_EFI:
if (efi_enabled(EFI_RUNTIME_SERVICES))
efi.reset_system(reboot_mode == REBOOT_WARM ?
EFI_RESET_WARM :
EFI_RESET_COLD,
EFI_SUCCESS, 0, NULL);
- reboot_type = BOOT_CF9_COND;
+ reboot_type = BOOT_BIOS;
+ break;
+
+ case BOOT_BIOS:
+ machine_real_restart(MRR_BIOS);
+
+ /* We're probably dead after this, but... */
+ reboot_type = BOOT_CF9_SAFE;
break;

- case BOOT_CF9:
+ case BOOT_CF9_FORCE:
port_cf9_safe = true;
/* Fall through */

- case BOOT_CF9_COND:
+ case BOOT_CF9_SAFE:
if (port_cf9_safe) {
- u8 reboot_code = reboot_mode == REBOOT_WARM ?
- 0x06 : 0x0E;
+ u8 reboot_code = reboot_mode == REBOOT_WARM ? 0x06 : 0x0E;
u8 cf9 = inb(0xcf9) & ~reboot_code;
outb(cf9|2, 0xcf9); /* Request hard reset */
udelay(50);
@@ -553,7 +549,15 @@ static void native_machine_emergency_restart(void)
outb(cf9|reboot_code, 0xcf9);
udelay(50);
}
- reboot_type = BOOT_BIOS;
+ reboot_type = BOOT_TRIPLE;
+ break;
+
+ case BOOT_TRIPLE:
+ load_idt(&no_idt);
+ __asm__ __volatile__("int3");
+
+ /* We're probably dead after this, but... */
+ reboot_type = BOOT_KBD;
break;
}
}
diff --git a/include/linux/reboot.h b/include/linux/reboot.h
index 9e7db9e..48bf152 100644
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -20,13 +20,13 @@ enum reboot_mode {
extern enum reboot_mode reboot_mode;

enum reboot_type {
- BOOT_TRIPLE = 't',
- BOOT_KBD = 'k',
- BOOT_BIOS = 'b',
- BOOT_ACPI = 'a',
- BOOT_EFI = 'e',
- BOOT_CF9 = 'p',
- BOOT_CF9_COND = 'q',
+ BOOT_TRIPLE = 't',
+ BOOT_KBD = 'k',
+ BOOT_BIOS = 'b',
+ BOOT_ACPI = 'a',
+ BOOT_EFI = 'e',
+ BOOT_CF9_FORCE = 'p',
+ BOOT_CF9_SAFE = 'q',
};
extern enum reboot_type reboot_type;

2014-04-04 08:00:16

by Ingo Molnar

[permalink] [raw]
Subject: Re: [tip:x86/reboot] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method


* tip-bot for Ingo Molnar <[email protected]> wrote:

> Commit-ID: d122ab8d224a59aa0c3e4ba3540c6ab99556c3e3
> Gitweb: http://git.kernel.org/tip/d122ab8d224a59aa0c3e4ba3540c6ab99556c3e3
> Author: Ingo Molnar <[email protected]>
> AuthorDate: Fri, 4 Apr 2014 08:41:26 +0200
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Fri, 4 Apr 2014 08:48:03 +0200
>
> [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

So, JFYI, I have put this into x86/reboot to get it tested, but it's
not queued up for upstream yet until there's concensus.

(Steve testing it wouldn't hurt either.)

Thanks,

Ingo

2014-04-04 12:35:37

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

Plenty. It used to be the default reboot method for a long time, but it isn't anymore for a reason.

On April 3, 2014 11:13:32 PM PDT, Ingo Molnar <[email protected]> wrote:
>
>* H. Peter Anvin <[email protected]> wrote:
>
>> Keep in mind we already tried CF9 in the default flow and it broke
>> things. I'm willing to wait for reports about production machines,
>> though, but I fully expect them.
>
>Typically there's nothing particularly weird about preproduction Intel
>hardware when it comes to reboot methods, other than people more
>willing to test them with development kernels.
>
>Is there any system were the triple fault method does not work?
>
>Thanks,
>
> Ingo

--
Sent from my mobile phone. Please pardon brevity and lack of formatting.

2014-04-04 12:58:26

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [tip:x86/reboot] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

Trying anything at all after BIOS is meaningless as BIOS is terminal... it will either work or it will hang. Triplefault is the same way.

On April 4, 2014 1:00:07 AM PDT, Ingo Molnar <[email protected]> wrote:
>
>* tip-bot for Ingo Molnar <[email protected]> wrote:
>
>> Commit-ID: d122ab8d224a59aa0c3e4ba3540c6ab99556c3e3
>> Gitweb:
>http://git.kernel.org/tip/d122ab8d224a59aa0c3e4ba3540c6ab99556c3e3
>> Author: Ingo Molnar <[email protected]>
>> AuthorDate: Fri, 4 Apr 2014 08:41:26 +0200
>> Committer: Ingo Molnar <[email protected]>
>> CommitDate: Fri, 4 Apr 2014 08:48:03 +0200
>>
>> [PATCH] x86: Try the BIOS reboot method before the PCI reboot method
>
>So, JFYI, I have put this into x86/reboot to get it tested, but it's
>not queued up for upstream yet until there's concensus.
>
>(Steve testing it wouldn't hurt either.)
>
>Thanks,
>
> Ingo

--
Sent from my mobile phone. Please pardon brevity and lack of formatting.

2014-04-04 13:30:10

by Steven Rostedt

[permalink] [raw]
Subject: Re: [tip:x86/reboot] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

On Fri, 4 Apr 2014 10:00:07 +0200
Ingo Molnar <[email protected]> wrote:

>
> * tip-bot for Ingo Molnar <[email protected]> wrote:
>
> > Commit-ID: d122ab8d224a59aa0c3e4ba3540c6ab99556c3e3
> > Gitweb: http://git.kernel.org/tip/d122ab8d224a59aa0c3e4ba3540c6ab99556c3e3
> > Author: Ingo Molnar <[email protected]>
> > AuthorDate: Fri, 4 Apr 2014 08:41:26 +0200
> > Committer: Ingo Molnar <[email protected]>
> > CommitDate: Fri, 4 Apr 2014 08:48:03 +0200
> >
> > [PATCH] x86: Try the BIOS reboot method before the PCI reboot method
>
> So, JFYI, I have put this into x86/reboot to get it tested, but it's
> not queued up for upstream yet until there's concensus.
>
> (Steve testing it wouldn't hurt either.)

I'll give it a test. But currently I resurrected an old box Pentium III,
and I'm going to give it a try to see if it fails with the current
kernel too.

-- Steve

2014-04-04 13:39:08

by Li, Aubrey

[permalink] [raw]
Subject: Re: [tip:x86/reboot] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

<cc to adam>

On 2014/4/4 16:00, Ingo Molnar wrote:
>
>>
>> [PATCH] x86: Try the BIOS reboot method before the PCI reboot method
>
> So, JFYI, I have put this into x86/reboot to get it tested, but it's
> not queued up for upstream yet until there's concensus.
>
> (Steve testing it wouldn't hurt either.)
>
> Thanks,
>
> Ingo

The change is originated from the following bug:
https://bugzilla.kernel.org/show_bug.cgi?id=70931

The system will hang after BIOS or TRIPLE if it doesn't work, that's why
I added CF9 before BIOS and TRIPLE.

Adam may want to try this change to see if reboot works on his DELL
Venue 8 Pro.

I don't think there is enough justification to revert the patch. We
introduce EFI into the default list, we didn't see any reason to remove
it out so far.

Thanks,
-Aubrey


2014-04-04 15:09:55

by Adam Williamson

[permalink] [raw]
Subject: Re: [tip:x86/reboot] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

On Fri, 2014-04-04 at 21:38 +0800, Li, Aubrey wrote:
> <cc to adam>
>
> On 2014/4/4 16:00, Ingo Molnar wrote:
> >
> >>
> >> [PATCH] x86: Try the BIOS reboot method before the PCI reboot method
> >
> > So, JFYI, I have put this into x86/reboot to get it tested, but it's
> > not queued up for upstream yet until there's concensus.
> >
> > (Steve testing it wouldn't hurt either.)
> >
> > Thanks,
> >
> > Ingo
>
> The change is originated from the following bug:
> https://bugzilla.kernel.org/show_bug.cgi?id=70931
>
> The system will hang after BIOS or TRIPLE if it doesn't work, that's why
> I added CF9 before BIOS and TRIPLE.
>
> Adam may want to try this change to see if reboot works on his DELL
> Venue 8 Pro.
>
> I don't think there is enough justification to revert the patch. We
> introduce EFI into the default list, we didn't see any reason to remove
> it out so far.

AIUI the bottom line for Baytrail-y devices is that they'll be OK so
long as EFI is tried before anything which causes a hang, right?

I can try patches out for sure and confirm, but it'd be good to have a
definite candidate. Are we considering the first patch alone -
https://patchwork.kernel.org/patch/3937101/ - or also the second patch -
https://patchwork.kernel.org/patch/3937091/ ?
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net

2014-04-04 15:12:28

by Matthew Garrett

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Fri, Apr 04, 2014 at 09:27:48AM +0800, Li, Aubrey wrote:

> The current situation is,
> - we have one(do we know more?) preproduction machine hangs by CF9.
> - We have more than one(could be thousand known) production machine
> works by CF9.

Production hardware should never require CF9.

--
Matthew Garrett | [email protected]

2014-04-04 15:14:11

by Matthew Garrett

[permalink] [raw]
Subject: Re: [tip:x86/reboot] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

On Fri, Apr 04, 2014 at 09:38:57PM +0800, Li, Aubrey wrote:

> I don't think there is enough justification to revert the patch. We
> introduce EFI into the default list, we didn't see any reason to remove
> it out so far.

We should be using the EFI method, yes. But it seems that CF9 isn't
safe, so we should drop that.

--
Matthew Garrett | [email protected]

2014-04-04 15:14:26

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On 04/04/2014 08:12 AM, Matthew Garrett wrote:
> On Fri, Apr 04, 2014 at 09:27:48AM +0800, Li, Aubrey wrote:
>
>> The current situation is,
>> - we have one(do we know more?) preproduction machine hangs by CF9.
>> - We have more than one(could be thousand known) production machine
>> works by CF9.
>
> Production hardware should never require CF9.
>

There are a lot of things that shouldn't be.

-hpa

2014-04-04 15:29:28

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [tip:x86/reboot] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

On 04/04/2014 08:13 AM, Matthew Garrett wrote:
> On Fri, Apr 04, 2014 at 09:38:57PM +0800, Li, Aubrey wrote:
>
>> I don't think there is enough justification to revert the patch. We
>> introduce EFI into the default list, we didn't see any reason to remove
>> it out so far.
>
> We should be using the EFI method, yes. But it seems that CF9 isn't
> safe, so we should drop that.
>

As I said, I'm willing to hear about production hardware having problems
before killing it off, but we should dig into the history of the past
removal.

-hpa

2014-04-04 15:36:55

by Matthew Garrett

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Fri, Apr 04, 2014 at 08:13:48AM -0700, H. Peter Anvin wrote:
> On 04/04/2014 08:12 AM, Matthew Garrett wrote:
> > On Fri, Apr 04, 2014 at 09:27:48AM +0800, Li, Aubrey wrote:
> >
> >> The current situation is,
> >> - we have one(do we know more?) preproduction machine hangs by CF9.
> >> - We have more than one(could be thousand known) production machine
> >> works by CF9.
> >
> > Production hardware should never require CF9.
> >
>
> There are a lot of things that shouldn't be.

Windows doesn't hit CF9, and production hardware is always tested with
Windows, so. Adding CF9 may make things work in some situations but it
clearly breaks them in others - we'd be better off spending our time
figuring out why systems that appear to need CF9 won't otherwise work.

--
Matthew Garrett | [email protected]

2014-04-04 15:38:46

by Linus Torvalds

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Fri, Apr 4, 2014 at 8:12 AM, Matthew Garrett <[email protected]> wrote:
>
> Production hardware should never require CF9.

That's total BS.

The fact is, we may be doing something wrong, but ACPI fails on a
*lot* of systems. A huge swath of Dell machines in particular for some
reason (laptops, desktops, _and_ now there's tablet reports).

And EFI isn't even in the running.

The keyboard controller is sadly unreliable too, although I really
don't understand why. Even when a legacy keyboard controller exists
(which isn't as universal as you'd think, even though the *hardware*
is pretty much guaranteed to be there in the chipset, it can be
disabled) there seem to be machines where the reset line isn't hooked
up. Don't ask me why. Same goes for the triple fault failure case.

End result: there's a *lot* of machines that seem to want the PCI or
legacy BIOS reboot. And it has absolutely _zero_ to do with
"production hardware".

It would be interesting if somebody can figure out *exactly* what
Windows does, because the fact that a lot of Dell machines need quirks
almost certainly means that it's _us_ doing something wrong. Dell
doesn't generally do lots of fancy odd things. I pretty much guarantee
it's because we've done something odd that Windows doesn't do.

Anyway, claiming that the PCI method shouldn't be needed is living in
some kind of dream-land. The fact is, all the other methods are
equally broken, or more so.

Linus

2014-04-04 15:40:20

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On 04/04/2014 08:36 AM, Matthew Garrett wrote:
> On Fri, Apr 04, 2014 at 08:13:48AM -0700, H. Peter Anvin wrote:
>> On 04/04/2014 08:12 AM, Matthew Garrett wrote:
>>> On Fri, Apr 04, 2014 at 09:27:48AM +0800, Li, Aubrey wrote:
>>>
>>>> The current situation is,
>>>> - we have one(do we know more?) preproduction machine hangs by CF9.
>>>> - We have more than one(could be thousand known) production machine
>>>> works by CF9.
>>>
>>> Production hardware should never require CF9.
>>>
>>
>> There are a lot of things that shouldn't be.
>
> Windows doesn't hit CF9, and production hardware is always tested with
> Windows, so. Adding CF9 may make things work in some situations but it
> clearly breaks them in others - we'd be better off spending our time
> figuring out why systems that appear to need CF9 won't otherwise work.
>

There are several platforms where the default boot method invokes an SMM
handler which blithly assumes that the hardware is in whatever state
Windows leaves it in, and hangs otherwise. This seems to include nearly
all Dell systems. So it isn't really quite that simple...

-hpa

2014-04-04 15:45:58

by Matthew Garrett

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Fri, Apr 04, 2014 at 08:38:42AM -0700, Linus Torvalds wrote:
> On Fri, Apr 4, 2014 at 8:12 AM, Matthew Garrett <[email protected]> wrote:
> >
> > Production hardware should never require CF9.
>
> That's total BS.
>
> The fact is, we may be doing something wrong, but ACPI fails on a
> *lot* of systems. A huge swath of Dell machines in particular for some
> reason (laptops, desktops, _and_ now there's tablet reports).

Which is almost certainly because the other reboot methods are trapping
into SMI and hitting some hardware that we've left in a different state
to Windows. CF9 may work around that, but the actual fix is to figure
out why the firmware is wedging and fix it. Otherwise we're going to
spend the rest of our lives maintaining a giant DMI list that's still
going to be missing entries and users are going to be sad.

> The keyboard controller is sadly unreliable too, although I really
> don't understand why. Even when a legacy keyboard controller exists
> (which isn't as universal as you'd think, even though the *hardware*
> is pretty much guaranteed to be there in the chipset, it can be
> disabled) there seem to be machines where the reset line isn't hooked
> up. Don't ask me why. Same goes for the triple fault failure case.

See: SMI. Or in the triple fault case, because there's some early init
code that has the same issue. As far as I can tell Windows never triple
faults, so again I think this is our fault at some level.

> It would be interesting if somebody can figure out *exactly* what
> Windows does, because the fact that a lot of Dell machines need quirks
> almost certainly means that it's _us_ doing something wrong. Dell
> doesn't generally do lots of fancy odd things. I pretty much guarantee
> it's because we've done something odd that Windows doesn't do.

Windows hits the keyboard controller and then tries the ACPI vector. It
then sleeps for a short period, then tries the keyboard controller again
and the ACPI vector again. This means that systems which put cf9 in the
ACPI vector tend to work because of the second write, which is obviously
not what the spec envisaged but here we are. The only time it hits CF9
is when the ACPI tables tell it to.

--
Matthew Garrett | [email protected]

2014-04-04 15:46:17

by Matthew Garrett

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Fri, Apr 04, 2014 at 08:39:51AM -0700, H. Peter Anvin wrote:
>
> There are several platforms where the default boot method invokes an SMM
> handler which blithly assumes that the hardware is in whatever state
> Windows leaves it in, and hangs otherwise. This seems to include nearly
> all Dell systems. So it isn't really quite that simple...

That's the story of all firmware, ever.

--
Matthew Garrett | [email protected]

2014-04-04 16:01:12

by Steven Rostedt

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Fri, 04 Apr 2014 08:39:51 -0700
"H. Peter Anvin" <[email protected]> wrote:


> There are several platforms where the default boot method invokes an SMM
> handler which blithly assumes that the hardware is in whatever state
> Windows leaves it in, and hangs otherwise. This seems to include nearly
> all Dell systems. So it isn't really quite that simple...

Just an update. I pulled out of my closet an old Pentium III and got it
to finally boot the new kernel. It reboots fine, but as I left the
printk patch in, it didn't get to the cf9 usage. I take it, it can
detect the level. And perhaps the issues is only with x86_64?

I have no problem with sticking in reboot=b in my development box. I
just wanted people to know that I had issues.

Maybe keep the current way and on reboot, have a pr_info() stating, if
your system doesn't reboot try "reboot=b" on the kernel command line?

-- Steve

2014-04-04 16:03:50

by Tobias Klausmann

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot


On 04.04.2014 17:45, Matthew Garrett wrote:
> On Fri, Apr 04, 2014 at 08:38:42AM -0700, Linus Torvalds wrote:
>> On Fri, Apr 4, 2014 at 8:12 AM, Matthew Garrett <[email protected]> wrote:
>>> Production hardware should never require CF9.
>> That's total BS.
>>
>> The fact is, we may be doing something wrong, but ACPI fails on a
>> *lot* of systems. A huge swath of Dell machines in particular for some
>> reason (laptops, desktops, _and_ now there's tablet reports).
> Which is almost certainly because the other reboot methods are trapping
> into SMI and hitting some hardware that we've left in a different state
> to Windows. CF9 may work around that, but the actual fix is to figure
> out why the firmware is wedging and fix it. Otherwise we're going to
> spend the rest of our lives maintaining a giant DMI list that's still
> going to be missing entries and users are going to be sad.
>
>> The keyboard controller is sadly unreliable too, although I really
>> don't understand why. Even when a legacy keyboard controller exists
>> (which isn't as universal as you'd think, even though the *hardware*
>> is pretty much guaranteed to be there in the chipset, it can be
>> disabled) there seem to be machines where the reset line isn't hooked
>> up. Don't ask me why. Same goes for the triple fault failure case.
> See: SMI. Or in the triple fault case, because there's some early init
> code that has the same issue. As far as I can tell Windows never triple
> faults, so again I think this is our fault at some level.
>
>> It would be interesting if somebody can figure out *exactly* what
>> Windows does, because the fact that a lot of Dell machines need quirks
>> almost certainly means that it's _us_ doing something wrong. Dell
>> doesn't generally do lots of fancy odd things. I pretty much guarantee
>> it's because we've done something odd that Windows doesn't do.
> Windows hits the keyboard controller and then tries the ACPI vector. It
> then sleeps for a short period, then tries the keyboard controller again
> and the ACPI vector again. This means that systems which put cf9 in the
> ACPI vector tend to work because of the second write, which is obviously
> not what the spec envisaged but here we are. The only time it hits CF9
> is when the ACPI tables tell it to.
>

Hi,
sorry to get into the discussion at this random point, not at the
starting point, but with the latest Linus Tree my system needs several
minutes to reboot instead of some seconds, so this may be related. If I
can help in any way, let me konw!

Thanks,
Tobias Klausmann

2014-04-04 16:09:40

by Linus Torvalds

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Fri, Apr 4, 2014 at 8:45 AM, Matthew Garrett <[email protected]> wrote:
>
> Which is almost certainly because the other reboot methods are trapping
> into SMI and hitting some hardware that we've left in a different state
> to Windows.

Why are you making up these completely invalid arguments? Because you
are making them up.

Our default reboot type is REBOOT_ACPI. That's the one we try *first*.
There are no "other reboot methods" playing games.

And it fails on tons of machines. It shouldn't be failing, but it is.
We are doing something sufficiently different from Windows that it
doesn't work. At some point there was some talk about having enabled
VT-d at boot but not disabled it before reboot, but that's just one
theory.

And given this *fact*, your denial that "PCI reboot should never be
used" is counterfactual. It may be true in some theoretical "this is
how the world should work" universe, but in the real world it is just
BS.

Why are you so deep in denial about this?

I'd love to fix whatever environment issue it is that causes this.
Last time people came up with situations, I offered to test patches,
because I have access to one of the failing Dell machines. Crickets.

Linus

2014-04-04 16:21:24

by Matthew Garrett

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Fri, Apr 04, 2014 at 09:09:32AM -0700, Linus Torvalds wrote:
> On Fri, Apr 4, 2014 at 8:45 AM, Matthew Garrett <[email protected]> wrote:
> >
> > Which is almost certainly because the other reboot methods are trapping
> > into SMI and hitting some hardware that we've left in a different state
> > to Windows.
>
> Why are you making up these completely invalid arguments? Because you
> are making them up.
>
> Our default reboot type is REBOOT_ACPI. That's the one we try *first*.
> There are no "other reboot methods" playing games.

We try ACPI. That will sometimes work, but infrequently. We then try the
keyboard controller. That will generally work. We then try ACPI again,
which will typically work because it's often now the second write to
CF9. We then try the keyboard controller again, because that's what
Windows does. The machine should now have rebooted.

But *any* of those accesses could have generated an SMI. For all we know
the firmware is running huge quantities of code in response to any of
those register accesses. We don't know what other hardware that code
touches. We don't know what expectations it has. We don't know whether
it was written by humans or written by some sort of simulated annealing
mechanism that finally collapsed into a state where Windows rebooted and
then shipped (or even humans behaving indistinguishably from a simulated
annealing mechanism).

> And given this *fact*, your denial that "PCI reboot should never be
> used" is counterfactual. It may be true in some theoretical "this is
> how the world should work" universe, but in the real world it is just
> BS.
>
> Why are you so deep in denial about this?

Because Windows doesn't use CF9 but machines reboot anyway. That
shouldn't be a controversial point of view. We know that CF9 fixes some
machines. We know that it breaks some machines. We don't know how many
machines it fixes or how many machines it breaks. We don't know how many
machines are flipped from a working state to a broken state whenever we
fiddle with the order or introduce new heuristics. We don't know how
many go from broken to working. The only way we can be reasonably
certain that hardware will work is to duplicate precisely what Windows
does, because that's all that most vendors will ever have tested.

The problem is that while we may know exactly what Windows does in terms
of actually triggering the reboot, we don't know everything else it does
on the shutdown path and it's difficult to instrument things like
embedded controller accesses when qemu doesn't emulate one. I'll contact
the people I know in Dell and see if I can find anyone from the firmware
division who'll actually talk to us.

--
Matthew Garrett | [email protected]

2014-04-04 16:32:11

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On 04/04/2014 08:45 AM, Matthew Garrett wrote:
>
> Windows hits the keyboard controller and then tries the ACPI vector. It
> then sleeps for a short period, then tries the keyboard controller again
> and the ACPI vector again.
>

So do you think we should move the KBC upward in our priority list?
That would seem to be the obvious thing to do...

-hpa

2014-04-04 17:34:37

by Ingo Molnar

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot


* Matthew Garrett <[email protected]> wrote:

> On Fri, Apr 04, 2014 at 08:13:48AM -0700, H. Peter Anvin wrote:
> > On 04/04/2014 08:12 AM, Matthew Garrett wrote:
> > > On Fri, Apr 04, 2014 at 09:27:48AM +0800, Li, Aubrey wrote:
> > >
> > >> The current situation is,
> > >> - we have one(do we know more?) preproduction machine hangs by CF9.
> > >> - We have more than one(could be thousand known) production machine
> > >> works by CF9.
> > >
> > > Production hardware should never require CF9.
> > >
> >
> > There are a lot of things that shouldn't be.
>
> Windows doesn't hit CF9, and production hardware is always tested with
> Windows, so. [...]

So why the hell does the reboot function comment claim that the
Windows reboot sequence (which is the de facto hardware standard)
uses 0xcf9:

/*
* Windows compatible x86 hardware expects the following on reboot:
*
* 1) If the FADT has the ACPI reboot register flag set, try it
* 2) If still alive, write to the keyboard controller
* 3) If still alive, write to the ACPI reboot register again
* 4) If still alive, write to the keyboard controller again
* 5) If still alive, call the EFI runtime service to reboot
* 6) If still alive, write to the PCI IO port 0xCF9 to reboot
* 7) If still alive, inform BIOS to do a proper reboot
*

??

Thanks,

Ingo

2014-04-04 17:37:38

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

The comment header is bogus... it describes what we do, not what Windows does.

On April 4, 2014 10:34:31 AM PDT, Ingo Molnar <[email protected]> wrote:
>
>* Matthew Garrett <[email protected]> wrote:
>
>> On Fri, Apr 04, 2014 at 08:13:48AM -0700, H. Peter Anvin wrote:
>> > On 04/04/2014 08:12 AM, Matthew Garrett wrote:
>> > > On Fri, Apr 04, 2014 at 09:27:48AM +0800, Li, Aubrey wrote:
>> > >
>> > >> The current situation is,
>> > >> - we have one(do we know more?) preproduction machine hangs by
>CF9.
>> > >> - We have more than one(could be thousand known) production
>machine
>> > >> works by CF9.
>> > >
>> > > Production hardware should never require CF9.
>> > >
>> >
>> > There are a lot of things that shouldn't be.
>>
>> Windows doesn't hit CF9, and production hardware is always tested
>with
>> Windows, so. [...]
>
>So why the hell does the reboot function comment claim that the
>Windows reboot sequence (which is the de facto hardware standard)
>uses 0xcf9:
>
>/*
> * Windows compatible x86 hardware expects the following on reboot:
> *
> * 1) If the FADT has the ACPI reboot register flag set, try it
> * 2) If still alive, write to the keyboard controller
> * 3) If still alive, write to the ACPI reboot register again
> * 4) If still alive, write to the keyboard controller again
> * 5) If still alive, call the EFI runtime service to reboot
> * 6) If still alive, write to the PCI IO port 0xCF9 to reboot
> * 7) If still alive, inform BIOS to do a proper reboot
> *
>
>??
>
>Thanks,
>
> Ingo

--
Sent from my mobile phone. Please pardon brevity and lack of formatting.

2014-04-04 17:44:43

by Ingo Molnar

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot


* H. Peter Anvin <[email protected]> wrote:

> The comment header is bogus... it describes what we do, not what
> Windows does.

That comment certainly pretends to describe Windows behavior and then
it goes to outline the differences in Linux behavior:

/*
* Windows compatible x86 hardware expects the following on reboot:
*
* 1) If the FADT has the ACPI reboot register flag set, try it
* 2) If still alive, write to the keyboard controller
* 3) If still alive, write to the ACPI reboot register again
* 4) If still alive, write to the keyboard controller again
* 5) If still alive, call the EFI runtime service to reboot
* 6) If still alive, write to the PCI IO port 0xCF9 to reboot
* 7) If still alive, inform BIOS to do a proper reboot
*
* If the machine is still alive at this stage, it gives up. We default to
* following the same pattern, except that if we're still alive after (7) we'll
* try to force a triple fault and then cycle between hitting the keyboard
* controller and doing that
*/

(The 'we default' means 'Linux defaults'.)

So that comment should be fixed to describe the hardware ABI (Windows
behavior), to the extent it is known - and then outline where and why
Linux deviates from that. (and preferably it should match Windows
behavior and only _add_ to that behavior at the end of the reboot
sequence.)

Before any other change is done beyond the revert of this latest
broken commit.

Thanks,

Ingo

2014-04-04 17:48:49

by Ingo Molnar

[permalink] [raw]
Subject: Re: [tip:x86/reboot] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method


* Li, Aubrey <[email protected]> wrote:

> <cc to adam>
>
> On 2014/4/4 16:00, Ingo Molnar wrote:
> >
> >>
> >> [PATCH] x86: Try the BIOS reboot method before the PCI reboot method
> >
> > So, JFYI, I have put this into x86/reboot to get it tested, but it's
> > not queued up for upstream yet until there's concensus.
> >
> > (Steve testing it wouldn't hurt either.)
> >
> > Thanks,
> >
> > Ingo
>
> The change is originated from the following bug:
> https://bugzilla.kernel.org/show_bug.cgi?id=70931
>
> The system will hang after BIOS or TRIPLE if it doesn't work, that's
> why I added CF9 before BIOS and TRIPLE.
>
> Adam may want to try this change to see if reboot works on his DELL
> Venue 8 Pro.
>
> I don't think there is enough justification to revert the patch.
> [...]

'Breaks a box' is more than enough justification to revert a patch...

You can try to fix boxes, but only if you can do it without breaking
others.

Thanks,

Ingo

2014-04-04 18:01:08

by Matthew Garrett

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Fri, Apr 04, 2014 at 07:44:37PM +0200, Ingo Molnar wrote:
>
> Before any other change is done beyond the revert of this latest
> broken commit.

The commit in question was the one that added the misleading text to the
comment. I'm sorry, I should have caught that.

--
Matthew Garrett | [email protected]

2014-04-04 18:19:32

by Linus Torvalds

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Fri, Apr 4, 2014 at 9:21 AM, Matthew Garrett <[email protected]> wrote:
>
> Because Windows doesn't use CF9 but machines reboot anyway. That
> shouldn't be a controversial point of view.

No, that's not a controversial view.

But it's not the case for us, and I haven't seen a patch to make us
work more like Windows, so for *us*, and in this universe (rather than
some make-believe one), the fact that we need to use pci to reboot is
simply a fact. Including, very much, on "production hardware".

So I repeat: I'd love to fix whatever it is that we do that is
different and triggers problems (particularly on Dell, but there are
certainly other vendors too). But in the absense of that (and I
haven't actually seen a serious patch for it), denying that reboot=pci
is sometimes the right thing - regardless of "production" hardware or
not is not valid. That's what I reacted to.

Linus

2014-04-04 18:26:41

by Linus Torvalds

[permalink] [raw]
Subject: Re: [BUG] x86: reboot doesn't reboot

On Fri, Apr 4, 2014 at 9:31 AM, H. Peter Anvin <[email protected]> wrote:
> On 04/04/2014 08:45 AM, Matthew Garrett wrote:
>>
>> Windows hits the keyboard controller and then tries the ACPI vector. It
>> then sleeps for a short period, then tries the keyboard controller again
>> and the ACPI vector again.
>>
>
> So do you think we should move the KBC upward in our priority list?
> That would seem to be the obvious thing to do...

Not likely to help. And certainly not Steve's problem. For him, moving
the *BIOS* reboot up would help.

I do think that to fix Steve's issue, we should probably remove the
PCI from the list of defaults again. So I agree with Matthew in that
sense, I'm just arguing that real production hardware still (sadly)
very much does need it.

Linus

2014-04-04 19:12:29

by Linus Torvalds

[permalink] [raw]
Subject: Re: [tip:x86/reboot] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

On Fri, Apr 4, 2014 at 10:48 AM, Ingo Molnar <[email protected]> wrote:
>
> 'Breaks a box' is more than enough justification to revert a patch...

Indeed. That's especially true when "breaks a box" was found within a
day of the patch being merged.

The default thinking should be: "If we found _one_ box that broke
during the merge window, that probably means that there are at least
ten thousand boxes that would break if the change actually hit a major
distribution kernel".

Yes, developer boxes are sometimes "special". But the fact is, I doubt
that's the case. Not when this kind of breakage (oops, reboot doesn't
work) is actually not all that unusual. The quirk entries we have for
it are *not* developer boxes.

Linus

2014-04-06 17:40:57

by Alan Cox

[permalink] [raw]
Subject: Re: [tip:x86/reboot] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

On Fri, 4 Apr 2014 16:13:59 +0100
Matthew Garrett <[email protected]> wrote:

> On Fri, Apr 04, 2014 at 09:38:57PM +0800, Li, Aubrey wrote:
>
> > I don't think there is enough justification to revert the patch. We
> > introduce EFI into the default list, we didn't see any reason to remove
> > it out so far.
>
> We should be using the EFI method, yes. But it seems that CF9 isn't
> safe, so we should drop that.
>

Windows does not use CF9 and there are known, documented cases where CF9
will not work reliably. That includes reliable reboot on some Baytrail
systems. (Erratum VLT60 and VLT62).

So we really shouldn't be doing 0xCF9 except as a last resort if we are
still alive and have been through the official reboot methods.

Alan

2014-04-06 18:41:01

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [tip:x86/reboot] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

On 04/06/2014 10:40 AM, One Thousand Gnomes wrote:
> On Fri, 4 Apr 2014 16:13:59 +0100
> Matthew Garrett <[email protected]> wrote:
>
>> On Fri, Apr 04, 2014 at 09:38:57PM +0800, Li, Aubrey wrote:
>>
>>> I don't think there is enough justification to revert the patch. We
>>> introduce EFI into the default list, we didn't see any reason to remove
>>> it out so far.
>>
>> We should be using the EFI method, yes. But it seems that CF9 isn't
>> safe, so we should drop that.
>>
>
> Windows does not use CF9 and there are known, documented cases where CF9
> will not work reliably. That includes reliable reboot on some Baytrail
> systems. (Erratum VLT60 and VLT62).
>
> So we really shouldn't be doing 0xCF9 except as a last resort if we are
> still alive and have been through the official reboot methods.
>

No question. The question at hand is if we should do it after all other
non-terminal (BIOS, triple) methods have been tried.

-hpa

2014-04-07 08:01:07

by Li, Aubrey

[permalink] [raw]
Subject: Re: [tip:x86/reboot] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

On 2014/4/7 2:40, H. Peter Anvin wrote:
>
> No question. The question at hand is if we should do it after all other
> non-terminal (BIOS, triple) methods have been tried.
>

The reboot sequence before the change is:
(1) ACPI
(2) KEYBOARD
(3) ACPI
(4) KEYBOARD
(5) TRIPLE

The reboot sequence after the change is:
(1) ACPI
(2) KEYBOARD
(3) ACPI
(4) KEYBOARD
(ADD_1) EFI
(ADD_2) CF9
(ADD_3) BIOS
(5) TRIPLE

Steven's machine exactly hit (5), while CF9 hangs, so we encountered
this regression.

EFI is no question. So, Is everybody okay with the following sequence?
(1) ACPI
(2) KEYBOARD
(3) ACPI
(4) KEYBOARD
(ADD_1) EFI
(5) TRIPLE
(ADD_2) CF9
(ADD_3) BIOS

Thanks,
-Aubrey

2014-04-07 15:42:40

by Steven Rostedt

[permalink] [raw]
Subject: Re: [tip:x86/reboot] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

On Fri, 4 Apr 2014 10:00:07 +0200
Ingo Molnar <[email protected]> wrote:

>
> * tip-bot for Ingo Molnar <[email protected]> wrote:
>
> > Commit-ID: d122ab8d224a59aa0c3e4ba3540c6ab99556c3e3
> > Gitweb: http://git.kernel.org/tip/d122ab8d224a59aa0c3e4ba3540c6ab99556c3e3
> > Author: Ingo Molnar <[email protected]>
> > AuthorDate: Fri, 4 Apr 2014 08:41:26 +0200
> > Committer: Ingo Molnar <[email protected]>
> > CommitDate: Fri, 4 Apr 2014 08:48:03 +0200
> >
> > [PATCH] x86: Try the BIOS reboot method before the PCI reboot method
>
> So, JFYI, I have put this into x86/reboot to get it tested, but it's
> not queued up for upstream yet until there's concensus.
>
> (Steve testing it wouldn't hurt either.)

Tested, and with this patch it reboots fine.

I booted without the patch, and it failed to reboot.

I added the patch, booted it, checked if I had the right kernel with
uname, and then rebooted. It rebooted fine.

Went back, removed the patch, rebuilt and rebooted. Tried to reboot,
and it failed again.

Tested-by: Steven Rostedt <[email protected]>

-- Steve

2014-04-07 21:04:13

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [tip:x86/reboot] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

On 04/07/2014 01:00 AM, Li, Aubrey wrote:
>
> EFI is no question. So, Is everybody okay with the following sequence?
> (1) ACPI
> (2) KEYBOARD
> (3) ACPI
> (4) KEYBOARD
> (ADD_1) EFI
> (5) TRIPLE
> (ADD_2) CF9
> (ADD_3) BIOS
>

No. Because after "triple" there is NOTHING. Putting anything after it
is a joke.

How many times do I have to explain this to how many people? If
"triple" or "BIOS" doesn't work, there is no trying again... the machine
is dead.

-hpa

2014-04-07 22:05:18

by Adam Williamson

[permalink] [raw]
Subject: Re: [tip:x86/reboot] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

On Mon, 2014-04-07 at 14:03 -0700, H. Peter Anvin wrote:
> On 04/07/2014 01:00 AM, Li, Aubrey wrote:
> >
> > EFI is no question. So, Is everybody okay with the following sequence?
> > (1) ACPI
> > (2) KEYBOARD
> > (3) ACPI
> > (4) KEYBOARD
> > (ADD_1) EFI
> > (5) TRIPLE
> > (ADD_2) CF9
> > (ADD_3) BIOS
> >
>
> No. Because after "triple" there is NOTHING. Putting anything after it
> is a joke.
>
> How many times do I have to explain this to how many people? If
> "triple" or "BIOS" doesn't work, there is no trying again... the machine
> is dead.

So if I'm following correctly, we should be able to sort the methods
into three buckets:

1) known to (almost) always be 'safe'
2) may cause system freeze if they fail
3) definitely cause system freeze if they fail

We put the methods from bucket 1) first, in whichever order makes the
most sense. Then we put any methods from bucket 2), in an order derived
from some kind of likelihood of success / likelihood of hang on fail
calculation. And at the end we can put precisely *one* method from
bucket 3, which should obviously be the one most likely to succeed on
the most systems which haven't already worked with one of the other
methods. We can't have more than one bucket 3) method, it just doesn't
make sense.

It sounds like at least TRIPLE and BIOS are 'bucket 3' methods, so it
seems like we have to pick which of the two we think is most useful, put
it last, and not have the other one. Of the new methods it sounds like
EFI is 'bucket 1', BIOS is 'bucket 3', and CF9 is 'bucket 2'.

so, it sounds like...

1) ACPI
2) KEYBOARD
3) ACPI
4) KEYBOARD
5) EFI
possibly 6) CF9
7) TRIPLE *or* BIOS

is what you would say makes sense, right? And really all there is to
decide is whether to include CF9, and whether to put TRIPLE or BIOS at
the end?

Again if I'm following correctly, this ought to be fine for Baytrail
devices - the ones we were trying to fix in the first place - because
they'll succeed with EFI. The justification for adding CF9 in the first
place wasn't very strong - it was just thrown in as a 'bonus extra',
more or less:

"These platforms [Baytrail] don't support ACPI reboot, we expect to call
EFI runtime service to handle this case, and CF9 is an alternate after
EFI."

so it doesn't seem unreasonable to remove it. The machine we're
currently trying to 'fix' needed TRIPLE in order to reboot correctly, it
seems. So again if I'm following correctly, that would suggest we should
have:

1) ACPI
2) KEYBOARD
3) ACPI
4) KEYBOARD
5) EFI
6) TRIPLE

I can patch that into reboot.c and confirm baytrails still work, if
that's desired.
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net

2014-04-07 22:09:55

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [tip:x86/reboot] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

On 04/07/2014 03:04 PM, Adam Williamson wrote:
>
> So if I'm following correctly, we should be able to sort the methods
> into three buckets:
>
> 1) known to (almost) always be 'safe'
> 2) may cause system freeze if they fail
> 3) definitely cause system freeze if they fail
>
> We put the methods from bucket 1) first, in whichever order makes the
> most sense. Then we put any methods from bucket 2), in an order derived
> from some kind of likelihood of success / likelihood of hang on fail
> calculation. And at the end we can put precisely *one* method from
> bucket 3, which should obviously be the one most likely to succeed on
> the most systems which haven't already worked with one of the other
> methods. We can't have more than one bucket 3) method, it just doesn't
> make sense.
>

Nice idea, but wrong. The problem is that you have to consider the
probability of success vs failure of the type 2 & 3 methods, and what is
a regression or not.

> It sounds like at least TRIPLE and BIOS are 'bucket 3' methods, so it
> seems like we have to pick which of the two we think is most useful, put
> it last, and not have the other one. Of the new methods it sounds like
> EFI is 'bucket 1', BIOS is 'bucket 3', and CF9 is 'bucket 2'.
>
> so, it sounds like...
>
> 1) ACPI
> 2) KEYBOARD
> 3) ACPI
> 4) KEYBOARD
> 5) EFI
> possibly 6) CF9
> 7) TRIPLE *or* BIOS
>
> is what you would say makes sense, right? And really all there is to
> decide is whether to include CF9, and whether to put TRIPLE or BIOS at
> the end?
>

Yep. And it has now been shown again that CF9 just isn't safe. We do
TRIPLE as the ultimate fallback now.

-hpa

2014-04-07 23:13:17

by Adam Williamson

[permalink] [raw]
Subject: Re: [tip:x86/reboot] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

On Mon, 2014-04-07 at 15:09 -0700, H. Peter Anvin wrote:

> > so, it sounds like...
> >
> > 1) ACPI
> > 2) KEYBOARD
> > 3) ACPI
> > 4) KEYBOARD
> > 5) EFI
> > possibly 6) CF9
> > 7) TRIPLE *or* BIOS
> >
> > is what you would say makes sense, right? And really all there is to
> > decide is whether to include CF9, and whether to put TRIPLE or BIOS at
> > the end?
> >
>
> Yep. And it has now been shown again that CF9 just isn't safe. We do
> TRIPLE as the ultimate fallback now.

OK. So the conservative option here - compared to the state before
Aubrey's patch, not the current state - is to simply add 5) EFI, go back
to TRIPLE instead of BIOS at the end of the list, and leave it at that.
We expect that would fix Baytrails.

But if we look at the message on Aubrey's commit:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kernel/reboot.c?id=a4f1987e4c5489a3877eaa7451a68d28c5a3f664

it mentions:

"If one method required is in the default list in this patch but the
machine reboot still hangs, that means some methods ahead of the
required method cause the system hangs, then reboot the machine by
passing reboot= arguments and submit the reboot dmidecode table quirk."

basically, we know that we probably can't come up with a standard
fallback chain that will work for *every* system, and the goal instead
is to come up with the approach that requires the *fewest* quirks.

Again just the monkey here, but I think this is right:

'CF9' = 'reboot=p' = 'set_pci_reboot'
'BIOS' = 'reboot=bios' = 'set_bios_reboot'

if so, according to the existing quirk table, we know of 17 cases which
need CF9 and 11 which need BIOS, and are therefore not served by the
previous fallback chain. So far after making the change, AFAIK, we've
heard of one case which does not work with ACPI or KBD, hangs with CF9,
but works with TRIPLE. It would need a quirk for any order which
includes CF9, but that's still 16 quirks fewer than we're currently
carrying for systems which need CF9.

Peter suggested going back and looking at the previous attempt to add
CF9 to the fallback chain. I just did that. Here's the relevant commits:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kernel/reboot.c?id=b47b92884212008b4bd044ba6b48b93c00b10ec6
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kernel/reboot.c?id=3889d0cea2b73049bdca062d9ff1e5d33468289c

the relevant text:

"Unfortunately this has been shown to cause lockups on at least two
systems for which REBOOT_KBD worked, both Thinkpads with Intel
chipsets."

those systems would, it seems, be OK with the proposed order:

ACPI
KBD
ACPI
KBD
EFI
CF9
TRIPLE
(rinse and repeat)

because KBD gets hit before CF9. So we at least don't have any prior
evidence that adding CF9 *after* KBD will break systems other than the
one found so far, AFAICS.

The question of whether the 'last resort' should be TRIPLE or BIOS is
presumably similarly data driven - would we wind up with more quirks to
use TRIPLE than we currently have to use BIOS? Always hard to answer
those questions, though.

Do we assume that finding a single system which would need a quirk to
avoid CF9 under the new chain means we'd quickly find another 16 and
hence be in a worse position than we are by leaving CF9 out? Or do we
give it a bit longer to see if more really do show up?
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net

2014-04-09 20:50:48

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [tip:x86/reboot] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

On 04/07/2014 04:12 PM, Adam Williamson wrote:
>
> basically, we know that we probably can't come up with a standard
> fallback chain that will work for *every* system, and the goal instead
> is to come up with the approach that requires the *fewest* quirks.
>

Not quite.

The problem is that we can't test everything, so part of it isn't just
about the amount of quirks but also our confidence level in them.

-hpa

Subject: [tip:x86/urgent] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

Commit-ID: f042310bf8a846bcf21012ffee78d9eb562a7fa4
Gitweb: http://git.kernel.org/tip/f042310bf8a846bcf21012ffee78d9eb562a7fa4
Author: Ingo Molnar <[email protected]>
AuthorDate: Fri, 4 Apr 2014 08:41:26 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Mon, 14 Apr 2014 13:13:30 +0200

[PATCH] x86: Try the BIOS reboot method before the PCI reboot method

Steve reported a reboot hang and bisected it back to this commit:

a4f1987e4c54 x86, reboot: Add EFI and CF9 reboot methods into the default list

He heroically tested all reboot methods and found the following:

reboot=t # triple fault ok
reboot=k # keyboard ctrl FAIL
reboot=b # BIOS ok
reboot=a # ACPI FAIL
reboot=e # EFI FAIL [system has no EFI]
reboot=p # PCI 0xcf9 FAIL

And I think it's pretty obvious that we should only try 0xcf9 as a
last resort - if at all.

The other observation is that (on this box) we should never try
the PCI reboot method, but close with either the 'triple fault'
or the 'BIOS' (terminal!) reboot methods.

Thirdly, CF9_COND is a total misnomer - it should be something like
CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ...

So this patch fixes the worst problems:

- it orders the actual reboot logic to follow the reboot ordering
pattern - it was in a pretty random order before for no good
reason.

- it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and
BOOT_CF9_SAFE flags to make the code more obvious.

- it tries the BIOS reboot method before the PCI reboot method.
(Since 'BIOS' is a terminal reboot method resulting in a hang
if it does not work, this is essentially equivalent to removing
the PCI reboot method from the default reboot chain.)

- just for the miraculous possibility of terminal (resulting
in hang) reboot methods of triple fault or BIOS returning
without having done their job, there's an ordering between
them as well.

Reported-and-bisected-and-tested-by: Steven Rostedt <[email protected]>
Cc: Li Aubrey <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Matthew Garrett <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/reboot.c | 72 +++++++++++++++++++++++++-----------------------
include/linux/reboot.h | 14 +++++-----
2 files changed, 44 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 654b465..3399d3a 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -114,8 +114,8 @@ EXPORT_SYMBOL(machine_real_restart);
*/
static int __init set_pci_reboot(const struct dmi_system_id *d)
{
- if (reboot_type != BOOT_CF9) {
- reboot_type = BOOT_CF9;
+ if (reboot_type != BOOT_CF9_FORCE) {
+ reboot_type = BOOT_CF9_FORCE;
pr_info("%s series board detected. Selecting %s-method for reboots.\n",
d->ident, "PCI");
}
@@ -458,20 +458,23 @@ void __attribute__((weak)) mach_reboot_fixups(void)
}

/*
- * Windows compatible x86 hardware expects the following on reboot:
+ * To the best of our knowledge Windows compatible x86 hardware expects
+ * the following on reboot:
*
* 1) If the FADT has the ACPI reboot register flag set, try it
* 2) If still alive, write to the keyboard controller
* 3) If still alive, write to the ACPI reboot register again
* 4) If still alive, write to the keyboard controller again
* 5) If still alive, call the EFI runtime service to reboot
- * 6) If still alive, write to the PCI IO port 0xCF9 to reboot
- * 7) If still alive, inform BIOS to do a proper reboot
+ * 6) If no EFI runtime service, call the BIOS to do a reboot
*
- * If the machine is still alive at this stage, it gives up. We default to
- * following the same pattern, except that if we're still alive after (7) we'll
- * try to force a triple fault and then cycle between hitting the keyboard
- * controller and doing that
+ * We default to following the same pattern. We also have
+ * two other reboot methods: 'triple fault' and 'PCI', which
+ * can be triggered via the reboot= kernel boot option or
+ * via quirks.
+ *
+ * This means that this function can never return, it can misbehave
+ * by not rebooting properly and hanging.
*/
static void native_machine_emergency_restart(void)
{
@@ -492,6 +495,11 @@ static void native_machine_emergency_restart(void)
for (;;) {
/* Could also try the reset bit in the Hammer NB */
switch (reboot_type) {
+ case BOOT_ACPI:
+ acpi_reboot();
+ reboot_type = BOOT_KBD;
+ break;
+
case BOOT_KBD:
mach_reboot_fixups(); /* For board specific fixups */

@@ -509,43 +517,29 @@ static void native_machine_emergency_restart(void)
}
break;

- case BOOT_TRIPLE:
- load_idt(&no_idt);
- __asm__ __volatile__("int3");
-
- /* We're probably dead after this, but... */
- reboot_type = BOOT_KBD;
- break;
-
- case BOOT_BIOS:
- machine_real_restart(MRR_BIOS);
-
- /* We're probably dead after this, but... */
- reboot_type = BOOT_TRIPLE;
- break;
-
- case BOOT_ACPI:
- acpi_reboot();
- reboot_type = BOOT_KBD;
- break;
-
case BOOT_EFI:
if (efi_enabled(EFI_RUNTIME_SERVICES))
efi.reset_system(reboot_mode == REBOOT_WARM ?
EFI_RESET_WARM :
EFI_RESET_COLD,
EFI_SUCCESS, 0, NULL);
- reboot_type = BOOT_CF9_COND;
+ reboot_type = BOOT_BIOS;
+ break;
+
+ case BOOT_BIOS:
+ machine_real_restart(MRR_BIOS);
+
+ /* We're probably dead after this, but... */
+ reboot_type = BOOT_CF9_SAFE;
break;

- case BOOT_CF9:
+ case BOOT_CF9_FORCE:
port_cf9_safe = true;
/* Fall through */

- case BOOT_CF9_COND:
+ case BOOT_CF9_SAFE:
if (port_cf9_safe) {
- u8 reboot_code = reboot_mode == REBOOT_WARM ?
- 0x06 : 0x0E;
+ u8 reboot_code = reboot_mode == REBOOT_WARM ? 0x06 : 0x0E;
u8 cf9 = inb(0xcf9) & ~reboot_code;
outb(cf9|2, 0xcf9); /* Request hard reset */
udelay(50);
@@ -553,7 +547,15 @@ static void native_machine_emergency_restart(void)
outb(cf9|reboot_code, 0xcf9);
udelay(50);
}
- reboot_type = BOOT_BIOS;
+ reboot_type = BOOT_TRIPLE;
+ break;
+
+ case BOOT_TRIPLE:
+ load_idt(&no_idt);
+ __asm__ __volatile__("int3");
+
+ /* We're probably dead after this, but... */
+ reboot_type = BOOT_KBD;
break;
}
}
diff --git a/include/linux/reboot.h b/include/linux/reboot.h
index 9e7db9e..48bf152 100644
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -20,13 +20,13 @@ enum reboot_mode {
extern enum reboot_mode reboot_mode;

enum reboot_type {
- BOOT_TRIPLE = 't',
- BOOT_KBD = 'k',
- BOOT_BIOS = 'b',
- BOOT_ACPI = 'a',
- BOOT_EFI = 'e',
- BOOT_CF9 = 'p',
- BOOT_CF9_COND = 'q',
+ BOOT_TRIPLE = 't',
+ BOOT_KBD = 'k',
+ BOOT_BIOS = 'b',
+ BOOT_ACPI = 'a',
+ BOOT_EFI = 'e',
+ BOOT_CF9_FORCE = 'p',
+ BOOT_CF9_SAFE = 'q',
};
extern enum reboot_type reboot_type;

2014-04-14 11:27:21

by Ingo Molnar

[permalink] [raw]
Subject: Re: [tip:x86/urgent] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method


* tip-bot for Ingo Molnar <[email protected]> wrote:

> Commit-ID: f042310bf8a846bcf21012ffee78d9eb562a7fa4
> Gitweb: http://git.kernel.org/tip/f042310bf8a846bcf21012ffee78d9eb562a7fa4
> Author: Ingo Molnar <[email protected]>
> AuthorDate: Fri, 4 Apr 2014 08:41:26 +0200
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Mon, 14 Apr 2014 13:13:30 +0200
>
> [PATCH] x86: Try the BIOS reboot method before the PCI reboot method
>
> Steve reported a reboot hang and bisected it back to this commit:
>
> a4f1987e4c54 x86, reboot: Add EFI and CF9 reboot methods into the default list
>
> He heroically tested all reboot methods and found the following:
>
> reboot=t # triple fault ok
> reboot=k # keyboard ctrl FAIL
> reboot=b # BIOS ok
> reboot=a # ACPI FAIL
> reboot=e # EFI FAIL [system has no EFI]
> reboot=p # PCI 0xcf9 FAIL
>
> And I think it's pretty obvious that we should only try 0xcf9 as a
> last resort - if at all.
>
> The other observation is that (on this box) we should never try
> the PCI reboot method, but close with either the 'triple fault'
> or the 'BIOS' (terminal!) reboot methods.
>
> Thirdly, CF9_COND is a total misnomer - it should be something like
> CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ...
>
> So this patch fixes the worst problems:
>
> - it orders the actual reboot logic to follow the reboot ordering
> pattern - it was in a pretty random order before for no good
> reason.
>
> - it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and
> BOOT_CF9_SAFE flags to make the code more obvious.
>
> - it tries the BIOS reboot method before the PCI reboot method.
> (Since 'BIOS' is a terminal reboot method resulting in a hang
> if it does not work, this is essentially equivalent to removing
> the PCI reboot method from the default reboot chain.)
>
> - just for the miraculous possibility of terminal (resulting
> in hang) reboot methods of triple fault or BIOS returning
> without having done their job, there's an ordering between
> them as well.
>
> Reported-and-bisected-and-tested-by: Steven Rostedt <[email protected]>
> Cc: Li Aubrey <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Matthew Garrett <[email protected]>
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Ingo Molnar <[email protected]>

So I rebased the patch Steve tested as-is and only fixed the patch
description and some comments in the code, preserving Steve's testing
status, and propagated it into x86/urgent.

I've added this description that tries to describe the new/old logic:

> + * To the best of our knowledge Windows compatible x86 hardware expects
> + * the following on reboot:
> *
> * 1) If the FADT has the ACPI reboot register flag set, try it
> * 2) If still alive, write to the keyboard controller
> * 3) If still alive, write to the ACPI reboot register again
> * 4) If still alive, write to the keyboard controller again
> * 5) If still alive, call the EFI runtime service to reboot
> + * 6) If no EFI runtime service, call the BIOS to do a reboot
> *
> + * We default to following the same pattern. We also have
> + * two other reboot methods: 'triple fault' and 'PCI', which
> + * can be triggered via the reboot= kernel boot option or
> + * via quirks.
> + *
> + * This means that this function can never return, it can misbehave
> + * by not rebooting properly and hanging.

If this is inaccurate (or undesirable, or fails to match the fine
code) then please object.

Thanks,

Ingo

2014-04-14 16:04:42

by Steven Rostedt

[permalink] [raw]
Subject: Re: [tip:x86/urgent] [PATCH] x86: Try the BIOS reboot method before the PCI reboot method

On Mon, 14 Apr 2014 13:27:14 +0200
Ingo Molnar <[email protected]> wrote:

> So I rebased the patch Steve tested as-is and only fixed the patch
> description and some comments in the code, preserving Steve's testing
> status, and propagated it into x86/urgent.

I pulled your latest x86/urgent branch, booted it and then rebooted it.
Worked fine. Then I did a "git show | patch -p1 -R" rebuilt, rebooted,
and then reboot again, and it locked up.

Using commit f042310bf8a846bcf21012ffee78d9eb562a7fa4

-- Steve

Subject: [tip:x86/urgent] x86: Remove the PCI reboot method from the default chain

Commit-ID: 5be44a6fb1edb57d7d2d77151870dcd79c8c0e58
Gitweb: http://git.kernel.org/tip/5be44a6fb1edb57d7d2d77151870dcd79c8c0e58
Author: Ingo Molnar <[email protected]>
AuthorDate: Fri, 4 Apr 2014 08:41:26 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 16 Apr 2014 08:56:09 +0200

x86: Remove the PCI reboot method from the default chain

Steve reported a reboot hang and bisected it back to this commit:

a4f1987e4c54 x86, reboot: Add EFI and CF9 reboot methods into the default list

He heroically tested all reboot methods and found the following:

reboot=t # triple fault ok
reboot=k # keyboard ctrl FAIL
reboot=b # BIOS ok
reboot=a # ACPI FAIL
reboot=e # EFI FAIL [system has no EFI]
reboot=p # PCI 0xcf9 FAIL

And I think it's pretty obvious that we should only try PCI 0xcf9 as a
last resort - if at all.

The other observation is that (on this box) we should never try
the PCI reboot method, but close with either the 'triple fault'
or the 'BIOS' (terminal!) reboot methods.

Thirdly, CF9_COND is a total misnomer - it should be something like
CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ...

So this patch fixes the worst problems:

- it orders the actual reboot logic to follow the reboot ordering
pattern - it was in a pretty random order before for no good
reason.

- it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and
BOOT_CF9_SAFE flags to make the code more obvious.

- it tries the BIOS reboot method before the PCI reboot method.
(Since 'BIOS' is a terminal reboot method resulting in a hang
if it does not work, this is essentially equivalent to removing
the PCI reboot method from the default reboot chain.)

- just for the miraculous possibility of terminal (resulting
in hang) reboot methods of triple fault or BIOS returning
without having done their job, there's an ordering between
them as well.

Reported-and-bisected-and-tested-by: Steven Rostedt <[email protected]>
Cc: Li Aubrey <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Matthew Garrett <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/reboot.c | 72 +++++++++++++++++++++++++-----------------------
include/linux/reboot.h | 14 +++++-----
2 files changed, 44 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 654b465..3399d3a 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -114,8 +114,8 @@ EXPORT_SYMBOL(machine_real_restart);
*/
static int __init set_pci_reboot(const struct dmi_system_id *d)
{
- if (reboot_type != BOOT_CF9) {
- reboot_type = BOOT_CF9;
+ if (reboot_type != BOOT_CF9_FORCE) {
+ reboot_type = BOOT_CF9_FORCE;
pr_info("%s series board detected. Selecting %s-method for reboots.\n",
d->ident, "PCI");
}
@@ -458,20 +458,23 @@ void __attribute__((weak)) mach_reboot_fixups(void)
}

/*
- * Windows compatible x86 hardware expects the following on reboot:
+ * To the best of our knowledge Windows compatible x86 hardware expects
+ * the following on reboot:
*
* 1) If the FADT has the ACPI reboot register flag set, try it
* 2) If still alive, write to the keyboard controller
* 3) If still alive, write to the ACPI reboot register again
* 4) If still alive, write to the keyboard controller again
* 5) If still alive, call the EFI runtime service to reboot
- * 6) If still alive, write to the PCI IO port 0xCF9 to reboot
- * 7) If still alive, inform BIOS to do a proper reboot
+ * 6) If no EFI runtime service, call the BIOS to do a reboot
*
- * If the machine is still alive at this stage, it gives up. We default to
- * following the same pattern, except that if we're still alive after (7) we'll
- * try to force a triple fault and then cycle between hitting the keyboard
- * controller and doing that
+ * We default to following the same pattern. We also have
+ * two other reboot methods: 'triple fault' and 'PCI', which
+ * can be triggered via the reboot= kernel boot option or
+ * via quirks.
+ *
+ * This means that this function can never return, it can misbehave
+ * by not rebooting properly and hanging.
*/
static void native_machine_emergency_restart(void)
{
@@ -492,6 +495,11 @@ static void native_machine_emergency_restart(void)
for (;;) {
/* Could also try the reset bit in the Hammer NB */
switch (reboot_type) {
+ case BOOT_ACPI:
+ acpi_reboot();
+ reboot_type = BOOT_KBD;
+ break;
+
case BOOT_KBD:
mach_reboot_fixups(); /* For board specific fixups */

@@ -509,43 +517,29 @@ static void native_machine_emergency_restart(void)
}
break;

- case BOOT_TRIPLE:
- load_idt(&no_idt);
- __asm__ __volatile__("int3");
-
- /* We're probably dead after this, but... */
- reboot_type = BOOT_KBD;
- break;
-
- case BOOT_BIOS:
- machine_real_restart(MRR_BIOS);
-
- /* We're probably dead after this, but... */
- reboot_type = BOOT_TRIPLE;
- break;
-
- case BOOT_ACPI:
- acpi_reboot();
- reboot_type = BOOT_KBD;
- break;
-
case BOOT_EFI:
if (efi_enabled(EFI_RUNTIME_SERVICES))
efi.reset_system(reboot_mode == REBOOT_WARM ?
EFI_RESET_WARM :
EFI_RESET_COLD,
EFI_SUCCESS, 0, NULL);
- reboot_type = BOOT_CF9_COND;
+ reboot_type = BOOT_BIOS;
+ break;
+
+ case BOOT_BIOS:
+ machine_real_restart(MRR_BIOS);
+
+ /* We're probably dead after this, but... */
+ reboot_type = BOOT_CF9_SAFE;
break;

- case BOOT_CF9:
+ case BOOT_CF9_FORCE:
port_cf9_safe = true;
/* Fall through */

- case BOOT_CF9_COND:
+ case BOOT_CF9_SAFE:
if (port_cf9_safe) {
- u8 reboot_code = reboot_mode == REBOOT_WARM ?
- 0x06 : 0x0E;
+ u8 reboot_code = reboot_mode == REBOOT_WARM ? 0x06 : 0x0E;
u8 cf9 = inb(0xcf9) & ~reboot_code;
outb(cf9|2, 0xcf9); /* Request hard reset */
udelay(50);
@@ -553,7 +547,15 @@ static void native_machine_emergency_restart(void)
outb(cf9|reboot_code, 0xcf9);
udelay(50);
}
- reboot_type = BOOT_BIOS;
+ reboot_type = BOOT_TRIPLE;
+ break;
+
+ case BOOT_TRIPLE:
+ load_idt(&no_idt);
+ __asm__ __volatile__("int3");
+
+ /* We're probably dead after this, but... */
+ reboot_type = BOOT_KBD;
break;
}
}
diff --git a/include/linux/reboot.h b/include/linux/reboot.h
index 9e7db9e..48bf152 100644
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -20,13 +20,13 @@ enum reboot_mode {
extern enum reboot_mode reboot_mode;

enum reboot_type {
- BOOT_TRIPLE = 't',
- BOOT_KBD = 'k',
- BOOT_BIOS = 'b',
- BOOT_ACPI = 'a',
- BOOT_EFI = 'e',
- BOOT_CF9 = 'p',
- BOOT_CF9_COND = 'q',
+ BOOT_TRIPLE = 't',
+ BOOT_KBD = 'k',
+ BOOT_BIOS = 'b',
+ BOOT_ACPI = 'a',
+ BOOT_EFI = 'e',
+ BOOT_CF9_FORCE = 'p',
+ BOOT_CF9_SAFE = 'q',
};
extern enum reboot_type reboot_type;

2014-04-28 06:07:56

by Yuhong Bao

[permalink] [raw]
Subject: RE: [BUG] x86: reboot doesn't reboot

> I'll contact
> the people I know in Dell and see if I can find anyone from the firmware
> division who'll actually talk to us.
What about now?

As a side note, OT, but: http://arstechnica.com/gadgets/2014/04/fast-but-compromised-gigabytes-amd-powered-mini-gaming-pc-reviewed/?comments=1 (notice the Editor's Pick!) -