2006-10-05 10:51:06

by Lukas Hejtmanek

Subject: Machine reboot

Hello,

I'm facing troubles with machine restart. While sysrq-b restarts machine, reboot
command does not. Using printk I found that kernel does not hang and issues
reset properly but BIOS does not initiate boot sequence. Is there something
I could do?

--
Lukáš Hejtmánek


2006-10-05 11:28:01

by Jesper Juhl

Subject: Re: Machine reboot

On 05/10/06, Lukas Hejtmanek <[email protected]> wrote:
> Hello,
>
> I'm facing troubles with machine restart. While sysrq-b restarts machine, reboot
> command does not. Using printk I found that kernel does not hang and issues
> reset properly but BIOS does not initiate boot sequence. Is there something
> I could do?
>
You can try playing with different combinations of these options :

CONFIG_APM_ALLOW_INTS
CONFIG_APM_REAL_MODE_POWER_OFF

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-10-05 11:58:24

by Magnus Damm

Subject: Re: Machine reboot

On 10/5/06, Lukas Hejtmanek <[email protected]> wrote:
> Hello,
>
> I'm facing troubles with machine restart. While sysrq-b restarts machine, reboot
> command does not. Using printk I found that kernel does not hang and issues
> reset properly but BIOS does not initiate boot sequence. Is there something
> I could do?

A long shot, but switching to real mode does not work if the cpu is
running in VMX root mode ie on hardware with Intel VT extensions
enabled. So if you are using some kind of kernel virtualization module
on rather new hardware, consider rmmod:ing the module before
rebooting.

I'm about to post patches for kexec that fixes this problem, but I'm
not sure about the current reboot status.

/ magnus

2006-10-05 13:57:53

by Lukas Hejtmanek

Subject: Re: Machine reboot

On Thu, Oct 05, 2006 at 01:28:00PM +0200, Jesper Juhl wrote:
> >I'm facing troubles with machine restart. While sysrq-b restarts machine,
> >reboot
> >command does not. Using printk I found that kernel does not hang and issues
> >reset properly but BIOS does not initiate boot sequence. Is there something
> >I could do?
> >
> You can try playing with different combinations of these options :
>
> CONFIG_APM_ALLOW_INTS
> CONFIG_APM_REAL_MODE_POWER_OFF

I'm not using APM (as I think my board DP965LT does not support APM).

Also power off works OK (using halt cmd).

--
Lukáš Hejtmánek

2006-10-05 16:03:31

by Lukas Hejtmanek

Subject: Re: Machine reboot

On Thu, Oct 05, 2006 at 08:58:22PM +0900, Magnus Damm wrote:
> A long shot, but switching to real mode does not work if the cpu is
> running in VMX root mode ie on hardware with Intel VT extensions
> enabled. So if you are using some kind of kernel virtualization module
> on rather new hardware, consider rmmod:ing the module before
> rebooting.
>
> I'm about to post patches for kexec that fixes this problem, but I'm
> not sure about the current reboot status.

You are right, I'm using Intel Core 2 Duo processor with DP965LT board that is
capable of VT extensions. However, I'm using vanilla 2.6.18 kernel in X86_64,
no additional patches, nor XEN or VMWARE is running (even their modules are
not loaded). Moreover, SYSRQ-B (emergency reboot) works fine. System graceful
reboot does not work.

--
Lukáš Hejtmánek

2006-10-09 14:25:59

by Pavel Machek

Subject: Re: Machine reboot

On Thu 05-10-06 18:05:18, Lukas Hejtmanek wrote:
> On Thu, Oct 05, 2006 at 08:58:22PM +0900, Magnus Damm wrote:
> > A long shot, but switching to real mode does not work if the cpu is
> > running in VMX root mode ie on hardware with Intel VT extensions
> > enabled. So if you are using some kind of kernel virtualization module
> > on rather new hardware, consider rmmod:ing the module before
> > rebooting.
> >
> > I'm about to post patches for kexec that fixes this problem, but I'm
> > not sure about the current reboot status.
>
> You are right, I'm using Intel Core 2 Duo processor with DP965LT board that is
> capable of VT extensions. However, I'm using vanilla 2.6.18 kernel in X86_64,
> no additional patches, nor XEN or VMWARE is running (even their modules are
> not loaded). Moreover, SYSRQ-B (emergency reboot) works fine. System graceful
> reboot does not work.

Of course... copy/paste pieces of sysrq-b sequence into regular
sequence to find out what the critical difference is... no, it will
not be easy.

Perhaps your box *likes* to reboot with apic on or something? Perhaps
device_shutdown() breaks your ability to reboot?
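
To make the comparison concrete, here is a small user-space model of the two paths. It is a sketch, not kernel code. It assumes, from a reading of the 2.6.18 sources (kernel/sys.c and arch/x86_64/kernel/reboot.c), that the reboot command runs the reboot notifier chain, device_shutdown() and machine_shutdown() before it reaches machine_emergency_restart(), while sysrq-b goes straight to that last step. The `model_*` functions, `struct trace`, `trace_contains()` and `count_extra_steps()` are hypothetical helpers invented for this illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_STEPS 8

/* Ordered list of the steps a restart path runs (illustration only). */
struct trace {
    const char *steps[MAX_STEPS];
    int count;
};

void record(struct trace *t, const char *step)
{
    assert(t->count < MAX_STEPS);
    t->steps[t->count++] = step;
}

/* Assumed order for the graceful path used by the reboot command. */
void model_kernel_restart(struct trace *t)
{
    record(t, "reboot notifier chain");
    record(t, "device_shutdown");            /* driver shutdown hooks run here */
    record(t, "machine_shutdown");           /* assumed: stops other CPUs, disables APICs */
    record(t, "machine_emergency_restart");  /* the actual reset attempt */
}

/* Assumed order for the emergency path used by sysrq-b. */
void model_emergency_restart(struct trace *t)
{
    record(t, "machine_emergency_restart");
}

/* Return 1 if `step` occurs anywhere in `t`, 0 otherwise. */
int trace_contains(const struct trace *t, const char *step)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->steps[i], step) == 0)
            return 1;
    return 0;
}

/* Count the steps that only the graceful path runs: the candidates to bisect. */
int count_extra_steps(const struct trace *graceful,
                      const struct trace *emergency)
{
    int extra = 0;
    for (int i = 0; i < graceful->count; i++)
        if (!trace_contains(emergency, graceful->steps[i]))
            extra++;
    return extra;
}
```

Under these assumptions the graceful path runs three steps that sysrq-b skips, and each of them can be disabled in turn in a test kernel to see which one upsets the BIOS.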

--
Thanks for all the (sleeping) penguins.

2006-10-12 23:24:58

by Aleksey Gorelov

Subject: RE: Machine reboot

>-----Original Message-----
>From: [email protected]
>[mailto:[email protected]] On Behalf Of Lukas
>Hejtmanek
>Sent: Thursday, October 05, 2006 3:53 AM
>To: [email protected]
>Subject: Machine reboot
>
>Hello,
>
>I'm facing troubles with machine restart. While sysrq-b
>restarts machine, reboot
>command does not. Using printk I found that kernel does not
>hang and issues
>reset properly but BIOS does not initiate boot sequence. Is
>there something
>I could do?

I have similar issue on Intel DG965WH board. Did you try to shutdown network interface and
'rmmod e1000' right before reboot ? In my case machine reboots fine after that.

Aleks.

2006-10-12 23:46:09

by Kok, Auke

Subject: Re: Machine reboot

Aleksey Gorelov wrote:
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Lukas
>> Hejtmanek
>> Sent: Thursday, October 05, 2006 3:53 AM
>> To: [email protected]
>> Subject: Machine reboot
>>
>> Hello,
>>
>> I'm facing troubles with machine restart. While sysrq-b
>> restarts machine, reboot
>> command does not. Using printk I found that kernel does not
>> hang and issues
>> reset properly but BIOS does not initiate boot sequence. Is
>> there something
>> I could do?
>
> I have similar issue on Intel DG965WH board. Did you try to shutdown network interface and
> 'rmmod e1000' right before reboot ? In my case machine reboots fine after that.
>
> Aleks.

interesting, do you do that because it specifically fixes a problem you have? if so, I'd
like to know about it :)

Auke

2006-10-13 00:05:58

by Aleksey Gorelov

Subject: Re: Machine reboot


Auke Kok <[email protected]> wrote:
> Aleksey Gorelov wrote:
> >> -----Original Message-----
> >> From: [email protected]
> >> [mailto:[email protected]] On Behalf Of Lukas
> >> Hejtmanek
> >> Sent: Thursday, October 05, 2006 3:53 AM
> >> To: [email protected]
> >> Subject: Machine reboot
> >>
> >> Hello,
> >>
> >> I'm facing troubles with machine restart. While sysrq-b
> >> restarts machine, reboot
> >> command does not. Using printk I found that kernel does not
> >> hang and issues
> >> reset properly but BIOS does not initiate boot sequence. Is
> >> there something
> >> I could do?
> >
> > I have similar issue on Intel DG965WH board. Did you try to shutdown network interface and
> > 'rmmod e1000' right before reboot ? In my case machine reboots fine after that.
> >
> > Aleks.
>
> interesting, do you do that because it specifically fixes a problem you have? if so, I'd
> like to know about it :)
>
> Auke
>
I'm just trying to localize the issue.
Since right before machine stalls during reboot I see something like

ACPI: PCI interrupt for device 000:00:19.0 disabled
Restarting system.

and this device is Gb ethernet, e1000 is perfect candidate to look at. And yes, removing e1000
before reboot works around the issue.

I'm afraid this is now common issue across Intel 965 board series, at least with their latest BIOS
updates.

Aleks.


2006-10-13 04:10:30

by Kok, Auke

Subject: Re: Machine reboot

Aleksey Gorelov wrote:
> Auke Kok <[email protected]> wrote:
>> Aleksey Gorelov wrote:
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of Lukas
>>>> Hejtmanek
>>>> Sent: Thursday, October 05, 2006 3:53 AM
>>>> To: [email protected]
>>>> Subject: Machine reboot
>>>>
>>>> Hello,
>>>>
>>>> I'm facing troubles with machine restart. While sysrq-b
>>>> restarts machine, reboot
>>>> command does not. Using printk I found that kernel does not
>>>> hang and issues
>>>> reset properly but BIOS does not initiate boot sequence. Is
>>>> there something
>>>> I could do?
>>> I have similar issue on Intel DG965WH board. Did you try to shutdown network interface and
>>> 'rmmod e1000' right before reboot ? In my case machine reboots fine after that.
>>>
>>> Aleks.
>>
>> interesting, do you do that because it specifically fixes a problem you have? if so, I'd
>> like to know about it :)
>>
>> Auke
>>
> I'm just trying to localize the issue.
> Since right before machine stalls during reboot I see something like
>
> ACPI: PCI interrupt for device 000:00:19.0 disabled
> Restarting system.

that's quite a normal message, not sure why that would constitute a problem.

> and this device is Gb ethernet, e1000 is perfect candidate to look at. And yes, removing e1000
> before reboot works around the issue.

Have you tried to only `ifconfig ethX down` ? my own i965 board shuts down perfectly
fine without unloading the e1000 driver.

> I'm afraid this is now common issue across Intel 965 board series, at least with their latest BIOS
> updates.

first time I've heard of it!

I'm unsure my BIOS version will be the same as it's a devel system for our drivers, but
still I have never heard of anyone requiring the full unload of the NIC driver to be
able to shutdown.

Would you be able to debug a failed shutdown perhaps and capture the console output?
when exactly does it `stall` ? What other interrupts are assigned on your system? Did
other BIOS versions work correctly?

Auke

2006-10-13 09:16:17

by Lukas Hejtmanek

Subject: Re: Machine reboot

On Thu, Oct 12, 2006 at 09:08:34PM -0700, Auke Kok wrote:
> >and this device is Gb ethernet, e1000 is perfect candidate to look at. And
> >yes, removing e1000
> >before reboot works around the issue.
>
> Have you tried to only `ifconfig ethX down` ? my own i965 board shuts down
> perfectly fine without unloading the e1000 driver.

I can confirm that after rmmod e1000 the machine reboots gracefully.

> Would you be able to debug a failed shutdown perhaps and capture the
> console output? when exactly does it `stall` ? What other interrupts are
> assigned on your system? Did other BIOS versions work correctly?

Up to version 0864 it restarts normally. Any higher version causes hang on
restart if e1000 driver is loaded.

I've tried to report it to Intel but they replied that Linux is unsupported on
this board...

It's not an issue in the Linux kernel. Using various printk calls I can see that
a triple fault or a reset via KBD is issued, followed by a hang in the BIOS.

For i965 chipsets, the BIOS is *a lot* buggy :(

--
Lukáš Hejtmánek

2006-10-13 14:38:14

by Kok, Auke

Subject: Re: Machine reboot

Lukas Hejtmanek wrote:
> On Thu, Oct 12, 2006 at 09:08:34PM -0700, Auke Kok wrote:
>>> and this device is Gb ethernet, e1000 is perfect candidate to look at. And
>>> yes, removing e1000
>>> before reboot works around the issue.
>> Have you tried to only `ifconfig ethX down` ? my own i965 board shuts down
>> perfectly fine without unloading the e1000 driver.
>
> I can confirm that after rmmod e1000 the machine reboots gracefully.
>
>> Would you be able to debug a failed shutdown perhaps and capture the
>> console output? when exactly does it `stall` ? What other interrupts are
>> assigned on your system? Did other BIOS versions work correctly?
>
> Up to version 0864 it restarts normally. Any higher version causes hang on
> restart if e1000 driver is loaded.
>
> I've tried to report it to Intel but they replied that Linux is unsupported on
> this board...
>
> It's not an issue in the Linux kernel. Using various printk I can see that
> triple fault or reset via KBD is issued and followed by hang of the BIOS.
>
> For i965 chipsets, the BIOS is *a lot* buggy :(

that's depressing, can you send me the output of `dmidecode` of the latest BIOS? Perhaps
I can reproduce it myself with that version.

Auke

2006-10-13 15:25:51

by Lukas Hejtmanek

Subject: Re: Machine reboot

On Fri, Oct 13, 2006 at 07:36:01AM -0700, Auke Kok wrote:
> >For i965 chipsets, the BIOS is *a lot* buggy :(
>
> that's depressing, can you send me the output of `dmidecode` of the latest
> BIOS? Perhaps I can reproduce it myself with that version.

output of dmidecode is attached.

--
Lukáš Hejtmánek


Attachments:
(No filename) (306.00 B)
dmidecode (9.68 kB)

2006-10-13 15:30:44

by Arjan van de Ven

Subject: Re: Machine reboot


> For i965 chipsets, the BIOS is *a lot* buggy :(

have you run the Linux firmware test kit on it?

see http://www.linuxfirmwarekit.org

2006-10-13 16:22:14

by Lukas Hejtmanek

Subject: Re: Machine reboot

On Fri, Oct 13, 2006 at 07:36:01AM -0700, Auke Kok wrote:
> >It's not an issue in the Linux kernel. Using various printk I can see that
> >triple fault or reset via KBD is issued and followed by hang of the BIOS.
> >
> >For i965 chipsets, the BIOS is *a lot* buggy :(
>
> that's depressing, can you send me the output of `dmidecode` of the latest
> BIOS? Perhaps I can reproduce it myself with that version.

Good news, as of kernel 2.6.19-rc1-git9, the BIOS does *not* hang with e1000 either
as a module or built into the kernel.

The previous version of kernel was 2.6.18 which hangs the BIOS.

Aleksey:
are you sure that it is not the same in your case? Did you not switch kernel
version between e1000 as a module and built in kernel?

--
Lukáš Hejtmánek

2006-10-13 16:27:13

by Lukas Hejtmanek

Subject: Re: Machine reboot

On Fri, Oct 13, 2006 at 05:30:36PM +0200, Arjan van de Ven wrote:
>
> > For i965 chipsets, the BIOS is *a lot* buggy :(
>
> have you run the Linux firmware test kit on it?
>
> see http://www.linuxfirmwarekit.org

I did. It complains about EDD as fatal error, some warnings about ACPI and
MMCONFIG, otherwise it says passed.

However, I suspect another BIOS bug:
ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not
present [20060707]
ACPI Exception (acpi_processor-0681): AE_NOT_FOUND, Processor Device is not
present [20060707]

(BIOS announces 4 processors while 1 dual core is present).

--
Lukáš Hejtmánek

2006-10-13 20:22:25

by Aleksey Gorelov

Subject: Re: Machine reboot



--- Lukas Hejtmanek <[email protected]> wrote:

> On Fri, Oct 13, 2006 at 07:36:01AM -0700, Auke Kok wrote:
> > >It's not an issue in the Linux kernel. Using various printk I can see that
> > >triple fault or reset via KBD is issued and followed by hang of the BIOS.
> > >
> > >For i965 chipsets, the BIOS is *a lot* buggy :(
> >
> > that's depressing, can you send me the output of `dmidecode` of the latest
> > BIOS? Perhaps I can reproduce it myself with that version.
>
> Good news, as of kernel 2.6.19-rc1-git9, BIOS does *not* hang with both e1000 as
> module or built in kernel.
>
> The previous version of kernel was 2.6.18 which hangs the BIOS.
>
> Aleksey:
> are you sure that it is not the same in your case? Did you not switch kernel
> version between e1000 as a module and built in kernel?

As far as I understand, you've updated the whole kernel, not just the driver. I've tried using
driver from 2.6.19-rc2 as well as v7.2.9 from Intel's website - same story - still no reboot. Did
you try just updating driver (without whole kernel) ?

Aleks.


2006-10-13 20:27:53

by Aleksey Gorelov

Subject: Re: Machine reboot



--- Auke Kok <[email protected]> wrote:

>
> >> interesting, do you do that because it specifically fixes a problem you have? if so, I'd
> >> like to know about it :)
> >>
> >> Auke
> >>
> > I'm just trying to localize the issue.
> > Since right before machine stalls during reboot I see something like
> >
> > ACPI: PCI interrupt for device 000:00:19.0 disabled
> > Restarting system.
>
> that's quite a normal message, not sure why that would constitute a problem.
It's not the problem at all, but served as a hint for me to try unloading driver.
However, from latest Lukas's findings, it seems that something (_not_ in the e1000 driver) in
between 2.6.18 & 2.6.19-rc2 fixes it.

Aleks.

2006-10-13 20:30:09

by Lukas Hejtmanek

Subject: Re: Machine reboot

On Fri, Oct 13, 2006 at 01:27:52PM -0700, Aleksey Gorelov wrote:
> It's not the problem at all, but served as a hint for me to try unloading driver.
> However, from latest Lukas's findings, it seems that something (_not_ in the e1000 driver) in
> between 2.6.18 & 2.6.19-rc2 fixes it.

Does 2.6.19-rc1 work for you? Both with module and built in e1000 driver?

--
Lukáš Hejtmánek

2006-10-13 21:36:39

by Aleksey Gorelov

Subject: Re: Machine reboot

--- Lukas Hejtmanek <[email protected]> wrote:

> On Fri, Oct 13, 2006 at 01:27:52PM -0700, Aleksey Gorelov wrote:
> > It's not the problem at all, but served as a hint for me to try unloading driver.
> > However, from latest Lukas's findings, it seems that something (_not_ in the e1000 driver) in
> > between 2.6.18 & 2.6.19-rc2 fixes it.
>
> Does 2.6.19-rc1 work for you? Both with module and built in e1000 driver?
>
I did not try that yet. But it looks like another person on the list just complained about
reboot issue on Intel 965 with 2.6.18, .19-rc1, and .19-rc2...

Aleks.

2006-10-13 21:41:53

by Lukas Hejtmanek

Subject: Re: Machine reboot

On Fri, Oct 13, 2006 at 01:22:24PM -0700, Aleksey Gorelov wrote:
> > Good news, as of kernel 2.6.19-rc1-git9, BIOS does *not* hang with both e1000 as
> > module or built in kernel.
> >
> > The previous version of kernel was 2.6.18 which hangs the BIOS.
> >
> > Aleksey:
> > are you sure that it is not the same in your case? Did you not switch kernel
> > version between e1000 as a module and built in kernel?
>
> As far as I understand, you've updated the whole kernel, not just the
> driver. I've tried using driver from 2.6.19-rc2 as well as v7.2.9 from
> Intel's website - same story - still no reboot. Did you try just updating
> driver (without whole kernel) ?

I've rechecked behaviour of e1000 driver. For me, it looks like this:

2.6.18 (vanilla):          e1000 as a module - system reboots OK.
                           e1000 built in    - system hangs after KBD reset
                                               or triple fault

2.6.19-rc1-git9 (vanilla): e1000 as a module - system reboots OK.
                           e1000 built in    - system reboots OK.

--
Lukáš Hejtmánek

2006-10-13 23:22:50

by Aleksey Gorelov

Subject: Re: Machine reboot



--- Auke Kok <[email protected]> wrote:

> >
> > For i965 chipsets, the BIOS is *a lot* buggy :(
>
> that's depressing, can you send me the output of `dmidecode` of the latest BIOS? Perhaps
> I can reproduce it myself with that version.
>
Hi, Auke

It looks like the reason for the reboot failure is the inability of the BIOS to reboot if the network PCI device
is in D3 state. Once I've added
pci_set_power_state(pdev, PCI_D0);
as a last line to e1000_shutdown() method, board started rebooting again.
Moreover, here is what I found in release notes to the latest BIOS (from October 5, 2006):
"Fixed an issue where system not able to shutdown to S5 if the LAN is set to D3 mode."
This may have affected reboot with LAN in D3 negatively.
I guess you are in the best position of all of us to bring the issue to Intel BIOS team.
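
The effect of that one-line change can be modelled in user space as follows. This is a hedged sketch, not the e1000 driver: `model_pci_dev`, `model_set_power_state()`, `model_e1000_shutdown()` and `model_bios_reboots()` are hypothetical stand-ins, and the rule that the BIOS only completes a reset while the NIC is in D0 is an assumption drawn from the behaviour observed here, not from any Intel documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's PCI power states. */
enum model_power_state { MODEL_D0 = 0, MODEL_D3HOT = 3 };

struct model_pci_dev {
    enum model_power_state state;
};

/* Stand-in for pci_set_power_state(); the real call programs the device. */
int model_set_power_state(struct model_pci_dev *pdev,
                          enum model_power_state state)
{
    assert(pdev != NULL);
    pdev->state = state;
    return 0;
}

/* Model of the driver's shutdown hook, with and without the extra line. */
void model_e1000_shutdown(struct model_pci_dev *pdev, bool restore_d0)
{
    /* Assumed existing behaviour: the NIC is left in D3 on shutdown. */
    model_set_power_state(pdev, MODEL_D3HOT);

    /* The proposed last line: bring the device back to D0 before reset. */
    if (restore_d0)
        model_set_power_state(pdev, MODEL_D0);
}

/* Assumed BIOS behaviour: the reset only completes if the NIC is in D0. */
bool model_bios_reboots(const struct model_pci_dev *pdev)
{
    return pdev->state == MODEL_D0;
}
```

Calling `model_e1000_shutdown()` with `restore_d0` set to false leaves the model device in D3 and `model_bios_reboots()` returns false; with it set to true the device ends in D0 and the check passes.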

Aleks.


2006-10-14 10:51:45

by Lukas Hejtmanek

Subject: Re: Machine reboot

On Fri, Oct 13, 2006 at 04:22:48PM -0700, Aleksey Gorelov wrote:
> It looks like the reason for reboot failure is inability of BIOS to reboot if network pci device
> is in D3 state. Once I've added
> pci_set_power_state(pdev, PCI_D0);
> as a last line to e1000_shutdown() method, board started rebooting again.
> Moreover, here is what I found in release notes to the latest BIOS (from October 5, 2006):
> "Fixed an issue where system not able to shutdown to S5 if the LAN is set to D3 mode."
> This may have affected reboot with LAN in D3 negatively.
> I guess you are in the best position of all of us to bring the issue to Intel BIOS team.

The fixed shutdown to S5 is present in latest BIOS version only.
However, the bug has been present since 1162 BIOS version.

--
Lukáš Hejtmánek