2002-03-15 21:17:52

by Udo A. Steinberg

Subject: [OOPS] Kernel powerdown


Hi,

The following oops happens whenever I try to halt my machine
with kernel 2.5.6.

The last messages seen are:

flushing ide devices: hda hdb hde
Power down.
NMI Watchdog detected LOCKUP on CPU0

The relevant ACPI output is:

ACPI: Core Subsystem version [20011018]
ACPI: Subsystem enabled
ACPI: System firmware supports S0 S1 S3 S4 S5
Processor[0]: C0 C1 C2, 8 throttling states
ACPI: Power Button (FF) found
ACPI: Multiple power buttons detected, ignoring fixed-feature
ACPI: Power Button (CM) found

If you need more info, let me know.

-Udo.



ksymoops 2.4.4 on i686 2.5.6. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.5.6/ (default)
-m /boot/System.map-2.5.6 (specified)

Error (regular_file): read_ksyms stat /proc/ksyms failed
ksymoops: No such file or directory
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
NMI Watchdog detected LOCKUP on CPU0, eip c01b42f6, registers:
CPU: 0
EIP: 0010:[<c01b42f6>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00000002
eax: 0029611a ebx: 007ab8cc ecx: aa1fd773 edx: 00000011
esi: 00003c01 edi: 00000001 ebp: bffffd3c esp: cd947e30
ds: 0018 es: 0018 ss: 0018
Stack: 00003c01 c01b4343 007ab8cc c01b4379 007ab8cc c01b686a 00002710 00003c01
c01beeda 00002710 ffffffff 00000005 00000001 05003607 00000001 cd947e74
07074343 00000001 00000005 00000000 00000000 c01ce6cd 00000005 cd946000
Call Trace: [<c01b4343>] [<c01b4379>] [<c01b686a>] [<c01beeda>] [<c01ce6cd>]
[<c01ce717>] [<c010550b>] [<c011fa3d>] [<c0113a19>] [<c011de5f>] [<c011def2>]
[<c011e234>] [<c011eb81>] [<c01140cc>] [<c011d2c6>] [<c011d230>] [<c011d40a>]
[<c0107047>]
Code: 39 d8 72 f6 5b c3 8d 74 26 00 8b 44 24 04 eb 0a 8d 76 00 8d

>>EIP; c01b42f6 <__rdtsc_delay+16/20> <=====
Trace; c01b4343 <__delay+13/30>
Trace; c01b4379 <__udelay+19/20>
Trace; c01b686a <acpi_os_stall+3a/40>
Trace; c01beeda <acpi_enter_sleep_state+18a/1c0>
Trace; c01ce6cd <sm_osl_suspend+3d/80>
Trace; c01ce717 <sm_osl_power_down+7/10>
Trace; c010550b <machine_power_off+b/10>
Trace; c011fa3d <sys_reboot+15d/270>
Trace; c0113a19 <wake_up_process+9/10>
Trace; c011de5f <deliver_signal+4f/60>
Trace; c011def2 <send_sig_info+82/b0>
Trace; c011e234 <kill_something_info+144/170>
Trace; c011eb81 <sys_kill+51/60>
Trace; c01140cc <schedule+20c/250>
Trace; c011d2c6 <schedule_timeout+86/a0>
Trace; c011d230 <process_timeout+0/10>
Trace; c011d40a <sys_nanosleep+11a/1f0>
Trace; c0107047 <syscall_call+7/b>
Code; c01b42f6 <__rdtsc_delay+16/20>
00000000 <_EIP>:
Code; c01b42f6 <__rdtsc_delay+16/20> <=====
0: 39 d8 cmp %ebx,%eax <=====
Code; c01b42f8 <__rdtsc_delay+18/20>
2: 72 f6 jb fffffffa <_EIP+0xfffffffa> c01b42f0 <__rdtsc_delay+10/20>
Code; c01b42fa <__rdtsc_delay+1a/20>
4: 5b pop %ebx
Code; c01b42fb <__rdtsc_delay+1b/20>
5: c3 ret
Code; c01b42fc <__rdtsc_delay+1c/20>
6: 8d 74 26 00 lea 0x0(%esi,1),%esi
Code; c01b4300 <__loop_delay+0/30>
a: 8b 44 24 04 mov 0x4(%esp,1),%eax
Code; c01b4304 <__loop_delay+4/30>
e: eb 0a jmp 1a <_EIP+0x1a> c01b4310 <__loop_delay+10/30>
Code; c01b4306 <__loop_delay+6/30>
10: 8d 76 00 lea 0x0(%esi),%esi
Code; c01b4309 <__loop_delay+9/30>
13: 8d 00 lea (%eax),%eax


1 error issued. Results may not be reliable.
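
For readers following the decode: the `cmp %ebx,%eax` / `jb` pair at EIP is the tail of a busy-wait loop that keeps polling a counter until enough ticks have elapsed. Below is a minimal userspace sketch of that kind of loop, not the kernel's source. The names `busy_wait_ticks`, `tick_source_fn` and `fake_counter` are hypothetical, and the counter is passed in as a callback (an assumption made so the loop can run without reading the TSC).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tick source: returns the current counter value. */
typedef uint32_t (*tick_source_fn)(void *ctx);

/*
 * Spin until at least `ticks` counter ticks have elapsed.
 * Returns the elapsed tick count observed when the loop exited.
 */
static uint32_t busy_wait_ticks(tick_source_fn now, void *ctx, uint32_t ticks)
{
    assert(now != NULL);

    uint32_t start = now(ctx);
    uint32_t elapsed;

    do {
        /* Unsigned subtraction stays correct across counter wraparound. */
        elapsed = now(ctx) - start;
    } while (elapsed < ticks);   /* the cmp/jb pair in the disassembly */

    return elapsed;
}

/* Illustrative stand-in for a hardware counter: advances on every read. */
struct fake_counter {
    uint32_t value;
    uint32_t step;
};

static uint32_t fake_counter_read(void *ctx)
{
    struct fake_counter *c = ctx;
    c->value += c->step;
    return c->value;
}
```

A loop like this is harmless for short delays. It only becomes visible to a watchdog when the code around it keeps calling it with interrupts off, which is what the call trace through acpi_os_stall suggests here.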


2002-03-15 21:25:13

by Alan

Subject: Re: [OOPS] Kernel powerdown

> flushing ide devices: hda hdb hde
> Power down.
> NMI Watchdog detected LOCKUP on CPU0


Looks like the ACPI code is simply forgetting to turn off the NMI watchdog.

2002-03-15 21:31:03

by Robert Love

Subject: Re: [OOPS] Kernel powerdown

On Fri, 2002-03-15 at 16:17, Udo A. Steinberg wrote:

> flushing ide devices: hda hdb hde
> Power down.
> NMI Watchdog detected LOCKUP on CPU0

I suspect ACPI or whatever is not disabling the NMI watchdog on
shutdown. The OOPS is harmless, but obviously does need to be fixed.

Robert Love

2002-03-15 21:32:24

by Andrew Grover

Subject: RE: [OOPS] Kernel powerdown

> From: Alan Cox [mailto:[email protected]]
> > flushing ide devices: hda hdb hde
> > Power down.
> > NMI Watchdog detected LOCKUP on CPU0
> Looks like the ACPI code is simply forgetting to turn off the
> NMI watchdog

Does the machine power off successfully using ACPI when the NMI watchdog is
not enabled?

Theoretically we should be turning the machine off, after which I'm pretty
sure the NMI watchdog shouldn't be an issue :) but IIRC we are masking
interrupts and doing some delays before turning off, so the NMI watchdog
might not be liking that? APM doesn't turn off the NMI afaik so why should
ACPI have to?

-- Andy

2002-03-15 21:33:03

by Alan

Subject: Re: [OOPS] Kernel powerdown

> sure the NMI watchdog shouldn't be an issue :) but IIRC we are masking
> interrupts and doing some delays before turning off, so the NMI watchdog
> might not be liking that? APM doesn't turn off the NMI afaik so why should
> ACPI have to?

It's entirely possible that APM has the same bug but isn't seeing it because
it tends to drop into oblivion before the timer goes off.

2002-03-15 21:41:23

by Robert Love

Subject: RE: [OOPS] Kernel powerdown

On Fri, 2002-03-15 at 16:30, Grover, Andrew wrote:

> Theoretically we should be turning the machine off, after which I'm pretty
> sure the NMI watchdog shouldn't be an issue :) but IIRC we are masking
> interrupts and doing some delays before turning off, so the NMI watchdog
> might not be liking that? APM doesn't turn off the NMI afaik so why should
> ACPI have to?

Hm, is the period with interrupts off during shutdown much greater with
ACPI than with APM? Maybe it is just simply that ...

You could sprinkle calls to
touch_nmi_watchdog()
in the ACPI shutdown code and see if the problem goes away ...
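
To illustrate the idea of touching the watchdog during a long stall, here is a small userspace simulation. It is only a sketch: `sim_watchdog`, `sim_tick`, `sim_touch_watchdog` and `sim_stall` are hypothetical names, the tick counts are arbitrary, and nothing here is the kernel's NMI watchdog implementation. The simulated watchdog reports a lockup once it has gone more than `limit` ticks without being touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated watchdog: trips after `limit` ticks without a touch. */
struct sim_watchdog {
    unsigned int idle_ticks;  /* ticks since the last touch */
    unsigned int limit;       /* silent ticks tolerated before a lockup report */
    bool tripped;             /* set once a lockup has been reported */
};

/* Stand-in for touching the watchdog: resets the idle counter. */
static void sim_touch_watchdog(struct sim_watchdog *wd)
{
    wd->idle_ticks = 0;
}

/* One timer period passes with the CPU still spinning. */
static void sim_tick(struct sim_watchdog *wd)
{
    if (++wd->idle_ticks > wd->limit)
        wd->tripped = true;
}

/*
 * Busy-wait for total_ticks. If chunk is nonzero, touch the watchdog
 * every `chunk` ticks. Returns true if the watchdog reported a lockup.
 */
static bool sim_stall(struct sim_watchdog *wd, unsigned int total_ticks,
                      unsigned int chunk)
{
    assert(wd != NULL);

    for (unsigned int t = 1; t <= total_ticks; t++) {
        sim_tick(wd);
        if (chunk != 0 && t % chunk == 0)
            sim_touch_watchdog(wd);
    }
    return wd->tripped;
}
```

With a limit of 5, a 20-tick stall with no touches trips the simulated watchdog, while the same stall touched every 4 ticks does not. Touching only silences the report for a stall that eventually ends; it does nothing for a wait that never finishes.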

I am also curious about Andrew's question: does the system shut down
properly without nmi-watchdog? It could be that interrupts are
disabled, ACPI then tries to shut the system down and fails, and the
system just sits (like, say, a Windows 9x machine :>) until the
watchdog finally causes an OOPS. This seems most likely, in fact.

Robert Love

2002-03-15 21:45:33

by Udo A. Steinberg

Subject: Re: [OOPS] Kernel powerdown

"Grover, Andrew" wrote:

> > Looks like the ACPI code is simply forgetting to turn off the
> > NMI watchdog

That's right, however I don't think it should have to turn it off.

> Does the machine power off successfully using ACPI when the NMI watchdog is
> not enabled?

No, it never managed to power off with ACPI. It works with APM though.

> Theoretically we should be turning the machine off, after which I'm pretty
> sure the NMI watchdog shouldn't be an issue :)

That's what I think.

> but IIRC we are masking
> interrupts and doing some delays before turning off, so the NMI watchdog
> might not be liking that?

The problem is that it doesn't power off at all, no matter how long the
delay is ;)

> APM doesn't turn off the NMI afaik so why should ACPI have to?

Imho the problem will most likely go away when poweroff works properly
on my board. I can supply whatever info you need to make it work, too ;)

The board is an Asus A7V.

-Udo.

2002-03-15 21:50:24

by Andrew Grover

Subject: RE: [OOPS] Kernel powerdown

> From: Udo A. Steinberg [mailto:[email protected]]
> > Does the machine power off successfully using ACPI when the
> NMI watchdog is
> > not enabled?
>
> No, it never managed to power off with ACPI. It works with APM though.

Oh. Well then the NMI thing is a red herring. Try the latest ACPI patch from
sf.net/projects/acpi and see if that fixes things.

-- Andy

2002-03-15 21:56:24

by Robert Love

Subject: Re: [OOPS] Kernel powerdown

On Fri, 2002-03-15 at 16:44, Udo A. Steinberg wrote:

> > Does the machine power off successfully using ACPI when the NMI watchdog is
> > not enabled?
>
> No, it never managed to power off with ACPI. It works with APM though.

Ah, that is the problem, then.

> > APM doesn't turn off the NMI afaik so why should ACPI have to?
>
> Imho the problem will most likely go away when poweroff works properly
> on my board. I can supply whatever info you need to make it work, too ;)
>
> The board is an Asus A7V.

See if the attached patch fixes it ...

Robert Love

diff -urN linux-2.4.19/drivers/acpi/hardware/hwsleep.c linux/drivers/acpi/hardware/hwsleep.c
--- linux-2.4.19/drivers/acpi/hardware/hwsleep.c Fri Mar 15 00:28:10 2002
+++ linux/drivers/acpi/hardware/hwsleep.c Fri Mar 15 16:54:57 2002
@@ -152,6 +152,15 @@
 		return status;
 	}
 
+	/*
+	 * Broken ACPI table on ASUS A7V:
+	 * it reports type 7, but poweroff is type 2
+	 */
+	if (type_a == 7 && type_b == 7 && sleep_state == ACPI_STATE_S5
+	    && !memcmp(acpi_gbl_DSDT->oem_id, "ASUS\0\0", 6)
+	    && !memcmp(acpi_gbl_DSDT->oem_table_id, "A7V", 3)) {
+		type_a = type_b = 2;
+	}
 	/* run the _PTS and _GTS methods */
 
 	MEMSET(&arg_list, 0, sizeof(arg_list));
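
The matching logic in the patch can be tried outside the kernel. The following is a standalone sketch under stated assumptions: `sketch_table_header`, `sketch_fixup_sleep_type` and `SKETCH_STATE_S5` are hypothetical names, the struct keeps only the two identifier fields the patch compares (6-byte OEM ID, 8-byte OEM table ID), and the values 7 and 2 are taken from the patch as posted, not verified against any board.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_STATE_S5 5   /* soft-off sleep state */

/* Only the identifier fields the quirk looks at. */
struct sketch_table_header {
    char oem_id[6];        /* padded with NUL bytes, not NUL-terminated */
    char oem_table_id[8];
};

/*
 * Return the sleep type to use for `sleep_state`. If the table identifies
 * an ASUS A7V and reports type 7 for S5, substitute type 2, as the patch
 * does; otherwise return the reported type unchanged.
 */
static unsigned char sketch_fixup_sleep_type(const struct sketch_table_header *dsdt,
                                             unsigned int sleep_state,
                                             unsigned char type)
{
    assert(dsdt != NULL);

    if (type == 7 && sleep_state == SKETCH_STATE_S5
        && memcmp(dsdt->oem_id, "ASUS\0\0", 6) == 0       /* exact 6-byte match */
        && memcmp(dsdt->oem_table_id, "A7V", 3) == 0)     /* 3-byte prefix match */
        return 2;

    return type;
}
```

Because only the first three bytes of the table ID are compared, any table ID that begins with "A7V" would take the override, not just one that is exactly "A7V". Whether that breadth is wanted is a judgment call for a board-specific quirk like this one.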

2002-03-15 22:18:48

by Udo A. Steinberg

Subject: Re: [OOPS] Kernel powerdown

"Grover, Andrew" wrote:
>
> Oh. Well then the NMI thing is a red herring. Try the latest ACPI patch from
> sf.net/projects/acpi and see if that fixes things.

The latest ACPI patch fixes it.
Sorry Robert, that makes your patch obsolete :)

-Udo.

2002-03-15 22:26:59

by Robert Love

Subject: Re: [OOPS] Kernel powerdown

On Fri, 2002-03-15 at 17:18, Udo A. Steinberg wrote:

> The latest ACPI patch fixes it.
> Sorry Robert, that makes your patch obsolete :)

I believe it includes a variant of the patch I sent.

No matter, so long as it works. It would be nice though to know if what
I posted works as that can easily be pushed to Marcelo and Linus.
Nonetheless, the ACPI can push their next update in due time.

Glad it works,

Robert Love

2002-03-15 22:41:00

by Udo A. Steinberg

Subject: Re: [OOPS] Kernel powerdown

Robert Love wrote:
>
> I believe it includes a variant of the patch I sent.

I can't find any A7V specific workarounds in the latest ACPI patch from
sf.net, so I don't think so.

> No matter, so long as it works. It would be nice though to know if what
> I posted works as that can easily be pushed to Marcelo and Linus.

The newer code seems to do the right thing (tm), so I'd definitely
prefer that over some mobo-specific code. It appears that the A7V isn't
broken.

> Nonetheless, the ACPI can push their next update in due time.

Yep. No rush.

-Udo.

2002-03-16 13:39:24

by Tony Hoyle

Subject: Re: [OOPS] Kernel powerdown

Robert Love wrote:
>>
>>The board is an Asus A7V.
>>
>
Hmm... I had exactly the same (failure to poweroff) on the A7M. A BIOS
upgrade came out last week that seems to have fixed it (I'm still
looking at why the power button doesn't work though).

Tony