2002-08-07 16:26:15

by Adam Lackorzynski

Subject: Oopses on dual Athlon with 2.4.19-ac4 and 2.4.20-pre1-ac1

Hi,

I have a dual Athlon here which oopses after 2 minutes of dnetc when
running 2.4.19-ac4 or 2.4.20-pre1-ac1. I cannot reproduce this with
2.4.19 or 2.4.20-pre1. The two Athlons are sitting on an A7M266-D.


I have put the kernel log, kernel config, lspci info etc. on
http://os.inf.tu-dresden.de/~adam/oops/
as they are quite big.

2.4.20-pre1-ac1:

Unable to handle kernel NULL pointer dereference at virtual address 0000002a
printing eip:
c011831c
*pde = 00000000
Oops: 0000
CPU: 1
EIP: 0010:[<c011831c>] Not tainted
EFLAGS: 00010007
eax: 0000008c ebx: ffffffd6 ecx: c0395264 edx: ddecc000
esi: c0395240 edi: ddecc02c ebp: ddecdfa4 esp: ddecdf88
ds: 0018 es: 0018 ss: 0018
Process dnetc (pid: 310, stackpage=ddecd000)
Stack: ddecc000 c0395264 ddecc02c 00000001 c011521f ddecc000 ddecc000 ddecdfbc
c011935d ddecc000 000000c5 000b4aa0 c0395240 bffff7b4 c0108b6b 00000000
00000000 40025004 000000c5 000b4aa0 bffff7b4 0000009e 0000002b 0000002b
Call Trace: [<c011521f>] [<c011935d>] [<c0108b6b>]

Code: 8b 4b 54 89 4d f4 8b 72 58 85 c9 75 37 89 73 58 f0 ff 46 14

>>EIP; c011831c <schedule+19c/3a0> <=====

>>ebx; ffffffd6 <END_OF_CODE+3fbf683a/????>
>>ecx; c0395264 <runqueues+9e4/13800>
>>edx; ddecc000 <END_OF_CODE+1dac2864/????>
>>esi; c0395240 <runqueues+9c0/13800>
>>edi; ddecc02c <END_OF_CODE+1dac2890/????>
>>ebp; ddecdfa4 <END_OF_CODE+1dac4808/????>
>>esp; ddecdf88 <END_OF_CODE+1dac47ec/????>

Trace; c011521f <smp_apic_timer_interrupt+ef/110>
Trace; c011935d <sys_sched_yield+11d/130>
Trace; c0108b6b <system_call+33/38>

Code; c011831c <schedule+19c/3a0>
00000000 <_EIP>:
Code; c011831c <schedule+19c/3a0> <=====
0: 8b 4b 54 mov 0x54(%ebx),%ecx <=====
Code; c011831f <schedule+19f/3a0>
3: 89 4d f4 mov %ecx,0xfffffff4(%ebp)
Code; c0118322 <schedule+1a2/3a0>
6: 8b 72 58 mov 0x58(%edx),%esi
Code; c0118325 <schedule+1a5/3a0>
9: 85 c9 test %ecx,%ecx
Code; c0118327 <schedule+1a7/3a0>
b: 75 37 jne 44 <_EIP+0x44> c0118360 <schedule+1e0/3a0>
Code; c0118329 <schedule+1a9/3a0>
d: 89 73 58 mov %esi,0x58(%ebx)
Code; c011832c <schedule+1ac/3a0>
10: f0 ff 46 14 lock incl 0x14(%esi)

------------------------------------------------------------

Unable to handle kernel NULL pointer dereference at virtual address 0000002a
printing eip:
c011831c
*pde = 00000000
Oops: 0000
CPU: 1
EIP: 0010:[<c011831c>] Not tainted
EFLAGS: 00010003
eax: 0000008c ebx: ffffffd6 ecx: c0395264 edx: dea0c000
esi: c0395240 edi: dea0c02c ebp: dea0dfa4 esp: dea0df88
ds: 0018 es: 0018 ss: 0018
Process dnetc (pid: 307, stackpage=dea0d000)
Stack: dea0c000 c0395264 dea0c02c c011521f 00000000 dea0c000 dea0c000 dea0dfbc
c011935d dea0c000 000000c5 000be6e0 c0395240 bffff7b4 c0108b6b 00000000
00000000 40025004 000000c5 000be6e0 bffff7b4 0000009e 0000002b 0000002b
Call Trace: [<c011521f>] [<c011935d>] [<c0108b6b>]

Code: 8b 4b 54 89 4d f4 8b 72 58 85 c9 75 37 89 73 58 f0 ff 46 14

>>EIP; c011831c <schedule+19c/3a0> <=====

>>ebx; ffffffd6 <END_OF_CODE+3fbf683a/????>
>>ecx; c0395264 <runqueues+9e4/13800>
>>edx; dea0c000 <END_OF_CODE+1e602864/????>
>>esi; c0395240 <runqueues+9c0/13800>
>>edi; dea0c02c <END_OF_CODE+1e602890/????>
>>ebp; dea0dfa4 <END_OF_CODE+1e604808/????>
>>esp; dea0df88 <END_OF_CODE+1e6047ec/????>

Trace; c011521f <smp_apic_timer_interrupt+ef/110>
Trace; c011935d <sys_sched_yield+11d/130>
Trace; c0108b6b <system_call+33/38>

Code; c011831c <schedule+19c/3a0>
00000000 <_EIP>:
Code; c011831c <schedule+19c/3a0> <=====
0: 8b 4b 54 mov 0x54(%ebx),%ecx <=====
Code; c011831f <schedule+19f/3a0>
3: 89 4d f4 mov %ecx,0xfffffff4(%ebp)
Code; c0118322 <schedule+1a2/3a0>
6: 8b 72 58 mov 0x58(%edx),%esi
Code; c0118325 <schedule+1a5/3a0>
9: 85 c9 test %ecx,%ecx
Code; c0118327 <schedule+1a7/3a0>
b: 75 37 jne 44 <_EIP+0x44> c0118360 <schedule+1e0/3a0>
Code; c0118329 <schedule+1a9/3a0>
d: 89 73 58 mov %esi,0x58(%ebx)
Code; c011832c <schedule+1ac/3a0>
10: f0 ff 46 14 lock incl 0x14(%esi)


2.4.19-ac4:

Unable to handle kernel NULL pointer dereference at virtual address 0000002a
printing eip:
c01181ac
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c01181ac>] Not tainted
EFLAGS: 00010003
eax: 0000008c ebx: ffffffd6 ecx: c038b8a4 edx: de7ce000
esi: c038b880 edi: de7ce02c ebp: de7cffa4 esp: de7cff88
ds: 0018 es: 0018 ss: 0018
Process dnetc (pid: 302, stackpage=de7cf000)
Stack: de7ce000 c038b8a4 de7ce02c 00000000 c011521f de7ce000 de7ce000 de7cffbc
c01191ed de7ce000 0000003c 00050910 c038b880 bffff7b4 c0108b6b 00000000
00000000 40025004 0000003c 00050910 bffff7b4 0000009e 0000002b 0000002b
Call Trace: [<c011521f>] [<c01191ed>] [<c0108b6b>]

Code: 8b 4b 54 89 4d f4 8b 72 58 85 c9 75 37 89 73 58 f0 ff 46 14

>>EIP; c01181ac <schedule+19c/3a0> <=====

>>ebx; ffffffd6 <END_OF_CODE+3fbff8ba/????>
>>ecx; c038b8a4 <runqueues+24/13800>
>>edx; de7ce000 <END_OF_CODE+1e3cd8e4/????>
>>esi; c038b880 <runqueues+0/13800>
>>edi; de7ce02c <END_OF_CODE+1e3cd910/????>
>>ebp; de7cffa4 <END_OF_CODE+1e3cf888/????>
>>esp; de7cff88 <END_OF_CODE+1e3cf86c/????>

Trace; c011521f <smp_apic_timer_interrupt+ef/110>
Trace; c01191ed <sys_sched_yield+11d/130>
Trace; c0108b6b <system_call+33/38>

Code; c01181ac <schedule+19c/3a0>
00000000 <_EIP>:
Code; c01181ac <schedule+19c/3a0> <=====
0: 8b 4b 54 mov 0x54(%ebx),%ecx <=====
Code; c01181af <schedule+19f/3a0>
3: 89 4d f4 mov %ecx,0xfffffff4(%ebp)
Code; c01181b2 <schedule+1a2/3a0>
6: 8b 72 58 mov 0x58(%edx),%esi
Code; c01181b5 <schedule+1a5/3a0>
9: 85 c9 test %ecx,%ecx
Code; c01181b7 <schedule+1a7/3a0>
b: 75 37 jne 44 <_EIP+0x44> c01181f0 <schedule+1e0/3a0>
Code; c01181b9 <schedule+1a9/3a0>
d: 89 73 58 mov %esi,0x58(%ebx)
Code; c01181bc <schedule+1ac/3a0>
10: f0 ff 46 14 lock incl 0x14(%esi)
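
A quick cross-check on the three dumps above, as a minimal sketch rather than part of the ksymoops output: the faulting instruction is `mov 0x54(%ebx),%ecx`, so the address in the "Unable to handle" line should equal ebx + 0x54 with 32-bit wraparound. The helper names below are made up for illustration.

```python
# Sketch: recompute the faulting address from the register dump.
# Assumes plain 32-bit i386 address arithmetic (wraparound at 2**32).

MASK32 = 0xFFFFFFFF


def fault_address(ebx: int, displacement: int = 0x54) -> int:
    """Effective address of `mov 0x54(%ebx),%ecx` for a given ebx."""
    return (ebx + displacement) & MASK32


def as_signed32(value: int) -> int:
    """Interpret a 32-bit register value as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


ebx = 0xFFFFFFD6  # ebx as printed in all three oopses above
print(hex(fault_address(ebx)))  # 0x2a, matching "virtual address 0000002a"
print(as_signed32(ebx))         # -42
```

So ebx is not a valid pointer but a small negative number, and the "NULL pointer dereference" at 0000002a is simply that value plus the 0x54 offset. What ebx was supposed to point to cannot be told from the dump alone.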






HTH,
Adam
--
Adam [email protected]
Lackorzynski http://os.inf.tu-dresden.de/~adam/


2002-08-07 17:34:23

by Alan

Subject: Re: Oopses on dual Athlon with 2.4.19-ac4 and 2.4.20-pre1-ac1

On Wed, 2002-08-07 at 17:29, Adam Lackorzynski wrote:
> Hi,
>
> I have a dual Athlon here which oopses after 2 minutes of dnetc when
> running 2.4.19-ac4 or 2.4.20-pre1-ac1. I cannot reproduce this with
> 2.4.19 or 2.4.20-pre1. The two Athlons are sitting on an A7M266-D.

Are you loading the amd76x_pm module for power management ?

2002-08-07 18:13:02

by Alan

Subject: Re: Oopses on dual Athlon with 2.4.19-ac4 and 2.4.20-pre1-ac1

On Wed, 2002-08-07 at 19:05, Adam Lackorzynski wrote:
> On Wed Aug 07, 2002 at 19:57:03 +0100, Alan Cox wrote:
> > On Wed, 2002-08-07 at 17:29, Adam Lackorzynski wrote:
> > > I have a dual Athlon here which oopses after 2 minutes of dnetc when
> > > running 2.4.19-ac4 or 2.4.20-pre1-ac1. I cannot reproduce this with
> > > 2.4.19 or 2.4.20-pre1. The two Athlons are sitting on an A7M266-D.
> >
> > Are you loading the amd76x_pm module for power management ?
>
> No, the module wasn't loaded in any of the cases. Only ipv6 and rtc are
> loaded.

Can you reproduce it with ACPI disabled ?

2002-08-07 18:01:34

by Adam Lackorzynski

Subject: Re: Oopses on dual Athlon with 2.4.19-ac4 and 2.4.20-pre1-ac1

On Wed Aug 07, 2002 at 19:57:03 +0100, Alan Cox wrote:
> On Wed, 2002-08-07 at 17:29, Adam Lackorzynski wrote:
> > I have a dual Athlon here which oopses after 2 minutes of dnetc when
> > running 2.4.19-ac4 or 2.4.20-pre1-ac1. I cannot reproduce this with
> > 2.4.19 or 2.4.20-pre1. The two Athlons are sitting on an A7M266-D.
>
> Are you loading the amd76x_pm module for power management ?

No, the module wasn't loaded in any of the cases. Only ipv6 and rtc are
loaded.



Adam
--
Adam [email protected]
Lackorzynski http://os.inf.tu-dresden.de/~adam/

2002-08-07 18:44:12

by Adam Lackorzynski

Subject: Re: Oopses on dual Athlon with 2.4.19-ac4 and 2.4.20-pre1-ac1

On Wed Aug 07, 2002 at 20:35:42 +0100, Alan Cox wrote:
> On Wed, 2002-08-07 at 19:05, Adam Lackorzynski wrote:
> > On Wed Aug 07, 2002 at 19:57:03 +0100, Alan Cox wrote:
> > > On Wed, 2002-08-07 at 17:29, Adam Lackorzynski wrote:
> > > > I have a dual Athlon here which oopses after 2 minutes of dnetc when
> > > > running 2.4.19-ac4 or 2.4.20-pre1-ac1. I cannot reproduce this with
> > > > 2.4.19 or 2.4.20-pre1. The two Athlons are sitting on an A7M266-D.
> > >
> > > Are you loading the amd76x_pm module for power management ?
> >
> > No, the module wasn't loaded in any of the cases. Only ipv6 and rtc are
> > loaded.
>
> Can you reproduce it with ACPI disabled ?

Yes, 2.4.20-pre1-ac1 with acpi=off crashed after 5 minutes of uptime.
Unfortunately I only have remote access right now (and forgot to plug the
serial cable into another box), so that oops has to wait... :/



Adam
--
Adam [email protected]
Lackorzynski http://os.inf.tu-dresden.de/~adam/

2002-08-07 20:41:58

by Benjamin LaHaise

Subject: Re: Oopses on dual Athlon with 2.4.19-ac4 and 2.4.20-pre1-ac1

On Wed, Aug 07, 2002 at 08:35:42PM +0100, Alan Cox wrote:
> Can you reproduce it with ACPI disabled ?

Hmm, my system has ACPI disabled. It hit the following Oops with
2.4.19-rc3-ac4 on an A7M-266D, but with the power management code
compiled in. Note that it hit during the nightly updatedb task.

-ben
--
"You will be reincarnated as a toad; and you will be much happier."

Aug 7 04:02:00 bob kernel: int3: 0000
Aug 7 04:02:00 bob kernel: CPU: 0
Aug 7 04:02:00 bob kernel: EIP: 0010:[ctl_cpu_vars+5/192] Not tainted
Aug 7 04:02:00 bob kernel: EIP: 0010:[<c02bff85>] Not tainted
Aug 7 04:02:00 bob kernel: EFLAGS: 00000246
Aug 7 04:02:00 bob kernel: eax: 00000000 ebx: 00000000 ecx: c1e1f080 edx: ddd24a40
Aug 7 04:02:00 bob kernel: esi: bffffa30 edi: 08054a58 ebp: bffff908 esp: ca083f94
Aug 7 04:02:00 bob kernel: ds: 0018 es: 0018 ss: 0018
Aug 7 04:02:00 bob kernel: Process find (pid: 1958, stackpage=ca083000)
Aug 7 04:02:00 bob kernel: Stack: c0146784 ddd24a40 ddd24a40 c1e1dec0 08054a58 00000004 bffffa68 00000008
Aug 7 04:02:00 bob kernel: 00000001 da986e40 ca082000 c0108aab 08054a5a bffffa30 4213030c bffffa30
Aug 7 04:02:00 bob kernel: 08054a58 bffff908 000000c4 0000002b 0000002b 000000c4 420da0e3 00000023
Aug 7 04:02:00 bob kernel: Call Trace: [sys_lstat64+52/112] [system_call+51/56]
Aug 7 04:02:00 bob kernel: Call Trace: [<c0146784>] [<c0108aab>]
Aug 7 04:02:00 bob kernel:
Aug 7 04:02:00 bob kernel: Code: db 27 c0 98 44 36 c0 04 00 00 00 24 01 00 00 00 00 00 00 00

2002-08-08 08:13:07

by Adam Lackorzynski

Subject: Re: Oopses on dual Athlon with 2.4.19-ac4 and 2.4.20-pre1-ac1

On Wed Aug 07, 2002 at 20:35:42 +0100, Alan Cox wrote:
> On Wed, 2002-08-07 at 19:05, Adam Lackorzynski wrote:
> > On Wed Aug 07, 2002 at 19:57:03 +0100, Alan Cox wrote:
> > > On Wed, 2002-08-07 at 17:29, Adam Lackorzynski wrote:
> > > > I have a dual Athlon here which oopses after 2 minutes of dnetc when
> > > > running 2.4.19-ac4 or 2.4.20-pre1-ac1. I cannot reproduce this with
> > > > 2.4.19 or 2.4.20-pre1. The two Athlons are sitting on an A7M266-D.
> > >
> > > Are you loading the amd76x_pm module for power management ?
> >
> > No, the module wasn't loaded in any of the cases. Only ipv6 and rtc are
> > loaded.
>
> Can you reproduce it with ACPI disabled ?

2.4.20-pre1-ac1 with acpi=off looks like the other ones (same EIP, but the
faulting address is now 00000028 instead of 0000002a):

>>EIP; c011831c <schedule+19c/3a0> <=====

Code: 8b 4b 54 89 4d f4 8b 72 58 85 c9 75 37 89 73 58 f0 ff 46 14
Unable to handle kernel NULL pointer dereference at virtual address 00000028
c011831c
*pde = 00000000
Oops: 0000
CPU: 1
EIP: 0010:[<c011831c>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010003
eax: 0000008c ebx: ffffffd4 ecx: c03956dc edx: dd956000
esi: c0395240 edi: dd95602c ebp: dd957fa4 esp: dd957f88
ds: 0018 es: 0018 ss: 0018
Process dnetc (pid: 338, stackpage=dd957000)
Stack: dd956000 c03956dc dd95602c 00000001 c011521f dd956000 dd956000 dd957fbc
c011935d dd956000 000000c5 00013880 c0395240 bffff7b4 c0108b6b 00000000
00000001 40025004 000000c5 00013880 bffff7b4 0000009e 0000002b 0000002b
Call Trace: [<c011521f>] [<c011935d>] [<c0108b6b>]
Code: 8b 4b 54 89 4d f4 8b 72 58 85 c9 75 37 89 73 58 f0 ff 46 14

>>ebx; ffffffd4 <END_OF_CODE+3fbf6838/????>
>>ecx; c03956dc <runqueues+e5c/13800>
>>edx; dd956000 <END_OF_CODE+1d54c864/????>
>>esi; c0395240 <runqueues+9c0/13800>
>>edi; dd95602c <END_OF_CODE+1d54c890/????>
>>ebp; dd957fa4 <END_OF_CODE+1d54e808/????>
>>esp; dd957f88 <END_OF_CODE+1d54e7ec/????>

Trace; c011521f <smp_apic_timer_interrupt+ef/110>
Trace; c011935d <sys_sched_yield+11d/130>
Trace; c0108b6b <system_call+33/38>

Code; c011831c <schedule+19c/3a0>
00000000 <_EIP>:
Code; c011831c <schedule+19c/3a0> <=====
0: 8b 4b 54 mov 0x54(%ebx),%ecx <=====
Code; c011831f <schedule+19f/3a0>
3: 89 4d f4 mov %ecx,0xfffffff4(%ebp)
Code; c0118322 <schedule+1a2/3a0>
6: 8b 72 58 mov 0x58(%edx),%esi
Code; c0118325 <schedule+1a5/3a0>
9: 85 c9 test %ecx,%ecx
Code; c0118327 <schedule+1a7/3a0>
b: 75 37 jne 44 <_EIP+0x44> c0118360 <schedule+1e0/3a0>
Code; c0118329 <schedule+1a9/3a0>
d: 89 73 58 mov %esi,0x58(%ebx)
Code; c011832c <schedule+1ac/3a0>
10: f0 ff 46 14 lock incl 0x14(%esi)
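
One more sanity check on the register dumps in this thread, as a rough sketch under a stated assumption: 2.4 on i386 uses 8 KB kernel stacks with the task structure at the bottom, so masking esp with ~0x1fff should give the task base. In every dump that value equals edx, and the printed stackpage is that base plus 0x1000. The constant and function names are illustrative only.

```python
# Sketch: check that esp, edx and "stackpage" in each oops are mutually
# consistent. THREAD_SIZE = 8 KB is an assumption about these kernels.

THREAD_SIZE = 0x2000

# (kernel, esp, edx, stackpage) copied from the dumps in this thread
DUMPS = [
    ("2.4.20-pre1-ac1 #1",       0xDDECDF88, 0xDDECC000, 0xDDECD000),
    ("2.4.20-pre1-ac1 #2",       0xDEA0DF88, 0xDEA0C000, 0xDEA0D000),
    ("2.4.19-ac4",               0xDE7CFF88, 0xDE7CE000, 0xDE7CF000),
    ("2.4.20-pre1-ac1 acpi=off", 0xDD957F88, 0xDD956000, 0xDD957000),
]


def task_base(esp: int) -> int:
    """Round esp down to the start of its THREAD_SIZE-aligned stack area."""
    return esp & ~(THREAD_SIZE - 1)


def is_consistent(esp: int, edx: int, stackpage: int) -> bool:
    base = task_base(esp)
    return base == edx and stackpage == base + 0x1000


for name, esp, edx, stackpage in DUMPS:
    print(f"{name}: base={task_base(esp):08x} consistent={is_consistent(esp, edx, stackpage)}")
```

If that assumption holds, the stack pointer and edx look sane in all four cases, and only ebx carries a bogus value.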


Adam
--
Adam [email protected]
Lackorzynski http://os.inf.tu-dresden.de/~adam/