2003-01-14 18:41:45

by Robert L. Harris

Subject: Oops on server that just started hanging and crashing



Ok, after some data collection since (didn't know only the box in
question could decode an oops...):

System panic'd and has started hanging without a visual panic:

Dual-amd 1.5Ghz
512Meg Ram
3Ware IDE RAID controller
16x160Gig disks, Making up 4 RAID5 arrays



ksymoops 2.3.4 on i686 2.4.19-ac4. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.19-ac4/ (default)
-m /boot/System.map-2.4.19-ac4 (specified)

No modules in ksyms, skipping objects
Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid lsmod file?
Warning (compare_maps): ksyms_base symbol __wake_up_sync_R__ver___wake_up_sync not found in System.map. Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol i2o_sys_init_R__ver_i2o_sys_init not found in System.map. Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol idle_cpu_R__ver_idle_cpu not found in System.map. Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol set_cpus_allowed_R__ver_set_cpus_allowed not found in System.map. Ignoring ksyms_base entry
invalid operand: 0000
CPU: 1
EIP: 0010:[<c02b700b>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: 0000002b ebx: c020f938 ecx: 00000001 edx: 00000001
esi: 0000001a edi: 00000020 ebp: c16e3560 esp: dffe9ea4
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=dffe9000)
Stack: c039e240 c020f938 0000189d 0000189d c16e3400 df04ea40 c020f940
df04ea40
0000189d c020f938 00000020 c16e3400 00000040 0000e401 0000189d
00000020 00000001 0000001f c842b89d 00001000 c020f170 c16e3400
df3249c0 04000001
Call Trace: [<c020f938>] [<c020f940>] [<c020f938>] [<c020f170>]
[<c010a461>] [<c010a656>] [<c0106df0>] [<c010cc88>] [<c0106df0>]
[<c0106e1c>] [<c0106e79>]
[<c011abe9>] [<c011ae7f>]
Code: 0f 0b 5c 00 60 e2 39 c0 83 c4 14 5b c3 90 8d b4 26 00 00 00

>>EIP; c02b700b <skb_over_panic+2b/40> <=====
Trace; c020f938 <boomerang_rx+258/430>
Trace; c020f940 <boomerang_rx+260/430>
Trace; c020f938 <boomerang_rx+258/430>
Trace; c020f170 <boomerang_interrupt+130/3d0>
Trace; c010a461 <handle_IRQ_event+51/80>
Trace; c010a656 <do_IRQ+a6/f0>
Trace; c0106df0 <default_idle+0/40>
Trace; c010cc88 <call_do_IRQ+5/d>
Trace; c0106df0 <default_idle+0/40>
Trace; c0106e1c <default_idle+2c/40>
Trace; c0106e79 <cpu_idle+29/40>
Trace; c011abe9 <call_console_drivers+d9/e0>
Trace; c011ae7f <release_console_sem+8f/a0>
Code; c02b700b <skb_over_panic+2b/40>
00000000 <_EIP>:
Code; c02b700b <skb_over_panic+2b/40> <=====
0: 0f 0b ud2a <=====
Code; c02b700d <skb_over_panic+2d/40>
2: 5c pop %esp
Code; c02b700e <skb_over_panic+2e/40>
3: 00 60 e2 add %ah,0xffffffe2(%eax)
Code; c02b7011 <skb_over_panic+31/40>
6: 39 c0 cmp %eax,%eax
Code; c02b7013 <skb_over_panic+33/40>
8: 83 c4 14 add $0x14,%esp
Code; c02b7016 <skb_over_panic+36/40>
b: 5b pop %ebx
Code; c02b7017 <skb_over_panic+37/40>
c: c3 ret
Code; c02b7018 <skb_over_panic+38/40>
d: 90 nop
Code; c02b7019 <skb_over_panic+39/40>
e: 8d b4 26 00 00 00 00 lea 0x0(%esi,1),%esi

<0>Kernel panic: Aiee, killing interrupt handler!

5 warnings issued. Results may not be reliable.



:wq!
---------------------------------------------------------------------------
Robert L. Harris | PGP Key ID: FC96D405

DISCLAIMER:
These are MY OPINIONS ALONE. I speak for no-one else.
FYI:
perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'



2003-01-15 16:48:29

by Jan Hudec

Subject: Re: Oops on server that just started hanging and crashing

On Tue, Jan 14, 2003 at 01:50:33PM -0500, Robert L. Harris wrote:
> Ok, after some data collection since (didn't know only the box in
> question could decode an oops...):
>
> System panic'd and has started hanging without a visual panic:

Started since when? Since recompiling and booting a new kernel and/or
kernel modules, or without any such change? I ask because panics and
oopses often result from hardware failure. Also, did the kernel crash
just once, several times at the same or a similar address, or several
times at completely different addresses?

For hard lockups, you could enable the NMI watchdog. That would give
you an oops even if the machine otherwise locks up completely. See
Documentation/nmi_watchdog.txt in the kernel sources. However, that is
only useful if you don't suspect hardware (unfortunately almost
anything - power source, memory, CPU, bus controllers... - can cause
seemingly random lockups and oopses).
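
A rough sketch of how one might confirm the watchdog is really running
once it has been enabled at boot (nmi_watchdog.txt describes the boot
parameter; nmi_watchdog=1 is the usual choice on SMP boxes with an
IO-APIC): sample the NMI line of /proc/interrupts twice and see whether
the counters grow. The function names and the sample text are made up
for illustration, and Python is used only for convenience.

```python
def nmi_counts(interrupts_text):
    """Return per-CPU NMI counters from /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Keep only numeric columns; some kernels append a description.
            return [int(f) for f in fields[1:] if f.isdigit()]
    return []  # no NMI line found


def nmi_deltas(before, after):
    """Per-CPU increase in NMI count between two samples."""
    return [a - b for b, a in zip(before, after)]


# Hypothetical samples taken a few seconds apart on a two-CPU box.
SAMPLE_1 = """\
           CPU0       CPU1
  0:      41873      40112    IO-APIC-edge  timer
NMI:      81985      81985
LOC:      81902      81901
"""
SAMPLE_2 = """\
           CPU0       CPU1
  0:      42373      40612    IO-APIC-edge  timer
NMI:      82485      82485
LOC:      82402      82401
"""

deltas = nmi_deltas(nmi_counts(SAMPLE_1), nmi_counts(SAMPLE_2))
print("NMI increase per CPU:", deltas)
```

On a real box you would read /proc/interrupts twice, a few seconds
apart, instead of using the samples. Counters that stay at zero suggest
the watchdog is not active.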

> Dual-amd 1.5Ghz
> 512Meg Ram
> 3Ware IDE RAID controller
> 16x160Gig disks, Making up 4 RAID5 arrays
>
>
>
> ksymoops 2.3.4 on i686 2.4.19-ac4. Options used
> -V (default)
> -k /proc/ksyms (default)
> -l /proc/modules (default)
> -o /lib/modules/2.4.19-ac4/ (default)
> -m /boot/System.map-2.4.19-ac4 (specified)
>
> No modules in ksyms, skipping objects
> Warning (read_lsmod): no symbols in lsmod, is /proc/modules a valid lsmod file?
> Warning (compare_maps): ksyms_base symbol __wake_up_sync_R__ver___wake_up_sync not found in System.map. Ignoring ksyms_base entry
> Warning (compare_maps): ksyms_base symbol i2o_sys_init_R__ver_i2o_sys_init not found in System.map. Ignoring ksyms_base entry
> Warning (compare_maps): ksyms_base symbol idle_cpu_R__ver_idle_cpu not found in System.map. Ignoring ksyms_base entry
> Warning (compare_maps): ksyms_base symbol set_cpus_allowed_R__ver_set_cpus_allowed not found in System.map. Ignoring ksyms_base entry
> invalid operand: 0000

Are you absolutely sure the System.map is for the kernel that generated
the oops? And absolutely sure the same modules are loaded?

-------------------------------------------------------------------------------
Jan 'Bulb' Hudec <[email protected]>