2002-06-07 17:40:34

by Anna Riley

Subject: kernel meltdown

Hi there,
I am hoping someone can help me. This morning one of our web servers crapped itself and I don't know why. It's running RedHat 7.2, kernel version 2.4.9-31smp. I couldn't log in from the console, so I had to reset it. When it came back up it was fine. This is what I am seeing in the messages log:


Jun 7 02:57:43 web02 kernel: kernel BUG at slab.c:1767!
Jun 7 02:57:43 web02 kernel: invalid operand: 0000
Jun 7 02:57:43 web02 kernel: Kernel 2.4.9-31smp
Jun 7 02:57:43 web02 kernel: CPU: 0
Jun 7 02:57:43 web02 kernel: EIP: 0010:[kmem_cache_reap+504/912] Not tainted
Jun 7 02:57:43 web02 kernel: EIP: 0010:[<c0133d08>] Not tainted
Jun 7 02:57:43 web02 kernel: EFLAGS: 00010092
Jun 7 02:57:43 web02 kernel: EIP is at kmem_cache_reap [kernel] 0x1f8
Jun 7 02:57:43 web02 kernel: eax: 0000001b ebx: e2c34000 ecx: c02db9e4 edx: 00003906
Jun 7 02:57:43 web02 kernel: esi: c1b8f9e8 edi: c1b8f9f8 ebp: 00000000 esp: e3fedf8c
Jun 7 02:57:43 web02 kernel: ds: 0018 es: 0018 ss: 0018
Jun 7 02:57:43 web02 kernel: Process kswapd (pid: 5, stackpage=e3fed000)
Jun 7 02:57:43 web02 kernel: Stack: c024f253 000006e7 00000d80 c1b8f9f8 c1b8f9f0 00000183 e3f9d000 0000000a
Jun 7 02:57:43 web02 kernel: 00000000 00000000 00000000 00000183 000000c0 000000c0 0008e000 c0136076
Jun 7 02:57:43 web02 kernel: 000000c0 e3fec000 00000006 c01360d5 000000c0 00000000 00010f00 c1d9ffb8
Jun 7 02:57:43 web02 kernel: Call Trace: [call_spurious_interrupt+130654/156203] .rodata.str1.1 [kernel] 0x2a8e
Jun 7 02:57:43 web02 kernel: Call Trace: [<c024f253>] .rodata.str1.1 [kernel] 0x2a8e
Jun 7 02:57:43 web02 kernel: [do_try_to_free_pages+70/80] do_try_to_free_pages [kernel] 0x46
Jun 7 02:57:43 web02 kernel: [<c0136076>] do_try_to_free_pages [kernel] 0x46
Jun 7 02:57:43 web02 kernel: [kswapd+85/240] kswapd [kernel] 0x55
Jun 7 02:57:43 web02 kernel: [<c01360d5>] kswapd [kernel] 0x55
Jun 7 02:57:43 web02 kernel: [_stext+0/96] stext [kernel] 0x0
Jun 7 02:57:43 web02 kernel: [<c0105000>] stext [kernel] 0x0
Jun 7 02:57:43 web02 kernel: [kernel_thread+38/48] kernel_thread [kernel] 0x26
Jun 7 02:57:43 web02 kernel: [<c0105866>] kernel_thread [kernel] 0x26
Jun 7 02:57:43 web02 kernel: [kswapd+0/240] kswapd [kernel] 0x0
Jun 7 02:57:43 web02 kernel: [<c0136080>] kswapd [kernel] 0x0
Jun 7 02:57:43 web02 kernel:
Jun 7 02:57:43 web02 kernel:
Jun 7 02:57:43 web02 kernel: Code: 0f 0b 58 5a 8b 03 45 39 f8 75 dd 8b 4e 2c 89 ea 8b 7e 4c d3


I have searched Google for some of these messages but couldn't find anything helpful. Any help or direction would be appreciated.

I am not subscribed to this list, so email sent to me directly would be great. If I am mailing the wrong list, my apologies.

Thanks so much all!

-anna


2002-06-07 18:33:00

by Anna Riley

Subject: RE: kernel meltdown


Here is the output, and thanks, Thunder:

ksymoops 2.4.1 on i686 2.4.9-31smp. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.9-31smp/ (default)
-m /boot/System.map-2.4.9-31smp (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Error (expand_objects): cannot stat(/lib/ext3.o) for ext3
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/jbd.o) for jbd
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/sym53c8xx.o) for sym53c8xx
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/sd_mod.o) for sd_mod
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/scsi_mod.o) for scsi_mod
ksymoops: No such file or directory
/usr/bin/find: /lib/modules/2.4.9-31smp/build: No such file or directory
Error (pclose_local): find_objects pclose failed 0x100


Warning (compare_maps): ksyms_base symbol
GPLONLY_IO_APIC_get_PCI_irq_vector not found in System.map. Ignoring
ksyms_base entry
Warning (compare_maps): ksyms_base symbol
GPLONLY_pci_hp_change_slot_info not found in System.map. Ignoring
ksyms_base entry
Warning (compare_maps): ksyms_base symbol GPLONLY_pci_hp_deregister not
found in System.map. Ignoring ksyms_base entry
Warning (compare_maps): ksyms_base symbol GPLONLY_pci_hp_register not
found in System.map. Ignoring ksyms_base entry
Warning (compare_maps): mismatch on symbol partition_name , ksyms_base
says c01c2f40, System.map says c0161e20. Ignoring ksyms_base entry
Warning (compare_maps): mismatch on symbol nlmsvc_grace_period , lockd
says e4924e54, /lib/modules/2.4.9-31smp/kernel/fs/lockd/lockd.o says
e49242b4. Ignoring /lib/modules/2.4.9-31smp/kernel/fs/lockd/lockd.o
entry
Warning (compare_maps): mismatch on symbol nlmsvc_ops , lockd says
e4924e50, /lib/modules/2.4.9-31smp/kernel/fs/lockd/lockd.o says
e49242b0. Ignoring /lib/modules/2.4.9-31smp/kernel/fs/lockd/lockd.o
entry
Warning (compare_maps): mismatch on symbol nlmsvc_timeout , lockd says
e4924e58, /lib/modules/2.4.9-31smp/kernel/fs/lockd/lockd.o says
e49242b8. Ignoring /lib/modules/2.4.9-31smp/kernel/fs/lockd/lockd.o
entry
Warning (compare_maps): mismatch on symbol nfs_debug , sunrpc says
e4916180, /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o says
e4915e60. Ignoring /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o
entry
Warning (compare_maps): mismatch on symbol nfsd_debug , sunrpc says
e4916184, /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o says
e4915e64. Ignoring /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o
entry
Warning (compare_maps): mismatch on symbol nlm_debug , sunrpc says
e4916188, /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o says
e4915e68. Ignoring /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o
entry
Warning (compare_maps): mismatch on symbol rpc_debug , sunrpc says
e491617c, /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o says
e4915e5c. Ignoring /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o
entry
Warning (compare_maps): mismatch on symbol rpc_garbage_args , sunrpc
says e491615c, /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o says
e4915e3c. Ignoring /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o
entry
Warning (compare_maps): mismatch on symbol rpc_success , sunrpc says
e491614c, /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o says
e4915e2c. Ignoring /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o
entry
Warning (compare_maps): mismatch on symbol rpc_system_err , sunrpc says
e4916160, /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o says
e4915e40. Ignoring /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o
entry
Warning (compare_maps): mismatch on symbol xdr_one , sunrpc says
e4916144, /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o says
e4915e24. Ignoring /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o
entry
Warning (compare_maps): mismatch on symbol xdr_two , sunrpc says
e4916148, /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o says
e4915e28. Ignoring /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o
entry
Warning (compare_maps): mismatch on symbol xdr_zero , sunrpc says
e4916140, /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o says
e4915e20. Ignoring /lib/modules/2.4.9-31smp/kernel/net/sunrpc/sunrpc.o
entry
Warning (compare_maps): mismatch on symbol usb_devfs_handle , usbcore
says e48db520, /lib/modules/2.4.9-31smp/kernel/drivers/usb/usbcore.o
says e48db040. Ignoring
/lib/modules/2.4.9-31smp/kernel/drivers/usb/usbcore.o entry
Warning (map_ksym_to_module): cannot match loaded module ext3 to a
unique module object. Trace may not be reliable.
Warning (compare_maps): mismatch on symbol sd , sd_mod says e481ce60,
/lib/modules/2.4.9-31smp/kernel/drivers/scsi/sd_mod.o says e481cdc0.
Ignoring /lib/modules/2.4.9-31smp/kernel/drivers/scsi/sd_mod.o entry
Warning (compare_maps): mismatch on symbol proc_scsi , scsi_mod says
e4818c9c, /lib/modules/2.4.9-31smp/kernel/drivers/scsi/scsi_mod.o says
e48174d4. Ignoring
/lib/modules/2.4.9-31smp/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_devicelist , scsi_mod
says e4818cc8, /lib/modules/2.4.9-31smp/kernel/drivers/scsi/scsi_mod.o
says e4817500. Ignoring
/lib/modules/2.4.9-31smp/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_hostlist , scsi_mod
says e4818cc4, /lib/modules/2.4.9-31smp/kernel/drivers/scsi/scsi_mod.o
says e48174fc. Ignoring
/lib/modules/2.4.9-31smp/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_hosts , scsi_mod says
e4818ccc, /lib/modules/2.4.9-31smp/kernel/drivers/scsi/scsi_mod.o says
e4817504. Ignoring
/lib/modules/2.4.9-31smp/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_logging_level ,
scsi_mod says e4818c98,
/lib/modules/2.4.9-31smp/kernel/drivers/scsi/scsi_mod.o says e48174d0.
Ignoring /lib/modules/2.4.9-31smp/kernel/drivers/scsi/scsi_mod.o entry
kernel BUG at slab.c:1767!
invalid operand: 0000
CPU: 0
EIP: 0010:[kmem_cache_reap+504/912] Not tainted
EIP: 0010:[<c0133d08>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010092
eax: 0000001b ebx: e2c34000 ecx: c02db9e4 edx: 00003906
esi: c1b8f9e8 edi: c1b8f9f8 ebp: 00000000 esp: e3fedf8c
ds: 0018 es: 0018 ss: 0018
Process kswapd (pid: 5, stackpage=e3fed000)
Stack: c024f253 000006e7 00000d80 c1b8f9f8 c1b8f9f0 00000183 e3f9d000
0000000a
00000000 00000000 00000000 00000183 000000c0 000000c0 0008e000
c0136076
000000c0 e3fec000 00000006 c01360d5 000000c0 00000000 00010f00
c1d9ffb8
Call Trace: [call_spurious_interrupt+130654/156203] .rodata.str1.1
[kernel] 0x2a8e
Call Trace: [<c024f253>] .rodata.str1.1 [kernel] 0x2a8e
[<c0136076>] do_try_to_free_pages [kernel] 0x46
[<c01360d5>] kswapd [kernel] 0x55
[<c0105000>] stext [kernel] 0x0
[<c0105866>] kernel_thread [kernel] 0x26
[<c0136080>] kswapd [kernel] 0x0
Code: 0f 0b 58 5a 8b 03 45 39 f8 75 dd 8b 4e 2c 89 ea 8b 7e 4c d3

>>EIP; c0133d08 <kmem_cache_reap+1f8/390> <=====
Trace; c024f253 <call_spurious_interrupt+1fe5e/2622b>
Trace; c0136076 <do_try_to_free_pages+46/50>
Trace; c01360d5 <kswapd+55/f0>
Trace; c0105000 <_stext+0/0>
Trace; c0105866 <kernel_thread+26/30>
Trace; c0136080 <kswapd+0/f0>
Code; c0133d08 <kmem_cache_reap+1f8/390>
00000000 <_EIP>:
Code; c0133d08 <kmem_cache_reap+1f8/390> <=====
0: 0f 0b ud2a <=====
Code; c0133d0a <kmem_cache_reap+1fa/390>
2: 58 pop %eax
Code; c0133d0b <kmem_cache_reap+1fb/390>
3: 5a pop %edx
Code; c0133d0c <kmem_cache_reap+1fc/390>
4: 8b 03 mov (%ebx),%eax
Code; c0133d0e <kmem_cache_reap+1fe/390>
6: 45 inc %ebp
Code; c0133d0f <kmem_cache_reap+1ff/390>
7: 39 f8 cmp %edi,%eax
Code; c0133d11 <kmem_cache_reap+201/390>
9: 75 dd jne ffffffe8 <_EIP+0xffffffe8>
c0133cf0 <kmem_cache_reap+1e0/390>
Code; c0133d13 <kmem_cache_reap+203/390>
b: 8b 4e 2c mov 0x2c(%esi),%ecx
Code; c0133d16 <kmem_cache_reap+206/390>
e: 89 ea mov %ebp,%edx
Code; c0133d18 <kmem_cache_reap+208/390>
10: 8b 7e 4c mov 0x4c(%esi),%edi
Code; c0133d1b <kmem_cache_reap+20b/390>
13: d3 00 roll %cl,(%eax)


27 warnings and 6 errors issued. Results may not be reliable.
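
For anyone reading the decode above: the raw log prints `kmem_cache_reap+504/912` in decimal, while ksymoops prints the same location as `kmem_cache_reap+1f8/390` in hex (offset into the function / size of the function). Subtracting the offset from the address gives the start of the function, which can be cross-checked against the trace, for example `kswapd+55` at c01360d5 against `kswapd+0` at c0136080. A small sketch of that arithmetic follows; the helper names are made up for illustration and are not part of ksymoops.

```c
#include <stdint.h>

/* Hypothetical helper: start address of the function that contains eip,
 * given the offset of eip into that function (the "+1f8" part). */
static uint32_t function_start(uint32_t eip, uint32_t offset)
{
    return eip - offset;
}

/* Hypothetical helper: nonzero when eip lies inside a function of the
 * given size (the "/390" part) that begins at start. */
static int eip_in_function(uint32_t eip, uint32_t start, uint32_t size)
{
    return eip >= start && eip - start < size;
}
```

With the values from this oops, `function_start(0xc0133d08, 0x1f8)` is 0xc0133b10, and 0xc0133d08 falls inside the 0x390 bytes that start there.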




-anna



-----Original Message-----
From: Thunder from the hill [mailto:[email protected]]
Sent: Friday, June 07, 2002 1:27 PM
To: Anna Riley
Cc: Thunder from the hill
Subject: RE: kernel meltdown


Hi,

On Fri, 7 Jun 2002, Anna Riley wrote:
> kernel BUG at slab.c:1767!
> invalid operand: 0000
> Kernel 2.4.9-31smp
> CPU: 0
> EIP: 0010:[kmem_cache_reap+504/912] Not tainted
> EIP: 0010:[<c0133d08>] Not tainted
> EFLAGS: 00010092
> EIP is at kmem_cache_reap [kernel] 0x1f8
> eax: 0000001b ebx: e2c34000 ecx: c02db9e4 edx: 00003906
> esi: c1b8f9e8 edi: c1b8f9f8 ebp: 00000000 esp: e3fedf8c
> ds: 0018 es: 0018 ss: 0018
> Process kswapd (pid: 5, stackpage=e3fed000)
> Stack: c024f253 000006e7 00000d80 c1b8f9f8 c1b8f9f0 00000183 e3f9d000
> 0000000a
> 00000000 00000000 00000000 00000183 000000c0 000000c0 0008e000
> c0136076
> 000000c0 e3fec000 00000006 c01360d5 000000c0 00000000 00010f00
> c1d9ffb8
> Call Trace: [call_spurious_interrupt+130654/156203] .rodata.str1.1
> [kernel] 0x2a8e
> Call Trace: [<c024f253>] .rodata.str1.1 [kernel] 0x2a8e
> [do_try_to_free_pages+70/80] do_try_to_free_pages [kernel] 0x46
> [<c0136076>] do_try_to_free_pages [kernel] 0x46
> [kswapd+85/240] kswapd [kernel] 0x55
> [<c01360d5>] kswapd [kernel] 0x55
> [_stext+0/96] stext [kernel] 0x0
> [<c0105000>] stext [kernel] 0x0
> [kernel_thread+38/48] kernel_thread [kernel] 0x26
> [<c0105866>] kernel_thread [kernel] 0x26
> [kswapd+0/240] kswapd [kernel] 0x0
> [<c0136080>] kswapd [kernel] 0x0
> Code: 0f 0b 58 5a 8b 03 45 39 f8 75 dd 8b 4e 2c 89 ea 8b 7e 4c d3

> > Then do "cat $filename | ksymoops" and send the result to the list.

You still haven't pushed it through ksymoops, which was the whole
point.
Only you can do that, since only you have the appropriate symbols.
Please push this oops through ksymoops and mail it to
[email protected]! (I will get it, too.)

Regards,
Thunder
--
ship is leaving right on time | Thunder from the hill at ngforever
empty harbour, wave goodbye |
evacuation of the isle | free inhabitant not directly
caveman's paintings drowning | belonging anywhere

2002-06-07 22:11:28

by Kasper Dupont

Subject: Re: kernel meltdown

Anna Riley wrote:
>
> Hi there,
> I am hoping someone can help me. This morning one of our web servers
> crapped itself and I don't know why. It's running RedHat 7.2 kernel
> version 2.4.9-31smp. I couldn't login from the console so I had to
> reset it. When it came back up it was fine. This is what I am seeing
> in the messages log:
>
> Jun 7 02:57:43 web02 kernel: kernel BUG at slab.c:1767!

I looked up that line in the source and found this piece of code:

        full_free = 0;
        p = searchp->slabs_free.next;
        while (p != &searchp->slabs_free) {
                slabp = list_entry(p, slab_t, list);
                if (slabp->inuse)
                        BUG();
                full_free++;
                p = p->next;
        }

Could it be a race with this particular slabp being taken in use
by another CPU at this very moment?
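
A user-space sketch of the invariant that loop enforces may help when reasoning about this. The types below are simplified stand-ins, not the real slab.c definitions, and `count_free_slabs` is a hypothetical helper that returns -1 at the point where the kernel would call BUG(). It models only the check itself: every slab still linked on the free list must have `inuse == 0`.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel's list_head and slab_t (assumed
 * layout, for illustration only). */
struct list_head {
    struct list_head *next, *prev;
};

typedef struct slab {
    struct list_head list;
    unsigned int inuse;   /* objects currently allocated from this slab */
} slab_t;

/* Recover the containing structure from a pointer to its list member. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/*
 * Walk the free list the way the quoted loop does. Returns the number of
 * fully free slabs, or -1 where the kernel code would hit BUG().
 */
static int count_free_slabs(struct list_head *slabs_free)
{
    int full_free = 0;
    struct list_head *p;

    for (p = slabs_free->next; p != slabs_free; p = p->next) {
        slab_t *slabp = list_entry(p, slab_t, list);

        if (slabp->inuse)
            return -1;
        full_free++;
    }
    return full_free;
}
```

If anything sets `inuse` on a slab that is still linked on the free list, whether through a race or through memory corruption, the walk reports -1. The sketch cannot say which of the two happened here; it only shows what state the check at slab.c:1767 rejects.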

--
Kasper Dupont -- der bruger for meget tid på usenet.
For sending spam use mailto:[email protected]

2002-06-07 23:50:18

by Tomas Vanderka

Subject: Re: kernel meltdown

Hi,
yesterday morning I got almost the same oops, with kernel BUG at
slab.c:1794! I am running a UP box, so I don't think it's an SMP problem.
(I have had a lot of oopses since I added one HDD and changed RAID0 to RAID5.
Are you using anything like RAID, LVM or reiserfs?)
I'm just sending this because it looks like the same problem.

VanTo


Attachments:
7d (2.84 kB)

2002-06-10 09:15:37

by Randal, Phil

Subject: RE: kernel meltdown

RedHat have released an updated kernel (2.4.9-34).

Does this help?

---------------------------------------------
Phil Randal
Network Engineer
Herefordshire Council
Hereford, UK

> -----Original Message-----
> From: Anna Riley [mailto:[email protected]]
> Sent: 07 June 2002 18:36
> To: [email protected]
> Subject: kernel meltdown
>
>
> Hi there,
> I am hoping someone can help me. This morning one of
> our web servers crapped itself and I don't know why. It's
> running RedHat 7.2 kernel version 2.4.9-31smp. I couldn't
> login from the console so I had to reset it. When it came
> back up it was fine. This is what I am seeing in the messages log:
>
>
> Jun 7 02:57:43 web02 kernel: kernel BUG at slab.c:1767!
> Jun 7 02:57:43 web02 kernel: invalid operand: 0000
> Jun 7 02:57:43 web02 kernel: Kernel 2.4.9-31smp
> Jun 7 02:57:43 web02 kernel: CPU: 0
> Jun 7 02:57:43 web02 kernel: EIP:
> 0010:[kmem_cache_reap+504/912] Not tainted
> Jun 7 02:57:43 web02 kernel: EIP: 0010:[<c0133d08>] Not tainted
> Jun 7 02:57:43 web02 kernel: EFLAGS: 00010092
> Jun 7 02:57:43 web02 kernel: EIP is at kmem_cache_reap
> [kernel] 0x1f8
> Jun 7 02:57:43 web02 kernel: eax: 0000001b ebx: e2c34000
> ecx: c02db9e4 edx: 00003906
> Jun 7 02:57:43 web02 kernel: esi: c1b8f9e8 edi: c1b8f9f8
> ebp: 00000000 esp: e3fedf8c
> Jun 7 02:57:43 web02 kernel: ds: 0018 es: 0018 ss: 0018
> Jun 7 02:57:43 web02 kernel: Process kswapd (pid: 5,
> stackpage=e3fed000)
> Jun 7 02:57:43 web02 kernel: Stack: c024f253 000006e7
> 00000d80 c1b8f9f8 c1b8f9f0 00000183 e3f9d000 0000000a
> Jun 7 02:57:43 web02 kernel: 00000000 00000000
> 00000000 00000183 000000c0 000000c0 0008e000 c0136076
> Jun 7 02:57:43 web02 kernel: 000000c0 e3fec000
> 00000006 c01360d5 000000c0 00000000 00010f00 c1d9ffb8
> Jun 7 02:57:43 web02 kernel: Call Trace:
> [call_spurious_interrupt+130654/156203] .rodata.str1.1
> [kernel] 0x2a8e
> Jun 7 02:57:43 web02 kernel: Call Trace: [<c024f253>]
> .rodata.str1.1 [kernel] 0x2a8e
> Jun 7 02:57:43 web02 kernel: [do_try_to_free_pages+70/80]
> do_try_to_free_pages [kernel] 0x46
> Jun 7 02:57:43 web02 kernel: [<c0136076>]
> do_try_to_free_pages [kernel] 0x46
> Jun 7 02:57:43 web02 kernel: [kswapd+85/240] kswapd [kernel] 0x55
> Jun 7 02:57:43 web02 kernel: [<c01360d5>] kswapd [kernel] 0x55
> Jun 7 02:57:43 web02 kernel: [_stext+0/96] stext [kernel] 0x0
> Jun 7 02:57:43 web02 kernel: [<c0105000>] stext [kernel] 0x0
> Jun 7 02:57:43 web02 kernel: [kernel_thread+38/48]
> kernel_thread [kernel] 0x26
> Jun 7 02:57:43 web02 kernel: [<c0105866>] kernel_thread
> [kernel] 0x26
> Jun 7 02:57:43 web02 kernel: [kswapd+0/240] kswapd [kernel] 0x0
> Jun 7 02:57:43 web02 kernel: [<c0136080>] kswapd [kernel] 0x0
> Jun 7 02:57:43 web02 kernel:
> Jun 7 02:57:43 web02 kernel:
> Jun 7 02:57:43 web02 kernel: Code: 0f 0b 58 5a 8b 03 45 39
> f8 75 dd 8b 4e 2c 89 ea 8b 7e 4c d3
>
>
> I have searched on google for some of these messages but I
> couldn't find anything helpful. Any help or direction would
> be appreciated.
>
> I am not subscribed to this list so email to me directly
> would be great. If I am mailing to the wrong list my apologies.
>
> Thanks so much all!
>
> -anna