2002-10-18 15:18:58

by David F Barrera

Subject: Machine hang - OOPS


My machine hung last night while running the Database Opensource Test
Suite (DOTS) with DB2 on a standard Red Hat Linux release 7.3 (Valhalla)
installation, kernel 2.4.18-3bigmem, on an i686. The machine appears to
be hung, although it still replies to a ping.

I captured the following oops and ran it through ksymoops. Does anyone
have an idea what might have gone wrong? If so, please respond to me
directly.

Thanks.

David F Barrera

ksymoops output:

ksymoops 2.4.4 on i686 2.4.18-3bigmem. Options used
-v /boot/vmlinux-2.4.18-3bigmem (specified)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.18-3bigmem/ (default)
-m /boot/System.map-2.4.18-3bigmem (specified)

Error (expand_objects): cannot stat(/lib/ips.o) for ips
Error (expand_objects): cannot stat(/lib/aic7xxx.o) for aic7xxx
Error (expand_objects): cannot stat(/lib/sd_mod.o) for sd_mod
Error (expand_objects): cannot stat(/lib/scsi_mod.o) for scsi_mod
Warning (compare_maps): mismatch on symbol partition_name , ksyms_base says c01ce610, vmlinux says c0168d00. Ignoring ksyms_base entry
wait_on_irq, CPU 6:
irq: 0 [ 0 0 0 0 0 0 0 0 ]
bh: 1 [ 1 0 0 0 1 0 0 0 ]
Stack dumps:
CPU 0:00000000 00000000 0000e4c3 0000e4c4 0000e4c5 00000000 00000000
00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 000fde67 00000000 00000000 00000000 00000000
00000000 00000000 Call Trace:
CPU 1:00000000 00000000 f568ba80 edccc9a8 c02f0bec 00000013 f690be00
00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 Call Trace:
CPU 2:00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
00000001 01000000 00000012 f6e89c00 f6e89c00 00000000 c8ecee00
c8ecee00 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 Call Trace:
CPU 3:00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 Call Trace:
CPU 4:00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 Call Trace: [<f898d88a>] speedo_start_xmit [eepro100] 0x17a
[<c01e67f4>] qdisc_restart [kernel] 0x14
[<c01dd7fe>] dev_queue_xmit [kernel] 0x14e
[<c01d965c>] alloc_skb [kernel] 0xfc
[<c01f46d5>] ip_output [kernel] 0x105
[<c0205e80>] tcp_make_synack [kernel] 0x20
[<c01f432a>] ip_build_and_send_pkt [kernel] 0x1aa
[<c02079bd>] tcp_reset_keepalive_timer [kernel] 0x1d
[<f898d88a>] speedo_start_xmit [eepro100] 0x17a
[<c01e67f4>] qdisc_restart [kernel] 0x14
[<c01dd7fe>] dev_queue_xmit [kernel] 0x14e
[<f898d88a>] speedo_start_xmit [eepro100] 0x17a
[<c01f46d5>] ip_output [kernel] 0x105
[<c01e67f4>] qdisc_restart [kernel] 0x14
[<c01f4ae8>] ip_queue_xmit [kernel] 0x398
[<c01dd7fe>] dev_queue_xmit [kernel] 0x14e
[<c01f4ae8>] ip_queue_xmit [kernel] 0x398
[<c01f46d5>] ip_output [kernel] 0x105
[<c02093ee>] tcp_v4_send_check [kernel] 0x6e
[<c01f4ae8>] ip_queue_xmit [kernel] 0x398
[<c02093ee>] tcp_v4_send_check [kernel] 0x6e
[<c0204295>] tcp_transmit_skb [kernel] 0x565
[<c01daa64>] skb_checksum [kernel] 0x54
[<c01daa64>] skb_checksum [kernel] 0x54
[<c0119279>] __wake_up [kernel] 0x39
[<c020a87c>] tcp_v4_rcv [kernel] 0x3cc
[<c01daa64>] skb_checksum [kernel] 0x54
[<c011929e>] __wake_up [kernel] 0x5e
[<c020a87c>] tcp_v4_rcv [kernel] 0x3cc
[<c010de4f>] timer_interrupt [kernel] 0xaf
[<c01250e5>] update_process_times [kernel] 0x25
[<c01250e5>] update_process_times [kernel] 0x25
[<c0116079>] smp_apic_timer_interrupt [kernel] 0xa9
[<c012119b>] do_softirq [kernel] 0x7b
[<c012170f>] ksoftirqd [kernel] 0xaf
[<c0107286>] kernel_thread [kernel] 0x26
[<c0121660>] ksoftirqd [kernel] 0x0
CPU 5:00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 Call Trace:
CPU 7:00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 Call Trace: [<c0181fa1>] add_timer_randomness [kernel] 0xd1
[<c0182511>] extract_entropy [kernel] 0x381
[<f898d88a>] speedo_start_xmit [eepro100] 0x17a
[<f898d88a>] speedo_start_xmit [eepro100] 0x17a
[<c01e67f4>] qdisc_restart [kernel] 0x14
[<c01dd7fe>] dev_queue_xmit [kernel] 0x14e
[<c01d965c>] alloc_skb [kernel] 0xfc
[<c0209798>] tcp_v4_route_req [kernel] 0x78
[<c0118c60>] scheduler_tick [kernel] 0x80
[<c01250e5>] update_process_times [kernel] 0x25
[<c0116079>] smp_apic_timer_interrupt [kernel] 0xa9
[<c01250e5>] update_process_times [kernel] 0x25
[<f898d88a>] speedo_start_xmit [eepro100] 0x17a
[<c01e67f4>] qdisc_restart [kernel] 0x14
[<c01d965c>] alloc_skb [kernel] 0xfc
[<c01dd7e5>] dev_queue_xmit [kernel] 0x135
[<c01f46d5>] ip_output [kernel] 0x105
[<c01f4ae8>] ip_queue_xmit [kernel] 0x398
[<c010a52e>] handle_IRQ_event [kernel] 0x5e
[<c010a76d>] do_IRQ [kernel] 0xdd
[<c02093ee>] tcp_v4_send_check [kernel] 0x6e
[<c0204295>] tcp_transmit_skb [kernel] 0x565
[<c0204d67>] tcp_write_xmit [kernel] 0x157
[<c0119279>] __wake_up [kernel] 0x39
[<c02021b2>] __tcp_data_snd_check [kernel] 0x52
[<c020215c>] tcp_new_space [kernel] 0x7c
[<c020260c>] tcp_rcv_established [kernel] 0x11c
[<f8815890>] scsi_request_fn [scsi_mod] 0x1b0
[<c01daa64>] skb_checksum [kernel] 0x54
[<c020a2af>] tcp_v4_checksum_init [kernel] 0x7f
[<c020a571>] tcp_v4_rcv [kernel] 0xc1
[<f8815454>] scsi_io_completion_Rsmp_7e059b86 [scsi_mod] 0x2a4
[<c01f19ac>] ip_local_deliver [kernel] 0x12c
[<f881213f>] scsi_delete_timer_Rsmp_16aab775 [scsi_mod] 0xf
[<f880def1>] scsi_done [scsi_mod] 0xc1
[<f88580a1>] ips_done [ips] 0x4b1
[<f88588be>] ips_chkstatus [ips] 0x12e
[<f8854fb4>] ips_intr_copperhead [ips] 0x84
[<c0125410>] do_timer [kernel] 0x20
[<c010de4f>] timer_interrupt [kernel] 0xaf
[<c011862b>] wake_up_process [kernel] 0xb
[<c010a76d>] do_IRQ [kernel] 0xdd
[<c0106e70>] default_idle [kernel] 0x0
[<c0106e70>] default_idle [kernel] 0x0
[<c0106ef9>] cpu_idle [kernel] 0x29
[<c011cbf8>] printk [kernel] 0x128
CPU 6:c8ee7f28 c02423a4 00000006 00000000 ffffffff 00000006 c010a362
c02423b9
00000001 f6380000 00000001 c017d3fe f6380368 c02f5044 c8ee7f74
c8ee665c
c8ee6000 c012164d f6380000 f6380130 c02f5044 c8ee6000 00000000
c012a075
Call Trace: [<c010a362>] __global_cli [kernel] 0xe2
[<c017d3fe>] flush_to_ldisc [kernel] 0x9e
[<c012164d>] __run_task_queue [kernel] 0x5d
[<c012a075>] context_thread [kernel] 0x145
[<c0129f30>] context_thread [kernel] 0x0
[<c0105000>] stext [kernel] 0x0
[<c0107286>] kernel_thread [kernel] 0x26
[<c0129f30>] context_thread [kernel] 0x0
Warning (Oops_read): Code line not seen, dumping what data is available

Trace; c010a362 <__global_cli+e2/170>
Trace; c017d3fe <flush_to_ldisc+9e/120>
Trace; c012164d <__run_task_queue+5d/70>
Trace; c012a075 <context_thread+145/200>
Trace; c0129f30 <context_thread+0/200>
Trace; c0105000 <_stext+0/0>
Trace; c0107286 <kernel_thread+26/30>
Trace; c0129f30 <context_thread+0/200>


2 warnings and 4 errors issued. Results may not be reliable.
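
For anyone skimming a trace of this size, a small script can help show which modules and symbols dominate it. The following is only a minimal sketch in Python, assuming frames look like the "[<address>] symbol [module] offset" entries in the output above; the function names parse_frames and frames_by_module are made up for illustration and are not part of ksymoops.

```python
import re
from collections import Counter

# Matches frames such as "[<c01e67f4>] qdisc_restart [kernel] 0x14".
# The trailing offset is not captured, so a frame whose offset was
# wrapped onto the next line is still recognised.
FRAME_RE = re.compile(r"\[<([0-9a-f]{8})>\]\s+(\S+)\s+\[(\S+)\]")


def parse_frames(text):
    """Return a list of (address, symbol, module) tuples found in text."""
    return [(int(addr, 16), sym, mod) for addr, sym, mod in FRAME_RE.findall(text)]


def frames_by_module(text):
    """Count how many frames belong to each module name."""
    return Counter(mod for _, _, mod in parse_frames(text))
```

Feeding the saved oops text to frames_by_module gives a quick per-module tally (for example, frames from eepro100 versus the core kernel). It does not interpret the trace, and given the ksymoops warnings above, the symbols themselves may not be reliable.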



2002-10-18 15:50:23

by Alastair Stevens

Subject: Re: Machine hang - OOPS

> My machine hung last night while running the Database Opensource Test
> Suite (DOTS) with DB2 on a standard Red Hat Linux release 7.3 (Valhalla)
> installation, kernel 2.4.18-3bigmem, on an i686. The machine appears to
> be hung, although it still replies to a ping.

Just before Alan says it :-)

There have been 4 errata kernels released for RH 7.3 so far, so you
really ought to upgrade to the latest of those for starters. Obviously,
your problem may not be easily reproducible, but if you _could_
reproduce it on the latest kernel, I think people would take a higher
level of interest! Just my advice....

Cheers
Alastair

Alastair Stevens, Systems Management Team
MRC Biostatistics Unit, Cambridge UK
http://www.mrc-bsu.cam.ac.uk
01223 330383