Hello
This is not the first time I'm getting an Oops with 2.4.20 and they always
seem to appear when ppp is active AND another activity takes place. This
time it was reading a CD over the IDE-SCSI driver (I was trying to verify
the error reported earlier on this list - see Subj. "ide-scsi CD-recorder
error reading burned disks"). But earlier I saw these Oopses before I even
installed the CD-Writer, before I had them when using a USB-printer, while
also testing ppp. For some reason, it always happens when I am using
"local" ppp - over a null-modem, but not when dialing a provider. These
Oopses happen in the interrupt handler, so, I have to copy the
Oops-message manually. I could use a serial terminal, but I need ppp to
trigger them:-) ...and these modern boxes only have 1 serial port:-(
So, after re-typing the Oops and running ksymoops (after restoring the
original module configuration), I've got the following:
ksymoops 2.4.3 on i686 2.4.20. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.20/ (default)
-m /boot/System.map-2.4.20 (default)
Oops: 0000
CPU: 0
EIP: 0010:[<c8b597f8>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010282
eax: c26c3ec0 ebx: c02ae448 ecx: c61901c0 edx: c61901c0
esi: 00000000 edi: c02ae448 ebp: c0271e7c exp: c0271e5c
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=c0271000)
Stack: c02ae448 c02ae404 c02ae448 c0271e88 c01b164b 000010c6 00000000 c26c3ec0
c0271e90 c8b599ca c02ae448 00000000 00000000 c0271ec0 c01b19f5 c02ae448
c61901c0 00000000 c61901c0 c02ae404 c02ae448 00000000 c02ae404 00000000
Call Trace: [<c01b164b>] [<c8b599ca>] [<c01b19f5>] [<c0115ff8>] [<c01b1d42>]
[<c01b200f>] [<c01b1e70>] [<c011d598>] [<c0119fac>] [<c0119ed9>] [<c0119cbb>]
[<c01082dd>] [<c0105240>] [<c0105240>] [<c010a5b8>] [<c0105240>] [<c0105240>]
[<c0105266>] [<c01052e2>] [<c0105000>] [<c010502a>]
Code: 8b 56 18 c7 45 f4 00 00 00 00 89 70 04 8b 46 1c c7 46 10 00
>>EIP; c8b597f8 <[ide-scsi]idescsi_issue_pc+18/1a0> <=====
Trace; c01b164a <ide_wait_stat+ba/110>
Trace; c8b599ca <[ide-scsi]idescsi_do_request+4a/60>
Trace; c01b19f4 <start_request+194/200>
Trace; c0115ff8 <printk+108/120>
Trace; c01b1d42 <ide_do_request+272/2c0>
Trace; c01b200e <ide_timer_expiry+19e/1b0>
Trace; c01b1e70 <ide_timer_expiry+0/1b0>
Trace; c011d598 <timer_bh+248/360>
Trace; c0119fac <bh_action+1c/50>
Trace; c0119ed8 <tasklet_hi_action+48/70>
Trace; c0119cba <do_softirq+4a/a0>
Trace; c01082dc <do_IRQ+9c/b0>
Trace; c0105240 <default_idle+0/30>
Trace; c0105240 <default_idle+0/30>
Trace; c010a5b8 <call_do_IRQ+6/e>
Trace; c0105240 <default_idle+0/30>
Trace; c0105240 <default_idle+0/30>
Trace; c0105266 <default_idle+26/30>
Trace; c01052e2 <cpu_idle+42/60>
Trace; c0105000 <_stext+0/0>
Trace; c010502a <rest_init+2a/30>
Code; c8b597f8 <[ide-scsi]idescsi_issue_pc+18/1a0>
00000000 <_EIP>:
Code; c8b597f8 <[ide-scsi]idescsi_issue_pc+18/1a0> <=====
0: 8b 56 18 mov 0x18(%esi),%edx <=====
Code; c8b597fa <[ide-scsi]idescsi_issue_pc+1a/1a0>
3: c7 45 f4 00 00 00 00 movl $0x0,0xfffffff4(%ebp)
Code; c8b59802 <[ide-scsi]idescsi_issue_pc+22/1a0>
a: 89 70 04 mov %esi,0x4(%eax)
Code; c8b59804 <[ide-scsi]idescsi_issue_pc+24/1a0>
d: 8b 46 1c mov 0x1c(%esi),%eax
Code; c8b59808 <[ide-scsi]idescsi_issue_pc+28/1a0>
10: c7 46 10 00 00 00 00 movl $0x0,0x10(%esi)
<0>Kernel panic: Aiee, killing interrupt handler!
5 warnings issued. Results may not be reliable.
Thanks
Guennadi
---
Guennadi Liakhovetski
Aha! Here's what happened:
</var/log/messages>
...
Jan 1 21:08:20 poirot pppd[1223]: sent [LCP EchoReq id=0xf magic=0x36c1d42a]
Jan 1 21:08:20 poirot pppd[1223]: rcvd [LCP EchoRep id=0xf magic=0xa654c63c]
Jan 1 21:08:32 poirot kernel: scsi : aborting command due to timeout : pid 11654, scsi1, channel 0, id 0, lun 0 Read (10) 00 00 00 00 00 00 00 02 00
Jan 1 21:08:32 poirot kernel: hdc: timeout waiting for DMA
Jan 1 21:08:32 poirot kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
Jan 1 21:08:32 poirot kernel: hdc: status timeout: status=0xc0 { Busy }
Jan 1 21:08:32 poirot kernel: hdc: drive not ready for command
Jan 1 21:08:50 poirot pppd[1223]: sent [LCP EchoReq id=0x10 magic=0x36c1d42a]
Jan 1 21:08:50 poirot pppd[1223]: rcvd [LCP EchoRep id=0x10 magic=0xa654c63c]
And that's it. Then came the fatal Oops, described in the previous post.
Yeah, after IDE-errors there were at least another 18 seconds, but,
anyway, might be related? And here's one more fragment from messages, that
seem to show another lock-up:
Dec 23 20:46:32 poirot kernel: I/O error: dev 0b:00, sector 387984
Dec 23 20:47:40 poirot su: (to root) lyakh on /dev/tty2
Dec 23 20:47:40 poirot su: pam_unix2: session started for user root, service su
Dec 23 20:49:08 poirot kernel: I/O error: dev 0b:01, sector 387984
Dec 23 20:50:00 poirot /USR/SBIN/CRON[2313]: (root) CMD ( /usr/lib/sa/sa1 )
Dec 23 20:59:00 poirot /USR/SBIN/CRON[2318]: (root) CMD ( rm -f /var/spool/cron/lastrun/cron.hourly)
Dec 23 21:00:00 poirot /USR/SBIN/CRON[2322]: (root) CMD ( /usr/lib/sa/sa1 )
Dec 23 21:06:16 poirot kernel: scsi : aborting command due to timeout : pid 9333, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 00 00 00 00 02 00
Dec 23 21:06:16 poirot kernel: hdc: timeout waiting for DMA
Dec 23 21:06:16 poirot kernel: ide_dmaproc: chipset supported ide_dma_timeout func only: 14
Dec 23 21:06:16 poirot kernel: hdc: status timeout: status=0xc0 { Busy }
Dec 23 21:06:16 poirot kernel: hdc: drive not ready for command
Dec 23 21:06:41 poirot kernel: scsi : aborting command due to timeout : pid 9333, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 00 00 00 00 02 00
Dec 23 21:06:41 poirot kernel: SCSI host 0 abort (pid 9333) timed out - resetting
Dec 23 21:06:41 poirot kernel: SCSI bus is being reset for host 0 channel 0.
Dec 23 21:06:41 poirot kernel: scsi : aborting command due to timeout : pid 9333, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 00 00 00 00 02 00
Dec 23 21:06:41 poirot kernel: SCSI host 0 abort (pid 9333) timed out - resetting
Dec 23 21:06:41 poirot kernel: SCSI bus is being reset for host 0 channel 0.
Dec 23 21:06:42 poirot kernel: scsi : aborting command due to timeout : pid 9333, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 00 00 00 00 02 00
Dec 23 21:06:42 poirot kernel: SCSI host 0 abort (pid 9333) timed out - resetting
Dec 23 21:06:42 poirot kernel: SCSI bus is being reset for host 0 channel 0.
Dec 23 21:06:42 poirot kernel: scsi : aborting command due to timeout : pid 9333, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 00 00 00 00 02 00
Dec 23 21:06:42 poirot kernel: SCSI host 0 abort (pid 9333) timed out - resetting
Dec 23 21:06:42 poirot kernel: SCSI bus is being reset for host 0 channel 0.
Dec 23 21:06:43 poirot kernel: scsi : aborting command due to timeout : pid 9333, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 00 00 00 00 02 00
Dec 23 21:06:43 poirot kernel: SCSI host 0 abort (pid 9333) timed out - resetting
Dec 23 21:06:43 poirot kernel: SCSI bus is being reset for host 0 channel 0.
Dec 23 21:06:43 poirot kernel: scsi : aborting command due to timeout : pid 9333, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 00 00 00 00 02 00
Dec 23 21:06:43 poirot kernel: SCSI host 0 abort (pid 9333) timed out - resetting
Dec 23 21:06:43 poirot kernel: SCSI bus is being reset for host 0 channel 0.
Dec 23 21:06:44 poirot kernel: scsi : aborting command due to timeout : pid 9333, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 00 00 00 00 02 00
Dec 23 21:06:44 poirot kernel: SCSI host 0 abort (pid 9333) timed out - resetting
Dec 23 21:06:44 poirot kernel: SCSI bus is being reset for host 0 channel 0.
Dec 23 21:06:44 poirot kernel: scsi : aborting command due to timeout : pid 9333, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 00 00 00 00 02 00
Dec 23 21:06:44 poirot kernel: SCSI host 0 abort (pid 9333) timed out - resetting
Dec 23 21:06:44 poirot kernel: SCSI bus is being reset for host 0 channel 0.
Dec 23 21:06:45 poirot kernel: scsi : aborting command due to timeout : pid 9333, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 00 00 00 00 02 00
Dec 23 21:06:45 poirot kernel: SCSI host 0 abort (pid 9333) timed out - resetting
Dec 23 21:06:45 poirot kernel: SCSI bus is being reset for host 0 channel 0.
Dec 23 21:06:45 poirot kernel: scsi : aborting command due to timeout : pid 9333, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 00 00 00 00 02 00
Dec 23 21:06:45 poirot kernel: SCSI host 0 abort (pid 9333) timed out - resetting
Dec 23 21:06:45 poirot kernel: SCSI bus is being reset for host 0 channel 0.
Dec 23 21:06:46 poirot kernel: scsi : aborting command due to timeout : pid 9333, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 00 00 00 00 02 00
Dec 23 21:06:46 poirot kernel: SCSI host 0 abort (pid 9333) timed out - resetting
Dec 23 21:06:46 poirot kernel: SCSI bus is being reset for host 0 channel 0.
Dec 23 21:06:46 poirot kernel: hdc: ATAPI reset timed-out, status=0xd0
Thanks
---
Guennadi Liakhovetski
Forgot to mention - the hardware:
an ASUS (A7VI-VM) mobo with the
VIA ProSavage KM133 (VIA VT8365) North and
VT82C686B South bridges,
Duron 900MHz
hda: Maxtor 2B020H1,
hdb: WDC AC21600H,
hdc: 40X12, (BenQ CD/RW "CRW 4012A")
Adaptec 29160 Ultra160 SCSI adapter
PIONEER Model: DVD-ROM DVD-305 Rev: 1.03
...still hoping for some feedback / encouragement:-) Now that I was trying
to reproduce those Oopses it works rock-stable with the same kernel /
modules:-( whatever I do - ide-scsi read / write with usb-printing and
ppp-transfers, with and without X - conditions that gave me multiple
Oopses when I didn't want them:-) From all the info in these messages, any
idea how to re-trigger them?
Thanks
---
Guennadi Liakhovetski