I'm having some interrupt problems on a Dell XPS 600. It is running
Fedora Core 3 with a 2.6.18.2 kernel.org kernel, on a dual-core 3 GHz
Pentium D CPU. The system behaves as if interrupts are no longer
being generated. If irqbalance is running, /proc/interrupts shows
that the count for the affected device's interrupt stops incrementing
while that device is having problems. If irqbalance isn't running,
/proc/interrupts still seems to increment, but the device isn't
receiving the interrupts. I've attached dmesg and lspci.
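For anyone who wants to check for the same symptom, here is a rough
sketch of watching a single IRQ's count in /proc/interrupts. It
assumes the 2.6-era layout (a header row of CPUn columns, then one
line per IRQ with one count per CPU); the function names irq_count
and irq_stalled are made up for this example. Note that it can only
catch the irqbalance case, where the count stops; without irqbalance
the count keeps rising even though the device isn't being serviced.

```shell
#!/bin/sh
# Hypothetical helpers, assuming the 2.6-era /proc/interrupts layout.

# irq_count IRQ [FILE]: print the count for IRQ summed over all CPU
# columns of /proc/interrupts (or FILE, e.g. a saved copy). Prints
# nothing if the IRQ is not listed.
irq_count() {
    awk -v irq="$1:" '
        NR == 1 { ncpu = NF; next }   # header: one "CPUn" field per CPU
        $1 == irq {
            total = 0
            for (i = 2; i <= ncpu + 1; i++) total += $i
            print total
        }' "${2:-/proc/interrupts}"
}

# irq_stalled IRQ [SECONDS]: sample the count twice, SECONDS apart
# (default 5). Returns 0 and reports if the count did not change.
irq_stalled() {
    before=$(irq_count "$1")
    sleep "${2:-5}"
    after=$(irq_count "$1")
    if [ -n "$before" ] && [ "$before" = "$after" ]; then
        echo "IRQ $1: count stuck at $before"
        return 0
    fi
    return 1
}
```

With bonnie++ generating disk traffic, `irq_stalled 209 10` would flag
the SATA interrupt if its count stopped moving for ten seconds.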
I've seen SATA errors, the USB subsystem becoming completely
unresponsive, and other issues that could be explained by IRQ
problems. Over time I've observed that the system runs longer without
irqbalance, so I tried rapidly moving the SATA interrupt (IRQ 209)
between the two CPUs. On a fresh boot, outside X, with bonnie++
running, I ran the following from /proc/irq/209 (the values written
to smp_affinity are CPU bitmasks: 1 is CPU0, 2 is CPU1):
while true ; do echo 2 > smp_affinity; sleep .001 ; echo 1 > smp_affinity ; sleep .001 ; done
Very shortly afterwards I got the following SATA errors, from which
the system did not recover:
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: (BMDMA stat 0x24)
ata1.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: soft resetting port
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata1: hard resetting port
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata1: hard resetting port
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
I've also seen this error in the past:
ata1: command 0xca timeout, stat 0x50 host_stat 0x24
ata1: status=0x50 { DriveReady SeekComplete }
sda: Current: sense key: No Sense
Additional sense: No additional sense information
Info fld=0x6726
I copied these errors from the USB subsystem:
ohci-hcd 000:00:0b.0: IRQ INTR_SF loosage
ohci-hcd 000:00:0b.0: IRQ INTR_SF loosage
ohci-hcd 000:00:0b.0: bad entry 364d6041
Any ideas? Anyone else seeing any problems?
--
David Fries
http://fries.net/~david/ (PGP encryption key available)