2004-04-02 17:29:10

by Johannes Deisenhofer

Subject: Undecoded Interrupt with SiL3112 IDE?

Hi,

I have a recent problem with my sil3112 onboard SATA adapter.

After a while (sometimes after hours of uptime), I start getting this every few
seconds:

----- snip -----
irq 18: nobody cared!
Call Trace:
[<c0108f03>] __report_bad_irq+0x33/0x90
[<c0108fe0>] note_interrupt+0x50/0x80
[<c01091e9>] do_IRQ+0xa9/0x130
[<c0105030>] default_idle+0x0/0x30
[<c010795c>] common_interrupt+0x18/0x20
[<c0105030>] default_idle+0x0/0x30
[<c0105053>] default_idle+0x23/0x30
[<c01050de>] cpu_idle+0x2e/0x40
[<c0103055>] _stext+0x55/0x60
[<c04e66f5>] start_kernel+0x155/0x160

handlers:
[<c02ae9f0>] (ide_intr+0x0/0x180)
[<c02ae9f0>] (ide_intr+0x0/0x180)
Disabling IRQ #18
----- snip -----

This started recently (without changes in hardware or software) and drags down
my machine quite a bit. No hangs / data losses, however.

- There is always an interrupt storm (about 100,000 interrupts in roughly one
second) on IRQ 18 when this message is logged.
- kernel 2.6.5-rc2
- siimage driver (not libata)
- two SATA drives on adapter, ST3120026AS and WDC WD1200JD-00FYB0
- Asus A7N8X board (nforce2 chipset), latest BIOS
- There is only the onboard SATA adapter on this IRQ. I've pulled the PCI card
physically sharing the same IRQ line.
- Same problem with kernel 2.4, although it handles it less gracefully (system
freezes for some time).
- Disabling ACPI doesn't change a thing (the storm then hits IRQ #11, which gets
disabled instead)
- System has otherwise been stable.
- After reboot, problem will disappear for a while
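
One way to put a number on the storm described in the list above is to sample /proc/interrupts twice and compare the counters. The following is only a sketch: the helper names (parse_interrupts, irq_rate, sample_rate) are made up for illustration, and it assumes the usual layout of one header line followed by "label: per-CPU counts, then controller and handler names".

```python
import time


def parse_interrupts(text):
    """Return {irq_label: total count across all CPUs} from /proc/interrupts text."""
    totals = {}
    for line in text.splitlines()[1:]:  # first line is the CPU column header
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        count = 0
        for token in rest.split():
            if not token.isdigit():
                break  # reached the controller / handler names
            count += int(token)
        totals[label.strip()] = count
    return totals


def irq_rate(before, after, irq, seconds):
    """Interrupts per second for one IRQ label between two parsed snapshots."""
    return (after.get(irq, 0) - before.get(irq, 0)) / seconds


def sample_rate(irq, seconds=1.0, path="/proc/interrupts"):
    """Take two snapshots `seconds` apart and return the rate for `irq`."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return irq_rate(before, after, irq, seconds)
```

Calling sample_rate("18") in a loop while the machine is sluggish would show whether the counter really jumps by about 100,000 per second just before the IRQ is disabled.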

I suspect some unhandled error condition of the sil chip. After disconnecting
and reseating both SATA connectors, problems disappeared for two days.
Coincidence?

Anything I can test before I go and buy new cables?

From lspci -v -xxxx

01:0b.0 RAID bus controller: CMD Technology Inc Silicon Image SiI 3112 SATARaid Controller (rev 01)
        Subsystem: CMD Technology Inc: Unknown device 6112
        Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 18
        I/O ports at 9800 [size=8]
        I/O ports at 9c00 [size=4]
        I/O ports at a000 [size=8]
        I/O ports at a400 [size=4]
        I/O ports at a800 [size=16]
        Memory at de005000 (32-bit, non-prefetchable) [size=512]
        Expansion ROM at <unassigned> [disabled] [size=512K]
        Capabilities: [60] Power Management version 2
00: 95 10 12 31 07 00 b0 02 01 00 04 01 01 20 00 00
10: 01 98 00 00 01 9c 00 00 01 a0 00 00 01 a4 00 00
20: 01 a8 00 00 00 50 00 de 00 00 00 00 95 10 12 61
30: 00 00 00 00 60 00 00 00 00 00 00 00 0b 01 00 00
40: 02 00 00 00 00 82 08 ba 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 01 00 22 06 00 40 00 64 00 00 00 00 00 00 00 00
70: 00 00 20 00 00 e0 d5 37 00 00 20 00 00 c0 d5 37
80: 03 00 00 00 03 00 00 00 00 00 10 00 da a9 50 7e
90: 00 fc 01 01 0f ff 00 00 00 00 00 18 00 00 00 00
a0: 01 60 8a 32 8a 32 dd 62 c1 10 92 43 01 40 09 40
b0: 01 60 8a 32 8a 32 dd 62 c1 10 92 43 02 40 09 40
c0: 84 01 00 00 13 01 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
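
For readers who want to cross-check the hex dump against the decoded lspci lines, here is a rough sketch that unpacks the first 64 bytes using the standard PCI type-0 header layout. The function and key names are hypothetical; only the four rows from the dump above are used as input.

```python
import struct

# First four rows (offsets 0x00-0x3f) of the config-space dump above.
HEADER_HEX = """
95 10 12 31 07 00 b0 02 01 00 04 01 01 20 00 00
01 98 00 00 01 9c 00 00 01 a0 00 00 01 a4 00 00
01 a8 00 00 00 50 00 de 00 00 00 00 95 10 12 61
00 00 00 00 60 00 00 00 00 00 00 00 0b 01 00 00
"""


def decode_bar(raw):
    """Split a base address register into (kind, address)."""
    if raw & 0x1:
        return ("io", raw & ~0x3)   # bit 0 set: I/O space
    return ("mem", raw & ~0xF)      # bit 0 clear: memory space


def decode_header(cfg):
    """Decode a few fields of a PCI type-0 header (all little-endian)."""
    vendor, device, command, status = struct.unpack_from("<HHHH", cfg, 0x00)
    sub_vendor, sub_device = struct.unpack_from("<HH", cfg, 0x2C)
    return {
        "vendor": vendor,
        "device": device,
        "revision": cfg[0x08],
        "class": (cfg[0x0B] << 8) | cfg[0x0A],   # base class, subclass
        "latency": cfg[0x0D],
        "bars": [decode_bar(b) for b in struct.unpack_from("<6I", cfg, 0x10)],
        "subsystem": (sub_vendor, sub_device),
        "cap_ptr": cfg[0x34],
        "irq_line": cfg[0x3C],
        "irq_pin": cfg[0x3D],
    }


header = decode_header(bytes.fromhex("".join(HEADER_HEX.split())))
```

The decoded values agree with the text output: vendor 0x1095, device 0x3112, subsystem 0x6112, latency 32, capability pointer 0x60. The Interrupt Line register holds 0x0b (11) rather than 18. That register is normally just the value the firmware wrote for legacy routing, which would presumably explain why IRQ #11 is the one that gets disabled once ACPI is turned off.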


P.S.: I'm not on the linux-kernel list, but I read the archives

Jo



2004-04-02 17:48:34

by Justin Cormack

Subject: Re: Undecoded Interrupt with SiL3112 IDE?

siimage driver has buggy interrupt handling - have seen similar
behaviour. It appears to be unmaintained. Recommend using libata
instead.

Justin



2004-04-04 14:12:14

by Johannes Deisenhofer

Subject: Re: Undecoded Interrupt with SiL3112 IDE?

Justin Cormack wrote:
> siimage driver has buggy interrupt handling - have seen similar
> behaviour. It appears to be unmaintained. Recommend using libata
> instead.
>

I tried libata from 2.6.5-rc3. Previous versions are considered broken by the
author, especially for error handling.

However, I had far worse problems:

Apr 3 10:18:10 urmel kernel: <3>ata1: DMA timeout, stat 0x0
Apr 3 10:18:10 urmel kernel: ATA: abnormal status 0x58 on port 0xF8854087
Apr 3 10:18:10 urmel kernel: scsi0: ERROR on channel 0, id 0, lun 0, CDB: Read (10) 00 00 07 65 3f 00 00 c8 00
Apr 3 10:18:10 urmel kernel: Current sda: sense key Medium Error
Apr 3 10:18:10 urmel kernel: Additional sense: Unrecovered read error - auto reallocate failed
Apr 3 10:18:10 urmel kernel: end_request: I/O error, dev sda, sector 484671
Apr 3 10:18:10 urmel kernel: ATA: abnormal status 0x58 on port 0xF8854087
Apr 3 10:18:10 urmel last message repeated 2 times
Apr 3 10:19:47 urmel PAM_pwdb[3086]: (login) session opened for user root by (uid=0)

This probably was a hardware problem. A new SATA cable seems to have fixed it.
Can't explain the 'medium error'.
SMART status of the drive is ok. No bad sectors according to SMART, none
reallocated.
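
The numbers in the log can be picked apart a little further. The sketch below assumes that the nine bytes printed after "Read (10)" are the rest of the 10-byte CDB (the opcode being shown by name), and it uses the classic ATA status register bit names. The function names are illustrative only.

```python
# Classic ATA status register bits, most significant first.
ATA_STATUS_BITS = [
    (0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "DSC"),
    (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR"),
]


def decode_ata_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


def decode_read10_tail(hex_bytes):
    """Decode the nine bytes following the READ(10) opcode.

    Assumed layout: flags, 4-byte big-endian LBA, group number,
    2-byte big-endian transfer length, control byte.
    Returns (lba, transfer_length_in_blocks).
    """
    tail = bytes.fromhex("".join(hex_bytes.split()))
    if len(tail) != 9:
        raise ValueError("expected 9 bytes after the opcode")
    lba = int.from_bytes(tail[1:5], "big")
    length = int.from_bytes(tail[6:8], "big")
    return lba, length


lba, length = decode_read10_tail("00 00 07 65 3f 00 00 c8 00")
flags = decode_ata_status(0x58)
```

Under that assumption the command was a 200-block read starting at LBA 484671, the same sector that end_request reports. Status 0x58 decodes to DRDY, DSC and DRQ with ERR clear, so the drive itself was not flagging an error in its status register at that point.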

Jo