2003-11-10 05:43:31

by Gavin Baker

Subject: (HPT372A) cat /proc/ide/hpt366 == crash

I have a Highpoint "RocketRaid 133" dual-channel PCI IDE "raid" controller that uses an HPT372A.

With the latest 2.4 kernels and 2.6.0-test9, if I cat /proc/ide/hpt366 I get the IDE channel status, followed shortly afterwards by:

hdg: status timeout: status=0xd0 {Busy}

hdg: DMA disabled
hdg: drive not ready for command
ide3: reset: master: error (0x00?)
hdg: status timeout: status=0xd0 {Busy}

hdg: drive not ready for command
ide3: reset: master: error (0x00?)
end-request: I/O error, dev hdg, sector xxxxxx
EXT3-fs error (device md0): ext3_get_inode_loc: unable to read inode block - inode = xxxxxx, block = xxxxxx

The last two lines repeat until the filesystem is totally corrupted.
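
The `status=0xd0 {Busy}` value in the log above is the drive's ATA status register. As a rough aid for reading such lines, here is a minimal Python sketch that decodes a status byte, assuming the standard ATA status bit layout. The helper name `decode_ata_status` is hypothetical (it is not a kernel function), and the flag labels only approximate what the kernel prints. While the Busy bit is set the other bits are not meaningful, so the sketch reports Busy alone, which matches the log line.

```python
# Standard ATA status register bits; labels approximate the kernel's log text.
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_ata_status(status):
    """Return the names of the flags set in an ATA status byte.

    Hypothetical helper for illustration only.
    """
    if status & 0x80:
        # With Busy set, the remaining bits are not valid, so report Busy only.
        return ["Busy"]
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


if __name__ == "__main__":
    # 0xd0 is the value reported for hdg in the log above.
    print(hex(0xD0), decode_ata_status(0xD0))
```

A drive that stays at 0xd0 never dropped Busy after the command was issued, which is why the driver then reports "drive not ready for command" and attempts a channel reset.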

In regular usage the drives have been fine (I've built my distro from source without a problem).

dmesg:
HPT372A: IDE controller at PCI slot 0000:02:06.0
HPT372A: chipset revision 1
HPT37X: using 33MHz PCI clock
HPT372A: 100% native mode on irq 18
ide2: BM-DMA at 0xd800-0xd807, BIOS settings: hde:DMA, hdf:pio
HPT366: reg5ah=0x00 ATA-66 Cable Port0
ide3: BM-DMA at 0xd808-0xd80f, BIOS settings: hdg:DMA, hdh:pio
HPT366: reg5ah=0x00 ATA-66 Cable Port0
hde: ST3120026A, ATA DISK drive
ide2 at 0xc800-0xc807,0xcc02 on irq 18
hdg: ST3120026A, ATA DISK drive
ide3 at 0xd000-0xd007,0xd402 on irq 18

hde: max request size: 1024KiB
hde: 234441648 sectors (120034 MB) w/8192KiB Cache, CHS=16383/255/63, UDMA(100)
/dev/ide/host2/bus0/target0/lun0: p1 p2 p3
hdg: max request size: 1024KiB
hdg: 234441648 sectors (120034 MB) w/8192KiB Cache, CHS=16383/255/63, UDMA(100)
/dev/ide/host2/bus1/target0/lun0: p1 p2 p3
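
As a quick sanity check on the sizes reported in the dmesg output above, the sector count can be converted to megabytes by hand. This is only a sketch: it assumes 512-byte sectors and decimal megabytes (10**6 bytes), and `capacity_mb` is a hypothetical helper, not the formula the kernel itself uses.

```python
SECTOR_BYTES = 512  # assumption: standard sector size for these ATA disks


def capacity_mb(sectors, sector_bytes=SECTOR_BYTES):
    """Capacity in decimal megabytes (10**6 bytes), rounded down."""
    return sectors * sector_bytes // 10**6


if __name__ == "__main__":
    # Sector count reported for both hde and hdg.
    print(capacity_mb(234441648))  # 120034
```

The result agrees with the "120034 MB" figure in the log, so both drives are at least being identified with the expected capacity.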


lspci:
02:06.0 RAID bus controller: Triones Technologies, Inc. HPT372A (rev 01)
Subsystem: Triones Technologies, Inc.: Unknown device 0001
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 120 (2000ns min, 2000ns max)
Interrupt: pin A routed to IRQ 18
Region 0: I/O ports at c800 [size=8]
Region 1: I/O ports at cc00 [size=4]
Region 2: I/O ports at d000 [size=8]
Region 3: I/O ports at d400 [size=4]
Region 4: I/O ports at d800 [size=256]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [60] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

Both drives are new Seagate Barracudas. They both have DMA enabled. I use regular kernel RAID0, not the software RAID drivers.

Probably related: the SMART data from these drives shows Hardware_ECC_Recovered at over 31 million and 120 million respectively, with Power_On_Hours under 250.

Any ideas?

Thanks,
Gavin Baker
(PS, please CC: me)


2003-11-10 15:10:24

by Tomi Orava

Subject: Re: (HPT372A) DMA/Interrupt problems, again


> I have a Highpoint "RocketRaid 133" dual-channel PCI IDE "raid"
> controller that uses an HPT372A.

> hdg: status timeout: status=0xd0 {Busy}
>
> hdg: DMA disabled
> hdg: drive not ready for command
> ide3: reset: master: error (0x00?)
> hdg: status timeout: status=0xd0 {Busy}
>
> hdg: drive not ready for command
> ide3: reset: master: error (0x00?)
> end-request: I/O error, dev hdg, sector xxxxxx
> EXT3-fs error (device md0): ext3_get_inode_loc: unable to read inode
> block - inode = xxxxxx, block = xxxxxx

There was some discussion a couple of weeks ago about a
similar problem with the HPT374 controller. However, we did not
find a solution, even though three or four people had seen
the problem with different hardware configurations.

I'm starting to wonder whether the problem is really in the
HPT366 driver at all, or somewhere lower in the IDE/interrupt code,
as I got the following errors just an hour ago with a Sil680 controller.

The problem occurs only under _heavy_ I/O, i.e. whenever
I'm updating a PostgreSQL database with lkml web-archive data, for example.
Under normal or light use the system works just fine.

Do other people see this error a lot?

Regards,
Tomi Orava

PS. Has anyone with enough knowledge of Linux memory handling
checked whether Mark Bellon's slab patch (msg subject:
"PATCH (2.4.x) - Interrupts disabled for a long time")
might somehow affect these IDE problems? I haven't seen
any comments about the patch yet ...

----------------------------------------------------------------------------------
hde: dma_timer_expiry: dma status == 0x21
hde: error waiting for DMA
hde: dma timeout retry: status=0x58 { DriveReady SeekComplete DataRequest }

blk: queue c0466908, I/O limit 4095Mb (mask 0xffffffff)
hde: dma_timer_expiry: dma status == 0x21
hde: error waiting for DMA
hde: dma timeout retry: status=0xd0 { Busy }

hde: DMA disabled
ide2: reset timed-out, status=0xd0
hde: status timeout: status=0xd0 { Busy }

hde: drive not ready for command
ide2: reset timed-out, status=0xd0
end_request: I/O error, dev 21:02 (hde), sector 58728019
raid1: Disk failure on hde2, disabling device.
	Operation continuing on 1 devices
-------------------------------------------------------------------------------------
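
The "dma status == 0x21" value in the log above comes from the controller's bus-master IDE status register rather than from the drive. The following Python sketch decodes it, assuming the standard bus-master IDE status bit layout; `decode_bmide_status` and the flag names are hypothetical and chosen for illustration only.

```python
# Standard bus-master IDE status register bits (names are illustrative).
BMIDE_STATUS_BITS = [
    (0x01, "active"),              # DMA engine still running a transfer
    (0x02, "error"),               # DMA error latched
    (0x04, "interrupt"),           # device interrupt seen
    (0x20, "drive0_dma_capable"),
    (0x40, "drive1_dma_capable"),
    (0x80, "simplex_only"),
]


def decode_bmide_status(value):
    """Return the names of the flags set in a bus-master IDE status byte.

    Hypothetical helper for illustration only.
    """
    return [name for mask, name in BMIDE_STATUS_BITS if value & mask]


if __name__ == "__main__":
    # 0x21 is the value reported by dma_timer_expiry for hde.
    print(hex(0x21), decode_bmide_status(0x21))
```

For 0x21 only the "active" and "drive 0 DMA capable" bits are set: the engine still considered the transfer in progress, and neither the error nor the interrupt bit had been raised when the timer expired. That is consistent with a transfer that never completed, but on its own it does not say whether the drive, the controller, or the interrupt path is at fault.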

The hardware in this case was:

Epox 8K9A3+/1.4GHz AMD/TB
CMD/Sil 680 ide-controller:
2xMAXTOR 6L060J3 (D740X)

----------------------------------------------------------------
00:09.0 RAID bus controller: CMD Technology Inc PCI0680 (rev 01)
Subsystem: CMD Technology Inc PCI0680
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort+ <TAbort- <MAbort- >SERR- <PERR-
Latency: 64, cache line size 01
Interrupt: pin A routed to IRQ 17
Region 0: I/O ports at 9000 [size=8]
Region 1: I/O ports at 9400 [size=4]
Region 2: I/O ports at 9800 [size=8]
Region 3: I/O ports at 9c00 [size=4]
Region 4: I/O ports at a000 [size=16]
Region 5: Memory at df000000 (32-bit, non-prefetchable) [size=256]
Expansion ROM at <unassigned> [disabled] [size=512K]
Capabilities: [60] Power Management version 2
Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=2 PME-
00: 95 10 80 06 07 00 90 0a 01 00 04 01 01 40 00 00
10: 01 90 00 00 01 94 00 00 01 98 00 00 01 9c 00 00
20: 01 a0 00 00 00 00 00 df 00 00 00 00 95 10 80 06
30: 00 00 00 00 60 00 00 00 00 00 00 00 11 01 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 01 00 22 06 00 40 00 64 00 00 00 00 00 00 00 00
70: 00 00 20 00 00 50 e8 37 00 00 20 00 00 40 e8 37
80: 03 00 00 00 03 00 00 00 00 00 11 00 00 00 00 00
90: ec ff 01 09 ff ff ff 44 00 00 00 18 00 00 00 00
a0: 01 60 8a 32 8a 32 dd 62 c1 10 92 43 01 40 09 40
b0: 01 60 8a 32 8a 32 dd 62 c1 10 92 43 01 40 09 40
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
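
The first four rows of the hex dump above are the standard 64-byte PCI configuration header, and they can be cross-checked against the decoded lspci text. The Python sketch below does that under the assumption of the standard type 0 header layout (little-endian fields, BARs at offset 0x10, interrupt line and pin at 0x3c and 0x3d). The function names `parse_pci_header` and `describe_bar` are hypothetical helpers, not part of any tool.

```python
import struct

# First 64 bytes of the dump above (offsets 0x00-0x3f).
HEADER_HEX = """
95 10 80 06 07 00 90 0a 01 00 04 01 01 40 00 00
01 90 00 00 01 94 00 00 01 98 00 00 01 9c 00 00
01 a0 00 00 00 00 00 df 00 00 00 00 95 10 80 06
00 00 00 00 60 00 00 00 00 00 00 00 11 01 00 00
"""


def parse_pci_header(hex_text):
    """Decode selected fields of a type 0 PCI configuration header."""
    cfg = bytes.fromhex("".join(hex_text.split()))
    vendor, device, command, status = struct.unpack_from("<HHHH", cfg, 0x00)
    return {
        "vendor": vendor,
        "device": device,
        "command": command,
        "status": status,
        "revision": cfg[0x08],
        "class_code": (cfg[0x0B] << 8) | cfg[0x0A],  # base class, subclass
        "latency": cfg[0x0D],
        "bars": struct.unpack_from("<6I", cfg, 0x10),
        "irq_line": cfg[0x3C],
        "irq_pin": cfg[0x3D],
    }


def describe_bar(bar):
    """Render one base address register roughly the way lspci does."""
    if bar == 0:
        return "unused"
    if bar & 0x1:  # bit 0 set: I/O space
        return "I/O ports at %x" % (bar & 0xFFFFFFFC)
    return "Memory at %08x" % (bar & 0xFFFFFFF0)


if __name__ == "__main__":
    hdr = parse_pci_header(HEADER_HEX)
    print("vendor %04x device %04x rev %02x" % (hdr["vendor"], hdr["device"], hdr["revision"]))
    print("class %04x latency %d" % (hdr["class_code"], hdr["latency"]))
    for index, bar in enumerate(hdr["bars"]):
        print("Region %d: %s" % (index, describe_bar(bar)))
    print("IRQ %d, pin %s" % (hdr["irq_line"], " ABCD"[hdr["irq_pin"]]))
```

For this dump the sketch reports vendor 1095, device 0680, latency 64 and IRQ 17 on pin A, which agrees with the lspci text above, so the dump and the decoded listing describe the same device state.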