Subject: PDC20265 ide_dma_timeout and RAID5 issues (2.4.17)

I've been trying to get Linux software RAID to work with two QUANTUM
FIREBALLP AS40.0 drives. Each drive is the master on its own channel of a
Promise PDC20265 controller (ASUS A7V [not A7V133!] board, newest BIOS).
Both are connected with 80-wire IDE cables, and there are no slave devices.

Reading and writing with dd is fine, and so is RAID1. However, if (and only
if) I try to read from a RAID5 device built on the two drives (a 3-disk
array running in degraded mode), the system loses sync with the PDC20265
controller and starts spewing DMA timeout and lost interrupt errors.
Recovering the system requires a SysRq-assisted sync and reboot.

Here is the error log:
# dd if=/dev/md3 of=/dev/null
raid5: switching cache buffer size, 4096 --> 1024
hdg: timeout waiting for DMA
ide_dmaproc: chipset supported ide_dma_timeout func only: 14
hdg: status error: status=0x00 { }
hdg: drive not ready for command
hdg: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hdg: read_intr: error=0x04 { DriveStatusError }
hdg: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hdg: read_intr: error=0x04 { DriveStatusError }
hdg: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hdg: read_intr: error=0x04 { DriveStatusError }
ide3: reset: success
hdg: timeout waiting for DMA
ide_dmaproc: chipset supported ide_dma_timeout func only: 14
hdg: timeout waiting for DMA
ide_dmaproc: chipset supported ide_dma_timeout func only: 14
hde: timeout waiting for DMA
ide_dmaproc: chipset supported ide_dma_timeout func only: 14
hde: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hde: read_intr: error=0x04 { DriveStatusError }
hde: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hde: read_intr: error=0x04 { DriveStatusError }
hde: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hde: read_intr: error=0x04 { DriveStatusError }
hde: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hde: read_intr: error=0x04 { DriveStatusError }
ide2: reset: success
hdg: lost interrupt
hdg: lost interrupt
hdg: lost interrupt
hdg: lost interrupt
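
For readers of the archive, the status= and error= values in the log above can
be decoded by hand. The following is a minimal sketch, not kernel code: the
function names are hypothetical, and the names for bits that do not appear in
the log are assumptions based on the classic ATA register layout.

```shell
#!/bin/sh
# Sketch: decode IDE status/error register bytes like those in the log.
# Names for bits not shown in the log are assumptions (classic ATA layout).

# decode_bits VALUE MASK:NAME...  prints the names of all bits set in VALUE.
# The body runs in a subshell, so its variables do not leak to the caller.
decode_bits() (
    value=$(( $1 ))
    shift
    out=""
    for pair in "$@"; do
        mask=${pair%%:*}
        name=${pair#*:}
        if [ $(( value & $mask )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
)

# Status register, as in "status=0x59 { ... }"
decode_status() {
    decode_bits "$1" \
        0x80:Busy 0x40:DriveReady 0x20:DeviceFault 0x10:SeekComplete \
        0x08:DataRequest 0x04:CorrectedError 0x02:Index 0x01:Error
}

# Error register, as in "error=0x04 { ... }"
decode_error() {
    decode_bits "$1" \
        0x80:BadSector 0x40:UncorrectableError 0x10:SectorIdNotFound \
        0x04:DriveStatusError 0x02:TrackZeroNotFound 0x01:AddrMarkNotFound
}

decode_status 0x59   # prints: DriveReady SeekComplete DataRequest Error
decode_error 0x04    # prints: DriveStatusError
```

Under this decoding, 0x59 means the drive is ready with data requested but
flags an error, and 0x04 is the abort bit, i.e. the drive rejected the
command. status=0x00 decodes to nothing, matching the empty braces in the log.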

The same problem also happens with 2.2.20 + IDE patches + new RAID patches.
Interestingly, RAID1 does not trigger the bug, and neither does any
combination of parallel reads using dd. Only RAID5 seems able to trigger
it.
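
For anyone wanting to repeat the parallel-read check on similar hardware, here
is a rough sketch of such a test. The helper name parallel_read and the 64k
block size are assumptions for illustration, not the exact commands used above.

```shell
#!/bin/sh
# Hypothetical helper: read every given device (or file) concurrently with dd
# and return non-zero if any of the reads fails.
parallel_read() {
    pids=""
    for dev in "$@"; do
        # One background dd per device; the data is discarded.
        dd if="$dev" of=/dev/null bs=64k 2>/dev/null &
        pids="$pids $!"
    done
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}

# Example (device names as they appear in the log; needs read access):
# parallel_read /dev/hde /dev/hdg
```

Reading both drives at once this way keeps both channels of the controller
busy without involving the RAID layer.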

Kernel is 2.4.17 with the improved K7+VIA "Athlon bug stomper" patch plus
Debian patches (the bug also shows up without the K7 patch). Also attached
are the lspci -v output for this machine and the bootup log.

Any ideas on how to fix this one? I will gladly help to debug and test
patches for this issue...

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh


Attachments:
(No filename) (2.89 kB)
foo (12.77 kB)
Subject: Re: PDC20265 ide_dma_timeout and RAID5 issues (2.4.17)

Duh, forgot to attach lspci -v output. Here it is...

00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 02)
	Subsystem: Asustek Computer, Inc.: Unknown device 8033
	Flags: bus master, medium devsel, latency 8
	Memory at e7000000 (32-bit, prefetchable) [size=16M]
	Capabilities: [a0] AGP version 2.0
	Capabilities: [c0] Power Management version 2

00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP] (prog-if 00 [Normal decode])
	Flags: bus master, 66Mhz, medium devsel, latency 0
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	Memory behind bridge: d6000000-d7dfffff
	Prefetchable memory behind bridge: d7f00000-e6ffffff
	Capabilities: [80] Power Management version 2

00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 22)
	Subsystem: Asustek Computer, Inc.: Unknown device 8033
	Flags: bus master, stepping, medium devsel, latency 0

00:04.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 10) (prog-if 8a [Master SecP PriP])
	Flags: bus master, medium devsel, latency 32
	I/O ports at d800 [size=16]
	Capabilities: [c0] Power Management version 2

00:04.2 USB Controller: VIA Technologies, Inc. UHCI USB (rev 10) (prog-if 00 [UHCI])
	Subsystem: Unknown device 0925:1234
	Flags: bus master, medium devsel, latency 32, IRQ 5
	I/O ports at d400 [size=32]
	Capabilities: [80] Power Management version 2

00:04.3 USB Controller: VIA Technologies, Inc. UHCI USB (rev 10) (prog-if 00 [UHCI])
	Subsystem: Unknown device 0925:1234
	Flags: bus master, medium devsel, latency 32, IRQ 5
	I/O ports at d000 [size=32]
	Capabilities: [80] Power Management version 2

00:04.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 30)
	Subsystem: Asustek Computer, Inc.: Unknown device 8033
	Flags: medium devsel, IRQ 9
	Capabilities: [68] Power Management version 2

00:09.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone] (rev 30)
	Subsystem: 3Com Corporation 3C905B Fast Etherlink XL 10/100
	Flags: bus master, medium devsel, latency 32, IRQ 5
	I/O ports at a400 [size=128]
	Memory at d5800000 (32-bit, non-prefetchable) [size=128]
	Expansion ROM at <unassigned> [disabled] [size=128K]
	Capabilities: [dc] Power Management version 1

00:0a.0 Multimedia audio controller: Cirrus Logic CS 4614/22/24 [CrystalClear SoundFusion Audio Accelerator] (rev 01)
	Subsystem: KYE Systems Corporation: Unknown device 7003
	Flags: bus master, medium devsel, latency 32, IRQ 5
	Memory at d5000000 (32-bit, non-prefetchable) [size=4K]
	Memory at d4800000 (32-bit, non-prefetchable) [size=1M]
	Capabilities: [40] Power Management version 2

00:11.0 Unknown mass storage controller: Promise Technology, Inc. 20265 (rev 02)
	Subsystem: Promise Technology, Inc.: Unknown device 4d33
	Flags: bus master, medium devsel, latency 32, IRQ 10
	I/O ports at a000 [size=8]
	I/O ports at 9800 [size=4]
	I/O ports at 9400 [size=8]
	I/O ports at 9000 [size=4]
	I/O ports at 8800 [size=64]
	Memory at d4000000 (32-bit, non-prefetchable) [size=128K]
	Expansion ROM at <unassigned> [disabled] [size=128K]
	Capabilities: [58] Power Management version 1

01:00.0 VGA compatible controller: nVidia Corporation NV11 (GeForce2 MX) (rev b2) (prog-if 00 [VGA])
	Subsystem: Asustek Computer, Inc.: Unknown device 4031
	Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 11
	Memory at d6000000 (32-bit, non-prefetchable) [size=16M]
	Memory at d8000000 (32-bit, prefetchable) [size=128M]
	Expansion ROM at d7ff0000 [disabled] [size=64K]
	Capabilities: [60] Power Management version 2
	Capabilities: [44] AGP version 2.0

--
Henrique Holschuh

Subject: Re: PDC20265 ide_dma_timeout and RAID5 issues (2.4.17)

Well, following a hint by Carlos Carvalho (thanks, man!), I downloaded and
applied the newest version of the IDE patches, and so far the system appears
to be working fine. I am stress-testing it, and will report back if it
breaks.

Apparently there is a showstopper bug affecting RAID on the PDC20265 on an
A7V in the current IDE subsystem of 2.4.17, and it is fixed in the newest
incarnation of the IDE patches. So, for archival purposes: anyone having
issues and hangs with Promise PDC20265 controllers and Linux software
RAID5 on a VIA KT133 board may need to update the IDE subsystem with the
newest patches from linuxdiskcert.org to solve the problem.

--
Henrique Holschuh