2004-09-14 17:02:47

by Peter Mc Aulay

Subject: pdc202xx_new + software raid0 freezes on array writes

Hi...

[NOTE: Please CC replies to me personally as I'm not a list subscriber.
Thank you.]

I was wondering if anyone could help me at all with a little problem I'm
having.

I have 4 Western Digital 200GB hard disks (WDC WD2000JB) in a software
RAID0 setup, two each connected to a Promise Ultra100TX2 (PDC20268) and
an Ultra133TX2 (PDC20269) (pdc202xx_new driver compiled into the
kernel). The system itself is a K6-2 450MHz CPU on an ATX mainboard of
unknown brand using the ALi M1541/5229 chipset (ALI15X3). It boots from
a separate Seagate (ST38410A) hard disk connected to the on-board
controller.

Whenever I try to write more than a certain amount (which is variable,
about 10 to 60 MB) *to* the RAID array, the whole system locks up
silently, hard. No more network, no Oops, syslog entry or console
message, the magic SysRq key doesn't work, keyboard LEDs are dead. The
only way to get the system back up is via a hard reset or power cycle.

But I can read as much as I like *from* the array just fine. Smaller
writes work fine as well.
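
To narrow down the "certain amount" at which the freeze happens, a write
probe along these lines could help. This is only a minimal sketch, assuming
a modern Python 3 is available on the box; the function name probe_write and
its parameters are hypothetical and not part of the setup described above.
It writes in fixed-size steps and forces each step to disk before reporting
it, so the last line reported before a lock-up gives a lower bound on how
much data reached the array.

```python
import os


def probe_write(path, total_mb, chunk_mb=1, log=print):
    """Write total_mb of data to path in chunk_mb steps.

    Each step is flushed and fsync()ed before it is logged, so the last
    logged value is the amount known to have been handed to the disks.
    Returns the number of megabytes written.
    """
    written = 0
    with open(path, "wb") as f:
        while written < total_mb:
            step = min(chunk_mb, total_mb - written)
            f.write(b"\xa5" * (step * 1024 * 1024))
            f.flush()                 # push Python's buffer to the kernel
            os.fsync(f.fileno())      # ask the kernel to push it to disk
            written += step
            log(f"{written} MB synced")
    return written
```

The progress lines are best watched from somewhere that survives the freeze,
for example a terminal on another machine, since nothing on the frozen system
itself can be inspected afterwards.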

I've reproduced this with kernel versions 2.4.23 and 2.4.27, built with
a variety of compilation options. None of the following made any
difference: fiddling with kernel boot parameters and hdparm settings,
testing the RAM, flashing the Ultra100TX2 BIOS so that both cards are at
version 2.20.0.15, increasing the PCI latency, checking the PnP settings
for IRQ and DMA channels in the (AMI) BIOS, moving the PCI cards around,
changing the cables, and checking the cooling. There is no APIC, so I
can't use the NMI oopser. I've defined ATA_DEBUG and ATA_VERBOSE_DEBUG
in include/linux/libata.h and PDC202XX_DEBUG_DRIVE_INFO and
PDC202XX_DECODE_REGISTER_INFO in drivers/ide/pci/pdc202xx_new.h. Not a
peep. dmesg reports nothing irregular during boot. The current kernel
is "Linux version 2.4.27 (pmcaulay@chernobyl) (gcc version 2.95.4
20011002 (Debian prerelease)) #10 Tue Sep 14 00:06:09 CEST 2004".

If I disable DMA on the drives that make up the RAID array (using
hdparm), I get lots of syslog entries like these every time something is
written, and the system freezes for a little while (1-2 seconds) on
every bus reset:

kernel: hdg: status error: status=0x58 { DriveReady SeekComplete DataRequest }
kernel:
kernel: hdg: drive not ready for command
kernel: hdg: status timeout: status=0xd0 { Busy }
kernel:
kernel: PDC202XX: Secondary channel reset.
kernel: hdg: drive not ready for command
kernel: ide3: reset: success

(Controller, channel and drive vary.)
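
For reference, the status bytes in these messages can be decoded by hand.
The sketch below assumes the standard ATA status register layout; the names
DriveReady, SeekComplete, DataRequest and Busy are taken from the log lines
above, while the labels for the remaining bits and the function name
decode_status are my own assumptions. Like the kernel messages, it reports
only "Busy" when the busy bit is set, since the other bits are not
meaningful while the drive is busy.

```python
# ATA status register bits. Labels for bits that do not appear in the
# log above (DeviceFault, CorrectedError, Index, Error) are assumed.
BUSY = 0x80
STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(status):
    """Return the list of flag names set in an ATA status byte."""
    if status & BUSY:
        # While the drive is busy the other bits are not valid.
        return ["Busy"]
    return [name for mask, name in STATUS_BITS if status & mask]


print(decode_status(0x58))
print(decode_status(0xD0))
```

Decoded this way, 0x58 is 0x40 + 0x10 + 0x08, which matches the three flags
in the first message: the drive reports ready but still has DataRequest set,
i.e. it is still in the middle of a data transfer when the next command
arrives.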

The system does not always lock up permanently, however, and disk writes
sometimes complete correctly. Running without DMA is the only way I
could get it to work at all, but the system isn't exactly usable this
way (the stuttering also affects ssh sessions and the console, and very
large writes eventually lead to bad crashes which occasionally damage
the filesystem).

Other thoughts:
- The same combination of controllers and drives used to work fine with
  a Pentium 166 + Intel Triton mainboard, but I'm no longer sure which
  kernel version I was using at the time.
- There were no problems writing to an 80GB Hitachi DeskStar on any
  channel of either Promise controller, but I didn't test all channels
  at the same time.
- I've swapped out the UltraTX2 for a Highpoint Rocket133 (hpt302, using
  vendor driver 1.2 which pretends to be SCSI, not IDE) and the array
  still crashes, but not quite as often.
- If I connect all drives to the HPT302 everything works fine (the
  pdc20268 is still installed but not connected).
- Disks connected to the on-board IDE controllers have no problems.

lspci -v:

00:00.0 Host bridge: Acer Laboratories Inc. [ALi] M1541 (rev 04)
        Subsystem: Acer Laboratories Inc. [ALi] ALI M1541 Aladdin V/V+ AGP System Controller
        Flags: bus master, slow devsel, latency 128
        Memory at e0000000 (32-bit, non-prefetchable) [size=4M]
        Capabilities: [b0] AGP version 1.0

00:01.0 PCI bridge: Acer Laboratories Inc. [ALi] M5243 (rev 04) (prog-if 00 [Normal decode])
        Flags: bus master, slow devsel, latency 128
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=128
        I/O behind bridge: 0000c000-0000cfff
        Memory behind bridge: cda00000-cfafffff
        Prefetchable memory behind bridge: c9800000-cd8fffff

00:02.0 USB Controller: Acer Laboratories Inc. [ALi] M5237 USB (rev 03) (prog-if 10 [OHCI])
        Flags: bus master, medium devsel, latency 128, IRQ 11
        Memory at dffff000 (32-bit, non-prefetchable) [size=4K]

00:07.0 ISA bridge: Acer Laboratories Inc. [ALi] M1533 PCI to ISA Bridge [Aladdin IV] (rev c3)
        Subsystem: Acer Laboratories Inc. [ALi] ALI M1533 Aladdin IV ISA Bridge
        Flags: bus master, medium devsel, latency 0

00:0e.0 Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (rev 01)
        Flags: bus master, medium devsel, latency 128, IRQ 10
        Memory at cd9ff000 (32-bit, prefetchable) [size=4K]
        I/O ports at df00 [size=32]
        Memory at dfe00000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at dfd00000 [disabled] [size=1M]

00:0f.0 IDE interface: Acer Laboratories Inc. [ALi] M5229 IDE (rev c1) (prog-if fa)
        Flags: bus master, medium devsel, latency 32, IRQ 14
        I/O ports at ffa0 [size=16]

00:10.0 Unknown mass storage controller: Promise Technology, Inc. 20268 (rev 02) (prog-if 85)
        Subsystem: Promise Technology, Inc. 20268
        Flags: bus master, 66Mhz, slow devsel, latency 128, IRQ 9
        I/O ports at dfa0 [size=8]
        I/O ports at dff0 [size=4]
        I/O ports at df90 [size=8]
        I/O ports at dfe0 [size=4]
        I/O ports at df40 [size=16]
        Memory at dfff8000 (32-bit, non-prefetchable) [size=16K]
        Expansion ROM at dffe0000 [disabled] [size=16K]
        Capabilities: [60] Power Management version 1

00:12.0 Unknown mass storage controller: Promise Technology, Inc.: Unknown device 4d69 (rev 02) (prog-if 85)
        Subsystem: Promise Technology, Inc.: Unknown device 4d68
        Flags: bus master, 66Mhz, slow devsel, latency 128, IRQ 12
        I/O ports at df80 [size=8]
        I/O ports at df68 [size=4]
        I/O ports at ded0 [size=8]
        I/O ports at df60 [size=4]
        I/O ports at dea0 [size=16]
        Memory at dfff4000 (32-bit, non-prefetchable) [size=16K]
        Expansion ROM at dffd0000 [disabled] [size=16K]
        Capabilities: [60] Power Management version 1

01:00.0 VGA compatible controller: nVidia Corporation Riva TnT2 [NV5] (rev 11) (prog-if 00 [VGA])
        Subsystem: LeadTek Research Inc.: Unknown device 2135
        Flags: bus master, 66Mhz, medium devsel, latency 128
        Memory at ce000000 (32-bit, non-prefetchable) [size=16M]
        Memory at ca000000 (32-bit, prefetchable) [size=32M]
        Expansion ROM at cf8f0000 [disabled] [size=64K]
        Capabilities: [60] Power Management version 1
        Capabilities: [44] AGP version 2.0
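
Since IRQ assignment and PCI latency are among the things checked above, a
small helper can boil a listing like this down to one line per device. This
is a rough sketch that assumes only the "lspci -v" text layout shown here
(a header line starting with the slot address, followed by a "Flags:" line);
the function name summarize_lspci and the sample text are illustrative, not
part of any tool.

```python
import re

# Header lines start with a bus:device.function address, e.g. "00:10.0".
HEADER_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) (.+)$")


def summarize_lspci(text):
    """Return one dict per device with its slot, name, IRQ and latency."""
    devices = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        header = HEADER_RE.match(stripped)
        if header:
            current = {"slot": header.group(1), "name": header.group(2),
                       "irq": None, "latency": None}
            devices.append(current)
        elif current is not None and stripped.startswith("Flags:"):
            irq = re.search(r"IRQ (\d+)", stripped)
            latency = re.search(r"latency (\d+)", stripped)
            if irq:
                current["irq"] = int(irq.group(1))
            if latency:
                current["latency"] = int(latency.group(1))
    return devices


sample = """\
00:07.0 ISA bridge: Acer Laboratories Inc. [ALi] M1533 PCI to ISA Bridge [Aladdin IV] (rev c3)
        Flags: bus master, medium devsel, latency 0

00:10.0 Unknown mass storage controller: Promise Technology, Inc. 20268 (rev 02) (prog-if 85)
        Flags: bus master, 66Mhz, slow devsel, latency 128, IRQ 9
"""

for dev in summarize_lspci(sample):
    print(dev["slot"], "irq", dev["irq"], "latency", dev["latency"])
```

Run over the full listing, this makes it easy to confirm that the two Promise
cards sit on IRQ 9 and IRQ 12 and do not share an interrupt with any other
device shown.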

--
Peter Mc Aulay <[email protected]>


2004-09-15 06:18:34

by Soeren Sonnenburg

Subject: Re: pdc202xx_new + software raid0 freezes on array writes

On Tue, 14 Sep 2004 16:05:30 +0000, Peter Mc Aulay wrote:

> Hi...
>
> [NOTE: Please CC replies to me personally as I'm not a list subscriber.
> Thank you.]
>
> I was wondering if anyone could help me at all with a little problem I'm
> having.
>
> I have 4 Western Digital 200GB hard disks (WDC WD2000JB) in a software
> RAID0 setup, two each connected to a Promise Ultra100TX2 (PDC20268) and
> an Ultra133TX2 (PDC20269) (pdc202xx_new driver compiled into the
> kernel). The system itself is a K6-2 450MHz CPU on an ATX mainboard of
> unknown brand using the ALi M1541/5229 chipset (ALI15X3). It boots from
> a separate Seagate (ST38410A) hard disk connected to the on-board
> controller.

I had the very same problems with the pdc20268 and also reported them
(that was 1-2 years ago)... I have since thrown them away and replaced
them with some hpt370 controllers, and the problems went away...

However, I found that merely accessing a device on the secondary
controller was already enough to make it freeze...

Soeren

2004-09-15 16:38:54

by Martin Josefsson

Subject: Re: pdc202xx_new + software raid0 freezes on array writes

On Wed, 2004-09-15 at 08:18, Soeren Sonnenburg wrote:

> I had the very same problems with the pdc20268 and also reported them
> (that was 1-2 years ago)... I have since thrown them away and replaced
> them with some hpt370 controllers, and the problems went away...

Same here, also reported to lkml. Also replaced with hpt370, and we
haven't had a single problem that we could pin on the hpt370 controllers
since.

--
/Martin

