2006-02-23 01:09:11

by mirror admin

[permalink] [raw]
Subject: 3ware 9550SX-ML16 controller weird pci bus parity errors

Hello,

I'm trying to run several new setups of a SuperMicro H8DSR-8 dual
opteron motherboard with two CPUs, 16GB of RAM and a 3ware 9550SX-ML16
controller with WDC WD4000YR drives, all parts brand new.

Installing a basic Linux dist goes fine, basic read/write is fine.
Whenever I try to read from any of the disks behind the 3ware controller
with dd (dd if=/dev/sda of=/dev/null) dd will stop at approximately
1.2GB with an IO error (tried the same on multiple disks, this happens
with all of them and the IO error occurs between 1.2 and 1.8GB with 1.2GB
being the most common case), When the IO error shows up the kernel log
gets immediately filled with the following messages:

Feb 22 01:16:02 newbox kernel: 3w-9xxx: scsi0: ERROR: (0x06:0x000C):
PCI Parity Error: clearing.
Feb 22 01:16:02 newbox kernel: 3w-9xxx: scsi0: ERROR: (0x03:0x0212):
PCI bus parity error:.
Feb 22 01:16:13 newbox kernel: 3w-9xxx: scsi0: ERROR: (0x06:0x000C):
PCI Parity Error: clearing.
Feb 22 01:16:13 newbox kernel: 3w-9xxx: scsi0: ERROR: (0x03:0x0212):
PCI bus parity error:.

Sometimes it would lead to a system crash with no other messages
printed.

I have tried kernels 2.6.15-rc5 and 2.6.15.4 so far and both have
failed with the same symptoms. I have tried using the very latest 3ware
driver release, 9.3.0.3 and several of the more recent firmware versions
for this card with the same results.

Any hints are greatly appreciated!




2006-02-24 11:59:14

by Just Marc

[permalink] [raw]
Subject: Re: 3ware 9550SX-ML16 controller weird pci bus parity errors

I've also been experiencing similar PCI parity error prints lately on
fairly recent hardware, shortly after the prints show up the syscam
sometimes just locks up without further prints / panics, running recent
kernels -- 2.6.14+.

Can anyone please shed some light on what can cause such errors, how can
they be debugged / addressed?

Marc