2005-01-05 21:39:01

by Andrew Morton

Subject: Fw: [Bugme-new] [Bug 3993] New: sata_sx4 causes file corruption during simultaneous writes



Begin forwarded message:

Date: Wed, 5 Jan 2005 08:08:32 -0800
From: [email protected]
To: [email protected]
Subject: [Bugme-new] [Bug 3993] New: sata_sx4 causes file corruption during simultaneous writes


http://bugme.osdl.org/show_bug.cgi?id=3993

Summary: sata_sx4 causes file corruption during simultaneous writes
Kernel Version: 2.6.9
Status: NEW
Severity: normal
Owner: [email protected]
Submitter: [email protected]


Distribution: Debian testing (sarge)
Hardware Environment: Dual Pentium III 733 MHz, 512 MB ECC RAM, Promise SX4 S150
Controller
Software Environment: 2.6.9 kernel with SMP support
Problem Description:
Three Seagate 160GB drives connected to the Promise SX4 S150 'Fasttrak'
controller, using the libata sata_sx4 driver. Individual writes to the drives
are fine. When the drives are written to simultaneously, either by multiple cp
threads or assembling them in a raid 5, corruption occurs as evidenced by fsck
errors and inconsistent md5 sums.

No hardware errors are reported. The drives all give clean badblocks tests and
return good benchmarks via bonnie++.

The system used has passed rigorous mprime and memtest testing. The RAM
used in the Promise card has passed Promise's test, is certified as Promise
compatible, and has passed independent memtest testing when installed in a
separate system.

Steps to reproduce:

1. Format the drives and set up a filesystem on each.
2. Start simultaneous instances of cp, copying large files to each drive.
3. Compare md5sums of the copied files OR run fsck.

Alternative:

1. Assemble the drives into a RAID 5 array.
2. Copy a file to the array.
3. Compare the md5 of the copied file with the original OR run fsck.
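
The first procedure above could be scripted roughly as follows. This is only a
sketch and not part of the original report: the function names md5sum and
copy_and_verify are made up for illustration, and Python threads stand in for
the separate cp processes the reporter used. Reads made right after a write
may be served from the page cache, so unmounting and remounting the
filesystems before the comparison is probably needed to check what actually
reached the disks.

```python
import hashlib
import shutil
import threading
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, dest_dirs):
    """Copy `source` into every directory in `dest_dirs` concurrently.

    Returns the list of destination paths whose MD5 differs from the
    source. Hypothetical helper, one thread per destination.
    """
    source = Path(source)
    expected = md5sum(source)
    targets = [Path(d) / source.name for d in dest_dirs]

    # Start all copies before waiting on any, so the writes overlap.
    threads = [
        threading.Thread(target=shutil.copyfile, args=(source, target))
        for target in targets
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    return [str(target) for target in targets if md5sum(target) != expected]
```

It might be called as copy_and_verify("large.file", ["/mnt/disk1",
"/mnt/disk2", "/mnt/disk3"]), where the mount points are placeholders; an
empty list means every copy matched the source.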

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.


2005-01-05 22:24:50

by David Ranson

Subject: Re: Fw: [Bugme-new] [Bug 3993] New: sata_sx4 causes file corruption during simultaneous writes

Andrew Morton wrote:

>controller, using the libata sata_sx4 driver. Individual writes to the drives
>are fine. When the drives are written to simultaneously, either by multiple cp
>threads or assembling them in a raid 5, corruption occurs as evidenced by fsck
>errors and inconsistent md5 sums.
>
>
>
FWIW at <$dayjob> we have had exactly the same issues using Win2k (ugh)
and Promise's own drivers on a Dual Opteron system (Rioworks HDAMA) with
an integrated Fastrak S150TX4 controller. Relatively stable using a
single drive as a separate volume (our application prefers a RAID 0
stripe), but random subtle corruptions when using an array (striped or
mirrored). This is both using the controller's embedded RAID and W2K's
software RAID (with the Promise configured to present separate disks).
Firmware upgrades/downgrades were tried with no luck. We have two
identically configured machines that both exhibit the same problem.

Interestingly, the errors were always single flipped bit(s) at random
offset(s) within the file. Different on each run. Sounds like a RAM
issue but both machines memtest fine and run without issues when using a
single drive.
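
For anyone wanting to check whether their own corruption follows the same
flipped-bit pattern, something like the sketch below would list the differing
offsets. It is not the tool we used; find_bit_flips and is_single_bit are
names invented here, and it only compares the part of the two files that
overlaps in length.

```python
def find_bit_flips(original, copy, chunk_size=1 << 20):
    """Yield (byte_offset, xor_mask) for each byte that differs.

    Only the common prefix of the two files is compared; a length
    mismatch is not reported by this sketch.
    """
    offset = 0
    with open(original, "rb") as a, open(copy, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if not chunk_a or not chunk_b:
                break
            if chunk_a != chunk_b:
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        # XOR leaves a 1 in every bit position that changed.
                        yield offset + i, x ^ y
            offset += min(len(chunk_a), len(chunk_b))


def is_single_bit(mask):
    """True if exactly one bit is set in `mask`."""
    return mask != 0 and mask & (mask - 1) == 0
```

Running list(find_bit_flips("original.bin", "copy.bin")) on a good and a bad
copy (placeholder file names) and passing each mask to is_single_bit would
show whether the damage really is isolated single-bit flips.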

We never found a solution (we simply use single (large) SATA drives
instead) :-(

David