2005-12-28 06:24:37

by Tomas Carnecky

[permalink] [raw]
Subject: Serial ATA Lockups

My setup: Shuttle XPC Barebone, AMD CPU, two serial ATA disks in a
software raid setup.
When the system is under heavy load (start World of Warcraft, dd
if=/dev/zero of=/part/file etc) I get these messages in dmesg:

ata1: translated ATA stat/err 0x51/84 to SCSI SK/ASC/ASCQ 0xb/47/00
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: error=0x84 { DriveStatusError BadCRC }

over and over, pages with these messages.
The system will eventually lockup hard, HDD led is on, no disk activity,
I have to reboot the system. Some kernels ago (2.6.14.2) I got a kernel
backtrace on the console, I don't remember exactly anymore but there was
something with scsi_resume(). I don't get this backtrace with this
kernel: 2.6.15-rc6-gd5ea4e26, now it just locks up hard.

Sometimes I can't even boot (like just before), it locked up before init
could be started. And I've seen this on my console (transcript):

command 0x35 timeout, stat 0xd0 host_stat 0x21
translated ATA stat/err 0x51/84 to SCSI SK/ASC/ASCQ 0xb/47/00
status=0xd0 { busy }
SCSI error: return code = 0x8000002
sda: Current: sense key = 0xB
end_request: I/O error, dev sda, sector [sector #]
ATA abnormal status 0xD0 on port 0x9f7

Its very hard recover from a hard lockup because at the next reboot, the
kernel wants to RESYNC the raid arrays and this causes heavy load which
again causes a hard lockup. And endless loop. Sometimes, I can boot and
then I change to 'init 2' to stop as many services as I can and unmount
as many partitions as I can but even then it sometimes locks up again.

The Barebone SATA chip supports SATA-II, but the harddrives are SATA-I.
The two disks are Seagate 120GB. I've had problems with the harddrives
before, I've had them on a ICP Vortex SATA hardware-raid controller and
sometimes, one disk would fail and I'd have to rebuild the array. It was
always the same disk, and I don't think it one that I have in my new
computer now.

Can this be fixed in the kernel? Or do I have to buy new harddisks?

tom


2005-12-28 07:33:50

by Robert Hancock

[permalink] [raw]
Subject: Re: Serial ATA Lockups

Tomas Carnecky wrote:
> My setup: Shuttle XPC Barebone, AMD CPU, two serial ATA disks in a
> software raid setup.
> When the system is under heavy load (start World of Warcraft, dd
> if=/dev/zero of=/part/file etc) I get these messages in dmesg:
>
> ata1: translated ATA stat/err 0x51/84 to SCSI SK/ASC/ASCQ 0xb/47/00
> ata1: status=0x51 { DriveReady SeekComplete Error }
> ata1: error=0x84 { DriveStatusError BadCRC }
>
> over and over, pages with these messages.

BadCRC indicates CRC errors during data transfers on the SATA interface.
This is almost certainly hardware problems - most likely a bad SATA
cable, or maybe a bad drive or motherboard controller.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/