2006-01-27 05:07:57

by Kalin KOZHUHAROV

Subject: libata errors in 2.6.15.1 ICH6 AHCI (SATA drive WD740GD)

Hi there.

I am reiterating this, while trying to diagnose the problem.
It is a DIY box with an Asus P5GDC-V Deluxe motherboard, Marvell 88E8053
Gigabit Ethernet (for info see [1]), and a WD740GD (10k RPM) hard disk.

The onboard NIC was not found by the in-kernel driver, so I used a patch to
the sk98lin binary driver and later tried sky2, both with intermittent
success. Now I have an r8169 NIC, have disabled the onboard one in the BIOS,
and installed a new vanilla linux-2.6.15.1.

After some time (30 minutes to 3 days) the machine dies: first the disk
fails, then the kernel remounts some partitions read-only, and finally
everything is dead (no response to ping or keyboard).

What I get in the dmesg is this:
...
[ 23.464209] hub 5-0:1.0: USB hub found
[ 23.464221] hub 5-0:1.0: 8 ports detected
[ 25.819331] r8169: eth0: link up
[13091.397797] ata1: handling error/timeout
[13091.397805] ata1: port reset, p_is 0 is 0 pis 0 cmd 4017 tf d0 ss 113 se 0
[13091.397823] ata1: status=0x50 { DriveReady SeekComplete }
[13091.397828] sda: Current: sense key=0x0
[13091.397831] ASC=0x0 ASCQ=0x0
[13091.481534] ata1: port reset, p_is 40000001 is 1 pis 0 cmd 4017 tf 471 ss 113 se 0
[13091.481542] ata1: translated ATA stat/err 0x71/04 to SCSI SK/ASC/ASCQ 0xb/00/00
[13091.481544] ata1: status=0x71 { DriveReady DeviceFault SeekComplete Error }
[13091.481549] ata1: error=0x04 { DriveStatusError }
...

The full dmesg can be found under [1] as 2.6.15.1-K01_P4_server.3.dmesg
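
For reference, the status bytes in the excerpt above can be decoded by hand. The following is only a minimal sketch, not the kernel's own code: the labels for bits 0x40, 0x20, 0x10 and 0x01 are the ones printed in the log, while the spellings for the remaining bits are assumptions based on the standard ATA status register layout. The helper `split_tf` is a hypothetical name; it assumes the `tf` value on the "port reset" lines carries the status in its low byte and the error in the next byte, which agrees with the "stat/err 0x71/04" line for `tf 471`.

```python
# Bit layout of the ATA status register, highest bit first.
# Labels for 0x40, 0x20, 0x10 and 0x01 match the log excerpt;
# the other label spellings are assumptions.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(value):
    """Return the names of the bits set in an ATA status byte."""
    return [name for mask, name in STATUS_BITS if value & mask]


def split_tf(tf):
    """Split a 'tf' value from a port reset line into (error, status) bytes.

    Assumes status is in bits 7:0 and error in bits 15:8.
    """
    return (tf >> 8) & 0xFF, tf & 0xFF


if __name__ == "__main__":
    for status in (0x50, 0x71):
        print(hex(status), decode_status(status))
    error, status = split_tf(0x471)
    print("tf 471 -> error", hex(error), "status", hex(status))
```

Read this way, the first reset line (`tf d0`) shows the drive still busy with no error byte set, and the second (`tf 471`) shows DeviceFault and Error in the status byte with error 0x04, which the kernel labels DriveStatusError.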

I checked the drive (on the same machine) both with smartctl and with the
boot floppy I downloaded from the WD support site (Data Lifeguard Tools).
Neither reported anything bad (yes, I looked at the status after the test).

The filesystem (reiserfs) runs a filesystem check on every boot, but so far
no corruption has occurred as far as I can see.

As always, the usual questions are:

What is the cause of this? Bug?

What can I do to better diagnose it?

Is any additional info helpful (see [1])?

Dmesg and other hardware info can be found here:
[1]: http://linux.tar.bz/reports/oopses/char/

Kalin.
--
|[ ~~~~~~~~~~~~~~~~~~~~~~ ]|
+-> http://ThinRope.net/ <-+
|[ ______________________ ]|


2006-01-27 06:51:25

by Chase Venters

Subject: Re: libata errors in 2.6.15.1 ICH6 AHCI (SATA drive WD740GD)

On Thursday 26 January 2006 23:07, Kalin KOZHUHAROV wrote:
> Hi there.
>
> I am reiterating this, while trying to diagnose the problem.
> It is a DIY box with Asus P5GDC-V Deluxe motherboard with Marvell 88E8053 GB
> ethernet (for info see [1]) and WD740GD (10k RPM) hard disk.
>

Funny. I've been having problems at least since 2.6.13 (perhaps before; my
memory is broken) with my Asus P5GDC-V Deluxe and 4 WD drives. I've seen DMA
timeouts on my serial console, followed by an immediate kernel freeze in
which Magic SysRQ doesn't even respond.

I have yet to experience the freezing behavior on 2.6.15 (though I think I may
have seen errors in dmesg at one point), but then again, I've been a victim
of a slab leak which means I haven't maintained much of an uptime under
2.6.15.

I'm working on debugging the slab leak at the moment... unfortunately, I don't
know enough about SATA to really debug this issue. Bisecting would take
forever because it usually takes several days before I ever experience a
random freeze.

Nevertheless, if anyone has any pointers, I'd really like to start to wring
some of these bugs out of my kernel.

>
> Kalin.

Cheers,
Chase