2006-03-16 23:02:19

by Mauro Tassinari

Subject: libata/sata errors on ich[?]/maxtor

The following hardware combination appears to be broken, up to 2.6.16-rc6:
it hangs after printing the usual bunch of repeated messages whenever a
moderate I/O load is started, in this case a cpio from sda (Hitachi) to
sdb (Maxtor).

.... snip ....

ata2: command 0x35 timeout, stat 0xd0 host_stat 0x21
ata2: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata2: status=0xd0 { Busy }
sd 1:0:0:0: SCSI error: return code = 0x8000002
sdb: Current: sense key: Aborted Command
Additional sense: Scsi parity error
end_request: I/O error, dev sdb, sector 308678943
ATA: abnormal status 0xD0 on port 0xEFA7
ATA: abnormal status 0xD0 on port 0xEFA7
ATA: abnormal status 0xD0 on port 0xEFA7

.... snip ....
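For anyone reading the log above, here is a minimal Python sketch that decodes the values libata printed. The bit names follow the ATA/ATAPI-7 Status register layout as I understand it, the lookup tables only cover the values that appear in this thread, and the helper names (decode_status, decode_sense and so on) are hypothetical. 0xd0 decodes to BSY, DRDY and DSC; since BSY means the other bits are not valid, this is consistent with libata printing only { Busy }. The error register is 00, so the "Scsi parity error" sense text appears to be libata's translation of the stuck-busy condition rather than a parity error actually reported by the drive.

```python
# ATA Status register bits (ATA/ATAPI-7 layout, assumed here).
# Bits 4, 2 and 1 are obsolete in newer revisions but still show up.
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy: all other bits are invalid while set
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete (obsolete)
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data (obsolete)
    (0x02, "IDX"),   # index (obsolete)
    (0x01, "ERR"),   # error: details are in the Error register
]

# Only the opcode seen in this thread: 0x35 is WRITE DMA EXT (48-bit LBA).
ATA_COMMANDS = {0x35: "WRITE DMA EXT"}

# Only the SCSI sense values seen in this thread (SPC tables).
SENSE_KEYS = {0x0B: "ABORTED COMMAND"}
ASC_ASCQ = {(0x47, 0x00): "SCSI parity error"}


def decode_status(stat: int) -> list[str]:
    """Return the names of the bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if stat & mask]


def status_bits_valid(stat: int) -> bool:
    """While BSY is set, the remaining status bits must be ignored."""
    return (stat & 0x80) == 0


def decode_sense(sk: int, asc: int, ascq: int) -> str:
    """Render a sense key plus ASC/ASCQ pair as text."""
    key = SENSE_KEYS.get(sk, f"sense key 0x{sk:x}")
    add = ASC_ASCQ.get((asc, ascq), f"ASC/ASCQ 0x{asc:02x}/0x{ascq:02x}")
    return f"{key}: {add}"


if __name__ == "__main__":
    for stat in (0xD0, 0xD1):  # values from the ICH5 and 6300ESB logs
        print(f"0x{stat:02x}: {decode_status(stat)}, "
              f"other bits valid: {status_bits_valid(stat)}")
    print("command 0x35:", ATA_COMMANDS[0x35])
    print(decode_sense(0x0B, 0x47, 0x00))
```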

The behaviour is repeatable and does not depend on the host hardware:
different tests were run on different disks and platforms, while
Hitachi to Hitachi gave no errors.

Some details follow.

Regards

Mauro Tassinari




root@test:~# /usr/src/linux/scripts/ver_linux
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux test 2.6.16-rc6-abi-sata #3 PREEMPT Thu Mar 16 18:59:20 Local time zone must be set--see i686 pentium4 i386 GNU/Linux

Gnu C 3.4.6
Gnu make 3.80
binutils 2.15.92.0.2
util-linux 2.12r
mount 2.12r
module-init-tools 3.1
e2fsprogs 1.38
jfsutils 1.1.8
reiserfsprogs 3.6.19
reiser4progs line
xfsprogs 2.7.11
pcmcia-cs 3.2.8
quota-tools 3.12.
PPP 2.4.4b1
nfs-utils 1.0.7
Linux C Library 2.3.6
Dynamic linker (ldd) 2.3.6
Linux C++ Library 6.0.3
Procps 3.2.6
Net-tools 1.60
Kbd 1.12
Sh-utils 5.94
udev 064
Modules Loaded binfmt_aout binfmt_xout binfmt_coff abi_wyse
abi_solaris abi_sco abi_uw7 abi_ibcs abi_cxenix abi_svr4 lcall7
abi_util snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss
snd_mixer_oss nfsd exportfs lockd sunrpc ipv6 ohci_hcd
pcspkr intel_agp pl2303 usbserial shpchp pci_hotplug uhci_hcd ehci_hcd
usbcore i8xx_tco ahci i2c_i801 i2c_core sky2 pci200syn hdlc
syncppp com20020_pci com20020 arcnet snd_ens1371 gameport snd_rawmidi
snd_seq_device snd_ac97_codec snd_ac97_bus snd_pcm snd_timer
snd soundcore snd_page_alloc pcmcia firmware_class yenta_socket
rsrc_nonstatic pcmcia_core tsdev lp parport_pc parport psmouse

root@test:~# hdparm -I /dev/sdb

/dev/sdb:

ATA device, with non-removable media
Model Number: Maxtor 6L160M0
Serial Number: L39VAZ4G
Firmware Revision: BACE1G20
Standards:
Used: ATA/ATAPI-7 T13 1532D revision 0
Supported: 7 6 5 4
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 320173056
device size with M = 1024*1024: 156334 MBytes
device size with M = 1000*1000: 163928 MBytes (163 GB)
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Advanced power management level: unknown setting (0x0000)
Recommended acoustic management value: 192, current value: 254
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_VERIFY command
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
Advanced Power Management feature set
SET_MAX security extension
* Automatic Acoustic Management feature set
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
Media Card Pass-Through
* General Purpose Logging feature set
* WRITE_{DMA|MULTIPLE}_FUA_EXT
* URG for READ_STREAM[_DMA]_EXT
* URG for WRITE_STREAM[_DMA]_EXT
* SATA-I signaling speed (1.5Gb/s)
* Native Command Queueing (NCQ)
Software settings preservation
Security:
Master password revision code = 65534
supported
not enabled
not locked
not frozen
not expired: security count
not supported: enhanced erase
Checksum: correct
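The hdparm numbers can be cross-checked with a little arithmetic. This is a minimal Python sketch, assuming 512-byte logical sectors (hdparm -I does not print the sector size in this output); the helper names are hypothetical. It reproduces the CHS product and both "device size" lines, and it shows that the failing sector 308678943 from the error log does not fit in 28 bits (the "LBA user addressable sectors" field is exactly 2^28 - 1), so the write that timed out had to use a 48-bit command, which matches the 0x35 opcode in the log.

```python
SECTOR_BYTES = 512  # assumed logical sector size for this drive

# Values copied from the hdparm -I output and the error log above.
CHS_SECTORS = 16383 * 16 * 63   # cylinders * heads * sectors/track
LBA28_SECTORS = 268435455       # "LBA user addressable sectors"
LBA48_SECTORS = 320173056       # "LBA48 user addressable sectors"
FAILING_SECTOR = 308678943      # from "end_request: I/O error, dev sdb"


def size_mb(sectors: int, mega: int) -> int:
    """Capacity in whole MBytes for a given definition of M (truncated)."""
    return sectors * SECTOR_BYTES // mega


def needs_lba48(sector: int) -> bool:
    """True if the LBA does not fit in the 28-bit address field."""
    return sector >= 2**28


if __name__ == "__main__":
    print("CHS sectors:", CHS_SECTORS)
    print("M = 1000*1000:", size_mb(LBA48_SECTORS, 1000 * 1000), "MBytes")
    print("M = 1024*1024:", size_mb(LBA48_SECTORS, 1024 * 1024), "MBytes")
    print("sector", FAILING_SECTOR, "needs LBA48:", needs_lba48(FAILING_SECTOR))
```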


1st platform

root@test:~# lspci -v
00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub
Interface (rev 02)
Subsystem: ASUSTeK Computer Inc. P4P800 Mainboard
Flags: bus master, fast devsel, latency 0
Memory at f8000000 (32-bit, prefetchable) [size=64M]
Capabilities: [e4] #09 [2106]
Capabilities: [a0] AGP version 3.0

00:01.0 PCI bridge: Intel Corporation 82865G/PE/P PCI to AGP Controller (rev
02) (prog-if 00 [Normal decode])
Flags: bus master, 66Mhz, fast devsel, latency 64
Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
Memory behind bridge: fc900000-fe9fffff
Prefetchable memory behind bridge: e7f00000-f7efffff

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2) (prog-if 00
[Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=02, subordinate=02, sec-latency=64
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: fea00000-feafffff

00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface
Bridge (rev 02)
Flags: bus master, medium devsel, latency 0

00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE
Controller (rev 02) (prog-if 8a [Master SecP PriP])
Subsystem: ASUSTeK Computer Inc. P4P800 Mainboard
Flags: bus master, medium devsel, latency 0, IRQ 16
I/O ports at <unassigned>
I/O ports at <unassigned>
I/O ports at <unassigned>
I/O ports at <unassigned>
I/O ports at fc00 [size=16]
Memory at 30000000 (32-bit, non-prefetchable) [size=1K]

00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev
02) (prog-if 8f [Master SecP SecO PriP PriO])
Subsystem: ASUSTeK Computer Inc.: Unknown device 80a6
Flags: bus master, 66Mhz, medium devsel, latency 0, IRQ 16
I/O ports at efe0 [size=8]
I/O ports at efac [size=4]
I/O ports at efa0 [size=8]
I/O ports at efa8 [size=4]
I/O ports at ef90 [size=16]

00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller
(rev 02)
Subsystem: ASUSTeK Computer Inc. P4P800 Mainboard
Flags: medium devsel, IRQ 11
I/O ports at 0400 [size=32]

00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER
(ICH5/ICH5R) AC'97 Audio Controller (rev 02)
Subsystem: ASUSTeK Computer Inc.: Unknown device 812a
Flags: bus master, medium devsel, latency 0, IRQ 20
I/O ports at e800 [size=256]
I/O ports at ef00 [size=64]
Memory at febfb800 (32-bit, non-prefetchable) [size=512]
Memory at febfb400 (32-bit, non-prefetchable) [size=256]
Capabilities: [50] Power Management version 2

01:00.0 VGA compatible controller: nVidia Corporation NV18 [GeForce4 MX 440
AGP 8x] (rev c1) (prog-if 00 [VGA])
Flags: bus master, 66Mhz, medium devsel, latency 64, IRQ 19
Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
Memory at e8000000 (32-bit, prefetchable) [size=128M]
Expansion ROM at fe9e0000 [disabled] [size=128K]
Capabilities: [60] Power Management version 2
Capabilities: [44] AGP version 3.0

02:04.0 RAID bus controller: Promise Technology, Inc. PDC20378 (FastTrak
378/SATA 378) (rev 02)
Subsystem: ASUSTeK Computer Inc. K8V Deluxe/PC-DL Deluxe motherboard
Flags: bus master, 66Mhz, medium devsel, latency 96, IRQ 18
I/O ports at df00 [size=64]
I/O ports at dfa0 [size=16]
I/O ports at dc00 [size=128]
Memory at feaff000 (32-bit, non-prefetchable) [size=4K]
Memory at feac0000 (32-bit, non-prefetchable) [size=128K]
Capabilities: [60] Power Management version 2

02:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL-8139/8139C/8139C+ (rev 10)
Subsystem: Realtek Semiconductor Co., Ltd. RT8139
Flags: bus master, medium devsel, latency 64, IRQ 18
I/O ports at d800 [size=256]
Memory at feafec00 (32-bit, non-prefetchable) [size=256]
Capabilities: [50] Power Management version 2

02:0c.0 Communication controller: PLX Technology, Inc. PCI <-> IOBus Bridge
(rev 01)
Subsystem: PLX Technology, Inc.: Unknown device 1584
Flags: medium devsel, IRQ 17
I/O ports at d480 [size=128]
I/O ports at df80 [size=32]
I/O ports at dfe0 [size=8]



2nd platform

root@test:/var# lspci -v
00:00.0 Host bridge: Intel Corporation 915G/P/GV/GL/PL/910GL Processor to
I/O Controller (rev 04)
Subsystem: Intel Corporation 915G/P/GV/GL/PL/910GL Processor to I/O
Controller
Flags: bus master, fast devsel, latency 0
Capabilities: [e0] #09 [2109]

00:01.0 PCI bridge: Intel Corporation 915G/P/GV/GL/PL/910GL PCI Express Root
Port (rev 04) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=04, subordinate=04, sec-latency=0
I/O behind bridge: 0000e000-0000efff
Memory behind bridge: d7f00000-d7ffffff
Prefetchable memory behind bridge: d8000000-dfffffff
Capabilities: [88] #0d [0000]
Capabilities: [80] Power Management version 2
Capabilities: [90] Message Signalled Interrupts: 64bit- Queue=0/0
Enable-
Capabilities: [a0] #10 [0141]

00:1c.0 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family)
PCI Express Port 1 (rev 03) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
I/O behind bridge: 0000d000-0000dfff
Capabilities: [40] #10 [0141]
Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0
Enable-
Capabilities: [90] #0d [0000]
Capabilities: [a0] Power Management version 2

00:1c.1 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family)
PCI Express Port 2 (rev 03) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
I/O behind bridge: 0000c000-0000cfff
Memory behind bridge: d7e00000-d7efffff
Capabilities: [40] #10 [0141]
Capabilities: [80] Message Signalled Interrupts: 64bit- Queue=0/0
Enable-
Capabilities: [90] #0d [0000]
Capabilities: [a0] Power Management version 2

00:1d.0 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB UHCI #1 (rev 03) (prog-if 00 [UHCI])
Subsystem: ASUSTeK Computer Inc.: Unknown device 80a6
Flags: bus master, medium devsel, latency 0, IRQ 22
I/O ports at 9880 [size=32]

00:1d.1 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB UHCI #2 (rev 03) (prog-if 00 [UHCI])
Subsystem: ASUSTeK Computer Inc.: Unknown device 80a6
Flags: bus master, medium devsel, latency 0, IRQ 19
I/O ports at 9c00 [size=32]

00:1d.2 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB UHCI #3 (rev 03) (prog-if 00 [UHCI])
Subsystem: ASUSTeK Computer Inc.: Unknown device 80a6
Flags: bus master, medium devsel, latency 0, IRQ 18
I/O ports at a000 [size=32]

00:1d.3 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB UHCI #4 (rev 03) (prog-if 00 [UHCI])
Subsystem: ASUSTeK Computer Inc.: Unknown device 80a6
Flags: bus master, medium devsel, latency 0, IRQ 16
I/O ports at a080 [size=32]

00:1d.7 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6
Family) USB2 EHCI Controller (rev 03) (prog-if 20 [EHCI])
Subsystem: ASUSTeK Computer Inc.: Unknown device 80a6
Flags: bus master, medium devsel, latency 0, IRQ 22
Memory at d7dff800 (32-bit, non-prefetchable) [size=1K]
Capabilities: [50] Power Management version 2
Capabilities: [58] #0a [20a0]

00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d3) (prog-if 01
[Subtractive decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=32
I/O behind bridge: 0000b000-0000bfff
Capabilities: [50] #0d [0000]

00:1f.0 ISA bridge: Intel Corporation 82801FB/FR (ICH6/ICH6R) LPC Interface
Bridge (rev 03)
Flags: bus master, medium devsel, latency 0

00:1f.1 IDE interface: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family)
IDE Controller (rev 03) (prog-if 8a [Master SecP PriP])
Subsystem: ASUSTeK Computer Inc.: Unknown device 80a6
Flags: bus master, medium devsel, latency 0, IRQ 18
I/O ports at <unassigned>
I/O ports at <unassigned>
I/O ports at <unassigned>
I/O ports at <unassigned>
I/O ports at ffa0 [size=16]

00:1f.2 IDE interface: Intel Corporation 82801FR/FRW (ICH6R/ICH6RW) SATA
Controller (rev 03) (prog-if 8f [Master SecP SecO PriP PriO])
Subsystem: ASUSTeK Computer Inc.: Unknown device 2601
Flags: bus master, 66Mhz, medium devsel, latency 0, IRQ 19
I/O ports at ac00 [size=8]
I/O ports at a880 [size=4]
I/O ports at a800 [size=8]
I/O ports at a480 [size=4]
I/O ports at a400 [size=16]
Memory at d7dffc00 (32-bit, non-prefetchable) [size=1K]
Capabilities: [70] Power Management version 2

00:1f.3 SMBus: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) SMBus
Controller (rev 03)
Subsystem: ASUSTeK Computer Inc.: Unknown device 80a6
Flags: medium devsel
I/O ports at 0400 [size=32]

01:0a.0 Communication controller: PLX Technology, Inc. PCI <-> IOBus Bridge
(rev 01)
Subsystem: PLX Technology, Inc.: Unknown device 1588
Flags: medium devsel, IRQ 21
I/O ports at b880 [size=128]
I/O ports at b800 [size=64]
I/O ports at b480 [size=8]

01:0b.0 Multimedia audio controller: Ensoniq 5880 AudioPCI (rev 02)
Subsystem: Ensoniq Creative Sound Blaster AudioPCI128
Flags: bus master, slow devsel, latency 64, IRQ 20
I/O ports at bc00 [size=64]
Capabilities: [dc] Power Management version 1

02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 Gigabit
Ethernet Controller (rev 15)
Subsystem: ASUSTeK Computer Inc. Marvell 88E8053 Gigabit Ethernet
Controller (Asus)
Flags: bus master, fast devsel, latency 0, IRQ 17
Memory at d7efc000 (64-bit, non-prefetchable) [size=16K]
I/O ports at c800 [size=256]
Expansion ROM at d7ec0000 [disabled] [size=128K]
Capabilities: [48] Power Management version 2
Capabilities: [50] Vital Product Data
Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/1
Enable-
Capabilities: [e0] #10 [0011]

04:00.0 VGA compatible controller: ATI Technologies Inc RV370 5B60 [Radeon
X300 (PCIE)] (prog-if 00 [VGA])
Subsystem: ASUSTeK Computer Inc. Extreme AX300SE-X
Flags: bus master, fast devsel, latency 0
Memory at d8000000 (32-bit, prefetchable) [size=128M]
I/O ports at e000 [size=256]
Memory at d7fe0000 (32-bit, non-prefetchable) [size=64K]
Expansion ROM at d7fc0000 [disabled] [size=128K]
Capabilities: [50] Power Management version 2
Capabilities: [58] #10 [0001]
Capabilities: [80] Message Signalled Interrupts: 64bit+ Queue=0/0
Enable-

04:00.1 Display controller: ATI Technologies Inc RV370 [Radeon X300SE]
Subsystem: ASUSTeK Computer Inc.: Unknown device 002b
Flags: bus master, fast devsel, latency 0
Memory at d7ff0000 (32-bit, non-prefetchable) [size=64K]
Capabilities: [50] Power Management version 2
Capabilities: [58] #10 [0001]





2006-03-17 04:37:09

by Samuel Masham

Subject: Re: libata/sata errors on ich[?]/maxtor

Hi Mauro, All,

On 17/03/06, Mauro Tassinari <[email protected]> wrote:
> The following hw combination appears to be broken, up to 2.6.16-rc6,
> hanging after giving the usual bunch of repeated messages whenever a
> moderate
> i/o load is started, in this case a sda (hitachi) to sdb (maxtor) cpio.
>
> .... snip ....
>
> ata2: command 0x35 timeout, stat 0xd0 host_stat 0x21
> ata2: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> ata2: status=0xd0 { Busy }
> sd 1:0:0:0: SCSI error: return code = 0x8000002
> sdb: Current: sense key: Aborted Command
> Additional sense: Scsi parity error
> end_request: I/O error, dev sdb, sector 308678943
> ATA: abnormal status 0xD0 on port 0xEFA7
> ATA: abnormal status 0xD0 on port 0xEFA7
> ATA: abnormal status 0xD0 on port 0xEFA7
>
> .... snip ....

Unfortunately, I can reproduce this at will just by running mkfs -t ext3
on the second drive of one of our boxes.

A couple of weeks ago we added a second hard disk to our Dell 750s.

SATA controller (lspci output from a slightly less up-to-date box...):
00:1f.2 IDE interface: Intel Corp. 6300ESB SATA Storage Controller
(rev 02) (prog-if 8a [Master SecP PriP])
Subsystem: Dell: Unknown device 0165
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin A routed to IRQ 17
Region 0: I/O ports at <unassigned>
Region 1: I/O ports at <unassigned>
Region 2: I/O ports at <unassigned>
Region 3: I/O ports at <unassigned>
Region 4: I/O ports at fea0 [size=16]
SATA drives
Ata Maxtor 6Y080M0 SCSI sda 0
Ata Maxtor 6V250F0 SCSI sdb 0

(using the ata_piix driver...)

We have 10 of these boxes running RHEL 3, and even with the latest
kernel version (updated yesterday, 2.4.20-40? i.e. Update 7) the
situation is unchanged.

I wasn't going to post this issue to LKML, as in our case it should
be a Red Hat problem, but if you are seeing the same thing there may be
some value in adding my information here... I will be adding this to
Red Hat's Bugzilla... soon ;)

What seems to happen is that mkfs tries to write to the drive using
the normal calls (from strace):

_llseek(3, 72613699584, [72613699584], SEEK_SET) = 0
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768

but this write then "never" returns...

From the echo t > /proc/sysrq-trigger output:

mkfs.ext3 R F6EA8000 3872 22820 22819 (NOTLB)
Call Trace: [<c01355d5>] schedule_timeout [kernel] 0x65 (0xd5233cd4)
[<c0135560>] process_timeout [kernel] 0x0 (0xd5233cf4)
[<c012569d>] io_schedule_timeout [kernel] 0x2d (0xd5233d0c)
[<c01d33fa>] __get_request_wait [kernel] 0xfa (0xd5233d18)
[<c01d3a5c>] __make_request [kernel] 0x15c (0xd5233d74)
[<c01d40da>] generic_make_request [kernel] 0xea (0xd5233dd0)
[<c01d4179>] submit_bh_rsector [kernel] 0x49 (0xd5233df8)
[<c01667bb>] write_locked_buffers [kernel] 0x3b (0xd5233e14)
[<c0166930>] write_some_buffers [kernel] 0x160 (0xd5233e28)
[<c0167b24>] balance_dirty [kernel] 0x34 (0xd5233ecc)
[<c0168aa4>] __block_commit_write [kernel] 0x84 (0xd5233ed8)
[<c01692dd>] block_commit_write [kernel] 0x2d (0xd5233ef4)
[<c014c6d5>] do_generic_file_write [kernel] 0x235 (0xd5233f0c)
[<c014cc4f>] generic_file_write [kernel] 0x18f (0xd5233f60)
[<c0165267>] sys_write [kernel] 0x97 (0xd5233f94)
[<c01110d9>] syscall_trace_enter [kernel] 0x59 (0xd5233fac)
[<c02af114>] no_timing2 [kernel] 0x7 (0xd5233fc0)
[<c02a002b>] clip_pop [kernel] 0x5b (0xd5233fe0)

The value of R varies: sometimes it is 00000001, sometimes a number
like F6758000.

As you can see from the printks here, this error persists, and for
every access (write?) to the drive you just have to wait for a
timeout.

ata1: command 0x35 timeout, stat 0xd1 host_stat 0x61
ata1: translated ATA stat/err 0xd1/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata1: status=0xd1 { Busy }
SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 8000002
Current sd08:12: sense key Aborted Command
Additional sense indicates Scsi parity error
I/O error: dev 08:12, sector 89391226
ATA: abnormal status 0xD1 on port 0x177
ATA: abnormal status 0xD1 on port 0x177
ATA: abnormal status 0xD1 on port 0x177
ata1: command 0x35 timeout, stat 0xd1 host_stat 0x61
ata1: translated ATA stat/err 0xd1/00 to SCSI SK/ASC/ASCQ 0xb/47/00
ata1: status=0xd1 { Busy }
SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 8000002
Current sd08:12: sense key Aborted Command
Additional sense indicates Scsi parity error
I/O error: dev 08:12, sector 89391228
ATA: abnormal status 0xD1 on port 0x177
ATA: abnormal status 0xD1 on port 0x177
ATA: abnormal status 0xD1 on port 0x177
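The 2.4 messages encode the device and the port in hex. This is a minimal Python sketch of the decoding, assuming the first sd major range (major 8, 16 minors per disk) and the standard 8-byte ATA command block layout; the helper names are hypothetical. With it, dev 08:12 comes out as sdb2, and port 0x177 as the status register of the legacy secondary channel, which fits the 6300ESB lspci output above (prog-if 8a with regions 0-3 unassigned, i.e. apparently compatibility mode). The 0xEFA7 port from the first report decodes the same way, as the status register of the block at efa0, which the ICH5 lspci output lists as "I/O ports at efa0 [size=8]".

```python
# Legacy (compatibility-mode) ATA command block base addresses.
LEGACY_BASES = {0x1F0: "primary", 0x170: "secondary"}
# Register names by offset within an 8-byte command block.
TASKFILE_REGS = ["data", "error/features", "sector count", "LBA low",
                 "LBA mid", "LBA high", "device", "status/command"]

SCSI_DISK_MAJOR = 8    # first sd major (covers sda..sdp only)
MINORS_PER_DISK = 16   # whole disk plus 15 partitions


def decode_kdev(text: str) -> str:
    """Decode a 2.4-style hex 'MM:mm' device into an sd name, e.g. sdb2."""
    major_s, minor_s = text.split(":")
    major, minor = int(major_s, 16), int(minor_s, 16)
    if major != SCSI_DISK_MAJOR:
        raise ValueError(f"not the first sd major: {major}")
    disk, part = divmod(minor, MINORS_PER_DISK)
    name = "sd" + chr(ord("a") + disk)
    return name + (str(part) if part else "")


def decode_port(port: int) -> str:
    """Split a command block port into its base address and register name."""
    base, reg = port & ~0x7, port & 0x7
    where = LEGACY_BASES.get(base, f"I/O block at 0x{base:04x}")
    return f"{where}, {TASKFILE_REGS[reg]} register"


if __name__ == "__main__":
    print("dev 08:12 ->", decode_kdev("08:12"))
    print("port 0x177 ->", decode_port(0x177))
    print("port 0xEFA7 ->", decode_port(0xEFA7))
```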

For us this also seems to block access to the other drive (?),
so we are left with an effectively dead box....

We have seen this effect on 3 of the 10 boxes so far (with different
triggers) but it is 100% repeatable on this one box this way.

It looks to me like either we should be clearing an error status after
we have got it, or the drive should clear it automatically, but it
clearly isn't being cleared. As such, all subsequent accesses just time
out, repeating the already-seen error...

Obviously this last paragraph is random wild speculation on my part :)

Anyone got a handle on this one?

Samuel

PS: we can't upgrade to the latest and greatest (i.e. anything other than
RHEL 3) on these boxes, but if we are showing the same symptoms then
maybe this is useful... and we have a great test case here for any
fixes...

> Memory at d7efc000 (64-bit, non-prefetchable) [size=16K]
> I/O ports at c800 [size=256]
> Expansion ROM at d7ec0000 [disabled] [size=128K]
> Capabilities: [48] Power Management version 2
> Capabilities: [50] Vital Product Data
> Capabilities: [5c] Message Signalled Interrupts: 64bit+ Queue=0/1
> Enable-
> Capabilities: [e0] #10 [0011]
>
> 04:00.0 VGA compatible controller: ATI Technologies Inc RV370 5B60 [Radeon
> X300 (PCIE)] (prog-if 00 [VGA])
> Subsystem: ASUSTeK Computer Inc. Extreme AX300SE-X
> Flags: bus master, fast devsel, latency 0
> Memory at d8000000 (32-bit, prefetchable) [size=128M]
> I/O ports at e000 [size=256]
> Memory at d7fe0000 (32-bit, non-prefetchable) [size=64K]
> Expansion ROM at d7fc0000 [disabled] [size=128K]
> Capabilities: [50] Power Management version 2
> Capabilities: [58] #10 [0001]
> Capabilities: [80] Message Signalled Interrupts: 64bit+ Queue=0/0
> Enable-
>
> 04:00.1 Display controller: ATI Technologies Inc RV370 [Radeon X300SE]
> Subsystem: ASUSTeK Computer Inc.: Unknown device 002b
> Flags: bus master, fast devsel, latency 0
> Memory at d7ff0000 (32-bit, non-prefetchable) [size=64K]
> Capabilities: [50] Power Management version 2
> Capabilities: [58] #10 [0001]
>

2006-03-17 06:01:48

by Samuel Masham

Subject: Re: libata/sata errors on ich[?]/maxtor

On 17/03/06, Samuel Masham <[email protected]> wrote:
> Hi Mauro, All,
>
> On 17/03/06, Mauro Tassinari <[email protected]> wrote:
> > The following hw combination appears to be broken, up to 2.6.16-rc6,
> > hanging after giving the usual bunch of repeated messages whenever a
> > moderate
> > i/o load is started, in this case a sda (hitachi) to sdb (maxtor) cpio.
> >
> > .... snip ....
> >
> > ata2: command 0x35 timeout, stat 0xd0 host_stat 0x21
> > ata2: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> > ata2: status=0xd0 { Busy }
> > sd 1:0:0:0: SCSI error: return code = 0x8000002
> > sdb: Current: sense key: Aborted Command
> > Additional sense: Scsi parity error
> > end_request: I/O error, dev sdb, sector 308678943
> > ATA: abnormal status 0xD0 on port 0xEFA7
> > ATA: abnormal status 0xD0 on port 0xEFA7
> > ATA: abnormal status 0xD0 on port 0xEFA7
> >
> > .... snip ....
>
> Unfortunately, I can reproduce this at will just by running mkfs -t ext3
> on the second drive of one of the boxes.
>
> A couple of weeks ago we added a second hard disk to our Dell 750s.
>
<snip>
> I wasn't going to post the issue to LKML as in our case it should
> be a Red Hat problem, but if you are seeing the same thing there may be
> some value in adding my information here... I will be adding this to
> Red Hat's bugzilla ... soon ;)

see: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=185724

> What seems to happen is that mkfs will try to write to the drive
> using the normal _llseek() and write() calls:
>
> (From strace)
>
> _llseek(3, 72613699584, [72613699584], SEEK_SET) = 0
> write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 32768
>
> but this write then "never" returns...
>
> From the echo t > /proc/sysrq-trigger output:
>
> mkfs.ext3 R F6EA8000 3872 22820 22819 (NOTLB)
> Call Trace: [<c01355d5>] schedule_timeout [kernel] 0x65 (0xd5233cd4)
> [<c0135560>] process_timeout [kernel] 0x0 (0xd5233cf4)
> [<c012569d>] io_schedule_timeout [kernel] 0x2d (0xd5233d0c)
> [<c01d33fa>] __get_request_wait [kernel] 0xfa (0xd5233d18)
> [<c01d3a5c>] __make_request [kernel] 0x15c (0xd5233d74)
> [<c01d40da>] generic_make_request [kernel] 0xea (0xd5233dd0)
> [<c01d4179>] submit_bh_rsector [kernel] 0x49 (0xd5233df8)
> [<c01667bb>] write_locked_buffers [kernel] 0x3b (0xd5233e14)
> [<c0166930>] write_some_buffers [kernel] 0x160 (0xd5233e28)
> [<c0167b24>] balance_dirty [kernel] 0x34 (0xd5233ecc)
> [<c0168aa4>] __block_commit_write [kernel] 0x84 (0xd5233ed8)
> [<c01692dd>] block_commit_write [kernel] 0x2d (0xd5233ef4)
> [<c014c6d5>] do_generic_file_write [kernel] 0x235 (0xd5233f0c)
> [<c014cc4f>] generic_file_write [kernel] 0x18f (0xd5233f60)
> [<c0165267>] sys_write [kernel] 0x97 (0xd5233f94)
> [<c01110d9>] syscall_trace_enter [kernel] 0x59 (0xd5233fac)
> [<c02af114>] no_timing2 [kernel] 0x7 (0xd5233fc0)
> [<c02a002b>] clip_pop [kernel] 0x5b (0xd5233fe0)
>
> The value printed after the R varies: sometimes it is 00000001,
> sometimes a number like F6758000.
>
> As you can see from the printks here, this error continues, and for
> every access (write?) to the drive you just have to wait for a
> timeout.
>
> ata1: command 0x35 timeout, stat 0xd1 host_stat 0x61
> ata1: translated ATA stat/err 0xd1/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> ata1: status=0xd1 { Busy }
> SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 8000002
> Current sd08:12: sense key Aborted Command
> Additional sense indicates Scsi parity error
> I/O error: dev 08:12, sector 89391226
> ATA: abnormal status 0xD1 on port 0x177
> ATA: abnormal status 0xD1 on port 0x177
> ATA: abnormal status 0xD1 on port 0x177
> ata1: command 0x35 timeout, stat 0xd1 host_stat 0x61
> ata1: translated ATA stat/err 0xd1/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> ata1: status=0xd1 { Busy }
> SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 8000002
> Current sd08:12: sense key Aborted Command
> Additional sense indicates Scsi parity error
> I/O error: dev 08:12, sector 89391228
> ATA: abnormal status 0xD1 on port 0x177
> ATA: abnormal status 0xD1 on port 0x177
> ATA: abnormal status 0xD1 on port 0x177
>
> For us this also seems to block access to the other drive (?), so
> we are left with an effectively dead box....
>
> We have seen this effect on 3 of the 10 boxes so far (with different
> triggers) but it is 100% repeatable on this one box this way.
>
> It looks to me like either we should be clearing an error status after
> we have got it, or the drive should clear it automatically, but it
> clearly isn't being cleared. As such, all subsequent accesses just time
> out, repeating the already seen error...
>
> Obviously this last paragraph is random wild speculation on my part :)
>
> Anyone got a handle on this one?
>
> Samuel
>
> ps we can't upgrade to the latest and greatest (i.e. anything other than
> RHEL3) on these boxes, but if we are showing the same symptoms then
> maybe it is useful... and we have a great test case here for any
> fixes...
>

2006-03-17 09:55:11

by [email protected]

Subject: Re: libata/sata errors on ich[?]/maxtor

Hi Samuel, All

> some value in adding my information here... I will be adding this to
> Red Hat's bugzilla ... soon ;)
>
>see: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=185724
>
> What seems to happen is that mkfs will try to write to the drive
> using the normal _llseek() and write() calls:
>

from bugzilla:

>Writing inode tables: 733/1247
>
>....and then no more (actually since the update the block number we stick on
>has moved slightly)

Yes, we experience the same, absolutely reproducible behaviour:
the system hangs (more precisely, processes involving I/O hang)
after a few timeouts and cannot be brought down other than by
hard resetting it.
Our mileage may vary...

We use vanilla kernels on slackware-current distribution (see ver_linux in
previous post).

Tried:
2.6.15
2.6.15.6
2.6.16-rc4
2.6.16-rc6

Mauro


2006-03-17 11:28:37

by Alan

Subject: Re: libata/sata errors on ich[?]/maxtor

On Fri, 2006-03-17 at 13:37 +0900, Samuel Masham wrote:
> As you can see from the printks here, this error continues, and for
> every access (write?) to the drive you just have to wait for a
> timeout.

Eventually the drive will be offlined.

> ata1: command 0x35 timeout, stat 0xd1 host_stat 0x61
> ata1: translated ATA stat/err 0xd1/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> ata1: status=0xd1 { Busy }
> SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 8000002
> Current sd08:12: sense key Aborted Command
> Additional sense indicates Scsi parity error

It thinks there is a communication problem (e.g. a cable problem); at
least, that is how it has mapped the error report. Not something I'd
expect to see in the SATA case on several machines, so it could be some
kind of setup error or timing incompatibility in the driver.

What is attached to that controller (SATA and PATA items)?

2006-03-17 14:35:51

by Samuel Masham

Subject: Re: libata/sata errors on ich[?]/maxtor

Hi Alan,

On 17/03/06, Alan Cox <[email protected]> wrote:
> On Fri, 2006-03-17 at 13:37 +0900, Samuel Masham wrote:
> > As you can see from the printks here, this error continues, and for
> > every access (write?) to the drive you just have to wait for a
> > timeout.
>
> Eventually the drive will be offlined.

Really? I can test that easily enough, if nothing else :)

> > ata1: command 0x35 timeout, stat 0xd1 host_stat 0x61
> > ata1: translated ATA stat/err 0xd1/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> > ata1: status=0xd1 { Busy }
> > SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 8000002
> > Current sd08:12: sense key Aborted Command
> > Additional sense indicates Scsi parity error
>
> It thinks there is a communication problem (e.g. a cable problem); at
> least, that is how it has mapped the error report. Not something I'd
> expect to see in the SATA case on several machines, so it could be some
> kind of setup error or timing incompatibility in the driver.

Well, it's cheap enough to get another cable and test that. But as the
failure is repeatable nearly (but not quite) down to the block for
a given build, I don't have much hope.

> What is attached to that controller (SATA and PATA items)?

It being the weekend here, I don't have access to the box now... but
from the RHN page (linked in the bugzilla entry, if anyone can follow
that) it says:

Ata Maxtor 6Y080M0 SCSI sda 0
Ata Maxtor 6V250F0 SCSI sdb 0

I thought they had an IDE (normal ATA) CD drive on board as well, but I
can't see that on the hardware info page...

I will check on Monday and report back.

Since posting this I have tried one more thing.

I had a look at the data sheet for the SATA controller, and it says
that it supports the SATA 1 (150) mode, while the drive supports SATA 2
(300), I think (names may be confused here).

So I tried moving the jumper on the drive to the 150 mode position but
this made no difference.

Could the drive have some features (NCQ?) enabled by default that the
controller can't handle (and that we don't turn off when we initialise
the drive, as we have no support for them yet)? ...OK, that's reaching...

Any ideas of what I can check or try out would be great!

Thanks

Samuel

ps Mauro, yep I am quite convinced we are seeing the same thing
here... I will let you know how it turns out for us

2006-03-17 15:10:42

by Mauro Tassinari

Subject: Re: libata/sata errors on ich[?]/maxtor

Hi Samuel,

>
> ps Mauro, yep I am quite convinced we are seeing the same
> thing here... I will let you know how it turns out for us
>

yes, definitely.

Just note that on the same systems, Hitachi -> Hitachi
works fine.

As soon as I'm back in the office, next Monday,
I'll try to get a couple of cables no more than three
inches long and see what happens. Not that I expect much, btw...

Mauro

2006-03-18 16:17:47

by Ian Young

Subject: Re: libata/sata errors on ich[?]/maxtor

Hi,

I'm having the same sorts of problems on similar hardware. I've not
found a way to reliably reproduce the error; it seems to just happen
once a week, during periods of relative inactivity. On my system, the two
drives are a Seagate ST3300831AS and a Maxtor 6V300F0.

Mar 12 15:15:24 sarsen kernel: ata1: command 0x35 timeout, stat 0xd0 host_stat 0x61
Mar 12 15:15:24 sarsen kernel: ata1: translated ATA stat/err 0xd0/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Mar 12 15:15:24 sarsen kernel: ata1: status=0xd0 { Busy }
Mar 12 15:15:24 sarsen kernel: sd 0:0:1:0: SCSI error: return code = 0x8000002
Mar 12 15:15:24 sarsen kernel: sdb: Current: sense key: Aborted Command
Mar 12 15:15:24 sarsen kernel: Additional sense: Scsi parity error
Mar 12 15:15:24 sarsen kernel: end_request: I/O error, dev sdb, sector 586099263
[drive is kicked from RAID-1 array]
Mar 12 15:15:24 sarsen kernel: ATA: abnormal status 0xD0 on port 0x1F7

Later, smartd errors out (predictably) as well.

The odd part is that I can reboot the box, the drive comes back online,
and the array re-syncs as if nothing had happened (and for the curious,
checksumming all the files against a clean backup revealed no
corruption). The error doesn't pop up during the re-sync.

Maxtor's own diagnostic tool marks the drive as okay, all the SMART
statistics seem to be fine, and mkfs.ext3 -cc completed without error as
a stand-alone drive.

I have also, as Samuel did, moved the jumper to the SATA150 setting. The
most recent time I received the error, I also switched the block
addressing mode in the BIOS from "auto" to "large", in a sort of voodoo
maneuver suggested by someone who had experienced errors with this line
of drives on an nForce4 board. This brings up the point that the
6VxxxF0 drives have problems on nForce4 chipsets, and Maxtor provides
new firmware for them on request, though I don't know more about it
than that, and haven't called to see if the firmware is available.
(link: http://www.ngohq.com/home.php?page=articles&go=read&arc_id=59)

I'm running Fedora Core 4's kernel: 2.6.15-1.1833_FC4 #1 Wed Mar 1
23:41:37 EST 2006 i686 i686 i386 GNU/Linux
with ata_piix as the libata driver.
Also, my `lshw` results: http://www.societasilluminati.org/hinv.html
(though as of the latest error, when I switched to "large" disk access,
I also left the drive in slot 0, so future listings will show it as
/dev/sda)






2006-03-22 09:59:23

by Samuel Masham

Subject: Re: libata/sata errors on ich[?]/maxtor

Hi Again All, Alan,

On 17/03/06, Samuel Masham <[email protected]> wrote:
> Hi Alan,
>
> On 17/03/06, Alan Cox <[email protected]> wrote:
> > On Fri, 2006-03-17 at 13:37 +0900, Samuel Masham wrote:
> > > As you can see from the printks here, this error continues, and for
> > > every access (write?) to the drive you just have to wait for a
> > > timeout.
> >
> > Eventually the drive will be offlined.
>
> Really? I can test that easily enough if nothing else :)

When is it (or should it be) going to offline the drive? It has been
spitting out these messages (about one set per minute?) for 4 hours now,
with no change except the sector number increasing by 2 each time...

> > > ata1: command 0x35 timeout, stat 0xd1 host_stat 0x61
> > > ata1: translated ATA stat/err 0xd1/00 to SCSI SK/ASC/ASCQ 0xb/47/00
> > > ata1: status=0xd1 { Busy }
> > > SCSI disk error : host 0 channel 0 id 1 lun 0 return code = 8000002
> > > Current sd08:12: sense key Aborted Command
> > > Additional sense indicates Scsi parity error
> >
> > It thinks there is a communication problem (e.g. a cable problem), at
> > least that is how it has mapped the error report. Not something I'd
> > expect to see in the SATA case on several machines, so it could be some
> > kind of setup error or timing incompatibility in the driver.
>
> Well, it's cheap enough to get another cable and test that.

Done. The new short cable showed no difference in behavior.

So that leaves a timing/setup error... Does anyone have any ideas?
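
On the mapping Alan describes: "Scsi parity error" is the standard SCSI description of ASC/ASCQ 0x47/0x00 under sense key 0xB (Aborted Command), and the logs show libata producing that triple for a status with BSY set. It therefore appears to describe how the Busy condition was translated rather than a parity fault actually observed on the link. Likewise, command 0x35 is WRITE DMA EXT in the ATA command set, which answers the "(write?)" question. The Python sketch below annotates such log lines; its regular expressions and function name are hypothetical and written only against the message formats quoted in this thread.

```python
import re

# ATA opcodes relevant to these reports (ATA command set).
ATA_COMMANDS = {0x25: "READ DMA EXT", 0x35: "WRITE DMA EXT"}
# Standard SCSI descriptions for the codes that appear in these reports.
SENSE_KEYS = {0x0B: "ABORTED COMMAND"}
ASC_ASCQ = {(0x47, 0x00): "SCSI PARITY ERROR"}

# Patterns for the libata messages quoted in this thread (assumed formats).
TIMEOUT_RE = re.compile(r"command 0x([0-9a-f]+) timeout, stat 0x([0-9a-f]+)", re.I)
XLATE_RE = re.compile(
    r"stat/err 0x([0-9a-f]+)/([0-9a-f]+) to SCSI SK/ASC/ASCQ "
    r"0x([0-9a-f]+)/([0-9a-f]+)/([0-9a-f]+)",
    re.I,
)


def explain(line):
    """Return a human-readable note for a libata error line, or None."""
    m = TIMEOUT_RE.search(line)
    if m:
        cmd, stat = int(m.group(1), 16), int(m.group(2), 16)
        busy = " (BSY set)" if stat & 0x80 else ""
        return f"{ATA_COMMANDS.get(cmd, hex(cmd))} timed out, status {stat:#04x}{busy}"
    m = XLATE_RE.search(line)
    if m:
        stat, err, sk, asc, ascq = (int(g, 16) for g in m.groups())
        return (
            f"status {stat:#04x}/error {err:#04x} reported as "
            f"{SENSE_KEYS.get(sk, hex(sk))} / "
            f"{ASC_ASCQ.get((asc, ascq), 'unknown ASC/ASCQ')}"
        )
    return None


print(explain("ata1: command 0x35 timeout, stat 0xd1 host_stat 0x61"))
# WRITE DMA EXT timed out, status 0xd1 (BSY set)
```

The translation line from the same report yields "status 0xd1/error 0x00 reported as ABORTED COMMAND / SCSI PARITY ERROR".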

> > What is attached to that controller (SATA and PATA items)

As I said before, there are two hard disks:

> Ata Maxtor 6Y080M0 SCSI sda 0
> Ata Maxtor 6V250F0 SCSI sdb 0

(Remember, the problem is ONLY with the second drive... and according
to others, any drive in the 6Vxxx series shows this same issue?)

...and there is a CD-ROM drive attached via PATA

(I think it's on the same controller... the 6300ESB seems to do just
about everything...)

hdparm -I /dev/hda

/dev/hda:

ATAPI CD-ROM, with removable media
        Model Number:       SAMSUNG CD-ROM SN-124
        Serial Number:
        Firmware Revision:  N103
Standards:
        Likely used CD-ROM ATAPI-1
Configuration:
        DRQ response: 50us.
        Packet size: 12 bytes
Capabilities:
        LBA, IORDY(can be disabled)
        DMA: sdma0 sdma1 sdma2 mdma0 mdma1 mdma2 udma0 udma1 *udma2
             Cycle time: min=120ns recommended=120ns
        PIO: pio0 pio1 pio2 pio3 pio4
             Cycle time: no flow control=120ns  IORDY flow control=120ns

As Ian mentioned, Maxtor have released a new version of the drive
firmware... but the 6V250F0 drive that shows this lockup IS running
the latest firmware, as I discovered after a rather long exchange with
Maxtor...

I have had a bit of a look at the SATA spec and would just like to
confirm that the drive is configured to disable NCQ (the Maxtor
support staff seemed to stress this point). From what I can see, this
is done in the Device Configuration Overlay...

From the spec:

4.8. Device Configuration Overlay
4.8.1. Definition

WORD 8: Serial ATA command / feature sets supported
This word enables configuration of command sets and feature sets.
Bit 0 indicates whether native command queuing shall be supported by
the device. When set to one, the drive shall support native command
queuing. When cleared to zero, drive support for native command queuing
shall be disabled ....

So, does anyone have any ideas how to read this?
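
Reading the bit in the passage quoted above comes down to masking one word of a 256-word (512-byte) data block. A minimal Python sketch, assuming you already have the raw DCO IDENTIFY data as little-endian bytes; obtaining it (e.g. via the DEVICE CONFIGURATION IDENTIFY command) is not shown. As a cross-check it also tests word 76 bit 8 of the ordinary IDENTIFY DEVICE data, which is where a SATA drive advertises NCQ support. The function names are hypothetical.

```python
import struct


def words_from_block(buf):
    """Split a 512-byte IDENTIFY-style block into 256 little-endian 16-bit words."""
    if len(buf) != 512:
        raise ValueError("expected 512 bytes, got %d" % len(buf))
    return list(struct.unpack("<256H", buf))


def dco_allows_ncq(dco_words):
    """DCO word 8, bit 0: 1 = NCQ support enabled, 0 = NCQ support disabled."""
    return bool(dco_words[8] & 0x0001)


def identify_reports_ncq(identify_words):
    """IDENTIFY DEVICE word 76, bit 8: the drive reports NCQ support."""
    return bool(identify_words[76] & (1 << 8))


# Synthetic example data, not read from a real drive:
dco = bytearray(512)
dco[16] = 0x01  # low byte of word 8 (byte offset 2 * 8) with bit 0 set
print(dco_allows_ncq(words_from_block(bytes(dco))))  # True
```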

Or anything else to check / try...

Samuel

2006-03-22 11:27:41

by [email protected]

Subject: Re: libata/sata errors on ich[?]/maxtor

Hi Samuel, All,

> > >
> > > Eventually the drive will be offlined.
> >
> > really? I can test that easily enough if nothing else :)
>
> When is it (should it) going to offline the drive? its been spitting out
> these messages (about set per min?) for 4 hours at the moment with no
> change bar the sector number increasing by 2 each time...
>


Confirmed. I tried cables about 4 inches long, with no success. The
process doing the I/O hangs and cannot be killed, so no proper sync or
shutdown is possible short of a reset.
The 2.6.16 release shows no improvement.
2.6.16-rc6-mm2 does not even recognise the Maxtor at boot...
By the way, a couple of WD drives worked flawlessly.



Mauro

2006-03-25 23:12:55

by Ian Young

Subject: Re: libata/sata errors on ich[?]/maxtor

I think I may have figured out my problem (I hope). A while ago I
upgraded my BIOS, which reset the SATA mode for the controller. It's odd
that the error didn't pop up until I upgraded my drives from dual Maxtor
6Y160M0s...

Originally I had the controller set up in the BIOS as "RAID" instead of
IDE, so the drives didn't show up on any of the IDE channels in the BIOS
Setup screen. After the BIOS upgrade, this was overwritten with "Auto"
mode. When I first started having problems it was on "Auto"... I've
tried "Combined" and "Enhanced", to no avail (though I may set it to
"Enhanced" again and reboot just to see what the kernel messages
say...). I have now set it back to "RAID", and the kernel outputs the
following on boot. I have not had any issues since making this change.


kernel: Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
kernel: ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
kernel: ICH5: IDE controller at PCI slot 0000:00:1f.1
kernel: ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
kernel: PCI: setting IRQ 11 as level-triggered
kernel: ACPI: PCI Interrupt 0000:00:1f.1[A] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11
kernel: ICH5: chipset revision 2
kernel: ICH5: not 100% native mode: will probe irqs later
kernel: ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:pio
kernel: ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:pio
kernel: hdd: SR244W, ATAPI CD/DVD-ROM drive
kernel: ide1 at 0x170-0x177,0x376 on irq 15
kernel: hdd: ATAPI 24X CD-ROM drive, 128kB Cache, UDMA(33)
kernel: Uniform CD-ROM driver Revision: 3.20
kernel: ide-floppy driver 0.99.newide
[...]
kernel: SCSI subsystem initialized
kernel: ACPI: PCI Interrupt 0000:00:1f.2[A] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11
kernel: ata1: SATA max UDMA/133 cmd 0xE200 ctl 0xE302 bmdma 0xE600 irq 11
kernel: ata2: SATA max UDMA/133 cmd 0xE400 ctl 0xE502 bmdma 0xE608 irq 11
kernel: ata1: dev 0 ATA-7, max UDMA/133, 586114704 sectors: LBA48
kernel: ata1: dev 0 configured for UDMA/133
kernel: scsi0 : ata_piix
kernel: ata2: dev 0 ATA-7, max UDMA/133, 586072368 sectors: LBA48
kernel: ata2: dev 0 configured for UDMA/133
kernel: scsi1 : ata_piix
kernel: Vendor: ATA Model: Maxtor 6V300F0 Rev: VA11
kernel: Type: Direct-Access ANSI SCSI revision: 05
kernel: SCSI device sda: 586114704 512-byte hdwr sectors (300091 MB)
kernel: SCSI device sda: drive cache: write back
kernel: SCSI device sda: 586114704 512-byte hdwr sectors (300091 MB)
kernel: SCSI device sda: drive cache: write back
kernel: sda: sda1
kernel: sd 0:0:0:0: Attached scsi disk sda
kernel: Vendor: ATA Model: ST3300831AS Rev: 3.03
kernel: Type: Direct-Access ANSI SCSI revision: 05
kernel: SCSI device sdb: 586072368 512-byte hdwr sectors (300069 MB)
kernel: SCSI device sdb: drive cache: write back
kernel: SCSI device sdb: 586072368 512-byte hdwr sectors (300069 MB)
kernel: SCSI device sdb: drive cache: write back
kernel: sdb: sdb1
kernel: sd 1:0:0:0: Attached scsi disk sdb


Previously, on "Auto", this is what the kernel would output on boot:

kernel: Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
kernel: ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
kernel: ide0: I/O resource 0x1F0-0x1F7 not free.
kernel: ide0: ports already in use, skipping probe
kernel: hdd: SR244W, ATAPI CD/DVD-ROM drive
kernel: ide1 at 0x170-0x177,0x376 on irq 15
kernel: hdd: ATAPI 24X CD-ROM drive, 128kB Cache
kernel: Uniform CD-ROM driver Revision: 3.20
kernel: ide-floppy driver 0.99.newide
[...]
kernel: SCSI subsystem initialized
kernel: ata_piix: combined mode detected
kernel: ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
kernel: PCI: setting IRQ 11 as level-triggered
kernel: ACPI: PCI Interrupt 0000:00:1f.2[A] -> Link [LNKC] -> GSI 11 (level, low) -> IRQ 11
kernel: ata: 0x170 IDE port busy
kernel: ata1: SATA max UDMA/133 cmd 0x1F0 ctl 0x3F6 bmdma 0xF000 irq 14
kernel: input: AT Translated Set 2 keyboard on isa0060/serio0
kernel: ata1: dev 0 ATA, max UDMA/133, 320173056 sectors: lba48
kernel: ata1: dev 1 ATA, max UDMA/133, 320173056 sectors: lba48
kernel: ata1: dev 0 configured for UDMA/133
kernel: ata1: dev 1 configured for UDMA/133
kernel: scsi0 : ata_piix
kernel: Vendor: ATA Model: Maxtor 6Y160M0 Rev: YAR5
kernel: Type: Direct-Access ANSI SCSI revision: 05
kernel: SCSI device sda: 320173056 512-byte hdwr sectors (163929 MB)
kernel: SCSI device sda: drive cache: write back
kernel: SCSI device sda: 320173056 512-byte hdwr sectors (163929 MB)
kernel: SCSI device sda: drive cache: write back
kernel: sda: sda1
kernel: Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
kernel: Vendor: ATA Model: Maxtor 6Y160M0 Rev: YAR5
kernel: Type: Direct-Access ANSI SCSI revision: 05
kernel: SCSI device sdb: 320173056 512-byte hdwr sectors (163929 MB)
kernel: SCSI device sdb: drive cache: write back
kernel: SCSI device sdb: 320173056 512-byte hdwr sectors (163929 MB)
kernel: SCSI device sdb: drive cache: write back
kernel: sdb: sdb1
kernel: Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0


lshw also shows the bus and I/O memory organized differently under these
two settings:

http://www.societasilluminati.org/hinv2.html : "RAID" mode set in the bios
http://www.societasilluminati.org/hinv.html: "Enhanced" mode set in the bios

Unfortunately, my boot logs have rolled since I got the "Enhanced" listing.
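
Comparing the two boot logs above: under "Auto", ata_piix reports combined mode and both disks sit on ata1 (dev 0 and dev 1) at the legacy primary IDE address 0x1F0 on IRQ 14, while under "RAID" each disk gets its own port (ata1, ata2) at PCI I/O addresses on IRQ 11. A rough Python sketch for pulling those clues out of a saved log follows. It is keyed only to the message formats quoted here (other kernel versions may word them differently), and the function name is hypothetical.

```python
import re

# Patterns for the ata_piix/libata boot messages quoted in this thread.
PORT_RE = re.compile(r"(ata\d+): SATA max \S+ cmd (0x[0-9A-Fa-f]+) .*\birq (\d+)")
DEV_RE = re.compile(r"(ata\d+): dev (\d+) configured")


def summarize(lines):
    """Return (combined_mode, ports, devices_per_port) from boot-log lines."""
    combined = any("ata_piix: combined mode detected" in line for line in lines)
    ports = {}
    devices = {}
    for line in lines:
        m = PORT_RE.search(line)
        if m:
            # Port name -> (command block I/O address, IRQ number)
            ports[m.group(1)] = (m.group(2), int(m.group(3)))
        m = DEV_RE.search(line)
        if m:
            devices.setdefault(m.group(1), []).append(int(m.group(2)))
    return combined, ports, devices


auto_log = [
    "kernel: ata_piix: combined mode detected",
    "kernel: ata1: SATA max UDMA/133 cmd 0x1F0 ctl 0x3F6 bmdma 0xF000 irq 14",
    "kernel: ata1: dev 0 configured for UDMA/133",
    "kernel: ata1: dev 1 configured for UDMA/133",
]
print(summarize(auto_log))
# (True, {'ata1': ('0x1F0', 14)}, {'ata1': [0, 1]})
```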

