2004-06-29 05:40:14

by Travis Morgan

Subject: SATA problems on two different systems

System 1 (endeavour):

> uname -a
Linux endeavour 2.6.6-rc1 #1 SMP Sun Apr 25 22:01:39 MDT 2004 i686
Intel(R) Pentium(R) 4 CPU 2.80GHz GenuineIntel GNU/Linux

> cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping : 9
cpu MHz : 2793.876
cache size : 512 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 5521.40

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping : 9
cpu MHz : 2793.876
cache size : 512 KB
physical id : 0
siblings : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
bogomips : 5570.56

> lspci
0000:00:00.0 Host bridge: Intel Corp. 82865G/PE/P DRAM
Controller/Host-Hub Interface (rev 02)
0000:00:01.0 PCI bridge: Intel Corp. 82865G/PE/P PCI to AGP Controller
(rev 02)
0000:00:03.0 PCI bridge: Intel Corp. 82865G/PE/P PCI to CSA Bridge (rev
02)
0000:00:1d.0 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB
UHCI #1 (rev 02)
0000:00:1d.1 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB
UHCI #2 (rev 02)
0000:00:1d.2 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB
UHCI #3 (rev 02)
0000:00:1d.3 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB
UHCI #4 (rev 02)
0000:00:1d.7 USB Controller: Intel Corp. 82801EB/ER (ICH5/ICH5R) USB2
EHCI Controller (rev 02)
0000:00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB/EB/ER Hub interface
to PCI Bridge (rev c2)
0000:00:1f.0 ISA bridge: Intel Corp. 82801EB/ER (ICH5/ICH5R) LPC Bridge
(rev 02)
0000:00:1f.1 IDE interface: Intel Corp. 82801EB/ER (ICH5/ICH5R) Ultra
ATA 100 Storage Controller (rev 02)
0000:00:1f.2 IDE interface: Intel Corp. 82801EB (ICH5) Serial ATA 150
Storage Controller (rev 02)
0000:00:1f.3 SMBus: Intel Corp. 82801EB/ER (ICH5/ICH5R) SMBus Controller
(rev 02)
0000:00:1f.5 Multimedia audio controller: Intel Corp. 82801EB/ER
(ICH5/ICH5R) AC'97 Audio Controller (rev 02)
0000:02:01.0 Ethernet controller: Intel Corp. 82547EI Gigabit Ethernet
Controller (LOM)
0000:03:00.0 VGA compatible controller: ATI Technologies Inc 3D Rage
I/II 215GT [Mach64 GT] (rev 41)
0000:03:02.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro
100] (rev 0c)
0000:03:07.0 FireWire (IEEE 1394): Lucent Microelectronics FW323 (rev
61)

> df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/md1 reiserfs 112G 77G 35G 69% /

> mdadm --query --detail /dev/md0 (this is the /boot partition)
/dev/md0:
Version : 00.90.01
Creation Time : Wed Nov 19 04:11:46 2003
Raid Level : raid1
Array Size : 96256 (94.00 MiB 98.57 MB)
Device Size : 96256 (94.00 MiB 98.57 MB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Mon Jun 28 02:08:30 2004
State : clean, no-errors
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0


Number Major Minor RaidDevice State
0 8 17 0 active sync
/dev/scsi/host1/bus0/target0/lun0/part1
1 8 1 1 active sync
/dev/scsi/host0/bus0/target0/lun0/part1
UUID : 38f606a1:deb53258:1f13987d:866ec669
Events : 0.51

> mdadm --query --detail /dev/md1
/dev/md1:
Version : 00.90.01
Creation Time : Wed Nov 19 04:11:59 2003
Raid Level : raid1
Array Size : 116824064 (111.41 GiB 119.63 GB)
Device Size : 116824064 (111.41 GiB 119.63 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Mon Jun 28 23:16:11 2004
State : clean, no-errors
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0


Number Major Minor RaidDevice State
0 0 0 -1 removed
1 8 3 1 active sync
/dev/scsi/host0/bus0/target0/lun0/part3
2 8 19 -1 faulty
/dev/scsi/host1/bus0/target0/lun0/part3
UUID : 7355f40b:84ce0756:1fe5bc9a:559e7cd0
Events : 0.18698581

> dmesg
scsi1: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 03 30 b7 74 00 00
7d 00
Current sdb: sense = 70 3
ASC=11 ASCQ= 4
Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
0x00 0x11 0x04
end_request: I/O error, dev sdb, sector 53524340
scsi1: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 03 30 b7 75 00 00
7c 00
Current sdb: sense = 70 3
ASC=11 ASCQ= 4
Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
0x00 0x11 0x04
end_request: I/O error, dev sdb, sector 53524341
scsi1: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 03 30 b7 76 00 00
7b 00
Current sdb: sense = 70 3
ASC=11 ASCQ= 4
Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
0x00 0x11 0x04
end_request: I/O error, dev sdb, sector 53524342
scsi1: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 03 30 b7 77 00 00
7a 00
Current sdb: sense = 70 3
ASC=11 ASCQ= 4
Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
0x00 0x11 0x04
end_request: I/O error, dev sdb, sector 53524343
scsi1: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 03 30 b7 78 00 00
79 00
Current sdb: sense = 70 3
ASC=11 ASCQ= 4
Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
0x00 0x11 0x04
end_request: I/O error, dev sdb, sector 53524344
<SNIP>
raid1: Disk failure on sdb3, disabling device.
Operation continuing on 1 devices
raid1: sdb: unrecoverable I/O read error for block 52737152
md: md1: sync done.
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:1, o:0, dev:sdb3
disk 1, wo:0, o:1, dev:sda3
RAID1 conf printout:
--- wd:1 rd:2
disk 1, wo:0, o:1, dev:sda3
md: syncing RAID array md1
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwith (but not more than 200000
KB/sec) for reconstruction.
md: using 128k window, over a total of 116824064 blocks.
md: md1: sync done.
RAID1 conf printout:
--- wd:1 rd:2
disk 1, wo:0, o:1, dev:sda3

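This RAID 1 volume is on two Seagate Barracuda 120 GB SATA drives
attached to the system's ICH5 SATA controller. As the mdadm output above
shows, md1 is now running degraded: sdb3 is marked faulty and only sda3
is still active.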
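
As a cross-check on the dmesg output above, the CDB bytes in each "scsi1: ERROR" line can be decoded by hand. The sketch below assumes the standard READ(10) layout (opcode 0x28, a 4-byte big-endian LBA in bytes 2-5, a 2-byte transfer length in bytes 7-8). The function name decode_read10 is made up for illustration and is not part of any kernel tool.

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Assumes the standard 10-byte layout; raises ValueError otherwise.
    """
    cdb = bytes(int(tok, 16) for tok in cdb_hex.split())
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    return {
        "opcode": cdb[0],
        "lba": int.from_bytes(cdb[2:6], "big"),     # starting sector
        "blocks": int.from_bytes(cdb[7:9], "big"),  # sectors requested
    }


# First two failing commands from the endeavour log, copied verbatim.
first = decode_read10("0x28 00 03 30 b7 74 00 00 7d 00")
second = decode_read10("0x28 00 03 30 b7 75 00 00 7c 00")

print(first["lba"], first["blocks"])    # 53524340 125
print(second["lba"], second["blocks"])  # 53524341 124
```

The decoded LBA matches the sector reported by the following end_request line, and each successive command starts one sector later and asks for one sector fewer. That pattern is consistent with the block layer failing a single sector and reissuing the remainder of the same request.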

#################################################################################

System 2 (castle):

> uname -a
Linux castle 2.6.7-gentoo-r6 #1 Mon Jun 28 10:39:39 MDT 2004 i686 AMD
Athlon(tm) XP 2600+ AuthenticAMD GNU/Linux

> cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 8
model name : AMD Athlon(tm) XP 2600+
stepping : 1
cpu MHz : 2089.340
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips : 4096.00

> lspci
0000:00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600
AGP] Host Bridge
0000:00:01.0 PCI bridge: VIA Technologies, Inc. VT8235 PCI Bridge
0000:00:06.0 VGA compatible controller: nVidia Corporation NV6
[Vanta/Vanta LT] (rev 15)
0000:00:08.0 RAID bus controller: CMD Technology Inc PCI0680 (rev 01)
0000:00:0b.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1
(rev 08)
0000:00:0b.1 Input device controller: Creative Labs SB Live! MIDI/Game
Port (rev 08)
0000:00:0d.0 RAID bus controller: Promise Technology, Inc. PDC20376 (rev
02)
0000:00:0e.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host
Controller (rev 46)
0000:00:0f.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5702
Gigabit Ethernet (rev 02)
0000:00:10.0 USB Controller: VIA Technologies, Inc. VT6202 [USB 2.0
controller] (rev 80)
0000:00:10.1 USB Controller: VIA Technologies, Inc. VT6202 [USB 2.0
controller] (rev 80)
0000:00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82)
0000:00:11.0 ISA bridge: VIA Technologies, Inc. VT8235 ISA Bridge
0000:00:11.1 IDE interface: VIA Technologies, Inc.
VT82C586A/B/VT82C686/A/B/VT823x/A/C/VT8235 PIPC Bus Master IDE (rev 06)
0000:01:00.0 VGA compatible controller: nVidia Corporation NV28
[GeForce4 Ti 4800 SE] (rev a1)

> df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda1 reiserfs 150G 107G 44G 72% /pub/music
/dev/sdb1 reiserfs 150G 140G 11G 94% /pub/video

> dmesg
Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
0x00 0x11 0x04
end_request: I/O error, dev sda, sector 91683284
scsi0: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 05 76 f9 d5 00 00
72 00
Current sda: sense = 70 3
ASC=11 ASCQ= 4
Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
0x00 0x11 0x04
end_request: I/O error, dev sda, sector 91683285
scsi0: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 05 76 f9 d6 00 00
71 00
Current sda: sense = 70 3
ASC=11 ASCQ= 4
Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
0x00 0x11 0x04
end_request: I/O error, dev sda, sector 91683286
scsi0: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 05 76 f9 d7 00 00
70 00
Current sda: sense = 70 3
ASC=11 ASCQ= 4
Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
0x00 0x11 0x04
end_request: I/O error, dev sda, sector 91683287
scsi0: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 05 76 f9 d8 00 00
6f 00
Current sda: sense = 70 3
ASC=11 ASCQ= 4
Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
0x00 0x11 0x04
end_request: I/O error, dev sda, sector 91683288
scsi0: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 05 76 f9 d9 00 00
6e 00
Current sda: sense = 70 3
ASC=11 ASCQ= 4
Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
0x00 0x11 0x04
end_request: I/O error, dev sda, sector 91683289
scsi0: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 05 76 f9 da 00 00
6d 00
Current sda: sense = 70 3
ASC=11 ASCQ= 4
Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
0x00 0x11 0x04
end_request: I/O error, dev sda, sector 91683290
<SNIP>

This system has two Seagate Barracuda 160 GB SATA drives attached to a
PDC20376 SATA controller.
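
The "Raw sense data" bytes are identical on both systems, so it may help to spell out what they encode. The sketch below assumes SCSI fixed-format sense data (response code 0x70, sense key in the low nibble of byte 2, ASC and ASCQ in bytes 12 and 13). The lookup tables are deliberately tiny and only cover values relevant here; decode_fixed_sense is a hypothetical helper name.

```python
# Partial tables: only a few standard values, enough for this log.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
}
ASC_ASCQ = {
    (0x11, 0x00): "UNRECOVERED READ ERROR",
    (0x11, 0x04): "UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED",
}


def decode_fixed_sense(raw_hex: str) -> dict:
    """Decode fixed-format sense data given as space-separated hex bytes."""
    raw = bytes(int(tok, 16) for tok in raw_hex.split())
    if len(raw) < 14 or (raw[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data of at least 14 bytes")
    key = raw[2] & 0x0F
    asc, ascq = raw[12], raw[13]
    return {
        "sense_key": SENSE_KEYS.get(key, f"unknown (0x{key:x})"),
        "asc": asc,
        "ascq": ascq,
        "meaning": ASC_ASCQ.get((asc, ascq), "not in this partial table"),
    }


# Sense bytes copied verbatim from the dmesg output.
sense = decode_fixed_sense(
    "0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00 0x00 0x11 0x04"
)
print(sense["sense_key"])  # MEDIUM ERROR
print(sense["meaning"])    # UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED
```

This agrees with the "sense = 70 3" and "ASC=11 ASCQ= 4" lines the kernel prints. It shows only what the driver reported to the SCSI layer; on its own it does not say whether the drive, the cable, or the driver is at fault.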

I'm not sure whether this is caused by something I've done or by a bug
in the kernel's SATA handling. I find it disturbing that both systems
show the same errors on different hardware (an ICH5 controller on one, a
PDC20376 on the other). What they share is the type of drive: Seagate
Barracuda SATA, differing only in size.

The first system, endeavour, was up for over 30 days. Since it was
rebooted it has crashed twice for unknown reasons, and it shows these
dmesg entries after booting up.

The second system, castle, prints those messages when I try to copy
some data from sda1. The copy hangs for a very long time while the
reported 'bad sector' number keeps advancing, one sector at a time.
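
To see how far the errors have advanced, the end_request lines can be collapsed into contiguous sector ranges per device. This is a minimal sketch that assumes the exact "end_request: I/O error, dev X, sector N" wording shown in the logs above; bad_sector_ranges is a hypothetical helper, and the sample text is copied from the castle output.

```python
import re
from collections import defaultdict

# Matches the kernel message format seen in the dmesg output above.
PATTERN = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def bad_sector_ranges(log_text: str) -> dict:
    """Return {device: [(first_sector, last_sector), ...]} for a log."""
    sectors = defaultdict(set)
    for dev, sector in PATTERN.findall(log_text):
        sectors[dev].add(int(sector))

    ranges = {}
    for dev, found in sectors.items():
        runs = []
        for s in sorted(found):
            if runs and s == runs[-1][1] + 1:
                runs[-1][1] = s      # extend the current run
            else:
                runs.append([s, s])  # start a new run
        ranges[dev] = [tuple(run) for run in runs]
    return ranges


sample = """\
end_request: I/O error, dev sda, sector 91683284
end_request: I/O error, dev sda, sector 91683285
end_request: I/O error, dev sda, sector 91683286
end_request: I/O error, dev sda, sector 91683287
"""
print(bad_sector_ranges(sample))  # {'sda': [(91683284, 91683287)]}
```

Feeding it the full saved dmesg text instead of the sample would show whether the failures form one growing run or several separate regions of the disk.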

If there's anything else I can do to test this, let me know.

I'm attaching the kernel configs for both systems as well.

--
Travis Morgan <[email protected]>


Attachments:
endeavour.config (29.55 kB)
castle.config (35.32 kB)

2004-06-30 19:15:09

by Jeff Garzik

Subject: Re: SATA problems on two different systems

Travis Morgan wrote:
> scsi1: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 03 30 b7 74 00 00
> 7d 00
> Current sdb: sense = 70 3
> ASC=11 ASCQ= 4
> Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
> 0x00 0x11 0x04
> end_request: I/O error, dev sdb, sector 53524340
> scsi1: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 03 30 b7 75 00 00
> 7c 00
> Current sdb: sense = 70 3
> ASC=11 ASCQ= 4
> Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
> 0x00 0x11 0x04
> end_request: I/O error, dev sdb, sector 53524341
> scsi1: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 03 30 b7 76 00 00
> 7b 00
> Current sdb: sense = 70 3
> ASC=11 ASCQ= 4
> Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
> 0x00 0x11 0x04
> end_request: I/O error, dev sdb, sector 53524342
> scsi1: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 03 30 b7 77 00 00
> 7a 00
> Current sdb: sense = 70 3
> ASC=11 ASCQ= 4
> Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
> 0x00 0x11 0x04
> end_request: I/O error, dev sdb, sector 53524343
> scsi1: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 03 30 b7 78 00 00
> 79 00
> Current sdb: sense = 70 3
> ASC=11 ASCQ= 4
> Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
> 0x00 0x11 0x04
> end_request: I/O error, dev sdb, sector 53524344
> <SNIP>
> raid1: Disk failure on sdb3, disabling device.
> Operation continuing on 1 devices
> raid1: sdb: unrecoverable I/O read error for block 52737152
> md: md1: sync done.
> RAID1 conf printout:
> --- wd:1 rd:2
> disk 0, wo:1, o:0, dev:sdb3
> disk 1, wo:0, o:1, dev:sda3
> RAID1 conf printout:
> --- wd:1 rd:2
> disk 1, wo:0, o:1, dev:sda3
> md: syncing RAID array md1
> md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
> md: using maximum available idle IO bandwith (but not more than 200000
> KB/sec) for reconstruction.
> md: using 128k window, over a total of 116824064 blocks.
> md: md1: sync done.
> RAID1 conf printout:
> --- wd:1 rd:2
> disk 1, wo:0, o:1, dev:sda3
>
> This raid volume is on two Seagate Barracuda 120 gig SATA drives running
> on the system's ICH5 SATA controller.
[...]
> Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
> 0x00 0x11 0x04
> end_request: I/O error, dev sda, sector 91683284
> scsi0: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 05 76 f9 d5 00 00
> 72 00
> Current sda: sense = 70 3
> ASC=11 ASCQ= 4
> Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
> 0x00 0x11 0x04
> end_request: I/O error, dev sda, sector 91683285
> scsi0: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 05 76 f9 d6 00 00
> 71 00
> Current sda: sense = 70 3
> ASC=11 ASCQ= 4
> Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
> 0x00 0x11 0x04
> end_request: I/O error, dev sda, sector 91683286
> scsi0: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 05 76 f9 d7 00 00
> 70 00
> Current sda: sense = 70 3
> ASC=11 ASCQ= 4
> Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
> 0x00 0x11 0x04
> end_request: I/O error, dev sda, sector 91683287
> scsi0: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 05 76 f9 d8 00 00
> 6f 00
> Current sda: sense = 70 3
> ASC=11 ASCQ= 4
> Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
> 0x00 0x11 0x04
> end_request: I/O error, dev sda, sector 91683288
> scsi0: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 05 76 f9 d9 00 00
> 6e 00
> Current sda: sense = 70 3
> ASC=11 ASCQ= 4
> Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
> 0x00 0x11 0x04
> end_request: I/O error, dev sda, sector 91683289
> scsi0: ERROR on channel 0, id 0, lun 0, CDB: 0x28 00 05 76 f9 da 00 00
> 6d 00
> Current sda: sense = 70 3
> ASC=11 ASCQ= 4
> Raw sense data:0x70 0x00 0x03 0x00 0x00 0x00 0x00 0x06 0x00 0x00 0x00
> 0x00 0x11 0x04
> end_request: I/O error, dev sda, sector 91683290
> <SNIP>
>
> This system has two Seagate Barracuda 160 gig SATA drives running on a
> PDC20376 SATA controller.
>
> I'm not sure if this is a problem with something I've done or a bug
> handling SATA in the kernel. I find it disturbing that both these
> systems have this problem with different hardware configurations but the
> same type of drives (only different sizes).
>
> The first system, endeavour, was up for over 30 days. It was then
> rebooted and has crashed for an unknown reason twice since and shows
> these dmesg entries after booting up.
>
> The second system, castle, gives me those messages when I try to copy
> some data from sda1 and hangs the process for a very long time while it
> keeps advancing 'bad sectors'.
>
> If there's anything else I can do to test this let me know.


You certainly provided the right information, thanks :)

While I would not rule out a driver bug, these errors normally indicate
some sort of hardware problem. My first suggestion is always to replace
the SATA cables.

I'll be instrumenting the SATA driver very soon to make it much more
verbose on errors, so it would be useful if you could test again once
that is in place (a few days, a week at most).

Jeff