2007-09-13 08:34:42

by Jon Ivar Rykkelid

Subject: sata_nv issues with MCP51 SATA controller

Sep 8 00:05:59 mirakel kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 8 00:05:59 mirakel kernel: ata1.00: cmd 35/00:08:47:83:1c/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 out
Sep 8 00:05:59 mirakel kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 8 00:05:59 mirakel kernel: ata1: soft resetting port
Sep 8 00:05:59 mirakel kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:06:00 mirakel kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 8 00:06:00 mirakel kernel: ata2.00: cmd c8/00:08:d7:6e:6f/00:00:00:00:00/e8 tag 0 cdb 0x0 data 4096 in
Sep 8 00:06:00 mirakel kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 8 00:06:00 mirakel kernel: ata2: soft resetting port
Sep 8 00:06:01 mirakel kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:06:30 mirakel kernel: ata1.00: qc timeout (cmd 0x27)
Sep 8 00:06:30 mirakel kernel: ata1.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:06:30 mirakel kernel: ata1.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:06:30 mirakel kernel: ata1: failed to recover some devices, retrying in 5 secs
Sep 8 00:06:31 mirakel kernel: ata2.00: qc timeout (cmd 0x27)
Sep 8 00:06:31 mirakel kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:06:31 mirakel kernel: ata2.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:06:31 mirakel kernel: ata2: failed to recover some devices, retrying in 5 secs
Sep 8 00:06:35 mirakel kernel: ata1: hard resetting port
Sep 8 00:06:35 mirakel kernel: ata1: SRST failed (errno=-19)
Sep 8 00:06:35 mirakel kernel: ata1: reset failed (errno=-19), retrying in 10 secs
Sep 8 00:06:36 mirakel kernel: ata2: hard resetting port
Sep 8 00:06:36 mirakel kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:06:45 mirakel kernel: ata1: hard resetting port
Sep 8 00:06:45 mirakel kernel: ata1: SRST failed (errno=-19)
Sep 8 00:06:45 mirakel kernel: ata1: reset failed (errno=-19), retrying in 10 secs
Sep 8 00:06:55 mirakel kernel: ata1: hard resetting port
Sep 8 00:06:55 mirakel kernel: ata1: SRST failed (errno=-19)
Sep 8 00:06:55 mirakel kernel: ata1: reset failed (errno=-19), retrying in 35 secs
Sep 8 00:07:06 mirakel kernel: ata2.00: qc timeout (cmd 0x27)
Sep 8 00:07:06 mirakel kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:07:06 mirakel kernel: ata2.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:07:06 mirakel kernel: ata2.00: limiting speed to UDMA/133:PIO3
Sep 8 00:07:06 mirakel kernel: ata2: failed to recover some devices, retrying in 5 secs
Sep 8 00:07:11 mirakel kernel: ata2: hard resetting port
Sep 8 00:07:12 mirakel kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:07:30 mirakel kernel: ata1: hard resetting port
Sep 8 00:07:30 mirakel kernel: ata1: SRST failed (errno=-19)
Sep 8 00:07:30 mirakel kernel: ata1: reset failed, giving up
Sep 8 00:07:30 mirakel kernel: ata1.00: disabled
Sep 8 00:07:30 mirakel kernel: ata1: EH complete
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:30 mirakel kernel: end_request: I/O error, dev sda, sector 488407879
Sep 8 00:07:30 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep 8 00:07:30 mirakel kernel: raid5: Disk failure on dm-0, disabling device. Operation continuing on 7 devices
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:30 mirakel kernel: end_request: I/O error, dev sda, sector 141263543
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:30 mirakel kernel: end_request: I/O error, dev sda, sector 4560055
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] READ CAPACITY failed
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Sense not available.
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Write Protect is off
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Asking for cache data failed
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Assuming drive cache: write through
Sep 8 00:07:42 mirakel kernel: ata2.00: qc timeout (cmd 0x27)
Sep 8 00:07:42 mirakel kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:07:42 mirakel kernel: ata2.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:07:42 mirakel kernel: ata2.00: disabled
Sep 8 00:07:42 mirakel kernel: ata2: EH complete
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:42 mirakel kernel: end_request: I/O error, dev sdb, sector 141520599
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:42 mirakel kernel: end_request: I/O error, dev sdb, sector 141671879
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:42 mirakel kernel: end_request: I/O error, dev sdb, sector 488407879
Sep 8 00:07:42 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep 8 00:07:42 mirakel kernel: raid5: Disk failure on dm-1, disabling device. Operation continuing on 6 devices
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] READ CAPACITY failed
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Sense not available.
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Write Protect is off
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Asking for cache data failed
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Assuming drive cache: write through
Sep 8 00:08:12 mirakel kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 8 00:08:12 mirakel kernel: ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
Sep 8 00:08:12 mirakel kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 8 00:08:13 mirakel kernel: ata3: soft resetting port
Sep 8 00:08:13 mirakel kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:08:42 mirakel kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 8 00:08:42 mirakel kernel: ata4.00: cmd 35/00:08:bf:44:1c/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 out
Sep 8 00:08:42 mirakel kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 8 00:08:43 mirakel kernel: ata4: soft resetting port
Sep 8 00:08:43 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:08:43 mirakel kernel: ata3.00: qc timeout (cmd 0x27)
Sep 8 00:08:43 mirakel kernel: ata3.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:08:43 mirakel kernel: ata3.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:08:43 mirakel kernel: ata3: failed to recover some devices, retrying in 5 secs
Sep 8 00:08:48 mirakel kernel: ata3: hard resetting port
Sep 8 00:08:48 mirakel kernel: ata3: SRST failed (errno=-19)
Sep 8 00:08:48 mirakel kernel: ata3: reset failed (errno=-19), retrying in 10 secs
Sep 8 00:08:58 mirakel kernel: ata3: hard resetting port
Sep 8 00:08:58 mirakel kernel: ata3: SRST failed (errno=-19)
Sep 8 00:08:58 mirakel kernel: ata3: reset failed (errno=-19), retrying in 10 secs
Sep 8 00:09:08 mirakel kernel: ata3: hard resetting port
Sep 8 00:09:08 mirakel kernel: ata3: SRST failed (errno=-19)
Sep 8 00:09:08 mirakel kernel: ata3: reset failed (errno=-19), retrying in 35 secs
Sep 8 00:09:13 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep 8 00:09:13 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:09:13 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:09:13 mirakel kernel: ata4: failed to recover some devices, retrying in 5 secs
Sep 8 00:09:18 mirakel kernel: ata4: hard resetting port
Sep 8 00:09:18 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:09:43 mirakel kernel: ata3: hard resetting port
Sep 8 00:09:43 mirakel kernel: ata3: SRST failed (errno=-19)
Sep 8 00:09:43 mirakel kernel: ata3: reset failed, giving up
Sep 8 00:09:43 mirakel kernel: ata3.00: disabled
Sep 8 00:09:43 mirakel kernel: ata3: EH complete
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:09:43 mirakel kernel: end_request: I/O error, dev sdc, sector 488391871
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] READ CAPACITY failed
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Sense not available.
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Write Protect is off
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Asking for cache data failed
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Assuming drive cache: write through
Sep 8 00:09:43 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep 8 00:09:43 mirakel kernel: raid5: Disk failure on sdc1, disabling device. Operation continuing on 5 devices
Sep 8 00:09:48 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep 8 00:09:48 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:09:48 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:09:48 mirakel kernel: ata4.00: limiting speed to UDMA/133:PIO3
Sep 8 00:09:48 mirakel kernel: ata4: failed to recover some devices, retrying in 5 secs
Sep 8 00:09:53 mirakel kernel: ata4: hard resetting port
Sep 8 00:09:54 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:10:24 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep 8 00:10:24 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:10:24 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:10:24 mirakel kernel: ata4.00: disabled
Sep 8 00:10:25 mirakel kernel: ata4: EH complete
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:10:25 mirakel kernel: end_request: I/O error, dev sdd, sector 488391871
Sep 8 00:10:25 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep 8 00:10:25 mirakel kernel: raid5: Disk failure on sdd1, disabling device. Operation continuing on 4 devices
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] READ CAPACITY failed
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Sense not available.
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Write Protect is off
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Asking for cache data failed
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Assuming drive cache: write through
Sep 8 00:10:25 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:25 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716576
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:25 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716499
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716500
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716501
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 6175
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: Aborting journal on device md0.
Sep 8 00:10:25 mirakel kernel: EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted
Sep 8 00:10:25 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:25 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:25 mirakel kernel: disk 4, o:0, dev:dm-0
Sep 8 00:10:25 mirakel kernel: disk 5, o:0, dev:sdc1
Sep 8 00:10:25 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:25 mirakel kernel: disk 7, o:0, dev:sdd1
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 0
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: EXT3-fs error (device md0) in ext3_dirty_inode: Journal has aborted
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 0
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: EXT3-fs error (device md0) in ext3_free_blocks_sb: Journal has aborted
Sep 8 00:10:26 mirakel kernel: ext3_abort called.
Sep 8 00:10:26 mirakel kernel: EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Sep 8 00:10:26 mirakel kernel: Remounting filesystem read-only
Sep 8 00:10:26 mirakel kernel: Buffer I/O error on device md0, logical block 123686376
Sep 8 00:10:26 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:26 mirakel kernel: Buffer I/O error on device md0, logical block 123689709
Sep 8 00:10:26 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:26 mirakel kernel: Buffer I/O error on device md0, logical block 123689744
Sep 8 00:10:26 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:26 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:26 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:26 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:26 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:26 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:26 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:26 mirakel kernel: disk 4, o:0, dev:dm-0
Sep 8 00:10:26 mirakel kernel: disk 5, o:0, dev:sdc1
Sep 8 00:10:26 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:26 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:26 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:26 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:26 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:26 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:26 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:26 mirakel kernel: disk 4, o:0, dev:dm-0
Sep 8 00:10:26 mirakel kernel: disk 5, o:0, dev:sdc1
Sep 8 00:10:26 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:26 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:26 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:26 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:26 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:26 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:26 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:27 mirakel kernel: disk 4, o:0, dev:dm-0
Sep 8 00:10:27 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:27 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:27 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:27 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:27 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:27 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:27 mirakel kernel: disk 4, o:0, dev:dm-0
Sep 8 00:10:27 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:27 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:27 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:27 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:27 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:27 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:27 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:27 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:27 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:27 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:27 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:27 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:27 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:27 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:27 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:27 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:27 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:27 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:27 mirakel kernel: EXT3-fs error (device md0): ext3_readdir: directory #126337 contains a hole at offset 4096
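The timed-out commands in the log above can be cross-checked against the later I/O errors by decoding the libata taskfile dumps on the "cmd" lines. The following is a minimal sketch of a hypothetical helper; it is not part of libata or any existing tool. It assumes the EH report prints the taskfile as command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device and that the disks use 512-byte sectors. It also splits the "SStatus" value into the DET, SPD and IPM fields defined by the SATA specification.

```python
"""Hypothetical helpers for reading the libata error lines in this log.

Assumed (illustration only): the EH "cmd" taskfile field order is
command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device,
and the disks use 512-byte logical sectors.
"""

# ATA opcodes that appear in this log.
ATA_OPCODES = {
    0x27: "READ NATIVE MAX ADDRESS EXT",
    0x35: "WRITE DMA EXT",
    0xC8: "READ DMA",
    0xEA: "FLUSH CACHE EXT",
}

# Opcodes from the table above that use 48-bit (EXT) addressing.
LBA48_OPCODES = {0x27, 0x35, 0xEA}

SECTOR_SIZE = 512  # assumed logical sector size in bytes


def decode_taskfile(tf: str) -> dict:
    """Decode a taskfile such as '35/00:08:47:83:1c/00:00:1d:00:00/e0'."""
    cmd_s, low_s, hob_s, dev_s = tf.split("/")
    command = int(cmd_s, 16)
    _feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low_s.split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in hob_s.split(":")
    )
    device = int(dev_s, 16)

    if command in LBA48_OPCODES:
        # 48-bit LBA: the "hob" (high order byte) registers hold bits 24..47.
        lba = (
            (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
            | (lbah << 16) | (lbam << 8) | lbal
        )
        sectors = (hob_nsect << 8) | nsect
    else:
        # 28-bit LBA: bits 24..27 live in the low nibble of the device register.
        lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
        sectors = nsect

    return {
        "command": ATA_OPCODES.get(command, f"0x{command:02x}"),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * SECTOR_SIZE,
    }


def decode_sstatus(sstatus: int) -> dict:
    """Split a SATA SStatus value (printed in hex by libata) into its fields."""
    return {
        "det": sstatus & 0xF,         # 3 = device present, PHY communication up
        "spd": (sstatus >> 4) & 0xF,  # 1 = Gen1 (1.5 Gbps)
        "ipm": (sstatus >> 8) & 0xF,  # 1 = interface in active state
    }


if __name__ == "__main__":
    for line in (
        "35/00:08:47:83:1c/00:00:1d:00:00/e0",  # ata1.00, later reported as sda
        "c8/00:08:d7:6e:6f/00:00:00:00:00/e8",  # ata2.00, later reported as sdb
        "35/00:08:bf:44:1c/00:00:1d:00:00/e0",  # ata4.00, later reported as sdd
    ):
        print(line, decode_taskfile(line))
    print("SStatus 113:", decode_sstatus(0x113))
```

Applied to the three timed-out transfers, the decoded LBAs are 488407879, 141520599 and 488391871, the same sectors the block layer later reports in the "end_request: I/O error" lines for sda, sdb and sdd, and the 8-sector count matches the "data 4096" field. That agreement suggests the assumed field order holds for this log. SStatus 0x113 decodes as DET=3, SPD=1, IPM=1: the PHY link was reported up at 1.5 Gbps even while SRST kept failing with errno -19 (ENODEV).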


Attachments:
sata_nv-error.log (16.71 kB)

2007-09-13 09:17:59

by Tejun Heo

Subject: Re: sata_nv issues with MCP51 SATA controller

Jon Ivar Rykkelid wrote:
> I'm having serious disk-issues when using the on-board nvidia controller
> for my HDDs (My motherboard is a Gigabyte GA-N650SLI-DS4 with nvidia
> chipset, cpu is intel Core2Quad)
>
> excerpt from "lspci":
> 00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
> 00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
> (rev a1)
> 00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
> (rev a1)
>
> I have a normal IDE/P-ATA-disk attached to the "IDE"-controller and that
> works fine (/dev/hda)
>
> However, any number of disks (I have tried 2 and 4) connected to the
> SATA-controller(s), will eventually fail. - See attached log (excerpt /
> anything relevant from /var/log/messages)
>
> At first, disks were REALLY unstable, but then I disabled S.M.A.R.T.
> (both in BIOS and Linux), and I updated from the CentOS5 (equivalent of
> RHEL5) kernel (2.6.18) to the latest (at that time) official kernel from
> kernel.org:
>
>> uname -a
> Linux mirakel 2.6.22.5-custom_jir #2 SMP Thu Aug 30 22:06:21 CEST 2007
> i686 i686 i386 GNU/Linux
>
> Now it will normally take a day or two before SATA crashes, so things
> are better, but still rather useless.
>
> The first error when sata_nv gets into trouble is always:
> "exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen"
> (as shown in the attached log-file.) - when this happens to one device,
> it'll almost instantly happen to the other disk attached to that
> controller as well. A couple of minutes (or so) later, the disk(s)
> connected to the other controller will start acting up as well (in the
> same manner). - I/O freezes, and nothing helps except a reboot...
>
> As I run a rather large (software / md) RAID-5 disk array on this server
> (I'm doing a bit of video editing), every crash means a time-consuming
> rebuild of the disk-array...
>
> I have given up on the sata_nv / nvidia-controllers for the time being.
> I now resort to some old PCI-connected sata-controllers which work fine
> (but slow, as they are outdated and "overloaded").
>
> So, if anyone has a good solution / suggestion / improved driver (over
> the one supplied with the official 2.6.22.5-kernel) I am eager to give
> it a go and see if the situation can be resolved.
>
> I appreciate any sensible suggestions.

Wheeee... the whole controller seems to have gone down at once, and it's
not even an IRQ routing problem: resets are failing. This is the first
time I've seen something like this. Sorry, but I don't have any idea what's
going on. Cc'ing Robert. Any ideas?

--
tejun

2007-09-14 15:58:29

by Jeff Garzik

Subject: Re: sata_nv issues with MCP51 SATA controller

Jon Ivar Rykkelid wrote:
> It is NOT the PSU, nor is it cables, as all the drives work well using
> the same cables + PSU (in the same box) if I connect them to my other
> two controllers (in that same box).


It's sometimes the combination that matters most. You cannot really
make that determination yet.

Jeff


2007-09-14 18:38:43

by Jon Ivar Rykkelid

Subject: Re: sata_nv issues with MCP51 SATA controller

Jeff Garzik wrote:
> Jon Ivar Rykkelid wrote:
>
>> I'm going to test another (identical) motherboard this evening to
>> establish whether it could be a HW-issue.
>
> Not just motherboard. It is more likely to be a cable, drive or PSU
> problem.

>> It is NOT the PSU, nor is it cables, as all the drives work well
>> using the same cables + PSU (in the same box) if I connect them to my
>> other two controllers (in that same box).
>
>
> It's sometimes the combination that matters most. You cannot really
> make that determination yet.
>

Whatever.
(Though I must confess that, in spite of my Master's degree in Electrical
Engineering and extensive HW experience, I cannot for the life of me
understand how you can find it more likely to be the cables (which work fine
with other controllers), the disks (which also work fine with other
controllers) or the power supply (which also works fine with exactly the
same things connected to it) rather than the motherboard's
SATA controller (which is the item actually reported to fail in
the first place). Sure, I'm well aware that sometimes the combination
of HW matters, but in my experience we're normally not talking about
"dumb" stuff like cables and PSU when that is the issue.)

Anyway, I have just changed to the other (identical) motherboard, and
things are running just fine at the moment...
I'll let you know if they start acting up (as they did before). If not, I
guess the fault was with the motherboard and not the driver. I guess
we'll know pretty soon...

Thanks for all your effort, gents, let's hope it all works now!

BR
Jon Ivar

2007-09-14 20:28:50

by auxsvr

Subject: Re: sata_nv issues with MCP51 SATA controller

Hello,

I have a similar, if not identical, problem with an ASUS A8N SLI nForce4-based
motherboard. The PC (with a Seagate SATA-2 120 GB HDD) ran fine for two
years. Last Christmas, Windows XP started crashing (I hadn't changed either
the hardware or the drivers), and the filesystem got corrupted beyond repair
within 8 hours after every installation. The system log contained entries
about bad sectors and, based on the Seagate diagnostic tool, I returned the
system to the supplier. According to the retail shop, neither the disk nor
the system had any problems, so I was coerced into paying for a replacement disk.
The replacement HDD (Seagate again, 120 GB) ran fine until a month ago (this
time the system is connected to a UPS), when the same problem occurred! I
moved the disk to a Linux system with a Promise TX2plus controller (the one
I'm typing this from), found bad sectors, formatted it, and everything has worked
fine for at least 6 hours of continuous disk writes and reads in this system.
If I return the disk to the nForce4 system, it becomes corrupted within a few
hours of disk access, no matter whether Linux or Windows is installed, and
regardless of NCQ settings, drivers and cables.

The symptoms are the same in both cases: the system crashes, then runs for
some hours, then the controller stops responding completely (ata1: exception
Emask 0x10 SAct 0x0 SErr 0x1810000 action 0x2 frozen is the first error
message), the disk access LED blinks continuously, Linux 2.6.18 (openSUSE
10.2) throws lots of error messages similar to the ones you mention above,
Linux reports that the device is dead, and the system becomes unusable (no disk
access). After a reboot, the filesystem is fine for some time; afterwards
similar error messages appear, followed by seek errors, and the filesystem ends up
completely destroyed. The positive part of this ordeal is that the Linux SATA
error handling works fine: Linux recovered the first time (without access
to the drive, of course), while Windows crashed badly and I was unable to find
out what was happening at the beginning.

I cannot say with certainty whether this is a hardware fault or damage. Seagate
technical support insists that their HDD is at fault, which is obviously
wrong; the PC has been connected to a UPS since the second incident and was
checked by the service at the shop. The weirdest thing, which I cannot
explain, is that the system ran fine for 8 months after I changed the
disk, even though the disk wasn't damaged! Either the motherboard is damaged
or faulty (but then how did it run fine for 8 months after I changed
the disk?), or there is some very weird interaction between the HDD and the SATA
controller. That isn't unlikely, considering the problems reported with
combinations of nForce4 and Maxtor HDDs, but it still doesn't explain the 2 years
and 8 months of normal operation. I'm going to contact the service
again and see how this turns out.

2007-09-15 11:29:52

by Prakash Punnoor

Subject: Re: sata_nv issues with MCP51 SATA controller

On Saturday 15 September 2007, Jon Ivar Rykkelid wrote:
> Do you get the same error messages that I do if you're running without
> the "acpi_use_timer_override" (this is how it is spelled, isn't it) ?

I don't remember which messages I got, but for me the kernel didn't boot with
certain versions. And yes, you spelled it correctly.
--
(°= =°)
//\ Prakash Punnoor /\\
V_/ \_V


Attachments:
(No filename) (426.00 B)
signature.asc (189.00 B)
This is a digitally signed message part.