2007-09-13 07:46:20

by Jon Ivar Rykkelid

Subject: sata_nv issues with MCP51 SATA controller

Sep 8 00:05:59 mirakel kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 8 00:05:59 mirakel kernel: ata1.00: cmd 35/00:08:47:83:1c/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 out
Sep 8 00:05:59 mirakel kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 8 00:05:59 mirakel kernel: ata1: soft resetting port
Sep 8 00:05:59 mirakel kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:06:00 mirakel kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 8 00:06:00 mirakel kernel: ata2.00: cmd c8/00:08:d7:6e:6f/00:00:00:00:00/e8 tag 0 cdb 0x0 data 4096 in
Sep 8 00:06:00 mirakel kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 8 00:06:00 mirakel kernel: ata2: soft resetting port
Sep 8 00:06:01 mirakel kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:06:30 mirakel kernel: ata1.00: qc timeout (cmd 0x27)
Sep 8 00:06:30 mirakel kernel: ata1.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:06:30 mirakel kernel: ata1.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:06:30 mirakel kernel: ata1: failed to recover some devices, retrying in 5 secs
Sep 8 00:06:31 mirakel kernel: ata2.00: qc timeout (cmd 0x27)
Sep 8 00:06:31 mirakel kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:06:31 mirakel kernel: ata2.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:06:31 mirakel kernel: ata2: failed to recover some devices, retrying in 5 secs
Sep 8 00:06:35 mirakel kernel: ata1: hard resetting port
Sep 8 00:06:35 mirakel kernel: ata1: SRST failed (errno=-19)
Sep 8 00:06:35 mirakel kernel: ata1: reset failed (errno=-19), retrying in 10 secs
Sep 8 00:06:36 mirakel kernel: ata2: hard resetting port
Sep 8 00:06:36 mirakel kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:06:45 mirakel kernel: ata1: hard resetting port
Sep 8 00:06:45 mirakel kernel: ata1: SRST failed (errno=-19)
Sep 8 00:06:45 mirakel kernel: ata1: reset failed (errno=-19), retrying in 10 secs
Sep 8 00:06:55 mirakel kernel: ata1: hard resetting port
Sep 8 00:06:55 mirakel kernel: ata1: SRST failed (errno=-19)
Sep 8 00:06:55 mirakel kernel: ata1: reset failed (errno=-19), retrying in 35 secs
Sep 8 00:07:06 mirakel kernel: ata2.00: qc timeout (cmd 0x27)
Sep 8 00:07:06 mirakel kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:07:06 mirakel kernel: ata2.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:07:06 mirakel kernel: ata2.00: limiting speed to UDMA/133:PIO3
Sep 8 00:07:06 mirakel kernel: ata2: failed to recover some devices, retrying in 5 secs
Sep 8 00:07:11 mirakel kernel: ata2: hard resetting port
Sep 8 00:07:12 mirakel kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:07:30 mirakel kernel: ata1: hard resetting port
Sep 8 00:07:30 mirakel kernel: ata1: SRST failed (errno=-19)
Sep 8 00:07:30 mirakel kernel: ata1: reset failed, giving up
Sep 8 00:07:30 mirakel kernel: ata1.00: disabled
Sep 8 00:07:30 mirakel kernel: ata1: EH complete
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:30 mirakel kernel: end_request: I/O error, dev sda, sector 488407879
Sep 8 00:07:30 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep 8 00:07:30 mirakel kernel: raid5: Disk failure on dm-0, disabling device. Operation continuing on 7 devices
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:30 mirakel kernel: end_request: I/O error, dev sda, sector 141263543
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:30 mirakel kernel: end_request: I/O error, dev sda, sector 4560055
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] READ CAPACITY failed
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Sense not available.
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Write Protect is off
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Asking for cache data failed
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Assuming drive cache: write through
Sep 8 00:07:42 mirakel kernel: ata2.00: qc timeout (cmd 0x27)
Sep 8 00:07:42 mirakel kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:07:42 mirakel kernel: ata2.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:07:42 mirakel kernel: ata2.00: disabled
Sep 8 00:07:42 mirakel kernel: ata2: EH complete
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:42 mirakel kernel: end_request: I/O error, dev sdb, sector 141520599
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:42 mirakel kernel: end_request: I/O error, dev sdb, sector 141671879
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:42 mirakel kernel: end_request: I/O error, dev sdb, sector 488407879
Sep 8 00:07:42 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep 8 00:07:42 mirakel kernel: raid5: Disk failure on dm-1, disabling device. Operation continuing on 6 devices
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] READ CAPACITY failed
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Sense not available.
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Write Protect is off
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Asking for cache data failed
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Assuming drive cache: write through
Sep 8 00:08:12 mirakel kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 8 00:08:12 mirakel kernel: ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
Sep 8 00:08:12 mirakel kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 8 00:08:13 mirakel kernel: ata3: soft resetting port
Sep 8 00:08:13 mirakel kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:08:42 mirakel kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 8 00:08:42 mirakel kernel: ata4.00: cmd 35/00:08:bf:44:1c/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 out
Sep 8 00:08:42 mirakel kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 8 00:08:43 mirakel kernel: ata4: soft resetting port
Sep 8 00:08:43 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:08:43 mirakel kernel: ata3.00: qc timeout (cmd 0x27)
Sep 8 00:08:43 mirakel kernel: ata3.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:08:43 mirakel kernel: ata3.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:08:43 mirakel kernel: ata3: failed to recover some devices, retrying in 5 secs
Sep 8 00:08:48 mirakel kernel: ata3: hard resetting port
Sep 8 00:08:48 mirakel kernel: ata3: SRST failed (errno=-19)
Sep 8 00:08:48 mirakel kernel: ata3: reset failed (errno=-19), retrying in 10 secs
Sep 8 00:08:58 mirakel kernel: ata3: hard resetting port
Sep 8 00:08:58 mirakel kernel: ata3: SRST failed (errno=-19)
Sep 8 00:08:58 mirakel kernel: ata3: reset failed (errno=-19), retrying in 10 secs
Sep 8 00:09:08 mirakel kernel: ata3: hard resetting port
Sep 8 00:09:08 mirakel kernel: ata3: SRST failed (errno=-19)
Sep 8 00:09:08 mirakel kernel: ata3: reset failed (errno=-19), retrying in 35 secs
Sep 8 00:09:13 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep 8 00:09:13 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:09:13 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:09:13 mirakel kernel: ata4: failed to recover some devices, retrying in 5 secs
Sep 8 00:09:18 mirakel kernel: ata4: hard resetting port
Sep 8 00:09:18 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:09:43 mirakel kernel: ata3: hard resetting port
Sep 8 00:09:43 mirakel kernel: ata3: SRST failed (errno=-19)
Sep 8 00:09:43 mirakel kernel: ata3: reset failed, giving up
Sep 8 00:09:43 mirakel kernel: ata3.00: disabled
Sep 8 00:09:43 mirakel kernel: ata3: EH complete
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:09:43 mirakel kernel: end_request: I/O error, dev sdc, sector 488391871
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] READ CAPACITY failed
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Sense not available.
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Write Protect is off
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Asking for cache data failed
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Assuming drive cache: write through
Sep 8 00:09:43 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep 8 00:09:43 mirakel kernel: raid5: Disk failure on sdc1, disabling device. Operation continuing on 5 devices
Sep 8 00:09:48 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep 8 00:09:48 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:09:48 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:09:48 mirakel kernel: ata4.00: limiting speed to UDMA/133:PIO3
Sep 8 00:09:48 mirakel kernel: ata4: failed to recover some devices, retrying in 5 secs
Sep 8 00:09:53 mirakel kernel: ata4: hard resetting port
Sep 8 00:09:54 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:10:24 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep 8 00:10:24 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:10:24 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:10:24 mirakel kernel: ata4.00: disabled
Sep 8 00:10:25 mirakel kernel: ata4: EH complete
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:10:25 mirakel kernel: end_request: I/O error, dev sdd, sector 488391871
Sep 8 00:10:25 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep 8 00:10:25 mirakel kernel: raid5: Disk failure on sdd1, disabling device. Operation continuing on 4 devices
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] READ CAPACITY failed
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Sense not available.
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Write Protect is off
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Asking for cache data failed
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Assuming drive cache: write through
Sep 8 00:10:25 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:25 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716576
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:25 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716499
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716500
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716501
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 6175
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: Aborting journal on device md0.
Sep 8 00:10:25 mirakel kernel: EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted
Sep 8 00:10:25 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:25 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:25 mirakel kernel: disk 4, o:0, dev:dm-0
Sep 8 00:10:25 mirakel kernel: disk 5, o:0, dev:sdc1
Sep 8 00:10:25 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:25 mirakel kernel: disk 7, o:0, dev:sdd1
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 0
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: EXT3-fs error (device md0) in ext3_dirty_inode: Journal has aborted
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 0
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: EXT3-fs error (device md0) in ext3_free_blocks_sb: Journal has aborted
Sep 8 00:10:26 mirakel kernel: ext3_abort called.
Sep 8 00:10:26 mirakel kernel: EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Sep 8 00:10:26 mirakel kernel: Remounting filesystem read-only
Sep 8 00:10:26 mirakel kernel: Buffer I/O error on device md0, logical block 123686376
Sep 8 00:10:26 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:26 mirakel kernel: Buffer I/O error on device md0, logical block 123689709
Sep 8 00:10:26 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:26 mirakel kernel: Buffer I/O error on device md0, logical block 123689744
Sep 8 00:10:26 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:26 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:26 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:26 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:26 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:26 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:26 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:26 mirakel kernel: disk 4, o:0, dev:dm-0
Sep 8 00:10:26 mirakel kernel: disk 5, o:0, dev:sdc1
Sep 8 00:10:26 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:26 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:26 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:26 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:26 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:26 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:26 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:26 mirakel kernel: disk 4, o:0, dev:dm-0
Sep 8 00:10:26 mirakel kernel: disk 5, o:0, dev:sdc1
Sep 8 00:10:26 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:26 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:26 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:26 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:26 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:26 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:26 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:27 mirakel kernel: disk 4, o:0, dev:dm-0
Sep 8 00:10:27 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:27 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:27 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:27 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:27 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:27 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:27 mirakel kernel: disk 4, o:0, dev:dm-0
Sep 8 00:10:27 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:27 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:27 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:27 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:27 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:27 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:27 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:27 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:27 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:27 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:27 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:27 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:27 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:27 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:27 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:27 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:27 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:27 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:27 mirakel kernel: EXT3-fs error (device md0): ext3_readdir: directory #126337 contains a hole at offset 4096


Attachments:
sata_nv-error.log (16.71 kB)
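
The attached log can be condensed into a per-port timeline, which makes the failure sequence easier to follow: each port first reports an exception and is later disabled. The following is a minimal Python sketch and not part of the original post; the function name summarize and the regular expression are illustrative assumptions about the syslog layout shown above.

```python
import re

# Matches syslog lines such as
# "Sep 8 00:05:59 mirakel kernel: ata1.00: exception Emask 0x0 ..."
# The optional ".00" device suffix is dropped so events group by port.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?P<port>ata\d+)(?:\.\d+)?:\s+(?P<msg>.*)$"
)


def summarize(lines):
    """Return {port: {"first_exception": stamp, "disabled": stamp}}."""
    summary = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an ataN message (e.g. "res ..." or "sd ..." lines)
        entry = summary.setdefault(
            m.group("port"), {"first_exception": None, "disabled": None}
        )
        msg = m.group("msg")
        if msg.startswith("exception") and entry["first_exception"] is None:
            entry["first_exception"] = m.group("stamp")
        elif msg == "disabled":
            entry["disabled"] = m.group("stamp")
    return summary


if __name__ == "__main__":
    # A few lines copied from the log above.
    sample = [
        "Sep 8 00:05:59 mirakel kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen",
        "Sep 8 00:05:59 mirakel kernel: ata1: soft resetting port",
        "Sep 8 00:07:30 mirakel kernel: ata1.00: disabled",
    ]
    for port, events in sorted(summarize(sample).items()):
        print(port, events)
```

Fed the whole attachment, this would list one entry per port (ata1 to ata4), so the delay between the first timeout and the port being given up on can be read off directly.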

2007-09-13 14:20:48

by Jeff Garzik

Subject: Re: sata_nv issues with MCP51 SATA controller

Jon Ivar Rykkelid wrote:
>
> Hi, I'm resending (didn't see my first attempt appear on the mailing list):
>
>
>
> I'm having serious disk-issues when using the on-board nvidia controller
> for my HDDs (My motherboard is a Gigabyte GA-N650SLI-DS4 with nvidia
> chipset, cpu is intel Core2Quad)
>
> excerpt from "lspci":
> 00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
> 00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
> (rev a1)
> 00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
> (rev a1)
>
> I have a normal IDE/P-ATA-disk attached to the "IDE"-controller and that
> works fine (/dev/hda)
>
> However, any number of disks (I have tried 2 and 4) connected to the
> SATA-controller(s), will eventually fail. - See attached log (excerpt /
> anything relevant from /var/log/messages)
>
> At first, disks were REALLY unstable, but then I disabled S.M.A.R.T.
> (both in BIOS and Linux), and I updated from the CentOS5 (equivalent of
> RHEL5) kernel (2.6.18) to the latest (at that time) official kernel from
> kernel.org:
>
> > uname -a
> Linux mirakel 2.6.22.5-custom_jir #2 SMP Thu Aug 30 22:06:21 CEST 2007
> i686 i686 i386 GNU/Linux
>
> Now it will normally take a day or two before SATA crashes, so things
> are better, but still rather useless.
>
> The first error when sata_nv gets into problems is always:
> "exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen"
> (as shown in the attached log-file.) - when this happens to one device,
> it'll almost instantly happen to the other disk attached to that
> controller as well. A couple of minutes (or so) later, the disk(s)
> connected to the other controller will start acting up as well (in the
> same manner). - I/O freezes, and nothing helps except a reboot...
>
> As I run a rather large (software / md) RAID-5 disk array on this server
> (I'm doing a bit of video editing), every crash means a time-consuming
> rebuild of the disk-array...
>
> I have given up on the sata_nv / nvidia-controllers for the time being.
> I now resort to some old PCI-connected sata-controllers which work fine
> (but slow, as they are outdated and "overloaded").
>
> So, if anyone has a good solution / suggestion / improved driver (over
> the one supplied with the official 2.6.22.5-kernel) I am eager to give
> it a go and see if the situation can be resolved.

does adma=0 module option do anything?

Jeff



2007-09-13 15:05:55

by Jon Ivar Rykkelid

Subject: Re: sata_nv issues with MCP51 SATA controller

Jeff Garzik wrote:
> Jon Ivar Rykkelid wrote:
>>
>> Hi, I'm resending (didn't see my first attempt appear on the mailing list):
>>
>>
>>
>> I'm having serious disk-issues when using the on-board nvidia controller
>> for my HDDs (My motherboard is a Gigabyte GA-N650SLI-DS4 with nvidia
>> chipset, cpu is intel Core2Quad)
>>
>> excerpt from "lspci":
>> 00:0d.0 IDE interface: nVidia Corporation MCP51 IDE (rev a1)
>> 00:0e.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
>> (rev a1)
>> 00:0f.0 IDE interface: nVidia Corporation MCP51 Serial ATA Controller
>> (rev a1)
>>
>> I have a normal IDE/P-ATA-disk attached to the "IDE"-controller and that
>> works fine (/dev/hda)
>>
>> However, any number of disks (I have tried 2 and 4) connected to the
>> SATA-controller(s), will eventually fail. - See attached log (excerpt /
>> anything relevant from /var/log/messages)
>>
>> At first, disks were REALLY unstable, but then I disabled S.M.A.R.T.
>> (both in BIOS and Linux), and I updated from the CentOS5 (equivalent of
>> RHEL5) kernel (2.6.18) to the latest (at that time) official kernel from
>> kernel.org:
>>
>> > uname -a
>> Linux mirakel 2.6.22.5-custom_jir #2 SMP Thu Aug 30 22:06:21 CEST 2007
>> i686 i686 i386 GNU/Linux
>>
>> Now it will normally take a day or two before SATA crashes, so things
>> are better, but still rather useless.
>>
>> The first error when sata_nv gets into problems is always:
>> "exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen"
>> (as shown in the attached log-file.) - when this happens to one device,
>> it'll almost instantly happen to the other disk attached to that
>> controller as well. A couple of minutes (or so) later, the disk(s)
>> connected to the other controller will start acting up as well (in the
>> same manner). - I/O freezes, and nothing helps except a reboot...
>>
>> As I run a rather large (software / md) RAID-5 disk array on this server
>> (I'm doing a bit of video editing), every crash means a time-consuming
>> rebuild of the disk-array...
>>
>> I have given up on the sata_nv / nvidia-controllers for the time being.
>> I now resort to some old PCI-connected sata-controllers which work fine
>> (but slow, as they are outdated and "overloaded").
>>
>> So, if anyone has a good solution / suggestion / improved driver (over
>> the one supplied with the official 2.6.22.5-kernel) I am eager to give
>> it a go and see if the situation can be resolved.
>
> does adma=0 module option do anything?
>
> Jeff
Thanks for the suggestion, but sata_nv is not built modular in my
current kernel, so "no can do" at the moment
(However, if some expert REALLY thinks this will fix things, I will
CERTAINLY recompile and give it a go)

As I said before, it all works for some time (a day or two) before it
crashes with the current kernel & no "S.M.A.R.T.". With my current setup
I have always had the time to fully rebuild my disk-array before a new
crash. - In the case of 4 disks attached to the nvidia controllers
(disregarding the disks on other controllers), this means that the
sata_nv-driver / controllers alone have read at least 750GB and written
250GB of data before the crash (with no resets working) - soft reboot
fixes everything. - I'm pretty confident that this is a driver issue.

As Tejun Heo writes, "the whole controller seems to
have went down at once and it's not even IRQ routing problem - resets
are failing."

The error-messages / crash-symptoms were the same with SMART enabled and
the original CentOS5-kernel, except that with that setup, the crashes
were much more frequent.

Any help?

BR
Jon Ivar
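
The 750GB / 250GB figures above can be checked against the sector count in the log. The sketch below is not from the thread; it assumes 512-byte sectors, that a RAID-5 rebuild reads every surviving member in full and writes the rebuilt member in full, and that exactly one of the four disks on the nvidia controllers is the one being rebuilt. The function name rebuild_traffic_gb is hypothetical.

```python
SECTOR_BYTES = 512            # assumed logical sector size
SECTORS_PER_DISK = 490234752  # sector count reported by ata_hpa_resize in the log


def rebuild_traffic_gb(disks_on_controller, rebuilt_on_controller=1):
    """Estimate GB read and written through one controller during a rebuild.

    Assumes each surviving member is read once in full and each rebuilt
    member is written once in full.
    """
    disk_gb = SECTORS_PER_DISK * SECTOR_BYTES / 1e9
    read_gb = (disks_on_controller - rebuilt_on_controller) * disk_gb
    written_gb = rebuilt_on_controller * disk_gb
    return read_gb, written_gb


if __name__ == "__main__":
    read_gb, written_gb = rebuild_traffic_gb(4)
    print(f"read ~{read_gb:.0f} GB, written ~{written_gb:.0f} GB")
```

Under those assumptions each disk holds about 251 GB, so four disks on the controllers give roughly 753 GB read and 251 GB written, which is consistent with the numbers quoted in the message.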

2007-09-13 15:16:25

by Tejun Heo

Subject: Re: sata_nv issues with MCP51 SATA controller

Jon Ivar Rykkelid wrote:
> Thanks for the suggestion, but sata_nv is not built modular in my
> current kernel, so "no can do" at the moment
> (However, if some expert REALLY thinks this will fix things, I will
> CERTAINLY recompile and give it a go)

Passing "sata_nv.adma=0" as kernel boot parameter will do the trick.

--
tejun
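
For a driver built into the kernel, options are given on the kernel command line in the form module.parameter=value, which is what the suggestion above relies on. As a minimal sketch (the helper name module_params is hypothetical, and the example command line is made up for illustration), the following Python picks such options out of a command line string, for instance the contents of /proc/cmdline:

```python
def module_params(cmdline, module):
    """Return {param: value} for 'module.param=value' tokens in cmdline."""
    prefix = module + "."
    params = {}
    for token in cmdline.split():
        if token.startswith(prefix) and "=" in token:
            name, value = token[len(prefix):].split("=", 1)
            params[name] = value
    return params


if __name__ == "__main__":
    # Illustrative command line only; on a running system the real one
    # can be read from /proc/cmdline.
    example = "ro root=/dev/md0 sata_nv.adma=0 quiet"
    print(module_params(example, "sata_nv"))
```

This only shows what was requested at boot; it does not prove the driver accepted the value.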

2007-09-13 18:01:46

by Jon Ivar Rykkelid

Subject: Re: sata_nv issues with MCP51 SATA controller

Resending, as my first attempts contained HTML and were blocked...

Tejun Heo wrote:
> Jon Ivar Rykkelid wrote:
>
>> Thanks for the suggestion, but sata_nv is not built modular in my
>> current kernel, so "no can do" at the moment
>> (However, if some expert REALLY thinks this will fix things, I will
>> CERTAINLY recompile and give it a go)
>>
>
> Passing "sata_nv.adma=0" as kernel boot parameter will do the trick.
>
>
Ahh, silly me... Of course!
Ooops, I just got back, and verified: I actually have sata_nv running as
a module after all on this server... My bad.
I fixed /etc/modprobe.conf to include the following two lines:
"
alias scsi_hostadapter sata_nv
options sata_nv adma=0
...
"

I then ran "mkinitrd" (to ensure that the latest options from
modprobe.conf were included in the initrd-image that I load at boot).

- Do you guys think this is worth a try? Anyway, I have rebooted now, so
I'll test it for a few days and let you know - We'll just have to wait
and see...
Do you think I should re-enable SMART to provoke a failure, or would
that be tempting fate too much? (For now I have not re-enabled SMART.)

PS: Is there any way of testing / verifying that sata_nv is now running
with this option? - I am pretty sure I have done it correctly, but I
would still like to confirm that the proper option has been passed if
possible.

Thanks
Jon Ivar
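
On the question of verifying the option: one possible check, assuming the kernel exports the parameter read-only under /sys/module/sata_nv/parameters/adma (this path is an assumption, not something confirmed in the thread), is to read it back after boot. The sketch below is not part of the original message, and the function name read_module_param is hypothetical. A boolean parameter that is off would typically read back as "N".

```python
import os


def read_module_param(module, param, sysfs_root="/sys"):
    """Return the current value of a module parameter, or None if not exported."""
    path = os.path.join(sysfs_root, "module", module, "parameters", param)
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        # Parameter not exported, module not loaded, or sysfs not mounted.
        return None


if __name__ == "__main__":
    value = read_module_param("sata_nv", "adma")
    if value is None:
        print("sata_nv adma parameter not visible in sysfs")
    else:
        print("sata_nv adma =", value)
```

The sysfs_root argument exists only so the function can be pointed at a test directory; on a real system the default is used.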

2007-09-13 19:26:30

by Jon Ivar Rykkelid

Subject: Re: sata_nv issues with MCP51 SATA controller

Sep 8 00:05:59 mirakel kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 8 00:05:59 mirakel kernel: ata1.00: cmd 35/00:08:47:83:1c/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 out
Sep 8 00:05:59 mirakel kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 8 00:05:59 mirakel kernel: ata1: soft resetting port
Sep 8 00:05:59 mirakel kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:06:00 mirakel kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 8 00:06:00 mirakel kernel: ata2.00: cmd c8/00:08:d7:6e:6f/00:00:00:00:00/e8 tag 0 cdb 0x0 data 4096 in
Sep 8 00:06:00 mirakel kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 8 00:06:00 mirakel kernel: ata2: soft resetting port
Sep 8 00:06:01 mirakel kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:06:30 mirakel kernel: ata1.00: qc timeout (cmd 0x27)
Sep 8 00:06:30 mirakel kernel: ata1.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:06:30 mirakel kernel: ata1.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:06:30 mirakel kernel: ata1: failed to recover some devices, retrying in 5 secs
Sep 8 00:06:31 mirakel kernel: ata2.00: qc timeout (cmd 0x27)
Sep 8 00:06:31 mirakel kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:06:31 mirakel kernel: ata2.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:06:31 mirakel kernel: ata2: failed to recover some devices, retrying in 5 secs
Sep 8 00:06:35 mirakel kernel: ata1: hard resetting port
Sep 8 00:06:35 mirakel kernel: ata1: SRST failed (errno=-19)
Sep 8 00:06:35 mirakel kernel: ata1: reset failed (errno=-19), retrying in 10 secs
Sep 8 00:06:36 mirakel kernel: ata2: hard resetting port
Sep 8 00:06:36 mirakel kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:06:45 mirakel kernel: ata1: hard resetting port
Sep 8 00:06:45 mirakel kernel: ata1: SRST failed (errno=-19)
Sep 8 00:06:45 mirakel kernel: ata1: reset failed (errno=-19), retrying in 10 secs
Sep 8 00:06:55 mirakel kernel: ata1: hard resetting port
Sep 8 00:06:55 mirakel kernel: ata1: SRST failed (errno=-19)
Sep 8 00:06:55 mirakel kernel: ata1: reset failed (errno=-19), retrying in 35 secs
Sep 8 00:07:06 mirakel kernel: ata2.00: qc timeout (cmd 0x27)
Sep 8 00:07:06 mirakel kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:07:06 mirakel kernel: ata2.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:07:06 mirakel kernel: ata2.00: limiting speed to UDMA/133:PIO3
Sep 8 00:07:06 mirakel kernel: ata2: failed to recover some devices, retrying in 5 secs
Sep 8 00:07:11 mirakel kernel: ata2: hard resetting port
Sep 8 00:07:12 mirakel kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:07:30 mirakel kernel: ata1: hard resetting port
Sep 8 00:07:30 mirakel kernel: ata1: SRST failed (errno=-19)
Sep 8 00:07:30 mirakel kernel: ata1: reset failed, giving up
Sep 8 00:07:30 mirakel kernel: ata1.00: disabled
Sep 8 00:07:30 mirakel kernel: ata1: EH complete
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:30 mirakel kernel: end_request: I/O error, dev sda, sector 488407879
Sep 8 00:07:30 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep 8 00:07:30 mirakel kernel: raid5: Disk failure on dm-0, disabling device. Operation continuing on 7 devices
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:30 mirakel kernel: end_request: I/O error, dev sda, sector 141263543
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:30 mirakel kernel: end_request: I/O error, dev sda, sector 4560055
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] READ CAPACITY failed
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Sense not available.
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Write Protect is off
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Asking for cache data failed
Sep 8 00:07:30 mirakel kernel: sd 0:0:0:0: [sda] Assuming drive cache: write through
Sep 8 00:07:42 mirakel kernel: ata2.00: qc timeout (cmd 0x27)
Sep 8 00:07:42 mirakel kernel: ata2.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:07:42 mirakel kernel: ata2.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:07:42 mirakel kernel: ata2.00: disabled
Sep 8 00:07:42 mirakel kernel: ata2: EH complete
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:42 mirakel kernel: end_request: I/O error, dev sdb, sector 141520599
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:42 mirakel kernel: end_request: I/O error, dev sdb, sector 141671879
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:42 mirakel kernel: end_request: I/O error, dev sdb, sector 488407879
Sep 8 00:07:42 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep 8 00:07:42 mirakel kernel: raid5: Disk failure on dm-1, disabling device. Operation continuing on 6 devices
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] READ CAPACITY failed
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Sense not available.
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Write Protect is off
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Asking for cache data failed
Sep 8 00:07:42 mirakel kernel: sd 1:0:0:0: [sdb] Assuming drive cache: write through
Sep 8 00:08:12 mirakel kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 8 00:08:12 mirakel kernel: ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0
Sep 8 00:08:12 mirakel kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 8 00:08:13 mirakel kernel: ata3: soft resetting port
Sep 8 00:08:13 mirakel kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:08:42 mirakel kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 8 00:08:42 mirakel kernel: ata4.00: cmd 35/00:08:bf:44:1c/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 out
Sep 8 00:08:42 mirakel kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 8 00:08:43 mirakel kernel: ata4: soft resetting port
Sep 8 00:08:43 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:08:43 mirakel kernel: ata3.00: qc timeout (cmd 0x27)
Sep 8 00:08:43 mirakel kernel: ata3.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:08:43 mirakel kernel: ata3.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:08:43 mirakel kernel: ata3: failed to recover some devices, retrying in 5 secs
Sep 8 00:08:48 mirakel kernel: ata3: hard resetting port
Sep 8 00:08:48 mirakel kernel: ata3: SRST failed (errno=-19)
Sep 8 00:08:48 mirakel kernel: ata3: reset failed (errno=-19), retrying in 10 secs
Sep 8 00:08:58 mirakel kernel: ata3: hard resetting port
Sep 8 00:08:58 mirakel kernel: ata3: SRST failed (errno=-19)
Sep 8 00:08:58 mirakel kernel: ata3: reset failed (errno=-19), retrying in 10 secs
Sep 8 00:09:08 mirakel kernel: ata3: hard resetting port
Sep 8 00:09:08 mirakel kernel: ata3: SRST failed (errno=-19)
Sep 8 00:09:08 mirakel kernel: ata3: reset failed (errno=-19), retrying in 35 secs
Sep 8 00:09:13 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep 8 00:09:13 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:09:13 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:09:13 mirakel kernel: ata4: failed to recover some devices, retrying in 5 secs
Sep 8 00:09:18 mirakel kernel: ata4: hard resetting port
Sep 8 00:09:18 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:09:43 mirakel kernel: ata3: hard resetting port
Sep 8 00:09:43 mirakel kernel: ata3: SRST failed (errno=-19)
Sep 8 00:09:43 mirakel kernel: ata3: reset failed, giving up
Sep 8 00:09:43 mirakel kernel: ata3.00: disabled
Sep 8 00:09:43 mirakel kernel: ata3: EH complete
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:09:43 mirakel kernel: end_request: I/O error, dev sdc, sector 488391871
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] READ CAPACITY failed
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Sense not available.
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Write Protect is off
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Asking for cache data failed
Sep 8 00:09:43 mirakel kernel: sd 2:0:0:0: [sdc] Assuming drive cache: write through
Sep 8 00:09:43 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep 8 00:09:43 mirakel kernel: raid5: Disk failure on sdc1, disabling device. Operation continuing on 5 devices
Sep 8 00:09:48 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep 8 00:09:48 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:09:48 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:09:48 mirakel kernel: ata4.00: limiting speed to UDMA/133:PIO3
Sep 8 00:09:48 mirakel kernel: ata4: failed to recover some devices, retrying in 5 secs
Sep 8 00:09:53 mirakel kernel: ata4: hard resetting port
Sep 8 00:09:54 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 8 00:10:24 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep 8 00:10:24 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 8 00:10:24 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep 8 00:10:24 mirakel kernel: ata4.00: disabled
Sep 8 00:10:25 mirakel kernel: ata4: EH complete
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:10:25 mirakel kernel: end_request: I/O error, dev sdd, sector 488391871
Sep 8 00:10:25 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep 8 00:10:25 mirakel kernel: raid5: Disk failure on sdd1, disabling device. Operation continuing on 4 devices
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] READ CAPACITY failed
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Sense not available.
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Write Protect is off
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Asking for cache data failed
Sep 8 00:10:25 mirakel kernel: sd 3:0:0:0: [sdd] Assuming drive cache: write through
Sep 8 00:10:25 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:25 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716576
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:25 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716499
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716500
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 123716501
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 6175
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: Aborting journal on device md0.
Sep 8 00:10:25 mirakel kernel: EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted
Sep 8 00:10:25 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:25 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:25 mirakel kernel: disk 4, o:0, dev:dm-0
Sep 8 00:10:25 mirakel kernel: disk 5, o:0, dev:sdc1
Sep 8 00:10:25 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:25 mirakel kernel: disk 7, o:0, dev:sdd1
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 0
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: EXT3-fs error (device md0) in ext3_dirty_inode: Journal has aborted
Sep 8 00:10:25 mirakel kernel: Buffer I/O error on device md0, logical block 0
Sep 8 00:10:25 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:25 mirakel kernel: EXT3-fs error (device md0) in ext3_free_blocks_sb: Journal has aborted
Sep 8 00:10:26 mirakel kernel: ext3_abort called.
Sep 8 00:10:26 mirakel kernel: EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Sep 8 00:10:26 mirakel kernel: Remounting filesystem read-only
Sep 8 00:10:26 mirakel kernel: Buffer I/O error on device md0, logical block 123686376
Sep 8 00:10:26 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:26 mirakel kernel: Buffer I/O error on device md0, logical block 123689709
Sep 8 00:10:26 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:26 mirakel kernel: Buffer I/O error on device md0, logical block 123689744
Sep 8 00:10:26 mirakel kernel: lost page write due to I/O error on md0
Sep 8 00:10:26 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:26 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:26 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:26 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:26 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:26 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:26 mirakel kernel: disk 4, o:0, dev:dm-0
Sep 8 00:10:26 mirakel kernel: disk 5, o:0, dev:sdc1
Sep 8 00:10:26 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:26 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:26 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:26 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:26 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:26 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:26 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:26 mirakel kernel: disk 4, o:0, dev:dm-0
Sep 8 00:10:26 mirakel kernel: disk 5, o:0, dev:sdc1
Sep 8 00:10:26 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:26 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:26 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:26 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:26 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:26 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:26 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:27 mirakel kernel: disk 4, o:0, dev:dm-0
Sep 8 00:10:27 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:27 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:27 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:27 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:27 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:27 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:27 mirakel kernel: disk 4, o:0, dev:dm-0
Sep 8 00:10:27 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:27 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:27 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:27 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:27 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:27 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:27 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:27 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:27 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:27 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:27 mirakel kernel: disk 2, o:0, dev:dm-1
Sep 8 00:10:27 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:27 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:27 mirakel kernel: RAID5 conf printout:
Sep 8 00:10:27 mirakel kernel: --- rd:8 wd:4
Sep 8 00:10:27 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 8 00:10:27 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 8 00:10:27 mirakel kernel: disk 3, o:1, dev:hds1
Sep 8 00:10:27 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 8 00:10:27 mirakel kernel: EXT3-fs error (device md0): ext3_readdir: directory #126337 contains a hole at offset 4096


Attachments:
sata_nv-error.log (16.71 kB)
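
For anyone triaging a log like the one above, here is a minimal Python sketch that groups the failed commands by ATA port and names the opcodes. The opcode names are the standard ATA command names; the function name `summarize` and the result layout are made up for illustration, and the regular expressions only assume the libata message format seen in this thread.

```python
import re

# ATA opcodes that appear in this log, with their standard command names.
ATA_OPCODES = {
    0x27: "READ NATIVE MAX ADDRESS EXT",
    0x35: "WRITE DMA EXT",
    0xC8: "READ DMA",
    0xEA: "FLUSH CACHE EXT",
}

# Patterns for the libata lines seen in the log (format assumed from this thread).
CMD_RE = re.compile(r"(ata\d+)\.\d+: cmd ([0-9a-f]{2})/")
QC_RE = re.compile(r"(ata\d+)\.\d+: qc timeout \(cmd 0x([0-9a-f]{2})\)")
DISABLED_RE = re.compile(r"(ata\d+)\.\d+: disabled")


def summarize(lines):
    """Return {port: {"failed_cmds": [...], "disabled": bool}} in first-seen order."""
    ports = {}
    for line in lines:
        m = CMD_RE.search(line) or QC_RE.search(line)
        if m:
            entry = ports.setdefault(m.group(1), {"failed_cmds": [], "disabled": False})
            opcode = int(m.group(2), 16)
            entry["failed_cmds"].append(ATA_OPCODES.get(opcode, hex(opcode)))
            continue
        m = DISABLED_RE.search(line)
        if m:
            entry = ports.setdefault(m.group(1), {"failed_cmds": [], "disabled": False})
            entry["disabled"] = True
    return ports


# A few lines copied from the log above.
sample = [
    "Sep 8 00:08:12 mirakel kernel: ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0",
    "Sep 8 00:08:42 mirakel kernel: ata4.00: cmd 35/00:08:bf:44:1c/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 out",
    "Sep 8 00:08:43 mirakel kernel: ata3.00: qc timeout (cmd 0x27)",
    "Sep 8 00:09:43 mirakel kernel: ata3.00: disabled",
]

for port, info in summarize(sample).items():
    print(port, info)
```

Fed the whole attachment, this kind of summary makes it easy to see that several ports fail within minutes of each other rather than one port failing repeatedly.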

2007-09-13 19:54:38

by Jeff Garzik

Subject: Re: sata_nv issues with MCP51 SATA controller

Jon Ivar Rykkelid wrote:
> Hi,
>
> I now tested with the adma=0 option, but if anything I got a crash
> quicker than before. Same error message started coming in, but this time
> the system hung before I was able to capture the log as well (but I saw
> the error, and it was the same as before, except that this time it was
> the ata3-channel that first started acting up..) - To remind you all
> what this is about, I have reattached the log that I originally captured...

Sounds like a hardware problem, since disabling ADMA is generally the
cure-all we use -- it appears to stress the hardware less.

Jeff



2007-09-13 21:15:44

by Jon Ivar Rykkelid

Subject: Re: sata_nv issues with MCP51 SATA controller

Is this the general opinion? - Should I try to get a replacement
motherboard of the same type?

If so, can anyone confirm that the sata_nv driver works with the
Gigabyte GA-N650SLI-DS4 motherboard at all? Has anyone been successful
with this MB? How about the MCP51 SATA controller? Can anyone confirm
that the driver works with this HW? I would feel awkward claiming a
warranty replacement if it turns out that the HW is OK after all and
the problem is with the Linux driver...

BR
Jon Ivar

Jeff Garzik wrote:
> Jon Ivar Rykkelid wrote:
>> Hi,
>>
>> I now tested with the adma=0 option, but if anything I got a crash
>> quicker than before. Same error message started coming in, but this
>> time the system hung before I was able to capture the log as well
>> (but I saw the error, and it was the same as before, except that this
>> time it was the ata3-channel that first started acting up..) - To
>> remind you all what this is about, I have reattached the log that I
>> originally captured...
>
> Sounds like a hardware problem, since disabling ADMA is generally the
> cure-all we use -- it appears to stress the hardware less.
>
> Jeff


--
Jon Ivar Rykkelid Web: http://www.pvv.org/~jonry
Enromvegen 191 Phone: +47 72 56 86 86
N-7026 Trondheim Mob.: +47 906 20 250
Norway Email: [email protected]

2007-09-14 00:38:19

by Robert Hancock

Subject: Re: sata_nv issues with MCP51 SATA controller

Jeff Garzik wrote:
> Jon Ivar Rykkelid wrote:
>> Hi,
>>
>> I now tested with the adma=0 option, but if anything I got a crash
>> quicker than before. Same error message started coming in, but this
>> time the system hung before I was able to capture the log as well (but
>> I saw the error, and it was the same as before, except that this time
>> it was the ata3-channel that first started acting up..) - To remind
>> you all what this is about, I have reattached the log that I
>> originally captured...
>
> Sounds like a hardware problem, since disabling ADMA is generally the
> cure-all we use -- it appears to stress the hardware less.

If this is an MCP51 chipset, adma=0 will make no difference since that
chipset does not support ADMA in the first place.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2007-09-14 12:10:35

by Jon Ivar Rykkelid

Subject: Re: sata_nv issues with MCP51 SATA controller

Hi,

To eliminate the possibility of this being a hardware issue, I have now
acquired another "Gigabyte GA-N650SLI-DS4" motherboard (with the "MCP51"
chipset) for testing. I'll swap parts this evening. Hopefully I'll be
able to tell you in a few hours whether this appears to be working as it
should. The motherboard that I'm going to swap to has actually been
tested (with MS Windows OS+driver) for more than a day with a disk
connected, so if this MB also fails, I think it will be safe to say that
the issue is with the sata_nv driver... So hang on.

(Can you think of anything else that could conflict with the sata_nv
driver after a while, such as two of my RAID disks being encrypted,
my running a SW RAID-5 array, some special HW (quad-core CPU), or my
running VMware on this server? To me, all of these seem rather
far-fetched, especially as everything works with another controller.
So I'd argue that unless there's a HW issue, the issue is with the
driver, but you're the experts, so let me know if you disagree.)

I'll keep you posted as to the result of swapping HW.. Give me a few
hours. :-)

BR
Jon Ivar

Robert Hancock wrote:
> Jeff Garzik wrote:
>> Jon Ivar Rykkelid wrote:
>>> Hi,
>>>
>>> I now tested with the adma=0 option, but if anything I got a crash
>>> quicker than before. Same error message started coming in, but this
>>> time the system hung before I was able to capture the log as well
>>> (but I saw the error, and it was the same as before, except that
>>> this time it was the ata3-channel that first started acting up..) -
>>> To remind you all what this is about, I have reattached the log that
>>> I originally captured...
>>
>> Sounds like a hardware problem, since disabling ADMA is generally the
>> cure-all we use -- it appears to stress the hardware less.
>
> If this is an MCP51 chipset, adma=0 will make no difference since that
> chipset does not support ADMA in the first place.
>

2007-09-14 13:28:57

by Prakash Punnoor

Subject: Re: sata_nv issues with MCP51 SATA controller

On the day of Thursday 13 September 2007 Jon Ivar Rykkelid hast written:
> Resending, as my first attempts contained HTML and were blocked...
>
> Tejun Heo wrote:
> > Jon Ivar Rykkelid wrote:
> >> Thanks for the suggestion, but sata_nv is not built modular in my
> >> current kernel, so "no can do" at the moment
> >> (However, if some expert REALLY thinks this will fix things, I will
> >> CERTAINLY recompile and give it a go)
> >
> > Passing "sata_nv.adma=0" as kernel boot parameter will do the trick.
>
> Ahh, silly me... Of course!
> Ooops, I just got back, and verified: I actually have sata_nv running as
> a module after all on this server... My bad.
> I fixed /etc/modprobe.conf to include the following two lines:
> "
> alias scsi_hostadapter sata_nv
> options sata_nv adma=0
> ...
> "

I don't think it will matter, as adma doesn't affect MCP51, but only nforce4.
So I'd look for other trouble makers.
--
(?= =?)
//\ Prakash Punnoor /\\
V_/ \_V
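
The two ways of setting the option mentioned in this message (an `options` line in /etc/modprobe.conf for a modular driver, or `sata_nv.adma=0` on the kernel command line for a built-in one) follow the kernel's general `module.parameter=value` convention. A minimal Python sketch of the mapping, with a hypothetical helper name:

```python
def boot_params_from_options(line):
    """Convert 'options <module> k=v ...' into ['<module>.k=v', ...].

    Hypothetical helper for illustration; it only rewrites the text and
    does not check that the module or parameter exists.
    """
    fields = line.split()
    if len(fields) < 3 or fields[0] != "options":
        raise ValueError("not a modprobe 'options' line: %r" % line)
    module = fields[1]
    return ["%s.%s" % (module, param) for param in fields[2:]]


# The line from the quoted /etc/modprobe.conf.
print(boot_params_from_options("options sata_nv adma=0"))
```

As noted in the reply above, the setting itself is not expected to change anything on MCP51; the sketch only shows how the two spellings relate.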



2007-09-14 14:17:31

by Jon Ivar Rykkelid

Subject: Re: sata_nv issues with MCP51 SATA controller

Prakash Punnoor wrote:
> I don't think it will matter, as adma doesn't affect MCP51, but only nforce4.
> So I'd look for other trouble makers.
>
Robert told me. (And you're correct - It didn't help).

I'm going to test another (identical) motherboard this evening to
establish whether it could be a HW-issue.

I'll keep you posted

Jon Ivar

2007-09-14 14:25:32

by Jeff Garzik

Subject: Re: sata_nv issues with MCP51 SATA controller

Jon Ivar Rykkelid wrote:
> Prakash Punnoor wrote:
>> I don't think it will matter, as adma doesn't affect MCP51, but only
>> nforce4. So I'd look for other trouble makers.
>>
> Robert told me. (And you're correct - It didn't help).

Yes, it was already in slow-and-safe mode.


> I'm going to test another (identical) motherboard this evening to
> establish whether it could be a HW-issue.

Not just motherboard. It is more likely to be a cable, drive or PSU
problem.

Jeff


2007-09-14 14:41:17

by Tejun Heo

Subject: Re: sata_nv issues with MCP51 SATA controller

Jeff Garzik wrote:
> Jon Ivar Rykkelid wrote:
>> Prakash Punnoor wrote:
>>> I don't think it will matter, as adma doesn't affect MCP51, but only
>>> nforce4. So I'd look for other trouble makers.
>>>
>> Robert told me. (And you're correct - It didn't help).
>
> Yes, it was already in slow-and-safe mode.
>
>
>> I'm going to test another (identical) motherboard this evening to
>> establish whether it could be a HW-issue.
>
> Not just motherboard. It is more likely to be a cable, drive or PSU
> problem.

I don't think it's cable as the problem occurs on multiple ports. My
bet is either the controller or PSU.

Thanks.

--
tejun
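
As background for the cable-versus-controller discussion, the logs in this thread report "SATA link up 1.5 Gbps (SStatus 113 SControl 300)" after each soft or hard reset that succeeds. A minimal Python sketch for decoding the SStatus value, assuming the standard SATA SStatus layout (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8) and that libata prints the register in hexadecimal; the function name is made up for illustration:

```python
# Field values per the SATA SStatus register definition (only common ones listed).
DET = {
    0x0: "no device detected, PHY communication not established",
    0x1: "device detected, PHY communication not established",
    0x3: "device detected, PHY communication established",
    0x4: "PHY offline",
}
SPD = {
    0x0: "no negotiated speed",
    0x1: "Gen1 (1.5 Gbps)",
    0x2: "Gen2 (3.0 Gbps)",
}
IPM = {
    0x0: "device not present or communication not established",
    0x1: "active",
    0x2: "partial power management state",
    0x6: "slumber power management state",
}


def decode_sstatus(value):
    """Split an SStatus register value into its DET, SPD and IPM fields."""
    return {
        "DET": DET.get(value & 0xF, "reserved"),
        "SPD": SPD.get((value >> 4) & 0xF, "reserved"),
        "IPM": IPM.get((value >> 8) & 0xF, "reserved"),
    }


# "SStatus 113" from the log, read as hexadecimal 0x113.
for field, meaning in decode_sstatus(0x113).items():
    print(field, "=", meaning)
```

Read this way, the value says the PHY sees a device and has an active 1.5 Gbps link, even though the commands sent over that link time out afterwards.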

2007-09-14 20:35:48

by Jon Ivar Rykkelid

Subject: Re: sata_nv issues with MCP51 SATA controller

Sep 14 20:09:15 mirakel kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 14 20:09:15 mirakel kernel: ata3.00: cmd 35/00:08:bf:44:1c/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 out
Sep 14 20:09:15 mirakel kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 14 20:09:15 mirakel kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Sep 14 20:09:15 mirakel kernel: ata4.00: cmd 35/00:08:bf:44:1c/00:00:1d:00:00/e0 tag 0 cdb 0x0 data 4096 out
Sep 14 20:09:15 mirakel kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 14 20:09:16 mirakel kernel: ata3: soft resetting port
Sep 14 20:09:16 mirakel kernel: ata4: soft resetting port
Sep 14 20:09:16 mirakel kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 14 20:09:16 mirakel kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 14 20:09:46 mirakel kernel: ata3.00: qc timeout (cmd 0x27)
Sep 14 20:09:46 mirakel kernel: ata3.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 14 20:09:46 mirakel kernel: ata3.00: failed to set xfermode (err_mask=0x40)
Sep 14 20:09:46 mirakel kernel: ata3: failed to recover some devices, retrying in 5 secs
Sep 14 20:09:46 mirakel kernel: ata4.00: qc timeout (cmd 0x27)
Sep 14 20:09:46 mirakel kernel: ata4.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 14 20:09:46 mirakel kernel: ata4.00: failed to set xfermode (err_mask=0x40)
Sep 14 20:09:46 mirakel kernel: ata4: failed to recover some devices, retrying in 5 secs
Sep 14 20:09:51 mirakel kernel: ata3: hard resetting port
Sep 14 20:09:51 mirakel kernel: ata4: hard resetting port
Sep 14 20:09:51 mirakel kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 14 20:09:51 mirakel kernel: ata4: SRST failed (errno=-19)
Sep 14 20:09:51 mirakel kernel: ata4: reset failed (errno=-19), retrying in 10 secs
Sep 14 20:10:01 mirakel kernel: ata4: hard resetting port
Sep 14 20:10:01 mirakel kernel: ata4: SRST failed (errno=-19)
Sep 14 20:10:01 mirakel kernel: ata4: reset failed (errno=-19), retrying in 10 secs
Sep 14 20:10:11 mirakel kernel: ata4: hard resetting port
Sep 14 20:10:11 mirakel kernel: ata4: SRST failed (errno=-19)
Sep 14 20:10:11 mirakel kernel: ata4: reset failed (errno=-19), retrying in 35 secs
Sep 14 20:10:21 mirakel kernel: ata3.00: qc timeout (cmd 0x27)
Sep 14 20:10:21 mirakel kernel: ata3.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 14 20:10:21 mirakel kernel: ata3.00: failed to set xfermode (err_mask=0x40)
Sep 14 20:10:21 mirakel kernel: ata3.00: limiting speed to UDMA/133:PIO3
Sep 14 20:10:21 mirakel kernel: ata3: failed to recover some devices, retrying in 5 secs
Sep 14 20:10:26 mirakel kernel: ata3: hard resetting port
Sep 14 20:10:27 mirakel kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 14 20:10:46 mirakel kernel: ata4: hard resetting port
Sep 14 20:10:46 mirakel kernel: ata4: SRST failed (errno=-19)
Sep 14 20:10:46 mirakel kernel: ata4: reset failed, giving up
Sep 14 20:10:46 mirakel kernel: ata4.00: disabled
Sep 14 20:10:46 mirakel kernel: ata4: EH complete
Sep 14 20:10:46 mirakel kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 14 20:10:46 mirakel kernel: end_request: I/O error, dev sdd, sector 488391871
Sep 14 20:10:46 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep 14 20:10:46 mirakel kernel: raid5: Disk failure on sdd1, disabling device. Operation continuing on 7 devices
Sep 14 20:10:57 mirakel kernel: ata3.00: qc timeout (cmd 0x27)
Sep 14 20:10:57 mirakel kernel: ata3.00: ata_hpa_resize 1: hpa sectors (0) is smaller than sectors (490234752)
Sep 14 20:10:57 mirakel kernel: ata3.00: failed to set xfermode (err_mask=0x40)
Sep 14 20:10:57 mirakel kernel: ata3.00: disabled
Sep 14 20:10:58 mirakel kernel: ata3: EH complete
Sep 14 20:10:58 mirakel kernel: sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 14 20:10:58 mirakel kernel: end_request: I/O error, dev sdc, sector 488391871
Sep 14 20:10:58 mirakel kernel: sd 2:0:0:0: [sdc] READ CAPACITY failed
Sep 14 20:10:58 mirakel kernel: sd 2:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 14 20:10:58 mirakel kernel: sd 2:0:0:0: [sdc] Sense not available.
Sep 14 20:10:58 mirakel kernel: sd 2:0:0:0: [sdc] Write Protect is off
Sep 14 20:10:58 mirakel kernel: sd 2:0:0:0: [sdc] Asking for cache data failed
Sep 14 20:10:58 mirakel kernel: sd 2:0:0:0: [sdc] Assuming drive cache: write through
Sep 14 20:10:58 mirakel kernel: md: super_written gets error=-5, uptodate=0
Sep 14 20:10:58 mirakel kernel: raid5: Disk failure on sdc1, disabling device. Operation continuing on 6 devices
Sep 14 20:10:58 mirakel kernel: RAID5 conf printout:
Sep 14 20:10:58 mirakel kernel: Buffer I/O error on device md0, logical block 119194013
Sep 14 20:10:58 mirakel kernel: lost page write due to I/O error on md0
Sep 14 20:10:58 mirakel kernel: Buffer I/O error on device md0, logical block 119194014
Sep 14 20:10:58 mirakel kernel: lost page write due to I/O error on md0
Sep 14 20:10:58 mirakel kernel: Buffer I/O error on device md0, logical block 6660
Sep 14 20:10:58 mirakel kernel: lost page write due to I/O error on md0
Sep 14 20:10:58 mirakel kernel: Aborting journal on device md0.
Sep 14 20:10:58 mirakel kernel: --- rd:8 wd:6
Sep 14 20:10:58 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 14 20:10:58 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 14 20:10:58 mirakel kernel: disk 2, o:1, dev:dm-0
Sep 14 20:10:58 mirakel kernel: disk 3, o:1, dev:hds1
Sep 14 20:10:58 mirakel kernel: disk 4, o:1, dev:dm-1
Sep 14 20:10:58 mirakel kernel: disk 5, o:0, dev:sdd1
Sep 14 20:10:58 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 14 20:10:58 mirakel kernel: disk 7, o:0, dev:sdc1
Sep 14 20:10:58 mirakel kernel: ext3_abort called.
Sep 14 20:10:58 mirakel kernel: EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Sep 14 20:10:58 mirakel kernel: Remounting filesystem read-only
Sep 14 20:10:58 mirakel kernel: RAID5 conf printout:
Sep 14 20:10:58 mirakel kernel: --- rd:8 wd:6
Sep 14 20:10:58 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 14 20:10:58 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 14 20:10:58 mirakel kernel: disk 2, o:1, dev:dm-0
Sep 14 20:10:58 mirakel kernel: disk 3, o:1, dev:hds1
Sep 14 20:10:58 mirakel kernel: disk 4, o:1, dev:dm-1
Sep 14 20:10:58 mirakel kernel: disk 5, o:0, dev:sdd1
Sep 14 20:10:58 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 14 20:10:58 mirakel kernel: RAID5 conf printout:
Sep 14 20:10:58 mirakel kernel: --- rd:8 wd:6
Sep 14 20:10:58 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 14 20:10:58 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 14 20:10:58 mirakel kernel: disk 2, o:1, dev:dm-0
Sep 14 20:10:58 mirakel kernel: disk 3, o:1, dev:hds1
Sep 14 20:10:58 mirakel kernel: disk 4, o:1, dev:dm-1
Sep 14 20:10:58 mirakel kernel: disk 5, o:0, dev:sdd1
Sep 14 20:10:58 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 14 20:10:58 mirakel kernel: RAID5 conf printout:
Sep 14 20:10:58 mirakel kernel: --- rd:8 wd:6
Sep 14 20:10:58 mirakel kernel: disk 0, o:1, dev:hdg1
Sep 14 20:10:58 mirakel kernel: disk 1, o:1, dev:hdo1
Sep 14 20:10:58 mirakel kernel: disk 2, o:1, dev:dm-0
Sep 14 20:10:58 mirakel kernel: disk 3, o:1, dev:hds1
Sep 14 20:10:58 mirakel kernel: disk 4, o:1, dev:dm-1
Sep 14 20:10:58 mirakel kernel: disk 6, o:1, dev:hdk1
Sep 14 20:10:59 mirakel kernel: Buffer I/O error on device md0, logical block 29
Sep 14 20:10:59 mirakel kernel: lost page write due to I/O error on md0
Sep 14 20:10:59 mirakel kernel: Buffer I/O error on device md0, logical block 30
Sep 14 20:10:59 mirakel kernel: lost page write due to I/O error on md0
Sep 14 20:10:59 mirakel kernel: Buffer I/O error on device md0, logical block 119177216
Sep 14 20:10:59 mirakel kernel: lost page write due to I/O error on md0
Sep 14 20:10:59 mirakel kernel: Buffer I/O error on device md0, logical block 119180640
Sep 14 20:10:59 mirakel kernel: lost page write due to I/O error on md0
Sep 14 20:10:59 mirakel kernel: Buffer I/O error on device md0, logical block 119180739
Sep 14 20:10:59 mirakel kernel: lost page write due to I/O error on md0
Sep 14 20:10:59 mirakel kernel: Buffer I/O error on device md0, logical block 119193951
Sep 14 20:10:59 mirakel kernel: lost page write due to I/O error on md0
Sep 14 20:10:59 mirakel kernel: Buffer I/O error on device md0, logical block 119193953
Sep 14 20:10:59 mirakel kernel: lost page write due to I/O error on md0


Attachments:
sata_nv-new.log (8.41 kB)
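
To read the "RAID5 conf printout" blocks in the log above, here is a minimal Python sketch. It assumes that "rd" is the number of member disks the array is configured for, "wd" the number still working, and "o:1"/"o:0" whether a member is operational, which is how the md driver appears to print them; the function names are made up for illustration.

```python
import re

HEADER_RE = re.compile(r"--- rd:(\d+) wd:(\d+)")
DISK_RE = re.compile(r"disk (\d+), o:([01]), dev:(\S+)")


def parse_conf(lines):
    """Return (raid_disks, working_disks, {slot: (device, operational)})."""
    raid_disks = working = None
    disks = {}
    for line in lines:
        m = HEADER_RE.search(line)
        if m:
            raid_disks, working = int(m.group(1)), int(m.group(2))
            continue
        m = DISK_RE.search(line)
        if m:
            disks[int(m.group(1))] = (m.group(3), m.group(2) == "1")
    return raid_disks, working, disks


def raid5_survives(raid_disks, working):
    """RAID5 keeps one disk's worth of parity, so it tolerates one missing member."""
    return raid_disks - working <= 1


# Lines copied from the first complete printout in the log above.
sample = [
    "Sep 14 20:10:58 mirakel kernel: --- rd:8 wd:6",
    "Sep 14 20:10:58 mirakel kernel: disk 0, o:1, dev:hdg1",
    "Sep 14 20:10:58 mirakel kernel: disk 1, o:1, dev:hdo1",
    "Sep 14 20:10:58 mirakel kernel: disk 2, o:1, dev:dm-0",
    "Sep 14 20:10:58 mirakel kernel: disk 3, o:1, dev:hds1",
    "Sep 14 20:10:58 mirakel kernel: disk 4, o:1, dev:dm-1",
    "Sep 14 20:10:58 mirakel kernel: disk 5, o:0, dev:sdd1",
    "Sep 14 20:10:58 mirakel kernel: disk 6, o:1, dev:hdk1",
    "Sep 14 20:10:58 mirakel kernel: disk 7, o:0, dev:sdc1",
]

raid_disks, working, disks = parse_conf(sample)
failed = [dev for dev, ok in disks.values() if not ok]
print("failed members:", failed)
print("array still usable:", raid5_survives(raid_disks, working))
```

With two members gone out of eight, the array is past what RAID5 can absorb, which matches the journal abort and read-only remount that follow in the log.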

2007-09-15 07:11:54

by Prakash Punnoor

Subject: Re: sata_nv issues with MCP51 SATA controller

On the day of Friday 14 September 2007 Jon Ivar Rykkelid hast written:
> Hi, I'm getting more confident that the driver is the issue.
>
>
> (Or has anyone EVER been successful with the latest kernel/driver on
> this HW)?

I don't have exactly the same hw, but the same chipset and I don't have any
problems - even with the swncq patch applied. Do you have an hpet? If not,
try booting with acpi_use_time_override. My system won't work with skipping
the override.

--
(?= =?)
//\ Prakash Punnoor /\\
V_/ \_V



2007-09-15 10:14:46

by Jon Ivar Rykkelid

Subject: Re: sata_nv issues with MCP51 SATA controller

Prakash Punnoor wrote:
> I don't have exactly the same hw, but the same chipset and I don't have any
> problems - even with the swncq patch applied. Do you have an hpet? If not,
> try booting with acpi_use_time_override. My system won't work with skipping
> the override.
>
>
Hi, I reconnected and rebooted with the kernel option
"acpi_use_timer_override" (this is the correct spelling, isn't it? The
kernel didn't complain). It didn't help; I got the same error as
before. I'll have to connect all disks back to my PCI-connected SATA
controllers and start rebuilding my RAID yet again.

It seems random which disk is affected first. (So far, I know that it
has happened to ata1, ata3 and ata4, three of my potential disks.) I
guess it just happens to whichever disk is in use at the moment when
the driver / controller acts up.

I'm about to give in. I think I'll try to replace both (Gigabyte
GA-N650SLI-DS4) motherboards, as the driver simply isn't working for
the on-board controller of these boards. It could of course be a
combination of the controllers and some other HW on the motherboards,
but everything works when I connect all disks to my non-nvidia
controllers. I guess I'll opt for a motherboard with an Intel chipset
after all...

BR
Jon Ivar

2007-09-15 14:48:17

by John Stoffel

Subject: Re: sata_nv issues with MCP51 SATA controller

>>>>> "Jon" == Jon Ivar Rykkelid <[email protected]> writes:

Jon> Prakash Punnoor wrote:
>> I don't have exactly the same hw, but the same chipset and I don't have any
>> problems - even with the swncq patch applied. Do you have an hpet? If not,
>> try booting with acpi_use_time_override. My system won't work with skipping
>> the override.

Jon> Hi , I reconnected and rebooted with the kernel option
Jon> "acpi_use_timer_override" (this is the correct spelling, isn't
Jon> it? - Kernel didn't complain.). Didn't help, the same error
Jon> received as before. - I'll have to connect all disks back to my
Jon> PCI-connected SATA controllers and start rebuilding my RAID yet
Jon> again.

What happens when you just have ONE disk connected to the motherboard
controller, and the rest connected to PCI controllers? Does it crap
out then? You've got such a nice repeatable problem across
motherboards that it's a shame to waste this debugging time.

I'm wondering if it's a PCI bus issue somehow, and that the load on
the motherboard controller isn't supportable when you have a bunch of
disks on PCI controllers as well. Shot in the dark...

Thanks for all your hard work on this, I know how frustrating it is to
not have a stable system!

John

2007-09-15 19:30:18

by Jon Ivar Rykkelid

Subject: Re: sata_nv issues with MCP51 SATA controller

John Stoffel wrote:
> What happens when you just have ONE disk connected to the motherboard
> controller, and the rest connected to PCI controllers? Does it crap
> out then? You've got such a nice repeatable problem across
> motherboards that it's a shame to waste this debugging time.
>
Sorry, I gave in. I have now abandoned my nvidia trials (both
motherboards have been returned, and I'm now running an Intel chipset).
My current motherboard is less ideal (in terms of PCI slots etc.), but
on the other hand it works...
> I'm wondering if it's a PCI bus issue somehow, and that the load on
> the motherboard controller isn't supportable when you have a bunch of
> disks on PCI controllers as well. Shot in the dark...
>
That was actually not such a bad idea... Unfortunately it's too late now
(otherwise I would have tested it for sure). I was, and still am,
running an 8-disk SATA array (plus a normal IDE disk that is not in the
RAID). I had 4 disks running through two PCI cards and 4 disks on the
motherboard's controller. When all 8 disks were connected to the two
PCI cards, the speed dropped compared to when the motherboard's
controller took some of the load. (So it could maybe be an issue with
bandwidth / load? I don't know.)
> Thanks for all your hard work on this, I know how frustrating it is to
> not have a stable system!
>
Sorry for giving in, but I felt I was banging my head against the wall
(with too few sensible solutions being suggested). Now I guess I'm
semi-happy that everything seems to work OK with the Intel chipset.
It's frustrating that the sata_nv driver / nvidia HW didn't work with
my configuration, though...

Thank you all for your effort as well - hope someone figures this out
sometime in the future.

All the best
Jon Ivar
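
The speed drop with all 8 disks on the two PCI cards is at least consistent with simple bus arithmetic. The sketch below is a back-of-the-envelope estimate only, and it rests on assumptions not stated in the thread: that the cards sit on a conventional 32-bit / 33 MHz PCI bus and share it. The figures are theoretical peaks, so real throughput would be lower.

```python
# Assumptions (not from the thread): conventional 32-bit / 33 MHz PCI,
# with both SATA cards sharing the same bus.
PCI_WIDTH_BYTES = 4
PCI_CLOCK_HZ = 33_333_333


def pci_peak_mb_per_s():
    """Theoretical peak of the shared bus in MB/s (1 MB = 10**6 bytes)."""
    return PCI_WIDTH_BYTES * PCI_CLOCK_HZ / 1e6


def per_disk_share_mb_per_s(disks_on_bus):
    """Upper bound per disk if all disks on the bus stream at the same time."""
    return pci_peak_mb_per_s() / disks_on_bus


for n in (4, 8):
    print("%d disks on the bus: at most %.1f MB/s each" % (n, per_disk_share_mb_per_s(n)))
```

Under those assumptions, moving from 4 to 8 disks on the PCI cards halves the ceiling per disk, so a slower array is expected there. It says nothing about why the on-board controller timed out.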