2010-06-10 04:39:55

by alan

Subject: Question on siig sata 3 controller

Does anyone know the Linux support status of the SIIG DP SATA 6Gb/s 2S1P
PCIe card (part number: SC-SA0E12-S1)?

I am running into problems when writing large amounts of data through
this controller, and I want to find out whether there is a way to fix
this. The PCI IDs do not appear to be referenced in the kernel.

Are any of the SIIG SATA controllers supported? Is there a known issue
with their Linux support that I am not aware of?

Here is the lspci data:

05:00.0 SATA controller: Device 1b4b:9123 (rev 11) (prog-if 01 [AHCI 1.0])
	Subsystem: Device 1b4b:9123
	Flags: bus master, fast devsel, latency 0, IRQ 30
	I/O ports at dc00 [size=8]
	I/O ports at d880 [size=4]
	I/O ports at d800 [size=8]
	I/O ports at d480 [size=4]
	I/O ports at d400 [size=16]
	Memory at f9fff800 (32-bit, non-prefetchable) [size=2K]
	Expansion ROM at f9fe0000 [disabled] [size=64K]
	Capabilities: <access denied>
	Kernel driver in use: ahci

05:00.1 IDE interface: Device 1b4b:91a4 (rev 11) (prog-if 8f [Master SecP SecO PriP PriO])
	Subsystem: Device 1b4b:91a4
	Flags: fast devsel, IRQ 18
	I/O ports at d080 [size=8]
	I/O ports at d000 [size=4]
	I/O ports at cc00 [size=8]
	I/O ports at c880 [size=4]
	I/O ports at c800 [size=16]
	Memory at f9fff400 (32-bit, non-prefetchable) [size=16]
	Expansion ROM at f9fd0000 [disabled] [size=64K]
	Capabilities: <access denied>
	Kernel modules: ata_generic, pata_acpi
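The "prog-if 01 [AHCI 1.0]" on 05:00.0 is why ahci can bind even though 1b4b:9123 is not listed by ID, as Jeff points out below: ahci's PCI table has a catch-all entry that matches the class code 0x010601 (mass storage, SATA, AHCI interface) regardless of vendor/device ID. A minimal sketch that lists such devices, assuming the usual sysfs layout where /sys/bus/pci/devices/<addr>/class holds a hex string like "0x010601"; the function name and the sysfs_root parameter are illustrative only:

```python
import os

# PCI class code: mass storage (0x01), SATA (0x06), AHCI 1.0 prog-if (0x01).
AHCI_CLASS = 0x010601


def ahci_class_devices(sysfs_root="/sys"):
    """Return PCI addresses whose class code is AHCI, i.e. devices that a
    class-based match would claim even without a vendor/device ID entry."""
    base = os.path.join(sysfs_root, "bus", "pci", "devices")
    matches = []
    for addr in sorted(os.listdir(base)):
        try:
            with open(os.path.join(base, addr, "class")) as f:
                cls = int(f.read().strip(), 16)
        except (OSError, ValueError):
            continue  # skip entries without a readable class file
        if cls == AHCI_CLASS:
            matches.append(addr)
    return matches
```

On the machine above, ahci_class_devices() would be expected to include 0000:05:00.0 but not 05:00.1, whose class is IDE (prog-if 8f).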

Thanks!


2010-06-10 08:53:55

by Jeff Garzik

Subject: Re: Question on siig sata 3 controller

On 06/10/2010 12:39 AM, Alan wrote:
> Does anyone know the status of the SIIG DP SATA 6Gb/s 2S1P PCIe (Part
> number: SC-SA0E12-S1)?
>
> I am encountering problems writing a large quantity through this
> controller and I want to see if there is a way to fix this. The pci ids
> do not appear to be referenced in the kernel.
>
> Are any of the siig sata controllers supported? Is there some issue with
> them supporting Linux that I am not aware of?

What issues are you seeing?

The 'ahci' driver is aware of this controller...

Jeff


2010-06-10 16:28:34

by alan

Subject: Re: Question on siig sata 3 controller

On Thu, 10 Jun 2010, Jeff Garzik wrote:

> What issues are you seeing?
>
> The 'ahci' driver is aware of this controller...

If you write a large amount of data to the drive (about 6-8 GB or more),
the drive errors out and disconnects.

I will post the string of error messages when I get home.

--
Truth is stranger than fiction because fiction has to make sense.

2010-06-11 02:08:45

by alan

Subject: Re: Question on siig sata 3 controller

> What issues are you seeing?
>
> The 'ahci' driver is aware of this controller...

When writing large amounts of data I see messages like the following:

Jun 8 19:31:46 zowie kernel: ata2.00: exception Emask 0x0 SAct 0x3fffffff SErr 0x0 action 0x6 frozen
Jun 8 19:31:46 zowie kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Jun 8 19:31:46 zowie kernel: ata2.00: cmd 61/28:00:17:fb:06/00:00:04:00:00/40 tag 0 ncq 20480 out
Jun 8 19:31:46 zowie kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 8 19:31:46 zowie kernel: ata2.00: status: { DRDY }
Jun 8 19:31:46 zowie kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Jun 8 19:31:46 zowie kernel: ata2.00: cmd 61/20:08:9f:db:06/00:00:04:00:00/40 tag 1 ncq 16384 out
Jun 8 19:31:46 zowie kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 8 19:31:46 zowie kernel: ata2.00: status: { DRDY }
Jun 8 19:31:46 zowie kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Jun 8 19:31:46 zowie kernel: ata2.00: cmd 61/28:10:d7:df:06/00:00:04:00:00/40 tag 2 ncq 20480 out
Jun 8 19:31:46 zowie kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 8 19:31:46 zowie kernel: ata2.00: status: { DRDY }
Jun 8 19:31:46 zowie kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Jun 8 19:31:46 zowie kernel: ata2.00: cmd 61/30:18:0f:e4:06/00:00:04:00:00/40 tag 3 ncq 24576 out
Jun 8 19:31:46 zowie kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 8 19:31:46 zowie kernel: ata2.00: status: { DRDY }
Jun 8 19:31:46 zowie kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Jun 8 19:31:46 zowie kernel: ata2.00: cmd 61/28:20:17:fc:06/00:00:04:00:00/40 tag 4 ncq 20480 out
Jun 8 19:31:46 zowie kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 8 19:31:46 zowie kernel: ata2.00: status: { DRDY }
Jun 8 19:31:46 zowie kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Jun 8 19:31:46 zowie kernel: ata2.00: cmd 61/08:28:b7:b7:06/00:00:04:00:00/40 tag 5 ncq 4096 out
Jun 8 19:31:46 zowie kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 8 19:31:46 zowie kernel: ata2.00: status: { DRDY }
Jun 8 19:31:46 zowie kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Jun 8 19:31:46 zowie kernel: ata2.00: cmd 61/20:30:1f:d3:06/00:00:04:00:00/40 tag 6 ncq 16384 out
Jun 8 19:31:46 zowie kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 8 19:31:46 zowie kernel: ata2.00: status: { DRDY }
Jun 8 19:31:46 zowie kernel: ata2.00: failed command: WRITE FPDMA QUEUED
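Each "cmd" line is the ATA taskfile of a timed-out queued command. A minimal decoding sketch, assuming the field order libata uses for this report (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and 512-byte sectors. For READ/WRITE FPDMA QUEUED (0x60/0x61) the sector count sits in the FEATURE registers and the NCQ tag in bits 7:3 of the count register. The function and constant names are illustrative:

```python
import re

# Assumed field order of libata's "cmd" report:
#   command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
_BYTE5 = r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
CMD_RE = re.compile(r"cmd ([0-9a-f]{2})/" + _BYTE5 + "/" + _BYTE5 + r"/([0-9a-f]{2})")

NCQ_COMMANDS = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}
SECTOR_BYTES = 512  # assumed logical sector size


def _hex_fields(text):
    return [int(x, 16) for x in text.split(":")]


def decode_ncq_cmd(line):
    """Decode the taskfile of an NCQ READ/WRITE FPDMA QUEUED libata log line."""
    m = CMD_RE.search(line)
    if m is None:
        raise ValueError("no libata cmd taskfile found")
    command = int(m.group(1), 16)
    if command not in NCQ_COMMANDS:
        raise ValueError(f"not an FPDMA QUEUED command: {command:#04x}")
    feature, nsect, lbal, lbam, lbah = _hex_fields(m.group(2))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = _hex_fields(m.group(3))
    # FPDMA QUEUED carries the sector count in the FEATURE registers and the
    # NCQ tag in bits 7:3 of the sector-count (nsect) register.
    sectors = (hob_feature << 8) | feature
    lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)
    return {
        "command": NCQ_COMMANDS[command],
        "tag": nsect >> 3,
        "sectors": sectors,
        "bytes": sectors * SECTOR_BYTES,
        "lba": lba,
    }
```

For the tag 3 line, decode_ncq_cmd gives tag 3 and 48 sectors (24576 bytes), matching the "tag 3 ncq 24576" that libata printed, which is a quick sanity check on the assumed field order.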

After a bit it does this:

Jun 8 19:31:46 zowie kernel: ata2: hard resetting link
Jun 8 19:31:48 zowie kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jun 8 19:31:53 zowie kernel: ata2.00: qc timeout (cmd 0xec)
Jun 8 19:31:53 zowie kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jun 8 19:31:53 zowie kernel: ata2.00: revalidation failed (errno=-5)
Jun 8 19:31:53 zowie kernel: ata2: hard resetting link
Jun 8 19:31:54 zowie kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jun 8 19:32:04 zowie kernel: ata2.00: qc timeout (cmd 0xec)
Jun 8 19:32:05 zowie kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jun 8 19:32:05 zowie kernel: ata2.00: revalidation failed (errno=-5)
Jun 8 19:32:05 zowie kernel: ata2: limiting SATA link speed to 1.5 Gbps
Jun 8 19:32:05 zowie kernel: ata2: hard resetting link
Jun 8 19:32:05 zowie kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 8 19:32:35 zowie kernel: ata2.00: qc timeout (cmd 0xec)
Jun 8 19:32:36 zowie kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jun 8 19:32:36 zowie kernel: ata2.00: revalidation failed (errno=-5)
Jun 8 19:32:36 zowie kernel: ata2.00: disabled
Jun 8 19:32:36 zowie kernel: ata2.00: device reported invalid CHS sector 0
Jun 8 19:32:36 zowie kernel: ata2.00: device reported invalid CHS sector 0
Jun 8 19:32:36 zowie kernel: ata2.00: device reported invalid CHS sector 0
Jun 8 19:32:36 zowie kernel: ata2.00: device reported invalid CHS sector 0
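The "SStatus 113" in the "link up" lines is the SATA status register printed in hex. A small sketch that splits it into its fields (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8, as defined by the SATA/AHCI specifications); the dictionary labels are descriptive paraphrases, not kernel strings:

```python
# SStatus field layout per the SATA/AHCI specifications:
#   DET = bits 3:0, SPD = bits 7:4, IPM = bits 11:8.
DET = {
    0x0: "no device detected, PHY communication not established",
    0x1: "device detected, PHY communication not established",
    0x3: "device detected and PHY communication established",
    0x4: "PHY offline",
}
SPD = {0x0: "no speed negotiated", 0x1: "1.5 Gbps (Gen1)",
       0x2: "3.0 Gbps (Gen2)", 0x3: "6.0 Gbps (Gen3)"}
IPM = {0x0: "not present or not established", 0x1: "active",
       0x2: "partial", 0x6: "slumber"}


def decode_sstatus(value):
    """Split a SATA SStatus register value (libata prints it in hex) into fields."""
    det, spd, ipm = value & 0xF, (value >> 4) & 0xF, (value >> 8) & 0xF
    return {
        "det": DET.get(det, f"reserved ({det:#x})"),
        "spd": SPD.get(spd, f"reserved ({spd:#x})"),
        "ipm": IPM.get(ipm, f"reserved ({ipm:#x})"),
    }
```

decode_sstatus(0x113) reports an established PHY at 1.5 Gbps in the active state. That suggests the link itself came back after each hard reset, while the drive (or the controller behind it) still did not answer IDENTIFY DEVICE (cmd 0xec).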

The drive goes into a read-only state at this point.

It does not matter which drive I connect to the controller. The
controller has already been replaced once.

Double-plus ungood.

2010-06-15 07:03:58

by Rogier Wolff

Subject: Re: Question on siig sata 3 controller

On Thu, Jun 10, 2010 at 07:08:43PM -0700, Alan wrote:
> When writing large amounts of data I see messages like the following:

Yeah! I'm trying to write some 2.5 TB to my RAID array, where 2 of the 8
disks are connected to an Asus U3S6 board.
http://www.asus.com/product.aspx?P_ID=lGYmelQ8mJvPtYTv

After a while, those two disks drop out and make the RAID
inaccessible.

A reboot brings the disks back to life. So in theory, Linux should be
able to bring these drives back by doing the right magic with the
hardware bits...

I'm running 2.6.34:

Linux version 2.6.34 (root@zebigbos) (gcc version 3.4.2) #3 SMP Mon May 17 21:04:13 CEST 2010


Log file entries:

ata5.00: exception Emask 0x0 SAct 0xfff SErr 0x0 action 0x6 frozen
ata5.00: failed command: READ FPDMA QUEUED
ata5.00: cmd 60/a8:00:f6:12:10/00:00:0d:00:00/40 tag 0 ncq 86016 in
res 40/00:14:ee:98:bb/00:00:0a:00:00/40 Emask 0x4 (timeout)
ata5.00: status: { DRDY }
...
ata5.00: failed command: READ FPDMA QUEUED
ata5.00: cmd 60/a0:58:ee:19:10/00:00:0d:00:00/40 tag 11 ncq 81920 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: status: { DRDY }
ata5: hard resetting link
ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 370)
ata5.00: configured for UDMA/133
ata5.00: device reported invalid CHS sector 0
*last message repeated 10 times
ata5: EH complete

(All tags 1...10 are also listed.)

This seems "harmless"; it happened a few times in the last hour or so
(during the rebuild).

When things went bad last time, I got one of these "harmless events"
(but this time with 31 tags listed!):

Jun 14 18:26:23 vercingetorix kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 370)

and then 5 seconds later:

ata5.00: qc timeout (cmd 0xec)
ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata5.00: revalidation failed (errno=-5)
ata5: hard resetting link
ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 370)
ata5.00: qc timeout (cmd 0xec)
ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
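The SAct value on the "exception" line is a bitmap of NCQ commands still in flight, one bit per tag. A tiny sketch to list them (the helper name is illustrative): "SAct 0xfff" gives tags 0 through 11, matching the tag 0 ... tag 11 entries above, while the earlier "SAct 0x3fffffff" from ata2 means 30 queued commands were outstanding when the port froze. The 31-tag event described above would correspond to 31 set bits.

```python
def outstanding_ncq_tags(sact):
    """Return the NCQ tags whose bits are set in a SAct value (bit N = tag N)."""
    return [tag for tag in range(32) if sact & (1 << tag)]
```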


Roger.

--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ

2010-06-15 10:04:00

by Alan

Subject: Re: Question on siig sata 3 controller

> A reboot brings the disks back to life. So in theory, Linux should be
> able to restore life into these drives by doing the right magic with
> the hardware bits...

We don't have power control of the drives. If the firmware crashes, a
drive flakes out due to power problems, or something similar occurs,
it's game over until you hit the switch.

> ata5: hard resetting link
> ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 370)

We tried the biggest hammer we had.

Alan

2010-06-15 14:53:46

by Rogier Wolff

Subject: Re: Question on siig sata 3 controller

On Tue, Jun 15, 2010 at 11:07:48AM +0100, Alan Cox wrote:
> > A reboot brings the disks back to life. So in theory, Linux should be
> > able to restore life into these drives by doing the right magic with
> > the hardware bits...

> We don't have power control of the drives. If the firmware crashes
> or a drive flakes out due to power problems or something similar
> occurs its game over until you hit the switch.

The thing is, the power didn't cycle. I just typed "reboot" from a
remote location. (Yes, in most of the incidents leading up to
yesterday's/this morning's event I thought I had to power-cycle to
bring them back, but this morning I tried "just the reboot" and it
worked!)

The controller has TWO drives connected. BOTH drives became
inaccessible at exactly the same point in time. This has happened
before, with BOTH drives disappearing at the same moment.

The RAID superblocks on BOTH drives (say, disk numbers 1 and 2) had
info like:
RAID disk 1/8, raid is up 8/8

All six other drives (say, disk numbers 0, 3, 4, 5, 6, 7) had:
RAID disk 4/8, raid is broken 6/8

Next time this happens, I'll try removing and reinserting all the SATA
modules. (The machine is a file server with an NFS root, so it doesn't
depend on the storage modules for its root fs.... :-) )

sata_nv 20758 0
ahci 36037 6
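Unloading ahci affects every port that ahci drives, not only the U3S6. A narrower variant, offered here as an untested sketch and not something tried in this thread, is to unbind and rebind just the add-in card's PCI function through the driver core's sysfs "unbind"/"bind" files, which re-runs the ahci probe for that one device. It does not cut drive power, so it may well not help when the drive firmware itself is wedged. The address "0000:05:00.0" is taken from the lspci output earlier in the thread (another machine's card will have a different address); the function name and sysfs_root parameter are illustrative, and writing these files requires root:

```python
import os


def rebind_pci_device(addr, driver="ahci", sysfs_root="/sys"):
    """Unbind one PCI function (e.g. "0000:05:00.0") from its driver and
    bind it again by writing its address to the driver's sysfs files."""
    drv_dir = os.path.join(sysfs_root, "bus", "pci", "drivers", driver)
    for action in ("unbind", "bind"):
        with open(os.path.join(drv_dir, action), "w") as f:
            f.write(addr)
```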

Is one of these modules the driver for this controller? I think it's
AHCI: lshw says it uses ports cc00 ... and a bunch of others, and
those ports are claimed by ahci according to /proc/ioports. Ah! I need
better eyes. lshw already mentions that it's ahci...

> > ata5: hard resetting link
> > ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 370)
>
> We tried the biggest hammer we had

Not big enough! The BIOS manages a bigger one!

Roger.


2010-06-15 14:58:29

by Alan

Subject: Re: Question on siig sata 3 controller

> Is one of these modules the driver for this controller? I think it's
> AHCI: lshw says it uses ports cc00 ... and a bunch of others, and
> those ports are claimed by ahci according to /proc/ioports. Ah! I need
> better eyes. lshw already mentions that it's ahci...

AHCI will be driving it.

2010-06-15 17:25:35

by alan

Subject: Re: Question on siig sata 3 controller

On Tue, 2010-06-15 at 16:01 +0100, Alan Cox wrote:
> AHCI will be driving it.

I have seen this problem with the 2.6.33 kernel in Fedora 13. The
problem goes away in 2.6.35-rc3. (Though networking is fubared for me on
that kernel, so I have not migrated to it.)

My understanding is that the "fix" in the driver was to blacklist NCQ
for that controller. I have not verified that yet.
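If NCQ is the trigger, one way to test that theory on an older kernel is to turn NCQ off for the affected disk and repeat the large write. Setting a libata disk's SCSI queue depth to 1 makes libata stop using NCQ for that device; the libata.force=noncq boot parameter does the same from boot. A minimal sketch of the runtime route, assuming the disk appears as "sdb" (a placeholder) and the script runs as root; the function name and sysfs_root parameter are illustrative:

```python
import os


def disable_ncq(disk, sysfs_root="/sys"):
    """Limit a libata disk (e.g. "sdb") to one outstanding command,
    which makes libata turn NCQ off for that device."""
    path = os.path.join(sysfs_root, "block", disk, "device", "queue_depth")
    with open(path, "w") as f:
        f.write("1")
```

Expect some loss of throughput with NCQ off; the point is only to see whether the timeouts stop.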

2010-08-10 13:48:26

by Rogier Wolff

Subject: Re: Question on siig sata 3 controller

On Tue, Jun 15, 2010 at 10:25:32AM -0700, Alan wrote:
> I have seen this problem with the 2.6.33 kernel in Fedora 13. The
> problem goes away in 2.6.35-rc3. (Though networking is fubared for me on
> that kernel, so I have not migrated to it.)
>
> My understanding is the "fix" in the driver was to blacklist ncq for
> that controller. I have not verified that yet.

One of my disks died again a while ago, so I went to the machine to
replace the drive. But I had forgotten to write down which one had
died, so I started it up again. Now I had 7 disks again like before,
but a different drive was now "gone". So my RAID had only 6 out of 8
drives and was "gone", together with some 4.7T worth of data on it....

Next I went to the machine with a spare SATA card. I removed the
drives from the ASUS U3S6 card and put them on the old PCI SATA card.

By the time I logged in on the machine, the RAID had found 8/8 drives
and I think it had already started rebuilding.....

I haven't had any problems with the drives for more than a week now.

RAID performance has dropped from 600Mb/sec to around 400Mb/sec,
obviously because the PCI card cannot handle 200Mb/sec of disk I/O.
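A back-of-the-envelope check of that explanation, under two assumptions not stated above: that the old card sits on conventional 32-bit/33 MHz PCI, and that "Mb/sec" here means megabytes per second. Such a bus moves at most 4 bytes per clock, and that peak is shared by every device on the bus:

```python
# Conventional PCI (assumed): 32-bit (4-byte) bus at 33.33 MHz,
# at most one data transfer per clock.
PCI_BUS_BYTES = 4
PCI_CLOCK_HZ = 33.33e6


def pci_peak_mb_per_s(bus_bytes=PCI_BUS_BYTES, clock_hz=PCI_CLOCK_HZ):
    """Theoretical peak burst bandwidth of a shared PCI bus, in MB/s (10^6 bytes)."""
    return bus_bytes * clock_hz / 1e6
```

pci_peak_mb_per_s() comes out at about 133 MB/s, well below the roughly 200 MB/s two drives would need, and real sustained throughput on PCI is lower than the burst peak.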

I'm open to suggestions for cheap, high-performance, WORKING PCIe SATA
cards....

Roger.


2010-08-12 01:05:40

by alan

Subject: Re: Question on siig sata 3 controller

On Tue, 10 Aug 2010, Rogier Wolff wrote:

> On Tue, Jun 15, 2010 at 10:25:32AM -0700, Alan wrote:
>> On Tue, 2010-06-15 at 16:01 +0100, Alan Cox wrote:
>>>> Is one of these modules the driver for this controller? I think it's
>>>> AHCI: lshw says it uses ports cc00 ... and a bunch of others, and
>>>> those ports are claimed by ahci according to /proc/ioports. Ah! I need
>>>> better eyes. lshw already mentions that it's ahci...
>>>
>>> AHCI will be driving it.
>>
>> I have seen this problem with the 2.6.33 kernel in Fedora 13. The
>> problem goes away in 2.6.35-rc3. (Though networking is fubared for me on
>> that kernel, so I have not migrated to it.)
>>
>> My understanding is the "fix" in the driver was to blacklist ncq for
>> that controller. I have not verified that yet.
>
> One of my disks died again a while ago. So I went to the machine to
> replace the drive. But I forgot to write down which one had died. So I
> started it up again. Now I had 7 disks again like before, but a
> different drive was now "gone". So my RAID had only 6 out of 8 drives
> and was "gone". Together with some 4.7T worth of data on it....
>
> Next I went to the machine with a spare sata card. I removed the
> drives from the ASUS U3S6 card, and put them on the old pci sata card.
>
> By the time I logged in on the machine, the RAID had found 8/8 drives
> and I think it had already started rebuilding.....
>
> I now haven't had any problems with the drives in more than a week.
>
> Performance of the raid has dropped from 600Mb to around 400Mb/sec,
> obviously because the PCI card cannot handle 200Mb/sec of disk IO.
>
> I'm open to suggestions for cheap highperformance WORKING PCIe sata
> cards....

I found that the controller worked correctly when I ran the latest of
Linus' kernels. There is obviously a change there that needs to be
backported to the older kernels.
