2010-11-22 21:25:11

by Tobias Karnat

Subject: sata_sil24: external raid storage mistaken as port multiplier

Hi,

I have a regression regarding the sata_sil24 driver:

My RaidSonic Stardom SR3620-2S-SB2 is mistaken for a port multiplier;
it did work with Linux 2.6.23.

A module option to disable port multiplier support might already be enough.

-Tobias

Linux Tobias-Karnat 2.6.35-22-generic #35-Ubuntu SMP Sat Oct 16 20:45:36 UTC 2010 x86_64 GNU/Linux

02:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
	Subsystem: Silicon Image, Inc. Device 7132
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at fadfbf80 (64-bit, non-prefetchable) [size=128]
	Region 2: Memory at fadfc000 (64-bit, non-prefetchable) [size=16K]
	Region 4: I/O ports at cc80 [size=128]
	Expansion ROM at fae00000 [disabled] [size=512K]
	Capabilities: [54] Power Management version 2
		Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [5c] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000 Data: 0000
	Capabilities: [70] Express (v1) Legacy Endpoint, MSI 00
		DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
		DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 4096 bytes
		DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Latency L0 unlimited, L1 unlimited
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Kernel driver in use: sata_sil24
	Kernel modules: sata_sil24

[ 2044.110031] ata3: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xe frozen
[ 2044.110036] ata3: irq_stat 0x00b40090, PHY RDY changed
[ 2044.110044] ata3: hard resetting link
[ 2046.330028] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[ 2047.110026] ata3.15: Port Multiplier 1.1, 0x1095:0x4723 r28, 3 ports, feat 0x1/0x9
[ 2047.510027] ata3.00: hard resetting link
[ 2050.320020] ata3.00: failed to resume link (SControl 0)
[ 2050.540038] ata3.00: softreset failed (SRST command error)
[ 2050.720037] ata3.00: failed to read SCR 0 (Emask=0x1)
[ 2050.720040] ata3.00: reset failed, giving up
[ 2050.720046] ata3.15: hard resetting link
[ 2050.720048] ata3: controller in dubious state, performing PORT_RST
[ 2052.980030] ata3.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[ 2053.720024] ata3.00: hard resetting link
[ 2056.520040] ata3.00: failed to resume link (SControl 0)
[ 2057.420721] ata3.00: SATA link down (SStatus 0 SControl 0)
[ 2057.620025] ata3.01: hard resetting link
[ 2060.440045] ata3.01: failed to resume link (SControl 0)
[ 2067.620038] ata3.01: softreset failed (timeout)
[ 2067.650049] ata3.01: hard resetting link
[ 2070.450048] ata3.01: failed to resume link (SControl 0)
[ 2077.660099] ata3.01: softreset failed (timeout)
[ 2077.760064] ata3.01: hard resetting link
[ 2080.561452] ata3.01: failed to resume link (SControl 0)
[ 2112.750663] ata3.01: softreset failed (timeout)
[ 2112.890033] ata3.01: limiting SATA link speed to 1.5 Gbps
[ 2112.890038] ata3.01: hard resetting link
[ 2115.690110] ata3.01: failed to resume link (SControl 0)
[ 2117.900038] ata3.01: softreset failed (timeout)
[ 2117.990048] ata3.01: reset failed, giving up
[ 2117.990055] ata3.15: hard resetting link
[ 2120.220030] ata3.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[ 2121.000053] ata3.01: hard resetting link
[ 2123.800040] ata3.01: failed to resume link (SControl 0)
[ 2131.000059] ata3.01: softreset failed (timeout)
[ 2131.100045] ata3.01: hard resetting link
[ 2133.900047] ata3.01: failed to resume link (SControl 0)
[ 2141.100154] ata3.01: softreset failed (timeout)
[ 2141.200046] ata3.01: hard resetting link
[ 2144.010034] ata3.01: failed to resume link (SControl 0)
[ 2176.200663] ata3.01: softreset failed (timeout)
[ 2176.200670] ata3.01: failed to read SCR 0 (Emask=0x40)
[ 2176.200674] ata3.01: reset failed, giving up
[ 2176.200680] ata3.15: hard resetting link
[ 2178.290030] ata3.15: SATA link down (SStatus 0 SControl 0)
[ 2181.294766] ata3.15: qc timeout (cmd 0xe4)
[ 2181.294795] ata3.15: failed to read PMP GSCR[0] (Emask=0x5)
[ 2181.294799] ata3.15: PMP revalidation failed (errno=-5)
[ 2183.290042] ata3.15: hard resetting link
[ 2185.382136] ata3.15: SATA link down (SStatus 0 SControl 0)
[ 2188.382646] ata3.15: qc timeout (cmd 0xe4)
[ 2188.382675] ata3.15: failed to read PMP GSCR[0] (Emask=0x5)
[ 2188.382680] ata3.15: PMP revalidation failed (errno=-5)
[ 2188.382685] ata3.15: limiting SATA link speed to 1.5 Gbps
[ 2190.390642] ata3.15: hard resetting link
[ 2192.480655] ata3.15: SATA link down (SStatus 0 SControl 10)
[ 2195.480021] ata3.15: qc timeout (cmd 0xe4)
[ 2195.480050] ata3.15: failed to read PMP GSCR[0] (Emask=0x5)
[ 2195.480054] ata3.15: PMP revalidation failed (errno=-5)
[ 2197.480016] ata3.15: hard resetting link
[ 2199.570662] ata3.15: SATA link down (SStatus 0 SControl 10)
[ 2202.570646] ata3.15: qc timeout (cmd 0xe4)
[ 2202.570675] ata3.15: failed to read PMP GSCR[0] (Emask=0x5)
[ 2202.570680] ata3.15: PMP revalidation failed (errno=-5)
[ 2204.570638] ata3.15: hard resetting link
[ 2206.670025] ata3.15: SATA link down (SStatus 0 SControl 10)
[ 2209.670020] ata3.15: qc timeout (cmd 0xe4)
[ 2209.670049] ata3.15: failed to read PMP GSCR[0] (Emask=0x5)
[ 2209.670053] ata3.15: PMP revalidation failed (errno=-5)
[ 2209.670056] ata3.15: failed to recover PMP after 5 tries, giving up
[ 2209.670059] ata3.15: Port Multiplier detaching
[ 2209.670088] ata3.00: disabled
[ 2209.670104] ata3: hard resetting link
[ 2211.760652] ata3: SATA link down (SStatus 0 SControl 10)
[ 2211.760664] ata3: EH complete


2010-11-23 12:08:52

by Tobias Karnat

Subject: Re: sata_sil24: external raid storage mistaken as port multiplier

On Monday, 22.11.2010, 22:24 +0100, Tobias Karnat wrote:
> My RaidSonic Stardom SR3620-2S-SB2 is mistaken for a port multiplier;
> it did work with Linux 2.6.23.

This is what I get with Linux 2.6.23:

[ 242.139418] ata1: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0x2 frozen
[ 242.139424] ata1: irq_stat 0x00a00080, device exchanged
[ 242.859815] ata1: soft resetting port
[ 243.694381] ata1: softreset failed (SRST command error)
[ 243.694389] ata1: reset failed (errno=-5), retrying in 10 secs
[ 252.842608] ata1: hard resetting port
[ 254.982393] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 254.982489] ata1.00: ATA-6: External Disk 0, RGL10403, max UDMA/133
[ 254.982492] ata1.00: 1953546336 sectors, multi 1: LBA48
[ 254.982605] ata1.00: configured for UDMA/100
[ 254.982612] ata1: EH complete
[ 254.982740] scsi 0:0:0:0: Direct-Access ATA External Disk 0 RGL1 PQ: 0 ANSI: 5
[ 254.982836] sd 0:0:0:0: [sdh] 1953546336 512-byte hardware sectors (1000216 MB)
[ 254.982848] sd 0:0:0:0: [sdh] Write Protect is off
[ 254.982853] sd 0:0:0:0: [sdh] Mode Sense: 00 3a 00 00
[ 254.982870] sd 0:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 254.982971] sd 0:0:0:0: [sdh] 1953546336 512-byte hardware sectors (1000216 MB)
[ 254.983025] sd 0:0:0:0: [sdh] Write Protect is off
[ 254.983030] sd 0:0:0:0: [sdh] Mode Sense: 00 3a 00 00
[ 254.983063] sd 0:0:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 254.983068] sdh: unknown partition table
[ 254.988917] sd 0:0:0:0: [sdh] Attached SCSI disk
[ 254.989036] sd 0:0:0:0: Attached scsi generic sg9 type 0

----------------------------
Device Type 0
Vendor: ATA
Product: External Disk 0
Revision level: RGL1

Serial Number '070C05F_879391___0_8'

-Tobias

2010-11-24 04:23:15

by Tobias Karnat

Subject: Re: sata_sil24: external raid storage mistaken as port multiplier

Hi,

I fixed it by removing ATA_FLAG_PMP from SIL24_COMMON_FLAGS.

Could someone turn this into a module option?

The external RAID enclosure may in fact have a built-in port multiplier,
but it can only be configured as RAID 0 or RAID 1.

I suspect that Linux tries to access the drives separately, which fails.

-Tobias

2010-11-26 17:19:24

by Tejun Heo

Subject: Re: sata_sil24: external raid storage mistaken as port multiplier

On 11/24/2010 05:23 AM, Tobias Karnat wrote:
> Hi,
>
> I got it fixed by removing ATA_FLAG_PMP from the SIL24_COMMON_FLAGS.
>
> Could someone turn this into a module option?
>
> The external raid case might in fact have a built-in port multiplier,
> but the case can only be configured as raid0 and raid1.
>
> I suspect that Linux tries to access the drives separately, which fails.

Hmmm... well, libata is just sending SRST with the port number set to 15,
and the device answers by reporting that it is a port multiplier.
Depending on their configuration, these devices don't work too well when
driven as a PMP. If you put it into JBOD mode, it will
probably work fine. I have no idea why it still reports itself as a PMP
device when configured as a virtual device.

That said, yeah, it probably would be a good idea to add a
libata.force param.

Can you please apply the following patch and verify that the device
doesn't work without any parameter but it does with
"libata.force=nopmp"?

Thanks.

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 7f77c67..7423265 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6325,6 +6325,7 @@ static int __init ata_parse_force_one(char **cur,
{ "nohrst", .lflags = ATA_LFLAG_NO_HRST },
{ "nosrst", .lflags = ATA_LFLAG_NO_SRST },
{ "norst", .lflags = ATA_LFLAG_NO_HRST | ATA_LFLAG_NO_SRST },
+ { "nopmp", .lflags = ATA_LFLAG_NO_PMP },
};
char *start = *cur, *p = *cur;
char *id, *val, *endp;
diff --git a/include/linux/libata.h b/include/linux/libata.h
index d947b12..6102ba2 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -174,6 +174,7 @@ enum {
ATA_LFLAG_DISABLED = (1 << 6), /* link is disabled */
ATA_LFLAG_SW_ACTIVITY = (1 << 7), /* keep activity stats */
ATA_LFLAG_NO_LPM = (1 << 8), /* disable LPM on this link */
+ ATA_LFLAG_NO_PMP = (1 << 9), /* disable PMP support */

/* struct ata_port flags */
ATA_FLAG_SLAVE_POSS = (1 << 0), /* host supports slave dev */
@@ -1210,7 +1211,8 @@ extern struct device_attribute *ata_common_sdev_attrs[];
#ifdef CONFIG_SATA_PMP
static inline bool sata_pmp_supported(struct ata_port *ap)
{
- return ap->flags & ATA_FLAG_PMP;
+ return (ap->flags & ATA_FLAG_PMP) &&
+ !(ap->link.flags & ATA_LFLAG_NO_PMP);
}

static inline bool sata_pmp_attached(struct ata_port *ap)

2010-11-26 20:14:01

by Mark Lord

Subject: Re: sata_sil24: external raid storage mistaken as port multiplier

On 10-11-26 12:19 PM, Tejun Heo wrote:
> On 11/24/2010 05:23 AM, Tobias Karnat wrote:
>> Hi,
>>
>> I got it fixed by removing ATA_FLAG_PMP from the SIL24_COMMON_FLAGS.
>>
>> Could someone turn this into a module option?
>>
>> The external raid case might in fact have a built-in port multiplier,
>> but the case can only be configured as raid0 and raid1.
>>
>> I suspect that Linux tries to access the drives separately, which fails.
>
> Hmmm... well, libata is just sending SRST w/ the port number set to 15
> and the device is reporting that it is a port multiplier to that.
> Depending on configuration these devices don't work too well when
> commanded as a PMP device. If you put it into JBOD mode, it will
> probably work fine. I have no idea why it still reports as a PMP
> device when configured as a virtual device.
>
> That said, yeah, it probably would be a good idea to add a
> libata.force param.

How about some form of auto-detection instead?

-ml

2010-11-26 20:16:10

by Tejun Heo

Subject: Re: sata_sil24: external raid storage mistaken as port multiplier

Hello,

On 11/26/2010 09:13 PM, Mark Lord wrote:
>> That said, yeah, it probably would be a good idea to add a
>> libata.force param.
>
> How about some form of auto-detection instead?

That's a device which reports that it's a PMP but fails to behave
as one. The sucky thing is that, depending on the current mode
and firmware revision, the same hardware can actually behave as a
pretty decent PMP. I frankly have no idea how to work around it.

--
tejun

2010-11-27 02:57:16

by Mark Lord

Subject: Re: sata_sil24: external raid storage mistaken as port multiplier

On 10-11-26 03:16 PM, Tejun Heo wrote:
> Hello,
>
> On 11/26/2010 09:13 PM, Mark Lord wrote:
>>> That said, yeah, it probably would be a good idea to add a
>>> libata.force param.
>>
>> How about some form of auto-detection instead?
>
> That's a device which reports that it's a PMP but fails to behave
> itself as one. The sucky thing is that depending on the current mode
> and firmware revision, the same hardware can actually behave as a
> pretty decent PMP.

Yuck! :)

I suppose *if* we knew the exact fwrev that requires the workaround,
then we could make it automatic for that, and still have the boot flag
for cases we don't know about.

Tobias? Got the IDENTIFY info from that device?
Something like "hdparm --istdout /dev/sdX" ?

2010-11-28 10:52:08

by Tobias Karnat

Subject: Re: sata_sil24: external raid storage mistaken as port multiplier

On Friday, 26.11.2010, 18:19 +0100, Tejun Heo wrote:
> That said, yeah, it probably would be a good idea to add a
> libata.force param.
>
> Can you please apply the following patch and verify that the device
> doesn't work without any parameter but it does with
> "libata.force=nopmp"?

Sorry, but for now I am happy with the generic Ubuntu kernel
and a recompiled module.

Maybe I am just lazy, but I plan to compile my first kernel
on Ubuntu 10.10 once 2.6.37 is out.

On Friday, 26.11.2010, 21:57 -0500, Mark Lord wrote:
> I suppose *if* we knew the exact fwrev that requires the workaround,
> then we could make it automatic for that, and still have the boot flag
> for cases we don't know about.
>
> Tobias? Got the IDENTIFY info from that device?
> Something like "hdparm --istdout /dev/sdX" ?

Yes, hdparm --istdout /dev/sdh gives:

/dev/sdh:
0040 3fff c837 0010 0000 0000 003f 0000
0000 0000 3037 3043 3035 465f 3837 3933
3931 5f5f 5f30 5f38 0003 3e00 0004 5247
4c31 3034 3033 4578 7465 726e 616c 2044
6973 6b20 3020 2020 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 8001
0000 2f00 4000 0200 0000 0006 0000 0000
0000 0000 0000 0101 ffff 0fff 0000 0407
0003 0078 0078 0078 0078 0000 0000 0000
0000 0000 0000 0000 0201 0000 0000 0000
007e 001b 0069 7460 4040 0028 3460 4040
207f 0000 0000 0000 fffe 0000 c0fe 0000
0000 0000 0000 0000 c060 7470 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0001 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0017 2040
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 a3a5

Tobias

2010-12-03 12:45:50

by Tobias Karnat

Subject: Re: sata_sil24: external raid storage mistaken as port multiplier

On Sunday, 28.11.2010, 11:51 +0100, Tobias Karnat wrote:
> On Friday, 26.11.2010, 18:19 +0100, Tejun Heo wrote:
> > That said, yeah, it probably would be a good idea to add a
> > libata.force param.
> >
> > Can you please apply the following patch and verify that the device
> > doesn't work without any parameter but it does with
> > "libata.force=nopmp"?
>
> Sorry, but for now I am happy with the generic Ubuntu kernel
> and a recompiled module.
>
> Maybe I am just lazy, but I plan to compile my first kernel
> on Ubuntu 10.10 once 2.6.37 is out.

Well, I'm now on 2.6.36.1, because I had a problem with Kaffeine and
2.6.37-rc4. But I don't like the idea of forcing PMP off on every
controller. I don't use a PMP myself, but maybe someone else does
and also has this problem with the external case.

> On Friday, 26.11.2010, 21:57 -0500, Mark Lord wrote:
> > I suppose *if* we knew the exact fwrev that requires the workaround,
> > then we could make it automatic for that, and still have the boot flag
> > for cases we don't know about.
> >
> > Tobias? Got the IDENTIFY info from that device?
> > Something like "hdparm --istdout /dev/sdX" ?
>
> Yes, hdparm --istdout /dev/sdh gives:
>
> /dev/sdh:
> 0040 3fff c837 0010 0000 0000 003f 0000
> 0000 0000 3037 3043 3035 465f 3837 3933
> 3931 5f5f 5f30 5f38 0003 3e00 0004 5247
> 4c31 3034 3033 4578 7465 726e 616c 2044
> 6973 6b20 3020 2020 2020 2020 2020 2020
> 2020 2020 2020 2020 2020 2020 2020 8001
> 0000 2f00 4000 0200 0000 0006 0000 0000
> 0000 0000 0000 0101 ffff 0fff 0000 0407
> 0003 0078 0078 0078 0078 0000 0000 0000
> 0000 0000 0000 0000 0201 0000 0000 0000
> 007e 001b 0069 7460 4040 0028 3460 4040
> 207f 0000 0000 0000 fffe 0000 c0fe 0000
> 0000 0000 0000 0000 c060 7470 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0001 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0017 2040
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 0000
> 0000 0000 0000 0000 0000 0000 0000 a3a5

I hope this is enough info.

-Tobias