2010-01-21 16:51:23

by Chandra Shekhar Sah

Subject: port multiplier problem

Hi,

I have a disk array of 12 disks. The interface card has two sil3726 port
multipliers for 10 disks (5 disks on each port multiplier) and 2 direct
sata connectors. The host controller is a sil3124. It was connected to
suse 10.4 and working fine.

There was a high voltage accident (high voltage to the disk array only,
not the computer), and because of that we replaced the power supply of
the disk array. Now we are having a problem: the disks are not being
recognized. So I connected the array to Fedora 12 with the sil24 driver
(to rule out driver bugs). Here are a few things that I have tested.

1) If I connect only one disk to a direct sata connector, it works fine.
2) If I connect two disks to both direct sata connectors, it won't work.
3) If I connect one disk to one direct sata connector and one disk
directly to the host controller, it works.
4) If I connect one or more disks to a PM, it won't work.
5) If I connect one disk to direct sata and one to a PM, it won't work.

Below is the error from dmesg in case of failure:
==================================
aic7xxx 0000:03:04.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
scsi2 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
<Adaptec aic7899 Ultra160 SCSI adapter>
aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

sata_sil24 0000:03:02.0: version 1.1
sata_sil24 0000:03:02.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
scsi3 : sata_sil24
scsi4 : sata_sil24
scsi5 : sata_sil24
scsi6 : sata_sil24
ata3: SATA max UDMA/100 host m128@0xea009000 port 0xea000000 irq 18
ata4: SATA max UDMA/100 host m128@0xea009000 port 0xea002000 irq 18
ata5: SATA max UDMA/100 host m128@0xea009000 port 0xea004000 irq 18
ata6: SATA max UDMA/100 host m128@0xea009000 port 0xea006000 irq 18
aic7xxx 0000:03:04.1: PCI INT B -> GSI 18 (level, low) -> IRQ 18
ata3: SATA link down (SStatus 0 SControl 0)
ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata4.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
ata4.00: hard resetting link
ata4.00: SATA link down (SStatus 0 SControl 10)
ata4.01: hard resetting link
ata4.01: SATA link down (SStatus 0 SControl 320)
ata4.02: hard resetting link
ata4.02: SATA link down (SStatus 0 SControl 320)
ata4.03: hard resetting link
ata4.03: SATA link down (SStatus 0 SControl 320)
ata4.04: hard resetting link
ata4.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata4.05: hard resetting link
ata4.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata4.04: failed to IDENTIFY (I/O error, err_mask=0x11)
ata4.15: hard resetting link
ata4.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
...........................
...........................
===================================

I appreciate your help.

Thanks
Manojg


2010-02-02 03:33:51

by Tejun Heo

Subject: Re: port multiplier problem

Hello,

On 01/22/2010 01:51 AM, Chandra Nepali wrote:
> I have a disk array of 12 disks. The interface card has two sil3726 port
> multiplier for 10 disks (5 disks in each port multiplier) and 2 direct
> sata connectors. The host controller is sil3124. It was connected to
> suse 10.4 and working fine.

Wasn't 10.3 the last of 10 series? After that it was 11.0. What's
the version of the kernel?

> There was a high voltage accident (high voltage to disk array only, not
> computer) and because of that we replaced the power supply of this disk
> array. Now, we are getting some problem, disks are not being
> recognized,.

Ouch...

> So, I connected it to Fedora 12 (To correct if it is
> because of some bugs) with driver sil24. Here are few things that I have
> tested.
>
> 1) If I connect only one disk to direct sata connector, it works fine.
> 2) If I connect two disks to both direct sata connectors, it won't work.
> 3) If I connect one disk to one direct sata and one disk directly to
> host controller, it works.

What do you mean by 'direct sata'? How is it different from 'directly
to host controller'?

> 3) If I connect one or more disk to PM, it won't work.
> 4) If I connect one disk to direct sata and one to PM, it won't work.

Maybe the PMP is fried?

Thanks.

--
tejun

2010-02-02 14:17:17

by Chandra Shekhar Sah

Subject: Re: port multiplier problem

Hi Tejun,

Thanks a lot for the reply.

The disk array was connected to suse 10.3 before the accident and was
working fine. After the accident I saw problems, so I switched to Fedora 12
because it has an up-to-date sil24 driver for the sil3124 SATA host
controller.

Now I am using Fedora 12 (kernel 2.6.31.12). The backplane card
(NORCO-LIB1220) in the disk array has 12 SATA connectors, arranged
5-5-1-1: there are two port multipliers (sil3726, 5 SATA ports on each
PMP), and two connectors without a port multiplier, which I called
"direct sata" (maybe not the appropriate term).

I thought the PMP was damaged, so I purchased a new NORCO-LIB1220
backplane card, but the problem is the same. The disks connected to the
PMPs are not recognized. However, a disk connected to "direct sata"
(with no disks on the PMPs) works fine.

I greatly appreciate your help.
CN




2010-02-02 14:21:38

by Tejun Heo

Subject: Re: port multiplier problem

On 02/02/2010 11:17 PM, Chandra Shekhar Sah wrote:
> I thought PMP is damaged so I purchased a new NORCO-LIB1220 backplane
> card but same problem. The disks connected to PMP are not being
> recognized. However, disk connected to "direct sata" (no disks on PMP)
> is fine.

Ah.. okay, can you please post full dmesg of failed probe?

Thanks.

--
tejun

2010-02-02 15:00:25

by Chandra Shekhar Sah

Subject: Re: port multiplier problem

Hi Tejun,

I have attached the full dmesg output, taken with 9 disks connected to
the PMPs and no disk on "direct sata".

Thanks,
CN



Attachments:
dmesglog (94.56 kB)

2010-02-02 16:45:07

by Grant Grundler

Subject: Re: port multiplier problem

On Tue, Feb 2, 2010 at 6:59 AM, Chandra Shekhar Sah wrote:
> Hi Tejun,
>
> I have attached full dmesg output, while 9 disks are connected to PMP, and
> no disk to "direct sata".

Linux version 2.6.31.12-174.2.3.fc12.i686.PAE
...
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata3.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
ata3.00: hard resetting link
ata3.00: SATA link down (SStatus 0 SControl 10)
ata3.01: hard resetting link
ata3.01: SATA link down (SStatus 0 SControl 320)
ata3.02: hard resetting link
ata3.02: SATA link down (SStatus 0 SControl 320)
ata3.03: hard resetting link
ata3.03: SATA link down (SStatus 0 SControl 320)
ata3.04: hard resetting link
ata3.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.05: hard resetting link
ata3.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)

Sil3726 has only 5 ports. 6th port is an enclosure management port.
Wasn't there a patch submitted to ignore the enclosure mgt port?

I doubt this is the problem Chandra is seeing but it could be related.

Chandra, just to be complete, can you share which drive model and
firmware is printed on the drive label?

thanks,
grant

2010-02-02 19:05:27

by Grant Grundler

Subject: Re: port multiplier problem

On Tue, Feb 2, 2010 at 10:12 AM, Chandra Shekhar Sah wrote:
> Hi Grant,
>
> There are 6 Seagate Barracuda and 6 Hitachi DeskStar.

I thought 0x1095/0x3726 was a Silicon Image part. Can you confirm this?

If it is, this sounds like a broken implementation to me. Here is what
the Silicon Image 3726 Data Sheet says in the introduction:
Silicon Image’s SiI3726 is 1-to-5 SATA Port Multiplier designed to
provide a high performance link between a single SATA host port and 5
SATA device ports.

So I don't know where the 6th device is getting connected. Some
explanation/data sheet from the HW vendor would be helpful at this
point.

> Seagate Model: ST3750640AS
> Firmware: 3.AAK
>
> Hitachi Model:HDS721075KLA330
> Firmware: Not sure

Both of these drives work behind Sil3726. (First hand experience).

>
> I have attached pictures of both labels, in case.

Perfect - thanks for posting those.

thanks,
grant

2010-02-04 02:38:02

by Grant Grundler

Subject: Re: port multiplier problem

On Tue, Feb 2, 2010 at 11:22 AM, Chandra Shekhar Sah wrote:
> Hi Grant,
>
> Thank for reply.
> Yes, PMP is sil3726. The backplane of the disk array has two PMP (each
> 1-to-5 as you have mentioned) and two sata direct connection without PMP.
> So, 10 disk are behind 2 PMP.

Ah ok. That explains your "5-5-1-1" comment now. I tried to find a
data sheet for this board but only found one in Chinese:
http://www.norco.com.cn/UpLoadFile/Manual/DS-12X0-CN.pdf

and I unfortunately don't speak/read Chinese. Probably doesn't matter
though since...

This email thread looks like a duplicate of a previous bug report:
http://markmail.org/message/lp3ynvfefejpiy2r

(or search for "Ubuntu 9.04 (2.6.28-14) and eSATA Port Multiplier
(PMP) Not working")

Chandra, you might read through that thread and dmesg output (posted by Chris K)
to see what else you have in common.

I had two questions on that thread that never got answered:
http://markmail.org/message/snpekoj4qexrslk5

| How can we find out if anyone has the SEMB properly wired up?
| Would it be hard to make libata aware of "SEMB port not responding" case?
| ie if the SEMB port times out or has no link, reduce the port count of
| the sil3726 PMP by one.
|
| Maybe add a "enable_sil24_semb" flag to libata?
| (avoid checking unless someone asks for it). I hate magic flags but also
| don't want to subject most people to the timeout delay.

I (or Gwendal) can post a patch (and lightly test) for any of the above.
Just need to get some guidance so we don't waste our time.

thanks,
grant

> Total is 12 disks. The sata host controller is
> sil3124. Picture of the backplane is attached.
>
> Thanks,
> Chandra
>

2010-02-04 03:14:34

by Tejun Heo

Subject: Re: port multiplier problem

Hello,

On 02/03/2010 01:44 AM, Grant Grundler wrote:
> On Tue, Feb 2, 2010 at 6:59 AM, Chandra Shekhar Sah wrote:
>> Hi Tejun,
>>
>> I have attached full dmesg output, while 9 disks are connected to PMP, and
>> no disk to "direct sata".
>
> Linux version 2.6.31.12-174.2.3.fc12.i686.PAE
> ...
> ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
> ata3.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
> ata3.00: hard resetting link
> ata3.00: SATA link down (SStatus 0 SControl 10)
> ata3.01: hard resetting link
> ata3.01: SATA link down (SStatus 0 SControl 320)
> ata3.02: hard resetting link
> ata3.02: SATA link down (SStatus 0 SControl 320)
> ata3.03: hard resetting link
> ata3.03: SATA link down (SStatus 0 SControl 320)
> ata3.04: hard resetting link
> ata3.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> ata3.05: hard resetting link
> ata3.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
>
> Sil3726 has only 5 ports. 6th port is an enclosure management port.
> Wasn't there a patch submitted to ignore the enclosure mgt port?

Eh... I don't think it got included.

> I doubt this is the problem Chandra is seeing but it could be related.

Doesn't seem to be. IDENTIFY failures are coming from normal ports.

Thanks.

--
tejun

2010-02-04 03:17:54

by Tejun Heo

Subject: Re: port multiplier problem

On 02/04/2010 11:37 AM, Grant Grundler wrote:
> I had two questions on that thread that never got answered:
> http://markmail.org/message/snpekoj4qexrslk5
>
> | How can we find out if anyone has the SEMB properly wired up?
> | Would it be hard to make libata aware of "SEMB port not responding" case?
> | ie if the SEMB port times out or has no link, reduce the port count of
> | the sil3726 PMP by one.
> |
> | Maybe add a "enable_sil24_semb" flag to libata?
> | (avoid checking unless someone asks for it). I hate magic flags but also
> | don't want to subject most people to the timeout delay.
>
> I (or Gwendal) can post a patch (and lightly test) for any of the above.
> Just need to get some guidance so we don't waste our time.

It's not really sil24 tho. But anyways, I think we can just disable
them altogether. It's not like they have ever worked. Just limiting
both 3726 and 4726 to 5 ports should be fine. That said, I'm not
quite sure this is relevant to the reported problem but it's worth a
shot.

Thanks.

--
tejun

2010-02-04 16:40:06

by Chandra Shekhar Sah

Subject: Re: port multiplier problem

Hi Grant,

I compared the PMP part of my dmesg with Chris's, and here are the similarities:
=================================
Similarity
=================================
sata_sil24 0000:03:02.0: version 1.1
sata_sil24 0000:03:02.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
scsi3 : sata_sil24
scsi4 : sata_sil24
scsi5 : sata_sil24
scsi6 : sata_sil24
ata3: SATA max UDMA/100 host m128@0xea009000 port 0xea000000 irq 18
ata4: SATA max UDMA/100 host m128@0xea009000 port 0xea002000 irq 18
ata5: SATA max UDMA/100 host m128@0xea009000 port 0xea004000 irq 18
ata6: SATA max UDMA/100 host m128@0xea009000 port 0xea006000 irq 18
aic7xxx 0000:03:04.1: PCI INT B -> GSI 18 (level, low) -> IRQ 18
ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
ata3.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
ata3.00: hard resetting link
ata3.00: SATA link down (SStatus 0 SControl 10)
ata3.01: hard resetting link
ata3.01: SATA link down (SStatus 0 SControl 320)
ata3.02: hard resetting link
ata3.02: SATA link down (SStatus 0 SControl 320)
ata3.03: hard resetting link
ata3.03: SATA link down (SStatus 0 SControl 320)
ata3.04: hard resetting link
ata3.04: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata3.05: hard resetting link
ata3.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata3.04: failed to IDENTIFY (I/O error, err_mask=0x11)
===============================================

However, I got some extra errors in my case (shown below):
===============================================
ata3.04: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xf
ata3.04: SError: { PHYRdyChg DevExch }

ata3.04: PHY status changed but maxed out on retries, giving up
ata3.04: Manully issue scan to resume this link
ata3.04: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xf t4
ata3.04: irq_stat 0x01060002, failed to transmit command FIS
ata3.04: SError: { PHYRdyChg CommWake DevExch }
ata3.04: limiting SATA link speed to 1.5 Gbps

ata4.15: hard resetting link
ata3.15: qc timeout (cmd 0xe4)
ata3.01: failed to read SCR 2 (Emask=0x4)
ata3.01: COMRESET failed (errno=-5)
ata3.01: failed to read SCR 0 (Emask=0x40)
ata3.01: reset failed, giving up

ata4.05: SATA link up 1.5 Gbps (SStatus 113 SControl 320)
ata4: PMP SError.N set for some ports, repeating recovery
ata4.04: hard resetting link

ata4.15: hard resetting link
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
SELinux: initialized (dev rpc_pipefs, type rpc_pipefs), uses genfs_contexts
ata4.15: SATA link up 3.0 Gbps (SStatus 123 SControl 0)

ata4: PMP SError.N set for some ports, repeating recovery
==========================================

Chris's post reminded me about the LEDs. Each of my drives (all
hot-swappable) has two LEDs: a green one for power and an orange one
that probably shows drive activity. The orange LED seems to be driven by
the PMP, because it does not light up when I connect the drive to a
direct sata port (where the disks work fine).

When the disk array was working fine (a few months ago), the orange LED
lit for a very short time when the disk was powered on and then turned
off. It lit again whenever the disk was active. Now, however, the orange
LED never turns off. I saw something similar in the sil3726 PMP manual,
which says that some LED turns off once the disk is ready (I don't know
which one).

Thanks,
CN



2010-02-04 17:59:20

by Grant Grundler

Subject: Re: port multiplier problem

On Wed, Feb 3, 2010 at 7:24 PM, Tejun Heo wrote:
> On 02/04/2010 11:37 AM, Grant Grundler wrote:
>> I had two questions on that thread that never got answered:
>>    http://markmail.org/message/snpekoj4qexrslk5
>>
>> | How can we find out if anyone has the SEMB properly wired up?
>> | Would it be hard to make libata aware of "SEMB port not responding" case?
>> | ie if the SEMB port times out or has no link, reduce the port count of
>> | the sil3726 PMP by one.
>> |
>> | Maybe add a "enable_sil24_semb" flag to libata?
>> | (avoid checking unless someone asks for it). I hate magic flags but also
>> | don't want to subject most people to the timeout delay.
>>
>> I (or Gwendal) can post a patch (and lightly test) for any of the above.
>> Just need to get some guidance so we don't waste our time.
>
> It's not really sil24 tho.  But anyways, I think we can just disable
> them altogether.  It's not like they have ever worked.  Just limiting
> both 3726 and 4726 to 5 ports should be fine.

Sorry - You are right. I meant "enable_sil3726_semb".

I'm not sure we need to limit the SEMB ports anymore either. See below.

>  That said, I'm not
> quite sure this is relevant to the reported problem but it's worth a
> shot.

I didn't have a better idea.

I'm seeing this in sata_pmp_quirks() since ATA_LFLAG_NO_SRST was introduced:
337 static void sata_pmp_quirks(struct ata_port *ap)
338 {
339         u32 *gscr = ap->link.device->gscr;
340         u16 vendor = sata_pmp_gscr_vendor(gscr);
341         u16 devid = sata_pmp_gscr_devid(gscr);
342         struct ata_link *link;
343
344         if (vendor == 0x1095 && devid == 0x3726) {
345                 /* sil3726 quirks */
346                 ata_for_each_link(link, ap, EDGE) {
347                         /* Class code report is unreliable and SRST
348                          * times out under certain configurations.
349                          */
350                         if (link->pmp < 5)
351                                 link->flags |= ATA_LFLAG_NO_SRST |
352                                                ATA_LFLAG_ASSUME_ATA;
353
354                         /* port 5 is for SEMB device and it doesn't like SRST */
355                         if (link->pmp == 5)
356                                 link->flags |= ATA_LFLAG_NO_SRST |
357                                                ATA_LFLAG_ASSUME_SEMB;
358                 }


But the ATA_LFLAG_NO_SRST used in line 351 is not present in the
2.6.26 tree I know works with PMPs. The original commit comment isn't
specific about exactly which HW had problems:
http://www.mail-archive.com/[email protected]/msg24335.html

"Some links on some PMPs locks up on SRST and/or report incorrect
device signature. Implement ATA_LFLAG_NO_SRST, ASSUME_ATA and
ASSUME_SEMB to handle these quirky links. NO_SRST makes EH avoid
SRST. ASSUME_ATA and SEMB forces class code to ATA and SEMB_UNSUP
respectively. Note that SEMB isn't currently supported yet so the
_UNSUP variant is used."


Can you publish which PMP implementations sometimes lock up on SRST?

I doubt this is related to the problem Chandra is seeing but again,
don't have better ideas.

BTW, this same kernel works fine without disabling port 5 (SEMB port).
I didn't know this until I just looked. I know previous source trees
Google used ignored SEMB port on 3726 and I mistakenly assumed this one
did too. :(

thanks,
grant

2010-02-05 02:38:16

by Tejun Heo

Subject: Re: port multiplier problem

Hello,

On 02/05/2010 02:59 AM, Grant Grundler wrote:
>> That said, I'm not quite sure this is relevant to the reported
>> problem but it's worth a shot.
>
> I didn't have a better idea.

Heh... yeah, me neither. If that doesn't work, I'll dig out 3726s I
have and see whether the new kernels are having problems with them but
I can't think of anything which could have effects like this.

> I'm seeing this in sata_pmp_quirks() since ATA_LFLAG_NO_SRST was introduced:
> 337 static void sata_pmp_quirks(struct ata_port *ap)
> 338 {
> 339 u32 *gscr = ap->link.device->gscr;
> 340 u16 vendor = sata_pmp_gscr_vendor(gscr);
> 341 u16 devid = sata_pmp_gscr_devid(gscr);
> 342 struct ata_link *link;
> 343
> 344 if (vendor == 0x1095 && devid == 0x3726) {
> 345 /* sil3726 quirks */
> 346 ata_for_each_link(link, ap, EDGE) {
> 347 /* Class code report is unreliable and SRST
> 348 * times out under certain configurations.
> 349 */
> 350 if (link->pmp < 5)
> 351 link->flags |= ATA_LFLAG_NO_SRST |
> 352 ATA_LFLAG_ASSUME_ATA;
> 353
> 354 /* port 5 is for SEMB device and it doesn't like SRST */
> 355 if (link->pmp == 5)
> 356 link->flags |= ATA_LFLAG_NO_SRST |
> 357 ATA_LFLAG_ASSUME_SEMB;
> 358 }
>
>
> But the ATA_LFLAG_NO_SRST used in line 351 is not present in the
> 2.6.26 tree I know works with PMPs. The original commit comment isn't
> specific about exactly which HW had problems:
> http://www.mail-archive.com/[email protected]/msg24335.html
>
> "Some links on some PMPs locks up on SRST and/or report incorrect
> device signature. Implement ATA_LFLAG_NO_SRST, ASSUME_ATA and
> ASSUME_SEMB to handle these quirky links. NO_SRST makes EH avoid
> SRST. ASSUME_ATA and SEMB forces class code to ATA and SEMB_UNSUP
> respectively. Note that SEMB isn't currently supported yet so the
> _UNSUP variant is used."
>
> Can you publish which PMP implementations sometimes lock up on SRST?

I don't remember the details but it was a 3726 or 4726 with slightly
different firmware revision.

> I doubt this is related to the problem Chandra is seeing but again,
> don't have better ideas.
>
> BTW, this same kernel works fine without disabling port 5 (SEMB
> port). I didn't know this until I just looked. I know previous
> source trees Google used ignored SEMB port on 3726 and I mistakenly
> assumed this one did too. :(

But the other report where the PMP device failed hardreset too was a
3/4726. I don't know what the variables are (firmware revision
definitely is one but there seem to be others) but these pseudo
devices are extremely fragile. Certain configurations just don't work
or work with strange quirks. I think it would be best to just stay
away from those.

Thanks.

--
tejun

2010-02-11 20:22:59

by Chandra Shekhar Sah

Subject: Re: port multiplier problem

Hi all,

Any suggestion?

Thanks,
Chandra
