2010-08-21 18:54:08

by Nicolas Jungers

[permalink] [raw]
Subject: possible esata regression in 2.6.35

My arm box doesn't succeed to use my esata port multiplier (addonics
sil3726 based). It was working well with 2.6.34.1 and 2.6.34.4 but not
with both 2.6.35.2 and 2.6.35.3. I haven't test other kernels.

The kernels are from http://sheeva.with-linux.com/sheeva/ with for
example the following config
http://sheeva.with-linux.com/sheeva/2.6.35.3/sheeva-2.6.35.3.config

The symptoms are in the console a loop on the esata links. Here is the
start of it:

ata2: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
ata2: edma_err_cause=00000010 pp_flags=00000000, dev connect
ata2: SError: { PHYRdyChg DevExch }
ata2: hard resetting link
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
ata2.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
ata2.00: hard resetting link
ata2.01: hard resetting link
ata2.02: hard resetting link
ata2.03: hard resetting link
ata2.04: hard resetting link
ata2.05: hard resetting link
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata2.15: hard resetting link
ata2.15: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
ata2.00: hard resetting link
ata2.01: hard resetting link
ata2.02: hard resetting link
ata2.04: hard resetting link
ata2.05: hard resetting link
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata2.15: hard resetting link
ata2.15: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
ata2.00: hard resetting link
ata2.01: hard resetting link
ata2.02: hard resetting link
ata2.03: hard resetting link
ata2.04: hard resetting link
ata2.05: hard resetting link
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata2.00: failed to recover link after 3 tries, disabling
ata2.15: hard resetting link
ata2.15: SATA link up 3.0 Gbps (SStatus 123 SControl F300)

N.


2010-08-21 19:13:30

by Mark Lord

[permalink] [raw]
Subject: Re: possible esata regression in 2.6.35

On 10-08-21 02:52 PM, Nicolas Jungers wrote:
> My arm box doesn't succeed to use my esata port multiplier (addonics sil3726 based). It was working well with 2.6.34.1 and 2.6.34.4 but not with both 2.6.35.2 and 2.6.35.3. I haven't test other kernels.
>
> The kernels are from http://sheeva.with-linux.com/sheeva/ with for example the following config http://sheeva.with-linux.com/sheeva/2.6.35.3/sheeva-2.6.35.3.config
>
> The symptoms are in the console a loop on the esata links. Here is the start of it:
>
> ata2: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
> ata2: edma_err_cause=00000010 pp_flags=00000000, dev connect
> ata2: SError: { PHYRdyChg DevExch }
> ata2: hard resetting link
> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
> ata2.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
> ata2.00: hard resetting link
> ata2.01: hard resetting link
> ata2.02: hard resetting link
> ata2.03: hard resetting link
> ata2.04: hard resetting link
> ata2.05: hard resetting link
> ata2.00: qc timeout (cmd 0xec)
> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> ata2.15: hard resetting link
> ata2.15: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
> ata2.00: hard resetting link
> ata2.01: hard resetting link
> ata2.02: hard resetting link
> ata2.04: hard resetting link
> ata2.05: hard resetting link
> ata2.00: qc timeout (cmd 0xec)
> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> ata2.15: hard resetting link
> ata2.15: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
> ata2.00: hard resetting link
> ata2.01: hard resetting link
> ata2.02: hard resetting link
> ata2.03: hard resetting link
> ata2.04: hard resetting link
> ata2.05: hard resetting link
> ata2.00: qc timeout (cmd 0xec)
> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> ata2.00: failed to recover link after 3 tries, disabling
> ata2.15: hard resetting link
> ata2.15: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
...

From the "edma_err" keyword above, we can deduce that you've got this PM
plugged into a Marvell SATA controller of some kind, managed by sata_mv.
Care to tell us more?

There are various changes in libata in 2.6.34/35 that break sata_mv,
particularly for ATAPI drives and any SSDs that support the DSM/TRIM commands.
Do either of those two things apply in your case?

If so, there are two pending patches which fix them.
They were posted to linux-ide this past Thursday,
and can also be found as attachments to this bug report:

https://bugzilla.kernel.org/show_bug.cgi?id=16434

Cheers

2010-08-21 19:54:09

by Nicolas Jungers

[permalink] [raw]
Subject: Re: possible esata regression in 2.6.35

On 08/21/2010 09:13 PM, Mark Lord wrote:
> On 10-08-21 02:52 PM, Nicolas Jungers wrote:
>> My arm box doesn't succeed to use my esata port multiplier (addonics
>> sil3726 based). It was working well with 2.6.34.1 and 2.6.34.4 but not
>> with both 2.6.35.2 and 2.6.35.3. I haven't test other kernels.
>>
>> The kernels are from http://sheeva.with-linux.com/sheeva/ with for
>> example the following config
>> http://sheeva.with-linux.com/sheeva/2.6.35.3/sheeva-2.6.35.3.config
>>
>> The symptoms are in the console a loop on the esata links. Here is the
>> start of it:
>>
>> ata2: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
>> ata2: edma_err_cause=00000010 pp_flags=00000000, dev connect
>> ata2: SError: { PHYRdyChg DevExch }
>> ata2: hard resetting link
>> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
>> ata2.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
>> ata2.00: hard resetting link
>> ata2.01: hard resetting link
>> ata2.02: hard resetting link
>> ata2.03: hard resetting link
>> ata2.04: hard resetting link
>> ata2.05: hard resetting link
>> ata2.00: qc timeout (cmd 0xec)
>> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> ata2.15: hard resetting link
>> ata2.15: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
>> ata2.00: hard resetting link
>> ata2.01: hard resetting link
>> ata2.02: hard resetting link
>> ata2.04: hard resetting link
>> ata2.05: hard resetting link
>> ata2.00: qc timeout (cmd 0xec)
>> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> ata2.15: hard resetting link
>> ata2.15: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
>> ata2.00: hard resetting link
>> ata2.01: hard resetting link
>> ata2.02: hard resetting link
>> ata2.03: hard resetting link
>> ata2.04: hard resetting link
>> ata2.05: hard resetting link
>> ata2.00: qc timeout (cmd 0xec)
>> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>> ata2.00: failed to recover link after 3 tries, disabling
>> ata2.15: hard resetting link
>> ata2.15: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
> ...
>
> From the "edma_err" keyword above, we can deduce that you've got this PM
> plugged into a Marvell SATA controller of some kind, managed by sata_mv.
> Care to tell us more?

sorry, I'm deep into it, so it's too obvious for me. Yes, it's he
marvell plug computer (esata sheevaplug), kirkwood platform with indeed
the sata_mv module taking care of the sata port.

Start of the boot:

Linux version 2.6.35.3 (kelly@speedy) (gcc version 4.4.3 (Sourcery G++
Lite er) ) #1 PREEMPT Fri Aug 20 18:22:29 MDT 2010
CPU: Feroceon 88FR131 [56251311] revision 1 (ARMv5TE), cr=00053177
CPU: VIVT data cache, VIVT instruction cache
Machine: Marvell eSATA SheevaPlug Reference Board

and later:

Waiting for /dev to be fully populated...sata_mv sata_mv.0: version 1.28
sata_mv sata_mv.0: slots 32 ports 2
scsi0 : sata_mv
scsi1 : sata_mv
ata1: SATA max UDMA/133 irq 21
ata2: SATA max UDMA/133 irq 21
ata1: SATA link down (SStatus 0 SControl F300)
ata2: SATA link down (SStatus 0 SControl F300)
done.

>
> There are various changes in libata in 2.6.34/35 that break sata_mv,
> particularly for ATAPI drives and any SSDs that support the DSM/TRIM
> commands.
> Do either of those two things apply in your case?

I couldn't tell, but the problem is with the port multiplier. When I
directly attach a sata drive, it works.

>
> If so, there are two pending patches which fix them.
> They were posted to linux-ide this past Thursday,
> and can also be found as attachments to this bug report:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=16434

I haven't compiled a kernel in ages, but I'll try that Monday and report.

>
> Cheers

thanks,
N.

2010-08-22 19:55:00

by Jeff Garzik

[permalink] [raw]
Subject: Re: possible esata regression in 2.6.35

On 08/21/2010 02:52 PM, Nicolas Jungers wrote:
> My arm box doesn't succeed to use my esata port multiplier (addonics
> sil3726 based). It was working well with 2.6.34.1 and 2.6.34.4 but not
> with both 2.6.35.2 and 2.6.35.3. I haven't test other kernels.
>
> The kernels are from http://sheeva.with-linux.com/sheeva/ with for
> example the following config
> http://sheeva.with-linux.com/sheeva/2.6.35.3/sheeva-2.6.35.3.config
>
> The symptoms are in the console a loop on the esata links. Here is the
> start of it:
>
> ata2: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
> ata2: edma_err_cause=00000010 pp_flags=00000000, dev connect
> ata2: SError: { PHYRdyChg DevExch }
> ata2: hard resetting link
> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
> ata2.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9

Can you post or link to the entire dmesg?

Notably, we need to see the probe messages to determine what SATA chip
you are using... From the edma_err_cause config I'd guess sata_mv, but
more info would be useful.

Jeff



2010-08-22 19:57:16

by Jeff Garzik

[permalink] [raw]
Subject: Re: possible esata regression in 2.6.35

On 08/22/2010 03:54 PM, Jeff Garzik wrote:
> On 08/21/2010 02:52 PM, Nicolas Jungers wrote:
>> My arm box doesn't succeed to use my esata port multiplier (addonics
>> sil3726 based). It was working well with 2.6.34.1 and 2.6.34.4 but not
>> with both 2.6.35.2 and 2.6.35.3. I haven't test other kernels.
>>
>> The kernels are from http://sheeva.with-linux.com/sheeva/ with for
>> example the following config
>> http://sheeva.with-linux.com/sheeva/2.6.35.3/sheeva-2.6.35.3.config
>>
>> The symptoms are in the console a loop on the esata links. Here is the
>> start of it:
>>
>> ata2: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
>> ata2: edma_err_cause=00000010 pp_flags=00000000, dev connect
>> ata2: SError: { PHYRdyChg DevExch }
>> ata2: hard resetting link
>> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
>> ata2.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
>
> Can you post or link to the entire dmesg?
>
> Notably, we need to see the probe messages to determine what SATA chip
> you are using... From the edma_err_cause config I'd guess sata_mv, but
> more info would be useful.

Nevermind, I see the reply (it got auto-sorted into the wrong folder
locally).

2010-08-25 19:00:47

by Maciej Rutecki

[permalink] [raw]
Subject: Re: possible esata regression in 2.6.35

On sobota, 21 sierpnia 2010 o 20:52:32 Nicolas Jungers wrote:
> My arm box doesn't succeed to use my esata port multiplier (addonics
> sil3726 based). It was working well with 2.6.34.1 and 2.6.34.4 but not
> with both 2.6.35.2 and 2.6.35.3. I haven't test other kernels.
>
> The kernels are from http://sheeva.with-linux.com/sheeva/ with for
> example the following config
> http://sheeva.with-linux.com/sheeva/2.6.35.3/sheeva-2.6.35.3.config
>
> The symptoms are in the console a loop on the esata links. Here is the
> start of it:
>
> ata2: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
> ata2: edma_err_cause=00000010 pp_flags=00000000, dev connect
> ata2: SError: { PHYRdyChg DevExch }
> ata2: hard resetting link
> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
> ata2.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
> ata2.00: hard resetting link
> ata2.01: hard resetting link
> ata2.02: hard resetting link
> ata2.03: hard resetting link
> ata2.04: hard resetting link
> ata2.05: hard resetting link
> ata2.00: qc timeout (cmd 0xec)
> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> ata2.15: hard resetting link
> ata2.15: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
> ata2.00: hard resetting link
> ata2.01: hard resetting link
> ata2.02: hard resetting link
> ata2.04: hard resetting link
> ata2.05: hard resetting link
> ata2.00: qc timeout (cmd 0xec)
> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> ata2.15: hard resetting link
> ata2.15: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
> ata2.00: hard resetting link
> ata2.01: hard resetting link
> ata2.02: hard resetting link
> ata2.03: hard resetting link
> ata2.04: hard resetting link
> ata2.05: hard resetting link
> ata2.00: qc timeout (cmd 0xec)
> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> ata2.00: failed to recover link after 3 tries, disabling
> ata2.15: hard resetting link
> ata2.15: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
>

I created a Bugzilla entry at
https://bugzilla.kernel.org/show_bug.cgi?id=17071
for your bug report, please add your address to the CC list in there, thanks!

--
Maciej Rutecki
http://www.maciek.unixy.pl

2010-08-27 08:19:59

by Gwendal Grignou

[permalink] [raw]
Subject: Re: possible esata regression in 2.6.35

I can reproduce the problem on uptsream-linux using a PC with Marvell
7042 controller and Sil3726 PMP. Without the SIl3726, it works fine.

What I can see on the SATA analyzer [I will send clean trace tomorrow]
is the disk send the DATA FIS back to the PMP, but the PMP does not
manage to have the data accepted by the host.

Non data commands work fine.

In dmesg:
[10058.404047] ata29: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[10058.404742] ata29.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6
ports, feat 0x1/0x9
[10058.405250] ata29.00: hard resetting link
[10058.809151] ata29.00: link resume succeeded after 1 retries
[10058.911613] ata29.01: hard resetting link
[10059.217572] ata29.02: hard resetting link
[10059.523572] ata29.03: hard resetting link
[10059.829505] ata29.04: hard resetting link
[10060.134572] ata29.05: hard resetting link
[10065.440079] ata29.03: qc timeout (cmd 0xec)
[10065.440085] ata29.03: failed to IDENTIFY (I/O error, err_mask=0x4)
[10065.440092] ata29.15: hard resetting link
[10065.947049] ata29.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[10065.947737] ata29.00: hard resetting link
[10066.352106] ata29.00: link resume succeeded after 1 retries
[10066.454616] ata29.01: hard resetting link
[10066.760591] ata29.02: hard resetting link
[10067.066570] ata29.03: hard resetting link
[10067.372505] ata29.05: hard resetting link
[10077.677071] ata29.03: qc timeout (cmd 0xec)
[10077.677076] ata29.03: failed to IDENTIFY (I/O error, err_mask=0x4)
[10077.677081] ata29.15: hard resetting link
[10078.184073] ata29.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[10078.184783] ata29.00: hard resetting link
[10078.589185] ata29.00: link resume succeeded after 1 retries
[10078.691593] ata29.01: hard resetting link
[10078.997592] ata29.02: hard resetting link
[10079.303571] ata29.03: hard resetting link
[10079.609505] ata29.04: hard resetting link
[10079.914572] ata29.05: hard resetting link
[10110.220078] ata29.03: qc timeout (cmd 0xec)
[10110.220083] ata29.03: failed to IDENTIFY (I/O error, err_mask=0x4)
[10110.220087] ata29.03: failed to recover link after 3 tries, disabling
[10110.220094] ata29.15: hard resetting link
[10110.727044] ata29.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[10111.033375] ata29.00: hard resetting link
[10111.338571] ata29.01: hard resetting link
[10111.644591] ata29.02: hard resetting link
[10111.950594] ata29.05: hard resetting link
[10112.256643] ata29: EH complete


Gwendal.




On Sun, Aug 22, 2010 at 12:57 PM, Jeff Garzik <[email protected]> wrote:
> On 08/22/2010 03:54 PM, Jeff Garzik wrote:
>>
>> On 08/21/2010 02:52 PM, Nicolas Jungers wrote:
>>>
>>> My arm box doesn't succeed to use my esata port multiplier (addonics
>>> sil3726 based). It was working well with 2.6.34.1 and 2.6.34.4 but not
>>> with both 2.6.35.2 and 2.6.35.3. I haven't test other kernels.
>>>
>>> The kernels are from http://sheeva.with-linux.com/sheeva/ with for
>>> example the following config
>>> http://sheeva.with-linux.com/sheeva/2.6.35.3/sheeva-2.6.35.3.config
>>>
>>> The symptoms are in the console a loop on the esata links. Here is the
>>> start of it:
>>>
>>> ata2: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
>>> ata2: edma_err_cause=00000010 pp_flags=00000000, dev connect
>>> ata2: SError: { PHYRdyChg DevExch }
>>> ata2: hard resetting link
>>> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
>>> ata2.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
>>
>> Can you post or link to the entire dmesg?
>>
>> Notably, we need to see the probe messages to determine what SATA chip
>> you are using... From the edma_err_cause config I'd guess sata_mv, but
>> more info would be useful.
>
> Nevermind, I see the reply (it got auto-sorted into the wrong folder
> locally).
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>

2010-08-27 22:56:11

by Gwendal Grignou

[permalink] [raw]
Subject: Re: possible esata regression in 2.6.35

I found the problem: in ata_sff_pio_stack:

struct ata_queued_cmd *qc = ap->port_task_data;

has been replaced by:

/* qc can be NULL if timeout occurred */
qc = ata_qc_from_tag(ap, ap->link.active_tag);
if (!qc)
return;

That does not work in case of port multipler, because the link to look
at is not ap->link. ap->link.active_tag is ATA_POISON.

I will submit a patch where I re-introduce port_task_data, this time
containing the link to to look at.

Gwendal.



On Fri, Aug 27, 2010 at 12:29 PM, Gwendal Grignou <[email protected]> wrote:
> I include the trace. You need windows and a lecroy satasuite to look
> at the trace.
>
> Looking at extended traces, the identify never move in HSM:
>
> Aug 27 12:07:53 halab-59 kernel: [ ?548.613529] ata_sff_flush_pio_task: ENTER
> Aug 27 12:07:53 halab-59 kernel: [ ?548.613561] ata_sff_exec_command:
> ata13: cmd 0xE4
> Aug 27 12:07:53 halab-59 kernel: [ ?548.613577] ata_sff_hsm_move:
> ata13: protocol 1 task_state 3 (dev_stat 0x50)
> Aug 27 12:07:53 halab-59 kernel: [ ?548.613580] ata_sff_hsm_move:
> ata13: dev 0 command complete, drv_stat 0x50
> Aug 27 12:07:53 halab-59 kernel: [ ?548.613605] ata_sff_flush_pio_task: ENTER
> Aug 27 12:07:53 halab-59 kernel: [ ?548.613637] ata_sff_exec_command:
> ata13: cmd 0xE4
> Aug 27 12:07:53 halab-59 kernel: [ ?548.613654] ata_sff_hsm_move:
> ata13: protocol 1 task_state 3 (dev_stat 0x50)
> Aug 27 12:07:53 halab-59 kernel: [ ?548.613656] ata_sff_hsm_move:
> ata13: dev 0 command complete, drv_stat 0x50
> Aug 27 12:07:53 halab-59 kernel: [ ?548.613681] ata_sff_flush_pio_task: ENTER
> Aug 27 12:07:53 halab-59 kernel: [ ?548.613688]
> ata_eh_revalidate_and_attach: ENTER
> Aug 27 12:07:53 halab-59 kernel: [ ?548.613691]
> ata_eh_revalidate_and_attach: ENTER
> Aug 27 12:07:53 halab-59 kernel: [ ?548.613693]
> ata_eh_revalidate_and_attach: ENTER
> Aug 27 12:07:53 halab-59 kernel: [ ?548.613694]
> ata_eh_revalidate_and_attach: ENTER
> Aug 27 12:07:53 halab-59 kernel: [ ?548.613725] ata_sff_exec_command:
> ata13: cmd 0xEC
> ...
> Aug 27 12:08:23 halab-59 kernel: [ ?578.613048] __ata_port_freeze:
> ata13 port frozen
> Aug 27 12:08:23 halab-59 kernel: [ ?578.613058] ata13.03: qc timeout (cmd 0xec)
> Aug 27 12:08:23 halab-59 kernel: [ ?578.613062] ata13.03: failed to
> IDENTIFY (I/O error, err_mask=0x4)
>
>
> On Fri, Aug 27, 2010 at 1:19 AM, Gwendal Grignou <[email protected]> wrote:
>> I can reproduce the problem on uptsream-linux using a PC with Marvell
>> 7042 controller and Sil3726 PMP. Without the SIl3726, it works fine.
>>
>> What I can see on the SATA analyzer [I will send clean trace tomorrow]
>> is the disk send the DATA FIS back to the PMP, but the PMP does not
>> manage to have the data accepted by the host.
>>
>> Non data commands work fine.
>>
>> In dmesg:
>> [10058.404047] ata29: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> [10058.404742] ata29.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6
>> ports, feat 0x1/0x9
>> [10058.405250] ata29.00: hard resetting link
>> [10058.809151] ata29.00: link resume succeeded after 1 retries
>> [10058.911613] ata29.01: hard resetting link
>> [10059.217572] ata29.02: hard resetting link
>> [10059.523572] ata29.03: hard resetting link
>> [10059.829505] ata29.04: hard resetting link
>> [10060.134572] ata29.05: hard resetting link
>> [10065.440079] ata29.03: qc timeout (cmd 0xec)
>> [10065.440085] ata29.03: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [10065.440092] ata29.15: hard resetting link
>> [10065.947049] ata29.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> [10065.947737] ata29.00: hard resetting link
>> [10066.352106] ata29.00: link resume succeeded after 1 retries
>> [10066.454616] ata29.01: hard resetting link
>> [10066.760591] ata29.02: hard resetting link
>> [10067.066570] ata29.03: hard resetting link
>> [10067.372505] ata29.05: hard resetting link
>> [10077.677071] ata29.03: qc timeout (cmd 0xec)
>> [10077.677076] ata29.03: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [10077.677081] ata29.15: hard resetting link
>> [10078.184073] ata29.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> [10078.184783] ata29.00: hard resetting link
>> [10078.589185] ata29.00: link resume succeeded after 1 retries
>> [10078.691593] ata29.01: hard resetting link
>> [10078.997592] ata29.02: hard resetting link
>> [10079.303571] ata29.03: hard resetting link
>> [10079.609505] ata29.04: hard resetting link
>> [10079.914572] ata29.05: hard resetting link
>> [10110.220078] ata29.03: qc timeout (cmd 0xec)
>> [10110.220083] ata29.03: failed to IDENTIFY (I/O error, err_mask=0x4)
>> [10110.220087] ata29.03: failed to recover link after 3 tries, disabling
>> [10110.220094] ata29.15: hard resetting link
>> [10110.727044] ata29.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> [10111.033375] ata29.00: hard resetting link
>> [10111.338571] ata29.01: hard resetting link
>> [10111.644591] ata29.02: hard resetting link
>> [10111.950594] ata29.05: hard resetting link
>> [10112.256643] ata29: EH complete
>>
>>
>> Gwendal.
>>
>>
>>
>>
>> On Sun, Aug 22, 2010 at 12:57 PM, Jeff Garzik <[email protected]> wrote:
>>> On 08/22/2010 03:54 PM, Jeff Garzik wrote:
>>>>
>>>> On 08/21/2010 02:52 PM, Nicolas Jungers wrote:
>>>>>
>>>>> My arm box doesn't succeed to use my esata port multiplier (addonics
>>>>> sil3726 based). It was working well with 2.6.34.1 and 2.6.34.4 but not
>>>>> with both 2.6.35.2 and 2.6.35.3. I haven't test other kernels.
>>>>>
>>>>> The kernels are from http://sheeva.with-linux.com/sheeva/ with for
>>>>> example the following config
>>>>> http://sheeva.with-linux.com/sheeva/2.6.35.3/sheeva-2.6.35.3.config
>>>>>
>>>>> The symptoms are in the console a loop on the esata links. Here is the
>>>>> start of it:
>>>>>
>>>>> ata2: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
>>>>> ata2: edma_err_cause=00000010 pp_flags=00000000, dev connect
>>>>> ata2: SError: { PHYRdyChg DevExch }
>>>>> ata2: hard resetting link
>>>>> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
>>>>> ata2.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
>>>>
>>>> Can you post or link to the entire dmesg?
>>>>
>>>> Notably, we need to see the probe messages to determine what SATA chip
>>>> you are using... From the edma_err_cause config I'd guess sata_mv, but
>>>> more info would be useful.
>>>
>>> Nevermind, I see the reply (it got auto-sorted into the wrong folder
>>> locally).
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
>>> the body of a message to [email protected]
>>> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>>>
>>
>

2010-08-29 00:40:19

by Gwendal Grignou

[permalink] [raw]
Subject: Re: possible esata regression in 2.6.35

I submit a patch, but messed up with the --in-reply-to argument. Check
"libata-sff: Reenable Port Multiplier after libata-sff remodeling."
thread.

I can now see my drive:
Aug 27 17:42:36 halab-59 kernel: [ 103.631293] ata_sff_exec_command:
ata5: cmd 0xEC
Aug 27 17:42:36 halab-59 kernel: [ 103.634050] ata_sff_hsm_move:
ata5: protocol 2 task_state 2 (dev_stat 0x58)
Aug 27 17:42:36 halab-59 kernel: [ 103.634053] ata_pio_sector: data read
Aug 27 17:42:36 halab-59 kernel: [ 103.634228] ata_sff_hsm_move:
ata5: protocol 2 task_state 3 (dev_stat 0x50)
Aug 27 17:42:36 halab-59 kernel: [ 103.634231] ata_sff_hsm_move:
ata5: dev 0 command complete, drv_stat 0x50

In general BMDMA does not support PMP, it assumes all qc are on ap->host.
sata_mv does it sometimes, I will submit a patch for that, after I
test the error path through-fully.

Gwendal.


On Fri, Aug 27, 2010 at 3:56 PM, Gwendal Grignou <[email protected]> wrote:
> I found the problem: in ata_sff_pio_stack:
>
> struct ata_queued_cmd *qc = ap->port_task_data;
>
> has been replaced by:
>
> /* qc can be NULL if timeout occurred */
> qc = ata_qc_from_tag(ap, ap->link.active_tag);
> if (!qc)
> ? ? ? ?return;
>
> That does not work in case of port multipler, because the link to look
> at is not ap->link. ap->link.active_tag is ATA_POISON.
>
> I will submit a patch where I re-introduce port_task_data, this time
> containing the link to to look at.
>
> Gwendal.
>
>
>
> On Fri, Aug 27, 2010 at 12:29 PM, Gwendal Grignou <[email protected]> wrote:
>> I include the trace. You need windows and a lecroy satasuite to look
>> at the trace.
>>
>> Looking at extended traces, the identify never move in HSM:
>>
>> Aug 27 12:07:53 halab-59 kernel: [ ?548.613529] ata_sff_flush_pio_task: ENTER
>> Aug 27 12:07:53 halab-59 kernel: [ ?548.613561] ata_sff_exec_command:
>> ata13: cmd 0xE4
>> Aug 27 12:07:53 halab-59 kernel: [ ?548.613577] ata_sff_hsm_move:
>> ata13: protocol 1 task_state 3 (dev_stat 0x50)
>> Aug 27 12:07:53 halab-59 kernel: [ ?548.613580] ata_sff_hsm_move:
>> ata13: dev 0 command complete, drv_stat 0x50
>> Aug 27 12:07:53 halab-59 kernel: [ ?548.613605] ata_sff_flush_pio_task: ENTER
>> Aug 27 12:07:53 halab-59 kernel: [ ?548.613637] ata_sff_exec_command:
>> ata13: cmd 0xE4
>> Aug 27 12:07:53 halab-59 kernel: [ ?548.613654] ata_sff_hsm_move:
>> ata13: protocol 1 task_state 3 (dev_stat 0x50)
>> Aug 27 12:07:53 halab-59 kernel: [ ?548.613656] ata_sff_hsm_move:
>> ata13: dev 0 command complete, drv_stat 0x50
>> Aug 27 12:07:53 halab-59 kernel: [ ?548.613681] ata_sff_flush_pio_task: ENTER
>> Aug 27 12:07:53 halab-59 kernel: [ ?548.613688]
>> ata_eh_revalidate_and_attach: ENTER
>> Aug 27 12:07:53 halab-59 kernel: [ ?548.613691]
>> ata_eh_revalidate_and_attach: ENTER
>> Aug 27 12:07:53 halab-59 kernel: [ ?548.613693]
>> ata_eh_revalidate_and_attach: ENTER
>> Aug 27 12:07:53 halab-59 kernel: [ ?548.613694]
>> ata_eh_revalidate_and_attach: ENTER
>> Aug 27 12:07:53 halab-59 kernel: [ ?548.613725] ata_sff_exec_command:
>> ata13: cmd 0xEC
>> ...
>> Aug 27 12:08:23 halab-59 kernel: [ ?578.613048] __ata_port_freeze:
>> ata13 port frozen
>> Aug 27 12:08:23 halab-59 kernel: [ ?578.613058] ata13.03: qc timeout (cmd 0xec)
>> Aug 27 12:08:23 halab-59 kernel: [ ?578.613062] ata13.03: failed to
>> IDENTIFY (I/O error, err_mask=0x4)
>>
>>
>> On Fri, Aug 27, 2010 at 1:19 AM, Gwendal Grignou <[email protected]> wrote:
>>> I can reproduce the problem on uptsream-linux using a PC with Marvell
>>> 7042 controller and Sil3726 PMP. Without the SIl3726, it works fine.
>>>
>>> What I can see on the SATA analyzer [I will send clean trace tomorrow]
>>> is the disk send the DATA FIS back to the PMP, but the PMP does not
>>> manage to have the data accepted by the host.
>>>
>>> Non data commands work fine.
>>>
>>> In dmesg:
>>> [10058.404047] ata29: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>> [10058.404742] ata29.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6
>>> ports, feat 0x1/0x9
>>> [10058.405250] ata29.00: hard resetting link
>>> [10058.809151] ata29.00: link resume succeeded after 1 retries
>>> [10058.911613] ata29.01: hard resetting link
>>> [10059.217572] ata29.02: hard resetting link
>>> [10059.523572] ata29.03: hard resetting link
>>> [10059.829505] ata29.04: hard resetting link
>>> [10060.134572] ata29.05: hard resetting link
>>> [10065.440079] ata29.03: qc timeout (cmd 0xec)
>>> [10065.440085] ata29.03: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [10065.440092] ata29.15: hard resetting link
>>> [10065.947049] ata29.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>> [10065.947737] ata29.00: hard resetting link
>>> [10066.352106] ata29.00: link resume succeeded after 1 retries
>>> [10066.454616] ata29.01: hard resetting link
>>> [10066.760591] ata29.02: hard resetting link
>>> [10067.066570] ata29.03: hard resetting link
>>> [10067.372505] ata29.05: hard resetting link
>>> [10077.677071] ata29.03: qc timeout (cmd 0xec)
>>> [10077.677076] ata29.03: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [10077.677081] ata29.15: hard resetting link
>>> [10078.184073] ata29.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>> [10078.184783] ata29.00: hard resetting link
>>> [10078.589185] ata29.00: link resume succeeded after 1 retries
>>> [10078.691593] ata29.01: hard resetting link
>>> [10078.997592] ata29.02: hard resetting link
>>> [10079.303571] ata29.03: hard resetting link
>>> [10079.609505] ata29.04: hard resetting link
>>> [10079.914572] ata29.05: hard resetting link
>>> [10110.220078] ata29.03: qc timeout (cmd 0xec)
>>> [10110.220083] ata29.03: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [10110.220087] ata29.03: failed to recover link after 3 tries, disabling
>>> [10110.220094] ata29.15: hard resetting link
>>> [10110.727044] ata29.15: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>>> [10111.033375] ata29.00: hard resetting link
>>> [10111.338571] ata29.01: hard resetting link
>>> [10111.644591] ata29.02: hard resetting link
>>> [10111.950594] ata29.05: hard resetting link
>>> [10112.256643] ata29: EH complete
>>>
>>>
>>> Gwendal.
>>>
>>>
>>>
>>>
>>> On Sun, Aug 22, 2010 at 12:57 PM, Jeff Garzik <[email protected]> wrote:
>>>> On 08/22/2010 03:54 PM, Jeff Garzik wrote:
>>>>>
>>>>> On 08/21/2010 02:52 PM, Nicolas Jungers wrote:
>>>>>>
>>>>>> My arm box doesn't succeed to use my esata port multiplier (addonics
>>>>>> sil3726 based). It was working well with 2.6.34.1 and 2.6.34.4 but not
>>>>>> with both 2.6.35.2 and 2.6.35.3. I haven't test other kernels.
>>>>>>
>>>>>> The kernels are from http://sheeva.with-linux.com/sheeva/ with for
>>>>>> example the following config
>>>>>> http://sheeva.with-linux.com/sheeva/2.6.35.3/sheeva-2.6.35.3.config
>>>>>>
>>>>>> The symptoms are in the console a loop on the esata links. Here is the
>>>>>> start of it:
>>>>>>
>>>>>> ata2: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
>>>>>> ata2: edma_err_cause=00000010 pp_flags=00000000, dev connect
>>>>>> ata2: SError: { PHYRdyChg DevExch }
>>>>>> ata2: hard resetting link
>>>>>> ata2: SATA link up 3.0 Gbps (SStatus 123 SControl F300)
>>>>>> ata2.15: Port Multiplier 1.1, 0x1095:0x3726 r23, 6 ports, feat 0x1/0x9
>>>>>
>>>>> Can you post or link to the entire dmesg?
>>>>>
>>>>> Notably, we need to see the probe messages to determine what SATA chip
>>>>> you are using... From the edma_err_cause config I'd guess sata_mv, but
>>>>> more info would be useful.
>>>>
>>>> Nevermind, I see the reply (it got auto-sorted into the wrong folder
>>>> locally).
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
>>>> the body of a message to [email protected]
>>>> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>
>