2013-05-29 15:51:25

by Mark Langsdorf

Subject: [PATCH] ata: increase retry count but shorten duration for Calxeda controller

The Calxeda SATA phy intermittently fails to bring up a link with Gen3.
Retrying the phy hard reset can work around the issue, but the drive
may fail again. In fewer than 150 out of 15000 test runs, it took more
than 10 tries for the link to be established (but never more than 35).
Increase the retry count to guarantee the link is established.

Also, the default 2 second time-out on a failed drive is too long in
this situation. Shorten it to 500 ms. This was also tested 15000 times
on 24 drives and none of them experienced a time-out.

Signed-off-by: Mark Langsdorf <[email protected]>
---
drivers/ata/sata_highbank.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/sata_highbank.c b/drivers/ata/sata_highbank.c
index 0d7c4c2..536936f 100644
--- a/drivers/ata/sata_highbank.c
+++ b/drivers/ata/sata_highbank.c
@@ -199,7 +199,7 @@ static int highbank_initialize_phys(struct device *dev, void __iomem *addr)
static int ahci_highbank_hardreset(struct ata_link *link, unsigned int *class,
unsigned long deadline)
{
- const unsigned long *timing = sata_ehc_deb_timing(&link->eh_context);
+ unsigned long timing[] = { 5, 100, 500};
struct ata_port *ap = link->ap;
struct ahci_port_priv *pp = ap->private_data;
u8 *d2h_fis = pp->rx_fis + RX_FIS_D2H_REG;
@@ -207,7 +207,7 @@ static int ahci_highbank_hardreset(struct ata_link *link, unsigned int *class,
bool online;
u32 sstatus;
int rc;
- int retry = 10;
+ int retry = 100;

ahci_stop_engine(ap);

--
1.7.10.4
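The retry reasoning above can be illustrated with a minimal, self-contained sketch. It is not the driver's code: `try_hardreset()` and `bring_up_link()` are hypothetical stand-ins that model a link needing a fixed number of hard reset attempts, so the effect of the retry budget on the worst case reported above (35 tries) can be checked directly.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one phy hard reset attempt (not a libata
 * function): the link comes up only once 'attempt' reaches
 * 'succeed_on' (both 1-based). */
static bool try_hardreset(int attempt, int succeed_on)
{
	return attempt >= succeed_on;
}

/* Bounded retry loop: returns the 1-based attempt on which the link
 * came up, or -1 if all 'max_tries' attempts failed. */
static int bring_up_link(int max_tries, int succeed_on)
{
	for (int attempt = 1; attempt <= max_tries; attempt++) {
		if (try_hardreset(attempt, succeed_on))
			return attempt;
	}
	return -1;
}
```

With a budget of 10 attempts, `bring_up_link(10, 35)` returns -1 and the link is never established; with a budget of 100, `bring_up_link(100, 35)` returns 35, leaving 65 attempts of headroom. The sketch counts total attempts; how the driver's own `retry` counter maps onto attempts depends on its loop.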


2013-05-29 20:13:16

by Timur Tabi

Subject: Re: [PATCH] ata: increase retry count but shorten duration for Calxeda controller

On Wed, May 29, 2013 at 10:51 AM, Mark Langsdorf
<[email protected]> wrote:
>
> {
> - const unsigned long *timing = sata_ehc_deb_timing(&link->eh_context);
> + unsigned long timing[] = { 5, 100, 500};


You didn't address my comments the last time you posted this. I'll
post them again:


Why are you dropping the 'const'?

Assuming it works, this should be more efficient:

static const unsigned long timing[] = {5, 100, 500};
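A minimal sketch of what this suggestion buys, assuming ordinary C semantics: a `static const` array has static storage duration and is read-only, so a compiler can typically emit one copy in a read-only section instead of filling a stack array on every call. `hb_deb_timing()` is a hypothetical helper, not part of the driver; it exists only to show that every call sees the same object.

```c
/* Hypothetical helper returning the hard reset debounce timing table
 * (milliseconds). 'static const' gives one read-only object with static
 * storage duration, shared by every call. */
static const unsigned long *hb_deb_timing(void)
{
	static const unsigned long timing[] = { 5, 100, 500 };

	return timing;
}
```

Because the array outlives each call, returning its address is safe here; doing the same with the non-static `unsigned long timing[]` from the patch would return a dangling pointer.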

2013-05-29 20:35:37

by Mark Langsdorf

Subject: Re: [PATCH] ata: increase retry count but shorten duration for Calxeda controller

On 05/29/2013 03:12 PM, Timur Tabi wrote:
> On Wed, May 29, 2013 at 10:51 AM, Mark Langsdorf
> <[email protected]> wrote:
>>
>> {
>> - const unsigned long *timing = sata_ehc_deb_timing(&link->eh_context);
>> + unsigned long timing[] = { 5, 100, 500};
>
>
> You didn't address my comments the last time you posted this. I'll
> post them again:
>
>
> Why are you dropping the 'const'?
>
> Assuming it works, this should be more efficient:
>
> static const unsigned long timing[] = {5, 100, 500};

I thought there was a compile issue, but I just rechecked and there
wasn't. I'll fix for the next submission.

Thanks for the review.

--Mark Langsdorf

2013-05-30 06:59:09

by Tejun Heo

Subject: Re: [PATCH] ata: increase retry count but shorten duration for Calxeda controller

On Wed, May 29, 2013 at 03:35:28PM -0500, Mark Langsdorf wrote:
> On 05/29/2013 03:12 PM, Timur Tabi wrote:
> > On Wed, May 29, 2013 at 10:51 AM, Mark Langsdorf
> > <[email protected]> wrote:
> >>
> >> {
> >> - const unsigned long *timing = sata_ehc_deb_timing(&link->eh_context);
> >> + unsigned long timing[] = { 5, 100, 500};
> >
> >
> > You didn't address my comments the last time you posted this. I'll
> > post them again:
> >
> >
> > Why are you dropping the 'const'?
> >
> > Assuming it works, this should be more efficient:
> >
> > static const unsigned long timing[] = {5, 100, 500};
>
> I thought there was a compile issue, but I just rechecked and there
> wasn't. I'll fix for the next submission.

Also, please add a comment explaining why those parameters are
necessary and how they're determined, i.e. the bulk of the commit
message; otherwise, it looks pretty random.

Thanks.

--
tejun

2013-05-31 15:27:35

by Mark Langsdorf

Subject: [PATCH v3] ata: increase retry count but shorten duration for Calxeda controller

The Calxeda SATA phy intermittently fails to bring up a link with Gen3.
Retrying the phy hard reset can work around the issue, but the drive
may fail again. In fewer than 150 out of 15000 test runs, it took more
than 10 tries for the link to be established (but never more than 35).
Roughly triple the maximum observed retry count to provide plenty of
margin for rare events and to guarantee that the link is established.

Also, the default 2 second time-out on a failed drive is too long in
this situation. The U-Boot implementation of the same driver function
uses a much shorter time-out period and never experiences a time-out
issue. Shorten the Linux time-out value for this driver to 500 ms and
keep the other timing constants the same as the stock AHCI driver. This
change was also tested 15000 times on 24 drives and none of them
experienced a time-out.

Signed-off-by: Mark Langsdorf <[email protected]>
---
Changes from v2
Add static to the timing variable definition
Changes from v1
Add const to the timing variable definition
Add more detail on why the various numbers were chosen

drivers/ata/sata_highbank.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/sata_highbank.c b/drivers/ata/sata_highbank.c
index b20aa96..46ccc1c 100644
--- a/drivers/ata/sata_highbank.c
+++ b/drivers/ata/sata_highbank.c
@@ -199,7 +199,7 @@ static int highbank_initialize_phys(struct device *dev, void __iomem *addr)
static int ahci_highbank_hardreset(struct ata_link *link, unsigned int *class,
unsigned long deadline)
{
- const unsigned long *timing = sata_ehc_deb_timing(&link->eh_context);
+ static const unsigned long timing[] = { 5, 100, 500};
struct ata_port *ap = link->ap;
struct ahci_port_priv *pp = ap->private_data;
u8 *d2h_fis = pp->rx_fis + RX_FIS_D2H_REG;
@@ -207,7 +207,7 @@ static int ahci_highbank_hardreset(struct ata_link *link, unsigned int *class,
bool online;
u32 sstatus;
int rc;
- int retry = 10;
+ int retry = 100;

ahci_stop_engine(ap);

--
1.8.1.2
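The timing change can be checked with a little arithmetic. libata's debounce tables are laid out as `{ interval, duration, timeout }` in milliseconds, and `sata_deb_timing_normal`, one of the tables `sata_ehc_deb_timing()` can return, is `{ 5, 100, 2000 }`; the patch keeps the first two entries and cuts the timeout to 500. The sketch below, with hypothetical names, computes an upper bound on the time spent waiting out debounce timeouts if every attempt ran to its full timeout. It ignores the `deadline` passed to the hard reset, which caps the real wait, so treat it as an illustration rather than a measurement.

```c
/* Index names for libata-style debounce timing tables (all in ms).
 * These names are hypothetical; libata indexes the arrays directly. */
enum { DEB_INTERVAL, DEB_DURATION, DEB_TIMEOUT };

static const unsigned long stock_deb[] = { 5, 100, 2000 }; /* like sata_deb_timing_normal */
static const unsigned long hb_deb[]    = { 5, 100, 500 };  /* values from this patch */

/* Upper bound on time spent in debounce timeouts if all 'attempts'
 * tries ran to the full timeout (ignores the caller's deadline). */
static unsigned long worst_case_ms(const unsigned long *timing,
				   unsigned long attempts)
{
	return attempts * timing[DEB_TIMEOUT];
}
```

For the worst case observed in testing (35 attempts), the bound drops from 70000 ms with the stock timeout to 17500 ms with the 500 ms timeout, which illustrates why a larger retry count pairs with a shorter per-attempt timeout.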

2013-06-02 08:00:21

by Tejun Heo

Subject: Re: [PATCH v3] ata: increase retry count but shorten duration for Calxeda controller

On Fri, May 31, 2013 at 10:27:26AM -0500, Mark Langsdorf wrote:
> The Calxeda SATA phy intermittently fails to bring up a link with Gen3.
> Retrying the phy hard reset can work around the issue, but the drive
> may fail again. In fewer than 150 out of 15000 test runs, it took more
> than 10 tries for the link to be established (but never more than 35).
> Roughly triple the maximum observed retry count to provide plenty of
> margin for rare events and to guarantee that the link is established.
>
> Also, the default 2 second time-out on a failed drive is too long in
> this situation. The U-Boot implementation of the same driver function
> uses a much shorter time-out period and never experiences a time-out
> issue. Shorten the Linux time-out value for this driver to 500 ms and
> keep the other timing constants the same as the stock AHCI driver. This
> change was also tested 15000 times on 24 drives and none of them
> experienced a time-out.

For the third time, explain the above in the comment; otherwise, it's
not going in.

--
tejun

2013-06-03 12:27:05

by Mark Langsdorf

Subject: Re: [PATCH v3] ata: increase retry count but shorten duration for Calxeda controller

On 06/02/2013 03:00 AM, Tejun Heo wrote:
> On Fri, May 31, 2013 at 10:27:26AM -0500, Mark Langsdorf wrote:
>
> For the third time, explain the above in the comment; otherwise, it's
> not going in.

Sorry, I completely misread your requirement. I'll move it to the
comment as requested.

--Mark Langsdorf
Calxeda, Inc.

2013-06-03 13:22:53

by Mark Langsdorf

Subject: [PATCH v4] ata: increase retry count but shorten duration for Calxeda controller

Increase the retry count for the hard reset function to 100 but
shorten the time-out period to 500 ms. See the comment for
ahci_highbank_hardreset for the reasons why those values were
chosen.

Signed-off-by: Mark Langsdorf <[email protected]>
---
Changes from v3
Move the detail to a comment on the ahci_highbank_hardreset function
Changes from v2
Add static to the timing variable definition
Changes from v1
Add const to the timing variable definition
Add more detail on why the various numbers were chosen

drivers/ata/sata_highbank.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/sata_highbank.c b/drivers/ata/sata_highbank.c
index b20aa96..c846fd3 100644
--- a/drivers/ata/sata_highbank.c
+++ b/drivers/ata/sata_highbank.c
@@ -196,10 +196,26 @@ static int highbank_initialize_phys(struct device *dev, void __iomem *addr)
return 0;
}

+/*
+ * The Calxeda SATA phy intermittently fails to bring up a link with Gen3.
+ * Retrying the phy hard reset can work around the issue, but the drive
+ * may fail again. In fewer than 150 out of 15000 test runs, it took more
+ * than 10 tries for the link to be established (but never more than 35).
+ * Roughly triple the maximum observed retry count to provide plenty of
+ * margin for rare events and to guarantee that the link is established.
+ *
+ * Also, the default 2 second time-out on a failed drive is too long in
+ * this situation. The U-Boot implementation of the same driver function
+ * uses a much shorter time-out period and never experiences a time-out
+ * issue. Reducing the time-out to 500 ms improves the responsiveness.
+ * The other timing constants were kept the same as the stock AHCI driver.
+ * This change was also tested 15000 times on 24 drives and none of them
+ * experienced a time-out.
+ */
static int ahci_highbank_hardreset(struct ata_link *link, unsigned int *class,
unsigned long deadline)
{
- const unsigned long *timing = sata_ehc_deb_timing(&link->eh_context);
+ static const unsigned long timing[] = { 5, 100, 500};
struct ata_port *ap = link->ap;
struct ahci_port_priv *pp = ap->private_data;
u8 *d2h_fis = pp->rx_fis + RX_FIS_D2H_REG;
@@ -207,7 +223,7 @@ static int ahci_highbank_hardreset(struct ata_link *link, unsigned int *class,
bool online;
u32 sstatus;
int rc;
- int retry = 10;
+ int retry = 100;

ahci_stop_engine(ap);

--
1.8.1.2

2013-06-03 20:39:48

by Tejun Heo

Subject: Re: [PATCH v4] ata: increase retry count but shorten duration for Calxeda controller

On Mon, Jun 03, 2013 at 08:22:54AM -0500, Mark Langsdorf wrote:
> Increase the retry count for the hard reset function to 100 but
> shorten the time-out period to 500 ms. See the comment for
> ahci_highbank_hardreset for the reasons why those values were
> chosen.
>
> Signed-off-by: Mark Langsdorf <[email protected]>

Applied to libata/for-3.10-fixes w/ stable cc'd.

Thanks!

--
tejun