2007-05-10 07:20:55

by Paul Mundt

[permalink] [raw]
Subject: libata reset-seq merge broke sata_sil on sh

Current git fails to boot via SATA on SH with the recent libata merge:

sata_sil 0000:00:01.0: cache line size not set. Driver may not function
scsi0 : sata_sil
scsi1 : sata_sil
ata1: SATA max UDMA/100 cmd 0xfd000280 ctl 0xfd00028a bmdma 0xfd000200 irq 0
ata2: SATA max UDMA/100 cmd 0xfd0002c0 ctl 0xfd0002ca bmdma 0xfd000208 irq 0
ata1: device not ready (errno=-19), forcing hardreset
ata1: COMRESET failed (errno=-19)
ata1: reset failed (errno=-19), retrying in 10 secs
ata1: COMRESET failed (errno=-19)
ata1: reset failed (errno=-19), retrying in 10 secs
ata1: COMRESET failed (errno=-19)
ata1: reset failed (errno=-19), retrying in 35 secs
ata1: COMRESET failed (errno=-19)
ata1: reset failed, giving up
ata2: SATA link down (SStatus 0 SControl 310)
...

bisect points to the reset-seq merge as the bad change. Going back
a bit further from this, these are where the problems first started
showing up (but still managed to find the disk, which current git is
not able to).

At b8cffc6ad8c000410186815b7bcc6b76ef1bbb13 it's still working, even
though it's started complaining about the reset..

sata_sil 0000:00:01.0: version 2.2
sata_sil 0000:00:01.0: cache line size not set. Driver may not function
scsi0 : sata_sil
scsi1 : sata_sil
ata1: SATA max UDMA/100 cmd 0xfd000280 ctl 0xfd00028a bmdma 0xfd000200 irq 0
ata2: SATA max UDMA/100 cmd 0xfd0002c0 ctl 0xfd0002ca bmdma 0xfd000208 irq 0
ata1: device not ready (errno=-19), forcing hardreset
ata1: COMRESET failed (errno=-19)
ata1: hardreset failed, retrying in 5 secs
ata1: COMRESET failed (errno=-19)
ata1: hardreset failed, retrying in 5 secs
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
ata1.00: ATA-5: HHD424020F7SV00, 00MLA0A5, max UDMA/100
ata1.00: 39070080 sectors, multi 0: LBA
ata1.00: applying bridge limits
ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
ata1.00: configured for UDMA/100
ata2: SATA link down (SStatus 0 SControl 310)
isa bounce pool size: 16 pages
scsi 0:0:0:0: Direct-Access ATA HHD424020F7SV00 00ML PQ: 0 ANSI: 5
SCSI device sda: 39070080 512-byte hdwr sectors (20004 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
SCSI device sda: 39070080 512-byte hdwr sectors (20004 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2 sda3 sda4
sd 0:0:0:0: Attached scsi disk sda

at 27c78b372d05e47bbd059c9bb003c6d716abff54 the retry time was changed
to 10 seconds, which still manages to find the device..

sata_sil 0000:00:01.0: cache line size not set. Driver may not function
scsi0 : sata_sil
scsi1 : sata_sil
ata1: SATA max UDMA/100 cmd 0xfd000280 ctl 0xfd00028a bmdma 0xfd000200 irq 0
ata2: SATA max UDMA/100 cmd 0xfd0002c0 ctl 0xfd0002ca bmdma 0xfd000208 irq 0
ata1: device not ready (errno=-19), forcing hardreset
ata1: COMRESET failed (errno=-19)
ata1: reset failed (errno=-19), retrying in 10 secs
ata1: COMRESET failed (errno=-19)
ata1: reset failed (errno=-19), retrying in 10 secs
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
ata1.00: ATA-5: HHD424020F7SV00, 00MLA0A5, max UDMA/100
ata1.00: 39070080 sectors, multi 0: LBA
ata1.00: applying bridge limits
ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
ata1.00: configured for UDMA/100
ata2: SATA link down (SStatus 0 SControl 310)
isa bounce pool size: 16 pages
scsi 0:0:0:0: Direct-Access ATA HHD424020F7SV00 00ML PQ: 0 ANSI: 5
SCSI device sda: 39070080 512-byte hdwr sectors (20004 MB)
sda: Write Protect is off
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
SCSI device sda: 39070080 512-byte hdwr sectors (20004 MB)
sda: Write Protect is off
SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2 sda3 sda4
sd 0:0:0:0: Attached scsi disk sda

However, if I go back before any of the reset changes were introduced,
things were working fine, and there were no problems with waiting for
the reset. Ideas?


2007-05-10 11:28:43

by Tejun Heo

[permalink] [raw]
Subject: Re: libata reset-seq merge broke sata_sil on sh

Hello,

Paul Mundt wrote:
[--snip, thanks a lot for the detailed report--]
> However, if I go back before any of the reset changes were introduced,
> things were working fine, and there were no problems with waiting for
> the reset. Ideas?

Hmm... It worked so well on all my sil's. I'm a bit puzzled because we
also failed the same condition before the change too. The only change
is we're less patient with the initial tries but in the end we give more
than enough time to the device to prepare itself.

* Does your drive start spun down when it boots? Can you post dmesg
with printk timestamp turned on with kernel prior to reset-seq merge?

* In ata_bus_softreset() and sata_std_hardreset(), there's msleep(150)
delay before checking the status post-reset. Does increasing the delay
make any difference? Please try to increase it exponentially till it
reaches 10sec.

--
tejun

2007-05-10 11:54:17

by Tejun Heo

[permalink] [raw]
Subject: Re: libata reset-seq merge broke sata_sil on sh

Some more thoughts.

Tejun Heo wrote:
> Hello,
>
> Paul Mundt wrote:
> [--snip, thanks a lot for the detailed report--]
>> However, if I go back before any of the reset changes were introduced,
>> things were working fine, and there were no problems with waiting for
>> the reset. Ideas?
>
> Hmm... It worked so well on all my sil's. I'm a bit puzzled because we
> also failed the same condition before the change too. The only change
> is we're less patient with the initial tries but in the end we give more
> than enough time to the device to prepare itself.
>
> * Does your drive start spun down when it boots? Can you post dmesg
> with printk timestamp turned on with kernel prior to reset-seq merge?
>
> * In ata_bus_softreset() and sata_std_hardreset(), there's msleep(150)
> delay before checking the status post-reset. Does increasing the delay
> make any difference? Please try to increase it exponentially till it
> reaches 10sec.

* According to the report, things still work till 27c78b37 - commit for
the actual merge and there's no related change till the current master
from there. So, which commit actually breaks detection? Or is
detection just flaky after b8cffc6a?

* How does things work on 9b89391c?

Also, please turn on printk timestamp on all reports so that we can now
the timeline of things.

Thanks.

--
tejun

2007-05-10 12:47:34

by Paul Mundt

[permalink] [raw]
Subject: Re: libata reset-seq merge broke sata_sil on sh

On Thu, May 10, 2007 at 01:28:19PM +0200, Tejun Heo wrote:
> Paul Mundt wrote:
> [--snip, thanks a lot for the detailed report--]
> > However, if I go back before any of the reset changes were introduced,
> > things were working fine, and there were no problems with waiting for
> > the reset. Ideas?
>
> * Does your drive start spun down when it boots? Can you post dmesg
> with printk timestamp turned on with kernel prior to reset-seq merge?
>
Yes, it's spun down at boot. I'll get the logs with the timestamps and
the results of the mdelay() incrementing in the morning when I've got the
board handy again.

On Thu, May 10, 2007 at 01:53:48PM +0200, Tejun Heo wrote:
> * According to the report, things still work till 27c78b37 - commit for
> the actual merge and there's no related change till the current master
> from there. So, which commit actually breaks detection? Or is
> detection just flaky after b8cffc6a?
>
The detection is simply flaky after that point, however before the
current master it never hit the 35 second point (and thus never implied
that the link was down). I'll double check the bisect log to see if there
was anything beyond that that may have caused it.

The -ENODEV at least implies that the SRST fails, so at least that's a
starting point.

One other curious thing is that it seems to misreport the IRQ. In this
case the log indicates 0, whereas it's actually on IRQ 66. When the
system is booted, /proc/interrupts reflects that sata_sil is on the
proper IRQ (even when it's reported as 0 in the boot printk()).

2007-05-10 13:09:21

by Tejun Heo

[permalink] [raw]
Subject: Re: libata reset-seq merge broke sata_sil on sh

Paul Mundt wrote:
> Yes, it's spun down at boot. I'll get the logs with the timestamps and
> the results of the mdelay() incrementing in the morning when I've got the
> board handy again.

I see. That's where the expected initial prereset failure comes from.

> On Thu, May 10, 2007 at 01:53:48PM +0200, Tejun Heo wrote:
>> * According to the report, things still work till 27c78b37 - commit for
>> the actual merge and there's no related change till the current master
>> from there. So, which commit actually breaks detection? Or is
>> detection just flaky after b8cffc6a?
>>
> The detection is simply flaky after that point, however before the
> current master it never hit the 35 second point (and thus never implied
> that the link was down). I'll double check the bisect log to see if there
> was anything beyond that that may have caused it.
>
> The -ENODEV at least implies that the SRST fails, so at least that's a
> starting point.

If prereset() fails to get the initial DRDY before 10secs, it assumes
something went wrong and escalates to hardreset. sil family of
controllers report 0xff status while the link is broken and it seems
that your particular drive needs more than the current 150ms to recover
phy link. It probably went unnoticed till now because the device was
never hardreset before. If the diagnosis is correct, increasing the
delay in hardreset should fix the problem. Well, let's see. :-)

> One other curious thing is that it seems to misreport the IRQ. In this
> case the log indicates 0, whereas it's actually on IRQ 66. When the
> system is booted, /proc/interrupts reflects that sata_sil is on the
> proper IRQ (even when it's reported as 0 in the boot printk()).

That's somthing I've missed during init model conversion. It's just
printk problem. I'll fix it.

Thanks.

--
tejun

2007-05-11 00:52:59

by Paul Mundt

[permalink] [raw]
Subject: Re: libata reset-seq merge broke sata_sil on sh

On Thu, May 10, 2007 at 03:08:59PM +0200, Tejun Heo wrote:
> Paul Mundt wrote:
> > The detection is simply flaky after that point, however before the
> > current master it never hit the 35 second point (and thus never implied
> > that the link was down). I'll double check the bisect log to see if there
> > was anything beyond that that may have caused it.
> >
> > The -ENODEV at least implies that the SRST fails, so at least that's a
> > starting point.
>
> If prereset() fails to get the initial DRDY before 10secs, it assumes
> something went wrong and escalates to hardreset. sil family of
> controllers report 0xff status while the link is broken and it seems
> that your particular drive needs more than the current 150ms to recover
> phy link. It probably went unnoticed till now because the device was
> never hardreset before. If the diagnosis is correct, increasing the
> delay in hardreset should fix the problem. Well, let's see. :-)
>
Bumping the hardreset delay up does indeed fix it, I've had to bump it up
to 1200 before it started working (at 600 it still fails):

[ 0.967379] scsi0 : sata_sil
[ 0.970425] scsi1 : sata_sil
[ 0.973298] ata1: SATA max UDMA/100 cmd 0xfd000280 ctl 0xfd00028a bmdma 0xfd000200 irq 0
[ 0.981331] ata2: SATA max UDMA/100 cmd 0xfd0002c0 ctl 0xfd0002ca bmdma 0xfd000208 irq 0
[ 1.299353] ata1: device not ready (errno=-19), forcing hardreset
[ 2.817893] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 2.826284] ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
[ 2.831052] ata1.00: ATA-5: HHD424020F7SV00, 00MLA0A5, max UDMA/100
[ 2.837548] ata1.00: 39070080 sectors, multi 0: LBA
[ 2.842702] ata1.00: applying bridge limits
[ 2.854162] ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
[ 2.858938] ata1.00: configured for UDMA/100
[ 3.172602] ata2: SATA link down (SStatus 0 SControl 310)
[ 3.175736] scsi 0:0:0:0: Direct-Access ATA HHD424020F7SV00 00ML PQ: 0 ANSI: 5

I'm not sure if it matters or not, but this is an iVDR drive, so that
might also have additional implications.

--

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 4595d1f..4dad3fd 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -3518,7 +3518,7 @@ int sata_std_hardreset(struct ata_port *ap, unsigned int *class,
}

/* wait a while before checking status, see SRST for more info */
- msleep(150);
+ msleep(1200);

rc = ata_wait_ready(ap, deadline);
/* link occupied, -ENODEV too is an error */

2007-05-11 09:39:44

by Tejun Heo

[permalink] [raw]
Subject: Re: libata reset-seq merge broke sata_sil on sh

[cc'ing Gary Hade for GoVault drive, Hello]

Hello, Paul.

Paul Mundt wrote:
> On Thu, May 10, 2007 at 03:08:59PM +0200, Tejun Heo wrote:
>> Paul Mundt wrote:
>>> The detection is simply flaky after that point, however before the
>>> current master it never hit the 35 second point (and thus never implied
>>> that the link was down). I'll double check the bisect log to see if there
>>> was anything beyond that that may have caused it.
>>>
>>> The -ENODEV at least implies that the SRST fails, so at least that's a
>>> starting point.
>> If prereset() fails to get the initial DRDY before 10secs, it assumes
>> something went wrong and escalates to hardreset. sil family of
>> controllers report 0xff status while the link is broken and it seems
>> that your particular drive needs more than the current 150ms to recover
>> phy link. It probably went unnoticed till now because the device was
>> never hardreset before. If the diagnosis is correct, increasing the
>> delay in hardreset should fix the problem. Well, let's see. :-)
>>
> Bumping the hardreset delay up does indeed fix it, I've had to bump it up
> to 1200 before it started working (at 600 it still fails):
>
> [ 0.967379] scsi0 : sata_sil
> [ 0.970425] scsi1 : sata_sil
> [ 0.973298] ata1: SATA max UDMA/100 cmd 0xfd000280 ctl 0xfd00028a bmdma 0xfd000200 irq 0
> [ 0.981331] ata2: SATA max UDMA/100 cmd 0xfd0002c0 ctl 0xfd0002ca bmdma 0xfd000208 irq 0
> [ 1.299353] ata1: device not ready (errno=-19), forcing hardreset
> [ 2.817893] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [ 2.826284] ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
> [ 2.831052] ata1.00: ATA-5: HHD424020F7SV00, 00MLA0A5, max UDMA/100
> [ 2.837548] ata1.00: 39070080 sectors, multi 0: LBA
> [ 2.842702] ata1.00: applying bridge limits
> [ 2.854162] ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
> [ 2.858938] ata1.00: configured for UDMA/100
> [ 3.172602] ata2: SATA link down (SStatus 0 SControl 310)
> [ 3.175736] scsi 0:0:0:0: Direct-Access ATA HHD424020F7SV00 00ML PQ: 0 ANSI: 5
>
> I'm not sure if it matters or not, but this is an iVDR drive, so that
> might also have additional implications.

Don't have the flimsiest idea what an iVDR drive is but I take it that
it's some sort of special purpose thing. :-)

So, the HHD424020F7SV00 is the second device which isn't happy with
150ms wait after reset. The first one was Quantum GoVault. It's a bit
different in that the HHD only fails after hardreset.

Anyways, so, it seems we'll have to extent the 150ms wait. I'll try to
cook up a patch for that && parallel probing.

Gary, IIRC, the requirement for GoVault was 3secs, right? Paul, can you
try to estimate the minimum required delay? Please go down by 100ms and
report where it breaks.

Thanks.

--
tejun

2007-05-12 03:50:16

by Paul Mundt

[permalink] [raw]
Subject: Re: libata reset-seq merge broke sata_sil on sh

On Fri, May 11, 2007 at 11:39:20AM +0200, Tejun Heo wrote:
> Paul Mundt wrote:
> > Bumping the hardreset delay up does indeed fix it, I've had to bump it up
> > to 1200 before it started working (at 600 it still fails):
> >
> > [ 0.967379] scsi0 : sata_sil
> > [ 0.970425] scsi1 : sata_sil
> > [ 0.973298] ata1: SATA max UDMA/100 cmd 0xfd000280 ctl 0xfd00028a bmdma 0xfd000200 irq 0
> > [ 0.981331] ata2: SATA max UDMA/100 cmd 0xfd0002c0 ctl 0xfd0002ca bmdma 0xfd000208 irq 0
> > [ 1.299353] ata1: device not ready (errno=-19), forcing hardreset
> > [ 2.817893] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> > [ 2.826284] ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
> > [ 2.831052] ata1.00: ATA-5: HHD424020F7SV00, 00MLA0A5, max UDMA/100
> > [ 2.837548] ata1.00: 39070080 sectors, multi 0: LBA
> > [ 2.842702] ata1.00: applying bridge limits
> > [ 2.854162] ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
> > [ 2.858938] ata1.00: configured for UDMA/100
> > [ 3.172602] ata2: SATA link down (SStatus 0 SControl 310)
> > [ 3.175736] scsi 0:0:0:0: Direct-Access ATA HHD424020F7SV00 00ML PQ: 0 ANSI: 5
> >
> > I'm not sure if it matters or not, but this is an iVDR drive, so that
> > might also have additional implications.
>
> Don't have the flimsiest idea what an iVDR drive is but I take it that
> it's some sort of special purpose thing. :-)
>
http://www.ivdr.org

The GoVault appears to be a similar device, in that they're both
removeable cartridges.

> Gary, IIRC, the requirement for GoVault was 3secs, right? Paul, can you
> try to estimate the minimum required delay? Please go down by 100ms and
> report where it breaks.
>
800ms was the lowest it would work at, 700ms still breaks.

2007-05-16 00:30:50

by Paul Mundt

[permalink] [raw]
Subject: Re: libata reset-seq merge broke sata_sil on sh

On Sat, May 12, 2007 at 12:49:28PM +0900, Paul Mundt wrote:
> On Fri, May 11, 2007 at 11:39:20AM +0200, Tejun Heo wrote:
> > Paul Mundt wrote:
> > > Bumping the hardreset delay up does indeed fix it, I've had to bump it up
> > > to 1200 before it started working (at 600 it still fails):
> > >
> > > [ 0.967379] scsi0 : sata_sil
> > > [ 0.970425] scsi1 : sata_sil
> > > [ 0.973298] ata1: SATA max UDMA/100 cmd 0xfd000280 ctl 0xfd00028a bmdma 0xfd000200 irq 0
> > > [ 0.981331] ata2: SATA max UDMA/100 cmd 0xfd0002c0 ctl 0xfd0002ca bmdma 0xfd000208 irq 0
> > > [ 1.299353] ata1: device not ready (errno=-19), forcing hardreset
> > > [ 2.817893] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> > > [ 2.826284] ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
> > > [ 2.831052] ata1.00: ATA-5: HHD424020F7SV00, 00MLA0A5, max UDMA/100
> > > [ 2.837548] ata1.00: 39070080 sectors, multi 0: LBA
> > > [ 2.842702] ata1.00: applying bridge limits
> > > [ 2.854162] ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
> > > [ 2.858938] ata1.00: configured for UDMA/100
> > > [ 3.172602] ata2: SATA link down (SStatus 0 SControl 310)
> > > [ 3.175736] scsi 0:0:0:0: Direct-Access ATA HHD424020F7SV00 00ML PQ: 0 ANSI: 5
> > >
> > > I'm not sure if it matters or not, but this is an iVDR drive, so that
> > > might also have additional implications.
> >
> > Don't have the flimsiest idea what an iVDR drive is but I take it that
> > it's some sort of special purpose thing. :-)
> >
> http://www.ivdr.org
>
> The GoVault appears to be a similar device, in that they're both
> removeable cartridges.
>
> > Gary, IIRC, the requirement for GoVault was 3secs, right? Paul, can you
> > try to estimate the minimum required delay? Please go down by 100ms and
> > report where it breaks.
> >
> 800ms was the lowest it would work at, 700ms still breaks.

Ping?

2007-05-16 16:45:47

by Tejun Heo

[permalink] [raw]
Subject: [PATCH] libata: implement ata_wait_after_reset()

On certain device/controller combination, 0xff status is asserted
after reset and doesn't get cleared during 150ms post-reset wait. As
0xff status is interpreted as no device (for good reasons), this can
lead to misdetection on such cases.

This patch implements ata_wait_after_reset() which replaces the 150ms
sleep and waits upto ATA_TMOUT_FF_WAIT if status is 0xff.
ATA_TMOUT_FF_WAIT is currently 800ms which is enough for
HHD424020F7SV00 to get detected but not enough for Quantum GoVault
drive which is known to take upto 2s.

Without parallel probing, spending 2s on 0xff port would incur too
much delay on ata_piix's which use 0xff to indicate empty port and
doesn't have SCR register, so GoVault needs to wait till parallel
probing.

Signed-off-by: Tejun Heo <[email protected]>
---

This patch is against the current libata-dev#upstream +
pata_scc-fix-build-failure[1].

[1] http://article.gmane.org/gmane.linux.kernel/528405

Paul, please verify this fixes your problem. You can skip the
pata_scc patch, it will cause pata_scc part to be rejected but doesn't
matter.

Jeff, added 800ms delay is sad but this seems to be the only way to
avoid iVDR drive detection regression for 2.6.22. I'll cook up
parallel probing for 2.6.23.

Thanks.

drivers/ata/ahci.c | 11 +------
drivers/ata/libata-core.c | 67 +++++++++++++++++++++++++++++++++++---------
drivers/ata/pata_scc.c | 13 +-------
drivers/ata/sata_inic162x.c | 2 -
include/linux/libata.h | 8 +++++
5 files changed, 67 insertions(+), 34 deletions(-)

Index: work/drivers/ata/ahci.c
===================================================================
--- work.orig/drivers/ata/ahci.c
+++ work/drivers/ata/ahci.c
@@ -947,15 +947,8 @@ static int ahci_softreset(struct ata_por
writel(1, port_mmio + PORT_CMD_ISSUE);
readl(port_mmio + PORT_CMD_ISSUE); /* flush */

- /* spec mandates ">= 2ms" before checking status.
- * We wait 150ms, because that was the magic delay used for
- * ATAPI devices in Hale Landis's ATADRVR, for the period of time
- * between when the ATA command register is written, and then
- * status is checked. Because waiting for "a while" before
- * checking status is fine, post SRST, we perform this magic
- * delay here as well.
- */
- msleep(150);
+ /* wait a while before checking status */
+ ata_wait_after_reset(ap, deadline);

rc = ata_wait_ready(ap, deadline);
/* link occupied, -ENODEV too is an error */
Index: work/drivers/ata/libata-core.c
===================================================================
--- work.orig/drivers/ata/libata-core.c
+++ work/drivers/ata/libata-core.c
@@ -3008,6 +3008,55 @@ int ata_busy_sleep(struct ata_port *ap,
}

/**
+ * ata_wait_after_reset - wait before checking status after reset
+ * @ap: port containing status register to be polled
+ * @deadline: deadline jiffies for the operation
+ *
+ * After reset, we need to pause a while before reading status.
+ * Also, certain combination of controller and device report 0xff
+ * for some duration (e.g. until SATA PHY is up and running)
+ * which is interpreted as empty port in ATA world. This
+ * function also waits for such devices to get out of 0xff
+ * status.
+ *
+ * LOCKING:
+ * Kernel thread context (may sleep).
+ */
+void ata_wait_after_reset(struct ata_port *ap, unsigned long deadline)
+{
+ unsigned long until = jiffies + ATA_TMOUT_FF_WAIT;
+
+ if (time_before(until, deadline))
+ deadline = until;
+
+ /* Spec mandates ">= 2ms" before checking status. We wait
+ * 150ms, because that was the magic delay used for ATAPI
+ * devices in Hale Landis's ATADRVR, for the period of time
+ * between when the ATA command register is written, and then
+ * status is checked. Because waiting for "a while" before
+ * checking status is fine, post SRST, we perform this magic
+ * delay here as well.
+ *
+ * Old drivers/ide uses the 2mS rule and then waits for ready.
+ */
+ msleep(150);
+
+ /* Wait for 0xff to clear. Some SATA devices take a long time
+ * to clear 0xff after reset. For example, HHD424020F7SV00
+ * iVDR needs >= 800ms while. Quantum GoVault needs even more
+ * than that.
+ */
+ while (1) {
+ u8 status = ata_chk_status(ap);
+
+ if (status != 0xff || time_after(jiffies, deadline))
+ return;
+
+ msleep(50);
+ }
+}
+
+/**
* ata_wait_ready - sleep until BSY clears, or timeout
* @ap: port containing status register to be polled
* @deadline: deadline jiffies for the operation
@@ -3117,17 +3166,8 @@ static int ata_bus_softreset(struct ata_
udelay(20); /* FIXME: flush */
iowrite8(ap->ctl, ioaddr->ctl_addr);

- /* spec mandates ">= 2ms" before checking status.
- * We wait 150ms, because that was the magic delay used for
- * ATAPI devices in Hale Landis's ATADRVR, for the period of time
- * between when the ATA command register is written, and then
- * status is checked. Because waiting for "a while" before
- * checking status is fine, post SRST, we perform this magic
- * delay here as well.
- *
- * Old drivers/ide uses the 2mS rule and then waits for ready
- */
- msleep(150);
+ /* wait a while before checking status */
+ ata_wait_after_reset(ap, deadline);

/* Before we perform post reset processing we want to see if
* the bus shows 0xFF because the odd clown forgets the D7
@@ -3543,8 +3583,8 @@ int sata_std_hardreset(struct ata_port *
return 0;
}

- /* wait a while before checking status, see SRST for more info */
- msleep(150);
+ /* wait a while before checking status */
+ ata_wait_after_reset(ap, deadline);

rc = ata_wait_ready(ap, deadline);
/* link occupied, -ENODEV too is an error */
@@ -6847,6 +6887,7 @@ EXPORT_SYMBOL_GPL(ata_port_disable);
EXPORT_SYMBOL_GPL(ata_ratelimit);
EXPORT_SYMBOL_GPL(ata_wait_register);
EXPORT_SYMBOL_GPL(ata_busy_sleep);
+EXPORT_SYMBOL_GPL(ata_wait_after_reset);
EXPORT_SYMBOL_GPL(ata_wait_ready);
EXPORT_SYMBOL_GPL(ata_port_queue_task);
EXPORT_SYMBOL_GPL(ata_scsi_ioctl);
Index: work/include/linux/libata.h
===================================================================
--- work.orig/include/linux/libata.h
+++ work/include/linux/libata.h
@@ -218,6 +218,13 @@ enum {
ATA_TMOUT_INTERNAL = 30 * HZ,
ATA_TMOUT_INTERNAL_QUICK = 5 * HZ,

+ /* FIXME: GoVault needs 2s but we can't afford that without
+ * parallel probing. 800ms is enough for iVDR disk
+ * HHD424020F7SV00. Increase to 2secs when parallel probing
+ * is in place.
+ */
+ ATA_TMOUT_FF_WAIT = 4 * HZ / 5,
+
/* ATA bus states */
BUS_UNKNOWN = 0,
BUS_DMA = 1,
@@ -734,6 +741,7 @@ extern void ata_host_resume(struct ata_h
extern int ata_ratelimit(void);
extern int ata_busy_sleep(struct ata_port *ap,
unsigned long timeout_pat, unsigned long timeout);
+extern void ata_wait_after_reset(struct ata_port *ap, unsigned long deadline);
extern int ata_wait_ready(struct ata_port *ap, unsigned long deadline);
extern void ata_port_queue_task(struct ata_port *ap, work_func_t fn,
void *data, unsigned long delay);
Index: work/drivers/ata/pata_scc.c
===================================================================
--- work.orig/drivers/ata/pata_scc.c
+++ work/drivers/ata/pata_scc.c
@@ -557,17 +557,8 @@ static unsigned int scc_bus_softreset(st
udelay(20);
out_be32(ioaddr->ctl_addr, ap->ctl);

- /* spec mandates ">= 2ms" before checking status.
- * We wait 150ms, because that was the magic delay used for
- * ATAPI devices in Hale Landis's ATADRVR, for the period of time
- * between when the ATA command register is written, and then
- * status is checked. Because waiting for "a while" before
- * checking status is fine, post SRST, we perform this magic
- * delay here as well.
- *
- * Old drivers/ide uses the 2mS rule and then waits for ready
- */
- msleep(150);
+ /* wait a while before checking status */
+ ata_wait_after_reset(ap, deadline);

/* Before we perform post reset processing we want to see if
* the bus shows 0xFF because the odd clown forgets the D7
Index: work/drivers/ata/sata_inic162x.c
===================================================================
--- work.orig/drivers/ata/sata_inic162x.c
+++ work/drivers/ata/sata_inic162x.c
@@ -446,7 +446,7 @@ static int inic_hardreset(struct ata_por
struct ata_taskfile tf;

/* wait a while before checking status */
- msleep(150);
+ ata_wait_after_reset(ap, deadline);

rc = ata_wait_ready(ap, deadline);
/* link occupied, -ENODEV too is an error */

2007-05-17 00:51:21

by Paul Mundt

[permalink] [raw]
Subject: Re: [PATCH] libata: implement ata_wait_after_reset()

On Wed, May 16, 2007 at 06:44:53PM +0200, Tejun Heo wrote:
> This patch is against the current libata-dev#upstream +
> pata_scc-fix-build-failure[1].
>
> [1] http://article.gmane.org/gmane.linux.kernel/528405
>
> Paul, please verify this fixes your problem. You can skip the
> pata_scc patch, it will cause pata_scc part to be rejected but doesn't
> matter.
>
Yes, this does get iVDR detection working again. The only problem seems
to be that every now and then I end up with this:

scsi0 : sata_sil
scsi1 : sata_sil
ata1: SATA max UDMA/100 cmd 0xfd000280 ctl 0xfd00028a bmdma 0xfd000200 irq 0
ata2: SATA max UDMA/100 cmd 0xfd0002c0 ctl 0xfd0002ca bmdma 0xfd000208 irq 0
ata1: device not ready (errno=-19), forcing hardreset
ata1: COMRESET failed (errno=-19)
ata1: reset failed (errno=-19), retrying in 9 secs
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

So at least the drive detection works, but it would be nice not to
trigger this 9-second retry.

2007-05-17 00:59:59

by Paul Mundt

[permalink] [raw]
Subject: Re: [PATCH] libata: implement ata_wait_after_reset()

On Wed, May 16, 2007 at 06:44:53PM +0200, Tejun Heo wrote:
> + /* FIXME: GoVault needs 2s but we can't afford that without
> + * parallel probing. 800ms is enough for iVDR disk
> + * HHD424020F7SV00. Increase to 2secs when parallel probing
> + * is in place.
> + */
> + ATA_TMOUT_FF_WAIT = 4 * HZ / 5,
> +

Changing this to 4 * HZ / 4 gets rid of the occasional COMRESET failure.
So it would seem that 800ms is good enough for the common case, but it
seems to be cutting it pretty close..

2007-05-19 15:54:27

by Indan Zupancic

[permalink] [raw]
Subject: Re: [PATCH] libata: implement ata_wait_after_reset()

On Wed, May 16, 2007 18:44, Tejun Heo wrote:
> On certain device/controller combination, 0xff status is asserted
> after reset and doesn't get cleared during 150ms post-reset wait. As
> 0xff status is interpreted as no device (for good reasons), this can
> lead to misdetection on such cases.
>
> This patch implements ata_wait_after_reset() which replaces the 150ms
> sleep and waits upto ATA_TMOUT_FF_WAIT if status is 0xff.
> ATA_TMOUT_FF_WAIT is currently 800ms which is enough for
> HHD424020F7SV00 to get detected but not enough for Quantum GoVault
> drive which is known to take upto 2s.
>
> Without parallel probing, spending 2s on 0xff port would incur too
> much delay on ata_piix's which use 0xff to indicate empty port and
> doesn't have SCR register, so GoVault needs to wait till parallel
> probing.

Using sata_sil (SiI 3512) with a ST3120827AS disk here.

For me the COMRESET happens after resume (s2ram):

[ 2.174342] sd 0:0:0:0: [sda] Starting disk
[ 2.476407] ata1: device not ready (errno=-19), forcing hardreset
[ 2.476429] ata2: SATA link down (SStatus 0 SControl 310)
[ 2.931793] ata1: COMRESET failed (errno=-19)
[ 2.931797] ata1: reset failed (errno=-19), retrying in 10 secs
[ 12.918421] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 12.921957] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648
[ 12.925908] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648

With your patch applied the output is:

[ 2.173369] sd 0:0:0:0: [sda] Starting disk
[ 2.475431] ata1: device not ready (errno=-19), forcing hardreset
[ 2.475453] ata2: SATA link down (SStatus 0 SControl 310)
[ 3.592930] ata1: COMRESET failed (errno=-19)
[ 3.592934] ata1: reset failed (errno=-19), retrying in 9 secs
[ 12.917446] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 12.920969] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648
[ 12.924945] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648

Resume takes now ten seconds or more, while it used to take only a few seconds,
what changed? A few kernel versions ago (2.6.21-rc or so?) it retried in 5 seconds
instead of 10, but even that is too long.

Maybe a silly question, but why do we wait for the harddisk to show up? Maybe that
makes a little bit of sense at bootup, but for resume from ram, where nothing is
supposed to have changed, it seems a bit strange. Why not wait when something tries
to access the harddisk or something?

I wonder if sil_pci_device_resume() and sd_resume() know about each other...
I'll disable sd_resume() and see how that goes.

Greetings,

Indan


2007-05-19 16:39:29

by Indan Zupancic

[permalink] [raw]
Subject: Re: [PATCH] libata: implement ata_wait_after_reset()

Hello,

Disabling sd_resume() gives me a "quick" resume again (few seconds),
though it doesn't get rid of the COMRESET errors:

[ 2.476417] ata1: device not ready (errno=-19), forcing hardreset
[ 2.476440] ata2: SATA link down (SStatus 0 SControl 310)
[ 2.824896] usb_endpoint usbdev2.3_ep00: PM: resume from 0, parent 2-1 still 2
[ 2.824902] usb 2-1:1.0: PM: resume from 2, parent 2-1 still 2
[ 2.824906] usb 2-1:1.1: PM: resume from 2, parent 2-1 still 2
[ 2.824910] usb 2-1:1.2: PM: resume from 2, parent 2-1 still 2
[ 2.824914] usb_endpoint usbdev2.3_ep05: PM: resume from 0, parent 2-1:1.2 still 2
[ 2.824919] usb_endpoint usbdev2.3_ep85: PM: resume from 0, parent 2-1:1.2 still 2
[ 2.824923] usb_endpoint usbdev2.3_ep81: PM: resume from 0, parent 2-1:1.0 still 2
[ 2.825392] Restarting tasks ... done.
[ 2.931812] ata1: COMRESET failed (errno=-19)
[ 2.931880] ata1: reset failed (errno=-19), retrying in 10 secs
[ 3.136543] usb 2-1: USB disconnect, address 3
[ 3.249382] usb 2-1: new full speed USB device using ohci_hcd and address 4
[ 3.396768] usb 2-1: configuration #1 chosen from 1 choice
[ 4.225560] usb 2-1: reset full speed USB device using ohci_hcd and address 4
[ 4.372298] usbcore: registered new interface driver speedtch
[ 4.456118] speedtch 2-1:1.0: found stage 1 firmware speedtch-1.bin.0.00
[ 4.621056] speedtch 2-1:1.0: found stage 2 firmware speedtch-2.bin.0.00
[ 9.153941] ATM dev 0: ADSL line is synchronising
[ 12.918440] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 12.921971] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648
[ 12.924924] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648

Doesn't the controller generate an interrupt when it detects a harddisk?
Is it really needed to do polling?

Idea: What about implementing a sata_sil specific .check_status function,
instead of using the generic ata_check_status()? That one could check other
registers to find out what's really going on in the case ENODEV is returned
by ata_check_status(), or something like that.

Greetings,

Indan


2007-05-19 18:23:48

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] libata: implement ata_wait_after_reset()

Indan Zupancic wrote:
> Using sata_sil (SiI 3512) with a ST3120827AS disk here.
>
> For me the COMRESET happens after resume (s2ram):
>
> [ 2.174342] sd 0:0:0:0: [sda] Starting disk
> [ 2.476407] ata1: device not ready (errno=-19), forcing hardreset
> [ 2.476429] ata2: SATA link down (SStatus 0 SControl 310)
> [ 2.931793] ata1: COMRESET failed (errno=-19)
> [ 2.931797] ata1: reset failed (errno=-19), retrying in 10 secs
> [ 12.918421] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [ 12.921957] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648
> [ 12.925908] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648
>
> With your patch applied the output is:
>
> [ 2.173369] sd 0:0:0:0: [sda] Starting disk
> [ 2.475431] ata1: device not ready (errno=-19), forcing hardreset
> [ 2.475453] ata2: SATA link down (SStatus 0 SControl 310)
> [ 3.592930] ata1: COMRESET failed (errno=-19)
> [ 3.592934] ata1: reset failed (errno=-19), retrying in 9 secs
> [ 12.917446] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [ 12.920969] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648
> [ 12.924945] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648
>
> Resume takes now ten seconds or more, while it used to take only a few seconds,
> what changed? A few kernel versions ago (2.6.21-rc or so?) it retried in 5 seconds
> instead of 10, but even that is too long.

There are two different things at play here...

1. We dropped libata specific suspend/resume implementation in favor of
sd driven one. Unfortunately, the new implementation currently can't do
background spinup, so that's probably why you can feel the delay. We
need to figure out how to do background spinup with the new implementation.

2. To make reset finish in defined time, each reset try now has defined
deadlines. I was trying to be conservative so I chose 10sec for the
first try. It's driven by timing table near the top of libata-eh.c, so
adjusting the table is easy and maybe we can be a bit more aggressive
with the first two tries. But if we solve #1, this shouldn't matter too
much.

So, it's the unfortunate combination of the above two.

> Maybe a silly question, but why do we wait for the harddisk to show up? Maybe that
> makes a little bit of sense at bootup, but for resume from ram, where nothing is
> supposed to have changed, it seems a bit strange. Why not wait when something tries
> to access the harddisk or something?

I hope it's explained now.

> I wonder if sil_pci_device_resume() and sd_resume() know about each other...
> I'll disable sd_resume() and see how that goes.

Yeap, that should work. In most configurations, devices spin up
immediately after power is restored. sd_resume() just waits for the
device to be revalidated in such cases.

--
tejun

2007-05-19 18:43:43

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] libata: implement ata_wait_after_reset()

Indan Zupancic wrote:
> Doesn't the controller generate an interrupt when it detects a harddisk?
> Is it really needed to do polling?

It depends on the controller but the closest thing is usually PHY status
changed interrupt and PHY readiness doesn't imply device readiness. On
some controllers, you can play smart and try to do these things by
interrupt but polling is much more reliable for this kind of stuff.

> Idea: What about implementing a sata_sil specific .check_status function,
> instead of using the generic ata_check_status()? That one could check other
> registers to find out what's really going on in the case ENODEV is returned
> by ata_check_status(), or something like that.

Yeah, if SCR registers are accessible, 0xff doesn't indicate the device
isn't there, so the whole skip-0xff logic probably shouldn't apply in
such cases, but we can also achieve pretty good result by just making
the first reset tries a bit more aggressive.

The first timeout of 10secs was chosen because most desktop drives take
somewhere around 7~10 secs to spin up, so retrying before that didn't
make much sense, but mobile devices usually spin up faster, so it might
be worth to try one more time inbetween.

--
tejun

2007-05-19 19:05:23

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] libata: implement ata_wait_after_reset()

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index d5939e6..27ddc39 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -3031,7 +3031,7 @@ int ata_wait_ready(struct ata_port *ap,

if (!(status & ATA_BUSY))
return 0;
- if (status == 0xff)
+ if (!ata_port_online(ap) && status == 0xff)
return -ENODEV;
if (time_after(now, deadline))
return -EBUSY;


Attachments:
patch (411.00 B)

2007-05-19 22:33:31

by Indan Zupancic

[permalink] [raw]
Subject: sd_resume redundant? [was: [PATCH] libata: implement ata_wait_after_reset()]

On Sat, May 19, 2007 21:04, Tejun Heo wrote:
> Tejun Heo wrote:
>> Yeah, if SCR registers are accessible, 0xff doesn't indicate the device
>> isn't there, so the whole skip-0xff logic probably shouldn't apply in
>> such cases, but we can also achieve pretty good result by just making
>> the first reset tries a bit more aggressive.
>
> So, here's the patch.
>
> Paul, can you please test this patch without the previous patch? Indan,
> this should reduce the resume delay. Please test. But you'll still
> feel some added delay compared to 2.6.20 due to the mentioned
> suspend/resume change.

This removed the COMRESET errors indeed, and with sd_resume()
disabled everything is speedy again (2s or so. Still a desktop pc).
I didn't try with sd_resume enabled.

Everything seems to work fine without sd_resume(), so why is it needed?

Greetings,

Indan


2007-05-19 22:55:04

by Indan Zupancic

[permalink] [raw]
Subject: Re: [PATCH] libata: implement ata_wait_after_reset()

On Sat, May 19, 2007 20:23, Tejun Heo wrote:
> Indan Zupancic wrote:
>> Resume takes now ten seconds or more, while it used to take only a few seconds,
>> what changed? A few kernel versions ago (2.6.21-rc or so?) it retried in 5 seconds
>> instead of 10, but even that is too long.
>
> There are two different things at play here...
>
> 1. We dropped libata specific suspend/resume implementation in favor of
> sd driven one. Unfortunately, the new implementation currently can't do
> background spinup, so that's probably why you can feel the delay. We
> need to figure out how to do background spinup with the new implementation.

What are the advantages of that, except slower resume? ;-)

> 2. To make reset finish in defined time, each reset try now has defined
> deadlines. I was trying to be conservative so I chose 10sec for the
> first try.

Right. So to me it seems that failed, because the undefined reset time is just
replaced with a fixed ad hoc sleep, which is longer than any undefined reset
mechanism. So where did the advantage go? Better to go back to how it was,
is my impression. Or make that deadline 10 seconds and after that give up,
instead of the other way round. Or just move to async reset.

> It's driven by timing table near the top of libata-eh.c, so
> adjusting the table is easy and maybe we can be a bit more aggressive
> with the first two tries. But if we solve #1, this shouldn't matter too
> much.

#2 seems totally unrelated to #1, as #1 is about spinup, and #2 about reset.
Only relation is that #2 slows down #1 because #1 needs to wait on #2.


>> Maybe a silly question, but why do we wait for the harddisk to show up? Maybe that
>> makes a little bit of sense at bootup, but for resume from ram, where nothing is
>> supposed to have changed, it seems a bit strange. Why not wait when something tries
>> to access the harddisk or something?
>
> I hope it's explained now.

Almost. Explained is why we wait on the disk, but that's only the sd_resume part.
It's still unclear why resume must wait till the reset is done.


>> I wonder if sil_pci_device_resume() and sd_resume() know about each other...
>> I'll disable sd_resume() and see how that goes.
>
> Yeap, that should work. In most configurations, devices spin up
> immediately after power is restored. sd_resume() just waits for the
> device to be revalidated in such cases.

And it kicks the disk into action. Why? Only thing it does is sending a START_STOP
command. I assume that causes the disk to spin up. But for e.g. laptopmode I can
imagine that's quite annoying. I think sd_resume() is totally unnecessary and can
be removed, because it seems wrong to always force the harddisk to spin up,
background spin up or not.

Greetings,

Indan


2007-05-20 09:51:00

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] libata: implement ata_wait_after_reset()

Indan Zupancic wrote:
>> 1. We dropped libata specific suspend/resume implementation in favor of
>> sd driven one. Unfortunately, the new implementation currently can't do
>> background spinup, so that's probably why you can feel the delay. We
>> need to figure out how to do background spinup with the new implementation.
>
> What are the advantages of that, except slower resume? ;-)

The change was made primarily to make libata spin down disks properly on
power down and hibernate. I don't like the added delay either but it's
better than shortening the lifespan of harddisks. Making resuming
asynchronous is planned but we need more infrastructure to do that
properly. The previous implementation worked but it also might try to
spin up a lot of disks at once which can't be good for PSU. I'm not
sure yet how to do that properly with sd driving suspend/resume.

>> 2. To make reset finish in defined time, each reset try now has defined
>> deadlines. I was trying to be conservative so I chose 10sec for the
>> first try.
>
> Right. So to me it seems that failed, because the undefined reset time is just
> replaced with a fixed ad hoc sleep, which is longer than any undefined reset
> mechanism. So where did the advantage go? Better to go back to how it was,
> is my impression. Or make that deadline 10 seconds and after that give up,
> instead of the other way round. Or just move to async reset.

Well, yeah, your case degraded for sure but a lot of hotplug or error
handling cases are now much better which often used to take more than 30
secs in many cases. There are just a lot of cases and a lot of weird
devices. Aiming generally acceptable delay on most cases is of higher
priority than optimizing specific cases. That said, the 10 sec delayed
retry you're seeing is primarily caused by incorrect interpretation of
0xff status on sata_sil, so with that fixed, it shouldn't be too much of
a problem.

>> It's driven by timing table near the top of libata-eh.c, so
>> adjusting the table is easy and maybe we can be a bit more aggressive
>> with the first two tries. But if we solve #1, this shouldn't matter too
>> much.
>
> #2 seems totally unrelated to #1, as #1 is about spinup, and #2 about reset.
> Only relation is that #2 slows down #1 because #1 needs to wait on #2.

Not that easy. Many drives don't respond to reset while they're
spinning up.

>>> Maybe a silly question, but why do we wait for the harddisk to show up? Maybe that
>>> makes a little bit of sense at bootup, but for resume from ram, where nothing is
>>> supposed to have changed, it seems a bit strange. Why not wait when something tries
>>> to access the harddisk or something?
>> I hope it's explained now.
>
> Almost. Explained is why we wait on the disk, but that's only the sd_resume part.
> It's still unclear why resume must wait till the reset is done.

Port reset itself is asynchronous. The wait is solely from sd_resume.

>>> I wonder if sil_pci_device_resume() and sd_resume() know about each other...
>>> I'll disable sd_resume() and see how that goes.
>> Yeap, that should work. In most configurations, devices spin up
>> immediately after power is restored. sd_resume() just waits for the
>> device to be revalidated in such cases.
>
> And it kicks the disk into action. Why? Only thing it does is sending a START_STOP
> command. I assume that causes the disk to spin up. But for e.g. laptopmode I can
> imagine that's quite annoying. I think sd_resume() is totally unnecessary and can
> be removed, because it seems wrong to always force the harddisk to spin up,
> background spin up or not.

Most ATA drives would spin up immediately when power is reapplied unless
configured otherwise and you can configure whether sd_resume issues
START_STOP or not by echoing to sysfs node
/sys/class/scsi_disk/h:c:i:l/manage_start_stop. The drawback here is
you won't get proper spindown if you turn it off. Adding fine-grained
control can be useful. Wanna give it a try? Shouldn't be too difficult
and, as manage_start_stop is just added, I think we can still change its
API.

--
tejun

2007-05-20 09:54:49

by Tejun Heo

[permalink] [raw]
Subject: Re: sd_resume redundant? [was: [PATCH] libata: implement ata_wait_after_reset()]

Indan Zupancic wrote:
> On Sat, May 19, 2007 21:04, Tejun Heo wrote:
>> Tejun Heo wrote:
>>> Yeah, if SCR registers are accessible, 0xff doesn't indicate the device
>>> isn't there, so the whole skip-0xff logic probably shouldn't apply in
>>> such cases, but we can also achieve pretty good result by just making
>>> the first reset tries a bit more aggressive.
>> So, here's the patch.
>>
>> Paul, can you please test this patch without the previous patch? Indan,
>> this should reduce the resume delay. Please test. But you'll still
>> feel some added delay compared to 2.6.20 due to the mentioned
>> suspend/resume change.
>
> This removed the COMRESET errors indeed, and with sd_resume()
> disabled everything is speedy again (2s or so. Still a desktop pc).
> I didn't try with sd_resume enabled.

Can you try to measure with sd_resume in place?

> Everything seems to work fine without sd_resume(), so why is it needed?

Because not all disks spin up without being told to do so and like it or
not spinning disks up on resume is the default behavior. As I wrote in
the other reply, it would be worthwhile to make it configurable.

--
tejun

2007-05-20 13:26:50

by Indan Zupancic

[permalink] [raw]
Subject: Re: [PATCH] libata: implement ata_wait_after_reset()

Hello Tejun,

Thanks for your answers.

On Sun, May 20, 2007 11:50, Tejun Heo wrote:
> Indan Zupancic wrote:
>>> 1. We dropped libata specific suspend/resume implementation in favor of
>>> sd driven one. Unfortunately, the new implementation currently can't do
>>> background spinup, so that's probably why you can feel the delay. We
>>> need to figure out how to do background spinup with the new implementation.
>>
>> What are the advantages of that, except slower resume? ;-)
>
> The change was made primarily to make libata spin down disks properly on
> power down and hibernate. I don't like the added delay either but it's
> better than shortening the lifespan of harddisks.

Ah, it fixes that "weird noise on shutdown" problem? Fair enough.


> Making resuming
> asynchronous is planned but we need more infrastructure to do that
> properly. The previous implementation worked but it also might try to
> spin up a lot of disks at once which can't be good for PSU. I'm not
> sure yet how to do that properly with sd driving suspend/resume.

Don't controllers support spread spin up of disks, to avoid the peak load?
I think I saw something about that in the SiI 3512 spec I downloaded. So
maybe something like a special spin up method which can be implemented
by drivers, and if it isn't, the current start stop thing is used?


>>> 2. To make reset finish in defined time, each reset try now has defined
>>> deadlines. I was trying to be conservative so I chose 10sec for the
>>> first try.
>>
>> Right. So to me it seems that failed, because the undefined reset time is just
>> replaced with a fixed ad hoc sleep, which is longer than any undefined reset
>> mechanism. So where did the advantage go? Better to go back to how it was,
>> is my impression. Or make that deadline 10 seconds and after that give up,
>> instead of the other way round. Or just move to async reset.
>
> Well, yeah, your case degraded for sure but a lot of hotplug or error
> handling cases are now much better which often used to take more than 30
> secs in many cases. There are just a lot of cases and a lot of weird
> devices. Aiming generally acceptable delay on most cases is of higher
> priority than optimizing specific cases.

What I meant is that the deadline isn't effective because if it can't be done within
that time, it's just retried after 10 seconds or whatever, basically rendering the
deadline useless. But now I understand that the retry is done in the background,
and that it was just the sd_resume that caused everything to wait on it.


> That said, the 10 sec delayed
> retry you're seeing is primarily caused by incorrect interpretation of
> 0xff status on sata_sil, so with that fixed, it shouldn't be too much of
> a problem.

True.

>>> It's driven by timing table near the top of libata-eh.c, so
>>> adjusting the table is easy and maybe we can be a bit more aggressive
>>> with the first two tries. But if we solve #1, this shouldn't matter too
>>> much.
>>
>> #2 seems totally unrelated to #1, as #1 is about spinup, and #2 about reset.
>> Only relation is that #2 slows down #1 because #1 needs to wait on #2.
>
> Not that easy. Many drives don't respond to reset while they're
> spinning up.

Bugger. So it seems like a good idea to do the reset and spinup async together.


>>>> Maybe a silly question, but why do we wait for the harddisk to show up? Maybe that
>>>> makes a little bit of sense at bootup, but for resume from ram, where nothing is
>>>> supposed to have changed, it seems a bit strange. Why not wait when something tries
>>>> to access the harddisk or something?
>>> I hope it's explained now.
>>
>> Almost. Explained is why we wait on the disk, but that's only the sd_resume part.
>> It's still unclear why resume must wait till the reset is done.
>
> Port reset itself is asynchronous. The wait is solely from sd_resume.

The whole resetting, or just the retry after 10 seconds? But it becomes clear now that
you're right that the spinup problem is solved if sd_resume does background spin up.


>>>> I wonder if sil_pci_device_resume() and sd_resume() know about each other...
>>>> I'll disable sd_resume() and see how that goes.
>>> Yeap, that should work. In most configurations, devices spin up
>>> immediately after power is restored. sd_resume() just waits for the
>>> device to be revalidated in such cases.
>>
>> And it kicks the disk into action. Why? Only thing it does is sending a START_STOP
>> command. I assume that causes the disk to spin up. But for e.g. laptopmode I can
>> imagine that's quite annoying. I think sd_resume() is totally unnecessary and can
>> be removed, because it seems wrong to always force the harddisk to spin up,
>> background spin up or not.
>
> Most ATA drives would spin up immediately when power is reapplied unless
> configured otherwise and you can configure whether sd_resume issues
> START_STOP or not by echoing to sysfs node
> /sys/class/scsi_disk/h:c:i:l/manage_start_stop. The drawback here is
> you won't get proper spindown if you turn it off.

Thanks for the pointer, I'll fiddle with it. What do you mean with "proper" here?
Not at all, or in a forced way because power goes away?

(And what does the 'allow_restart' option do?)

> Adding fine-grained
> control can be useful. Wanna give it a try? Shouldn't be too difficult
> and, as manage_start_stop is just added, I think we can still change its
> API.

Well, manage_start_stop could be split up into manage_start and manage_stop,
but I don't know how useful that is in practice. Maybe it makes more sense to
check /proc/sys/vm/laptop_mode and then decide whether to start the disk.
Or just remember the disk state and restore that?

My impression is that these options aren't that useful other than for debugging,
but that normally the right thing should be done automatically. (Not spinning
down at reboot, not spinning up at bootup if the disk isn't used, not spinning up
after resume if the disk wasn't spinning before suspend, etc.)

Perhaps a silly question, but how can you ever set manage_start_stop to 0 before
the disk is started at bootup? You can change it before doing suspend, but not at
bootup. So is this option at the right place?

Greetings,

Indan


2007-05-20 14:28:06

by Indan Zupancic

[permalink] [raw]
Subject: Re: sd_resume redundant? [was: [PATCH] libata: implement ata_wait_after_reset()]

On Sun, May 20, 2007 11:54, Tejun Heo wrote:
> Indan Zupancic wrote:
>> On Sat, May 19, 2007 21:04, Tejun Heo wrote:
>>> Tejun Heo wrote:
>>>> Yeah, if SCR registers are accessible, 0xff doesn't indicate the device
>>>> isn't there, so the whole skip-0xff logic probably shouldn't apply in
>>>> such cases, but we can also achieve pretty good result by just making
>>>> the first reset tries a bit more aggressive.
>>> So, here's the patch.
>>>
>>> Paul, can you please test this patch without the previous patch? Indan,
>>> this should reduce the resume delay. Please test. But you'll still
>>> feel some added delay compared to 2.6.20 due to the mentioned
>>> suspend/resume change.
>>
>> This removed the COMRESET errors indeed, and with sd_resume()
>> disabled everything is speedy again (2s or so. Still a desktop pc).
>> I didn't try with sd_resume enabled.
>
> Can you try to measure with sd_resume in place?

[ 2.173366] sd 0:0:0:0: [sda] Starting disk
[ 2.475422] ata2: SATA link down (SStatus 0 SControl 310)
[ 5.478403] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 5.481928] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648
[ 5.485904] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648
[ 5.485913] ata1.00: configured for UDMA/100
[ 5.505109] sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB)
[ 5.505461] sd 0:0:0:0: [sda] Write Protect is off
[ 5.505465] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 5.505612] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or
FUA
...
[ 6.157259] Restarting tasks ... done.


And with echo 0 > /sys/class/scsi_disk/0\:0\:0\:0/manage_start_stop:

[ 2.476476] ata2: SATA link down (SStatus 0 SControl 310)
...
[ 2.825479] Restarting tasks ... done.
...
[ 5.022076] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 5.025605] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648
[ 5.028594] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648
[ 5.028606] ata1.00: configured for UDMA/100
[ 5.028720] sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB)
[ 5.028767] sd 0:0:0:0: [sda] Write Protect is off
[ 5.028773] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 5.028831] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or
FUA

So over all it takes half a second longer to detect the disk, but because everything waits on it,
it takes more than three seconds longer to resume.

Setting manage_start_stop to 0 fixes it and is good enough for me, I didn't notice anything bad yet
because of the unmanaged stop. Implementing background spin up will fix it too.


>> Everything seems to work fine without sd_resume(), so why is it needed?
>
> Because not all disks spin up without being told to do so and like it or
> not spinning disks up on resume is the default behavior. As I wrote in
> the other reply, it would be worthwhile to make it configurable.

Not even after they receive a read command? Ugh.

Greeting,

Indan


2007-05-20 17:09:52

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] libata: implement ata_wait_after_reset()

Indan Zupancic wrote:
>> The change was made primarily to make libata spin down disks properly on
>> power down and hibernate. I don't like the added delay either but it's
>> better than shortening the lifespan of harddisks.
>
> Ah, it fixes that "weird noise on shutdown" problem? Fair enough.

Yeap, finally.

>> Making resuming
>> asynchronous is planned but we need more infrastructure to do that
>> properly. The previous implementation worked but it also might try to
>> spin up a lot of disks at once which can't be good for PSU. I'm not
>> sure yet how to do that properly with sd driving suspend/resume.
>
> Don't controllers support spread spin up of disks, to avoid the peak load?
> I think I saw something about that in the SiI 3512 spec I downloaded. So
> maybe something like a special spin up method which can be implemented
> by drivers, and if it isn't, the current start stop thing is used?

What they support is allowing drivers to control spin up, and it needs
support from the HDD (many seem to do these days) && BIOS (so that it
doesn't spin up the disks during POST) too. To do that properly, we
need a repository which distributes spinup tokens such that we can do
parallel, not thundering herd, spinups.

>>> Right. So to me it seems that failed, because the undefined reset time is just
>>> replaced with a fixed ad hoc sleep, which is longer than any undefined reset
>>> mechanism. So where did the advantage go? Better to go back to how it was,
>>> is my impression. Or make that deadline 10 seconds and after that give up,
>>> instead of the other way round. Or just move to async reset.
>> Well, yeah, your case degraded for sure but a lot of hotplug or error
>> handling cases are now much better which often used to take more than 30
>> secs in many cases. There are just a lot of cases and a lot of weird
>> devices. Aiming generally acceptable delay on most cases is of higher
>> priority than optimizing specific cases.
>
> What I meant is that the deadline isn't effective because if it can't be done within
> that time, it's just retried after 10 seconds or whatever, basically rendering the
> deadline useless. But now I understand that the retry is done in the background,
> and that it was just the sd_resume that caused everything to wait on it.

The deadline table sets limits to the maximum time a try can take &&
when the next attempt is made. So, the initial deadline of 10 secs
means that the first try is allowed to take upto 10secs and no matter
when or why it fails, the next attempt is made after 10sec is passed. I
would love to bang devices harder but a lot of creepy device are out
there and I'm pretty sure some of them would crap themselves if pushed
too hard. :-(

Anyways, in your case, the problem was that sata_sil wasted the first
try for no good reason. I'm not determined whether we need more
aggressive deadlines for early tries for cases like this. It's
something to think about.

>>>> It's driven by timing table near the top of libata-eh.c, so
>>>> adjusting the table is easy and maybe we can be a bit more aggressive
>>>> with the first two tries. But if we solve #1, this shouldn't matter too
>>>> much.
>>> #2 seems totally unrelated to #1, as #1 is about spinup, and #2 about reset.
>>> Only relation is that #2 slows down #1 because #1 needs to wait on #2.
>> Not that easy. Many drives don't respond to reset while they're
>> spinning up.
>
> Bugger. So it seems like a good idea to do the reset and spinup async together.

There are a lot of cases. Depending on various configuration and whims,
some drives spin up when power is reapplied and don't respond to reset
till it's ready. Some drives don't spin up when power is reapplied but
spin up when PHY is woken up and don't respond to reset till it's ready.
Yet others just sit quiet and respond to reset nicely but don't spin up
till told to do so.

So, well, I agree that the best strategy is doing it all asynchronously.

>>>>> Maybe a silly question, but why do we wait for the harddisk to show up? Maybe that
>>>>> makes a little bit of sense at bootup, but for resume from ram, where nothing is
>>>>> supposed to have changed, it seems a bit strange. Why not wait when something tries
>>>>> to access the harddisk or something?
>>>> I hope it's explained now.
>>> Almost. Explained is why we wait on the disk, but that's only the sd_resume part.
>>> It's still unclear why resume must wait till the reset is done.
>> Port reset itself is asynchronous. The wait is solely from sd_resume.
>
> The whole resetting, or just the retry after 10 seconds? But it becomes clear now that
> you're right that the spinup problem is solved if sd_resume does background spin up.

The whole resetting. ATA controller/port resume itself is completely
asynchronous. The resume method just kicks EH thread and tells it to
wake the thing up and returns. The EH thread asynchronously resets the
port, revalidates all the devices and see if any new ones are attached, etc.

The culprit is that while EH thread is active all commands are deferred,
so the START_STOP command sd_resume() issues waits in the SCSI midlayer
queue till EH is complete. That's why doing it asynchronously is
tricky. I'm not sure whether we would be able to do it without
separating libata from SCSI so that libata has its own suspend/resume
functions. :-(

>> Most ATA drives would spin up immediately when power is reapplied unless
>> configured otherwise and you can configure whether sd_resume issues
>> START_STOP or not by echoing to sysfs node
>> /sys/class/scsi_disk/h:c:i:l/manage_start_stop. The drawback here is
>> you won't get proper spindown if you turn it off.
>
> Thanks for the pointer, I'll fiddle with it. What do you mean with "proper" here?
> Not at all, or in a forced way because power goes away?

Well, basically, your drive isn't told that the power is gonna go off.
Power just goes off and your drive has to do emergency head unload which
sometimes causes the funny noise.

> (And what does the 'allow_restart' option do?)

That's for SCSI and doesn't matter for ATA. ATA drives restart
themselves automatically when media access is needed anyway.

>> Adding fine-grained
>> control can be useful. Wanna give it a try? Shouldn't be too difficult
>> and, as manage_start_stop is just added, I think we can still change its
>> API.
>
> Well, manage_start_stop could be split up into manage_start and manage_stop,
> but I don't know how useful that is in practice. Maybe it makes more sense to
> check /proc/sys/vm/laptop_mode and then decide whether to start the disk.
> Or just remember the disk state and restore that?
>
> My impression is that these options aren't that useful other than for debugging,
> but that normally the right thing should be done automatically. (Not spinning
> down at reboot, not spinning up at bootup if the disk isn't used, not spinning up
> after resume if the disk wasn't spinning before suspend, etc.)
>
> Perhaps a silly question, but how can you ever set manage_start_stop to 0 before
> the disk is started at bootup? You can change it before doing suspend, but not at
> bootup. So is this option at the right place?

The problem is sorta complicated because it involves all devices which
use SCSI midlayer. libata, USB, all the regular SCSI disks, whatnot, so
it's really hard to sell to linux-scsi people if you choose a default
behavior which is specific to libata devices. What can be done is
adding a configurable option which doesn't change the default behavior
and configure it in the low level driver. This is how manage_start_stop
is configured. The default value is zero but libata sets it to one
while registering each device, so the default value for all libata
devices is 1 (this should answer your last question).

--
tejun

2007-05-20 17:18:37

by Tejun Heo

[permalink] [raw]
Subject: Re: sd_resume redundant? [was: [PATCH] libata: implement ata_wait_after_reset()]

Indan Zupancic wrote:
>> Can you try to measure with sd_resume in place?
>
> [ 2.173366] sd 0:0:0:0: [sda] Starting disk
> [ 2.475422] ata2: SATA link down (SStatus 0 SControl 310)
> [ 5.478403] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [ 5.481928] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648
> [ 5.485904] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648
> [ 5.485913] ata1.00: configured for UDMA/100
> [ 5.505109] sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB)
> [ 5.505461] sd 0:0:0:0: [sda] Write Protect is off
> [ 5.505465] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> [ 5.505612] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or
> FUA
> ...
> [ 6.157259] Restarting tasks ... done.
>
>
> And with echo 0 > /sys/class/scsi_disk/0\:0\:0\:0/manage_start_stop:
>
> [ 2.476476] ata2: SATA link down (SStatus 0 SControl 310)
> ...
> [ 2.825479] Restarting tasks ... done.
> ...
> [ 5.022076] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [ 5.025605] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648
> [ 5.028594] ata1.00: ata_hpa_resize 1: sectors = 234441648, hpa_sectors = 234441648
> [ 5.028606] ata1.00: configured for UDMA/100
> [ 5.028720] sd 0:0:0:0: [sda] 234441648 512-byte hardware sectors (120034 MB)
> [ 5.028767] sd 0:0:0:0: [sda] Write Protect is off
> [ 5.028773] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> [ 5.028831] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or
> FUA
>
> So over all it takes half a second longer to detect the disk, but
> because everything waits on it, it takes more than three seconds
> longer to resume.

Eeeek. Extra three secs doesn't sound too hot. :-(

> Setting manage_start_stop to 0 fixes it and is good enough for me, I
> didn't notice anything bad yet because of the unmanaged
> stop. Implementing background spin up will fix it too.

Just commenting out sd_resume() would be a better solution for your
case tho.

>>> Everything seems to work fine without sd_resume(), so why is it needed?
>> Because not all disks spin up without being told to do so and like it or
>> not spinning disks up on resume is the default behavior. As I wrote in
>> the other reply, it would be worthwhile to make it configurable.
>
> Not even after they receive a read command? Ugh.

After receiving a command which requires media access, they do. What
I was saying is that the current default behavior is to spin up all
devices on resume and part of that is achieved by sd_resume().

Hmmm... skipping START_STOP during sd_resume() actually is a pretty
good solution for ATA devices. I'll think about it.

Thanks.

--
tejun

2007-05-20 19:35:28

by Indan Zupancic

[permalink] [raw]
Subject: Re: [PATCH] libata: implement ata_wait_after_reset()

Hello Tejun,

On Sun, May 20, 2007 19:09, Tejun Heo wrote:
> Indan Zupancic wrote:
>> Don't controllers support spread spin up of disks, to avoid the peak load?
>> I think I saw something about that in the SiI 3512 spec I downloaded. So
>> maybe something like a special spin up method which can be implemented
>> by drivers, and if it isn't, the current start stop thing is used?
>
> What they support is allowing drivers to control spin up, and it needs
> support from the HDD (many seem to do these days) && BIOS (so that it
> doesn't spin up the disks during POST) too. To do that properly, we
> need a repository which distributes spinup tokens such that we can do
> parallel, not thundering herd, spinups.

More and more do I get the impression that with full hardware control for Linux,
from BIOS to firmware, great things could be done. Alas.


> The deadline table sets limits to the maximum time a try can take &&
> when the next attempt is made. So, the initial deadline of 10 secs
> means that the first try is allowed to take upto 10secs and no matter
> when or why it fails, the next attempt is made after 10sec is passed. I
> would love to bang devices harder but a lot of creepy device are out
> there and I'm pretty sure some of them would crap themselves if pushed
> too hard. :-(

Well, if it's going to be retried later on, then it's not really a deadline, is it?
Perhaps it's better to give up after the deadline, and only try again after an
interrupt is received from the controller?


> Anyways, in your case, the problem was that sata_sil wasted the first
> try for no good reason. I'm not determined whether we need more
> aggressive deadlines for early tries for cases like this. It's
> something to think about.

The annoying part is that all this seems to depend on the harddisk, and not
really on the controller (or the combination of the two).


>> Bugger. So it seems like a good idea to do the reset and spinup async together.
>
> There are a lot of cases. Depending on various configuration and whims,
> some drives spin up when power is reapplied and don't respond to reset
> till it's ready. Some drives don't spin up when power is reapplied but
> spin up when PHY is woken up and don't respond to reset till it's ready.
> Yet others just sit quiet and respond to reset nicely but don't spin up
> till told to do so.
>
> So, well, I agree that the best strategy is doing it all asynchronously.

Another one would be to do it interrupt driven instead of doing polling,
if the controller supports it well.


>> The whole resetting, or just the retry after 10 seconds? But it becomes clear now that
>> you're right that the spinup problem is solved if sd_resume does background spin up.
>
> The whole resetting. ATA controller/port resume itself is completely
> asynchronous. The resume method just kicks EH thread and tells it to
> wake the thing up and returns. The EH thread asynchronously resets the
> port, revalidates all the devices and see if any new ones are attached, etc.
>
> The culprit is that while EH thread is active all commands are deferred,
> so the START_STOP command sd_resume() issues waits in the SCSI midlayer
> queue till EH is complete.

Are you saying that right now there is only one EH thread? In that case an
improvent would be to have one EH thread per controller/device, so that at
least independent scsi devices can't block each other.

> That's why doing it asynchronously is
> tricky. I'm not sure whether we would be able to do it without
> separating libata from SCSI so that libata has its own suspend/resume
> functions. :-(

Well, it seems you just moved the other direction, with leaning on sd_resume.


>> Thanks for the pointer, I'll fiddle with it. What do you mean with "proper" here?
>> Not at all, or in a forced way because power goes away?
>
> Well, basically, your drive isn't told that the power is gonna go off.
> Power just goes off and your drive has to do emergency head unload which
> sometimes causes the funny noise.

Hm, I suspected something like that. Though the shutdown sound of my HD
seems to sound the same as usual. Perhaps the BIOS gives it a sign as part of the
suspend? (Or to the controller, which delegates it.)


>> Perhaps a silly question, but how can you ever set manage_start_stop to 0 before
>> the disk is started at bootup? You can change it before doing suspend, but not at
>> bootup. So is this option at the right place?
>
> The problem is sorta complicated because it involves all devices which
> use SCSI midlayer. libata, USB, all the regular SCSI disks, whatnot, so
> it's really hard to sell to linux-scsi people if you choose a default
> behavior which is specific to libata devices. What can be done is
> adding a configurable option which doesn't change the default behavior
> and configure it in the low level driver. This is how manage_start_stop
> is configured. The default value is zero but libata sets it to one
> while registering each device, so the default value for all libata
> devices is 1 (this should answer your last question).

The "problem" here is that the option only shows up after the device is detected,
in which case the default is already executed, rendering the 'start' part useless
for people who don't use s2ram. So the interface does seem to be a bit hampered,
or at least not fully effective. On the other hand, as you said disks are spin up by
the BIOS already, so it's more a theoretical problem than a real one I suppose.

Thanks for your explanations,

Indan


2007-05-20 19:47:49

by Indan Zupancic

[permalink] [raw]
Subject: Re: sd_resume redundant? [was: [PATCH] libata: implement ata_wait_after_reset()]

On Sun, May 20, 2007 19:17, Tejun Heo wrote:
> Indan Zupancic wrote:
>> So over all it takes half a second longer to detect the disk, but
>> because everything waits on it, it takes more than three seconds
>> longer to resume.
>
> Eeeek. Extra three secs doesn't sound too hot. :-(

Hey, without your reset fix it was 11 seconds slower. ;-)

Well, it used to be around 1.7 seconds to resume, and now (after the fixes)
it's 2.8 (according to the dmesg timestamps). But the first timestamp printed
was 0.7, and now it's 2.1, so I don't know if it's a real slowdown or that the timer
is set correctly earlier. That was with 2.19 btw, I don't know what happened in
the meantime. I suspect the timer changes. To know for sure I'd need to boot
an older kernel and measure resume time with a stopwatch.


>> Setting manage_start_stop to 0 fixes it and is good enough for me, I
>> didn't notice anything bad yet because of the unmanaged
>> stop. Implementing background spin up will fix it too.
>
> Just commenting out sd_resume() would be a better solution for your
> case tho.

I like your idea below more. ;-)


>>>> Everything seems to work fine without sd_resume(), so why is it needed?
>>> Because not all disks spin up without being told to do so and like it or
>>> not spinning disks up on resume is the default behavior. As I wrote in
>>> the other reply, it would be worthwhile to make it configurable.
>>
>> Not even after they receive a read command? Ugh.
>
> After receiving a command which requires media access, they do. What
> I was saying is that the current default behavior is to spin up all
> devices on resume and part of that is achieved by sd_resume().
>
> Hmmm... skipping START_STOP during sd_resume() actually is a pretty
> good solution for ATA devices. I'll think about it.

I can't speak for others, but for me this would fix all regressions and even
disk spin up seems half a second faster without the START_STOP.
It would also solve the laptopmode thing, not unnecessarily spinning up
disks that don't need to be (if the hardware doesn't do it).

Greetings,

Indan


2007-05-23 00:08:45

by Paul Mundt

[permalink] [raw]
Subject: Re: [PATCH] libata: implement ata_wait_after_reset()

On Sat, May 19, 2007 at 09:04:56PM +0200, Tejun Heo wrote:
> Tejun Heo wrote:
> > Yeah, if SCR registers are accessible, 0xff doesn't indicate the device
> > isn't there, so the whole skip-0xff logic probably shouldn't apply in
> > such cases, but we can also achieve pretty good result by just making
> > the first reset tries a bit more aggressive.
>
> So, here's the patch.
>
> Paul, can you please test this patch without the previous patch? Indan,
> this should reduce the resume delay. Please test. But you'll still
> feel some added delay compared to 2.6.20 due to the mentioned
> suspend/resume change.
>
Seems to work ok:

[ 0.977254] scsi0 : sata_sil
[ 0.980243] scsi1 : sata_sil
[ 0.983207] ata1: SATA max UDMA/100 cmd 0xfd000280 ctl 0xfd00028a bmdma 0xfd000200 irq 0
[ 0.991183] ata2: SATA max UDMA/100 cmd 0xfd0002c0 ctl 0xfd0002ca bmdma 0xfd000208 irq 0
[ 2.578436] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 2.586828] ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
[ 2.591596] ata1.00: ATA-5: HHD424020F7SV00, 00MLA0A5, max UDMA/100
[ 2.598094] ata1.00: 39070080 sectors, multi 0: LBA
[ 2.603248] ata1.00: applying bridge limits
[ 2.614710] ata1.00: ata_hpa_resize 1: sectors = 39070080, hpa_sectors = 39070080
[ 2.619489] ata1.00: configured for UDMA/100
[ 2.933096] ata2: SATA link down (SStatus 0 SControl 310)
[ 2.936265] scsi 0:0:0:0: Direct-Access ATA HHD424020F7SV00 00ML PQ: 0 ANSI: 5
[ 2.945002] sd 0:0:0:0: [sda] 39070080 512-byte hardware sectors (20004 MB)
[ 2.951473] sd 0:0:0:0: [sda] Write Protect is off

2007-05-29 01:31:51

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] libata: implement ata_wait_after_reset()

Tejun Heo wrote:
> - msleep(150);
> + /* wait a while before checking status */
> + ata_wait_after_reset(ap, deadline);
[...]
> - msleep(150);
> + /* wait a while before checking status */
> + ata_wait_after_reset(ap, deadline);
>
> /* Before we perform post reset processing we want to see if
> * the bus shows 0xFF because the odd clown forgets the D7
> @@ -3543,8 +3583,8 @@ int sata_std_hardreset(struct ata_port *
> return 0;
> }
>
> - /* wait a while before checking status, see SRST for more info */
> - msleep(150);
> + /* wait a while before checking status */
> + ata_wait_after_reset(ap, deadline);
>
> rc = ata_wait_ready(ap, deadline);
[...]
> - msleep(150);
> + /* wait a while before checking status */
> + ata_wait_after_reset(ap, deadline);
>
> /* Before we perform post reset processing we want to see if
> * the bus shows 0xFF because the odd clown forgets the D7
> Index: work/drivers/ata/sata_inic162x.c
> ===================================================================
> --- work.orig/drivers/ata/sata_inic162x.c
> +++ work/drivers/ata/sata_inic162x.c
> @@ -446,7 +446,7 @@ static int inic_hardreset(struct ata_por
> struct ata_taskfile tf;
>
> /* wait a while before checking status */
> - msleep(150);
> + ata_wait_after_reset(ap, deadline);
>
> rc = ata_wait_ready(ap, deadline);
[...]

The main thing that bothers me is not the increase in delay, but the
fact that this create converts a delay/Status-poll sequence into a
delay/Status-poll/Status-poll sequence.

ata_wait_after_reset() immediately before ata_wait_ready() seems highly
redundant. Why not just poll Status once?

Jeff



2007-05-29 07:35:08

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] libata: implement ata_wait_after_reset()

Jeff Garzik wrote:
> Tejun Heo wrote:
>> - msleep(150);
>> + /* wait a while before checking status */
>> + ata_wait_after_reset(ap, deadline);
> [...]
>> - msleep(150);
>> + /* wait a while before checking status */
>> + ata_wait_after_reset(ap, deadline);
>>
>> /* Before we perform post reset processing we want to see if
>> * the bus shows 0xFF because the odd clown forgets the D7
>> @@ -3543,8 +3583,8 @@ int sata_std_hardreset(struct ata_port *
>> return 0;
>> }
>>
>> - /* wait a while before checking status, see SRST for more info */
>> - msleep(150);
>> + /* wait a while before checking status */
>> + ata_wait_after_reset(ap, deadline);
>>
>> rc = ata_wait_ready(ap, deadline);
> [...]
>> - msleep(150);
>> + /* wait a while before checking status */
>> + ata_wait_after_reset(ap, deadline);
>>
>> /* Before we perform post reset processing we want to see if
>> * the bus shows 0xFF because the odd clown forgets the D7
>> Index: work/drivers/ata/sata_inic162x.c
>> ===================================================================
>> --- work.orig/drivers/ata/sata_inic162x.c
>> +++ work/drivers/ata/sata_inic162x.c
>> @@ -446,7 +446,7 @@ static int inic_hardreset(struct ata_por
>> struct ata_taskfile tf;
>>
>> /* wait a while before checking status */
>> - msleep(150);
>> + ata_wait_after_reset(ap, deadline);
>>
>> rc = ata_wait_ready(ap, deadline);
> [...]
>
> The main thing that bothers me is not the increase in delay, but the
> fact that this create converts a delay/Status-poll sequence into a
> delay/Status-poll/Status-poll sequence.
>
> ata_wait_after_reset() immediately before ata_wait_ready() seems highly
> redundant. Why not just poll Status once?

I was trying to minimize code disturbance around reset such that
ata_wait_after_reset() can be drop-in replacement for msleep(150). This
was for two reasons 1. as this patch was to fix regression I didn't want
to introduce a lot of change into -rcX and 2. I was lazy. :-)

As dont-consider-0xff-as-port-empty-if-sstatus-available patch fixes the
regression nicely, I think we can delay this to 2.6.23. I'll merge
ata_wait_after_reset() into ata_wait_ready() (or the other way around).

Thanks.

--
tejun