2006-08-01 16:16:34

by Tejun Heo

Subject: Re: 2.6.18-rc2, problem to wake up spinned down drive?

Harald Dunkel wrote:
> Hi folks,
>
> I tried to spin down my harddisk using hdparm, but when it is
> supposed to spin up again, then it is blocked for quite some
> time. dmesg says:
>
> ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata1.00: (BMDMA stat 0x20)
> ata1.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)
> ata1: port is slow to respond, please be patient
> ata1: port failed to respond (30 secs)
> ata1: soft resetting port
> ata1.00: configured for UDMA/133
> ata1: EH complete
> SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
> sda: Write Protect is off
> sda: Mode Sense: 00 3a 00 00
> SCSI device sda: drive cache: write back
>
> The disk is a SAMSUNG SP1614C.
>
> On another machine (with a SAMSUNG SP2504C inside) there is no
> such problem: The disk is back after just a few seconds.

In standby mode, the drive's interface and state machines stay online,
and the drive is supposed to spin up and process a command when it
receives one. The above message is printed because an IO command
hasn't finished in 30 secs, meaning that the drive didn't wake up when
it should have. The drive seems to be acting incorrectly.
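One way to read the log line "tag 0 cmd 0xca ... stat 0x40" is to decode the ATA Status register bits. 0xca is WRITE DMA and 0x40 is DRDY alone. By that reading, the drive said it was ready and not busy when the timeout fired, yet it never completed the command. That fits the "drive acts incorrectly" diagnosis. This is a small decoding sketch. The ATA_SR_* names and both helpers are hypothetical, chosen so they don't clash with libata's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* ATA Status register bits (hypothetical local names). */
enum {
	ATA_SR_BSY  = 0x80, /* device busy */
	ATA_SR_DRDY = 0x40, /* device ready */
	ATA_SR_DF   = 0x20, /* device fault */
	ATA_SR_DRQ  = 0x08, /* data request: transfer pending */
	ATA_SR_ERR  = 0x01, /* error: see the Error register */
};

/* True if the status looks like an idle, healthy device: ready, and
 * not busy, faulted, mid-transfer or reporting an error. */
static bool ata_status_idle_ok(uint8_t stat)
{
	return !(stat & (ATA_SR_BSY | ATA_SR_DF | ATA_SR_DRQ | ATA_SR_ERR)) &&
	       (stat & ATA_SR_DRDY);
}

/* True while the device still has BSY set, i.e. it is processing. */
static bool ata_status_busy(uint8_t stat)
{
	return stat & ATA_SR_BSY;
}
```

For the logged value, ata_status_idle_ok(0x40) is true and ata_status_busy(0x40) is false. Nothing in the status byte explains why the command never finished.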

> Is there some trick to wake up the disk a little bit faster?

Can you try the following instead of hdparm?

echo 1 > /sys/bus/scsi/devices/1:0:0:0/power/state

It will get libata involved in putting the disk to sleep and waking it
up, and, when waking, it will kick the drive in the ass by resetting the
channel. Please try with the latest -rc kernel.

--
tejun


2006-08-01 18:17:08

by Harald Dunkel

Subject: Re: 2.6.18-rc2, problem to wake up spinned down drive?

Tejun Heo wrote:
>
> Can you try the following instead of hdparm?
>
> echo 1 > /sys/bus/scsi/devices/1:0:0:0/power/state
>
> It will make libata involved in putting the disk to sleep and waking it
> up, and, when waking, it will kick the drive in the ass by resetting the
> channel. Please try with the latest -rc kernel.
>

Sorry to say, but this did not work:

# echo 1 > /sys/bus/scsi/devices/0:0:0:0/power/state
bash: echo: write error: Invalid argument
# ll !$
ll /sys/bus/scsi/devices/0:0:0:0/power/state
-rw-r--r-- 1 root root 0 Aug 1 20:00 /sys/bus/scsi/devices/0:0:0:0/power/state
# cat !$
cat /sys/bus/scsi/devices/0:0:0:0/power/state
0
# uname -a
Linux bugs 2.6.18-rc3 #2 PREEMPT Sun Jul 30 16:26:22 CEST 2006 i686 GNU/Linux


Regards

Harri




2006-08-01 18:23:13

by Tejun Heo

Subject: Re: 2.6.18-rc2, problem to wake up spinned down drive?

Harald Dunkel wrote:
> Tejun Heo wrote:
>> Can you try the following instead of hdparm?
>>
>> echo 1 > /sys/bus/scsi/devices/1:0:0:0/power/state
>>
>> It will make libata involved in putting the disk to sleep and waking it
>> up, and, when waking, it will kick the drive in the ass by resetting the
>> channel. Please try with the latest -rc kernel.
>>
>
> Sorry to say, but this did not work:
>
> # echo 1 > /sys/bus/scsi/devices/0:0:0:0/power/state
> bash: echo: write error: Invalid argument
> # ll !$
> ll /sys/bus/scsi/devices/0:0:0:0/power/state
> -rw-r--r-- 1 root root 0 Aug 1 20:00 /sys/bus/scsi/devices/0:0:0:0/power/state
> # cat !$
> cat /sys/bus/scsi/devices/0:0:0:0/power/state
> 0
> # uname -a
> Linux bugs 2.6.18-rc3 #2 PREEMPT Sun Jul 30 16:26:22 CEST 2006 i686 GNU/Linux

You should probably do 'echo -n 1'; the parsing function is pretty picky.

--
tejun

2006-08-02 16:52:39

by Bill Davidsen

Subject: Re: 2.6.18-rc2, problem to wake up spinned down drive?

Tejun Heo wrote:
> Harald Dunkel wrote:
>> Tejun Heo wrote:
>>> Can you try the following instead of hdparm?
>>>
>>> echo 1 > /sys/bus/scsi/devices/1:0:0:0/power/state
>>>
>>> It will make libata involved in putting the disk to sleep and waking it
>>> up, and, when waking, it will kick the drive in the ass by resetting the
>>> channel. Please try with the latest -rc kernel.
>>>
>>
>> Sorry to say, but this did not work:
>>
>> # echo 1 > /sys/bus/scsi/devices/0:0:0:0/power/state
>> bash: echo: write error: Invalid argument
>> # ll !$
>> ll /sys/bus/scsi/devices/0:0:0:0/power/state
>> -rw-r--r-- 1 root root 0 Aug 1 20:00 /sys/bus/scsi/devices/0:0:0:0/power/state
>> # cat !$
>> cat /sys/bus/scsi/devices/0:0:0:0/power/state
>> 0
>> # uname -a
>> Linux bugs 2.6.18-rc3 #2 PREEMPT Sun Jul 30 16:26:22 CEST 2006 i686 GNU/Linux
>
> You probably should do 'echo -n 1', the parsing function is pretty picky.
>
Given that a "cat" of state returned a zero followed by a newline, that
seems unreasonably picky. On a Fedora kernel it just doesn't seem to
work for SATA drives (sample size = 1).
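A store routine that accepts exactly what "cat" prints back is not hard to write. This sketch is a hypothetical, more tolerant parser, not the kernel's actual power/state code. It accepts "0" or "3", optionally followed by one trailing newline as written by a plain `echo`. Those are the two values (D0 and D3) the thread settles on later. Anything else is rejected with -EINVAL.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical lenient parser for a power/state-style attribute.
 * buf/count are the raw bytes written by userspace (not NUL-terminated).
 * Returns 0 and stores the state on success, -EINVAL otherwise.
 */
static int parse_power_state(const char *buf, size_t count, int *state)
{
	/* Tolerate a single trailing newline, as produced by plain `echo`. */
	if (count > 0 && buf[count - 1] == '\n')
		count--;

	/* Exactly one digit must remain. */
	if (count != 1)
		return -EINVAL;

	switch (buf[0]) {
	case '0':		/* D0: wake up */
		*state = 0;
		return 0;
	case '3':		/* D3: put to sleep */
		*state = 3;
		return 0;
	default:
		return -EINVAL;
	}
}
```

With this parser, `echo 3` and `echo -n 3` behave the same, and a wrong value such as 1 still fails with "Invalid argument".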

--
Bill Davidsen <[email protected]>
Obscure bug of 2004: BASH BUFFER OVERFLOW - if bash is being run by a
normal user and is setuid root, with the "vi" line edit mode selected,
and the character set is "big5," an off-by-one error occurs during
wildcard (glob) expansion.

2006-08-05 19:33:08

by Harald Dunkel

Subject: Re: 2.6.18-rc2, problem to wake up spinned down drive?

Tejun Heo wrote:
> Harald Dunkel wrote:
>>
>> Sorry to say, but this did not work:
>>
>> # echo 1 > /sys/bus/scsi/devices/0:0:0:0/power/state
>> bash: echo: write error: Invalid argument
>
> You probably should do 'echo -n 1', the parsing function is pretty picky.
>

# echo -n 1 > /sys/bus/scsi/devices/0:0:0:0/power/state
bash: echo: write error: Invalid argument

???


Regards

Harri




2006-08-06 22:00:22

by Pavel Machek

Subject: Re: 2.6.18-rc2, problem to wake up spinned down drive?

Hi!

> >ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> >ata1.00: (BMDMA stat 0x20)
> >ata1.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)
> >ata1: port is slow to respond, please be patient
> >ata1: port failed to respond (30 secs)
> >ata1: soft resetting port
> >ata1.00: configured for UDMA/133
> >ata1: EH complete
> >SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
> >sda: Write Protect is off
> >sda: Mode Sense: 00 3a 00 00
> >SCSI device sda: drive cache: write back
> >
> >The disk is a SAMSUNG SP1614C.
> >
> >On another machine (with a SAMSUNG SP2504C inside) there is no
> >such problem: The disk is back after just a few seconds.
>
> In standby mode, the drive's interface and state machines stay online
> and are supposed to spin up and process the command when it receives
> one. The above message is printed because an IO command hasn't finished
> in 30 secs meaning that it didn't wake up when it should have. The
> drive seems to act incorrectly.
>
> >Is there some trick to wake up the disk a little bit faster?
>
> Can you try the following instead of hdparm?
>
> echo 1 > /sys/bus/scsi/devices/1:0:0:0/power/state

Really? I thought power/state takes 0/3 (for D0 and D3)

Pavel
--
Thanks for all the (sleeping) penguins.

2006-08-07 03:07:47

by Tejun Heo

Subject: Re: 2.6.18-rc2, problem to wake up spinned down drive?

Pavel Machek wrote:
>> echo 1 > /sys/bus/scsi/devices/1:0:0:0/power/state
>
> Really? I thought power/state takes 0/3 (for D0 and D3)

Yes, of course. My mistake. Sorry about the confusion. The correct
command is 'echo -n 3 > /sys/bus/scsi/devices/x:y:z:w/power/state'.
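When this is done from a program rather than the shell, the equivalent of `echo -n 3` is a single one-byte write with no newline. A plain `echo 3` would send a second byte, the trailing newline. This is a minimal sketch under the semantics stated in this thread: 3 puts the device to sleep (D3) and 0 wakes it (D0). The function name write_power_state is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write a single-digit power state ("0" = wake/D0, "3" = sleep/D3) to a
 * power/state file with no trailing newline, like `echo -n 3 > ...`.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int write_power_state(const char *path, int state)
{
	char c;
	ssize_t n;
	int fd;

	if (state != 0 && state != 3) {	/* only D0 and D3 are meaningful */
		errno = EINVAL;
		return -1;
	}
	c = (char)('0' + state);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	n = write(fd, &c, 1);		/* exactly one byte, no '\n' */
	if (n != 1) {
		int saved = errno;
		close(fd);
		errno = (n < 0) ? saved : EIO;
		return -1;
	}
	return close(fd);
}
```

For example, write_power_state("/sys/bus/scsi/devices/0:0:0:0/power/state", 3) would put Harald's disk to sleep, and the same call with 0 would wake it explicitly.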

--
tejun

2006-08-07 18:43:29

by Harald Dunkel

Subject: Re: 2.6.18-rc2, problem to wake up spinned down drive?

Tejun Heo wrote:
> Pavel Machek wrote:
>>> echo 1 > /sys/bus/scsi/devices/1:0:0:0/power/state
>>
>> Really? I thought power/state takes 0/3 (for D0 and D3)
>
> Yes, of course. My mistake. Sorry about the confusion. The correct
> command is 'echo -n 3 > /sys/bus/scsi/devices/x:y:z:w/power/state'.
>

(Sure? :-)

Now this did not work at all. The '-n 3' was probably
correct, but when I tried to access the disk, then it
did not spin up again (I waited for 5 minutes). There
was no message on the console, either.

But I could not reproduce this problem.

How do I monitor that the disk spins down and up?


Regards

Harri




2006-08-07 19:27:09

by Tejun Heo

Subject: Re: 2.6.18-rc2, problem to wake up spinned down drive?

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 98bd3aa..5676388 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -99,7 +99,7 @@ #define SD_MAX_DISKS (((26 * 26) + 26 +
 /*
  * Time out in seconds for disks and Magneto-opticals (which are slower).
  */
-#define SD_TIMEOUT (30 * HZ)
+#define SD_TIMEOUT (7 * HZ)
 #define SD_MOD_TIMEOUT (75 * HZ)
 
 /*
diff --git a/include/linux/libata.h b/include/linux/libata.h
index b941670..45686f9 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -200,9 +200,9 @@ enum {
  ATA_HOST_SIMPLEX = (1 << 0), /* Host is simplex, one DMA channel per host_set only */
 
  /* various lengths of time */
- ATA_TMOUT_BOOT = 30 * HZ, /* heuristic */
- ATA_TMOUT_BOOT_QUICK = 7 * HZ, /* heuristic */
- ATA_TMOUT_INTERNAL = 30 * HZ,
+ ATA_TMOUT_BOOT = 10 * HZ, /* heuristic */
+ ATA_TMOUT_BOOT_QUICK = 5 * HZ, /* heuristic */
+ ATA_TMOUT_INTERNAL = 10 * HZ,
  ATA_TMOUT_INTERNAL_QUICK = 5 * HZ,
 
  /* ATA bus states */


Attachments:
patch (987.00 B)
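The patch only changes constants written as seconds multiplied by HZ, the kernel's tick rate. The real timeout in jiffies therefore depends on the HZ the kernel was built with. This sketch tabulates the old and new values from the patch. The struct and helper names are hypothetical, and HZ=250 in the note below is just one common build configuration.

```c
/* Old vs. new timeouts from the patch, in seconds (multiplied by HZ in-kernel). */
struct tmout_change {
	const char *name;
	unsigned int old_secs;
	unsigned int new_secs;
};

static const struct tmout_change patch_tmouts[] = {
	{ "SD_TIMEOUT",           30,  7 },
	{ "ATA_TMOUT_BOOT",       30, 10 },
	{ "ATA_TMOUT_BOOT_QUICK",  7,  5 },
	{ "ATA_TMOUT_INTERNAL",   30, 10 },
};

/* Seconds to jiffies for a given HZ, mirroring the "secs * HZ" form above. */
static unsigned long tmout_jiffies(unsigned int secs, unsigned int hz)
{
	return (unsigned long)secs * hz;
}
```

At HZ=250, SD_TIMEOUT drops from 7500 to 1750 jiffies. A command lost by a drive that fails to wake would then be noticed after 7s instead of 30s.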

2006-08-08 18:40:05

by Bill Davidsen

Subject: Re: 2.6.18-rc2, problem to wake up spinned down drive?

Tejun Heo wrote:

> Harald Dunkel wrote:
>
>> Tejun Heo wrote:
>>
>>> Pavel Machek wrote:
>>>
>>>>> echo 1 > /sys/bus/scsi/devices/1:0:0:0/power/state
>>>>
>>>> Really? I thought power/state takes 0/3 (for D0 and D3)
>>>
>>> Yes, of course. My mistake. Sorry about the confusion. The correct
>>> command is 'echo -n 3 > /sys/bus/scsi/devices/x:y:z:w/power/state'.
>>>
>>
>> (Sure? :-)
>
>
> The sleeping part is correct. That will make libata put the disk to
> sleep.
>
>> Now this did not work at all. The '-n 3' was probably
>> correct, but when I tried to access the disk, then it
>> did not spin up again (I waited for 5 minutes). There
>> was no message on the console, either.
>>
>> But I could not reproduce this problem.
>>
>> How do I monitor that the disk spins down and up?
>
>
> But the waking up part isn't. You need to issue wake up explicitly by
> doing 'echo -n 0 > /sys/...' I've been a complete idiot in this
> thread. Please excuse me. :-(
>
> I think the solution to your problem is adjusting command timeout to
> more reasonable values which should make the problem more bearable.
> It'll take some time to figure out how to make timeouts more
> intelligent without breaking support for slow devices. I'll work on
> that.

Tejun, would it be possible and sensible to either let the user tune
this per-drive, or to have the kernel note how long {something} takes
and auto-tune to that? As you said, the issue is not breaking slow devices.

--
bill davidsen <[email protected]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979

2006-08-08 18:59:22

by Tejun Heo

Subject: Re: 2.6.18-rc2, problem to wake up spinned down drive?

Bill Davidsen wrote:
>> I think the solution to your problem is adjusting command timeout to
>> more reasonable values which should make the problem more bearable.
>> It'll take some time to figure out how to make timeouts more
>> intelligent without breaking support for slow devices. I'll work on
>> that.
>
> Tejun, would it be possible and sensible to either let the user tune
> this per-drive, or to have the kernel note how long {something} takes
> and auto-tune to that? As you said, the issue is not breaking slow devices.

I think the driver can be given enough static intelligence that neither
user nor automatic tuning is required. The !BSY waits in
prereset/postreset, which are often the cause of unnecessary 30s delays
during recovery, can be avoided by...

1. For !hotplug, waiting for BSY to clear before reset doesn't make
sense in the first place (why would we be resetting the device if it
could clear BSY?).

2. For hotplug, we can make things much more intelligent, e.g. try
prereset waiting and a softreset during 0-5s, then hardresets during
5-10s, 10-15s, 15-30s and 30-60s. This guarantees that (a) a slow
device is given a full idle 30s to eventually get ready, and (b) the
recovery reset completes within 60s, while fast devices still get
several chances to be fast.
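The hotplug schedule in point 2 can be written as a table of reset windows, which makes both guarantees easy to check. This is a sketch of the proposal as described, not libata's implementation. All type, table and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum reset_kind { RESET_SOFT, RESET_HARD };

/* One recovery attempt: reset kind plus window [start_s, end_s) in seconds. */
struct reset_window {
	enum reset_kind kind;
	unsigned int start_s;
	unsigned int end_s;
};

/* Prereset wait + softreset in 0-5s, then hardresets 5-10, 10-15, 15-30, 30-60. */
static const struct reset_window hotplug_schedule[] = {
	{ RESET_SOFT,  0,  5 },
	{ RESET_HARD,  5, 10 },
	{ RESET_HARD, 10, 15 },
	{ RESET_HARD, 15, 30 },
	{ RESET_HARD, 30, 60 },
};

#define N_WINDOWS (sizeof(hotplug_schedule) / sizeof(hotplug_schedule[0]))

/* Window active at elapsed_s seconds into recovery, or NULL once 60s are used up. */
static const struct reset_window *reset_window_at(unsigned int elapsed_s)
{
	for (size_t i = 0; i < N_WINDOWS; i++)
		if (elapsed_s >= hotplug_schedule[i].start_s &&
		    elapsed_s < hotplug_schedule[i].end_s)
			return &hotplug_schedule[i];
	return NULL;
}

/* Check the guarantees: contiguous windows starting at 0, everything done
 * within 60s, and a final window giving a slow device a full 30s. */
static bool schedule_is_sane(void)
{
	if (hotplug_schedule[0].start_s != 0)
		return false;
	for (size_t i = 1; i < N_WINDOWS; i++)
		if (hotplug_schedule[i].start_s != hotplug_schedule[i - 1].end_s)
			return false;
	const struct reset_window *last = &hotplug_schedule[N_WINDOWS - 1];
	return last->end_s <= 60 && last->end_s - last->start_s >= 30;
}
```

Fast devices get four short windows to come back early. A slow device can still use the whole 30-60s window.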

And, for IO command timeouts, some operating system is said to use a 7s
timeout for ATA IO commands, and simply adopting that value should be
good enough. We can also choose more aggressive timeouts for some EH
commands (IDENTIFY, SET_FEATURES...).
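Per-command timeout selection along these lines could look like the sketch below. The opcodes are the standard ATA ones, including the WRITE DMA (0xca) from the original log. The 7s IO value comes from the paragraph above. The 5s EH value is only an assumption, borrowed from ATA_TMOUT_INTERNAL_QUICK in the patch, and all names here are hypothetical. For simplicity, every command outside EH gets the IO timeout.

```c
#include <stdbool.h>

/* Standard ATA opcodes (hypothetical local names). */
enum {
	ATA_OP_READ_DMA     = 0xC8,
	ATA_OP_WRITE_DMA    = 0xCA, /* the "cmd 0xca" that timed out in the log */
	ATA_OP_IDENTIFY     = 0xEC, /* IDENTIFY DEVICE */
	ATA_OP_SET_FEATURES = 0xEF,
};

#define IO_TMOUT_SECS       7	/* suggested timeout for ATA IO commands */
#define EH_QUICK_TMOUT_SECS 5	/* assumption: not a tested value */

/* Pick a timeout: shorter for selected commands issued by EH, 7s otherwise. */
static unsigned int cmd_timeout_secs(unsigned char opcode, bool from_eh)
{
	if (from_eh && (opcode == ATA_OP_IDENTIFY ||
			opcode == ATA_OP_SET_FEATURES))
		return EH_QUICK_TMOUT_SECS;
	return IO_TMOUT_SECS;
}
```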

With all of the above combined, EH recovery should be pretty snappy and
recovery time well bounded.

--
tejun