2009-10-26 16:20:17

by Philippe De Muyter

Subject: [PATCH ide] : Increase WAIT_DRQ to support slow CF cards

Hi,

I just encountered a problem with write-access to a batch of CF cards
(KINGSTON TECHNOLOGY 4GB COMPACT FLASH CF/4GB
3.3V/5V 9904321 - 006.AOOLF 4449081 - 1219643 X001 ASSY IN TAIWAN (c) 2008)
connected to a PC-CARD / PCMCIA interface, with the following error messages :

hda: status timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hda: no DRQ after issuing MULTWRITE

After testing with different bigger values for the WAIT_DRQ timeout value,
the problem disappeared. I had success with WAIT_DRQ = 500ms, then with
WAIT_DRQ = 300ms. I then tested with WAIT_DRQ = 200ms, but the problem
reappeared. So I kept the 300ms value.

Signed-off-by: Philippe De Muyter <[email protected]>

diff -r a145344bb228 include/linux/ide.h
--- a/include/linux/ide.h Thu Oct 22 08:28:28 2009 +0900
+++ b/include/linux/ide.h Mon Oct 26 16:51:23 2009 +0100
@@ -125,8 +125,8 @@
  * Timeouts for various operations:
  */
 enum {
-	/* spec allows up to 20ms */
-	WAIT_DRQ = HZ / 10, /* 100ms */
+	/* spec allows up to 20ms, but some CF cards need more than 200ms */
+	WAIT_DRQ = 3 * HZ / 10, /* 300ms */
 	/* some laptops are very slow */
 	WAIT_READY = 5 * HZ, /* 5s */
 	/* should be less than 3ms (?), if all ATAPI CD is closed at boot */
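
[Editor's illustration, not part of the patch.] The timeouts in this enum are expressed in jiffies, so the millisecond values in the comments follow from the tick rate HZ. Below is a minimal sketch of that arithmetic. The helper names (ticks_to_ms, wait_drq_old, wait_drq_proposed) are hypothetical and the hz parameter merely stands in for the kernel's HZ constant; none of this is kernel code.

```c
#include <assert.h>

/* Convert a timeout in ticks (jiffies) to milliseconds for a given
 * tick rate. `hz` stands in for the kernel's HZ; this helper is
 * hypothetical and is not a kernel function. */
unsigned int ticks_to_ms(unsigned int ticks, unsigned int hz)
{
	assert(hz > 0);
	return ticks * 1000u / hz;
}

/* WAIT_DRQ before the patch: HZ / 10 ticks, i.e. 100ms. */
unsigned int wait_drq_old(unsigned int hz)
{
	return hz / 10;
}

/* WAIT_DRQ as proposed in the patch: 3 * HZ / 10 ticks, i.e. 300ms. */
unsigned int wait_drq_proposed(unsigned int hz)
{
	return 3 * hz / 10;
}
```

For example, with a tick rate of 250, wait_drq_proposed(250) is 75 ticks and ticks_to_ms(75, 250) is 300, so the expression gives the same wall-clock timeout for the common tick rates 100, 250 and 1000.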


2009-10-27 00:34:57

by Robert Hancock

Subject: Re: [PATCH ide] : Increase WAIT_DRQ to support slow CF cards

On 10/26/2009 10:20 AM, Philippe De Muyter wrote:
> Hi,
>
> I just encountered a problem with write-access to a batch of CF cards
> (KINGSTON TECHNOLOGY 4GB COMPACT FLASH CF/4GB
> 3.3V/5V 9904321 - 006.AOOLF 4449081 - 1219643 X001 ASSY IN TAIWAN (c) 2008)
> connected to a PC-CARD / PCMCIA interface, with the following error messages :
>
> hda: status timeout: status=0xd0 { Busy }
> ide: failed opcode was: unknown
> hda: no DRQ after issuing MULTWRITE
>
> After testing with different bigger values for the WAIT_DRQ timeout value,
> the problem disappeared. I had success with WAIT_DRQ = 500ms, then with
> WAIT_DRQ = 300ms. I then tested with WAIT_DRQ = 200ms, but the problem
> reappeared. So I kept the 300ms value.
>
> Signed-off-by: Philippe De Muyter <[email protected]>
>
> diff -r a145344bb228 include/linux/ide.h
> --- a/include/linux/ide.h Thu Oct 22 08:28:28 2009 +0900
> +++ b/include/linux/ide.h Mon Oct 26 16:51:23 2009 +0100
> @@ -125,8 +125,8 @@
> * Timeouts for various operations:
> */
> enum {
> - /* spec allows up to 20ms */
> - WAIT_DRQ = HZ / 10, /* 100ms */
> + /* spec allows up to 20ms, but some CF cards need more than 200ms */
> + WAIT_DRQ = 3 * HZ / 10, /* 300ms */
> /* some laptops are very slow */
> WAIT_READY = 5 * HZ, /* 5s */
> /* should be less than 3ms (?), if all ATAPI CD is closed at boot */

This has come up before:

http://marc.info/?l=linux-ide&m=123064513313466&w=2

I think this timeout should not even exist. libata has no such timeout
(only the overall command completion timeout), and I can't find any
reference in current ATA specs to the device being required to raise DRQ
in any particular amount of time.

2009-10-27 00:45:04

by David Miller

Subject: Re: [PATCH ide] : Increase WAIT_DRQ to support slow CF cards

From: Robert Hancock <[email protected]>
Date: Mon, 26 Oct 2009 18:34:57 -0600

> This has come up before:
>
> http://marc.info/?l=linux-ide&m=123064513313466&w=2
>
> I think this timeout should not even exist. libata has no such timeout
> (only the overall command completion timeout), and I can't find any
> reference in current ATA specs to the device being required to raise
> DRQ in any particular amount of time.

So is the issue that, whilst we should wait for BUSY to clear,
waiting around for DRQ is unreasonable?

It seems that WAIT_DRQ is passed to ide_wait_stat() but that
only controls how long we wait for BUSY to clear, the ATA_DRQ
'bad' bit we pass there only gets probed in a fixed limit loop:

	for (i = 0; i < 10; i++) {
		udelay(1);
		stat = tp_ops->read_status(hwif);

		if (OK_STAT(stat, good, bad)) {
			*rstat = stat;
			return 0;
		}
	}
	*rstat = stat;
	return -EFAULT;

Therefore, if increasing WAIT_DRQ helps things for people, it's
because the BUSY bit needs that much time to clear in these
cases.

The talking in that thread seems to state that the ATA layer
waits only for BUSY to clear, it does not wait for DRQ. But
from the data we're seeing here, it is in fact BUSY which needs
so much more time to clear so removing the DRQ bit probe to
be more like ATA won't fix anything.
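
[Editor's illustration, not part of the original message.] The two-phase behaviour described above can be modelled in user space. This is only a sketch under stated assumptions: struct fake_dev, fake_status() and wait_bsy_then_drq() are hypothetical names, time is a simulated microsecond counter, and the real ide_wait_stat() differs in detail. It only shows that the timeout argument bounds the BSY phase, while the DRQ check afterwards is a fixed ten probes.

```c
#include <stdint.h>

#define ST_BSY 0x80	/* ATA status register: busy */
#define ST_DRQ 0x08	/* ATA status register: data request */

/* Hypothetical simulated device: BSY clears and DRQ rises at fixed times. */
struct fake_dev {
	uint32_t bsy_clear_us;	/* BSY is set before this time */
	uint32_t drq_set_us;	/* DRQ is set from this time on */
};

/* Status the simulated device would report at time now_us. */
uint8_t fake_status(const struct fake_dev *d, uint32_t now_us)
{
	uint8_t s = 0;

	if (now_us < d->bsy_clear_us)
		s |= ST_BSY;
	if (now_us >= d->drq_set_us)
		s |= ST_DRQ;
	return s;
}

/*
 * Returns 0 if DRQ was seen, -1 if BSY outlasted timeout_us,
 * -2 if BSY cleared but DRQ did not appear within ten 1us probes.
 */
int wait_bsy_then_drq(const struct fake_dev *d, uint32_t timeout_us)
{
	uint32_t now = 0;
	int i;

	/* Phase 1: the only part governed by the timeout. */
	while (fake_status(d, now) & ST_BSY) {
		if (now >= timeout_us)
			return -1;
		now++;
	}

	/* Phase 2: fixed-length probe, independent of the timeout. */
	for (i = 0; i < 10; i++) {
		now++;
		if (fake_status(d, now) & ST_DRQ)
			return 0;
	}
	return -2;
}
```

In this model, a device that holds BSY for 250ms fails with a 100ms timeout and succeeds with a 300ms one, whereas a device that drops BSY quickly but raises DRQ late fails however large the timeout is.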

2009-10-27 01:07:19

by Robert Hancock

Subject: Re: [PATCH ide] : Increase WAIT_DRQ to support slow CF cards

On Mon, Oct 26, 2009 at 6:45 PM, David Miller <[email protected]> wrote:
> From: Robert Hancock <[email protected]>
> Date: Mon, 26 Oct 2009 18:34:57 -0600
>
>> This has come up before:
>>
>> http://marc.info/?l=linux-ide&m=123064513313466&w=2
>>
>> I think this timeout should not even exist. libata has no such timeout
>> (only the overall command completion timeout), and I can't find any
>> reference in current ATA specs to the device being required to raise
>> DRQ in any particular amount of time.
>
> So is the issue that, whilst we should wait for BUSY to clear,
> waiting around for DRQ is unreasonable?
>
> It seems that WAIT_DRQ is passed to ide_wait_stat() but that
> only controls how long we wait for BUSY to clear, the ATA_DRQ
> 'bad' bit we pass there only gets probed in a fixed limit loop:
>
> 	for (i = 0; i < 10; i++) {
> 		udelay(1);
> 		stat = tp_ops->read_status(hwif);
>
> 		if (OK_STAT(stat, good, bad)) {
> 			*rstat = stat;
> 			return 0;
> 		}
> 	}
> 	*rstat = stat;
> 	return -EFAULT;
>
> Therefore, if increasing WAIT_DRQ helps things for people, it's
> because the BUSY bit needs that much time to clear in these
> cases.
>
> The talking in that thread seems to state that the ATA layer
> waits only for BUSY to clear, it does not wait for DRQ. But
> from the data we're seeing here, it is in fact BUSY which needs
> so much more time to clear so removing the DRQ bit probe to
> be more like ATA won't fix anything.

Hmm, I think you're right.. seems it expects BSY to be de-asserted
within 100ms when issuing a write, which is fairly ridiculous. Maybe
not a problem for a hard drive in typical cases, but if a CF or SSD is
in an erase cycle or something it's quite possible for this not to
work.

Of course, just jacking up the timeout may make the problem alluded to
in the comment in __ide_wait_stat more evident ("This routine should
get fixed to not hog the cpu during extra long waits"), as it just
does a tight loop polling the status with no sleeps.

libata only busy-waits for 50 microseconds, if not set then it sleeps
for 2ms and polls for another 10 microseconds, if still not set it
tries the whole thing again at 16ms intervals. Only after (typically)
30 seconds does it give up.
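
[Editor's illustration, not part of the original message.] One possible reading of the polling schedule described above, written as a user-space simulation. The constants come from the paragraph above; the function name polled_wait_us(), the simulated clock, and the exact way the retry interval is applied are assumptions for illustration and do not reproduce libata's code.

```c
#include <stdint.h>

#define SPIN_FIRST_US	50u		/* initial busy-wait */
#define SLEEP_US	2000u		/* sleep if still not ready */
#define SPIN_SECOND_US	10u		/* short poll after the sleep */
#define RETRY_US	16000u		/* interval before trying again */
#define GIVE_UP_US	30000000u	/* overall limit, typically 30s */

/*
 * Simulate waiting for a device that becomes ready at ready_at_us.
 * Returns the simulated time (in microseconds) at which readiness is
 * noticed, or -1 if the overall limit is reached first.
 */
int64_t polled_wait_us(uint64_t ready_at_us)
{
	uint64_t now = 0;
	unsigned int i;

	for (;;) {
		/* Busy-wait in 1us steps. */
		for (i = 0; i < SPIN_FIRST_US; i++) {
			if (now >= ready_at_us)
				return (int64_t)now;
			now++;
		}

		/* Sleep, then poll briefly again. */
		now += SLEEP_US;
		for (i = 0; i < SPIN_SECOND_US; i++) {
			if (now >= ready_at_us)
				return (int64_t)now;
			now++;
		}

		if (now >= GIVE_UP_US)
			return -1;

		/* Not ready: come back later instead of hogging the CPU. */
		now += RETRY_US;
	}
}
```

The point of the sketch is the trade-off: a device that is ready within 50us is noticed almost immediately, a slower one is noticed a few milliseconds late, and the CPU is spinning for only about 60us out of every 18ms while it waits.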

2009-10-27 01:19:00

by David Miller

Subject: Re: [PATCH ide] : Increase WAIT_DRQ to support slow CF cards

From: Robert Hancock <[email protected]>
Date: Mon, 26 Oct 2009 19:07:18 -0600

> libata only busy-waits for 50 microseconds, if not set then it sleeps
> for 2ms and polls for another 10 microseconds, if still not set it
> tries the whole thing again at 16ms intervals. Only after (typically)
> 30 seconds does it give up.

Porting that kind of logic over to IDE is a non-starter.

It's easier to get people to move over to using the ATA layer for
their devices.

Meanwhile we should provide a way for things to work, and
realistically the only way to do that currently is to bump the
WAIT_DRQ value to some large number.

And that's exactly the kind of patch I'm willing to accept for this.

2009-10-27 01:40:00

by Robert Hancock

Subject: Re: [PATCH ide] : Increase WAIT_DRQ to support slow CF cards

On Mon, Oct 26, 2009 at 7:19 PM, David Miller <[email protected]> wrote:
> From: Robert Hancock <[email protected]>
> Date: Mon, 26 Oct 2009 19:07:18 -0600
>
>> libata only busy-waits for 50 microseconds, if not set then it sleeps
>> for 2ms and polls for another 10 microseconds, if still not set it
>> tries the whole thing again at 16ms intervals. Only after (typically)
>> 30 seconds does it give up.
>
> Porting that kind of logic over to IDE is a non-starter.
>
> It's easier to get people to move over to using the ATA layer for
> their devices.
>
> Meanwhile we should provide a way for things to work, and
> realistically the only way to do that currently is to bump the
> WAIT_DRQ value to some large number.
>
> And that's exactly the kind of patch I'm willing to accept for this.

I agree, it's sub-optimal but it helps.. if the user wants better
behavior they should a) fix it so that the card isn't using PIO, at
least if it supports DMA and b) not use drivers/ide..

2009-10-27 01:42:56

by David Miller

Subject: Re: [PATCH ide] : Increase WAIT_DRQ to support slow CF cards

From: Robert Hancock <[email protected]>
Date: Mon, 26 Oct 2009 19:40:03 -0600

> On Mon, Oct 26, 2009 at 7:19 PM, David Miller <[email protected]> wrote:
>> Meanwhile we should provide a way for things to work, and
>> realistically the only way to do that currently is to bump the
>> WAIT_DRQ value to some large number.
>>
>> And that's exactly the kind of patch I'm willing to accept for this.
>
> I agree, it's sub-optimal but it helps.. if the user wants better
> behavior they should a) fix it so that the card isn't using PIO, at
> least if it supports DMA and b) not use drivers/ide..

Philippe's patch that started this thread uses "3 * HZ / 10"
which isn't large enough for the SSD cases. Can someone please
post a patch that uses a large enough value?

Thanks.

2009-10-27 09:45:17

by Philippe De Muyter

Subject: Re: [PATCH ide] : Increase WAIT_DRQ to support slow CF cards

Hi David,

On Mon, Oct 26, 2009 at 06:43:18PM -0700, David Miller wrote:
> From: Robert Hancock <[email protected]>
> Date: Mon, 26 Oct 2009 19:40:03 -0600
>
> > On Mon, Oct 26, 2009 at 7:19 PM, David Miller <[email protected]> wrote:
> >> Meanwhile we should provide a way for things to work, and
> >> realistically the only way to do that currently is to bump the
> >> WAIT_DRQ value to some large number.
> >>
> >> And that's exactly the kind of patch I'm willing to accept for this.
> >
> > I agree, it's sub-optimal but it helps.. if the user wants better
> > behavior they should a) fix it so that the card isn't using PIO, at
> > least if it supports DMA and b) not use drivers/ide..

Strangely enough, I also had no timeout problem if I started my kernel with
'ide=nodma', instead of increasing WAIT_DRQ. So I surmise that WAIT_DRQ
is used in the dma case.

>
> Philippe's patch that started this thread uses "3 * HZ / 10"
> which isn't large enough for the SSD cases. Can someone please
> post a patch that uses a large enough value?

How big a timeout do you want or accept? Mark Lord wrote about SSDs in the
mail referred to by Robert Hancock:

    It should probably be at least 500msec or more now.

Philippe

2009-10-27 10:24:59

by Sergei Shtylyov

Subject: Re: [PATCH ide] : Increase WAIT_DRQ to support slow CF cards

Hello.

Philippe De Muyter wrote:

>> From: Robert Hancock <[email protected]>
>> Date: Mon, 26 Oct 2009 19:40:03 -0600
>>
>>
>>> On Mon, Oct 26, 2009 at 7:19 PM, David Miller <[email protected]> wrote:
>>>
>>>> Meanwhile we should provide a way for things to work, and
>>>> realistically the only way to do that currently is to bump the
>>>> WAIT_DRQ value to some large number.
>>>>
>>>> And that's exactly the kind of patch I'm willing to accept for this.
>>>>
>>> I agree, it's sub-optimal but it helps.. if the user wants better
>>> behavior they should a) fix it so that the card isn't using PIO, at
>>> least if it supports DMA and b) not use drivers/ide..
>>>
>
> Strangely enough, I also had no timeout problem if I started my kernel with
> 'ide=nodma', instead of increasing WAIT_DRQ.

Hm, interesting...

> So I surmise that WAIT_DRQ is used in the dma case.
>
>

It's used only for the PIO write commands -- see do_rw_taskfile() in
ide-taskfile.c... DMA commands don't require waiting for BSY=0, DRQ=1
condition.

> Philippe
>

WBR, Sergei

2009-10-31 13:56:31

by Mark Lord

Subject: Re: [PATCH ide] : Increase WAIT_DRQ to support slow CF cards

Robert Hancock wrote:
..
> This has come up before:
>
> http://marc.info/?l=linux-ide&m=123064513313466&w=2
>
> I think this timeout should not even exist. libata has no such timeout
> (only the overall command completion timeout), and I can't find any
> reference in current ATA specs to the device being required to raise DRQ
> in any particular amount of time.
..

The reason for the original (20ms, then 50ms) timeout was this text
from the ATA1 specification, long since outdated:

- Upon receipt of a Class 3 command, the drive sets BSY within 400 nsec,
sets up the sector buffer for a write operation, sets DRQ within 20
msec, and clears BSY within 400 nsec of setting DRQ.

Cheers

2009-12-03 05:57:33

by David Miller

Subject: Re: [PATCH ide] : Increase WAIT_DRQ to support slow CF cards

From: Mark Lord <[email protected]>
Date: Sat, 31 Oct 2009 09:56:26 -0400

> Robert Hancock wrote:
> ..
>> This has come up before:
>> http://marc.info/?l=linux-ide&m=123064513313466&w=2
>> I think this timeout should not even exist. libata has no such timeout
>> (only the overall command completion timeout), and I can't find any
>> reference in current ATA specs to the device being required to raise
>> DRQ in any particular amount of time.
> ..
>
> The reason for the original (20ms, then 50ms) timeout was this text
> from the ATA1 specification, long since outdated:
>
> - Upon receipt of a Class 3 command, the drive sets BSY within 400 nsec,
> sets up the sector buffer for a write operation, sets DRQ within 20
> msec, and clears BSY within 400 nsec of setting DRQ.

Ok, I'd like to resolve this as follows. We had stated "at least
500msec for SSD drives" so I doubled it.

This should be a pretty safe change. The only major side effect is
that if the device really does hang before setting DRQ it will take
a full second before we notice it.

Any major objections?

ide: Increase WAIT_DRQ to accommodate some CF cards and SSD drives.

Based upon a patch by Philippe De Muyter, and feedback from Mark
Lord and Robert Hancock.

As noted by Mark Lord, the outdated ATA1 spec specifies a 20msec
timeout for setting DRQ but lots of common devices overshoot this.

Signed-off-by: David S. Miller <[email protected]>

diff --git a/include/linux/ide.h b/include/linux/ide.h
index e4135d6..0ec6129 100644
--- a/include/linux/ide.h
+++ b/include/linux/ide.h
@@ -125,8 +125,8 @@ struct ide_io_ports {
  * Timeouts for various operations:
  */
 enum {
-	/* spec allows up to 20ms */
-	WAIT_DRQ = HZ / 10, /* 100ms */
+	/* spec allows up to 20ms, but CF cards and SSD drives need more */
+	WAIT_DRQ = 1 * HZ, /* 1s */
 	/* some laptops are very slow */
 	WAIT_READY = 5 * HZ, /* 5s */
 	/* should be less than 3ms (?), if all ATAPI CD is closed at boot */

2009-12-03 08:55:39

by Philippe De Muyter

Subject: Re: [PATCH ide] : Increase WAIT_DRQ to support slow CF cards

On Wed, Dec 02, 2009 at 09:57:38PM -0800, David Miller wrote:
> ide: Increase WAIT_DRQ to accommodate some CF cards and SSD drives.
>
> Based upon a patch by Philippe De Muyter, and feedback from Mark
> Lord and Robert Hancock.
>
> As noted by Mark Lord, the outdated ATA1 spec specifies a 20msec
> timeout for setting DRQ but lots of common devices overshoot this.
>
> Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Philippe De Muyter <[email protected]>
>
> diff --git a/include/linux/ide.h b/include/linux/ide.h
> index e4135d6..0ec6129 100644
> --- a/include/linux/ide.h
> +++ b/include/linux/ide.h
> @@ -125,8 +125,8 @@ struct ide_io_ports {
> * Timeouts for various operations:
> */
> enum {
> - /* spec allows up to 20ms */
> - WAIT_DRQ = HZ / 10, /* 100ms */
> + /* spec allows up to 20ms, but CF cards and SSD drives need more */
> + WAIT_DRQ = 1 * HZ, /* 1s */
> /* some laptops are very slow */
> WAIT_READY = 5 * HZ, /* 5s */
> /* should be less than 3ms (?), if all ATAPI CD is closed at boot */