2004-09-01 15:29:22

by Romano Giannetti

Subject: Re: Driver retries disk errors.

On Tue, Aug 31, 2004 at 05:12:50PM +0100, Alan Cox wrote:
>
> > (1) Imagine an application doing a linear read on a file with an 8
> > block read ahead and the last block being bad. The kernel will try to
> > read that bad block 8 times, but because the IDE driver also has 8
> > retries, the kernel will try to read that bad block *64* times. It
> > usually takes an IDE drive about 2 seconds to figure out a block is
> > bad, so the application gets stuck for 2 minutes in that single bad
> > block.
>
> Right now I know of no way to tell which is readahead for a failed
> command or of telling the block layer to forget them. Fix this at the
> block layer and IDE can abort the readahead sequence happily enough
> because IDE is too dumb to have issued further commands to the drive at
> this point.
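The arithmetic in the quoted scenario can be checked with a short sketch. It rests on a simplified model (an assumption, not a description of the kernel's actual readahead code): each of the 8 readahead requests re-issues the bad block once, the IDE driver retries every such request 8 times, and each attempt costs the drive about 2 seconds. `worst_case_stall` is a hypothetical helper name.

```python
def worst_case_stall(requests, driver_retries, seconds_per_attempt):
    """Estimate how long one bad block stalls a sequential reader.

    Hypothetical model: every request that touches the bad block is retried
    `driver_retries` times by the driver, and each attempt spends
    `seconds_per_attempt` seconds inside the drive before it reports failure.
    Returns (total attempts, total seconds).
    """
    attempts = requests * driver_retries
    return attempts, attempts * seconds_per_attempt


# Disk case from the quoted mail: 8 readahead requests x 8 retries x ~2 s.
attempts, seconds = worst_case_stall(8, 8, 2.0)
print(f"{attempts} attempts, {seconds:.0f} s (~{seconds / 60:.1f} min)")
# prints "64 attempts, 128 s (~2.1 min)"
```

With a single request, 8 driver retries and roughly 8 seconds per attempt (the ide-cd estimate given later in the thread), the same model gives 64 seconds, about the minute reported for the damaged DVD.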

Just a question from a kernel-almost-illiterate. Could this explain the
behavior of my laptop yesterday, reading a damaged DVD? I had to wait almost
a full minute of retries before I was able to kill xine...

If the retries are kept, it would be nice to at least allow kill -9
between them. I do not know if that's foolish and/or impossible, so please
do not bash too hard...

Have a nice day,
Romano


--
Romano Giannetti - Univ. Pontificia Comillas (Madrid, Spain)
Electronic Engineer - phone +34 915 422 800 ext 2416 fax +34 915 596 569


2004-09-01 15:47:39

by Alan

Subject: Re: Driver retries disk errors.

On Wed, 2004-09-01 at 16:28, Romano Giannetti wrote:
> Just a question from a kernel-almost-illiterate. Could this explain the
> behavior of my laptop yesterday, reading a damaged DVD? I had to wait almost
> a full minute of retries before I was able to kill xine...

That's the block layer. It's actually hard to fix the kill -9 case.

> If the retries are kept, it would be nice to at least allow kill -9
> between them. I do not know if that's foolish and/or impossible, so please
> do not bash too hard...

Things like Xine are precisely the cases where you want retry turned off
by the application - if the sector is bad then you want to skip when
playing movies, while you don't want to skip while writing out your
database.
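The distinction between a media player and a database can be illustrated at the application level with a minimal sketch. It does not turn off driver or drive retries (no such interface is assumed here, and each failing read still costs the full retry delay); it only shows the two policies for what to do once a read fails with EIO. `read_all` and `file_reader` are hypothetical helpers, and the 2048-byte chunk size is chosen because data CD and DVD sectors carry 2048 bytes of user data.

```python
import errno
import os


def read_all(read_at, total_size, chunk=2048, skip_bad=True):
    """Read `total_size` bytes through `read_at(offset, length)`.

    skip_bad=True  -> media-player policy: a chunk that fails with EIO is
                      replaced by zero bytes and reading continues.
    skip_bad=False -> database policy: the first EIO propagates to the caller.
    Returns (data, offsets_of_bad_chunks).
    """
    out = bytearray()
    bad_chunks = []
    offset = 0
    while offset < total_size:
        length = min(chunk, total_size - offset)
        try:
            data = read_at(offset, length)
        except OSError as e:
            if e.errno != errno.EIO or not skip_bad:
                raise
            bad_chunks.append(offset)
            data = bytes(length)  # zero-filled placeholder for the bad region
        if not data:
            break  # unexpected end of file
        out += data
        offset += len(data)
    return bytes(out), bad_chunks


def file_reader(fd):
    """Adapter so read_all() can read a real file descriptor via os.pread."""
    return lambda offset, length: os.pread(fd, length, offset)
```

A player would call `read_all(file_reader(fd), size)` and tolerate the zero-filled gaps; a database would pass `skip_bad=False` so the error reaches the caller instead of being papered over.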

2004-09-01 23:21:58

by Rogier Wolff

Subject: Re: Driver retries disk errors.

On Wed, Sep 01, 2004 at 03:44:38PM +0100, Alan Cox wrote:
> On Wed, 2004-09-01 at 16:28, Romano Giannetti wrote:
> > Just a question from a kernel-almost-illiterate. Could this explain the
> > behavior of my laptop yesterday, reading a damaged DVD? I had to wait almost
> > a full minute of retries before I was able to kill xine...
>
> That's the block layer. It's actually hard to fix the kill -9 case.

I don't think so. It starts with the ide-cd level driver
doing 8 retries. Most disks we see retry themselves for about a
4 second delay before reporting a bad block. A CD taking twice
that much would not sound abnormal. (seeks are about 10 times
as expensive on CDs). 8 times 8 seconds is a full minute.

Roger.

--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam - no windows, no gates, apache inside!" ****

2004-09-02 10:31:44

by Alan

Subject: Re: Driver retries disk errors.

On Thu, 2004-09-02 at 00:14, Rogier Wolff wrote:
> I don't think so. It starts with the ide-cd level driver
> doing 8 retries. Most disks we see retry themselves for about a
> 4 second delay before reporting a bad block. A CD taking twice

"Most", that is the heart of the reason for not taking them out.

> that much would not sound abnormal. (seeks are about 10 times
> as expensive on CDs). 8 times 8 seconds is a full minute.

As I said, media players need a way to turn retries off.

2004-09-02 11:00:42

by Rogier Wolff

Subject: Re: Driver retries disk errors.

On Thu, Sep 02, 2004 at 10:29:29AM +0100, Alan Cox wrote:
> On Thu, 2004-09-02 at 00:14, Rogier Wolff wrote:
> > I don't think so. It starts with the ide-cd level driver
> > doing 8 retries. Most disks we see retry themselves for about a
> > 4 second delay before reporting a bad block. A CD taking twice
>
> "Most", that is the heart of the reason for not taking them out.

Some retry for only about a second; the rest take more than
4 seconds.

Roger.

2004-09-02 14:31:31

by John Stoffel

Subject: Re: Driver retries disk errors.


>> that much would not sound abnormal. (seeks are about 10 times
>> as expensive on CDs). 8 times 8 seconds is a full minute.

Alan> As I said, media players need a way to turn retries off.

I just ran into this with a scratched CD-ROM and the program 'grip'.
It ended up requiring a reboot of my 2.6.8 kernel to regain
control of /dev/cdrom on my system. Needless to say, I wasn't very
happy about this.

I really think that we need some way to keep such deadlocks from
happening. I really dislike having a device lock up a user application
so hard that it can't be exited. There's no real reason we should be
doing this any more. If we have to, let the user kill it and just
have the kernel make it into a zombie, but at least let the user kill
it off.

John

2004-09-02 16:01:44

by Alan

Subject: Re: Driver retries disk errors.

On Thu, 2004-09-02 at 15:30, John Stoffel wrote:
> I really think that we need some way to keep such deadlocks from
> happening. I really dislike having a device lock up a user application
> so hard that it can't be exited. There's no real reason we should be
> doing this any more. If we have to, let the user kill it and just
> have the kernel make it into a zombie, but at least let the user kill
> it off.

If you had to reboot, file a bug: none of the block error recovery code
or anything below it should ever hang indefinitely.

2004-09-02 16:08:37

by John Stoffel

Subject: Re: Driver retries disk errors.


Alan> On Thu, 2004-09-02 at 15:30, John Stoffel wrote:
>> I really think that we need some way to keep such deadlocks from
>> happening. I really dislike having a device lock up a user application
>> so hard that it can't be exited. There's no real reason we should be
>> doing this any more. If we have to, let the user kill it and just
>> have the kernel make it into a zombie, but at least let the user kill
>> it off.

Alan> If you had to reboot, file a bug: none of the block error
Alan> recovery code or anything below it should ever hang indefinitely.

Once I can reproduce it reliably, I'll send a better report. I've
been holding off on my comments until now, but got caught up in the
moment.

I also know now that it should time out and come back to life. I even
had a backtrace of the hung process, but didn't save it. Mea culpa.

2004-09-02 16:27:40

by Eric D. Mudama

Subject: Re: Driver retries disk errors.

On Wed, 01 Sep 2004 15:44:38 +0100, Alan Cox <[email protected]> wrote:
> Things like Xine are precisely the cases where you want retry turned off
> by the application - if the sector is bad then you want to skip when
> playing movies, while you don't want to skip while writing out your
> database.

This is what they're trying to accomplish with the ATA-7 Streaming Feature
Set: tell the drive to just read through errors and send the
garbage, without doing error recovery, for high-bandwidth media
readback. The first drives to support this feature set will be coming
out relatively soon...