2006-02-26 13:08:57

by Nick Warne

Subject: hda: irq timeout: status=0xd0 DMA question

Hi all,

Doing my housekeeping today I saw this in logs from last week on one of my
boxes (2.4.32) with then about 92 days uptime:



Feb 19 14:05:31 quake kernel: hda: irq timeout: status=0xd0 { Busy }
Feb 19 14:05:31 quake kernel:
Feb 19 14:05:31 quake smartd[405]: Device: /dev/hda, not capable of SMART self-check
Feb 19 14:05:31 quake smartd[405]: Sending warning via mail to root@localhost ...
Feb 19 14:05:31 quake kernel: hda: status timeout: status=0xd0 { Busy }
Feb 19 14:05:31 quake kernel:
Feb 19 14:05:31 quake kernel: hda: DMA disabled
Feb 19 14:05:31 quake kernel: hda: drive not ready for command
Feb 19 14:05:33 quake kernel: ide0: reset: success



and looking at the drive I saw DMA was indeed now off.


At boot:

Uniform Multi-Platform E-IDE driver Revision: 7.00beta4-2.4
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
PIIX4: IDE controller at PCI slot 00:07.1
PIIX4: chipset revision 1
PIIX4: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:pio, hdd:pio
hda: IBM-DTTA-371010, ATA DISK drive
blk: queue c02a7560, I/O limit 4095Mb (mask 0xffffffff)
hdc: NEC CD-ROM DRIVE:28B, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: attached ide-disk driver.
hda: host protected area => 1
hda: 19746720 sectors (10110 MB) w/465KiB Cache, CHS=1229/255/63, UDMA(33)


I dunno what happened to the drive that time (these are the only logs of the
incident) and I turned DMA back on with hdparm - but my question is why is
DMA turned off and then left off after a reset?

Thanks,

Nick
--
"Person who say it cannot be done should not interrupt person doing it."
-Chinese Proverb
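
For reference, the status=0xd0 value in the log above can be decoded by hand. The following is a minimal sketch, assuming the standard ATA status register bit layout (bit 7 is Busy). The labels for bits that never appear in this thread's logs are assumptions about the driver's wording, and the rule of printing only "Busy" while bit 7 is set is inferred from the log output rather than taken from the driver source.

```python
# Standard ATA status register bits, highest bit first.
# "Busy", "DriveReady", "SeekComplete" and "DataRequest" appear in the
# logs in this thread; the remaining labels are assumed for illustration.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(status, busy_only=True):
    """Return the names of the bits set in an ATA status byte.

    With busy_only=True, only "Busy" is reported when bit 7 is set,
    because the other bits are not meaningful while the drive is busy.
    """
    if busy_only and status & 0x80:
        return ["Busy"]
    return [name for mask, name in STATUS_BITS if status & mask]


if __name__ == "__main__":
    for value in (0xD0, 0x58):
        names = " ".join(decode_status(value))
        print(f"status=0x{value:02x} {{ {names} }}")
```

So 0xd0 is 0x80 + 0x40 + 0x10: the drive was still reporting Busy when the timeout fired, and the other two bits say nothing reliable in that state.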


2006-02-26 14:09:19

by Mark Lord

Subject: Re: hda: irq timeout: status=0xd0 DMA question

Nick Warne wrote:
..
> Feb 19 14:05:31 quake kernel: hda: irq timeout: status=0xd0 { Busy }
> Feb 19 14:05:31 quake kernel:
> Feb 19 14:05:31 quake smartd[405]: Device: /dev/hda, not capable of SMART
> self-check
> Feb 19 14:05:31 quake smartd[405]: Sending warning via mail to
> root@localhost ...
> Feb 19 14:05:31 quake kernel: hda: status timeout: status=0xd0 { Busy }
> Feb 19 14:05:31 quake kernel:
> Feb 19 14:05:31 quake kernel: hda: DMA disabled
> Feb 19 14:05:31 quake kernel: hda: drive not ready for command
> Feb 19 14:05:33 quake kernel: ide0: reset: success
..
> I dunno what happened to the drive that time (this is the only logs of the
> incident) and I turned DMA back on with hdparm - but my question is why is
> DMA turned off and then left off after a reset?

When I wrote that code in the mid-1990s, the most common causes of drives
getting confused (and needing to be reset again) were improper DMA timings
or cabling, and buggy DMA firmware.

So at the time, since DMA was a newish feature for IDE, we figured that
turning it off after reset was a Good Thing(tm).

And it was. A more modern implementation might try being more clever about
such stuff, and Tejun is working on something like that for libata.

In the meanwhile, you could have a shell script just loop in the background,
turning DMA back on periodically. If you care.

Cheers
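
The background loop suggested above could look roughly like this. It is a minimal sketch in Python rather than shell, assuming hdparm is installed and the script runs as root. The function names, the five-minute interval, and the `rounds` parameter are hypothetical; only `hdparm -d1 <device>` is the real command. Blindly re-enabling DMA on a drive that keeps failing may simply trigger more resets, so this only makes sense where the original fault was a one-off.

```python
import subprocess
import time


def enable_dma(device, run=subprocess.run):
    """Ask hdparm to switch DMA on for `device`; return the command used."""
    command = ["hdparm", "-d1", device]
    run(command, check=True)
    return command


def keep_dma_on(device, interval_seconds=300, rounds=None,
                sleep=time.sleep, run=subprocess.run):
    """Re-enable DMA every `interval_seconds`.

    rounds=None loops forever; a number limits the loop, which is
    handy for trying the function out. Returns the number of rounds run.
    """
    done = 0
    while rounds is None or done < rounds:
        enable_dma(device, run=run)
        done += 1
        sleep(interval_seconds)
    return done
```

Calling `keep_dma_on("/dev/hda")` from a root shell would then re-assert DMA every five minutes until interrupted.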

2006-02-26 14:15:09

by Jesper Juhl

Subject: Re: hda: irq timeout: status=0xd0 DMA question

On 2/26/06, Mark Lord wrote:
> Nick Warne wrote:
> ..
> > Feb 19 14:05:31 quake kernel: hda: irq timeout: status=0xd0 { Busy }
> > Feb 19 14:05:31 quake kernel:
> > Feb 19 14:05:31 quake smartd[405]: Device: /dev/hda, not capable of SMART
> > self-check
> > Feb 19 14:05:31 quake smartd[405]: Sending warning via mail to
> > root@localhost ...
> > Feb 19 14:05:31 quake kernel: hda: status timeout: status=0xd0 { Busy }
> > Feb 19 14:05:31 quake kernel:
> > Feb 19 14:05:31 quake kernel: hda: DMA disabled
> > Feb 19 14:05:31 quake kernel: hda: drive not ready for command
> > Feb 19 14:05:33 quake kernel: ide0: reset: success
> ..
> > I dunno what happened to the drive that time (this is the only logs of the
> > incident) and I turned DMA back on with hdparm - but my question is why is
> > DMA turned off and then left off after a reset?
>
> When I wrote that code in the mid-1990s, the number one causes of drives
> getting confused (and needing to be reset again), were improper DMA timings,
> cablings, and buggy DMA firmware.
>
> So at the time, since DMA was a newish feature for IDE, we figured that
> turning it off after reset was a Good Thing(tm).
>
> And it was. A more modern implementation might try being more clever about
> such stuff, and Tejun is working on something like that for libata.
>
> In the meanwhile, you could have a shell script just loop in the background,
> turning DMA back on periodically. If you care.
>

Or how about an option for the IDE driver to "not do that" that people
could enable if needed/wanted?
Or just change the code to "not do that" since we are no longer in the
mid-1990s?

--
Jesper Juhl
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-02-26 17:02:00

by Nick Warne

Subject: Re: hda: irq timeout: status=0xd0 DMA question

On Sunday 26 February 2006 14:15, Jesper Juhl wrote:
> On 2/26/06, Mark Lord wrote:
> > Nick Warne wrote:

> > > I dunno what happened to the drive that time (this is the only logs of
> > > the incident) and I turned DMA back on with hdparm - but my question is
> > > why is DMA turned off and then left off after a reset?
> >
> > When I wrote that code in the mid-1990s, the number one causes of drives
> > getting confused (and needing to be reset again), were improper DMA
> > timings, cablings, and buggy DMA firmware.
> >
> > So at the time, since DMA was a newish feature for IDE, we figured that
> > turning it off after reset was a Good Thing(tm).
> >
> > And it was. A more modern implementation might try being more clever
> > about such stuff, and Tejun is working on something like that for libata.

OK, I see...


> > In the meanwhile, you could have a shell script just loop in the
> > background, turning DMA back on periodically. If you care.

I don't like that idea - anyway, it's the first time I have ever seen this on
that box in 4 years; it was a quirk somewhere I think (maybe a power
fluctuation or the like).


> Or how about an option for the IDE driver to "not do that" that people
> could enable if needed/wanted?
> Or just change the code to "not do that" since we are no longer in the
> mid-1990s?

Good idea!

Nick
--
"Person who say it cannot be done should not interrupt person doing it."
-Chinese Proverb

2006-02-26 17:08:07

by Mark Lord

Subject: Re: hda: irq timeout: status=0xd0 DMA question

Jesper Juhl wrote:
>
> Or how about an option for the IDE driver to "not do that" that people
> could enable if needed/wanted?
> Or just change the code to "not do that" since we are no longer in the
> mid-1990s?

Well, yes. That's what I would do, were I still maintaining the IDE layer.

But that code has become so twisted and confused since then,
that a change like this is probably too risky/challenging for
the current maintainers. It seems really easy to break stuff
when touching parts of that code now, and people don't like it
much when their hard drives get corrupted.

But perhaps someone may successfully implement this.

Cheers

2006-02-26 17:17:14

by Jesper Juhl

Subject: Re: hda: irq timeout: status=0xd0 DMA question

On 2/26/06, Mark Lord wrote:
> Jesper Juhl wrote:
> >
> > Or how about an option for the IDE driver to "not do that" that people
> > could enable if needed/wanted?
> > Or just change the code to "not do that" since we are no longer in the
> > mid-1990s?
>
> Well, yes. That's what I would do, were I still maintaining the IDE layer.
>
> But that code has become so twisted and confused since then,
> that a change like this is probably too risky/challenging for
> the current maintainers. It seems really easy to break stuff
> when touching parts of that code now, and people don't like it
> much when their hard drives get corrupted.
>
> But perhaps someone may successfully implement this.
>
Unfortunately my machines only have SCSI devices, so I'd have no way
to actually test a patch, otherwise I'd be happy to give it a shot - a
parameter to disable the behaviour shouldn't be too difficult to
implement, and if the default stays as the current behaviour then it
shouldn't be too controversial.
I wouldn't mind trying to hack up a patch, but it would be untested...

--
Jesper Juhl
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-02-26 17:20:39

by Nick Warne

Subject: Re: hda: irq timeout: status=0xd0 DMA question

On Sunday 26 February 2006 17:17, Jesper Juhl wrote:

> > But perhaps someone may successfully implement this.
>
> Unfortunately my machines only have SCSI devices, so I'd have no way
> to actually test a patch, otherwise I'd be happy to give it a shot - a
> parameter to disable the behaviour shouldn't be too difficult to
> implement, and if the default stays as the current behaviour then it
> shouldn't be too controversial.
> I wouldn't mind trying to hack up a patch, but it would be untested...

Post it to me - but look at my original post - this is/was on kernel 2.4.32.
I have yet to see such output on 2.6.x series kernels.

I could test that for you, as I have a test box at work running 2.4.32 that
gets these strange disk errors sometimes (never have nailed that one down).

Nick

--
"Person who say it cannot be done should not interrupt person doing it."
-Chinese Proverb

2006-02-26 17:35:05

by Jesper Juhl

Subject: Re: hda: irq timeout: status=0xd0 DMA question

On 2/26/06, Nick Warne wrote:
> On Sunday 26 February 2006 17:17, Jesper Juhl wrote:
>
> > > But perhaps someone may successfully implement this.
> >
> > Unfortunately my machines only have SCSI devices, so I'd have no way
> > to actually test a patch, otherwise I'd be happy to give it a shot - a
> > parameter to disable the behaviour shouldn't be too difficult to
> > implement, and if the default stays as the current behaviour then it
> > shouldn't be too controversial.
> > I wouldn't mind trying to hack up a patch, but it would be untested...
>
> Post it to me - but look at my original post - this is/was on kernel 2.4.32.
> I have yet to see such output on 2.6.x series kernels.
>
> I could test that for you, as I have a test box at work running 2.4.32 that
> gets these strange disk errors sometimes (never have nailed that one down).
>

I haven't looked at 2.4.x for years, so whatever patch I cook up
would be for 2.6.x

My time currently is limited so it'll probably be a few days before I
have something ready for you to test, but thank you very much for the
offer, I'll get back to you shortly after I've embedded myself in the
IDE code and hopefully cooked something up that makes sense for you to
test.

--
Jesper Juhl
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-02-26 18:10:59

by Henrik Persson

Subject: Re: hda: irq timeout: status=0xd0 DMA question

Nick Warne wrote:
> On Sunday 26 February 2006 17:17, Jesper Juhl wrote:
>
>
>>>But perhaps someone may successfully implement this.
>>
>>Unfortunately my machines only have SCSI devices, so I'd have no way
>>to actually test a patch, otherwise I'd be happy to give it a shot - a
>>parameter to disable the behaviour shouldn't be too difficult to
>>implement, and if the default stays as the current behaviour then it
>>shouldn't be too controversial.
>>I wouldn't mind trying to hack up a patch, but it would be untested...
>
>
> Post it to me - but look at my original post - this is/was on kernel 2.4.32.
> I have yet to see such output on 2.6.x series kernels.

I get those on 2.6.x.

Does happen once or twice a year.. Probably something funky with the
cabling or some power-related issues.

Anyway, I would be happy if the IDE driver would "just not do that". :)

--
Henrik Persson

2006-02-26 18:43:19

by Robert Hancock

Subject: Re: hda: irq timeout: status=0xd0 DMA question

Henrik Persson wrote:
> Does happen once or twice a year.. Probably something funky with the
> cabling or some power-related issues.
>
> Anyway, I would be happy if the IDE driver would "just not do that". :)

I can see the reasoning where the device just doesn't function properly
with DMA at all (like on some Compact Flash-to-IDE adapters where the
card claims to support DMA but the DMA lines aren't wired through in the
adapter properly). In that case not disabling DMA would render it
useless. The IDE layer could keep track of whether DMA was previously
working on that device however, and not disable DMA on reset if it had
previously been working.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2006-02-26 18:49:40

by Jesper Juhl

Subject: Re: hda: irq timeout: status=0xd0 DMA question

On 2/26/06, Robert Hancock wrote:
> Henrik Persson wrote:
> > Does happen once or twice a year.. Probably something funky with the
> > cabling or some power-related issues.
> >
> > Anyway, I would be happy if the IDE driver would "just not do that". :)
>
> I can see the reasoning where the device just doesn't function properly
> with DMA at all (like on some Compact Flash-to-IDE adapters where the
> card claims to support DMA but the DMA lines aren't wired through in the
> adapter properly). In that case not disabling DMA would render it
> useless. The IDE layer could keep track of whether DMA was previously
> working on that device however, and not disable DMA on reset if it had
> previously been working.
>
That might be even better than an option to tell the driver "I don't
want you to disable DMA on reset".

--
Jesper Juhl
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-02-26 20:33:03

by Mark Lord

Subject: Re: hda: irq timeout: status=0xd0 DMA question

Robert Hancock wrote:
> Henrik Persson wrote:
>> Does happen once or twice a year.. Probably something funky with the
>> cabling or some power-related issues.
>>
>> Anyway, I would be happy if the IDE driver would "just not do that". :)
>
> I can see the reasoning where the device just doesn't function properly
> with DMA at all (like on some Compact Flash-to-IDE adapters where the
> card claims to support DMA but the DMA lines aren't wired through in the
> adapter properly). In that case not disabling DMA would render it
> useless. The IDE layer could keep track of whether DMA was previously
> working on that device however, and not disable DMA on reset if it had
> previously been working.

Definitely. Where these things get sticky is in defining "DMA was working".
And keeping track of it separately for reads and writes.

cheers
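
The policy being discussed here (keep DMA after a reset only if it had been working before, tracked separately for reads and writes) might be modelled like this. It is purely an illustrative sketch, not kernel code: the class, its methods, and especially the threshold of 1000 successful transfers are hypothetical, and picking that threshold is exactly the sticky part of defining "DMA was working".

```python
from dataclasses import dataclass, field


@dataclass
class DmaHistory:
    """Per-drive count of DMA transfers that completed without error."""

    ok: dict = field(default_factory=lambda: {"read": 0, "write": 0})

    def record_success(self, direction):
        """Note one successful DMA transfer ("read" or "write")."""
        self.ok[direction] += 1

    def keep_dma_after_reset(self, failed_direction, threshold=1000):
        """Decide whether DMA should stay enabled after a reset.

        DMA is kept only if the direction that just failed had already
        succeeded at least `threshold` times (an arbitrary cut-off).
        A device whose DMA never worked, such as the miswired Compact
        Flash adapter case, stays at zero and falls back to PIO.
        """
        return self.ok[failed_direction] >= threshold
```

Under this model a drive with months of clean DMA reads would keep DMA after a single read timeout, while a device that has never completed a DMA write would still be dropped to PIO on its first write failure.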

2006-02-26 21:11:13

by Nick Warne

Subject: Re: hda: irq timeout: status=0xd0 DMA question

> > I can see the reasoning where the device just doesn't function properly
> > with DMA at all (like on some Compact Flash-to-IDE adapters where the
> > card claims to support DMA but the DMA lines aren't wired through in the
> > adapter properly). In that case not disabling DMA would render it
> > useless. The IDE layer could keep track of whether DMA was previously
> > working on that device however, and not disable DMA on reset if it had
> > previously been working.
>
> Definitely. Where these things get sticky is in defining "DMA was working".
> And keeping track of it separately for reads and writes.

Hey guys, keep the CC intact, I missed these.

Yes, all the above points are valid and right, I think.

As users we know whether DMA is OK on an IDE device, right? Then let the user
have an option to make it permanent; otherwise carry on as the code does now
when ideX needs a reset.

Nick.

2006-02-27 13:33:19

by Mark Lord

Subject: Re: hda: irq timeout: status=0xd0 DMA question

Nick Warne wrote:
>
> As a user we know if DMA is OK on a ide device, right? Then let user have
> option to set it permanent, else carry on as the code does now when idex
> needs a reset.

Does "hdparm -K1 /dev/hda" solve the problem? That's what that option was
for originally, but I don't know if the IDE driver still uses it correctly.

Cheers

2006-02-27 18:32:44

by Nick Warne

Subject: Re: hda: irq timeout: status=0xd0 DMA question

On Monday 27 February 2006 13:32, Mark Lord wrote:
> Nick Warne wrote:
> > As a user we know if DMA is OK on a ide device, right? Then let user
> > have option to set it permanent, else carry on as the code does now when
> > idex needs a reset.
>
> Does "hdparm -K1 /dev/hda" solve the problem? That's what that option was
> for originally, but I don't know if the IDE driver still uses it correctly.

Strangely, I was reading up on this at work today, and it does indeed look
like what is required (although my man page refers to -k for options -dmu) -
so I set both the -k1 and -K1 options.

Now to wait and see the drive produce the error.

Thanks for help Mark,

Nick
--
"Person who say it cannot be done should not interrupt person doing it."
-Chinese Proverb

2006-03-02 10:33:09

by Nick Warne

Subject: Re: hda: irq timeout: status=0xd0 DMA question

> Now to wait and see the drive produce the error.


OK, that doesn't work - it appears everything gets reset anyway. Both drives
here had -K1 and -k1 set with hdparm:


Mar 2 10:28:29 website2 kernel: blk: queue c033da3c, I/O limit 4095Mb (mask 0xffffffff)
Mar 2 10:28:29 website2 kernel: hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
Mar 2 10:28:29 website2 kernel:
Mar 2 10:28:29 website2 kernel: hda: drive not ready for command
Mar 2 10:28:29 website2 kernel: hda: status timeout: status=0xd0 { Busy }
Mar 2 10:28:29 website2 kernel:
Mar 2 10:28:29 website2 kernel: hda: DMA disabled
Mar 2 10:28:29 website2 kernel: hdb: DMA disabled
Mar 2 10:28:29 website2 kernel: hda: drive not ready for command
Mar 2 10:28:29 website2 kernel: ide0: reset: success


[nick@website2 nick]$ sudo /sbin/hdparm /dev/hda

/dev/hda:
multcount = 16 (on)
I/O support = 1 (32-bit)
unmaskirq = 1 (on)
using_dma = 0 (off)
keepsettings = 1 (on)
nowerr = 0 (off)
readonly = 0 (off)
readahead = 8 (on)
geometry = 784/255/63, sectors = 12594960, start = 0


[nick@website2 nick]$ sudo /sbin/hdparm /dev/hdb

/dev/hdb:
multcount = 0 (off)
I/O support = 1 (32-bit)
unmaskirq = 1 (on)
using_dma = 0 (off)
keepsettings = 1 (on)
nowerr = 0 (off)
readonly = 0 (off)
readahead = 8 (on)
geometry = 525/255/63, sectors = 8439184, start = 0



Nick
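
The hdparm listings above show the mismatch directly: keepsettings is 1 but using_dma is 0. A check for that state could be scripted as below. This is a minimal sketch that only parses text in the format shown above; the function names are hypothetical, and the parser deliberately skips lines such as the geometry line that do not follow the "name = number (state)" shape.

```python
import re

# Matches lines such as "using_dma = 0 (off)" or "I/O support = 1 (32-bit)".
SETTING_LINE = re.compile(r"^\s*(.+?)\s*=\s*(\d+)\s+\(")


def parse_hdparm(text):
    """Map each 'name = number (state)' line of hdparm output to an int."""
    settings = {}
    for line in text.splitlines():
        match = SETTING_LINE.match(line)
        if match:
            settings[match.group(1)] = int(match.group(2))
    return settings


def dma_lost_despite_keepsettings(settings):
    """True when keepsettings is on but DMA has been switched off anyway."""
    return settings.get("keepsettings") == 1 and settings.get("using_dma") == 0


if __name__ == "__main__":
    sample = """/dev/hda:
multcount = 16 (on)
using_dma = 0 (off)
keepsettings = 1 (on)
geometry = 784/255/63, sectors = 12594960, start = 0
"""
    print(dma_lost_despite_keepsettings(parse_hdparm(sample)))
```

Feeding it the output of `hdparm /dev/hda` after an incident would confirm whether the drive ended up in the state reported above.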