2003-06-30 22:01:23

by Marek Michalkiewicz

Subject: 2.4.21 IDE problems (lost interrupt, bad DMA status)

Hi,

After upgrading the kernel from 2.4.20 to 2.4.21, sometimes I see
the following messages:

hda: dma_timer_expiry: dma status == 0x24
hda: lost interrupt
hda: dma_intr: bad DMA status (dma_stat=30)
hda: dma_intr: status=0x50 { DriveReady SeekComplete }

It happens especially when there is a lot of disk I/O (which stops
for a few seconds when these messages appear), with three different
disks (very unlikely they all decided to die at the same time...),
one old ATA33 (QUANTUM FIREBALL SE8.4A) and two newer ATA100 disks
(WDC WD300BB-32CCB0, ST340015A). IDE controller: VIA VT82C686B
on a MSI MS-6368L motherboard.

I don't remember seeing anything like that in any earlier 2.4.x
kernels. Is this a known problem? Is this anything dangerous -
should I disable UDMA for now to play it safe?

Thanks,
Marek
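
The status values in the messages above can be read bit by bit. What follows is only a minimal sketch under stated assumptions: the ATA status names follow the wording of the log line itself, the bus-master labels (Simplex, Drive0DmaCapable, and so on) are made up here for illustration, and dma_stat=30 is assumed to be printed in hexadecimal.

```python
# Bit names for the ATA status register, matching the wording in the log line.
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Bits of the bus-master IDE status register, assumed to be what
# "dma status" and "dma_stat" report. The labels are illustrative only.
# Bits 3 and 4 have no assigned meaning in that register.
BM_STATUS_BITS = [
    (0x80, "Simplex"),
    (0x40, "Drive1DmaCapable"),
    (0x20, "Drive0DmaCapable"),
    (0x04, "Interrupt"),
    (0x02, "Error"),
    (0x01, "Active"),
]


def decode(value, table):
    """Return (names of set bits, leftover bits not described by the table)."""
    names = [name for mask, name in table if value & mask]
    known = 0
    for mask, _ in table:
        known |= mask
    return names, value & ~known


if __name__ == "__main__":
    print("status=0x50  ", decode(0x50, ATA_STATUS_BITS))
    print("dma status=0x24", decode(0x24, BM_STATUS_BITS))
    # Assumes the "30" in dma_stat=30 is hexadecimal.
    print("dma_stat=0x30 ", decode(0x30, BM_STATUS_BITS))
```

Read this way, 0x50 is exactly DriveReady plus SeekComplete, so the drive itself reports no error. 0x24 has the interrupt bit set with no transfer active, which fits an interrupt that was raised but never delivered. In 0x30 the 0x10 bit falls outside the table, so the sketch reports it as a leftover instead of guessing at its meaning.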


2003-06-30 22:05:16

by Alan

Subject: Re: 2.4.21 IDE problems (lost interrupt, bad DMA status)

On Mon, 2003-06-30 at 23:15, Marek Michalkiewicz wrote:
> Hi,
>
> After upgrading the kernel from 2.4.20 to 2.4.21, sometimes I see
> the following messages:
>
> hda: dma_timer_expiry: dma status == 0x24
> hda: lost interrupt
> hda: dma_intr: bad DMA status (dma_stat=30)
> hda: dma_intr: status=0x50 { DriveReady SeekComplete }

Does it happen if you disable local apic support ?

2003-06-30 22:24:44

by dth

Subject: Re: 2.4.21 IDE problems (lost interrupt, bad DMA status)

Marek Michalkiewicz wrote:
>I don't remember seeing anything like that in any earlier 2.4.x
>kernels. Is this a known problem? Is this anything dangerous -
>should I disable UDMA for now to play it safe?

afaik this concerns a "lost" interrupt.
Alan Cox's -ac pre-patches (currently ac4) seem to fix it
for a lot of people. The other approach is to disable IO_APIC on
uniprocessors during kernel compile.

Happy compiling ;-)

Danny

--
Miguel | "I can't tell if I have worked all my life or if
de Icaza | I have never worked a single day of my life,"

2003-06-30 22:33:16

by dmeyer

Subject: Re: 2.4.21 IDE problems (lost interrupt, bad DMA status)

In article <[email protected]> you write:
> Hi,
>
> After upgrading the kernel from 2.4.20 to 2.4.21, sometimes I see
> the following messages:
>
> hda: dma_timer_expiry: dma status == 0x24
> hda: lost interrupt
> hda: dma_intr: bad DMA status (dma_stat=30)
> hda: dma_intr: status=0x50 { DriveReady SeekComplete }
>
> It happens especially when there is a lot of disk I/O (which stops
> for a few seconds when these messages appear), with three different
> disks (very unlikely they all decided to die at the same time...),
> one old ATA33 (QUANTUM FIREBALL SE8.4A) and two newer ATA100 disks
> (WDC WD300BB-32CCB0, ST340015A). IDE controller: VIA VT82C686B
> on a MSI MS-6368L motherboard.
>
> I don't remember seeing anything like that in any earlier 2.4.x
> kernels. Is this a known problem? Is this anything dangerous -
> should I disable UDMA for now to play it safe?

I never saw any corruption when I had it. I've seen this with stock
kernels since 2.4.18 or so with ACPI and APIC enabled; with ac kernels
I never get it (I'm suspecting the old ACPI in the stock kernels is
the problem).

So my suggestion is either turn off ACPI and/or APIC, or try
2.4.21-ac.
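
To see which of these options a given kernel was built with, the .config file can be scanned. A minimal sketch, assuming the option names CONFIG_X86_LOCAL_APIC, CONFIG_X86_IO_APIC and CONFIG_ACPI apply to the tree in question; the sample text is made up for illustration, and `enabled_options` is a hypothetical helper.

```python
# Options relevant to this thread. The names are assumed to match the tree
# in use; adjust the tuple if your .config spells them differently.
OPTIONS = ("CONFIG_X86_LOCAL_APIC", "CONFIG_X86_IO_APIC", "CONFIG_ACPI")

# Made-up sample in the usual .config syntax.
SAMPLE_CONFIG = """\
CONFIG_X86_LOCAL_APIC=y
# CONFIG_X86_IO_APIC is not set
CONFIG_ACPI=y
"""


def enabled_options(config_text, names=OPTIONS):
    """Return {option: True/False} for the given option names."""
    enabled = {name: False for name in names}
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" and other comments count as disabled.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key in enabled:
            enabled[key] = value in ("y", "m")
    return enabled


if __name__ == "__main__":
    for name, on in enabled_options(SAMPLE_CONFIG).items():
        print(name, "enabled" if on else "disabled")
```

To check a real tree, pass the contents of its .config file instead of the sample string.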

--
Dave Meyer
[email protected]

2003-07-01 12:59:42

by Edward King

Subject: Re: 2.4.21 IDE problems (lost interrupt, bad DMA status)



Marek Michalkiewicz wrote:

>Hi,
>
>After upgrading the kernel from 2.4.20 to 2.4.21, sometimes I see
>the following messages:
>
>hda: dma_timer_expiry: dma status == 0x24
>hda: lost interrupt
>hda: dma_intr: bad DMA status (dma_stat=30)
>hda: dma_intr: status=0x50 { DriveReady SeekComplete }
>
>It happens especially when there is a lot of disk I/O (which stops
>for a few seconds when these messages appear), with three different
>disks (very unlikely they all decided to die at the same time...),
>
>

Are you using software raid or devfs?

I was losing interrupts, and disabling devfs removed the problem (very
reproducible with software RAID 5; I never really tried much heavy disk
use without RAID).

Edward King


2003-07-01 19:30:57

by Marek Michalkiewicz

Subject: Re: 2.4.21 IDE problems (lost interrupt, bad DMA status)

On Mon, Jun 30, 2003 at 11:16:40PM +0100, Alan Cox wrote:
> On Mon, 2003-06-30 at 23:15, Marek Michalkiewicz wrote:
> >
> > hda: dma_timer_expiry: dma status == 0x24
> > hda: lost interrupt
> > hda: dma_intr: bad DMA status (dma_stat=30)
> > hda: dma_intr: status=0x50 { DriveReady SeekComplete }
>
> Does it happen if you disable local apic support ?

It seems that booting with "noapic" fixes it, or at least now it
is much more difficult to trigger. Still testing...
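
One way to confirm that "noapic" actually took effect is to look at which interrupt controller /proc/interrupts lists for the IDE channels. A minimal sketch, assuming the usual layout of that file (IRQ number, one count per CPU, controller type, device names); the two sample listings are invented for illustration and are not from the machines discussed here.

```python
# Invented samples: the first as it might look with the IO-APIC in use,
# the second after booting with "noapic".
SAMPLE_APIC = """\
           CPU0
  0:     123456    IO-APIC-edge  timer
 14:      54321    IO-APIC-edge  ide0
 15:         42    IO-APIC-edge  ide1
NMI:          0
"""

SAMPLE_NOAPIC = """\
           CPU0
  0:     123456          XT-PIC  timer
 14:      54321          XT-PIC  ide0
 15:         42          XT-PIC  ide1
NMI:          0
"""


def irq_controllers(proc_interrupts_text):
    """Map each device name to the controller type shown on its IRQ line."""
    result = {}
    for line in proc_interrupts_text.splitlines():
        fields = line.split()
        # Data lines start with "<number>:"; skip the CPU header, NMI, etc.
        if len(fields) < 4 or not fields[0].rstrip(":").isdigit():
            continue
        # Skip the per-CPU counters to reach the controller column.
        idx = 1
        while idx < len(fields) and fields[idx].isdigit():
            idx += 1
        if idx >= len(fields):
            continue
        controller = fields[idx]
        # Shared IRQs list several devices separated by commas.
        for dev in " ".join(fields[idx + 1:]).split(","):
            dev = dev.strip()
            if dev:
                result[dev] = controller
    return result


if __name__ == "__main__":
    print(irq_controllers(SAMPLE_APIC)["ide0"])
    print(irq_controllers(SAMPLE_NOAPIC)["ide0"])
```

On a live system the text would come from reading /proc/interrupts; if ide0 still shows an IO-APIC entry, the boot option was not applied.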

Before upgrading to 2.4.21, I've been running 2.4.20 with APIC
enabled for a few months, and there were no such IDE errors.

BTW, "noapic" fixes the "power button not working if ACPI is alone
on its own IRQ" problem (present in both 2.4.20 and 2.4.21) too.

Thanks,
Marek

2003-07-01 19:39:05

by Marek Michalkiewicz

Subject: Re: 2.4.21 IDE problems (lost interrupt, bad DMA status)

On Tue, Jul 01, 2003 at 08:13:34AM -0500, Edward King wrote:
>
> Are you using software raid or devfs?

No devfs. As for RAID - I'm running the same kernel image on two
very similar boxes. RAID is compiled in but not used on one box,
and the other box currently has RAID1 in degraded mode (one disk,
waiting for me to install the second one).

Marek

2003-07-02 09:21:46

by joe briggs

Subject: Re: 2.4.21 IDE problems (lost interrupt, bad DMA status)

Can anyone tell me what the -ac patches do with respect to this problem?
Also, what functionality is lost when CONFIG_X86_IO_APIC is not set, and
should it improve this hd timeout/lost interrupt problem?

Thanks!

On Monday 30 June 2003 06:47 pm, [email protected] wrote:
> In article <[email protected]> you write:
> > Hi,
> >
> > After upgrading the kernel from 2.4.20 to 2.4.21, sometimes I see
> > the following messages:
> >
> > hda: dma_timer_expiry: dma status == 0x24
> > hda: lost interrupt
> > hda: dma_intr: bad DMA status (dma_stat=30)
> > hda: dma_intr: status=0x50 { DriveReady SeekComplete }
> >
> > It happens especially when there is a lot of disk I/O (which stops
> > for a few seconds when these messages appear), with three different
> > disks (very unlikely they all decided to die at the same time...),
> > one old ATA33 (QUANTUM FIREBALL SE8.4A) and two newer ATA100 disks
> > (WDC WD300BB-32CCB0, ST340015A). IDE controller: VIA VT82C686B
> > on a MSI MS-6368L motherboard.
> >
> > I don't remember seeing anything like that in any earlier 2.4.x
> > kernels. Is this a known problem? Is this anything dangerous -
> > should I disable UDMA for now to play it safe?
>
> I never saw any corruption when I had it. I've seen this with stock
> kernels since 2.4.18 or so with ACPI and APIC enabled; with ac kernels
> I never get it (I'm suspecting the old ACPI in the stock kernels is
> the problem).
>
> So my suggestion is either turn off ACPI and/or APIC, or try
> 2.4.21-ac.

--
Joe Briggs
Briggs Media Systems
105 Burnsen Ave.
Manchester NH 01304 USA
TEL 603-232-3115 FAX 603-625-5809 MOBILE 603-493-2386
http://www.briggsmedia.com

2003-07-02 09:47:26

by Herbert Xu

Subject: Re: 2.4.21 IDE problems (lost interrupt, bad DMA status)

joe briggs wrote:
> Can anyone tell me what the -ac patches do with respect to this problem?
> Also, what functionality is lost when CONFIG_X86_IO_APIC is not set, and
> should it improve this hd timeout/lost interrupt problem?

It fixes the problem where interrupts are lost when the relevant IRQ line
is disabled.
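
The general failure mode can be pictured with a toy model. This is only an illustration of an interrupt arriving while its line is disabled; it is not the -ac patch and not kernel code, and the class and method names are hypothetical.

```python
class ToyIrqLine:
    """Toy model of one interrupt line that can be disabled and re-enabled."""

    def __init__(self, replay_pending):
        # replay_pending=True models remembering an interrupt that arrives
        # while the line is disabled and delivering it on re-enable.
        self.replay_pending = replay_pending
        self.enabled = True
        self.pending = False
        self.handled = 0

    def disable(self):
        self.enabled = False

    def raise_irq(self):
        if self.enabled:
            self.handled += 1          # delivered immediately
        elif self.replay_pending:
            self.pending = True        # remembered for later
        # otherwise the interrupt is silently dropped

    def enable(self):
        self.enabled = True
        if self.pending:
            self.pending = False
            self.handled += 1          # replay the remembered interrupt


def interrupts_handled(replay_pending):
    """Raise one interrupt while the line is disabled, then re-enable it."""
    line = ToyIrqLine(replay_pending)
    line.disable()
    line.raise_irq()
    line.enable()
    return line.handled


if __name__ == "__main__":
    print("without replay:", interrupts_handled(False))
    print("with replay:   ", interrupts_handled(True))
```

In the first case the driver would wait for a completion interrupt that never arrives and eventually time out, which is roughly what a "lost interrupt" message suggests.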
--
Debian GNU/Linux 3.0 is out! ( http://www.debian.org/ )
Email: Herbert Xu 许志壬 <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2003-07-21 21:42:52

by Ronald Wahl

Subject: Re: 2.4.21 IDE problems (lost interrupt, bad DMA status)

Herbert Xu wrote:
> joe briggs <[email protected]> wrote:
> > Can anyone tell me what the -ac patches do with respect to this problem?
> > Also, what functionality is lost when CONFIG_X86_IO_APIC is not set, and
> > should it improve this hd timeout/lost interrupt problem?

> It fixes the problem where interrupts are lost when the relevant IRQ line
> is disabled.

I have 3 questions regarding this issue:

1. Can you explain the problem in a little more detail?

2. Is there a dedicated patch solving this issue? (I don't want to
apply the complete -ac patch )

3. Will this patch be in 2.4.22?


Thx & regards,
ron

PS: Sorry if this mail is not part of the original thread. I'm not on the
list and read about the problem in a mailing list archive.