2003-02-17 22:04:44

by Simon Kirby

Subject: [2.4.21-pre4] IDE hangs box after timeout

Hello,

I don't think this happened on older kernels (< 2.4.18ish), but it may
have happened on 2.4.20 (though I have other problems with 2.4.20 on this
box that make testing more difficult -- it tends to Oops fairly often).

Anyway, this box has a massive collection of old (and new) drives to make
a large storage area, using MD linear. The box has two SCSI cards, two
Promise cards (PDC20269), and onboard IDE (PIIX4). Because the box has
so many drives, I had to use a number of power splitters which are, of
course, cheap and thus unreliable, and occasionally a few drives will
fall off of the bus. This is the real problem, yes, but it seems to be
triggering a lockup bug in 2.4.21-pre4. When hda falls off the bus due
to power loss, I see this on the console:

hda: dma_timer_expiry: dma status == 0x21
hda: timeout waiting for DMA
hda: timeout waiting for DMA
hda: (__ide_dma_test_irq) called while not waiting

...followed by a complete lockup where SysRq does not appear to work.
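For anyone decoding the first console line: assuming the value printed by dma_timer_expiry is the raw bus-master IDE status register (the SFF-8038i layout that PIIX4 and Promise controllers follow), it is a bitfield. The helper below is a hypothetical sketch (`decode_bmide_status` and `BMIDE_STATUS_BITS` are illustrative names, not kernel code) that lists which flags are set in such a byte.

```python
from __future__ import annotations

# Hypothetical helper: decode an SFF-8038i bus-master IDE status byte.
# Assumption: the "dma status == 0x.." value in the log is this register.
# Bits 3 and 4 are reserved and therefore not named here.
BMIDE_STATUS_BITS = {
    0x01: "DMA active",
    0x02: "DMA error",
    0x04: "interrupt",
    0x20: "drive 0 DMA capable",
    0x40: "drive 1 DMA capable",
    0x80: "simplex only",
}


def decode_bmide_status(status: int) -> list[str]:
    """Return the names of the flags set in an 8-bit status value, lowest bit first."""
    if not 0 <= status <= 0xFF:
        raise ValueError("status must fit in one byte")
    return [name for bit, name in sorted(BMIDE_STATUS_BITS.items()) if status & bit]


if __name__ == "__main__":
    print(decode_bmide_status(0x21))
```

Read this way, 0x21 would mean the controller still considered the transfer in progress, with neither the interrupt nor the error bit set, which fits a drive that lost power mid-transfer.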

dmesg and config available here:

http://blue.netnation.com/sim/ref/alfie.dmesg
http://blue.netnation.com/sim/ref/alfie.config

( Yes, a new power supply is on order. :) )

Simon-

[ Simon Kirby ][ Network Operations ]
[ [email protected] ][ NetNation Communications ]
[ Opinions expressed are not necessarily those of my employer. ]


2003-02-18 12:15:38

by Robbert Kouprie

Subject: Re: [2.4.21-pre4] IDE hangs box after timeout


Hi,

Simon Kirby wrote:

> I don't think this happened on older kernels (< 2.4.18ish), but it may
> have happened on 2.4.20 (though I have other problems with 2.4.20 on
> this box that make testing more difficult -- it tends to Oops fairly
> often).

I encountered the same problem on a system with just one PDC20269, 2
drives attached to it, and 2 drives attached onboard. This system has an
Enermax 430W power supply, and I would think that is enough for a Pentium
3, 3 PCI cards and 4 disks.

Kernels older than 2.4.18 don't have LBA48 support, which would restrict
the use of the 20269. I also encountered the problem with kernel 2.4.17 +
Andre Hedrick's IDE patch, though, as well as with 2.4.18/19 and various
2.4.20-pre and -ac versions up to 2.4.20-rc1-ac4. Testing 2.4.21-pre4-ac4 now.

> I had to use a number of power splitters which are, of course, cheap
> and thus unreliable, and occasionally a few drives will fall off of
> the bus.

I don't use power splitters as this power supply has enough connectors.

> hda: dma_timer_expiry: dma status == 0x21
> hda: timeout waiting for DMA
> hda: timeout waiting for DMA
> hda: (__ide_dma_test_irq) called while not waiting

I see the exact same message.

> ...followed by a complete lockup where SysRq does not appear to work.

For me, the system also locks up completely when there are two disks
connected to the 20269, one on each channel (note that it's always a drive
on the 20269 which drops dead). When you make sure there's only *one* disk
connected to the 20269 (one disk in total, not one on each channel) and
this disk drops dead, only the disk and controller are dead, and the
system continues to run. I imagine the 20269 locks up the PCI bus when
more than one drive is connected to it and one of the drives drops dead.

I'm not sure if we should call this a kernel bug though.

> ( Yes, a new power supply is on order. :) )

I hope this will solve the problem for you. For me it didn't :(

Regards,
- Robbert Kouprie

2003-02-18 20:36:27

by Simon Kirby

Subject: Re: [2.4.21-pre4] IDE hangs box after timeout

On Tue, Feb 18, 2003 at 01:25:40PM +0100, Robbert Kouprie wrote:

> I encountered the same problem on a system with just one PDC20269, 2
> drives attached to it, and 2 drives attached onboard. This system has an
> Enermax 430W power supply, and I would think that is enough for a Pentium
> 3, 3 PCI cards and 4 disks.
>
> Kernels older than 2.4.18 don't have LBA48 support, which would restrict
> the use of the 20269. I also encountered the problem with kernel 2.4.17 +
> Andre Hedrick's IDE patch, though, as well as with 2.4.18/19 and various
> 2.4.20-pre and -ac versions up to 2.4.20-rc1-ac4. Testing 2.4.21-pre4-ac4 now.
>
> > I had to use a number of power splitters which are, of course, cheap
> > and thus unreliable, and occasionally a few drives will fall off of
> > the bus.
>
> I don't use power splitters as this power supply has enough connectors.

Well, I just threw an Enermax EG465P in there along with a dual 350W
redundant supply, and I still had to use about four splitters. O:)
The box has 20 drives and about 10 80mm fans in it, so splitters
were definitely a problem for me.

> > hda: dma_timer_expiry: dma status == 0x21
> > hda: timeout waiting for DMA
> > hda: timeout waiting for DMA
> > hda: (__ide_dma_test_irq) called while not waiting
>
> I see the exact same message.

Every time this message occurred, a reboot would show one or more drives
missing in the BIOS scan. I would open the case, jiggle some wires,
reboot again, and the drives would be back. Since I've replaced the
supply so that I have many more connectors (and four splitters), it seems
to be reliable now (though it needs to run a bit longer to be sure).

Anyway, it seemed to me like the problem was triggered by hardware
issues, but maybe not. Either way, the failure should be handled
gracefully, and currently it doesn't seem to be.

Simon-

[ Simon Kirby ][ Network Operations ]
[ [email protected] ][ NetNation Communications ]
[ Opinions expressed are not necessarily those of my employer. ]

2003-02-18 23:49:15

by Edward King

Subject: Re: [2.4.21-pre4] IDE hangs box after timeout

Simon Kirby wrote:

>On Tue, Feb 18, 2003 at 01:25:40PM +0100, Robbert Kouprie wrote:
>
>>Kernels older than 2.4.18 don't have LBA48 support, which would restrict
>>the use of the 20269. I also encountered the problem with kernel 2.4.17 +
>>Andre Hedrick's IDE patch, though, as well as with 2.4.18/19 and various
>>2.4.20-pre and -ac versions up to 2.4.20-rc1-ac4. Testing 2.4.21-pre4-ac4 now.

Just wanted to jump in -- I have the same problem with 2.4.21-pre4-ac4.
I've got 2 PDC20268s running 4 WD 200GB drives.

Power supply is a 550-watt Antec.

I have a separate 13GB drive that I'm booting from on the motherboard,
so the system doesn't lock up, and reboots bring everything back fine.

The BIOS on my PDCs is 2.20.0.14, and I tried different cables (same
problem). The drives do seem to come back -- it acts as if it has a hard
time resetting the drives but finally recovers (after about 5-10 minutes).

Regards,

- Edward King