2005-11-28 05:02:46

by gboyce

Subject: IDE + CPU Scaling problem on Via EPIA systems

Folks,

While attempting to setup a new server using a Via EPIA-M motherboard
(Nehemiah processor), I encountered a problem when I enabled CPU Frequency
scaling via the powernowd daemon.

The system is using software RAID 1 on a pair of IDE drives. Everything
goes fine until powernowd is enabled. At that point I get the following
errors:

[4294871.703000] hda: dma_timer_expiry: dma status == 0x20
[4294871.703000] hda: DMA timeout retry
[4294871.703000] hda: timeout waiting for DMA
[4294871.703000] hda: status error: status=0x58 { DriveReady SeekComplete
DataRequest }
[4294871.703000]
[4294871.703000] ide: failed opcode was: unknown
[4294871.703000] hda: drive not ready for command
[4294958.589000] hda: dma_timer_expiry: dma status == 0x20
[4294958.589000] hda: DMA timeout retry
[4294958.589000] hda: timeout waiting for DMA
[4294958.589000] hda: status error: status=0x58 { DriveReady SeekComplete
DataRequest }
[4294958.589000]
[4294958.589000] ide: failed opcode was: unknown
[4294958.589000] hda: drive not ready for command

I ran the system for an extended time previously with no problems, but at
that point I was not using powernowd or RAID.

The system is running Ubuntu, but I also came across someone else hitting
the same problem on an older Via EPIA system running SuSE. The Ubuntu
kernel I'm using is 2.6.12-10-386.

I've attached the full dmesg output as well.

For now I'm just going to turn off CPU scaling. Someday I'd like to use
it, though. Any suggestions for things I can try? Or is this perhaps a
known issue of some sort?

--
Greg


Attachments:
dmesg.txt (15.51 kB)

2005-11-28 06:32:15

by Dave Jones

Subject: Re: IDE + CPU Scaling problem on Via EPIA systems

On Mon, Nov 28, 2005 at 12:02:32AM -0500, gboyce wrote:
> Folks,
>
> While attempting to setup a new server using a Via EPIA-M motherboard
> (Nehemiah processor), I encountered a problem when I enabled CPU Frequency
> scaling via the powernowd daemon.
>
> The system is using software RAID 1 on a pair of IDE drives. Everything
> goes fine until powernowd is enabled. At that point I get the following
> errors:
>
> [4294871.703000] hda: dma_timer_expiry: dma status == 0x20
> [4294871.703000] hda: DMA timeout retry
> [4294871.703000] hda: timeout waiting for DMA
> [4294871.703000] hda: status error: status=0x58 { DriveReady SeekComplete
> DataRequest }
> [4294871.703000]
> [4294871.703000] ide: failed opcode was: unknown
> [4294871.703000] hda: drive not ready for command
> [4294958.589000] hda: dma_timer_expiry: dma status == 0x20
> [4294958.589000] hda: DMA timeout retry
> [4294958.589000] hda: timeout waiting for DMA
> [4294958.589000] hda: status error: status=0x58 { DriveReady SeekComplete
> DataRequest }
> [4294958.589000]
> [4294958.589000] ide: failed opcode was: unknown
> [4294958.589000] hda: drive not ready for command
>
> I ran the system for an extended time previously with no problems, but at
> that point I was not using powernowd or RAID.
>
> The system is running Ubuntu, but I also came across someone else hitting
> the same problem on an older Via EPIA system running SuSE. The Ubuntu
> kernel I'm using is 2.6.12-10-386.
>
> I've attached the full dmesg output as well.
>
> For now I'm just going to turn off CPU scaling. Someday I'd like to use
> it, though. Any suggestions for things I can try? Or is this perhaps a
> known issue of some sort?

On some variants of the VIA C3, we need to quiesce all DMA operations
before we do a speed transition. We currently don't do that.
I do have a patch from someone which adds support in the longhaul
driver to wait for IDE transactions to stop, but to do it cleanly,
we really need some callbacks into the IDE layer.
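
Just to sketch the shape I have in mind (purely hypothetical; no such
hooks exist in the IDE layer today):

struct ide_dma_gate {
        int  (*quiesce)(void);  /* drain in-flight DMA, block new requests */
        void (*release)(void);  /* let queued requests run again */
};

/* hypothetical hook the IDE core would fill in at init time */
static struct ide_dma_gate *ide_gate;

static void longhaul_transition(unsigned int clock_ratio_index)
{
        if (ide_gate && ide_gate->quiesce())
                return;         /* couldn't drain safely, skip this transition */

        /* ... existing longhaul MSR programming goes here ... */

        if (ide_gate)
                ide_gate->release();
}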

The longhaul driver right now is pretty much a case of
"if it works for you great, if not, don't use it".

Dave

2005-12-06 15:35:50

by Alan

Subject: Re: IDE + CPU Scaling problem on Via EPIA systems

On Llu, 2005-11-28 at 01:32 -0500, Dave Jones wrote:
> On some variants of the VIA C3, we need to quiesce all DMA operations
> before we do a speed transition. We currently don't do that.
> I do have a patch from someone which adds support in the longhaul
> driver to wait for IDE transactions to stop, but to do it cleanly,
> we really need some callbacks into the IDE layer.

I was under the impression you could turn the IO/MEM enable on the root
bridge off momentarily to get the needed DMA pause safely? Or does it
abort rather than retry at that point?
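
Untested sketch of what I mean, assuming the root bridge sits at 00:00.0
and using the 2.6-era pci_find_slot() lookup:

#include <linux/pci.h>

/*
 * Clear the I/O and memory space enables on the host bridge around the
 * transition so bus-master cycles stall.  The open question is whether
 * the bridge retries or aborts those cycles while the bits are off.
 */
static void transition_with_root_bridge_pause(void (*transition)(void))
{
        struct pci_dev *bridge = pci_find_slot(0, PCI_DEVFN(0, 0));
        u16 cmd;

        if (!bridge) {
                transition();
                return;
        }

        pci_read_config_word(bridge, PCI_COMMAND, &cmd);
        pci_write_config_word(bridge, PCI_COMMAND,
                              cmd & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY));

        transition();           /* the longhaul MSR write would go here */

        pci_write_config_word(bridge, PCI_COMMAND, cmd);
}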

2005-12-06 17:05:38

by Dave Jones

Subject: Re: IDE + CPU Scaling problem on Via EPIA systems

On Sun, Dec 04, 2005 at 04:22:03PM +0000, Alan Cox wrote:
> On Llu, 2005-11-28 at 01:32 -0500, Dave Jones wrote:
> > On some variants of the VIA C3, we need to quiesce all DMA operations
> > before we do a speed transition. We currently don't do that.
> > I do have a patch from someone which adds support in the longhaul
> > driver to wait for IDE transactions to stop, but to do it cleanly,
> > we really need some callbacks into the IDE layer.
>
> I was under the impression you could turn the IO/MEM enable on the root
> bridge off momentarily to get the needed DMA pause safely? Or does it
> abort rather than retry at that point?

I haven't tried that. Is that a safe thing to do?
We do now disable bus mastering on all PCI devices around the transition,
as mandated in the spec, but that didn't really do much either.
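
(Roughly speaking, that bus-master disable is just a walk over every PCI
device clearing the master bit in its command word. Illustrative sketch
only, not necessarily the literal longhaul code:)

#include <linux/pci.h>

/* Illustrative only: disable bus mastering on every PCI device before
 * the transition.  A real implementation would save each command word
 * and restore it afterwards rather than blindly re-enabling mastering. */
static void disable_all_bus_mastering(void)
{
        struct pci_dev *dev = NULL;
        u16 cmd;

        while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
                pci_read_config_word(dev, PCI_COMMAND, &cmd);
                pci_write_config_word(dev, PCI_COMMAND,
                                      cmd & ~PCI_COMMAND_MASTER);
        }
}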

Here's the patch I got from one user, which 'worked'...


Subject: [CPUFREQ] longhaul - avoid ide timeouts when bus mastering is off.

I really don't like this. Really.
Unfortunately it's necessary; otherwise, after the transition we live
for just a few minutes and then lock up. In an ideal world,
the PM layer would allow some means of walking the device tree
and quiescing DMA. Sadly, we don't live in an ideal world.

I'm open to suggestions on better ways to do this, as I'd rather
not push this to Linus' tree.

From patch by: Ken Staton <[email protected]>
Signed-off-by: Dave Jones <[email protected]>

--- linux-2.6.11/arch/i386/kernel/cpu/cpufreq/longhaul.c~ 2005-05-24 01:51:51.000000000 -0400
+++ linux-2.6.11/arch/i386/kernel/cpu/cpufreq/longhaul.c 2005-05-24 01:52:07.000000000 -0400
@@ -30,6 +30,8 @@
 #include <linux/slab.h>
 #include <linux/string.h>
 #include <linux/pci.h>
+#include <linux/ide.h>
+#include <linux/delay.h>
 
 #include <asm/msr.h>
 #include <asm/timex.h>
@@ -91,6 +91,25 @@ static char *print_speed(int speed)
 }
 #endif
 
+static void ide_idle(void)
+ {
+        int i;
+        ide_hwif_t *hwif = ide_hwifs;
+        ide_drive_t *drive;
+
+        i = 0;
+        do {
+                drive = &hwif->drives[i];
+                i++;
+                if (strncmp(drive->name,"hd",2) == 0) {
+                        while (drive->waiting_for_dma)
+                                udelay(10);
+                } else {
+                        i = 0;
+                }
+        } while (i != 0);
+}
+
 
 static unsigned int calc_speed(int mult)
 {
@@ -146,6 +165,7 @@ static void do_powersaver(union msr_long
         longhaul->bits.RevisionKey = 0;
 
         preempt_disable();
+        ide_idle();     /* avoid ide timeouts when bus master off */
         local_irq_save(flags);
 
         /*

Whilst it may work fine, and 99.9% of VIA systems likely use IDE, it does
nothing about SCSI, so it's not quite perfect.

That said, even 99% coverage is better than the 0% coverage we currently have.

It also seems quite racy, as it does nothing to prevent new transactions
from beginning after the drive has become idle.
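
One rough, untested idea for closing that window, leaning on the current
IDE/block internals (in 2.6.12 the IDE request queues use ide_lock as
their queue lock, if I remember right): stop each drive's request queue
before waiting for the DMA flag to drop, and restart it afterwards.
Sketch only:

#include <linux/ide.h>
#include <linux/blkdev.h>
#include <linux/delay.h>
#include <linux/spinlock.h>

/* Untested: stop the queue first so no new commands get dispatched,
 * then drain whatever DMA is already in flight. */
static void ide_drive_quiesce(ide_drive_t *drive)
{
        unsigned long flags;

        spin_lock_irqsave(&ide_lock, flags);
        blk_stop_queue(drive->queue);   /* queue lock must be held */
        spin_unlock_irqrestore(&ide_lock, flags);

        while (drive->waiting_for_dma)
                udelay(10);
}

static void ide_drive_release(ide_drive_t *drive)
{
        unsigned long flags;

        spin_lock_irqsave(&ide_lock, flags);
        blk_start_queue(drive->queue);
        spin_unlock_irqrestore(&ide_lock, flags);
}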

Dave