2006-10-18 06:44:11

by Daniel Mierswa

Subject: ASUS M2NPV-VM APIC/ACPI Bug (patched)

Some people have serious problems with the Asus M2NPV-VM mainboard
(or rather, with its chipset).
A Google search for "Asus M2NPV-VM apic" shows that. I'm one of them,
desperately searching for a way to fix it, using that board with an AMD
Athlon64 X2 3800+ dual-core processor.
It wouldn't boot because of APIC and ACPI errors. There were "kind of"
workarounds: passing acpi=off (or acpi=noirq) and noapic to the kernel,
which sometimes resulted in a bad internal clock. I had the same
problem myself, and because of the broken system clock all
applications and drivers went haywire, including
sound, video, graphics, USB, etc. I googled around and found the following:
http://lkml.org/lkml/2006/8/13/25
That was actually a patch created for the 2.6.18-rc4 kernel. I tried
several kernels, all with the same results, among them
2.6.18-mm3, 2.6.19-rc2, 2.6.17, 2.6.18, 2.6.18.1, some Gentoo-patched
sources and whatnot. All of them hang after the I/O scheduler is loaded;
passing acpi=off/noirq to the kernel works around that one. Then it
boots further and reaches the ohci_hcd driver, which will not load
because of shared IRQ problems; passing nousb to the kernel works
around that. It then boots further and gets to the DHCP client, where it
fails because of an interrupt error.
Some people who passed noapic acpi=off/noirq to the kernel later got sound
problems; they fixed those by passing "snd-hda-intel model=3stack
position_fix=1", which worked around that interrupt problem. With the
patch provided at http://lkml.org/lkml/2006/8/13/25, it all works out.
The internal system clock works just fine, the drivers all load
fine, and there is no need to patch sound, graphics or anything else. No
kernel parameters are needed either. Here's the patch again, created with
diff -ur against the current 2.6.18.1 kernel:

--- io_apic.c.orig 2006-10-18 08:02:50.000000000 +0200
+++ io_apic.c 2006-10-18 07:40:48.000000000 +0200
@@ -337,12 +337,12 @@
nvidia_hpet_detected = 0;
acpi_table_parse(ACPI_HPET,
nvidia_hpet_check);
- if (nvidia_hpet_detected == 0) {
+/* if (nvidia_hpet_detected == 0) {
acpi_skip_timer_override = 1;
printk(KERN_INFO "Nvidia board "
"detected. Ignoring ACPI "
"timer override.\n");
- }
+ }*/
#endif
/* RED-PEN skip them on mptables too? */
return;
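
For illustration, the decision made by the hunk above can be modelled in plain user-space C. This is a minimal sketch, not the kernel code: the function names are hypothetical, and it assumes (following the printk text and the reply below about PCI_VENDOR_ID_NVIDIA) that the block only runs once an NVIDIA PCI vendor ID has been found. Note that the patch comments out the whole if-block, including the acpi_skip_timer_override assignment, so the patched kernel never ignores the BIOS timer override.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the NVIDIA timer-override quirk in
 * the hunk above; these are not kernel functions. */

/* Unpatched behaviour: on an NVIDIA board where no ACPI HPET table was
 * found, set acpi_skip_timer_override (i.e. ignore the BIOS override). */
bool skip_timer_override_orig(bool nvidia_board, bool hpet_table_found)
{
    return nvidia_board && !hpet_table_found;
}

/* Patched behaviour: the whole if-block, assignment included, is
 * commented out, so the override is never ignored. */
bool skip_timer_override_patched(bool nvidia_board, bool hpet_table_found)
{
    (void)nvidia_board;      /* inputs no longer influence the result */
    (void)hpet_table_found;
    return false;
}
```

With the original logic, an NVIDIA board without an ACPI HPET table always lands in the skip branch; with the patch, no board does, which also means the patch as posted drops the workaround for the nForce2 boards it was originally written for.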

Is there any chance of getting this fixed in upcoming kernel versions?
Greets impulze


2006-10-18 07:28:44

by Brown, Len

Subject: Re: ASUS M2NPV-VM APIC/ACPI Bug (patched)

On Wednesday 18 October 2006 02:44, Daniel Mierswa wrote:
> --- io_apic.c.orig 2006-10-18 08:02:50.000000000 +0200
> +++ io_apic.c 2006-10-18 07:40:48.000000000 +0200
> @@ -337,12 +337,12 @@
> nvidia_hpet_detected = 0;
> acpi_table_parse(ACPI_HPET,
> nvidia_hpet_check);
> - if (nvidia_hpet_detected == 0) {
> +/* if (nvidia_hpet_detected == 0) {
> acpi_skip_timer_override = 1;
> printk(KERN_INFO "Nvidia board "
> "detected. Ignoring ACPI "
> "timer override.\n");
> - }
> + }*/
> #endif

I recall quite clearly that Nvidia told us the acpi_skip_timer_override
quirk was necessary in NFORCE2 days. I don't remember the HPET qualification to
that statement -- I guess that came later.
Unfortunately, my NFORCE2 board is dead, so I can't really test this out directly.

Perhaps checking for PCI_VENDOR_ID_NVIDIA is too broad, and the workaround
is counterproductive on their newer NVIDIA chipsets?

-Len

ps.
One (other) problem with this code is that it checks for an HPET table,
but doesn't check that the kernel has HPET support enabled.

2006-10-19 00:43:19

by Robert Hancock

Subject: Re: ASUS M2NPV-VM APIC/ACPI Bug (patched)

Len Brown wrote:
> I recall quite clearly that Nvidia told us that that acpi_skip_timer_override
> was necessary in NFORCE2 days. I don't remember the HPET qualification to
> that statement -- I guess that came later.
> Unfortunately, my NFORCE2 board is dead, so I can't really test this out directly.
>
> Perhaps checking for PCI_VENDOR_ID_NVIDIA is too broad and the workaround
> is counter-productive on their newer NVIDIA chip-sets?
>
> -Len
>
> ps.
> One (other) problem with this code is that it checks for an HPET table,
> but doesn't check that the kernel has HPET support enabled.

I think the intent of the HPET check was that the quirk wasn't needed on
chipsets new enough to have an HPET. Unfortunately, even if the chipset
has an HPET it isn't always enabled by the BIOS.

Clearly this quirk is too broad; it should probably trigger only on
known chipset revisions with the bad timer overrides, not on all
NVIDIA chipsets. What I am wondering is how these boards manage to work
fine in Windows (presumably) without any such chipset-specific tweaks.

--
Robert Hancock Saskatoon, SK, Canada
Home Page: http://www.roberthancock.com/

2006-10-19 12:41:21

by Andi Kleen

Subject: Re: ASUS M2NPV-VM APIC/ACPI Bug (patched)

On Wednesday 18 October 2006 09:30, Len Brown wrote:
> I recall quite clearly that Nvidia told us that that acpi_skip_timer_override
> was necessary in NFORCE2 days. I don't remember the HPET qualification to
> that statement -- I guess that came later.
> Unfortunately, my NFORCE2 board is dead, so I can't really test this out directly.
>
> Perhaps checking for PCI_VENDOR_ID_NVIDIA is too broad and the workaround
> is counter-productive on their newer NVIDIA chip-sets?

I suppose Asus just "forgot" again to enable the HPET in their NF5 BIOS.
In general they seem to hate ACPI tables -- nearly all their MCFG tables are broken
too. Maybe we need to define the ASUS subset of ACPI (just kidding) @)

Anyway, I suppose we'll need a list of all unique PCI IDs for NF3/NF4 to key this
workaround on. Andy, do you have a complete list?

>ps.
>One (other) problem with this code is that it checks for an HPET table,
>but doesn't check that the kernel has HPET support enabled.

Keying on PCI-IDs would fix that too.

-Andi

2006-10-19 12:44:59

by Andi Kleen

Subject: Re: ASUS M2NPV-VM APIC/ACPI Bug (patched)


> I think the intent of the HPET check was that the quirk wasn't needed on
> chipsets new enough to have an HPET.

Yes.

> Unfortunately, even if the chipset
> has an HPET it isn't always enabled by the BIOS.

It was supposed to be correct in the NF5 reference BIOS, but somehow Asus
must have managed to break the reference BIOS.

> Clearly this quirk is too broad, it should likely be only triggering on
> known chipset revisions with the bad timer overrides and not on all
> NVIDIA chipsets.

That was impossible at the point where it was implemented.

> What I am wondering is how these boards manage to work
> fine in Windows, (presumably) without any such chipset-specific tweaks..

They use the RTC interrupt for timing instead, AFAIK, so a broken interrupt 0
won't affect them. That's probably why we have so many problems with
interrupt 0 on cheap systems.

BTW, I once tried to use it in Linux too, but unfortunately it cannot generate
any of the standard Linux timer frequencies.
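
The RTC point can be checked with a little arithmetic. A minimal sketch, assuming a standard MC146818-compatible RTC with a 32.768 kHz time base: its periodic interrupt is selected by a 4-bit rate value n, and for n = 3..15 the rate is 32768 >> (n - 1) Hz (8192 Hz down to 2 Hz), while n = 1 and n = 2 give 256 Hz and 128 Hz. Every available rate is therefore a power of two. The helper names are hypothetical.

```c
#include <stdbool.h>

/* Periodic interrupt rate in Hz of an MC146818-style RTC (32.768 kHz
 * base) for rate-select value n; 0 means disabled or out of range. */
unsigned rtc_periodic_hz(unsigned n)
{
    if (n == 0 || n > 15)
        return 0;
    if (n == 1)
        return 256;              /* special case in the rate table */
    if (n == 2)
        return 128;              /* special case in the rate table */
    return 32768u >> (n - 1);    /* n = 3 -> 8192 Hz ... n = 15 -> 2 Hz */
}

/* True if some rate-select value yields exactly `hz` interrupts/second. */
bool rtc_can_generate(unsigned hz)
{
    for (unsigned n = 1; n <= 15; n++)
        if (rtc_periodic_hz(n) == hz)
            return true;
    return false;
}
```

Common CONFIG_HZ choices such as 100, 250 and 1000 are not powers of two, so rtc_can_generate() rejects all of them; the closest the RTC gets to 1000 Hz is 1024 Hz.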

-Andi

2006-10-19 15:52:24

by Allen Martin

Subject: RE: ASUS M2NPV-VM APIC/ACPI Bug (patched)

> I think the intent of the HPET check was that the quirk
> wasn't needed on chipsets new enough to have an HPET.
> Unfortunately, even if the chipset has an HPET it isn't
> always enabled by the BIOS.
>
> Clearly this quirk is too broad, it should likely be only
> triggering on known chipset revisions with the bad timer
> overrides and not on all NVIDIA chipsets. What I am wondering
> is how these boards manage to work fine in Windows,
> (presumably) without any such chipset-specific tweaks..

The problem is that this workaround doesn't fix a chipset issue; it fixes
incorrect entries in the BIOS ACPI tables. This bug existed in the
NVIDIA reference BIOS for nForce2 and got copied to all customer BIOSes
for nForce2. Even though our reference BIOSes and documentation for all
chipsets since then have the correct interrupt overrides in the ACPI
tables, we still see customer BIOSes that get shipped with incorrect
entries that were probably copied from their nForce2 BIOS code.
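
A minimal sketch of what ignoring that bad entry means in practice. It assumes, as in kernels of this era, that the nForce2 bug is an ACPI MADT interrupt source override redirecting ISA IRQ 0 to global interrupt (I/O APIC pin) 2, and that acpi_skip_timer_override makes the kernel drop exactly that IRQ0 -> pin 2 entry, so the timer keeps the default identity mapping to pin 0. The structure and function names are hypothetical, not the kernel's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one ACPI MADT Interrupt Source Override entry. */
struct int_src_override {
    unsigned bus_irq;     /* ISA IRQ being redirected */
    unsigned global_irq;  /* global system interrupt (I/O APIC pin) */
};

/* Map an ISA IRQ to an I/O APIC pin. Without a matching override the
 * mapping is identity. With skip_timer_override set, an IRQ0 -> pin 2
 * entry is ignored, mirroring what acpi_skip_timer_override does. */
unsigned isa_irq_to_gsi(unsigned irq, const struct int_src_override *ovr,
                        size_t count, bool skip_timer_override)
{
    for (size_t i = 0; i < count; i++) {
        if (ovr[i].bus_irq != irq)
            continue;
        if (skip_timer_override && ovr[i].bus_irq == 0 &&
            ovr[i].global_irq == 2)
            continue;             /* drop the suspect timer override */
        return ovr[i].global_irq;
    }
    return irq;                   /* default identity mapping */
}
```

Whether dropping the entry helps or hurts depends on how the board's timer is really wired, which is why the discussion keeps returning to which boards the quirk should apply to.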

I believe the HPET check was because the workaround was causing problems
when enabling HPET on systems that support it. Andy probably has more
details on that.

-Allen
-----------------------------------------------------------------------------------
This email message is for the sole use of the intended recipient(s) and may contain
confidential information. Any unauthorized review, use, disclosure or distribution
is prohibited. If you are not the intended recipient, please contact the sender by
reply email and destroy all copies of the original message.
-----------------------------------------------------------------------------------

2006-10-19 16:12:33

by Andi Kleen

Subject: Re: ASUS M2NPV-VM APIC/ACPI Bug (patched)


> The problem is this workaround doesn't fix a chipset issue, it fixes
> incorrect entries in the BIOS ACPI tables. This bug existed in the
> NVIDIA reference BIOS for nForce2 and got copied to all customer BIOSes
> for nForce2. Even though our reference BIOSes and documentation for all
> chipsets since then have the correct interrupt overrides in the ACPI
> tables we still see customer BIOSes that get shipped with incorrect
> entries that were probably copied from their nForce2 BIOS code.

Ah, my understanding was that it applied to NF3 and possibly NF4 too. Does it
not?

> I believe the HPET check was because the workaround was causing problems
> when enabling HPET on systems that support it. Andy probably has more
> details on that.

Yes, it was because NF5 needed it to be disabled. Anyway, if I can
get a list of PCI IDs of chipsets where the reference BIOS had this
issue, it can be narrowed to those.

-Andi

2006-10-20 01:08:34

by Allen Martin

Subject: RE: ASUS M2NPV-VM APIC/ACPI Bug (patched)

> Ah my understanding was that it applied to NF3 and possible
> NF4 too. Does it not?
>
> > I believe the HPET check was because the workaround was
> > causing problems when enabling HPET on systems that support
> > it. Andy probably has more details on that.
>
> Yes it was because NF5 needed it to be disabled. Anyways if I can
> get a list of PCI-IDs of chipsets where the reference BIOS had this
> issue it can be narrowed to those.

Well that's the problem. The issue only existed in the nForce2
reference BIOS (and maybe early in nForce3) but we still occasionally
see shipping customer BIOSes to this day that have this same bug for
nForce5 (like M2NPV referenced in this thread).

Probably what ASUS is doing in the M2NPV BIOS is copying the ACPI tables
from an earlier nForce2 product.

Probably what needs to happen is to make the HPET check more robust and
only return 1 if HPET is present and enabled.

2006-10-20 13:05:06

by Andi Kleen

Subject: Re: ASUS M2NPV-VM APIC/ACPI Bug (patched)


> Well that's the problem. The issue only existed in the nForce2
> reference BIOS (and maybe early in nForce3) but we still occasionally

Definitely some NF3 too; I've seen it on 64-bit boxes.

> see shipping customer BIOSes to this day that have this same bug for
> nForce5 (like M2NPV referenced in this thread).
>
> Probably what ASUS is doing in the M2NPV BIOS is copying the ACPI tables
> from an earlier nForce2 product.

But is the timer override correct, or is it still broken?

> Probably what needs to happen is to make the HPET check more robust and
> only return 1 if HPET is present and enabled.

I think the problem is that those Asus boards also don't have an HPET
table. So even though NF5 has an HPET, the kernel doesn't know about it,
and the heuristic "if HPET then NF5 and timer override ok" breaks.

I still suspect doing a
"if (PCI ID from NF2 or NF3) ignore timer override"
is probably the best solution right now. But I don't have a full
list of PCI-IDs for NF2/NF3. Do you have one?

OK, that might still break NF4. I assume it never needs any
timer overrides, so it might be safe to include its PCI IDs in the list
too.
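
The proposal amounts to a table lookup. A minimal sketch, assuming the list of affected chipsets is supplied by the caller: the device IDs for such a table are exactly what the thread is still missing, so none are filled in here. 0x10de is NVIDIA's PCI vendor ID; the struct and function names are hypothetical, and in the kernel this would presumably be a struct pci_device_id table instead.

```c
#include <stdbool.h>
#include <stddef.h>

#define NVIDIA_VENDOR_ID 0x10de   /* PCI vendor ID assigned to NVIDIA */

/* One vendor/device pair; a simplified stand-in for pci_device_id. */
struct pci_id {
    unsigned short vendor;
    unsigned short device;
};

/* True if (vendor, device) is in the caller-supplied list of chipsets
 * whose BIOS tables are known to carry the bad IRQ0 timer override. */
bool needs_timer_override_quirk(unsigned short vendor, unsigned short device,
                                const struct pci_id *list, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (list[i].vendor == vendor && list[i].device == device)
            return true;
    return false;
}
```

An NF5 board such as the M2NPV-VM would simply be absent from the list, so the quirk would not fire whether or not an HPET table exists; that is the sense in which keying on PCI IDs also addresses the earlier concern about HPET support.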

Or do you have a better proposal?

-Andi

2006-11-01 14:22:01

by Daniel Mierswa

Subject: Re: ASUS M2NPV-VM APIC/ACPI Bug (patched)

Anyway, I chatted around the globe, and someone also mentioned that my
IRQs for sound and several other devices are very high. I'm not sure if this is
a board issue or a kernel issue. But since the onboard sound chip (HDA
Intel) is having problems too, I guess it's kernel-related. I
wonder if this will be fixed in newer versions.

2006-11-01 14:27:15

by Arjan van de Ven

Subject: Re: ASUS M2NPV-VM APIC/ACPI Bug (patched)


> Anyway i chatted around the globus and someone also mentioned that my
> IRQs for sound and several others are very high. I'm not sure if this is
> a board issue or a kernel issue. But since the sound chip on board (hda
> intel) is having problems too I guess it's a kernel related thing. I
> wonder if this will be fixed in newer versions.

BTW, if you have BIOS problems, you can always use the Linux-ready
Firmware Developer Kit to test how well it does; see
http://www.linuxfirmwarekit.org for details.

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org