2004-04-15 18:49:08

by Allen Martin

Subject: RE: IO-APIC on nforce2 [PATCH]

> True it is a bios thing but I have yet to see an nforce2 MOBO
> that is not
> routed in this way. I am thinking it is internal to the
> chipset. I have seen
> none route it into io-apic pin2.

It was a bug in our original nForce reference BIOS that we gave out to vendors. Since then we fixed the reference BIOS, but since it was after products shipped, most of the motherboard vendors won't pick up the change unless they get complaints from customers.

We've fixed it for our reference BIOS for future products though.


-Allen


2004-04-15 19:20:45

by Craig Bradney

Subject: RE: IO-APIC on nforce2 [PATCH]

On Thu, 2004-04-15 at 20:33, Allen Martin wrote:
> > True it is a bios thing but I have yet to see an nforce2 MOBO
> > that is not
> > routed in this way. I am thinking it is internal to the
> > chipset. I have seen
> > none route it into io-apic pin2.
>
> It was a bug in our original nForce reference BIOS that we gave out to vendors. Since then we fixed the reference BIOS, but since it was after products shipped, most of the motherboard vendors won't pick up the change unless they get complaints from customers.
>
> We've fixed it for our reference BIOS for future products though.

Would it not be worth Nvidia advising the vendors (possibly already
done) that there are nForce BIOS issues causing reproducible hard
crashes with their existing BIOS versions with relation to the issue
here? I realise that these manufacturers (in my case ASUS) may have no
obligation and have their own product schedules of course.

There would seem to be many people purchasing nForce based systems and
many will not find out how or why there is the problem.

I for one, will not be buying another nForce based motherboard until I'm
sure this issue is sorted properly. Yes, I will probably find issues on
other boards, but I have no such hard crashes on an old P3 Sis chipset
board, a VIA KT Abit Duron board, nor on this P4 board with the very
same kernel source.

While my single purchases make no difference in the scheme of things,
the lack of support from the various manufacturers given to the kernel
developers (I refer in this case to Ross Dickson trying to get help
from Nvidia and another, I think AMD) is turning me away (back to Intel
CPUs).

Of course, I am very happy to finally see a response from someone from
Nvidia now.

With best regards
Craig Bradney



Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2004-04-15 19:51:05

by Brown, Len

Subject: RE: IO-APIC on nforce2 [PATCH]

On Thu, 2004-04-15 at 14:33, Allen Martin wrote:

> It was a bug in our original nForce reference BIOS that we gave out to vendors. Since then we fixed the reference BIOS, but since it was after products shipped, most of the motherboard vendors won't pick up the change unless they get complaints from customers.
>
> We've fixed it for our reference BIOS for future products though.

Great!

Knowing this makes the path clear.
As we expected, an automatic workaround based on chip-set would
fail because some BIOS's are fixed and some are not.
So we either leave the workaround as manual bootparam
or try to enumerate all BIOS versions with the bug
in dmi_scan. I'm content to do the former. If distros
have trouble supporting nforce2 systems, they may want to add
to the latter.

thanks,
-Len

ps.
I'm also excited to see a [email protected] alias
on your note. Perhaps you can explain how we should use it. Should
this alias be included on discussions of the more important issue --
the system hang that seems to be related to HALT in idle/C1?


2004-04-16 08:28:28

by Jamie Lokier

Subject: Re: IO-APIC on nforce2 [PATCH]

Len Brown wrote:
> As we expected, an automatic workaround based on chip-set would
> fail because some BIOS's are fixed and some are not.

Does the workaround actually fail with the fixed BIOSes?

-- Jamie

2004-04-22 04:01:11

by Brown, Len

Subject: Re: IO-APIC on nforce2 [PATCH]

On Fri, 2004-04-16 at 04:27, Jamie Lokier wrote:
> Len Brown wrote:
> > As we expected, an automatic workaround based on chip-set would
> > fail because some BIOS's are fixed and some are not.
>
> Does the workaround actually fail with the fixed BIOSes?

A fixed BIOS will not have a bogus IRQ2->pin-2 mapping,
so the acpi_skip_timer_override workaround would not
find an entry to ignore, and would become a NOP.

So if
1. all nforce2 chipsets have timer connected to pin0
2. we can safely discover we're on nforce2 early enough,
like andi did on x86_64

then we could apply the workaround automatically always
w/o any harm.

-Len
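
The no-op argument above can be illustrated with a small sketch. This is not the kernel code: it models the BIOS interrupt source overrides as (source IRQ, I/O APIC pin) pairs, and the names `filter_overrides` and `skip_timer_override`, as well as the sample override lists, are hypothetical stand-ins chosen for illustration.

```python
# Illustrative model only, not kernel code.
# An override is a (source_irq, ioapic_pin) pair taken from the BIOS tables.
TIMER_IRQ = 0
BOGUS_TIMER_PIN = 2


def filter_overrides(overrides, skip_timer_override):
    """Return the overrides that would be applied.

    With skip_timer_override set, a bogus IRQ0 -> pin 2 entry is ignored,
    so the timer stays on pin 0. Every other entry is kept untouched.
    """
    if not skip_timer_override:
        return list(overrides)
    return [
        (irq, pin)
        for irq, pin in overrides
        if not (irq == TIMER_IRQ and pin == BOGUS_TIMER_PIN)
    ]


# Hypothetical sample tables: a buggy BIOS and a fixed one.
buggy_bios = [(0, 2), (9, 9)]
fixed_bios = [(9, 9)]

print(filter_overrides(buggy_bios, True))  # [(9, 9)]
print(filter_overrides(fixed_bios, True))  # [(9, 9)]
```

On the fixed table there is nothing to drop, so the result equals the input, which is why applying the workaround unconditionally would be harmless provided condition 1 holds.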


2004-04-22 13:22:43

by Maciej W. Rozycki

Subject: Re: IO-APIC on nforce2 [PATCH]

On Thu, 22 Apr 2004, Len Brown wrote:

> So if
> 1. all nforce2 chipsets have timer connected to pin0

Allen, is there a possibility to get a clarification from Nvidia on that?
Specifically, assuming both an 8254 and an I/O APIC core are integrated
into the chip, whether OUT0 of the 8254 is unconditionally routed to
INTIN0 of the I/O APIC or is it configurable somehow.

> 2. we can safely discover we're on nforce2 early enough,
> like andi did on x86_64
>
> then we could apply the workaround automatically always
> w/o any harm.

Indeed.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2004-04-22 13:52:00

by Christian Kröner

Subject: Re: IO-APIC on nforce2 [PATCH]

since it's becoming fancy to post dmidecode output, here is mine. my bios
vendor is MSI (K7N2-Delta) and it has released many BIOS updates for this
board, but none of them fixes the timer-issue. i only got the timer issue,
(related with it the already discussed obscure hi-load) no hang with c1
disconnect enabled. another thing i recently noticed (running 2.6.6-rc2-mm1
now) is that the last XT-PIC interrupt is gone now. i had cascade on irq2
routed as XT-PIC before, now cascade (whatever it is) doesn't exist
anymore ;).

/proc/interrupts now:

0: 32184529 IO-APIC-edge timer
1: 1741 IO-APIC-edge i8042
7: 0 IO-APIC-edge parport0
8: 4 IO-APIC-edge rtc
9: 0 IO-APIC-level acpi
12: 9229 IO-APIC-edge i8042
14: 107111 IO-APIC-edge ide0
15: 92 IO-APIC-edge ide1
16: 3138 IO-APIC-level ide2, saa7134[0]
17: 153 IO-APIC-level CMI8738
19: 2732391 IO-APIC-level nvidia
20: 4315754 IO-APIC-level ohci_hcd, eth0
21: 1167427697 IO-APIC-level ehci_hcd
22: 79 IO-APIC-level ohci_hcd

another thing that bugs me a little (a little offtopic here maybe), is the
irq21 of ehci_hcd seems to get hit about twice as often as the timer irq
although im not at all using USB... any suggestions? maybe i start a second
thread on this one...

attached: dmidecode output.

greets, christian.

pls CC on replies.


Attachments:
(No filename) (1.41 kB)
dmidecode.txt (11.19 kB)

2004-04-22 15:28:13

by Brown, Len

Subject: Re: IO-APIC on nforce2 [PATCH]

On Thu, 2004-04-22 at 09:53, Christian Kröner wrote:
> since its becoming fancy to post dmidecode output

Thanks. BTW. sending the dmidecode directly to me should be sufficient
should you have need to do so again.

> another thing i recently noticed (running 2.6.6-rc2-mm1
> now) is that the last XT-PIC interrupt is gone now. i had cascade on
> irq2
> routed as XT-PIC before, now cascade (whatever it is) doesnt exist
> anymore ;).

Yes, this is normal on ACPI+IOAPIC configs going forward
details here: http://bugzilla.kernel.org/show_bug.cgi?id=2564

> /proc/interrupts now:
>
> 0: 32184529 IO-APIC-edge timer
> 1: 1741 IO-APIC-edge i8042
> 7: 0 IO-APIC-edge parport0
> 8: 4 IO-APIC-edge rtc
> 9: 0 IO-APIC-level acpi
> 12: 9229 IO-APIC-edge i8042
> 14: 107111 IO-APIC-edge ide0
> 15: 92 IO-APIC-edge ide1
> 16: 3138 IO-APIC-level ide2, saa7134[0]
> 17: 153 IO-APIC-level CMI8738
> 19: 2732391 IO-APIC-level nvidia
> 20: 4315754 IO-APIC-level ohci_hcd, eth0
> 21: 1167427697 IO-APIC-level ehci_hcd
> 22: 79 IO-APIC-level ohci_hcd


> another thing that bugs me a little (a little offtopic here maybe), is
> the
> irq21 of ehci_hcd seems to get hit about twice as often as the timer
> irq
> although im not at all using USB... any suggestions? maybe i start a
> second
> thread on this one...

Better yet, file a bug and we'll look at your ehci interrupt issue in
detail.

thanks,
-Len

How to file a bug against ACPI:

http://bugzilla.kernel.org/enter_bug.cgi?product=ACPI

For failure and success case, please attach
1. dmesg -s64000, or serial console using "debug" on cmdline.
(increase CONFIG_LOG_BUF_SHIFT if it doesn't get back to beginning)
2. /proc/interrupts
3. lspci -v

Please attach the output from acpidmp, available in /usr/sbin/, or in
pmtools:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/



2004-04-22 15:40:49

by Maciej W. Rozycki

Subject: Re: IO-APIC on nforce2 [PATCH]

On Thu, 22 Apr 2004, Len Brown wrote:

> Yes, this is normal on ACPI+IOAPIC configs going forward
> details here: http://bugzilla.kernel.org/show_bug.cgi?id=2564

Except that the IRQ was reserved for plain 8259A setups, where it is
really used for a cascade for a slave 8259A, long before any APIC support
was there in Linux. JFTR.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2004-04-22 16:13:58

by Christian Kröner

Subject: Re: IO-APIC on nforce2 [PATCH]

On Thursday 22 April 2004 17:27, you wrote:
> On Thu, 2004-04-22 at 09:53, Christian Kröner wrote:
> > since its becoming fancy to post dmidecode output
>
> Thanks. BTW. sending the dmidecode directly to me should be sufficient
> should you have need to do so again.
>
> > another thing i recently noticed (running 2.6.6-rc2-mm1
> > now) is that the last XT-PIC interrupt is gone now. i had cascade on
> > irq2
> > routed as XT-PIC before, now cascade (whatever it is) doesnt exist
> > anymore ;).
>
> Yes, this is normal on ACPI+IOAPIC configs going forward
> details here: http://bugzilla.kernel.org/show_bug.cgi?id=2564
>
> > /proc/interrupts now:
> >
> > 0: 32184529 IO-APIC-edge timer
> > 1: 1741 IO-APIC-edge i8042
> > 7: 0 IO-APIC-edge parport0
> > 8: 4 IO-APIC-edge rtc
> > 9: 0 IO-APIC-level acpi
> > 12: 9229 IO-APIC-edge i8042
> > 14: 107111 IO-APIC-edge ide0
> > 15: 92 IO-APIC-edge ide1
> > 16: 3138 IO-APIC-level ide2, saa7134[0]
> > 17: 153 IO-APIC-level CMI8738
> > 19: 2732391 IO-APIC-level nvidia
> > 20: 4315754 IO-APIC-level ohci_hcd, eth0
> > 21: 1167427697 IO-APIC-level ehci_hcd
> > 22: 79 IO-APIC-level ohci_hcd
> >
> >
> > another thing that bugs me a little (a little offtopic here maybe), is
> > the
> > irq21 of ehci_hcd seems to get hit about twice as often as the timer
> > irq
> > although im not at all using USB... any suggestions? maybe i start a
> > second
> > thread on this one...
>
> Better yet, file a bug and we'll look at your ehci interrupt issue in
> detail.
>
> thanks,
> -Len
>
> How to file a bug against ACPI:
>
> http://bugzilla.kernel.org/enter_bug.cgi?product=ACPI
>
> For failure and success case, please attach
> 1. dmesg -s64000, or serial console using "debug" on cmdline.
> (increase CONFIG_LOG_BUF_SHIFT if it doesn't get back to beginning)
> 2. /proc/interrupts
> 3. lspci -v
>
> Please attach the output from acpidmp, available in /usr/sbin/, or in
> pmtools:
> http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/utils/

Since I don't know which category of ACPI bugtracking my "bug" fits into I
send the symptom here:

cat /proc/interrupts

0: 41075417 IO-APIC-edge timer
1: 27326 IO-APIC-edge i8042
7: 0 IO-APIC-edge parport0
8: 4 IO-APIC-edge rtc
9: 0 IO-APIC-level acpi
12: 226123 IO-APIC-edge i8042
14: 152298 IO-APIC-edge ide0
15: 92 IO-APIC-edge ide1
16: 28174 IO-APIC-level ide2, saa7134[0]
17: 86370 IO-APIC-level CMI8738
19: 3500294 IO-APIC-level nvidia
20: 5858400 IO-APIC-level ohci_hcd, eth0
21: 2678002320 IO-APIC-level ehci_hcd
22: 79 IO-APIC-level ohci_hcd
NMI: 13435
LOC: 40935121
ERR: 0
MIS: 0

and the same only 5 seconds later:

0: 41082729 IO-APIC-edge timer
1: 27350 IO-APIC-edge i8042
7: 0 IO-APIC-edge parport0
8: 4 IO-APIC-edge rtc
9: 0 IO-APIC-level acpi
12: 226123 IO-APIC-edge i8042
14: 152313 IO-APIC-edge ide0
15: 92 IO-APIC-edge ide1
16: 28243 IO-APIC-level ide2, saa7134[0]
17: 86515 IO-APIC-level CMI8738
19: 3500917 IO-APIC-level nvidia
20: 5859792 IO-APIC-level ohci_hcd, eth0
21: 2679242125 IO-APIC-level ehci_hcd
22: 79 IO-APIC-level ohci_hcd
NMI: 13438
LOC: 40942396
ERR: 0
MIS: 0

It seems as if irq21 (ehci_hcd) got hit 1 239 805 times in only 5 seconds
without me even using USB in that period.


Please tell me into which category I should file this issue and I will happily
file a bug report right after...

thanks, christian.
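
The per-IRQ difference between the two /proc/interrupts snapshots above can be computed mechanically. The following is a minimal sketch, assuming the single-CPU layout shown in this message (one count column per line); the helper names `parse_interrupts` and `deltas` are hypothetical, and only two of the posted lines are reproduced for brevity.

```python
def parse_interrupts(text):
    """Map each label (e.g. "21", "NMI") to its first count column."""
    counts = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip header or malformed lines that carry no numeric count.
        if sep and fields and fields[0].isdigit():
            counts[label.strip()] = int(fields[0])
    return counts


def deltas(before, after):
    """Return the count increase per label present in both snapshots."""
    b, a = parse_interrupts(before), parse_interrupts(after)
    return {label: a[label] - b[label] for label in a if label in b}


# Two lines from each snapshot posted above.
before = """\
0: 41075417 IO-APIC-edge timer
21: 2678002320 IO-APIC-level ehci_hcd
"""
after = """\
0: 41082729 IO-APIC-edge timer
21: 2679242125 IO-APIC-level ehci_hcd
"""

d = deltas(before, after)
print(d["21"], d["0"])  # 1239805 7312
```

Feeding it the complete snapshots works the same way, since lines without a numeric count after the colon are skipped.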