LinuxLists.cc - IO-APIC on nforce2

2004-04-12 18:34:08

Subject: IO-APIC on nforce2

I have a problem using the local APIC and IO-APIC on my uniprocessor nforce2 board.
With recent kernels (latest -mm and 2.6.5-linus) the timer irq gets set to
XT-PIC, which results in a constant high load of 15% (after booting) to
about 25% (after the system has run for about 12 h). Earlier versions of -mm
set the timer irq to IO-APIC-level (or edge, I don't remember exactly) and I
never had any constant high load with these versions. Since mainline kernel
versions never set the timer irq to IO-APIC-{level,edge}, I used to patch
them with Ross' nforce patches, so that the timer irq became
IO-APIC-edge, which worked even though the patch applied with an offset. Anyway,
with the latest -mm kernels these patches don't work anymore. I can apply
them with an offset, but it seems the code isn't used or something else is wrong,
since the timer irq stays XT-PIC, which results in the problems above. Could
anyone point out how to resolve this problem, or tell me what I could do to
get my timer irq right? I'm willing to test patches...
Thanks in advance, christian.

2004-04-13 01:14:32

Re: IRQ0 XT-PIC timer issue

Since the hardware is connected to APIC pin0, it is a BIOS bug
that an ACPI interrupt source override from pin2 to IRQ0 exists.

With this simple 2.6.5 patch you can specify "acpi_skip_timer_override"
to ignore that bogus BIOS directive. The result is with your
ACPI-enabled APIC-enabled kernel, you'll get IRQ0 IO-APIC-edge timer.

There is probably a more clever way to trigger this workaround
automatically instead of via a boot parameter.

cheers,
-Len

===== Documentation/kernel-parameters.txt 1.44 vs edited =====
--- 1.44/Documentation/kernel-parameters.txt Mon Mar 22 16:03:22 2004
+++ edited/Documentation/kernel-parameters.txt Tue Apr 13 17:47:11 2004
@@ -122,6 +122,10 @@

acpi_serialize [HW,ACPI] force serialization of AML methods

+ acpi_skip_timer_override [HW,ACPI]
+ Recognize IRQ0/pin2 Interrupt Source Override
+ and ignore it -- for broken nForce2 BIOS.
+
ad1816= [HW,OSS]
Format: <io>,<irq>,<dma>,<dma2>
See also Documentation/sound/oss/AD1816.
===== arch/i386/kernel/setup.c 1.115 vs edited =====
--- 1.115/arch/i386/kernel/setup.c Fri Apr 2 07:21:43 2004
+++ edited/arch/i386/kernel/setup.c Tue Apr 13 17:41:31 2004
@@ -614,6 +614,12 @@
else if (!memcmp(from, "acpi_sci=low", 12))
acpi_sci_flags.polarity = 3;

+ else if (!memcmp(from, "acpi_skip_timer_override", 24)) {
+ extern int acpi_skip_timer_override;
+
+ acpi_skip_timer_override = 1;
+ }
+
#ifdef CONFIG_X86_LOCAL_APIC
/* disable IO-APIC */
else if (!memcmp(from, "noapic", 6))
===== arch/i386/kernel/acpi/boot.c 1.57 vs edited =====
--- 1.57/arch/i386/kernel/acpi/boot.c Tue Mar 30 17:05:19 2004
+++ edited/arch/i386/kernel/acpi/boot.c Tue Apr 13 17:50:14 2004
@@ -62,6 +62,7 @@

acpi_interrupt_flags acpi_sci_flags __initdata;
int acpi_sci_override_gsi __initdata;
+int acpi_skip_timer_override __initdata;

#ifdef CONFIG_X86_LOCAL_APIC
static u64 acpi_lapic_addr __initdata = APIC_DEFAULT_PHYS_BASE;
@@ -327,6 +328,12 @@
acpi_sci_ioapic_setup(intsrc->global_irq,
intsrc->flags.polarity, intsrc->flags.trigger);
return 0;
+ }
+
+ if (acpi_skip_timer_override &&
+ intsrc->bus_irq == 0 && intsrc->global_irq == 2) {
+ printk(PREFIX "BIOS IRQ0 pin2 override ignored.\n");
+ return 0;
}

mp_override_legacy_irq (
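
The check the patch adds can be modelled outside the kernel. The following is a minimal Python sketch, not kernel code: the `IntSrcOverride` class, `should_skip`, `apply_overrides`, and the identity starting map are hypothetical names and simplifications chosen for illustration. Only the condition itself (`bus_irq == 0`, `global_irq == 2`, gated by the boot flag) comes from the diff above.

```python
from dataclasses import dataclass


@dataclass
class IntSrcOverride:
    bus_irq: int     # legacy ISA IRQ named by the BIOS table entry
    global_irq: int  # IO-APIC pin (GSI) the BIOS claims it is wired to


def should_skip(entry, skip_timer_override):
    """Same test as the patch: drop only the IRQ0 -> pin 2 entry, and only on request."""
    return skip_timer_override and entry.bus_irq == 0 and entry.global_irq == 2


def apply_overrides(entries, skip_timer_override):
    """Return a legacy-IRQ -> pin map, starting from identity routing (an assumption)."""
    routing = {irq: irq for irq in range(16)}
    for entry in entries:
        if should_skip(entry, skip_timer_override):
            continue  # ignore the bogus BIOS directive
        routing[entry.bus_irq] = entry.global_irq
    return routing


bios_entries = [IntSrcOverride(0, 2), IntSrcOverride(9, 9)]
with_flag = apply_overrides(bios_entries, skip_timer_override=True)
without_flag = apply_overrides(bios_entries, skip_timer_override=False)
```

With the flag set, IRQ0 stays on pin 0 in this model; without it, the BIOS entry moves IRQ0 to pin 2. Other overrides are untouched either way.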

Attachments:

wip.patch (1.76 kB)

2004-04-14 04:24:25

by ben soo

Subject: Re: IO-APIC on nforce2

i must add that i've been using your patches for
the nForce chipset since they first appeared on
this mailing list, and while they've all helped
this box to last a bit longer between lockups
none of them cured it. Once the IO-APIC code was
compiled in and the Athlon idle powersaving
turned on it would inevitably lock up in a day
or two.

This incorrect result from the mismatch between
your 2.6.3 patches and the current IO-APIC
code is the first time this box seems to be
free from lockup.

b

On Tue, Apr 13, 2004 at 05:18:24PM -0400, really bensoo_at_soo_dot_com wrote:
> My irq0 says XT-PIC. i'm not complaining, box's still
> very stable and since the last post i've burned a few
> DVDs on it while running the file share client and
> playing music.
>
> cat /proc/interrupts
>
> CPU0
> 0: 759809583 XT-PIC timer
> 1: 382279 IO-APIC-edge i8042
> 2: 0 XT-PIC cascade
> 8: 1 IO-APIC-edge rtc
> 9: 0 IO-APIC-level acpi
> 12: 6386931 IO-APIC-edge i8042
> 14: 2117474 IO-APIC-edge ide0
> 15: 5575006 IO-APIC-edge ide1
> 201: 6425958 IO-APIC-level EMU10K1
> 209: 167929203 IO-APIC-level eth0
> NMI: 0
> LOC: 759718637
> ERR: 0
> MIS: 0
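
To check which interrupt controller a given IRQ ended up on without reading the table by eye, the listing can be parsed. This is a small Python sketch that assumes the 2.6-era column layout shown above (count columns, then controller name, then device); `irq_controllers` and `SAMPLE` are hypothetical names, and newer kernels print the columns differently.

```python
SAMPLE = """\
           CPU0
  0:  759809583    XT-PIC         timer
  1:     382279    IO-APIC-edge   i8042
  9:          0    IO-APIC-level  acpi
NMI:          0
LOC:  759718637
"""


def irq_controllers(text):
    """Map each numbered IRQ to (controller, device) from /proc/interrupts text."""
    result = {}
    for line in text.splitlines():
        line = line.lstrip("> ").strip()  # tolerate mail quote markers
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # skip the CPU header and the NMI/LOC/ERR/MIS rows
        fields = rest.split()
        i = 0
        while i < len(fields) and fields[i].isdigit():
            i += 1  # skip the per-CPU interrupt counts
        if i < len(fields):
            result[int(head)] = (fields[i], " ".join(fields[i + 1:]))
    return result


table = irq_controllers(SAMPLE)
timer_controller = table[0][0]
```

On a live system the same function could be fed `open("/proc/interrupts").read()`; for the sample above, `timer_controller` is the `XT-PIC` value this thread is about.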

2004-04-14 04:59:14

by Ross Dickson

Attachments:

dmiabitnf7sv2d23apic.txt (11.02 kB)

2004-04-22 08:45:29

On Wed, Apr 21, 2004 at 06:41:38PM -0400, Len Brown wrote:
> > Please send me the output from dmidecode, available in /usr/sbin/, or
>
> I've got 1 Abit, 2 Asus, and 1 Shuttle DMI entry. Let me know if the
> product names (1st line of dmidecode entry) are correct,
> these are not from DMI, but are supposed to be human-readable titles.
>
> I'm interested only in the latest BIOS -- if it is still broken.
> The assumption is that if a fixed BIOS is available, the users
> should upgrade.
>
> thanks,
> -Len
>
> ps. latest BIOS on my shuttle has a C1 disconnect enable setting,
> (curiously, it is disabled by default) so I'll try to reproduce the hang
> on it...
>

On the Shuttle AN35N, the C1 disconnect option default is auto. If you're
talking about this board, or another board Shuttle seemingly fixed, then I
can tell you that I haven't been able to get mine to hang with vanilla kernels.

As for your patch, I get a fast timer, and gain about 1 sec per 5 minutes.
The only patch that seemed to work without a fast timer so far was the one
removed by Linus in a testing version. The AN35N has the timer override
bug.
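
For scale, the reported gain of about 1 second per 5 minutes can be turned into a drift rate. A quick Python sketch of the arithmetic follows; the function name `drift` is made up for illustration, and the only input taken from this mail is the "1 sec per 5 minutes" figure, which is itself approximate.

```python
def drift(gain_seconds, over_seconds):
    """Convert an observed clock gain into parts per million and seconds per day."""
    return {
        "ppm": gain_seconds * 1_000_000 / over_seconds,
        "seconds_per_day": gain_seconds * 86_400 / over_seconds,
    }


observed = drift(gain_seconds=1, over_seconds=5 * 60)
```

That works out to roughly 3333 ppm, or 288 seconds (a little under five minutes) gained per day if the rate stays constant.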

Attached is the dmidecode for the AN35N. Note: onboard sound may be disabled.

Jesse

Attachments:

(No filename) (1.19 kB)
junk (10.81 kB)

2004-04-22 17:22:42

by Brown, Len

Subject: Re: IO-APIC on nforce2 [PATCH] + [PATCH] for nmi_debug=1 + [PATCH] for idle=C1halt, 2.6.5

On Thu, 2004-04-22 at 12:39, Jesse Allen wrote:

> On the Shuttle AN35N, the C1 disconnect option default is auto. If you're
> talking about this board, or another board Shuttle seemingly fixed, then I
> can tell you that I haven't been able to get mine to hang with vanilla kernels.

Have you been able to hang the AN35N under any conditions?
Old BIOS, non-vanilla kernel?

> As for your patch, I get a fast timer, and gain about 1 sec per 5 minutes.
> The only patch that seemed to work without a fast timer so far was the one
> removed by Linus in a testing version. The AN35N has the timer override
> bug.

Hmm, I didn't notice fast time on my FN41, I'll look for it.

I'm not familiar with the "one removed by Linus in a testing version",
perhaps you could point me to that?

> Attached is the dmidecode for the AN35N.

applied.

thanks,
-Len

2004-04-22 20:51:41

Hello,

I'm sorry for the small interlude in this thread, but I just want to get
something clear.

Basically we have a problem that is all around, except for (some) Shuttle
boards. No one really knows what's going on, or at least if they know they
are not vocal about it.

In comes Ross Dickson. He starts poking at the problem until he comes up
with two patches. Near the end of 2003, an NVIDIA engineer (Allen Martin)
states that he (or maybe NVIDIA as a whole?) has been unable to reproduce
this weird problem with hard locks, seemingly related to APIC and IO.

He can tell us there was a bug in a reference BIOS that NVIDIA sent out
into the world, but that it has been fixed in a follow-up. Somewhere at
the start of December, Shuttle updates its BIOS for the AN35. Jesse Allen
flashes the new BIOS into his board and for reasons unknown his hard lock
problem has vanished. The importance of the update of NVIDIA's reference
BIOS in relation to the Shuttle update of the BIOS for their product(s) is
unknown as well.

Meanwhile, Ross Dickson drops requests for support tickets at AMD and
NVIDIA. Until this day, no reply yet. Unaffected by the deafening silence
he keeps improving his patches which seem to work(tm).

Without Ross' hard labor one can avoid the hard locks by banning APIC
support from the kernel, or turn off the C1 disconnect feature in the
BIOS, which is misinterpreted by one ACPI developer as running the CPU
"out of spec."

Recently Len Brown, the ACPI Linux kernel maintainer and Intel employee -
can you spot the irony? - agreed to attempt to reproduce the problem.
After his box ran cat /dev/hda > /dev/null for a whole night,
no lockup had occurred. The brand of his motherboard is Shuttle.
Did I mention irony...?

Although this topic is primarily about nforce2 chipsets, similar problems
have been reported with SiS chipsets for AMD CPUs. Other chipsets capable
of having the CPU disconnect include VIA KT266(A), KT333 and KT400. On
Linux a tool like athcool can set the bits for the disconnect and the HLT
instruction. It is unconfirmed whether these chipsets suffer from the same
symptoms as nforce2 chipsets.

Does anyone have some input on how to tackle this problem? The only things
I can come up with are mailing all the motherboard manufacturers I can
think of, harassing NVIDIA and/or AMD some more through proper channels (i.e.
filing a "bug report", but I don't expect much from this, sorry Allen) or
buying Len the cheapest broken nforce2 board I can find at pricewatch.com and
having it shipped to his house :)

Best regards,

Arjen

2004-04-27 17:36:38

On Thu, 2004-04-29 at 13:44, Ross Dickson wrote:
> On Thursday 29 April 2004 06:59, Jesse Allen wrote:
> > On Wed, Apr 28, 2004 at 09:33:34PM +1000, Ross Dickson wrote:
> > > >
> > > > It may be this board never hangs no matter what,
> > > > or perhaps C1 disconnect was simply disabled in that BIOS
> > > > b/c there was no option for it in Advanced Chipset Features
> > > > like there is for the most recent BIOS.
> > >
> > > Maybe other MOBO manufacturers skimp on filter caps and regulator damping
> > > ability and a resonance occurs in the on-board supply rails? Do Shuttle make
> > > any claims to using an improved on board regulator? Or Shuttle may have
> > > always programmed more time in C1 cycle handshakes if such is
> > > configurable?
> >
> > Do you really think so? I think there may be a resonance occurring, even with
> > this new BIOS. I plugged new headphones into my nforce2 onboard sound, and
> > get a high-pitched noise. Now here is where it gets weird: this noise does
> > not occur on boot until sometime after the IDE driver is loaded. I also
> > believe it varies under a high load. If you disable C1 disconnect, it's gone.
> > Also I've heard a high-pitched noise at certain times coming right from the
> > computer (very faint, but I do have very good hearing, I can even hear a hush
> > sounding from my router. My brother was quite astonished when I pointed that
> > out). I try to distinguish what's doing it. It could be the hard drive. But
> > when I found the other sound in the headphones, I found that the sound varies
> > almost in unison with the sound coming from the computer. Maybe the IDE or
> > hard drive is related, but it is too much related to C1 disconnect.
>
> I think I might break out my oscilloscope this weekend and have a look at how
> clean the supply rails are around the cpu and northbridge and southbridge.
> Who knows I might get lucky and see some unexpected ripple or spikes.
>
> >
> > Whether it is really possible that my board can really generate this sound, I
> > don't know. Though, I have once determined that resonance was occurring in an
> > old system, causing unstable CPU operation. It wasn't that I heard a sound
> > coming from it =). But I thought the case was causing it, and pulled
> > it out of the case. I ran it on the table and found it to be stable. That
> > was the only thing wrong. I've also studied resonance before a bit. I know
> > resonance can break systems. But to think that my board is emitting
> > noise like that is pretty bizarre.
>
> Not as bizarre as you may think. I have heard coils and even capacitors "sing"
> in years past whilst servicing electronics.
>
> >
> > It may be true that this Shuttle board may have resonance problems. So that
> > would indicate that they did something much like you describe by changing the
> > C1 handshake time? Isn't that much like what your patch does?
>
> I had not really thought about it from that perspective. Whilst my patch cannot
> alter the handshake times it does prevent consecutive C1 cycles from occurring
> too close together. Too close together I think being less than about 800ns. I
> guess I could look at that with a cro too - use an appropriate pin as the trigger
> source and see if supply rails have load dump voltage rises when going into
> disconnect. Maybe rail voltage rings for about 700ns and might be out of
> tolerance inside Athlon during that time. Would be very interesting if a
> few hundred picofarad of low esr decoupling cap placed on a supply rail near a
> chip makes a difference? A pinout of the nforce2 chipset would help a great deal
> here but I do not have one. Can anyone oblige me?
>
> >
> >
> > >
> > > > hang issue is completely explained and solved.
> > >
> > > I have had good (100%) success in reproducing the fault with the Albatron
> > > KM18G pro MOBO. I needed m-atx form factor and distributor was local to me.
> > > Makes very nice - cheap and stable system but only with the lockup workaround.
> > >
> > > I also recollect that Windows had lockups with nforce2 for a while depending
> > > whether you ran the Nvidia or Microsoft driver.
> > > http://lkml.org/lkml/2003/12/13/5
> > > Anybody got the inside running on that one and what was different between the
> > > two drivers?
> > >
> >
> > Yeah, unfortunately, I didn't save a link to the message board that I found
> > that on. But the issue is pretty common. I'm sure more info can be found on
> > the Windows side.
>
> No tech info but this link shows user had Lockups with Nvidia's ide driver but
> OK with MS one.
> http://club.cdfreaks.com/showthread/t-91381.html
>
> -

This has become a rather interesting problem to watch from afar. The
Athlon here seems to have no issues with the NForce driver under Windows
(I don't burn a lot of DVDs on it, though). Whenever it's in Linux, it's mainly
a testing machine these days.

It will be interesting to see if there's a real hardware problem and then
if it can be worked around in software (can't imagine a single product
recall happening).

Attachments:

signature.asc (189.00 B)
This is a digitally signed message part

2004-04-29 12:26:54

by Maciej W. Rozycki

Subject: Re: IO-APIC on nforce2 [PATCH] + [PATCH] for nmi_debug=1 + [PATCH] for idle=C1halt, 2.6.5

On Thu, 29 Apr 2004, Jamie Lokier wrote:

> > Not necessarily related to the PSU, but the noise may actually be the
> > reason of spurious timer interrupts.
>
> With most device interrupts, additional spurious ones don't cause any
> malfunction because the driver's handler checks whether the device
> actually has a condition pending.

Note the 8254 timer uses edge-triggered interrupts and is just a square
wave signal. There's no acking to deassert the interrupt -- it goes away
spontaneously after a predefined time.

> This is the basis of shared interrupts, of course.

Yep, but the timer is non-shareable by definition.

> Is there any way we can check the timer itself to see whether an
> interrupt was caused by it, so that spurious timer interrupts are ignored?

This may be possible, but complicated and likely unreliable -- an I/O
APIC may deliver a spurious interrupt at the time a real one would be
probable and you can't check if a period between two consecutive timer
interrupts is appropriate without an additional time reference, which may
be unavailable (like the TSC).

Note the timer is special -- we don't really do any device handling, but
we want to get periodic interrupts at the right times to have a time
reference. Coalescing interrupts or discarding spurious ones, which is
normal and acceptable for regular devices, doesn't work here.
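
The difficulty can be illustrated with a toy model. The Python sketch below assumes what the kernel may not have, namely an independent timestamp for every interrupt, and labels ticks by whether the interval since the last accepted tick matches the expected period. The function name, the 1000 µs period, and the 10 µs tolerance are all assumptions for illustration, not anything from the kernel.

```python
def classify_ticks(timestamps_us, period_us, tolerance_us):
    """Label every tick after the first as 'ok' or 'suspect' by its interval."""
    labels = []
    last_accepted = timestamps_us[0]
    for t in timestamps_us[1:]:
        if abs((t - last_accepted) - period_us) <= tolerance_us:
            labels.append("ok")
            last_accepted = t
        else:
            labels.append("suspect")
    return labels


# Real ticks at 1000, 2000 and 3000 us; spurious ones at 1400 and 2995 us.
arrivals = [0, 1000, 1400, 2000, 2995, 3000]
labels = classify_ticks(arrivals, period_us=1000, tolerance_us=10)
```

The spurious interrupt at 1400 µs is caught, but the one at 2995 µs arrives where a real tick is plausible, so it is accepted and the genuine tick at 3000 µs is the one flagged. Even with a good time reference, an interval check cannot tell the two apart.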

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2004-04-29 20:24:19

by Jesse Allen

Subject: Re: IO-APIC on nforce2 [PATCH] + [PATCH] for nmi_debug=1 + [PATCH] for idle=C1halt, 2.6.5

On Thu, Apr 29, 2004 at 09:44:37PM +1000, Ross Dickson wrote:
> On Thursday 29 April 2004 06:59, Jesse Allen wrote:
> > almost in unison with the sound coming from the computer. Maybe the IDE or
> > hard drive is related, but it is too much related to C1 disconnect.
>
> I think I might break out my oscilloscope this weekend and have a look at how
> clean the supply rails are around the cpu and northbridge and southbridge.
> Who knows I might get lucky and see some unexpected ripple or spikes.

I'd be interested in knowing the results.

> > resonance can break systems. But to think that my board is emitting
> > noise like that is pretty bizarre.
>
> Not as bizarre as you may think. I have heard coils and even capacitors "sing"
> in years past whilst servicing electronics.

Yes, I know that these things can theoretically happen. But when it happens
to me, it's a surprise. An electronics genius probably encounters it
more often. =)

> > C1 handshake time? Isn't that much like what your patch does?
>
> I had not really thought about it from that perspective. Whilst my patch cannot
> alter the handshake times it does prevent consecutive C1 cycles from occurring
> too close together. Too close together I think being less than about 800ns. I

ah, ok.
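
The spacing idea can be sketched as a simple minimum-gap rule. The Python model below is not Ross' patch and makes no claim about how it is implemented; `delays_needed` and its list-of-timestamps input are hypothetical, and the 800 ns constant is only the rough figure quoted above.

```python
MIN_GAP_NS = 800  # rough figure from the quoted text, treated as an assumption


def delays_needed(idle_entry_times_ns, min_gap_ns=MIN_GAP_NS):
    """For each attempted C1 entry, return the wait that keeps entries min_gap_ns apart."""
    delays = []
    last_entry = None
    for t in idle_entry_times_ns:
        wait = 0
        if last_entry is not None and t - last_entry < min_gap_ns:
            wait = min_gap_ns - (t - last_entry)
        delays.append(wait)
        last_entry = t + wait  # the entry actually happens after the wait
    return delays


waits = delays_needed([0, 300, 2000, 2500])
```

In this model the second and fourth attempts come too soon after the previous entry and are pushed back by 500 ns and 300 ns, while well-spaced entries are not delayed at all.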

> guess I could look at that with a cro too - use an appropriate pin as the
> trigger source and see if supply rails have load dump voltage rises when
> going into disconnect. Maybe rail voltage rings for about 700ns and might be
> out of tolerance inside Athlon during that time. Would be very interesting if
> a few hundred picofarad of low esr decoupling cap placed on a supply rail
> near a chip makes a difference? A pinout of the nforce2 chipset would help a
> great deal here but I do not have one. Can anyone oblige me?

What I'd like to know is where the sound chip is really at on my board. I've
tried looking before, but find myself confused.

A pic:
http://us.shuttle.com/images/productimages/AN35.jpg

According to a diagram that I have, the AC'97 6-CH AUDIO is a chip
near the top of the board in the image that I link to, above the 2nd PCI slot,
left of the AGP. But I am also left wondering how the NForce2 MCP comes
into play. Specs would help. Maybe if we can figure out how the sound is
wired on the board, we could also trace the source of noise to the exact
component.

Jesse

2004-04-29 20:42:59

by Prakash K. Cheemplavam

Subject: Re: IO-APIC on nforce2 [PATCH] + [PATCH] for nmi_debug=1 + [PATCH] for idle=C1halt, 2.6.5

Jesse Allen wrote:
> What I'd like to know is where the sound chip is really at on my board. I've
> tried looking before, but find myself confused.
>
> A pic:
> http://us.shuttle.com/images/productimages/AN35.jpg
>
> According to a diagram that I have, the AC'97 6-CH AUDIO is a chip
> near the top of the board in the image that I link to, above the 2nd PCI slot,
> left of the AGP. But I am also left wondering how the NForce2 MCP comes
> into play. Specs would help. Maybe if we can figure out how the sound is
> wired on the board, we could also trace the source of noise to the exact
> component.

Yes, I also think the chip above the 2nd PCI slot is the right one. You can
see the Realtek logo. It is only an AC97 codec (basically not more than a
DAC and ADC) and Linux currently only has drivers for this. The MCP-T
has an APU, which could do DSP stuff in hardware, but there are still no drivers
(Hello Nvidia?), so all of this is done in software. (The APU has even
more functionality, like DD5.1 realtime encoding, fx, and whatever.) In
our case, the APU shouldn't cause any trouble, as it is not used. With
the APU, the nforce2 chipset behaves like a "real" soundcard. Without it, its
sound abilities are no better than the average mainboard's onboard sound.

Prakash