2002-03-15 11:33:53

by Mark Hounschell

Subject: Advanced Programmable Interrupt Controller (APIC)?

We have a number of Dell boxes in house. The most recent is a 6400 (quad P4 box); we also have a couple
of dual boxes. They all lock up intermittently when the APIC is enabled. There seems to be no reliable way of reproducing the
hangs; they just hang randomly. With the APIC disabled they do not.
The app that we have to run on these boxes requires the APIC to be enabled for IRQ affinity.
It is a soft real-time app that cannot tolerate the jitter. It will run, but interrupt latency
is unacceptable without IRQ affinity set. I've read on many lists that if you have random
lockups you should disable the APIC. I've also got a number of non-Dell boxes that don't exhibit this lockup. Now I've
also heard that Dell does not properly set up the APIC chip in
the BIOS because MS operating systems don't use it. I have no idea if this is true or not. We are using
vanilla kernels (2.4.16) with some process affinity patches applied. I've also noticed that
on the quad boxes (6400s), IRQs 0, 1 and 2 cannot be directed to or away from a particular processor.
This is also a problem for our app. Mostly, though, it's the lockups that occur with the APIC enabled that
are the roadblock to using these nice Dell boxes for our app. Can anyone shed some light on
this?
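For reference, on 2.4 kernels with the I/O APIC enabled, IRQ affinity is set by writing a hexadecimal CPU bitmask to /proc/irq/<n>/smp_affinity. A minimal C sketch of doing that from a program; the helper names are hypothetical, and treating the mask as one unsigned long (one bit per CPU, plenty for the four CPUs discussed here) is an assumption:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "/proc/irq/<irq>/smp_affinity" in buf.
 * Returns 0 on success, -1 if buf is too small. */
int irq_affinity_path(char *buf, size_t len, unsigned int irq)
{
    int n = snprintf(buf, len, "/proc/irq/%u/smp_affinity", irq);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: format a CPU mask (bit i = CPU i) as hex text,
 * e.g. CPUs 0 and 2 -> "5\n". Returns 0 on success, -1 on overflow. */
int format_cpu_mask(char *buf, size_t len, unsigned long mask)
{
    int n = snprintf(buf, len, "%lx\n", mask);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write the mask to `path` (needs root for the real proc file).
 * stdio buffers the write, so an error from the kernel usually shows
 * up at fclose(); its result is checked too. Returns 0 on success. */
int set_irq_affinity(const char *path, unsigned long mask)
{
    char text[32];
    size_t len;
    FILE *f;

    if (format_cpu_mask(text, sizeof text, mask) != 0)
        return -1;
    f = fopen(path, "w");
    if (!f)
        return -1;
    len = strlen(text);
    if (fwrite(text, 1, len, f) != len) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

For example, binding IRQ 19 to CPU 1 would be irq_affinity_path(p, sizeof p, 19) followed by set_irq_affinity(p, 0x2).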

Thanks in advance
and regards
--
Mark Hounschell
[email protected]


2002-03-15 16:02:31

by Tim Kay

Subject: Re: Advanced Programmable Interrupt Controller (APIC)?

Mark,
We have a similar problem with PowerEdge 2450s, 1550s and 6400s. All our
machines run with noapic in the LILO config, which sounds like it
isn't an option for you. I'd be interested to know where you heard about Dell
stuffing up the setup of the APIC chip, because we may be able to take this up
with them. I've had no reply from the list to the message below (maybe it
would be better posted to Kernel Traffic SMP, but that's a very quiet list).
Anyway, maybe the BSD diagnostics will help you investigate this.
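To confirm that a box really booted with noapic (for instance, that the option in the LILO config reached the kernel), the running kernel's command line can be read from /proc/cmdline. A small C sketch; cmdline_has_option() and booted_with() are hypothetical helpers, and matching whole space-separated tokens is a design choice so that, say, "apic" is not found inside "noapic":

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if `opt` appears as a whole
 * space-separated token in `cmdline` (a trailing newline also ends a
 * token), else 0. */
int cmdline_has_option(const char *cmdline, const char *opt)
{
    size_t olen = strlen(opt);
    const char *p = cmdline;

    if (olen == 0)
        return 0;
    while ((p = strstr(p, opt)) != NULL) {
        int starts = (p == cmdline || p[-1] == ' ');
        char after = p[olen];
        int ends = (after == '\0' || after == ' ' || after == '\n');
        if (starts && ends)
            return 1;
        p += olen;
    }
    return 0;
}

/* Read /proc/cmdline and look for `opt`. Returns -1 if unreadable. */
int booted_with(const char *opt)
{
    char buf[4096];
    FILE *f = fopen("/proc/cmdline", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return cmdline_has_option(buf, opt);
}
```

booted_with("noapic") returning 1 means the I/O APIC was disabled for this boot, so IRQ affinity changes should not be expected to work.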

--------------------enc---------------
Hello,
Just a quickie: our Dell PowerEdge boxes (ServerWorks motherboard) are
continually pumping out IO-APIC errors, as I've reported here before. We have
three of the same boxes running FreeBSD (limitless file descriptors per
process; sorry, we need it!) and I've just noticed that dmesg on these says
this:

IO APIC - APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0

Does this help anyone diagnose the error?

/-------------------enc------------------
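The two FreeBSD lines say that the MP table places the 8254 timer on IOAPIC #0 intpin 2, but the kernel found it was not connected there and routed it through the 8259 to intpin 0 instead. A small C sketch that pulls those numbers out of such log lines with sscanf(); it assumes exactly the message wording quoted above, and the struct and function names are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Where the MP table puts the 8254, and where it was actually routed. */
struct timer_routing {
    int mp_apic, mp_pin;     /* pin claimed by the MP table */
    int used_apic, used_pin; /* pin used via the 8259 */
};

/* Scan one log line and fill in whichever half it describes.
 * Returns 1 if the line matched one of the two messages, else 0.
 * Whitespace in the formats also matches a line wrapped before "2". */
int parse_apic_io_line(const char *line, struct timer_routing *r)
{
    const char *p = strstr(line, "APIC_IO:");
    int apic, pin;

    if (!p)
        return 0;
    if (sscanf(p, "APIC_IO: Broken MP table detected: 8254 is not "
                  "connected to IOAPIC #%d intpin %d", &apic, &pin) == 2) {
        r->mp_apic = apic;
        r->mp_pin = pin;
        return 1;
    }
    if (sscanf(p, "APIC_IO: routing 8254 via 8259 and IOAPIC #%d intpin %d",
               &apic, &pin) == 2) {
        r->used_apic = apic;
        r->used_pin = pin;
        return 1;
    }
    return 0;
}
```

Fed the dmesg output line by line, this gives mp_pin 2 and used_pin 0 for the log above; a mismatch like that is the situation the I/O APIC IRQ 0 patch later in this thread deals with on the Linux side.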



On Friday 15 Mar 2002 11:34, Mark Hounschell wrote:
> We have a number of Dell boxes in house. The most recent is a 6400 (quad P4
> box); we also have a couple of dual boxes. They all lock up intermittently when
> the APIC is enabled. There seems to be no reliable way of reproducing the hangs;
> they just hang randomly. With the APIC disabled they do not.
> The app that we have to run on these boxes requires the APIC to be enabled
> for IRQ affinity. It is a soft real-time app that cannot tolerate the
> jitter. It will run, but interrupt latency is unacceptable without IRQ
> affinity set. I've read on many lists that if you have random lockups
> you should disable the APIC. I've also got a number of non-Dell boxes that don't
> exhibit this lockup. Now I've also heard that Dell does not properly set up
> the APIC chip in the BIOS because MS operating systems don't use it. I have
> no idea if this is true or not.
> We are using vanilla kernels (2.4.16) with some process affinity patches
> applied. I've also noticed that on the quad boxes (6400s), IRQs 0, 1 and 2
> cannot be directed to or away from a particular processor. This is also a
> problem for our app. Mostly, though, it's the lockups that occur with the APIC
> enabled that are the roadblock to using these nice Dell boxes for our app.
> Can anyone shed some light on this?
>
> Thanks in advance
> and regards

--
----------------
Tim Kay
systems administrator
Advfn.com Plc - http://www.advfn.com/
[email protected]
Tel: 020 7070 0941
Fax: 020 7070 0959

2002-03-15 16:07:42

by Matt Domsch

Subject: RE: Advanced Programmable Interrupt Controller (APIC)?

> Now I've
> also heard that Dell does not properly set up the APIC chip in
> the BIOS because MS operating systems don't use it. I have no idea if this
> is true or not.

To the best of my knowledge, BIOS and Linux work together to set up the
APICs properly on the PowerEdge 6400 (and all our other servers too). If
someone has proof that we don't, and what should be done instead, please let
me know.

Thanks,
Matt

--
Matt Domsch
Sr. Software Engineer
Dell Linux Solutions http://www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com
#1 US Linux Server provider for 2001! (IDC Mar 2002)

2002-03-15 16:17:51

by Tim Kay

Subject: Re: Advanced Programmable Interrupt Controller (APIC)?

Matt,
I'll repeat this here too:


IO APIC - APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0

The above is a diagnostic from a FreeBSD box at boot; this would seem to
suggest that the motherboard, rather than Linux, is at fault.

Tim

On Friday 15 Mar 2002 16:06, [email protected] wrote:
> > Now I've
> > also heard that Dell does not properly set up the APIC chip in
> > the BIOS because MS operating systems don't use it. I have no idea if this
> > is true or not.
>
> To the best of my knowledge, BIOS and Linux work together to set up the
> APICs properly on the PowerEdge 6400 (and all our other servers too). If
> someone has proof that we don't, and what should be done instead, please
> let me know.
>
> Thanks,
> Matt

--
----------------
Tim Kay
systems administrator
Advfn.com Plc - http://www.advfn.com/
[email protected]
Tel: 020 7070 0941
Fax: 020 7070 0959

2002-03-15 17:28:35

by Mark Hounschell

Subject: Re: Advanced Programmable Interrupt Controller (APIC)?

Tim Kay wrote:
>
> Mark,
> We have a similar problem with PowerEdge 2450s, 1550s and 6400s. All our
> machines run with noapic in the LILO config, which sounds like it
> isn't an option for you. I'd be interested to know where you heard about Dell
> stuffing up the setup of the APIC chip, because we may be able to take this up
> with them.

Search the kernel mailing list archives at kernel.org for "apic" and you will find a number
of them. Of course, this is all hearsay.

> I've had no reply from the list to the message below (maybe it
> would be better posted to Kernel Traffic SMP, but that's a very quiet list).
> Anyway, maybe the BSD diagnostics will help you investigate this.

I have no experience with BSD.

>
> --------------------enc---------------
> Hello,
> Just a quickie: our Dell PowerEdge boxes (ServerWorks motherboard) are
> continually pumping out IO-APIC errors, as I've reported here before. We have
> three of the same boxes running FreeBSD (limitless file descriptors per
> process; sorry, we need it!) and I've just noticed that dmesg on these says
> this:
>
> IO APIC - APIC_IO: Testing 8254 interrupt delivery
> APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
> APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
>
> Does this help anyone diagnose the error?

I see a very similar message, but with the text "Broken_Bios". I don't really know
whether it is related to the problem or not.

One thing I can say for sure is that IRQs 0, 1 and 2 cannot be directed to or away from
any processor on these 6400 boxes. They insist on being bound to all 4 processors.
This, I do believe, is related to the hangs that I have.
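One way to see which CPUs are really taking an interrupt, independent of what smp_affinity reports, is to compare two snapshots of /proc/interrupts, which on SMP kernels shows one count per CPU for each IRQ. A C sketch under stated assumptions: the line layout handled below (IRQ number, colon, one decimal count per CPU, then controller type and device name) is the usual format, the helper names are hypothetical, and the sample counts in the usage note are made up:

```c
#include <stdio.h>

/* Parse one /proc/interrupts line such as
 *   "  0:   1000   1001   IO-APIC-edge  timer"
 * Stores the IRQ in *irq and up to max_cpus counts in counts[].
 * Returns the number of counts read, or -1 if the line does not start
 * with "<number>:" (e.g. the CPU header or the NMI row). */
int parse_interrupts_line(const char *line, int *irq,
                          unsigned long counts[], int max_cpus)
{
    int used = 0, n = 0;

    if (sscanf(line, " %d:%n", irq, &used) != 1 || used == 0)
        return -1;
    line += used;
    while (n < max_cpus) {
        unsigned long c;
        int step = 0;
        if (sscanf(line, " %lu%n", &c, &step) != 1)
            break; /* reached the controller/device text */
        counts[n++] = c;
        line += step;
    }
    return n;
}

/* Bit i is set if CPU i's count grew between the two snapshots. */
unsigned long cpus_servicing(const unsigned long before[],
                             const unsigned long after[], int ncpus)
{
    unsigned long mask = 0;
    int i;

    for (i = 0; i < ncpus && i < (int)(8 * sizeof mask); i++)
        if (after[i] > before[i])
            mask |= 1UL << i;
    return mask;
}
```

For example, if the IRQ 0 line sampled a few seconds apart still yields 0xf from cpus_servicing() after IRQ 0 was restricted to one CPU, the timer is still being delivered to all four CPUs.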

--
Mark Hounschell
[email protected]

2002-03-15 18:49:41

by jak

Subject: Re: Advanced Programmable Interrupt Controller (APIC)?

> IO APIC - APIC_IO: Testing 8254 interrupt delivery
> APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
> APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
>
> The above is a diagnostic from a FreeBSD box at boot; this would seem to
> suggest that the motherboard, rather than Linux, is at fault.
>
>>> Now I've
>>> also heard that Dell does not properly set up the APIC chip in
>>> the BIOS because MS operating systems don't use it. I have no idea if this
>>> is true or not.
>>
>> To the best of my knowledge, BIOS and Linux work together to set up the
>> APICs properly on the PowerEdge 6400 (and all our other servers too). If
>> someone has proof that we don't, and what should be done instead, please
>> let me know.

--------------------

You might try Maciej W. Rozycki's latest IRQ0 patch, included below. It
fixes a similar problem with IRQ0 on the Dell PowerEdge server boxes.

Joe

--------------------

> From: "Maciej W. Rozycki" <[email protected]>
> To: Marcelo Tosatti <[email protected]>,
> Linus Torvalds <[email protected]>
> cc: Ingo Molnar <[email protected]>, Joe Korty <[email protected]>
> Subject: [patch] 2.4.18, 2.5.5: I/O APIC through-8259A mode IRQ 0 routing
> Organization: Technical University of Gdansk

Hello,

There is a problem with the through-8259A mode for IRQ 0 on I/O APIC
systems. Depending on the correctness of the MP table, the IRQ 0 routing is
either not registered at all or registered at the wrong pin. As a result, the
8254 timer IRQ only works by accident (it's edge-triggered and never
disabled or enabled, so it happens to survive this incorrect configuration).
A visible effect is that you can't change the affinity for IRQ 0.
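Why a wrong pin registration freezes IRQ 0's affinity can be illustrated with a simplified user-space model (not kernel code). Assumptions, all hypothetical: a single 24-pin I/O APIC, each redirection entry reduced to a destination CPU mask, and an affinity change that reprograms only the pins recorded for the IRQ in its pin list (our reading of how the irq_2_pin chains are used). Here the pin list says pin 2, while the timer actually arrives through the 8259A on pin 0:

```c
#define NR_PINS 24          /* assumption: one 24-pin I/O APIC */
#define MAX_PINS_PER_IRQ 4

/* Simplified redirection table: one destination CPU mask per pin. */
static unsigned long rte_dest[NR_PINS];

/* Simplified list of the pins registered for one IRQ. */
struct irq_pins {
    int n;
    int pin[MAX_PINS_PER_IRQ];
};

/* Start with every pin delivering to all CPUs in `all_cpus`. */
void reset_model(unsigned long all_cpus)
{
    int i;
    for (i = 0; i < NR_PINS; i++)
        rte_dest[i] = all_cpus;
}

/* An affinity change reprograms only the registered pins; a pin that
 * is missing from the list keeps its old destination. */
void set_affinity_model(const struct irq_pins *p, unsigned long mask)
{
    int i;
    for (i = 0; i < p->n; i++)
        rte_dest[p->pin[i]] = mask;
}
```

With reset_model(0xf) and IRQ 0 registered as { 1, { 2 } }, set_affinity_model(&irq0, 0x1) changes pin 2 only; pin 0, which really carries the timer, keeps delivering to all four CPUs, matching the IRQ 0 behaviour described earlier in this thread.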

Following is a patch that fixes both cases referred to above. The code
looks obvious, but it was additionally run-time tested just in case. The
issue is serious, so please apply the patch ASAP. As no changes have been made
to io_apic.c since the development fork, the patch applies cleanly to both
2.4 and 2.5.

Credit goes to Joe for discovering the affinity problem and providing a
fix proposal (incorporated in the final one).

Maciej

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

patch-2.4.18-irq0_pin-1
diff -up --recursive --new-file linux-2.4.18.macro/arch/i386/kernel/io_apic.c linux-2.4.18/arch/i386/kernel/io_apic.c
--- linux-2.4.18.macro/arch/i386/kernel/io_apic.c Fri Nov 23 15:32:04 2001
+++ linux-2.4.18/arch/i386/kernel/io_apic.c Fri Mar 1 14:58:20 2002
@@ -67,7 +67,7 @@ static struct irq_pin_list {
  * shared ISA-space IRQs, so we have to support them. We are super
  * fast in the common case, and fast for shared ISA-space IRQs.
  */
-static void add_pin_to_irq(unsigned int irq, int apic, int pin)
+static void __init add_pin_to_irq(unsigned int irq, int apic, int pin)
 {
 	static int first_free_entry = NR_IRQS;
 	struct irq_pin_list *entry = irq_2_pin + irq;
@@ -85,6 +85,26 @@ static void add_pin_to_irq(unsigned int
 	entry->pin = pin;
 }
 
+/*
+ * Reroute an IRQ to a different pin.
+ */
+static void __init replace_pin_at_irq(unsigned int irq,
+				      int oldapic, int oldpin,
+				      int newapic, int newpin)
+{
+	struct irq_pin_list *entry = irq_2_pin + irq;
+
+	while (1) {
+		if (entry->apic == oldapic && entry->pin == oldpin) {
+			entry->apic = newapic;
+			entry->pin = newpin;
+		}
+		if (!entry->next)
+			break;
+		entry = irq_2_pin + entry->next;
+	}
+}
+
 #define __DO_ACTION(R, ACTION, FINAL) \
  \
 { \
@@ -1533,6 +1553,10 @@ static inline void check_timer(void)
 		setup_ExtINT_IRQ0_pin(pin2, vector);
 		if (timer_irq_works()) {
 			printk("works.\n");
+			if (pin1 != -1)
+				replace_pin_at_irq(0, 0, pin1, 0, pin2);
+			else
+				add_pin_to_irq(0, 0, pin2);
 			if (nmi_watchdog == NMI_IO_APIC) {
 				setup_nmi();
 				check_nmi_watchdog();
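The two branches the patch adds to check_timer() can be exercised in a user-space model. This is a sketch, not the kernel code: the fixed-size array stands in for the irq_2_pin chains, the names add_pin(), replace_pin() and record_timer_pin() are hypothetical, and pin1/pin2 follow the patch's naming, with pin1 being the MP table's pin for IRQ 0 (-1 when the table has none) and pin2 the pin on I/O APIC 0 that worked in through-8259A mode:

```c
#define MAX_PINS_PER_IRQ 4

/* Simplified stand-in for one IRQ's irq_2_pin chain. */
struct pin_entry { int apic, pin; };
struct pin_list { int n; struct pin_entry e[MAX_PINS_PER_IRQ]; };

/* Like add_pin_to_irq(): append an (apic, pin) pair. */
int add_pin(struct pin_list *pl, int apic, int pin)
{
    if (pl->n == MAX_PINS_PER_IRQ)
        return -1;
    pl->e[pl->n].apic = apic;
    pl->e[pl->n].pin = pin;
    pl->n++;
    return 0;
}

/* Like replace_pin_at_irq(): rewrite every matching (apic, pin). */
void replace_pin(struct pin_list *pl, int oldapic, int oldpin,
                 int newapic, int newpin)
{
    int i;
    for (i = 0; i < pl->n; i++)
        if (pl->e[i].apic == oldapic && pl->e[i].pin == oldpin) {
            pl->e[i].apic = newapic;
            pl->e[i].pin = newpin;
        }
}

/* The patch's new logic once the timer works via pin2: correct a
 * wrong registration, or add the missing one. */
void record_timer_pin(struct pin_list *irq0, int pin1, int pin2)
{
    if (pin1 != -1)
        replace_pin(irq0, 0, pin1, 0, pin2);
    else
        (void)add_pin(irq0, 0, pin2);
}
```

Both cases from the patch description end with IRQ 0 registered at the pin that actually carries the timer, which is what lets a later affinity change reach the right redirection entry.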