Hi everyone,
The following patch fixes a bug that prevents a write to
/proc/irq/0/smp_affinity from actually changing the cpu affinity
of IRQ #0, on all the (Dell server) SMP machines I have access to.
Given the wide variety of IO APIC and legacy PIC usage on various SMP
motherboards, and the nascent state of my APIC understanding, it is
quite likely that this fix is not universal.
I would like to expand this patch so that IRQ0 affinity assignment works
properly on as many i386 SMP motherboards as possible. If you have
such a motherboard, please 1) verify whether assignments to
/proc/irq/0/smp_affinity are a no-op for you, and 2) if they are, check
whether this patch fixes the problem on your system.
To check whether your system has the problem:

  1) In one window, run `watch -n1 cat /proc/interrupts'.
  2) In another window, assign some affinity value to irq0. In the
     following example, cpu #0 (in a 4-cpu system) is to no longer get
     irq0 interrupts:

       echo e >/proc/irq/0/smp_affinity

If your system is working properly, the watch window should no
longer show increments in the irq0 count for cpu0.
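
The value written to smp_affinity is a hexadecimal bitmask with one bit
per cpu, so `e' (binary 1110) in the example above allows cpus 1-3 and
excludes cpu 0. A minimal sketch of how such a mask could be computed
follows; the helper names are made up for illustration and are not
kernel functions, and the sketch assumes fewer than 32 cpus:

```c
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel function): build an smp_affinity
 * bitmask that allows every cpu in [0, ncpus) except 'excluded'.
 * Bit N of the mask corresponds to cpu N.
 * Assumes 0 < ncpus < 32 and excluded < 32.
 */
static unsigned int affinity_mask_excluding(unsigned int ncpus,
					    unsigned int excluded)
{
	unsigned int all = (1u << ncpus) - 1;	/* one bit per cpu */

	return all & ~(1u << excluded);
}

/* Print the mask in the bare hex form used in the echo example. */
static void print_affinity_mask(unsigned int mask)
{
	printf("%x\n", mask);
}
```

For the 4-cpu example, affinity_mask_excluding(4, 0) yields 0xe, which
is the value echoed into /proc/irq/0/smp_affinity.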
This patch is against 2.4.18-rc4
Joe
--- linux/arch/i386/kernel/io_apic.c.orig Tue Nov 13 20:28:41 2001
+++ linux/arch/i386/kernel/io_apic.c Mon Feb 25 13:17:13 2002
@@ -1537,6 +1537,7 @@
 				setup_nmi();
 				check_nmi_watchdog();
 			}
+			add_pin_to_irq(0, 0, pin2);
 			return;
 		}
 		/*
On Mon, 25 Feb 2002, Joe Korty wrote:
> The following patch fixes a bug that prevents a write to
> /proc/irq/0/smp_affinity from actually changing the cpu affinity
> of IRQ #0, on all the (Dell server) SMP machines I have access to.
Nicely spotted. However, what you describe is only a side effect of the
bug, which is that the IRQ is kept registered at the wrong pin. It went
unnoticed for so long only because the timer is special and
edge-triggered.
I propose the following patch. Instead of adding the new pin to the IRQ
0 registry unconditionally, it replaces the already registered pin if one
exists; otherwise it adds the new one. The reason is to remove the
reference to the old pin, which may be connected to an unknown device or
simply dangling, and weird things may happen if it ever gets unmasked.
There is also a small performance impact of keeping two pins registered
for a single IRQ.
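
To illustrate the difference between adding and replacing, here is a
simplified user-space model of the per-IRQ pin chain. It is only a
sketch: the model_* names and table sizes are invented for this example,
there is no overflow check, and it merely mimics the chain layout that
the patch below manipulates (entries linked through an index, with 0
terminating the chain and pin -1 marking an unused slot).

```c
#define MODEL_NR_IRQS	16	/* assumed size, for illustration only */
#define MODEL_MAP_SIZE	(2 * MODEL_NR_IRQS)

/* Simplified stand-in for the kernel's irq_2_pin table. */
struct model_pin {
	int apic, pin, next;	/* next == 0 terminates the chain */
};

static struct model_pin model_map[MODEL_MAP_SIZE];
static int model_first_free;

/* Reset the table: every slot unused, spare slots start after the IRQs. */
static void model_init(void)
{
	int i;

	for (i = 0; i < MODEL_MAP_SIZE; i++) {
		model_map[i].apic = 0;
		model_map[i].pin = -1;	/* -1 marks an unused slot */
		model_map[i].next = 0;
	}
	model_first_free = MODEL_NR_IRQS;
}

/* Append a pin to the chain of 'irq' (the unconditional approach). */
static void model_add_pin(int irq, int apic, int pin)
{
	struct model_pin *e = &model_map[irq];

	while (e->next)
		e = &model_map[e->next];
	if (e->pin != -1) {	/* head already used: take a spare slot */
		e->next = model_first_free++;
		e = &model_map[e->next];
	}
	e->apic = apic;
	e->pin = pin;
}

/* Overwrite a matching old pin in place (the rerouting approach). */
static void model_replace_pin(int irq, int oldapic, int oldpin,
			      int newapic, int newpin)
{
	struct model_pin *e = &model_map[irq];

	for (;;) {
		if (e->apic == oldapic && e->pin == oldpin) {
			e->apic = newapic;
			e->pin = newpin;
		}
		if (!e->next)
			break;
		e = &model_map[e->next];
	}
}

/* Count how many pins are registered for 'irq'. */
static int model_count_pins(int irq)
{
	const struct model_pin *e = &model_map[irq];
	int n = 0;

	for (;;) {
		if (e->pin != -1)
			n++;
		if (!e->next)
			break;
		e = &model_map[e->next];
	}
	return n;
}
```

In this model, registering pin 2 for IRQ 0 and then adding pin 0 leaves
two pins on the chain, including the stale one, whereas replacing pin 2
with pin 0 leaves exactly one.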
I don't know if the changes are relevant to your system as you haven't
sent the relevant fragment of a bootstrap log from your system. They
affect all systems that have a bogus IRQ 0 entry in the MP table. For
other systems the patch is equivalent to yours. Please test whether it
works for you, as I don't have a suitable system with IRQ 0 unconnected
(I have only been able to verify that it builds). Everyone with such a
system is invited to test the patch as well.
If results are positive, the patch will be submitted as is for inclusion
into 2.4 and 2.5.
Maciej
--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +
patch-2.4.18-irq0_pin-1
diff -up --recursive --new-file linux-2.4.18.macro/arch/i386/kernel/io_apic.c linux-2.4.18/arch/i386/kernel/io_apic.c
--- linux-2.4.18.macro/arch/i386/kernel/io_apic.c Fri Nov 23 15:32:04 2001
+++ linux-2.4.18/arch/i386/kernel/io_apic.c Fri Mar 1 14:58:20 2002
@@ -67,7 +67,7 @@ static struct irq_pin_list {
  * shared ISA-space IRQs, so we have to support them. We are super
  * fast in the common case, and fast for shared ISA-space IRQs.
  */
-static void add_pin_to_irq(unsigned int irq, int apic, int pin)
+static void __init add_pin_to_irq(unsigned int irq, int apic, int pin)
 {
 	static int first_free_entry = NR_IRQS;
 	struct irq_pin_list *entry = irq_2_pin + irq;
@@ -85,6 +85,26 @@ static void add_pin_to_irq(unsigned int
 	entry->pin = pin;
 }
 
+/*
+ * Reroute an IRQ to a different pin.
+ */
+static void __init replace_pin_at_irq(unsigned int irq,
+				      int oldapic, int oldpin,
+				      int newapic, int newpin)
+{
+	struct irq_pin_list *entry = irq_2_pin + irq;
+
+	while (1) {
+		if (entry->apic == oldapic && entry->pin == oldpin) {
+			entry->apic = newapic;
+			entry->pin = newpin;
+		}
+		if (!entry->next)
+			break;
+		entry = irq_2_pin + entry->next;
+	}
+}
+
 #define __DO_ACTION(R, ACTION, FINAL) \
  \
 { \
@@ -1533,6 +1553,10 @@ static inline void check_timer(void)
 		setup_ExtINT_IRQ0_pin(pin2, vector);
 		if (timer_irq_works()) {
 			printk("works.\n");
+			if (pin1 != -1)
+				replace_pin_at_irq(0, 0, pin1, 0, pin2);
+			else
+				add_pin_to_irq(0, 0, pin2);
 			if (nmi_watchdog == NMI_IO_APIC) {
 				setup_nmi();
 				check_nmi_watchdog();
> On Mon, 25 Feb 2002, Joe Korty wrote:
>
>> The following patch fixes a bug that prevents a write to
>> /proc/irq/0/smp_affinity from actually changing the cpu affinity
>> of IRQ #0, on all the (Dell server) SMP machines I have access to.
>
> Nicely spotted. However, what you describe is only a side effect of the
> bug, which is that the IRQ is kept registered at the wrong pin. It went
> unnoticed for so long only because the timer is special and
> edge-triggered.
>
> I propose the following patch. [...]
>
> I don't know if the changes are relevant to your system as you haven't
> sent the relevant fragment of a bootstrap log from your system. They
> affect all systems that have a bogus IRQ 0 entry in the MP table. For
> other systems the patch is equivalent to yours. Please test whether it
> works for you, as I don't have a suitable system with IRQ 0 unconnected
> (I have only been able to verify that it builds). Everyone with such a
> system is invited to test the patch as well.
Thanks. Your proposed patch works fine on the two of my five SMP systems
I have been able to get my hands on this afternoon.
Joe
PS: My original dmesg logs may now be found at
http://www.mindspring.com/~jakorty/irq0.bugreport.orig.
On Fri, 1 Mar 2002, Joe Korty wrote:
> PS: My original dmesg logs may now be found at
> http://www.mindspring.com/~jakorty/irq0.bugreport.orig.
If you only have a line similar to this one:
..TIMER: vector=0x31 pin1=2 pin2=0
then a normal I/O APIC interrupt (pin1) is used and the patch is
irrelevant.
If you have lines like these:
..TIMER: vector=0x31 pin1=-1 pin2=0
...trying to set up timer (IRQ0) through the 8259A ...
..... (found pin 0) ...works.
then IRQ 0 has not been registered yet (pin1 is -1), and add_pin_to_irq()
(the call added by your patch) is invoked for pin2 in the ordinary way,
as for other interrupts.
But if you have lines like these:
..TIMER: vector=0x31 pin1=2 pin2=0
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ...
..... (found pin 0) ...works.
then IRQ 0 needs to be rerouted from pin1 to pin2, and replace_pin_at_irq()
is intended to do so. I'd be pleased to hear from someone with such a
system (they are surprisingly common); I'll simulate such a
configuration with my development system anyway.
Other timer configurations (there are two more, sigh) don't matter, as they
don't route IRQ 0 via an I/O APIC. They are very rare as well.
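
The three boot-log cases above boil down to a small decision, which can
be sketched as follows. This is only an illustration of the reasoning:
the enum and function names are hypothetical, and the "works" flags
stand in for the outcome of the timer test on each pin.

```c
/* Hypothetical names, for illustration only; not kernel identifiers. */
enum irq0_fixup {
	IRQ0_FIXUP_NONE,	/* pin1 works, or IRQ 0 bypasses the I/O APIC */
	IRQ0_FIXUP_ADD,		/* nothing registered yet: add pin2 */
	IRQ0_FIXUP_REPLACE	/* bogus pin1 registered: reroute to pin2 */
};

/*
 * Decide what has to happen to the IRQ 0 pin registry.
 * pin1/pin2 are the values from the "..TIMER:" line (-1 means absent);
 * pin1_works/pin2_works say whether the timer ticked through that pin.
 */
static enum irq0_fixup irq0_fixup_for(int pin1, int pin1_works,
				      int pin2, int pin2_works)
{
	/* Case 1: a normal I/O APIC interrupt is used; nothing to fix. */
	if (pin1 != -1 && pin1_works)
		return IRQ0_FIXUP_NONE;

	/* Cases 2 and 3: the timer works through the 8259A on pin2. */
	if (pin2 != -1 && pin2_works)
		return pin1 != -1 ? IRQ0_FIXUP_REPLACE : IRQ0_FIXUP_ADD;

	/* Remaining configurations do not route IRQ 0 via an I/O APIC. */
	return IRQ0_FIXUP_NONE;
}
```

With the values from the logs above, pin1=2 working gives no fixup,
pin1=-1 with pin2=0 working gives an add, and pin1=2 failing with pin2=0
working gives a replace.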
Maciej
--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +