2003-03-01 06:17:37

by Zwane Mwaikambo

Subject: [PATCH][2.5] why noirqbalance doesn't work

This patch fixes what seems to have been a longstanding bug. Ever since we
moved cpu bringup later into the boot process, we end up programming the
ioapics before any of the secondary cpus are in the cpu_online_map.
This leads to the following situation:

For walmart-smp, bigsmp and summit we set the logical destination for cpu
to TARGET_CPUS which can depend on the cpu_online_map, so what you would
normally see with noirqbalance would be all interrupts handled on cpu0
since at that stage no other cpu apart from the BSP is online.

You can check for this by looking at the ioredtbls at boot time on a
two-way system:

.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00  1    0    0   0   0    0    0    00
01 001 01  0    0    0   0   0    1    1    39
02 001 01  0    0    0   0   0    1    1    31
03 001 01  0    0    0   0   0    1    1    41
04 001 01  0    0    0   0   0    1    1    49
05 001 01  0    0    0   0   0    1    1    51
06 001 01  0    0    0   0   0    1    1    59

Notice that 'Log' is set to 1 (cpu0 only) instead of 3 (cpu0 and cpu1).
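
As a rough illustration of where those two values come from, here is a
minimal sketch. It is not kernel code: the helper name is hypothetical, and
it assumes flat logical mode, where cpu N answers to bit N of the 8-bit
logical destination.

```c
/* Hypothetical illustration, not a kernel function. Assumes flat logical
 * mode: cpu N owns bit N of the 8-bit logical destination field. */
static unsigned int logical_dest_for(unsigned long online_map)
{
    /* Only the low 8 bits fit into the destination field. */
    return (unsigned int)(online_map & 0xffUL);
}
```

With only the BSP online the map is 0x1, which gives the 001 seen in the
'Log' column above. Once cpu1 is online the map is 0x3, which is the value
the table should contain on a two-way box.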

This patch will simply reprogram all the ioredtbls to handle the other
online cpus.
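
The reprogramming step can be sketched in isolation as a read-modify-write
of each redirection entry. This is only an illustration under stated
assumptions: the register array, the fake_read/fake_write helpers and the
NR_PINS value are made up to stand in for real IO-APIC access, and unlike
the patch it only writes back the high word, since the destination field
lives in bits 56-63 of the 64-bit entry.

```c
#include <stdint.h>

#define NR_PINS 24  /* assumed pin count for the simulated IO-APIC */

/* Simulated register file standing in for real IO-APIC register access. */
static uint32_t fake_regs[0x10 + 2 * NR_PINS];

static uint32_t fake_read(unsigned int reg)
{
    return fake_regs[reg];
}

static void fake_write(unsigned int reg, uint32_t val)
{
    fake_regs[reg] = val;
}

/* Replace the destination field of every entry and keep the other bits.
 * Entry N uses registers 0x10 + 2*N (low word) and 0x11 + 2*N (high word). */
static void set_logical_dest_all(uint8_t mask)
{
    unsigned int pin;

    for (pin = 0; pin < NR_PINS; pin++) {
        unsigned int hi_reg = 0x11 + 2 * pin;
        uint32_t hi = fake_read(hi_reg);

        /* Destination is the top byte of the high word. */
        hi = (hi & 0x00ffffffu) | ((uint32_t)mask << 24);
        fake_write(hi_reg, hi);
    }
}
```

Calling set_logical_dest_all(0x03) on this simulated table changes only the
destination byte of each entry; vector, trigger and mask bits in the low
word are left as they were.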

Patch tested on my 2-way P2-400 and a 16-way NUMAQ, both with noirqbalance.
It will not affect the irqbalance case because we are simply setting
TARGET_CPUS, which is done anyway.

before:
           CPU0       CPU1
  0:    1495632          0   IO-APIC-edge  timer
  1:       4270          0   IO-APIC-edge  i8042
  2:          0          0         XT-PIC  cascade
  8:          1          0   IO-APIC-edge  rtc
 12:      83592          0   IO-APIC-edge  i8042
 14:      93791          0   IO-APIC-edge  ide0
 15:     103167          0   IO-APIC-edge  ide1
 17:    1396088          0  IO-APIC-level  EMU10K1, eth0
 18:      56125          0  IO-APIC-level  aic7xxx, aic7xxx
 19:       2258          0  IO-APIC-level  uhci-hcd, eth1, serial
NMI:          0          0
LOC:    1495566    1497133

after:
           CPU0       CPU1
  0:    1046157    1015670   IO-APIC-edge  timer
  1:       4923       4173   IO-APIC-edge  i8042
  2:          0          0         XT-PIC  cascade
  8:          1          0   IO-APIC-edge  rtc
 12:      48596      48968   IO-APIC-edge  i8042
 14:       4238       3416   IO-APIC-edge  ide0
 15:      25362      31525   IO-APIC-edge  ide1
 17:       3757       4014  IO-APIC-level  EMU10K1, eth0
 18:        335        366  IO-APIC-level  aic7xxx, aic7xxx
 19:       1052        908  IO-APIC-level  uhci-hcd, eth1
NMI:          0          0
LOC:    2061856    2061893

Index: linux-2.5.63-DBE/arch/i386/kernel/io_apic.c
===================================================================
RCS file: /build/cvsroot/linux-2.5.63/arch/i386/kernel/io_apic.c,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 io_apic.c
--- linux-2.5.63-DBE/arch/i386/kernel/io_apic.c 27 Feb 2003 22:03:36 -0000 1.1.1.1
+++ linux-2.5.63-DBE/arch/i386/kernel/io_apic.c 1 Mar 2003 06:22:57 -0000
@@ -194,6 +194,31 @@
clear_IO_APIC_pin(apic, pin);
}

+/*
+ * This function currently is only a helper for the i386 smp boot process where
+ * we need to reprogram the ioredtbls to cater for the cpus which have come online
+ * so mask in all cases should simply be TARGET_CPUS
+ */
+void __devinit set_ioapic_logical_dest (unsigned long mask)
+{
+ struct IO_APIC_route_entry entry;
+ unsigned long flags;
+ int apic, pin;
+
+ spin_lock_irqsave(&ioapic_lock, flags);
+ for (apic = 0; apic < nr_ioapics; apic++) {
+ for (pin = 0; pin < nr_ioapic_registers[apic]; pin++) {
+ *(((int *)&entry)+0) = io_apic_read(apic, 0x10+pin*2);
+ *(((int *)&entry)+1) = io_apic_read(apic, 0x11+pin*2);
+ entry.dest.logical.logical_dest = mask;
+ io_apic_write(apic, 0x10 + 2 * pin, *(((int *)&entry) + 0));
+ io_apic_write(apic, 0x11 + 2 * pin, *(((int *)&entry) + 1));
+ }
+
+ }
+ spin_unlock_irqrestore(&ioapic_lock, flags);
+}
+
static void set_ioapic_affinity (unsigned int irq, unsigned long mask)
{
unsigned long flags;
Index: linux-2.5.63-DBE/arch/i386/kernel/smpboot.c
===================================================================
RCS file: /build/cvsroot/linux-2.5.63/arch/i386/kernel/smpboot.c,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 smpboot.c
--- linux-2.5.63-DBE/arch/i386/kernel/smpboot.c 27 Feb 2003 22:03:36 -0000 1.1.1.1
+++ linux-2.5.63-DBE/arch/i386/kernel/smpboot.c 1 Mar 2003 05:37:20 -0000
@@ -1152,8 +1152,10 @@
return 0;
}

+extern void set_ioapic_logical_dest(unsigned long mask);
void __init smp_cpus_done(unsigned int max_cpus)
{
+ set_ioapic_logical_dest(TARGET_CPUS);
zap_low_mappings();
}


--
function.linuxpower.ca


2003-03-01 06:53:29

by Zwane Mwaikambo

Subject: Re: [PATCH][2.5] why noirqbalance doesn't work

This should fix the noapic case with the patch applied.

Index: linux-2.5.63-DBE/arch/i386/kernel/io_apic.c
===================================================================
RCS file: /build/cvsroot/linux-2.5.63/arch/i386/kernel/io_apic.c,v
retrieving revision 1.2
diff -u -r1.2 io_apic.c
--- linux-2.5.63-DBE/arch/i386/kernel/io_apic.c 1 Mar 2003 06:52:16 -0000 1.2
+++ linux-2.5.63-DBE/arch/i386/kernel/io_apic.c 1 Mar 2003 06:52:25 -0000
@@ -205,6 +205,9 @@
unsigned long flags;
int apic, pin;

+ if (skip_ioapic_setup == 1)
+ return;
+
spin_lock_irqsave(&ioapic_lock, flags);
for (apic = 0; apic < nr_ioapics; apic++) {
for (pin = 0; pin < nr_ioapic_registers[apic]; pin++) {

--
function.linuxpower.ca

2003-03-01 08:49:34

by Willy Tarreau

Subject: [PATCH][2.4] APIC irq balance

On Sat, Mar 01, 2003 at 01:25:50AM -0500, Zwane Mwaikambo wrote:
> This patch fixes what seems to have been a longstanding bug. Ever since we
> moved cpu bringup later into the boot process, we end up programming the
> ioapics before any of the secondary cpus are in the cpu_online_map.
> This leads to the following situation:

Hi Zwane!

I've had the same problem on 2.4 since 2.4.21-pre1, but I couldn't find the
culprit. I've ported your patch to 2.4.21-pre5 and guess what? It works, as
shown below. I'd like Maciej to review it quickly (if he has time), so that
Marcelo could include it in 2.4.21. Patch at the end.

Oh, I forgot to say: it's on an Asus A7M266-D, dual XP1800.

Anyway, congratulations on this find!

Cheers,
Willy

----- dmesg:

NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00  1    0    0   0   0    0    0    00
01 003 03  0    0    0   0   0    1    1    39
02 003 03  0    0    0   0   0    1    1    31
03 003 03  0    0    0   0   0    1    1    41
04 003 03  0    0    0   0   0    1    1    49
05 003 03  0    0    0   0   0    1    1    51
06 003 03  0    0    0   0   0    1    1    59
07 003 03  0    0    0   0   0    1    1    61
08 003 03  0    0    0   0   0    1    1    69
09 003 03  0    0    0   0   0    1    1    71
0a 003 03  1    1    0   1   0    1    1    79
0b 003 03  1    1    0   1   0    1    1    81
0c 003 03  1    1    0   1   0    1    1    89
0d 003 03  0    0    0   0   0    1    1    91
0e 003 03  1    1    0   1   0    1    1    99
0f 003 03  0    0    0   0   0    1    1    A1

----- /proc/interrupts:

           CPU0       CPU1
  0:       5001       4156   IO-APIC-edge  timer
  1:        188        125   IO-APIC-edge  keyboard
  2:          0          0         XT-PIC  cascade
  8:          0          1   IO-APIC-edge  rtc
 10:        222        210  IO-APIC-level  usb-ohci, eth0
 11:          0          0  IO-APIC-level  usb-ohci
 12:       9099       8796  IO-APIC-level  aic7xxx
 15:          2          4   IO-APIC-edge  ide1
NMI:          0          0
LOC:       9085       9084
ERR:          0
MIS:          0

----- patch


diff -urN linux-2.4.21-pre5/arch/i386/kernel/io_apic.c linux-2.4.21-pre5-apic/arch/i386/kernel/io_apic.c
--- linux-2.4.21-pre5/arch/i386/kernel/io_apic.c Sat Feb 1 19:42:12 2003
+++ linux-2.4.21-pre5-apic/arch/i386/kernel/io_apic.c Sat Mar 1 09:38:18 2003
@@ -1313,6 +1313,34 @@

static void mask_and_ack_level_ioapic_irq (unsigned int irq) { /* nothing */ }

+/*
+ * This function currently is only a helper for the i386 smp boot process where
+ * we need to reprogram the ioredtbls to cater for the cpus which have come online
+ * so mask in all cases should simply be TARGET_CPUS
+ */
+void __devinit set_ioapic_logical_dest (unsigned long mask)
+{
+ struct IO_APIC_route_entry entry;
+ unsigned long flags;
+ int apic, pin;
+
+ if (skip_ioapic_setup == 1)
+ return;
+
+ spin_lock_irqsave(&ioapic_lock, flags);
+ for (apic = 0; apic < nr_ioapics; apic++) {
+ for (pin = 0; pin < nr_ioapic_registers[apic]; pin++) {
+ *(((int *)&entry)+0) = io_apic_read(apic, 0x10+pin*2);
+ *(((int *)&entry)+1) = io_apic_read(apic, 0x11+pin*2);
+ entry.dest.logical.logical_dest = mask;
+ io_apic_write(apic, 0x10 + 2 * pin, *(((int *)&entry) + 0));
+ io_apic_write(apic, 0x11 + 2 * pin, *(((int *)&entry) + 1));
+ }
+
+ }
+ spin_unlock_irqrestore(&ioapic_lock, flags);
+}
+
static void set_ioapic_affinity (unsigned int irq, unsigned long mask)
{
unsigned long flags;
diff -urN linux-2.4.21-pre5/arch/i386/kernel/smpboot.c linux-2.4.21-pre5-apic/arch/i386/kernel/smpboot.c
--- linux-2.4.21-pre5/arch/i386/kernel/smpboot.c Sat Feb 1 19:42:12 2003
+++ linux-2.4.21-pre5-apic/arch/i386/kernel/smpboot.c Sat Mar 1 09:41:38 2003
@@ -971,6 +971,8 @@
extern int prof_old_multiplier[NR_CPUS];
extern int prof_counter[NR_CPUS];

+extern void set_ioapic_logical_dest(unsigned long mask);
+
static int boot_cpu_logical_apicid;
/* Where the IO area was mapped on multiquad, always 0 otherwise */
void *xquad_portio;
@@ -1223,5 +1225,6 @@
synchronize_tsc_bp();

smp_done:
+ set_ioapic_logical_dest(cpu_online_map);
zap_low_mappings();
}


2003-03-01 10:11:52

by Zwane Mwaikambo

Subject: Re: [PATCH][2.4] APIC irq balance

On Sat, 1 Mar 2003, Willy Tarreau wrote:

> Hi Zwane!
>
> I've had the same problem on 2.4 since 2.4.21-pre1, but I couldn't find the
> culprit. I've ported your patch to 2.4.21-pre5 and guess what? It works, as
> shown below. I'd like Maciej to review it quickly (if he has time), so that
> Marcelo could include it in 2.4.21. Patch at the end.

Well, that's interesting. I couldn't find a suspicious hunk, but that could
be because of peripheral noise in the patch.

> Oh, I forgot to say: it's on an Asus A7M266-D, dual XP1800.
>
> Anyway, congratulations on this find!

Thanks =) I'll wait on Maciej, especially for 2.4.

Zwane
--
function.linuxpower.ca