2008-06-09 14:19:24

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 0/15] Improve x86 smpboot integration

Ingo,

First of all, sorry for being away for so long ;-)

Here it goes a series of improvements for the x86 smpboot integration.
The final goal is the same: To reduce the difference between architectures,
just this time I do a little bit more ;-)

Basically, this one reduces greatly the ifdef count:

[root@t60 linux-2.6-x86]# git-show 674ce:arch/x86/kernel/smpboot.c \
| grep -e "^#ifdef \+CONFIG_X86_\(32\|64\)" | wc -l
18
[root@t60 linux-2.6-x86]# git-show HEAD:arch/x86/kernel/smpboot.c \
| grep -e "^#ifdef \+CONFIG_X86_\(32\|64\)" | wc -l
8

The remaining ones are _mainly_ (not all) due to the fact that x86_64 uses the pda,
while i386 goes with normal per-cpu data. They are things like:

per_cpu(current_task, cpu) = c_idle.idle; vs cpu_pda(cpu)->pcurrent = c_idle.idle;

Also, there is the low mappings piece of code Hugh detected. To that, I intend
to also find a common base between them. Just I'm not doing it in this series, because
it deserves special attention.

As usual, this series was compiled tested in a whole bunch of different configs ( ~ 10
for each architecture), including all i386 variants. Boot tested in all my hardware.

The final diffstat is:

arch/x86/kernel/head_32.S | 2
arch/x86/kernel/head_64.S | 48 +---------------
arch/x86/kernel/io_apic_32.c | 2
arch/x86/kernel/setup64.c | 1
arch/x86/kernel/smpboot.c | 97 ++++------------------------------
b/arch/x86/kernel/acpi/sleep.c | 2
b/arch/x86/kernel/apic_32.c | 6 --
b/arch/x86/kernel/apic_64.c | 10 +++
b/arch/x86/kernel/head_32.S | 6 +-
b/arch/x86/kernel/head_64.S | 5 +
b/arch/x86/kernel/io_apic_32.c | 5 +
b/arch/x86/kernel/io_apic_64.c | 9 ++-
b/arch/x86/kernel/process_32.c | 16 +++++
b/arch/x86/kernel/setup64.c | 5 -
b/arch/x86/kernel/setup_64.c | 27 +++++++++
b/arch/x86/kernel/smpboot.c | 8 --
b/arch/x86/kernel/traps_32.c | 3 -
b/arch/x86/kernel/x8664_ksyms_64.c | 5 -
b/arch/x86/mach-voyager/voyager_smp.c | 2
b/include/asm-x86/desc.h | 24 +++-----
b/include/asm-x86/hw_irq.h | 3 -
b/include/asm-x86/numa_32.h | 1
b/include/asm-x86/segment.h | 24 ++++----
b/include/asm-x86/smp.h | 2
include/asm-x86/hw_irq.h | 4 -
25 files changed, 122 insertions(+), 195 deletions(-)


2008-06-09 14:18:40

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 02/15] x86: don't use gdt_page openly.

There's a macro available for that.

Signed-off-by: Glauber Costa <[email protected]>
---
arch/x86/kernel/traps_32.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/traps_32.c b/arch/x86/kernel/traps_32.c
index a92b0c7..9ad896d 100644
--- a/arch/x86/kernel/traps_32.c
+++ b/arch/x86/kernel/traps_32.c
@@ -1135,7 +1135,7 @@ void do_spurious_interrupt_bug(struct pt_regs *regs, long error_code)

unsigned long patch_espfix_desc(unsigned long uesp, unsigned long kesp)
{
- struct desc_struct *gdt = __get_cpu_var(gdt_page).gdt;
+ struct desc_struct *gdt = get_cpu_gdt_table(smp_processor_id());
unsigned long base = (kesp - uesp) & -THREAD_SIZE;
unsigned long new_kesp = kesp - base;
unsigned long lim_pages = (new_kesp | (THREAD_SIZE - 1)) >> PAGE_SHIFT;
--
1.5.4.5

2008-06-09 14:18:56

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 01/15] x86: use stack_start in x86_64

call x86_64's init_rsp stack_start, just as i386 does.
Put a zeroed stack segment for consistency. With this,
we can eliminate one ugly ifdef in smpboot.c.

Signed-off-by: Glauber Costa <[email protected]>
---
arch/x86/kernel/acpi/sleep.c | 2 +-
arch/x86/kernel/head_64.S | 5 +++--
arch/x86/kernel/smpboot.c | 7 +------
3 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/acpi/sleep.c b/arch/x86/kernel/acpi/sleep.c
index afc25ee..0103af2 100644
--- a/arch/x86/kernel/acpi/sleep.c
+++ b/arch/x86/kernel/acpi/sleep.c
@@ -72,7 +72,7 @@ int acpi_save_state_mem(void)
saved_magic = 0x12345678;
#else /* CONFIG_64BIT */
header->trampoline_segment = setup_trampoline() >> 4;
- init_rsp = (unsigned long)temp_stack + 4096;
+ stack_start.sp = temp_stack + 4096;
initial_code = (unsigned long)wakeup_long64;
saved_magic = 0x123456789abcdef0;
#endif /* CONFIG_64BIT */
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d8ed325..4949c32 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -191,7 +191,7 @@ ENTRY(secondary_startup_64)
movq %rax, %cr0

/* Setup a boot time stack */
- movq init_rsp(%rip),%rsp
+ movq stack_start(%rip),%rsp

/* zero EFLAGS after setting rsp */
pushq $0
@@ -252,8 +252,9 @@ ENTRY(secondary_startup_64)
.quad x86_64_start_kernel
__FINITDATA

- ENTRY(init_rsp)
+ ENTRY(stack_start)
.quad init_thread_union+THREAD_SIZE-8
+ .word 0

bad_address:
jmp bad_address
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 722b24b..c77a7d2 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -714,11 +714,7 @@ wakeup_secondary_cpu(int phys_apicid, unsigned long start_eip)
* target processor state.
*/
startup_ipi_hook(phys_apicid, (unsigned long) start_secondary,
-#ifdef CONFIG_X86_64
- (unsigned long)init_rsp);
-#else
(unsigned long)stack_start.sp);
-#endif

/*
* Run STARTUP IPI loop.
@@ -905,15 +901,14 @@ do_rest:
early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu);
c_idle.idle->thread.ip = (unsigned long) start_secondary;
/* Stack for startup_32 can be just as for start_secondary onwards */
- stack_start.sp = (void *) c_idle.idle->thread.sp;
irq_ctx_init(cpu);
#else
cpu_pda(cpu)->pcurrent = c_idle.idle;
- init_rsp = c_idle.idle->thread.sp;
load_sp0(&per_cpu(init_tss, cpu), &c_idle.idle->thread);
initial_code = (unsigned long)start_secondary;
clear_tsk_thread_flag(c_idle.idle, TIF_FORK);
#endif
+ stack_start.sp = (void *) c_idle.idle->thread.sp;

/* start_ip had better be page-aligned! */
start_ip = setup_trampoline();
--
1.5.4.5

2008-06-09 14:19:38

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 03/15] x86: remove early_gdt_descr reference

since we use switch_to_new_gdt, there is no point
in assigning early_gdt_descr except for the first
assignment, which is done manually.

Signed-off-by: Glauber Costa <[email protected]>
CC: James Bottomley <[email protected]>
---
arch/x86/kernel/smpboot.c | 1 -
arch/x86/mach-voyager/voyager_smp.c | 1 -
2 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c77a7d2..181b109 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -898,7 +898,6 @@ do_rest:
#ifdef CONFIG_X86_32
per_cpu(current_task, cpu) = c_idle.idle;
init_gdt(cpu);
- early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu);
c_idle.idle->thread.ip = (unsigned long) start_secondary;
/* Stack for startup_32 can be just as for start_secondary onwards */
irq_ctx_init(cpu);
diff --git a/arch/x86/mach-voyager/voyager_smp.c b/arch/x86/mach-voyager/voyager_smp.c
index 8dedd01..6b97d9b 100644
--- a/arch/x86/mach-voyager/voyager_smp.c
+++ b/arch/x86/mach-voyager/voyager_smp.c
@@ -529,7 +529,6 @@ static void __init do_boot_cpu(__u8 cpu)

init_gdt(cpu);
per_cpu(current_task, cpu) = idle;
- early_gdt_descr.address = (unsigned long)get_cpu_gdt_table(cpu);
irq_ctx_init(cpu);

/* Note: Don't modify initial ss override */
--
1.5.4.5

2008-06-09 14:19:53

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 05/15] x86: use initial_code for i386

x86_64 jumps to whatever is written in "initial_code" symbol,
instead of a fixed address. Do it for i386 too. It will allow us
to integrate more of the smp boot code.

Signed-off-by: Glauber Costa <[email protected]>
---
arch/x86/kernel/head_32.S | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index b98b338..ffb73a5 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -455,7 +455,10 @@ is386: movl $2,%ecx # set MP
jmp initialize_secondary # all other CPUs call initialize_secondary
1:
#endif /* CONFIG_SMP */
- jmp i386_start_kernel
+ jmp *(initial_code)
+.align 4
+ENTRY(initial_code)
+ .long i386_start_kernel

/*
* We depend on ET to be correct. This checks for 287/387.
--
1.5.4.5

2008-06-09 14:20:12

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 04/15] x86: move x86_64 gdt closer to i386

i386 and x86_64 used two different schemes for maintaining the gdt.
With this patch, x86_64 initial gdt table is defined in a .c file,
same way as i386 is now. Also, we call it "gdt_page", and the descriptor,
"early_gdt_descr". This way we achieve common naming, which can allow for
more code integration.

Signed-off-by: Glauber Costa <[email protected]>
---
arch/x86/kernel/head_64.S | 48 ++++----------------------------------
arch/x86/kernel/setup64.c | 5 +---
arch/x86/kernel/setup_64.c | 27 +++++++++++++++++++++
arch/x86/kernel/smpboot.c | 10 +------
arch/x86/kernel/x8664_ksyms_64.c | 5 ----
include/asm-x86/desc.h | 24 ++++++++-----------
include/asm-x86/segment.h | 23 +++++++++--------
7 files changed, 57 insertions(+), 85 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 4949c32..febeabb 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -203,7 +203,7 @@ ENTRY(secondary_startup_64)
* addresses where we're currently running on. We have to do that here
* because in 32bit we couldn't load a 64bit linear address.
*/
- lgdt cpu_gdt_descr(%rip)
+ lgdt early_gdt_descr(%rip)

/* set up data segments. actually 0 would do too */
movl $__KERNEL_DS,%eax
@@ -391,54 +391,16 @@ NEXT_PAGE(level2_spare_pgt)

.data
.align 16
- .globl cpu_gdt_descr
-cpu_gdt_descr:
- .word gdt_end-cpu_gdt_table-1
-gdt:
- .quad cpu_gdt_table
-#ifdef CONFIG_SMP
- .rept NR_CPUS-1
- .word 0
- .quad 0
- .endr
-#endif
+ .globl early_gdt_descr
+early_gdt_descr:
+ .word GDT_ENTRIES*8-1
+ .quad per_cpu__gdt_page

ENTRY(phys_base)
/* This must match the first entry in level2_kernel_pgt */
.quad 0x0000000000000000

-/* We need valid kernel segments for data and code in long mode too
- * IRET will check the segment types kkeil 2000/10/28
- * Also sysret mandates a special GDT layout
- */
-
- .section .data.page_aligned, "aw"
- .align PAGE_SIZE
-
-/* The TLS descriptors are currently at a different place compared to i386.
- Hopefully nobody expects them at a fixed place (Wine?) */

-ENTRY(cpu_gdt_table)
- .quad 0x0000000000000000 /* NULL descriptor */
- .quad 0x00cf9b000000ffff /* __KERNEL32_CS */
- .quad 0x00af9b000000ffff /* __KERNEL_CS */
- .quad 0x00cf93000000ffff /* __KERNEL_DS */
- .quad 0x00cffb000000ffff /* __USER32_CS */
- .quad 0x00cff3000000ffff /* __USER_DS, __USER32_DS */
- .quad 0x00affb000000ffff /* __USER_CS */
- .quad 0x0 /* unused */
- .quad 0,0 /* TSS */
- .quad 0,0 /* LDT */
- .quad 0,0,0 /* three TLS descriptors */
- .quad 0x0000f40000000000 /* node/CPU stored in limit */
-gdt_end:
- /* asm/segment.h:GDT_ENTRIES must match this */
- /* This should be a multiple of the cache line size */
- /* GDTs of other CPUs are now dynamically allocated */
-
- /* zero the remaining page */
- .fill PAGE_SIZE / 8 - GDT_ENTRIES,8,0
-
.section .bss, "aw", @nobits
.align L1_CACHE_BYTES
ENTRY(idt_table)
diff --git a/arch/x86/kernel/setup64.c b/arch/x86/kernel/setup64.c
index fc1a56d..70ff071 100644
--- a/arch/x86/kernel/setup64.c
+++ b/arch/x86/kernel/setup64.c
@@ -202,11 +202,8 @@ void __cpuinit cpu_init (void)
* Initialize the per-CPU GDT with the boot GDT,
* and set up the GDT descriptor:
*/
- if (cpu)
- memcpy(get_cpu_gdt_table(cpu), cpu_gdt_table, GDT_SIZE);

- cpu_gdt_descr[cpu].size = GDT_SIZE;
- load_gdt((const struct desc_ptr *)&cpu_gdt_descr[cpu]);
+ switch_to_new_gdt();
load_idt((const struct desc_ptr *)&idt_descr);

memset(me->thread.tls_array, 0, GDT_ENTRY_TLS_ENTRIES * 8);
diff --git a/arch/x86/kernel/setup_64.c b/arch/x86/kernel/setup_64.c
index 2aee441..c57223f 100644
--- a/arch/x86/kernel/setup_64.c
+++ b/arch/x86/kernel/setup_64.c
@@ -83,6 +83,22 @@
#define ARCH_SETUP
#endif

+/* We need valid kernel segments for data and code in long mode too
+ * IRET will check the segment types kkeil 2000/10/28
+ * Also sysret mandates a special GDT layout
+ */
+/* The TLS descriptors are currently at a different place compared to i386.
+ Hopefully nobody expects them at a fixed place (Wine?) */
+DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
+ [GDT_ENTRY_KERNEL32_CS] = { { { 0x0000ffff, 0x00cf9b00 } } },
+ [GDT_ENTRY_KERNEL_CS] = { { { 0x0000ffff, 0x00af9b00 } } },
+ [GDT_ENTRY_KERNEL_DS] = { { { 0x0000ffff, 0x00cf9300 } } },
+ [GDT_ENTRY_DEFAULT_USER32_CS] = { { { 0x0000ffff, 0x00cffb00 } } },
+ [GDT_ENTRY_DEFAULT_USER_DS] = { { { 0x0000ffff, 0x00cff300 } } },
+ [GDT_ENTRY_DEFAULT_USER_CS] = { { { 0x0000ffff, 0x00affb00 } } },
+} };
+EXPORT_PER_CPU_SYMBOL_GPL(gdt_page);
+
/*
* Machine setup..
*/
@@ -273,6 +289,17 @@ void __attribute__((weak)) __init memory_setup(void)
machine_specific_memory_setup();
}

+/* Current gdt points %fs at the "master" per-cpu area: after this,
+ * it's on the real one. */
+void switch_to_new_gdt(void)
+{
+ struct desc_ptr gdt_descr;
+
+ gdt_descr.address = (long)get_cpu_gdt_table(smp_processor_id());
+ gdt_descr.size = GDT_SIZE - 1;
+ load_gdt(&gdt_descr);
+}
+
/*
* setup_arch - architecture-specific boot-time initializations
*
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 181b109..69639fd 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -849,14 +849,8 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu)
.done = COMPLETION_INITIALIZER_ONSTACK(c_idle.done),
};
INIT_WORK(&c_idle.work, do_fork_idle);
-#ifdef CONFIG_X86_64
- /* allocate memory for gdts of secondary cpus. Hotplug is considered */
- if (!cpu_gdt_descr[cpu].address &&
- !(cpu_gdt_descr[cpu].address = get_zeroed_page(GFP_KERNEL))) {
- printk(KERN_ERR "Failed to allocate GDT for CPU %d\n", cpu);
- return -1;
- }

+#ifdef CONFIG_X86_64
/* Allocate node local memory for AP pdas */
if (cpu > 0) {
boot_error = get_local_pda(cpu);
@@ -1253,8 +1247,8 @@ void __init native_smp_prepare_boot_cpu(void)
int me = smp_processor_id();
#ifdef CONFIG_X86_32
init_gdt(me);
- switch_to_new_gdt();
#endif
+ switch_to_new_gdt();
/* already set me in cpu_online_map in boot_cpu_init() */
cpu_set(me, cpu_callout_map);
per_cpu(cpu_state, me) = CPU_ONLINE;
diff --git a/arch/x86/kernel/x8664_ksyms_64.c b/arch/x86/kernel/x8664_ksyms_64.c
index 122885b..16a6751 100644
--- a/arch/x86/kernel/x8664_ksyms_64.c
+++ b/arch/x86/kernel/x8664_ksyms_64.c
@@ -60,8 +60,3 @@ EXPORT_SYMBOL(init_level4_pgt);
EXPORT_SYMBOL(load_gs_index);

EXPORT_SYMBOL(_proxy_pda);
-
-#ifdef CONFIG_PARAVIRT
-/* Virtualized guests may want to use it */
-EXPORT_SYMBOL_GPL(cpu_gdt_descr);
-#endif
diff --git a/include/asm-x86/desc.h b/include/asm-x86/desc.h
index b3875d4..07f9f2b 100644
--- a/include/asm-x86/desc.h
+++ b/include/asm-x86/desc.h
@@ -29,11 +29,17 @@ static inline void fill_ldt(struct desc_struct *desc,
extern struct desc_ptr idt_descr;
extern gate_desc idt_table[];

+struct gdt_page {
+ struct desc_struct gdt[GDT_ENTRIES];
+} __attribute__((aligned(PAGE_SIZE)));
+DECLARE_PER_CPU(struct gdt_page, gdt_page);
+
+static inline struct desc_struct *get_cpu_gdt_table(unsigned int cpu)
+{
+ return per_cpu(gdt_page, cpu).gdt;
+}
+
#ifdef CONFIG_X86_64
-extern struct desc_struct cpu_gdt_table[GDT_ENTRIES];
-extern struct desc_ptr cpu_gdt_descr[];
-/* the cpu gdt accessor */
-#define get_cpu_gdt_table(x) ((struct desc_struct *)cpu_gdt_descr[x].address)

static inline void pack_gate(gate_desc *gate, unsigned type, unsigned long func,
unsigned dpl, unsigned ist, unsigned seg)
@@ -51,16 +57,6 @@ static inline void pack_gate(gate_desc *gate, unsigned type, unsigned long func,
}

#else
-struct gdt_page {
- struct desc_struct gdt[GDT_ENTRIES];
-} __attribute__((aligned(PAGE_SIZE)));
-DECLARE_PER_CPU(struct gdt_page, gdt_page);
-
-static inline struct desc_struct *get_cpu_gdt_table(unsigned int cpu)
-{
- return per_cpu(gdt_page, cpu).gdt;
-}
-
static inline void pack_gate(gate_desc *gate, unsigned char type,
unsigned long base, unsigned dpl, unsigned flags,
unsigned short seg)
diff --git a/include/asm-x86/segment.h b/include/asm-x86/segment.h
index ed5131d..dfc8601 100644
--- a/include/asm-x86/segment.h
+++ b/include/asm-x86/segment.h
@@ -61,18 +61,14 @@
#define GDT_ENTRY_TLS_MAX (GDT_ENTRY_TLS_MIN + GDT_ENTRY_TLS_ENTRIES - 1)

#define GDT_ENTRY_DEFAULT_USER_CS 14
-#define __USER_CS (GDT_ENTRY_DEFAULT_USER_CS * 8 + 3)

#define GDT_ENTRY_DEFAULT_USER_DS 15
-#define __USER_DS (GDT_ENTRY_DEFAULT_USER_DS * 8 + 3)

#define GDT_ENTRY_KERNEL_BASE 12

#define GDT_ENTRY_KERNEL_CS (GDT_ENTRY_KERNEL_BASE + 0)
-#define __KERNEL_CS (GDT_ENTRY_KERNEL_CS * 8)

#define GDT_ENTRY_KERNEL_DS (GDT_ENTRY_KERNEL_BASE + 1)
-#define __KERNEL_DS (GDT_ENTRY_KERNEL_DS * 8)

#define GDT_ENTRY_TSS (GDT_ENTRY_KERNEL_BASE + 4)
#define GDT_ENTRY_LDT (GDT_ENTRY_KERNEL_BASE + 5)
@@ -139,10 +135,11 @@
#else
#include <asm/cache.h>

-#define __KERNEL_CS 0x10
-#define __KERNEL_DS 0x18
+#define GDT_ENTRY_KERNEL32_CS 1
+#define GDT_ENTRY_KERNEL_CS 2
+#define GDT_ENTRY_KERNEL_DS 3

-#define __KERNEL32_CS 0x08
+#define __KERNEL32_CS (GDT_ENTRY_KERNEL32_CS * 8)

/*
* we cannot use the same code segment descriptor for user and kernel
@@ -150,10 +147,10 @@
* The segment offset needs to contain a RPL. Grr. -AK
* GDT layout to get 64bit syscall right (sysret hardcodes gdt offsets)
*/
-
-#define __USER32_CS 0x23 /* 4*8+3 */
-#define __USER_DS 0x2b /* 5*8+3 */
-#define __USER_CS 0x33 /* 6*8+3 */
+#define GDT_ENTRY_DEFAULT_USER32_CS 4
+#define GDT_ENTRY_DEFAULT_USER_DS 5
+#define GDT_ENTRY_DEFAULT_USER_CS 6
+#define __USER32_CS (GDT_ENTRY_DEFAULT_USER32_CS * 8 + 3)
#define __USER32_DS __USER_DS

#define GDT_ENTRY_TSS 8 /* needs two entries */
@@ -175,6 +172,10 @@

#endif

+#define __KERNEL_CS (GDT_ENTRY_KERNEL_CS * 8)
+#define __KERNEL_DS (GDT_ENTRY_KERNEL_DS * 8)
+#define __USER_DS (GDT_ENTRY_DEFAULT_USER_DS* 8 + 3)
+#define __USER_CS (GDT_ENTRY_DEFAULT_USER_CS* 8 + 3)
#ifndef CONFIG_PARAVIRT
#define get_kernel_rpl() 0
#endif
--
1.5.4.5

2008-06-09 14:20:39

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 06/15] x86: boot secondary cpus through initial_code

remove "initialize_secondary". Boot both architectures via
initial_code.

Signed-off-by: Glauber Costa <[email protected]>
---
arch/x86/kernel/head_32.S | 2 +-
arch/x86/kernel/smpboot.c | 25 +------------------------
2 files changed, 2 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index ffb73a5..f67e934 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -452,7 +452,7 @@ is386: movl $2,%ecx # set MP
je 1f
movl $(__KERNEL_PERCPU), %eax
movl %eax,%fs # set this cpu's percpu
- jmp initialize_secondary # all other CPUs call initialize_secondary
+ movl (stack_start), %esp
1:
#endif /* CONFIG_SMP */
jmp *(initial_code)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 69639fd..0e4a44a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -349,28 +349,6 @@ static void __cpuinit start_secondary(void *unused)
cpu_idle();
}

-#ifdef CONFIG_X86_32
-/*
- * Everything has been set up for the secondary
- * CPUs - they just need to reload everything
- * from the task structure
- * This function must not return.
- */
-void __devinit initialize_secondary(void)
-{
- /*
- * We don't actually need to load the full TSS,
- * basically just the stack pointer and the ip.
- */
-
- asm volatile(
- "movl %0,%%esp\n\t"
- "jmp *%1"
- :
- :"m" (current->thread.sp), "m" (current->thread.ip));
-}
-#endif
-
static void __cpuinit smp_apply_quirks(struct cpuinfo_x86 *c)
{
#ifdef CONFIG_X86_32
@@ -892,15 +870,14 @@ do_rest:
#ifdef CONFIG_X86_32
per_cpu(current_task, cpu) = c_idle.idle;
init_gdt(cpu);
- c_idle.idle->thread.ip = (unsigned long) start_secondary;
/* Stack for startup_32 can be just as for start_secondary onwards */
irq_ctx_init(cpu);
#else
cpu_pda(cpu)->pcurrent = c_idle.idle;
load_sp0(&per_cpu(init_tss, cpu), &c_idle.idle->thread);
- initial_code = (unsigned long)start_secondary;
clear_tsk_thread_flag(c_idle.idle, TIF_FORK);
#endif
+ initial_code = (unsigned long)start_secondary;
stack_start.sp = (void *) c_idle.idle->thread.sp;

/* start_ip had better be page-aligned! */
--
1.5.4.5

2008-06-09 14:20:52

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 07/15] x86: clearing io_apic harmless for x86_64

so remove ifdef.

Signed-off-by: Glauber Costa <[email protected]>
---
arch/x86/kernel/smpboot.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 0e4a44a..5a055a1 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1052,9 +1052,8 @@ static __init void disable_smp(void)
{
cpu_present_map = cpumask_of_cpu(0);
cpu_possible_map = cpumask_of_cpu(0);
-#ifdef CONFIG_X86_32
smpboot_clear_io_apic_irqs();
-#endif
+
if (smp_found_config)
phys_cpu_present_map =
physid_mask_of_physid(boot_cpu_physical_apicid);
--
1.5.4.5

2008-06-09 14:21:15

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 08/15] x86: remove ifdef from stepping

The stepping won't affect x86_64, since there are not x86_64 k7's
or pentiums. So, although it adds to the binary size, remove the ifdef
for smoother integration

Signed-off-by: Glauber Costa <[email protected]>
---
arch/x86/kernel/smpboot.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 5a055a1..50d307e 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -351,7 +351,6 @@ static void __cpuinit start_secondary(void *unused)

static void __cpuinit smp_apply_quirks(struct cpuinfo_x86 *c)
{
-#ifdef CONFIG_X86_32
/*
* Mask B, Pentium, but not Pentium MMX
*/
@@ -401,7 +400,6 @@ static void __cpuinit smp_apply_quirks(struct cpuinfo_x86 *c)

valid_k7:
;
-#endif
}

static void __cpuinit smp_checks(void)
--
1.5.4.5

2008-06-09 14:21:32

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus

Do it, instead of keeping in io_apic_32.c. This is the way x86_64 already does.

Signed-off-by: Glauber Costa <[email protected]>
---
arch/x86/kernel/apic_32.c | 6 +-----
arch/x86/kernel/io_apic_32.c | 2 +-
arch/x86/kernel/smpboot.c | 3 ++-
include/asm-x86/hw_irq.h | 3 ---
4 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/apic_32.c b/arch/x86/kernel/apic_32.c
index fa8cf79..a06442a 100644
--- a/arch/x86/kernel/apic_32.c
+++ b/arch/x86/kernel/apic_32.c
@@ -1277,11 +1277,7 @@ int __init APIC_init_uniprocessor(void)
#endif
localise_nmi_watchdog();
end_local_APIC_setup();
-#ifdef CONFIG_X86_IO_APIC
- if (smp_found_config)
- if (!skip_ioapic_setup && nr_ioapics)
- setup_IO_APIC();
-#endif
+
setup_boot_clock();

return 0;
diff --git a/arch/x86/kernel/io_apic_32.c b/arch/x86/kernel/io_apic_32.c
index 7fc071f..cb79ce3 100644
--- a/arch/x86/kernel/io_apic_32.c
+++ b/arch/x86/kernel/io_apic_32.c
@@ -1606,7 +1606,7 @@ void /*__init*/ print_PIC(void)

#endif /* 0 */

-static void __init enable_IO_APIC(void)
+void __init enable_IO_APIC(void)
{
union IO_APIC_reg_01 reg_01;
int i8259_apic, i8259_pin;
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c4bb997..49d2900 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1179,13 +1179,14 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
*/
setup_local_APIC();

-#ifdef CONFIG_X86_64
+#ifdef CONFIG_X86_IO_APIC
/*
* Enable IO APIC before setting up error vector
*/
if (!skip_ioapic_setup && nr_ioapics)
enable_IO_APIC();
#endif
+
end_local_APIC_setup();

map_cpu_to_logical_apicid();
diff --git a/include/asm-x86/hw_irq.h b/include/asm-x86/hw_irq.h
index 18f067c..31b56cd 100644
--- a/include/asm-x86/hw_irq.h
+++ b/include/asm-x86/hw_irq.h
@@ -66,10 +66,7 @@ extern void disable_IO_APIC(void);
extern void print_IO_APIC(void);
extern int IO_APIC_get_PCI_irq_vector(int bus, int slot, int fn);
extern void setup_ioapic_dest(void);
-
-#ifdef CONFIG_X86_64
extern void enable_IO_APIC(void);
-#endif

/* IPI functions */
extern void send_IPI_self(int vector);
--
1.5.4.5

2008-06-09 14:21:48

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 09/15] x86: change __setup_vector_irq with setup_vector_irq

We create a version of it for i386, and then take the CONFIG_X86_64
ifdef out of the game. We could create a __setup_vector_irq for i386,
but it would incur in an unnecessary lock taking. Moreover, it is better
practice to only export setup_vector_irq anyway.

Signed-off-by: Glauber Costa <[email protected]>
---
arch/x86/kernel/io_apic_32.c | 5 +++++
arch/x86/kernel/io_apic_64.c | 9 ++++++++-
arch/x86/kernel/smpboot.c | 11 ++---------
include/asm-x86/hw_irq.h | 2 +-
4 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/io_apic_32.c b/arch/x86/kernel/io_apic_32.c
index 9dbe7e9..7fc071f 100644
--- a/arch/x86/kernel/io_apic_32.c
+++ b/arch/x86/kernel/io_apic_32.c
@@ -1209,6 +1209,11 @@ static int assign_irq_vector(int irq)

return vector;
}
+
+void setup_vector_irq(int cpu)
+{
+}
+
static struct irq_chip ioapic_chip;

#define IOAPIC_AUTO -1
diff --git a/arch/x86/kernel/io_apic_64.c b/arch/x86/kernel/io_apic_64.c
index 5ea376d..97d67c9 100644
--- a/arch/x86/kernel/io_apic_64.c
+++ b/arch/x86/kernel/io_apic_64.c
@@ -780,7 +780,7 @@ static void __clear_irq_vector(int irq)
cpus_clear(cfg->domain);
}

-void __setup_vector_irq(int cpu)
+static void __setup_vector_irq(int cpu)
{
/* Initialize vector_irq on a new cpu */
/* This function must be called with vector_lock held */
@@ -803,6 +803,13 @@ void __setup_vector_irq(int cpu)
}
}

+void setup_vector_irq(int cpu)
+{
+ spin_lock(&vector_lock);
+ __setup_vector_irq(smp_processor_id());
+ spin_unlock(&vector_lock);
+}
+

static struct irq_chip ioapic_chip;

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 50d307e..9d8dfec 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -329,15 +329,8 @@ static void __cpuinit start_secondary(void *unused)
* smp_call_function().
*/
lock_ipi_call_lock();
-#ifdef CONFIG_X86_64
- spin_lock(&vector_lock);
-
- /* Setup the per cpu irq handling data structures */
- __setup_vector_irq(smp_processor_id());
- /*
- * Allow the master to continue.
- */
- spin_unlock(&vector_lock);
+#ifdef CONFIG_X86_IO_APIC
+ setup_vector_irq(smp_processor_id());
#endif
cpu_set(smp_processor_id(), cpu_online_map);
unlock_ipi_call_lock();
diff --git a/include/asm-x86/hw_irq.h b/include/asm-x86/hw_irq.h
index 1428b41..18f067c 100644
--- a/include/asm-x86/hw_irq.h
+++ b/include/asm-x86/hw_irq.h
@@ -97,9 +97,9 @@ extern void (*const interrupt[NR_IRQS])(void);
#else
typedef int vector_irq_t[NR_VECTORS];
DECLARE_PER_CPU(vector_irq_t, vector_irq);
-extern void __setup_vector_irq(int cpu);
extern spinlock_t vector_lock;
#endif
+extern void setup_vector_irq(int cpu);

#endif /* !ASSEMBLY_ */

--
1.5.4.5

2008-06-09 14:22:04

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 10/15] x86: provide connect_bsp_APIC for x86_64

Although it is not really needed, we provide it to get
closer to i386. ifdefs around it are removed in smpboot.c

Signed-off-by: Glauber Costa <[email protected]>
---
arch/x86/kernel/apic_64.c | 10 ++++++++++
arch/x86/kernel/smpboot.c | 5 +----
2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/apic_64.c b/arch/x86/kernel/apic_64.c
index 310bec1..8d62387 100644
--- a/arch/x86/kernel/apic_64.c
+++ b/arch/x86/kernel/apic_64.c
@@ -942,6 +942,8 @@ int __init APIC_init_uniprocessor(void)

verify_local_APIC();

+ connect_bsp_APIC();
+
phys_cpu_present_map = physid_mask_of_physid(boot_cpu_physical_apicid);
apic_write(APIC_ID, SET_APIC_ID(boot_cpu_physical_apicid));

@@ -1023,6 +1025,14 @@ asmlinkage void smp_error_interrupt(void)
irq_exit();
}

+/**
+ * * connect_bsp_APIC - attach the APIC to the interrupt system
+ * */
+void __init connect_bsp_APIC(void)
+{
+ enable_apic_mode();
+}
+
void disconnect_bsp_APIC(int virt_wire_setup)
{
/* Go back to Virtual Wire compatibility mode */
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 9d8dfec..c4bb997 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1117,9 +1117,7 @@ static int __init smp_sanity_check(unsigned max_cpus)

localise_nmi_watchdog();

-#ifdef CONFIG_X86_32
connect_bsp_APIC();
-#endif
setup_local_APIC();
end_local_APIC_setup();
return -1;
@@ -1174,9 +1172,8 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
}
preempt_enable();

-#ifdef CONFIG_X86_32
connect_bsp_APIC();
-#endif
+
/*
* Switch from PIC to APIC mode.
*/
--
1.5.4.5

2008-06-09 14:22:32

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 13/15] x86: remove cpu from maps

during cpu disable, take cpus out of all maps in i386, instead
of just the online map.

Signed-off-by: Glauber Costa <[email protected]>
---
arch/x86/kernel/smpboot.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 6316d30..f6b99b6 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1321,13 +1321,11 @@ __init void prefill_possible_map(void)
static void __ref remove_cpu_from_maps(int cpu)
{
cpu_clear(cpu, cpu_online_map);
-#ifdef CONFIG_X86_64
cpu_clear(cpu, cpu_callout_map);
cpu_clear(cpu, cpu_callin_map);
/* was set by cpu_init() */
clear_bit(cpu, (unsigned long *)&cpu_initialized);
numa_remove_cpu(cpu);
-#endif
}

int __cpu_disable(void)
--
1.5.4.5

2008-06-09 14:22:49

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 12/15] x86: change naming to match x86_64

Change unmap_cpu_to_logical_apicid to numa_remove_cpu.
Besides being shorter, it is the same name x86_64 uses. We
can save an ifdef in the code this way.

Signed-off-by: Glauber de Oliveira Costa <[email protected]>
---
arch/x86/kernel/smpboot.c | 8 ++------
1 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 49d2900..6316d30 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -181,13 +181,12 @@ static void map_cpu_to_logical_apicid(void)
map_cpu_to_node(cpu, node);
}

-static void unmap_cpu_to_logical_apicid(int cpu)
+static void numa_remove_cpu(int cpu)
{
cpu_2_logical_apicid[cpu] = BAD_APICID;
unmap_cpu_to_node(cpu);
}
#else
-#define unmap_cpu_to_logical_apicid(cpu) do {} while (0)
#define map_cpu_to_logical_apicid() do {} while (0)
#endif

@@ -945,10 +944,7 @@ restore_state:

if (boot_error) {
/* Try to put things back the way they were before ... */
- unmap_cpu_to_logical_apicid(cpu);
-#ifdef CONFIG_X86_64
numa_remove_cpu(cpu); /* was set by numa_add_cpu */
-#endif
cpu_clear(cpu, cpu_callout_map); /* was set by do_boot_cpu() */
cpu_clear(cpu, cpu_initialized); /* was set by cpu_init() */
cpu_clear(cpu, cpu_possible_map);
@@ -1246,7 +1242,7 @@ void cpu_exit_clear(void)
cpu_clear(cpu, cpu_callout_map);
cpu_clear(cpu, cpu_callin_map);

- unmap_cpu_to_logical_apicid(cpu);
+ numa_remove_cpu(cpu);
}
# endif /* CONFIG_X86_32 */

--
1.5.4.5

2008-06-09 14:23:18

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 15/15] x86: take load_sp0 out of smpboot.c

there's no particular reason to do load_sp0 in different
places for i386 and x86_64. They should all be in cpu_init.
Right now, cpu_init itself is not integrated, but with this patch,
the code becomes closer to each other, making in easier to integrate
when the time comes.

Furthermore, although doing it in do_boot_cpu for x86_64 is fine, since it's
only a copy, load_sp0 should be executed in the cpu it refers to anyway.

Signed-off-by: Glauber Costa <[email protected]>
---
arch/x86/kernel/setup64.c | 1 +
arch/x86/kernel/smpboot.c | 1 -
2 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/setup64.c b/arch/x86/kernel/setup64.c
index 70ff071..151d315 100644
--- a/arch/x86/kernel/setup64.c
+++ b/arch/x86/kernel/setup64.c
@@ -247,6 +247,7 @@ void __cpuinit cpu_init (void)
BUG();
enter_lazy_tlb(&init_mm, me);

+ load_sp0(t, &current->thread);
set_tss_desc(cpu, t);
load_TR_desc();
load_LDT(&init_mm.context);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 7952c39..371113a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -864,7 +864,6 @@ do_rest:
irq_ctx_init(cpu);
#else
cpu_pda(cpu)->pcurrent = c_idle.idle;
- load_sp0(&per_cpu(init_tss, cpu), &c_idle.idle->thread);
clear_tsk_thread_flag(c_idle.idle, TIF_FORK);
#endif
initial_code = (unsigned long)start_secondary;
--
1.5.4.5

2008-06-09 14:24:26

by Glauber Costa

[permalink] [raw]
Subject: [PATCH 14/15] x86: move cpu_exit_clear to process_32.c

Take it out of smpboot.c, and move it to process_32.c, closer
to its only user.

Signed-off-by: Glauber Costa <[email protected]>
---
arch/x86/kernel/process_32.c | 16 ++++++++++++++++
arch/x86/kernel/smpboot.c | 19 +------------------
include/asm-x86/numa_32.h | 1 +
include/asm-x86/smp.h | 1 -
4 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 1aaca76..99f8e7d 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -128,6 +128,22 @@ EXPORT_SYMBOL(default_idle);

#ifdef CONFIG_HOTPLUG_CPU
#include <asm/nmi.h>
+
+static void cpu_exit_clear(void)
+{
+ int cpu = raw_smp_processor_id();
+
+ idle_task_exit();
+
+ cpu_uninit();
+ irq_ctx_exit(cpu);
+
+ cpu_clear(cpu, cpu_callout_map);
+ cpu_clear(cpu, cpu_callin_map);
+
+ numa_remove_cpu(cpu);
+}
+
/* We don't actually take CPU down, just spin without interrupts. */
static inline void play_dead(void)
{
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index f6b99b6..7952c39 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -181,7 +181,7 @@ static void map_cpu_to_logical_apicid(void)
map_cpu_to_node(cpu, node);
}

-static void numa_remove_cpu(int cpu)
+void numa_remove_cpu(int cpu)
{
cpu_2_logical_apicid[cpu] = BAD_APICID;
unmap_cpu_to_node(cpu);
@@ -1229,23 +1229,6 @@ void __init native_smp_cpus_done(unsigned int max_cpus)

#ifdef CONFIG_HOTPLUG_CPU

-# ifdef CONFIG_X86_32
-void cpu_exit_clear(void)
-{
- int cpu = raw_smp_processor_id();
-
- idle_task_exit();
-
- cpu_uninit();
- irq_ctx_exit(cpu);
-
- cpu_clear(cpu, cpu_callout_map);
- cpu_clear(cpu, cpu_callin_map);
-
- numa_remove_cpu(cpu);
-}
-# endif /* CONFIG_X86_32 */
-
static void remove_siblinginfo(int cpu)
{
int sibling;
diff --git a/include/asm-x86/numa_32.h b/include/asm-x86/numa_32.h
index 03d0f7a..205015f 100644
--- a/include/asm-x86/numa_32.h
+++ b/include/asm-x86/numa_32.h
@@ -2,6 +2,7 @@
#define _ASM_X86_32_NUMA_H 1

extern int pxm_to_nid(int pxm);
+extern void numa_remove_cpu(int cpu);

#ifdef CONFIG_NUMA
extern void __init remap_numa_kva(void);
diff --git a/include/asm-x86/smp.h b/include/asm-x86/smp.h
index fc10073..fad45f6 100644
--- a/include/asm-x86/smp.h
+++ b/include/asm-x86/smp.h
@@ -188,7 +188,6 @@ static inline int hard_smp_processor_id(void)
#endif /* CONFIG_X86_LOCAL_APIC */

#ifdef CONFIG_HOTPLUG_CPU
-extern void cpu_exit_clear(void);
extern void cpu_uninit(void);
#endif

--
1.5.4.5

2008-06-09 15:23:45

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH 03/15] x86: remove early_gdt_descr reference

On Mon, 2008-06-09 at 11:16 -0300, Glauber Costa wrote:
> since we use switch_to_new_gdt, there is no point
> in assigning early_gdt_descr except for the first
> assignment, which is done manually.

What makes you think you can do this? If you don't update the early
boot gdt, they all end up using the Boot CPU one. The problem with this
is that there's a time from start_secondary to switch_to_new_gdt where
the per cpu selector (%fs) and the pda selector (%gs) are those of the
boot CPU. The former isn't a problem but the CPU number is in the
latter, and it's used in that path before we get to the initialisation.

James

2008-06-09 15:24:34

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus

On Mon, 9 Jun 2008, Glauber Costa wrote:

> Do it, instead of keeping in io_apic_32.c. This is the way x86_64
> already does.

This change looks wrong -- is native_smp_prepare_cpus() at all run on
!CONFIG_SMP? I don't think so. You have to do this differently.

Maciej

2008-06-09 15:54:10

by Glauber Costa

[permalink] [raw]
Subject: Re: [PATCH 03/15] x86: remove early_gdt_descr reference

James Bottomley wrote:
> On Mon, 2008-06-09 at 11:16 -0300, Glauber Costa wrote:
>> since we use switch_to_new_gdt, there is no point
>> in assigning early_gdt_descr except for the first
>> assignment, which is done manually.
>
> What makes you think you can do this? If you don't update the early
> boot gdt, they all end up using the Boot CPU one. The problem with this
> is that there's a time from start_secondary to switch_to_new_gdt where
> the per cpu selector (%fs) and the pda selector (%gs) are those of the
> boot CPU. The former isn't a problem but the CPU number is in the
> latter, and it's used in that path before we get to the initialisation.

You are right, I missed it.

However, it only seem to be used in cpu_init, and very early. Sure there
are some users _before_ we load the new gdt, but nothing prevents them
to be moved after it. (Of course, this patch is wrong anyway).

And if we do that, we can even take the %fs loading out of head_32.S
Of course, it's only valid if those are indeed the only early users of it.

Is there any other use I'm missing?

2008-06-09 15:57:01

by Glauber Costa

[permalink] [raw]
Subject: Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus

Maciej W. Rozycki wrote:
> On Mon, 9 Jun 2008, Glauber Costa wrote:
>
>> Do it, instead of keeping in io_apic_32.c. This is the way x86_64
>> already does.
>
> This change looks wrong -- is native_smp_prepare_cpus() at all run on
> !CONFIG_SMP? I don't think so. You have to do this differently.
You're right.

I'll update this one.

2008-06-09 17:21:20

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH 03/15] x86: remove early_gdt_descr reference

On Mon, 2008-06-09 at 12:49 -0300, Glauber Costa wrote:
> James Bottomley wrote:
> > On Mon, 2008-06-09 at 11:16 -0300, Glauber Costa wrote:
> >> since we use switch_to_new_gdt, there is no point
> >> in assigning early_gdt_descr except for the first
> >> assignment, which is done manually.
> >
> > What makes you think you can do this? If you don't update the early
> > boot gdt, they all end up using the Boot CPU one. The problem with this
> > is that there's a time from start_secondary to switch_to_new_gdt where
> > the per cpu selector (%fs) and the pda selector (%gs) are those of the
> > boot CPU. The former isn't a problem but the CPU number is in the
> > latter, and it's used in that path before we get to the initialisation.
>
> You are right, I missed it.
>
> However, it only seem to be used in cpu_init, and very early. Sure there
> are some users _before_ we load the new gdt, but nothing prevents them
> to be moved after it. (Of course, this patch is wrong anyway).
>
> And if we do that, we can even take the %fs loading out of head_32.S
> Of course, it's only valid if those are indeed the only early users of it.
>
> Is there any other use I'm missing?

Well, %fs loading there is done for the boot CPU. To eliminate that you
have to not only verify that start_secondary doesn't use anything in
per_cpu areas, but also verify that nothing in start_kernel() up until
boot_cpu_init() does ... That's a lot of smp_processor_id() references
to convert.

James

2008-06-09 17:27:56

by Glauber Costa

[permalink] [raw]
Subject: Re: [PATCH 03/15] x86: remove early_gdt_descr reference

James Bottomley wrote:
> On Mon, 2008-06-09 at 12:49 -0300, Glauber Costa wrote:
>> James Bottomley wrote:
>>> On Mon, 2008-06-09 at 11:16 -0300, Glauber Costa wrote:
>>>> since we use switch_to_new_gdt, there is no point
>>>> in assigning early_gdt_descr except for the first
>>>> assignment, which is done manually.
>>> What makes you think you can do this? If you don't update the early
>>> boot gdt, they all end up using the Boot CPU one. The problem with this
>>> is that there's a time from start_secondary to switch_to_new_gdt where
>>> the per cpu selector (%fs) and the pda selector (%gs) are those of the
>>> boot CPU. The former isn't a problem but the CPU number is in the
>>> latter, and it's used in that path before we get to the initialisation.
>> You are right, I missed it.
>>
>> However, it only seem to be used in cpu_init, and very early. Sure there
>> are some users _before_ we load the new gdt, but nothing prevents them
>> to be moved after it. (Of course, this patch is wrong anyway).
>>
>> And if we do that, we can even take the %fs loading out of head_32.S
>> Of course, it's only valid if those are indeed the only early users of it.
>>
>> Is there any other use I'm missing?
>
> Well, %fs loading there is done for the boot CPU. To eliminate that you
> have to not only verify that start_secondary doesn't use anything in
> per_cpu areas, but also verify that nothing in start_kernel() up until
> boot_cpu_init() does ... That's a lot of smp_processor_id() references
> to convert.
Yes, after a second look, it would be tricky indeed. But only for cpu0.
For all the others, I still think we could get rid of the problem by
switching to the new gdt earlier in cpu_init.

What do you think?

2008-06-09 17:40:24

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH 03/15] x86: remove early_gdt_descr reference

On Mon, 2008-06-09 at 14:23 -0300, Glauber Costa wrote:
> James Bottomley wrote:
> > On Mon, 2008-06-09 at 12:49 -0300, Glauber Costa wrote:
> >> James Bottomley wrote:
> >>> On Mon, 2008-06-09 at 11:16 -0300, Glauber Costa wrote:
> >>>> since we use switch_to_new_gdt, there is no point
> >>>> in assigning early_gdt_descr except for the first
> >>>> assignment, which is done manually.
> >>> What makes you think you can do this? If you don't update the early
> >>> boot gdt, they all end up using the Boot CPU one. The problem with this
> >>> is that there's a time from start_secondary to switch_to_new_gdt where
> >>> the per cpu selector (%fs) and the pda selector (%gs) are those of the
> >>> boot CPU. The former isn't a problem but the CPU number is in the
> >>> latter, and it's used in that path before we get to the initialisation.
> >> You are right, I missed it.
> >>
> >> However, it only seem to be used in cpu_init, and very early. Sure there
> >> are some users _before_ we load the new gdt, but nothing prevents them
> >> to be moved after it. (Of course, this patch is wrong anyway).
> >>
> >> And if we do that, we can even take the %fs loading out of head_32.S
> >> Of course, it's only valid if those are indeed the only early users of it.
> >>
> >> Is there any other use I'm missing?
> >
> > Well, %fs loading there is done for the boot CPU. To eliminate that you
> > have to not only verify that start_secondary doesn't use anything in
> > per_cpu areas, but also verify that nothing in start_kernel() up until
> > boot_cpu_init() does ... That's a lot of smp_processor_id() references
> > to convert.
> Yes, after a second look, it would be tricky indeed. But only for cpu0.
> For all the others, I still think we could get rid of the problem by
> switching to the new gdt earlier in cpu_init.
>
> What do you think?

Operating a CPU with a bogus GDT is very fragile. You can fix all the
current issues with the secondary CPUs, but it gives a critical section
within which none of the per_cpu operations will work. It only takes
one patch violating this rule and we have a very subtle bug introduced.

It looks to me like the better fix might be just to initialise the gdt
completely and properly in do_boot_cpu and just have the single switch
in head_32.S be the correct one. That way there's no problem critical
region.

James

2008-06-09 19:49:16

by Glauber Costa

[permalink] [raw]
Subject: Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus

Maciej W. Rozycki wrote:
> On Mon, 9 Jun 2008, Glauber Costa wrote:
>
>> Do it, instead of keeping in io_apic_32.c. This is the way x86_64
>> already does.
>
> This change looks wrong -- is native_smp_prepare_cpus() at all run on
> !CONFIG_SMP? I don't think so. You have to do this differently.
>
> Maciej

How about this one instead?

This one does enable_IO_APIC in the init_uniprocessor too, and should
account for the !smp case.


Attachments:
0001-x86-move-enabling-of-io_apic-to-prepare_cpus.patch (2.47 kB)

2008-06-09 20:13:19

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus

On Mon, 9 Jun 2008, Glauber Costa wrote:

> This one does enable_IO_APIC in the init_uniprocessor too, and should
> account for the !smp case.

Hmm, it looks a little bit better, but why do you want to call
enable_IO_APIC() separately in the first place? There is a comment
stating: "Enable IO APIC before setting up error vector," but why is it
needed on 64-bit systems? Especially as the very same system may run a
32-bit kernel and then it suddenly would not have to do this anymore?
Strange...

Also since you are cleaning up this code -- why don't you actually take
the opportunity and get rid of the horrible #ifdefs interspersed
throughout?

Maciej

2008-06-09 20:53:36

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus

On Mon, Jun 9, 2008 at 1:12 PM, Maciej W. Rozycki <[email protected]> wrote:
> On Mon, 9 Jun 2008, Glauber Costa wrote:
>
>> This one does enable_IO_APIC in the init_uniprocessor too, and should
>> account for the !smp case.
>
> Hmm, it looks a little bit better, but why do you want to call
> enable_IO_APIC() separately in the first place? There is a comment
> stating: "Enable IO APIC before setting up error vector," but why is it
> needed on 64-bit systems? Especially as the very same system may run a
> 32-bit kernel and then it suddenly would not have to do this anymore?
> Strange...

that enable_IO_APIC() only call record old ioapic/pin for timer and
clear_IO_APIC.

we need clear_IO_APIC before enable setup error vector, in case there
is wrong setting in ioapic by BIOS...

YH

2008-06-09 21:01:49

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus

On Mon, 9 Jun 2008, Yinghai Lu wrote:

> we need clear_IO_APIC before enable setup error vector, in case there
> is wrong setting in ioapic by BIOS...

Fair enough, although it is still interesting why it would only trigger
in the 64-bit mode and why shouldn't the BIOS be fixed instead.

Maciej

2008-06-09 21:07:10

by Glauber Costa

[permalink] [raw]
Subject: Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus

Maciej W. Rozycki wrote:
> On Mon, 9 Jun 2008, Glauber Costa wrote:
>
>> This one does enable_IO_APIC in the init_uniprocessor too, and should
>> account for the !smp case.
>
> Hmm, it looks a little bit better, but why do you want to call
> enable_IO_APIC() separately in the first place? There is a comment
> stating: "Enable IO APIC before setting up error vector," but why is it
> needed on 64-bit systems? Especially as the very same system may run a
> 32-bit kernel and then it suddenly would not have to do this anymore?
> Strange...

This was reported by Yinghai, but I think he already answered to that.

> Also since you are cleaning up this code -- why don't you actually take
> the opportunity and get rid of the horrible #ifdefs interspersed
> throughout?
throughout where?
They're all over the place ;)

My next target would be per-cpu data. But that's because there's _a lot_
of code in the tree that got ifdefs between 32 and 64-bit because of
differences in that, specially irq statistics. A macro would do, but if
we're gonna do it, let's do it right.

2008-06-10 02:47:19

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus

On Mon, 9 Jun 2008, Maciej W. Rozycki wrote:

> > we need clear_IO_APIC before enable setup error vector, in case there
> > is wrong setting in ioapic by BIOS...
>
> Fair enough, although it is still interesting why it would only trigger
> in the 64-bit mode and why shouldn't the BIOS be fixed instead.

Then again -- what if X86_LOCAL_APIC is set, but X86_IO_APIC is not?

Maciej

2008-06-10 05:09:06

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus

On Mon, Jun 9, 2008 at 7:46 PM, Maciej W. Rozycki <[email protected]> wrote:
> On Mon, 9 Jun 2008, Maciej W. Rozycki wrote:
>
>> > we need clear_IO_APIC before enable setup error vector, in case there
>> > is wrong setting in ioapic by BIOS...
>>
>> Fair enough, although it is still interesting why it would only trigger
>> in the 64-bit mode and why shouldn't the BIOS be fixed instead.
>
> Then again -- what if X86_LOCAL_APIC is set, but X86_IO_APIC is not?

config X86_LOCAL_APIC
def_bool y
depends on X86_64 || (X86_32 && (X86_UP_APIC || ((X86_VISWS ||
SMP) && !X86_VOYAGER) || X86_GENERICARCH))

config X86_IO_APIC
def_bool y
depends on X86_64 || (X86_32 && (X86_UP_IOAPIC || (SMP &&
!(X86_VISWS || X86_VOYAGER)) || X86_GENERICARCH))

for 64bit, those are all set.

for 32bit, may need to null stub if X86_IO_APIC is not set

YH

2008-06-10 13:06:14

by Glauber Costa

[permalink] [raw]
Subject: Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus

Yinghai Lu wrote:
> On Mon, Jun 9, 2008 at 7:46 PM, Maciej W. Rozycki <[email protected]> wrote:
>> On Mon, 9 Jun 2008, Maciej W. Rozycki wrote:
>>
>>>> we need clear_IO_APIC before enable setup error vector, in case there
>>>> is wrong setting in ioapic by BIOS...
>>> Fair enough, although it is still interesting why it would only trigger
>>> in the 64-bit mode and why shouldn't the BIOS be fixed instead.
>> Then again -- what if X86_LOCAL_APIC is set, but X86_IO_APIC is not?
>
> config X86_LOCAL_APIC
> def_bool y
> depends on X86_64 || (X86_32 && (X86_UP_APIC || ((X86_VISWS ||
> SMP) && !X86_VOYAGER) || X86_GENERICARCH))
>
> config X86_IO_APIC
> def_bool y
> depends on X86_64 || (X86_32 && (X86_UP_IOAPIC || (SMP &&
> !(X86_VISWS || X86_VOYAGER)) || X86_GENERICARCH))
>
> for 64bit, those are all set.
>
> for 32bit, may need to null stub if X86_IO_APIC is not set
>
> YH
Fair Enough.

2008-06-10 13:33:13

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus

On Tue, 10 Jun 2008, Glauber Costa wrote:

> >> Then again -- what if X86_LOCAL_APIC is set, but X86_IO_APIC is not?
> >
> > config X86_LOCAL_APIC
> > def_bool y
> > depends on X86_64 || (X86_32 && (X86_UP_APIC || ((X86_VISWS ||
> > SMP) && !X86_VOYAGER) || X86_GENERICARCH))
> >
> > config X86_IO_APIC
> > def_bool y
> > depends on X86_64 || (X86_32 && (X86_UP_IOAPIC || (SMP &&
> > !(X86_VISWS || X86_VOYAGER)) || X86_GENERICARCH))
> >
> > for 64bit, those are all set.
> >
> > for 32bit, may need to null stub if X86_IO_APIC is not set
> >
> > YH
> Fair Enough.

That does not answer the question what to do with a 32-bit kernel run on
a system that requires the fixup in the 64-bit mode. Papering over such
firmware bugs in the kernel encourages hardware vendors to maintain the
breakage.

Note that the I/O APIC comes out of reset with all the redirection
entries masked out, so if any error conditions are triggered before Linux
configures the chip, they are a result of a deliberate misprogramming done
to the chip by the firmware.

Does anyone have a reference to the original problem report which lead to
this workaround having been put in place?

Maciej

2008-06-10 19:09:29

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus

On Tue, Jun 10, 2008 at 6:30 AM, Maciej W. Rozycki <[email protected]> wrote:
> On Tue, 10 Jun 2008, Glauber Costa wrote:
>
>> >> Then again -- what if X86_LOCAL_APIC is set, but X86_IO_APIC is not?
>> >
>> > config X86_LOCAL_APIC
>> > def_bool y
>> > depends on X86_64 || (X86_32 && (X86_UP_APIC || ((X86_VISWS ||
>> > SMP) && !X86_VOYAGER) || X86_GENERICARCH))
>> >
>> > config X86_IO_APIC
>> > def_bool y
>> > depends on X86_64 || (X86_32 && (X86_UP_IOAPIC || (SMP &&
>> > !(X86_VISWS || X86_VOYAGER)) || X86_GENERICARCH))
>> >
>> > for 64bit, those are all set.
>> >
>> > for 32bit, may need to null stub if X86_IO_APIC is not set
>> >
>> > YH
>> Fair Enough.
>
> That does not answer the question what to do with a 32-bit kernel run on
> a system that requires the fixup in the 64-bit mode. Papering over such
> firmware bugs in the kernel encourages hardware vendors to maintain the
> breakage.

kernel should not assume io apic register is set right by firmware.
and kernel actually doesn't trust them, and clear the io apic registers.
that patch just move that early before enable error vector.

YH

2008-06-10 19:38:28

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus

On Tue, 10 Jun 2008, Yinghai Lu wrote:

> kernel should not assume io apic register is set right by firmware.

What is the basis of this assumption? Linux generally assumes the
chipset components have been placed by the firmware into a consistent
state. That does not necessarily mean suitable for Linux, hence the need
to reconfigure a bit here or there, but there should be no need to touch
components that are not going to be used by Linux directly. The I/O APIC
is no different.

> and kernel actually doesn't trust them, and clear the io apic registers.

The I/O APIC registers are cleared, because this is the only way you can
assure inconsistent configuration does not happen during reconfiguration.
For example two inputs using the same vector or set up into the ExtINTA
mode. The original intent of the code was not to paper over breakage in
the firmware. You are trying to change it and it can be done, but it has
to be justified well.

> that patch just move that early before enable error vector.

What I am saying repeatedly is clearing of the I/O APIC is not guaranteed
to happen for all the possible cases of Linux configuration. Which means
this is a partial solution only -- please try to propose a better one or
provide the original problem report so that someone else can have a look
at it. What's triggering the error interrupt for example? Is it
recoverable?

Maciej

2008-06-10 19:49:58

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus

On Tue, Jun 10, 2008 at 12:36 PM, Maciej W. Rozycki
<[email protected]> wrote:
> On Tue, 10 Jun 2008, Yinghai Lu wrote:
>
>> kernel should not assume io apic register is set right by firmware.
>
> What is the basis of this assumption? Linux generally assumes the
> chipset components have been placed by the firmware into a consistent
> state. That does not necessarily mean suitable for Linux, hence the need
> to reconfigure a bit here or there, but there should be no need to touch
> components that are not going to be used by Linux directly. The I/O APIC
> is no different.
>
>> and kernel actually doesn't trust them, and clear the io apic registers.
>
> The I/O APIC registers are cleared, because this is the only way you can
> assure inconsistent configuration does not happen during reconfiguration.
> For example two inputs using the same vector or set up into the ExtINTA
> mode. The original intent of the code was not to paper over breakage in
> the firmware. You are trying to change it and it can be done, but it has
> to be justified well.
>
>> that patch just move that early before enable error vector.
>
> What I am saying repeatedly is clearing of the I/O APIC is not guaranteed
> to happen for all the possible cases of Linux configuration. Which means
> this is a partial solution only -- please try to propose a better one or
> provide the original problem report so that someone else can have a look
> at it. What's triggering the error interrupt for example? Is it
> recoverable?

ExtINT is routed to ioapic pin0. but the dst is set to 0.
and the systems has multi sockets with quadcore cpu, so the apic id of boot cpu
is set to 4 instead of 0

YH

2008-06-11 00:31:26

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus

On Tue, 10 Jun 2008, Yinghai Lu wrote:

> ExtINT is routed to ioapic pin0. but the dst is set to 0.
> and the systems has multi sockets with quadcore cpu, so the apic id of boot cpu
> is set to 4 instead of 0

Thanks for the info. Let me understand the situation better: local APIC
IDs are preassigned by the firmware based on their "socket address" and
the socket where the lowest numbered quad would be is empty.
Nevertheless the firmware sets the destination ID of the ExtINTA interrupt
in the I/O APIC to 0 rather than the ID of the bootstrap CPU. Is that
correct?

But it would mean the Virtual Wire interrupt delivery would not work, or
is the I/O APIC setup redundant and the local APIC of the bootstrap CPU is
set up for ExtINTA delivery as well?

Maciej

2008-06-11 02:32:55

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus

On Tue, Jun 10, 2008 at 5:29 PM, Maciej W. Rozycki <[email protected]> wrote:
> On Tue, 10 Jun 2008, Yinghai Lu wrote:
>
>> ExtINT is routed to ioapic pin0. but the dst is set to 0.
>> and the systems has multi sockets with quadcore cpu, so the apic id of boot cpu
>> is set to 4 instead of 0
>
> Thanks for the info. Let me understand the situation better: local APIC
> IDs are preassigned by the firmware based on their "socket address" and
> the socket where the lowest numbered quad would be is empty.
> Nevertheless the firmware sets the destination ID of the ExtINTA interrupt
> in the I/O APIC to 0 rather than the ID of the bootstrap CPU. Is that
> correct?

Yes

after I asked bios engineer to change the dest apic id to 4, the error is gone.

before clear_IO_APIC()
number of MP IRQ sources: 15.
number of IO-APIC #0 registers: 24.
number of IO-APIC #1 registers: 7.
number of IO-APIC #2 registers: 7.
number of IO-APIC #3 registers: 24.
testing the IO APIC.......................

IO APIC #0......
.... register #00: 00000000
....... : physical APIC id: 00
.... register #01: 00170011
....... : max redirection entries: 0017
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 00000000
....... : arbitration: 00
.... IRQ redirection table:
NR Dst Mask Trig IRR Pol Stat Dmod Deli Vect:
00 004 0 0 0 0 0 0 7 00
01 000 1 0 0 0 0 0 0 00
02 000 1 0 0 0 0 0 0 00
03 000 1 0 0 0 0 0 0 00
04 000 1 0 0 0 0 0 0 00
05 000 1 0 0 0 0 0 0 00
06 000 1 0 0 0 0 0 0 00
07 000 1 0 0 0 0 0 0 00
08 000 1 0 0 0 0 0 0 00
09 000 1 0 0 0 0 0 0 00
0a 000 1 0 0 0 0 0 0 00
0b 000 1 0 0 0 0 0 0 00
0c 000 1 0 0 0 0 0 0 00
0d 000 1 0 0 0 0 0 0 00
0e 000 1 0 0 0 0 0 0 00
0f 000 1 0 0 0 0 0 0 00
10 000 1 0 0 0 0 0 0 00
11 000 1 0 0 0 0 0 0 00
12 000 1 0 0 0 0 0 0 00
13 000 1 0 0 0 0 0 0 00
14 000 1 0 0 0 0 0 0 00
15 000 1 0 0 0 0 0 0 00
16 000 1 0 0 0 0 0 0 00
17 000 1 0 0 0 0 0 0 00


>
> But it would mean the Virtual Wire interrupt delivery would not work, or
> is the I/O APIC setup redundant and the local APIC of the bootstrap CPU is
> set up for ExtINTA delivery as well?

it doesn't need to virtual wire for timer (ExtInt), timer is hpet and
routed to ioapic pin2.

Just know at first BIOS engineer refused to change that to 4, because
other os is not happy.

YH

2008-06-11 13:00:04

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH 11/15] x86: move enabling of io_apic to prepare_cpus

On Tue, 10 Jun 2008, Yinghai Lu wrote:

> > Thanks for the info. Let me understand the situation better: local APIC
> > IDs are preassigned by the firmware based on their "socket address" and
> > the socket where the lowest numbered quad would be is empty.
> > Nevertheless the firmware sets the destination ID of the ExtINTA interrupt
> > in the I/O APIC to 0 rather than the ID of the bootstrap CPU. Is that
> > correct?
>
> Yes
>
> after I asked bios engineer to change the dest apic id to 4, the error
> is gone.

Thanks for the clarification.

> > But it would mean the Virtual Wire interrupt delivery would not work, or
> > is the I/O APIC setup redundant and the local APIC of the bootstrap CPU is
> > set up for ExtINTA delivery as well?
>
> it doesn't need to virtual wire for timer (ExtInt), timer is hpet and
> routed to ioapic pin2.

That's not what I asked about -- the timer does not matter here. The
Virtual Wire mode is a configuration, where one input of one APIC in the
system is set up for the ExtINTA mode and acts transparently with the
system software having no need to know about it. Instead a pair of
legacy 8259A chips is used to deliver interrupts, including claiming the
INTA cycles, providing vectors and prioritising sources, as defined in the
PC/AT architecture.

Many pieces of software rely on the 8259A PICs, either because they
predate the APIC or because they have no means to make use of
multiprocessor features anyway. They include various versions of DOS
together with software run in that environment (as DOS programs quite
frequently drive hardware at the low level), many versions of the
Microsoft Windows system as well as other software. For these a legacy
mode, either the Virtual Wire mode, or a mode where 8259A interrupts are
delivered directly to one processor's INT line has to be implemented as
mandated both by the Multiprocessor Specification and the Advanced
Configuration and Power Interface Specification.

Coming back to my question -- how is such a mode implemented in the
affected system? Clearly the route through the I/O APIC is not going to
work and moreover, the chip clutters the system with broken interrupt
messages each time the 8259A signals an interrupt.

Please note Linux can use the Virtual Wire mode in the APIC/SMP mode too,
if requested by specifying the "noapic" command-line option -- have you
verified the option works correctly with the affected system?

> Just know at first BIOS engineer refused to change that to 4, because
> other os is not happy.

Well, this is just a confirmation my attitude is correct -- such problems
should not be papered over, because vendors will deny their existence
then. At least a complaint message should be printed so that users have
an opportunity to see it and ask their hardware supplier for an
explanation.

In this case, a workaround for the 64-bit mode happens to be quite cheap,
but it should be extended to cover the 32-bit mode as well. The only
solution for the 32-bit mode I have in mind would lead to a waste of
resources for many users of good hardware. And this because of somebody's
sloppiness -- as I have written -- this better be well justified.

Unless you have precise means to identify this system -- in that case I
think reconfiguring the bootstrap processor's local APIC ID to 0 would be
the right approach. Have you tried it?

Maciej