On kvm guest, the latest kernel will always print warning during kdump kernel boots
as below. The reaons is the legacy irq mode is disabled before jump to kexec/kdump
kernel. So in setup_local_APIC(), the do { xxx } while (queued && max_loops > 0)
can't handle if pending irq exists in APIC IRR since LAPIC is disabled. It will
terminate the do while loop finally when max_loops overflows by subtraction. Then
WARN_ON(max_loops <= 0) is triggered.
[ 0.001000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1467 setup_local_APIC+0x228/0x330
[ 0.001000] Modules linked in:
[ 0.001000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #3
[ 0.001000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
[ 0.001000] RIP: 0010:setup_local_APIC+0x228/0x330
[ 0.001000] RSP: 0000:ffffffffb6e03eb8 EFLAGS: 00010286
[ 0.001000] RAX: 0000009edb4c4d84 RBX: 0000000000000000 RCX: 00000000b099d800
[ 0.001000] RDX: 0000009e00000000 RSI: 0000000000000000 RDI: 0000000000000810
[ 0.001000] RBP: 0000000000000000 R08: ffffffffffffffff R09: 0000000000000001
[ 0.001000] R10: ffff98ce6a801c00 R11: 0761076d072f0776 R12: 0000000000000001
[ 0.001000] R13: 00000000000000f0 R14: 0000000000004000 R15: ffffffffffffc6ff
[ 0.001000] FS: 0000000000000000(0000) GS:ffff98ce6bc00000(0000) knlGS:0000000000000000
[ 0.001000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.001000] CR2: 00000000ffffffff CR3: 0000000022209000 CR4: 00000000000406b0
[ 0.001000] Call Trace:
[ 0.001000] apic_bsp_setup+0x56/0x74
[ 0.001000] x86_late_time_init+0x11/0x16
[ 0.001000] start_kernel+0x3c9/0x486
[ 0.001000] secondary_startup_64+0xa5/0xb0
[ 0.001000] Code: 00 85 c9 74 2d 0f 31 c1 e1 0a 48 c1 e2 20 41 89 cf 4c 03 7c 24 08 48 09 d0 49 29 c7 4c 89 3c 24 48 83 3c 24 00 0f 8f 8f fe ff ff <0f> ff e9 10 ff ff ff 48 83 2c 24 01 eb e7 48 83 c4 18 5b 5d 41
[ 0.001000] ---[ end trace b88e71b9a6ebebdd ]---
[ 0.001000] masked ExtINT on CPU#0
With patch 2/3 applied, the above warning disappeared. And with patch 2/3
applied, the issue mentioned in patch 1/3 can also be fixed because the LAPIC
has been set as ExtINT before jump to kdump kernel, while we had better set it
explicitly. Seems no reason not to enable legacy irq mode before jump to
kexec/kdump kernel, and can make it be consistent with normal kernel.
Patch 3/3 is doing clean up, I am fine if people think it's unnecessary.
Baoquan He (3):
x86/apic: Set up through LAPIC on boot CPU's LINT0 if ioapic is
disabled
x86/apic/kexec: Enable legacy irq mode before jump to kexec/kdump
kernel
x86/apic: Clean up the names of legacy irq mode setting related
functions
arch/x86/include/asm/apic.h | 2 +-
arch/x86/include/asm/io_apic.h | 6 +++---
arch/x86/kernel/apic/apic.c | 13 +++++++------
arch/x86/kernel/apic/io_apic.c | 25 ++++++++++++-------------
arch/x86/kernel/crash.c | 2 +-
arch/x86/kernel/machine_kexec_32.c | 15 +++++----------
arch/x86/kernel/machine_kexec_64.c | 15 +++++----------
arch/x86/kernel/reboot.c | 2 +-
arch/x86/kernel/x86_init.c | 2 +-
drivers/iommu/irq_remapping.c | 2 +-
10 files changed, 37 insertions(+), 47 deletions(-)
--
2.5.5
Kdump kernel will become very slow if 'noapic' is specified in kernel
command line. Normal kernel doesn't have this issue.
This is because the legacy irq mode is disabled in crashed kernel before
jump jump to kdump kernel since commit 522e664644 ("x86/apic: Disable I/O
APIC before shutdown of the local APIC") is merged. While in normal kernel,
the legacy irq mode has been set in BIOS.
So we need set the delivery mode AS ExtINT for LVT0 of boot CPU's LAPIC
explicitly if IO-APIC is disabled, to set up through-local-APIC.
Signed-off-by: Baoquan He <[email protected]>
---
arch/x86/kernel/apic/apic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 880441f24146..7e613fb90630 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1521,7 +1521,7 @@ void setup_local_APIC(void)
* TODO: set up through-local-APIC from through-I/O-APIC? --macro
*/
value = apic_read(APIC_LVT0) & APIC_LVT_MASKED;
- if (!cpu && (pic_mode || !value)) {
+ if (!cpu && (pic_mode || !value || skip_ioapic_setup)) {
value = APIC_DM_EXTINT;
apic_printk(APIC_VERBOSE, "enabled ExtINT on CPU#%d\n", cpu);
} else {
--
2.5.5
In commit
commit 522e66464467 ("x86/apic: Disable I/O APIC before shutdown of the local APIC").
lapic_shutdown() invocation is moved after disable_IO_APIC(). In fact
in disable_IO_APIC(), it not only calls clear_IO_APIC() to disable
IO-APIC, also sets sets LAPIC and IO-APIC to make system be PIC or
Virtual wire mode. While the above commit putting disable_IO_APIC earlier
causes local APIC is completely disabled. So the legacy irq mode is
disabled too before jump to kexec/kdump kernel.
In normal kernel it defaults to be PIC mode or Virtual Wire mode during
system initialization before APIC mode is enabled and this is done by
BIOS initialization. But kexec/kdump kernel won't go through BIOS, so
we should set system as PIC or Virtual Wire mode before jump to kdump
kernel code directly.
So let's take clear_IO_APIC out from disable_IO_APIC and rename
disable_IO_APIC as switch_to_legacy_irq_mode. Then only call clear_IO_APIC
when IO-APIC need be disabled. And call switch_to_legacy_irq_mode before
kexec/kdump jumping.
Signed-off-by: Baoquan He <[email protected]>
---
arch/x86/include/asm/io_apic.h | 3 ++-
arch/x86/kernel/apic/io_apic.c | 12 ++++--------
arch/x86/kernel/crash.c | 2 +-
arch/x86/kernel/machine_kexec_32.c | 15 +++++----------
arch/x86/kernel/machine_kexec_64.c | 15 +++++----------
arch/x86/kernel/reboot.c | 2 +-
6 files changed, 18 insertions(+), 31 deletions(-)
diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index a8834dd546cd..e38ad3863a2c 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -192,7 +192,8 @@ static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
extern void setup_IO_APIC(void);
extern void enable_IO_APIC(void);
-extern void disable_IO_APIC(void);
+extern void clear_IO_APIC (void);
+extern void switch_to_legacy_irq_mode(void);
extern int IO_APIC_get_PCI_irq_vector(int bus, int devfn, int pin);
extern void print_IO_APICs(void);
#else /* !CONFIG_X86_IO_APIC */
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 8a7963421460..a47aa915d18c 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -587,7 +587,7 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
mpc_ioapic_id(apic), pin);
}
-static void clear_IO_APIC (void)
+void clear_IO_APIC (void)
{
int apic, pin;
@@ -1439,15 +1439,11 @@ void native_disable_io_apic(void)
}
/*
- * Not an __init, needed by the reboot code
+ * Not an __init, needed by kexec/kdump code.
+ * For safety IO-APIC and Local APIC need be cleared before this.
*/
-void disable_IO_APIC(void)
+void switch_to_legacy_irq_mode(void)
{
- /*
- * Clear the IO-APIC before rebooting:
- */
- clear_IO_APIC();
-
if (!nr_legacy_irqs())
return;
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 10e74d4778a1..318ffeaaf55a 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -199,7 +199,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
#ifdef CONFIG_X86_IO_APIC
/* Prevent crash_kexec() from deadlocking on ioapic_lock. */
ioapic_zap_locks();
- disable_IO_APIC();
+ clear_IO_APIC();
#endif
lapic_shutdown();
#ifdef CONFIG_HPET_TIMER
diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
index edfede768688..7ab10d930cc6 100644
--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -190,18 +190,13 @@ void machine_kexec(struct kimage *image)
local_irq_disable();
hw_breakpoint_disable();
- if (image->preserve_context) {
#ifdef CONFIG_X86_IO_APIC
- /*
- * We need to put APICs in legacy mode so that we can
- * get timer interrupts in second kernel. kexec/kdump
- * paths already have calls to disable_IO_APIC() in
- * one form or other. kexec jump path also need
- * one.
- */
- disable_IO_APIC();
+ /*
+ * We need to put APICs in legacy mode so that we can
+ * get timer interrupts in second kernel.
+ */
+ switch_to_legacy_irq_mode();
#endif
- }
control_page = page_address(image->control_code_page);
memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE);
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 1f790cf9d38f..b5c0cbed6791 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -288,18 +288,13 @@ void machine_kexec(struct kimage *image)
local_irq_disable();
hw_breakpoint_disable();
- if (image->preserve_context) {
#ifdef CONFIG_X86_IO_APIC
- /*
- * We need to put APICs in legacy mode so that we can
- * get timer interrupts in second kernel. kexec/kdump
- * paths already have calls to disable_IO_APIC() in
- * one form or other. kexec jump path also need
- * one.
- */
- disable_IO_APIC();
+ /*
+ * We need to put APICs in legacy mode so that we can
+ * get timer interrupts in second kernel.
+ */
+ switch_to_legacy_irq_mode();
#endif
- }
control_page = page_address(image->control_code_page) + PAGE_SIZE;
memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE);
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 2126b9d27c34..b70cc0f38a29 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -666,7 +666,7 @@ void native_machine_shutdown(void)
* Even without the erratum, it still makes sense to quiet IO APIC
* before disabling Local APIC.
*/
- disable_IO_APIC();
+ clear_IO_APIC();
#endif
#ifdef CONFIG_SMP
--
2.5.5
X86 MP spec defines 3 different interrupt modes:
1) PIC Mode—bypasses all APIC components and forces the system to
operate in single-processor mode.
2) Virtual Wire Mode—uses an APIC as a virtual wire, but otherwise
operates the same as PIC Mode.
3) Symmetric I/O Mode—enables the system to operate with more than
one processor.
The current disconnect_bsp_APIC includes two parts: one is to set system
as PIC mode if it's available, the other is to change system back to
Virtual Wire mode. Only PIC mode will detach the APIC from the interrupt
system, Virtual Wire mode doesn't.
Besides Virutal Wire mode has two kinds: one is only setting Local APIC
as Virtual Wire mode and interrupts are delivered from the PIC to the
CPU which Local APIC connected to, the other is both Loca APIC and IO-APIC
need be set as Virtual Wire mode.
So based on above knowledge, take IO-APIC Virtual Wire mode setting code
out and wrap it inot a new function ioapic_set_virtual_wire_mode. Meanwhile
change the name of disconnect_bsp_APIC as lapic_set_legacy_irq_mode. These
makes the legacy irq mode setting more understandable.
Signed-off-by: Baoquan He <[email protected]>
---
arch/x86/include/asm/apic.h | 2 +-
arch/x86/include/asm/io_apic.h | 5 ++---
arch/x86/kernel/apic/apic.c | 11 ++++++-----
arch/x86/kernel/apic/io_apic.c | 17 ++++++++++-------
arch/x86/kernel/x86_init.c | 2 +-
drivers/iommu/irq_remapping.c | 2 +-
6 files changed, 21 insertions(+), 18 deletions(-)
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index a9e57f08bfa6..004c48bc8bc8 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -132,7 +132,7 @@ extern int get_physical_broadcast(void);
extern int lapic_get_maxlvt(void);
extern void clear_local_APIC(void);
-extern void disconnect_bsp_APIC(int virt_wire_setup);
+extern void lapic_set_legacy_irq_mode(int virt_wire_setup);
extern void disable_local_APIC(void);
extern void lapic_shutdown(void);
extern void sync_Arb_IDs(void);
diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
index e38ad3863a2c..6800dcea1d21 100644
--- a/arch/x86/include/asm/io_apic.h
+++ b/arch/x86/include/asm/io_apic.h
@@ -183,7 +183,7 @@ extern void disable_ioapic_support(void);
extern void __init io_apic_init_mappings(void);
extern unsigned int native_io_apic_read(unsigned int apic, unsigned int reg);
-extern void native_disable_io_apic(void);
+extern void switch_to_legacy_irq_mode(void);
static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
{
@@ -193,7 +193,6 @@ static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
extern void setup_IO_APIC(void);
extern void enable_IO_APIC(void);
extern void clear_IO_APIC (void);
-extern void switch_to_legacy_irq_mode(void);
extern int IO_APIC_get_PCI_irq_vector(int bus, int devfn, int pin);
extern void print_IO_APICs(void);
#else /* !CONFIG_X86_IO_APIC */
@@ -229,7 +228,7 @@ static inline void mp_save_irq(struct mpc_intsrc *m) { }
static inline void disable_ioapic_support(void) { }
static inline void io_apic_init_mappings(void) { }
#define native_io_apic_read NULL
-#define native_disable_io_apic NULL
+#define switch_to_legacy_irq_mode NULL
static inline void setup_IO_APIC(void) { }
static inline void enable_IO_APIC(void) { }
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 7e613fb90630..301d90d4a0c3 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2050,13 +2050,14 @@ static void __init connect_bsp_APIC(void)
}
/**
- * disconnect_bsp_APIC - detach the APIC from the interrupt system
- * @virt_wire_setup: indicates, whether virtual wire mode is selected
+ * lapic_set_legacy_irq_mode - switch Local APIC back to be legacy irq mode.
+ * @virt_wire_setup: indicates, whether virtual wire mode is selected
*
- * Virtual wire mode is necessary to deliver legacy interrupts even when the
- * APIC is disabled.
+ * If PIC mode is available, LAPIC need be disconnected with CPU. Otherwise
+ * enable LAPIC and set it to be virtual wire mode. However if IO-APIC has
+ * been virtual wire mode, LVT0 of LAPIC need be masked.
*/
-void disconnect_bsp_APIC(int virt_wire_setup)
+void lapic_set_legacy_irq_mode(int virt_wire_setup)
{
unsigned int value;
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index a47aa915d18c..b7fd4236b0e5 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1410,7 +1410,7 @@ void __init enable_IO_APIC(void)
clear_IO_APIC();
}
-void native_disable_io_apic(void)
+static void ioapic_set_virtual_wire_mode(void)
{
/*
* If the i8259 is routed through an IOAPIC
@@ -1433,21 +1433,24 @@ void native_disable_io_apic(void)
*/
ioapic_write_entry(ioapic_i8259.apic, ioapic_i8259.pin, entry);
}
-
- if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
- disconnect_bsp_APIC(ioapic_i8259.pin != -1);
}
/*
- * Not an __init, needed by kexec/kdump code.
- * For safety IO-APIC and Local APIC need be cleared before this.
+ * In legacy irq mode, full DOS compatibility with the uniprocessor PC/AT is
+ * provided by using the APICs in conjunction with standard 8259A-equivalent
+ * programmable interrupt controllers (PICs). It's necessary to deliver legacy
+ * interrupts even when APIC mode is not enabled. This is required by kexec/
+ * kdump before enter into the 2nd kernel.
*/
void switch_to_legacy_irq_mode(void)
{
if (!nr_legacy_irqs())
return;
- x86_io_apic_ops.disable();
+ ioapic_set_virtual_wire_mode();
+
+ if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
+ lapic_set_legacy_irq_mode(ioapic_i8259.pin != -1);
}
#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 1151ccd72ce9..c30f0f273dbd 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -148,5 +148,5 @@ void arch_restore_msi_irqs(struct pci_dev *dev)
struct x86_io_apic_ops x86_io_apic_ops __ro_after_init = {
.read = native_io_apic_read,
- .disable = native_disable_io_apic,
+ .disable = switch_to_legacy_irq_mode,
};
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 49721b4e1975..751472ddf536 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -37,7 +37,7 @@ static void irq_remapping_disable_io_apic(void)
* now.
*/
if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
- disconnect_bsp_APIC(0);
+ lapic_set_legacy_irq_mode(0);
}
static void __init irq_remapping_modify_x86_ops(void)
--
2.5.5
Hi all,
PING!
(Add Fenghua and Eric to this thread)
On 01/05/18 at 11:42am, Baoquan He wrote:
> On kvm guest, the latest kernel will always print warning during kdump kernel boots
> as below. The reaons is the legacy irq mode is disabled before jump to kexec/kdump
> kernel. So in setup_local_APIC(), the do { xxx } while (queued && max_loops > 0)
> can't handle if pending irq exists in APIC IRR since LAPIC is disabled. It will
> terminate the do while loop finally when max_loops overflows by subtraction. Then
> WARN_ON(max_loops <= 0) is triggered.
>
> [ 0.001000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1467 setup_local_APIC+0x228/0x330
> [ 0.001000] Modules linked in:
> [ 0.001000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #3
> [ 0.001000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
> [ 0.001000] RIP: 0010:setup_local_APIC+0x228/0x330
> [ 0.001000] RSP: 0000:ffffffffb6e03eb8 EFLAGS: 00010286
> [ 0.001000] RAX: 0000009edb4c4d84 RBX: 0000000000000000 RCX: 00000000b099d800
> [ 0.001000] RDX: 0000009e00000000 RSI: 0000000000000000 RDI: 0000000000000810
> [ 0.001000] RBP: 0000000000000000 R08: ffffffffffffffff R09: 0000000000000001
> [ 0.001000] R10: ffff98ce6a801c00 R11: 0761076d072f0776 R12: 0000000000000001
> [ 0.001000] R13: 00000000000000f0 R14: 0000000000004000 R15: ffffffffffffc6ff
> [ 0.001000] FS: 0000000000000000(0000) GS:ffff98ce6bc00000(0000) knlGS:0000000000000000
> [ 0.001000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.001000] CR2: 00000000ffffffff CR3: 0000000022209000 CR4: 00000000000406b0
> [ 0.001000] Call Trace:
> [ 0.001000] apic_bsp_setup+0x56/0x74
> [ 0.001000] x86_late_time_init+0x11/0x16
> [ 0.001000] start_kernel+0x3c9/0x486
> [ 0.001000] secondary_startup_64+0xa5/0xb0
> [ 0.001000] Code: 00 85 c9 74 2d 0f 31 c1 e1 0a 48 c1 e2 20 41 89 cf 4c 03 7c 24 08 48 09 d0 49 29 c7 4c 89 3c 24 48 83 3c 24 00 0f 8f 8f fe ff ff <0f> ff e9 10 ff ff ff 48 83 2c 24 01 eb e7 48 83 c4 18 5b 5d 41
> [ 0.001000] ---[ end trace b88e71b9a6ebebdd ]---
> [ 0.001000] masked ExtINT on CPU#0
>
> With patch 2/3 applied, the above warning disappeared. And with patch 2/3
> applied, the issue mentioned in patch 1/3 can also be fixed because the LAPIC
> has been set as ExtINT before jump to kdump kernel, while we had better set it
> explicitly. Seems no reason not to enable legacy irq mode before jump to
> kexec/kdump kernel, and can make it be consistent with normal kernel.
>
> Patch 3/3 is doing clean up, I am fine if people think it's unnecessary.
>
>
> Baoquan He (3):
> x86/apic: Set up through LAPIC on boot CPU's LINT0 if ioapic is
> disabled
> x86/apic/kexec: Enable legacy irq mode before jump to kexec/kdump
> kernel
> x86/apic: Clean up the names of legacy irq mode setting related
> functions
>
> arch/x86/include/asm/apic.h | 2 +-
> arch/x86/include/asm/io_apic.h | 6 +++---
> arch/x86/kernel/apic/apic.c | 13 +++++++------
> arch/x86/kernel/apic/io_apic.c | 25 ++++++++++++-------------
> arch/x86/kernel/crash.c | 2 +-
> arch/x86/kernel/machine_kexec_32.c | 15 +++++----------
> arch/x86/kernel/machine_kexec_64.c | 15 +++++----------
> arch/x86/kernel/reboot.c | 2 +-
> arch/x86/kernel/x86_init.c | 2 +-
> drivers/iommu/irq_remapping.c | 2 +-
> 10 files changed, 37 insertions(+), 47 deletions(-)
>
> --
> 2.5.5
>
Baoquan He <[email protected]> writes:
> Hi all,
>
> PING!
>
> (Add Fenghua and Eric to this thread)
Absolutely not. There are very strong reasons not to anything in the
kexec on panic path that we don't need to.
Among them legacy mode only works when the kernel starts up on the boot
cpu and kexec on panic absolutely can not guarantee the kernel will be
crashing and thus the new kernel starting on the boot cpu.
What we need to be doing (and I have seen patches around to this effect)
is teaching the kernel to always use non-legacy mode. Which should
remove all of these issues as then legacy mode won't matter in the
slightest.
Eric
>
> On 01/05/18 at 11:42am, Baoquan He wrote:
>> On kvm guest, the latest kernel will always print warning during kdump kernel boots
>> as below. The reaons is the legacy irq mode is disabled before jump to kexec/kdump
>> kernel. So in setup_local_APIC(), the do { xxx } while (queued && max_loops > 0)
>> can't handle if pending irq exists in APIC IRR since LAPIC is disabled. It will
>> terminate the do while loop finally when max_loops overflows by subtraction. Then
>> WARN_ON(max_loops <= 0) is triggered.
>>
>> [ 0.001000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1467 setup_local_APIC+0x228/0x330
>> [ 0.001000] Modules linked in:
>> [ 0.001000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #3
>> [ 0.001000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
>> [ 0.001000] RIP: 0010:setup_local_APIC+0x228/0x330
>> [ 0.001000] RSP: 0000:ffffffffb6e03eb8 EFLAGS: 00010286
>> [ 0.001000] RAX: 0000009edb4c4d84 RBX: 0000000000000000 RCX: 00000000b099d800
>> [ 0.001000] RDX: 0000009e00000000 RSI: 0000000000000000 RDI: 0000000000000810
>> [ 0.001000] RBP: 0000000000000000 R08: ffffffffffffffff R09: 0000000000000001
>> [ 0.001000] R10: ffff98ce6a801c00 R11: 0761076d072f0776 R12: 0000000000000001
>> [ 0.001000] R13: 00000000000000f0 R14: 0000000000004000 R15: ffffffffffffc6ff
>> [ 0.001000] FS: 0000000000000000(0000) GS:ffff98ce6bc00000(0000) knlGS:0000000000000000
>> [ 0.001000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 0.001000] CR2: 00000000ffffffff CR3: 0000000022209000 CR4: 00000000000406b0
>> [ 0.001000] Call Trace:
>> [ 0.001000] apic_bsp_setup+0x56/0x74
>> [ 0.001000] x86_late_time_init+0x11/0x16
>> [ 0.001000] start_kernel+0x3c9/0x486
>> [ 0.001000] secondary_startup_64+0xa5/0xb0
>> [ 0.001000] Code: 00 85 c9 74 2d 0f 31 c1 e1 0a 48 c1 e2 20 41 89 cf 4c 03 7c 24 08 48 09 d0 49 29 c7 4c 89 3c 24 48 83 3c 24 00 0f 8f 8f fe ff ff <0f> ff e9 10 ff ff ff 48 83 2c 24 01 eb e7 48 83 c4 18 5b 5d 41
>> [ 0.001000] ---[ end trace b88e71b9a6ebebdd ]---
>> [ 0.001000] masked ExtINT on CPU#0
>>
>> With patch 2/3 applied, the above warning disappeared. And with patch 2/3
>> applied, the issue mentioned in patch 1/3 can also be fixed because the LAPIC
>> has been set as ExtINT before jump to kdump kernel, while we had better set it
>> explicitly. Seems no reason not to enable legacy irq mode before jump to
>> kexec/kdump kernel, and can make it be consistent with normal kernel.
>>
>> Patch 3/3 is doing clean up, I am fine if people think it's unnecessary.
>>
>>
>> Baoquan He (3):
>> x86/apic: Set up through LAPIC on boot CPU's LINT0 if ioapic is
>> disabled
>> x86/apic/kexec: Enable legacy irq mode before jump to kexec/kdump
>> kernel
>> x86/apic: Clean up the names of legacy irq mode setting related
>> functions
>>
>> arch/x86/include/asm/apic.h | 2 +-
>> arch/x86/include/asm/io_apic.h | 6 +++---
>> arch/x86/kernel/apic/apic.c | 13 +++++++------
>> arch/x86/kernel/apic/io_apic.c | 25 ++++++++++++-------------
>> arch/x86/kernel/crash.c | 2 +-
>> arch/x86/kernel/machine_kexec_32.c | 15 +++++----------
>> arch/x86/kernel/machine_kexec_64.c | 15 +++++----------
>> arch/x86/kernel/reboot.c | 2 +-
>> arch/x86/kernel/x86_init.c | 2 +-
>> drivers/iommu/irq_remapping.c | 2 +-
>> 10 files changed, 37 insertions(+), 47 deletions(-)
>>
>> --
>> 2.5.5
>>
Baoquan He <[email protected]> writes:
> Hi all,
>
> PING!
>
> (Add Fenghua and Eric to this thread)
>
> On 01/05/18 at 11:42am, Baoquan He wrote:
>> On kvm guest, the latest kernel will always print warning during kdump kernel boots
>> as below. The reaons is the legacy irq mode is disabled before jump to kexec/kdump
>> kernel. So in setup_local_APIC(), the do { xxx } while (queued && max_loops > 0)
>> can't handle if pending irq exists in APIC IRR since LAPIC is disabled. It will
>> terminate the do while loop finally when max_loops overflows by subtraction. Then
>> WARN_ON(max_loops <= 0) is triggered.
Overall this looks like the code is setup_local_APIC is working largely
as designed. It does run into a snag so it warns.
Which leaves the question: Does QEMU have buggy APIC emulation in this
case or is that loop simply incapble of dealing with queued interrupts
in APIC_IRR.
>>
>> [ 0.001000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1467 setup_local_APIC+0x228/0x330
>> [ 0.001000] Modules linked in:
>> [ 0.001000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #3
>> [ 0.001000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
>> [ 0.001000] RIP: 0010:setup_local_APIC+0x228/0x330
>> [ 0.001000] RSP: 0000:ffffffffb6e03eb8 EFLAGS: 00010286
>> [ 0.001000] RAX: 0000009edb4c4d84 RBX: 0000000000000000 RCX: 00000000b099d800
>> [ 0.001000] RDX: 0000009e00000000 RSI: 0000000000000000 RDI: 0000000000000810
>> [ 0.001000] RBP: 0000000000000000 R08: ffffffffffffffff R09: 0000000000000001
>> [ 0.001000] R10: ffff98ce6a801c00 R11: 0761076d072f0776 R12: 0000000000000001
>> [ 0.001000] R13: 00000000000000f0 R14: 0000000000004000 R15: ffffffffffffc6ff
>> [ 0.001000] FS: 0000000000000000(0000) GS:ffff98ce6bc00000(0000) knlGS:0000000000000000
>> [ 0.001000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 0.001000] CR2: 00000000ffffffff CR3: 0000000022209000 CR4: 00000000000406b0
>> [ 0.001000] Call Trace:
>> [ 0.001000] apic_bsp_setup+0x56/0x74
>> [ 0.001000] x86_late_time_init+0x11/0x16
>> [ 0.001000] start_kernel+0x3c9/0x486
>> [ 0.001000] secondary_startup_64+0xa5/0xb0
>> [ 0.001000] Code: 00 85 c9 74 2d 0f 31 c1 e1 0a 48 c1 e2 20 41 89 cf 4c 03 7c 24 08 48 09 d0 49 29 c7 4c 89 3c 24 48 83 3c 24 00 0f 8f 8f fe ff ff <0f> ff e9 10 ff ff ff 48 83 2c 24 01 eb e7 48 83 c4 18 5b 5d 41
>> [ 0.001000] ---[ end trace b88e71b9a6ebebdd ]---
>> [ 0.001000] masked ExtINT on CPU#0
>>
>> With patch 2/3 applied, the above warning disappeared. And with patch 2/3
>> applied, the issue mentioned in patch 1/3 can also be fixed because the LAPIC
>> has been set as ExtINT before jump to kdump kernel, while we had better set it
>> explicitly. Seems no reason not to enable legacy irq mode before jump to
>> kexec/kdump kernel, and can make it be consistent with normal kernel.
>>
>> Patch 3/3 is doing clean up, I am fine if people think it's unnecessary.
>>
I don't see these patches so I am simply going on their description.
>> Baoquan He (3):
>> x86/apic: Set up through LAPIC on boot CPU's LINT0 if ioapic is
>> disabled
*scratches my head* Are you booging the kexec on panic kernel with
apics disabled? When the previous kernel had apics enabled?
That makes my head really hurt if you are. Don't do that.
Eric
On 01/11/18 at 01:05pm, Eric W. Biederman wrote:
> Baoquan He <[email protected]> writes:
>
> > Hi all,
> >
> > PING!
> >
> > (Add Fenghua and Eric to this thread)
> >
> > On 01/05/18 at 11:42am, Baoquan He wrote:
> >> On kvm guest, the latest kernel will always print warning during kdump kernel boots
> >> as below. The reaons is the legacy irq mode is disabled before jump to kexec/kdump
> >> kernel. So in setup_local_APIC(), the do { xxx } while (queued && max_loops > 0)
> >> can't handle if pending irq exists in APIC IRR since LAPIC is disabled. It will
> >> terminate the do while loop finally when max_loops overflows by subtraction. Then
> >> WARN_ON(max_loops <= 0) is triggered.
>
> Overall this looks like the code is setup_local_APIC is working largely
> as designed. It does run into a snag so it warns.
>
> Which leaves the question: Does QEMU have buggy APIC emulation in this
> case or is that loop simply incapble of dealing with queued interrupts
> in APIC_IRR.
Thanks a lot for looking into this, Eric!
Yes, as you said, setup_local_APIC() is working well. It assumes the
current apic can handle the queued interrupts in APIC_IRR. However,
in the current native_machine_crash_shutdown(), it calls
lapic_shutdown() which will invoke disable_local_APIC() to disable APIC
completely with below code. Then when kdump kernel comes into
setup_local_APIC(), the queued interrupts in APIC_IRR can not be handled
at all.
void disable_local_APIC(void)
{
......
/*
* Disable APIC (implies clearing of registers
* for 82489DX!).
*/
value = apic_read(APIC_SPIV);
value &= ~APIC_SPIV_APIC_ENABLED;
apic_write(APIC_SPIV, value);
}
With legacy irq mode enabled before jump to kdump kernel,
setup_local_APIC() can handle it well.
So if we decide to disable legacy mode before jump to kdump kernel, we
need remove the do { xxx } while (queued && max_loops > 0) code block
in setup_local_APIC(), and need change disable_IO_APIC() too since it is
doing thing which does not match its name. Just leave those pending irqs
till final apic mode is setup.
>
> >>
> >> [ 0.001000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1467 setup_local_APIC+0x228/0x330
> >> [ 0.001000] Modules linked in:
> >> [ 0.001000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #3
> >> [ 0.001000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
> >> [ 0.001000] RIP: 0010:setup_local_APIC+0x228/0x330
> >> [ 0.001000] RSP: 0000:ffffffffb6e03eb8 EFLAGS: 00010286
> >> [ 0.001000] RAX: 0000009edb4c4d84 RBX: 0000000000000000 RCX: 00000000b099d800
> >> [ 0.001000] RDX: 0000009e00000000 RSI: 0000000000000000 RDI: 0000000000000810
> >> [ 0.001000] RBP: 0000000000000000 R08: ffffffffffffffff R09: 0000000000000001
> >> [ 0.001000] R10: ffff98ce6a801c00 R11: 0761076d072f0776 R12: 0000000000000001
> >> [ 0.001000] R13: 00000000000000f0 R14: 0000000000004000 R15: ffffffffffffc6ff
> >> [ 0.001000] FS: 0000000000000000(0000) GS:ffff98ce6bc00000(0000) knlGS:0000000000000000
> >> [ 0.001000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [ 0.001000] CR2: 00000000ffffffff CR3: 0000000022209000 CR4: 00000000000406b0
> >> [ 0.001000] Call Trace:
> >> [ 0.001000] apic_bsp_setup+0x56/0x74
> >> [ 0.001000] x86_late_time_init+0x11/0x16
> >> [ 0.001000] start_kernel+0x3c9/0x486
> >> [ 0.001000] secondary_startup_64+0xa5/0xb0
> >> [ 0.001000] Code: 00 85 c9 74 2d 0f 31 c1 e1 0a 48 c1 e2 20 41 89 cf 4c 03 7c 24 08 48 09 d0 49 29 c7 4c 89 3c 24 48 83 3c 24 00 0f 8f 8f fe ff ff <0f> ff e9 10 ff ff ff 48 83 2c 24 01 eb e7 48 83 c4 18 5b 5d 41
> >> [ 0.001000] ---[ end trace b88e71b9a6ebebdd ]---
> >> [ 0.001000] masked ExtINT on CPU#0
> >>
> >> With patch 2/3 applied, the above warning disappeared. And with patch 2/3
> >> applied, the issue mentioned in patch 1/3 can also be fixed because the LAPIC
> >> has been set as ExtINT before jump to kdump kernel, while we had better set it
> >> explicitly. Seems no reason not to enable legacy irq mode before jump to
> >> kexec/kdump kernel, and can make it be consistent with normal kernel.
> >>
> >> Patch 3/3 is doing clean up, I am fine if people think it's unnecessary.
> >>
>
> I don't see these patches so I am simply going on their description.
>
> >> Baoquan He (3):
> >> x86/apic: Set up through LAPIC on boot CPU's LINT0 if ioapic is
> >> disabled
>
> *scratches my head* Are you booging the kexec on panic kernel with
~~~ typo, should be 'booting'?
> apics disabled? When the previous kernel had apics enabled?
> That makes my head really hurt if you are. Don't do that.
No, not like that. Just the kernel paratmeter 'noapic' is really
misleading. It only disables the IO-APIC using in system. I got this bug
reported, 1st kernel works well with 'noapic' added, while kdump kernel
won't.
*****************************************************************
noapic [SMP,APIC] Tells the kernel to not make use of any
IOAPICs that may be present in the system.
Thanks
Baoquan
CC Eric
On 01/05/18 at 12:37pm, Baoquan He wrote:
> Kdump kernel will become very slow if 'noapic' is specified in kernel
> command line. Normal kernel doesn't have this issue.
>
> This is because the legacy irq mode is disabled in crashed kernel before
> jump jump to kdump kernel since commit 522e664644 ("x86/apic: Disable I/O
> APIC before shutdown of the local APIC") is merged. While in normal kernel,
> the legacy irq mode has been set in BIOS.
>
> So we need set the delivery mode AS ExtINT for LVT0 of boot CPU's LAPIC
> explicitly if IO-APIC is disabled, to set up through-local-APIC.
>
> Signed-off-by: Baoquan He <[email protected]>
> ---
> arch/x86/kernel/apic/apic.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 880441f24146..7e613fb90630 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1521,7 +1521,7 @@ void setup_local_APIC(void)
> * TODO: set up through-local-APIC from through-I/O-APIC? --macro
> */
> value = apic_read(APIC_LVT0) & APIC_LVT_MASKED;
> - if (!cpu && (pic_mode || !value)) {
> + if (!cpu && (pic_mode || !value || skip_ioapic_setup)) {
> value = APIC_DM_EXTINT;
> apic_printk(APIC_VERBOSE, "enabled ExtINT on CPU#%d\n", cpu);
> } else {
> --
> 2.5.5
>
CC Eric
On 01/05/18 at 12:38pm, Baoquan He wrote:
> In commit
>
> commit 522e66464467 ("x86/apic: Disable I/O APIC before shutdown of the local APIC").
>
> lapic_shutdown() invocation is moved after disable_IO_APIC(). In fact
> in disable_IO_APIC(), it not only calls clear_IO_APIC() to disable
> IO-APIC, also sets sets LAPIC and IO-APIC to make system be PIC or
> Virtual wire mode. While the above commit putting disable_IO_APIC earlier
> causes local APIC is completely disabled. So the legacy irq mode is
> disabled too before jump to kexec/kdump kernel.
>
> In normal kernel it defaults to be PIC mode or Virtual Wire mode during
> system initialization before APIC mode is enabled and this is done by
> BIOS initialization. But kexec/kdump kernel won't go through BIOS, so
> we should set system as PIC or Virtual Wire mode before jump to kdump
> kernel code directly.
>
> So let's take clear_IO_APIC out from disable_IO_APIC and rename
> disable_IO_APIC as switch_to_legacy_irq_mode. Then only call clear_IO_APIC
> when IO-APIC need be disabled. And call switch_to_legacy_irq_mode before
> kexec/kdump jumping.
>
> Signed-off-by: Baoquan He <[email protected]>
> ---
> arch/x86/include/asm/io_apic.h | 3 ++-
> arch/x86/kernel/apic/io_apic.c | 12 ++++--------
> arch/x86/kernel/crash.c | 2 +-
> arch/x86/kernel/machine_kexec_32.c | 15 +++++----------
> arch/x86/kernel/machine_kexec_64.c | 15 +++++----------
> arch/x86/kernel/reboot.c | 2 +-
> 6 files changed, 18 insertions(+), 31 deletions(-)
>
> diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
> index a8834dd546cd..e38ad3863a2c 100644
> --- a/arch/x86/include/asm/io_apic.h
> +++ b/arch/x86/include/asm/io_apic.h
> @@ -192,7 +192,8 @@ static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
>
> extern void setup_IO_APIC(void);
> extern void enable_IO_APIC(void);
> -extern void disable_IO_APIC(void);
> +extern void clear_IO_APIC (void);
> +extern void switch_to_legacy_irq_mode(void);
> extern int IO_APIC_get_PCI_irq_vector(int bus, int devfn, int pin);
> extern void print_IO_APICs(void);
> #else /* !CONFIG_X86_IO_APIC */
> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index 8a7963421460..a47aa915d18c 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -587,7 +587,7 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
> mpc_ioapic_id(apic), pin);
> }
>
> -static void clear_IO_APIC (void)
> +void clear_IO_APIC (void)
> {
> int apic, pin;
>
> @@ -1439,15 +1439,11 @@ void native_disable_io_apic(void)
> }
>
> /*
> - * Not an __init, needed by the reboot code
> + * Not an __init, needed by kexec/kdump code.
> + * For safety IO-APIC and Local APIC need be cleared before this.
> */
> -void disable_IO_APIC(void)
> +void switch_to_legacy_irq_mode(void)
> {
> - /*
> - * Clear the IO-APIC before rebooting:
> - */
> - clear_IO_APIC();
> -
> if (!nr_legacy_irqs())
> return;
>
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 10e74d4778a1..318ffeaaf55a 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -199,7 +199,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> #ifdef CONFIG_X86_IO_APIC
> /* Prevent crash_kexec() from deadlocking on ioapic_lock. */
> ioapic_zap_locks();
> - disable_IO_APIC();
> + clear_IO_APIC();
> #endif
> lapic_shutdown();
> #ifdef CONFIG_HPET_TIMER
> diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
> index edfede768688..7ab10d930cc6 100644
> --- a/arch/x86/kernel/machine_kexec_32.c
> +++ b/arch/x86/kernel/machine_kexec_32.c
> @@ -190,18 +190,13 @@ void machine_kexec(struct kimage *image)
> local_irq_disable();
> hw_breakpoint_disable();
>
> - if (image->preserve_context) {
> #ifdef CONFIG_X86_IO_APIC
> - /*
> - * We need to put APICs in legacy mode so that we can
> - * get timer interrupts in second kernel. kexec/kdump
> - * paths already have calls to disable_IO_APIC() in
> - * one form or other. kexec jump path also need
> - * one.
> - */
> - disable_IO_APIC();
> + /*
> + * We need to put APICs in legacy mode so that we can
> + * get timer interrupts in second kernel.
> + */
> + switch_to_legacy_irq_mode();
> #endif
> - }
>
> control_page = page_address(image->control_code_page);
> memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE);
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 1f790cf9d38f..b5c0cbed6791 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -288,18 +288,13 @@ void machine_kexec(struct kimage *image)
> local_irq_disable();
> hw_breakpoint_disable();
>
> - if (image->preserve_context) {
> #ifdef CONFIG_X86_IO_APIC
> - /*
> - * We need to put APICs in legacy mode so that we can
> - * get timer interrupts in second kernel. kexec/kdump
> - * paths already have calls to disable_IO_APIC() in
> - * one form or other. kexec jump path also need
> - * one.
> - */
> - disable_IO_APIC();
> + /*
> + * We need to put APICs in legacy mode so that we can
> + * get timer interrupts in second kernel.
> + */
> + switch_to_legacy_irq_mode();
> #endif
> - }
>
> control_page = page_address(image->control_code_page) + PAGE_SIZE;
> memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE);
> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> index 2126b9d27c34..b70cc0f38a29 100644
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -666,7 +666,7 @@ void native_machine_shutdown(void)
> * Even without the erratum, it still makes sense to quiet IO APIC
> * before disabling Local APIC.
> */
> - disable_IO_APIC();
> + clear_IO_APIC();
> #endif
>
> #ifdef CONFIG_SMP
> --
> 2.5.5
>
CC Eric
On 01/05/18 at 12:39pm, Baoquan He wrote:
> X86 MP spec defines 3 different interrupt modes:
> 1) PIC Mode—bypasses all APIC components and forces the system to
> operate in single-processor mode.
> 2) Virtual Wire Mode—uses an APIC as a virtual wire, but otherwise
> operates the same as PIC Mode.
> 3) Symmetric I/O Mode—enables the system to operate with more than
> one processor.
>
> The current disconnect_bsp_APIC includes two parts: one is to set system
> as PIC mode if it's available, the other is to change system back to
> Virtual Wire mode. Only PIC mode will detach the APIC from the interrupt
> system, Virtual Wire mode doesn't.
>
> Besides Virutal Wire mode has two kinds: one is only setting Local APIC
> as Virtual Wire mode and interrupts are delivered from the PIC to the
> CPU which Local APIC connected to, the other is both Loca APIC and IO-APIC
> need be set as Virtual Wire mode.
>
> So based on above knowledge, take IO-APIC Virtual Wire mode setting code
> out and wrap it inot a new function ioapic_set_virtual_wire_mode. Meanwhile
> change the name of disconnect_bsp_APIC as lapic_set_legacy_irq_mode. These
> makes the legacy irq mode setting more understandable.
>
> Signed-off-by: Baoquan He <[email protected]>
> ---
> arch/x86/include/asm/apic.h | 2 +-
> arch/x86/include/asm/io_apic.h | 5 ++---
> arch/x86/kernel/apic/apic.c | 11 ++++++-----
> arch/x86/kernel/apic/io_apic.c | 17 ++++++++++-------
> arch/x86/kernel/x86_init.c | 2 +-
> drivers/iommu/irq_remapping.c | 2 +-
> 6 files changed, 21 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> index a9e57f08bfa6..004c48bc8bc8 100644
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -132,7 +132,7 @@ extern int get_physical_broadcast(void);
>
> extern int lapic_get_maxlvt(void);
> extern void clear_local_APIC(void);
> -extern void disconnect_bsp_APIC(int virt_wire_setup);
> +extern void lapic_set_legacy_irq_mode(int virt_wire_setup);
> extern void disable_local_APIC(void);
> extern void lapic_shutdown(void);
> extern void sync_Arb_IDs(void);
> diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
> index e38ad3863a2c..6800dcea1d21 100644
> --- a/arch/x86/include/asm/io_apic.h
> +++ b/arch/x86/include/asm/io_apic.h
> @@ -183,7 +183,7 @@ extern void disable_ioapic_support(void);
>
> extern void __init io_apic_init_mappings(void);
> extern unsigned int native_io_apic_read(unsigned int apic, unsigned int reg);
> -extern void native_disable_io_apic(void);
> +extern void switch_to_legacy_irq_mode(void);
>
> static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
> {
> @@ -193,7 +193,6 @@ static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
> extern void setup_IO_APIC(void);
> extern void enable_IO_APIC(void);
> extern void clear_IO_APIC (void);
> -extern void switch_to_legacy_irq_mode(void);
> extern int IO_APIC_get_PCI_irq_vector(int bus, int devfn, int pin);
> extern void print_IO_APICs(void);
> #else /* !CONFIG_X86_IO_APIC */
> @@ -229,7 +228,7 @@ static inline void mp_save_irq(struct mpc_intsrc *m) { }
> static inline void disable_ioapic_support(void) { }
> static inline void io_apic_init_mappings(void) { }
> #define native_io_apic_read NULL
> -#define native_disable_io_apic NULL
> +#define switch_to_legacy_irq_mode NULL
>
> static inline void setup_IO_APIC(void) { }
> static inline void enable_IO_APIC(void) { }
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 7e613fb90630..301d90d4a0c3 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -2050,13 +2050,14 @@ static void __init connect_bsp_APIC(void)
> }
>
> /**
> - * disconnect_bsp_APIC - detach the APIC from the interrupt system
> - * @virt_wire_setup: indicates, whether virtual wire mode is selected
> + * lapic_set_legacy_irq_mode - switch Local APIC back to be legacy irq mode.
> + * @virt_wire_setup: indicates, whether virtual wire mode is selected
> *
> - * Virtual wire mode is necessary to deliver legacy interrupts even when the
> - * APIC is disabled.
> + * If PIC mode is available, LAPIC need be disconnected with CPU. Otherwise
> + * enable LAPIC and set it to be virtual wire mode. However if IO-APIC has
> + * been virtual wire mode, LVT0 of LAPIC need be masked.
> */
> -void disconnect_bsp_APIC(int virt_wire_setup)
> +void lapic_set_legacy_irq_mode(int virt_wire_setup)
> {
> unsigned int value;
>
> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index a47aa915d18c..b7fd4236b0e5 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -1410,7 +1410,7 @@ void __init enable_IO_APIC(void)
> clear_IO_APIC();
> }
>
> -void native_disable_io_apic(void)
> +static void ioapic_set_virtual_wire_mode(void)
> {
> /*
> * If the i8259 is routed through an IOAPIC
> @@ -1433,21 +1433,24 @@ void native_disable_io_apic(void)
> */
> ioapic_write_entry(ioapic_i8259.apic, ioapic_i8259.pin, entry);
> }
> -
> - if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
> - disconnect_bsp_APIC(ioapic_i8259.pin != -1);
> }
>
> /*
> - * Not an __init, needed by kexec/kdump code.
> - * For safety IO-APIC and Local APIC need be cleared before this.
> + * In legacy irq mode, full DOS compatibility with the uniprocessor PC/AT is
> + * provided by using the APICs in conjunction with standard 8259A-equivalent
> + * programmable interrupt controllers (PICs). It's necessary to deliver legacy
> + * interrupts even when APIC mode is not enabled. This is required by kexec/
> + * kdump before enter into the 2nd kernel.
> */
> void switch_to_legacy_irq_mode(void)
> {
> if (!nr_legacy_irqs())
> return;
>
> - x86_io_apic_ops.disable();
> + ioapic_set_virtual_wire_mode();
> +
> + if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
> + lapic_set_legacy_irq_mode(ioapic_i8259.pin != -1);
> }
>
> #ifdef CONFIG_X86_32
> diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
> index 1151ccd72ce9..c30f0f273dbd 100644
> --- a/arch/x86/kernel/x86_init.c
> +++ b/arch/x86/kernel/x86_init.c
> @@ -148,5 +148,5 @@ void arch_restore_msi_irqs(struct pci_dev *dev)
>
> struct x86_io_apic_ops x86_io_apic_ops __ro_after_init = {
> .read = native_io_apic_read,
> - .disable = native_disable_io_apic,
> + .disable = switch_to_legacy_irq_mode,
> };
> diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> index 49721b4e1975..751472ddf536 100644
> --- a/drivers/iommu/irq_remapping.c
> +++ b/drivers/iommu/irq_remapping.c
> @@ -37,7 +37,7 @@ static void irq_remapping_disable_io_apic(void)
> * now.
> */
> if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
> - disconnect_bsp_APIC(0);
> + lapic_set_legacy_irq_mode(0);
> }
>
> static void __init irq_remapping_modify_x86_ops(void)
> --
> 2.5.5
>
Hi Eric,
Sorry I missed your mail contact in the CC list when posted patches.
Have CC-ed them to you one by one. I can send the patches to you
privately if the format is messy and not good for reviewing.
On 01/11/18 at 12:28pm, Eric W. Biederman wrote:
> Baoquan He <[email protected]> writes:
>
> > Hi all,
> >
> > PING!
> >
> > (Add Fenghua and Eric to this thread)
>
> Absolutely not. There are very strong reasons not to anything in the
> kexec on panic path that we don't need to.
Agreed that we should stop adding anything in the path if we don't need
to. Now the legacy irq mode enabling could not be the not needed case.
As you can see we will see the warning in kdump boot log if irq pending
in APIC_RR.
And people may not always want to take the APIC mode in kernel. E.g if
they specify 'noapic' to disable IO-APIC in system explicitly, for
testing purpuse or other special reason. Furthermore, the apic mode
enabling has been put earlier by Dou's patchset. This change will enables
legacy irq mode before jump to kdump won't break that.
After all, a warning seen in boot log of kdump kernel, bug is reported,
we need fix.
Any idea or suggestion about this?
Thanks
Baoquan
>
> Among them legacy mode only works when the kernel starts up on the boot
> cpu and kexec on panic absolutely can not guarantee the kernel will be
> crashing and thus the new kernel starting on the boot cpu.
>
> What we need to be doing (and I have seen patches around to this effect)
> is teaching the kernel to always use non-legacy mode. Which should
> remove all of these issues as then legacy mode won't matter in the
> slightest.
>
> Eric
>
>
>
> >
> > On 01/05/18 at 11:42am, Baoquan He wrote:
> >> On kvm guest, the latest kernel will always print warning during kdump kernel boots
> >> as below. The reaons is the legacy irq mode is disabled before jump to kexec/kdump
> >> kernel. So in setup_local_APIC(), the do { xxx } while (queued && max_loops > 0)
> >> can't handle if pending irq exists in APIC IRR since LAPIC is disabled. It will
> >> terminate the do while loop finally when max_loops overflows by subtraction. Then
> >> WARN_ON(max_loops <= 0) is triggered.
> >>
> >> [ 0.001000] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1467 setup_local_APIC+0x228/0x330
> >> [ 0.001000] Modules linked in:
> >> [ 0.001000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #3
> >> [ 0.001000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
> >> [ 0.001000] RIP: 0010:setup_local_APIC+0x228/0x330
> >> [ 0.001000] RSP: 0000:ffffffffb6e03eb8 EFLAGS: 00010286
> >> [ 0.001000] RAX: 0000009edb4c4d84 RBX: 0000000000000000 RCX: 00000000b099d800
> >> [ 0.001000] RDX: 0000009e00000000 RSI: 0000000000000000 RDI: 0000000000000810
> >> [ 0.001000] RBP: 0000000000000000 R08: ffffffffffffffff R09: 0000000000000001
> >> [ 0.001000] R10: ffff98ce6a801c00 R11: 0761076d072f0776 R12: 0000000000000001
> >> [ 0.001000] R13: 00000000000000f0 R14: 0000000000004000 R15: ffffffffffffc6ff
> >> [ 0.001000] FS: 0000000000000000(0000) GS:ffff98ce6bc00000(0000) knlGS:0000000000000000
> >> [ 0.001000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> [ 0.001000] CR2: 00000000ffffffff CR3: 0000000022209000 CR4: 00000000000406b0
> >> [ 0.001000] Call Trace:
> >> [ 0.001000] apic_bsp_setup+0x56/0x74
> >> [ 0.001000] x86_late_time_init+0x11/0x16
> >> [ 0.001000] start_kernel+0x3c9/0x486
> >> [ 0.001000] secondary_startup_64+0xa5/0xb0
> >> [ 0.001000] Code: 00 85 c9 74 2d 0f 31 c1 e1 0a 48 c1 e2 20 41 89 cf 4c 03 7c 24 08 48 09 d0 49 29 c7 4c 89 3c 24 48 83 3c 24 00 0f 8f 8f fe ff ff <0f> ff e9 10 ff ff ff 48 83 2c 24 01 eb e7 48 83 c4 18 5b 5d 41
> >> [ 0.001000] ---[ end trace b88e71b9a6ebebdd ]---
> >> [ 0.001000] masked ExtINT on CPU#0
> >>
> >> With patch 2/3 applied, the above warning disappeared. And with patch 2/3
> >> applied, the issue mentioned in patch 1/3 can also be fixed because the LAPIC
> >> has been set as ExtINT before jump to kdump kernel, while we had better set it
> >> explicitly. Seems no reason not to enable legacy irq mode before jump to
> >> kexec/kdump kernel, and can make it be consistent with normal kernel.
> >>
> >> Patch 3/3 is doing clean up, I am fine if people think it's unnecessary.
> >>
> >>
> >> Baoquan He (3):
> >> x86/apic: Set up through LAPIC on boot CPU's LINT0 if ioapic is
> >> disabled
> >> x86/apic/kexec: Enable legacy irq mode before jump to kexec/kdump
> >> kernel
> >> x86/apic: Clean up the names of legacy irq mode setting related
> >> functions
> >>
> >> arch/x86/include/asm/apic.h | 2 +-
> >> arch/x86/include/asm/io_apic.h | 6 +++---
> >> arch/x86/kernel/apic/apic.c | 13 +++++++------
> >> arch/x86/kernel/apic/io_apic.c | 25 ++++++++++++-------------
> >> arch/x86/kernel/crash.c | 2 +-
> >> arch/x86/kernel/machine_kexec_32.c | 15 +++++----------
> >> arch/x86/kernel/machine_kexec_64.c | 15 +++++----------
> >> arch/x86/kernel/reboot.c | 2 +-
> >> arch/x86/kernel/x86_init.c | 2 +-
> >> drivers/iommu/irq_remapping.c | 2 +-
> >> 10 files changed, 37 insertions(+), 47 deletions(-)
> >>
> >> --
> >> 2.5.5
> >>
Hi Baoquan,
At 01/05/2018 12:38 PM, Baoquan He wrote:
> In commit
>
> commit 522e66464467 ("x86/apic: Disable I/O APIC before shutdown of the local APIC").
>
> lapic_shutdown() invocation is moved after disable_IO_APIC(). In fact
> in disable_IO_APIC(), it not only calls clear_IO_APIC() to disable
> IO-APIC, also sets sets LAPIC and IO-APIC to make system be PIC or
> Virtual wire mode. While the above commit putting disable_IO_APIC earlier
> causes local APIC is completely disabled. So the legacy irq mode is
> disabled too before jump to kexec/kdump kernel.
>
I have a question:
As you said, Due to disable_IO_APIC() is triggered before
lapic_shutdown(), So the interrupt virtual wire mode will be disabled.
but, I found that:
After machine_crash_shutdown() is executed, Linux will call
machine_kexec(), and in machine_kexec(), disable_IO_APIC() will also be
called again, why it can't switch to virtual wire mode successfully? Or
is my understanding wrong?
+--------------+
| __crash_kexec|
+--------------+
|
| +-------------------------+
+--> | machine_crash_shutdown |
| ++------------------------+
| |
| | +-----------------+
| +> | disable_IO_APIC |
| | +-----------------+
| |
| | +----------------+
| +-^+ lapic_shutdown |
| +----------------+
|
| +-------------------------+
+--> | machine_kexec |
| ++------------------------+
| |
| | +-----------------+
| +> | disable_IO_APIC |
| +-----------------+
|
v
Thanks,
dou.
> In normal kernel it defaults to be PIC mode or Virtual Wire mode during
> system initialization before APIC mode is enabled and this is done by
> BIOS initialization. But kexec/kdump kernel won't go through BIOS, so
> we should set system as PIC or Virtual Wire mode before jump to kdump
> kernel code directly.
>
> So let's take clear_IO_APIC out from disable_IO_APIC and rename
> disable_IO_APIC as switch_to_legacy_irq_mode. Then only call clear_IO_APIC
> when IO-APIC need be disabled. And call switch_to_legacy_irq_mode before
> kexec/kdump jumping.
>
> Signed-off-by: Baoquan He <[email protected]>
> ---
> arch/x86/include/asm/io_apic.h | 3 ++-
> arch/x86/kernel/apic/io_apic.c | 12 ++++--------
> arch/x86/kernel/crash.c | 2 +-
> arch/x86/kernel/machine_kexec_32.c | 15 +++++----------
> arch/x86/kernel/machine_kexec_64.c | 15 +++++----------
> arch/x86/kernel/reboot.c | 2 +-
> 6 files changed, 18 insertions(+), 31 deletions(-)
>
> diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
> index a8834dd546cd..e38ad3863a2c 100644
> --- a/arch/x86/include/asm/io_apic.h
> +++ b/arch/x86/include/asm/io_apic.h
> @@ -192,7 +192,8 @@ static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
>
> extern void setup_IO_APIC(void);
> extern void enable_IO_APIC(void);
> -extern void disable_IO_APIC(void);
> +extern void clear_IO_APIC (void);
> +extern void switch_to_legacy_irq_mode(void);
> extern int IO_APIC_get_PCI_irq_vector(int bus, int devfn, int pin);
> extern void print_IO_APICs(void);
> #else /* !CONFIG_X86_IO_APIC */
> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index 8a7963421460..a47aa915d18c 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -587,7 +587,7 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
> mpc_ioapic_id(apic), pin);
> }
>
> -static void clear_IO_APIC (void)
> +void clear_IO_APIC (void)
> {
> int apic, pin;
>
> @@ -1439,15 +1439,11 @@ void native_disable_io_apic(void)
> }
>
> /*
> - * Not an __init, needed by the reboot code
> + * Not an __init, needed by kexec/kdump code.
> + * For safety IO-APIC and Local APIC need be cleared before this.
> */
> -void disable_IO_APIC(void)
> +void switch_to_legacy_irq_mode(void)
> {
> - /*
> - * Clear the IO-APIC before rebooting:
> - */
> - clear_IO_APIC();
> -
> if (!nr_legacy_irqs())
> return;
>
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 10e74d4778a1..318ffeaaf55a 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -199,7 +199,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> #ifdef CONFIG_X86_IO_APIC
> /* Prevent crash_kexec() from deadlocking on ioapic_lock. */
> ioapic_zap_locks();
> - disable_IO_APIC();
> + clear_IO_APIC();
> #endif
> lapic_shutdown();
> #ifdef CONFIG_HPET_TIMER
> diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
> index edfede768688..7ab10d930cc6 100644
> --- a/arch/x86/kernel/machine_kexec_32.c
> +++ b/arch/x86/kernel/machine_kexec_32.c
> @@ -190,18 +190,13 @@ void machine_kexec(struct kimage *image)
> local_irq_disable();
> hw_breakpoint_disable();
>
> - if (image->preserve_context) {
> #ifdef CONFIG_X86_IO_APIC
> - /*
> - * We need to put APICs in legacy mode so that we can
> - * get timer interrupts in second kernel. kexec/kdump
> - * paths already have calls to disable_IO_APIC() in
> - * one form or other. kexec jump path also need
> - * one.
> - */
> - disable_IO_APIC();
> + /*
> + * We need to put APICs in legacy mode so that we can
> + * get timer interrupts in second kernel.
> + */
> + switch_to_legacy_irq_mode();
> #endif
> - }
>
> control_page = page_address(image->control_code_page);
> memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE);
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 1f790cf9d38f..b5c0cbed6791 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -288,18 +288,13 @@ void machine_kexec(struct kimage *image)
> local_irq_disable();
> hw_breakpoint_disable();
>
> - if (image->preserve_context) {
> #ifdef CONFIG_X86_IO_APIC
> - /*
> - * We need to put APICs in legacy mode so that we can
> - * get timer interrupts in second kernel. kexec/kdump
> - * paths already have calls to disable_IO_APIC() in
> - * one form or other. kexec jump path also need
> - * one.
> - */
> - disable_IO_APIC();
> + /*
> + * We need to put APICs in legacy mode so that we can
> + * get timer interrupts in second kernel.
> + */
> + switch_to_legacy_irq_mode();
> #endif
> - }
>
> control_page = page_address(image->control_code_page) + PAGE_SIZE;
> memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE);
> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> index 2126b9d27c34..b70cc0f38a29 100644
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -666,7 +666,7 @@ void native_machine_shutdown(void)
> * Even without the erratum, it still makes sense to quiet IO APIC
> * before disabling Local APIC.
> */
> - disable_IO_APIC();
> + clear_IO_APIC();
> #endif
>
> #ifdef CONFIG_SMP
>
On 01/17/18 at 05:47pm, Dou Liyang wrote:
> Hi Baoquan,
>
> At 01/05/2018 12:38 PM, Baoquan He wrote:
> > In commit
> >
> > commit 522e66464467 ("x86/apic: Disable I/O APIC before shutdown of the local APIC").
> >
> > lapic_shutdown() invocation is moved after disable_IO_APIC(). In fact
> > in disable_IO_APIC(), it not only calls clear_IO_APIC() to disable
> > IO-APIC, also sets sets LAPIC and IO-APIC to make system be PIC or
> > Virtual wire mode. While the above commit putting disable_IO_APIC earlier
> > causes local APIC is completely disabled. So the legacy irq mode is
> > disabled too before jump to kexec/kdump kernel.
> >
> I have a question:
>
> As you said, Due to disable_IO_APIC() is triggered before
> lapic_shutdown(), So the interrupt virtual wire mode will be disabled.
>
> but, I found that:
>
> After machine_crash_shutdown() is executed, Linux will call
> machine_kexec(), and in machine_kexec(), disable_IO_APIC() will also be
> called again, why it can't switch to virtual wire mode successfully? Or
> is my understanding wrong?
The disable_IO_APIC() calling has a condition check,
if (image->preserve_context) {
disable_IO_APIC();
}
For preserve_context case, it comes from kernel_kexec(). You can check
it in kexec man page, that is another scenario we use kexec for. But not
kexec and kdump.
>
> +--------------+
> | __crash_kexec|
> +--------------+
> |
> | +-------------------------+
> +--> | machine_crash_shutdown |
> | ++------------------------+
> | |
> | | +-----------------+
> | +> | disable_IO_APIC |
> | | +-----------------+
> | |
> | | +----------------+
> | +-^+ lapic_shutdown |
> | +----------------+
> |
> | +-------------------------+
> +--> | machine_kexec |
> | ++------------------------+
> | |
> | | +-----------------+
> | +> | disable_IO_APIC |
> | +-----------------+
> |
> v
>
> Thanks,
> dou.
> > In normal kernel it defaults to be PIC mode or Virtual Wire mode during
> > system initialization before APIC mode is enabled and this is done by
> > BIOS initialization. But kexec/kdump kernel won't go through BIOS, so
> > we should set system as PIC or Virtual Wire mode before jump to kdump
> > kernel code directly.
> >
> > So let's take clear_IO_APIC out from disable_IO_APIC and rename
> > disable_IO_APIC as switch_to_legacy_irq_mode. Then only call clear_IO_APIC
> > when IO-APIC need be disabled. And call switch_to_legacy_irq_mode before
> > kexec/kdump jumping.
> >
> > Signed-off-by: Baoquan He <[email protected]>
> > ---
> > arch/x86/include/asm/io_apic.h | 3 ++-
> > arch/x86/kernel/apic/io_apic.c | 12 ++++--------
> > arch/x86/kernel/crash.c | 2 +-
> > arch/x86/kernel/machine_kexec_32.c | 15 +++++----------
> > arch/x86/kernel/machine_kexec_64.c | 15 +++++----------
> > arch/x86/kernel/reboot.c | 2 +-
> > 6 files changed, 18 insertions(+), 31 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
> > index a8834dd546cd..e38ad3863a2c 100644
> > --- a/arch/x86/include/asm/io_apic.h
> > +++ b/arch/x86/include/asm/io_apic.h
> > @@ -192,7 +192,8 @@ static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
> > extern void setup_IO_APIC(void);
> > extern void enable_IO_APIC(void);
> > -extern void disable_IO_APIC(void);
> > +extern void clear_IO_APIC (void);
> > +extern void switch_to_legacy_irq_mode(void);
> > extern int IO_APIC_get_PCI_irq_vector(int bus, int devfn, int pin);
> > extern void print_IO_APICs(void);
> > #else /* !CONFIG_X86_IO_APIC */
> > diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> > index 8a7963421460..a47aa915d18c 100644
> > --- a/arch/x86/kernel/apic/io_apic.c
> > +++ b/arch/x86/kernel/apic/io_apic.c
> > @@ -587,7 +587,7 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
> > mpc_ioapic_id(apic), pin);
> > }
> > -static void clear_IO_APIC (void)
> > +void clear_IO_APIC (void)
> > {
> > int apic, pin;
> > @@ -1439,15 +1439,11 @@ void native_disable_io_apic(void)
> > }
> > /*
> > - * Not an __init, needed by the reboot code
> > + * Not an __init, needed by kexec/kdump code.
> > + * For safety IO-APIC and Local APIC need be cleared before this.
> > */
> > -void disable_IO_APIC(void)
> > +void switch_to_legacy_irq_mode(void)
> > {
> > - /*
> > - * Clear the IO-APIC before rebooting:
> > - */
> > - clear_IO_APIC();
> > -
> > if (!nr_legacy_irqs())
> > return;
> > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > index 10e74d4778a1..318ffeaaf55a 100644
> > --- a/arch/x86/kernel/crash.c
> > +++ b/arch/x86/kernel/crash.c
> > @@ -199,7 +199,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> > #ifdef CONFIG_X86_IO_APIC
> > /* Prevent crash_kexec() from deadlocking on ioapic_lock. */
> > ioapic_zap_locks();
> > - disable_IO_APIC();
> > + clear_IO_APIC();
> > #endif
> > lapic_shutdown();
> > #ifdef CONFIG_HPET_TIMER
> > diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
> > index edfede768688..7ab10d930cc6 100644
> > --- a/arch/x86/kernel/machine_kexec_32.c
> > +++ b/arch/x86/kernel/machine_kexec_32.c
> > @@ -190,18 +190,13 @@ void machine_kexec(struct kimage *image)
> > local_irq_disable();
> > hw_breakpoint_disable();
> > - if (image->preserve_context) {
> > #ifdef CONFIG_X86_IO_APIC
> > - /*
> > - * We need to put APICs in legacy mode so that we can
> > - * get timer interrupts in second kernel. kexec/kdump
> > - * paths already have calls to disable_IO_APIC() in
> > - * one form or other. kexec jump path also need
> > - * one.
> > - */
> > - disable_IO_APIC();
> > + /*
> > + * We need to put APICs in legacy mode so that we can
> > + * get timer interrupts in second kernel.
> > + */
> > + switch_to_legacy_irq_mode();
> > #endif
> > - }
> > control_page = page_address(image->control_code_page);
> > memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE);
> > diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> > index 1f790cf9d38f..b5c0cbed6791 100644
> > --- a/arch/x86/kernel/machine_kexec_64.c
> > +++ b/arch/x86/kernel/machine_kexec_64.c
> > @@ -288,18 +288,13 @@ void machine_kexec(struct kimage *image)
> > local_irq_disable();
> > hw_breakpoint_disable();
> > - if (image->preserve_context) {
> > #ifdef CONFIG_X86_IO_APIC
> > - /*
> > - * We need to put APICs in legacy mode so that we can
> > - * get timer interrupts in second kernel. kexec/kdump
> > - * paths already have calls to disable_IO_APIC() in
> > - * one form or other. kexec jump path also need
> > - * one.
> > - */
> > - disable_IO_APIC();
> > + /*
> > + * We need to put APICs in legacy mode so that we can
> > + * get timer interrupts in second kernel.
> > + */
> > + switch_to_legacy_irq_mode();
> > #endif
> > - }
> > control_page = page_address(image->control_code_page) + PAGE_SIZE;
> > memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE);
> > diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> > index 2126b9d27c34..b70cc0f38a29 100644
> > --- a/arch/x86/kernel/reboot.c
> > +++ b/arch/x86/kernel/reboot.c
> > @@ -666,7 +666,7 @@ void native_machine_shutdown(void)
> > * Even without the erratum, it still makes sense to quiet IO APIC
> > * before disabling Local APIC.
> > */
> > - disable_IO_APIC();
> > + clear_IO_APIC();
> > #endif
> > #ifdef CONFIG_SMP
> >
>
>
Hi Baoquan,
At 01/17/2018 06:08 PM, Baoquan He wrote:
> On 01/17/18 at 05:47pm, Dou Liyang wrote:
>> Hi Baoquan,
>>
>> At 01/05/2018 12:38 PM, Baoquan He wrote:
>>> In commit
>>>
>>> commit 522e66464467 ("x86/apic: Disable I/O APIC before shutdown of the local APIC").
>>>
>>> lapic_shutdown() invocation is moved after disable_IO_APIC(). In fact
>>> in disable_IO_APIC(), it not only calls clear_IO_APIC() to disable
>>> IO-APIC, also sets sets LAPIC and IO-APIC to make system be PIC or
>>> Virtual wire mode. While the above commit putting disable_IO_APIC earlier
>>> causes local APIC is completely disabled. So the legacy irq mode is
>>> disabled too before jump to kexec/kdump kernel.
>>>
>> I have a question:
>>
>> As you said, Due to disable_IO_APIC() is triggered before
>> lapic_shutdown(), So the interrupt virtual wire mode will be disabled.
>>
>> but, I found that:
>>
>> After machine_crash_shutdown() is executed, Linux will call
>> machine_kexec(), and in machine_kexec(), disable_IO_APIC() will also be
>> called again, why it can't switch to virtual wire mode successfully? Or
>> is my understanding wrong?
> The disable_IO_APIC() calling has a condition check,
>
> if (image->preserve_context) {
> disable_IO_APIC();
> }
>
> For preserve_context case, it comes from kernel_kexec(). You can check
> it in kexec man page, that is another scenario we use kexec for. But not
> kexec and kdump.
>
Understood!
This patch looks good to me and I also tested it, it's OK.
Thanks,
dou.
>>
>> +--------------+
>> | __crash_kexec|
>> +--------------+
>> |
>> | +-------------------------+
>> +--> | machine_crash_shutdown |
>> | ++------------------------+
>> | |
>> | | +-----------------+
>> | +> | disable_IO_APIC |
>> | | +-----------------+
>> | |
>> | | +----------------+
>> | +-^+ lapic_shutdown |
>> | +----------------+
>> |
>> | +-------------------------+
>> +--> | machine_kexec |
>> | ++------------------------+
>> | |
>> | | +-----------------+
>> | +> | disable_IO_APIC |
>> | +-----------------+
>> |
>> v
>>
>> Thanks,
>> dou.
>>> In normal kernel it defaults to be PIC mode or Virtual Wire mode during
>>> system initialization before APIC mode is enabled and this is done by
>>> BIOS initialization. But kexec/kdump kernel won't go through BIOS, so
>>> we should set system as PIC or Virtual Wire mode before jump to kdump
>>> kernel code directly.
>>>
>>> So let's take clear_IO_APIC out from disable_IO_APIC and rename
>>> disable_IO_APIC as switch_to_legacy_irq_mode. Then only call clear_IO_APIC
>>> when IO-APIC need be disabled. And call switch_to_legacy_irq_mode before
>>> kexec/kdump jumping.
>>>
>>> Signed-off-by: Baoquan He <[email protected]>
>>> ---
>>> arch/x86/include/asm/io_apic.h | 3 ++-
>>> arch/x86/kernel/apic/io_apic.c | 12 ++++--------
>>> arch/x86/kernel/crash.c | 2 +-
>>> arch/x86/kernel/machine_kexec_32.c | 15 +++++----------
>>> arch/x86/kernel/machine_kexec_64.c | 15 +++++----------
>>> arch/x86/kernel/reboot.c | 2 +-
>>> 6 files changed, 18 insertions(+), 31 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/io_apic.h b/arch/x86/include/asm/io_apic.h
>>> index a8834dd546cd..e38ad3863a2c 100644
>>> --- a/arch/x86/include/asm/io_apic.h
>>> +++ b/arch/x86/include/asm/io_apic.h
>>> @@ -192,7 +192,8 @@ static inline unsigned int io_apic_read(unsigned int apic, unsigned int reg)
>>> extern void setup_IO_APIC(void);
>>> extern void enable_IO_APIC(void);
>>> -extern void disable_IO_APIC(void);
>>> +extern void clear_IO_APIC (void);
>>> +extern void switch_to_legacy_irq_mode(void);
>>> extern int IO_APIC_get_PCI_irq_vector(int bus, int devfn, int pin);
>>> extern void print_IO_APICs(void);
>>> #else /* !CONFIG_X86_IO_APIC */
>>> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
>>> index 8a7963421460..a47aa915d18c 100644
>>> --- a/arch/x86/kernel/apic/io_apic.c
>>> +++ b/arch/x86/kernel/apic/io_apic.c
>>> @@ -587,7 +587,7 @@ static void clear_IO_APIC_pin(unsigned int apic, unsigned int pin)
>>> mpc_ioapic_id(apic), pin);
>>> }
>>> -static void clear_IO_APIC (void)
>>> +void clear_IO_APIC (void)
>>> {
>>> int apic, pin;
>>> @@ -1439,15 +1439,11 @@ void native_disable_io_apic(void)
>>> }
>>> /*
>>> - * Not an __init, needed by the reboot code
>>> + * Not an __init, needed by kexec/kdump code.
>>> + * For safety IO-APIC and Local APIC need be cleared before this.
>>> */
>>> -void disable_IO_APIC(void)
>>> +void switch_to_legacy_irq_mode(void)
>>> {
>>> - /*
>>> - * Clear the IO-APIC before rebooting:
>>> - */
>>> - clear_IO_APIC();
>>> -
>>> if (!nr_legacy_irqs())
>>> return;
>>> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
>>> index 10e74d4778a1..318ffeaaf55a 100644
>>> --- a/arch/x86/kernel/crash.c
>>> +++ b/arch/x86/kernel/crash.c
>>> @@ -199,7 +199,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>>> #ifdef CONFIG_X86_IO_APIC
>>> /* Prevent crash_kexec() from deadlocking on ioapic_lock. */
>>> ioapic_zap_locks();
>>> - disable_IO_APIC();
>>> + clear_IO_APIC();
>>> #endif
>>> lapic_shutdown();
>>> #ifdef CONFIG_HPET_TIMER
>>> diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
>>> index edfede768688..7ab10d930cc6 100644
>>> --- a/arch/x86/kernel/machine_kexec_32.c
>>> +++ b/arch/x86/kernel/machine_kexec_32.c
>>> @@ -190,18 +190,13 @@ void machine_kexec(struct kimage *image)
>>> local_irq_disable();
>>> hw_breakpoint_disable();
>>> - if (image->preserve_context) {
>>> #ifdef CONFIG_X86_IO_APIC
>>> - /*
>>> - * We need to put APICs in legacy mode so that we can
>>> - * get timer interrupts in second kernel. kexec/kdump
>>> - * paths already have calls to disable_IO_APIC() in
>>> - * one form or other. kexec jump path also need
>>> - * one.
>>> - */
>>> - disable_IO_APIC();
>>> + /*
>>> + * We need to put APICs in legacy mode so that we can
>>> + * get timer interrupts in second kernel.
>>> + */
>>> + switch_to_legacy_irq_mode();
>>> #endif
>>> - }
>>> control_page = page_address(image->control_code_page);
>>> memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE);
>>> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
>>> index 1f790cf9d38f..b5c0cbed6791 100644
>>> --- a/arch/x86/kernel/machine_kexec_64.c
>>> +++ b/arch/x86/kernel/machine_kexec_64.c
>>> @@ -288,18 +288,13 @@ void machine_kexec(struct kimage *image)
>>> local_irq_disable();
>>> hw_breakpoint_disable();
>>> - if (image->preserve_context) {
>>> #ifdef CONFIG_X86_IO_APIC
>>> - /*
>>> - * We need to put APICs in legacy mode so that we can
>>> - * get timer interrupts in second kernel. kexec/kdump
>>> - * paths already have calls to disable_IO_APIC() in
>>> - * one form or other. kexec jump path also need
>>> - * one.
>>> - */
>>> - disable_IO_APIC();
>>> + /*
>>> + * We need to put APICs in legacy mode so that we can
>>> + * get timer interrupts in second kernel.
>>> + */
>>> + switch_to_legacy_irq_mode();
>>> #endif
>>> - }
>>> control_page = page_address(image->control_code_page) + PAGE_SIZE;
>>> memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE);
>>> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
>>> index 2126b9d27c34..b70cc0f38a29 100644
>>> --- a/arch/x86/kernel/reboot.c
>>> +++ b/arch/x86/kernel/reboot.c
>>> @@ -666,7 +666,7 @@ void native_machine_shutdown(void)
>>> * Even without the erratum, it still makes sense to quiet IO APIC
>>> * before disabling Local APIC.
>>> */
>>> - disable_IO_APIC();
>>> + clear_IO_APIC();
>>> #endif
>>> #ifdef CONFIG_SMP
>>>
>>
>>
>
>
>
Hi Baoquan,
At 01/05/2018 12:39 PM, Baoquan He wrote:
[...]
> /*
> - * Not an __init, needed by kexec/kdump code.
> - * For safety IO-APIC and Local APIC need be cleared before this.
> + * In legacy irq mode, full DOS compatibility with the uniprocessor PC/AT is
> + * provided by using the APICs in conjunction with standard 8259A-equivalent
> + * programmable interrupt controllers (PICs). It's necessary to deliver legacy
> + * interrupts even when APIC mode is not enabled. This is required by kexec/
> + * kdump before enter into the 2nd kernel.
> */
> void switch_to_legacy_irq_mode(void)
> {
> if (!nr_legacy_irqs())
> return;
>
> - x86_io_apic_ops.disable();
> + ioapic_set_virtual_wire_mode();
> +
> + if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
> + lapic_set_legacy_irq_mode(ioapic_i8259.pin != -1);
Seems these two function, ioapic/lapic_set_legacy_irq_mode should be
exclusive.
But We do that because both the through-lapic and through-ioapic virtual
wire mode need setup the APIC_SPIV_APIC_ENABLED which is only located in
the lapic_set_legacy_irq_mode(). So we need call them both.
IMO, this cleanup may not make it clear. we can separate these two mode
totally or just keep it like before.
Thanks,
dou.
> }
>
> #ifdef CONFIG_X86_32
> diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
> index 1151ccd72ce9..c30f0f273dbd 100644
> --- a/arch/x86/kernel/x86_init.c
> +++ b/arch/x86/kernel/x86_init.c
> @@ -148,5 +148,5 @@ void arch_restore_msi_irqs(struct pci_dev *dev)
>
> struct x86_io_apic_ops x86_io_apic_ops __ro_after_init = {
> .read = native_io_apic_read,
> - .disable = native_disable_io_apic,
> + .disable = switch_to_legacy_irq_mode,
> };
> diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> index 49721b4e1975..751472ddf536 100644
> --- a/drivers/iommu/irq_remapping.c
> +++ b/drivers/iommu/irq_remapping.c
> @@ -37,7 +37,7 @@ static void irq_remapping_disable_io_apic(void)
> * now.
> */
> if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
> - disconnect_bsp_APIC(0);
> + lapic_set_legacy_irq_mode(0);
> }
>
> static void __init irq_remapping_modify_x86_ops(void)
>
On 01/19/18 at 02:42pm, Dou Liyang wrote:
> Hi Baoquan,
>
> At 01/05/2018 12:39 PM, Baoquan He wrote:
> [...]
> > /*
> > - * Not an __init, needed by kexec/kdump code.
> > - * For safety IO-APIC and Local APIC need be cleared before this.
> > + * In legacy irq mode, full DOS compatibility with the uniprocessor PC/AT is
> > + * provided by using the APICs in conjunction with standard 8259A-equivalent
> > + * programmable interrupt controllers (PICs). It's necessary to deliver legacy
> > + * interrupts even when APIC mode is not enabled. This is required by kexec/
> > + * kdump before enter into the 2nd kernel.
> > */
> > void switch_to_legacy_irq_mode(void)
> > {
> > if (!nr_legacy_irqs())
> > return;
> > - x86_io_apic_ops.disable();
> > + ioapic_set_virtual_wire_mode();
> > +
> > + if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
> > + lapic_set_legacy_irq_mode(ioapic_i8259.pin != -1);
>
> Seems these two function, ioapic/lapic_set_legacy_irq_mode should be
> exclusive.
Thanks for looking into this, dou!
It might be not exclusive. You can see mp_spec 3.6.2.2 Virtual Wire Mode
subsection, there are two kinds of virtual wire mode, one is
8259A-Equivalent pics is connected to lint0 of boot cpu LAPIC, the other
is 8259A-Equivalent pics go through IO-APIC, then is connected to lint0
of LAPIC. Whatever it is, LAPIC need be set as through-lapic.
Above is what I got from mp_spec. But from function
native_disable_io_apic() and disconnect_bsp_APIC(), the code seems to be
telling that if io-apic is connected to 8259A-Equivalent pics, we need
mask lvt0 of LAPIC. This conflicts with mp_spec 3.6.2.2.
Thanks
Baoquan
>
> But We do that because both the through-lapic and through-ioapic virtual
> wire mode need setup the APIC_SPIV_APIC_ENABLED which is only located in
> the lapic_set_legacy_irq_mode(). So we need call them both.
>
> IMO, this cleanup may not make it clear. we can separate these two mode
> totally or just keep it like before.
>
> Thanks,
> dou.
> > }
> > #ifdef CONFIG_X86_32
> > diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
> > index 1151ccd72ce9..c30f0f273dbd 100644
> > --- a/arch/x86/kernel/x86_init.c
> > +++ b/arch/x86/kernel/x86_init.c
> > @@ -148,5 +148,5 @@ void arch_restore_msi_irqs(struct pci_dev *dev)
> > struct x86_io_apic_ops x86_io_apic_ops __ro_after_init = {
> > .read = native_io_apic_read,
> > - .disable = native_disable_io_apic,
> > + .disable = switch_to_legacy_irq_mode,
> > };
> > diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> > index 49721b4e1975..751472ddf536 100644
> > --- a/drivers/iommu/irq_remapping.c
> > +++ b/drivers/iommu/irq_remapping.c
> > @@ -37,7 +37,7 @@ static void irq_remapping_disable_io_apic(void)
> > * now.
> > */
> > if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
> > - disconnect_bsp_APIC(0);
> > + lapic_set_legacy_irq_mode(0);
> > }
> > static void __init irq_remapping_modify_x86_ops(void)
> >
>
>
At 01/19/2018 03:21 PM, Baoquan He wrote:
> On 01/19/18 at 02:42pm, Dou Liyang wrote:
>> Hi Baoquan,
>>
>> At 01/05/2018 12:39 PM, Baoquan He wrote:
>> [...]
>>> /*
>>> - * Not an __init, needed by kexec/kdump code.
>>> - * For safety IO-APIC and Local APIC need be cleared before this.
>>> + * In legacy irq mode, full DOS compatibility with the uniprocessor PC/AT is
>>> + * provided by using the APICs in conjunction with standard 8259A-equivalent
>>> + * programmable interrupt controllers (PICs). It's necessary to deliver legacy
>>> + * interrupts even when APIC mode is not enabled. This is required by kexec/
>>> + * kdump before enter into the 2nd kernel.
>>> */
>>> void switch_to_legacy_irq_mode(void)
>>> {
>>> if (!nr_legacy_irqs())
>>> return;
>>> - x86_io_apic_ops.disable();
>>> + ioapic_set_virtual_wire_mode();
>>> +
>>> + if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
>>> + lapic_set_legacy_irq_mode(ioapic_i8259.pin != -1);
>>
>> Seems these two function, ioapic/lapic_set_legacy_irq_mode should be
>> exclusive.
>
> Thanks for looking into this, dou!
>
> It might be not exclusive. You can see mp_spec 3.6.2.2 Virtual Wire Mode
Oops! Sorry to confuse you, here what I want to say is that the code
make me think that we set through IO-APIC first, then set through
Local-APIC then. But, we can be only in one mode of them.
Just worry about this code seems ignore the irq remapping situation.
We call switch_to_legacy_irq_mode() directly in machine_kexec(),
why we also set x86_io_apic_ops:
.disable = switch_to_legacy_irq_mode
> subsection, there are two kinds of virtual wire mode, one is
> 8259A-Equivalent pics is connected to lint0 of boot cpu LAPIC, the other
> is 8259A-Equivalent pics go through IO-APIC, then is connected to lint0
> of LAPIC. Whatever it is, LAPIC need be set as through-lapic.
>
Yes, Absolutely right!
> Above is what I got from mp_spec. But from function
> native_disable_io_apic() and disconnect_bsp_APIC(), the code seems to be
> telling that if io-apic is connected to 8259A-Equivalent pics, we need
> mask lvt0 of LAPIC. This conflicts with mp_spec 3.6.2.2.
>
I think so.
Thanks,
dou.
> Thanks
> Baoquan
>>
>> But We do that because both the through-lapic and through-ioapic virtual
>> wire mode need setup the APIC_SPIV_APIC_ENABLED which is only located in
>> the lapic_set_legacy_irq_mode(). So we need call them both.
>>
>> IMO, this cleanup may not make it clear. we can separate these two mode
>> totally or just keep it like before.
>>
>> Thanks,
>> dou.
>>> }
>>> #ifdef CONFIG_X86_32
>>> diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
>>> index 1151ccd72ce9..c30f0f273dbd 100644
>>> --- a/arch/x86/kernel/x86_init.c
>>> +++ b/arch/x86/kernel/x86_init.c
>>> @@ -148,5 +148,5 @@ void arch_restore_msi_irqs(struct pci_dev *dev)
>>> struct x86_io_apic_ops x86_io_apic_ops __ro_after_init = {
>>> .read = native_io_apic_read,
>>> - .disable = native_disable_io_apic,
>>> + .disable = switch_to_legacy_irq_mode,
>>> };
>>> diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
>>> index 49721b4e1975..751472ddf536 100644
>>> --- a/drivers/iommu/irq_remapping.c
>>> +++ b/drivers/iommu/irq_remapping.c
>>> @@ -37,7 +37,7 @@ static void irq_remapping_disable_io_apic(void)
>>> * now.
>>> */
>>> if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
>>> - disconnect_bsp_APIC(0);
>>> + lapic_set_legacy_irq_mode(0);
>>> }
>>> static void __init irq_remapping_modify_x86_ops(void)
>>>
>>
>>
>
>
>
On 01/19/18 at 04:06pm, Dou Liyang wrote:
>
>
> At 01/19/2018 03:21 PM, Baoquan He wrote:
> > On 01/19/18 at 02:42pm, Dou Liyang wrote:
> > > Hi Baoquan,
> > >
> > > At 01/05/2018 12:39 PM, Baoquan He wrote:
> > > [...]
> > > > /*
> > > > - * Not an __init, needed by kexec/kdump code.
> > > > - * For safety IO-APIC and Local APIC need be cleared before this.
> > > > + * In legacy irq mode, full DOS compatibility with the uniprocessor PC/AT is
> > > > + * provided by using the APICs in conjunction with standard 8259A-equivalent
> > > > + * programmable interrupt controllers (PICs). It's necessary to deliver legacy
> > > > + * interrupts even when APIC mode is not enabled. This is required by kexec/
> > > > + * kdump before enter into the 2nd kernel.
> > > > */
> > > > void switch_to_legacy_irq_mode(void)
> > > > {
> > > > if (!nr_legacy_irqs())
> > > > return;
> > > > - x86_io_apic_ops.disable();
> > > > + ioapic_set_virtual_wire_mode();
> > > > +
> > > > + if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
> > > > + lapic_set_legacy_irq_mode(ioapic_i8259.pin != -1);
> > >
> > > Seems these two function, ioapic/lapic_set_legacy_irq_mode should be
> > > exclusive.
> >
> > Thanks for looking into this, dou!
> >
> > It might be not exclusive. You can see mp_spec 3.6.2.2 Virtual Wire Mode
>
> Oops! Sorry to confuse you, here what I want to say is that the code
> make me think that we set through IO-APIC first, then set through
> Local-APIC then. But, we can be only in one mode of them.
>
> Just worry about this code seems ignore the irq remapping situation.
>
> We call switch_to_legacy_irq_mode() directly in machine_kexec(),
> why we also set x86_io_apic_ops:
>
> .disable = switch_to_legacy_irq_mode
You are right, this is not very clear. Since we call switch_to_legacy_irq_mode()
directly, nobody will call x86_io_apic_ops.disable() any more.
Should keep native_disable_io_apic() and let x86_io_apic_ops.disable()
to point at it.
Thanks
Baoquan
>
> > subsection, there are two kinds of virtual wire mode, one is
> > 8259A-Equivalent pics is connected to lint0 of boot cpu LAPIC, the other
> > is 8259A-Equivalent pics go through IO-APIC, then is connected to lint0
> > of LAPIC. Whatever it is, LAPIC need be set as through-lapic.
> >
>
> Yes, Absolutely right!
>
> > Above is what I got from mp_spec. But from function
> > native_disable_io_apic() and disconnect_bsp_APIC(), the code seems to be
> > telling that if io-apic is connected to 8259A-Equivalent pics, we need
> > mask lvt0 of LAPIC. This conflicts with mp_spec 3.6.2.2.
> >
>
> I think so.
>
> Thanks,
> dou.
> > Thanks
> > Baoquan
> > >
> > > But We do that because both the through-lapic and through-ioapic virtual
> > > wire mode need setup the APIC_SPIV_APIC_ENABLED which is only located in
> > > the lapic_set_legacy_irq_mode(). So we need call them both.
> > >
> > > IMO, this cleanup may not make it clear. we can separate these two mode
> > > totally or just keep it like before.
> > >
> > > Thanks,
> > > dou.
> > > > }
> > > > #ifdef CONFIG_X86_32
> > > > diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
> > > > index 1151ccd72ce9..c30f0f273dbd 100644
> > > > --- a/arch/x86/kernel/x86_init.c
> > > > +++ b/arch/x86/kernel/x86_init.c
> > > > @@ -148,5 +148,5 @@ void arch_restore_msi_irqs(struct pci_dev *dev)
> > > > struct x86_io_apic_ops x86_io_apic_ops __ro_after_init = {
> > > > .read = native_io_apic_read,
> > > > - .disable = native_disable_io_apic,
> > > > + .disable = switch_to_legacy_irq_mode,
> > > > };
> > > > diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> > > > index 49721b4e1975..751472ddf536 100644
> > > > --- a/drivers/iommu/irq_remapping.c
> > > > +++ b/drivers/iommu/irq_remapping.c
> > > > @@ -37,7 +37,7 @@ static void irq_remapping_disable_io_apic(void)
> > > > * now.
> > > > */
> > > > if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
> > > > - disconnect_bsp_APIC(0);
> > > > + lapic_set_legacy_irq_mode(0);
> > > > }
> > > > static void __init irq_remapping_modify_x86_ops(void)
> > > >
> > >
> > >
> >
> >
> >
>
>
On 01/19/18 at 02:42pm, Dou Liyang wrote:
> Hi Baoquan,
>
> At 01/05/2018 12:39 PM, Baoquan He wrote:
> [...]
> > /*
> > - * Not an __init, needed by kexec/kdump code.
> > - * For safety IO-APIC and Local APIC need be cleared before this.
> > + * In legacy irq mode, full DOS compatibility with the uniprocessor PC/AT is
> > + * provided by using the APICs in conjunction with standard 8259A-equivalent
> > + * programmable interrupt controllers (PICs). It's necessary to deliver legacy
> > + * interrupts even when APIC mode is not enabled. This is required by kexec/
> > + * kdump before enter into the 2nd kernel.
> > */
> > void switch_to_legacy_irq_mode(void)
> > {
> > if (!nr_legacy_irqs())
> > return;
> > - x86_io_apic_ops.disable();
> > + ioapic_set_virtual_wire_mode();
> > +
> > + if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
> > + lapic_set_legacy_irq_mode(ioapic_i8259.pin != -1);
>
> Seems these two function, ioapic/lapic_set_legacy_irq_mode should be
> exclusive.
>
> But We do that because both the through-lapic and through-ioapic virtual
> wire mode need setup the APIC_SPIV_APIC_ENABLED which is only located in
> the lapic_set_legacy_irq_mode(). So we need call them both.
>
> IMO, this cleanup may not make it clear. we can separate these two mode
> totally or just keep it like before.
OK, I tried to find document about the virtual wire mode with
through-ioapic, but failed. The code has been there for a long time and
everything works well, that proves it's good. While the concept is still
not clear to me, at least there isn't description to tell explicitly.
And also we still need the x86_io_apic_ops.disable() hooker which irq
remapping need to use as dou commented.
So drop this clean up patch for now. I will repost the patchset
including patch 1 and 2.
Thanks
Baoquan
> > }
> > #ifdef CONFIG_X86_32
> > diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
> > index 1151ccd72ce9..c30f0f273dbd 100644
> > --- a/arch/x86/kernel/x86_init.c
> > +++ b/arch/x86/kernel/x86_init.c
> > @@ -148,5 +148,5 @@ void arch_restore_msi_irqs(struct pci_dev *dev)
> > struct x86_io_apic_ops x86_io_apic_ops __ro_after_init = {
> > .read = native_io_apic_read,
> > - .disable = native_disable_io_apic,
> > + .disable = switch_to_legacy_irq_mode,
> > };
> > diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> > index 49721b4e1975..751472ddf536 100644
> > --- a/drivers/iommu/irq_remapping.c
> > +++ b/drivers/iommu/irq_remapping.c
> > @@ -37,7 +37,7 @@ static void irq_remapping_disable_io_apic(void)
> > * now.
> > */
> > if (boot_cpu_has(X86_FEATURE_APIC) || apic_from_smp_config())
> > - disconnect_bsp_APIC(0);
> > + lapic_set_legacy_irq_mode(0);
> > }
> > static void __init irq_remapping_modify_x86_ops(void)
> >
>
>