2020-02-21 14:38:05

by Paul Menzel

Subject: kernel BUG at arch/x86/kernel/apic/apic.c with Dell server with x2APIC enabled and unset X2APIC

Dear Linux folks,


On the Dell PowerEdge T640/04WYPY, BIOS 2.4.8 11/27/2019, Linux 5.4.14 (and 4.19.57) with
unset `IRQ_REMAP` and `X86_X2APIC` crashes on start-up, when x2APIC is enabled in the
firmware.

[ 3.862327] ACPI: Core revision 20190816
[ 3.869551] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 79635855245 ns
[ 3.878797] APIC: Switch to symmetric I/O mode setup
[ 3.883893] Switched APIC routing to physical flat.
[ 3.888904] ------------[ cut here ]------------
[ 3.893641] kernel BUG at arch/x86/kernel/apic/apic.c:1616!
[ 3.899347] invalid opcode: 0000 [#1] SMP NOPTI
[ 3.903990] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.14.mx64.317 #1
[ 3.910803] Hardware name: Dell Inc. PowerEdge T640/04WYPY, BIOS 2.4.8 11/27/2019
[ 3.918448] RIP: 0010:setup_local_APIC+0x32e/0x390
[ 3.923356] Code: 68 70 2e 01 be 00 07 01 00 bf 50 03 00 00 48 8b 40 10 e8 15 9e db 00 eb a9 be 00 04 01 00 bf 60 03 00 00 e8 04 9e db 00 eb bb <0f> 0b e8 5b 3a 00 00
[ 3.942300] RSP: 0000:ffffffff82403e88 EFLAGS: 00010246
[ 3.947641] RAX: 0000000000000000 RBX: 00000000000000ff RCX: ffffffff82454128
[ 3.955787] RDX: 0000000000000000 RSI: 00000000fffffeff RDI: 0000000000000020
[ 3.963031] RBP: ffffffffffffffff R08: 00000000000001c4 R09: 0734073407370739
[ 3.970277] R10: ffffffff82573000 R11: 0720072007730765 R12: ffffffff82a4a920
[ 3.977522] R13: 0000000000000000 R14: ffff88c07fff0e80 R15: 0000000000000000
[ 3.984766] FS: 0000000000000000(0000) GS:ffff889fffc00000(0000) knlGS:0000000000000000
[ 3.993014] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.998876] CR2: ffff88c07ffff000 CR3: 000000000240a001 CR4: 00000000000606b0
[ 4.006121] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4.013365] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4.020611] Call Trace:
[ 4.023184] apic_intr_mode_init+0x1d2/0x1ec
[ 4.027573] x86_late_time_init+0x17/0x1c
[ 4.031706] start_kernel+0x41f/0x4d3
[ 4.035491] secondary_startup_64+0xa4/0xb0
[ 4.039797] Modules linked in:
[ 4.042997] ---[ end trace c3629ce2e87a638c ]---
[ 4.047746] RIP: 0010:setup_local_APIC+0x32e/0x390
[ 4.052663] Code: 68 70 2e 01 be 00 07 01 00 bf 50 03 00 00 48 8b 40 10 e8 15 9e db 00 eb a9 be 00 04 01 00 bf 60 03 00 00 e8 04 9e db 00 eb bb <0f> 0b e8 5b 3a 00 00
[ 4.071617] RSP: 0000:ffffffff82403e88 EFLAGS: 00010246
[ 4.076966] RAX: 0000000000000000 RBX: 00000000000000ff RCX: ffffffff82454128
[ 4.084219] RDX: 0000000000000000 RSI: 00000000fffffeff RDI: 0000000000000020
[ 4.091475] RBP: ffffffffffffffff R08: 00000000000001c4 R09: 0734073407370739
[ 4.098738] R10: ffffffff82573000 R11: 0720072007730765 R12: ffffffff82a4a920
[ 4.106000] R13: 0000000000000000 R14: ffff88c07fff0e80 R15: 0000000000000000
[ 4.113252] FS: 0000000000000000(0000) GS:ffff889fffc00000(0000) knlGS:0000000000000000
[ 4.121509] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.127380] CR2: ffff88c07ffff000 CR3: 000000000240a001 CR4: 00000000000606b0
[ 4.134632] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4.141887] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4.149142] Kernel panic - not syncing: Attempted to kill the idle task!
[ 4.155968] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

The BUG is triggered by the following check in `setup_local_APIC()`:

/*
 * Double-check whether this APIC is really registered.
 * This is meaningless in clustered apic mode, so we skip it.
 */
BUG_ON(!apic->apic_id_registered());

Should this produce an error similar to the one in `validate_x2apic()`?

panic("BIOS has enabled x2apic but kernel doesn't support x2apic, please disable x2apic in BIOS.\n");

`noapic` and `acpi=off` separately did not work, but `noapic acpi=off` hit the other
panic.


Kind regards,

Paul



2020-02-21 15:57:33

by Borislav Petkov

Subject: Re: kernel BUG at arch/x86/kernel/apic/apic.c with Dell server with x2APIC enabled and unset X2APIC

On Fri, Feb 21, 2020 at 03:37:23PM +0100, Paul Menzel wrote:
> Dear Linux folks,
>
>
> On the Dell PowerEdge T640/04WYPY, BIOS 2.4.8 11/27/2019, Linux 5.4.14 (and 4.19.57) with
> unset `IRQ_REMAP` and `X86_X2APIC` crashes on start-up, when x2APIC is enabled in the
> firmware.

Does it happen with latest 5.5-stable too? I see 5.5.5 is the last one...

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-02-21 16:17:39

by Paul Menzel

Subject: Re: kernel BUG at arch/x86/kernel/apic/apic.c with Dell server with x2APIC enabled and unset X2APIC

Dear Borislav,


On 2020-02-21 16:57, Borislav Petkov wrote:
> On Fri, Feb 21, 2020 at 03:37:23PM +0100, Paul Menzel wrote:

>> On the Dell PowerEdge T640/04WYPY, BIOS 2.4.8 11/27/2019, Linux 5.4.14 (and 4.19.57) with
>> unset `IRQ_REMAP` and `X86_X2APIC` crashes on start-up, when x2APIC is enabled in the
>> firmware.
>
> Does it happen with latest 5.5-stable too? I see 5.5.5 is the last one...

It also happens with Linux 5.6-rc2.


Kind regards,

Paul



2020-02-21 16:28:14

by Thomas Gleixner

Subject: Re: kernel BUG at arch/x86/kernel/apic/apic.c with Dell server with x2APIC enabled and unset X2APIC

Paul,

Paul Menzel <[email protected]> writes:
>
> On the Dell PowerEdge T640/04WYPY, BIOS 2.4.8 11/27/2019, Linux 5.4.14 (and 4.19.57) with
> unset `IRQ_REMAP` and `X86_X2APIC` crashes on start-up, when x2APIC is enabled in the
> firmware.

> [ 3.883893] Switched APIC routing to physical flat.
> [ 3.888904] ------------[ cut here ]------------
> [ 3.893641] kernel BUG at arch/x86/kernel/apic/apic.c:1616!

So the APIC is not registered.
>
> `noapic` and `acpi=off` separately did not work, but `noapic acpi=off` hit the other
> panic.

I have no idea what you are talking about.

Which command line options are set to reproduce the above?

Also please test the latest stable kernels not some random ones.

Thanks,

tglx

2020-02-21 16:29:30

by Paul Menzel

Subject: Re: kernel BUG at arch/x86/kernel/apic/apic.c with Dell server with x2APIC enabled and unset X2APIC

Dear Borislav,


On 2020-02-21 17:15, Paul Menzel wrote:

> On 2020-02-21 16:57, Borislav Petkov wrote:
>> On Fri, Feb 21, 2020 at 03:37:23PM +0100, Paul Menzel wrote:
>
>>> On the Dell PowerEdge T640/04WYPY, BIOS 2.4.8 11/27/2019, Linux 5.4.14 (and 4.19.57) with
>>> unset `IRQ_REMAP` and `X86_X2APIC` crashes on start-up, when x2APIC is enabled in the
>>> firmware.
>>
>> Does it happen with latest 5.5-stable too? I see 5.5.5 is the last one...
>
> It also happens with Linux 5.6-rc2.

```
[…]
[ 1.026337] IOAPIC[8]: apic_id 18, version 32, address 0xfec38000, GSI 96-103
[ 1.026340] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 1.026342] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[ 1.026348] Using ACPI (MADT) for SMP configuration information
[ 1.026350] ACPI: HPET id: 0x8086a701 base: 0xfed00000
[ 1.026353] smpboot: Allowing 40 CPUs, 0 hotplug CPUs
[ 1.026370] [mem 0x90000000-0xfdffffff] available for PCI devices
[ 1.026374] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[ 1.031515] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:40 nr_node_ids:2
[ 1.034956] percpu: Embedded 54 pages/cpu s180504 r8192 d32488 u262144
[ 1.034999] Built 2 zonelists, mobility grouping on. Total pages: 65922237
[ 1.035000] Policy zone: Normal
[ 1.035001] Kernel command line: BOOT_IMAGE=/boot/bzImage-5.6.0-rc2.mx64.322 root=LABEL=root ro crashkernel=256M console=ttyS1,115200n8 console=tty0 init=/bin/systemd audit=0 random.trust_cpu=on
[ 1.035112] audit: disabled (until reboot)
[ 1.035136] printk: log_buf_len individual max cpu contribution: 4096 bytes
[ 1.035137] printk: log_buf_len total cpu_extra contributions: 159744 bytes
[ 1.035138] printk: log_buf_len min size: 131072 bytes
[ 1.035302] printk: log_buf_len: 524288 bytes
[ 1.035303] printk: early log buf free: 106124(80%)
[ 1.035646] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 1.632683] Memory: 263209212K/267879304K available (14340K kernel code, 1625K rwdata, 3656K rodata, 1540K init, 972K bss, 4670092K reserved, 0K cma-reserved)
[ 1.633469] ftrace: allocating 41240 entries in 162 pages
[ 1.648215] ftrace: allocated 162 pages with 3 groups
[ 1.648467] rcu: Hierarchical RCU implementation.
[ 1.648467] rcu: RCU event tracing is enabled.
[ 1.648468] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=40.
[ 1.648470] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
[ 1.648470] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=40
[ 1.651303] NR_IRQS: 16640, nr_irqs: 2104, preallocated irqs: 16
[ 1.651609] random: crng done (trusting CPU's manufacturer)
[ 1.652452] Console: colour dummy device 80x25
[ 1.652883] printk: console [tty0] enabled
[ 3.669892] printk: console [ttyS1] enabled
[ 3.674156] ACPI: Core revision 20200110
[ 3.681230] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 79635855245 ns
[ 3.690285] APIC: Switch to symmetric I/O mode setup
[ 3.695246] Switched APIC routing to physical flat.
[ 3.700124] ------------[ cut here ]------------
[ 3.704726] kernel BUG at arch/x86/kernel/apic/apic.c:1629!
[ 3.710300] invalid opcode: 0000 [#1] SMP NOPTI
[ 3.714815] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.6.0-rc2.mx64.322 #1
[ 3.721756] Hardware name: Dell Inc. PowerEdge T640/04WYPY, BIOS 2.4.8 11/27/2019
[ 3.729222] RIP: 0010:setup_local_APIC+0x32e/0x390
[ 3.734001] Code: 78 57 2e 01 be 00 07 01 00 bf 50 03 00 00 48 8b 40 10 e8 35 85 db 00 eb a9 be 00 04 01 00 bf 60 03 00 00 e8 24 85 db 00 eb bb <0f> 0b e8 5b 3a 00 00 e9 11 ff ff ff be 00 04 00 00 bf 60 03 00 00
[ 3.752700] RSP: 0000:ffffffff82403e90 EFLAGS: 00010246
[ 3.757914] RAX: 0000000000000000 RBX: 00000000000000a2 RCX: ffffffff82456088
[ 3.765029] RDX: 0000000000000000 RSI: 00000000fffffeff RDI: 0000000000000020
[ 3.772144] RBP: 0000000000000000 R08: 00000000000001cc R09: 0720072007200720
[ 3.779260] R10: ffffffff8258d940 R11: 0720072007200720 R12: ffff88c07ffeee80
[ 3.786374] R13: 0000000000000000 R14: 00000000000000a2 R15: 000000004647bcbc
[ 3.793492] FS: 0000000000000000(0000) GS:ffff889fffe00000(0000) knlGS:0000000000000000
[ 3.801557] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.807288] CR2: ffff88c07ffff000 CR3: 000000000240a001 CR4: 00000000000606b0
[ 3.814405] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3.821520] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3.828633] Call Trace:
[ 3.831077] apic_intr_mode_init+0xd6/0xef
[ 3.835165] x86_late_time_init+0x20/0x25
[ 3.839167] start_kernel+0x66b/0x71f
[ 3.842825] secondary_startup_64+0xa4/0xb0
[ 3.846998] Modules linked in:
[ 3.850077] ---[ end trace a223188007b81154 ]---
[ 3.854697] RIP: 0010:setup_local_APIC+0x32e/0x390
[ 3.859486] Code: 78 57 2e 01 be 00 07 01 00 bf 50 03 00 00 48 8b 40 10 e8 35 85 db 00 eb a9 be 00 04 01 00 bf 60 03 00 00 e8 24 85 db 00 eb bb <0f> 0b e8 5b 3a 00 00 e9 11 ff ff ff be 00 04 00 00 bf 60 03 00 00
[ 3.878196] RSP: 0000:ffffffff82403e90 EFLAGS: 00010246
[ 3.883418] RAX: 0000000000000000 RBX: 00000000000000a2 RCX: ffffffff82456088
[ 3.890540] RDX: 0000000000000000 RSI: 00000000fffffeff RDI: 0000000000000020
[ 3.897666] RBP: 0000000000000000 R08: 00000000000001cc R09: 0720072007200720
[ 3.904790] R10: ffffffff8258d940 R11: 0720072007200720 R12: ffff88c07ffeee80
[ 3.911913] R13: 0000000000000000 R14: 00000000000000a2 R15: 000000004647bcbc
[ 3.919037] FS: 0000000000000000(0000) GS:ffff889fffe00000(0000) knlGS:0000000000000000
[ 3.927113] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.932854] CR2: ffff88c07ffff000 CR3: 000000000240a001 CR4: 00000000000606b0
[ 3.939977] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3.947100] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3.954225] Kernel panic - not syncing: Attempted to kill the idle task!
[ 3.960923] ---[ end Kernel panic - not syncing: Attempted to kill the idle tas
```

Please find all messages attached.


Kind regards,

Paul


Attachments:
linux-5.6-rc2-messages.txt (25.56 kB)

2020-02-24 13:02:03

by Paul Menzel

Subject: Re: kernel BUG at arch/x86/kernel/apic/apic.c with Dell server with x2APIC enabled and unset X2APIC

Dear Thomas,


On 2020-02-21 17:27, Thomas Gleixner wrote:

> Paul Menzel writes:
>>
>> On the Dell PowerEdge T640/04WYPY, BIOS 2.4.8 11/27/2019, Linux 5.4.14 (and 4.19.57) with
>> unset `IRQ_REMAP` and `X86_X2APIC` crashes on start-up, when x2APIC is enabled in the
>> firmware.
>
>> [ 3.883893] Switched APIC routing to physical flat.
>> [ 3.888904] ------------[ cut here ]------------
>> [ 3.893641] kernel BUG at arch/x86/kernel/apic/apic.c:1616!
>
> So the APIC is not registered.
>>
>> `noapic` and `acpi=off` separately did not work, but `noapic acpi=off` hit the other
>> panic.
>
> I have no idea what you are talking about.
>
> Which command line options are set to reproduce the above?

Sorry, please ignore my debugging attempts. It fails without any of
these.

> Also please test the latest stable kernels not some random ones.

Please see my reply to Borislav with the logs of Linux 5.6-rc2.


Kind regards,

Paul


Subject: [tip: x86/apic] x86/apic: Handle no CONFIG_X86_X2APIC on systems with x2APIC enabled by BIOS

The following commit has been merged into the x86/apic branch of tip:

Commit-ID: e3998434da4f5b1f57f8d6a8a9f8502ee3723bae
Gitweb: https://git.kernel.org/tip/e3998434da4f5b1f57f8d6a8a9f8502ee3723bae
Author: Mateusz Jończyk <[email protected]>
AuthorDate: Tue, 29 Nov 2022 22:50:08 +01:00
Committer: Thomas Gleixner <[email protected]>
CommitterDate: Fri, 02 Dec 2022 14:28:52 +01:00

x86/apic: Handle no CONFIG_X86_X2APIC on systems with x2APIC enabled by BIOS

A kernel that was compiled without CONFIG_X86_X2APIC was unable to boot on
platforms that have x2APIC already enabled in the BIOS before starting the
kernel.

The kernel was supposed to panic with an appropriate error message in
validate_x2apic() due to the missing X2APIC support.

However, validate_x2apic() was run too late in the boot cycle, and the
kernel tried to initialize the APIC nonetheless. This resulted in an
earlier panic in setup_local_APIC() because the APIC was not registered.

In my experiments, a panic message in setup_local_APIC() was not visible
in the graphical console, which resulted in a hang with no indication
what has gone wrong.

Instead of calling panic(), disable the APIC, which results in a somewhat
working system with the PIC only (and no SMP). This way the user is able to
diagnose the problem more easily.

Disabling X2APIC mode is not an option because it's impossible on systems
with locked x2APIC.

The proper place to disable the APIC in this case is in check_x2apic(),
which is called early from setup_arch(). Doing this in
__apic_intr_mode_select() is too late.

Make check_x2apic() unconditionally available and remove the empty stub.

Reported-by: Paul Menzel <[email protected]>
Reported-by: Robert Elliott (Servers) <[email protected]>
Signed-off-by: Mateusz Jończyk <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]
Link: https://lore.kernel.org/lkml/[email protected]
Link: https://lore.kernel.org/all/[email protected]
---
arch/x86/Kconfig | 4 ++--
arch/x86/include/asm/apic.h | 3 +--
arch/x86/kernel/apic/apic.c | 13 ++++++++-----
3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 67745ce..b2c0fce 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -462,8 +462,8 @@ config X86_X2APIC

Some Intel systems circa 2022 and later are locked into x2APIC mode
and can not fall back to the legacy APIC modes if SGX or TDX are
- enabled in the BIOS. They will be unable to boot without enabling
- this option.
+ enabled in the BIOS. They will boot with very reduced functionality
+ without enabling this option.

If you don't know what to do here, say N.

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 3415321..3216da7 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -249,7 +249,6 @@ static inline u64 native_x2apic_icr_read(void)
extern int x2apic_mode;
extern int x2apic_phys;
extern void __init x2apic_set_max_apicid(u32 apicid);
-extern void __init check_x2apic(void);
extern void x2apic_setup(void);
static inline int x2apic_enabled(void)
{
@@ -258,13 +257,13 @@ static inline int x2apic_enabled(void)

#define x2apic_supported() (boot_cpu_has(X86_FEATURE_X2APIC))
#else /* !CONFIG_X86_X2APIC */
-static inline void check_x2apic(void) { }
static inline void x2apic_setup(void) { }
static inline int x2apic_enabled(void) { return 0; }

#define x2apic_mode (0)
#define x2apic_supported() (0)
#endif /* !CONFIG_X86_X2APIC */
+extern void __init check_x2apic(void);

struct irq_data;

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index c6876d3..20d9a60 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1931,16 +1931,19 @@ void __init check_x2apic(void)
 	}
 }
 #else /* CONFIG_X86_X2APIC */
-static int __init validate_x2apic(void)
+void __init check_x2apic(void)
 {
 	if (!apic_is_x2apic_enabled())
-		return 0;
+		return;
 	/*
-	 * Checkme: Can we simply turn off x2apic here instead of panic?
+	 * Checkme: Can we simply turn off x2APIC here instead of disabling the APIC?
 	 */
-	panic("BIOS has enabled x2apic but kernel doesn't support x2apic, please disable x2apic in BIOS.\n");
+	pr_err("Kernel does not support x2APIC, please recompile with CONFIG_X86_X2APIC.\n");
+	pr_err("Disabling APIC, expect reduced performance and functionality.\n");
+
+	disable_apic = 1;
+	setup_clear_cpu_cap(X86_FEATURE_APIC);
 }
-early_initcall(validate_x2apic);

 static inline void try_to_enable_x2apic(int remap_mode) { }
 static inline void __x2apic_enable(void) { }