2023-03-31 23:56:21

by Lyude Paul

Subject: [PATCH] Revert "x86/acpi/boot: Do not register processors that cannot be onlined for x2APIC"

This reverts commit e2869bd7af608c343988429ceb1c2fe99644a01f. This commit
unfortunately seems to have resulted in one of my machines no longer
booting. Specifically, this machine is a custom build with a MS-7A39/A320M
GAMING PRO motherboard with firmware version v1.10. I'm not entirely sure
of the cause yet, but starting it up with "earlycon=efifb keep_bootcon" has
informed me that the kernel panics like so:

Call Trace:
<TASK>
dump_stack_lvl+0x33/0x46
panic+0x105/0x2b1
? timer_irq_works+0x53/0xef
panic_if_irq_remap.cold+0x5/0x5
setup_IO_APIC+0x3c4/0x831
? __pfx_native_io_apic_read+0x10/0x10
? __ioapic_read_entry+0x34/0x50
? _raw_spin_unlock_irqrestore+0x1b/0x40
? clear_IO_APIC_pin+0x16b/0x240
apic_intr_mode_init+0x101/0x106
x86_late_time_init+0x20/0x34
start_kernel+0x8b4/0x95f
secondary_startup_64_no_verify+0x5e/0xeb
</TASK>
---[ end Kernel panic - not syncing: timer doesn't work through
interrupt-mapped IO-APIC ]---

My assumption is there's probably something funky with the firmware on the
machine seeing as it's a random gaming motherboard, but that also probably
means there are other boards out there like this that are cold, afraid, and
unable to boot. We could warm their hearts by reverting this, or maybe by
figuring out a proper fix.

Signed-off-by: Lyude Paul <[email protected]>
Fixes: e2869bd7af60 ("x86/acpi/boot: Do not register processors that cannot be onlined for x2APIC")
Cc: Leo Duran <[email protected]>
Cc: Kishon Vijay Abraham I <[email protected]>
Cc: Borislav Petkov (AMD) <[email protected]>
Cc: Zhang Rui <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: [email protected]
---
arch/x86/kernel/acpi/boot.c | 19 +++----------------
1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 1c38174b5f019..4177577c173bf 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -188,17 +188,6 @@ static int acpi_register_lapic(int id, u32 acpiid, u8 enabled)
 	return cpu;
 }

-static bool __init acpi_is_processor_usable(u32 lapic_flags)
-{
-	if (lapic_flags & ACPI_MADT_ENABLED)
-		return true;
-
-	if (acpi_support_online_capable && (lapic_flags & ACPI_MADT_ONLINE_CAPABLE))
-		return true;
-
-	return false;
-}
-
 static int __init
 acpi_parse_x2apic(union acpi_subtable_headers *header, const unsigned long end)
 {
@@ -223,10 +212,6 @@ acpi_parse_x2apic(union acpi_subtable_headers *header, const unsigned long end)
 	if (apic_id == 0xffffffff)
 		return 0;

-	/* don't register processors that cannot be onlined */
-	if (!acpi_is_processor_usable(processor->lapic_flags))
-		return 0;
-
 	/*
 	 * We need to register disabled CPU as well to permit
 	 * counting disabled CPUs. This allows us to size
@@ -265,7 +250,9 @@ acpi_parse_lapic(union acpi_subtable_headers * header, const unsigned long end)
 		return 0;

 	/* don't register processors that can not be onlined */
-	if (!acpi_is_processor_usable(processor->lapic_flags))
+	if (acpi_support_online_capable &&
+	    !(processor->lapic_flags & ACPI_MADT_ENABLED) &&
+	    !(processor->lapic_flags & ACPI_MADT_ONLINE_CAPABLE))
 		return 0;

 	/*
--
2.39.2


2023-04-01 00:19:16

by Lyude Paul

Subject: Re: [PATCH] Revert "x86/acpi/boot: Do not register processors that cannot be onlined for x2APIC"

On Fri, 2023-03-31 at 19:53 -0400, Lyude Paul wrote:
> This reverts commit e2869bd7af608c343988429ceb1c2fe99644a01f. This commit
> unfortunately seems to have resulted in one of my machines no longer
> booting. Specifically, this machine is a custom build with a MS-7A39/A320M
> GAMING PRO motherboard with firmware version v1.10. I'm not entirely sure
> of the cause yet, but starting it up with "earlycon=efifb keep_bootcon" has
> informed me that the kernel panics like so:
>
> Call Trace:
> <TASK>
> dump_stack_lvl+0x33/0x46
> panic+0x105/0x2b1
> ? timer_irq_works+0x53/0xef
> panic_if_irq_remap.cold+0x5/0x5
> setup_IO_APIC+0x3c4/0x831
> ? __pfx_native_io_apic_read+0x10/0x10
> ? __ioapic_read_entry+0x34/0x50
> ? _raw_spin_unlock_irqrestore+0x1b/0x40
> ? clear_IO_APIC_pin+0x16b/0x240
> apic_intr_mode_init+0x101/0x106
> x86_late_time_init+0x20/0x34
> start_kernel+0x8b4/0x95f
> secondary_startup_64_no_verify+0x5e/0xeb
> </TASK>
> ---[ end Kernel panic - not syncing: timer doesn't work through
> interrupt-mapped IO-APIC ]---

Agh, I totally forgot to decode the stack trace before sending this out. I
can do that if anyone thinks it would help, but I have a feeling the stack
trace isn't particularly useful in the first place, considering which commit
is the culprit here.

As well, hopefully it goes without saying but: I'm happy to try any kind of
fixes or provide any more information from this machine. Just let me know ♥

>
> My assumption is there's probably something funky with the firmware on the
> machine seeing as it's a random gaming motherboard, but that also probably
> means there are other boards out there like this that are cold, afraid, and
> unable to boot. We could warm their hearts by reverting this, or maybe by
> figuring out a proper fix.
>
> Signed-off-by: Lyude Paul <[email protected]>
> Fixes: e2869bd7af60 ("x86/acpi/boot: Do not register processors that cannot be onlined for x2APIC")
> Cc: Leo Duran <[email protected]>
> Cc: Kishon Vijay Abraham I <[email protected]>
> Cc: Borislav Petkov (AMD) <[email protected]>
> Cc: Zhang Rui <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Cc: "Rafael J. Wysocki" <[email protected]>
> Cc: Len Brown <[email protected]>
> Cc: [email protected]
> ---
> arch/x86/kernel/acpi/boot.c | 19 +++----------------
> 1 file changed, 3 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> index 1c38174b5f019..4177577c173bf 100644
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -188,17 +188,6 @@ static int acpi_register_lapic(int id, u32 acpiid, u8 enabled)
>  	return cpu;
>  }
>
> -static bool __init acpi_is_processor_usable(u32 lapic_flags)
> -{
> -	if (lapic_flags & ACPI_MADT_ENABLED)
> -		return true;
> -
> -	if (acpi_support_online_capable && (lapic_flags & ACPI_MADT_ONLINE_CAPABLE))
> -		return true;
> -
> -	return false;
> -}
> -
>  static int __init
>  acpi_parse_x2apic(union acpi_subtable_headers *header, const unsigned long end)
>  {
> @@ -223,10 +212,6 @@ acpi_parse_x2apic(union acpi_subtable_headers *header, const unsigned long end)
>  	if (apic_id == 0xffffffff)
>  		return 0;
>
> -	/* don't register processors that cannot be onlined */
> -	if (!acpi_is_processor_usable(processor->lapic_flags))
> -		return 0;
> -
>  	/*
>  	 * We need to register disabled CPU as well to permit
>  	 * counting disabled CPUs. This allows us to size
> @@ -265,7 +250,9 @@ acpi_parse_lapic(union acpi_subtable_headers * header, const unsigned long end)
>  		return 0;
>
>  	/* don't register processors that can not be onlined */
> -	if (!acpi_is_processor_usable(processor->lapic_flags))
> +	if (acpi_support_online_capable &&
> +	    !(processor->lapic_flags & ACPI_MADT_ENABLED) &&
> +	    !(processor->lapic_flags & ACPI_MADT_ONLINE_CAPABLE))
>  		return 0;
>
>  	/*

--
Cheers,
Lyude Paul (she/her)
Software Engineer at Red Hat

2023-04-01 11:20:29

by Borislav Petkov

Subject: Re: [PATCH] Revert "x86/acpi/boot: Do not register processors that cannot be onlined for x2APIC"

On Fri, Mar 31, 2023 at 07:56:03PM -0400, Lyude Paul wrote:
> As well, hopefully it goes without saying but: I'm happy to try any kind of
> fixes

See if that branch fixes things for you:

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=x86/urgent

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette