2023-08-07 16:20:45

by Thomas Gleixner

[permalink] [raw]
Subject: [patch 37/53] x86/cpu: Detect real BSP on crash kernels

When a kdump kernel is started from a crashing CPU then there is no
guarantee that this CPU is the real boot CPU (BSP). If the kdump kernel
tries to online the BSP then the INIT sequence will reset the machine.

There is a command line option to prevent this, but in case of nested kdump
kernels this is wrong.

But that command line option is not required at all because the real BSP
has the lowest local APIC ID in the system. That was not always true, but
support for the only known system which was different (Voyager) got removed
long ago.

Detect whether the boot CPU APIC ID is the lowest APIC ID in the system.
If the lowest registered APIC ID is not the boot CPU APIC ID, then remove
it from the present bitmap and let the possible map initialization ignore
it.

Signed-off-by: Thomas Gleixner <[email protected]>
---
Documentation/admin-guide/kdump/kdump.rst | 7 --
Documentation/admin-guide/kernel-parameters.txt | 9 ---
arch/x86/kernel/cpu/topology.c | 59 +++++++++++++++---------
3 files changed, 39 insertions(+), 36 deletions(-)

--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -191,9 +191,7 @@ Dump-capture kernel config options (Arch
CPU is enough for kdump kernel to dump vmcore on most of systems.

However, you can also specify nr_cpus=X to enable multiple processors
- in kdump kernel. In this case, "disable_cpu_apicid=" is needed to
- tell kdump kernel which cpu is 1st kernel's BSP. Please refer to
- admin-guide/kernel-parameters.txt for more details.
+ in kdump kernel.

With CONFIG_SMP=n, the above things are not related.

@@ -485,8 +483,7 @@ loading dump-capture kernel.
to use multi-thread programs with it, such as parallel dump feature of
makedumpfile. Otherwise, the multi-thread program may have a great
performance degradation. To enable multi-cpu support, you should bring up an
- SMP dump-capture kernel and specify maxcpus/nr_cpus, disable_cpu_apicid=[X]
- options while loading it.
+ SMP dump-capture kernel and specify maxcpus/nr_cpus options while loading it.

* For s390x there are two kdump modes: If a ELF header is specified with
the elfcorehdr= kernel parameter, it is used by the kdump kernel as it
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1071,15 +1071,6 @@
Disable TLBIE instruction. Currently does not work
with KVM, with HASH MMU, or with coherent accelerators.

- disable_cpu_apicid= [X86,APIC,SMP]
- Format: <int>
- The number of initial APIC ID for the
- corresponding CPU to be disabled at boot,
- mostly used for the kdump 2nd kernel to
- disable BSP to wake up multiple CPUs without
- causing system reset or hang due to sending
- INIT from AP to BSP.
-
disable_ddw [PPC/PSERIES]
Disable Dynamic DMA Window support. Use this
to workaround buggy firmware.
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -32,18 +32,13 @@ struct {
unsigned int nr_disabled_cpus;
unsigned int nr_rejected_cpus;
u32 boot_cpu_apic_id;
+ u32 real_bsp_apic_id;
} topo_info __read_mostly = {
.nr_assigned_cpus = 1,
.boot_cpu_apic_id = BAD_APICID,
+ .real_bsp_apic_id = BAD_APICID,
};

-/*
- * Processor to be disabled specified by kernel parameter
- * disable_cpu_apicid=<int>, mostly used for the kdump 2nd kernel to
- * avoid undefined behaviour caused by sending INIT from AP to BSP.
- */
-static u32 disabled_cpu_apicid __ro_after_init = BAD_APICID;
-
bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
{
return phys_id == (u64)cpuid_to_apicid[cpu];
@@ -146,12 +141,6 @@ void __init topology_register_apic(u32 a
return;
}

- if (disabled_cpu_apicid == apic_id) {
- pr_info("Disabling CPU as requested via 'disable_cpu_apicid=0x%x'.\n", apic_id);
- topo_info.nr_rejected_cpus++;
- return;
- }
-
if (present) {
/*
* Prevent double registration, which is valid in case of
@@ -270,6 +259,30 @@ static __init bool restrict_to_up(void)
return apic_is_disabled;
}

+static __init void check_for_kdump_kernel(void)
+{
+ u32 bsp_apicid;
+
+ /*
+ * There is no real good way to detect whether this a kdump()
+ * kernel, but except on the Voyager SMP monstrosity which is not
+ * longer supported, the real BSP has always the lowest numbered
+ * APIC ID. If a crash happened on an AP, which then ends up as
+ * boot CPU in the kdump() kernel, then sending INIT to the real
+ * BSP would reset the whole system.
+ */
+ bsp_apicid = find_first_bit(phys_cpu_present_map, MAX_LOCAL_APIC);
+ if (bsp_apicid == topo_info.boot_cpu_apic_id)
+ return;
+
+ pr_warn("Boot CPU APIC ID not the lowest APIC ID: %x > %x\n",
+ topo_info.boot_cpu_apic_id, bsp_apicid);
+ pr_warn("Crash kernel detected. Disabling real BSP to prevent machine INIT\n");
+
+ topo_info.real_bsp_apic_id = bsp_apicid;
+ clear_bit(bsp_apicid, phys_cpu_present_map);
+}
+
void __init topology_init_possible_cpus(void)
{
unsigned int assigned = topo_info.nr_assigned_cpus;
@@ -278,6 +291,9 @@ void __init topology_init_possible_cpus(
unsigned int cpu, allowed = 1;

if (!restrict_to_up()) {
+ if (total > 1)
+ check_for_kdump_kernel();
+
if (WARN_ON_ONCE(assigned > nr_cpu_ids)) {
disabled += assigned - nr_cpu_ids;
assigned = nr_cpu_ids;
@@ -308,6 +324,14 @@ void __init topology_init_possible_cpus(
for (cpu = 0; cpu < allowed; cpu++) {
u32 apicid = cpuid_to_apicid[cpu];

+ /*
+ * In case of a kdump() kernel, don't mark the real BSP in
+ * the present and possible maps. Sending INIT to it resets
+ * the machine.
+ */
+ if (apicid != BAD_APICID && apicid == topo_info.real_bsp_apic_id)
+ continue;
+
set_cpu_possible(cpu, true);

if (apicid == BAD_APICID)
@@ -337,12 +361,3 @@ static int __init setup_possible_cpus(ch
}
early_param("possible_cpus", setup_possible_cpus);
#endif
-
-static int __init apic_set_disabled_cpu_apicid(char *arg)
-{
- if (!arg || !get_option(&arg, &disabled_cpu_apicid))
- return -EINVAL;
-
- return 0;
-}
-early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid);



2024-01-08 14:11:42

by Zhang, Rui

[permalink] [raw]
Subject: Re: [patch 37/53] x86/cpu: Detect real BSP on crash kernels

> +static __init void check_for_kdump_kernel(void)
> +{
> +       u32 bsp_apicid;
> +
> +       /*
> +        * There is no real good way to detect whether this a kdump()
> +        * kernel, but except on the Voyager SMP monstrosity which is
> not
> +        * longer supported, the real BSP has always the lowest
> numbered
> +        * APIC ID. If a crash happened on an AP, which then ends up
> as
> +        * boot CPU in the kdump() kernel, then sending INIT to the
> real
> +        * BSP would reset the whole system.
> +        */


Hi, Thomas,

Unfortunately this causes a regression on Intel Meteorlake platform,
where the BSP APIC ID is NOT the lowest numbered APIC ID (instead,
CPU12, the first Ecore CPU, has APIC ID 0).

And this causes the system fails to enumerate CPU12 (I didn't do
bisect. I suspect this patch breaks it by reading the code).

log with 6.7-rc vanilla kernel,
[ 0.335133] smp: Bringing up secondary CPUs ...
[ 0.335133] smpboot: x86: Booting SMP configuration:
[ 0.335133] .... node #0, CPUs: #1 #3 #6 #8 #10 #12 #13
#14 #15 #16 #17 #18 #19 #20 #21
[ 0.010435] core: cpu_atom PMU driver: PEBS-via-PT
[ 0.010435] ... version: 5
[ 0.010435] ... bit width: 48
[ 0.010435] ... generic registers: 8
[ 0.010435] ... value mask: 0000ffffffffffff
[ 0.010435] ... max period: 00007fffffffffff
[ 0.010435] ... fixed-purpose events: 3
[ 0.010435] ... event mask: 00000007000000ff
[ 0.339203] #2 #4 #5 #7 #9 #11
[ 0.343208] smp: Brought up 1 node, 22 CPUs

log with 6.5-rc4 kernel + your patch series,
[ 2.208960] smpboot: x86: Booting SMP configuration:
[ 2.209869] .... node #0, CPUs: #1 #3 #6 #8 #10 #13 #14
#15 #16 #17 #18 #19 #20 #21
[ 1.796167] core: cpu_atom PMU driver: PEBS-via-PT
[ 1.796167] ... version: 5
[ 1.796167] ... bit width: 48
[ 1.796167] ... generic registers: 8
[ 1.796167] ... value mask: 0000ffffffffffff
[ 1.796167] ... max period: 00007fffffffffff
[ 1.796167] ... fixed-purpose events: 3
[ 1.796167] ... event mask: 00000007000000ff
[ 2.260958] #2 #4 #5 #7 #9 #11
[ 2.263906] smp: Brought up 1 node, 21 CPUs


thanks,
rui

# cpuid -l 0x1f -s 0 | grep x2APIC
x2APIC ID of logical processor = 0x20 (32)
x2APIC ID of logical processor = 0x10 (16)
x2APIC ID of logical processor = 0x11 (17)
x2APIC ID of logical processor = 0x18 (24)
x2APIC ID of logical processor = 0x19 (25)
x2APIC ID of logical processor = 0x21 (33)
x2APIC ID of logical processor = 0x28 (40)
x2APIC ID of logical processor = 0x29 (41)
x2APIC ID of logical processor = 0x30 (48)
x2APIC ID of logical processor = 0x31 (49)
x2APIC ID of logical processor = 0x38 (56)
x2APIC ID of logical processor = 0x39 (57)
x2APIC ID of logical processor = 0x0 (0)
x2APIC ID of logical processor = 0x2 (2)
x2APIC ID of logical processor = 0x4 (4)
x2APIC ID of logical processor = 0x6 (6)
x2APIC ID of logical processor = 0x8 (8)
x2APIC ID of logical processor = 0xa (10)
x2APIC ID of logical processor = 0xc (12)
x2APIC ID of logical processor = 0xe (14)
x2APIC ID of logical processor = 0x40 (64)
x2APIC ID of logical processor = 0x42 (66)

2024-01-08 14:54:14

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 37/53] x86/cpu: Detect real BSP on crash kernels

On Mon, Jan 08 2024 at 14:11, Zhang, Rui wrote:
>> +static __init void check_for_kdump_kernel(void)
>> +{
>> +       u32 bsp_apicid;
>> +
>> +       /*
>> +        * There is no real good way to detect whether this a kdump()
>> +        * kernel, but except on the Voyager SMP monstrosity which is
>> not
>> +        * longer supported, the real BSP has always the lowest
>> numbered
>> +        * APIC ID. If a crash happened on an AP, which then ends up
>> as
>> +        * boot CPU in the kdump() kernel, then sending INIT to the
>> real
>> +        * BSP would reset the whole system.
>> +        */
>
>
> Hi, Thomas,
>
> Unfortunately this causes a regression on Intel Meteorlake platform,
> where the BSP APIC ID is NOT the lowest numbered APIC ID (instead,
> CPU12, the first Ecore CPU, has APIC ID 0).

Bah. Let me think about that.


2024-01-08 16:13:44

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 37/53] x86/cpu: Detect real BSP on crash kernels

On Mon, Jan 08 2024 at 15:54, Thomas Gleixner wrote:

> On Mon, Jan 08 2024 at 14:11, Zhang, Rui wrote:
>>> +static __init void check_for_kdump_kernel(void)
>>> +{
>>> +       u32 bsp_apicid;
>>> +
>>> +       /*
>>> +        * There is no real good way to detect whether this a kdump()
>>> +        * kernel, but except on the Voyager SMP monstrosity which is
>>> not
>>> +        * longer supported, the real BSP has always the lowest
>>> numbered
>>> +        * APIC ID. If a crash happened on an AP, which then ends up
>>> as
>>> +        * boot CPU in the kdump() kernel, then sending INIT to the
>>> real
>>> +        * BSP would reset the whole system.
>>> +        */
>>
>>
>> Hi, Thomas,
>>
>> Unfortunately this causes a regression on Intel Meteorlake platform,
>> where the BSP APIC ID is NOT the lowest numbered APIC ID (instead,
>> CPU12, the first Ecore CPU, has APIC ID 0).
>
> Bah. Let me think about that.

In which order are the APICs/CPUs enumerated by ACPI?

2024-01-09 01:54:35

by Zhang, Rui

[permalink] [raw]
Subject: Re: [patch 37/53] x86/cpu: Detect real BSP on crash kernels

On Mon, 2024-01-08 at 17:13 +0100, Thomas Gleixner wrote:
> On Mon, Jan 08 2024 at 15:54, Thomas Gleixner wrote:
>
> > On Mon, Jan 08 2024 at 14:11, Zhang, Rui wrote:
> > > > +static __init void check_for_kdump_kernel(void)
> > > > +{
> > > > +       u32 bsp_apicid;
> > > > +
> > > > +       /*
> > > > +        * There is no real good way to detect whether this a
> > > > kdump()
> > > > +        * kernel, but except on the Voyager SMP monstrosity
> > > > which is
> > > > not
> > > > +        * longer supported, the real BSP has always the lowest
> > > > numbered
> > > > +        * APIC ID. If a crash happened on an AP, which then
> > > > ends up
> > > > as
> > > > +        * boot CPU in the kdump() kernel, then sending INIT to
> > > > the
> > > > real
> > > > +        * BSP would reset the whole system.
> > > > +        */
> > >
> > >
> > > Hi, Thomas,
> > >
> > > Unfortunately this causes a regression on Intel Meteorlake
> > > platform,
> > > where the BSP APIC ID is NOT the lowest numbered APIC ID
> > > (instead,
> > > CPU12, the first Ecore CPU, has APIC ID 0).
> >
> > Bah. Let me think about that.
>
> In which order are the APICs/CPUs enumerated by ACPI?


This is the order in MADT,
$ cat apic.dsl | grep x2Apic
[030h 0048 4] Processor x2Apic ID : 00000010
[040h 0064 4] Processor x2Apic ID : 00000011
[050h 0080 4] Processor x2Apic ID : 00000018
[060h 0096 4] Processor x2Apic ID : 00000019
[070h 0112 4] Processor x2Apic ID : 00000020
[080h 0128 4] Processor x2Apic ID : 00000021
[090h 0144 4] Processor x2Apic ID : 00000028
[0A0h 0160 4] Processor x2Apic ID : 00000029
[0B0h 0176 4] Processor x2Apic ID : 00000030
[0C0h 0192 4] Processor x2Apic ID : 00000031
[0D0h 0208 4] Processor x2Apic ID : 00000038
[0E0h 0224 4] Processor x2Apic ID : 00000039
[0F0h 0240 4] Processor x2Apic ID : 00000000
[100h 0256 4] Processor x2Apic ID : 00000002
[110h 0272 4] Processor x2Apic ID : 00000004
[120h 0288 4] Processor x2Apic ID : 00000006
[130h 0304 4] Processor x2Apic ID : 00000008
[140h 0320 4] Processor x2Apic ID : 0000000A
[150h 0336 4] Processor x2Apic ID : 0000000C
[160h 0352 4] Processor x2Apic ID : 0000000E
[170h 0368 4] Processor x2Apic ID : 00000040
[180h 0384 4] Processor x2Apic ID : 00000042

and this is the order in Linux (from CPU0 to CPUN)
x2APIC ID of logical processor = 0x20 (32)
x2APIC ID of logical processor = 0x10 (16)
x2APIC ID of logical processor = 0x11 (17)
x2APIC ID of logical processor = 0x18 (24)
x2APIC ID of logical processor = 0x19 (25)
x2APIC ID of logical processor = 0x21 (33)
x2APIC ID of logical processor = 0x28 (40)
x2APIC ID of logical processor = 0x29 (41)
x2APIC ID of logical processor = 0x30 (48)
x2APIC ID of logical processor = 0x31 (49)
x2APIC ID of logical processor = 0x38 (56)
x2APIC ID of logical processor = 0x39 (57)
x2APIC ID of logical processor = 0x0 (0)
x2APIC ID of logical processor = 0x2 (2)
x2APIC ID of logical processor = 0x4 (4)
x2APIC ID of logical processor = 0x6 (6)
x2APIC ID of logical processor = 0x8 (8)
x2APIC ID of logical processor = 0xa (10)
x2APIC ID of logical processor = 0xc (12)
x2APIC ID of logical processor = 0xe (14)
x2APIC ID of logical processor = 0x40 (64)
x2APIC ID of logical processor = 0x42 (66)

thanks,
rui

2024-01-10 14:32:31

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 37/53] x86/cpu: Detect real BSP on crash kernels

On Tue, Jan 09 2024 at 01:54, Zhang, Rui wrote:
> On Mon, 2024-01-08 at 17:13 +0100, Thomas Gleixner wrote:
>> > > Unfortunately this causes a regression on Intel Meteorlake
>> > > platform,
>> > > where the BSP APIC ID is NOT the lowest numbered APIC ID
>> > > (instead,
>> > > CPU12, the first Ecore CPU, has APIC ID 0).
>> >
>> > Bah. Let me think about that.
>>
>> In which order are the APICs/CPUs enumerated by ACPI?
>
>
> This is the order in MADT,
> $ cat apic.dsl | grep x2Apic
> [030h 0048 4] Processor x2Apic ID : 00000010
> [040h 0064 4] Processor x2Apic ID : 00000011
> [050h 0080 4] Processor x2Apic ID : 00000018
> [060h 0096 4] Processor x2Apic ID : 00000019
> [070h 0112 4] Processor x2Apic ID : 00000020
> [080h 0128 4] Processor x2Apic ID : 00000021
> [090h 0144 4] Processor x2Apic ID : 00000028
> [0A0h 0160 4] Processor x2Apic ID : 00000029
> [0B0h 0176 4] Processor x2Apic ID : 00000030
> [0C0h 0192 4] Processor x2Apic ID : 00000031
> [0D0h 0208 4] Processor x2Apic ID : 00000038
> [0E0h 0224 4] Processor x2Apic ID : 00000039
> [0F0h 0240 4] Processor x2Apic ID : 00000000
> [100h 0256 4] Processor x2Apic ID : 00000002
> [110h 0272 4] Processor x2Apic ID : 00000004
> [120h 0288 4] Processor x2Apic ID : 00000006
> [130h 0304 4] Processor x2Apic ID : 00000008
> [140h 0320 4] Processor x2Apic ID : 0000000A
> [150h 0336 4] Processor x2Apic ID : 0000000C
> [160h 0352 4] Processor x2Apic ID : 0000000E
> [170h 0368 4] Processor x2Apic ID : 00000040
> [180h 0384 4] Processor x2Apic ID : 00000042
>
> and this is the order in Linux (from CPU0 to CPUN)
> x2APIC ID of logical processor = 0x20 (32)
> x2APIC ID of logical processor = 0x10 (16)
> x2APIC ID of logical processor = 0x11 (17)
> x2APIC ID of logical processor = 0x18 (24)
> x2APIC ID of logical processor = 0x19 (25)
> x2APIC ID of logical processor = 0x21 (33)
> x2APIC ID of logical processor = 0x28 (40)
> x2APIC ID of logical processor = 0x29 (41)
> x2APIC ID of logical processor = 0x30 (48)
> x2APIC ID of logical processor = 0x31 (49)
> x2APIC ID of logical processor = 0x38 (56)
> x2APIC ID of logical processor = 0x39 (57)
> x2APIC ID of logical processor = 0x0 (0)
> x2APIC ID of logical processor = 0x2 (2)
> x2APIC ID of logical processor = 0x4 (4)
> x2APIC ID of logical processor = 0x6 (6)
> x2APIC ID of logical processor = 0x8 (8)
> x2APIC ID of logical processor = 0xa (10)
> x2APIC ID of logical processor = 0xc (12)
> x2APIC ID of logical processor = 0xe (14)
> x2APIC ID of logical processor = 0x40 (64)
> x2APIC ID of logical processor = 0x42 (66)

What a mess...

2024-01-10 15:14:25

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 37/53] x86/cpu: Detect real BSP on crash kernels

On Wed, Jan 10 2024 at 15:19, Thomas Gleixner wrote:
>> This is the order in MADT,
>> $ cat apic.dsl | grep x2Apic
>> [030h 0048 4] Processor x2Apic ID : 00000010
>> [040h 0064 4] Processor x2Apic ID : 00000011
..
>> and this is the order in Linux (from CPU0 to CPUN)
>> x2APIC ID of logical processor = 0x20 (32)
>> x2APIC ID of logical processor = 0x10 (16)
>
> What a mess...

And clearly not according to the spec

"The second is that platform firmware should list the boot processor
as the first processor entry in the MADT."

Oh well. There are reasons why this is written the way it is.

2024-01-11 01:52:31

by Zhang, Rui

[permalink] [raw]
Subject: Re: [patch 37/53] x86/cpu: Detect real BSP on crash kernels

On Wed, 2024-01-10 at 16:14 +0100, Thomas Gleixner wrote:
> On Wed, Jan 10 2024 at 15:19, Thomas Gleixner wrote:
> > > This is the order in MADT,
> > > $ cat apic.dsl  | grep x2Apic
> > > [030h 0048   4]          Processor x2Apic ID : 00000010
> > > [040h 0064   4]          Processor x2Apic ID : 00000011
> ...
> > > and this is the order in Linux (from CPU0 to CPUN)
> > >       x2APIC ID of logical processor = 0x20 (32)
> > >       x2APIC ID of logical processor = 0x10 (16)
> >
> > What a mess...
>
> And clearly not according to the spec
>
>   "The second is that platform firmware should list the boot
> processor
>    as the first processor entry in the MADT."
>
> Oh well. There are reasons why this is written the way it is.

Let me sync internally to see why it is designed in this way.

thanks,
rui

2024-01-12 09:14:20

by Zhang, Rui

[permalink] [raw]
Subject: Re: [patch 37/53] x86/cpu: Detect real BSP on crash kernels

Add Len.

On Wed, 2024-01-10 at 16:14 +0100, Thomas Gleixner wrote:
> On Wed, Jan 10 2024 at 15:19, Thomas Gleixner wrote:
> > > This is the order in MADT,
> > > $ cat apic.dsl  | grep x2Apic
> > > [030h 0048   4]          Processor x2Apic ID : 00000010
> > > [040h 0064   4]          Processor x2Apic ID : 00000011
> ...
> > > and this is the order in Linux (from CPU0 to CPUN)
> > >       x2APIC ID of logical processor = 0x20 (32)
> > >       x2APIC ID of logical processor = 0x10 (16)
> >
> > What a mess...
>
> And clearly not according to the spec
>
>   "The second is that platform firmware should list the boot
> processor
>    as the first processor entry in the MADT."
>
> Oh well. There are reasons why this is written the way it is.

This is indeed a violation of the ACPI spec and we should modify the
order in MADT. But this doesn't bring any actual effect as Linux
already handles this, right?

For the BSP APIC ID 0x20, I didn't find out a specific reason why we
have to do it in that way, but it is still legal. We may need to figure
out another way to distinguish the kdump kernel.

thanks,
rui

2024-01-12 15:39:55

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 37/53] x86/cpu: Detect real BSP on crash kernels

On Fri, Jan 12 2024 at 09:14, Zhang, Rui wrote:
> On Wed, 2024-01-10 at 16:14 +0100, Thomas Gleixner wrote:
>> On Wed, Jan 10 2024 at 15:19, Thomas Gleixner wrote:
>> > > This is the order in MADT,
>> > > $ cat apic.dsl  | grep x2Apic
>> > > [030h 0048   4]          Processor x2Apic ID : 00000010
>> > > [040h 0064   4]          Processor x2Apic ID : 00000011
>> ...
>> > > and this is the order in Linux (from CPU0 to CPUN)
>> > >       x2APIC ID of logical processor = 0x20 (32)
>> > >       x2APIC ID of logical processor = 0x10 (16)
>> >
>> > What a mess...
>>
>> And clearly not according to the spec
>>
>>   "The second is that platform firmware should list the boot
>> processor
>>    as the first processor entry in the MADT."
>>
>> Oh well. There are reasons why this is written the way it is.
>
> This is indeed a violation of the ACPI spec and we should modify the
> order in MADT. But this doesn't bring any actual effect as Linux
> already handles this, right?

It brings the effect that we can detect when we are not booting (kexec
case) on the actual boot CPU because then the first enumerated APIC ID
is not the same as the boot CPU APIC ID. No?

> For the BSP APIC ID 0x20, I didn't find out a specific reason why we
> have to do it in that way, but it is still legal.

Linux does not really care in which order the APICs are enumerated.

> We may need to figure out another way to distinguish the kdump kernel.

Having the first enumerated APIC in the MADT as the actual boot CPU is a
sensible and functional way. Everything else including the silly kexec
boot parameter is error prone.

I agree that MADT is error prone too given the fact that not even Intel
can get it right....

Thanks,

tglx

2024-01-13 07:35:59

by Zhang, Rui

[permalink] [raw]
Subject: Re: [patch 37/53] x86/cpu: Detect real BSP on crash kernels

On Fri, 2024-01-12 at 16:39 +0100, Thomas Gleixner wrote:
> On Fri, Jan 12 2024 at 09:14, Zhang, Rui wrote:
> > On Wed, 2024-01-10 at 16:14 +0100, Thomas Gleixner wrote:
> > > On Wed, Jan 10 2024 at 15:19, Thomas Gleixner wrote:
> > > > > This is the order in MADT,
> > > > > $ cat apic.dsl  | grep x2Apic
> > > > > [030h 0048   4]          Processor x2Apic ID : 00000010
> > > > > [040h 0064   4]          Processor x2Apic ID : 00000011
> > > ...
> > > > > and this is the order in Linux (from CPU0 to CPUN)
> > > > >       x2APIC ID of logical processor = 0x20 (32)
> > > > >       x2APIC ID of logical processor = 0x10 (16)
> > > >
> > > > What a mess...
> > >
> > > And clearly not according to the spec
> > >
> > >   "The second is that platform firmware should list the boot
> > > processor
> > >    as the first processor entry in the MADT."
> > >
> > > Oh well. There are reasons why this is written the way it is.
> >
> > This is indeed a violation of the ACPI spec and we should modify
> > the
> > order in MADT. But this doesn't bring any actual effect as Linux
> > already handles this, right?
>
> It brings the effect that we can detect when we are not booting
> (kexec
> case) on the actual boot CPU because then the first enumerated APIC
> ID
> is not the same as the boot CPU APIC ID. No?

Right.
I was thinking in the way this patch series does, which just compares
the boot CPU APIC ID and the lowest numbered APIC ID.

>
> > For the BSP APIC ID 0x20, I didn't find out a specific reason why
> > we
> > have to do it in that way, but it is still legal.
>
> Linux does not really care in which order the APICs are enumerated.
>
> > We may need to figure out another way to distinguish the kdump
> > kernel.
>
> Having the first enumerated APIC in the MADT as the actual boot CPU
> is a
> sensible and functional way. Everything else including the silly
> kexec
> boot parameter is error prone.
>
> I agree that MADT is error prone too given the fact that not even
> Intel
> can get it right....

For this MTL, I can raise an internal ticket to get it right.

Are there quite some platforms with BSP not listed as the first entry
in MADT?
if so, we still have to live with the kexec boot parameter? :)

thanks,
rui

2024-01-15 09:42:04

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 37/53] x86/cpu: Detect real BSP on crash kernels

On Sat, Jan 13 2024 at 07:35, Zhang, Rui wrote:
> On Fri, 2024-01-12 at 16:39 +0100, Thomas Gleixner wrote:
>> It brings the effect that we can detect when we are not booting
>> (kexec
>> case) on the actual boot CPU because then the first enumerated APIC
>> ID
>> is not the same as the boot CPU APIC ID. No?
>
> Right.
> I was thinking in the way this patch series does, which just compares
> the boot CPU APIC ID and the lowest numbered APIC ID.

Yes, that's obviously not working when the lowest APIC ID is not the
actual boot CPU.

>> I agree that MADT is error prone too given the fact that not even
>> Intel
>> can get it right....
>
> For this MTL, I can raise an internal ticket to get it right.
>
> Are there quite some platforms with BSP not listed as the first entry
> in MADT?
> if so, we still have to live with the kexec boot parameter? :)

I haven't found one yet, but you might look through the P/E systems to
get a picture.