These are two fixes for issues introduced by the topology-related changes
added in the 6.9 merge window.
Juergen Gross (2):
x86/cpu: fix BSP detection when running as Xen PV guest
x86/xen: return a sane initial apic id when running as PV guest
arch/x86/kernel/cpu/topology.c | 2 +-
arch/x86/xen/enlighten_pv.c | 10 +++++++++-
2 files changed, 10 insertions(+), 2 deletions(-)
--
2.35.3
With recent sanity checks for topology information added, there are now
warnings issued for APs when running as a Xen PV guest:
[Firmware Bug]: CPU 1: APIC ID mismatch. CPUID: 0x0000 APIC: 0x0001
This is because the initial APIC ID obtained via CPUID for PV guests is
always 0.
Avoid the warnings by synthesizing the CPUID data to contain the same
initial APIC ID as xen_pv_smp_config() is using for registering the
APIC IDs of all CPUs.
Fixes: 52128a7a21f7 ("x86/cpu/topology: Make the APIC mismatch warnings complete")
Signed-off-by: Juergen Gross <[email protected]>
---
arch/x86/xen/enlighten_pv.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index ace2eb054053..965e4ca36024 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -219,13 +219,20 @@ static __read_mostly unsigned int cpuid_leaf5_edx_val;
static void xen_cpuid(unsigned int *ax, unsigned int *bx,
unsigned int *cx, unsigned int *dx)
{
- unsigned maskebx = ~0;
+ unsigned int maskebx = ~0;
+ unsigned int or_ebx = 0;
/*
* Mask out inconvenient features, to try and disable as many
* unsupported kernel subsystems as possible.
*/
switch (*ax) {
+ case 0x1:
+ /* Replace initial APIC ID in bits 24-31 of EBX. */
+ maskebx = 0x00ffffff;
+ or_ebx = smp_processor_id() << 24;
+ break;
+
case CPUID_MWAIT_LEAF:
/* Synthesize the values.. */
*ax = 0;
@@ -248,6 +255,7 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx,
: "0" (*ax), "2" (*cx));
*bx &= maskebx;
+ *bx |= or_ebx;
}
static bool __init xen_check_mwait(void)
--
2.35.3
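For readers following along, the leaf 0x1 handling above boils down to a mask-and-or on EBX. The following is a minimal user-space sketch of that arithmetic only, not the kernel code: the helper names `synth_leaf1_ebx()` and `initial_apic_id()` are hypothetical, and the explicit truncation of the CPU number to 8 bits is an assumption added here (the patch shifts `smp_processor_id()` directly).

```c
#include <stdint.h>

/*
 * Hypothetical helper (illustration only, not part of the patch):
 * replace the initial APIC ID in bits 24-31 of CPUID leaf 0x1 EBX
 * with the given CPU number, keeping bits 0-23 untouched.
 */
static uint32_t synth_leaf1_ebx(uint32_t raw_ebx, unsigned int cpu)
{
	const uint32_t maskebx = 0x00ffffffu;                  /* keep bits 0-23 */
	const uint32_t or_ebx = (uint32_t)(cpu & 0xffu) << 24; /* assumption: clamp to 8 bits */

	return (raw_ebx & maskebx) | or_ebx;
}

/* Extract the initial APIC ID field (bits 24-31) from leaf 0x1 EBX. */
static unsigned int initial_apic_id(uint32_t ebx)
{
	return ebx >> 24;
}
```

With this, the ID read back from the synthesized EBX matches the CPU number, which is the same value xen_pv_smp_config() registers, so the two sources compared by the sanity check agree.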
When booting as a Xen PV guest the boot processor isn't detected
correctly and the following message is shown:
CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 0 > 1
Additionally this results in one CPU being ignored.
Fix that by calling the BSP detection logic when registering the boot
CPU's APIC, too.
Fixes: 5c5682b9f87a ("x86/cpu: Detect real BSP on crash kernels")
Signed-off-by: Juergen Gross <[email protected]>
---
arch/x86/kernel/cpu/topology.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index aaca8d235dc2..23c3db5e6396 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -255,7 +255,7 @@ void __init topology_register_boot_apic(u32 apic_id)
WARN_ON_ONCE(topo_info.boot_cpu_apic_id != BAD_APICID);
topo_info.boot_cpu_apic_id = apic_id;
- topo_register_apic(apic_id, CPU_ACPIID_INVALID, true);
+ topology_register_apic(apic_id, CPU_ACPIID_INVALID, true);
}
/**
--
2.35.3
On 05/04/2024 1:34 pm, Juergen Gross wrote:
> With recent sanity checks for topology information added, there are now
> warnings issued for APs when running as a Xen PV guest:
>
> [Firmware Bug]: CPU 1: APIC ID mismatch. CPUID: 0x0000 APIC: 0x0001
>
> This is due to the initial APIC ID obtained via CPUID for PV guests is
> always 0.
/sigh
From Xen:
switch ( leaf )
{
case 0x1:
/* TODO: Rework topology logic. */
res->b &= 0x00ffffffu;
if ( is_hvm_domain(d) )
res->b |= (v->vcpu_id * 2) << 24;
I think there's a very good chance it was random prior to Xen 4.6. That
used to come straight out of a CPUID value, so would get the APIC ID of
whichever pCPU it was scheduled on.
> Avoid the warnings by synthesizing the CPUID data to contain the same
> initial APIC ID as xen_pv_smp_config() is using for registering the
> APIC IDs of all CPUs.
>
> Fixes: 52128a7a21f7 ("x86/cpu/topology: Make the APIC mismatch warnings complete")
> Signed-off-by: Juergen Gross <[email protected]>
> ---
> arch/x86/xen/enlighten_pv.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index ace2eb054053..965e4ca36024 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -219,13 +219,20 @@ static __read_mostly unsigned int cpuid_leaf5_edx_val;
> static void xen_cpuid(unsigned int *ax, unsigned int *bx,
> unsigned int *cx, unsigned int *dx)
> {
> - unsigned maskebx = ~0;
> + unsigned int maskebx = ~0;
> + unsigned int or_ebx = 0;
>
> /*
> * Mask out inconvenient features, to try and disable as many
> * unsupported kernel subsystems as possible.
> */
> switch (*ax) {
> + case 0x1:
> + /* Replace initial APIC ID in bits 24-31 of EBX. */
> + maskebx = 0x00ffffff;
> + or_ebx = smp_processor_id() << 24;
I think the comment wants to cross-reference explicitly with
xen_pv_smp_config(), because what we care about here is the two sources
of information matching.
Also while you're at it, the x2APIC ID in leaf 0xb.
~Andrew
On 05.04.24 14:50, Andrew Cooper wrote:
> On 05/04/2024 1:34 pm, Juergen Gross wrote:
>> With recent sanity checks for topology information added, there are now
>> warnings issued for APs when running as a Xen PV guest:
>>
>> [Firmware Bug]: CPU 1: APIC ID mismatch. CPUID: 0x0000 APIC: 0x0001
>>
>> This is due to the initial APIC ID obtained via CPUID for PV guests is
>> always 0.
>
> /sigh
>
> From Xen:
>
> switch ( leaf )
> {
> case 0x1:
> /* TODO: Rework topology logic. */
> res->b &= 0x00ffffffu;
> if ( is_hvm_domain(d) )
> res->b |= (v->vcpu_id * 2) << 24;
>
>
> I think there's a very good chance it was random prior to Xen 4.6. That
> used to come straight out of a CPUID value, so would get the APIC ID of
> whichever pCPU it was scheduled on.
>
>> Avoid the warnings by synthesizing the CPUID data to contain the same
>> initial APIC ID as xen_pv_smp_config() is using for registering the
>> APIC IDs of all CPUs.
>>
>> Fixes: 52128a7a21f7 ("x86/cpu/topology: Make the APIC mismatch warnings complete")
>> Signed-off-by: Juergen Gross <[email protected]>
>> ---
>> arch/x86/xen/enlighten_pv.c | 10 +++++++++-
>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
>> index ace2eb054053..965e4ca36024 100644
>> --- a/arch/x86/xen/enlighten_pv.c
>> +++ b/arch/x86/xen/enlighten_pv.c
>> @@ -219,13 +219,20 @@ static __read_mostly unsigned int cpuid_leaf5_edx_val;
>> static void xen_cpuid(unsigned int *ax, unsigned int *bx,
>> unsigned int *cx, unsigned int *dx)
>> {
>> - unsigned maskebx = ~0;
>> + unsigned int maskebx = ~0;
>> + unsigned int or_ebx = 0;
>>
>> /*
>> * Mask out inconvenient features, to try and disable as many
>> * unsupported kernel subsystems as possible.
>> */
>> switch (*ax) {
>> + case 0x1:
>> + /* Replace initial APIC ID in bits 24-31 of EBX. */
>> + maskebx = 0x00ffffff;
>> + or_ebx = smp_processor_id() << 24;
>
> I think the comment wants to cross-reference explicitly with
> xen_pv_smp_config(), because what we care about here is the two sources
> of information matching.
I can add that as a comment. OTOH I'd really hope someone changing this
code later would look into the commit message of the patch adding it. :-)
>
> Also while you're at it, the x2APIC ID in leaf 0xb.
I'm not sure this is functionally relevant in PV guests.
Note that my patch is only meant to silence warnings during boot. It is not
needed for the system working correctly (at least I think so).
Juergen
Ping?
On 05.04.24 14:34, Juergen Gross wrote:
> When booting as a Xen PV guest the boot processor isn't detected
> correctly and the following message is shown:
>
> CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 0 > 1
>
> Additionally this results in one CPU being ignored.
>
> Fix that by calling the BSP detection logic when registering the boot
> CPU's APIC, too.
>
> Fixes: 5c5682b9f87a ("x86/cpu: Detect real BSP on crash kernels")
> Signed-off-by: Juergen Gross <[email protected]>
> ---
> arch/x86/kernel/cpu/topology.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
> index aaca8d235dc2..23c3db5e6396 100644
> --- a/arch/x86/kernel/cpu/topology.c
> +++ b/arch/x86/kernel/cpu/topology.c
> @@ -255,7 +255,7 @@ void __init topology_register_boot_apic(u32 apic_id)
> WARN_ON_ONCE(topo_info.boot_cpu_apic_id != BAD_APICID);
>
> topo_info.boot_cpu_apic_id = apic_id;
> - topo_register_apic(apic_id, CPU_ACPIID_INVALID, true);
> + topology_register_apic(apic_id, CPU_ACPIID_INVALID, true);
> }
>
> /**
PING!!!!
This patch fixes a regression introduced in 6.9-rc1.
Please consider taking the patch in the 6.9 cycle!
#regzbot ^introduced: 5c5682b9f87a
Juergen
On 19.04.24 13:52, Juergen Gross wrote:
> Ping?
>
> On 05.04.24 14:34, Juergen Gross wrote:
>> When booting as a Xen PV guest the boot processor isn't detected
>> correctly and the following message is shown:
>>
>> CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 0 > 1
>>
>> Additionally this results in one CPU being ignored.
>>
>> Fix that by calling the BSP detection logic when registering the boot
>> CPU's APIC, too.
>>
>> Fixes: 5c5682b9f87a ("x86/cpu: Detect real BSP on crash kernels")
>> Signed-off-by: Juergen Gross <[email protected]>
>> ---
>> arch/x86/kernel/cpu/topology.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
>> index aaca8d235dc2..23c3db5e6396 100644
>> --- a/arch/x86/kernel/cpu/topology.c
>> +++ b/arch/x86/kernel/cpu/topology.c
>> @@ -255,7 +255,7 @@ void __init topology_register_boot_apic(u32 apic_id)
>> WARN_ON_ONCE(topo_info.boot_cpu_apic_id != BAD_APICID);
>> topo_info.boot_cpu_apic_id = apic_id;
>> - topo_register_apic(apic_id, CPU_ACPIID_INVALID, true);
>> + topology_register_apic(apic_id, CPU_ACPIID_INVALID, true);
>> }
>> /**
>
On Fri, Apr 05 2024 at 14:34, Juergen Gross wrote:
> When booting as a Xen PV guest the boot processor isn't detected
> correctly and the following message is shown:
>
> CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 0 > 1
>
> Additionally this results in one CPU being ignored.
>
> Fix that by calling the BSP detection logic when registering the boot
> CPU's APIC, too.
>
> Fixes: 5c5682b9f87a ("x86/cpu: Detect real BSP on crash kernels")
> Signed-off-by: Juergen Gross <[email protected]>
> ---
> arch/x86/kernel/cpu/topology.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
> index aaca8d235dc2..23c3db5e6396 100644
> --- a/arch/x86/kernel/cpu/topology.c
> +++ b/arch/x86/kernel/cpu/topology.c
> @@ -255,7 +255,7 @@ void __init topology_register_boot_apic(u32 apic_id)
> WARN_ON_ONCE(topo_info.boot_cpu_apic_id != BAD_APICID);
>
> topo_info.boot_cpu_apic_id = apic_id;
> - topo_register_apic(apic_id, CPU_ACPIID_INVALID, true);
> + topology_register_apic(apic_id, CPU_ACPIID_INVALID, true);
No. This does not fix anything at all. It just papers over the
underlying problem.
Thanks,
tglx
---
diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index 27d1a5b7f571..ac41d83b38d3 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -154,9 +154,9 @@ static void __init xen_pv_smp_config(void)
u32 apicid = 0;
int i;
- topology_register_boot_apic(apicid++);
+ topology_register_boot_apic(apicid);
- for (i = 1; i < nr_cpu_ids; i++)
+ for (i = 0; i < nr_cpu_ids; i++)
topology_register_apic(apicid++, CPU_ACPIID_INVALID, true);
/* Pretend to be a proper enumerated system */
On 30.04.24 18:13, Thomas Gleixner wrote:
> On Fri, Apr 05 2024 at 14:34, Juergen Gross wrote:
>> When booting as a Xen PV guest the boot processor isn't detected
>> correctly and the following message is shown:
>>
>> CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 0 > 1
>>
>> Additionally this results in one CPU being ignored.
>>
>> Fix that by calling the BSP detection logic when registering the boot
>> CPU's APIC, too.
>>
>> Fixes: 5c5682b9f87a ("x86/cpu: Detect real BSP on crash kernels")
>> Signed-off-by: Juergen Gross <[email protected]>
>> ---
>> arch/x86/kernel/cpu/topology.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
>> index aaca8d235dc2..23c3db5e6396 100644
>> --- a/arch/x86/kernel/cpu/topology.c
>> +++ b/arch/x86/kernel/cpu/topology.c
>> @@ -255,7 +255,7 @@ void __init topology_register_boot_apic(u32 apic_id)
>> WARN_ON_ONCE(topo_info.boot_cpu_apic_id != BAD_APICID);
>>
>> topo_info.boot_cpu_apic_id = apic_id;
>> - topo_register_apic(apic_id, CPU_ACPIID_INVALID, true);
>> + topology_register_apic(apic_id, CPU_ACPIID_INVALID, true);
>
> No. This does not fix anything at all. It just papers over the
> underlying problem.
>
> Thanks,
>
> tglx
> ---
> diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
> index 27d1a5b7f571..ac41d83b38d3 100644
> --- a/arch/x86/xen/smp_pv.c
> +++ b/arch/x86/xen/smp_pv.c
> @@ -154,9 +154,9 @@ static void __init xen_pv_smp_config(void)
> u32 apicid = 0;
> int i;
>
> - topology_register_boot_apic(apicid++);
> + topology_register_boot_apic(apicid);
>
> - for (i = 1; i < nr_cpu_ids; i++)
> + for (i = 0; i < nr_cpu_ids; i++)
> topology_register_apic(apicid++, CPU_ACPIID_INVALID, true);
>
> /* Pretend to be a proper enumerated system */
>
>
>
Thanks, works great.
Do you want to send it as your own patch, or should I add your Signed-off-by or
your Suggested-by?
Juergen
The topology core expects the boot APIC to be registered from early APIC
detection first and then again when the firmware tables are evaluated. This
is used for detecting the real BSP CPU on a kexec kernel.
The recent conversion of XEN/PV to register fake APIC IDs failed to
register the boot CPU APIC correctly as it only registers it once. This
causes the BSP detection mechanism to trigger wrongly:
CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 0 > 1
Additionally this results in one CPU being ignored.
Register the boot CPU APIC twice so that the XEN/PV fake enumeration
behaves like real firmware.
Reported-by: Juergen Gross <[email protected]>
Fixes: e75307023466 ("x86/xen/smp_pv: Register fake APICs")
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
---
arch/x86/xen/smp_pv.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -154,9 +154,9 @@ static void __init xen_pv_smp_config(voi
u32 apicid = 0;
int i;
- topology_register_boot_apic(apicid++);
+ topology_register_boot_apic(apicid);
- for (i = 1; i < nr_cpu_ids; i++)
+ for (i = 0; i < nr_cpu_ids; i++)
topology_register_apic(apicid++, CPU_ACPIID_INVALID, true);
/* Pretend to be a proper enumerated system */
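To make the effect of the two loop variants easier to see, here is a deliberately simplified user-space toy model of the enumeration order. It is a sketch under stated assumptions, not the topology core: all `toy_*` names are hypothetical, and the real BSP check consults more state than the single "first enumerated ID must equal the boot ID" rule modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_BAD_APICID 0xffffffffu

/* Toy state, loosely inspired by the description in the changelog. */
struct toy_topo {
	uint32_t boot_apic_id;     /* set by the early boot registration */
	uint32_t first_enumerated; /* first ID seen by the enumeration pass */
	unsigned int accepted;     /* distinct CPUs accepted */
	unsigned int rejected;     /* IDs dropped by the toy BSP check */
};

static void toy_init(struct toy_topo *t)
{
	t->boot_apic_id = TOY_BAD_APICID;
	t->first_enumerated = TOY_BAD_APICID;
	t->accepted = 0;
	t->rejected = 0;
}

/* Models the early registration of the boot CPU's APIC ID. */
static void toy_register_boot_apic(struct toy_topo *t, uint32_t id)
{
	t->boot_apic_id = id;
	t->accepted = 1; /* the boot CPU always counts once */
}

/* Models one enumeration call; returns false when the ID is rejected. */
static bool toy_register_apic(struct toy_topo *t, uint32_t id)
{
	if (t->first_enumerated == TOY_BAD_APICID) {
		t->first_enumerated = id;
		if (id != t->boot_apic_id) {
			/* Enumeration did not start with the boot CPU: drop this ID. */
			t->rejected++;
			return false;
		}
		return true; /* second registration of the boot CPU, not counted again */
	}
	t->accepted++;
	return true;
}

/* Old order: boot ID registered once, enumeration starts at ID 1. */
static unsigned int toy_enumerate_old(unsigned int nr_cpus)
{
	struct toy_topo t;
	uint32_t apicid = 0;
	unsigned int i;

	toy_init(&t);
	toy_register_boot_apic(&t, apicid++);
	for (i = 1; i < nr_cpus; i++)
		toy_register_apic(&t, apicid++);
	return t.accepted;
}

/* New order: boot ID registered twice, enumeration starts at ID 0. */
static unsigned int toy_enumerate_new(unsigned int nr_cpus)
{
	struct toy_topo t;
	uint32_t apicid = 0;
	unsigned int i;

	toy_init(&t);
	toy_register_boot_apic(&t, apicid);
	for (i = 0; i < nr_cpus; i++)
		toy_register_apic(&t, apicid++);
	return t.accepted;
}
```

In this model, the old order with four CPUs ends up with three accepted CPUs because ID 1 is the first enumerated ID and does not match the boot ID, while the new order accepts all four. That mirrors the "one CPU being ignored" symptom, within the limits of the simplification.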
On 02.05.24 16:39, Thomas Gleixner wrote:
> The topology core expects the boot APIC to be registered from early APIC
> detection first and then again when the firmware tables are evaluated. This
> is used for detecting the real BSP CPU on a kexec kernel.
>
> The recent conversion of XEN/PV to register fake APIC IDs failed to
> register the boot CPU APIC correctly as it only registers it once. This
> causes the BSP detection mechanism to trigger wrongly:
>
> CPU topo: Boot CPU APIC ID not the first enumerated APIC ID: 0 > 1
>
> Additionally this results in one CPU being ignored.
>
> Register the boot CPU APIC twice so that the XEN/PV fake enumeration
> behaves like real firmware.
>
> Reported-by: Juergen Gross <[email protected]>
> Fixes: e75307023466 ("x86/xen/smp_pv: Register fake APICs")
> Signed-off-by: Thomas Gleixner <[email protected]>
> Tested-by: Juergen Gross <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Thanks for the patch, I'll take it via the Xen tree.
Juergen
> ---
> arch/x86/xen/smp_pv.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- a/arch/x86/xen/smp_pv.c
> +++ b/arch/x86/xen/smp_pv.c
> @@ -154,9 +154,9 @@ static void __init xen_pv_smp_config(voi
> u32 apicid = 0;
> int i;
>
> - topology_register_boot_apic(apicid++);
> + topology_register_boot_apic(apicid);
>
> - for (i = 1; i < nr_cpu_ids; i++)
> + for (i = 0; i < nr_cpu_ids; i++)
> topology_register_apic(apicid++, CPU_ACPIID_INVALID, true);
>
> /* Pretend to be a proper enumerated system */