2022-10-14 09:45:10

by Zhang, Rui

[permalink] [raw]
Subject: [PATCH V4 0/4] x86/topology: Fix CPUID.1F handling

Handling for CPUID.1F is introduced by commit 7745f03eb395 ("x86/topology:
Add CPUID.1F multi-die/package support").
And only SMT/Core/Die level types are supported at that time.

On Intel AlderLake-N platforms where there are Ecores only, the Ecore
Module topology is enumerated via CPUID.1F Module level.

This exposes two bugs in the CPUID.1F handling code,
1. Linux interprets the Module ID bits as package ID and erroneously
reports a multi module system as a multi-package system.
2. Linux excludes the unknown Module ID bits from the core ID, and results
in duplicate core ID’s shown in a package after the first issue solved.

Patch 3/4 and 4/4 fixes these two problems in CPUID.1F handling code, and
patch 1/4 and 2/4 are needed to avoid potential regressions.

thanks,
-rui

---
Changes in V4:
- Drop a fix for a potential issue, as well as document cleanups/updates.
So this patch series only contains bug fixes (stable candidates).
- Integrate the patch subject/changelog improvements from Dave.
- Add Fixes tags.

Changes in V3:
- changelog improvements based on Peter' feedback
- Remove combined tags

Changes in V2:
- fix/improve changelog/comment wording issues
- reorder the patches to eliminate bisection breakage window
- add a new patch for coretemp driver variable renaming
- update coretemp driver patch to fix a case of ida_free(&ida, -2)


2022-10-14 09:45:13

by Zhang, Rui

[permalink] [raw]
Subject: [PATCH V4 3/4] x86/topology: Fix multiple packages shown on a single-package system

CPUID.1F/B does not emumerate Package level explicitly, instead, all the
APIC-ID bits above the enumerated levels are assumed to be package ID
bits.

Current code gets package ID by shifting out all the APIC-ID bits that
Linux supports, rather than shifting out all the APIC-ID bits that
CPUID.1F enumerates. This introduces problems when CPUID.1F enumerates a
level that Linux does not support.

For example, on a single package AlderLake-N, there are 2 Ecore Modules
with 4 atom cores in each module. Linux does not support the Module
level and interprets the Module ID bits as package ID and erroneously
reports a multi module system as a multi-package system.

Fix this by using APIC-ID bits above all the CPUID.1F enumerated levels
as package ID.

Fixes: 7745f03eb395 ("x86/topology: Add CPUID.1F multi-die/package support")
Cc: [email protected]
Suggested-by: Len Brown <[email protected]>
Signed-off-by: Zhang Rui <[email protected]>
Reviewed-by: Len Brown <[email protected]>
---
arch/x86/kernel/cpu/topology.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 132a2de44d2f..f7592814e5d5 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -96,6 +96,7 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
unsigned int ht_mask_width, core_plus_mask_width, die_plus_mask_width;
unsigned int core_select_mask, core_level_siblings;
unsigned int die_select_mask, die_level_siblings;
+ unsigned int pkg_mask_width;
bool die_level_present = false;
int leaf;

@@ -111,10 +112,10 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
- die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
+ pkg_mask_width = die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);

sub_index = 1;
- do {
+ while (true) {
cpuid_count(leaf, sub_index, &eax, &ebx, &ecx, &edx);

/*
@@ -132,8 +133,13 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
}

+ if (LEAFB_SUBTYPE(ecx) != INVALID_TYPE)
+ pkg_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
+ else
+ break;
+
sub_index++;
- } while (LEAFB_SUBTYPE(ecx) != INVALID_TYPE);
+ }

core_select_mask = (~(-1 << core_plus_mask_width)) >> ht_mask_width;
die_select_mask = (~(-1 << die_plus_mask_width)) >>
@@ -148,7 +154,7 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
}

c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid,
- die_plus_mask_width);
+ pkg_mask_width);
/*
* Reinit the apicid, now that we have extended initial_apicid.
*/
--
2.25.1

Subject: [tip: x86/urgent] x86/topology: Fix multiple packages shown on a single-package system

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 2b12a7a126d62bdbd81f4923c21bf6e9a7fbd069
Gitweb: https://git.kernel.org/tip/2b12a7a126d62bdbd81f4923c21bf6e9a7fbd069
Author: Zhang Rui <[email protected]>
AuthorDate: Fri, 14 Oct 2022 17:01:46 +08:00
Committer: Dave Hansen <[email protected]>
CommitterDate: Mon, 17 Oct 2022 11:58:52 -07:00

x86/topology: Fix multiple packages shown on a single-package system

CPUID.1F/B does not enumerate Package level explicitly, instead, all the
APIC-ID bits above the enumerated levels are assumed to be package ID
bits.

Current code gets package ID by shifting out all the APIC-ID bits that
Linux supports, rather than shifting out all the APIC-ID bits that
CPUID.1F enumerates. This introduces problems when CPUID.1F enumerates a
level that Linux does not support.

For example, on a single package AlderLake-N, there are 2 Ecore Modules
with 4 atom cores in each module. Linux does not support the Module
level and interprets the Module ID bits as package ID and erroneously
reports a multi module system as a multi-package system.

Fix this by using APIC-ID bits above all the CPUID.1F enumerated levels
as package ID.

[ dhansen: spelling fix ]

Fixes: 7745f03eb395 ("x86/topology: Add CPUID.1F multi-die/package support")
Suggested-by: Len Brown <[email protected]>
Signed-off-by: Zhang Rui <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Len Brown <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/kernel/cpu/topology.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 132a2de..f759281 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -96,6 +96,7 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
unsigned int ht_mask_width, core_plus_mask_width, die_plus_mask_width;
unsigned int core_select_mask, core_level_siblings;
unsigned int die_select_mask, die_level_siblings;
+ unsigned int pkg_mask_width;
bool die_level_present = false;
int leaf;

@@ -111,10 +112,10 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
- die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
+ pkg_mask_width = die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);

sub_index = 1;
- do {
+ while (true) {
cpuid_count(leaf, sub_index, &eax, &ebx, &ecx, &edx);

/*
@@ -132,8 +133,13 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
}

+ if (LEAFB_SUBTYPE(ecx) != INVALID_TYPE)
+ pkg_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
+ else
+ break;
+
sub_index++;
- } while (LEAFB_SUBTYPE(ecx) != INVALID_TYPE);
+ }

core_select_mask = (~(-1 << core_plus_mask_width)) >> ht_mask_width;
die_select_mask = (~(-1 << die_plus_mask_width)) >>
@@ -148,7 +154,7 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
}

c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid,
- die_plus_mask_width);
+ pkg_mask_width);
/*
* Reinit the apicid, now that we have extended initial_apicid.
*/