2022-09-22 14:21:29

by Zhang, Rui

[permalink] [raw]
Subject: [PATCH V3 0/8] x86/topology: Improve CPUID.1F handling

On Intel AlderLake-N platforms where there are Ecores only, the Ecore
Module topology is enumerated via CPUID.1F Module level, which has not
been supported by Linux kernel yet.

This exposes two issues in current CPUID.1F handling code.
1. Linux interprets the Module ID bits as package ID and erroneously
reports a multi module system as a multi-package system.
2. Linux excludes the unknown Module ID bits from the core ID, and results
in duplicate core ID’s shown in a package after the first issue solved.

Plus that, a third problem is observed on Intel Hybrid ADL-S/P platforms.
The return value of CPUID.1F SMT level EBX (number of siblings) differs on
Pcore CPUs and Ecore CPUs, and results in inconsistent smp_num_siblings
value based on the Pcore/Ecore CPU enumeration order. This could bring
some potential issues although we have not observed any functionalities
issues so far.

This patch series fixes these three problems in CPUID.1F handling code,
together with some related fixes and document updates.

thanks,
-rui

---
Changes since V2:
- changelog improvements based on Peter' feedback
- Remove combined tags

Changes since V1:
- fix/improve changelog/comment wording issues
- reorder the patches to eliminate bisection breakage window
- add a new patch for coretemp driver variable renaming
- update coretemp driver patch to fix a case of ida_free(&ida, -2)


2022-09-22 14:26:16

by Zhang, Rui

[permalink] [raw]
Subject: [PATCH V3 7/8] Documentation: x86: Update smp_num_siblings/x86_max_cores description

smp_num_siblings/cpuinfo_x86.x86_max_cores are retrieved via CPUID EAX
bit_shift value, and they represent the maximum possible number of threads
in a core, and the maximum possible number of cores in a package.

Update the smp_num_siblings/cpuinfo_x86.x86_max_cores description in the
documentation.

Reviewed-by: Len Brown <[email protected]>
Signed-off-by: Zhang Rui <[email protected]>
---
Documentation/x86/topology.rst | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/Documentation/x86/topology.rst b/Documentation/x86/topology.rst
index 7f58010ea86a..c5eb5bc42380 100644
--- a/Documentation/x86/topology.rst
+++ b/Documentation/x86/topology.rst
@@ -49,7 +49,8 @@ AMD nomenclature for package is 'Node'.

- cpuinfo_x86.x86_max_cores:

- The number of cores in a package. This information is retrieved via CPUID.
+ The maximum possible number of cores in a package. This information is
+ retrieved via CPUID.

- cpuinfo_x86.x86_max_dies:

@@ -102,10 +103,10 @@ AMDs nomenclature for a CMT core is "Compute Unit". The kernel always uses

- smp_num_siblings:

- The number of threads in a core. The number of threads in a package can be
- calculated by::
+ The maximum possible number of threads in a core. The maximum possible
+ number of threads in a package can be calculated by::

- threads_per_package = cpuinfo_x86.x86_max_cores * smp_num_siblings
+ maximum_threads_per_package = cpuinfo_x86.x86_max_cores * smp_num_siblings


Threads
--
2.25.1

2022-09-22 14:31:49

by Zhang, Rui

[permalink] [raw]
Subject: [PATCH V3 6/8] x86/topology: Fix max_siblings calculation

The max siblings value returned by CPUID.1F SMT level EBX differs among
CPUs on Intel Hybrid platforms like ADL-S/P.
It returns 2 for Pcore CPUs which have HT sibling and 1 for Ecore CPUs
which do not.

Today, CPUID SMT level EBX sets the global variable smp_num_siblings.
Thus, smp_num_siblings is overridden to different values based on the CPU
Pcore/Ecore enumeration order.

For example,

[ 0.201005] detect_extended_topology: CPU APICID 0x0, smp_num_siblings 2, x86_max_cores 10
[ 0.201117] start_kernel->check_bugs->cpu_smt_check_topology: smp_num_siblings 2
...
[ 0.010146] detect_extended_topology: CPU APICID 0x8, smp_num_siblings 2, x86_max_cores 10
...
[ 0.010146] detect_extended_topology: CPU APICID 0x39, smp_num_siblings 2, x86_max_cores 10
[ 0.010146] detect_extended_topology: CPU APICID 0x48, smp_num_siblings 1, x86_max_cores 20
...
[ 0.010146] detect_extended_topology: CPU APICID 0x4e, smp_num_siblings 1, x86_max_cores 20
[ 2.583800] sched_set_itmt_core_prio: smp_num_siblings 1

This inconsistency brings several potential issues:
1. some kernel configuration like cpu_smt_control, as set in
start_kernel()->check_bugs()->cpu_smt_check_topology(), depends on
smp_num_siblings set by cpu0.
It is pure luck that all the current hybrid platforms use Pcore as cpu0
and hide this problem.
2. some per CPU data like cpuinfo_x86.x86_max_cores that depends on
smp_num_siblings becomes inconsistent and bogus.
3. the final smp_num_siblings value after boot depends on the last CPU
enumerated, which could either be Pcore or Ecore CPU.

The solution is to use CPUID EAX bits_shift to get the maximum number of
addressable logical processors, and use this to determin max siblings.
Because:
1. the CPUID EAX bits_shift values are consistent among CPUs as far as
observed.
2. some code already uses smp_num_siblings value to isolate the SMT ID
bits in APIC-ID, like apic_id_is_primary_thread().

Suggested-by: Len Brown <[email protected]>
Reviewed-by: Len Brown <[email protected]>
Signed-off-by: Zhang Rui <[email protected]>
---
arch/x86/kernel/cpu/topology.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 5e868b62a7c4..2a88f2fa5756 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -23,7 +23,12 @@

#define LEAFB_SUBTYPE(ecx) (((ecx) >> 8) & 0xff)
#define BITS_SHIFT_NEXT_LEVEL(eax) ((eax) & 0x1f)
-#define LEVEL_MAX_SIBLINGS(ebx) ((ebx) & 0xffff)
+
+/*
+ * Use EAX bit_shift to calculate the maximum number of addressable logical
+ * processors sharing the current level.
+ */
+#define LEVEL_MAX_SIBLINGS(eax) (1 << BITS_SHIFT_NEXT_LEVEL(eax))

unsigned int __max_die_per_package __read_mostly = 1;
EXPORT_SYMBOL(__max_die_per_package);
@@ -79,7 +84,7 @@ int detect_extended_topology_early(struct cpuinfo_x86 *c)
* initial apic id, which also represents 32-bit extended x2apic id.
*/
c->initial_apicid = edx;
- smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ smp_num_siblings = LEVEL_MAX_SIBLINGS(eax);
#endif
return 0;
}
@@ -109,9 +114,9 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
*/
cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
c->initial_apicid = edx;
- core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(eax);
core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
- die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ die_level_siblings = LEVEL_MAX_SIBLINGS(eax);
pkg_mask_width = die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);

sub_index = 1;
@@ -122,14 +127,14 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
* Check for the Core type in the implemented sub leaves.
*/
if (LEAFB_SUBTYPE(ecx) == CORE_TYPE) {
- core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ core_level_siblings = LEVEL_MAX_SIBLINGS(eax);
core_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
die_level_siblings = core_level_siblings;
die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
}
if (LEAFB_SUBTYPE(ecx) == DIE_TYPE) {
die_level_present = true;
- die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ die_level_siblings = LEVEL_MAX_SIBLINGS(eax);
die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
}

--
2.25.1

2022-09-22 14:32:00

by Zhang, Rui

[permalink] [raw]
Subject: [PATCH V3 4/8] x86/topology: Fix multiple packages shown on a single-package system

CPUID.1F/B does not emumerate Package level explicitly, instead, all the
APIC-ID bits above the enumerated levels are assumed to be package ID
bits.

Current code gets package ID by shifting out all the APIC-ID bits that
Linux supports, rather than shifting out all the APIC-ID bits that
CPUID.1F enumerates. This introduces problems when CPUID.1F enumerates a
level that Linux does not support.

For example, on an AlderLake-N platform, there are 2 Ecore Modules, which
has 4 atom cores in each module, in a single package.
Linux does not support Module level and interprets the Module ID bits as
package ID and erroneously reports a multi module system as a
multi-package system.

Fix this by using APIC-ID bits above all the CPUID.1F enumerated levels as
package ID.

Suggested-by: Len Brown <[email protected]>
Reviewed-by: Len Brown <[email protected]>
Signed-off-by: Zhang Rui <[email protected]>
---
arch/x86/kernel/cpu/topology.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 132a2de44d2f..f7592814e5d5 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -96,6 +96,7 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
unsigned int ht_mask_width, core_plus_mask_width, die_plus_mask_width;
unsigned int core_select_mask, core_level_siblings;
unsigned int die_select_mask, die_level_siblings;
+ unsigned int pkg_mask_width;
bool die_level_present = false;
int leaf;

@@ -111,10 +112,10 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
- die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
+ pkg_mask_width = die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);

sub_index = 1;
- do {
+ while (true) {
cpuid_count(leaf, sub_index, &eax, &ebx, &ecx, &edx);

/*
@@ -132,8 +133,13 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
}

+ if (LEAFB_SUBTYPE(ecx) != INVALID_TYPE)
+ pkg_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
+ else
+ break;
+
sub_index++;
- } while (LEAFB_SUBTYPE(ecx) != INVALID_TYPE);
+ }

core_select_mask = (~(-1 << core_plus_mask_width)) >> ht_mask_width;
die_select_mask = (~(-1 << die_plus_mask_width)) >>
@@ -148,7 +154,7 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
}

c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid,
- die_plus_mask_width);
+ pkg_mask_width);
/*
* Reinit the apicid, now that we have extended initial_apicid.
*/
--
2.25.1

2022-09-22 17:04:51

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH V3 0/8] x86/topology: Improve CPUID.1F handling

> Changes since V2:
> - changelog improvements based on Peter' feedback
> - Remove combined tags

I fixed those up and started testing your v2 yesterday. Can you
double-check that this:

https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/log/?h=cpuid1f

Looks the same as your v3?

2022-09-23 09:29:53

by Zhang, Rui

[permalink] [raw]
Subject: Re: [PATCH V3 0/8] x86/topology: Improve CPUID.1F handling

Hi, Dave,

On Thu, 2022-09-22 at 09:53 -0700, Dave Hansen wrote:
> > Changes since V2:
> > - changelog improvements based on Peter' feedback
> > - Remove combined tags
>
> I fixed those up and started testing your v2 yesterday.

Thanks for doing this.

> Can you
> double-check that this:
>
>
> https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/log/?h=cpuid1f
>
> Looks the same as your v3?

There is no code difference.
Just that I have updated the subject and changelog of patch 1/8 per
Peter' suggestion
https://lore.kernel.org/lkml/[email protected]/

thanks,
rui

2022-10-13 11:06:46

by Len Brown

[permalink] [raw]
Subject: Re: [PATCH V3 0/8] x86/topology: Improve CPUID.1F handling

This series of BUG FIXES needs to be marked for -stable.

What testing is it waiting for?
I don't see upstream, in linux-next or in tip -- which means that
nobody is testing it.

Are we supposed to be pulling from the URL below to get the latest???

The latest Intel Hybrid CPUs are sort of a mess without this series.
Distros will need to back-port it.

thanks,
-Len

On Fri, Sep 23, 2022 at 10:40 AM Zhang Rui <[email protected]> wrote:
>
> Hi, Dave,
>
> On Thu, 2022-09-22 at 09:53 -0700, Dave Hansen wrote:
> > > Changes since V2:
> > > - changelog improvements based on Peter' feedback
> > > - Remove combined tags
> >
> > I fixed those up and started testing your v2 yesterday.
>
> Thanks for doing this.
>
> > Can you
> > double-check that this:
> >
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/daveh/devel.git/log/?h=cpuid1f
> >
> > Looks the same as your v3?
>
> There is no code difference.
> Just that I have updated the subject and changelog of patch 1/8 per
> Peter' suggestion
> https://lore.kernel.org/lkml/[email protected]/
>
> thanks,
> rui
>


--
Len Brown, Intel

2022-10-13 16:30:25

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH V3 0/8] x86/topology: Improve CPUID.1F handling

On 10/13/22 03:58, Len Brown wrote:
> This series of BUG FIXES needs to be marked for -stable.

Hi Len,

That would have been great feedback to include with your review when
your provided your acks. It's also not clear where the bug fixes stop
and the "related fixes" begin. Is the entire series bug fixes that need
to be marked for -stable?

2022-10-14 02:59:30

by Zhang, Rui

[permalink] [raw]
Subject: Re: [PATCH V3 0/8] x86/topology: Improve CPUID.1F handling

Hi, Dave,

On Thu, 2022-10-13 at 08:56 -0700, Dave Hansen wrote:
> On 10/13/22 03:58, Len Brown wrote:
> > This series of BUG FIXES needs to be marked for -stable.
>
> Hi Len,
>
> That would have been great feedback to include with your review when
> your provided your acks. It's also not clear where the bug fixes
> stop
> and the "related fixes" begin. Is the entire series bug fixes that
> need
> to be marked for -stable?

Patch 4/8 ~ 5/8 are real fixes and they depends on patch 2/8 and 3/8 to
avoid possible regressions. So patch 2/8 ~ 5/8 should be stable
material.

patch 6/8 is also a bugfix, but we have not observed any functionality
issues so far.

thanks,
rui

2022-10-14 04:23:07

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH V3 0/8] x86/topology: Improve CPUID.1F handling

On 10/13/22 19:06, Zhang Rui wrote:
> Patch 4/8 ~ 5/8 are real fixes and they depends on patch 2/8 and 3/8 to
> avoid possible regressions. So patch 2/8 ~ 5/8 should be stable
> material.
>
> patch 6/8 is also a bugfix, but we have not observed any functionality
> issues so far.

If you want to make this really easy on me, you could take the changelog
and comment revisions that I made in that testing branch that I shared,
integrate them, and send it out again, and *only* include the bugfixes.

2022-10-14 06:02:30

by Zhang, Rui

[permalink] [raw]
Subject: Re: [PATCH V3 0/8] x86/topology: Improve CPUID.1F handling

On Thu, 2022-10-13 at 20:19 -0700, Dave Hansen wrote:
> On 10/13/22 19:06, Zhang Rui wrote:
> > Patch 4/8 ~ 5/8 are real fixes and they depends on patch 2/8 and
> > 3/8 to
> > avoid possible regressions. So patch 2/8 ~ 5/8 should be stable
> > material.
> >
> > patch 6/8 is also a bugfix, but we have not observed any
> > functionality
> > issues so far.
>
> If you want to make this really easy on me, you could take the
> changelog
> and comment revisions that I made in that testing branch that I
> shared,
> integrate them, and send it out again, and *only* include the
> bugfixes.

Sure, will do this and resend.

thanks,
rui

2022-10-14 08:26:05

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH V3 0/8] x86/topology: Improve CPUID.1F handling

On Thu, Oct 13, 2022 at 08:19:03PM -0700, Dave Hansen wrote:
> On 10/13/22 19:06, Zhang Rui wrote:
> > Patch 4/8 ~ 5/8 are real fixes and they depends on patch 2/8 and 3/8 to
> > avoid possible regressions. So patch 2/8 ~ 5/8 should be stable
> > material.
> >
> > patch 6/8 is also a bugfix, but we have not observed any functionality
> > issues so far.
>
> If you want to make this really easy on me, you could take the changelog
> and comment revisions that I made in that testing branch that I shared,
> integrate them, and send it out again, and *only* include the bugfixes.

It's really simple; if it don't have a Fixes tag, it ain't a fix :-)

2022-10-14 09:30:15

by Zhang, Rui

[permalink] [raw]
Subject: Re: [PATCH V3 0/8] x86/topology: Improve CPUID.1F handling

On Fri, 2022-10-14 at 10:17 +0200, Peter Zijlstra wrote:
> On Thu, Oct 13, 2022 at 08:19:03PM -0700, Dave Hansen wrote:
> > On 10/13/22 19:06, Zhang Rui wrote:
> > > Patch 4/8 ~ 5/8 are real fixes and they depends on patch 2/8 and
> > > 3/8 to
> > > avoid possible regressions. So patch 2/8 ~ 5/8 should be stable
> > > material.
> > >
> > > patch 6/8 is also a bugfix, but we have not observed any
> > > functionality
> > > issues so far.
> >
> > If you want to make this really easy on me, you could take the
> > changelog
> > and comment revisions that I made in that testing branch that I
> > shared,
> > integrate them, and send it out again, and *only* include the
> > bugfixes.
>
> It's really simple; if it don't have a Fixes tag, it ain't a fix :-)

Thanks for pointing this out.
Patches updated with Fixes tag in V4 series.

thanks,
rui