On Intel AlderLake-N platforms where there are Ecores only, the Ecore
Module topology is enumerated via CPUID.1F Module level, which has not
been supported by Linux kernel yet.
This exposes two issues in current CPUID.1F handling code.
1. Linux interprets the Module ID bits as package ID and erroneously
reports a multi module system as a multi-package system.
2. Linux excludes the unknown Module ID bits from the core ID, and results
in duplicate core ID’s shown in a package after the first issue solved.
Plus that, a third problem is observed on Intel Hybrid ADL-S/P platforms.
The return value of CPUID.1F SMT level EBX (number of siblings) differs on
Pcore CPUs and Ecore CPUs, and results in inconsistent smp_num_siblings
value based on the Pcore/Ecore CPU enumeration order. This could bring
some potential issues although we have not observed any functionalities
issues so far.
This patch series fixes these three problems in CPUID.1F handling code,
together with some related fixes and document updates.
thanks,
-rui
---
Changes since V1:
- fix/improve changelog/comment wording issues
- reorder the patches to eliminate bisection breakage window
- add a new patch for coretemp driver variable renaming
- update coretemp driver patch to fix a case of ida_free(&ida, -2)
The max siblings value returned by CPUID.1F SMT level EBX differs among
CPUs on Intel Hybrid platforms like ADL-S/P.
It returns 2 for Pcore CPUs which have HT sibling and 1 for Ecore CPUs
which do not.
Today, CPUID SMT level EBX sets the global variable smp_num_siblings.
Thus, smp_num_siblings is overridden to different values based on the CPU
Pcore/Ecore enumeration order.
For example,
[ 0.201005] detect_extended_topology: CPU APICID 0x0, smp_num_siblings 2, x86_max_cores 10
[ 0.201117] start_kernel->check_bugs->cpu_smt_check_topology: smp_num_siblings 2
...
[ 0.010146] detect_extended_topology: CPU APICID 0x8, smp_num_siblings 2, x86_max_cores 10
...
[ 0.010146] detect_extended_topology: CPU APICID 0x39, smp_num_siblings 2, x86_max_cores 10
[ 0.010146] detect_extended_topology: CPU APICID 0x48, smp_num_siblings 1, x86_max_cores 20
...
[ 0.010146] detect_extended_topology: CPU APICID 0x4e, smp_num_siblings 1, x86_max_cores 20
[ 2.583800] sched_set_itmt_core_prio: smp_num_siblings 1
This inconsistency brings several potential issues:
1. some kernel configuration like cpu_smt_control, as set in
start_kernel()->check_bugs()->cpu_smt_check_topology(), depends on
smp_num_siblings set by cpu0.
It is pure luck that all the current hybrid platforms use Pcore as cpu0
and hide this problem.
2. some per CPU data like cpuinfo_x86.x86_max_cores that depends on
smp_num_siblings becomes inconsistent and bogus.
3. the final smp_num_siblings value after boot depends on the last CPU
enumerated, which could either be Pcore or Ecore CPU.
The solution is to use CPUID EAX bits_shift to get the maximum number of
addressable logical processors, and use this to determin max siblings.
Because:
1. the CPUID EAX bits_shift values are consistent among CPUs as far as
observed.
2. some code already uses smp_num_siblings value to isolate the SMT ID
bits in APIC-ID, like apic_id_is_primary_thread().
Suggested-and-reviewed-by: Len Brown <[email protected]>
Signed-off-by: Zhang Rui <[email protected]>
---
arch/x86/kernel/cpu/topology.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 5e868b62a7c4..2a88f2fa5756 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -23,7 +23,12 @@
#define LEAFB_SUBTYPE(ecx) (((ecx) >> 8) & 0xff)
#define BITS_SHIFT_NEXT_LEVEL(eax) ((eax) & 0x1f)
-#define LEVEL_MAX_SIBLINGS(ebx) ((ebx) & 0xffff)
+
+/*
+ * Use EAX bit_shift to calculate the maximum number of addressable logical
+ * processors sharing the current level.
+ */
+#define LEVEL_MAX_SIBLINGS(eax) (1 << BITS_SHIFT_NEXT_LEVEL(eax))
unsigned int __max_die_per_package __read_mostly = 1;
EXPORT_SYMBOL(__max_die_per_package);
@@ -79,7 +84,7 @@ int detect_extended_topology_early(struct cpuinfo_x86 *c)
* initial apic id, which also represents 32-bit extended x2apic id.
*/
c->initial_apicid = edx;
- smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ smp_num_siblings = LEVEL_MAX_SIBLINGS(eax);
#endif
return 0;
}
@@ -109,9 +114,9 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
*/
cpuid_count(leaf, SMT_LEVEL, &eax, &ebx, &ecx, &edx);
c->initial_apicid = edx;
- core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(eax);
core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
- die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ die_level_siblings = LEVEL_MAX_SIBLINGS(eax);
pkg_mask_width = die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
sub_index = 1;
@@ -122,14 +127,14 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
* Check for the Core type in the implemented sub leaves.
*/
if (LEAFB_SUBTYPE(ecx) == CORE_TYPE) {
- core_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ core_level_siblings = LEVEL_MAX_SIBLINGS(eax);
core_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
die_level_siblings = core_level_siblings;
die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
}
if (LEAFB_SUBTYPE(ecx) == DIE_TYPE) {
die_level_present = true;
- die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
+ die_level_siblings = LEVEL_MAX_SIBLINGS(eax);
die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
}
--
2.25.1
The coretemp driver supports up to a hard-coded limit of 128 cores.
Today, the driver can not support a core with an ID above that limit.
Yet, the encoding of core ID's is arbitrary (BIOS APIC-ID) and so they
may be sparse and they may be large.
Update the driver to map arbitrary core ID numbers into appropriate
array indexes so that 128 cores can be supported, no matter the encoding
of core ID's.
Acked-by: Len Brown <[email protected]>
Signed-off-by: Zhang Rui <[email protected]>
---
drivers/hwmon/coretemp.c | 56 +++++++++++++++++++++++++++++-----------
1 file changed, 41 insertions(+), 15 deletions(-)
diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
index bfdcfe8ccb34..291566aeb703 100644
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -46,9 +46,6 @@ MODULE_PARM_DESC(tjmax, "TjMax value in degrees Celsius");
#define TOTAL_ATTRS (MAX_CORE_ATTRS + 1)
#define MAX_CORE_DATA (NUM_REAL_CORES + BASE_SYSFS_ATTR_NO)
-#define TO_CORE_ID(cpu) (cpu_data(cpu).cpu_core_id)
-#define TO_ATTR_NO(cpu) (TO_CORE_ID(cpu) + BASE_SYSFS_ATTR_NO)
-
#ifdef CONFIG_SMP
#define for_each_sibling(i, cpu) \
for_each_cpu(i, topology_sibling_cpumask(cpu))
@@ -91,6 +88,8 @@ struct temp_data {
struct platform_data {
struct device *hwmon_dev;
u16 pkg_id;
+ u16 cpu_map[NUM_REAL_CORES];
+ struct ida ida;
struct cpumask cpumask;
struct temp_data *core_data[MAX_CORE_DATA];
struct device_attribute name_attr;
@@ -441,7 +440,7 @@ static struct temp_data *init_temp_data(unsigned int cpu, int pkg_flag)
MSR_IA32_THERM_STATUS;
tdata->is_pkg_data = pkg_flag;
tdata->cpu = cpu;
- tdata->cpu_core_id = TO_CORE_ID(cpu);
+ tdata->cpu_core_id = topology_core_id(cpu);
tdata->attr_size = MAX_CORE_ATTRS;
mutex_init(&tdata->update_lock);
return tdata;
@@ -454,7 +453,7 @@ static int create_core_data(struct platform_device *pdev, unsigned int cpu,
struct platform_data *pdata = platform_get_drvdata(pdev);
struct cpuinfo_x86 *c = &cpu_data(cpu);
u32 eax, edx;
- int err, attr_no;
+ int err, index, attr_no;
/*
* Find attr number for sysfs:
@@ -462,14 +461,26 @@ static int create_core_data(struct platform_device *pdev, unsigned int cpu,
* The attr number is always core id + 2
* The Pkgtemp will always show up as temp1_*, if available
*/
- attr_no = pkg_flag ? PKG_SYSFS_ATTR_NO : TO_ATTR_NO(cpu);
+ if (pkg_flag) {
+ attr_no = PKG_SYSFS_ATTR_NO;
+ } else {
+ index = ida_alloc(&pdata->ida, GFP_KERNEL);
+ if (index < 0)
+ return index;
+ pdata->cpu_map[index] = topology_core_id(cpu);
+ attr_no = index + BASE_SYSFS_ATTR_NO;
+ }
- if (attr_no > MAX_CORE_DATA - 1)
- return -ERANGE;
+ if (attr_no > MAX_CORE_DATA - 1) {
+ err = -ERANGE;
+ goto ida_free;
+ }
tdata = init_temp_data(cpu, pkg_flag);
- if (!tdata)
- return -ENOMEM;
+ if (!tdata) {
+ err = -ENOMEM;
+ goto ida_free;
+ }
/* Test if we can access the status register */
err = rdmsr_safe_on_cpu(cpu, tdata->status_reg, &eax, &edx);
@@ -505,6 +516,9 @@ static int create_core_data(struct platform_device *pdev, unsigned int cpu,
exit_free:
pdata->core_data[attr_no] = NULL;
kfree(tdata);
+ida_free:
+ if (!pkg_flag)
+ ida_free(&pdata->ida, index);
return err;
}
@@ -524,6 +538,9 @@ static void coretemp_remove_core(struct platform_data *pdata, int index)
kfree(pdata->core_data[index]);
pdata->core_data[index] = NULL;
+
+ if (index >= BASE_SYSFS_ATTR_NO)
+ ida_free(&pdata->ida, index - BASE_SYSFS_ATTR_NO);
}
static int coretemp_probe(struct platform_device *pdev)
@@ -537,6 +554,7 @@ static int coretemp_probe(struct platform_device *pdev)
return -ENOMEM;
pdata->pkg_id = pdev->id;
+ ida_init(&pdata->ida);
platform_set_drvdata(pdev, pdata);
pdata->hwmon_dev = devm_hwmon_device_register_with_groups(dev, DRVNAME,
@@ -553,6 +571,7 @@ static int coretemp_remove(struct platform_device *pdev)
if (pdata->core_data[i])
coretemp_remove_core(pdata, i);
+ ida_destroy(&pdata->ida);
return 0;
}
@@ -647,7 +666,7 @@ static int coretemp_cpu_offline(unsigned int cpu)
struct platform_device *pdev = coretemp_get_pdev(cpu);
struct platform_data *pd;
struct temp_data *tdata;
- int index, target;
+ int i, index = -1, target;
/*
* Don't execute this on suspend as the device remove locks
@@ -660,12 +679,19 @@ static int coretemp_cpu_offline(unsigned int cpu)
if (!pdev)
return 0;
- /* The core id is too big, just return */
- index = TO_ATTR_NO(cpu);
- if (index > MAX_CORE_DATA - 1)
+ pd = platform_get_drvdata(pdev);
+
+ for (i = 0; i < NUM_REAL_CORES; i++) {
+ if (pd->cpu_map[i] == topology_core_id(cpu)) {
+ index = i + BASE_SYSFS_ATTR_NO;
+ break;
+ }
+ }
+
+ /* Too many cores and this core is not populated, just return */
+ if (index < 0)
return 0;
- pd = platform_get_drvdata(pdev);
tdata = pd->core_data[index];
cpumask_clear_cpu(cpu, &pd->cpumask);
--
2.25.1
CPUID.1F/B does not emumerate Package level explicitly, instead, all the
APIC-ID bits above the enumerated levels are assumed to be package ID
bits.
Current code gets package ID by shifting out all the APIC-ID bits that
Linux supports, rather than shifting out all the APIC-ID bits that
CPUID.1F enumerates. This introduces problems when CPUID.1F enumerates a
level that Linux does not support.
For example, on an AlderLake-N platform, there are 2 Ecore Modules, which
has 4 atom cores in each module, in a single package.
Linux does not support Module level and interprets the Module ID bits as
package ID and erroneously reports a multi module system as a
multi-package system.
Fix this by using APIC-ID bits above all the CPUID.1F enumerated levels as
package ID.
Suggested-and-reviewed-by: Len Brown <[email protected]>
Signed-off-by: Zhang Rui <[email protected]>
---
arch/x86/kernel/cpu/topology.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 132a2de44d2f..f7592814e5d5 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -96,6 +96,7 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
unsigned int ht_mask_width, core_plus_mask_width, die_plus_mask_width;
unsigned int core_select_mask, core_level_siblings;
unsigned int die_select_mask, die_level_siblings;
+ unsigned int pkg_mask_width;
bool die_level_present = false;
int leaf;
@@ -111,10 +112,10 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
core_level_siblings = smp_num_siblings = LEVEL_MAX_SIBLINGS(ebx);
core_plus_mask_width = ht_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
die_level_siblings = LEVEL_MAX_SIBLINGS(ebx);
- die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
+ pkg_mask_width = die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
sub_index = 1;
- do {
+ while (true) {
cpuid_count(leaf, sub_index, &eax, &ebx, &ecx, &edx);
/*
@@ -132,8 +133,13 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
die_plus_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
}
+ if (LEAFB_SUBTYPE(ecx) != INVALID_TYPE)
+ pkg_mask_width = BITS_SHIFT_NEXT_LEVEL(eax);
+ else
+ break;
+
sub_index++;
- } while (LEAFB_SUBTYPE(ecx) != INVALID_TYPE);
+ }
core_select_mask = (~(-1 << core_plus_mask_width)) >> ht_mask_width;
die_select_mask = (~(-1 << die_plus_mask_width)) >>
@@ -148,7 +154,7 @@ int detect_extended_topology(struct cpuinfo_x86 *c)
}
c->phys_proc_id = apic->phys_pkg_id(c->initial_apicid,
- die_plus_mask_width);
+ pkg_mask_width);
/*
* Reinit the apicid, now that we have extended initial_apicid.
*/
--
2.25.1
smp_num_siblings/cpuinfo_x86.x86_max_cores are retrieved via CPUID EAX
bit_shift value, and they represent the maximum possible number of threads
in a core, and the maximum possible number of cores in a package.
Update the smp_num_siblings/cpuinfo_x86.x86_max_cores description in the
documentation.
Reviewed-by: Len Brown <[email protected]>
Signed-off-by: Zhang Rui <[email protected]>
---
Documentation/x86/topology.rst | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/Documentation/x86/topology.rst b/Documentation/x86/topology.rst
index 7f58010ea86a..c5eb5bc42380 100644
--- a/Documentation/x86/topology.rst
+++ b/Documentation/x86/topology.rst
@@ -49,7 +49,8 @@ AMD nomenclature for package is 'Node'.
- cpuinfo_x86.x86_max_cores:
- The number of cores in a package. This information is retrieved via CPUID.
+ The maximum possible number of cores in a package. This information is
+ retrieved via CPUID.
- cpuinfo_x86.x86_max_dies:
@@ -102,10 +103,10 @@ AMDs nomenclature for a CMT core is "Compute Unit". The kernel always uses
- smp_num_siblings:
- The number of threads in a core. The number of threads in a package can be
- calculated by::
+ The maximum possible number of threads in a core. The maximum possible
+ number of threads in a package can be calculated by::
- threads_per_package = cpuinfo_x86.x86_max_cores * smp_num_siblings
+ maximum_threads_per_package = cpuinfo_x86.x86_max_cores * smp_num_siblings
Threads
--
2.25.1
cpuinfo_x86.x86_max_dies is introduced in commit 7745f03eb395
("x86/topology: Add CPUID.1F multi-die/package support") and then
removed in commit 14d96d6c06b5
("x86/topology: Create topology_max_die_per_package()").
Remove the obsolete cpuinfo_x86.x86_max_dies description.
Fixes: 14d96d6c06b5 ("x86/topology: Create topology_max_die_per_package()")
Reviewed-by: Len Brown <[email protected]>
Signed-off-by: Zhang Rui <[email protected]>
---
Documentation/x86/topology.rst | 4 ----
1 file changed, 4 deletions(-)
diff --git a/Documentation/x86/topology.rst b/Documentation/x86/topology.rst
index c5eb5bc42380..fbef91b1ee5e 100644
--- a/Documentation/x86/topology.rst
+++ b/Documentation/x86/topology.rst
@@ -52,10 +52,6 @@ AMD nomenclature for package is 'Node'.
The maximum possible number of cores in a package. This information is
retrieved via CPUID.
- - cpuinfo_x86.x86_max_dies:
-
- The number of dies in a package. This information is retrieved via CPUID.
-
- cpuinfo_x86.cpu_die_id:
The physical ID of the die. This information is retrieved via CPUID.
--
2.25.1
On Tue, Aug 16, 2022 at 01:16:28PM +0800, Zhang Rui wrote:
> The coretemp driver supports up to a hard-coded limit of 128 cores.
>
> Today, the driver can not support a core with an ID above that limit.
> Yet, the encoding of core ID's is arbitrary (BIOS APIC-ID) and so they
> may be sparse and they may be large.
>
> Update the driver to map arbitrary core ID numbers into appropriate
> array indexes so that 128 cores can be supported, no matter the encoding
> of core ID's.
>
> Acked-by: Len Brown <[email protected]>
> Signed-off-by: Zhang Rui <[email protected]>
Acked-by: Guenter Roeck <[email protected]>
> ---
> drivers/hwmon/coretemp.c | 56 +++++++++++++++++++++++++++++-----------
> 1 file changed, 41 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
> index bfdcfe8ccb34..291566aeb703 100644
> --- a/drivers/hwmon/coretemp.c
> +++ b/drivers/hwmon/coretemp.c
> @@ -46,9 +46,6 @@ MODULE_PARM_DESC(tjmax, "TjMax value in degrees Celsius");
> #define TOTAL_ATTRS (MAX_CORE_ATTRS + 1)
> #define MAX_CORE_DATA (NUM_REAL_CORES + BASE_SYSFS_ATTR_NO)
>
> -#define TO_CORE_ID(cpu) (cpu_data(cpu).cpu_core_id)
> -#define TO_ATTR_NO(cpu) (TO_CORE_ID(cpu) + BASE_SYSFS_ATTR_NO)
> -
> #ifdef CONFIG_SMP
> #define for_each_sibling(i, cpu) \
> for_each_cpu(i, topology_sibling_cpumask(cpu))
> @@ -91,6 +88,8 @@ struct temp_data {
> struct platform_data {
> struct device *hwmon_dev;
> u16 pkg_id;
> + u16 cpu_map[NUM_REAL_CORES];
> + struct ida ida;
> struct cpumask cpumask;
> struct temp_data *core_data[MAX_CORE_DATA];
> struct device_attribute name_attr;
> @@ -441,7 +440,7 @@ static struct temp_data *init_temp_data(unsigned int cpu, int pkg_flag)
> MSR_IA32_THERM_STATUS;
> tdata->is_pkg_data = pkg_flag;
> tdata->cpu = cpu;
> - tdata->cpu_core_id = TO_CORE_ID(cpu);
> + tdata->cpu_core_id = topology_core_id(cpu);
> tdata->attr_size = MAX_CORE_ATTRS;
> mutex_init(&tdata->update_lock);
> return tdata;
> @@ -454,7 +453,7 @@ static int create_core_data(struct platform_device *pdev, unsigned int cpu,
> struct platform_data *pdata = platform_get_drvdata(pdev);
> struct cpuinfo_x86 *c = &cpu_data(cpu);
> u32 eax, edx;
> - int err, attr_no;
> + int err, index, attr_no;
>
> /*
> * Find attr number for sysfs:
> @@ -462,14 +461,26 @@ static int create_core_data(struct platform_device *pdev, unsigned int cpu,
> * The attr number is always core id + 2
> * The Pkgtemp will always show up as temp1_*, if available
> */
> - attr_no = pkg_flag ? PKG_SYSFS_ATTR_NO : TO_ATTR_NO(cpu);
> + if (pkg_flag) {
> + attr_no = PKG_SYSFS_ATTR_NO;
> + } else {
> + index = ida_alloc(&pdata->ida, GFP_KERNEL);
> + if (index < 0)
> + return index;
> + pdata->cpu_map[index] = topology_core_id(cpu);
> + attr_no = index + BASE_SYSFS_ATTR_NO;
> + }
>
> - if (attr_no > MAX_CORE_DATA - 1)
> - return -ERANGE;
> + if (attr_no > MAX_CORE_DATA - 1) {
> + err = -ERANGE;
> + goto ida_free;
> + }
>
> tdata = init_temp_data(cpu, pkg_flag);
> - if (!tdata)
> - return -ENOMEM;
> + if (!tdata) {
> + err = -ENOMEM;
> + goto ida_free;
> + }
>
> /* Test if we can access the status register */
> err = rdmsr_safe_on_cpu(cpu, tdata->status_reg, &eax, &edx);
> @@ -505,6 +516,9 @@ static int create_core_data(struct platform_device *pdev, unsigned int cpu,
> exit_free:
> pdata->core_data[attr_no] = NULL;
> kfree(tdata);
> +ida_free:
> + if (!pkg_flag)
> + ida_free(&pdata->ida, index);
> return err;
> }
>
> @@ -524,6 +538,9 @@ static void coretemp_remove_core(struct platform_data *pdata, int index)
>
> kfree(pdata->core_data[index]);
> pdata->core_data[index] = NULL;
> +
> + if (index >= BASE_SYSFS_ATTR_NO)
> + ida_free(&pdata->ida, index - BASE_SYSFS_ATTR_NO);
> }
>
> static int coretemp_probe(struct platform_device *pdev)
> @@ -537,6 +554,7 @@ static int coretemp_probe(struct platform_device *pdev)
> return -ENOMEM;
>
> pdata->pkg_id = pdev->id;
> + ida_init(&pdata->ida);
> platform_set_drvdata(pdev, pdata);
>
> pdata->hwmon_dev = devm_hwmon_device_register_with_groups(dev, DRVNAME,
> @@ -553,6 +571,7 @@ static int coretemp_remove(struct platform_device *pdev)
> if (pdata->core_data[i])
> coretemp_remove_core(pdata, i);
>
> + ida_destroy(&pdata->ida);
> return 0;
> }
>
> @@ -647,7 +666,7 @@ static int coretemp_cpu_offline(unsigned int cpu)
> struct platform_device *pdev = coretemp_get_pdev(cpu);
> struct platform_data *pd;
> struct temp_data *tdata;
> - int index, target;
> + int i, index = -1, target;
>
> /*
> * Don't execute this on suspend as the device remove locks
> @@ -660,12 +679,19 @@ static int coretemp_cpu_offline(unsigned int cpu)
> if (!pdev)
> return 0;
>
> - /* The core id is too big, just return */
> - index = TO_ATTR_NO(cpu);
> - if (index > MAX_CORE_DATA - 1)
> + pd = platform_get_drvdata(pdev);
> +
> + for (i = 0; i < NUM_REAL_CORES; i++) {
> + if (pd->cpu_map[i] == topology_core_id(cpu)) {
> + index = i + BASE_SYSFS_ATTR_NO;
> + break;
> + }
> + }
> +
> + /* Too many cores and this core is not populated, just return */
> + if (index < 0)
> return 0;
>
> - pd = platform_get_drvdata(pdev);
> tdata = pd->core_data[index];
>
> cpumask_clear_cpu(cpu, &pd->cpumask);
> --
> 2.25.1
>
On Tue, 2022-08-16 at 13:16 +0800, Zhang Rui wrote:
> On Intel AlderLake-N platforms where there are Ecores only, the Ecore
> Module topology is enumerated via CPUID.1F Module level, which has
> not
> been supported by Linux kernel yet.
>
> This exposes two issues in current CPUID.1F handling code.
> 1. Linux interprets the Module ID bits as package ID and erroneously
> reports a multi module system as a multi-package system.
> 2. Linux excludes the unknown Module ID bits from the core ID, and
> results
> in duplicate core ID’s shown in a package after the first issue
> solved.
>
> Plus that, a third problem is observed on Intel Hybrid ADL-S/P
> platforms.
> The return value of CPUID.1F SMT level EBX (number of siblings)
> differs on
> Pcore CPUs and Ecore CPUs, and results in inconsistent
> smp_num_siblings
> value based on the Pcore/Ecore CPU enumeration order. This could
> bring
> some potential issues although we have not observed any
> functionalities
> issues so far.
>
> This patch series fixes these three problems in CPUID.1F handling
> code,
> together with some related fixes and document updates.
>
>
Hi,
These patches are mainly bug fixes for ADL-N and Intel Hybrid
platforms.
May I know if there are any further changes needed?
thanks,
rui
> thanks,
> -rui
>
> ---
> Changes since V1:
> - fix/improve changelog/comment wording issues
> - reorder the patches to eliminate bisection breakage window
> - add a new patch for coretemp driver variable renaming
> - update coretemp driver patch to fix a case of ida_free(&ida, -2)
On 8/15/22 22:16, Zhang Rui wrote:
> Suggested-and-reviewed-by: Len Brown <[email protected]>
> Signed-off-by: Zhang Rui <[email protected]>
"Please do not use combined tags, e.g. Reported-and-tested-by, as they
just complicate automated extraction of tags."
https://www.kernel.org/doc/html/latest/process/maintainer-tip.html
Hi, Dave,
On Wed, 2022-09-21 at 09:26 -0700, Dave Hansen wrote:
> On 8/15/22 22:16, Zhang Rui wrote:
> > Suggested-and-reviewed-by: Len Brown <[email protected]>
> > Signed-off-by: Zhang Rui <[email protected]>
>
> "Please do not use combined tags, e.g. Reported-and-tested-by, as
> they
> just complicate automated extraction of tags."
>
> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html
Thanks for pointing this out.
Problem is addressed in V3 series.
thanks,
rui