2023-04-24 00:19:25

by Ricardo Neri

[permalink] [raw]
Subject: [PATCH v2 0/2] x86/cacheinfo: Set the number of leaves per CPU

Hi,

This v2 of now a patchset to set the number of cache leaves independently
for each CPU. v1 can be found here [1].

These are the changes since v2:
* Dave Hansen, suggested to use the existing per-CPU ci_cpu_cacheinfo
variable. Now the global variable num_cache_leaves became useless.
* While here, I noticed that init_cache_level() also became useless:
x86 does not need ci_cpu_cacheinfo::num_levels.

These patches apply cleanly on top of the master branch of the tip tree.

Thanks and BR,
Ricardo


[1]. https://lore.kernel.org/lkml/[email protected]/


Ricardo Neri (2):
x86/cacheinfo: Delete global num_cache_leaves
x86/cacheinfo: Clean out init_cache_level()

arch/x86/kernel/cpu/cacheinfo.c | 50 ++++++++++++++++-----------------
1 file changed, 25 insertions(+), 25 deletions(-)

--
2.25.1


2023-04-24 00:35:45

by Ricardo Neri

[permalink] [raw]
Subject: [PATCH v2 2/2] x86/cacheinfo: Clean out init_cache_level()

init_cache_level() no longer has a purpose on x86. It no longer needs to
set num_leaves, and it never had to set num_levels, which was unnecessary
on x86.

Replace it with "return 0" simply to override the weak function, which
would return an error.

Cc: Andreas Herrmann <[email protected]>
Cc: Chen Yu <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Pu Wen <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Zhang Rui <[email protected]>
Reviewed-by: Len Brown <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
Changes since v1:
* Introduced this patch.
---
arch/x86/kernel/cpu/cacheinfo.c | 5 -----
1 file changed, 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 45c4e9daf3f1..454247e459b1 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -1006,11 +1006,6 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,

int init_cache_level(unsigned int cpu)
{
- struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
-
- if (!this_cpu_ci)
- return -EINVAL;
- this_cpu_ci->num_levels = 3;
return 0;
}

--
2.25.1

2023-04-24 01:52:43

by Ricardo Neri

[permalink] [raw]
Subject: [PATCH v2 1/2] x86/cacheinfo: Delete global num_cache_leaves

Linux remembers cpu_cachinfo::num_leaves per CPU, but x86 initializes all
CPUs from the same global "num_cache_leaves".

This is erroneous on systems like Meteor Lake, which has different
num_leaves per CPU. Delete the global "num_cache_leaves" and initialize
num_leaves accurately on each CPU.

Cc: Andreas Herrmann <[email protected]>
Cc: Chen Yu <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Pu Wen <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Zhang Rui <[email protected]>
Cc: [email protected]
Reviewed-by: Len Brown <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
After this change, all CPUs will traverse CPUID leaf 0x4 when booted for
the first time. On systems with asymmetric cache topologies this is
useless work.

Creating a list of processor models that have asymmetric cache topologies
was considered. The burden of maintaining such list would outweigh the
performance benefit of skipping this extra step.
---
Changes since v1:
* Do not make num_cache_leaves a per-CPU variable. Instead, reuse the
existing per-CPU ci_cpu_cacheinfo variable. (Dave Hansen)
---
arch/x86/kernel/cpu/cacheinfo.c | 45 ++++++++++++++++++---------------
1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 4063e8991211..45c4e9daf3f1 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -176,7 +176,16 @@ struct _cpuid4_info_regs {
struct amd_northbridge *nb;
};

-static unsigned short num_cache_leaves;
+static inline unsigned int get_num_cache_leaves(unsigned int cpu)
+{
+ return get_cpu_cacheinfo(cpu)->num_leaves;
+}
+
+static inline void
+set_num_cache_leaves(unsigned int nr_leaves, unsigned int cpu)
+{
+ get_cpu_cacheinfo(cpu)->num_leaves = nr_leaves;
+}

/* AMD doesn't have CPUID4. Emulate it here to report the same
information to the user. This makes some assumptions about the machine:
@@ -716,19 +725,21 @@ void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu)
void init_amd_cacheinfo(struct cpuinfo_x86 *c)
{

+ unsigned int cpu = c->cpu_index;
+
if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
- num_cache_leaves = find_num_cache_leaves(c);
+ set_num_cache_leaves(find_num_cache_leaves(c), cpu);
} else if (c->extended_cpuid_level >= 0x80000006) {
if (cpuid_edx(0x80000006) & 0xf000)
- num_cache_leaves = 4;
+ set_num_cache_leaves(4, cpu);
else
- num_cache_leaves = 3;
+ set_num_cache_leaves(3, cpu);
}
}

void init_hygon_cacheinfo(struct cpuinfo_x86 *c)
{
- num_cache_leaves = find_num_cache_leaves(c);
+ set_num_cache_leaves(find_num_cache_leaves(c), c->cpu_index);
}

void init_intel_cacheinfo(struct cpuinfo_x86 *c)
@@ -738,24 +749,21 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
unsigned int new_l1d = 0, new_l1i = 0; /* Cache sizes from cpuid(4) */
unsigned int new_l2 = 0, new_l3 = 0, i; /* Cache sizes from cpuid(4) */
unsigned int l2_id = 0, l3_id = 0, num_threads_sharing, index_msb;
-#ifdef CONFIG_SMP
unsigned int cpu = c->cpu_index;
-#endif

if (c->cpuid_level > 3) {
- static int is_initialized;
-
- if (is_initialized == 0) {
- /* Init num_cache_leaves from boot CPU */
- num_cache_leaves = find_num_cache_leaves(c);
- is_initialized++;
- }
+ /*
+ * There should be at least one leaf. A non-zero value means
+ * that the number of leaves has been initialized.
+ */
+ if (!get_num_cache_leaves(cpu))
+ set_num_cache_leaves(find_num_cache_leaves(c), cpu);

/*
* Whenever possible use cpuid(4), deterministic cache
* parameters cpuid leaf to find the cache details
*/
- for (i = 0; i < num_cache_leaves; i++) {
+ for (i = 0; i < get_num_cache_leaves(cpu); i++) {
struct _cpuid4_info_regs this_leaf = {};
int retval;

@@ -791,14 +799,14 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
* Don't use cpuid2 if cpuid4 is supported. For P4, we use cpuid2 for
* trace cache
*/
- if ((num_cache_leaves == 0 || c->x86 == 15) && c->cpuid_level > 1) {
+ if ((!get_num_cache_leaves(cpu) || c->x86 == 15) && c->cpuid_level > 1) {
/* supports eax=2 call */
int j, n;
unsigned int regs[4];
unsigned char *dp = (unsigned char *)regs;
int only_trace = 0;

- if (num_cache_leaves != 0 && c->x86 == 15)
+ if (get_num_cache_leaves(cpu) && c->x86 == 15)
only_trace = 1;

/* Number of times to iterate */
@@ -1000,12 +1008,9 @@ int init_cache_level(unsigned int cpu)
{
struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);

- if (!num_cache_leaves)
- return -ENOENT;
if (!this_cpu_ci)
return -EINVAL;
this_cpu_ci->num_levels = 3;
- this_cpu_ci->num_leaves = num_cache_leaves;
return 0;
}

--
2.25.1

2023-05-10 20:01:06

by Ricardo Neri

[permalink] [raw]
Subject: Re: [PATCH v2 0/2] x86/cacheinfo: Set the number of leaves per CPU

On Sun, Apr 23, 2023 at 05:19:54PM -0700, Ricardo Neri wrote:
> Hi,
>
> This v2 of now a patchset to set the number of cache leaves independently
> for each CPU. v1 can be found here [1].
>
> These are the changes since v2:
> * Dave Hansen, suggested to use the existing per-CPU ci_cpu_cacheinfo
> variable. Now the global variable num_cache_leaves became useless.
> * While here, I noticed that init_cache_level() also became useless:
> x86 does not need ci_cpu_cacheinfo::num_levels.
>
> These patches apply cleanly on top of the master branch of the tip tree.

FYI, I see a NULL pointer dereference when I apply this patchset on top of
v6.4-rc1. I started a discussion here[1].

[1]. https://lore.kernel.org/all/[email protected]/