2023-12-12 22:23:54

by Ricardo Neri

[permalink] [raw]
Subject: [PATCH v4 0/4] x86/cacheinfo: Set the number of leaves per CPU

Hi,

The interface /sys/devices/system/cpu/cpuX/cache is broken (not populated)
if CPUs have different numbers of subleaves in CPUID 4. This is the case
of Intel Meteor Lake.

This is v4 of a patchset to fix the cache sysfs interface by setting the
number of cache leaves independently for each CPU.

v1, v2, and v3 can be found here[1], here[2], and here[3].

All the tests described in [4] passed.

Changes since v3:
* Fixed another NULL-pointer dereference when checking the validity of
the last-level cache info.
* Added the Reviewed-by tags from Radu and Sudeep. Thanks!
* Rebased on v6.7-rc5.

Changes since v2:
* This version uncovered a NULL-pointer dereference in recent changes to
cacheinfo[5]. This dereference is observed when the system does not
configure cacheinfo early during boot nor makes corrections later
during CPU hotplug; as is the case in x86. Patch 1 fixes this issue.

Changes since v1:
* Dave Hansen suggested to use the existing per-CPU ci_cpu_cacheinfo
variable. Now the global variable num_cache_leaves became useless.
* While here, I noticed that init_cache_level() also became useless:
x86 does not need ci_cpu_cacheinfo::num_levels.

Thanks and BR,
Ricardo

[1]. https://lore.kernel.org/lkml/[email protected]/
[2]. https://lore.kernel.org/all/[email protected]/
[3]. https://lore.kernel.org/lkml/[email protected]/
[4]. https://lore.kernel.org/lkml/[email protected]/
[5]. https://lore.kernel.org/all/[email protected]/

Ricardo Neri (4):
cacheinfo: Check for null last-level cache info
cacheinfo: Allocate memory for memory if not done from the primary CPU
x86/cacheinfo: Delete global num_cache_leaves
x86/cacheinfo: Clean out init_cache_level()

arch/x86/kernel/cpu/cacheinfo.c | 49 +++++++++++++++++----------------
drivers/base/cacheinfo.c | 9 +++++-
2 files changed, 34 insertions(+), 24 deletions(-)

--
2.25.1


2023-12-12 22:23:58

by Ricardo Neri

[permalink] [raw]
Subject: [PATCH v4 1/4] cacheinfo: Check for null last-level cache info

Before determining the validity of the last-level cache info, ensure that
it has been allocated. Simply checking for non-zero cache_leaves() is not
sufficient, as some architectures (e.g., Intel processors) have non-zero
cache_leaves() before allocation.

Dereferencing NULL cacheinfo can occur in update_per_cpu_data_slice_size().
This function iterates over all online CPUs. However, a CPU may have come
online recently, but its cacheinfo may not have been allocated yet.

Cc: Andreas Herrmann <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chen Yu <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Radu Rendec <[email protected]>
Cc: Pierre Gondois <[email protected]>
Cc: Pu Wen <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Sudeep Holla <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Zhang Rui <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Ricardo Neri <[email protected]>
---
Changes since v3:
* Introduced this patch.

Changes since v2:
* N/A

Changes since v1:
* N/A
---

The dereference of a NULL cacheinfo is not observed today because
cache_leaves(cpu) is zero until after init_cache_level() is called
(during the CPU hotplug callback). A subsequent changeset will set
the number of cache leaves earlier and the NULL-pointer dereference
will be observed.
---
drivers/base/cacheinfo.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index f1e79263fe61..967c5cf3fb1d 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -61,6 +61,9 @@ bool last_level_cache_is_valid(unsigned int cpu)
if (!cache_leaves(cpu))
return false;

+ if (!per_cpu_cacheinfo(cpu))
+ return false;
+
llc = per_cpu_cacheinfo_idx(cpu, cache_leaves(cpu) - 1);

return (llc->attributes & CACHE_ID) || !!llc->fw_token;
--
2.25.1

2023-12-12 22:24:01

by Ricardo Neri

[permalink] [raw]
Subject: [PATCH v4 2/4] cacheinfo: Allocate memory for memory if not done from the primary CPU

Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU")
adds functionality that architectures can use to optionally allocate and
build cacheinfo early during boot. Commit 6539cffa9495 ("cacheinfo: Add
arch specific early level initializer") lets secondary CPUs correct (and
reallocate memory) cacheinfo data if needed.

If the early build functionality is not used and cacheinfo does not need
correction, memory for cacheinfo is never allocated. x86 does not use the
early build functionality. Consequently, during the cacheinfo CPU hotplug
callback, last_level_cache_is_valid() attempts to dereference a NULL
pointer:

BUG: kernel NULL pointer dereference, address: 0000000000000100
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEPMT SMP NOPTI
CPU: 0 PID 19 Comm: cpuhp/0 Not tainted 6.4.0-rc2 #1
RIP: 0010: last_level_cache_is_valid+0x95/0xe0a

Allocate memory for cacheinfo during the cacheinfo CPU hotplug callback if
not done earlier.

Cc: Andreas Herrmann <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chen Yu <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Radu Rendec <[email protected]>
Cc: Pierre Gondois <[email protected]>
Cc: Pu Wen <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Sudeep Holla <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Zhang Rui <[email protected]>
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Radu Rendec <[email protected]>
Reviewed-by: Sudeep Holla <[email protected]>
Fixes: 6539cffa9495 ("cacheinfo: Add arch specific early level initializer")
Signed-off-by: Ricardo Neri <[email protected]>
---
The motivation for commit 5944ce092b97 was to prevent a BUG splat in
PREEMPT_RT kernels during memory allocation. This splat is not observed on
x86 because the memory allocation for cacheinfo happens in
detect_cache_attributes() from the cacheinfo CPU hotplug callback.

The dereference of a NULL pointer is not observed today because
cache_leaves(cpu) is zero until after init_cache_level() is called (also
during the CPU hotplug callback). A subsequent changeset will set the
number of cache leaves earlier and the NULL-pointer dereference will be
observed.
---
Changes since v3:
* Added Reviewed-by tag from Radu and Sudeep. Thanks!

Changes since v2:
* Introduced this patch.

Changes since v1:
* N/A
---
drivers/base/cacheinfo.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 967c5cf3fb1d..735ccead190e 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -557,7 +557,11 @@ static inline int init_level_allocate_ci(unsigned int cpu)
*/
ci_cacheinfo(cpu)->early_ci_levels = false;

- if (cache_leaves(cpu) <= early_leaves)
+ /*
+ * Some architectures (e.g., x86) do not use early initialization.
+ * Allocate memory now in such case.
+ */
+ if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu))
return 0;

kfree(per_cpu_cacheinfo(cpu));
--
2.25.1

2023-12-12 22:24:05

by Ricardo Neri

[permalink] [raw]
Subject: [PATCH v4 4/4] x86/cacheinfo: Clean out init_cache_level()

init_cache_level() no longer has a purpose on x86. It no longer needs to
set num_leaves, and it never had to set num_levels, which was unnecessary
on x86.

Replace it with "return 0" simply to override the weak function, which
would return an error.

Cc: Andreas Herrmann <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chen Yu <[email protected]>
CC: Huang Ying <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Radu Rendec <[email protected]>
Cc: Pierre Gondois <[email protected]>
Cc: Pu Wen <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Sudeep Holla <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Zhang Rui <[email protected]>
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Len Brown <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
Changes since v3:
* Rebased on v6.7-rc5.

Changes since v2:
* None

Changes since v1:
* Introduced this patch.
---
arch/x86/kernel/cpu/cacheinfo.c | 5 -----
1 file changed, 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 4125e53a5ef7..1cfe0921ac67 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -1002,11 +1002,6 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,

int init_cache_level(unsigned int cpu)
{
- struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
-
- if (!this_cpu_ci)
- return -EINVAL;
- this_cpu_ci->num_levels = 3;
return 0;
}

--
2.25.1

2023-12-12 22:24:10

by Ricardo Neri

[permalink] [raw]
Subject: [PATCH v4 3/4] x86/cacheinfo: Delete global num_cache_leaves

Linux remembers cpu_cachinfo::num_leaves per CPU, but x86 initializes all
CPUs from the same global "num_cache_leaves".

This is erroneous on systems such as Meteor Lake, where each CPU has a
distinct num_leaves value. Delete the global "num_cache_leaves" and
initialize num_leaves on each CPU.

Cc: Andreas Herrmann <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chen Yu <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Radu Rendec <[email protected]>
Cc: Pierre Gondois <[email protected]>
Cc: Pu Wen <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Sudeep Holla <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Zhang Rui <[email protected]>
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Len Brown <[email protected]>
Signed-off-by: Ricardo Neri <[email protected]>
---
After this change, all CPUs will traverse CPUID leaf 0x4 when booted for
the first time. On systems with symmetric cache topologies this is
useless work.

Creating a list of processor models that have asymmetric cache topologies
was considered. The burden of maintaining such list would outweigh the
performance benefit of skipping this extra step.
---
Changes since v3:
* Rebased on v6.7-rc5.

Changes since v2:
* None

Changes since v1:
* Do not make num_cache_leaves a per-CPU variable. Instead, reuse the
existing per-CPU ci_cpu_cacheinfo variable. (Dave Hansen)
---
arch/x86/kernel/cpu/cacheinfo.c | 44 +++++++++++++++++++--------------
1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index c131c412db89..4125e53a5ef7 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -178,7 +178,16 @@ struct _cpuid4_info_regs {
struct amd_northbridge *nb;
};

-static unsigned short num_cache_leaves;
+static inline unsigned int get_num_cache_leaves(unsigned int cpu)
+{
+ return get_cpu_cacheinfo(cpu)->num_leaves;
+}
+
+static inline void
+set_num_cache_leaves(unsigned int nr_leaves, unsigned int cpu)
+{
+ get_cpu_cacheinfo(cpu)->num_leaves = nr_leaves;
+}

/* AMD doesn't have CPUID4. Emulate it here to report the same
information to the user. This makes some assumptions about the machine:
@@ -718,19 +727,21 @@ void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c)
void init_amd_cacheinfo(struct cpuinfo_x86 *c)
{

+ unsigned int cpu = c->cpu_index;
+
if (boot_cpu_has(X86_FEATURE_TOPOEXT)) {
- num_cache_leaves = find_num_cache_leaves(c);
+ set_num_cache_leaves(find_num_cache_leaves(c), cpu);
} else if (c->extended_cpuid_level >= 0x80000006) {
if (cpuid_edx(0x80000006) & 0xf000)
- num_cache_leaves = 4;
+ set_num_cache_leaves(4, cpu);
else
- num_cache_leaves = 3;
+ set_num_cache_leaves(3, cpu);
}
}

void init_hygon_cacheinfo(struct cpuinfo_x86 *c)
{
- num_cache_leaves = find_num_cache_leaves(c);
+ set_num_cache_leaves(find_num_cache_leaves(c), c->cpu_index);
}

void init_intel_cacheinfo(struct cpuinfo_x86 *c)
@@ -742,19 +753,19 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
unsigned int l2_id = 0, l3_id = 0, num_threads_sharing, index_msb;

if (c->cpuid_level > 3) {
- static int is_initialized;
-
- if (is_initialized == 0) {
- /* Init num_cache_leaves from boot CPU */
- num_cache_leaves = find_num_cache_leaves(c);
- is_initialized++;
- }
+ /*
+ * There should be at least one leaf. A non-zero value means
+ * that the number of leaves has been initialized.
+ */
+ if (!get_num_cache_leaves(c->cpu_index))
+ set_num_cache_leaves(find_num_cache_leaves(c),
+ c->cpu_index);

/*
* Whenever possible use cpuid(4), deterministic cache
* parameters cpuid leaf to find the cache details
*/
- for (i = 0; i < num_cache_leaves; i++) {
+ for (i = 0; i < get_num_cache_leaves(c->cpu_index); i++) {
struct _cpuid4_info_regs this_leaf = {};
int retval;

@@ -790,14 +801,14 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c)
* Don't use cpuid2 if cpuid4 is supported. For P4, we use cpuid2 for
* trace cache
*/
- if ((num_cache_leaves == 0 || c->x86 == 15) && c->cpuid_level > 1) {
+ if ((!get_num_cache_leaves(c->cpu_index) || c->x86 == 15) && c->cpuid_level > 1) {
/* supports eax=2 call */
int j, n;
unsigned int regs[4];
unsigned char *dp = (unsigned char *)regs;
int only_trace = 0;

- if (num_cache_leaves != 0 && c->x86 == 15)
+ if (get_num_cache_leaves(c->cpu_index) && c->x86 == 15)
only_trace = 1;

/* Number of times to iterate */
@@ -993,12 +1004,9 @@ int init_cache_level(unsigned int cpu)
{
struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);

- if (!num_cache_leaves)
- return -ENOENT;
if (!this_cpu_ci)
return -EINVAL;
this_cpu_ci->num_levels = 3;
- this_cpu_ci->num_leaves = num_cache_leaves;
return 0;
}

--
2.25.1

2024-01-25 06:00:31

by Ricardo Neri

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] x86/cacheinfo: Set the number of leaves per CPU

On Tue, Dec 12, 2023 at 02:25:15PM -0800, Ricardo Neri wrote:
> Hi,
>
> The interface /sys/devices/system/cpu/cpuX/cache is broken (not populated)
> if CPUs have different numbers of subleaves in CPUID 4. This is the case
> of Intel Meteor Lake.

Hello. Wondering if there is any feedback on this patchset. The interface
/sys/devices/system/cpu/cpuX/cache is still broken on Meteor Lake with
v6.8-rc1.

I verified that this patchset applies cleanly on v6.8-rc1.

Thanks and BR,
Ricardo

2024-01-25 11:17:53

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] cacheinfo: Check for null last-level cache info

On Tue, Dec 12, 2023 at 02:25:16PM -0800, Ricardo Neri wrote:
> Before determining the validity of the last-level cache info, ensure that
> it has been allocated. Simply checking for non-zero cache_leaves() is not
> sufficient, as some architectures (e.g., Intel processors) have non-zero
> cache_leaves() before allocation.
>
> Dereferencing NULL cacheinfo can occur in update_per_cpu_data_slice_size().
> This function iterates over all online CPUs. However, a CPU may have come
> online recently, but its cacheinfo may not have been allocated yet.
>
> Cc: Andreas Herrmann <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Chen Yu <[email protected]>
> Cc: Huang Ying <[email protected]>
> Cc: Len Brown <[email protected]>
> Cc: Radu Rendec <[email protected]>
> Cc: Pierre Gondois <[email protected]>
> Cc: Pu Wen <[email protected]>
> Cc: "Rafael J. Wysocki" <[email protected]>
> Cc: Sudeep Holla <[email protected]>

If you respin, you can address the below minor nit. I am fine as is as
well.

Reviewed-by: Sudeep Holla <[email protected]>

[...]

> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> index f1e79263fe61..967c5cf3fb1d 100644
> --- a/drivers/base/cacheinfo.c
> +++ b/drivers/base/cacheinfo.c
> @@ -61,6 +61,9 @@ bool last_level_cache_is_valid(unsigned int cpu)
> if (!cache_leaves(cpu))
> return false;
>
> + if (!per_cpu_cacheinfo(cpu))
> + return false;
> +

[nit] You can even combine this with above if condition.

--
Regards,
Sudeep

2024-01-25 11:18:25

by Sudeep Holla

[permalink] [raw]
Subject: Re: [PATCH v4 0/4] x86/cacheinfo: Set the number of leaves per CPU

On Wed, Jan 24, 2024 at 06:56:52PM -0800, Ricardo Neri wrote:
> On Tue, Dec 12, 2023 at 02:25:15PM -0800, Ricardo Neri wrote:
> > Hi,
> >
> > The interface /sys/devices/system/cpu/cpuX/cache is broken (not populated)
> > if CPUs have different numbers of subleaves in CPUID 4. This is the case
> > of Intel Meteor Lake.
>
> Hello. Wondering if there is any feedback on this patchset. The interface
> /sys/devices/system/cpu/cpuX/cache is still broken on Meteor Lake with
> v6.8-rc1.
>
> I verified that this patchset applies cleanly on v6.8-rc1.
>

Sorry I though I had acked it but now I see 1/4 is new in v4. I have
responded on the original patch.

--
Regards,
Sudeep

2024-01-25 18:11:22

by Ricardo Neri

[permalink] [raw]
Subject: Re: [PATCH v4 1/4] cacheinfo: Check for null last-level cache info

On Thu, Jan 25, 2024 at 11:15:44AM +0000, Sudeep Holla wrote:
> On Tue, Dec 12, 2023 at 02:25:16PM -0800, Ricardo Neri wrote:
> > Before determining the validity of the last-level cache info, ensure that
> > it has been allocated. Simply checking for non-zero cache_leaves() is not
> > sufficient, as some architectures (e.g., Intel processors) have non-zero
> > cache_leaves() before allocation.
> >
> > Dereferencing NULL cacheinfo can occur in update_per_cpu_data_slice_size().
> > This function iterates over all online CPUs. However, a CPU may have come
> > online recently, but its cacheinfo may not have been allocated yet.
> >
> > Cc: Andreas Herrmann <[email protected]>
> > Cc: Catalin Marinas <[email protected]>
> > Cc: Chen Yu <[email protected]>
> > Cc: Huang Ying <[email protected]>
> > Cc: Len Brown <[email protected]>
> > Cc: Radu Rendec <[email protected]>
> > Cc: Pierre Gondois <[email protected]>
> > Cc: Pu Wen <[email protected]>
> > Cc: "Rafael J. Wysocki" <[email protected]>
> > Cc: Sudeep Holla <[email protected]>
>
> If you respin, you can address the below minor nit. I am fine as is as
> well.
>
> Reviewed-by: Sudeep Holla <[email protected]>

Thank you for your review Sudeep!

>
> [...]
>
> > diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> > index f1e79263fe61..967c5cf3fb1d 100644
> > --- a/drivers/base/cacheinfo.c
> > +++ b/drivers/base/cacheinfo.c
> > @@ -61,6 +61,9 @@ bool last_level_cache_is_valid(unsigned int cpu)
> > if (!cache_leaves(cpu))
> > return false;
> >
> > + if (!per_cpu_cacheinfo(cpu))
> > + return false;
> > +
>
> [nit] You can even combine this with above if condition.

Sure, I can take care of this if a new version is needed as per feedback
from the x86 maintainers.