Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU")
tries to build the cacheinfo from the primary CPU before the secondary
CPUs boot, if the DT/ACPI description contains cache information.
However, if such information is not present, it still reverts to the old
behavior, which allocates the cacheinfo memory on each secondary CPU. On
RT kernels, this triggers a "BUG: sleeping function called from invalid
context" because the allocation is done before preemption is first
enabled on the secondary CPU.
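To illustrate (a minimal sketch of the situation only, not the actual call
chain in cacheinfo.c): on an RT kernel, kmalloc(GFP_KERNEL) may sleep, so
calling it while preemption is disabled is exactly the pattern that splats.

#include <linux/preempt.h>
#include <linux/slab.h>

static void splat_example(void)
{
        void *p;

        preempt_disable();
        /*
         * Allocating with GFP_KERNEL here may sleep, which is not allowed
         * with preemption disabled and triggers "BUG: sleeping function
         * called from invalid context" on PREEMPT_RT.
         */
        p = kmalloc(64, GFP_KERNEL);
        preempt_enable();

        kfree(p);
}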
The solution is to add cache information to DT/ACPI, but at least on
arm64 systems this can be avoided by leveraging automatic detection
(through the CLIDR_EL1 register), which is already implemented but
currently doesn't work on RT kernels for the reason described above.
This patch series attempts to enable automatic detection for RT kernels
when no DT/ACPI cache information is available, by pre-allocating
cacheinfo memory on the primary CPU.
The first patch adds an architecture independent infrastructure that
allows architecture specific code to take an early guess at the number
of cache leaves of the secondary CPUs, while it runs in preemptible
context on the primary CPU. At the same time, it gives architecture
specific code the opportunity to go back later, while it runs on the
secondary CPU, and reallocate the cacheinfo memory if the initial guess
proves to be wrong.
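For clarity, here is a minimal userspace model of that "guess early, fix up
later" flow (the names only loosely mirror the kernel functions; the real
implementation is in the first patch):

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct cpu_ci {
        unsigned int num_leaves;
        bool early_ci_levels;   /* leaves came from the early guess */
        void *leaves;           /* stands in for the per-leaf array */
};

/* "Primary CPU": take an early guess and allocate while sleeping is fine. */
static void fetch_cache_info_model(struct cpu_ci *ci, unsigned int guess)
{
        ci->num_leaves = guess;
        ci->early_ci_levels = true;
        ci->leaves = calloc(guess, 64);
}

/* "Secondary CPU": re-detect; reallocate only if the guess was too small. */
static void fixup_model(struct cpu_ci *ci, unsigned int actual)
{
        ci->early_ci_levels = false;
        if (actual <= ci->num_leaves)
                return;         /* the early allocation is big enough */
        free(ci->leaves);       /* asymmetric case: reallocation is needed,
                                   which is still a problem on RT kernels */
        ci->leaves = calloc(actual, 64);
        ci->num_leaves = actual;
}

int main(void)
{
        struct cpu_ci ci = { 0 };

        fetch_cache_info_model(&ci, 3); /* primary CPU guesses 3 leaves */
        fixup_model(&ci, 3);            /* symmetric secondary: no realloc */
        printf("leaves=%u early=%d\n", ci.num_leaves, ci.early_ci_levels);
        free(ci.leaves);
        return 0;
}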
The second patch leverages the infrastructure implemented in the first
patch and enables early cache depth detection for arm64.
The third patch addresses a specific issue on ACPI systems with no PPTT.
This issue came up during review/testing of v3.
The patch series is based on an RFC patch that was posted to the
linux-arm-kernel mailing list and discussed with a smaller audience:
https://lore.kernel.org/all/[email protected]/
Changes to v3:
* Rebase on top of v6.3-rc6 to avoid a (trivial) merge conflict.
* Add patch #3 (brief description included above).
* Add "Reviewed-by: Pierre Gondois" tag to all patches.
* Rename the new field that is added to struct cpu_cacheinfo from
early_arch_info to early_ci_levels to better reflect what it does.
* Use local variables in the new detect_cache_level() function. That way
the code is easier to read and the original level/leaves algorithm is
unchanged, which also makes the patch clearer.
Changes to v2:
* Address minor coding style issue (unbalanced braces).
* Move cacheinfo reallocation logic from detect_cache_attributes() to a
new function to improve code readability.
* Minor fix to cacheinfo reallocation logic to avoid a new detection of
the cache level if/when detect_cache_attributes() is called again.
Radu Rendec (3):
cacheinfo: Add arch specific early level initializer
cacheinfo: Add arm64 early level initializer implementation
cacheinfo: Allow early level detection when DT/ACPI info is
missing/broken
arch/arm64/kernel/cacheinfo.c | 25 ++++++++++--
drivers/base/arch_topology.c | 4 +-
drivers/base/cacheinfo.c | 75 +++++++++++++++++++++++++----------
include/linux/cacheinfo.h | 2 +
4 files changed, 79 insertions(+), 27 deletions(-)
--
2.39.2
This patch gives architecture specific code the ability to initialize
the cache level and allocate cacheinfo memory early, when cache level
initialization runs on the primary CPU for all possible CPUs.
This is part of a patch series that attempts to further the work in
commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU").
Previously, in the absence of any DT/ACPI cache info, architecture
specific cache detection and info allocation for secondary CPUs would
happen in non-preemptible context during early CPU initialization and
trigger a "BUG: sleeping function called from invalid context" splat on
an RT kernel.
More specifically, this patch adds the early_cache_level() function,
which is called by fetch_cache_info() as a fallback when the number of
cache leaves cannot be extracted from DT/ACPI. In the default generic
(weak) implementation, this new function returns -ENOENT, which
preserves the original behavior for architectures that do not implement
the function.
Since early detection can get the number of cache leaves wrong in some
cases*, additional logic is added to still call init_cache_level() later
on the secondary CPU, therefore giving the architecture specific code an
opportunity to go back and fix the initial guess. Again, the original
behavior is preserved for architectures that do not implement the new
function.
* For example, on arm64, CLIDR_EL1 detection works only when it runs on
the current CPU. In other words, a CPU cannot detect the cache depth
for any other CPU than itself.
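For reference, this is how CLIDR_EL1 encodes the hierarchy. The snippet
below is a standalone userspace model that decodes a hard-coded sample
value; the kernel reads the real register with read_sysreg(clidr_el1),
which is only meaningful on the CPU being described. Per the Arm ARM,
Ctype<n> occupies bits [3n-1:3n-3] for level n = 1..7.

#include <stdint.h>
#include <stdio.h>

enum {
        CTYPE_NONE = 0, CTYPE_INST = 1, CTYPE_DATA = 2,
        CTYPE_SEPARATE = 3, CTYPE_UNIFIED = 4,
};

static unsigned int clidr_ctype(uint64_t clidr, unsigned int level)
{
        return (clidr >> (3 * (level - 1))) & 0x7;
}

int main(void)
{
        /* Sample value: split L1 (Ctype1 = 0b011), unified L2 (Ctype2 = 0b100) */
        uint64_t clidr = 0x3 | (0x4 << 3);
        unsigned int level, leaves = 0;

        for (level = 1; level <= 7; level++) {
                unsigned int ctype = clidr_ctype(clidr, level);

                if (ctype == CTYPE_NONE)
                        break;
                /* a split level contributes two leaves, I-cache and D-cache */
                leaves += (ctype == CTYPE_SEPARATE) ? 2 : 1;
        }
        printf("levels=%u leaves=%u\n", level - 1, leaves); /* levels=2 leaves=3 */
        return 0;
}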
Signed-off-by: Radu Rendec <[email protected]>
Reviewed-by: Pierre Gondois <[email protected]>
---
drivers/base/cacheinfo.c | 75 +++++++++++++++++++++++++++------------
include/linux/cacheinfo.h | 2 ++
2 files changed, 55 insertions(+), 22 deletions(-)
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index f3903d002819..d783896c8a1f 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -398,6 +398,11 @@ static void free_cache_attributes(unsigned int cpu)
cache_shared_cpu_map_remove(cpu);
}
+int __weak early_cache_level(unsigned int cpu)
+{
+ return -ENOENT;
+}
+
int __weak init_cache_level(unsigned int cpu)
{
return -ENOENT;
@@ -423,56 +428,82 @@ int allocate_cache_info(int cpu)
int fetch_cache_info(unsigned int cpu)
{
- struct cpu_cacheinfo *this_cpu_ci;
+ struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
unsigned int levels = 0, split_levels = 0;
int ret;
if (acpi_disabled) {
ret = init_of_cache_level(cpu);
- if (ret < 0)
- return ret;
} else {
ret = acpi_get_cache_info(cpu, &levels, &split_levels);
- if (ret < 0)
+ if (!ret) {
+ this_cpu_ci->num_levels = levels;
+ /*
+ * This assumes that:
+ * - there cannot be any split caches (data/instruction)
+ * above a unified cache
+ * - data/instruction caches come by pair
+ */
+ this_cpu_ci->num_leaves = levels + split_levels;
+ }
+ }
+
+ if (ret || !cache_leaves(cpu)) {
+ ret = early_cache_level(cpu);
+ if (ret)
return ret;
- this_cpu_ci = get_cpu_cacheinfo(cpu);
- this_cpu_ci->num_levels = levels;
- /*
- * This assumes that:
- * - there cannot be any split caches (data/instruction)
- * above a unified cache
- * - data/instruction caches come by pair
- */
- this_cpu_ci->num_leaves = levels + split_levels;
+ if (!cache_leaves(cpu))
+ return -ENOENT;
+
+ this_cpu_ci->early_ci_levels = true;
}
- if (!cache_leaves(cpu))
- return -ENOENT;
return allocate_cache_info(cpu);
}
-int detect_cache_attributes(unsigned int cpu)
+static inline int init_level_allocate_ci(unsigned int cpu)
{
- int ret;
+ unsigned int early_leaves = cache_leaves(cpu);
/* Since early initialization/allocation of the cacheinfo is allowed
* via fetch_cache_info() and this also gets called as CPU hotplug
* callbacks via cacheinfo_cpu_online, the init/alloc can be skipped
* as it will happen only once (the cacheinfo memory is never freed).
- * Just populate the cacheinfo.
+ * Just populate the cacheinfo. However, if the cacheinfo has been
+ * allocated early through the arch-specific early_cache_level() call,
+ * there is a chance the info is wrong (this can happen on arm64). In
+ * that case, call init_cache_level() anyway to give the arch-specific
+ * code a chance to make things right.
*/
- if (per_cpu_cacheinfo(cpu))
- goto populate_leaves;
+ if (per_cpu_cacheinfo(cpu) && !ci_cacheinfo(cpu)->early_ci_levels)
+ return 0;
if (init_cache_level(cpu) || !cache_leaves(cpu))
return -ENOENT;
- ret = allocate_cache_info(cpu);
+ /*
+ * Now that we have properly initialized the cache level info, make
+ * sure we don't try to do that again the next time we are called
+ * (e.g. as CPU hotplug callbacks).
+ */
+ ci_cacheinfo(cpu)->early_ci_levels = false;
+
+ if (cache_leaves(cpu) <= early_leaves)
+ return 0;
+
+ kfree(per_cpu_cacheinfo(cpu));
+ return allocate_cache_info(cpu);
+}
+
+int detect_cache_attributes(unsigned int cpu)
+{
+ int ret;
+
+ ret = init_level_allocate_ci(cpu);
if (ret)
return ret;
-populate_leaves:
/*
* If LLC is valid the cache leaves were already populated so just go to
* update the cpu map.
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 908e19d17f49..6147b2672555 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -76,9 +76,11 @@ struct cpu_cacheinfo {
unsigned int num_levels;
unsigned int num_leaves;
bool cpu_map_populated;
+ bool early_ci_levels;
};
struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
+int early_cache_level(unsigned int cpu);
int init_cache_level(unsigned int cpu);
int init_of_cache_level(unsigned int cpu);
int populate_cache_leaves(unsigned int cpu);
--
2.39.2
This patch adds an architecture specific early cache level detection
handler for arm64. This is basically the CLIDR_EL1 based detection that
was previously done (only) in init_cache_level().
This is part of a patch series that attempts to further the work in
commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU").
Previously, in the absence of any DT/ACPI cache info, architecture
specific cache detection and info allocation for secondary CPUs would
happen in non-preemptible context during early CPU initialization and
trigger a "BUG: sleeping function called from invalid context" splat on
an RT kernel.
This patch does not solve the problem completely for RT kernels. It
relies on the assumption that on most systems, the CPUs are symmetrical
and therefore have the same number of cache leaves. The cacheinfo memory
is allocated early (on the primary CPU), relying on the new handler. If
later (when CLIDR_EL1 based detection runs again on the secondary CPU)
the initial assumption proves to be wrong and the CPU has in fact more
leaves, the cacheinfo memory is reallocated, and that still triggers a
splat on an RT kernel.
In other words, asymmetrical CPU systems *must* still provide cacheinfo
data in DT/ACPI to avoid the splat on RT kernels (unless secondary CPUs
happen to have less leaves than the primary CPU). But symmetrical CPU
systems (the majority) can now get away without the additional DT/ACPI
data and rely on CLIDR_EL1 based detection.
Signed-off-by: Radu Rendec <[email protected]>
Reviewed-by: Pierre Gondois <[email protected]>
---
arch/arm64/kernel/cacheinfo.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
index c307f69e9b55..d9c9218fa1fd 100644
--- a/arch/arm64/kernel/cacheinfo.c
+++ b/arch/arm64/kernel/cacheinfo.c
@@ -38,11 +38,9 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
this_leaf->type = type;
}
-int init_cache_level(unsigned int cpu)
+static void detect_cache_level(unsigned int *level_p, unsigned int *leaves_p)
{
unsigned int ctype, level, leaves;
- int fw_level, ret;
- struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
for (level = 1, leaves = 0; level <= MAX_CACHE_LEVEL; level++) {
ctype = get_cache_type(level);
@@ -54,6 +52,27 @@ int init_cache_level(unsigned int cpu)
leaves += (ctype == CACHE_TYPE_SEPARATE) ? 2 : 1;
}
+ *level_p = level;
+ *leaves_p = leaves;
+}
+
+int early_cache_level(unsigned int cpu)
+{
+ struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
+
+ detect_cache_level(&this_cpu_ci->num_levels, &this_cpu_ci->num_leaves);
+
+ return 0;
+}
+
+int init_cache_level(unsigned int cpu)
+{
+ unsigned int level, leaves;
+ int fw_level, ret;
+ struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
+
+ detect_cache_level(&level, &leaves);
+
if (acpi_disabled) {
fw_level = of_find_last_cache_level(cpu);
} else {
--
2.39.2
Recent work enables cacheinfo memory for secondary CPUs to be allocated
early, while still running on the primary CPU. That allows cacheinfo
memory to be allocated safely on RT kernels. To make that work, the
number of cache levels/leaves must be defined in the device tree or ACPI
tables. Further work adds a path for early detection of the number of
cache levels/leaves, which makes it possible to allocate the cacheinfo
memory early without requiring extra DT/ACPI information.
This patch addresses a specific issue on ACPI systems with no PPTT. In
that case, parse_acpi_topology() returns an error code, which in turn
makes init_cpu_topology() return early, before fetch_cache_info() is
called. As a result, the early cache level detection doesn't run.
The solution is to simply remove the "return" statement and let the code
flow fall through to calling fetch_cache_info().
Signed-off-by: Radu Rendec <[email protected]>
Reported-by: Pierre Gondois <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Reviewed-by: Pierre Gondois <[email protected]>
---
drivers/base/arch_topology.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index b1c1dd38ab01..147fb7d4af96 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -835,10 +835,10 @@ void __init init_cpu_topology(void)
if (ret) {
/*
* Discard anything that was parsed if we hit an error so we
- * don't use partial information.
+ * don't use partial information. But do not return yet to give
+ * arch-specific early cache level detection a chance to run.
*/
reset_cpu_topology();
- return;
}
for_each_possible_cpu(cpu) {
--
2.39.2
Hi Will,
On Wed, Apr 12, 2023 at 02:57:58PM -0400, Radu Rendec wrote:
> This patch adds an architecture specific early cache level detection
> handler for arm64. This is basically the CLIDR_EL1 based detection that
> was previously done (only) in init_cache_level().
>
> This is part of a patch series that attempts to further the work in
> commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU").
> Previously, in the absence of any DT/ACPI cache info, architecture
> specific cache detection and info allocation for secondary CPUs would
> happen in non-preemptible context during early CPU initialization and
> trigger a "BUG: sleeping function called from invalid context" splat on
> an RT kernel.
>
> This patch does not solve the problem completely for RT kernels. It
> relies on the assumption that on most systems, the CPUs are symmetrical
> and therefore have the same number of cache leaves. The cacheinfo memory
> is allocated early (on the primary CPU), relying on the new handler. If
> later (when CLIDR_EL1 based detection runs again on the secondary CPU)
> the initial assumption proves to be wrong and the CPU has in fact more
> leaves, the cacheinfo memory is reallocated, and that still triggers a
> splat on an RT kernel.
>
> In other words, asymmetrical CPU systems *must* still provide cacheinfo
> data in DT/ACPI to avoid the splat on RT kernels (unless secondary CPUs
> happen to have less leaves than the primary CPU). But symmetrical CPU
> systems (the majority) can now get away without the additional DT/ACPI
> data and rely on CLIDR_EL1 based detection.
>
If you are okay with the change, can I have your Acked-by, so that I can
route this via Greg's tree ?
--
Regards,
Sudeep
On Thu, Apr 13, 2023 at 11:22:26AM +0100, Sudeep Holla wrote:
> Hi Will,
>
> On Wed, Apr 12, 2023 at 02:57:58PM -0400, Radu Rendec wrote:
> > This patch adds an architecture specific early cache level detection
> > handler for arm64. This is basically the CLIDR_EL1 based detection that
> > was previously done (only) in init_cache_level().
> >
> > This is part of a patch series that attempts to further the work in
> > commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU").
> > Previously, in the absence of any DT/ACPI cache info, architecture
> > specific cache detection and info allocation for secondary CPUs would
> > happen in non-preemptible context during early CPU initialization and
> > trigger a "BUG: sleeping function called from invalid context" splat on
> > an RT kernel.
> >
> > This patch does not solve the problem completely for RT kernels. It
> > relies on the assumption that on most systems, the CPUs are symmetrical
> > and therefore have the same number of cache leaves. The cacheinfo memory
> > is allocated early (on the primary CPU), relying on the new handler. If
> > later (when CLIDR_EL1 based detection runs again on the secondary CPU)
> > the initial assumption proves to be wrong and the CPU has in fact more
> > leaves, the cacheinfo memory is reallocated, and that still triggers a
> > splat on an RT kernel.
> >
> > In other words, asymmetrical CPU systems *must* still provide cacheinfo
> > data in DT/ACPI to avoid the splat on RT kernels (unless secondary CPUs
> > happen to have less leaves than the primary CPU). But symmetrical CPU
> > systems (the majority) can now get away without the additional DT/ACPI
> > data and rely on CLIDR_EL1 based detection.
> >
>
> If you are okay with the change, can I have your Acked-by, so that I can
> route this via Greg's tree ?
I really dislike the proliferation of __weak functions in this file
rather than the usual approach of having arch-specific static inlines in
a header file, but it seems that nobody has the appetite to clean that up :(
So I'm fine for Greg to queue this if he wants to, but I'd be a lot more
excited if somebody tidied things up a bit first.
Will
On Thu, Apr 13, 2023 at 03:45:22PM +0100, Will Deacon wrote:
> On Thu, Apr 13, 2023 at 11:22:26AM +0100, Sudeep Holla wrote:
> > Hi Will,
> >
> > On Wed, Apr 12, 2023 at 02:57:58PM -0400, Radu Rendec wrote:
> > > This patch adds an architecture specific early cache level detection
> > > handler for arm64. This is basically the CLIDR_EL1 based detection that
> > > was previously done (only) in init_cache_level().
> > >
> > > This is part of a patch series that attempts to further the work in
> > > commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU").
> > > Previously, in the absence of any DT/ACPI cache info, architecture
> > > specific cache detection and info allocation for secondary CPUs would
> > > happen in non-preemptible context during early CPU initialization and
> > > trigger a "BUG: sleeping function called from invalid context" splat on
> > > an RT kernel.
> > >
> > > This patch does not solve the problem completely for RT kernels. It
> > > relies on the assumption that on most systems, the CPUs are symmetrical
> > > and therefore have the same number of cache leaves. The cacheinfo memory
> > > is allocated early (on the primary CPU), relying on the new handler. If
> > > later (when CLIDR_EL1 based detection runs again on the secondary CPU)
> > > the initial assumption proves to be wrong and the CPU has in fact more
> > > leaves, the cacheinfo memory is reallocated, and that still triggers a
> > > splat on an RT kernel.
> > >
> > > In other words, asymmetrical CPU systems *must* still provide cacheinfo
> > > data in DT/ACPI to avoid the splat on RT kernels (unless secondary CPUs
> > > happen to have less leaves than the primary CPU). But symmetrical CPU
> > > systems (the majority) can now get away without the additional DT/ACPI
> > > data and rely on CLIDR_EL1 based detection.
> > >
> >
> > If you are okay with the change, can I have your Acked-by, so that I can
> > route this via Greg's tree ?
>
> I really dislike the proliferation of __weak functions in this file,
You mean in the generic cacheinfo.c, right? Because the arm64 version must
not have any, and that is the file in this patch.
> rather than the usual approach of having arch-specific static inlines in
> a header file but it seems that nobody has the appetite to clean that up :(
>
Yes, I will try that when I get some time. I had not seen or touched this
for a long time until recently, when new requirements around it started
coming in.
> So I'm fine for Greg to queue this if he wants to, but I'd be a lot more
> excited if somebody tidied things up a bit first.
>
Agreed. One reason for such weak functions at the time was to avoid
conditional compilation based on the arch, with the aim of including a
couple more archs, but that hasn't happened and perhaps it is time for a
refresh.
--
Regards,
Sudeep
On Thu, Apr 13, 2023 at 04:05:05PM +0100, Sudeep Holla wrote:
> On Thu, Apr 13, 2023 at 03:45:22PM +0100, Will Deacon wrote:
> > On Thu, Apr 13, 2023 at 11:22:26AM +0100, Sudeep Holla wrote:
> > > Hi Will,
> > >
> > > On Wed, Apr 12, 2023 at 02:57:58PM -0400, Radu Rendec wrote:
> > > > This patch adds an architecture specific early cache level detection
> > > > handler for arm64. This is basically the CLIDR_EL1 based detection that
> > > > was previously done (only) in init_cache_level().
> > > >
> > > > This is part of a patch series that attempts to further the work in
> > > > commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU").
> > > > Previously, in the absence of any DT/ACPI cache info, architecture
> > > > specific cache detection and info allocation for secondary CPUs would
> > > > happen in non-preemptible context during early CPU initialization and
> > > > trigger a "BUG: sleeping function called from invalid context" splat on
> > > > an RT kernel.
> > > >
> > > > This patch does not solve the problem completely for RT kernels. It
> > > > relies on the assumption that on most systems, the CPUs are symmetrical
> > > > and therefore have the same number of cache leaves. The cacheinfo memory
> > > > is allocated early (on the primary CPU), relying on the new handler. If
> > > > later (when CLIDR_EL1 based detection runs again on the secondary CPU)
> > > > the initial assumption proves to be wrong and the CPU has in fact more
> > > > leaves, the cacheinfo memory is reallocated, and that still triggers a
> > > > splat on an RT kernel.
> > > >
> > > > In other words, asymmetrical CPU systems *must* still provide cacheinfo
> > > > data in DT/ACPI to avoid the splat on RT kernels (unless secondary CPUs
> > > > happen to have less leaves than the primary CPU). But symmetrical CPU
> > > > systems (the majority) can now get away without the additional DT/ACPI
> > > > data and rely on CLIDR_EL1 based detection.
> > > >
> > >
> > > If you are okay with the change, can I have your Acked-by, so that I can
> > > route this via Greg's tree ?
> >
> > I really dislike the proliferation of __weak functions in this file,
>
> You mean in the generic cacheinfo.c right ? Coz arm64 version must not have
> any and that is the file in this patch.
Right, but we're providing implementations of both early_cache_level() and
init_cache_level(), which are weak symbols in the core code.
Will
On Wed, 12 Apr 2023 14:57:56 -0400, Radu Rendec wrote:
> Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU")
> tries to build the cacheinfo from the primary CPU prior to secondary
> CPUs boot, if the DT/ACPI description contains cache information.
> However, if such information is not present, it still reverts to the old
> behavior, which allocates the cacheinfo memory on each secondary CPU. On
> RT kernels, this triggers a "BUG: sleeping function called from invalid
> context" because the allocation is done before preemption is first
> enabled on the secondary CPU.
>
> [...]
Applied to sudeep.holla/linux (for-next/cacheinfo), thanks!
[1/3] cacheinfo: Add arch specific early level initializer
https://git.kernel.org/sudeep.holla/c/6539cffa9495
[2/3] cacheinfo: Add arm64 early level initializer implementation
https://git.kernel.org/sudeep.holla/c/c931680cfa95
[3/3] cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
https://git.kernel.org/sudeep.holla/c/e103d55465db
--
Regards,
Sudeep
On Wed, Apr 12, 2023 at 02:57:57PM -0400, Radu Rendec wrote:
> This patch gives architecture specific code the ability to initialize
> the cache level and allocate cacheinfo memory early, when cache level
> initialization runs on the primary CPU for all possible CPUs.
>
> This is part of a patch series that attempts to further the work in
> commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU").
> Previously, in the absence of any DT/ACPI cache info, architecture
> specific cache detection and info allocation for secondary CPUs would
> happen in non-preemptible context during early CPU initialization and
> trigger a "BUG: sleeping function called from invalid context" splat on
> an RT kernel.
>
> More specifically, this patch adds the early_cache_level() function,
> which is called by fetch_cache_info() as a fallback when the number of
> cache leaves cannot be extracted from DT/ACPI. In the default generic
> (weak) implementation, this new function returns -ENOENT, which
> preserves the original behavior for architectures that do not implement
> the function.
>
> Since early detection can get the number of cache leaves wrong in some
> cases*, additional logic is added to still call init_cache_level() later
> on the secondary CPU, therefore giving the architecture specific code an
> opportunity to go back and fix the initial guess. Again, the original
> behavior is preserved for architectures that do not implement the new
> function.
>
> * For example, on arm64, CLIDR_EL1 detection works only when it runs on
> the current CPU. In other words, a CPU cannot detect the cache depth
> for any other CPU than itself.
>
> Signed-off-by: Radu Rendec <[email protected]>
> Reviewed-by: Pierre Gondois <[email protected]>
> ---
> drivers/base/cacheinfo.c | 75 +++++++++++++++++++++++++++------------
> include/linux/cacheinfo.h | 2 ++
> 2 files changed, 55 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> index f3903d002819..d783896c8a1f 100644
> --- a/drivers/base/cacheinfo.c
> +++ b/drivers/base/cacheinfo.c
> @@ -398,6 +398,11 @@ static void free_cache_attributes(unsigned int cpu)
> cache_shared_cpu_map_remove(cpu);
> }
>
> +int __weak early_cache_level(unsigned int cpu)
> +{
> + return -ENOENT;
> +}
> +
> int __weak init_cache_level(unsigned int cpu)
> {
> return -ENOENT;
> @@ -423,56 +428,82 @@ int allocate_cache_info(int cpu)
>
> int fetch_cache_info(unsigned int cpu)
> {
> - struct cpu_cacheinfo *this_cpu_ci;
> + struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> unsigned int levels = 0, split_levels = 0;
> int ret;
>
> if (acpi_disabled) {
> ret = init_of_cache_level(cpu);
> - if (ret < 0)
> - return ret;
> } else {
> ret = acpi_get_cache_info(cpu, &levels, &split_levels);
> - if (ret < 0)
> + if (!ret) {
> + this_cpu_ci->num_levels = levels;
> + /*
> + * This assumes that:
> + * - there cannot be any split caches (data/instruction)
> + * above a unified cache
> + * - data/instruction caches come by pair
> + */
> + this_cpu_ci->num_leaves = levels + split_levels;
> + }
> + }
> +
> + if (ret || !cache_leaves(cpu)) {
> + ret = early_cache_level(cpu);
> + if (ret)
> return ret;
>
> - this_cpu_ci = get_cpu_cacheinfo(cpu);
> - this_cpu_ci->num_levels = levels;
> - /*
> - * This assumes that:
> - * - there cannot be any split caches (data/instruction)
> - * above a unified cache
> - * - data/instruction caches come by pair
> - */
> - this_cpu_ci->num_leaves = levels + split_levels;
> + if (!cache_leaves(cpu))
> + return -ENOENT;
> +
> + this_cpu_ci->early_ci_levels = true;
> }
> - if (!cache_leaves(cpu))
> - return -ENOENT;
>
> return allocate_cache_info(cpu);
> }
>
> -int detect_cache_attributes(unsigned int cpu)
> +static inline int init_level_allocate_ci(unsigned int cpu)
> {
> - int ret;
> + unsigned int early_leaves = cache_leaves(cpu);
>
> /* Since early initialization/allocation of the cacheinfo is allowed
> * via fetch_cache_info() and this also gets called as CPU hotplug
> * callbacks via cacheinfo_cpu_online, the init/alloc can be skipped
> * as it will happen only once (the cacheinfo memory is never freed).
> - * Just populate the cacheinfo.
> + * Just populate the cacheinfo. However, if the cacheinfo has been
> + * allocated early through the arch-specific early_cache_level() call,
> + * there is a chance the info is wrong (this can happen on arm64). In
> + * that case, call init_cache_level() anyway to give the arch-specific
> + * code a chance to make things right.
> */
> - if (per_cpu_cacheinfo(cpu))
> - goto populate_leaves;
> + if (per_cpu_cacheinfo(cpu) && !ci_cacheinfo(cpu)->early_ci_levels)
> + return 0;
>
> if (init_cache_level(cpu) || !cache_leaves(cpu))
> return -ENOENT;
>
> - ret = allocate_cache_info(cpu);
> + /*
> + * Now that we have properly initialized the cache level info, make
> + * sure we don't try to do that again the next time we are called
> + * (e.g. as CPU hotplug callbacks).
> + */
> + ci_cacheinfo(cpu)->early_ci_levels = false;
> +
> + if (cache_leaves(cpu) <= early_leaves)
> + return 0;
> +
Hi,
I had posted a patchset[1] for x86 that initializes
ci_cacheinfo(cpu)->num_leaves during SMP boot.
This means that early_leaves and a late cache_leaves() are equal but
per_cpu_cacheinfo(cpu) is never allocated. Currently, x86 does not use
fetch_cache_info().
I think that we should check here that per_cpu_cacheinfo() has been allocated to
take care of the case in which early and late cache leaves remain the same:
- if (cache_leaves(cpu) <= early_leaves)
+ if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu))
Otherwise, in v6.4-rc1 + [1] I observe a NULL pointer dereference from
last_level_cache_is_valid().
I can post a patch with this fix if it makes sense.
[1]. https://lore.kernel.org/all/[email protected]/
On Wed, 2023-05-10 at 12:12 -0700, Ricardo Neri wrote:
> On Wed, Apr 12, 2023 at 02:57:57PM -0400, Radu Rendec wrote:
> > This patch gives architecture specific code the ability to initialize
> > the cache level and allocate cacheinfo memory early, when cache level
> > initialization runs on the primary CPU for all possible CPUs.
[cut]
> > -int detect_cache_attributes(unsigned int cpu)
> > +static inline int init_level_allocate_ci(unsigned int cpu)
> > {
> > - int ret;
> > + unsigned int early_leaves = cache_leaves(cpu);
> >
> > /* Since early initialization/allocation of the cacheinfo is allowed
> > * via fetch_cache_info() and this also gets called as CPU hotplug
> > * callbacks via cacheinfo_cpu_online, the init/alloc can be skipped
> > * as it will happen only once (the cacheinfo memory is never freed).
> > - * Just populate the cacheinfo.
> > + * Just populate the cacheinfo. However, if the cacheinfo has been
> > + * allocated early through the arch-specific early_cache_level() call,
> > + * there is a chance the info is wrong (this can happen on arm64). In
> > + * that case, call init_cache_level() anyway to give the arch-specific
> > + * code a chance to make things right.
> > */
> > - if (per_cpu_cacheinfo(cpu))
> > - goto populate_leaves;
> > + if (per_cpu_cacheinfo(cpu) && !ci_cacheinfo(cpu)->early_ci_levels)
> > + return 0;
> >
> > if (init_cache_level(cpu) || !cache_leaves(cpu))
> > return -ENOENT;
> >
> > - ret = allocate_cache_info(cpu);
> > + /*
> > + * Now that we have properly initialized the cache level info, make
> > + * sure we don't try to do that again the next time we are called
> > + * (e.g. as CPU hotplug callbacks).
> > + */
> > + ci_cacheinfo(cpu)->early_ci_levels = false;
> > +
> > + if (cache_leaves(cpu) <= early_leaves)
> > + return 0;
> > +
>
> I had posted a patchset[1] for x86 that initializes
> ci_cacheinfo(cpu)->num_leaves during SMP boot.
>
> This means that early_leaves and a late cache_leaves() are equal but
> per_cpu_cacheinfo(cpu) is never allocated. Currently, x86 does not use
> fetch_cache_info().
>
> I think that we should check here that per_cpu_cacheinfo() has been allocated to
> take care of the case in which early and late cache leaves remain the same:
>
> - if (cache_leaves(cpu) <= early_leaves)
> + if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu))
>
> Otherwise, in v6.4-rc1 + [1] I observe a NULL pointer dereference from
> last_level_cache_is_valid().
>
> I can post a patch with this fix if it makes sense.
>
> [1]. https://lore.kernel.org/all/[email protected]/
Hi Ricardo,
Thanks for bringing this to my attention. I need to run some tests on
x86 (I did all that work/testing on arm64) and wrap my head around it.
While I don't see any problem with the fix you're proposing, I'm afraid
it may circle back to the other problem I tried to fix initially. Have
you tested this on an RT kernel by any chance?
I'm thinking that if we end up in init_level_allocate_ci() without the
cacheinfo memory having been allocated earlier, we're up for a "BUG"
splat on RT kernels.
If early_leaves has the right value at that point, the cacheinfo memory
should be allocated early (on the primary CPU), so perhaps there's a
different problem somewhere else.
I'll get back to you as soon as I look at this in more detail but I
just wanted to give you a quick heads-up.
Regards,
Radu
On Wed, May 10, 2023 at 04:44:49PM -0400, Radu Rendec wrote:
> On Wed, 2023-05-10 at 12:12 -0700, Ricardo Neri wrote:
> > On Wed, Apr 12, 2023 at 02:57:57PM -0400, Radu Rendec wrote:
> > > This patch gives architecture specific code the ability to initialize
> > > the cache level and allocate cacheinfo memory early, when cache level
> > > initialization runs on the primary CPU for all possible CPUs.
> [cut]
> > > -int detect_cache_attributes(unsigned int cpu)
> > > +static inline int init_level_allocate_ci(unsigned int cpu)
> > >  {
> > > -        int ret;
> > > +        unsigned int early_leaves = cache_leaves(cpu);
> > >
> > >          /* Since early initialization/allocation of the cacheinfo is allowed
> > >           * via fetch_cache_info() and this also gets called as CPU hotplug
> > >           * callbacks via cacheinfo_cpu_online, the init/alloc can be skipped
> > >           * as it will happen only once (the cacheinfo memory is never freed).
> > > -         * Just populate the cacheinfo.
> > > +         * Just populate the cacheinfo. However, if the cacheinfo has been
> > > +         * allocated early through the arch-specific early_cache_level() call,
> > > +         * there is a chance the info is wrong (this can happen on arm64). In
> > > +         * that case, call init_cache_level() anyway to give the arch-specific
> > > +         * code a chance to make things right.
> > >           */
> > > -        if (per_cpu_cacheinfo(cpu))
> > > -                goto populate_leaves;
> > > +        if (per_cpu_cacheinfo(cpu) && !ci_cacheinfo(cpu)->early_ci_levels)
> > > +                return 0;
> > >
> > >          if (init_cache_level(cpu) || !cache_leaves(cpu))
> > >                  return -ENOENT;
> > >
> > > -        ret = allocate_cache_info(cpu);
> > > +        /*
> > > +         * Now that we have properly initialized the cache level info, make
> > > +         * sure we don't try to do that again the next time we are called
> > > +         * (e.g. as CPU hotplug callbacks).
> > > +         */
> > > +        ci_cacheinfo(cpu)->early_ci_levels = false;
> > > +
> > > +        if (cache_leaves(cpu) <= early_leaves)
> > > +                return 0;
> > > +
> >
> > I had posted a patchset[1] for x86 that initializes
> > ci_cacheinfo(cpu)->num_leaves during SMP boot.
> >
> > This means that early_leaves and a late cache_leaves() are equal but
> > per_cpu_cacheinfo(cpu) is never allocated. Currently, x86 does not use
> > fetch_cache_info().
> >
> > I think that we should check here that per_cpu_cacheinfo() has been allocated to
> > take care of the case in which early and late cache leaves remain the same:
> >
> > -        if (cache_leaves(cpu) <= early_leaves)
> > +        if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu))
> >
> > Otherwise, in v6.4-rc1 + [1] I observe a NULL pointer dereference from
> > last_level_cache_is_valid().
> >
> > I can post a patch with this fix if it makes sense.
> >
> > [1]. https://lore.kernel.org/all/[email protected]/
>
> Hi Ricardo,
Thank you very much for your quick reply!
>
> Thanks for bringing this to my attention. I need to run some tests on
> x86 (I did all that work/testing on arm64) and wrap my head around it.
>
> While I don't see any problem with the fix you're proposing, I'm afraid
> it may circle back to the other problem I tried to fix initially. Have
> you tested this on an RT kernel by any chance?
That is a good point. I did not test on an RT kernel. I'll try that.
>
> I'm thinking that if we end up in init_level_allocate_ci() without the
> cacheinfo memory having been allocated earlier, we're up for a "BUG"
> splat on RT kernels.
>
> If early_leaves has the right value at that point, the cacheinfo memory
> should be allocated early (on the primary CPU), so perhaps there's a
> different problem somewhere else.
That can work for x86, IMO. Not sure about other archs. As you mention,
other archs still want the chance to correct the early cache info.
>
> I'll get back to you as soon as I look at this in more detail but I
> just wanted to give you a quick heads-up.
Thanks!
On Wed, 2023-05-10 at 17:00 -0700, Ricardo Neri wrote:
> On Wed, May 10, 2023 at 04:44:49PM -0400, Radu Rendec wrote:
> > On Wed, 2023-05-10 at 12:12 -0700, Ricardo Neri wrote:
> > > On Wed, Apr 12, 2023 at 02:57:57PM -0400, Radu Rendec wrote:
> > > > This patch gives architecture specific code the ability to initialize
> > > > the cache level and allocate cacheinfo memory early, when cache level
> > > > initialization runs on the primary CPU for all possible CPUs.
> > [cut]
> > > > -int detect_cache_attributes(unsigned int cpu)
> > > > +static inline int init_level_allocate_ci(unsigned int cpu)
> > > > {
> > > > - int ret;
> > > > + unsigned int early_leaves = cache_leaves(cpu);
> > > >
> > > > /* Since early initialization/allocation of the cacheinfo is allowed
> > > > * via fetch_cache_info() and this also gets called as CPU hotplug
> > > > * callbacks via cacheinfo_cpu_online, the init/alloc can be skipped
> > > > * as it will happen only once (the cacheinfo memory is never freed).
> > > > - * Just populate the cacheinfo.
> > > > + * Just populate the cacheinfo. However, if the cacheinfo has been
> > > > + * allocated early through the arch-specific early_cache_level() call,
> > > > + * there is a chance the info is wrong (this can happen on arm64). In
> > > > + * that case, call init_cache_level() anyway to give the arch-specific
> > > > + * code a chance to make things right.
> > > > */
> > > > - if (per_cpu_cacheinfo(cpu))
> > > > - goto populate_leaves;
> > > > + if (per_cpu_cacheinfo(cpu) && !ci_cacheinfo(cpu)->early_ci_levels)
> > > > + return 0;
> > > >
> > > > if (init_cache_level(cpu) || !cache_leaves(cpu))
> > > > return -ENOENT;
> > > >
> > > > - ret = allocate_cache_info(cpu);
> > > > + /*
> > > > + * Now that we have properly initialized the cache level info, make
> > > > + * sure we don't try to do that again the next time we are called
> > > > + * (e.g. as CPU hotplug callbacks).
> > > > + */
> > > > + ci_cacheinfo(cpu)->early_ci_levels = false;
> > > > +
> > > > + if (cache_leaves(cpu) <= early_leaves)
> > > > + return 0;
> > > > +
> > >
> > > I had posted a patchset[1] for x86 that initializes
> > > ci_cacheinfo(cpu)->num_leaves during SMP boot.
> > >
> > > This means that early_leaves and a late cache_leaves() are equal but
> > > per_cpu_cacheinfo(cpu) is never allocated. Currently, x86 does not use
> > > fetch_cache_info().
> > >
> > > I think that we should check here that per_cpu_cacheinfo() has been allocated to
> > > take care of the case in which early and late cache leaves remain the same:
> > >
> > > - if (cache_leaves(cpu) <= early_leaves)
> > > + if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu))
> > >
> > > Otherwise, in v6.4-rc1 + [1] I observe a NULL pointer dereference from
> > > last_level_cache_is_valid().
> > >
> > > I can post a patch with this fix if it makes sense.
> > >
> > > [1]. https://lore.kernel.org/all/[email protected]/
> >
> > Thanks for bringing this to my attention. I need to run some tests on
> > x86 (I did all that work/testing on arm64) and wrap my head around it.
> >
> > While I don't see any problem with the fix you're proposing, I'm afraid
> > it may circle back to the other problem I tried to fix initially. Have
> > you tested this on an RT kernel by any chance?
>
> That is a good point. I did not test on an RT kernel. I'll try that.
It looks like the flow is much simpler on x86: detect_cache_attributes()
is called only once for each CPU, and it's called in kthread context.
I haven't tested on an RT kernel but I think it should be fine. I put a
msleep() there and saw no issues, which means kmalloc() on RT should be
fine as well.
> > I'm thinking that if we end up in init_level_allocate_ci() without the
> > cacheinfo memory having been allocated earlier, we're up for a "BUG"
> > splat on RT kernels.
> >
> > If early_leaves has the right value at that point, the cacheinfo memory
> > should be allocated early (on the primary CPU), so perhaps there's a
> > different problem somewhere else.
>
> That can work for x86, IMO. Not sure about other archs. As you mention,
> other archs still want the chance to correct the early cache info.
You're right. I got confused for a moment because I was used to the
arm64 flow. On x86, there is no "early" cache info per se because, as I
already mentioned, detect_cache_attributes() is called only once for
each CPU.
I was intrigued about how this worked without your changes, and I
looked closer. Between the initialization of the early_leaves variable
at the beginning of init_level_allocate_ci() and the comparison of
cache_leaves(cpu) and early_leaves, init_cache_level() gets called.
Before your changes, (struct cpu_cacheinfo).num_leaves was initialized
to 0 and then changed in init_cache_level(). That way, early_leaves
ended up as 0, which made the comparison evaluate to false.
At this point I think the patch you proposed is the right way to fix
this. I don't see any reason why it would interfere with other archs
that really use early allocation. This new patch should probably be
added to your series, since otherwise your other patches would
basically "introduce" a null-pointer deref.
My only suggestion would be to add a short comment before the
comparison, to explain that on x86 detect_cache_attributes() is called
only once for each CPU and so early allocation is not possible but
(struct cpu_cacheinfo).num_leaves is already initialized by the time
detect_cache_attributes() is called.
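Something along these lines, perhaps (hypothetical wording, just to
illustrate the suggestion; the condition itself is the one you already
proposed):

        /*
         * On x86, detect_cache_attributes() is called only once per CPU,
         * in kthread context, so the cacheinfo memory is never allocated
         * early even though num_leaves is already initialized by this
         * point. Check per_cpu_cacheinfo() as well so the allocation still
         * happens in that case.
         */
        if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu))
                return 0;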
Regards,
Radu
On Wed, May 10, 2023 at 12:12:07PM -0700, Ricardo Neri wrote:
> Hi,
>
> I had posted a patchset[1] for x86 that initializes
> ci_cacheinfo(cpu)->num_leaves during SMP boot.
>
It is not entirely clear to me whether this is just a cleanup or a fix for
some issue you faced. Just wanted to let you know that Prateek from AMD has
a couple of fixes [2]
> This means that early_leaves and a late cache_leaves() are equal but
> per_cpu_cacheinfo(cpu) is never allocated. Currently, x86 does not use
> fetch_cache_info().
>
> I think that we should check here that per_cpu_cacheinfo() has been allocated to
> take care of the case in which early and late cache leaves remain the same:
>
> - if (cache_leaves(cpu) <= early_leaves)
> + if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu))
>
> Otherwise, in v6.4-rc1 + [1] I observe a NULL pointer dereference from
> last_level_cache_is_valid().
>
I think this is a different issue, as Prateek was just observing wrong info
after CPU hotplug operations. But his patches also manage the
cpumap_populated state better. Can you please look at those as well?
--
Regards,
Sudeep
[2] https://lore.kernel.org/all/[email protected]
On Mon, May 15, 2023 at 10:36:08AM +0100, Sudeep Holla wrote:
> On Wed, May 10, 2023 at 12:12:07PM -0700, Ricardo Neri wrote:
> > Hi,
> >
> > I had posted a patchset[1] for x86 that initializes
> > ci_cacheinfo(cpu)->num_leaves during SMP boot.
> >
>
> It is not entirely clear to me if this is just a clean up or a fix to some
> issue you faced ? Just wanted to let you know Prateek from AMD has couple
> of fixes [2]
My first patch is a bug fix. The second patch is a cleanup that results
from fixing the bug in patch 1.
>
> > This means that early_leaves and a late cache_leaves() are equal but
> > per_cpu_cacheinfo(cpu) is never allocated. Currently, x86 does not use
> > fetch_cache_info().
> >
> > I think that we should check here that per_cpu_cacheinfo() has been allocated to
> > take care of the case in which early and late cache leaves remain the same:
> >
> > - if (cache_leaves(cpu) <= early_leaves)
> > + if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu))
> >
> > Otherwise, in v6.4-rc1 + [1] I observe a NULL pointer dereference from
> > last_level_cache_is_valid().
> >
>
> I think this is different issue as Prateek was just observing wrong info
> after cpuhotplug operations. But the patches manage the cpumap_populated
> state better with the patches. Can you please look at that as well ?
I verified that the patches from Prateek fix a different issue. I was able
to reproduce his issue. His patches fixes it.
I still see my issue after applying Prateek's patches.
>
> --
> Regards,
> Sudeep
>
> [2] https://lore.kernel.org/all/[email protected]
Thank you for the suggestion!
BR,
Ricardo
On Wed, May 17, 2023 at 06:27:03PM -0700, Ricardo Neri wrote:
> On Mon, May 15, 2023 at 10:36:08AM +0100, Sudeep Holla wrote:
> > On Wed, May 10, 2023 at 12:12:07PM -0700, Ricardo Neri wrote:
> > > Hi,
> > >
> > > I had posted a patchset[1] for x86 that initializes
> > > ci_cacheinfo(cpu)->num_leaves during SMP boot.
> > >
> >
> > It is not entirely clear to me if this is just a clean up or a fix to some
> > issue you faced ? Just wanted to let you know Prateek from AMD has couple
> > of fixes [2]
>
> My first patch is a bug fix. The second patch is clean up that results
> from fixing the bug in patch 1.
>
> >
> > > This means that early_leaves and a late cache_leaves() are equal but
> > > per_cpu_cacheinfo(cpu) is never allocated. Currently, x86 does not use
> > > fetch_cache_info().
> > >
> > > I think that we should check here that per_cpu_cacheinfo() has been allocated to
> > > take care of the case in which early and late cache leaves remain the same:
> > >
> > > - if (cache_leaves(cpu) <= early_leaves)
> > > + if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu))
> > >
> > > Otherwise, in v6.4-rc1 + [1] I observe a NULL pointer dereference from
> > > last_level_cache_is_valid().
> > >
> >
> > I think this is different issue as Prateek was just observing wrong info
> > after cpuhotplug operations. But the patches manage the cpumap_populated
> > > state better with the patches. Can you please look at that as well ?
>
> I verified that the patches from Prateek fix a different issue. I was able
> to reproduce his issue. His patches fixes it.
>
> I still see my issue after applying Prateek's patches.
Thanks, I thought it was a different issue; good that you were able to test
them as well. Please post a proper patch for the NULL ptr dereference you
are hitting on x86.
--
Regards,
Sudeep
On Thu, May 11, 2023 at 03:55:18PM -0400, Radu Rendec wrote:
> On Wed, 2023-05-10 at 17:00 -0700, Ricardo Neri wrote:
> > On Wed, May 10, 2023 at 04:44:49PM -0400, Radu Rendec wrote:
> > > On Wed, 2023-05-10 at 12:12 -0700, Ricardo Neri wrote:
> > > > On Wed, Apr 12, 2023 at 02:57:57PM -0400, Radu Rendec wrote:
> > > > > This patch gives architecture specific code the ability to initialize
> > > > > the cache level and allocate cacheinfo memory early, when cache level
> > > > > initialization runs on the primary CPU for all possible CPUs.
> > > [cut]
> > > > > -int detect_cache_attributes(unsigned int cpu)
> > > > > +static inline int init_level_allocate_ci(unsigned int cpu)
> > > > >  {
> > > > > -        int ret;
> > > > > +        unsigned int early_leaves = cache_leaves(cpu);
> > > > >
> > > > >          /* Since early initialization/allocation of the cacheinfo is allowed
> > > > >           * via fetch_cache_info() and this also gets called as CPU hotplug
> > > > >           * callbacks via cacheinfo_cpu_online, the init/alloc can be skipped
> > > > >           * as it will happen only once (the cacheinfo memory is never freed).
> > > > > -         * Just populate the cacheinfo.
> > > > > +         * Just populate the cacheinfo. However, if the cacheinfo has been
> > > > > +         * allocated early through the arch-specific early_cache_level() call,
> > > > > +         * there is a chance the info is wrong (this can happen on arm64). In
> > > > > +         * that case, call init_cache_level() anyway to give the arch-specific
> > > > > +         * code a chance to make things right.
> > > > >           */
> > > > > -        if (per_cpu_cacheinfo(cpu))
> > > > > -                goto populate_leaves;
> > > > > +        if (per_cpu_cacheinfo(cpu) && !ci_cacheinfo(cpu)->early_ci_levels)
> > > > > +                return 0;
> > > > >
> > > > >          if (init_cache_level(cpu) || !cache_leaves(cpu))
> > > > >                  return -ENOENT;
> > > > >
> > > > > -        ret = allocate_cache_info(cpu);
> > > > > +        /*
> > > > > +         * Now that we have properly initialized the cache level info, make
> > > > > +         * sure we don't try to do that again the next time we are called
> > > > > +         * (e.g. as CPU hotplug callbacks).
> > > > > +         */
> > > > > +        ci_cacheinfo(cpu)->early_ci_levels = false;
> > > > > +
> > > > > +        if (cache_leaves(cpu) <= early_leaves)
> > > > > +                return 0;
> > > > > +
> > > >
> > > > I had posted a patchset[1] for x86 that initializes
> > > > ci_cacheinfo(cpu)->num_leaves during SMP boot.
> > > >
> > > > This means that early_leaves and a late cache_leaves() are equal but
> > > > per_cpu_cacheinfo(cpu) is never allocated. Currently, x86 does not use
> > > > fetch_cache_info().
> > > >
> > > > I think that we should check here that per_cpu_cacheinfo() has been allocated to
> > > > take care of the case in which early and late cache leaves remain the same:
> > > >
> > > > -        if (cache_leaves(cpu) <= early_leaves)
> > > > +        if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu))
> > > >
> > > > Otherwise, in v6.4-rc1 + [1] I observe a NULL pointer dereference from
> > > > last_level_cache_is_valid().
> > > >
> > > > I can post a patch with this fix if it makes sense.
> > > >
> > > > [1]. https://lore.kernel.org/all/[email protected]/
> > >
> > > Thanks for bringing this to my attention. I need to run some tests on
> > > x86 (I did all that work/testing on arm64) and wrap my head around it.
> > >
> > > While I don't see any problem with the fix you're proposing, I'm afraid
> > > it may circle back to the other problem I tried to fix initially. Have
> > > you tested this on an RT kernel by any chance?
> >
> > That is a good point. I did not test on an RT kernel. I'll try that.
>
> It looks like the flow is much simpler on x86: detect_cache_attributes()
> is called only once for each CPU, and it's called in kthread context.
>
> I haven't tested on an RT kernel but I think it should be fine. I put a
> msleep() there and saw no issues, which means kmalloc() on RT should be
> fine as well.
I booted the realtime kernel [3] with CONFIG_PREEMPT_RT and did not observe
the BUG splat. I tried before your patchset. Were you able to reproduce on
x86? Also, I was not able to reproduce the BUG splat after your changes +
[1] + my earlier suggested patch in this thread.
>
> > > I'm thinking that if we end up in init_level_allocate_ci() without the
> > > cacheinfo memory having been allocated earlier, we're up for a "BUG"
> > > splat on RT kernels.
> > >
> > > If early_leaves has the right value at that point, the cacheinfo memory
> > > should be allocated early (on the primary CPU), so perhaps there's a
> > > different problem somewhere else.
> >
> > That can work for x86, IMO. Not sure about other archs. As you mention,
> > other archs still want the chance to correct the early cache info.
>
> You're right. I got confused for a moment because I was used to the
> arm64 flow. On x86, there is no "early" cache info per se because, as I
> already mentioned, detect_cache_attributes() is called only once for
> each CPU.
Indeed.
>
> I was intrigued about how this worked without your changes, and I
> looked closer. Between the initialization of the early_leaves variable
> at the beginning of init_level_allocate_ci() and the comparison of
> cache_leaves(cpu) and early_leaves, init_cache_level() gets called.
> Before your changes, (struct cpu_cacheinfo).num_leaves was initialized
> to 0 and then changed in init_cache_level(). That way, early_leaves
> ended up as 0, which made the comparison evaluate to false.
Yes my changes aim to use (struct cpu_cacheinfo).num_leaves directly.
>
> At this point I think the patch you proposed is the right way to fix
> this. I don't see any reason why it would interfere with other archs
> that really use early allocation. This new patch should probably be
> added to your series, since otherwise your other patches would
> basically "introduce" a null-pointer deref.
>
> My only suggestion would be to add a short comment before the
> comparison, to explain that on x86 detect_cache_attributes() is called
> only once for each CPU and so early allocation is not possible but
> (struct cpu_cacheinfo).num_leaves is already initialized by the time
> detect_cache_attributes() is called.
Thank you very much for your help!
BR,
Ricardo
Hi Ricardo,
On Fri, 2023-05-19 at 14:44 -0700, Ricardo Neri wrote:
> On Thu, May 11, 2023 at 03:55:18PM -0400, Radu Rendec wrote:
> > On Wed, 2023-05-10 at 17:00 -0700, Ricardo Neri wrote:
> > > On Wed, May 10, 2023 at 04:44:49PM -0400, Radu Rendec wrote:
> > > > On Wed, 2023-05-10 at 12:12 -0700, Ricardo Neri wrote:
> > > > > On Wed, Apr 12, 2023 at 02:57:57PM -0400, Radu Rendec wrote:
> > > > > > This patch gives architecture specific code the ability to initialize
> > > > > > the cache level and allocate cacheinfo memory early, when cache level
> > > > > > initialization runs on the primary CPU for all possible CPUs.
> > > > [cut]
> > > > > > -int detect_cache_attributes(unsigned int cpu)
> > > > > > +static inline int init_level_allocate_ci(unsigned int cpu)
> > > > > > {
> > > > > > - int ret;
> > > > > > + unsigned int early_leaves = cache_leaves(cpu);
> > > > > >
> > > > > > /* Since early initialization/allocation of the cacheinfo is allowed
> > > > > > * via fetch_cache_info() and this also gets called as CPU hotplug
> > > > > > * callbacks via cacheinfo_cpu_online, the init/alloc can be skipped
> > > > > > * as it will happen only once (the cacheinfo memory is never freed).
> > > > > > - * Just populate the cacheinfo.
> > > > > > + * Just populate the cacheinfo. However, if the cacheinfo has been
> > > > > > + * allocated early through the arch-specific early_cache_level() call,
> > > > > > + * there is a chance the info is wrong (this can happen on arm64). In
> > > > > > + * that case, call init_cache_level() anyway to give the arch-specific
> > > > > > + * code a chance to make things right.
> > > > > > */
> > > > > > - if (per_cpu_cacheinfo(cpu))
> > > > > > - goto populate_leaves;
> > > > > > + if (per_cpu_cacheinfo(cpu) && !ci_cacheinfo(cpu)->early_ci_levels)
> > > > > > + return 0;
> > > > > >
> > > > > > if (init_cache_level(cpu) || !cache_leaves(cpu))
> > > > > > return -ENOENT;
> > > > > >
> > > > > > - ret = allocate_cache_info(cpu);
> > > > > > + /*
> > > > > > + * Now that we have properly initialized the cache level info, make
> > > > > > + * sure we don't try to do that again the next time we are called
> > > > > > + * (e.g. as CPU hotplug callbacks).
> > > > > > + */
> > > > > > + ci_cacheinfo(cpu)->early_ci_levels = false;
> > > > > > +
> > > > > > + if (cache_leaves(cpu) <= early_leaves)
> > > > > > + return 0;
> > > > > > +
> > > > >
> > > > > I had posted a patchset[1] for x86 that initializes
> > > > > ci_cacheinfo(cpu)->num_leaves during SMP boot.
> > > > >
> > > > > This means that early_leaves and a late cache_leaves() are equal but
> > > > > per_cpu_cacheinfo(cpu) is never allocated. Currently, x86 does not use
> > > > > fetch_cache_info().
> > > > >
> > > > > I think that we should check here that per_cpu_cacheinfo() has been allocated to
> > > > > take care of the case in which early and late cache leaves remain the same:
> > > > >
> > > > > - if (cache_leaves(cpu) <= early_leaves)
> > > > > + if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu))
> > > > >
> > > > > Otherwise, in v6.4-rc1 + [1] I observe a NULL pointer dereference from
> > > > > last_level_cache_is_valid().
> > > > >
> > > > > I can post a patch with this fix if it makes sense.
> > > > >
> > > > > [1]. https://lore.kernel.org/all/[email protected]/
> > > >
> > > > Thanks for bringing this to my attention. I need to run some tests on
> > > > x86 (I did all that work/testing on arm64) and wrap my head around it.
> > > >
> > > > While I don't see any problem with the fix you're proposing, I'm afraid
> > > > it may circle back to the other problem I tried to fix initially. Have
> > > > you tested this on an RT kernel by any chance?
> > >
> > > That is a good point. I did not test on an RT kernel. I'll try that.
> >
> > It looks like the flow is much simpler on x86: detect_cache_attributes()
> > is called only once for each CPU, and it's called in kthread context.
> >
> > I haven't tested on an RT kernel but I think it should be fine. I put a
> > msleep() there and saw no issues, which means kmalloc() on RT should be
> > fine as well.
>
> I booted the realtime kernel [3] with CONFIG_PREEMPT_RT and did not observe
> the BUG splat. I tried this before your patchset was applied. Were you able
> to reproduce it on x86? Also, I was not able to reproduce the BUG splat after
> your changes + [1] + my earlier suggested patch in this thread.
Thanks for trying this out. I think the BUG splat cannot be reproduced
on x86, either with or without my fix, because detect_cache_attributes()
is always called in kthread context, which keeps kmalloc() happy on
RT kernels.
At the time when I first asked you about the RT kernel, I hadn't looked
closely at the x86 code yet, and I was unaware of the (much simpler)
flow and the kthread context. Sorry for the confusion!
In any case, your test on the RT kernel validates the conclusion that
we already came to, and eliminates any trace of doubt. FWIW, I was
testing on arm64 when I found the bug and created the fix. Things are
much more complicated there, and detect_cache_attributes() is called
twice for each CPU (the second time with preemption disabled).
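To illustrate the difference, here is a minimal, purely illustrative
sketch (not code from this series or from [1]; the helper name and
variables are made up):

	/* x86-like flow: detect_cache_attributes() runs once per CPU in
	 * kthread context, so a GFP_KERNEL allocation may sleep safely,
	 * even on PREEMPT_RT. */
	ci = kcalloc(num_leaves, sizeof(*ci), GFP_KERNEL);

	/* arm64-like flow before this series: the allocation ends up
	 * running in non-preemptible context on the secondary CPU, so on
	 * PREEMPT_RT it triggers "BUG: sleeping function called from
	 * invalid context". */
	preempt_disable();
	ci = kcalloc(num_leaves, sizeof(*ci), GFP_KERNEL); /* splats on RT */
	preempt_enable();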
Best regards,
Radu
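For reference, a rough sketch of how init_level_allocate_ci() could look
with the per_cpu_cacheinfo() guard suggested earlier in this thread. The
hunk quoted above is cut short, so the tail of the function is an
approximation rather than the merged code, and the extra check is the
proposal under discussion, not necessarily what ends up upstream:

static inline int init_level_allocate_ci(unsigned int cpu)
{
	unsigned int early_leaves = cache_leaves(cpu);

	/* Skip only when the cacheinfo memory already exists and the early
	 * (arch-guessed) level information is known to be accurate. */
	if (per_cpu_cacheinfo(cpu) && !ci_cacheinfo(cpu)->early_ci_levels)
		return 0;

	if (init_cache_level(cpu) || !cache_leaves(cpu))
		return -ENOENT;

	/* Don't redo the level detection on subsequent (hotplug) calls. */
	ci_cacheinfo(cpu)->early_ci_levels = false;

	/*
	 * Suggested change from this thread: also require that the per-CPU
	 * cacheinfo memory was actually allocated before taking the early
	 * exit, so x86 (which fills in num_leaves during SMP boot without
	 * using fetch_cache_info()) still gets its allocation.
	 */
	if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu))
		return 0;

	/* Reallocate if the early guess turned out to be too small. */
	kfree(per_cpu_cacheinfo(cpu));
	return allocate_cache_info(cpu);
}

The only difference from the hunk quoted above is the extra
per_cpu_cacheinfo(cpu) test in the early-exit condition.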
On Thu, May 18, 2023 at 10:34:14AM +0100, Sudeep Holla wrote:
> On Wed, May 17, 2023 at 06:27:03PM -0700, Ricardo Neri wrote:
> > On Mon, May 15, 2023 at 10:36:08AM +0100, Sudeep Holla wrote:
> > > On Wed, May 10, 2023 at 12:12:07PM -0700, Ricardo Neri wrote:
> > > > Hi,
> > > >
> > > > I had posted a patchset[1] for x86 that initializes
> > > > ci_cacheinfo(cpu)->num_leaves during SMP boot.
> > > >
> > >
> > > It is not entirely clear to me if this is just a cleanup or a fix to some
> > > issue you faced. Just wanted to let you know that Prateek from AMD has a
> > > couple of fixes [2]
> >
> > My first patch is a bug fix. The second patch is a cleanup that results
> > from fixing the bug in patch 1.
> >
> > >
> > > > This means that early_leaves and a late cache_leaves() are equal but
> > > > per_cpu_cacheinfo(cpu) is never allocated. Currently, x86 does not use
> > > > fetch_cache_info().
> > > >
> > > > I think that we should check here that per_cpu_cacheinfo() has been allocated to
> > > > take care of the case in which early and late cache leaves remain the same:
> > > >
> > > > - if (cache_leaves(cpu) <= early_leaves)
> > > > + if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu))
> > > >
> > > > Otherwise, in v6.4-rc1 + [1] I observe a NULL pointer dereference from
> > > > last_level_cache_is_valid().
> > > >
> > >
> > > I think this is a different issue, as Prateek was just observing wrong info
> > > after CPU hotplug operations. But his patches manage the cpumap_populated
> > > state better. Can you please look at them as well?
> >
> > I verified that the patches from Prateek fix a different issue. I was able
> > to reproduce his issue. His patches fix it.
> >
> > I still see my issue after applying Prateek's patches.
>
> Thanks, I thought it was a different issue, and it's good that you were able
> to test them as well. Please post a proper patch for the NULL ptr dereference
> you are hitting on x86.
Gentle ping! Are you still observing the NULL ptr dereference with v6.4-rcX?
If so, can you please post the fix as a proper patch? Some of the patches
in v6.4-rc1 are being backported, so I would prefer to have all the known
issues fixed before that happens. Sorry for the nag, but the backport is the
reason I am pushing for this.
--
Regards,
Sudeep
On Wed, May 31, 2023 at 01:22:01PM +0100, Sudeep Holla wrote:
> On Thu, May 18, 2023 at 10:34:14AM +0100, Sudeep Holla wrote:
> > On Wed, May 17, 2023 at 06:27:03PM -0700, Ricardo Neri wrote:
> > > On Mon, May 15, 2023 at 10:36:08AM +0100, Sudeep Holla wrote:
> > > > On Wed, May 10, 2023 at 12:12:07PM -0700, Ricardo Neri wrote:
> > > > > Hi,
> > > > >
> > > > > I had posted a patchset[1] for x86 that initializes
> > > > > ci_cacheinfo(cpu)->num_leaves during SMP boot.
> > > > >
> > > >
> > > > It is not entirely clear to me if this is just a cleanup or a fix to some
> > > > issue you faced. Just wanted to let you know that Prateek from AMD has a
> > > > couple of fixes [2]
> > >
> > > My first patch is a bug fix. The second patch is a cleanup that results
> > > from fixing the bug in patch 1.
> > >
> > > >
> > > > > This means that early_leaves and a late cache_leaves() are equal but
> > > > > per_cpu_cacheinfo(cpu) is never allocated. Currently, x86 does not use
> > > > > fetch_cache_info().
> > > > >
> > > > > I think that we should check here that per_cpu_cacheinfo() has been allocated to
> > > > > take care of the case in which early and late cache leaves remain the same:
> > > > >
> > > > > - if (cache_leaves(cpu) <= early_leaves)
> > > > > + if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu))
> > > > >
> > > > > Otherwise, in v6.4-rc1 + [1] I observe a NULL pointer dereference from
> > > > > last_level_cache_is_valid().
> > > > >
> > > >
> > > > I think this is a different issue, as Prateek was just observing wrong info
> > > > after CPU hotplug operations. But his patches manage the cpumap_populated
> > > > state better. Can you please look at them as well?
> > >
> > > I verified that the patches from Prateek fix a different issue. I was able
> > > to reproduce his issue. His patches fix it.
> > >
> > > I still see my issue after applying Prateek's patches.
> >
> > Thanks, I thought it was a different issue, and it's good that you were able
> > to test them as well. Please post a proper patch for the NULL ptr dereference
> > you are hitting on x86.
>
> Gentle ping! Are you still observing the NULL ptr dereference with v6.4-rcX?
Yes, I still observe it on v6.4-rc4.
> If so, can you please post the fix as a proper patch? Some of the patches
> in v6.4-rc1 are being backported, so I would prefer to have all the known
> issues fixed before that happens. Sorry for the nag, but the backport is the
> reason I am pushing for this.
Sure. Sorry for the delay. I have the patch ready and will post it this week,
as part of my previous patches in [1].
Thanks and BR,
Ricardo
On Wed, May 31, 2023 at 10:03:36AM -0700, Ricardo Neri wrote:
> On Wed, May 31, 2023 at 01:22:01PM +0100, Sudeep Holla wrote:
> > On Thu, May 18, 2023 at 10:34:14AM +0100, Sudeep Holla wrote:
> > > On Wed, May 17, 2023 at 06:27:03PM -0700, Ricardo Neri wrote:
> > > > On Mon, May 15, 2023 at 10:36:08AM +0100, Sudeep Holla wrote:
> > > > > On Wed, May 10, 2023 at 12:12:07PM -0700, Ricardo Neri wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I had posted a patchset[1] for x86 that initializes
> > > > > > ci_cacheinfo(cpu)->num_leaves during SMP boot.
> > > > > >
> > > > >
> > > > > It is not entirely clear to me if this is just a cleanup or a fix to some
> > > > > issue you faced. Just wanted to let you know that Prateek from AMD has a
> > > > > couple of fixes [2]
> > > >
> > > > My first patch is a bug fix. The second patch is a cleanup that results
> > > > from fixing the bug in patch 1.
> > > >
> > > > >
> > > > > > This means that early_leaves and a late cache_leaves() are equal but
> > > > > > per_cpu_cacheinfo(cpu) is never allocated. Currently, x86 does not use
> > > > > > fetch_cache_info().
> > > > > >
> > > > > > I think that we should check here that per_cpu_cacheinfo() has been allocated to
> > > > > > take care of the case in which early and late cache leaves remain the same:
> > > > > >
> > > > > > - if (cache_leaves(cpu) <= early_leaves)
> > > > > > + if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu))
> > > > > >
> > > > > > Otherwise, in v6.4-rc1 + [1] I observe a NULL pointer dereference from
> > > > > > last_level_cache_is_valid().
> > > > > >
> > > > >
> > > > > I think this is a different issue, as Prateek was just observing wrong info
> > > > > after CPU hotplug operations. But his patches manage the cpumap_populated
> > > > > state better. Can you please look at them as well?
> > > >
> > > > I verified that the patches from Prateek fix a different issue. I was able
> > > > to reproduce his issue. His patches fix it.
> > > >
> > > > I still see my issue after applying Prateek's patches.
> > >
> > > Thanks, I thought it was a different issue, and it's good that you were able
> > > to test them as well. Please post a proper patch for the NULL ptr dereference
> > > you are hitting on x86.
> >
> > Gentle ping! Are you still observing the NULL ptr dereference with v6.4-rcX?
>
> Yes, I still observe it on v6.4-rc4.
>
> > If so, can you please post the fix as a proper patch? Some of the patches
> > in v6.4-rc1 are being backported, so I would prefer to have all the known
> > issues fixed before that happens. Sorry for the nag, but the backport is the
> > reason I am pushing for this.
>
> Sure. Sorry for the delay. I have the patch ready and will post it this week,
> as part of my previous patches in [1].
I have finally posted the patchset, Sudeep. You can take a look here:
https://lore.kernel.org/all/[email protected]/
Sorry for the delay. I had to jump through various hoops before posting.
Thanks and BR,
Ricardo