Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU")
tries to build the cacheinfo from the primary CPU prior to secondary
CPUs boot, if the DT/ACPI description contains cache information.
However, if such information is not present, it still reverts to the old
behavior, which allocates the cacheinfo memory on each secondary CPU. On
RT kernels, this triggers a "BUG: sleeping function called from invalid
context" because the allocation is done before preemption is first
enabled on the secondary CPU.
The solution is to add cache information to DT/ACPI, but at least on
arm64 systems this can be avoided by leveraging automatic detection
(through the CLIDR_EL1 register), which is already implemented but
currently doesn't work on RT kernels for the reason described above.
This patch series attempts to enable automatic detection for RT kernels
when no DT/ACPI cache information is available, by pre-allocating
cacheinfo memory on the primary CPU.
The first patch adds an architecture independent infrastructure that
allows architecture specific code to take an early guess at the number
of cache leaves of the secondary CPUs, while it runs in preemptible
context on the primary CPU. At the same time, it gives architecture
specific code the opportunity to go back later, while it runs on the
secondary CPU, and reallocate the cacheinfo memory if the initial guess
proves to be wrong.
The second patch leverages the infrastructure implemented in the first
patch and enables early cache depth detection for arm64.
The patch series is based on an RFC patch that was posted to the
linux-arm-kernel mailing list and discussed with a smaller audience:
https://lore.kernel.org/all/[email protected]/
Changes to v2:
* Address minor coding style issue (unbalanced braces).
* Move cacheinfo reallocation logic from detect_cache_attributes() to a
new function to improve code readability.
* Minor fix to cacheinfo reallocation logic to avoid a new detection of
the cache level if/when detect_cache_attributes() is called again.
Radu Rendec (2):
cacheinfo: Add arch specific early level initializer
cacheinfo: Add arm64 early level initializer implementation
arch/arm64/kernel/cacheinfo.c | 32 +++++++++++----
drivers/base/cacheinfo.c | 75 +++++++++++++++++++++++++----------
include/linux/cacheinfo.h | 2 +
3 files changed, 79 insertions(+), 30 deletions(-)
--
2.39.2
This patch gives architecture specific code the ability to initialize
the cache level and allocate cacheinfo memory early, when cache level
initialization runs on the primary CPU for all possible CPUs.
This is part of a patch series that attempts to further the work in
commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU").
Previously, in the absence of any DT/ACPI cache info, architecture
specific cache detection and info allocation for secondary CPUs would
happen in non-preemptible context during early CPU initialization and
trigger a "BUG: sleeping function called from invalid context" splat on
an RT kernel.
More specifically, this patch adds the early_cache_level() function,
which is called by fetch_cache_info() as a fallback when the number of
cache leaves cannot be extracted from DT/ACPI. In the default generic
(weak) implementation, this new function returns -ENOENT, which
preserves the original behavior for architectures that do not implement
the function.
Since early detection can get the number of cache leaves wrong in some
cases*, additional logic is added to still call init_cache_level() later
on the secondary CPU, therefore giving the architecture specific code an
opportunity to go back and fix the initial guess. Again, the original
behavior is preserved for architectures that do not implement the new
function.
* For example, on arm64, CLIDR_EL1 detection works only when it runs on
the current CPU. In other words, a CPU cannot detect the cache depth
for any other CPU than itself.
Signed-off-by: Radu Rendec <[email protected]>
---
drivers/base/cacheinfo.c | 75 +++++++++++++++++++++++++++------------
include/linux/cacheinfo.h | 2 ++
2 files changed, 55 insertions(+), 22 deletions(-)
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index f6573c335f4c..30f5553d3ebb 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -398,6 +398,11 @@ static void free_cache_attributes(unsigned int cpu)
cache_shared_cpu_map_remove(cpu);
}
+int __weak early_cache_level(unsigned int cpu)
+{
+ return -ENOENT;
+}
+
int __weak init_cache_level(unsigned int cpu)
{
return -ENOENT;
@@ -423,56 +428,82 @@ int allocate_cache_info(int cpu)
int fetch_cache_info(unsigned int cpu)
{
- struct cpu_cacheinfo *this_cpu_ci;
+ struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
unsigned int levels = 0, split_levels = 0;
int ret;
if (acpi_disabled) {
ret = init_of_cache_level(cpu);
- if (ret < 0)
- return ret;
} else {
ret = acpi_get_cache_info(cpu, &levels, &split_levels);
- if (ret < 0)
+ if (!ret) {
+ this_cpu_ci->num_levels = levels;
+ /*
+ * This assumes that:
+ * - there cannot be any split caches (data/instruction)
+ * above a unified cache
+ * - data/instruction caches come by pair
+ */
+ this_cpu_ci->num_leaves = levels + split_levels;
+ }
+ }
+
+ if (ret || !cache_leaves(cpu)) {
+ ret = early_cache_level(cpu);
+ if (ret)
return ret;
- this_cpu_ci = get_cpu_cacheinfo(cpu);
- this_cpu_ci->num_levels = levels;
- /*
- * This assumes that:
- * - there cannot be any split caches (data/instruction)
- * above a unified cache
- * - data/instruction caches come by pair
- */
- this_cpu_ci->num_leaves = levels + split_levels;
+ if (!cache_leaves(cpu))
+ return -ENOENT;
+
+ this_cpu_ci->early_arch_info = true;
}
- if (!cache_leaves(cpu))
- return -ENOENT;
return allocate_cache_info(cpu);
}
-int detect_cache_attributes(unsigned int cpu)
+static inline int init_level_allocate_ci(unsigned int cpu)
{
- int ret;
+ unsigned int early_leaves = cache_leaves(cpu);
/* Since early initialization/allocation of the cacheinfo is allowed
* via fetch_cache_info() and this also gets called as CPU hotplug
* callbacks via cacheinfo_cpu_online, the init/alloc can be skipped
* as it will happen only once (the cacheinfo memory is never freed).
- * Just populate the cacheinfo.
+ * Just populate the cacheinfo. However, if the cacheinfo has been
+ * allocated early through the arch-specific early_cache_level() call,
+ * there is a chance the info is wrong (this can happen on arm64). In
+ * that case, call init_cache_level() anyway to give the arch-specific
+ * code a chance to make things right.
*/
- if (per_cpu_cacheinfo(cpu))
- goto populate_leaves;
+ if (per_cpu_cacheinfo(cpu) && !ci_cacheinfo(cpu)->early_arch_info)
+ return 0;
if (init_cache_level(cpu) || !cache_leaves(cpu))
return -ENOENT;
- ret = allocate_cache_info(cpu);
+ /*
+ * Now that we have properly initialized the cache level info, make
+ * sure we don't try to do that again the next time we are called
+ * (e.g. as CPU hotplug callbacks).
+ */
+ ci_cacheinfo(cpu)->early_arch_info = false;
+
+ if (cache_leaves(cpu) <= early_leaves)
+ return 0;
+
+ kfree(per_cpu_cacheinfo(cpu));
+ return allocate_cache_info(cpu);
+}
+
+int detect_cache_attributes(unsigned int cpu)
+{
+ int ret;
+
+ ret = init_level_allocate_ci(cpu);
if (ret)
return ret;
-populate_leaves:
/*
* populate_cache_leaves() may completely setup the cache leaves and
* shared_cpu_map or it may leave it partially setup.
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 908e19d17f49..c9d44308fc42 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -76,9 +76,11 @@ struct cpu_cacheinfo {
unsigned int num_levels;
unsigned int num_leaves;
bool cpu_map_populated;
+ bool early_arch_info;
};
struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu);
+int early_cache_level(unsigned int cpu);
int init_cache_level(unsigned int cpu);
int init_of_cache_level(unsigned int cpu);
int populate_cache_leaves(unsigned int cpu);
--
2.39.2
This patch adds an architecture specific early cache level detection
handler for arm64. This is basically the CLIDR_EL1 based detection that
was previously done (only) in init_cache_level().
This is part of a patch series that attempts to further the work in
commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU").
Previously, in the absence of any DT/ACPI cache info, architecture
specific cache detection and info allocation for secondary CPUs would
happen in non-preemptible context during early CPU initialization and
trigger a "BUG: sleeping function called from invalid context" splat on
an RT kernel.
This patch does not solve the problem completely for RT kernels. It
relies on the assumption that on most systems, the CPUs are symmetrical
and therefore have the same number of cache leaves. The cacheinfo memory
is allocated early (on the primary CPU), relying on the new handler. If
later (when CLIDR_EL1 based detection runs again on the secondary CPU)
the initial assumption proves to be wrong and the CPU has in fact more
leaves, the cacheinfo memory is reallocated, and that still triggers a
splat on an RT kernel.
In other words, asymmetrical CPU systems *must* still provide cacheinfo
data in DT/ACPI to avoid the splat on RT kernels (unless secondary CPUs
happen to have fewer leaves than the primary CPU). But symmetrical CPU
systems (the majority) can now get away without the additional DT/ACPI
data and rely on CLIDR_EL1 based detection.
Signed-off-by: Radu Rendec <[email protected]>
---
arch/arm64/kernel/cacheinfo.c | 32 ++++++++++++++++++++++++--------
1 file changed, 24 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
index c307f69e9b55..520d17e4ebe9 100644
--- a/arch/arm64/kernel/cacheinfo.c
+++ b/arch/arm64/kernel/cacheinfo.c
@@ -38,21 +38,37 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
this_leaf->type = type;
}
-int init_cache_level(unsigned int cpu)
+static void detect_cache_level(unsigned int *level, unsigned int *leaves)
{
- unsigned int ctype, level, leaves;
- int fw_level, ret;
- struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
+ unsigned int ctype;
- for (level = 1, leaves = 0; level <= MAX_CACHE_LEVEL; level++) {
- ctype = get_cache_type(level);
+ for (*level = 1, *leaves = 0; *level <= MAX_CACHE_LEVEL; (*level)++) {
+ ctype = get_cache_type(*level);
if (ctype == CACHE_TYPE_NOCACHE) {
- level--;
+ (*level)--;
break;
}
/* Separate instruction and data caches */
- leaves += (ctype == CACHE_TYPE_SEPARATE) ? 2 : 1;
+ *leaves += (ctype == CACHE_TYPE_SEPARATE) ? 2 : 1;
}
+}
+
+int early_cache_level(unsigned int cpu)
+{
+ struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
+
+ detect_cache_level(&this_cpu_ci->num_levels, &this_cpu_ci->num_leaves);
+
+ return 0;
+}
+
+int init_cache_level(unsigned int cpu)
+{
+ unsigned int level, leaves;
+ int fw_level, ret;
+ struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
+
+ detect_cache_level(&level, &leaves);
if (acpi_disabled) {
fw_level = of_find_last_cache_level(cpu);
--
2.39.2
Hello Radu,
Some additional points:
1-
For the record, Will made a comment about adding weak functions
(cf. https://lore.kernel.org/all/20230327121734.GB31342@willie-the-truck/)
but I don't see how it could be done otherwise ...
2-
The patch-set needs to be rebased on top of v6.3-rc6,
otherwise there is a merge conflict.
3-
When trying the patch-set on an ACPI platform with no PPTT, it seems that
fetch_cache_info() is not called from init_cpu_topology() because
parse_acpi_topology() returns an error code. This results in a
'sleeping function called from invalid context' message. The following made
it work for me:
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -838,7 +838,6 @@ void __init init_cpu_topology(void)
* don't use partial information.
*/
reset_cpu_topology();
- return;
}
for_each_possible_cpu(cpu) {
With 2 and 3 addressed:
Reviewed-by: Pierre Gondois <[email protected]>
Also maybe wait for Sudeep to have a look before sending a v4,
Regards,
Pierre
On 4/7/23 01:39, Radu Rendec wrote:
> Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU")
> tries to build the cacheinfo from the primary CPU prior to secondary
> CPUs boot, if the DT/ACPI description contains cache information.
> However, if such information is not present, it still reverts to the old
> behavior, which allocates the cacheinfo memory on each secondary CPU. On
> RT kernels, this triggers a "BUG: sleeping function called from invalid
> context" because the allocation is done before preemption is first
> enabled on the secondary CPU.
>
> The solution is to add cache information to DT/ACPI, but at least on
> arm64 systems this can be avoided by leveraging automatic detection
> (through the CLIDR_EL1 register), which is already implemented but
> currently doesn't work on RT kernels for the reason described above.
>
> This patch series attempts to enable automatic detection for RT kernels
> when no DT/ACPI cache information is available, by pre-allocating
> cacheinfo memory on the primary CPU.
>
> The first patch adds an architecture independent infrastructure that
> allows architecture specific code to take an early guess at the number
> of cache leaves of the secondary CPUs, while it runs in preemptible
> context on the primary CPU. At the same time, it gives architecture
> specific code the opportunity to go back later, while it runs on the
> secondary CPU, and reallocate the cacheinfo memory if the initial guess
> proves to be wrong.
>
> The second patch leverages the infrastructure implemented in the first
> patch and enables early cache depth detection for arm64.
>
> The patch series is based on an RFC patch that was posted to the
> linux-arm-kernel mailing list and discussed with a smaller audience:
> https://lore.kernel.org/all/[email protected]/
>
> Changes to v2:
> * Address minor coding style issue (unbalanced braces).
> * Move cacheinfo reallocation logic from detect_cache_attributes() to a
> new function to improve code readability.
> * Minor fix to cacheinfo reallocation logic to avoid a new detection of
> the cache level if/when detect_cache_attributes() is called again.
>
> Radu Rendec (2):
> cacheinfo: Add arch specific early level initializer
> cacheinfo: Add arm64 early level initializer implementation
>
> arch/arm64/kernel/cacheinfo.c | 32 +++++++++++----
> drivers/base/cacheinfo.c | 75 +++++++++++++++++++++++++----------
> include/linux/cacheinfo.h | 2 +
> 3 files changed, 79 insertions(+), 30 deletions(-)
>
Hello Pierre,
On Tue, 2023-04-11 at 14:36 +0200, Pierre Gondois wrote:
> Hello Radu,
> Some additional points:
> 1-
> For the record, Will made a comment about adding weak functions
> (cf. https://lore.kernel.org/all/20230327121734.GB31342@willie-the-truck/)
> but I don't see how it could be done otherwise ...
In that comment, Will suggested using static inline functions in a
header. It would probably work but for the sake of consistency with
init_cache_level() I would argue it's better to use a weak function in
this particular case.
> 2-
> The patch-set needs to be rebased on top of v6.3-rc6,
> otherwise there is a merge conflict.
Fair enough. I worked off of linux-rt-devel for obvious reasons, and
forgot to rebase. It's a trivial conflict, which both "git rebase" and
"git am -3" can fix automatically. I will keep this in mind for v4.
> 3-
> When trying the patch-set on an ACPI platform with no PPTT, it seems that
> fetch_cache_info() is not called from init_cpu_topology() because
> parse_acpi_topology() returns an error code. This results in a
> 'sleeping function called from invalid context' message. The following made
> it work for me:
>
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -838,7 +838,6 @@ void __init init_cpu_topology(void)
> * don't use partial information.
> */
> reset_cpu_topology();
> - return;
> }
>
> for_each_possible_cpu(cpu) {
Good catch! I think this calls for a dedicated patch in the series to
do just that and explain why the "return" statement is being removed.
> With 2 and 3 addressed:
> Reviewed-by: Pierre Gondois <[email protected]>
Thanks again for reviewing these patches and for all your input!
> Also maybe wait for Sudeep to have a look before sending a v4,
Sure. Looking at other patches, he seems to respond pretty quickly, so
I'll wait until tomorrow and then send v4 if I don't hear back.
Best regards,
Radu
> On 4/7/23 01:39, Radu Rendec wrote:
> > Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU")
> > tries to build the cacheinfo from the primary CPU prior to secondary
> > CPUs boot, if the DT/ACPI description contains cache information.
> > However, if such information is not present, it still reverts to the old
> > behavior, which allocates the cacheinfo memory on each secondary CPU. On
> > RT kernels, this triggers a "BUG: sleeping function called from invalid
> > context" because the allocation is done before preemption is first
> > enabled on the secondary CPU.
> >
> > The solution is to add cache information to DT/ACPI, but at least on
> > arm64 systems this can be avoided by leveraging automatic detection
> > (through the CLIDR_EL1 register), which is already implemented but
> > currently doesn't work on RT kernels for the reason described above.
> >
> > This patch series attempts to enable automatic detection for RT kernels
> > when no DT/ACPI cache information is available, by pre-allocating
> > cacheinfo memory on the primary CPU.
> >
> > The first patch adds an architecture independent infrastructure that
> > allows architecture specific code to take an early guess at the number
> > of cache leaves of the secondary CPUs, while it runs in preemptible
> > context on the primary CPU. At the same time, it gives architecture
> > specific code the opportunity to go back later, while it runs on the
> > secondary CPU, and reallocate the cacheinfo memory if the initial guess
> > proves to be wrong.
> >
> > The second patch leverages the infrastructure implemented in the first
> > patch and enables early cache depth detection for arm64.
> >
> > The patch series is based on an RFC patch that was posted to the
> > linux-arm-kernel mailing list and discussed with a smaller audience:
> > https://lore.kernel.org/all/[email protected]/
> >
> > Changes to v2:
> > * Address minor coding style issue (unbalanced braces).
> > * Move cacheinfo reallocation logic from detect_cache_attributes() to a
> > new function to improve code readability.
> > * Minor fix to cacheinfo reallocation logic to avoid a new detection of
> > the cache level if/when detect_cache_attributes() is called again.
> >
> > Radu Rendec (2):
> > cacheinfo: Add arch specific early level initializer
> > cacheinfo: Add arm64 early level initializer implementation
> >
> > arch/arm64/kernel/cacheinfo.c | 32 +++++++++++----
> > drivers/base/cacheinfo.c | 75 +++++++++++++++++++++++++----------
> > include/linux/cacheinfo.h | 2 +
> > 3 files changed, 79 insertions(+), 30 deletions(-)
> >
>
On Thu, Apr 06, 2023 at 07:39:25PM -0400, Radu Rendec wrote:
> This patch gives architecture specific code the ability to initialize
> the cache level and allocate cacheinfo memory early, when cache level
> initialization runs on the primary CPU for all possible CPUs.
>
> This is part of a patch series that attempts to further the work in
> commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU").
> Previously, in the absence of any DT/ACPI cache info, architecture
> specific cache detection and info allocation for secondary CPUs would
> happen in non-preemptible context during early CPU initialization and
> trigger a "BUG: sleeping function called from invalid context" splat on
> an RT kernel.
>
> More specifically, this patch adds the early_cache_level() function,
> which is called by fetch_cache_info() as a fallback when the number of
> cache leaves cannot be extracted from DT/ACPI. In the default generic
> (weak) implementation, this new function returns -ENOENT, which
> preserves the original behavior for architectures that do not implement
> the function.
>
> Since early detection can get the number of cache leaves wrong in some
> cases*, additional logic is added to still call init_cache_level() later
> on the secondary CPU, therefore giving the architecture specific code an
> opportunity to go back and fix the initial guess. Again, the original
> behavior is preserved for architectures that do not implement the new
> function.
>
> * For example, on arm64, CLIDR_EL1 detection works only when it runs on
> the current CPU. In other words, a CPU cannot detect the cache depth
> for any other CPU than itself.
>
Thanks for the detailed description and putting this together.
> Signed-off-by: Radu Rendec <[email protected]>
> ---
> drivers/base/cacheinfo.c | 75 +++++++++++++++++++++++++++------------
> include/linux/cacheinfo.h | 2 ++
> 2 files changed, 55 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> index f6573c335f4c..30f5553d3ebb 100644
> --- a/drivers/base/cacheinfo.c
> +++ b/drivers/base/cacheinfo.c
> @@ -398,6 +398,11 @@ static void free_cache_attributes(unsigned int cpu)
> cache_shared_cpu_map_remove(cpu);
> }
>
> +int __weak early_cache_level(unsigned int cpu)
> +{
> + return -ENOENT;
> +}
> +
> int __weak init_cache_level(unsigned int cpu)
> {
> return -ENOENT;
> @@ -423,56 +428,82 @@ int allocate_cache_info(int cpu)
>
> int fetch_cache_info(unsigned int cpu)
> {
> - struct cpu_cacheinfo *this_cpu_ci;
> + struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> unsigned int levels = 0, split_levels = 0;
> int ret;
>
> if (acpi_disabled) {
> ret = init_of_cache_level(cpu);
> - if (ret < 0)
> - return ret;
> } else {
> ret = acpi_get_cache_info(cpu, &levels, &split_levels);
> - if (ret < 0)
> + if (!ret) {
> + this_cpu_ci->num_levels = levels;
> + /*
> + * This assumes that:
> + * - there cannot be any split caches (data/instruction)
> + * above a unified cache
> + * - data/instruction caches come by pair
> + */
> + this_cpu_ci->num_leaves = levels + split_levels;
> + }
> + }
> +
> + if (ret || !cache_leaves(cpu)) {
> + ret = early_cache_level(cpu);
> + if (ret)
> return ret;
>
> - this_cpu_ci = get_cpu_cacheinfo(cpu);
> - this_cpu_ci->num_levels = levels;
> - /*
> - * This assumes that:
> - * - there cannot be any split caches (data/instruction)
> - * above a unified cache
> - * - data/instruction caches come by pair
> - */
> - this_cpu_ci->num_leaves = levels + split_levels;
> + if (!cache_leaves(cpu))
> + return -ENOENT;
> +
> + this_cpu_ci->early_arch_info = true;
> }
> - if (!cache_leaves(cpu))
> - return -ENOENT;
>
> return allocate_cache_info(cpu);
> }
>
> -int detect_cache_attributes(unsigned int cpu)
> +static inline int init_level_allocate_ci(unsigned int cpu)
> {
> - int ret;
> + unsigned int early_leaves = cache_leaves(cpu);
>
> /* Since early initialization/allocation of the cacheinfo is allowed
> * via fetch_cache_info() and this also gets called as CPU hotplug
> * callbacks via cacheinfo_cpu_online, the init/alloc can be skipped
> * as it will happen only once (the cacheinfo memory is never freed).
> - * Just populate the cacheinfo.
> + * Just populate the cacheinfo. However, if the cacheinfo has been
> + * allocated early through the arch-specific early_cache_level() call,
> + * there is a chance the info is wrong (this can happen on arm64). In
> + * that case, call init_cache_level() anyway to give the arch-specific
> + * code a chance to make things right.
> */
> - if (per_cpu_cacheinfo(cpu))
> - goto populate_leaves;
> + if (per_cpu_cacheinfo(cpu) && !ci_cacheinfo(cpu)->early_arch_info)
> + return 0;
>
> if (init_cache_level(cpu) || !cache_leaves(cpu))
> return -ENOENT;
>
> - ret = allocate_cache_info(cpu);
> + /*
> + * Now that we have properly initialized the cache level info, make
> + * sure we don't try to do that again the next time we are called
> + * (e.g. as CPU hotplug callbacks).
> + */
> + ci_cacheinfo(cpu)->early_arch_info = false;
I am wondering if it makes sense to rename this as early_ci_levels or
something similar, to indicate it has to do with just the level information?
If not, it needs to be documented, since the variable name is not more
specific. I am sure I will forget it and will be wondering about it in a
few months' time :-).
Other than that, it looks good. I will try to push this for v6.4, but it
may be a bit late, as it is good to have it in -next for some time to get
more testing. Anyway, send v4; I will put it into -next ASAP and see what
is the best course of action after that.
--
Regards,
Sudeep
On Thu, Apr 06, 2023 at 07:39:26PM -0400, Radu Rendec wrote:
> This patch adds an architecture specific early cache level detection
> handler for arm64. This is basically the CLIDR_EL1 based detection that
> was previously done (only) in init_cache_level().
>
> This is part of a patch series that attempts to further the work in
> commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU").
> Previously, in the absence of any DT/ACPI cache info, architecture
> specific cache detection and info allocation for secondary CPUs would
> happen in non-preemptible context during early CPU initialization and
> trigger a "BUG: sleeping function called from invalid context" splat on
> an RT kernel.
>
> This patch does not solve the problem completely for RT kernels. It
> relies on the assumption that on most systems, the CPUs are symmetrical
> and therefore have the same number of cache leaves. The cacheinfo memory
> is allocated early (on the primary CPU), relying on the new handler. If
> later (when CLIDR_EL1 based detection runs again on the secondary CPU)
> the initial assumption proves to be wrong and the CPU has in fact more
> leaves, the cacheinfo memory is reallocated, and that still triggers a
> splat on an RT kernel.
>
> In other words, asymmetrical CPU systems *must* still provide cacheinfo
> data in DT/ACPI to avoid the splat on RT kernels (unless secondary CPUs
> happen to have fewer leaves than the primary CPU). But symmetrical CPU
> systems (the majority) can now get away without the additional DT/ACPI
> data and rely on CLIDR_EL1 based detection.
>
> Signed-off-by: Radu Rendec <[email protected]>
> ---
> arch/arm64/kernel/cacheinfo.c | 32 ++++++++++++++++++++++++--------
> 1 file changed, 24 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
> index c307f69e9b55..520d17e4ebe9 100644
> --- a/arch/arm64/kernel/cacheinfo.c
> +++ b/arch/arm64/kernel/cacheinfo.c
> @@ -38,21 +38,37 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
> this_leaf->type = type;
> }
>
> -int init_cache_level(unsigned int cpu)
> +static void detect_cache_level(unsigned int *level, unsigned int *leaves)
> {
> - unsigned int ctype, level, leaves;
> - int fw_level, ret;
> - struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> + unsigned int ctype;
>
> - for (level = 1, leaves = 0; level <= MAX_CACHE_LEVEL; level++) {
> - ctype = get_cache_type(level);
> + for (*level = 1, *leaves = 0; *level <= MAX_CACHE_LEVEL; (*level)++) {
> + ctype = get_cache_type(*level);
> if (ctype == CACHE_TYPE_NOCACHE) {
> - level--;
> + (*level)--;
> break;
> }
> /* Separate instruction and data caches */
> - leaves += (ctype == CACHE_TYPE_SEPARATE) ? 2 : 1;
> + *leaves += (ctype == CACHE_TYPE_SEPARATE) ? 2 : 1;
> }
> +}
I prefer to use locals and assign the value, to keep it simple and easy to
follow. The compiler can/will optimise this anyway. But I am fine either way.
I need Will's (or Catalin's) ack if I have to take the changes via Greg's tree.
--
Regards,
Sudeep
On Wed, 2023-04-12 at 12:36 +0100, Sudeep Holla wrote:
> On Thu, Apr 06, 2023 at 07:39:25PM -0400, Radu Rendec wrote:
> > This patch gives architecture specific code the ability to initialize
> > the cache level and allocate cacheinfo memory early, when cache level
> > initialization runs on the primary CPU for all possible CPUs.
> >
> > This is part of a patch series that attempts to further the work in
> > commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU").
> > Previously, in the absence of any DT/ACPI cache info, architecture
> > specific cache detection and info allocation for secondary CPUs would
> > happen in non-preemptible context during early CPU initialization and
> > trigger a "BUG: sleeping function called from invalid context" splat on
> > an RT kernel.
> >
> > More specifically, this patch adds the early_cache_level() function,
> > which is called by fetch_cache_info() as a fallback when the number of
> > cache leaves cannot be extracted from DT/ACPI. In the default generic
> > (weak) implementation, this new function returns -ENOENT, which
> > preserves the original behavior for architectures that do not implement
> > the function.
> >
> > Since early detection can get the number of cache leaves wrong in some
> > cases*, additional logic is added to still call init_cache_level() later
> > on the secondary CPU, therefore giving the architecture specific code an
> > opportunity to go back and fix the initial guess. Again, the original
> > behavior is preserved for architectures that do not implement the new
> > function.
> >
> > * For example, on arm64, CLIDR_EL1 detection works only when it runs on
> > the current CPU. In other words, a CPU cannot detect the cache depth
> > for any other CPU than itself.
> >
>
> Thanks for the detailed description and putting this together.
No problem. Happy to help!
> > Signed-off-by: Radu Rendec <[email protected]>
> > ---
> > drivers/base/cacheinfo.c | 75 +++++++++++++++++++++++++++------------
> > include/linux/cacheinfo.h | 2 ++
> > 2 files changed, 55 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> > index f6573c335f4c..30f5553d3ebb 100644
> > --- a/drivers/base/cacheinfo.c
> > +++ b/drivers/base/cacheinfo.c
> > @@ -398,6 +398,11 @@ static void free_cache_attributes(unsigned int cpu)
> > cache_shared_cpu_map_remove(cpu);
> > }
> >
> > +int __weak early_cache_level(unsigned int cpu)
> > +{
> > + return -ENOENT;
> > +}
> > +
> > int __weak init_cache_level(unsigned int cpu)
> > {
> > return -ENOENT;
> > @@ -423,56 +428,82 @@ int allocate_cache_info(int cpu)
> >
> > int fetch_cache_info(unsigned int cpu)
> > {
> > - struct cpu_cacheinfo *this_cpu_ci;
> > + struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> > unsigned int levels = 0, split_levels = 0;
> > int ret;
> >
> > if (acpi_disabled) {
> > ret = init_of_cache_level(cpu);
> > - if (ret < 0)
> > - return ret;
> > } else {
> > ret = acpi_get_cache_info(cpu, &levels, &split_levels);
> > - if (ret < 0)
> > + if (!ret) {
> > + this_cpu_ci->num_levels = levels;
> > + /*
> > + * This assumes that:
> > + * - there cannot be any split caches (data/instruction)
> > + * above a unified cache
> > + * - data/instruction caches come by pair
> > + */
> > + this_cpu_ci->num_leaves = levels + split_levels;
> > + }
> > + }
> > +
> > + if (ret || !cache_leaves(cpu)) {
> > + ret = early_cache_level(cpu);
> > + if (ret)
> > return ret;
> >
> > - this_cpu_ci = get_cpu_cacheinfo(cpu);
> > - this_cpu_ci->num_levels = levels;
> > - /*
> > - * This assumes that:
> > - * - there cannot be any split caches (data/instruction)
> > - * above a unified cache
> > - * - data/instruction caches come by pair
> > - */
> > - this_cpu_ci->num_leaves = levels + split_levels;
> > + if (!cache_leaves(cpu))
> > + return -ENOENT;
> > +
> > + this_cpu_ci->early_arch_info = true;
> > }
> > - if (!cache_leaves(cpu))
> > - return -ENOENT;
> >
> > return allocate_cache_info(cpu);
> > }
> >
> > -int detect_cache_attributes(unsigned int cpu)
> > +static inline int init_level_allocate_ci(unsigned int cpu)
> > {
> > - int ret;
> > + unsigned int early_leaves = cache_leaves(cpu);
> >
> > /* Since early initialization/allocation of the cacheinfo is allowed
> > * via fetch_cache_info() and this also gets called as CPU hotplug
> > * callbacks via cacheinfo_cpu_online, the init/alloc can be skipped
> > * as it will happen only once (the cacheinfo memory is never freed).
> > - * Just populate the cacheinfo.
> > + * Just populate the cacheinfo. However, if the cacheinfo has been
> > + * allocated early through the arch-specific early_cache_level() call,
> > + * there is a chance the info is wrong (this can happen on arm64). In
> > + * that case, call init_cache_level() anyway to give the arch-specific
> > + * code a chance to make things right.
> > */
> > - if (per_cpu_cacheinfo(cpu))
> > - goto populate_leaves;
> > + if (per_cpu_cacheinfo(cpu) && !ci_cacheinfo(cpu)->early_arch_info)
> > + return 0;
> >
> > if (init_cache_level(cpu) || !cache_leaves(cpu))
> > return -ENOENT;
> >
> > - ret = allocate_cache_info(cpu);
> > + /*
> > + * Now that we have properly initialized the cache level info, make
> > + * sure we don't try to do that again the next time we are called
> > + * (e.g. as CPU hotplug callbacks).
> > + */
> > + ci_cacheinfo(cpu)->early_arch_info = false;
>
> I am wondering if it makes sense to rename this as early_ci_levels or
> something similar, to indicate it has to do with just the level
> information? If not, it needs to be documented, since the variable name
> is not more specific. I am sure I will forget it and will be left
> wondering in a few months' time :).
Now that you mentioned it, I think it makes perfect sense to rename it.
I like early_ci_levels; I will use that in v4.
> Other than that, it looks good. I will try to push this for v6.4, but it
> may be a bit late as it is good to have it in -next for some time to get
> more testing. Anyway, send v4; I will put it into -next ASAP and see what
> the best course of action is after that.
Sounds great. Thanks for reviewing the patches and for your input!
Best regards,
Radu
On Wed, 2023-04-12 at 12:40 +0100, Sudeep Holla wrote:
> On Thu, Apr 06, 2023 at 07:39:26PM -0400, Radu Rendec wrote:
> > This patch adds an architecture specific early cache level detection
> > handler for arm64. This is basically the CLIDR_EL1 based detection that
> > was previously done (only) in init_cache_level().
> >
> > This is part of a patch series that attempts to further the work in
> > commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU").
> > Previously, in the absence of any DT/ACPI cache info, architecture
> > specific cache detection and info allocation for secondary CPUs would
> > happen in non-preemptible context during early CPU initialization and
> > trigger a "BUG: sleeping function called from invalid context" splat on
> > an RT kernel.
> >
> > This patch does not solve the problem completely for RT kernels. It
> > relies on the assumption that on most systems, the CPUs are symmetrical
> > and therefore have the same number of cache leaves. The cacheinfo memory
> > is allocated early (on the primary CPU), relying on the new handler. If
> > later (when CLIDR_EL1 based detection runs again on the secondary CPU)
> > the initial assumption proves to be wrong and the CPU has in fact more
> > leaves, the cacheinfo memory is reallocated, and that still triggers a
> > splat on an RT kernel.
> >
> > In other words, asymmetrical CPU systems *must* still provide cacheinfo
> > data in DT/ACPI to avoid the splat on RT kernels (unless secondary CPUs
> > happen to have fewer leaves than the primary CPU). But symmetrical CPU
> > systems (the majority) can now get away without the additional DT/ACPI
> > data and rely on CLIDR_EL1 based detection.
> >
> > Signed-off-by: Radu Rendec <[email protected]>
> > ---
> > arch/arm64/kernel/cacheinfo.c | 32 ++++++++++++++++++++++++--------
> > 1 file changed, 24 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
> > index c307f69e9b55..520d17e4ebe9 100644
> > --- a/arch/arm64/kernel/cacheinfo.c
> > +++ b/arch/arm64/kernel/cacheinfo.c
> > @@ -38,21 +38,37 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
> > this_leaf->type = type;
> > }
> >
> > -int init_cache_level(unsigned int cpu)
> > +static void detect_cache_level(unsigned int *level, unsigned int *leaves)
> > {
> > - unsigned int ctype, level, leaves;
> > - int fw_level, ret;
> > - struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
> > + unsigned int ctype;
> >
> > - for (level = 1, leaves = 0; level <= MAX_CACHE_LEVEL; level++) {
> > - ctype = get_cache_type(level);
> > + for (*level = 1, *leaves = 0; *level <= MAX_CACHE_LEVEL; (*level)++) {
> > + ctype = get_cache_type(*level);
> > if (ctype == CACHE_TYPE_NOCACHE) {
> > - level--;
> > + (*level)--;
> > break;
> > }
> > /* Separate instruction and data caches */
> > - leaves += (ctype == CACHE_TYPE_SEPARATE) ? 2 : 1;
> > + *leaves += (ctype == CACHE_TYPE_SEPARATE) ? 2 : 1;
> > }
> > +}
>
> I prefer to use locals and assign the value to keep it simple/easy to follow.
> Compiler can/will optimise this anyway. But I am fine either way.
To be honest, I was on the fence about this and decided to go with the
pointers, but now that you brought it up, I changed my mind :)
If I keep the original names for the locals and use something else for
the arguments, the patch will look cleaner and it will be obvious to
anyone looking at it that the algorithm for counting the levels/leaves
is unchanged.
Best regards,
Radu
> I need Will's(or Catalin's) ack if I have to take the changes via
> Greg's tree.
>