2022-07-15 10:46:22

by Sudeep Holla

Subject: [PATCH -next v2 1/2] cacheinfo: Use atomic allocation for percpu cache attributes

On a couple of architectures such as RISC-V and ARM64, we need to detect
cache attributes quite early during boot, when the secondary CPUs
start. detect_cache_attributes() will therefore be called in atomic
context, and since a normal allocation can sleep, we will end up with a
"sleeping in the atomic context" bug splat.

To avoid that, switch the allocation to the atomic version, in
preparation for moving the actual detection of cache attributes into the
CPU hotplug path, which is atomic.

Cc: Ionela Voinescu <[email protected]>
Tested-by: Conor Dooley <[email protected]>
Signed-off-by: Sudeep Holla <[email protected]>
---
drivers/base/cacheinfo.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Hi Greg,

Could you apply these two patches directly if and when you are happy
with them?

Regards,
Sudeep

v1->v2: This was added in v2

diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
index 65d566ff24c4..4b5cd08c5a65 100644
--- a/drivers/base/cacheinfo.c
+++ b/drivers/base/cacheinfo.c
@@ -356,7 +356,7 @@ int detect_cache_attributes(unsigned int cpu)
 		return -ENOENT;
 
 	per_cpu_cacheinfo(cpu) = kcalloc(cache_leaves(cpu),
-					 sizeof(struct cacheinfo), GFP_KERNEL);
+					 sizeof(struct cacheinfo), GFP_ATOMIC);
 	if (per_cpu_cacheinfo(cpu) == NULL) {
 		cache_leaves(cpu) = 0;
 		return -ENOMEM;
--
2.37.1
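
[Editorial illustration, not part of the patch]

The reason the allocation flag matters can be modelled in userspace. The
sketch below is only an illustration under stated assumptions: it is not
kernel code, and model_kcalloc(), model_in_atomic, model_splats and the
MODEL_GFP_* values are hypothetical stand-ins for kcalloc(), the atomic
context, the bug splat and the real GFP flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's GFP_KERNEL and GFP_ATOMIC. */
enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };

/* True while the modelled CPU is in a context that must not sleep. */
static bool model_in_atomic;

/* Number of would-be "sleeping in atomic context" reports. */
static int model_splats;

/* Models kcalloc(): a sleeping allocation in atomic context is a bug. */
static void *model_kcalloc(size_t n, size_t size, enum model_gfp flags)
{
	assert(size != 0);

	if (flags == MODEL_GFP_KERNEL && model_in_atomic)
		model_splats++;	/* the real kernel would print a splat here */

	return calloc(n, size);
}

/* Hypothetical, much reduced version of struct cacheinfo. */
struct model_cacheinfo {
	unsigned int level;
	unsigned int size;
};

/* Models the allocation done at the top of detect_cache_attributes(). */
static struct model_cacheinfo *model_detect(unsigned int leaves,
					    enum model_gfp flags)
{
	return model_kcalloc(leaves, sizeof(struct model_cacheinfo), flags);
}
```

With model_in_atomic set, calling model_detect(4, MODEL_GFP_KERNEL)
increments model_splats, while model_detect(4, MODEL_GFP_ATOMIC) does
not. The model leaves out one real trade-off: in the kernel, an atomic
allocation cannot wait for memory to be reclaimed, so it is more likely
to fail than a GFP_KERNEL one.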


2022-07-15 10:51:33

by Conor Dooley

Subject: Re: [PATCH -next v2 1/2] cacheinfo: Use atomic allocation for percpu cache attributes

On 15/07/2022 11:26, Sudeep Holla wrote:
> On a couple of architectures such as RISC-V and ARM64, we need to detect
> cache attributes quite early during boot, when the secondary CPUs
> start. detect_cache_attributes() will therefore be called in atomic
> context, and since a normal allocation can sleep, we will end up with a
> "sleeping in the atomic context" bug splat.
>
> To avoid that, switch the allocation to the atomic version, in
> preparation for moving the actual detection of cache attributes into the
> CPU hotplug path, which is atomic.
>
> Cc: Ionela Voinescu <[email protected]>
> Tested-by: Conor Dooley <[email protected]>

Since this was a conversion from comments on the other series:
Acked-by: Conor Dooley <[email protected]>

> Signed-off-by: Sudeep Holla <[email protected]>
> ---
> drivers/base/cacheinfo.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Hi Greg,
>
> Could you apply these two patches directly if and when you are happy
> with them?
>
> Regards,
> Sudeep
>
> v1->v2: This was added in v2
>
> diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
> index 65d566ff24c4..4b5cd08c5a65 100644
> --- a/drivers/base/cacheinfo.c
> +++ b/drivers/base/cacheinfo.c
> @@ -356,7 +356,7 @@ int detect_cache_attributes(unsigned int cpu)
>  		return -ENOENT;
>
>  	per_cpu_cacheinfo(cpu) = kcalloc(cache_leaves(cpu),
> -					 sizeof(struct cacheinfo), GFP_KERNEL);
> +					 sizeof(struct cacheinfo), GFP_ATOMIC);
>  	if (per_cpu_cacheinfo(cpu) == NULL) {
>  		cache_leaves(cpu) = 0;
>  		return -ENOMEM;
> --
> 2.37.1
>

2022-07-15 11:11:47

by Sudeep Holla

Subject: [PATCH -next v2 2/2] arch_topology: Fix cache attributes detection in the CPU hotplug path

init_cpu_topology() is called only once at boot, and all the cache
attributes are detected early for all possible CPUs. However, when
CPUs are hotplugged out, the cacheinfo gets removed. While the
attributes are added back when the CPUs are hotplugged back in as part
of the CPU hotplug state machine, that happens quite late, after
update_siblings_masks() has been called in secondary_start_kernel(),
resulting in wrong llc_sibling masks.

Move the call to detect_cache_attributes() inside update_siblings_masks()
to ensure the cacheinfo is updated before the LLC sibling masks are
updated. This will fix the incorrect LLC sibling masks generated when
the CPUs are hotplugged out and hotplugged back in again.

Reported-by: Ionela Voinescu <[email protected]>
Tested-by: Ionela Voinescu <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Reviewed-by: Ionela Voinescu <[email protected]>
Signed-off-by: Sudeep Holla <[email protected]>
---
drivers/base/arch_topology.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)

v1->v2:
- No change in this patch, but 1/2 was added to fix a possible
"sleeping in the atomic context" bug with this patch.
- Added all the received tags

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 441e14ac33a4..0424b59b695e 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -732,7 +732,11 @@ const struct cpumask *cpu_clustergroup_mask(int cpu)
 void update_siblings_masks(unsigned int cpuid)
 {
 	struct cpu_topology *cpu_topo, *cpuid_topo = &cpu_topology[cpuid];
-	int cpu;
+	int cpu, ret;
+
+	ret = detect_cache_attributes(cpuid);
+	if (ret)
+		pr_info("Early cacheinfo failed, ret = %d\n", ret);
 
 	/* update core and thread sibling masks */
 	for_each_online_cpu(cpu) {
@@ -821,7 +825,7 @@ __weak int __init parse_acpi_topology(void)
 #if defined(CONFIG_ARM64) || defined(CONFIG_RISCV)
 void __init init_cpu_topology(void)
 {
-	int ret, cpu;
+	int ret;
 
 	reset_cpu_topology();
 	ret = parse_acpi_topology();
@@ -836,13 +840,5 @@ void __init init_cpu_topology(void)
 		reset_cpu_topology();
 		return;
 	}
-
-	for_each_possible_cpu(cpu) {
-		ret = detect_cache_attributes(cpu);
-		if (ret) {
-			pr_info("Early cacheinfo failed, ret = %d\n", ret);
-			break;
-		}
-	}
 }
 #endif
--
2.37.1
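
[Editorial illustration, not part of the patch]

The ordering problem described in the commit message can be reproduced
in a small userspace model. This is a sketch under stated assumptions,
not the kernel implementation: every model_* name is hypothetical, the
LLC layout (CPUs 0-1 share one LLC, CPUs 2-3 another) is invented for
the example, and the sibling masks are plain bitmasks rather than
struct cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

struct model_cpu {
	bool online;
	bool has_cacheinfo;	/* set once cache attributes are detected */
	int llc_id;		/* only meaningful if has_cacheinfo */
	uint32_t llc_sibling;	/* bitmask of CPUs sharing the LLC */
};

static struct model_cpu model_cpus[MODEL_NR_CPUS];

/* Assumed layout: CPUs 0-1 share one LLC, CPUs 2-3 share another. */
static void model_detect_cache_attributes(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	model_cpus[cpu].llc_id = (int)(cpu / 2);
	model_cpus[cpu].has_cacheinfo = true;
}

/* Without cacheinfo on both sides, the LLC is treated as not shared. */
static bool model_llc_shared(unsigned int a, unsigned int b)
{
	return model_cpus[a].has_cacheinfo && model_cpus[b].has_cacheinfo &&
	       model_cpus[a].llc_id == model_cpus[b].llc_id;
}

/* detect_first == true models the behaviour after this patch. */
static void model_update_siblings_masks(unsigned int cpuid, bool detect_first)
{
	if (detect_first)
		model_detect_cache_attributes(cpuid);

	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!model_cpus[cpu].online || !model_llc_shared(cpuid, cpu))
			continue;
		model_cpus[cpuid].llc_sibling |= UINT32_C(1) << cpu;
		model_cpus[cpu].llc_sibling |= UINT32_C(1) << cpuid;
	}
}

/* Boot: bring every CPU online with its cacheinfo already detected. */
static void model_boot(void)
{
	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_cpus[cpu] = (struct model_cpu){ 0 };

	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		model_cpus[cpu].online = true;
		model_update_siblings_masks(cpu, true);
	}
}

/* Hotplug out: the cacheinfo is removed and the CPU leaves all masks. */
static void model_hotplug_out(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	model_cpus[cpu].online = false;
	model_cpus[cpu].has_cacheinfo = false;
	model_cpus[cpu].llc_sibling = 0;
	for (unsigned int i = 0; i < MODEL_NR_CPUS; i++)
		model_cpus[i].llc_sibling &= ~(UINT32_C(1) << cpu);
}

/* Hotplug in: returns the LLC sibling mask computed for the CPU. */
static uint32_t model_hotplug_in(unsigned int cpu, bool detect_first)
{
	assert(cpu < MODEL_NR_CPUS);
	model_cpus[cpu].online = true;
	model_update_siblings_masks(cpu, detect_first);
	if (!detect_first)
		model_detect_cache_attributes(cpu);	/* too late for the masks */
	return model_cpus[cpu].llc_sibling;
}
```

In this model, after model_boot() and model_hotplug_out(1), bringing
CPU 1 back with model_hotplug_in(1, false) leaves its LLC sibling mask
empty, because the cacheinfo only appears after the masks have been
computed. Repeating the cycle with model_hotplug_in(1, true) restores
the mask covering CPUs 0 and 1.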

2022-07-19 15:34:03

by Geert Uytterhoeven

Subject: Re: [PATCH -next v2 2/2] arch_topology: Fix cache attributes detection in the CPU hotplug path

Hi Sudeep,

On Fri, Jul 15, 2022 at 12:28 PM Sudeep Holla <[email protected]> wrote:
> init_cpu_topology() is called only once at boot, and all the cache
> attributes are detected early for all possible CPUs. However, when
> CPUs are hotplugged out, the cacheinfo gets removed. While the
> attributes are added back when the CPUs are hotplugged back in as part
> of the CPU hotplug state machine, that happens quite late, after
> update_siblings_masks() has been called in secondary_start_kernel(),
> resulting in wrong llc_sibling masks.
>
> Move the call to detect_cache_attributes() inside update_siblings_masks()
> to ensure the cacheinfo is updated before the LLC sibling masks are
> updated. This will fix the incorrect LLC sibling masks generated when
> the CPUs are hotplugged out and hotplugged back in again.
>
> Reported-by: Ionela Voinescu <[email protected]>
> Tested-by: Ionela Voinescu <[email protected]>
> Reviewed-by: Conor Dooley <[email protected]>
> Reviewed-by: Ionela Voinescu <[email protected]>
> Signed-off-by: Sudeep Holla <[email protected]>
> ---
> drivers/base/arch_topology.c | 16 ++++++----------
> 1 file changed, 6 insertions(+), 10 deletions(-)
>
> v1->v2:
> - No change in this patch, but 1/2 was added to fix a possible
> "sleeping in the atomic context" bug with this patch.
> - Added all the received tags

Thank you, the "Early cacheinfo failed, ret = -12" is gone.

Tested-by: Geert Uytterhoeven <[email protected]>

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds