2024-04-11 11:35:45

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH RFC v4 12/15] arm64: psci: Ignore DENIED CPUs

On Wed, 31 Jan 2024 16:50:38 +0000
Russell King (Oracle) <[email protected]> wrote:

> From: Jean-Philippe Brucker <[email protected]>
>
> When a CPU is marked as disabled, but online capable in the MADT, PSCI
> applies some firmware policy to control when it can be brought online.
> PSCI returns DENIED to a CPU_ON request if this is not currently
> permitted. The OS can learn the current policy from the _STA enabled bit.
>
> Handle the PSCI DENIED return code gracefully instead of printing an
> error.
>
> See https://developer.arm.com/documentation/den0022/f/?lang=en page 58.
>
> Signed-off-by: Jean-Philippe Brucker <[email protected]>
> [ morse: Rewrote commit message ]
> Signed-off-by: James Morse <[email protected]>
> Tested-by: Miguel Luis <[email protected]>
> Tested-by: Vishnu Pajjuri <[email protected]>
> Tested-by: Jianyong Wu <[email protected]>
> Reviewed-by: Jonathan Cameron <[email protected]>
> Signed-off-by: Russell King (Oracle) <[email protected]>
> ---

This change to return failure from __cpu_up in non error cases exposes
an possible issue with cpu_up() in kernel/cpu.c in that it brings the numa node
before we try (and fail) to bring up CPUs that may be denied.

We could try offlining the numa node on error, or just register it later.

Currently I'm testing this change which I think is harmless for cases that don't
fail the cpu_up()

For the cpu hotplug path note the node only comes online wiht the cpu online, not
the earlier hotplug. Reasonable given there is nothing online in the node before
that point.

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 537099bf5d02..a4730396ccea 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1742,10 +1742,6 @@ static int cpu_up(unsigned int cpu, enum cpuhp_state target)
return -EINVAL;
}

- err = try_online_node(cpu_to_node(cpu));
- if (err)
- return err;
-
cpu_maps_update_begin();

if (cpu_hotplug_disabled) {
@@ -1760,7 +1756,10 @@ static int cpu_up(unsigned int cpu, enum cpuhp_state target)
err = _cpu_up(cpu, 0, target);
out:
cpu_maps_update_done();
- return err;
+ if (err)
+ return err;
+
+ return try_online_node(cpu_to_node(cpu));
}

There is a kicker in the remove path where currently check_cpu_on_node()
checks for_each_present_cpu() whereas to work for us we need to use
for_each_online_cpu() or the node is never removed.

Now my current view is that we should only show
nodes in /sys/bus/nodes/devices/ if there is a CPU online (assuming no other
reasons the node should be online such as memory).
That's easy enough to make work but all I'm really learning is that the semantics
of what is an online form a node point of view is not consistent.

Fixing this will create a minor change on x86 but does anyone really care
about what happens in the offline path wrt to 'when' the node disappears?
I think the corner case is.

1. Add 2 CPUs (A, B) in a CPU only node.
2. Online CPU A - this brings /sys/bus/devices/nodeX online
3. Remove CPU A - no effect because the check for try_remove is on presence and CPU B is
still present.
4. Online CPU B - no change.
5. Offline CPU B - no change.
4. Remove CPU B - /sys/bus/device/nodeX offline (disappears)

To make it work on arm64 where we never make CPUs not present.

1. Add 2 CPUs (A, B) in a CPU only node.
2. Online CPU A - this brings /sys/bus/devices/nodeX online
3. Remove CPU A - /sys/bus/devices/nodeX offline (disappears)
4. Online CPU B - this brings /sys/bus/device/nodeX online
5. Offline CPU B - no change (node updates only happen in hotplug code)
6. Remove CPU B - /sys/bus/device/nodeX offline (disappears).

Step 5 may seem weird but I think we can't mess with nodes there because
userspace may well rely on them still being around for some reason
(it's a much more common situation).

My assumption is that step3 removing the node isn't going to hurt anyone?

If no one shouts, i'll go ahead with rolling a v5 patch set where this is done.


Jonathan





> Changes since RFC v2
> * Add specification reference
> * Use EPERM rather than EPROBE_DEFER
> Changes since RFC v3:
> * Use EPERM everywhere
> * Drop unnecessary changes to drivers/firmware/psci/psci.c
> ---
> arch/arm64/kernel/psci.c | 2 +-
> arch/arm64/kernel/smp.c | 3 ++-
> 2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
> index 29a8e444db83..fabd732d0a2d 100644
> --- a/arch/arm64/kernel/psci.c
> +++ b/arch/arm64/kernel/psci.c
> @@ -40,7 +40,7 @@ static int cpu_psci_cpu_boot(unsigned int cpu)
> {
> phys_addr_t pa_secondary_entry = __pa_symbol(secondary_entry);
> int err = psci_ops.cpu_on(cpu_logical_map(cpu), pa_secondary_entry);
> - if (err)
> + if (err && err != -EPERM)
> pr_err("failed to boot CPU%d (%d)\n", cpu, err);
>
> return err;
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 4ced34f62dab..dc0e0b3ec2d4 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -132,7 +132,8 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
> /* Now bring the CPU into our world */
> ret = boot_secondary(cpu, idle);
> if (ret) {
> - pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
> + if (ret != -EPERM)
> + pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
> return ret;
> }
>



2024-04-11 13:27:29

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH RFC v4 12/15] arm64: psci: Ignore DENIED CPUs

On Thu, 11 Apr 2024 12:35:23 +0100
Jonathan Cameron <[email protected]> wrote:

> On Wed, 31 Jan 2024 16:50:38 +0000
> Russell King (Oracle) <[email protected]> wrote:
>
> > From: Jean-Philippe Brucker <[email protected]>
> >
> > When a CPU is marked as disabled, but online capable in the MADT, PSCI
> > applies some firmware policy to control when it can be brought online.
> > PSCI returns DENIED to a CPU_ON request if this is not currently
> > permitted. The OS can learn the current policy from the _STA enabled bit.
> >
> > Handle the PSCI DENIED return code gracefully instead of printing an
> > error.
> >
> > See https://developer.arm.com/documentation/den0022/f/?lang=en page 58.
> >
> > Signed-off-by: Jean-Philippe Brucker <[email protected]>
> > [ morse: Rewrote commit message ]
> > Signed-off-by: James Morse <[email protected]>
> > Tested-by: Miguel Luis <[email protected]>
> > Tested-by: Vishnu Pajjuri <[email protected]>
> > Tested-by: Jianyong Wu <[email protected]>
> > Reviewed-by: Jonathan Cameron <[email protected]>
> > Signed-off-by: Russell King (Oracle) <[email protected]>
> > ---
>
> This change to return failure from __cpu_up in non error cases exposes
> an possible issue with cpu_up() in kernel/cpu.c in that it brings the numa node
> before we try (and fail) to bring up CPUs that may be denied.
>
> We could try offlining the numa node on error, or just register it later.
>
> Currently I'm testing this change which I think is harmless for cases that don't
> fail the cpu_up()
>
> For the cpu hotplug path note the node only comes online wiht the cpu online, not
> the earlier hotplug. Reasonable given there is nothing online in the node before
> that point.
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 537099bf5d02..a4730396ccea 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1742,10 +1742,6 @@ static int cpu_up(unsigned int cpu, enum cpuhp_state target)
> return -EINVAL;
> }
>
> - err = try_online_node(cpu_to_node(cpu));
> - if (err)
> - return err;
> -
> cpu_maps_update_begin();
>
> if (cpu_hotplug_disabled) {
> @@ -1760,7 +1756,10 @@ static int cpu_up(unsigned int cpu, enum cpuhp_state target)
> err = _cpu_up(cpu, 0, target);
> out:
> cpu_maps_update_done();
> - return err;
> + if (err)
> + return err;
> +
> + return try_online_node(cpu_to_node(cpu));
> }
>
> There is a kicker in the remove path where currently check_cpu_on_node()
> checks for_each_present_cpu() whereas to work for us we need to use
> for_each_online_cpu() or the node is never removed.
>
> Now my current view is that we should only show
> nodes in /sys/bus/nodes/devices/ if there is a CPU online (assuming no other
> reasons the node should be online such as memory).
> That's easy enough to make work but all I'm really learning is that the semantics
> of what is an online form a node point of view is not consistent.
>
> Fixing this will create a minor change on x86 but does anyone really care
> about what happens in the offline path wrt to 'when' the node disappears?
> I think the corner case is.
>
> 1. Add 2 CPUs (A, B) in a CPU only node.
> 2. Online CPU A - this brings /sys/bus/devices/nodeX online
> 3. Remove CPU A - no effect because the check for try_remove is on presence and CPU B is
> still present.
> 4. Online CPU B - no change.
> 5. Offline CPU B - no change.
> 4. Remove CPU B - /sys/bus/device/nodeX offline (disappears)
>
> To make it work on arm64 where we never make CPUs not present.
>
> 1. Add 2 CPUs (A, B) in a CPU only node.
> 2. Online CPU A - this brings /sys/bus/devices/nodeX online
> 3. Remove CPU A - /sys/bus/devices/nodeX offline (disappears)
> 4. Online CPU B - this brings /sys/bus/device/nodeX online
> 5. Offline CPU B - no change (node updates only happen in hotplug code)
> 6. Remove CPU B - /sys/bus/device/nodeX offline (disappears).
>
> Step 5 may seem weird but I think we can't mess with nodes there because
> userspace may well rely on them still being around for some reason
> (it's a much more common situation).
>
> My assumption is that step3 removing the node isn't going to hurt anyone?
>
> If no one shouts, i'll go ahead with rolling a v5 patch set where this is done.

This needs a little more thought as if we follow above sequence there isn't
currently any cleanup of
/sys/bus/cpu/devices/cpuB/node link.

If you follow the sequence above an attempt is made to create it again.
Whilst I have a WIP bit of dancing code that deals with this it is ugly as
the check on whether a node is online needs to be dropped on this path.
The problem coming back to the tear down and setup paths being in entirely different
states and not remotely symmetric.

So my proposal for now is to treat the whole numa node problem as something
for another day.

Step 1)
For ARM64 cpu hp - NUMA nodes created at boot for any present CPUs - so all
of them. They never go away, hence no annoying tear down to do.

Step 2)
We can consider moving to dynamic numa node handling either via the messy
approach I have working, or via more major surgery.
I have a bunch of other work in this area for memory hotplug, so maybe
I can role those 2 activities together.

Jonathan
>
>
> Jonathan
>
>
>
>
>
> > Changes since RFC v2
> > * Add specification reference
> > * Use EPERM rather than EPROBE_DEFER
> > Changes since RFC v3:
> > * Use EPERM everywhere
> > * Drop unnecessary changes to drivers/firmware/psci/psci.c
> > ---
> > arch/arm64/kernel/psci.c | 2 +-
> > arch/arm64/kernel/smp.c | 3 ++-
> > 2 files changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
> > index 29a8e444db83..fabd732d0a2d 100644
> > --- a/arch/arm64/kernel/psci.c
> > +++ b/arch/arm64/kernel/psci.c
> > @@ -40,7 +40,7 @@ static int cpu_psci_cpu_boot(unsigned int cpu)
> > {
> > phys_addr_t pa_secondary_entry = __pa_symbol(secondary_entry);
> > int err = psci_ops.cpu_on(cpu_logical_map(cpu), pa_secondary_entry);
> > - if (err)
> > + if (err && err != -EPERM)
> > pr_err("failed to boot CPU%d (%d)\n", cpu, err);
> >
> > return err;
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index 4ced34f62dab..dc0e0b3ec2d4 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -132,7 +132,8 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
> > /* Now bring the CPU into our world */
> > ret = boot_secondary(cpu, idle);
> > if (ret) {
> > - pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
> > + if (ret != -EPERM)
> > + pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
> > return ret;
> > }
> >
>