From: Vincent Donnefort <[email protected]>
The atomic states (between CPUHP_AP_IDLE_DEAD and CPUHP_AP_ONLINE) are
triggered by the CPUHP_BRINGUP_CPU step. If the latter doesn't run, none
of the atomic states can. Hence, rollback is not possible after a
CPUHP_BRINGUP_CPU step failure during hotunplug, and the "fail" interface
shouldn't allow it. Moreover, the current CPUHP_BRINGUP_CPU teardown
callback (finish_cpu()) cannot fail anyway.

Signed-off-by: Vincent Donnefort <[email protected]>
---
kernel/cpu.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 9121edf..bcd7b2a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2216,9 +2216,14 @@ static ssize_t write_cpuhp_fail(struct device *dev,
 		return -EINVAL;
 
 	/*
-	 * Cannot fail STARTING/DYING callbacks.
+	 * Cannot fail STARTING/DYING callbacks. Also, those callbacks are
+	 * triggered by the BRINGUP_CPU bringup callback. Therefore, the latter
+	 * can't fail during hotunplug, as it would mean we have no way of
+	 * rolling back the atomic states that have been previously torn
+	 * down.
 	 */
-	if (cpuhp_is_atomic_state(fail))
+	if (cpuhp_is_atomic_state(fail) ||
+	    (fail == CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU))
 		return -EINVAL;
 
 	/*
--
2.7.4
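The rule added by the patch can be exercised outside the kernel with a reduced model. In the sketch below, the enum is a hypothetical subset of enum cpuhp_state that keeps only the ordering the check relies on (its numeric values do not match the kernel's), cpuhp_is_atomic_state() follows the [CPUHP_AP_IDLE_DEAD, CPUHP_AP_ONLINE) range described in the commit message, and cpuhp_fail_allowed() is a hypothetical helper whose cur parameter stands in for st->state.

```c
#include <stdbool.h>

/*
 * Hypothetical, reduced subset of enum cpuhp_state: only the relative
 * ordering matters here, the numeric values do not match the kernel.
 */
enum cpuhp_state {
	CPUHP_OFFLINE = 0,
	CPUHP_CREATE_THREADS,
	CPUHP_BRINGUP_CPU,
	CPUHP_AP_IDLE_DEAD,		/* first atomic (STARTING/DYING) state */
	CPUHP_AP_SCHED_STARTING,
	CPUHP_AP_ONLINE,		/* first state past the atomic range */
	CPUHP_TEARDOWN_CPU,
	CPUHP_AP_ACTIVE,
	CPUHP_ONLINE,
};

/* Atomic states: [CPUHP_AP_IDLE_DEAD, CPUHP_AP_ONLINE). */
static bool cpuhp_is_atomic_state(enum cpuhp_state state)
{
	return CPUHP_AP_IDLE_DEAD <= state && state < CPUHP_AP_ONLINE;
}

/*
 * Hypothetical helper mirroring the patched check in write_cpuhp_fail():
 * @cur plays the role of st->state. Returns true if a failure may be
 * injected at @fail.
 */
static bool cpuhp_fail_allowed(enum cpuhp_state fail, enum cpuhp_state cur)
{
	/* STARTING/DYING callbacks cannot fail. */
	if (cpuhp_is_atomic_state(fail))
		return false;

	/*
	 * Past BRINGUP_CPU, this step can only be reached again by a
	 * hotunplug, which runs its teardown after the atomic states are
	 * already torn down: a failure there could not be rolled back.
	 */
	if (fail == CPUHP_BRINGUP_CPU && cur > CPUHP_BRINGUP_CPU)
		return false;

	return true;
}
```

With the CPU at CPUHP_OFFLINE, the next pass through CPUHP_BRINGUP_CPU runs bringup_cpu(), before any atomic state, so the helper still permits the failure; with the CPU at CPUHP_ONLINE, the next pass is the teardown side, and the helper refuses it.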
On Mon, Jan 11, 2021 at 05:10:45PM +0000, [email protected] wrote:
> -	if (cpuhp_is_atomic_state(fail))
> +	if (cpuhp_is_atomic_state(fail) ||
> +	    (fail == CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU))
>  		return -EINVAL;
Should we instead disallow failing any state that has .cant_stop ?
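Peter's alternative can be sketched against a similar reduced model. Here struct step_model is a hypothetical stand-in for the kernel's struct cpuhp_step that keeps only cant_stop, the enum is an illustrative subset of enum cpuhp_state, and the table marks the BRINGUP_CPU and TEARDOWN_CPU steps as cant_stop, matching what the thread says about bringup_cpu() and takedown_cpu().

```c
#include <stdbool.h>

/* Hypothetical, reduced subset of enum cpuhp_state (values differ from the kernel). */
enum cpuhp_state {
	CPUHP_OFFLINE = 0,
	CPUHP_BRINGUP_CPU,
	CPUHP_AP_IDLE_DEAD,
	CPUHP_AP_SCHED_STARTING,
	CPUHP_AP_ONLINE,
	CPUHP_TEARDOWN_CPU,
	CPUHP_AP_ACTIVE,
	CPUHP_ONLINE,
	CPUHP_NR_STATES,
};

/* Hypothetical stand-in for struct cpuhp_step, keeping only cant_stop. */
struct step_model {
	bool cant_stop;
};

/* Unlisted entries are zero-initialized, i.e. cant_stop == false. */
static const struct step_model steps[CPUHP_NR_STATES] = {
	[CPUHP_BRINGUP_CPU]  = { .cant_stop = true },	/* bringup_cpu() */
	[CPUHP_TEARDOWN_CPU] = { .cant_stop = true },	/* takedown_cpu() */
};

static bool is_atomic_state(enum cpuhp_state state)
{
	return CPUHP_AP_IDLE_DEAD <= state && state < CPUHP_AP_ONLINE;
}

/*
 * Hypothetical cant_stop-based policy. It ignores the CPU's current
 * state, so it cannot distinguish bringup from hotunplug.
 */
static bool fail_allowed_cant_stop(enum cpuhp_state fail)
{
	return !is_atomic_state(fail) && !steps[fail].cant_stop;
}
```

Under this rule, injecting a BRINGUP_CPU failure during bringup, which the patch still permits, would be refused as well, and so would any TEARDOWN_CPU failure.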
On Wed, Jan 20, 2021 at 01:58:35PM +0100, Peter Zijlstra wrote:
> On Mon, Jan 11, 2021 at 05:10:45PM +0000, [email protected] wrote:
> > -	if (cpuhp_is_atomic_state(fail))
> > +	if (cpuhp_is_atomic_state(fail) ||
> > +	    (fail == CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU))
> >  		return -EINVAL;
>
> Should we instead disallow failing any state that has .cant_stop ?
We would reduce the scope of what can be tested: bringup_cpu() and
takedown_cpu() are both marked as "cant_stop". Still, those callbacks are
allowed to fail.
Looking at cant_stop also made me notice that write_cpuhp_target() is
probably missing a cpuhp_is_atomic_state() check. For the same reason as
this patch, cpu_down() can't stop in one of the atomic states.
--
Vincent
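The check suggested for write_cpuhp_target() might look like the following sketch. It assumes, as we understand the code of that period, that write_cpuhp_target() already rejects steps that have no name or are marked cant_stop; check_target(), struct step_model and the reduced enum are hypothetical stand-ins, the step names are illustrative, and the only new test is the cpuhp_is_atomic_state()-style range check.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, reduced subset of enum cpuhp_state (values differ from the kernel). */
enum cpuhp_state {
	CPUHP_OFFLINE = 0,
	CPUHP_BRINGUP_CPU,
	CPUHP_AP_IDLE_DEAD,
	CPUHP_AP_SCHED_STARTING,
	CPUHP_AP_ONLINE,
	CPUHP_TEARDOWN_CPU,
	CPUHP_AP_ACTIVE,
	CPUHP_ONLINE,
	CPUHP_NR_STATES,
};

/* Hypothetical stand-in for struct cpuhp_step. */
struct step_model {
	const char *name;	/* NULL: no step registered at this slot */
	bool cant_stop;
};

/* Illustrative table; names are placeholders. */
static const struct step_model steps[CPUHP_NR_STATES] = {
	[CPUHP_OFFLINE]           = { "offline",        false },
	[CPUHP_BRINGUP_CPU]       = { "cpu:bringup",    true  },
	[CPUHP_AP_IDLE_DEAD]      = { "idle:dead",      false },
	[CPUHP_AP_SCHED_STARTING] = { "sched:starting", false },
	[CPUHP_AP_ONLINE]         = { "ap:online",      false },
	[CPUHP_TEARDOWN_CPU]      = { "cpu:teardown",   true  },
	[CPUHP_AP_ACTIVE]         = { "sched:active",   false },
	[CPUHP_ONLINE]            = { "online",         false },
};

static bool is_atomic_state(enum cpuhp_state state)
{
	return CPUHP_AP_IDLE_DEAD <= state && state < CPUHP_AP_ONLINE;
}

/*
 * Hypothetical model of the validation in write_cpuhp_target():
 * returns 0 if @target is an acceptable stopping point, -EINVAL otherwise.
 */
static int check_target(int target)
{
	if (target < CPUHP_OFFLINE || target > CPUHP_ONLINE)
		return -EINVAL;

	/* Assumed existing check: unnamed or cant_stop steps are rejected. */
	if (!steps[target].name || steps[target].cant_stop)
		return -EINVAL;

	/* Proposed addition: cpu_down() cannot stop inside the atomic range. */
	if (is_atomic_state(target))
		return -EINVAL;

	return 0;
}
```

Atomic steps such as sched:starting are not marked cant_stop in this model, which is why the existing check alone would let them through.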
On Wed, Jan 20, 2021 at 03:17:24PM +0000, Vincent Donnefort wrote:
> On Wed, Jan 20, 2021 at 01:58:35PM +0100, Peter Zijlstra wrote:
> > On Mon, Jan 11, 2021 at 05:10:45PM +0000, [email protected] wrote:
> > > +	if (cpuhp_is_atomic_state(fail) ||
> > > +	    (fail == CPUHP_BRINGUP_CPU && st->state > CPUHP_BRINGUP_CPU))
> > >  		return -EINVAL;
> >
> > Should we instead disallow failing any state that has .cant_stop ?
>
> We would reduce the scope of what can be tested: bringup_cpu() and
> takedown_cpu() are both marked as "cant_stop". Still, those callbacks are
> allowed to fail.
Fair enough. I suppose we can add an additional cant_fail field, but I'm
not sure that's worth the effort over this.
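A cant_fail field could be sketched as below. The field is hypothetical (it is only floated in this thread, not part of struct cpuhp_step), struct step_model and fail_allowed() are illustrative stand-ins, and the enum is a reduced, illustrative subset of enum cpuhp_state. Because a single per-step flag does not distinguish bringup from hotunplug, the BRINGUP_CPU condition from the patch is still needed alongside it.

```c
#include <stdbool.h>

/* Hypothetical, reduced subset of enum cpuhp_state (values differ from the kernel). */
enum cpuhp_state {
	CPUHP_OFFLINE = 0,
	CPUHP_BRINGUP_CPU,
	CPUHP_AP_IDLE_DEAD,
	CPUHP_AP_SCHED_STARTING,
	CPUHP_AP_ONLINE,
	CPUHP_TEARDOWN_CPU,
	CPUHP_AP_ACTIVE,
	CPUHP_ONLINE,
	CPUHP_NR_STATES,
};

/* Hypothetical stand-in for struct cpuhp_step with the proposed cant_fail. */
struct step_model {
	bool cant_stop;
	bool cant_fail;		/* hypothetical field */
};

/* Unlisted entries are zero-initialized: both flags false. */
static const struct step_model steps[CPUHP_NR_STATES] = {
	[CPUHP_BRINGUP_CPU]       = { .cant_stop = true },
	[CPUHP_AP_IDLE_DEAD]      = { .cant_fail = true },	/* atomic */
	[CPUHP_AP_SCHED_STARTING] = { .cant_fail = true },	/* atomic */
	[CPUHP_TEARDOWN_CPU]      = { .cant_stop = true },
};

/* @cur plays the role of st->state. */
static bool fail_allowed(enum cpuhp_state fail, enum cpuhp_state cur)
{
	/* Per-step data in place of the atomic-range test. */
	if (steps[fail].cant_fail)
		return false;

	/* A per-step flag cannot tell bringup from hotunplug, so keep this. */
	if (fail == CPUHP_BRINGUP_CPU && cur > CPUHP_BRINGUP_CPU)
		return false;

	return true;
}
```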