Writing the current state back into hotplug/target calls cpu_down(),
which marks the CPU as dying even though it is not going down, and
nothing ever clears that mark. A stress test that reads values and
writes them back for all cpu device files in sysfs will trigger the
BUG() in select_fallback_rq() once all CPUs are marked as dying.

kernel/cpu.c::target_store()
	...
	if (st->state < target)
		ret = cpu_up(dev->id, target);
	else
		ret = cpu_down(dev->id, target);

cpu_down() -> cpu_set_state()
	bool bringup = st->state < target;
	...
	if (cpu_dying(cpu) != !bringup)
		set_cpu_dying(cpu, !bringup);

Make this safe by catching the case where target == state
and bailing early.
Signed-off-by: Phil Auld <[email protected]>
---
Yeah, I know... don't do that. But it's still messy.
!< != >
kernel/cpu.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index d0a9aa0b42e8..8a71b1149c60 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2302,6 +2302,9 @@ static ssize_t target_store(struct device *dev, struct device_attribute *attr,
 		return -EINVAL;
 #endif

+	if (target == st->state)
+		return count;
+
 	ret = lock_device_hotplug_sysfs();
 	if (ret)
 		return ret;
--
2.18.0
On Tue, May 24, 2022 at 04:11:51PM +0100 Valentin Schneider wrote:
> On 23/05/22 10:47, Phil Auld wrote:
> > writing the current state back into hotplug/target calls cpu_down()
> > which will set cpu dying even when it isn't and then nothing will
> > ever clear it. A stress test that reads values and writes them back
> > for all cpu device files in sysfs will trigger the BUG() in
> > select_fallback_rq once all cpus are marked as dying.
> >
> > kernel/cpu.c::target_store()
> > ...
> > if (st->state < target)
> > ret = cpu_up(dev->id, target);
> > else
> > ret = cpu_down(dev->id, target);
> >
> > cpu_down() -> cpu_set_state()
> > bool bringup = st->state < target;
> > ...
> > if (cpu_dying(cpu) != !bringup)
> > set_cpu_dying(cpu, !bringup);
> >
> > Make this safe by catching the case where target == state
> > and bailing early.
> >
> > Signed-off-by: Phil Auld <[email protected]>
> > ---
> >
> > Yeah, I know... don't do that. But it's still messy.
> >
> > !< != >
> >
> > kernel/cpu.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index d0a9aa0b42e8..8a71b1149c60 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -2302,6 +2302,9 @@ static ssize_t target_store(struct device *dev, struct device_attribute *attr,
> > return -EINVAL;
> > #endif
> >
> > + if (target == st->state)
> > + return count;
> > +
>
> The current checks are against static boundaries, this has to compare
> against st->state - AFAICT this could race with another hotplug operation
> to the same CPU, e.g.
>
> CPU42.cpuhp_state
> ->state == CPUHP_AP_SCHED_STARTING
> ->target == CPUHP_ONLINE
>
> <write CPUHP_ONLINE via sysfs, OK because current state != CPUHP_ONLINE>
>
> CPU42.cpuhp_state == CPUHP_ONLINE
>
> <issues ensue>
>
What I'm trying to fix is not a race. It's just bogus logic.
There is an assumption here that !< means > which is just not
true.
This potential race seems orthogonal, and not even affected
one way or the other by this code change, right?
I could not convince myself that the check I added needed to
be under the locks because returning success when the state
is already reporting what you asked for seems harmless.
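
To make the "!< does not mean >" point concrete, here is a minimal
userspace sketch, not kernel code. The model_* names, the struct and the
simplified up/down bodies are assumptions for illustration; only the
dispatch and the bringup/dying logic mirror the snippets quoted in this
thread. MODEL_ONLINE uses the state number 233 seen in testing, which
may differ on other configs.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_ONLINE 233 /* state number reported in this thread; illustrative only */

/* Hypothetical stand-in for the per-CPU hotplug state plus the dying bit. */
struct model_cpu {
	int state;
	bool dying;
};

/* Mirrors the quoted cpu_set_state() logic: "not a bringup" is taken to mean teardown. */
static void model_set_state(struct model_cpu *c, int target)
{
	bool bringup = c->state < target;

	if (c->dying != !bringup)
		c->dying = !bringup;
	c->state = target;
}

/* Like the quoted _cpu_up(): nothing to do if already at or past the target. */
static void model_cpu_up(struct model_cpu *c, int target)
{
	if (c->state >= target)
		return;
	model_set_state(c, target);
}

/* Like _cpu_down() before any fix: no equivalent early exit. */
static void model_cpu_down(struct model_cpu *c, int target)
{
	model_set_state(c, target);
}

/* Mirrors the quoted target_store() dispatch: the else branch also takes state == target. */
static void model_target_store(struct model_cpu *c, int target)
{
	assert(target >= 0);

	if (c->state < target)
		model_cpu_up(c, target);
	else
		model_cpu_down(c, target);
}
```

Writing a model CPU's own state back through model_target_store() leaves
the state unchanged but sets dying, which is the case the patch is meant
to catch.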
>
> _cpu_up() has:
>
> /*
> * The caller of cpu_up() might have raced with another
> * caller. Nothing to do.
> */
> if (st->state >= target)
> goto out;
>
> Looks like we want an equivalent in _cpu_down(), what do you think?
Maybe. I still think that
> > if (st->state < target)
> > ret = cpu_up(dev->id, target);
> > else
> > ret = cpu_down(dev->id, target);
is not correct. If we catch the == case earlier then this makes
sense as is.
I suppose "if (st->state <= target)" would work too since _cpu_up()
already checks. Catching this sooner seems better to me though.
>
> > ret = lock_device_hotplug_sysfs();
> > if (ret)
> > return ret;
> > --
> > 2.18.0
>
Cheers,
Phil
--
On 23/05/22 10:47, Phil Auld wrote:
> writing the current state back into hotplug/target calls cpu_down()
> which will set cpu dying even when it isn't and then nothing will
> ever clear it. A stress test that reads values and writes them back
> for all cpu device files in sysfs will trigger the BUG() in
> select_fallback_rq once all cpus are marked as dying.
>
> kernel/cpu.c::target_store()
> ...
> if (st->state < target)
> ret = cpu_up(dev->id, target);
> else
> ret = cpu_down(dev->id, target);
>
> cpu_down() -> cpu_set_state()
> bool bringup = st->state < target;
> ...
> if (cpu_dying(cpu) != !bringup)
> set_cpu_dying(cpu, !bringup);
>
> Make this safe by catching the case where target == state
> and bailing early.
>
> Signed-off-by: Phil Auld <[email protected]>
> ---
>
> Yeah, I know... don't do that. But it's still messy.
>
> !< != >
>
> kernel/cpu.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index d0a9aa0b42e8..8a71b1149c60 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2302,6 +2302,9 @@ static ssize_t target_store(struct device *dev, struct device_attribute *attr,
> return -EINVAL;
> #endif
>
> + if (target == st->state)
> + return count;
> +
The current checks are against static boundaries, this has to compare
against st->state - AFAICT this could race with another hotplug operation
to the same CPU, e.g.

  CPU42.cpuhp_state
    ->state  == CPUHP_AP_SCHED_STARTING
    ->target == CPUHP_ONLINE

  <write CPUHP_ONLINE via sysfs, OK because current state != CPUHP_ONLINE>

  CPU42.cpuhp_state == CPUHP_ONLINE

  <issues ensue>

_cpu_up() has:

	/*
	 * The caller of cpu_up() might have raced with another
	 * caller. Nothing to do.
	 */
	if (st->state >= target)
		goto out;

Looks like we want an equivalent in _cpu_down(), what do you think?
> ret = lock_device_hotplug_sysfs();
> if (ret)
> return ret;
> --
> 2.18.0
Hi Valentin,
On Tue, May 24, 2022 at 04:11:51PM +0100 Valentin Schneider wrote:
> On 23/05/22 10:47, Phil Auld wrote:
> > writing the current state back into hotplug/target calls cpu_down()
> > which will set cpu dying even when it isn't and then nothing will
> > ever clear it. A stress test that reads values and writes them back
> > for all cpu device files in sysfs will trigger the BUG() in
> > select_fallback_rq once all cpus are marked as dying.
> >
> > kernel/cpu.c::target_store()
> > ...
> > if (st->state < target)
> > ret = cpu_up(dev->id, target);
> > else
> > ret = cpu_down(dev->id, target);
> >
> > cpu_down() -> cpu_set_state()
> > bool bringup = st->state < target;
> > ...
> > if (cpu_dying(cpu) != !bringup)
> > set_cpu_dying(cpu, !bringup);
> >
> > Make this safe by catching the case where target == state
> > and bailing early.
> >
> > Signed-off-by: Phil Auld <[email protected]>
> > ---
> >
> > Yeah, I know... don't do that. But it's still messy.
> >
> > !< != >
> >
> > kernel/cpu.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index d0a9aa0b42e8..8a71b1149c60 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -2302,6 +2302,9 @@ static ssize_t target_store(struct device *dev, struct device_attribute *attr,
> > return -EINVAL;
> > #endif
> >
> > + if (target == st->state)
> > + return count;
> > +
>
> The current checks are against static boundaries, this has to compare
> against st->state - AFAICT this could race with another hotplug operation
> to the same CPU, e.g.
>
> CPU42.cpuhp_state
> ->state == CPUHP_AP_SCHED_STARTING
> ->target == CPUHP_ONLINE
>
> <write CPUHP_ONLINE via sysfs, OK because current state != CPUHP_ONLINE>
>
> CPU42.cpuhp_state == CPUHP_ONLINE
>
> <issues ensue>
>
>
> _cpu_up() has:
>
> /*
> * The caller of cpu_up() might have raced with another
> * caller. Nothing to do.
> */
> if (st->state >= target)
> goto out;
>
> Looks like we want an equivalent in _cpu_down(), what do you think?
>
I did it like this (shown below) and from my test it also works for
this case.
I could move it below the lock and goto out; instead if you think
that is better. It still seems better to me to stop this higher up
because there's work being done in the out path too. We're not
actually doing any hot(un)plug so doing post unplug cleanup seems
iffy.

_cpu_down()
...
out:
	cpus_write_unlock();
	/*
	 * Do post unplug cleanup. This is still protected against
	 * concurrent CPU hotplug via cpu_add_remove_lock.
	 */
	lockup_detector_cleanup();
	arch_smt_update();
	cpu_up_down_serialize_trainwrecks(tasks_frozen);
	return ret;
}

----------
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 8a71b1149c60..e36788742d18 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1130,6 +1130,13 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
 	if (!cpu_present(cpu))
 		return -EINVAL;

+	/*
+	 * The caller of cpu_down() might have raced with another
+	 * caller. Nothing to do.
+	 */
+	if (st->state <= target)
+		return 0;
+
 	cpus_write_lock();

 	cpuhp_tasks_frozen = tasks_frozen;

Cheers,
Phil
--
On 24/05/22 15:37, Phil Auld wrote:
> Hi Valentin,
>
> I did it like this (shown below) and from my test it also works for
> this case.
>
> I could move it below the lock and goto out; instead if you think
> that is better.
I *think* the cpu_add_remove_lock mutex should be sufficient here.
> It still seems better to me to stop this higher up
> because there's work being done in the out path too. We're not
> actually doing any hot(un)plug so doing post unplug cleanup seems
> iffy.
>
I think so too; I now realize _cpu_up() and _cpu_down() have slightly
different prologues: _cpu_up() does its hotplug states / cpu_present_mask
checks *after* grabbing the cpu_hotplug_lock, _cpu_down() does that *before*...
So I believe what you have below is fine, modulo whether we want to align
the prologue of these two functions or not :-)
> _cpu_down()
> ...
> out:
> cpus_write_unlock();
> /*
> * Do post unplug cleanup. This is still protected against
> * concurrent CPU hotplug via cpu_add_remove_lock.
> */
> lockup_detector_cleanup();
> arch_smt_update();
> cpu_up_down_serialize_trainwrecks(tasks_frozen);
> return ret;
> }
>
> ----------
>
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 8a71b1149c60..e36788742d18 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1130,6 +1130,13 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
> if (!cpu_present(cpu))
> return -EINVAL;
>
> + /*
> + * The caller of cpu_down() might have raced with another
> + * caller. Nothing to do.
> + */
> + if (st->state <= target)
> + return 0;
> +
> cpus_write_lock();
>
> cpuhp_tasks_frozen = tasks_frozen;
>
>
>
>
> Cheers,
> Phil
>
> --
On 24/05/22 12:39, Phil Auld wrote:
> On Tue, May 24, 2022 at 04:11:51PM +0100 Valentin Schneider wrote:
>
>>
>> _cpu_up() has:
>>
>> /*
>> * The caller of cpu_up() might have raced with another
>> * caller. Nothing to do.
>> */
>> if (st->state >= target)
>> goto out;
>>
>> Looks like we want an equivalent in _cpu_down(), what do you think?
>
> Maybe. I still think that
>
>> > if (st->state < target)
>> > ret = cpu_up(dev->id, target);
>> > else
>> > ret = cpu_down(dev->id, target);
>
> is not correct. If we catch the == case earlier then this makes
> sense as is.
>
> I suppose "if (st->state <= target)" would work too since _cpu_up()
> already checks. Catching this sooner seems better to me though.
>
Yeah it would be neater to not even enter cpu_{up, down}(), but my paranoia
makes me think we need the comparison to happen with at least the
cpu_add_remove_lock held to make sure st->state isn't moving under our
feet, otherwise we may still end up with target == state in _cpu_down() and
hit the bug you're describing.
>>
>> > ret = lock_device_hotplug_sysfs();
>> > if (ret)
>> > return ret;
>> > --
>> > 2.18.0
>>
>
>
> Cheers,
> Phil
>
> --
On 25/05/22 09:31, Phil Auld wrote:
> On Wed, May 25, 2022 at 10:48:31AM +0100 Valentin Schneider wrote:
>>
>> Yeah it would be neater to not even enter cpu_{up, down}(), but my paranoia
>> makes me think we need the comparison to happen with at least the
>> cpu_add_remove_lock held to make sure st->state isn't moving under our
>> feet, otherwise we may still end up with target == state in _cpu_down() and
>> hit the bug you're describing.
>>
>
> This is what I was originally doing before I tried to "optimize" it:
>
> if (st->state < target)
> ret = cpu_up(dev->id, target);
> else if (st->state > target)
> ret = cpu_down(dev->id, target);
>
> This does the check under the lock and just falls through if state==target.
> I think I'll go back to that version.
>
> I also noticed while testing that the boot cpu does not get its target set.
> It's got state 233 but target 0. So reading that out and writing it back
> offlines cpu0. I'll try to find where that is not getting set.
>
If I had to guess I'd say it's because the boot CPU doesn't go through the
regular hotplug machinery and sets its state straight to CPUHP_ONLINE
/me digs
Maybe around this?

void __init boot_cpu_hotplug_init(void)
{
	this_cpu_write(cpuhp_state.booted_once, true);
	this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
}

On Wed, May 25, 2022 at 10:48:31AM +0100 Valentin Schneider wrote:
> On 24/05/22 12:39, Phil Auld wrote:
> >
> >> > if (st->state < target)
> >> > ret = cpu_up(dev->id, target);
> >> > else
> >> > ret = cpu_down(dev->id, target);
> >
> > is not correct. If we catch the == case earlier then this makes
> > sense as is.
> >
> > I suppose "if (st->state <= target)" would work too since _cpu_up()
> > already checks. Catching this sooner seems better to me though.
> >
>
> Yeah it would be neater to not even enter cpu_{up, down}(), but my paranoia
> makes me think we need the comparison to happen with at least the
> cpu_add_remove_lock held to make sure st->state isn't moving under our
> feet, otherwise we may still end up with target == state in _cpu_down() and
> hit the bug you're describing.
>
This is what I was originally doing before I tried to "optimize" it:

	if (st->state < target)
		ret = cpu_up(dev->id, target);
	else if (st->state > target)
		ret = cpu_down(dev->id, target);

This does the check under the lock and just falls through if state==target.
I think I'll go back to that version.
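
As a rough illustration of why the three-way compare is enough for the
read-and-write-back stress case, here is a small userspace model, not
kernel code. The model_* names and the struct are made up for this
sketch, and the locking that the real check runs under is deliberately
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU hotplug state plus the dying bit. */
struct model_cpu {
	int state;
	bool dying;
};

/* Simplified cpu_set_state(): a step up clears dying, a step down sets it. */
static void model_set_state(struct model_cpu *c, int target)
{
	bool bringup = c->state < target;

	c->dying = !bringup;
	c->state = target;
}

/* Three-way dispatch: state == target falls through without touching anything. */
static void model_target_store_fixed(struct model_cpu *c, int target)
{
	if (c->state < target)
		model_set_state(c, target);	/* stand-in for cpu_up() */
	else if (c->state > target)
		model_set_state(c, target);	/* stand-in for cpu_down() */
}

/* Write every CPU's current state back, then count how many ended up dying. */
static size_t model_writeback_all(struct model_cpu *cpus, size_t n)
{
	size_t dying = 0;

	assert(cpus != NULL);
	for (size_t i = 0; i < n; i++) {
		model_target_store_fixed(&cpus[i], cpus[i].state);
		if (cpus[i].dying)
			dying++;
	}
	return dying;
}
```

With this dispatch, writing back unchanged values leaves no model CPU
marked dying, while a genuine step down still sets the bit and a later
step up clears it.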
I also noticed while testing that the boot cpu does not get its target set.
It's got state 233 but target 0. So reading that out and writing it back
offlines cpu0. I'll try to find where that is not getting set.
Thanks,
Phil
> >>
> >> > ret = lock_device_hotplug_sysfs();
> >> > if (ret)
> >> > return ret;
> >> > --
> >> > 2.18.0
> >>
> >
> >
> > Cheers,
> > Phil
> >
> > --
>
--
On Wed, May 25, 2022 at 04:09:29PM +0100 Valentin Schneider wrote:
> On 25/05/22 09:31, Phil Auld wrote:
> > On Wed, May 25, 2022 at 10:48:31AM +0100 Valentin Schneider wrote:
> >>
> >> Yeah it would be neater to not even enter cpu_{up, down}(), but my paranoia
> >> makes me think we need the comparison to happen with at least the
> >> cpu_add_remove_lock held to make sure st->state isn't moving under our
> >> feet, otherwise we may still end up with target == state in _cpu_down() and
> >> hit the bug you're describing.
> >>
> >
> > This is what I was originally doing before I tried to "optimize" it:
> >
> > if (st->state < target)
> > ret = cpu_up(dev->id, target);
> > else if (st->state > target)
> > ret = cpu_down(dev->id, target);
> >
> > This does the check under the lock and just falls through if state==target.
> > I think I'll go back to that version.
> >
> > I also noticed while testing that the boot cpu does not get its target set.
> > It's got state 233 but target 0. So reading that out and writing it back
> > offlines cpu0. I'll try to find where that is not getting set.
> >
>
> If I had to guess I'd say it's because the boot CPU doesn't go through the
> regular hotplug machinery and sets its state straight to CPUHP_ONLINE
>
Yes, that was my thought.
> /me digs
>
> Maybe around this?
>
> void __init boot_cpu_hotplug_init(void)
> {
> this_cpu_write(cpuhp_state.booted_once, true);
> this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
> }
>
Right, just found that too. Probably should set the target there as well.
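
A minimal userspace sketch of that idea, under stated assumptions: the
model_* names, the struct and the simplified store are hypothetical, and
MODEL_ONLINE just reuses the 233 observed in testing. It only models the
boot CPU having state set but target left at 0, and what reading target
and writing it back then does. It is not the actual change.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_OFFLINE 0
#define MODEL_ONLINE  233 /* number observed in this thread; illustrative only */

/* Hypothetical stand-in for the boot CPU's hotplug state. */
struct model_cpuhp {
	bool booted_once;
	int state;
	int target;
};

/* Mirrors the quoted boot_cpu_hotplug_init(): target is never written. */
static void model_boot_init_quoted(struct model_cpuhp *st)
{
	st->booted_once = true;
	st->state = MODEL_ONLINE;
}

/* Assumed variant: also record the target the boot CPU has already reached. */
static void model_boot_init_with_target(struct model_cpuhp *st)
{
	st->booted_once = true;
	st->state = MODEL_ONLINE;
	st->target = MODEL_ONLINE;
}

/* Simplified store: move to the requested target unless already there. */
static void model_target_store(struct model_cpuhp *st, int target)
{
	assert(target >= MODEL_OFFLINE);

	if (st->state != target) {
		st->target = target;
		st->state = target;
	}
}

/* Read the target value and write it straight back; return the resulting state. */
static int model_writeback_target(struct model_cpuhp *st)
{
	model_target_store(st, st->target);
	return st->state;
}
```

In this model the quoted init leaves target at 0, so the write-back takes
the boot CPU to the offline state, while the variant that also sets
target keeps it online.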
Cheers,
Phil
--