5.10.47-rt46-rc1 stable review patch.
If anyone has any objections, please let me know.
------------------
From: Peter Zijlstra <[email protected]>
commit 9e81889c7648d48dd5fe13f41cbc99f3c362484a upstream.

Consider:

   sched_setaffinity(p, X); sched_setaffinity(p, Y);

Then the first will install p->migration_pending = &my_pending; and
issue stop_one_cpu_nowait(pending); and the second one will read
p->migration_pending and _also_ issue: stop_one_cpu_nowait(pending),
the _SAME_ @pending.

This causes stopper list corruption.

Add set_affinity_pending::stop_pending, to indicate if a stopper is in
progress.

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: [email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Paul Gortmaker <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
---
 kernel/sched/core.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9cbe12d8c5bd..20588a59300d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1900,6 +1900,7 @@ struct migration_arg {
 
 struct set_affinity_pending {
 	refcount_t		refs;
+	unsigned int		stop_pending;
 	struct completion	done;
 	struct cpu_stop_work	stop_work;
 	struct migration_arg	arg;
@@ -2018,12 +2019,15 @@ static int migration_cpu_stop(void *data)
 		 * determine is_migration_disabled() and so have to chase after
 		 * it.
 		 */
+		WARN_ON_ONCE(!pending->stop_pending);
 		task_rq_unlock(rq, p, &rf);
 		stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
 				    &pending->arg, &pending->stop_work);
 		return 0;
 	}
 out:
+	if (pending)
+		pending->stop_pending = false;
 	task_rq_unlock(rq, p, &rf);
 
 	if (complete)
@@ -2219,7 +2223,7 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 			    int dest_cpu, unsigned int flags)
 {
 	struct set_affinity_pending my_pending = { }, *pending = NULL;
-	bool complete = false;
+	bool stop_pending, complete = false;
 
 	/* Can the task run on the task's current CPU? If so, we're done */
 	if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) {
@@ -2292,14 +2296,19 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		 * anything else we cannot do is_migration_disabled(), punt
 		 * and have the stopper function handle it all race-free.
 		 */
+		stop_pending = pending->stop_pending;
+		if (!stop_pending)
+			pending->stop_pending = true;
 
 		refcount_inc(&pending->refs); /* pending->{arg,stop_work} */
 		if (flags & SCA_MIGRATE_ENABLE)
 			p->migration_flags &= ~MDF_PUSH;
 		task_rq_unlock(rq, p, rf);
 
-		stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
-				    &pending->arg, &pending->stop_work);
+		if (!stop_pending) {
+			stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
+					    &pending->arg, &pending->stop_work);
+		}
 
 		if (flags & SCA_MIGRATE_ENABLE)
 			return 0;
--
2.30.2
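
For readers following the patch, here is a rough userspace sketch of the
guard it adds. It is not kernel code: struct work_item, struct work_queue,
struct pending_req, queue_push(), submit_once() and complete_one() are all
hypothetical stand-ins, and the locking that protects the flag in the kernel
is only hinted at in a comment. It is meant to show why queueing the same
embedded work item twice is harmful, and how a "stopper in progress" flag can
prevent that.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cpu_stop_work. */
struct work_item {
	struct work_item *next;
};

/* Hypothetical stand-in for struct set_affinity_pending. */
struct pending_req {
	unsigned int stop_pending;	/* mirrors the field added by the patch */
	struct work_item stop_work;
};

/* Hypothetical stand-in for a stopper's work list. */
struct work_queue {
	struct work_item *head;
	unsigned int len;
};

/*
 * Naive list push. Pushing a node that is already at the head makes it
 * point at itself, which is the kind of list corruption the commit
 * message describes.
 */
static void queue_push(struct work_queue *q, struct work_item *w)
{
	w->next = q->head;
	q->head = w;
	q->len++;
}

/*
 * Guarded submit, following the shape of the affine_move_task() hunk:
 * sample the flag, set it if it was clear, and only queue the work when
 * this caller was the one that set it. Returns true if work was queued.
 */
static bool submit_once(struct work_queue *q, struct pending_req *p)
{
	unsigned int stop_pending = p->stop_pending;

	if (!stop_pending)
		p->stop_pending = true;

	/* In the kernel the rq lock is dropped at this point. */

	if (!stop_pending) {
		queue_push(q, &p->stop_work);
		return true;
	}
	return false;
}

/*
 * Completion side, following the "out:" path of migration_cpu_stop():
 * once the work has run, clear the flag so a later request can queue it
 * again. This sketch assumes the item sits at the head of the list.
 */
static void complete_one(struct work_queue *q, struct pending_req *p)
{
	if (q->head == &p->stop_work) {
		q->head = p->stop_work.next;
		q->len--;
	}
	p->stop_pending = false;
}
```

With this guard, two back-to-back submit_once() calls on the same request
queue the work item only once; the second call becomes a no-op until
complete_one() has cleared the flag.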
Hi!
> 5.10.47-rt46-rc1 stable review patch.
> If anyone has any objections, please let me know.
>
> Add set_affinity_pending::stop_pending, to indicate if a stopper is in
> progress.
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9cbe12d8c5bd..20588a59300d 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1900,6 +1900,7 @@ struct migration_arg {
>
> struct set_affinity_pending {
> refcount_t refs;
> + unsigned int stop_pending;
> struct completion done;
> struct cpu_stop_work stop_work;
> struct migration_arg arg;
For better readability, this should be bool, AFAICT.
> * and have the stopper function handle it all race-free.
> */
> + stop_pending = pending->stop_pending;
> + if (!stop_pending)
> + pending->stop_pending = true;
>
...because it is used as bool.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On 25/07/21 07:03, Pavel Machek wrote:
> Hi!
>
>> 5.10.47-rt46-rc1 stable review patch.
>> If anyone has any objections, please let me know.
>>
>> Add set_affinity_pending::stop_pending, to indicate if a stopper is in
>> progress.
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 9cbe12d8c5bd..20588a59300d 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1900,6 +1900,7 @@ struct migration_arg {
>>
>> struct set_affinity_pending {
>> refcount_t refs;
>> + unsigned int stop_pending;
>> struct completion done;
>> struct cpu_stop_work stop_work;
>> struct migration_arg arg;
>
> For better readability, this should be bool, AFAICT.
>
It's intentionally declared as an int. sizeof(_Bool) is implementation-defined,
so you can't sanely reason about struct layout.

There have been quite a few threads about this already; a quick search on lore
gave me:
https://lore.kernel.org/lkml/[email protected]/
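
As a small illustration of the layout point above, the sketch below compares
a flag declared as bool with one declared as unsigned int. The two structs
are hypothetical and are not the kernel's set_affinity_pending; the helper
names are made up for this example. The C standard only requires _Bool to be
able to hold 0 and 1, so the values these helpers return depend on the
compiler and ABI in use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical struct with the flag declared as bool. */
struct flag_as_bool {
	int	refs;
	bool	stop_pending;	/* width is whatever the ABI picks for _Bool */
	long	done;
};

/* Hypothetical struct with the flag declared as unsigned int. */
struct flag_as_uint {
	int		refs;
	unsigned int	stop_pending;
	long		done;
};

/* Size of the flag member in the bool variant, as seen by this compiler. */
static size_t bool_flag_size(void)
{
	return sizeof(((struct flag_as_bool *)0)->stop_pending);
}

/* Size of the flag member in the unsigned int variant. */
static size_t uint_flag_size(void)
{
	return sizeof(((struct flag_as_uint *)0)->stop_pending);
}

/* Offset of the member following the flag in each variant. */
static size_t bool_next_offset(void)
{
	return offsetof(struct flag_as_bool, done);
}

static size_t uint_next_offset(void)
{
	return offsetof(struct flag_as_uint, done);
}
```

Printing these four values on a given toolchain shows how much space the
flag occupies and how much padding ends up before the next member in each
variant.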
[Re: [PATCH RT 5/8] sched: Fix affine_move_task() self-concurrency] On 25/07/2021 (Sun 07:03) Pavel Machek wrote:
> Hi!
>
> > 5.10.47-rt46-rc1 stable review patch.
> > If anyone has any objections, please let me know.
> >
> > Add set_affinity_pending::stop_pending, to indicate if a stopper is in
> > progress.
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 9cbe12d8c5bd..20588a59300d 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1900,6 +1900,7 @@ struct migration_arg {
> >
> > struct set_affinity_pending {
> > refcount_t refs;
> > + unsigned int stop_pending;
> > struct completion done;
> > struct cpu_stop_work stop_work;
> > struct migration_arg arg;
>
> For better readability, this should be bool, AFAICT.
Maybe you missed it in the context you deleted, but this is a mainline
backport to stable-rt, and hence is not the time or place to be
injecting stylistic comments. Just like gregKH's stable tree, backports
are kept as "faithful" to the original as possible unless the older
surrounding code base forces some kind of alteration out of necessity.
Thanks,
Paul.