Subject: [PATCH RT] sched: migrate_enable: Busy loop until the migration request is completed

If a user task changes the CPU affinity mask of a running task, it will
dispatch a migration request if the current CPU is no longer allowed.
This might happen shortly before a task enters a migrate_disable()
section. Upon leaving the migrate_disable() section, the task will
notice that the current CPU is no longer allowed and will dispatch its
own migration request to move it off the current CPU.
While invoking __schedule() the first migration request will be
processed and the task returns on the "new" CPU with "arg.done = 0". Its
own migration request will be processed shortly after and will result in
memory corruption if the stack memory holding the request has been
reused for something else in the meantime.

Spin until the migration request has been processed if it was accepted.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
kernel/sched/core.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8bea013b2baf5..5c7be96ca68c4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8227,7 +8227,7 @@ void migrate_enable(void)

WARN_ON(smp_processor_id() != cpu);
if (!is_cpu_allowed(p, cpu)) {
- struct migration_arg arg = { p };
+ struct migration_arg arg = { .task = p };
struct cpu_stop_work work;
struct rq_flags rf;

@@ -8239,7 +8239,10 @@ void migrate_enable(void)
stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
&arg, &work);
__schedule(true);
- WARN_ON_ONCE(!arg.done && !work.disabled);
+ if (!work.disabled) {
+ while (!arg.done)
+ cpu_relax();
+ }
}

out:
--
2.24.0


2019-12-13 06:45:32

by Crystal Wood

Subject: Re: [PATCH RT] sched: migrate_enable: Busy loop until the migration request is completed

On Thu, 2019-12-12 at 12:27 +0100, Sebastian Andrzej Siewior wrote:
> If a user task changes the CPU affinity mask of a running task, it will
> dispatch a migration request if the current CPU is no longer allowed.
> This might happen shortly before a task enters a migrate_disable()
> section. Upon leaving the migrate_disable() section, the task will
> notice that the current CPU is no longer allowed and will dispatch its
> own migration request to move it off the current CPU.
> While invoking __schedule() the first migration request will be
> processed and the task returns on the "new" CPU with "arg.done = 0". Its
> own migration request will be processed shortly after and will result in
> memory corruption if the stack memory holding the request has been
> reused for something else in the meantime.

Ugh.

> Spin until the migration request has been processed if it was accepted.
>
> Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
> ---
> kernel/sched/core.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 8bea013b2baf5..5c7be96ca68c4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8227,7 +8227,7 @@ void migrate_enable(void)
>
> WARN_ON(smp_processor_id() != cpu);
> if (!is_cpu_allowed(p, cpu)) {
> - struct migration_arg arg = { p };
> + struct migration_arg arg = { .task = p };
> struct cpu_stop_work work;
> struct rq_flags rf;
>
> @@ -8239,7 +8239,10 @@ void migrate_enable(void)
> stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
> &arg, &work);
> __schedule(true);
> - WARN_ON_ONCE(!arg.done && !work.disabled);
> + if (!work.disabled) {
> + while (!arg.done)
> + cpu_relax();
> + }

We should enable preemption while spinning -- besides the general badness
of spinning with it disabled, there could be deadlock scenarios if
multiple CPUs are spinning in such a loop. Long term maybe have a way to
dequeue the no-longer-needed work instead of waiting.

-Scott

Subject: Re: [PATCH RT] sched: migrate_enable: Busy loop until the migration request is completed

On 2019-12-13 00:44:22 [-0600], Scott Wood wrote:
> > @@ -8239,7 +8239,10 @@ void migrate_enable(void)
> > stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
> > &arg, &work);
> > __schedule(true);
> > - WARN_ON_ONCE(!arg.done && !work.disabled);
> > + if (!work.disabled) {
> > + while (!arg.done)
> > + cpu_relax();
> > + }
>
> We should enable preemption while spinning -- besides the general badness
> of spinning with it disabled, there could be deadlock scenarios if
> multiple CPUs are spinning in such a loop. Long term maybe have a way to
> dequeue the no-longer-needed work instead of waiting.

Hmm. My plan was to use per-CPU memory and spin before the request is
enqueued if the previous one isn't done yet (which should not happen™).
Then we could remove __schedule() here and rely on preempt_enable()
doing that. With that change we wouldn't care about migrate-disable
level vs preempt-disable level and could drop the hacks we have in futex
code for instance (where we do an extra migrate_disable() in advance so
that the levels balance out later).

> -Scott

Sebastian

2020-01-22 21:16:33

by Crystal Wood

Subject: Re: [PATCH RT] sched: migrate_enable: Busy loop until the migration request is completed

On Fri, 2019-12-13 at 09:14 +0100, Sebastian Andrzej Siewior wrote:
> On 2019-12-13 00:44:22 [-0600], Scott Wood wrote:
> > > @@ -8239,7 +8239,10 @@ void migrate_enable(void)
> > > stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
> > > &arg, &work);
> > > __schedule(true);
> > > - WARN_ON_ONCE(!arg.done && !work.disabled);
> > > + if (!work.disabled) {
> > > + while (!arg.done)
> > > + cpu_relax();
> > > + }
> >
> > We should enable preemption while spinning -- besides the general
> > badness
> > of spinning with it disabled, there could be deadlock scenarios if
> > multiple CPUs are spinning in such a loop. Long term maybe have a way
> > to
> > dequeue the no-longer-needed work instead of waiting.
>
> Hmm. My plan was to use per-CPU memory and spin before the request is
> enqueued if the previous isn't done yet (which should not happen™).

Either it can't happen (and thus no need to spin) or it can, and we need to
worry about deadlocks if we're spinning with preemption disabled. In fact a
deadlock is guaranteed if we're spinning with preemption disabled on the cpu
that's supposed to be running the stopper we're waiting on.

I think you're right that it can't happen though (as long as we queue it
before enabling preemption, the stopper will be runnable and nothing else
can run on the cpu before the queue gets drained), so we can just make it a
warning. I'm testing a patch now.

> Then we could remove __schedule() here and rely on preempt_enable()
> doing that.

We could do that regardless.

-Scott