2022-06-27 15:54:46

by Schspa Shi

[permalink] [raw]
Subject: [PATCH v2] sched/rt: fix bad task migration for rt tasks

Commit 95158a89dd50 ("sched,rt: Use the full cpumask for balancing")
allow find_lock_lowest_rq to pick a task with migration disabled.
This commit is intended to push the current running task on this CPU
away.

There is a race scenarios, which allows a migration disabled task to
be migrated to another CPU.

When there is a RT task with higher priority, rt sched class was
intended to migrate higher priority task to lowest rq via push_rt_tasks,
this BUG will happen here.

With the system running on PREEMPT_RT, rt_spin_lock will disable
migration, this will make the problem easier to reproduce.

I have seen this crash on PREEMPT_RT, from the logs, there is a race
when trying to migrate higher priority tasks to the lowest rq.

Please refer to the following scenarios.

CPU0 CPU1
------------------------------------------------------------------
push_rt_task
check is_migration_disabled(next_task)
task not running and
migration_disabled == 0
find_lock_lowest_rq(next_task, rq);
_double_lock_balance(this_rq, busiest);
raw_spin_rq_unlock(this_rq);
double_rq_lock(this_rq, busiest);
<<wait for busiest rq>>
<wakeup>
task become running
migrate_disable();
<context out>
deactivate_task(rq, next_task, 0);
set_task_cpu(next_task, lowest_rq->cpu);
WARN_ON_ONCE(is_migration_disabled(p));
---------OOPS-------------

Crash logs as fellowing:
[123671.996430] WARNING: CPU: 2 PID: 13470 at kernel/sched/core.c:2485
set_task_cpu+0x8c/0x108
[123671.996800] pstate: 20400009 (nzCv daif +PAN -UAO -TCO BTYPE=--)
[123671.996811] pc : set_task_cpu+0x8c/0x108
[123671.996820] lr : set_task_cpu+0x7c/0x108
[123671.996828] sp : ffff80001268bd30
[123671.996832] pmr_save: 00000060
[123671.996835] x29: ffff80001268bd30 x28: ffff0001a3d68e80
[123671.996844] x27: ffff80001225f4a8 x26: ffff800010ab62cb
[123671.996854] x25: ffff80026d95e000 x24: 0000000000000005
[123671.996864] x23: ffff00019746c1b0 x22: 0000000000000000
[123671.996873] x21: ffff00027ee33a80 x20: 0000000000000000
[123671.996882] x19: ffff00019746ba00 x18: 0000000000000000
[123671.996890] x17: 0000000000000000 x16: 0000000000000000
[123671.996899] x15: 000000000000000a x14: 000000000000349e
[123671.996908] x13: ffff800012f4503d x12: 0000000000000001
[123671.996916] x11: 0000000000000000 x10: 0000000000000000
[123671.996925] x9 : 00000000000c0000 x8 : ffff00027ee58700
[123671.996933] x7 : ffff00027ee8da80 x6 : ffff00027ee8e580
[123671.996942] x5 : ffff00027ee8dcc0 x4 : 0000000000000005
[123671.996951] x3 : ffff00027ee8e338 x2 : 0000000000000000
[123671.996959] x1 : 00000000000000ff x0 : 0000000000000002
[123671.996969] Call trace:
[123671.996975] set_task_cpu+0x8c/0x108
[123671.996984] push_rt_task.part.0+0x144/0x184
[123671.996995] push_rt_tasks+0x28/0x3c
[123671.997002] task_woken_rt+0x58/0x68
[123671.997009] ttwu_do_wakeup+0x5c/0xd0
[123671.997019] ttwu_do_activate+0xc0/0xd4
[123671.997028] try_to_wake_up+0x244/0x288
[123671.997036] wake_up_process+0x18/0x24
[123671.997045] __irq_wake_thread+0x64/0x80
[123671.997056] __handle_irq_event_percpu+0x110/0x124
[123671.997064] handle_irq_event_percpu+0x50/0xac
[123671.997072] handle_irq_event+0x84/0xfc

To fix it, we need to check migration_disabled flag again and retry
to retry to push the current running task on this CPU away.

Fixes: 95158a89dd50 ("sched,rt: Use the full cpumask for balancing")
Signed-off-by: Schspa Shi <[email protected]>

--

Changelog:
v1 -> v2:
- Modify commit message to add fixed commit information.
- Going to retry to push the current running task on this CPU
away, instead doing nothing for this migrate disabled task.

---
kernel/sched/core.c | 6 ++++++
kernel/sched/rt.c | 9 +++++++++
2 files changed, 15 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index da0bf6fe9ecd..3003baa28d2d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2512,6 +2512,11 @@ int push_cpu_stop(void *arg)
if (!lowest_rq)
goto out_unlock;

+ if (unlikely(is_migration_disabled(p))) {
+ p->migration_flags |= MDF_PUSH;
+ goto out_double_unlock;
+ }
+
// XXX validate p is still the highest prio task
if (task_rq(p) == rq) {
deactivate_task(rq, p, 0);
@@ -2520,6 +2525,7 @@ int push_cpu_stop(void *arg)
resched_curr(lowest_rq);
}

+out_double_unlock:
double_unlock_balance(rq, lowest_rq);

out_unlock:
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 8c9ed9664840..6a1efccdc122 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2115,6 +2115,15 @@ static int push_rt_task(struct rq *rq, bool pull)
if (WARN_ON(next_task == rq->curr))
return 0;

+ /*
+ * It is possible the task has running for a while, we need to check
+ * task migration disable flag again. If task migration is disabled,
+ * the retry code will retry to push the current running task on this
+ * CPU away.
+ */
+ if (unlikely(is_migration_disabled(next_task)))
+ goto retry;
+
/* We might release rq lock */
get_task_struct(next_task);

--
2.24.3 (Apple Git-128)


2022-07-01 11:01:22

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH v2] sched/rt: fix bad task migration for rt tasks

On 27/06/22 23:40, Schspa Shi wrote:
> @@ -2115,6 +2115,15 @@ static int push_rt_task(struct rq *rq, bool pull)
> if (WARN_ON(next_task == rq->curr))
> return 0;
>
> + /*
> + * It is possible the task has running for a while, we need to check
> + * task migration disable flag again. If task migration is disabled,
> + * the retry code will retry to push the current running task on this
> + * CPU away.
> + */
> + if (unlikely(is_migration_disabled(next_task)))
> + goto retry;
> +

Can we ever hit this? The previous is_migration_disabled() check is in the
same rq->lock segment.

AFAIA this doesn't fix the problem v1 was fixing, which is next_task can
become migrate_disable() after push_rt_task() goes through
find_lock_lowest_rq().

For the task to still be in the pushable_tasks list after having made
itself migration disabled, it must no longer be current, which means we
enqueued a higher priority RT task, in which case we went through
set_next_task_rt() so we did rt_queue_push_tasks().

So I think what you had in v1 was actually what we needed.

> /* We might release rq lock */
> get_task_struct(next_task);
>
> --
> 2.24.3 (Apple Git-128)

2022-07-01 12:55:29

by Schspa Shi

[permalink] [raw]
Subject: Re: [PATCH v2] sched/rt: fix bad task migration for rt tasks

Valentin Schneider <[email protected]> writes:

> On 27/06/22 23:40, Schspa Shi wrote:
>> @@ -2115,6 +2115,15 @@ static int push_rt_task(struct rq *rq, bool pull)
>> if (WARN_ON(next_task == rq->curr))
>> return 0;
>>
>> + /*
>> + * It is possible the task has running for a while, we need to check
>> + * task migration disable flag again. If task migration is disabled,
>> + * the retry code will retry to push the current running task on this
>> + * CPU away.
>> + */
>> + if (unlikely(is_migration_disabled(next_task)))
>> + goto retry;
>> +
>
> Can we ever hit this? The previous is_migration_disabled() check is in the
> same rq->lock segment.

Ahh, I'm sorry, I add this to the wrong place, It should be in front of
deactivate_task(rq, next_task, 0);
Sorry for this mistake.

>
> AFAIA this doesn't fix the problem v1 was fixing, which is next_task can
> become migrate_disable() after push_rt_task() goes through
> find_lock_lowest_rq().
>

Something in the following should fix it.

put_task_struct(next_task);
next_task = task;
goto retry;
}

if (unlikely(is_migration_disabled(next_task))) {
put_task_struct(next_task);
goto retry;
}

deactivate_task(rq, next_task, 0);

> For the task to still be in the pushable_tasks list after having made
> itself migration disabled, it must no longer be current, which means we
> enqueued a higher priority RT task, in which case we went through
> set_next_task_rt() so we did rt_queue_push_tasks().

The current task may not have a higher priority, maybe a process of
the same priority preempted the migration disabled task.

In this case, we still have the opportunity to make this migration
disabled task execute faster by migrating the higher priority task
to other CPUs. And this is what the commit
95158a89dd50 ("sched,rt: Use the full cpumask for balancing")
and
1beec5b55060 ("sched: Fix migrate_disable() vs rt/dl balancing")
doing.

Considering this, the V1 patch is not the best solution, and I send
this V2 patch (although there is a misplaced bug here).

Or can we ignore this small possibility?

>
> So I think what you had in v1 was actually what we needed.
>

Yes, v1 is the patch I have tested for a week, V2 hasn't done this
long time.


>> /* We might release rq lock */
>> get_task_struct(next_task);
>>
>> --
>> 2.24.3 (Apple Git-128)

--
Schspa Shi
BRs