2022-06-23 19:25:26

by Schspa Shi

[permalink] [raw]
Subject: [PATCH] sched: fix bad task migration for rt tasks

There is a race scenarios, which allows a migration disabled task to
be migrated to another CPU.

When there is a RT task with higher priority, rt sched class was
intended to migrate higher priority task to lowest rq via push_rt_tasks,
this BUG will happen here.

With the system running on PREEMPT_RT, rt_spin_lock will disable
migration, this will make the problem easier to reproduce.

I have seen this crash on PREEMPT_RT, from the logs, there is a race
when trying to migrate higher priority tasks to the lowest rq.

Please refer to the following scenarios.

CPU0 CPU1
------------------------------------------------------------------
push_rt_task
check is_migration_disabled(next_task)
task not running and
migration_disabled == 0
find_lock_lowest_rq(next_task, rq);
_double_lock_balance(this_rq, busiest);
raw_spin_rq_unlock(this_rq);
double_rq_lock(this_rq, busiest);
<<wait for busiest rq>>
<wakeup>
task become running
migrate_disable();
<context out>
deactivate_task(rq, next_task, 0);
set_task_cpu(next_task, lowest_rq->cpu);
WARN_ON_ONCE(is_migration_disabled(p));
---------OOPS-------------

Crash logs as fellowing:
[123671.996430] WARNING: CPU: 2 PID: 13470 at kernel/sched/core.c:2485
set_task_cpu+0x8c/0x108
[123671.996800] pstate: 20400009 (nzCv daif +PAN -UAO -TCO BTYPE=--)
[123671.996811] pc : set_task_cpu+0x8c/0x108
[123671.996820] lr : set_task_cpu+0x7c/0x108
[123671.996828] sp : ffff80001268bd30
[123671.996832] pmr_save: 00000060
[123671.996835] x29: ffff80001268bd30 x28: ffff0001a3d68e80
[123671.996844] x27: ffff80001225f4a8 x26: ffff800010ab62cb
[123671.996854] x25: ffff80026d95e000 x24: 0000000000000005
[123671.996864] x23: ffff00019746c1b0 x22: 0000000000000000
[123671.996873] x21: ffff00027ee33a80 x20: 0000000000000000
[123671.996882] x19: ffff00019746ba00 x18: 0000000000000000
[123671.996890] x17: 0000000000000000 x16: 0000000000000000
[123671.996899] x15: 000000000000000a x14: 000000000000349e
[123671.996908] x13: ffff800012f4503d x12: 0000000000000001
[123671.996916] x11: 0000000000000000 x10: 0000000000000000
[123671.996925] x9 : 00000000000c0000 x8 : ffff00027ee58700
[123671.996933] x7 : ffff00027ee8da80 x6 : ffff00027ee8e580
[123671.996942] x5 : ffff00027ee8dcc0 x4 : 0000000000000005
[123671.996951] x3 : ffff00027ee8e338 x2 : 0000000000000000
[123671.996959] x1 : 00000000000000ff x0 : 0000000000000002
[123671.996969] Call trace:
[123671.996975] set_task_cpu+0x8c/0x108
[123671.996984] push_rt_task.part.0+0x144/0x184
[123671.996995] push_rt_tasks+0x28/0x3c
[123671.997002] task_woken_rt+0x58/0x68
[123671.997009] ttwu_do_wakeup+0x5c/0xd0
[123671.997019] ttwu_do_activate+0xc0/0xd4
[123671.997028] try_to_wake_up+0x244/0x288
[123671.997036] wake_up_process+0x18/0x24
[123671.997045] __irq_wake_thread+0x64/0x80
[123671.997056] __handle_irq_event_percpu+0x110/0x124
[123671.997064] handle_irq_event_percpu+0x50/0xac
[123671.997072] handle_irq_event+0x84/0xfc

To fix it, we need to check migration_disabled flag again to avoid
bad migration.

Signed-off-by: Schspa Shi <[email protected]>
---
kernel/sched/deadline.c | 3 ++-
kernel/sched/rt.c | 5 ++++-
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b5152961b743..4424f799f6d3 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2238,7 +2238,8 @@ static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
!cpumask_test_cpu(later_rq->cpu, &task->cpus_mask) ||
task_running(rq, task) ||
!dl_task(task) ||
- !task_on_rq_queued(task))) {
+ !task_on_rq_queued(task)) ||
+ is_migration_disabled(task)) {
double_unlock_balance(rq, later_rq);
later_rq = NULL;
break;
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 8c9ed9664840..3bcd169fed0e 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1998,12 +1998,15 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
* the mean time, task could have
* migrated already or had its affinity changed.
* Also make sure that it wasn't scheduled on its rq.
+ * It is possible the task has running for a while,
+ * And we check task migration disable flag again.
*/
if (unlikely(task_rq(task) != rq ||
!cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
task_running(rq, task) ||
!rt_task(task) ||
- !task_on_rq_queued(task))) {
+ !task_on_rq_queued(task) ||
+ is_migration_disabled(task))) {

double_unlock_balance(rq, lowest_rq);
lowest_rq = NULL;
--
2.24.3 (Apple Git-128)


2022-06-24 11:16:16

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH] sched: fix bad task migration for rt tasks

On 24/06/22 02:29, Schspa Shi wrote:
> @@ -1998,12 +1998,15 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
> * the mean time, task could have
> * migrated already or had its affinity changed.
> * Also make sure that it wasn't scheduled on its rq.
> + * It is possible the task has running for a while,
> + * And we check task migration disable flag again.
> */
> if (unlikely(task_rq(task) != rq ||
> !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||

cf. 95158a89dd50 ("sched,rt: Use the full cpumask for balancing"), this
made sense to me back then but not so much anymore... Shouldn't this have
remained a ->cpus_ptr check?

I'm going to revisit that commit, I evicted too much of it.

> task_running(rq, task) ||
> !rt_task(task) ||
> - !task_on_rq_queued(task))) {
> + !task_on_rq_queued(task) ||
> + is_migration_disabled(task))) {
>
> double_unlock_balance(rq, lowest_rq);
> lowest_rq = NULL;
> --
> 2.24.3 (Apple Git-128)

2022-06-25 12:34:50

by Schspa Shi

[permalink] [raw]
Subject: Re: [PATCH] sched: fix bad task migration for rt tasks

Valentin Schneider <[email protected]> writes:

> On 24/06/22 02:29, Schspa Shi wrote:
>> @@ -1998,12 +1998,15 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
>> * the mean time, task could have
>> * migrated already or had its affinity changed.
>> * Also make sure that it wasn't scheduled on its rq.
>> + * It is possible the task has running for a while,
>> + * And we check task migration disable flag again.
>> */
>> if (unlikely(task_rq(task) != rq ||
>> !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
>
> cf. 95158a89dd50 ("sched,rt: Use the full cpumask for balancing"), this
> made sense to me back then but not so much anymore... Shouldn't this have
> remained a ->cpus_ptr check?
>

It seems it is for the following scenarios:
It allows the higher priority non migratable task to be sched quickly
by migrating the current task to
another CPU with lower priority.

Considering this, we should retry for this. rather than return with no
lower priority rq.

I can upload a new patchset for this handing.

Please refers to the fellowing code:
if (is_migration_disabled(next_task)) {
struct task_struct *push_task = NULL;
int cpu;

if (!pull || rq->push_busy)
return 0;

/*
* Invoking find_lowest_rq() on anything but an RT task doesn't
* make sense. Per the above priority check, curr has to
* be of higher priority than next_task, so no need to
* reschedule when bailing out.
*
* Note that the stoppers are masqueraded as SCHED_FIFO
* (cf. sched_set_stop_task()), so we can't rely on rt_task().
*/
if (rq->curr->sched_class != &rt_sched_class)
return 0;

cpu = find_lowest_rq(rq->curr);
if (cpu == -1 || cpu == rq->cpu)
return 0;

/*
* Given we found a CPU with lower priority than @next_task,
* therefore it should be running. However we cannot migrate it
* to this other CPU, instead attempt to push the current
* running task on this CPU away.
*/
push_task = get_push_task(rq);
if (push_task) {
raw_spin_rq_unlock(rq);
stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
push_task, &rq->push_work);
raw_spin_rq_lock(rq);
}

return 0;

}

> I'm going to revisit that commit, I evicted too much of it.
>
>> task_running(rq, task) ||
>> !rt_task(task) ||
>> - !task_on_rq_queued(task))) {
>> + !task_on_rq_queued(task) ||
>> + is_migration_disabled(task))) {
>>
>> double_unlock_balance(rq, lowest_rq);
>> lowest_rq = NULL;
>> --
>> 2.24.3 (Apple Git-128)

--

BRs
Schspa Shi

2022-07-01 07:29:29

by Schspa Shi

[permalink] [raw]
Subject: Re: [PATCH] sched: fix bad task migration for rt tasks

Valentin Schneider <[email protected]> writes:

> On 24/06/22 02:29, Schspa Shi wrote:
>> @@ -1998,12 +1998,15 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
>> * the mean time, task could have
>> * migrated already or had its affinity changed.
>> * Also make sure that it wasn't scheduled on its rq.
>> + * It is possible the task has running for a while,
>> + * And we check task migration disable flag again.
>> */
>> if (unlikely(task_rq(task) != rq ||
>> !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
>
> cf. 95158a89dd50 ("sched,rt: Use the full cpumask for balancing"), this
> made sense to me back then but not so much anymore... Shouldn't this have
> remained a ->cpus_ptr check?
>

This cpus_ptr check seems to be somehow same with is_migration_disabled(task).

> I'm going to revisit that commit, I evicted too much of it.
>

Any further progress on this?

We have tested this patch for a weak, It seems the problems have gone
(It will reproduce 2-3 times per week usually).

I have send a V2 patch too, it task the other case into consideration
(Going to retry to push the current running task on this CPU away,
instead doing nothing for this migrate disabled task.).

Link: https://lore.kernel.org/all/[email protected]/


>> task_running(rq, task) ||
>> !rt_task(task) ||
>> - !task_on_rq_queued(task))) {
>> + !task_on_rq_queued(task) ||
>> + is_migration_disabled(task))) {
>>
>> double_unlock_balance(rq, lowest_rq);
>> lowest_rq = NULL;
>> --
>> 2.24.3 (Apple Git-128)

--
Schspa Shi
BRs