Commit 95158a89dd50 ("sched,rt: Use the full cpumask for balancing")
allows find_lock_lowest_rq() to pick a task with migration disabled.
That commit is intended to push the task currently running on this
CPU away.
There is a race scenario that allows a migration-disabled task to be
migrated to another CPU.
When there is an RT task with higher priority, the RT sched class
tries to migrate the higher-priority task to the lowest rq via
push_rt_tasks(); this is where the bug occurs.
On a PREEMPT_RT system, rt_spin_lock() disables migration, which makes
the problem easier to reproduce.
I have seen this crash on PREEMPT_RT. The logs show a race when trying
to migrate higher-priority tasks to the lowest rq. Please refer to the
following scenario.
CPU0                                      CPU1
------------------------------------------------------------------
push_rt_task
  check is_migration_disabled(next_task)
                                          task not running and
                                          migration_disabled == 0
  find_lock_lowest_rq(next_task, rq);
    _double_lock_balance(this_rq, busiest);
      raw_spin_rq_unlock(this_rq);
      double_rq_lock(this_rq, busiest);
        <<wait for busiest rq>>
                                              <wakeup>
                                          task becomes running
                                          migrate_disable();
                                            <context out>
  deactivate_task(rq, next_task, 0);
  set_task_cpu(next_task, lowest_rq->cpu);
    WARN_ON_ONCE(is_migration_disabled(p));
      ---------OOPS-------------
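The interleaving above is a time-of-check/time-of-use race: the
migration-disabled check is made before _double_lock_balance() drops
this_rq's lock, and nothing re-checks it afterwards. Below is a minimal,
single-threaded C sketch of that window. struct task, struct rq,
other_side() and run_racy_push() are hypothetical simplifications, not
kernel code, and the lock drop is modelled by calling a hook at the
point where the real code waits for the busiest rq lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for task_struct and rq. */
struct task {
	int migration_disabled;	/* nesting count bumped by migrate_disable() */
	bool running;
	int cpu;
};

struct rq {
	int cpu;
	bool locked;
};

static int warn_count;

static bool is_migration_disabled(const struct task *p)
{
	return p->migration_disabled != 0;
}

/* Models set_task_cpu(): counts a warning if a pinned task is moved. */
static void set_task_cpu(struct task *p, int cpu)
{
	if (is_migration_disabled(p))
		warn_count++;	/* stands in for the WARN_ON_ONCE() in the log */
	p->cpu = cpu;
}

/* The CPU1 column: the task runs, calls migrate_disable(), is preempted. */
static void other_side(struct task *p)
{
	p->running = true;
	p->migration_disabled++;
	p->running = false;
}

/*
 * Models _double_lock_balance(): this_rq's lock is released, the other
 * side may run, then the locks are held again.
 */
static void double_lock_balance(struct rq *this_rq, struct task *p,
				void (*hook)(struct task *))
{
	assert(this_rq->locked);
	this_rq->locked = false;	/* raw_spin_rq_unlock(this_rq) */
	if (hook)
		hook(p);		/* <<wait for busiest rq>> */
	this_rq->locked = true;		/* double_rq_lock(this_rq, busiest) */
}

/* Racy push: trusts the check made before the lock was dropped. */
static int run_racy_push(void (*hook)(struct task *))
{
	struct rq this_rq = { .cpu = 0, .locked = true };
	struct rq lowest_rq = { .cpu = 1, .locked = false };
	struct task t = { .migration_disabled = 0, .running = false, .cpu = 0 };

	warn_count = 0;
	if (is_migration_disabled(&t) || t.running)
		return warn_count;	/* the only check, made before unlocking */

	double_lock_balance(&this_rq, &t, hook);
	set_task_cpu(&t, lowest_rq.cpu);	/* no re-check after relocking */
	return warn_count;
}
```

In this model, run_racy_push(other_side) moves the task even though it
is pinned by then and records one warning, while run_racy_push(NULL),
with no interleaving, moves it without a warning.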
The crash log is as follows:
[123671.996430] WARNING: CPU: 2 PID: 13470 at kernel/sched/core.c:2485 set_task_cpu+0x8c/0x108
[123671.996800] pstate: 20400009 (nzCv daif +PAN -UAO -TCO BTYPE=--)
[123671.996811] pc : set_task_cpu+0x8c/0x108
[123671.996820] lr : set_task_cpu+0x7c/0x108
[123671.996828] sp : ffff80001268bd30
[123671.996832] pmr_save: 00000060
[123671.996835] x29: ffff80001268bd30 x28: ffff0001a3d68e80
[123671.996844] x27: ffff80001225f4a8 x26: ffff800010ab62cb
[123671.996854] x25: ffff80026d95e000 x24: 0000000000000005
[123671.996864] x23: ffff00019746c1b0 x22: 0000000000000000
[123671.996873] x21: ffff00027ee33a80 x20: 0000000000000000
[123671.996882] x19: ffff00019746ba00 x18: 0000000000000000
[123671.996890] x17: 0000000000000000 x16: 0000000000000000
[123671.996899] x15: 000000000000000a x14: 000000000000349e
[123671.996908] x13: ffff800012f4503d x12: 0000000000000001
[123671.996916] x11: 0000000000000000 x10: 0000000000000000
[123671.996925] x9 : 00000000000c0000 x8 : ffff00027ee58700
[123671.996933] x7 : ffff00027ee8da80 x6 : ffff00027ee8e580
[123671.996942] x5 : ffff00027ee8dcc0 x4 : 0000000000000005
[123671.996951] x3 : ffff00027ee8e338 x2 : 0000000000000000
[123671.996959] x1 : 00000000000000ff x0 : 0000000000000002
[123671.996969] Call trace:
[123671.996975] set_task_cpu+0x8c/0x108
[123671.996984] push_rt_task.part.0+0x144/0x184
[123671.996995] push_rt_tasks+0x28/0x3c
[123671.997002] task_woken_rt+0x58/0x68
[123671.997009] ttwu_do_wakeup+0x5c/0xd0
[123671.997019] ttwu_do_activate+0xc0/0xd4
[123671.997028] try_to_wake_up+0x244/0x288
[123671.997036] wake_up_process+0x18/0x24
[123671.997045] __irq_wake_thread+0x64/0x80
[123671.997056] __handle_irq_event_percpu+0x110/0x124
[123671.997064] handle_irq_event_percpu+0x50/0xac
[123671.997072] handle_irq_event+0x84/0xfc
To fix it, check the task's migration-disabled flag again after the rq
lock has been re-acquired, to avoid an invalid migration.
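Because find_lock_lowest_rq() may drop and re-take this_rq's lock, every
property checked before the drop has to be re-validated once both locks
are held; the patch adds is_migration_disabled() to that list. The sketch
below mirrors the shape of that re-validation. The struct fields, the
window hook, still_pushable(), lock_lowest_model() and pin_in_window()
are hypothetical simplifications for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's definitions. */
struct rq {
	int cpu;
};

struct task {
	struct rq *rq;			/* models task_rq(task) */
	unsigned long cpus_mask;	/* one bit per CPU */
	bool running;			/* models task_running(rq, task) */
	bool is_rt;			/* models rt_task(task) */
	bool queued;			/* models task_on_rq_queued(task) */
	int migration_disabled;		/* models is_migration_disabled(task) */
};

/* Re-validation after relocking, mirroring the patched condition list. */
static bool still_pushable(const struct task *t, const struct rq *rq,
			   const struct rq *lowest)
{
	return t->rq == rq &&
	       (t->cpus_mask & (1UL << lowest->cpu)) &&
	       !t->running &&
	       t->is_rt &&
	       t->migration_disabled == 0 &&	/* the check this patch adds */
	       t->queued;
}

/*
 * Models the tail of find_lock_lowest_rq(): 'window' stands for whatever
 * may happen while this_rq's lock is dropped. Returns NULL when the task
 * may no longer be moved to 'lowest'.
 */
static struct rq *lock_lowest_model(struct task *t, struct rq *rq,
				    struct rq *lowest,
				    void (*window)(struct task *))
{
	assert(t->rq == rq);
	if (window)
		window(t);
	if (!still_pushable(t, rq, lowest))
		return NULL;	/* double_unlock_balance(); lowest_rq = NULL */
	return lowest;
}

/* The interleaving from the diagram: the task pins itself meanwhile. */
static void pin_in_window(struct task *t)
{
	t->migration_disabled++;
}
```

With no interleaving the lowest rq is returned; once pin_in_window() has
bumped migration_disabled, the same call returns NULL instead of handing
back a rq that the task must not be moved to.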
Fixes: 95158a89dd50 ("sched,rt: Use the full cpumask for balancing")
Signed-off-by: Schspa Shi <[email protected]>
--
Changelog:
v1 -> v2:
- Modify the commit message to add information about the fixed commit.
- Retry pushing the currently running task on this CPU away, instead
of doing nothing for this migration-disabled task.
v2 -> v3:
- Move the migration-disabled check to the correct position.
v3 -> v4:
- Check migrate-disabled in find_lock_lowest_rq() to avoid an
unnecessary check when the task rq is not released, as Steven advised.
v4 -> v5:
- Adjust the comment, as Steve advised, to make it clearer.
---
kernel/sched/deadline.c | 1 +
kernel/sched/rt.c | 4 ++++
2 files changed, 5 insertions(+)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b5152961b7432..cb3b886a081c3 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2238,6 +2238,7 @@ static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
!cpumask_test_cpu(later_rq->cpu, &task->cpus_mask) ||
task_running(rq, task) ||
!dl_task(task) ||
+ is_migration_disabled(task) ||
!task_on_rq_queued(task))) {
double_unlock_balance(rq, later_rq);
later_rq = NULL;
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 8c9ed96648409..7c32ba51b6d85 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1998,11 +1998,15 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
* the mean time, task could have
* migrated already or had its affinity changed.
* Also make sure that it wasn't scheduled on its rq.
+ * It is possible the task was scheduled, set
+ * "migrate_disabled" and then got preempted, And we
+ * check task migration disable flag here too.
*/
if (unlikely(task_rq(task) != rq ||
!cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
task_running(rq, task) ||
!rt_task(task) ||
+ is_migration_disabled(task) ||
!task_on_rq_queued(task))) {
double_unlock_balance(rq, lowest_rq);
--
2.37.0
When the task to push has migration disabled, retry pushing the
currently running task on this CPU away, instead of doing nothing for
this migration-disabled task.
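This patch changes what happens when the lookup fails because the task
has become migration-disabled in the meantime. The sketch below models
both decisions in plain C: after_failed_find() follows the patched branch
in push_rt_task(), and on_no_lowest_rq() follows the patched !lowest_rq
branch in push_cpu_stop(). The enum, both function names and
SKETCH_MDF_PUSH are hypothetical; the kernel has its own MDF_PUSH, which,
as far as we can tell from core.c, asks for the push to be re-attempted
once the task re-enables migration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value; the kernel uses its own MDF_PUSH. */
#define SKETCH_MDF_PUSH 0x01u

struct task {
	int migration_disabled;
	unsigned int migration_flags;
};

static bool is_migration_disabled(const struct task *p)
{
	return p->migration_disabled != 0;
}

enum push_outcome {
	PUSH_OUT,		/* goto out: leave next_task where it is */
	PUSH_RETRY_PINNED,	/* goto retry: take the migration-disabled path,
				 * which tries to push rq->curr instead */
	PUSH_RETRY_OTHER	/* goto retry with a different pushable task */
};

/*
 * push_rt_task() after find_lock_lowest_rq() returned NULL:
 * 'repicked' is what pick_next_pushable_task() returns at that point.
 */
static enum push_outcome after_failed_find(const struct task *next_task,
					   const struct task *repicked)
{
	assert(next_task != NULL);	/* push_rt_task() always has one here */
	if (repicked == next_task) {
		/* New in this patch: NULL may only mean "now pinned". */
		if (is_migration_disabled(repicked))
			return PUSH_RETRY_PINNED;
		return PUSH_OUT;	/* genuinely no suitable rq */
	}
	if (!repicked)
		return PUSH_OUT;	/* nothing left to push */
	return PUSH_RETRY_OTHER;
}

/*
 * push_cpu_stop() when find_lock_rq() returned NULL: the rq lock may have
 * been dropped, so p may have run and called migrate_disable() meanwhile.
 * Record that a push is still wanted instead of silently dropping it.
 */
static void on_no_lowest_rq(struct task *p)
{
	if (is_migration_disabled(p))
		p->migration_flags |= SKETCH_MDF_PUSH;
}
```

The design point is that a NULL result from the lookup is ambiguous once
the lock has been dropped, so both callers test the migration-disabled
state before concluding that no lower-priority rq exists.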
Signed-off-by: Schspa Shi <[email protected]>
---
kernel/sched/core.c | 6 +++++-
kernel/sched/rt.c | 6 ++++++
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index da0bf6fe9ecdc..0b1fefd97d874 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2509,8 +2509,12 @@ int push_cpu_stop(void *arg)
if (p->sched_class->find_lock_rq)
lowest_rq = p->sched_class->find_lock_rq(p, rq);
- if (!lowest_rq)
+ if (!lowest_rq) {
+ if (unlikely(is_migration_disabled(p)))
+ p->migration_flags |= MDF_PUSH;
+
goto out_unlock;
+ }
// XXX validate p is still the highest prio task
if (task_rq(p) == rq) {
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 7c32ba51b6d85..877380e465b7a 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2136,6 +2136,12 @@ static int push_rt_task(struct rq *rq, bool pull)
*/
task = pick_next_pushable_task(rq);
if (task == next_task) {
+ /*
+ * If next task has now disabled migrating, see if we
+ * can push the current task.
+ */
+ if (unlikely(is_migration_disabled(task)))
+ goto retry;
/*
* The task hasn't migrated, and is still the next
* eligible task, but we failed to find a run-queue
--
2.37.0
On Tue, 12 Jul 2022 09:31:24 +0800
Schspa Shi <[email protected]> wrote:
> @@ -1998,11 +1998,15 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
> * the mean time, task could have
> * migrated already or had its affinity changed.
> * Also make sure that it wasn't scheduled on its rq.
> + * It is possible the task was scheduled, set
> + * "migrate_disabled" and then got preempted, And we
> + * check task migration disable flag here too.
Nit. "got preempted, so we must check the task migration disable flag here
too".
But other than that.
Reviewed-by: Steven Rostedt (Google) <[email protected]>
-- Steve
> */
Steven Rostedt <[email protected]> writes:
> On Tue, 12 Jul 2022 09:31:24 +0800
> Schspa Shi <[email protected]> wrote:
>
>> @@ -1998,11 +1998,15 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
>> * the mean time, task could have
>> * migrated already or had its affinity changed.
>> * Also make sure that it wasn't scheduled on its rq.
>> + * It is possible the task was scheduled, set
>> + * "migrate_disabled" and then got preempted, And we
>> + * check task migration disable flag here too.
>
> Nit. "got preempted, so we must check the task migration disable flag here
> too".
>
OK, I will upload a new version for this too.
> But other than that.
>
> Reviewed-by: Steven Rostedt (Google) <[email protected]>
>
> -- Steve
>
>> */
--
BRs
Schspa Shi
On Tue, 12 Jul 2022 09:31:25 +0800
Schspa Shi <[email protected]> wrote:
> When the task to push disable migration, retry to push the current
> running task on this CPU away, instead doing nothing for this migrate
> disabled task.
>
> Signed-off-by: Schspa Shi <[email protected]>
> ---
> kernel/sched/core.c | 6 +++++-
> kernel/sched/rt.c | 6 ++++++
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index da0bf6fe9ecdc..0b1fefd97d874 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2509,8 +2509,12 @@ int push_cpu_stop(void *arg)
> if (p->sched_class->find_lock_rq)
> lowest_rq = p->sched_class->find_lock_rq(p, rq);
>
> - if (!lowest_rq)
> + if (!lowest_rq) {
Probably should add a comment reminding us that the find_lock() function
above could have released the rq lock and allowed p to be scheduled and
preempted again, and that lowest_rq could be NULL because p now has the
migrate_disable flag set and not because it could not find the lowest rq.
-- Steve
> + if (unlikely(is_migration_disabled(p)))
> + p->migration_flags |= MDF_PUSH;
> +
> goto out_unlock;
> + }
>
> // XXX validate p is still the highest prio task
> if (task_rq(p) == rq) {
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 7c32ba51b6d85..877380e465b7a 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2136,6 +2136,12 @@ static int push_rt_task(struct rq *rq, bool pull)
> */
> task = pick_next_pushable_task(rq);
> if (task == next_task) {
> + /*
> + * If next task has now disabled migrating, see if we
> + * can push the current task.
> + */
> + if (unlikely(is_migration_disabled(task)))
> + goto retry;
> /*
> * The task hasn't migrated, and is still the next
> * eligible task, but we failed to find a run-queue
Steven Rostedt <[email protected]> writes:
> On Tue, 12 Jul 2022 09:31:25 +0800
> Schspa Shi <[email protected]> wrote:
>
>> When the task to push disable migration, retry to push the current
>> running task on this CPU away, instead doing nothing for this migrate
>> disabled task.
>>
>> Signed-off-by: Schspa Shi <[email protected]>
>> ---
>> kernel/sched/core.c | 6 +++++-
>> kernel/sched/rt.c | 6 ++++++
>> 2 files changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index da0bf6fe9ecdc..0b1fefd97d874 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2509,8 +2509,12 @@ int push_cpu_stop(void *arg)
>> if (p->sched_class->find_lock_rq)
>> lowest_rq = p->sched_class->find_lock_rq(p, rq);
>>
>> - if (!lowest_rq)
>> + if (!lowest_rq) {
>
> Probably should add a comment reminding us that the find_lock() function
> above could have released the rq lock and allow p to schedule and be
> preempted again, and that lowest_rq could be NULL because p now has the
> migrate_disable flag set and not because it could not find the lowest rq.
>
OK, that will be better.
I will upload a v6 patch for that.
> -- Steve
>
>
>> + if (unlikely(is_migration_disabled(p)))
>> + p->migration_flags |= MDF_PUSH;
>> +
>> goto out_unlock;
>> + }
>>
>> // XXX validate p is still the highest prio task
>> if (task_rq(p) == rq) {
>> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
>> index 7c32ba51b6d85..877380e465b7a 100644
>> --- a/kernel/sched/rt.c
>> +++ b/kernel/sched/rt.c
>> @@ -2136,6 +2136,12 @@ static int push_rt_task(struct rq *rq, bool pull)
>> */
>> task = pick_next_pushable_task(rq);
>> if (task == next_task) {
>> + /*
>> + * If next task has now disabled migrating, see if we
>> + * can push the current task.
>> + */
>> + if (unlikely(is_migration_disabled(task)))
>> + goto retry;
>> /*
>> * The task hasn't migrated, and is still the next
>> * eligible task, but we failed to find a run-queue
--
BRs
Schspa Shi