2022-07-07 17:23:08

by Schspa Shi

Subject: [PATCH v3] sched/rt: fix bad task migration for rt tasks

Commit 95158a89dd50 ("sched,rt: Use the full cpumask for balancing")
allows find_lock_lowest_rq() to pick a task with migration disabled.
The intention of that commit is to push the currently running task on
this CPU away.

However, there is a race window that allows a migration-disabled task
to be migrated to another CPU.

When there is an RT task with higher priority, the RT scheduling class
tries to migrate the higher-priority task to the lowest rq via
push_rt_tasks(), and this is where the bug happens.

With the system running on PREEMPT_RT, rt_spin_lock() disables
migration, which makes the problem easier to reproduce.
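
For reference, a simplified sketch of how the PREEMPT_RT lock slow path
disables migration (condensed from kernel/locking/spinlock_rt.c;
lockdep annotations and debug checks are elided):

	static __always_inline void __rt_spin_lock(spinlock_t *lock)
	{
		rtlock_lock(&lock->lock);	/* may block instead of spin */
		rcu_read_lock();
		migrate_disable();		/* the holder may be preempted,
						 * but must stay on this CPU */
	}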

I have seen this crash on PREEMPT_RT; from the logs, there is a race
when trying to migrate higher-priority tasks to the lowest rq.

Please refer to the following scenario.

CPU0                                    CPU1
------------------------------------------------------------------
push_rt_task
  check is_migration_disabled(next_task)
                                        task not running and
                                        migration_disabled == 0
  find_lock_lowest_rq(next_task, rq);
    _double_lock_balance(this_rq, busiest);
      raw_spin_rq_unlock(this_rq);
      double_rq_lock(this_rq, busiest);
        <<wait for busiest rq>>
                                        <wakeup>
                                        task become running
                                        migrate_disable();
                                        <context out>
  deactivate_task(rq, next_task, 0);
  set_task_cpu(next_task, lowest_rq->cpu);
  WARN_ON_ONCE(is_migration_disabled(p));
  ---------OOPS-------------
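
In the unpatched push_rt_task(), the window corresponds to this
sequence (simplified; the !lowest_rq recheck path is omitted). By the
time find_lock_lowest_rq() returns, next_task may already have run on
CPU1 and disabled migration:

	lowest_rq = find_lock_lowest_rq(next_task, rq);	/* may drop rq->lock */
	if (!lowest_rq)
		goto out;

	deactivate_task(rq, next_task, 0);
	set_task_cpu(next_task, lowest_rq->cpu);	/* hits the WARN_ON_ONCE() */
	activate_task(lowest_rq, next_task, 0);
	resched_curr(lowest_rq);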

Crash log as follows:
[123671.996430] WARNING: CPU: 2 PID: 13470 at kernel/sched/core.c:2485 set_task_cpu+0x8c/0x108
[123671.996800] pstate: 20400009 (nzCv daif +PAN -UAO -TCO BTYPE=--)
[123671.996811] pc : set_task_cpu+0x8c/0x108
[123671.996820] lr : set_task_cpu+0x7c/0x108
[123671.996828] sp : ffff80001268bd30
[123671.996832] pmr_save: 00000060
[123671.996835] x29: ffff80001268bd30 x28: ffff0001a3d68e80
[123671.996844] x27: ffff80001225f4a8 x26: ffff800010ab62cb
[123671.996854] x25: ffff80026d95e000 x24: 0000000000000005
[123671.996864] x23: ffff00019746c1b0 x22: 0000000000000000
[123671.996873] x21: ffff00027ee33a80 x20: 0000000000000000
[123671.996882] x19: ffff00019746ba00 x18: 0000000000000000
[123671.996890] x17: 0000000000000000 x16: 0000000000000000
[123671.996899] x15: 000000000000000a x14: 000000000000349e
[123671.996908] x13: ffff800012f4503d x12: 0000000000000001
[123671.996916] x11: 0000000000000000 x10: 0000000000000000
[123671.996925] x9 : 00000000000c0000 x8 : ffff00027ee58700
[123671.996933] x7 : ffff00027ee8da80 x6 : ffff00027ee8e580
[123671.996942] x5 : ffff00027ee8dcc0 x4 : 0000000000000005
[123671.996951] x3 : ffff00027ee8e338 x2 : 0000000000000000
[123671.996959] x1 : 00000000000000ff x0 : 0000000000000002
[123671.996969] Call trace:
[123671.996975] set_task_cpu+0x8c/0x108
[123671.996984] push_rt_task.part.0+0x144/0x184
[123671.996995] push_rt_tasks+0x28/0x3c
[123671.997002] task_woken_rt+0x58/0x68
[123671.997009] ttwu_do_wakeup+0x5c/0xd0
[123671.997019] ttwu_do_activate+0xc0/0xd4
[123671.997028] try_to_wake_up+0x244/0x288
[123671.997036] wake_up_process+0x18/0x24
[123671.997045] __irq_wake_thread+0x64/0x80
[123671.997056] __handle_irq_event_percpu+0x110/0x124
[123671.997064] handle_irq_event_percpu+0x50/0xac
[123671.997072] handle_irq_event+0x84/0xfc

To fix it, we need to check the migration_disabled flag again, and if
migration is disabled, retry to push the currently running task on this
CPU away.

Fixes: 95158a89dd50 ("sched,rt: Use the full cpumask for balancing")
Signed-off-by: Schspa Shi <[email protected]>

--

Changelog:
v1 -> v2:
- Modify commit message to add the fixed commit information.
- Retry to push the currently running task on this CPU away,
  instead of doing nothing for this migration-disabled task.
v2 -> v3:
- Move the migration-disabled check to the correct position.
---
kernel/sched/core.c | 6 ++++++
kernel/sched/rt.c | 11 +++++++++++
2 files changed, 17 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index da0bf6fe9ecdc..3003baa28d2d8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2512,6 +2512,11 @@ int push_cpu_stop(void *arg)
 	if (!lowest_rq)
 		goto out_unlock;
 
+	if (unlikely(is_migration_disabled(p))) {
+		p->migration_flags |= MDF_PUSH;
+		goto out_double_unlock;
+	}
+
 	// XXX validate p is still the highest prio task
 	if (task_rq(p) == rq) {
 		deactivate_task(rq, p, 0);
@@ -2520,6 +2525,7 @@ int push_cpu_stop(void *arg)
 		resched_curr(lowest_rq);
 	}
 
+out_double_unlock:
 	double_unlock_balance(rq, lowest_rq);
 
 out_unlock:
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 8c9ed96648409..7948c9cfb472d 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2153,6 +2153,17 @@ static int push_rt_task(struct rq *rq, bool pull)
 		goto retry;
 	}
 
+	/*
+	 * It is possible the task has been running for a while; check the
+	 * task's migration-disabled flag again. If migration is disabled,
+	 * the retry code will retry to push the currently running task on
+	 * this CPU away.
+	 */
+	if (unlikely(is_migration_disabled(next_task))) {
+		put_task_struct(next_task);
+		goto retry;
+	}
+
 	deactivate_task(rq, next_task, 0);
 	set_task_cpu(next_task, lowest_rq->cpu);
 	activate_task(lowest_rq, next_task, 0);
--
2.37.0


2022-07-07 18:04:15

by Steven Rostedt

Subject: Re: [PATCH v3] sched/rt: fix bad task migration for rt tasks

On Fri, 8 Jul 2022 00:50:14 +0800
Schspa Shi <[email protected]> wrote:

> Please refer to the following scenario.

I'm not sure this is what is happening. Do you have a trace to back this up?

>
> CPU0                                    CPU1
> ------------------------------------------------------------------
> push_rt_task
>   check is_migration_disabled(next_task)
>                                         task not running and
>                                         migration_disabled == 0
>   find_lock_lowest_rq(next_task, rq);
>     _double_lock_balance(this_rq, busiest);
>       raw_spin_rq_unlock(this_rq);
>       double_rq_lock(this_rq, busiest);
>         <<wait for busiest rq>>
>                                         <wakeup>

Here's the problem I have. next_task is queued on CPU0 (otherwise CPU0
would not be pushing it). As CPU0 is currently running push_rt_task, how
did next_task start running to set its migrate_disable flag?

Even if it was woken up on another CPU and ran there, by setting
migrate_disable, it would not be put back to CPU0, because its
migrate_disable flag is set (if it is, then there's the bug).

After releasing the rq lock and retaking it, we check that the next_task is
still the next task on CPU0 to push.
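
That revalidation is roughly the following (condensed from the
!lowest_rq error path of push_rt_task(); comments are mine):

	if (!lowest_rq) {
		struct task_struct *task;

		/* rq->lock may have been dropped: revalidate next_task */
		task = pick_next_pushable_task(rq);
		if (task == next_task)
			goto out;	/* unchanged, but nowhere to push it */
		if (!task)
			goto out;	/* no longer pushable, nothing to do */

		/* something shifted: switch to the new candidate and retry */
		put_task_struct(next_task);
		next_task = task;
		goto retry;
	}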


>                                         task become running
>                                         migrate_disable();
>                                         <context out>
>   deactivate_task(rq, next_task, 0);
>   set_task_cpu(next_task, lowest_rq->cpu);
>   WARN_ON_ONCE(is_migration_disabled(p));
>   ---------OOPS-------------

I don't see how this can happen.

-- Steve

2022-07-08 05:34:58

by Schspa Shi

Subject: Re: [PATCH v3] sched/rt: fix bad task migration for rt tasks



Steven Rostedt <[email protected]> writes:

> On Fri, 8 Jul 2022 00:50:14 +0800
> Schspa Shi <[email protected]> wrote:
>
>> Please refer to the following scenario.
>
> I'm not sure this is what is happening. Do you have a trace to
> back this up?
>

I don't have a trace. This is inferred from the exception log.

>>
>> CPU0                                    CPU1
>> ------------------------------------------------------------------
>> push_rt_task
>>   check is_migration_disabled(next_task)
>>                                         task not running and
>>                                         migration_disabled == 0
>>   find_lock_lowest_rq(next_task, rq);
>>     _double_lock_balance(this_rq, busiest);
>>       raw_spin_rq_unlock(this_rq);
>>       double_rq_lock(this_rq, busiest);
>>         <<wait for busiest rq>>
>>                                         <wakeup>
>
> Here's the problem I have. next_task is queued on CPU0 (otherwise CPU0
> would not be pushing it). As CPU0 is currently running push_rt_task, how
> did next_task start running to set its migrate_disable flag?

The next_task wasn't queued on CPU0; it's queued on CPU1 in this
scenario.

That's because on task wakeup, the rq argument is not the current
running CPU's rq, it's next_task's rq
(i.e. CPU1's rq in this sample scenario).

You can check this with the call trace from the crash log.

[123671.996969] Call trace:
[123671.996975] set_task_cpu+0x8c/0x108
[123671.996984] push_rt_task.part.0+0x144/0x184
[123671.996995] push_rt_tasks+0x28/0x3c
[123671.997002] task_woken_rt+0x58/0x68
[123671.997009] ttwu_do_wakeup+0x5c/0xd0
[123671.997019] ttwu_do_activate+0xc0/0xd4
[123671.997028] try_to_wake_up+0x244/0x288
[123671.997036] wake_up_process+0x18/0x24
[123671.997045] __irq_wake_thread+0x64/0x80
[123671.997056] __handle_irq_event_percpu+0x110/0x124

Function ttwu_do_wakeup() will lock the task's rq, not the current
running CPU's rq.
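
A rough sketch of that path (function names are from the stack trace
above; the bodies are heavily condensed):

	static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
	{
		struct rq *rq = cpu_rq(cpu);	/* the *task's* CPU, CPU1 here */
		struct rq_flags rf;

		rq_lock(rq, &rf);		/* locks next_task's rq */
		ttwu_do_activate(rq, p, wake_flags, &rf);
		/* -> ttwu_do_wakeup() -> task_woken_rt() -> push_rt_tasks(rq) */
		rq_unlock(rq, &rf);
	}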

>
> Even if it was woken up on another CPU and ran there, by setting
> migrate_disable, it would not be put back to CPU0, because its
> migrate_disable flag is set (if it is, then there's the bug).
>

There is no need to put it back on CPU0 for this issue; it's still on
CPU1.

> After releasing the rq lock and retaking it, we check that the next_task is
> still the next task on CPU0 to push.
>
>
>>                                         task become running
>>                                         migrate_disable();
>>                                         <context out>
>>   deactivate_task(rq, next_task, 0);
>>   set_task_cpu(next_task, lowest_rq->cpu);
>>   WARN_ON_ONCE(is_migration_disabled(p));
>>   ---------OOPS-------------
>
> I don't see how this can happen.
>
> -- Steve

--
BRs
Schspa Shi

2022-07-08 18:07:43

by Steven Rostedt

Subject: Re: [PATCH v3] sched/rt: fix bad task migration for rt tasks

On Fri, 08 Jul 2022 12:51:14 +0800
Schspa Shi <[email protected]> wrote:

> Steven Rostedt <[email protected]> writes:
>
> > On Fri, 8 Jul 2022 00:50:14 +0800
> > Schspa Shi <[email protected]> wrote:
> >
> >> Please refer to the following scenario.
> >
> > I'm not sure this is what is happening. Do you have a trace to
> > back this up?
> >
>
> I don't have a trace. This is inferred from the exception log.
>
> >>
> >> CPU0                                    CPU1
> >> ------------------------------------------------------------------
> >> push_rt_task
> >>   check is_migration_disabled(next_task)
> >>                                         task not running and
> >>                                         migration_disabled == 0
> >>   find_lock_lowest_rq(next_task, rq);
> >>     _double_lock_balance(this_rq, busiest);
> >>       raw_spin_rq_unlock(this_rq);
> >>       double_rq_lock(this_rq, busiest);
> >>         <<wait for busiest rq>>
> >>                                         <wakeup>
> >
> > Here's the problem I have. next_task is queued on CPU0 (otherwise CPU0
> > would not be pushing it). As CPU0 is currently running push_rt_task, how
> > did next_task start running to set its migrate_disable flag?
>
> The next_task wasn't queued on CPU0; it's queued on CPU1 in this
> scenario.

Bah, I forgot that we still do pushing for other CPUs. I was thinking that
we removed that in favor of pulling. It's been a while since I worked on
this.

>
> That's because on task wakeup, the rq argument is not the current
> running CPU's rq, it's next_task's rq
> (i.e. CPU1's rq in this sample scenario).
>
> You can check this with the call trace from the crash log.
>
> [123671.996969] Call trace:
> [123671.996975] set_task_cpu+0x8c/0x108
> [123671.996984] push_rt_task.part.0+0x144/0x184
> [123671.996995] push_rt_tasks+0x28/0x3c
> [123671.997002] task_woken_rt+0x58/0x68
> [123671.997009] ttwu_do_wakeup+0x5c/0xd0
> [123671.997019] ttwu_do_activate+0xc0/0xd4
> [123671.997028] try_to_wake_up+0x244/0x288
> [123671.997036] wake_up_process+0x18/0x24
> [123671.997045] __irq_wake_thread+0x64/0x80
> [123671.997056] __handle_irq_event_percpu+0x110/0x124
>
> Function ttwu_do_wakeup() will lock the task's rq, not the current
> running CPU's rq.
>
> >
> > Even if it was woken up on another CPU and ran there, by setting
> > migrate_disable, it would not be put back to CPU0, because its
> > migrate_disable flag is set (if it is, then there's the bug).
> >
>
> There is no need to put it back on CPU0 for this issue; it's still on
> CPU1.
>

Worse things can actually happen than migrating a migrate-disabled task.
What prevents next_task from being scheduled and in a running state, or
even migrated?

Hmm, that's covered in find_lock_lowest_rq().

Looks like the migrate-disable check needs to go there.

	/* if the prio of this runqueue changed, try again */
	if (double_lock_balance(rq, lowest_rq)) {
		/*
		 * We had to unlock the run queue. In
		 * the mean time, task could have
		 * migrated already or had its affinity changed.
		 * Also make sure that it wasn't scheduled on its rq.
		 */
		if (unlikely(task_rq(task) != rq ||
			     !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
			     task_running(rq, task) ||
			     !rt_task(task) ||
+			     is_migration_disabled(task) ||
			     !task_on_rq_queued(task))) {

			double_unlock_balance(rq, lowest_rq);
			lowest_rq = NULL;
			break;
		}
	}

-- Steve

2022-07-08 18:36:17

by Schspa Shi

Subject: Re: [PATCH v3] sched/rt: fix bad task migration for rt tasks


Steven Rostedt <[email protected]> writes:

> On Fri, 08 Jul 2022 12:51:14 +0800
> Schspa Shi <[email protected]> wrote:
>
>> Steven Rostedt <[email protected]> writes:
>>
>> > On Fri, 8 Jul 2022 00:50:14 +0800
>> > Schspa Shi <[email protected]> wrote:
>> >
>> >> Please refer to the following scenario.
>> >
>> > I'm not sure this is what is happening. Do you have a trace to
>> > back this up?
>> >
>>
>> I don't have a trace. This is inferred from the exception log.
>>
>> >>
>> >> CPU0                                    CPU1
>> >> ------------------------------------------------------------------
>> >> push_rt_task
>> >>   check is_migration_disabled(next_task)
>> >>                                         task not running and
>> >>                                         migration_disabled == 0
>> >>   find_lock_lowest_rq(next_task, rq);
>> >>     _double_lock_balance(this_rq, busiest);
>> >>       raw_spin_rq_unlock(this_rq);
>> >>       double_rq_lock(this_rq, busiest);
>> >>         <<wait for busiest rq>>
>> >>                                         <wakeup>
>> >
>> > Here's the problem I have. next_task is queued on CPU0 (otherwise CPU0
>> > would not be pushing it). As CPU0 is currently running push_rt_task, how
>> > did next_task start running to set its migrate_disable flag?
>>
>> The next_task wasn't queued on CPU0; it's queued on CPU1 in this
>> scenario.
>
> Bah, I forgot that we still do pushing for other CPUs. I was thinking that
> we removed that in favor of pulling. It's been a while since I worked on
> this.
>
>>
>> That's because on task wakeup, the rq argument is not the current
>> running CPU's rq, it's next_task's rq
>> (i.e. CPU1's rq in this sample scenario).
>>
>> You can check this with the call trace from the crash log.
>>
>> [123671.996969] Call trace:
>> [123671.996975] set_task_cpu+0x8c/0x108
>> [123671.996984] push_rt_task.part.0+0x144/0x184
>> [123671.996995] push_rt_tasks+0x28/0x3c
>> [123671.997002] task_woken_rt+0x58/0x68
>> [123671.997009] ttwu_do_wakeup+0x5c/0xd0
>> [123671.997019] ttwu_do_activate+0xc0/0xd4
>> [123671.997028] try_to_wake_up+0x244/0x288
>> [123671.997036] wake_up_process+0x18/0x24
>> [123671.997045] __irq_wake_thread+0x64/0x80
>> [123671.997056] __handle_irq_event_percpu+0x110/0x124
>>
>> Function ttwu_do_wakeup() will lock the task's rq, not the current
>> running CPU's rq.
>>
>> >
>> > Even if it was woken up on another CPU and ran there, by setting
>> > migrate_disable, it would not be put back to CPU0, because its
>> > migrate_disable flag is set (if it is, then there's the bug).
>> >
>>
>> There is no need to put it back on CPU0 for this issue; it's still on
>> CPU1.
>>
>
> Worse things can actually happen than migrating a migrate-disabled task.
> What prevents next_task from being scheduled and in a running state, or
> even migrated?
>
> Hmm, that's covered in find_lock_lowest_rq().
>
> Looks like the migrate-disable check needs to go there.
>
> 	/* if the prio of this runqueue changed, try again */
> 	if (double_lock_balance(rq, lowest_rq)) {
> 		/*
> 		 * We had to unlock the run queue. In
> 		 * the mean time, task could have
> 		 * migrated already or had its affinity changed.
> 		 * Also make sure that it wasn't scheduled on its rq.
> 		 */
> 		if (unlikely(task_rq(task) != rq ||
> 			     !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) ||
> 			     task_running(rq, task) ||
> 			     !rt_task(task) ||
> +			     is_migration_disabled(task) ||
> 			     !task_on_rq_queued(task))) {
>

Yes, it's what I did in the V1 patch.
Link: https://lore.kernel.org/all/[email protected]/

But I think it's not the best solution for this problem.
In these scenarios, we still have a chance to make the task run faster
by retrying to push the currently running task on this CPU away.

There are more details in the V2 patch's reply message.
Link: https://lore.kernel.org/all/CAMA88TrZ-o4W81Yfw9Wcs3ghoxwpeAKtFejtMTt78GNB0tKaSA@mail.gmail.com/#t

> 			double_unlock_balance(rq, lowest_rq);
> 			lowest_rq = NULL;
> 			break;
> 		}
> 	}
>
> -- Steve

--
BRs
Schspa Shi

2022-07-08 19:10:43

by Steven Rostedt

Subject: Re: [PATCH v3] sched/rt: fix bad task migration for rt tasks

On Sat, 09 Jul 2022 02:19:42 +0800
Schspa Shi <[email protected]> wrote:

> Yes, it's what I did in the V1 patch.
> Link: https://lore.kernel.org/all/[email protected]/
>
> But I think it's not the best solution for this problem.
> In these scenarios, we still have a chance to make the task run faster
> by retrying to push the currently running task on this CPU away.
>
> There are more details in the V2 patch's reply message.
> Link: https://lore.kernel.org/all/CAMA88TrZ-o4W81Yfw9Wcs3ghoxwpeAKtFejtMTt78GNB0tKaSA@mail.gmail.com/#t

The thing is, this situation can only happen if we release the rq lock in
find_lock_lowest_rq(), and we should not be checking for it in the other
cases.

Perhaps add the check in find_lock_lowest_rq() and also in the !lowest_rq
case do:

	task = pick_next_pushable_task(rq);
	if (task == next_task) {
+		/*
+		 * If next task has now disabled migrating, see if we
+		 * can push the current task.
+		 */
+		if (unlikely(is_migration_disabled(task)))
+			goto retry;
		/*
		 * The task hasn't migrated, and is still the next
		 * eligible task, but we failed to find a run-queue
		 * to push it to. Do not retry in this case, since
		 * other CPUs will pull from us when ready.
		 */
		goto out;
	}

-- Steve

2022-07-08 19:38:43

by Schspa Shi

Subject: Re: [PATCH v3] sched/rt: fix bad task migration for rt tasks


Steven Rostedt <[email protected]> writes:

> On Sat, 09 Jul 2022 02:19:42 +0800
> Schspa Shi <[email protected]> wrote:
>
>> Yes, it's what I did in the V1 patch.
>> Link: https://lore.kernel.org/all/[email protected]/
>>
>> But I think it's not the best solution for this problem.
>> In these scenarios, we still have a chance to make the task run faster
>> by retrying to retry to push the currently running task on this CPU away.
>>
>> There is more details on V2 patch's replay message.
>> Link: https://lore.kernel.org/all/CAMA88TrZ-o4W81Yfw9Wcs3ghoxwpeAKtFejtMTt78GNB0tKaSA@mail.gmail.com/#t
>
> The thing is, this situation can only happen if we release the rq lock in
> find_lock_lowest_rq(), and we should not be checking for it in the other
> cases.
>

If we haven't unlocked the rq in find_lock_lowest_rq(), it will return
NULL, and the code added here won't be called:

	if (unlikely(is_migration_disabled(next_task))) {
		put_task_struct(next_task);
		goto retry;
	}

	deactivate_task(rq, next_task, 0);
	set_task_cpu(next_task, lowest_rq->cpu);

Besides, find_lock_lowest_rq() returning NULL doesn't mean the rq was
released. Do we need to add a _find_lock_lowest_rq() to get the correct
rq-released flag?

> Perhaps add the check in find_lock_lowest_rq() and also in the !lowest_rq
> case do:
>
> 	task = pick_next_pushable_task(rq);
> 	if (task == next_task) {
> +		/*
> +		 * If next task has now disabled migrating, see if we
> +		 * can push the current task.
> +		 */
> +		if (unlikely(is_migration_disabled(task)))
> +			goto retry;

Ahh, it can be added. And do we need this to be a separate PATCH?

> 		/*
> 		 * The task hasn't migrated, and is still the next
> 		 * eligible task, but we failed to find a run-queue
> 		 * to push it to. Do not retry in this case, since
> 		 * other CPUs will pull from us when ready.
> 		 */
> 		goto out;
> 	}
>
> -- Steve

--
BRs
Schspa Shi

2022-07-08 19:47:28

by Steven Rostedt

Subject: Re: [PATCH v3] sched/rt: fix bad task migration for rt tasks

On Sat, 09 Jul 2022 03:14:44 +0800
Schspa Shi <[email protected]> wrote:

> Steven Rostedt <[email protected]> writes:
>
> > On Sat, 09 Jul 2022 02:19:42 +0800
> > Schspa Shi <[email protected]> wrote:
> >
> >> Yes, it's what I did in the V1 patch.
> >> Link: https://lore.kernel.org/all/[email protected]/
> >>
> >> But I think it's not the best solution for this problem.
> >> In these scenarios, we still have a chance to make the task run faster
> >> by retrying to retry to push the currently running task on this CPU away.
> >>
> >> There is more details on V2 patch's replay message.
> >> Link: https://lore.kernel.org/all/CAMA88TrZ-o4W81Yfw9Wcs3ghoxwpeAKtFejtMTt78GNB0tKaSA@mail.gmail.com/#t
> >
> > The thing is, this situation can only happen if we release the rq lock in
> > find_lock_lowest_rq(), and we should not be checking for it in the other
> > cases.
> >
>
> If we haven't unlocked the rq in find_lock_lowest_rq(), it will return
> NULL, and the code added here won't be called:
>
> 	if (unlikely(is_migration_disabled(next_task))) {
> 		put_task_struct(next_task);
> 		goto retry;
> 	}

Because it doesn't need to. If it did not unlock the run queue, there's no
way that next_task could have run, because we hold the rq lock for
next_task. Which means that its "migrate_disable" state would not have
changed from the first time we checked.

>
> 	deactivate_task(rq, next_task, 0);
> 	set_task_cpu(next_task, lowest_rq->cpu);
>
> Besides, find_lock_lowest_rq() returning NULL doesn't mean the rq was
> released. Do we need to add a _find_lock_lowest_rq() to get the correct
> rq-released flag?

If it returns NULL, it either means that the rq lock was released or that it
did not find an rq to push to. Which means there's nothing more to do anyway.
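
For reference, a condensed view of find_lock_lowest_rq() showing why
NULL covers both cases (the revalidation condition is abbreviated to a
placeholder here):

	for (tries = 0; tries < RT_MAX_TRIES; tries++) {
		cpu = find_lowest_rq(task);
		if ((cpu == -1) || (cpu == rq->cpu))
			break;		/* nothing to push to; lock never dropped */

		lowest_rq = cpu_rq(cpu);

		if (double_lock_balance(rq, lowest_rq)) {
			/* rq->lock was dropped and retaken here */
			if (revalidation_failed) {	/* placeholder */
				double_unlock_balance(rq, lowest_rq);
				lowest_rq = NULL;
				break;
			}
		}
		/* ... further suitability checks ... */
	}

	return lowest_rq;	/* NULL either way */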

>
> > Perhaps add the check in find_lock_lowest_rq() and also in the !lowest_rq
> > case do:
> >
> > 	task = pick_next_pushable_task(rq);
> > 	if (task == next_task) {
> > +		/*
> > +		 * If next task has now disabled migrating, see if we
> > +		 * can push the current task.
> > +		 */
> > +		if (unlikely(is_migration_disabled(task)))
> > +			goto retry;
>
> Ahh, it can be added. And do we need this to be a separate PATCH?

Sure.

The "fix" to the crash you see should be in the find_lock_lowest_rq() as I
suggested. And then you can add this as an optimization.

-- Steve

>
> > 		/*
> > 		 * The task hasn't migrated, and is still the next
> > 		 * eligible task, but we failed to find a run-queue
> > 		 * to push it to. Do not retry in this case, since
> > 		 * other CPUs will pull from us when ready.
> > 		 */
> > 		goto out;
> > 	}
> >
> > -- Steve
>

2022-07-08 20:51:50

by Schspa Shi

Subject: Re: [PATCH v3] sched/rt: fix bad task migration for rt tasks


Steven Rostedt <[email protected]> writes:

> On Sat, 09 Jul 2022 03:14:44 +0800
> Schspa Shi <[email protected]> wrote:
>
>> Steven Rostedt <[email protected]> writes:
>>
>> > On Sat, 09 Jul 2022 02:19:42 +0800
>> > Schspa Shi <[email protected]> wrote:
>> >
>> >> Yes, it's what I did in the V1 patch.
>> >> Link: https://lore.kernel.org/all/[email protected]/
>> >>
>> >> But I think it's not the best solution for this problem.
>> >> In these scenarios, we still have a chance to make the task run faster
>> >> by retrying to push the currently running task on this CPU away.
>> >>
>> >> There are more details in the V2 patch's reply message.
>> >> Link: https://lore.kernel.org/all/CAMA88TrZ-o4W81Yfw9Wcs3ghoxwpeAKtFejtMTt78GNB0tKaSA@mail.gmail.com/#t
>> >
>> > The thing is, this situation can only happen if we release the rq lock in
>> > find_lock_lowest_rq(), and we should not be checking for it in the other
>> > cases.
>> >
>>
>> If we haven't unlocked the rq in find_lock_lowest_rq(), it will return
>> NULL, and the code added here won't be called:
>>
>> 	if (unlikely(is_migration_disabled(next_task))) {
>> 		put_task_struct(next_task);
>> 		goto retry;
>> 	}
>
> Because it doesn't need to. If it did not unlock the run queue, there's no
> way that next_task could have run, because we hold the rq lock for
> next_task. Which means that its "migrate_disable" state would not have
> changed from the first time we checked.
>

OK, I get it.

>>
>> 	deactivate_task(rq, next_task, 0);
>> 	set_task_cpu(next_task, lowest_rq->cpu);
>>
>> Besides, find_lock_lowest_rq() returning NULL doesn't mean the rq was
>> released. Do we need to add a _find_lock_lowest_rq() to get the correct
>> rq-released flag?
>
> If it returns NULL, it either means that the rq lock was released or that it
> did not find an rq to push to. Which means there's nothing more to do anyway.
>
>>
>> > Perhaps add the check in find_lock_lowest_rq() and also in the !lowest_rq
>> > case do:
>> >
>> > 	task = pick_next_pushable_task(rq);
>> > 	if (task == next_task) {
>> > +		/*
>> > +		 * If next task has now disabled migrating, see if we
>> > +		 * can push the current task.
>> > +		 */
>> > +		if (unlikely(is_migration_disabled(task)))
>> > +			goto retry;
>>
>> Ahh, it can be added. And do we need this to be a separate PATCH?
>
> Sure.
>
> The "fix" to the crash you see should be in the find_lock_lowest_rq() as I
> suggested. And then you can add this as an optimization.

OK, I will make a V4 patch for this. Please review it then.

>
> -- Steve
>
>>
>> > 		/*
>> > 		 * The task hasn't migrated, and is still the next
>> > 		 * eligible task, but we failed to find a run-queue
>> > 		 * to push it to. Do not retry in this case, since
>> > 		 * other CPUs will pull from us when ready.
>> > 		 */
>> > 		goto out;
>> > 	}
>> >
>> > -- Steve
>>

--
BRs
Schspa Shi