When the task to be pushed has disabled migration, retry pushing the
currently running task on this CPU away, instead of doing nothing for
this migration-disabled task.
Signed-off-by: Schspa Shi <[email protected]>
---
kernel/sched/core.c | 13 ++++++++++++-
kernel/sched/deadline.c | 9 +++++++++
kernel/sched/rt.c | 8 ++++++++
3 files changed, 29 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index da0bf6fe9ecd..af90cc558b8e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2509,8 +2509,19 @@ int push_cpu_stop(void *arg)
if (p->sched_class->find_lock_rq)
lowest_rq = p->sched_class->find_lock_rq(p, rq);
- if (!lowest_rq)
+ if (!lowest_rq) {
+ /*
+ * find_lock_rq() above may have released the rq lock, allowing
+ * p to be scheduled and preempted again. lowest_rq can then be
+ * NULL because p has meanwhile set the migrate_disable flag,
+ * not because no suitable rq could be found, so the task's
+ * migration flag must be checked again here.
+ */
+ if (unlikely(is_migration_disabled(p)))
+ p->migration_flags |= MDF_PUSH;
+
goto out_unlock;
+ }
// XXX validate p is still the highest prio task
if (task_rq(p) == rq) {
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index cb3b886a081c..21af20445e7f 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2335,6 +2335,15 @@ static int push_dl_task(struct rq *rq)
*/
task = pick_next_pushable_dl_task(rq);
if (task == next_task) {
+ /*
+ * If the next task has now disabled migration, see if we
+ * can push the current task instead.
+ */
+ if (unlikely(is_migration_disabled(task))) {
+ put_task_struct(next_task);
+ goto retry;
+ }
+
/*
* The task is still there. We don't try
* again, some other CPU will pull it when ready.
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 7bd3e6ecbe45..316088e2fee2 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2136,6 +2136,14 @@ static int push_rt_task(struct rq *rq, bool pull)
*/
task = pick_next_pushable_task(rq);
if (task == next_task) {
+ /*
+ * If the next task has now disabled migration, see if we
+ * can push the current task instead.
+ */
+ if (unlikely(is_migration_disabled(task))) {
+ put_task_struct(next_task);
+ goto retry;
+ }
/*
* The task hasn't migrated, and is still the next
* eligible task, but we failed to find a run-queue
--
2.29.0
On 13/07/2022 15:48, Schspa Shi wrote:
> When the task to push disable migration, retry to push the current
> running task on this CPU away, instead doing nothing for this migrate
> disabled task.
>
> Signed-off-by: Schspa Shi <[email protected]>
Unfortunately, I can't recreate this issue on my 6-CPU Arm64 system on
mainline or PREEMPT_RT (linux-5.19.y-rt and v5.10.59-rt52, the one you
mentioned in v6).
With an rt-app workload of 12-18 periodic rt-tasks (4/16ms), all with
different priorities, I never ran into an `is_migration_disabled(task)`
situation. I only ever get `task_rq(task) != rq` or `task_running(rq,
task)` under the `if (double_lock_balance(rq, lowest_rq))` condition in
find_lock_lowest_rq().
[...]
> // XXX validate p is still the highest prio task
> if (task_rq(p) == rq) {
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index cb3b886a081c..21af20445e7f 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2335,6 +2335,15 @@ static int push_dl_task(struct rq *rq)
> */
> task = pick_next_pushable_dl_task(rq);
> if (task == next_task) {
> + /*
> + * If next task has now disabled migrating, see if we
> + * can push the current task.
> + */
> + if (unlikely(is_migration_disabled(task))) {
> + put_task_struct(next_task);
> + goto retry;
> + }
> +
Looks like for DL this makes no sense, since `retry:` does not push
rq->curr the way the RT path does when `is_migration_disabled(next_task)`.
[...]
Reviewed-by: Dietmar Eggemann <[email protected]>
Dietmar Eggemann <[email protected]> writes:
> On 13/07/2022 15:48, Schspa Shi wrote:
>> When the task to push disable migration, retry to push the current
>> running task on this CPU away, instead doing nothing for this migrate
>> disabled task.
>>
>> Signed-off-by: Schspa Shi <[email protected]>
>
> Unfortunately, I can't recreate this issue on my Arm64 6 CPUs system on
> mainline or PREEMPT_RT (linux-5.19.y-rt and v5.10.59-rt52) (the one you
> mentioned in v6.)
>
> With an rt-app rt workload of 12-18 periodic rt-tasks (4/16ms) all with
> different priorities I never ran into a `is_migration_disabled(task)`
> situation. I only ever get `task_rq(task) != rq` or `task_running(rq,
> task)` under the `if (double_lock_balance(rq, lowest_rq))` condition in
> find_lock_lowest_rq().
>
I think we need to write a kernel module that injects more hard-IRQ
context to increase the probability of reproducing it.
I have never reproduced this issue with my test case either, but our
test team can: they have more machines available, and the problem is
easier to reproduce while CPUs are being hotplugged.
> [...]
>
>> // XXX validate p is still the highest prio task
>> if (task_rq(p) == rq) {
>> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>> index cb3b886a081c..21af20445e7f 100644
>> --- a/kernel/sched/deadline.c
>> +++ b/kernel/sched/deadline.c
>> @@ -2335,6 +2335,15 @@ static int push_dl_task(struct rq *rq)
>> */
>> task = pick_next_pushable_dl_task(rq);
>> if (task == next_task) {
>> + /*
>> + * If next task has now disabled migrating, see if we
>> + * can push the current task.
>> + */
>> + if (unlikely(is_migration_disabled(task))) {
>> + put_task_struct(next_task);
>> + goto retry;
>> + }
>> +
>
> Looks like for DL this makes no sense since we're not pushing rq->curr
> in `retry:` like for RT in case `is_migration_disabled(next_task)`.
>
It seems we still get the opportunity to call resched_curr(), which has
a similar effect. I will update the comment for this in the next patch
version.
> [...]
>
> Reviewed-by: Dietmar Eggemann <[email protected]>
--
BRs
Schspa Shi