sched_setscheduler() and rt_mutex_setprio() invoke the run-queue balance
callback after changing priorities or the scheduling class of a task. The
run-queue for which the callback is invoked can be local or remote.
That's not a problem for the regular rq::push_work which is serialized with
a busy flag in the run-queue struct, but for the balance_push() work which
is only valid to be invoked on the outgoing CPU that's wrong. It not only
triggers the debug warning, but also leaves the per CPU variable push_work
unprotected, which can result in double enqueues on the stop machine list.
Remove the warning and check that the function is invoked on the
outgoing CPU. If not, just return and do nothing.
Fixes: ae7927023243 ("sched: Optimize finish_lock_switch()")
Reported-by: Sebastian Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
---
kernel/sched/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8523,7 +8523,6 @@ static void balance_push(struct rq *rq)
struct task_struct *push_task = rq->curr;
lockdep_assert_rq_held(rq);
- SCHED_WARN_ON(rq->cpu != smp_processor_id());
/*
* Ensure the thing is persistent until balance_push_set(.on = false);
@@ -8531,9 +8530,10 @@ static void balance_push(struct rq *rq)
rq->balance_callback = &balance_push_callback;
/*
- * Only active while going offline.
+ * Only active while going offline and when invoked on the outgoing
+ * CPU.
*/
- if (!cpu_dying(rq->cpu))
+ if (!cpu_dying(rq->cpu) && rq == this_rq())
return;
/*
On Fri, Aug 27 2021 at 16:07, Thomas Gleixner wrote:
> sched_setscheduler() and rt_mutex_setprio() invoke the run-queue balance
> callback after changing priorities or the scheduling class of a task. The
> run-queue for which the callback is invoked can be local or remote.
>
> That's not a problem for the regular rq::push_work which is serialized with
> a busy flag in the run-queue struct, but for the balance_push() work which
> is only valid to be invoked on the outgoing CPU that's wrong. It not only
> triggers the debug warning, but also leaves the per CPU variable push_work
> unprotected, which can result in double enqueues on the stop machine list.
>
> Remove the warning and check that the function is invoked on the
> outgoing CPU. If not, just return and do nothing.
>
> Fixes: ae7927023243 ("sched: Optimize finish_lock_switch()")
> Reported-by: Sebastian Siewior <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Cc: [email protected]
> ---
> kernel/sched/core.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8523,7 +8523,6 @@ static void balance_push(struct rq *rq)
> struct task_struct *push_task = rq->curr;
>
> lockdep_assert_rq_held(rq);
> - SCHED_WARN_ON(rq->cpu != smp_processor_id());
>
> /*
> * Ensure the thing is persistent until balance_push_set(.on = false);
> @@ -8531,9 +8530,10 @@ static void balance_push(struct rq *rq)
> rq->balance_callback = &balance_push_callback;
>
> /*
> - * Only active while going offline.
> + * Only active while going offline and when invoked on the outgoing
> + * CPU.
> */
> - if (!cpu_dying(rq->cpu))
> + if (!cpu_dying(rq->cpu) && rq == this_rq())
> return;
Stupid me. The last minute change of moving the condition into the same
line fatfingered != to ==. Will resend ...
sched_setscheduler() and rt_mutex_setprio() invoke the run-queue balance
callback after changing priorities or the scheduling class of a task. The
run-queue for which the callback is invoked can be local or remote.
That's not a problem for the regular rq::push_work which is serialized with
a busy flag in the run-queue struct, but for the balance_push() work which
is only valid to be invoked on the outgoing CPU that's wrong. It not only
triggers the debug warning, but also leaves the per CPU variable push_work
unprotected, which can result in double enqueues on the stop machine list.
Remove the warning and validate that the function is invoked on the
outgoing CPU.
Fixes: ae7927023243 ("sched: Optimize finish_lock_switch()")
Reported-by: Sebastian Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: Juri Lelli <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
---
V2: Use the correct check for the outgoing CPU
---
kernel/sched/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8435,7 +8435,6 @@ static void balance_push(struct rq *rq)
struct task_struct *push_task = rq->curr;
lockdep_assert_rq_held(rq);
- SCHED_WARN_ON(rq->cpu != smp_processor_id());
/*
* Ensure the thing is persistent until balance_push_set(.on = false);
@@ -8443,9 +8442,10 @@ static void balance_push(struct rq *rq)
rq->balance_callback = &balance_push_callback;
/*
- * Only active while going offline.
+ * Only active while going offline and when invoked on the outgoing
+ * CPU.
*/
- if (!cpu_dying(rq->cpu))
+ if (!cpu_dying(rq->cpu) || rq != this_rq())
return;
/*
Hi Thomas,
On Sat, Aug 28, 2021 at 03:55:52PM +0200, Thomas Gleixner wrote:
> sched_setscheduler() and rt_mutex_setprio() invoke the run-queue balance
> callback after changing priorities or the scheduling class of a task. The
> run-queue for which the callback is invoked can be local or remote.
Let me say(always with my wrong). When the priority or policy changed, it is
the opportunity to call RT/DL/core_sched callback if they queued.
The corresponding callback is: pull/push_rt_task(), pull/push_dl_task() and
sched_core_balance(). The rq's callback list can be queued from local or
other cpu's callback_head. They defined as per-cpu.
sched_core_balance() do not use stop machine. pull_dl_task(), push/pull_rt_task
use stop machine(push_cpu_stop()) to migrate tasks. SCA is another place to
use stop machine(push_cpu_stop()).
> That's not a problem for the regular rq::push_work which is serialized with
> a busy flag in the run-queue struct, but for the balance_push() work which
So, they use rq::push_busy to serialize as said above.
balance_push() is another call back func that may queue on rq's callback list.
But, it is different from the above ones. It is *global* and more import than
others. If it is already queued on, every other callback will not be queued.
And the points where calling __balance_callbacks() will do the cpu outgoing
things.
sched_setscheduler() and rt_mutex_setprio() are the two points to call
__balance_callbacks(), and the __schedule() also have two points. one
is finish_lock_switch() and another is when prev==next case.
Task switched implys priority or policy changed. More important is for
stop machine(__balance_push_cpu_stop()) to migrate the tasks..
> is only valid to be invoked on the outgoing CPU that's wrong. It not only
When deactive cpu, set balance_push() call back. The opportunity that
call this func may from other cpus(now see, so the saying warning).
> triggers the debug warning, but also leaves the per CPU variable push_work
> unprotected, which can result in double enqueues on the stop machine list.
Auh, only just use the outgoing cpu to do the stop->task->stop.. migrate tasks..
And, no need to use what to protect the push_work.
Yeah.. my ugly words.. But.
> Remove the warning and validate that the function is invoked on the
> outgoing CPU.
>
> Fixes: ae7927023243 ("sched: Optimize finish_lock_switch()")
> Reported-by: Sebastian Siewior <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Cc: [email protected]
> Cc: Juri Lelli <[email protected]>
> Cc: Vincent Guittot <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Ben Segall <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Daniel Bristot de Oliveira <[email protected]>
> Cc: Dietmar Eggemann <[email protected]>
> ---
> V2: Use the correct check for the outgoing CPU
> ---
> kernel/sched/core.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8435,7 +8435,6 @@ static void balance_push(struct rq *rq)
> struct task_struct *push_task = rq->curr;
>
> lockdep_assert_rq_held(rq);
> - SCHED_WARN_ON(rq->cpu != smp_processor_id());
>
> /*
> * Ensure the thing is persistent until balance_push_set(.on = false);
> @@ -8443,9 +8442,10 @@ static void balance_push(struct rq *rq)
> rq->balance_callback = &balance_push_callback;
>
> /*
> - * Only active while going offline.
> + * Only active while going offline and when invoked on the outgoing
> + * CPU.
> */
> - if (!cpu_dying(rq->cpu))
> + if (!cpu_dying(rq->cpu) || rq != this_rq())
> return;
>
> /*
Thanks,
Tao
On Sat, Aug 28, 2021 at 03:55:52PM +0200, Thomas Gleixner wrote:
> sched_setscheduler() and rt_mutex_setprio() invoke the run-queue balance
> callback after changing priorities or the scheduling class of a task. The
> run-queue for which the callback is invoked can be local or remote.
>
> That's not a problem for the regular rq::push_work which is serialized with
> a busy flag in the run-queue struct, but for the balance_push() work which
> is only valid to be invoked on the outgoing CPU that's wrong. It not only
> triggers the debug warning, but also leaves the per CPU variable push_work
> unprotected, which can result in double enqueues on the stop machine list.
>
> Remove the warning and validate that the function is invoked on the
> outgoing CPU.
>
> Fixes: ae7927023243 ("sched: Optimize finish_lock_switch()")
> Reported-by: Sebastian Siewior <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
Thanks!
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 868ad33bfa3bf39960982682ad3a0f8ebda1656e
Gitweb: https://git.kernel.org/tip/868ad33bfa3bf39960982682ad3a0f8ebda1656e
Author: Thomas Gleixner <[email protected]>
AuthorDate: Sat, 28 Aug 2021 15:55:52 +02:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Thu, 09 Sep 2021 11:27:23 +02:00
sched: Prevent balance_push() on remote runqueues
sched_setscheduler() and rt_mutex_setprio() invoke the run-queue balance
callback after changing priorities or the scheduling class of a task. The
run-queue for which the callback is invoked can be local or remote.
That's not a problem for the regular rq::push_work which is serialized with
a busy flag in the run-queue struct, but for the balance_push() work which
is only valid to be invoked on the outgoing CPU that's wrong. It not only
triggers the debug warning, but also leaves the per CPU variable push_work
unprotected, which can result in double enqueues on the stop machine list.
Remove the warning and validate that the function is invoked on the
outgoing CPU.
Fixes: ae7927023243 ("sched: Optimize finish_lock_switch()")
Reported-by: Sebastian Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/87zgt1hdw7.ffs@tglx
---
kernel/sched/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f3b27c6..b21a185 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8523,7 +8523,6 @@ static void balance_push(struct rq *rq)
struct task_struct *push_task = rq->curr;
lockdep_assert_rq_held(rq);
- SCHED_WARN_ON(rq->cpu != smp_processor_id());
/*
* Ensure the thing is persistent until balance_push_set(.on = false);
@@ -8531,9 +8530,10 @@ static void balance_push(struct rq *rq)
rq->balance_callback = &balance_push_callback;
/*
- * Only active while going offline.
+ * Only active while going offline and when invoked on the outgoing
+ * CPU.
*/
- if (!cpu_dying(rq->cpu))
+ if (!cpu_dying(rq->cpu) || rq != this_rq())
return;
/*