2023-06-24 09:30:51

by Hui Tang

Subject: [PATCH] sched/rt: Fix possible warn when push_rt_task

A warning may be triggered during reboot, as follows:

reboot
  -> kernel_restart
    -> machine_restart
      -> smp_send_stop --- IPI handler calls set_cpu_online(cpu, false)

balance_callback
  -> __balance_callback
    -> push_rt_task
      -> find_lock_lowest_rq --- offline CPU is not cleared from vec->mask
        -> find_lowest_rq
          -> cpupri_find
            -> cpupri_find_fitness
              -> __cpupri_find [cpumask_and(..., vec->mask)]
      -> set_task_cpu(next_task, lowest_rq->cpu) --- WARN_ON(!cpu_online(cpu))

So add a !cpu_online(lowest_rq->cpu) check before set_task_cpu().
This does not completely fix the problem, since the CPU's bit in
cpu_online_mask may still be cleared after the check.

Fixes: 4ff9083b8a9a8 ("sched/core: WARN() when migrating to an offline CPU")
Signed-off-by: Hui Tang <[email protected]>
---
kernel/sched/rt.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 00e0e5074115..852ef18b6a50 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2159,6 +2159,9 @@ static int push_rt_task(struct rq *rq, bool pull)
 		goto retry;
 	}

+	if (unlikely(!cpu_online(lowest_rq->cpu)))
+		goto out;
+
 	deactivate_task(rq, next_task, 0);
 	set_task_cpu(next_task, lowest_rq->cpu);
 	activate_task(lowest_rq, next_task, 0);
--
2.17.1
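
The call chain in the patch description can be illustrated with a small userspace model. This is only a sketch, not kernel code: `mask_t`, `cpu_in`, `pick_lowest_cpu` and `push_task` are hypothetical names, plain 32-bit integers stand in for cpumasks, and the stale `vec_mask` argument plays the role of `vec->mask` still listing a CPU that has already gone offline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is present. */
typedef uint32_t mask_t;

static bool cpu_in(mask_t m, int cpu)
{
    assert(cpu >= 0 && cpu < 32);
    return (m >> cpu) & 1u;
}

/*
 * Models the cpumask_and(..., vec->mask) step: intersect the task's
 * affinity with the (possibly stale) priority-vector mask and return
 * the lowest-numbered candidate CPU, or -1 if there is none.
 */
static int pick_lowest_cpu(mask_t affinity, mask_t vec_mask)
{
    mask_t candidates = affinity & vec_mask;

    for (int cpu = 0; cpu < 32; cpu++)
        if (cpu_in(candidates, cpu))
            return cpu;
    return -1;
}

/*
 * Models the push. Returns the CPU the task would be moved to, or -1
 * if the push is skipped. *warned is set when the model would migrate
 * to an offline CPU, which is the condition the kernel warns about.
 */
static int push_task(mask_t affinity, mask_t vec_mask, mask_t online,
                     bool with_check, bool *warned)
{
    int cpu = pick_lowest_cpu(affinity, vec_mask);

    *warned = false;
    if (cpu < 0)
        return -1;

    /* The check proposed by the patch: bail out on an offline target. */
    if (with_check && !cpu_in(online, cpu))
        return -1;

    if (!cpu_in(online, cpu))
        *warned = true;
    return cpu;
}
```

In this model, with affinity 0x6, a stale vec_mask of 0x2 and an online mask of 0x5, push_task() without the check selects CPU 1 and sets *warned; with the check enabled it returns -1 and the push is skipped.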



2023-07-03 12:54:20

by Peter Zijlstra

Subject: Re: [PATCH] sched/rt: Fix possible warn when push_rt_task

On Sat, Jun 24, 2023 at 05:21:30PM +0800, Hui Tang wrote:
> A warning may be triggered during reboot, as follows:
>
> reboot
>   -> kernel_restart
>     -> machine_restart
>       -> smp_send_stop --- IPI handler calls set_cpu_online(cpu, false)
>
> balance_callback
>   -> __balance_callback
>     -> push_rt_task
>       -> find_lock_lowest_rq --- offline CPU is not cleared from vec->mask
>         -> find_lowest_rq
>           -> cpupri_find
>             -> cpupri_find_fitness
>               -> __cpupri_find [cpumask_and(..., vec->mask)]
>       -> set_task_cpu(next_task, lowest_rq->cpu) --- WARN_ON(!cpu_online(cpu))
>
> So add a !cpu_online(lowest_rq->cpu) check before set_task_cpu().
> This does not completely fix the problem, since the CPU's bit in
> cpu_online_mask may still be cleared after the check.

This is tinkering at best. I'm sure there's a score of other issues,
not least the very same issue in deadline.c. But since this doesn't
actually fix anything, this clearly isn't the right way.

> Fixes: 4ff9083b8a9a8 ("sched/core: WARN() when migrating to an offline CPU")
> Signed-off-by: Hui Tang <[email protected]>
> ---
> kernel/sched/rt.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 00e0e5074115..852ef18b6a50 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2159,6 +2159,9 @@ static int push_rt_task(struct rq *rq, bool pull)
>  		goto retry;
>  	}
>
> +	if (unlikely(!cpu_online(lowest_rq->cpu)))
> +		goto out;
> +
>  	deactivate_task(rq, next_task, 0);
>  	set_task_cpu(next_task, lowest_rq->cpu);
>  	activate_task(lowest_rq, next_task, 0);
> --
> 2.17.1
>
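
The window that both the patch description and the reply point to can be modeled in userspace as a check, followed by an event, followed by the act. This is a single-threaded sketch under stated assumptions, not kernel code: `struct model`, `model_cpu_offline`, `model_cpu_online`, `window_fn` and `checked_push` are hypothetical names, and the callback stands in for a stop IPI that lands on the target CPU between the online check and the migration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t mask_t;

/* Hypothetical model state: bit n set means CPU n is online. */
struct model {
    mask_t online;
};

/* Stand-in for the stop IPI handler marking a CPU offline. */
static void model_cpu_offline(struct model *m, int cpu)
{
    assert(cpu >= 0 && cpu < 32);
    m->online &= ~((mask_t)1 << cpu);
}

static bool model_cpu_online(const struct model *m, int cpu)
{
    assert(cpu >= 0 && cpu < 32);
    return (m->online >> cpu) & 1u;
}

/* Event injected between the check and the act; may be NULL. */
typedef void (*window_fn)(struct model *m, int cpu);

/*
 * Check-then-act push. Returns true if the model would still migrate
 * to an offline CPU, i.e. the warning condition. *pushed reports
 * whether the migration step was reached at all.
 */
static bool checked_push(struct model *m, int target,
                         window_fn in_window, bool *pushed)
{
    *pushed = false;

    /* The early exit added by the patch. */
    if (!model_cpu_online(m, target))
        return false;

    /* Nothing prevents the target from going offline right here. */
    if (in_window != NULL)
        in_window(m, target);

    *pushed = true;
    return !model_cpu_online(m, target);
}
```

Starting from an online mask of 0x3 and target CPU 1, checked_push() with a NULL callback pushes without the warning condition, while passing model_cpu_offline as the callback passes the check and still reaches the migration with the target offline. The check narrows the window but does not close it.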