John reported that push_rt_task() can end up invoking
find_lowest_rq(rq->curr) when curr is not an RT task (in this case a CFS
one), which causes mayhem further down in convert_prio().
This can happen when current gets demoted to e.g. CFS when releasing an
rt_mutex, and the local CPU gets hit with an rto_push_work irqwork before
getting the chance to reschedule. Exactly who triggers this work isn't
entirely clear to me: switched_from_rt() only invokes rt_queue_pull_task()
if there are no RT tasks left on the local RQ, which means the local CPU
can't be in the rto_mask.
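The rto_mask argument above can be sanity-checked with a tiny userspace model. This is a sketch under stated assumptions, not kernel code: `struct rt_rq_model`, `in_rto_mask()`, `queues_pull()` and `pull_implies_not_in_rto_mask()` are hypothetical names; the overload rule is a simplified reading of update_rt_migration() (overloaded while there is a migratory RT task and more than one RT task); and rt_nr_running and rt_nr_total are treated as a single count.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of one CPU's RT runqueue counters.
 * Assumption: rt_nr_running and rt_nr_total are treated as one count.
 */
struct rt_rq_model {
	unsigned int nr_running;   /* RT tasks enqueued on this CPU */
	unsigned int nr_migratory; /* of those, tasks allowed on more than one CPU */
};

/*
 * Simplified overload rule, loosely after update_rt_migration(): the CPU
 * is in the rto_mask while it has a migratory RT task and more than one
 * RT task in total.
 */
static bool in_rto_mask(const struct rt_rq_model *rt)
{
	assert(rt->nr_migratory <= rt->nr_running);
	return rt->nr_migratory > 0 && rt->nr_running > 1;
}

/* Loosely after switched_from_rt(): only queue a pull once no RT task is left. */
static bool queues_pull(const struct rt_rq_model *rt)
{
	return rt->nr_running == 0;
}

/*
 * Exhaustively check small counter values: whenever a pull would be
 * queued, the local CPU cannot be in the rto_mask.
 */
static bool pull_implies_not_in_rto_mask(unsigned int max)
{
	for (unsigned int n = 0; n <= max; n++) {
		for (unsigned int m = 0; m <= n; m++) {
			struct rt_rq_model rt = { .nr_running = n, .nr_migratory = m };

			if (queues_pull(&rt) && in_rto_mask(&rt))
				return false;
		}
	}
	return true;
}
```

Under this simplified rule, a CPU left with two RT tasks, at least one of them migratory, does stay in the rto_mask, so other CPUs can still target it with rto_push_work.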
My current suspicion is a sequence along the lines of the one below,
with the demoted task being current.
  mark_wakeup_next_waiter()
    rt_mutex_adjust_prio()
      rt_mutex_setprio() // deboost originally-CFS task
        check_class_changed()
          switched_from_rt() // Only rt_queue_pull_task() if !rq->rt.rt_nr_running
          switched_to_fair() // Sets need_resched
      __balance_callbacks() // if pull_rt_task(), tell_cpu_to_push() can't select local CPU per the above
      raw_spin_rq_unlock(rq)

       // need_resched is set, so task_woken_rt() can't
       // invoke push_rt_tasks(). Best I can come up with is
       // local CPU has rt_nr_migratory >= 2 after the demotion, so stays
       // in the rto_mask, and then:

       <some other CPU running rto_push_irq_work_func() queues rto_push_work on this CPU>
         push_rt_task()
           // breakage follows here as rq->curr is CFS
Move the existing check comparing rq->curr's priority against that of the
next pushable task so that it runs before we get anywhere near
find_lowest_rq(). While at it, add an explicit check on rq->curr's
sched_class prior to invoking find_lowest_rq(rq->curr).
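A minimal userspace model of the reordered checks, assuming only the decision logic matters. The types, `push_rt_task_model()` and `find_lowest_rq_model()` are hypothetical stand-ins, the "!pull || rq->push_busy" bail-out is folded into a single `can_push` flag, and the actual push of next_task is not modeled. The assert() in find_lowest_rq_model() plays the role of the breakage down convert_prio() when curr isn't RT.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the scheduling classes (not kernel types). */
enum class_model { CLASS_IDLE, CLASS_FAIR, CLASS_RT, CLASS_DL, CLASS_STOP };

struct task_model {
	int prio;                 /* kernel numbering: lower value = higher priority */
	enum class_model sched_class;
	bool migration_disabled;
};

enum push_outcome {
	PUSH_NOTHING,     /* return 0 without doing anything */
	PUSH_RESCHED,     /* resched_curr(rq); return 0 */
	PUSH_FIND_LOWEST, /* reached find_lowest_rq(rq->curr) */
	PUSH_NEXT,        /* would go on to push next_task (not modeled) */
};

/* Stand-in for find_lowest_rq(rq->curr): only meaningful for an RT curr. */
static enum push_outcome find_lowest_rq_model(const struct task_model *curr)
{
	assert(curr->sched_class == CLASS_RT);
	return PUSH_FIND_LOWEST;
}

/* Order of checks at the top of push_rt_task() with the patch applied. */
static enum push_outcome push_rt_task_model(const struct task_model *curr,
					    const struct task_model *next,
					    bool can_push)
{
	/* Hoisted: a higher-priority next_task only needs a resched. */
	if (next->prio < curr->prio)
		return PUSH_RESCHED;

	if (next->migration_disabled) {
		if (!can_push)
			return PUSH_NOTHING;
		/* New: never hand a non-RT curr to find_lowest_rq(). */
		if (curr->sched_class != CLASS_RT)
			return PUSH_NOTHING;
		return find_lowest_rq_model(curr);
	}

	return PUSH_NEXT;
}
```

With a CFS curr (prio 120) and a migration-disabled RT next_task (prio 50), the model returns PUSH_RESCHED before the migration-disabled branch is reached, so find_lowest_rq_model() is never called on the CFS task.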
Link: http://lore.kernel.org/r/Yb3vXx3DcqVOi+EA@donbot
Fixes: a7c81556ec4d ("sched: Fix migrate_disable() vs rt/dl balancing")
Reported-by: John Keeping <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
---
kernel/sched/rt.c | 31 +++++++++++++++++++++----------
1 file changed, 21 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 7b4f4fbbb404..48fc8c04b038 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2026,6 +2026,16 @@ static int push_rt_task(struct rq *rq, bool pull)
 		return 0;
 
 retry:
+	/*
+	 * It's possible that the next_task slipped in of
+	 * higher priority than current. If that's the case
+	 * just reschedule current.
+	 */
+	if (unlikely(next_task->prio < rq->curr->prio)) {
+		resched_curr(rq);
+		return 0;
+	}
+
 	if (is_migration_disabled(next_task)) {
 		struct task_struct *push_task = NULL;
 		int cpu;
@@ -2033,6 +2043,17 @@ static int push_rt_task(struct rq *rq, bool pull)
 		if (!pull || rq->push_busy)
 			return 0;
 
+		/*
+		 * Per the above priority check, curr is at least RT. If it's
+		 * of a higher class than RT, invoking find_lowest_rq() on it
+		 * doesn't make sense.
+		 *
+		 * Note that the stoppers are masqueraded as SCHED_FIFO
+		 * (cf. sched_set_stop_task()), so we can't rely on rt_task().
+		 */
+		if (rq->curr->sched_class != &rt_sched_class)
+			return 0;
+
 		cpu = find_lowest_rq(rq->curr);
 		if (cpu == -1 || cpu == rq->cpu)
 			return 0;
@@ -2057,16 +2078,6 @@ static int push_rt_task(struct rq *rq, bool pull)
 	if (WARN_ON(next_task == rq->curr))
 		return 0;
 
-	/*
-	 * It's possible that the next_task slipped in of
-	 * higher priority than current. If that's the case
-	 * just reschedule current.
-	 */
-	if (unlikely(next_task->prio < rq->curr->prio)) {
-		resched_curr(rq);
-		return 0;
-	}
-
 	/* We might release rq lock */
 	get_task_struct(next_task);
--
2.25.1
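The comment about stoppers can be illustrated with a small sketch. It assumes the kernel's mapping prio = MAX_RT_PRIO - 1 - sched_priority for SCHED_FIFO, and that sched_set_stop_task() gives the stopper sched_priority MAX_RT_PRIO - 1; `struct class_desc`, `fifo_prio()`, `looks_rt_by_prio()` and `is_rt_class()` are hypothetical names. A prio-based test in the spirit of rt_task() accepts the stopper, whereas comparing the class pointer does not.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RT_PRIO 100 /* same value as the kernel's MAX_RT_PRIO */

/* Hypothetical class descriptors; only their addresses matter here. */
struct class_desc { const char *name; };
static const struct class_desc stop_class_model = { "stop" };
static const struct class_desc rt_class_model = { "rt" };

struct task_model {
	int prio;
	const struct class_desc *sched_class;
};

/* SCHED_FIFO/SCHED_RR mapping: prio = MAX_RT_PRIO - 1 - sched_priority. */
static int fifo_prio(int sched_priority)
{
	assert(sched_priority >= 1 && sched_priority <= MAX_RT_PRIO - 1);
	return MAX_RT_PRIO - 1 - sched_priority;
}

/* Priority-based test in the spirit of rt_task(). */
static bool looks_rt_by_prio(const struct task_model *p)
{
	return p->prio < MAX_RT_PRIO;
}

/* What the patch checks instead: the class descriptor itself. */
static bool is_rt_class(const struct task_model *p)
{
	return p->sched_class == &rt_class_model;
}
```

A stopper built with fifo_prio(MAX_RT_PRIO - 1) ends up at prio 0, passes looks_rt_by_prio(), and is still rejected by is_rt_class().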
On 20/01/22 19:40, Valentin Schneider wrote:
> Link: http://lore.kernel.org/r/Yb3vXx3DcqVOi+EA@donbot
> Fixes: a7c81556ec4d ("sched: Fix migrate_disable() vs rt/dl balancing")
> Reported-by: John Keeping <[email protected]>
> Signed-off-by: Valentin Schneider <[email protected]>
@John: it's slightly different from the few things we got you to try out,
so I didn't keep your Tested-by, sorry!
On Thu, 20 Jan 2022 19:47:01 +0000
Valentin Schneider <[email protected]> wrote:
> On 20/01/22 19:40, Valentin Schneider wrote:
> > Link: http://lore.kernel.org/r/Yb3vXx3DcqVOi+EA@donbot
> > Fixes: a7c81556ec4d ("sched: Fix migrate_disable() vs rt/dl balancing")
> > Reported-by: John Keeping <[email protected]>
> > Signed-off-by: Valentin Schneider <[email protected]>
>
> @John: it's slightly different than the few things we got you to try out,
> so I didn't keep your tested-by, sorry!
I ran a test with this version as well and as expected this does indeed
fix the issue, so this is also:
Tested-by: John Keeping <[email protected]>
On 20/01/2022 20:40, Valentin Schneider wrote:
[...]
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 7b4f4fbbb404..48fc8c04b038 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2026,6 +2026,16 @@ static int push_rt_task(struct rq *rq, bool pull)
> return 0;
>
> retry:
> + /*
> + * It's possible that the next_task slipped in of
> + * higher priority than current. If that's the case
> + * just reschedule current.
> + */
> + if (unlikely(next_task->prio < rq->curr->prio)) {
> + resched_curr(rq);
> + return 0;
> + }
If we do this before `is_migration_disabled(next_task)`, shouldn't the
related condition in push_dl_task() then also be moved up?
	if (dl_task(rq->curr) &&
	    dl_time_before(next_task->dl.deadline, rq->curr->dl.deadline) &&
	    rq->curr->nr_cpus_allowed > 1)
To enforce resched_curr(rq) in the `is_migration_disabled(next_task)`
case there as well?
> +
> if (is_migration_disabled(next_task)) {
> struct task_struct *push_task = NULL;
> int cpu;
> @@ -2033,6 +2043,17 @@ static int push_rt_task(struct rq *rq, bool pull)
> if (!pull || rq->push_busy)
> return 0;
>
> + /*
> + * Per the above priority check, curr is at least RT. If it's
> + * of a higher class than RT, invoking find_lowest_rq() on it
> + * doesn't make sense.
> + *
> + * Note that the stoppers are masqueraded as SCHED_FIFO
> + * (cf. sched_set_stop_task()), so we can't rely on rt_task().
> + */
> + if (rq->curr->sched_class != &rt_sched_class)
s/ != / > / ... since the `unlikely(next_task->prio < rq->curr->prio)`
already filters tasks from lower sched classes (CFS)?
> + return 0;
> +
[...]
On 21/01/22 19:47, John Keeping wrote:
> On Thu, 20 Jan 2022 19:47:01 +0000
> Valentin Schneider <[email protected]> wrote:
>
>> On 20/01/22 19:40, Valentin Schneider wrote:
>> > Link: http://lore.kernel.org/r/Yb3vXx3DcqVOi+EA@donbot
>> > Fixes: a7c81556ec4d ("sched: Fix migrate_disable() vs rt/dl balancing")
>> > Reported-by: John Keeping <[email protected]>
>> > Signed-off-by: Valentin Schneider <[email protected]>
>>
>> @John: it's slightly different than the few things we got you to try out,
>> so I didn't keep your tested-by, sorry!
>
> I ran a test with this version as well and as expected this does indeed
> fix the issue, so this is also:
>
> Tested-by: John Keeping <[email protected]>
Thanks for testing this again!
On 24/01/22 10:37, Dietmar Eggemann wrote:
> On 20/01/2022 20:40, Valentin Schneider wrote:
>
> [...]
>
>> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
>> index 7b4f4fbbb404..48fc8c04b038 100644
>> --- a/kernel/sched/rt.c
>> +++ b/kernel/sched/rt.c
>> @@ -2026,6 +2026,16 @@ static int push_rt_task(struct rq *rq, bool pull)
>> return 0;
>>
>> retry:
>> + /*
>> + * It's possible that the next_task slipped in of
>> + * higher priority than current. If that's the case
>> + * just reschedule current.
>> + */
>> + if (unlikely(next_task->prio < rq->curr->prio)) {
>> + resched_curr(rq);
>> + return 0;
>> + }
>
> If we do this before `is_migration_disabled(next_task), shouldn't then
> the related condition in push_dl_task() also be moved up?
>
> if (dl_task(rq->curr) &&
> dl_time_before(next_task->dl.deadline, rq->curr->dl.deadline) &&
> rq->curr->nr_cpus_allowed > 1)
>
> To enforce resched_curr(rq) in the `is_migration_disabled(next_task)`
> case there as well?
>
I'm not sure if we can hit the same issue with DL since DL doesn't have the
push irqwork. If there are DL tasks on the rq when current gets demoted,
switched_from_dl() won't queue pull_dl_task().
That said, if, say, we have DL tasks on the rq and demote the current DL
task to RT, is there currently anything that will call resched_curr() (I'm
looking at the rt_mutex path)?
switched_to_fair() has a resched_curr() (which helps for the RT -> CFS
case), I don't see anything that would give us that in switched_from_dl() /
switched_to_rt(), or am I missing something?
>> +
>> if (is_migration_disabled(next_task)) {
>> struct task_struct *push_task = NULL;
>> int cpu;
>> @@ -2033,6 +2043,17 @@ static int push_rt_task(struct rq *rq, bool pull)
>> if (!pull || rq->push_busy)
>> return 0;
>>
>> + /*
>> + * Per the above priority check, curr is at least RT. If it's
>> + * of a higher class than RT, invoking find_lowest_rq() on it
>> + * doesn't make sense.
>> + *
>> + * Note that the stoppers are masqueraded as SCHED_FIFO
>> + * (cf. sched_set_stop_task()), so we can't rely on rt_task().
>> + */
>> + if (rq->curr->sched_class != &rt_sched_class)
>
> s/ != / > / ... since the `unlikely(next_task->prio < rq->curr->prio)`
> already filters tasks from lower sched classes (CFS)?
>
!= makes it explicit that we won't invoke find_lowest_rq() on anything that
isn't RT, which is a bit clearer IMO, and it's not as if either of those
comparisons is more expensive than the other :)
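The claim that != and > are interchangeable after the priority check can be verified exhaustively over a handful of representative curr tasks. This is a sketch: the class ranks (idle < fair < RT < DL < stop) and `ne_and_gt_agree()` are hypothetical and do not model how the kernel actually orders its sched_class descriptors; prio values follow the kernel numbering (CFS at 100..139, DL at -1), with the idle value assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical class ranks, from lowest to highest class. */
enum class_rank { RANK_IDLE, RANK_FAIR, RANK_RT, RANK_DL, RANK_STOP };

struct curr_model {
	int prio;             /* kernel numbering: lower value = higher priority */
	enum class_rank rank;
};

/* Representative values for rq->curr. */
static const struct curr_model currs[] = {
	{ 140, RANK_IDLE },   /* idle; assumed prio, any value >= 100 behaves the same */
	{ 139, RANK_FAIR },   /* CFS, nice 19 */
	{ 100, RANK_FAIR },   /* CFS, nice -20 */
	{  49, RANK_RT },     /* SCHED_FIFO priority 50 */
	{   0, RANK_RT },     /* SCHED_FIFO priority 99 */
	{  -1, RANK_DL },     /* SCHED_DEADLINE */
	{   0, RANK_STOP },   /* stopper masquerading as SCHED_FIFO 99 */
};

/*
 * For every RT next_task prio (0..99) and every modeled curr that gets
 * past the "next_task->prio < rq->curr->prio" early return, check that
 * "class != RT" and "class > RT" agree.
 */
static bool ne_and_gt_agree(void)
{
	for (int next_prio = 0; next_prio < 100; next_prio++) {
		for (size_t i = 0; i < sizeof(currs) / sizeof(currs[0]); i++) {
			const struct curr_model *c = &currs[i];

			if (next_prio < c->prio)
				continue; /* resched_curr() path, class check not reached */

			/* Only RT-or-above curr tasks get this far. */
			assert(c->rank >= RANK_RT);
			if ((c->rank != RANK_RT) != (c->rank > RANK_RT))
				return false;
		}
	}
	return true;
}
```

Both predicates agree on every case that survives the priority check, so the choice between them comes down to readability.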
>> + return 0;
>> +
>
> [...]
On 24/01/2022 14:29, Valentin Schneider wrote:
> On 24/01/22 10:37, Dietmar Eggemann wrote:
>> On 20/01/2022 20:40, Valentin Schneider wrote:
>>
>> [...]
>>
>>> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
>>> index 7b4f4fbbb404..48fc8c04b038 100644
>>> --- a/kernel/sched/rt.c
>>> +++ b/kernel/sched/rt.c
>>> @@ -2026,6 +2026,16 @@ static int push_rt_task(struct rq *rq, bool pull)
>>> return 0;
>>>
>>> retry:
>>> + /*
>>> + * It's possible that the next_task slipped in of
>>> + * higher priority than current. If that's the case
>>> + * just reschedule current.
>>> + */
>>> + if (unlikely(next_task->prio < rq->curr->prio)) {
>>> + resched_curr(rq);
>>> + return 0;
>>> + }
>>
>> If we do this before `is_migration_disabled(next_task), shouldn't then
>> the related condition in push_dl_task() also be moved up?
>>
>> if (dl_task(rq->curr) &&
>> dl_time_before(next_task->dl.deadline, rq->curr->dl.deadline) &&
>> rq->curr->nr_cpus_allowed > 1)
>>
>> To enforce resched_curr(rq) in the `is_migration_disabled(next_task)`
>> case there as well?
>>
>
> I'm not sure if we can hit the same issue with DL since DL doesn't have the
> push irqwork. If there are DL tasks on the rq when current gets demoted,
> switched_from_dl() won't queue pull_dl_task().
True. But with your RT change, we now reschedule current (a CFS task, or
an RT task of lower priority than next_task) even when next_task is
migration-disabled, i.e. we prefer rescheduling over pushing current away.
But for DL we wouldn't reschedule current in such a case; we would just
return 0.
That said, the prio-based check in RT covers other sched classes, whereas
the DL check only compares DL tasks.
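The RT/DL asymmetry described here can be sketched as two orderings of the same two checks. Assumptions: `struct dl_task_model`, `next_preempts_curr()`, `dl_order_existing()` and `dl_order_hoisted()` are hypothetical; only the resched condition quoted above and the migration-disabled bail-out are modeled; the existing ordering follows the description in this thread; and the hoisted variant only illustrates the ordering under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, pared-down task state for this comparison only. */
struct dl_task_model {
	bool is_dl;              /* stands in for dl_task(p) */
	uint64_t deadline;       /* absolute deadline */
	int nr_cpus_allowed;
	bool migration_disabled;
};

enum dl_outcome { DL_RETURN_0, DL_RESCHED, DL_CONTINUE };

/* Wrap-safe "a is earlier than b", like the kernel's dl_time_before(). */
static bool dl_time_before_model(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/* The resched condition quoted from push_dl_task(). */
static bool next_preempts_curr(const struct dl_task_model *curr,
			       const struct dl_task_model *next)
{
	assert(curr->nr_cpus_allowed >= 1);
	return curr->is_dl &&
	       dl_time_before_model(next->deadline, curr->deadline) &&
	       curr->nr_cpus_allowed > 1;
}

/* Existing DL ordering: a migration-disabled next_task bails out first. */
static enum dl_outcome dl_order_existing(const struct dl_task_model *curr,
					 const struct dl_task_model *next)
{
	if (next->migration_disabled)
		return DL_RETURN_0;
	if (next_preempts_curr(curr, next))
		return DL_RESCHED;
	return DL_CONTINUE;
}

/* RT-style ordering: the preemption check is hoisted above the bail-out. */
static enum dl_outcome dl_order_hoisted(const struct dl_task_model *curr,
					const struct dl_task_model *next)
{
	if (next_preempts_curr(curr, next))
		return DL_RESCHED;
	if (next->migration_disabled)
		return DL_RETURN_0;
	return DL_CONTINUE;
}
```

For a migration-disabled next_task with an earlier deadline than a DL curr, the existing ordering just returns 0, while the hoisted one reschedules, mirroring what the RT patch now does.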
> That said, if say we have DL tasks on the rq and demote the current DL task
> to RT, do we currently have anything that will call resched_curr() (I'm
> looking at the rt_mutex path)?
> switched_to_fair() has a resched_curr() (which helps for the RT -> CFS
> case), I don't see anything that would give us that in switched_from_dl() /
> switched_to_rt(), or am I missing something?
>
>>> +
>>> if (is_migration_disabled(next_task)) {
>>> struct task_struct *push_task = NULL;
>>> int cpu;
>>> @@ -2033,6 +2043,17 @@ static int push_rt_task(struct rq *rq, bool pull)
>>> if (!pull || rq->push_busy)
>>> return 0;
>>>
>>> + /*
>>> + * Per the above priority check, curr is at least RT. If it's
>>> + * of a higher class than RT, invoking find_lowest_rq() on it
>>> + * doesn't make sense.
>>> + *
>>> + * Note that the stoppers are masqueraded as SCHED_FIFO
>>> + * (cf. sched_set_stop_task()), so we can't rely on rt_task().
>>> + */
>>> + if (rq->curr->sched_class != &rt_sched_class)
>>
>> s/ != / > / ... since the `unlikely(next_task->prio < rq->curr->prio)`
>> already filters tasks from lower sched classes (CFS)?
>>
>
> != points out we won't invoke find_lowest_rq() on anything that isn't RT,
> which makes it a bit clearer IMO, and it's not like either of those
> comparisons is more expensive than the other :)
Also true, but `>` would be more in line with the comment above
('... If it's of a higher class than ...', where 'it' is curr).
[...]
On 24/01/22 16:47, Dietmar Eggemann wrote:
> On 24/01/2022 14:29, Valentin Schneider wrote:
>> On 24/01/22 10:37, Dietmar Eggemann wrote:
>>> On 20/01/2022 20:40, Valentin Schneider wrote:
>>>
>>> [...]
>>>
>>>> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
>>>> index 7b4f4fbbb404..48fc8c04b038 100644
>>>> --- a/kernel/sched/rt.c
>>>> +++ b/kernel/sched/rt.c
>>>> @@ -2026,6 +2026,16 @@ static int push_rt_task(struct rq *rq, bool pull)
>>>> return 0;
>>>>
>>>> retry:
>>>> + /*
>>>> + * It's possible that the next_task slipped in of
>>>> + * higher priority than current. If that's the case
>>>> + * just reschedule current.
>>>> + */
>>>> + if (unlikely(next_task->prio < rq->curr->prio)) {
>>>> + resched_curr(rq);
>>>> + return 0;
>>>> + }
>>>
>>> If we do this before `is_migration_disabled(next_task), shouldn't then
>>> the related condition in push_dl_task() also be moved up?
>>>
>>> if (dl_task(rq->curr) &&
>>> dl_time_before(next_task->dl.deadline, rq->curr->dl.deadline) &&
>>> rq->curr->nr_cpus_allowed > 1)
>>>
>>> To enforce resched_curr(rq) in the `is_migration_disabled(next_task)`
>>> case there as well?
>>>
>>
>> I'm not sure if we can hit the same issue with DL since DL doesn't have the
>> push irqwork. If there are DL tasks on the rq when current gets demoted,
>> switched_from_dl() won't queue pull_dl_task().
>
> True. But with your RT change we reschedule current (CFS task or lower
> rt task than next_task) now even in case next task is
> migration-disabled. I.e. we prefer rescheduling over pushing current away.
>
> But for DL we wouldn't reschedule current in such a case, we would just
> return 0.
>
> That said, the prio based check in RT includes other sched classes where
> the DL check only compares DL tasks.
>
I think you have a point about at least aligning the RT and DL code, and
yes, we shouldn't care whether the next pushable DL task is
migration-disabled or not if it has higher priority than current, so I
think I'll move that check up in v2.
>> That said, if say we have DL tasks on the rq and demote the current DL task
>> to RT, do we currently have anything that will call resched_curr() (I'm
>> looking at the rt_mutex path)?
>> switched_to_fair() has a resched_curr() (which helps for the RT -> CFS
>> case), I don't see anything that would give us that in switched_from_dl() /
>> switched_to_rt(), or am I missing something?
>>
>>>> +
>>>> if (is_migration_disabled(next_task)) {
>>>> struct task_struct *push_task = NULL;
>>>> int cpu;
>>>> @@ -2033,6 +2043,17 @@ static int push_rt_task(struct rq *rq, bool pull)
>>>> if (!pull || rq->push_busy)
>>>> return 0;
>>>>
>>>> + /*
>>>> + * Per the above priority check, curr is at least RT. If it's
>>>> + * of a higher class than RT, invoking find_lowest_rq() on it
>>>> + * doesn't make sense.
>>>> + *
>>>> + * Note that the stoppers are masqueraded as SCHED_FIFO
>>>> + * (cf. sched_set_stop_task()), so we can't rely on rt_task().
>>>> + */
>>>> + if (rq->curr->sched_class != &rt_sched_class)
>>>
>>> s/ != / > / ... since the `unlikely(next_task->prio < rq->curr->prio)`
>>> already filters tasks from lower sched classes (CFS)?
>>>
>>
>> != points out we won't invoke find_lowest_rq() on anything that isn't RT,
>> which makes it a bit clearer IMO, and it's not like either of those
>> comparisons is more expensive than the other :)
>
> Also true, but it would be more aligned to the comment above '... If it
> (i.e. curr) 's of a higher class than ...'
>
Right, I can clean that up!
> [...]