2020-02-14 16:40:50

by Qais Yousef

Subject: [PATCH 2/3] sched/rt: allow pulling unfitting task

When RT Capacity Awareness was implemented, the logic was done such that
if a task was running on a fitting CPU, then it was sticky and we would
try our best to keep it there.

But as Steve suggested, to adhere to the strict priority rules of the RT
class, allow pulling an RT task to an unfitting CPU to ensure it gets a
chance to run ASAP. When doing so, mark the queue as overloaded to give
the system a chance to push the task to a better fitting CPU when a
chance arises.

Suggested-by: Steven Rostedt <[email protected]>
Signed-off-by: Qais Yousef <[email protected]>
---
kernel/sched/rt.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4043abe45459..0c8bac134d3a 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1646,10 +1646,20 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)

static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
{
- if (!task_running(rq, p) &&
- cpumask_test_cpu(cpu, p->cpus_ptr) &&
- rt_task_fits_capacity(p, cpu))
+ if (!task_running(rq, p) && cpumask_test_cpu(cpu, p->cpus_ptr)) {
+
+ /*
+ * If the CPU doesn't fit the task, allow pulling but mark the
+ * rq as overloaded so that we can push it again to a more
+ * suitable CPU ASAP.
+ */
+ if (!rt_task_fits_capacity(p, cpu)) {
+ rt_set_overload(rq);
+ rq->rt.overloaded = 1;
+ }
+
return 1;
+ }

return 0;
}
--
2.17.1


2020-02-17 09:12:19

by Pavankumar Kondeti

Subject: Re: [PATCH 2/3] sched/rt: allow pulling unfitting task

Hi Qais,

On Fri, Feb 14, 2020 at 04:39:48PM +0000, Qais Yousef wrote:
> When RT Capacity Awareness was implemented, the logic was done such that
> if a task was running on a fitting CPU, then it was sticky and we would
> try our best to keep it there.
>
> But as Steve suggested, to adhere to the strict priority rules of the RT
> class, allow pulling an RT task to an unfitting CPU to ensure it gets a
> chance to run ASAP. When doing so, mark the queue as overloaded to give
> the system a chance to push the task to a better fitting CPU when a
> chance arises.
>
> Suggested-by: Steven Rostedt <[email protected]>
> Signed-off-by: Qais Yousef <[email protected]>
> ---
> kernel/sched/rt.c | 16 +++++++++++++---
> 1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 4043abe45459..0c8bac134d3a 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1646,10 +1646,20 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
>
> static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
> {
> - if (!task_running(rq, p) &&
> - cpumask_test_cpu(cpu, p->cpus_ptr) &&
> - rt_task_fits_capacity(p, cpu))
> + if (!task_running(rq, p) && cpumask_test_cpu(cpu, p->cpus_ptr)) {
> +
> + /*
> + * If the CPU doesn't fit the task, allow pulling but mark the
> + * rq as overloaded so that we can push it again to a more
> + * suitable CPU ASAP.
> + */
> + if (!rt_task_fits_capacity(p, cpu)) {
> + rt_set_overload(rq);
> + rq->rt.overloaded = 1;
> + }
> +

Here rq is the source rq from which the task is being pulled. I can't
understand how marking the overload condition on the source rq helps,
because the overload condition gets cleared in the task dequeue path,
i.e. dec_rt_tasks()->dec_rt_migration()->update_rt_migration().
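
For reference, the clearing logic lives in update_rt_migration(), which
re-evaluates the flag whenever the migratory/total counts change. Roughly,
from kernel/sched/rt.c:

static void update_rt_migration(struct rt_rq *rt_rq)
{
	if (rt_rq->rt_nr_migratory && rt_rq->rt_nr_total > 1) {
		if (!rt_rq->overloaded) {
			rt_set_overload(rq_of_rt_rq(rt_rq));
			rt_rq->overloaded = 1;
		}
	} else if (rt_rq->overloaded) {
		/* dropping to a single (or pinned) task clears the flag */
		rt_clear_overload(rq_of_rt_rq(rt_rq));
		rt_rq->overloaded = 0;
	}
}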

Also, the overload condition with nr_running=1 may not work as expected
unless we track this overload condition (due to unfit) separately,
because a task can be pushed only when it is NOT running. So a task
running on a silver will continue to run there until it wakes up next
time or another high prio task gets queued here (due to affinity).

btw, are you testing this path by disabling the RT_PUSH_IPI feature?
I ask because this feature is turned on by default on our b.L platforms
and RT task migrations happen by the busy CPU pushing the tasks. Or are
there any cases where we can run into pick_rt_task() even when
RT_PUSH_IPI is enabled?
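
The path I am referring to is the short-circuit in pull_rt_task(); roughly:

static void pull_rt_task(struct rq *this_rq)
{
	...
	if (sched_feat(RT_PUSH_IPI)) {
		/* ask the overloaded CPUs to push instead of pulling here */
		tell_cpu_to_push(this_rq);
		return;
	}

	for_each_cpu(cpu, this_rq->rd->rto_mask) {
		...
		/* this is what ends up calling pick_rt_task() */
		p = pick_highest_pushable_task(src_rq, this_cpu);
		...
	}
}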

Thanks,
Pavan

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

2020-02-17 12:36:43

by Qais Yousef

Subject: Re: [PATCH 2/3] sched/rt: allow pulling unfitting task

On 02/17/20 14:40, Pavan Kondeti wrote:
> Hi Qais,
>
> On Fri, Feb 14, 2020 at 04:39:48PM +0000, Qais Yousef wrote:
> > When RT Capacity Awareness was implemented, the logic was done such that
> > if a task was running on a fitting CPU, then it was sticky and we would
> > try our best to keep it there.
> >
> > But as Steve suggested, to adhere to the strict priority rules of the RT
> > class, allow pulling an RT task to an unfitting CPU to ensure it gets a
> > chance to run ASAP. When doing so, mark the queue as overloaded to give
> > the system a chance to push the task to a better fitting CPU when a
> > chance arises.
> >
> > Suggested-by: Steven Rostedt <[email protected]>
> > Signed-off-by: Qais Yousef <[email protected]>
> > ---
> > kernel/sched/rt.c | 16 +++++++++++++---
> > 1 file changed, 13 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index 4043abe45459..0c8bac134d3a 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -1646,10 +1646,20 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
> >
> > static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
> > {
> > - if (!task_running(rq, p) &&
> > - cpumask_test_cpu(cpu, p->cpus_ptr) &&
> > - rt_task_fits_capacity(p, cpu))
> > + if (!task_running(rq, p) && cpumask_test_cpu(cpu, p->cpus_ptr)) {
> > +
> > + /*
> > + * If the CPU doesn't fit the task, allow pulling but mark the
> > + * rq as overloaded so that we can push it again to a more
> > + * suitable CPU ASAP.
> > + */
> > + if (!rt_task_fits_capacity(p, cpu)) {
> > + rt_set_overload(rq);
> > + rq->rt.overloaded = 1;
> > + }
> > +
>
> Here rq is the source rq from which the task is being pulled. I can't
> understand how marking the overload condition on the source rq helps,
> because the overload condition gets cleared in the task dequeue path,
> i.e. dec_rt_tasks()->dec_rt_migration()->update_rt_migration().

Err yes indeed I rushed this patch. Let me try again.

Thanks

--
Qais Yousef

2020-02-19 13:43:40

by Qais Yousef

Subject: Re: [PATCH 2/3] sched/rt: allow pulling unfitting task

On 02/17/20 14:40, Pavan Kondeti wrote:
> Hi Qais,
>
> On Fri, Feb 14, 2020 at 04:39:48PM +0000, Qais Yousef wrote:
> > When RT Capacity Awareness was implemented, the logic was done such that
> > if a task was running on a fitting CPU, then it was sticky and we would
> > try our best to keep it there.
> >
> > But as Steve suggested, to adhere to the strict priority rules of the RT
> > class, allow pulling an RT task to an unfitting CPU to ensure it gets a
> > chance to run ASAP. When doing so, mark the queue as overloaded to give
> > the system a chance to push the task to a better fitting CPU when a
> > chance arises.
> >
> > Suggested-by: Steven Rostedt <[email protected]>
> > Signed-off-by: Qais Yousef <[email protected]>
> > ---
> > kernel/sched/rt.c | 16 +++++++++++++---
> > 1 file changed, 13 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index 4043abe45459..0c8bac134d3a 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -1646,10 +1646,20 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
> >
> > static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
> > {
> > - if (!task_running(rq, p) &&
> > - cpumask_test_cpu(cpu, p->cpus_ptr) &&
> > - rt_task_fits_capacity(p, cpu))
> > + if (!task_running(rq, p) && cpumask_test_cpu(cpu, p->cpus_ptr)) {
> > +
> > + /*
> > + * If the CPU doesn't fit the task, allow pulling but mark the
> > + * rq as overloaded so that we can push it again to a more
> > + * suitable CPU ASAP.
> > + */
> > + if (!rt_task_fits_capacity(p, cpu)) {
> > + rt_set_overload(rq);
> > + rq->rt.overloaded = 1;
> > + }
> > +
>
> Here rq is the source rq from which the task is being pulled. I can't
> understand how marking the overload condition on the source rq helps,
> because the overload condition gets cleared in the task dequeue path,
> i.e. dec_rt_tasks()->dec_rt_migration()->update_rt_migration().
>
> Also, the overload condition with nr_running=1 may not work as expected
> unless we track this overload condition (due to unfit) separately,
> because a task can be pushed only when it is NOT running. So a task
> running on a silver will continue to run there until it wakes up next
> time or another high prio task gets queued here (due to affinity).
>
> btw, are you testing this path by disabling the RT_PUSH_IPI feature?
> I ask because this feature is turned on by default on our b.L platforms
> and RT task migrations happen by the busy CPU pushing the tasks. Or are
> there any cases where we can run into pick_rt_task() even when
> RT_PUSH_IPI is enabled?

I changed the approach to set the overload at wake up now. I think I got away
without having to encode the reason in rq->rt.overloaded.

Steve, Pavan, if you can scrutinize this approach I'd appreciate it. It seemed
to work fine in my testing.

I'll push an updated series if this turns out okay.

Thanks

--
Qais Yousef

-->8--


When RT Capacity Awareness was implemented, the logic was done such that
if a task was running on a fitting CPU, then it was sticky and we would
try our best to keep it there.

But as Steve suggested, to adhere to the strict priority rules of the RT
class, allow pulling an RT task to an unfitting CPU to ensure it gets a
chance to run ASAP.

To better handle the fact that the task is running on an unfit CPU, mark
the rq as overloaded when the task wakes up; this will cause the task to
be pushed to a fitting CPU when one becomes available.

The latter change requires teaching push_rt_task() how to handle pushing
an unfit task.

If the unfit task is the only pushable task, then we only force the push
if we find a fitting CPU. Otherwise we bail out.

Else if the task is of higher priority than the current task, then we
reschedule.

Else if the rq has other pushable tasks, then we push the unfitting task
anyway to reduce the pressure on the rq, even if the target CPU is unfit
too.

Suggested-by: Steven Rostedt <[email protected]>
Signed-off-by: Qais Yousef <[email protected]>
---
kernel/sched/rt.c | 52 +++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 48 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 6d959be4bba0..6d92219d5733 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1658,8 +1658,7 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
{
if (!task_running(rq, p) &&
- cpumask_test_cpu(cpu, p->cpus_ptr) &&
- rt_task_fits_capacity(p, cpu))
+ cpumask_test_cpu(cpu, p->cpus_ptr))
return 1;

return 0;
@@ -1860,6 +1859,7 @@ static int push_rt_task(struct rq *rq)
struct task_struct *next_task;
struct rq *lowest_rq;
int ret = 0;
+ bool fit;

if (!rq->rt.overloaded)
return 0;
@@ -1872,12 +1872,21 @@ static int push_rt_task(struct rq *rq)
if (WARN_ON(next_task == rq->curr))
return 0;

+ /*
+ * The rq could be overloaded because it has an unfitting task; if that's
+ * the case then we need to try harder to find a better fitting CPU.
+ */
+ fit = rt_task_fits_capacity(next_task, cpu_of(rq));
+
/*
* It's possible that the next_task slipped in of
* higher priority than current. If that's the case
* just reschedule current.
+ *
+ * Unless next_task doesn't fit on this CPU, in which case continue with
+ * the attempt to push it.
*/
- if (unlikely(next_task->prio < rq->curr->prio)) {
+ if (unlikely(next_task->prio < rq->curr->prio && fit)) {
resched_curr(rq);
return 0;
}
@@ -1920,6 +1929,33 @@ static int push_rt_task(struct rq *rq)
goto retry;
}

+ /*
+ * Bail out if the task doesn't fit on either CPU.
+ *
+ * Unless..
+ *
+ * * The priority of next_task is higher than current, then we
+ * resched_curr(). We forced skipping this condition above.
+ *
+ * * The rq has more tasks to push, then we probably should push anyway
+ * to reduce the load on this rq.
+ */
+ if (!fit && !rt_task_fits_capacity(next_task, cpu_of(lowest_rq))) {
+
+ /* we forced skipping this condition, so re-evaluate it */
+ if (unlikely(next_task->prio < rq->curr->prio)) {
+ resched_curr(rq);
+ goto out_unlock;
+ }
+
+ /*
+ * If there are more tasks to push, then the rq is overloaded
+ * with more than just this task, so push anyway.
+ */
+ if (has_pushable_tasks(rq))
+ goto out_unlock;
+ }
+
deactivate_task(rq, next_task, 0);
set_task_cpu(next_task, lowest_rq->cpu);
activate_task(lowest_rq, next_task, 0);
@@ -1927,6 +1963,7 @@ static int push_rt_task(struct rq *rq)

resched_curr(lowest_rq);

+out_unlock:
double_unlock_balance(rq, lowest_rq);

out:
@@ -2223,7 +2260,14 @@ static void task_woken_rt(struct rq *rq, struct task_struct *p)
(rq->curr->nr_cpus_allowed < 2 ||
rq->curr->prio <= p->prio);

- if (need_to_push || !rt_task_fits_capacity(p, cpu_of(rq)))
+ bool fit = rt_task_fits_capacity(p, cpu_of(rq));
+
+ if (!fit && !rq->rt.overloaded) {
+ rt_set_overload(rq);
+ rq->rt.overloaded = 1;
+ }
+
+ if (need_to_push || !fit)
push_rt_tasks(rq);
}

--
2.17.1

2020-02-21 08:36:21

by Pavankumar Kondeti

Subject: Re: [PATCH 2/3] sched/rt: allow pulling unfitting task

Hi Quais,

On Wed, Feb 19, 2020 at 01:43:08PM +0000, Qais Yousef wrote:

[...]

> >
> > Here rq is the source rq from which the task is being pulled. I can't
> > understand how marking the overload condition on the source rq helps,
> > because the overload condition gets cleared in the task dequeue path,
> > i.e. dec_rt_tasks()->dec_rt_migration()->update_rt_migration().
> >
> > Also, the overload condition with nr_running=1 may not work as expected
> > unless we track this overload condition (due to unfit) separately,
> > because a task can be pushed only when it is NOT running. So a task
> > running on a silver will continue to run there until it wakes up next
> > time or another high prio task gets queued here (due to affinity).
> >
> > btw, are you testing this path by disabling the RT_PUSH_IPI feature?
> > I ask because this feature is turned on by default on our b.L platforms
> > and RT task migrations happen by the busy CPU pushing the tasks. Or are
> > there any cases where we can run into pick_rt_task() even when
> > RT_PUSH_IPI is enabled?
>
> I changed the approach to set the overload at wake up now. I think I got away
> without having to encode the reason in rq->rt.overloaded.
>
> Steve, Pavan, if you can scrutinize this approach I'd appreciate it. It seemed
> to work fine in my testing.
>

The 1st patch in this series added the support to return an unfit CPU in case
all other BIG CPUs are busy with RT tasks. I believe that change itself
solves the RT task spreading problem if we just remove
rt_task_fits_capacity() from pick_rt_task(). In fact rt_task_fits_capacity()
can also be removed from switched_to_rt() and task_woken_rt(), because
an RT task can be pulled/pushed only if it is not currently running. If the
intention is to push/pull a running RT task from an unfit CPU to a BIG CPU,
that won't work as we are not waking up any migration/X thread to carry out the migration.
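
Moving a *running* task needs the stopper machinery; roughly what
__set_cpus_allowed_ptr() does in kernel/sched/core.c:

	struct migration_arg arg = { .task = p, .dest_cpu = dest_cpu };

	/* drop the rq lock and let the per-CPU stopper preempt and move p */
	task_rq_unlock(rq, p, &rf);
	stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);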

If a CPU has 2 RT tasks (excluding the running RT task), then we have a
choice about which RT task to push based on the destination CPU's
capacity. Are we trying to solve this problem in this patch?

Please see the comments below as well and let me know if we are on the
same page or not.

>
>
> When RT Capacity Awareness was implemented, the logic was done such that
> if a task was running on a fitting CPU, then it was sticky and we would
> try our best to keep it there.
>
> But as Steve suggested, to adhere to the strict priority rules of the RT
> class, allow pulling an RT task to an unfitting CPU to ensure it gets a
> chance to run ASAP.
>
> To better handle the fact that the task is running on an unfit CPU, mark
> the rq as overloaded when the task wakes up; this will cause the task to
> be pushed to a fitting CPU when one becomes available.
>
> The latter change requires teaching push_rt_task() how to handle pushing
> an unfit task.
>
> If the unfit task is the only pushable task, then we only force the push
> if we find a fitting CPU. Otherwise we bail out.
>
> Else if the task is of higher priority than the current task, then we
> reschedule.
>
> Else if the rq has other pushable tasks, then we push the unfitting task
> anyway to reduce the pressure on the rq, even if the target CPU is unfit
> too.
>
> Suggested-by: Steven Rostedt <[email protected]>
> Signed-off-by: Qais Yousef <[email protected]>
> ---
> kernel/sched/rt.c | 52 +++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 48 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 6d959be4bba0..6d92219d5733 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1658,8 +1658,7 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
> static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
> {
> if (!task_running(rq, p) &&
> - cpumask_test_cpu(cpu, p->cpus_ptr) &&
> - rt_task_fits_capacity(p, cpu))
> + cpumask_test_cpu(cpu, p->cpus_ptr))
> return 1;
>

Yes. If we come here, it means the rq has more than 1 runnable task, so we
prefer spreading.

> return 0;
> @@ -1860,6 +1859,7 @@ static int push_rt_task(struct rq *rq)
> struct task_struct *next_task;
> struct rq *lowest_rq;
> int ret = 0;
> + bool fit;
>
> if (!rq->rt.overloaded)
> return 0;
> @@ -1872,12 +1872,21 @@ static int push_rt_task(struct rq *rq)
> if (WARN_ON(next_task == rq->curr))
> return 0;
>
> + /*
> > + * The rq could be overloaded because it has an unfitting task; if that's
> > + * the case then we need to try harder to find a better fitting CPU.
> + */
> + fit = rt_task_fits_capacity(next_task, cpu_of(rq));
> +
> /*
> * It's possible that the next_task slipped in of
> * higher priority than current. If that's the case
> * just reschedule current.
> + *
> > + * Unless next_task doesn't fit on this CPU, in which case continue with
> > + * the attempt to push it.
> */
> - if (unlikely(next_task->prio < rq->curr->prio)) {
> + if (unlikely(next_task->prio < rq->curr->prio && fit)) {
> resched_curr(rq);
> return 0;
> }
> @@ -1920,6 +1929,33 @@ static int push_rt_task(struct rq *rq)
> goto retry;
> }
>
> + /*
> > + * Bail out if the task doesn't fit on either CPU.
> + *
> + * Unless..
> + *
> + * * The priority of next_task is higher than current, then we
> + * resched_curr(). We forced skipping this condition above.
> + *
> + * * The rq has more tasks to push, then we probably should push anyway
> > + * to reduce the load on this rq.
> + */
> + if (!fit && !rt_task_fits_capacity(next_task, cpu_of(lowest_rq))) {
> +
> + /* we forced skipping this condition, so re-evaluate it */
> + if (unlikely(next_task->prio < rq->curr->prio)) {
> + resched_curr(rq);
> + goto out_unlock;
> + }
> +
> + /*
> + * If there are more tasks to push, then the rq is overloaded
> + * with more than just this task, so push anyway.
> + */
> + if (has_pushable_tasks(rq))
> + goto out_unlock;

The next_task is not yet removed from the pushable_tasks list, so this will
always be true and we skip pushing. Even if you add some logic, what happens
if the other task also does not fit? How would the task finally be pushed
to other CPUs?
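
For reference, has_pushable_tasks() only checks that the plist is non-empty,
and next_task is still on it at this point:

static inline int has_pushable_tasks(struct rq *rq)
{
	return !plist_head_empty(&rq->rt.pushable_tasks);
}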

> + }
> +
> deactivate_task(rq, next_task, 0);
> set_task_cpu(next_task, lowest_rq->cpu);
> activate_task(lowest_rq, next_task, 0);
> @@ -1927,6 +1963,7 @@ static int push_rt_task(struct rq *rq)
>
> resched_curr(lowest_rq);
>
> +out_unlock:
> double_unlock_balance(rq, lowest_rq);
>
> out:
> @@ -2223,7 +2260,14 @@ static void task_woken_rt(struct rq *rq, struct task_struct *p)
> (rq->curr->nr_cpus_allowed < 2 ||
> rq->curr->prio <= p->prio);
>
> - if (need_to_push || !rt_task_fits_capacity(p, cpu_of(rq)))
> + bool fit = rt_task_fits_capacity(p, cpu_of(rq));
> +
> + if (!fit && !rq->rt.overloaded) {
> + rt_set_overload(rq);
> + rq->rt.overloaded = 1;
> + }
> +
> + if (need_to_push || !fit)
> push_rt_tasks(rq);
> }

If this is the only RT task queued on this CPU and the current task is not an
RT task, why are we calling push_rt_tasks() in the unfit scenario? In most
cases, the task is woken on this rq because there is no other higher capacity
rq that can take this task, for whatever reason.

btw, if we set the overload condition with just 1 RT task, we keep receiving
the IPI (irq work) to push the task, and we can't do that because a running RT
task can't be pushed.

Thanks,
Pavan
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

2020-02-21 11:10:52

by Qais Yousef

Subject: Re: [PATCH 2/3] sched/rt: allow pulling unfitting task

On 02/21/20 13:37, Pavan Kondeti wrote:
> Hi Quais,

I know my name breaks the English rules, but there's no u after my Q :)

>
> On Wed, Feb 19, 2020 at 01:43:08PM +0000, Qais Yousef wrote:
>
> [...]
>
> > >
> > > Here rq is the source rq from which the task is being pulled. I can't
> > > understand how marking the overload condition on the source rq helps,
> > > because the overload condition gets cleared in the task dequeue path,
> > > i.e. dec_rt_tasks()->dec_rt_migration()->update_rt_migration().
> > >
> > > Also, the overload condition with nr_running=1 may not work as expected
> > > unless we track this overload condition (due to unfit) separately,
> > > because a task can be pushed only when it is NOT running. So a task
> > > running on a silver will continue to run there until it wakes up next
> > > time or another high prio task gets queued here (due to affinity).
> > >
> > > btw, are you testing this path by disabling the RT_PUSH_IPI feature?
> > > I ask because this feature is turned on by default on our b.L platforms
> > > and RT task migrations happen by the busy CPU pushing the tasks. Or are
> > > there any cases where we can run into pick_rt_task() even when
> > > RT_PUSH_IPI is enabled?
> >
> > I changed the approach to set the overload at wake up now. I think I got away
> > without having to encode the reason in rq->rt.overloaded.
> >
> > Steve, Pavan, if you can scrutinize this approach I'd appreciate it. It seemed
> > to work fine in my testing.
> >
>
> The 1st patch in this series added the support to return an unfit CPU in case

If you can give your reviewed-by and maybe tested-by for that patch that'd be
great.

> all other BIG CPUs are busy with RT tasks. I believe that change itself
> solves the RT task spreading problem if we just remove
> rt_task_fits_capacity() from pick_rt_task(). In fact rt_task_fits_capacity()

Yes I will be removing this.

> can also be removed from switched_to_rt() and task_woken_rt(), because
> an RT task can be pulled/pushed only if it is not currently running. If the

I did actually consider pushing a patch that does just that.

The issue is that I have seen delays in migrating the task to the big CPU,
although the CPU was free. It was hard to hit, but I had runs where the
migration didn't happen at all during the 1s test duration.

I need to use the same overloaded trick in switched_to_rt() too though.
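
Something along these lines (rough sketch, untested):

static void switched_to_rt(struct rq *rq, struct task_struct *p)
{
	...
	if (task_on_rq_queued(p)) {
#ifdef CONFIG_SMP
		bool fit = rt_task_fits_capacity(p, cpu_of(rq));

		/* same trick as in task_woken_rt(): flag the unfit rq */
		if (!fit && !rq->rt.overloaded) {
			rt_set_overload(rq);
			rq->rt.overloaded = 1;
		}

		if ((p->nr_cpus_allowed > 1 && rq->rt.overloaded) || !fit)
			rt_queue_push_tasks(rq);
#endif
		if (p->prio < rq->curr->prio && cpu_online(cpu_of(rq)))
			resched_curr(rq);
	}
}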

> intention is to push/pull a running RT task from an unfit CPU to a BIG CPU,
> that won't work as we are not waking up any migration/X thread to carry out the migration.

Hmm, I thought this should happen as a side effect of the push/pull functions.

>
> If a CPU has 2 RT tasks (excluding the running RT task), then we have a
> choice about which RT task to push based on the destination CPU's
> capacity. Are we trying to solve this problem in this patch?

If there are 2 tasks on the same CPU (rq), then it's overloaded and the push
should be triggered anyway, no?

What I'm trying to solve is: if a task has woken up on the wrong CPU, or has
just switched to RT on the wrong CPU, then we want to push it ASAP.

We can rely on select_task_rq_rt() to do the work, but I have seen big delays
for this to trigger.
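
To be clear, the wake up path does consider capacity; roughly what
select_task_rq_rt() does after the capacity awareness change:

	test = curr &&
	       unlikely(rt_task(curr)) &&
	       (curr->nr_cpus_allowed < 2 || curr->prio <= p->prio);

	if (test || !rt_task_fits_capacity(p, cpu)) {
		int target = find_lowest_rq(p);

		/*
		 * Don't bother moving it if the destination CPU is
		 * not running a lower priority task.
		 */
		if (target != -1 &&
		    p->prio < cpu_rq(target)->rt.highest_prio.curr)
			cpu = target;
	}

But that only helps at the time of the wake up; it doesn't notice a big CPU
becoming free later on.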

>
> Please see the comments below as well and let me know if we are on the
> same page or not.
>
> >
> >
> > When RT Capacity Awareness was implemented, the logic was done such that
> > if a task was running on a fitting CPU, then it was sticky and we would
> > try our best to keep it there.
> >
> > But as Steve suggested, to adhere to the strict priority rules of the RT
> > class, allow pulling an RT task to an unfitting CPU to ensure it gets a
> > chance to run ASAP.
> >
> > To better handle the fact that the task is running on an unfit CPU, mark
> > the rq as overloaded when the task wakes up; this will cause the task to
> > be pushed to a fitting CPU when one becomes available.
> >
> > The latter change requires teaching push_rt_task() how to handle pushing
> > an unfit task.
> >
> > If the unfit task is the only pushable task, then we only force the push
> > if we find a fitting CPU. Otherwise we bail out.
> >
> > Else if the task is of higher priority than the current task, then we
> > reschedule.
> >
> > Else if the rq has other pushable tasks, then we push the unfitting task
> > anyway to reduce the pressure on the rq, even if the target CPU is unfit
> > too.
> >
> > Suggested-by: Steven Rostedt <[email protected]>
> > Signed-off-by: Qais Yousef <[email protected]>
> > ---
> > kernel/sched/rt.c | 52 +++++++++++++++++++++++++++++++++++++++++++----
> > 1 file changed, 48 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index 6d959be4bba0..6d92219d5733 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -1658,8 +1658,7 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
> > static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
> > {
> > if (!task_running(rq, p) &&
> > - cpumask_test_cpu(cpu, p->cpus_ptr) &&
> > - rt_task_fits_capacity(p, cpu))
> > + cpumask_test_cpu(cpu, p->cpus_ptr))
> > return 1;
> >
>
> Yes. If we come here, it means the rq has more than 1 runnable task, so we
> prefer spreading.

I might make this change the only one in this patch and send the fixes so they
can go in before 5.6 is released. Unless the current patch ends up needing
only a few tweaks.

>
> > return 0;
> > @@ -1860,6 +1859,7 @@ static int push_rt_task(struct rq *rq)
> > struct task_struct *next_task;
> > struct rq *lowest_rq;
> > int ret = 0;
> > + bool fit;
> >
> > if (!rq->rt.overloaded)
> > return 0;
> > @@ -1872,12 +1872,21 @@ static int push_rt_task(struct rq *rq)
> > if (WARN_ON(next_task == rq->curr))
> > return 0;
> >
> > + /*
> > > + * The rq could be overloaded because it has an unfitting task; if that's
> > > + * the case then we need to try harder to find a better fitting CPU.
> > + */
> > + fit = rt_task_fits_capacity(next_task, cpu_of(rq));
> > +
> > /*
> > * It's possible that the next_task slipped in of
> > * higher priority than current. If that's the case
> > * just reschedule current.
> > + *
> > > + * Unless next_task doesn't fit on this CPU, in which case continue with
> > > + * the attempt to push it.
> > */
> > - if (unlikely(next_task->prio < rq->curr->prio)) {
> > + if (unlikely(next_task->prio < rq->curr->prio && fit)) {
> > resched_curr(rq);
> > return 0;
> > }
> > @@ -1920,6 +1929,33 @@ static int push_rt_task(struct rq *rq)
> > goto retry;
> > }
> >
> > + /*
> > > + * Bail out if the task doesn't fit on either CPU.
> > + *
> > + * Unless..
> > + *
> > + * * The priority of next_task is higher than current, then we
> > + * resched_curr(). We forced skipping this condition above.
> > + *
> > + * * The rq has more tasks to push, then we probably should push anyway
> > > + * to reduce the load on this rq.
> > + */
> > + if (!fit && !rt_task_fits_capacity(next_task, cpu_of(lowest_rq))) {
> > +
> > + /* we forced skipping this condition, so re-evaluate it */
> > + if (unlikely(next_task->prio < rq->curr->prio)) {
> > + resched_curr(rq);
> > + goto out_unlock;
> > + }
> > +
> > + /*
> > + * If there are more tasks to push, then the rq is overloaded
> > + * with more than just this task, so push anyway.
> > + */
> > + if (has_pushable_tasks(rq))
> > + goto out_unlock;

Hmm, I accidentally inverted the logic too before posting the patch. It should
have been (!has_pushable_tasks).

>
> The next_task is not yet removed from the pushable_tasks list, so this will
> always be true and we skip pushing. Even if you add some logic, what happens

I saw the loop in push_rt_task[s]() and thought pick_next_pushable_task() is
what dequeues it...

Is there another way to know how many tasks are pushable? Or do I need a new
helper?
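
Maybe something like this? (hypothetical helper, name made up; assumes @p is
still on the plist):

/*
 * Return true if @rq has a pushable task other than @p.
 */
static inline bool has_other_pushable_tasks(struct rq *rq, struct task_struct *p)
{
	struct plist_head *head = &rq->rt.pushable_tasks;

	if (plist_head_empty(head))
		return false;

	if (plist_first_entry(head, struct task_struct, pushable_tasks) != p)
		return true;

	/* @p is first; is anyone queued behind it? */
	return !list_is_singular(&head->node_list);
}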

> if the other task also does not fit? How would the task finally be pushed
> to other CPUs?

If we had 2 unfitting tasks, we'll push one and keep one. If we had 2 tasks,
one unfitting and one fitting, then we'll still push 1. I think the dequeue
should have cleared the overloaded state after pushing the first task anyway.

Does this answer your question or were you trying to point out something else?

>
> > + }
> > +
> > deactivate_task(rq, next_task, 0);
> > set_task_cpu(next_task, lowest_rq->cpu);
> > activate_task(lowest_rq, next_task, 0);
> > @@ -1927,6 +1963,7 @@ static int push_rt_task(struct rq *rq)
> >
> > resched_curr(lowest_rq);
> >
> > +out_unlock:
> > double_unlock_balance(rq, lowest_rq);
> >
> > out:
> > @@ -2223,7 +2260,14 @@ static void task_woken_rt(struct rq *rq, struct task_struct *p)
> > (rq->curr->nr_cpus_allowed < 2 ||
> > rq->curr->prio <= p->prio);
> >
> > - if (need_to_push || !rt_task_fits_capacity(p, cpu_of(rq)))
> > + bool fit = rt_task_fits_capacity(p, cpu_of(rq));
> > +
> > + if (!fit && !rq->rt.overloaded) {
> > + rt_set_overload(rq);
> > + rq->rt.overloaded = 1;
> > + }
> > +
> > + if (need_to_push || !fit)
> > push_rt_tasks(rq);
> > }
>
> If this is the only RT task queued on this CPU and the current task is not an
> RT task, why are we calling push_rt_tasks() in the unfit scenario? In most
> cases, the task is woken on this rq because there is no other higher capacity
> rq that can take this task, for whatever reason.

But what if the big CPU is now free for it to run?

I need to add a similar condition to switched_to_rt(), otherwise
push_rt_tasks() would be a NOP.

>
> btw, if we set the overload condition with just 1 RT task, we keep receiving
> the IPI (irq work) to push the task, and we can't do that because a running RT
> task can't be pushed.

Good point.

We can teach the logic not to send IPIs in this case?

Given all of that, I might still agree this might be unnecessary work. So I'll
split it out into a separate patch anyway and send an updated series of the
fixes we agreed on to go into 5.6.

Thanks!

--
Qais Yousef