2020-02-14 16:40:53

by Qais Yousef

Subject: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU

If a task was running on an unfit CPU, we could skip migrating it if the
priority level of the new fitting CPU is the *same* as that of the unfit one.

Add an extra check to select_task_rq_rt() to allow the push in case:

* old_cpu.highest_priority == new_cpu.highest_priority
* task_fits(p, new_cpu)

Signed-off-by: Qais Yousef <[email protected]>
---

I was sometimes seeing delays in migrating a task to a big CPU even though it
was free, and I think this fixes it.

TBH, I fail to see how the check of

p->prio < cpu_rq(target)->rt.highest_prio.curr

is necessary, as find_lowest_rq() surely implies the above condition by
definition?

Unless we're fighting a race condition here, where the rt_rq priority has
changed between the time we selected the lowest_rq and the time we decide to
migrate, in which case this makes sense.


kernel/sched/rt.c | 34 +++++++++++++++++++++++++---------
1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 0c8bac134d3a..5ea235f2cfe8 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1430,7 +1430,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
{
struct task_struct *curr;
struct rq *rq;
- bool test;
+ bool test, fit;

/* For anything but wake ups, just return the task_cpu */
if (sd_flag != SD_BALANCE_WAKE && sd_flag != SD_BALANCE_FORK)
@@ -1471,16 +1471,32 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
unlikely(rt_task(curr)) &&
(curr->nr_cpus_allowed < 2 || curr->prio <= p->prio);

- if (test || !rt_task_fits_capacity(p, cpu)) {
+ fit = rt_task_fits_capacity(p, cpu);
+
+ if (test || !fit) {
int target = find_lowest_rq(p);

- /*
- * Don't bother moving it if the destination CPU is
- * not running a lower priority task.
- */
- if (target != -1 &&
- p->prio < cpu_rq(target)->rt.highest_prio.curr)
- cpu = target;
+ if (target != -1) {
+ /*
+ * Don't bother moving it if the destination CPU is
+ * not running a lower priority task.
+ */
+ if (p->prio < cpu_rq(target)->rt.highest_prio.curr) {
+
+ cpu = target;
+
+ } else if (p->prio == cpu_rq(target)->rt.highest_prio.curr) {
+
+ /*
+ * If the priority is the same and the new CPU
+ * is a better fit, then move, otherwise don't
+ * bother here either.
+ */
+ fit = rt_task_fits_capacity(p, target);
+ if (fit)
+ cpu = target;
+ }
+ }
}
rcu_read_unlock();

--
2.17.1


2020-02-17 09:24:46

by Pavankumar Kondeti

Subject: Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU

Hi Qais,

On Fri, Feb 14, 2020 at 04:39:49PM +0000, Qais Yousef wrote:

[...]

> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 0c8bac134d3a..5ea235f2cfe8 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1430,7 +1430,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> {
> struct task_struct *curr;
> struct rq *rq;
> - bool test;
> + bool test, fit;
>
> /* For anything but wake ups, just return the task_cpu */
> if (sd_flag != SD_BALANCE_WAKE && sd_flag != SD_BALANCE_FORK)
> @@ -1471,16 +1471,32 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> unlikely(rt_task(curr)) &&
> (curr->nr_cpus_allowed < 2 || curr->prio <= p->prio);
>
> - if (test || !rt_task_fits_capacity(p, cpu)) {
> + fit = rt_task_fits_capacity(p, cpu);
> +
> + if (test || !fit) {
> int target = find_lowest_rq(p);
>
> - /*
> - * Don't bother moving it if the destination CPU is
> - * not running a lower priority task.
> - */
> - if (target != -1 &&
> - p->prio < cpu_rq(target)->rt.highest_prio.curr)
> - cpu = target;
> + if (target != -1) {
> + /*
> + * Don't bother moving it if the destination CPU is
> + * not running a lower priority task.
> + */
> + if (p->prio < cpu_rq(target)->rt.highest_prio.curr) {
> +
> + cpu = target;
> +
> + } else if (p->prio == cpu_rq(target)->rt.highest_prio.curr) {
> +
> + /*
> + * If the priority is the same and the new CPU
> + * is a better fit, then move, otherwise don't
> + * bother here either.
> + */
> + fit = rt_task_fits_capacity(p, target);
> + if (fit)
> + cpu = target;
> + }
> + }

I understand that we are opting for the migration when priorities are tied but
the task can fit on the new CPU. But there is no guarantee that the task will
stay there, because any CPU that drops its RT prio can pull the task. Then why
not leave it to the balancer?

I noticed a case where tasks would migrate for no reason (it happens without
this patch also). Assuming the BIG cores are busy with other RT tasks, this RT
task can now go to *any* little CPU; there is no bias towards its previous CPU.
I don't know if it makes any difference, but otherwise I see RT task placement
being keen on reducing migrations unless they are absolutely needed.

Thanks,
Pavan

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

2020-02-17 14:15:21

by Qais Yousef

Subject: Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU

On 02/17/20 14:53, Pavan Kondeti wrote:
> Hi Qais,
>
> On Fri, Feb 14, 2020 at 04:39:49PM +0000, Qais Yousef wrote:
>
> [...]
>
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index 0c8bac134d3a..5ea235f2cfe8 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -1430,7 +1430,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> > {
> > struct task_struct *curr;
> > struct rq *rq;
> > - bool test;
> > + bool test, fit;
> >
> > /* For anything but wake ups, just return the task_cpu */
> > if (sd_flag != SD_BALANCE_WAKE && sd_flag != SD_BALANCE_FORK)
> > @@ -1471,16 +1471,32 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> > unlikely(rt_task(curr)) &&
> > (curr->nr_cpus_allowed < 2 || curr->prio <= p->prio);
> >
> > - if (test || !rt_task_fits_capacity(p, cpu)) {
> > + fit = rt_task_fits_capacity(p, cpu);
> > +
> > + if (test || !fit) {
> > int target = find_lowest_rq(p);
> >
> > - /*
> > - * Don't bother moving it if the destination CPU is
> > - * not running a lower priority task.
> > - */
> > - if (target != -1 &&
> > - p->prio < cpu_rq(target)->rt.highest_prio.curr)
> > - cpu = target;
> > + if (target != -1) {
> > + /*
> > + * Don't bother moving it if the destination CPU is
> > + * not running a lower priority task.
> > + */
> > + if (p->prio < cpu_rq(target)->rt.highest_prio.curr) {
> > +
> > + cpu = target;
> > +
> > + } else if (p->prio == cpu_rq(target)->rt.highest_prio.curr) {
> > +
> > + /*
> > + * If the priority is the same and the new CPU
> > + * is a better fit, then move, otherwise don't
> > + * bother here either.
> > + */
> > + fit = rt_task_fits_capacity(p, target);
> > + if (fit)
> > + cpu = target;
> > + }
> > + }
>
> I understand that we are opting for the migration when priorities are tied but
> the task can fit on the new task. But there is no guarantee that this task
> stay there. Because any CPU that drops RT prio can pull the task. Then why
> not leave it to the balancer?

This patch does help in the 2-RT-task test case. Without it I can see a big
delay before the task migrates from a little CPU to a big one, although the
big one is free.

Maybe my test is too short (1 second). The delay I've seen is 0.5-0.7s.

https://imgur.com/a/qKJk4w4

Maybe I missed the real root cause. Let me dig more.

>
> I notice a case where tasks would migrate for no reason (happens without this
> patch also). Assuming BIG cores are busy with other RT tasks. Now this RT
> task can go to *any* little CPU. There is no bias towards its previous CPU.
> I don't know if it makes any difference but I see RT task placement is too
> keen on reducing the migrations unless it is absolutely needed.

In find_lowest_rq() there's a check whether task_cpu(p) is in the lowest_mask,
and it is preferred if it is.

But yeah I see it happening too

https://imgur.com/a/FYqLIko

Tasks on CPU 0 and 3 swap. Note that my tasks are periodic, but the plots
don't show that.

I don't think I changed anything that would affect this bias. Do you think
it's something I introduced?

It's something maybe worth digging into though. I'll try to have a look.

Thanks

--
Qais Yousef

2020-02-18 04:16:51

by Pavankumar Kondeti

Subject: Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU

On Mon, Feb 17, 2020 at 01:53:07PM +0000, Qais Yousef wrote:
> On 02/17/20 14:53, Pavan Kondeti wrote:
> > Hi Qais,
> >
> > On Fri, Feb 14, 2020 at 04:39:49PM +0000, Qais Yousef wrote:
> >
> > [...]
> >
> > > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > > index 0c8bac134d3a..5ea235f2cfe8 100644
> > > --- a/kernel/sched/rt.c
> > > +++ b/kernel/sched/rt.c
> > > @@ -1430,7 +1430,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> > > {
> > > struct task_struct *curr;
> > > struct rq *rq;
> > > - bool test;
> > > + bool test, fit;
> > >
> > > /* For anything but wake ups, just return the task_cpu */
> > > if (sd_flag != SD_BALANCE_WAKE && sd_flag != SD_BALANCE_FORK)
> > > @@ -1471,16 +1471,32 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> > > unlikely(rt_task(curr)) &&
> > > (curr->nr_cpus_allowed < 2 || curr->prio <= p->prio);
> > >
> > > - if (test || !rt_task_fits_capacity(p, cpu)) {
> > > + fit = rt_task_fits_capacity(p, cpu);
> > > +
> > > + if (test || !fit) {
> > > int target = find_lowest_rq(p);
> > >
> > > - /*
> > > - * Don't bother moving it if the destination CPU is
> > > - * not running a lower priority task.
> > > - */
> > > - if (target != -1 &&
> > > - p->prio < cpu_rq(target)->rt.highest_prio.curr)
> > > - cpu = target;
> > > + if (target != -1) {
> > > + /*
> > > + * Don't bother moving it if the destination CPU is
> > > + * not running a lower priority task.
> > > + */
> > > + if (p->prio < cpu_rq(target)->rt.highest_prio.curr) {
> > > +
> > > + cpu = target;
> > > +
> > > + } else if (p->prio == cpu_rq(target)->rt.highest_prio.curr) {
> > > +
> > > + /*
> > > + * If the priority is the same and the new CPU
> > > + * is a better fit, then move, otherwise don't
> > > + * bother here either.
> > > + */
> > > + fit = rt_task_fits_capacity(p, target);
> > > + if (fit)
> > > + cpu = target;
> > > + }
> > > + }
> >
> > I understand that we are opting for the migration when priorities are tied but
> > the task can fit on the new task. But there is no guarantee that this task
> > stay there. Because any CPU that drops RT prio can pull the task. Then why
> > not leave it to the balancer?
>
> This patch does help in the 2 RT task test case. Without it I can see a big
> delay for the task to migrate from a little CPU to a big one, although the big
> is free.
>
> Maybe my test is too short (1 second). The delay I've seen is 0.5-0.7s..
>
> https://imgur.com/a/qKJk4w4
>
> Maybe I missed the real root cause. Let me dig more.
>
> >
> > I notice a case where tasks would migrate for no reason (happens without this
> > patch also). Assuming BIG cores are busy with other RT tasks. Now this RT
> > task can go to *any* little CPU. There is no bias towards its previous CPU.
> > I don't know if it makes any difference but I see RT task placement is too
> > keen on reducing the migrations unless it is absolutely needed.
>
> In find_lowest_rq() there's a check if the task_cpu(p) is in the lowest_mask
> and prefer it if it is.
>
> But yeah I see it happening too
>
> https://imgur.com/a/FYqLIko
>
> Tasks on CPU 0 and 3 swap. Note that my tasks are periodic but the plots don't
> show that.
>
> I shouldn't have changed something to affect this bias. Do you think it's
> something I introduced?
>
> It's something maybe worth digging into though. I'll try to have a look.
>

The original RT task placement, i.e. without capacity awareness, places the
task on the previous CPU if the task can preempt the running task there. I
interpreted this as: the "higher prio RT" task should get better treatment
even if it results in stopping the lower prio RT task's execution and
migrating it to another CPU.

Now coming to your patch (merged): we force a find_lowest_rq() search if the
previous CPU can't fit the task, even though the task could run there right
away. When the lowest mask returns an unfit CPU (with your new patch), we have
two choices: either place the task on this unfit CPU (which may involve a
migration), or place it on the previous CPU to avoid the migration. We are
selecting the first approach.

The task_cpu(p) check in find_lowest_rq() only works when the previous CPU
does not have an RT task. If it is running a lower prio RT task than the
waking task, the lowest_mask may not contain the previous CPU.

I don't know if any workload is hurt by this change in behavior, so I am not
sure whether we have to restore the original behavior. Something like the
below would do.

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4043abe..c80d948 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1475,11 +1475,15 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
int target = find_lowest_rq(p);

/*
- * Don't bother moving it if the destination CPU is
- * not running a lower priority task.
+ * Don't bother moving it
+ *
+ * - If the destination CPU is not running a lower priority task
+ * - The task can't fit on the destination CPU and it can run
+ * right away on its previous CPU.
*/
- if (target != -1 &&
- p->prio < cpu_rq(target)->rt.highest_prio.curr)
+ if (target != -1 && target != cpu &&
+ p->prio < cpu_rq(target)->rt.highest_prio.curr &&
+ (test || rt_task_fits_capacity(p, target)))
cpu = target;
}
rcu_read_unlock();

Thanks,
Pavan


2020-02-18 17:48:48

by Qais Yousef

Subject: Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU

On 02/18/20 09:46, Pavan Kondeti wrote:
> The original RT task placement i.e without capacity awareness, places the task
> on the previous CPU if the task can preempt the running task. I interpreted it
> as that "higher prio RT" task should get better treatment even if it results
> in stopping the lower prio RT execution and migrating it to another CPU.
>
> Now coming to your patch (merged), we force find_lowest_rq() if the previous
> CPU can't fit the task though this task can right away run there. When the
> lowest mask returns an unfit CPU (with your new patch), We have two choices,
> either to place it on this unfit CPU (may involve migration) or place it on
> the previous CPU to avoid the migration. We are selecting the first approach.
>
> The task_cpu(p) check in find_lowest_rq() only works when the previous CPU
> does not have a RT task. If it is running a lower prio RT task than the
> waking task, the lowest_mask may not contain the previous CPU.
>
> I don't if any workload hurts due to this change in behavior. So not sure
> if we have to restore the original behavior. Something like below will do.

Is this patch equivalent to yours? If yes, then I got you. If not, then I need
to re-read this again.

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index ace9acf9d63c..854a0c9a7be6 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1476,6 +1476,13 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
if (test || !rt_task_fits_capacity(p, cpu)) {
int target = find_lowest_rq(p);

+ /*
+ * Bail out if we were forcing a migration to find a better
+ * fitting CPU but our search failed.
+ */
+ if (!test && !rt_task_fits_capacity(p, target))
+ goto out_unlock;
+
/*
* Don't bother moving it if the destination CPU is
* not running a lower priority task.
@@ -1484,6 +1491,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
p->prio < cpu_rq(target)->rt.highest_prio.curr)
cpu = target;
}
+
+out_unlock:
rcu_read_unlock();

out:


>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 4043abe..c80d948 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1475,11 +1475,15 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> int target = find_lowest_rq(p);
>
> /*
> - * Don't bother moving it if the destination CPU is
> - * not running a lower priority task.
> + * Don't bother moving it
> + *
> + * - If the destination CPU is not running a lower priority task
> + * - The task can't fit on the destination CPU and it can run
> + * right away on it's previous CPU.
> */
> - if (target != -1 &&
> - p->prio < cpu_rq(target)->rt.highest_prio.curr)
> + if (target != -1 && target != cpu &&
> + p->prio < cpu_rq(target)->rt.highest_prio.curr &&
> + (test || rt_task_fits_capacity(p, target)))
> cpu = target;
> }
> rcu_read_unlock();
>
> Thanks,
> Pavan
>

2020-02-19 02:47:16

by Pavankumar Kondeti

Subject: Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU

On Tue, Feb 18, 2020 at 05:47:19PM +0000, Qais Yousef wrote:
> On 02/18/20 09:46, Pavan Kondeti wrote:
> > The original RT task placement i.e without capacity awareness, places the task
> > on the previous CPU if the task can preempt the running task. I interpreted it
> > as that "higher prio RT" task should get better treatment even if it results
> > in stopping the lower prio RT execution and migrating it to another CPU.
> >
> > Now coming to your patch (merged), we force find_lowest_rq() if the previous
> > CPU can't fit the task though this task can right away run there. When the
> > lowest mask returns an unfit CPU (with your new patch), We have two choices,
> > either to place it on this unfit CPU (may involve migration) or place it on
> > the previous CPU to avoid the migration. We are selecting the first approach.
> >
> > The task_cpu(p) check in find_lowest_rq() only works when the previous CPU
> > does not have a RT task. If it is running a lower prio RT task than the
> > waking task, the lowest_mask may not contain the previous CPU.
> >
> > I don't if any workload hurts due to this change in behavior. So not sure
> > if we have to restore the original behavior. Something like below will do.
>
> Is this patch equivalent to yours? If yes, then I got you. If not, then I need
> to re-read this again..
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index ace9acf9d63c..854a0c9a7be6 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1476,6 +1476,13 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> if (test || !rt_task_fits_capacity(p, cpu)) {
> int target = find_lowest_rq(p);
>
> + /*
> + * Bail out if we were forcing a migration to find a better
> + * fitting CPU but our search failed.
> + */
> + if (!test && !rt_task_fits_capacity(p, target))
> + goto out_unlock;
> +

Yes. This is what I was referring to.

Thanks,
Pavan


2020-02-19 10:47:04

by Qais Yousef

Subject: Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU

On 02/19/20 08:16, Pavan Kondeti wrote:
> On Tue, Feb 18, 2020 at 05:47:19PM +0000, Qais Yousef wrote:
> > On 02/18/20 09:46, Pavan Kondeti wrote:
> > > The original RT task placement i.e without capacity awareness, places the task
> > > on the previous CPU if the task can preempt the running task. I interpreted it
> > > as that "higher prio RT" task should get better treatment even if it results
> > > in stopping the lower prio RT execution and migrating it to another CPU.
> > >
> > > Now coming to your patch (merged), we force find_lowest_rq() if the previous
> > > CPU can't fit the task though this task can right away run there. When the
> > > lowest mask returns an unfit CPU (with your new patch), We have two choices,
> > > either to place it on this unfit CPU (may involve migration) or place it on
> > > the previous CPU to avoid the migration. We are selecting the first approach.
> > >
> > > The task_cpu(p) check in find_lowest_rq() only works when the previous CPU
> > > does not have a RT task. If it is running a lower prio RT task than the
> > > waking task, the lowest_mask may not contain the previous CPU.
> > >
> > > I don't if any workload hurts due to this change in behavior. So not sure
> > > if we have to restore the original behavior. Something like below will do.
> >
> > Is this patch equivalent to yours? If yes, then I got you. If not, then I need
> > to re-read this again..
> >
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index ace9acf9d63c..854a0c9a7be6 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -1476,6 +1476,13 @@ select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
> > if (test || !rt_task_fits_capacity(p, cpu)) {
> > int target = find_lowest_rq(p);
> >
> > + /*
> > + * Bail out if we were forcing a migration to find a better
> > + * fitting CPU but our search failed.
> > + */
> > + if (!test && !rt_task_fits_capacity(p, target))
> > + goto out_unlock;
> > +
>
> Yes. This is what I was referring to.

Cool. I can't see how this could be a problem either, but since, as you say,
it'd preserve the older behavior, I'll add it to the lot with a proper
changelog.

Thanks!

--
Qais Yousef

2020-02-19 14:06:01

by Qais Yousef

Subject: Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU

On 02/17/20 13:53, Qais Yousef wrote:
> On 02/17/20 14:53, Pavan Kondeti wrote:
> > I notice a case where tasks would migrate for no reason (happens without this
> > patch also). Assuming BIG cores are busy with other RT tasks. Now this RT
> > task can go to *any* little CPU. There is no bias towards its previous CPU.
> > I don't know if it makes any difference but I see RT task placement is too
> > keen on reducing the migrations unless it is absolutely needed.
>
> In find_lowest_rq() there's a check if the task_cpu(p) is in the lowest_mask
> and prefer it if it is.
>
> But yeah I see it happening too
>
> https://imgur.com/a/FYqLIko
>
> Tasks on CPU 0 and 3 swap. Note that my tasks are periodic but the plots don't
> show that.
>
> I shouldn't have changed something to affect this bias. Do you think it's
> something I introduced?
>
> It's something maybe worth digging into though. I'll try to have a look.

FWIW, I dug a bit into this and I found out we have a thundering herd issue.

Since I just have a set of periodic tasks that all start together,
select_task_rq_rt() ends up selecting the same fitting CPU for all of them
(CPU1). They all end up waking up on CPU1, only to get pushed back out
again, with only one of them surviving there.

This reshuffles the task placement, ending with some tasks being swapped.

I don't think this problem is specific to my change; it could happen without
it too.

The problem is caused by the way find_lowest_rq() selects a CPU in the mask:

1750 best_cpu = cpumask_first_and(lowest_mask,
1751 sched_domain_span(sd));
1752 if (best_cpu < nr_cpu_ids) {
1753 rcu_read_unlock();
1754 return best_cpu;
1755 }

It always returns the first CPU in the mask, or the mask could contain only
a single CPU. The end result is that we most likely end up herding all the
tasks that wake up simultaneously onto the same CPU.

I'm not sure how to fix this problem yet.

--
Qais Yousef

2020-02-21 08:17:35

by Pavankumar Kondeti

Subject: Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU

On Wed, Feb 19, 2020 at 02:02:44PM +0000, Qais Yousef wrote:
> On 02/17/20 13:53, Qais Yousef wrote:
> > On 02/17/20 14:53, Pavan Kondeti wrote:
> > > I notice a case where tasks would migrate for no reason (happens without this
> > > patch also). Assuming BIG cores are busy with other RT tasks. Now this RT
> > > task can go to *any* little CPU. There is no bias towards its previous CPU.
> > > I don't know if it makes any difference but I see RT task placement is too
> > > keen on reducing the migrations unless it is absolutely needed.
> >
> > In find_lowest_rq() there's a check if the task_cpu(p) is in the lowest_mask
> > and prefer it if it is.
> >
> > But yeah I see it happening too
> >
> > https://imgur.com/a/FYqLIko
> >
> > Tasks on CPU 0 and 3 swap. Note that my tasks are periodic but the plots don't
> > show that.
> >
> > I shouldn't have changed something to affect this bias. Do you think it's
> > something I introduced?
> >
> > It's something maybe worth digging into though. I'll try to have a look.
>
> FWIW, I dug a bit into this and I found out we have a thundering herd issue.
>
> Since I just have a set of periodic task that all start together,
> select_task_rq_rt() ends up selecting the same fitting CPU for all of them
> (CPU1). The end up all waking up on CPU1, only to get pushed back out
> again with only one surviving.
>
> This reshuffles the task placement ending with some tasks being swapped.
>
> I don't think this problem is specific to my change and could happen without
> it.
>
> The problem is caused by the way find_lowest_rq() selects a cpu in the mask
>
> 1750 best_cpu = cpumask_first_and(lowest_mask,
> 1751 sched_domain_span(sd));
> 1752 if (best_cpu < nr_cpu_ids) {
> 1753 rcu_read_unlock();
> 1754 return best_cpu;
> 1755 }
>
> It always returns the first CPU in the mask. Or the mask could only contain
> a single CPU too. The end result is that we most likely end up herding all the
> tasks that wake up simultaneously to the same CPU.
>
> I'm not sure how to fix this problem yet.
>

Yes, I have seen this problem too. It is not limited to RT; even the fair
class (the find_energy_efficient_cpu path) has the same issue. There is a
window between selecting a CPU for the task and the task being queued there.
Because of this, we may select the same CPU for two successive waking tasks.
Turning off the TTWU_QUEUE sched feature addresses this to some extent; at
least it would solve cases like multiple tasks getting woken up from an
interrupt handler.

Thanks,
Pavan


2020-02-21 11:13:01

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH 3/3] sched/rt: fix pushing unfit tasks to a better CPU

On 02/21/20 13:45, Pavan Kondeti wrote:
> On Wed, Feb 19, 2020 at 02:02:44PM +0000, Qais Yousef wrote:
> > On 02/17/20 13:53, Qais Yousef wrote:
> > > On 02/17/20 14:53, Pavan Kondeti wrote:
> > > > I notice a case where tasks would migrate for no reason (happens without this
> > > > patch also). Assuming BIG cores are busy with other RT tasks. Now this RT
> > > > task can go to *any* little CPU. There is no bias towards its previous CPU.
> > > > I don't know if it makes any difference but I see RT task placement is too
> > > > keen on reducing the migrations unless it is absolutely needed.
> > >
> > > In find_lowest_rq() there's a check if the task_cpu(p) is in the lowest_mask
> > > and prefer it if it is.
> > >
> > > But yeah I see it happening too
> > >
> > > https://imgur.com/a/FYqLIko
> > >
> > > Tasks on CPU 0 and 3 swap. Note that my tasks are periodic but the plots don't
> > > show that.
> > >
> > > I shouldn't have changed something to affect this bias. Do you think it's
> > > something I introduced?
> > >
> > > It's something maybe worth digging into though. I'll try to have a look.
> >
> > FWIW, I dug a bit into this and I found out we have a thundering herd issue.
> >
> > Since I just have a set of periodic task that all start together,
> > select_task_rq_rt() ends up selecting the same fitting CPU for all of them
> > (CPU1). The end up all waking up on CPU1, only to get pushed back out
> > again with only one surviving.
> >
> > This reshuffles the task placement ending with some tasks being swapped.
> >
> > I don't think this problem is specific to my change and could happen without
> > it.
> >
> > The problem is caused by the way find_lowest_rq() selects a cpu in the mask
> >
> > 1750 best_cpu = cpumask_first_and(lowest_mask,
> > 1751 sched_domain_span(sd));
> > 1752 if (best_cpu < nr_cpu_ids) {
> > 1753 rcu_read_unlock();
> > 1754 return best_cpu;
> > 1755 }
> >
> > It always returns the first CPU in the mask. Or the mask could only contain
> > a single CPU too. The end result is that we most likely end up herding all the
> > tasks that wake up simultaneously to the same CPU.
> >
> > I'm not sure how to fix this problem yet.
> >
>
> Yes, I have seen this problem too. This is not limited to RT even fair class
> (find_energy_efficient_cpu path) also have the same issue. There is a window
> where we select a CPU for the task and the task being queued there. Because of
> this, we may select the same CPU for two successive waking tasks. Turning off
> TTWU_QUEUE sched feature addresses this up to some extent. At least it would
> solve the cases like multiple tasks getting woken up from an interrupt handler.

Oh, handy. Let me try this out.

I added it to my to-do list to investigate when I have time anyway.

In modern systems where the L3 spans all CPUs, a migration isn't that
costly, but it'd still be unnecessary wakeup latency that can add up.

Thanks

--
Qais Yousef