2020-11-17 23:27:18

by Joel Fernandes

Subject: [PATCH -tip 12/32] sched: Simplify the core pick loop for optimized case

The core pick loop grew a lot of warts over time to support
optimizations. Turns out that directly doing a class pick before
entering the core-wide pick loop is better for readability. Make the change.
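
The optimized path then reduces to roughly the following (condensed from
the hunks below):

	if (!need_sync) {
		for_each_class(class) {
			next = class->pick_task(rq);
			if (next)
				break;
		}

		if (!next->core_cookie) {
			rq->core_pick = NULL;
			goto done;
		}
		need_sync = true;
	}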

Since this is a relatively new optimization, keep it as a separate patch
so that it is easier to revert in case anyone reports an issue with it.
Testing shows it working for me.

Reviewed-by: Vineeth Pillai <[email protected]>
Suggested-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
kernel/sched/core.c | 73 ++++++++++++++++-----------------------------
1 file changed, 26 insertions(+), 47 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6aa76de55ef2..12e8e6627ab3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5180,6 +5180,15 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
put_prev_task_balance(rq, prev, rf);

smt_mask = cpu_smt_mask(cpu);
+ need_sync = !!rq->core->core_cookie;
+
+ /* reset state */
+ rq->core->core_cookie = 0UL;
+ if (rq->core->core_forceidle) {
+ need_sync = true;
+ fi_before = true;
+ rq->core->core_forceidle = false;
+ }

/*
* core->core_task_seq, core->core_pick_seq, rq->core_sched_seq
@@ -5192,16 +5201,25 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
* 'Fix' this by also increasing @task_seq for every pick.
*/
rq->core->core_task_seq++;
- need_sync = !!rq->core->core_cookie;

- /* reset state */
-reset:
- rq->core->core_cookie = 0UL;
- if (rq->core->core_forceidle) {
+ /*
+ * Optimize for common case where this CPU has no cookies
+ * and there are no cookied tasks running on siblings.
+ */
+ if (!need_sync) {
+ for_each_class(class) {
+ next = class->pick_task(rq);
+ if (next)
+ break;
+ }
+
+ if (!next->core_cookie) {
+ rq->core_pick = NULL;
+ goto done;
+ }
need_sync = true;
- fi_before = true;
- rq->core->core_forceidle = false;
}
+
for_each_cpu(i, smt_mask) {
struct rq *rq_i = cpu_rq(i);

@@ -5239,38 +5257,8 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
* core.
*/
p = pick_task(rq_i, class, max);
- if (!p) {
- /*
- * If there weren't no cookies; we don't need to
- * bother with the other siblings.
- */
- if (i == cpu && !need_sync)
- goto next_class;
-
+ if (!p)
continue;
- }
-
- /*
- * Optimize the 'normal' case where there aren't any
- * cookies and we don't need to sync up.
- */
- if (i == cpu && !need_sync) {
- if (p->core_cookie) {
- /*
- * This optimization is only valid as
- * long as there are no cookies
- * involved. We may have skipped
- * non-empty higher priority classes on
- * siblings, which are empty on this
- * CPU, so start over.
- */
- need_sync = true;
- goto reset;
- }
-
- next = p;
- goto done;
- }

rq_i->core_pick = p;

@@ -5298,18 +5286,9 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
cpu_rq(j)->core_pick = NULL;
}
goto again;
- } else {
- /*
- * Once we select a task for a cpu, we
- * should not be doing an unconstrained
- * pick because it might starve a task
- * on a forced idle cpu.
- */
- need_sync = true;
}
}
}
-next_class:;
}

rq->core->core_pick_seq = rq->core->core_task_seq;
--
2.29.2.299.gdc1121823c-goog


2020-11-24 21:43:13

by Peter Zijlstra

Subject: Re: [PATCH -tip 12/32] sched: Simplify the core pick loop for optimized case

On Tue, Nov 17, 2020 at 06:19:42PM -0500, Joel Fernandes (Google) wrote:
> + /*
> + * Optimize for common case where this CPU has no cookies
> + * and there are no cookied tasks running on siblings.
> + */
> + if (!need_sync) {
> + for_each_class(class) {
> + next = class->pick_task(rq);
> + if (next)
> + break;
> + }
> +
> + if (!next->core_cookie) {
> + rq->core_pick = NULL;
> + goto done;
> + }
> need_sync = true;
> }

This isn't what I sent you here:

https://lkml.kernel.org/r/[email protected]

Specifically, you've lost the whole cfs-cgroup optimization.

What was wrong/not working with the below?

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5225,8 +5227,6 @@ pick_next_task(struct rq *rq, struct tas
return next;
}

- put_prev_task_balance(rq, prev, rf);
-
smt_mask = cpu_smt_mask(cpu);
need_sync = !!rq->core->core_cookie;

@@ -5255,17 +5255,14 @@ pick_next_task(struct rq *rq, struct tas
* and there are no cookied tasks running on siblings.
*/
if (!need_sync) {
- for_each_class(class) {
- next = class->pick_task(rq);
- if (next)
- break;
- }
-
+ next = __pick_next_task(rq, prev, rf);
if (!next->core_cookie) {
rq->core_pick = NULL;
- goto done;
+ return next;
}
- need_sync = true;
+ put_prev_task(next);
+ } else {
+ put_prev_task_balance(rq, prev, rf);
}

for_each_cpu(i, smt_mask) {

2020-11-24 23:44:27

by Joel Fernandes

Subject: Re: [PATCH -tip 12/32] sched: Simplify the core pick loop for optimized case

Hi Peter,

On Tue, Nov 24, 2020 at 01:04:38PM +0100, Peter Zijlstra wrote:
> On Tue, Nov 17, 2020 at 06:19:42PM -0500, Joel Fernandes (Google) wrote:
> > + /*
> > + * Optimize for common case where this CPU has no cookies
> > + * and there are no cookied tasks running on siblings.
> > + */
> > + if (!need_sync) {
> > + for_each_class(class) {
> > + next = class->pick_task(rq);
> > + if (next)
> > + break;
> > + }
> > +
> > + if (!next->core_cookie) {
> > + rq->core_pick = NULL;
> > + goto done;
> > + }
> > need_sync = true;
> > }
>
> This isn't what I sent you here:
>
> https://lkml.kernel.org/r/[email protected]

I had replied to it here with concerns about the effects of newly idle
balancing not being reversible; it was only a theoretical concern:
http://lore.kernel.org/r/[email protected]

Also, I was trying to keep the logic the same as v8 for the unconstrained pick
(calling pick_task), considering that has been tested quite a bit.

> Specifically, you've lost the whole cfs-cgroup optimization.

Are you referring to this optimization in pick_next_task_fair() ?

/*
 * Since we haven't yet done put_prev_entity and if the selected task
 * is a different task than we started out with, try and touch the
 * least amount of cfs_rqs.
 */

You are right, we wouldn't get that with just calling pick_task_fair(). We
did not have this in the v8 series either, though.

Also, if the task is a cookied task, then I think you are doing more work
with your patch due to the extra put_prev_task().
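
To spell it out, for a cookied task the two paths compare roughly like
this (a condensed sketch from the two diffs, error handling elided):

	/* your variant, when next turns out to be cookied: */
	next = __pick_next_task(rq, prev, rf);	/* put_prev + set_next inside */
	if (next->core_cookie)
		put_prev_task(rq, next);	/* undo before the core-wide pick */

	/* this series, same case: */
	for_each_class(class) {
		next = class->pick_task(rq);	/* only a peek, nothing to undo */
		if (next)
			break;
	}
	/* next->core_cookie is set, so fall into the core-wide pick */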

> What was wrong/not working with the below?

Other than the new idle balancing, IIRC it was also causing instability.
Maybe we can consider this optimization in the future if that's Ok with
you?

thanks,

- Joel

> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5225,8 +5227,6 @@ pick_next_task(struct rq *rq, struct tas
> return next;
> }
>
> - put_prev_task_balance(rq, prev, rf);
> -
> smt_mask = cpu_smt_mask(cpu);
> need_sync = !!rq->core->core_cookie;
>
> @@ -5255,17 +5255,14 @@ pick_next_task(struct rq *rq, struct tas
> * and there are no cookied tasks running on siblings.
> */
> if (!need_sync) {
> - for_each_class(class) {
> - next = class->pick_task(rq);
> - if (next)
> - break;
> - }
> -
> + next = __pick_next_task(rq, prev, rf);
> if (!next->core_cookie) {
> rq->core_pick = NULL;
> - goto done;
> + return next;
> }
> - need_sync = true;
> + put_prev_task(next);
> + } else {
> + put_prev_task_balance(rq, prev, rf);
> }
>
> for_each_cpu(i, smt_mask) {

2020-11-25 08:41:13

by Peter Zijlstra

Subject: Re: [PATCH -tip 12/32] sched: Simplify the core pick loop for optimized case

On Tue, Nov 24, 2020 at 12:04:30PM -0500, Joel Fernandes wrote:
> Hi Peter,
>
> On Tue, Nov 24, 2020 at 01:04:38PM +0100, Peter Zijlstra wrote:
> > On Tue, Nov 17, 2020 at 06:19:42PM -0500, Joel Fernandes (Google) wrote:
> > > + /*
> > > + * Optimize for common case where this CPU has no cookies
> > > + * and there are no cookied tasks running on siblings.
> > > + */
> > > + if (!need_sync) {
> > > + for_each_class(class) {
> > > + next = class->pick_task(rq);
> > > + if (next)
> > > + break;
> > > + }
> > > +
> > > + if (!next->core_cookie) {
> > > + rq->core_pick = NULL;
> > > + goto done;
> > > + }
> > > need_sync = true;
> > > }
> >
> > This isn't what I sent you here:
> >
> > https://lkml.kernel.org/r/[email protected]
>
> I had replied to it here with concerns about the effects of newly idle
> balancing not being reversible; it was only a theoretical concern:
> http://lore.kernel.org/r/[email protected]

Gah, missed that. I don't think that matters much; see
put_prev_task_balance() calling balance_fair().
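
For reference, this is roughly what put_prev_task_balance() looks like
in -tip (from memory, so double-check the tree):

	static void put_prev_task_balance(struct rq *rq, struct task_struct *prev,
					  struct rq_flags *rf)
	{
	#ifdef CONFIG_SMP
		const struct sched_class *class;
		/*
		 * We must do the balancing pass before put_prev_task(), such
		 * that when we release the rq->lock the task is in the same
		 * state as before we took rq->lock.
		 *
		 * We can terminate the balance pass as soon as we know there is
		 * a runnable task of @class priority or higher.
		 */
		for_class_range(class, prev->sched_class, &idle_sched_class) {
			if (class->balance(rq, prev, rf))
				break;
		}
	#endif

		put_prev_task(rq, prev);
	}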

> > Specifically, you've lost the whole cfs-cgroup optimization.
>
> Are you referring to this optimization in pick_next_task_fair() ?
>
> /*
>  * Since we haven't yet done put_prev_entity and if the selected task
>  * is a different task than we started out with, try and touch the
>  * least amount of cfs_rqs.
>  */

Yep, that. The giant FAIR_GROUP_SCHED hunk. The thing that makes
all of pick_next_task() more complicated than it really wants to be.
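
The idea, in a stand-alone toy model (not the actual kernel hunk, and
the ordering within a level is simplified): when prev and next share
part of the cgroup hierarchy, only the sched_entity levels where their
chains diverge need a put_prev_entity()/set_next_entity() pair:

	#include <stdio.h>

	struct entity { const char *name; struct entity *parent; };

	static void put_prev_entity(struct entity *se) { printf("put_prev %s\n", se->name); }
	static void set_next_entity(struct entity *se) { printf("set_next %s\n", se->name); }

	static int depth(struct entity *se)
	{
		int d = 0;
		for (; se; se = se->parent)
			d++;
		return d;
	}

	/* Touch only the levels where the two entity chains differ. */
	static void switch_entities(struct entity *prev, struct entity *next)
	{
		int dp = depth(prev), dn = depth(next);

		for (; dp > dn; dp--, prev = prev->parent)
			put_prev_entity(prev);
		for (; dn > dp; dn--, next = next->parent)
			set_next_entity(next);
		for (; prev != next; prev = prev->parent, next = next->parent) {
			put_prev_entity(prev);
			set_next_entity(next);
		}
	}

	int main(void)
	{
		struct entity root  = { "root",  NULL  };
		struct entity grp   = { "grp",   &root };
		struct entity taskP = { "taskP", &grp  };
		struct entity taskN = { "taskN", &grp  };

		/* Same cgroup: only the leaf entities get touched, not grp/root. */
		switch_entities(&taskP, &taskN);
		return 0;
	}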

> You are right, we wouldn't get that with just calling pick_task_fair(). We
> did not have this in v8 series either though.
>
> Also, if the task is a cookied task, then I think you are doing more work
> with your patch due to the extra put_prev_task().

Yes, but only if you mix cookie tasks with non-cookie tasks and schedule
two non-cookie tasks back-to-back. I don't think we care overly much
about that.

I think it makes more sense to ensure that if you have core-sched
enabled on your machine and have a (core-aligned) partition with
non-cookie tasks, scheduling works as 'normal' as possible.

> > What was wrong/not working with the below?
>
> Other than the new idle balancing, IIRC it was also causing instability.
> Maybe we can consider this optimization in the future if that's Ok with
> you?

Hurmph.. you don't happen to remember what went splat?