2020-11-17 23:25:29

by Joel Fernandes

[permalink] [raw]
Subject: [PATCH -tip 10/32] sched: Fix priority inversion of cookied task with sibling

From: Peter Zijlstra <[email protected]>

The rationale is as follows. In the core-wide pick logic, even if
need_sync == false, we need to go look at other CPUs (non-local CPUs) to
see if they could be running RT.

Say the RQs in a particular core look like this:
Let CFS1 and CFS2 be 2 tagged CFS tags. Let RT1 be an untagged RT task.

rq0 rq1
CFS1 (tagged) RT1 (not tag)
CFS2 (tagged)

Say schedule() runs on rq0. Now, it will enter the above loop and
pick_task(RT) will return NULL for 'p'. It will enter the above if() block
and see that need_sync == false and will skip RT entirely.

The end result of the selection will be (say prio(CFS1) > prio(CFS2)):
rq0 rq1
CFS1 IDLE

When it should have selected:
rq0 r1
IDLE RT

Joel saw this issue on real-world usecases in ChromeOS where an RT task
gets constantly force-idled and breaks RT. Lets cure it.

NOTE: This problem will be fixed differently in a later patch. It just
kept here for reference purposes about this issue, and to make
applying later patches easier.

Reported-by: Joel Fernandes (Google) <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
kernel/sched/core.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4ee4902c2cf5..53af817740c0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5195,6 +5195,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
need_sync = !!rq->core->core_cookie;

/* reset state */
+reset:
rq->core->core_cookie = 0UL;
if (rq->core->core_forceidle) {
need_sync = true;
@@ -5242,14 +5243,8 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
/*
* If there weren't no cookies; we don't need to
* bother with the other siblings.
- * If the rest of the core is not running a tagged
- * task, i.e. need_sync == 0, and the current CPU
- * which called into the schedule() loop does not
- * have any tasks for this class, skip selecting for
- * other siblings since there's no point. We don't skip
- * for RT/DL because that could make CFS force-idle RT.
*/
- if (i == cpu && !need_sync && class == &fair_sched_class)
+ if (i == cpu && !need_sync)
goto next_class;

continue;
@@ -5259,7 +5254,20 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
* Optimize the 'normal' case where there aren't any
* cookies and we don't need to sync up.
*/
- if (i == cpu && !need_sync && !p->core_cookie) {
+ if (i == cpu && !need_sync) {
+ if (p->core_cookie) {
+ /*
+ * This optimization is only valid as
+ * long as there are no cookies
+ * involved. We may have skipped
+ * non-empty higher priority classes on
+ * siblings, which are empty on this
+ * CPU, so start over.
+ */
+ need_sync = true;
+ goto reset;
+ }
+
next = p;
goto done;
}
@@ -5299,7 +5307,6 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
*/
need_sync = true;
}
-
}
}
next_class:;
--
2.29.2.299.gdc1121823c-goog


2020-11-22 22:43:08

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH -tip 10/32] sched: Fix priority inversion of cookied task with sibling

On Tue, Nov 17, 2020 at 06:19:40PM -0500, Joel Fernandes (Google) wrote:
> From: Peter Zijlstra <[email protected]>
>
> The rationale is as follows. In the core-wide pick logic, even if
> need_sync == false, we need to go look at other CPUs (non-local CPUs) to
> see if they could be running RT.
>
> Say the RQs in a particular core look like this:
> Let CFS1 and CFS2 be 2 tagged CFS tags. Let RT1 be an untagged RT task.
>
> rq0 rq1
> CFS1 (tagged) RT1 (not tag)
> CFS2 (tagged)
>
> Say schedule() runs on rq0. Now, it will enter the above loop and
> pick_task(RT) will return NULL for 'p'. It will enter the above if() block
> and see that need_sync == false and will skip RT entirely.
>
> The end result of the selection will be (say prio(CFS1) > prio(CFS2)):
> rq0 rq1
> CFS1 IDLE
>
> When it should have selected:
> rq0 r1
> IDLE RT
>
> Joel saw this issue on real-world usecases in ChromeOS where an RT task
> gets constantly force-idled and breaks RT. Lets cure it.
>
> NOTE: This problem will be fixed differently in a later patch. It just
> kept here for reference purposes about this issue, and to make
> applying later patches easier.
>

The changelog is hard to read, it refers to above if(), whereas there
is no code snippet in the changelog. Also, from what I can see following
the series, p->core_cookie is not yet set anywhere (unless I missed it),
so fixing it in here did not make sense just reading the series.

Balbir

2020-11-24 18:33:22

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH -tip 10/32] sched: Fix priority inversion of cookied task with sibling

On Mon, Nov 23, 2020 at 09:41:23AM +1100, Balbir Singh wrote:
> On Tue, Nov 17, 2020 at 06:19:40PM -0500, Joel Fernandes (Google) wrote:
> > From: Peter Zijlstra <[email protected]>
> >
> > The rationale is as follows. In the core-wide pick logic, even if
> > need_sync == false, we need to go look at other CPUs (non-local CPUs) to
> > see if they could be running RT.
> >
> > Say the RQs in a particular core look like this:
> > Let CFS1 and CFS2 be 2 tagged CFS tags. Let RT1 be an untagged RT task.
> >
> > rq0 rq1
> > CFS1 (tagged) RT1 (not tag)
> > CFS2 (tagged)
> >
> > Say schedule() runs on rq0. Now, it will enter the above loop and
> > pick_task(RT) will return NULL for 'p'. It will enter the above if() block
> > and see that need_sync == false and will skip RT entirely.
> >
> > The end result of the selection will be (say prio(CFS1) > prio(CFS2)):
> > rq0 rq1
> > CFS1 IDLE
> >
> > When it should have selected:
> > rq0 r1
> > IDLE RT
> >
> > Joel saw this issue on real-world usecases in ChromeOS where an RT task
> > gets constantly force-idled and breaks RT. Lets cure it.
> >
> > NOTE: This problem will be fixed differently in a later patch. It just
> > kept here for reference purposes about this issue, and to make
> > applying later patches easier.
> >
>
> The changelog is hard to read, it refers to above if(), whereas there
> is no code snippet in the changelog.

Yeah sorry, it comes from this email where I described the issue:
http://lore.kernel.org/r/[email protected]

I corrected the changelog and appended the patch below. Also pushed it to:
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/log/?h=coresched

> Also, from what I can see following
> the series, p->core_cookie is not yet set anywhere (unless I missed it),
> so fixing it in here did not make sense just reading the series.

The interface patches for core_cookie are added later, that's how it is. The
infrastructure comes first here. It would also not make sense to add
interface first as well so I think the current ordering is fine.

---8<-----------------------

From: Peter Zijlstra <[email protected]>
Subject: [PATCH] sched: Fix priority inversion of cookied task with sibling

The rationale is as follows. In the core-wide pick logic, even if
need_sync == false, we need to go look at other CPUs (non-local CPUs) to
see if they could be running RT.

Say the RQs in a particular core look like this:
Let CFS1 and CFS2 be 2 tagged CFS tags. Let RT1 be an untagged RT task.

rq0 rq1
CFS1 (tagged) RT1 (not tag)
CFS2 (tagged)

The end result of the selection will be (say prio(CFS1) > prio(CFS2)):
rq0 rq1
CFS1 IDLE

When it should have selected:
rq0 r1
IDLE RT

Fix this issue by forcing need_sync and restarting the search if a
cookied task was discovered. This will avoid this optimization from
making incorrect picks.

Joel saw this issue on real-world usecases in ChromeOS where an RT task
gets constantly force-idled and breaks RT. Lets cure it.

NOTE: This problem will be fixed differently in a later patch. It just
kept here for reference purposes about this issue, and to make
applying later patches easier.

Reported-by: Joel Fernandes (Google) <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
kernel/sched/core.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4ee4902c2cf5..53af817740c0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5195,6 +5195,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
need_sync = !!rq->core->core_cookie;

/* reset state */
+reset:
rq->core->core_cookie = 0UL;
if (rq->core->core_forceidle) {
need_sync = true;
@@ -5242,14 +5243,8 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
/*
* If there weren't no cookies; we don't need to
* bother with the other siblings.
- * If the rest of the core is not running a tagged
- * task, i.e. need_sync == 0, and the current CPU
- * which called into the schedule() loop does not
- * have any tasks for this class, skip selecting for
- * other siblings since there's no point. We don't skip
- * for RT/DL because that could make CFS force-idle RT.
*/
- if (i == cpu && !need_sync && class == &fair_sched_class)
+ if (i == cpu && !need_sync)
goto next_class;

continue;
@@ -5259,7 +5254,20 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
* Optimize the 'normal' case where there aren't any
* cookies and we don't need to sync up.
*/
- if (i == cpu && !need_sync && !p->core_cookie) {
+ if (i == cpu && !need_sync) {
+ if (p->core_cookie) {
+ /*
+ * This optimization is only valid as
+ * long as there are no cookies
+ * involved. We may have skipped
+ * non-empty higher priority classes on
+ * siblings, which are empty on this
+ * CPU, so start over.
+ */
+ need_sync = true;
+ goto reset;
+ }
+
next = p;
goto done;
}
@@ -5299,7 +5307,6 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
*/
need_sync = true;
}
-
}
}
next_class:;
--
2.29.2.454.gaff20da3a2-goog

2020-11-26 08:23:15

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH -tip 10/32] sched: Fix priority inversion of cookied task with sibling

On Tue, Nov 24, 2020 at 01:30:38PM -0500, Joel Fernandes wrote:
> On Mon, Nov 23, 2020 at 09:41:23AM +1100, Balbir Singh wrote:
> > On Tue, Nov 17, 2020 at 06:19:40PM -0500, Joel Fernandes (Google) wrote:
> > > From: Peter Zijlstra <[email protected]>
> > >
> > > The rationale is as follows. In the core-wide pick logic, even if
> > > need_sync == false, we need to go look at other CPUs (non-local CPUs) to
> > > see if they could be running RT.
> > >
> > > Say the RQs in a particular core look like this:
> > > Let CFS1 and CFS2 be 2 tagged CFS tags. Let RT1 be an untagged RT task.
> > >
> > > rq0 rq1
> > > CFS1 (tagged) RT1 (not tag)
> > > CFS2 (tagged)
> > >
> > > Say schedule() runs on rq0. Now, it will enter the above loop and
> > > pick_task(RT) will return NULL for 'p'. It will enter the above if() block
> > > and see that need_sync == false and will skip RT entirely.
> > >
> > > The end result of the selection will be (say prio(CFS1) > prio(CFS2)):
> > > rq0 rq1
> > > CFS1 IDLE
> > >
> > > When it should have selected:
> > > rq0 r1
> > > IDLE RT
> > >
> > > Joel saw this issue on real-world usecases in ChromeOS where an RT task
> > > gets constantly force-idled and breaks RT. Lets cure it.
> > >
> > > NOTE: This problem will be fixed differently in a later patch. It just
> > > kept here for reference purposes about this issue, and to make
> > > applying later patches easier.
> > >
> >
> > The changelog is hard to read, it refers to above if(), whereas there
> > is no code snippet in the changelog.
>
> Yeah sorry, it comes from this email where I described the issue:
> http://lore.kernel.org/r/[email protected]
>
> I corrected the changelog and appended the patch below. Also pushed it to:
> https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/log/?h=coresched
>
> > Also, from what I can see following
> > the series, p->core_cookie is not yet set anywhere (unless I missed it),
> > so fixing it in here did not make sense just reading the series.
>
> The interface patches for core_cookie are added later, that's how it is. The
> infrastructure comes first here. It would also not make sense to add
> interface first as well so I think the current ordering is fine.
>

Some comments below to help make the code easier to understand

> ---8<-----------------------
>
> From: Peter Zijlstra <[email protected]>
> Subject: [PATCH] sched: Fix priority inversion of cookied task with sibling
>
> The rationale is as follows. In the core-wide pick logic, even if
> need_sync == false, we need to go look at other CPUs (non-local CPUs) to
> see if they could be running RT.
>
> Say the RQs in a particular core look like this:
> Let CFS1 and CFS2 be 2 tagged CFS tags. Let RT1 be an untagged RT task.
>
> rq0 rq1
> CFS1 (tagged) RT1 (not tag)
> CFS2 (tagged)
>
> The end result of the selection will be (say prio(CFS1) > prio(CFS2)):
> rq0 rq1
> CFS1 IDLE
>
> When it should have selected:
> rq0 r1
> IDLE RT
>
> Fix this issue by forcing need_sync and restarting the search if a
> cookied task was discovered. This will avoid this optimization from
> making incorrect picks.
>
> Joel saw this issue on real-world usecases in ChromeOS where an RT task
> gets constantly force-idled and breaks RT. Lets cure it.
>
> NOTE: This problem will be fixed differently in a later patch. It just
> kept here for reference purposes about this issue, and to make
> applying later patches easier.
>
> Reported-by: Joel Fernandes (Google) <[email protected]>
> Signed-off-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Joel Fernandes (Google) <[email protected]>
> ---
> kernel/sched/core.c | 25 ++++++++++++++++---------
> 1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 4ee4902c2cf5..53af817740c0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5195,6 +5195,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> need_sync = !!rq->core->core_cookie;
>
> /* reset state */
> +reset:
> rq->core->core_cookie = 0UL;
> if (rq->core->core_forceidle) {
> need_sync = true;
> @@ -5242,14 +5243,8 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> /*
> * If there weren't no cookies; we don't need to
> * bother with the other siblings.
> - * If the rest of the core is not running a tagged
> - * task, i.e. need_sync == 0, and the current CPU
> - * which called into the schedule() loop does not
> - * have any tasks for this class, skip selecting for
> - * other siblings since there's no point. We don't skip
> - * for RT/DL because that could make CFS force-idle RT.
> */
> - if (i == cpu && !need_sync && class == &fair_sched_class)
> + if (i == cpu && !need_sync)
> goto next_class;
>
> continue;
> @@ -5259,7 +5254,20 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> * Optimize the 'normal' case where there aren't any
> * cookies and we don't need to sync up.
> */
> - if (i == cpu && !need_sync && !p->core_cookie) {
> + if (i == cpu && !need_sync) {
> + if (p->core_cookie) {
> + /*
> + * This optimization is only valid as
> + * long as there are no cookies

This is not entirely true, need_sync is a function of core cookies, so I
think this needs more clarification, it sounds like we enter this when
the core has no cookies, but the task has a core_cookie? The term cookie
is quite overloaded when used in the context of core vs task.

Effectively from what I understand this means that p wants to be
coscheduled, but the core itself is not coscheduling anything at the
moment, so we need to see if we should do a sync and that sync might
cause p to get kicked out and a higher priority class to come in?

Balbir Singh.

2020-11-26 10:54:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH -tip 10/32] sched: Fix priority inversion of cookied task with sibling

On Thu, Nov 26, 2020 at 10:05:19AM +1100, Balbir Singh wrote:
> > @@ -5259,7 +5254,20 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> > * Optimize the 'normal' case where there aren't any
> > * cookies and we don't need to sync up.
> > */
> > - if (i == cpu && !need_sync && !p->core_cookie) {
> > + if (i == cpu && !need_sync) {
> > + if (p->core_cookie) {
> > + /*
> > + * This optimization is only valid as
> > + * long as there are no cookies
>
> This is not entirely true, need_sync is a function of core cookies, so I
> think this needs more clarification, it sounds like we enter this when
> the core has no cookies, but the task has a core_cookie? The term cookie
> is quite overloaded when used in the context of core vs task.

Nah, its the same. So each task gets a cookie to identify the 'group' of
tasks (possibly just itself) it is allowed to share a core with.

When we to core task selection, the core gets assigned the cookie of the
group it will run, same thing.

> Effectively from what I understand this means that p wants to be
> coscheduled, but the core itself is not coscheduling anything at the
> moment, so we need to see if we should do a sync and that sync might
> cause p to get kicked out and a higher priority class to come in?

This whole patch is about eliding code-wide task selection when it is
not required. IOW an optimization.

When there wasn't a core cookie (IOW, the previous task selection wasn't
core wide and limited) and the task we just selected for our own CPU
also didn't have a cookie (IOW it doesn't have to be core-wide) we can
skip the core wide task selection and schedule just this CPU and call it
a day.

The logic was subtly wrong, this patch fixes it. A next patch completely
rewrites it again to make it far simpler again. Don't spend time trying
to understand this patch (unless you're _that_ kind of person ;-) but
instead apply the whole thing and look at the resulting pick_next_task()
function.

2020-11-27 08:47:29

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH -tip 10/32] sched: Fix priority inversion of cookied task with sibling

On Thu, Nov 26, 2020 at 09:29:14AM +0100, Peter Zijlstra wrote:
> On Thu, Nov 26, 2020 at 10:05:19AM +1100, Balbir Singh wrote:
> > > @@ -5259,7 +5254,20 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> > > * Optimize the 'normal' case where there aren't any
> > > * cookies and we don't need to sync up.
> > > */
> > > - if (i == cpu && !need_sync && !p->core_cookie) {
> > > + if (i == cpu && !need_sync) {
> > > + if (p->core_cookie) {
> > > + /*
> > > + * This optimization is only valid as
> > > + * long as there are no cookies
> >
> > This is not entirely true, need_sync is a function of core cookies, so I
> > think this needs more clarification, it sounds like we enter this when
> > the core has no cookies, but the task has a core_cookie? The term cookie
> > is quite overloaded when used in the context of core vs task.
>
> Nah, its the same. So each task gets a cookie to identify the 'group' of
> tasks (possibly just itself) it is allowed to share a core with.
>
> When we to core task selection, the core gets assigned the cookie of the
> group it will run, same thing.
>
> > Effectively from what I understand this means that p wants to be
> > coscheduled, but the core itself is not coscheduling anything at the
> > moment, so we need to see if we should do a sync and that sync might
> > cause p to get kicked out and a higher priority class to come in?
>
> This whole patch is about eliding code-wide task selection when it is
> not required. IOW an optimization.
>
> When there wasn't a core cookie (IOW, the previous task selection wasn't
> core wide and limited) and the task we just selected for our own CPU
> also didn't have a cookie (IOW it doesn't have to be core-wide) we can
> skip the core wide task selection and schedule just this CPU and call it
> a day.
>
> The logic was subtly wrong, this patch fixes it. A next patch completely
> rewrites it again to make it far simpler again. Don't spend time trying
> to understand this patch (unless you're _that_ kind of person ;-) but
> instead apply the whole thing and look at the resulting pick_next_task()
> function.

Thanks, I'll look at the git tree and see what the final outcome looks like.

Balbir Singh.

2020-12-01 22:39:59

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH -tip 10/32] sched: Fix priority inversion of cookied task with sibling

On Thu, Nov 26, 2020 at 10:05:19AM +1100, Balbir Singh wrote:
> On Tue, Nov 24, 2020 at 01:30:38PM -0500, Joel Fernandes wrote:
> > On Mon, Nov 23, 2020 at 09:41:23AM +1100, Balbir Singh wrote:
> > > On Tue, Nov 17, 2020 at 06:19:40PM -0500, Joel Fernandes (Google) wrote:
> > > > From: Peter Zijlstra <[email protected]>
> > > >
> > > > The rationale is as follows. In the core-wide pick logic, even if
> > > > need_sync == false, we need to go look at other CPUs (non-local CPUs) to
> > > > see if they could be running RT.
> > > >
> > > > Say the RQs in a particular core look like this:
> > > > Let CFS1 and CFS2 be 2 tagged CFS tags. Let RT1 be an untagged RT task.
> > > >
> > > > rq0 rq1
> > > > CFS1 (tagged) RT1 (not tag)
> > > > CFS2 (tagged)
> > > >
> > > > Say schedule() runs on rq0. Now, it will enter the above loop and
> > > > pick_task(RT) will return NULL for 'p'. It will enter the above if() block
> > > > and see that need_sync == false and will skip RT entirely.
> > > >
> > > > The end result of the selection will be (say prio(CFS1) > prio(CFS2)):
> > > > rq0 rq1
> > > > CFS1 IDLE
> > > >
> > > > When it should have selected:
> > > > rq0 r1
> > > > IDLE RT
> > > >
> > > > Joel saw this issue on real-world usecases in ChromeOS where an RT task
> > > > gets constantly force-idled and breaks RT. Lets cure it.
> > > >
> > > > NOTE: This problem will be fixed differently in a later patch. It just
> > > > kept here for reference purposes about this issue, and to make
> > > > applying later patches easier.
> > > >
> > >
> > > The changelog is hard to read, it refers to above if(), whereas there
> > > is no code snippet in the changelog.
> >
> > Yeah sorry, it comes from this email where I described the issue:
> > http://lore.kernel.org/r/[email protected]
> >
> > I corrected the changelog and appended the patch below. Also pushed it to:
> > https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/log/?h=coresched
> >
> > > Also, from what I can see following
> > > the series, p->core_cookie is not yet set anywhere (unless I missed it),
> > > so fixing it in here did not make sense just reading the series.
> >
> > The interface patches for core_cookie are added later, that's how it is. The
> > infrastructure comes first here. It would also not make sense to add
> > interface first as well so I think the current ordering is fine.
> >
>
> Some comments below to help make the code easier to understand
>
> > ---8<-----------------------
> >
> > From: Peter Zijlstra <[email protected]>
> > Subject: [PATCH] sched: Fix priority inversion of cookied task with sibling
> >
> > The rationale is as follows. In the core-wide pick logic, even if
> > need_sync == false, we need to go look at other CPUs (non-local CPUs) to
> > see if they could be running RT.
> >
> > Say the RQs in a particular core look like this:
> > Let CFS1 and CFS2 be 2 tagged CFS tags. Let RT1 be an untagged RT task.
> >
> > rq0 rq1
> > CFS1 (tagged) RT1 (not tag)
> > CFS2 (tagged)
> >
> > The end result of the selection will be (say prio(CFS1) > prio(CFS2)):
> > rq0 rq1
> > CFS1 IDLE
> >
> > When it should have selected:
> > rq0 r1
> > IDLE RT
> >
> > Fix this issue by forcing need_sync and restarting the search if a
> > cookied task was discovered. This will avoid this optimization from
> > making incorrect picks.
> >
> > Joel saw this issue on real-world usecases in ChromeOS where an RT task
> > gets constantly force-idled and breaks RT. Lets cure it.
> >
> > NOTE: This problem will be fixed differently in a later patch. It just
> > kept here for reference purposes about this issue, and to make
> > applying later patches easier.
> >
> > Reported-by: Joel Fernandes (Google) <[email protected]>
> > Signed-off-by: Peter Zijlstra <[email protected]>
> > Signed-off-by: Joel Fernandes (Google) <[email protected]>
> > ---
> > kernel/sched/core.c | 25 ++++++++++++++++---------
> > 1 file changed, 16 insertions(+), 9 deletions(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 4ee4902c2cf5..53af817740c0 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -5195,6 +5195,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> > need_sync = !!rq->core->core_cookie;
> >
> > /* reset state */
> > +reset:
> > rq->core->core_cookie = 0UL;
> > if (rq->core->core_forceidle) {
> > need_sync = true;
> > @@ -5242,14 +5243,8 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> > /*
> > * If there weren't no cookies; we don't need to
> > * bother with the other siblings.
> > - * If the rest of the core is not running a tagged
> > - * task, i.e. need_sync == 0, and the current CPU
> > - * which called into the schedule() loop does not
> > - * have any tasks for this class, skip selecting for
> > - * other siblings since there's no point. We don't skip
> > - * for RT/DL because that could make CFS force-idle RT.
> > */
> > - if (i == cpu && !need_sync && class == &fair_sched_class)
> > + if (i == cpu && !need_sync)
> > goto next_class;
> >
> > continue;
> > @@ -5259,7 +5254,20 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> > * Optimize the 'normal' case where there aren't any
> > * cookies and we don't need to sync up.
> > */
> > - if (i == cpu && !need_sync && !p->core_cookie) {
> > + if (i == cpu && !need_sync) {
> > + if (p->core_cookie) {
> > + /*
> > + * This optimization is only valid as
> > + * long as there are no cookies
>
> This is not entirely true, need_sync is a function of core cookies, so I
> think this needs more clarification, it sounds like we enter this when
> the core has no cookies, but the task has a core_cookie? The term cookie
> is quite overloaded when used in the context of core vs task.
>
> Effectively from what I understand this means that p wants to be
> coscheduled, but the core itself is not coscheduling anything at the
> moment, so we need to see if we should do a sync and that sync might
> cause p to get kicked out and a higher priority class to come in?

Yeah so about need_sync, it is basically a flag that says if the HT running
the schedule() loop needs to bother with siblings.

need_sync is true only in following conditions:
- A cookied task is running on any HT on the core.
- Any HT in the core is force idled.

The above code comment you referred to is now reworked. That was for the case
where we discovered during local selection that we found a task with a
cookie so now we have to do a core-wide scan (need_sync = false before but
now it becomes true and we start over). This optimization is done sligtly
differently now, we run ->pick_task() on every class of the local CPU until
we find something, if we find something with a cookie then we do core-wide
selection.

The latest version of this code is now in Peter's branch:
https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/tree/kernel/sched/core.c?id=6288c0a49631ce6b53eeab7021a43e49c4c4d436

- Joel