2020-05-06 14:22:22

by Phil Auld

[permalink] [raw]
Subject: [PATCH] sched/fair: Fix enqueue_task_fair warning some more

sched/fair: Fix enqueue_task_fair warning some more

The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
did not fully resolve the issues with the (rq->tmp_alone_branch !=
&rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case where
the first for_each_sched_entity loop exits due to on_rq, having incompletely
updated the list. In this case the second for_each_sched_entity loop can
further modify se. The later code to fix up the list management fails to do
what is needed because se does not point to the sched_entity which broke out
of the first loop.

Address this issue by saving the se pointer when the first loop exits and
resetting it before doing the fix up, if needed.

Signed-off-by: Phil Auld <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
---
kernel/sched/fair.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 02f323b85b6d..719c996317e3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5432,6 +5432,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
{
struct cfs_rq *cfs_rq;
struct sched_entity *se = &p->se;
+ struct sched_entity *saved_se = NULL;
int idle_h_nr_running = task_has_idle_policy(p);

/*
@@ -5466,6 +5467,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
flags = ENQUEUE_WAKEUP;
}

+ saved_se = se;
for_each_sched_entity(se) {
cfs_rq = cfs_rq_of(se);

@@ -5510,6 +5512,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
* leaf list maintenance, resulting in triggering the assertion
* below.
*/
+ if (saved_se)
+ se = saved_se;
for_each_sched_entity(se) {
cfs_rq = cfs_rq_of(se);

--
2.18.0


Cheers,
Phil


2020-05-06 16:39:20

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix enqueue_task_fair warning some more

Hi Phil,

- reply to all this time

On Wed, 6 May 2020 at 16:18, Phil Auld <[email protected]> wrote:
>
> sched/fair: Fix enqueue_task_fair warning some more
>
> The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> did not fully resolve the issues with the (rq->tmp_alone_branch !=
> &rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case where
> the first for_each_sched_entity loop exits due to on_rq, having incompletely
> updated the list. In this case the second for_each_sched_entity loop can
> further modify se. The later code to fix up the list management fails to do

But for the 2nd for_each_sched_entity, the cfs_rq should already be
in the list, isn't it ?

The third for_each_entity loop is there for the throttled case but is
useless for other case

Could you provide us some details about the use case that creates such
a situation ?

> what is needed because se does not point to the sched_entity which broke out
> of the first loop.
>
> Address this issue by saving the se pointer when the first loop exits and
> resetting it before doing the fix up, if needed.
>
> Signed-off-by: Phil Auld <[email protected]>
> Cc: Peter Zijlstra (Intel) <[email protected]>
> Cc: Vincent Guittot <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Juri Lelli <[email protected]>
> ---
> kernel/sched/fair.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 02f323b85b6d..719c996317e3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5432,6 +5432,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> {
> struct cfs_rq *cfs_rq;
> struct sched_entity *se = &p->se;
> + struct sched_entity *saved_se = NULL;
> int idle_h_nr_running = task_has_idle_policy(p);
>
> /*
> @@ -5466,6 +5467,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> flags = ENQUEUE_WAKEUP;
> }
>
> + saved_se = se;
> for_each_sched_entity(se) {
> cfs_rq = cfs_rq_of(se);
>
> @@ -5510,6 +5512,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> * leaf list maintenance, resulting in triggering the assertion
> * below.
> */
> + if (saved_se)
> + se = saved_se;
> for_each_sched_entity(se) {
> cfs_rq = cfs_rq_of(se);
>
> --
> 2.18.0
>
>
> Cheers,
> Phil
>

2020-05-06 18:09:53

by Phil Auld

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix enqueue_task_fair warning some more

Hi Vincent,

Thanks for taking a look. More below...

On Wed, May 06, 2020 at 06:36:45PM +0200 Vincent Guittot wrote:
> Hi Phil,
>
> - reply to all this time
>
> On Wed, 6 May 2020 at 16:18, Phil Auld <[email protected]> wrote:
> >
> > sched/fair: Fix enqueue_task_fair warning some more
> >
> > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> > did not fully resolve the issues with the (rq->tmp_alone_branch !=
> > &rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case where
> > the first for_each_sched_entity loop exits due to on_rq, having incompletely
> > updated the list. In this case the second for_each_sched_entity loop can
> > further modify se. The later code to fix up the list management fails to do
>
> But for the 2nd for_each_sched_entity, the cfs_rq should already be
> in the list, isn't it ?

No. In this case we hit the parent not on list case in list_add_leaf_cfs_rq
which sets rq-tmp_alone_branch to cfs_rq->leaf_cfs_rq_list which is not
the same. It returns false expecting the parent to be added later.

But then the parent doens't get there because it's on_rq.

>
> The third for_each_entity loop is there for the throttled case but is
> useless for other case
>

There actually is a throttling involved usually. The second loop breaks out
early because one of the parents is throttled. But not before it advances
se at least once.

Then the 3rd loop doesn't fix the tmp_alone_branch because it doesn't start
with the right se.

> Could you provide us some details about the use case that creates such
> a situation ?
>

I admit I had to add trace_printks to get here. Here's what it showed (sorry
for the long lines...)

1) sh-6271 [044] 1271.322317: bprint: enqueue_task_fair: se 0xffffa085e7e30080 on_rq 0 cfs_rq = 0xffffa085e93da200
2) sh-6271 [044] 1271.322320: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e93da200 onlist 0 tmp_a_b = 0xffffa085ef92c868 &rq->l_c_r_l = 0xffffa085ef92c868
3) sh-6271 [044] 1271.322322: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2: parent not onlist Set t_a_branch to 0xffffa085e93da340 rq->l_c_r_l = 0xffffa085ef92c868
4) sh-6271 [044] 1271.322323: bprint: enqueue_task_fair: se 0xffffa085e93d8800 on_rq 1 cfs_rq = 0xffffa085dbfaea00
5) sh-6271 [044] 1271.322324: bprint: enqueue_task_fair: Done enqueues, se=0xffffa085e93d8800, pid=3642
6) sh-6271 [044] 1271.322326: bprint: enqueue_task_fair: update: cfs 0xffffa085e48ce000 throttled, se = 0xffffa085dbfafc00
7) sh-6271 [044] 1271.322326: bprint: enqueue_task_fair: current se = 0xffffa085dbfafc00, orig_se = 0xffffa085e7e30080
8) sh-6271 [044] 1271.322327: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e48ce000 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
9) sh-6271 [044] 1271.322328: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 0; cfs 0xffffa085ef92bf80 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
10) sh-6271 [044] 1271.672599: bprint: enqueue_task_fair: cpu 17: rq->tmp_alone_branch = 0xffffa085e93da340 != &rq->leaf_cfs_rq_list = 0xffffa085ef92c868


lines 1 and 4 are from the first loop in enqueue_task_fair. Line 2 and 3 are from the
first call to list_add_leaf_rq with line 2 being at the start and line 3 showing which
of the 3 cases we hit.

Line 5 is right after the first loop.

Line 6 is the second trip through the 2nd loop and is in the if(throttled) condition.
Line 7 is right below the enqueue_throttle label.

Lines 8 and 9 are from the fixup loop and since onlist is set for both of these it doesn't
do anything. But we've left rq->tmp_alone_branch pointing to the cfs_rq->leaf_cfs_rq_list
from the one call to list_add_leaf_rq that did something and so the cleanup doesn't work.

Based on the comment at the clean up, it looked like it expected the se to be what it was
when the first loop broke not whatever it was left at after the second loop. Could have
been NULL there too I guess but I didn't hit that case.

This is 100% reproducible. And completely gone with the fix. I have a trace showing that.

Does that make more sense?



Cheers,
Phil


> > what is needed because se does not point to the sched_entity which broke out
> > of the first loop.
> >
> > Address this issue by saving the se pointer when the first loop exits and
> > resetting it before doing the fix up, if needed.
> >
> > Signed-off-by: Phil Auld <[email protected]>
> > Cc: Peter Zijlstra (Intel) <[email protected]>
> > Cc: Vincent Guittot <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Juri Lelli <[email protected]>
> > ---
> > kernel/sched/fair.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 02f323b85b6d..719c996317e3 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5432,6 +5432,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > {
> > struct cfs_rq *cfs_rq;
> > struct sched_entity *se = &p->se;
> > + struct sched_entity *saved_se = NULL;
> > int idle_h_nr_running = task_has_idle_policy(p);
> >
> > /*
> > @@ -5466,6 +5467,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > flags = ENQUEUE_WAKEUP;
> > }
> >
> > + saved_se = se;
> > for_each_sched_entity(se) {
> > cfs_rq = cfs_rq_of(se);
> >
> > @@ -5510,6 +5512,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > * leaf list maintenance, resulting in triggering the assertion
> > * below.
> > */
> > + if (saved_se)
> > + se = saved_se;
> > for_each_sched_entity(se) {
> > cfs_rq = cfs_rq_of(se);
> >
> > --
> > 2.18.0
> >
> >
> > Cheers,
> > Phil
> >
>

--

2020-05-07 15:08:41

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix enqueue_task_fair warning some more

Hi Phil,

On Wed, 6 May 2020 at 20:05, Phil Auld <[email protected]> wrote:
>
> Hi Vincent,
>
> Thanks for taking a look. More below...
>
> On Wed, May 06, 2020 at 06:36:45PM +0200 Vincent Guittot wrote:
> > Hi Phil,
> >
> > - reply to all this time
> >
> > On Wed, 6 May 2020 at 16:18, Phil Auld <[email protected]> wrote:
> > >
> > > sched/fair: Fix enqueue_task_fair warning some more
> > >
> > > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> > > did not fully resolve the issues with the (rq->tmp_alone_branch !=
> > > &rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case where
> > > the first for_each_sched_entity loop exits due to on_rq, having incompletely
> > > updated the list. In this case the second for_each_sched_entity loop can
> > > further modify se. The later code to fix up the list management fails to do
> >
> > But for the 2nd for_each_sched_entity, the cfs_rq should already be
> > in the list, isn't it ?
>
> No. In this case we hit the parent not on list case in list_add_leaf_cfs_rq
> which sets rq-tmp_alone_branch to cfs_rq->leaf_cfs_rq_list which is not
> the same. It returns false expecting the parent to be added later.
>
> But then the parent doens't get there because it's on_rq.
>
> >
> > The third for_each_entity loop is there for the throttled case but is
> > useless for other case
> >
>
> There actually is a throttling involved usually. The second loop breaks out
> early because one of the parents is throttled. But not before it advances
> se at least once.

Ok, that's even because of the throttling that the problem occurs

>
> Then the 3rd loop doesn't fix the tmp_alone_branch because it doesn't start
> with the right se.
>
> > Could you provide us some details about the use case that creates such
> > a situation ?
> >
>
> I admit I had to add trace_printks to get here. Here's what it showed (sorry
> for the long lines...)
>
> 1) sh-6271 [044] 1271.322317: bprint: enqueue_task_fair: se 0xffffa085e7e30080 on_rq 0 cfs_rq = 0xffffa085e93da200
> 2) sh-6271 [044] 1271.322320: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e93da200 onlist 0 tmp_a_b = 0xffffa085ef92c868 &rq->l_c_r_l = 0xffffa085ef92c868
> 3) sh-6271 [044] 1271.322322: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2: parent not onlist Set t_a_branch to 0xffffa085e93da340 rq->l_c_r_l = 0xffffa085ef92c868
> 4) sh-6271 [044] 1271.322323: bprint: enqueue_task_fair: se 0xffffa085e93d8800 on_rq 1 cfs_rq = 0xffffa085dbfaea00
> 5) sh-6271 [044] 1271.322324: bprint: enqueue_task_fair: Done enqueues, se=0xffffa085e93d8800, pid=3642
> 6) sh-6271 [044] 1271.322326: bprint: enqueue_task_fair: update: cfs 0xffffa085e48ce000 throttled, se = 0xffffa085dbfafc00
> 7) sh-6271 [044] 1271.322326: bprint: enqueue_task_fair: current se = 0xffffa085dbfafc00, orig_se = 0xffffa085e7e30080
> 8) sh-6271 [044] 1271.322327: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e48ce000 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
> 9) sh-6271 [044] 1271.322328: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 0; cfs 0xffffa085ef92bf80 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
> 10) sh-6271 [044] 1271.672599: bprint: enqueue_task_fair: cpu 17: rq->tmp_alone_branch = 0xffffa085e93da340 != &rq->leaf_cfs_rq_list = 0xffffa085ef92c868
>
>
> lines 1 and 4 are from the first loop in enqueue_task_fair. Line 2 and 3 are from the
> first call to list_add_leaf_rq with line 2 being at the start and line 3 showing which
> of the 3 cases we hit.
>
> Line 5 is right after the first loop.
>
> Line 6 is the second trip through the 2nd loop and is in the if(throttled) condition.
> Line 7 is right below the enqueue_throttle label.
>
> Lines 8 and 9 are from the fixup loop and since onlist is set for both of these it doesn't
> do anything. But we've left rq->tmp_alone_branch pointing to the cfs_rq->leaf_cfs_rq_list
> from the one call to list_add_leaf_rq that did something and so the cleanup doesn't work.
>
> Based on the comment at the clean up, it looked like it expected the se to be what it was
> when the first loop broke not whatever it was left at after the second loop. Could have
> been NULL there too I guess but I didn't hit that case.
>
> This is 100% reproducible. And completely gone with the fix. I have a trace showing that.
>
> Does that make more sense?

Yes, Good catch
And thanks for the detailed explanation.

>
>
>
> Cheers,
> Phil
>
>
> > > what is needed because se does not point to the sched_entity which broke out
> > > of the first loop.
> > >
> > > Address this issue by saving the se pointer when the first loop exits and
> > > resetting it before doing the fix up, if needed.
> > >
> > > Signed-off-by: Phil Auld <[email protected]>
> > > Cc: Peter Zijlstra (Intel) <[email protected]>
> > > Cc: Vincent Guittot <[email protected]>
> > > Cc: Ingo Molnar <[email protected]>
> > > Cc: Juri Lelli <[email protected]>
> > > ---
> > > kernel/sched/fair.c | 4 ++++
> > > 1 file changed, 4 insertions(+)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 02f323b85b6d..719c996317e3 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -5432,6 +5432,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > {
> > > struct cfs_rq *cfs_rq;
> > > struct sched_entity *se = &p->se;
> > > + struct sched_entity *saved_se = NULL;
> > > int idle_h_nr_running = task_has_idle_policy(p);
> > >
> > > /*
> > > @@ -5466,6 +5467,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > flags = ENQUEUE_WAKEUP;
> > > }
> > >
> > > + saved_se = se;

TBH, I don't like saving and going back to the saved se and loop one
more time on them

> > > for_each_sched_entity(se) {
> > > cfs_rq = cfs_rq_of(se);

Could you add something like below in the 2nd loop instead ?

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5486,6 +5486,13 @@ enqueue_task_fair(struct rq *rq, struct
task_struct *p, int flags)
/* end evaluation on encountering a throttled cfs_rq */
if (cfs_rq_throttled(cfs_rq))
goto enqueue_throttle;
+
+ /*
+ * One parent has been throttled and cfs_rq removed from the
+ * list. Add it back to not break the leaf list.
+ */
+ if (throttled_hierarchy(cfs_rq))
+ list_add_leaf_cfs_rq(cfs_rq);
}

enqueue_throttle:

> > >
> > > @@ -5510,6 +5512,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > * leaf list maintenance, resulting in triggering the assertion
> > > * below.
> > > */
> > > + if (saved_se)
> > > + se = saved_se;
> > > for_each_sched_entity(se) {
> > > cfs_rq = cfs_rq_of(se);
> > >
> > > --
> > > 2.18.0
> > >
> > >
> > > Cheers,
> > > Phil
> > >
> >
>
> --
>

2020-05-07 15:20:02

by Phil Auld

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix enqueue_task_fair warning some more

Hi Vincent,

On Thu, May 07, 2020 at 05:06:29PM +0200 Vincent Guittot wrote:
> Hi Phil,
>
> On Wed, 6 May 2020 at 20:05, Phil Auld <[email protected]> wrote:
> >
> > Hi Vincent,
> >
> > Thanks for taking a look. More below...
> >
> > On Wed, May 06, 2020 at 06:36:45PM +0200 Vincent Guittot wrote:
> > > Hi Phil,
> > >
> > > - reply to all this time
> > >
> > > On Wed, 6 May 2020 at 16:18, Phil Auld <[email protected]> wrote:
> > > >
> > > > sched/fair: Fix enqueue_task_fair warning some more
> > > >
> > > > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> > > > did not fully resolve the issues with the (rq->tmp_alone_branch !=
> > > > &rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case where
> > > > the first for_each_sched_entity loop exits due to on_rq, having incompletely
> > > > updated the list. In this case the second for_each_sched_entity loop can
> > > > further modify se. The later code to fix up the list management fails to do
> > >
> > > But for the 2nd for_each_sched_entity, the cfs_rq should already be
> > > in the list, isn't it ?
> >
> > No. In this case we hit the parent not on list case in list_add_leaf_cfs_rq
> > which sets rq-tmp_alone_branch to cfs_rq->leaf_cfs_rq_list which is not
> > the same. It returns false expecting the parent to be added later.
> >
> > But then the parent doens't get there because it's on_rq.
> >
> > >
> > > The third for_each_entity loop is there for the throttled case but is
> > > useless for other case
> > >
> >
> > There actually is a throttling involved usually. The second loop breaks out
> > early because one of the parents is throttled. But not before it advances
> > se at least once.
>
> Ok, that's even because of the throttling that the problem occurs
>
> >
> > Then the 3rd loop doesn't fix the tmp_alone_branch because it doesn't start
> > with the right se.
> >
> > > Could you provide us some details about the use case that creates such
> > > a situation ?
> > >
> >
> > I admit I had to add trace_printks to get here. Here's what it showed (sorry
> > for the long lines...)
> >
> > 1) sh-6271 [044] 1271.322317: bprint: enqueue_task_fair: se 0xffffa085e7e30080 on_rq 0 cfs_rq = 0xffffa085e93da200
> > 2) sh-6271 [044] 1271.322320: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e93da200 onlist 0 tmp_a_b = 0xffffa085ef92c868 &rq->l_c_r_l = 0xffffa085ef92c868
> > 3) sh-6271 [044] 1271.322322: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2: parent not onlist Set t_a_branch to 0xffffa085e93da340 rq->l_c_r_l = 0xffffa085ef92c868
> > 4) sh-6271 [044] 1271.322323: bprint: enqueue_task_fair: se 0xffffa085e93d8800 on_rq 1 cfs_rq = 0xffffa085dbfaea00
> > 5) sh-6271 [044] 1271.322324: bprint: enqueue_task_fair: Done enqueues, se=0xffffa085e93d8800, pid=3642
> > 6) sh-6271 [044] 1271.322326: bprint: enqueue_task_fair: update: cfs 0xffffa085e48ce000 throttled, se = 0xffffa085dbfafc00
> > 7) sh-6271 [044] 1271.322326: bprint: enqueue_task_fair: current se = 0xffffa085dbfafc00, orig_se = 0xffffa085e7e30080
> > 8) sh-6271 [044] 1271.322327: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e48ce000 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
> > 9) sh-6271 [044] 1271.322328: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 0; cfs 0xffffa085ef92bf80 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
> > 10) sh-6271 [044] 1271.672599: bprint: enqueue_task_fair: cpu 17: rq->tmp_alone_branch = 0xffffa085e93da340 != &rq->leaf_cfs_rq_list = 0xffffa085ef92c868
> >
> >
> > lines 1 and 4 are from the first loop in enqueue_task_fair. Line 2 and 3 are from the
> > first call to list_add_leaf_rq with line 2 being at the start and line 3 showing which
> > of the 3 cases we hit.
> >
> > Line 5 is right after the first loop.
> >
> > Line 6 is the second trip through the 2nd loop and is in the if(throttled) condition.
> > Line 7 is right below the enqueue_throttle label.
> >
> > Lines 8 and 9 are from the fixup loop and since onlist is set for both of these it doesn't
> > do anything. But we've left rq->tmp_alone_branch pointing to the cfs_rq->leaf_cfs_rq_list
> > from the one call to list_add_leaf_rq that did something and so the cleanup doesn't work.
> >
> > Based on the comment at the clean up, it looked like it expected the se to be what it was
> > when the first loop broke not whatever it was left at after the second loop. Could have
> > been NULL there too I guess but I didn't hit that case.
> >
> > This is 100% reproducible. And completely gone with the fix. I have a trace showing that.
> >
> > Does that make more sense?
>
> Yes, Good catch
> And thanks for the detailed explanation.

No problem. I had to see it all myself anyway :)


>
> >
> >
> >
> > Cheers,
> > Phil
> >
> >
> > > > what is needed because se does not point to the sched_entity which broke out
> > > > of the first loop.
> > > >
> > > > Address this issue by saving the se pointer when the first loop exits and
> > > > resetting it before doing the fix up, if needed.
> > > >
> > > > Signed-off-by: Phil Auld <[email protected]>
> > > > Cc: Peter Zijlstra (Intel) <[email protected]>
> > > > Cc: Vincent Guittot <[email protected]>
> > > > Cc: Ingo Molnar <[email protected]>
> > > > Cc: Juri Lelli <[email protected]>
> > > > ---
> > > > kernel/sched/fair.c | 4 ++++
> > > > 1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index 02f323b85b6d..719c996317e3 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -5432,6 +5432,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > > {
> > > > struct cfs_rq *cfs_rq;
> > > > struct sched_entity *se = &p->se;
> > > > + struct sched_entity *saved_se = NULL;
> > > > int idle_h_nr_running = task_has_idle_policy(p);
> > > >
> > > > /*
> > > > @@ -5466,6 +5467,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > > flags = ENQUEUE_WAKEUP;
> > > > }
> > > >
> > > > + saved_se = se;
>
> TBH, I don't like saving and going back to the saved se and loop one
> more time on them
>
> > > > for_each_sched_entity(se) {
> > > > cfs_rq = cfs_rq_of(se);
>
> Could you add something like below in the 2nd loop instead ?

I'll give it a try this way and let you know. I had to give that machine
away. I'll get another and make sure it hits and then we'll see.


Thanks,
Phil



>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5486,6 +5486,13 @@ enqueue_task_fair(struct rq *rq, struct
> task_struct *p, int flags)
> /* end evaluation on encountering a throttled cfs_rq */
> if (cfs_rq_throttled(cfs_rq))
> goto enqueue_throttle;
> +
> + /*
> + * One parent has been throttled and cfs_rq removed from the
> + * list. Add it back to not break the leaf list.
> + */
> + if (throttled_hierarchy(cfs_rq))
> + list_add_leaf_cfs_rq(cfs_rq);
> }
>
> enqueue_throttle:
>
> > > >
> > > > @@ -5510,6 +5512,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > > * leaf list maintenance, resulting in triggering the assertion
> > > > * below.
> > > > */
> > > > + if (saved_se)
> > > > + se = saved_se;
> > > > for_each_sched_entity(se) {
> > > > cfs_rq = cfs_rq_of(se);
> > > >
> > > > --
> > > > 2.18.0
> > > >
> > > >
> > > > Cheers,
> > > > Phil
> > > >
> > >
> >
> > --
> >
>

--

2020-05-07 18:06:50

by Phil Auld

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix enqueue_task_fair warning some more

Hi Vincent,

On Thu, May 07, 2020 at 05:06:29PM +0200 Vincent Guittot wrote:
> Hi Phil,
>
> On Wed, 6 May 2020 at 20:05, Phil Auld <[email protected]> wrote:
> >
> > Hi Vincent,
> >
> > Thanks for taking a look. More below...
> >
> > On Wed, May 06, 2020 at 06:36:45PM +0200 Vincent Guittot wrote:
> > > Hi Phil,
> > >
> > > - reply to all this time
> > >
> > > On Wed, 6 May 2020 at 16:18, Phil Auld <[email protected]> wrote:
> > > >
> > > > sched/fair: Fix enqueue_task_fair warning some more
> > > >
> > > > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> > > > did not fully resolve the issues with the (rq->tmp_alone_branch !=
> > > > &rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case where
> > > > the first for_each_sched_entity loop exits due to on_rq, having incompletely
> > > > updated the list. In this case the second for_each_sched_entity loop can
> > > > further modify se. The later code to fix up the list management fails to do
> > >
> > > But for the 2nd for_each_sched_entity, the cfs_rq should already be
> > > in the list, isn't it ?
> >
> > No. In this case we hit the parent not on list case in list_add_leaf_cfs_rq
> > which sets rq-tmp_alone_branch to cfs_rq->leaf_cfs_rq_list which is not
> > the same. It returns false expecting the parent to be added later.
> >
> > But then the parent doens't get there because it's on_rq.
> >
> > >
> > > The third for_each_entity loop is there for the throttled case but is
> > > useless for other case
> > >
> >
> > There actually is a throttling involved usually. The second loop breaks out
> > early because one of the parents is throttled. But not before it advances
> > se at least once.
>
> Ok, that's even because of the throttling that the problem occurs
>
> >
> > Then the 3rd loop doesn't fix the tmp_alone_branch because it doesn't start
> > with the right se.
> >
> > > Could you provide us some details about the use case that creates such
> > > a situation ?
> > >
> >
> > I admit I had to add trace_printks to get here. Here's what it showed (sorry
> > for the long lines...)
> >
> > 1) sh-6271 [044] 1271.322317: bprint: enqueue_task_fair: se 0xffffa085e7e30080 on_rq 0 cfs_rq = 0xffffa085e93da200
> > 2) sh-6271 [044] 1271.322320: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e93da200 onlist 0 tmp_a_b = 0xffffa085ef92c868 &rq->l_c_r_l = 0xffffa085ef92c868
> > 3) sh-6271 [044] 1271.322322: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2: parent not onlist Set t_a_branch to 0xffffa085e93da340 rq->l_c_r_l = 0xffffa085ef92c868
> > 4) sh-6271 [044] 1271.322323: bprint: enqueue_task_fair: se 0xffffa085e93d8800 on_rq 1 cfs_rq = 0xffffa085dbfaea00
> > 5) sh-6271 [044] 1271.322324: bprint: enqueue_task_fair: Done enqueues, se=0xffffa085e93d8800, pid=3642
> > 6) sh-6271 [044] 1271.322326: bprint: enqueue_task_fair: update: cfs 0xffffa085e48ce000 throttled, se = 0xffffa085dbfafc00
> > 7) sh-6271 [044] 1271.322326: bprint: enqueue_task_fair: current se = 0xffffa085dbfafc00, orig_se = 0xffffa085e7e30080
> > 8) sh-6271 [044] 1271.322327: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e48ce000 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
> > 9) sh-6271 [044] 1271.322328: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 0; cfs 0xffffa085ef92bf80 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
> > 10) sh-6271 [044] 1271.672599: bprint: enqueue_task_fair: cpu 17: rq->tmp_alone_branch = 0xffffa085e93da340 != &rq->leaf_cfs_rq_list = 0xffffa085ef92c868
> >
> >
> > lines 1 and 4 are from the first loop in enqueue_task_fair. Line 2 and 3 are from the
> > first call to list_add_leaf_rq with line 2 being at the start and line 3 showing which
> > of the 3 cases we hit.
> >
> > Line 5 is right after the first loop.
> >
> > Line 6 is the second trip through the 2nd loop and is in the if(throttled) condition.
> > Line 7 is right below the enqueue_throttle label.
> >
> > Lines 8 and 9 are from the fixup loop and since onlist is set for both of these it doesn't
> > do anything. But we've left rq->tmp_alone_branch pointing to the cfs_rq->leaf_cfs_rq_list
> > from the one call to list_add_leaf_rq that did something and so the cleanup doesn't work.
> >
> > Based on the comment at the clean up, it looked like it expected the se to be what it was
> > when the first loop broke not whatever it was left at after the second loop. Could have
> > been NULL there too I guess but I didn't hit that case.
> >
> > This is 100% reproducible. And completely gone with the fix. I have a trace showing that.
> >
> > Does that make more sense?
>
> Yes, Good catch
> And thanks for the detailed explanation.
>
> >
> >
> >
> > Cheers,
> > Phil
> >
> >
> > > > what is needed because se does not point to the sched_entity which broke out
> > > > of the first loop.
> > > >
> > > > Address this issue by saving the se pointer when the first loop exits and
> > > > resetting it before doing the fix up, if needed.
> > > >
> > > > Signed-off-by: Phil Auld <[email protected]>
> > > > Cc: Peter Zijlstra (Intel) <[email protected]>
> > > > Cc: Vincent Guittot <[email protected]>
> > > > Cc: Ingo Molnar <[email protected]>
> > > > Cc: Juri Lelli <[email protected]>
> > > > ---
> > > > kernel/sched/fair.c | 4 ++++
> > > > 1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index 02f323b85b6d..719c996317e3 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -5432,6 +5432,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > > {
> > > > struct cfs_rq *cfs_rq;
> > > > struct sched_entity *se = &p->se;
> > > > + struct sched_entity *saved_se = NULL;
> > > > int idle_h_nr_running = task_has_idle_policy(p);
> > > >
> > > > /*
> > > > @@ -5466,6 +5467,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > > flags = ENQUEUE_WAKEUP;
> > > > }
> > > >
> > > > + saved_se = se;
>
> TBH, I don't like saving and going back to the saved se and loop one
> more time on them
>
> > > > for_each_sched_entity(se) {
> > > > cfs_rq = cfs_rq_of(se);
>
> Could you add something like below in the 2nd loop instead ?

This one solves the problem as well with no other visible issues.

Do you want to spin it into a real patch?

You can have my {reported/tested}-by or whatever you need.


Cheers,
Phil

>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5486,6 +5486,13 @@ enqueue_task_fair(struct rq *rq, struct
> task_struct *p, int flags)
> /* end evaluation on encountering a throttled cfs_rq */
> if (cfs_rq_throttled(cfs_rq))
> goto enqueue_throttle;
> +
> + /*
> + * One parent has been throttled and cfs_rq removed from the
> + * list. Add it back to not break the leaf list.
> + */
> + if (throttled_hierarchy(cfs_rq))
> + list_add_leaf_cfs_rq(cfs_rq);
> }
>
> enqueue_throttle:
>
> > > >
> > > > @@ -5510,6 +5512,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > > * leaf list maintenance, resulting in triggering the assertion
> > > > * below.
> > > > */
> > > > + if (saved_se)
> > > > + se = saved_se;
> > > > for_each_sched_entity(se) {
> > > > cfs_rq = cfs_rq_of(se);
> > > >
> > > > --
> > > > 2.18.0
> > > >
> > > >
> > > > Cheers,
> > > > Phil
> > > >
> > >
> >
> > --
> >
>

--

2020-05-07 18:23:27

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix enqueue_task_fair warning some more

On Thu, 7 May 2020 at 20:04, Phil Auld <[email protected]> wrote:
>
> Hi Vincent,
>
> On Thu, May 07, 2020 at 05:06:29PM +0200 Vincent Guittot wrote:
> > Hi Phil,
> >
> > On Wed, 6 May 2020 at 20:05, Phil Auld <[email protected]> wrote:
> > >
> > > Hi Vincent,
> > >
> > > Thanks for taking a look. More below...
> > >
> > > On Wed, May 06, 2020 at 06:36:45PM +0200 Vincent Guittot wrote:
> > > > Hi Phil,
> > > >
> > > > - reply to all this time
> > > >
> > > > On Wed, 6 May 2020 at 16:18, Phil Auld <[email protected]> wrote:
> > > > >
> > > > > sched/fair: Fix enqueue_task_fair warning some more
> > > > >
> > > > > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> > > > > did not fully resolve the issues with the (rq->tmp_alone_branch !=
> > > > > &rq->leaf_cfs_rq_list) warning in enqueue_task_fair. There is a case where
> > > > > the first for_each_sched_entity loop exits due to on_rq, having incompletely
> > > > > updated the list. In this case the second for_each_sched_entity loop can
> > > > > further modify se. The later code to fix up the list management fails to do
> > > >
> > > > But for the 2nd for_each_sched_entity, the cfs_rq should already be
> > > > in the list, isn't it ?
> > >
> > > No. In this case we hit the parent not on list case in list_add_leaf_cfs_rq
> > > which sets rq-tmp_alone_branch to cfs_rq->leaf_cfs_rq_list which is not
> > > the same. It returns false expecting the parent to be added later.
> > >
> > > But then the parent doens't get there because it's on_rq.
> > >
> > > >
> > > > The third for_each_entity loop is there for the throttled case but is
> > > > useless for other case
> > > >
> > >
> > > There actually is a throttling involved usually. The second loop breaks out
> > > early because one of the parents is throttled. But not before it advances
> > > se at least once.
> >
> > Ok, that's even because of the throttling that the problem occurs
> >
> > >
> > > Then the 3rd loop doesn't fix the tmp_alone_branch because it doesn't start
> > > with the right se.
> > >
> > > > Could you provide us some details about the use case that creates such
> > > > a situation ?
> > > >
> > >
> > > I admit I had to add trace_printks to get here. Here's what it showed (sorry
> > > for the long lines...)
> > >
> > > 1) sh-6271 [044] 1271.322317: bprint: enqueue_task_fair: se 0xffffa085e7e30080 on_rq 0 cfs_rq = 0xffffa085e93da200
> > > 2) sh-6271 [044] 1271.322320: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e93da200 onlist 0 tmp_a_b = 0xffffa085ef92c868 &rq->l_c_r_l = 0xffffa085ef92c868
> > > 3) sh-6271 [044] 1271.322322: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2: parent not onlist Set t_a_branch to 0xffffa085e93da340 rq->l_c_r_l = 0xffffa085ef92c868
> > > 4) sh-6271 [044] 1271.322323: bprint: enqueue_task_fair: se 0xffffa085e93d8800 on_rq 1 cfs_rq = 0xffffa085dbfaea00
> > > 5) sh-6271 [044] 1271.322324: bprint: enqueue_task_fair: Done enqueues, se=0xffffa085e93d8800, pid=3642
> > > 6) sh-6271 [044] 1271.322326: bprint: enqueue_task_fair: update: cfs 0xffffa085e48ce000 throttled, se = 0xffffa085dbfafc00
> > > 7) sh-6271 [044] 1271.322326: bprint: enqueue_task_fair: current se = 0xffffa085dbfafc00, orig_se = 0xffffa085e7e30080
> > > 8) sh-6271 [044] 1271.322327: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e48ce000 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
> > > 9) sh-6271 [044] 1271.322328: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 0; cfs 0xffffa085ef92bf80 onlist 1 tmp_a_b = 0xffffa085e93da340 &rq->l_c_r_l = 0xffffa085ef92c868
> > > 10) sh-6271 [044] 1271.672599: bprint: enqueue_task_fair: cpu 17: rq->tmp_alone_branch = 0xffffa085e93da340 != &rq->leaf_cfs_rq_list = 0xffffa085ef92c868
> > >
> > >
> > > lines 1 and 4 are from the first loop in enqueue_task_fair. Line 2 and 3 are from the
> > > first call to list_add_leaf_rq with line 2 being at the start and line 3 showing which
> > > of the 3 cases we hit.
> > >
> > > Line 5 is right after the first loop.
> > >
> > > Line 6 is the second trip through the 2nd loop and is in the if(throttled) condition.
> > > Line 7 is right below the enqueue_throttle label.
> > >
> > > Lines 8 and 9 are from the fixup loop and since onlist is set for both of these it doesn't
> > > do anything. But we've left rq->tmp_alone_branch pointing to the cfs_rq->leaf_cfs_rq_list
> > > from the one call to list_add_leaf_rq that did something and so the cleanup doesn't work.
> > >
> > > Based on the comment at the clean up, it looked like it expected the se to be what it was
> > > when the first loop broke not whatever it was left at after the second loop. Could have
> > > been NULL there too I guess but I didn't hit that case.
> > >
> > > This is 100% reproducible. And completely gone with the fix. I have a trace showing that.
> > >
> > > Does that make more sense?
> >
> > Yes, Good catch
> > And thanks for the detailed explanation.
> >
> > >
> > >
> > >
> > > Cheers,
> > > Phil
> > >
> > >
> > > > > what is needed because se does not point to the sched_entity which broke out
> > > > > of the first loop.
> > > > >
> > > > > Address this issue by saving the se pointer when the first loop exits and
> > > > > resetting it before doing the fix up, if needed.
> > > > >
> > > > > Signed-off-by: Phil Auld <[email protected]>
> > > > > Cc: Peter Zijlstra (Intel) <[email protected]>
> > > > > Cc: Vincent Guittot <[email protected]>
> > > > > Cc: Ingo Molnar <[email protected]>
> > > > > Cc: Juri Lelli <[email protected]>
> > > > > ---
> > > > > kernel/sched/fair.c | 4 ++++
> > > > > 1 file changed, 4 insertions(+)
> > > > >
> > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > > index 02f323b85b6d..719c996317e3 100644
> > > > > --- a/kernel/sched/fair.c
> > > > > +++ b/kernel/sched/fair.c
> > > > > @@ -5432,6 +5432,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > > > {
> > > > > struct cfs_rq *cfs_rq;
> > > > > struct sched_entity *se = &p->se;
> > > > > + struct sched_entity *saved_se = NULL;
> > > > > int idle_h_nr_running = task_has_idle_policy(p);
> > > > >
> > > > > /*
> > > > > @@ -5466,6 +5467,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > > > flags = ENQUEUE_WAKEUP;
> > > > > }
> > > > >
> > > > > + saved_se = se;
> >
> > TBH, I don't like saving and going back to the saved se and loop one
> > more time on them
> >
> > > > > for_each_sched_entity(se) {
> > > > > cfs_rq = cfs_rq_of(se);
> >
> > Could you add something like below in the 2nd loop instead ?
>
> This one solves the problem as well with no other visible issues.
>
> Do you want to spin it into a real patch?

You can respin your patch in a V2 with the proposal.
A suggested-by from me will be enough

Thanks
Vincent
>
> You can have my {reported/tested}-by or whatever you need.
>
>
> Cheers,
> Phil
>
> >
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5486,6 +5486,13 @@ enqueue_task_fair(struct rq *rq, struct
> > task_struct *p, int flags)
> > /* end evaluation on encountering a throttled cfs_rq */
> > if (cfs_rq_throttled(cfs_rq))
> > goto enqueue_throttle;
> > +
> > + /*
> > + * One parent has been throttled and cfs_rq removed from the
> > + * list. Add it back to not break the leaf list.
> > + */
> > + if (throttled_hierarchy(cfs_rq))
> > + list_add_leaf_cfs_rq(cfs_rq);
> > }
> >
> > enqueue_throttle:
> >
> > > > >
> > > > > @@ -5510,6 +5512,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > > > > * leaf list maintenance, resulting in triggering the assertion
> > > > > * below.
> > > > > */
> > > > > + if (saved_se)
> > > > > + se = saved_se;
> > > > > for_each_sched_entity(se) {
> > > > > cfs_rq = cfs_rq_of(se);
> > > > >
> > > > > --
> > > > > 2.18.0
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Phil
> > > > >
> > > >
> > >
> > > --
> > >
> >
>
> --
>

2020-05-07 20:40:16

by Phil Auld

[permalink] [raw]
Subject: Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more

sched/fair: Fix enqueue_task_fair warning some more

The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
did not fully resolve the issues with the rq->tmp_alone_branch !=
&rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
the first for_each_sched_entity loop exits due to on_rq, having incompletely
updated the list. In this case the second for_each_sched_entity loop can
further modify se. The later code to fix up the list management fails to do
what is needed because se no longer points to the sched_entity which broke
out of the first loop.

Address this by calling leaf_add_rq_list if there are throttled parents while
doing the second for_each_sched_entity loop.

Suggested-by: Vincent Guittot <[email protected]>
Signed-off-by: Phil Auld <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
---
kernel/sched/fair.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 02f323b85b6d..c6d57c334d51 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5479,6 +5479,13 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
/* end evaluation on encountering a throttled cfs_rq */
if (cfs_rq_throttled(cfs_rq))
goto enqueue_throttle;
+
+ /*
+ * One parent has been throttled and cfs_rq removed from the
+ * list. Add it back to not break the leaf list.
+ */
+ if (throttled_hierarchy(cfs_rq))
+ list_add_leaf_cfs_rq(cfs_rq);
}

enqueue_throttle:
--
2.18.0

V2 rework the fix based on Vincent's suggestion. Thanks Vincent.


Cheers,
Phil

--

2020-05-11 19:30:29

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more

On Thu, 7 May 2020 at 22:36, Phil Auld <[email protected]> wrote:
>
> sched/fair: Fix enqueue_task_fair warning some more
>
> The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> did not fully resolve the issues with the rq->tmp_alone_branch !=
> &rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
> the first for_each_sched_entity loop exits due to on_rq, having incompletely
> updated the list. In this case the second for_each_sched_entity loop can
> further modify se. The later code to fix up the list management fails to do
> what is needed because se no longer points to the sched_entity which broke
> out of the first loop.
>
> Address this by calling leaf_add_rq_list if there are throttled parents while
> doing the second for_each_sched_entity loop.
>

Fixes: fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)

> Suggested-by: Vincent Guittot <[email protected]>
> Signed-off-by: Phil Auld <[email protected]>
> Cc: Peter Zijlstra (Intel) <[email protected]>
> Cc: Vincent Guittot <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Juri Lelli <[email protected]>

With the Fixes tag and the typo mentioned by Tao

Reviewed-by: Vincent Guittot <[email protected]>

> ---
> kernel/sched/fair.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 02f323b85b6d..c6d57c334d51 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5479,6 +5479,13 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> /* end evaluation on encountering a throttled cfs_rq */
> if (cfs_rq_throttled(cfs_rq))
> goto enqueue_throttle;
> +
> + /*
> + * One parent has been throttled and cfs_rq removed from the
> + * list. Add it back to not break the leaf list.
> + */
> + if (throttled_hierarchy(cfs_rq))
> + list_add_leaf_cfs_rq(cfs_rq);
> }
>
> enqueue_throttle:
> --
> 2.18.0
>
> V2 rework the fix based on Vincent's suggestion. Thanks Vincent.
>
>
> Cheers,
> Phil
>
> --
>

2020-05-11 20:47:04

by Phil Auld

[permalink] [raw]
Subject: Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more

On Mon, May 11, 2020 at 09:25:43PM +0200 Vincent Guittot wrote:
> On Thu, 7 May 2020 at 22:36, Phil Auld <[email protected]> wrote:
> >
> > sched/fair: Fix enqueue_task_fair warning some more
> >
> > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> > did not fully resolve the issues with the rq->tmp_alone_branch !=
> > &rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
> > the first for_each_sched_entity loop exits due to on_rq, having incompletely
> > updated the list. In this case the second for_each_sched_entity loop can
> > further modify se. The later code to fix up the list management fails to do
> > what is needed because se no longer points to the sched_entity which broke
> > out of the first loop.
> >
> > Address this by calling leaf_add_rq_list if there are throttled parents while
> > doing the second for_each_sched_entity loop.
> >
>
> Fixes: fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
>
> > Suggested-by: Vincent Guittot <[email protected]>
> > Signed-off-by: Phil Auld <[email protected]>
> > Cc: Peter Zijlstra (Intel) <[email protected]>
> > Cc: Vincent Guittot <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Juri Lelli <[email protected]>
>
> With the Fixes tag and the typo mentioned by Tao
>

Right, that last line of the commit message should read "list_add_leaf_cfs_rq"


> Reviewed-by: Vincent Guittot <[email protected]>

Thanks Vincent.

Peter/Ingo, do you want me to resend or can you fix when applying?


Thanks,
Phil

>
> > ---
> > kernel/sched/fair.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 02f323b85b6d..c6d57c334d51 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5479,6 +5479,13 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> > /* end evaluation on encountering a throttled cfs_rq */
> > if (cfs_rq_throttled(cfs_rq))
> > goto enqueue_throttle;
> > +
> > + /*
> > + * One parent has been throttled and cfs_rq removed from the
> > + * list. Add it back to not break the leaf list.
> > + */
> > + if (throttled_hierarchy(cfs_rq))
> > + list_add_leaf_cfs_rq(cfs_rq);
> > }
> >
> > enqueue_throttle:
> > --
> > 2.18.0
> >
> > V2 rework the fix based on Vincent's suggestion. Thanks Vincent.
> >
> >
> > Cheers,
> > Phil
> >
> > --
> >
>

--

2020-05-12 09:02:31

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more

On 11/05/2020 22:44, Phil Auld wrote:
> On Mon, May 11, 2020 at 09:25:43PM +0200 Vincent Guittot wrote:
>> On Thu, 7 May 2020 at 22:36, Phil Auld <[email protected]> wrote:
>>>
>>> sched/fair: Fix enqueue_task_fair warning some more
>>>
>>> The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
>>> did not fully resolve the issues with the rq->tmp_alone_branch !=
>>> &rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
>>> the first for_each_sched_entity loop exits due to on_rq, having incompletely
>>> updated the list. In this case the second for_each_sched_entity loop can
>>> further modify se. The later code to fix up the list management fails to do
>>> what is needed because se no longer points to the sched_entity which broke
>>> out of the first loop.
>>>
>>> Address this by calling leaf_add_rq_list if there are throttled parents while
>>> doing the second for_each_sched_entity loop.
>>>
>>
>> Fixes: fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
>>
>>> Suggested-by: Vincent Guittot <[email protected]>
>>> Signed-off-by: Phil Auld <[email protected]>
>>> Cc: Peter Zijlstra (Intel) <[email protected]>
>>> Cc: Vincent Guittot <[email protected]>
>>> Cc: Ingo Molnar <[email protected]>
>>> Cc: Juri Lelli <[email protected]>
>>
>> With the Fixes tag and the typo mentioned by Tao
>>
>
> Right, that last line of the commit message should read "list_add_leaf_cfs_rq"
>
>
>> Reviewed-by: Vincent Guittot <[email protected]>
>
> Thanks Vincent.
>
> Peter/Ingo, do you want me to resend or can you fix when applying?


Maybe you could add that 'the throttled parent was already added back to
the list by a task enqueue in a parallel child hierarchy'.

IMHO, this is part of the description because otherwise the throttled
parent would have connected the branch.

And the not-adding of the intermediate child cfs_rq would have gone
unnoticed.

Reviewed-by: Dietmar Eggemann <[email protected]>

[...]

2020-05-12 13:40:23

by Phil Auld

[permalink] [raw]
Subject: Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more

Hi Dietmar,

On Tue, May 12, 2020 at 11:00:16AM +0200 Dietmar Eggemann wrote:
> On 11/05/2020 22:44, Phil Auld wrote:
> > On Mon, May 11, 2020 at 09:25:43PM +0200 Vincent Guittot wrote:
> >> On Thu, 7 May 2020 at 22:36, Phil Auld <[email protected]> wrote:
> >>>
> >>> sched/fair: Fix enqueue_task_fair warning some more
> >>>
> >>> The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> >>> did not fully resolve the issues with the rq->tmp_alone_branch !=
> >>> &rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
> >>> the first for_each_sched_entity loop exits due to on_rq, having incompletely
> >>> updated the list. In this case the second for_each_sched_entity loop can
> >>> further modify se. The later code to fix up the list management fails to do
> >>> what is needed because se no longer points to the sched_entity which broke
> >>> out of the first loop.
> >>>
> >>> Address this by calling leaf_add_rq_list if there are throttled parents while
> >>> doing the second for_each_sched_entity loop.
> >>>
> >>
> >> Fixes: fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> >>
> >>> Suggested-by: Vincent Guittot <[email protected]>
> >>> Signed-off-by: Phil Auld <[email protected]>
> >>> Cc: Peter Zijlstra (Intel) <[email protected]>
> >>> Cc: Vincent Guittot <[email protected]>
> >>> Cc: Ingo Molnar <[email protected]>
> >>> Cc: Juri Lelli <[email protected]>
> >>
> >> With the Fixes tag and the typo mentioned by Tao
> >>
> >
> > Right, that last line of the commit message should read "list_add_leaf_cfs_rq"
> >
> >
> >> Reviewed-by: Vincent Guittot <[email protected]>
> >
> > Thanks Vincent.
> >
> > Peter/Ingo, do you want me to resend or can you fix when applying?
>
>
> Maybe you could add that 'the throttled parent was already added back to
> the list by a task enqueue in a parallel child hierarchy'.
>
> IMHO, this is part of the description because otherwise the throttled
> parent would have connected the branch.
>
> And the not-adding of the intermediate child cfs_rq would have gone
> unnoticed.

Okay, I'll add that statement. For those curious here are the lines from about
70ms earlier in the trace where the throttled parent (0xffffa085e48ce000) is added
to the list.

bz1738415-test-6264 [005] 1271.315046: sched_waking: comm=bz1738415-test pid=6269 prio=120 target_cpu=005
bz1738415-test-6264 [005] 1271.315048: sched_migrate_task: comm=bz1738415-test pid=6269 prio=120 orig_cpu=5 dest_cpu=17
bz1738415-test-6264 [005] 1271.315050: bprint: enqueue_task_fair: se 0xffffa081e6d7de80 on_rq 0 cfs_rq = 0xffffa085e48ce000
bz1738415-test-6264 [005] 1271.315051: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e48ce000 onlist 0 tmp_a_b = 0xffffa085ef92c868 &rq->l_c_r_l = 0xffffa085ef92c868
bz1738415-test-6264 [005] 1271.315053: bprint: enqueue_entity: Add_leaf_rq: cpu 17: nr_r 2: parent onlist Set tmp_alone_branch to 0xffffa085ef92c868
bz1738415-test-6264 [005] 1271.315053: bprint: enqueue_task_fair: current se = 0xffffa081e6d7de80, orig_se = 0xffffa081e6d7de80
bz1738415-test-6264 [005] 1271.315055: bprint: enqueue_task_fair: Add_leaf_rq: cpu 17: nr_r 2; cfs 0xffffa085e48ce000 onlist 1 tmp_a_b = 0xffffa085ef92c868 &rq->l_c_r_l = 0xffffa085ef92c868
bz1738415-test-6264 [005] 1271.315056: sched_wake_idle_without_ipi: cpu=17

>
> Reviewed-by: Dietmar Eggemann <[email protected]>

Thanks,

Phil


>
> [...]
>

--

2020-05-12 13:54:42

by Phil Auld

[permalink] [raw]
Subject: Re: [PATCH v3] sched/fair: Fix enqueue_task_fair warning some more

sched/fair: Fix enqueue_task_fair warning some more

The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
did not fully resolve the issues with the rq->tmp_alone_branch !=
&rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
the first for_each_sched_entity loop exits due to on_rq, having incompletely
updated the list. In this case the second for_each_sched_entity loop can
further modify se. The later code to fix up the list management fails to do
what is needed because se does not point to the sched_entity which broke out
of the first loop. The list is not fixed up because the throttled parent was
already added back to the list by a task enqueue in a parallel child hierarchy.

Address this by calling list_add_leaf_cfs_rq if there are throttled parents
while doing the second for_each_sched_entity loop.

v3: clean up commit message and add fixes and review tags.

Suggested-by: Vincent Guittot <[email protected]>
Signed-off-by: Phil Auld <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Juri Lelli <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Fixes: fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
---
kernel/sched/fair.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 02f323b85b6d..c6d57c334d51 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5479,6 +5479,13 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
/* end evaluation on encountering a throttled cfs_rq */
if (cfs_rq_throttled(cfs_rq))
goto enqueue_throttle;
+
+ /*
+ * One parent has been throttled and cfs_rq removed from the
+ * list. Add it back to not break the leaf list.
+ */
+ if (throttled_hierarchy(cfs_rq))
+ list_add_leaf_cfs_rq(cfs_rq);
}

enqueue_throttle:
--
2.18.0


--

2020-05-12 14:09:06

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2] sched/fair: Fix enqueue_task_fair warning some more

On Mon, May 11, 2020 at 04:44:10PM -0400, Phil Auld wrote:
> Peter/Ingo, do you want me to resend or can you fix when applying?

I now have this, do I need more edits?

---

Subject: sched/fair: Fix enqueue_task_fair warning some more
From: Phil Auld <[email protected]>
Date: Thu, 7 May 2020 16:36:12 -0400

From: Phil Auld <[email protected]>

sched/fair: Fix enqueue_task_fair warning some more

The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
did not fully resolve the issues with the rq->tmp_alone_branch !=
&rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
the first for_each_sched_entity loop exits due to on_rq, having incompletely
updated the list. In this case the second for_each_sched_entity loop can
further modify se. The later code to fix up the list management fails to do
what is needed because se no longer points to the sched_entity which broke
out of the first loop.

Address this by calling list_add_leaf_cfs_rq if there are throttled
parents while doing the second for_each_sched_entity loop.

Fixes: fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
Suggested-by: Vincent Guittot <[email protected]>
Signed-off-by: Phil Auld <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 7 +++++++
1 file changed, 7 insertions(+)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5479,6 +5479,13 @@ enqueue_task_fair(struct rq *rq, struct
/* end evaluation on encountering a throttled cfs_rq */
if (cfs_rq_throttled(cfs_rq))
goto enqueue_throttle;
+
+ /*
+ * One parent has been throttled and cfs_rq removed from the
+ * list. Add it back to not break the leaf list.
+ */
+ if (throttled_hierarchy(cfs_rq))
+ list_add_leaf_cfs_rq(cfs_rq);
}

enqueue_throttle:

2020-05-12 14:12:57

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3] sched/fair: Fix enqueue_task_fair warning some more

On Tue, May 12, 2020 at 09:52:22AM -0400, Phil Auld wrote:
> sched/fair: Fix enqueue_task_fair warning some more
>
> The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> did not fully resolve the issues with the rq->tmp_alone_branch !=
> &rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
> the first for_each_sched_entity loop exits due to on_rq, having incompletely
> updated the list. In this case the second for_each_sched_entity loop can
> further modify se. The later code to fix up the list management fails to do
> what is needed because se does not point to the sched_entity which broke out
> of the first loop. The list is not fixed up because the throttled parent was
> already added back to the list by a task enqueue in a parallel child hierarchy.
>
> Address this by calling list_add_leaf_cfs_rq if there are throttled parents
> while doing the second for_each_sched_entity loop.
>
> v3: clean up commit message and add fixes and review tags.

Excellent, ignore what I just sent, I now have this one.

2020-05-12 14:29:54

by Phil Auld

[permalink] [raw]
Subject: Re: [PATCH v3] sched/fair: Fix enqueue_task_fair warning some more

On Tue, May 12, 2020 at 04:10:48PM +0200 Peter Zijlstra wrote:
> On Tue, May 12, 2020 at 09:52:22AM -0400, Phil Auld wrote:
> > sched/fair: Fix enqueue_task_fair warning some more
> >
> > The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
> > did not fully resolve the issues with the rq->tmp_alone_branch !=
> > &rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
> > the first for_each_sched_entity loop exits due to on_rq, having incompletely
> > updated the list. In this case the second for_each_sched_entity loop can
> > further modify se. The later code to fix up the list management fails to do
> > what is needed because se does not point to the sched_entity which broke out
> > of the first loop. The list is not fixed up because the throttled parent was
> > already added back to the list by a task enqueue in a parallel child hierarchy.
> >
> > Address this by calling list_add_leaf_cfs_rq if there are throttled parents
> > while doing the second for_each_sched_entity loop.
> >
> > v3: clean up commit message and add fixes and review tags.
>
> Excellent, ignore what I just sent, I now have this one.
>

Thank you!


Cheers,
Phil
--

Subject: [tip: sched/urgent] sched/fair: Fix enqueue_task_fair() warning some more

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: b34cb07dde7c2346dec73d053ce926aeaa087303
Gitweb: https://git.kernel.org/tip/b34cb07dde7c2346dec73d053ce926aeaa087303
Author: Phil Auld <[email protected]>
AuthorDate: Tue, 12 May 2020 09:52:22 -04:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 19 May 2020 20:34:10 +02:00

sched/fair: Fix enqueue_task_fair() warning some more

sched/fair: Fix enqueue_task_fair warning some more

The recent patch, fe61468b2cb (sched/fair: Fix enqueue_task_fair warning)
did not fully resolve the issues with the rq->tmp_alone_branch !=
&rq->leaf_cfs_rq_list warning in enqueue_task_fair. There is a case where
the first for_each_sched_entity loop exits due to on_rq, having incompletely
updated the list. In this case the second for_each_sched_entity loop can
further modify se. The later code to fix up the list management fails to do
what is needed because se does not point to the sched_entity which broke out
of the first loop. The list is not fixed up because the throttled parent was
already added back to the list by a task enqueue in a parallel child hierarchy.

Address this by calling list_add_leaf_cfs_rq if there are throttled parents
while doing the second for_each_sched_entity loop.

Fixes: fe61468b2cb ("sched/fair: Fix enqueue_task_fair warning")
Suggested-by: Vincent Guittot <[email protected]>
Signed-off-by: Phil Auld <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 02f323b..c6d57c3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5479,6 +5479,13 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
/* end evaluation on encountering a throttled cfs_rq */
if (cfs_rq_throttled(cfs_rq))
goto enqueue_throttle;
+
+ /*
+ * One parent has been throttled and cfs_rq removed from the
+ * list. Add it back to not break the leaf list.
+ */
+ if (throttled_hierarchy(cfs_rq))
+ list_add_leaf_cfs_rq(cfs_rq);
}

enqueue_throttle: