2020-05-11 19:16:18

by Vincent Guittot

Subject: [PATCH] sched/fair: fix unthrottle_cfs_rq for leaf_cfs_rq list

Although not exactly identical, unthrottle_cfs_rq() and enqueue_task_fair()
are quite close and follow the same sequence for enqueuing an entity in the
cfs hierarchy. Modify unthrottle_cfs_rq() to use the same pattern as
enqueue_task_fair(). This fixes a problem already faced with the latter and
adds an optimization in the last for_each_sched_entity loop.

Fixes: fe61468b2cb ("sched/fair: Fix enqueue_task_fair warning")
Reported-by: Tao Zhou <[email protected]>
Signed-off-by: Vincent Guittot <[email protected]>
---

This patch applies on top of [email protected]
and fixes a similar problem for unthrottle_cfs_rq().
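
For reference, the resulting flow of unthrottle_cfs_rq() then mirrors
enqueue_task_fair() like this (simplified sketch only, h_nr_running
accounting elided; the diff below is the actual change):

        for_each_sched_entity(se) {
                if (se->on_rq)
                        break;  /* rest of the hierarchy is already queued */
                enqueue_entity(cfs_rq_of(se), se, ENQUEUE_WAKEUP);

                /* end evaluation on encountering a throttled cfs_rq */
                if (cfs_rq_throttled(cfs_rq_of(se)))
                        goto unthrottle_throttle;
        }

        /* continues from the same se: levels that were already enqueued */
        for_each_sched_entity(se) {
                update_load_avg(cfs_rq_of(se), se, UPDATE_TG);
                se_update_runnable(se);

                if (cfs_rq_throttled(cfs_rq_of(se)))
                        goto unthrottle_throttle;

                if (throttled_hierarchy(cfs_rq_of(se)))
                        list_add_leaf_cfs_rq(cfs_rq_of(se));
        }

unthrottle_throttle:
        /* fix up the leaf list; stop once it is known to be consistent */
        for_each_sched_entity(se) {
                if (list_add_leaf_cfs_rq(cfs_rq_of(se)))
                        break;
        }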

kernel/sched/fair.c | 37 ++++++++++++++++++++++++++++---------
1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e2450c2e0747..4b73518aa25c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4803,26 +4803,44 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
idle_task_delta = cfs_rq->idle_h_nr_running;
for_each_sched_entity(se) {
if (se->on_rq)
- enqueue = 0;
+ break;
+ cfs_rq = cfs_rq_of(se);
+ enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);

+ cfs_rq->h_nr_running += task_delta;
+ cfs_rq->idle_h_nr_running += idle_task_delta;
+
+ /* end evaluation on encountering a throttled cfs_rq */
+ if (cfs_rq_throttled(cfs_rq))
+ goto unthrottle_throttle;
+ }
+
+ for_each_sched_entity(se) {
cfs_rq = cfs_rq_of(se);
- if (enqueue) {
- enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
- } else {
- update_load_avg(cfs_rq, se, 0);
- se_update_runnable(se);
- }
+
+ update_load_avg(cfs_rq, se, UPDATE_TG);
+ se_update_runnable(se);

cfs_rq->h_nr_running += task_delta;
cfs_rq->idle_h_nr_running += idle_task_delta;

+
+ /* end evaluation on encountering a throttled cfs_rq */
if (cfs_rq_throttled(cfs_rq))
- break;
+ goto unthrottle_throttle;
+
+ /*
+ * One parent has been throttled and cfs_rq removed from the
+ * list. Add it back to not break the leaf list.
+ */
+ if (throttled_hierarchy(cfs_rq))
+ list_add_leaf_cfs_rq(cfs_rq);
}

if (!se)
add_nr_running(rq, task_delta);

+unthrottle_throttle:
/*
* The cfs_rq_throttled() breaks in the above iteration can result in
* incomplete leaf list maintenance, resulting in triggering the
@@ -4831,7 +4849,8 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
for_each_sched_entity(se) {
cfs_rq = cfs_rq_of(se);

- list_add_leaf_cfs_rq(cfs_rq);
+ if (list_add_leaf_cfs_rq(cfs_rq))
+ break;
}

assert_list_leaf_cfs_rq(rq);
--
2.17.1


2020-05-12 16:09:29

by Phil Auld

Subject: Re: [PATCH] sched/fair: fix unthrottle_cfs_rq for leaf_cfs_rq list

On Mon, May 11, 2020 at 09:13:20PM +0200 Vincent Guittot wrote:
> Although not exactly identical, unthrottle_cfs_rq() and enqueue_task_fair()
> are quite close and follow the same sequence for enqueuing an entity in the
> cfs hierarchy. Modify unthrottle_cfs_rq() to use the same pattern as
> enqueue_task_fair(). This fixes a problem already faced with the latter and
> adds an optimization in the last for_each_sched_entity loop.
>
> Fixes: fe61468b2cb ("sched/fair: Fix enqueue_task_fair warning")
> Reported-by: Tao Zhou <[email protected]>
> Signed-off-by: Vincent Guittot <[email protected]>
> ---
>
> This patch applies on top of [email protected]
> and fixes a similar problem for unthrottle_cfs_rq().
>
> kernel/sched/fair.c | 37 ++++++++++++++++++++++++++++---------
> 1 file changed, 28 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e2450c2e0747..4b73518aa25c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4803,26 +4803,44 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
> idle_task_delta = cfs_rq->idle_h_nr_running;
> for_each_sched_entity(se) {
> if (se->on_rq)
> - enqueue = 0;
> + break;
> + cfs_rq = cfs_rq_of(se);
> + enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
>
> + cfs_rq->h_nr_running += task_delta;
> + cfs_rq->idle_h_nr_running += idle_task_delta;
> +
> + /* end evaluation on encountering a throttled cfs_rq */
> + if (cfs_rq_throttled(cfs_rq))
> + goto unthrottle_throttle;
> + }
> +
> + for_each_sched_entity(se) {
> cfs_rq = cfs_rq_of(se);
> - if (enqueue) {
> - enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
> - } else {
> - update_load_avg(cfs_rq, se, 0);
> - se_update_runnable(se);
> - }
> +
> + update_load_avg(cfs_rq, se, UPDATE_TG);
> + se_update_runnable(se);
>
> cfs_rq->h_nr_running += task_delta;
> cfs_rq->idle_h_nr_running += idle_task_delta;
>
> +
> + /* end evaluation on encountering a throttled cfs_rq */
> if (cfs_rq_throttled(cfs_rq))
> - break;
> + goto unthrottle_throttle;
> +
> + /*
> + * One parent has been throttled and cfs_rq removed from the
> + * list. Add it back to not break the leaf list.
> + */
> + if (throttled_hierarchy(cfs_rq))
> + list_add_leaf_cfs_rq(cfs_rq);
> }
>
> if (!se)
> add_nr_running(rq, task_delta);
>
> +unthrottle_throttle:
> /*
> * The cfs_rq_throttled() breaks in the above iteration can result in
> * incomplete leaf list maintenance, resulting in triggering the
> @@ -4831,7 +4849,8 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
> for_each_sched_entity(se) {
> cfs_rq = cfs_rq_of(se);
>
> - list_add_leaf_cfs_rq(cfs_rq);
> + if (list_add_leaf_cfs_rq(cfs_rq))
> + break;
> }
>
> assert_list_leaf_cfs_rq(rq);
> --
> 2.17.1
>

I ran my reproducer test with this one as well. As expected, I didn't
hit the assert: the first patch already fixed the issue I was seeing,
and I wasn't hitting the assert here anyway.

But I also didn't hit any other issues, new or old.

It makes sense to use the same logic flow here as enqueue_task_fair().

Reviewed-by: Phil Auld <[email protected]>


Cheers,
Phil
--

2020-05-12 19:01:10

by Benjamin Segall

Subject: Re: [PATCH] sched/fair: fix unthrottle_cfs_rq for leaf_cfs_rq list

Vincent Guittot <[email protected]> writes:

> Although not exactly identical, unthrottle_cfs_rq() and enqueue_task_fair()
> are quite close and follow the same sequence for enqueuing an entity in the
> cfs hierarchy. Modify unthrottle_cfs_rq() to use the same pattern as
> enqueue_task_fair(). This fixes a problem already faced with the latter and
> adds an optimization in the last for_each_sched_entity loop.
>
> Fixes: fe61468b2cb ("sched/fair: Fix enqueue_task_fair warning")
> Reported-by: Tao Zhou <[email protected]>
> Signed-off-by: Vincent Guittot <[email protected]>
> ---
>
> This patch applies on top of [email protected]
> and fixes a similar problem for unthrottle_cfs_rq().
>
> kernel/sched/fair.c | 37 ++++++++++++++++++++++++++++---------
> 1 file changed, 28 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e2450c2e0747..4b73518aa25c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4803,26 +4803,44 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
> idle_task_delta = cfs_rq->idle_h_nr_running;
> for_each_sched_entity(se) {
> if (se->on_rq)
> - enqueue = 0;
> + break;
> + cfs_rq = cfs_rq_of(se);
> + enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
>
> + cfs_rq->h_nr_running += task_delta;
> + cfs_rq->idle_h_nr_running += idle_task_delta;
> +
> + /* end evaluation on encountering a throttled cfs_rq */
> + if (cfs_rq_throttled(cfs_rq))
> + goto unthrottle_throttle;
> + }
> +
> + for_each_sched_entity(se) {
> cfs_rq = cfs_rq_of(se);
> - if (enqueue) {
> - enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
> - } else {
> - update_load_avg(cfs_rq, se, 0);
> - se_update_runnable(se);
> - }
> +
> + update_load_avg(cfs_rq, se, UPDATE_TG);
> + se_update_runnable(se);
>
> cfs_rq->h_nr_running += task_delta;
> cfs_rq->idle_h_nr_running += idle_task_delta;
>
> +
> + /* end evaluation on encountering a throttled cfs_rq */
> if (cfs_rq_throttled(cfs_rq))
> - break;
> + goto unthrottle_throttle;
> +
> + /*
> + * One parent has been throttled and cfs_rq removed from the
> + * list. Add it back to not break the leaf list.
> + */
> + if (throttled_hierarchy(cfs_rq))
> + list_add_leaf_cfs_rq(cfs_rq);
> }
>
> if (!se)

The if is no longer necessary, unlike in enqueue, where the skip goto
goes to this if statement rather than past (but enqueue could be changed
to match this). Also in general if we are making these loops absolutely
identical we should probably pull them out to a common function (ideally
including the goto target and following loop as well).
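
Something like this, maybe (completely untested sketch; the name and
signature are invented, and enqueue_task_fair() would still keep its
flags handling and update_overutilized_status() outside of it):

        /* Enqueue se and walk up; returns false if a throttled level stopped us. */
        static bool enqueue_se_hierarchy(struct rq *rq, struct sched_entity *se,
                                         long task_delta, long idle_task_delta)
        {
                struct cfs_rq *cfs_rq;
                bool throttled = false;

                for_each_sched_entity(se) {
                        if (se->on_rq)
                                break;
                        cfs_rq = cfs_rq_of(se);
                        enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);

                        cfs_rq->h_nr_running += task_delta;
                        cfs_rq->idle_h_nr_running += idle_task_delta;

                        /* end evaluation on encountering a throttled cfs_rq */
                        if (cfs_rq_throttled(cfs_rq)) {
                                throttled = true;
                                goto leaf_fixup;
                        }
                }

                /* continues from the same se: levels already on the rq */
                for_each_sched_entity(se) {
                        cfs_rq = cfs_rq_of(se);

                        update_load_avg(cfs_rq, se, UPDATE_TG);
                        se_update_runnable(se);

                        cfs_rq->h_nr_running += task_delta;
                        cfs_rq->idle_h_nr_running += idle_task_delta;

                        if (cfs_rq_throttled(cfs_rq)) {
                                throttled = true;
                                goto leaf_fixup;
                        }

                        /* a throttled parent removed cfs_rq from the list; put it back */
                        if (throttled_hierarchy(cfs_rq))
                                list_add_leaf_cfs_rq(cfs_rq);
                }

                /* se is NULL here, so the if (!se) check goes away for free */
                add_nr_running(rq, task_delta);

        leaf_fixup:
                /* the early exits above can leave the leaf list partially built */
                for_each_sched_entity(se) {
                        if (list_add_leaf_cfs_rq(cfs_rq_of(se)))
                                break;
                }
                assert_list_leaf_cfs_rq(rq);
                return !throttled;
        }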

> add_nr_running(rq, task_delta);
>
> +unthrottle_throttle:
> /*
> * The cfs_rq_throttled() breaks in the above iteration can result in
> * incomplete leaf list maintenance, resulting in triggering the
> @@ -4831,7 +4849,8 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
> for_each_sched_entity(se) {
> cfs_rq = cfs_rq_of(se);
>
> - list_add_leaf_cfs_rq(cfs_rq);
> + if (list_add_leaf_cfs_rq(cfs_rq))
> + break;

Do we also need to handle the case of tg_unthrottle_up followed by early exit
from unthrottle_cfs_rq? I do not have enough of an idea what
list_add_leaf_cfs_rq is doing to say.

> }
>
> assert_list_leaf_cfs_rq(rq);

2020-05-13 07:14:04

by Vincent Guittot

Subject: Re: [PATCH] sched/fair: fix unthrottle_cfs_rq for leaf_cfs_rq list

On Tue, 12 May 2020 at 20:59, <[email protected]> wrote:
>
> Vincent Guittot <[email protected]> writes:
>
> > Although not exactly identical, unthrottle_cfs_rq() and enqueue_task_fair()
> > are quite close and follow the same sequence for enqueuing an entity in the
> > cfs hierarchy. Modify unthrottle_cfs_rq() to use the same pattern as
> > enqueue_task_fair(). This fixes a problem already faced with the latter and
> > adds an optimization in the last for_each_sched_entity loop.
> >
> > Fixes: fe61468b2cb ("sched/fair: Fix enqueue_task_fair warning")
> > Reported-by: Tao Zhou <[email protected]>
> > Signed-off-by: Vincent Guittot <[email protected]>
> > ---
> >
> > This patch applies on top of [email protected]
> > and fixes a similar problem for unthrottle_cfs_rq().
> >
> > kernel/sched/fair.c | 37 ++++++++++++++++++++++++++++---------
> > 1 file changed, 28 insertions(+), 9 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index e2450c2e0747..4b73518aa25c 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4803,26 +4803,44 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
> > idle_task_delta = cfs_rq->idle_h_nr_running;
> > for_each_sched_entity(se) {
> > if (se->on_rq)
> > - enqueue = 0;
> > + break;
> > + cfs_rq = cfs_rq_of(se);
> > + enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
> >
> > + cfs_rq->h_nr_running += task_delta;
> > + cfs_rq->idle_h_nr_running += idle_task_delta;
> > +
> > + /* end evaluation on encountering a throttled cfs_rq */
> > + if (cfs_rq_throttled(cfs_rq))
> > + goto unthrottle_throttle;
> > + }
> > +
> > + for_each_sched_entity(se) {
> > cfs_rq = cfs_rq_of(se);
> > - if (enqueue) {
> > - enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
> > - } else {
> > - update_load_avg(cfs_rq, se, 0);
> > - se_update_runnable(se);
> > - }
> > +
> > + update_load_avg(cfs_rq, se, UPDATE_TG);
> > + se_update_runnable(se);
> >
> > cfs_rq->h_nr_running += task_delta;
> > cfs_rq->idle_h_nr_running += idle_task_delta;
> >
> > +
> > + /* end evaluation on encountering a throttled cfs_rq */
> > if (cfs_rq_throttled(cfs_rq))
> > - break;
> > + goto unthrottle_throttle;
> > +
> > + /*
> > + * One parent has been throttled and cfs_rq removed from the
> > + * list. Add it back to not break the leaf list.
> > + */
> > + if (throttled_hierarchy(cfs_rq))
> > + list_add_leaf_cfs_rq(cfs_rq);
> > }
> >
> > if (!se)
>
> The if is no longer necessary, unlike in enqueue, where the skip goto

Yes. Good point

> goes to this if statement rather than past (but enqueue could be changed
> to match this). Also in general if we are making these loops absolutely

There is a patch on the mailing list that skips the if statement. I'm
going to update it to remove the if.
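
i.e. something like this on top (untested sketch of what I have in
mind; once the gotos jump past it, reaching this point implies se is
NULL):

        -       if (!se)
        -               add_nr_running(rq, task_delta);
        +       /* At this point se is NULL and we are at root level */
        +       add_nr_running(rq, task_delta);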

> identical we should probably pull them out to a common function (ideally
> including the goto target and following loop as well).

I tried that but was not convinced by the result, which required
passing a lot of arguments. I didn't want to delay the fix for such a
cleanup, but I will have a closer look afterwards. Also, the same kind
of identical sequence and cleanup can be done for dequeue_task_fair()
and throttle_cfs_rq(), but those don't have the problem we are fixing
here.

>
> > add_nr_running(rq, task_delta);
> >
> > +unthrottle_throttle:
> > /*
> > * The cfs_rq_throttled() breaks in the above iteration can result in
> > * incomplete leaf list maintenance, resulting in triggering the
> > @@ -4831,7 +4849,8 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
> > for_each_sched_entity(se) {
> > cfs_rq = cfs_rq_of(se);
> >
> > - list_add_leaf_cfs_rq(cfs_rq);
> > + if (list_add_leaf_cfs_rq(cfs_rq))
> > + break;
>
> Do we also need to handle the case of tg_unthrottle_up followed by early exit
> from unthrottle_cfs_rq? I do not have enough of an idea what
> list_add_leaf_cfs_rq is doing to say.

If you are speaking about the 'if (!cfs_rq->load.weight) return;' after
walk_tg_tree_from(): I also thought it was needed, but after more
analysis I concluded that if cfs_rq->load.weight == 0, no child has
been added to the leaf_cfs_rq_list in that case.
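
For reference, the early exit in question (sketch from the current
code, placement approximate):

        /* in unthrottle_cfs_rq(), just after walk_tg_tree_from() */
        if (!cfs_rq->load.weight)
                return;

A child with enqueued entities would contribute to this load.weight,
so load.weight == 0 means the walk had nothing runnable to put back on
the list.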


>
> > }
> >
> > assert_list_leaf_cfs_rq(rq);

2020-05-13 18:24:43

by Benjamin Segall

Subject: Re: [PATCH] sched/fair: fix unthrottle_cfs_rq for leaf_cfs_rq list

Vincent Guittot <[email protected]> writes:

> On Tue, 12 May 2020 at 20:59, <[email protected]> wrote:
>>
>> Vincent Guittot <[email protected]> writes:
>>
>> > Although not exactly identical, unthrottle_cfs_rq() and enqueue_task_fair()
>> > are quite close and follow the same sequence for enqueuing an entity in the
>> > cfs hierarchy. Modify unthrottle_cfs_rq() to use the same pattern as
>> > enqueue_task_fair(). This fixes a problem already faced with the latter and
>> > adds an optimization in the last for_each_sched_entity loop.
>> >
>> > Fixes: fe61468b2cb ("sched/fair: Fix enqueue_task_fair warning")
>> > Reported-by: Tao Zhou <[email protected]>
>> > Signed-off-by: Vincent Guittot <[email protected]>
>> > ---
>> >
>> > This patch applies on top of [email protected]
>> > and fixes a similar problem for unthrottle_cfs_rq().
>> >
>> > kernel/sched/fair.c | 37 ++++++++++++++++++++++++++++---------
>> > 1 file changed, 28 insertions(+), 9 deletions(-)
>> >
>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > index e2450c2e0747..4b73518aa25c 100644
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -4803,26 +4803,44 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
>> > idle_task_delta = cfs_rq->idle_h_nr_running;
>> > for_each_sched_entity(se) {
>> > if (se->on_rq)
>> > - enqueue = 0;
>> > + break;
>> > + cfs_rq = cfs_rq_of(se);
>> > + enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
>> >
>> > + cfs_rq->h_nr_running += task_delta;
>> > + cfs_rq->idle_h_nr_running += idle_task_delta;
>> > +
>> > + /* end evaluation on encountering a throttled cfs_rq */
>> > + if (cfs_rq_throttled(cfs_rq))
>> > + goto unthrottle_throttle;
>> > + }
>> > +
>> > + for_each_sched_entity(se) {
>> > cfs_rq = cfs_rq_of(se);
>> > - if (enqueue) {
>> > - enqueue_entity(cfs_rq, se, ENQUEUE_WAKEUP);
>> > - } else {
>> > - update_load_avg(cfs_rq, se, 0);
>> > - se_update_runnable(se);
>> > - }
>> > +
>> > + update_load_avg(cfs_rq, se, UPDATE_TG);
>> > + se_update_runnable(se);
>> >
>> > cfs_rq->h_nr_running += task_delta;
>> > cfs_rq->idle_h_nr_running += idle_task_delta;
>> >
>> > +
>> > + /* end evaluation on encountering a throttled cfs_rq */
>> > if (cfs_rq_throttled(cfs_rq))
>> > - break;
>> > + goto unthrottle_throttle;
>> > +
>> > + /*
>> > + * One parent has been throttled and cfs_rq removed from the
>> > + * list. Add it back to not break the leaf list.
>> > + */
>> > + if (throttled_hierarchy(cfs_rq))
>> > + list_add_leaf_cfs_rq(cfs_rq);
>> > }
>> >
>> > if (!se)
>>
>> The if is no longer necessary, unlike in enqueue, where the skip goto
>
> Yes. Good point
>
>> goes to this if statement rather than past (but enqueue could be changed
>> to match this). Also in general if we are making these loops absolutely
>
> There is a patch on the mailing list that skips the if statement. I'm
> going to update it to remove the if.
>
>> identical we should probably pull them out to a common function (ideally
>> including the goto target and following loop as well).
>
> I tried that but was not convinced by the result, which required
> passing a lot of arguments. I didn't want to delay the fix for such a
> cleanup, but I will have a closer look afterwards. Also, the same kind
> of identical sequence and cleanup can be done for dequeue_task_fair()
> and throttle_cfs_rq(), but those don't have the problem we are fixing
> here.
>
>>
>> > add_nr_running(rq, task_delta);
>> >
>> > +unthrottle_throttle:
>> > /*
>> > * The cfs_rq_throttled() breaks in the above iteration can result in
>> > * incomplete leaf list maintenance, resulting in triggering the
>> > @@ -4831,7 +4849,8 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
>> > for_each_sched_entity(se) {
>> > cfs_rq = cfs_rq_of(se);
>> >
>> > - list_add_leaf_cfs_rq(cfs_rq);
>> > + if (list_add_leaf_cfs_rq(cfs_rq))
>> > + break;
>>
>> Do we also need to handle the case of tg_unthrottle_up followed by early exit
>> from unthrottle_cfs_rq? I do not have enough of an idea what
>> list_add_leaf_cfs_rq is doing to say.
>
> If you are speaking about the 'if (!cfs_rq->load.weight) return;' after
> walk_tg_tree_from(): I also thought it was needed, but after more
> analysis I concluded that if cfs_rq->load.weight == 0, no child has
> been added to the leaf_cfs_rq_list in that case.

Hmm, yes, if load.weight is 0 it should not have done anything there.

>
>
>>
>> > }
>> >
>> > assert_list_leaf_cfs_rq(rq);