2020-03-06 08:42:51

by Vincent Guittot

Subject: [PATCH v3] sched/fair : fix reordering of enqueue/dequeue_task_fair

Even when a cgroup is throttled, the group se of a child cgroup can still
be enqueued and its gse->on_rq stays true. When a task is enqueued on such
a child, we still have to update the load_avg and increase h_nr_running of
the throttled cfs_rq. Nevertheless, the 1st for_each_sched_entity loop is
skipped because of gse->on_rq == true and the 2nd loop because the cfs_rq
is throttled, whereas in such a case we have to both update load_avg with
the old h_nr_running and increase h_nr_running.

The same sequence can happen during dequeue when se moves to its parent
before breaking in the 1st loop.

Note that the update of load_avg will effectively happen only once, in
order to sync up to the throttled time. The next call to update load_avg
will stop early because the clock stays unchanged.

Fixes: 6d4d22468dae ("sched/fair: Reorder enqueue/dequeue_task_fair path")
Signed-off-by: Vincent Guittot <[email protected]>
---

Changes since v2:
- added similar changes into dequeue_task_fair as reported by Ben

kernel/sched/fair.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fcc968669aea..ea2748a132a2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5431,16 +5431,16 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);

-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			goto enqueue_throttle;
-
 		update_load_avg(cfs_rq, se, UPDATE_TG);
 		se_update_runnable(se);
 		update_cfs_group(se);

 		cfs_rq->h_nr_running++;
 		cfs_rq->idle_h_nr_running += idle_h_nr_running;
+
+		/* end evaluation on encountering a throttled cfs_rq */
+		if (cfs_rq_throttled(cfs_rq))
+			goto enqueue_throttle;
 	}

 enqueue_throttle:
@@ -5529,16 +5529,17 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);

-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			goto dequeue_throttle;
-
 		update_load_avg(cfs_rq, se, UPDATE_TG);
 		se_update_runnable(se);
 		update_cfs_group(se);

 		cfs_rq->h_nr_running--;
 		cfs_rq->idle_h_nr_running -= idle_h_nr_running;
+
+		/* end evaluation on encountering a throttled cfs_rq */
+		if (cfs_rq_throttled(cfs_rq))
+			goto dequeue_throttle;
+
 	}

 dequeue_throttle:
--
2.17.1
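
The accounting problem described in the changelog above can be illustrated
with a small user-space sketch. This is not kernel code: struct toy_cfs_rq,
toy_enqueue_walk() and toy_scenario() are hypothetical names invented for
the illustration, and the model keeps only h_nr_running and a throttled
flag. It assumes a hierarchy root <- A (throttled) <- B, where B already
holds one task so that B's group se is still on_rq in A.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a cfs_rq; only the fields this sketch needs. */
struct toy_cfs_rq {
	bool throttled;
	unsigned int h_nr_running;
	struct toy_cfs_rq *parent;	/* NULL at the root */
};

/*
 * Models only the 2nd for_each_sched_entity() loop of an enqueue: walk from
 * 'start' towards the root and account one more runnable task per level.
 * check_first == true  : pre-patch order, throttle check before the update.
 * check_first == false : post-patch order, throttle check after the update.
 */
void toy_enqueue_walk(struct toy_cfs_rq *start, bool check_first)
{
	assert(start != NULL);

	for (struct toy_cfs_rq *cfs_rq = start; cfs_rq; cfs_rq = cfs_rq->parent) {
		if (check_first && cfs_rq->throttled)
			break;

		cfs_rq->h_nr_running++;

		if (!check_first && cfs_rq->throttled)
			break;
	}
}

/*
 * Hierarchy: root <- A (throttled) <- B. B already holds one task, so B's
 * group se is still on_rq in A. A second task is enqueued on B.
 * Returns A's h_nr_running after the enqueue; the root's count is stored
 * in *root_count when that pointer is not NULL.
 */
unsigned int toy_scenario(bool check_first, unsigned int *root_count)
{
	struct toy_cfs_rq root = { .throttled = false, .h_nr_running = 0, .parent = NULL };
	struct toy_cfs_rq a = { .throttled = true, .h_nr_running = 1, .parent = &root };
	struct toy_cfs_rq b = { .throttled = false, .h_nr_running = 1, .parent = &a };

	/* 1st loop: the task is enqueued on B, then the walk stops at B's on_rq gse. */
	b.h_nr_running++;

	/* 2nd loop: resumes at the cfs_rq that holds B's gse, i.e. A. */
	toy_enqueue_walk(&a, check_first);

	if (root_count)
		*root_count = root.h_nr_running;
	return a.h_nr_running;
}
```

In this model, toy_scenario(true, NULL) leaves A at 1 although two tasks now
sit below it, while toy_scenario(false, NULL) yields 2. In both orderings
the walk stops at the throttled level, so the root count is untouched.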


2020-03-06 14:46:06

by tip-bot2 for Vincent Guittot

Subject: [tip: sched/core] sched/fair: Fix reordering of enqueue/dequeue_task_fair()

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 5ab297bab984310267734dfbcc8104566658ebef
Gitweb: https://git.kernel.org/tip/5ab297bab984310267734dfbcc8104566658ebef
Author: Vincent Guittot <[email protected]>
AuthorDate: Fri, 06 Mar 2020 09:42:08 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 06 Mar 2020 12:57:25 +01:00

sched/fair: Fix reordering of enqueue/dequeue_task_fair()

Even when a cgroup is throttled, the group se of a child cgroup can still
be enqueued and its gse->on_rq stays true. When a task is enqueued on such
a child, we still have to update the load_avg and increase h_nr_running of
the throttled cfs_rq. Nevertheless, the 1st for_each_sched_entity() loop is
skipped because of gse->on_rq == true and the 2nd loop because the cfs_rq
is throttled, whereas in such a case we have to both update load_avg with
the old h_nr_running and increase h_nr_running.

The same sequence can happen during dequeue when se moves to its parent
before breaking in the 1st loop.

Note that the update of load_avg will effectively happen only once, in
order to sync up to the throttled time. The next call to update load_avg
will stop early because the clock stays unchanged.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Fixes: 6d4d22468dae ("sched/fair: Reorder enqueue/dequeue_task_fair path")
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 54bd628..1dea855 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5460,16 +5460,16 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);

-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			goto enqueue_throttle;
-
 		update_load_avg(cfs_rq, se, UPDATE_TG);
 		se_update_runnable(se);
 		update_cfs_group(se);

 		cfs_rq->h_nr_running++;
 		cfs_rq->idle_h_nr_running += idle_h_nr_running;
+
+		/* end evaluation on encountering a throttled cfs_rq */
+		if (cfs_rq_throttled(cfs_rq))
+			goto enqueue_throttle;
 	}

 enqueue_throttle:
@@ -5558,16 +5558,17 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);

-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			goto dequeue_throttle;
-
 		update_load_avg(cfs_rq, se, UPDATE_TG);
 		se_update_runnable(se);
 		update_cfs_group(se);

 		cfs_rq->h_nr_running--;
 		cfs_rq->idle_h_nr_running -= idle_h_nr_running;
+
+		/* end evaluation on encountering a throttled cfs_rq */
+		if (cfs_rq_throttled(cfs_rq))
+			goto dequeue_throttle;
+
 	}

 dequeue_throttle:
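
The changelog's remark that load_avg is effectively updated only once while
the cfs_rq is throttled can be pictured with the sketch below. It is a
simplified model, not the kernel's PELT code: struct toy_avg,
toy_update_load_avg() and toy_effective_updates() are hypothetical names,
and the only behaviour assumed is that an update does nothing when no time
has elapsed since the previous one and that the clock seen by a throttled
cfs_rq is frozen.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the per-entity average state. */
struct toy_avg {
	uint64_t last_update_time;
	unsigned int nr_updates;	/* updates that actually advanced the average */
};

/*
 * Returns true when time has elapsed since the last update and the state
 * was advanced, false when the call stopped early (assumed behaviour).
 */
bool toy_update_load_avg(struct toy_avg *sa, uint64_t now)
{
	uint64_t delta = now - sa->last_update_time;

	if (delta == 0)
		return false;	/* clock unchanged: nothing to account */

	sa->last_update_time = now;
	sa->nr_updates++;
	return true;
}

/*
 * A throttled cfs_rq sees a frozen clock. The first update syncs the state
 * up to the throttle time; every later call with the same clock is a no-op.
 * Returns how many of the 'calls' updates did real work.
 */
unsigned int toy_effective_updates(unsigned int calls)
{
	const uint64_t throttle_time = 1000;	/* arbitrary frozen clock value */
	struct toy_avg sa = { .last_update_time = 900, .nr_updates = 0 };

	for (unsigned int i = 0; i < calls; i++)
		toy_update_load_avg(&sa, throttle_time);

	return sa.nr_updates;
}
```

Under these assumptions, any number of enqueues or dequeues while throttled
results in a single effective update, so moving the throttle check after
update_load_avg() adds little extra work.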

2020-03-06 21:21:29

by Benjamin Segall

Subject: Re: [PATCH v3] sched/fair : fix reordering of enqueue/dequeue_task_fair

Vincent Guittot <[email protected]> writes:

> Even when a cgroup is throttled, the group se of a child cgroup can still
> be enqueued and its gse->on_rq stays true. When a task is enqueued on such
> a child, we still have to update the load_avg and increase h_nr_running of
> the throttled cfs_rq. Nevertheless, the 1st for_each_sched_entity loop is
> skipped because of gse->on_rq == true and the 2nd loop because the cfs_rq
> is throttled, whereas in such a case we have to both update load_avg with
> the old h_nr_running and increase h_nr_running.
>
> The same sequence can happen during dequeue when se moves to its parent
> before breaking in the 1st loop.
>
> Note that the update of load_avg will effectively happen only once, in
> order to sync up to the throttled time. The next call to update load_avg
> will stop early because the clock stays unchanged.


Reviewed-by: Ben Segall <[email protected]>

(though it seems I was too slow in actually testing this and it's in
tip, which confused me a bunch when I tried to apply the patch for testing)

>
> Fixes: 6d4d22468dae ("sched/fair: Reorder enqueue/dequeue_task_fair path")
> Signed-off-by: Vincent Guittot <[email protected]>
> ---
>
> Changes since v2:
> - added similar changes into dequeue_task_fair as reported by Ben
>
> kernel/sched/fair.c | 17 +++++++++--------
> 1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fcc968669aea..ea2748a132a2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5431,16 +5431,16 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  	for_each_sched_entity(se) {
>  		cfs_rq = cfs_rq_of(se);
>
> -		/* end evaluation on encountering a throttled cfs_rq */
> -		if (cfs_rq_throttled(cfs_rq))
> -			goto enqueue_throttle;
> -
>  		update_load_avg(cfs_rq, se, UPDATE_TG);
>  		se_update_runnable(se);
>  		update_cfs_group(se);
>
>  		cfs_rq->h_nr_running++;
>  		cfs_rq->idle_h_nr_running += idle_h_nr_running;
> +
> +		/* end evaluation on encountering a throttled cfs_rq */
> +		if (cfs_rq_throttled(cfs_rq))
> +			goto enqueue_throttle;
>  	}
>
>  enqueue_throttle:
> @@ -5529,16 +5529,17 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  	for_each_sched_entity(se) {
>  		cfs_rq = cfs_rq_of(se);
>
> -		/* end evaluation on encountering a throttled cfs_rq */
> -		if (cfs_rq_throttled(cfs_rq))
> -			goto dequeue_throttle;
> -
>  		update_load_avg(cfs_rq, se, UPDATE_TG);
>  		se_update_runnable(se);
>  		update_cfs_group(se);
>
>  		cfs_rq->h_nr_running--;
>  		cfs_rq->idle_h_nr_running -= idle_h_nr_running;
> +
> +		/* end evaluation on encountering a throttled cfs_rq */
> +		if (cfs_rq_throttled(cfs_rq))
> +			goto dequeue_throttle;
> +
>  	}
>
>  dequeue_throttle: