2017-09-01 13:37:39

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH -v2 03/18] sched/fair: Cure calc_cfs_shares() vs reweight_entity()

Vincent reported that when running in a cgroup, his root
cfs_rq->avg.load_avg dropped to 0 on task idle.

This is because reweight_entity() will now immediately propagate the
weight change of the group entity to its cfs_rq, and as it happens,
our approxmation (5) for calc_cfs_shares() results in 0 when the group
is idle.

Avoid this by using the correct (3) as a lower bound on (5). This way
the empty cgroup will slowly decay instead of instantly drop to 0.

Reported-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
kernel/sched/fair.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2703,11 +2703,10 @@ static long calc_cfs_shares(struct cfs_r
tg_shares = READ_ONCE(tg->shares);

/*
- * This really should be: cfs_rq->avg.load_avg, but instead we use
- * cfs_rq->load.weight, which is its upper bound. This helps ramp up
- * the shares for small weight interactive tasks.
+ * Because (5) drops to 0 when the cfs_rq is idle, we need to use (3)
+ * as a lower bound.
*/
- load = scale_load_down(cfs_rq->load.weight);
+ load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);

tg_weight = atomic_long_read(&tg->load_avg);




2017-09-29 09:04:42

by Morten Rasmussen

[permalink] [raw]
Subject: Re: [PATCH -v2 03/18] sched/fair: Cure calc_cfs_shares() vs reweight_entity()

On Fri, Sep 01, 2017 at 03:21:02PM +0200, Peter Zijlstra wrote:
> Vincent reported that when running in a cgroup, his root
> cfs_rq->avg.load_avg dropped to 0 on task idle.
>
> This is because reweight_entity() will now immediately propagate the
> weight change of the group entity to its cfs_rq, and as it happens,
> our approxmation (5) for calc_cfs_shares() results in 0 when the group
> is idle.
>
> Avoid this by using the correct (3) as a lower bound on (5). This way
> the empty cgroup will slowly decay instead of instantly drop to 0.
>
> Reported-by: Vincent Guittot <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> kernel/sched/fair.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2703,11 +2703,10 @@ static long calc_cfs_shares(struct cfs_r
> tg_shares = READ_ONCE(tg->shares);
>
> /*
> - * This really should be: cfs_rq->avg.load_avg, but instead we use
> - * cfs_rq->load.weight, which is its upper bound. This helps ramp up
> - * the shares for small weight interactive tasks.
> + * Because (5) drops to 0 when the cfs_rq is idle, we need to use (3)
> + * as a lower bound.
> */
> - load = scale_load_down(cfs_rq->load.weight);
> + load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);

We use cfs_rq->tg_load_avg_contrib (the filtered version of
cfs_rq->avg.load_avg) instead of cfs_rq->avg.load_avg further down, so I
think we should here too for consistency.

+ load = max(scale_load_down(cfs_rq->load.weight),
+ cfs_rq->tg_load_avg_contrib);

With this change (5) almost becomes (3):

ge->load.weight =

tg->weight * max(grq->load.weight, grq->avg.load_avg)
---------------------------------------------------------------------------
tg->load_avg - grq->avg.load_avg + max(grq->load.weight, grq->avg.load_avg)

The difference is that we boost ge->load.weight for if the grq has
runnable tasks with se->avg.load_avg < se->load.weight, i.e. tasks that
occasionally block. This means that the underestimate scenario I have in
my reply for patch #2 is no longer possible. AFAICT, we are now
guaranteed to over-estimate ge->load.weight. It is still quite sensitive
to periodic high priority tasks though.

tg->weight = 1024
tg->load_avg = 2560
\Sum grq->load.weight = 2048

cpu 0 1 \Sum
grq->avg.load_avg 1536 1024
grq->load.weight 1024 1024
load (max) 1536 1024
ge->load_weight (1) 512 512 1024 >= tg->weight
ge->load_weight (3) 614 410 1024 >= tg->weight
ge->load_weight (5) 512 410 922 < tg->weight
ge->load_weight (5*) 614 410 1024 >= tg->weight

2017-09-29 11:39:04

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH -v2 03/18] sched/fair: Cure calc_cfs_shares() vs reweight_entity()

On Fri, Sep 29, 2017 at 10:04:34AM +0100, Morten Rasmussen wrote:

> > - load = scale_load_down(cfs_rq->load.weight);
> > + load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
>
> We use cfs_rq->tg_load_avg_contrib (the filtered version of
> cfs_rq->avg.load_avg) instead of cfs_rq->avg.load_avg further down, so I
> think we should here too for consistency.
>
> + load = max(scale_load_down(cfs_rq->load.weight),
> + cfs_rq->tg_load_avg_contrib);
>

No; we must use tg_load_avg_contrib because that is what's inclded in
tg_weight, but we want to add the most up-to-date value back in.

2017-09-29 13:00:45

by Morten Rasmussen

[permalink] [raw]
Subject: Re: [PATCH -v2 03/18] sched/fair: Cure calc_cfs_shares() vs reweight_entity()

On Fri, Sep 29, 2017 at 01:38:53PM +0200, Peter Zijlstra wrote:
> On Fri, Sep 29, 2017 at 10:04:34AM +0100, Morten Rasmussen wrote:
>
> > > - load = scale_load_down(cfs_rq->load.weight);
> > > + load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
> >
> > We use cfs_rq->tg_load_avg_contrib (the filtered version of
> > cfs_rq->avg.load_avg) instead of cfs_rq->avg.load_avg further down, so I
> > think we should here too for consistency.
> >
> > + load = max(scale_load_down(cfs_rq->load.weight),
> > + cfs_rq->tg_load_avg_contrib);
> >
>
> No; we must use tg_load_avg_contrib because that is what's inclded in
> tg_weight, but we want to add the most up-to-date value back in.

Agreed. Looking at it again it make sense.