by Boqun Feng


Subject: Re: [PATCH v10 6/7] sched: Provide runnable_load_avg back to cfs_rq

On Tue, Jul 21, 2015 at 08:44:01AM +0800, Yuyang Du wrote:
> On Tue, Jul 21, 2015 at 09:08:07AM +0800, Boqun Feng wrote:
> > Hi Yuyang,
> >
> > On Wed, Jul 15, 2015 at 08:04:41AM +0800, Yuyang Du wrote:
> > > The cfs_rq's load_avg is composed of runnable_load_avg and blocked_load_avg.
> > > Before this series, sometimes the runnable_load_avg is used, and sometimes
> > > the load_avg is used. Completely replacing all uses of runnable_load_avg
> > > with load_avg may be too big a leap, i.e., there is a concern that the
> > > blocked_load_avg would result in an overrated load. Therefore, we get runnable_load_avg back.
> > >
> > > The new cfs_rq's runnable_load_avg is improved so that it is updated for all
> > > of the runnable sched_entities at the same time, which solves the problem of
> > > one sched_entity being updated while the others are stale.
> > >
> >
> > How about tracking cfs_rq's blocked_load_avg instead of
> > runnable_load_avg, because, AFAICS:
> >
> > cfs_rq->runnable_load_avg = se->avg.load_avg - cfs_rq->blocked_load_avg.
>
> No, cfs_rq->runnable_load_avg = cfs_rq->avg.load_avg - cfs_rq->blocked_load_avg,
> ignoring rounding errors and the like.
>

Oh, sorry.. yeah, you're right here.
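Yuyang's corrected identity can be sketched in C as follows. The struct and function names are hypothetical stand-ins, not kernel API, and the clamp at zero is an assumption about how one might guard against the rounding errors he mentions: each sum is decayed and truncated on its own, so the blocked part could end up slightly larger than the total, and an unsigned subtraction would then wrap around.

```c
/* Hypothetical stand-in for the two cfs_rq sums discussed in this thread. */
struct toy_cfs_rq {
	unsigned long load_avg;         /* runnable + blocked contributions */
	unsigned long blocked_load_avg; /* blocked contributions only */
};

/*
 * runnable = load_avg - blocked_load_avg, computed on demand.
 * Assumption: clamp at zero, because rounding in the separately decayed
 * sums could make blocked_load_avg exceed load_avg.
 */
static unsigned long toy_runnable_load_avg(const struct toy_cfs_rq *cfs_rq)
{
	if (cfs_rq->blocked_load_avg >= cfs_rq->load_avg)
		return 0;
	return cfs_rq->load_avg - cfs_rq->blocked_load_avg;
}
```

For example, a cfs_rq with load_avg = 1536 and blocked_load_avg = 512 yields a runnable load of 1024.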

> > se is the sched_entity corresponding to cfs_rq. When we need the
> > runnable_load_avg, we just calculate it with the expression above.
> >
> > This can be thought of as a lazy way to update runnable_load_avg, and we
> > don't need to modify __update_load_avg any more.
>
> Not lazy at all, but adding (as of now) useless blocked_load_avg and an
> extra subtraction.

But we could then remove the runnable_load_avg tracking code that this
patch adds to __update_load_avg, right?

> Or did you forget blocked_load_avg also needs to be updated/decayed as
> time elapses?

I know we need to update or decay the blocked_load_avg, but we only need
to do so when 1) an entity is dequeued/enqueued, 2) an entity migrates,
or 3) we need the runnable_load_avg value calculated from blocked_load_avg,
right?

These events are rarer than calls to __update_load_avg, right?
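Boqun's argument relies on geometric decay composing: decaying a sum once over n elapsed periods gives the same result as decaying it period by period (apart from integer truncation), so blocked_load_avg only needs to be brought up to date at the events he lists. A minimal sketch, assuming a PELT-style per-period factor y with y^32 = 1/2; the names are hypothetical, and the floating-point fractional step is a simplification (the kernel uses a fixed-point table instead).

```c
#include <stdint.h>

#define TOY_HALF_LIFE 32          /* periods after which load halves */
#define TOY_Y 0.97857206          /* approx. 2^(-1/32), so y^32 = 1/2 */

/* Decay val by n elapsed periods, i.e. val * y^n. */
static uint64_t toy_decay_load(uint64_t val, uint64_t n)
{
	double frac = 1.0;
	uint64_t i;

	if (n >= 64 * TOY_HALF_LIFE)
		return 0;                         /* fully decayed; avoids shifting by >= 64 */
	val >>= n / TOY_HALF_LIFE;                /* one halving per full half-life */
	for (i = 0; i < n % TOY_HALF_LIFE; i++)
		frac *= TOY_Y;                    /* remaining y^(n % 32) */
	return (uint64_t)((double)val * frac);
}

/* Hypothetical lazily decayed blocked sum. */
struct toy_blocked {
	uint64_t blocked_load_avg;
	uint64_t last_period;                     /* period of the last decay */
};

/* Called only at events: enqueue/dequeue, migration, or when a reader needs the value. */
static uint64_t toy_blocked_sync(struct toy_blocked *b, uint64_t now_period)
{
	b->blocked_load_avg = toy_decay_load(b->blocked_load_avg,
					     now_period - b->last_period);
	b->last_period = now_period;
	return b->blocked_load_avg;
}
```

Syncing a blocked sum of 1024 after 32 idle periods yields 512 in a single step, without touching it during the intervening periods.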

Regards,
Boqun


2015-07-21 10:30:02

by Boqun Feng


Subject: Re: [PATCH v10 6/7] sched: Provide runnable_load_avg back to cfs_rq

On Tue, Jul 21, 2015 at 06:18:46PM +0800, Boqun Feng wrote:
> On Tue, Jul 21, 2015 at 08:44:01AM +0800, Yuyang Du wrote:
> > On Tue, Jul 21, 2015 at 09:08:07AM +0800, Boqun Feng wrote:
> > > Hi Yuyang,
> > >
> > > On Wed, Jul 15, 2015 at 08:04:41AM +0800, Yuyang Du wrote:
> > > > The cfs_rq's load_avg is composed of runnable_load_avg and blocked_load_avg.
> > > > Before this series, sometimes the runnable_load_avg is used, and sometimes
> > > > the load_avg is used. Completely replacing all uses of runnable_load_avg
> > > > with load_avg may be too big a leap, i.e., there is a concern that the
> > > > blocked_load_avg would result in an overrated load. Therefore, we get runnable_load_avg back.
> > > >
> > > > The new cfs_rq's runnable_load_avg is improved so that it is updated for all
> > > > of the runnable sched_entities at the same time, which solves the problem of
> > > > one sched_entity being updated while the others are stale.
> > > >
> > >
> > > How about tracking cfs_rq's blocked_load_avg instead of
> > > runnable_load_avg, because, AFAICS:
> > >
> > > cfs_rq->runnable_load_avg = se->avg.load_avg - cfs_rq->blocked_load_avg.
> >
> > No, cfs_rq->runnable_load_avg = cfs_rq->avg.load_avg - cfs_rq->blocked_load_avg,
> > ignoring rounding errors and the like.
> >
>
> Oh, sorry.. yeah, you're right here.
>

The point is that you have already tracked the sum of runnable_load_avg
and blocked_load_avg in cfs_rq->avg.load_avg. If you're going to track
part of the sum, you'd better track the one that's updated less
frequently, right?

Anyway, this idea just came to mind, and I'm not sure myself which one is
updated less frequently. ;-) So I'm asking to see whether there is
something we can improve.
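To make the trade-off concrete, here is a toy model of tracking the blocked part instead of the runnable part: blocked_load_avg changes only when an entity goes to sleep or wakes up, while the runnable value is derived on demand. Everything here is hypothetical, and decay is deliberately left out as a simplifying assumption; a real implementation would also have to decay both sums, which is the cost Yuyang points out.

```c
/* Hypothetical toy model; decay is omitted for brevity. */
struct toy_se {
	unsigned long load_avg;          /* this entity's load contribution */
};

struct toy_cfs_rq {
	unsigned long load_avg;          /* runnable + blocked */
	unsigned long blocked_load_avg;  /* touched only on sleep/wakeup */
};

/* Entity blocks: its load stays in load_avg but is now counted as blocked. */
static void toy_dequeue_sleep(struct toy_cfs_rq *cfs_rq, const struct toy_se *se)
{
	cfs_rq->blocked_load_avg += se->load_avg;
}

/* Entity wakes up on the same cfs_rq: it is no longer blocked. */
static void toy_enqueue_wakeup(struct toy_cfs_rq *cfs_rq, const struct toy_se *se)
{
	if (cfs_rq->blocked_load_avg > se->load_avg)
		cfs_rq->blocked_load_avg -= se->load_avg;
	else
		cfs_rq->blocked_load_avg = 0;
}

/* Runnable load derived lazily instead of being tracked on every update. */
static unsigned long toy_runnable(const struct toy_cfs_rq *cfs_rq)
{
	return cfs_rq->load_avg > cfs_rq->blocked_load_avg ?
	       cfs_rq->load_avg - cfs_rq->blocked_load_avg : 0;
}
```

In this model, the frequent per-period updates never touch blocked_load_avg; only sleep and wakeup do.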

Regards,
Boqun


2015-07-22 02:20:02

2015-07-27 04:15:32

by Yuyang Du


Subject: Re: [PATCH v10 2/7] sched: Rewrite runnable load and utilization average tracking

Hi Dietmar,

On Fri, Jul 24, 2015 at 05:41:35PM +0100, Dietmar Eggemann wrote:
> Hi Yuyang,
>
> On 15/07/15 01:04, Yuyang Du wrote:
>
> [...]
>
> > @@ -4674,7 +4487,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
> > /*
> > * w = rw_i + @wl
> > */
> > - w = se->my_q->load.weight + wl;
> > + w = se->my_q->avg.load_avg + wl;
> >
> > /*
> > * wl = S * s'_i; see (2)
>
> There is a comment 'Per the above, wl is the new *se->load.weight*
> value'. This should be replaced by *se->avg.load_avg*. Also the function
> header explains the functionality of effective_load() based on weight
> and not sched_avg::load_avg.

I think it is already replaced when effective_load is called.

About load.weight vs. load_avg, see below.
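For readers following the effective_load() hunks quoted below, the per-level arithmetic in their comments (w = rw_i + @wl, wl = S * s'_i, dw_i = S * (s'_i - s_i)) can be sketched for a single group level. This is a simplified, hypothetical helper: it ignores the recursion to parent groups, and the W > 0 && w < W guard and the small lower clamp on the new share (TOY_MIN_SHARES) are assumptions modeled on the surrounding kernel code rather than taken from this thread.

```c
/* Assumption: a small lower bound on a group entity's share, like the kernel's MIN_SHARES. */
#define TOY_MIN_SHARES 2L

/*
 * One level of an effective_load()-style computation (hypothetical helper).
 *   rw_i    : load of the group's cfs_rq on this CPU (avg.load_avg after the patch)
 *   sum_rw  : the group's load summed over all CPUs
 *   shares  : tg->shares, i.e. S
 *   se_load : current load of the group se on this CPU, i.e. S * s_i
 *   wl, wg  : load added to this cfs_rq and to the whole group
 * Returns dw_i, the change in the group se's load on this CPU.
 */
static long toy_effective_load_level(long rw_i, long sum_rw, long shares,
				     long se_load, long wl, long wg)
{
	long W = wg + sum_rw;     /* W = @wg + Sum(rw_j) */
	long w = rw_i + wl;       /* w = rw_i + @wl */

	if (W > 0 && w < W)
		wl = (w * shares) / W;   /* wl = S * s'_i */
	else
		wl = shares;
	if (wl < TOY_MIN_SHARES)
		wl = TOY_MIN_SHARES;

	return wl - se_load;      /* dw_i = S * (s'_i - s_i) */
}
```

Adding 512 units of load to a group that currently holds half of its 1024 shares on this CPU raises the group entity's load by 170 (682 - 512); adding nothing leaves it unchanged.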

> > @@ -4695,7 +4508,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
> > /*
> > * wl = dw_i = S * (s'_i - s_i); see (3)
> > */
> > - wl -= se->load.weight;
> > + wl -= se->avg.load_avg;
> >
> > /*
> > * Recursively apply this logic to all parent groups to compute
> > @@ -4769,14 +4582,14 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
> > */
> > if (sync) {
> > tg = task_group(current);
> > - weight = current->se.load.weight;
> > + weight = current->se.avg.load_avg;
> >
> > this_load += effective_load(tg, this_cpu, -weight, -weight);
> > load += effective_load(tg, prev_cpu, 0, -weight);
> > }
> >
> > tg = task_group(p);
> > - weight = p->se.load.weight;
> > + weight = p->se.avg.load_avg;
>
> You changed cfs_rq->load.weight to cfs_rq->avg.load_avg and
> se->load.weight to se->avg.load_avg in effective_load() and
> wake_affine() in v2.
> I wasn't able to find an explanation for why you did this. I mean we still have
> to maintain 'struct load_weight' on cfs_rq's and se's representing tg's.

Yes, I might not have explained it specifically, but back then the change
was simply motivated by expressing the load consistently with load_avg.

As of now, it is mostly the same. In addition, as I stated previously, as
far as group SEs are concerned, we use load_avg instead of runnable_load_avg
or load.weight.

As Morten also suggested, we need to revisit the bulk of the load-balancing
code, including rethinking what to use: load.weight, runnable_load_avg, or
load_avg. I think this patch series is just a starting point.

Thanks,
Yuyang

2015-07-27 04:22:26

by Yuyang Du


Subject: Re: [PATCH v10 7/7] sched: Clean up load average references

Hi Dietmar,

On Fri, Jul 24, 2015 at 05:41:45PM +0100, Dietmar Eggemann wrote:
> On 15/07/15 01:04, Yuyang Du wrote:
> > For cfs_rq, we have load.weight, runnable_load_avg, and load_avg. We
> > now start to clean up how they are used.
> >
> > First, as group sched_entity already largely uses load_avg, we now expand
> > to use load_avg in all cases.
>
> Are you talking about group se's, the cfs_rq's owned by the group se's
> (se->my_q), or both?

Definitely group SEs, and, where the cfs_rq owned by a group SE is used for
that group SE's load, both. I don't think this is carefully calculated to be
optimal, but it is probably the best move I can think of for now.

We need to revisit all of the code before we can make a final call.

> Just asking because both data structures (cfs_rq and se) have a 'struct
> load_weight load' as well as 'struct sched_avg avg' member.
>
> > Second, for CPU-wide load balancing, we
> > choose to use runnable_load_avg in all cases, which is the same as before
> > this series.
>
> With your patch-set there will still be a difference between
> 'cfs_rq->utilization_load_avg' and your 'cfs_rq->avg.util_avg', in the
> sense that the former does not contain the contribution of blocked se's.
>
> The EAS patch-set adds blocked utilization contribution:
> https://lkml.org/lkml/2015/7/7/915
>
> The cfs_rq utilization is also used by the load-balancer code via
> get_cpu_usage() so the blocked utilization contribution to
> 'cfs_rq->avg.util_avg' can change load-balancing as well.
>
> Since it is not as heavily used as cfs_rq->runnable_load_avg, we
> might not need to reintroduce cfs_rq->utilization_load_avg, but we
> should at least mention this here.
>

Yes, thanks.
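Dietmar's point about get_cpu_usage() can be illustrated with a small sketch: once blocked utilization is folded into util_avg, a CPU whose tasks are sleeping still reports usage, which can change which CPU the load balancer considers busier. The helper below is a hypothetical clamp-and-scale function in the spirit of get_cpu_usage(); its exact shape and the 1024 scale are assumptions, not a copy of the kernel code.

```c
#define TOY_LOAD_SHIFT 10
#define TOY_LOAD_SCALE (1UL << TOY_LOAD_SHIFT)   /* assumption: 1024 means fully busy */

/* Scale a CPU's utilization to its capacity, saturating at full capacity. */
static unsigned long toy_cpu_usage(unsigned long util_avg, unsigned long capacity)
{
	if (util_avg >= TOY_LOAD_SCALE)
		return capacity;
	return (util_avg * capacity) >> TOY_LOAD_SHIFT;
}
```

With a capacity of 1024, a CPU with 256 units of runnable and 256 units of blocked utilization reports 256 without the blocked contribution and 512 with it.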

2015-07-27 04:25:13

by Yuyang Du


Subject: Re: [PATCH v10 6/7] sched: Provide runnable_load_avg back to cfs_rq

On Mon, Jul 27, 2015 at 12:04:20PM +0800, Boqun Feng wrote:
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > > 1) blocked load is more "difficult" to track, hint, migrate.
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> I'm not sure I get your point here. Are you saying my patch fails to handle
> migration, or are you just telling me that blocked load tracking needs
> to take migration into consideration?

Both. Is it so difficult to get?

> If it's the latter one, I want to say that, with blocked load or not, we
> have to handle load_avg in migrations, so *adding* some code to handle
> blocked load is not a big deal.
>
> Please consider this piece of code in update_cfs_rq_load_avg(), which
> decays and updates blocked_load_avg.

At this point in time, can you tell me exactly why you want to track the blocked load?
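On the migration question: when a task that was blocked on one cfs_rq wakes up on another CPU, its contribution has to leave the source cfs_rq, possibly from a CPU that does not own that runqueue. A common pattern is to post the amount to an atomic counter and let the owner fold it in on its next update. The sketch below uses C11 atomics to show why separately tracking the blocked part adds work: each removal must be applied to blocked_load_avg as well as to load_avg. All names are hypothetical; this illustrates the pattern and is not the code from the patch or from Boqun's proposal.

```c
#include <stdatomic.h>

/* Hypothetical toy cfs_rq; only the fields needed for the example. */
struct toy_cfs_rq {
	long load_avg;                 /* runnable + blocked, owned by this CPU */
	long blocked_load_avg;         /* blocked part, if tracked separately */
	atomic_long removed_load_avg;  /* posted by other CPUs on migration */
};

static long toy_sub_clamped(long a, long b)
{
	return a > b ? a - b : 0;
}

/* A blocked entity migrates away: record its load without touching the owner's sums. */
static void toy_remove_entity_load(struct toy_cfs_rq *cfs_rq, long se_load)
{
	atomic_fetch_add(&cfs_rq->removed_load_avg, se_load);
}

/* Owner's next update: consume all pending removals at once. */
static void toy_update_cfs_rq_load(struct toy_cfs_rq *cfs_rq)
{
	long r = atomic_exchange(&cfs_rq->removed_load_avg, 0);

	/* The departed entities were blocked here, so their load sits in both sums. */
	cfs_rq->load_avg = toy_sub_clamped(cfs_rq->load_avg, r);
	cfs_rq->blocked_load_avg = toy_sub_clamped(cfs_rq->blocked_load_avg, r);
}
```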

2015-07-27 03:21:23

by Boqun Feng

Commit-ID: 7ea241afbf4924c58d41078599f7a32ba49fb985
Gitweb: http://git.kernel.org/tip/7ea241afbf4924c58d41078599f7a32ba49fb985
Author: Yuyang Du <[email protected]>
AuthorDate: Wed, 15 Jul 2015 08:04:42 +0800
Committer: Ingo Molnar <[email protected]>
CommitDate: Mon, 3 Aug 2015 12:24:32 +0200

sched/fair: Clean up load average references

For cfs_rq, we have load.weight, runnable_load_avg, and load_avg.
Clean up how they are used:

- First, as group sched_entity already largely uses load_avg, we now expand
to use load_avg in all cases.

- Second, for CPU-wide load balancing, we choose to use runnable_load_avg
in all cases, which is the same as before this series.
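The two rules above amount to routing every read through one of two accessors, so that the choice of metric is made in one place. A minimal, self-contained sketch mirroring the cfs_rq_runnable_load_avg()/cfs_rq_load_avg() helpers and weighted_cpuload() added in the diff; the toy_ structs and the static per-CPU array standing in for cpu_rq(cpu)->cfs are simplifications for illustration.

```c
/* Simplified stand-ins for the kernel structures; only the fields used here. */
struct toy_sched_avg {
	unsigned long load_avg;
};

struct toy_cfs_rq {
	unsigned long runnable_load_avg; /* runnable entities only */
	struct toy_sched_avg avg;        /* runnable + blocked */
};

/* CPU-wide load balancing: runnable load only, as before the series. */
static inline unsigned long toy_cfs_rq_runnable_load_avg(const struct toy_cfs_rq *cfs_rq)
{
	return cfs_rq->runnable_load_avg;
}

/* Group scheduling (shares, h_load): runnable + blocked load. */
static inline unsigned long toy_cfs_rq_load_avg(const struct toy_cfs_rq *cfs_rq)
{
	return cfs_rq->avg.load_avg;
}

/* Hypothetical per-CPU array standing in for cpu_rq(cpu)->cfs. */
#define TOY_NR_CPUS 2
static struct toy_cfs_rq toy_cpu_cfs[TOY_NR_CPUS];

static unsigned long toy_weighted_cpuload(int cpu)
{
	return toy_cfs_rq_runnable_load_avg(&toy_cpu_cfs[cpu]);
}
```

One benefit of this indirection is that a later decision to change what CPU-wide balancing reads only touches the accessor, not every call site.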

Signed-off-by: Yuyang Du <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/fair.c | 44 +++++++++++++++++++++++++++++---------------
1 file changed, 29 insertions(+), 15 deletions(-)
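The update_cfs_rq_h_load() and task_h_load() hunks compute a task's hierarchical load by scaling down level by level: each level multiplies by the entity's share of its parent cfs_rq's load_avg, with + 1 in the divisor to avoid dividing by zero. A hypothetical two-level worked example (root cfs_rq, one group, one task), with illustrative helper names:

```c
#include <stdint.h>

/* h_load of a group's cfs_rq, derived from its parent cfs_rq (cf. update_cfs_rq_h_load()). */
static uint64_t toy_child_h_load(uint64_t parent_h_load, uint64_t group_se_load,
				 uint64_t parent_cfs_load)
{
	return parent_h_load * group_se_load / (parent_cfs_load + 1);
}

/* A task's hierarchical load within its cfs_rq (cf. task_h_load()). */
static uint64_t toy_task_h_load(uint64_t task_load, uint64_t cfs_h_load,
				uint64_t cfs_load)
{
	return task_load * cfs_h_load / (cfs_load + 1);
}
```

At the root, h_load equals the root cfs_rq's load_avg. With a root load of 2047, a group entity contributing 1024 gives the group's cfs_rq an h_load of 2047 * 1024 / 2048 = 1023 (truncated). A task contributing 512 of that group's 1023 then gets 512 * 1023 / 1024 = 511, roughly a quarter of the root load, as expected for half of a half.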

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1a878d5..858b94a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -685,6 +685,9 @@ void init_entity_runnable_average(struct sched_entity *se)
sa->util_sum = LOAD_AVG_MAX;
/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
}
+
+static inline unsigned long cfs_rq_runnable_load_avg(struct cfs_rq *cfs_rq);
+static inline unsigned long cfs_rq_load_avg(struct cfs_rq *cfs_rq);
#else
void init_entity_runnable_average(struct sched_entity *se)
{
@@ -2360,7 +2363,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
*/
tg_weight = atomic_long_read(&tg->load_avg);
tg_weight -= cfs_rq->tg_load_avg_contrib;
- tg_weight += cfs_rq->avg.load_avg;
+ tg_weight += cfs_rq_load_avg(cfs_rq);

return tg_weight;
}
@@ -2370,7 +2373,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
long tg_weight, load, shares;

tg_weight = calc_tg_weight(tg, cfs_rq);
- load = cfs_rq->avg.load_avg;
+ load = cfs_rq_load_avg(cfs_rq);

shares = (tg->shares * load);
if (tg_weight)
@@ -2796,6 +2799,16 @@ void idle_exit_fair(struct rq *this_rq)
{
}

+static inline unsigned long cfs_rq_runnable_load_avg(struct cfs_rq *cfs_rq)
+{
+ return cfs_rq->runnable_load_avg;
+}
+
+static inline unsigned long cfs_rq_load_avg(struct cfs_rq *cfs_rq)
+{
+ return cfs_rq->avg.load_avg;
+}
+
static int idle_balance(struct rq *this_rq);

#else /* CONFIG_SMP */
@@ -4270,6 +4283,12 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
sched_avg_update(this_rq);
}

+/* Used instead of source_load when we know the type == 0 */
+static unsigned long weighted_cpuload(const int cpu)
+{
+ return cfs_rq_runnable_load_avg(&cpu_rq(cpu)->cfs);
+}
+
#ifdef CONFIG_NO_HZ_COMMON
/*
* There is no sane way to deal with nohz on smp when using jiffies because the
@@ -4291,7 +4310,7 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
static void update_idle_cpu_load(struct rq *this_rq)
{
unsigned long curr_jiffies = READ_ONCE(jiffies);
- unsigned long load = this_rq->cfs.avg.load_avg;
+ unsigned long load = weighted_cpuload(cpu_of(this_rq));
unsigned long pending_updates;

/*
@@ -4337,7 +4356,7 @@ void update_cpu_load_nohz(void)
*/
void update_cpu_load_active(struct rq *this_rq)
{
- unsigned long load = this_rq->cfs.avg.load_avg;
+ unsigned long load = weighted_cpuload(cpu_of(this_rq));
/*
* See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
*/
@@ -4345,12 +4364,6 @@ void update_cpu_load_active(struct rq *this_rq)
__update_cpu_load(this_rq, load, 1);
}

-/* Used instead of source_load when we know the type == 0 */
-static unsigned long weighted_cpuload(const int cpu)
-{
- return cpu_rq(cpu)->cfs.avg.load_avg;
-}
-
/*
* Return a low guess at the load of a migration-source cpu weighted
* according to the scheduling class and "nice" value.
@@ -4398,7 +4411,7 @@ static unsigned long cpu_avg_load_per_task(int cpu)
{
struct rq *rq = cpu_rq(cpu);
unsigned long nr_running = READ_ONCE(rq->cfs.h_nr_running);
- unsigned long load_avg = rq->cfs.avg.load_avg;
+ unsigned long load_avg = weighted_cpuload(cpu);

if (nr_running)
return load_avg / nr_running;
@@ -4517,7 +4530,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
/*
* w = rw_i + @wl
*/
- w = se->my_q->avg.load_avg + wl;
+ w = cfs_rq_load_avg(se->my_q) + wl;

/*
* wl = S * s'_i; see (2)
@@ -5862,13 +5875,14 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
}

if (!se) {
- cfs_rq->h_load = cfs_rq->avg.load_avg;
+ cfs_rq->h_load = cfs_rq_load_avg(cfs_rq);
cfs_rq->last_h_load_update = now;
}

while ((se = cfs_rq->h_load_next) != NULL) {
load = cfs_rq->h_load;
- load = div64_ul(load * se->avg.load_avg, cfs_rq->avg.load_avg + 1);
+ load = div64_ul(load * se->avg.load_avg,
+ cfs_rq_load_avg(cfs_rq) + 1);
cfs_rq = group_cfs_rq(se);
cfs_rq->h_load = load;
cfs_rq->last_h_load_update = now;
@@ -5881,7 +5895,7 @@ static unsigned long task_h_load(struct task_struct *p)

update_cfs_rq_h_load(cfs_rq);
return div64_ul(p->se.avg.load_avg * cfs_rq->h_load,
- cfs_rq->avg.load_avg + 1);
+ cfs_rq_load_avg(cfs_rq) + 1);
}
#else
static inline void update_blocked_averages(int cpu)