by Steve Muckle

[permalink] [raw]

Subject: Re: [PATCH 1/2] sched/fair: move cpufreq hook to update_cfs_rq_load_avg()

On Tue, Apr 12, 2016 at 04:29:06PM +0200, Rafael J. Wysocki wrote:
> On Mon, Apr 11, 2016 at 11:20 PM, Rafael J. Wysocki <[email protected]> wrote:
> > On Mon, Apr 11, 2016 at 9:28 PM, Steve Muckle <[email protected]> wrote:
> >> Hi Rafael,
> >>
> >> On 04/01/2016 02:20 AM, Peter Zijlstra wrote:
> >>>> > My thinking was in CFS we get rid of the (cpu == smp_processor_id())
> >>>> > condition for calling the cpufreq hook.
> >>>> >
> >>>> > The sched governor can then calculate utilization and frequency required
> >>>> > for cpu. If (cpu == smp_processor_id()), the update is processed
> >>>> > normally. If (cpu != smp_processor_id()) and the new frequency is higher
> >>>> > than cpu's Fcur, the sched gov IPIs cpu to continue running the update
> >>>> > operation. Otherwise, the update is dropped.
> >>>> >
> >>>> > Does that sound plausible?
> >>>
> >>> Can be done I suppose..
> >>
> >> Currently we drop schedutil updates for a target CPU which do not occur
> >> on that CPU.
> >>
> >> Is this solely due to platforms which must run the cpufreq driver on the
> >> target CPU?
> >
> > The current code assumes that the CPU running the update will always
> > be the one that gets updated. Anything else would require extra
> > synchronization.
>
> This is rather fundamental.
>
> For example, if you look at cpufreq_update_util(), it does this:
>
> data = rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data));
>
> meaning that it will run the current CPU's utilization update
> callback. Of course, that won't work cross-CPU, because in principle
> different CPUs may use different governors and therefore different
> util update callbacks.
>
> If you want to do remote updates, I guess that will require an
> irq_work to run the update on the target CPU, but then you'll probably
> want to neglect the rate limit on it as well, so it looks like a
> "need_update" flag in struct update_util_data will be useful for that.
>
> I think I can prototype something along these lines, but can you
> please tell me more about the case you have in mind?

I'm concerned generally with the latency to react to changes in
required capacity due to remote wakeups, which are quite common on SMP
platforms with shared cache. Unless the hook is called it could take
up to a tick to react AFAICS if the target CPU is running some other
task that does not get preempted by the wakeup. That's a potentially
long time for say UI-critical applications and seems like a lost
opportunity for us to leverage closer scheduler-cpufreq communication
to get better performance.

thanks,
Steve

2016-04-13 00:08:32

Commit-ID: 41e0d37f7ac81297c07ba311e4ad39465b8c8295
Gitweb: http://git.kernel.org/tip/41e0d37f7ac81297c07ba311e4ad39465b8c8295
Author: Steve Muckle <[email protected]>
AuthorDate: Mon, 21 Mar 2016 17:21:08 -0700
Committer: Ingo Molnar <[email protected]>
CommitDate: Sat, 23 Apr 2016 14:20:36 +0200

sched/fair: Do not call cpufreq hook unless util changed

There's no reason to call the cpufreq hook if the root cfs_rq
utilization has not been modified.

Signed-off-by: Steve Muckle <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Michael Turquette <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Morten Rasmussen <[email protected]>
Cc: Patrick Bellasi <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Vincent Guittot <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/fair.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6df80d4..8155281 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2879,20 +2879,21 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
{
struct sched_avg *sa = &cfs_rq->avg;
struct rq *rq = rq_of(cfs_rq);
- int decayed, removed = 0;
+ int decayed, removed_load = 0, removed_util = 0;
int cpu = cpu_of(rq);

if (atomic_long_read(&cfs_rq->removed_load_avg)) {
s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
sa->load_avg = max_t(long, sa->load_avg - r, 0);
sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
- removed = 1;
+ removed_load = 1;
}

if (atomic_long_read(&cfs_rq->removed_util_avg)) {
long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
sa->util_avg = max_t(long, sa->util_avg - r, 0);
sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
+ removed_util = 1;
}

decayed = __update_load_avg(now, cpu, sa,
@@ -2903,7 +2904,8 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
cfs_rq->load_last_update_time_copy = sa->last_update_time;
#endif

- if (cpu == smp_processor_id() && &rq->cfs == cfs_rq) {
+ if (cpu == smp_processor_id() && &rq->cfs == cfs_rq &&
+ (decayed || removed_util)) {
unsigned long max = rq->cpu_capacity_orig;

/*
@@ -2926,7 +2928,7 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
min(sa->util_avg, max), max);
}

- return decayed || removed;
+ return decayed || removed_load;
}

/* Update task and its cfs_rq load average */