This is a respin of:
https://lkml.org/lkml/2017/11/9/546
which has been rebased on v4.15-rc2 so that util_est now works on top
of PeterZ's recent:
[PATCH -v2 00/18] sched/fair: A bit of a cgroup/PELT overhaul
The aim of this series is to improve some PELT behaviors to make it a
better fit for the scheduling of tasks common in embedded mobile
use-cases, without affecting other classes of workloads.
A complete description of these behaviors has been presented in the
previous RFC [1] and further discussed during the last OSPM Summit [2]
as well as during the last two LPCs.
This series presents an implementation which improves on the initial RFC's
prototype. Specifically, this new implementation has been verified not to
noticeably impact the performance of:
perf bench sched messaging --pipe --thread --group 8 --loop 50000
when running 30 iterations on a dual-socket Intel(R) Xeon(R) CPU E5-2690 v2
@ 3.00GHz, with 10 cores (20 threads) per socket, and with
sched_feat(UTIL_EST) set to False.
With this feature enabled, the measured overhead is in the range of ~1%
on the same HW/SW test configuration.
That's the main reason why this sched feature is disabled by default.
A possible improvement could be the addition of a Kconfig option to toggle
the sched_feat default value on systems where a 1% overhead on hackbench is
not a concern, e.g. mobile systems, especially considering the benefits that
estimated utilization brings to the workloads of interest.
From a functional standpoint, this implementation shows a more stable
utilization signal, compared to mainline, when running synthetic benchmarks
describing a set of interesting target use-cases.
This allows for a better selection of the target CPU as well as a
faster selection of the most appropriate OPP.
A detailed description of the functional tests used has already been
covered in the previous RFC [1].
This series is based on v4.15-rc2 and is composed of four patches:
1) a small refactoring preparing the ground
2) introducing the required data structures to track util_est of both
TASKs and CPUs
3) make use of util_est in the wakeup and load balance paths
4) make use of util_est in schedutil for frequency selection
Cheers Patrick
.:: References
==============
[1] https://lkml.org/lkml/2017/8/25/195
[2] slides: http://retis.sssup.it/ospm-summit/Downloads/OSPM_PELT_DecayClampingVsUtilEst.pdf
video: http://youtu.be/adnSHPBGS-w
Changes v1->v2:
- rebase on top of v4.15-rc2
- tested that overhauled PELT code does not affect the util_est
Patrick Bellasi (4):
sched/fair: always use unsigned long for utilization
sched/fair: add util_est on top of PELT
sched/fair: use util_est in LB and WU paths
sched/cpufreq_schedutil: use util_est for OPP selection
include/linux/sched.h | 21 +++++
kernel/sched/cpufreq_schedutil.c | 6 +-
kernel/sched/debug.c | 4 +
kernel/sched/fair.c | 184 ++++++++++++++++++++++++++++++++++++---
kernel/sched/features.h | 5 ++
kernel/sched/sched.h | 1 +
6 files changed, 209 insertions(+), 12 deletions(-)
--
2.14.1
The util_avg signal computed by PELT is too variable for some use-cases.
For example, a big task waking up after a long sleep period will have its
utilization almost completely decayed. This introduces some latency before
schedutil will be able to pick the best frequency to run a task.
The same issue can affect task placement. Indeed, since the task
utilization is already decayed at wakeup, when the task is enqueued on a
CPU, a CPU which is actually running a big task can be temporarily
represented as almost empty. This leads to a race condition where other
tasks can potentially be allocated on a CPU which has just started to run a
big task that slept for a relatively long period.
Moreover, the PELT utilization of a task can be updated every millisecond,
thus making it a continuously changing value for certain longer-running
tasks. This means that the instantaneous PELT utilization of a RUNNING task
is not really meaningful for properly supporting scheduler decisions.
For all these reasons, a more stable signal can do a better job of
representing the expected/estimated utilization of a task/cfs_rq.
Such a signal can be easily created on top of PELT by still using it as
an estimator which produces values to be aggregated on meaningful
events.
This patch adds a simple implementation of util_est, a new signal built on
top of PELT's util_avg where:
util_est(task) = max(task::util_avg, f(task::util_avg@dequeue_times))
This allows us to remember how big a task has been reported to be by PELT
in its previous activations, via the function f(task::util_avg@dequeue_times).
If a task changes its behavior and runs even longer in a new activation,
after a certain time its util_est will just track the original PELT signal
(i.e. task::util_avg).
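As a rough, self-contained user-space sketch of the aggregation described
above (the sample values are purely illustrative; the helper mirrors the
fixed-point update introduced later in this patch):

#include <stdio.h>

#define UTIL_EST_WEIGHT_SHIFT	2	/* new sample weight w = 1/4 */

/*
 * One EWMA step, in the same fixed-point form used by the patch:
 *   ewma(t) = w * sample + (1 - w) * ewma(t-1)
 */
static unsigned long ewma_update(unsigned long ewma, unsigned long sample)
{
	ewma = sample + (ewma << UTIL_EST_WEIGHT_SHIFT) - ewma;
	return ewma >> UTIL_EST_WEIGHT_SHIFT;
}

int main(void)
{
	/*
	 * util_avg sampled at each dequeue: a task which has been big
	 * (~600) in past activations and whose PELT signal has just
	 * decayed to ~120 after a long sleep.
	 */
	unsigned long samples[] = { 600, 590, 610, 120 };
	unsigned long ewma = 0, last;
	int i;

	for (i = 0; i < 4; i++) {
		last = samples[i];
		ewma = ewma ? ewma_update(ewma, last) : last;
		/* util_est = max(last, ewma) stays "big" across the dip */
		printf("last=%lu ewma=%lu util_est=%lu\n",
		       last, ewma, last > ewma ? last : ewma);
	}
	return 0;
}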
The estimated utilization of a cfs_rq is defined only for root cfs_rqs.
That's because the only sensible consumers of this signal are the scheduler
and schedutil, when looking at the overall CPU utilization due to FAIR
tasks.
For this reason, the estimated utilization of a root cfs_rq is simply
defined as:
util_est(cfs_rq) = max(cfs_rq::util_avg, cfs_rq::util_est_runnable)
where:
cfs_rq::util_est_runnable = sum(util_est(task))
for each RUNNABLE task on that root cfs_rq
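In kernel-style C (as if added to kernel/sched/fair.c) this definition boils
down to something like the following sketch; the actual accessors are the
ones introduced by this and the following patches:

static inline unsigned long cpu_util_est_sketch(struct cfs_rq *cfs_rq)
{
	/*
	 * max of the PELT utilization and the sum of the util_est of
	 * the tasks currently RUNNABLE on this root cfs_rq
	 */
	return max(cfs_rq->avg.util_avg, cfs_rq->util_est_runnable);
}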
It's worth noting that the estimated utilization is tracked only for
entities of interest, specifically:
- Tasks: to better support task placement decisions
- root cfs_rqs: to better support both task placement decisions as
  well as frequency selection
Signed-off-by: Patrick Bellasi <[email protected]>
Reviewed-by: Brendan Jackman <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Morten Rasmussen <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes v1->v2:
- rebase on top of v4.15-rc2
- tested that overhauled PELT code does not affect the util_est
---
include/linux/sched.h | 21 ++++++++++
kernel/sched/debug.c | 4 ++
kernel/sched/fair.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++-
kernel/sched/features.h | 5 +++
kernel/sched/sched.h | 1 +
5 files changed, 132 insertions(+), 1 deletion(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 21991d668d35..b01c0dc75ef5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -338,6 +338,21 @@ struct sched_avg {
unsigned long util_avg;
};
+/**
+ * Estimated utilization for FAIR tasks.
+ *
+ * Support data structure to track an Exponential Weighted Moving Average
+ * (EWMA) of a FAIR task's utilization. New samples are added to the moving
+ * average each time a task completes an activation. Sample's weight is
+ * chosen so that the EWMA will be relatively insensitive to transient changes
+ * to the task's workload.
+ */
+struct util_est {
+ unsigned long last;
+ unsigned long ewma;
+#define UTIL_EST_WEIGHT_SHIFT 2
+};
+
struct sched_statistics {
#ifdef CONFIG_SCHEDSTATS
u64 wait_start;
@@ -562,6 +577,12 @@ struct task_struct {
const struct sched_class *sched_class;
struct sched_entity se;
+ /*
+ * Since we use se.avg.util_avg to update util_est fields,
+ * this last can benefit from being close to se which
+ * also defines se.avg as cache aligned.
+ */
+ struct util_est util_est;
struct sched_rt_entity rt;
#ifdef CONFIG_CGROUP_SCHED
struct task_group *sched_task_group;
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 1ca0130ed4f9..5ffa8234524a 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -567,6 +567,8 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
cfs_rq->avg.runnable_load_avg);
SEQ_printf(m, " .%-30s: %lu\n", "util_avg",
cfs_rq->avg.util_avg);
+ SEQ_printf(m, " .%-30s: %lu\n", "util_est_runnable",
+ cfs_rq->util_est_runnable);
SEQ_printf(m, " .%-30s: %ld\n", "removed.load_avg",
cfs_rq->removed.load_avg);
SEQ_printf(m, " .%-30s: %ld\n", "removed.util_avg",
@@ -1018,6 +1020,8 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
P(se.avg.runnable_load_avg);
P(se.avg.util_avg);
P(se.avg.last_update_time);
+ P(util_est.ewma);
+ P(util_est.last);
#endif
P(policy);
P(prio);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ad21550d008c..d8f3ed71010b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -732,6 +732,12 @@ void init_entity_runnable_average(struct sched_entity *se)
se->runnable_weight = se->load.weight;
/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
+
+ /* Utilization estimation */
+ if (entity_is_task(se)) {
+ task_of(se)->util_est.ewma = 0;
+ task_of(se)->util_est.last = 0;
+ }
}
static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
@@ -5153,6 +5159,20 @@ static inline void hrtick_update(struct rq *rq)
}
#endif
+static inline unsigned long task_util(struct task_struct *p);
+static inline unsigned long task_util_est(struct task_struct *p);
+
+static inline void util_est_enqueue(struct task_struct *p)
+{
+ struct cfs_rq *cfs_rq = &task_rq(p)->cfs;
+
+ if (!sched_feat(UTIL_EST))
+ return;
+
+ /* Update root cfs_rq's estimated utilization */
+ cfs_rq->util_est_runnable += task_util_est(p);
+}
+
/*
* The enqueue_task method is called before nr_running is
* increased. Here we update the fair scheduling stats and
@@ -5205,9 +5225,84 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
if (!se)
add_nr_running(rq, 1);
+ util_est_enqueue(p);
hrtick_update(rq);
}
+static inline void util_est_dequeue(struct task_struct *p, int flags)
+{
+ struct cfs_rq *cfs_rq = &task_rq(p)->cfs;
+ unsigned long util_last = task_util(p);
+ bool sleep = flags & DEQUEUE_SLEEP;
+ unsigned long ewma;
+ long util_est;
+
+ if (!sched_feat(UTIL_EST))
+ return;
+
+ /*
+ * Update root cfs_rq's estimated utilization
+ *
+ * If *p is the last task then the root cfs_rq's estimated utilization
+ * of a CPU is 0 by definition.
+ *
+ * Otherwise, in removing *p's util_est from its cfs_rq's
+ * util_est_runnable we should account for cases where this last
+ * activation of *p was longer then the previous ones.
+ * Also in these cases we need to set 0 the estimated utilization for
+ * the CPU.
+ */
+ if (cfs_rq->nr_running > 0) {
+ util_est = cfs_rq->util_est_runnable;
+ util_est -= task_util_est(p);
+ if (util_est < 0)
+ util_est = 0;
+ cfs_rq->util_est_runnable = util_est;
+ } else {
+ cfs_rq->util_est_runnable = 0;
+ }
+
+ /*
+ * Skip update of task's estimated utilization when the task has not
+ * yet completed an activation, e.g. being migrated.
+ */
+ if (!sleep)
+ return;
+
+ /*
+ * Skip update of task's estimated utilization when its EWMA is already
+ * ~1% close to its last activation value.
+ */
+ util_est = p->util_est.ewma;
+ if (abs(util_est - util_last) <= (SCHED_CAPACITY_SCALE / 100))
+ return;
+
+ /*
+ * Update Task's estimated utilization
+ *
+ * When *p completes an activation we can consolidate another sample
+ * about the task size. This is done by storing the last PELT value
+ * for this task and using this value to load another sample in the
+ * exponential weighted moving average:
+ *
+ * ewma(t) = w * task_util(p) + (1 - w) ewma(t-1)
+ * = w * task_util(p) + ewma(t-1) - w * ewma(t-1)
+ * = w * (task_util(p) + ewma(t-1) / w - ewma(t-1))
+ *
+ * Where 'w' is the weight of new samples, which is configured to be
+ * 0.25, thus making w=1/4
+ */
+ p->util_est.last = util_last;
+ ewma = p->util_est.ewma;
+ if (likely(ewma != 0)) {
+ ewma = util_last + (ewma << UTIL_EST_WEIGHT_SHIFT) - ewma;
+ ewma >>= UTIL_EST_WEIGHT_SHIFT;
+ } else {
+ ewma = util_last;
+ }
+ p->util_est.ewma = ewma;
+}
+
static void set_next_buddy(struct sched_entity *se);
/*
@@ -5264,6 +5359,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
if (!se)
sub_nr_running(rq, 1);
+ util_est_dequeue(p, flags);
hrtick_update(rq);
}
@@ -5721,7 +5817,6 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
return affine;
}
-static inline unsigned long task_util(struct task_struct *p);
static unsigned long cpu_util_wake(int cpu, struct task_struct *p);
static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
@@ -6216,6 +6311,11 @@ static inline unsigned long task_util(struct task_struct *p)
return p->se.avg.util_avg;
}
+static inline unsigned long task_util_est(struct task_struct *p)
+{
+ return max(p->util_est.ewma, p->util_est.last);
+}
+
/*
* cpu_util_wake: Compute cpu utilization with any contributions from
* the waking task p removed.
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 9552fd5854bf..e9f312acc0d3 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -85,3 +85,8 @@ SCHED_FEAT(ATTACH_AGE_LOAD, true)
SCHED_FEAT(WA_IDLE, true)
SCHED_FEAT(WA_WEIGHT, true)
SCHED_FEAT(WA_BIAS, true)
+
+/*
+ * UtilEstimation. Use estimated CPU utilization.
+ */
+SCHED_FEAT(UTIL_EST, false)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b19552a212de..8371839075fa 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -444,6 +444,7 @@ struct cfs_rq {
* CFS load tracking
*/
struct sched_avg avg;
+ unsigned long util_est_runnable;
#ifndef CONFIG_64BIT
u64 load_last_update_time_copy;
#endif
--
2.14.1
When schedutil looks at the CPU utilization, the current PELT value for
that CPU is returned straight away. In certain scenarios this can have
undesired side effects and delays on frequency selection.
For example, since the task utilization is decayed at wakeup time, a
long-sleeping big task which has just been enqueued does not immediately
add a significant contribution to the target CPU. This introduces some
latency before schedutil will be able to detect the best frequency required
by that task.
Moreover, the PELT signal build-up time is a function of the current
frequency, because of the scale-invariant load tracking support. Thus,
starting from a lower frequency, the utilization build-up time increases
even more and further delays the selection of the frequency which best
serves the task's requirements.
In order to reduce this kind of latency, this patch integrates the use of
the CPU's estimated utilization in the sugov_get_util function.
The estimated utilization of a CPU is defined to be the maximum between
its PELT's utilization and the sum of the estimated utilization of each
currently RUNNABLE task on that CPU.
This allows schedutil to properly represent the expected utilization of a
CPU which, for example, has just got a big task running after a long sleep
period, and ultimately to select the best frequency to run a task right
after it wakes up.
Signed-off-by: Patrick Bellasi <[email protected]>
Reviewed-by: Brendan Jackman <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Morten Rasmussen <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes v1->v2:
- rebase on top of v4.15-rc2
- tested that overhauled PELT code does not affect the util_est
---
kernel/sched/cpufreq_schedutil.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 2f52ec0f1539..465430d99440 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -183,7 +183,11 @@ static void sugov_get_util(unsigned long *util, unsigned long *max, int cpu)
cfs_max = arch_scale_cpu_capacity(NULL, cpu);
- *util = min(rq->cfs.avg.util_avg, cfs_max);
+ *util = rq->cfs.avg.util_avg;
+ if (sched_feat(UTIL_EST))
+ *util = max(*util, rq->cfs.util_est_runnable);
+ *util = min(*util, cfs_max);
+
*max = cfs_max;
}
--
2.14.1
Utilization and capacity are tracked as unsigned long; however, some
functions using them return an int, which is ultimately assigned back to
unsigned long variables.
Since there is no scope for using a different, signed type, this patch
consolidates the signatures of the functions returning utilization to
always use the native type.
As well as improving code consistency, this is also expected to benefit
code paths where utilization should be clamped, by avoiding further type
conversions or ugly type casts.
Signed-off-by: Patrick Bellasi <[email protected]>
Reviewed-by: Chris Redpath <[email protected]>
Reviewed-by: Brendan Jackman <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Morten Rasmussen <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: [email protected]
---
Changes v1->v2:
- rebase on top of v4.15-rc2
- tested that overhauled PELT code does not affect the util_est
---
kernel/sched/fair.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4037e19bbca2..ad21550d008c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5721,8 +5721,8 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
return affine;
}
-static inline int task_util(struct task_struct *p);
-static int cpu_util_wake(int cpu, struct task_struct *p);
+static inline unsigned long task_util(struct task_struct *p);
+static unsigned long cpu_util_wake(int cpu, struct task_struct *p);
static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
{
@@ -6203,7 +6203,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
* capacity_orig) as it useful for predicting the capacity required after task
* migrations (scheduler-driven DVFS).
*/
-static int cpu_util(int cpu)
+static unsigned long cpu_util(int cpu)
{
unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;
unsigned long capacity = capacity_orig_of(cpu);
@@ -6211,7 +6211,7 @@ static int cpu_util(int cpu)
return (util >= capacity) ? capacity : util;
}
-static inline int task_util(struct task_struct *p)
+static inline unsigned long task_util(struct task_struct *p)
{
return p->se.avg.util_avg;
}
@@ -6220,7 +6220,7 @@ static inline int task_util(struct task_struct *p)
* cpu_util_wake: Compute cpu utilization with any contributions from
* the waking task p removed.
*/
-static int cpu_util_wake(int cpu, struct task_struct *p)
+static unsigned long cpu_util_wake(int cpu, struct task_struct *p)
{
unsigned long util, capacity;
--
2.14.1
When the scheduler looks at the CPU utilization, the current PELT value
for a CPU is returned straight away. In certain scenarios this can have
undesired side effects on task placement.
For example, since the task utilization is decayed at wakeup time, when a
long-sleeping big task is enqueued it does not immediately add a
significant contribution to the target CPU.
As a result we generate a race condition where other tasks can be placed on
the same CPU while it is still considered relatively empty.
In order to reduce this kind of race condition, this patch introduces the
required support to integrate the use of the CPU's estimated utilization
in cpu_util_wake as well as in update_sg_lb_stats.
The estimated utilization of a CPU is defined to be the maximum between
its PELT's utilization and the sum of the estimated utilization of the
tasks currently RUNNABLE on that CPU.
This allows us to properly represent the expected utilization of a CPU
which, for example, has just got a big task running after a long sleep
period.
Signed-off-by: Patrick Bellasi <[email protected]>
Reviewed-by: Brendan Jackman <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Viresh Kumar <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Morten Rasmussen <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
Changes v1->v2:
- rebase on top of v4.15-rc2
- tested that overhauled PELT code does not affect the util_est
---
kernel/sched/fair.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 68 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d8f3ed71010b..373d631efa91 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6306,6 +6306,41 @@ static unsigned long cpu_util(int cpu)
return (util >= capacity) ? capacity : util;
}
+/**
+ * cpu_util_est: estimated utilization for the specified CPU
+ * @cpu: the CPU to get the estimated utilization for
+ *
+ * The estimated utilization of a CPU is defined to be the maximum between its
+ * PELT's utilization and the sum of the estimated utilization of the tasks
+ * currently RUNNABLE on that CPU.
+ *
+ * This allows to properly represent the expected utilization of a CPU which
+ * has just got a big task running since a long sleep period. At the same time
+ * however it preserves the benefits of the "blocked load" in describing the
+ * potential for other tasks waking up on the same CPU.
+ *
+ * Return: the estimated utilization for the specified CPU
+ */
+static inline unsigned long cpu_util_est(int cpu)
+{
+ unsigned long util, util_est;
+ unsigned long capacity;
+ struct cfs_rq *cfs_rq;
+
+ if (!sched_feat(UTIL_EST))
+ return cpu_util(cpu);
+
+ cfs_rq = &cpu_rq(cpu)->cfs;
+ util = cfs_rq->avg.util_avg;
+ util_est = cfs_rq->util_est_runnable;
+ util_est = max(util, util_est);
+
+ capacity = capacity_orig_of(cpu);
+ util_est = min(util_est, capacity);
+
+ return util_est;
+}
+
static inline unsigned long task_util(struct task_struct *p)
{
return p->se.avg.util_avg;
@@ -6322,16 +6357,43 @@ static inline unsigned long task_util_est(struct task_struct *p)
*/
static unsigned long cpu_util_wake(int cpu, struct task_struct *p)
{
- unsigned long util, capacity;
+ long util, util_est;
/* Task has no contribution or is new */
if (cpu != task_cpu(p) || !p->se.avg.last_update_time)
- return cpu_util(cpu);
+ return cpu_util_est(cpu);
- capacity = capacity_orig_of(cpu);
- util = max_t(long, cpu_rq(cpu)->cfs.avg.util_avg - task_util(p), 0);
+ /* Discount task's blocked util from CPU's util */
+ util = cpu_util(cpu) - task_util(p);
+ util = max(util, 0L);
- return (util >= capacity) ? capacity : util;
+ if (!sched_feat(UTIL_EST))
+ return util;
+
+ /*
+ * These are the main cases covered:
+ * - if *p is the only task sleeping on this CPU, then:
+ * cpu_util (== task_util) > util_est (== 0)
+ * and thus we return:
+ * cpu_util_wake = (cpu_util - task_util) = 0
+ *
+ * - if other tasks are SLEEPING on the same CPU, which is just waking
+ * up, then:
+ * cpu_util >= task_util
+ * cpu_util > util_est (== 0)
+ * and thus we discount *p's blocked utilization to return:
+ * cpu_util_wake = (cpu_util - task_util) >= 0
+ *
+ * - if other tasks are RUNNABLE on that CPU and
+ * util_est > cpu_util
+ * then we use util_est since it returns a more restrictive
+ * estimation of the spare capacity on that CPU, by just considering
+ * the expected utilization of tasks already runnable on that CPU.
+ */
+ util_est = cpu_rq(cpu)->cfs.util_est_runnable;
+ util = max(util, util_est);
+
+ return util;
}
/*
@@ -7857,7 +7919,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
load = source_load(i, load_idx);
sgs->group_load += load;
- sgs->group_util += cpu_util(i);
+ sgs->group_util += cpu_util_est(i);
sgs->sum_nr_running += rq->cfs.h_nr_running;
nr_running = rq->nr_running;
--
2.14.1
On 5 December 2017 at 18:10, Patrick Bellasi <[email protected]> wrote:
> Utilization and capacity are tracked as unsigned long, however some
> functions using them return an int which is ultimately assigned back to
> unsigned long variables.
>
> Since there is not scope on using a different and signed type, this
> consolidate the signature of functions returning utilization to always
> use the native type.
> As well as improving code consistency this is expected also benefits
> code paths where utilizations should be clamped by avoiding further type
> conversions or ugly type casts.
>
> Signed-off-by: Patrick Bellasi <[email protected]>
> Reviewed-by: Chris Redpath <[email protected]>
> Reviewed-by: Brendan Jackman <[email protected]>
> Reviewed-by: Dietmar Eggemann <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Vincent Guittot <[email protected]>
> Cc: Morten Rasmussen <[email protected]>
> Cc: Dietmar Eggemann <[email protected]>
> Cc: [email protected]
Acked-by: Vincent Guittot <[email protected]>
>
> ---
> Changes v1->v2:
> - rebase on top of v4.15-rc2
> - tested that overhauled PELT code does not affect the util_est
> ---
> kernel/sched/fair.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 4037e19bbca2..ad21550d008c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5721,8 +5721,8 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
> return affine;
> }
>
> -static inline int task_util(struct task_struct *p);
> -static int cpu_util_wake(int cpu, struct task_struct *p);
> +static inline unsigned long task_util(struct task_struct *p);
> +static unsigned long cpu_util_wake(int cpu, struct task_struct *p);
>
> static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
> {
> @@ -6203,7 +6203,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> * capacity_orig) as it useful for predicting the capacity required after task
> * migrations (scheduler-driven DVFS).
> */
> -static int cpu_util(int cpu)
> +static unsigned long cpu_util(int cpu)
> {
> unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;
> unsigned long capacity = capacity_orig_of(cpu);
> @@ -6211,7 +6211,7 @@ static int cpu_util(int cpu)
> return (util >= capacity) ? capacity : util;
> }
>
> -static inline int task_util(struct task_struct *p)
> +static inline unsigned long task_util(struct task_struct *p)
> {
> return p->se.avg.util_avg;
> }
> @@ -6220,7 +6220,7 @@ static inline int task_util(struct task_struct *p)
> * cpu_util_wake: Compute cpu utilization with any contributions from
> * the waking task p removed.
> */
> -static int cpu_util_wake(int cpu, struct task_struct *p)
> +static unsigned long cpu_util_wake(int cpu, struct task_struct *p)
> {
> unsigned long util, capacity;
>
> --
> 2.14.1
>
On Tue, Dec 05, 2017 at 05:10:14PM +0000, Patrick Bellasi wrote:
> With this feature enabled, the measured overhead is in the range of ~1%
> on the same HW/SW test configuration.
That's quite a lot; did you look where that comes from?
On Tue, Dec 05, 2017 at 05:10:16PM +0000, Patrick Bellasi wrote:
> + if (cfs_rq->nr_running > 0) {
> + util_est = cfs_rq->util_est_runnable;
> + util_est -= task_util_est(p);
> + if (util_est < 0)
> + util_est = 0;
> + cfs_rq->util_est_runnable = util_est;
> + } else {
I'm thinking that's an explicit load-store to avoid intermediate values
landing in cfs_rq->util_est_runnable, right?
That would need READ_ONCE() / WRITE_ONCE() I think, without that the
compiler is free to munge the lot together.
On Tue, Dec 05, 2017 at 05:10:16PM +0000, Patrick Bellasi wrote:
> +static inline void util_est_dequeue(struct task_struct *p, int flags)
> +{
> + struct cfs_rq *cfs_rq = &task_rq(p)->cfs;
> + unsigned long util_last = task_util(p);
> + bool sleep = flags & DEQUEUE_SLEEP;
> + unsigned long ewma;
> + long util_est;
> +
> + if (!sched_feat(UTIL_EST))
> + return;
> +
> + /*
> + * Update root cfs_rq's estimated utilization
> + *
> + * If *p is the last task then the root cfs_rq's estimated utilization
> + * of a CPU is 0 by definition.
> + *
> + * Otherwise, in removing *p's util_est from its cfs_rq's
> + * util_est_runnable we should account for cases where this last
> + * activation of *p was longer then the previous ones.
> + * Also in these cases we need to set 0 the estimated utilization for
> + * the CPU.
> + */
> + if (cfs_rq->nr_running > 0) {
> + util_est = cfs_rq->util_est_runnable;
> + util_est -= task_util_est(p);
> + if (util_est < 0)
> + util_est = 0;
> + cfs_rq->util_est_runnable = util_est;
> + } else {
> + cfs_rq->util_est_runnable = 0;
> + }
> +
> + /*
> + * Skip update of task's estimated utilization when the task has not
> + * yet completed an activation, e.g. being migrated.
> + */
> + if (!sleep)
> + return;
> +
> + /*
> + * Skip update of task's estimated utilization when its EWMA is already
> + * ~1% close to its last activation value.
> + */
> + util_est = p->util_est.ewma;
> + if (abs(util_est - util_last) <= (SCHED_CAPACITY_SCALE / 100))
> + return;
Isn't that computation almost as expensive as the stuff you're trying to
avoid?
> + /*
> + * Update Task's estimated utilization
> + *
> + * When *p completes an activation we can consolidate another sample
> + * about the task size. This is done by storing the last PELT value
> + * for this task and using this value to load another sample in the
> + * exponential weighted moving average:
> + *
> + * ewma(t) = w * task_util(p) + (1 - w) ewma(t-1)
> + * = w * task_util(p) + ewma(t-1) - w * ewma(t-1)
> + * = w * (task_util(p) + ewma(t-1) / w - ewma(t-1))
> + *
> + * Where 'w' is the weight of new samples, which is configured to be
> + * 0.25, thus making w=1/4
> + */
> + p->util_est.last = util_last;
> + ewma = p->util_est.ewma;
> + if (likely(ewma != 0)) {
Why special case 0? Yes it helps with the initial ramp-on, but would not
an asymmetric IIR (with a consistent upward bias) be better?
> + ewma = util_last + (ewma << UTIL_EST_WEIGHT_SHIFT) - ewma;
> + ewma >>= UTIL_EST_WEIGHT_SHIFT;
> + } else {
> + ewma = util_last;
> + }
> + p->util_est.ewma = ewma;
> +}
On Tue, Dec 05, 2017 at 05:10:16PM +0000, Patrick Bellasi wrote:
> @@ -562,6 +577,12 @@ struct task_struct {
>
> const struct sched_class *sched_class;
> struct sched_entity se;
> + /*
> + * Since we use se.avg.util_avg to update util_est fields,
> + * this last can benefit from being close to se which
> + * also defines se.avg as cache aligned.
> + */
> + struct util_est util_est;
> struct sched_rt_entity rt;
> #ifdef CONFIG_CGROUP_SCHED
> struct task_group *sched_task_group;
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index b19552a212de..8371839075fa 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -444,6 +444,7 @@ struct cfs_rq {
> * CFS load tracking
> */
> struct sched_avg avg;
> + unsigned long util_est_runnable;
> #ifndef CONFIG_64BIT
> u64 load_last_update_time_copy;
> #endif
So you put the util_est in task_struct (not sched_entity) but the
util_est_runnable in cfs_rq (not rq). Seems inconsistent.
On 13-Dec 17:03, Peter Zijlstra wrote:
> On Tue, Dec 05, 2017 at 05:10:14PM +0000, Patrick Bellasi wrote:
> > With this feature enabled, the measured overhead is in the range of ~1%
> > on the same HW/SW test configuration.
>
> That's quite a lot; did you look where that comes from?
I've tracked it down to util_est_dequeue() introduced by PATCH 2/4, mainly
due to the EWMA update. Initially the running average was implemented using
the library function provided in:
include/linux/average.h::DECLARE_EWMA
but that solution generated even more overhead.
That's why we switched to an "inline custom" implementation.
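For reference, the library-based variant looked roughly like the following
(a reconstruction, not the code which was actually benchmarked; the
precision/weight parameters here are illustrative):

#include <linux/average.h>
#include <linux/kernel.h>

/* 2 bits of fractional precision, new samples weighted by w = 1/4 */
DECLARE_EWMA(task_util, 2, 4)

struct util_est_lib {
	struct ewma_task_util	ewma;
	unsigned long		last;
};

/* called when a task completes an activation, i.e. at dequeue */
static inline void util_est_sample(struct util_est_lib *ue,
				   unsigned long util_last)
{
	ue->last = util_last;
	ewma_task_util_add(&ue->ewma, util_last);
}

static inline unsigned long util_est_read(struct util_est_lib *ue)
{
	return max(ue->last, ewma_task_util_read(&ue->ewma));
}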
Hackbench is quite stressful for that path and we also have a few branches
which can play a role. One, for example, has been added to avoid the EWMA
update when the rolling average is "close enough" to the current PELT value.
All that considered, that's why the sched_feat is disabled by default, in
which case we have zero overhead... in a !SCHED_DEBUG kernel the code is
actually removed by the compiler.
In mobile systems (i.e. non-hackbench scenarios) the additional benefits in
task placement and OPP selection are likely still worth the overhead.
Do you think the idea of having a Kconfig option to enable this feature
only on systems which do not care about the possible overhead is a viable
solution?
--
#include <best/regards.h>
Patrick Bellasi
On 13-Dec 17:19, Peter Zijlstra wrote:
> On Tue, Dec 05, 2017 at 05:10:16PM +0000, Patrick Bellasi wrote:
> > @@ -562,6 +577,12 @@ struct task_struct {
> >
> > const struct sched_class *sched_class;
> > struct sched_entity se;
> > + /*
> > + * Since we use se.avg.util_avg to update util_est fields,
> > + * this last can benefit from being close to se which
> > + * also defines se.avg as cache aligned.
> > + */
> > + struct util_est util_est;
> > struct sched_rt_entity rt;
> > #ifdef CONFIG_CGROUP_SCHED
> > struct task_group *sched_task_group;
>
>
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index b19552a212de..8371839075fa 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -444,6 +444,7 @@ struct cfs_rq {
> > * CFS load tracking
> > */
> > struct sched_avg avg;
> > + unsigned long util_est_runnable;
> > #ifndef CONFIG_64BIT
> > u64 load_last_update_time_copy;
> > #endif
>
>
> So you put the util_est in task_struct (not sched_entity) but the
> util_est_runnable in cfs_rq (not rq). Seems inconsistent.
One goal was to keep the util_est variables close to the util_avg used to
load the filter, for cache affinity's sake.
The other goal was to have util_est data only for tasks and CPUs' RQs,
thus avoiding unused data for TGs' RQs and SEs.
Unfortunately the first goal does not allow us to completely achieve the
second and, you're right, the solution looks a bit inconsistent.
Do you think we should rather disregard cache proximity and move
util_est_runnable to rq?
--
#include <best/regards.h>
Patrick Bellasi
On Wed, Dec 13, 2017 at 04:36:53PM +0000, Patrick Bellasi wrote:
> On 13-Dec 17:19, Peter Zijlstra wrote:
> > On Tue, Dec 05, 2017 at 05:10:16PM +0000, Patrick Bellasi wrote:
> > > @@ -562,6 +577,12 @@ struct task_struct {
> > >
> > > const struct sched_class *sched_class;
> > > struct sched_entity se;
> > > + /*
> > > + * Since we use se.avg.util_avg to update util_est fields,
> > > + * this last can benefit from being close to se which
> > > + * also defines se.avg as cache aligned.
> > > + */
> > > + struct util_est util_est;
The thing is, since sched_entity has a member with cacheline alignment,
the whole structure must have cacheline alignment, and this util_est
_will_ start on a new line.
See also:
$ pahole -EC task_struct defconfig/kernel/sched/core.o
...
struct sched_avg {
/* typedef u64 */ long long unsigned int last_update_time; /* 576 8 */
/* typedef u64 */ long long unsigned int load_sum; /* 584 8 */
/* typedef u32 */ unsigned int util_sum; /* 592 4 */
/* typedef u32 */ unsigned int period_contrib; /* 596 4 */
long unsigned int load_avg; /* 600 8 */
long unsigned int util_avg; /* 608 8 */
} avg; /* 576 40 */
/* --- cacheline 6 boundary (384 bytes) --- */
} se; /* 192 448 */
/* --- cacheline 8 boundary (512 bytes) was 24 bytes ago --- */
struct util_est {
long unsigned int last; /* 640 8 */
long unsigned int ewma; /* 648 8 */
} util_est; /* 640 16 */
...
The thing is somewhat confused on which cacheline is which, but you'll
see sched_avg landing at 576 (cacheline #9) and util_est at 640 (line
#10).
> > > struct sched_rt_entity rt;
> > > #ifdef CONFIG_CGROUP_SCHED
> > > struct task_group *sched_task_group;
> One goal was to keep util_est variables close to the util_avg used to
> load the filter, for caches affinity sakes.
>
> The other goal was to have util_est data only for Tasks and CPU's
> RQ, thus avoiding unused data for TG's RQ and SE.
>
> Unfortunately the first goal does not allow to achieve completely the
> second and, you right, the solution looks a bit inconsistent.
>
> Do you think we should better disregard cache proximity and move
> util_est_runnable to rq?
proximity is likely important; I'd suggest moving util_est into
sched_entity.
On Tue, 2017-12-05 at 17:10 +0000, Patrick Bellasi wrote:
> This is a respin of:
> https://lkml.org/lkml/2017/11/9/546
> which has been rebased on v4.15-rc2 to have util_est now working on top
> of the recent PeterZ's:
> [PATCH -v2 00/18] sched/fair: A bit of a cgroup/PELT overhaul
>
> The aim of this series is to improve some PELT behaviors to make it a
> better fit for the scheduling of tasks common in embedded mobile
> use-cases, without affecting other classes of workloads.
I thought perhaps this patch set would improve the below behavior, but
alas it does not.  That's 3 instances of firefox playing youtube clips
being shoved into a corner by hogs sitting on 7 of 8 runqueues.  PELT
serializes the threaded desktop, making that threading kinda pointless,
and CFS not all that fair.
6569 root 20 0 4048 704 628 R 100.0 0.004 5:10.48 7 cpuhog
6573 root 20 0 4048 712 636 R 100.0 0.004 5:07.47 5 cpuhog
6581 root 20 0 4048 696 620 R 100.0 0.004 5:07.36 1 cpuhog
6585 root 20 0 4048 812 736 R 100.0 0.005 5:08.14 4 cpuhog
6589 root 20 0 4048 712 636 R 100.0 0.004 5:06.42 6 cpuhog
6577 root 20 0 4048 720 644 R 99.80 0.005 5:06.52 3 cpuhog
6593 root 20 0 4048 728 652 R 99.60 0.005 5:04.25 0 cpuhog
6755 mikeg 20 0 2714788 885324 179196 S 19.96 5.544 2:14.36 2 Web Content
6620 mikeg 20 0 2318348 312336 145044 S 8.383 1.956 0:51.51 2 firefox
3190 root 20 0 323944 71704 42368 S 3.194 0.449 0:11.90 2 Xorg
3718 root 20 0 3009580 67112 49256 S 0.599 0.420 0:02.89 2 kwin_x11
3761 root 20 0 769760 90740 62048 S 0.399 0.568 0:03.46 2 konsole
3845 root 9 -11 791224 20132 14236 S 0.399 0.126 0:03.00 2 pulseaudio
3722 root 20 0 3722308 172568 88088 S 0.200 1.081 0:04.35 2 plasmashel
------------------------------------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | Sum delay ms | Maximum delay at |
------------------------------------------------------------------------------------------------------------------------------------
Web Content:6755 | 2864.862 ms | 7314 | avg: 0.299 ms | max: 40.374 ms | sum: 2189.472 ms | max at: 375.769240 |
Compositor:6680 | 1889.847 ms | 4672 | avg: 0.531 ms | max: 29.092 ms | sum: 2478.559 ms | max at: 375.759405 |
MediaPl~back #3:(13) | 3269.777 ms | 7853 | avg: 0.218 ms | max: 19.451 ms | sum: 1711.635 ms | max at: 391.123970 |
MediaPl~back #4:(10) | 1472.986 ms | 8189 | avg: 0.236 ms | max: 18.653 ms | sum: 1933.886 ms | max at: 376.124211 |
MediaPl~back #1:(9) | 601.788 ms | 6598 | avg: 0.247 ms | max: 17.823 ms | sum: 1627.852 ms | max at: 401.122567 |
firefox:6620 | 303.181 ms | 6232 | avg: 0.111 ms | max: 15.602 ms | sum: 691.865 ms | max at: 385.078558 |
Socket Thread:6639 | 667.537 ms | 4806 | avg: 0.069 ms | max: 12.638 ms | sum: 329.387 ms | max at: 380.827323 |
MediaPD~oder #1:6835 | 154.737 ms | 1592 | avg: 0.700 ms | max: 10.139 ms | sum: 1113.688 ms | max at: 392.575370 |
MediaTimer #1:6828 | 42.660 ms | 5250 | avg: 0.575 ms | max: 9.845 ms | sum: 3018.994 ms | max at: 380.823677 |
MediaPD~oder #2:6840 | 150.822 ms | 1583 | avg: 0.703 ms | max: 9.639 ms | sum: 1112.962 ms | max at: 380.823741 |
...
On 13-Dec 18:03, Peter Zijlstra wrote:
> On Wed, Dec 13, 2017 at 04:36:53PM +0000, Patrick Bellasi wrote:
> > On 13-Dec 17:19, Peter Zijlstra wrote:
> > > On Tue, Dec 05, 2017 at 05:10:16PM +0000, Patrick Bellasi wrote:
> > > > @@ -562,6 +577,12 @@ struct task_struct {
> > > >
> > > > const struct sched_class *sched_class;
> > > > struct sched_entity se;
> > > > + /*
> > > > + * Since we use se.avg.util_avg to update util_est fields,
> > > > + * this last can benefit from being close to se which
> > > > + * also defines se.avg as cache aligned.
> > > > + */
> > > > + struct util_est util_est;
>
> The thing is, since sched_entity has a member with cacheline alignment,
> the whole structure must have cacheline alignment, and this util_est
> _will_ start on a new line.
Right, I was not considering that "aligned" also affects the start of the
following data.
> See also:
>
> $ pahole -EC task_struct defconfig/kernel/sched/core.o
>
> ...
> struct sched_avg {
> /* typedef u64 */ long long unsigned int last_update_time; /* 576 8 */
> /* typedef u64 */ long long unsigned int load_sum; /* 584 8 */
> /* typedef u32 */ unsigned int util_sum; /* 592 4 */
> /* typedef u32 */ unsigned int period_contrib; /* 596 4 */
> long unsigned int load_avg; /* 600 8 */
> long unsigned int util_avg; /* 608 8 */
> } avg; /* 576 40 */
> /* --- cacheline 6 boundary (384 bytes) --- */
> } se; /* 192 448 */
> /* --- cacheline 8 boundary (512 bytes) was 24 bytes ago --- */
> struct util_est {
> long unsigned int last; /* 640 8 */
> long unsigned int ewma; /* 648 8 */
> } util_est; /* 640 16 */
> ...
>
> The thing is somewhat confused on which cacheline is which, but you'll
> see sched_avg landing at 576 (cacheline #9) and util_est at 640 (line
> #10).
>
> > > > struct sched_rt_entity rt;
> > > > #ifdef CONFIG_CGROUP_SCHED
> > > > struct task_group *sched_task_group;
>
> > One goal was to keep util_est variables close to the util_avg used to
> > load the filter, for caches affinity sakes.
> >
> > The other goal was to have util_est data only for Tasks and CPU's
> > RQ, thus avoiding unused data for TG's RQ and SE.
> >
> > Unfortunately the first goal does not allow to achieve completely the
> > second and, you right, the solution looks a bit inconsistent.
> >
> > Do you think we should better disregard cache proximity and move
> > util_est_runnable to rq?
>
> proximity is likely important; I'd suggest moving util_est into
> sched_entity.
So, by moving util_est right after sched_avg, here is what we get (with some
lines to better highlight 64B boundaries):
const struct sched_class * sched_class; /* 152 8 */
struct sched_entity {
[...]
---[ Line 9 ]-------------------------------------------------------------------------------
struct sched_avg {
/* typedef u64 */ long long unsigned int last_update_time; /* 576 8 */
/* typedef u64 */ long long unsigned int load_sum; /* 584 8 */
/* typedef u64 */ long long unsigned int runnable_load_sum; /* 592 8 */
/* typedef u32 */ unsigned int util_sum; /* 600 4 */
/* typedef u32 */ unsigned int period_contrib; /* 604 4 */
long unsigned int load_avg; /* 608 8 */
long unsigned int runnable_load_avg; /* 616 8 */
long unsigned int util_avg; /* 624 8 */
} avg; /* 576 56 */
/* --- cacheline 6 boundary (384 bytes) was 24 bytes ago --- */
struct util_est {
long unsigned int last; /* 632 8 */
---[ Line 10 ]------------------------------------------------------------------------------
long unsigned int ewma; /* 640 8 */
} util_est; /* 632 16 */
} se; /* 192 512 */
---[ Line 11 ]------------------------------------------------------------------------------
/* --- cacheline 9 boundary (576 bytes) was 24 bytes ago --- */
struct sched_rt_entity {
struct list_head {
struct list_head * next; /* 704 8 */
struct list_head * prev; /* 712 8 */
} run_list; /* 704 16 */
As you can see we still end up with util_est spanning across two cache
lines and, even worse, with an almost empty Line 10. The point is that
sched_avg already uses 56B... which leaves just 8 bytes.
So, I can move util_est there and use unsigned int for "last" and "ewma"
storage. This should fix the cache alignment, but only until we add other
stuff to sched_avg.
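Something like this (sketch only):

struct util_est {
	u32	last;
	u32	ewma;
#define UTIL_EST_WEIGHT_SHIFT 2
};

i.e. 8 bytes in total, which would fit in the 8 bytes left on the cache
line after the 56 bytes used by sched_avg.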
BTW, shouldn't it be possible to use a similar slimming approach for
load_avg and runnable_load_avg? Given their range, a u32 should be good
enough, shouldn't it?
Cheers Patrick
--
#include <best/regards.h>
Patrick Bellasi
On 13-Dec 17:16, Peter Zijlstra wrote:
> On Tue, Dec 05, 2017 at 05:10:16PM +0000, Patrick Bellasi wrote:
> > +static inline void util_est_dequeue(struct task_struct *p, int flags)
> > +{
> > + struct cfs_rq *cfs_rq = &task_rq(p)->cfs;
> > + unsigned long util_last = task_util(p);
> > + bool sleep = flags & DEQUEUE_SLEEP;
> > + unsigned long ewma;
> > + long util_est;
> > +
> > + if (!sched_feat(UTIL_EST))
> > + return;
> > +
> > + /*
> > + * Update root cfs_rq's estimated utilization
> > + *
> > + * If *p is the last task then the root cfs_rq's estimated utilization
> > + * of a CPU is 0 by definition.
> > + *
> > + * Otherwise, in removing *p's util_est from its cfs_rq's
> > + * util_est_runnable we should account for cases where this last
> > + * activation of *p was longer then the previous ones.
> > + * Also in these cases we need to set 0 the estimated utilization for
> > + * the CPU.
> > + */
> > + if (cfs_rq->nr_running > 0) {
> > + util_est = cfs_rq->util_est_runnable;
> > + util_est -= task_util_est(p);
> > + if (util_est < 0)
> > + util_est = 0;
> > + cfs_rq->util_est_runnable = util_est;
> > + } else {
> > + cfs_rq->util_est_runnable = 0;
> > + }
> > +
> > + /*
> > + * Skip update of task's estimated utilization when the task has not
> > + * yet completed an activation, e.g. being migrated.
> > + */
> > + if (!sleep)
> > + return;
> > +
> > + /*
> > + * Skip update of task's estimated utilization when its EWMA is already
> > + * ~1% close to its last activation value.
> > + */
> > + util_est = p->util_est.ewma;
> > + if (abs(util_est - util_last) <= (SCHED_CAPACITY_SCALE / 100))
> > + return;
>
> Isn't that computation almost as expensive as the stuff you're trying to
> avoid?
Mmm... maybe slightly simpler. I'll profile it again, but I remember I
added it because it was slightly better on hackbench.
In the end this code is just a "sub" and a "compare to constant", and it's
likely to bail out early for all "almost regular" tasks.
Are you worried about the branch overhead?
> > + /*
> > + * Update Task's estimated utilization
> > + *
> > + * When *p completes an activation we can consolidate another sample
> > + * about the task size. This is done by storing the last PELT value
> > + * for this task and using this value to load another sample in the
> > + * exponential weighted moving average:
> > + *
> > + * ewma(t) = w * task_util(p) + (1 - w) ewma(t-1)
> > + * = w * task_util(p) + ewma(t-1) - w * ewma(t-1)
> > + * = w * (task_util(p) + ewma(t-1) / w - ewma(t-1))
> > + *
> > + * Where 'w' is the weight of new samples, which is configured to be
> > + * 0.25, thus making w=1/4
> > + */
> > + p->util_est.last = util_last;
> > + ewma = p->util_est.ewma;
> > + if (likely(ewma != 0)) {
>
> Why special case 0? Yes it helps with the initial ramp-on, but would not
> an asymmetric IIR (with a consistent upward bias) be better?
Yes, maybe the fast ramp-up is not really necessary... I'll test without it
on some real use-cases and see if we really get any noticeable benefit,
otherwise I'll remove it.
Thanks for pointing this out.
> > + ewma = util_last + (ewma << UTIL_EST_WEIGHT_SHIFT) - ewma;
> > + ewma >>= UTIL_EST_WEIGHT_SHIFT;
> > + } else {
> > + ewma = util_last;
> > + }
> > + p->util_est.ewma = ewma;
> > +}
--
#include <best/regards.h>
Patrick Bellasi
On Fri, Dec 15, 2017 at 12:14:17PM +0000, Patrick Bellasi wrote:
> On 13-Dec 17:16, Peter Zijlstra wrote:
> > > + /*
> > > + * Skip update of task's estimated utilization when its EWMA is already
> > > + * ~1% close to its last activation value.
> > > + */
> > > + util_est = p->util_est.ewma;
> > > + if (abs(util_est - util_last) <= (SCHED_CAPACITY_SCALE / 100))
> > > + return;
> >
> > Isn't that computation almost as expensive as the stuff you're trying to
> > avoid?
>
> Mmm... maybe slightly simpler. I'll profile it again but I remember
> I've added it because it was slightly better on backbench.
>
> This code at the end it's just a "sub" and a "compare to constant" and
> it's likely to bail early for all "almost regular" tasks.
>
> Are you worried about the branch overhead?
It's a subtract, a test for sign, a conditional branch on test, a negate,
a subtract of a constant and another conditional branch.
Branch overhead certainly matters too.
> > > + p->util_est.last = util_last;
> > > + ewma = p->util_est.ewma;
> > > + if (likely(ewma != 0)) {
> >
> > Why special case 0? Yes it helps with the initial ramp-on, but would not
> > an asymmetric IIR (with a consistent upward bias) be better?
>
> Yes, maybe the fast ramp-up is not really necessary... I'll test it
> without on some real use-cases and see if we really get any noticiable
> benefit, otheriwse I'll remove it.
>
> Thanks for pointing this out.
>
> > > + ewma = util_last + (ewma << UTIL_EST_WEIGHT_SHIFT) - ewma;
> > > + ewma >>= UTIL_EST_WEIGHT_SHIFT;
> > > + } else {
> > > + ewma = util_last;
> > > + }
> > > + p->util_est.ewma = ewma;
And this, without the 0 case, is a shift, an add, a subtract and another
shift, followed by a store.
Which is fewer branches and roughly similar arith ops, some of which can be
done in parallel.
I suspect what you saw on the profile is the cacheline hit of the store,
but I'm not sure.
On Fri, Dec 15, 2017 at 12:03:31PM +0000, Patrick Bellasi wrote:
> So, by moving util_est right after sched_avg, here is what we get (with some
> lines to better highlight 64B boundaries):
>
> const struct sched_class * sched_class; /* 152 8 */
> struct sched_entity {
> [...]
> ---[ Line 9 ]-------------------------------------------------------------------------------
> struct sched_avg {
> /* typedef u64 */ long long unsigned int last_update_time; /* 576 8 */
> /* typedef u64 */ long long unsigned int load_sum; /* 584 8 */
> /* typedef u64 */ long long unsigned int runnable_load_sum; /* 592 8 */
> /* typedef u32 */ unsigned int util_sum; /* 600 4 */
> /* typedef u32 */ unsigned int period_contrib; /* 604 4 */
> long unsigned int load_avg; /* 608 8 */
> long unsigned int runnable_load_avg; /* 616 8 */
> long unsigned int util_avg; /* 624 8 */
> } avg; /* 576 56 */
> /* --- cacheline 6 boundary (384 bytes) was 24 bytes ago --- */
> struct util_est {
> long unsigned int last; /* 632 8 */
> ---[ Line 10 ]------------------------------------------------------------------------------
> long unsigned int ewma; /* 640 8 */
> } util_est; /* 632 16 */
> } se; /* 192 512 */
> ---[ Line 11 ]------------------------------------------------------------------------------
> /* --- cacheline 9 boundary (576 bytes) was 24 bytes ago --- */
> struct sched_rt_entity {
> struct list_head {
> struct list_head * next; /* 704 8 */
> struct list_head * prev; /* 712 8 */
> } run_list; /* 704 16 */
>
>
> As you can see we still end up with util_est spanning acrosss two cache and
> even worst with an almost empty Line 10. The point is that sched_avg already
> uses 56B... which leave just 8bytes left.
Yes, that's unfortunate.
> So, I can to move util_est there and use unsigned int for "last" and "ewma"
> storage. This should fix the cache alignment but only until we do not add
> other stuff to sched_avg.
>
> BTW, should not be possible to use a similar "fasting" approach for load_avg
> and runnable_load_avg? Given their range a u32 should be just good enough,
> isn't it?
Probably, I'd have to page all that stuff back in :/
Another issue is that for tasks load and runnable_load are the exact
same; I just never found a sensible way to collapse that.
On 13-Dec 17:05, Peter Zijlstra wrote:
> On Tue, Dec 05, 2017 at 05:10:16PM +0000, Patrick Bellasi wrote:
> > + if (cfs_rq->nr_running > 0) {
> > + util_est = cfs_rq->util_est_runnable;
> > + util_est -= task_util_est(p);
> > + if (util_est < 0)
> > + util_est = 0;
> > + cfs_rq->util_est_runnable = util_est;
> > + } else {
>
> I'm thinking that's an explicit load-store to avoid intermediate values
> landing in cfs_rq->util_esp_runnable, right?
Was mainly to have an unsigned util_est for the following "sub"...
> That would need READ_ONCE() / WRITE_ONCE() I think, without that the
> compiler is free to munge the lot together.
... do we still need the {READ,WRITE}_ONCE() in this case?
I guess adding them however does not hurt.
Looking back at that code, it can likely be optimized to avoid the else
branch... will update and add the barriers.
Cheers Patrick.
--
#include <best/regards.h>
Patrick Bellasi
On Fri, Dec 15, 2017 at 02:02:18PM +0000, Patrick Bellasi wrote:
> On 13-Dec 17:05, Peter Zijlstra wrote:
> > On Tue, Dec 05, 2017 at 05:10:16PM +0000, Patrick Bellasi wrote:
> > > + if (cfs_rq->nr_running > 0) {
> > > + util_est = cfs_rq->util_est_runnable;
> > > + util_est -= task_util_est(p);
> > > + if (util_est < 0)
> > > + util_est = 0;
> > > + cfs_rq->util_est_runnable = util_est;
> > > + } else {
> >
> > I'm thinking that's an explicit load-store to avoid intermediate values
> > landing in cfs_rq->util_esp_runnable, right?
>
> Was mainly to have an unsigned util_est for the following "sub"...
>
>
> > That would need READ_ONCE() / WRITE_ONCE() I think, without that the
> > compiler is free to munge the lot together.
>
> ... do we still need the {READ,WRITE}_ONCE() in this case?
> I guess adding them however does not hurts.
I think the compiler is free to munge it into something like:
cfs_rq->util_est_runnable -= task_util_est();
if (cfs_rq->util_est_runnable < 0)
cfs_rq->util_est_runnable = 0
and it's a fairly simple optimization at that; it just needs to observe
that util_est is an alias for cfs_rq->util_est_runnable.
Using the volatile load/store completely destroys that optimization
though, so yes, I'd say it's definitely needed.
On 15-Dec 15:07, Peter Zijlstra wrote:
> On Fri, Dec 15, 2017 at 02:02:18PM +0000, Patrick Bellasi wrote:
> > On 13-Dec 17:05, Peter Zijlstra wrote:
> > > On Tue, Dec 05, 2017 at 05:10:16PM +0000, Patrick Bellasi wrote:
> > > > + if (cfs_rq->nr_running > 0) {
> > > > + util_est = cfs_rq->util_est_runnable;
> > > > + util_est -= task_util_est(p);
> > > > + if (util_est < 0)
> > > > + util_est = 0;
> > > > + cfs_rq->util_est_runnable = util_est;
> > > > + } else {
> > >
> > > I'm thinking that's an explicit load-store to avoid intermediate values
> > > landing in cfs_rq->util_esp_runnable, right?
> >
> > Was mainly to have an unsigned util_est for the following "sub"...
> >
> >
> > > That would need READ_ONCE() / WRITE_ONCE() I think, without that the
> > > compiler is free to munge the lot together.
> >
> > ... do we still need the {READ,WRITE}_ONCE() in this case?
> > I guess adding them however does not hurts.
>
This is just to better understand....
> I think the compiler is free to munge it into something like:
>
> cfs_rq->util_est_runnable -= task_util_est();
> if (cfs_rq->util_est_runnable < 0)
> cfs_rq->util_est_runnable = 0
>
I'm still confused, we have:
long util_est
unsigned long cfs_rq->util_est_runnable
The optimization above can potentially generate an overflow, can't it?
> and its a fairly simple optimization at that, it just needs to observe
> that util_est is an alias for cfs_rq->util_est_runnable.
Since the first is signed and the latter unsigned, can the compiler still
consider them an alias?
At least on ARM I would expect a load of an unsigned value, some
computations on "signed registers", and finally a store of an unsigned
value. This is what I get:
if (cfs_rq->nr_running > 0) {
51e4: 91020000 add x0, x0, #0x80
51e8: b9401802 ldr w2, [x0,#24]
51ec: 340004a2 cbz w2, 5280 <dequeue_task_fair+0xb20>
// skip branch for nr_running == 0
return max(p->util_est.ewma, p->util_est.last);
51f0: f9403ba2 ldr x2, [x29,#112]
51f4: f9418044 ldr x4, [x2,#768]
51f8: f9418443 ldr x3, [x2,#776]
// x3 := p->util_est.ewma
// x4 := p->util_est.last
util_est -= task_util_est(p);
51fc: f9405002 ldr x2, [x0,#160]
// x2 := cfs_rq->util_est_runnable
return max(p->util_est.ewma, p->util_est.last);
5200: eb04007f cmp x3, x4
5204: 9a842063 csel x3, x3, x4, cs
// x3 := task_util_est(p) := max(p->util_est.ewma, p->util_est.last)
cfs_rq->util_est_runnable = util_est;
5208: eb030042 subs x2, x2, x3
// x2 := util_est -= task_util_est(p);
520c: 9a9f5042 csel x2, x2, xzr, pl
// x2 := max(0, util_est)
5210: f9005002 str x2, [x0,#160]
// store back into cfs_rq->util_est_runnable
And by adding {READ,WRITE}_ONCE I still get the same code.
When compiling for x86, instead, I get two different versions; here is the
one without {READ,WRITE}_ONCE:
if (cfs_rq->nr_running > 0) {
3e3e: 8b 90 98 00 00 00 mov 0x98(%rax),%edx
3e44: 85 d2 test %edx,%edx
3e46: 0f 84 e0 00 00 00 je 3f2c <dequeue_task_fair+0xf9c>
util_est = cfs_rq->util_est_runnable;
util_est -= task_util_est(p);
3e4c: 48 8b 74 24 28 mov 0x28(%rsp),%rsi
3e51: 48 8b 96 80 02 00 00 mov 0x280(%rsi),%rdx
3e58: 48 39 96 88 02 00 00 cmp %rdx,0x288(%rsi)
3e5f: 48 0f 43 96 88 02 00 cmovae 0x288(%rsi),%rdx
3e66: 00
if (util_est < 0)
util_est = 0;
cfs_rq->util_est_runnable = util_est;
3e67: 48 8b b0 20 01 00 00 mov 0x120(%rax),%rsi
3e6e: 48 29 d6 sub %rdx,%rsi
3e71: 48 89 f2 mov %rsi,%rdx
3e74: be 00 00 00 00 mov $0x0,%esi
3e79: 48 0f 48 d6 cmovs %rsi,%rdx
3e7d: 48 89 90 20 01 00 00 mov %rdx,0x120(%rax)
but I'm not confident on "parsing it"...
> Using the volatile load/store completely destroys that optimization
> though, so yes, I'd say its definitely needed.
Ok, since it's definitely not harmful, I'll add it.
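I.e. something like the following (just a sketch on top of the hunk
quoted above, keeping the signed local; the else branch stays as it is):

	if (cfs_rq->nr_running > 0) {
		/* Load once, compute on a signed local, store once */
		long util_est = READ_ONCE(cfs_rq->util_est_runnable);

		util_est -= task_util_est(p);
		if (util_est < 0)
			util_est = 0;
		WRITE_ONCE(cfs_rq->util_est_runnable, util_est);
	}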
--
#include <best/regards.h>
Patrick Bellasi
On 15-Dec 13:53, Peter Zijlstra wrote:
> On Fri, Dec 15, 2017 at 12:14:17PM +0000, Patrick Bellasi wrote:
> > On 13-Dec 17:16, Peter Zijlstra wrote:
>
> > > > + /*
> > > > + * Skip update of task's estimated utilization when its EWMA is already
> > > > + * ~1% close to its last activation value.
> > > > + */
> > > > + util_est = p->util_est.ewma;
> > > > + if (abs(util_est - util_last) <= (SCHED_CAPACITY_SCALE / 100))
> > > > + return;
> > >
> > > Isn't that computation almost as expensive as the stuff you're trying to
> > > avoid?
> >
> > Mmm... maybe slightly simpler. I'll profile it again, but I remember
> > I added it because it was slightly better on hackbench.
> >
> > In the end this code is just a "sub" and a "compare to constant", and
> > it's likely to bail out early for all "almost regular" tasks.
> >
> > Are you worried about the branch overhead?
>
> It's a subtract, a test for sign, a conditional branch on that test, a negate,
> a subtract of a constant and another conditional branch.
Close enough, the actual code is:
util_est = p->util_est.ewma;
5218: f9403ba3 ldr x3, [x29,#112]
521c: f9418462 ldr x2, [x3,#776]
if (abs(util_est - util_last) <= (SCHED_CAPACITY_SCALE / 100))
5220: eb010040 subs x0, x2, x1
5224: da805400 cneg x0, x0, mi
5228: f100281f cmp x0, #0xa
522c: 54fff9cd b.le 5164 <dequeue_task_fair+0xa04>
>
> Branch overhead certainly matters too.
>
> > > > + p->util_est.last = util_last;
> > > > + ewma = p->util_est.ewma;
> > > > + if (likely(ewma != 0)) {
> > >
> > > Why special-case 0? Yes, it helps with the initial ramp-up, but would not
> > > an asymmetric IIR (with a consistent upward bias) be better?
> >
> > Yes, maybe the fast ramp-up is not really necessary... I'll test
> > without it on some real use-cases and see if we really get any noticeable
> > benefit, otherwise I'll remove it.
> >
> > Thanks for pointing this out.
> >
> > > > + ewma = util_last + (ewma << UTIL_EST_WEIGHT_SHIFT) - ewma;
> > > > + ewma >>= UTIL_EST_WEIGHT_SHIFT;
> > > > + } else {
> > > > + ewma = util_last;
> > > > + }
> > > > + p->util_est.ewma = ewma;
>
> And this, without the 0 case, is a shift, an add, a subtract and another
> shift, followed by a store.
Actual code:
p->util_est.last = util_last;
5230: f9018061 str x1, [x3,#768]
if (likely(ewma != 0)) {
5234: b40000a2 cbz x2, 5248 <dequeue_task_fair+0xae8>
ewma = util_last + (ewma << UTIL_EST_WEIGHT_SHIFT) - ewma;
5238: d37ef440 lsl x0, x2, #2
523c: cb020002 sub x2, x0, x2
5240: 8b010041 add x1, x2, x1
ewma >>= UTIL_EST_WEIGHT_SHIFT;
5244: d342fc21 lsr x1, x1, #2
p->util_est.ewma = ewma;
5248: f9403ba0 ldr x0, [x29,#112]
524c: f9018401 str x1, [x0,#776]
5250: 17ffffc5 b 5164 <dequeue_task_fair+0xa04>
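For reference, with UTIL_EST_WEIGHT_SHIFT=2 (as per the shifts above),
the update is just a moving average with a 1/4 weight for the new
sample, i.e. the same arithmetic as the code above, only spelled out:

	ewma = (util_last + (ewma << 2) - ewma) >> 2
	     = (util_last + 3 * ewma) / 4
	     = ewma + (util_last - ewma) / 4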
>
> Which is less branches and roughly similar arith ops, some of which can
> be done in parallel.
>
> I suspect what you saw on the profile is the cacheline hit of the store,
> but I'm not sure.
Yes, likely. Looking at the two chunks above, and considering the
removal of the 0 case, it's probably worth removing the first check.
I'll give it another try to measure the hackbench overhead with the cache
alignment fixed.
Cheers Patrick
--
#include <best/regards.h>
Patrick Bellasi
Hi Mike,
On 13-Dec 18:56, Mike Galbraith wrote:
> On Tue, 2017-12-05 at 17:10 +0000, Patrick Bellasi wrote:
> > This is a respin of:
> > https://lkml.org/lkml/2017/11/9/546
> > which has been rebased on v4.15-rc2 to have util_est now working on top
> > of the recent PeterZ's:
> > [PATCH -v2 00/18] sched/fair: A bit of a cgroup/PELT overhaul
> >
> > The aim of this series is to improve some PELT behaviors to make it a
> > better fit for the scheduling of tasks common in embedded mobile
> > use-cases, without affecting other classes of workloads.
>
> I thought perhaps this patch set would improve the below behavior, but
> alas it does not. That's 3 instances of firefox playing youtube clips
> being shoved into a corner by hogs sitting on 7 of 8 runqueues. PELT
> serializes the threaded desktop, making that threading kinda pointless,
> and CFS not all that fair.
Perhaps I don't completely get your use-case.
Are the cpuhog threads pinned to a CPU, or do they just happen to always
run on the same CPU?
I guess you would expect the three Firefox instances to be spread across
different CPUs. But whether this is possible also depends on the
specific task composition generated by Firefox, doesn't it?
Being a video playback pipeline, I would not be surprised to see that
most of the time we actually have only 1 or 2 tasks RUNNABLE, while
the others are sleeping... and if a HW decoder is involved, even if
you have three instances running you likely get only one pipeline
active at any one time...
If that's the case, why should CFS move Firefox tasks around?
> 6569 root 20 0 4048 704 628 R 100.0 0.004 5:10.48 7 cpuhog
> 6573 root 20 0 4048 712 636 R 100.0 0.004 5:07.47 5 cpuhog
> 6581 root 20 0 4048 696 620 R 100.0 0.004 5:07.36 1 cpuhog
> 6585 root 20 0 4048 812 736 R 100.0 0.005 5:08.14 4 cpuhog
> 6589 root 20 0 4048 712 636 R 100.0 0.004 5:06.42 6 cpuhog
> 6577 root 20 0 4048 720 644 R 99.80 0.005 5:06.52 3 cpuhog
> 6593 root 20 0 4048 728 652 R 99.60 0.005 5:04.25 0 cpuhog
> 6755 mikeg 20 0 2714788 885324 179196 S 19.96 5.544 2:14.36 2 Web Content
> 6620 mikeg 20 0 2318348 312336 145044 S 8.383 1.956 0:51.51 2 firefox
> 3190 root 20 0 323944 71704 42368 S 3.194 0.449 0:11.90 2 Xorg
> 3718 root 20 0 3009580 67112 49256 S 0.599 0.420 0:02.89 2 kwin_x11
> 3761 root 20 0 769760 90740 62048 S 0.399 0.568 0:03.46 2 konsole
> 3845 root 9 -11 791224 20132 14236 S 0.399 0.126 0:03.00 2 pulseaudio
> 3722 root 20 0 3722308 172568 88088 S 0.200 1.081 0:04.35 2 plasmashel
Is this always happening... or do the Firefox tasks sometimes get a chance
to run on CPUs other than CPU2?
Could it be that, looking at htop output, we just don't see these small opportunities?
> ------------------------------------------------------------------------------------------------------------------------------------
> Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | Sum delay ms | Maximum delay at |
> ------------------------------------------------------------------------------------------------------------------------------------
> Web Content:6755 | 2864.862 ms | 7314 | avg: 0.299 ms | max: 40.374 ms | sum: 2189.472 ms | max at: 375.769240 |
> Compositor:6680 | 1889.847 ms | 4672 | avg: 0.531 ms | max: 29.092 ms | sum: 2478.559 ms | max at: 375.759405 |
> MediaPl~back #3:(13) | 3269.777 ms | 7853 | avg: 0.218 ms | max: 19.451 ms | sum: 1711.635 ms | max at: 391.123970 |
> MediaPl~back #4:(10) | 1472.986 ms | 8189 | avg: 0.236 ms | max: 18.653 ms | sum: 1933.886 ms | max at: 376.124211 |
> MediaPl~back #1:(9) | 601.788 ms | 6598 | avg: 0.247 ms | max: 17.823 ms | sum: 1627.852 ms | max at: 401.122567 |
> firefox:6620 | 303.181 ms | 6232 | avg: 0.111 ms | max: 15.602 ms | sum: 691.865 ms | max at: 385.078558 |
> Socket Thread:6639 | 667.537 ms | 4806 | avg: 0.069 ms | max: 12.638 ms | sum: 329.387 ms | max at: 380.827323 |
> MediaPD~oder #1:6835 | 154.737 ms | 1592 | avg: 0.700 ms | max: 10.139 ms | sum: 1113.688 ms | max at: 392.575370 |
> MediaTimer #1:6828 | 42.660 ms | 5250 | avg: 0.575 ms | max: 9.845 ms | sum: 3018.994 ms | max at: 380.823677 |
> MediaPD~oder #2:6840 | 150.822 ms | 1583 | avg: 0.703 ms | max: 9.639 ms | sum: 1112.962 ms | max at: 380.823741 |
How do you get these stats?
It's definitely an interesting use-case; however, I think it's out of
the scope of util_est.
Regarding the specific statement "CFS not all that fair", I would say
that the fairness of CFS is defined, and has to be evaluated, within a
single CPU and on a temporal (not clock-cycles) basis.
AFAIK, vruntime progresses based on elapsed time, thus you can have
two tasks which get the same slice of time but consume it at different
frequencies. In that case too we are not that fair, are we?
Thus, in the end it all boils down to some (as much as possible)
low-overhead heuristics, and a proper description of a
reproducible use-case can help in improving them.
Can we model your use-case using a simple rt-app configuration?
This would likely give us a simple and reproducible testing
scenario to better understand where the issue actually is...
maybe by looking at an execution trace.
Cheers Patrick
--
#include <best/regards.h>
Patrick Bellasi
On Fri, 2017-12-15 at 16:13 +0000, Patrick Bellasi wrote:
> Hi Mike,
>
> On 13-Dec 18:56, Mike Galbraith wrote:
> > On Tue, 2017-12-05 at 17:10 +0000, Patrick Bellasi wrote:
> > > This is a respin of:
> > > https://lkml.org/lkml/2017/11/9/546
> > > which has been rebased on v4.15-rc2 to have util_est now working on top
> > > of the recent PeterZ's:
> > > [PATCH -v2 00/18] sched/fair: A bit of a cgroup/PELT overhaul
> > >
> > > The aim of this series is to improve some PELT behaviors to make it a
> > > better fit for the scheduling of tasks common in embedded mobile
> > > use-cases, without affecting other classes of workloads.
> >
> > I thought perhaps this patch set would improve the below behavior, but
> > alas it does not. That's 3 instances of firefox playing youtube clips
> > being shoved into a corner by hogs sitting on 7 of 8 runqueues. PELT
> > serializes the threaded desktop, making that threading kinda pointless,
> > and CFS not all that fair.
>
> Perhaps I don't completely get your use-case.
> Are the cpuhog threads pinned to a CPU, or do they just happen to always
> run on the same CPU?
Nothing is pinned.
> I guess you would expect the three Firefox instances to be spread across
> different CPUs. But whether this is possible also depends on the
> specific task composition generated by Firefox, doesn't it?
It depends on load balancing. We're letting firefox threads stack up
to 6 deep while single hogs dominate the box.
> Being a video playback pipeline, I would not be surprised to see that
> most of the time we actually have only 1 or 2 tasks RUNNABLE, while
> the others are sleeping... and if a HW decoder is involved, even if
> you have three instances running you likely get only one pipeline
> active at any one time...
>
> If that's the case, why should CFS move Firefox tasks around?
No, while they are indeed ~fairly synchronous, there is overlap. If
there were not, there would be no wait time being accumulated. The load
wants to consume roughly one full core's worth, but to achieve that, it
needs access to more than one runqueue, which we are not facilitating.
> Is this always happening... or do the Firefox tasks sometimes get a chance
> to run on CPUs other than CPU2?
There is some escape going on, but not enough for the load to get its
fair share. I have it sort of fixed up locally, but while the patch keeps
changing, it's not getting any prettier, nor is it particularly
interested in letting me keep some performance gains I want, so...
> How do you get these stats?
perf sched record/perf sched lat. I twiddled it to output accumulated
wait times as well for convenience, stock only shows max. See below.
If you play with perf sched, you'll notice some... oddities about it.
> It's definitely an interesting use-case; however, I think it's out of
> the scope of util_est.
Yeah. If I had been less busy and read the whole thing, I wouldn't
have taken it out for a spin.
> Regarding the specific statement "CFS not all that fair", I would say
> that the fairness of CFS is defined, and has to be evaluated, within a
> single CPU and on a temporal (not clock-cycles) basis.
No, that doesn't really fly. In fact, in the group scheduling code, we
actively pursue box-wide fairness. PELT is going a bit too far ATM.
Point: if you think it's OK to serialize these firefox threads, would
you still think so if those were kernel threads instead? Serializing
your kernel is a clear fail, but unpinned kthreads can be stacked up
just as effectively as those browser threads are, eat needless wakeup
latency and pass it on.
> AFAIK, vruntime progresses based on elapsed time, thus you can have
> two tasks which get the same slice of time but consume it at different
> frequencies. In that case too we are not that fair, are we?
Time slices don't really exist as a concrete quantum in CFS. There's
vruntime equalization, and that's it.
> Thus, in the end it all boils down to some (as much as possible)
> low-overhead heuristics, and a proper description of a
> reproducible use-case can help in improving them.
Nah, heuristics are fickle beasts, they WILL knife you in the back,
it's just a question of how often, and how deep.
> Can we model your use-case using a simple rt-app configuration?
No idea.
> This would likely give us a simple and reproducible testing
> scenario to better understand where the issue actually is...
> maybe by looking at an execution trace.
It should be reproducible by anyone, just fire up NR_CPUS-1 pure hogs,
point firefox at youtube, open three clips in tabs, watch tasks stack.
Root cause IMHO is PELT having grown too aggressive. SIS was made more
aggressive to compensate, but when you slam that door you get the full
PELT impact, and it stings, as does too aggressive bouncing when you
leave the escape hatch open. Sticky wicket, that. Both of those want a
gentle rap upside the head, as they're both acting a bit nutty.
-Mike
---
tools/perf/builtin-sched.c | 34 ++++++++++++++++++++++++++--------
1 file changed, 26 insertions(+), 8 deletions(-)
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -212,6 +212,7 @@ struct perf_sched {
u64 run_avg;
u64 all_runtime;
u64 all_count;
+ u64 all_lat;
u64 cpu_last_switched[MAX_CPUS];
struct rb_root atom_root, sorted_atom_root, merged_atom_root;
struct list_head sort_list, cmp_pid;
@@ -1286,6 +1287,7 @@ static void output_lat_thread(struct per
sched->all_runtime += work_list->total_runtime;
sched->all_count += work_list->nb_atoms;
+ sched->all_lat += work_list->total_lat;
if (work_list->num_merged > 1)
ret = printf(" %s:(%d) ", thread__comm_str(work_list->thread), work_list->num_merged);
@@ -1298,10 +1300,11 @@ static void output_lat_thread(struct per
avg = work_list->total_lat / work_list->nb_atoms;
timestamp__scnprintf_usec(work_list->max_lat_at, max_lat_at, sizeof(max_lat_at));
- printf("|%11.3f ms |%9" PRIu64 " | avg:%9.3f ms | max:%9.3f ms | max at: %13s s\n",
+ printf("|%11.3f ms |%9" PRIu64 " | avg:%9.3f ms | max:%9.3f ms | sum:%9.3f ms | max at: %13s s\n",
(double)work_list->total_runtime / NSEC_PER_MSEC,
work_list->nb_atoms, (double)avg / NSEC_PER_MSEC,
(double)work_list->max_lat / NSEC_PER_MSEC,
+ (double)work_list->total_lat / NSEC_PER_MSEC,
max_lat_at);
}
@@ -1347,6 +1350,16 @@ static int max_cmp(struct work_atoms *l,
return 0;
}
+static int sum_cmp(struct work_atoms *l, struct work_atoms *r)
+{
+ if (l->total_lat < r->total_lat)
+ return -1;
+ if (l->total_lat > r->total_lat)
+ return 1;
+
+ return 0;
+}
+
static int switch_cmp(struct work_atoms *l, struct work_atoms *r)
{
if (l->nb_atoms < r->nb_atoms)
@@ -1378,6 +1391,10 @@ static int sort_dimension__add(const cha
.name = "max",
.cmp = max_cmp,
};
+ static struct sort_dimension sum_sort_dimension = {
+ .name = "sum",
+ .cmp = sum_cmp,
+ };
static struct sort_dimension pid_sort_dimension = {
.name = "pid",
.cmp = pid_cmp,
@@ -1394,6 +1411,7 @@ static int sort_dimension__add(const cha
&pid_sort_dimension,
&avg_sort_dimension,
&max_sort_dimension,
+ &sum_sort_dimension,
&switch_sort_dimension,
&runtime_sort_dimension,
};
@@ -3090,9 +3108,9 @@ static int perf_sched__lat(struct perf_s
perf_sched__merge_lat(sched);
perf_sched__sort_lat(sched);
- printf("\n -----------------------------------------------------------------------------------------------------------------\n");
- printf(" Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | Maximum delay at |\n");
- printf(" -----------------------------------------------------------------------------------------------------------------\n");
+ printf("\n ------------------------------------------------------------------------------------------------------------------------------------\n");
+ printf(" Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | Sum delay ms | Maximum delay at |\n");
+ printf(" ------------------------------------------------------------------------------------------------------------------------------------\n");
next = rb_first(&sched->sorted_atom_root);
@@ -3105,11 +3123,11 @@ static int perf_sched__lat(struct perf_s
thread__zput(work_list->thread);
}
- printf(" -----------------------------------------------------------------------------------------------------------------\n");
- printf(" TOTAL: |%11.3f ms |%9" PRIu64 " |\n",
- (double)sched->all_runtime / NSEC_PER_MSEC, sched->all_count);
+ printf(" ------------------------------------------------------------------------------------------------------------\n");
+ printf(" TOTAL: |%11.3f ms |%9" PRIu64 " | |%14.3f ms |\n",
+ (double)sched->all_runtime / NSEC_PER_MSEC, sched->all_count, (double)sched->all_lat / NSEC_PER_MSEC);
- printf(" ---------------------------------------------------\n");
+ printf(" ------------------------------------------------------------------------------------------------------------\n");
print_bad_events(sched);
printf("\n");
On Tuesday, December 5, 2017 6:10:18 PM CET Patrick Bellasi wrote:
> When schedutil looks at the CPU utilization, the current PELT value for
> that CPU is returned straight away. In certain scenarios this can have
> undesired side effects and delays in frequency selection.
>
> For example, since the task utilization is decayed at wakeup time, a
> long-sleeping big task which is newly enqueued does not immediately add a
> significant contribution to the target CPU. This introduces some latency
> before schedutil will be able to detect the best frequency required by
> that task.
>
> Moreover, the PELT signal build-up time is a function of the current
> frequency, because of the scale-invariant load tracking support. Thus,
> starting from a lower frequency, the utilization build-up time will
> increase even more and further delay the selection of the actual
> frequency which best serves the task's requirements.
>
> In order to reduce this kind of latency, this patch integrates the
> use of the CPU's estimated utilization in the sugov_get_util function.
>
> The estimated utilization of a CPU is defined to be the maximum between
> its PELT utilization and the sum of the estimated utilizations of the
> currently RUNNABLE tasks on that CPU.
> This allows the expected utilization of a CPU to be properly represented
> when, for example, it has just got a big task running after a long sleep
> period, and it ultimately allows the best frequency to be selected to run
> a task right after it wakes up.
>
> Signed-off-by: Patrick Bellasi <[email protected]>
> Reviewed-by: Brendan Jackman <[email protected]>
> Reviewed-by: Dietmar Eggemann <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Cc: Viresh Kumar <[email protected]>
> Cc: Paul Turner <[email protected]>
> Cc: Vincent Guittot <[email protected]>
> Cc: Morten Rasmussen <[email protected]>
> Cc: Dietmar Eggemann <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
>
> ---
> Changes v1->v2:
> - rebase on top of v4.15-rc2
> - tested that overhauled PELT code does not affect the util_est
> ---
> kernel/sched/cpufreq_schedutil.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 2f52ec0f1539..465430d99440 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -183,7 +183,11 @@ static void sugov_get_util(unsigned long *util, unsigned long *max, int cpu)
>
> cfs_max = arch_scale_cpu_capacity(NULL, cpu);
>
> - *util = min(rq->cfs.avg.util_avg, cfs_max);
> + *util = rq->cfs.avg.util_avg;
I would use a local variable here.
That *util everywhere looks a bit dirtyish.
> + if (sched_feat(UTIL_EST))
> + *util = max(*util, rq->cfs.util_est_runnable);
> + *util = min(*util, cfs_max);
> +
> *max = cfs_max;
> }
>
>
On Fri, 2017-12-15 at 21:23 +0100, Mike Galbraith wrote:
>
> Point: if you think it's OK to serialize these firefox threads, would
> you still think so if those were kernel threads instead? Serializing
> your kernel is a clear fail, but unpinned kthreads can be stacked up
> just as effectively as those browser threads are, eat needless wakeup
> latency and pass it on.
FWIW, somewhat cheezy example of that below.
(later, /me returns to [apparently endless] squabble w. PELT/SIS;)
bonnie in nfs mount of own box competing with 7 hogs:
------------------------------------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Average delay ms | Maximum delay ms | Sum delay ms | Maximum delay at |
------------------------------------------------------------------------------------------------------------------------------------
kworker/3:0:29 | 630.078 ms | 89669 | avg: 0.011 ms | max: 102.340 ms | sum: 962.919 ms | max at: 310.501277 |
kworker/3:1H:464 | 1179.868 ms | 101944 | avg: 0.005 ms | max: 102.232 ms | sum: 480.915 ms | max at: 310.501273 |
kswapd0:78 | 2662.230 ms | 1661 | avg: 0.128 ms | max: 93.935 ms | sum: 213.258 ms | max at: 310.503419 |
nfsd:2039 | 3257.143 ms | 78448 | avg: 0.112 ms | max: 86.039 ms | sum: 8795.767 ms | max at: 258.847140 |
nfsd:2038 | 3185.730 ms | 76253 | avg: 0.113 ms | max: 78.348 ms | sum: 8580.676 ms | max at: 258.831370 |
nfsd:2042 | 3256.554 ms | 81423 | avg: 0.110 ms | max: 74.941 ms | sum: 8929.015 ms | max at: 288.397203 |
nfsd:2040 | 3314.826 ms | 80396 | avg: 0.105 ms | max: 51.039 ms | sum: 8471.816 ms | max at: 363.870078 |
nfsd:2036 | 3058.867 ms | 70460 | avg: 0.115 ms | max: 44.629 ms | sum: 8092.319 ms | max at: 250.074253 |
nfsd:2037 | 3113.592 ms | 74276 | avg: 0.115 ms | max: 43.294 ms | sum: 8556.110 ms | max at: 310.443722 |
konsole:4013 | 402.509 ms | 894 | avg: 0.148 ms | max: 38.129 ms | sum: 132.050 ms | max at: 332.156495 |
haveged:497 | 11.831 ms | 1224 | avg: 0.104 ms | max: 37.575 ms | sum: 127.706 ms | max at: 350.669645 |
nfsd:2043 | 3316.033 ms | 78303 | avg: 0.115 ms | max: 36.511 ms | sum: 8995.138 ms | max at: 248.576108 |
nfsd:2035 | 3064.108 ms | 67413 | avg: 0.115 ms | max: 28.221 ms | sum: 7746.306 ms | max at: 313.785682 |
bash:7022 | 0.342 ms | 1 | avg: 22.959 ms | max: 22.959 ms | sum: 22.959 ms | max at: 262.258960 |
kworker/u16:4:354 | 2073.383 ms | 1550 | avg: 0.050 ms | max: 21.203 ms | sum: 77.185 ms | max at: 332.220678 |
kworker/4:3:6975 | 1189.868 ms | 115776 | avg: 0.018 ms | max: 20.856 ms | sum: 2071.894 ms | max at: 348.142757 |
kworker/2:4:6981 | 335.895 ms | 26617 | avg: 0.023 ms | max: 20.726 ms | sum: 625.102 ms | max at: 248.522083 |
bash:7021 | 0.517 ms | 2 | avg: 10.363 ms | max: 20.726 ms | sum: 20.727 ms | max at: 262.235708 |
ksoftirqd/2:22 | 65.718 ms | 998 | avg: 0.138 ms | max: 19.072 ms | sum: 137.827 ms | max at: 332.221676 |
kworker/7:3:6969 | 625.724 ms | 84153 | avg: 0.010 ms | max: 18.838 ms | sum: 876.603 ms | max at: 264.188983 |
bonnie:6965 | 79637.998 ms | 35434 | avg: 0.007 ms | max: 18.719 ms | sum: 256.748 ms | max at: 331.299867 |
Hi Rafael,
On 16-Dec 03:35, Rafael J. Wysocki wrote:
> On Tuesday, December 5, 2017 6:10:18 PM CET Patrick Bellasi wrote:
[...]
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 2f52ec0f1539..465430d99440 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -183,7 +183,11 @@ static void sugov_get_util(unsigned long *util, unsigned long *max, int cpu)
> >
> > cfs_max = arch_scale_cpu_capacity(NULL, cpu);
> >
> > - *util = min(rq->cfs.avg.util_avg, cfs_max);
> > + *util = rq->cfs.avg.util_avg;
>
> I would use a local variable here.
>
> That *util everywhere looks a bit dirtyish.
Yes, right... will update for the next respin.
>
> > + if (sched_feat(UTIL_EST))
> > + *util = max(*util, rq->cfs.util_est_runnable);
> > + *util = min(*util, cfs_max);
> > +
> > *max = cfs_max;
> > }
> >
> >
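Just to double check I got your point, the next respin would then look
something like this (only a sketch, with 'util_cfs' as the local
variable name):

	unsigned long util_cfs = rq->cfs.avg.util_avg;

	if (sched_feat(UTIL_EST))
		util_cfs = max(util_cfs, rq->cfs.util_est_runnable);

	*util = min(util_cfs, cfs_max);
	*max = cfs_max;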
Cheers Patrick
--
#include <best/regards.h>
Patrick Bellasi
On Fri, Dec 15, 2017 at 03:41:40PM +0000, Patrick Bellasi wrote:
> Close enough, the actual code is:
>
> util_est = p->util_est.ewma;
> 5218: f9403ba3 ldr x3, [x29,#112]
> 521c: f9418462 ldr x2, [x3,#776]
> if (abs(util_est - util_last) <= (SCHED_CAPACITY_SCALE / 100))
> 5220: eb010040 subs x0, x2, x1
> 5224: da805400 cneg x0, x0, mi
> 5228: f100281f cmp x0, #0xa
> 522c: 54fff9cd b.le 5164 <dequeue_task_fair+0xa04>
Ah, that cneg instruction is cute; on x86 we end up with something like:
bool abs_test(long s)
{
return abs(s) < 32;
}
cmpl $-31, %eax
jl .L107
movq -8(%rbp), %rax
cmpl $31, %eax
jg .L107
movl $1, %eax
jmp .L108
.L107:
movl $0, %eax
.L108:
But I figured you can actually do:
abs(x) < y := (unsigned)(x + y - 1) < (2 * y - 1)
Which, if y is a constant, should result in nicer code, and it does for
x86:
addq $31, %rax
cmpq $62, %rax
setbe %al
movzbl %al, %eax
Just not measurably faster, I suppose because of all the dependencies :/
On Wed, Dec 20, 2017 at 09:57:47AM +0100, Peter Zijlstra wrote:
> On Fri, Dec 15, 2017 at 03:41:40PM +0000, Patrick Bellasi wrote:
> > Close enough, the actual code is:
> >
> > util_est = p->util_est.ewma;
> > 5218: f9403ba3 ldr x3, [x29,#112]
> > 521c: f9418462 ldr x2, [x3,#776]
> > if (abs(util_est - util_last) <= (SCHED_CAPACITY_SCALE / 100))
> > 5220: eb010040 subs x0, x2, x1
> > 5224: da805400 cneg x0, x0, mi
> > 5228: f100281f cmp x0, #0xa
> > 522c: 54fff9cd b.le 5164 <dequeue_task_fair+0xa04>
>
> Ah, that cneg instruction is cute; on x86 we end up with something like:
>
> bool abs_test(long s)
> {
> return abs(s) < 32;
> }
>
> cmpl $-31, %eax
> jl .L107
> movq -8(%rbp), %rax
> cmpl $31, %eax
> jg .L107
> movl $1, %eax
> jmp .L108
> .L107:
> movl $0, %eax
> .L108:
>
>
> But I figured you can actually do:
>
> abs(x) < y := (unsigned)(x + y - 1) < (2 * y - 1)
>
> Which, if y is a constant, should result in nicer code, and it does for
> x86:
>
> addq $31, %rax
> cmpq $62, %rax
> setbe %al
> movzbl %al, %eax
>
> Just not measurably faster, I suppose because of all the dependencies :/
Ah no, it actually is; I'm an idiot and used 'long' for the return value. If
I use bool we lose that last movzbl and we go from around 4.0 cycles
down to 3.4 cycles.
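FWIW, a tiny stand-alone version of that identity, for reference (the
function name is made up; it assumes the input stays well away from
overflowing a long):

#include <stdbool.h>

static bool abs_lt_32(long s)
{
	/* abs(s) < 32  <=>  (unsigned long)(s + 32 - 1) < 2 * 32 - 1 */
	return (unsigned long)(s + 31) < 63;
}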
Commit-ID: f01415fdbfe83380c2dfcf90b7b26042f88963aa
Gitweb: https://git.kernel.org/tip/f01415fdbfe83380c2dfcf90b7b26042f88963aa
Author: Patrick Bellasi <[email protected]>
AuthorDate: Tue, 5 Dec 2017 17:10:15 +0000
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 10 Jan 2018 11:30:28 +0100
sched/fair: Use 'unsigned long' for utilization, consistently
Utilization and capacity are tracked as 'unsigned long'; however, some
functions using them return an 'int', which is ultimately assigned back to
'unsigned long' variables.
Since there is no reason to use a different, signed type, consolidate the
signature of the functions returning utilization to always use the native
type.
This change improves code consistency, and it also benefits code paths
where utilization values should be clamped, by avoiding further type
conversions or ugly type casts.
Signed-off-by: Patrick Bellasi <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Chris Redpath <[email protected]>
Reviewed-by: Brendan Jackman <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Morten Rasmussen <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J . Wysocki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Todd Kjos <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Viresh Kumar <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/fair.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2915c0d..de43bd8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5765,8 +5765,8 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
return affine;
}
-static inline int task_util(struct task_struct *p);
-static int cpu_util_wake(int cpu, struct task_struct *p);
+static inline unsigned long task_util(struct task_struct *p);
+static unsigned long cpu_util_wake(int cpu, struct task_struct *p);
static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
{
@@ -6247,7 +6247,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
* capacity_orig) as it useful for predicting the capacity required after task
* migrations (scheduler-driven DVFS).
*/
-static int cpu_util(int cpu)
+static unsigned long cpu_util(int cpu)
{
unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;
unsigned long capacity = capacity_orig_of(cpu);
@@ -6255,7 +6255,7 @@ static int cpu_util(int cpu)
return (util >= capacity) ? capacity : util;
}
-static inline int task_util(struct task_struct *p)
+static inline unsigned long task_util(struct task_struct *p)
{
return p->se.avg.util_avg;
}
@@ -6264,7 +6264,7 @@ static inline int task_util(struct task_struct *p)
* cpu_util_wake: Compute cpu utilization with any contributions from
* the waking task p removed.
*/
-static int cpu_util_wake(int cpu, struct task_struct *p)
+static unsigned long cpu_util_wake(int cpu, struct task_struct *p)
{
unsigned long util, capacity;