2013-08-19 16:16:39

by Peter Zijlstra

Subject: [PATCH 07/10] sched, fair: Optimize find_busiest_queue()

Use for_each_cpu_and() and thereby avoid computing the capacity for
CPUs we know we're not interested in.

Signed-off-by: Peter Zijlstra <[email protected]>
---
kernel/sched/fair.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4977,7 +4977,7 @@ static struct rq *find_busiest_queue(str
unsigned long busiest_load = 0, busiest_power = SCHED_POWER_SCALE;
int i;

- for_each_cpu(i, sched_group_cpus(group)) {
+ for_each_cpu_and(i, sched_group_cpus(group), env->cpus) {
unsigned long power = power_of(i);
unsigned long capacity = DIV_ROUND_CLOSEST(power,
SCHED_POWER_SCALE);
@@ -4986,9 +4986,6 @@ static struct rq *find_busiest_queue(str
if (!capacity)
capacity = fix_small_capacity(env->sd, group);

- if (!cpumask_test_cpu(i, env->cpus))
- continue;
-
rq = cpu_rq(i);
wl = weighted_cpuload(i);



2013-08-23 08:14:45

by Preeti U Murthy

Subject: Re: [PATCH 07/10] sched, fair: Optimize find_busiest_queue()

Hi Peter,

On 08/19/2013 09:31 PM, Peter Zijlstra wrote:

In the load balancing code, it looks to me that
cpumask_copy(cpus, cpu_active_mask) is not updating env.cpus at all
before calling find_busiest_group(). Am I missing something?

Shouldn't the cpumask_copy() below come before we set the env.cpus parameter?

static int load_balance(int this_cpu, struct rq *this_rq,
struct sched_domain *sd, enum cpu_idle_type idle,
int *balance)
{
int ld_moved, cur_ld_moved, active_balance = 0;
struct sched_group *group;
struct rq *busiest;
unsigned long flags;
struct cpumask *cpus = __get_cpu_var(load_balance_mask);

struct lb_env env = {
.sd = sd,
.dst_cpu = this_cpu,
.dst_rq = this_rq,
.dst_grpmask = sched_group_cpus(sd->groups),
.idle = idle,
.loop_break = sched_nr_migrate_break,
.cpus = cpus,
};

/*
* For NEWLY_IDLE load_balancing, we don't need to consider
* other cpus in our group
*/
if (idle == CPU_NEWLY_IDLE)
env.dst_grpmask = NULL;

cpumask_copy(cpus, cpu_active_mask);

schedstat_inc(sd, lb_count[idle]);
redo:
group = find_busiest_group(&env, balance);

Regards
Preeti U Murthy

2013-08-23 10:04:05

by Peter Zijlstra

Subject: Re: [PATCH 07/10] sched, fair: Optimize find_busiest_queue()

On Fri, Aug 23, 2013 at 01:41:55PM +0530, Preeti U Murthy wrote:
> Hi Peter,
>
> On 08/19/2013 09:31 PM, Peter Zijlstra wrote:
>
> In the load balancing code, it looks to me that
> cpumask_copy(cpus, cpu_active_mask) is not updating env.cpus at all
> before calling find_busiest_group(). Am I missing something?
>
> Shouldn't the cpumask_copy() below come before we set the env.cpus parameter?

Nah, it's all pointers.
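
To spell that out: env.cpus is initialized to the pointer value of cpus, so
the later cpumask_copy(cpus, cpu_active_mask) writes into the very mask that
env.cpus points at, and find_busiest_group() sees the updated contents. A
minimal userspace sketch of the aliasing (a plain bitmask instead of the real
cpumask API, purely illustrative) would be:

#include <stdio.h>

/* Simplified stand-ins; the real struct lb_env and cpumask are richer. */
struct cpumask { unsigned long bits; };

struct lb_env {
	struct cpumask *cpus;		/* points at the caller's mask */
};

int main(void)
{
	struct cpumask mask = { .bits = 0 };	/* stands in for load_balance_mask */
	struct lb_env env = { .cpus = &mask };	/* like .cpus = cpus above */

	/* The "late" copy writes through the same storage env.cpus points at. */
	mask.bits = 0xf;	/* stands in for cpumask_copy(cpus, cpu_active_mask) */

	/* find_busiest_group(&env, ...) would now see the updated mask. */
	printf("env.cpus->bits = %#lx\n", env.cpus->bits);	/* prints 0xf */
	return 0;
}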

2013-08-23 10:57:14

by Preeti U Murthy

Subject: Re: [PATCH 07/10] sched, fair: Optimize find_busiest_queue()

On 08/23/2013 03:33 PM, Peter Zijlstra wrote:
> On Fri, Aug 23, 2013 at 01:41:55PM +0530, Preeti U Murthy wrote:
>> Hi Peter,
>>
>> On 08/19/2013 09:31 PM, Peter Zijlstra wrote:
>>
>> In the load balancing code, it looks to me that
>> cpumask_copy(cpus, cpu_active_mask) is not updating env.cpus at all
>> before calling find_busiest_group(). Am I missing something?
>>
>> Shouldn't the cpumask_copy() below come before we set the env.cpus parameter?
>
> Nah, it's all pointers.
>

Yikes! Of course :P

Reviewed-by: Preeti U Murthy <[email protected]>

2013-08-24 10:34:31

by Paul Turner

Subject: Re: [PATCH 07/10] sched, fair: Optimize find_busiest_queue()

On Mon, Aug 19, 2013 at 9:01 AM, Peter Zijlstra <[email protected]> wrote:
> Use for_each_cpu_and() and thereby avoid computing the capacity for
> CPUs we know we're not interested in.
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> ---
> kernel/sched/fair.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4977,7 +4977,7 @@ static struct rq *find_busiest_queue(str
> unsigned long busiest_load = 0, busiest_power = SCHED_POWER_SCALE;
> int i;
>
> - for_each_cpu(i, sched_group_cpus(group)) {
> + for_each_cpu_and(i, sched_group_cpus(group), env->cpus) {
> unsigned long power = power_of(i);
> unsigned long capacity = DIV_ROUND_CLOSEST(power,
> SCHED_POWER_SCALE);
> @@ -4986,9 +4986,6 @@ static struct rq *find_busiest_queue(str
> if (!capacity)
> capacity = fix_small_capacity(env->sd, group);
>
> - if (!cpumask_test_cpu(i, env->cpus))
> - continue;
> -
> rq = cpu_rq(i);
> wl = weighted_cpuload(i);

There's no need to actually do the divisions immediately below this also.

e.g.
unsigned long max_load_power = SCHED_POWER_SCALE;
...
if (wl * max_load_power > max_load * power) {
max_load = wl;
max_load_power = power;
...

This would actually end up being a little more accurate even.

[ Alternatively without caching max_load_power we could compare wl *
power vs max_load * SCHED_POWER_SCALE. ]
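
To make that concrete, here's a small standalone sketch of the
cross-multiplication (array contents and loop bound are made up for
illustration; weighted_cpuload() and power_of() are reduced to arrays):

#include <stdio.h>

#define SCHED_POWER_SCALE 1024UL

int main(void)
{
	/* Hypothetical per-cpu figures, purely illustrative. */
	unsigned long wl[]    = { 900, 2100, 1500 };	/* weighted_cpuload(i) */
	unsigned long power[] = { 1024, 2048, 1024 };	/* power_of(i) */

	unsigned long max_load = 0, max_load_power = SCHED_POWER_SCALE;
	int busiest = -1, i;

	for (i = 0; i < 3; i++) {
		/*
		 * Compare wl[i]/power[i] against max_load/max_load_power by
		 * cross-multiplying, so no division (and no rounding) is needed.
		 */
		if (wl[i] * max_load_power > max_load * power[i]) {
			max_load = wl[i];
			max_load_power = power[i];
			busiest = i;
		}
	}

	/* cpu 2 wins: 1500/1024 > 2100/2048, even though 2100 is the larger load. */
	printf("busiest cpu: %d\n", busiest);
	return 0;
}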

Reviewed-by: Paul Turner <[email protected]>

2013-08-26 12:07:36

by Peter Zijlstra

Subject: Re: [PATCH 07/10] sched, fair: Optimize find_busiest_queue()

On Sat, Aug 24, 2013 at 03:33:59AM -0700, Paul Turner wrote:
> On Mon, Aug 19, 2013 at 9:01 AM, Peter Zijlstra <[email protected]> wrote:
> > +++ b/kernel/sched/fair.c
> > @@ -4977,7 +4977,7 @@ static struct rq *find_busiest_queue(str
> > unsigned long busiest_load = 0, busiest_power = SCHED_POWER_SCALE;
> > int i;
> >
> > - for_each_cpu(i, sched_group_cpus(group)) {
> > + for_each_cpu_and(i, sched_group_cpus(group), env->cpus) {
> > unsigned long power = power_of(i);
> > unsigned long capacity = DIV_ROUND_CLOSEST(power,
> > SCHED_POWER_SCALE);
> > @@ -4986,9 +4986,6 @@ static struct rq *find_busiest_queue(str
> > if (!capacity)
> > capacity = fix_small_capacity(env->sd, group);
> >
> > - if (!cpumask_test_cpu(i, env->cpus))
> > - continue;
> > -
> > rq = cpu_rq(i);
> > wl = weighted_cpuload(i);
>
> There's no need to actually do the divisions immediately below this also.
>
> e.g.
> unsigned long max_load_power = SCHED_POWER_SCALE;
> ...
> if (wl * max_load_power > max_load * power) {
> max_load = wl;
> max_load_power = power;
> ...
>
> This would actually end up being a little more accurate even.
>
> [ Alternatively without caching max_load_power we could compare wl *
> power vs max_load * SCHED_POWER_SCALE. ]

You've got me confused again. You're talking about something like the
below?

I suppose the problem with that is that we could keep selecting the
busiest rq with an unmovable task due to:

move_tasks():
if ((load / 2) > env->imbalance)
goto next;

That said, the condition in fbq() should at least be modified to match
this. Now the entire capacity crap comes from:

bdb94aa sched: Try to deal with low capacity

But thinking a little more about it, if power drops that low, imbalance
is likely to be _huge_ and we'd not meet that condition. Now if only I
had written a more comprehensive Changelog and explained why that wouldn't
be the case. /me kicks himself.
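
To put rough numbers on the unmovable-task concern (hypothetical values,
purely illustrative): say env->imbalance is 1500 and the candidate rq runs a
single task with a weighted load of 4096. With the wl > env->imbalance check
removed, fbq() can keep picking that rq as busiest, but move_tasks() will
keep skipping its only task since 4096 / 2 = 2048 > 1500, so nothing moves
and the same rq wins again on the next pass.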

---
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4990,28 +4990,12 @@ static struct sched_group *find_busiest_
static struct rq *find_busiest_queue(struct lb_env *env,
struct sched_group *group)
{
- struct rq *busiest = NULL, *rq;
unsigned long busiest_load = 0, busiest_power = 1;
+ struct rq *busiest = NULL;
int i;

for_each_cpu_and(i, sched_group_cpus(group), env->cpus) {
- unsigned long power = power_of(i);
- unsigned long capacity = DIV_ROUND_CLOSEST(power,
- SCHED_POWER_SCALE);
- unsigned long wl;
-
- if (!capacity)
- capacity = fix_small_capacity(env->sd, group);
-
- rq = cpu_rq(i);
- wl = weighted_cpuload(i);
-
- /*
- * When comparing with imbalance, use weighted_cpuload()
- * which is not scaled with the cpu power.
- */
- if (capacity && rq->nr_running == 1 && wl > env->imbalance)
- continue;
+ unsigned long wl = weighted_cpuload(i);

/*
* For the load comparisons with the other cpu's, consider
@@ -5027,7 +5011,7 @@ static struct rq *find_busiest_queue(str
if (wl * busiest_power > busiest_load * power) {
busiest_load = wl;
busiest_power = power;
- busiest = rq;
+ busiest = cpu_rq(i);
}
}

2013-08-27 09:14:13

by Paul Turner

Subject: Re: [PATCH 07/10] sched, fair: Optimize find_busiest_queue()

On Mon, Aug 26, 2013 at 5:07 AM, Peter Zijlstra <[email protected]> wrote:
> On Sat, Aug 24, 2013 at 03:33:59AM -0700, Paul Turner wrote:
>> On Mon, Aug 19, 2013 at 9:01 AM, Peter Zijlstra <[email protected]> wrote:
>> > +++ b/kernel/sched/fair.c
>> > @@ -4977,7 +4977,7 @@ static struct rq *find_busiest_queue(str
>> > unsigned long busiest_load = 0, busiest_power = SCHED_POWER_SCALE;
>> > int i;
>> >
>> > - for_each_cpu(i, sched_group_cpus(group)) {
>> > + for_each_cpu_and(i, sched_group_cpus(group), env->cpus) {
>> > unsigned long power = power_of(i);
>> > unsigned long capacity = DIV_ROUND_CLOSEST(power,
>> > SCHED_POWER_SCALE);
>> > @@ -4986,9 +4986,6 @@ static struct rq *find_busiest_queue(str
>> > if (!capacity)
>> > capacity = fix_small_capacity(env->sd, group);
>> >
>> > - if (!cpumask_test_cpu(i, env->cpus))
>> > - continue;
>> > -
>> > rq = cpu_rq(i);
>> > wl = weighted_cpuload(i);
>>
>> There's no need to actually do the divisions immediately below this also.
>>
>> e.g.
>> unsigned long max_load_power = SCHED_POWER_SCALE;
>> ...
>> if (wl * max_load_power > max_load * power) {
>> max_load = wl;
>> max_load_power = power;
>> ...
>>
>> This would actually end up being a little more accurate even.
>>
>> [ Alternatively without caching max_load_power we could compare wl *
>> power vs max_load * SCHED_POWER_SCALE. ]
>
> You've got me confused again. You're talking about something like the
> below?

Nevermind, I was looking at a tip tree as I reviewed this one. What I
was suggesting was exactly:
"[PATCH 01/10] sched: Remove one division operation in find_busiest_queue()"

>
> I suppose the problem with that is that we could keep selecting the
> busiest rq with an unmovable task due to:
>
> move_tasks():
> if ((load / 2) > env->imbalance)
> goto next;
>
> That said, the condition in fbq() should at least be modified to match
> this. Now the entire capacity crap comes from:
>
> bdb94aa sched: Try to deal with low capacity
>
> But thinking a little more about it, if power drops that low, imbalance
> is likely to be _huge_ and we'd not meet that condition. Now if only I
> had written a more comprehensive Changelog and explained why that wouldn't
> be the case. /me kicks himself.
>
> ---
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4990,28 +4990,12 @@ static struct sched_group *find_busiest_
> static struct rq *find_busiest_queue(struct lb_env *env,
> struct sched_group *group)
> {
> - struct rq *busiest = NULL, *rq;
> unsigned long busiest_load = 0, busiest_power = 1;
> + struct rq *busiest = NULL;
> int i;
>
> for_each_cpu_and(i, sched_group_cpus(group), env->cpus) {
> - unsigned long power = power_of(i);
> - unsigned long capacity = DIV_ROUND_CLOSEST(power,
> - SCHED_POWER_SCALE);
> - unsigned long wl;
> -
> - if (!capacity)
> - capacity = fix_small_capacity(env->sd, group);
> -
> - rq = cpu_rq(i);
> - wl = weighted_cpuload(i);
> -
> - /*
> - * When comparing with imbalance, use weighted_cpuload()
> - * which is not scaled with the cpu power.
> - */
> - if (capacity && rq->nr_running == 1 && wl > env->imbalance)
> - continue;
> + unsigned long wl = weighted_cpuload(i);
>
> /*
> * For the load comparisons with the other cpu's, consider
> @@ -5027,7 +5011,7 @@ static struct rq *find_busiest_queue(str
> if (wl * busiest_power > busiest_load * power) {
> busiest_load = wl;
> busiest_power = power;
> - busiest = rq;
> + busiest = cpu_rq(i);
> }
> }
>
>