2010-07-01 23:14:10

by Venkatesh Pallipadi

[permalink] [raw]
Subject: [PATCH] sched: Call update_group_power only for local_group

Commit 871e35b moved the update_group_power() call into update_sg_lb_stats(),
resulting in it being called for every group, even though it only updates
the power of the local group. As a result we end up with frequent redundant
update_group_power() calls.

Move it back under "if (local_group)" condition.

This reduces the number of calls to update_group_power() by a factor of 4
on my test system with 4 cores in 4 NUMA nodes.

Signed-off-by: Venkatesh Pallipadi <[email protected]>
---
kernel/sched_fair.c | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index a878b53..369d53d 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2359,8 +2359,11 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
unsigned int balance_cpu = -1, first_idle_cpu = 0;
unsigned long avg_load_per_task = 0;

- if (local_group)
+ if (local_group) {
balance_cpu = group_first_cpu(group);
+ update_group_power(sd, this_cpu);
+ }
+

/* Tally up the load of all CPUs in the group */
max_cpu_load = 0;
@@ -2406,8 +2409,6 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
return;
}

- update_group_power(sd, this_cpu);
-
/* Adjust by relative CPU power of the group */
sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;

--
1.7.1


2010-07-02 08:06:51

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched: Call update_group_power only for local_group

On Thu, 2010-07-01 at 16:12 -0700, Venkatesh Pallipadi wrote:
> Commit 871e35b moved the update_group_power() call into update_sg_lb_stats(),
> resulting in it being called for every group, even though it only updates
> the power of the local group. As a result we end up with frequent redundant
> update_group_power() calls.
>
> Move it back under "if (local_group)" condition.
>
> This reduces the number of calls to update_group_power() by a factor of 4
> on my test system with 4 cores in 4 NUMA nodes.

Hrm,.. so Gautham removed that because for things like the NO_HZ
balancer the initial balance_cpu == this_cpu constraint doesn't hold.

Note, I don't think the local_group constraint holds for that either, so
the change below would again break that.

Should we perhaps have a conditional on this_rq->nohz_balance_kick or
so?

> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -2359,8 +2359,11 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
> unsigned int balance_cpu = -1, first_idle_cpu = 0;
> unsigned long avg_load_per_task = 0;
>
> - if (local_group)
> + if (local_group) {
> balance_cpu = group_first_cpu(group);
> + update_group_power(sd, this_cpu);
> + }
> +
>
> /* Tally up the load of all CPUs in the group */
> max_cpu_load = 0;

2010-07-02 16:20:10

by Venkatesh Pallipadi

[permalink] [raw]
Subject: Re: [PATCH] sched: Call update_group_power only for local_group

On Fri, Jul 2, 2010 at 1:05 AM, Peter Zijlstra <[email protected]> wrote:
> On Thu, 2010-07-01 at 16:12 -0700, Venkatesh Pallipadi wrote:
>> Commit 871e35b moved the update_group_power() call into update_sg_lb_stats(),
>> resulting in it being called for every group, even though it only updates
>> the power of the local group. As a result we end up with frequent redundant
>> update_group_power() calls.
>>
>> Move it back under "if (local_group)" condition.
>>
>> This reduces the number of calls to update_group_power() by a factor of 4
>> on my test system with 4 cores in 4 NUMA nodes.
>
> Hrm,.. so Gautham removed that because for things like the NO_HZ
> balancer the initial balance_cpu == this_cpu constraint doesn't hold.
>
> Note, I don't think the local_group constraint holds for that either, so
> the change below would again break that.
>
> Should we perhaps have a conditional on this_rq->nohz_balance_kick or
> so?

The thing is that update_group_power() only updates the power of the
local group (sd->groups). It nevertheless gets called once per group,
because update_sd_lb_stats() walks the groups->next list and calls
update_sg_lb_stats() for each group (see the sketch below).
If we really want to update the power of non-local groups,
update_cpu_power() would have to change to take a group parameter and a
CPU other than this_cpu as arguments, and may have to access a
non-local rq etc.
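
Roughly, that walk looks like this (a simplified sketch from memory of
the surrounding code, not a verbatim quote; argument details may be
off):

	/* update_sd_lb_stats(): visit every group of this domain */
	struct sched_group *sg = sd->groups;	/* local group first */

	do {
		int local_group = cpumask_test_cpu(this_cpu,
						   sched_group_cpus(sg));

		/* runs once per group, local or not */
		update_sg_lb_stats(sd, sg, this_cpu, idle, load_idx,
				   sd_idle, local_group, cpus, balance, &sgs);

		/* ... fold sgs into the domain-wide stats ... */

		sg = sg->next;
	} while (sg != sd->groups);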

Thanks,
Venki
>
>> --- a/kernel/sched_fair.c
>> +++ b/kernel/sched_fair.c
>> @@ -2359,8 +2359,11 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
>> unsigned int balance_cpu = -1, first_idle_cpu = 0;
>> unsigned long avg_load_per_task = 0;
>>
>> - if (local_group)
>> + if (local_group) {
>> balance_cpu = group_first_cpu(group);
>> + update_group_power(sd, this_cpu);
>> + }
>> +
>>
>> /* Tally up the load of all CPUs in the group */
>> max_cpu_load = 0;
>
>

2010-07-02 16:40:38

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched: Call update_group_power only for local_group

On Fri, 2010-07-02 at 09:20 -0700, Venkatesh Pallipadi wrote:
> > Hrm,.. so Gautham removed that because for things like the NO_HZ
> > balancer the initial balance_cpu == this_cpu constraint doesn't hold.
> >
> > Note, I don't think the local_group constraint holds for that either, so
> > the change below would again break that.
> >
> > Should we perhaps have a conditional on this_rq->nohz_balance_kick or
> > so?
>
> The thing is that update_group_power is only updating the power of
> local group (sd->groups).

Not quite; see nohz_idle_balance(), which iterates idle_cpus_mask and
calls rebalance_domains(balance_cpu, CPU_IDLE), which then does
for_each_domain(balance_cpu, sd); see the rough sketch below.

So sd need not be local at all, and sd->groups will be the group of which
balance_cpu is part.
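
A rough sketch of that path (simplified, not a verbatim copy of the
kernel source; names approximate):

	static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle)
	{
		int balance_cpu;

		for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
			if (balance_cpu == this_cpu)
				continue;

			/*
			 * Balance on behalf of the idle balance_cpu:
			 * for_each_domain(balance_cpu, sd) inside
			 * rebalance_domains() walks *its* domains, and the
			 * "local" group there contains balance_cpu, not us.
			 */
			rebalance_domains(balance_cpu, CPU_IDLE);
		}
	}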

> It is getting called multiple times however for each group as
> update_sd_lb_stats loops
> through groups->next calling update_sg_lb_stats.

Sure I see how that's happening and why you would want to avoid that, no
argument there.

> If we really want to update the power of non-local groups,
> update_cpu_power has to change
> to take a groups parameter and non this_cpu as arguments and may have
> to access non-local
> rq etc.

No, see above. All we need is to somehow allow nohz_idle_balance() to
update cpu_power as well.

So I think we want something like:

	if (local_group) {
		balance_cpu = group_first_cpu(group);
		if (balance_cpu == this_cpu || nohz_balance)
			update_group_power(sd, this_cpu);
	}

Or am I totally missing something here?

2010-07-02 16:57:10

by Venkatesh Pallipadi

[permalink] [raw]
Subject: Re: [PATCH] sched: Call update_group_power only for local_group

My fault. I completely missed that code path, assuming this_cpu in
load_balance always means the currently running CPU :(

How about the change below instead?

Signed-off-by: Venkatesh Pallipadi <[email protected]>
---
kernel/sched_fair.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index a878b53..97f4534 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2406,8 +2406,6 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
return;
}

- update_group_power(sd, this_cpu);
-
/* Adjust by relative CPU power of the group */
sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;

@@ -2456,6 +2454,8 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
init_sd_power_savings_stats(sd, sds, idle);
load_idx = get_sd_load_idx(sd, idle);

+ update_group_power(sd, this_cpu);
+
do {
int local_group;

--
1.7.1

2010-07-02 17:31:47

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched: Call update_group_power only for local_group

On Fri, 2010-07-02 at 09:56 -0700, Venkatesh Pallipadi wrote:
> How about the change below instead?
>
I think I've now managed to confuse myself too.. will try and reset my
brain and have a second look in a bit.. ;-)

2010-07-08 14:12:57

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched: Call update_group_power only for local_group

Sorry for the delay, I got side-tracked :/

On Fri, 2010-07-02 at 09:56 -0700, Venkatesh Pallipadi wrote:
> My fault. I completely missed that code path, assuming this_cpu in
> load_balance always means the currently running CPU :(

Right, but local_group is still valid because that's
cpumask_test_cpu(this_cpu, sched_group_cpus(group));

So in that regard your initial patch was still correct. However, since
there can be multiple CPUs in the local group, we want only one of them to
do the actual balancing, which is what the !*balance logic is about (see
the check quoted below), so keeping the call site where it is preserves that.
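
For reference, the current shape of that check (copied from the diff
context further down the thread):

	if (idle != CPU_NEWLY_IDLE && local_group &&
	    balance_cpu != this_cpu) {
		*balance = 0;
		return;
	}

	update_group_power(sd, this_cpu);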

The below proposal avoids calling update_group_power() too often for:
- non-local groups,
- multiple cpus in the same group.

Does this look correct?

Related to this, Mikey was looking at avoiding the for_each_cpu_and()
loops by converting update_sg_lb_stats() into something similar to
update_group_power(): have one cpu in the group update the statistics
and propagate them upwards by summing groups instead of individual cpus.

---
kernel/sched_fair.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 9910e1b..62f34a0 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2433,7 +2433,8 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
return;
}

- update_group_power(sd, this_cpu);
+ if (local_group)
+ update_group_power(sd, this_cpu);

/* Adjust by relative CPU power of the group */
sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;

2010-07-08 17:46:15

by Suresh Siddha

[permalink] [raw]
Subject: Re: [PATCH] sched: Call update_group_power only for local_group

On Thu, 2010-07-08 at 07:12 -0700, Peter Zijlstra wrote:
> Sorry for the delay, I got side-tracked :/
>
> On Fri, 2010-07-02 at 09:56 -0700, Venkatesh Pallipadi wrote:
> > My fault. I completely missed that code path, assuming this_cpu in
> > load_balance always means the currently running CPU :(
>
> Right, but local_group is still valid because that's
> cpumask_test_cpu(this_cpu, sched_group_cpus(group));
>
> So in that regard your initial patch was still correct. However, since
> there can be multiple CPUs in the local group, we want to only have one
> do the actual balancing, which is what the !*balance logic is about, so
> by keeping the call site where it is gains that.
>
> The below proposal avoids calling update_group_power() too often for:
> - non-local groups,
> - multiple cpus in the same group.
>
> Does this look correct?
>
> Related to this, Mikey was looking at avoiding the for_each_cpu_and()
> loops by converting update_sg_lb_stats() into something similar to
> update_group_power(): have one cpu in the group update the statistics
> and propagate them upwards by summing groups instead of individual cpus.
>
> ---
> kernel/sched_fair.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 9910e1b..62f34a0 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -2433,7 +2433,8 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
> return;
> }
>
> - update_group_power(sd, this_cpu);
> + if (local_group)
> + update_group_power(sd, this_cpu);

If idle == CPU_NEWLY_IDLE, then all the cpus in the local group will do
this. Also, update_group_power() can be done only on the local cpu,
i.e., when this_cpu == smp_processor_id(), right?

So it should be something like?

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index a878b53..2d8a74e 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2406,8 +2406,6 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
return;
}

- update_group_power(sd, this_cpu);
-
/* Adjust by relative CPU power of the group */
sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;

@@ -2456,6 +2454,9 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
init_sd_power_savings_stats(sd, sds, idle);
load_idx = get_sd_load_idx(sd, idle);

+ if (this_cpu == smp_processor_id())
+ update_group_power(sd, this_cpu);
+
do {
int local_group;


2010-07-08 17:49:19

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched: Call update_group_power only for local_group

On Thu, 2010-07-08 at 10:45 -0700, Suresh Siddha wrote:
>
> @@ -2456,6 +2454,9 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
> init_sd_power_savings_stats(sd, sds, idle);
> load_idx = get_sd_load_idx(sd, idle);
>
> + if (this_cpu == smp_processor_id())
> + update_group_power(sd, this_cpu);
> +
> do {
> int local_group;

Which will break for nohz_idle_balance..

2010-07-08 17:50:57

by Suresh Siddha

[permalink] [raw]
Subject: Re: [PATCH] sched: Call update_group_power only for local_group

On Thu, 2010-07-08 at 10:49 -0700, Peter Zijlstra wrote:
> On Thu, 2010-07-08 at 10:45 -0700, Suresh Siddha wrote:
> >
> > @@ -2456,6 +2454,9 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
> > init_sd_power_savings_stats(sd, sds, idle);
> > load_idx = get_sd_load_idx(sd, idle);
> >
> > + if (this_cpu == smp_processor_id())
> > + update_group_power(sd, this_cpu);
> > +
> > do {
> > int local_group;
>
> Which will break for nohz_idle_balance..

Then the logic is broken somewhere, because update_group_power() reads
the APERF/MPERF MSRs, which doesn't make sense when this_cpu !=
smp_processor_id(). What am I missing?
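
(To illustrate the constraint, a sketch only, not the actual call
chain: rdmsrl() samples the MSRs of the CPU it is executing on, so an
APERF/MPERF read done on behalf of some other CPU measures the wrong
core.)

	u64 aperf, mperf;

	/* only meaningful when run on the CPU whose power we're updating */
	rdmsrl(MSR_IA32_APERF, aperf);
	rdmsrl(MSR_IA32_MPERF, mperf);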

thanks,
suresh

2010-07-08 17:56:09

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched: Call update_group_power only for local_group

On Thu, 2010-07-08 at 10:50 -0700, Suresh Siddha wrote:
> On Thu, 2010-07-08 at 10:49 -0700, Peter Zijlstra wrote:
> > On Thu, 2010-07-08 at 10:45 -0700, Suresh Siddha wrote:
> > >
> > > @@ -2456,6 +2454,9 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
> > > init_sd_power_savings_stats(sd, sds, idle);
> > > load_idx = get_sd_load_idx(sd, idle);
> > >
> > > + if (this_cpu == smp_processor_id())
> > > + update_group_power(sd, this_cpu);
> > > +
> > > do {
> > > int local_group;
> >
> > Which will break for nohz_idle_balance..
>
> Then the logic is broken somewhere, because update_group_power() reads
> the APERF/MPERF MSRs, which doesn't make sense when this_cpu !=
> smp_processor_id(). What am I missing?

The APERF/MPERF code is utterly broken.. (and currently disabled by
default) but yeah, that's one aspect of its brokenness I only realized
after your email.

The problem is that it measures current throughput, not current
capacity. So for an idle thread/core it would return 0, instead of its
potential capacity.

I've been meaning to revisit this.. maybe I should simply rip that out
until I get it working.

I was thinking of measuring temporal maxima to approximate capacity
instead of throughput.

2010-07-08 18:17:20

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched: Call update_group_power only for local_group

On Thu, 2010-07-08 at 10:45 -0700, Suresh Siddha wrote:
> > @@ -2433,7 +2433,8 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
> > return;
> > }
> >
> > - update_group_power(sd, this_cpu);
> > + if (local_group)
> > + update_group_power(sd, this_cpu);
>
> If idle == CPU_NEWLY_IDLE, then all the cpus in the local group will do
> this. Also, update_group_power() can be done only on the local cpu,
> i.e., when this_cpu == smp_processor_id(), right?

It might make sense to only call update_group_power() on !CPU_NEWLY_IDLE
and rely on the tick-driven cpu_power updates.

No sense in updating them in finer slices, I guess.

So how about something like:

---
kernel/sched_fair.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 9910e1b..2f05679 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2427,14 +2427,14 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
* domains. In the newly idle case, we will allow all the cpu's
* to do the newly idle load balance.
*/
- if (idle != CPU_NEWLY_IDLE && local_group &&
- balance_cpu != this_cpu) {
- *balance = 0;
- return;
+ if (idle != CPU_NEWLY_IDLE && local_group) {
+ if (balance_cpu != this_cpu) {
+ *balance = 0;
+ return;
+ }
+ update_group_power(sd, this_cpu);
}

- update_group_power(sd, this_cpu);
-
/* Adjust by relative CPU power of the group */
sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;

2010-07-08 21:54:19

by Suresh Siddha

[permalink] [raw]
Subject: Re: [PATCH] sched: Call update_group_power only for local_group

On Thu, 2010-07-08 at 11:16 -0700, Peter Zijlstra wrote:
> On Thu, 2010-07-08 at 10:45 -0700, Suresh Siddha wrote:
> > > @@ -2433,7 +2433,8 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
> > > return;
> > > }
> > >
> > > - update_group_power(sd, this_cpu);
> > > + if (local_group)
> > > + update_group_power(sd, this_cpu);
> >
> > If idle == CPU_NEWLY_IDLE, then all the cpus in the local group will do
> > this. Also, update_group_power() can be done only on the local cpu,
> > i.e., when this_cpu == smp_processor_id(), right?
>
> It might make sense to only call update_group_power() on !CPU_NEWLY_IDLE
> and rely on the tick-driven cpu_power updates.
>
> No sense in updating them in finer slices, I guess.
>
> So how about something like:
>
> ---
> kernel/sched_fair.c | 12 ++++++------
> 1 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 9910e1b..2f05679 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -2427,14 +2427,14 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
> * domains. In the newly idle case, we will allow all the cpu's
> * to do the newly idle load balance.
> */
> - if (idle != CPU_NEWLY_IDLE && local_group &&
> - balance_cpu != this_cpu) {
> - *balance = 0;
> - return;
> + if (idle != CPU_NEWLY_IDLE && local_group) {
> + if (balance_cpu != this_cpu) {
> + *balance = 0;
> + return;
> + }
> + update_group_power(sd, this_cpu);
> }
>
> - update_group_power(sd, this_cpu);
> -
> /* Adjust by relative CPU power of the group */
> sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;
>

I am ok with this patch (barring the currently broken aperf/mperf part).

Acked-by: Suresh Siddha <[email protected]>

Also, looking at all this, don't we need to do something like this in
the nohz load balance?

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 9910e1b..ae750e9 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3598,6 +3598,7 @@ static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle)
}

raw_spin_lock_irq(&this_rq->lock);
+ update_rq_clock(this_rq);
update_cpu_load(this_rq);
raw_spin_unlock_irq(&this_rq->lock);


2010-07-09 13:18:09

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched: Call update_group_power only for local_group

On Thu, 2010-07-08 at 14:53 -0700, Suresh Siddha wrote:

> Also, looking at all this, don't we need to do something like this in
> the nohz load balance?

Yes, I think you're right, thanks!

> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 9910e1b..ae750e9 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -3598,6 +3598,7 @@ static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle)
> }
>
> raw_spin_lock_irq(&this_rq->lock);
> + update_rq_clock(this_rq);
> update_cpu_load(this_rq);
> raw_spin_unlock_irq(&this_rq->lock);

2010-07-12 17:12:03

by Venkatesh Pallipadi

[permalink] [raw]
Subject: Re: [PATCH] sched: Call update_group_power only for local_group

On Thu, Jul 8, 2010 at 11:16 AM, Peter Zijlstra <[email protected]> wrote:
> On Thu, 2010-07-08 at 10:45 -0700, Suresh Siddha wrote:
>> > @@ -2433,7 +2433,8 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
>> > return;
>> > }
>> >
>> > - update_group_power(sd, this_cpu);
>> > + if (local_group)
>> > + update_group_power(sd, this_cpu);
>>
>> If idle == CPU_NEWLY_IDLE, then all the cpus in the local group will do
>> this. Also, update_group_power() can be done only on the local cpu,
>> i.e., when this_cpu == smp_processor_id(), right?
>
> It might make sense to only call update_group_power() on !CPU_NEWLY_IDLE
> and rely on the tick-driven cpu_power updates.
>
> No sense in updating them in finer slices, I guess.
>
> So how about something like:

Yes. This looks good.

Acked-by: Venkatesh Pallipadi <[email protected]>

>
> ---
> kernel/sched_fair.c | 12 ++++++------
> 1 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 9910e1b..2f05679 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -2427,14 +2427,14 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
> * domains. In the newly idle case, we will allow all the cpu's
> * to do the newly idle load balance.
> */
> - if (idle != CPU_NEWLY_IDLE && local_group &&
> - balance_cpu != this_cpu) {
> - *balance = 0;
> - return;
> + if (idle != CPU_NEWLY_IDLE && local_group) {
> + if (balance_cpu != this_cpu) {
> + *balance = 0;
> + return;
> + }
> + update_group_power(sd, this_cpu);
> }
>
> - update_group_power(sd, this_cpu);
> -
> /* Adjust by relative CPU power of the group */
> sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;
>
>
>

2010-07-17 11:12:24

by Suresh Siddha

[permalink] [raw]
Subject: [tip:sched/core] sched: Update rq->clock for nohz balanced cpus

Commit-ID: 5343bdb8fd076f16edc9d113a9e35e2a1d1f4966
Gitweb: http://git.kernel.org/tip/5343bdb8fd076f16edc9d113a9e35e2a1d1f4966
Author: Suresh Siddha <[email protected]>
AuthorDate: Fri, 9 Jul 2010 15:19:54 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Sat, 17 Jul 2010 12:02:08 +0200

sched: Update rq->clock for nohz balanced cpus

Suresh spotted that we don't update the rq->clock in the nohz
load-balancer path.

Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched_fair.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index b4da534..e44a591 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3596,6 +3596,7 @@ static void nohz_idle_balance(int this_cpu, enum cpu_idle_type idle)
}

raw_spin_lock_irq(&this_rq->lock);
+ update_rq_clock(this_rq);
update_cpu_load(this_rq);
raw_spin_unlock_irq(&this_rq->lock);

2010-07-17 11:12:49

by Peter Zijlstra

[permalink] [raw]
Subject: [tip:sched/core] sched: Reduce update_group_power() calls

Commit-ID: bbc8cb5baead9607309583b20873ab0cc8d89eaf
Gitweb: http://git.kernel.org/tip/bbc8cb5baead9607309583b20873ab0cc8d89eaf
Author: Peter Zijlstra <[email protected]>
AuthorDate: Fri, 9 Jul 2010 15:15:43 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Sat, 17 Jul 2010 12:05:14 +0200

sched: Reduce update_group_power() calls

Currently we update cpu_power too often: update_group_power() only
updates the local group's cpu_power, but it gets called for all groups.

Furthermore, CPU_NEWLY_IDLE invocations will result in all cpus
calling it, even though a slow update of cpu_power is sufficient.

Therefore move the update under 'idle != CPU_NEWLY_IDLE &&
local_group' to reduce superfluous invocations.

Reported-by: Venkatesh Pallipadi <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Suresh Siddha <[email protected]>
LKML-Reference: <1278612989.1900.176.camel@laptop>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched_fair.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index e44a591..c9ac097 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2425,14 +2425,14 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
* domains. In the newly idle case, we will allow all the cpu's
* to do the newly idle load balance.
*/
- if (idle != CPU_NEWLY_IDLE && local_group &&
- balance_cpu != this_cpu) {
- *balance = 0;
- return;
+ if (idle != CPU_NEWLY_IDLE && local_group) {
+ if (balance_cpu != this_cpu) {
+ *balance = 0;
+ return;
+ }
+ update_group_power(sd, this_cpu);
}

- update_group_power(sd, this_cpu);
-
/* Adjust by relative CPU power of the group */
sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;