Hi Peter,
sched_mc_power_savings is broken on pre-Nehalem x86 platforms due to
contradictory SD flags at the MC level and the CPU level. SD_PREFER_SIBLING,
when set at the MC level, is expected to do the following:

 a) Disable consolidating tasks into a single group in the parent sched
    domain (generally a single CPU package)
 b) Spread tasks equally across groups at the parent sched domain.

SD_POWERSAVINGS_BALANCE set at a sched domain, on the other hand, enables
the logic that consolidates tasks into the minimum number of groups at that
sched domain.

Having SD_POWERSAVINGS_BALANCE at one sched domain and SD_PREFER_SIBLING at
its child domain is therefore contradictory, and it disables the
SD_POWERSAVINGS_BALANCE logic in

	if (local_group && (sds->this_nr_running >= sgs->group_capacity ||
			    !sds->this_nr_running))
		sds->power_savings_balance = 0;

because sgs.group_capacity is set to 1 by SD_PREFER_SIBLING in the child
sched domain.
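
To make the interaction concrete, here is a minimal userspace sketch of the
check quoted above. It is not kernel code: the struct layouts, the
clamp_capacity() helper and powersave_survives() are hypothetical names
invented for illustration, and the clamp assumes SD_PREFER_SIBLING simply
limits the group capacity to 1 as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down stand-ins for the scheduler statistics. */
struct sg_stats {
	unsigned long group_capacity;
};

struct sd_stats {
	unsigned long this_nr_running;
	int power_savings_balance;
};

/* Assumption: SD_PREFER_SIBLING in the child domain clamps capacity to 1. */
unsigned long clamp_capacity(unsigned long capacity, bool prefer_sibling)
{
	if (prefer_sibling && capacity > 1)
		return 1;
	return capacity;
}

/*
 * Returns 1 if power savings balance stays enabled for the local group,
 * 0 if the quoted condition switches it off.
 */
int powersave_survives(unsigned long nr_running, unsigned long capacity,
		       bool prefer_sibling)
{
	struct sd_stats sds = { nr_running, 1 };
	struct sg_stats sgs;
	bool local_group = true;

	assert(capacity > 0);
	sgs.group_capacity = clamp_capacity(capacity, prefer_sibling);

	/* Same shape as the condition quoted in the mail. */
	if (local_group && (sds.this_nr_running >= sgs.group_capacity ||
			    !sds.this_nr_running))
		sds.power_savings_balance = 0;

	return sds.power_savings_balance;
}
```

With one task running in a quad-core package, powersave_survives(1, 4, true)
yields 0 because the clamped capacity of 1 is already reached, while
powersave_survives(1, 4, false) yields 1 and consolidation can proceed.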

The attached patch restores the expected behavior for
sched_mc_power_savings > 0; objective (b) remains an open issue.
The following condition in find_busiest_group()

	sds.max_load <= sds.busiest_load_per_task

treats unequally loaded groups as balanced as long as they are below
capacity.
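
A rough numeric sketch of why this happens, under simplifying assumptions
that are mine and not taken from the kernel source: every task has the
nice-0 weight of 1024, every core contributes a cpu power of 1024, and the
group's average load is the total task weight scaled by the group's power.
The names group_model, group_avg_load(), group_load_per_task() and
looks_balanced() are hypothetical.

```c
#include <assert.h>

#define SCHED_LOAD_SCALE 1024UL	/* assumed nice-0 weight and per-core power */

/* Hypothetical model of one sched group: runnable tasks and cores. */
struct group_model {
	unsigned long nr_running;
	unsigned long nr_cpus;
};

/* Total task weight scaled by the group's cpu power. */
unsigned long group_avg_load(const struct group_model *g)
{
	unsigned long load = g->nr_running * SCHED_LOAD_SCALE;
	unsigned long power = g->nr_cpus * SCHED_LOAD_SCALE;

	assert(g->nr_cpus > 0);
	return load * SCHED_LOAD_SCALE / power;
}

/* Average weight of one task in the group (0 for an idle group). */
unsigned long group_load_per_task(const struct group_model *g)
{
	if (!g->nr_running)
		return 0;
	return g->nr_running * SCHED_LOAD_SCALE / g->nr_running;
}

/* Same comparison as max_load <= busiest_load_per_task. */
int looks_balanced(const struct group_model *busiest)
{
	return group_avg_load(busiest) <= group_load_per_task(busiest);
}
```

In this model a quad-core package running 3 tasks has an average load of 768
against a per-task load of 1024, so it is reported as balanced even though
the other package runs only 1 task. Only a package with more tasks than
cores exceeds the per-task load.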

Test results:

The following patch was tested on a dual-socket, quad-core, non-threaded
Xeon, running 4 while(1) loops in a shell:

	echo 1 > /sys/devices/system/cpu/sched_mc_power_savings

Without patch:
	1 task runs in one quad-core package and 3 in the other.
	This is effectively the baseline behavior with
	sched_mc_power_savings=0.

With patch:
	All 4 tasks run in one quad-core package.
	This is the expected behavior for sched_mc_power_savings > 0.

--Vaidy
Fix for sched_mc_power_savings for pre-Nehalem platforms.
Child sched domain should clear SD_PREFER_SIBLING if the parent will have
SD_POWERSAVINGS_BALANCE, because the two flags contradict each other.

Set the flags correctly based on sched_mc_power_savings.

Signed-off-by: Vaidyanathan Srinivasan <[email protected]>
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6550415..ef6b7cd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -866,7 +866,10 @@ static inline int sd_balance_for_mc_power(void)
 	if (sched_smt_power_savings)
 		return SD_POWERSAVINGS_BALANCE;
 
-	return SD_PREFER_SIBLING;
+	if (!sched_mc_power_savings)
+		return SD_PREFER_SIBLING;
+
+	return 0;
 }
 
 static inline int sd_balance_for_package_power(void)
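
For reference, the decision the patched function makes can be written as a
small standalone model. This is only a sketch: the function name
model_sd_balance_for_mc_power() is hypothetical, the two tunables are passed
as parameters instead of being read from globals, and the flag values are
placeholders chosen for illustration (the real definitions live in
include/linux/sched.h).

```c
/* Placeholder flag values for illustration only. */
#define SD_POWERSAVINGS_BALANCE	0x0100
#define SD_PREFER_SIBLING	0x1000

/*
 * Userspace model of the patched logic: which flag the MC level
 * contributes for a given pair of power savings tunables.
 */
int model_sd_balance_for_mc_power(int smt_power_savings, int mc_power_savings)
{
	/* SMT power savings enabled: consolidate at the MC level. */
	if (smt_power_savings)
		return SD_POWERSAVINGS_BALANCE;

	/* No MC power savings: keep spreading tasks across siblings. */
	if (!mc_power_savings)
		return SD_PREFER_SIBLING;

	/* MC power savings enabled: do not fight the parent's balance flag. */
	return 0;
}
```

Before the patch the last two cases were not distinguished, so
SD_PREFER_SIBLING was returned even with sched_mc_power_savings > 0.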
On Mon, 2010-02-08 at 15:35 +0530, Vaidyanathan Srinivasan wrote:
> Fix for sched_mc_power_savings for pre-Nehalem platforms.
> Child sched domain should clear SD_PREFER_SIBLING if parent will have
> SD_POWERSAVINGS_BALANCE because they are contradicting.
>
> Sets the flags correctly based on sched_mc_power_savings.
>
> Signed-off-by: Vaidyanathan Srinivasan <[email protected]>
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 6550415..ef6b7cd 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -866,7 +866,10 @@ static inline int sd_balance_for_mc_power(void)
>  	if (sched_smt_power_savings)
>  		return SD_POWERSAVINGS_BALANCE;
>  
> -	return SD_PREFER_SIBLING;
> +	if (!sched_mc_power_savings)
> +		return SD_PREFER_SIBLING;
> +
> +	return 0;
>  }
>
> static inline int sd_balance_for_package_power(void)
>
Looks good, thanks!
What's the status of getting rid of sched_{mc,smt}_power_savings?
* Peter Zijlstra <[email protected]> [2010-02-08 12:35:48]:
> Looks good, thanks!
>
> What's the status of getting rid of sched_{mc,smt}_power_savings?
Hi Peter,
With the current rearrangement of the code, the unified
sched_power_savings seems more doable.
However, I have a few more fixes for sched_smt_power_savings on Nehalem
before I revisit the unified tunable.
--Vaidy
Commit-ID: 28f5318167adf23b16c844b9c2253f355cb21796
Gitweb: http://git.kernel.org/tip/28f5318167adf23b16c844b9c2253f355cb21796
Author: Vaidyanathan Srinivasan <[email protected]>
AuthorDate: Mon, 8 Feb 2010 15:35:55 +0530
Committer: Thomas Gleixner <[email protected]>
CommitDate: Tue, 16 Feb 2010 15:13:59 +0100
sched: Fix sched_mv_power_savings for !SMT
Fix for sched_mc_power_savings for pre-Nehalem platforms.
Child sched domain should clear SD_PREFER_SIBLING if parent will have
SD_POWERSAVINGS_BALANCE because they are contradicting.
Sets the flags correctly based on sched_mc_power_savings.
Signed-off-by: Vaidyanathan Srinivasan <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Cc: [email protected] [2.6.32.x]
Signed-off-by: Thomas Gleixner <[email protected]>
---
include/linux/sched.h | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 78efe7c..1f5fa53 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -878,7 +878,10 @@ static inline int sd_balance_for_mc_power(void)
 	if (sched_smt_power_savings)
 		return SD_POWERSAVINGS_BALANCE;
 
-	return SD_PREFER_SIBLING;
+	if (!sched_mc_power_savings)
+		return SD_PREFER_SIBLING;
+
+	return 0;
 }
 
 static inline int sd_balance_for_package_power(void)
* tip-bot for Vaidyanathan Srinivasan <[email protected]> [2010-02-16 14:15:43]:
> Commit-ID: 28f5318167adf23b16c844b9c2253f355cb21796
> Gitweb: http://git.kernel.org/tip/28f5318167adf23b16c844b9c2253f355cb21796
> Author: Vaidyanathan Srinivasan <[email protected]>
> AuthorDate: Mon, 8 Feb 2010 15:35:55 +0530
> Committer: Thomas Gleixner <[email protected]>
> CommitDate: Tue, 16 Feb 2010 15:13:59 +0100
>
> sched: Fix sched_mv_power_savings for !SMT
                  ^^^^ _mc_
Minor typo, the summary should be
"sched: Fix sched_mc_power_savings for !SMT cases"
Thanks,
Vaidy