If asymmetric packing is used when the target cpu is busy,
update_sd_pick_busiest() can select a lightly loaded cpu.
find_busiest_group() has checks to ensure asym packing is only used
when the target cpu is not busy, but it may not be able to prevent a
lightly loaded cpu picked by update_sd_pick_busiest() from ending up
as the source cpu for the eventual load balance.

Also, when using asymmetric packing, always select the higher-numbered
cpu as the source cpu for load balancing.

While doing this change, move the check for whether the target cpu is
busy into check_asym_packing().
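
As a rough sketch (abridged from the diff below, not the complete
function), the entry of check_asym_packing() then becomes:

    static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
    {
        if (!(env->sd->flags & SD_ASYM_PACKING))
            return 0;

        /* Moved here from find_busiest_group(): a busy target cpu
         * should not try to pull work via asym packing. */
        if (env->idle == CPU_NOT_IDLE)
            return 0;

        if (!sds->busiest)
            return 0;

        /* ... rest of the function is unchanged ... */
    }
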
1. Records per second from ebizzy (32 threads) on a 64-cpu POWER7 box (5 iterations)
4.5.0-master/ebizzy_32.out
N Min Max Median Avg Stddev
x 5 5205896 17260530 12141204 10759008 4765419
4.5.0-master-asym-changes/ebizzy_32.out
N Min Max Median Avg Stddev
x 5 7044349 19112876 17440718 14947658 5263970
2. Records per second from ebizzy (64 threads) on a 64-cpu POWER7 box (5 iterations)
4.5.0-master/ebizzy_64.out
N Min Max Median Avg Stddev
x 5 5400083 14091418 8673907 8872662.4 3389746.8
4.5.0-master-asym-changes/ebizzy_64.out
N Min Max Median Avg Stddev
x 5 7533907 17232623 15083583 13364894 3776877.9
3. Records per second from ebizzy (128 threads) on a 64-cpu POWER7 box (5 iterations)
4.5.0-master/ebizzy_128.out
N Min Max Median Avg Stddev
x 5 35328039 41642699 37564951 38378409 2671280
4.5.0-master-asym-changes/ebizzy_128.out
N Min Max Median Avg Stddev
x 5 37102220 42736809 38442478 39529626 2298389.4
Signed-off-by: Srikar Dronamraju <[email protected]>
---
kernel/sched/fair.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 56b7d4b..9abfb16 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6517,6 +6517,8 @@ static bool update_sd_pick_busiest(struct lb_env *env,
if (!(env->sd->flags & SD_ASYM_PACKING))
return true;
+ if (env->idle == CPU_NOT_IDLE)
+ return true;
/*
* ASYM_PACKING needs to move all the work to the lowest
* numbered CPUs in the group, therefore mark all groups
@@ -6526,7 +6528,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
if (!sds->busiest)
return true;
- if (group_first_cpu(sds->busiest) > group_first_cpu(sg))
+ if (group_first_cpu(sds->busiest) < group_first_cpu(sg))
return true;
}
@@ -6672,6 +6674,9 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
if (!(env->sd->flags & SD_ASYM_PACKING))
return 0;
+ if (env->idle == CPU_NOT_IDLE)
+ return 0;
+
if (!sds->busiest)
return 0;
@@ -6864,8 +6869,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
busiest = &sds.busiest_stat;
/* ASYM feature bypasses nice load balance check */
- if ((env->idle == CPU_IDLE || env->idle == CPU_NEWLY_IDLE) &&
- check_asym_packing(env, &sds))
+ if (check_asym_packing(env, &sds))
return sds.busiest;
/* There is no busy sibling group to pull tasks from */
--
1.8.3.1
To force a task migration during active balancing, nr_balance_failed is set
to cache_nice_tries + 1. However, nr_balance_failed is not reset afterwards.
As a side effect, on the next regular load balance under the same sd, a cache
hot task might be migrated just because the nr_balance_failed count is high.

Resetting nr_balance_failed after a successful active balance ensures
that a hot task is not unreasonably migrated. This can be verified by
looking at the number of hot task migrations reported by /proc/schedstat.
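
The reason a stale, high count matters is the hot-task test in
can_migrate_task(); roughly (a simplified sketch, not the exact kernel
source):

    /* simplified: decide whether a cache hot task may be migrated */
    tsk_cache_hot = task_hot(p, env);
    if (!tsk_cache_hot ||
        env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
        /* a hot task is migrated (and counted in schedstat) once the
         * failure count exceeds cache_nice_tries */
        return 1;
    }
    return 0;
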
Signed-off-by: Srikar Dronamraju <[email protected]>
---
kernel/sched/fair.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9abfb16..fae05f4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7294,10 +7294,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
&busiest->active_balance_work);
}
- /*
- * We've kicked active balancing, reset the failure
- * counter.
- */
+ /* We've kicked active balancing, force task migration. */
sd->nr_balance_failed = sd->cache_nice_tries+1;
}
} else
@@ -7532,10 +7529,13 @@ static int active_load_balance_cpu_stop(void *data)
schedstat_inc(sd, alb_count);
p = detach_one_task(&env);
- if (p)
+ if (p) {
schedstat_inc(sd, alb_pushed);
- else
+ /* Active balancing done, reset the failure counter. */
+ sd->nr_balance_failed = 0;
+ } else {
schedstat_inc(sd, alb_failed);
+ }
}
rcu_read_unlock();
out_unlock:
--
1.8.3.1
On Wed, 2016-03-23 at 17:04 +0530, Srikar Dronamraju wrote:
> If asymmetric packing is used when the target cpu is busy,
> update_sd_pick_busiest() can select a lightly loaded cpu.
> find_busiest_group() has checks to ensure asym packing is only used
> when the target cpu is not busy, but it may not be able to prevent a
> lightly loaded cpu picked by update_sd_pick_busiest() from ending up
> as the source cpu for the eventual load balance.
>
> Also, when using asymmetric packing, always select the higher-numbered
> cpu as the source cpu for load balancing.
>
> While doing this change, move the check for whether the target cpu is
> busy into check_asym_packing().
>
> 1. Records per second from ebizzy (32 threads) on a 64-cpu POWER7 box (5 iterations)
> 4.5.0-master/ebizzy_32.out
> N Min Max Median Avg Stddev
> x 5 5205896 17260530 12141204 10759008 4765419
>
> 4.5.0-master-asym-changes/ebizzy_32.out
> N Min Max Median Avg Stddev
> x 5 7044349 19112876 17440718 14947658 5263970
>
> 2. Records per second from ebizzy (64 threads) on a 64-cpu POWER7 box (5 iterations)
> 4.5.0-master/ebizzy_64.out
> N Min Max Median Avg Stddev
> x 5 5400083 14091418 8673907 8872662.4 3389746.8
>
> 4.5.0-master-asym-changes/ebizzy_64.out
> N Min Max Median Avg Stddev
> x 5 7533907 17232623 15083583 13364894 3776877.9
>
> 3. Records per second from ebizzy (128 threads) on a 64-cpu POWER7 box (5 iterations)
> 4.5.0-master/ebizzy_128.out
> N Min Max Median Avg Stddev
> x 5 35328039 41642699 37564951 38378409 2671280
>
> 4.5.0-master-asym-changes/ebizzy_128.out
> N Min Max Median Avg Stddev
> x 5 37102220 42736809 38442478 39529626 2298389.4
I'm not sure how to interpret these. Any chance you can give a summary of
what these results mean?
> Signed-off-by: Srikar Dronamraju <[email protected]>
FWIW, this still passes my scheduler tests on POWER7, but they weren't
failing before anyway.
Mikey
> ---
> kernel/sched/fair.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 56b7d4b..9abfb16 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6517,6 +6517,8 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> if (!(env->sd->flags & SD_ASYM_PACKING))
> return true;
>
> + if (env->idle == CPU_NOT_IDLE)
> + return true;
> /*
> * ASYM_PACKING needs to move all the work to the lowest
> * numbered CPUs in the group, therefore mark all groups
> @@ -6526,7 +6528,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> if (!sds->busiest)
> return true;
>
> - if (group_first_cpu(sds->busiest) > group_first_cpu(sg))
> + if (group_first_cpu(sds->busiest) < group_first_cpu(sg))
> return true;
> }
>
> @@ -6672,6 +6674,9 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
> if (!(env->sd->flags & SD_ASYM_PACKING))
> return 0;
>
> + if (env->idle == CPU_NOT_IDLE)
> + return 0;
> +
> if (!sds->busiest)
> return 0;
>
> @@ -6864,8 +6869,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
> busiest = &sds.busiest_stat;
>
> /* ASYM feature bypasses nice load balance check */
> - if ((env->idle == CPU_IDLE || env->idle == CPU_NEWLY_IDLE) &&
> - check_asym_packing(env, &sds))
> + if (check_asym_packing(env, &sds))
> return sds.busiest;
>
> /* There is no busy sibling group to pull tasks from */
On Wed, Mar 23, 2016 at 05:04:40PM +0530, Srikar Dronamraju wrote:
> If asymmetric packing is used when the target cpu is busy,
> update_sd_pick_busiest() can select a lightly loaded cpu.
> find_busiest_group() has checks to ensure asym packing is only used
> when the target cpu is not busy, but it may not be able to prevent a
> lightly loaded cpu picked by update_sd_pick_busiest() from ending up
> as the source cpu for the eventual load balance.
So my brain completely fails to parse. What? Why?
> Also, when using asymmetric packing, always select the higher-numbered
> cpu as the source cpu for load balancing.
>
> While doing this change, move the check for whether the target cpu is
> busy into check_asym_packing().
> Signed-off-by: Srikar Dronamraju <[email protected]>
> ---
> kernel/sched/fair.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 56b7d4b..9abfb16 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6517,6 +6517,8 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> if (!(env->sd->flags & SD_ASYM_PACKING))
> return true;
>
> + if (env->idle == CPU_NOT_IDLE)
> + return true;
OK, so this matches check_asym_packing() and makes sense, we don't want
to pull work if we're not idle.
But please add a comment with the condition; it's always hard to remember
the intent of these things later.
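
For illustration only, the comment could read something like:

    /*
     * If the destination cpu is busy, ASYM_PACKING is not used to
     * pull work to it, so fall back to the normal busiest-group
     * selection above.
     */
    if (env->idle == CPU_NOT_IDLE)
        return true;
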
> /*
> * ASYM_PACKING needs to move all the work to the lowest
> * numbered CPUs in the group, therefore mark all groups
> @@ -6526,7 +6528,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> if (!sds->busiest)
> return true;
>
> - if (group_first_cpu(sds->busiest) > group_first_cpu(sg))
> + if (group_first_cpu(sds->busiest) < group_first_cpu(sg))
> return true;
> }
Right, so you want to start by moving the highest possible cpu's work
down. The end result is the same, but this way you can reach lower power
states quicker.
Again, please add a comment.
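
Again purely as an illustration of the intent, something like:

    /*
     * Prefer the group whose first cpu is the highest numbered, so
     * that work is moved down towards lower-numbered cpus first.
     */
    if (group_first_cpu(sds->busiest) < group_first_cpu(sg))
        return true;
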
> @@ -6672,6 +6674,9 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
> if (!(env->sd->flags & SD_ASYM_PACKING))
> return 0;
>
> + if (env->idle == CPU_NOT_IDLE)
> + return 0;
> +
> if (!sds->busiest)
> return 0;
>
> @@ -6864,8 +6869,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
> busiest = &sds.busiest_stat;
>
> /* ASYM feature bypasses nice load balance check */
> - if ((env->idle == CPU_IDLE || env->idle == CPU_NEWLY_IDLE) &&
> - check_asym_packing(env, &sds))
> + if (check_asym_packing(env, &sds))
> return sds.busiest;
>
> /* There is no busy sibling group to pull tasks from */
OK, this is an effective NOP but results in cleaner code.
Commit-ID: d02c071183e1c01a76811c878c8a52322201f81f
Gitweb: http://git.kernel.org/tip/d02c071183e1c01a76811c878c8a52322201f81f
Author: Srikar Dronamraju <[email protected]>
AuthorDate: Wed, 23 Mar 2016 17:54:44 +0530
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 31 Mar 2016 10:49:44 +0200
sched/fair: Reset nr_balance_failed after active balancing
To force a task migration during active balancing, nr_balance_failed is set
to cache_nice_tries + 1. However, nr_balance_failed is not reset afterwards.
As a side effect, on the next regular load balance under the same sd, a cache
hot task might be migrated just because the nr_balance_failed count is high.

Resetting nr_balance_failed after a successful active balance ensures
that a hot task is not unreasonably migrated. This can be verified by
looking at the number of hot task migrations reported by /proc/schedstat.
Signed-off-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/fair.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0fe30e6..cbb075e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7399,10 +7399,7 @@ more_balance:
&busiest->active_balance_work);
}
- /*
- * We've kicked active balancing, reset the failure
- * counter.
- */
+ /* We've kicked active balancing, force task migration. */
sd->nr_balance_failed = sd->cache_nice_tries+1;
}
} else
@@ -7637,10 +7634,13 @@ static int active_load_balance_cpu_stop(void *data)
schedstat_inc(sd, alb_count);
p = detach_one_task(&env);
- if (p)
+ if (p) {
schedstat_inc(sd, alb_pushed);
- else
+ /* Active balancing done, reset the failure counter. */
+ sd->nr_balance_failed = 0;
+ } else {
schedstat_inc(sd, alb_failed);
+ }
}
rcu_read_unlock();
out_unlock:
* Peter Zijlstra <[email protected]> [2016-03-29 14:19:24]:
> On Wed, Mar 23, 2016 at 05:04:40PM +0530, Srikar Dronamraju wrote:
> > If asymmetric packing is used when the target cpu is busy,
> > update_sd_pick_busiest() can select a lightly loaded cpu.
> > find_busiest_group() has checks to ensure asym packing is only used
> > when the target cpu is not busy, but it may not be able to prevent a
> > lightly loaded cpu picked by update_sd_pick_busiest() from ending up
> > as the source cpu for the eventual load balance.
>
> So my brain completely fails to parse. What? Why?
Let's think of a situation where there are 4 cpus in a core, with the core
having the ASYM_PACKING ability. Suppose the cpus are loaded such that cpu0
has 1, cpu1 has 2, cpu2 has 3 and cpu3 has 2 threads. (Assume all threads
have equal load contributions.)

Now, with the current unpatched code and cpu0 running the load balance, it
is not guaranteed to pick cpu2 (which is the busiest). Here is what happens:

1. update_sd_lb_stats() (with the help of update_sd_pick_busiest()) may pick
   cpu1 as sds.busiest.
2. check_asym_packing() will return false.
3. find_busiest_group() will still continue with cpu1 as sds.busiest and
   load balance from it.

After the load balance, cpu0 will have 2, cpu1 will have 1, cpu2 will have 3
and cpu3 will have 2 threads. So because asym packing is used in
update_sd_pick_busiest() even when the cpus are busy, we may not select the
busiest cpu, and the load is not balanced immediately. Only after further
rounds of load balancing will all cpus end up with 2 threads.

With the patched code, cpu2 will be picked, and all cpus should end up with
2 threads after the first load balance itself.
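
Putting the same numbers in a table (threads per cpu, after cpu0's first
load balance):

                   cpu0  cpu1  cpu2  cpu3
    initial           1     2     3     2
    unpatched LB      2     1     3     2   (cpu1 picked as source)
    patched LB        2     2     2     2   (cpu2 picked as source)
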
> > if (!(env->sd->flags & SD_ASYM_PACKING))
> > return true;
> >
> > + if (env->idle == CPU_NOT_IDLE)
> > + return true;
>
> OK, so this matches check_asym_packing() and makes sense, we don't want
> to pull work if we're not idle.
>
> But please add a comment with the condition, its always hard to remember
> the intent of these things later.
Okay will do.
> > /*
> > * ASYM_PACKING needs to move all the work to the lowest
> > * numbered CPUs in the group, therefore mark all groups
> > @@ -6526,7 +6528,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> > if (!sds->busiest)
> > return true;
> >
> > - if (group_first_cpu(sds->busiest) > group_first_cpu(sg))
> > + if (group_first_cpu(sds->busiest) < group_first_cpu(sg))
> > return true;
> > }
>
> Right, so you want to start by moving the highest possible cpu's work
> down. The end result is the same, but this way you can reach lower power
> states quicker.
Yes. Asym packing is supposed to move the highest possible cpu's work down,
so this check was in a way defeating that purpose.
>
> Again, please add a comment.
Okay.
> > @@ -6864,8 +6869,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
> > busiest = &sds.busiest_stat;
> >
> > /* ASYM feature bypasses nice load balance check */
> > - if ((env->idle == CPU_IDLE || env->idle == CPU_NEWLY_IDLE) &&
> > - check_asym_packing(env, &sds))
> > + if (check_asym_packing(env, &sds))
> > return sds.busiest;
> >
> > /* There is no busy sibling group to pull tasks from */
>
> OK, this is an effective NOP but results in cleaner code.
Yes, this is a nop.
>
--
Thanks and Regards
Srikar Dronamraju