2013-06-17 13:01:56

by Lei Wen

Subject: [PATCH v2 0/3] small fix for scale usage

Here are three patches which correct the scale usage in both
fix_small_imbalance and update_sg_lb_stats.

They also add a comment explaining under which condition fix_small_imbalance
decides to move load.

V2: fix scale usage for update_sg_lb_stats

Lei Wen (3):
sched: reduce calculation effort in fix_small_imbalance
sched: scale the busy and this queue's per-task load before compare
sched: scale cpu load for judgment of group imbalance

kernel/sched/fair.c | 56 +++++++++++++++++++++++++++++++--------------------
1 file changed, 34 insertions(+), 22 deletions(-)

--
1.7.10.4


2013-06-17 13:02:35

by Lei Wen

Subject: [PATCH 2/3] sched: scale the busy and this queue's per-task load before compare

max_load and this_load are values that have already been scaled by cpu
power, so it is not reasonable to take the minimum of a scaled and a
non-scaled value, as in the example below:
min(sds->busiest_load_per_task, sds->max_load);

Also add a comment describing under what condition moving the load brings
a cpu power (throughput) gain.
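
For illustration only (this snippet is not part of the patch), a small
userspace sketch with made-up power and load values showing how mixing a
scaled and an unscaled value in min() skews the result:

#include <stdio.h>

#define SCHED_POWER_SCALE 1024UL

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

int main(void)
{
	/* Made-up values: a busiest group with twice the default power. */
	unsigned long busiest_power = 2048;		/* sgp->power */
	unsigned long busiest_load_per_task = 512;	/* not scaled */
	unsigned long max_load = 300;			/* already scaled */

	unsigned long scaled_busy_load_per_task =
		busiest_load_per_task * SCHED_POWER_SCALE / busiest_power;

	/* Mixed units: min(512, 300) = 300 */
	printf("mixed  min: %lu\n", min_ul(busiest_load_per_task, max_load));
	/* Consistent units: min(256, 300) = 256 */
	printf("scaled min: %lu\n", min_ul(scaled_busy_load_per_task, max_load));
	return 0;
}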

Signed-off-by: Lei Wen <[email protected]>
---
kernel/sched/fair.c | 28 +++++++++++++++++++---------
1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 28052fa..77a149c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4692,7 +4692,7 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
{
unsigned long tmp, pwr_now = 0, pwr_move = 0;
unsigned int imbn = 2;
- unsigned long scaled_busy_load_per_task;
+ unsigned long scaled_busy_load_per_task, scaled_this_load_per_task;

if (sds->this_nr_running) {
sds->this_load_per_task /= sds->this_nr_running;
@@ -4714,6 +4714,9 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
return;
}

+ scaled_this_load_per_task = sds->this_load_per_task
+ * SCHED_POWER_SCALE;
+ scaled_this_load_per_task /= sds->this->sgp->power;
/*
* OK, we don't have enough imbalance to justify moving tasks,
* however we may be able to increase total CPU power used by
@@ -4721,28 +4724,35 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
*/

pwr_now += sds->busiest->sgp->power *
- min(sds->busiest_load_per_task, sds->max_load);
+ min(scaled_busy_load_per_task, sds->max_load);
pwr_now += sds->this->sgp->power *
- min(sds->this_load_per_task, sds->this_load);
+ min(scaled_this_load_per_task, sds->this_load);
pwr_now /= SCHED_POWER_SCALE;

/* Amount of load we'd subtract */
if (sds->max_load > scaled_busy_load_per_task) {
pwr_move += sds->busiest->sgp->power *
- min(sds->busiest_load_per_task,
+ min(scaled_busy_load_per_task,
sds->max_load - scaled_busy_load_per_task);
- tmp = (sds->busiest_load_per_task * SCHED_POWER_SCALE) /
- sds->this->sgp->power;
+ tmp = scaled_busy_load_per_task;
} else
- tmp = (sds->max_load * sds->busiest->sgp->power) /
- sds->this->sgp->power;
+ tmp = sds->max_load;

+ /* Scale the busiest queue's load to this queue's power */
+ tmp = (tmp * sds->busiest->sgp->power) /
+ sds->this->sgp->power;
/* Amount of load we'd add */
pwr_move += sds->this->sgp->power *
- min(sds->this_load_per_task, sds->this_load + tmp);
+ min(scaled_this_load_per_task, sds->this_load + tmp);
pwr_move /= SCHED_POWER_SCALE;

/* Move if we gain throughput */
+ /*
+ * The only way for the statement below to be true is:
+ * sds->max_load is larger than scaled_busy_load_per_task, while
+ * scaled_this_load_per_task is larger than sds->this_load plus the
+ * busiest queue's load scaled into this queue's power
+ */
if (pwr_move > pwr_now)
env->imbalance = sds->busiest_load_per_task;
}
--
1.7.10.4

2013-06-17 13:02:46

by Lei Wen

Subject: [PATCH 1/3] sched: reduce calculation effort in fix_small_imbalance

Every occurrence of the expression below can be replaced by the already
computed scaled_busy_load_per_task:
(sds->busiest_load_per_task * SCHED_POWER_SCALE)
/ sds->busiest->sgp->power;
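
For illustration only (not from the patch), a quick userspace check with
made-up numbers of the equivalence the merge relies on: comparing against
the pre-computed scaled_busy_load_per_task matches the old
multiplication-based condition, up to integer rounding:

#include <stdio.h>

#define SCHED_POWER_SCALE 1024UL

int main(void)
{
	/* Made-up values only. */
	unsigned long busiest_power = 2048;
	unsigned long busiest_load_per_task = 512;
	unsigned long max_load = 200;

	unsigned long scaled_busy_load_per_task =
		busiest_load_per_task * SCHED_POWER_SCALE / busiest_power;

	/* Old form, written as a multiplication ... */
	int old_cond = max_load * busiest_power <
		       busiest_load_per_task * SCHED_POWER_SCALE;
	/* ... and the new form reusing the pre-computed value.  Both are
	 * true here (200 < 256); they can only diverge by the rounding
	 * of the integer division. */
	int new_cond = max_load < scaled_busy_load_per_task;

	printf("old=%d new=%d\n", old_cond, new_cond);
	return 0;
}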

Signed-off-by: Lei Wen <[email protected]>
---
kernel/sched/fair.c | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c61a614..28052fa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4727,20 +4727,17 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
pwr_now /= SCHED_POWER_SCALE;

/* Amount of load we'd subtract */
- tmp = (sds->busiest_load_per_task * SCHED_POWER_SCALE) /
- sds->busiest->sgp->power;
- if (sds->max_load > tmp)
+ if (sds->max_load > scaled_busy_load_per_task) {
pwr_move += sds->busiest->sgp->power *
- min(sds->busiest_load_per_task, sds->max_load - tmp);
-
- /* Amount of load we'd add */
- if (sds->max_load * sds->busiest->sgp->power <
- sds->busiest_load_per_task * SCHED_POWER_SCALE)
- tmp = (sds->max_load * sds->busiest->sgp->power) /
- sds->this->sgp->power;
- else
+ min(sds->busiest_load_per_task,
+ sds->max_load - scaled_busy_load_per_task);
tmp = (sds->busiest_load_per_task * SCHED_POWER_SCALE) /
sds->this->sgp->power;
+ } else
+ tmp = (sds->max_load * sds->busiest->sgp->power) /
+ sds->this->sgp->power;
+
+ /* Amount of load we'd add */
pwr_move += sds->this->sgp->power *
min(sds->this_load_per_task, sds->this_load + tmp);
pwr_move /= SCHED_POWER_SCALE;
--
1.7.10.4

2013-06-17 13:02:53

by Lei Wen

Subject: [PATCH 3/3] sched: scale cpu load for judgment of group imbalance

We cannot compare the loads of two cpus directly, since their cpu power
may vary greatly.

Suppose we have the following two cpus.
CPU A:
No real-time work; its CFS runqueue holds 3 tasks with rq->load.weight
of 512.
CPU B:
Has real-time work that takes 3/4 of the cpu power, leaving CFS only
1/4, i.e. 1024/4 = 256 cpu power. Its CFS runqueue holds a single task
with a weight of 128.

Since on both cpus the CFS tasks consume half of the cpu power available
to CFS, this case should be considered balanced.

But the original check:
if ((max_cpu_load - min_cpu_load) >= avg_load_per_task &&
(max_nr_running - min_nr_running) > 1)
evaluates (512-128) >= ((512+128)/4) as true and wrongly concludes that
the group is imbalanced.

Scale the cpu load by cpu power before the comparison to avoid this.
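
For illustration only (not part of the patch), a small userspace sketch
re-doing the arithmetic above; the cpu power figures (1024 for CPU A, 256
for CPU B) are the assumed CFS shares from the example:

#include <stdio.h>

#define SCHED_POWER_SCALE 1024UL

int main(void)
{
	/* CPU A: full power, total CFS weight 512.
	 * CPU B: 1/4 of the power left for CFS, total CFS weight 128. */
	unsigned long load_a = 512, power_a = 1024;
	unsigned long load_b = 128, power_b = 256;

	unsigned long scaled_a = load_a * SCHED_POWER_SCALE / power_a;
	unsigned long scaled_b = load_b * SCHED_POWER_SCALE / power_b;

	/* Both print 512, so max_cpu_load - min_cpu_load is 0 and the
	 * group is no longer flagged as imbalanced. */
	printf("scaled A=%lu B=%lu\n", scaled_a, scaled_b);
	return 0;
}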

Signed-off-by: Lei Wen <[email protected]>
---
kernel/sched/fair.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 77a149c..18625dd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4434,7 +4434,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
int local_group, int *balance, struct sg_lb_stats *sgs)
{
unsigned long nr_running, max_nr_running, min_nr_running;
- unsigned long load, max_cpu_load, min_cpu_load;
+ unsigned long scaled_load, load, max_cpu_load, min_cpu_load;
unsigned int balance_cpu = -1, first_idle_cpu = 0;
unsigned long avg_load_per_task = 0;
int i;
@@ -4464,10 +4464,12 @@ static inline void update_sg_lb_stats(struct lb_env *env,
load = target_load(i, load_idx);
} else {
load = source_load(i, load_idx);
- if (load > max_cpu_load)
- max_cpu_load = load;
- if (min_cpu_load > load)
- min_cpu_load = load;
+ scaled_load = load * SCHED_POWER_SCALE
+ / cpu_rq(i)->cpu_power;
+ if (scaled_load > max_cpu_load)
+ max_cpu_load = scaled_load;
+ if (min_cpu_load > scaled_load)
+ min_cpu_load = scaled_load;

if (nr_running > max_nr_running)
max_nr_running = nr_running;
@@ -4511,8 +4513,11 @@ static inline void update_sg_lb_stats(struct lb_env *env,
* normalized nr_running number somewhere that negates
* the hierarchy?
*/
- if (sgs->sum_nr_running)
- avg_load_per_task = sgs->sum_weighted_load / sgs->sum_nr_running;
+ if (sgs->sum_nr_running) {
+ avg_load_per_task = sgs->sum_weighted_load * SCHED_POWER_SCALE
+ / group->sgp->power;
+ avg_load_per_task /= sgs->sum_nr_running;
+ }

if ((max_cpu_load - min_cpu_load) >= avg_load_per_task &&
(max_nr_running - min_nr_running) > 1)
--
1.7.10.4