2013-04-23 08:27:49

by Joonsoo Kim

Subject: [PATCH v3 0/6] correct load_balance()

Commit 88b8dac0 makes load_balance() consider other cpus in its group.
However, some pieces are missing for this feature to work properly.
This patchset corrects them and makes load_balance() robust.

Other changes are related to LBF_ALL_PINNED. This is a fallback used
when no task can be moved because of cpu affinity. Currently, however,
if the imbalance is too small relative to a task's load, the
LBF_ALL_PINNED flag is left set and a 'redo' is triggered. That is not
the intention, so correct it.
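
For context, the path in question looks roughly like this (a condensed,
paraphrased sketch of load_balance() in kernel/sched/fair.c of this era,
not a literal excerpt; the cpu_of() helper and the out_balanced label
come from the surrounding kernel code):

	env.flags |= LBF_ALL_PINNED;	/* assume everything is pinned;   */
					/* can_migrate_task() clears it   */
					/* once a task passes the         */
					/* affinity check                 */

	cur_ld_moved = move_tasks(&env);
	ld_moved += cur_ld_moved;
	...
	if (unlikely(env.flags & LBF_ALL_PINNED)) {
		/* everything was pinned: drop the busiest cpu and retry */
		cpumask_clear_cpu(cpu_of(busiest), cpus);
		if (!cpumask_empty(cpus)) {
			env.loop = 0;
			env.loop_break = sched_nr_migrate_break;
			goto redo;	/* the 'redo' mentioned above */
		}
		goto out_balanced;
	}

Because the flag is only cleared inside can_migrate_task(), a scan that
bails out on every task before reaching the affinity check (for example
because the imbalance is too small for each task's load) leaves the flag
set and takes the redo path even though nothing is actually pinned;
patch 4/6 addresses this.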

These patches are based on the sched/core branch of the tip tree.

Changelog
v2->v3: Changes from Peter's suggestion
[2/6]: change comment
[3/6]: fix coding style
[6/6]: fix coding style, fix changelog

v1->v2: Changes from Peter's suggestion
[4/6]: don't include the load-value evaluation code in can_migrate_task()
[5/6]: rename load_balance_tmpmask to load_balance_mask
[6/6]: don't use an extra cpumask; use env's cpus to prevent re-selection

Joonsoo Kim (6):
sched: change position of resched_cpu() in load_balance()
sched: explicitly cpu_idle_type checking in rebalance_domains()
sched: don't consider other cpus in our group in case of NEWLY_IDLE
sched: move up affinity check to mitigate useless redoing overhead
sched: rename load_balance_tmpmask to load_balance_mask
sched: prevent to re-select dst-cpu in load_balance()

kernel/sched/core.c | 4 +--
kernel/sched/fair.c | 69 +++++++++++++++++++++++++++++----------------------
2 files changed, 41 insertions(+), 32 deletions(-)

--
1.7.9.5


2013-04-23 08:27:52

by Joonsoo Kim

Subject: [PATCH v3 1/6] sched: change position of resched_cpu() in load_balance()

cur_ld_moved is reset when env.flags has LBF_NEED_BREAK set, so there
is a possibility that we miss calling resched_cpu(). Correct this by
moving the resched_cpu() call before the LBF_NEED_BREAK check.
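
To make the ordering issue concrete, here is a condensed view of the
loop involved (the lock/unlock bracketing and the cur_ld_moved
bookkeeping are paraphrased from the surrounding function, simplified):

more_balance:
	local_irq_save(flags);
	double_rq_lock(env.dst_rq, busiest);

	/*
	 * cur_ld_moved - load moved in the current iteration only
	 * ld_moved     - cumulative load moved across iterations
	 */
	cur_ld_moved = move_tasks(&env);
	ld_moved += cur_ld_moved;

	double_rq_unlock(env.dst_rq, busiest);
	local_irq_restore(flags);

	/*
	 * Old order: this ran first, and the next chunk may move nothing,
	 * resetting cur_ld_moved to 0 and silently losing the pending
	 * resched of dst_cpu.
	 */
	if (env.flags & LBF_NEED_BREAK) {
		env.flags &= ~LBF_NEED_BREAK;
		goto more_balance;
	}

	if (cur_ld_moved && env.dst_cpu != smp_processor_id())
		resched_cpu(env.dst_cpu);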

Acked-by: Peter Zijlstra <[email protected]>
Tested-by: Jason Low <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1c97735..25aaf93 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5080,17 +5080,17 @@ more_balance:
double_rq_unlock(env.dst_rq, busiest);
local_irq_restore(flags);

- if (env.flags & LBF_NEED_BREAK) {
- env.flags &= ~LBF_NEED_BREAK;
- goto more_balance;
- }
-
/*
* some other cpu did the load balance for us.
*/
if (cur_ld_moved && env.dst_cpu != smp_processor_id())
resched_cpu(env.dst_cpu);

+ if (env.flags & LBF_NEED_BREAK) {
+ env.flags &= ~LBF_NEED_BREAK;
+ goto more_balance;
+ }
+
/*
* Revisit (affine) tasks on src_cpu that couldn't be moved to
* us and move them to an alternate dst_cpu in our sched_group
--
1.7.9.5

2013-04-23 08:27:51

by Joonsoo Kim

Subject: [PATCH v3 2/6] sched: explicitly cpu_idle_type checking in rebalance_domains()

After commit 88b8dac0, the dst-cpu can be changed in load_balance(), so
we can't know the cpu_idle_type of the dst-cpu when load_balance()
returns a positive value. Add an explicit cpu_idle_type check.
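
The idle value matters for the rest of the for_each_domain() walk in
rebalance_domains(), where it scales the balance interval. Roughly (a
sketch condensed from that function of this era, so treat it as
approximate rather than an exact excerpt), a stale CPU_NOT_IDLE on a
still-idle cpu makes the remaining domains rebalance less often:

	for_each_domain(cpu, sd) {
		...
		interval = sd->balance_interval;
		if (idle != CPU_IDLE)
			interval *= sd->busy_factor;	/* busy cpus balance less often */
		...
		if (time_after_eq(jiffies, sd->last_balance + interval)) {
			if (load_balance(cpu, rq, sd, idle, &balance)) {
				/*
				 * The pulled tasks may have landed on another
				 * cpu of our group (LBF_SOME_PINNED changed
				 * dst_cpu), so re-check instead of blindly
				 * assuming CPU_NOT_IDLE.
				 */
				idle = idle_cpu(cpu) ? CPU_IDLE : CPU_NOT_IDLE;
			}
			sd->last_balance = jiffies;
		}
		...
	}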

Cc: Srivatsa Vaddagiri <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Tested-by: Jason Low <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 25aaf93..726e129 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5523,10 +5523,11 @@ static void rebalance_domains(int cpu, enum cpu_idle_type idle)
if (time_after_eq(jiffies, sd->last_balance + interval)) {
if (load_balance(cpu, rq, sd, idle, &balance)) {
/*
- * We've pulled tasks over so either we're no
- * longer idle.
+ * The LBF_SOME_PINNED logic could have changed
+ * env->dst_cpu, so we can't know our idle
+ * state even if we migrated tasks. Update it.
*/
- idle = CPU_NOT_IDLE;
+ idle = idle_cpu(cpu) ? CPU_IDLE : CPU_NOT_IDLE;
}
sd->last_balance = jiffies;
}
--
1.7.9.5

2013-04-23 08:28:39

by Joonsoo Kim

Subject: [PATCH v3 4/6] sched: move up affinity check to mitigate useless redoing overhead

Currently, LBF_ALL_PINNED is cleared only after the affinity check has
passed. So, if task migration is skipped in move_tasks() because of a
small load value or a small imbalance value, we never clear
LBF_ALL_PINNED and end up triggering a 'redo' in load_balance().

The imbalance value is often so small that no task can be moved to
another cpu, and this situation may persist even after we change the
target cpu. So this patch moves the affinity check up and clears
LBF_ALL_PINNED before the load value is evaluated, in order to mitigate
the useless redo overhead.

In addition, the related comments are re-ordered to match the new
check order.
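
The resulting ordering, condensed (only the parts relevant to
LBF_ALL_PINNED are shown; the LBF_SOME_PINNED handling, the running and
cache-hot checks, and some intermediate conditions are elided):

	/* can_migrate_task(), after this patch */
	if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
		return 0;

	if (!cpumask_test_cpu(env->dst_cpu, tsk_cpus_allowed(p)))
		return 0;	/* pinned: LBF_ALL_PINNED stays set */

	/* at least one task could run on dst_cpu */
	env->flags &= ~LBF_ALL_PINNED;
	/* ... running / cache-hot checks follow ... */

	/* move_tasks(), after this patch */
	if (!can_migrate_task(p, env))	/* affinity is seen (and the flag  */
		goto next;		/* cleared) even if the load check */
					/* below skips this task           */
	load = task_h_load(p);

	if ((load / 2) > env->imbalance)
		goto next;

	move_task(p, env);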

Acked-by: Peter Zijlstra <[email protected]>
Tested-by: Jason Low <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dfa92b7..b8ef321 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3896,10 +3896,14 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
int tsk_cache_hot = 0;
/*
* We do not migrate tasks that are:
- * 1) running (obviously), or
+ * 1) throttled_lb_pair, or
* 2) cannot be migrated to this CPU due to cpus_allowed, or
- * 3) are cache-hot on their current CPU.
+ * 3) running (obviously), or
+ * 4) are cache-hot on their current CPU.
*/
+ if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
+ return 0;
+
if (!cpumask_test_cpu(env->dst_cpu, tsk_cpus_allowed(p))) {
int new_dst_cpu;

@@ -3967,9 +3971,6 @@ static int move_one_task(struct lb_env *env)
struct task_struct *p, *n;

list_for_each_entry_safe(p, n, &env->src_rq->cfs_tasks, se.group_node) {
- if (throttled_lb_pair(task_group(p), env->src_rq->cpu, env->dst_cpu))
- continue;
-
if (!can_migrate_task(p, env))
continue;

@@ -4021,7 +4022,7 @@ static int move_tasks(struct lb_env *env)
break;
}

- if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
+ if (!can_migrate_task(p, env))
goto next;

load = task_h_load(p);
@@ -4032,9 +4033,6 @@ static int move_tasks(struct lb_env *env)
if ((load / 2) > env->imbalance)
goto next;

- if (!can_migrate_task(p, env))
- goto next;
-
move_task(p, env);
pulled++;
env->imbalance -= load;
--
1.7.9.5

2013-04-23 08:28:37

by Joonsoo Kim

Subject: [PATCH v3 5/6] sched: rename load_balance_tmpmask to load_balance_mask

The current name doesn't convey any specific meaning, so rename it to
reflect its purpose.

Acked-by: Peter Zijlstra <[email protected]>
Tested-by: Jason Low <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ee8c1bd..cb49b2a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6873,7 +6873,7 @@ struct task_group root_task_group;
LIST_HEAD(task_groups);
#endif

-DECLARE_PER_CPU(cpumask_var_t, load_balance_tmpmask);
+DECLARE_PER_CPU(cpumask_var_t, load_balance_mask);

void __init sched_init(void)
{
@@ -6910,7 +6910,7 @@ void __init sched_init(void)
#endif /* CONFIG_RT_GROUP_SCHED */
#ifdef CONFIG_CPUMASK_OFFSTACK
for_each_possible_cpu(i) {
- per_cpu(load_balance_tmpmask, i) = (void *)ptr;
+ per_cpu(load_balance_mask, i) = (void *)ptr;
ptr += cpumask_size();
}
#endif /* CONFIG_CPUMASK_OFFSTACK */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b8ef321..5b1e966 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4977,7 +4977,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
#define MAX_PINNED_INTERVAL 512

/* Working cpumask for load_balance and load_balance_newidle. */
-DEFINE_PER_CPU(cpumask_var_t, load_balance_tmpmask);
+DEFINE_PER_CPU(cpumask_var_t, load_balance_mask);

static int need_active_balance(struct lb_env *env)
{
@@ -5012,7 +5012,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
struct sched_group *group;
struct rq *busiest;
unsigned long flags;
- struct cpumask *cpus = __get_cpu_var(load_balance_tmpmask);
+ struct cpumask *cpus = __get_cpu_var(load_balance_mask);

struct lb_env env = {
.sd = sd,
--
1.7.9.5

2013-04-23 08:28:34

by Joonsoo Kim

Subject: [PATCH v3 6/6] sched: prevent to re-select dst-cpu in load_balance()

Commit 88b8dac0 makes load_balance() consider other cpus in its group,
but it contains no code to prevent re-selecting the dst-cpu, so the
same dst-cpu can be selected over and over.

This patch adds functionality to load_balance() to exclude a cpu once
it has been selected. Re-selection of the dst_cpu is prevented via
env's cpus, so env's cpus now serves as the candidate mask not only
for src_cpus but also for dst_cpus.

With this patch, we can remove lb_iterations and max_lb_iterations,
because whether we can go ahead is now decided via env's cpus.
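
Condensed, the two halves of the change fit together like this
(simplified from the hunks below): every cpu picked as a replacement
dst_cpu is removed from env.cpus, and new candidates are only taken
from dst_grpmask & env->cpus, so the LBF_SOME_PINNED retries are
naturally bounded by the group size and the iteration counters can go
away.

	/* can_migrate_task(): propose an alternative dst_cpu, but only
	 * from cpus that have not been used as dst_cpu yet */
	for_each_cpu_and(cpu, env->dst_grpmask, env->cpus) {
		if (cpumask_test_cpu(cpu, tsk_cpus_allowed(p))) {
			env->flags |= LBF_SOME_PINNED;
			env->new_dst_cpu = cpu;
			break;
		}
	}

	/* load_balance(): retry with the proposed cpu and retire it from
	 * the candidate set so it can never be proposed again */
	if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) {
		env.dst_rq = cpu_rq(env.new_dst_cpu);
		env.dst_cpu = env.new_dst_cpu;
		env.flags &= ~LBF_SOME_PINNED;
		env.loop = 0;
		cpumask_clear_cpu(env.dst_cpu, env.cpus);
		goto more_balance;
	}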

Cc: Srivatsa Vaddagiri <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Tested-by: Jason Low <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5b1e966..acaf567 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3905,7 +3905,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
return 0;

if (!cpumask_test_cpu(env->dst_cpu, tsk_cpus_allowed(p))) {
- int new_dst_cpu;
+ int cpu;

schedstat_inc(p, se.statistics.nr_failed_migrations_affine);

@@ -3920,12 +3920,15 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
if (!env->dst_grpmask || (env->flags & LBF_SOME_PINNED))
return 0;

- new_dst_cpu = cpumask_first_and(env->dst_grpmask,
- tsk_cpus_allowed(p));
- if (new_dst_cpu < nr_cpu_ids) {
- env->flags |= LBF_SOME_PINNED;
- env->new_dst_cpu = new_dst_cpu;
+ /* Prevent to re-select dst_cpu via env's cpus */
+ for_each_cpu_and(cpu, env->dst_grpmask, env->cpus) {
+ if (cpumask_test_cpu(cpu, tsk_cpus_allowed(p))) {
+ env->flags |= LBF_SOME_PINNED;
+ env->new_dst_cpu = cpu;
+ break;
+ }
}
+
return 0;
}

@@ -5008,7 +5011,6 @@ static int load_balance(int this_cpu, struct rq *this_rq,
int *balance)
{
int ld_moved, cur_ld_moved, active_balance = 0;
- int lb_iterations, max_lb_iterations;
struct sched_group *group;
struct rq *busiest;
unsigned long flags;
@@ -5028,15 +5030,8 @@ static int load_balance(int this_cpu, struct rq *this_rq,
* For NEWLY_IDLE load_balancing, we don't need to consider
* other cpus in our group
*/
- if (idle == CPU_NEWLY_IDLE) {
+ if (idle == CPU_NEWLY_IDLE)
env.dst_grpmask = NULL;
- /*
- * we don't care max_lb_iterations in this case,
- * in following patch, this will be removed
- */
- max_lb_iterations = 0;
- } else
- max_lb_iterations = cpumask_weight(env.dst_grpmask);

cpumask_copy(cpus, cpu_active_mask);

@@ -5064,7 +5059,6 @@ redo:
schedstat_add(sd, lb_imbalance[idle], env.imbalance);

ld_moved = 0;
- lb_iterations = 1;
if (busiest->nr_running > 1) {
/*
* Attempt to move tasks. If find_busiest_group has found
@@ -5121,14 +5115,17 @@ more_balance:
* moreover subsequent load balance cycles should correct the
* excess load moved.
*/
- if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0 &&
- lb_iterations++ < max_lb_iterations) {
+ if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) {

env.dst_rq = cpu_rq(env.new_dst_cpu);
env.dst_cpu = env.new_dst_cpu;
env.flags &= ~LBF_SOME_PINNED;
env.loop = 0;
env.loop_break = sched_nr_migrate_break;
+
+ /* Prevent to re-select dst_cpu via env's cpus */
+ cpumask_clear_cpu(env.dst_cpu, env.cpus);
+
/*
* Go back to "more_balance" rather than "redo" since we
* need to continue with same src_cpu.
--
1.7.9.5

2013-04-23 08:29:50

by Joonsoo Kim

Subject: [PATCH v3 3/6] sched: don't consider other cpus in our group in case of NEWLY_IDLE

Commit 88b8dac0 makes load_balance() consider other cpus in its group
regardless of idle type. For NEWLY_IDLE balancing we should not do
this, because the motivation of NEWLY_IDLE balancing is to pull this
cpu out of the idle state if needed; that is not the case for the
other cpus. So, change the code not to consider other cpus for
NEWLY_IDLE balancing.

With this patch, the assignment 'if (pulled_task) this_rq->idle_stamp
= 0' in idle_balance() becomes correct: since NEWLY_IDLE balancing no
longer considers other cpus, a non-zero pulled_task means tasks were
pulled to this cpu, so clearing 'this_rq->idle_stamp' is now valid.
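
For reference, the idle_balance() logic the changelog refers to looks
roughly like this (condensed and reconstructed from memory of this
era's kernel/sched/fair.c, so treat the exact statements as
approximate):

void idle_balance(int this_cpu, struct rq *this_rq)
{
	...
	this_rq->idle_stamp = this_rq->clock;	/* mark when we went idle */
	...
	for_each_domain(this_cpu, sd) {
		...
		if (sd->flags & SD_BALANCE_NEWIDLE)
			pulled_task = load_balance(this_cpu, this_rq,
						   sd, CPU_NEWLY_IDLE, &balance);
		...
		if (pulled_task) {
			/*
			 * With this patch, NEWLY_IDLE balancing can no longer
			 * move tasks to some other cpu of the group, so a
			 * non-zero pulled_task really means this cpu got work
			 * and clearing the stamp is justified.
			 */
			this_rq->idle_stamp = 0;
			break;
		}
	}
	...
}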

Cc: Srivatsa Vaddagiri <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Tested-by: Jason Low <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 726e129..dfa92b7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5026,8 +5026,21 @@ static int load_balance(int this_cpu, struct rq *this_rq,
.cpus = cpus,
};

+ /*
+ * For NEWLY_IDLE load_balancing, we don't need to consider
+ * other cpus in our group
+ */
+ if (idle == CPU_NEWLY_IDLE) {
+ env.dst_grpmask = NULL;
+ /*
+ * we don't care max_lb_iterations in this case,
+ * in following patch, this will be removed
+ */
+ max_lb_iterations = 0;
+ } else
+ max_lb_iterations = cpumask_weight(env.dst_grpmask);
+
cpumask_copy(cpus, cpu_active_mask);
- max_lb_iterations = cpumask_weight(env.dst_grpmask);

schedstat_inc(sd, lb_count[idle]);

--
1.7.9.5

Subject: [tip:sched/core] sched: Change position of resched_cpu() in load_balance()

Commit-ID: f1cd0858100c67273f2c74344e0c464344c4a982
Gitweb: http://git.kernel.org/tip/f1cd0858100c67273f2c74344e0c464344c4a982
Author: Joonsoo Kim <[email protected]>
AuthorDate: Tue, 23 Apr 2013 17:27:37 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 24 Apr 2013 08:52:43 +0200

sched: Change position of resched_cpu() in load_balance()

cur_ld_moved is reset when env.flags has LBF_NEED_BREAK set, so there
is a possibility that we miss calling resched_cpu(). Correct this by
moving the resched_cpu() call before the LBF_NEED_BREAK check.

Signed-off-by: Joonsoo Kim <[email protected]>
Tested-by: Jason Low <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Srivatsa Vaddagiri <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/fair.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1c97735..25aaf93 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5080,17 +5080,17 @@ more_balance:
double_rq_unlock(env.dst_rq, busiest);
local_irq_restore(flags);

- if (env.flags & LBF_NEED_BREAK) {
- env.flags &= ~LBF_NEED_BREAK;
- goto more_balance;
- }
-
/*
* some other cpu did the load balance for us.
*/
if (cur_ld_moved && env.dst_cpu != smp_processor_id())
resched_cpu(env.dst_cpu);

+ if (env.flags & LBF_NEED_BREAK) {
+ env.flags &= ~LBF_NEED_BREAK;
+ goto more_balance;
+ }
+
/*
* Revisit (affine) tasks on src_cpu that couldn't be moved to
* us and move them to an alternate dst_cpu in our sched_group

Subject: [tip:sched/core] sched: Explicitly cpu_idle_type checking in rebalance_domains()

Commit-ID: de5eb2dd7f171ee8a45d23cd41aa2efe9ab922b3
Gitweb: http://git.kernel.org/tip/de5eb2dd7f171ee8a45d23cd41aa2efe9ab922b3
Author: Joonsoo Kim <[email protected]>
AuthorDate: Tue, 23 Apr 2013 17:27:38 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 24 Apr 2013 08:52:43 +0200

sched: Explicitly cpu_idle_type checking in rebalance_domains()

After commit 88b8dac0, the dst-cpu can be changed in load_balance(), so
we can't know the cpu_idle_type of the dst-cpu when load_balance()
returns a positive value. Add an explicit cpu_idle_type check.

Signed-off-by: Joonsoo Kim <[email protected]>
Tested-by: Jason Low <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Srivatsa Vaddagiri <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/fair.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 25aaf93..726e129 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5523,10 +5523,11 @@ static void rebalance_domains(int cpu, enum cpu_idle_type idle)
if (time_after_eq(jiffies, sd->last_balance + interval)) {
if (load_balance(cpu, rq, sd, idle, &balance)) {
/*
- * We've pulled tasks over so either we're no
- * longer idle.
+ * The LBF_SOME_PINNED logic could have changed
+ * env->dst_cpu, so we can't know our idle
+ * state even if we migrated tasks. Update it.
*/
- idle = CPU_NOT_IDLE;
+ idle = idle_cpu(cpu) ? CPU_IDLE : CPU_NOT_IDLE;
}
sd->last_balance = jiffies;
}

Subject: [tip:sched/core] sched: Don't consider other cpus in our group in case of NEWLY_IDLE

Commit-ID: cfc03118047172f5bdc58d63c607d16d33ce5305
Gitweb: http://git.kernel.org/tip/cfc03118047172f5bdc58d63c607d16d33ce5305
Author: Joonsoo Kim <[email protected]>
AuthorDate: Tue, 23 Apr 2013 17:27:39 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 24 Apr 2013 08:52:44 +0200

sched: Don't consider other cpus in our group in case of NEWLY_IDLE

Commit 88b8dac0 makes load_balance() consider other cpus in its
group regardless of idle type. For NEWLY_IDLE balancing we should
not do this, because the motivation of NEWLY_IDLE balancing is to
pull this cpu out of the idle state if needed; that is not the
case for the other cpus. So, change the code not to consider
other cpus for NEWLY_IDLE balancing.

With this patch, the assignment 'if (pulled_task)
this_rq->idle_stamp = 0' in idle_balance() becomes correct: since
NEWLY_IDLE balancing no longer considers other cpus, a non-zero
pulled_task means tasks were pulled to this cpu, so clearing
'this_rq->idle_stamp' is now valid.

Signed-off-by: Joonsoo Kim <[email protected]>
Tested-by: Jason Low <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Srivatsa Vaddagiri <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/fair.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 726e129..dfa92b7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5026,8 +5026,21 @@ static int load_balance(int this_cpu, struct rq *this_rq,
.cpus = cpus,
};

+ /*
+ * For NEWLY_IDLE load_balancing, we don't need to consider
+ * other cpus in our group
+ */
+ if (idle == CPU_NEWLY_IDLE) {
+ env.dst_grpmask = NULL;
+ /*
+ * we don't care max_lb_iterations in this case,
+ * in following patch, this will be removed
+ */
+ max_lb_iterations = 0;
+ } else
+ max_lb_iterations = cpumask_weight(env.dst_grpmask);
+
cpumask_copy(cpus, cpu_active_mask);
- max_lb_iterations = cpumask_weight(env.dst_grpmask);

schedstat_inc(sd, lb_count[idle]);

Subject: [tip:sched/core] sched: Move up affinity check to mitigate useless redoing overhead

Commit-ID: d31980846f9688db3ee3e5863525c6ff8ace4c7c
Gitweb: http://git.kernel.org/tip/d31980846f9688db3ee3e5863525c6ff8ace4c7c
Author: Joonsoo Kim <[email protected]>
AuthorDate: Tue, 23 Apr 2013 17:27:40 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 24 Apr 2013 08:52:44 +0200

sched: Move up affinity check to mitigate useless redoing overhead

Currently, LBF_ALL_PINNED is cleared only after the affinity
check has passed. So, if task migration is skipped in
move_tasks() because of a small load value or a small imbalance
value, we never clear LBF_ALL_PINNED and end up triggering a
'redo' in load_balance().

The imbalance value is often so small that no task can be moved
to another cpu, and this situation may persist even after we
change the target cpu. So this patch moves the affinity check up
and clears LBF_ALL_PINNED before the load value is evaluated, in
order to mitigate the useless redo overhead.

In addition, the related comments are re-ordered to match the new
check order.

Signed-off-by: Joonsoo Kim <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Tested-by: Jason Low <[email protected]>
Cc: Srivatsa Vaddagiri <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/fair.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dfa92b7..b8ef321 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3896,10 +3896,14 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
int tsk_cache_hot = 0;
/*
* We do not migrate tasks that are:
- * 1) running (obviously), or
+ * 1) throttled_lb_pair, or
* 2) cannot be migrated to this CPU due to cpus_allowed, or
- * 3) are cache-hot on their current CPU.
+ * 3) running (obviously), or
+ * 4) are cache-hot on their current CPU.
*/
+ if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
+ return 0;
+
if (!cpumask_test_cpu(env->dst_cpu, tsk_cpus_allowed(p))) {
int new_dst_cpu;

@@ -3967,9 +3971,6 @@ static int move_one_task(struct lb_env *env)
struct task_struct *p, *n;

list_for_each_entry_safe(p, n, &env->src_rq->cfs_tasks, se.group_node) {
- if (throttled_lb_pair(task_group(p), env->src_rq->cpu, env->dst_cpu))
- continue;
-
if (!can_migrate_task(p, env))
continue;

@@ -4021,7 +4022,7 @@ static int move_tasks(struct lb_env *env)
break;
}

- if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
+ if (!can_migrate_task(p, env))
goto next;

load = task_h_load(p);
@@ -4032,9 +4033,6 @@ static int move_tasks(struct lb_env *env)
if ((load / 2) > env->imbalance)
goto next;

- if (!can_migrate_task(p, env))
- goto next;
-
move_task(p, env);
pulled++;
env->imbalance -= load;

Subject: [tip:sched/core] sched: Prevent to re-select dst-cpu in load_balance()

Commit-ID: e02e60c109ca70935bad1131976bdbf5160cf576
Gitweb: http://git.kernel.org/tip/e02e60c109ca70935bad1131976bdbf5160cf576
Author: Joonsoo Kim <[email protected]>
AuthorDate: Tue, 23 Apr 2013 17:27:42 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 24 Apr 2013 08:52:46 +0200

sched: Prevent to re-select dst-cpu in load_balance()

Commit 88b8dac0 makes load_balance() consider other cpus in its
group, but it contains no code to prevent re-selecting the
dst-cpu, so the same dst-cpu can be selected over and over.

This patch adds functionality to load_balance() to exclude a cpu
once it has been selected. Re-selection of the dst_cpu is
prevented via env's cpus, so env's cpus now serves as the
candidate mask not only for src_cpus but also for dst_cpus.

With this patch, we can remove lb_iterations and
max_lb_iterations, because whether we can go ahead is now decided
via env's cpus.

Signed-off-by: Joonsoo Kim <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Tested-by: Jason Low <[email protected]>
Cc: Srivatsa Vaddagiri <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/fair.c | 33 +++++++++++++++------------------
1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5b1e966..acaf567 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3905,7 +3905,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
return 0;

if (!cpumask_test_cpu(env->dst_cpu, tsk_cpus_allowed(p))) {
- int new_dst_cpu;
+ int cpu;

schedstat_inc(p, se.statistics.nr_failed_migrations_affine);

@@ -3920,12 +3920,15 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
if (!env->dst_grpmask || (env->flags & LBF_SOME_PINNED))
return 0;

- new_dst_cpu = cpumask_first_and(env->dst_grpmask,
- tsk_cpus_allowed(p));
- if (new_dst_cpu < nr_cpu_ids) {
- env->flags |= LBF_SOME_PINNED;
- env->new_dst_cpu = new_dst_cpu;
+ /* Prevent to re-select dst_cpu via env's cpus */
+ for_each_cpu_and(cpu, env->dst_grpmask, env->cpus) {
+ if (cpumask_test_cpu(cpu, tsk_cpus_allowed(p))) {
+ env->flags |= LBF_SOME_PINNED;
+ env->new_dst_cpu = cpu;
+ break;
+ }
}
+
return 0;
}

@@ -5008,7 +5011,6 @@ static int load_balance(int this_cpu, struct rq *this_rq,
int *balance)
{
int ld_moved, cur_ld_moved, active_balance = 0;
- int lb_iterations, max_lb_iterations;
struct sched_group *group;
struct rq *busiest;
unsigned long flags;
@@ -5028,15 +5030,8 @@ static int load_balance(int this_cpu, struct rq *this_rq,
* For NEWLY_IDLE load_balancing, we don't need to consider
* other cpus in our group
*/
- if (idle == CPU_NEWLY_IDLE) {
+ if (idle == CPU_NEWLY_IDLE)
env.dst_grpmask = NULL;
- /*
- * we don't care max_lb_iterations in this case,
- * in following patch, this will be removed
- */
- max_lb_iterations = 0;
- } else
- max_lb_iterations = cpumask_weight(env.dst_grpmask);

cpumask_copy(cpus, cpu_active_mask);

@@ -5064,7 +5059,6 @@ redo:
schedstat_add(sd, lb_imbalance[idle], env.imbalance);

ld_moved = 0;
- lb_iterations = 1;
if (busiest->nr_running > 1) {
/*
* Attempt to move tasks. If find_busiest_group has found
@@ -5121,14 +5115,17 @@ more_balance:
* moreover subsequent load balance cycles should correct the
* excess load moved.
*/
- if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0 &&
- lb_iterations++ < max_lb_iterations) {
+ if ((env.flags & LBF_SOME_PINNED) && env.imbalance > 0) {

env.dst_rq = cpu_rq(env.new_dst_cpu);
env.dst_cpu = env.new_dst_cpu;
env.flags &= ~LBF_SOME_PINNED;
env.loop = 0;
env.loop_break = sched_nr_migrate_break;
+
+ /* Prevent to re-select dst_cpu via env's cpus */
+ cpumask_clear_cpu(env.dst_cpu, env.cpus);
+
/*
* Go back to "more_balance" rather than "redo" since we
* need to continue with same src_cpu.

Subject: [tip:sched/core] sched: Rename load_balance_tmpmask to load_balance_mask

Commit-ID: e6252c3ef4b9cd251b53f7b68035f395d20b044e
Gitweb: http://git.kernel.org/tip/e6252c3ef4b9cd251b53f7b68035f395d20b044e
Author: Joonsoo Kim <[email protected]>
AuthorDate: Tue, 23 Apr 2013 17:27:41 +0900
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 24 Apr 2013 08:52:45 +0200

sched: Rename load_balance_tmpmask to load_balance_mask

The current name doesn't convey any specific meaning, so rename it
to reflect its purpose.

Signed-off-by: Joonsoo Kim <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Tested-by: Jason Low <[email protected]>
Cc: Srivatsa Vaddagiri <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/core.c | 4 ++--
kernel/sched/fair.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ee8c1bd..cb49b2a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6873,7 +6873,7 @@ struct task_group root_task_group;
LIST_HEAD(task_groups);
#endif

-DECLARE_PER_CPU(cpumask_var_t, load_balance_tmpmask);
+DECLARE_PER_CPU(cpumask_var_t, load_balance_mask);

void __init sched_init(void)
{
@@ -6910,7 +6910,7 @@ void __init sched_init(void)
#endif /* CONFIG_RT_GROUP_SCHED */
#ifdef CONFIG_CPUMASK_OFFSTACK
for_each_possible_cpu(i) {
- per_cpu(load_balance_tmpmask, i) = (void *)ptr;
+ per_cpu(load_balance_mask, i) = (void *)ptr;
ptr += cpumask_size();
}
#endif /* CONFIG_CPUMASK_OFFSTACK */
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b8ef321..5b1e966 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4977,7 +4977,7 @@ static struct rq *find_busiest_queue(struct lb_env *env,
#define MAX_PINNED_INTERVAL 512

/* Working cpumask for load_balance and load_balance_newidle. */
-DEFINE_PER_CPU(cpumask_var_t, load_balance_tmpmask);
+DEFINE_PER_CPU(cpumask_var_t, load_balance_mask);

static int need_active_balance(struct lb_env *env)
{
@@ -5012,7 +5012,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
struct sched_group *group;
struct rq *busiest;
unsigned long flags;
- struct cpumask *cpus = __get_cpu_var(load_balance_tmpmask);
+ struct cpumask *cpus = __get_cpu_var(load_balance_mask);

struct lb_env env = {
.sd = sd,