2013-08-18 08:25:59

by Lei Wen

Subject: [PATCH 0/8] sched: fixes for the nr_running usage

Since nr_running and h_nr_running differ in what they represent, we should
take care how each of them is used in the scheduler.
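
As a rough illustration (hypothetical layout, not taken from any of the
patches below): with task A queued directly on the root cfs_rq and tasks
B and C inside a child task group, the root cfs_rq sees

    nr_running   == 2    (task A plus the group's sched entity)
    h_nr_running == 3    (tasks A, B and C)

so nr_running counts the entities directly enqueued on a cfs_rq, while
h_nr_running counts every task in the hierarchy below it.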

Lei Wen (8):
sched: change load balance number to h_nr_running of run queue
sched: change cpu_avg_load_per_task using h_nr_running
sched: change update_rq_runnable_avg using h_nr_running
sched: change pick_next_task_fair to h_nr_running
sched: change update_sg_lb_stats to h_nr_running
sched: change find_busiest_queue to h_nr_running
sched: change active_load_balance_cpu_stop to use h_nr_running
sched: document the difference between nr_running and h_nr_running

kernel/sched/fair.c | 23 +++++++++++++----------
kernel/sched/sched.h | 6 ++++++
2 files changed, 19 insertions(+), 10 deletions(-)

--
1.7.5.4


2013-08-18 08:26:03

by Lei Wen

Subject: [PATCH 1/8] sched: change load balance number to h_nr_running of run queue

Since rq->nr_running includes both migration and rt tasks, it is not
reasonable for load_balance() to try to move nr_running tasks, as it
only applies to cfs tasks.

Change it to cfs's h_nr_running, which correctly represents the number
of tasks on the current cfs queue.
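
For example (hypothetical counts), a runqueue with two rt tasks and no
cfs task has rq->nr_running == 2, so the old busiest->nr_running > 1
test lets load_balance() attempt to move tasks even though
cfs.h_nr_running is 0 and there is nothing it can pull.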

Signed-off-by: Lei Wen <[email protected]>
---
kernel/sched/fair.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f918635..d6153c8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5096,17 +5096,19 @@ redo:
schedstat_add(sd, lb_imbalance[idle], env.imbalance);

ld_moved = 0;
- if (busiest->nr_running > 1) {
+ /* load balance only applies to CFS tasks, so use h_nr_running here */
+ if (busiest->cfs.h_nr_running > 1) {
/*
* Attempt to move tasks. If find_busiest_group has found
- * an imbalance but busiest->nr_running <= 1, the group is
+ * an imbalance but busiest->cfs.h_nr_running <= 1, the group is
* still unbalanced. ld_moved simply stays zero, so it is
* correctly treated as an imbalance.
*/
env.flags |= LBF_ALL_PINNED;
env.src_cpu = busiest->cpu;
env.src_rq = busiest;
- env.loop_max = min(sysctl_sched_nr_migrate, busiest->nr_running);
+ env.loop_max = min(sysctl_sched_nr_migrate,
+ busiest->cfs.h_nr_running);

update_h_load(env.src_cpu);
more_balance:
--
1.7.5.4

2013-08-18 08:26:32

by Lei Wen

Subject: [PATCH 3/8] sched: change update_rq_runnable_avg using h_nr_running

Since update_rq_runnable_avg() is used only by the cfs scheduler, it
should not take tasks beyond the cfs type into account.

If one cfs task runs alongside one rt task, the cfs task should not be
aware of the rt task's existence; it should simply behave as if it were
occasionally throttled by some bandwidth control mechanism. Thus its
sleep time should not be taken into the runnable average load
calculation.
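
For example, when every cfs task on a cpu is sleeping and only an rt
task is runnable, rq->nr_running is still non-zero and the old code
keeps accounting that period as runnable in the rq-wide average; with
cfs.h_nr_running the same period is treated as idle from cfs's point
of view.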

Signed-off-by: Lei Wen <[email protected]>
---
kernel/sched/fair.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e6b99b4..9869d4d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2893,7 +2893,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
}

if (!se) {
- update_rq_runnable_avg(rq, rq->nr_running);
+ update_rq_runnable_avg(rq, rq->cfs.h_nr_running);
inc_nr_running(rq);
}
hrtick_update(rq);
@@ -4142,7 +4142,7 @@ static void __update_blocked_averages_cpu(struct task_group *tg, int cpu)
list_del_leaf_cfs_rq(cfs_rq);
} else {
struct rq *rq = rq_of(cfs_rq);
- update_rq_runnable_avg(rq, rq->nr_running);
+ update_rq_runnable_avg(rq, rq->cfs.h_nr_running);
}
}

--
1.7.5.4

2013-08-18 08:26:29

by Lei Wen

Subject: [PATCH 2/8] sched: change cpu_avg_load_per_task using h_nr_running

Since cpu_avg_load_per_task() is used only by the cfs scheduler, it
should represent the average load of the cfs tasks on the current run
queue. Thus change it to use h_nr_running, which better matches that
meaning.
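
For example (hypothetical numbers), with one cfs task contributing a
cfs.runnable_load_avg of 1024 plus one rt task on the same cpu,
dividing by rq->nr_running (2) reports an average per-task load of
512, while dividing by cfs.h_nr_running (1) reports the expected 1024.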

Signed-off-by: Lei Wen <[email protected]>
---
kernel/sched/fair.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d6153c8..e6b99b4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3008,7 +3008,7 @@ static unsigned long power_of(int cpu)
static unsigned long cpu_avg_load_per_task(int cpu)
{
struct rq *rq = cpu_rq(cpu);
- unsigned long nr_running = ACCESS_ONCE(rq->nr_running);
+ unsigned long nr_running = ACCESS_ONCE(rq->cfs.h_nr_running);
unsigned long load_avg = rq->cfs.runnable_load_avg;

if (nr_running)
--
1.7.5.4

2013-08-18 08:26:58

by Lei Wen

Subject: [PATCH 4/8] sched: change pick_next_task_fair to h_nr_running

Since pick_next_task_fair() only needs to know whether there is any task
in the run queue to pick, it should use h_nr_running instead of
nr_running, since nr_running does not count all tasks when task groups
exist.
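
For example, with two tasks inside a single task group and nothing
else runnable, the root cfs_rq has nr_running == 1 (only the group's
sched entity is queued on it) while h_nr_running == 2; the latter
directly reflects how many tasks are available to pick.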

Signed-off-by: Lei Wen <[email protected]>
---
kernel/sched/fair.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9869d4d..33576eb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3653,7 +3653,7 @@ static struct task_struct *pick_next_task_fair(struct rq *rq)
struct cfs_rq *cfs_rq = &rq->cfs;
struct sched_entity *se;

- if (!cfs_rq->nr_running)
+ if (!cfs_rq->h_nr_running)
return NULL;

do {
--
1.7.5.4

2013-08-18 08:27:01

by Lei Wen

Subject: [PATCH 5/8] sched: change update_sg_lb_stats to h_nr_running

Since update_sg_lb_stats() collects the load statistics of a sched_group
for cfs load balancing, it should use h_nr_running instead of
rq->nr_running.

Signed-off-by: Lei Wen <[email protected]>
---
kernel/sched/fair.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 33576eb..e026001 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4488,7 +4488,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
for_each_cpu_and(i, sched_group_cpus(group), env->cpus) {
struct rq *rq = cpu_rq(i);

- nr_running = rq->nr_running;
+ nr_running = rq->cfs.h_nr_running;

/* Bias balancing toward cpus of our domain */
if (local_group) {
--
1.7.5.4

2013-08-18 08:27:26

by Lei Wen

Subject: [PATCH 7/8] sched: change active_load_balance_cpu_stop to use h_nr_running

We should only skip active load balancing when there is no cfs task to
move. If we just use rq->nr_running, the source cpu may have several rt
tasks but zero cfs tasks, which would confuse the active load balance
code: it would try to move a task but find none that it could move.
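
For example, if the busiest cpu runs two rt tasks and no cfs task,
busiest_rq->nr_running is 2, so the old busiest_rq->nr_running <= 1
check lets the active balance proceed even though there is nothing
for it to pull; checking cfs.h_nr_running == 0 bails out in that case.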

Signed-off-by: Lei Wen <[email protected]>
---
kernel/sched/fair.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3656603..4c96124 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5349,7 +5349,7 @@ static int active_load_balance_cpu_stop(void *data)
goto out_unlock;

/* Is there any task to move? */
- if (busiest_rq->nr_running <= 1)
+ if (busiest_rq->cfs.h_nr_running == 0)
goto out_unlock;

/*
--
1.7.5.4

2013-08-18 08:27:47

by Lei Wen

Subject: [PATCH 6/8] sched: change find_busiest_queue to h_nr_running

Since find_busiest_queue() tries to skip load balancing a runqueue that
has only one cfs task whose load is above the calculated imbalance
value, we should use cfs.h_nr_running instead of rq->nr_running.

Signed-off-by: Lei Wen <[email protected]>
---
kernel/sched/fair.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e026001..3656603 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4990,7 +4990,8 @@ static struct rq *find_busiest_queue(struct lb_env *env,
* When comparing with imbalance, use weighted_cpuload()
* which is not scaled with the cpu power.
*/
- if (capacity && rq->nr_running == 1 && wl > env->imbalance)
+ if (capacity && rq->cfs.h_nr_running == 1
+ && wl > env->imbalance)
continue;

/*
--
1.7.5.4

2013-08-18 08:29:21

by Lei Wen

Subject: [PATCH 8/8] sched: document the difference between nr_running and h_nr_running

Signed-off-by: Lei Wen <[email protected]>
---
kernel/sched/sched.h | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ef0a7b2..b8f0924 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -248,6 +248,12 @@ struct cfs_bandwidth { };
/* CFS-related fields in a runqueue */
struct cfs_rq {
struct load_weight load;
+ /*
+ * The difference between nr_running and h_nr_running is:
+ * nr_running: how many sched entities (tasks or group entities) are
+ * queued directly on this cfs_rq and share its cpu power
+ * h_nr_running: how many tasks are on this cfs_rq and its children
+ */
unsigned int nr_running, h_nr_running;

u64 exec_clock;
--
1.7.5.4