This patchset is based on tip/sched/core.
This version changes how the runnable load avg value is set for a new
task in the 3rd patch.
We also tried to include the blocked load avg in balancing, but found that
many benchmarks lost performance. Our guess is that the inflated cpu load
causes tasks to be woken on remote CPUs and leads to wrong decisions in
periodic balancing.
I retested on Intel Core2, NHM, SNB and IVB machines with 2 and 4 sockets,
using the kbuild, aim7, dbench, tbench, hackbench, oltp and netperf loopback
benchmarks, etc. The performance is better now.
On the 4-socket SNB EP machine, hackbench improved by about 50% and the
results became stable. On other machines, hackbench improved by about 2~10%.
oltp improved by about 30% on the NHM EX box.
netperf loopback also improved on the 4-socket SNB EP box.
No clear changes on the other benchmarks.
Michael Wang tested a previous version with pgbench on his box:
https://lkml.org/lkml/2013/4/2/1022
And Morten tested a previous version too.
http://comments.gmane.org/gmane.linux.kernel/1463371
Thanks for the comments from Peter, Paul, Morten, Michael and Preeti.
And more comments are appreciated!
Regards
Alex
[patch v6 1/8] Revert "sched: Introduce temporary FAIR_GROUP_SCHED
[patch v6 2/8] sched: move few runnable tg variables into CONFIG_SMP
[patch v6 3/8] sched: set initial value of runnable avg for new
[patch v6 4/8] sched: fix slept time double counting in enqueue
[patch v6 5/8] sched: update cpu load after task_tick.
[patch v6 6/8] sched: compute runnable load avg in cpu_load and
[patch v6 7/8] sched: consider runnable load average in move_tasks
[patch v6 8/8] sched: remove blocked_load_avg in tg
Remove the CONFIG_FAIR_GROUP_SCHED dependency that guards the runnable
load-tracking info, so that we can use the runnable load variables.
Signed-off-by: Alex Shi <[email protected]>
---
include/linux/sched.h | 7 +------
kernel/sched/core.c | 7 +------
kernel/sched/fair.c | 13 ++-----------
kernel/sched/sched.h | 10 ++--------
4 files changed, 6 insertions(+), 31 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e692a02..9539597 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1161,12 +1161,7 @@ struct sched_entity {
struct cfs_rq *my_q;
#endif
-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
/* Per-entity load-tracking */
struct sched_avg avg;
#endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 67d0465..c8db984 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1563,12 +1563,7 @@ static void __sched_fork(struct task_struct *p)
p->se.vruntime = 0;
INIT_LIST_HEAD(&p->se.group_node);
-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
p->se.avg.runnable_avg_period = 0;
p->se.avg.runnable_avg_sum = 0;
#endif
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a33e59..9c2f726 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1109,8 +1109,7 @@ static inline void update_cfs_shares(struct cfs_rq *cfs_rq)
}
#endif /* CONFIG_FAIR_GROUP_SCHED */
-/* Only depends on SMP, FAIR_GROUP_SCHED may be removed when useful in lb */
-#if defined(CONFIG_SMP) && defined(CONFIG_FAIR_GROUP_SCHED)
+#ifdef CONFIG_SMP
/*
* We choose a half-life close to 1 scheduling period.
* Note: The tables below are dependent on this value.
@@ -3394,12 +3393,6 @@ unlock:
}
/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#ifdef CONFIG_FAIR_GROUP_SCHED
-/*
* Called immediately before a task is migrated to a new cpu; task_cpu(p) and
* cfs_rq_of(p) references at time of call are still valid and identify the
* previous cpu. However, the caller only guarantees p->pi_lock is held; no
@@ -3422,7 +3415,6 @@ migrate_task_rq_fair(struct task_struct *p, int next_cpu)
atomic64_add(se->avg.load_avg_contrib, &cfs_rq->removed_load);
}
}
-#endif
#endif /* CONFIG_SMP */
static unsigned long
@@ -6114,9 +6106,8 @@ const struct sched_class fair_sched_class = {
#ifdef CONFIG_SMP
.select_task_rq = select_task_rq_fair,
-#ifdef CONFIG_FAIR_GROUP_SCHED
.migrate_task_rq = migrate_task_rq_fair,
-#endif
+
.rq_online = rq_online_fair,
.rq_offline = rq_offline_fair,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cc03cfd..9419764 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -227,12 +227,6 @@ struct cfs_rq {
#endif
#ifdef CONFIG_SMP
-/*
- * Load-tracking only depends on SMP, FAIR_GROUP_SCHED dependency below may be
- * removed when useful for applications beyond shares distribution (e.g.
- * load-balance).
- */
-#ifdef CONFIG_FAIR_GROUP_SCHED
/*
* CFS Load tracking
* Under CFS, load is tracked on a per-entity basis and aggregated up.
@@ -242,9 +236,9 @@ struct cfs_rq {
u64 runnable_load_avg, blocked_load_avg;
atomic64_t decay_counter, removed_load;
u64 last_decay;
-#endif /* CONFIG_FAIR_GROUP_SCHED */
-/* These always depend on CONFIG_FAIR_GROUP_SCHED */
+
#ifdef CONFIG_FAIR_GROUP_SCHED
+ /* Required to track per-cpu representation of a task_group */
u32 tg_runnable_contrib;
u64 tg_load_contrib;
#endif /* CONFIG_FAIR_GROUP_SCHED */
--
1.7.5.4
We need to initialize se.avg.{decay_count, load_avg_contrib} for a
newly forked task.
Otherwise the random values of those variables cause a mess when the new
task is enqueued:
    enqueue_task_fair
        enqueue_entity
            enqueue_entity_load_avg
and make fork balancing go wrong because of the incorrect load_avg_contrib.
Furthermore, Morten Rasmussen noticed that some tasks were not launched
immediately after being created. So Paul and Peter suggested giving a new
task a starting runnable avg time equal to sched_slice().
Signed-off-by: Alex Shi <[email protected]>
---
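For illustration only (not part of the patch): assuming sched_slice()
returns about 4ms for the new task, the initialization below roughly
amounts to the following; the numbers are hypothetical.

/* hypothetical 4ms slice, expressed in 1024ns units */
u32 slice = 4000000 >> 10;		/* ~3906 */

p->se.avg.decay_count = 0;		/* no decay pending for a new task */
p->se.avg.runnable_avg_sum = slice;	/* "ran" for the whole window ...  */
p->se.avg.runnable_avg_period = slice;	/* ... of exactly one window       */

/*
 * With sum == period, __update_task_entity_contrib() should yield a
 * load_avg_contrib close to se->load.weight, i.e. the infant task is
 * weighted as fully runnable until real history accumulates.
 */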
kernel/sched/core.c | 6 ++----
kernel/sched/fair.c | 23 +++++++++++++++++++++++
kernel/sched/sched.h | 2 ++
3 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c8db984..866c05a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1563,10 +1563,6 @@ static void __sched_fork(struct task_struct *p)
p->se.vruntime = 0;
INIT_LIST_HEAD(&p->se.group_node);
-#ifdef CONFIG_SMP
- p->se.avg.runnable_avg_period = 0;
- p->se.avg.runnable_avg_sum = 0;
-#endif
#ifdef CONFIG_SCHEDSTATS
memset(&p->se.statistics, 0, sizeof(p->se.statistics));
#endif
@@ -1710,6 +1706,8 @@ void wake_up_new_task(struct task_struct *p)
set_task_cpu(p, select_task_rq(p, SD_BALANCE_FORK, 0));
#endif
+ /* Give new task start runnable values */
+ set_task_runnable_avg(p);
rq = __task_rq_lock(p);
activate_task(rq, p, 0);
p->on_rq = 1;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9c2f726..203f236 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -661,6 +661,26 @@ static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se)
return calc_delta_fair(sched_slice(cfs_rq, se), se);
}
+#ifdef CONFIG_SMP
+static inline void __update_task_entity_contrib(struct sched_entity *se);
+
+/* Give new task start runnable values to heavy its load in infant time */
+void set_task_runnable_avg(struct task_struct *p)
+{
+ u32 slice;
+
+ p->se.avg.decay_count = 0;
+ slice = sched_slice(task_cfs_rq(p), &p->se) >> 10;
+ p->se.avg.runnable_avg_sum = slice;
+ p->se.avg.runnable_avg_period = slice;
+ __update_task_entity_contrib(&p->se);
+}
+#else
+void set_task_runnable_avg(struct task_struct *p)
+{
+}
+#endif
+
/*
* Update the current task's runtime statistics. Skip current tasks that
* are not in our scheduling class.
@@ -1508,6 +1528,9 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
* We track migrations using entity decay_count <= 0, on a wake-up
* migration we use a negative decay count to track the remote decays
* accumulated while sleeping.
+ *
+ * When enqueue a new forked task, the se->avg.decay_count == 0, so
+ * we bypass update_entity_load_avg(), use avg.load_avg_contrib direct.
*/
if (unlikely(se->avg.decay_count <= 0)) {
se->avg.last_runnable_update = rq_of(cfs_rq)->clock_task;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c6634f1..518f3d8a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -900,6 +900,8 @@ extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime
extern void update_idle_cpu_load(struct rq *this_rq);
+extern void set_task_runnable_avg(struct task_struct *p);
+
#ifdef CONFIG_CGROUP_CPUACCT
#include <linux/cgroup.h>
/* track cpu usage of a group of tasks and its child groups */
--
1.7.5.4
The following 2 variables are only used under CONFIG_SMP, so it is better
to move their definitions under CONFIG_SMP too.
atomic64_t load_avg;
atomic_t runnable_avg;
Signed-off-by: Alex Shi <[email protected]>
---
kernel/sched/sched.h | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9419764..c6634f1 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -114,9 +114,11 @@ struct task_group {
unsigned long shares;
atomic_t load_weight;
+#ifdef CONFIG_SMP
atomic64_t load_avg;
atomic_t runnable_avg;
#endif
+#endif
#ifdef CONFIG_RT_GROUP_SCHED
struct sched_rt_entity **rt_se;
--
1.7.5.4
A woken migrated task calls __synchronize_entity_decay(se) in
migrate_task_rq_fair(), and then needs to set
`se->avg.last_runnable_update -= (-se->avg.decay_count) << 20'
before update_entity_load_avg(), in order to avoid the slept time being
accounted twice in se.avg.load_avg_contrib, once in
__synchronize_entity_decay() and once in update_entity_load_avg().
But if the slept task is woken up on the same cpu, it misses that
last_runnable_update adjustment before update_entity_load_avg(se, 0, 1),
so the slept time is used twice by both functions.
So we need to remove the double counting of the slept time.
Signed-off-by: Alex Shi <[email protected]>
---
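For illustration only (not part of the patch): my reading of the double
counting, with hypothetical numbers. __synchronize_entity_decay() decays
load_avg_contrib by the number of ~1ms periods (1 period = 1<<20 ns) the
task slept and returns that count; if last_runnable_update is not advanced
by the same amount, update_entity_load_avg() sees the same slept interval
again and decays it a second time.

/* hypothetical: the task slept 8ms, i.e. 8 decay periods of 1<<20 ns */
u64 decays = __synchronize_entity_decay(se);	/* contrib decayed, returns 8 */

/*
 * Without the fix, last_runnable_update still points 8ms into the past,
 * so the following update_entity_load_avg() would decay those same 8
 * periods again.  Advancing the clock consumes the slept time once:
 */
se->avg.last_runnable_update += decays << 20;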
kernel/sched/fair.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 203f236..8cd19f3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1551,7 +1551,8 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
}
wakeup = 0;
} else {
- __synchronize_entity_decay(se);
+ se->avg.last_runnable_update += __synchronize_entity_decay(se)
+ << 20;
}
/* migrated tasks did not contribute to our blocked load */
--
1.7.5.4
Besides being used for the runnable load average in the background,
move_tasks() is also a key function in load balancing. We need to consider
the runnable load average in it too, in order to make an apples-to-apples
load comparison.
Signed-off-by: Alex Shi <[email protected]>
---
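For illustration only (not part of the patch): a sketch of how the
hierarchical load comes out with these changes, following tg_load_down()
and task_h_load(); the numbers are hypothetical.

/*
 * root:     cpu_rq(cpu)->avg.load_avg_contrib  = 2048
 * group A:  tg->se[cpu]->avg.load_avg_contrib  = 1024
 *           root cfs_rq runnable_load_avg      = 3072
 *           A's cfs_rq runnable_load_avg       = 512
 * task p:   p->se.avg.load_avg_contrib         = 256
 */
unsigned long root_h_load = 2048;
unsigned long A_h_load = root_h_load * 1024 / (3072 + 1);	/* ~682 */
unsigned long p_h_load = 256 * A_h_load / (512 + 1);		/* ~340 */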
kernel/sched/fair.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0159c85..91e60ac 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4138,11 +4138,11 @@ static int tg_load_down(struct task_group *tg, void *data)
long cpu = (long)data;
if (!tg->parent) {
- load = cpu_rq(cpu)->load.weight;
+ load = cpu_rq(cpu)->avg.load_avg_contrib;
} else {
load = tg->parent->cfs_rq[cpu]->h_load;
- load *= tg->se[cpu]->load.weight;
- load /= tg->parent->cfs_rq[cpu]->load.weight + 1;
+ load *= tg->se[cpu]->avg.load_avg_contrib;
+ load /= tg->parent->cfs_rq[cpu]->runnable_load_avg + 1;
}
tg->cfs_rq[cpu]->h_load = load;
@@ -4170,8 +4170,8 @@ static unsigned long task_h_load(struct task_struct *p)
struct cfs_rq *cfs_rq = task_cfs_rq(p);
unsigned long load;
- load = p->se.load.weight;
- load = div_u64(load * cfs_rq->h_load, cfs_rq->load.weight + 1);
+ load = p->se.avg.load_avg_contrib;
+ load = div_u64(load * cfs_rq->h_load, cfs_rq->runnable_load_avg + 1);
return load;
}
--
1.7.5.4
These are the base values used in load balancing; updating them with the
rq runnable load average makes the load balancer consider the runnable
load avg naturally.
We also tried to include blocked_load_avg as cpu load in balancing, but
that caused kbuild/aim7/oltp benchmark performance to drop.
Signed-off-by: Alex Shi <[email protected]>
---
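For illustration only (not part of the patch): a hypothetical example of
how the reported cpu load changes once weighted_cpuload() and
cpu_avg_load_per_task() read the decayed runnable average instead of the
instantaneous weight.

/*
 * A cpu runs 2 nice-0 tasks that only started running recently:
 *   rq->load.weight           = 2048  (instantaneous, 2 * 1024)
 *   rq->cfs.runnable_load_avg = 900   (decayed history, still ramping up)
 *
 * weighted_cpuload(cpu):      was 2048, now 900
 * cpu_avg_load_per_task(cpu): was 2048 / 2 = 1024, now 900 / 2 = 450
 */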
kernel/sched/core.c | 16 ++++++++++++++--
kernel/sched/fair.c | 5 +++--
2 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f1f9641..8ab37c3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2528,9 +2528,14 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
void update_idle_cpu_load(struct rq *this_rq)
{
unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
- unsigned long load = this_rq->load.weight;
+ unsigned long load;
unsigned long pending_updates;
+#ifdef CONFIG_SMP
+ load = this_rq->cfs.runnable_load_avg;
+#else
+ load = this_rq->load.weight;
+#endif
/*
* bail if there's load or we're actually up-to-date.
*/
@@ -2574,11 +2579,18 @@ void update_cpu_load_nohz(void)
*/
static void update_cpu_load_active(struct rq *this_rq)
{
+ unsigned long load;
+
+#ifdef CONFIG_SMP
+ load = this_rq->cfs.runnable_load_avg;
+#else
+ load = this_rq->load.weight;
+#endif
/*
* See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
*/
this_rq->last_load_update_tick = jiffies;
- __update_cpu_load(this_rq, this_rq->load.weight, 1);
+ __update_cpu_load(this_rq, load, 1);
calc_load_account_active(this_rq);
}
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8cd19f3..0159c85 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2920,7 +2920,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
/* Used instead of source_load when we know the type == 0 */
static unsigned long weighted_cpuload(const int cpu)
{
- return cpu_rq(cpu)->load.weight;
+ return cpu_rq(cpu)->cfs.runnable_load_avg;
}
/*
@@ -2965,9 +2965,10 @@ static unsigned long cpu_avg_load_per_task(int cpu)
{
struct rq *rq = cpu_rq(cpu);
unsigned long nr_running = ACCESS_ONCE(rq->nr_running);
+ unsigned long load_avg = rq->cfs.runnable_load_avg;
if (nr_running)
- return rq->load.weight / nr_running;
+ return load_avg / nr_running;
return 0;
}
--
1.7.5.4
blocked_load_avg is sometimes too heavy and far bigger than the runnable
load avg, which makes the balancer take wrong decisions. So it is better
not to consider it.
Signed-off-by: Alex Shi <[email protected]>
---
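For illustration only (not part of the patch): a hypothetical case of
blocked_load_avg dominating the group contribution, which is the kind of
situation the change below is aimed at.

/*
 * A cfs_rq where one task runs and nine similar tasks just went to sleep:
 *   runnable_load_avg = 1024
 *   blocked_load_avg  = 9216  (not yet decayed away)
 *
 * old: tg_contrib = 1024 + 9216 = 10240  -> this cpu's part of the group
 *                                           looks ~10x busier than it is
 * new: tg_contrib = 1024                 -> matches what is runnable now
 */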
kernel/sched/fair.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 91e60ac..75c200c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1339,7 +1339,7 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
struct task_group *tg = cfs_rq->tg;
s64 tg_contrib;
- tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
+ tg_contrib = cfs_rq->runnable_load_avg;
tg_contrib -= cfs_rq->tg_load_contrib;
if (force_update || abs64(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
--
1.7.5.4
To get the latest runnable load info, we need to do this cpu load update
after task_tick().
Signed-off-by: Alex Shi <[email protected]>
---
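For illustration only (not part of the patch): my reading of why the
ordering matters.

/*
 * scheduler_tick()
 *   update_rq_clock(rq)
 *   curr->sched_class->task_tick()  // fair: entity_tick() calls
 *                                   // update_entity_load_avg(), so
 *                                   // cfs.runnable_load_avg is refreshed
 *   update_cpu_load_active(rq)      // now samples the refreshed value
 */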
kernel/sched/core.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 866c05a..f1f9641 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2684,8 +2684,8 @@ void scheduler_tick(void)
raw_spin_lock(&rq->lock);
update_rq_clock(rq);
- update_cpu_load_active(rq);
curr->sched_class->task_tick(rq, curr, 0);
+ update_cpu_load_active(rq);
raw_spin_unlock(&rq->lock);
perf_event_task_tick();
--
1.7.5.4
On Fri, May 10, 2013 at 11:17:27PM +0800, Alex Shi wrote:
> They are the base values in load balance, update them with rq runnable
> load average, then the load balance will consider runnable load avg
> naturally.
>
> We also try to include the blocked_load_avg as cpu load in balancing,
> but that cause kbuild/aim7/oltp benchmark performance drop.
>
> Signed-off-by: Alex Shi <[email protected]>
> ---
> kernel/sched/core.c | 16 ++++++++++++++--
> kernel/sched/fair.c | 5 +++--
> 2 files changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f1f9641..8ab37c3 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2528,9 +2528,14 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
> void update_idle_cpu_load(struct rq *this_rq)
> {
> unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
> - unsigned long load = this_rq->load.weight;
> + unsigned long load;
> unsigned long pending_updates;
>
> +#ifdef CONFIG_SMP
> + load = this_rq->cfs.runnable_load_avg;
> +#else
> + load = this_rq->load.weight;
> +#endif
> /*
> * bail if there's load or we're actually up-to-date.
> */
> @@ -2574,11 +2579,18 @@ void update_cpu_load_nohz(void)
> */
> static void update_cpu_load_active(struct rq *this_rq)
> {
> + unsigned long load;
> +
> +#ifdef CONFIG_SMP
> + load = this_rq->cfs.runnable_load_avg;
> +#else
> + load = this_rq->load.weight;
> +#endif
> /*
> * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
This just smells like you want a helper function... :-)
Also it doesn't apply anymore due to Paul Gortmaker moving some of this
stuff about.
On 05/13/2013 10:06 PM, Peter Zijlstra wrote:
>> > static void update_cpu_load_active(struct rq *this_rq)
>> > {
>> > + unsigned long load;
>> > +
>> > +#ifdef CONFIG_SMP
>> > + load = this_rq->cfs.runnable_load_avg;
>> > +#else
>> > + load = this_rq->load.weight;
>> > +#endif
>> > /*
>> > * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
> This just smells like you want a helper function... :-)
Yes, thanks for pointing this out!
>
> Also it doesn't apply anymore due to Paul Gortemaker moving some of this
> stuff about.
Will rebase on this. Thanks again! :)
--
Thanks
Alex
On 05/13/2013 10:06 PM, Peter Zijlstra wrote:
>> > * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
> This just smells like you want a helper function... :-)
>
> Also it doesn't apply anymore due to Paul Gortemaker moving some of this
> stuff about.
>
>
Patch updated. Any comments are appreciated! :)
---
From fe23d908a7f80dc5cca0abf9cefaf1004a67b331 Mon Sep 17 00:00:00 2001
From: Alex Shi <[email protected]>
Date: Tue, 14 May 2013 10:11:12 +0800
Subject: [PATCH 6/8] sched: compute runnable load avg in cpu_load and
cpu_avg_load_per_task
These are the base values used in load balancing; updating them with the
rq runnable load average makes the load balancer consider the runnable
load avg naturally.
We also tried to include blocked_load_avg as cpu load in balancing, but
that caused a 6% kbuild performance drop on every Intel machine, and
aim7/oltp drops on some of the 4-socket machines.
Signed-off-by: Alex Shi <[email protected]>
---
kernel/sched/fair.c | 5 +++--
kernel/sched/proc.c | 17 +++++++++++++++--
2 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a534d1f..d2d3e03 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2960,7 +2960,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
/* Used instead of source_load when we know the type == 0 */
static unsigned long weighted_cpuload(const int cpu)
{
- return cpu_rq(cpu)->load.weight;
+ return cpu_rq(cpu)->cfs.runnable_load_avg;
}
/*
@@ -3005,9 +3005,10 @@ static unsigned long cpu_avg_load_per_task(int cpu)
{
struct rq *rq = cpu_rq(cpu);
unsigned long nr_running = ACCESS_ONCE(rq->nr_running);
+ unsigned long load_avg = rq->cfs.runnable_load_avg;
if (nr_running)
- return rq->load.weight / nr_running;
+ return load_avg / nr_running;
return 0;
}
diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
index bb3a6a0..ce5cd48 100644
--- a/kernel/sched/proc.c
+++ b/kernel/sched/proc.c
@@ -501,6 +501,18 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
sched_avg_update(this_rq);
}
+#ifdef CONFIG_SMP
+unsigned long get_rq_runnable_load(struct rq *rq)
+{
+ return rq->cfs.runnable_load_avg;
+}
+#else
+unsigned long get_rq_runnable_load(struct rq *rq)
+{
+ return rq->load.weight;
+}
+#endif
+
#ifdef CONFIG_NO_HZ_COMMON
/*
* There is no sane way to deal with nohz on smp when using jiffies because the
@@ -522,7 +534,7 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
void update_idle_cpu_load(struct rq *this_rq)
{
unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
- unsigned long load = this_rq->load.weight;
+ unsigned long load = get_rq_runnable_load(this_rq);
unsigned long pending_updates;
/*
@@ -568,11 +580,12 @@ void update_cpu_load_nohz(void)
*/
void update_cpu_load_active(struct rq *this_rq)
{
+ unsigned long load = get_rq_runnable_load(this_rq);
/*
* See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
*/
this_rq->last_load_update_tick = jiffies;
- __update_cpu_load(this_rq, this_rq->load.weight, 1);
+ __update_cpu_load(this_rq, load, 1);
calc_load_account_active(this_rq);
}
--
1.7.12
--
Thanks
Alex
On 05/10/2013 11:17 PM, Alex Shi wrote:
> On SNB EP 4 sockets machine, the hackbench increased about 50%, and result
> become stable. on other machines, hackbench increased about 2~10%.
> oltp increased about 30% in NHM EX box.
> netperf loopback also increased on SNB EP 4 sockets box.
> no clear changes on other benchmarks.
Paul, Michael, Morten, any comments from you are appreciated! :)
--
Thanks
Alex
On Fri, May 10, 2013 at 11:17:29PM +0800, Alex Shi wrote:
> blocked_load_avg sometime is too heavy and far bigger than runnable load
> avg. that make balance make wrong decision. So better don't consider it.
Would you happen to have an example around that illustrates this?
Also, you've just changed the cgroup balancing -- did you run any tests on that?
> Signed-off-by: Alex Shi <[email protected]>
> ---
> kernel/sched/fair.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 91e60ac..75c200c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1339,7 +1339,7 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
> struct task_group *tg = cfs_rq->tg;
> s64 tg_contrib;
>
> - tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
> + tg_contrib = cfs_rq->runnable_load_avg;
> tg_contrib -= cfs_rq->tg_load_contrib;
>
> if (force_update || abs64(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
> --
> 1.7.5.4
>
On Fri, May 10, 2013 at 8:17 AM, Alex Shi <[email protected]> wrote:
> blocked_load_avg sometime is too heavy and far bigger than runnable load
> avg. that make balance make wrong decision. So better don't consider it.
>
> Signed-off-by: Alex Shi <[email protected]>
> ---
> kernel/sched/fair.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 91e60ac..75c200c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1339,7 +1339,7 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
> struct task_group *tg = cfs_rq->tg;
> s64 tg_contrib;
>
> - tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
Nack -- This is necessary for correct shares distribution.
> + tg_contrib = cfs_rq->runnable_load_avg;
> tg_contrib -= cfs_rq->tg_load_contrib;
>
> if (force_update || abs64(tg_contrib) > cfs_rq->tg_load_contrib / 8) {
> --
> 1.7.5.4
>
On Fri, May 10, 2013 at 8:17 AM, Alex Shi <[email protected]> wrote:
> This patchset bases on tip/sched/core.
>
> This version changed the runnable load avg value setting for new task
> in patch 3rd.
>
> We also tried to include blocked load avg in balance. but find many benchmark
> performance dropping. Guess the too bigger cpu load drive task to be waken
> on remote CPU, and cause wrong decision in periodic balance.
Fundamentally, I think we should be exploring this space.
While it's perhaps not surprising that it's not a drop-in, since the
current code was tuned always considering the instaneous balance, it
seems the likely path to increased balance stability.
Although, if the code is yielding substantive benefits in its current
form we should consider merging it in the interim.
> I retested on Intel core2, NHM, SNB, IVB, 2 and 4 sockets machines with
> benchmark kbuild, aim7, dbench, tbench, hackbench, oltp, and netperf loopback
> etc. The performance is better now.
>
> On SNB EP 4 sockets machine, the hackbench increased about 50%, and result
> become stable. on other machines, hackbench increased about 2~10%.
> oltp increased about 30% in NHM EX box.
> netperf loopback also increased on SNB EP 4 sockets box.
> no clear changes on other benchmarks.
>
> Michael Wang had tested previous version on pgbench on his box:
> https://lkml.org/lkml/2013/4/2/1022
>
> And Morten tested previous version too.
> http://comments.gmane.org/gmane.linux.kernel/1463371
>
> Thanks comments from Peter, Paul, Morten, Miacheal and Preeti.
> And more comments are appreciated!
>
> Regards
> Alex
>
> [patch v6 1/8] Revert "sched: Introduce temporary FAIR_GROUP_SCHED
> [patch v6 2/8] sched: move few runnable tg variables into CONFIG_SMP
> [patch v6 3/8] sched: set initial value of runnable avg for new
> [patch v6 4/8] sched: fix slept time double counting in enqueue
> [patch v6 5/8] sched: update cpu load after task_tick.
> [patch v6 6/8] sched: compute runnable load avg in cpu_load and
> [patch v6 7/8] sched: consider runnable load average in move_tasks
> [patch v6 8/8] sched: remove blocked_load_avg in tg
On 05/14/2013 04:31 PM, Peter Zijlstra wrote:
> On Fri, May 10, 2013 at 11:17:29PM +0800, Alex Shi wrote:
>> > blocked_load_avg sometime is too heavy and far bigger than runnable load
>> > avg. that make balance make wrong decision. So better don't consider it.
> Would you happen to have an example around that illustrates this?
Sorry, No.
>
> Also, you've just changed the cgroup balancing -- did you run any tests on that?
>
I tested all the benchmarks mentioned in the cover letter, aim7, kbuild
etc., with autogroup enabled. There is no clear performance change.
But since the machine just runs the benchmark without any other load,
that isn't enough.
--
Thanks
Alex
On 05/14/2013 05:05 PM, Paul Turner wrote:
>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > index 91e60ac..75c200c 100644
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -1339,7 +1339,7 @@ static inline void __update_cfs_rq_tg_load_contrib(struct cfs_rq *cfs_rq,
>> > struct task_group *tg = cfs_rq->tg;
>> > s64 tg_contrib;
>> >
>> > - tg_contrib = cfs_rq->runnable_load_avg + cfs_rq->blocked_load_avg;
> Nack -- This is necessary for correct shares distribution.
I was going to mark this patch as RFC. :)
BTW, did you do any testing of this part?
--
Thanks
Alex
On 05/14/2013 05:34 PM, Paul Turner wrote:
>> >
>> > We also tried to include blocked load avg in balance. but find many benchmark
>> > performance dropping. Guess the too bigger cpu load drive task to be waken
>> > on remote CPU, and cause wrong decision in periodic balance.
> Fundamentally, I think we should be exploring this space.
I have thought about this, but cannot figure out a direction or back it
up with a theory.
>
> While it's perhaps not surprising that it's not a drop-in, since the
> current code was tuned always considering the instaneous balance, it
> seems the likely path to increased balance stability.
>
> Although, if the code is yielding substantive benefits in its current
> form we should consider merging it in the interim.
Sorry, I can not follow you here.
>
--
Thanks
Alex
Hi, Alex
On 05/14/2013 03:27 PM, Alex Shi wrote:
[snip]
> }
> diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
This patch seems to be based on 3.10-rc1, while the one below,
[patch v6 3/8] sched: set initial value of runnable avg for new forked task
conflicts with 3.10-rc1... I think it may need a rebase?
Regards,
Michael Wang
> index bb3a6a0..ce5cd48 100644
> --- a/kernel/sched/proc.c
> +++ b/kernel/sched/proc.c
> @@ -501,6 +501,18 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
> sched_avg_update(this_rq);
> }
>
> +#ifdef CONFIG_SMP
> +unsigned long get_rq_runnable_load(struct rq *rq)
> +{
> + return rq->cfs.runnable_load_avg;
> +}
> +#else
> +unsigned long get_rq_runnable_load(struct rq *rq)
> +{
> + return rq->load.weight;
> +}
> +#endif
> +
> #ifdef CONFIG_NO_HZ_COMMON
> /*
> * There is no sane way to deal with nohz on smp when using jiffies because the
> @@ -522,7 +534,7 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
> void update_idle_cpu_load(struct rq *this_rq)
> {
> unsigned long curr_jiffies = ACCESS_ONCE(jiffies);
> - unsigned long load = this_rq->load.weight;
> + unsigned long load = get_rq_runnable_load(this_rq);
> unsigned long pending_updates;
>
> /*
> @@ -568,11 +580,12 @@ void update_cpu_load_nohz(void)
> */
> void update_cpu_load_active(struct rq *this_rq)
> {
> + unsigned long load = get_rq_runnable_load(this_rq);
> /*
> * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
> */
> this_rq->last_load_update_tick = jiffies;
> - __update_cpu_load(this_rq, this_rq->load.weight, 1);
> + __update_cpu_load(this_rq, load, 1);
>
> calc_load_account_active(this_rq);
> }
>
On 05/10/2013 11:17 PM, Alex Shi wrote:
> We need initialize the se.avg.{decay_count, load_avg_contrib} for a
> new forked task.
> Otherwise random values of above variables cause mess when do new task
> enqueue:
> enqueue_task_fair
> enqueue_entity
> enqueue_entity_load_avg
>
> and make forking balancing imbalance since incorrect load_avg_contrib.
>
> Further more, Morten Rasmussen notice some tasks were not launched at
> once after created. So Paul and Peter suggest giving a start value for
> new task runnable avg time same as sched_slice().
>
Updated; it fits the latest Linus and tip/sched/core trees.
From 30ba6d80b256c17861e2c9128fdf41cc048af05a Mon Sep 17 00:00:00 2001
From: Alex Shi <[email protected]>
Date: Tue, 14 May 2013 09:41:09 +0800
Subject: [PATCH 3/8] sched: set initial value of runnable avg for new forked
task
We need to initialize se.avg.{decay_count, load_avg_contrib} for a
newly forked task.
Otherwise the random values of those variables cause a mess when the new
task is enqueued:
    enqueue_task_fair
        enqueue_entity
            enqueue_entity_load_avg
and make fork balancing go wrong because of the incorrect load_avg_contrib.
Furthermore, Morten Rasmussen noticed that some tasks were not launched
immediately after being created. So Paul and Peter suggested giving a new
task a starting runnable avg time equal to sched_slice().
Signed-off-by: Alex Shi <[email protected]>
---
kernel/sched/core.c | 6 ++----
kernel/sched/fair.c | 23 +++++++++++++++++++++++
kernel/sched/sched.h | 2 ++
3 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ee1cbc6..920d346 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1596,10 +1596,6 @@ static void __sched_fork(struct task_struct *p)
p->se.vruntime = 0;
INIT_LIST_HEAD(&p->se.group_node);
-#ifdef CONFIG_SMP
- p->se.avg.runnable_avg_period = 0;
- p->se.avg.runnable_avg_sum = 0;
-#endif
#ifdef CONFIG_SCHEDSTATS
memset(&p->se.statistics, 0, sizeof(p->se.statistics));
#endif
@@ -1743,6 +1739,8 @@ void wake_up_new_task(struct task_struct *p)
set_task_cpu(p, select_task_rq(p, SD_BALANCE_FORK, 0));
#endif
+ /* Give new task start runnable values */
+ set_task_runnable_avg(p);
rq = __task_rq_lock(p);
activate_task(rq, p, 0);
p->on_rq = 1;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e8f3c8f..add32a6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -680,6 +680,26 @@ static u64 sched_vslice(struct cfs_rq *cfs_rq, struct sched_entity *se)
return calc_delta_fair(sched_slice(cfs_rq, se), se);
}
+#ifdef CONFIG_SMP
+static inline void __update_task_entity_contrib(struct sched_entity *se);
+
+/* Give new task start runnable values to heavy its load in infant time */
+void set_task_runnable_avg(struct task_struct *p)
+{
+ u32 slice;
+
+ p->se.avg.decay_count = 0;
+ slice = sched_slice(task_cfs_rq(p), &p->se) >> 10;
+ p->se.avg.runnable_avg_sum = slice;
+ p->se.avg.runnable_avg_period = slice;
+ __update_task_entity_contrib(&p->se);
+}
+#else
+void set_task_runnable_avg(struct task_struct *p)
+{
+}
+#endif
+
/*
* Update the current task's runtime statistics. Skip current tasks that
* are not in our scheduling class.
@@ -1527,6 +1547,9 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
* We track migrations using entity decay_count <= 0, on a wake-up
* migration we use a negative decay count to track the remote decays
* accumulated while sleeping.
+ *
+ * When enqueue a new forked task, the se->avg.decay_count == 0, so
+ * we bypass update_entity_load_avg(), use avg.load_avg_contrib direct.
*/
if (unlikely(se->avg.decay_count <= 0)) {
se->avg.last_runnable_update = rq_of(cfs_rq)->clock_task;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0272fa4..564cecd 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1049,6 +1049,8 @@ extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime
extern void update_idle_cpu_load(struct rq *this_rq);
+extern void set_task_runnable_avg(struct task_struct *p);
+
#ifdef CONFIG_PARAVIRT
static inline u64 steal_ticks(u64 steal)
{
--
1.7.12
On 05/16/2013 01:49 PM, Michael Wang wrote:
> On 05/14/2013 03:27 PM, Alex Shi wrote:
> [snip]
>> > }
>> > diff --git a/kernel/sched/proc.c b/kernel/sched/proc.c
> This patch seems to be based on 3.10-rc1, while below one
>
> [patch v6 3/8] sched: set initial value of runnable avg for new forked task
>
> is conflict with 3.10-rc1... I think it may need some rebase?
With the updated 3rd patch, the whole patchset works on the latest
tip/sched/core.
Thanks for testing, Michael!
--
Thanks
Alex
On 05/10/2013 11:17 PM, Alex Shi wrote:
> This patchset bases on tip/sched/core.
>
> This version changed the runnable load avg value setting for new task
> in patch 3rd.
>
> We also tried to include blocked load avg in balance. but find many benchmark
> performance dropping. Guess the too bigger cpu load drive task to be waken
> on remote CPU, and cause wrong decision in periodic balance.
>
> I retested on Intel core2, NHM, SNB, IVB, 2 and 4 sockets machines with
> benchmark kbuild, aim7, dbench, tbench, hackbench, oltp, and netperf loopback
> etc. The performance is better now.
>
> On SNB EP 4 sockets machine, the hackbench increased about 50%, and result
> become stable. on other machines, hackbench increased about 2~10%.
> oltp increased about 30% in NHM EX box.
> netperf loopback also increased on SNB EP 4 sockets box.
> no clear changes on other benchmarks.
>
> Michael Wang had tested previous version on pgbench on his box:
> https://lkml.org/lkml/2013/4/2/1022
Tested the latest patch set (new 3/8 and 6/8) with pgbench on a tip
3.10.0-rc1 kernel and a 12 cpu x86 box; it works well and the benefit is
still there ;-)
Regards,
Michael Wang
>
> And Morten tested previous version too.
> http://comments.gmane.org/gmane.linux.kernel/1463371
>
> Thanks comments from Peter, Paul, Morten, Miacheal and Preeti.
> And more comments are appreciated!
>
> Regards
> Alex
>
> [patch v6 1/8] Revert "sched: Introduce temporary FAIR_GROUP_SCHED
> [patch v6 2/8] sched: move few runnable tg variables into CONFIG_SMP
> [patch v6 3/8] sched: set initial value of runnable avg for new
> [patch v6 4/8] sched: fix slept time double counting in enqueue
> [patch v6 5/8] sched: update cpu load after task_tick.
> [patch v6 6/8] sched: compute runnable load avg in cpu_load and
> [patch v6 7/8] sched: consider runnable load average in move_tasks
> [patch v6 8/8] sched: remove blocked_load_avg in tg
On 05/16/2013 03:29 PM, Michael Wang wrote:
>> > Michael Wang had tested previous version on pgbench on his box:
>> > https://lkml.org/lkml/2013/4/2/1022
> Tested the latest patch set (new 3/8 and 6/8) with pgbench, tip
> 3.10.0-rc1 and 12 cpu X86 box, works well and still benefit ;-)
Thanks Michael! :)
--
Thanks
Alex
On Tue, May 14, 2013 at 07:35:25PM +0800, Alex Shi wrote:
> I tested all benchmarks on cover letter maintained, aim7, kbuild etc.
> with autogroup enabled. There is no clear performance change.
> But since the machine just run benchmark without anyother load, that
> doesn't enough.
Back when we started with smp-fair cgroup muck someone wrote a test for it. I
_think_ it ended up in the LTP test-suite.
Now I don't know if that's up-to-date enough to catch some of the cases we've
recently fixed (as in the past few years) so it might want to be updated.
Paul, do you guys at Google have some nice test-cases for all this?
2013/5/16 Peter Zijlstra <[email protected]>:
> On Tue, May 14, 2013 at 07:35:25PM +0800, Alex Shi wrote:
>
>> I tested all benchmarks on cover letter maintained, aim7, kbuild etc.
>> with autogroup enabled. There is no clear performance change.
>> But since the machine just run benchmark without anyother load, that
>> doesn't enough.
>
> Back when we started with smp-fair cgroup muck someone wrote a test for it. I
> _think_ it ended up in the LTP test-suite.
>
Hi Peter
I just downloaded the latest LTP from
http://sourceforge.net/projects/ltp/files/LTP%20Source/ltp-20130503/
and ran the cgroup benchmark tests on our SB-EP machine with 2S*8CORE*2SMT
and 64G memory.
The following is my testing procedure:
1. tar -xvf ltp-full-20130503.tar
2. cd ltp-full-20130503
3. ./configure prefix=/mnt/ltp && make -j32 && sudo make install
4. cd /mnt/ltp
# create general testcase named cgroup_fj
5. echo -e "cgroup_fj run_cgroup_test_fj.sh" > runtest/cgroup
# we only test cpuset/cpu/cpuacct cgroup benchmark cases, here is my
cgroup_fj_testcases.sh
6. [changlox@lkp-sb03 bin]$ cat testcases/bin/cgroup_fj_testcases.sh
stress 2 2 1 1 1
stress 4 2 1 1 1
stress 5 2 1 1 1
stress 2 1 1 1 2
stress 2 1 1 2 1
stress 2 1 1 2 2
stress 2 1 1 2 3
stress 2 1 2 1 1
stress 2 1 2 1 2
stress 2 1 2 1 3
stress 2 1 2 2 1
stress 2 1 2 2 2
stress 4 1 1 1 2
stress 4 1 2 1 1
stress 4 1 2 1 2
stress 4 1 2 1 3
stress 5 1 1 1 2
stress 5 1 1 2 1
stress 5 1 1 2 2
stress 5 1 1 2 3
stress 5 1 2 1 1
stress 5 1 2 1 2
stress 5 1 2 1 3
stress 5 1 2 2 1
stress 5 1 2 2 2
# run test
7. sudo ./runltp -p -l /tmp/cgroup.results.log -d /tmp -o
/tmp/cgroup.log -f cgroup
my test results:
3.10-rc1 patch1-7 patch1-8
duration=764 duration=754 duration=750
duration=764 duration=754 duration=751
duration=763 duration=755 duration=751
duration means the number of seconds the test took.
Tested-by: Changlong Xie <[email protected]>
> Now I don't know if that's up-to-date enough to catch some of the cases we've
> recently fixed (as in the past few years) so it might want to be updated.
>
> Paul, do you guys at Google have some nice test-cases for all this?
--
Best regards
Changlox
On 05/23/2013 03:32 PM, Changlong Xie wrote:
> 2013/5/16 Peter Zijlstra <[email protected]>:
>> On Tue, May 14, 2013 at 07:35:25PM +0800, Alex Shi wrote:
>>
>>> I tested all benchmarks on cover letter maintained, aim7, kbuild etc.
>>> with autogroup enabled. There is no clear performance change.
>>> But since the machine just run benchmark without anyother load, that
>>> doesn't enough.
>>
>> Back when we started with smp-fair cgroup muck someone wrote a test for it. I
>> _think_ it ended up in the LTP test-suite.
>>
>
> Hi Peter
>
> my test results:
> 3.10-rc1 patch1-7 patch1-8
> duration=764 duration=754 duration=750
> duration=764 duration=754 duration=751
> duration=763 duration=755 duration=751
>
> duration means the seconds of testing cost.
>
> Tested-by: Changlong Xie <[email protected]>
It seems the 8th patch is helpful for the cgroup case. Thanks Changlong!
--
Thanks
Alex
On 05/16/2013 03:29 PM, Michael Wang wrote:
>> > This version changed the runnable load avg value setting for new task
>> > in patch 3rd.
>> >
>> > We also tried to include blocked load avg in balance. but find many benchmark
>> > performance dropping. Guess the too bigger cpu load drive task to be waken
>> > on remote CPU, and cause wrong decision in periodic balance.
>> >
>> > I retested on Intel core2, NHM, SNB, IVB, 2 and 4 sockets machines with
>> > benchmark kbuild, aim7, dbench, tbench, hackbench, oltp, and netperf loopback
>> > etc. The performance is better now.
>> >
>> > On SNB EP 4 sockets machine, the hackbench increased about 50%, and result
>> > become stable. on other machines, hackbench increased about 2~10%.
>> > oltp increased about 30% in NHM EX box.
>> > netperf loopback also increased on SNB EP 4 sockets box.
>> > no clear changes on other benchmarks.
>> >
>> > Michael Wang had tested previous version on pgbench on his box:
>> > https://lkml.org/lkml/2013/4/2/1022
> Tested the latest patch set (new 3/8 and 6/8) with pgbench, tip
> 3.10.0-rc1 and 12 cpu X86 box, works well and still benefit ;-)
Paul:
Would you like to give more comments/ideas on this patch set?
--
Thanks
Alex
On 05/16/2013 05:23 PM, Peter Zijlstra wrote:
> On Tue, May 14, 2013 at 07:35:25PM +0800, Alex Shi wrote:
>
>> > I tested all benchmarks on cover letter maintained, aim7, kbuild etc.
>> > with autogroup enabled. There is no clear performance change.
>> > But since the machine just run benchmark without anyother load, that
>> > doesn't enough.
> Back when we started with smp-fair cgroup muck someone wrote a test for it. I
> _think_ it ended up in the LTP test-suite.
Peter:
Copying Changlong's testing results here again: the LTP cgroup stress
testing shows that this patchset reduces the stress testing time:
# run test
7. sudo ./runltp -p -l /tmp/cgroup.results.log -d /tmp -o
/tmp/cgroup.log -f cgroup
my test results:
3.10-rc1 patch1-7 patch1-8
duration=764 duration=754 duration=750
duration=764 duration=754 duration=751
duration=763 duration=755 duration=751
duration means the number of seconds the test took.
Tested-by: Changlong Xie <[email protected]>
Paul, would you like to give some comments?
>
> Now I don't know if that's up-to-date enough to catch some of the cases we've
> recently fixed (as in the past few years) so it might want to be updated.
>
> Paul, do you guys at Google have some nice test-cases for all this?
--
Thanks
Alex
On 05/28/2013 09:31 PM, Alex Shi wrote:
>> > Tested the latest patch set (new 3/8 and 6/8) with pgbench, tip
>> > 3.10.0-rc1 and 12 cpu X86 box, works well and still benefit ;-)
>
> Paul:
>
> Would you like to give more comments/ideas of this patch set?
Peter,
If there are no more ideas about the blocked_load_avg usage, could we have
this patchset picked up in the tip tree? In any case we get better
performance on the hackbench/pgbench/cgroup stress etc. benchmarks.
--
Thanks
Alex
On Fri, 2013-05-10 at 23:17 +0800, Alex Shi wrote:
> blocked_load_avg sometime is too heavy and far bigger than runnable load
> avg. that make balance make wrong decision. So better don't consider it.
>
> Signed-off-by: Alex Shi <[email protected]>
Hi Alex,
I have been testing these patches with a Java server workload on an 8
socket (80 core) box with Hyperthreading enabled, and I have been seeing
good results with these patches.
When using a 3.10-rc2 tip kernel with patches 1-8, there was about a 40%
improvement in performance of the workload compared to when using the
vanilla 3.10-rc2 tip kernel with no patches. When using a 3.10-rc2 tip
kernel with just patches 1-7, the performance improvement of the
workload over the vanilla 3.10-rc2 tip kernel was about 25%.
Tested-by: Jason Low <[email protected]>
Thanks,
Jason
On 05/30/2013 01:00 AM, Jason Low wrote:
> On Fri, 2013-05-10 at 23:17 +0800, Alex Shi wrote:
>> blocked_load_avg sometime is too heavy and far bigger than runnable load
>> avg. that make balance make wrong decision. So better don't consider it.
>>
>> Signed-off-by: Alex Shi <[email protected]>
>
> Hi Alex,
>
> I have been testing these patches with a Java server workload on an 8
> socket (80 core) box with Hyperthreading enabled, and I have been seeing
> good results with these patches.
>
> When using a 3.10-rc2 tip kernel with patches 1-8, there was about a 40%
> improvement in performance of the workload compared to when using the
> vanilla 3.10-rc2 tip kernel with no patches. When using a 3.10-rc2 tip
> kernel with just patches 1-7, the performance improvement of the
> workload over the vanilla 3.10-rc2 tip kernel was about 25%.
>
> Tested-by: Jason Low <[email protected]>
>
That is impressive!
Thanks a lot for your testing! Just curious, which benchmark are you
using? :)
> Thanks,
> Jason
>
--
Thanks
Alex