2013-09-14 15:39:51

by Vladimir Davydov

[permalink] [raw]
Subject: [PATCH] sched: Fix task_h_load calculation

Patch a003a2 (sched: Consider runnable load average in move_tasks())
sets all top-level cfs_rqs' h_load to rq->avg.load_avg_contrib, which is
always 0. This mistype leads to all tasks having weight 0 when load
balancing in a cpu-cgroup enabled setup. There obviously should be sum
of weights of all runnable tasks there instead. Fix it.

Signed-off-by: Vladimir Davydov <[email protected]>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9b3fe1c..13abc29 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4242,7 +4242,7 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
}

if (!se) {
- cfs_rq->h_load = rq->avg.load_avg_contrib;
+ cfs_rq->h_load = cfs_rq->runnable_load_avg;
cfs_rq->last_h_load_update = now;
}

--
1.7.10.4


2013-09-14 18:53:24

by Paul Turner

[permalink] [raw]
Subject: Re: [PATCH] sched: Fix task_h_load calculation

On Sat, Sep 14, 2013 at 8:39 AM, Vladimir Davydov
<[email protected]> wrote:
> Patch a003a2 (sched: Consider runnable load average in move_tasks())
> sets all top-level cfs_rqs' h_load to rq->avg.load_avg_contrib, which is
> always 0. This mistype leads to all tasks having weight 0 when load
> balancing in a cpu-cgroup enabled setup. There obviously should be sum
> of weights of all runnable tasks there instead. Fix it.
>

load_avg_contrib is the weight that
> Signed-off-by: Vladimir Davydov <[email protected]>
> ---
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9b3fe1c..13abc29 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4242,7 +4242,7 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
> } Once we'v Once we've made it that e made it that
>
> if (!se) {
> - cfs_rq->h_load = rq->avg.load_avg_contrib;
> + cfs_rq->h_load = cfs_rq->runnable_load_avg;

Looks good.

Reviewed-by: Paul Turner <[email protected]>

> cfs_rq->last_h_load_update = now;
> }
>
> --
> 1.7.10.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

Subject: [tip:sched/core] sched/balancing: Fix cfs_rq-> task_h_load calculation

Commit-ID: 7e3115ef5149fc502e3a2e80719dba54a8e7409d
Gitweb: http://git.kernel.org/tip/7e3115ef5149fc502e3a2e80719dba54a8e7409d
Author: Vladimir Davydov <[email protected]>
AuthorDate: Sat, 14 Sep 2013 19:39:46 +0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 20 Sep 2013 11:59:39 +0200

sched/balancing: Fix cfs_rq->task_h_load calculation

Patch a003a2 (sched: Consider runnable load average in move_tasks())
sets all top-level cfs_rqs' h_load to rq->avg.load_avg_contrib, which is
always 0. This mistype leads to all tasks having weight 0 when load
balancing in a cpu-cgroup enabled setup. There obviously should be sum
of weights of all runnable tasks there instead. Fix it.

Signed-off-by: Vladimir Davydov <[email protected]>
Reviewed-by: Paul Turner <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2aedacc..7c70201 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4242,7 +4242,7 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
}

if (!se) {
- cfs_rq->h_load = rq->avg.load_avg_contrib;
+ cfs_rq->h_load = cfs_rq->runnable_load_avg;
cfs_rq->last_h_load_update = now;
}

2013-09-29 09:47:16

by Yuanhan Liu

[permalink] [raw]
Subject: Re: [tip:sched/core] sched/balancing: Fix cfs_rq-> task_h_load calculation

On Fri, Sep 20, 2013 at 06:46:59AM -0700, tip-bot for Vladimir Davydov wrote:
> Commit-ID: 7e3115ef5149fc502e3a2e80719dba54a8e7409d
> Gitweb: http://git.kernel.org/tip/7e3115ef5149fc502e3a2e80719dba54a8e7409d
> Author: Vladimir Davydov <[email protected]>
> AuthorDate: Sat, 14 Sep 2013 19:39:46 +0400
> Committer: Ingo Molnar <[email protected]>
> CommitDate: Fri, 20 Sep 2013 11:59:39 +0200
>
> sched/balancing: Fix cfs_rq->task_h_load calculation
>
> Patch a003a2 (sched: Consider runnable load average in move_tasks())
> sets all top-level cfs_rqs' h_load to rq->avg.load_avg_contrib, which is
> always 0. This mistype leads to all tasks having weight 0 when load
> balancing in a cpu-cgroup enabled setup. There obviously should be sum
> of weights of all runnable tasks there instead. Fix it.

Hi Vladimir,

FYI, Here we found a 17% netperf regression by this patch. Here are some
changed stats between this commit 7e3115ef5149fc502e3a2e80719dba54a8e7409d
and it's parent(3029ede39373c368f402a76896600d85a4f7121b)

NOTE: both commit had been tested 10+ times.

stat 7e3115ef5149fc502e3a2e80719 3029ede39373c368f402a768966
netperf.Throughput_Mbps [ 2515 - 2593 ] -- [ 3010 - 3042 ]

lock_stat.&(&base->lock)->rlock.contentions.run_timer_softirq [ 188 - 261 ] -- [ 1907 - 2018 ]
lock_stat.&rq->lock.contentions [ 16885 - 23944 ] -- [ 2.6544e+05 - 2.8201e+05 ]
lock_stat.slock-AF_INET/1.contentions.release_sock [ 28993 - 34079 ] -- [ 2.6537e+05 - 2.7814e+05 ]
lock_stat.slock-AF_INET/1.contentions.tcp_v4_rcv [ 54906 - 64453 ] -- [ 4.6572e+05 - 4.8895e+05 ]
lock_stat.slock-AF_INET/1.contentions [ 54778 - 64265 ] -- [ 4.6503e+05 - 4.8831e+05 ]
lock_stat.slock-AF_INET/1.contentions.lock_sock_nested [ 25382 - 29998 ] -- [ 1.9822e+05 - 2.0934e+05 ]
lock_stat.slock-AF_INET.contentions.lock_sock_nested [ 1.5861e+05 - 1.8802e+05 ] -- [ 1.2317e+06 - 1.3016e+06 ]
lock_stat.slock-AF_INET.contentions.tcp_v4_rcv [ 1.9181e+05 - 2.2617e+05 ] -- [ 1.5482e+06 - 1.6346e+06 ]
lock_stat.slock-AF_INET.contentions [ 1.9259e+05 - 2.269e+05 ] -- [ 1.5536e+06 - 1.6403e+06 ]
lock_stat.&(&base->lock)->rlock.contentions [ 5658 - 9045 ] -- [ 1.3812e+05 - 1.478e+05 ]
lock_stat.&(&base->lock)->rlock.contentions.lock_timer_base [ 11006 - 17636 ] -- [ 2.7183e+05 - 2.9104e+05 ]
lock_stat.slock-AF_INET.contentions.release_sock [ 33931 - 39607 ] -- [ 3.2735e+05 - 3.4512e+05 ]
lock_stat.&(&base->lock)->rlock.contentions.mod_timer [ 93 - 152 ] -- [ 2347 - 2643 ]
lock_stat.&(&zone->lock)->rlock.contentions.__free_pages_ok [ 6.4647e+07 - 6.6226e+07 ] -- [ 5.3604e+07 - 5.5065e+07 ]
vmstat.system.in [ 8921 - 9414 ] -- [ 27103 - 28369 ]
vmstat.system.cs [ 1.4924e+05 - 1.9988e+05 ] -- [ 6.1384e+05 - 6.4036e+05 ]
lock_stat.&(&zone->lock)->rlock.contentions [ 6.7612e+07 - 6.9817e+07 ] -- [ 5.7419e+07 - 5.8889e+07 ]
lock_stat.rcu_node_1.contentions.rcu_process_callbacks [ 81543 - 87346 ] -- [ 97955 - 1.0295e+05 ]
iostat.cpu.user [ 1.4141 - 1.5051 ] -- [ 2.1044 - 2.1732 ]
lock_stat.&(&zone->lock)->rlock.contentions.get_page_from_freelist [ 7.0564e+07 - 7.3656e+07 ] -- [ 6.1222e+07 - 6.2746e+07 ]
lock_stat.&rq->lock.contentions.__schedule [ 8276 - 11422 ] -- [ 1.1656e+05 - 1.9275e+05 ]
iostat.cpu.system [ 95.387 - 95.516 ] -- [ 94.736 - 94.81 ]
vmstat.cpu.sy [ 96 - 96 ] -- [ 95 - 95 ]



And here are the text plot charts for those changed stats:

* for 7e3115ef5149fc502e3a2e80719dba54a8e7409d(this commit)
O for 3029ede39373c368f402a76896600d85a4f7121b(parent)

netperf.Throughput_Mbps

3100 ++------------------------------------------------------------------+
O O |
3000 ++ O O O O O O O O
| |
| |
2900 ++ |
| |
2800 ++ |
| |
2700 ++ |
| |
| |
2600 ++ ..*......*.......*......*.......*...... ....*
| .... *... |
2500 *+------*------*----------------------------------------------------+


vmstat.system.in

30000 ++-----------------------------------------------------------------+
28000 O+ O O O |
| O O O O O O
26000 ++ |
24000 ++ |
22000 ++ |
20000 ++ |
| |
18000 ++ |
16000 ++ |
14000 ++ |
12000 ++ |
| |
10000 *+.....*.......*......*.......*......*.......*......*.......*......*
8000 ++-----------------------------------------------------------------+


vmstat.system.cs

700000 ++----------------------------------------------------------------+
O O |
600000 ++ O O O O O O O O
| |
| |
500000 ++ |
| |
400000 ++ |
| |
300000 ++ |
| |
| ...*
200000 ++ ...*...... ....*...... ....*... |
*......*.......*... *... *......*... |
100000 ++----------------------------------------------------------------+


vmstat.cpu.sy

96 *+------*------*--------------*-------*------*-------*------*-------*
| : : |
| : : |
95.8 ++ : : |
| : : |
| : : |
95.6 ++ : : |
| : : |
95.4 ++ : : |
| : : |
| : : |
95.2 ++ : : |
| : : |
| : : |
95 O+------O------O-------*------O-------O------O-------O------O-------O


lock_stat.&(&zone->lock)->rlock.contentions

7.2e+07 ++---------------------------------------------------------------+
| |
7e+07 ++ ...*......*.......*......*...... ...*.......*..... |
6.8e+07 *+.. *... . |
| *......*
6.6e+07 ++ |
| |
6.4e+07 ++ |
| |
6.2e+07 ++ |
6e+07 ++ |
| O O
5.8e+07 ++ O O O O |
O O O |
5.6e+07 ++---------------------------O-----------------------------------+


lock_stat.&(&zone->lock)->rlock.contentions.get_page_from_freelist

7.4e+07 ++---------------------------*---------------------*-------------+
| ....*... . ...*... .. |
7.2e+07 *+.....*......*... *... . |
| *......*
7e+07 ++ |
| |
6.8e+07 ++ |
| |
6.6e+07 ++ |
| |
6.4e+07 ++ |
| O O O
6.2e+07 ++ O O O |
O O O |
6e+07 ++---------------------------O-----------------------------------+


lock_stat.&(&zone->lock)->rlock.contentions.__free_pages_ok

6.8e+07 ++---------------------------------------------------------------+
| ....*..... |
6.6e+07 *+.....*......*.......*......*......*......*... . |
6.4e+07 ++ *......*
| |
6.2e+07 ++ |
| |
6e+07 ++ |
| |
5.8e+07 ++ |
5.6e+07 ++ |
| O O O
5.4e+07 O+ O O O O |
| O O |
5.2e+07 ++---------------------------------------------------------------+


lock_stat.slock-AF_INET.contentions

1.8e+06 ++---------------------------------------------------------------+
O O O O |
1.6e+06 ++ O O O O O O
1.4e+06 ++ |
| |
1.2e+06 ++ |
1e+06 ++ |
| |
800000 ++ |
600000 ++ |
| |
400000 ++ |
200000 *+..... ....*......*......*......*....... ...*......*
| *......*... *... |
0 ++---------------------------------------------------------------+


lock_stat.slock-AF_INET.contentions.lock_sock_nested

1.4e+06 ++---------------------------------------------------------------+
O O O O |
1.2e+06 ++ O O O O O O
| |
1e+06 ++ |
| |
800000 ++ |
| |
600000 ++ |
| |
400000 ++ |
| |
200000 *+.....*......*.......*......*......*......*.......*......*......*
| |
0 ++---------------------------------------------------------------+


lock_stat.slock-AF_INET.contentions.release_sock

350000 O+-----O---------------------O--------------O---------------------+
| O O O O O O
300000 ++ |
| |
250000 ++ |
| |
200000 ++ |
| |
150000 ++ |
| |
100000 ++ |
| |
50000 ++ ...*...... ....*......*...... ....*......*
*......*.......*... *... *... |
0 ++----------------------------------------------------------------+


lock_stat.slock-AF_INET.contentions.tcp_v4_rcv

1.8e+06 ++---------------------------------------------------------------+
O O O O |
1.6e+06 ++ O O O O O O
1.4e+06 ++ |
| |
1.2e+06 ++ |
1e+06 ++ |
| |
800000 ++ |
600000 ++ |
| |
400000 ++ |
200000 *+..... ....*......*......*......*....... ...*......*
| *......*... *... |
0 ++---------------------------------------------------------------+


lock_stat.rcu_node_1.contentions.rcu_process_callbacks

110000 ++----------------------------------------------------------------+
| |
105000 ++ O |
| O |
O O O
100000 ++ O O O |
| O O |
95000 ++ |
| |
90000 ++ .*
| .... |
| .*.......*......*..... ..*. |
85000 ++.... . .*...... .... |
*. *....... .... *. |
80000 ++-----------------------------------*----------------------------+


lock_stat.slock-AF_INET/1.contentions

500000 ++---------------------------O--------------O---------------------+
O O O O O O O O
450000 ++ |
400000 ++ |
| |
350000 ++ |
300000 ++ |
| |
250000 ++ |
200000 ++ |
| |
150000 ++ |
100000 ++ |
| ....*...... |
50000 *+-----*-------*------*------*--------------*------*-------*------*


lock_stat.slock-AF_INET/1.contentions.tcp_v4_rcv

500000 ++---------------------------O--------------O---------------------+
O O O O O O O O
450000 ++ |
400000 ++ |
| |
350000 ++ |
300000 ++ |
| |
250000 ++ |
200000 ++ |
| |
150000 ++ |
100000 ++ |
| ....*...... |
50000 *+-----*-------*------*------*--------------*------*-------*------*


lock_stat.slock-AF_INET/1.contentions.release_sock

300000 ++----------------------------------------------------------------+
O O O O O |
250000 ++ O O O O O
| |
| |
200000 ++ |
| |
150000 ++ |
| |
100000 ++ |
| |
| |
50000 ++ ...*...... ....*......*...... ....*......*
*......*.......*... *... *... |
0 ++----------------------------------------------------------------+


lock_stat.slock-AF_INET/1.contentions.lock_sock_nested

220000 ++----------------------------------------------------------------+
200000 O+ O O O O O O O |
| O O
180000 ++ |
160000 ++ |
| |
140000 ++ |
120000 ++ |
100000 ++ |
| |
80000 ++ |
60000 ++ |
| |
40000 ++ ...*...... ....*......*...... ....*......*
20000 *+-----*-------*-------------*---------------------*--------------+


lock_stat.&rq->lock.contentions

300000 O+----------------------------------------------------------------+
| O O O O O |
250000 ++ O O O O
| |
| |
200000 ++ |
| |
150000 ++ |
| |
100000 ++ |
| |
| |
50000 ++ |
*......*.......*......*......*.......*......*......*.......*......*
0 ++----------------------------------------------------------------+


lock_stat.&rq->lock.contentions.__schedule

200000 O+----------------------------------------------------------------+
180000 ++ O O O O O O
| |
160000 ++ |
140000 ++ |
| O |
120000 ++ O O |
100000 ++ |
80000 ++ |
| |
60000 ++ |
40000 ++ |
| |
20000 *+.....*.......*......*......*.......*......*......*.......*......*
0 ++----------------------------------------------------------------+


lock_stat.&(&base->lock)->rlock.contentions

160000 ++----------------------------------------------------------------+
O O O O O
140000 ++ O O O O O |
120000 ++ |
| |
100000 ++ |
| |
80000 ++ |
| |
60000 ++ |
40000 ++ |
| |
20000 ++ |
*...... ...*......*.......*......*......*.......*......*
0 ++-----*-------*--------------------------------------------------+


lock_stat.&(&base->lock)->rlock.contentions.lock_timer_base

300000 O+-----O---------------------O------------------------------------+
| O O O O O O O
250000 ++ |
| |
| |
200000 ++ |
| |
150000 ++ |
| |
100000 ++ |
| |
| |
50000 ++ |
*......*.......*......*......*.......*......*......*.......*......*
0 ++----------------------------------------------------------------+


lock_stat.&(&base->lock)->rlock.contentions.run_timer_softirq

2200 ++------------------------------------------------------------------+
2000 O+ O O O |
| O O O O O O
1800 ++ |
1600 ++ |
1400 ++ |
1200 ++ |
| |
1000 ++ |
800 ++ |
600 ++ |
400 ++ |
| ....*...... ....*......*....... ...*.......|
200 *+......*......*... *... *... *
0 ++------------------------------------------------------------------+


lock_stat.&(&base->lock)->rlock.contentions.mod_timer

3000 ++------------------------------------------------------------------+
| |
2500 O+ O O O O O
| O O O |
| |
2000 ++ |
| |
1500 ++ |
| O |
1000 ++ |
| |
| |
500 ++ |
*....... ....*......*.......*......*.......*......*.......*
0 ++------*------*----------------------------------------------------+


iostat.cpu.user

2.2 O+--------------------------------------------O----------------------+
| O O O O O O O |
2.1 ++ O
2 ++ |
| |
1.9 ++ |
1.8 ++ |
| |
1.7 ++ |
1.6 ++ |
| |
1.5 ++ ....*......*....... ...*.......*
1.4 *+......*...... ....*... *.......*... |
| *... |
1.3 ++-------------------------------------------------------------------+


iostat.cpu.system

95.6 ++------------------------------------------------------------------+
*....... ...*.......*..... ....*..... |
95.5 ++ *... . ...*... . |
95.4 ++ *.......*... *.......*
| |
95.3 ++ |
95.2 ++ |
| |
95.1 ++ |
95 ++ |
| |
94.9 ++ |
94.8 ++ O O O
O O O O O O O |
94.7 ++------------------------------------------------------------------+


And here is the bisect log:
# bad: [e9a5cecac4145ba4a64827ac55728d33f5c8bb79] Merge 'drm/drm-fixes' into devel-hourly-2013092910
# good: [272b98c6455f00884f0350f775c5342358ebb73f] Linux 3.12-rc1
git bisect start 'e9a5cecac4145ba4a64827ac55728d33f5c8bb79' '272b98c6455f00884f0350f775c5342358ebb73f' '--'
# bad: [654fdd041227d7de1594baa61c58f2c87bd0640f] Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 654fdd041227d7de1594baa61c58f2c87bd0640f
# good: [b75ff5e84bb6c2d43a8ec39b240c80f0543821f0] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect good b75ff5e84bb6c2d43a8ec39b240c80f0543821f0
# good: [d8524ae9d6f492a9c6db9f4d89c5f9b8782fa2d5] Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
git bisect good d8524ae9d6f492a9c6db9f4d89c5f9b8782fa2d5
# good: [4a10c2ac2f368583138b774ca41fac4207911983] Linux 3.12-rc2
git bisect good 4a10c2ac2f368583138b774ca41fac4207911983
# good: [a153e67bda3639a46edac6205610ae63c0fdea4c] Merge branch 'akpm' (patches from Andrew Morton)
git bisect good a153e67bda3639a46edac6205610ae63c0fdea4c
# good: [fa7315871046b9a4c48627905691dbde57e51033] perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page'
git bisect good fa7315871046b9a4c48627905691dbde57e51033
# bad: [82dfaa58a79c121be3611ce549dec806f2e6004f] Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 82dfaa58a79c121be3611ce549dec806f2e6004f
# good: [bdc5663fa14de657f24080ee959670d49c8dd094] Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good bdc5663fa14de657f24080ee959670d49c8dd094
# good: [3029ede39373c368f402a76896600d85a4f7121b] sched/balancing: Fix 'local->avg_load > busiest->avg_load' case in fix_small_imbalance()
git bisect good 3029ede39373c368f402a76896600d85a4f7121b
# bad: [7e3115ef5149fc502e3a2e80719dba54a8e7409d] sched/balancing: Fix cfs_rq->task_h_load calculation
git bisect bad 7e3115ef5149fc502e3a2e80719dba54a8e7409d
# first bad commit: [7e3115ef5149fc502e3a2e80719dba54a8e7409d] sched/balancing: Fix cfs_rq->task_h_load calculation

Please feel free to ask more data.

Thanks.

--yliu
>
> Signed-off-by: Vladimir Davydov <[email protected]>
> Reviewed-by: Paul Turner <[email protected]>
> Signed-off-by: Peter Zijlstra <[email protected]>
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 2aedacc..7c70201 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4242,7 +4242,7 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
> }
>
> if (!se) {
> - cfs_rq->h_load = rq->avg.load_avg_contrib;
> + cfs_rq->h_load = cfs_rq->runnable_load_avg;
> cfs_rq->last_h_load_update = now;
> }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2013-09-30 08:14:17

by Vladimir Davydov

[permalink] [raw]
Subject: Re: [tip:sched/core] sched/balancing: Fix cfs_rq-> task_h_load calculation

On 09/29/2013 01:47 PM, Yuanhan Liu wrote:
> On Fri, Sep 20, 2013 at 06:46:59AM -0700, tip-bot for Vladimir Davydov wrote:
>> Commit-ID: 7e3115ef5149fc502e3a2e80719dba54a8e7409d
>> Gitweb:http://git.kernel.org/tip/7e3115ef5149fc502e3a2e80719dba54a8e7409d
>> Author: Vladimir Davydov<[email protected]>
>> AuthorDate: Sat, 14 Sep 2013 19:39:46 +0400
>> Committer: Ingo Molnar<[email protected]>
>> CommitDate: Fri, 20 Sep 2013 11:59:39 +0200
>>
>> sched/balancing: Fix cfs_rq->task_h_load calculation
>>
>> Patch a003a2 (sched: Consider runnable load average in move_tasks())
>> sets all top-level cfs_rqs' h_load to rq->avg.load_avg_contrib, which is
>> always 0. This mistype leads to all tasks having weight 0 when load
>> balancing in a cpu-cgroup enabled setup. There obviously should be sum
>> of weights of all runnable tasks there instead. Fix it.
> Hi Vladimir,
>
> FYI, Here we found a 17% netperf regression by this patch. Here are some
> changed stats between this commit 7e3115ef5149fc502e3a2e80719dba54a8e7409d
> and it's parent(3029ede39373c368f402a76896600d85a4f7121b)

Hello,

Could you please report the following info:

1) the test machine cpu topology (i.e. output of
/sys/devices/system/cpu/cpu*/{thread_siblings_list,core_siblings_list})
2) kernel config you used during the test
3) the output of /sys/kernel/debug/sched_features (debugfs mounted).
4) netperf server/client options
5) did you place netserver into a separate cpu cgroup?

Thanks.

2013-10-01 02:22:44

by Yuanhan Liu

[permalink] [raw]
Subject: Re: [tip:sched/core] sched/balancing: Fix cfs_rq-> task_h_load calculation

On Mon, Sep 30, 2013 at 12:14:03PM +0400, Vladimir Davydov wrote:
> On 09/29/2013 01:47 PM, Yuanhan Liu wrote:
> >On Fri, Sep 20, 2013 at 06:46:59AM -0700, tip-bot for Vladimir Davydov wrote:
> >>Commit-ID: 7e3115ef5149fc502e3a2e80719dba54a8e7409d
> >>Gitweb:http://git.kernel.org/tip/7e3115ef5149fc502e3a2e80719dba54a8e7409d
> >>Author: Vladimir Davydov<[email protected]>
> >>AuthorDate: Sat, 14 Sep 2013 19:39:46 +0400
> >>Committer: Ingo Molnar<[email protected]>
> >>CommitDate: Fri, 20 Sep 2013 11:59:39 +0200
> >>
> >>sched/balancing: Fix cfs_rq->task_h_load calculation
> >>
> >>Patch a003a2 (sched: Consider runnable load average in move_tasks())
> >>sets all top-level cfs_rqs' h_load to rq->avg.load_avg_contrib, which is
> >>always 0. This mistype leads to all tasks having weight 0 when load
> >>balancing in a cpu-cgroup enabled setup. There obviously should be sum
> >>of weights of all runnable tasks there instead. Fix it.
> >Hi Vladimir,
> >
> >FYI, Here we found a 17% netperf regression by this patch. Here are some
> >changed stats between this commit 7e3115ef5149fc502e3a2e80719dba54a8e7409d
> >and it's parent(3029ede39373c368f402a76896600d85a4f7121b)
>
> Hello,
>
> Could you please report the following info:

Hi Vladimir,

This regression was first found at a 2-core 32 CPU Sandybridge server
with 64G memory. However, I can't ssh to it now and we are off work
this week due to holiday. So, sorry, email response may be delayed.

Then I found this regression exists at another atom micro server as
well. And the following machine and testcase specific info are all from it.

And to not make old data confuse you, here I also update the changed
stats and corresponding text plot as well in attachment.
>
> 1) the test machine cpu topology (i.e. output of /sys/devices/system/cpu/cpu*/{thread_siblings_list,core_siblings_list})

# grep . /sys/devices/system/cpu/cpu*/topology/{thread_siblings_list,core_siblings_list}
/sys/devices/system/cpu/cpu0/topology/thread_siblings_list:0-1
/sys/devices/system/cpu/cpu1/topology/thread_siblings_list:0-1
/sys/devices/system/cpu/cpu2/topology/thread_siblings_list:2-3
/sys/devices/system/cpu/cpu3/topology/thread_siblings_list:2-3
/sys/devices/system/cpu/cpu0/topology/core_siblings_list:0-3
/sys/devices/system/cpu/cpu1/topology/core_siblings_list:0-3
/sys/devices/system/cpu/cpu2/topology/core_siblings_list:0-3
/sys/devices/system/cpu/cpu3/topology/core_siblings_list:0-3

> 2) kernel config you used during the test

Attached.

> 3) the output of /sys/kernel/debug/sched_features (debugfs mounted).

# cat /sys/kernel/debug/sched_features
GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY CACHE_HOT_BUDDY
WAKEUP_PREEMPTION ARCH_POWER NO_HRTICK NO_DOUBLE_TICK LB_BIAS NONTASK_POWER
TTWU_QUEUE NO_FORCE_SD_OVERLAP RT_RUNTIME_SHARE NO_LB_MIN NO_NUMA NO_NUMA_FORCE

> 4) netperf server/client options

Here is our testscript we used:
#!/bin/bash
# - test

# start netserver
netserver

sleep 1

for i in $(seq $nr_threads)
do
netperf -t $test -c -C -l $runtime &
done

Where,
$test is TCP_SENDFILE,
$nr_threads is 8, two times of nr cpu
$runtime is 120s

> 5) did you place netserver into a separate cpu cgroup?

Nope.


Thanks.

--yliu


Attachments:
(No filename) (3.21 kB)
changed-stats (18.76 kB)
config-3.12.0-rc1-00038-g7e3115e (78.16 kB)
Download all attachments

2013-10-01 02:32:48

by Paul Turner

[permalink] [raw]
Subject: Re: [tip:sched/core] sched/balancing: Fix cfs_rq-> task_h_load calculation

On Mon, Sep 30, 2013 at 7:22 PM, Yuanhan Liu
<[email protected]> wrote:
> On Mon, Sep 30, 2013 at 12:14:03PM +0400, Vladimir Davydov wrote:
>> On 09/29/2013 01:47 PM, Yuanhan Liu wrote:
>> >On Fri, Sep 20, 2013 at 06:46:59AM -0700, tip-bot for Vladimir Davydov wrote:
>> >>Commit-ID: 7e3115ef5149fc502e3a2e80719dba54a8e7409d
>> >>Gitweb:http://git.kernel.org/tip/7e3115ef5149fc502e3a2e80719dba54a8e7409d
>> >>Author: Vladimir Davydov<[email protected]>
>> >>AuthorDate: Sat, 14 Sep 2013 19:39:46 +0400
>> >>Committer: Ingo Molnar<[email protected]>
>> >>CommitDate: Fri, 20 Sep 2013 11:59:39 +0200
>> >>
>> >>sched/balancing: Fix cfs_rq->task_h_load calculation
>> >>
>> >>Patch a003a2 (sched: Consider runnable load average in move_tasks())
>> >>sets all top-level cfs_rqs' h_load to rq->avg.load_avg_contrib, which is
>> >>always 0. This mistype leads to all tasks having weight 0 when load
>> >>balancing in a cpu-cgroup enabled setup. There obviously should be sum
>> >>of weights of all runnable tasks there instead. Fix it.
>> >Hi Vladimir,
>> >
>> >FYI, Here we found a 17% netperf regression by this patch. Here are some
>> >changed stats between this commit 7e3115ef5149fc502e3a2e80719dba54a8e7409d
>> >and it's parent(3029ede39373c368f402a76896600d85a4f7121b)
>>
>> Hello,
>>
>> Could you please report the following info:
>
> Hi Vladimir,
>
> This regression was first found at a 2-core 32 CPU Sandybridge server
> with 64G memory. However, I can't ssh to it now and we are off work
> this week due to holiday. So, sorry, email response may be delayed.
>
> Then I found this regression exists at another atom micro server as
> well. And the following machine and testcase specific info are all from it.
>
> And to not make old data confuse you, here I also update the changed
> stats and corresponding text plot as well in attachment.
>>
>> 1) the test machine cpu topology (i.e. output of /sys/devices/system/cpu/cpu*/{thread_siblings_list,core_siblings_list})
>
> # grep . /sys/devices/system/cpu/cpu*/topology/{thread_siblings_list,core_siblings_list}
> /sys/devices/system/cpu/cpu0/topology/thread_siblings_list:0-1
> /sys/devices/system/cpu/cpu1/topology/thread_siblings_list:0-1
> /sys/devices/system/cpu/cpu2/topology/thread_siblings_list:2-3
> /sys/devices/system/cpu/cpu3/topology/thread_siblings_list:2-3
> /sys/devices/system/cpu/cpu0/topology/core_siblings_list:0-3
> /sys/devices/system/cpu/cpu1/topology/core_siblings_list:0-3
> /sys/devices/system/cpu/cpu2/topology/core_siblings_list:0-3
> /sys/devices/system/cpu/cpu3/topology/core_siblings_list:0-3
>

>> 2) kernel config you used during the test
>
> Attached.
>
>> 3) the output of /sys/kernel/debug/sched_features (debugfs mounted).
>
> # cat /sys/kernel/debug/sched_features
> GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY CACHE_HOT_BUDDY
> WAKEUP_PREEMPTION ARCH_POWER NO_HRTICK NO_DOUBLE_TICK LB_BIAS NONTASK_POWER
> TTWU_QUEUE NO_FORCE_SD_OVERLAP RT_RUNTIME_SHARE NO_LB_MIN NO_NUMA NO_NUMA_FORCE
>
>> 4) netperf server/client options
>
> Here is our testscript we used:
> #!/bin/bash
> # - test
>
> # start netserver
> netserver
>
> sleep 1
>
> for i in $(seq $nr_threads)
> do
> netperf -t $test -c -C -l $runtime &
> done
>
> Where,
> $test is TCP_SENDFILE,
> $nr_threads is 8, two times of nr cpu
> $runtime is 120s
>
>> 5) did you place netserver into a separate cpu cgroup?
>
> Nope.
>


If this is causing a regression I think it actually calls into
question the original series that included a003a25b227d59d. This
patch only makes h_load not be a nonsense value.