2018-06-12 17:49:59

by Subhra Mazumdar

Subject: [RFC/RFT V2 PATCH 0/5] Improve scheduler scalability for fast path

Currently select_idle_sibling first tries to find a fully idle core using
select_idle_core, which can potentially search all cores; if that fails it
falls back to select_idle_cpu to find any idle cpu, which in turn can
potentially search all cpus in the LLC domain. This doesn't scale for large
LLC domains and will only get worse as core counts grow.

This patch set addresses the scalability problem by:
- Setting upper and lower limits on the idle cpu search in select_idle_cpu
to keep the search time low and roughly constant
- Adding a new sched feature SIS_CORE to disable select_idle_core

Additionally, it introduces a new per-cpu variable next_cpu to record where
the previous search ended, so that each new search starts from that point.
This rotating search window over the cpus in the LLC domain ensures that
idle cpus are eventually found under high load.
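
For illustration only, the combined effect of the limits and the rotating
window can be sketched in plain C. Everything below (NR_LLC_CPUS,
CPUS_PER_CORE, cpu_is_idle(), the next_cpu[] array) is a hypothetical
stand-in; the real changes are in the patches that follow.

/* Illustrative model of the bounded, rotating scan (not kernel code). */
#include <stdbool.h>

#define NR_LLC_CPUS     44      /* hypothetical LLC size */
#define CPUS_PER_CORE   2       /* hypothetical SMT width */

/* -1 means "no previous scan"; GCC range initializer, kernel style. */
static int next_cpu[NR_LLC_CPUS] = { [0 ... NR_LLC_CPUS - 1] = -1 };

static bool cpu_is_idle(int cpu)        /* stand-in for available_idle_cpu() */
{
        (void)cpu;
        return false;
}

static int scan_for_idle(int target)
{
        int floor = CPUS_PER_CORE < 2 ? 2 : CPUS_PER_CORE;
        int limit = 2 * floor;          /* never probe more cpus than this */
        int start = next_cpu[target] >= 0 ? next_cpu[target] : target;
        int i, cpu;

        /* SIS_PROP scales the real budget between floor and limit using
         * avg_idle/avg_cost; the upper bound is used here for brevity. */
        for (i = 0; i < limit; i++) {
                cpu = (start + i) % NR_LLC_CPUS;        /* wrap over the LLC */
                next_cpu[target] = cpu;         /* resume here next time */
                if (cpu_is_idle(cpu))
                        return cpu;
        }
        return -1;                      /* give up within the fixed budget */
}

The actual patches keep select_idle_cpu()'s existing SIS_PROP cost model
and only add the clamping and the per-cpu cursor.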

Following are the performance numbers for various benchmarks with SIS_CORE
set to true (idle core search enabled).

Hackbench process on 2 socket, 44 core and 88 threads Intel x86 machine
(lower is better):
groups baseline %stdev patch %stdev
1 0.5816 8.94 0.5903 (-1.5%) 11.28
2 0.6428 10.64 0.5843 (9.1%) 4.93
4 1.0152 1.99 0.9965 (1.84%) 1.83
8 1.8128 1.4 1.7921 (1.14%) 1.76
16 3.1666 0.8 3.1345 (1.01%) 0.81
32 5.6084 0.83 5.5677 (0.73%) 0.8

Sysbench MySQL on 2 socket, 44 core and 88 threads Intel x86 machine
(higher is better):
threads baseline %stdev patch %stdev
8 2095.45 1.82 2102.6 (0.34%) 2.11
16 4218.45 0.06 4221.35 (0.07%) 0.38
32 7531.36 0.49 7607.18 (1.01%) 0.25
48 10206.42 0.21 10324.26 (1.15%) 0.13
64 12053.73 0.1 12158.3 (0.87%) 0.24
128 14810.33 0.04 14840.4 (0.2%) 0.38

Oracle DB on 2 socket, 44 core and 88 threads Intel x86 machine
(normalized, higher is better):
users baseline %stdev patch %stdev
20 1 0.9 1.0068 (0.68%) 0.27
40 1 0.8 1.0103 (1.03%) 1.24
60 1 0.34 1.0178 (1.78%) 0.49
80 1 0.53 1.0092 (0.92%) 1.5
100 1 0.79 1.0090 (0.9%) 0.88
120 1 0.06 1.0048 (0.48%) 0.72
140 1 0.22 1.0116 (1.16%) 0.05
160 1 0.57 1.0264 (2.64%) 0.67
180 1 0.81 1.0194 (1.94%) 0.91
200 1 0.44 1.028 (2.8%) 3.09
220 1 1.74 1.0229 (2.29%) 0.21

Uperf pingpong on 2 socket, 44 core and 88 threads Intel x86 machine with
message size = 8k (higher is better):
threads baseline %stdev patch %stdev
8 45.36 0.43 46.28 (2.01%) 0.29
16 87.81 0.82 89.67 (2.12%) 0.38
32 151.19 0.02 153.5 (1.53%) 0.41
48 190.2 0.21 194.79 (2.41%) 0.07
64 190.42 0.35 202.9 (6.55%) 1.66
128 323.86 0.28 343.56 (6.08%) 1.34

Dbench on 2 socket, 44 core and 88 threads Intel x86 machine
(higher is better):
clients baseline patch
1 629.8 603.83 (-4.12%)
2 1159.65 1155.75 (-0.34%)
4 2121.61 2093.99 (-1.3%)
8 2620.52 2641.51 (0.8%)
16 2879.31 2897.6 (0.64%)
32 2791.24 2936.47 (5.2%)
64 1853.07 1894.74 (2.25%)
128 1484.95 1494.29 (0.63%)

Tbench on 2 socket, 44 core and 88 threads Intel x86 machine
(higher is better):
clients baseline patch
1 256.41 255.8 (-0.24%)
2 509.89 504.52 (-1.05%)
4 999.44 1003.74 (0.43%)
8 1982.7 1976.42 (-0.32%)
16 3891.51 3916.04 (0.63%)
32 6819.24 6845.06 (0.38%)
64 8542.95 8568.28 (0.3%)
128 15277.6 15754.6 (3.12%)

Schbench on 2 socket, 44 core and 88 threads Intel x86 machine with 44
tasks (lower is better):
percentile baseline %stdev patch %stdev
50 94 2.82 92 (2.13%) 2.17
75 124 2.13 122 (1.61%) 1.42
90 152 1.74 151 (0.66%) 0.66
95 171 2.11 170 (0.58%) 0
99 512.67 104.96 208.33 (59.36%) 1.2
99.5 2296 82.55 3674.66 (-60.05%) 22.19
99.9 12517.33 2.38 12784 (-2.13%) 0.66

Hackbench process on 2 socket, 16 core and 128 threads SPARC machine
(lower is better):
groups baseline %stdev patch %stdev
1 1.3085 6.65 1.2213 (6.66%) 10.32
2 1.4559 8.55 1.5048 (-3.36%) 4.72
4 2.6271 1.74 2.5532 (2.81%) 2.02
8 4.7089 3.01 4.5118 (4.19%) 2.74
16 8.7406 2.25 8.6801 (0.69%) 4.78
32 17.7835 1.01 16.759 (5.76%) 1.38
64 36.1901 0.65 34.6652 (4.21%) 1.24
128 72.6585 0.51 70.9762 (2.32%) 0.9

Following are the performance numbers for the same benchmarks with SIS_CORE
set to false (idle core search disabled). This improves throughput for
certain workloads but increases latency for others.

Hackbench process on 2 socket, 44 core and 88 threads Intel x86 machine
(lower is better):
groups baseline %stdev patch %stdev
1 0.5816 8.94 0.5835 (-0.33%) 8.21
2 0.6428 10.64 0.5752 (10.52%) 4.05
4 1.0152 1.99 0.9946 (2.03%) 2.56
8 1.8128 1.4 1.7619 (2.81%) 1.88
16 3.1666 0.8 3.1275 (1.23%) 0.42
32 5.6084 0.83 5.5856 (0.41%) 0.89

Sysbench MySQL on 2 socket, 44 core and 88 threads Intel x86 machine
(higher is better):
threads baseline %stdev patch %stdev
8 2095.45 1.82 2084.72 (-0.51%) 1.65
16 4218.45 0.06 4179.69 (-0.92%) 0.18
32 7531.36 0.49 7623.18 (1.22%) 0.39
48 10206.42 0.21 10159.16 (-0.46%) 0.21
64 12053.73 0.1 12087.21 (0.28%) 0.19
128 14810.33 0.04 14894.08 (0.57%) 0.08

Oracle DB on 2 socket, 44 core and 88 threads Intel x86 machine
(normalized, higher is better):
users baseline %stdev patch %stdev
20 1 0.9 1.0056 (0.56%) 0.34
40 1 0.8 1.0173 (1.73%) 0.13
60 1 0.34 0.9995 (-0.05%) 0.85
80 1 0.53 1.0175 (1.75%) 1.56
100 1 0.79 1.0151 (1.51%) 1.31
120 1 0.06 1.0244 (2.44%) 0.5
140 1 0.22 1.034 (3.4%) 0.66
160 1 0.57 1.0362 (3.62%) 0.07
180 1 0.81 1.041 (4.1%) 0.8
200 1 0.44 1.0233 (2.33%) 1.4
220 1 1.74 1.0125 (1.25%) 1.41

Uperf pingpong on 2 socket, 44 core and 88 threads Intel x86 machine with
message size = 8k (higher is better):
threads baseline %stdev patch %stdev
8 45.36 0.43 46.94 (3.48%) 0.2
16 87.81 0.82 91.75 (4.49%) 0.43
32 151.19 0.02 167.74 (10.95%) 1.29
48 190.2 0.21 200.57 (5.45%) 0.89
64 190.42 0.35 226.74 (19.07%) 1.79
128 323.86 0.28 348.12 (7.49%) 0.77

Dbench on 2 socket, 44 core and 88 threads Intel x86 machine
(higher is better):
clients baseline patch
1 629.8 600.19 (-4.7%)
2 1159.65 1162.07 (0.21%)
4 2121.61 2112.27 (-0.44%)
8 2620.52 2645.55 (0.96%)
16 2879.31 2828.87 (-1.75%)
32 2791.24 2760.97 (-1.08%)
64 1853.07 1747.66 (-5.69%)
128 1484.95 1459.81 (-1.69%)

Tbench on 2 socket, 44 core and 88 threads Intel x86 machine
(higher is better):
clients baseline patch
1 256.41 258.11 (0.67%)
2 509.89 509.13 (-0.15%)
4 999.44 1016.58 (1.72%)
8 1982.7 2006.53 (1.2%)
16 3891.51 3964.43 (1.87%)
32 6819.24 7376.92 (8.18%)
64 8542.95 9660.45 (13.08%)
128 15277.6 15438.4 (1.05%)

Schbench on 2 socket, 44 core and 88 threads Intel x86 machine with 44
tasks (lower is better):
percentile baseline %stdev patch %stdev
50 94 2.82 94.67 (-0.71%) 2.2
75 124 2.13 124.67 (-0.54%) 1.67
90 152 1.74 154.33 (-1.54%) 0.75
95 171 2.11 176.67 (-3.31%) 0.86
99 512.67 104.96 4130.33 (-705.65%) 79.41
99.5 2296 82.55 10066.67 (-338.44%) 26.15
99.9 12517.33 2.38 12869.33 (-2.81%) 0.8

Hackbench process on 2 socket, 16 core and 128 threads SPARC machine
(lower is better):
groups baseline %stdev patch %stdev
1 1.3085 6.65 1.2514 (4.36%) 11.1
2 1.4559 8.55 1.5433 (-6%) 3.05
4 2.6271 1.74 2.5626 (2.5%) 2.69
8 4.7089 3.01 4.5316 (3.77%) 2.95
16 8.7406 2.25 8.6585 (0.94%) 2.91
32 17.7835 1.01 17.175 (3.42%) 1.38
64 36.1901 0.65 35.5294 (1.83%) 1.02
128 72.6585 0.51 71.8821 (1.07%) 1.05

Following are the schbench performance numbers with both SIS_CORE and
SIS_PROP set to false. This recovers the latency increase seen with only
SIS_CORE false.

Schbench on 2 socket, 44 core and 88 threads Intel x86 machine with 44
tasks (lower is better):
percentile baseline %stdev patch %stdev
50 94 2.82 93.33 (0.71%) 1.24
75 124 2.13 122.67 (1.08%) 1.7
90 152 1.74 149.33 (1.75%) 2.35
95 171 2.11 167 (2.34%) 2.74
99 512.67 104.96 206 (59.82%) 8.86
99.5 2296 82.55 3121.67 (-35.96%) 97.37
99.9 12517.33 2.38 12592 (-0.6%) 1.67

Changes since v1
- Compute the upper and lower limit based on number of cpus in a core
- Split up the search limit and search window rotation into separate
patches
- Add new sched feature to have option of disabling idle core search

subhra mazumdar (5):
sched: limit cpu search in select_idle_cpu
sched: introduce per-cpu var next_cpu to track search limit
sched: rotate the cpu search window for better spread
sched: add sched feature to disable idle core search
sched: SIS_CORE to disable idle core search

kernel/sched/core.c | 2 ++
kernel/sched/fair.c | 31 +++++++++++++++++++++++--------
kernel/sched/features.h | 1 +
kernel/sched/sched.h | 1 +
4 files changed, 27 insertions(+), 8 deletions(-)

--
2.9.3



2018-06-12 17:49:51

by Subhra Mazumdar

Subject: [PATCH 4/5] sched: add sched feature to disable idle core search

Add a new sched feature, SIS_CORE, to provide an option to disable the idle
core search (select_idle_core).
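
For context, features.h is consumed as an X-macro: each SCHED_FEAT(name,
default) line both allocates a bit and records its default value, and
sched_feat(name) tests that bit at runtime. Below is a minimal
self-contained model of that pattern, not the kernel's actual definitions.

/* Toy model of the SCHED_FEAT()/sched_feat() pattern (illustration only). */
#include <stdio.h>

#define FEATURE_LIST(F)         \
        F(SIS_AVG_CPU, 0)       \
        F(SIS_PROP,    1)       \
        F(SIS_CORE,    1)

/* First expansion: one enum constant (bit index) per feature. */
#define MAKE_ENUM(name, enabled)        FEAT_##name,
enum { FEATURE_LIST(MAKE_ENUM) NR_FEATURES };

/* Second expansion: pack the default values into a bitmask. */
#define MAKE_MASK(name, enabled)        ((unsigned long)(enabled) << FEAT_##name) |
static unsigned long features = FEATURE_LIST(MAKE_MASK) 0UL;

/* Runtime check, analogous to sched_feat(SIS_CORE). */
#define feat_enabled(name)      (features & (1UL << FEAT_##name))

int main(void)
{
        printf("SIS_CORE: %s\n", feat_enabled(SIS_CORE) ? "on" : "off");
        features &= ~(1UL << FEAT_SIS_CORE);    /* what writing NO_SIS_CORE does */
        printf("SIS_CORE: %s\n", feat_enabled(SIS_CORE) ? "on" : "off");
        return 0;
}

On kernels built with CONFIG_SCHED_DEBUG the feature bits are exposed via
/sys/kernel/debug/sched_features, so the new flag can be toggled at runtime
by writing SIS_CORE or NO_SIS_CORE to that file.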

Signed-off-by: subhra mazumdar <[email protected]>
---
kernel/sched/features.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 85ae848..de15733 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -57,6 +57,7 @@ SCHED_FEAT(TTWU_QUEUE, true)
*/
SCHED_FEAT(SIS_AVG_CPU, false)
SCHED_FEAT(SIS_PROP, true)
+SCHED_FEAT(SIS_CORE, true)

/*
* Issue a WARN when we do multiple update_rq_clock() calls
--
2.9.3


2018-06-12 17:50:20

by Subhra Mazumdar

Subject: [PATCH 5/5] sched: SIS_CORE to disable idle core search

Use SIS_CORE to optionally disable the idle core search. For some workloads
select_idle_core becomes a scalability bottleneck, and removing it improves
throughput. There are also workloads where disabling it hurts latency,
hence the need for an option.

Signed-off-by: subhra mazumdar <[email protected]>
---
kernel/sched/fair.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 849c7c8..35a076e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6464,9 +6464,11 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
if (!sd)
return target;

- i = select_idle_core(p, sd, target);
- if ((unsigned)i < nr_cpumask_bits)
- return i;
+ if (sched_feat(SIS_CORE)) {
+ i = select_idle_core(p, sd, target);
+ if ((unsigned)i < nr_cpumask_bits)
+ return i;
+ }

i = select_idle_cpu(p, sd, target);
if ((unsigned)i < nr_cpumask_bits)
--
2.9.3


2018-06-12 17:50:21

by Subhra Mazumdar

Subject: [PATCH 2/5] sched: introduce per-cpu var next_cpu to track search limit

Introduce a per-cpu variable to track how far the idle cpu search in
select_idle_cpu() got, so that the next search can start from there. This
is necessary for rotating the search window over the entire LLC domain.

Signed-off-by: subhra mazumdar <[email protected]>
---
kernel/sched/core.c | 2 ++
kernel/sched/sched.h | 1 +
2 files changed, 3 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8d59b25..b3e4ec1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -20,6 +20,7 @@
#include <trace/events/sched.h>

DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
+DEFINE_PER_CPU_SHARED_ALIGNED(int, next_cpu);

#if defined(CONFIG_SCHED_DEBUG) && defined(HAVE_JUMP_LABEL)
/*
@@ -5996,6 +5997,7 @@ void __init sched_init(void)
for_each_possible_cpu(i) {
struct rq *rq;

+ per_cpu(next_cpu, i) = -1;
rq = cpu_rq(i);
raw_spin_lock_init(&rq->lock);
rq->nr_running = 0;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 67702b4..eb12b50 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -912,6 +912,7 @@ static inline void update_idle_core(struct rq *rq) { }
#endif

DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
+DECLARE_PER_CPU_SHARED_ALIGNED(int, next_cpu);

#define cpu_rq(cpu) (&per_cpu(runqueues, (cpu)))
#define this_rq() this_cpu_ptr(&runqueues)
--
2.9.3


2018-06-12 17:50:38

by Subhra Mazumdar

Subject: [PATCH 3/5] sched: rotate the cpu search window for better spread

Rotate the cpu search window for a better spread of threads. This ensures
an idle cpu is quickly found if one exists.
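
To see why the rotation matters, here is a toy userspace model (not kernel
code; NR_CPUS, BUDGET and the single cursor variable are made-up stand-ins
for the LLC span, the patch 1 search limit and per_cpu(next_cpu, target)).
With a fixed starting point and a small budget, the same few cpus get
probed on every wakeup; a rotating start eventually covers the whole domain.

#include <stdio.h>

#define NR_CPUS 8       /* toy LLC */
#define BUDGET  4       /* cpus probed per wakeup, cf. patch 1 */

static int cursor = -1; /* models per_cpu(next_cpu, target) */

static void probe(int target, int rotate)
{
        int start = (rotate && cursor >= 0) ? cursor : target;
        int i;

        printf("probed:");
        for (i = 0; i < BUDGET; i++) {
                int cpu = (start + i) % NR_CPUS;
                cursor = cpu;           /* remember the last cpu visited */
                printf(" %d", cpu);
        }
        printf("\n");
}

int main(void)
{
        int wakeup;

        puts("start at target (pre-patch):");
        for (wakeup = 0; wakeup < 3; wakeup++)
                probe(0, 0);            /* always cpus 0 1 2 3 */

        cursor = -1;
        puts("rotating start (with next_cpu):");
        for (wakeup = 0; wakeup < 3; wakeup++)
                probe(0, 1);            /* 0 1 2 3, then 3 4 5 6, then 6 7 0 1 */
        return 0;
}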

Signed-off-by: subhra mazumdar <[email protected]>
---
kernel/sched/fair.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9a6d28d..849c7c8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6372,7 +6372,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
u64 avg_cost, avg_idle;
u64 time, cost;
s64 delta;
- int cpu, limit, floor, nr = INT_MAX;
+ int cpu, limit, floor, target_tmp, nr = INT_MAX;

this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
if (!this_sd)
@@ -6403,9 +6403,15 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
}
}

+ if (per_cpu(next_cpu, target) != -1)
+ target_tmp = per_cpu(next_cpu, target);
+ else
+ target_tmp = target;
+
time = local_clock();

- for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
+ for_each_cpu_wrap(cpu, sched_domain_span(sd), target_tmp) {
+ per_cpu(next_cpu, target) = cpu;
if (!--nr)
return -1;
if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
--
2.9.3


2018-06-12 17:51:25

by Subhra Mazumdar

Subject: [PATCH 1/5] sched: limit cpu search in select_idle_cpu

Put upper and lower limits on the cpu search in select_idle_cpu. The lower
limit is the number of cpus in a core, while the upper limit is twice that.
This ensures that on any architecture we will usually search beyond a core.
The upper limit also keeps the search cost low and roughly constant.
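
As a sketch of the bound computation (illustration only; cpus_per_core
stands in for cpumask_weight(topology_sibling_cpumask(target)) and the
span_avg/avg_cost arguments for the existing SIS_PROP inputs):

static unsigned int bounded_nr(unsigned int cpus_per_core,
                               unsigned long long span_avg,
                               unsigned long long avg_cost)
{
        unsigned int floor = cpus_per_core < 2 ? 2 : cpus_per_core;
        unsigned int limit = 2 * floor;
        unsigned int nr;

        if (span_avg > floor * avg_cost) {
                nr = span_avg / avg_cost;       /* proportional, as before */
                if (nr > limit)
                        nr = limit;             /* new: hard upper cap */
        } else {
                nr = floor;                     /* new: at least a core's worth */
        }
        return nr;
}

On the SMT-2 x86 machine used below this gives floor = 2 and limit = 4, so
at most 4 cpus are scanned where the old code scanned at least 4; on the
8-thread-per-core SPARC machine it gives floor = 8 and limit = 16. Either
way the cost no longer grows with the size of the LLC domain.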

Signed-off-by: subhra mazumdar <[email protected]>
---
kernel/sched/fair.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e497c05..9a6d28d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6372,7 +6372,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
u64 avg_cost, avg_idle;
u64 time, cost;
s64 delta;
- int cpu, nr = INT_MAX;
+ int cpu, limit, floor, nr = INT_MAX;

this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
if (!this_sd)
@@ -6389,11 +6389,18 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
return -1;

if (sched_feat(SIS_PROP)) {
+ floor = cpumask_weight(topology_sibling_cpumask(target));
+ if (floor < 2)
+ floor = 2;
+ limit = 2*floor;
u64 span_avg = sd->span_weight * avg_idle;
- if (span_avg > 4*avg_cost)
+ if (span_avg > floor*avg_cost) {
nr = div_u64(span_avg, avg_cost);
- else
- nr = 4;
+ if (nr > limit)
+ nr = limit;
+ } else {
+ nr = floor;
+ }
}

time = local_clock();
--
2.9.3


2018-06-12 20:36:01

by kernel test robot

Subject: Re: [PATCH 1/5] sched: limit cpu search in select_idle_cpu

Hi subhra,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on v4.17 next-20180612]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/subhra-mazumdar/Improve-scheduler-scalability-for-fast-path/20180613-015158
config: i386-randconfig-x070-201823 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All warnings (new ones prefixed by >>):

kernel/sched/fair.c: In function 'select_idle_cpu':
>> kernel/sched/fair.c:6396:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
u64 span_avg = sd->span_weight * avg_idle;
^~~
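
For reference, the warning is C90's rule (enforced in the kernel build via
-Wdeclaration-after-statement) that declarations in a block must precede
statements; the patch inserts assignments to floor and limit ahead of the
existing u64 span_avg declaration. A minimal standalone reproduction,
unrelated to the kernel sources:

/* gcc -c -Wdeclaration-after-statement repro.c */
void repro(unsigned int weight, unsigned long long idle)
{
        int floor, limit;

        floor = 2;                      /* a statement ...             */
        limit = 2 * floor;              /* ... and another statement   */
        unsigned long long span = weight * idle;
                                        /* declaration after statements:
                                         * the warning fires here      */
        (void)limit;
        (void)span;
}

The corrected patch posted further down keeps the span_avg declaration at
the top of the SIS_PROP block and moves the new assignments after it.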

vim +6396 kernel/sched/fair.c

10e2f1acd Peter Zijlstra 2016-05-09 6363
10e2f1acd Peter Zijlstra 2016-05-09 6364 /*
10e2f1acd Peter Zijlstra 2016-05-09 6365 * Scan the LLC domain for idle CPUs; this is dynamically regulated by
10e2f1acd Peter Zijlstra 2016-05-09 6366 * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
10e2f1acd Peter Zijlstra 2016-05-09 6367 * average idle time for this rq (as found in rq->avg_idle).
10e2f1acd Peter Zijlstra 2016-05-09 6368 */
10e2f1acd Peter Zijlstra 2016-05-09 6369 static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
10e2f1acd Peter Zijlstra 2016-05-09 6370 {
9cfb38a7b Wanpeng Li 2016-10-09 6371 struct sched_domain *this_sd;
1ad3aaf3f Peter Zijlstra 2017-05-17 6372 u64 avg_cost, avg_idle;
10e2f1acd Peter Zijlstra 2016-05-09 6373 u64 time, cost;
10e2f1acd Peter Zijlstra 2016-05-09 6374 s64 delta;
8dd662615 subhra mazumdar 2018-06-12 6375 int cpu, limit, floor, nr = INT_MAX;
10e2f1acd Peter Zijlstra 2016-05-09 6376
9cfb38a7b Wanpeng Li 2016-10-09 6377 this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
9cfb38a7b Wanpeng Li 2016-10-09 6378 if (!this_sd)
9cfb38a7b Wanpeng Li 2016-10-09 6379 return -1;
9cfb38a7b Wanpeng Li 2016-10-09 6380
10e2f1acd Peter Zijlstra 2016-05-09 6381 /*
10e2f1acd Peter Zijlstra 2016-05-09 6382 * Due to large variance we need a large fuzz factor; hackbench in
10e2f1acd Peter Zijlstra 2016-05-09 6383 * particularly is sensitive here.
10e2f1acd Peter Zijlstra 2016-05-09 6384 */
1ad3aaf3f Peter Zijlstra 2017-05-17 6385 avg_idle = this_rq()->avg_idle / 512;
1ad3aaf3f Peter Zijlstra 2017-05-17 6386 avg_cost = this_sd->avg_scan_cost + 1;
1ad3aaf3f Peter Zijlstra 2017-05-17 6387
1ad3aaf3f Peter Zijlstra 2017-05-17 6388 if (sched_feat(SIS_AVG_CPU) && avg_idle < avg_cost)
10e2f1acd Peter Zijlstra 2016-05-09 6389 return -1;
10e2f1acd Peter Zijlstra 2016-05-09 6390
1ad3aaf3f Peter Zijlstra 2017-05-17 6391 if (sched_feat(SIS_PROP)) {
8dd662615 subhra mazumdar 2018-06-12 6392 floor = cpumask_weight(topology_sibling_cpumask(target));
8dd662615 subhra mazumdar 2018-06-12 6393 if (floor < 2)
8dd662615 subhra mazumdar 2018-06-12 6394 floor = 2;
8dd662615 subhra mazumdar 2018-06-12 6395 limit = 2*floor;
1ad3aaf3f Peter Zijlstra 2017-05-17 @6396 u64 span_avg = sd->span_weight * avg_idle;
8dd662615 subhra mazumdar 2018-06-12 6397 if (span_avg > floor*avg_cost) {
1ad3aaf3f Peter Zijlstra 2017-05-17 6398 nr = div_u64(span_avg, avg_cost);
8dd662615 subhra mazumdar 2018-06-12 6399 if (nr > limit)
8dd662615 subhra mazumdar 2018-06-12 6400 nr = limit;
8dd662615 subhra mazumdar 2018-06-12 6401 } else {
8dd662615 subhra mazumdar 2018-06-12 6402 nr = floor;
8dd662615 subhra mazumdar 2018-06-12 6403 }
1ad3aaf3f Peter Zijlstra 2017-05-17 6404 }
1ad3aaf3f Peter Zijlstra 2017-05-17 6405
10e2f1acd Peter Zijlstra 2016-05-09 6406 time = local_clock();
10e2f1acd Peter Zijlstra 2016-05-09 6407
c743f0a5c Peter Zijlstra 2017-04-14 6408 for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
1ad3aaf3f Peter Zijlstra 2017-05-17 6409 if (!--nr)
1ad3aaf3f Peter Zijlstra 2017-05-17 6410 return -1;
0c98d344f Ingo Molnar 2017-02-05 6411 if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
10e2f1acd Peter Zijlstra 2016-05-09 6412 continue;
943d355d7 Rohit Jain 2018-05-09 6413 if (available_idle_cpu(cpu))
10e2f1acd Peter Zijlstra 2016-05-09 6414 break;
10e2f1acd Peter Zijlstra 2016-05-09 6415 }
10e2f1acd Peter Zijlstra 2016-05-09 6416
10e2f1acd Peter Zijlstra 2016-05-09 6417 time = local_clock() - time;
10e2f1acd Peter Zijlstra 2016-05-09 6418 cost = this_sd->avg_scan_cost;
10e2f1acd Peter Zijlstra 2016-05-09 6419 delta = (s64)(time - cost) / 8;
10e2f1acd Peter Zijlstra 2016-05-09 6420 this_sd->avg_scan_cost += delta;
10e2f1acd Peter Zijlstra 2016-05-09 6421
10e2f1acd Peter Zijlstra 2016-05-09 6422 return cpu;
10e2f1acd Peter Zijlstra 2016-05-09 6423 }
10e2f1acd Peter Zijlstra 2016-05-09 6424

:::::: The code at line 6396 was first introduced by commit
:::::: 1ad3aaf3fcd2444406628a19a9b9e0922b95e2d4 sched/core: Implement new approach to scale select_idle_cpu()

:::::: TO: Peter Zijlstra <[email protected]>
:::::: CC: Ingo Molnar <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation



2018-06-12 22:12:58

by Subhra Mazumdar

Subject: Re: [PATCH 1/5] sched: limit cpu search in select_idle_cpu



On 06/12/2018 01:33 PM, kbuild test robot wrote:
> Hi subhra,
>
> Thank you for the patch! Perhaps something to improve:
>
> [auto build test WARNING on tip/sched/core]
> [also build test WARNING on v4.17 next-20180612]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url: https://github.com/0day-ci/linux/commits/subhra-mazumdar/Improve-scheduler-scalability-for-fast-path/20180613-015158
> config: i386-randconfig-x070-201823 (attached as .config)
> compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=i386
>
> All warnings (new ones prefixed by >>):
>
> kernel/sched/fair.c: In function 'select_idle_cpu':
>>> kernel/sched/fair.c:6396:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
> u64 span_avg = sd->span_weight * avg_idle;
> ^~~
>
>
<snip>

I fixed this patch; please try the following:

---8<---

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e497c05..7243146 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6372,7 +6372,7 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
        u64 avg_cost, avg_idle;
        u64 time, cost;
        s64 delta;
-       int cpu, nr = INT_MAX;
+       int cpu, limit, floor, nr = INT_MAX;

        this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
        if (!this_sd)
@@ -6390,10 +6390,17 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t

        if (sched_feat(SIS_PROP)) {
                u64 span_avg = sd->span_weight * avg_idle;
-               if (span_avg > 4*avg_cost)
+               floor = cpumask_weight(topology_sibling_cpumask(target));
+               if (floor < 2)
+                       floor = 2;
+               limit = 2*floor;
+               if (span_avg > floor*avg_cost) {
                        nr = div_u64(span_avg, avg_cost);
-               else
-                       nr = 4;
+                       if (nr > limit)
+                               nr = limit;
+               } else {
+                       nr = floor;
+               }
        }

        time = local_clock();


2018-06-20 09:00:01

by kernel test robot

Subject: [lkp-robot] [sched] 8dd662615c: reaim.jobs_per_min -3.3% regression


Greeting,

FYI, we noticed a -3.3% regression of reaim.jobs_per_min due to commit:


commit: 8dd662615c7a4dd4347d89d2f5647cba9c51c2af ("sched: limit cpu search in select_idle_cpu")
url: https://github.com/0day-ci/linux/commits/subhra-mazumdar/Improve-scheduler-scalability-for-fast-path/20180613-015158


in testcase: reaim
on test machine: 72 threads Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with 128G memory
with following parameters:

runtime: 300s
nr_task: 1600%
test: new_fserver
cpufreq_governor: performance

test-description: REAIM is an updated and improved version of AIM 7 benchmark.
test-url: https://sourceforge.net/projects/re-aim-7/

In addition to that, the commit also has significant impact on the following tests:

+------------------+-----------------------------------------------------------------------+
| testcase: change | hackbench: hackbench.throughput 12.7% improvement |
| test machine | 72 threads Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with 128G memory |
| test parameters | cpufreq_governor=performance |
| | ipc=pipe |
| | iterations=12 |
| | mode=process |
| | nr_threads=50% |
+------------------+-----------------------------------------------------------------------+


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
gcc-7/performance/x86_64-rhel-7.2/1600%/debian-x86_64-2018-04-03.cgz/300s/lkp-hsw-ep2/new_fserver/reaim

commit:
2539fc82aa ("sched/fair: Update util_est before updating schedutil")
8dd662615c ("sched: limit cpu search in select_idle_cpu")

2539fc82aa9b07d9 8dd662615c7a4dd4347d89d2f5
---------------- --------------------------
%stddev %change %stddev
\ | \
938731 -3.3% 908031 reaim.jobs_per_min
814.87 -3.3% 788.22 reaim.jobs_per_min_child
948046 -3.3% 916622 reaim.max_jobs_per_min
141.74 -3.7% 136.50 reaim.child_systime
297.72 -2.0% 291.73 reaim.child_utime
7.22 +3.4% 7.46 reaim.parent_time
14.70 -3.8% 14.14 reaim.std_dev_percent
5897031 +4.7% 6174085 reaim.time.involuntary_context_switches
7.527e+08 -3.1% 7.291e+08 reaim.time.minor_page_faults
4709 -5.0% 4472 reaim.time.percent_of_cpu_this_job_got
4683 -6.6% 4373 reaim.time.system_time
9825 -5.0% 9336 reaim.time.user_time
13502630 ? 2% -12.2% 11859024 reaim.time.voluntary_context_switches
3801600 -3.0% 3686400 reaim.workload
0.00 ? 9% +0.0 0.00 ? 16% mpstat.cpu.iowait%
125038 -8.9% 113866 vmstat.system.cs
76044 +2.6% 78021 vmstat.system.in
86274 ? 6% -16.7% 71862 ? 9% numa-vmstat.node1.nr_active_anon
32499 ? 22% -28.7% 23186 ? 30% numa-vmstat.node1.nr_slab_unreclaimable
86274 ? 6% -16.7% 71862 ? 9% numa-vmstat.node1.nr_zone_active_anon
344856 ? 6% -16.6% 287582 ? 9% numa-meminfo.node1.Active
344854 ? 6% -16.6% 287580 ? 9% numa-meminfo.node1.Active(anon)
129963 ? 22% -28.7% 92645 ? 30% numa-meminfo.node1.SUnreclaim
157613 ? 17% -22.8% 121642 ? 24% numa-meminfo.node1.Slab
106884 -19.6% 85936 slabinfo.kmalloc-32.active_objs
836.25 -19.8% 670.75 slabinfo.kmalloc-32.active_slabs
107116 -19.8% 85955 slabinfo.kmalloc-32.num_objs
836.25 -19.8% 670.75 slabinfo.kmalloc-32.num_slabs
7.012e+08 -3.1% 6.792e+08 proc-vmstat.numa_hit
7.012e+08 -3.1% 6.792e+08 proc-vmstat.numa_local
14167 +3.0% 14590 proc-vmstat.numa_other
7.123e+08 -3.2% 6.894e+08 proc-vmstat.pgalloc_normal
7.539e+08 -3.1% 7.303e+08 proc-vmstat.pgfault
7.123e+08 -3.2% 6.894e+08 proc-vmstat.pgfree
262.50 ? 13% +18.6% 311.29 ? 6% sched_debug.cfs_rq:/.removed.util_avg.max
497313 ? 12% +26.5% 629284 ? 4% sched_debug.cpu.avg_idle.avg
33703 ? 55% +400.3% 168620 ? 51% sched_debug.cpu.avg_idle.min
38227 ? 5% +18.7% 45393 ? 2% sched_debug.cpu.curr->pid.max
10018 ? 18% +41.9% 14220 ? 16% sched_debug.cpu.curr->pid.stddev
309090 ? 26% -51.9% 148822 ? 45% sched_debug.cpu.load.max
2427 ? 18% -31.8% 1656 ? 12% sched_debug.cpu.sched_goidle.stddev
45654 ? 3% +22.4% 55880 ? 16% sched_debug.cpu.ttwu_local.max
2.396e+08 ? 5% -35.0% 1.557e+08 ? 5% cpuidle.C1.time
3257365 ? 7% -42.8% 1863490 ? 8% cpuidle.C1.usage
2.834e+08 -29.8% 1.99e+08 cpuidle.C1E.time
2727675 -31.3% 1874186 cpuidle.C1E.usage
5580441 -9.4% 5057704 cpuidle.C3.usage
5.676e+09 +12.9% 6.41e+09 cpuidle.C6.time
6613043 +21.6% 8042531 cpuidle.C6.usage
1459318 ? 3% -10.8% 1301821 ? 2% cpuidle.POLL.time
61461 ? 7% -29.9% 43098 ? 6% cpuidle.POLL.usage
1853 -4.8% 1765 turbostat.Avg_MHz
3252668 ? 7% -42.9% 1857319 ? 8% turbostat.C1
1.07 ? 4% -0.4 0.70 ? 5% turbostat.C1%
2725753 -31.3% 1872752 turbostat.C1E
1.27 -0.4 0.90 turbostat.C1E%
5579952 -9.4% 5057162 turbostat.C3
6610452 +21.6% 8040209 turbostat.C6
25.48 +3.4 28.92 turbostat.C6%
15.88 +21.7% 19.33 turbostat.CPU%c1
0.26 ? 3% +45.7% 0.38 ? 2% turbostat.CPU%c3
5.215e+12 -3.4% 5.039e+12 perf-stat.branch-instructions
1.23 +0.0 1.25 perf-stat.branch-miss-rate%
6.392e+10 -1.4% 6.304e+10 perf-stat.branch-misses
6.68 +0.1 6.81 perf-stat.cache-miss-rate%
1.894e+11 -3.3% 1.831e+11 perf-stat.cache-references
38848096 -9.5% 35176187 perf-stat.context-switches
1.23 -2.1% 1.20 perf-stat.cpi
4.1e+13 -5.3% 3.884e+13 perf-stat.cpu-cycles
11806401 -10.8% 10528657 perf-stat.cpu-migrations
1.092e+10 ? 4% -11.2% 9.699e+09 ? 5% perf-stat.dTLB-load-misses
6.639e+12 -3.3% 6.417e+12 perf-stat.dTLB-loads
2.865e+12 -3.2% 2.773e+12 perf-stat.dTLB-stores
54.31 -0.4 53.86 perf-stat.iTLB-load-miss-rate%
6.194e+09 -5.2% 5.875e+09 perf-stat.iTLB-load-misses
5.212e+09 -3.4% 5.033e+09 perf-stat.iTLB-loads
3.331e+13 -3.2% 3.224e+13 perf-stat.instructions
0.81 +2.2% 0.83 perf-stat.ipc
7.453e+08 -3.1% 7.22e+08 perf-stat.minor-faults
87.48 +2.1 89.56 perf-stat.node-load-miss-rate%
1.141e+09 -17.5% 9.416e+08 ? 2% perf-stat.node-loads
55.05 +1.3 56.38 perf-stat.node-store-miss-rate%
1.561e+09 -4.9% 1.485e+09 perf-stat.node-stores
7.453e+08 -3.1% 7.22e+08 perf-stat.page-faults
1.88 ? 13% -0.8 1.10 ? 7% perf-profile.calltrace.cycles-pp.down_write.path_openat.do_filp_open.do_sys_open.do_syscall_64
1.73 ? 14% -0.7 1.00 ? 7% perf-profile.calltrace.cycles-pp.call_rwsem_down_write_failed.down_write.path_openat.do_filp_open.do_sys_open
1.73 ? 14% -0.7 1.00 ? 7% perf-profile.calltrace.cycles-pp.rwsem_down_write_failed.call_rwsem_down_write_failed.down_write.path_openat.do_filp_open
7.29 ? 2% -0.7 6.61 ? 3% perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.22 ? 17% -0.6 0.63 ? 6% perf-profile.calltrace.cycles-pp.osq_lock.rwsem_down_write_failed.call_rwsem_down_write_failed.down_write.path_openat
0.70 ? 6% -0.3 0.41 ? 58% perf-profile.calltrace.cycles-pp.__split_vma.do_munmap.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.10 ? 5% -0.3 0.82 ? 3% perf-profile.calltrace.cycles-pp.elf_map.load_elf_binary.search_binary_handler.do_execveat_common.__x64_sys_execve
1.91 ? 4% -0.3 1.63 ? 4% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.release_pages.tlb_flush_mmu_free.arch_tlb_finish_mmu.tlb_finish_mmu
1.41 ? 5% -0.3 1.14 ? 3% perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
1.40 ? 5% -0.3 1.14 ? 3% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry
0.94 ? 3% -0.3 0.69 ? 4% perf-profile.calltrace.cycles-pp.up_write.path_openat.do_filp_open.do_sys_open.do_syscall_64
1.75 ? 4% -0.3 1.50 ? 4% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.release_pages.tlb_flush_mmu_free.arch_tlb_finish_mmu
0.87 -0.2 0.64 ? 5% perf-profile.calltrace.cycles-pp.rwsem_wake.call_rwsem_wake.up_write.path_openat.do_filp_open
0.87 -0.2 0.64 ? 5% perf-profile.calltrace.cycles-pp.call_rwsem_wake.up_write.path_openat.do_filp_open.do_sys_open
2.12 ? 3% -0.1 2.01 ? 2% perf-profile.calltrace.cycles-pp.setlocale
0.75 ? 6% -0.1 0.67 ? 4% perf-profile.calltrace.cycles-pp.iterate_dir.__x64_sys_getdents.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.70 ? 6% -0.1 0.62 ? 3% perf-profile.calltrace.cycles-pp.dcache_readdir.iterate_dir.__x64_sys_getdents.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.76 ? 6% -0.1 0.68 ? 4% perf-profile.calltrace.cycles-pp.__x64_sys_getdents.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.89 ? 3% -0.1 0.83 ? 3% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.setlocale
0.88 ? 2% -0.1 0.82 ? 3% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.setlocale
0.69 -0.0 0.66 ? 3% perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork
0.77 ? 4% +0.1 0.85 ? 4% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt
0.79 ? 5% +0.1 0.87 ? 4% perf-profile.calltrace.cycles-pp.apic_timer_interrupt
23.64 +2.1 25.74 perf-profile.calltrace.cycles-pp.secondary_startup_64
23.36 +2.4 25.73 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
23.36 +2.4 25.73 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
23.36 +2.4 25.73 perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
22.43 +2.6 25.01 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
20.81 +3.0 23.78 perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
46.88 ? 2% -1.8 45.06 perf-profile.children.cycles-pp.do_syscall_64
47.07 ? 2% -1.8 45.25 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
7.94 ? 3% -0.8 7.11 ? 2% perf-profile.children.cycles-pp.path_openat
7.95 ? 3% -0.8 7.14 ? 2% perf-profile.children.cycles-pp.do_filp_open
8.14 ? 3% -0.8 7.37 ? 3% perf-profile.children.cycles-pp.do_sys_open
4.23 ? 3% -0.7 3.54 ? 5% perf-profile.children.cycles-pp.down_write
3.33 ? 6% -0.7 2.67 ? 7% perf-profile.children.cycles-pp.rwsem_down_write_failed
3.34 ? 6% -0.7 2.69 ? 7% perf-profile.children.cycles-pp.call_rwsem_down_write_failed
2.08 ? 10% -0.6 1.50 ? 9% perf-profile.children.cycles-pp.osq_lock
2.86 ? 2% -0.2 2.69 ? 2% perf-profile.children.cycles-pp.apic_timer_interrupt
2.84 ? 2% -0.2 2.67 ? 2% perf-profile.children.cycles-pp.smp_apic_timer_interrupt
0.91 ? 3% -0.2 0.74 ? 5% perf-profile.children.cycles-pp.do_unlinkat
1.28 ? 3% -0.2 1.12 ? 5% perf-profile.children.cycles-pp.hrtimer_interrupt
1.06 ? 3% -0.1 0.94 ? 5% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.41 ? 7% -0.1 0.29 ? 10% perf-profile.children.cycles-pp.menu_select
0.60 ? 5% -0.1 0.48 ? 4% perf-profile.children.cycles-pp.do_linkat
0.60 ? 5% -0.1 0.49 ? 4% perf-profile.children.cycles-pp.__x64_sys_link
2.12 ? 3% -0.1 2.01 ? 2% perf-profile.children.cycles-pp.setlocale
1.17 -0.1 1.07 ? 2% perf-profile.children.cycles-pp.call_rwsem_wake
1.18 -0.1 1.07 ? 2% perf-profile.children.cycles-pp.rwsem_wake
0.34 ? 8% -0.1 0.23 ? 4% perf-profile.children.cycles-pp.filename_create
1.75 -0.1 1.65 perf-profile.children.cycles-pp.up_write
0.73 -0.1 0.65 ? 7% perf-profile.children.cycles-pp.tick_sched_timer
0.89 ? 3% -0.1 0.80 ? 2% perf-profile.children.cycles-pp.wake_up_q
0.72 ? 7% -0.1 0.64 ? 4% perf-profile.children.cycles-pp.dcache_readdir
1.14 -0.1 1.05 perf-profile.children.cycles-pp.exit_to_usermode_loop
0.75 ? 6% -0.1 0.67 ? 4% perf-profile.children.cycles-pp.iterate_dir
0.76 ? 6% -0.1 0.68 ? 4% perf-profile.children.cycles-pp.__x64_sys_getdents
0.98 -0.1 0.91 ? 2% perf-profile.children.cycles-pp.copy_strings
0.19 ? 16% -0.1 0.12 ? 11% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.63 ? 4% -0.1 0.56 ? 3% perf-profile.children.cycles-pp.__fput
0.74 ? 4% -0.1 0.67 perf-profile.children.cycles-pp.task_work_run
0.31 ? 5% -0.1 0.24 ? 5% perf-profile.children.cycles-pp.run_ksoftirqd
0.44 ? 4% -0.1 0.37 ? 5% perf-profile.children.cycles-pp.__might_fault
0.27 ? 19% -0.1 0.21 ? 11% perf-profile.children.cycles-pp.memcpy_erms
0.16 ? 22% -0.1 0.10 ? 27% perf-profile.children.cycles-pp.ktime_get
0.16 ? 14% -0.1 0.10 ? 10% perf-profile.children.cycles-pp.tick_nohz_next_event
0.66 -0.1 0.61 ? 6% perf-profile.children.cycles-pp.tick_sched_handle
1.04 ? 4% -0.1 0.99 perf-profile.children.cycles-pp.kmem_cache_free
0.41 ? 7% -0.0 0.36 ? 6% perf-profile.children.cycles-pp._copy_to_user
0.43 ? 5% -0.0 0.39 ? 4% perf-profile.children.cycles-pp.page_add_file_rmap
0.21 ? 7% -0.0 0.17 ? 2% perf-profile.children.cycles-pp.schedule_idle
0.51 -0.0 0.47 perf-profile.children.cycles-pp.do_signal
0.65 -0.0 0.61 ? 3% perf-profile.children.cycles-pp.get_empty_filp
0.14 ? 7% -0.0 0.11 ? 6% perf-profile.children.cycles-pp.selinux_vm_enough_memory
0.21 ? 2% -0.0 0.17 ? 6% perf-profile.children.cycles-pp.rebalance_domains
0.10 ? 11% -0.0 0.07 ? 17% perf-profile.children.cycles-pp.get_next_timer_interrupt
0.36 ? 4% -0.0 0.33 ? 2% perf-profile.children.cycles-pp.do_open_execat
0.14 ? 6% -0.0 0.11 ? 6% perf-profile.children.cycles-pp.__slab_alloc
0.21 ? 8% -0.0 0.18 ? 3% perf-profile.children.cycles-pp.__pmd_alloc
0.16 ? 6% -0.0 0.14 ? 3% perf-profile.children.cycles-pp.filp_close
0.17 ? 6% -0.0 0.15 ? 10% perf-profile.children.cycles-pp.get_user_arg_ptr
0.16 ? 5% -0.0 0.13 ? 6% perf-profile.children.cycles-pp.free_pcppages_bulk
0.12 ? 10% -0.0 0.10 ? 7% perf-profile.children.cycles-pp.___slab_alloc
0.24 ? 5% -0.0 0.22 perf-profile.children.cycles-pp.copy_strings_kernel
0.10 ? 8% -0.0 0.08 ? 5% perf-profile.children.cycles-pp.irq_enter
0.09 ? 4% -0.0 0.08 ? 5% perf-profile.children.cycles-pp.run_rebalance_domains
0.06 -0.0 0.05 perf-profile.children.cycles-pp.__d_lookup_done
0.10 ? 4% +0.0 0.12 ? 5% perf-profile.children.cycles-pp.__update_load_avg_se
0.07 ? 12% +0.0 0.09 perf-profile.children.cycles-pp.__task_pid_nr_ns
0.34 ? 2% +0.0 0.36 ? 2% perf-profile.children.cycles-pp.unlock_page
0.04 ? 58% +0.0 0.07 ? 10% perf-profile.children.cycles-pp.rwsem_down_read_failed
0.04 ? 58% +0.0 0.07 ? 10% perf-profile.children.cycles-pp.call_rwsem_down_read_failed
0.04 ? 58% +0.0 0.08 ? 10% perf-profile.children.cycles-pp.cpumask_next_and
0.00 +0.1 0.05 ? 8% perf-profile.children.cycles-pp.reschedule_interrupt
0.00 +0.1 0.06 ? 11% perf-profile.children.cycles-pp.find_next_and_bit
1.73 +0.1 1.80 perf-profile.children.cycles-pp.flush_tlb_func_common
1.14 +0.1 1.21 ? 2% perf-profile.children.cycles-pp.rcu_process_callbacks
0.43 ? 6% +0.1 0.51 ? 4% perf-profile.children.cycles-pp.wait4
0.22 ? 8% +0.1 0.30 ? 3% perf-profile.children.cycles-pp.do_task_dead
0.62 ? 5% +0.1 0.71 ? 4% perf-profile.children.cycles-pp.kernel_wait4
0.61 ? 5% +0.1 0.69 ? 4% perf-profile.children.cycles-pp.do_wait
0.62 ? 5% +0.1 0.71 ? 4% perf-profile.children.cycles-pp.__do_sys_wait4
0.26 ? 3% +0.1 0.36 ? 4% perf-profile.children.cycles-pp._atomic_dec_and_lock
0.28 ? 5% +0.1 0.39 ? 3% perf-profile.children.cycles-pp.free_uid
0.34 ? 4% +0.1 0.47 ? 2% perf-profile.children.cycles-pp.put_cred_rcu
0.34 ? 2% +0.2 0.51 ? 3% perf-profile.children.cycles-pp.find_busiest_group
0.77 ? 2% +0.2 0.95 perf-profile.children.cycles-pp.schedule
1.32 +0.2 1.54 perf-profile.children.cycles-pp.__schedule
0.61 ? 4% +0.2 0.83 ? 2% perf-profile.children.cycles-pp.load_balance
0.62 ? 3% +0.3 0.89 perf-profile.children.cycles-pp.pick_next_task_fair
23.65 +2.1 25.75 perf-profile.children.cycles-pp.do_idle
23.64 +2.1 25.74 perf-profile.children.cycles-pp.cpu_startup_entry
23.64 +2.1 25.74 perf-profile.children.cycles-pp.secondary_startup_64
22.75 +2.3 25.07 perf-profile.children.cycles-pp.cpuidle_enter_state
23.36 +2.4 25.73 perf-profile.children.cycles-pp.start_secondary
21.06 +2.7 23.79 perf-profile.children.cycles-pp.intel_idle
2.07 ? 10% -0.6 1.48 ? 9% perf-profile.self.cycles-pp.osq_lock
0.27 ? 18% -0.1 0.21 ? 9% perf-profile.self.cycles-pp.memcpy_erms
0.11 ? 40% -0.1 0.05 ? 74% perf-profile.self.cycles-pp.ktime_get
0.16 ? 5% -0.0 0.12 ? 19% perf-profile.self.cycles-pp.menu_select
0.36 ? 4% -0.0 0.32 ? 4% perf-profile.self.cycles-pp.page_add_file_rmap
0.48 -0.0 0.46 ? 2% perf-profile.self.cycles-pp.__slab_free
0.12 ? 7% -0.0 0.10 ? 5% perf-profile.self.cycles-pp.cpuidle_enter_state
0.09 ? 7% -0.0 0.07 ? 12% perf-profile.self.cycles-pp.free_pcppages_bulk
0.07 ? 11% -0.0 0.05 ? 8% perf-profile.self.cycles-pp.do_idle
0.23 ? 4% -0.0 0.21 ? 4% perf-profile.self.cycles-pp.update_load_avg
0.15 ? 6% -0.0 0.13 ? 3% perf-profile.self.cycles-pp.do_dentry_open
0.08 ? 12% -0.0 0.06 ? 6% perf-profile.self.cycles-pp.update_rq_clock
0.08 ? 6% -0.0 0.06 ? 6% perf-profile.self.cycles-pp.security_inode_permission
0.07 ? 6% +0.0 0.08 ? 5% perf-profile.self.cycles-pp.load_balance
0.34 ? 2% +0.0 0.36 ? 3% perf-profile.self.cycles-pp.unlock_page
0.07 ? 12% +0.0 0.09 perf-profile.self.cycles-pp.__task_pid_nr_ns
0.15 ? 3% +0.0 0.17 ? 4% perf-profile.self.cycles-pp.anon_vma_clone
0.08 ? 13% +0.0 0.12 ? 13% perf-profile.self.cycles-pp.try_to_wake_up
0.01 ?173% +0.0 0.06 ? 11% perf-profile.self.cycles-pp.pick_next_task_fair
0.00 +0.1 0.06 ? 11% perf-profile.self.cycles-pp.find_next_and_bit
0.26 ? 3% +0.1 0.36 ? 4% perf-profile.self.cycles-pp._atomic_dec_and_lock
0.28 ? 2% +0.1 0.42 ? 4% perf-profile.self.cycles-pp.find_busiest_group
21.05 +2.7 23.79 perf-profile.self.cycles-pp.intel_idle



reaim.parent_time

7.5 +-+------------------------------------------------------O-----------+
O O O O O O O |
7.45 +-O O O O O O O O O |
| O O O O O O O O |
7.4 +-+ |
| |
7.35 +-+ |
| |
7.3 +-+ |
| +. .+ .+.. +.. |
7.25 +-+ +. + +. + .+..+ +. + + |
|: + +..+ + +.+..+.+ +.+..+.+. +. + + |
7.2 +-++ +.. + +..+ + +|
| + + + |
7.15 +-+------------------------------------------------------------------+


reaim.child_utime

300 +-+-------------------------------------------------------------------+
| .+. .+ |
299 +-+ .+.. .+.. .+.+. +. + |
298 +-+.. .+ +.+..+ + +. .+. .+.+..+. .+..+. .+.|
| +.+..+ +. +. + +. |
297 +-+ |
296 +-+ |
| |
295 +-+ |
294 +-+ O |
| O O O O O O |
293 O-+ O O O O O O O O O O O O O |
292 +-+ O |
| O O O |
291 +-+-------------------------------------------------------------------+


reaim.jobs_per_min

945000 +-+----------------------------------------------------------------+
|: : .+.+ .+ + |
940000 +-+ : +.. .+.+..+.+ +.+.+..+ + .+ + .. +|
935000 +-+ : .+.+.. + .+. .+ : + +. + |
| + +.+.+ + +. : + |
930000 +-+ + |
925000 +-+ |
| |
920000 +-+ |
915000 +-+ |
| O O O O O O O |
910000 +-O O O O O O O O O |
905000 O-+ O O O O O O O |
| O |
900000 +-+----------------------------------------------------------------+


reaim.jobs_per_min_child

820 +-+-------------------------------------------------------------------+
| :: .+. : + |
815 +-+ : +.. .+. .+ +..+.+..+ : .+..+. .. +|
|: : .+..+ : .+. +. : : + + |
810 +-+ + + .+.. : +.+..+.+ : : |
| + + + |
805 +-+ |
| |
800 +-+ |
| |
795 +-+ |
| O O O O O O O |
790 +-O O O O O O O O |
O O O O O O O |
785 +-+--O----O-----------------------------------------------O-----------+


reaim.time.percent_of_cpu_this_job_got

4750 +-+------------------------------------------------------------------+
|.+.. .+.+.. .+..+.+.+..+.+. .+. .+.. .+.|
4700 +-+ .+..+.+.+..+ + +..+.+..+.+.+. + +.+. |
| + |
| |
4650 +-+ |
| |
4600 +-+ |
| |
4550 +-+ |
O O O O O O O O O O O O O |
| O O O O |
4500 +-+ O O O O |
| O O O O |
4450 +-+------------------------------------------------------------------+


reaim.time.involuntary_context_switches

6.2e+06 +-+--------------------------------------------O-----------------+
6.15e+06 +-+ O O O |
| O |
6.1e+06 O-O O O O O O O O O O O |
6.05e+06 +-+ O O O O O O O O |
| |
6e+06 +-+ |
5.95e+06 +-+ |
5.9e+06 +-+. .+. .+. .+.+.. .+ .+. .+.+..+.+.|
| +..+ + +.+.+.+. + + : +..+.+.+.+. + |
5.85e+06 +-+ : : : : |
5.8e+06 +-+ : : : : |
| : : :: |
5.75e+06 +-+ :: + |
5.7e+06 +-+--------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample

***************************************************************************************************
lkp-hsw-ep4: 72 threads Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with 128G memory
=========================================================================================
compiler/cpufreq_governor/ipc/iterations/kconfig/mode/nr_threads/rootfs/tbox_group/testcase:
gcc-7/performance/pipe/12/x86_64-rhel-7.2/process/50%/debian-x86_64-2018-04-03.cgz/lkp-hsw-ep4/hackbench

commit:
2539fc82aa ("sched/fair: Update util_est before updating schedutil")
8dd662615c ("sched: limit cpu search in select_idle_cpu")

2539fc82aa9b07d9 8dd662615c7a4dd4347d89d2f5
---------------- --------------------------
%stddev %change %stddev
\ | \
123935 +12.7% 139696 hackbench.throughput
818.39 -11.3% 726.18 hackbench.time.elapsed_time
818.39 -11.3% 726.18 hackbench.time.elapsed_time.max
4.18e+09 ? 2% -21.1% 3.297e+09 hackbench.time.involuntary_context_switches
52401 -11.6% 46313 hackbench.time.system_time
6201 -9.6% 5606 hackbench.time.user_time
6.222e+09 -12.1% 5.467e+09 hackbench.time.voluntary_context_switches
0.64 ? 3% +0.2 0.81 ? 7% mpstat.cpu.idle%
2871 ? 6% +8.7% 3120 ? 4% slabinfo.sock_inode_cache.active_objs
2871 ? 6% +8.7% 3120 ? 4% slabinfo.sock_inode_cache.num_objs
9642032 ? 4% -14.5% 8247745 ? 2% softirqs.RCU
831999 ? 2% +13.0% 939983 softirqs.SCHED
12709169 -5.0% 12067441 vmstat.system.cs
973111 +16.2% 1130820 vmstat.system.in
66696250 ? 5% +49.2% 99499793 ? 4% cpuidle.C1.time
11551951 ? 6% +96.9% 22743914 ? 3% cpuidle.C1.usage
844303 ? 6% +139.2% 2019174 ? 2% cpuidle.POLL.time
182674 ? 8% +129.8% 419855 ? 3% cpuidle.POLL.usage
27699444 ? 5% +28.2% 35505978 ? 2% numa-numastat.node0.local_node
27702926 ? 5% +28.2% 35511779 ? 2% numa-numastat.node0.numa_hit
30815427 +18.8% 36597997 numa-numastat.node1.local_node
30826044 +18.8% 36606343 numa-numastat.node1.numa_hit
14003985 ? 6% +30.7% 18298916 ? 3% numa-vmstat.node0.numa_hit
14000286 ? 6% +30.7% 18292925 ? 3% numa-vmstat.node0.numa_local
15401529 +20.7% 18585831 ? 2% numa-vmstat.node1.numa_hit
15255090 +20.9% 18441285 ? 2% numa-vmstat.node1.numa_local
11550825 ? 6% +96.9% 22743029 ? 3% turbostat.C1
0.11 ? 4% +0.1 0.19 ? 4% turbostat.C1%
0.35 ? 3% +0.1 0.44 ? 14% turbostat.C6%
0.47 ? 2% +20.9% 0.57 ? 4% turbostat.CPU%c1
0.23 ? 3% +27.7% 0.30 ? 15% turbostat.CPU%c6
47567 +1.0% 48036 proc-vmstat.nr_slab_unreclaimable
58537822 ? 2% +23.2% 72134162 proc-vmstat.numa_hit
58523716 ? 2% +23.2% 72120007 proc-vmstat.numa_local
58726400 ? 2% +23.1% 72307125 proc-vmstat.pgalloc_normal
2444733 -7.2% 2268341 proc-vmstat.pgfault
58669621 ? 2% +23.2% 72252852 proc-vmstat.pgfree
1.45e+13 -10.9% 1.292e+13 perf-stat.branch-instructions
1.78 +0.1 1.85 perf-stat.branch-miss-rate%
2.579e+11 -7.2% 2.393e+11 perf-stat.branch-misses
0.87 ? 5% -0.1 0.79 ? 4% perf-stat.cache-miss-rate%
8.286e+09 ? 7% -23.0% 6.378e+09 ? 4% perf-stat.cache-misses
9.472e+11 -14.7% 8.078e+11 perf-stat.cache-references
1.043e+10 -15.6% 8.797e+09 perf-stat.context-switches
1.424e+14 -11.2% 1.264e+14 perf-stat.cpu-cycles
2.421e+09 ? 3% -31.3% 1.663e+09 perf-stat.cpu-migrations
1.71 ? 3% -0.3 1.43 ? 12% perf-stat.dTLB-load-miss-rate%
3.701e+11 ? 2% -25.0% 2.774e+11 ? 12% perf-stat.dTLB-load-misses
2.126e+13 -9.7% 1.919e+13 perf-stat.dTLB-loads
0.11 ? 8% -0.0 0.09 ? 13% perf-stat.dTLB-store-miss-rate%
1.503e+10 ? 7% -28.0% 1.082e+10 ? 13% perf-stat.dTLB-store-misses
1.313e+13 -9.5% 1.187e+13 perf-stat.dTLB-stores
55.10 +1.2 56.30 perf-stat.iTLB-load-miss-rate%
1.164e+11 ? 2% -10.3% 1.044e+11 perf-stat.iTLB-load-misses
9.48e+10 -14.5% 8.103e+10 perf-stat.iTLB-loads
7.393e+13 -10.7% 6.605e+13 perf-stat.instructions
2408870 -7.2% 2235758 perf-stat.minor-faults
5.089e+09 ? 8% -25.3% 3.801e+09 ? 4% perf-stat.node-load-misses
1.95e+09 ? 7% -18.7% 1.585e+09 ? 3% perf-stat.node-store-misses
1.193e+09 ? 6% -22.2% 9.283e+08 ? 7% perf-stat.node-stores
2408870 -7.2% 2235771 perf-stat.page-faults
71308 -10.7% 63709 perf-stat.path-length
0.00 +6.6e+12% 66155 ? 93% sched_debug.cfs_rq:/.MIN_vruntime.avg
0.00 +3e+14% 2951855 ? 54% sched_debug.cfs_rq:/.MIN_vruntime.max
0.00 +2.8e+28% 418846 ? 70% sched_debug.cfs_rq:/.MIN_vruntime.stddev
30945 ? 5% +71.6% 53105 ? 65% sched_debug.cfs_rq:/.load.max
5000 ? 2% +55.4% 7768 ? 51% sched_debug.cfs_rq:/.load.stddev
0.00 +6.6e+12% 66155 ? 93% sched_debug.cfs_rq:/.max_vruntime.avg
0.00 +3e+14% 2951855 ? 54% sched_debug.cfs_rq:/.max_vruntime.max
0.00 +2.8e+28% 418846 ? 70% sched_debug.cfs_rq:/.max_vruntime.stddev
29631852 -11.3% 26273202 ? 3% sched_debug.cfs_rq:/.min_vruntime.max
26964419 -11.4% 23885494 ? 3% sched_debug.cfs_rq:/.min_vruntime.min
31.32 ? 4% +10.9% 34.73 ? 2% sched_debug.cfs_rq:/.runnable_load_avg.max
2.30 ? 19% -45.1% 1.26 ? 33% sched_debug.cfs_rq:/.runnable_load_avg.min
5.38 ? 4% +14.2% 6.14 ? 3% sched_debug.cfs_rq:/.runnable_load_avg.stddev
29759 ? 2% +76.9% 52656 ? 65% sched_debug.cfs_rq:/.runnable_weight.max
5353 ? 3% +53.7% 8227 ? 48% sched_debug.cfs_rq:/.runnable_weight.stddev
127.80 ? 6% -27.7% 92.36 ? 14% sched_debug.cfs_rq:/.util_est_enqueued.min
3.68 ? 9% -24.7% 2.77 ? 9% sched_debug.cpu.cpu_load[0].min
5.29 ? 4% +13.0% 5.98 ? 4% sched_debug.cpu.cpu_load[0].stddev
29.43 ? 4% +16.5% 34.27 ? 7% sched_debug.cpu.cpu_load[1].max
4.55 ? 4% -23.3% 3.49 ? 9% sched_debug.cpu.cpu_load[1].min
4.74 ? 3% +18.4% 5.62 ? 4% sched_debug.cpu.cpu_load[1].stddev
4.88 ? 4% -18.7% 3.96 ? 8% sched_debug.cpu.cpu_load[2].min
4.96 ? 5% -11.8% 4.38 ? 6% sched_debug.cpu.cpu_load[3].min
4.38 ? 4% +13.8% 4.99 ? 7% sched_debug.cpu.cpu_load[3].stddev
2.57 ? 7% -20.6% 2.04 ? 6% sched_debug.cpu.nr_running.min
3.38 ? 3% +7.4% 3.63 ? 4% sched_debug.cpu.nr_running.stddev
70365771 -14.7% 60002575 ? 4% sched_debug.cpu.nr_switches.avg
74366842 -15.4% 62919235 ? 4% sched_debug.cpu.nr_switches.max
61887561 -14.7% 52785308 ? 3% sched_debug.cpu.nr_switches.min
6.11 ? 12% -2.0 4.11 ? 2% perf-profile.calltrace.cycles-pp.select_task_rq_fair.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
22.94 ? 3% -1.9 21.02 perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write.__vfs_write
23.59 ? 3% -1.9 21.68 perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_write.__vfs_write.vfs_write
22.70 ? 3% -1.9 20.79 perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write
4.24 ? 17% -1.9 2.36 perf-profile.calltrace.cycles-pp.select_idle_sibling.select_task_rq_fair.try_to_wake_up.autoremove_wake_function.__wake_up_common
25.82 ? 3% -1.8 24.06 perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_write.__vfs_write.vfs_write.ksys_write
2.50 ? 23% -1.6 0.91 perf-profile.calltrace.cycles-pp.available_idle_cpu.select_idle_sibling.select_task_rq_fair.try_to_wake_up.autoremove_wake_function
79.22 -1.1 78.10 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
80.03 -1.0 79.01 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
6.77 ? 2% -1.0 5.80 ? 2% perf-profile.calltrace.cycles-pp.__schedule.schedule.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.80 ? 2% -0.9 5.86 ? 2% perf-profile.calltrace.cycles-pp.schedule.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.95 ? 2% -0.9 6.02 ? 2% perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.44 -0.4 2.00 ? 3% perf-profile.calltrace.cycles-pp.switch_mm_irqs_off.__schedule.schedule.exit_to_usermode_loop.do_syscall_64
1.31 ? 6% -0.2 1.07 ? 6% perf-profile.calltrace.cycles-pp.set_task_cpu.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
2.48 -0.2 2.26 perf-profile.calltrace.cycles-pp.__switch_to
1.61 ? 4% -0.2 1.42 ? 5% perf-profile.calltrace.cycles-pp.selinux_file_permission.security_file_permission.vfs_read.ksys_read.do_syscall_64
1.82 ? 2% -0.2 1.65 perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.exit_to_usermode_loop.do_syscall_64
1.51 -0.1 1.37 ? 2% perf-profile.calltrace.cycles-pp.native_write_msr
1.06 -0.1 0.94 ? 2% perf-profile.calltrace.cycles-pp.load_new_mm_cr3.switch_mm_irqs_off.__schedule.schedule.exit_to_usermode_loop
1.02 ? 6% -0.1 0.90 ? 5% perf-profile.calltrace.cycles-pp.update_curr.dequeue_entity.dequeue_task_fair.__schedule.schedule
0.96 ? 5% -0.1 0.84 ? 5% perf-profile.calltrace.cycles-pp.___perf_sw_event.__schedule.schedule.pipe_wait.pipe_read
0.72 ? 3% -0.1 0.65 ? 2% perf-profile.calltrace.cycles-pp.update_curr.reweight_entity.dequeue_task_fair.__schedule.schedule
0.70 ? 3% -0.1 0.62 perf-profile.calltrace.cycles-pp.put_prev_entity.pick_next_task_fair.__schedule.schedule.exit_to_usermode_loop
0.83 ? 4% -0.1 0.76 perf-profile.calltrace.cycles-pp.finish_task_switch.__schedule.schedule.pipe_wait.pipe_read
0.76 -0.0 0.71 perf-profile.calltrace.cycles-pp.update_curr.reweight_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
0.68 ? 3% -0.0 0.64 ? 3% perf-profile.calltrace.cycles-pp.update_curr.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
1.61 -0.0 1.59 perf-profile.calltrace.cycles-pp.reweight_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
1.10 +0.0 1.12 perf-profile.calltrace.cycles-pp.check_preempt_curr.ttwu_do_wakeup.try_to_wake_up.autoremove_wake_function.__wake_up_common
1.19 +0.0 1.23 perf-profile.calltrace.cycles-pp.ttwu_do_wakeup.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
0.62 ? 4% +0.1 0.68 ? 2% perf-profile.calltrace.cycles-pp.file_update_time.pipe_write.__vfs_write.vfs_write.ksys_write
1.23 ? 3% +0.1 1.29 perf-profile.calltrace.cycles-pp.mutex_lock.pipe_write.__vfs_write.vfs_write.ksys_write
1.28 ? 2% +0.1 1.35 perf-profile.calltrace.cycles-pp.update_load_avg.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
0.60 ? 3% +0.1 0.67 ? 3% perf-profile.calltrace.cycles-pp.file_has_perm.security_file_permission.vfs_write.ksys_write.do_syscall_64
1.64 ? 2% +0.1 1.71 ? 2% perf-profile.calltrace.cycles-pp.copyout.copy_page_to_iter.pipe_read.__vfs_read.vfs_read
1.15 ? 4% +0.1 1.24 ? 2% perf-profile.calltrace.cycles-pp.selinux_file_permission.security_file_permission.vfs_write.ksys_write.do_syscall_64
0.66 ? 7% +0.1 0.76 ? 5% perf-profile.calltrace.cycles-pp.file_has_perm.security_file_permission.vfs_read.ksys_read.do_syscall_64
2.02 ? 2% +0.1 2.14 ? 2% perf-profile.calltrace.cycles-pp.copy_page_from_iter.pipe_write.__vfs_write.vfs_write.ksys_write
1.06 ? 2% +0.1 1.19 ? 5% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__wake_up_common_lock.pipe_write.__vfs_write
0.71 +0.2 0.86 ? 5% perf-profile.calltrace.cycles-pp._raw_spin_lock.__schedule.schedule.pipe_wait.pipe_read
1.38 ? 2% +0.2 1.54 ? 5% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__wake_up_common_lock.pipe_write.__vfs_write.vfs_write
1.82 ? 4% +0.2 1.99 ? 2% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.54 +0.2 0.71 ? 6% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__schedule.schedule.pipe_wait
0.41 ? 57% +0.2 0.59 ? 3% perf-profile.calltrace.cycles-pp.mutex_lock.pipe_read.__vfs_read.vfs_read.ksys_read
2.86 ? 2% +0.2 3.07 perf-profile.calltrace.cycles-pp.copy_page_to_iter.pipe_read.__vfs_read.vfs_read.ksys_read
1.91 ? 2% +0.2 2.13 perf-profile.calltrace.cycles-pp.mutex_unlock.pipe_write.__vfs_write.vfs_write.ksys_write
3.12 ? 2% +0.4 3.50 ? 3% perf-profile.calltrace.cycles-pp._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
2.64 ? 3% +0.4 3.03 ? 4% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.try_to_wake_up.autoremove_wake_function.__wake_up_common
0.13 ?173% +0.4 0.56 ? 3% perf-profile.calltrace.cycles-pp.__mutex_unlock_slowpath.pipe_write.__vfs_write.vfs_write.ksys_write
0.00 +0.6 0.56 ? 4% perf-profile.calltrace.cycles-pp.__mutex_lock.pipe_write.__vfs_write.vfs_write.ksys_write
5.25 ? 3% +0.7 5.92 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
6.20 ± 12% -2.0 4.20 ± 2% perf-profile.children.cycles-pp.select_task_rq_fair
23.05 ± 3% -1.9 21.13 perf-profile.children.cycles-pp.autoremove_wake_function
23.73 ± 3% -1.9 21.85 perf-profile.children.cycles-pp.__wake_up_common
4.36 ± 17% -1.9 2.48 perf-profile.children.cycles-pp.select_idle_sibling
23.06 ± 3% -1.8 21.22 perf-profile.children.cycles-pp.try_to_wake_up
26.51 ± 2% -1.7 24.78 perf-profile.children.cycles-pp.__wake_up_common_lock
2.55 ± 23% -1.6 0.96 perf-profile.children.cycles-pp.available_idle_cpu
23.81 -1.1 22.68 perf-profile.children.cycles-pp.schedule
79.38 -1.1 78.29 perf-profile.children.cycles-pp.do_syscall_64
23.80 -1.1 22.73 perf-profile.children.cycles-pp.__schedule
80.06 -1.0 79.06 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
6.98 ± 2% -0.9 6.06 ± 2% perf-profile.children.cycles-pp.exit_to_usermode_loop
5.43 -0.4 5.01 ± 2% perf-profile.children.cycles-pp.switch_mm_irqs_off
0.65 ± 21% -0.4 0.27 ± 4% perf-profile.children.cycles-pp.cpumask_next_wrap
3.81 ± 2% -0.3 3.52 ± 2% perf-profile.children.cycles-pp.update_curr
1.78 ± 5% -0.3 1.53 ± 4% perf-profile.children.cycles-pp.___perf_sw_event
1.32 ± 6% -0.2 1.08 ± 5% perf-profile.children.cycles-pp.set_task_cpu
2.54 -0.2 2.31 perf-profile.children.cycles-pp.__switch_to
0.50 ± 10% -0.2 0.31 ± 3% perf-profile.children.cycles-pp.find_next_bit
1.12 ± 6% -0.2 0.94 perf-profile.children.cycles-pp.update_rq_clock
4.11 ± 2% -0.2 3.95 perf-profile.children.cycles-pp.pick_next_task_fair
0.64 ± 5% -0.1 0.49 ± 7% perf-profile.children.cycles-pp.migrate_task_rq_fair
1.64 -0.1 1.51 perf-profile.children.cycles-pp.native_write_msr
2.92 -0.1 2.79 perf-profile.children.cycles-pp.load_new_mm_cr3
0.57 ± 7% -0.1 0.47 ± 7% perf-profile.children.cycles-pp.update_min_vruntime
1.10 ± 2% -0.1 1.01 perf-profile.children.cycles-pp.finish_task_switch
0.96 ± 2% -0.1 0.88 ± 2% perf-profile.children.cycles-pp.__calc_delta
0.33 ± 2% -0.1 0.25 ± 5% perf-profile.children.cycles-pp.remove_entity_load_avg
0.88 ± 2% -0.1 0.82 perf-profile.children.cycles-pp.put_prev_entity
0.55 ± 6% -0.1 0.49 ± 4% perf-profile.children.cycles-pp.rb_erase_cached
0.29 ± 7% -0.1 0.23 ± 3% perf-profile.children.cycles-pp.__switch_to_asm
0.32 ± 3% -0.1 0.26 ± 2% perf-profile.children.cycles-pp.update_cfs_rq_h_load
0.27 ± 2% -0.0 0.24 ± 5% perf-profile.children.cycles-pp.cpumask_next
0.36 -0.0 0.33 ± 2% perf-profile.children.cycles-pp.activate_task
0.16 ± 6% -0.0 0.14 ± 7% perf-profile.children.cycles-pp.attach_entity_load_avg
0.17 ± 4% +0.0 0.19 ± 4% perf-profile.children.cycles-pp.native_irq_return_iret
0.11 ± 9% +0.0 0.14 ± 5% perf-profile.children.cycles-pp.mutex_spin_on_owner
0.12 ± 8% +0.0 0.15 ± 2% perf-profile.children.cycles-pp.smp_reschedule_interrupt
0.04 ± 57% +0.0 0.07 ± 7% perf-profile.children.cycles-pp.schedule_preempt_disabled
0.16 ± 11% +0.0 0.19 ± 6% perf-profile.children.cycles-pp.timespec_trunc
0.12 ± 12% +0.0 0.15 ± 7% perf-profile.children.cycles-pp.inode_has_perm
0.30 ± 2% +0.0 0.33 ± 5% perf-profile.children.cycles-pp.__x64_sys_read
1.11 +0.0 1.14 perf-profile.children.cycles-pp.check_preempt_curr
0.29 +0.0 0.33 ± 5% perf-profile.children.cycles-pp.generic_pipe_buf_confirm
0.22 ± 4% +0.0 0.26 ± 2% perf-profile.children.cycles-pp.__fsnotify_parent
0.30 ± 6% +0.0 0.34 ± 6% perf-profile.children.cycles-pp.rw_verify_area
0.23 ± 3% +0.0 0.28 ± 7% perf-profile.children.cycles-pp.__x64_sys_write
0.23 ± 6% +0.0 0.28 perf-profile.children.cycles-pp.current_kernel_time64
1.21 +0.0 1.26 perf-profile.children.cycles-pp.ttwu_do_wakeup
0.45 +0.1 0.50 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64_stage2
1.13 +0.1 1.18 ± 2% perf-profile.children.cycles-pp.__update_load_avg_se
0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.osq_lock
0.00 +0.1 0.06 ± 11% perf-profile.children.cycles-pp.schedule_idle
0.18 ± 4% +0.1 0.24 ± 10% perf-profile.children.cycles-pp.reschedule_interrupt
0.00 +0.1 0.07 ± 36% perf-profile.children.cycles-pp.scheduler_ipi
0.41 ± 6% +0.1 0.48 ± 2% perf-profile.children.cycles-pp.wake_up_q
0.26 ± 4% +0.1 0.34 ± 5% perf-profile.children.cycles-pp.preempt_schedule_common
0.23 ± 8% +0.1 0.30 ± 2% perf-profile.children.cycles-pp.rcu_all_qs
1.65 +0.1 1.73 perf-profile.children.cycles-pp.copyout
0.42 ± 4% +0.1 0.50 perf-profile.children.cycles-pp.current_time
0.42 ± 5% +0.1 0.51 ± 3% perf-profile.children.cycles-pp.__might_fault
0.73 ± 3% +0.1 0.84 perf-profile.children.cycles-pp.__might_sleep
0.00 +0.1 0.11 ± 13% perf-profile.children.cycles-pp.__bitmap_weight
0.00 +0.1 0.12 ± 3% perf-profile.children.cycles-pp.intel_idle
0.80 ± 6% +0.1 0.93 ± 2% perf-profile.children.cycles-pp.fsnotify
0.00 +0.1 0.13 perf-profile.children.cycles-pp.cpuidle_enter_state
1.09 ± 5% +0.1 1.23 ± 3% perf-profile.children.cycles-pp.avc_has_perm
2.15 ± 2% +0.1 2.29 ± 2% perf-profile.children.cycles-pp.copy_page_from_iter
0.67 ± 4% +0.1 0.82 perf-profile.children.cycles-pp._cond_resched
0.66 ± 4% +0.2 0.83 ± 3% perf-profile.children.cycles-pp.__mutex_lock
2.48 ± 2% +0.2 2.65 perf-profile.children.cycles-pp.mutex_lock
0.95 ± 6% +0.2 1.12 perf-profile.children.cycles-pp.__mutex_unlock_slowpath
3.09 ± 2% +0.2 3.27 perf-profile.children.cycles-pp.copy_page_to_iter
0.04 ± 58% +0.2 0.23 ± 3% perf-profile.children.cycles-pp.secondary_startup_64
0.04 ± 58% +0.2 0.23 ± 3% perf-profile.children.cycles-pp.start_secondary
0.04 ± 58% +0.2 0.23 ± 3% perf-profile.children.cycles-pp.cpu_startup_entry
0.04 ± 58% +0.2 0.23 ± 3% perf-profile.children.cycles-pp.do_idle
1.10 ± 3% +0.2 1.29 ± 3% perf-profile.children.cycles-pp.___might_sleep
0.76 ± 3% +0.2 0.96 ± 2% perf-profile.children.cycles-pp.__inode_security_revalidate
1.40 ± 5% +0.2 1.64 ± 4% perf-profile.children.cycles-pp.file_has_perm
2.76 ± 2% +0.3 3.04 ± 3% perf-profile.children.cycles-pp.mutex_unlock
4.95 +0.5 5.41 ± 3% perf-profile.children.cycles-pp._raw_spin_lock
5.25 ± 3% +0.7 5.92 perf-profile.children.cycles-pp.syscall_return_via_sysret
5.63 ± 2% +0.7 6.30 ± 3% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
2.55 ± 23% -1.6 0.96 perf-profile.self.cycles-pp.available_idle_cpu
2.95 ± 3% -0.3 2.65 ± 3% perf-profile.self.cycles-pp.switch_mm_irqs_off
1.69 ± 5% -0.2 1.44 ± 4% perf-profile.self.cycles-pp.___perf_sw_event
0.39 ± 22% -0.2 0.15 ± 4% perf-profile.self.cycles-pp.cpumask_next_wrap
2.54 -0.2 2.31 perf-profile.self.cycles-pp.__switch_to
0.50 ± 10% -0.2 0.31 ± 3% perf-profile.self.cycles-pp.find_next_bit
0.74 ± 10% -0.2 0.55 ± 4% perf-profile.self.cycles-pp.update_rq_clock
2.77 ± 2% -0.2 2.60 ± 3% perf-profile.self.cycles-pp.__schedule
1.64 -0.1 1.51 perf-profile.self.cycles-pp.native_write_msr
2.92 -0.1 2.79 perf-profile.self.cycles-pp.load_new_mm_cr3
2.32 -0.1 2.20 ± 3% perf-profile.self.cycles-pp.update_curr
0.57 ± 7% -0.1 0.47 ± 7% perf-profile.self.cycles-pp.update_min_vruntime
1.20 -0.1 1.10 ± 2% perf-profile.self.cycles-pp._raw_spin_lock
0.96 ± 2% -0.1 0.88 ± 2% perf-profile.self.cycles-pp.__calc_delta
0.26 ± 9% -0.1 0.20 ± 7% perf-profile.self.cycles-pp.migrate_task_rq_fair
0.29 ± 7% -0.1 0.23 ± 3% perf-profile.self.cycles-pp.__switch_to_asm
0.41 ± 2% -0.1 0.36 ± 9% perf-profile.self.cycles-pp.pipe_wait
0.32 ± 3% -0.1 0.26 ± 2% perf-profile.self.cycles-pp.update_cfs_rq_h_load
1.22 -0.1 1.17 ± 2% perf-profile.self.cycles-pp.enqueue_task_fair
0.33 ± 2% -0.0 0.30 ± 2% perf-profile.self.cycles-pp.prepare_to_wait
0.36 -0.0 0.33 ± 2% perf-profile.self.cycles-pp.activate_task
0.16 ± 6% -0.0 0.14 ± 7% perf-profile.self.cycles-pp.attach_entity_load_avg
0.05 ± 8% +0.0 0.07 ± 6% perf-profile.self.cycles-pp.cpumask_next
0.09 +0.0 0.11 ± 4% perf-profile.self.cycles-pp.wake_up_q
0.21 ± 4% +0.0 0.22 ± 4% perf-profile.self.cycles-pp.ksys_write
0.11 ± 7% +0.0 0.13 ± 3% perf-profile.self.cycles-pp.ttwu_do_wakeup
0.23 ± 3% +0.0 0.25 perf-profile.self.cycles-pp.check_preempt_curr
0.12 ± 6% +0.0 0.15 ± 3% perf-profile.self.cycles-pp.__might_fault
0.07 ± 10% +0.0 0.10 ± 5% perf-profile.self.cycles-pp.copyout
0.17 ± 4% +0.0 0.19 ± 4% perf-profile.self.cycles-pp.native_irq_return_iret
0.29 +0.0 0.32 ± 6% perf-profile.self.cycles-pp.__x64_sys_read
0.21 ± 2% +0.0 0.24 ± 4% perf-profile.self.cycles-pp.ksys_read
0.12 ± 12% +0.0 0.15 ± 7% perf-profile.self.cycles-pp.inode_has_perm
0.11 ± 7% +0.0 0.14 ± 5% perf-profile.self.cycles-pp.mutex_spin_on_owner
0.29 +0.0 0.33 ± 5% perf-profile.self.cycles-pp.generic_pipe_buf_confirm
0.15 ± 10% +0.0 0.19 ± 7% perf-profile.self.cycles-pp.current_time
0.27 ± 3% +0.0 0.30 ± 7% perf-profile.self.cycles-pp.schedule
0.29 +0.0 0.33 ± 2% perf-profile.self.cycles-pp._cond_resched
0.15 ± 5% +0.0 0.19 ± 2% perf-profile.self.cycles-pp.__fdget_pos
0.22 ± 5% +0.0 0.26 perf-profile.self.cycles-pp.__fsnotify_parent
0.22 ± 8% +0.0 0.26 perf-profile.self.cycles-pp.wake_q_add
0.23 ± 4% +0.0 0.27 perf-profile.self.cycles-pp.current_kernel_time64
0.22 ± 5% +0.0 0.26 ± 4% perf-profile.self.cycles-pp.__mutex_unlock_slowpath
0.00 +0.1 0.05 perf-profile.self.cycles-pp.sched_clock
0.45 +0.1 0.50 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64_stage2
1.09 +0.1 1.15 ± 2% perf-profile.self.cycles-pp.__update_load_avg_se
0.22 +0.1 0.28 ± 7% perf-profile.self.cycles-pp.__x64_sys_write
0.61 ± 4% +0.1 0.67 ± 2% perf-profile.self.cycles-pp.vfs_write
0.57 +0.1 0.63 ± 3% perf-profile.self.cycles-pp.__vfs_write
0.00 +0.1 0.06 ± 7% perf-profile.self.cycles-pp.osq_lock
1.14 +0.1 1.20 perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
0.35 ± 4% +0.1 0.41 ± 6% perf-profile.self.cycles-pp.__mutex_lock
0.34 ± 6% +0.1 0.40 ± 7% perf-profile.self.cycles-pp.security_file_permission
0.25 ± 5% +0.1 0.32 ± 4% perf-profile.self.cycles-pp.__inode_security_revalidate
0.23 ± 8% +0.1 0.30 ± 3% perf-profile.self.cycles-pp.rcu_all_qs
0.50 ± 4% +0.1 0.58 ± 3% perf-profile.self.cycles-pp.__vfs_read
0.68 +0.1 0.76 ± 3% perf-profile.self.cycles-pp.copy_page_from_iter
0.81 ± 2% +0.1 0.91 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.72 ± 3% +0.1 0.83 perf-profile.self.cycles-pp.__might_sleep
0.29 ± 6% +0.1 0.39 ± 6% perf-profile.self.cycles-pp.file_has_perm
0.00 +0.1 0.11 ± 13% perf-profile.self.cycles-pp.__bitmap_weight
0.79 ± 6% +0.1 0.91 ± 3% perf-profile.self.cycles-pp.fsnotify
0.00 +0.1 0.12 ± 3% perf-profile.self.cycles-pp.intel_idle
1.08 ± 4% +0.1 1.21 ± 3% perf-profile.self.cycles-pp.avc_has_perm
2.37 ± 2% +0.1 2.50 perf-profile.self.cycles-pp.update_load_avg
1.09 ± 3% +0.2 1.27 ± 3% perf-profile.self.cycles-pp.___might_sleep
2.73 ± 2% +0.3 2.99 ± 3% perf-profile.self.cycles-pp.mutex_unlock
1.19 ± 5% +0.3 1.53 ± 7% perf-profile.self.cycles-pp.do_syscall_64
5.25 ± 3% +0.7 5.92 perf-profile.self.cycles-pp.syscall_return_via_sysret
5.63 ± 2% +0.7 6.30 ± 3% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Xiaolong


Attachments:
config-4.17.0-rc6-00217-g8dd6626 (167.09 kB)
job-script (6.98 kB)
job.yaml (4.68 kB)
reproduce (3.89 kB)