2017-12-04 03:02:56

by kernel test robot

Subject: [lkp-robot] [x86/entry/64] 63e02a2a32: will-it-scale.per_process_ops -13.0% regression


Greetings,

FYI, we noticed a -13.0% regression of will-it-scale.per_process_ops due to commit:


commit: 63e02a2a3292d8815eac7be438c8c73d72a7bb93 ("x86/entry/64: Create a per-CPU SYSCALL entry trampoline")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: will-it-scale
on test machine: 32 threads Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz with 64G memory
with following parameters:

test: poll1
cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it with 1 through n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based version of each test in order to show any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
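
For readers unfamiliar with the benchmark: per_process_ops is essentially an iteration count for a tight syscall loop. The following is a minimal single-copy sketch in Python, for illustration only. It is not the will-it-scale source, which is written in C and runs 1 through n copies in parallel. The function name count_poll_ops, the use of pipes as the polled descriptors, and the default parameters are all assumptions made for this example.

```python
import os
import select
import time


def count_poll_ops(duration_s=0.2, nr_fds=8):
    """Count how many non-blocking poll() calls complete in duration_s seconds.

    Hypothetical helper: the descriptors are idle pipe read ends, so every
    poll() returns immediately with no ready events.
    """
    pipes = [os.pipe() for _ in range(nr_fds)]
    poller = select.poll()
    for read_fd, _write_fd in pipes:
        poller.register(read_fd, select.POLLIN)

    ops = 0
    deadline = time.monotonic() + duration_s
    try:
        while time.monotonic() < deadline:
            poller.poll(0)  # timeout of 0 ms: return at once
            ops += 1
    finally:
        for read_fd, write_fd in pipes:
            os.close(read_fd)
            os.close(write_fd)
    return ops


if __name__ == "__main__":
    print("poll() calls completed:", count_poll_ops())
```

Each iteration does very little work inside the kernel, so the fixed cost of entering and leaving the syscall is likely a large share of the loop. That would explain why a change to the SYSCALL entry path is visible in this metric.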

In addition, the commit has a significant impact on the following tests:

+------------------+---------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops -7.0% regression       |
| test machine     | 32 threads Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz with 64G memory |
| test parameters  | cpufreq_governor=performance                                        |
|                  | test=writeseek1                                                     |
+------------------+---------------------------------------------------------------------+
| testcase: change | aim9: aim9.brk_test.ops_per_sec -9.9% regression                    |
| test machine     | 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G memory    |
| test parameters  | cpufreq_governor=performance                                        |
|                  | test=brk_test                                                       |
|                  | testtime=300s                                                       |
+------------------+---------------------------------------------------------------------+


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
gcc-7/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-sb03/poll1/will-it-scale

commit:
955cef1517 ("x86/entry/64: Return to userspace from the trampoline stack")
63e02a2a32 ("x86/entry/64: Create a per-CPU SYSCALL entry trampoline")

955cef1517a1be93 63e02a2a3292d8815eac7be438
---------------- --------------------------
%stddev %change %stddev
\ | \
7435674 -13.0% 6465918 will-it-scale.per_process_ops
5868564 -10.4% 5256868 will-it-scale.per_thread_ops
0.56 +8.0% 0.61 ± 2% will-it-scale.scalability
1947 -2.0% 1908 will-it-scale.time.system_time
562.79 +6.9% 601.69 will-it-scale.time.user_time
8.06 +0.8 8.86 ± 3% mpstat.cpu.usr%
4969 ± 83% -84.5% 769.00 ± 6% numa-meminfo.node1.Inactive(anon)
116.75 ± 63% +90.1% 222.00 ± 9% numa-vmstat.node0.nr_mlock
116.75 ± 63% +90.1% 222.00 ± 9% numa-vmstat.node0.nr_unevictable
116.75 ± 63% +90.1% 222.00 ± 9% numa-vmstat.node0.nr_zone_unevictable
1242 ± 83% -84.6% 191.25 ± 6% numa-vmstat.node1.nr_inactive_anon
1242 ± 83% -84.6% 191.25 ± 6% numa-vmstat.node1.nr_zone_inactive_anon
1414780 +7.7% 1524182 ± 3% sched_debug.cfs_rq:/.min_vruntime.max
144.71 ± 12% +17.8% 170.42 ± 2% sched_debug.cfs_rq:/.runnable_load_avg.max
-568616 -29.5% -400842 sched_debug.cfs_rq:/.spread0.min
202980 ± 13% +56.8% 318219 ± 6% sched_debug.cpu.avg_idle.min
173545 ± 3% -13.9% 149414 ± 5% sched_debug.cpu.avg_idle.stddev
2.906e+12 -7.9% 2.676e+12 perf-stat.branch-instructions
0.01 ± 2% +2.0 2.00 perf-stat.branch-miss-rate%
2.405e+08 +22170.9% 5.356e+10 perf-stat.branch-misses
1.15 +11.6% 1.28 perf-stat.cpi
3.659e+12 -9.3% 3.318e+12 perf-stat.dTLB-loads
0.00 ± 6% +0.0 0.00 ± 3% perf-stat.dTLB-store-miss-rate%
2.869e+12 -8.8% 2.616e+12 perf-stat.dTLB-stores
1.406e+13 -9.7% 1.27e+13 perf-stat.instructions
0.87 -10.4% 0.78 perf-stat.ipc
13.72 ± 2% -13.7 0.00 perf-profile.calltrace.cycles.entry_SYSCALL_64
24.53 ± 2% -0.2 24.30 ± 3% perf-profile.calltrace.cycles.copy_user_generic_string._copy_from_user.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
12.15 ± 3% -0.2 11.98 ± 3% perf-profile.calltrace.cycles.__fget_light.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
9.57 ± 3% -0.1 9.48 ± 4% perf-profile.calltrace.cycles.__fget.__fget_light.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
5.79 ± 6% -0.0 5.75 ± 3% perf-profile.calltrace.cycles.fput.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
32.25 ± 2% +1.5 33.78 ± 3% perf-profile.calltrace.cycles._copy_from_user.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
3.99 ± 5% +1.6 5.56 ± 3% perf-profile.calltrace.cycles.__might_fault._copy_from_user.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
65.36 ± 2% +2.0 67.34 ± 2% perf-profile.calltrace.cycles.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
68.87 ± 2% +3.1 72.01 ± 2% perf-profile.calltrace.cycles.sys_poll.entry_SYSCALL_64_fastpath
7.33 ± 35% +3.7 11.05 ± 23% perf-profile.calltrace.cycles.poll_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
71.48 ± 2% +3.9 75.41 ± 2% perf-profile.calltrace.cycles.entry_SYSCALL_64_fastpath
9.50 ± 25% +4.0 13.49 ± 19% perf-profile.calltrace.cycles.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
10.06 ± 23% +4.0 14.05 ± 18% perf-profile.calltrace.cycles.secondary_startup_64
9.66 ± 24% +4.0 13.66 ± 19% perf-profile.calltrace.cycles.cpu_startup_entry.start_secondary.secondary_startup_64
9.66 ± 24% +4.0 13.66 ± 19% perf-profile.calltrace.cycles.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
9.66 ± 24% +4.0 13.66 ± 19% perf-profile.calltrace.cycles.start_secondary.secondary_startup_64
2.25 ± 3% +5.4 7.67 ± 3% perf-profile.calltrace.cycles.entry_SYSCALL_64_after_hwframe
13.72 ± 2% -13.7 0.00 perf-profile.children.cycles.entry_SYSCALL_64
24.53 ± 2% -0.2 24.31 ± 3% perf-profile.children.cycles.copy_user_generic_string
12.16 ± 3% -0.2 11.99 ± 3% perf-profile.children.cycles.__fget_light
9.57 ± 3% -0.1 9.48 ± 4% perf-profile.children.cycles.__fget
5.79 ± 6% -0.0 5.75 ± 3% perf-profile.children.cycles.fput
32.25 ± 2% +1.5 33.78 ± 3% perf-profile.children.cycles._copy_from_user
3.99 ± 5% +1.6 5.56 ± 3% perf-profile.children.cycles.__might_fault
65.36 ± 2% +2.0 67.34 ± 2% perf-profile.children.cycles.do_sys_poll
68.87 ± 2% +3.1 72.01 ± 2% perf-profile.children.cycles.sys_poll
7.42 ± 34% +3.7 11.14 ± 22% perf-profile.children.cycles.poll_idle
71.61 ± 2% +3.9 75.50 ± 2% perf-profile.children.cycles.entry_SYSCALL_64_fastpath
9.88 ± 23% +4.0 13.87 ± 19% perf-profile.children.cycles.cpuidle_enter_state
10.06 ± 23% +4.0 14.05 ± 18% perf-profile.children.cycles.secondary_startup_64
10.06 ± 23% +4.0 14.05 ± 18% perf-profile.children.cycles.cpu_startup_entry
9.66 ± 24% +4.0 13.66 ± 19% perf-profile.children.cycles.start_secondary
10.06 ± 23% +4.0 14.05 ± 18% perf-profile.children.cycles.do_idle
2.25 ± 3% +5.4 7.67 ± 3% perf-profile.children.cycles.entry_SYSCALL_64_after_hwframe
13.72 ± 2% -13.7 0.00 perf-profile.self.cycles.entry_SYSCALL_64
24.21 ± 2% -0.3 23.93 ± 2% perf-profile.self.cycles.copy_user_generic_string
9.47 ± 3% -0.1 9.41 ± 4% perf-profile.self.cycles.__fget
5.69 ± 5% +0.0 5.71 ± 3% perf-profile.self.cycles.fput
13.55 ± 4% +0.7 14.24 perf-profile.self.cycles.do_sys_poll
7.41 ± 34% +3.7 11.07 ± 22% perf-profile.self.cycles.poll_idle
2.25 ± 3% +5.4 7.67 ± 3% perf-profile.self.cycles.entry_SYSCALL_64_after_hwframe
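
The %change column and some of the derived perf-stat rows can be cross-checked by hand. The sketch below does so in Python for a few rows of the poll1 table above. The helper names pct_change and miss_rate_pct are made up for this example. Treating branch-miss-rate% as branch-misses divided by branch-instructions is an assumption about how that row is derived, although it does reproduce the reported 2.00.

```python
def pct_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def miss_rate_pct(misses, total):
    """Misses as a percentage of total events (assumed definition)."""
    return misses / total * 100.0


# Values copied from the poll1 table, as (955cef1517, 63e02a2a32) pairs.
per_process_ops = (7435674, 6465918)
per_thread_ops = (5868564, 5256868)
branch_instructions = (2.906e12, 2.676e12)
branch_misses = (2.405e08, 5.356e10)
cpi = (1.15, 1.28)

print(f"per_process_ops change: {pct_change(*per_process_ops):+.1f}%")
print(f"per_thread_ops change:  {pct_change(*per_thread_ops):+.1f}%")

for label, misses, total in zip(("before", "after"), branch_misses, branch_instructions):
    print(f"branch-miss-rate {label}: {miss_rate_pct(misses, total):.2f}%")

# ipc is the reciprocal of cpi, so the two rows should agree.
for label, value in zip(("before", "after"), cpi):
    print(f"ipc {label}: {1.0 / value:.2f}")
```

With the rounded table values this gives -13.0% and -10.4% for the two ops rows, a branch miss rate of 0.01% before and 2.00% after, and an ipc of 0.87 and 0.78, all of which match the report.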



[ASCII plots of the bisect-good ([*]) and bisect-bad ([O]) samples for: will-it-scale.per_process_ops, perf-stat.instructions, perf-stat.branch-instructions, perf-stat.branch-misses, perf-stat.dTLB-stores, perf-stat.branch-miss-rate%, perf-stat.ipc, perf-stat.cpi, will-it-scale.time.user_time]

***************************************************************************************************
lkp-sb03: 32 threads Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz with 64G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
gcc-7/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-sb03/writeseek1/will-it-scale

commit:
955cef1517 ("x86/entry/64: Return to userspace from the trampoline stack")
63e02a2a32 ("x86/entry/64: Create a per-CPU SYSCALL entry trampoline")

955cef1517a1be93 63e02a2a3292d8815eac7be438
---------------- --------------------------
%stddev %change %stddev
\ | \
1902014 -7.0% 1768039 will-it-scale.per_process_ops
1557647 -6.3% 1459046 will-it-scale.per_thread_ops
0.52 +4.0% 0.54 will-it-scale.scalability
2293 -1.8% 2251 will-it-scale.time.system_time
216.11 +19.7% 258.70 will-it-scale.time.user_time
1.453e+08 ± 6% +21.7% 1.769e+08 ± 9% cpuidle.POLL.time
3.43 +0.8 4.26 mpstat.cpu.usr%
284863 ± 6% +12.9% 321561 ± 3% softirqs.RCU
7178 ± 6% -11.3% 6368 slabinfo.kmalloc-96.active_objs
7218 ± 5% -10.6% 6450 slabinfo.kmalloc-96.num_objs
72.27 ± 6% +19.5% 86.39 ± 7% sched_debug.cfs_rq:/.load_avg.avg
107.67 ± 3% +31.1% 141.11 ± 19% sched_debug.cfs_rq:/.load_avg.stddev
50035 ± 23% +17.3% 58672 ± 24% sched_debug.cpu.load.stddev
7.58 ± 21% +65.4% 12.54 ± 11% sched_debug.cpu.nr_uninterruptible.max
3.143e+12 -4.7% 2.995e+12 perf-stat.branch-instructions
0.01 ± 2% +1.0 0.97 perf-stat.branch-miss-rate%
3.791e+08 ± 3% +7525.5% 2.891e+10 perf-stat.branch-misses
2.54e+08 +1.0% 2.566e+08 perf-stat.cache-misses
1.03 +6.3% 1.10 perf-stat.cpi
6.671e+12 -4.7% 6.361e+12 perf-stat.dTLB-loads
4.722e+12 -5.0% 4.485e+12 perf-stat.dTLB-stores
35.63 ± 12% -29.7 5.89 ± 20% perf-stat.iTLB-load-miss-rate%
8.119e+08 ± 8% +829.8% 7.549e+09 ± 2% perf-stat.iTLB-loads
1.563e+13 -5.3% 1.48e+13 perf-stat.instructions
0.97 -5.9% 0.91 perf-stat.ipc
5.97 -6.0 0.00 perf-profile.calltrace.cycles.entry_SYSCALL_64
7.43 ± 2% -0.1 7.29 ± 3% perf-profile.calltrace.cycles.find_lock_entry.shmem_getpage_gfp.shmem_write_begin.generic_perform_write.__generic_file_write_iter
9.10 ± 2% -0.1 9.00 ± 3% perf-profile.calltrace.cycles.shmem_getpage_gfp.shmem_write_begin.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
9.43 ± 2% -0.1 9.33 ± 3% perf-profile.calltrace.cycles.shmem_write_begin.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.__vfs_write
19.45 -0.1 19.39 ± 2% perf-profile.calltrace.cycles.copyin.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
19.14 -0.0 19.10 perf-profile.calltrace.cycles.copy_user_generic_string.copyin.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter
21.14 +0.0 21.15 ± 2% perf-profile.calltrace.cycles.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.__vfs_write
9.16 ± 10% +0.0 9.20 ± 41% perf-profile.calltrace.cycles.poll_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
41.59 +0.1 41.71 ± 2% perf-profile.calltrace.cycles.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.__vfs_write.vfs_write
11.09 ± 8% +0.2 11.24 ± 31% perf-profile.calltrace.cycles.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
11.21 ± 8% +0.2 11.37 ± 31% perf-profile.calltrace.cycles.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
11.21 ± 8% +0.2 11.37 ± 31% perf-profile.calltrace.cycles.cpu_startup_entry.start_secondary.secondary_startup_64
11.21 ± 8% +0.2 11.37 ± 31% perf-profile.calltrace.cycles.start_secondary.secondary_startup_64
11.68 ± 7% +0.2 11.90 ± 27% perf-profile.calltrace.cycles.secondary_startup_64
45.10 +0.3 45.37 ± 2% perf-profile.calltrace.cycles.__generic_file_write_iter.generic_file_write_iter.__vfs_write.vfs_write.sys_write
51.69 +0.3 52.02 ± 2% perf-profile.calltrace.cycles.__vfs_write.vfs_write.sys_write.entry_SYSCALL_64_fastpath
50.28 +0.4 50.63 ± 2% perf-profile.calltrace.cycles.generic_file_write_iter.__vfs_write.vfs_write.sys_write.entry_SYSCALL_64_fastpath
61.80 +0.8 62.60 ± 3% perf-profile.calltrace.cycles.vfs_write.sys_write.entry_SYSCALL_64_fastpath
4.92 +0.9 5.80 ± 5% perf-profile.calltrace.cycles.__fdget_pos.sys_lseek.entry_SYSCALL_64_fastpath
4.96 +0.9 5.86 ± 3% perf-profile.calltrace.cycles.__fdget_pos.sys_write.entry_SYSCALL_64_fastpath
8.74 +1.0 9.75 ± 6% perf-profile.calltrace.cycles.sys_lseek.entry_SYSCALL_64_fastpath
69.88 +1.6 71.49 ± 3% perf-profile.calltrace.cycles.sys_write.entry_SYSCALL_64_fastpath
80.00 +2.9 82.90 ± 3% perf-profile.calltrace.cycles.entry_SYSCALL_64_fastpath
5.97 -6.0 0.00 perf-profile.children.cycles.entry_SYSCALL_64
7.43 ± 2% -0.1 7.29 ± 3% perf-profile.children.cycles.find_lock_entry
9.10 ± 2% -0.1 9.00 ± 3% perf-profile.children.cycles.shmem_getpage_gfp
9.43 ± 2% -0.1 9.33 ± 3% perf-profile.children.cycles.shmem_write_begin
19.45 -0.1 19.39 ± 2% perf-profile.children.cycles.copyin
19.14 -0.0 19.11 perf-profile.children.cycles.copy_user_generic_string
21.14 +0.0 21.15 ± 2% perf-profile.children.cycles.iov_iter_copy_from_user_atomic
9.46 ± 9% +0.1 9.56 ± 36% perf-profile.children.cycles.poll_idle
41.60 +0.1 41.72 ± 2% perf-profile.children.cycles.generic_perform_write
11.21 ± 8% +0.2 11.37 ± 31% perf-profile.children.cycles.start_secondary
11.56 ± 7% +0.2 11.76 ± 27% perf-profile.children.cycles.cpuidle_enter_state
11.69 ± 7% +0.2 11.90 ± 27% perf-profile.children.cycles.do_idle
11.68 ± 7% +0.2 11.90 ± 27% perf-profile.children.cycles.secondary_startup_64
11.68 ± 7% +0.2 11.90 ± 27% perf-profile.children.cycles.cpu_startup_entry
45.10 +0.3 45.37 ± 2% perf-profile.children.cycles.__generic_file_write_iter
51.72 +0.3 52.03 ± 2% perf-profile.children.cycles.__vfs_write
50.28 +0.4 50.63 ± 2% perf-profile.children.cycles.generic_file_write_iter
61.84 +0.8 62.62 ± 3% perf-profile.children.cycles.vfs_write
8.74 +1.0 9.75 ± 6% perf-profile.children.cycles.sys_lseek
3.81 +1.6 5.38 ± 5% perf-profile.children.cycles.__fget_light
69.93 +1.6 71.50 ± 3% perf-profile.children.cycles.sys_write
9.88 +1.8 11.67 ± 3% perf-profile.children.cycles.__fdget_pos
80.23 +2.7 82.94 ± 3% perf-profile.children.cycles.entry_SYSCALL_64_fastpath
5.97 -6.0 0.00 perf-profile.self.cycles.entry_SYSCALL_64
18.93 -0.1 18.84 ± 2% perf-profile.self.cycles.copy_user_generic_string
9.39 ± 8% +0.0 9.42 ± 35% perf-profile.self.cycles.poll_idle



***************************************************************************************************
lkp-ivb-d03: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
gcc-7/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-ivb-d03/brk_test/aim9/300s

commit:
955cef1517 ("x86/entry/64: Return to userspace from the trampoline stack")
63e02a2a32 ("x86/entry/64: Create a per-CPU SYSCALL entry trampoline")

955cef1517a1be93 63e02a2a3292d8815eac7be438
---------------- --------------------------
%stddev %change %stddev
\ | \
4124214 -9.9% 3717599 aim9.brk_test.ops_per_sec
272.29 -4.9% 259.03 aim9.time.system_time
27.71 +47.2% 40.78 aim9.time.user_time
12605 ± 9% -27.0% 9203 ± 10% cpuidle.POLL.usage
3.24 ± 2% +1.4 4.62 mpstat.cpu.usr%
4007 ± 3% -9.2% 3639 ± 4% slabinfo.anon_vma_chain.num_objs
9.80 -1.9% 9.61 turbostat.CorWatt
30309 -1.3% 29929 vmstat.system.cs
18905 -1.1% 18689 vmstat.system.in
716.67 ± 11% -22.7% 554.33 ± 6% sched_debug.cfs_rq:/.load_avg.avg
1.00 ± 11% -79.2% 0.21 ±173% sched_debug.cfs_rq:/.nr_spread_over.min
0.45 ± 55% +70.3% 0.76 ± 19% sched_debug.cfs_rq:/.nr_spread_over.stddev
521.82 ± 3% -10.2% 468.57 ± 2% sched_debug.cfs_rq:/.util_avg.avg
1.96 ± 7% +34.0% 2.62 ± 9% sched_debug.cpu.nr_running.max
0.68 ± 15% +42.9% 0.98 ± 15% sched_debug.cpu.nr_running.stddev
0.06 ± 19% +0.9 0.92 perf-stat.branch-miss-rate%
3.583e+08 ± 5% +1125.0% 4.389e+09 ± 28% perf-stat.branch-misses
9163065 -1.8% 8997254 perf-stat.context-switches
0.56 ± 2% +12.8% 0.63 ± 4% perf-stat.cpi
0.06 ±132% +0.2 0.23 ± 6% perf-stat.dTLB-load-miss-rate%
4.062e+08 ±142% +234.1% 1.357e+09 ± 8% perf-stat.dTLB-load-misses
9061724 ± 12% +22.0% 11056158 ± 6% perf-stat.dTLB-store-misses
11.72 ± 24% -6.6 5.08 ± 33% perf-stat.iTLB-load-miss-rate%
4.4e+08 ± 29% +135.5% 1.036e+09 ± 23% perf-stat.iTLB-loads
1.80 ± 2% -11.2% 1.60 ± 3% perf-stat.ipc
14.11 ± 88% -2.6 11.50 ± 86% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
14.22 ± 88% -2.6 11.63 ± 85% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
14.22 ± 88% -2.6 11.63 ± 85% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
14.22 ± 88% -2.6 11.63 ± 85% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
12.86 ± 92% -2.4 10.45 ± 97% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
45.20 ± 3% -1.4 43.82 perf-profile.calltrace.cycles-pp.do_brk_flags.sys_brk.entry_SYSCALL_64_fastpath
16.60 ± 3% -0.9 15.74 ± 3% perf-profile.calltrace.cycles-pp.vma_merge.do_brk_flags.sys_brk.entry_SYSCALL_64_fastpath
56.05 ± 2% -0.8 55.25 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_fastpath
14.60 ± 3% -0.7 13.88 ± 2% perf-profile.calltrace.cycles-pp.__vma_adjust.vma_merge.do_brk_flags.sys_brk.entry_SYSCALL_64_fastpath
54.84 ± 3% -0.7 54.15 perf-profile.calltrace.cycles-pp.sys_brk.entry_SYSCALL_64_fastpath
11.52 ± 9% -0.1 11.46 perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.sys_brk.entry_SYSCALL_64_fastpath
6.30 ± 5% +0.2 6.48 ± 3% perf-profile.calltrace.cycles-pp.security_vm_enough_memory_mm.do_brk_flags.sys_brk.entry_SYSCALL_64_fastpath
27.40 ± 3% +0.8 28.18 ± 4% perf-profile.calltrace.cycles-pp.secondary_startup_64
12.40 ± 94% +3.3 15.73 ± 62% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_kernel
13.18 ± 88% +3.4 16.55 ± 57% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64
13.18 ± 88% +3.4 16.55 ± 57% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_kernel.secondary_startup_64
13.18 ± 88% +3.4 16.55 ± 57% perf-profile.calltrace.cycles-pp.start_kernel.secondary_startup_64
13.14 ± 88% +3.4 16.53 ± 57% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64
14.22 ± 88% -2.6 11.63 ± 85% perf-profile.children.cycles-pp.start_secondary
45.83 ± 3% -1.2 44.59 perf-profile.children.cycles-pp.do_brk_flags
56.30 ± 2% -0.9 55.36 perf-profile.children.cycles-pp.entry_SYSCALL_64_fastpath
17.05 ± 3% -0.8 16.24 ± 3% perf-profile.children.cycles-pp.vma_merge
15.45 ± 3% -0.7 14.79 ± 2% perf-profile.children.cycles-pp.__vma_adjust
55.47 ± 3% -0.6 54.88 perf-profile.children.cycles-pp.sys_brk
12.21 ± 8% -0.1 12.08 perf-profile.children.cycles-pp.perf_event_mmap
6.40 ± 5% +0.2 6.57 ± 3% perf-profile.children.cycles-pp.security_vm_enough_memory_mm
27.41 ± 3% +0.8 28.19 ± 4% perf-profile.children.cycles-pp.do_idle
27.30 ± 3% +0.8 28.07 ± 4% perf-profile.children.cycles-pp.cpuidle_enter_state
27.40 ± 3% +0.8 28.18 ± 4% perf-profile.children.cycles-pp.secondary_startup_64
27.40 ± 3% +0.8 28.18 ± 4% perf-profile.children.cycles-pp.cpu_startup_entry
25.27 +0.9 26.19 perf-profile.children.cycles-pp.intel_idle
13.18 ± 88% +3.4 16.55 ± 57% perf-profile.children.cycles-pp.start_kernel
4.82 ± 9% +0.0 4.83 ± 5% perf-profile.self.cycles-pp.__vma_adjust
5.25 ± 9% +0.0 5.29 ± 2% perf-profile.self.cycles-pp.perf_event_mmap
5.33 ± 3% +0.4 5.75 ± 3% perf-profile.self.cycles-pp.do_brk_flags
25.26 +0.9 26.19 perf-profile.self.cycles-pp.intel_idle
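
To put the aim9 throughput numbers in per-operation terms, the ops_per_sec row can be inverted into an average time per reported operation. This is a rough sketch in Python: the helper name ns_per_op is made up for this example, and it assumes that ops_per_sec is a plain average over the run. Whether one reported operation corresponds to exactly one brk() call is not stated in the report.

```python
def ns_per_op(ops_per_sec):
    """Average nanoseconds per reported operation."""
    return 1e9 / ops_per_sec


# aim9.brk_test.ops_per_sec for 955cef1517 and 63e02a2a32.
before = ns_per_op(4124214)
after = ns_per_op(3717599)

print(f"{before:.1f} ns -> {after:.1f} ns per op (+{after - before:.1f} ns)")
```

Under those assumptions the -9.9% change corresponds to roughly 26.5 ns of extra time per reported operation.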



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Xiaolong


Attachments:
(No filename) (38.00 kB)
config-4.14.0-01234-g63e02a2 (159.67 kB)
job.yaml (4.64 kB)
reproduce (326.00 B)

2017-12-04 04:00:02

by Andy Lutomirski

Subject: Re: [lkp-robot] [x86/entry/64] 63e02a2a32: will-it-scale.per_process_ops -13.0% regression

Thomas, has my fix for this landed?

--Andy

> On Dec 3, 2017, at 7:02 PM, kernel test robot wrote:
>
>
> Greeting,
>
> FYI, we noticed a -13.0% regression of will-it-scale.per_process_ops due to commit:
>
>
> commit: 63e02a2a3292d8815eac7be438c8c73d72a7bb93 ("x86/entry/64: Create a per-CPU SYSCALL entry trampoline")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> in testcase: will-it-scale
> on test machine: 32 threads Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz with 64G memory
> with following parameters:
>
> test: poll1
> cpufreq_governor: performance
>
> test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
> test-url: https://github.com/antonblanchard/will-it-scale
>
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+---------------------------------------------------------------------+
> | testcase: change | will-it-scale: will-it-scale.per_process_ops -7.0% regression |
> | test machine | 32 threads Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz with 64G memory |
> | test parameters | cpufreq_governor=performance |
> | | test=writeseek1 |
> +------------------+---------------------------------------------------------------------+
> | testcase: change | aim9: aim9.brk_test.ops_per_sec -9.9% regression |
> | test machine | 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G memory |
> | test parameters | cpufreq_governor=performance |
> | | test=brk_test |
> | | testtime=300s |
> +------------------+---------------------------------------------------------------------+
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is attached in this email
> bin/lkp run job.yaml
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
> gcc-7/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-sb03/poll1/will-it-scale
>
> commit:
> 955cef1517 ("x86/entry/64: Return to userspace from the trampoline stack")
> 63e02a2a32 ("x86/entry/64: Create a per-CPU SYSCALL entry trampoline")
>
> 955cef1517a1be93 63e02a2a3292d8815eac7be438
> ---------------- --------------------------
> %stddev %change %stddev
> \ | \
> 7435674 -13.0% 6465918 will-it-scale.per_process_ops
> 5868564 -10.4% 5256868 will-it-scale.per_thread_ops
> 0.56 +8.0% 0.61 ± 2% will-it-scale.scalability
> 1947 -2.0% 1908 will-it-scale.time.system_time
> 562.79 +6.9% 601.69 will-it-scale.time.user_time
> 8.06 +0.8 8.86 ± 3% mpstat.cpu.usr%
> 4969 ± 83% -84.5% 769.00 ± 6% numa-meminfo.node1.Inactive(anon)
> 116.75 ± 63% +90.1% 222.00 ± 9% numa-vmstat.node0.nr_mlock
> 116.75 ± 63% +90.1% 222.00 ± 9% numa-vmstat.node0.nr_unevictable
> 116.75 ± 63% +90.1% 222.00 ± 9% numa-vmstat.node0.nr_zone_unevictable
> 1242 ± 83% -84.6% 191.25 ± 6% numa-vmstat.node1.nr_inactive_anon
> 1242 ± 83% -84.6% 191.25 ± 6% numa-vmstat.node1.nr_zone_inactive_anon
> 1414780 +7.7% 1524182 ± 3% sched_debug.cfs_rq:/.min_vruntime.max
> 144.71 ± 12% +17.8% 170.42 ± 2% sched_debug.cfs_rq:/.runnable_load_avg.max
> -568616 -29.5% -400842 sched_debug.cfs_rq:/.spread0.min
> 202980 ± 13% +56.8% 318219 ± 6% sched_debug.cpu.avg_idle.min
> 173545 ± 3% -13.9% 149414 ± 5% sched_debug.cpu.avg_idle.stddev
> 2.906e+12 -7.9% 2.676e+12 perf-stat.branch-instructions
> 0.01 ± 2% +2.0 2.00 perf-stat.branch-miss-rate%
> 2.405e+08 +22170.9% 5.356e+10 perf-stat.branch-misses
> 1.15 +11.6% 1.28 perf-stat.cpi
> 3.659e+12 -9.3% 3.318e+12 perf-stat.dTLB-loads
> 0.00 ± 6% +0.0 0.00 ± 3% perf-stat.dTLB-store-miss-rate%
> 2.869e+12 -8.8% 2.616e+12 perf-stat.dTLB-stores
> 1.406e+13 -9.7% 1.27e+13 perf-stat.instructions
> 0.87 -10.4% 0.78 perf-stat.ipc
> 13.72 ± 2% -13.7 0.00 perf-profile.calltrace.cycles.entry_SYSCALL_64
> 24.53 ± 2% -0.2 24.30 ± 3% perf-profile.calltrace.cycles.copy_user_generic_string._copy_from_user.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
> 12.15 ± 3% -0.2 11.98 ± 3% perf-profile.calltrace.cycles.__fget_light.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
> 9.57 ± 3% -0.1 9.48 ± 4% perf-profile.calltrace.cycles.__fget.__fget_light.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
> 5.79 ± 6% -0.0 5.75 ± 3% perf-profile.calltrace.cycles.fput.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
> 32.25 ± 2% +1.5 33.78 ± 3% perf-profile.calltrace.cycles._copy_from_user.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
> 3.99 ± 5% +1.6 5.56 ± 3% perf-profile.calltrace.cycles.__might_fault._copy_from_user.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
> 65.36 ± 2% +2.0 67.34 ± 2% perf-profile.calltrace.cycles.do_sys_poll.sys_poll.entry_SYSCALL_64_fastpath
> 68.87 ± 2% +3.1 72.01 ± 2% perf-profile.calltrace.cycles.sys_poll.entry_SYSCALL_64_fastpath
> 7.33 ± 35% +3.7 11.05 ± 23% perf-profile.calltrace.cycles.poll_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
> 71.48 ± 2% +3.9 75.41 ± 2% perf-profile.calltrace.cycles.entry_SYSCALL_64_fastpath
> 9.50 ± 25% +4.0 13.49 ± 19% perf-profile.calltrace.cycles.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> 10.06 ± 23% +4.0 14.05 ± 18% perf-profile.calltrace.cycles.secondary_startup_64
> 9.66 ± 24% +4.0 13.66 ± 19% perf-profile.calltrace.cycles.cpu_startup_entry.start_secondary.secondary_startup_64
> 9.66 ± 24% +4.0 13.66 ± 19% perf-profile.calltrace.cycles.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> 9.66 ± 24% +4.0 13.66 ± 19% perf-profile.calltrace.cycles.start_secondary.secondary_startup_64
> 2.25 ± 3% +5.4 7.67 ± 3% perf-profile.calltrace.cycles.entry_SYSCALL_64_after_hwframe
> 13.72 ± 2% -13.7 0.00 perf-profile.children.cycles.entry_SYSCALL_64
> 24.53 ± 2% -0.2 24.31 ± 3% perf-profile.children.cycles.copy_user_generic_string
> 12.16 ± 3% -0.2 11.99 ± 3% perf-profile.children.cycles.__fget_light
> 9.57 ± 3% -0.1 9.48 ± 4% perf-profile.children.cycles.__fget
> 5.79 ± 6% -0.0 5.75 ± 3% perf-profile.children.cycles.fput
> 32.25 ± 2% +1.5 33.78 ± 3% perf-profile.children.cycles._copy_from_user
> 3.99 ± 5% +1.6 5.56 ± 3% perf-profile.children.cycles.__might_fault
> 65.36 ± 2% +2.0 67.34 ± 2% perf-profile.children.cycles.do_sys_poll
> 68.87 ± 2% +3.1 72.01 ± 2% perf-profile.children.cycles.sys_poll
> 7.42 ± 34% +3.7 11.14 ± 22% perf-profile.children.cycles.poll_idle
> 71.61 ± 2% +3.9 75.50 ± 2% perf-profile.children.cycles.entry_SYSCALL_64_fastpath
> 9.88 ± 23% +4.0 13.87 ± 19% perf-profile.children.cycles.cpuidle_enter_state
> 10.06 ± 23% +4.0 14.05 ± 18% perf-profile.children.cycles.secondary_startup_64
> 10.06 ± 23% +4.0 14.05 ± 18% perf-profile.children.cycles.cpu_startup_entry
> 9.66 ± 24% +4.0 13.66 ± 19% perf-profile.children.cycles.start_secondary
> 10.06 ± 23% +4.0 14.05 ± 18% perf-profile.children.cycles.do_idle
> 2.25 ± 3% +5.4 7.67 ± 3% perf-profile.children.cycles.entry_SYSCALL_64_after_hwframe
> 13.72 ± 2% -13.7 0.00 perf-profile.self.cycles.entry_SYSCALL_64
> 24.21 ± 2% -0.3 23.93 ± 2% perf-profile.self.cycles.copy_user_generic_string
> 9.47 ± 3% -0.1 9.41 ± 4% perf-profile.self.cycles.__fget
> 5.69 ± 5% +0.0 5.71 ± 3% perf-profile.self.cycles.fput
> 13.55 ± 4% +0.7 14.24 perf-profile.self.cycles.do_sys_poll
> 7.41 ± 34% +3.7 11.07 ± 22% perf-profile.self.cycles.poll_idle
> 2.25 ± 3% +5.4 7.67 ± 3% perf-profile.self.cycles.entry_SYSCALL_64_after_hwframe
>
>
>
> will-it-scale.per_process_ops
>
> 7.8e+06 +-+---------------------------------------------------------------+
> |. .+.++ .++. |
> 7.6e+06 +-+ : .+.+ +.+.+.+ +.+ |
> | : .+.+ + + + |
> 7.4e+06 +-+ +.+.+.+.++.+.+.+.+.++ ++.+.+ ++.+.|
> | |
> 7.2e+06 +-+ |
> | |
> 7e+06 +-+ |
> | |
> 6.8e+06 +-+ |
> | |
> 6.6e+06 O-+ O OO OO O O |
> | O O O O OO O O O O OO O O O O O |
> 6.4e+06 +-+--------O-----------------------O-O-------------O--------------+
>
>
> perf-stat.instructions
>
> [ASCII plot, y-axis 1.2e+13 to 1.5e+13: bisect-good samples at roughly 1.4e+13 to 1.45e+13, bisect-bad samples at roughly 1.25e+13 to 1.3e+13]
>
>
> perf-stat.branch-instructions
>
> [ASCII plot, y-axis 2.55e+12 to 3.05e+12: bisect-good samples at roughly 2.9e+12 to 3e+12, bisect-bad samples at roughly 2.6e+12 to 2.7e+12]
>
>
> perf-stat.branch-misses
>
> [ASCII plot, y-axis 0 to 6e+10: bisect-bad samples at roughly 5e+10 to 5.5e+10; no bisect-good samples are distinguishable in the plot]
>
>
> perf-stat.dTLB-stores
>
> [ASCII plot, y-axis 2.5e+12 to 3.2e+12: bisect-good samples at roughly 2.8e+12 to 3.2e+12, bisect-bad samples at roughly 2.55e+12 to 2.65e+12]
>
>
> perf-stat.branch-miss-rate_
>
> [ASCII plot, y-axis 0 to 2.5: bisect-bad samples at roughly 2; no bisect-good samples are distinguishable in the plot]
>
>
> perf-stat.ipc
>
> [ASCII plot, y-axis 0.76 to 0.92: bisect-good samples at roughly 0.86 to 0.9, bisect-bad samples at roughly 0.77 to 0.8]
>
>
> perf-stat.cpi
>
> [ASCII plot, y-axis 1.1 to 1.3: bisect-bad samples at roughly 1.24 to 1.3, bisect-good samples at roughly 1.12 to 1.16]
>
>
> will-it-scale.time.user_time
>
> [ASCII plot, y-axis 520 to 620: bisect-bad samples at roughly 600 to 610, bisect-good samples at roughly 530 to 560]
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
> ***************************************************************************************************
> lkp-sb03: 32 threads Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz with 64G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
> gcc-7/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-sb03/writeseek1/will-it-scale
>
> commit:
> 955cef1517 ("x86/entry/64: Return to userspace from the trampoline stack")
> 63e02a2a32 ("x86/entry/64: Create a per-CPU SYSCALL entry trampoline")
>
> 955cef1517a1be93 63e02a2a3292d8815eac7be438
> ---------------- --------------------------
> %stddev %change %stddev
> \ | \
> 1902014 -7.0% 1768039 will-it-scale.per_process_ops
> 1557647 -6.3% 1459046 will-it-scale.per_thread_ops
> 0.52 +4.0% 0.54 will-it-scale.scalability
> 2293 -1.8% 2251 will-it-scale.time.system_time
> 216.11 +19.7% 258.70 will-it-scale.time.user_time
> 1.453e+08 ± 6% +21.7% 1.769e+08 ± 9% cpuidle.POLL.time
> 3.43 +0.8 4.26 mpstat.cpu.usr%
> 284863 ± 6% +12.9% 321561 ± 3% softirqs.RCU
> 7178 ± 6% -11.3% 6368 slabinfo.kmalloc-96.active_objs
> 7218 ± 5% -10.6% 6450 slabinfo.kmalloc-96.num_objs
> 72.27 ± 6% +19.5% 86.39 ± 7% sched_debug.cfs_rq:/.load_avg.avg
> 107.67 ± 3% +31.1% 141.11 ± 19% sched_debug.cfs_rq:/.load_avg.stddev
> 50035 ± 23% +17.3% 58672 ± 24% sched_debug.cpu.load.stddev
> 7.58 ± 21% +65.4% 12.54 ± 11% sched_debug.cpu.nr_uninterruptible.max
> 3.143e+12 -4.7% 2.995e+12 perf-stat.branch-instructions
> 0.01 ± 2% +1.0 0.97 perf-stat.branch-miss-rate%
> 3.791e+08 ± 3% +7525.5% 2.891e+10 perf-stat.branch-misses
> 2.54e+08 +1.0% 2.566e+08 perf-stat.cache-misses
> 1.03 +6.3% 1.10 perf-stat.cpi
> 6.671e+12 -4.7% 6.361e+12 perf-stat.dTLB-loads
> 4.722e+12 -5.0% 4.485e+12 perf-stat.dTLB-stores
> 35.63 ± 12% -29.7 5.89 ± 20% perf-stat.iTLB-load-miss-rate%
> 8.119e+08 ± 8% +829.8% 7.549e+09 ± 2% perf-stat.iTLB-loads
> 1.563e+13 -5.3% 1.48e+13 perf-stat.instructions
> 0.97 -5.9% 0.91 perf-stat.ipc
> 5.97 -6.0 0.00 perf-profile.calltrace.cycles.entry_SYSCALL_64
> 7.43 ± 2% -0.1 7.29 ± 3% perf-profile.calltrace.cycles.find_lock_entry.shmem_getpage_gfp.shmem_write_begin.generic_perform_write.__generic_file_write_iter
> 9.10 ± 2% -0.1 9.00 ± 3% perf-profile.calltrace.cycles.shmem_getpage_gfp.shmem_write_begin.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
> 9.43 ± 2% -0.1 9.33 ± 3% perf-profile.calltrace.cycles.shmem_write_begin.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.__vfs_write
> 19.45 -0.1 19.39 ± 2% perf-profile.calltrace.cycles.copyin.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
> 19.14 -0.0 19.10 perf-profile.calltrace.cycles.copy_user_generic_string.copyin.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter
> 21.14 +0.0 21.15 ± 2% perf-profile.calltrace.cycles.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.__vfs_write
> 9.16 ± 10% +0.0 9.20 ± 41% perf-profile.calltrace.cycles.poll_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
> 41.59 +0.1 41.71 ± 2% perf-profile.calltrace.cycles.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.__vfs_write.vfs_write
> 11.09 ± 8% +0.2 11.24 ± 31% perf-profile.calltrace.cycles.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> 11.21 ± 8% +0.2 11.37 ± 31% perf-profile.calltrace.cycles.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> 11.21 ± 8% +0.2 11.37 ± 31% perf-profile.calltrace.cycles.cpu_startup_entry.start_secondary.secondary_startup_64
> 11.21 ± 8% +0.2 11.37 ± 31% perf-profile.calltrace.cycles.start_secondary.secondary_startup_64
> 11.68 ± 7% +0.2 11.90 ± 27% perf-profile.calltrace.cycles.secondary_startup_64
> 45.10 +0.3 45.37 ± 2% perf-profile.calltrace.cycles.__generic_file_write_iter.generic_file_write_iter.__vfs_write.vfs_write.sys_write
> 51.69 +0.3 52.02 ± 2% perf-profile.calltrace.cycles.__vfs_write.vfs_write.sys_write.entry_SYSCALL_64_fastpath
> 50.28 +0.4 50.63 ± 2% perf-profile.calltrace.cycles.generic_file_write_iter.__vfs_write.vfs_write.sys_write.entry_SYSCALL_64_fastpath
> 61.80 +0.8 62.60 ± 3% perf-profile.calltrace.cycles.vfs_write.sys_write.entry_SYSCALL_64_fastpath
> 4.92 +0.9 5.80 ± 5% perf-profile.calltrace.cycles.__fdget_pos.sys_lseek.entry_SYSCALL_64_fastpath
> 4.96 +0.9 5.86 ± 3% perf-profile.calltrace.cycles.__fdget_pos.sys_write.entry_SYSCALL_64_fastpath
> 8.74 +1.0 9.75 ± 6% perf-profile.calltrace.cycles.sys_lseek.entry_SYSCALL_64_fastpath
> 69.88 +1.6 71.49 ± 3% perf-profile.calltrace.cycles.sys_write.entry_SYSCALL_64_fastpath
> 80.00 +2.9 82.90 ± 3% perf-profile.calltrace.cycles.entry_SYSCALL_64_fastpath
> 5.97 -6.0 0.00 perf-profile.children.cycles.entry_SYSCALL_64
> 7.43 ± 2% -0.1 7.29 ± 3% perf-profile.children.cycles.find_lock_entry
> 9.10 ± 2% -0.1 9.00 ± 3% perf-profile.children.cycles.shmem_getpage_gfp
> 9.43 ± 2% -0.1 9.33 ± 3% perf-profile.children.cycles.shmem_write_begin
> 19.45 -0.1 19.39 ± 2% perf-profile.children.cycles.copyin
> 19.14 -0.0 19.11 perf-profile.children.cycles.copy_user_generic_string
> 21.14 +0.0 21.15 ± 2% perf-profile.children.cycles.iov_iter_copy_from_user_atomic
> 9.46 ± 9% +0.1 9.56 ± 36% perf-profile.children.cycles.poll_idle
> 41.60 +0.1 41.72 ± 2% perf-profile.children.cycles.generic_perform_write
> 11.21 ± 8% +0.2 11.37 ± 31% perf-profile.children.cycles.start_secondary
> 11.56 ± 7% +0.2 11.76 ± 27% perf-profile.children.cycles.cpuidle_enter_state
> 11.69 ± 7% +0.2 11.90 ± 27% perf-profile.children.cycles.do_idle
> 11.68 ± 7% +0.2 11.90 ± 27% perf-profile.children.cycles.secondary_startup_64
> 11.68 ± 7% +0.2 11.90 ± 27% perf-profile.children.cycles.cpu_startup_entry
> 45.10 +0.3 45.37 ± 2% perf-profile.children.cycles.__generic_file_write_iter
> 51.72 +0.3 52.03 ± 2% perf-profile.children.cycles.__vfs_write
> 50.28 +0.4 50.63 ± 2% perf-profile.children.cycles.generic_file_write_iter
> 61.84 +0.8 62.62 ± 3% perf-profile.children.cycles.vfs_write
> 8.74 +1.0 9.75 ± 6% perf-profile.children.cycles.sys_lseek
> 3.81 +1.6 5.38 ± 5% perf-profile.children.cycles.__fget_light
> 69.93 +1.6 71.50 ± 3% perf-profile.children.cycles.sys_write
> 9.88 +1.8 11.67 ± 3% perf-profile.children.cycles.__fdget_pos
> 80.23 +2.7 82.94 ± 3% perf-profile.children.cycles.entry_SYSCALL_64_fastpath
> 5.97 -6.0 0.00 perf-profile.self.cycles.entry_SYSCALL_64
> 18.93 -0.1 18.84 ± 2% perf-profile.self.cycles.copy_user_generic_string
> 9.39 ± 8% +0.0 9.42 ± 35% perf-profile.self.cycles.poll_idle
>
>
>
> ***************************************************************************************************
> lkp-ivb-d03: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 4G memory
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
> gcc-7/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-ivb-d03/brk_test/aim9/300s
>
> commit:
> 955cef1517 ("x86/entry/64: Return to userspace from the trampoline stack")
> 63e02a2a32 ("x86/entry/64: Create a per-CPU SYSCALL entry trampoline")
>
> 955cef1517a1be93 63e02a2a3292d8815eac7be438
> ---------------- --------------------------
> %stddev %change %stddev
> \ | \
> 4124214 -9.9% 3717599 aim9.brk_test.ops_per_sec
> 272.29 -4.9% 259.03 aim9.time.system_time
> 27.71 +47.2% 40.78 aim9.time.user_time
> 12605 ± 9% -27.0% 9203 ± 10% cpuidle.POLL.usage
> 3.24 ± 2% +1.4 4.62 mpstat.cpu.usr%
> 4007 ± 3% -9.2% 3639 ± 4% slabinfo.anon_vma_chain.num_objs
> 9.80 -1.9% 9.61 turbostat.CorWatt
> 30309 -1.3% 29929 vmstat.system.cs
> 18905 -1.1% 18689 vmstat.system.in
> 716.67 ± 11% -22.7% 554.33 ± 6% sched_debug.cfs_rq:/.load_avg.avg
> 1.00 ± 11% -79.2% 0.21 ±173% sched_debug.cfs_rq:/.nr_spread_over.min
> 0.45 ± 55% +70.3% 0.76 ± 19% sched_debug.cfs_rq:/.nr_spread_over.stddev
> 521.82 ± 3% -10.2% 468.57 ± 2% sched_debug.cfs_rq:/.util_avg.avg
> 1.96 ± 7% +34.0% 2.62 ± 9% sched_debug.cpu.nr_running.max
> 0.68 ± 15% +42.9% 0.98 ± 15% sched_debug.cpu.nr_running.stddev
> 0.06 ± 19% +0.9 0.92 perf-stat.branch-miss-rate%
> 3.583e+08 ± 5% +1125.0% 4.389e+09 ± 28% perf-stat.branch-misses
> 9163065 -1.8% 8997254 perf-stat.context-switches
> 0.56 ± 2% +12.8% 0.63 ± 4% perf-stat.cpi
> 0.06 ±132% +0.2 0.23 ± 6% perf-stat.dTLB-load-miss-rate%
> 4.062e+08 ±142% +234.1% 1.357e+09 ± 8% perf-stat.dTLB-load-misses
> 9061724 ± 12% +22.0% 11056158 ± 6% perf-stat.dTLB-store-misses
> 11.72 ± 24% -6.6 5.08 ± 33% perf-stat.iTLB-load-miss-rate%
> 4.4e+08 ± 29% +135.5% 1.036e+09 ± 23% perf-stat.iTLB-loads
> 1.80 ± 2% -11.2% 1.60 ± 3% perf-stat.ipc
> 14.11 ± 88% -2.6 11.50 ± 86% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> 14.22 ± 88% -2.6 11.63 ± 85% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> 14.22 ± 88% -2.6 11.63 ± 85% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
> 14.22 ± 88% -2.6 11.63 ± 85% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
> 12.86 ± 92% -2.4 10.45 ± 97% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
> 45.20 ± 3% -1.4 43.82 perf-profile.calltrace.cycles-pp.do_brk_flags.sys_brk.entry_SYSCALL_64_fastpath
> 16.60 ± 3% -0.9 15.74 ± 3% perf-profile.calltrace.cycles-pp.vma_merge.do_brk_flags.sys_brk.entry_SYSCALL_64_fastpath
> 56.05 ± 2% -0.8 55.25 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_fastpath
> 14.60 ± 3% -0.7 13.88 ± 2% perf-profile.calltrace.cycles-pp.__vma_adjust.vma_merge.do_brk_flags.sys_brk.entry_SYSCALL_64_fastpath
> 54.84 ± 3% -0.7 54.15 perf-profile.calltrace.cycles-pp.sys_brk.entry_SYSCALL_64_fastpath
> 11.52 ± 9% -0.1 11.46 perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.sys_brk.entry_SYSCALL_64_fastpath
> 6.30 ± 5% +0.2 6.48 ± 3% perf-profile.calltrace.cycles-pp.security_vm_enough_memory_mm.do_brk_flags.sys_brk.entry_SYSCALL_64_fastpath
> 27.40 ± 3% +0.8 28.18 ± 4% perf-profile.calltrace.cycles-pp.secondary_startup_64
> 12.40 ± 94% +3.3 15.73 ± 62% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_kernel
> 13.18 ± 88% +3.4 16.55 ± 57% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64
> 13.18 ± 88% +3.4 16.55 ± 57% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_kernel.secondary_startup_64
> 13.18 ± 88% +3.4 16.55 ± 57% perf-profile.calltrace.cycles-pp.start_kernel.secondary_startup_64
> 13.14 ± 88% +3.4 16.53 ± 57% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64
> 14.22 ± 88% -2.6 11.63 ± 85% perf-profile.children.cycles-pp.start_secondary
> 45.83 ± 3% -1.2 44.59 perf-profile.children.cycles-pp.do_brk_flags
> 56.30 ± 2% -0.9 55.36 perf-profile.children.cycles-pp.entry_SYSCALL_64_fastpath
> 17.05 ± 3% -0.8 16.24 ± 3% perf-profile.children.cycles-pp.vma_merge
> 15.45 ± 3% -0.7 14.79 ± 2% perf-profile.children.cycles-pp.__vma_adjust
> 55.47 ± 3% -0.6 54.88 perf-profile.children.cycles-pp.sys_brk
> 12.21 ± 8% -0.1 12.08 perf-profile.children.cycles-pp.perf_event_mmap
> 6.40 ± 5% +0.2 6.57 ± 3% perf-profile.children.cycles-pp.security_vm_enough_memory_mm
> 27.41 ± 3% +0.8 28.19 ± 4% perf-profile.children.cycles-pp.do_idle
> 27.30 ± 3% +0.8 28.07 ± 4% perf-profile.children.cycles-pp.cpuidle_enter_state
> 27.40 ± 3% +0.8 28.18 ± 4% perf-profile.children.cycles-pp.secondary_startup_64
> 27.40 ± 3% +0.8 28.18 ± 4% perf-profile.children.cycles-pp.cpu_startup_entry
> 25.27 +0.9 26.19 perf-profile.children.cycles-pp.intel_idle
> 13.18 ± 88% +3.4 16.55 ± 57% perf-profile.children.cycles-pp.start_kernel
> 4.82 ± 9% +0.0 4.83 ± 5% perf-profile.self.cycles-pp.__vma_adjust
> 5.25 ± 9% +0.0 5.29 ± 2% perf-profile.self.cycles-pp.perf_event_mmap
> 5.33 ± 3% +0.4 5.75 ± 3% perf-profile.self.cycles-pp.do_brk_flags
> 25.26 +0.9 26.19 perf-profile.self.cycles-pp.intel_idle
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Xiaolong
> <config-4.14.0-01234-g63e02a2>
> <job.yaml>
> <reproduce>
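
The %change columns in the quoted tables can be cross-checked with a little arithmetic. The following is a minimal sketch in Python, not part of the lkp tooling: the helper names are hypothetical, and it assumes that rows whose metric name ends in "%" report an absolute difference in percentage points while all other rows report a relative change. The input values are copied from the writeseek1 and aim9 tables above.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change in percent, as in rows such as per_process_ops."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Absolute difference, assumed here for metrics already expressed in %."""
    return new - base


def cpi_from_ipc(ipc: float) -> float:
    """Cycles per instruction is the reciprocal of instructions per cycle."""
    return 1.0 / ipc


# (base, new) pairs copied from the quoted tables.
writeseek1_ops = (1902014, 1768039)   # will-it-scale.per_process_ops
aim9_brk_ops = (4124214, 3717599)     # aim9.brk_test.ops_per_sec
writeseek1_usr = (3.43, 4.26)         # mpstat.cpu.usr%

print(f"writeseek1 per_process_ops: {pct_change(*writeseek1_ops):+.1f}%")
print(f"aim9 brk_test ops_per_sec:  {pct_change(*aim9_brk_ops):+.1f}%")
print(f"writeseek1 mpstat.cpu.usr%: {point_change(*writeseek1_usr):+.1f}")
print(f"cpi implied by ipc 0.97:    {cpi_from_ipc(0.97):.2f}")
```

Because the tables print rounded values, recomputing a change from the displayed numbers can differ slightly from the reported figure in the last digit.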

2017-12-04 17:27:19

by Thomas Gleixner

Subject: Re: [lkp-robot] [x86/entry/64] 63e02a2a32: will-it-scale.per_process_ops -13.0% regression

On Sun, 3 Dec 2017, Andy Lutomirski wrote:

> Thomas, has my fix for this landed?

It's in the new series.

tglx

2017-12-06 21:34:13

by Ingo Molnar

Subject: Re: [lkp-robot] [x86/entry/64] 63e02a2a32: will-it-scale.per_process_ops -13.0% regression


* Thomas Gleixner wrote:

> On Sun, 3 Dec 2017, Andy Lutomirski wrote:
>
> > Thomas, has my fix for this landed?
>
> It's in the new series.

Should now be in tip:master and the affected branches as well.

Thanks,

Ingo