Greetings,
FYI, we noticed a 5.6% improvement in will-it-scale.per_process_ops due to commit:
commit: 9bc0bb50727c8ac69fbb33fb937431cf3518ff37 ("objtool/x86: Rewrite retpoline thunk calls")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/core
in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with the following parameters:
nr_task: 16
mode: process
test: eventfd1
cpufreq_governor: performance
ucode: 0x5003006
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
Details are below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/16/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap2/eventfd1/will-it-scale/0x5003006
commit:
50e7b4a1a1 ("objtool: Skip magical retpoline .altinstr_replacement")
9bc0bb5072 ("objtool/x86: Rewrite retpoline thunk calls")
50e7b4a1a1b264fc 9bc0bb50727c8ac69fbb33fb937
---------------- ---------------------------
%stddev %change %stddev
\ | \
46843229 +5.6% 49479323 will-it-scale.16.processes
2927701 +5.6% 3092457 will-it-scale.per_process_ops
46843229 +5.6% 49479323 will-it-scale.workload
8251 ± 8% -19.6% 6635 ± 11% numa-vmstat.node0.nr_slab_reclaimable
33007 ± 8% -19.6% 26543 ± 11% numa-meminfo.node0.KReclaimable
33007 ± 8% -19.6% 26543 ± 11% numa-meminfo.node0.SReclaimable
1172 ± 12% -68.4% 370.67 ±141% perf-sched.wait_and_delay.avg.ms.futex_wait_queue_me.futex_wait.do_futex.__x64_sys_futex
1172 ± 12% -68.4% 370.65 ±141% perf-sched.wait_time.avg.ms.futex_wait_queue_me.futex_wait.do_futex.__x64_sys_futex
112.67 ± 20% -32.7% 75.83 ± 19% interrupts.CPU115.NMI:Non-maskable_interrupts
112.67 ± 20% -32.7% 75.83 ± 19% interrupts.CPU115.PMI:Performance_monitoring_interrupts
154.00 ± 43% -41.8% 89.67 ± 40% interrupts.CPU135.NMI:Non-maskable_interrupts
154.00 ± 43% -41.8% 89.67 ± 40% interrupts.CPU135.PMI:Performance_monitoring_interrupts
128.50 ± 16% -39.7% 77.50 ± 33% interrupts.CPU151.NMI:Non-maskable_interrupts
128.50 ± 16% -39.7% 77.50 ± 33% interrupts.CPU151.PMI:Performance_monitoring_interrupts
126.50 ± 19% -39.1% 77.00 ± 34% interrupts.CPU152.NMI:Non-maskable_interrupts
126.50 ± 19% -39.1% 77.00 ± 34% interrupts.CPU152.PMI:Performance_monitoring_interrupts
150.67 ± 49% -52.7% 71.33 ± 33% interrupts.CPU153.NMI:Non-maskable_interrupts
150.67 ± 49% -52.7% 71.33 ± 33% interrupts.CPU153.PMI:Performance_monitoring_interrupts
134.67 ± 30% -45.5% 73.33 ± 33% interrupts.CPU154.NMI:Non-maskable_interrupts
134.67 ± 30% -45.5% 73.33 ± 33% interrupts.CPU154.PMI:Performance_monitoring_interrupts
229.00 ± 82% -64.9% 80.33 ± 38% interrupts.CPU57.NMI:Non-maskable_interrupts
229.00 ± 82% -64.9% 80.33 ± 38% interrupts.CPU57.PMI:Performance_monitoring_interrupts
9305 ± 16% +30.4% 12133 ± 20% softirqs.CPU116.RCU
9674 ± 8% +17.7% 11391 ± 11% softirqs.CPU121.RCU
10950 ± 8% +13.3% 12402 ± 7% softirqs.CPU160.RCU
11054 ± 8% +14.6% 12663 ± 5% softirqs.CPU161.RCU
10764 ± 6% +16.6% 12548 ± 6% softirqs.CPU163.RCU
11073 ± 8% +20.4% 13337 ± 4% softirqs.CPU164.RCU
10840 ± 7% +18.1% 12797 ± 6% softirqs.CPU165.RCU
10935 ± 9% +19.5% 13066 ± 7% softirqs.CPU166.RCU
10791 ± 8% +17.0% 12629 ± 8% softirqs.CPU168.RCU
10152 ± 6% +17.1% 11892 ± 5% softirqs.CPU171.RCU
10644 ± 6% +13.0% 12032 ± 5% softirqs.CPU172.RCU
14639 ± 11% +20.5% 17644 ± 10% softirqs.CPU3.RCU
11177 ± 8% +13.4% 12671 ± 7% softirqs.CPU64.RCU
11039 ± 6% +15.3% 12730 ± 6% softirqs.CPU67.RCU
11218 ± 9% +17.9% 13225 ± 5% softirqs.CPU68.RCU
15014 ± 11% +17.8% 17688 ± 6% softirqs.CPU7.RCU
11300 ± 9% +17.4% 13267 ± 7% softirqs.CPU70.RCU
11094 ± 6% +18.1% 13099 ± 7% softirqs.CPU71.RCU
10930 ± 8% +15.5% 12620 ± 5% softirqs.CPU72.RCU
10800 ± 7% +15.8% 12509 ± 8% softirqs.CPU75.RCU
10822 ± 8% +14.7% 12412 ± 6% softirqs.CPU76.RCU
24155 ± 13% +26.9% 30649 ± 14% softirqs.CPU99.SCHED
1.633e+10 +3.7% 1.694e+10 perf-stat.i.branch-instructions
1.18 ± 9% -0.6 0.59 ± 15% perf-stat.i.branch-miss-rate%
1.881e+08 ± 5% -47.4% 98885807 ± 15% perf-stat.i.branch-misses
4905715 ± 12% -56.2% 2149255 ± 70% perf-stat.i.cache-misses
0.72 ± 5% -11.5% 0.64 ± 2% perf-stat.i.cpi
11933 ± 13% +264.7% 43517 ± 54% perf-stat.i.cycles-between-cache-misses
2.352e+10 +5.6% 2.484e+10 perf-stat.i.dTLB-loads
1.574e+10 +5.7% 1.664e+10 perf-stat.i.dTLB-stores
1.748e+08 -47.8% 91212835 ± 6% perf-stat.i.iTLB-load-misses
8.088e+10 +5.6% 8.541e+10 perf-stat.i.instructions
464.80 +104.0% 948.14 ± 6% perf-stat.i.instructions-per-iTLB-miss
1.42 +10.6% 1.57 ± 2% perf-stat.i.ipc
289.77 +5.1% 304.52 perf-stat.i.metric.M/sec
54032 ± 36% -49.7% 27155 ± 26% perf-stat.i.node-loads
1.15 ± 5% -0.6 0.58 ± 16% perf-stat.overall.branch-miss-rate%
0.70 -9.3% 0.64 ± 2% perf-stat.overall.cpi
11810 ± 13% +229.4% 38908 ± 51% perf-stat.overall.cycles-between-cache-misses
462.63 +103.3% 940.67 ± 6% perf-stat.overall.instructions-per-iTLB-miss
1.42 +10.4% 1.57 ± 2% perf-stat.overall.ipc
1.627e+10 +3.7% 1.688e+10 perf-stat.ps.branch-instructions
1.875e+08 ± 5% -47.4% 98557001 ± 15% perf-stat.ps.branch-misses
4889627 ± 12% -56.2% 2142836 ± 70% perf-stat.ps.cache-misses
2.344e+10 +5.6% 2.476e+10 perf-stat.ps.dTLB-loads
1.569e+10 +5.7% 1.659e+10 perf-stat.ps.dTLB-stores
1.742e+08 -47.8% 90889010 ± 6% perf-stat.ps.iTLB-load-misses
8.061e+10 +5.6% 8.512e+10 perf-stat.ps.instructions
53915 ± 36% -49.6% 27175 ± 26% perf-stat.ps.node-loads
2.442e+13 +5.3% 2.571e+13 perf-stat.total.instructions
14.71 ± 7% -2.0 12.67 ± 8% perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
3.80 ± 26% -1.5 2.29 ± 12% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
8.34 ± 7% -1.2 7.13 ± 8% perf-profile.calltrace.cycles-pp.eventfd_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.32 ± 31% -1.0 0.30 ±103% perf-profile.calltrace.cycles-pp.menu_select.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
4.89 ± 7% -0.9 3.98 ± 9% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.72 ± 6% -0.8 2.94 ± 7% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.70 ± 7% -0.6 2.15 ± 10% perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.vfs_read.ksys_read.do_syscall_64
2.85 ± 6% -0.5 2.36 ± 7% perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.vfs_write.ksys_write.do_syscall_64
0.72 ± 8% -0.4 0.28 ±100% perf-profile.calltrace.cycles-pp.___might_sleep.__might_fault._copy_from_user.eventfd_write.vfs_write
1.23 ± 8% -0.3 0.97 ± 10% perf-profile.calltrace.cycles-pp.__might_fault._copy_from_user.eventfd_write.vfs_write.ksys_write
14.85 ± 7% -2.0 12.81 ± 8% perf-profile.children.cycles-pp.vfs_write
8.61 ± 6% -1.7 6.93 ± 8% perf-profile.children.cycles-pp.security_file_permission
8.45 ± 7% -1.2 7.24 ± 8% perf-profile.children.cycles-pp.eventfd_write
5.70 ± 7% -1.1 4.64 ± 9% perf-profile.children.cycles-pp.common_file_perm
1.33 ± 31% -0.8 0.48 ± 28% perf-profile.children.cycles-pp.menu_select
3.46 ± 15% -0.8 2.68 ± 10% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.71 ± 7% -0.4 0.27 ± 10% perf-profile.children.cycles-pp.apparmor_file_permission
2.46 ± 7% -0.3 2.11 ± 9% perf-profile.children.cycles-pp.__might_fault
1.33 ± 7% -0.2 1.13 ± 9% perf-profile.children.cycles-pp.___might_sleep
0.38 ± 8% +0.1 0.48 ± 9% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
1.57 ± 55% -1.3 0.30 ± 21% perf-profile.self.cycles-pp.cpuidle_enter_state
4.39 ± 7% -1.1 3.27 ± 8% perf-profile.self.cycles-pp.common_file_perm
2.15 ± 6% -0.8 1.32 ± 11% perf-profile.self.cycles-pp.eventfd_write
0.98 ± 42% -0.8 0.20 ± 44% perf-profile.self.cycles-pp.menu_select
2.25 ± 7% -0.6 1.61 ± 8% perf-profile.self.cycles-pp.eventfd_read
0.57 ± 7% -0.3 0.27 ± 10% perf-profile.self.cycles-pp.apparmor_file_permission
1.32 ± 7% -0.2 1.12 ± 9% perf-profile.self.cycles-pp.___might_sleep
0.43 ± 7% -0.1 0.35 ± 9% perf-profile.self.cycles-pp.__might_fault
0.11 ± 12% -0.0 0.08 ± 16% perf-profile.self.cycles-pp.read_tsc
0.07 ± 5% +0.1 0.13 ± 11% perf-profile.self.cycles-pp.__x64_sys_write
0.07 ± 12% +0.1 0.14 ± 11% perf-profile.self.cycles-pp.__x64_sys_read
0.26 ± 10% +0.1 0.37 ± 8% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
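As a rough cross-check of the perf-stat rows, the derived ratios can be recomputed from the raw per-second counters. The sketch below assumes that branch-miss-rate% is branch-misses divided by branch-instructions and that instructions-per-iTLB-miss is instructions divided by iTLB-load-misses; the dictionary and function names are illustrative only and are not part of lkp-tests. Because the table values are rounded, the results are expected to agree only approximately.

```python
# Values copied from the perf-stat.ps.* rows of the table
# (parent commit 50e7b4a1a1 vs. commit 9bc0bb5072).
PS_COUNTERS = {
    "50e7b4a1a1": {
        "instructions": 8.061e10,
        "branch_instructions": 1.627e10,
        "branch_misses": 1.875e8,
        "itlb_load_misses": 1.742e8,
    },
    "9bc0bb5072": {
        "instructions": 8.512e10,
        "branch_instructions": 1.688e10,
        "branch_misses": 98557001,
        "itlb_load_misses": 90889010,
    },
}


def derived_ratios(counters):
    """Recompute two derived metrics from raw counters (assumed definitions)."""
    return {
        # Assumption: miss rate is misses over all branch instructions, in percent.
        "branch_miss_rate_pct": 100.0 * counters["branch_misses"]
        / counters["branch_instructions"],
        # Assumption: instructions retired per iTLB load miss.
        "instructions_per_itlb_miss": counters["instructions"]
        / counters["itlb_load_misses"],
    }


if __name__ == "__main__":
    for commit, counters in PS_COUNTERS.items():
        ratios = derived_ratios(counters)
        print(
            f"{commit}: branch-miss-rate {ratios['branch_miss_rate_pct']:.2f}%, "
            f"instructions per iTLB miss {ratios['instructions_per_itlb_miss']:.1f}"
        )
```

Under these assumptions the recomputed values land close to the perf-stat.overall.* rows, which suggests the halved branch misses and iTLB misses are internally consistent with the reported ratios.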
[ASCII plot omitted: will-it-scale.per_process_ops per sample, y-axis from 2.92e+06 to 3.12e+06]
[*] bisect-good sample
[O] bisect-bad sample
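The headline numbers can be re-derived from the will-it-scale.workload rows. This is a minimal sketch, assuming that per_process_ops is the workload total divided by nr_task (16 here) and that %change is the relative difference against the parent commit; the helper names are hypothetical and the values are copied from the table.

```python
NR_TASK = 16  # from the job parameters (nr_task: 16)

# will-it-scale.workload for the parent commit and the commit under test.
WORKLOAD = {
    "50e7b4a1a1": 46843229,
    "9bc0bb5072": 49479323,
}


def per_process_ops(workload, nr_task=NR_TASK):
    """Assumed relation: total operations split evenly across the processes."""
    return workload / nr_task


def pct_change(base, new):
    """Relative change of `new` against `base`, in percent."""
    return 100.0 * (new - base) / base


if __name__ == "__main__":
    base = WORKLOAD["50e7b4a1a1"]
    new = WORKLOAD["9bc0bb5072"]
    print(f"per_process_ops (parent): {per_process_ops(base):.0f}")
    print(f"per_process_ops (commit): {per_process_ops(new):.0f}")
    print(f"change: {pct_change(base, new):+.1f}%")
```

With these inputs the per-process values match the reported 2927701 and 3092457 once truncated to integers, and the change rounds to the reported +5.6%.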
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
---
0DAY/LKP+ Test Infrastructure
Open Source Technology Center, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
Thanks,
Oliver Sang