Greetings,
FYI, we noticed a 9.1% improvement of will-it-scale.per_thread_ops due to commit:
commit: 7c30f36a98ae488741178d69662e4f2baa53e7f6 ("io_uring: run __io_sq_thread() with the initial creds from io_uring_setup()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: will-it-scale
on test machine: 144 threads Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory
with following parameters:
nr_task: 50%
mode: thread
test: unix1
cpufreq_governor: performance
ucode: 0x16
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/thread/50%/debian-10.4-x86_64-20200603.cgz/lkp-hsw-4ex1/unix1/will-it-scale/0x16
commit:
678eeba481 ("io-wq: warn on creating manager while exiting")
7c30f36a98 ("io_uring: run __io_sq_thread() with the initial creds from io_uring_setup()")
678eeba481d8c161 7c30f36a98ae488741178d69662
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
30824092 +9.1% 33623774 will-it-scale.72.threads
428111 +9.1% 466996 will-it-scale.per_thread_ops
30824092 +9.1% 33623774 will-it-scale.workload
314351 ± 4% -8.6% 287222 numa-meminfo.node0.Unevictable
0.04 ±116% +27922.2% 9.90 ±123% perf-sched.sch_delay.max.ms.__traceiter_sched_switch.__traceiter_sched_switch.schedule_timeout.rcu_gp_kthread.kthread
15.00 +6.7% 16.00 vmstat.cpu.us
78587 ± 4% -8.6% 71805 numa-vmstat.node0.nr_unevictable
78587 ± 4% -8.6% 71805 numa-vmstat.node0.nr_zone_unevictable
1769 -12.7% 1544 syscalls.sys_read.med
1780 -9.4% 1613 syscalls.sys_write.med
19842 ± 3% -20.6% 15756 ± 12% softirqs.CPU11.RCU
12942 ± 8% +17.0% 15137 ± 10% softirqs.CPU134.RCU
13720 ± 11% +19.2% 16356 ± 10% softirqs.CPU55.RCU
36667 ± 8% -41.0% 21647 ± 38% softirqs.CPU83.SCHED
266.33 ± 8% -47.4% 140.00 ± 58% interrupts.CPU11.RES:Rescheduling_interrupts
1118 ± 19% -47.1% 592.00 ± 50% interrupts.CPU11.TLB:TLB_shootdowns
992.50 ± 14% -35.1% 643.67 ± 39% interrupts.CPU120.TLB:TLB_shootdowns
1914 ± 35% +136.0% 4518 ± 43% interrupts.CPU129.NMI:Non-maskable_interrupts
1914 ± 35% +136.0% 4518 ± 43% interrupts.CPU129.PMI:Performance_monitoring_interrupts
36.17 ± 71% +206.9% 111.00 ± 44% interrupts.CPU131.RES:Rescheduling_interrupts
1159 ± 18% +72.2% 1996 ± 32% interrupts.CPU134.CAL:Function_call_interrupts
374.83 ± 61% +139.0% 895.67 ± 40% interrupts.CPU134.TLB:TLB_shootdowns
2810 ± 37% +134.0% 6578 ± 33% interrupts.CPU45.NMI:Non-maskable_interrupts
2810 ± 37% +134.0% 6578 ± 33% interrupts.CPU45.PMI:Performance_monitoring_interrupts
1605 ± 19% +76.6% 2836 ± 48% interrupts.CPU52.CAL:Function_call_interrupts
2231 ± 27% -37.2% 1400 ± 22% interrupts.CPU62.CAL:Function_call_interrupts
6880 ± 25% -46.7% 3669 ± 57% interrupts.CPU62.NMI:Non-maskable_interrupts
6880 ± 25% -46.7% 3669 ± 57% interrupts.CPU62.PMI:Performance_monitoring_interrupts
226.50 ± 18% -47.5% 119.00 ± 63% interrupts.CPU62.RES:Rescheduling_interrupts
1169 ± 18% -44.4% 650.83 ± 52% interrupts.CPU62.TLB:TLB_shootdowns
235.00 ± 13% -59.4% 95.33 ± 65% interrupts.CPU63.RES:Rescheduling_interrupts
384.17 ± 64% +120.0% 845.33 ± 30% interrupts.CPU84.TLB:TLB_shootdowns
1870 ± 8% -26.7% 1370 ± 29% interrupts.CPU93.CAL:Function_call_interrupts
1092 ± 16% -45.3% 597.33 ± 66% interrupts.CPU93.TLB:TLB_shootdowns
3.702e+10 +9.1% 4.038e+10 perf-stat.i.branch-instructions
4.711e+08 +9.0% 5.134e+08 perf-stat.i.branch-misses
1.13 -8.4% 1.04 perf-stat.i.cpi
5.421e+10 +9.0% 5.909e+10 perf-stat.i.dTLB-loads
61939091 +8.9% 67468763 perf-stat.i.dTLB-store-misses
3.777e+10 +8.9% 4.112e+10 perf-stat.i.dTLB-stores
64979413 ± 2% +9.4% 71098260 ± 2% perf-stat.i.iTLB-load-misses
1.703e+08 ± 3% +14.3% 1.947e+08 ± 15% perf-stat.i.iTLB-loads
1.857e+11 +9.1% 2.026e+11 perf-stat.i.instructions
0.89 +9.2% 0.97 perf-stat.i.ipc
896.93 +9.0% 978.06 perf-stat.i.metric.M/sec
22535 ± 7% +13.9% 25662 ± 5% perf-stat.i.node-loads
0.07 -9.1% 0.06 ± 2% perf-stat.overall.MPKI
1.13 -8.4% 1.03 perf-stat.overall.cpi
0.89 +9.1% 0.97 perf-stat.overall.ipc
3.687e+10 +9.1% 4.023e+10 perf-stat.ps.branch-instructions
4.693e+08 +9.0% 5.116e+08 perf-stat.ps.branch-misses
5.399e+10 +9.1% 5.888e+10 perf-stat.ps.dTLB-loads
61667162 +9.0% 67198925 perf-stat.ps.dTLB-store-misses
3.761e+10 +8.9% 4.097e+10 perf-stat.ps.dTLB-stores
64692296 ± 2% +9.5% 70834658 ± 2% perf-stat.ps.iTLB-load-misses
1.695e+08 ± 3% +14.4% 1.939e+08 ± 15% perf-stat.ps.iTLB-loads
1.849e+11 +9.1% 2.018e+11 perf-stat.ps.instructions
23463 ± 8% +13.9% 26730 ± 8% perf-stat.ps.node-loads
5.594e+13 +9.1% 6.104e+13 perf-stat.total.instructions
31.07 -2.3 28.80 ± 9% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read
37.16 -2.2 35.01 ± 9% perf-profile.calltrace.cycles-pp.__libc_read
20.02 -1.5 18.51 ± 9% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
17.67 ± 2% -1.5 16.19 ± 9% perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
20.78 ± 2% -1.4 19.34 ± 9% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
16.56 -1.4 15.14 ± 9% perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
15.50 -1.3 14.21 ± 9% perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
14.88 ± 2% -1.3 13.62 ± 9% perf-profile.calltrace.cycles-pp.new_sync_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
16.59 -1.2 15.38 ± 9% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
13.54 -1.2 12.38 ± 9% perf-profile.calltrace.cycles-pp.new_sync_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
13.18 -1.2 12.02 ± 9% perf-profile.calltrace.cycles-pp.sock_read_iter.new_sync_read.vfs_read.ksys_read.do_syscall_64
14.42 ± 2% -1.1 13.28 ± 9% perf-profile.calltrace.cycles-pp.sock_write_iter.new_sync_write.vfs_write.ksys_write.do_syscall_64
13.50 ± 2% -1.0 12.52 ± 9% perf-profile.calltrace.cycles-pp.sock_sendmsg.sock_write_iter.new_sync_write.vfs_write.ksys_write
12.25 ± 2% -0.9 11.36 ± 9% perf-profile.calltrace.cycles-pp.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.new_sync_write.vfs_write
3.14 ± 2% -0.8 2.31 ± 9% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.__libc_read
10.57 -0.8 9.81 ± 9% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.__libc_read
1.70 ± 2% -0.5 1.17 ± 9% perf-profile.calltrace.cycles-pp.sock_recvmsg.sock_read_iter.new_sync_read.vfs_read.ksys_read
2.15 ± 2% -0.4 1.72 ± 9% perf-profile.calltrace.cycles-pp.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.new_sync_write
1.05 ± 2% -0.2 0.81 ± 9% perf-profile.calltrace.cycles-pp.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
0.60 -0.2 0.44 ± 44% perf-profile.calltrace.cycles-pp.unix_write_space.sock_wfree.unix_destruct_scm.skb_release_head_state.skb_release_all
1.52 ± 2% -0.2 1.36 ± 8% perf-profile.calltrace.cycles-pp.unix_destruct_scm.skb_release_head_state.skb_release_all.consume_skb.unix_stream_read_generic
1.61 -0.2 1.46 ± 8% perf-profile.calltrace.cycles-pp.skb_release_head_state.skb_release_all.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
1.64 -0.2 1.49 ± 9% perf-profile.calltrace.cycles-pp.skb_release_all.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
40.84 -2.9 37.89 ± 9% perf-profile.children.cycles-pp.do_syscall_64
37.53 -2.1 35.38 ± 9% perf-profile.children.cycles-pp.__libc_read
17.70 ± 2% -1.5 16.22 ± 9% perf-profile.children.cycles-pp.ksys_write
16.60 -1.4 15.19 ± 9% perf-profile.children.cycles-pp.vfs_write
15.55 -1.3 14.26 ± 9% perf-profile.children.cycles-pp.vfs_read
14.91 ± 2% -1.3 13.65 ± 9% perf-profile.children.cycles-pp.new_sync_write
16.62 -1.2 15.41 ± 9% perf-profile.children.cycles-pp.ksys_read
13.22 -1.2 12.06 ± 9% perf-profile.children.cycles-pp.sock_read_iter
13.58 -1.2 12.42 ± 9% perf-profile.children.cycles-pp.new_sync_read
14.47 ± 2% -1.1 13.33 ± 9% perf-profile.children.cycles-pp.sock_write_iter
13.52 ± 2% -1.0 12.55 ± 9% perf-profile.children.cycles-pp.sock_sendmsg
12.36 ± 2% -0.9 11.42 ± 9% perf-profile.children.cycles-pp.unix_stream_sendmsg
5.51 ± 2% -0.8 4.72 ± 8% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
1.71 ± 2% -0.5 1.20 ± 9% perf-profile.children.cycles-pp.sock_recvmsg
2.18 ± 2% -0.4 1.74 ± 9% perf-profile.children.cycles-pp.skb_copy_datagram_from_iter
1.22 ± 2% -0.3 0.88 ± 8% perf-profile.children.cycles-pp.__x86_retpoline_rax
0.52 -0.2 0.31 ± 9% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.93 -0.2 0.72 ± 9% perf-profile.children.cycles-pp.fsnotify
2.25 ± 2% -0.2 2.06 ± 9% perf-profile.children.cycles-pp.__check_object_size
1.55 ± 2% -0.2 1.40 ± 8% perf-profile.children.cycles-pp.unix_destruct_scm
1.64 -0.2 1.49 ± 9% perf-profile.children.cycles-pp.skb_release_all
1.62 -0.2 1.47 ± 8% perf-profile.children.cycles-pp.skb_release_head_state
0.58 -0.1 0.44 ± 8% perf-profile.children.cycles-pp.__virt_addr_valid
0.47 ± 4% -0.1 0.35 ± 9% perf-profile.children.cycles-pp.wait_for_unix_gc
0.61 ± 2% -0.1 0.51 ± 8% perf-profile.children.cycles-pp.unix_write_space
0.40 -0.1 0.31 ± 10% perf-profile.children.cycles-pp.__x64_sys_read
0.63 -0.1 0.55 ± 9% perf-profile.children.cycles-pp.__might_sleep
0.35 ± 3% -0.1 0.29 ± 8% perf-profile.children.cycles-pp.apparmor_socket_getpeersec_dgram
0.13 -0.0 0.08 ± 8% perf-profile.children.cycles-pp.check_stack_object
0.13 ± 5% -0.0 0.09 ± 10% perf-profile.children.cycles-pp.unix_scm_to_skb
0.45 ± 3% -0.0 0.42 ± 8% perf-profile.children.cycles-pp.security_socket_getpeersec_dgram
0.09 ± 5% -0.0 0.06 ± 11% perf-profile.children.cycles-pp.rcu_nocb_flush_deferred_wakeup
0.57 ± 2% +0.1 0.71 ± 12% perf-profile.children.cycles-pp.__ksize
1.12 ± 2% -0.8 0.37 ± 9% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
0.50 ± 2% -0.4 0.09 ± 12% perf-profile.self.cycles-pp.sock_recvmsg
1.13 -0.4 0.73 ± 9% perf-profile.self.cycles-pp.sock_read_iter
0.98 ± 3% -0.3 0.68 ± 9% perf-profile.self.cycles-pp.__x86_retpoline_rax
0.36 -0.2 0.13 ± 8% perf-profile.self.cycles-pp.skb_copy_datagram_from_iter
0.89 -0.2 0.69 ± 9% perf-profile.self.cycles-pp.fsnotify
0.29 ± 3% -0.2 0.10 ± 9% perf-profile.self.cycles-pp.security_socket_recvmsg
0.93 ± 2% -0.2 0.74 ± 9% perf-profile.self.cycles-pp.sock_write_iter
0.92 ± 4% -0.2 0.76 ± 10% perf-profile.self.cycles-pp.ftrace_syscall_exit
0.40 ± 2% -0.2 0.25 ± 10% perf-profile.self.cycles-pp.exit_to_user_mode_prepare
1.19 ± 2% -0.1 1.04 ± 9% perf-profile.self.cycles-pp.unix_stream_sendmsg
0.56 ± 2% -0.1 0.42 ± 9% perf-profile.self.cycles-pp.__virt_addr_valid
0.46 ± 6% -0.1 0.34 ± 9% perf-profile.self.cycles-pp.syscall_trace_enter
0.37 ± 2% -0.1 0.26 ± 8% perf-profile.self.cycles-pp.new_sync_write
0.25 ± 3% -0.1 0.14 ± 11% perf-profile.self.cycles-pp.alloc_skb_with_frags
0.34 -0.1 0.24 ± 10% perf-profile.self.cycles-pp.__x64_sys_read
0.59 -0.1 0.50 ± 8% perf-profile.self.cycles-pp.unix_write_space
0.40 ± 2% -0.1 0.31 ± 9% perf-profile.self.cycles-pp.unix_destruct_scm
0.55 ± 2% -0.1 0.47 ± 9% perf-profile.self.cycles-pp.__alloc_skb
0.28 ± 2% -0.1 0.20 ± 9% perf-profile.self.cycles-pp.ksys_write
0.48 ± 2% -0.1 0.41 ± 10% perf-profile.self.cycles-pp.vfs_read
0.16 ± 6% -0.1 0.09 ± 13% perf-profile.self.cycles-pp.skb_copy_datagram_iter
0.49 ± 2% -0.1 0.43 ± 9% perf-profile.self.cycles-pp.unix_stream_recvmsg
0.13 ± 5% -0.1 0.08 ± 11% perf-profile.self.cycles-pp.wait_for_unix_gc
0.13 ± 5% -0.1 0.07 ± 10% perf-profile.self.cycles-pp.unix_scm_to_skb
0.21 ± 5% -0.1 0.16 ± 11% perf-profile.self.cycles-pp.sock_alloc_send_pskb
0.48 -0.1 0.43 ± 10% perf-profile.self.cycles-pp.vfs_write
0.08 ± 7% -0.0 0.03 ± 70% perf-profile.self.cycles-pp.rcu_nocb_flush_deferred_wakeup
0.11 ± 4% -0.0 0.06 ± 11% perf-profile.self.cycles-pp.check_stack_object
0.30 ± 3% -0.0 0.26 ± 8% perf-profile.self.cycles-pp.apparmor_socket_getpeersec_dgram
0.14 ± 4% -0.0 0.10 ± 9% perf-profile.self.cycles-pp.sock_sendmsg
0.22 ± 4% -0.0 0.18 ± 9% perf-profile.self.cycles-pp.do_syscall_64
0.21 ± 2% -0.0 0.18 ± 12% perf-profile.self.cycles-pp.__skb_datagram_iter
0.27 ± 2% -0.0 0.23 ± 9% perf-profile.self.cycles-pp.__x64_sys_write
0.23 ± 4% -0.0 0.19 ± 8% perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
0.18 ± 3% +0.1 0.24 ± 8% perf-profile.self.cycles-pp.security_file_permission
0.23 ± 2% +0.1 0.31 ± 8% perf-profile.self.cycles-pp.ksys_read
0.22 ± 3% +0.1 0.30 ± 9% perf-profile.self.cycles-pp.apparmor_socket_recvmsg
0.55 +0.1 0.70 ± 12% perf-profile.self.cycles-pp.__ksize
will-it-scale.per_thread_ops
480000 +------------------------------------------------------------------+
475000 |-+ OO O O O O |
| O O O O O O O O O |
470000 |-O O O O O O O O O |
465000 |-+ O O O O O O O O O |
460000 |-+ .+.+.+.+.+. .+.+. .+.+ |
455000 |.+.+ +.+.+ +.+.+ +. |
| +.+ |
450000 |-+ : |
445000 |-+ : |
440000 |-+ : |
435000 |-+ : |
| : |
430000 |-+ +.+.+.+.+.+ |
425000 +------------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
Thanks,
Oliver Sang