Greetings,
FYI, we noticed a 10.2% improvement in will-it-scale.per_thread_ops due to commit:
commit: e43de7f0862b8598cd1ef440e3b4701cd107ea40 ("fsnotify: optimize the case of no marks of any type")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: will-it-scale
on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with following parameters:
nr_task: 100%
mode: thread
test: eventfd1
cpufreq_governor: performance
ucode: 0x5003006
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
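For context, the eventfd1 testcase is essentially a tight write/read loop on a single eventfd, so each operation is one write(2) plus one read(2) passing through the VFS and fsnotify hooks. A minimal standalone sketch of that loop (not the exact will-it-scale source, which adds per-thread setup and an iteration counter):

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/eventfd.h>

int main(void)
{
	uint64_t val = 1;
	int fd = eventfd(0, 0);		/* counter starts at 0, no flags */

	if (fd < 0) {
		perror("eventfd");
		return 1;
	}

	for (;;) {
		/* post the counter, then drain it; neither call blocks */
		if (write(fd, &val, sizeof(val)) != sizeof(val))
			break;
		if (read(fd, &val, sizeof(val)) != sizeof(val))
			break;
		/* will-it-scale increments a per-thread ops counter here */
	}

	close(fd);
	return 0;
}

Run as 192 parallel copies (nr_task: 100% on this machine), the loop spends nearly all of its time in syscall entry/exit and eventfd_read()/eventfd_write(), which is why a per-event fsnotify check is visible in the profile below.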
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
bin/lkp run generated-yaml-file
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/thread/100%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap2/eventfd1/will-it-scale/0x5003006
commit:
ec44610fe2 ("fsnotify: count all objects with attached connectors")
e43de7f086 ("fsnotify: optimize the case of no marks of any type")
ec44610fe2b86dae e43de7f0862b8598cd1ef440e3b
---------------- ---------------------------
%stddev %change %stddev
\ | \
3.057e+08 +10.2% 3.368e+08 will-it-scale.192.threads
1592331 +10.2% 1754346 will-it-scale.per_thread_ops
3.057e+08 +10.2% 3.368e+08 will-it-scale.workload
18.46 +1.9 20.39 mpstat.cpu.all.usr%
0.04 ± 25% -53.1% 0.02 ± 63% perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.__x64_sys_nanosleep.do_syscall_64
401780 ± 4% -6.1% 377284 ± 5% proc-vmstat.numa_pte_updates
2561 ± 18% -19.0% 2075 ± 6% interrupts.CPU110.TLB:TLB_shootdowns
4743 ± 28% -30.0% 3318 ± 23% interrupts.CPU122.CAL:Function_call_interrupts
4311 ± 38% -27.9% 3110 ± 9% interrupts.CPU126.CAL:Function_call_interrupts
2586 ± 14% -18.3% 2112 ± 5% interrupts.CPU126.TLB:TLB_shootdowns
2558 ± 16% -16.7% 2131 ± 4% interrupts.CPU140.TLB:TLB_shootdowns
2581 ± 14% -18.0% 2117 ± 6% interrupts.CPU178.TLB:TLB_shootdowns
3670 ± 10% -14.0% 3155 ± 10% interrupts.CPU183.CAL:Function_call_interrupts
1.1e+11 -3.6% 1.06e+11 perf-stat.i.branch-instructions
0.43 ± 3% +0.2 0.65 perf-stat.i.branch-miss-rate%
4.684e+08 ± 2% +46.4% 6.859e+08 perf-stat.i.branch-misses
1352675 ± 4% -7.9% 1246174 perf-stat.i.cache-misses
1.00 +3.3% 1.04 perf-stat.i.cpi
571300 ± 5% +14.4% 653738 perf-stat.i.cycles-between-cache-misses
1.619e+11 -1.0% 1.604e+11 perf-stat.i.dTLB-loads
161216 +8.5% 174902 perf-stat.i.dTLB-store-misses
1.113e+11 -7.0% 1.035e+11 perf-stat.i.dTLB-stores
4.36e+08 ± 2% +50.0% 6.541e+08 ± 2% perf-stat.i.iTLB-load-misses
600928 ± 11% +173.9% 1645842 ± 34% perf-stat.i.iTLB-loads
5.529e+11 -3.1% 5.356e+11 perf-stat.i.instructions
1268 ± 2% -35.3% 820.45 ± 2% perf-stat.i.instructions-per-iTLB-miss
1.00 -3.2% 0.96 perf-stat.i.ipc
1995 -3.5% 1926 perf-stat.i.metric.M/sec
245924 ± 5% -8.5% 225135 perf-stat.i.node-load-misses
0.43 ± 3% +0.2 0.65 perf-stat.overall.branch-miss-rate%
1.00 +3.3% 1.04 perf-stat.overall.cpi
397012 ± 4% +8.1% 429116 ± 2% perf-stat.overall.cycles-between-cache-misses
0.00 +0.0 0.00 perf-stat.overall.dTLB-store-miss-rate%
1269 ± 2% -35.4% 819.49 ± 2% perf-stat.overall.instructions-per-iTLB-miss
1.00 -3.2% 0.96 perf-stat.overall.ipc
545194 -12.1% 479465 perf-stat.overall.path-length
1.096e+11 -3.6% 1.056e+11 perf-stat.ps.branch-instructions
4.668e+08 ± 2% +46.4% 6.836e+08 perf-stat.ps.branch-misses
1396634 ± 4% -7.6% 1290738 perf-stat.ps.cache-misses
9527800 ± 26% -16.0% 7998925 ± 3% perf-stat.ps.cache-references
1.614e+11 -1.0% 1.598e+11 perf-stat.ps.dTLB-loads
160900 +8.9% 175191 perf-stat.ps.dTLB-store-misses
1.109e+11 -7.0% 1.032e+11 perf-stat.ps.dTLB-stores
4.345e+08 ± 2% +50.0% 6.519e+08 ± 2% perf-stat.ps.iTLB-load-misses
601410 ± 12% +172.8% 1640540 ± 34% perf-stat.ps.iTLB-loads
5.511e+11 -3.1% 5.338e+11 perf-stat.ps.instructions
245861 ± 6% -8.7% 224556 perf-stat.ps.node-load-misses
1.667e+14 -3.1% 1.615e+14 perf-stat.total.instructions
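For reference, perf-stat.overall.path-length here appears to be total retired instructions divided by benchmark operations: roughly 1.667e+14 / 3.057e+08 ≈ 5.45e+05 instructions per op before the patch versus 1.615e+14 / 3.368e+08 ≈ 4.80e+05 after it, which matches the -12.1% drop and is consistent with the read/write path simply executing less code per event.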
28.69 -3.6 25.08 perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
35.83 -3.3 32.53 perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
40.48 -3.0 37.49 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
42.00 -2.8 39.16 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read
7.75 ± 4% -1.8 6.00 ± 2% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
56.76 -1.4 55.37 perf-profile.calltrace.cycles-pp.__libc_read
1.11 +0.0 1.15 perf-profile.calltrace.cycles-pp.___might_sleep.__might_fault._copy_to_iter.eventfd_read.new_sync_read
0.82 +0.0 0.86 ± 4% perf-profile.calltrace.cycles-pp.__x64_sys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
0.55 +0.0 0.60 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.__libc_read
0.55 +0.0 0.60 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.__libc_write
0.66 +0.1 0.71 perf-profile.calltrace.cycles-pp.__might_sleep.__might_fault._copy_from_user.eventfd_write.vfs_write
0.56 +0.1 0.62 perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
0.56 +0.1 0.62 perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
2.13 +0.1 2.19 perf-profile.calltrace.cycles-pp.__might_fault._copy_from_user.eventfd_write.vfs_write.ksys_write
0.62 +0.1 0.71 perf-profile.calltrace.cycles-pp.__might_sleep.__might_fault._copy_to_iter.eventfd_read.new_sync_read
1.09 +0.1 1.19 perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout._copy_to_iter.eventfd_read.new_sync_read
1.32 +0.1 1.42 perf-profile.calltrace.cycles-pp.fput_many.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
1.52 +0.1 1.64 perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
2.32 +0.1 2.45 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
1.52 +0.1 1.65 perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
2.30 +0.1 2.44 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
1.11 +0.1 1.25 perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string._copy_from_user.eventfd_write.vfs_write.ksys_write
1.43 +0.1 1.57 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_read
1.42 +0.1 1.56 ± 2% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__libc_write
1.34 +0.1 1.48 perf-profile.calltrace.cycles-pp.copy_user_generic_unrolled.copyout._copy_to_iter.eventfd_read.new_sync_read
1.74 +0.2 1.89 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.eventfd_read.new_sync_read.vfs_read.ksys_read
1.34 +0.2 1.50 perf-profile.calltrace.cycles-pp.copy_user_generic_unrolled._copy_from_user.eventfd_write.vfs_write.ksys_write
1.78 +0.2 1.95 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.eventfd_write.vfs_write.ksys_write.do_syscall_64
0.34 ± 70% +0.2 0.52 perf-profile.calltrace.cycles-pp.iov_iter_init.new_sync_read.vfs_read.ksys_read.do_syscall_64
1.94 +0.2 2.16 perf-profile.calltrace.cycles-pp.__might_fault._copy_to_iter.eventfd_read.new_sync_read.vfs_read
30.86 +0.2 31.10 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
2.99 +0.3 3.30 perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.eventfd_read.new_sync_read.vfs_read
0.18 ±141% +0.4 0.55 ± 2% perf-profile.calltrace.cycles-pp.testcase
32.36 +0.4 32.76 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_write
5.51 +0.4 5.92 perf-profile.calltrace.cycles-pp._copy_from_user.eventfd_write.vfs_write.ksys_write.do_syscall_64
3.47 ± 6% +0.5 3.98 ± 3% perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.vfs_read.ksys_read.do_syscall_64
3.52 ± 6% +0.5 4.04 ± 3% perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.vfs_write.ksys_write.do_syscall_64
7.81 +0.8 8.59 perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_read
7.84 +0.8 8.64 perf-profile.calltrace.cycles-pp.__entry_text_start.__libc_write
7.51 +0.8 8.31 perf-profile.calltrace.cycles-pp._copy_to_iter.eventfd_read.new_sync_read.vfs_read.ksys_read
4.39 ± 6% +1.0 5.36 ± 3% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
15.06 +1.1 16.15 perf-profile.calltrace.cycles-pp.new_sync_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
11.67 +1.2 12.84 perf-profile.calltrace.cycles-pp.eventfd_read.new_sync_read.vfs_read.ksys_read.do_syscall_64
9.32 +1.3 10.63 perf-profile.calltrace.cycles-pp.eventfd_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
47.12 +1.8 48.95 perf-profile.calltrace.cycles-pp.__libc_write
7.16 -7.2 0.00 perf-profile.children.cycles-pp.fsnotify
28.98 -3.8 25.18 perf-profile.children.cycles-pp.vfs_read
36.02 -3.3 32.74 perf-profile.children.cycles-pp.ksys_read
71.72 -2.7 68.99 perf-profile.children.cycles-pp.do_syscall_64
74.56 -2.4 72.14 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
56.97 -1.4 55.60 perf-profile.children.cycles-pp.__libc_read
0.35 -0.1 0.30 perf-profile.children.cycles-pp.fput
0.50 +0.0 0.52 perf-profile.children.cycles-pp.__pthread_disable_asynccancel
0.52 +0.0 0.54 perf-profile.children.cycles-pp.iov_iter_init
0.38 ± 2% +0.0 0.41 ± 2% perf-profile.children.cycles-pp.rcu_read_unlock_strict
0.85 +0.0 0.89 ± 3% perf-profile.children.cycles-pp.__x64_sys_read
2.25 +0.0 2.29 perf-profile.children.cycles-pp.___might_sleep
0.83 +0.1 0.88 ± 3% perf-profile.children.cycles-pp.__x64_sys_write
0.76 ± 2% +0.1 0.84 ± 2% perf-profile.children.cycles-pp.testcase
1.17 +0.1 1.28 perf-profile.children.cycles-pp.syscall_enter_from_user_mode
1.17 +0.1 1.28 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
2.70 +0.1 2.81 perf-profile.children.cycles-pp.fput_many
1.31 +0.1 1.45 perf-profile.children.cycles-pp.__might_sleep
3.16 +0.3 3.41 perf-profile.children.cycles-pp.exit_to_user_mode_prepare
2.38 +0.3 2.63 perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
4.98 +0.3 5.25 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
4.37 +0.3 4.68 perf-profile.children.cycles-pp.__might_fault
3.12 +0.3 3.44 perf-profile.children.cycles-pp.copyout
3.63 +0.3 3.96 perf-profile.children.cycles-pp._raw_spin_lock_irq
3.32 +0.4 3.69 perf-profile.children.cycles-pp.copy_user_generic_unrolled
5.80 +0.4 6.24 perf-profile.children.cycles-pp._copy_from_user
0.30 ± 5% +0.7 1.01 ± 3% perf-profile.children.cycles-pp.apparmor_file_permission
7.62 +0.8 8.43 perf-profile.children.cycles-pp._copy_to_iter
8.50 +0.8 9.31 perf-profile.children.cycles-pp.syscall_return_via_sysret
10.10 +1.0 11.09 perf-profile.children.cycles-pp.__entry_text_start
7.20 ± 5% +1.1 8.26 ± 3% perf-profile.children.cycles-pp.common_file_perm
15.25 +1.1 16.36 perf-profile.children.cycles-pp.new_sync_read
11.87 +1.2 13.05 perf-profile.children.cycles-pp.eventfd_read
9.49 +1.3 10.80 perf-profile.children.cycles-pp.eventfd_write
47.33 +1.8 49.18 perf-profile.children.cycles-pp.__libc_write
6.85 -6.9 0.00 perf-profile.self.cycles-pp.fsnotify
2.97 ± 3% -0.8 2.16 ± 5% perf-profile.self.cycles-pp.vfs_read
2.69 ± 3% -0.6 2.05 ± 7% perf-profile.self.cycles-pp.vfs_write
1.56 -0.1 1.50 ± 2% perf-profile.self.cycles-pp.ksys_write
0.25 ± 4% +0.0 0.27 perf-profile.self.cycles-pp.rcu_read_unlock_strict
0.77 +0.0 0.80 ± 3% perf-profile.self.cycles-pp.__x64_sys_read
2.20 +0.0 2.23 perf-profile.self.cycles-pp.___might_sleep
0.41 +0.0 0.45 perf-profile.self.cycles-pp.copyout
0.75 +0.0 0.80 ± 4% perf-profile.self.cycles-pp.__x64_sys_write
0.94 +0.1 1.00 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.62 +0.1 0.68 perf-profile.self.cycles-pp.__fdget_pos
0.67 ± 3% +0.1 0.74 ± 2% perf-profile.self.cycles-pp.testcase
0.87 +0.1 0.94 perf-profile.self.cycles-pp._copy_from_user
2.58 +0.1 2.66 perf-profile.self.cycles-pp.fput_many
1.39 +0.1 1.47 ± 2% perf-profile.self.cycles-pp.ksys_read
0.99 +0.1 1.08 perf-profile.self.cycles-pp.syscall_enter_from_user_mode
1.16 +0.1 1.26 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
1.15 +0.1 1.27 perf-profile.self.cycles-pp.__might_sleep
0.85 +0.1 0.97 ± 2% perf-profile.self.cycles-pp.__might_fault
2.50 +0.2 2.70 perf-profile.self.cycles-pp.eventfd_read
2.14 +0.2 2.37 perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
3.02 +0.2 3.26 perf-profile.self.cycles-pp.exit_to_user_mode_prepare
2.42 +0.3 2.69 perf-profile.self.cycles-pp._copy_to_iter
2.88 +0.3 3.19 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
3.46 +0.3 3.78 perf-profile.self.cycles-pp._raw_spin_lock_irq
3.12 +0.4 3.47 perf-profile.self.cycles-pp.copy_user_generic_unrolled
4.53 +0.4 4.93 perf-profile.self.cycles-pp.__entry_text_start
4.59 +0.4 5.00 perf-profile.self.cycles-pp.__libc_write
4.58 +0.4 5.01 ± 2% perf-profile.self.cycles-pp.__libc_read
1.92 ± 2% +0.7 2.61 ± 2% perf-profile.self.cycles-pp.eventfd_write
0.18 ± 8% +0.7 0.90 perf-profile.self.cycles-pp.apparmor_file_permission
8.43 +0.8 9.24 perf-profile.self.cycles-pp.syscall_return_via_sysret
5.64 ± 6% +0.9 6.58 ± 5% perf-profile.self.cycles-pp.common_file_perm
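The profile pinpoints where the gain comes from: perf-profile.self.cycles-pp.fsnotify falls from 6.85% to zero, i.e. the patched kernel now bails out of the notification hook before doing any real work when no marks exist anywhere. A self-contained toy model of that kind of early return (the s_fsnotify_connectors counter is suggested by the parent commit ec44610fe2 "fsnotify: count all objects with attached connectors", but the code below is only an illustration, not the upstream diff):

#include <stdatomic.h>
#include <stdio.h>

/* Toy model of the optimization's shape: keep one per-superblock count of
 * objects with marks attached and let the per-event hook return at once
 * when that count is zero. */
struct sb_model {
	atomic_long s_fsnotify_connectors;	/* objects with marks attached */
};

static int fsnotify_model(struct sb_model *sb)
{
	/* fast path: nothing on this "filesystem" is being watched */
	if (atomic_load(&sb->s_fsnotify_connectors) == 0)
		return 0;

	/* slow-path stand-in: walk marks and notify listening groups */
	puts("delivering event to marks");
	return 0;
}

int main(void)
{
	struct sb_model sb;

	atomic_init(&sb.s_fsnotify_connectors, 0);
	fsnotify_model(&sb);				/* skipped: no watchers */
	atomic_fetch_add(&sb.s_fsnotify_connectors, 1);	/* attach a mark */
	fsnotify_model(&sb);				/* now takes the slow path */
	return 0;
}

Since the test environment evidently has no inotify/fanotify marks attached (fsnotify disappears from the profile entirely), every read() and write() in the benchmark takes that fast path, which lines up with the ~10% throughput gain reported above.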
will-it-scale.per_thread_ops
1.8e+06 +-----------------------------------------------------------------+
|O O O O O OO |
1.6e+06 |-+ .+++.+ +.++ +.+++ ++. ++.+ ++. ++.++ +.+ +. .+ ++.+|
|+.++++ + ++.++ +.+ + + + + + +++ + |
| |
1.4e+06 |-+ |
| |
1.2e+06 |-+ |
| |
1e+06 |-+ |
| |
| |
800000 |-+ |
| O |
600000 +-----------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
Thanks,
Oliver Sang