Greetings,
FYI, we noticed a 6.1% improvement in unixbench.score due to commit:
commit: e902b4cafb16469f4458c6b8f9ba60f14872813b ("[PATCH 4/4] fsnotify: optimize the case of no marks of any type")
url: https://github.com/0day-ci/linux/commits/Amir-Goldstein/Performance-optimization-for-no-fsnotify-marks/20210804-020522
base: https://git.kernel.org/cgit/linux/kernel/git/jack/linux-fs.git fsnotify
in testcase: unixbench
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) CPU @ 2.30GHz with 128G memory
with following parameters:
runtime: 300s
nr_task: 1
test: pipe
cpufreq_governor: performance
ucode: 0x4003006
test-description: UnixBench is the original BYTE UNIX benchmark suite; it aims to test the performance of Unix-like systems.
test-url: https://github.com/kdlucas/byte-unixbench
In addition, the commit also has a significant impact on the following tests:
+------------------+-------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops 11.6% improvement                      |
| test machine     | 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory |
| test parameters  | cpufreq_governor=performance                                                        |
|                  | mode=process                                                                        |
|                  | nr_task=100%                                                                        |
|                  | test=eventfd1                                                                       |
|                  | ucode=0x5003006                                                                     |
+------------------+-------------------------------------------------------------------------------------+
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
bin/lkp run generated-yaml-file
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/1/debian-10.4-x86_64-20200603.cgz/300s/lkp-csl-2sp4/pipe/unixbench/0x4003006
commit:
945579ec98 ("fsnotify: count all objects with attached connectors")
e902b4cafb ("fsnotify: optimize the case of no marks of any type")
945579ec988af6b6 e902b4cafb16469f4458c6b8f9b
---------------- ---------------------------
%stddev %change %stddev
\ | \
1557 +6.1% 1652 unixbench.score
7.56e+08 +6.2% 8.031e+08 unixbench.workload
0.04 ±101% +0.1 0.09 ± 18% perf-profile.self.cycles-pp.tick_nohz_tick_stopped
14904 ± 4% +24.9% 18614 ± 24% softirqs.CPU49.RCU
46992 ± 6% +8.4% 50924 ± 3% softirqs.CPU61.SCHED
15704 ± 10% +25.6% 19720 ± 19% softirqs.CPU8.RCU
15828 ± 9% +26.4% 20004 ± 21% softirqs.CPU9.RCU
8.648e+08 -4.8% 8.235e+08 perf-stat.i.dTLB-stores
5188634 ± 3% +10.5% 5731846 ± 5% perf-stat.i.iTLB-load-misses
999.21 ± 2% -10.7% 892.04 ± 4% perf-stat.i.instructions-per-iTLB-miss
929.00 ± 2% -11.0% 826.79 ± 4% perf-stat.overall.instructions-per-iTLB-miss
2484 -7.6% 2296 perf-stat.overall.path-length
8.625e+08 -4.8% 8.213e+08 perf-stat.ps.dTLB-stores
5175537 ± 3% +10.5% 5717256 ± 5% perf-stat.ps.iTLB-load-misses
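The %change column can be re-derived from the two mean columns. The following is a minimal sketch, assuming %change is simply the relative difference between the two means; the helper name `pct_change` is ours for illustration and is not part of lkp-tests.

```python
def pct_change(base: float, patched: float) -> float:
    """Relative change of the patched mean versus the base mean, in percent."""
    return (patched - base) / base * 100.0


# (base, patched) means copied from the unixbench table above.
rows = {
    "unixbench.score": (1557, 1652),
    "unixbench.workload": (7.56e08, 8.031e08),
    "perf-stat.overall.path-length": (2484, 2296),
}

for name, (base, patched) in rows.items():
    # One decimal place, matching the precision used in the report.
    print(f"{name}: {pct_change(base, patched):+.1f}%")
```

Rows whose metric is itself a percentage (for example the perf-profile cycles-pp entries) appear to report an absolute difference in percentage points instead, so this helper would not apply to them.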
unixbench.score
[ASCII plot not reproduced: y-axis from 1500 to 1700; the samples marked O lie at roughly 1640-1680, the remaining samples at roughly 1520-1600]
[*] bisect-good sample
[O] bisect-bad sample
***************************************************************************************************
lkp-csl-2ap2: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/100%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap2/eventfd1/will-it-scale/0x5003006
commit:
945579ec98 ("fsnotify: count all objects with attached connectors")
e902b4cafb ("fsnotify: optimize the case of no marks of any type")
945579ec988af6b6 e902b4cafb16469f4458c6b8f9b
---------------- ---------------------------
%stddev %change %stddev
\ | \
3.369e+08 +11.6% 3.759e+08 will-it-scale.192.processes
1754466 +11.6% 1958062 will-it-scale.per_process_ops
3.369e+08 +11.6% 3.759e+08 will-it-scale.workload
17.74 +1.9 19.63 mpstat.cpu.all.usr%
5685 ± 35% +39.7% 7940 ± 18% interrupts.CPU65.NMI:Non-maskable_interrupts
5685 ± 35% +39.7% 7940 ± 18% interrupts.CPU65.PMI:Performance_monitoring_interrupts
797.00 ± 4% +16.4% 927.86 ± 7% perf-sched.wait_and_delay.count.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
3.13 ± 17% -13.2% 2.72 ± 6% perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork
1.121e+11 -3.4% 1.082e+11 perf-stat.i.branch-instructions
0.62 +0.1 0.73 perf-stat.i.branch-miss-rate%
6.886e+08 +13.3% 7.804e+08 perf-stat.i.branch-misses
0.97 +3.0% 1.00 perf-stat.i.cpi
154124 +9.0% 167948 perf-stat.i.dTLB-store-misses
1.133e+11 -7.2% 1.051e+11 perf-stat.i.dTLB-stores
6.466e+08 +15.7% 7.483e+08 perf-stat.i.iTLB-load-misses
596449 ± 15% -67.7% 192654 ± 17% perf-stat.i.iTLB-loads
5.751e+11 -2.6% 5.602e+11 perf-stat.i.instructions
897.64 -16.2% 752.55 perf-stat.i.instructions-per-iTLB-miss
1.03 -2.9% 1.00 perf-stat.i.ipc
72.48 -5.1% 68.81 ± 2% perf-stat.i.metric.K/sec
2047 -3.3% 1980 perf-stat.i.metric.M/sec
0.61 +0.1 0.72 perf-stat.overall.branch-miss-rate%
0.97 +3.0% 1.00 perf-stat.overall.cpi
0.00 ± 4% -0.0 0.00 ± 4% perf-stat.overall.dTLB-load-miss-rate%
0.00 +0.0 0.00 perf-stat.overall.dTLB-store-miss-rate%
889.67 -15.9% 748.21 perf-stat.overall.instructions-per-iTLB-miss
1.03 -2.9% 1.00 perf-stat.overall.ipc
514472 -12.7% 448981 perf-stat.overall.path-length
1.117e+11 -3.4% 1.078e+11 perf-stat.ps.branch-instructions
6.863e+08 +13.4% 7.78e+08 perf-stat.ps.branch-misses
527168 ± 4% -6.9% 490865 ± 4% perf-stat.ps.dTLB-load-misses
154185 +8.9% 167864 perf-stat.ps.dTLB-store-misses
1.129e+11 -7.2% 1.047e+11 perf-stat.ps.dTLB-stores
6.444e+08 +15.8% 7.462e+08 perf-stat.ps.iTLB-load-misses
594991 ± 16% -67.8% 191420 ± 17% perf-stat.ps.iTLB-loads
5.731e+11 -2.6% 5.583e+11 perf-stat.ps.instructions
1.733e+14 -2.6% 1.688e+14 perf-stat.total.instructions
31.40 -3.4 28.03 perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
35.75 -3.2 32.52 perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
40.96 -2.8 38.11 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
42.55 -2.7 39.90 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
8.47 -2.0 6.46 perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
57.80 -1.1 56.74 perf-profile.calltrace.cycles-pp.read
25.58 -0.5 25.08 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
20.90 -0.5 20.40 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.64 +0.0 0.67 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
0.59 ± 3% +0.1 0.65 ± 7% perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.56 +0.1 0.62 perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
0.57 +0.1 0.63 perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.60 +0.1 0.66 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.write
0.60 +0.1 0.67 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.read
1.16 +0.1 1.26 perf-profile.calltrace.cycles-pp.___might_sleep.__might_fault._copy_from_user.eventfd_write.vfs_write
0.76 +0.1 0.87 perf-profile.calltrace.cycles-pp.__might_sleep.__might_fault._copy_from_user.eventfd_write.vfs_write
1.16 +0.1 1.27 perf-profile.calltrace.cycles-pp.___might_sleep.__might_fault._copy_to_iter.eventfd_read.new_sync_read
0.70 ± 3% +0.1 0.81 ± 3% perf-profile.calltrace.cycles-pp.__x64_sys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
1.18 +0.1 1.32 perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout._copy_to_iter.eventfd_read.new_sync_read
0.71 +0.1 0.85 perf-profile.calltrace.cycles-pp.__might_sleep.__might_fault._copy_to_iter.eventfd_read.new_sync_read
1.20 +0.1 1.35 perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string._copy_from_user.eventfd_write.vfs_write.ksys_write
1.48 +0.2 1.64 perf-profile.calltrace.cycles-pp.copy_user_generic_unrolled.copyout._copy_to_iter.eventfd_read.new_sync_read
0.84 ± 7% +0.2 1.01 ± 3% perf-profile.calltrace.cycles-pp.aa_file_perm.common_file_perm.security_file_permission.vfs_read.ksys_read
1.47 +0.2 1.64 perf-profile.calltrace.cycles-pp.copy_user_generic_unrolled._copy_from_user.eventfd_write.vfs_write.ksys_write
1.62 +0.2 1.80 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.read
1.62 +0.2 1.80 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
0.84 ± 7% +0.2 1.02 ± 4% perf-profile.calltrace.cycles-pp.aa_file_perm.common_file_perm.security_file_permission.vfs_write.ksys_write
2.06 +0.2 2.24 perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
1.88 +0.2 2.08 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.eventfd_read.new_sync_read.vfs_read.ksys_read
2.03 +0.2 2.24 perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
2.48 +0.2 2.70 perf-profile.calltrace.cycles-pp.__fget_light.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.91 +0.2 2.14 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.eventfd_write.vfs_write.ksys_write.do_syscall_64
2.45 +0.2 2.67 perf-profile.calltrace.cycles-pp.__fget_light.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.72 +0.2 2.95 perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
2.26 +0.2 2.50 perf-profile.calltrace.cycles-pp.__might_fault._copy_from_user.eventfd_write.vfs_write.ksys_write
2.67 +0.3 2.92 perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
2.99 +0.3 3.25 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
2.97 +0.3 3.23 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
2.16 +0.3 2.42 perf-profile.calltrace.cycles-pp.__might_fault._copy_to_iter.eventfd_read.new_sync_read.vfs_read
0.26 ±100% +0.3 0.56 ± 2% perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_read.ksys_read.do_syscall_64
5.38 +0.4 5.75 perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.27 +0.4 3.65 perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.eventfd_read.new_sync_read.vfs_read
3.97 +0.4 4.37 perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.vfs_write.ksys_write.do_syscall_64
3.86 +0.5 4.31 perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.vfs_read.ksys_read.do_syscall_64
5.94 +0.7 6.61 perf-profile.calltrace.cycles-pp._copy_from_user.eventfd_write.vfs_write.ksys_write.do_syscall_64
8.62 +0.9 9.52 perf-profile.calltrace.cycles-pp.__entry_text_start.write
8.60 +0.9 9.51 perf-profile.calltrace.cycles-pp.__entry_text_start.read
8.34 +1.0 9.30 perf-profile.calltrace.cycles-pp._copy_to_iter.eventfd_read.new_sync_read.vfs_read.ksys_read
13.01 +1.5 14.49 perf-profile.calltrace.cycles-pp.eventfd_read.new_sync_read.vfs_read.ksys_read.do_syscall_64
16.80 +1.6 18.38 perf-profile.calltrace.cycles-pp.new_sync_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
47.54 +1.6 49.18 perf-profile.calltrace.cycles-pp.write
10.06 +1.7 11.74 perf-profile.calltrace.cycles-pp.eventfd_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.26 ± 2% -7.3 0.00 perf-profile.children.cycles-pp.fsnotify
31.69 -3.6 28.14 perf-profile.children.cycles-pp.vfs_read
35.84 -3.2 32.62 perf-profile.children.cycles-pp.ksys_read
72.09 -2.9 69.15 perf-profile.children.cycles-pp.do_syscall_64
75.16 -2.6 72.59 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
14.13 -1.8 12.35 perf-profile.children.cycles-pp.security_file_permission
57.85 -1.0 56.80 perf-profile.children.cycles-pp.read
21.19 -0.6 20.60 perf-profile.children.cycles-pp.vfs_write
25.72 -0.5 25.20 perf-profile.children.cycles-pp.ksys_write
0.15 ± 3% +0.0 0.17 ± 4% perf-profile.children.cycles-pp.rcu_read_unlock_strict
0.24 ± 2% +0.0 0.26 perf-profile.children.cycles-pp.read@plt
0.34 +0.0 0.37 perf-profile.children.cycles-pp.rcu_nocb_flush_deferred_wakeup
0.48 +0.0 0.52 ± 2% perf-profile.children.cycles-pp.iov_iter_init
1.12 +0.1 1.19 perf-profile.children.cycles-pp.apparmor_file_permission
0.71 ± 2% +0.1 0.79 perf-profile.children.cycles-pp.testcase
0.76 ± 3% +0.1 0.86 ± 2% perf-profile.children.cycles-pp.__x64_sys_write
1.16 +0.1 1.27 perf-profile.children.cycles-pp.syscall_enter_from_user_mode
1.23 ± 3% +0.1 1.36 ± 7% perf-profile.children.cycles-pp.rw_verify_area
1.28 +0.1 1.42 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
2.33 +0.2 2.54 perf-profile.children.cycles-pp.___might_sleep
1.51 +0.3 1.76 perf-profile.children.cycles-pp.__might_sleep
2.59 +0.3 2.90 perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
1.72 ± 7% +0.4 2.08 ± 4% perf-profile.children.cycles-pp.aa_file_perm
3.42 +0.4 3.82 perf-profile.children.cycles-pp.copyout
4.20 +0.4 4.61 perf-profile.children.cycles-pp.exit_to_user_mode_prepare
3.93 +0.4 4.36 perf-profile.children.cycles-pp._raw_spin_lock_irq
3.63 +0.4 4.06 perf-profile.children.cycles-pp.copy_user_generic_unrolled
4.94 +0.4 5.38 perf-profile.children.cycles-pp.__fget_light
5.78 +0.5 6.30 perf-profile.children.cycles-pp.__fdget_pos
6.42 +0.5 6.95 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
4.73 +0.6 5.28 perf-profile.children.cycles-pp.__might_fault
6.22 +0.7 6.93 perf-profile.children.cycles-pp._copy_from_user
8.07 +0.9 8.96 perf-profile.children.cycles-pp.common_file_perm
8.50 +1.0 9.48 perf-profile.children.cycles-pp._copy_to_iter
9.44 +1.0 10.45 perf-profile.children.cycles-pp.syscall_return_via_sysret
11.14 +1.1 12.28 perf-profile.children.cycles-pp.__entry_text_start
13.26 +1.5 14.77 perf-profile.children.cycles-pp.eventfd_read
17.03 +1.6 18.64 perf-profile.children.cycles-pp.new_sync_read
47.57 +1.7 49.24 perf-profile.children.cycles-pp.write
10.24 +1.7 11.94 perf-profile.children.cycles-pp.eventfd_write
6.97 ± 2% -7.0 0.00 perf-profile.self.cycles-pp.fsnotify
3.18 ± 2% -0.8 2.40 perf-profile.self.cycles-pp.vfs_read
2.83 ± 2% -0.5 2.31 ± 3% perf-profile.self.cycles-pp.vfs_write
1.61 -0.2 1.43 ± 2% perf-profile.self.cycles-pp.ksys_write
0.14 ± 3% +0.0 0.16 ± 4% perf-profile.self.cycles-pp.rcu_read_unlock_strict
0.33 +0.0 0.37 perf-profile.self.cycles-pp.rcu_nocb_flush_deferred_wakeup
0.46 +0.0 0.50 perf-profile.self.cycles-pp.copyout
0.40 +0.0 0.44 ± 2% perf-profile.self.cycles-pp.iov_iter_init
0.80 ± 2% +0.0 0.85 ± 2% perf-profile.self.cycles-pp.__x64_sys_read
1.01 +0.0 1.06 perf-profile.self.cycles-pp.apparmor_file_permission
1.10 +0.1 1.16 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.54 ± 3% +0.1 0.60 ± 2% perf-profile.self.cycles-pp.testcase
1.30 +0.1 1.37 perf-profile.self.cycles-pp.ksys_read
0.92 +0.1 1.00 perf-profile.self.cycles-pp.syscall_enter_from_user_mode
0.98 +0.1 1.07 perf-profile.self.cycles-pp.__fdget_pos
0.92 +0.1 1.02 perf-profile.self.cycles-pp._copy_from_user
0.73 ± 2% +0.1 0.84 ± 2% perf-profile.self.cycles-pp.__x64_sys_write
0.94 +0.1 1.05 perf-profile.self.cycles-pp.__might_fault
1.13 ± 3% +0.1 1.25 ± 7% perf-profile.self.cycles-pp.rw_verify_area
3.32 +0.1 3.45 ± 2% perf-profile.self.cycles-pp.new_sync_read
1.27 +0.1 1.40 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
2.28 +0.2 2.48 perf-profile.self.cycles-pp.___might_sleep
1.32 +0.2 1.54 perf-profile.self.cycles-pp.__might_sleep
2.36 +0.3 2.64 perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
2.79 +0.3 3.09 perf-profile.self.cycles-pp._copy_to_iter
1.43 ± 8% +0.3 1.76 ± 5% perf-profile.self.cycles-pp.aa_file_perm
2.84 +0.3 3.17 perf-profile.self.cycles-pp.eventfd_read
3.70 +0.4 4.07 perf-profile.self.cycles-pp.exit_to_user_mode_prepare
3.12 +0.4 3.48 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
3.42 +0.4 3.82 perf-profile.self.cycles-pp.copy_user_generic_unrolled
3.74 +0.4 4.15 perf-profile.self.cycles-pp._raw_spin_lock_irq
4.64 +0.4 5.06 perf-profile.self.cycles-pp.__fget_light
4.18 ± 2% +0.4 4.63 perf-profile.self.cycles-pp.read
4.12 +0.4 4.57 perf-profile.self.cycles-pp.write
5.04 +0.5 5.51 perf-profile.self.cycles-pp.__entry_text_start
6.37 ± 2% +0.5 6.89 ± 2% perf-profile.self.cycles-pp.common_file_perm
2.10 +0.8 2.89 perf-profile.self.cycles-pp.eventfd_write
9.34 +1.0 10.34 perf-profile.self.cycles-pp.syscall_return_via_sysret
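Two of the derived perf-stat rows in the will-it-scale table can be cross-checked against the raw counters. This is a rough sketch under our own assumptions, not a statement of how lkp-tests computes them: it assumes perf-stat.overall.path-length is total instructions divided by the workload count, and that perf-stat.overall.instructions-per-iTLB-miss is instructions divided by iTLB load misses. The helper `ratio` is hypothetical.

```python
# (base, patched) means copied from the will-it-scale table above.
total_instructions = (1.733e14, 1.688e14)  # perf-stat.total.instructions
workload = (3.369e08, 3.759e08)            # will-it-scale.workload
ps_instructions = (5.731e11, 5.583e11)     # perf-stat.ps.instructions
ps_itlb_misses = (6.444e08, 7.462e08)      # perf-stat.ps.iTLB-load-misses


def ratio(num, den):
    """Element-wise quotient for a (base, patched) pair."""
    return tuple(n / d for n, d in zip(num, den))


# Assumed definition: instructions retired per benchmark operation.
path_length = ratio(total_instructions, workload)
# Assumed definition: instructions retired per iTLB load miss.
insn_per_itlb_miss = ratio(ps_instructions, ps_itlb_misses)

print("path-length (base, patched):", [round(v) for v in path_length])
print("instructions per iTLB miss (base, patched):",
      [round(v, 1) for v in insn_per_itlb_miss])
```

With the inputs rounded to four significant digits as printed in the report, both ratios land within about 0.1% of the reported values (514472 and 448981; 889.67 and 748.21), which is consistent with the assumed definitions. On that reading, the patched kernel executes fewer instructions per eventfd operation.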
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
Thanks,
Oliver Sang