2024-04-03 02:58:58

by Oliver Sang

Subject: [linus:master] [filelock] c69ff40719: stress-ng.dup.ops_per_sec 1.9% improvement



Hello,

kernel test robot noticed a 1.9% improvement of stress-ng.dup.ops_per_sec on:


commit: c69ff4071935f946f1cddc59e1d36a03442ed015 ("filelock: split leases out of struct file_lock")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

nr_threads: 100%
disk: 1HDD
testtime: 60s
fs: ext4
test: dup
cpufreq_governor: performance






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240403/[email protected]

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/dup/stress-ng/60s

commit:
282c30f320 ("filelock: remove temporary compatibility macros")
c69ff40719 ("filelock: split leases out of struct file_lock")

282c30f320ba2579 c69ff4071935f946f1cddc59e1d
---------------- ---------------------------
%stddev %change %stddev
\ | \
195388 +2.0% 199324 vmstat.system.cs
1502041 +1.9% 1531046 stress-ng.dup.ops
25032 +1.9% 25516 stress-ng.dup.ops_per_sec
2020 -1.9% 1982 stress-ng.time.system_time
176.48 +11.1% 196.06 stress-ng.time.user_time
3992532 +1.8% 4063489 stress-ng.time.voluntary_context_switches
1.949e+10 +2.3% 1.994e+10 perf-stat.i.branch-instructions
1.51 -3.2% 1.46 perf-stat.i.cpi
9.495e+10 +2.3% 9.711e+10 perf-stat.i.instructions
0.67 +3.6% 0.70 perf-stat.i.ipc
1.51 -3.4% 1.46 perf-stat.overall.cpi
0.66 +3.5% 0.69 perf-stat.overall.ipc
198601 +1.9% 202371 perf-stat.ps.context-switches
16.89 -3.1 13.75 perf-profile.calltrace.cycles-pp.filp_flush.filp_close.put_files_struct.do_exit.do_group_exit
24.02 -2.8 21.19 perf-profile.calltrace.cycles-pp.filp_close.put_files_struct.do_exit.do_group_exit.__x64_sys_exit_group
12.92 -2.7 10.25 perf-profile.calltrace.cycles-pp.locks_remove_posix.filp_flush.filp_close.put_files_struct.do_exit
33.85 -2.5 31.32 perf-profile.calltrace.cycles-pp.put_files_struct.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
53.34 -1.8 51.51 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
53.34 -1.8 51.51 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
53.31 -1.8 51.47 perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.do_syscall_64.entry_SYSCALL_64_after_hwframe
53.31 -1.8 51.47 perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.do_syscall_64.entry_SYSCALL_64_after_hwframe
54.16 -1.8 52.34 perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.62 +0.0 0.64 perf-profile.calltrace.cycles-pp.do_dentry_open.do_open.path_openat.do_filp_open.do_sys_openat2
0.74 +0.0 0.77 perf-profile.calltrace.cycles-pp.acct_collect.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
0.86 +0.0 0.89 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe._exit
0.86 +0.0 0.89 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe._exit
0.60 +0.0 0.63 perf-profile.calltrace.cycles-pp.rwsem_spin_on_owner.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.anon_vma_clone
0.86 +0.0 0.88 perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.do_syscall_64.entry_SYSCALL_64_after_hwframe._exit
0.86 +0.0 0.89 perf-profile.calltrace.cycles-pp._exit
0.68 +0.0 0.72 perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
0.82 +0.0 0.86 perf-profile.calltrace.cycles-pp.up_write.free_pgtables.exit_mmap.__mmput.exit_mm
0.70 ± 2% +0.0 0.74 perf-profile.calltrace.cycles-pp.kmem_cache_alloc.anon_vma_clone.anon_vma_fork.dup_mmap.dup_mm
0.81 +0.0 0.85 perf-profile.calltrace.cycles-pp.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.59 +0.0 0.63 ± 5% perf-profile.calltrace.cycles-pp.__libc_fork
0.85 +0.0 0.89 perf-profile.calltrace.cycles-pp.__do_sys_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe.wait4
0.85 +0.0 0.88 perf-profile.calltrace.cycles-pp.kernel_wait4.__do_sys_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe.wait4
0.65 +0.0 0.69 perf-profile.calltrace.cycles-pp.rwsem_spin_on_owner.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.anon_vma_fork
1.09 +0.0 1.14 perf-profile.calltrace.cycles-pp.kmem_cache_free.exit_mmap.__mmput.exit_mm.do_exit
0.95 +0.0 1.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.wait4
0.96 +0.0 1.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.wait4
0.94 +0.0 0.98 perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.anon_vma_clone.anon_vma_fork
0.98 +0.0 1.02 perf-profile.calltrace.cycles-pp.wait4
0.55 +0.0 0.60 ± 7% perf-profile.calltrace.cycles-pp.stress_dup
0.98 +0.0 1.03 perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write.anon_vma_clone.anon_vma_fork.dup_mmap
1.44 +0.0 1.48 perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.42 +0.0 1.47 perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
1.86 +0.1 1.91 perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
1.85 +0.1 1.90 perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
1.18 +0.1 1.24 perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
1.18 +0.1 1.24 perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
1.18 +0.1 1.24 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
1.18 +0.1 1.24 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__clone
1.51 +0.1 1.57 perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
1.74 +0.1 1.80 perf-profile.calltrace.cycles-pp.kmem_cache_alloc.vm_area_dup.dup_mmap.dup_mm.copy_process
1.60 +0.1 1.67 perf-profile.calltrace.cycles-pp.kmem_cache_free.unlink_anon_vmas.free_pgtables.exit_mmap.__mmput
1.83 +0.1 1.89 perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
1.86 +0.1 1.93 perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
1.82 +0.1 1.89 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
1.52 +0.1 1.59 perf-profile.calltrace.cycles-pp.__clone
1.82 +0.1 1.89 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
1.28 +0.1 1.36 perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.down_write.anon_vma_fork.dup_mmap.dup_mm
1.36 +0.1 1.44 perf-profile.calltrace.cycles-pp.down_write.anon_vma_fork.dup_mmap.dup_mm.copy_process
1.22 +0.1 1.30 perf-profile.calltrace.cycles-pp.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.anon_vma_fork.dup_mmap
2.38 +0.1 2.48 perf-profile.calltrace.cycles-pp.vm_area_dup.dup_mmap.dup_mm.copy_process.kernel_clone
3.34 +0.2 3.51 perf-profile.calltrace.cycles-pp.anon_vma_interval_tree_insert.anon_vma_clone.anon_vma_fork.dup_mmap.dup_mm
5.14 +0.2 5.37 perf-profile.calltrace.cycles-pp.unlink_anon_vmas.free_pgtables.exit_mmap.__mmput.exit_mm
6.76 +0.3 7.04 perf-profile.calltrace.cycles-pp.anon_vma_clone.anon_vma_fork.dup_mmap.dup_mm.copy_process
5.87 +0.3 6.16 perf-profile.calltrace.cycles-pp.fput.filp_close.put_files_struct.do_exit.do_group_exit
7.49 +0.3 7.82 perf-profile.calltrace.cycles-pp.free_pgtables.exit_mmap.__mmput.exit_mm.do_exit
9.30 +0.4 9.69 perf-profile.calltrace.cycles-pp.anon_vma_fork.dup_mmap.dup_mm.copy_process.kernel_clone
0.10 ±200% +0.4 0.52 perf-profile.calltrace.cycles-pp.fifo_open.do_dentry_open.do_open.path_openat.do_filp_open
7.40 +0.5 7.86 perf-profile.calltrace.cycles-pp.dup_fd.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
0.05 ±299% +0.5 0.52 ± 2% perf-profile.calltrace.cycles-pp.osq_lock.rwsem_optimistic_spin.rwsem_down_write_slowpath.down_write.anon_vma_fork
17.56 +0.6 18.18 perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
17.62 +0.6 18.24 perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
17.63 +0.6 18.25 perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
19.31 +0.7 20.03 perf-profile.calltrace.cycles-pp.dup_mmap.dup_mm.copy_process.kernel_clone.__do_sys_clone
19.75 +0.7 20.48 perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
28.58 +1.3 29.84 perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
28.58 +1.3 29.84 perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
28.59 +1.3 29.85 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe._Fork
28.59 +1.3 29.86 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe._Fork
29.12 +1.3 30.41 perf-profile.calltrace.cycles-pp._Fork
29.02 +1.3 30.31 perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
17.87 -3.2 14.72 perf-profile.children.cycles-pp.filp_flush
25.00 -2.9 22.12 perf-profile.children.cycles-pp.filp_close
13.27 -2.7 10.58 perf-profile.children.cycles-pp.locks_remove_posix
34.49 -2.5 31.99 perf-profile.children.cycles-pp.put_files_struct
54.16 -1.8 52.36 perf-profile.children.cycles-pp.__x64_sys_exit_group
54.16 -1.8 52.35 perf-profile.children.cycles-pp.do_exit
54.17 -1.8 52.36 perf-profile.children.cycles-pp.do_group_exit
88.90 -0.4 88.52 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
88.88 -0.4 88.50 perf-profile.children.cycles-pp.do_syscall_64
0.46 ± 2% +0.0 0.48 perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.24 ± 2% +0.0 0.26 ± 4% perf-profile.children.cycles-pp.memcg_account_kmem
0.51 +0.0 0.53 perf-profile.children.cycles-pp.find_idlest_cpu
0.61 +0.0 0.63 perf-profile.children.cycles-pp.irq_exit_rcu
0.66 +0.0 0.69 perf-profile.children.cycles-pp.mas_next_slot
0.44 +0.0 0.46 perf-profile.children.cycles-pp.___slab_alloc
0.36 +0.0 0.38 perf-profile.children.cycles-pp.mm_init
0.75 +0.0 0.78 perf-profile.children.cycles-pp.acct_collect
0.47 +0.0 0.49 perf-profile.children.cycles-pp.dup_userfaultfd
0.49 +0.0 0.52 perf-profile.children.cycles-pp.fifo_open
0.55 +0.0 0.58 perf-profile.children.cycles-pp.lock_vma_under_rcu
0.78 +0.0 0.80 perf-profile.children.cycles-pp.rcu_do_batch
0.18 ± 3% +0.0 0.21 ± 17% perf-profile.children.cycles-pp.process_one_work
0.70 +0.0 0.73 perf-profile.children.cycles-pp.wake_up_new_task
0.84 +0.0 0.86 perf-profile.children.cycles-pp.mas_find
0.87 +0.0 0.90 perf-profile.children.cycles-pp._exit
0.62 +0.0 0.65 perf-profile.children.cycles-pp.do_dentry_open
0.85 +0.0 0.88 perf-profile.children.cycles-pp.load_balance
0.69 +0.0 0.72 perf-profile.children.cycles-pp.do_open
0.44 ± 2% +0.0 0.48 perf-profile.children.cycles-pp.__pte_offset_map_lock
0.92 +0.0 0.95 perf-profile.children.cycles-pp.newidle_balance
1.00 +0.0 1.03 perf-profile.children.cycles-pp.pick_next_task_fair
0.85 +0.0 0.89 perf-profile.children.cycles-pp.__do_sys_wait4
0.85 +0.0 0.88 perf-profile.children.cycles-pp.kernel_wait4
0.98 +0.0 1.02 perf-profile.children.cycles-pp.wait4
1.05 ± 2% +0.0 1.09 perf-profile.children.cycles-pp.__vm_area_free
0.65 +0.0 0.69 ± 5% perf-profile.children.cycles-pp.__libc_fork
0.97 +0.0 1.01 perf-profile.children.cycles-pp.schedule
1.43 +0.0 1.47 perf-profile.children.cycles-pp.path_openat
1.44 +0.0 1.49 perf-profile.children.cycles-pp.do_filp_open
0.45 ± 2% +0.0 0.50 ± 3% perf-profile.children.cycles-pp.memset_orig
0.61 +0.0 0.66 ± 8% perf-profile.children.cycles-pp.stress_dup
1.13 +0.1 1.18 perf-profile.children.cycles-pp.do_wait
0.57 ± 2% +0.1 0.62 ± 6% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
1.86 +0.1 1.92 perf-profile.children.cycles-pp.__x64_sys_openat
1.86 +0.1 1.91 perf-profile.children.cycles-pp.do_sys_openat2
1.06 +0.1 1.12 ± 3% perf-profile.children.cycles-pp.ret_from_fork_asm
1.54 +0.1 1.60 perf-profile.children.cycles-pp.cpuidle_idle_call
1.68 +0.1 1.74 perf-profile.children.cycles-pp.__schedule
1.83 +0.1 1.89 perf-profile.children.cycles-pp.start_secondary
1.86 +0.1 1.93 perf-profile.children.cycles-pp.cpu_startup_entry
1.86 +0.1 1.93 perf-profile.children.cycles-pp.secondary_startup_64_no_verify
1.77 +0.1 1.84 perf-profile.children.cycles-pp.__slab_free
1.85 +0.1 1.92 perf-profile.children.cycles-pp.do_idle
1.54 +0.1 1.61 perf-profile.children.cycles-pp.__clone
1.09 ± 2% +0.1 1.17 ± 2% perf-profile.children.cycles-pp.__anon_vma_interval_tree_remove
1.23 +0.1 1.31 ± 2% perf-profile.children.cycles-pp._raw_spin_lock
2.51 +0.1 2.59 perf-profile.children.cycles-pp.up_write
2.44 ± 2% +0.1 2.53 perf-profile.children.cycles-pp.__memcg_slab_free_hook
2.40 +0.1 2.50 perf-profile.children.cycles-pp.vm_area_dup
1.71 +0.1 1.81 perf-profile.children.cycles-pp.rwsem_spin_on_owner
3.53 +0.1 3.65 perf-profile.children.cycles-pp.kmem_cache_alloc
3.37 +0.2 3.53 perf-profile.children.cycles-pp.anon_vma_interval_tree_insert
3.03 +0.2 3.20 perf-profile.children.cycles-pp.rwsem_optimistic_spin
3.21 +0.2 3.38 perf-profile.children.cycles-pp.rwsem_down_write_slowpath
4.73 +0.2 4.91 perf-profile.children.cycles-pp.kmem_cache_free
5.16 +0.2 5.39 perf-profile.children.cycles-pp.unlink_anon_vmas
5.23 +0.2 5.46 perf-profile.children.cycles-pp.down_write
6.77 +0.3 7.04 perf-profile.children.cycles-pp.anon_vma_clone
6.56 +0.3 6.84 perf-profile.children.cycles-pp.fput
7.51 +0.3 7.84 perf-profile.children.cycles-pp.free_pgtables
9.32 +0.4 9.71 perf-profile.children.cycles-pp.anon_vma_fork
7.40 +0.5 7.86 perf-profile.children.cycles-pp.dup_fd
17.58 +0.6 18.20 perf-profile.children.cycles-pp.exit_mmap
17.67 +0.6 18.29 perf-profile.children.cycles-pp.exit_mm
17.62 +0.6 18.24 perf-profile.children.cycles-pp.__mmput
19.36 +0.7 20.08 perf-profile.children.cycles-pp.dup_mmap
19.75 +0.7 20.48 perf-profile.children.cycles-pp.dup_mm
29.18 +1.3 30.47 perf-profile.children.cycles-pp._Fork
29.03 +1.3 30.32 perf-profile.children.cycles-pp.copy_process
29.76 +1.3 31.08 perf-profile.children.cycles-pp.__do_sys_clone
29.76 +1.3 31.08 perf-profile.children.cycles-pp.kernel_clone
12.85 -2.7 10.18 perf-profile.self.cycles-pp.locks_remove_posix
3.16 -0.4 2.80 perf-profile.self.cycles-pp.filp_flush
0.72 +0.0 0.74 perf-profile.self.cycles-pp.kmem_cache_alloc
0.50 +0.0 0.53 perf-profile.self.cycles-pp.kmem_cache_free
0.81 +0.0 0.85 perf-profile.self.cycles-pp._raw_spin_lock
0.44 ± 2% +0.0 0.49 ± 3% perf-profile.self.cycles-pp.memset_orig
0.57 ± 2% +0.1 0.62 ± 6% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
1.70 +0.1 1.76 perf-profile.self.cycles-pp.__slab_free
2.46 +0.1 2.54 perf-profile.self.cycles-pp.up_write
1.70 +0.1 1.79 perf-profile.self.cycles-pp.rwsem_spin_on_owner
3.33 +0.2 3.50 perf-profile.self.cycles-pp.anon_vma_interval_tree_insert
6.13 +0.3 6.44 perf-profile.self.cycles-pp.fput
7.02 +0.4 7.43 perf-profile.self.cycles-pp.dup_fd
7.28 +0.6 7.85 perf-profile.self.cycles-pp.put_files_struct




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki