Greetings,
FYI, we noticed a 2.4% improvement in aim7.jobs-per-min due to commit:
commit: 1fea323ff00526dcc04fbb4ee6e7d04e4e2ab0e1 ("xfs: reduce debug overhead of dir leaf/node checks")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
in testcase: aim7
on test machine: 88 threads Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
with the following parameters:
disk: 4BRD_12G
md: RAID1
fs: xfs
test: disk_rw
load: 3000
cpufreq_governor: performance
ucode: 0x5003006
test-description: AIM7 is a traditional UNIX system-level benchmark suite used to test and measure the performance of multiuser systems.
test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
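For context on why a change to debug checks can move a macro-benchmark: metadata verification that runs on every directory operation is pure overhead on hot paths, and reducing or compile-time-gating it removes work from each call. Below is a minimal, hypothetical C sketch of that general pattern (dir_leaf, leaf_check, and MYFS_DEBUG are illustrative names only, not the actual XFS code from the commit):

    /*
     * Hypothetical sketch: gate an expensive O(n) structure check behind a
     * compile-time debug switch, keeping only a cheap check otherwise.
     * Illustrative only -- not the actual XFS implementation.
     */
    #include <assert.h>

    #define LEAF_MAGIC  0x3df1u
    #define LEAF_MAXENT 256

    struct dir_leaf {
        unsigned int magic;                 /* on-disk magic number */
        unsigned int count;                 /* entries in use */
        unsigned int hashes[LEAF_MAXENT];   /* hash values, kept sorted */
    };

    #ifdef MYFS_DEBUG
    /* Debug builds: walk every entry and verify the sort order, O(n). */
    static void leaf_check(const struct dir_leaf *leaf)
    {
        assert(leaf->magic == LEAF_MAGIC);
        assert(leaf->count <= LEAF_MAXENT);
        for (unsigned int i = 1; i < leaf->count; i++)
            assert(leaf->hashes[i - 1] <= leaf->hashes[i]);
    }
    #else
    /* Production builds: header-only sanity check, O(1) per call. */
    static void leaf_check(const struct dir_leaf *leaf)
    {
        assert(leaf->magic == LEAF_MAGIC);
    }
    #endif

    int main(void)
    {
        struct dir_leaf leaf = { .magic = LEAF_MAGIC, .count = 2,
                                 .hashes = { 1, 2 } };
        leaf_check(&leaf);  /* cheap by default, thorough under MYFS_DEBUG */
        return 0;
    }

The reduced cycles in the directory lookup paths in the perf profiles below (e.g. xfs_dir2_leafn_lookup_for_entry) are consistent with less verification work per operation; see the commit itself for the exact mechanism.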
In addition, the commit also has a significant impact on the following test:
+------------------+------------------------------------------------------------------------+
| testcase: change | aim7: aim7.jobs-per-min 1.6% improvement |
| test machine | 144 threads Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz with 128G memory |
| test parameters | cpufreq_governor=performance |
| | disk=1BRD_48G |
| | fs=xfs |
| | load=3000 |
| | test=disk_rw |
| | ucode=0x700001e |
+------------------+------------------------------------------------------------------------+
Details are as follows:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml
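(As the commands suggest, the install step sets up the benchmark from the attached job.yaml, split-job rewrites it into one or more runnable job files, and the run step executes the generated compatible-job.yaml.)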
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/4BRD_12G/xfs/x86_64-rhel-8.3/3000/RAID1/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp9/disk_rw/aim7/0x5003006
commit:
39d3c0b596 ("xfs: No need for inode number error injection in __xfs_dir3_data_check")
1fea323ff0 ("xfs: reduce debug overhead of dir leaf/node checks")
39d3c0b5968b5421 1fea323ff00526dcc04fbb4ee6e
---------------- ---------------------------
      fail:runs    %reproduction    fail:runs
          |              |              |
         :6            33%            2:6     kmsg.XFS(md#):xlog_verify_grant_tail:space>BBTOB(tail_blocks)
        %stddev      %change        %stddev
            \            |              \
505405 +2.4% 517621 aim7.jobs-per-min
35.82 -2.4% 34.98 aim7.time.elapsed_time
35.82 -2.4% 34.98 aim7.time.elapsed_time.max
2866 ± 35% +39.0% 3985 ± 3% interrupts.CPU53.NMI:Non-maskable_interrupts
2866 ± 35% +39.0% 3985 ± 3% interrupts.CPU53.PMI:Performance_monitoring_interrupts
286711 -2.5% 279423 proc-vmstat.nr_dirty
554636 -1.3% 547330 proc-vmstat.nr_file_pages
286865 -2.5% 279593 proc-vmstat.nr_inactive_file
286865 -2.5% 279593 proc-vmstat.nr_zone_inactive_file
287057 -2.6% 279704 proc-vmstat.nr_zone_write_pending
1.313e+10 +2.0% 1.34e+10 perf-stat.i.branch-instructions
52558 +2.7% 53962 perf-stat.i.context-switches
1942 +7.4% 2086 ± 2% perf-stat.i.cpu-migrations
1.9e+10 +2.0% 1.939e+10 perf-stat.i.dTLB-loads
1.061e+10 +2.5% 1.087e+10 perf-stat.i.dTLB-stores
6.606e+10 +2.1% 6.743e+10 perf-stat.i.instructions
487.84 +2.1% 498.03 perf-stat.i.metric.M/sec
3171946 +6.1% 3364545 perf-stat.i.node-store-misses
10014711 +2.8% 10299278 perf-stat.i.node-stores
24.04 +0.6 24.62 perf-stat.overall.node-store-miss-rate%
1.286e+10 +2.0% 1.311e+10 perf-stat.ps.branch-instructions
51473 +2.6% 52806 perf-stat.ps.context-switches
1903 +7.2% 2040 ± 2% perf-stat.ps.cpu-migrations
1.861e+10 +2.0% 1.898e+10 perf-stat.ps.dTLB-loads
1.039e+10 +2.4% 1.064e+10 perf-stat.ps.dTLB-stores
6.469e+10 +2.0% 6.598e+10 perf-stat.ps.instructions
3106311 +6.0% 3293166 perf-stat.ps.node-store-misses
9812026 +2.8% 10082006 perf-stat.ps.node-stores
2.29 ± 7% -0.2 2.07 ± 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
2.28 ± 7% -0.2 2.06 ± 2% perf-profile.calltrace.cycles-pp.do_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
2.31 ± 7% -0.2 2.10 ± 2% perf-profile.calltrace.cycles-pp.unlink
2.29 ± 7% -0.2 2.08 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.unlink
1.66 ± 8% -0.2 1.48 ± 4% perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.do_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
2.00 ± 6% -0.2 1.83 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.creat64
2.00 ± 6% -0.2 1.83 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.creat64
1.98 ± 6% -0.2 1.80 ± 2% perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.00 ± 6% -0.2 1.83 perf-profile.calltrace.cycles-pp.do_sys_openat2.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe.creat64
2.00 ± 6% -0.2 1.83 perf-profile.calltrace.cycles-pp.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe.creat64
1.97 ± 6% -0.2 1.80 ± 2% perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.do_sys_open.do_syscall_64
2.01 ± 6% -0.2 1.84 perf-profile.calltrace.cycles-pp.creat64
0.90 ± 11% -0.1 0.79 ± 3% perf-profile.calltrace.cycles-pp.rwsem_spin_on_owner.rwsem_down_write_slowpath.do_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.92 ± 9% -0.1 0.81 ± 2% perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.path_openat.do_filp_open.do_sys_openat2.do_sys_open
0.73 ± 11% -0.1 0.63 ± 3% perf-profile.calltrace.cycles-pp.rwsem_spin_on_owner.rwsem_down_write_slowpath.path_openat.do_filp_open.do_sys_openat2
0.69 ± 6% -0.1 0.61 ± 6% perf-profile.calltrace.cycles-pp.osq_lock.rwsem_down_write_slowpath.do_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.58 ± 8% -0.3 2.29 ± 3% perf-profile.children.cycles-pp.rwsem_down_write_slowpath
2.29 ± 7% -0.2 2.06 ± 2% perf-profile.children.cycles-pp.do_unlinkat
2.32 ± 7% -0.2 2.10 ± 2% perf-profile.children.cycles-pp.unlink
1.62 ± 11% -0.2 1.42 ± 3% perf-profile.children.cycles-pp.rwsem_spin_on_owner
2.08 ± 6% -0.2 1.90 perf-profile.children.cycles-pp.do_sys_open
2.04 ± 6% -0.2 1.86 ± 2% perf-profile.children.cycles-pp.do_filp_open
2.07 ± 6% -0.2 1.90 perf-profile.children.cycles-pp.do_sys_openat2
2.02 ± 6% -0.2 1.85 ± 2% perf-profile.children.cycles-pp.creat64
2.03 ± 6% -0.2 1.86 perf-profile.children.cycles-pp.path_openat
0.82 ± 6% -0.1 0.72 ± 5% perf-profile.children.cycles-pp.osq_lock
0.18 ± 84% -0.1 0.09 ± 9% perf-profile.children.cycles-pp.xfs_vn_lookup
0.50 ± 2% -0.1 0.44 ± 2% perf-profile.children.cycles-pp.__fsnotify_parent
0.14 ± 6% -0.0 0.10 ± 7% perf-profile.children.cycles-pp.write@plt
0.12 ± 11% -0.0 0.09 perf-profile.children.cycles-pp.xfs_dir2_leafn_lookup_for_entry
0.11 ± 18% -0.0 0.08 ± 8% perf-profile.children.cycles-pp.xfs_dir_lookup
0.22 ± 7% -0.0 0.20 ± 2% perf-profile.children.cycles-pp.update_process_times
0.09 ± 8% -0.0 0.07 ± 14% perf-profile.children.cycles-pp.xfs_dir2_node_lookup
0.25 ± 44% +0.1 0.35 ± 5% perf-profile.children.cycles-pp.xfs_file_llseek
1.61 ± 11% -0.2 1.41 ± 3% perf-profile.self.cycles-pp.rwsem_spin_on_owner
0.81 ± 6% -0.1 0.72 ± 5% perf-profile.self.cycles-pp.osq_lock
0.48 ± 2% -0.1 0.41 ± 3% perf-profile.self.cycles-pp.__fsnotify_parent
0.10 ± 6% -0.1 0.04 ± 44% perf-profile.self.cycles-pp.write@plt
0.24 ± 44% +0.1 0.34 ± 5% perf-profile.self.cycles-pp.xfs_file_llseek
0.77 ± 13% +0.2 0.94 ± 4% perf-profile.self.cycles-pp.xfs_file_buffered_write
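As a quick consistency check on the derived rate above: node-store-miss-rate% appears to match node-store-misses / (node-store-misses + node-stores) computed from the interval counters, 3171946 / (3171946 + 10014711) ≈ 24.04% before and 3364545 / (3364545 + 10299278) ≈ 24.62% after, i.e. the commit drives more total stores at a slightly higher node-store miss rate.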
aim7.jobs-per-min
540000 +------------------------------------------------------------------+
| O |
530000 |-+ O O O O O |
| O O O O O O O O |
| O O O O O O O O O O O O |
520000 |-+ O O O O O O |
| O |
510000 |-+ .+ |
| .+.+.+ |
500000 |-+ +.+ |
| : |
| +. + + +. : |
490000 |.+.+. + +. + + .+.+. + + + +.+ |
| + + + + +.+.+.+.++.+ |
480000 +------------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
***************************************************************************************************
lkp-cpl-4sp1: 144 threads Intel(R) Xeon(R) Gold 5318H CPU @ 2.50GHz with 128G memory
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/1BRD_48G/xfs/x86_64-rhel-8.3/3000/debian-10.4-x86_64-20200603.cgz/lkp-cpl-4sp1/disk_rw/aim7/0x700001e
commit:
39d3c0b596 ("xfs: No need for inode number error injection in __xfs_dir3_data_check")
1fea323ff0 ("xfs: reduce debug overhead of dir leaf/node checks")
39d3c0b5968b5421 1fea323ff00526dcc04fbb4ee6e
---------------- ---------------------------
        %stddev      %change        %stddev
            \            |              \
500977 +1.6% 509113 aim7.jobs-per-min
36.14 -1.6% 35.57 aim7.time.elapsed_time
36.14 -1.6% 35.57 aim7.time.elapsed_time.max
40.93 ± 2% -4.3% 39.19 aim7.time.user_time
28267 ± 79% -81.7% 5164 ± 5% numa-meminfo.node2.KernelStack
28180 ± 78% -81.7% 5162 ± 5% numa-vmstat.node2.nr_kernel_stack
291109 -1.6% 286393 proc-vmstat.nr_dirty
11049 ± 5% +9.0% 12039 ± 4% slabinfo.pde_opener.active_objs
11049 ± 5% +9.0% 12039 ± 4% slabinfo.pde_opener.num_objs
1579 ± 33% +29.9% 2051 ± 25% interrupts.CPU109.NMI:Non-maskable_interrupts
1579 ± 33% +29.9% 2051 ± 25% interrupts.CPU109.PMI:Performance_monitoring_interrupts
1785 ± 30% +45.8% 2602 ± 8% interrupts.CPU117.NMI:Non-maskable_interrupts
1785 ± 30% +45.8% 2602 ± 8% interrupts.CPU117.PMI:Performance_monitoring_interrupts
891.67 ± 8% +99.4% 1778 ± 47% interrupts.CPU4.CAL:Function_call_interrupts
1.301e+10 +1.6% 1.322e+10 perf-stat.i.branch-instructions
52509 +2.1% 53602 perf-stat.i.context-switches
1.89e+10 +1.8% 1.924e+10 perf-stat.i.dTLB-loads
1.061e+10 +1.9% 1.081e+10 perf-stat.i.dTLB-stores
6.554e+10 +1.6% 6.66e+10 perf-stat.i.instructions
296.86 +1.8% 302.18 perf-stat.i.metric.M/sec
76.63 +1.0 77.63 perf-stat.i.node-load-miss-rate%
3774653 ± 2% +6.6% 4025641 ± 3% perf-stat.i.node-loads
4414091 ± 3% +7.6% 4747750 perf-stat.i.node-store-misses
9344160 +2.3% 9559103 perf-stat.i.node-stores
32.07 ± 2% +1.1 33.18 perf-stat.overall.node-store-miss-rate%
1.271e+10 +1.8% 1.293e+10 perf-stat.ps.branch-instructions
51278 +2.3% 52440 perf-stat.ps.context-switches
1.846e+10 +2.0% 1.883e+10 perf-stat.ps.dTLB-loads
1.036e+10 +2.1% 1.057e+10 perf-stat.ps.dTLB-stores
6.4e+10 +1.8% 6.516e+10 perf-stat.ps.instructions
3686827 ± 2% +6.9% 3940762 ± 3% perf-stat.ps.node-loads
4310625 ± 3% +7.8% 4645614 perf-stat.ps.node-store-misses
9127740 +2.5% 9355048 perf-stat.ps.node-stores
2.54 -0.2 2.30 ± 4% perf-profile.calltrace.cycles-pp.creat64
2.53 -0.2 2.29 ± 4% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.creat64
2.52 -0.2 2.28 ± 4% perf-profile.calltrace.cycles-pp.do_sys_openat2.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe.creat64
2.50 -0.2 2.26 ± 4% perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.50 -0.2 2.26 ± 4% perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.do_sys_open.do_syscall_64
2.52 -0.2 2.28 ± 4% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.creat64
2.52 -0.2 2.28 ± 4% perf-profile.calltrace.cycles-pp.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe.creat64
2.82 ± 2% -0.2 2.62 ± 3% perf-profile.calltrace.cycles-pp.unlink
2.79 ± 2% -0.2 2.59 ± 3% perf-profile.calltrace.cycles-pp.do_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
2.80 ± 2% -0.2 2.61 ± 3% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.unlink
2.79 ± 2% -0.2 2.60 ± 3% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
2.10 -0.1 1.95 ± 4% perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.do_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
1.21 ± 3% -0.1 1.10 ± 4% perf-profile.calltrace.cycles-pp.rwsem_down_write_slowpath.path_openat.do_filp_open.do_sys_openat2.do_sys_open
0.96 ± 5% -0.1 0.87 ± 4% perf-profile.calltrace.cycles-pp.xfs_generic_create.path_openat.do_filp_open.do_sys_openat2.do_sys_open
1.07 ± 3% -0.1 0.99 ± 3% perf-profile.calltrace.cycles-pp.rwsem_spin_on_owner.rwsem_down_write_slowpath.do_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.92 ± 2% -0.1 0.85 ± 4% perf-profile.calltrace.cycles-pp.rwsem_spin_on_owner.rwsem_down_write_slowpath.path_openat.do_filp_open.do_sys_openat2
3.31 -0.3 3.06 ± 4% perf-profile.children.cycles-pp.rwsem_down_write_slowpath
2.56 -0.2 2.31 ± 4% perf-profile.children.cycles-pp.do_filp_open
2.55 ± 2% -0.2 2.30 ± 4% perf-profile.children.cycles-pp.creat64
2.56 -0.2 2.31 ± 4% perf-profile.children.cycles-pp.path_openat
2.60 -0.2 2.36 ± 4% perf-profile.children.cycles-pp.do_sys_open
2.60 -0.2 2.36 ± 4% perf-profile.children.cycles-pp.do_sys_openat2
2.83 ± 2% -0.2 2.63 ± 3% perf-profile.children.cycles-pp.unlink
2.79 ± 2% -0.2 2.60 ± 3% perf-profile.children.cycles-pp.do_unlinkat
1.99 ± 2% -0.1 1.84 ± 3% perf-profile.children.cycles-pp.rwsem_spin_on_owner
0.96 ± 5% -0.1 0.87 ± 4% perf-profile.children.cycles-pp.xfs_generic_create
0.45 ± 5% -0.1 0.37 ± 7% perf-profile.children.cycles-pp.__fsnotify_parent
0.17 ± 5% -0.0 0.13 ± 9% perf-profile.children.cycles-pp.write@plt
0.12 ± 4% -0.0 0.09 ± 5% perf-profile.children.cycles-pp.xfs_dir2_leafn_lookup_for_entry
0.09 ± 7% -0.0 0.07 ± 7% perf-profile.children.cycles-pp.generic_file_llseek_size
0.09 ± 7% -0.0 0.07 ± 10% perf-profile.children.cycles-pp.xfs_dir2_node_lookup
0.08 ± 11% -0.0 0.06 ± 7% perf-profile.children.cycles-pp.wake_up_q
1.97 ± 2% -0.2 1.82 ± 3% perf-profile.self.cycles-pp.rwsem_spin_on_owner
0.43 ± 5% -0.1 0.34 ± 7% perf-profile.self.cycles-pp.__fsnotify_parent
1.17 ± 3% -0.1 1.10 ± 4% perf-profile.self.cycles-pp.write
0.10 ± 7% -0.1 0.05 ± 45% perf-profile.self.cycles-pp.write@plt
0.09 ± 7% -0.0 0.07 ± 7% perf-profile.self.cycles-pp.generic_file_llseek_size
0.19 ± 3% -0.0 0.17 ± 4% perf-profile.self.cycles-pp.xfs_get_extsz_hint
0.21 ± 6% +0.0 0.24 ± 6% perf-profile.self.cycles-pp.propagate_protected_usage
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected]         Intel Corporation
Thanks,
Oliver Sang