2024-05-28 01:15:16

by kernel test robot

Subject: [linus:master] [smb3] edfc6481fa: filebench.sum_operations/s 4194.8% improvement



Hello,

kernel test robot noticed a 4194.8% improvement of filebench.sum_operations/s on:


commit: edfc6481faf896301cab940da776229fe39e9fc9 ("smb3: fix perf regression with cached writes with netfs conversion")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: filebench
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:

disk: 1HDD
fs: ext4
fs2: cifs
test: randomwrite.f
cpufreq_governor: performance






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240527/[email protected]

=========================================================================================
compiler/cpufreq_governor/disk/fs2/fs/kconfig/rootfs/tbox_group/test/testcase:
gcc-13/performance/1HDD/cifs/ext4/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-icl-2sp6/randomwrite.f/filebench

commit:
14b1cd2534 ("cifs: Fix locking in cifs_strict_readv()")
edfc6481fa ("smb3: fix perf regression with cached writes with netfs conversion")

14b1cd25346b1d61 edfc6481faf896301cab940da77
---------------- ---------------------------
%stddev %change %stddev
\ | \
3814731 ± 93% -62.9% 1414791 ± 44% cpuidle..usage
91.23 ± 4% +6.5% 97.17 iostat.cpu.idle
1817 ± 25% -49.1% 925.83 ± 36% perf-c2c.DRAM.remote
207192 +418.2% 1073659 ± 20% meminfo.AnonHugePages
2604959 ± 5% +65.7% 4315389 ± 4% meminfo.Dirty
69239 ±139% +547.1% 448063 ± 51% numa-meminfo.node0.AnonHugePages
138049 ± 70% +353.2% 625629 ± 65% numa-meminfo.node1.AnonHugePages
33.79 ±139% +547.7% 218.82 ± 51% numa-vmstat.node0.nr_anon_transparent_hugepages
67.47 ± 70% +353.0% 305.60 ± 65% numa-vmstat.node1.nr_anon_transparent_hugepages
10799 ± 25% -35.4% 6972 ± 8% sched_debug.cfs_rq:/.load.avg
37988 ±120% +526.0% 237792 ± 59% sched_debug.cpu.avg_idle.min
4690 ±153% -92.0% 376.83 ± 24% sched_debug.cpu.nr_switches.min
69222 ± 3% -16.7% 57628 vmstat.io.bo
0.73 ± 12% -24.9% 0.55 ± 2% vmstat.procs.b
19540 ± 24% -55.2% 8762 ± 12% vmstat.system.in
0.58 ± 14% -0.2 0.41 mpstat.cpu.all.iowait%
0.05 ± 32% -0.0 0.02 ± 14% mpstat.cpu.all.irq%
0.05 ± 14% -0.0 0.02 ± 6% mpstat.cpu.all.soft%
2.00 +2391.7% 49.83 ± 27% mpstat.max_utilization.seconds
58.54 ± 7% -24.5% 44.17 ± 13% mpstat.max_utilization_pct
99.67 ±163% +4194.7% 4280 ± 7% filebench.sum_bytes_mb/s
765489 ±163% +4194.8% 32875866 ± 7% filebench.sum_operations
12757 ±163% +4194.8% 547887 ± 7% filebench.sum_operations/s
0.24 ± 41% -99.2% 0.00 filebench.sum_time_ms/op
12757 ±163% +4194.8% 547887 ± 7% filebench.sum_writes/s
241.17 ± 80% +321.8% 1017 ± 8% filebench.time.involuntary_context_switches
22.67 ± 23% +63.2% 37.00 filebench.time.percent_of_cpu_this_job_got
37.73 ± 23% +49.1% 56.25 filebench.time.system_time
1.997e+09 ± 45% -62.3% 7.533e+08 ± 26% perf-stat.i.branch-instructions
11.93 ± 23% +3.9 15.84 ± 7% perf-stat.i.cache-miss-rate%
1.589e+08 ± 5% -36.2% 1.013e+08 ± 6% perf-stat.i.cache-references
1227 ± 13% -23.7% 937.19 ± 7% perf-stat.i.cycles-between-cache-misses
9.86e+09 ± 45% -63.3% 3.621e+09 ± 27% perf-stat.i.instructions
4.84 ± 44% +96.1% 9.48 ± 18% perf-stat.overall.MPKI
830.79 ± 40% -62.4% 312.02 ± 34% perf-stat.overall.cycles-between-cache-misses
1.994e+09 ± 45% -62.2% 7.528e+08 ± 27% perf-stat.ps.branch-instructions
1.585e+08 ± 5% -36.3% 1.01e+08 ± 6% perf-stat.ps.cache-references
9.842e+09 ± 45% -63.2% 3.62e+09 ± 27% perf-stat.ps.instructions
1.637e+12 ± 45% -62.9% 6.073e+11 ± 27% perf-stat.total.instructions
101.22 +418.0% 524.27 ± 20% proc-vmstat.nr_anon_transparent_hugepages
2918550 ± 3% +421.9% 15232014 ± 9% proc-vmstat.nr_dirtied
650592 ± 5% +66.0% 1079880 ± 4% proc-vmstat.nr_dirty
23980 -2.1% 23472 proc-vmstat.nr_kernel_stack
17286 ± 6% -5.1% 16397 proc-vmstat.nr_mapped
79441 -2.5% 77426 proc-vmstat.nr_slab_unreclaimable
662082 ± 6% +66.5% 1102087 ± 5% proc-vmstat.nr_zone_write_pending
8719968 ± 21% -48.5% 4491902 ± 10% proc-vmstat.numa_hit
8.00 ± 20% +12912.5% 1041 ± 45% proc-vmstat.numa_huge_pte_updates
8584943 ± 21% -49.2% 4359325 ± 10% proc-vmstat.numa_local
11674686 ± 3% -16.0% 9806002 proc-vmstat.pgpgout
2.00 +51250.0% 1027 ± 56% proc-vmstat.thp_fault_alloc
4.19 ±100% -1.7 2.53 ±144% perf-profile.calltrace.cycles-pp.scsi_end_request.scsi_io_completion.blk_complete_reqs.handle_softirqs.irq_exit_rcu
4.19 ±100% -1.7 2.53 ±144% perf-profile.calltrace.cycles-pp.scsi_io_completion.blk_complete_reqs.handle_softirqs.irq_exit_rcu.common_interrupt
4.24 ±100% -1.7 2.58 ±145% perf-profile.calltrace.cycles-pp.irq_exit_rcu.common_interrupt.asm_common_interrupt.cpuidle_enter_state.cpuidle_enter
4.23 ±100% -1.6 2.58 ±145% perf-profile.calltrace.cycles-pp.handle_softirqs.irq_exit_rcu.common_interrupt.asm_common_interrupt.cpuidle_enter_state
4.20 ±100% -1.6 2.57 ±145% perf-profile.calltrace.cycles-pp.blk_complete_reqs.handle_softirqs.irq_exit_rcu.common_interrupt.asm_common_interrupt
0.50 ± 46% +0.3 0.78 ± 5% perf-profile.calltrace.cycles-pp.write
0.28 ±100% +0.4 0.67 ± 6% perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.31 ±100% +0.4 0.71 ± 8% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.31 ±100% +0.4 0.71 ± 8% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
0.19 ±141% +0.5 0.64 ± 6% perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
2.50 ± 14% +0.5 3.04 ± 8% perf-profile.calltrace.cycles-pp.read
2.66 ± 14% +0.6 3.28 ± 9% perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
2.33 ± 11% +0.6 2.98 ± 14% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
2.33 ± 11% +0.7 3.00 ± 14% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
0.22 ± 20% -0.2 0.06 ± 83% perf-profile.children.cycles-pp.native_apic_mem_eoi
0.23 ± 11% -0.1 0.15 ± 24% perf-profile.children.cycles-pp.getenv
0.03 ±141% +0.1 0.10 ± 29% perf-profile.children.cycles-pp.set_task_cpu
0.01 ±223% +0.1 0.08 ± 37% perf-profile.children.cycles-pp.__radix_tree_lookup
0.00 +0.1 0.10 ± 43% perf-profile.children.cycles-pp.kmalloc_trace
0.01 ±223% +0.1 0.12 ± 37% perf-profile.children.cycles-pp.free_pcppages_bulk
0.16 ± 33% +0.1 0.29 ± 29% perf-profile.children.cycles-pp.vm_area_alloc
0.10 ± 79% +0.1 0.24 ± 26% perf-profile.children.cycles-pp.leave_mm
0.24 ± 19% +0.2 0.41 ± 36% perf-profile.children.cycles-pp.strnlen_user
0.41 ± 22% +0.2 0.58 ± 19% perf-profile.children.cycles-pp.migration_cpu_stop
0.68 ± 12% +0.2 0.86 ± 6% perf-profile.children.cycles-pp.ksys_write
0.65 ± 15% +0.2 0.84 ± 6% perf-profile.children.cycles-pp.vfs_write
0.41 ± 22% +0.2 0.62 ± 19% perf-profile.children.cycles-pp.cpu_stopper_thread
0.79 ± 10% +0.2 1.00 ± 3% perf-profile.children.cycles-pp.write
0.47 ± 28% +0.2 0.70 ± 23% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.24 ± 35% +0.2 0.47 ± 20% perf-profile.children.cycles-pp.set_pte_range
0.43 ± 28% +0.2 0.67 ± 25% perf-profile.children.cycles-pp.d_alloc_parallel
0.58 ± 23% +0.3 0.87 ± 21% perf-profile.children.cycles-pp.__lookup_slow
0.98 ± 22% +0.3 1.27 ± 13% perf-profile.children.cycles-pp.copy_process
1.39 ± 9% +0.3 1.74 ± 13% perf-profile.children.cycles-pp.filemap_map_pages
1.49 ± 9% +0.4 1.91 ± 11% perf-profile.children.cycles-pp.do_read_fault
1.75 ± 10% +0.5 2.26 ± 8% perf-profile.children.cycles-pp.do_fault
2.66 ± 14% +0.6 3.28 ± 9% perf-profile.children.cycles-pp.smpboot_thread_fn
3.94 ± 16% +0.7 4.62 ± 8% perf-profile.children.cycles-pp.read
4.09 ± 4% +0.8 4.93 ± 8% perf-profile.children.cycles-pp.asm_exc_page_fault
3.08 ± 10% +0.9 3.96 ± 8% perf-profile.children.cycles-pp.__handle_mm_fault
3.22 ± 8% +1.0 4.18 ± 8% perf-profile.children.cycles-pp.handle_mm_fault
3.44 ± 6% +1.0 4.47 ± 9% perf-profile.children.cycles-pp.do_user_addr_fault
3.45 ± 5% +1.0 4.48 ± 9% perf-profile.children.cycles-pp.exc_page_fault
20.31 ± 9% +2.6 22.93 ± 6% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
20.26 ± 9% +2.6 22.88 ± 6% perf-profile.children.cycles-pp.do_syscall_64
0.21 ± 20% -0.2 0.06 ± 83% perf-profile.self.cycles-pp.native_apic_mem_eoi
0.12 ± 30% +0.1 0.18 ± 19% perf-profile.self.cycles-pp.newidle_balance
0.01 ±223% +0.1 0.08 ± 37% perf-profile.self.cycles-pp.__radix_tree_lookup
0.00 +0.1 0.09 ± 39% perf-profile.self.cycles-pp.kmalloc_trace
0.05 ±111% +0.1 0.17 ± 36% perf-profile.self.cycles-pp.leave_mm
0.23 ± 23% +0.2 0.39 ± 40% perf-profile.self.cycles-pp.strnlen_user
0.10 ± 53% +0.2 0.30 ± 59% perf-profile.self.cycles-pp.read_counters
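
Note on reading the %change column: the following is a minimal sketch, not the lkp-tests implementation. It assumes the relative change is (new - old) / old * 100 computed from the two means, and that rows whose change has no "%" suffix (the mpstat and cache-miss-rate percentages, and the perf-profile rows) appear to report an absolute difference in percentage points. The function names pct_change and point_delta are hypothetical. Because the table shows rounded means, values recomputed this way can differ slightly from the reported ones.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (assumed formula)."""
    return (new - old) / old * 100.0


def point_delta(old: float, new: float) -> float:
    """Absolute difference, as the rows without a '%' suffix appear to use."""
    return new - old


# filebench.sum_operations/s: 12757 -> 547887
print(f"{pct_change(12757, 547887):+.1f}%")   # +4194.8%

# mpstat.cpu.all.iowait%: 0.58 -> 0.41
print(f"{point_delta(0.58, 0.41):+.1f}")      # -0.2
```

The base mean here carries a ±163% standard deviation, so the headline percentage is best read as an order-of-magnitude indication rather than a precise ratio.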




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki