2024-03-11 13:13:56

by Oliver Sang

Subject: [linus:master] [btrfs] e06cc89475: aim7.jobs-per-min -12.9% regression



Hello,

kernel test robot noticed a -12.9% regression of aim7.jobs-per-min on:


commit: e06cc89475eddc1f3a7a4d471524256152c68166 ("btrfs: fix data races when accessing the reserved amount of block reserves")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master 09e5c48fea173b72f1c763776136eeb379b1bc47]


testcase: aim7
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
parameters:

disk: 1BRD_48G
fs: btrfs
test: disk_cp
load: 1500
cpufreq_governor: performance




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240311/[email protected]

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
gcc-12/performance/1BRD_48G/btrfs/x86_64-rhel-8.3/1500/debian-12-x86_64-20240206.cgz/lkp-icl-2sp2/disk_cp/aim7

commit:
5897710b28 ("btrfs: send: don't issue unnecessary zero writes for trailing hole")
e06cc89475 ("btrfs: fix data races when accessing the reserved amount of block reserves")

5897710b28cabab0 e06cc89475eddc1f3a7a4d47152
---------------- ---------------------------
%stddev %change %stddev
\ | \
13.71 -6.3% 12.84 iostat.cpu.idle
86109 ± 5% -10.3% 77204 ± 2% meminfo.Mapped
0.29 ± 2% -0.0 0.25 ± 2% mpstat.cpu.all.usr%
249.60 +12.6% 280.99 ± 2% uptime.boot
148704 ± 3% +11.9% 166363 ± 3% numa-vmstat.node0.nr_written
148026 ± 4% +10.5% 163536 ± 3% numa-vmstat.node1.nr_written
83929 -8.8% 76554 vmstat.system.cs
202906 -4.6% 193642 vmstat.system.in
21940 ± 5% -10.2% 19706 ± 2% proc-vmstat.nr_mapped
296731 ± 4% +11.2% 329900 ± 3% proc-vmstat.nr_written
971976 +6.8% 1037759 proc-vmstat.pgfault
1190113 ± 4% +11.2% 1323358 ± 3% proc-vmstat.pgpgout
61472 ± 3% +9.8% 67507 ± 3% proc-vmstat.pgreuse
45149 -12.9% 39308 ± 2% aim7.jobs-per-min
199.49 +14.9% 229.19 ± 2% aim7.time.elapsed_time
199.49 +14.9% 229.19 ± 2% aim7.time.elapsed_time.max
106461 ± 3% +20.1% 127873 ± 2% aim7.time.involuntary_context_switches
153317 +4.7% 160598 aim7.time.minor_page_faults
22001 +16.1% 25542 ± 2% aim7.time.system_time
8341344 +4.7% 8730263 aim7.time.voluntary_context_switches
1.52 +10.0% 1.67 perf-stat.i.MPKI
7.428e+09 -2.7% 7.229e+09 perf-stat.i.branch-instructions
0.62 ± 2% -0.1 0.56 perf-stat.i.branch-miss-rate%
27712058 -10.6% 24784125 ± 2% perf-stat.i.branch-misses
24.15 +1.3 25.40 perf-stat.i.cache-miss-rate%
51305985 +5.9% 54318013 perf-stat.i.cache-misses
84790 -8.9% 77275 perf-stat.i.context-switches
8.56 +5.1% 9.00 perf-stat.i.cpi
3464 -3.4% 3346 perf-stat.i.cpu-migrations
5494 -4.1% 5271 perf-stat.i.cycles-between-cache-misses
3.253e+10 -3.4% 3.141e+10 perf-stat.i.instructions
0.18 -7.5% 0.17 perf-stat.i.ipc
4301 ± 3% -6.5% 4022 ± 2% perf-stat.i.minor-faults
4303 ± 3% -6.5% 4024 ± 2% perf-stat.i.page-faults
1.58 +9.6% 1.73 perf-stat.overall.MPKI
0.37 -0.0 0.34 perf-stat.overall.branch-miss-rate%
24.56 +1.3 25.83 perf-stat.overall.cache-miss-rate%
8.90 +4.6% 9.31 perf-stat.overall.cpi
5642 -4.5% 5386 perf-stat.overall.cycles-between-cache-misses
0.11 -4.4% 0.11 perf-stat.overall.ipc
7.412e+09 -2.6% 7.216e+09 perf-stat.ps.branch-instructions
27605707 ± 2% -10.4% 24743238 perf-stat.ps.branch-misses
51201807 +5.9% 54221492 perf-stat.ps.cache-misses
84459 -8.8% 77008 perf-stat.ps.context-switches
2.889e+11 +1.1% 2.92e+11 perf-stat.ps.cpu-cycles
3468 -3.4% 3349 perf-stat.ps.cpu-migrations
3.246e+10 -3.4% 3.135e+10 perf-stat.ps.instructions
4534 -6.7% 4228 perf-stat.ps.minor-faults
4537 -6.8% 4230 perf-stat.ps.page-faults
6.503e+12 +11.0% 7.221e+12 perf-stat.total.instructions
33.78 -0.2 33.57 perf-profile.calltrace.cycles-pp._raw_spin_lock.btrfs_block_rsv_release.btrfs_inode_rsv_release.btrfs_buffered_write.btrfs_do_write_iter
33.84 -0.2 33.66 perf-profile.calltrace.cycles-pp.btrfs_block_rsv_release.btrfs_inode_rsv_release.btrfs_buffered_write.btrfs_do_write_iter.vfs_write
33.84 -0.2 33.66 perf-profile.calltrace.cycles-pp.btrfs_inode_rsv_release.btrfs_buffered_write.btrfs_do_write_iter.vfs_write.ksys_write
33.66 -0.2 33.49 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.btrfs_block_rsv_release.btrfs_inode_rsv_release.btrfs_buffered_write
1.00 ± 4% -0.1 0.88 ± 2% perf-profile.calltrace.cycles-pp.btrfs_set_extent_delalloc.btrfs_dirty_pages.btrfs_buffered_write.btrfs_do_write_iter.vfs_write
0.94 ± 4% -0.1 0.83 ± 2% perf-profile.calltrace.cycles-pp.btrfs_get_extent.btrfs_set_extent_delalloc.btrfs_dirty_pages.btrfs_buffered_write.btrfs_do_write_iter
0.77 ± 4% -0.1 0.68 ± 2% perf-profile.calltrace.cycles-pp.btrfs_search_slot.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_set_extent_delalloc.btrfs_dirty_pages
0.77 ± 4% -0.1 0.68 ± 2% perf-profile.calltrace.cycles-pp.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_set_extent_delalloc.btrfs_dirty_pages.btrfs_buffered_write
0.57 -0.1 0.52 ± 3% perf-profile.calltrace.cycles-pp.btrfs_read_lock_root_node.btrfs_search_slot.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_set_extent_delalloc
27.57 +0.1 27.71 perf-profile.calltrace.cycles-pp.btrfs_dirty_pages.btrfs_buffered_write.btrfs_do_write_iter.vfs_write.ksys_write
98.02 +0.2 98.16 perf-profile.calltrace.cycles-pp.write
97.96 +0.2 98.12 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
97.96 +0.2 98.12 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
97.92 +0.2 98.08 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
97.82 +0.2 97.98 perf-profile.calltrace.cycles-pp.btrfs_buffered_write.btrfs_do_write_iter.vfs_write.ksys_write.do_syscall_64
97.90 +0.2 98.07 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
97.85 +0.2 98.02 perf-profile.calltrace.cycles-pp.btrfs_do_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
34.80 +0.3 35.06 perf-profile.calltrace.cycles-pp._raw_spin_lock.__reserve_bytes.btrfs_reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.btrfs_buffered_write
34.69 +0.3 34.96 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__reserve_bytes.btrfs_reserve_metadata_bytes.btrfs_delalloc_reserve_metadata
26.02 +0.3 26.31 perf-profile.calltrace.cycles-pp._raw_spin_lock.btrfs_block_rsv_release.btrfs_inode_rsv_release.btrfs_clear_delalloc_extent.clear_state_bit
26.39 +0.3 26.68 perf-profile.calltrace.cycles-pp.__clear_extent_bit.btrfs_dirty_pages.btrfs_buffered_write.btrfs_do_write_iter.vfs_write
26.34 +0.3 26.63 perf-profile.calltrace.cycles-pp.clear_state_bit.__clear_extent_bit.btrfs_dirty_pages.btrfs_buffered_write.btrfs_do_write_iter
35.09 +0.3 35.38 perf-profile.calltrace.cycles-pp.__reserve_bytes.btrfs_reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.btrfs_buffered_write.btrfs_do_write_iter
35.09 +0.3 35.38 perf-profile.calltrace.cycles-pp.btrfs_reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.btrfs_buffered_write.btrfs_do_write_iter.vfs_write
26.33 +0.3 26.63 perf-profile.calltrace.cycles-pp.btrfs_clear_delalloc_extent.clear_state_bit.__clear_extent_bit.btrfs_dirty_pages.btrfs_buffered_write
26.07 +0.3 26.37 perf-profile.calltrace.cycles-pp.btrfs_block_rsv_release.btrfs_inode_rsv_release.btrfs_clear_delalloc_extent.clear_state_bit.__clear_extent_bit
26.08 +0.3 26.38 perf-profile.calltrace.cycles-pp.btrfs_inode_rsv_release.btrfs_clear_delalloc_extent.clear_state_bit.__clear_extent_bit.btrfs_dirty_pages
35.18 +0.3 35.50 perf-profile.calltrace.cycles-pp.btrfs_delalloc_reserve_metadata.btrfs_buffered_write.btrfs_do_write_iter.vfs_write.ksys_write
25.94 +0.3 26.26 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.btrfs_block_rsv_release.btrfs_inode_rsv_release.btrfs_clear_delalloc_extent
1.01 ± 4% -0.1 0.88 ± 2% perf-profile.children.cycles-pp.btrfs_set_extent_delalloc
0.94 ± 4% -0.1 0.83 ± 2% perf-profile.children.cycles-pp.btrfs_get_extent
0.83 ± 4% -0.1 0.74 ± 2% perf-profile.children.cycles-pp.btrfs_search_slot
0.77 ± 4% -0.1 0.68 ± 2% perf-profile.children.cycles-pp.btrfs_lookup_file_extent
0.36 ± 3% -0.1 0.30 ± 5% perf-profile.children.cycles-pp.read
0.58 -0.1 0.52 ± 3% perf-profile.children.cycles-pp.btrfs_read_lock_root_node
0.56 -0.1 0.51 ± 2% perf-profile.children.cycles-pp.__btrfs_tree_read_lock
0.56 -0.1 0.51 ± 2% perf-profile.children.cycles-pp.down_read
0.54 -0.1 0.49 ± 2% perf-profile.children.cycles-pp.rwsem_down_read_slowpath
0.11 ± 4% -0.0 0.06 ± 9% perf-profile.children.cycles-pp.calc_available_free_space
0.28 ± 4% -0.0 0.24 ± 5% perf-profile.children.cycles-pp.ksys_read
0.36 ± 2% -0.0 0.31 ± 2% perf-profile.children.cycles-pp.prepare_pages
0.26 ± 4% -0.0 0.22 ± 6% perf-profile.children.cycles-pp.vfs_read
0.45 ± 2% -0.0 0.42 perf-profile.children.cycles-pp.__schedule
0.43 ± 2% -0.0 0.40 perf-profile.children.cycles-pp.schedule
0.23 ± 3% -0.0 0.20 ± 2% perf-profile.children.cycles-pp.__set_extent_bit
0.14 ± 4% -0.0 0.11 ± 4% perf-profile.children.cycles-pp.btrfs_space_info_update_bytes_may_use
0.42 ± 2% -0.0 0.40 perf-profile.children.cycles-pp.schedule_preempt_disabled
0.19 -0.0 0.16 ± 3% perf-profile.children.cycles-pp.pagecache_get_page
0.36 ± 2% -0.0 0.34 perf-profile.children.cycles-pp.load_balance
0.19 ± 3% -0.0 0.16 ± 3% perf-profile.children.cycles-pp.__filemap_get_folio
0.37 ± 2% -0.0 0.35 perf-profile.children.cycles-pp.newidle_balance
0.31 ± 2% -0.0 0.28 perf-profile.children.cycles-pp.cpu_startup_entry
0.31 ± 2% -0.0 0.28 perf-profile.children.cycles-pp.do_idle
0.31 ± 3% -0.0 0.29 perf-profile.children.cycles-pp.find_busiest_group
0.31 ± 2% -0.0 0.28 perf-profile.children.cycles-pp.secondary_startup_64_no_verify
0.38 ± 2% -0.0 0.35 perf-profile.children.cycles-pp.pick_next_task_fair
0.30 -0.0 0.28 ± 2% perf-profile.children.cycles-pp.start_secondary
0.29 ± 3% -0.0 0.27 perf-profile.children.cycles-pp.update_sg_lb_stats
0.35 -0.0 0.33 perf-profile.children.cycles-pp.__close
0.27 ± 2% -0.0 0.25 ± 2% perf-profile.children.cycles-pp.cpuidle_idle_call
0.34 -0.0 0.32 perf-profile.children.cycles-pp.btrfs_evict_inode
0.14 -0.0 0.12 ± 4% perf-profile.children.cycles-pp.btrfs_read_folio
0.34 -0.0 0.32 perf-profile.children.cycles-pp.evict
0.15 ± 2% -0.0 0.13 ± 5% perf-profile.children.cycles-pp.prepare_uptodate_page
0.35 -0.0 0.33 perf-profile.children.cycles-pp.__x64_sys_close
0.20 ± 2% -0.0 0.18 ± 2% perf-profile.children.cycles-pp.acpi_idle_enter
0.34 -0.0 0.33 perf-profile.children.cycles-pp.__dentry_kill
0.35 -0.0 0.33 perf-profile.children.cycles-pp.__fput
0.34 -0.0 0.33 perf-profile.children.cycles-pp.dentry_kill
0.35 -0.0 0.33 perf-profile.children.cycles-pp.dput
0.31 -0.0 0.29 perf-profile.children.cycles-pp.update_sd_lb_stats
0.20 -0.0 0.18 ± 2% perf-profile.children.cycles-pp.acpi_safe_halt
0.12 ± 3% -0.0 0.10 ± 3% perf-profile.children.cycles-pp.alloc_extent_state
0.12 -0.0 0.10 ± 4% perf-profile.children.cycles-pp.kmem_cache_alloc
0.15 ± 3% -0.0 0.14 ± 3% perf-profile.children.cycles-pp.set_extent_bit
0.21 -0.0 0.20 ± 2% perf-profile.children.cycles-pp.cpuidle_enter
0.09 ± 5% -0.0 0.07 ± 5% perf-profile.children.cycles-pp.lock_and_cleanup_extent_if_need
0.21 -0.0 0.19 ± 2% perf-profile.children.cycles-pp.cpuidle_enter_state
0.07 ± 6% -0.0 0.06 perf-profile.children.cycles-pp.btrfs_folio_clamp_clear_checked
0.11 -0.0 0.10 ± 4% perf-profile.children.cycles-pp.btrfs_do_readpage
0.08 ± 5% -0.0 0.07 perf-profile.children.cycles-pp.btrfs_drop_pages
0.09 ± 4% -0.0 0.08 perf-profile.children.cycles-pp.truncate_inode_pages_range
0.08 ± 5% -0.0 0.07 ± 6% perf-profile.children.cycles-pp.btrfs_write_check
0.07 -0.0 0.06 perf-profile.children.cycles-pp.btrfs_create_new_inode
0.08 -0.0 0.07 perf-profile.children.cycles-pp.lock_extent
0.06 -0.0 0.05 perf-profile.children.cycles-pp.kmem_cache_free
0.15 -0.0 0.14 perf-profile.children.cycles-pp.asm_sysvec_call_function_single
99.48 +0.0 99.52 perf-profile.children.cycles-pp.do_syscall_64
99.49 +0.0 99.53 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.21 +0.1 0.26 perf-profile.children.cycles-pp.need_preemptive_reclaim
60.25 +0.1 60.37 perf-profile.children.cycles-pp.btrfs_block_rsv_release
59.96 +0.1 60.07 perf-profile.children.cycles-pp.btrfs_inode_rsv_release
27.57 +0.1 27.71 perf-profile.children.cycles-pp.btrfs_dirty_pages
98.06 +0.1 98.21 perf-profile.children.cycles-pp.write
97.95 +0.2 98.11 perf-profile.children.cycles-pp.ksys_write
97.83 +0.2 97.99 perf-profile.children.cycles-pp.btrfs_buffered_write
97.94 +0.2 98.10 perf-profile.children.cycles-pp.vfs_write
97.86 +0.2 98.02 perf-profile.children.cycles-pp.btrfs_do_write_iter
26.57 +0.3 26.84 perf-profile.children.cycles-pp.__clear_extent_bit
26.44 +0.3 26.72 perf-profile.children.cycles-pp.clear_state_bit
35.50 +0.3 35.79 perf-profile.children.cycles-pp.__reserve_bytes
26.37 +0.3 26.67 perf-profile.children.cycles-pp.btrfs_clear_delalloc_extent
35.23 +0.3 35.52 perf-profile.children.cycles-pp.btrfs_reserve_metadata_bytes
35.19 +0.3 35.50 perf-profile.children.cycles-pp.btrfs_delalloc_reserve_metadata
95.80 +0.4 96.18 perf-profile.children.cycles-pp._raw_spin_lock
95.21 +0.4 95.60 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
0.70 -0.0 0.67 perf-profile.self.cycles-pp._raw_spin_lock
0.13 ± 2% -0.0 0.11 ± 3% perf-profile.self.cycles-pp.btrfs_space_info_update_bytes_may_use
0.07 ± 5% -0.0 0.06 perf-profile.self.cycles-pp.btrfs_folio_clamp_clear_checked
0.08 ± 4% -0.0 0.07 perf-profile.self.cycles-pp.kmem_cache_alloc
0.07 -0.0 0.06 perf-profile.self.cycles-pp.memset_orig
0.00 +0.1 0.08 ± 6% perf-profile.self.cycles-pp.btrfs_block_rsv_release
94.48 +0.4 94.88 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
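
For readers checking the %change column by hand, here is a minimal sketch,
assuming the column is the relative difference between the two per-commit
means shown in each row (parent commit on the left, e06cc89475 on the right).
The helper name pct_change is made up for illustration and is not part of the
lkp-tests tooling; the numbers are copied from the table above.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, expressed in percent."""
    return (new - base) / base * 100.0


# (parent commit mean, e06cc89475 mean), copied from the comparison table.
rows = {
    "aim7.jobs-per-min": (45149, 39308),
    "aim7.time.elapsed_time": (199.49, 229.19),
    "aim7.time.system_time": (22001, 25542),
}

for name, (base, new) in rows.items():
    # One decimal place and an explicit sign, matching the report's layout.
    print(f"{name}: {pct_change(base, new):+.1f}%")
```

Rows whose metric name ends in "%" (for example perf-stat.i.cache-miss-rate%)
appear to report an absolute difference in percentage points instead, so this
helper does not apply to them.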




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki