2024-03-31 14:29:49

by Yujie Liu

Subject: [linus:master] [mm] f77171d241: vm-scalability.throughput 34.9% improvement

Hello,

kernel test robot noticed a 34.9% improvement of vm-scalability.throughput on:

commit: f77171d241e379ea93448a53d58104191e02135c ("mm: allow non-hugetlb large folios to be batch processed")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: vm-scalability
test machine: 224 threads, 4 sockets, Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G of memory
parameters:

	runtime: 300s
	test: truncate
	cpufreq_governor: performance

Details are as below:

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240331/[email protected]

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/300s/lkp-cpl-4sp2/truncate/vm-scalability
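
The two lines above pair slash-separated field names with slash-separated values by position (for example, tbox_group is lkp-cpl-4sp2 and rootfs is debian-12-x86_64-20240206.cgz). A minimal Python sketch of that pairing follows. parse_job_line is a hypothetical helper, not part of lkp-tests, and it assumes no individual value contains a "/" (true for this report).

```python
# Field names and values copied from the report header; they match by position.
KEYS = "compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase"
VALUES = ("gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/"
          "300s/lkp-cpl-4sp2/truncate/vm-scalability")


def parse_job_line(keys: str, values: str) -> dict[str, str]:
    """Pair each slash-separated key with the value in the same position.

    Hypothetical helper; assumes no single value contains a '/' character.
    """
    key_list = keys.split("/")
    value_list = values.split("/")
    if len(key_list) != len(value_list):
        raise ValueError("key and value counts differ")
    return dict(zip(key_list, value_list))


if __name__ == "__main__":
    for key, value in parse_job_line(KEYS, VALUES).items():
        print(f"{key:>16}: {value}")
```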

commit:
31b2ff82ae ("mm: handle large folios in free_unref_folios()")
f77171d241 ("mm: allow non-hugetlb large folios to be batch processed")

31b2ff82aefb33ce f77171d241e379ea93448a53d58
---------------- ---------------------------
         %stddev     %change           %stddev
             \          |                  \
 7.397e+08 ±  6%      +34.9%   9.978e+08 ±  3%  vm-scalability.median
 7.397e+08 ±  6%      +34.9%   9.978e+08 ±  3%  vm-scalability.throughput
    193.12 ±  7%      -16.4%      161.38 ±  3%  vm-scalability.time.percent_of_cpu_this_job_got
     84.58 ±  8%      -16.5%       70.62 ±  3%  vm-scalability.time.system_time
    154795 ± 85%     +168.7%      415963 ± 28%  numa-meminfo.node0.Inactive(anon)
  41174935 ± 36%      -81.1%     7801569 ± 30%  proc-vmstat.pgfree
     38644 ± 85%     +169.0%      103935 ± 28%  numa-vmstat.node0.nr_inactive_anon
     38644 ± 85%     +169.0%      103937 ± 28%  numa-vmstat.node0.nr_zone_inactive_anon
     18.05 ± 12%       -18.1        0.00        perf-profile.calltrace.cycles-pp.__folio_put_large.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat
     18.02 ± 12%       -18.0        0.00        perf-profile.calltrace.cycles-pp.__page_cache_release.__folio_put_large.folios_put_refs.truncate_inode_pages_range.evict
     17.68 ± 12%       -17.7        0.00        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__page_cache_release.__folio_put_large.folios_put_refs.truncate_inode_pages_range
     17.63 ± 12%       -17.6        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.__folio_put_large.folios_put_refs
     17.57 ± 12%       -17.6        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__page_cache_release.__folio_put_large
     22.14 ± 12%        -5.9       16.22 ±  8%  perf-profile.calltrace.cycles-pp.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat.do_syscall_64
     22.15 ± 12%        -5.9       16.23 ±  8%  perf-profile.calltrace.cycles-pp.evict.do_unlinkat.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe
     22.16 ± 12%        -5.9       16.24 ±  8%  perf-profile.calltrace.cycles-pp.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
     22.16 ± 12%        -5.9       16.24 ±  8%  perf-profile.calltrace.cycles-pp.do_unlinkat.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
     22.16 ± 12%        -5.9       16.24 ±  8%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
     22.16 ± 12%        -5.9       16.24 ±  8%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.unlinkat
     22.16 ± 12%        -5.9       16.24 ±  8%  perf-profile.calltrace.cycles-pp.unlinkat
     21.78 ± 12%        -5.7       16.05 ±  8%  perf-profile.calltrace.cycles-pp.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat
      1.14 ±  9%        +0.1        1.29 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_trylock.rebalance_domains.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt
      1.98 ±  3%        +0.2        2.17 ±  2%  perf-profile.calltrace.cycles-pp.rebalance_domains.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      2.24 ±  3%        +0.2        2.44 ±  4%  perf-profile.calltrace.cycles-pp.memcpy_toio.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm
      2.27 ±  3%        +0.2        2.48 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit
      2.27 ±  3%        +0.2        2.48 ±  4%  perf-profile.calltrace.cycles-pp.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb
      2.27 ±  3%        +0.2        2.48 ±  4%  perf-profile.calltrace.cycles-pp.commit_tail.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty
      2.27 ±  3%        +0.2        2.48 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit.drm_atomic_commit
      2.27 ±  3%        +0.2        2.48 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work.worker_thread
      2.27 ±  3%        +0.2        2.48 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work
      2.27 ±  3%        +0.2        2.48 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work
      2.27 ±  3%        +0.2        2.48 ±  4%  perf-profile.calltrace.cycles-pp.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail
      2.27 ±  3%        +0.2        2.48 ±  4%  perf-profile.calltrace.cycles-pp.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail
      2.34 ±  3%        +0.2        2.56 ±  4%  perf-profile.calltrace.cycles-pp.drm_fb_helper_damage_work.process_one_work.worker_thread.kthread.ret_from_fork
      2.34 ±  3%        +0.2        2.56 ±  4%  perf-profile.calltrace.cycles-pp.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work.worker_thread.kthread
      2.41 ±  3%        +0.2        2.64 ±  4%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      2.38 ±  3%        +0.2        2.61 ±  4%  perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      2.84 ±  4%        +0.2        3.09 ±  2%  perf-profile.calltrace.cycles-pp.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt
      6.56 ±  2%        +0.6        7.18 ±  3%  perf-profile.calltrace.cycles-pp.rep_movs_alternative._copy_to_iter.copy_page_to_iter.filemap_read.xfs_file_buffered_read
      6.90 ±  2%        +0.7        7.55 ±  3%  perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.filemap_read.xfs_file_buffered_read.xfs_file_read_iter
      6.98 ±  2%        +0.7        7.64 ±  3%  perf-profile.calltrace.cycles-pp.copy_page_to_iter.filemap_read.xfs_file_buffered_read.xfs_file_read_iter.vfs_read
     14.15 ±  3%        +1.3       15.48        perf-profile.calltrace.cycles-pp.memset_orig.zero_user_segments.iomap_readpage_iter.iomap_readahead.read_pages
     14.19 ±  3%        +1.3       15.53        perf-profile.calltrace.cycles-pp.zero_user_segments.iomap_readpage_iter.iomap_readahead.read_pages.page_cache_ra_order
     14.30 ±  3%        +1.3       15.64        perf-profile.calltrace.cycles-pp.iomap_readpage_iter.iomap_readahead.read_pages.page_cache_ra_order.filemap_get_pages
     14.36 ±  3%        +1.4       15.72        perf-profile.calltrace.cycles-pp.iomap_readahead.read_pages.page_cache_ra_order.filemap_get_pages.filemap_read
     14.37 ±  3%        +1.4       15.73        perf-profile.calltrace.cycles-pp.read_pages.page_cache_ra_order.filemap_get_pages.filemap_read.xfs_file_buffered_read
     14.81 ±  3%        +1.4       16.22        perf-profile.calltrace.cycles-pp.page_cache_ra_order.filemap_get_pages.filemap_read.xfs_file_buffered_read.xfs_file_read_iter
     14.86 ±  3%        +1.4       16.28        perf-profile.calltrace.cycles-pp.filemap_get_pages.filemap_read.xfs_file_buffered_read.xfs_file_read_iter.vfs_read
     21.90 ±  3%        +2.1       23.98        perf-profile.calltrace.cycles-pp.filemap_read.xfs_file_buffered_read.xfs_file_read_iter.vfs_read.ksys_read
     21.92 ±  3%        +2.1       24.01        perf-profile.calltrace.cycles-pp.xfs_file_buffered_read.xfs_file_read_iter.vfs_read.ksys_read.do_syscall_64
     21.94 ±  3%        +2.1       24.02        perf-profile.calltrace.cycles-pp.xfs_file_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
     22.08 ±  3%        +2.1       24.18        perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
     22.09 ±  3%        +2.1       24.20        perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
     22.11 ±  3%        +2.1       24.22        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
     22.15 ±  3%        +2.1       24.27        perf-profile.calltrace.cycles-pp.read
     22.11 ±  3%        +2.1       24.23        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
     45.34 ±  3%        +4.1       49.45 ±  2%  perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
     45.76 ±  3%        +4.1       49.89 ±  2%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
     45.87 ±  3%        +4.1       50.00 ±  2%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
     66.58 ±  3%        +5.8       72.37        perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
      0.00             +15.2       15.18 ±  8%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.free_one_page.free_unref_folios.folios_put_refs
      0.00             +15.3       15.26 ±  8%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.free_one_page.free_unref_folios.folios_put_refs.truncate_inode_pages_range
      0.00             +15.4       15.40 ±  8%  perf-profile.calltrace.cycles-pp.free_one_page.free_unref_folios.folios_put_refs.truncate_inode_pages_range.evict
      0.00             +15.8       15.85 ±  8%  perf-profile.calltrace.cycles-pp.free_unref_folios.folios_put_refs.truncate_inode_pages_range.evict.do_unlinkat
     18.06 ± 12%       -18.1        0.00        perf-profile.children.cycles-pp.__folio_put_large
     18.09 ± 12%       -17.9        0.16 ± 25%  perf-profile.children.cycles-pp.__page_cache_release
     17.78 ± 12%       -17.7        0.04 ±151%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
     22.15 ± 12%        -5.9       16.23 ±  8%  perf-profile.children.cycles-pp.evict
     22.14 ± 12%        -5.9       16.22 ±  8%  perf-profile.children.cycles-pp.truncate_inode_pages_range
     22.16 ± 12%        -5.9       16.24 ±  8%  perf-profile.children.cycles-pp.__x64_sys_unlinkat
     22.16 ± 12%        -5.9       16.24 ±  8%  perf-profile.children.cycles-pp.do_unlinkat
     22.16 ± 12%        -5.9       16.24 ±  8%  perf-profile.children.cycles-pp.unlinkat
     21.85 ± 12%        -5.8       16.07 ±  8%  perf-profile.children.cycles-pp.folios_put_refs
      0.26 ± 10%        -0.2        0.07 ± 12%  perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
      0.25 ± 13%        -0.2        0.06 ±  8%  perf-profile.children.cycles-pp.delete_from_page_cache_batch
      0.17 ±  6%        -0.1        0.08 ±  8%  perf-profile.children.cycles-pp.__mod_lruvec_state
      0.16 ±  7%        -0.1        0.08 ±  9%  perf-profile.children.cycles-pp.__mod_node_page_state
      0.07 ±  9%        -0.0        0.03 ± 77%  perf-profile.children.cycles-pp.begin_new_exec
      0.14 ±  7%        -0.0        0.11 ±  4%  perf-profile.children.cycles-pp.__mmput
      0.14 ±  6%        -0.0        0.10 ±  4%  perf-profile.children.cycles-pp.exit_mmap
      0.07 ±  8%        -0.0        0.03 ± 78%  perf-profile.children.cycles-pp.folio_batch_move_lru
      0.14 ±  5%        -0.0        0.12 ±  5%  perf-profile.children.cycles-pp.load_elf_binary
      0.14 ±  3%        -0.0        0.12 ±  5%  perf-profile.children.cycles-pp.exec_binprm
      0.14 ±  3%        -0.0        0.12 ±  5%  perf-profile.children.cycles-pp.search_binary_handler
      0.17 ±  4%        -0.0        0.14 ±  4%  perf-profile.children.cycles-pp.bprm_execve
      0.09 ±  7%        +0.0        0.11 ±  9%  perf-profile.children.cycles-pp.__filemap_add_folio
      0.13 ±  8%        +0.0        0.16 ±  7%  perf-profile.children.cycles-pp.filemap_add_folio
      0.32 ±  3%        +0.0        0.35        perf-profile.children.cycles-pp.read_tsc
      0.27 ±  3%        +0.0        0.31 ±  4%  perf-profile.children.cycles-pp.rcu_core
      0.52 ±  3%        +0.0        0.56 ±  3%  perf-profile.children.cycles-pp.update_sg_lb_stats
      0.35 ±  5%        +0.0        0.39 ±  5%  perf-profile.children.cycles-pp.run_rebalance_domains
      0.00              +0.1        0.07 ±  9%  perf-profile.children.cycles-pp.free_tail_page_prepare
      1.20 ±  9%        +0.2        1.35 ±  2%  perf-profile.children.cycles-pp._raw_spin_trylock
      2.12 ±  2%        +0.2        2.32 ±  2%  perf-profile.children.cycles-pp.rebalance_domains
      2.27 ±  3%        +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.drm_atomic_helper_commit_planes
      2.27 ±  3%        +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.ast_mode_config_helper_atomic_commit_tail
      2.27 ±  3%        +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.commit_tail
      2.27 ±  3%        +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.drm_atomic_helper_commit_tail_rpm
      2.27 ±  3%        +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.drm_atomic_helper_dirtyfb
      2.27 ±  3%        +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.drm_atomic_commit
      2.27 ±  3%        +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.drm_atomic_helper_commit
      2.27 ±  3%        +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.ast_primary_plane_helper_atomic_update
      2.27 ±  3%        +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.drm_fb_memcpy
      2.27 ±  3%        +0.2        2.48 ±  4%  perf-profile.children.cycles-pp.memcpy_toio
      2.34 ±  3%        +0.2        2.56 ±  4%  perf-profile.children.cycles-pp.drm_fb_helper_damage_work
      2.34 ±  3%        +0.2        2.56 ±  4%  perf-profile.children.cycles-pp.drm_fbdev_generic_helper_fb_dirty
      2.41 ±  3%        +0.2        2.64 ±  4%  perf-profile.children.cycles-pp.worker_thread
      2.38 ±  3%        +0.2        2.61 ±  4%  perf-profile.children.cycles-pp.process_one_work
      3.12 ±  4%        +0.3        3.39 ±  2%  perf-profile.children.cycles-pp.__do_softirq
      3.50 ±  4%        +0.3        3.83 ±  4%  perf-profile.children.cycles-pp.irq_exit_rcu
      0.00              +0.4        0.38 ±  7%  perf-profile.children.cycles-pp.free_unref_page_prepare
      6.59 ±  2%        +0.6        7.21 ±  3%  perf-profile.children.cycles-pp.rep_movs_alternative
      6.94 ±  2%        +0.7        7.60 ±  3%  perf-profile.children.cycles-pp._copy_to_iter
      6.99 ±  2%        +0.7        7.65 ±  3%  perf-profile.children.cycles-pp.copy_page_to_iter
     14.17 ±  3%        +1.3       15.51        perf-profile.children.cycles-pp.memset_orig
     14.19 ±  3%        +1.3       15.53        perf-profile.children.cycles-pp.zero_user_segments
     14.30 ±  3%        +1.3       15.64        perf-profile.children.cycles-pp.iomap_readpage_iter
     14.36 ±  3%        +1.4       15.72        perf-profile.children.cycles-pp.iomap_readahead
     14.37 ±  3%        +1.4       15.73        perf-profile.children.cycles-pp.read_pages
     14.81 ±  3%        +1.4       16.22        perf-profile.children.cycles-pp.page_cache_ra_order
     14.86 ±  3%        +1.4       16.28        perf-profile.children.cycles-pp.filemap_get_pages
     21.90 ±  3%        +2.1       23.99        perf-profile.children.cycles-pp.filemap_read
     21.92 ±  3%        +2.1       24.01        perf-profile.children.cycles-pp.xfs_file_buffered_read
     21.94 ±  3%        +2.1       24.03        perf-profile.children.cycles-pp.xfs_file_read_iter
     22.09 ±  3%        +2.1       24.20        perf-profile.children.cycles-pp.vfs_read
     22.11 ±  3%        +2.1       24.22        perf-profile.children.cycles-pp.ksys_read
     22.18 ±  3%        +2.1       24.30        perf-profile.children.cycles-pp.read
     42.82 ±  3%        +3.8       46.65 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
     45.46 ±  3%        +4.1       49.57 ±  2%  perf-profile.children.cycles-pp.acpi_safe_halt
     45.57 ±  3%        +4.1       49.68 ±  2%  perf-profile.children.cycles-pp.acpi_idle_enter
     45.99 ±  3%        +4.1       50.12 ±  2%  perf-profile.children.cycles-pp.cpuidle_enter_state
     46.09 ±  3%        +4.1       50.22 ±  2%  perf-profile.children.cycles-pp.cpuidle_enter
      1.12 ± 19%       +14.3       15.41 ±  8%  perf-profile.children.cycles-pp.free_one_page
      0.00             +15.9       15.86 ±  8%  perf-profile.children.cycles-pp.free_unref_folios
      0.16 ±  7%        -0.1        0.08 ±  9%  perf-profile.self.cycles-pp.__mod_node_page_state
      0.46 ±  3%        -0.0        0.42 ±  2%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.31 ±  4%        +0.0        0.35 ±  2%  perf-profile.self.cycles-pp.read_tsc
      0.40 ±  4%        +0.0        0.44 ±  5%  perf-profile.self.cycles-pp._copy_to_iter
      0.38 ±  2%        +0.0        0.43 ±  2%  perf-profile.self.cycles-pp.menu_select
      0.00              +0.1        0.05 ±  6%  perf-profile.self.cycles-pp.free_tail_page_prepare
      1.19 ±  9%        +0.2        1.34 ±  2%  perf-profile.self.cycles-pp._raw_spin_trylock
      2.26 ±  3%        +0.2        2.47 ±  4%  perf-profile.self.cycles-pp.memcpy_toio
      0.00              +0.3        0.33 ±  7%  perf-profile.self.cycles-pp.free_unref_page_prepare
      6.50 ±  2%        +0.6        7.11 ±  3%  perf-profile.self.cycles-pp.rep_movs_alternative
     14.09 ±  3%        +1.3       15.42        perf-profile.self.cycles-pp.memset_orig
     26.19 ±  4%        +2.1       28.29 ±  2%  perf-profile.self.cycles-pp.acpi_safe_halt
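
A note on reading the %change column: for metrics such as vm-scalability.throughput it is the relative change between the two commits (7.397e+08 to 9.978e+08 is +34.9%), whereas for perf-profile.* rows it is the absolute difference in percentage points of the cycles profile (22.14 to 16.22 is shown as -5.9, not -26.7%). The Python sketch below recomputes a few rows from the printed values. The helper names are illustrative only, not part of lkp-tests, and because the printed values are themselves rounded, a recomputed delta may occasionally differ from the report in the last digit.

```python
"""Recompute the %change column for a few rows of the comparison table."""


def relative_change_pct(before: float, after: float) -> float:
    """Relative change in percent (used for e.g. vm-scalability.* rows)."""
    return (after - before) / before * 100.0


def point_change(before: float, after: float) -> float:
    """Absolute difference in percentage points (used for perf-profile.* rows)."""
    return after - before


# name: (value at 31b2ff82ae, value at f77171d241, %change as printed in the report)
RELATIVE_ROWS = {
    "vm-scalability.throughput": (7.397e08, 9.978e08, 34.9),
    "vm-scalability.time.system_time": (84.58, 70.62, -16.5),
    "numa-meminfo.node0.Inactive(anon)": (154795, 415963, 168.7),
    "proc-vmstat.pgfree": (41174935, 7801569, -81.1),
}
POINT_ROWS = {
    "perf-profile.children.cycles-pp.folios_put_refs": (21.85, 16.07, -5.8),
    "perf-profile.children.cycles-pp.free_one_page": (1.12, 15.41, 14.3),
    "perf-profile.children.cycles-pp.__folio_put_large": (18.06, 0.00, -18.1),
}

if __name__ == "__main__":
    for name, (before, after, printed) in RELATIVE_ROWS.items():
        print(f"{name}: computed {relative_change_pct(before, after):+.1f}%, "
              f"reported {printed:+.1f}%")
    for name, (before, after, printed) in POINT_ROWS.items():
        print(f"{name}: computed {point_change(before, after):+.1f} points, "
              f"reported {printed:+.1f}")
```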


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki