Date: 2024-02-20 08:26:50
From: kernel test robot

Subject: [linus:master] [readahead] ab4443fe3c: vm-scalability.throughput -21.4% regression



Hello,

kernel test robot noticed a -21.4% regression of vm-scalability.throughput on:


commit: ab4443fe3ca6298663a55c4a70efc6c3ce913ca6 ("readahead: avoid multiple marked readahead pages")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
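
For readers unfamiliar with the mechanism: readahead flags exactly one folio
per window with PG_readahead, and a reader hitting that folio triggers the
next asynchronous readahead round. Below is a minimal sketch of the mark
placement this commit is concerned with (illustrative only, not the commit's
actual diff; ra_mark_folio() is a hypothetical helper, and the 2^order folio
size is an assumption of the sketch):

	/*
	 * Flag the folio that contains the readahead mark.  With large
	 * folios the mark has to be aligned to the folio boundary;
	 * otherwise more than one folio in a window can end up flagged,
	 * and each hit kicks off another asynchronous readahead round.
	 */
	static void ra_mark_folio(struct folio *folio, pgoff_t index,
				  pgoff_t mark, unsigned int order)
	{
		if (round_down(mark, 1UL << order) == index)
			folio_set_readahead(folio);
	}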

testcase: vm-scalability
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
parameters:

runtime: 300s
test: lru-file-readtwice
cpufreq_governor: performance



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


Details are as follows:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240220/[email protected]

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/lkp-spr-2sp4/lru-file-readtwice/vm-scalability

commit:
f0b7a0d1d4 ("Merge branch 'master' into mm-hotfixes-stable")
ab4443fe3c ("readahead: avoid multiple marked readahead pages")

f0b7a0d1d46625db ab4443fe3ca6298663a55c4a70e
---------------- ---------------------------
%stddev %change %stddev
\ | \
12.33 ? 8% +74.7% 21.54 ? 3% vmstat.procs.b
6.641e+09 ? 7% +43.4% 9.522e+09 ? 3% cpuidle..time
7219825 ? 7% +40.7% 10156643 ? 3% cpuidle..usage
87356 ? 44% +130.7% 201564 ? 12% meminfo.Active(anon)
711730 +26.7% 901680 meminfo.SUnreclaim
198.25 +23.7% 245.26 uptime.boot
18890 ? 2% +14.7% 21667 ? 2% uptime.idle
0.17 ? 62% +0.5 0.70 ? 34% mpstat.cpu.all.iowait%
0.03 ? 5% -0.0 0.02 ? 2% mpstat.cpu.all.soft%
0.83 ? 3% -0.2 0.65 ? 4% mpstat.cpu.all.usr%
347214 ? 10% +19.9% 416202 ? 2% numa-meminfo.node0.SUnreclaim
1.525e+08 ? 4% +13.4% 1.728e+08 ? 4% numa-meminfo.node1.Active
1.524e+08 ? 4% +13.3% 1.727e+08 ? 4% numa-meminfo.node1.Active(file)
71750516 ? 10% -24.9% 53877171 ? 13% numa-meminfo.node1.Inactive
71127836 ? 10% -25.1% 53268721 ? 13% numa-meminfo.node1.Inactive(file)
364797 ? 10% +33.0% 485106 ? 2% numa-meminfo.node1.SUnreclaim
3610954 ? 6% +40.2% 5062891 ? 3% turbostat.C1E
3627684 ? 7% +40.9% 5111624 ? 3% turbostat.C6
12.35 ? 55% -61.0% 4.82 ? 50% turbostat.IPC
31624764 ? 2% +33.5% 42205318 turbostat.IRQ
3.60 ? 24% -1.7 1.94 ? 28% turbostat.PKG_%
12438 ? 4% +90.4% 23687 ? 23% turbostat.POLL
48.81 -12.6% 42.65 turbostat.RAMWatt
24934637 ? 9% +83.8% 45836252 ? 5% numa-numastat.node0.local_node
3271697 ? 22% +70.7% 5586210 ? 22% numa-numastat.node0.numa_foreign
25077126 ? 9% +83.3% 45969061 ? 5% numa-numastat.node0.numa_hit
4703977 ? 10% +159.8% 12220561 ? 7% numa-numastat.node0.numa_miss
4847049 ? 9% +154.8% 12350702 ? 7% numa-numastat.node0.other_node
26364328 ? 5% +111.3% 55706473 ? 3% numa-numastat.node1.local_node
4704476 ? 10% +159.7% 12219530 ? 7% numa-numastat.node1.numa_foreign
26458496 ? 5% +110.9% 55813309 ? 3% numa-numastat.node1.numa_hit
3271887 ? 22% +70.7% 5586065 ? 22% numa-numastat.node1.numa_miss
3363897 ? 20% +69.2% 5691334 ? 22% numa-numastat.node1.other_node
186286 ? 2% -24.3% 140930 ? 2% vm-scalability.median
6476 ? 20% +2723.0 9199 ? 11% vm-scalability.stddev%
88930342 ? 5% -21.4% 69899439 ? 3% vm-scalability.throughput
135.95 ? 2% +35.0% 183.51 vm-scalability.time.elapsed_time
135.95 ? 2% +35.0% 183.51 vm-scalability.time.elapsed_time.max
3898231 ? 7% +22.7% 4784231 ? 7% vm-scalability.time.involuntary_context_switches
246538 +1.2% 249586 vm-scalability.time.minor_page_faults
17484 -3.0% 16967 vm-scalability.time.percent_of_cpu_this_job_got
23546 ? 2% +31.3% 30915 vm-scalability.time.system_time
125622 ? 7% +232.5% 417746 ? 7% vm-scalability.time.voluntary_context_switches
7.10 ? 31% -26.9% 5.19 ? 3% perf-sched.wait_and_delay.avg.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.folio_alloc.page_cache_ra_order
14.80 ? 42% -42.0% 8.58 ? 11% perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_update_page.filemap_get_pages
6.01 ? 27% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
11652 ? 37% +480.5% 67637 ? 21% perf-sched.wait_and_delay.count.__cond_resched.__alloc_pages.alloc_pages_mpol.folio_alloc.page_cache_ra_order
1328 ? 86% +760.2% 11431 ? 31% perf-sched.wait_and_delay.count.__cond_resched.__kmalloc.ifs_alloc.isra.0
10417 ? 30% +223.8% 33728 ? 30% perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.filemap_update_page.filemap_get_pages
2529 ? 36% -100.0% 0.00 perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
1336 ?133% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
6.74 ? 26% -24.9% 5.06 ? 3% perf-sched.wait_time.avg.ms.__cond_resched.__alloc_pages.alloc_pages_mpol.folio_alloc.page_cache_ra_order
3.12 ? 31% -48.8% 1.60 ? 14% perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc.ifs_alloc.isra.0
1.68 ? 23% -70.2% 0.50 ? 6% perf-sched.wait_time.avg.ms.__cond_resched.down_read.page_cache_ra_unbounded.filemap_get_pages.filemap_read
0.54 ?133% +441.1% 2.94 ? 33% perf-sched.wait_time.avg.ms.__cond_resched.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
13.13 ? 40% -42.8% 7.51 ? 12% perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.filemap_update_page.filemap_get_pages
1.47 ?122% +359.5% 6.78 ? 22% perf-sched.wait_time.max.ms.__cond_resched.task_work_run.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
4.47 ? 50% -75.4% 1.10 ?134% perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.exit_mmap.__mmput.exit_mm
86841 ? 10% +19.8% 104069 ? 2% numa-vmstat.node0.nr_slab_unreclaimable
3271697 ? 22% +70.7% 5586210 ? 22% numa-vmstat.node0.numa_foreign
25076787 ? 9% +83.3% 45969300 ? 5% numa-vmstat.node0.numa_hit
24934299 ? 9% +83.8% 45836491 ? 5% numa-vmstat.node0.numa_local
4703977 ? 10% +159.8% 12220561 ? 7% numa-vmstat.node0.numa_miss
4847048 ? 9% +154.8% 12350702 ? 7% numa-vmstat.node0.numa_other
38159902 ? 4% +13.2% 43207654 ? 4% numa-vmstat.node1.nr_active_file
17768992 ? 10% -25.1% 13307850 ? 13% numa-vmstat.node1.nr_inactive_file
91228 ? 10% +33.0% 121288 ? 2% numa-vmstat.node1.nr_slab_unreclaimable
38159860 ? 4% +13.2% 43207611 ? 4% numa-vmstat.node1.nr_zone_active_file
17768981 ? 10% -25.1% 13307832 ? 13% numa-vmstat.node1.nr_zone_inactive_file
4704476 ? 10% +159.7% 12219530 ? 7% numa-vmstat.node1.numa_foreign
26458450 ? 5% +110.9% 55813002 ? 3% numa-vmstat.node1.numa_hit
26364282 ? 5% +111.3% 55706167 ? 3% numa-vmstat.node1.numa_local
3271887 ? 22% +70.7% 5586065 ? 22% numa-vmstat.node1.numa_miss
3363897 ? 20% +69.2% 5691333 ? 22% numa-vmstat.node1.numa_other
90826607 ?109% -65.8% 31040624 ? 32% proc-vmstat.compact_daemon_free_scanned
96602657 ?103% -65.4% 33447362 ? 32% proc-vmstat.compact_free_scanned
1184 ? 92% -95.5% 52.75 ? 29% proc-vmstat.kswapd_low_wmark_hit_quickly
21460 ? 47% +137.3% 50924 ? 12% proc-vmstat.nr_active_anon
3576 ? 3% -29.3% 2528 ? 3% proc-vmstat.nr_isolated_file
178094 +26.5% 225368 proc-vmstat.nr_slab_unreclaimable
21460 ? 47% +137.3% 50924 ? 12% proc-vmstat.nr_zone_active_anon
7976174 ? 8% +123.2% 17805741 ? 6% proc-vmstat.numa_foreign
51538988 ? 3% +97.5% 1.018e+08 proc-vmstat.numa_hit
51302328 ? 3% +97.9% 1.015e+08 proc-vmstat.numa_local
7975865 ? 8% +123.3% 17806626 ? 6% proc-vmstat.numa_miss
8210948 ? 7% +119.7% 18042039 ? 6% proc-vmstat.numa_other
1208 ? 92% -93.1% 83.38 ? 24% proc-vmstat.pageoutrun
2270 +4.9% 2381 proc-vmstat.pgpgin
51647 ? 9% +24.2% 64144 ? 19% proc-vmstat.pgreuse
12722105 ? 16% +51.8% 19317724 ? 27% proc-vmstat.workingset_activate_file
8714025 +122.7% 19406236 ? 13% sched_debug.cfs_rq:/.avg_vruntime.avg
14306847 ? 4% +105.2% 29360984 ? 10% sched_debug.cfs_rq:/.avg_vruntime.max
909251 ? 71% +426.4% 4786321 ? 57% sched_debug.cfs_rq:/.avg_vruntime.min
2239402 ? 12% +77.5% 3975146 ? 11% sched_debug.cfs_rq:/.avg_vruntime.stddev
7790 ? 9% +27.6% 9939 ? 6% sched_debug.cfs_rq:/.load.avg
536737 +35.6% 727628 ? 20% sched_debug.cfs_rq:/.load.max
52392 ? 6% +31.7% 68975 ? 15% sched_debug.cfs_rq:/.load.stddev
8714025 +122.7% 19406236 ? 13% sched_debug.cfs_rq:/.min_vruntime.avg
14306847 ? 4% +105.2% 29360984 ? 10% sched_debug.cfs_rq:/.min_vruntime.max
909251 ? 71% +426.4% 4786321 ? 57% sched_debug.cfs_rq:/.min_vruntime.min
2239402 ? 12% +77.5% 3975147 ? 11% sched_debug.cfs_rq:/.min_vruntime.stddev
263.62 -37.2% 165.56 ? 16% sched_debug.cfs_rq:/.removed.runnable_avg.max
34.46 ? 20% -36.9% 21.75 ? 27% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
263.62 -37.2% 165.56 ? 16% sched_debug.cfs_rq:/.removed.util_avg.max
34.46 ? 20% -36.9% 21.75 ? 27% sched_debug.cfs_rq:/.removed.util_avg.stddev
24033 ? 20% +132.7% 55928 ? 31% sched_debug.cpu.avg_idle.min
90800 +46.2% 132766 ? 10% sched_debug.cpu.clock.avg
90862 +46.2% 132821 ? 10% sched_debug.cpu.clock.max
90743 +46.2% 132681 ? 10% sched_debug.cpu.clock.min
90382 +46.0% 131999 ? 10% sched_debug.cpu.clock_task.avg
90564 +46.0% 132188 ? 10% sched_debug.cpu.clock_task.max
75846 ? 2% +54.3% 117026 ? 11% sched_debug.cpu.clock_task.min
8008 +20.9% 9683 ? 8% sched_debug.cpu.curr->pid.max
7262 +96.3% 14257 ? 11% sched_debug.cpu.nr_switches.avg
1335 ? 25% +147.2% 3301 ? 46% sched_debug.cpu.nr_switches.min
0.04 ? 51% +99.6% 0.07 ? 16% sched_debug.cpu.nr_uninterruptible.avg
6.94 ? 10% +28.9% 8.94 ? 8% sched_debug.cpu.nr_uninterruptible.stddev
90747 +46.2% 132684 ? 10% sched_debug.cpu_clk
89537 +46.8% 131475 ? 10% sched_debug.ktime
91651 +45.8% 133586 ? 10% sched_debug.sched_clk
12.03 -19.4% 9.70 perf-stat.i.MPKI
1.752e+10 ? 2% -11.2% 1.556e+10 perf-stat.i.branch-instructions
78.57 -3.0 75.62 perf-stat.i.cache-miss-rate%
1.081e+09 ? 2% -27.8% 7.811e+08 perf-stat.i.cache-misses
1.28e+09 ? 2% -23.4% 9.8e+08 perf-stat.i.cache-references
5.60 +6.0% 5.94 perf-stat.i.cpi
5.076e+11 ? 2% -3.6% 4.895e+11 perf-stat.i.cpu-cycles
505.00 ? 3% +14.2% 576.55 ? 2% perf-stat.i.cpu-migrations
2.087e+10 ? 2% -12.9% 1.818e+10 perf-stat.i.dTLB-loads
0.04 ? 2% +0.0 0.06 ? 3% perf-stat.i.dTLB-store-miss-rate%
1787964 ? 3% +30.8% 2339432 perf-stat.i.dTLB-store-misses
6.896e+09 ? 2% -21.0% 5.448e+09 perf-stat.i.dTLB-stores
7.872e+10 ? 2% -12.2% 6.91e+10 perf-stat.i.instructions
0.27 ? 3% +5.9% 0.28 ? 2% perf-stat.i.ipc
0.12 ? 27% -52.0% 0.06 ? 20% perf-stat.i.major-faults
646.66 ? 8% +31.4% 849.88 ? 2% perf-stat.i.metric.K/sec
201.93 ? 2% -13.2% 175.35 perf-stat.i.metric.M/sec
8279 ? 10% -19.5% 6667 ? 8% perf-stat.i.minor-faults
76148688 ? 6% -25.0% 57102562 ? 4% perf-stat.i.node-load-misses
1.996e+08 ? 6% -27.3% 1.451e+08 ? 4% perf-stat.i.node-loads
8279 ? 10% -19.5% 6667 ? 8% perf-stat.i.page-faults
13.78 -17.3% 11.40 perf-stat.overall.MPKI
0.11 ? 2% +0.0 0.12 perf-stat.overall.branch-miss-rate%
84.62 -4.7 79.91 perf-stat.overall.cache-miss-rate%
6.47 +9.8% 7.10 perf-stat.overall.cpi
469.58 ? 2% +32.7% 623.15 perf-stat.overall.cycles-between-cache-misses
0.03 ? 2% +0.0 0.04 perf-stat.overall.dTLB-store-miss-rate%
0.15 -8.9% 0.14 perf-stat.overall.ipc
1265 ? 2% +19.2% 1507 perf-stat.overall.path-length
1.757e+10 ? 2% -10.1% 1.58e+10 perf-stat.ps.branch-instructions
1.088e+09 ? 2% -26.5% 7.996e+08 perf-stat.ps.cache-misses
1.285e+09 ? 2% -22.1% 1.001e+09 perf-stat.ps.cache-references
490.77 ? 4% +15.5% 566.82 ? 3% perf-stat.ps.cpu-migrations
2.094e+10 ? 2% -11.8% 1.847e+10 perf-stat.ps.dTLB-loads
1746391 ? 2% +32.3% 2310550 perf-stat.ps.dTLB-store-misses
6.91e+09 ? 2% -19.9% 5.536e+09 perf-stat.ps.dTLB-stores
7.892e+10 ? 2% -11.1% 7.017e+10 perf-stat.ps.instructions
0.12 ? 28% -55.6% 0.05 ? 21% perf-stat.ps.major-faults
7608 ? 9% -18.1% 6231 ? 7% perf-stat.ps.minor-faults
76550152 ? 5% -24.5% 57810813 ? 4% perf-stat.ps.node-load-misses
2.022e+08 ? 5% -26.0% 1.495e+08 ? 4% perf-stat.ps.node-loads
7608 ? 9% -18.1% 6231 ? 7% perf-stat.ps.page-faults
1.087e+13 ? 2% +19.2% 1.295e+13 perf-stat.total.instructions
19.35 ? 18% -19.3 0.00 perf-profile.calltrace.cycles-pp.__libc_start_main
19.35 ? 18% -19.3 0.00 perf-profile.calltrace.cycles-pp.main.__libc_start_main
19.35 ? 18% -19.3 0.00 perf-profile.calltrace.cycles-pp.run_builtin.main.__libc_start_main
18.24 ? 41% -18.2 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
18.24 ? 41% -18.2 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
16.85 ? 12% -16.8 0.00 perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.run_builtin.main.__libc_start_main
16.85 ? 12% -16.8 0.00 perf-profile.calltrace.cycles-pp.cmd_record.run_builtin.main.__libc_start_main
16.79 ? 20% -16.8 0.00 perf-profile.calltrace.cycles-pp.record__pushfn.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record
16.61 ? 16% -16.6 0.00 perf-profile.calltrace.cycles-pp.record__mmap_read_evlist.__cmd_record.cmd_record.run_builtin.main
16.60 ? 18% -16.6 0.00 perf-profile.calltrace.cycles-pp.__libc_write.writen.record__pushfn.perf_mmap__push.record__mmap_read_evlist
16.60 ? 18% -16.6 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write.writen.record__pushfn
16.60 ? 18% -16.6 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_write.writen.record__pushfn.perf_mmap__push
16.60 ? 18% -16.6 0.00 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write.writen
16.60 ? 18% -16.6 0.00 perf-profile.calltrace.cycles-pp.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
16.60 ? 18% -16.6 0.00 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
16.60 ? 18% -16.6 0.00 perf-profile.calltrace.cycles-pp.writen.record__pushfn.perf_mmap__push.record__mmap_read_evlist.__cmd_record
16.55 ? 20% -16.6 0.00 perf-profile.calltrace.cycles-pp.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
16.27 ? 17% -16.3 0.00 perf-profile.calltrace.cycles-pp.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record.run_builtin
16.59 ? 42% -15.9 0.66 ?126% perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
16.59 ? 42% -15.9 0.66 ?126% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
16.59 ? 42% -15.9 0.73 ?111% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
16.59 ? 42% -15.9 0.73 ?111% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
16.59 ? 42% -15.8 0.76 ?112% perf-profile.calltrace.cycles-pp.read
9.60 ? 77% -9.6 0.00 perf-profile.calltrace.cycles-pp.proc_reg_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
9.60 ? 77% -9.6 0.00 perf-profile.calltrace.cycles-pp.seq_read_iter.proc_reg_read_iter.vfs_read.ksys_read.do_syscall_64
9.47 ? 75% -9.5 0.00 perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
9.47 ? 75% -9.5 0.00 perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode
9.47 ? 75% -9.5 0.00 perf-profile.calltrace.cycles-pp.do_group_exit.get_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64
9.47 ? 75% -9.5 0.00 perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
9.47 ? 75% -9.5 0.00 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
9.40 ? 37% -9.4 0.00 perf-profile.calltrace.cycles-pp.task_work_run.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart
9.20 ? 38% -9.2 0.00 perf-profile.calltrace.cycles-pp.__fput.task_work_run.do_exit.do_group_exit.get_signal
8.39 ? 40% -8.4 0.00 perf-profile.calltrace.cycles-pp.perf_event_release_kernel.perf_release.__fput.task_work_run.do_exit
8.39 ? 40% -8.4 0.00 perf-profile.calltrace.cycles-pp.perf_release.__fput.task_work_run.do_exit.do_group_exit
8.24 ? 36% -8.2 0.00 perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
8.24 ? 36% -8.2 0.00 perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
7.29 ? 87% -7.3 0.00 perf-profile.calltrace.cycles-pp.show_interrupts.seq_read_iter.proc_reg_read_iter.vfs_read.ksys_read
7.01 ? 49% -7.0 0.00 perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
6.40 ? 57% -6.4 0.00 perf-profile.calltrace.cycles-pp.open64
6.23 ? 44% -6.2 0.00 perf-profile.calltrace.cycles-pp.proc_pid_status.proc_single_show.seq_read_iter.seq_read.vfs_read
6.23 ? 44% -6.2 0.00 perf-profile.calltrace.cycles-pp.proc_single_show.seq_read_iter.seq_read.vfs_read.ksys_read
6.23 ? 44% -6.2 0.00 perf-profile.calltrace.cycles-pp.seq_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.23 ? 44% -6.2 0.00 perf-profile.calltrace.cycles-pp.seq_read_iter.seq_read.vfs_read.ksys_read.do_syscall_64
6.08 ? 31% -6.1 0.00 perf-profile.calltrace.cycles-pp.event_function_call.perf_event_release_kernel.perf_release.__fput.task_work_run
6.05 ? 63% -6.0 0.00 perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
6.05 ? 63% -6.0 0.00 perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
6.05 ? 63% -6.0 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
6.05 ? 63% -6.0 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
5.86 ? 28% -5.9 0.00 perf-profile.calltrace.cycles-pp.smp_call_function_single.event_function_call.perf_event_release_kernel.perf_release.__fput
5.82 ? 48% -5.8 0.00 perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
5.82 ? 48% -5.8 0.00 perf-profile.calltrace.cycles-pp.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
5.82 ? 48% -5.8 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
5.82 ? 48% -5.8 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.execve
5.82 ? 48% -5.8 0.00 perf-profile.calltrace.cycles-pp.execve
5.79 ? 47% -5.8 0.00 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
5.75 ? 66% -5.7 0.00 perf-profile.calltrace.cycles-pp.fault_in_readable.fault_in_iov_iter_readable.generic_perform_write.shmem_file_write_iter.vfs_write
5.60 ? 55% -5.6 0.00 perf-profile.calltrace.cycles-pp.fault_in_iov_iter_readable.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
4.97 ? 49% -5.0 0.00 perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
4.30 ? 60% -4.3 0.00 perf-profile.calltrace.cycles-pp.asm_exc_page_fault
3.87 ? 70% -3.9 0.00 perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.get_signal
3.87 ? 70% -3.9 0.00 perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart
0.00 +0.7 0.71 ? 23% perf-profile.calltrace.cycles-pp.__free_pages_ok.release_pages.__folio_batch_release.truncate_inode_pages_range.evict
0.00 +0.7 0.72 ? 21% perf-profile.calltrace.cycles-pp.delete_from_page_cache_batch.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat
0.00 +0.7 0.73 ? 22% perf-profile.calltrace.cycles-pp.__mem_cgroup_uncharge.destroy_large_folio.release_pages.__folio_batch_release.truncate_inode_pages_range
0.00 +0.7 0.73 ? 23% perf-profile.calltrace.cycles-pp.free_unref_page_prepare.free_unref_page.release_pages.__folio_batch_release.truncate_inode_pages_range
0.00 +0.8 0.80 ? 17% perf-profile.calltrace.cycles-pp.xas_load.truncate_folio_batch_exceptionals.truncate_inode_pages_range.evict.do_unlinkat
0.00 +0.9 0.86 ? 13% perf-profile.calltrace.cycles-pp._raw_spin_trylock.rebalance_domains.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt
0.00 +0.9 0.86 ? 21% perf-profile.calltrace.cycles-pp.truncate_cleanup_folio.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat
0.00 +0.9 0.89 ? 18% perf-profile.calltrace.cycles-pp.kmem_cache_free.rcu_do_batch.rcu_core.__do_softirq.irq_exit_rcu
0.00 +0.9 0.89 ? 14% perf-profile.calltrace.cycles-pp.workingset_update_node.xas_store.truncate_folio_batch_exceptionals.truncate_inode_pages_range.evict
0.00 +0.9 0.94 ? 63% perf-profile.calltrace.cycles-pp.fast_imageblit.sys_imageblit.drm_fbdev_generic_defio_imageblit.bit_putcs.fbcon_putcs
0.00 +1.0 0.95 ? 63% perf-profile.calltrace.cycles-pp.sys_imageblit.drm_fbdev_generic_defio_imageblit.bit_putcs.fbcon_putcs.fbcon_redraw
0.00 +1.0 0.95 ? 63% perf-profile.calltrace.cycles-pp.drm_fbdev_generic_defio_imageblit.bit_putcs.fbcon_putcs.fbcon_redraw.fbcon_scroll
0.00 +1.1 1.06 ? 31% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.memcpy_toio.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes
0.00 +1.1 1.14 ? 39% perf-profile.calltrace.cycles-pp.clockevents_program_event.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.13 ?264% +1.3 1.38 ? 45% perf-profile.calltrace.cycles-pp.tick_nohz_get_sleep_length.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry
0.00 +1.3 1.26 ? 58% perf-profile.calltrace.cycles-pp.bit_putcs.fbcon_putcs.fbcon_redraw.fbcon_scroll.con_scroll
0.00 +1.3 1.31 ? 59% perf-profile.calltrace.cycles-pp.fbcon_putcs.fbcon_redraw.fbcon_scroll.con_scroll.lf
0.00 +1.5 1.45 ? 24% perf-profile.calltrace.cycles-pp.destroy_large_folio.release_pages.__folio_batch_release.truncate_inode_pages_range.evict
0.00 +1.5 1.49 ? 60% perf-profile.calltrace.cycles-pp.fbcon_redraw.fbcon_scroll.con_scroll.lf.vt_console_print
0.00 +1.5 1.52 ? 22% perf-profile.calltrace.cycles-pp.free_unref_page.release_pages.__folio_batch_release.truncate_inode_pages_range.evict
0.00 +1.5 1.55 ? 58% perf-profile.calltrace.cycles-pp.con_scroll.lf.vt_console_print.console_flush_all.console_unlock
0.00 +1.5 1.55 ? 58% perf-profile.calltrace.cycles-pp.fbcon_scroll.con_scroll.lf.vt_console_print.console_flush_all
0.00 +1.5 1.55 ? 58% perf-profile.calltrace.cycles-pp.lf.vt_console_print.console_flush_all.console_unlock.vprintk_emit
0.00 +1.6 1.58 ? 57% perf-profile.calltrace.cycles-pp.vt_console_print.console_flush_all.console_unlock.vprintk_emit.devkmsg_emit
0.00 +1.8 1.77 ? 41% perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
0.00 +1.8 1.81 ? 27% perf-profile.calltrace.cycles-pp.__slab_free.kmem_cache_free.rcu_do_batch.rcu_core.__do_softirq
0.13 ?264% +2.0 2.08 ? 30% perf-profile.calltrace.cycles-pp.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
0.00 +2.0 1.97 ? 17% perf-profile.calltrace.cycles-pp.rcu_segcblist_enqueue.__call_rcu_common.xas_store.truncate_folio_batch_exceptionals.truncate_inode_pages_range
0.00 +2.0 2.04 ? 19% perf-profile.calltrace.cycles-pp.kmem_cache_free.rcu_do_batch.rcu_core.__do_softirq.run_ksoftirqd
0.00 +2.2 2.17 ? 17% perf-profile.calltrace.cycles-pp.__call_rcu_common.xas_store.truncate_folio_batch_exceptionals.truncate_inode_pages_range.evict
0.55 ?134% +2.2 2.78 ? 18% perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.75 ?132% +2.5 3.22 ? 10% perf-profile.calltrace.cycles-pp.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
0.00 +2.6 2.64 ? 19% perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn
0.75 ?132% +2.7 3.40 ? 10% perf-profile.calltrace.cycles-pp.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
0.00 +2.7 2.66 ? 19% perf-profile.calltrace.cycles-pp.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread
0.00 +2.7 2.67 ? 19% perf-profile.calltrace.cycles-pp.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
0.00 +2.7 2.67 ? 19% perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.00 +2.8 2.76 ? 13% perf-profile.calltrace.cycles-pp.io_serial_in.wait_for_lsr.wait_for_xmitr.serial8250_console_write.console_flush_all
0.17 ?264% +2.8 2.97 ? 10% perf-profile.calltrace.cycles-pp.__intel_pmu_enable_all.perf_adjust_freq_unthr_context.perf_event_task_tick.scheduler_tick.update_process_times
0.00 +2.8 2.83 ? 13% perf-profile.calltrace.cycles-pp.wait_for_lsr.wait_for_xmitr.serial8250_console_write.console_flush_all.console_unlock
0.00 +2.8 2.83 ? 13% perf-profile.calltrace.cycles-pp.wait_for_xmitr.serial8250_console_write.console_flush_all.console_unlock.vprintk_emit
0.00 +3.7 3.67 ? 15% perf-profile.calltrace.cycles-pp.xas_find.find_lock_entries.truncate_inode_pages_range.evict.do_unlinkat
0.30 ?175% +4.2 4.48 ? 8% perf-profile.calltrace.cycles-pp.perf_adjust_freq_unthr_context.perf_event_task_tick.scheduler_tick.update_process_times.tick_sched_handle
0.30 ?175% +4.3 4.60 ? 8% perf-profile.calltrace.cycles-pp.perf_event_task_tick.scheduler_tick.update_process_times.tick_sched_handle.tick_nohz_highres_handler
0.00 +4.4 4.40 ? 22% perf-profile.calltrace.cycles-pp.release_pages.__folio_batch_release.truncate_inode_pages_range.evict.do_unlinkat
0.50 ?132% +4.4 4.90 ? 8% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.00 +4.4 4.41 ? 21% perf-profile.calltrace.cycles-pp.__folio_batch_release.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat
0.00 +4.7 4.72 ? 33% perf-profile.calltrace.cycles-pp.io_serial_out.serial8250_console_write.console_flush_all.console_unlock.vprintk_emit
0.00 +4.8 4.76 ? 15% perf-profile.calltrace.cycles-pp.find_lock_entries.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat
1.06 ?125% +4.9 5.96 ? 10% perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues
0.00 +5.3 5.34 ? 8% perf-profile.calltrace.cycles-pp.intel_idle_xstate.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
1.28 ?100% +5.5 6.82 ? 10% perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt
1.28 ?100% +5.6 6.87 ? 10% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
1.28 ?100% +6.1 7.36 ? 10% perf-profile.calltrace.cycles-pp.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
0.00 +6.9 6.86 ? 15% perf-profile.calltrace.cycles-pp.xas_store.truncate_folio_batch_exceptionals.truncate_inode_pages_range.evict.do_unlinkat
0.00 +7.7 7.69 ? 12% perf-profile.calltrace.cycles-pp.io_serial_in.wait_for_lsr.serial8250_console_write.console_flush_all.console_unlock
0.00 +7.9 7.90 ? 12% perf-profile.calltrace.cycles-pp.wait_for_lsr.serial8250_console_write.console_flush_all.console_unlock.vprintk_emit
0.00 +8.3 8.34 ? 15% perf-profile.calltrace.cycles-pp.truncate_folio_batch_exceptionals.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat
1.28 ?100% +8.4 9.71 ? 7% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.26 ?264% +11.4 11.67 ? 8% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
0.26 ?264% +11.8 12.06 ? 8% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
1.00 ?102% +15.9 16.89 ? 8% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
0.00 +16.1 16.11 ? 13% perf-profile.calltrace.cycles-pp.serial8250_console_write.console_flush_all.console_unlock.vprintk_emit.devkmsg_emit
0.17 ?264% +17.4 17.59 ? 12% perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.17 ?264% +17.4 17.59 ? 12% perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.17 ?264% +17.4 17.60 ? 12% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.17 ?264% +17.4 17.60 ? 12% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
0.17 ?264% +17.4 17.62 ? 12% perf-profile.calltrace.cycles-pp.write
0.00 +17.5 17.47 ? 13% perf-profile.calltrace.cycles-pp.console_flush_all.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write
0.00 +17.5 17.48 ? 13% perf-profile.calltrace.cycles-pp.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write.vfs_write
1.60 ? 88% +17.5 19.11 ? 8% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.00 +17.5 17.53 ? 12% perf-profile.calltrace.cycles-pp.memcpy_toio.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm
0.00 +17.5 17.55 ? 13% perf-profile.calltrace.cycles-pp.vprintk_emit.devkmsg_emit.devkmsg_write.vfs_write.ksys_write
0.00 +17.5 17.55 ? 13% perf-profile.calltrace.cycles-pp.devkmsg_emit.devkmsg_write.vfs_write.ksys_write.do_syscall_64
0.00 +17.6 17.55 ? 13% perf-profile.calltrace.cycles-pp.devkmsg_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +18.0 17.98 ? 12% perf-profile.calltrace.cycles-pp.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail
0.00 +18.0 17.98 ? 12% perf-profile.calltrace.cycles-pp.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail
0.00 +18.0 18.02 ? 12% perf-profile.calltrace.cycles-pp.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb
0.00 +18.0 18.02 ? 12% perf-profile.calltrace.cycles-pp.commit_tail.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty
0.00 +18.0 18.02 ? 12% perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit
0.00 +18.0 18.02 ? 12% perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit.drm_atomic_commit
0.00 +18.0 18.03 ? 12% perf-profile.calltrace.cycles-pp.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work
0.00 +18.0 18.03 ? 12% perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work
0.00 +18.0 18.03 ? 12% perf-profile.calltrace.cycles-pp.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work.worker_thread
0.00 +18.7 18.67 ? 12% perf-profile.calltrace.cycles-pp.drm_fb_helper_damage_work.process_one_work.worker_thread.kthread.ret_from_fork
0.00 +18.7 18.67 ? 12% perf-profile.calltrace.cycles-pp.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work.worker_thread.kthread
0.00 +18.7 18.72 ? 12% perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.00 +18.8 18.77 ? 12% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.00 +19.3 19.26 ? 15% perf-profile.calltrace.cycles-pp.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat.do_syscall_64
0.00 +19.3 19.28 ? 15% perf-profile.calltrace.cycles-pp.evict.do_unlinkat.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +19.3 19.29 ? 15% perf-profile.calltrace.cycles-pp.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
0.00 +19.3 19.29 ? 15% perf-profile.calltrace.cycles-pp.do_unlinkat.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
0.00 +19.3 19.29 ? 15% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
0.00 +19.3 19.29 ? 15% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.unlinkat
0.00 +19.3 19.29 ? 15% perf-profile.calltrace.cycles-pp.unlinkat
0.89 ?100% +22.5 23.36 ? 15% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
0.89 ?100% +22.5 23.36 ? 15% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
0.89 ?100% +22.5 23.36 ? 15% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
3.26 ?101% +26.4 29.63 ? 7% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
6.41 ? 88% +26.4 32.84 ? 7% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
6.41 ? 88% +26.4 32.85 ? 7% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
6.41 ? 88% +26.4 32.85 ? 7% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
3.56 ? 99% +26.6 30.12 ? 7% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
6.41 ? 88% +26.6 33.04 ? 7% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
3.68 ? 96% +28.8 32.52 ? 8% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
75.17 ? 12% -36.3 38.83 ? 10% perf-profile.children.cycles-pp.do_syscall_64
75.17 ? 12% -36.3 38.84 ? 10% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
19.35 ? 18% -19.2 0.10 ? 84% perf-profile.children.cycles-pp.__libc_start_main
19.35 ? 18% -19.2 0.10 ? 84% perf-profile.children.cycles-pp.main
19.35 ? 18% -19.2 0.10 ? 84% perf-profile.children.cycles-pp.run_builtin
19.35 ? 18% -19.2 0.10 ? 90% perf-profile.children.cycles-pp.cmd_record
18.76 ? 18% -18.8 0.00 perf-profile.children.cycles-pp.perf_mmap__push
18.76 ? 18% -18.8 0.00 perf-profile.children.cycles-pp.record__mmap_read_evlist
18.63 ? 28% -18.4 0.18 ? 46% perf-profile.children.cycles-pp.do_exit
18.63 ? 28% -18.4 0.18 ? 46% perf-profile.children.cycles-pp.do_group_exit
16.81 ? 19% -16.8 0.00 perf-profile.children.cycles-pp.__libc_write
16.80 ? 20% -16.8 0.00 perf-profile.children.cycles-pp.record__pushfn
16.60 ? 18% -16.6 0.00 perf-profile.children.cycles-pp.writen
16.60 ? 18% -16.6 0.01 ?264% perf-profile.children.cycles-pp.shmem_file_write_iter
16.42 ? 19% -16.4 0.00 perf-profile.children.cycles-pp.generic_perform_write
16.42 ? 41% -16.1 0.29 ? 13% perf-profile.children.cycles-pp.seq_read_iter
16.81 ? 41% -15.9 0.94 ? 79% perf-profile.children.cycles-pp.read
16.59 ? 42% -15.6 0.96 ? 69% perf-profile.children.cycles-pp.vfs_read
16.59 ? 42% -15.6 0.96 ? 69% perf-profile.children.cycles-pp.ksys_read
19.35 ? 18% -15.4 3.98 ?173% perf-profile.children.cycles-pp.__cmd_record
14.11 ? 37% -14.1 0.00 perf-profile.children.cycles-pp.arch_do_signal_or_restart
14.11 ? 37% -14.1 0.00 perf-profile.children.cycles-pp.get_signal
10.35 ? 37% -10.1 0.27 ? 64% perf-profile.children.cycles-pp.asm_exc_page_fault
10.10 ? 32% -10.0 0.07 ?100% perf-profile.children.cycles-pp.__fput
9.93 ? 74% -9.9 0.01 ?264% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
9.60 ? 77% -9.6 0.00 perf-profile.children.cycles-pp.proc_reg_read_iter
9.58 ? 36% -9.6 0.00 perf-profile.children.cycles-pp.task_work_run
9.10 ? 54% -8.9 0.16 ? 53% perf-profile.children.cycles-pp.exit_mmap
9.10 ? 54% -8.9 0.16 ? 53% perf-profile.children.cycles-pp.__mmput
8.57 ? 35% -8.5 0.09 ? 78% perf-profile.children.cycles-pp.do_sys_openat2
8.57 ? 35% -8.5 0.10 ? 78% perf-profile.children.cycles-pp.__x64_sys_openat
8.39 ? 40% -8.3 0.06 ?101% perf-profile.children.cycles-pp.perf_event_release_kernel
8.39 ? 40% -8.3 0.06 ?101% perf-profile.children.cycles-pp.perf_release
8.24 ? 36% -8.2 0.09 ? 78% perf-profile.children.cycles-pp.do_filp_open
8.24 ? 36% -8.2 0.09 ? 78% perf-profile.children.cycles-pp.path_openat
8.36 ? 29% -8.1 0.23 ? 62% perf-profile.children.cycles-pp.exc_page_fault
8.16 ? 33% -7.9 0.23 ? 62% perf-profile.children.cycles-pp.do_user_addr_fault
7.12 ? 84% -7.1 0.00 perf-profile.children.cycles-pp.show_interrupts
6.98 ? 79% -7.0 0.00 perf-profile.children.cycles-pp.seq_printf
7.01 ? 49% -6.9 0.15 ? 54% perf-profile.children.cycles-pp.exit_mm
6.95 ? 36% -6.7 0.22 ? 61% perf-profile.children.cycles-pp.handle_mm_fault
6.23 ? 44% -6.2 0.00 perf-profile.children.cycles-pp.proc_pid_status
6.23 ? 44% -6.2 0.00 perf-profile.children.cycles-pp.proc_single_show
6.22 ? 60% -6.2 0.00 perf-profile.children.cycles-pp.open64
6.08 ? 31% -6.1 0.00 perf-profile.children.cycles-pp.event_function_call
6.23 ? 42% -6.0 0.21 ? 62% perf-profile.children.cycles-pp.__handle_mm_fault
6.23 ? 44% -6.0 0.25 ? 15% perf-profile.children.cycles-pp.seq_read
5.86 ? 28% -5.9 0.00 perf-profile.children.cycles-pp.smp_call_function_single
5.72 ? 54% -5.7 0.00 perf-profile.children.cycles-pp.fault_in_iov_iter_readable
5.82 ? 48% -5.6 0.20 ? 46% perf-profile.children.cycles-pp.do_execveat_common
5.82 ? 48% -5.6 0.20 ? 46% perf-profile.children.cycles-pp.execve
5.82 ? 48% -5.6 0.20 ? 46% perf-profile.children.cycles-pp.__x64_sys_execve
5.60 ? 55% -5.6 0.00 perf-profile.children.cycles-pp.fault_in_readable
5.83 ? 57% -5.0 0.83 ? 18% perf-profile.children.cycles-pp._raw_spin_lock
4.97 ? 49% -5.0 0.00 perf-profile.children.cycles-pp.copy_page_from_iter_atomic
4.65 ? 60% -4.5 0.13 ? 47% perf-profile.children.cycles-pp.bprm_execve
4.47 ? 64% -4.4 0.10 ? 54% perf-profile.children.cycles-pp.load_elf_binary
4.47 ? 64% -4.4 0.10 ? 54% perf-profile.children.cycles-pp.exec_binprm
4.47 ? 64% -4.4 0.10 ? 54% perf-profile.children.cycles-pp.search_binary_handler
4.52 ? 42% -4.3 0.18 ? 46% perf-profile.children.cycles-pp.__x64_sys_exit_group
3.52 ? 57% -3.5 0.06 ? 83% perf-profile.children.cycles-pp.kernel_clone
3.28 ? 47% -3.2 0.09 ? 80% perf-profile.children.cycles-pp.do_fault
3.12 ? 46% -3.0 0.08 ? 80% perf-profile.children.cycles-pp.do_read_fault
2.33 ? 42% -2.3 0.07 ? 80% perf-profile.children.cycles-pp.filemap_map_pages
1.99 ? 34% -1.9 0.05 ? 78% perf-profile.children.cycles-pp.link_path_walk
0.00 +0.1 0.07 ? 15% perf-profile.children.cycles-pp.__update_blocked_fair
0.00 +0.1 0.07 ? 23% perf-profile.children.cycles-pp.uncharge_folio
0.00 +0.1 0.07 ? 19% perf-profile.children.cycles-pp.rcu_nocb_try_bypass
0.00 +0.1 0.07 ? 11% perf-profile.children.cycles-pp.hrtimer_update_next_event
0.00 +0.1 0.08 ? 26% perf-profile.children.cycles-pp.memcg_account_kmem
0.00 +0.1 0.08 ? 22% perf-profile.children.cycles-pp.free_tail_page_prepare
0.00 +0.1 0.08 ? 22% perf-profile.children.cycles-pp.note_gp_changes
0.00 +0.1 0.08 ? 38% perf-profile.children.cycles-pp.console_conditional_schedule
0.00 +0.1 0.08 ? 10% perf-profile.children.cycles-pp.call_cpuidle
0.00 +0.1 0.08 ? 15% perf-profile.children.cycles-pp.cpuidle_governor_latency_req
0.00 +0.1 0.08 ? 13% perf-profile.children.cycles-pp.error_entry
0.00 +0.1 0.09 ? 15% perf-profile.children.cycles-pp.__libc_read
0.00 +0.1 0.09 ? 23% perf-profile.children.cycles-pp.read_counters
0.00 +0.1 0.09 ? 14% perf-profile.children.cycles-pp.xa_load
0.00 +0.1 0.09 ? 16% perf-profile.children.cycles-pp.hrtimer_get_next_event
0.00 +0.1 0.09 ? 13% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.00 +0.1 0.09 ? 22% perf-profile.children.cycles-pp.cmd_stat
0.00 +0.1 0.09 ? 22% perf-profile.children.cycles-pp.dispatch_events
0.00 +0.1 0.09 ? 22% perf-profile.children.cycles-pp.process_interval
0.00 +0.1 0.10 ? 46% perf-profile.children.cycles-pp.__sysvec_irq_work
0.00 +0.1 0.10 ? 46% perf-profile.children.cycles-pp._printk
0.00 +0.1 0.10 ? 46% perf-profile.children.cycles-pp.asm_sysvec_irq_work
0.00 +0.1 0.10 ? 46% perf-profile.children.cycles-pp.irq_work_run
0.00 +0.1 0.10 ? 46% perf-profile.children.cycles-pp.sysvec_irq_work
0.00 +0.1 0.10 ? 15% perf-profile.children.cycles-pp.timerqueue_add
0.00 +0.1 0.11 ? 13% perf-profile.children.cycles-pp.x86_pmu_disable
0.00 +0.1 0.11 ? 39% perf-profile.children.cycles-pp.irq_work_single
0.00 +0.1 0.11 ? 14% perf-profile.children.cycles-pp.timerqueue_del
0.00 +0.1 0.11 ? 54% perf-profile.children.cycles-pp.drm_fb_helper_damage_area
0.00 +0.1 0.11 ? 11% perf-profile.children.cycles-pp.__hrtimer_next_event_base
0.00 +0.1 0.12 ? 35% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
0.00 +0.1 0.12 ? 36% perf-profile.children.cycles-pp.irq_work_run_list
0.00 +0.1 0.12 ? 13% perf-profile.children.cycles-pp.perf_pmu_nop_void
0.00 +0.1 0.13 ? 13% perf-profile.children.cycles-pp.enqueue_hrtimer
0.00 +0.1 0.13 ? 12% perf-profile.children.cycles-pp.irqtime_account_process_tick
0.00 +0.1 0.14 ? 23% perf-profile.children.cycles-pp.__put_partials
0.00 +0.1 0.14 ? 31% perf-profile.children.cycles-pp.check_cpu_stall
0.00 +0.1 0.14 ? 15% perf-profile.children.cycles-pp.update_rq_clock
0.00 +0.1 0.14 ? 16% perf-profile.children.cycles-pp.hrtimer_next_event_without
0.00 +0.2 0.16 ? 67% perf-profile.children.cycles-pp.calc_global_load_tick
0.00 +0.2 0.16 ? 24% perf-profile.children.cycles-pp.filemap_unaccount_folio
0.00 +0.2 0.16 ? 12% perf-profile.children.cycles-pp.local_clock_noinstr
0.00 +0.2 0.17 ? 14% perf-profile.children.cycles-pp.should_we_balance
0.00 +0.2 0.17 ? 25% perf-profile.children.cycles-pp.delay_halt_tpause
0.00 +0.2 0.19 ? 35% perf-profile.children.cycles-pp.arch_call_rest_init
0.00 +0.2 0.19 ? 35% perf-profile.children.cycles-pp.rest_init
0.00 +0.2 0.19 ? 35% perf-profile.children.cycles-pp.start_kernel
0.00 +0.2 0.19 ? 35% perf-profile.children.cycles-pp.x86_64_start_kernel
0.00 +0.2 0.19 ? 35% perf-profile.children.cycles-pp.x86_64_start_reservations
0.00 +0.2 0.21 ? 19% perf-profile.children.cycles-pp.get_next_timer_interrupt
0.00 +0.2 0.22 ? 25% perf-profile.children.cycles-pp.free_one_page
0.00 +0.2 0.22 ? 14% perf-profile.children.cycles-pp.trigger_load_balance
0.00 +0.2 0.23 ? 14% perf-profile.children.cycles-pp.update_irq_load_avg
0.00 +0.2 0.23 ? 15% perf-profile.children.cycles-pp.update_blocked_averages
0.00 +0.2 0.24 ? 17% perf-profile.children.cycles-pp.run_rebalance_domains
0.00 +0.2 0.24 ? 17% perf-profile.children.cycles-pp.list_lru_del
0.00 +0.2 0.24 ? 19% perf-profile.children.cycles-pp.radix_tree_node_rcu_free
0.00 +0.2 0.25 ? 15% perf-profile.children.cycles-pp.get_slabinfo
0.00 +0.2 0.25 ? 15% perf-profile.children.cycles-pp.slab_show
0.00 +0.3 0.25 ? 21% perf-profile.children.cycles-pp.ct_kernel_exit_state
0.00 +0.3 0.28 ? 24% perf-profile.children.cycles-pp.delay_halt
0.00 +0.3 0.29 ? 18% perf-profile.children.cycles-pp.ct_kernel_enter
0.00 +0.3 0.30 ? 9% perf-profile.children.cycles-pp.irqtime_account_irq
0.00 +0.3 0.31 ? 18% perf-profile.children.cycles-pp.ct_idle_exit
0.00 +0.3 0.31 ? 18% perf-profile.children.cycles-pp.tick_sched_do_timer
0.00 +0.3 0.32 ? 16% perf-profile.children.cycles-pp.__mod_lruvec_kmem_state
0.00 +0.3 0.32 ? 15% perf-profile.children.cycles-pp.xas_start
0.00 +0.3 0.32 ? 16% perf-profile.children.cycles-pp.rcu_pending
0.00 +0.3 0.32 ? 10% perf-profile.children.cycles-pp.sched_clock
0.00 +0.4 0.37 ? 11% perf-profile.children.cycles-pp.native_apic_msr_eoi
0.00 +0.4 0.38 ? 9% perf-profile.children.cycles-pp.sched_clock_cpu
0.00 +0.4 0.40 ? 17% perf-profile.children.cycles-pp.ifs_free
0.00 +0.4 0.40 ? 14% perf-profile.children.cycles-pp.rcu_sched_clock_irq
0.00 +0.4 0.40 ? 18% perf-profile.children.cycles-pp.__page_cache_release
0.00 +0.4 0.42 ? 10% perf-profile.children.cycles-pp.native_sched_clock
0.00 +0.4 0.43 ? 13% perf-profile.children.cycles-pp.lapic_next_deadline
0.00 +0.4 0.45 ? 13% perf-profile.children.cycles-pp.mem_cgroup_from_slab_obj
0.00 +0.5 0.46 ? 12% perf-profile.children.cycles-pp.read_tsc
0.00 +0.5 0.48 ? 7% perf-profile.children.cycles-pp.perf_rotate_context
0.00 +0.5 0.52 ? 23% perf-profile.children.cycles-pp.page_counter_uncharge
0.00 +0.5 0.52 ? 58% perf-profile.children.cycles-pp.tick_nohz_irq_exit
0.00 +0.5 0.55 ? 25% perf-profile.children.cycles-pp.rcu_cblist_dequeue
0.00 +0.6 0.56 ? 14% perf-profile.children.cycles-pp.list_lru_del_obj
0.00 +0.6 0.59 ? 21% perf-profile.children.cycles-pp.ktime_get_update_offsets_now
0.00 +0.6 0.64 ? 24% perf-profile.children.cycles-pp.drm_fbdev_generic_damage_blit_real
0.00 +0.6 0.64 ? 93% perf-profile.children.cycles-pp.tick_irq_enter
0.00 +0.7 0.66 ? 89% perf-profile.children.cycles-pp.irq_enter_rcu
0.00 +0.7 0.69 ? 40% perf-profile.children.cycles-pp.xas_descend
0.00 +0.7 0.69 ? 22% perf-profile.children.cycles-pp.uncharge_batch
0.00 +0.7 0.70 ? 25% perf-profile.children.cycles-pp.folio_undo_large_rmappable
0.00 +0.7 0.73 ? 22% perf-profile.children.cycles-pp.__free_pages_ok
0.00 +0.7 0.73 ? 21% perf-profile.children.cycles-pp.delete_from_page_cache_batch
0.00 +0.7 0.74 ? 22% perf-profile.children.cycles-pp.__mem_cgroup_uncharge
0.00 +0.8 0.78 ? 14% perf-profile.children.cycles-pp.xas_clear_mark
0.00 +0.9 0.87 ? 21% perf-profile.children.cycles-pp.truncate_cleanup_folio
0.00 +0.9 0.95 ? 13% perf-profile.children.cycles-pp.workingset_update_node
0.00 +1.0 0.96 ? 62% perf-profile.children.cycles-pp.fast_imageblit
0.00 +1.0 0.98 ? 61% perf-profile.children.cycles-pp.sys_imageblit
0.00 +1.0 0.98 ? 61% perf-profile.children.cycles-pp.drm_fbdev_generic_defio_imageblit
0.13 ?264% +1.1 1.20 ? 54% perf-profile.children.cycles-pp.tick_nohz_next_event
0.00 +1.2 1.16 ? 38% perf-profile.children.cycles-pp.clockevents_program_event
0.13 ?264% +1.3 1.40 ? 45% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.00 +1.3 1.29 ? 57% perf-profile.children.cycles-pp.bit_putcs
0.00 +1.3 1.34 ? 57% perf-profile.children.cycles-pp.fbcon_putcs
0.00 +1.5 1.46 ? 23% perf-profile.children.cycles-pp.destroy_large_folio
0.00 +1.5 1.52 ? 59% perf-profile.children.cycles-pp.fbcon_redraw
0.00 +1.5 1.55 ? 58% perf-profile.children.cycles-pp.con_scroll
0.00 +1.5 1.55 ? 58% perf-profile.children.cycles-pp.fbcon_scroll
0.00 +1.5 1.55 ? 58% perf-profile.children.cycles-pp.lf
0.00 +1.6 1.58 ? 57% perf-profile.children.cycles-pp.vt_console_print
0.00 +1.7 1.67 ? 20% perf-profile.children.cycles-pp.free_unref_page
0.00 +1.9 1.86 ? 39% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.13 ?264% +2.0 2.12 ? 29% perf-profile.children.cycles-pp.menu_select
0.55 ?134% +2.2 2.78 ? 18% perf-profile.children.cycles-pp.smpboot_thread_fn
0.00 +2.7 2.67 ? 19% perf-profile.children.cycles-pp.run_ksoftirqd
0.00 +2.8 2.83 ? 13% perf-profile.children.cycles-pp.wait_for_xmitr
0.75 ?132% +2.9 3.64 ? 10% perf-profile.children.cycles-pp.irq_exit_rcu
0.30 ?175% +3.1 3.40 ? 9% perf-profile.children.cycles-pp.__intel_pmu_enable_all
0.00 +3.2 3.24 ? 30% perf-profile.children.cycles-pp.ktime_get
0.44 ?173% +3.7 4.11 ? 15% perf-profile.children.cycles-pp.rcu_core
0.00 +3.7 3.72 ? 14% perf-profile.children.cycles-pp.xas_find
0.22 ?264% +3.8 4.01 ? 16% perf-profile.children.cycles-pp.rcu_do_batch
0.30 ?175% +4.4 4.68 ? 8% perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
0.30 ?175% +4.4 4.70 ? 8% perf-profile.children.cycles-pp.perf_event_task_tick
0.00 +4.4 4.41 ? 21% perf-profile.children.cycles-pp.__folio_batch_release
0.50 ?132% +4.4 4.93 ? 8% perf-profile.children.cycles-pp.intel_idle
0.00 +4.8 4.79 ? 15% perf-profile.children.cycles-pp.find_lock_entries
0.00 +5.0 5.01 ? 20% perf-profile.children.cycles-pp.io_serial_out
1.06 ?125% +5.2 6.25 ? 10% perf-profile.children.cycles-pp.scheduler_tick
0.00 +5.4 5.37 ? 8% perf-profile.children.cycles-pp.intel_idle_xstate
0.75 ?132% +5.4 6.12 ? 13% perf-profile.children.cycles-pp.__do_softirq
1.28 ?100% +5.9 7.18 ? 10% perf-profile.children.cycles-pp.update_process_times
1.28 ?100% +5.9 7.22 ? 10% perf-profile.children.cycles-pp.tick_sched_handle
1.28 ?100% +6.5 7.78 ? 9% perf-profile.children.cycles-pp.tick_nohz_highres_handler
0.22 ?264% +7.4 7.62 ? 12% perf-profile.children.cycles-pp.xas_store
0.00 +8.4 8.39 ? 15% perf-profile.children.cycles-pp.truncate_folio_batch_exceptionals
1.28 ?100% +9.0 10.25 ? 6% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.00 +10.6 10.65 ? 12% perf-profile.children.cycles-pp.io_serial_in
0.00 +10.8 10.80 ? 12% perf-profile.children.cycles-pp.wait_for_lsr
1.28 ?100% +11.0 12.30 ? 7% perf-profile.children.cycles-pp.hrtimer_interrupt
1.28 ?100% +11.4 12.71 ? 7% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
2.02 ? 85% +15.8 17.80 ? 7% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.00 +15.9 15.94 ? 14% perf-profile.children.cycles-pp.serial8250_console_write
0.17 ?264% +17.5 17.64 ? 12% perf-profile.children.cycles-pp.write
0.00 +17.5 17.55 ? 13% perf-profile.children.cycles-pp.devkmsg_emit
0.00 +17.6 17.55 ? 13% perf-profile.children.cycles-pp.devkmsg_write
0.00 +17.6 17.56 ? 12% perf-profile.children.cycles-pp.console_flush_all
0.00 +17.6 17.56 ? 12% perf-profile.children.cycles-pp.console_unlock
0.00 +17.6 17.65 ? 12% perf-profile.children.cycles-pp.vprintk_emit
0.00 +18.0 17.98 ? 12% perf-profile.children.cycles-pp.ast_primary_plane_helper_atomic_update
0.00 +18.0 17.98 ? 12% perf-profile.children.cycles-pp.drm_fb_memcpy
0.00 +18.0 17.98 ? 12% perf-profile.children.cycles-pp.memcpy_toio
0.00 +18.0 18.02 ? 12% perf-profile.children.cycles-pp.ast_mode_config_helper_atomic_commit_tail
0.00 +18.0 18.02 ? 12% perf-profile.children.cycles-pp.commit_tail
0.00 +18.0 18.02 ? 12% perf-profile.children.cycles-pp.drm_atomic_helper_commit_planes
0.00 +18.0 18.02 ? 12% perf-profile.children.cycles-pp.drm_atomic_helper_commit_tail_rpm
0.00 +18.0 18.03 ? 12% perf-profile.children.cycles-pp.drm_atomic_commit
0.00 +18.0 18.03 ? 12% perf-profile.children.cycles-pp.drm_atomic_helper_commit
0.00 +18.0 18.03 ? 12% perf-profile.children.cycles-pp.drm_atomic_helper_dirtyfb
0.00 +18.7 18.67 ? 12% perf-profile.children.cycles-pp.drm_fb_helper_damage_work
0.00 +18.7 18.67 ? 12% perf-profile.children.cycles-pp.drm_fbdev_generic_helper_fb_dirty
0.00 +18.7 18.72 ? 12% perf-profile.children.cycles-pp.process_one_work
0.00 +18.8 18.77 ? 12% perf-profile.children.cycles-pp.worker_thread
0.00 +19.3 19.28 ? 15% perf-profile.children.cycles-pp.truncate_inode_pages_range
0.00 +19.3 19.28 ? 15% perf-profile.children.cycles-pp.evict
0.00 +19.3 19.29 ? 15% perf-profile.children.cycles-pp.__x64_sys_unlinkat
0.00 +19.3 19.29 ? 15% perf-profile.children.cycles-pp.do_unlinkat
0.00 +19.3 19.29 ? 15% perf-profile.children.cycles-pp.unlinkat
1.04 ? 79% +22.3 23.38 ? 15% perf-profile.children.cycles-pp.ret_from_fork_asm
0.89 ?100% +22.5 23.36 ? 15% perf-profile.children.cycles-pp.kthread
0.89 ?100% +22.5 23.38 ? 15% perf-profile.children.cycles-pp.ret_from_fork
6.41 ? 88% +26.4 32.85 ? 7% perf-profile.children.cycles-pp.start_secondary
6.41 ? 88% +26.6 33.04 ? 7% perf-profile.children.cycles-pp.cpu_startup_entry
6.41 ? 88% +26.6 33.04 ? 7% perf-profile.children.cycles-pp.do_idle
6.41 ? 88% +26.6 33.04 ? 7% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
3.56 ? 99% +26.7 30.28 ? 7% perf-profile.children.cycles-pp.cpuidle_enter_state
3.56 ? 99% +26.7 30.30 ? 7% perf-profile.children.cycles-pp.cpuidle_enter
3.68 ? 96% +29.0 32.73 ? 7% perf-profile.children.cycles-pp.cpuidle_idle_call
5.46 ? 30% -5.5 0.00 perf-profile.self.cycles-pp.smp_call_function_single
5.44 ? 65% -4.6 0.80 ? 18% perf-profile.self.cycles-pp._raw_spin_lock
4.63 ? 59% -4.6 0.00 perf-profile.self.cycles-pp.copy_page_from_iter_atomic
0.00 +0.1 0.06 ? 32% perf-profile.self.cycles-pp.free_unref_page_commit
0.00 +0.1 0.06 ? 13% perf-profile.self.cycles-pp.do_idle
0.00 +0.1 0.06 ? 10% perf-profile.self.cycles-pp.perf_rotate_context
0.00 +0.1 0.06 ? 17% perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
0.00 +0.1 0.07 ? 14% perf-profile.self.cycles-pp.load_balance
0.00 +0.1 0.07 ? 17% perf-profile.self.cycles-pp.ct_kernel_enter
0.00 +0.1 0.07 ? 25% perf-profile.self.cycles-pp.rcu_sched_clock_irq
0.00 +0.1 0.08 ? 15% perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
0.00 +0.1 0.08 ? 9% perf-profile.self.cycles-pp.call_cpuidle
0.00 +0.1 0.08 ? 13% perf-profile.self.cycles-pp.error_entry
0.00 +0.1 0.08 ? 21% perf-profile.self.cycles-pp.__do_softirq
0.00 +0.1 0.08 ? 24% perf-profile.self.cycles-pp.uncharge_batch
0.00 +0.1 0.08 ? 20% perf-profile.self.cycles-pp.__page_cache_release
0.00 +0.1 0.09 ? 9% perf-profile.self.cycles-pp.hrtimer_interrupt
0.00 +0.1 0.09 ? 15% perf-profile.self.cycles-pp.x86_pmu_disable
0.00 +0.1 0.09 ? 26% perf-profile.self.cycles-pp.irqtime_account_irq
0.00 +0.1 0.09 ? 10% perf-profile.self.cycles-pp.__hrtimer_next_event_base
0.00 +0.1 0.10 ? 26% perf-profile.self.cycles-pp.delay_halt
0.00 +0.1 0.10 ? 24% perf-profile.self.cycles-pp.delete_from_page_cache_batch
0.00 +0.1 0.11 ? 11% perf-profile.self.cycles-pp.scheduler_tick
0.00 +0.1 0.12 ? 12% perf-profile.self.cycles-pp.workingset_update_node
0.00 +0.1 0.13 ? 32% perf-profile.self.cycles-pp.check_cpu_stall
0.00 +0.1 0.13 ? 12% perf-profile.self.cycles-pp.irqtime_account_process_tick
0.00 +0.1 0.13 ? 12% perf-profile.self.cycles-pp.__hrtimer_run_queues
0.00 +0.1 0.15 ? 71% perf-profile.self.cycles-pp.fbcon_redraw
0.00 +0.2 0.15 ? 67% perf-profile.self.cycles-pp.calc_global_load_tick
0.00 +0.2 0.15 ? 9% perf-profile.self.cycles-pp.cpuidle_idle_call
0.00 +0.2 0.16 ? 55% perf-profile.self.cycles-pp.bit_putcs
0.00 +0.2 0.17 ? 25% perf-profile.self.cycles-pp.delay_halt_tpause
0.00 +0.2 0.18 ? 13% perf-profile.self.cycles-pp.rcu_pending
0.00 +0.2 0.19 ? 16% perf-profile.self.cycles-pp.list_lru_del_obj
0.00 +0.2 0.21 ? 12% perf-profile.self.cycles-pp.trigger_load_balance
0.00 +0.2 0.22 ? 15% perf-profile.self.cycles-pp.truncate_folio_batch_exceptionals
0.00 +0.2 0.22 ? 16% perf-profile.self.cycles-pp.update_irq_load_avg
0.00 +0.2 0.23 ? 16% perf-profile.self.cycles-pp.radix_tree_node_rcu_free
0.00 +0.2 0.23 ? 30% perf-profile.self.cycles-pp.tick_sched_do_timer
0.00 +0.2 0.25 ? 15% perf-profile.self.cycles-pp.get_slabinfo
0.00 +0.2 0.25 ? 20% perf-profile.self.cycles-pp.ct_kernel_exit_state
0.00 +0.3 0.27 ? 16% perf-profile.self.cycles-pp.xas_start
0.00 +0.3 0.33 ? 54% perf-profile.self.cycles-pp.tick_nohz_next_event
0.00 +0.4 0.36 ? 10% perf-profile.self.cycles-pp.native_apic_msr_eoi
0.00 +0.4 0.38 ? 16% perf-profile.self.cycles-pp.ifs_free
0.00 +0.4 0.40 ? 10% perf-profile.self.cycles-pp.native_sched_clock
0.00 +0.4 0.43 ? 13% perf-profile.self.cycles-pp.lapic_next_deadline
0.00 +0.4 0.44 ? 13% perf-profile.self.cycles-pp.mem_cgroup_from_slab_obj
0.00 +0.4 0.45 ? 11% perf-profile.self.cycles-pp.read_tsc
0.00 +0.4 0.45 ? 22% perf-profile.self.cycles-pp.__free_pages_ok
0.00 +0.5 0.48 ? 23% perf-profile.self.cycles-pp.page_counter_uncharge
0.00 +0.5 0.52 ? 23% perf-profile.self.cycles-pp.ktime_get_update_offsets_now
0.00 +0.5 0.52 ? 15% perf-profile.self.cycles-pp.xas_load
0.00 +0.5 0.53 ? 25% perf-profile.self.cycles-pp.rcu_cblist_dequeue
0.00 +0.6 0.59 ? 48% perf-profile.self.cycles-pp.xas_descend
0.00 +0.6 0.62 ? 8% perf-profile.self.cycles-pp.menu_select
0.00 +0.6 0.62 ? 14% perf-profile.self.cycles-pp.xas_clear_mark
0.00 +0.7 0.69 ? 25% perf-profile.self.cycles-pp.folio_undo_large_rmappable
0.00 +1.0 0.96 ? 62% perf-profile.self.cycles-pp.fast_imageblit
0.00 +1.1 1.10 ? 23% perf-profile.self.cycles-pp.find_lock_entries
0.13 ?264% +1.2 1.38 ? 12% perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
0.30 ?175% +1.3 1.62 ? 10% perf-profile.self.cycles-pp.cpuidle_enter_state
0.48 ?132% +1.8 2.28 ? 16% perf-profile.self.cycles-pp.__slab_free
0.00 +2.8 2.85 ? 34% perf-profile.self.cycles-pp.ktime_get
0.30 ?175% +3.1 3.40 ? 9% perf-profile.self.cycles-pp.__intel_pmu_enable_all
0.00 +3.2 3.19 ? 14% perf-profile.self.cycles-pp.xas_store
0.00 +3.5 3.54 ? 15% perf-profile.self.cycles-pp.xas_find
0.50 ?132% +4.4 4.93 ? 8% perf-profile.self.cycles-pp.intel_idle
0.00 +5.0 5.01 ? 20% perf-profile.self.cycles-pp.io_serial_out
0.00 +5.3 5.34 ? 8% perf-profile.self.cycles-pp.intel_idle_xstate
0.00 +10.6 10.65 ? 12% perf-profile.self.cycles-pp.io_serial_in
0.00 +17.6 17.56 ? 12% perf-profile.self.cycles-pp.memcpy_toio



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



2024-02-21 11:14:45

by Jan Kara

Subject: Re: [linus:master] [readahead] ab4443fe3c: vm-scalability.throughput -21.4% regression

On Tue 20-02-24 16:25:37, kernel test robot wrote:
> Hello,
>
> kernel test robot noticed a -21.4% regression of vm-scalability.throughput on:
>
>
> commit: ab4443fe3ca6298663a55c4a70efc6c3ce913ca6 ("readahead: avoid multiple marked readahead pages")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> testcase: vm-scalability
> test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
> parameters:
>
> runtime: 300s
> test: lru-file-readtwice
> cpufreq_governor: performance

JFYI I had a look into this. What the test seems to do is create image
files on tmpfs, loop-mount XFS there, and do reads over files on XFS. But
I was not able to find out what lru-file-readtwice exactly does, nor was I
able to reproduce it, because I got stuck on some missing Ruby dependencies
on my test system yesterday.

Given the workload is over tmpfs, I'm not very concerned about what
readahead does and how it performs, but I'd still like to investigate where
the regression is coming from, because it is unexpected.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-02-22 01:35:16

by kernel test robot

Subject: Re: [linus:master] [readahead] ab4443fe3c: vm-scalability.throughput -21.4% regression

Hi Jan Kara,

On Wed, Feb 21, 2024 at 12:14:25PM +0100, Jan Kara wrote:
> On Tue 20-02-24 16:25:37, kernel test robot wrote:
> > Hello,
> >
> > kernel test robot noticed a -21.4% regression of vm-scalability.throughput on:
> >
> >
> > commit: ab4443fe3ca6298663a55c4a70efc6c3ce913ca6 ("readahead: avoid multiple marked readahead pages")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > testcase: vm-scalability
> > test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
> > parameters:
> >
> > runtime: 300s
> > test: lru-file-readtwice
> > cpufreq_governor: performance
>
> JFYI I had a look into this. What the test seems to do is create image
> files on tmpfs, loop-mount XFS there, and do reads over files on XFS. But
> I was not able to find out what lru-file-readtwice exactly does, nor was I
> able to reproduce it, because I got stuck on some missing Ruby dependencies
> on my test system yesterday.

what's your OS?

>
> Given the workload is over tmpfs, I'm not very concerned about what
> readahead does and how it performs, but I'd still like to investigate where
> the regression is coming from, because it is unexpected.

Thanks a lot for the information!
It was hard for me to determine the connection, so I rebuilt and reran the
tests several more times, which still showed stable data.

If you have any patch you want us to try, please let us know.
It's always our pleasure to provide support :)

>
> Honza
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR
>

2024-02-22 11:51:08

by Jan Kara

Subject: Re: [linus:master] [readahead] ab4443fe3c: vm-scalability.throughput -21.4% regression

Hello,

On Thu 22-02-24 09:32:52, Oliver Sang wrote:
> On Wed, Feb 21, 2024 at 12:14:25PM +0100, Jan Kara wrote:
> > On Tue 20-02-24 16:25:37, kernel test robot wrote:
> > > kernel test robot noticed a -21.4% regression of vm-scalability.throughput on:
> > >
> > > commit: ab4443fe3ca6298663a55c4a70efc6c3ce913ca6 ("readahead: avoid multiple marked readahead pages")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > >
> > > testcase: vm-scalability
> > > test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
> > > parameters:
> > >
> > > runtime: 300s
> > > test: lru-file-readtwice
> > > cpufreq_governor: performance
> >
> > JFYI I had a look into this. What the test seems to do is create image
> > files on tmpfs, loop-mount XFS there, and do reads over files on XFS. But
> > I was not able to find out what lru-file-readtwice exactly does, nor was I
> > able to reproduce it, because I got stuck on some missing Ruby dependencies
> > on my test system yesterday.
>
> what's your OS?

I have SLES15-SP4 installed in my VM. What was missing was the 'git'
rubygem, which apparently is not packaged at all; when I manually installed
it, I was still hitting other problems, so instead I went ahead, checked
the vm-scalability source, and wrote my own reproducer based on that.

I'm now able to reproduce the regression in my VM so I'm investigating...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-02-22 18:42:36

by Jan Kara

Subject: Re: [linus:master] [readahead] ab4443fe3c: vm-scalability.throughput -21.4% regression

On Thu 22-02-24 12:50:32, Jan Kara wrote:
> On Thu 22-02-24 09:32:52, Oliver Sang wrote:
> > On Wed, Feb 21, 2024 at 12:14:25PM +0100, Jan Kara wrote:
> > > On Tue 20-02-24 16:25:37, kernel test robot wrote:
> > > > kernel test robot noticed a -21.4% regression of vm-scalability.throughput on:
> > > >
> > > > commit: ab4443fe3ca6298663a55c4a70efc6c3ce913ca6 ("readahead: avoid multiple marked readahead pages")
> > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > >
> > > > testcase: vm-scalability
> > > > test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
> > > > parameters:
> > > >
> > > > runtime: 300s
> > > > test: lru-file-readtwice
> > > > cpufreq_governor: performance
> > >
> > > JFYI I had a look into this. What the test seems to do is create image
> > > files on tmpfs, loop-mount XFS there, and do reads over files on XFS. But
> > > I was not able to find out what lru-file-readtwice exactly does, nor was I
> > > able to reproduce it, because I got stuck on some missing Ruby dependencies
> > > on my test system yesterday.
> >
> > what's your OS?
>
> I have SLES15-SP4 installed in my VM. What was missing was the 'git'
> rubygem, which apparently is not packaged at all; when I manually installed
> it, I was still hitting other problems, so instead I went ahead, checked
> the vm-scalability source, and wrote my own reproducer based on that.
>
> I'm now able to reproduce the regression in my VM so I'm investigating...

So I was experimenting with this. What the test does is create as many
files as there are CPUs; the files are sized so that their total size is 8x
the amount of available RAM. For each file, two tasks are started which
sequentially read the file from start to end. A trivial repro from my VM
with 8 CPUs and 64GB of RAM looks like:

truncate -s 60000000000 /dev/shm/xfsimg
mkfs.xfs /dev/shm/xfsimg
mount -t xfs -o loop /dev/shm/xfsimg /mnt
for (( i = 0; i < 8; i++ )); do truncate -s 60000000000 /mnt/sparse-file-$i; done
echo "Ready..."
sleep 3
echo "Running..."
for (( i = 0; i < 8; i++ )); do
dd bs=4k if=/mnt/sparse-file-$i of=/dev/null &
dd bs=4k if=/mnt/sparse-file-$i of=/dev/null &
done 2>&1 | grep "copied"
wait
umount /mnt

The difference between slow and fast runs seems to be in the amount of
pages reclaimed with direct reclaim - after commit ab4443fe3c we reclaim
about 10% of pages with direct reclaim, whereas before commit ab4443fe3c
only about 1% of pages are reclaimed with direct reclaim. In both cases we
reclaim the same amount of pages, corresponding to the total size of the
files, so it isn't the case that we would be reading one page twice.
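
A rough way to observe this split, assuming the standard pgsteal counters
in /proc/vmstat (a sketch, not exactly what I used):

before_k=$(awk '/^pgsteal_kswapd/ {print $2}' /proc/vmstat)
before_d=$(awk '/^pgsteal_direct/ {print $2}' /proc/vmstat)
# ... run the reproducer above ...
after_k=$(awk '/^pgsteal_kswapd/ {print $2}' /proc/vmstat)
after_d=$(awk '/^pgsteal_direct/ {print $2}' /proc/vmstat)
echo "kswapd: $((after_k - before_k)) pages, direct: $((after_d - before_d)) pages"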

I suspect the reclaim difference is because after commit ab4443fe3c we
trigger readahead somewhat earlier, so our effective working set is
somewhat larger. This apparently gives kswapd a harder time and we end up
in direct reclaim more often.

Since this is a case of heavy overload on the system, I don't think the
throughput here matters that much and AFAICT the readahead code does
nothing wrong here. So I don't think we need to do anything here.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-03-04 05:06:47

by Yujie Liu

Subject: Re: [linus:master] [readahead] ab4443fe3c: vm-scalability.throughput -21.4% regression

Hi Honza,

On Thu, Feb 22, 2024 at 07:37:56PM +0100, Jan Kara wrote:
> On Thu 22-02-24 12:50:32, Jan Kara wrote:
> > On Thu 22-02-24 09:32:52, Oliver Sang wrote:
> > > On Wed, Feb 21, 2024 at 12:14:25PM +0100, Jan Kara wrote:
> > > > On Tue 20-02-24 16:25:37, kernel test robot wrote:
> > > > > kernel test robot noticed a -21.4% regression of vm-scalability.throughput on:
> > > > >
> > > > > commit: ab4443fe3ca6298663a55c4a70efc6c3ce913ca6 ("readahead: avoid multiple marked readahead pages")
> > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > > >
> > > > > testcase: vm-scalability
> > > > > test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
> > > > > parameters:
> > > > >
> > > > > runtime: 300s
> > > > > test: lru-file-readtwice
> > > > > cpufreq_governor: performance
> > > >
> > > > JFYI I had a look into this. What the test seems to do is create image
> > > > files on tmpfs, loop-mount XFS there, and do reads over files on XFS. But
> > > > I was not able to find out what lru-file-readtwice exactly does, nor was I
> > > > able to reproduce it, because I got stuck on some missing Ruby dependencies
> > > > on my test system yesterday.
> > >
> > > what's your OS?
> >
> > I have SLES15-SP4 installed in my VM. What was missing was the 'git'
> > rubygem, which apparently is not packaged at all; when I manually installed
> > it, I was still hitting other problems, so instead I went ahead, checked
> > the vm-scalability source, and wrote my own reproducer based on that.
> >
> > I'm now able to reproduce the regression in my VM so I'm investigating...
>
> So I was experimenting with this. What the test does is create as many
> files as there are CPUs; the files are sized so that their total size is 8x
> the amount of available RAM. For each file, two tasks are started which
> sequentially read the file from start to end. A trivial repro from my VM
> with 8 CPUs and 64GB of RAM looks like:
>
> truncate -s 60000000000 /dev/shm/xfsimg
> mkfs.xfs /dev/shm/xfsimg
> mount -t xfs -o loop /dev/shm/xfsimg /mnt
> for (( i = 0; i < 8; i++ )); do truncate -s 60000000000 /mnt/sparse-file-$i; done
> echo "Ready..."
> sleep 3
> echo "Running..."
> for (( i = 0; i < 8; i++ )); do
> dd bs=4k if=/mnt/sparse-file-$i of=/dev/null &
> dd bs=4k if=/mnt/sparse-file-$i of=/dev/null &
> done 2>&1 | grep "copied"
> wait
> umount /mnt
>
> The difference between slow and fast runs seems to be in the amount of
> pages reclaimed with direct reclaim - after commit ab4443fe3c we reclaim
> about 10% of pages with direct reclaim, whereas before commit ab4443fe3c
> only about 1% of pages are reclaimed with direct reclaim. In both cases we
> reclaim the same amount of pages, corresponding to the total size of the
> files, so it isn't the case that we would be reading one page twice.
>
> I suspect the reclaim difference is because after commit ab4443fe3c we
> trigger readahead somewhat earlier, so our effective working set is
> somewhat larger. This apparently gives kswapd a harder time and we end up
> in direct reclaim more often.
>
> Since this is a case of heavy overload on the system, I don't think the
> throughput here matters that much and AFAICT the readahead code does
> nothing wrong here. So I don't think we need to do anything here.

Thanks a lot for the analysis. It seems we can abstract two factors that
may affect the throughput:

1. The benchmark itself is "dd" from a file to null, which is basically
a sequential operation, so the earlier readahead should benefit the
throughput.

2. The earlier readahead somewhat enlarges the working set and causes
direct memory reclaim more often, which may hurt the throughput.

We did another round of testing. Our machine has 512GB of RAM; this time we
set the total file size to 256GB so that all the files can be fully loaded
into memory and there is no reclaim anymore. This eliminates the impact of
factor 2, but unexpectedly, we still see a -42.3% throughput regression
after commit ab4443fe3c.

From the perf profile, we can see that the contention on the folio lru lock
becomes more intense. We also did a simple one-file "dd" test. It looks
like low-order folios are more likely to be allocated after commit
ab4443fe3c (Fengwei will help provide the data soon). Therefore, the
average folio size decreases while the total folio count increases, which
leads to taking the lru lock more often.
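
One way to quantify the lock contention directly, assuming a kernel with
the lock contention tracepoints and a recent perf tool (a sketch, not the
exact method we used):

# profile contended locks system-wide for 10 seconds while the test runs
perf lock contention -a -b -- sleep 10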

Please kindly check the detailed metrics below:

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/runtime/test/cpufreq_governor/debug-setup:
lkp-spr-2sp4/vm-scalability/debian-11.1-x86_64-20220510.cgz/x86_64-rhel-8.3/gcc-12/300s/lru-file-readtwice/performance/256GB-perf

commit:
f0b7a0d1d466 ("Merge branch 'master' into mm-hotfixes-stable")
ab4443fe3ca6 ("readahead: avoid multiple marked readahead pages")

f0b7a0d1d46625db ab4443fe3ca6298663a55c4a70e
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.00 ? 49% -0.0 0.00 mpstat.cpu.all.iowait%
8.43 ? 2% +3.6 12.06 ? 5% mpstat.cpu.all.sys%
0.31 -0.0 0.27 ? 2% mpstat.cpu.all.usr%
2289863 ? 8% +55.6% 3563274 ? 7% numa-numastat.node0.local_node
2375395 ? 6% +54.4% 3666799 ? 6% numa-numastat.node0.numa_hit
2311189 ? 7% +53.8% 3554903 ? 6% numa-numastat.node1.local_node
2454386 ? 6% +50.1% 3684288 ? 4% numa-numastat.node1.numa_hit
300.98 +25.2% 376.84 ? 4% vmstat.memory.buff
46333305 +27.5% 59075372 ? 3% vmstat.memory.cache
25.22 ? 4% +51.4% 38.18 ? 6% vmstat.procs.r
303089 +8.0% 327220 vmstat.system.in
29.30 +13.5% 33.27 time.elapsed_time
29.30 +13.5% 33.27 time.elapsed_time.max
33780 ? 16% +94.4% 65660 ? 7% time.involuntary_context_switches
1943 ? 2% +42.4% 2767 ? 5% time.percent_of_cpu_this_job_got
554.77 ? 3% +63.5% 907.13 ? 6% time.system_time
14.90 -3.3% 14.40 time.user_time
20505 ? 11% -41.1% 12085 ? 8% time.voluntary_context_switches
284.00 ? 3% +34.8% 382.75 ? 5% turbostat.Avg_MHz
10.08 ? 2% +3.6 13.65 ? 4% turbostat.Busy%
39.50 ? 2% -1.8 37.68 turbostat.C1E%
0.38 ? 9% -17.3% 0.31 ? 14% turbostat.CPU%c6
9577640 +22.3% 11715251 ? 2% turbostat.IRQ
4.88 ? 12% -3.7 1.15 ? 48% turbostat.PKG_%
5558 ? 5% +41.9% 7887 ? 6% turbostat.POLL
790616 ? 6% -43.9% 443300 ? 7% vm-scalability.median
12060 ? 7% +3811.3 15871 ? 4% vm-scalability.stddev%
3.681e+08 ? 7% -42.3% 2.122e+08 ? 7% vm-scalability.throughput
33780 ? 16% +94.4% 65660 ? 7% vm-scalability.time.involuntary_context_switches
1943 ? 2% +42.4% 2767 ? 5% vm-scalability.time.percent_of_cpu_this_job_got
554.77 ? 3% +63.5% 907.13 ? 6% vm-scalability.time.system_time
20505 ? 11% -41.1% 12085 ? 8% vm-scalability.time.voluntary_context_switches
21390979 ? 4% +31.7% 28175360 ? 19% numa-meminfo.node0.Active
21388266 ? 4% +31.7% 28172516 ? 19% numa-meminfo.node0.Active(file)
24037883 ? 6% +31.1% 31516721 ? 17% numa-meminfo.node0.FilePages
497645 ? 25% +82.4% 907626 ? 38% numa-meminfo.node0.Inactive(file)
25952309 ? 6% +29.2% 33533454 ? 16% numa-meminfo.node0.MemUsed
20138 ? 9% +154.2% 51187 ? 11% numa-meminfo.node1.Active(anon)
704324 ? 17% +85.4% 1306147 ? 33% numa-meminfo.node1.Inactive
427031 ? 22% +141.7% 1031971 ? 41% numa-meminfo.node1.Inactive(file)
43712836 +27.4% 55698257 ? 2% meminfo.Active
22786 ? 6% +136.6% 53907 ? 11% meminfo.Active(anon)
43690049 +27.4% 55644350 ? 2% meminfo.Active(file)
47543418 +27.4% 60583554 ? 2% meminfo.Cached
1454581 ? 10% +72.8% 2513041 ? 11% meminfo.Inactive
929099 ? 16% +109.5% 1946433 ? 14% meminfo.Inactive(file)
242993 +12.9% 274324 meminfo.KReclaimable
79132 ? 2% +34.8% 106631 ? 2% meminfo.Mapped
51363725 +25.6% 64520957 ? 2% meminfo.Memused
9840 +12.2% 11041 ? 2% meminfo.PageTables
242993 +12.9% 274324 meminfo.SReclaimable
136679 +50.2% 205224 ? 5% meminfo.Shmem
72281513 ? 2% +25.8% 90925817 ? 2% meminfo.max_used_kB
5346609 ? 4% +31.7% 7042196 ? 19% numa-vmstat.node0.nr_active_file
6008637 ? 7% +31.1% 7878524 ? 17% numa-vmstat.node0.nr_file_pages
123918 ? 25% +83.2% 227064 ? 38% numa-vmstat.node0.nr_inactive_file
5346510 ? 4% +31.7% 7042147 ? 19% numa-vmstat.node0.nr_zone_active_file
123908 ? 25% +83.3% 227063 ? 38% numa-vmstat.node0.nr_zone_inactive_file
2375271 ? 6% +54.4% 3666818 ? 6% numa-vmstat.node0.numa_hit
2289740 ? 8% +55.6% 3563294 ? 7% numa-vmstat.node0.numa_local
5043 ? 9% +153.9% 12803 ? 11% numa-vmstat.node1.nr_active_anon
106576 ? 22% +141.7% 257597 ? 41% numa-vmstat.node1.nr_inactive_file
5043 ? 9% +153.9% 12803 ? 11% numa-vmstat.node1.nr_zone_active_anon
106574 ? 22% +141.7% 257604 ? 41% numa-vmstat.node1.nr_zone_inactive_file
2454493 ? 6% +50.1% 3684201 ? 4% numa-vmstat.node1.numa_hit
2311296 ? 7% +53.8% 3554816 ? 6% numa-vmstat.node1.numa_local
5701 ? 6% +136.5% 13486 ? 11% proc-vmstat.nr_active_anon
10923519 +27.3% 13904109 ? 2% proc-vmstat.nr_active_file
11886157 +27.4% 15138396 ? 2% proc-vmstat.nr_file_pages
1.19e+08 -2.8% 1.157e+08 proc-vmstat.nr_free_pages
131227 +8.1% 141868 proc-vmstat.nr_inactive_anon
231610 ? 16% +109.7% 485756 ? 14% proc-vmstat.nr_inactive_file
19793 ? 2% +34.7% 26668 ? 2% proc-vmstat.nr_mapped
2455 +12.3% 2758 ? 2% proc-vmstat.nr_page_table_pages
34038 ? 2% +51.4% 51526 ? 5% proc-vmstat.nr_shmem
60753 +12.9% 68588 proc-vmstat.nr_slab_reclaimable
113209 +5.9% 119837 proc-vmstat.nr_slab_unreclaimable
5701 ? 6% +136.5% 13486 ? 11% proc-vmstat.nr_zone_active_anon
10923517 +27.3% 13904109 ? 2% proc-vmstat.nr_zone_active_file
131227 +8.1% 141868 proc-vmstat.nr_zone_inactive_anon
231612 ? 16% +109.7% 485757 ? 14% proc-vmstat.nr_zone_inactive_file
162.75 ? 79% +552.8% 1062 ? 72% proc-vmstat.numa_hint_faults
4831171 ? 4% +52.2% 7352661 ? 4% proc-vmstat.numa_hit
4602441 ? 5% +54.7% 7119707 ? 4% proc-vmstat.numa_local
128.75 ? 59% +527.5% 807.88 ? 31% proc-vmstat.numa_pages_migrated
69656618 -1.5% 68615309 proc-vmstat.pgalloc_normal
672926 +3.0% 692907 proc-vmstat.pgfault
128.75 ? 59% +527.5% 807.88 ? 31% proc-vmstat.pgmigrate_success
31089 +3.7% 32235 proc-vmstat.pgreuse
0.77 ? 2% -0.0 0.74 ? 2% perf-stat.i.branch-miss-rate%
23.58 ? 6% +3.6 27.18 ? 4% perf-stat.i.cache-miss-rate%
2.74 +6.0% 2.90 perf-stat.i.cpi
5.887e+10 ? 7% +28.6% 7.572e+10 ? 10% perf-stat.i.cpu-cycles
10194 ? 3% -9.5% 9226 ? 4% perf-stat.i.cycles-between-cache-misses
0.44 -2.7% 0.43 perf-stat.i.ipc
0.25 ? 11% +29.9% 0.32 ? 11% perf-stat.i.metric.GHz
17995 ? 2% -9.0% 16374 ? 3% perf-stat.i.minor-faults
17995 ? 2% -9.0% 16374 ? 3% perf-stat.i.page-faults
17.09 -16.1% 14.34 ? 2% perf-stat.overall.MPKI
0.32 -0.0 0.29 perf-stat.overall.branch-miss-rate%
82.93 -2.1 80.88 perf-stat.overall.cache-miss-rate%
3.55 ? 2% +28.9% 4.58 ? 3% perf-stat.overall.cpi
207.81 ? 2% +53.7% 319.49 ? 5% perf-stat.overall.cycles-between-cache-misses
0.01 ? 4% +0.0 0.01 ? 3% perf-stat.overall.dTLB-load-miss-rate%
0.01 ? 3% +0.0 0.01 ? 2% perf-stat.overall.dTLB-store-miss-rate%
0.28 ? 2% -22.3% 0.22 ? 3% perf-stat.overall.ipc
967.32 +21.5% 1175 ? 2% perf-stat.overall.path-length
3.648e+09 +9.0% 3.976e+09 perf-stat.ps.branch-instructions
2.987e+08 -10.6% 2.67e+08 perf-stat.ps.cache-misses
3.602e+08 -8.4% 3.301e+08 perf-stat.ps.cache-references
6.207e+10 ? 2% +37.3% 8.524e+10 ? 4% perf-stat.ps.cpu-cycles
356765 ? 4% +14.6% 408833 ? 4% perf-stat.ps.dTLB-load-misses
4.786e+09 +5.2% 5.034e+09 perf-stat.ps.dTLB-loads
222451 ? 2% +6.7% 237255 ? 2% perf-stat.ps.dTLB-store-misses
2.207e+09 -7.4% 2.043e+09 perf-stat.ps.dTLB-stores
1.748e+10 +6.5% 1.862e+10 perf-stat.ps.instructions
17777 -9.3% 16117 ? 2% perf-stat.ps.minor-faults
17778 -9.3% 16118 ? 2% perf-stat.ps.page-faults
5.193e+11 +21.5% 6.31e+11 ? 2% perf-stat.total.instructions
12.70 -7.9 4.85 ? 38% perf-profile.calltrace.cycles-pp.copy_page_to_iter.filemap_read.xfs_file_buffered_read.xfs_file_read_iter.vfs_read
12.53 -7.8 4.76 ? 38% perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.filemap_read.xfs_file_buffered_read.xfs_file_read_iter
8.68 -5.2 3.46 ? 38% perf-profile.calltrace.cycles-pp.read_pages.page_cache_ra_order.filemap_get_pages.filemap_read.xfs_file_buffered_read
8.13 -4.7 3.38 ? 9% perf-profile.calltrace.cycles-pp.zero_user_segments.iomap_readpage_iter.iomap_readahead.read_pages.page_cache_ra_order
8.67 -4.7 3.93 ? 8% perf-profile.calltrace.cycles-pp.iomap_readahead.read_pages.page_cache_ra_order.filemap_get_pages.filemap_read
8.51 -4.7 3.81 ? 8% perf-profile.calltrace.cycles-pp.iomap_readpage_iter.iomap_readahead.read_pages.page_cache_ra_order.filemap_get_pages
7.84 -4.6 3.28 ? 8% perf-profile.calltrace.cycles-pp.__memset.zero_user_segments.iomap_readpage_iter.iomap_readahead.read_pages
6.47 ? 2% -2.1 4.39 ? 5% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
6.44 ? 2% -2.1 4.36 ? 5% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
6.44 ? 2% -2.1 4.36 ? 5% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
6.43 ? 2% -2.1 4.36 ? 5% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
6.39 ? 2% -2.1 4.33 ? 5% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
6.08 ? 2% -2.0 4.11 ? 5% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
5.85 ? 2% -1.9 3.96 ? 5% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
3.96 ? 2% -1.3 2.62 ? 6% perf-profile.calltrace.cycles-pp.write
3.50 ? 2% -1.1 2.36 ? 6% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
3.28 ? 2% -1.1 2.22 ? 6% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
2.76 ? 3% -0.9 1.86 ? 6% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
2.63 ? 3% -0.8 1.79 ? 6% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
2.37 ? 2% -0.8 1.57 ? 6% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
2.30 ? 2% -0.8 1.52 ? 6% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
2.34 ? 4% -0.7 1.61 ? 7% perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
1.91 ? 3% -0.7 1.26 ? 7% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
2.04 ? 4% -0.6 1.43 ? 7% perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
1.32 ? 2% -0.6 0.71 ? 38% perf-profile.calltrace.cycles-pp.filemap_get_read_batch.filemap_get_pages.filemap_read.xfs_file_buffered_read.xfs_file_read_iter
1.68 ? 4% -0.6 1.09 ? 8% perf-profile.calltrace.cycles-pp.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
1.48 ? 3% -0.5 0.98 ? 6% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
1.47 ? 3% -0.5 0.98 ? 6% perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt
1.29 ? 3% -0.4 0.87 ? 5% perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues
1.46 ? 11% -0.4 1.07 ? 38% perf-profile.calltrace.cycles-pp.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail
1.46 ? 11% -0.4 1.07 ? 38% perf-profile.calltrace.cycles-pp.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail
1.46 ? 11% -0.4 1.07 ? 38% perf-profile.calltrace.cycles-pp.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb
1.46 ? 11% -0.4 1.07 ? 38% perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit
1.46 ? 11% -0.4 1.07 ? 38% perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit.drm_atomic_commit
1.43 ? 11% -0.4 1.04 ? 38% perf-profile.calltrace.cycles-pp.memcpy_toio.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm
1.13 ? 3% -0.4 0.76 ? 6% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
1.02 -0.3 0.70 ? 5% perf-profile.calltrace.cycles-pp.intel_idle_xstate.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.94 ? 3% -0.3 0.65 ? 5% perf-profile.calltrace.cycles-pp.perf_event_task_tick.scheduler_tick.update_process_times.tick_sched_handle.tick_nohz_highres_handler
0.92 ? 3% -0.3 0.63 ? 5% perf-profile.calltrace.cycles-pp.perf_adjust_freq_unthr_context.perf_event_task_tick.scheduler_tick.update_process_times.tick_sched_handle
1.58 ? 12% -0.3 1.29 ? 3% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.74 ? 4% -0.3 0.46 ? 38% perf-profile.calltrace.cycles-pp.__filemap_add_folio.filemap_add_folio.page_cache_ra_order.filemap_get_pages.filemap_read
1.56 ? 12% -0.3 1.29 ? 4% perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
1.55 ? 12% -0.3 1.28 ? 3% perf-profile.calltrace.cycles-pp.drm_fb_helper_damage_work.process_one_work.worker_thread.kthread.ret_from_fork
1.55 ? 12% -0.3 1.28 ? 3% perf-profile.calltrace.cycles-pp.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work.worker_thread.kthread
1.65 ? 11% -0.3 1.38 ? 3% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
1.65 ? 11% -0.3 1.38 ? 3% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
1.65 ? 11% -0.3 1.38 ? 3% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
1.46 ? 11% -0.2 1.22 ? 3% perf-profile.calltrace.cycles-pp.commit_tail.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty
1.46 ? 11% -0.2 1.22 ? 3% perf-profile.calltrace.cycles-pp.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work
1.46 ? 11% -0.2 1.22 ? 3% perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work
1.47 ? 11% -0.2 1.22 ? 3% perf-profile.calltrace.cycles-pp.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work.worker_thread
1.03 ? 7% -0.2 0.82 ? 8% perf-profile.calltrace.cycles-pp.devkmsg_emit.devkmsg_write.vfs_write.ksys_write.do_syscall_64
1.03 ? 7% -0.2 0.82 ? 8% perf-profile.calltrace.cycles-pp.devkmsg_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.03 ? 7% -0.2 0.82 ? 8% perf-profile.calltrace.cycles-pp.vprintk_emit.devkmsg_emit.devkmsg_write.vfs_write.ksys_write
1.02 ? 7% -0.2 0.82 ? 8% perf-profile.calltrace.cycles-pp.console_flush_all.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write
1.02 ? 7% -0.2 0.82 ? 8% perf-profile.calltrace.cycles-pp.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write.vfs_write
0.61 ? 5% -0.1 0.55 ? 4% perf-profile.calltrace.cycles-pp.truncate_inode_pages_range.evict.do_unlinkat.__x64_sys_unlinkat.do_syscall_64
86.32 +4.1 90.42 perf-profile.calltrace.cycles-pp.read
85.08 +4.6 89.66 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
84.96 +4.6 89.58 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
84.63 +4.7 89.38 perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
84.36 +4.9 89.21 perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
26.79 +9.3 36.06 ? 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_activate
26.94 +9.3 36.22 ? 2% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_activate.folio_mark_accessed.filemap_read
26.87 +9.3 36.17 ? 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_activate.folio_mark_accessed
26.91 +10.9 37.78 ? 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru
27.00 +10.9 37.89 ? 2% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.filemap_add_folio.page_cache_ra_order
26.99 +10.9 37.89 ? 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.filemap_add_folio
27.44 +10.9 38.36 ? 2% perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru.filemap_add_folio.page_cache_ra_order.filemap_get_pages
27.47 +10.9 38.39 ? 2% perf-profile.calltrace.cycles-pp.folio_add_lru.filemap_add_folio.page_cache_ra_order.filemap_get_pages.filemap_read
12.72 -7.2 5.56 ? 7% perf-profile.children.cycles-pp.copy_page_to_iter
12.56 -7.1 5.46 ? 7% perf-profile.children.cycles-pp._copy_to_iter
8.80 -4.9 3.95 ? 8% perf-profile.children.cycles-pp.read_pages
8.78 -4.8 3.94 ? 8% perf-profile.children.cycles-pp.iomap_readahead
8.62 -4.8 3.83 ? 8% perf-profile.children.cycles-pp.iomap_readpage_iter
8.15 -4.8 3.39 ? 9% perf-profile.children.cycles-pp.zero_user_segments
8.07 -4.7 3.36 ? 9% perf-profile.children.cycles-pp.__memset
6.47 ? 2% -2.1 4.39 ? 5% perf-profile.children.cycles-pp.cpu_startup_entry
6.47 ? 2% -2.1 4.39 ? 5% perf-profile.children.cycles-pp.do_idle
6.47 ? 2% -2.1 4.39 ? 5% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
6.44 ? 2% -2.1 4.36 ? 5% perf-profile.children.cycles-pp.start_secondary
6.42 ? 2% -2.1 4.36 ? 5% perf-profile.children.cycles-pp.cpuidle_idle_call
6.10 ? 2% -2.0 4.13 ? 5% perf-profile.children.cycles-pp.cpuidle_enter
6.10 ? 2% -2.0 4.13 ? 5% perf-profile.children.cycles-pp.cpuidle_enter_state
4.53 ? 2% -1.5 2.98 ? 6% perf-profile.children.cycles-pp.write
4.34 ? 2% -1.4 2.90 ? 5% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
3.93 ? 2% -1.3 2.66 ? 5% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
2.96 ? 2% -1.0 1.99 ? 5% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
2.89 ? 2% -1.0 1.94 ? 5% perf-profile.children.cycles-pp.hrtimer_interrupt
2.46 ? 3% -0.8 1.64 ? 6% perf-profile.children.cycles-pp.__hrtimer_run_queues
2.47 ? 3% -0.8 1.72 ? 7% perf-profile.children.cycles-pp.ksys_write
2.18 ? 3% -0.7 1.43 ? 7% perf-profile.children.cycles-pp.tick_nohz_highres_handler
1.96 ? 2% -0.7 1.31 ? 5% perf-profile.children.cycles-pp.tick_sched_handle
1.96 ? 2% -0.7 1.30 ? 5% perf-profile.children.cycles-pp.update_process_times
2.20 ? 3% -0.6 1.55 ? 7% perf-profile.children.cycles-pp.vfs_write
1.74 ? 2% -0.6 1.17 ? 5% perf-profile.children.cycles-pp.scheduler_tick
1.35 ? 2% -0.5 0.82 ? 5% perf-profile.children.cycles-pp.filemap_get_read_batch
1.38 ? 2% -0.5 0.88 ? 6% perf-profile.children.cycles-pp.entry_SYSCALL_64
0.48 ? 17% -0.4 0.05 ? 42% perf-profile.children.cycles-pp.page_cache_ra_unbounded
0.69 ? 7% -0.4 0.27 ? 39% perf-profile.children.cycles-pp.xfs_ilock
0.82 ? 4% -0.4 0.40 ? 7% perf-profile.children.cycles-pp.touch_atime
1.46 ? 11% -0.4 1.07 ? 38% perf-profile.children.cycles-pp.ast_primary_plane_helper_atomic_update
1.46 ? 11% -0.4 1.07 ? 38% perf-profile.children.cycles-pp.ast_mode_config_helper_atomic_commit_tail
0.77 ? 5% -0.4 0.38 ? 7% perf-profile.children.cycles-pp.atime_needs_update
1.22 ? 2% -0.4 0.84 ? 5% perf-profile.children.cycles-pp.perf_event_task_tick
1.21 ? 2% -0.4 0.83 ? 5% perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
1.14 ? 3% -0.4 0.76 ? 6% perf-profile.children.cycles-pp.intel_idle
0.65 ? 8% -0.4 0.28 ? 9% perf-profile.children.cycles-pp.down_read
1.02 ? 2% -0.3 0.70 ? 5% perf-profile.children.cycles-pp.intel_idle_xstate
0.79 ? 2% -0.3 0.49 ? 5% perf-profile.children.cycles-pp.rw_verify_area
1.58 ? 12% -0.3 1.29 ? 3% perf-profile.children.cycles-pp.worker_thread
1.56 ? 12% -0.3 1.29 ? 4% perf-profile.children.cycles-pp.process_one_work
1.55 ? 12% -0.3 1.28 ? 3% perf-profile.children.cycles-pp.drm_fb_helper_damage_work
1.55 ? 12% -0.3 1.28 ? 3% perf-profile.children.cycles-pp.drm_fbdev_generic_helper_fb_dirty
1.65 ? 11% -0.3 1.38 ? 3% perf-profile.children.cycles-pp.ret_from_fork_asm
1.65 ? 11% -0.3 1.38 ? 3% perf-profile.children.cycles-pp.ret_from_fork
1.65 ? 11% -0.3 1.38 ? 3% perf-profile.children.cycles-pp.kthread
0.68 ? 2% -0.3 0.43 ? 7% perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
1.46 ? 11% -0.2 1.22 ? 3% perf-profile.children.cycles-pp.drm_fb_memcpy
1.46 ? 11% -0.2 1.22 ? 3% perf-profile.children.cycles-pp.memcpy_toio
0.77 ? 4% -0.2 0.52 ? 4% perf-profile.children.cycles-pp.__filemap_add_folio
1.46 ? 11% -0.2 1.22 ? 3% perf-profile.children.cycles-pp.commit_tail
1.46 ? 11% -0.2 1.22 ? 3% perf-profile.children.cycles-pp.drm_atomic_commit
1.46 ? 11% -0.2 1.22 ? 3% perf-profile.children.cycles-pp.drm_atomic_helper_commit
1.46 ? 11% -0.2 1.22 ? 3% perf-profile.children.cycles-pp.drm_atomic_helper_commit_planes
1.46 ? 11% -0.2 1.22 ? 3% perf-profile.children.cycles-pp.drm_atomic_helper_commit_tail_rpm
1.47 ? 11% -0.2 1.22 ? 3% perf-profile.children.cycles-pp.drm_atomic_helper_dirtyfb
0.62 ? 3% -0.2 0.39 ? 5% perf-profile.children.cycles-pp.xas_load
0.61 ? 2% -0.2 0.37 ? 5% perf-profile.children.cycles-pp.security_file_permission
0.76 ? 3% -0.2 0.53 ? 5% perf-profile.children.cycles-pp.__intel_pmu_enable_all
0.41 ? 5% -0.2 0.20 ? 38% perf-profile.children.cycles-pp.xfs_iunlock
1.03 ? 7% -0.2 0.82 ? 8% perf-profile.children.cycles-pp.devkmsg_emit
1.03 ? 7% -0.2 0.82 ? 8% perf-profile.children.cycles-pp.devkmsg_write
1.03 ? 7% -0.2 0.83 ? 8% perf-profile.children.cycles-pp.console_flush_all
1.03 ? 7% -0.2 0.83 ? 8% perf-profile.children.cycles-pp.console_unlock
1.04 ? 7% -0.2 0.84 ? 8% perf-profile.children.cycles-pp.vprintk_emit
0.62 ? 3% -0.2 0.42 ? 6% perf-profile.children.cycles-pp.irq_exit_rcu
0.60 ? 2% -0.2 0.41 ? 5% perf-profile.children.cycles-pp.__do_softirq
0.52 ? 3% -0.2 0.33 ? 7% perf-profile.children.cycles-pp.folio_alloc
0.45 ? 2% -0.2 0.27 ? 5% perf-profile.children.cycles-pp.apparmor_file_permission
0.33 ? 6% -0.2 0.16 ? 5% perf-profile.children.cycles-pp.up_read
0.38 ? 4% -0.1 0.23 ? 8% perf-profile.children.cycles-pp.__fsnotify_parent
0.45 ? 3% -0.1 0.31 ? 6% perf-profile.children.cycles-pp.rebalance_domains
0.34 ? 3% -0.1 0.20 ? 6% perf-profile.children.cycles-pp.__fdget_pos
0.40 ? 4% -0.1 0.27 ? 7% perf-profile.children.cycles-pp.__alloc_pages
0.33 ? 3% -0.1 0.19 ? 6% perf-profile.children.cycles-pp.xas_descend
0.41 ? 3% -0.1 0.27 ? 7% perf-profile.children.cycles-pp.alloc_pages_mpol
0.29 ? 6% -0.1 0.16 ? 8% perf-profile.children.cycles-pp.__mem_cgroup_charge
0.38 ? 3% -0.1 0.25 ? 7% perf-profile.children.cycles-pp.get_page_from_freelist
0.34 ? 2% -0.1 0.21 ? 6% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.22 ? 7% -0.1 0.10 ? 12% perf-profile.children.cycles-pp.try_charge_memcg
0.25 ? 3% -0.1 0.14 ? 5% perf-profile.children.cycles-pp.xas_store
0.31 ? 3% -0.1 0.22 ? 6% perf-profile.children.cycles-pp._raw_spin_trylock
0.20 ? 4% -0.1 0.11 ? 7% perf-profile.children.cycles-pp.__free_pages_ok
0.23 ? 5% -0.1 0.14 ? 7% perf-profile.children.cycles-pp.rmqueue
0.22 ? 4% -0.1 0.13 ? 8% perf-profile.children.cycles-pp.current_time
0.18 ? 6% -0.1 0.10 ? 7% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.38 ? 15% -0.1 0.29 ? 11% perf-profile.children.cycles-pp.ktime_get
0.16 ? 8% -0.1 0.08 ? 9% perf-profile.children.cycles-pp.page_counter_try_charge
0.25 ? 6% -0.1 0.17 ? 9% perf-profile.children.cycles-pp.ktime_get_update_offsets_now
0.18 ? 3% -0.1 0.10 ? 6% perf-profile.children.cycles-pp.__x64_sys_execve
0.18 ? 3% -0.1 0.10 ? 6% perf-profile.children.cycles-pp.do_execveat_common
0.18 ? 3% -0.1 0.10 ? 6% perf-profile.children.cycles-pp.execve
0.28 ? 21% -0.1 0.20 ? 13% perf-profile.children.cycles-pp.tick_irq_enter
0.17 ? 4% -0.1 0.10 ? 5% perf-profile.children.cycles-pp.__mmput
0.17 ? 4% -0.1 0.10 ? 5% perf-profile.children.cycles-pp.exit_mmap
0.25 -0.1 0.18 ? 7% perf-profile.children.cycles-pp.menu_select
0.18 ? 3% -0.1 0.11 ? 6% perf-profile.children.cycles-pp.aa_file_perm
0.28 ? 20% -0.1 0.21 ? 14% perf-profile.children.cycles-pp.irq_enter_rcu
0.13 ? 4% -0.1 0.06 ? 8% perf-profile.children.cycles-pp.xas_create
0.20 ? 4% -0.1 0.13 ? 7% perf-profile.children.cycles-pp.__mod_node_page_state
0.21 ? 4% -0.1 0.14 ? 5% perf-profile.children.cycles-pp.load_balance
0.20 ? 6% -0.1 0.14 ? 3% perf-profile.children.cycles-pp.xas_start
0.21 ? 4% -0.1 0.14 ? 6% perf-profile.children.cycles-pp.__mod_lruvec_state
0.11 ? 3% -0.1 0.04 ? 38% perf-profile.children.cycles-pp.kmem_cache_alloc_lru
0.11 ? 4% -0.1 0.04 ? 38% perf-profile.children.cycles-pp.xas_alloc
0.12 ? 2% -0.1 0.06 ? 8% perf-profile.children.cycles-pp.folio_prep_large_rmappable
0.18 ? 4% -0.1 0.12 ? 6% perf-profile.children.cycles-pp.__cond_resched
0.15 ? 5% -0.1 0.09 ? 4% perf-profile.children.cycles-pp.bprm_execve
0.61 ? 5% -0.1 0.55 ? 4% perf-profile.children.cycles-pp.truncate_inode_pages_range
0.13 ? 5% -0.1 0.08 ? 5% perf-profile.children.cycles-pp.exec_binprm
0.13 ? 5% -0.1 0.08 ? 5% perf-profile.children.cycles-pp.load_elf_binary
0.13 ? 5% -0.1 0.08 ? 5% perf-profile.children.cycles-pp.search_binary_handler
0.08 ? 6% -0.1 0.02 ?100% perf-profile.children.cycles-pp.begin_new_exec
0.14 ? 11% -0.1 0.08 ? 8% perf-profile.children.cycles-pp.arch_scale_freq_tick
0.12 ? 4% -0.1 0.07 ? 7% perf-profile.children.cycles-pp.lru_add_drain
0.10 ? 5% -0.1 0.04 ? 37% perf-profile.children.cycles-pp.__xas_next
0.12 ? 5% -0.1 0.06 ? 7% perf-profile.children.cycles-pp.lru_add_drain_cpu
0.15 ? 6% -0.1 0.10 ? 5% perf-profile.children.cycles-pp.update_sd_lb_stats
0.12 ? 2% -0.1 0.07 ? 7% perf-profile.children.cycles-pp.asm_exc_page_fault
0.15 ? 4% -0.1 0.10 ? 7% perf-profile.children.cycles-pp.find_busiest_group
0.31 ? 5% -0.0 0.26 ? 6% perf-profile.children.cycles-pp.workingset_activation
0.12 ? 4% -0.0 0.07 ? 4% perf-profile.children.cycles-pp.do_exit
0.12 ? 3% -0.0 0.07 ? 4% perf-profile.children.cycles-pp.__x64_sys_exit_group
0.12 ? 3% -0.0 0.07 ? 4% perf-profile.children.cycles-pp.do_group_exit
0.13 ? 5% -0.0 0.09 ? 5% perf-profile.children.cycles-pp.update_sg_lb_stats
0.10 ? 5% -0.0 0.06 ? 9% perf-profile.children.cycles-pp.do_vmi_munmap
0.10 ? 4% -0.0 0.05 ? 8% perf-profile.children.cycles-pp.do_vmi_align_munmap
0.10 ? 5% -0.0 0.05 ? 38% perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
0.15 ? 4% -0.0 0.10 ? 9% perf-profile.children.cycles-pp._raw_spin_lock
0.35 ? 2% -0.0 0.30 ? 3% perf-profile.children.cycles-pp.folio_activate_fn
0.11 ? 6% -0.0 0.07 ? 7% perf-profile.children.cycles-pp.__schedule
0.11 ? 4% -0.0 0.06 ? 10% perf-profile.children.cycles-pp.do_user_addr_fault
0.11 ? 4% -0.0 0.06 ? 10% perf-profile.children.cycles-pp.exc_page_fault
0.08 ? 4% -0.0 0.04 ? 57% perf-profile.children.cycles-pp.unmap_region
0.10 ? 4% -0.0 0.06 ? 5% perf-profile.children.cycles-pp.exit_mm
0.10 ? 3% -0.0 0.06 ? 5% perf-profile.children.cycles-pp.handle_mm_fault
0.15 ? 5% -0.0 0.11 ? 14% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
0.15 ? 4% -0.0 0.11 ? 6% perf-profile.children.cycles-pp.native_irq_return_iret
0.10 ? 3% -0.0 0.06 ? 8% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.10 ? 6% -0.0 0.06 ? 5% perf-profile.children.cycles-pp.tlb_batch_pages_flush
0.10 ? 5% -0.0 0.06 ? 5% perf-profile.children.cycles-pp.vm_mmap_pgoff
0.10 ? 4% -0.0 0.06 ? 8% perf-profile.children.cycles-pp.mmap_region
0.15 ? 6% -0.0 0.11 ? 7% perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
0.08 ? 4% -0.0 0.04 ? 57% perf-profile.children.cycles-pp.rcu_core
0.10 ? 4% -0.0 0.06 ? 7% perf-profile.children.cycles-pp.tlb_finish_mmu
0.10 ? 4% -0.0 0.06 ? 5% perf-profile.children.cycles-pp.do_mmap
0.10 ? 5% -0.0 0.06 ? 8% perf-profile.children.cycles-pp.__handle_mm_fault
0.16 ? 14% -0.0 0.12 ? 23% perf-profile.children.cycles-pp.vt_console_print
0.15 ? 13% -0.0 0.12 ? 23% perf-profile.children.cycles-pp.con_scroll
0.15 ? 13% -0.0 0.11 ? 24% perf-profile.children.cycles-pp.fbcon_redraw
0.15 ? 13% -0.0 0.12 ? 23% perf-profile.children.cycles-pp.fbcon_scroll
0.15 ? 13% -0.0 0.12 ? 23% perf-profile.children.cycles-pp.lf
0.11 ? 5% -0.0 0.08 ? 6% perf-profile.children.cycles-pp.task_tick_fair
0.08 ? 17% -0.0 0.05 ? 38% perf-profile.children.cycles-pp.calc_global_load_tick
0.11 ? 4% -0.0 0.07 ? 4% perf-profile.children.cycles-pp.perf_rotate_context
0.09 ? 6% -0.0 0.05 ? 8% perf-profile.children.cycles-pp.schedule
0.14 ? 13% -0.0 0.10 ? 24% perf-profile.children.cycles-pp.fbcon_putcs
0.10 ? 5% -0.0 0.06 ? 8% perf-profile.children.cycles-pp.rcu_all_qs
0.07 ? 7% -0.0 0.03 ? 77% perf-profile.children.cycles-pp.sched_clock
0.10 ? 18% -0.0 0.07 ? 7% perf-profile.children.cycles-pp.__memcpy
0.08 ? 6% -0.0 0.04 ? 37% perf-profile.children.cycles-pp.asm_sysvec_call_function
0.11 ? 4% -0.0 0.08 ? 5% perf-profile.children.cycles-pp.clockevents_program_event
0.11 ? 16% -0.0 0.08 ? 25% perf-profile.children.cycles-pp.fast_imageblit
0.11 ? 16% -0.0 0.08 ? 25% perf-profile.children.cycles-pp.drm_fbdev_generic_defio_imageblit
0.11 ? 16% -0.0 0.08 ? 25% perf-profile.children.cycles-pp.sys_imageblit
0.12 ? 5% -0.0 0.10 ? 7% perf-profile.children.cycles-pp.find_lock_entries
0.09 ? 4% -0.0 0.06 ? 8% perf-profile.children.cycles-pp.native_sched_clock
0.08 ? 6% -0.0 0.05 ? 8% perf-profile.children.cycles-pp.sched_clock_cpu
0.08 ? 5% -0.0 0.06 ? 5% perf-profile.children.cycles-pp.lapic_next_deadline
0.09 ? 4% -0.0 0.06 ? 7% perf-profile.children.cycles-pp.read_tsc
0.07 ? 6% -0.0 0.05 ? 8% perf-profile.children.cycles-pp.native_apic_msr_eoi
0.07 ? 8% -0.0 0.05 ? 6% perf-profile.children.cycles-pp.__free_one_page
0.07 ? 9% +0.0 0.09 perf-profile.children.cycles-pp.__mem_cgroup_uncharge
0.06 ? 7% +0.0 0.09 ? 5% perf-profile.children.cycles-pp.uncharge_batch
0.04 ? 58% +0.0 0.07 perf-profile.children.cycles-pp.page_counter_uncharge
0.09 ? 7% +0.0 0.12 ? 2% perf-profile.children.cycles-pp.destroy_large_folio
0.08 ? 4% +0.0 0.13 ? 13% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.00 +0.1 0.06 ? 7% perf-profile.children.cycles-pp.free_unref_page
89.22 +3.4 92.57 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
89.02 +3.4 92.44 perf-profile.children.cycles-pp.do_syscall_64
86.89 +3.9 90.79 perf-profile.children.cycles-pp.read
39.51 +4.7 44.21 perf-profile.children.cycles-pp.filemap_get_pages
84.67 +4.7 89.40 perf-profile.children.cycles-pp.ksys_read
84.40 +4.8 89.24 perf-profile.children.cycles-pp.vfs_read
37.48 +5.7 43.21 perf-profile.children.cycles-pp.page_cache_ra_order
82.04 +5.9 87.98 perf-profile.children.cycles-pp.filemap_read
28.01 +9.2 37.19 perf-profile.children.cycles-pp.folio_mark_accessed
27.68 +9.2 36.91 ? 2% perf-profile.children.cycles-pp.folio_activate
28.55 +10.4 38.96 ? 2% perf-profile.children.cycles-pp.filemap_add_folio
27.81 +10.7 38.48 ? 2% perf-profile.children.cycles-pp.folio_add_lru
54.31 +19.8 74.12 ? 2% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
54.49 +19.8 74.33 ? 2% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
54.82 +19.8 74.67 ? 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
55.58 +19.9 75.44 ? 2% perf-profile.children.cycles-pp.folio_batch_move_lru
12.46 -7.0 5.42 ? 7% perf-profile.self.cycles-pp._copy_to_iter
8.02 -4.7 3.34 ? 9% perf-profile.self.cycles-pp.__memset
1.14 ? 3% -0.4 0.76 ? 6% perf-profile.self.cycles-pp.intel_idle
0.93 ? 3% -0.3 0.58 ? 6% perf-profile.self.cycles-pp.filemap_read
0.56 ? 8% -0.3 0.23 ? 9% perf-profile.self.cycles-pp.down_read
1.02 -0.3 0.70 ? 5% perf-profile.self.cycles-pp.intel_idle_xstate
0.49 ? 7% -0.3 0.21 ? 8% perf-profile.self.cycles-pp.atime_needs_update
0.66 ? 2% -0.2 0.42 ? 7% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
1.43 ? 11% -0.2 1.18 ? 3% perf-profile.self.cycles-pp.memcpy_toio
0.66 ? 2% -0.2 0.44 ? 5% perf-profile.self.cycles-pp.filemap_get_read_batch
0.76 ? 3% -0.2 0.53 ? 5% perf-profile.self.cycles-pp.__intel_pmu_enable_all
0.60 ? 3% -0.2 0.38 ? 6% perf-profile.self.cycles-pp.write
0.60 ? 2% -0.2 0.38 ? 6% perf-profile.self.cycles-pp.read
0.53 ? 3% -0.2 0.32 ? 6% perf-profile.self.cycles-pp.vfs_read
0.48 -0.2 0.31 ? 4% perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
0.32 ? 7% -0.2 0.16 ? 6% perf-profile.self.cycles-pp.up_read
0.36 ? 4% -0.1 0.22 ? 7% perf-profile.self.cycles-pp.__fsnotify_parent
0.32 ? 4% -0.1 0.19 ? 6% perf-profile.self.cycles-pp.__fdget_pos
0.30 ? 7% -0.1 0.17 ? 9% perf-profile.self.cycles-pp.vfs_write
0.30 ? 3% -0.1 0.17 ? 6% perf-profile.self.cycles-pp.xas_descend
0.28 ? 2% -0.1 0.16 ? 8% perf-profile.self.cycles-pp.do_syscall_64
0.28 ? 3% -0.1 0.17 ? 6% perf-profile.self.cycles-pp.entry_SYSCALL_64
0.32 ? 3% -0.1 0.22 ? 7% perf-profile.self.cycles-pp.cpuidle_enter_state
0.19 ? 8% -0.1 0.09 ? 38% perf-profile.self.cycles-pp.xfs_file_read_iter
0.24 ? 3% -0.1 0.14 ? 6% perf-profile.self.cycles-pp.apparmor_file_permission
0.31 ? 3% -0.1 0.22 ? 6% perf-profile.self.cycles-pp._raw_spin_trylock
0.18 ? 5% -0.1 0.10 ? 8% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.22 ? 4% -0.1 0.14 ? 4% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.23 ? 6% -0.1 0.16 ? 9% perf-profile.self.cycles-pp.ktime_get_update_offsets_now
0.14 ? 9% -0.1 0.07 ? 12% perf-profile.self.cycles-pp.page_counter_try_charge
0.10 ? 4% -0.1 0.02 ?100% perf-profile.self.cycles-pp.rmqueue
0.20 ? 3% -0.1 0.13 ? 7% perf-profile.self.cycles-pp.__mod_node_page_state
0.18 ? 3% -0.1 0.12 ? 7% perf-profile.self.cycles-pp.xas_load
0.09 -0.1 0.02 ?100% perf-profile.self.cycles-pp.__xas_next
0.17 ? 2% -0.1 0.10 ? 6% perf-profile.self.cycles-pp.rw_verify_area
0.16 ? 3% -0.1 0.10 ? 6% perf-profile.self.cycles-pp.aa_file_perm
0.16 ? 3% -0.1 0.09 ? 9% perf-profile.self.cycles-pp.filemap_get_pages
0.19 ? 6% -0.1 0.13 ? 3% perf-profile.self.cycles-pp.xas_start
0.18 ? 3% -0.1 0.12 ? 5% perf-profile.self.cycles-pp.security_file_permission
0.17 -0.1 0.11 ? 6% perf-profile.self.cycles-pp.copy_page_to_iter
0.12 ? 2% -0.1 0.06 ? 8% perf-profile.self.cycles-pp.folio_prep_large_rmappable
0.16 ? 3% -0.1 0.10 ? 6% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.14 ? 11% -0.1 0.08 ? 8% perf-profile.self.cycles-pp.arch_scale_freq_tick
0.08 ? 8% -0.1 0.03 ? 77% perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64
0.12 ? 6% -0.1 0.06 ? 10% perf-profile.self.cycles-pp.__free_pages_ok
0.08 ? 4% -0.0 0.03 ? 77% perf-profile.self.cycles-pp.xfs_ilock
0.14 ? 4% -0.0 0.09 ? 9% perf-profile.self.cycles-pp._raw_spin_lock
0.10 ? 4% -0.0 0.06 ? 39% perf-profile.self.cycles-pp.xfs_iunlock
0.12 ? 4% -0.0 0.08 ? 9% perf-profile.self.cycles-pp.current_time
0.11 ? 18% -0.0 0.06 ? 17% perf-profile.self.cycles-pp.iomap_set_range_uptodate
0.12 ? 4% -0.0 0.07 ? 7% perf-profile.self.cycles-pp.ksys_write
0.15 ? 4% -0.0 0.11 ? 6% perf-profile.self.cycles-pp.native_irq_return_iret
0.10 ? 3% -0.0 0.06 ? 8% perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.09 ? 5% -0.0 0.05 ? 38% perf-profile.self.cycles-pp.xfs_file_buffered_read
0.10 ? 4% -0.0 0.06 ? 7% perf-profile.self.cycles-pp.xas_store
0.14 ? 3% -0.0 0.10 ? 15% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
0.08 ? 17% -0.0 0.04 ? 38% perf-profile.self.cycles-pp.calc_global_load_tick
0.11 ? 4% -0.0 0.07 ? 10% perf-profile.self.cycles-pp.ksys_read
0.10 ? 18% -0.0 0.07 ? 7% perf-profile.self.cycles-pp.__memcpy
0.10 ? 7% -0.0 0.07 ? 7% perf-profile.self.cycles-pp.update_sg_lb_stats
0.12 ? 4% -0.0 0.08 ? 8% perf-profile.self.cycles-pp.menu_select
0.09 ? 4% -0.0 0.06 ? 9% perf-profile.self.cycles-pp.__cond_resched
0.11 ? 16% -0.0 0.08 ? 25% perf-profile.self.cycles-pp.fast_imageblit
0.09 ? 4% -0.0 0.06 ? 5% perf-profile.self.cycles-pp.read_tsc
0.08 ? 5% -0.0 0.06 ? 5% perf-profile.self.cycles-pp.native_sched_clock
0.08 ? 5% -0.0 0.06 ? 5% perf-profile.self.cycles-pp.lapic_next_deadline
0.08 ? 6% -0.0 0.06 ? 11% perf-profile.self.cycles-pp.folio_lruvec_lock_irqsave
0.07 ? 5% -0.0 0.05 ? 8% perf-profile.self.cycles-pp.native_apic_msr_eoi
0.07 ? 4% -0.0 0.05 ? 6% perf-profile.self.cycles-pp.__free_one_page
0.09 ? 8% -0.0 0.07 ? 4% perf-profile.self.cycles-pp.find_lock_entries
0.09 +0.0 0.10 perf-profile.self.cycles-pp.lru_add_fn
0.14 ? 2% +0.0 0.16 perf-profile.self.cycles-pp.folio_batch_move_lru
0.03 ? 77% +0.0 0.06 ? 7% perf-profile.self.cycles-pp.page_counter_uncharge
54.31 +19.8 74.12 ? 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath


Best Regards,
Yujie

2024-03-04 05:35:35

by Yin, Fengwei

Subject: Re: [linus:master] [readahead] ab4443fe3c: vm-scalability.throughput -21.4% regression

Hi Jan,

On 3/4/2024 12:59 PM, Yujie Liu wrote:
> From the perf profile, we can see that the contention on the folio lru lock
> becomes more intense. We also did a simple one-file "dd" test. It looks
> like low-order folios are more likely to be allocated after commit
> ab4443fe3c (Fengwei will help provide the data soon). Therefore, the
> average folio size decreases while the total folio count increases, which
> leads to taking the lru lock more often.

I did the following testing:
With an XFS image in tmpfs mounted at /mnt and a 12G test file
(sparse-file) created on it, use one process to read it on an Ice Lake
machine with 256G of system memory. This way we can be sure we are doing
a sequential file read with no page reclaim triggered.

At the same time, I profiled the distribution of the order parameter of
filemap_alloc_folio() calls to understand how the large folio orders for
the page cache are generated.
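
One way to capture such a histogram is a bpftrace one-liner like the
following (a sketch; it assumes filemap_alloc_folio() is not inlined on
your kernel):

# histogram of the 'order' argument (arg1) to filemap_alloc_folio()
bpftrace -e 'kprobe:filemap_alloc_folio { @order = lhist(arg1, 0, 8, 1); }'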

Here is what we got:

- Commit f0b7a0d1d46625db:
$ dd bs=4k if=/mnt/sparse-file of=/dev/null
3145728+0 records in
3145728+0 records out
12884901888 bytes (13 GB, 12 GiB) copied, 2.52208 s, 5.01 GB/s

filemap_alloc_folio
page order : count distribution
0 : 57 | |
1 : 0 | |
2 : 20 | |
3 : 2 | |
4 : 4 | |
5 : 98300 |****************************************|

- Commit ab4443fe3ca6:
$ dd bs=4k if=/mnt/sparse-file of=/dev/null
3145728+0 records in
3145728+0 records out
12884901888 bytes (13 GB, 12 GiB) copied, 2.51469 s, 5.1 GB/s

filemap_alloc_folio
page order : count distribution
0 : 21 | |
1 : 0 | |
2 : 196615 |****************************************|
3 : 98303 |******************* |
4 : 98303 |******************* |


Even though the file read throughput is almost the same, the distribution
of orders looks like a regression with ab4443fe3ca6 (more smaller-order
page cache folios are generated than with the parent commit). Thanks.


Regards
Yin, Fengwei

2024-03-06 05:36:35

by Yin, Fengwei

Subject: Re: [linus:master] [readahead] ab4443fe3c: vm-scalability.throughput -21.4% regression



On 3/4/24 13:35, Yin, Fengwei wrote:
> Even though the file read throughput is almost the same, the distribution
> of orders looks like a regression with ab4443fe3ca6 (more smaller-order
> page cache folios are generated than with the parent commit). Thanks.
There may be confusion here. Let me clarify:
I shouldn't say the folio order distribution itself is a regression. It's
that the smaller folio orders cause more folios to be added to the page
cache for the same workload, which raises the lru lock contention that
triggers the regression.
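
Putting rough numbers on it from the two histograms above (4KB base pages),
e.g. with shell arithmetic:

echo $(( 98300 * 32 ))                     # parent: ~3.1M pages in ~98k order-5 folios
echo $(( 196615*4 + 98303*8 + 98303*16 )) # ab4443fe3ca6: the same ~3.1M pages...
echo $(( 196615 + 98303 + 98303 ))        # ...but spread over ~393k folios

So roughly 4x as many folios pass through the lru handling for the same
amount of data read, which means correspondingly more lru lock
acquisitions.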


Regards
Yin, Fengwei

2024-03-07 09:24:29

by Jan Kara

Subject: Re: [linus:master] [readahead] ab4443fe3c: vm-scalability.throughput -21.4% regression

On Mon 04-03-24 13:35:10, Yin, Fengwei wrote:
> Hi Jan,
>
> On 3/4/2024 12:59 PM, Yujie Liu wrote:
> > From the perf profile, we can see that the contention on the folio lru lock
> > becomes more intense. We also did a simple one-file "dd" test. It looks
> > like low-order folios are more likely to be allocated after commit
> > ab4443fe3c (Fengwei will help provide the data soon). Therefore, the
> > average folio size decreases while the total folio count increases, which
> > leads to taking the lru lock more often.
>
> I did the following testing:
> With an XFS image in tmpfs mounted at /mnt and a 12G test file
> (sparse-file) created on it, use one process to read it on an Ice Lake
> machine with 256G of system memory. This way we can be sure we are doing
> a sequential file read with no page reclaim triggered.
>
> At the same time, I profiled the distribution of the order parameter of
> filemap_alloc_folio() calls to understand how the large folio orders for
> the page cache are generated.
>
> Here is what we got:
>
> - Commit f0b7a0d1d46625db:
> $ dd bs=4k if=/mnt/sparse-file of=/dev/null
> 3145728+0 records in
> 3145728+0 records out
> 12884901888 bytes (13 GB, 12 GiB) copied, 2.52208 s, 5.01 GB/s
>
> filemap_alloc_folio
> page order : count distribution
> 0 : 57 | |
> 1 : 0 | |
> 2 : 20 | |
> 3 : 2 | |
> 4 : 4 | |
> 5 : 98300 |****************************************|
>
> - Commit ab4443fe3ca6:
> $ dd bs=4k if=/mnt/sparse-file of=/dev/null
> 3145728+0 records in
> 3145728+0 records out
> 12884901888 bytes (13 GB, 12 GiB) copied, 2.51469 s, 5.1 GB/s
>
> filemap_alloc_folio
> page order : count distribution
> 0 : 21 | |
> 1 : 0 | |
> 2 : 196615 |****************************************|
> 3 : 98303 |******************* |
> 4 : 98303 |******************* |
>
>
> Even though the file read throughput is almost the same, the distribution
> of orders looks like a regression with ab4443fe3ca6 (more smaller-order
> page cache folios are generated than with the parent commit). Thanks.

Thanks for testing! This is an interesting result and certainly unexpected
for me. The readahead code allocates naturally aligned pages so based on
the distribution of allocations it seems that before commit ab4443fe3ca6
readahead window was at least 32 pages (128KB) aligned and so we allocated
order 5 pages. After the commit, the readahead window somehow ended up only
aligned to 20 modulo 32. To follow natural alignment and fill 128KB
readahead window we allocated order 2 page (got us to offset 24 modulo 32),
then order 3 page (got us to offset 0 modulo 32), order 4 page (larger
would not fit in 128KB readahead window now), and order 2 page to finish
filling the readahead window.
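
The walk above can be checked with a few lines of shell mimicking the
alignment logic of page_cache_ra_order() (a simplified model of the kernel
code, assuming a 32-page window starting at page index 20):

index=20; end=52; max_order=5
while (( index < end )); do
    order=$max_order
    # align with smaller pages if index is not naturally aligned
    while (( order > 0 && index % (1 << order) != 0 )); do (( order-- )); done
    # don't allocate past the end of the window in this simplified model
    while (( index + (1 << order) > end )); do (( order-- )); done
    echo "index=$index -> order $order"
    (( index += 1 << order ))
done

This prints orders 2, 3, 4 and 2, matching the distribution Fengwei
measured.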

Now I'm not 100% sure why the readahead window alignment changed with
different rounding when placing readahead mark - probably that's some
artifact when readahead window is tiny in the beginning before we scale it
up (I'll verify by tracing whether everything ends up looking correctly
with the current code). So I don't expect this is a problem in ab4443fe3ca6
as such but it exposes the issue that readahead page insertion code should
perhaps strive to achieve better readahead window alignment with logical
file offset even at the cost of occasionally performing somewhat shorter
readahead. I'll look into this once I dig out of the huge heap of email
after vacation...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2024-03-07 23:06:46

by Matthew Wilcox

Subject: Re: [linus:master] [readahead] ab4443fe3c: vm-scalability.throughput -21.4% regression

On Thu, Mar 07, 2024 at 10:23:08AM +0100, Jan Kara wrote:
> Thanks for testing! This is an interesting result and certainly unexpected
> for me. The readahead code allocates naturally aligned pages, so based on
> the distribution of allocations it seems that before commit ab4443fe3ca6
> the readahead window was at least 32-page (128KB) aligned and so we
> allocated order-5 pages. After the commit, the readahead window somehow
> ended up aligned only to 20 modulo 32. To follow natural alignment and
> fill the 128KB readahead window we allocated an order-2 page (getting us
> to offset 24 modulo 32), then an order-3 page (getting us to offset 0
> modulo 32), then an order-4 page (a larger one would no longer fit in the
> 128KB readahead window), and finally an order-2 page to finish filling
> the readahead window.
>
> Now I'm not 100% sure why the readahead window alignment changed with the
> different rounding when placing the readahead mark - probably that's some
> artifact of the readahead window being tiny in the beginning, before we
> scale it up (I'll verify by tracing whether everything ends up looking
> correct with the current code). So I don't expect this is a problem in
> ab4443fe3ca6 as such, but it exposes the issue that the readahead page
> insertion code should perhaps strive for better alignment of the readahead
> window with the logical file offset, even at the cost of occasionally
> performing somewhat shorter readahead. I'll look into this once I dig out
> of the huge heap of email after vacation...

I was surprised by what you said here, so I went and re-read the code
and it doesn't work the way I thought it did. So I had a good long think
about how it _should_ work, and I looked for some more corner conditions,
and this is what I came up with.

The first thing I've done is separate out the two limits. The EOF is
a hard limit; we will not allocate pages beyond EOF. The ra->size is
a soft limit; we will allocate pages beyond ra->size, but not too far.

The second thing I noticed is that index + ra_size could wrap. So add
a check for that, and set it to ULONG_MAX. index + ra_size - async_size
could also wrap, but this is harmless. We certainly don't want to kick
off any more readahead in this circumstance, so leaving 'mark' outside
the range [index..ULONG_MAX] is just fine.
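
(A quick userspace sketch of that wrap, with made-up values: near the
top of the file offset space, index + ra->size overflows and compares
below index, which is exactly what the new clamp catches.)

#include <stdio.h>

int main(void)
{
	unsigned long index = -10UL;		/* pgoff_t near ULONG_MAX */
	unsigned long ra_size = 32;
	unsigned long limit = index + ra_size;	/* wraps around to 22 */

	if (limit < index)			/* the wrap check from the patch */
		limit = -1UL;			/* clamp to ULONG_MAX */
	printf("limit = 0x%lx\n", limit);
	return 0;
}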

The third thing is that we could allocate a folio which contains a page
at ULONG_MAX. We don't really want that in the page cache; it makes
filesystems more complicated if they have to check for that, and we
don't allow an order-0 folio at ULONG_MAX, so there's no need for it.
This _should_ already be prohibited by the "Don't allocate pages past EOF"
check, but let's explicitly prohibit it.

Compile tested only.

diff --git a/mm/readahead.c b/mm/readahead.c
index 130c0e7df99f..742e1f39035b 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -488,7 +488,8 @@ void page_cache_ra_order(struct readahead_control *ractl,
{
struct address_space *mapping = ractl->mapping;
pgoff_t index = readahead_index(ractl);
- pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
+ pgoff_t last = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
+ pgoff_t limit = index + ra->size;
pgoff_t mark = index + ra->size - ra->async_size;
int err = 0;
gfp_t gfp = readahead_gfp_mask(mapping);
@@ -496,23 +497,26 @@ void page_cache_ra_order(struct readahead_control *ractl,
if (!mapping_large_folio_support(mapping) || ra->size < 4)
goto fallback;

- limit = min(limit, index + ra->size - 1);
-
if (new_order < MAX_PAGECACHE_ORDER) {
new_order += 2;
new_order = min_t(unsigned int, MAX_PAGECACHE_ORDER, new_order);
new_order = min_t(unsigned int, new_order, ilog2(ra->size));
}

+ if (limit < index)
+ limit = ULONG_MAX;
filemap_invalidate_lock_shared(mapping);
- while (index <= limit) {
+ while (index < limit) {
unsigned int order = new_order;

/* Align with smaller pages if needed */
if (index & ((1UL << order) - 1))
order = __ffs(index);
+ /* Avoid wrap */
+ if (index + (1UL << order) == 0)
+ order--;
/* Don't allocate pages past EOF */
- while (index + (1UL << order) - 1 > limit)
+ while (index + (1UL << order) - 1 > last)
order--;
err = ra_alloc_folio(ractl, index, mark, order, gfp);
if (err)

2024-03-08 08:46:17

by Yujie Liu

[permalink] [raw]
Subject: Re: [linus:master] [readahead] ab4443fe3c: vm-scalability.throughput -21.4% regression

On Thu, Mar 07, 2024 at 06:19:46PM +0000, Matthew Wilcox wrote:
> On Thu, Mar 07, 2024 at 10:23:08AM +0100, Jan Kara wrote:
> > Thanks for testing! This is an interesting result and certainly unexpected
> > for me. The readahead code allocates naturally aligned pages, so based on
> > the distribution of allocations it seems that before commit ab4443fe3ca6
> > the readahead window was at least 32-page (128KB) aligned and so we
> > allocated order-5 pages. After the commit, the readahead window somehow
> > ended up aligned only to 20 modulo 32. To follow natural alignment and
> > fill the 128KB readahead window we allocated an order-2 page (getting us
> > to offset 24 modulo 32), then an order-3 page (getting us to offset 0
> > modulo 32), then an order-4 page (a larger one would no longer fit in the
> > 128KB readahead window), and finally an order-2 page to finish filling
> > the readahead window.
> >
> > Now I'm not 100% sure why the readahead window alignment changed with the
> > different rounding when placing the readahead mark - probably that's some
> > artifact of the readahead window being tiny in the beginning, before we
> > scale it up (I'll verify by tracing whether everything ends up looking
> > correct with the current code). So I don't expect this is a problem in
> > ab4443fe3ca6 as such, but it exposes the issue that the readahead page
> > insertion code should perhaps strive for better alignment of the readahead
> > window with the logical file offset, even at the cost of occasionally
> > performing somewhat shorter readahead. I'll look into this once I dig out
> > of the huge heap of email after vacation...
>
> I was surprised by what you said here, so I went and re-read the code
> and it doesn't work the way I thought it did. So I had a good long think
> about how it _should_ work, and I looked for some more corner conditions,
> and this is what I came up with.
>
> The first thing I've done is separate out the two limits. The EOF is
> a hard limit; we will not allocate pages beyond EOF. The ra->size is
> a soft limit; we will allocate pages beyond ra->size, but not too far.
>
> The second thing I noticed is that index + ra_size could wrap. So add
> a check for that, and set it to ULONG_MAX. index + ra_size - async_size
> could also wrap, but this is harmless. We certainly don't want to kick
> off any more readahead in this circumstance, so leaving 'mark' outside
> the range [index..ULONG_MAX] is just fine.
>
> The third thing is that we could allocate a folio which contains a page
> at ULONG_MAX. We don't really want that in the page cache; it makes
> filesystems more complicated if they have to check for that, and we
> don't allow an order-0 folio at ULONG_MAX, so there's no need for it.
> This _should_ already be prohibited by the "Don't allocate pages past EOF"
> check, but let's explicitly prohibit it.
>
> Compile tested only.

We applied the diff on top of commit ab4443fe3ca6 but got a kernel panic
when running the dd test:

[ 109.259674][ C46] watchdog: BUG: soft lockup - CPU#46 stuck for 22s! [ dd:8616]
[ 109.268946][ C46] Modules linked in: xfs loop intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp btrfs blake2b_generic kvm_intel xor kvm irqbypass crct10dif_pclmul crc32_pclmul sd_mod raid6_pq ghash_clmulni_intel libcrc32c crc32c_intel sg sha512_ssse3 i915 nvme rapl drm_buddy nvme_core intel_gtt ahci t10_pi drm_display_helper ast intel_cstate libahci ipmi_ssif ttm drm_shmem_helper mei_me i2c_i801 crc64_rocksoft_generic video crc64_rocksoft acpi_ipmi intel_uncore megaraid_sas mei drm_kms_helper joydev libata i2c_ismt i2c_smbus dax_hmem crc64 wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter drm fuse ip_tables
[ 109.336216][ C46] CPU: 46 PID: 8616 Comm: dd Tainted: G I 6.8.0-rc1-00005-g6c6de6e42e46 #1
[ 109.347892][ C46] Hardware name: NULL NULL/NULL, BIOS 05.02.01 05/12/2023
[ 109.356324][ C46] RIP: 0010:page_cache_ra_order (mm/readahead.c:521)
[ 109.363394][ C46] Code: cf 48 89 e8 4c 89 fa 48 d3 e0 48 01 c2 75 09 83 e9 01 48 89 e8 48 d3 e0 49 8d 77 ff 48 01 f0 49 39 c6 73 11 83 e9 01 48 89 e8 <48> d3 e0 48 01 f0 49 39 c6 72 ef 31 c0 83 f9 01 8b 3c 24 0f 44 c8
All code
========
0: cf iret
1: 48 89 e8 mov %rbp,%rax
4: 4c 89 fa mov %r15,%rdx
7: 48 d3 e0 shl %cl,%rax
a: 48 01 c2 add %rax,%rdx
d: 75 09 jne 0x18
f: 83 e9 01 sub $0x1,%ecx
12: 48 89 e8 mov %rbp,%rax
15: 48 d3 e0 shl %cl,%rax
18: 49 8d 77 ff lea -0x1(%r15),%rsi
1c: 48 01 f0 add %rsi,%rax
1f: 49 39 c6 cmp %rax,%r14
22: 73 11 jae 0x35
24: 83 e9 01 sub $0x1,%ecx
27: 48 89 e8 mov %rbp,%rax
2a:* 48 d3 e0 shl %cl,%rax <-- trapping instruction
2d: 48 01 f0 add %rsi,%rax
30: 49 39 c6 cmp %rax,%r14
33: 72 ef jb 0x24
35: 31 c0 xor %eax,%eax
37: 83 f9 01 cmp $0x1,%ecx
3a: 8b 3c 24 mov (%rsp),%edi
3d: 0f 44 c8 cmove %eax,%ecx

Code starting with the faulting instruction
===========================================
0: 48 d3 e0 shl %cl,%rax
3: 48 01 f0 add %rsi,%rax
6: 49 39 c6 cmp %rax,%r14
9: 72 ef jb 0xfffffffffffffffa
b: 31 c0 xor %eax,%eax
d: 83 f9 01 cmp $0x1,%ecx
10: 8b 3c 24 mov (%rsp),%edi
13: 0f 44 c8 cmove %eax,%ecx
[ 109.385897][ C46] RSP: 0018:ffa0000012837c00 EFLAGS: 00000206
[ 109.393176][ C46] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000020159674
[ 109.402607][ C46] RDX: 000000000004924c RSI: 0000000000049249 RDI: ff11003f6f3ae7c0
[ 109.412038][ C46] RBP: 0000000000000001 R08: 0000000000038700 R09: 0000000000000013
[ 109.421447][ C46] R10: 0000000000022c04 R11: 0000000000000001 R12: ffa0000012837cb0
[ 109.430868][ C46] R13: ffd400004fee4b40 R14: 0000000000049249 R15: 000000000004924a
[ 109.440270][ C46] FS: 00007f777e884640(0000) GS:ff11003f6f380000(0000) knlGS:0000000000000000
[ 109.450756][ C46] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 109.458603][ C46] CR2: 00007f2d4d425020 CR3: 00000001b4f84005 CR4: 0000000000f71ef0
[ 109.468003][ C46] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 109.477392][ C46] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 109.486794][ C46] PKRU: 55555554
[ 109.491197][ C46] Call Trace:
[ 109.495300][ C46] <IRQ>
[ 109.498922][ C46] ? watchdog_timer_fn (kernel/watchdog.c:548)
[ 109.505074][ C46] ? __pfx_watchdog_timer_fn (kernel/watchdog.c:466)
[ 109.511620][ C46] ? __hrtimer_run_queues (kernel/time/hrtimer.c:1688 kernel/time/hrtimer.c:1752)
[ 109.518059][ C46] ? hrtimer_interrupt (kernel/time/hrtimer.c:1817)
[ 109.524088][ C46] ? __sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1065 arch/x86/kernel/apic/apic.c:1082)
[ 109.531286][ C46] ? sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076 (discriminator 14))
[ 109.538190][ C46] </IRQ>
[ 109.541867][ C46] <TASK>
[ 109.545545][ C46] ? asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:649)
[ 109.552832][ C46] ? page_cache_ra_order (mm/readahead.c:521)
[ 109.559122][ C46] filemap_get_pages (mm/filemap.c:2500)
[ 109.564935][ C46] filemap_read (mm/filemap.c:2594)
[ 109.570241][ C46] xfs_file_buffered_read (fs/xfs/xfs_file.c:315) xfs
[ 109.577202][ C46] xfs_file_read_iter (fs/xfs/xfs_file.c:341) xfs
[ 109.583749][ C46] vfs_read (include/linux/fs.h:2079 fs/read_write.c:395 fs/read_write.c:476)
[ 109.588762][ C46] ksys_read (fs/read_write.c:619)
[ 109.593660][ C46] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
[ 109.599038][ C46] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
[ 109.605982][ C46] RIP: 0033:0x7f777e78d3ce
[ 109.611255][ C46] Code: c0 e9 b6 fe ff ff 50 48 8d 3d 6e 08 0b 00 e8 69 01 02 00 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
All code
========
0: c0 e9 b6 shr $0xb6,%cl
3: fe (bad)
4: ff (bad)
5: ff 50 48 call *0x48(%rax)
8: 8d 3d 6e 08 0b 00 lea 0xb086e(%rip),%edi # 0xb087c
e: e8 69 01 02 00 call 0x2017c
13: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
1a: 00 00
1c: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax
23: 00
24: 85 c0 test %eax,%eax
26: 75 14 jne 0x3c
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 5a ja 0x8c
32: c3 ret
33: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
3a: 00 00
3c: 48 83 ec 28 sub $0x28,%rsp

Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 5a ja 0x62
8: c3 ret
9: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
10: 00 00
12: 48 83 ec 28 sub $0x28,%rsp
[ 109.633619][ C46] RSP: 002b:00007ffc78ab2778 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 109.643392][ C46] RAX: ffffffffffffffda RBX: 0000000000001000 RCX: 00007f777e78d3ce
[ 109.652686][ C46] RDX: 0000000000001000 RSI: 00005629c0f7c000 RDI: 0000000000000000
[ 109.661976][ C46] RBP: 00005629c0f7c000 R08: 00005629c0f7bd30 R09: 00007f777e870be0
[ 109.671251][ C46] R10: 00005629c0f7c000 R11: 0000000000000246 R12: 0000000000000000
[ 109.680528][ C46] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffffffffff
[ 109.689808][ C46] </TASK>
[ 109.693512][ C46] Kernel panic - not syncing: softlockup: hung tasks


# mm/readahead.c

486 void page_cache_ra_order(struct readahead_control *ractl,
487 struct file_ra_state *ra, unsigned int new_order)
488 {
489 struct address_space *mapping = ractl->mapping;
490 pgoff_t index = readahead_index(ractl);
491 pgoff_t last = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
492 pgoff_t limit = index + ra->size;
493 pgoff_t mark = index + ra->size - ra->async_size;
494 int err = 0;
495 gfp_t gfp = readahead_gfp_mask(mapping);
496
497 if (!mapping_large_folio_support(mapping) || ra->size < 4)
498 goto fallback;
499
500 if (new_order < MAX_PAGECACHE_ORDER) {
501 new_order += 2;
502 if (new_order > MAX_PAGECACHE_ORDER)
503 new_order = MAX_PAGECACHE_ORDER;
504 while ((1 << new_order) > ra->size)
505 new_order--;
506 }
507
508 if (limit < index)
509 limit = ULONG_MAX;
510
511 filemap_invalidate_lock_shared(mapping);
512 while (index < limit) {
513 unsigned int order = new_order;
514
515 /* Align with smaller pages if needed */
516 if (index & ((1UL << order) - 1))
517 order = __ffs(index);
518 if (index + (1UL << order) == 0)
519 order--;
520 /* Don't allocate pages past EOF */
521 while (index + (1UL << order) - 1 > last)
522 order--;
523 /* THP machinery does not support order-1 */
524 if (order == 1)
525 order = 0;
526 err = ra_alloc_folio(ractl, index, mark, order, gfp);
527 if (err)
528 break;
529 index += 1UL << order;
530 }
531
532 if (index > limit) {
533 ra->size += index - limit - 1;
534 ra->async_size += index - limit - 1;
535 }
536
537 read_pages(ractl);
538 filemap_invalidate_unlock_shared(mapping);
539
540 /*
541 * If there were already pages in the page cache, then we may have
542 * left some gaps. Let the regular readahead code take care of this
543 * situation.
544 */
545 if (!err)
546 return;
547 fallback:
548 do_page_cache_ra(ractl, ra->size, ra->async_size);
549 }


Regards,
Yujie

> diff --git a/mm/readahead.c b/mm/readahead.c
> index 130c0e7df99f..742e1f39035b 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -488,7 +488,8 @@ void page_cache_ra_order(struct readahead_control *ractl,
> {
> struct address_space *mapping = ractl->mapping;
> pgoff_t index = readahead_index(ractl);
> - pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
> + pgoff_t last = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
> + pgoff_t limit = index + ra->size;
> pgoff_t mark = index + ra->size - ra->async_size;
> int err = 0;
> gfp_t gfp = readahead_gfp_mask(mapping);
> @@ -496,23 +497,26 @@ void page_cache_ra_order(struct readahead_control *ractl,
> if (!mapping_large_folio_support(mapping) || ra->size < 4)
> goto fallback;
>
> - limit = min(limit, index + ra->size - 1);
> -
> if (new_order < MAX_PAGECACHE_ORDER) {
> new_order += 2;
> new_order = min_t(unsigned int, MAX_PAGECACHE_ORDER, new_order);
> new_order = min_t(unsigned int, new_order, ilog2(ra->size));
> }
>
> + if (limit < index)
> + limit = ULONG_MAX;
> filemap_invalidate_lock_shared(mapping);
> - while (index <= limit) {
> + while (index < limit) {
> unsigned int order = new_order;
>
> /* Align with smaller pages if needed */
> if (index & ((1UL << order) - 1))
> order = __ffs(index);
> + /* Avoid wrap */
> + if (index + (1UL << order) == 0)
> + order--;
> /* Don't allocate pages past EOF */
> - while (index + (1UL << order) - 1 > limit)
> + while (index + (1UL << order) - 1 > last)
> order--;
> err = ra_alloc_folio(ractl, index, mark, order, gfp);
> if (err)

2024-03-10 06:42:00

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [linus:master] [readahead] ab4443fe3c: vm-scalability.throughput -21.4% regression

Hi Matthew,

On 3/8/2024 2:19 AM, Matthew Wilcox wrote:
> /* Align with smaller pages if needed */
> if (index & ((1UL << order) - 1))
> order = __ffs(index);
> + /* Avoid wrap */
> + if (index + (1UL << order) == 0)
> + order--;
> /* Don't allocate pages past EOF */
> - while (index + (1UL << order) - 1 > limit)
> + while (index + (1UL << order) - 1 > last)
The lockup is related to this line. When index == (last + 1), we get a
dead loop here (see the sketch below).
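
(A userspace sketch with made-up values of why no progress is possible
there: at index == last + 1 even order 0 still satisfies the condition,
so the unsigned order would wrap below zero and the shift amount becomes
undefined. The kernel loop has no guard like the one below.)

#include <stdio.h>

int main(void)
{
	unsigned long last = 100;	/* last page before EOF */
	unsigned long index = 101;	/* first page past EOF */
	unsigned int order = 2;

	while (index + (1UL << order) - 1 > last) {
		printf("order %u still reaches past EOF\n", order);
		if (order == 0)
			break;		/* the kernel code lacks this guard */
		order--;
	}
	return 0;
}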


Regards
Yin, Fengwei

> order--;

2024-03-10 06:42:09

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [linus:master] [readahead] ab4443fe3c: vm-scalability.throughput -21.4% regression

On 3/7/2024 5:23 PM, Jan Kara wrote:
> Thanks for testing! This is an interesting result and certainly unexpected
> for me. The readahead code allocates naturally aligned pages, so based on
> the distribution of allocations it seems that before commit ab4443fe3ca6
> the readahead window was at least 32-page (128KB) aligned and so we
> allocated order-5 pages. After the commit, the readahead window somehow
> ended up aligned only to 20 modulo 32. To follow natural alignment and
> fill the 128KB readahead window we allocated an order-2 page (getting us
> to offset 24 modulo 32), then an order-3 page (getting us to offset 0
> modulo 32), then an order-4 page (a larger one would no longer fit in the
> 128KB readahead window), and finally an order-2 page to finish filling
> the readahead window.
>
> Now I'm not 100% sure why the readahead window alignment changed with the
> different rounding when placing the readahead mark - probably that's some
> artifact of the readahead window being tiny in the beginning, before we
> scale it up (I'll verify by tracing whether everything ends up looking
> correct with the current code). So I don't expect this is a problem in
> ab4443fe3ca6 as such, but it exposes the issue that the readahead page
> insertion code should perhaps strive for better alignment of the readahead
> window with the logical file offset, even at the cost of occasionally
> performing somewhat shorter readahead. I'll look into this once I dig out
> of the huge heap of email after vacation...
Hi Jan,
I was also curious about this behavior and tried adding logs to
understand it. Here is what differs with and without ab4443fe3ca6:
- with ab4443fe3ca6:
You are right about the folio orders, as the readahead window is 0x20.
The folio order sequence is like order 2, order 4, order 3, order 2.

But the difference is that the first order-2 folio is always the one
marked readahead, so the max order only gets boosted to 4 in
page_cache_ra_order(). The code path always hits
if (index == expected || index == (ra->start + ra->size))
in ondemand_readahead().

If we just change the round_down() to round_up() in ra_alloc_folio()
(see the arithmetic sketch after this section), the dominant folio
order is restored to 5.

- without ab4443fe3ca6:
At the beginning, the folio order sequence is the same: 2, 4, 3, 2.
But besides the first order-2 folio, the order-4 folio gets marked as
readahead too, so the order can be boosted to 5. Also, not just the path
if (index == expected || index == (ra->start + ra->size))
is hit, but also
if (folio) {
can be hit (I didn't check other paths as this testing is a sequential
read).

After some back and forth between 5 and 2, 4, 3, 2, the order
stabilizes on 5.

I don't fully understand the whole thing yet and will dig deeper. The
above is just what the logs showed.
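
Here is a small arithmetic sketch of the round_down()/round_up()
difference mentioned above, with an assumed unaligned mark of 22 and an
order-2 (4-page) folio starting at index 20:

#include <stdio.h>

#define round_down(x, y)	((x) & ~((y) - 1))	/* y: power of 2 */
#define round_up(x, y)		round_down((x) + (y) - 1, (y))

int main(void)
{
	unsigned long mark = 22, nr = 1UL << 2;	/* order-2 folio: 4 pages */

	/* round_down() matches the folio starting at index 20 ... */
	printf("round_down(mark, nr) = %lu\n", round_down(mark, nr));
	/* ... while round_up() would match the next folio, at index 24 */
	printf("round_up(mark, nr)   = %lu\n", round_up(mark, nr));
	return 0;
}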


Hi Matthew,
I noticed one thing: while the readahead folio order is being pushed
up, readahead tries several times to allocate and add folios to the
page cache but fails, because a folio covering the requested index has
already been inserted. Once the folio order is correct, this no longer
happens. I suppose this is expected.


Regards
Yin, Fengwei