2023-09-13 05:04:51

by Oliver Sang

Subject: Re: [RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic



Hello,

kernel test robot noticed a -11.9% improvement of autonuma-benchmark.numa01_THREAD_ALLOC.seconds on:


commit: 1ef5cbb92bdb320c5eb9fdee1a811d22ee9e19fe ("[RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic")
url: https://github.com/intel-lab-lkp/linux/commits/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 2f88c8e802c8b128a155976631f4eb2ce4f3c805
patch link: https://lore.kernel.org/all/87e3c08bd1770dd3e6eee099c01e595f14c76fc3.1693287931.git.raghavendra.kt@amd.com/
patch subject: [RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic

testcase: autonuma-benchmark
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

iterations: 4x
test: numa01_THREAD_ALLOC
cpufreq_governor: performance


hi, Raghu,

the reason there is a separate report for this commit besides
https://lore.kernel.org/all/[email protected]/
lies in the nature of bisection: one auto-bisect can so far only capture
one commit for a performance change.

this auto-bisect ran on another test machine (Sapphire Rapids), and it
happened to choose autonuma-benchmark.numa01_THREAD_ALLOC.seconds as the indicator
for the bisect; it finally captured
"[RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic"

and from
https://lore.kernel.org/all/[email protected]/
I noticed you care more about the performance impact of the whole patch set,
so let me give a summary table below.

first, let me show again how we applied your patches:

68cfe9439a1ba (linux-review/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007) sched/numa: Allow scanning of shared VMAs
af46f3c9ca2d1 sched/numa: Allow recently accessed VMAs to be scanned
167773d1ddb5f sched/numa: Increase tasks' access history
fc769221b2306 sched/numa: Remove unconditional scan logic using mm numa_scan_seq
1ef5cbb92bdb3 sched/numa: Add disjoint vma unconditional scan logic
2a806eab1c2e1 sched/numa: Move up the access pid reset logic
2f88c8e802c8b (tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well


we have the below data on this test machine
(the full table would be very big; if you want it, please let me know):

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-spr-r02/numa01_THREAD_ALLOC/autonuma-benchmark

commit:
2f88c8e802 ("(tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well")
2a806eab1c ("sched/numa: Move up the access pid reset logic")
1ef5cbb92b ("sched/numa: Add disjoint vma unconditional scan logic")
68cfe9439a ("sched/numa: Allow scanning of shared VMAs")


2f88c8e802c8b128 2a806eab1c2e1c9f0ae39dc0307 1ef5cbb92bdb320c5eb9fdee1a8 68cfe9439a1baa642e05883fa64
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
271.01 +0.8% 273.24 -0.7% 269.00 -26.4% 199.49 ± 3% autonuma-benchmark.numa01.seconds
76.28 +0.2% 76.44 -11.7% 67.36 ± 6% -46.9% 40.49 ± 5% autonuma-benchmark.numa01_THREAD_ALLOC.seconds
8.11 -0.9% 8.04 -0.7% 8.05 -0.1% 8.10 autonuma-benchmark.numa02.seconds
1425 +0.7% 1434 -3.1% 1381 -30.1% 996.02 ± 2% autonuma-benchmark.time.elapsed_time
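
a quick note on reading the table: each %change column is the relative
delta against the base kernel in the first column, while the -11.9% in
the subject line compares 1ef5cbb92b against its direct parent
2a806eab1c (as in the two-column table further below). an illustrative
check in plain C (just the arithmetic, not part of our tooling):

#include <stdio.h>

/* illustrative only: reproduce the -11.9% headline from the runtimes of
 * the parent commit (2a806eab1c) and the bisected commit (1ef5cbb92b) */
int main(void)
{
	double parent  = 76.44; /* autonuma-benchmark.numa01_THREAD_ALLOC.seconds */
	double patched = 67.36;

	printf("%%change = %+.1f%%\n", (patched - parent) / parent * 100.0);
	return 0;
}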


this differs somewhat from our previous report on Ice Lake: here
autonuma-benchmark.numa02.seconds seems to stay stable,
while autonuma-benchmark.numa01.seconds shows more changes.

anyway, on both platforms we consistently see performance improvement
in this test across the patch set.


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230912/[email protected]


below are the normal data we share in our performance reports. FYI.
(you won't see data for autonuma-benchmark.numa01.seconds or autonuma-benchmark.numa02.seconds,
since the deltas between 2a806eab1c and 1ef5cbb92b are small, so our tool won't
show them)
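
(the cut-off sketched below is only an assumed illustration of that
filtering, not the exact lkp rule: a row is kept when its relative
change clearly exceeds the run-to-run noise)

#include <math.h>
#include <stdio.h>

/* assumed row filter, for illustration only: drop comparison rows whose
 * relative change does not exceed the observed run-to-run noise */
static int row_shown(double change_pct, double noise_pct)
{
	return fabs(change_pct) > noise_pct;
}

int main(void)
{
	/* numa02.seconds moved only ~0.1% between the two kernels; the
	 * 1.0% noise figure here is an assumption, not measured data */
	printf("tiny delta shown: %d\n", row_shown(0.1, 1.0));
	printf("headline shown: %d\n", row_shown(-11.9, 6.0));
	return 0;
}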

=========================================================================================
compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-spr-r02/numa01_THREAD_ALLOC/autonuma-benchmark

commit:
2a806eab1c ("sched/numa: Move up the access pid reset logic")
1ef5cbb92b ("sched/numa: Add disjoint vma unconditional scan logic")

2a806eab1c2e1c9f 1ef5cbb92bdb320c5eb9fdee1a8
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.00 ± 79% +0.0 0.00 ± 13% mpstat.cpu.all.iowait%
357.33 ± 12% +90.4% 680.50 ± 30% perf-c2c.DRAM.remote
79.17 ± 14% +34.7% 106.67 ± 18% perf-c2c.HITM.remote
16378 ± 16% +53.9% 25200 ± 22% turbostat.POLL
50.24 +15.4% 57.99 turbostat.RAMWatt
37.04 ±199% -97.2% 1.05 ±141% perf-sched.wait_time.avg.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
7.46 ± 23% -43.7% 4.20 ± 47% perf-sched.wait_time.avg.ms.__x64_sys_pause.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
170.20 ±218% -99.4% 1.05 ±141% perf-sched.wait_time.max.ms.__cond_resched.exit_mmap.__mmput.exit_mm.do_exit
283.88 ± 28% +49.3% 423.88 ± 16% perf-sched.wait_time.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
189.72 ± 23% +50.9% 286.24 ± 25% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
76.44 -11.9% 67.36 ± 6% autonuma-benchmark.numa01_THREAD_ALLOC.seconds
1434 -3.7% 1381 autonuma-benchmark.time.elapsed_time
1434 -3.7% 1381 autonuma-benchmark.time.elapsed_time.max
1132634 -6.0% 1064224 ± 2% autonuma-benchmark.time.involuntary_context_switches
2532130 ± 2% +4.5% 2645367 ± 2% autonuma-benchmark.time.minor_page_faults
293184 -3.6% 282626 autonuma-benchmark.time.user_time
16101 +41.9% 22846 ± 4% autonuma-benchmark.time.voluntary_context_switches
6.41 ± 52% +3833.7% 251.97 ± 6% sched_debug.cfs_rq:/.util_est_enqueued.avg
401.88 ± 4% +179.2% 1121 ± 3% sched_debug.cfs_rq:/.util_est_enqueued.max
39.18 ± 16% +698.0% 312.66 ± 3% sched_debug.cfs_rq:/.util_est_enqueued.stddev
1662842 +10.5% 1838160 ± 2% sched_debug.cpu.avg_idle.avg
860266 ± 3% -22.4% 667568 ± 11% sched_debug.cpu.avg_idle.min
647306 ± 4% +13.6% 735595 ± 2% sched_debug.cpu.avg_idle.stddev
664890 +10.4% 733919 ± 2% sched_debug.cpu.max_idle_balance_cost.avg
203832 ± 4% +45.7% 296934 ± 4% sched_debug.cpu.max_idle_balance_cost.stddev
58841 ± 19% +205.6% 179845 ± 8% proc-vmstat.numa_hint_faults
47138 ± 20% +145.1% 115557 ± 8% proc-vmstat.numa_hint_faults_local
652.00 ± 27% +5217.2% 34668 ± 10% proc-vmstat.numa_huge_pte_updates
108295 ± 25% +3179.6% 3551657 ± 11% proc-vmstat.numa_pages_migrated
499336 ± 16% +3503.7% 17994636 ± 10% proc-vmstat.numa_pte_updates
108295 ± 25% +3179.6% 3551657 ± 11% proc-vmstat.pgmigrate_success
238140 +6.7% 254200 proc-vmstat.pgreuse
191.00 ± 29% +3488.8% 6854 ± 11% proc-vmstat.thp_migration_success
4331500 -4.5% 4135400 ± 2% proc-vmstat.unevictable_pgs_scanned
0.66 +0.0 0.67 perf-stat.i.branch-miss-rate%
1779997 +3.1% 1835782 perf-stat.i.branch-misses
2096 +1.6% 2128 perf-stat.i.context-switches
219.07 +2.3% 224.02 perf-stat.i.cpu-migrations
163199 -11.6% 144321 ± 2% perf-stat.i.cycles-between-cache-misses
986545 +1.0% 996780 perf-stat.i.dTLB-store-misses
4436 +4.1% 4616 perf-stat.i.minor-faults
42.56 ± 3% +3.4 45.95 perf-stat.i.node-load-miss-rate%
396254 +28.2% 507952 ± 3% perf-stat.i.node-load-misses
4436 +4.1% 4617 perf-stat.i.page-faults
38.37 ± 6% +6.3 44.69 ± 7% perf-stat.overall.node-load-miss-rate%
1734727 +2.3% 1774826 perf-stat.ps.branch-misses
216.66 +2.2% 221.40 perf-stat.ps.cpu-migrations
983143 +1.1% 993856 perf-stat.ps.dTLB-store-misses
4178 +4.3% 4357 perf-stat.ps.minor-faults
384816 +29.9% 499993 ± 4% perf-stat.ps.node-load-misses
4178 +4.3% 4357 perf-stat.ps.page-faults
47.25 ± 24% -32.1 15.11 ±142% perf-profile.calltrace.cycles-pp.reader__read_event.perf_session__process_events.record__finish_output.__cmd_record
40.98 ± 34% -27.0 13.98 ±141% perf-profile.calltrace.cycles-pp.ordered_events__queue.process_simple.reader__read_event.perf_session__process_events.record__finish_output
40.76 ± 34% -26.9 13.90 ±141% perf-profile.calltrace.cycles-pp.queue_event.ordered_events__queue.process_simple.reader__read_event.perf_session__process_events
40.90 ± 36% -26.6 14.32 ±141% perf-profile.calltrace.cycles-pp.process_simple.reader__read_event.perf_session__process_events.record__finish_output.__cmd_record
6.07 ±101% -5.4 0.62 ±223% perf-profile.calltrace.cycles-pp.__ordered_events__flush.perf_session__process_user_event.reader__read_event.perf_session__process_events.record__finish_output
5.76 ±110% -5.1 0.62 ±223% perf-profile.calltrace.cycles-pp.perf_session__process_user_event.reader__read_event.perf_session__process_events.record__finish_output.__cmd_record
5.42 ±101% -4.9 0.48 ±223% perf-profile.calltrace.cycles-pp.perf_session__deliver_event.__ordered_events__flush.perf_session__process_user_event.reader__read_event.perf_session__process_events
0.58 ± 18% +0.4 0.94 ± 18% perf-profile.calltrace.cycles-pp.rebalance_domains.__do_softirq.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.49 ± 49% +0.4 0.94 ± 17% perf-profile.calltrace.cycles-pp.load_balance.rebalance_domains.__do_softirq.__irq_exit_rcu.sysvec_apic_timer_interrupt
0.70 ± 25% +0.5 1.21 ± 22% perf-profile.calltrace.cycles-pp.__do_softirq.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.71 ± 24% +0.5 1.22 ± 22% perf-profile.calltrace.cycles-pp.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.20 ±142% +0.5 0.74 ± 18% perf-profile.calltrace.cycles-pp.sched_setaffinity.__x64_sys_sched_setaffinity.do_syscall_64.entry_SYSCALL_64_after_hwframe.sched_setaffinity
0.64 ± 53% +0.5 1.18 ± 32% perf-profile.calltrace.cycles-pp.task_work_run.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
0.18 ±141% +0.6 0.74 ± 19% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read.readn.perf_evsel__read.read_counters
0.18 ±141% +0.6 0.74 ± 19% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read.readn.perf_evsel__read
0.18 ±141% +0.6 0.75 ± 19% perf-profile.calltrace.cycles-pp.__libc_read.readn.perf_evsel__read.read_counters.process_interval
0.18 ±141% +0.6 0.76 ± 19% perf-profile.calltrace.cycles-pp.readn.perf_evsel__read.read_counters.process_interval.dispatch_events
0.31 ±103% +0.6 0.89 ± 18% perf-profile.calltrace.cycles-pp.update_sd_lb_stats.find_busiest_group.load_balance.rebalance_domains.__do_softirq
0.10 ±223% +0.6 0.69 ± 18% perf-profile.calltrace.cycles-pp.__sched_setaffinity.sched_setaffinity.__x64_sys_sched_setaffinity.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.71 ± 23% +0.6 1.30 ± 26% perf-profile.calltrace.cycles-pp.seq_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.31 ±103% +0.6 0.90 ± 18% perf-profile.calltrace.cycles-pp.find_busiest_group.load_balance.rebalance_domains.__do_softirq.__irq_exit_rcu
0.22 ±142% +0.6 0.81 ± 17% perf-profile.calltrace.cycles-pp.__x64_sys_sched_setaffinity.do_syscall_64.entry_SYSCALL_64_after_hwframe.sched_setaffinity.evlist_cpu_iterator__next
0.57 ± 60% +0.6 1.19 ± 16% perf-profile.calltrace.cycles-pp.__do_sys_newstat.do_syscall_64.entry_SYSCALL_64_after_hwframe.__xstat64
0.58 ± 60% +0.6 1.21 ± 16% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__xstat64
0.58 ± 60% +0.6 1.21 ± 16% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__xstat64
0.22 ±143% +0.6 0.86 ± 18% perf-profile.calltrace.cycles-pp.update_sg_lb_stats.update_sd_lb_stats.find_busiest_group.load_balance.rebalance_domains
0.58 ± 61% +0.6 1.23 ± 16% perf-profile.calltrace.cycles-pp.__xstat64
0.25 ±150% +0.6 0.90 ± 19% perf-profile.calltrace.cycles-pp.do_dentry_open.do_open.path_openat.do_filp_open.do_sys_openat2
0.24 ±142% +0.7 0.90 ± 17% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.sched_setaffinity.evlist_cpu_iterator__next.read_counters
0.24 ±142% +0.7 0.90 ± 18% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.sched_setaffinity.evlist_cpu_iterator__next.read_counters.process_interval
0.21 ±141% +0.7 0.89 ± 19% perf-profile.calltrace.cycles-pp.perf_evsel__read.read_counters.process_interval.dispatch_events.cmd_stat
0.37 ±108% +0.7 1.07 ± 17% perf-profile.calltrace.cycles-pp.evlist__id2evsel.evsel__read_counter.read_counters.process_interval.dispatch_events
0.64 ± 57% +0.7 1.33 ± 20% perf-profile.calltrace.cycles-pp.evlist_cpu_iterator__next.read_counters.process_interval.dispatch_events.cmd_stat
0.10 ±223% +0.7 0.81 ± 27% perf-profile.calltrace.cycles-pp.show_stat.seq_read_iter.vfs_read.ksys_read.do_syscall_64
0.26 ±142% +0.7 1.01 ± 19% perf-profile.calltrace.cycles-pp.sched_setaffinity.evlist_cpu_iterator__next.read_counters.process_interval.dispatch_events
0.51 ± 84% +0.7 1.25 ± 28% perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
0.09 ±223% +0.8 0.85 ± 27% perf-profile.calltrace.cycles-pp.vmstat_start.seq_read_iter.proc_reg_read_iter.vfs_read.ksys_read
0.53 ± 53% +0.8 1.30 ± 25% perf-profile.calltrace.cycles-pp.seq_read_iter.proc_reg_read_iter.vfs_read.ksys_read.do_syscall_64
0.53 ± 53% +0.8 1.30 ± 25% perf-profile.calltrace.cycles-pp.proc_reg_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.85 ± 20% +0.8 1.64 ± 26% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.30 ±103% +0.8 1.12 ± 30% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.20 ±143% +0.8 1.03 ± 30% perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.89 ± 23% +0.8 1.72 ± 26% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
0.66 ± 70% +0.8 1.48 ± 38% perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.27 ±155% +0.8 1.12 ± 33% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read.readn
0.32 ±150% +0.9 1.18 ± 40% perf-profile.calltrace.cycles-pp.dup_mm.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
0.94 ± 23% +0.9 1.83 ± 26% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
0.94 ± 23% +0.9 1.83 ± 26% perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
0.15 ±223% +1.0 1.10 ± 44% perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap
0.15 ±223% +1.0 1.12 ± 43% perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.__mmput
0.15 ±223% +1.0 1.13 ± 43% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.__mmput.exit_mm
1.00 ± 51% +1.0 1.99 ± 36% perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_fork
1.00 ± 51% +1.0 1.98 ± 36% perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_fork
1.01 ± 51% +1.0 1.99 ± 36% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_fork
1.01 ± 51% +1.0 1.99 ± 36% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_fork
1.06 ± 42% +1.0 2.05 ± 17% perf-profile.calltrace.cycles-pp.evsel__read_counter.read_counters.process_interval.dispatch_events.cmd_stat
0.17 ±223% +1.0 1.22 ± 41% perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.__mmput.exit_mm.do_exit
1.07 ± 54% +1.0 2.12 ± 36% perf-profile.calltrace.cycles-pp.__libc_fork
0.55 ± 75% +1.2 1.74 ± 36% perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
0.86 ± 59% +1.2 2.10 ± 33% perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exit_mm.do_exit.do_group_exit
0.87 ± 59% +1.2 2.11 ± 33% perf-profile.calltrace.cycles-pp.__mmput.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group
0.95 ± 59% +1.3 2.29 ± 33% perf-profile.calltrace.cycles-pp.exit_mm.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
1.74 ± 46% +1.4 3.17 ± 25% perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
1.74 ± 46% +1.4 3.17 ± 25% perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
1.19 ± 60% +1.6 2.78 ± 31% perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.19 ± 61% +1.6 2.78 ± 31% perf-profile.calltrace.cycles-pp.__x64_sys_exit_group.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.19 ± 61% +1.6 2.78 ± 31% perf-profile.calltrace.cycles-pp.do_group_exit.__x64_sys_exit_group.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.82 ± 24% +1.6 3.46 ± 23% perf-profile.calltrace.cycles-pp.ret_from_fork_asm
1.82 ± 24% +1.6 3.46 ± 23% perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
1.82 ± 24% +1.6 3.46 ± 23% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
2.15 ± 21% +2.0 4.20 ± 24% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt
1.48 ± 80% +3.4 4.89 ± 18% perf-profile.calltrace.cycles-pp.read_counters.process_interval.dispatch_events.cmd_stat
1.54 ± 79% +3.5 5.03 ± 18% perf-profile.calltrace.cycles-pp.dispatch_events.cmd_stat
1.54 ± 79% +3.5 5.03 ± 18% perf-profile.calltrace.cycles-pp.process_interval.dispatch_events.cmd_stat
1.54 ± 79% +3.5 5.04 ± 18% perf-profile.calltrace.cycles-pp.cmd_stat
0.13 ±223% +3.5 3.67 ± 62% perf-profile.calltrace.cycles-pp.copy_page.folio_copy.migrate_folio_extra.move_to_new_folio.migrate_pages_batch
0.14 ±223% +3.6 3.73 ± 62% perf-profile.calltrace.cycles-pp.folio_copy.migrate_folio_extra.move_to_new_folio.migrate_pages_batch.migrate_pages
0.14 ±223% +3.6 3.73 ± 62% perf-profile.calltrace.cycles-pp.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_huge_pmd_numa_page
0.14 ±223% +3.6 3.73 ± 62% perf-profile.calltrace.cycles-pp.migrate_folio_extra.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced_page
0.14 ±223% +3.9 4.00 ± 62% perf-profile.calltrace.cycles-pp.migrate_pages.migrate_misplaced_page.do_huge_pmd_numa_page.__handle_mm_fault.handle_mm_fault
0.14 ±223% +3.9 4.00 ± 62% perf-profile.calltrace.cycles-pp.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_huge_pmd_numa_page.__handle_mm_fault
0.14 ±223% +3.9 4.00 ± 62% perf-profile.calltrace.cycles-pp.migrate_misplaced_page.do_huge_pmd_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
3.90 ± 41% +3.9 7.77 ± 27% perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
3.97 ± 41% +3.9 7.84 ± 27% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
0.14 ±223% +3.9 4.06 ± 61% perf-profile.calltrace.cycles-pp.do_huge_pmd_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
4.13 ± 41% +4.0 8.15 ± 27% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
4.13 ± 41% +4.0 8.17 ± 27% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
4.18 ± 41% +4.1 8.26 ± 27% perf-profile.calltrace.cycles-pp.read
1.80 ± 50% +5.5 7.29 ± 43% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
2.02 ± 50% +5.6 7.64 ± 41% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
2.04 ± 50% +5.6 7.66 ± 41% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
2.36 ± 33% +5.6 7.99 ± 33% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
2.14 ± 50% +5.7 7.84 ± 40% perf-profile.calltrace.cycles-pp.asm_exc_page_fault
69.69 ± 16% -30.0 39.64 ± 40% perf-profile.children.cycles-pp.__cmd_record
6.08 ±101% -5.5 0.62 ±223% perf-profile.children.cycles-pp.perf_session__process_user_event
6.15 ±100% -5.4 0.72 ±190% perf-profile.children.cycles-pp.__ordered_events__flush
5.48 ±101% -4.9 0.56 ±188% perf-profile.children.cycles-pp.perf_session__deliver_event
0.06 ± 29% +0.0 0.11 ± 27% perf-profile.children.cycles-pp.path_init
0.02 ±141% +0.0 0.06 ± 33% perf-profile.children.cycles-pp.cp_new_stat
0.02 ±141% +0.1 0.07 ± 25% perf-profile.children.cycles-pp.ptep_clear_flush
0.02 ±146% +0.1 0.08 ± 34% perf-profile.children.cycles-pp.rcu_nocb_try_bypass
0.08 ± 24% +0.1 0.14 ± 32% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.02 ±141% +0.1 0.08 ± 25% perf-profile.children.cycles-pp.__legitimize_mnt
0.00 +0.1 0.06 ± 16% perf-profile.children.cycles-pp.vm_memory_committed
0.11 ± 26% +0.1 0.17 ± 19% perf-profile.children.cycles-pp.aa_file_perm
0.06 ± 50% +0.1 0.12 ± 38% perf-profile.children.cycles-pp.kcpustat_cpu_fetch
0.02 ±141% +0.1 0.08 ± 40% perf-profile.children.cycles-pp.set_next_entity
0.09 ± 39% +0.1 0.16 ± 28% perf-profile.children.cycles-pp.try_charge_memcg
0.02 ±143% +0.1 0.09 ± 38% perf-profile.children.cycles-pp.__evlist__disable
0.01 ±223% +0.1 0.08 ± 35% perf-profile.children.cycles-pp._IO_setvbuf
0.08 ± 36% +0.1 0.16 ± 29% perf-profile.children.cycles-pp.switch_mm_irqs_off
0.02 ±223% +0.1 0.09 ± 27% perf-profile.children.cycles-pp.drm_gem_vunmap_unlocked
0.12 ± 23% +0.1 0.20 ± 35% perf-profile.children.cycles-pp.get_idle_time
0.01 ±223% +0.1 0.08 ± 19% perf-profile.children.cycles-pp.meminfo_proc_show
0.10 ± 14% +0.1 0.18 ± 33% perf-profile.children.cycles-pp.drm_atomic_helper_commit
0.12 ± 17% +0.1 0.20 ± 32% perf-profile.children.cycles-pp.xas_descend
0.05 ± 77% +0.1 0.13 ± 27% perf-profile.children.cycles-pp.fsnotify_perm
0.02 ±223% +0.1 0.10 ± 42% perf-profile.children.cycles-pp.vm_unmapped_area
0.11 ± 13% +0.1 0.19 ± 33% perf-profile.children.cycles-pp.drm_atomic_commit
0.02 ±143% +0.1 0.11 ± 18% perf-profile.children.cycles-pp.__kmalloc
0.04 ±118% +0.1 0.13 ± 43% perf-profile.children.cycles-pp.xas_find
0.00 +0.1 0.08 ± 30% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
0.08 ± 33% +0.1 0.17 ± 37% perf-profile.children.cycles-pp.node_read_vmstat
0.09 ± 45% +0.1 0.18 ± 29% perf-profile.children.cycles-pp.select_task_rq
0.01 ±223% +0.1 0.10 ± 36% perf-profile.children.cycles-pp.slab_show
0.03 ±143% +0.1 0.12 ± 46% perf-profile.children.cycles-pp.acpi_ps_parse_loop
0.12 ± 36% +0.1 0.21 ± 33% perf-profile.children.cycles-pp.dequeue_entity
0.01 ±223% +0.1 0.10 ± 27% perf-profile.children.cycles-pp._IO_file_doallocate
0.08 ± 53% +0.1 0.17 ± 22% perf-profile.children.cycles-pp.apparmor_ptrace_access_check
0.04 ±105% +0.1 0.13 ± 48% perf-profile.children.cycles-pp.acpi_ps_parse_aml
0.08 ± 32% +0.1 0.18 ± 34% perf-profile.children.cycles-pp.autoremove_wake_function
0.12 ± 26% +0.1 0.22 ± 31% perf-profile.children.cycles-pp.__x64_sys_close
0.11 ± 16% +0.1 0.21 ± 36% perf-profile.children.cycles-pp.drm_atomic_helper_dirtyfb
0.04 ±107% +0.1 0.13 ± 47% perf-profile.children.cycles-pp.acpi_ns_evaluate
0.04 ±107% +0.1 0.13 ± 47% perf-profile.children.cycles-pp.acpi_ps_execute_method
0.02 ±146% +0.1 0.12 ± 32% perf-profile.children.cycles-pp.thread_group_cputime
0.12 ± 35% +0.1 0.22 ± 21% perf-profile.children.cycles-pp.atime_needs_update
0.09 ± 44% +0.1 0.18 ± 23% perf-profile.children.cycles-pp.update_rq_clock_task
0.11 ± 32% +0.1 0.21 ± 25% perf-profile.children.cycles-pp.__perf_event_read_value
0.04 ±107% +0.1 0.14 ± 45% perf-profile.children.cycles-pp.acpi_os_execute_deferred
0.04 ±107% +0.1 0.14 ± 45% perf-profile.children.cycles-pp.acpi_ev_asynch_execute_gpe_method
0.04 ±112% +0.1 0.14 ± 38% perf-profile.children.cycles-pp.get_unmapped_area
0.06 ± 58% +0.1 0.16 ± 38% perf-profile.children.cycles-pp.prepare_task_switch
0.13 ± 34% +0.1 0.23 ± 19% perf-profile.children.cycles-pp.generic_exec_single
0.10 ± 30% +0.1 0.20 ± 30% perf-profile.children.cycles-pp.__wait_for_common
0.03 ±105% +0.1 0.13 ± 29% perf-profile.children.cycles-pp.thread_group_cputime_adjusted
0.13 ± 32% +0.1 0.24 ± 19% perf-profile.children.cycles-pp.smp_call_function_single
0.12 ± 40% +0.1 0.23 ± 26% perf-profile.children.cycles-pp.ttwu_do_activate
0.06 ± 58% +0.1 0.17 ± 32% perf-profile.children.cycles-pp.kstat_irqs_usr
0.10 ± 31% +0.1 0.22 ± 33% perf-profile.children.cycles-pp.__wake_up_common_lock
0.02 ±223% +0.1 0.13 ± 38% perf-profile.children.cycles-pp.free_unref_page_prepare
0.11 ± 48% +0.1 0.22 ± 37% perf-profile.children.cycles-pp.single_release
0.15 ± 33% +0.1 0.26 ± 17% perf-profile.children.cycles-pp.perf_event_read
0.08 ± 48% +0.1 0.20 ± 27% perf-profile.children.cycles-pp.__do_set_cpus_allowed
0.10 ± 70% +0.1 0.21 ± 33% perf-profile.children.cycles-pp.vm_area_dup
0.09 ± 31% +0.1 0.21 ± 35% perf-profile.children.cycles-pp.__wake_up_common
0.20 ± 37% +0.1 0.32 ± 21% perf-profile.children.cycles-pp.update_load_avg
0.12 ± 35% +0.1 0.24 ± 28% perf-profile.children.cycles-pp.blk_mq_queue_tag_busy_iter
0.12 ± 35% +0.1 0.24 ± 28% perf-profile.children.cycles-pp.blk_mq_in_flight
0.20 ± 28% +0.1 0.32 ± 16% perf-profile.children.cycles-pp.__cond_resched
0.17 ± 37% +0.1 0.30 ± 26% perf-profile.children.cycles-pp.dequeue_task_fair
0.08 ± 51% +0.1 0.21 ± 36% perf-profile.children.cycles-pp.free_swap_cache
0.02 ±146% +0.1 0.16 ± 38% perf-profile.children.cycles-pp.flush_tlb_func
0.09 ± 48% +0.1 0.23 ± 37% perf-profile.children.cycles-pp.free_pages_and_swap_cache
0.18 ± 37% +0.1 0.32 ± 26% perf-profile.children.cycles-pp.update_curr
0.02 ±142% +0.1 0.16 ± 26% perf-profile.children.cycles-pp.__x64_sys_newfstat
0.04 ±109% +0.1 0.18 ± 53% perf-profile.children.cycles-pp.free_unref_page_list
0.12 ± 38% +0.1 0.26 ± 30% perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
0.12 ± 59% +0.1 0.27 ± 26% perf-profile.children.cycles-pp.security_ptrace_access_check
0.13 ± 40% +0.1 0.28 ± 33% perf-profile.children.cycles-pp.user_path_at_empty
0.12 ± 31% +0.1 0.27 ± 23% perf-profile.children.cycles-pp.__set_cpus_allowed_ptr_locked
0.20 ± 31% +0.1 0.35 ± 34% perf-profile.children.cycles-pp.dev_attr_show
0.13 ± 44% +0.2 0.28 ± 31% perf-profile.children.cycles-pp.readlink
0.20 ± 30% +0.2 0.35 ± 26% perf-profile.children.cycles-pp.__memcpy
0.00 +0.2 0.15 ± 64% perf-profile.children.cycles-pp.pmdp_invalidate
0.18 ± 31% +0.2 0.34 ± 27% perf-profile.children.cycles-pp.dup_task_struct
0.13 ± 33% +0.2 0.29 ± 29% perf-profile.children.cycles-pp.switch_fpu_return
0.00 +0.2 0.16 ± 64% perf-profile.children.cycles-pp.set_pmd_migration_entry
0.19 ± 26% +0.2 0.34 ± 27% perf-profile.children.cycles-pp.__entry_text_start
0.12 ± 32% +0.2 0.29 ± 38% perf-profile.children.cycles-pp.pipe_write
0.25 ± 35% +0.2 0.42 ± 32% perf-profile.children.cycles-pp.__check_object_size
0.23 ± 35% +0.2 0.40 ± 20% perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
0.22 ± 46% +0.2 0.39 ± 21% perf-profile.children.cycles-pp.enqueue_task_fair
0.00 +0.2 0.18 ± 92% perf-profile.children.cycles-pp.cpuidle_enter
0.00 +0.2 0.18 ± 92% perf-profile.children.cycles-pp.cpuidle_enter_state
0.00 +0.2 0.18 ± 59% perf-profile.children.cycles-pp.try_to_migrate
0.00 +0.2 0.18 ± 59% perf-profile.children.cycles-pp.try_to_migrate_one
0.16 ± 37% +0.2 0.34 ± 33% perf-profile.children.cycles-pp.do_readlinkat
0.19 ± 54% +0.2 0.37 ± 44% perf-profile.children.cycles-pp.rcu_cblist_dequeue
0.16 ± 37% +0.2 0.34 ± 32% perf-profile.children.cycles-pp.__x64_sys_readlink
0.00 +0.2 0.19 ± 60% perf-profile.children.cycles-pp.rmap_walk_anon
0.00 +0.2 0.19 ± 66% perf-profile.children.cycles-pp.__sysvec_call_function
0.00 +0.2 0.19 ± 95% perf-profile.children.cycles-pp.cpuidle_idle_call
0.00 +0.2 0.20 ± 59% perf-profile.children.cycles-pp.migrate_folio_unmap
0.21 ± 43% +0.2 0.42 ± 27% perf-profile.children.cycles-pp.diskstats_show
0.28 ± 36% +0.2 0.49 ± 24% perf-profile.children.cycles-pp.__kmem_cache_alloc_node
0.00 +0.2 0.21 ± 53% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.01 ±223% +0.2 0.24 ± 63% perf-profile.children.cycles-pp.sysvec_call_function
0.39 ± 15% +0.2 0.63 ± 15% perf-profile.children.cycles-pp.native_irq_return_iret
0.22 ± 13% +0.3 0.48 ± 28% perf-profile.children.cycles-pp.all_vm_events
0.21 ± 38% +0.3 0.48 ± 40% perf-profile.children.cycles-pp.write
0.30 ± 45% +0.3 0.58 ± 27% perf-profile.children.cycles-pp._raw_spin_lock
0.22 ± 40% +0.3 0.50 ± 26% perf-profile.children.cycles-pp.getname_flags
0.28 ± 54% +0.3 0.56 ± 24% perf-profile.children.cycles-pp.memcg_slab_post_alloc_hook
0.30 ± 55% +0.3 0.60 ± 39% perf-profile.children.cycles-pp.dup_mmap
0.01 ±223% +0.3 0.32 ± 88% perf-profile.children.cycles-pp.start_secondary
0.20 ± 38% +0.3 0.50 ± 40% perf-profile.children.cycles-pp.release_pages
0.01 ±223% +0.3 0.32 ± 58% perf-profile.children.cycles-pp.asm_sysvec_call_function
0.01 ±223% +0.3 0.32 ± 85% perf-profile.children.cycles-pp.do_idle
0.01 ±223% +0.3 0.32 ± 85% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
0.01 ±223% +0.3 0.32 ± 85% perf-profile.children.cycles-pp.cpu_startup_entry
0.37 ± 37% +0.3 0.68 ± 30% perf-profile.children.cycles-pp.__close_nocancel
0.26 ± 24% +0.3 0.58 ± 38% perf-profile.children.cycles-pp.drm_fb_helper_damage_work
0.26 ± 24% +0.3 0.58 ± 38% perf-profile.children.cycles-pp.drm_fbdev_generic_helper_fb_dirty
0.36 ± 37% +0.3 0.69 ± 19% perf-profile.children.cycles-pp.perf_read
0.33 ± 29% +0.3 0.66 ± 30% perf-profile.children.cycles-pp.fold_vm_numa_events
0.28 ± 48% +0.3 0.62 ± 34% perf-profile.children.cycles-pp.kmem_cache_free
0.22 ± 81% +0.4 0.58 ± 55% perf-profile.children.cycles-pp.wait4
0.36 ± 32% +0.4 0.72 ± 25% perf-profile.children.cycles-pp.__set_cpus_allowed_ptr
0.40 ± 52% +0.4 0.78 ± 32% perf-profile.children.cycles-pp.__d_lookup_rcu
0.43 ± 23% +0.4 0.81 ± 27% perf-profile.children.cycles-pp.show_stat
0.42 ± 32% +0.4 0.81 ± 24% perf-profile.children.cycles-pp.__sched_setaffinity
0.24 ± 44% +0.4 0.65 ± 43% perf-profile.children.cycles-pp.tlb_batch_pages_flush
0.02 ±223% +0.4 0.45 ± 48% perf-profile.children.cycles-pp.on_each_cpu_cond_mask
0.53 ± 48% +0.4 0.97 ± 34% perf-profile.children.cycles-pp.open_last_lookups
0.03 ±223% +0.4 0.48 ± 42% perf-profile.children.cycles-pp.smp_call_function_many_cond
0.07 ± 58% +0.5 0.52 ± 39% perf-profile.children.cycles-pp.flush_tlb_mm_range
0.43 ± 37% +0.5 0.89 ± 19% perf-profile.children.cycles-pp.perf_evsel__read
0.39 ± 18% +0.5 0.85 ± 27% perf-profile.children.cycles-pp.vmstat_start
0.16 ± 57% +0.5 0.63 ± 42% perf-profile.children.cycles-pp.pick_next_task_fair
0.49 ± 31% +0.5 0.97 ± 23% perf-profile.children.cycles-pp.__x64_sys_sched_setaffinity
0.04 ±168% +0.5 0.52 ± 51% perf-profile.children.cycles-pp.newidle_balance
0.45 ± 28% +0.5 0.96 ± 32% perf-profile.children.cycles-pp.finish_task_switch
0.61 ± 20% +0.5 1.13 ± 24% perf-profile.children.cycles-pp.rebalance_domains
0.55 ± 47% +0.5 1.08 ± 17% perf-profile.children.cycles-pp.evlist__id2evsel
0.46 ± 50% +0.5 0.98 ± 51% perf-profile.children.cycles-pp.do_vmi_munmap
0.29 ± 45% +0.5 0.82 ± 37% perf-profile.children.cycles-pp.tlb_finish_mmu
0.44 ± 53% +0.5 0.98 ± 32% perf-profile.children.cycles-pp.wp_page_copy
0.59 ± 29% +0.6 1.14 ± 30% perf-profile.children.cycles-pp.__percpu_counter_sum
0.46 ± 28% +0.6 1.03 ± 30% perf-profile.children.cycles-pp.process_one_work
0.54 ± 59% +0.6 1.12 ± 29% perf-profile.children.cycles-pp.kmem_cache_alloc
0.63 ± 32% +0.6 1.21 ± 31% perf-profile.children.cycles-pp.__mmdrop
0.76 ± 41% +0.6 1.36 ± 32% perf-profile.children.cycles-pp.walk_component
0.58 ± 53% +0.6 1.18 ± 40% perf-profile.children.cycles-pp.dup_mm
0.48 ± 30% +0.6 1.12 ± 30% perf-profile.children.cycles-pp.worker_thread
0.30 ± 63% +0.6 0.95 ± 41% perf-profile.children.cycles-pp._compound_head
0.68 ± 36% +0.6 1.32 ± 17% perf-profile.children.cycles-pp.readn
0.61 ± 27% +0.7 1.30 ± 25% perf-profile.children.cycles-pp.proc_reg_read_iter
0.99 ± 41% +0.8 1.75 ± 32% perf-profile.children.cycles-pp.lookup_fast
0.78 ± 31% +0.8 1.55 ± 22% perf-profile.children.cycles-pp.evlist_cpu_iterator__next
1.77 ± 16% +0.8 2.55 ± 21% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
0.58 ± 24% +0.8 1.42 ± 29% perf-profile.children.cycles-pp.update_sg_lb_stats
0.61 ± 24% +0.9 1.48 ± 29% perf-profile.children.cycles-pp.update_sd_lb_stats
0.61 ± 24% +0.9 1.50 ± 30% perf-profile.children.cycles-pp.find_busiest_group
1.08 ± 29% +0.9 2.00 ± 32% perf-profile.children.cycles-pp.__irq_exit_rcu
0.93 ± 48% +0.9 1.87 ± 35% perf-profile.children.cycles-pp.copy_process
0.65 ± 26% +1.0 1.62 ± 29% perf-profile.children.cycles-pp.load_balance
1.00 ± 51% +1.0 1.99 ± 36% perf-profile.children.cycles-pp.__do_sys_clone
1.06 ± 42% +1.0 2.05 ± 17% perf-profile.children.cycles-pp.evsel__read_counter
0.50 ± 62% +1.0 1.51 ± 47% perf-profile.children.cycles-pp.zap_pte_range
0.51 ± 61% +1.0 1.53 ± 46% perf-profile.children.cycles-pp.zap_pmd_range
0.53 ± 61% +1.0 1.57 ± 46% perf-profile.children.cycles-pp.unmap_page_range
1.05 ± 31% +1.1 2.10 ± 23% perf-profile.children.cycles-pp.sched_setaffinity
1.07 ± 54% +1.1 2.12 ± 36% perf-profile.children.cycles-pp.__libc_fork
1.70 ± 17% +1.1 2.80 ± 29% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.61 ± 61% +1.1 1.73 ± 43% perf-profile.children.cycles-pp.unmap_vmas
1.32 ± 29% +1.2 2.48 ± 40% perf-profile.children.cycles-pp.__do_softirq
1.30 ± 29% +1.2 2.48 ± 28% perf-profile.children.cycles-pp.do_fault
1.02 ± 32% +1.3 2.29 ± 32% perf-profile.children.cycles-pp.schedule
0.95 ± 59% +1.3 2.30 ± 33% perf-profile.children.cycles-pp.exit_mm
1.15 ± 34% +1.5 2.65 ± 31% perf-profile.children.cycles-pp.__schedule
1.18 ± 58% +1.6 2.76 ± 32% perf-profile.children.cycles-pp.exit_mmap
1.18 ± 58% +1.6 2.78 ± 32% perf-profile.children.cycles-pp.__mmput
1.23 ± 60% +1.6 2.86 ± 31% perf-profile.children.cycles-pp.do_exit
1.23 ± 60% +1.6 2.87 ± 31% perf-profile.children.cycles-pp.do_group_exit
1.23 ± 60% +1.6 2.87 ± 31% perf-profile.children.cycles-pp.__x64_sys_exit_group
1.82 ± 24% +1.6 3.46 ± 23% perf-profile.children.cycles-pp.kthread
1.83 ± 24% +1.7 3.51 ± 24% perf-profile.children.cycles-pp.ret_from_fork_asm
1.83 ± 23% +1.7 3.50 ± 24% perf-profile.children.cycles-pp.ret_from_fork
2.70 ± 16% +1.8 4.51 ± 30% perf-profile.children.cycles-pp.exit_to_user_mode_loop
2.85 ± 17% +2.0 4.83 ± 29% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
3.90 ± 12% +2.0 5.92 ± 23% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
2.51 ± 39% +2.4 4.89 ± 18% perf-profile.children.cycles-pp.read_counters
2.60 ± 38% +2.4 5.03 ± 18% perf-profile.children.cycles-pp.dispatch_events
2.60 ± 38% +2.4 5.03 ± 18% perf-profile.children.cycles-pp.process_interval
2.60 ± 38% +2.4 5.04 ± 18% perf-profile.children.cycles-pp.cmd_stat
3.87 ± 43% +3.2 7.04 ± 26% perf-profile.children.cycles-pp.seq_read_iter
0.23 ±170% +3.6 3.83 ± 58% perf-profile.children.cycles-pp.folio_copy
0.23 ±169% +3.6 3.84 ± 58% perf-profile.children.cycles-pp.migrate_folio_extra
0.23 ±169% +3.6 3.84 ± 58% perf-profile.children.cycles-pp.move_to_new_folio
0.28 ±145% +3.7 4.00 ± 56% perf-profile.children.cycles-pp.copy_page
0.24 ±171% +3.9 4.14 ± 58% perf-profile.children.cycles-pp.migrate_pages_batch
0.24 ±171% +3.9 4.14 ± 58% perf-profile.children.cycles-pp.migrate_pages
0.25 ±171% +3.9 4.15 ± 58% perf-profile.children.cycles-pp.migrate_misplaced_page
0.22 ±166% +3.9 4.13 ± 58% perf-profile.children.cycles-pp.do_huge_pmd_numa_page
4.19 ± 41% +4.1 8.29 ± 27% perf-profile.children.cycles-pp.read
4.84 ± 41% +4.1 8.96 ± 25% perf-profile.children.cycles-pp.vfs_read
5.01 ± 41% +4.3 9.29 ± 25% perf-profile.children.cycles-pp.ksys_read
3.24 ± 32% +6.3 9.52 ± 30% perf-profile.children.cycles-pp.__handle_mm_fault
3.68 ± 31% +6.5 10.18 ± 28% perf-profile.children.cycles-pp.handle_mm_fault
4.55 ± 27% +6.8 11.34 ± 24% perf-profile.children.cycles-pp.do_user_addr_fault
4.62 ± 27% +6.8 11.43 ± 24% perf-profile.children.cycles-pp.exc_page_fault
5.01 ± 26% +7.0 12.02 ± 23% perf-profile.children.cycles-pp.asm_exc_page_fault
0.02 ±141% +0.1 0.08 ± 22% perf-profile.self.cycles-pp.__legitimize_mnt
0.11 ± 26% +0.1 0.16 ± 19% perf-profile.self.cycles-pp.aa_file_perm
0.02 ±141% +0.1 0.08 ± 24% perf-profile.self.cycles-pp.perf_evsel__read
0.02 ±144% +0.1 0.08 ± 40% perf-profile.self.cycles-pp.check_heap_object
0.07 ± 30% +0.1 0.13 ± 34% perf-profile.self.cycles-pp._raw_spin_lock_irq
0.00 +0.1 0.06 ± 13% perf-profile.self.cycles-pp._copy_to_iter
0.07 ± 52% +0.1 0.13 ± 16% perf-profile.self.cycles-pp.atime_needs_update
0.06 ± 50% +0.1 0.12 ± 38% perf-profile.self.cycles-pp.kcpustat_cpu_fetch
0.01 ±223% +0.1 0.08 ± 33% perf-profile.self.cycles-pp.wq_worker_comm
0.05 ± 80% +0.1 0.13 ± 26% perf-profile.self.cycles-pp.try_charge_memcg
0.07 ± 57% +0.1 0.14 ± 28% perf-profile.self.cycles-pp.switch_mm_irqs_off
0.05 ± 84% +0.1 0.13 ± 26% perf-profile.self.cycles-pp.update_rq_clock_task
0.02 ±223% +0.1 0.09 ± 26% perf-profile.self.cycles-pp.enqueue_task_fair
0.01 ±223% +0.1 0.09 ± 27% perf-profile.self.cycles-pp.thread_group_cputime
0.04 ±104% +0.1 0.12 ± 23% perf-profile.self.cycles-pp.fsnotify_perm
0.05 ± 86% +0.1 0.13 ± 23% perf-profile.self.cycles-pp.perf_read
0.01 ±223% +0.1 0.09 ± 27% perf-profile.self.cycles-pp._IO_file_doallocate
0.12 ± 19% +0.1 0.20 ± 32% perf-profile.self.cycles-pp.xas_descend
0.00 +0.1 0.08 ± 30% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.10 ± 39% +0.1 0.19 ± 26% perf-profile.self.cycles-pp.update_curr
0.03 ±105% +0.1 0.12 ± 39% perf-profile.self.cycles-pp.__fput
0.03 ±150% +0.1 0.13 ± 40% perf-profile.self.cycles-pp.task_dump_owner
0.06 ± 58% +0.1 0.17 ± 33% perf-profile.self.cycles-pp.kstat_irqs_usr
0.02 ±223% +0.1 0.12 ± 35% perf-profile.self.cycles-pp.free_unref_page_prepare
0.08 ± 27% +0.1 0.20 ± 40% perf-profile.self.cycles-pp.release_pages
0.12 ± 37% +0.1 0.23 ± 27% perf-profile.self.cycles-pp.blk_mq_queue_tag_busy_iter
0.17 ± 37% +0.1 0.29 ± 28% perf-profile.self.cycles-pp.__schedule
0.08 ± 51% +0.1 0.20 ± 35% perf-profile.self.cycles-pp.free_swap_cache
0.13 ± 23% +0.1 0.26 ± 18% perf-profile.self.cycles-pp.__entry_text_start
0.13 ± 39% +0.1 0.26 ± 24% perf-profile.self.cycles-pp.evlist_cpu_iterator__next
0.02 ±142% +0.1 0.15 ± 24% perf-profile.self.cycles-pp.__x64_sys_newfstat
0.08 ± 40% +0.1 0.22 ± 17% perf-profile.self.cycles-pp.vfs_read
0.12 ± 38% +0.1 0.26 ± 30% perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
0.20 ± 32% +0.2 0.35 ± 26% perf-profile.self.cycles-pp.__memcpy
0.22 ± 43% +0.2 0.40 ± 22% perf-profile.self.cycles-pp.do_dentry_open
0.16 ± 41% +0.2 0.34 ± 23% perf-profile.self.cycles-pp.__kmem_cache_alloc_node
0.19 ± 54% +0.2 0.37 ± 44% perf-profile.self.cycles-pp.rcu_cblist_dequeue
0.24 ± 52% +0.2 0.43 ± 35% perf-profile.self.cycles-pp.inode_permission
0.20 ± 39% +0.2 0.40 ± 17% perf-profile.self.cycles-pp.evsel__read_counter
0.14 ± 44% +0.2 0.35 ± 18% perf-profile.self.cycles-pp.memcg_slab_post_alloc_hook
0.39 ± 15% +0.2 0.63 ± 15% perf-profile.self.cycles-pp.native_irq_return_iret
0.22 ± 49% +0.2 0.47 ± 31% perf-profile.self.cycles-pp.kmem_cache_free
0.02 ±223% +0.3 0.27 ± 44% perf-profile.self.cycles-pp.smp_call_function_many_cond
0.22 ± 13% +0.3 0.47 ± 28% perf-profile.self.cycles-pp.all_vm_events
0.24 ± 62% +0.3 0.50 ± 34% perf-profile.self.cycles-pp.kmem_cache_alloc
0.29 ± 42% +0.3 0.56 ± 22% perf-profile.self.cycles-pp.read_counters
0.32 ± 28% +0.3 0.65 ± 31% perf-profile.self.cycles-pp.fold_vm_numa_events
0.39 ± 52% +0.4 0.76 ± 32% perf-profile.self.cycles-pp.__d_lookup_rcu
0.37 ± 18% +0.4 0.75 ± 28% perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
0.54 ± 46% +0.5 1.05 ± 17% perf-profile.self.cycles-pp.evlist__id2evsel
0.58 ± 29% +0.5 1.10 ± 31% perf-profile.self.cycles-pp.__percpu_counter_sum
0.30 ± 63% +0.6 0.92 ± 40% perf-profile.self.cycles-pp._compound_head
0.46 ± 22% +0.6 1.11 ± 28% perf-profile.self.cycles-pp.update_sg_lb_stats
0.27 ±144% +3.7 3.98 ± 57% perf-profile.self.cycles-pp.copy_page




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


2023-09-13 19:46:16

by Raghavendra K T

Subject: Re: [RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic

On 9/12/2023 1:20 PM, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed a -11.9% improvement of autonuma-benchmark.numa01_THREAD_ALLOC.seconds on:
>
>
> commit: 1ef5cbb92bdb320c5eb9fdee1a811d22ee9e19fe ("[RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic")
> url: https://github.com/intel-lab-lkp/linux/commits/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007
> base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 2f88c8e802c8b128a155976631f4eb2ce4f3c805
> patch link: https://lore.kernel.org/all/87e3c08bd1770dd3e6eee099c01e595f14c76fc3.1693287931.git.raghavendra.kt@amd.com/
> patch subject: [RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic
>
> testcase: autonuma-benchmark
> test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
> parameters:
>
> iterations: 4x
> test: numa01_THREAD_ALLOC
> cpufreq_governor: performance
>
>
> hi, Raghu,
>
> the reason there is a separate report for this commit besides
> https://lore.kernel.org/all/[email protected]/
> lies in the nature of bisection: one auto-bisect can so far only capture
> one commit for a performance change.
>
> this auto-bisect ran on another test machine (Sapphire Rapids), and it
> happened to choose autonuma-benchmark.numa01_THREAD_ALLOC.seconds as the indicator
> for the bisect; it finally captured
> "[RFC PATCH V1 2/6] sched/numa: Add disjoint vma unconditional scan logic"
>
> and from
> https://lore.kernel.org/all/[email protected]/
> I noticed you care more about the performance impact of the whole patch set,
> so let me give a summary table below.
>
> first, let me show again how we applied your patches:
>
> 68cfe9439a1ba (linux-review/Raghavendra-K-T/sched-numa-Move-up-the-access-pid-reset-logic/20230829-141007) sched/numa: Allow scanning of shared VMAs
> af46f3c9ca2d1 sched/numa: Allow recently accessed VMAs to be scanned
> 167773d1ddb5f sched/numa: Increase tasks' access history
> fc769221b2306 sched/numa: Remove unconditional scan logic using mm numa_scan_seq
> 1ef5cbb92bdb3 sched/numa: Add disjoint vma unconditional scan logic
> 2a806eab1c2e1 sched/numa: Move up the access pid reset logic
> 2f88c8e802c8b (tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well
>
>
> we have the below data on this test machine
> (the full table would be very big; if you want it, please let me know):
>
> =========================================================================================
> compiler/cpufreq_governor/iterations/kconfig/rootfs/tbox_group/test/testcase:
> gcc-12/performance/4x/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-spr-r02/numa01_THREAD_ALLOC/autonuma-benchmark
>
> commit:
> 2f88c8e802 ("(tip/sched/core) sched/eevdf/doc: Modify the documented knob to base_slice_ns as well")
> 2a806eab1c ("sched/numa: Move up the access pid reset logic")
> 1ef5cbb92b ("sched/numa: Add disjoint vma unconditional scan logic")
> 68cfe9439a ("sched/numa: Allow scanning of shared VMAs")
>
>
> 2f88c8e802c8b128 2a806eab1c2e1c9f0ae39dc0307 1ef5cbb92bdb320c5eb9fdee1a8 68cfe9439a1baa642e05883fa64
> ---------------- --------------------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev %change %stddev
> \ | \ | \ | \
> 271.01 +0.8% 273.24 -0.7% 269.00 -26.4% 199.49 ± 3% autonuma-benchmark.numa01.seconds
> 76.28 +0.2% 76.44 -11.7% 67.36 ± 6% -46.9% 40.49 ± 5% autonuma-benchmark.numa01_THREAD_ALLOC.seconds
> 8.11 -0.9% 8.04 -0.7% 8.05 -0.1% 8.10 autonuma-benchmark.numa02.seconds
> 1425 +0.7% 1434 -3.1% 1381 -30.1% 996.02 ± 2% autonuma-benchmark.time.elapsed_time
>
>

Thanks for this summary too.

I think the slight additional time overhead from the first patch comes
from the additional logic that gets executed before we return from the
is_vma_accessed() check, as expected.
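
For anyone skimming, here is a rough, self-contained userspace model of
the shape of that check. It is only an illustration under simplified
assumptions (the two-window bitmap, the hash constant, and the struct
names are made up for the sketch; the real kernel helper also carries
the extra history/unconditional-scan logic this series adds before the
early return):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PID_HASH_BITS 6			/* 64-entry bitmap, kernel-like */

struct vma_model {
	uint64_t access_pids[2];	/* current and previous access window */
};

/* hash_32()-style multiplicative hash; constant chosen for illustration */
static unsigned int hash_pid(uint32_t pid)
{
	return (pid * 2654435769u) >> (32 - PID_HASH_BITS);
}

static void record_access(struct vma_model *vma, uint32_t pid)
{
	vma->access_pids[0] |= 1ull << hash_pid(pid);
}

/* the scanner skips a VMA when the calling task's pid bit is clear */
static bool vma_is_accessed(const struct vma_model *vma, uint32_t pid)
{
	uint64_t pids = vma->access_pids[0] | vma->access_pids[1];

	return pids & (1ull << hash_pid(pid));
}

int main(void)
{
	struct vma_model vma = { { 0, 0 } };

	record_access(&vma, 1234);
	printf("pid 1234 seen: %d\n", vma_is_accessed(&vma, 1234));
	printf("pid 5678 seen: %d\n", vma_is_accessed(&vma, 5678));
	return 0;
}

Any bookkeeping that runs before that early return is paid on every
candidate VMA in each scan, which matches the small elapsed-time delta
in the tables above.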

Regards
- Raghu