2023-03-20 01:26:38

by kernel test robot

Subject: [linus:master] [migrate_pages] 7e12beb8ca: vm-scalability.throughput -3.4% regression

Hello,

FYI, we noticed a -3.4% regression of vm-scalability.throughput due to commit:

commit: 7e12beb8ca2ac98b2ec42e0ea4b76cdc93b58654 ("migrate_pages: batch flushing TLB")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
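For context: before this commit, migrate_pages() flushed the TLB once per
page while unmapping; the commit defers those flushes and issues one
batched flush per batch of pages. A simplified pseudo-C sketch of the
before/after pattern (paraphrasing the kernel flow, not compilable code;
the helper names are the functions that appear in the profiles below):

	/* Before: one TLB shootdown per migrated page. */
	for_each_page_to_migrate(page) {
		pteval = ptep_clear_flush(vma, addr, pte); /* clear PTE + flush */
		/* ... install migration entry, copy page ... */
	}

	/* After 7e12beb8ca: clear PTEs without flushing, mark a flush
	 * as pending, then flush once for the whole batch. */
	for_each_page_in_batch(page) {
		pteval = ptep_get_and_clear(mm, addr, pte); /* no flush yet */
		set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
	}
	try_to_unmap_flush(); /* arch_tlbbatch_flush(): one IPI round via
			       * on_each_cpu_cond_mask() for all pages */

This trades many per-page shootdown IPIs for a single, but full, TLB
flush per batch -- the trade-off the numbers below quantify.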

in testcase: vm-scalability
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
with following parameters:

runtime: 300s
size: 512G
test: anon-cow-rand-mt
cpufreq_governor: performance

test-description: The motivation behind this suite is to exercise functions and regions of the mm/ subsystem of the Linux kernel that are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
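For reference, anon-cow-rand-mt is a multi-threaded random-write workload
over anonymous copy-on-write memory. Below is a hypothetical, simplified
reconstruction of its hot loop, inferred only from the symbols visible in
the profiles further down (do_access, lrand48_r); the real loop lives in
vm-scalability's usemem tool:

#define _GNU_SOURCE
#include <stdlib.h>

#define PAGE_SIZE 4096UL

/* Hypothetical sketch: each thread writes one byte to a random page per
 * iteration. Writes to shared anonymous pages take copy-on-write
 * faults, and the random access pattern across NUMA nodes makes the
 * pages candidates for NUMA-balancing migration. */
static void do_access(char *buf, size_t nr_pages, long iters)
{
	struct drand48_data rng;
	long r;

	srand48_r(1, &rng);
	while (iters-- > 0) {
		lrand48_r(&rng, &r);                     /* random page index */
		buf[((unsigned long)r % nr_pages) * PAGE_SIZE] = 1;
	}
}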


If you fix the issue, kindly add the following tags:
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-lkp/[email protected]


Details are as below:

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/512G/lkp-csl-2sp3/anon-cow-rand-mt/vm-scalability

commit:
ebe75e4751 ("migrate_pages: share more code between _unmap and _move")
7e12beb8ca ("migrate_pages: batch flushing TLB")

ebe75e4751063dce 7e12beb8ca2ac98b2ec42e0ea4b
---------------- ---------------------------
%stddev %change %stddev
\ | \
57522 -3.3% 55603 vm-scalability.median
5513665 -3.4% 5328506 vm-scalability.throughput
203067 ± 3% -8.6% 185675 ± 2% vm-scalability.time.involuntary_context_switches
68459282 ± 6% +42.1% 97269013 ± 2% vm-scalability.time.minor_page_faults
9007 -1.8% 8844 vm-scalability.time.percent_of_cpu_this_job_got
1170 ± 3% +58.2% 1852 ± 3% vm-scalability.time.system_time
26342 -4.6% 25132 vm-scalability.time.user_time
11275 ± 5% +364.1% 52332 ± 7% vm-scalability.time.voluntary_context_switches
1.658e+09 -3.4% 1.601e+09 vm-scalability.workload
51013 ± 40% -67.5% 16584 ±125% numa-numastat.node1.other_node
20056 ± 2% +8.9% 21844 ± 3% numa-vmstat.node1.nr_slab_unreclaimable
51013 ± 40% -67.5% 16584 ±125% numa-vmstat.node1.numa_other
2043 ± 3% +10.5% 2257 vmstat.system.cs
540820 ± 2% +186.2% 1547747 ± 8% vmstat.system.in
0.00 ±157% +0.0 0.00 ± 6% mpstat.cpu.all.iowait%
2.59 +1.7 4.27 ± 4% mpstat.cpu.all.irq%
4.03 ± 3% +2.3 6.36 ± 3% mpstat.cpu.all.sys%
5870 ± 64% -48.0% 3051 ± 7% numa-meminfo.node0.Active
195543 ± 3% -7.2% 181529 ± 4% numa-meminfo.node0.Slab
80226 ± 2% +8.9% 87378 ± 3% numa-meminfo.node1.SUnreclaim
40406018 ± 7% +66.5% 67272793 ± 2% proc-vmstat.numa_hint_faults
20211075 ± 7% +66.8% 33722069 ± 2% proc-vmstat.numa_hint_faults_local
40555366 ± 7% +66.3% 67430626 ± 2% proc-vmstat.numa_pte_updates
69364615 ± 6% +41.5% 98184580 ± 2% proc-vmstat.pgfault
210031 ± 8% +126.2% 475135 ± 99% turbostat.C1
1.382e+09 ± 2% +140.0% 3.317e+09 ± 5% turbostat.IRQ
8771 ± 6% +466.6% 49695 ± 7% turbostat.POLL
87.01 -2.6% 84.76 turbostat.RAMWatt
145904 ± 2% -22.2% 113504 ± 11% sched_debug.cfs_rq:/.min_vruntime.stddev
841.83 ± 2% -13.8% 725.47 ± 6% sched_debug.cfs_rq:/.runnable_avg.min
549777 ± 9% -49.2% 279239 ± 34% sched_debug.cfs_rq:/.spread0.avg
659447 ± 8% -36.7% 417735 ± 22% sched_debug.cfs_rq:/.spread0.max
145800 ± 2% -22.1% 113612 ± 11% sched_debug.cfs_rq:/.spread0.stddev
785.23 ± 6% -14.6% 670.61 ± 10% sched_debug.cfs_rq:/.util_avg.min
67.96 ± 5% +22.7% 83.40 ± 10% sched_debug.cfs_rq:/.util_avg.stddev
246549 ± 7% -15.1% 209367 ± 7% sched_debug.cpu.avg_idle.avg
1592 +10.8% 1763 ± 3% sched_debug.cpu.clock_task.stddev
32106 ± 10% -17.6% 26468 ± 8% sched_debug.cpu.nr_switches.max
1910 ± 6% +31.0% 2503 sched_debug.cpu.nr_switches.min
5664 ± 10% -16.6% 4723 ± 7% sched_debug.cpu.nr_switches.stddev
0.18 ± 4% +0.0 0.23 ± 3% perf-stat.i.branch-miss-rate%
8939520 ± 4% +61.3% 14417578 ± 3% perf-stat.i.branch-misses
66.18 -1.7 64.47 perf-stat.i.cache-miss-rate%
1927 ± 3% +11.0% 2139 perf-stat.i.context-switches
158.85 +10.7% 175.92 ± 3% perf-stat.i.cpu-migrations
0.04 ± 6% +0.0 0.05 ± 11% perf-stat.i.dTLB-load-miss-rate%
4916471 ± 7% +39.7% 6870029 ± 9% perf-stat.i.dTLB-load-misses
9.10 -0.4 8.71 perf-stat.i.dTLB-store-miss-rate%
5.311e+08 -4.1% 5.095e+08 perf-stat.i.dTLB-store-misses
2438160 ± 2% +161.5% 6374895 ± 7% perf-stat.i.iTLB-load-misses
115315 ± 2% +62.0% 186840 ± 7% perf-stat.i.iTLB-loads
43163 ± 5% -25.7% 32083 ± 26% perf-stat.i.instructions-per-iTLB-miss
0.34 ± 37% -63.2% 0.13 ± 27% perf-stat.i.major-faults
226565 ± 6% +41.4% 320417 ± 2% perf-stat.i.minor-faults
50.56 +1.7 52.22 perf-stat.i.node-load-miss-rate%
1.165e+08 +3.7% 1.208e+08 perf-stat.i.node-load-misses
1.13e+08 -3.6% 1.089e+08 perf-stat.i.node-loads
2.678e+08 -3.6% 2.582e+08 perf-stat.i.node-store-misses
2.655e+08 -4.2% 2.543e+08 perf-stat.i.node-stores
226565 ± 6% +41.4% 320418 ± 2% perf-stat.i.page-faults
0.08 ± 4% +0.0 0.12 ± 4% perf-stat.overall.branch-miss-rate%
67.13 -1.8 65.28 perf-stat.overall.cache-miss-rate%
367.93 +2.6% 377.43 perf-stat.overall.cycles-between-cache-misses
0.04 ± 7% +0.0 0.05 ± 10% perf-stat.overall.dTLB-load-miss-rate%
9.38 -0.4 8.97 perf-stat.overall.dTLB-store-miss-rate%
95.49 +1.7 97.16 perf-stat.overall.iTLB-load-miss-rate%
20560 ± 3% -61.9% 7826 ± 7% perf-stat.overall.instructions-per-iTLB-miss
50.76 +1.8 52.60 perf-stat.overall.node-load-miss-rate%
9205 +3.1% 9485 perf-stat.overall.path-length
8892515 ± 4% +62.1% 14412101 ± 3% perf-stat.ps.branch-misses
1927 ± 3% +11.1% 2142 perf-stat.ps.context-switches
158.37 +11.2% 176.03 ± 3% perf-stat.ps.cpu-migrations
4902779 ± 7% +40.2% 6871859 ± 9% perf-stat.ps.dTLB-load-misses
5.295e+08 -4.1% 5.077e+08 perf-stat.ps.dTLB-store-misses
2428324 ± 2% +163.0% 6385873 ± 7% perf-stat.ps.iTLB-load-misses
114618 ± 2% +62.5% 186290 ± 7% perf-stat.ps.iTLB-loads
0.34 ± 37% -63.2% 0.13 ± 27% perf-stat.ps.major-faults
226036 ± 6% +41.8% 320615 ± 2% perf-stat.ps.minor-faults
1.162e+08 +3.7% 1.205e+08 perf-stat.ps.node-load-misses
1.127e+08 -3.6% 1.086e+08 perf-stat.ps.node-loads
2.67e+08 -3.6% 2.573e+08 perf-stat.ps.node-store-misses
2.647e+08 -4.3% 2.534e+08 perf-stat.ps.node-stores
226036 ± 6% +41.8% 320615 ± 2% perf-stat.ps.page-faults
0.00 +0.6 0.60 ± 8% perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush
0.00 +0.6 0.64 ± 7% perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
0.00 +0.9 0.90 ± 10% perf-profile.calltrace.cycles-pp.llist_reverse_order.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
0.00 +1.9 1.86 ± 9% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_access
0.00 +1.9 1.87 ± 8% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_access
0.00 +1.9 1.94 ± 8% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.do_access
0.00 +2.6 2.59 ± 9% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.do_access
0.00 +2.8 2.80 ± 8% perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush
3.43 ± 13% +6.5 9.88 ± 7% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
3.46 ± 13% +6.5 9.94 ± 7% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
3.15 ± 13% +6.5 9.69 ± 7% perf-profile.calltrace.cycles-pp.do_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
4.06 ± 11% +6.7 10.71 ± 7% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
3.82 ± 13% +6.7 10.48 ± 7% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
3.83 ± 13% +6.7 10.49 ± 7% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
2.57 ± 13% +6.9 9.46 ± 7% perf-profile.calltrace.cycles-pp.migrate_misplaced_page.do_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
2.36 ± 13% +6.9 9.28 ± 7% perf-profile.calltrace.cycles-pp.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page.__handle_mm_fault
2.36 ± 13% +6.9 9.29 ± 7% perf-profile.calltrace.cycles-pp.migrate_pages.migrate_misplaced_page.do_numa_page.__handle_mm_fault.handle_mm_fault
0.00 +7.5 7.50 ± 7% perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch
0.00 +7.6 7.56 ± 7% perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch.migrate_pages
0.00 +7.6 7.57 ± 8% perf-profile.calltrace.cycles-pp.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch.migrate_pages.migrate_misplaced_page
0.00 +7.6 7.57 ± 7% perf-profile.calltrace.cycles-pp.try_to_unmap_flush.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
1.55 ± 13% -1.1 0.42 ± 9% perf-profile.children.cycles-pp.rmap_walk_anon
1.30 ± 15% -1.0 0.30 ± 9% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
1.11 ± 12% -0.9 0.18 ± 6% perf-profile.children.cycles-pp.try_to_migrate_one
1.17 ± 12% -0.9 0.26 ± 8% perf-profile.children.cycles-pp.try_to_migrate
1.30 ± 12% -0.9 0.42 ± 8% perf-profile.children.cycles-pp.migrate_folio_unmap
1.08 ± 13% -0.9 0.21 ± 11% perf-profile.children.cycles-pp._raw_spin_lock
0.46 ± 14% -0.3 0.14 ± 13% perf-profile.children.cycles-pp.page_vma_mapped_walk
0.35 ± 13% -0.2 0.11 ± 11% perf-profile.children.cycles-pp.remove_migration_pte
0.14 ± 21% -0.1 0.07 ± 11% perf-profile.children.cycles-pp.folio_lruvec_lock_irq
0.14 ± 21% -0.1 0.08 ± 10% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.33 ± 3% -0.0 0.30 perf-profile.children.cycles-pp.lrand48_r@plt
0.06 ± 14% +0.0 0.08 ± 9% perf-profile.children.cycles-pp.mt_find
0.06 ± 14% +0.0 0.08 ± 11% perf-profile.children.cycles-pp.find_vma
0.00 +0.1 0.06 ± 9% perf-profile.children.cycles-pp.folio_migrate_flags
0.06 ± 8% +0.1 0.12 ± 8% perf-profile.children.cycles-pp.exit_to_user_mode_loop
0.03 ± 81% +0.1 0.10 ± 8% perf-profile.children.cycles-pp.uncharge_batch
0.00 +0.1 0.07 ± 8% perf-profile.children.cycles-pp.native_sched_clock
0.06 ± 10% +0.1 0.13 ± 8% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.03 ± 81% +0.1 0.10 ± 10% perf-profile.children.cycles-pp.__folio_put
0.03 ± 81% +0.1 0.10 ± 10% perf-profile.children.cycles-pp.__mem_cgroup_uncharge
0.16 ± 12% +0.1 0.24 ± 10% perf-profile.children.cycles-pp.up_read
0.04 ± 50% +0.1 0.12 ± 8% perf-profile.children.cycles-pp.task_work_run
0.01 ±200% +0.1 0.09 ± 12% perf-profile.children.cycles-pp.page_counter_uncharge
0.23 ± 18% +0.1 0.31 ± 8% perf-profile.children.cycles-pp.folio_batch_move_lru
0.00 +0.1 0.08 ± 10% perf-profile.children.cycles-pp.sched_clock_cpu
0.23 ± 18% +0.1 0.31 ± 8% perf-profile.children.cycles-pp.lru_add_drain
0.23 ± 18% +0.1 0.31 ± 8% perf-profile.children.cycles-pp.lru_add_drain_cpu
0.19 ± 12% +0.1 0.28 ± 11% perf-profile.children.cycles-pp.down_read_trylock
0.05 ± 7% +0.1 0.14 ± 8% perf-profile.children.cycles-pp.mem_cgroup_migrate
0.00 +0.1 0.09 ± 10% perf-profile.children.cycles-pp._find_next_bit
0.03 ± 82% +0.1 0.12 ± 8% perf-profile.children.cycles-pp.change_pte_range
0.03 ± 82% +0.1 0.12 ± 8% perf-profile.children.cycles-pp.task_numa_work
0.03 ± 82% +0.1 0.12 ± 8% perf-profile.children.cycles-pp.change_prot_numa
0.03 ± 82% +0.1 0.12 ± 8% perf-profile.children.cycles-pp.change_protection_range
0.03 ± 82% +0.1 0.12 ± 8% perf-profile.children.cycles-pp.change_pmd_range
0.02 ±123% +0.1 0.12 ± 6% perf-profile.children.cycles-pp.irqtime_account_irq
0.07 ± 13% +0.1 0.18 ± 24% perf-profile.children.cycles-pp.__irq_exit_rcu
0.02 ±122% +0.1 0.13 ± 6% perf-profile.children.cycles-pp.page_counter_charge
0.18 ± 12% +0.1 0.30 ± 9% perf-profile.children.cycles-pp.folio_copy
0.17 ± 13% +0.1 0.30 ± 9% perf-profile.children.cycles-pp.copy_page
0.09 ± 4% +0.1 0.24 ± 9% perf-profile.children.cycles-pp.sync_regs
0.27 ± 11% +0.2 0.51 ± 8% perf-profile.children.cycles-pp.move_to_new_folio
0.27 ± 11% +0.2 0.51 ± 8% perf-profile.children.cycles-pp.migrate_folio_extra
0.10 ± 9% +0.3 0.40 ± 7% perf-profile.children.cycles-pp.native_irq_return_iret
0.07 ± 12% +0.4 0.47 ± 9% perf-profile.children.cycles-pp.__default_send_IPI_dest_field
0.00 +0.4 0.44 ± 9% perf-profile.children.cycles-pp.native_flush_tlb_local
0.09 ± 9% +0.5 0.62 ± 9% perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
0.09 ± 10% +1.2 1.32 ± 7% perf-profile.children.cycles-pp.flush_tlb_func
0.26 ± 12% +1.6 1.85 ± 9% perf-profile.children.cycles-pp.llist_reverse_order
0.42 ± 11% +2.4 2.86 ± 8% perf-profile.children.cycles-pp.llist_add_batch
0.42 ± 11% +3.3 3.76 ± 8% perf-profile.children.cycles-pp.__sysvec_call_function
0.42 ± 11% +3.3 3.76 ± 8% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.44 ± 11% +3.5 3.90 ± 8% perf-profile.children.cycles-pp.sysvec_call_function
0.58 ± 6% +4.4 4.95 ± 8% perf-profile.children.cycles-pp.asm_sysvec_call_function
3.44 ± 13% +6.5 9.89 ± 7% perf-profile.children.cycles-pp.__handle_mm_fault
3.47 ± 13% +6.5 9.95 ± 7% perf-profile.children.cycles-pp.handle_mm_fault
3.15 ± 13% +6.5 9.69 ± 7% perf-profile.children.cycles-pp.do_numa_page
0.94 ± 12% +6.7 7.59 ± 7% perf-profile.children.cycles-pp.on_each_cpu_cond_mask
0.94 ± 12% +6.7 7.59 ± 7% perf-profile.children.cycles-pp.smp_call_function_many_cond
3.83 ± 13% +6.7 10.49 ± 7% perf-profile.children.cycles-pp.do_user_addr_fault
3.84 ± 13% +6.7 10.50 ± 7% perf-profile.children.cycles-pp.exc_page_fault
4.08 ± 11% +6.7 10.76 ± 7% perf-profile.children.cycles-pp.asm_exc_page_fault
2.57 ± 13% +6.9 9.46 ± 7% perf-profile.children.cycles-pp.migrate_misplaced_page
2.36 ± 13% +6.9 9.28 ± 7% perf-profile.children.cycles-pp.migrate_pages_batch
2.36 ± 13% +6.9 9.29 ± 7% perf-profile.children.cycles-pp.migrate_pages
0.00 +7.6 7.57 ± 7% perf-profile.children.cycles-pp.try_to_unmap_flush
0.00 +7.6 7.57 ± 7% perf-profile.children.cycles-pp.arch_tlbbatch_flush
67.74 ± 4% -8.5 59.28 ± 2% perf-profile.self.cycles-pp.do_access
1.19 ± 15% -0.9 0.28 ± 9% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.03 ± 81% +0.0 0.08 ± 10% perf-profile.self.cycles-pp.change_pte_range
0.09 ± 4% +0.0 0.14 ± 20% perf-profile.self.cycles-pp._raw_spin_lock
0.15 ± 12% +0.0 0.20 ± 10% perf-profile.self.cycles-pp.up_read
0.00 +0.1 0.05 ± 8% perf-profile.self.cycles-pp.try_to_migrate_one
0.00 +0.1 0.07 ± 8% perf-profile.self.cycles-pp._find_next_bit
0.00 +0.1 0.07 ± 8% perf-profile.self.cycles-pp.native_sched_clock
0.00 +0.1 0.07 ± 12% perf-profile.self.cycles-pp.page_counter_uncharge
0.17 ± 13% +0.1 0.27 ± 9% perf-profile.self.cycles-pp.copy_page
0.01 ±200% +0.1 0.11 ± 8% perf-profile.self.cycles-pp.page_counter_charge
0.09 ± 4% +0.1 0.24 ± 9% perf-profile.self.cycles-pp.sync_regs
0.10 ± 9% +0.3 0.39 ± 8% perf-profile.self.cycles-pp.native_irq_return_iret
0.07 ± 12% +0.4 0.47 ± 9% perf-profile.self.cycles-pp.__default_send_IPI_dest_field
0.00 +0.4 0.44 ± 10% perf-profile.self.cycles-pp.native_flush_tlb_local
0.08 ± 13% +0.5 0.62 ± 7% perf-profile.self.cycles-pp.__flush_smp_call_function_queue
0.06 ± 15% +0.8 0.88 ± 7% perf-profile.self.cycles-pp.flush_tlb_func
0.26 ± 12% +1.6 1.85 ± 9% perf-profile.self.cycles-pp.llist_reverse_order
0.36 ± 11% +2.0 2.40 ± 8% perf-profile.self.cycles-pp.llist_add_batch
0.38 ± 13% +3.1 3.49 ± 7% perf-profile.self.cycles-pp.smp_call_function_many_cond


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests


Attachments:
config-6.2.0-rc4-00556-g7e12beb8ca2a (163.59 kB)
job-script (7.90 kB)
job.yaml (5.73 kB)
reproduce (805.00 B)

2023-03-20 08:00:01

by Huang, Ying

Subject: Re: [linus:master] [migrate_pages] 7e12beb8ca: vm-scalability.throughput -3.4% regression

Hi, Yujie,

kernel test robot <[email protected]> writes:

> Hello,
>
> FYI, we noticed a -3.4% regression of vm-scalability.throughput due to commit:
>
> commit: 7e12beb8ca2ac98b2ec42e0ea4b76cdc93b58654 ("migrate_pages: batch flushing TLB")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: vm-scalability
> on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
> with following parameters:
>
> runtime: 300s
> size: 512G
> test: anon-cow-rand-mt
> cpufreq_governor: performance
>
> test-description: The motivation behind this suite is to exercise functions and regions of the mm/ subsystem of the Linux kernel that are of interest to us.
> test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
>
>
> If you fix the issue, kindly add the following tags:
> | Reported-by: kernel test robot <[email protected]>
> | Link: https://lore.kernel.org/oe-lkp/[email protected]
>

Thanks a lot for the report! Could you check whether the debug patch
below resolves the regression?

Best Regards,
Huang, Ying

-------------------------------------8<------------------------------------
From 1ac61967b54bbdc1ca20af16f9dfb2507a4d4811 Mon Sep 17 00:00:00 2001
From: Huang Ying <[email protected]>
Date: Mon, 20 Mar 2023 15:48:39 +0800
Subject: [PATCH] dbg, rmap: avoid flushing TLB in batch if PTE is inaccessible

Signed-off-by: "Huang, Ying" <[email protected]>
---
mm/rmap.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 8632e02661ac..3c7c43642d7c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1582,7 +1582,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
*/
pteval = ptep_get_and_clear(mm, address, pvmw.pte);

- set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
+ if (pte_accessible(mm, pteval))
+ set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
} else {
pteval = ptep_clear_flush(vma, address, pvmw.pte);
}
@@ -1963,7 +1964,8 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
*/
pteval = ptep_get_and_clear(mm, address, pvmw.pte);

- set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
+ if (pte_accessible(mm, pteval))
+ set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
} else {
pteval = ptep_clear_flush(vma, address, pvmw.pte);
}
--
2.30.2
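
For reference, the check above helps because, in the NUMA-balancing path
exercised by this benchmark, the PTE was already made PROT_NONE (with
the old TLB entries flushed) before migration started, so there is
normally no stale TLB entry left to shoot down. On x86, pte_accessible()
looks roughly like this (paraphrased from arch/x86/include/asm/pgtable.h,
not verbatim):

static inline bool pte_accessible(struct mm_struct *mm, pte_t a)
{
	/* A present PTE may be cached in the TLB. */
	if (pte_flags(a) & _PAGE_PRESENT)
		return true;

	/* A PROT_NONE PTE may still be cached if a deferred flush for
	 * this mm has not finished yet. */
	if ((pte_flags(a) & _PAGE_PROTNONE) &&
	    atomic_read(&mm->tlb_flush_pending))
		return true;

	return false;
}

With the check in place, a batched flush is queued only when a stale TLB
entry could actually exist.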


2023-03-21 03:24:53

by kernel test robot

Subject: Re: [linus:master] [migrate_pages] 7e12beb8ca: vm-scalability.throughput -3.4% regression

Hi Ying,

On Mon, 2023-03-20 at 15:58 +0800, Huang, Ying wrote:
> Hi, Yujie,
>
> kernel test robot <[email protected]> writes:
>
> > Hello,
> >
> > FYI, we noticed a -3.4% regression of vm-scalability.throughput due to commit:
> >
> > commit: 7e12beb8ca2ac98b2ec42e0ea4b76cdc93b58654 ("migrate_pages: batch flushing TLB")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > in testcase: vm-scalability
> > on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
> > with following parameters:
> >
> >         runtime: 300s
> >         size: 512G
> >         test: anon-cow-rand-mt
> >         cpufreq_governor: performance
> >
> > test-description: The motivation behind this suite is to exercise functions and regions of the mm/ subsystem of the Linux kernel that are of interest to us.
> > test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
> >
> >
> > If you fix the issue, kindly add the following tags:
> > > Reported-by: kernel test robot <[email protected]>
> > > Link: https://lore.kernel.org/oe-lkp/[email protected]
> >
>
> Thanks a lot for the report! Could you check whether the debug patch
> below resolves the regression?

We've tested the patch and found that the throughput score was partially
restored, from -3.6% to -1.4%, so a slight performance drop remains.
Please check the detailed data as follows:

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/512G/lkp-csl-2sp3/anon-cow-rand-mt/vm-scalability

commit:
ebe75e4751063 ("migrate_pages: share more code between _unmap and _move")
7e12beb8ca2ac ("migrate_pages: batch flushing TLB")
9a30245d65679 ("dbg, rmap: avoid flushing TLB in batch if PTE is inaccessible")

ebe75e4751063dce 7e12beb8ca2ac98b2ec42e0ea4b 9a30245d656794d171cd798a2be
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
57634 -3.5% 55603 -1.5% 56788 vm-scalability.median
81.16 ± 12% -5.0 76.17 ± 35% -20.0 61.18 ± 21% vm-scalability.stddev%
5528051 -3.6% 5328506 -1.4% 5449450 vm-scalability.throughput
200293 ± 3% -7.3% 185675 ± 2% -4.3% 191707 ± 2% vm-scalability.time.involuntary_context_switches
67952989 ± 5% +43.1% 97269013 ± 2% +35.6% 92147668 ± 3% vm-scalability.time.minor_page_faults
9006 -1.8% 8844 -0.6% 8956 vm-scalability.time.percent_of_cpu_this_job_got
1178 ± 3% +57.2% 1852 ± 3% +8.6% 1278 ± 3% vm-scalability.time.system_time
26327 -4.5% 25132 -1.0% 26056 vm-scalability.time.user_time
11378 ± 5% +359.9% 52332 ± 7% +118.5% 24867 ± 7% vm-scalability.time.voluntary_context_switches
1.662e+09 -3.7% 1.601e+09 -1.5% 1.638e+09 vm-scalability.workload
79922 ± 3% +9.3% 87378 ± 3% +3.3% 82589 ± 8% numa-meminfo.node1.SUnreclaim
399014 ±192% -84.9% 60246 ±129% -13.6% 344869 ±239% numa-meminfo.node1.Unevictable
2022 ± 3% +11.6% 2257 +3.6% 2095 vmstat.system.cs
539357 ± 2% +187.0% 1547747 ± 8% +32.9% 716886 ± 4% vmstat.system.in
0.00 ±184% +0.0 0.00 ± 6% +0.0 0.00 ± 25% mpstat.cpu.all.iowait%
2.58 +1.7 4.27 ± 4% +0.5 3.09 ± 3% mpstat.cpu.all.irq%
4.06 ± 3% +2.3 6.36 ± 3% +0.3 4.40 ± 3% mpstat.cpu.all.sys%
19980 ± 3% +9.3% 21844 ± 3% +3.3% 20646 ± 8% numa-vmstat.node1.nr_slab_unreclaimable
99752 ±192% -84.9% 15061 ±129% -13.6% 86216 ±239% numa-vmstat.node1.nr_unevictable
99752 ±192% -84.9% 15061 ±129% -13.6% 86216 ±239% numa-vmstat.node1.nr_zone_unevictable
205569 ± 7% +131.1% 475135 ± 99% +66.5% 342364 ± 91% turbostat.C1
1.382e+09 ± 2% +140.0% 3.317e+09 ± 5% +30.4% 1.803e+09 ± 3% turbostat.IRQ
9095 ± 14% +446.4% 49695 ± 7% +149.0% 22643 ± 11% turbostat.POLL
86.84 -2.4% 84.76 -1.4% 85.63 turbostat.RAMWatt
200293 ± 3% -7.3% 185675 ± 2% -4.3% 191707 ± 2% time.involuntary_context_switches
67.11 ± 56% -92.3% 5.17 ± 55% -95.4% 3.11 ± 80% time.major_page_faults
67952989 ± 5% +43.1% 97269013 ± 2% +35.6% 92147668 ± 3% time.minor_page_faults
9006 -1.8% 8844 -0.6% 8956 time.percent_of_cpu_this_job_got
1178 ± 3% +57.2% 1852 ± 3% +8.6% 1278 ± 3% time.system_time
26327 -4.5% 25132 -1.0% 26056 time.user_time
11378 ± 5% +359.9% 52332 ± 7% +118.5% 24867 ± 7% time.voluntary_context_switches
143480 ± 3% -20.9% 113504 ± 11% -12.0% 126262 ± 4% sched_debug.cfs_rq:/.min_vruntime.stddev
548123 ± 7% -49.1% 279239 ± 34% -20.7% 434543 ± 9% sched_debug.cfs_rq:/.spread0.avg
655329 ± 6% -36.3% 417735 ± 22% -16.2% 549218 ± 6% sched_debug.cfs_rq:/.spread0.max
143388 ± 3% -20.8% 113612 ± 11% -11.9% 126295 ± 4% sched_debug.cfs_rq:/.spread0.stddev
39.81 ± 28% +45.0% 57.73 ± 19% +17.8% 46.89 ± 44% sched_debug.cfs_rq:/.util_est_enqueued.stddev
240478 ± 6% -12.9% 209367 ± 7% -12.0% 211715 ± 5% sched_debug.cpu.avg_idle.avg
1597 +10.4% 1763 ± 3% +2.3% 1633 sched_debug.cpu.clock_task.stddev
1938 ± 5% +29.1% 2503 +11.4% 2160 ± 3% sched_debug.cpu.nr_switches.min
39960890 ± 6% +68.3% 67272793 ± 2% +54.7% 61837739 ± 4% proc-vmstat.numa_hint_faults
19987976 ± 6% +68.7% 33722069 ± 2% +55.1% 30996483 ± 4% proc-vmstat.numa_hint_faults_local
28840932 ± 3% +6.9% 30817082 ± 5% +8.0% 31160418 ± 4% proc-vmstat.numa_hit
28753783 ± 3% +6.9% 30727992 ± 5% +8.1% 31074486 ± 4% proc-vmstat.numa_local
19745743 ± 5% +10.0% 21720583 ± 7% +11.8% 22080123 ± 6% proc-vmstat.numa_pages_migrated
40107839 ± 6% +68.1% 67430626 ± 2% +54.6% 61988683 ± 4% proc-vmstat.numa_pte_updates
37158989 ± 2% +5.3% 39124260 ± 3% +6.3% 39482935 ± 3% proc-vmstat.pgalloc_normal
68856116 ± 5% +42.6% 98184580 ± 2% +35.1% 93057570 ± 3% proc-vmstat.pgfault
19745743 ± 5% +10.0% 21720583 ± 7% +11.8% 22080123 ± 6% proc-vmstat.pgmigrate_success
19754280 ± 5% +10.0% 21735325 ± 7% +11.8% 22080663 ± 6% proc-vmstat.pgreuse
0.17 ± 7% +0.1 0.23 ± 3% +0.0 0.18 ± 5% perf-stat.i.branch-miss-rate%
8953845 ± 3% +61.0% 14417578 ± 3% +13.3% 10142474 ± 2% perf-stat.i.branch-misses
66.30 -1.8 64.47 -0.3 65.98 perf-stat.i.cache-miss-rate%
1904 ± 3% +12.3% 2139 +3.9% 1979 perf-stat.i.context-switches
158.09 +11.3% 175.92 ± 3% +7.5% 170.00 ± 2% perf-stat.i.cpu-migrations
0.04 ± 9% +0.0 0.05 ± 11% +0.0 0.04 ± 7% perf-stat.i.dTLB-load-miss-rate%
4856144 ± 8% +41.5% 6870029 ± 9% +12.3% 5455416 ± 7% perf-stat.i.dTLB-load-misses
9.10 -0.4 8.71 -0.1 8.97 perf-stat.i.dTLB-store-miss-rate%
5.33e+08 -4.4% 5.095e+08 -1.8% 5.233e+08 perf-stat.i.dTLB-store-misses
2454429 ± 2% +159.7% 6374895 ± 7% +26.7% 3110501 ± 5% perf-stat.i.iTLB-load-misses
116140 ± 2% +60.9% 186840 ± 7% -3.6% 111933 ± 4% perf-stat.i.iTLB-loads
41691 ± 5% -23.0% 32083 ± 26% +1.7% 42380 ± 20% perf-stat.i.instructions-per-iTLB-miss
0.31 ± 38% -59.1% 0.13 ± 27% -68.9% 0.10 ± 31% perf-stat.i.major-faults
224958 ± 5% +42.4% 320417 ± 2% +35.4% 304571 ± 3% perf-stat.i.minor-faults
50.61 +1.6 52.22 +0.7 51.35 perf-stat.i.node-load-miss-rate%
1.169e+08 +3.3% 1.208e+08 +0.9% 1.179e+08 perf-stat.i.node-load-misses
1.132e+08 -3.7% 1.089e+08 -2.1% 1.108e+08 perf-stat.i.node-loads
2.688e+08 -3.9% 2.582e+08 -1.8% 2.64e+08 perf-stat.i.node-store-misses
2.664e+08 -4.5% 2.543e+08 -1.7% 2.618e+08 perf-stat.i.node-stores
224959 ± 5% +42.4% 320418 ± 2% +35.4% 304571 ± 3% perf-stat.i.page-faults
0.08 ± 4% +0.0 0.12 ± 4% +0.0 0.09 ± 3% perf-stat.overall.branch-miss-rate%
67.15 -1.9 65.28 -0.5 66.64 perf-stat.overall.cache-miss-rate%
366.74 +2.9% 377.43 +1.2% 371.26 perf-stat.overall.cycles-between-cache-misses
0.03 ± 8% +0.0 0.05 ± 10% +0.0 0.04 ± 8% perf-stat.overall.dTLB-load-miss-rate%
9.38 -0.4 8.97 -0.1 9.25 perf-stat.overall.dTLB-store-miss-rate%
95.49 +1.7 97.16 +1.0 96.53 perf-stat.overall.iTLB-load-miss-rate%
20490 ± 3% -61.8% 7826 ± 7% -21.5% 16077 ± 6% perf-stat.overall.instructions-per-iTLB-miss
50.81 +1.8 52.60 +0.8 51.56 perf-stat.overall.node-load-miss-rate%
9210 +3.0% 9485 +0.7% 9271 perf-stat.overall.path-length
8906114 ± 3% +61.8% 14412101 ± 3% +13.3% 10090374 ± 2% perf-stat.ps.branch-misses
1906 ± 3% +12.3% 2142 +3.8% 1979 perf-stat.ps.context-switches
157.57 +11.7% 176.03 ± 3% +7.6% 169.49 ± 2% perf-stat.ps.cpu-migrations
4843373 ± 8% +41.9% 6871859 ± 9% +12.3% 5440606 ± 7% perf-stat.ps.dTLB-load-misses
5.313e+08 -4.4% 5.077e+08 -1.8% 5.218e+08 perf-stat.ps.dTLB-store-misses
2444301 ± 2% +161.3% 6385873 ± 7% +26.8% 3098710 ± 5% perf-stat.ps.iTLB-load-misses
115384 ± 2% +61.5% 186290 ± 7% -3.7% 111109 ± 4% perf-stat.ps.iTLB-loads
0.31 ± 38% -59.0% 0.13 ± 27% -68.8% 0.10 ± 31% perf-stat.ps.major-faults
224444 ± 5% +42.8% 320615 ± 2% +35.3% 303619 ± 3% perf-stat.ps.minor-faults
1.165e+08 +3.4% 1.205e+08 +0.9% 1.176e+08 perf-stat.ps.node-load-misses
1.128e+08 -3.8% 1.086e+08 -2.1% 1.105e+08 perf-stat.ps.node-loads
2.68e+08 -4.0% 2.573e+08 -1.8% 2.632e+08 perf-stat.ps.node-store-misses
2.656e+08 -4.6% 2.534e+08 -1.7% 2.61e+08 perf-stat.ps.node-stores
224444 ± 5% +42.8% 320615 ± 2% +35.3% 303620 ± 3% perf-stat.ps.page-faults
19.08 ± 10% -1.7 17.34 ± 4% +0.5 19.59 perf-profile.calltrace.cycles-pp.nrand48_r
1.26 ± 15% -1.3 0.00 -1.3 0.00 perf-profile.calltrace.cycles-pp.migrate_folio_unmap.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
1.14 ± 15% -1.1 0.00 -1.1 0.00 perf-profile.calltrace.cycles-pp.try_to_migrate.migrate_folio_unmap.migrate_pages_batch.migrate_pages.migrate_misplaced_page
1.12 ± 15% -1.1 0.00 -1.1 0.00 perf-profile.calltrace.cycles-pp.rmap_walk_anon.try_to_migrate.migrate_folio_unmap.migrate_pages_batch.migrate_pages
1.08 ± 15% -1.1 0.00 -1.1 0.00 perf-profile.calltrace.cycles-pp.try_to_migrate_one.rmap_walk_anon.try_to_migrate.migrate_folio_unmap.migrate_pages_batch
0.92 ± 15% -0.9 0.00 -0.9 0.00 perf-profile.calltrace.cycles-pp.ptep_clear_flush.try_to_migrate_one.rmap_walk_anon.try_to_migrate.migrate_folio_unmap
0.91 ± 15% -0.9 0.00 -0.9 0.00 perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.ptep_clear_flush.try_to_migrate_one.rmap_walk_anon.try_to_migrate
0.91 ± 15% -0.9 0.00 -0.9 0.00 perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.ptep_clear_flush.try_to_migrate_one.rmap_walk_anon
0.91 ± 15% -0.9 0.00 -0.9 0.00 perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.ptep_clear_flush.try_to_migrate_one
6.40 ± 9% -0.5 5.94 ± 4% +0.1 6.54 perf-profile.calltrace.cycles-pp.lrand48_r
0.26 ±112% -0.3 0.00 -0.3 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock.do_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.19 ±141% -0.2 0.00 -0.2 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.do_numa_page.__handle_mm_fault.handle_mm_fault
4.13 ± 3% -0.1 4.04 -0.0 4.12 perf-profile.calltrace.cycles-pp.do_rw_once
0.06 ±282% -0.1 0.00 -0.1 0.00 perf-profile.calltrace.cycles-pp.rmap_walk_anon.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
0.13 ±188% +0.1 0.24 ±144% -0.0 0.11 ±187% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.nrand48_r
0.00 +0.1 0.10 ±223% +0.0 0.00 perf-profile.calltrace.cycles-pp.update_load_avg.task_tick_fair.scheduler_tick.update_process_times.tick_sched_handle
0.00 +0.1 0.11 ±223% +0.0 0.00 perf-profile.calltrace.cycles-pp.update_curr.task_tick_fair.scheduler_tick.update_process_times.tick_sched_handle
0.07 ±282% +0.1 0.21 ±144% -0.1 0.00 perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.nrand48_r
0.07 ±282% +0.1 0.21 ±144% -0.1 0.00 perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.nrand48_r
0.07 ±282% +0.1 0.22 ±144% -0.1 0.00 perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.nrand48_r
0.00 +0.2 0.17 ±141% +0.0 0.00 perf-profile.calltrace.cycles-pp.__default_send_IPI_dest_field.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush
0.00 +0.3 0.26 ±100% +0.0 0.00 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.nrand48_r
0.00 +0.4 0.36 ± 70% +0.1 0.06 ±282% perf-profile.calltrace.cycles-pp.migrate_folio_extra.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced_page
0.00 +0.4 0.36 ± 70% +0.1 0.06 ±282% perf-profile.calltrace.cycles-pp.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
1.44 ± 28% +0.5 1.94 ± 61% +0.1 1.51 ± 25% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.do_access
1.43 ± 29% +0.5 1.93 ± 61% +0.1 1.50 ± 25% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.do_access
0.55 ± 69% +0.5 1.08 ± 69% +0.0 0.60 ± 56% perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues
1.34 ± 39% +0.6 1.90 ± 69% +0.0 1.35 ± 25% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.17 ±196% +0.6 0.73 ± 85% +0.2 0.33 ± 89% perf-profile.calltrace.cycles-pp.task_tick_fair.scheduler_tick.update_process_times.tick_sched_handle.tick_sched_timer
1.72 ± 25% +0.6 2.30 ± 48% +0.1 1.80 ± 22% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.do_access
1.08 ± 31% +0.6 1.66 ± 72% +0.1 1.13 ± 26% perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt
1.52 ± 28% +0.6 2.11 ± 52% +0.1 1.58 ± 25% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.do_access
1.09 ± 31% +0.6 1.68 ± 72% +0.1 1.14 ± 26% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
1.18 ± 30% +0.6 1.78 ± 70% +0.1 1.24 ± 26% perf-profile.calltrace.cycles-pp.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
0.00 +0.6 0.60 ± 8% +0.0 0.00 perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush
0.00 +0.6 0.64 ± 7% +0.0 0.00 perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
0.00 +0.9 0.90 ± 10% +0.0 0.00 perf-profile.calltrace.cycles-pp.llist_reverse_order.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function
72.48 ± 3% +1.4 73.88 -0.7 71.79 perf-profile.calltrace.cycles-pp.do_access
0.00 +1.9 1.86 ± 9% +0.3 0.26 ±113% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_access
0.00 +1.9 1.87 ± 8% +0.3 0.26 ±113% perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_access
0.00 +1.9 1.94 ± 8% +0.3 0.33 ± 91% perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.do_access
0.00 +2.6 2.59 ± 9% +0.6 0.59 ± 40% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.do_access
0.00 +2.8 2.80 ± 8% +0.9 0.90 ± 18% perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush
3.30 ± 15% +6.6 9.88 ± 7% +0.9 4.18 ± 19% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
3.34 ± 15% +6.6 9.94 ± 7% +0.9 4.22 ± 19% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
3.03 ± 15% +6.7 9.69 ± 7% +1.0 4.03 ± 19% perf-profile.calltrace.cycles-pp.do_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
3.68 ± 15% +6.8 10.48 ± 7% +0.9 4.63 ± 19% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
3.70 ± 15% +6.8 10.49 ± 7% +0.9 4.64 ± 19% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
3.89 ± 14% +6.8 10.71 ± 7% +1.0 4.85 ± 19% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
2.46 ± 15% +7.0 9.46 ± 7% +1.4 3.85 ± 19% perf-profile.calltrace.cycles-pp.migrate_misplaced_page.do_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
2.27 ± 15% +7.0 9.28 ± 7% +1.4 3.67 ± 19% perf-profile.calltrace.cycles-pp.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page.__handle_mm_fault
2.27 ± 15% +7.0 9.29 ± 7% +1.4 3.68 ± 19% perf-profile.calltrace.cycles-pp.migrate_pages.migrate_misplaced_page.do_numa_page.__handle_mm_fault.handle_mm_fault
0.00 +7.5 7.50 ± 7% +2.4 2.38 ± 18% perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch
0.00 +7.6 7.56 ± 7% +2.4 2.40 ± 18% perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch.migrate_pages
0.00 +7.6 7.57 ± 8% +2.4 2.40 ± 18% perf-profile.calltrace.cycles-pp.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch.migrate_pages.migrate_misplaced_page
0.00 +7.6 7.57 ± 7% +2.4 2.40 ± 18% perf-profile.calltrace.cycles-pp.try_to_unmap_flush.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
16.69 ± 10% -1.3 15.43 ± 5% +0.5 17.16 perf-profile.children.cycles-pp.nrand48_r
1.51 ± 16% -1.1 0.42 ± 9% -1.2 0.31 ± 20% perf-profile.children.cycles-pp.rmap_walk_anon
1.25 ± 16% -1.0 0.30 ± 9% -1.0 0.29 ± 20% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
0.92 ± 15% -0.9 0.00 -0.9 0.00 perf-profile.children.cycles-pp.ptep_clear_flush
0.92 ± 15% -0.9 0.00 -0.9 0.00 perf-profile.children.cycles-pp.flush_tlb_mm_range
9.27 ± 8% -0.9 8.37 ± 4% +0.2 9.45 perf-profile.children.cycles-pp.lrand48_r
1.08 ± 15% -0.9 0.18 ± 6% -1.0 0.12 ± 21% perf-profile.children.cycles-pp.try_to_migrate_one
1.14 ± 15% -0.9 0.26 ± 8% -0.9 0.19 ± 19% perf-profile.children.cycles-pp.try_to_migrate
1.05 ± 15% -0.8 0.21 ± 11% -0.9 0.16 ± 16% perf-profile.children.cycles-pp._raw_spin_lock
1.26 ± 15% -0.8 0.42 ± 8% -0.9 0.34 ± 21% perf-profile.children.cycles-pp.migrate_folio_unmap
0.46 ± 15% -0.3 0.14 ± 13% -0.3 0.11 ± 20% perf-profile.children.cycles-pp.page_vma_mapped_walk
0.34 ± 15% -0.2 0.11 ± 11% -0.3 0.08 ± 18% perf-profile.children.cycles-pp.remove_migration_pte
0.14 ± 16% -0.1 0.00 -0.1 0.00 perf-profile.children.cycles-pp.handle_pte_fault
4.37 ± 3% -0.1 4.29 -0.0 4.36 perf-profile.children.cycles-pp.do_rw_once
0.13 ± 22% -0.1 0.07 ± 11% -0.0 0.09 ± 23% perf-profile.children.cycles-pp.folio_lruvec_lock_irq
0.13 ± 22% -0.1 0.08 ± 10% -0.0 0.09 ± 22% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.33 ± 2% -0.0 0.30 -0.0 0.32 ± 2% perf-profile.children.cycles-pp.lrand48_r@plt
0.17 ± 21% -0.0 0.14 ± 9% -0.0 0.15 ± 21% perf-profile.children.cycles-pp.folio_isolate_lru
0.02 ±112% -0.0 0.00 +0.0 0.03 ±111% perf-profile.children.cycles-pp.timerqueue_del
0.19 ± 20% -0.0 0.17 ± 8% -0.0 0.17 ± 20% perf-profile.children.cycles-pp.numamigrate_isolate_page
0.06 ± 13% -0.0 0.04 ± 45% -0.0 0.05 ± 37% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.06 ± 13% -0.0 0.04 ± 45% -0.0 0.05 ± 37% perf-profile.children.cycles-pp.do_syscall_64
0.01 ±193% -0.0 0.00 -0.0 0.01 ±188% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.09 ± 20% -0.0 0.08 ± 47% +0.0 0.09 ± 23% perf-profile.children.cycles-pp.tick_sched_do_timer
0.07 ± 39% -0.0 0.06 ± 45% +0.0 0.07 ± 28% perf-profile.children.cycles-pp.ktime_get_update_offsets_now
0.01 ±282% -0.0 0.00 -0.0 0.01 ±282% perf-profile.children.cycles-pp.perf_rotate_context
0.02 ±111% -0.0 0.02 ±142% +0.0 0.03 ±112% perf-profile.children.cycles-pp.irqtime_account_process_tick
0.06 ± 39% -0.0 0.06 ± 8% +0.0 0.07 ± 21% perf-profile.children.cycles-pp.rmqueue_bulk
0.00 +0.0 0.00 +0.0 0.01 ±282% perf-profile.children.cycles-pp.__free_one_page
0.00 +0.0 0.00 +0.0 0.01 ±187% perf-profile.children.cycles-pp.lru_add_fn
0.07 ± 27% +0.0 0.07 ± 47% -0.0 0.06 ± 55% perf-profile.children.cycles-pp.ktime_get
0.09 ± 15% +0.0 0.10 ± 8% +0.0 0.11 ± 21% perf-profile.children.cycles-pp.rmqueue
0.09 ± 39% +0.0 0.10 ± 50% -0.0 0.07 ± 75% perf-profile.children.cycles-pp.cpuacct_account_field
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.children.cycles-pp.run_posix_cpu_timers
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.children.cycles-pp.nohz_balance_exit_idle
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.children.cycles-pp.reweight_entity
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.children.cycles-pp.__hrtimer_next_event_base
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.children.cycles-pp.nohz_balancer_kick
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.children.cycles-pp.trigger_load_balance
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.children.cycles-pp.check_cpu_stall
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.children.cycles-pp.perf_event_task_tick
0.09 ± 16% +0.0 0.10 ± 7% +0.0 0.11 ± 22% perf-profile.children.cycles-pp.__alloc_pages
0.09 ± 16% +0.0 0.10 ± 10% +0.0 0.11 ± 21% perf-profile.children.cycles-pp.get_page_from_freelist
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.children.cycles-pp.acct_account_cputime
0.09 ± 18% +0.0 0.10 ± 7% +0.0 0.11 ± 22% perf-profile.children.cycles-pp.__folio_alloc
0.01 ±282% +0.0 0.02 ±142% -0.0 0.00 perf-profile.children.cycles-pp.rcu_core
0.32 ± 19% +0.0 0.34 ± 45% +0.0 0.33 ± 32% perf-profile.children.cycles-pp.account_user_time
0.12 ± 95% +0.0 0.14 ± 6% -0.0 0.11 ± 16% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
0.09 ± 18% +0.0 0.11 ± 9% +0.0 0.11 ± 22% perf-profile.children.cycles-pp.alloc_misplaced_dst_page
0.06 ± 18% +0.0 0.08 ± 69% +0.0 0.07 ± 41% perf-profile.children.cycles-pp.rcu_pending
0.00 +0.0 0.02 ±141% +0.0 0.00 perf-profile.children.cycles-pp.set_tlb_ubc_flush_pending
0.00 +0.0 0.02 ±141% +0.0 0.00 perf-profile.children.cycles-pp.folio_lock_anon_vma_read
0.00 +0.0 0.02 ±141% +0.0 0.01 ±282% perf-profile.children.cycles-pp.folio_get_anon_vma
0.06 ± 18% +0.0 0.08 ± 9% +0.0 0.06 ± 19% perf-profile.children.cycles-pp.mt_find
0.21 ± 17% +0.0 0.23 ± 8% -0.0 0.21 ± 18% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.06 ± 16% +0.0 0.08 ± 8% +0.0 0.08 ± 21% perf-profile.children.cycles-pp.free_unref_page
0.06 ± 18% +0.0 0.08 ± 11% +0.0 0.06 ± 20% perf-profile.children.cycles-pp.find_vma
0.11 ± 16% +0.0 0.12 ± 66% +0.0 0.13 ± 29% perf-profile.children.cycles-pp.__cgroup_account_cputime_field
0.01 ±282% +0.0 0.03 ±102% -0.0 0.00 perf-profile.children.cycles-pp.lapic_next_deadline
0.03 ± 71% +0.0 0.06 ± 8% +0.0 0.05 ± 39% perf-profile.children.cycles-pp.free_pcppages_bulk
0.02 ±209% +0.0 0.04 ±103% -0.0 0.02 ±142% perf-profile.children.cycles-pp.update_cfs_group
0.01 ±282% +0.0 0.03 ±105% -0.0 0.00 perf-profile.children.cycles-pp.hrtimer_update_next_event
0.05 ± 43% +0.0 0.08 ± 61% -0.0 0.05 ± 57% perf-profile.children.cycles-pp.update_irq_load_avg
0.00 +0.0 0.02 ± 99% +0.0 0.00 perf-profile.children.cycles-pp.__perf_sw_event
0.08 ± 15% +0.0 0.10 ± 10% +0.0 0.10 ± 21% perf-profile.children.cycles-pp.__list_del_entry_valid
0.09 ± 47% +0.0 0.12 ± 70% -0.0 0.08 ± 43% perf-profile.children.cycles-pp.hrtimer_active
0.01 ±282% +0.0 0.03 ±106% -0.0 0.00 perf-profile.children.cycles-pp.update_min_vruntime
0.08 ± 18% +0.0 0.11 ± 68% +0.0 0.09 ± 26% perf-profile.children.cycles-pp.rcu_sched_clock_irq
0.07 ± 35% +0.0 0.10 ± 33% +0.0 0.08 ± 26% perf-profile.children.cycles-pp.clockevents_program_event
0.01 ±282% +0.0 0.04 ±110% -0.0 0.00 perf-profile.children.cycles-pp.timerqueue_add
0.04 ± 91% +0.0 0.07 ± 50% +0.0 0.06 ± 38% perf-profile.children.cycles-pp.arch_scale_freq_tick
0.02 ±154% +0.0 0.06 ± 74% +0.0 0.03 ± 92% perf-profile.children.cycles-pp.__do_softirq
0.00 +0.0 0.04 ± 71% +0.0 0.02 ±142% perf-profile.children.cycles-pp.can_change_pte_writable
0.01 ±282% +0.0 0.04 ±107% -0.0 0.00 perf-profile.children.cycles-pp.enqueue_hrtimer
0.00 +0.0 0.04 ± 44% +0.0 0.00 perf-profile.children.cycles-pp.tlb_is_not_lazy
0.00 +0.0 0.04 ± 45% +0.0 0.00 perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.15 ± 20% +0.0 0.20 ± 8% -0.0 0.15 ± 21% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
0.11 ± 25% +0.0 0.16 ± 64% +0.0 0.11 ± 25% perf-profile.children.cycles-pp.update_rq_clock
0.03 ±118% +0.1 0.08 ± 58% +0.0 0.05 ± 59% perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
0.03 ±127% +0.1 0.09 ± 84% +0.0 0.04 ± 72% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.00 +0.1 0.06 ± 9% +0.0 0.02 ±142% perf-profile.children.cycles-pp.folio_migrate_flags
0.03 ±152% +0.1 0.09 ± 68% +0.0 0.04 ± 72% perf-profile.children.cycles-pp.__update_load_avg_se
0.00 +0.1 0.07 ± 8% +0.0 0.00 perf-profile.children.cycles-pp.native_sched_clock
0.05 ± 36% +0.1 0.12 ± 8% +0.1 0.10 ± 18% perf-profile.children.cycles-pp.exit_to_user_mode_loop
0.06 ± 13% +0.1 0.13 ± 8% +0.1 0.11 ± 16% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.16 ± 13% +0.1 0.24 ± 10% +0.0 0.18 ± 19% perf-profile.children.cycles-pp.up_read
0.00 +0.1 0.08 ± 10% +0.0 0.00 perf-profile.children.cycles-pp.sched_clock_cpu
0.02 ±141% +0.1 0.10 ± 8% +0.0 0.05 ± 42% perf-profile.children.cycles-pp.uncharge_batch
0.01 ±282% +0.1 0.09 ± 12% +0.0 0.04 ± 75% perf-profile.children.cycles-pp.page_counter_uncharge
0.04 ± 71% +0.1 0.12 ± 8% +0.1 0.10 ± 18% perf-profile.children.cycles-pp.task_work_run
0.00 +0.1 0.09 ± 10% +0.0 0.01 ±282% perf-profile.children.cycles-pp._find_next_bit
0.02 ±141% +0.1 0.10 ± 10% +0.0 0.06 ± 44% perf-profile.children.cycles-pp.__mem_cgroup_uncharge
0.02 ±141% +0.1 0.10 ± 10% +0.0 0.06 ± 44% perf-profile.children.cycles-pp.__folio_put
0.19 ± 17% +0.1 0.28 ± 11% +0.0 0.21 ± 18% perf-profile.children.cycles-pp.down_read_trylock
0.03 ± 90% +0.1 0.12 ± 8% +0.1 0.10 ± 16% perf-profile.children.cycles-pp.change_pte_range
0.03 ± 90% +0.1 0.12 ± 8% +0.1 0.10 ± 18% perf-profile.children.cycles-pp.task_numa_work
0.03 ± 90% +0.1 0.12 ± 8% +0.1 0.10 ± 18% perf-profile.children.cycles-pp.change_prot_numa
0.03 ± 90% +0.1 0.12 ± 8% +0.1 0.10 ± 18% perf-profile.children.cycles-pp.change_protection_range
0.03 ± 90% +0.1 0.12 ± 8% +0.1 0.10 ± 18% perf-profile.children.cycles-pp.change_pmd_range
0.21 ± 19% +0.1 0.31 ± 8% +0.0 0.22 ± 21% perf-profile.children.cycles-pp.folio_batch_move_lru
0.02 ±142% +0.1 0.12 ± 6% +0.0 0.04 ± 72% perf-profile.children.cycles-pp.irqtime_account_irq
0.08 ± 36% +0.1 0.18 ± 24% +0.0 0.09 ± 24% perf-profile.children.cycles-pp.__irq_exit_rcu
0.21 ± 19% +0.1 0.31 ± 8% +0.0 0.22 ± 20% perf-profile.children.cycles-pp.lru_add_drain
0.21 ± 19% +0.1 0.31 ± 8% +0.0 0.22 ± 20% perf-profile.children.cycles-pp.lru_add_drain_cpu
0.03 ± 71% +0.1 0.14 ± 8% +0.0 0.08 ± 25% perf-profile.children.cycles-pp.mem_cgroup_migrate
0.01 ±187% +0.1 0.13 ± 6% +0.1 0.07 ± 26% perf-profile.children.cycles-pp.page_counter_charge
0.17 ± 13% +0.1 0.30 ± 9% +0.1 0.24 ± 19% perf-profile.children.cycles-pp.folio_copy
0.17 ± 14% +0.1 0.30 ± 9% +0.1 0.23 ± 20% perf-profile.children.cycles-pp.copy_page
0.09 ± 7% +0.2 0.24 ± 9% +0.0 0.11 ± 14% perf-profile.children.cycles-pp.sync_regs
0.21 ± 48% +0.2 0.39 ± 65% +0.0 0.22 ± 28% perf-profile.children.cycles-pp.update_load_avg
0.25 ± 39% +0.2 0.43 ± 61% +0.0 0.27 ± 25% perf-profile.children.cycles-pp.update_curr
0.25 ± 12% +0.3 0.51 ± 8% +0.1 0.36 ± 20% perf-profile.children.cycles-pp.migrate_folio_extra
0.25 ± 12% +0.3 0.51 ± 8% +0.1 0.36 ± 20% perf-profile.children.cycles-pp.move_to_new_folio
0.11 ± 20% +0.3 0.40 ± 7% +0.0 0.16 ± 15% perf-profile.children.cycles-pp.native_irq_return_iret
0.06 ± 40% +0.4 0.47 ± 9% +0.1 0.13 ± 23% perf-profile.children.cycles-pp.__default_send_IPI_dest_field
0.00 +0.4 0.44 ± 9% +0.1 0.12 ± 22% perf-profile.children.cycles-pp.native_flush_tlb_local
0.68 ± 45% +0.5 1.16 ± 62% +0.0 0.71 ± 28% perf-profile.children.cycles-pp.task_tick_fair
0.08 ± 16% +0.5 0.62 ± 9% +0.1 0.17 ± 21% perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
0.96 ± 40% +0.6 1.57 ± 60% +0.0 1.00 ± 27% perf-profile.children.cycles-pp.scheduler_tick
1.56 ± 32% +0.7 2.26 ± 55% +0.1 1.64 ± 25% perf-profile.children.cycles-pp.update_process_times
1.58 ± 32% +0.7 2.29 ± 55% +0.1 1.65 ± 25% perf-profile.children.cycles-pp.tick_sched_handle
1.71 ± 31% +0.7 2.42 ± 54% +0.1 1.79 ± 25% perf-profile.children.cycles-pp.tick_sched_timer
1.85 ± 30% +0.7 2.60 ± 52% +0.1 1.94 ± 25% perf-profile.children.cycles-pp.__hrtimer_run_queues
2.09 ± 29% +0.8 2.86 ± 50% +0.1 2.18 ± 24% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
2.06 ± 29% +0.8 2.85 ± 50% +0.1 2.16 ± 24% perf-profile.children.cycles-pp.hrtimer_interrupt
2.48 ± 26% +0.8 3.28 ± 45% +0.1 2.60 ± 22% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
2.19 ± 29% +0.8 2.99 ± 49% +0.1 2.29 ± 24% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.09 ± 17% +1.2 1.32 ± 7% +0.4 0.45 ± 21% perf-profile.children.cycles-pp.flush_tlb_func
0.25 ± 14% +1.6 1.85 ± 9% +0.3 0.55 ± 18% perf-profile.children.cycles-pp.llist_reverse_order
72.83 ± 3% +1.9 74.77 -0.6 72.25 perf-profile.children.cycles-pp.do_access
0.40 ± 15% +2.5 2.86 ± 8% +0.5 0.93 ± 18% perf-profile.children.cycles-pp.llist_add_batch
0.41 ± 14% +3.3 3.76 ± 8% +0.7 1.14 ± 19% perf-profile.children.cycles-pp.__sysvec_call_function
0.41 ± 14% +3.4 3.76 ± 8% +0.7 1.14 ± 19% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.43 ± 14% +3.5 3.90 ± 8% +0.7 1.17 ± 19% perf-profile.children.cycles-pp.sysvec_call_function
0.55 ± 12% +4.4 4.95 ± 8% +0.9 1.40 ± 19% perf-profile.children.cycles-pp.asm_sysvec_call_function
3.31 ± 15% +6.6 9.89 ± 7% +0.9 4.19 ± 19% perf-profile.children.cycles-pp.__handle_mm_fault
3.34 ± 15% +6.6 9.95 ± 7% +0.9 4.23 ± 19% perf-profile.children.cycles-pp.handle_mm_fault
3.03 ± 15% +6.7 9.69 ± 7% +1.0 4.03 ± 19% perf-profile.children.cycles-pp.do_numa_page
0.91 ± 15% +6.7 7.59 ± 7% +1.5 2.42 ± 18% perf-profile.children.cycles-pp.smp_call_function_many_cond
0.91 ± 15% +6.7 7.59 ± 7% +1.5 2.42 ± 18% perf-profile.children.cycles-pp.on_each_cpu_cond_mask
3.70 ± 15% +6.8 10.49 ± 7% +0.9 4.64 ± 19% perf-profile.children.cycles-pp.do_user_addr_fault
3.70 ± 15% +6.8 10.50 ± 7% +0.9 4.64 ± 19% perf-profile.children.cycles-pp.exc_page_fault
3.91 ± 14% +6.8 10.76 ± 7% +1.0 4.88 ± 19% perf-profile.children.cycles-pp.asm_exc_page_fault
2.46 ± 15% +7.0 9.46 ± 7% +1.4 3.85 ± 19% perf-profile.children.cycles-pp.migrate_misplaced_page
2.27 ± 15% +7.0 9.28 ± 7% +1.4 3.67 ± 19% perf-profile.children.cycles-pp.migrate_pages_batch
2.27 ± 15% +7.0 9.29 ± 7% +1.4 3.68 ± 19% perf-profile.children.cycles-pp.migrate_pages
0.00 +7.6 7.57 ± 7% +2.4 2.40 ± 18% perf-profile.children.cycles-pp.try_to_unmap_flush
0.00 +7.6 7.57 ± 7% +2.4 2.40 ± 18% perf-profile.children.cycles-pp.arch_tlbbatch_flush
66.95 ± 3% -7.7 59.28 ± 2% -2.0 64.95 perf-profile.self.cycles-pp.do_access
13.38 ± 11% -1.4 12.02 ± 4% +0.3 13.71 perf-profile.self.cycles-pp.nrand48_r
8.81 ± 9% -1.1 7.70 ± 3% +0.1 8.94 ± 2% perf-profile.self.cycles-pp.lrand48_r
1.14 ± 16% -0.9 0.28 ± 9% -0.9 0.28 ± 21% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
4.08 ± 3% -0.3 3.77 -0.0 4.03 perf-profile.self.cycles-pp.do_rw_once
0.06 ±187% -0.1 0.00 -0.1 0.00 perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
0.29 ± 4% -0.0 0.26 -0.0 0.28 ± 2% perf-profile.self.cycles-pp.lrand48_r@plt
0.12 ± 27% -0.0 0.10 ± 53% +0.0 0.13 ± 36% perf-profile.self.cycles-pp.account_user_time
0.02 ±141% -0.0 0.00 +0.0 0.02 ±112% perf-profile.self.cycles-pp.hrtimer_interrupt
0.07 ± 16% -0.0 0.07 ± 47% +0.0 0.08 ± 25% perf-profile.self.cycles-pp.tick_sched_do_timer
0.06 ± 55% -0.0 0.05 ± 46% +0.0 0.06 ± 42% perf-profile.self.cycles-pp.ktime_get_update_offsets_now
0.02 ±111% -0.0 0.02 ±142% +0.0 0.03 ±112% perf-profile.self.cycles-pp.irqtime_account_process_tick
0.01 ±188% -0.0 0.01 ±223% -0.0 0.01 ±282% perf-profile.self.cycles-pp.rmap_walk_anon
0.00 +0.0 0.00 +0.0 0.01 ±282% perf-profile.self.cycles-pp.__free_one_page
0.06 ± 42% +0.0 0.07 ± 46% +0.0 0.07 ± 43% perf-profile.self.cycles-pp.update_process_times
0.09 ± 39% +0.0 0.10 ± 50% -0.0 0.07 ± 75% perf-profile.self.cycles-pp.cpuacct_account_field
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.set_tlb_ubc_flush_pending
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.__irq_exit_rcu
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.perf_event_task_tick
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.run_posix_cpu_timers
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.nohz_balance_exit_idle
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.reweight_entity
0.00 +0.0 0.01 ±223% +0.0 0.01 ±187% perf-profile.self.cycles-pp.can_change_pte_writable
0.06 ± 14% +0.0 0.07 ± 11% -0.0 0.04 ± 72% perf-profile.self.cycles-pp.mt_find
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.trigger_load_balance
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.check_cpu_stall
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.timerqueue_add
0.00 +0.0 0.01 ±223% +0.0 0.00 perf-profile.self.cycles-pp.acct_account_cputime
0.08 ± 17% +0.0 0.09 ± 13% +0.0 0.08 ± 21% perf-profile.self.cycles-pp.page_vma_mapped_walk
0.11 ± 17% +0.0 0.13 ± 15% +0.0 0.12 ± 20% perf-profile.self.cycles-pp.__handle_mm_fault
0.01 ±282% +0.0 0.02 ± 99% +0.0 0.02 ±112% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.10 ± 16% +0.0 0.12 ± 65% +0.0 0.12 ± 29% perf-profile.self.cycles-pp.__cgroup_account_cputime_field
0.01 ±282% +0.0 0.03 ±102% -0.0 0.00 perf-profile.self.cycles-pp.lapic_next_deadline
0.01 ±282% +0.0 0.03 ±150% +0.0 0.02 ±112% perf-profile.self.cycles-pp.rcu_pending
0.02 ±209% +0.0 0.04 ±103% -0.0 0.02 ±142% perf-profile.self.cycles-pp.update_cfs_group
0.08 ± 47% +0.0 0.10 ± 68% -0.0 0.07 ± 45% perf-profile.self.cycles-pp.hrtimer_active
0.05 ± 43% +0.0 0.08 ± 61% -0.0 0.05 ± 57% perf-profile.self.cycles-pp.update_irq_load_avg
0.04 ± 94% +0.0 0.06 ± 48% +0.0 0.05 ± 56% perf-profile.self.cycles-pp.ktime_get
0.07 ± 16% +0.0 0.10 ± 10% +0.0 0.10 ± 21% perf-profile.self.cycles-pp.__list_del_entry_valid
0.01 ±282% +0.0 0.03 ±106% -0.0 0.00 perf-profile.self.cycles-pp.update_min_vruntime
0.04 ± 91% +0.0 0.07 ± 50% +0.0 0.06 ± 38% perf-profile.self.cycles-pp.arch_scale_freq_tick
0.00 +0.0 0.03 ± 70% +0.0 0.00 perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys
0.01 ±282% +0.0 0.04 ± 75% +0.0 0.02 ±112% perf-profile.self.cycles-pp.__hrtimer_run_queues
0.06 ± 49% +0.0 0.10 ± 65% -0.0 0.06 ± 56% perf-profile.self.cycles-pp.scheduler_tick
0.03 ±113% +0.0 0.07 ± 83% +0.0 0.04 ± 71% perf-profile.self.cycles-pp.update_rq_clock
0.00 +0.0 0.04 ± 44% +0.0 0.01 ±187% perf-profile.self.cycles-pp.folio_migrate_flags
0.09 ± 14% +0.0 0.14 ± 20% +0.0 0.10 ± 16% perf-profile.self.cycles-pp._raw_spin_lock
0.02 ±191% +0.0 0.06 ± 86% +0.0 0.03 ± 90% perf-profile.self.cycles-pp.__update_load_avg_se
0.03 ±118% +0.0 0.08 ± 57% +0.0 0.05 ± 59% perf-profile.self.cycles-pp.perf_trace_sched_stat_runtime
0.02 ±111% +0.1 0.08 ± 10% +0.0 0.06 ± 15% perf-profile.self.cycles-pp.change_pte_range
0.15 ± 14% +0.1 0.20 ± 10% +0.0 0.17 ± 21% perf-profile.self.cycles-pp.up_read
0.00 +0.1 0.05 ± 8% +0.0 0.01 ±188% perf-profile.self.cycles-pp.try_to_migrate_one
0.19 ± 16% +0.1 0.24 ± 11% +0.0 0.20 ± 19% perf-profile.self.cycles-pp.down_read_trylock
0.03 ±151% +0.1 0.09 ± 84% +0.0 0.04 ± 72% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.00 +0.1 0.07 ± 8% +0.0 0.00 perf-profile.self.cycles-pp._find_next_bit
0.00 +0.1 0.07 ± 8% +0.0 0.00 perf-profile.self.cycles-pp.native_sched_clock
0.00 +0.1 0.07 ± 12% +0.0 0.03 ±113% perf-profile.self.cycles-pp.page_counter_uncharge
0.09 ± 41% +0.1 0.16 ± 69% +0.0 0.09 ± 42% perf-profile.self.cycles-pp.task_tick_fair
0.11 ± 49% +0.1 0.19 ± 74% -0.0 0.11 ± 29% perf-profile.self.cycles-pp.update_load_avg
0.01 ±282% +0.1 0.11 ± 8% +0.1 0.06 ± 43% perf-profile.self.cycles-pp.page_counter_charge
0.16 ± 15% +0.1 0.27 ± 9% +0.1 0.22 ± 21% perf-profile.self.cycles-pp.copy_page
0.16 ± 41% +0.1 0.28 ± 65% +0.0 0.18 ± 25% perf-profile.self.cycles-pp.update_curr
0.09 ± 7% +0.2 0.24 ± 9% +0.0 0.11 ± 14% perf-profile.self.cycles-pp.sync_regs
0.11 ± 20% +0.3 0.39 ± 8% +0.0 0.15 ± 15% perf-profile.self.cycles-pp.native_irq_return_iret
0.06 ± 40% +0.4 0.47 ± 9% +0.1 0.13 ± 23% perf-profile.self.cycles-pp.__default_send_IPI_dest_field
0.00 +0.4 0.44 ± 10% +0.1 0.11 ± 19% perf-profile.self.cycles-pp.native_flush_tlb_local
0.07 ± 15% +0.5 0.62 ± 7% +0.1 0.16 ± 18% perf-profile.self.cycles-pp.__flush_smp_call_function_queue
0.06 ± 16% +0.8 0.88 ± 7% +0.3 0.33 ± 21% perf-profile.self.cycles-pp.flush_tlb_func
0.25 ± 14% +1.6 1.85 ± 9% +0.3 0.55 ± 18% perf-profile.self.cycles-pp.llist_reverse_order
0.35 ± 15% +2.1 2.40 ± 8% +0.4 0.76 ± 18% perf-profile.self.cycles-pp.llist_add_batch
0.37 ± 17% +3.1 3.49 ± 7% +0.7 1.10 ± 18% perf-profile.self.cycles-pp.smp_call_function_many_cond


> Best Regards,
> Huang, Ying
>
> -------------------------------------8<------------------------------------
> From 1ac61967b54bbdc1ca20af16f9dfb2507a4d4811 Mon Sep 17 00:00:00 2001
> From: Huang Ying <[email protected]>
> Date: Mon, 20 Mar 2023 15:48:39 +0800
> Subject: [PATCH] dbg, rmap: avoid flushing TLB in batch if PTE is inaccessible
>
> Signed-off-by: "Huang, Ying" <[email protected]>
> ---
>  mm/rmap.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 8632e02661ac..3c7c43642d7c 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1582,7 +1582,8 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>                                  */
>                                 pteval = ptep_get_and_clear(mm, address, pvmw.pte);
>  
> -                               set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
> +                               if (pte_accessible(mm, pteval))
> +                                       set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
>                         } else {
>                                 pteval = ptep_clear_flush(vma, address, pvmw.pte);
>                         }
> @@ -1963,7 +1964,8 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>                                  */
>                                 pteval = ptep_get_and_clear(mm, address, pvmw.pte);
>  
> -                               set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
> +                               if (pte_accessible(mm, pteval))
> +                                       set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
>                         } else {
>                                 pteval = ptep_clear_flush(vma, address, pvmw.pte);
>                         }

2023-03-21 05:44:58

by Huang, Ying

[permalink] [raw]
Subject: Re: [linus:master] [migrate_pages] 7e12beb8ca: vm-scalability.throughput -3.4% regression

"Liu, Yujie" <[email protected]> writes:

> Hi Ying,
>
> On Mon, 2023-03-20 at 15:58 +0800, Huang, Ying wrote:
>> Hi, Yujie,
>>
>> kernel test robot <[email protected]> writes:
>>
>> > [...]
>>
>> Thanks a lot for the report! Can you check whether the debug patch
>> below restores the performance?
>
> We've tested the patch and found the throughput score was partially
> restored from -3.6% to -1.4%, still with a slight performance drop.
> Please check the detailed data as follows:

Good! Thanks for your detailed data!

> 0.09 ± 17% +1.2 1.32 ± 7% +0.4 0.45 ± 21% perf-profile.children.cycles-pp.flush_tlb_func

It appears that the previous debug patch effectively reduces the
unnecessary TLB flushing. But the batched flush (a full TLB flush) is
still slower than the non-batched flush (which flushes a single page).
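
For reference, the x86 pte_accessible() (sketched below from
arch/x86/include/asm/pgtable.h; other architectures differ) shows why
the check in the first debug patch eliminates the useless flushes: the
NUMA-balancing path has already made the PTE PROT_NONE and flushed the
TLB before migration starts, so unless such a flush is still pending,
no CPU can hold a stale TLB entry for the PTE, and queueing a batched
flush for it is pure overhead.

/* Sketch of pte_accessible() on x86: a PTE can only be cached in a
 * TLB if it is present, or if it is a PROT_NONE (NUMA-hinting) entry
 * whose invalidating TLB flush has not completed yet. */
static inline bool pte_accessible(struct mm_struct *mm, pte_t a)
{
	if (pte_flags(a) & _PAGE_PRESENT)
		return true;

	if ((pte_flags(a) & _PAGE_PROTNONE) &&
	    mm_tlb_flush_pending(mm))
		return true;

	return false;
}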

Can you try the debug patch below to check whether it restores the
performance completely? The new debug patch applies on top of the
previous one.

Best Regards,
Huang, Ying

---------------------------8<-----------------------------------------
From b36b662c80652447d7374faff1142a941dc9d617 Mon Sep 17 00:00:00 2001
From: Huang Ying <[email protected]>
Date: Mon, 20 Mar 2023 15:38:12 +0800
Subject: [PATCH] dbg, migrate_pages: don't batch flushing for single page
migration

---
mm/migrate.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 98f1c11197a8..7271209c1a03 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1113,8 +1113,8 @@ static void migrate_folio_done(struct folio *src,
 static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page,
 			       unsigned long private, struct folio *src,
 			       struct folio **dstp, int force, bool avoid_force_lock,
-			       enum migrate_mode mode, enum migrate_reason reason,
-			       struct list_head *ret)
+			       bool batch_flush, enum migrate_mode mode,
+			       enum migrate_reason reason, struct list_head *ret)
 {
 	struct folio *dst;
 	int rc = -EAGAIN;
@@ -1253,7 +1253,7 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
 		/* Establish migration ptes */
 		VM_BUG_ON_FOLIO(folio_test_anon(src) &&
 			       !folio_test_ksm(src) && !anon_vma, src);
-		try_to_migrate(src, TTU_BATCH_FLUSH);
+		try_to_migrate(src, batch_flush ? TTU_BATCH_FLUSH : 0);
 		page_was_mapped = 1;
 	}
 
@@ -1641,6 +1641,7 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 	bool nosplit = (reason == MR_NUMA_MISPLACED);
 	bool no_split_folio_counting = false;
 	bool avoid_force_lock;
+	bool batch_flush = !list_is_singular(from);
 
 retry:
 	rc_saved = 0;
@@ -1690,7 +1691,7 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 
 			rc = migrate_folio_unmap(get_new_page, put_new_page, private,
 						 folio, &dst, pass > 2, avoid_force_lock,
-						 mode, reason, ret_folios);
+						 batch_flush, mode, reason, ret_folios);
 			/*
 			 * The rules are:
 			 *	Success: folio will be freed
@@ -1804,7 +1805,8 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
 	stats->nr_failed_pages += nr_retry_pages;
 move:
 	/* Flush TLBs for all unmapped folios */
-	try_to_unmap_flush();
+	if (batch_flush)
+		try_to_unmap_flush();
 
 	retry = 1;
 	for (pass = 0;
--
2.30.2
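
For context, the batch_flush test above keys off list_is_singular(),
which is true only when exactly one folio is on the migration list,
i.e. the single-page NUMA-balancing migration path. For reference, the
helper in include/linux/list.h is:

/* Returns nonzero iff the list contains exactly one entry. */
static inline int list_is_singular(const struct list_head *head)
{
	return !list_empty(head) && (head->next == head->prev);
}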


2023-03-22 05:22:35

by kernel test robot

[permalink] [raw]
Subject: Re: [linus:master] [migrate_pages] 7e12beb8ca: vm-scalability.throughput -3.4% regression

On Tue, 2023-03-21 at 13:43 +0800, Huang, Ying wrote:
> "Liu, Yujie" <[email protected]> writes:
>
> > Hi Ying,
> >
> > On Mon, 2023-03-20 at 15:58 +0800, Huang, Ying wrote:
> > > Hi, Yujie,
> > >
> > > kernel test robot <[email protected]> writes:
> > >
> > > > [...]
> > >
> > > Thanks a lot for the report!  Can you check whether the debug patch
> > > below restores the performance?
> >
> > We've tested the patch and found the throughput score was partially
> > restored from -3.6% to -1.4%, still with a slight performance drop.
> > Please check the detailed data as follows:
>
> Good!  Thanks for your detailed data!
>
> >       0.09 ± 17%      +1.2        1.32 ±  7%      +0.4        0.45 ± 21%  perf-profile.children.cycles-pp.flush_tlb_func
>
> It appears that the previous debug patch effectively reduces the
> unnecessary TLB flushing.  But the batched flush (a full TLB flush) is
> still slower than the non-batched flush (which flushes a single page).
>
> Can you try the debug patch below to check whether it restores the
> performance completely?  The new debug patch applies on top of the
> previous one.

The second debug patch showed a -0.7% performance change. The data
fluctuate from test to test, and the standard deviation is even a bit
larger than 0.7%, which makes a difference of this size hard to
distinguish from noise, so the throughput score alone is not very
convincing. Please check the other metrics to see whether the
regression is fully resolved. Thanks.

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/512G/lkp-csl-2sp3/anon-cow-rand-mt/vm-scalability

commit:
ebe75e4751063 ("migrate_pages: share more code between _unmap and _move")
9a30245d65679 ("dbg, rmap: avoid flushing TLB in batch if PTE is inaccessible")
a65085664418d ("dbg, migrate_pages: don't batch flushing for single page migration")

ebe75e4751063dce 9a30245d656794d171cd798a2be a65085664418d7ed1560095d466
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
57634 -1.5% 56788 -0.8% 57199 vm-scalability.median
81.16 ± 12% -20.0 61.18 ± 21% -5.0 76.14 ± 12% vm-scalability.stddev%
5528051 -1.4% 5449450 -0.7% 5487122 vm-scalability.throughput
305.38 -0.1% 305.19 -0.1% 305.15 vm-scalability.time.elapsed_time
305.38 -0.1% 305.19 -0.1% 305.15 vm-scalability.time.elapsed_time.max
652.11 ± 88% +54.5% 1007 ± 63% +45.4% 948.20 ± 80% vm-scalability.time.file_system_inputs
200293 ± 3% -4.3% 191707 ± 2% +1.9% 204033 ± 3% vm-scalability.time.involuntary_context_switches
67.11 ± 56% -95.4% 3.11 ± 80% -11.3% 59.50 ± 27% vm-scalability.time.major_page_faults
32930133 -0.0% 32924571 -0.0% 32922758 vm-scalability.time.maximum_resident_set_size
67952989 ± 5% +35.6% 92147668 ± 3% +2.8% 69849921 ± 8% vm-scalability.time.minor_page_faults
4096 +0.0% 4096 +0.0% 4096 vm-scalability.time.page_size
9006 -0.6% 8956 -0.0% 9005 vm-scalability.time.percent_of_cpu_this_job_got
1178 ± 3% +8.6% 1278 ± 3% -1.9% 1155 ± 4% vm-scalability.time.system_time
26327 -1.0% 26056 +0.0% 26327 vm-scalability.time.user_time
11378 ± 5% +118.5% 24867 ± 7% -0.5% 11327 ± 9% vm-scalability.time.voluntary_context_switches
1.662e+09 -1.5% 1.638e+09 -0.8% 1.648e+09 vm-scalability.workload
1.143e+09 +0.6% 1.15e+09 ± 2% +2.9% 1.176e+09 ± 3% cpuidle..time
2464665 ± 3% +2.0% 2515047 ± 4% +2.2% 2519159 ± 8% cpuidle..usage
367.89 -0.2% 367.16 -0.2% 367.32 uptime.boot
6393 ± 3% -0.9% 6336 ± 2% -0.5% 6363 ± 2% uptime.idle
59.33 ± 4% -0.4% 59.06 ± 2% -0.6% 58.94 ± 3% boot-time.boot
33.79 ± 3% -0.8% 33.54 -0.7% 33.57 boot-time.dhcp
5106 ± 4% -0.6% 5076 ± 2% -0.8% 5066 ± 3% boot-time.idle
1.05 ± 8% -4.4% 1.01 -4.3% 1.01 boot-time.smp_boot
3.78 -0.0 3.77 ± 3% +0.1 3.91 ± 4% mpstat.cpu.all.idle%
0.00 ±184% +0.0 0.00 ± 25% -0.0 0.00 ± 60% mpstat.cpu.all.iowait%
2.58 +0.5 3.09 ± 3% -0.0 2.56 mpstat.cpu.all.irq%
0.03 ± 4% +0.0 0.03 ± 8% -0.0 0.03 ± 5% mpstat.cpu.all.soft%
4.06 ± 3% +0.3 4.40 ± 3% -0.1 3.98 ± 4% mpstat.cpu.all.sys%
89.55 -0.8 88.71 -0.0 89.52 mpstat.cpu.all.usr%
0.00 -100.0% 0.00 -100.0% 0.00 numa-numastat.node0.interleave_hit
14350133 ± 4% +7.7% 15454129 ± 4% -0.5% 14283646 ± 4% numa-numastat.node0.local_node
14405409 ± 4% +7.5% 15487972 ± 4% -0.5% 14332762 ± 4% numa-numastat.node0.numa_hit
55258 ± 48% -37.3% 34622 ± 67% -13.6% 47731 ± 51% numa-numastat.node0.other_node
0.00 -100.0% 0.00 -100.0% 0.00 numa-numastat.node1.interleave_hit
14402027 ± 3% +8.4% 15618857 ± 5% -0.1% 14389667 ± 4% numa-numastat.node1.local_node
14433899 ± 3% +8.6% 15670948 ± 5% -0.0% 14429236 ± 4% numa-numastat.node1.numa_hit
31821 ± 84% +64.9% 52467 ± 44% +30.8% 41622 ± 56% numa-numastat.node1.other_node
305.38 -0.1% 305.19 -0.1% 305.15 time.elapsed_time
305.38 -0.1% 305.19 -0.1% 305.15 time.elapsed_time.max
652.11 ± 88% +54.5% 1007 ± 63% +45.4% 948.20 ± 80% time.file_system_inputs
200293 ± 3% -4.3% 191707 ± 2% +1.9% 204033 ± 3% time.involuntary_context_switches
67.11 ± 56% -95.4% 3.11 ± 80% -11.3% 59.50 ± 27% time.major_page_faults
32930133 -0.0% 32924571 -0.0% 32922758 time.maximum_resident_set_size
67952989 ± 5% +35.6% 92147668 ± 3% +2.8% 69849921 ± 8% time.minor_page_faults
4096 +0.0% 4096 +0.0% 4096 time.page_size
9006 -0.6% 8956 -0.0% 9005 time.percent_of_cpu_this_job_got
1178 ± 3% +8.6% 1278 ± 3% -1.9% 1155 ± 4% time.system_time
26327 -1.0% 26056 +0.0% 26327 time.user_time
11378 ± 5% +118.5% 24867 ± 7% -0.5% 11327 ± 9% time.voluntary_context_switches
4.00 +0.0% 4.00 +0.0% 4.00 vmstat.cpu.id
6.00 +16.7% 7.00 +0.0% 6.00 vmstat.cpu.sy
88.33 -0.9% 87.56 +0.3% 88.60 vmstat.cpu.us
0.00 -100.0% 0.00 -100.0% 0.00 vmstat.cpu.wa
10.67 ± 97% -34.4% 7.00 -34.4% 7.00 vmstat.io.bi
8.00 ± 70% -25.0% 6.00 -25.0% 6.00 vmstat.io.bo
1046 -0.1% 1045 -0.1% 1045 vmstat.memory.buff
2964204 -0.1% 2962572 -0.1% 2961826 vmstat.memory.cache
63650311 +0.1% 63687273 +0.1% 63731617 vmstat.memory.free
0.00 -100.0% 0.00 -100.0% 0.00 vmstat.procs.b
92.00 -0.2% 91.78 -0.3% 91.70 vmstat.procs.r
2022 ± 3% +3.6% 2095 -1.3% 1995 vmstat.system.cs
539357 ± 2% +32.9% 716886 ± 4% -2.1% 528047 ± 5% vmstat.system.in
143480 ± 3% -12.0% 126262 ± 4% -0.6% 142665 ± 3% sched_debug.cfs_rq:/.min_vruntime.stddev
548123 ± 7% -20.7% 434543 ± 9% -5.5% 517900 ± 7% sched_debug.cfs_rq:/.spread0.avg
655329 ± 6% -16.2% 549218 ± 6% -4.7% 624275 ± 5% sched_debug.cfs_rq:/.spread0.max
143388 ± 3% -11.9% 126295 ± 4% -0.6% 142588 ± 3% sched_debug.cfs_rq:/.spread0.stddev
240478 ± 6% -12.0% 211715 ± 5% -3.2% 232667 ± 8% sched_debug.cpu.avg_idle.avg
1938 ± 5% +11.4% 2160 ± 3% -2.1% 1897 ± 4% sched_debug.cpu.nr_switches.min
39960890 ± 6% +54.7% 61837739 ± 4% +5.0% 41939453 ± 11% proc-vmstat.numa_hint_faults
19987976 ± 6% +55.1% 30996483 ± 4% +5.0% 20978472 ± 11% proc-vmstat.numa_hint_faults_local
28840932 ± 3% +8.0% 31160418 ± 4% -0.3% 28764186 ± 4% proc-vmstat.numa_hit
28753783 ± 3% +8.1% 31074486 ± 4% -0.3% 28675501 ± 4% proc-vmstat.numa_local
19745743 ± 5% +11.8% 22080123 ± 6% -0.4% 19668879 ± 6% proc-vmstat.numa_pages_migrated
40107839 ± 6% +54.6% 61988683 ± 4% +5.0% 42094380 ± 11% proc-vmstat.numa_pte_updates
37158989 ± 2% +6.3% 39482935 ± 3% -0.2% 37080293 ± 3% proc-vmstat.pgalloc_normal
68856116 ± 5% +35.1% 93057570 ± 3% +2.8% 70755839 ± 8% proc-vmstat.pgfault
19745743 ± 5% +11.8% 22080123 ± 6% -0.4% 19668879 ± 6% proc-vmstat.pgmigrate_success
19754280 ± 5% +11.8% 22080663 ± 6% -0.4% 19677784 ± 6% proc-vmstat.pgreuse
8953845 ± 3% +13.3% 10142474 ± 2% +0.7% 9013008 ± 2% perf-stat.i.branch-misses
158.09 +7.5% 170.00 ± 2% +1.5% 160.38 ± 3% perf-stat.i.cpu-migrations
9.10 -0.1 8.97 -0.0 9.08 perf-stat.i.dTLB-store-miss-rate%
2454429 ± 2% +26.7% 3110501 ± 5% -5.2% 2326293 ± 3% perf-stat.i.iTLB-load-misses
0.31 ± 38% -68.9% 0.10 ± 31% -11.2% 0.27 ± 22% perf-stat.i.major-faults
224958 ± 5% +35.4% 304571 ± 3% +2.7% 231063 ± 8% perf-stat.i.minor-faults
224959 ± 5% +35.4% 304571 ± 3% +2.7% 231064 ± 8% perf-stat.i.page-faults
0.08 ± 4% +0.0 0.09 ± 3% +0.0 0.08 ± 2% perf-stat.overall.branch-miss-rate%
9.38 -0.1 9.25 -0.0 9.37 perf-stat.overall.dTLB-store-miss-rate%
95.49 +1.0 96.53 -0.3 95.15 perf-stat.overall.iTLB-load-miss-rate%
20490 ± 3% -21.5% 16077 ± 6% +4.5% 21404 ± 4% perf-stat.overall.instructions-per-iTLB-miss
8906114 ± 3% +13.3% 10090374 ± 2% +0.7% 8968593 ± 2% perf-stat.ps.branch-misses
157.57 +7.6% 169.49 ± 2% +1.4% 159.76 ± 3% perf-stat.ps.cpu-migrations
2444301 ± 2% +26.8% 3098710 ± 5% -5.2% 2317560 ± 3% perf-stat.ps.iTLB-load-misses
0.31 ± 38% -68.8% 0.10 ± 31% -10.8% 0.27 ± 22% perf-stat.ps.major-faults
224444 ± 5% +35.3% 303619 ± 3% +2.7% 230589 ± 8% perf-stat.ps.minor-faults
224444 ± 5% +35.3% 303620 ± 3% +2.7% 230589 ± 8% perf-stat.ps.page-faults
1.26 ± 15% -1.3 0.00 -0.0 1.25 ± 14% perf-profile.calltrace.cycles-pp.migrate_folio_unmap.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
1.14 ± 15% -1.1 0.00 -0.0 1.12 ± 14% perf-profile.calltrace.cycles-pp.try_to_migrate.migrate_folio_unmap.migrate_pages_batch.migrate_pages.migrate_misplaced_page
1.12 ± 15% -1.1 0.00 -0.0 1.11 ± 14% perf-profile.calltrace.cycles-pp.rmap_walk_anon.try_to_migrate.migrate_folio_unmap.migrate_pages_batch.migrate_pages
1.08 ± 15% -1.1 0.00 -0.0 1.06 ± 14% perf-profile.calltrace.cycles-pp.try_to_migrate_one.rmap_walk_anon.try_to_migrate.migrate_folio_unmap.migrate_pages_batch
0.92 ± 15% -0.9 0.00 -0.0 0.92 ± 14% perf-profile.calltrace.cycles-pp.ptep_clear_flush.try_to_migrate_one.rmap_walk_anon.try_to_migrate.migrate_folio_unmap
0.91 ± 15% -0.9 0.00 -0.0 0.91 ± 14% perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.ptep_clear_flush.try_to_migrate_one.rmap_walk_anon.try_to_migrate
0.91 ± 15% -0.9 0.00 -0.0 0.91 ± 14% perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.ptep_clear_flush.try_to_migrate_one.rmap_walk_anon
0.91 ± 15% -0.9 0.00 -0.0 0.90 ± 14% perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.ptep_clear_flush.try_to_migrate_one
72.48 ± 3% -0.7 71.79 +2.8 75.24 ± 5% perf-profile.calltrace.cycles-pp.do_access
0.26 ±112% -0.3 0.00 +0.1 0.34 ± 82% perf-profile.calltrace.cycles-pp._raw_spin_lock.do_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.19 ±141% -0.2 0.00 -0.0 0.16 ±153% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.do_numa_page.__handle_mm_fault.handle_mm_fault
0.07 ±282% -0.1 0.00 -0.1 0.00 perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.nrand48_r
0.07 ±282% -0.1 0.00 -0.1 0.00 perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.nrand48_r
0.07 ±282% -0.1 0.00 -0.1 0.00 perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.nrand48_r
0.06 ±282% -0.1 0.00 -0.1 0.00 perf-profile.calltrace.cycles-pp.rmap_walk_anon.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
0.13 ±188% -0.0 0.11 ±187% -0.1 0.00 perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.nrand48_r
4.13 ± 3% -0.0 4.12 -0.1 3.98 ± 6% perf-profile.calltrace.cycles-pp.do_rw_once
1.34 ± 39% +0.0 1.35 ± 25% -0.2 1.16 ± 22% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.55 ± 69% +0.0 0.60 ± 56% -0.1 0.50 ± 52% perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues
1.09 ± 31% +0.1 1.14 ± 26% -0.2 0.93 ± 37% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
1.08 ± 31% +0.1 1.13 ± 26% -0.2 0.92 ± 37% perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt
0.00 +0.1 0.06 ±282% +0.0 0.00 perf-profile.calltrace.cycles-pp.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
0.00 +0.1 0.06 ±282% +0.0 0.00 perf-profile.calltrace.cycles-pp.migrate_folio_extra.move_to_new_folio.migrate_pages_batch.migrate_pages.migrate_misplaced_page
1.18 ± 30% +0.1 1.24 ± 26% -0.1 1.07 ± 23% perf-profile.calltrace.cycles-pp.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
1.52 ± 28% +0.1 1.58 ± 25% -0.2 1.36 ± 21% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.do_access
1.43 ± 29% +0.1 1.50 ± 25% -0.1 1.29 ± 21% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.do_access
1.44 ± 28% +0.1 1.51 ± 25% -0.1 1.30 ± 21% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.do_access
1.72 ± 25% +0.1 1.80 ± 22% -0.2 1.55 ± 20% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.do_access
6.40 ± 9% +0.1 6.54 -0.6 5.76 ± 17% perf-profile.calltrace.cycles-pp.lrand48_r
0.17 ±196% +0.2 0.33 ± 89% -0.1 0.11 ±200% perf-profile.calltrace.cycles-pp.task_tick_fair.scheduler_tick.update_process_times.tick_sched_handle.tick_sched_timer
0.00 +0.3 0.26 ±113% +0.0 0.00 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_access
0.00 +0.3 0.26 ±113% +0.0 0.00 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.do_access
0.00 +0.3 0.33 ± 91% +0.0 0.00 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.do_access
19.08 ± 10% +0.5 19.59 -2.2 16.90 ± 19% perf-profile.calltrace.cycles-pp.nrand48_r
0.00 +0.6 0.59 ± 40% +0.0 0.00 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.do_access
3.30 ± 15% +0.9 4.18 ± 19% -0.1 3.24 ± 14% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
3.34 ± 15% +0.9 4.22 ± 19% -0.1 3.27 ± 14% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
0.00 +0.9 0.90 ± 18% +0.0 0.00 perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush
3.70 ± 15% +0.9 4.64 ± 19% -0.1 3.60 ± 14% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
3.68 ± 15% +0.9 4.63 ± 19% -0.1 3.59 ± 14% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
3.89 ± 14% +1.0 4.85 ± 19% -0.1 3.76 ± 14% perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
3.03 ± 15% +1.0 4.03 ± 19% -0.1 2.98 ± 14% perf-profile.calltrace.cycles-pp.do_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
2.46 ± 15% +1.4 3.85 ± 19% -0.1 2.41 ± 14% perf-profile.calltrace.cycles-pp.migrate_misplaced_page.do_numa_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
2.27 ± 15% +1.4 3.67 ± 19% -0.0 2.22 ± 14% perf-profile.calltrace.cycles-pp.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page.__handle_mm_fault
2.27 ± 15% +1.4 3.68 ± 19% -0.0 2.23 ± 14% perf-profile.calltrace.cycles-pp.migrate_pages.migrate_misplaced_page.do_numa_page.__handle_mm_fault.handle_mm_fault
0.00 +2.4 2.38 ± 18% +0.0 0.00 perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch
0.00 +2.4 2.40 ± 18% +0.0 0.00 perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch.migrate_pages
0.00 +2.4 2.40 ± 18% +0.0 0.00 perf-profile.calltrace.cycles-pp.try_to_unmap_flush.migrate_pages_batch.migrate_pages.migrate_misplaced_page.do_numa_page
0.00 +2.4 2.40 ± 18% +0.0 0.00 perf-profile.calltrace.cycles-pp.arch_tlbbatch_flush.try_to_unmap_flush.migrate_pages_batch.migrate_pages.migrate_misplaced_page
1.51 ± 16% -1.2 0.31 ± 20% -0.0 1.48 ± 14% perf-profile.children.cycles-pp.rmap_walk_anon
1.25 ± 16% -1.0 0.29 ± 20% -0.0 1.22 ± 15% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
1.08 ± 15% -1.0 0.12 ± 21% -0.0 1.06 ± 14% perf-profile.children.cycles-pp.try_to_migrate_one
1.14 ± 15% -0.9 0.19 ± 19% -0.0 1.12 ± 14% perf-profile.children.cycles-pp.try_to_migrate
0.92 ± 15% -0.9 0.00 -0.0 0.92 ± 14% perf-profile.children.cycles-pp.ptep_clear_flush
1.26 ± 15% -0.9 0.34 ± 21% -0.0 1.25 ± 14% perf-profile.children.cycles-pp.migrate_folio_unmap
0.92 ± 15% -0.9 0.00 -0.0 0.91 ± 14% perf-profile.children.cycles-pp.flush_tlb_mm_range
1.05 ± 15% -0.9 0.16 ± 16% -0.0 1.04 ± 15% perf-profile.children.cycles-pp._raw_spin_lock
72.83 ± 3% -0.6 72.25 +2.8 75.59 ± 5% perf-profile.children.cycles-pp.do_access
0.46 ± 15% -0.3 0.11 ± 20% -0.0 0.44 ± 14% perf-profile.children.cycles-pp.page_vma_mapped_walk
0.34 ± 15% -0.3 0.08 ± 18% -0.0 0.33 ± 15% perf-profile.children.cycles-pp.remove_migration_pte
0.14 ± 16% -0.1 0.00 -0.0 0.14 ± 17% perf-profile.children.cycles-pp.handle_pte_fault
0.13 ± 22% -0.0 0.09 ± 23% -0.0 0.12 ± 17% perf-profile.children.cycles-pp.folio_lruvec_lock_irq
0.13 ± 22% -0.0 0.09 ± 22% -0.0 0.12 ± 18% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.09 ± 39% -0.0 0.07 ± 75% -0.0 0.09 ± 52% perf-profile.children.cycles-pp.cpuacct_account_field
0.17 ± 21% -0.0 0.15 ± 21% -0.0 0.16 ± 15% perf-profile.children.cycles-pp.folio_isolate_lru
0.19 ± 20% -0.0 0.17 ± 20% -0.0 0.18 ± 15% perf-profile.children.cycles-pp.numamigrate_isolate_page
0.12 ± 95% -0.0 0.11 ± 16% -0.1 0.06 ± 13% perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
0.09 ± 47% -0.0 0.08 ± 43% -0.0 0.06 ± 38% perf-profile.children.cycles-pp.hrtimer_active
4.37 ± 3% -0.0 4.36 -0.2 4.22 ± 5% perf-profile.children.cycles-pp.do_rw_once
0.33 ± 2% -0.0 0.32 ± 2% -0.0 0.32 ± 5% perf-profile.children.cycles-pp.lrand48_r@plt
0.01 ±282% -0.0 0.00 -0.0 0.00 perf-profile.children.cycles-pp.enqueue_hrtimer
0.01 ±282% -0.0 0.00 -0.0 0.00 perf-profile.children.cycles-pp.timerqueue_add
0.06 ± 13% -0.0 0.05 ± 37% -0.0 0.04 ± 51% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.06 ± 13% -0.0 0.05 ± 37% -0.0 0.04 ± 51% perf-profile.children.cycles-pp.do_syscall_64
0.01 ±282% -0.0 0.00 -0.0 0.00 perf-profile.children.cycles-pp.lapic_next_deadline
0.01 ±282% -0.0 0.00 -0.0 0.00 perf-profile.children.cycles-pp.hrtimer_update_next_event
0.01 ±282% -0.0 0.00 -0.0 0.00 perf-profile.children.cycles-pp.update_min_vruntime
0.01 ±282% -0.0 0.00 -0.0 0.00 perf-profile.children.cycles-pp.rcu_core
0.15 ± 20% -0.0 0.15 ± 21% -0.0 0.14 ± 17% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
0.07 ± 27% -0.0 0.06 ± 55% -0.0 0.05 ± 53% perf-profile.children.cycles-pp.ktime_get
0.01 ±193% -0.0 0.01 ±188% -0.0 0.01 ±201% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.01 ±282% -0.0 0.01 ±282% -0.0 0.01 ±299% perf-profile.children.cycles-pp.perf_rotate_context
0.21 ± 17% -0.0 0.21 ± 18% -0.0 0.20 ± 15% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.02 ±209% -0.0 0.02 ±142% -0.0 0.01 ±300% perf-profile.children.cycles-pp.update_cfs_group
0.05 ± 43% -0.0 0.05 ± 57% -0.0 0.04 ± 67% perf-profile.children.cycles-pp.update_irq_load_avg
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.children.cycles-pp.start_secondary
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.children.cycles-pp.cpu_startup_entry
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.children.cycles-pp.do_idle
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.children.cycles-pp.cpuidle_idle_call
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.children.cycles-pp.cpuidle_enter
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.children.cycles-pp.cpuidle_enter_state
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.children.cycles-pp.mwait_idle_with_hints
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.children.cycles-pp.intel_idle
0.06 ± 18% +0.0 0.07 ± 41% -0.0 0.05 ± 66% perf-profile.children.cycles-pp.rcu_pending
0.02 ±112% +0.0 0.03 ±111% -0.0 0.01 ±300% perf-profile.children.cycles-pp.timerqueue_del
0.02 ±111% +0.0 0.03 ±112% +0.0 0.03 ±100% perf-profile.children.cycles-pp.irqtime_account_process_tick
0.06 ± 18% +0.0 0.06 ± 19% -0.0 0.03 ± 82% perf-profile.children.cycles-pp.mt_find
0.07 ± 39% +0.0 0.07 ± 28% -0.0 0.05 ± 55% perf-profile.children.cycles-pp.ktime_get_update_offsets_now
0.00 +0.0 0.01 ±282% +0.0 0.00 perf-profile.children.cycles-pp._find_next_bit
0.00 +0.0 0.01 ±282% +0.0 0.00 perf-profile.children.cycles-pp.folio_get_anon_vma
0.00 +0.0 0.01 ±282% +0.0 0.00 perf-profile.children.cycles-pp.__free_one_page
0.06 ± 18% +0.0 0.06 ± 20% -0.0 0.03 ± 82% perf-profile.children.cycles-pp.find_vma
0.11 ± 25% +0.0 0.11 ± 25% -0.0 0.09 ± 38% perf-profile.children.cycles-pp.update_rq_clock
0.32 ± 19% +0.0 0.33 ± 32% -0.0 0.30 ± 31% perf-profile.children.cycles-pp.account_user_time
0.21 ± 48% +0.0 0.22 ± 28% -0.0 0.18 ± 23% perf-profile.children.cycles-pp.update_load_avg
0.09 ± 20% +0.0 0.09 ± 23% -0.0 0.08 ± 38% perf-profile.children.cycles-pp.tick_sched_do_timer
0.02 ±154% +0.0 0.03 ± 92% -0.0 0.02 ±155% perf-profile.children.cycles-pp.__do_softirq
0.07 ± 35% +0.0 0.08 ± 26% -0.0 0.07 ± 20% perf-profile.children.cycles-pp.clockevents_program_event
0.08 ± 36% +0.0 0.09 ± 24% -0.0 0.07 ± 19% perf-profile.children.cycles-pp.__irq_exit_rcu
0.03 ±127% +0.0 0.04 ± 72% -0.0 0.02 ±123% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.08 ± 18% +0.0 0.09 ± 26% -0.0 0.06 ± 53% perf-profile.children.cycles-pp.rcu_sched_clock_irq
0.00 +0.0 0.01 ±187% +0.0 0.00 perf-profile.children.cycles-pp.lru_add_fn
0.21 ± 19% +0.0 0.22 ± 21% -0.0 0.20 ± 15% perf-profile.children.cycles-pp.folio_batch_move_lru
0.21 ± 19% +0.0 0.22 ± 20% -0.0 0.20 ± 15% perf-profile.children.cycles-pp.lru_add_drain
0.21 ± 19% +0.0 0.22 ± 20% -0.0 0.20 ± 15% perf-profile.children.cycles-pp.lru_add_drain_cpu
0.06 ± 39% +0.0 0.07 ± 21% +0.0 0.06 ± 15% perf-profile.children.cycles-pp.rmqueue_bulk
0.06 ± 16% +0.0 0.08 ± 21% -0.0 0.06 ± 13% perf-profile.children.cycles-pp.free_unref_page
0.09 ± 16% +0.0 0.11 ± 22% -0.0 0.09 ± 14% perf-profile.children.cycles-pp.__alloc_pages
0.09 ± 15% +0.0 0.11 ± 21% -0.0 0.09 ± 17% perf-profile.children.cycles-pp.rmqueue
0.09 ± 16% +0.0 0.11 ± 21% -0.0 0.09 ± 14% perf-profile.children.cycles-pp.get_page_from_freelist
0.03 ± 71% +0.0 0.05 ± 39% -0.0 0.03 ± 82% perf-profile.children.cycles-pp.free_pcppages_bulk
0.00 +0.0 0.02 ±142% +0.0 0.00 perf-profile.children.cycles-pp.can_change_pte_writable
0.00 +0.0 0.02 ±142% +0.0 0.00 perf-profile.children.cycles-pp.folio_migrate_flags
0.03 ±152% +0.0 0.04 ± 72% +0.0 0.03 ± 84% perf-profile.children.cycles-pp.__update_load_avg_se
0.09 ± 18% +0.0 0.11 ± 22% -0.0 0.09 ± 14% perf-profile.children.cycles-pp.__folio_alloc
0.09 ± 18% +0.0 0.11 ± 22% +0.0 0.09 ± 16% perf-profile.children.cycles-pp.alloc_misplaced_dst_page
0.08 ± 15% +0.0 0.10 ± 21% -0.0 0.08 ± 16% perf-profile.children.cycles-pp.__list_del_entry_valid
0.04 ± 91% +0.0 0.06 ± 38% +0.0 0.04 ± 66% perf-profile.children.cycles-pp.arch_scale_freq_tick
0.11 ± 16% +0.0 0.13 ± 29% -0.0 0.10 ± 28% perf-profile.children.cycles-pp.__cgroup_account_cputime_field
0.19 ± 17% +0.0 0.21 ± 18% -0.0 0.18 ± 18% perf-profile.children.cycles-pp.down_read_trylock
0.03 ±118% +0.0 0.05 ± 59% -0.0 0.03 ±101% perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
0.02 ±142% +0.0 0.04 ± 72% -0.0 0.01 ±299% perf-profile.children.cycles-pp.irqtime_account_irq
0.25 ± 39% +0.0 0.27 ± 25% -0.0 0.22 ± 22% perf-profile.children.cycles-pp.update_curr
0.09 ± 7% +0.0 0.11 ± 14% -0.0 0.08 ± 15% perf-profile.children.cycles-pp.sync_regs
0.16 ± 13% +0.0 0.18 ± 19% -0.0 0.15 ± 14% perf-profile.children.cycles-pp.up_read
0.68 ± 45% +0.0 0.71 ± 28% -0.1 0.58 ± 24% perf-profile.children.cycles-pp.task_tick_fair
0.02 ±141% +0.0 0.05 ± 42% +0.0 0.02 ±122% perf-profile.children.cycles-pp.uncharge_batch
0.01 ±282% +0.0 0.04 ± 75% +0.0 0.01 ±200% perf-profile.children.cycles-pp.page_counter_uncharge
0.02 ±141% +0.0 0.06 ± 44% +0.0 0.02 ±100% perf-profile.children.cycles-pp.__mem_cgroup_uncharge
0.02 ±141% +0.0 0.06 ± 44% +0.0 0.02 ±100% perf-profile.children.cycles-pp.__folio_put
0.03 ± 71% +0.0 0.08 ± 25% -0.0 0.03 ± 82% perf-profile.children.cycles-pp.mem_cgroup_migrate
0.96 ± 40% +0.0 1.00 ± 27% -0.1 0.81 ± 24% perf-profile.children.cycles-pp.scheduler_tick
0.11 ± 20% +0.0 0.16 ± 15% -0.0 0.11 ± 11% perf-profile.children.cycles-pp.native_irq_return_iret
0.06 ± 13% +0.1 0.11 ± 16% -0.0 0.06 ± 13% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.05 ± 36% +0.1 0.10 ± 18% -0.0 0.04 ± 51% perf-profile.children.cycles-pp.exit_to_user_mode_loop
0.01 ±187% +0.1 0.07 ± 26% +0.0 0.02 ±122% perf-profile.children.cycles-pp.page_counter_charge
0.04 ± 71% +0.1 0.10 ± 18% +0.0 0.04 ± 51% perf-profile.children.cycles-pp.task_work_run
0.17 ± 14% +0.1 0.23 ± 20% -0.0 0.17 ± 15% perf-profile.children.cycles-pp.copy_page
0.17 ± 13% +0.1 0.24 ± 19% -0.0 0.17 ± 15% perf-profile.children.cycles-pp.folio_copy
0.03 ± 90% +0.1 0.10 ± 16% +0.0 0.04 ± 51% perf-profile.children.cycles-pp.change_pte_range
0.03 ± 90% +0.1 0.10 ± 18% +0.0 0.04 ± 51% perf-profile.children.cycles-pp.task_numa_work
0.03 ± 90% +0.1 0.10 ± 18% +0.0 0.04 ± 51% perf-profile.children.cycles-pp.change_prot_numa
0.03 ± 90% +0.1 0.10 ± 18% +0.0 0.04 ± 51% perf-profile.children.cycles-pp.change_protection_range
0.03 ± 90% +0.1 0.10 ± 18% +0.0 0.04 ± 51% perf-profile.children.cycles-pp.change_pmd_range
0.06 ± 40% +0.1 0.13 ± 23% +0.0 0.06 ± 15% perf-profile.children.cycles-pp.__default_send_IPI_dest_field
1.58 ± 32% +0.1 1.65 ± 25% -0.2 1.36 ± 25% perf-profile.children.cycles-pp.tick_sched_handle
1.56 ± 32% +0.1 1.64 ± 25% -0.2 1.35 ± 25% perf-profile.children.cycles-pp.update_process_times
1.85 ± 30% +0.1 1.94 ± 25% -0.2 1.61 ± 24% perf-profile.children.cycles-pp.__hrtimer_run_queues
1.71 ± 31% +0.1 1.79 ± 25% -0.2 1.49 ± 25% perf-profile.children.cycles-pp.tick_sched_timer
0.08 ± 16% +0.1 0.17 ± 21% -0.0 0.08 ± 17% perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys
2.09 ± 29% +0.1 2.18 ± 24% -0.3 1.81 ± 23% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
2.06 ± 29% +0.1 2.16 ± 24% -0.3 1.79 ± 23% perf-profile.children.cycles-pp.hrtimer_interrupt
2.19 ± 29% +0.1 2.29 ± 24% -0.3 1.89 ± 23% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.25 ± 12% +0.1 0.36 ± 20% -0.0 0.25 ± 14% perf-profile.children.cycles-pp.move_to_new_folio
0.25 ± 12% +0.1 0.36 ± 20% -0.0 0.25 ± 14% perf-profile.children.cycles-pp.migrate_folio_extra
0.00 +0.1 0.12 ± 22% +0.0 0.00 perf-profile.children.cycles-pp.native_flush_tlb_local
2.48 ± 26% +0.1 2.60 ± 22% -0.3 2.14 ± 22% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
9.27 ± 8% +0.2 9.45 -0.9 8.41 ± 15% perf-profile.children.cycles-pp.lrand48_r
0.25 ± 14% +0.3 0.55 ± 18% -0.0 0.24 ± 15% perf-profile.children.cycles-pp.llist_reverse_order
0.09 ± 17% +0.4 0.45 ± 21% +0.0 0.09 ± 12% perf-profile.children.cycles-pp.flush_tlb_func
16.69 ± 10% +0.5 17.16 -2.0 14.72 ± 19% perf-profile.children.cycles-pp.nrand48_r
0.40 ± 15% +0.5 0.93 ± 18% -0.0 0.39 ± 14% perf-profile.children.cycles-pp.llist_add_batch
0.41 ± 14% +0.7 1.14 ± 19% -0.0 0.41 ± 14% perf-profile.children.cycles-pp.__sysvec_call_function
0.41 ± 14% +0.7 1.14 ± 19% -0.0 0.41 ± 14% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.43 ± 14% +0.7 1.17 ± 19% -0.0 0.42 ± 14% perf-profile.children.cycles-pp.sysvec_call_function
0.55 ± 12% +0.9 1.40 ± 19% -0.0 0.53 ± 15% perf-profile.children.cycles-pp.asm_sysvec_call_function
3.31 ± 15% +0.9 4.19 ± 19% -0.1 3.24 ± 14% perf-profile.children.cycles-pp.__handle_mm_fault
3.34 ± 15% +0.9 4.23 ± 19% -0.1 3.27 ± 14% perf-profile.children.cycles-pp.handle_mm_fault
3.70 ± 15% +0.9 4.64 ± 19% -0.1 3.60 ± 14% perf-profile.children.cycles-pp.exc_page_fault
3.70 ± 15% +0.9 4.64 ± 19% -0.1 3.60 ± 14% perf-profile.children.cycles-pp.do_user_addr_fault
3.91 ± 14% +1.0 4.88 ± 19% -0.1 3.78 ± 14% perf-profile.children.cycles-pp.asm_exc_page_fault
3.03 ± 15% +1.0 4.03 ± 19% -0.1 2.98 ± 14% perf-profile.children.cycles-pp.do_numa_page
2.46 ± 15% +1.4 3.85 ± 19% -0.1 2.41 ± 14% perf-profile.children.cycles-pp.migrate_misplaced_page
2.27 ± 15% +1.4 3.67 ± 19% -0.0 2.22 ± 14% perf-profile.children.cycles-pp.migrate_pages_batch
2.27 ± 15% +1.4 3.68 ± 19% -0.0 2.23 ± 14% perf-profile.children.cycles-pp.migrate_pages
0.91 ± 15% +1.5 2.42 ± 18% -0.0 0.91 ± 14% perf-profile.children.cycles-pp.smp_call_function_many_cond
0.91 ± 15% +1.5 2.42 ± 18% -0.0 0.91 ± 14% perf-profile.children.cycles-pp.on_each_cpu_cond_mask
0.00 +2.4 2.40 ± 18% +0.0 0.00 perf-profile.children.cycles-pp.try_to_unmap_flush
0.00 +2.4 2.40 ± 18% +0.0 0.00 perf-profile.children.cycles-pp.arch_tlbbatch_flush
66.95 ± 3% -2.0 64.95 +3.1 70.02 ± 6% perf-profile.self.cycles-pp.do_access
1.14 ± 16% -0.9 0.28 ± 21% -0.0 1.12 ± 15% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.06 ±187% -0.1 0.00 -0.1 0.00 perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
4.08 ± 3% -0.0 4.03 -0.1 3.94 ± 5% perf-profile.self.cycles-pp.do_rw_once
0.09 ± 39% -0.0 0.07 ± 75% -0.0 0.09 ± 52% perf-profile.self.cycles-pp.cpuacct_account_field
0.06 ± 14% -0.0 0.04 ± 72% -0.0 0.03 ± 82% perf-profile.self.cycles-pp.mt_find
0.01 ±282% -0.0 0.00 -0.0 0.00 perf-profile.self.cycles-pp.lapic_next_deadline
0.01 ±188% -0.0 0.01 ±282% -0.0 0.01 ±200% perf-profile.self.cycles-pp.rmap_walk_anon
0.01 ±282% -0.0 0.00 -0.0 0.00 perf-profile.self.cycles-pp.update_min_vruntime
0.08 ± 47% -0.0 0.07 ± 45% -0.0 0.06 ± 38% perf-profile.self.cycles-pp.hrtimer_active
0.29 ± 4% -0.0 0.28 ± 2% -0.0 0.28 ± 6% perf-profile.self.cycles-pp.lrand48_r@plt
0.06 ± 49% -0.0 0.06 ± 56% -0.0 0.05 ± 52% perf-profile.self.cycles-pp.scheduler_tick
0.02 ±209% -0.0 0.02 ±142% -0.0 0.01 ±300% perf-profile.self.cycles-pp.update_cfs_group
0.05 ± 43% -0.0 0.05 ± 57% -0.0 0.04 ± 67% perf-profile.self.cycles-pp.update_irq_load_avg
0.11 ± 49% -0.0 0.11 ± 29% -0.0 0.09 ± 23% perf-profile.self.cycles-pp.update_load_avg
0.09 ± 41% +0.0 0.09 ± 42% -0.0 0.08 ± 24% perf-profile.self.cycles-pp.task_tick_fair
0.00 +0.0 0.00 +0.0 0.02 ±300% perf-profile.self.cycles-pp.mwait_idle_with_hints
0.12 ± 27% +0.0 0.13 ± 36% -0.0 0.10 ± 45% perf-profile.self.cycles-pp.account_user_time
0.11 ± 17% +0.0 0.12 ± 20% -0.0 0.10 ± 15% perf-profile.self.cycles-pp.__handle_mm_fault
0.02 ±111% +0.0 0.03 ±112% +0.0 0.03 ±100% perf-profile.self.cycles-pp.irqtime_account_process_tick
0.06 ± 55% +0.0 0.06 ± 42% -0.0 0.04 ± 84% perf-profile.self.cycles-pp.ktime_get_update_offsets_now
0.08 ± 17% +0.0 0.08 ± 21% -0.0 0.07 ± 15% perf-profile.self.cycles-pp.page_vma_mapped_walk
0.00 +0.0 0.01 ±282% +0.0 0.00 perf-profile.self.cycles-pp.__free_one_page
0.02 ±141% +0.0 0.02 ±112% -0.0 0.01 ±300% perf-profile.self.cycles-pp.hrtimer_interrupt
0.06 ± 42% +0.0 0.07 ± 43% -0.0 0.06 ± 37% perf-profile.self.cycles-pp.update_process_times
0.07 ± 16% +0.0 0.08 ± 25% -0.0 0.07 ± 38% perf-profile.self.cycles-pp.tick_sched_do_timer
0.00 +0.0 0.01 ±187% +0.0 0.00 perf-profile.self.cycles-pp.can_change_pte_writable
0.00 +0.0 0.01 ±187% +0.0 0.00 perf-profile.self.cycles-pp.folio_migrate_flags
0.00 +0.0 0.01 ±188% +0.0 0.00 perf-profile.self.cycles-pp.try_to_migrate_one
0.02 ±191% +0.0 0.03 ± 90% -0.0 0.01 ±200% perf-profile.self.cycles-pp.__update_load_avg_se
0.19 ± 16% +0.0 0.20 ± 19% -0.0 0.17 ± 17% perf-profile.self.cycles-pp.down_read_trylock
0.03 ±113% +0.0 0.04 ± 71% -0.0 0.03 ±100% perf-profile.self.cycles-pp.update_rq_clock
0.09 ± 14% +0.0 0.10 ± 16% -0.0 0.08 ± 17% perf-profile.self.cycles-pp._raw_spin_lock
0.03 ±151% +0.0 0.04 ± 72% -0.0 0.02 ±123% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.01 ±282% +0.0 0.02 ±112% -0.0 0.00 perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.01 ±282% +0.0 0.02 ±112% -0.0 0.00 perf-profile.self.cycles-pp.rcu_pending
0.10 ± 16% +0.0 0.12 ± 29% -0.0 0.10 ± 26% perf-profile.self.cycles-pp.__cgroup_account_cputime_field
0.01 ±282% +0.0 0.02 ±112% -0.0 0.01 ±300% perf-profile.self.cycles-pp.__hrtimer_run_queues
0.16 ± 41% +0.0 0.18 ± 25% -0.0 0.14 ± 24% perf-profile.self.cycles-pp.update_curr
0.15 ± 14% +0.0 0.17 ± 21% -0.0 0.14 ± 15% perf-profile.self.cycles-pp.up_read
0.04 ± 94% +0.0 0.05 ± 56% -0.0 0.03 ±100% perf-profile.self.cycles-pp.ktime_get
0.07 ± 16% +0.0 0.10 ± 21% +0.0 0.08 ± 16% perf-profile.self.cycles-pp.__list_del_entry_valid
0.04 ± 91% +0.0 0.06 ± 38% +0.0 0.04 ± 66% perf-profile.self.cycles-pp.arch_scale_freq_tick
0.03 ±118% +0.0 0.05 ± 59% -0.0 0.03 ±101% perf-profile.self.cycles-pp.perf_trace_sched_stat_runtime
0.00 +0.0 0.03 ±113% +0.0 0.00 perf-profile.self.cycles-pp.page_counter_uncharge
0.09 ± 7% +0.0 0.11 ± 14% -0.0 0.08 ± 15% perf-profile.self.cycles-pp.sync_regs
0.02 ±111% +0.0 0.06 ± 15% -0.0 0.02 ±152% perf-profile.self.cycles-pp.change_pte_range
0.11 ± 20% +0.0 0.15 ± 15% -0.0 0.11 ± 11% perf-profile.self.cycles-pp.native_irq_return_iret
0.01 ±282% +0.1 0.06 ± 43% -0.0 0.01 ±299% perf-profile.self.cycles-pp.page_counter_charge
0.16 ± 15% +0.1 0.22 ± 21% -0.0 0.16 ± 16% perf-profile.self.cycles-pp.copy_page
0.06 ± 40% +0.1 0.13 ± 23% +0.0 0.06 ± 15% perf-profile.self.cycles-pp.__default_send_IPI_dest_field
0.07 ± 15% +0.1 0.16 ± 18% +0.0 0.07 ± 15% perf-profile.self.cycles-pp.__flush_smp_call_function_queue
0.00 +0.1 0.11 ± 19% +0.0 0.00 perf-profile.self.cycles-pp.native_flush_tlb_local
8.81 ± 9% +0.1 8.94 ± 2% -0.8 7.99 ± 16% perf-profile.self.cycles-pp.lrand48_r
0.06 ± 16% +0.3 0.33 ± 21% -0.0 0.06 ± 36% perf-profile.self.cycles-pp.flush_tlb_func
0.25 ± 14% +0.3 0.55 ± 18% -0.0 0.24 ± 15% perf-profile.self.cycles-pp.llist_reverse_order
13.38 ± 11% +0.3 13.71 -1.7 11.73 ± 21% perf-profile.self.cycles-pp.nrand48_r
0.35 ± 15% +0.4 0.76 ± 18% -0.0 0.34 ± 13% perf-profile.self.cycles-pp.llist_add_batch
0.37 ± 17% +0.7 1.10 ± 18% +0.0 0.38 ± 15% perf-profile.self.cycles-pp.smp_call_function_many_cond

--
Best Regards,
Yujie


2023-03-23 02:02:35

by Huang, Ying

[permalink] [raw]
Subject: Re: [linus:master] [migrate_pages] 7e12beb8ca: vm-scalability.throughput -3.4% regression

"Liu, Yujie" <[email protected]> writes:

> On Tue, 2023-03-21 at 13:43 +0800, Huang, Ying wrote:
>> "Liu, Yujie" <[email protected]> writes:
>>
>> > Hi Ying,
>> >
>> > On Mon, 2023-03-20 at 15:58 +0800, Huang, Ying wrote:
>> > > Hi, Yujie,
>> > >
>> > > kernel test robot <[email protected]> writes:
>> > >
>> > > > [...]
>> > >
>> > > Thanks a lot for the report! Can you check whether the debug patch
>> > > below restores the performance?
>> >
>> > We've tested the patch and found the throughput score was partially
>> > restored from -3.6% to -1.4%, still with a slight performance drop.
>> > Please check the detailed data as follows:
>>
>> Good! Thanks for your detailed data!
>>
>> > 0.09 ± 17% +1.2 1.32 ± 7% +0.4 0.45 ± 21% perf-profile.children.cycles-pp.flush_tlb_func
>>
>> It appears that the previous debug patch effectively reduces the
>> unnecessary TLB flushing. But the batched flush (a full TLB flush) is
>> still slower than the non-batched flush (which flushes a single page).
>>
>> Can you try the debug patch below to check whether it restores the
>> performance completely? The new debug patch applies on top of the
>> previous one.
>
> The second debug patch showed a -0.7% performance change. The data
> fluctuate from test to test, and the standard deviation is even a bit
> larger than 0.7%, which makes a difference of this size hard to
> distinguish from noise, so the throughput score alone is not very
> convincing. Please check the other metrics to see whether the
> regression is fully resolved. Thanks.

Thanks for testing!

> 0.09 ± 17% +0.4 0.45 ± 21% +0.0 0.09 ± 12% perf-profile.children.cycles-pp.flush_tlb_func

From the profiling data, the TLB flushing overhead has been restored
to its original level, so I think the remaining 0.7% regression is at
the noise level. I will prepare the fixing patch based on these test
results.
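
A minimal sketch of the shape that fix is expected to take, assuming
it simply carries the pte_accessible() guard of the first debug patch
forward into try_to_unmap_one() and try_to_migrate_one():

	pteval = ptep_get_and_clear(mm, address, pvmw.pte);

	/* Only queue a deferred, batched flush when the cleared PTE
	 * could still be cached in some CPU's TLB. */
	if (pte_accessible(mm, pteval))
		set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));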

Best Regards,
Huang, Ying