2020-10-04 13:34:41

by Chen, Rong A

Subject: [mm, thp] 85b9f46e8e: vm-scalability.throughput -8.7% regression

Greeting,

FYI, we noticed a -8.7% regression of vm-scalability.throughput due to commit:


commit: 85b9f46e8ea451633ccd60a7d8cacbfff9f34047 ("mm, thp: track fallbacks due to failed memcg charges separately")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: vm-scalability
on test machine: 104 threads Skylake with 192G memory
with following parameters:

runtime: 300s
size: 1T
test: lru-shm
cpufreq_governor: performance
ucode: 0x2006906

test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/



If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/debian-10.4-x86_64-20200603.cgz/300s/1T/lkp-skl-fpga01/lru-shm/vm-scalability/0x2006906

commit:
dcdf11ee14 ("mm, shmem: add vmstat for hugepage fallback")
85b9f46e8e ("mm, thp: track fallbacks due to failed memcg charges separately")

dcdf11ee14413332 85b9f46e8ea451633ccd60a7d8c
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
1:4 24% 2:4 perf-profile.calltrace.cycles-pp.sync_regs.error_entry.do_access
3:4 53% 5:4 perf-profile.calltrace.cycles-pp.error_entry.do_access
9:4 -27% 8:4 perf-profile.children.cycles-pp.error_entry
4:4 -10% 4:4 perf-profile.self.cycles-pp.error_entry
%stddev %change %stddev
\ | \
477291 -9.1% 434041 vm-scalability.median
49791027 -8.7% 45476799 vm-scalability.throughput
223.67 +1.6% 227.36 vm-scalability.time.elapsed_time
223.67 +1.6% 227.36 vm-scalability.time.elapsed_time.max
50364 ± 6% +24.1% 62482 ± 10% vm-scalability.time.involuntary_context_switches
2237 +7.8% 2412 vm-scalability.time.percent_of_cpu_this_job_got
3084 +18.2% 3646 vm-scalability.time.system_time
1921 -4.2% 1839 vm-scalability.time.user_time
13.68 +2.2 15.86 mpstat.cpu.all.sys%
28535 ± 30% -47.0% 15114 ± 79% numa-numastat.node0.other_node
142734 ± 11% -19.4% 115000 ± 17% numa-meminfo.node0.AnonPages
11168 ± 3% +8.8% 12150 ± 5% numa-meminfo.node1.PageTables
76.00 -1.6% 74.75 vmstat.cpu.id
3626 -1.9% 3555 vmstat.system.cs
2214928 ±166% -96.6% 75321 ± 7% cpuidle.C1.usage
200981 ± 7% -18.0% 164861 ± 7% cpuidle.POLL.time
52675 ± 3% -16.7% 43866 ± 10% cpuidle.POLL.usage
35659 ± 11% -19.4% 28754 ± 17% numa-vmstat.node0.nr_anon_pages
1248014 ± 3% +10.9% 1384236 numa-vmstat.node1.nr_mapped
2722 ± 4% +10.6% 3011 ± 5% numa-vmstat.node1.nr_page_table_pages
2562296 +7.0% 2741267 proc-vmstat.nr_mapped
5451 +4.7% 5707 proc-vmstat.nr_page_table_pages
5704 ± 4% +38.0% 7872 ± 6% proc-vmstat.pgactivate
12140 ± 3% -9.5% 10984 ± 6% softirqs.CPU13.RCU
12180 ± 4% -8.3% 11167 ± 2% softirqs.CPU39.RCU
11745 ± 6% -11.6% 10377 ± 4% softirqs.CPU65.RCU
13119 ± 9% -12.7% 11453 ± 4% softirqs.CPU71.RCU
12034 ± 5% -7.9% 11079 ± 3% softirqs.CPU73.RCU
13248 ± 15% -16.3% 11090 ± 4% softirqs.CPU88.RCU
24545 ± 3% -12.0% 21602 ± 9% softirqs.CPU91.SCHED
764.44 ± 9% -16.0% 641.83 ± 7% sched_debug.cfs_rq:/.load_avg.max
0.32 ± 32% -53.7% 0.15 ± 66% sched_debug.cfs_rq:/.nr_running.avg
9.00 ± 25% -51.1% 4.41 ± 85% sched_debug.cfs_rq:/.removed.load_avg.avg
9.00 ± 25% -51.0% 4.41 ± 85% sched_debug.cfs_rq:/.removed.runnable_avg.avg
487.45 ± 27% -42.1% 282.24 ± 33% sched_debug.cfs_rq:/.runnable_avg.avg
353.30 ± 31% -53.7% 163.58 ± 60% sched_debug.cfs_rq:/.util_avg.avg
23.88 ± 11% -30.4% 16.61 ± 14% sched_debug.cfs_rq:/.util_est_enqueued.avg
4.48 ± 41% -37.3% 2.81 ± 33% sched_debug.cpu.clock.stddev
2956 ± 38% -67.3% 965.56 ±126% sched_debug.cpu.curr->pid.avg
596164 ± 8% -12.7% 520369 sched_debug.cpu.max_idle_balance_cost.max
10556 ± 50% -80.1% 2099 ± 44% sched_debug.cpu.max_idle_balance_cost.stddev
0.00 ± 11% -25.2% 0.00 ± 19% sched_debug.cpu.next_balance.stddev
0.30 ± 35% -57.8% 0.13 ± 88% sched_debug.cpu.nr_running.avg
0.26 ± 10% -10.7% 0.24 ± 5% sched_debug.cpu.nr_running.stddev
1.485e+10 -1.0% 1.471e+10 perf-stat.i.branch-instructions
3607 -2.0% 3535 perf-stat.i.context-switches
1.13 ± 3% +5.6% 1.19 perf-stat.i.cpi
6.563e+10 +8.5% 7.123e+10 perf-stat.i.cpu-cycles
1217 +2.2% 1244 perf-stat.i.cycles-between-cache-misses
1.499e+10 -0.9% 1.485e+10 perf-stat.i.dTLB-loads
4.056e+09 -1.3% 4.002e+09 perf-stat.i.dTLB-stores
0.91 ± 4% -5.4% 0.86 ± 2% perf-stat.i.ipc
0.63 +8.4% 0.68 perf-stat.i.metric.GHz
2306374 -1.4% 2274501 perf-stat.i.minor-faults
2061405 ± 4% +14.3% 2355494 ± 3% perf-stat.i.node-load-misses
1283824 ± 4% +5.7% 1356763 perf-stat.i.node-loads
39.74 ± 3% -2.5 37.25 ± 3% perf-stat.i.node-store-miss-rate%
8640344 -2.0% 8468484 perf-stat.i.node-stores
2306374 -1.4% 2274501 perf-stat.i.page-faults
1.24 +9.5% 1.36 perf-stat.overall.cpi
1187 +8.0% 1282 perf-stat.overall.cycles-between-cache-misses
0.80 -8.7% 0.73 perf-stat.overall.ipc
1.486e+10 -1.2% 1.468e+10 perf-stat.ps.branch-instructions
3598 -2.1% 3523 perf-stat.ps.context-switches
6.566e+10 +8.3% 7.111e+10 perf-stat.ps.cpu-cycles
1.499e+10 -1.1% 1.482e+10 perf-stat.ps.dTLB-loads
4.055e+09 -1.5% 3.993e+09 perf-stat.ps.dTLB-stores
5.282e+10 -1.1% 5.223e+10 perf-stat.ps.instructions
2308446 -1.6% 2270962 perf-stat.ps.minor-faults
2058616 ± 4% +14.1% 2349536 ± 3% perf-stat.ps.node-load-misses
1283042 ± 4% +5.5% 1353397 perf-stat.ps.node-loads
8648214 -2.2% 8455419 perf-stat.ps.node-stores
2308446 -1.6% 2270962 perf-stat.ps.page-faults
1473 ± 10% +46.2% 2154 ± 5% interrupts.CPU0.NMI:Non-maskable_interrupts
1473 ± 10% +46.2% 2154 ± 5% interrupts.CPU0.PMI:Performance_monitoring_interrupts
524.00 +8.0% 565.75 ± 5% interrupts.CPU1.CAL:Function_call_interrupts
2593 ± 82% -85.7% 372.00 ± 97% interrupts.CPU1.RES:Rescheduling_interrupts
1282 ± 5% +92.5% 2467 ± 26% interrupts.CPU102.NMI:Non-maskable_interrupts
1282 ± 5% +92.5% 2467 ± 26% interrupts.CPU102.PMI:Performance_monitoring_interrupts
1475 ± 16% +41.9% 2093 interrupts.CPU103.NMI:Non-maskable_interrupts
1475 ± 16% +41.9% 2093 interrupts.CPU103.PMI:Performance_monitoring_interrupts
736.50 ±111% -79.8% 148.75 ± 17% interrupts.CPU13.RES:Rescheduling_interrupts
1030 ± 34% +122.5% 2292 ± 10% interrupts.CPU17.NMI:Non-maskable_interrupts
1030 ± 34% +122.5% 2292 ± 10% interrupts.CPU17.PMI:Performance_monitoring_interrupts
1189 ± 25% +73.2% 2059 ± 4% interrupts.CPU18.NMI:Non-maskable_interrupts
1189 ± 25% +73.2% 2059 ± 4% interrupts.CPU18.PMI:Performance_monitoring_interrupts
1024 ± 34% +90.0% 1946 ± 8% interrupts.CPU22.NMI:Non-maskable_interrupts
1024 ± 34% +90.0% 1946 ± 8% interrupts.CPU22.PMI:Performance_monitoring_interrupts
144.50 ± 35% +67.5% 242.00 ± 37% interrupts.CPU22.RES:Rescheduling_interrupts
1244 ± 23% +80.3% 2244 ± 16% interrupts.CPU23.NMI:Non-maskable_interrupts
1244 ± 23% +80.3% 2244 ± 16% interrupts.CPU23.PMI:Performance_monitoring_interrupts
1435 ± 6% +62.5% 2332 ± 27% interrupts.CPU25.NMI:Non-maskable_interrupts
1435 ± 6% +62.5% 2332 ± 27% interrupts.CPU25.PMI:Performance_monitoring_interrupts
1365 ± 10% +56.4% 2135 ± 3% interrupts.CPU28.NMI:Non-maskable_interrupts
1365 ± 10% +56.4% 2135 ± 3% interrupts.CPU28.PMI:Performance_monitoring_interrupts
1144 ± 25% +84.9% 2115 ± 2% interrupts.CPU30.NMI:Non-maskable_interrupts
1144 ± 25% +84.9% 2115 ± 2% interrupts.CPU30.PMI:Performance_monitoring_interrupts
1288 ± 5% +139.8% 3088 ± 56% interrupts.CPU35.NMI:Non-maskable_interrupts
1288 ± 5% +139.8% 3088 ± 56% interrupts.CPU35.PMI:Performance_monitoring_interrupts
322.50 ± 36% +1361.1% 4712 ± 89% interrupts.CPU36.RES:Rescheduling_interrupts
1100 ± 21% +87.1% 2059 ± 2% interrupts.CPU37.NMI:Non-maskable_interrupts
1100 ± 21% +87.1% 2059 ± 2% interrupts.CPU37.PMI:Performance_monitoring_interrupts
968.75 ± 35% +113.9% 2072 interrupts.CPU38.NMI:Non-maskable_interrupts
968.75 ± 35% +113.9% 2072 interrupts.CPU38.PMI:Performance_monitoring_interrupts
10.50 ± 76% +440.5% 56.75 ± 62% interrupts.CPU4.TLB:TLB_shootdowns
1146 ± 28% +81.4% 2079 interrupts.CPU45.NMI:Non-maskable_interrupts
1146 ± 28% +81.4% 2079 interrupts.CPU45.PMI:Performance_monitoring_interrupts
1295 ± 5% +59.8% 2070 interrupts.CPU46.NMI:Non-maskable_interrupts
1295 ± 5% +59.8% 2070 interrupts.CPU46.PMI:Performance_monitoring_interrupts
828.75 ± 40% +175.4% 2282 ± 18% interrupts.CPU47.NMI:Non-maskable_interrupts
828.75 ± 40% +175.4% 2282 ± 18% interrupts.CPU47.PMI:Performance_monitoring_interrupts
1064 ± 41% +92.8% 2053 interrupts.CPU48.NMI:Non-maskable_interrupts
1064 ± 41% +92.8% 2053 interrupts.CPU48.PMI:Performance_monitoring_interrupts
1324 ± 36% +54.9% 2052 ± 2% interrupts.CPU51.NMI:Non-maskable_interrupts
1324 ± 36% +54.9% 2052 ± 2% interrupts.CPU51.PMI:Performance_monitoring_interrupts
1312 ± 28% +81.8% 2386 ± 18% interrupts.CPU52.NMI:Non-maskable_interrupts
1312 ± 28% +81.8% 2386 ± 18% interrupts.CPU52.PMI:Performance_monitoring_interrupts
1407 ± 6% +34.2% 1889 ± 9% interrupts.CPU54.NMI:Non-maskable_interrupts
1407 ± 6% +34.2% 1889 ± 9% interrupts.CPU54.PMI:Performance_monitoring_interrupts
1419 ± 3% +35.4% 1921 ± 9% interrupts.CPU58.NMI:Non-maskable_interrupts
1419 ± 3% +35.4% 1921 ± 9% interrupts.CPU58.PMI:Performance_monitoring_interrupts
1357 ± 2% +55.6% 2111 ± 11% interrupts.CPU59.NMI:Non-maskable_interrupts
1357 ± 2% +55.6% 2111 ± 11% interrupts.CPU59.PMI:Performance_monitoring_interrupts
1206 ± 26% +79.9% 2171 ± 17% interrupts.CPU60.NMI:Non-maskable_interrupts
1206 ± 26% +79.9% 2171 ± 17% interrupts.CPU60.PMI:Performance_monitoring_interrupts
1388 ± 2% +43.6% 1994 ± 11% interrupts.CPU61.NMI:Non-maskable_interrupts
1388 ± 2% +43.6% 1994 ± 11% interrupts.CPU61.PMI:Performance_monitoring_interrupts
1407 ± 5% +46.7% 2065 ± 7% interrupts.CPU64.NMI:Non-maskable_interrupts
1407 ± 5% +46.7% 2065 ± 7% interrupts.CPU64.PMI:Performance_monitoring_interrupts
1214 ± 25% +66.2% 2017 ± 14% interrupts.CPU65.NMI:Non-maskable_interrupts
1214 ± 25% +66.2% 2017 ± 14% interrupts.CPU65.PMI:Performance_monitoring_interrupts
1212 ± 29% +69.6% 2055 ± 14% interrupts.CPU67.NMI:Non-maskable_interrupts
1212 ± 29% +69.6% 2055 ± 14% interrupts.CPU67.PMI:Performance_monitoring_interrupts
1202 ± 25% +124.4% 2698 ± 23% interrupts.CPU69.NMI:Non-maskable_interrupts
1202 ± 25% +124.4% 2698 ± 23% interrupts.CPU69.PMI:Performance_monitoring_interrupts
1058 ± 37% +114.5% 2269 ± 24% interrupts.CPU7.NMI:Non-maskable_interrupts
1058 ± 37% +114.5% 2269 ± 24% interrupts.CPU7.PMI:Performance_monitoring_interrupts
1359 ± 2% +44.6% 1965 ± 9% interrupts.CPU70.NMI:Non-maskable_interrupts
1359 ± 2% +44.6% 1965 ± 9% interrupts.CPU70.PMI:Performance_monitoring_interrupts
1218 ± 24% +67.2% 2037 ± 10% interrupts.CPU72.NMI:Non-maskable_interrupts
1218 ± 24% +67.2% 2037 ± 10% interrupts.CPU72.PMI:Performance_monitoring_interrupts
139.75 ± 23% -31.3% 96.00 ± 11% interrupts.CPU74.RES:Rescheduling_interrupts
1182 ± 24% +97.2% 2331 ± 25% interrupts.CPU8.NMI:Non-maskable_interrupts
1182 ± 24% +97.2% 2331 ± 25% interrupts.CPU8.PMI:Performance_monitoring_interrupts
219.75 ± 61% -50.1% 109.75 ± 12% interrupts.CPU8.RES:Rescheduling_interrupts
1362 ± 3% +47.7% 2012 ± 6% interrupts.CPU9.NMI:Non-maskable_interrupts
1362 ± 3% +47.7% 2012 ± 6% interrupts.CPU9.PMI:Performance_monitoring_interrupts
260.00 ± 30% +1584.2% 4379 ±158% interrupts.CPU90.RES:Rescheduling_interrupts
203.25 ± 28% +133.7% 475.00 ± 65% interrupts.CPU94.RES:Rescheduling_interrupts
1137 ± 24% +80.0% 2047 ± 2% interrupts.CPU98.NMI:Non-maskable_interrupts
1137 ± 24% +80.0% 2047 ± 2% interrupts.CPU98.PMI:Performance_monitoring_interrupts
142681 ± 5% +43.0% 204005 ± 6% interrupts.NMI:Non-maskable_interrupts
142681 ± 5% +43.0% 204005 ± 6% interrupts.PMI:Performance_monitoring_interrupts
29.89 ± 84% -16.4 13.46 ±173% perf-profile.calltrace.cycles-pp.page_fault
29.52 ± 84% -16.2 13.31 ±173% perf-profile.calltrace.cycles-pp.do_user_addr_fault.page_fault
28.76 ± 84% -15.7 13.04 ±173% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.page_fault
27.92 ± 84% -15.2 12.72 ±173% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.page_fault
14.44 ± 14% -7.0 7.39 ± 12% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
14.44 ± 14% -7.0 7.39 ± 12% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
14.43 ± 14% -7.0 7.39 ± 12% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
14.57 ± 14% -7.0 7.57 ± 14% perf-profile.calltrace.cycles-pp.secondary_startup_64
13.60 ± 15% -6.7 6.87 ± 11% perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
13.40 ± 15% -6.6 6.76 ± 11% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary
10.98 ± 18% -5.7 5.33 ± 9% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
5.24 ± 18% -2.8 2.43 ± 44% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.unlinkat
5.24 ± 18% -2.8 2.43 ± 44% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
5.24 ± 18% -2.8 2.43 ± 44% perf-profile.calltrace.cycles-pp.evict.do_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
5.24 ± 18% -2.8 2.43 ± 44% perf-profile.calltrace.cycles-pp.unlinkat
5.24 ± 18% -2.8 2.43 ± 44% perf-profile.calltrace.cycles-pp.do_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
5.24 ± 18% -2.5 2.73 ± 32% perf-profile.calltrace.cycles-pp.shmem_evict_inode.evict.do_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.24 ± 18% -2.5 2.73 ± 32% perf-profile.calltrace.cycles-pp.shmem_truncate_range.shmem_evict_inode.evict.do_unlinkat.do_syscall_64
5.23 ± 18% -2.5 2.72 ± 32% perf-profile.calltrace.cycles-pp.shmem_undo_range.shmem_truncate_range.shmem_evict_inode.evict.do_unlinkat
4.93 ± 14% -2.4 2.54 ± 2% perf-profile.calltrace.cycles-pp.mem_cgroup_try_charge_delay.shmem_getpage_gfp.shmem_fault.__do_fault.do_fault
4.79 ± 14% -2.4 2.42 ± 3% perf-profile.calltrace.cycles-pp.mem_cgroup_try_charge.mem_cgroup_try_charge_delay.shmem_getpage_gfp.shmem_fault.__do_fault
2.49 ± 19% -1.3 1.16 ± 4% perf-profile.calltrace.cycles-pp.get_mem_cgroup_from_mm.mem_cgroup_try_charge.mem_cgroup_try_charge_delay.shmem_getpage_gfp.shmem_fault
2.32 ± 18% -1.3 1.06 ± 45% perf-profile.calltrace.cycles-pp.__pagevec_release.shmem_undo_range.shmem_truncate_range.shmem_evict_inode.evict
2.30 ± 18% -1.3 1.04 ± 45% perf-profile.calltrace.cycles-pp.release_pages.__pagevec_release.shmem_undo_range.shmem_truncate_range.shmem_evict_inode
2.03 ± 25% -1.2 0.87 ± 45% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
2.03 ± 25% -1.2 0.87 ± 45% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.96 ± 16% -1.0 0.93 ± 42% perf-profile.calltrace.cycles-pp.truncate_inode_page.shmem_undo_range.shmem_truncate_range.shmem_evict_inode.evict
1.69 ± 17% -1.0 0.68 ± 69% perf-profile.calltrace.cycles-pp.delete_from_page_cache.truncate_inode_page.shmem_undo_range.shmem_truncate_range.shmem_evict_inode
2.35 ± 15% -1.0 1.38 ± 18% perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
2.13 ± 16% -0.9 1.25 ± 19% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle
1.19 ± 17% -0.8 0.37 ±108% perf-profile.calltrace.cycles-pp.__delete_from_page_cache.delete_from_page_cache.truncate_inode_page.shmem_undo_range.shmem_truncate_range
2.67 ± 17% -0.8 1.85 ± 20% perf-profile.calltrace.cycles-pp.write
2.66 ± 17% -0.8 1.85 ± 20% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
2.66 ± 17% -0.8 1.85 ± 20% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
2.66 ± 17% -0.8 1.85 ± 20% perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
2.66 ± 17% -0.8 1.85 ± 20% perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
2.63 ± 17% -0.8 1.83 ± 20% perf-profile.calltrace.cycles-pp.vprintk_emit.devkmsg_emit.devkmsg_write.cold.new_sync_write.vfs_write
2.65 ± 17% -0.8 1.85 ± 20% perf-profile.calltrace.cycles-pp.new_sync_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.62 ± 17% -0.8 1.82 ± 20% perf-profile.calltrace.cycles-pp.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write.cold.new_sync_write
2.63 ± 17% -0.8 1.83 ± 20% perf-profile.calltrace.cycles-pp.devkmsg_write.cold.new_sync_write.vfs_write.ksys_write.do_syscall_64
2.63 ± 17% -0.8 1.83 ± 20% perf-profile.calltrace.cycles-pp.devkmsg_emit.devkmsg_write.cold.new_sync_write.vfs_write.ksys_write
2.08 ± 15% -0.7 1.39 ± 19% perf-profile.calltrace.cycles-pp.drm_fb_helper_dirty_work.process_one_work.worker_thread.kthread.ret_from_fork
2.10 ± 15% -0.7 1.42 ± 19% perf-profile.calltrace.cycles-pp.ret_from_fork
2.10 ± 15% -0.7 1.42 ± 19% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
2.15 ± 17% -0.7 1.46 ± 20% perf-profile.calltrace.cycles-pp.serial8250_console_write.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write.cold
2.08 ± 15% -0.7 1.40 ± 19% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
2.08 ± 15% -0.7 1.40 ± 19% perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork
5.65 ± 14% -0.7 4.97 ± 12% perf-profile.calltrace.cycles-pp.filemap_map_pages.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
5.21 ± 14% -0.7 4.53 ± 12% perf-profile.calltrace.cycles-pp.clear_page_erms.shmem_getpage_gfp.shmem_fault.__do_fault.do_fault
2.01 ± 16% -0.7 1.35 ± 19% perf-profile.calltrace.cycles-pp.memcpy_erms.drm_fb_helper_dirty_work.process_one_work.worker_thread.kthread
2.01 ± 17% -0.6 1.38 ± 20% perf-profile.calltrace.cycles-pp.uart_console_write.serial8250_console_write.console_unlock.vprintk_emit.devkmsg_emit
1.83 ± 16% -0.6 1.26 ± 20% perf-profile.calltrace.cycles-pp.wait_for_xmitr.serial8250_console_putchar.uart_console_write.serial8250_console_write.console_unlock
1.83 ± 16% -0.6 1.26 ± 20% perf-profile.calltrace.cycles-pp.serial8250_console_putchar.uart_console_write.serial8250_console_write.console_unlock.vprintk_emit
1.31 ± 17% -0.5 0.78 ± 19% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
1.45 ± 16% -0.5 0.98 ± 20% perf-profile.calltrace.cycles-pp.io_serial_in.wait_for_xmitr.serial8250_console_putchar.uart_console_write.serial8250_console_write
13.12 ± 15% +10.7 23.87 ± 10% perf-profile.calltrace.cycles-pp.__lru_cache_add.shmem_getpage_gfp.shmem_fault.__do_fault.do_fault
10.93 ± 15% +10.8 21.78 ± 10% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.pagevec_lru_move_fn.__lru_cache_add.shmem_getpage_gfp.shmem_fault
10.87 ± 15% +10.9 21.73 ± 10% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.pagevec_lru_move_fn.__lru_cache_add.shmem_getpage_gfp
12.81 ± 15% +10.9 23.73 ± 10% perf-profile.calltrace.cycles-pp.pagevec_lru_move_fn.__lru_cache_add.shmem_getpage_gfp.shmem_fault.__do_fault
14.44 ± 14% -7.0 7.39 ± 12% perf-profile.children.cycles-pp.start_secondary
14.57 ± 14% -7.0 7.57 ± 14% perf-profile.children.cycles-pp.secondary_startup_64
14.57 ± 14% -7.0 7.57 ± 14% perf-profile.children.cycles-pp.cpu_startup_entry
14.57 ± 14% -7.0 7.57 ± 14% perf-profile.children.cycles-pp.do_idle
13.72 ± 15% -6.7 7.05 ± 13% perf-profile.children.cycles-pp.cpuidle_enter
13.71 ± 15% -6.7 7.05 ± 13% perf-profile.children.cycles-pp.cpuidle_enter_state
11.07 ± 17% -5.6 5.48 ± 12% perf-profile.children.cycles-pp.intel_idle
10.18 ± 18% -3.7 6.52 ± 18% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
10.18 ± 18% -3.7 6.52 ± 18% perf-profile.children.cycles-pp.do_syscall_64
5.24 ± 18% -2.8 2.43 ± 44% perf-profile.children.cycles-pp.unlinkat
5.24 ± 18% -2.4 2.83 ± 29% perf-profile.children.cycles-pp.evict
5.24 ± 18% -2.4 2.83 ± 29% perf-profile.children.cycles-pp.do_unlinkat
5.24 ± 18% -2.4 2.83 ± 29% perf-profile.children.cycles-pp.shmem_evict_inode
5.24 ± 18% -2.4 2.83 ± 29% perf-profile.children.cycles-pp.shmem_truncate_range
5.24 ± 18% -2.4 2.83 ± 29% perf-profile.children.cycles-pp.shmem_undo_range
5.05 ± 13% -2.4 2.68 ± 6% perf-profile.children.cycles-pp.mem_cgroup_try_charge_delay
4.89 ± 13% -2.4 2.53 ± 6% perf-profile.children.cycles-pp.mem_cgroup_try_charge
2.64 ± 15% -1.4 1.22 ± 5% perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
3.23 ± 16% -1.2 2.02 ± 17% perf-profile.children.cycles-pp.apic_timer_interrupt
2.63 ± 17% -1.1 1.49 ± 27% perf-profile.children.cycles-pp.release_pages
2.90 ± 17% -1.1 1.79 ± 17% perf-profile.children.cycles-pp.smp_apic_timer_interrupt
2.32 ± 18% -1.1 1.24 ± 30% perf-profile.children.cycles-pp.__pagevec_release
1.98 ± 17% -0.9 1.09 ± 28% perf-profile.children.cycles-pp.truncate_inode_page
5.49 ± 14% -0.9 4.61 ± 12% perf-profile.children.cycles-pp.clear_page_erms
2.75 ± 18% -0.9 1.90 ± 20% perf-profile.children.cycles-pp.console_unlock
2.74 ± 17% -0.8 1.90 ± 20% perf-profile.children.cycles-pp.vprintk_emit
2.67 ± 17% -0.8 1.85 ± 20% perf-profile.children.cycles-pp.write
2.67 ± 17% -0.8 1.86 ± 20% perf-profile.children.cycles-pp.ksys_write
2.66 ± 17% -0.8 1.86 ± 20% perf-profile.children.cycles-pp.vfs_write
5.90 ± 13% -0.8 5.10 ± 12% perf-profile.children.cycles-pp.filemap_map_pages
2.66 ± 17% -0.8 1.85 ± 20% perf-profile.children.cycles-pp.new_sync_write
2.63 ± 17% -0.8 1.83 ± 20% perf-profile.children.cycles-pp.devkmsg_write.cold
2.63 ± 17% -0.8 1.83 ± 20% perf-profile.children.cycles-pp.devkmsg_emit
1.71 ± 17% -0.8 0.94 ± 28% perf-profile.children.cycles-pp.delete_from_page_cache
1.95 ± 19% -0.7 1.22 ± 17% perf-profile.children.cycles-pp.hrtimer_interrupt
2.25 ± 18% -0.7 1.52 ± 20% perf-profile.children.cycles-pp.serial8250_console_write
2.08 ± 15% -0.7 1.39 ± 19% perf-profile.children.cycles-pp.memcpy_erms
2.08 ± 15% -0.7 1.39 ± 19% perf-profile.children.cycles-pp.drm_fb_helper_dirty_work
2.10 ± 15% -0.7 1.42 ± 19% perf-profile.children.cycles-pp.ret_from_fork
2.10 ± 15% -0.7 1.42 ± 19% perf-profile.children.cycles-pp.kthread
2.08 ± 15% -0.7 1.40 ± 19% perf-profile.children.cycles-pp.worker_thread
2.08 ± 15% -0.7 1.40 ± 19% perf-profile.children.cycles-pp.process_one_work
2.10 ± 18% -0.7 1.44 ± 20% perf-profile.children.cycles-pp.uart_console_write
2.05 ± 17% -0.7 1.40 ± 20% perf-profile.children.cycles-pp.wait_for_xmitr
1.92 ± 17% -0.6 1.32 ± 20% perf-profile.children.cycles-pp.serial8250_console_putchar
3.44 ± 14% -0.6 2.89 ± 10% perf-profile.children.cycles-pp.shmem_alloc_and_acct_page
1.18 ± 19% -0.6 0.63 ± 31% perf-profile.children.cycles-pp.free_unref_page_list
1.23 ± 17% -0.6 0.68 ± 29% perf-profile.children.cycles-pp.__delete_from_page_cache
1.62 ± 17% -0.5 1.09 ± 20% perf-profile.children.cycles-pp.io_serial_in
1.21 ± 21% -0.5 0.76 ± 19% perf-profile.children.cycles-pp.__hrtimer_run_queues
4.10 ± 14% -0.4 3.67 ± 11% perf-profile.children.cycles-pp.swapgs_restore_regs_and_return_to_usermode
2.52 ± 13% -0.4 2.11 ± 11% perf-profile.children.cycles-pp.shmem_alloc_page
0.82 ± 18% -0.4 0.44 ± 32% perf-profile.children.cycles-pp.free_pcppages_bulk
2.25 ± 13% -0.4 1.88 ± 11% perf-profile.children.cycles-pp.alloc_pages_vma
0.90 ± 21% -0.3 0.55 ± 17% perf-profile.children.cycles-pp.tick_sched_timer
1.98 ± 13% -0.3 1.65 ± 10% perf-profile.children.cycles-pp.__alloc_pages_nodemask
0.74 ± 13% -0.3 0.42 ± 12% perf-profile.children.cycles-pp.irq_exit
0.92 ± 15% -0.3 0.60 ± 20% perf-profile.children.cycles-pp.xas_store
0.72 ± 16% -0.3 0.41 ± 11% perf-profile.children.cycles-pp.ktime_get
0.61 ± 18% -0.3 0.33 ± 36% perf-profile.children.cycles-pp.__free_one_page
0.74 ± 21% -0.3 0.47 ± 21% perf-profile.children.cycles-pp.tick_sched_handle
0.72 ± 20% -0.3 0.45 ± 19% perf-profile.children.cycles-pp.update_process_times
1.60 ± 15% -0.3 1.33 ± 13% perf-profile.children.cycles-pp.shmem_add_to_page_cache
0.66 ± 14% -0.3 0.40 ± 21% perf-profile.children.cycles-pp.menu_select
1.51 ± 13% -0.2 1.26 ± 9% perf-profile.children.cycles-pp.get_page_from_freelist
0.47 ± 20% -0.2 0.25 ± 28% perf-profile.children.cycles-pp.find_get_entries
1.04 ± 15% -0.2 0.83 ± 17% perf-profile.children.cycles-pp.xas_load
0.54 ± 15% -0.2 0.33 ± 25% perf-profile.children.cycles-pp.__list_del_entry_valid
0.52 ± 16% -0.2 0.32 ± 13% perf-profile.children.cycles-pp.clockevents_program_event
1.23 ± 12% -0.2 1.03 ± 10% perf-profile.children.cycles-pp.rmqueue
0.44 ± 14% -0.2 0.25 ± 17% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.99 ± 14% -0.2 0.80 ± 17% perf-profile.children.cycles-pp.unlock_page
0.39 ± 14% -0.2 0.21 ± 27% perf-profile.children.cycles-pp.xas_init_marks
0.44 ± 12% -0.2 0.26 ± 19% perf-profile.children.cycles-pp.__softirqentry_text_start
0.38 ± 13% -0.2 0.21 ± 15% perf-profile.children.cycles-pp.tick_nohz_next_event
1.21 ± 11% -0.2 1.04 ± 12% perf-profile.children.cycles-pp.sync_regs
0.63 ± 17% -0.2 0.47 ± 15% perf-profile.children.cycles-pp.__mod_node_page_state
0.41 ± 16% -0.1 0.28 ± 16% perf-profile.children.cycles-pp.scheduler_tick
0.28 ± 14% -0.1 0.15 ± 30% perf-profile.children.cycles-pp.xas_clear_mark
0.51 ± 17% -0.1 0.37 ± 21% perf-profile.children.cycles-pp.vt_console_print
0.49 ± 17% -0.1 0.36 ± 21% perf-profile.children.cycles-pp.fbcon_redraw
0.73 ± 13% -0.1 0.60 ± 10% perf-profile.children.cycles-pp.rmqueue_bulk
0.49 ± 17% -0.1 0.37 ± 21% perf-profile.children.cycles-pp.lf
0.49 ± 17% -0.1 0.37 ± 21% perf-profile.children.cycles-pp.con_scroll
0.49 ± 17% -0.1 0.37 ± 21% perf-profile.children.cycles-pp.fbcon_scroll
1.21 ± 11% -0.1 1.08 ± 10% perf-profile.children.cycles-pp.__count_memcg_events
0.27 ± 20% -0.1 0.15 ± 31% perf-profile.children.cycles-pp.mem_cgroup_uncharge_list
0.48 ± 17% -0.1 0.36 ± 21% perf-profile.children.cycles-pp.fbcon_putcs
0.48 ± 18% -0.1 0.36 ± 20% perf-profile.children.cycles-pp.bit_putcs
0.44 ± 18% -0.1 0.33 ± 20% perf-profile.children.cycles-pp.drm_fb_helper_sys_imageblit
0.44 ± 18% -0.1 0.33 ± 20% perf-profile.children.cycles-pp.sys_imageblit
0.21 ± 10% -0.1 0.10 ± 25% perf-profile.children.cycles-pp.tick_nohz_irq_exit
0.74 ± 14% -0.1 0.63 ± 14% perf-profile.children.cycles-pp.__perf_sw_event
0.22 ± 20% -0.1 0.12 ± 25% perf-profile.children.cycles-pp.unaccount_page_cache_page
0.78 ± 15% -0.1 0.68 ± 14% perf-profile.children.cycles-pp.xas_find
0.58 ± 13% -0.1 0.48 ± 14% perf-profile.children.cycles-pp.___perf_sw_event
0.48 ± 11% -0.1 0.39 ± 10% perf-profile.children.cycles-pp._raw_spin_lock
0.12 ± 25% -0.1 0.04 ±106% perf-profile.children.cycles-pp.timekeeping_max_deferment
0.34 ± 21% -0.1 0.27 ± 20% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.17 ± 13% -0.1 0.10 ± 19% perf-profile.children.cycles-pp.rebalance_domains
0.20 ± 12% -0.1 0.12 ± 26% perf-profile.children.cycles-pp.get_next_timer_interrupt
0.15 ± 35% -0.1 0.08 ± 23% perf-profile.children.cycles-pp.irq_work_run_list
0.16 ± 18% -0.1 0.09 ± 27% perf-profile.children.cycles-pp.uncharge_batch
0.19 ± 22% -0.1 0.12 ± 20% perf-profile.children.cycles-pp.io_serial_out
0.17 ± 16% -0.1 0.11 ± 19% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.22 ± 19% -0.1 0.15 ± 29% perf-profile.children.cycles-pp.page_mapping
0.13 ± 20% -0.1 0.07 ± 26% perf-profile.children.cycles-pp.page_cache_free_page
0.26 ± 20% -0.1 0.20 ± 14% perf-profile.children.cycles-pp.percpu_counter_add_batch
0.09 ± 23% -0.1 0.03 ±100% perf-profile.children.cycles-pp.free_unref_page_prepare
0.20 ± 16% -0.1 0.15 ± 15% perf-profile.children.cycles-pp.__list_add_valid
0.17 ± 16% -0.1 0.11 ± 13% perf-profile.children.cycles-pp.task_tick_fair
0.12 ± 23% -0.1 0.07 ± 23% perf-profile.children.cycles-pp.free_unref_page_commit
0.14 ± 19% -0.1 0.08 ± 13% perf-profile.children.cycles-pp.irqtime_account_irq
0.13 ± 19% -0.1 0.08 ± 15% perf-profile.children.cycles-pp.lapic_next_deadline
0.09 ± 24% -0.1 0.04 ± 58% perf-profile.children.cycles-pp.read
0.10 ± 21% -0.1 0.04 ± 63% perf-profile.children.cycles-pp.uncharge_page
0.10 ± 22% -0.0 0.05 ± 64% perf-profile.children.cycles-pp.rcu_sched_clock_irq
0.14 ± 15% -0.0 0.09 ± 24% perf-profile.children.cycles-pp.__next_timer_interrupt
0.10 ± 19% -0.0 0.05 ± 62% perf-profile.children.cycles-pp._find_next_bit
0.26 ± 18% -0.0 0.21 ± 18% perf-profile.children.cycles-pp.xas_start
0.15 ± 15% -0.0 0.10 ± 25% perf-profile.children.cycles-pp.__mod_zone_page_state
0.12 ± 18% -0.0 0.08 ± 26% perf-profile.children.cycles-pp.read_tsc
0.14 ± 18% -0.0 0.10 ± 23% perf-profile.children.cycles-pp.__indirect_thunk_start
0.10 ± 18% -0.0 0.06 ± 26% perf-profile.children.cycles-pp.load_balance
0.11 ± 27% -0.0 0.07 ± 17% perf-profile.children.cycles-pp.irq_work_interrupt
0.11 ± 27% -0.0 0.07 ± 17% perf-profile.children.cycles-pp.smp_irq_work_interrupt
0.11 ± 27% -0.0 0.07 ± 17% perf-profile.children.cycles-pp.irq_work_run
0.11 ± 27% -0.0 0.07 ± 17% perf-profile.children.cycles-pp.printk
0.15 ± 22% -0.0 0.11 ± 7% perf-profile.children.cycles-pp.PageHuge
0.09 ± 19% -0.0 0.06 ± 20% perf-profile.children.cycles-pp.rcu_core
0.10 ± 15% -0.0 0.07 ± 24% perf-profile.children.cycles-pp.sched_clock
0.16 ± 12% -0.0 0.14 ± 13% perf-profile.children.cycles-pp.___might_sleep
0.09 ± 7% +0.2 0.27 ± 17% perf-profile.children.cycles-pp.mem_cgroup_page_lruvec
0.00 +1.2 1.21 ± 49% perf-profile.children.cycles-pp.__munmap
11.33 ± 15% +10.6 21.96 ± 10% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
13.21 ± 15% +10.7 23.88 ± 10% perf-profile.children.cycles-pp.__lru_cache_add
11.16 ± 15% +10.7 21.87 ± 10% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
13.04 ± 15% +10.7 23.76 ± 10% perf-profile.children.cycles-pp.pagevec_lru_move_fn
11.07 ± 17% -5.6 5.48 ± 12% perf-profile.self.cycles-pp.intel_idle
2.62 ± 15% -1.4 1.21 ± 6% perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
6.22 ± 14% -1.0 5.24 ± 12% perf-profile.self.cycles-pp.shmem_getpage_gfp
5.45 ± 14% -0.9 4.58 ± 12% perf-profile.self.cycles-pp.clear_page_erms
1.35 ± 13% -0.8 0.52 ± 6% perf-profile.self.cycles-pp.mem_cgroup_try_charge
2.06 ± 15% -0.7 1.38 ± 19% perf-profile.self.cycles-pp.memcpy_erms
3.95 ± 12% -0.5 3.40 ± 11% perf-profile.self.cycles-pp.filemap_map_pages
1.62 ± 17% -0.5 1.09 ± 20% perf-profile.self.cycles-pp.io_serial_in
0.62 ± 15% -0.3 0.34 ± 10% perf-profile.self.cycles-pp.ktime_get
0.66 ± 16% -0.2 0.41 ± 20% perf-profile.self.cycles-pp.release_pages
0.52 ± 15% -0.2 0.32 ± 26% perf-profile.self.cycles-pp.__list_del_entry_valid
0.41 ± 21% -0.2 0.22 ± 27% perf-profile.self.cycles-pp.find_get_entries
0.42 ± 17% -0.2 0.23 ± 35% perf-profile.self.cycles-pp.__free_one_page
1.22 ± 13% -0.2 1.03 ± 13% perf-profile.self.cycles-pp.swapgs_restore_regs_and_return_to_usermode
0.93 ± 13% -0.2 0.75 ± 17% perf-profile.self.cycles-pp.unlock_page
1.20 ± 12% -0.2 1.03 ± 12% perf-profile.self.cycles-pp.sync_regs
0.79 ± 14% -0.2 0.62 ± 17% perf-profile.self.cycles-pp.xas_load
0.61 ± 17% -0.2 0.46 ± 15% perf-profile.self.cycles-pp.__mod_node_page_state
0.29 ± 21% -0.1 0.15 ± 30% perf-profile.self.cycles-pp.shmem_undo_range
1.20 ± 11% -0.1 1.07 ± 11% perf-profile.self.cycles-pp.__count_memcg_events
0.26 ± 12% -0.1 0.13 ± 30% perf-profile.self.cycles-pp.xas_clear_mark
0.31 ± 15% -0.1 0.19 ± 14% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.61 ± 13% -0.1 0.49 ± 17% perf-profile.self.cycles-pp.__pagevec_lru_add_fn
0.44 ± 18% -0.1 0.33 ± 20% perf-profile.self.cycles-pp.sys_imageblit
0.38 ± 12% -0.1 0.28 ± 15% perf-profile.self.cycles-pp.xas_store
0.27 ± 13% -0.1 0.17 ± 18% perf-profile.self.cycles-pp.cpuidle_enter_state
0.41 ± 11% -0.1 0.31 ± 10% perf-profile.self.cycles-pp.try_charge
0.24 ± 18% -0.1 0.15 ± 24% perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
0.12 ± 27% -0.1 0.04 ±106% perf-profile.self.cycles-pp.timekeeping_max_deferment
0.48 ± 13% -0.1 0.40 ± 12% perf-profile.self.cycles-pp.___perf_sw_event
0.15 ± 18% -0.1 0.08 ± 23% perf-profile.self.cycles-pp.free_pcppages_bulk
0.45 ± 12% -0.1 0.39 ± 11% perf-profile.self.cycles-pp.rmqueue_bulk
0.19 ± 22% -0.1 0.12 ± 20% perf-profile.self.cycles-pp.io_serial_out
0.10 ± 24% -0.1 0.03 ±102% perf-profile.self.cycles-pp.free_unref_page_commit
0.15 ± 22% -0.1 0.09 ± 37% perf-profile.self.cycles-pp.__delete_from_page_cache
0.13 ± 21% -0.1 0.07 ± 23% perf-profile.self.cycles-pp.page_cache_free_page
0.11 ± 22% -0.1 0.05 ± 61% perf-profile.self.cycles-pp.unaccount_page_cache_page
0.11 ± 15% -0.1 0.04 ± 63% perf-profile.self.cycles-pp.free_unref_page_list
0.19 ± 16% -0.1 0.13 ± 18% perf-profile.self.cycles-pp.__list_add_valid
0.30 ± 12% -0.1 0.24 ± 9% perf-profile.self.cycles-pp.__alloc_pages_nodemask
0.20 ± 20% -0.1 0.14 ± 28% perf-profile.self.cycles-pp.page_mapping
0.08 ± 23% -0.1 0.03 ±100% perf-profile.self.cycles-pp.irqtime_account_irq
0.24 ± 22% -0.1 0.18 ± 15% perf-profile.self.cycles-pp.percpu_counter_add_batch
0.09 ± 20% -0.1 0.03 ±105% perf-profile.self.cycles-pp.rcu_sched_clock_irq
0.34 ± 14% -0.1 0.29 ± 11% perf-profile.self.cycles-pp.handle_mm_fault
0.13 ± 17% -0.1 0.08 ± 15% perf-profile.self.cycles-pp.lapic_next_deadline
0.12 ± 15% -0.0 0.07 ± 17% perf-profile.self.cycles-pp.xas_init_marks
0.14 ± 14% -0.0 0.10 ± 23% perf-profile.self.cycles-pp.__mod_zone_page_state
0.09 ± 19% -0.0 0.05 ± 62% perf-profile.self.cycles-pp._find_next_bit
0.23 ± 19% -0.0 0.20 ± 18% perf-profile.self.cycles-pp.xas_start
0.11 ± 17% -0.0 0.08 ± 24% perf-profile.self.cycles-pp.read_tsc
0.09 ± 15% -0.0 0.06 ± 26% perf-profile.self.cycles-pp.native_sched_clock
0.16 ± 13% -0.0 0.14 ± 11% perf-profile.self.cycles-pp.___might_sleep
0.08 ± 10% +0.2 0.26 ± 18% perf-profile.self.cycles-pp.mem_cgroup_page_lruvec
0.49 ± 17% +0.4 0.93 ± 12% perf-profile.self.cycles-pp.page_add_file_rmap
11.15 ± 15% +10.7 21.87 ± 10% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath



vm-scalability.time.system_time

4000 +--------------------------------------------------------------------+
| O O O O O O O O O O O O O O O O O O |
3500 |-+ O O O O O O O O O O |
3000 |.+.+.+..+.+.+.+.+.+..+.+.+.+.+.+..+.+.+.+.+.+.+..+.+. +.+..+.+.+.|
| + : |
2500 |-+ : : |
| : : |
2000 |-+ : : |
| : : |
1500 |-+ : : |
1000 |-+ : : |
| : : |
500 |-+ : |
| : |
0 +--------------------------------------------------------------------+


vm-scalability.throughput

6e+07 +-------------------------------------------------------------------+
| |
5e+07 |-+ .+.+.+.+.+.+.+.. .+. .+ +..+.+.+.+.|
|.+.+.+..+ O O O O O O O +.+.+.+.+.+.+.+..+.+.+ O + : O : O O |
| : : O |
4e+07 |-+ : : |
| : : |
3e+07 |-+ : : |
| : : |
2e+07 |-+ : : |
| : : |
| :: |
1e+07 |-+ : |
| : |
0 +-------------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


Attachments:
config-5.6.0-11466-g85b9f46e8ea45 (159.20 kB)
job-script (7.57 kB)
job.yaml (5.25 kB)
reproduce (342.14 kB)

2020-10-04 19:08:51

by David Rientjes

Subject: Re: [mm, thp] 85b9f46e8e: vm-scalability.throughput -8.7% regression

On Sun, 4 Oct 2020, kernel test robot wrote:

> Greeting,
>
> FYI, we noticed a -8.7% regression of vm-scalability.throughput due to commit:
>
>
> commit: 85b9f46e8ea451633ccd60a7d8cacbfff9f34047 ("mm, thp: track fallbacks due to failed memcg charges separately")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
>
> in testcase: vm-scalability
> on test machine: 104 threads Skylake with 192G memory
> with following parameters:
>
> runtime: 300s
> size: 1T
> test: lru-shm
> cpufreq_governor: performance
> ucode: 0x2006906
>
> test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
> test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
>
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <[email protected]>
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is attached in this email
> bin/lkp run job.yaml
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase/ucode:
> gcc-9/performance/x86_64-rhel-8.3/debian-10.4-x86_64-20200603.cgz/300s/1T/lkp-skl-fpga01/lru-shm/vm-scalability/0x2006906
>
> commit:
> dcdf11ee14 ("mm, shmem: add vmstat for hugepage fallback")
> 85b9f46e8e ("mm, thp: track fallbacks due to failed memcg charges separately")
>
> dcdf11ee14413332 85b9f46e8ea451633ccd60a7d8c
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> 1:4 24% 2:4 perf-profile.calltrace.cycles-pp.sync_regs.error_entry.do_access
> 3:4 53% 5:4 perf-profile.calltrace.cycles-pp.error_entry.do_access
> 9:4 -27% 8:4 perf-profile.children.cycles-pp.error_entry
> 4:4 -10% 4:4 perf-profile.self.cycles-pp.error_entry
> %stddev %change %stddev
> \ | \
> 477291 -9.1% 434041 vm-scalability.median
> 49791027 -8.7% 45476799 vm-scalability.throughput
> 223.67 +1.6% 227.36 vm-scalability.time.elapsed_time
> 223.67 +1.6% 227.36 vm-scalability.time.elapsed_time.max
> 50364 ± 6% +24.1% 62482 ± 10% vm-scalability.time.involuntary_context_switches
> 2237 +7.8% 2412 vm-scalability.time.percent_of_cpu_this_job_got
> 3084 +18.2% 3646 vm-scalability.time.system_time
> 1921 -4.2% 1839 vm-scalability.time.user_time
> 13.68 +2.2 15.86 mpstat.cpu.all.sys%
> 28535 ± 30% -47.0% 15114 ± 79% numa-numastat.node0.other_node
> 142734 ± 11% -19.4% 115000 ± 17% numa-meminfo.node0.AnonPages
> 11168 ± 3% +8.8% 12150 ± 5% numa-meminfo.node1.PageTables
> 76.00 -1.6% 74.75 vmstat.cpu.id
> 3626 -1.9% 3555 vmstat.system.cs
> 2214928 ±166% -96.6% 75321 ± 7% cpuidle.C1.usage
> 200981 ± 7% -18.0% 164861 ± 7% cpuidle.POLL.time
> 52675 ± 3% -16.7% 43866 ± 10% cpuidle.POLL.usage
> 35659 ± 11% -19.4% 28754 ± 17% numa-vmstat.node0.nr_anon_pages
> 1248014 ± 3% +10.9% 1384236 numa-vmstat.node1.nr_mapped
> 2722 ± 4% +10.6% 3011 ± 5% numa-vmstat.node1.nr_page_table_pages

I'm not sure that I'm reading this correctly, but I suspect that this just
happens because of NUMA: memory affinity will obviously impact
vm-scalability.throughput quite substantially, but I don't think the
bisected commit is to blame. Commit 85b9f46e8ea4 ("mm, thp: track
fallbacks due to failed memcg charges separately") simply adds new
count_vm_event() calls in a couple areas to track thp fallback due to
memcg limits separate from fragmentation.
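
Roughly, the shmem side of that change has the following shape (a paraphrased
sketch of the charge-failure path in shmem_getpage_gfp(), not the literal
upstream diff; the anonymous fault path gains an analogous
THP_FAULT_FALLBACK_CHARGE count):

	/*
	 * Sketch only: when charging a huge page to the memcg fails, the new
	 * *_FALLBACK_CHARGE event is counted in addition to the existing
	 * *_FALLBACK event, so charge failures can be told apart from
	 * allocation (fragmentation) failures in /proc/vmstat.
	 */
	error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg,
					    PageTransHuge(page));
	if (error) {
		if (PageTransHuge(page)) {
			count_vm_event(THP_FILE_FALLBACK);
			count_vm_event(THP_FILE_FALLBACK_CHARGE); /* new event */
		}
		goto unacct;
	}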

It's likely a question about the testing methodology in general: for
memory-intensive benchmarks, I suggest they be configured in a manner that
we can expect consistent memory access latency at the hardware level when
running on a NUMA system.

2020-10-20 09:14:40

by Huang, Ying

Subject: Re: [mm, thp] 85b9f46e8e: vm-scalability.throughput -8.7% regression

David Rientjes <[email protected]> writes:

> On Sun, 4 Oct 2020, kernel test robot wrote:
>
>> Greeting,
>>
>> FYI, we noticed a -8.7% regression of vm-scalability.throughput due to commit:
>>
>>
>> commit: 85b9f46e8ea451633ccd60a7d8cacbfff9f34047 ("mm, thp: track fallbacks due to failed memcg charges separately")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>>
>> in testcase: vm-scalability
>> on test machine: 104 threads Skylake with 192G memory
>> with following parameters:
>>
>> runtime: 300s
>> size: 1T
>> test: lru-shm
>> cpufreq_governor: performance
>> ucode: 0x2006906
>>
>> test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
>> test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
>>
>>
>>
>> If you fix the issue, kindly add following tag
>> Reported-by: kernel test robot <[email protected]>
>>
>>
>> Details are as below:
>> -------------------------------------------------------------------------------------------------->
>>
>>
>> To reproduce:
>>
>> git clone https://github.com/intel/lkp-tests.git
>> cd lkp-tests
>> bin/lkp install job.yaml # job file is attached in this email
>> bin/lkp run job.yaml
>>
>> =========================================================================================
>> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase/ucode:
>> gcc-9/performance/x86_64-rhel-8.3/debian-10.4-x86_64-20200603.cgz/300s/1T/lkp-skl-fpga01/lru-shm/vm-scalability/0x2006906
>>
>> commit:
>> dcdf11ee14 ("mm, shmem: add vmstat for hugepage fallback")
>> 85b9f46e8e ("mm, thp: track fallbacks due to failed memcg charges separately")
>>
>> dcdf11ee14413332 85b9f46e8ea451633ccd60a7d8c
>> ---------------- ---------------------------
>> fail:runs %reproduction fail:runs
>> | | |
>> 1:4 24% 2:4 perf-profile.calltrace.cycles-pp.sync_regs.error_entry.do_access
>> 3:4 53% 5:4 perf-profile.calltrace.cycles-pp.error_entry.do_access
>> 9:4 -27% 8:4 perf-profile.children.cycles-pp.error_entry
>> 4:4 -10% 4:4 perf-profile.self.cycles-pp.error_entry
>> %stddev %change %stddev
>> \ | \
>> 477291 -9.1% 434041 vm-scalability.median
>> 49791027 -8.7% 45476799 vm-scalability.throughput
>> 223.67 +1.6% 227.36 vm-scalability.time.elapsed_time
>> 223.67 +1.6% 227.36 vm-scalability.time.elapsed_time.max
>> 50364 ± 6% +24.1% 62482 ± 10% vm-scalability.time.involuntary_context_switches
>> 2237 +7.8% 2412 vm-scalability.time.percent_of_cpu_this_job_got
>> 3084 +18.2% 3646 vm-scalability.time.system_time
>> 1921 -4.2% 1839 vm-scalability.time.user_time
>> 13.68 +2.2 15.86 mpstat.cpu.all.sys%
>> 28535 ± 30% -47.0% 15114 ± 79% numa-numastat.node0.other_node
>> 142734 ± 11% -19.4% 115000 ± 17% numa-meminfo.node0.AnonPages
>> 11168 ± 3% +8.8% 12150 ± 5% numa-meminfo.node1.PageTables
>> 76.00 -1.6% 74.75 vmstat.cpu.id
>> 3626 -1.9% 3555 vmstat.system.cs
>> 2214928 ±166% -96.6% 75321 ± 7% cpuidle.C1.usage
>> 200981 ± 7% -18.0% 164861 ± 7% cpuidle.POLL.time
>> 52675 ± 3% -16.7% 43866 ± 10% cpuidle.POLL.usage
>> 35659 ± 11% -19.4% 28754 ± 17% numa-vmstat.node0.nr_anon_pages
>> 1248014 ± 3% +10.9% 1384236 numa-vmstat.node1.nr_mapped
>> 2722 ± 4% +10.6% 3011 ± 5% numa-vmstat.node1.nr_page_table_pages
>
> I'm not sure that I'm reading this correctly, but I suspect that this just
> happens because of NUMA: memory affinity will obviously impact
> vm-scalability.throughput quite substantially, but I don't think the
> bisected commit is to blame. Commit 85b9f46e8ea4 ("mm, thp: track
> fallbacks due to failed memcg charges separately") simply adds new
> count_vm_event() calls in a couple areas to track thp fallback due to
> memcg limits separate from fragmentation.
>
> It's likely a question about the testing methodology in general: for
> memory-intensive benchmarks, I suggest they be configured in a manner that
> we can expect consistent memory access latency at the hardware level when
> running on a NUMA system.

So you think it's better to bind processes to a NUMA node or CPU? But we
want to use this test case to capture NUMA/CPU placement/balance issues
too.

0day solves the problem in another way. We run the test case
multiple times and calculate the average and standard deviation, then
compare.
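
As a minimal illustration of that comparison (not 0day's actual tooling; the
sample values below are hypothetical), the reported numbers are essentially a
mean, a relative standard deviation, and a percentage change:

/* compare.c: mean, relative stddev and %change over repeated runs */
#include <math.h>
#include <stdio.h>

static void stats(const double *v, int n, double *mean, double *stddev)
{
	double sum = 0.0, var = 0.0;
	int i;

	for (i = 0; i < n; i++)
		sum += v[i];
	*mean = sum / n;
	for (i = 0; i < n; i++)
		var += (v[i] - *mean) * (v[i] - *mean);
	*stddev = sqrt(var / n);
}

int main(void)
{
	/* hypothetical vm-scalability.throughput samples, 4 runs per kernel */
	double base[] = { 49.9e6, 49.7e6, 49.8e6, 49.8e6 };
	double test[] = { 45.5e6, 45.4e6, 45.5e6, 45.5e6 };
	double bm, bs, tm, ts;

	stats(base, 4, &bm, &bs);
	stats(test, 4, &tm, &ts);
	printf("base %.0f (+-%.1f%%)  test %.0f (+-%.1f%%)  change %+.1f%%\n",
	       bm, 100.0 * bs / bm, tm, 100.0 * ts / tm,
	       100.0 * (tm - bm) / bm);
	return 0;
}

(build with: gcc -O2 -o compare compare.c -lm)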

For this specific regression, I found something strange,

10.93 ± 15% +10.8 21.78 ± 10% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.pagevec_lru_move_fn.__lru_cache_add.shmem_getpage_gfp.shmem_fault

It appears the lock contention becomes heavier with the patch, but I
cannot understand why.
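
For context, the lock behind that symbol is the per-node lru_lock taken when
the per-CPU pagevec is drained. Paraphrased and simplified from mm/swap.c of
that era (not the exact code):

static void pagevec_lru_move_fn(struct pagevec *pvec,
		void (*move_fn)(struct page *page, struct lruvec *lruvec, void *arg),
		void *arg)
{
	struct pglist_data *pgdat = NULL;
	unsigned long flags = 0;
	int i;

	for (i = 0; i < pagevec_count(pvec); i++) {
		struct page *page = pvec->pages[i];
		struct pglist_data *pagepgdat = page_pgdat(page);

		/* batch all pages of one node under a single lock hold */
		if (pagepgdat != pgdat) {
			if (pgdat)
				spin_unlock_irqrestore(&pgdat->lru_lock, flags);
			pgdat = pagepgdat;
			spin_lock_irqsave(&pgdat->lru_lock, flags); /* the contended lock */
		}
		(*move_fn)(page, mem_cgroup_page_lruvec(page, pgdat), arg);
	}
	if (pgdat)
		spin_unlock_irqrestore(&pgdat->lru_lock, flags);
	release_pages(pvec->pages, pagevec_count(pvec));
	pagevec_reinit(pvec);
}

So every task faulting in shmem pages eventually drains its pagevec under the
same pgdat->lru_lock, which is consistent with the extra
native_queued_spin_lock_slowpath cycles above; what is unclear is why two
extra count_vm_event() calls on the fallback path would make that contention
measurably worse.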

Best Regards,
Huang, Ying

2020-10-21 09:08:46

by David Rientjes

Subject: Re: [mm, thp] 85b9f46e8e: vm-scalability.throughput -8.7% regression

On Tue, 20 Oct 2020, Huang, Ying wrote:

> >> =========================================================================================
> >> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase/ucode:
> >> gcc-9/performance/x86_64-rhel-8.3/debian-10.4-x86_64-20200603.cgz/300s/1T/lkp-skl-fpga01/lru-shm/vm-scalability/0x2006906
> >>
> >> commit:
> >> dcdf11ee14 ("mm, shmem: add vmstat for hugepage fallback")
> >> 85b9f46e8e ("mm, thp: track fallbacks due to failed memcg charges separately")
> >>
> >> dcdf11ee14413332 85b9f46e8ea451633ccd60a7d8c
> >> ---------------- ---------------------------
> >> fail:runs %reproduction fail:runs
> >> | | |
> >> 1:4 24% 2:4 perf-profile.calltrace.cycles-pp.sync_regs.error_entry.do_access
> >> 3:4 53% 5:4 perf-profile.calltrace.cycles-pp.error_entry.do_access
> >> 9:4 -27% 8:4 perf-profile.children.cycles-pp.error_entry
> >> 4:4 -10% 4:4 perf-profile.self.cycles-pp.error_entry
> >> %stddev %change %stddev
> >> \ | \
> >> 477291 -9.1% 434041 vm-scalability.median
> >> 49791027 -8.7% 45476799 vm-scalability.throughput
> >> 223.67 +1.6% 227.36 vm-scalability.time.elapsed_time
> >> 223.67 +1.6% 227.36 vm-scalability.time.elapsed_time.max
> >> 50364 ± 6% +24.1% 62482 ± 10% vm-scalability.time.involuntary_context_switches
> >> 2237 +7.8% 2412 vm-scalability.time.percent_of_cpu_this_job_got
> >> 3084 +18.2% 3646 vm-scalability.time.system_time
> >> 1921 -4.2% 1839 vm-scalability.time.user_time
> >> 13.68 +2.2 15.86 mpstat.cpu.all.sys%
> >> 28535 ± 30% -47.0% 15114 ± 79% numa-numastat.node0.other_node
> >> 142734 ± 11% -19.4% 115000 ± 17% numa-meminfo.node0.AnonPages
> >> 11168 ± 3% +8.8% 12150 ± 5% numa-meminfo.node1.PageTables
> >> 76.00 -1.6% 74.75 vmstat.cpu.id
> >> 3626 -1.9% 3555 vmstat.system.cs
> >> 2214928 ±166% -96.6% 75321 ± 7% cpuidle.C1.usage
> >> 200981 ± 7% -18.0% 164861 ± 7% cpuidle.POLL.time
> >> 52675 ± 3% -16.7% 43866 ± 10% cpuidle.POLL.usage
> >> 35659 ± 11% -19.4% 28754 ± 17% numa-vmstat.node0.nr_anon_pages
> >> 1248014 ± 3% +10.9% 1384236 numa-vmstat.node1.nr_mapped
> >> 2722 ± 4% +10.6% 3011 ± 5% numa-vmstat.node1.nr_page_table_pages
> >
> > I'm not sure that I'm reading this correctly, but I suspect that this just
> > happens because of NUMA: memory affinity will obviously impact
> > vm-scalability.throughput quite substantially, but I don't think the
> > bisected commit is to blame. Commit 85b9f46e8ea4 ("mm, thp: track
> > fallbacks due to failed memcg charges separately") simply adds new
> > count_vm_event() calls in a couple areas to track thp fallback due to
> > memcg limits separate from fragmentation.
> >
> > It's likely a question about the testing methodology in general: for
> > memory-intensive benchmarks, I suggest they be configured in a manner that
> > we can expect consistent memory access latency at the hardware level when
> > running on a NUMA system.
>
> So you think it's better to bind processes to a NUMA node or CPU? But we
> want to use this test case to capture NUMA/CPU placement/balance issues
> too.
>

No, because binding to a specific socket may cause other performance
"improvements" or "degradations" depending on how fragmented local memory
is, or whether or not it's under memory pressure. Is the system rebooted
before testing so that we have a consistent state of memory availability
and fragmentation across sockets?

> 0day solves the problem in another way. We run the test case
> multiple times and calculate the average and standard deviation, then
> compare.
>

Depending on fragmentation or memory availability, any benchmark that
assesses performance may be adversely affected if its results can be
impacted by hugepage backing.
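
One way to make that visible is to snapshot the thp_* counters in
/proc/vmstat before and after each run and compare the deltas between the two
kernels; a minimal sketch (illustrative only, not part of vm-scalability or
the lkp tooling):

/* thpstat.c: dump transparent hugepage counters from /proc/vmstat */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[256];
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f) {
		perror("/proc/vmstat");
		return 1;
	}
	/*
	 * thp_fault_alloc/thp_fault_fallback and thp_file_alloc/
	 * thp_file_fallback (plus the *_fallback_charge variants after
	 * 85b9f46e8ea4) show how much of the run was actually backed by
	 * hugepages.
	 */
	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "thp_", 4))
			fputs(line, stdout);
	fclose(f);
	return 0;
}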

2020-10-21 11:56:24

by Huang, Ying

Subject: Re: [mm, thp] 85b9f46e8e: vm-scalability.throughput -8.7% regression

David Rientjes <[email protected]> writes:

> On Tue, 20 Oct 2020, Huang, Ying wrote:
>
>> >> =========================================================================================
>> >> compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase/ucode:
>> >> gcc-9/performance/x86_64-rhel-8.3/debian-10.4-x86_64-20200603.cgz/300s/1T/lkp-skl-fpga01/lru-shm/vm-scalability/0x2006906
>> >>
>> >> commit:
>> >> dcdf11ee14 ("mm, shmem: add vmstat for hugepage fallback")
>> >> 85b9f46e8e ("mm, thp: track fallbacks due to failed memcg charges separately")
>> >>
>> >> dcdf11ee14413332 85b9f46e8ea451633ccd60a7d8c
>> >> ---------------- ---------------------------
>> >> fail:runs %reproduction fail:runs
>> >> | | |
>> >> 1:4 24% 2:4 perf-profile.calltrace.cycles-pp.sync_regs.error_entry.do_access
>> >> 3:4 53% 5:4 perf-profile.calltrace.cycles-pp.error_entry.do_access
>> >> 9:4 -27% 8:4 perf-profile.children.cycles-pp.error_entry
>> >> 4:4 -10% 4:4 perf-profile.self.cycles-pp.error_entry
>> >> %stddev %change %stddev
>> >> \ | \
>> >> 477291 -9.1% 434041 vm-scalability.median
>> >> 49791027 -8.7% 45476799 vm-scalability.throughput
>> >> 223.67 +1.6% 227.36 vm-scalability.time.elapsed_time
>> >> 223.67 +1.6% 227.36 vm-scalability.time.elapsed_time.max
>> >> 50364 ± 6% +24.1% 62482 ± 10% vm-scalability.time.involuntary_context_switches
>> >> 2237 +7.8% 2412 vm-scalability.time.percent_of_cpu_this_job_got
>> >> 3084 +18.2% 3646 vm-scalability.time.system_time
>> >> 1921 -4.2% 1839 vm-scalability.time.user_time
>> >> 13.68 +2.2 15.86 mpstat.cpu.all.sys%
>> >> 28535 ± 30% -47.0% 15114 ± 79% numa-numastat.node0.other_node
>> >> 142734 ± 11% -19.4% 115000 ± 17% numa-meminfo.node0.AnonPages
>> >> 11168 ± 3% +8.8% 12150 ± 5% numa-meminfo.node1.PageTables
>> >> 76.00 -1.6% 74.75 vmstat.cpu.id
>> >> 3626 -1.9% 3555 vmstat.system.cs
>> >> 2214928 ±166% -96.6% 75321 ± 7% cpuidle.C1.usage
>> >> 200981 ± 7% -18.0% 164861 ± 7% cpuidle.POLL.time
>> >> 52675 ± 3% -16.7% 43866 ± 10% cpuidle.POLL.usage
>> >> 35659 ± 11% -19.4% 28754 ± 17% numa-vmstat.node0.nr_anon_pages
>> >> 1248014 ± 3% +10.9% 1384236 numa-vmstat.node1.nr_mapped
>> >> 2722 ± 4% +10.6% 3011 ± 5% numa-vmstat.node1.nr_page_table_pages
>> >
>> > I'm not sure that I'm reading this correctly, but I suspect that this just
>> > happens because of NUMA: memory affinity will obviously impact
>> > vm-scalability.throughput quite substantially, but I don't think the
>> > bisected commit is to blame. Commit 85b9f46e8ea4 ("mm, thp: track
>> > fallbacks due to failed memcg charges separately") simply adds new
>> > count_vm_event() calls in a couple areas to track thp fallback due to
>> > memcg limits separate from fragmentation.
>> >
>> > It's likely a question about the testing methodology in general: for
>> > memory-intensive benchmarks, I suggest they be configured in a manner that
>> > we can expect consistent memory access latency at the hardware level when
>> > running on a NUMA system.
>>
>> So you think it's better to bind processes to a NUMA node or CPU? But we
>> want to use this test case to capture NUMA/CPU placement/balance issues
>> too.
>>
>
> No, because binding to a specific socket may cause other performance
> "improvements" or "degradations" depending on how fragmented local memory
> is, or whether or not it's under memory pressure. Is the system rebooted
> before testing so that we have a consistent state of memory availability
> and fragmentation across sockets?

Yes. The system is rebooted before testing (0day uses kexec to accelerate
rebooting).

>> 0day solves the problem in another way. We run the test case
>> multiple times and calculate the average and standard deviation, then
>> compare.
>>
>
> Depending on fragmentation or memory availability, any benchmark that
> assesses performance may be adversely affected if its results can be
> impacted by hugepage backing.

Best Regards,
Huang, Ying