2022-04-01 11:27:35

by kernel test robot

Subject: [NUMA Balancing] e39bb6be9f: will-it-scale.per_thread_ops 64.4% improvement



Greetings,

FYI, we noticed a 64.4% improvement of will-it-scale.per_thread_ops due to commit:


commit: e39bb6be9f2b39a6dbaeff484361de76021b175d ("NUMA Balancing: add page promotion counter")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with following parameters:

nr_task: 50%
mode: thread
test: fallocate1
cpufreq_governor: performance
ucode: 0x500320a

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale





Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/thread/50%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap4/fallocate1/will-it-scale/0x500320a

commit:
ee97347fe0 ("powerpc/fadump: opt out from freeing pages on cma activation failure")
e39bb6be9f ("NUMA Balancing: add page promotion counter")

ee97347fe058d020 e39bb6be9f2b39a6dbaeff48436
---------------- ---------------------------
%stddev %change %stddev
\ | \
3314361 ± 3% +64.4% 5449130 will-it-scale.96.threads
34524 ± 3% +64.4% 56761 will-it-scale.per_thread_ops
3314361 ± 3% +64.4% 5449130 will-it-scale.workload
7820 ± 9% -13.4% 6772 ± 3% numa-meminfo.node2.KernelStack
4035 +5.1% 4240 vmstat.system.cs
0.04 +50.0% 0.06 turbostat.IPC
241.82 +5.3% 254.69 turbostat.PkgWatt
0.67 ± 3% +0.1 0.80 ± 2% mpstat.cpu.all.irq%
0.13 ± 3% +0.0 0.16 ± 3% mpstat.cpu.all.soft%
0.09 ± 8% +0.0 0.13 ± 2% mpstat.cpu.all.usr%
42314 +8.4% 45859 proc-vmstat.nr_slab_reclaimable
2e+09 ± 3% +64.2% 3.285e+09 proc-vmstat.numa_hit
1.999e+09 ± 3% +64.3% 3.284e+09 proc-vmstat.numa_local
1.998e+09 ± 3% +64.1% 3.279e+09 proc-vmstat.pgalloc_normal
1.998e+09 ± 3% +64.1% 3.279e+09 proc-vmstat.pgfree
4.014e+08 ± 2% +88.2% 7.555e+08 numa-numastat.node0.local_node
4.015e+08 ± 2% +88.2% 7.555e+08 numa-numastat.node0.numa_hit
5.288e+08 ± 3% +56.7% 8.283e+08 ± 3% numa-numastat.node1.local_node
5.29e+08 ± 3% +56.6% 8.286e+08 ± 3% numa-numastat.node1.numa_hit
5.29e+08 ± 7% +66.5% 8.806e+08 numa-numastat.node2.local_node
5.291e+08 ± 7% +66.5% 8.811e+08 numa-numastat.node2.numa_hit
5.402e+08 ± 2% +51.6% 8.192e+08 ± 2% numa-numastat.node3.local_node
5.405e+08 ± 2% +51.6% 8.196e+08 ± 2% numa-numastat.node3.numa_hit
9874 ± 8% +14.6% 11314 ± 8% numa-vmstat.node0.nr_mapped
4.014e+08 ± 2% +88.2% 7.555e+08 numa-vmstat.node0.numa_hit
4.013e+08 ± 2% +88.3% 7.555e+08 numa-vmstat.node0.numa_local
5.29e+08 ± 3% +56.7% 8.286e+08 ± 3% numa-vmstat.node1.numa_hit
5.288e+08 ± 3% +56.7% 8.283e+08 ± 3% numa-vmstat.node1.numa_local
7820 ± 9% -13.4% 6768 ± 3% numa-vmstat.node2.nr_kernel_stack
5.291e+08 ± 7% +66.5% 8.811e+08 numa-vmstat.node2.numa_hit
5.289e+08 ± 7% +66.5% 8.806e+08 numa-vmstat.node2.numa_local
5.405e+08 ± 2% +51.6% 8.196e+08 ± 2% numa-vmstat.node3.numa_hit
5.402e+08 ± 2% +51.7% 8.192e+08 ± 2% numa-vmstat.node3.numa_local
6.76 +7.4% 7.26 perf-stat.i.MPKI
9.427e+09 ± 2% +31.5% 1.24e+10 perf-stat.i.branch-instructions
0.38 ± 2% +0.0 0.40 perf-stat.i.branch-miss-rate%
35577299 ± 3% +36.6% 48601753 perf-stat.i.branch-misses
92457638 ± 3% +44.7% 1.337e+08 ± 2% perf-stat.i.cache-misses
2.979e+08 ± 3% +45.8% 4.344e+08 perf-stat.i.cache-references
3950 +5.3% 4158 perf-stat.i.context-switches
6.73 ± 2% -26.0% 4.98 perf-stat.i.cpi
197.62 +4.6% 206.68 perf-stat.i.cpu-migrations
3224 ± 3% -30.7% 2235 ± 2% perf-stat.i.cycles-between-cache-misses
1.183e+10 ± 2% +37.8% 1.631e+10 perf-stat.i.dTLB-loads
0.00 ± 7% -0.0 0.00 ± 9% perf-stat.i.dTLB-store-miss-rate%
5.184e+09 ± 3% +62.5% 8.425e+09 perf-stat.i.dTLB-stores
87.42 +1.0 88.38 perf-stat.i.iTLB-load-miss-rate%
21992707 ± 2% +15.4% 25374748 perf-stat.i.iTLB-load-misses
3105517 ± 2% +7.0% 3324290 perf-stat.i.iTLB-loads
4.405e+10 ± 2% +35.9% 5.988e+10 perf-stat.i.instructions
2002 ± 2% +18.0% 2362 ± 2% perf-stat.i.instructions-per-iTLB-miss
0.15 ± 2% +34.7% 0.20 perf-stat.i.ipc
174.73 ± 4% +35.4% 236.61 ± 2% perf-stat.i.metric.K/sec
139.22 ± 2% +40.4% 195.54 perf-stat.i.metric.M/sec
17761842 ± 4% +24.6% 22130000 ± 3% perf-stat.i.node-load-misses
10797107 ± 3% +68.9% 18236349 ± 2% perf-stat.i.node-store-misses
6.76 +7.3% 7.26 perf-stat.overall.MPKI
0.38 +0.0 0.39 perf-stat.overall.branch-miss-rate%
6.75 ± 2% -26.1% 4.99 perf-stat.overall.cpi
3217 ± 3% -30.6% 2233 ± 2% perf-stat.overall.cycles-between-cache-misses
0.00 ± 19% -0.0 0.00 ± 9% perf-stat.overall.dTLB-load-miss-rate%
0.00 ± 4% -0.0 0.00 ± 4% perf-stat.overall.dTLB-store-miss-rate%
87.63 +0.8 88.42 perf-stat.overall.iTLB-load-miss-rate%
2003 ± 2% +17.8% 2360 ± 2% perf-stat.overall.instructions-per-iTLB-miss
0.15 ± 2% +35.3% 0.20 perf-stat.overall.ipc
4027119 -17.8% 3310074 perf-stat.overall.path-length
9.398e+09 ± 2% +31.5% 1.235e+10 perf-stat.ps.branch-instructions
35560785 ± 2% +36.4% 48518588 perf-stat.ps.branch-misses
92159444 ± 3% +44.6% 1.333e+08 ± 2% perf-stat.ps.cache-misses
2.971e+08 ± 3% +45.7% 4.329e+08 perf-stat.ps.cache-references
3932 +5.2% 4136 perf-stat.ps.context-switches
197.26 +4.4% 205.95 perf-stat.ps.cpu-migrations
1.18e+10 ± 2% +37.7% 1.625e+10 perf-stat.ps.dTLB-loads
5.167e+09 ± 3% +62.4% 8.393e+09 perf-stat.ps.dTLB-stores
21922146 ± 2% +15.3% 25283252 perf-stat.ps.iTLB-load-misses
3093993 ± 2% +7.0% 3310806 perf-stat.ps.iTLB-loads
4.391e+10 ± 2% +35.9% 5.967e+10 perf-stat.ps.instructions
17706806 ± 4% +24.5% 22051204 ± 3% perf-stat.ps.node-load-misses
10761992 ± 3% +68.8% 18167295 ± 2% perf-stat.ps.node-store-misses
1.334e+13 ± 2% +35.2% 1.804e+13 perf-stat.total.instructions
22.90 ± 14% -6.7 16.21 ± 9% perf-profile.calltrace.cycles-pp.__pagevec_lru_add.folio_add_lru.shmem_getpage_gfp.shmem_fallocate.vfs_fallocate
40.42 ± 12% -6.7 33.74 ± 10% perf-profile.calltrace.cycles-pp.shmem_getpage_gfp.shmem_fallocate.vfs_fallocate.ksys_fallocate.__x64_sys_fallocate
22.96 ± 14% -6.7 16.30 ± 9% perf-profile.calltrace.cycles-pp.folio_add_lru.shmem_getpage_gfp.shmem_fallocate.vfs_fallocate.ksys_fallocate
40.78 ± 12% -6.5 34.29 ± 10% perf-profile.calltrace.cycles-pp.shmem_fallocate.vfs_fallocate.ksys_fallocate.__x64_sys_fallocate.do_syscall_64
40.87 ± 12% -6.4 34.42 ± 10% perf-profile.calltrace.cycles-pp.vfs_fallocate.ksys_fallocate.__x64_sys_fallocate.do_syscall_64.entry_SYSCALL_64_after_hwframe
20.95 ± 14% -6.4 14.53 ± 9% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__pagevec_lru_add.folio_add_lru
40.94 ± 12% -6.4 34.52 ± 10% perf-profile.calltrace.cycles-pp.__x64_sys_fallocate.do_syscall_64.entry_SYSCALL_64_after_hwframe.fallocate64
40.93 ± 12% -6.4 34.52 ± 10% perf-profile.calltrace.cycles-pp.ksys_fallocate.__x64_sys_fallocate.do_syscall_64.entry_SYSCALL_64_after_hwframe.fallocate64
20.98 ± 14% -6.4 14.57 ± 9% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__pagevec_lru_add.folio_add_lru.shmem_getpage_gfp
20.98 ± 14% -6.4 14.58 ± 9% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__pagevec_lru_add.folio_add_lru.shmem_getpage_gfp.shmem_fallocate
40.97 ± 12% -6.4 34.56 ± 10% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.fallocate64
40.98 ± 12% -6.4 34.59 ± 10% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.fallocate64
41.08 ± 12% -6.3 34.74 ± 10% perf-profile.calltrace.cycles-pp.fallocate64
10.27 ± 7% -4.5 5.73 ± 17% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.__pagevec_release
10.30 ± 7% -4.5 5.77 ± 16% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.release_pages.__pagevec_release.shmem_undo_range.shmem_truncate_range
10.29 ± 7% -4.5 5.77 ± 16% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.__pagevec_release.shmem_undo_range
2.93 ± 8% -1.7 1.24 ± 14% perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.filemap_unaccount_folio.__filemap_remove_folio.filemap_remove_folio.truncate_inode_folio
2.94 ± 8% -1.7 1.26 ± 14% perf-profile.calltrace.cycles-pp.filemap_unaccount_folio.__filemap_remove_folio.filemap_remove_folio.truncate_inode_folio.shmem_undo_range
2.72 ± 7% -1.6 1.08 ± 14% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.filemap_unaccount_folio.__filemap_remove_folio.filemap_remove_folio
3.15 ± 8% -1.5 1.61 ± 13% perf-profile.calltrace.cycles-pp.__filemap_remove_folio.filemap_remove_folio.truncate_inode_folio.shmem_undo_range.shmem_truncate_range
3.32 ± 8% -1.5 1.84 ± 13% perf-profile.calltrace.cycles-pp.filemap_remove_folio.truncate_inode_folio.shmem_undo_range.shmem_truncate_range.shmem_setattr
3.45 ± 8% -1.4 2.04 ± 12% perf-profile.calltrace.cycles-pp.truncate_inode_folio.shmem_undo_range.shmem_truncate_range.shmem_setattr.notify_change
2.24 ± 11% -1.2 1.06 ± 10% perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.shmem_add_to_page_cache.shmem_getpage_gfp.shmem_fallocate.vfs_fallocate
1.93 ± 11% -1.1 0.85 ± 10% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.shmem_add_to_page_cache.shmem_getpage_gfp.shmem_fallocate
1.60 ± 10% -0.5 1.06 ± 16% perf-profile.calltrace.cycles-pp.uncharge_folio.__mem_cgroup_uncharge_list.release_pages.__pagevec_release.shmem_undo_range
1.26 ± 11% -0.5 0.74 ± 11% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__pagevec_lru_add.folio_add_lru.shmem_getpage_gfp.shmem_fallocate
0.84 ± 11% -0.2 0.65 ± 11% perf-profile.calltrace.cycles-pp.__pagevec_lru_add.lru_add_drain_cpu.lru_add_drain.__pagevec_release.shmem_undo_range
0.84 ± 11% -0.2 0.65 ± 11% perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.__pagevec_release.shmem_undo_range.shmem_truncate_range
0.82 ± 11% -0.2 0.63 ± 11% perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.__pagevec_lru_add.lru_add_drain_cpu.lru_add_drain.__pagevec_release
0.82 ± 11% -0.2 0.63 ± 11% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__pagevec_lru_add.lru_add_drain_cpu.lru_add_drain
0.82 ± 11% -0.2 0.62 ± 11% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.__pagevec_lru_add.lru_add_drain_cpu
0.84 ± 11% -0.2 0.65 ± 11% perf-profile.calltrace.cycles-pp.lru_add_drain.__pagevec_release.shmem_undo_range.shmem_truncate_range.shmem_setattr
0.49 ± 45% +0.4 0.89 ± 10% perf-profile.calltrace.cycles-pp.shmem_alloc_and_acct_page.shmem_getpage_gfp.shmem_fallocate.vfs_fallocate.ksys_fallocate
0.00 +0.7 0.66 ± 9% perf-profile.calltrace.cycles-pp.shmem_alloc_page.shmem_alloc_and_acct_page.shmem_getpage_gfp.shmem_fallocate.vfs_fallocate
0.46 ± 44% +1.1 1.52 ± 11% perf-profile.calltrace.cycles-pp.propagate_protected_usage.page_counter_try_charge.try_charge_memcg.charge_memcg.__mem_cgroup_charge
5.80 ± 8% +1.9 7.69 ± 12% perf-profile.calltrace.cycles-pp.charge_memcg.__mem_cgroup_charge.shmem_add_to_page_cache.shmem_getpage_gfp.shmem_fallocate
1.30 ± 10% +2.2 3.54 ± 13% perf-profile.calltrace.cycles-pp.page_counter_try_charge.try_charge_memcg.charge_memcg.__mem_cgroup_charge.shmem_add_to_page_cache
1.72 ± 10% +3.0 4.71 ± 13% perf-profile.calltrace.cycles-pp.try_charge_memcg.charge_memcg.__mem_cgroup_charge.shmem_add_to_page_cache.shmem_getpage_gfp
3.18 ± 22% +3.8 6.97 ± 11% perf-profile.calltrace.cycles-pp.propagate_protected_usage.page_counter_uncharge.uncharge_batch.__mem_cgroup_uncharge_list.release_pages
22.70 ± 10% +5.2 27.91 ± 12% perf-profile.calltrace.cycles-pp.__pagevec_release.shmem_undo_range.shmem_truncate_range.shmem_setattr.notify_change
21.85 ± 10% +5.4 27.26 ± 12% perf-profile.calltrace.cycles-pp.release_pages.__pagevec_release.shmem_undo_range.shmem_truncate_range.shmem_setattr
4.86 ± 18% +6.6 11.45 ± 12% perf-profile.calltrace.cycles-pp.page_counter_cancel.page_counter_uncharge.uncharge_batch.__mem_cgroup_uncharge_list.release_pages
10.48 ± 17% +9.9 20.42 ± 11% perf-profile.calltrace.cycles-pp.__mem_cgroup_uncharge_list.release_pages.__pagevec_release.shmem_undo_range.shmem_truncate_range
8.06 ± 20% +10.4 18.46 ± 11% perf-profile.calltrace.cycles-pp.page_counter_uncharge.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.__pagevec_release
8.88 ± 19% +10.5 19.35 ± 11% perf-profile.calltrace.cycles-pp.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.__pagevec_release.shmem_undo_range
32.10 ± 11% -11.2 20.92 ± 11% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
32.16 ± 11% -11.2 21.00 ± 11% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
32.16 ± 11% -11.1 21.01 ± 11% perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
23.80 ± 14% -6.9 16.89 ± 9% perf-profile.children.cycles-pp.__pagevec_lru_add
40.49 ± 12% -6.7 33.78 ± 10% perf-profile.children.cycles-pp.shmem_getpage_gfp
23.00 ± 14% -6.7 16.32 ± 9% perf-profile.children.cycles-pp.folio_add_lru
40.79 ± 12% -6.5 34.30 ± 10% perf-profile.children.cycles-pp.shmem_fallocate
40.87 ± 12% -6.4 34.42 ± 10% perf-profile.children.cycles-pp.vfs_fallocate
40.94 ± 12% -6.4 34.52 ± 10% perf-profile.children.cycles-pp.ksys_fallocate
40.94 ± 12% -6.4 34.53 ± 10% perf-profile.children.cycles-pp.__x64_sys_fallocate
41.12 ± 12% -6.3 34.79 ± 10% perf-profile.children.cycles-pp.fallocate64
6.62 ± 9% -3.5 3.15 ± 12% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
5.18 ± 9% -2.9 2.30 ± 12% perf-profile.children.cycles-pp.__mod_lruvec_page_state
2.95 ± 8% -1.7 1.27 ± 14% perf-profile.children.cycles-pp.filemap_unaccount_folio
3.16 ± 8% -1.5 1.62 ± 13% perf-profile.children.cycles-pp.__filemap_remove_folio
3.32 ± 8% -1.5 1.84 ± 13% perf-profile.children.cycles-pp.filemap_remove_folio
3.46 ± 8% -1.4 2.05 ± 12% perf-profile.children.cycles-pp.truncate_inode_folio
1.60 ± 10% -0.5 1.07 ± 16% perf-profile.children.cycles-pp.uncharge_folio
0.85 ± 11% -0.2 0.66 ± 11% perf-profile.children.cycles-pp.lru_add_drain
0.85 ± 11% -0.2 0.66 ± 11% perf-profile.children.cycles-pp.lru_add_drain_cpu
0.27 ± 6% -0.1 0.21 ± 11% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.05 ± 45% +0.0 0.08 ± 10% perf-profile.children.cycles-pp.folio_mark_dirty
0.06 ± 14% +0.0 0.10 ± 13% perf-profile.children.cycles-pp.obj_cgroup_charge_pages
0.06 ± 16% +0.0 0.10 ± 13% perf-profile.children.cycles-pp.obj_cgroup_charge
0.07 ± 16% +0.0 0.11 ± 10% perf-profile.children.cycles-pp.shmem_pseudo_vma_init
0.08 ± 12% +0.0 0.11 ± 8% perf-profile.children.cycles-pp.folio_unlock
0.06 ± 11% +0.0 0.10 ± 12% perf-profile.children.cycles-pp.xas_clear_mark
0.05 ± 46% +0.0 0.09 ± 10% perf-profile.children.cycles-pp.__entry_text_start
0.05 ± 45% +0.0 0.09 ± 12% perf-profile.children.cycles-pp.security_vm_enough_memory_mm
0.10 ± 17% +0.0 0.14 ± 12% perf-profile.children.cycles-pp.kmem_cache_alloc_lru
0.09 ± 15% +0.0 0.13 ± 13% perf-profile.children.cycles-pp.xas_alloc
0.08 ± 18% +0.0 0.12 ± 8% perf-profile.children.cycles-pp.__filemap_get_folio
0.02 ± 99% +0.0 0.07 ± 14% perf-profile.children.cycles-pp.down_write
0.02 ± 99% +0.0 0.07 ± 13% perf-profile.children.cycles-pp.free_unref_page_commit
0.02 ± 99% +0.0 0.07 ± 13% perf-profile.children.cycles-pp.folio_mapping
0.02 ± 99% +0.0 0.07 ± 11% perf-profile.children.cycles-pp.cap_vm_enough_memory
0.02 ±141% +0.0 0.06 ± 11% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.05 ± 46% +0.0 0.10 ± 13% perf-profile.children.cycles-pp.__list_add_valid
0.09 ± 12% +0.0 0.13 ± 9% perf-profile.children.cycles-pp._raw_spin_lock
0.08 ± 19% +0.0 0.13 ± 9% perf-profile.children.cycles-pp.pagecache_get_page
0.01 ±223% +0.0 0.06 ± 9% perf-profile.children.cycles-pp.__fget_light
0.09 ± 15% +0.0 0.14 ± 9% perf-profile.children.cycles-pp.__might_resched
0.03 ±100% +0.0 0.08 ± 15% perf-profile.children.cycles-pp.__folio_cancel_dirty
0.09 ± 15% +0.1 0.14 ± 11% perf-profile.children.cycles-pp.truncate_cleanup_folio
0.10 ± 13% +0.1 0.16 ± 10% perf-profile.children.cycles-pp.xas_create
0.08 ± 14% +0.1 0.14 ± 8% perf-profile.children.cycles-pp.xas_load
0.00 +0.1 0.05 ± 8% perf-profile.children.cycles-pp.xas_start
0.09 ± 14% +0.1 0.14 ± 18% perf-profile.children.cycles-pp.__list_del_entry_valid
0.00 +0.1 0.06 ± 15% perf-profile.children.cycles-pp.xas_find_conflict
0.00 +0.1 0.06 ± 13% perf-profile.children.cycles-pp.__set_page_dirty_no_writeback
0.00 +0.1 0.06 ± 13% perf-profile.children.cycles-pp.percpu_counter_add_batch
0.00 +0.1 0.06 ± 13% perf-profile.children.cycles-pp.filemap_free_folio
0.09 ± 13% +0.1 0.15 ± 10% perf-profile.children.cycles-pp.xas_init_marks
0.00 +0.1 0.06 ± 19% perf-profile.children.cycles-pp.mem_cgroup_update_lru_size
0.13 ± 14% +0.1 0.20 ± 12% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.11 ± 14% +0.1 0.18 ± 12% perf-profile.children.cycles-pp.free_unref_page_list
0.13 ± 18% +0.1 0.22 ± 17% perf-profile.children.cycles-pp.__mod_node_page_state
0.19 ± 10% +0.1 0.29 ± 12% perf-profile.children.cycles-pp.mem_cgroup_charge_statistics
0.09 ± 14% +0.1 0.19 ± 10% perf-profile.children.cycles-pp.kthread
0.18 ± 17% +0.1 0.28 ± 16% perf-profile.children.cycles-pp.__mod_lruvec_state
0.09 ± 14% +0.1 0.20 ± 10% perf-profile.children.cycles-pp.ret_from_fork
0.18 ± 11% +0.1 0.29 ± 10% perf-profile.children.cycles-pp.get_page_from_freelist
0.08 ± 12% +0.1 0.18 ± 9% perf-profile.children.cycles-pp.run_ksoftirqd
0.08 ± 12% +0.1 0.19 ± 9% perf-profile.children.cycles-pp.smpboot_thread_fn
0.17 ± 11% +0.1 0.29 ± 10% perf-profile.children.cycles-pp.obj_cgroup_uncharge_pages
0.10 ± 18% +0.1 0.22 ± 15% perf-profile.children.cycles-pp.memcg_check_events
0.19 ± 13% +0.1 0.31 ± 10% perf-profile.children.cycles-pp.kmem_cache_free
0.20 ± 12% +0.1 0.33 ± 10% perf-profile.children.cycles-pp.rcu_core
0.20 ± 13% +0.1 0.33 ± 10% perf-profile.children.cycles-pp.rcu_do_batch
0.20 ± 12% +0.1 0.33 ± 10% perf-profile.children.cycles-pp.__softirqentry_text_start
0.22 ± 18% +0.1 0.37 ± 20% perf-profile.children.cycles-pp.find_lock_entries
0.27 ± 15% +0.2 0.43 ± 10% perf-profile.children.cycles-pp.xas_store
0.27 ± 12% +0.2 0.44 ± 10% perf-profile.children.cycles-pp.__alloc_pages
0.35 ± 12% +0.2 0.55 ± 10% perf-profile.children.cycles-pp.alloc_pages_vma
0.43 ± 13% +0.2 0.67 ± 9% perf-profile.children.cycles-pp.shmem_alloc_page
0.57 ± 12% +0.3 0.90 ± 10% perf-profile.children.cycles-pp.shmem_alloc_and_acct_page
5.82 ± 8% +1.9 7.70 ± 12% perf-profile.children.cycles-pp.charge_memcg
1.31 ± 10% +2.2 3.56 ± 13% perf-profile.children.cycles-pp.page_counter_try_charge
1.72 ± 10% +3.0 4.73 ± 13% perf-profile.children.cycles-pp.try_charge_memcg
3.77 ± 20% +4.8 8.61 ± 10% perf-profile.children.cycles-pp.propagate_protected_usage
22.70 ± 10% +5.2 27.91 ± 12% perf-profile.children.cycles-pp.__pagevec_release
21.93 ± 10% +5.4 27.36 ± 12% perf-profile.children.cycles-pp.release_pages
4.96 ± 18% +6.7 11.66 ± 12% perf-profile.children.cycles-pp.page_counter_cancel
10.48 ± 17% +9.9 20.42 ± 11% perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
8.88 ± 19% +10.5 19.35 ± 11% perf-profile.children.cycles-pp.uncharge_batch
8.19 ± 19% +10.5 18.73 ± 11% perf-profile.children.cycles-pp.page_counter_uncharge
32.10 ± 11% -11.2 20.92 ± 11% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
6.57 ± 9% -3.5 3.08 ± 12% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
3.84 ± 9% -1.3 2.58 ± 14% perf-profile.self.cycles-pp.charge_memcg
3.02 ± 8% -1.2 1.85 ± 14% perf-profile.self.cycles-pp.__mem_cgroup_charge
1.59 ± 10% -0.5 1.05 ± 16% perf-profile.self.cycles-pp.uncharge_folio
0.39 ± 13% -0.2 0.16 ± 14% perf-profile.self.cycles-pp.__mod_lruvec_page_state
0.05 ± 45% +0.0 0.08 ± 10% perf-profile.self.cycles-pp.fallocate64
0.06 ± 15% +0.0 0.09 ± 11% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.04 ± 45% +0.0 0.08 ± 11% perf-profile.self.cycles-pp.filemap_remove_folio
0.06 ± 15% +0.0 0.10 ± 11% perf-profile.self.cycles-pp.xas_clear_mark
0.06 ± 11% +0.0 0.10 ± 12% perf-profile.self.cycles-pp.shmem_getpage_gfp
0.05 ± 45% +0.0 0.08 ± 5% perf-profile.self.cycles-pp.__alloc_pages
0.06 ± 14% +0.0 0.10 ± 10% perf-profile.self.cycles-pp.shmem_pseudo_vma_init
0.06 ± 16% +0.0 0.10 ± 11% perf-profile.self.cycles-pp.xas_load
0.07 ± 12% +0.0 0.11 ± 9% perf-profile.self.cycles-pp.folio_unlock
0.05 ± 46% +0.0 0.09 ± 14% perf-profile.self.cycles-pp.__list_add_valid
0.03 ±100% +0.0 0.07 ± 9% perf-profile.self.cycles-pp.folio_add_lru
0.02 ± 99% +0.0 0.07 ± 13% perf-profile.self.cycles-pp.folio_mapping
0.02 ± 99% +0.0 0.07 ± 11% perf-profile.self.cycles-pp.__mod_lruvec_state
0.09 ± 15% +0.0 0.14 ± 10% perf-profile.self.cycles-pp.__might_resched
0.08 ± 16% +0.0 0.13 ± 10% perf-profile.self.cycles-pp._raw_spin_lock
0.09 ± 15% +0.1 0.14 ± 12% perf-profile.self.cycles-pp.xas_store
0.01 ±223% +0.1 0.06 ± 11% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.00 +0.1 0.05 ± 8% perf-profile.self.cycles-pp.shmem_alloc_and_acct_page
0.00 +0.1 0.05 ± 8% perf-profile.self.cycles-pp.truncate_cleanup_folio
0.00 +0.1 0.05 ± 8% perf-profile.self.cycles-pp.__filemap_get_folio
0.00 +0.1 0.05 ± 8% perf-profile.self.cycles-pp.free_unref_page_commit
0.00 +0.1 0.06 ± 13% perf-profile.self.cycles-pp.percpu_counter_add_batch
0.00 +0.1 0.06 ± 9% perf-profile.self.cycles-pp.__fget_light
0.09 ± 12% +0.1 0.14 ± 18% perf-profile.self.cycles-pp.__list_del_entry_valid
0.00 +0.1 0.06 ± 11% perf-profile.self.cycles-pp.filemap_free_folio
0.00 +0.1 0.06 ± 13% perf-profile.self.cycles-pp.__set_page_dirty_no_writeback
0.10 ± 13% +0.1 0.17 ± 10% perf-profile.self.cycles-pp.shmem_fallocate
0.00 +0.1 0.06 ± 19% perf-profile.self.cycles-pp.mem_cgroup_update_lru_size
0.11 ± 13% +0.1 0.18 ± 12% perf-profile.self.cycles-pp.get_page_from_freelist
0.12 ± 12% +0.1 0.20 ± 12% perf-profile.self.cycles-pp._raw_spin_lock_irq
0.14 ± 13% +0.1 0.21 ± 9% perf-profile.self.cycles-pp.shmem_add_to_page_cache
0.13 ± 18% +0.1 0.21 ± 18% perf-profile.self.cycles-pp.__mod_node_page_state
0.08 ± 17% +0.1 0.18 ± 13% perf-profile.self.cycles-pp.memcg_check_events
0.20 ± 12% +0.1 0.31 ± 14% perf-profile.self.cycles-pp.release_pages
0.19 ± 19% +0.1 0.32 ± 22% perf-profile.self.cycles-pp.find_lock_entries
0.30 ± 13% +0.2 0.48 ± 11% perf-profile.self.cycles-pp.__pagevec_lru_add
0.41 ± 8% +0.8 1.16 ± 14% perf-profile.self.cycles-pp.try_charge_memcg
0.76 ± 10% +1.2 2.01 ± 14% perf-profile.self.cycles-pp.page_counter_try_charge
3.74 ± 20% +4.8 8.53 ± 10% perf-profile.self.cycles-pp.propagate_protected_usage
4.92 ± 18% +6.6 11.56 ± 12% perf-profile.self.cycles-pp.page_counter_cancel
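
For readers checking the %change column by hand, the following is a minimal sketch (not part of the robot's tooling) of how the figures in the table appear to relate. It assumes that counter rows report a relative change in percent, that rows already expressed as percentages (the perf-profile and mpstat rows) report an absolute difference in percentage points, and that per_thread_ops is roughly the workload divided by the 96 threads; the helper names pct_change and point_delta are hypothetical. The reported values are averages over several runs, so small rounding differences are expected.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change in percent, as used for counter rows (assumed)."""
    return (new - base) / base * 100.0


def point_delta(base: float, new: float) -> float:
    """Absolute difference in percentage points, as used for rows that
    are already percentages, such as perf-profile cycles-pp (assumed)."""
    return new - base


# Values copied from the report above.
base_workload, new_workload = 3314361, 5449130
threads = 96  # nr_task 50% of 192 hardware threads

print(f"workload change: {pct_change(base_workload, new_workload):+.1f}%")
print(f"per-thread ops:  {base_workload // threads} -> {new_workload // threads}")
print(f"cpi change:      {pct_change(6.75, 4.99):+.1f}%")
print(f"spinlock cycles: {point_delta(32.10, 20.92):+.1f} points")
```

Read this way, the drop in native_queued_spin_lock_slowpath is about 11 percentage points of all sampled cycles, not an 11% relative reduction.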




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:
(No filename) (28.15 kB)
config-5.17.0-00154-ge39bb6be9f2b (35.53 kB)

2022-04-04 23:39:07

by Linus Torvalds

Subject: Re: [NUMA Balancing] e39bb6be9f: will-it-scale.per_thread_ops 64.4% improvement

On Fri, Apr 1, 2022 at 2:42 AM kernel test robot <[email protected]> wrote:
>
> FYI, we noticed a 64.4% improvement of will-it-scale.per_thread_ops due to commit:
> e39bb6be9f2b ("NUMA Balancing: add page promotion counter")

That looks odd and unlikely.

That commit only modifies some page counting statistics. Sure, it
could be another cache layout thing, and maybe it's due to the subtle
change in how NUMA_PAGE_MIGRATE gets counted, but it still looks a bit
odd.

Linus