2020-11-15 12:51:52

by kernel test robot

Subject: [mm] be5d0a74c6: will-it-scale.per_thread_ops -9.1% regression


Greeting,

FYI, we noticed a -9.1% regression of will-it-scale.per_thread_ops due to commit:


commit: be5d0a74c62d8da43f9526a5b08cdd18e2bbc37a ("mm: memcontrol: switch to native NR_ANON_MAPPED counter")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with following parameters:

nr_task: 50%
mode: thread
test: page_fault2
cpufreq_governor: performance
ucode: 0x5002f01

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a threads-based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
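
For context, the loop below is a rough standalone approximation of what a page_fault2-style testcase stresses (illustration only; the real testcase in the will-it-scale repository above differs in details such as what it maps and how threads share it): each iteration maps memory, writes one byte per page so every page is faulted in, then unmaps the region, and the number of completed iterations per interval is what a per_thread_ops-style figure reports.

/*
 * Rough approximation of a page-fault-heavy will-it-scale loop.
 * Illustration only, not the actual testcase source.
 */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAPSIZE (128UL * 1024 * 1024)	/* 128 MiB per iteration */

int main(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned long iterations = 0;

	for (int round = 0; round < 8; round++) {
		char *p = mmap(NULL, MAPSIZE, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			return 1;
		for (unsigned long off = 0; off < MAPSIZE; off += page)
			p[off] = 1;	/* write-fault every page */
		munmap(p, MAPSIZE);	/* unmap path tears the pages down again */
		iterations++;
	}
	printf("%lu iterations\n", iterations);
	return 0;
}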



If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/thread/50%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap3/page_fault2/will-it-scale/0x5002f01

commit:
0d1c20722a ("mm: memcontrol: switch to native NR_FILE_PAGES and NR_SHMEM counters")
be5d0a74c6 ("mm: memcontrol: switch to native NR_ANON_MAPPED counter")

0d1c20722ab333ac be5d0a74c62d8da43f9526a5b08
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
0:4 2% 1:4 perf-profile.children.cycles-pp.error_entry
%stddev %change %stddev
\ | \
24003 -9.1% 21817 will-it-scale.per_thread_ops
2304340 -9.1% 2094545 will-it-scale.workload
31.65 -3.3 28.31 mpstat.cpu.all.sys%
5041945 ? 5% -15.9% 4238691 ? 7% sched_debug.cfs_rq:/.min_vruntime.max
1.906e+08 -23.3% 1.463e+08 ? 19% numa-numastat.node1.local_node
1.907e+08 -23.3% 1.463e+08 ? 19% numa-numastat.node1.numa_hit
7666 ? 5% -9.1% 6970 ? 4% slabinfo.signal_cache.active_objs
7726 ? 5% -7.8% 7122 ? 4% slabinfo.signal_cache.num_objs
65.50 +5.0% 68.75 vmstat.cpu.id
63.00 ? 4% -13.9% 54.25 ? 4% vmstat.procs.r
21998 -8.3% 20167 vmstat.system.cs
32080 +9.3% 35056 softirqs.CPU143.SCHED
31620 +8.4% 34283 ? 3% softirqs.CPU145.SCHED
109145 ? 4% +29.4% 141275 ? 15% softirqs.CPU65.TIMER
23752 ? 9% -10.3% 21304 ? 8% softirqs.CPU74.RCU
9541 -3.5% 9202 ? 2% proc-vmstat.nr_mapped
7.004e+08 -8.9% 6.379e+08 proc-vmstat.numa_hit
7.003e+08 -8.9% 6.378e+08 proc-vmstat.numa_local
25916 ? 27% -32.2% 17581 ? 3% proc-vmstat.numa_pages_migrated
18598 ? 7% -14.7% 15872 ? 5% proc-vmstat.pgactivate
7.005e+08 -8.9% 6.38e+08 proc-vmstat.pgalloc_normal
6.957e+08 -9.0% 6.334e+08 proc-vmstat.pgfault
7.004e+08 -8.9% 6.38e+08 proc-vmstat.pgfree
25916 ? 27% -32.2% 17581 ? 3% proc-vmstat.pgmigrate_success
2.226e+09 -5.9% 2.095e+09 perf-stat.i.branch-instructions
69.75 -2.7 67.03 perf-stat.i.cache-miss-rate%
2.485e+08 ? 2% -10.6% 2.22e+08 ? 5% perf-stat.i.cache-misses
3.552e+08 ? 3% -7.0% 3.303e+08 ? 6% perf-stat.i.cache-references
22139 -8.4% 20269 perf-stat.i.context-switches
17.59 -5.0% 16.71 perf-stat.i.cpi
2.013e+11 -9.9% 1.814e+11 perf-stat.i.cpu-cycles
3.108e+09 -4.9% 2.957e+09 perf-stat.i.dTLB-loads
1.06 -0.0 1.01 perf-stat.i.dTLB-store-miss-rate%
18609753 -9.7% 16801004 perf-stat.i.dTLB-store-misses
1.743e+09 -5.5% 1.646e+09 perf-stat.i.dTLB-stores
6478243 ? 2% -8.4% 5931693 ? 4% perf-stat.i.iTLB-load-misses
3960032 -10.0% 3562984 ? 15% perf-stat.i.iTLB-loads
1.152e+10 -5.1% 1.093e+10 perf-stat.i.instructions
0.06 ? 2% +5.6% 0.06 perf-stat.i.ipc
1.05 -9.9% 0.94 perf-stat.i.metric.GHz
0.12 -9.0% 0.11 perf-stat.i.metric.K/sec
39.34 -5.5% 37.19 perf-stat.i.metric.M/sec
2295534 -9.0% 2087952 perf-stat.i.minor-faults
74342929 -8.7% 67864433 perf-stat.i.node-loads
12919196 -10.7% 11532925 ? 2% perf-stat.i.node-stores
2295534 -9.0% 2087952 perf-stat.i.page-faults
69.96 -2.7 67.24 perf-stat.overall.cache-miss-rate%
17.47 -5.1% 16.58 perf-stat.overall.cpi
1.06 -0.0 1.01 perf-stat.overall.dTLB-store-miss-rate%
0.06 +5.4% 0.06 perf-stat.overall.ipc
1515213 +4.6% 1584252 perf-stat.overall.path-length
2.218e+09 -5.8% 2.089e+09 perf-stat.ps.branch-instructions
2.476e+08 ? 2% -10.6% 2.213e+08 ? 5% perf-stat.ps.cache-misses
3.539e+08 ? 3% -7.0% 3.292e+08 ? 6% perf-stat.ps.cache-references
22055 -8.4% 20198 perf-stat.ps.context-switches
2.005e+11 -9.9% 1.807e+11 perf-stat.ps.cpu-cycles
3.097e+09 -4.8% 2.948e+09 perf-stat.ps.dTLB-loads
18541065 -9.7% 16744486 perf-stat.ps.dTLB-store-misses
1.736e+09 -5.5% 1.641e+09 perf-stat.ps.dTLB-stores
6453657 ? 2% -8.4% 5911996 ? 4% perf-stat.ps.iTLB-load-misses
3944828 -10.0% 3550349 ? 15% perf-stat.ps.iTLB-loads
1.148e+10 -5.0% 1.09e+10 perf-stat.ps.instructions
2287083 -9.0% 2080917 perf-stat.ps.minor-faults
74068282 -8.7% 67627572 perf-stat.ps.node-loads
12871415 -10.7% 11494030 ? 2% perf-stat.ps.node-stores
2287083 -9.0% 2080917 perf-stat.ps.page-faults
3.492e+12 -5.0% 3.318e+12 perf-stat.total.instructions
0.82 ? 7% +0.2 1.01 perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.81 ? 7% +0.2 1.00 perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
0.76 ? 8% +0.2 0.94 perf-profile.calltrace.cycles-pp.zap_pte_range.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
0.83 ? 8% +0.2 1.02 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
0.83 ? 8% +0.2 1.02 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
0.83 ? 8% +0.2 1.02 perf-profile.calltrace.cycles-pp.__munmap
0.83 ? 8% +0.2 1.02 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
0.83 ? 8% +0.2 1.02 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
0.76 ? 8% +0.2 0.95 perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
0.76 ? 8% +0.2 0.95 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
0.13 ?173% +0.4 0.55 ? 2% perf-profile.calltrace.cycles-pp.pagevec_lru_move_fn.__lru_cache_add.alloc_set_pte.finish_fault.do_fault
7.55 ? 7% +1.2 8.73 ? 3% perf-profile.calltrace.cycles-pp.alloc_set_pte.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
0.23 ? 8% -0.0 0.20 ? 2% perf-profile.children.cycles-pp.mem_cgroup_charge_statistics
0.12 ? 10% +0.0 0.15 ? 3% perf-profile.children.cycles-pp.page_counter_try_charge
0.19 ? 8% +0.0 0.22 perf-profile.children.cycles-pp.try_charge
0.34 ? 8% +0.0 0.38 ? 2% perf-profile.children.cycles-pp.__pagevec_lru_add_fn
0.12 ? 17% +0.0 0.17 ? 13% perf-profile.children.cycles-pp.update_load_avg
0.19 ? 21% +0.1 0.24 ? 14% perf-profile.children.cycles-pp.update_cfs_group
0.00 +0.1 0.07 ? 6% perf-profile.children.cycles-pp.__mod_node_page_state
0.48 ? 7% +0.1 0.55 ? 3% perf-profile.children.cycles-pp.pagevec_lru_move_fn
0.04 ? 58% +0.1 0.14 ? 3% perf-profile.children.cycles-pp.page_remove_rmap
0.40 ? 15% +0.1 0.50 ? 13% perf-profile.children.cycles-pp.scheduler_tick
0.36 ? 17% +0.1 0.46 ? 13% perf-profile.children.cycles-pp.task_tick_fair
0.18 ? 22% +0.1 0.30 ? 7% perf-profile.children.cycles-pp.start_kernel
0.82 ? 8% +0.2 1.01 perf-profile.children.cycles-pp.__do_munmap
0.94 ? 8% +0.2 1.13 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.76 ? 7% +0.2 0.95 perf-profile.children.cycles-pp.unmap_page_range
0.76 ? 7% +0.2 0.95 perf-profile.children.cycles-pp.zap_pte_range
0.83 ? 8% +0.2 1.02 perf-profile.children.cycles-pp.__vm_munmap
0.83 ? 8% +0.2 1.02 perf-profile.children.cycles-pp.__x64_sys_munmap
0.94 ? 8% +0.2 1.13 perf-profile.children.cycles-pp.do_syscall_64
0.76 ? 7% +0.2 0.95 perf-profile.children.cycles-pp.unmap_vmas
0.81 ? 7% +0.2 1.00 perf-profile.children.cycles-pp.unmap_region
0.83 ? 8% +0.2 1.02 perf-profile.children.cycles-pp.__munmap
0.23 ? 12% +0.2 0.43 ? 6% perf-profile.children.cycles-pp.__count_memcg_events
0.50 ? 13% +0.2 0.73 ? 37% perf-profile.children.cycles-pp.tick_sched_timer
0.46 ? 13% +0.2 0.69 ? 40% perf-profile.children.cycles-pp.update_process_times
0.47 ? 12% +0.2 0.70 ? 40% perf-profile.children.cycles-pp.tick_sched_handle
0.20 ? 6% +0.3 0.46 perf-profile.children.cycles-pp.__mod_lruvec_state
0.00 +0.4 0.40 perf-profile.children.cycles-pp.page_add_new_anon_rmap
7.56 ? 7% +1.2 8.74 ? 3% perf-profile.children.cycles-pp.alloc_set_pte
0.10 ? 21% +0.0 0.14 ? 13% perf-profile.self.cycles-pp.update_load_avg
0.03 ?100% +0.0 0.07 perf-profile.self.cycles-pp.page_remove_rmap
0.00 +0.1 0.05 perf-profile.self.cycles-pp.do_user_addr_fault
0.19 ? 21% +0.1 0.24 ? 14% perf-profile.self.cycles-pp.update_cfs_group
0.00 +0.1 0.07 ? 6% perf-profile.self.cycles-pp.__mod_node_page_state
0.04 ? 58% +0.1 0.11 ? 7% perf-profile.self.cycles-pp.__mod_lruvec_state
0.24 ? 7% +0.1 0.35 perf-profile.self.cycles-pp.zap_pte_range
0.23 ? 12% +0.2 0.43 ? 6% perf-profile.self.cycles-pp.__count_memcg_events
0.00 +0.2 0.21 ? 3% perf-profile.self.cycles-pp.page_add_new_anon_rmap
3065 ? 27% -42.7% 1757 interrupts.CPU0.NMI:Non-maskable_interrupts
3065 ? 27% -42.7% 1757 interrupts.CPU0.PMI:Performance_monitoring_interrupts
504.00 ? 10% +46.4% 737.75 ? 10% interrupts.CPU0.RES:Rescheduling_interrupts
3905 ? 45% -53.6% 1813 ? 26% interrupts.CPU101.NMI:Non-maskable_interrupts
3905 ? 45% -53.6% 1813 ? 26% interrupts.CPU101.PMI:Performance_monitoring_interrupts
4411 ? 26% -39.2% 2681 ? 44% interrupts.CPU102.NMI:Non-maskable_interrupts
4411 ? 26% -39.2% 2681 ? 44% interrupts.CPU102.PMI:Performance_monitoring_interrupts
240.75 ? 20% -34.3% 158.25 ? 19% interrupts.CPU102.RES:Rescheduling_interrupts
3503 ? 14% -44.3% 1950 ? 17% interrupts.CPU103.NMI:Non-maskable_interrupts
3503 ? 14% -44.3% 1950 ? 17% interrupts.CPU103.PMI:Performance_monitoring_interrupts
3953 ? 26% -66.6% 1319 ? 33% interrupts.CPU105.NMI:Non-maskable_interrupts
3953 ? 26% -66.6% 1319 ? 33% interrupts.CPU105.PMI:Performance_monitoring_interrupts
209.75 ? 9% -24.4% 158.50 ? 22% interrupts.CPU105.RES:Rescheduling_interrupts
4731 ? 17% -38.4% 2916 ? 42% interrupts.CPU107.NMI:Non-maskable_interrupts
4731 ? 17% -38.4% 2916 ? 42% interrupts.CPU107.PMI:Performance_monitoring_interrupts
219.75 ? 6% -25.5% 163.75 ? 26% interrupts.CPU109.RES:Rescheduling_interrupts
344.00 ? 10% -33.6% 228.50 ? 11% interrupts.CPU11.RES:Rescheduling_interrupts
4777 ? 21% -54.1% 2194 ? 24% interrupts.CPU110.NMI:Non-maskable_interrupts
4777 ? 21% -54.1% 2194 ? 24% interrupts.CPU110.PMI:Performance_monitoring_interrupts
4894 ? 26% -47.6% 2565 ? 40% interrupts.CPU111.NMI:Non-maskable_interrupts
4894 ? 26% -47.6% 2565 ? 40% interrupts.CPU111.PMI:Performance_monitoring_interrupts
25831 ? 17% -44.7% 14290 ? 8% interrupts.CPU112.CAL:Function_call_interrupts
5598 -54.1% 2568 ? 55% interrupts.CPU112.NMI:Non-maskable_interrupts
5598 -54.1% 2568 ? 55% interrupts.CPU112.PMI:Performance_monitoring_interrupts
287.50 ? 22% -49.5% 145.25 ? 39% interrupts.CPU112.RES:Rescheduling_interrupts
23913 ? 18% -48.9% 12221 ? 9% interrupts.CPU112.TLB:TLB_shootdowns
22230 ? 17% -27.9% 16030 ? 15% interrupts.CPU113.CAL:Function_call_interrupts
20295 ? 19% -30.9% 14014 ? 18% interrupts.CPU113.TLB:TLB_shootdowns
20557 ? 12% -50.0% 10278 ? 19% interrupts.CPU114.CAL:Function_call_interrupts
204.00 ? 13% -29.7% 143.50 ? 23% interrupts.CPU114.RES:Rescheduling_interrupts
18589 ? 13% -56.2% 8139 ? 24% interrupts.CPU114.TLB:TLB_shootdowns
246.75 ? 24% -42.5% 142.00 ? 8% interrupts.CPU115.RES:Rescheduling_interrupts
22949 ? 5% -26.7% 16810 ? 22% interrupts.CPU116.CAL:Function_call_interrupts
305.25 ? 30% -38.9% 186.50 ? 15% interrupts.CPU116.RES:Rescheduling_interrupts
21033 ? 6% -29.5% 14828 ? 26% interrupts.CPU116.TLB:TLB_shootdowns
3504 ? 6% -42.5% 2015 ? 39% interrupts.CPU118.NMI:Non-maskable_interrupts
3504 ? 6% -42.5% 2015 ? 39% interrupts.CPU118.PMI:Performance_monitoring_interrupts
21514 ? 21% -29.5% 15157 ? 26% interrupts.CPU119.CAL:Function_call_interrupts
19583 ? 23% -33.0% 13111 ? 31% interrupts.CPU119.TLB:TLB_shootdowns
21379 ? 29% -30.1% 14946 ? 9% interrupts.CPU120.CAL:Function_call_interrupts
19469 ? 32% -33.8% 12882 ? 11% interrupts.CPU120.TLB:TLB_shootdowns
22738 ? 27% -45.0% 12513 ? 15% interrupts.CPU121.CAL:Function_call_interrupts
20913 ? 30% -50.3% 10388 ? 19% interrupts.CPU121.TLB:TLB_shootdowns
22173 ? 28% -32.9% 14889 ? 3% interrupts.CPU124.CAL:Function_call_interrupts
20282 ? 31% -36.7% 12837 ? 4% interrupts.CPU124.TLB:TLB_shootdowns
3720 ? 46% -38.1% 2301 ? 61% interrupts.CPU130.NMI:Non-maskable_interrupts
3720 ? 46% -38.1% 2301 ? 61% interrupts.CPU130.PMI:Performance_monitoring_interrupts
4589 ? 20% -49.2% 2332 ? 43% interrupts.CPU140.NMI:Non-maskable_interrupts
4589 ? 20% -49.2% 2332 ? 43% interrupts.CPU140.PMI:Performance_monitoring_interrupts
4601 ? 19% -41.3% 2699 ? 55% interrupts.CPU141.NMI:Non-maskable_interrupts
4601 ? 19% -41.3% 2699 ? 55% interrupts.CPU141.PMI:Performance_monitoring_interrupts
25068 ? 20% -52.6% 11894 ? 20% interrupts.CPU143.CAL:Function_call_interrupts
4765 ? 19% -53.9% 2197 ? 29% interrupts.CPU143.NMI:Non-maskable_interrupts
4765 ? 19% -53.9% 2197 ? 29% interrupts.CPU143.PMI:Performance_monitoring_interrupts
203.75 ? 8% -36.2% 130.00 ? 25% interrupts.CPU143.RES:Rescheduling_interrupts
23255 ? 22% -58.0% 9773 ? 25% interrupts.CPU143.TLB:TLB_shootdowns
4429 ? 25% -47.4% 2328 ? 68% interrupts.CPU144.NMI:Non-maskable_interrupts
4429 ? 25% -47.4% 2328 ? 68% interrupts.CPU144.PMI:Performance_monitoring_interrupts
26160 ? 16% -47.7% 13686 ? 16% interrupts.CPU145.CAL:Function_call_interrupts
4429 ? 17% -60.2% 1762 interrupts.CPU145.NMI:Non-maskable_interrupts
4429 ? 17% -60.2% 1762 interrupts.CPU145.PMI:Performance_monitoring_interrupts
225.25 ? 15% -31.0% 155.50 ? 14% interrupts.CPU145.RES:Rescheduling_interrupts
24344 ? 18% -52.3% 11618 ? 20% interrupts.CPU145.TLB:TLB_shootdowns
17912 ? 15% -26.5% 13163 ? 20% interrupts.CPU146.CAL:Function_call_interrupts
15945 ? 17% -30.3% 11116 ? 24% interrupts.CPU146.TLB:TLB_shootdowns
399.25 ? 39% -53.2% 187.00 ? 34% interrupts.CPU15.RES:Rescheduling_interrupts
18871 ? 15% -28.4% 13508 ? 10% interrupts.CPU150.CAL:Function_call_interrupts
3975 ? 31% -43.2% 2256 ? 63% interrupts.CPU150.NMI:Non-maskable_interrupts
3975 ? 31% -43.2% 2256 ? 63% interrupts.CPU150.PMI:Performance_monitoring_interrupts
16909 ? 17% -32.5% 11415 ? 12% interrupts.CPU150.TLB:TLB_shootdowns
4982 ? 13% -51.8% 2403 ? 63% interrupts.CPU151.NMI:Non-maskable_interrupts
4982 ? 13% -51.8% 2403 ? 63% interrupts.CPU151.PMI:Performance_monitoring_interrupts
19960 ? 45% -39.7% 12032 ? 45% interrupts.CPU174.TLB:TLB_shootdowns
3683 ? 31% -43.2% 2092 ? 15% interrupts.CPU178.NMI:Non-maskable_interrupts
3683 ? 31% -43.2% 2092 ? 15% interrupts.CPU178.PMI:Performance_monitoring_interrupts
142.50 ? 10% +36.7% 194.75 ? 27% interrupts.CPU179.RES:Rescheduling_interrupts
206.75 ? 18% +28.1% 264.75 ? 6% interrupts.CPU18.RES:Rescheduling_interrupts
130.25 ? 9% +49.9% 195.25 ? 40% interrupts.CPU184.RES:Rescheduling_interrupts
4909 ? 13% -46.0% 2649 ? 49% interrupts.CPU39.NMI:Non-maskable_interrupts
4909 ? 13% -46.0% 2649 ? 49% interrupts.CPU39.PMI:Performance_monitoring_interrupts
212.25 ? 9% +37.9% 292.75 ? 15% interrupts.CPU39.RES:Rescheduling_interrupts
22686 ? 15% -34.5% 14862 ? 15% interrupts.CPU41.CAL:Function_call_interrupts
20805 ? 16% -38.4% 12805 ? 18% interrupts.CPU41.TLB:TLB_shootdowns
19999 ? 15% -32.6% 13482 ? 38% interrupts.CPU43.CAL:Function_call_interrupts
18121 ? 17% -37.1% 11392 ? 47% interrupts.CPU43.TLB:TLB_shootdowns
205.75 ? 12% +436.3% 1103 ?133% interrupts.CPU46.RES:Rescheduling_interrupts
15151 ? 19% +29.3% 19597 ? 29% interrupts.CPU47.CAL:Function_call_interrupts
153.25 ? 18% +307.5% 624.50 ?109% interrupts.CPU47.RES:Rescheduling_interrupts
13170 ? 23% +34.0% 17651 ? 32% interrupts.CPU47.TLB:TLB_shootdowns
142.50 ? 7% +67.4% 238.50 ? 18% interrupts.CPU49.RES:Rescheduling_interrupts
4583 ? 21% -39.5% 2771 ? 55% interrupts.CPU56.NMI:Non-maskable_interrupts
4583 ? 21% -39.5% 2771 ? 55% interrupts.CPU56.PMI:Performance_monitoring_interrupts
21903 ? 31% -35.7% 14081 ? 25% interrupts.CPU58.CAL:Function_call_interrupts
227.75 ? 11% -28.5% 162.75 ? 16% interrupts.CPU58.RES:Rescheduling_interrupts
19981 ? 34% -39.8% 12031 ? 30% interrupts.CPU58.TLB:TLB_shootdowns
22651 ? 25% -29.7% 15924 ? 25% interrupts.CPU59.CAL:Function_call_interrupts
20700 ? 28% -32.8% 13915 ? 30% interrupts.CPU59.TLB:TLB_shootdowns
21110 ? 21% -40.9% 12473 ? 17% interrupts.CPU61.CAL:Function_call_interrupts
219.50 ? 23% -40.7% 130.25 ? 23% interrupts.CPU61.RES:Rescheduling_interrupts
19180 ? 23% -45.9% 10375 ? 21% interrupts.CPU61.TLB:TLB_shootdowns
20792 ? 22% -22.7% 16082 ? 27% interrupts.CPU62.CAL:Function_call_interrupts
3800 ? 46% -38.6% 2332 ? 66% interrupts.CPU62.NMI:Non-maskable_interrupts
3800 ? 46% -38.6% 2332 ? 66% interrupts.CPU62.PMI:Performance_monitoring_interrupts
18881 ? 25% -25.5% 14063 ? 31% interrupts.CPU62.TLB:TLB_shootdowns
4375 ? 30% -50.4% 2171 ? 76% interrupts.CPU63.NMI:Non-maskable_interrupts
4375 ? 30% -50.4% 2171 ? 76% interrupts.CPU63.PMI:Performance_monitoring_interrupts
21452 ? 15% -45.3% 11737 ? 21% interrupts.CPU66.CAL:Function_call_interrupts
19571 ? 17% -50.9% 9613 ? 26% interrupts.CPU66.TLB:TLB_shootdowns
22818 ? 26% -34.2% 15006 ? 15% interrupts.CPU68.CAL:Function_call_interrupts
20956 ? 29% -38.0% 12989 ? 18% interrupts.CPU68.TLB:TLB_shootdowns
2585 ? 26% -32.0% 1757 interrupts.CPU69.NMI:Non-maskable_interrupts
2585 ? 26% -32.0% 1757 interrupts.CPU69.PMI:Performance_monitoring_interrupts
22490 ? 17% -41.3% 13193 ? 28% interrupts.CPU70.CAL:Function_call_interrupts
210.00 ? 10% -31.9% 143.00 ? 19% interrupts.CPU70.RES:Rescheduling_interrupts
20594 ? 19% -45.9% 11141 ? 34% interrupts.CPU70.TLB:TLB_shootdowns
2763 ? 28% -28.8% 1967 ? 16% interrupts.CPU71.NMI:Non-maskable_interrupts
2763 ? 28% -28.8% 1967 ? 16% interrupts.CPU71.PMI:Performance_monitoring_interrupts
18770 ? 23% -39.5% 11362 ? 19% interrupts.CPU72.CAL:Function_call_interrupts
17416 ? 29% -45.3% 9531 ? 19% interrupts.CPU72.TLB:TLB_shootdowns
19106 ? 26% -34.2% 12580 ? 20% interrupts.CPU74.CAL:Function_call_interrupts
17297 ? 30% -39.1% 10537 ? 24% interrupts.CPU74.TLB:TLB_shootdowns
4060 ? 26% -46.5% 2174 ? 42% interrupts.CPU79.NMI:Non-maskable_interrupts
4060 ? 26% -46.5% 2174 ? 42% interrupts.CPU79.PMI:Performance_monitoring_interrupts
4761 ? 3% -53.2% 2226 ? 36% interrupts.CPU8.NMI:Non-maskable_interrupts
4761 ? 3% -53.2% 2226 ? 36% interrupts.CPU8.PMI:Performance_monitoring_interrupts
19922 ? 16% -30.0% 13943 ? 25% interrupts.CPU80.CAL:Function_call_interrupts
17990 ? 18% -33.8% 11902 ? 29% interrupts.CPU80.TLB:TLB_shootdowns
24340 ? 25% -46.3% 13081 ? 37% interrupts.CPU83.CAL:Function_call_interrupts
22584 ? 27% -51.1% 11051 ? 45% interrupts.CPU83.TLB:TLB_shootdowns
26501 ? 19% -50.5% 13116 ? 40% interrupts.CPU88.CAL:Function_call_interrupts
5063 ? 15% -45.2% 2773 ? 48% interrupts.CPU88.NMI:Non-maskable_interrupts
5063 ? 15% -45.2% 2773 ? 48% interrupts.CPU88.PMI:Performance_monitoring_interrupts
218.00 ? 10% -37.2% 137.00 ? 28% interrupts.CPU88.RES:Rescheduling_interrupts
24693 ? 21% -55.2% 11062 ? 49% interrupts.CPU88.TLB:TLB_shootdowns
19304 ? 28% -35.9% 12371 ? 14% interrupts.CPU92.CAL:Function_call_interrupts
17361 ? 31% -40.4% 10344 ? 18% interrupts.CPU92.TLB:TLB_shootdowns
5646 -53.3% 2639 ? 57% interrupts.CPU95.NMI:Non-maskable_interrupts
5646 -53.3% 2639 ? 57% interrupts.CPU95.PMI:Performance_monitoring_interrupts
4544 ? 14% -51.9% 2185 ? 12% interrupts.CPU97.NMI:Non-maskable_interrupts
4544 ? 14% -51.9% 2185 ? 12% interrupts.CPU97.PMI:Performance_monitoring_interrupts
271.75 ? 18% -28.9% 193.25 ? 30% interrupts.CPU98.RES:Rescheduling_interrupts
721411 ? 8% -20.0% 577152 ? 6% interrupts.NMI:Non-maskable_interrupts
721411 ? 8% -20.0% 577152 ? 6% interrupts.PMI:Performance_monitoring_interrupts



will-it-scale.per_thread_ops

25500 +-------------------------------------------------------------------+
| + |
25000 |-+ : : |
24500 |-+ + : : + |
| .+. .+. :: +.+ : + + |
24000 |-+. .+. + +.+ +. .+ : +. .+. +. +. .+. + +.+ .+ +.|
23500 |++ + + +.+ :+ + +. + + + + + |
| + + |
23000 |-+ |
22500 |-+ |
| O |
22000 |-O O O O O O OO O O O O O O |
21500 |-+ O O O O O O |
| O O O |
21000 +-------------------------------------------------------------------+


will-it-scale.workload

2.45e+06 +----------------------------------------------------------------+
| + |
2.4e+06 |-+ :: |
2.35e+06 |-+ .+ : : + |
| .+. +.+. + : +.+ : :+ |
2.3e+06 |-+.++. .+ + +. : : +.+.+ +.+.+ .+. + +.+.+.+ +.|
2.25e+06 |++ + +.+ :+ +. + + + |
| + + |
2.2e+06 |-+ |
2.15e+06 |-+ |
| O O O O O O |
2.1e+06 |-O O O O O OO O O O O |
2.05e+06 |-+ O OO OO O |
| O |
2e+06 +----------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Oliver Sang


Attachments:
config-5.7.0-03920-gbe5d0a74c62d8d (160.44 kB)
job-script (7.65 kB)
job.yaml (5.23 kB)
reproduce (352.00 B)

2020-11-16 16:25:10

by Johannes Weiner

Subject: Re: [mm] be5d0a74c6: will-it-scale.per_thread_ops -9.1% regression

On Sun, Nov 15, 2020 at 05:55:44PM +0800, kernel test robot wrote:
>
> Greeting,
>
> FYI, we noticed a -9.1% regression of will-it-scale.per_thread_ops due to commit:
>
>
> commit: be5d0a74c62d8da43f9526a5b08cdd18e2bbc37a ("mm: memcontrol: switch to native NR_ANON_MAPPED counter")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
>
> in testcase: will-it-scale
> on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
> with following parameters:
>
> nr_task: 50%
> mode: thread
> test: page_fault2
> cpufreq_governor: performance
> ucode: 0x5002f01

I suspect it's the lock_page_memcg() in page_remove_rmap(). We already
needed it for shared mappings, and this patch added it to private path
as well, which this test exercises.

The slowpath for this lock is extremely cold - most of the time it's
just an rcu_read_lock(). But we're still doing the function call.
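
For illustration only, here is a standalone userspace sketch of the structure the patch below introduces: an inline fast path that only checks a rarely-set flag, with the locking pushed into an out-of-line slow path. The names (moving_account, stat_update) are invented for the example and this is not kernel code; the point is the call-overhead structure, not the kernel's actual synchronization, which relies on RCU and move_lock.

/* Illustration only: invented names, userspace, single-threaded. */
#include <stdatomic.h>
#include <stdio.h>

static atomic_int moving_account;	/* almost always 0 */
static long nr_mapped;

/* Out of line: only reached while a (rare) "move" is in flight. */
static void __attribute__((noinline)) stat_update_slowpath(void)
{
	/* the real code would take a spinlock here */
	nr_mapped++;
}

/* Inline: the common case is a flag check plus the update itself. */
static inline void stat_update(void)
{
	if (__builtin_expect(atomic_load(&moving_account) != 0, 0)) {
		stat_update_slowpath();
		return;
	}
	nr_mapped++;	/* common case: no lock, no function call */
}

int main(void)
{
	for (long i = 0; i < 10000000; i++)
		stat_update();
	printf("nr_mapped = %ld\n", nr_mapped);
	return 0;
}

With the usual optimization flags the common path compiles down to a load, a test and the increment, with no call; that is roughly what inlining the lock_page_memcg() fast path is meant to buy.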

Could you try if this patch helps, please?

From f6e8e56b369109d1362de2c27ea6601d5c411b2e Mon Sep 17 00:00:00 2001
From: Johannes Weiner <[email protected]>
Date: Mon, 16 Nov 2020 10:48:06 -0500
Subject: [PATCH] lockpagememcg

---
include/linux/memcontrol.h | 61 ++++++++++++++++++++++++++--
mm/memcontrol.c | 82 +++++++-------------------------------
2 files changed, 73 insertions(+), 70 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 20108e426f84..b4b73e375948 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -842,9 +842,64 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
extern bool cgroup_memory_noswap;
#endif

-struct mem_cgroup *lock_page_memcg(struct page *page);
-void __unlock_page_memcg(struct mem_cgroup *memcg);
-void unlock_page_memcg(struct page *page);
+struct mem_cgroup *lock_page_memcg_slowpath(struct page *page,
+ struct mem_cgroup *memcg);
+void unlock_page_memcg_slowpath(struct mem_cgroup *memcg);
+
+/**
+ * lock_page_memcg - lock a page and memcg binding
+ * @page: the page
+ *
+ * This function protects unlocked LRU pages from being moved to
+ * another cgroup.
+ *
+ * It ensures lifetime of the memcg -- the caller is responsible for
+ * the lifetime of the page; __unlock_page_memcg() is available when
+ * @page might get freed inside the locked section.
+ */
+static inline struct mem_cgroup *lock_page_memcg(struct page *page)
+{
+ struct page *head = compound_head(page); /* rmap on tail pages */
+ struct mem_cgroup *memcg;
+
+ /*
+ * The RCU lock is held throughout the transaction. The fast
+ * path can get away without acquiring the memcg->move_lock
+ * because page moving starts with an RCU grace period.
+ *
+ * The RCU lock also protects the memcg from being freed when
+ * the page state that is going to change is the only thing
+ * preventing the page itself from being freed. E.g. writeback
+ * doesn't hold a page reference and relies on PG_writeback to
+ * keep off truncation, migration and so forth.
+ */
+ rcu_read_lock();
+
+ if (mem_cgroup_disabled())
+ return NULL;
+
+ memcg = page_memcg(head);
+ if (unlikely(!memcg))
+ return NULL;
+
+ if (likely(!atomic_read(&memcg->moving_account)))
+ return memcg;
+
+ return lock_page_memcg_slowpath(head, memcg);
+}
+
+static inline void __unlock_page_memcg(struct mem_cgroup *memcg)
+{
+ if (unlikely(memcg && memcg->move_lock_task == current))
+ unlock_page_memcg_slowpath(memcg);
+
+ rcu_read_unlock();
+}
+
+static inline void unlock_page_memcg(struct page *page)
+{
+ __unlock_page_memcg(page_memcg(compound_head(page)));
+}

/*
* idx can be of type enum memcg_stat_item or node_stat_item.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 69a2893a6455..9acc42388b86 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2084,49 +2084,19 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg)
pr_cont(" are going to be killed due to memory.oom.group set\n");
}

-/**
- * lock_page_memcg - lock a page and memcg binding
- * @page: the page
- *
- * This function protects unlocked LRU pages from being moved to
- * another cgroup.
- *
- * It ensures lifetime of the returned memcg. Caller is responsible
- * for the lifetime of the page; __unlock_page_memcg() is available
- * when @page might get freed inside the locked section.
- */
-struct mem_cgroup *lock_page_memcg(struct page *page)
+struct mem_cgroup *lock_page_memcg_slowpath(struct page *page,
+ struct mem_cgroup *memcg)
{
- struct page *head = compound_head(page); /* rmap on tail pages */
- struct mem_cgroup *memcg;
unsigned long flags;
-
- /*
- * The RCU lock is held throughout the transaction. The fast
- * path can get away without acquiring the memcg->move_lock
- * because page moving starts with an RCU grace period.
- *
- * The RCU lock also protects the memcg from being freed when
- * the page state that is going to change is the only thing
- * preventing the page itself from being freed. E.g. writeback
- * doesn't hold a page reference and relies on PG_writeback to
- * keep off truncation, migration and so forth.
- */
- rcu_read_lock();
-
- if (mem_cgroup_disabled())
- return NULL;
again:
- memcg = page_memcg(head);
- if (unlikely(!memcg))
- return NULL;
-
- if (atomic_read(&memcg->moving_account) <= 0)
- return memcg;
-
spin_lock_irqsave(&memcg->move_lock, flags);
- if (memcg != page_memcg(head)) {
+ if (memcg != page_memcg(page)) {
spin_unlock_irqrestore(&memcg->move_lock, flags);
+ memcg = page_memcg(page);
+ if (unlikely(!memcg))
+ return NULL;
+ if (!atomic_read(&memcg->moving_account))
+ return memcg;
goto again;
}

@@ -2140,39 +2110,17 @@ struct mem_cgroup *lock_page_memcg(struct page *page)

return memcg;
}
-EXPORT_SYMBOL(lock_page_memcg);
-
-/**
- * __unlock_page_memcg - unlock and unpin a memcg
- * @memcg: the memcg
- *
- * Unlock and unpin a memcg returned by lock_page_memcg().
- */
-void __unlock_page_memcg(struct mem_cgroup *memcg)
-{
- if (memcg && memcg->move_lock_task == current) {
- unsigned long flags = memcg->move_lock_flags;
-
- memcg->move_lock_task = NULL;
- memcg->move_lock_flags = 0;
-
- spin_unlock_irqrestore(&memcg->move_lock, flags);
- }
-
- rcu_read_unlock();
-}
+EXPORT_SYMBOL(lock_page_memcg_slowpath);

-/**
- * unlock_page_memcg - unlock a page and memcg binding
- * @page: the page
- */
-void unlock_page_memcg(struct page *page)
+void unlock_page_memcg_slowpath(struct mem_cgroup *memcg)
{
- struct page *head = compound_head(page);
+ unsigned long flags = memcg->move_lock_flags;

- __unlock_page_memcg(page_memcg(head));
+ memcg->move_lock_task = NULL;
+ memcg->move_lock_flags = 0;
+ spin_unlock_irqrestore(&memcg->move_lock, flags);
}
-EXPORT_SYMBOL(unlock_page_memcg);
+EXPORT_SYMBOL(unlock_page_memcg_slowpath);

struct memcg_stock_pcp {
struct mem_cgroup *cached; /* this never be root cgroup */
--
2.29.1

2020-11-18 02:53:02

by Xing Zhengjun

Subject: Re: [LKP] Re: [mm] be5d0a74c6: will-it-scale.per_thread_ops -9.1% regression



On 11/17/2020 12:19 AM, Johannes Weiner wrote:
> On Sun, Nov 15, 2020 at 05:55:44PM +0800, kernel test robot wrote:
>>
>> Greeting,
>>
>> FYI, we noticed a -9.1% regression of will-it-scale.per_thread_ops due to commit:
>>
>>
>> commit: be5d0a74c62d8da43f9526a5b08cdd18e2bbc37a ("mm: memcontrol: switch to native NR_ANON_MAPPED counter")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>>
>> in testcase: will-it-scale
>> on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
>> with following parameters:
>>
>> nr_task: 50%
>> mode: thread
>> test: page_fault2
>> cpufreq_governor: performance
>> ucode: 0x5002f01
>
> I suspect it's the lock_page_memcg() in page_remove_rmap(). We already
> needed it for shared mappings, and this patch added it to private path
> as well, which this test exercises.
>
> The slowpath for this lock is extremely cold - most of the time it's
> just an rcu_read_lock(). But we're still doing the function call.
>
> Could you try if this patch helps, please?

I applied the patch to Linux mainline v5.10-rc4, linux-next next-20201117,
and "be5d0a74c6", but it failed to apply on all of them. What is the
codebase for the patch? I would appreciate it if you could rebase it onto
"be5d0a74c6". From "be5d0a74c6" to v5.10-rc4 or next-20201117 there are a
lot of commits, and they will affect the test result. Thanks.

>
> From f6e8e56b369109d1362de2c27ea6601d5c411b2e Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <[email protected]>
> Date: Mon, 16 Nov 2020 10:48:06 -0500
> Subject: [PATCH] lockpagememcg
>
> ---
> include/linux/memcontrol.h | 61 ++++++++++++++++++++++++++--
> mm/memcontrol.c | 82 +++++++-------------------------------
> 2 files changed, 73 insertions(+), 70 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 20108e426f84..b4b73e375948 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -842,9 +842,64 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg);
> extern bool cgroup_memory_noswap;
> #endif
>
> -struct mem_cgroup *lock_page_memcg(struct page *page);
> -void __unlock_page_memcg(struct mem_cgroup *memcg);
> -void unlock_page_memcg(struct page *page);
> +struct mem_cgroup *lock_page_memcg_slowpath(struct page *page,
> + struct mem_cgroup *memcg);
> +void unlock_page_memcg_slowpath(struct mem_cgroup *memcg);
> +
> +/**
> + * lock_page_memcg - lock a page and memcg binding
> + * @page: the page
> + *
> + * This function protects unlocked LRU pages from being moved to
> + * another cgroup.
> + *
> + * It ensures lifetime of the memcg -- the caller is responsible for
> + * the lifetime of the page; __unlock_page_memcg() is available when
> + * @page might get freed inside the locked section.
> + */
> +static inline struct mem_cgroup *lock_page_memcg(struct page *page)
> +{
> + struct page *head = compound_head(page); /* rmap on tail pages */
> + struct mem_cgroup *memcg;
> +
> + /*
> + * The RCU lock is held throughout the transaction. The fast
> + * path can get away without acquiring the memcg->move_lock
> + * because page moving starts with an RCU grace period.
> + *
> + * The RCU lock also protects the memcg from being freed when
> + * the page state that is going to change is the only thing
> + * preventing the page itself from being freed. E.g. writeback
> + * doesn't hold a page reference and relies on PG_writeback to
> + * keep off truncation, migration and so forth.
> + */
> + rcu_read_lock();
> +
> + if (mem_cgroup_disabled())
> + return NULL;
> +
> + memcg = page_memcg(head);
> + if (unlikely(!memcg))
> + return NULL;
> +
> + if (likely(!atomic_read(&memcg->moving_account)))
> + return memcg;
> +
> + return lock_page_memcg_slowpath(head, memcg);
> +}
> +
> +static inline void __unlock_page_memcg(struct mem_cgroup *memcg)
> +{
> + if (unlikely(memcg && memcg->move_lock_task == current))
> + unlock_page_memcg_slowpath(memcg);
> +
> + rcu_read_unlock();
> +}
> +
> +static inline void unlock_page_memcg(struct page *page)
> +{
> + __unlock_page_memcg(page_memcg(compound_head(page)));
> +}
>
> /*
> * idx can be of type enum memcg_stat_item or node_stat_item.
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 69a2893a6455..9acc42388b86 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2084,49 +2084,19 @@ void mem_cgroup_print_oom_group(struct mem_cgroup *memcg)
> pr_cont(" are going to be killed due to memory.oom.group set\n");
> }
>
> -/**
> - * lock_page_memcg - lock a page and memcg binding
> - * @page: the page
> - *
> - * This function protects unlocked LRU pages from being moved to
> - * another cgroup.
> - *
> - * It ensures lifetime of the returned memcg. Caller is responsible
> - * for the lifetime of the page; __unlock_page_memcg() is available
> - * when @page might get freed inside the locked section.
> - */
> -struct mem_cgroup *lock_page_memcg(struct page *page)
> +struct mem_cgroup *lock_page_memcg_slowpath(struct page *page,
> + struct mem_cgroup *memcg)
> {
> - struct page *head = compound_head(page); /* rmap on tail pages */
> - struct mem_cgroup *memcg;
> unsigned long flags;
> -
> - /*
> - * The RCU lock is held throughout the transaction. The fast
> - * path can get away without acquiring the memcg->move_lock
> - * because page moving starts with an RCU grace period.
> - *
> - * The RCU lock also protects the memcg from being freed when
> - * the page state that is going to change is the only thing
> - * preventing the page itself from being freed. E.g. writeback
> - * doesn't hold a page reference and relies on PG_writeback to
> - * keep off truncation, migration and so forth.
> - */
> - rcu_read_lock();
> -
> - if (mem_cgroup_disabled())
> - return NULL;
> again:
> - memcg = page_memcg(head);
> - if (unlikely(!memcg))
> - return NULL;
> -
> - if (atomic_read(&memcg->moving_account) <= 0)
> - return memcg;
> -
> spin_lock_irqsave(&memcg->move_lock, flags);
> - if (memcg != page_memcg(head)) {
> + if (memcg != page_memcg(page)) {
> spin_unlock_irqrestore(&memcg->move_lock, flags);
> + memcg = page_memcg(page);
> + if (unlikely(!memcg))
> + return NULL;
> + if (!atomic_read(&memcg->moving_account))
> + return memcg;
> goto again;
> }
>
> @@ -2140,39 +2110,17 @@ struct mem_cgroup *lock_page_memcg(struct page *page)
>
> return memcg;
> }
> -EXPORT_SYMBOL(lock_page_memcg);
> -
> -/**
> - * __unlock_page_memcg - unlock and unpin a memcg
> - * @memcg: the memcg
> - *
> - * Unlock and unpin a memcg returned by lock_page_memcg().
> - */
> -void __unlock_page_memcg(struct mem_cgroup *memcg)
> -{
> - if (memcg && memcg->move_lock_task == current) {
> - unsigned long flags = memcg->move_lock_flags;
> -
> - memcg->move_lock_task = NULL;
> - memcg->move_lock_flags = 0;
> -
> - spin_unlock_irqrestore(&memcg->move_lock, flags);
> - }
> -
> - rcu_read_unlock();
> -}
> +EXPORT_SYMBOL(lock_page_memcg_slowpath);
>
> -/**
> - * unlock_page_memcg - unlock a page and memcg binding
> - * @page: the page
> - */
> -void unlock_page_memcg(struct page *page)
> +void unlock_page_memcg_slowpath(struct mem_cgroup *memcg)
> {
> - struct page *head = compound_head(page);
> + unsigned long flags = memcg->move_lock_flags;
>
> - __unlock_page_memcg(page_memcg(head));
> + memcg->move_lock_task = NULL;
> + memcg->move_lock_flags = 0;
> + spin_unlock_irqrestore(&memcg->move_lock, flags);
> }
> -EXPORT_SYMBOL(unlock_page_memcg);
> +EXPORT_SYMBOL(unlock_page_memcg_slowpath);
>
> struct memcg_stock_pcp {
> struct mem_cgroup *cached; /* this never be root cgroup */
>

--
Zhengjun Xing