2020-10-14 09:21:27

by Chen, Rong A

Subject: [mm/writeback] 8d92890bd6: will-it-scale.per_process_ops -15.3% regression

Greetings,

FYI, we noticed a -15.3% regression of will-it-scale.per_process_ops due to commit:


commit: 8d92890bd6b8502d6aee4b37430ae6444ade7a8c ("mm/writeback: discard NR_UNSTABLE_NFS, use NR_WRITEBACK instead")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with following parameters:

nr_task: 100%
mode: process
test: page_fault3
cpufreq_governor: performance
ucode: 0x5002f01

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both process- and thread-based tests in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/100%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap4/page_fault3/will-it-scale/0x5002f01

commit:
a37b0715dd ("mm/writeback: replace PF_LESS_THROTTLE with PF_LOCAL_THROTTLE")
8d92890bd6 ("mm/writeback: discard NR_UNSTABLE_NFS, use NR_WRITEBACK instead")

a37b0715ddf30077 8d92890bd6b8502d6aee4b37430
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
15:4 -53% 13:4 perf-profile.calltrace.cycles-pp.error_entry.testcase
13:4 -46% 11:4 perf-profile.calltrace.cycles-pp.sync_regs.error_entry.testcase
16:4 -56% 13:4 perf-profile.children.cycles-pp.error_entry
0:4 -1% 0:4 perf-profile.children.cycles-pp.error_exit
1:4 -7% 1:4 perf-profile.self.cycles-pp.error_entry
%stddev %change %stddev
\ | \
397562 ± 2% -15.3% 336574 will-it-scale.per_process_ops
76332041 ± 2% -15.3% 64622457 will-it-scale.workload
39.77 ± 2% -3.3% 38.47 ± 2% boot-time.boot
6743 ± 2% -3.7% 6491 ± 2% boot-time.idle
1181 ± 6% -16.5% 986.75 ± 3% slabinfo.file_lock_cache.active_objs
1181 ± 6% -16.5% 986.75 ± 3% slabinfo.file_lock_cache.num_objs
52683124 -13.2% 45706664 proc-vmstat.numa_hit
52589716 -13.3% 45613288 proc-vmstat.numa_local
52808650 -13.2% 45839676 proc-vmstat.pgalloc_normal
2.291e+10 ± 2% -15.3% 1.94e+10 ± 2% proc-vmstat.pgfault
49686233 ± 7% -15.7% 41876349 ± 5% proc-vmstat.pgfree
13891167 ± 3% -15.0% 11805002 ± 2% numa-numastat.node1.local_node
13912999 ± 3% -15.0% 11829816 ± 2% numa-numastat.node1.numa_hit
13716455 ± 5% -14.8% 11689391 numa-numastat.node2.local_node
13735214 ± 5% -14.7% 11712688 numa-numastat.node2.numa_hit
13726371 ± 4% -14.9% 11684584 numa-numastat.node3.local_node
13757483 ± 4% -14.9% 11712533 numa-numastat.node3.numa_hit
32.74 ± 30% -25.0% 24.56 ± 15% sched_debug.cfs_rq:/.load_avg.stddev
61.79 ± 8% -12.9% 53.80 ± 4% sched_debug.cfs_rq:/.util_avg.stddev
49.46 ± 86% -89.4% 5.25 ± 74% sched_debug.cfs_rq:/.util_est_enqueued.min
21511 ± 3% -9.6% 19450 ± 4% sched_debug.cpu.nr_switches.max
2654 ± 2% +9.7% 2913 ± 4% sched_debug.cpu.sched_goidle.max
372.89 ± 4% -12.4% 326.61 ± 6% sched_debug.cpu.ttwu_local.stddev
8526 ± 4% -9.0% 7758 ± 2% numa-meminfo.node0.KernelStack
13013 ± 3% -8.5% 11911 numa-meminfo.node0.PageTables
110904 ± 7% -12.0% 97557 ± 7% numa-meminfo.node0.SUnreclaim
171034 ± 6% -11.5% 151283 ± 3% numa-meminfo.node0.Slab
7265 ± 4% +9.3% 7941 ± 4% numa-meminfo.node1.KernelStack
87118 ± 8% +13.4% 98831 ± 3% numa-meminfo.node1.SUnreclaim
268655 +8.0% 290122 ± 5% numa-meminfo.node1.Unevictable
8527 ± 4% -9.0% 7758 ± 2% numa-vmstat.node0.nr_kernel_stack
3250 ± 3% -8.3% 2979 numa-vmstat.node0.nr_page_table_pages
27726 ± 7% -12.0% 24389 ± 7% numa-vmstat.node0.nr_slab_unreclaimable
7265 ± 4% +9.3% 7941 ± 4% numa-vmstat.node1.nr_kernel_stack
21779 ± 8% +13.4% 24707 ± 3% numa-vmstat.node1.nr_slab_unreclaimable
67163 +8.0% 72530 ± 5% numa-vmstat.node1.nr_unevictable
67163 +8.0% 72530 ± 5% numa-vmstat.node1.nr_zone_unevictable
8044550 ± 3% -10.9% 7168348 ± 2% numa-vmstat.node1.numa_hit
7970411 ± 3% -10.6% 7125578 ± 2% numa-vmstat.node1.numa_local
7988934 ± 4% -12.3% 7002772 numa-vmstat.node2.numa_hit
7879319 ± 4% -12.6% 6888935 numa-vmstat.node2.numa_local
8041049 ± 4% -13.3% 6974987 numa-vmstat.node3.numa_hit
7919771 ± 4% -13.4% 6856917 numa-vmstat.node3.numa_local
5422 ± 34% +60.2% 8684 interrupts.CPU1.NMI:Non-maskable_interrupts
5422 ± 34% +60.2% 8684 interrupts.CPU1.PMI:Performance_monitoring_interrupts
847.75 ± 7% +10.4% 935.50 ± 6% interrupts.CPU153.CAL:Function_call_interrupts
845.75 ± 7% +10.6% 935.50 ± 6% interrupts.CPU154.CAL:Function_call_interrupts
847.75 ± 7% +10.4% 935.50 ± 6% interrupts.CPU155.CAL:Function_call_interrupts
847.50 ± 7% +10.4% 935.50 ± 6% interrupts.CPU156.CAL:Function_call_interrupts
847.00 ± 7% +10.4% 935.25 ± 6% interrupts.CPU158.CAL:Function_call_interrupts
216.25 ± 17% -55.4% 96.50 ± 58% interrupts.CPU158.RES:Rescheduling_interrupts
848.00 ± 8% +10.1% 933.75 ± 6% interrupts.CPU159.CAL:Function_call_interrupts
847.50 ± 8% +10.4% 935.50 ± 6% interrupts.CPU160.CAL:Function_call_interrupts
847.50 ± 8% +10.4% 935.50 ± 6% interrupts.CPU161.CAL:Function_call_interrupts
847.25 ± 7% +10.4% 935.50 ± 6% interrupts.CPU162.CAL:Function_call_interrupts
847.50 ± 8% +10.4% 935.50 ± 6% interrupts.CPU163.CAL:Function_call_interrupts
370.25 ± 50% -40.1% 221.75 ± 15% interrupts.CPU191.RES:Rescheduling_interrupts
1294 ± 51% -49.8% 649.75 ± 16% interrupts.CPU25.RES:Rescheduling_interrupts
3.38 ± 2% +9.5% 3.70 perf-stat.i.MPKI
3.812e+10 ± 2% -15.2% 3.231e+10 perf-stat.i.branch-instructions
0.30 +0.0 0.31 perf-stat.i.branch-miss-rate%
1.104e+08 ± 2% -12.3% 96774106 perf-stat.i.branch-misses
54.85 -1.7 53.16 perf-stat.i.cache-miss-rate%
3.469e+08 -9.9% 3.124e+08 perf-stat.i.cache-misses
6.312e+08 -7.1% 5.864e+08 perf-stat.i.cache-references
3.14 ± 2% +18.0% 3.71 perf-stat.i.cpi
1697 +11.1% 1885 perf-stat.i.cycles-between-cache-misses
3444920 ± 9% -16.1% 2890587 perf-stat.i.dTLB-load-misses
5.316e+10 ± 2% -15.2% 4.506e+10 perf-stat.i.dTLB-loads
1.278e+09 ± 2% -15.5% 1.08e+09 ± 2% perf-stat.i.dTLB-store-misses
2.735e+10 ± 2% -15.3% 2.318e+10 perf-stat.i.dTLB-stores
71867884 -11.4% 63642234 perf-stat.i.iTLB-load-misses
233257 ± 4% -16.2% 195493 ± 6% perf-stat.i.iTLB-loads
1.873e+11 ± 2% -15.3% 1.587e+11 perf-stat.i.instructions
2611 -4.4% 2497 perf-stat.i.instructions-per-iTLB-miss
0.32 ± 2% -15.2% 0.27 perf-stat.i.ipc
0.35 ± 69% +128.5% 0.79 ± 3% perf-stat.i.metric.K/sec
629.28 ± 2% -15.2% 533.56 perf-stat.i.metric.M/sec
75758540 ± 2% -15.4% 64123701 ± 2% perf-stat.i.minor-faults
16.32 ± 2% +9.5 25.77 ± 2% perf-stat.i.node-load-miss-rate%
6187005 ± 3% +9.5% 6773017 ± 2% perf-stat.i.node-load-misses
32626964 ± 3% -39.1% 19884304 perf-stat.i.node-loads
22.85 -0.3 22.54 perf-stat.i.node-store-miss-rate%
22644910 ± 2% -17.3% 18738027 perf-stat.i.node-store-misses
77237823 ± 2% -15.5% 65254403 perf-stat.i.node-stores
75758541 ± 2% -15.4% 64123701 ± 2% perf-stat.i.page-faults
3.37 ± 2% +9.6% 3.69 perf-stat.overall.MPKI
0.29 +0.0 0.30 perf-stat.overall.branch-miss-rate%
55.00 -1.7 53.31 perf-stat.overall.cache-miss-rate%
3.14 ± 2% +18.0% 3.71 perf-stat.overall.cpi
1694 +11.1% 1882 perf-stat.overall.cycles-between-cache-misses
2606 -4.3% 2493 perf-stat.overall.instructions-per-iTLB-miss
0.32 ± 2% -15.3% 0.27 perf-stat.overall.ipc
15.91 ± 2% +9.5 25.37 ± 2% perf-stat.overall.node-load-miss-rate%
22.67 -0.4 22.31 perf-stat.overall.node-store-miss-rate%
3.801e+10 ± 2% -15.2% 3.223e+10 perf-stat.ps.branch-instructions
1.1e+08 ± 2% -12.3% 96469431 perf-stat.ps.branch-misses
3.46e+08 -9.9% 3.116e+08 perf-stat.ps.cache-misses
6.293e+08 -7.1% 5.846e+08 perf-stat.ps.cache-references
3494841 ± 8% -17.0% 2901338 perf-stat.ps.dTLB-load-misses
5.3e+10 ± 2% -15.2% 4.494e+10 perf-stat.ps.dTLB-loads
1.275e+09 ± 2% -15.5% 1.078e+09 ± 2% perf-stat.ps.dTLB-store-misses
2.727e+10 ± 2% -15.2% 2.312e+10 perf-stat.ps.dTLB-stores
71652742 -11.4% 63468528 perf-stat.ps.iTLB-load-misses
231120 ± 4% -16.5% 192992 ± 5% perf-stat.ps.iTLB-loads
1.868e+11 ± 2% -15.2% 1.583e+11 perf-stat.ps.instructions
75533403 ± 2% -15.3% 63956449 perf-stat.ps.minor-faults
6164201 ± 3% +9.4% 6746436 ± 2% perf-stat.ps.node-load-misses
32581427 ± 3% -39.1% 19854118 perf-stat.ps.node-loads
22575559 ± 2% -17.2% 18684407 perf-stat.ps.node-store-misses
77005596 ± 2% -15.5% 65083018 perf-stat.ps.node-stores
75533404 ± 2% -15.3% 63956450 perf-stat.ps.page-faults
5.653e+13 ± 2% -15.3% 4.788e+13 perf-stat.total.instructions
79.63 -3.5 76.17 perf-profile.calltrace.cycles-pp.page_fault.testcase
8.31 ± 3% -2.1 6.22 ± 2% perf-profile.calltrace.cycles-pp.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
7.96 ± 3% -2.0 5.91 ± 2% perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault
7.37 ± 3% -2.0 5.41 ± 2% perf-profile.calltrace.cycles-pp.shmem_getpage_gfp.shmem_fault.__do_fault.do_fault.__handle_mm_fault
6.80 ± 3% -1.9 4.92 ± 3% perf-profile.calltrace.cycles-pp.find_lock_entry.shmem_getpage_gfp.shmem_fault.__do_fault.do_fault
4.76 ± 3% -1.4 3.39 ± 3% perf-profile.calltrace.cycles-pp.find_get_entry.find_lock_entry.shmem_getpage_gfp.shmem_fault.__do_fault
6.30 -0.9 5.35 ± 3% perf-profile.calltrace.cycles-pp.__count_memcg_events.handle_mm_fault.do_user_addr_fault.page_fault.testcase
5.15 ± 2% -0.7 4.41 perf-profile.calltrace.cycles-pp.__mod_lruvec_state.page_remove_rmap.zap_pte_range.unmap_page_range.unmap_vmas
1.62 ± 3% -0.7 0.88 ± 3% perf-profile.calltrace.cycles-pp.xas_load.find_get_entry.find_lock_entry.shmem_getpage_gfp.shmem_fault
4.27 ± 3% -0.6 3.66 perf-profile.calltrace.cycles-pp.__mod_memcg_state.__mod_lruvec_state.page_remove_rmap.zap_pte_range.unmap_page_range
5.87 ± 3% -0.5 5.33 perf-profile.calltrace.cycles-pp.__mod_lruvec_state.page_add_file_rmap.alloc_set_pte.finish_fault.do_fault
2.97 -0.5 2.46 ± 3% perf-profile.calltrace.cycles-pp.lock_page_memcg.page_add_file_rmap.alloc_set_pte.finish_fault.do_fault
4.99 ± 4% -0.4 4.60 ± 2% perf-profile.calltrace.cycles-pp.__mod_memcg_state.__mod_lruvec_state.page_add_file_rmap.alloc_set_pte.finish_fault
2.33 ± 2% -0.4 1.95 ± 2% perf-profile.calltrace.cycles-pp.fault_dirty_shared_page.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
1.05 ± 4% -0.3 0.77 ± 3% perf-profile.calltrace.cycles-pp._raw_spin_lock.alloc_set_pte.finish_fault.do_fault.__handle_mm_fault
0.94 ± 2% -0.1 0.79 ± 2% perf-profile.calltrace.cycles-pp.__perf_sw_event.do_user_addr_fault.page_fault.testcase
0.66 ± 2% -0.1 0.52 perf-profile.calltrace.cycles-pp.down_read_trylock.do_user_addr_fault.page_fault.testcase
1.04 -0.1 0.92 perf-profile.calltrace.cycles-pp.lock_page_memcg.page_remove_rmap.zap_pte_range.unmap_page_range.unmap_vmas
0.99 ± 2% -0.1 0.88 ± 2% perf-profile.calltrace.cycles-pp.__perf_sw_event.page_fault.testcase
0.96 ± 3% -0.1 0.85 perf-profile.calltrace.cycles-pp.up_read.do_user_addr_fault.page_fault.testcase
0.80 ± 3% -0.1 0.69 ± 2% perf-profile.calltrace.cycles-pp.file_update_time.fault_dirty_shared_page.do_fault.__handle_mm_fault.handle_mm_fault
0.63 ± 2% -0.1 0.53 ± 2% perf-profile.calltrace.cycles-pp.___perf_sw_event.__perf_sw_event.do_user_addr_fault.page_fault.testcase
0.73 ± 2% -0.1 0.64 ± 2% perf-profile.calltrace.cycles-pp.swapgs_restore_regs_and_return_to_usermode.testcase
0.61 ± 2% -0.1 0.55 ± 2% perf-profile.calltrace.cycles-pp.___perf_sw_event.__perf_sw_event.page_fault.testcase
84.98 +1.1 86.08 perf-profile.calltrace.cycles-pp.testcase
48.52 +1.3 49.87 perf-profile.calltrace.cycles-pp.do_user_addr_fault.page_fault.testcase
14.57 ± 3% +1.6 16.13 perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
14.57 ± 3% +1.6 16.13 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
14.57 ± 3% +1.6 16.13 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
14.57 ± 3% +1.6 16.14 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
14.57 ± 3% +1.6 16.14 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
14.57 ± 3% +1.6 16.14 perf-profile.calltrace.cycles-pp.__munmap
14.56 ± 3% +1.6 16.13 perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
14.54 ± 3% +1.6 16.11 perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
14.54 ± 3% +1.6 16.11 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
14.48 ± 3% +1.6 16.07 perf-profile.calltrace.cycles-pp.zap_pte_range.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
44.16 +2.1 46.22 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.page_fault.testcase
11.15 ± 4% +2.2 13.37 perf-profile.calltrace.cycles-pp.page_remove_rmap.zap_pte_range.unmap_page_range.unmap_vmas.unmap_region
36.38 +3.2 39.61 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.page_fault.testcase
34.47 ± 2% +3.5 37.98 perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.page_fault
23.42 ± 3% +6.1 29.47 perf-profile.calltrace.cycles-pp.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
23.25 ± 3% +6.1 29.34 perf-profile.calltrace.cycles-pp.alloc_set_pte.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
21.20 ± 4% +6.5 27.72 perf-profile.calltrace.cycles-pp.page_add_file_rmap.alloc_set_pte.finish_fault.do_fault.__handle_mm_fault
8.32 ± 3% -2.1 6.22 ± 2% perf-profile.children.cycles-pp.__do_fault
7.97 ± 3% -2.1 5.92 ± 2% perf-profile.children.cycles-pp.shmem_fault
7.39 ± 3% -2.0 5.43 ± 2% perf-profile.children.cycles-pp.shmem_getpage_gfp
6.86 ± 3% -1.9 4.96 ± 3% perf-profile.children.cycles-pp.find_lock_entry
85.36 -1.6 83.79 perf-profile.children.cycles-pp.testcase
4.80 ± 3% -1.4 3.42 ± 3% perf-profile.children.cycles-pp.find_get_entry
11.04 ± 3% -1.3 9.76 perf-profile.children.cycles-pp.__mod_lruvec_state
65.24 -1.2 64.03 perf-profile.children.cycles-pp.page_fault
9.28 ± 3% -1.0 8.27 perf-profile.children.cycles-pp.__mod_memcg_state
6.30 -1.0 5.35 ± 3% perf-profile.children.cycles-pp.__count_memcg_events
1.63 ± 3% -0.7 0.90 ± 3% perf-profile.children.cycles-pp.xas_load
4.02 -0.6 3.39 ± 2% perf-profile.children.cycles-pp.lock_page_memcg
3.46 ± 3% -0.5 2.99 ± 3% perf-profile.children.cycles-pp.sync_regs
2.38 ± 2% -0.4 1.99 ± 2% perf-profile.children.cycles-pp.fault_dirty_shared_page
1.08 ± 4% -0.3 0.80 ± 2% perf-profile.children.cycles-pp._raw_spin_lock
1.94 ± 2% -0.3 1.68 ± 2% perf-profile.children.cycles-pp.__perf_sw_event
1.31 ± 2% -0.2 1.14 perf-profile.children.cycles-pp.___perf_sw_event
0.88 ± 2% -0.2 0.72 ± 2% perf-profile.children.cycles-pp.page_mapping
0.82 ± 2% -0.2 0.66 ± 2% perf-profile.children.cycles-pp.set_page_dirty
0.66 ± 2% -0.1 0.52 perf-profile.children.cycles-pp.down_read_trylock
0.61 ± 4% -0.1 0.47 ± 3% perf-profile.children.cycles-pp.unlock_page
0.83 ± 2% -0.1 0.72 ± 2% perf-profile.children.cycles-pp.file_update_time
0.96 ± 3% -0.1 0.85 perf-profile.children.cycles-pp.up_read
0.52 ± 2% -0.1 0.42 ± 3% perf-profile.children.cycles-pp.tlb_flush_mmu
0.73 ± 2% -0.1 0.64 ± 2% perf-profile.children.cycles-pp.swapgs_restore_regs_and_return_to_usermode
0.39 ± 2% -0.1 0.31 ± 3% perf-profile.children.cycles-pp.release_pages
0.45 ± 2% -0.1 0.38 ± 4% perf-profile.children.cycles-pp.find_vma
0.52 ± 3% -0.1 0.45 perf-profile.children.cycles-pp.current_time
0.62 ± 3% -0.1 0.55 ± 3% perf-profile.children.cycles-pp.__mod_node_page_state
0.38 ± 2% -0.1 0.32 ± 3% perf-profile.children.cycles-pp.vmacache_find
0.41 ± 3% -0.1 0.35 perf-profile.children.cycles-pp.do_page_fault
0.41 ± 2% -0.1 0.35 ± 2% perf-profile.children.cycles-pp.___might_sleep
0.20 ± 4% -0.1 0.14 ± 5% perf-profile.children.cycles-pp.__tlb_remove_page_size
0.23 ± 7% -0.1 0.18 ± 6% perf-profile.children.cycles-pp.xas_start
0.47 -0.0 0.42 ± 2% perf-profile.children.cycles-pp.__unlock_page_memcg
0.33 ± 3% -0.0 0.29 ± 2% perf-profile.children.cycles-pp.prepare_exit_to_usermode
0.26 ± 3% -0.0 0.23 ± 2% perf-profile.children.cycles-pp.mark_page_accessed
0.26 -0.0 0.23 perf-profile.children.cycles-pp.__set_page_dirty_no_writeback
0.14 ± 3% -0.0 0.11 ± 7% perf-profile.children.cycles-pp.perf_swevent_event
0.14 ± 6% -0.0 0.11 ± 7% perf-profile.children.cycles-pp.vm_normal_page
0.21 ± 2% -0.0 0.19 ± 4% perf-profile.children.cycles-pp.__might_sleep
0.18 ± 4% -0.0 0.16 perf-profile.children.cycles-pp._cond_resched
0.13 ± 3% -0.0 0.11 ± 3% perf-profile.children.cycles-pp.PageHuge
0.13 ± 3% -0.0 0.11 ± 6% perf-profile.children.cycles-pp.free_pages_and_swap_cache
0.10 ± 5% -0.0 0.08 perf-profile.children.cycles-pp.rcu_all_qs
0.11 ± 4% -0.0 0.10 perf-profile.children.cycles-pp.page_rmapping
48.60 +1.3 49.95 perf-profile.children.cycles-pp.do_user_addr_fault
14.57 ± 3% +1.6 16.13 perf-profile.children.cycles-pp.__do_munmap
14.57 ± 3% +1.6 16.13 perf-profile.children.cycles-pp.__x64_sys_munmap
14.57 ± 3% +1.6 16.13 perf-profile.children.cycles-pp.__vm_munmap
14.54 ± 3% +1.6 16.11 perf-profile.children.cycles-pp.zap_pte_range
14.56 ± 3% +1.6 16.13 perf-profile.children.cycles-pp.unmap_region
14.62 ± 2% +1.6 16.19 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
14.57 ± 3% +1.6 16.14 perf-profile.children.cycles-pp.__munmap
14.54 ± 3% +1.6 16.11 perf-profile.children.cycles-pp.unmap_vmas
14.54 ± 3% +1.6 16.11 perf-profile.children.cycles-pp.unmap_page_range
14.62 ± 2% +1.6 16.19 perf-profile.children.cycles-pp.do_syscall_64
44.21 +2.0 46.26 perf-profile.children.cycles-pp.handle_mm_fault
11.21 ± 4% +2.2 13.43 perf-profile.children.cycles-pp.page_remove_rmap
36.41 +3.2 39.64 perf-profile.children.cycles-pp.__handle_mm_fault
34.53 ± 2% +3.5 38.03 perf-profile.children.cycles-pp.do_fault
23.44 ± 3% +6.1 29.49 perf-profile.children.cycles-pp.finish_fault
23.28 ± 3% +6.1 29.37 perf-profile.children.cycles-pp.alloc_set_pte
21.24 ± 4% +6.5 27.77 perf-profile.children.cycles-pp.page_add_file_rmap
15.77 ± 2% -2.4 13.39 perf-profile.self.cycles-pp.testcase
9.21 ± 3% -1.0 8.21 perf-profile.self.cycles-pp.__mod_memcg_state
6.29 -1.0 5.34 ± 3% perf-profile.self.cycles-pp.__count_memcg_events
1.39 ± 3% -0.7 0.71 ± 2% perf-profile.self.cycles-pp.xas_load
3.12 ± 3% -0.6 2.49 ± 3% perf-profile.self.cycles-pp.find_get_entry
3.97 -0.6 3.34 ± 3% perf-profile.self.cycles-pp.lock_page_memcg
3.46 ± 3% -0.5 2.99 ± 3% perf-profile.self.cycles-pp.sync_regs
1.42 ± 5% -0.4 0.99 ± 3% perf-profile.self.cycles-pp.find_lock_entry
1.53 ± 2% -0.3 1.21 perf-profile.self.cycles-pp.zap_pte_range
1.07 ± 4% -0.3 0.79 ± 3% perf-profile.self.cycles-pp._raw_spin_lock
1.86 -0.3 1.59 ± 2% perf-profile.self.cycles-pp.__handle_mm_fault
1.14 ± 7% -0.2 0.94 ± 7% perf-profile.self.cycles-pp.__mod_lruvec_state
1.26 ± 2% -0.2 1.10 perf-profile.self.cycles-pp.handle_mm_fault
1.09 -0.2 0.94 ± 2% perf-profile.self.cycles-pp.___perf_sw_event
0.84 ± 2% -0.1 0.69 perf-profile.self.cycles-pp.page_mapping
0.86 ± 2% -0.1 0.72 ± 2% perf-profile.self.cycles-pp.do_user_addr_fault
0.65 ± 2% -0.1 0.51 perf-profile.self.cycles-pp.down_read_trylock
0.59 ± 4% -0.1 0.45 ± 2% perf-profile.self.cycles-pp.unlock_page
0.75 ± 2% -0.1 0.61 perf-profile.self.cycles-pp.alloc_set_pte
0.95 ± 3% -0.1 0.84 perf-profile.self.cycles-pp.up_read
0.63 ± 2% -0.1 0.53 ± 2% perf-profile.self.cycles-pp.__perf_sw_event
0.62 ± 2% -0.1 0.53 perf-profile.self.cycles-pp.page_fault
0.57 ± 3% -0.1 0.48 perf-profile.self.cycles-pp.shmem_fault
0.38 ± 2% -0.1 0.30 ± 4% perf-profile.self.cycles-pp.release_pages
0.60 ± 3% -0.1 0.53 ± 4% perf-profile.self.cycles-pp.__mod_node_page_state
0.53 ± 2% -0.1 0.46 perf-profile.self.cycles-pp.shmem_getpage_gfp
0.39 ± 2% -0.1 0.34 perf-profile.self.cycles-pp.do_page_fault
0.41 ± 4% -0.1 0.35 ± 2% perf-profile.self.cycles-pp.do_fault
0.36 ± 3% -0.1 0.31 ± 3% perf-profile.self.cycles-pp.vmacache_find
0.39 ± 2% -0.1 0.34 perf-profile.self.cycles-pp.___might_sleep
0.22 ± 5% -0.1 0.17 ± 9% perf-profile.self.cycles-pp.xas_start
0.40 ± 3% -0.1 0.35 ± 2% perf-profile.self.cycles-pp.swapgs_restore_regs_and_return_to_usermode
0.45 -0.0 0.40 ± 3% perf-profile.self.cycles-pp.__unlock_page_memcg
0.24 ± 2% -0.0 0.20 ± 3% perf-profile.self.cycles-pp.set_page_dirty
0.31 ± 4% -0.0 0.27 ± 5% perf-profile.self.cycles-pp.file_update_time
0.25 ± 2% -0.0 0.21 ± 2% perf-profile.self.cycles-pp.fault_dirty_shared_page
0.16 ± 2% -0.0 0.12 ± 3% perf-profile.self.cycles-pp.__tlb_remove_page_size
0.28 ± 2% -0.0 0.25 ± 3% perf-profile.self.cycles-pp.prepare_exit_to_usermode
0.26 ± 3% -0.0 0.22 perf-profile.self.cycles-pp.mark_page_accessed
0.16 ± 5% -0.0 0.13 perf-profile.self.cycles-pp.finish_fault
0.23 -0.0 0.21 ± 2% perf-profile.self.cycles-pp.__set_page_dirty_no_writeback
0.16 ± 4% -0.0 0.14 ± 3% perf-profile.self.cycles-pp.__do_fault
0.17 ± 2% -0.0 0.15 ± 4% perf-profile.self.cycles-pp.current_time
0.10 ± 4% -0.0 0.08 ± 5% perf-profile.self.cycles-pp.PageHuge
0.20 ± 2% -0.0 0.18 ± 2% perf-profile.self.cycles-pp.__might_sleep
0.12 ± 4% -0.0 0.11 ± 4% perf-profile.self.cycles-pp.free_pages_and_swap_cache
0.10 ± 4% -0.0 0.09 ± 5% perf-profile.self.cycles-pp.perf_swevent_event
0.12 ± 7% -0.0 0.10 ± 7% perf-profile.self.cycles-pp.vm_normal_page
0.10 ± 5% -0.0 0.08 perf-profile.self.cycles-pp.page_rmapping
4.94 ± 8% +3.1 8.00 ± 3% perf-profile.self.cycles-pp.page_remove_rmap
12.30 ± 6% +7.5 19.84 ± 2% perf-profile.self.cycles-pp.page_add_file_rmap



will-it-scale.per_process_ops

520000 +------------------------------------------------------------------+
500000 |-+ ..+ |
| ..+. : |
480000 |.+ : |
460000 |-+ : |
| : |
440000 |-+ : |
420000 |-+ : |
400000 |-+ : ..+....+.. ..+....+...+.. |
| : ..+... ..+. . ..+...+...+. .|
380000 |-+ +. +. +. |
360000 |-+ |
| O O |
340000 |-+ O O O O |
320000 +------------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


Attachments:
(No filename) (29.91 kB)
config-5.7.0-00467-g8d92890bd6b85 (160.13 kB)
job-script (7.62 kB)
job.yaml (5.40 kB)
reproduce (354.00 B)

2020-10-14 20:50:09

by Jan Kara

Subject: Re: [mm/writeback] 8d92890bd6: will-it-scale.per_process_ops -15.3% regression

On Wed 14-10-20 16:47:06, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed a -15.3% regression of will-it-scale.per_process_ops due
> to commit:
>
> commit: 8d92890bd6b8502d6aee4b37430ae6444ade7a8c ("mm/writeback: discard
> NR_UNSTABLE_NFS, use NR_WRITEBACK instead")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

Thanks for the report, but it doesn't quite make sense to me. If we omit the
reporting & NFS changes in that commit (code not exercised by this
benchmark), what remains are changes like:

nr_pages += node_page_state(pgdat, NR_FILE_DIRTY);
- nr_pages += node_page_state(pgdat, NR_UNSTABLE_NFS);
nr_pages += node_page_state(pgdat, NR_WRITEBACK);
...
- nr_reclaimable = global_node_page_state(NR_FILE_DIRTY) +
- global_node_page_state(NR_UNSTABLE_NFS);
+ nr_reclaimable = global_node_page_state(NR_FILE_DIRTY);
...
- gdtc->dirty = global_node_page_state(NR_FILE_DIRTY) +
- global_node_page_state(NR_UNSTABLE_NFS);
+ gdtc->dirty = global_node_page_state(NR_FILE_DIRTY);

So if there's any negative performance impact of these changes, it's
likely due to code alignment changes or something like that. So I don't
think there's much to do here, since optimal code alignment is highly specific
to a particular CPU etc.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-10-15 07:23:03

by NeilBrown

Subject: Re: [mm/writeback] 8d92890bd6: will-it-scale.per_process_ops -15.3% regression

On Wed, Oct 14 2020, Jan Kara wrote:

> On Wed 14-10-20 16:47:06, kernel test robot wrote:
>> Greeting,
>>
>> FYI, we noticed a -15.3% regression of will-it-scale.per_process_ops due
>> to commit:
>>
>> commit: 8d92890bd6b8502d6aee4b37430ae6444ade7a8c ("mm/writeback: discard
>> NR_UNSTABLE_NFS, use NR_WRITEBACK instead")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> Thanks for report but it doesn't quite make sense to me. If we omit
> reporting & NFS changes in that commit (which is code not excercised by
> this benchmark), what remains are changes like:
>
> nr_pages += node_page_state(pgdat, NR_FILE_DIRTY);
> - nr_pages += node_page_state(pgdat, NR_UNSTABLE_NFS);
> nr_pages += node_page_state(pgdat, NR_WRITEBACK);
> ...
> - nr_reclaimable = global_node_page_state(NR_FILE_DIRTY) +
> - global_node_page_state(NR_UNSTABLE_NFS);
> + nr_reclaimable = global_node_page_state(NR_FILE_DIRTY);
> ...
> - gdtc->dirty = global_node_page_state(NR_FILE_DIRTY) +
> - global_node_page_state(NR_UNSTABLE_NFS);
> + gdtc->dirty = global_node_page_state(NR_FILE_DIRTY);
>
> So if there's any negative performance impact of these changes, they're
> likely due to code alignment changes or something like that... So I don't
> think there's much to do here since optimal code alignment is highly specific
> to a particular CPU etc.

I agree, it seems odd.

Removing NR_UNSTABLE_NFS from enum node_stat_item would renumber all the
following values and would, I think, change NR_DIRTIED from 32 to 31.
Might that move something to a different cache line and change some
contention?

That would be easy enough to test: just re-add NR_UNSTABLE_NFS.

I have no experience reading will-it-scale results, but 15% does seem
like a lot.

NeilBrown


Attachments:
signature.asc (869.00 B)

2020-10-15 11:33:12

by Jan Kara

Subject: Re: [mm/writeback] 8d92890bd6: will-it-scale.per_process_ops -15.3% regression

On Thu 15-10-20 08:46:01, NeilBrown wrote:
> On Wed, Oct 14 2020, Jan Kara wrote:
>
> > On Wed 14-10-20 16:47:06, kernel test robot wrote:
> >> Greeting,
> >>
> >> FYI, we noticed a -15.3% regression of will-it-scale.per_process_ops due
> >> to commit:
> >>
> >> commit: 8d92890bd6b8502d6aee4b37430ae6444ade7a8c ("mm/writeback: discard
> >> NR_UNSTABLE_NFS, use NR_WRITEBACK instead")
> >> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > Thanks for report but it doesn't quite make sense to me. If we omit
> > reporting & NFS changes in that commit (which is code not excercised by
> > this benchmark), what remains are changes like:
> >
> > nr_pages += node_page_state(pgdat, NR_FILE_DIRTY);
> > - nr_pages += node_page_state(pgdat, NR_UNSTABLE_NFS);
> > nr_pages += node_page_state(pgdat, NR_WRITEBACK);
> > ...
> > - nr_reclaimable = global_node_page_state(NR_FILE_DIRTY) +
> > - global_node_page_state(NR_UNSTABLE_NFS);
> > + nr_reclaimable = global_node_page_state(NR_FILE_DIRTY);
> > ...
> > - gdtc->dirty = global_node_page_state(NR_FILE_DIRTY) +
> > - global_node_page_state(NR_UNSTABLE_NFS);
> > + gdtc->dirty = global_node_page_state(NR_FILE_DIRTY);
> >
> > So if there's any negative performance impact of these changes, they're
> > likely due to code alignment changes or something like that... So I don't
> > think there's much to do here since optimal code alignment is highly specific
> > to a particular CPU etc.
>
> I agree, it seems odd.
>
> Removing NR_UNSTABLE_NFS from enum node_stat_item would renumber all the
> following value and would, I think, change NR_DIRTIED from 32 to 31.
> Might that move something to a different cache line and change some
> contention?

Interesting theory, it could be possible.

> That would be easy enough to test: just re-add NR_UNSTABLE_NFS.

Yeah, easy enough to test. Patch for this is attached. 0-day people, can
you check whether applying this patch changes anything in your perf
numbers?

> I have no experience reading will-it-scale results, but 15% does seem
> like a lot.

Well, will-it-scale is a micro-benchmark that usually runs extremely
parallel loads, so 15% can be caused by fairly obscure issues such as
different code alignment of a hot loop, slightly different cache-line
sharing, and so on...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-10-15 17:58:26

by Jan Kara

Subject: Re: [mm/writeback] 8d92890bd6: will-it-scale.per_process_ops -15.3% regression

On Thu 15-10-20 11:08:43, Jan Kara wrote:
> On Thu 15-10-20 08:46:01, NeilBrown wrote:
> > On Wed, Oct 14 2020, Jan Kara wrote:
> >
> > > On Wed 14-10-20 16:47:06, kernel test robot wrote:
> > >> Greeting,
> > >>
> > >> FYI, we noticed a -15.3% regression of will-it-scale.per_process_ops due
> > >> to commit:
> > >>
> > >> commit: 8d92890bd6b8502d6aee4b37430ae6444ade7a8c ("mm/writeback: discard
> > >> NR_UNSTABLE_NFS, use NR_WRITEBACK instead")
> > >> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > >
> > > Thanks for report but it doesn't quite make sense to me. If we omit
> > > reporting & NFS changes in that commit (which is code not excercised by
> > > this benchmark), what remains are changes like:
> > >
> > > nr_pages += node_page_state(pgdat, NR_FILE_DIRTY);
> > > - nr_pages += node_page_state(pgdat, NR_UNSTABLE_NFS);
> > > nr_pages += node_page_state(pgdat, NR_WRITEBACK);
> > > ...
> > > - nr_reclaimable = global_node_page_state(NR_FILE_DIRTY) +
> > > - global_node_page_state(NR_UNSTABLE_NFS);
> > > + nr_reclaimable = global_node_page_state(NR_FILE_DIRTY);
> > > ...
> > > - gdtc->dirty = global_node_page_state(NR_FILE_DIRTY) +
> > > - global_node_page_state(NR_UNSTABLE_NFS);
> > > + gdtc->dirty = global_node_page_state(NR_FILE_DIRTY);
> > >
> > > So if there's any negative performance impact of these changes, they're
> > > likely due to code alignment changes or something like that... So I don't
> > > think there's much to do here since optimal code alignment is highly specific
> > > to a particular CPU etc.
> >
> > I agree, it seems odd.
> >
> > Removing NR_UNSTABLE_NFS from enum node_stat_item would renumber all the
> > following value and would, I think, change NR_DIRTIED from 32 to 31.
> > Might that move something to a different cache line and change some
> > contention?
>
> Interesting theory, it could be possible.
>
> > That would be easy enough to test: just re-add NR_UNSTABLE_NFS.
>
> Yeah, easy enough to test. Patch for this is attached. 0-day people, can
> you check whether applying this patch changes anything in your perf
> numbers?

Forgot the patch. Attached now.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR


Attachments:
(No filename) (2.33 kB)
0001-mm-Add-NR_UNSTABLE_NFS-stat-item.patch (747.00 B)

2020-10-16 16:09:42

by Chen, Rong A

Subject: Re: [mm/writeback] 8d92890bd6: will-it-scale.per_process_ops -15.3% regression



On 10/15/2020 5:12 PM, Jan Kara wrote:
> On Thu 15-10-20 11:08:43, Jan Kara wrote:
>> On Thu 15-10-20 08:46:01, NeilBrown wrote:
>>> On Wed, Oct 14 2020, Jan Kara wrote:
>>>
>>>> On Wed 14-10-20 16:47:06, kernel test robot wrote:
>>>>> Greeting,
>>>>>
>>>>> FYI, we noticed a -15.3% regression of will-it-scale.per_process_ops due
>>>>> to commit:
>>>>>
>>>>> commit: 8d92890bd6b8502d6aee4b37430ae6444ade7a8c ("mm/writeback: discard
>>>>> NR_UNSTABLE_NFS, use NR_WRITEBACK instead")
>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>>
>>>> Thanks for report but it doesn't quite make sense to me. If we omit
>>>> reporting & NFS changes in that commit (which is code not excercised by
>>>> this benchmark), what remains are changes like:
>>>>
>>>> nr_pages += node_page_state(pgdat, NR_FILE_DIRTY);
>>>> - nr_pages += node_page_state(pgdat, NR_UNSTABLE_NFS);
>>>> nr_pages += node_page_state(pgdat, NR_WRITEBACK);
>>>> ...
>>>> - nr_reclaimable = global_node_page_state(NR_FILE_DIRTY) +
>>>> - global_node_page_state(NR_UNSTABLE_NFS);
>>>> + nr_reclaimable = global_node_page_state(NR_FILE_DIRTY);
>>>> ...
>>>> - gdtc->dirty = global_node_page_state(NR_FILE_DIRTY) +
>>>> - global_node_page_state(NR_UNSTABLE_NFS);
>>>> + gdtc->dirty = global_node_page_state(NR_FILE_DIRTY);
>>>>
>>>> So if there's any negative performance impact of these changes, they're
>>>> likely due to code alignment changes or something like that... So I don't
>>>> think there's much to do here since optimal code alignment is highly specific
>>>> to a particular CPU etc.
>>>
>>> I agree, it seems odd.
>>>
>>> Removing NR_UNSTABLE_NFS from enum node_stat_item would renumber all the
>>> following value and would, I think, change NR_DIRTIED from 32 to 31.
>>> Might that move something to a different cache line and change some
>>> contention?
>>
>> Interesting theory, it could be possible.
>>
>>> That would be easy enough to test: just re-add NR_UNSTABLE_NFS.
>>
>> Yeah, easy enough to test. Patch for this is attached. 0-day people, can
>> you check whether applying this patch changes anything in your perf
>> numbers?
>
> Forgot the patch. Attached now.
>
> Honza
>

Hi,

We tested the patch and the regression became worse, but as you said the
problem seems odd, so we also tested v5.9, where the regression has
already disappeared.

a37b0715ddf30077 8d92890bd6b8502d6aee4b37430 v5.9
---------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev
\ | \ | \
341015 ± 9% -18.4% 278292 +32.4% 451473 will-it-scale.per_process_ops
65475001 ± 9% -18.4% 53432256 +32.4% 86682938 will-it-scale.workload

Best Regards,
Rong Chen