2023-05-06 06:27:16

by Yujie Liu

Subject: [linus:master] [mm/mmap] 28c5609fb2: aim9.page_test.ops_per_sec -10.8% regression

Hello,

kernel test robot noticed a -10.8% regression of aim9.page_test.ops_per_sec on:

commit: 28c5609fb236807910ca347ad3e26c4567998526 ("mm/mmap: preallocate maple nodes for brk vma expansion")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: aim9
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
parameters:

testtime: 5s
test: all
cpufreq_governor: performance

If you fix the issue, kindly add the following tags:
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-lkp/[email protected]


Details are as below:

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
gcc-11/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/all/aim9/5s

commit:
7be1c1a3c7 ("mm: more vma cache removal")
28c5609fb2 ("mm/mmap: preallocate maple nodes for brk vma expansion")

7be1c1a3c7b13fb2 28c5609fb236807910ca347ad3e
---------------- ---------------------------
%stddev %change %stddev
\ | \
1183149 -24.7% 890448 aim9.brk_test.ops_per_sec
351551 -10.8% 313668 aim9.page_test.ops_per_sec
4084661 -3.8% 3929302 aim9.time.minor_page_faults
13.27 ± 5% -4.7% 12.65 perf-stat.i.MPKI
6.806e+08 +1.5% 6.909e+08 perf-stat.i.branch-instructions
4.19 ± 3% -0.1 4.13 perf-stat.i.branch-miss-rate%
21115962 ± 3% +1.5% 21424859 perf-stat.i.branch-misses
20.91 -0.1 20.80 perf-stat.i.cache-miss-rate%
7002532 -0.5% 6965796 perf-stat.i.cache-misses
38662778 +0.0% 38680090 perf-stat.i.cache-references
2442 +0.2% 2446 perf-stat.i.context-switches
2.48 ± 3% -2.4% 2.42 ± 2% perf-stat.i.cpi
48002 +0.0% 48002 perf-stat.i.cpu-clock
7.156e+09 ± 4% +1.3% 7.245e+09 perf-stat.i.cpu-cycles
56.45 -0.9% 55.94 perf-stat.i.cpu-migrations
1026 ± 5% +1.8% 1044 perf-stat.i.cycles-between-cache-misses
1.21 ± 7% -0.0 1.16 ± 3% perf-stat.i.dTLB-load-miss-rate%
8643441 ± 5% -1.4% 8519450 ± 3% perf-stat.i.dTLB-load-misses
9.078e+08 +0.8% 9.151e+08 perf-stat.i.dTLB-loads
0.24 ± 2% -0.0 0.23 perf-stat.i.dTLB-store-miss-rate%
1542187 ± 2% -1.7% 1515424 perf-stat.i.dTLB-store-misses
7.519e+08 +0.7% 7.574e+08 perf-stat.i.dTLB-stores
86.68 +0.2 86.84 perf-stat.i.iTLB-load-miss-rate%
1787826 ± 9% +3.3% 1846407 ± 3% perf-stat.i.iTLB-load-misses
399756 ± 44% -13.1% 347406 ± 9% perf-stat.i.iTLB-loads
3.89e+09 +1.5% 3.949e+09 perf-stat.i.instructions
2774 ± 17% -5.7% 2616 ± 4% perf-stat.i.instructions-per-iTLB-miss
0.54 ± 4% +0.3% 0.54 ± 2% perf-stat.i.ipc
0.02 ± 52% -29.8% 0.01 ± 91% perf-stat.i.major-faults
0.15 ± 4% +1.3% 0.15 perf-stat.i.metric.GHz
606.42 -0.5% 603.55 perf-stat.i.metric.K/sec
49.00 +1.0% 49.48 perf-stat.i.metric.M/sec
15869 -3.2% 15362 perf-stat.i.minor-faults
46.90 +0.0 46.90 perf-stat.i.node-load-miss-rate%
79868 ± 2% -2.1% 78183 ± 4% perf-stat.i.node-load-misses
107991 ± 2% -2.4% 105360 ± 2% perf-stat.i.node-loads
25.81 ± 22% +1.2 27.03 ± 21% perf-stat.i.node-store-miss-rate%
48534 ± 10% -4.2% 46474 ± 13% perf-stat.i.node-store-misses
174486 ± 3% -0.3% 173968 ± 4% perf-stat.i.node-stores
15869 -3.2% 15362 perf-stat.i.page-faults
48002 +0.0% 48002 perf-stat.i.task-clock
9.93 -1.5% 9.78 perf-stat.overall.MPKI
3.11 ± 3% -0.0 3.11 perf-stat.overall.branch-miss-rate%
18.12 -0.1 18.02 perf-stat.overall.cache-miss-rate%
1.84 ± 4% -0.2% 1.84 ± 2% perf-stat.overall.cpi
1022 ± 4% +1.8% 1040 perf-stat.overall.cycles-between-cache-misses
0.94 ± 5% -0.0 0.92 ± 3% perf-stat.overall.dTLB-load-miss-rate%
0.20 -0.0 0.20 perf-stat.overall.dTLB-store-miss-rate%
82.18 ± 6% +2.0 84.18 perf-stat.overall.iTLB-load-miss-rate%
2201 ± 10% -2.6% 2143 ± 3% perf-stat.overall.instructions-per-iTLB-miss
0.54 ± 4% +0.1% 0.55 ± 2% perf-stat.overall.ipc
42.42 +0.1 42.52 perf-stat.overall.node-load-miss-rate%
21.71 ± 7% -0.7 20.99 ± 8% perf-stat.overall.node-store-miss-rate%
6.788e+08 +1.5% 6.891e+08 perf-stat.ps.branch-instructions
21134055 ± 3% +1.5% 21441971 perf-stat.ps.branch-misses
6978663 -0.5% 6941923 perf-stat.ps.cache-misses
38511530 +0.0% 38527030 perf-stat.ps.cache-references
2431 +0.2% 2434 perf-stat.ps.context-switches
47841 +0.0% 47841 perf-stat.ps.cpu-clock
7.136e+09 ± 4% +1.3% 7.226e+09 perf-stat.ps.cpu-cycles
56.35 -0.9% 55.83 perf-stat.ps.cpu-migrations
8605729 ± 5% -1.4% 8482344 ± 3% perf-stat.ps.dTLB-load-misses
9.048e+08 +0.8% 9.121e+08 perf-stat.ps.dTLB-loads
1535584 ± 2% -1.7% 1508912 perf-stat.ps.dTLB-store-misses
7.488e+08 +0.7% 7.544e+08 perf-stat.ps.dTLB-stores
1780894 ± 9% +3.3% 1839319 ± 3% perf-stat.ps.iTLB-load-misses
397847 ± 44% -13.1% 345833 ± 9% perf-stat.ps.iTLB-loads
3.879e+09 +1.5% 3.938e+09 perf-stat.ps.instructions
0.01 ± 52% -29.6% 0.01 ± 91% perf-stat.ps.major-faults
15785 -3.2% 15281 perf-stat.ps.minor-faults
79639 ± 2% -2.1% 77969 ± 4% perf-stat.ps.node-load-misses
108122 ± 2% -2.6% 105299 ± 2% perf-stat.ps.node-loads
48288 ± 10% -4.2% 46257 ± 13% perf-stat.ps.node-store-misses
173652 ± 3% -0.3% 173206 ± 4% perf-stat.ps.node-stores
15785 -3.2% 15281 perf-stat.ps.page-faults
47841 +0.0% 47841 perf-stat.ps.task-clock
1.168e+12 +1.4% 1.185e+12 perf-stat.total.instructions
5.54 ± 24% -1.3 4.25 ± 27% perf-profile.calltrace.cycles-pp.creat64
3.12 ±136% -1.2 1.92 ±106% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.rest_init
3.20 ±132% -1.2 2.00 ±100% perf-profile.calltrace.cycles-pp.cpu_startup_entry.rest_init.arch_call_rest_init.start_kernel.secondary_startup_64_no_verify
3.20 ±132% -1.2 2.00 ±100% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.rest_init.arch_call_rest_init.start_kernel
3.20 ±132% -1.2 2.00 ±100% perf-profile.calltrace.cycles-pp.start_kernel.secondary_startup_64_no_verify
3.20 ±132% -1.2 2.00 ±100% perf-profile.calltrace.cycles-pp.arch_call_rest_init.start_kernel.secondary_startup_64_no_verify
3.20 ±132% -1.2 2.00 ±100% perf-profile.calltrace.cycles-pp.rest_init.arch_call_rest_init.start_kernel.secondary_startup_64_no_verify
5.10 ± 24% -1.2 3.90 ± 28% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.creat64
5.13 ± 24% -1.2 3.93 ± 28% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.creat64
3.19 ±132% -1.2 1.99 ±100% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.rest_init.arch_call_rest_init
4.72 ± 25% -1.1 3.60 ± 28% perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_creat.do_syscall_64.entry_SYSCALL_64_after_hwframe.creat64
4.74 ± 25% -1.1 3.62 ± 28% perf-profile.calltrace.cycles-pp.__x64_sys_creat.do_syscall_64.entry_SYSCALL_64_after_hwframe.creat64
4.20 ± 24% -1.0 3.18 ± 28% perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_creat.do_syscall_64.entry_SYSCALL_64_after_hwframe
4.16 ± 24% -1.0 3.15 ± 28% perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_creat.do_syscall_64
3.98 ± 22% -0.9 3.07 ± 30% perf-profile.calltrace.cycles-pp.unlink
3.61 ± 22% -0.9 2.74 ± 30% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.unlink
3.59 ± 22% -0.9 2.73 ± 30% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
3.24 ± 22% -0.8 2.47 ± 31% perf-profile.calltrace.cycles-pp.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
2.92 ± 23% -0.7 2.25 ± 32% perf-profile.calltrace.cycles-pp.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlink
2.55 ± 24% -0.6 1.93 ± 28% perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_creat
6.62 ± 19% -0.6 6.05 ± 31% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
8.17 ± 20% -0.6 7.61 ± 32% perf-profile.calltrace.cycles-pp.write
2.28 ± 23% -0.6 1.72 ± 27% perf-profile.calltrace.cycles-pp.lookup_open.open_last_lookups.path_openat.do_filp_open.do_sys_openat2
6.55 ± 19% -0.6 6.00 ± 31% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
4.92 ± 18% -0.5 4.39 ± 30% perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.50 ±223% -0.5 0.00 perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
2.19 ± 24% -0.5 1.68 ± 33% perf-profile.calltrace.cycles-pp.__libc_start_main
2.19 ± 24% -0.5 1.68 ± 33% perf-profile.calltrace.cycles-pp.main.__libc_start_main
2.19 ± 24% -0.5 1.68 ± 33% perf-profile.calltrace.cycles-pp.run_builtin.main.__libc_start_main
2.09 ± 25% -0.5 1.61 ± 34% perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.cmd_sched.run_builtin.main
5.14 ± 17% -0.5 4.66 ± 31% perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
2.10 ± 25% -0.5 1.62 ± 34% perf-profile.calltrace.cycles-pp.cmd_record.cmd_sched.run_builtin.main.__libc_start_main
2.10 ± 25% -0.5 1.62 ± 34% perf-profile.calltrace.cycles-pp.cmd_sched.run_builtin.main.__libc_start_main
1.97 ± 22% -0.5 1.51 ± 31% perf-profile.calltrace.cycles-pp.vfs_fstatat.__do_sys_newstat.do_syscall_64.entry_SYSCALL_64_after_hwframe.__xstat64
1.82 ± 26% -0.4 1.37 ± 34% perf-profile.calltrace.cycles-pp.record__finish_output.__cmd_record.cmd_record.cmd_sched.run_builtin
1.82 ± 26% -0.4 1.37 ± 34% perf-profile.calltrace.cycles-pp.perf_session__process_events.record__finish_output.__cmd_record.cmd_record.cmd_sched
1.72 ± 26% -0.4 1.31 ± 33% perf-profile.calltrace.cycles-pp.reader__read_event.perf_session__process_events.record__finish_output.__cmd_record.cmd_record
2.73 ± 23% -0.4 2.34 ± 28% perf-profile.calltrace.cycles-pp.__xstat64
0.92 ± 22% -0.4 0.53 ± 77% perf-profile.calltrace.cycles-pp.shmem_get_inode.shmem_mknod.lookup_open.open_last_lookups.path_openat
0.60 ± 49% -0.4 0.24 ±144% perf-profile.calltrace.cycles-pp.new_inode.shmem_get_inode.shmem_mknod.lookup_open.open_last_lookups
0.48 ± 70% -0.4 0.13 ±223% perf-profile.calltrace.cycles-pp.thread__new.____machine__findnew_thread.machine__process_fork_event.perf_session__deliver_event.__ordered_events__flush
1.53 ± 26% -0.3 1.18 ± 37% perf-profile.calltrace.cycles-pp.__ordered_events__flush.perf_session__process_user_event.reader__read_event.perf_session__process_events.record__finish_output
0.47 ± 70% -0.3 0.13 ±223% perf-profile.calltrace.cycles-pp.nsinfo__new.thread__new.____machine__findnew_thread.machine__process_fork_event.perf_session__deliver_event
1.26 ± 24% -0.3 0.92 ± 31% perf-profile.calltrace.cycles-pp.shmem_mknod.lookup_open.open_last_lookups.path_openat.do_filp_open
1.53 ± 26% -0.3 1.19 ± 37% perf-profile.calltrace.cycles-pp.perf_session__process_user_event.reader__read_event.perf_session__process_events.record__finish_output.__cmd_record
1.52 ± 26% -0.3 1.17 ± 37% perf-profile.calltrace.cycles-pp.perf_session__deliver_event.__ordered_events__flush.perf_session__process_user_event.reader__read_event.perf_session__process_events
0.46 ± 70% -0.3 0.12 ±223% perf-profile.calltrace.cycles-pp.__xstat64.nsinfo__new.thread__new.____machine__findnew_thread.machine__process_fork_event
1.97 ± 23% -0.3 1.64 ± 27% perf-profile.calltrace.cycles-pp.__close
0.55 ± 71% -0.3 0.22 ±145% perf-profile.calltrace.cycles-pp.____machine__findnew_thread.machine__process_fork_event.perf_session__deliver_event.__ordered_events__flush.perf_session__process_user_event
0.43 ± 70% -0.3 0.12 ±223% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__xstat64.nsinfo__new.thread__new.____machine__findnew_thread
0.43 ± 70% -0.3 0.12 ±223% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__xstat64.nsinfo__new.thread__new
0.43 ± 71% -0.3 0.12 ±223% perf-profile.calltrace.cycles-pp.vfs_unlink.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe
4.08 ± 18% -0.3 3.78 ± 32% perf-profile.calltrace.cycles-pp.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.42 ± 71% -0.3 0.13 ±223% perf-profile.calltrace.cycles-pp.link_path_walk.path_lookupat.filename_lookup.vfs_statx.vfs_fstatat
0.39 ± 72% -0.3 0.10 ±223% perf-profile.calltrace.cycles-pp.link_path_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_creat
0.41 ± 72% -0.3 0.12 ±223% perf-profile.calltrace.cycles-pp.evict.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.38 ± 70% -0.3 0.11 ±223% perf-profile.calltrace.cycles-pp.__do_sys_newstat.do_syscall_64.entry_SYSCALL_64_after_hwframe.__xstat64.nsinfo__new
2.21 ± 22% -0.3 1.96 ± 27% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__xstat64
2.19 ± 22% -0.2 1.94 ± 27% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__xstat64
1.38 ± 17% -0.2 1.14 ± 32% perf-profile.calltrace.cycles-pp.vfs_statx.vfs_fstatat.__do_sys_newstat.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.80 ± 49% -0.2 0.56 ± 57% perf-profile.calltrace.cycles-pp.machine__process_fork_event.perf_session__deliver_event.__ordered_events__flush.perf_session__process_user_event.reader__read_event
0.56 ± 72% -0.2 0.32 ±104% perf-profile.calltrace.cycles-pp.filename_parentat.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.55 ± 23% -0.2 1.32 ± 27% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__close
0.43 ± 73% -0.2 0.20 ±144% perf-profile.calltrace.cycles-pp.alloc_inode.new_inode.shmem_get_inode.shmem_mknod.lookup_open
1.52 ± 24% -0.2 1.30 ± 27% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
1.70 ± 23% -0.2 1.48 ± 15% perf-profile.calltrace.cycles-pp.div_double
3.59 ± 20% -0.2 3.38 ± 32% perf-profile.calltrace.cycles-pp.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write.do_syscall_64
0.51 ± 72% -0.2 0.30 ±103% perf-profile.calltrace.cycles-pp.path_parentat.filename_parentat.do_unlinkat.__x64_sys_unlink.do_syscall_64
0.30 ±100% -0.2 0.10 ±223% perf-profile.calltrace.cycles-pp.__alloc_file.alloc_empty_file.path_openat.do_filp_open.do_sys_openat2
0.42 ± 73% -0.2 0.22 ±146% perf-profile.calltrace.cycles-pp.__filemap_get_folio.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.__generic_file_write_iter
0.31 ±100% -0.2 0.11 ±223% perf-profile.calltrace.cycles-pp.alloc_empty_file.path_openat.do_filp_open.do_sys_openat2.__x64_sys_creat
0.30 ±101% -0.2 0.11 ±223% perf-profile.calltrace.cycles-pp.d_alloc_parallel.lookup_open.open_last_lookups.path_openat.do_filp_open
1.75 ± 23% -0.2 1.56 ± 28% perf-profile.calltrace.cycles-pp.__do_sys_newstat.do_syscall_64.entry_SYSCALL_64_after_hwframe.__xstat64
0.18 ±141% -0.2 0.00 perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.28 ±100% -0.2 0.10 ±223% perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_creat
1.32 ± 26% -0.2 1.14 ± 28% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
0.18 ±141% -0.2 0.00 perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_write.ksys_write.do_syscall_64
1.89 ± 20% -0.2 1.71 ± 34% perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
2.02 ± 20% -0.2 1.86 ± 34% perf-profile.calltrace.cycles-pp.shmem_write_begin.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.vfs_write
3.34 ± 21% -0.2 3.18 ± 31% perf-profile.calltrace.cycles-pp.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.vfs_write.ksys_write
0.47 ± 74% -0.1 0.32 ±104% perf-profile.calltrace.cycles-pp.__filemap_get_folio.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.ksys_read
0.14 ±223% -0.1 0.00 perf-profile.calltrace.cycles-pp.ktime_get_update_offsets_now.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
5.93 ± 24% -0.1 5.79 ± 31% perf-profile.calltrace.cycles-pp.read
0.84 ± 22% -0.1 0.72 ± 30% perf-profile.calltrace.cycles-pp.path_lookupat.filename_lookup.vfs_statx.vfs_fstatat.__do_sys_newstat
0.35 ±105% -0.1 0.23 ±143% perf-profile.calltrace.cycles-pp.dput.__fput.task_work_run.exit_to_user_mode_loop.exit_to_user_mode_prepare
0.87 ± 21% -0.1 0.75 ± 30% perf-profile.calltrace.cycles-pp.filename_lookup.vfs_statx.vfs_fstatat.__do_sys_newstat.do_syscall_64
0.20 ±141% -0.1 0.08 ±223% perf-profile.calltrace.cycles-pp.link_path_walk.path_parentat.filename_parentat.do_unlinkat.__x64_sys_unlink
0.33 ±105% -0.1 0.22 ±143% perf-profile.calltrace.cycles-pp.dentry_kill.dput.__fput.task_work_run.exit_to_user_mode_loop
0.33 ±105% -0.1 0.22 ±143% perf-profile.calltrace.cycles-pp.__dentry_kill.dentry_kill.dput.__fput.task_work_run
1.02 ± 27% -0.1 0.90 ± 31% perf-profile.calltrace.cycles-pp.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.33 ±105% -0.1 0.22 ±143% perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dentry_kill.dput.__fput
0.33 ±105% -0.1 0.22 ±143% perf-profile.calltrace.cycles-pp.shmem_evict_inode.evict.__dentry_kill.dentry_kill.dput
0.33 ±105% -0.1 0.22 ±143% perf-profile.calltrace.cycles-pp.shmem_truncate_range.shmem_evict_inode.evict.__dentry_kill.dentry_kill
1.02 ± 27% -0.1 0.91 ± 31% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
0.32 ±105% -0.1 0.22 ±143% perf-profile.calltrace.cycles-pp.shmem_undo_range.shmem_truncate_range.shmem_evict_inode.evict.__dentry_kill
0.11 ±223% -0.1 0.00 perf-profile.calltrace.cycles-pp.kmem_cache_alloc_lru.shmem_alloc_inode.alloc_inode.new_inode.shmem_get_inode
0.28 ±100% -0.1 0.18 ±141% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.21 ± 20% -0.1 1.10 ± 33% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
0.98 ± 27% -0.1 0.88 ± 31% perf-profile.calltrace.cycles-pp.task_work_run.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
0.87 ± 28% -0.1 0.77 ± 31% perf-profile.calltrace.cycles-pp.__fput.task_work_run.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
0.47 ± 72% -0.1 0.38 ±103% perf-profile.calltrace.cycles-pp.shmem_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.vfs_write
0.19 ±141% -0.1 0.09 ±223% perf-profile.calltrace.cycles-pp.apparmor_file_permission.security_file_permission.vfs_read.ksys_read.do_syscall_64
0.30 ±100% -0.1 0.22 ±141% perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
0.17 ±141% -0.1 0.08 ±223% perf-profile.calltrace.cycles-pp.dput.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.08 ±223% -0.1 0.00 perf-profile.calltrace.cycles-pp.cpuidle_governor_latency_req.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry
0.18 ±141% -0.1 0.10 ±223% perf-profile.calltrace.cycles-pp.d_alloc.d_alloc_parallel.lookup_open.open_last_lookups.path_openat
1.20 ± 23% -0.1 1.12 ± 31% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.17 ±141% -0.1 0.10 ±223% perf-profile.calltrace.cycles-pp.shmem_evict_inode.evict.do_unlinkat.__x64_sys_unlink.do_syscall_64
0.19 ±141% -0.1 0.12 ±223% perf-profile.calltrace.cycles-pp.thread__find_map.build_id__mark_dso_hit.perf_session__deliver_event.__ordered_events__flush.perf_session__process_user_event
0.20 ±141% -0.1 0.13 ±223% perf-profile.calltrace.cycles-pp.build_id__mark_dso_hit.perf_session__deliver_event.__ordered_events__flush.perf_session__process_user_event.reader__read_event
0.19 ±141% -0.1 0.12 ±223% perf-profile.calltrace.cycles-pp.map__load.thread__find_map.build_id__mark_dso_hit.perf_session__deliver_event.__ordered_events__flush
0.19 ±141% -0.1 0.12 ±223% perf-profile.calltrace.cycles-pp.dso__load.map__load.thread__find_map.build_id__mark_dso_hit.perf_session__deliver_event
0.19 ±141% -0.1 0.12 ±223% perf-profile.calltrace.cycles-pp.__dso__load_kallsyms.dso__load.map__load.thread__find_map.build_id__mark_dso_hit
0.44 ± 72% -0.0 0.39 ±101% perf-profile.calltrace.cycles-pp.__entry_text_start.read
4.36 ± 23% -0.0 4.31 ± 30% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
4.30 ± 23% -0.0 4.26 ± 29% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
0.50 ± 74% -0.0 0.47 ± 78% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.write
0.11 ±223% -0.0 0.08 ±223% perf-profile.calltrace.cycles-pp.shmem_alloc_inode.alloc_inode.new_inode.shmem_get_inode.shmem_mknod
0.48 ± 44% -0.0 0.46 ± 45% perf-profile.calltrace.cycles-pp.rebalance_domains.__softirqentry_text_start.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.28 ±100% -0.0 0.28 ±141% perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.copy_page_to_iter.shmem_file_read_iter.vfs_read
0.46 ± 70% -0.0 0.46 ± 75% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.read
0.28 ±100% -0.0 0.28 ±141% perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout._copy_to_iter.copy_page_to_iter.shmem_file_read_iter
0.09 ±223% -0.0 0.08 ±223% perf-profile.calltrace.cycles-pp.run_rebalance_domains.__softirqentry_text_start.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.18 ±141% +0.0 0.18 ±141% perf-profile.calltrace.cycles-pp.shmem_alloc_and_acct_folio.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.__generic_file_write_iter
1.28 ± 7% +0.0 1.31 ± 15% perf-profile.calltrace.cycles-pp.__softirqentry_text_start.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
0.44 ± 71% +0.0 0.46 ± 77% perf-profile.calltrace.cycles-pp.__entry_text_start.write
1.29 ± 10% +0.0 1.33 ± 14% perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues
0.54 ± 73% +0.0 0.58 ± 73% perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_file_read_iter.vfs_read.ksys_read.do_syscall_64
0.53 ± 45% +0.0 0.57 ± 47% perf-profile.calltrace.cycles-pp.irq_enter_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
0.51 ± 45% +0.0 0.56 ± 47% perf-profile.calltrace.cycles-pp.tick_irq_enter.irq_enter_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
1.56 ± 5% +0.1 1.62 ± 16% perf-profile.calltrace.cycles-pp.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
0.70 ± 17% +0.1 0.75 ± 13% perf-profile.calltrace.cycles-pp.tick_nohz_next_event.tick_nohz_get_sleep_length.menu_select.cpuidle_idle_call.do_idle
0.51 ± 46% +0.1 0.58 ± 46% perf-profile.calltrace.cycles-pp.clockevents_program_event.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
2.58 ± 27% +0.1 2.67 ± 28% perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
0.00 +0.1 0.08 ±223% perf-profile.calltrace.cycles-pp.shmem_undo_range.shmem_truncate_range.shmem_evict_inode.evict.do_unlinkat
0.00 +0.1 0.08 ±223% perf-profile.calltrace.cycles-pp.io_serial_in.wait_for_lsr.serial8250_console_write.console_emit_next_record.console_unlock
0.00 +0.1 0.08 ±223% perf-profile.calltrace.cycles-pp.delay_tsc.wait_for_lsr.serial8250_console_write.console_emit_next_record.console_unlock
0.00 +0.1 0.08 ±223% perf-profile.calltrace.cycles-pp.update_rq_clock.scheduler_tick.update_process_times.tick_sched_handle.tick_sched_timer
0.00 +0.1 0.08 ±223% perf-profile.calltrace.cycles-pp.__d_alloc.d_alloc.d_alloc_parallel.lookup_open.open_last_lookups
1.15 ± 16% +0.1 1.24 ± 12% perf-profile.calltrace.cycles-pp.tick_nohz_get_sleep_length.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry
0.00 +0.1 0.09 ±223% perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork
0.00 +0.1 0.09 ±223% perf-profile.calltrace.cycles-pp.shmem_truncate_range.shmem_evict_inode.evict.do_unlinkat.__x64_sys_unlink
2.52 ± 10% +0.1 2.62 ± 16% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
2.36 ± 10% +0.1 2.47 ± 16% perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt
0.00 +0.1 0.11 ±223% perf-profile.calltrace.cycles-pp.kallsyms__parse.__dso__load_kallsyms.dso__load.map__load.thread__find_map
2.81 ± 26% +0.1 2.93 ± 28% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
4.54 ± 8% +0.1 4.66 ± 15% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
1.74 ± 25% +0.1 1.87 ± 29% perf-profile.calltrace.cycles-pp.shmem_file_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.89 ± 8% +0.2 3.04 ± 16% perf-profile.calltrace.cycles-pp.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
0.00 +0.2 0.17 ±141% perf-profile.calltrace.cycles-pp.shmem_add_to_page_cache.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.__generic_file_write_iter
0.09 ±223% +0.2 0.27 ±100% perf-profile.calltrace.cycles-pp.rcu_sched_clock_irq.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues
0.00 +0.2 0.19 ±141% perf-profile.calltrace.cycles-pp.ret_from_fork
0.00 +0.2 0.19 ±141% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
0.46 ± 72% +0.2 0.67 ± 54% perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.shmem_file_read_iter.vfs_read.ksys_read
6.06 ± 9% +0.3 6.33 ± 15% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
0.00 +0.3 0.27 ±100% perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.vfs_write
2.85 ± 17% +0.3 3.14 ± 14% perf-profile.calltrace.cycles-pp.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
0.44 ± 74% +0.3 0.73 ± 17% perf-profile.calltrace.cycles-pp.wait_for_lsr.serial8250_console_write.console_emit_next_record.console_unlock.vprintk_emit
0.46 ± 74% +0.3 0.76 ± 17% perf-profile.calltrace.cycles-pp.asm_sysvec_irq_work.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.46 ± 74% +0.3 0.76 ± 17% perf-profile.calltrace.cycles-pp.sysvec_irq_work.asm_sysvec_irq_work.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
0.46 ± 74% +0.3 0.76 ± 17% perf-profile.calltrace.cycles-pp.__sysvec_irq_work.sysvec_irq_work.asm_sysvec_irq_work.cpuidle_enter_state.cpuidle_enter
0.46 ± 74% +0.3 0.76 ± 17% perf-profile.calltrace.cycles-pp.irq_work_run_list.irq_work_run.__sysvec_irq_work.sysvec_irq_work.asm_sysvec_irq_work
0.46 ± 74% +0.3 0.76 ± 17% perf-profile.calltrace.cycles-pp.irq_work_run.__sysvec_irq_work.sysvec_irq_work.asm_sysvec_irq_work.cpuidle_enter_state
0.46 ± 74% +0.3 0.76 ± 17% perf-profile.calltrace.cycles-pp.irq_work_single.irq_work_run_list.irq_work_run.__sysvec_irq_work.sysvec_irq_work
0.46 ± 74% +0.3 0.76 ± 17% perf-profile.calltrace.cycles-pp._printk.irq_work_single.irq_work_run_list.irq_work_run.__sysvec_irq_work
0.46 ± 74% +0.3 0.76 ± 17% perf-profile.calltrace.cycles-pp.vprintk_emit._printk.irq_work_single.irq_work_run_list.irq_work_run
0.46 ± 74% +0.3 0.76 ± 17% perf-profile.calltrace.cycles-pp.console_unlock.vprintk_emit._printk.irq_work_single.irq_work_run_list
0.46 ± 74% +0.3 0.76 ± 17% perf-profile.calltrace.cycles-pp.console_emit_next_record.console_unlock.vprintk_emit._printk.irq_work_single
0.46 ± 74% +0.3 0.76 ± 17% perf-profile.calltrace.cycles-pp.serial8250_console_write.console_emit_next_record.console_unlock.vprintk_emit._printk
0.50 ± 71% +0.3 0.81 ± 33% perf-profile.calltrace.cycles-pp.copy_page_to_iter.shmem_file_read_iter.vfs_read.ksys_read.do_syscall_64
6.31 ± 10% +0.3 6.63 ± 15% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
9.12 ± 9% +0.5 9.63 ± 15% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
10.28 ± 12% +0.6 10.89 ± 17% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
47.44 ± 14% +3.6 51.02 ± 15% perf-profile.calltrace.cycles-pp.mwait_idle_with_hints.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
47.55 ± 14% +3.6 51.14 ± 15% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
59.90 ± 10% +3.9 63.83 ± 11% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
64.77 ± 10% +4.4 69.20 ± 10% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
57.50 ± 15% +5.1 62.60 ± 14% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
61.07 ± 14% +5.6 66.64 ± 13% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
61.57 ± 14% +5.6 67.20 ± 13% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
61.57 ± 14% +5.6 67.20 ± 13% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
61.51 ± 14% +5.6 67.15 ± 13% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# If you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests



Attachments:
config-6.0.0-rc3-00587-g28c5609fb236 (156.03 kB)
job-script (7.87 kB)
job.yaml (5.11 kB)
reproduce (263.00 B)

2023-05-09 07:03:38

by Yin, Fengwei

Subject: Re: [linus:master] [mm/mmap] 28c5609fb2: aim9.page_test.ops_per_sec -10.8% regression

Hi Liam,

On 5/6/23 14:20, kernel test robot wrote:
> Hello,
>
> kernel test robot noticed a -10.8% regression of aim9.page_test.ops_per_sec on:
>
> commit: 28c5609fb236807910ca347ad3e26c4567998526 ("mm/mmap: preallocate maple nodes for brk vma expansion")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> testcase: aim9
> test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
> parameters:
>
> testtime: 5s
> test: all
> cpufreq_governor: performance
>
> If you fix the issue, kindly add the following tags:
> | Reported-by: kernel test robot <[email protected]>
> | Link: https://lore.kernel.org/oe-lkp/[email protected]
>

Some related findings:
The eBPF funclatency tool shows that the latency of do_brk_flags() doubles
with patch 28c5609fb2.

With patch 28c5609fb2, mas_alloc_nodes() is called much more often than
without the patch.

In my local debugging environment, I see around 17009999 calls to
mas_alloc_nodes(); without patch 28c5609fb2 the count is zero. So we are
fairly sure the regression is connected to this patch.
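
For anyone who wants to repeat the measurement, a sketch using the bcc
tools (the /usr/share/bcc/tools install path varies by distro):

    # latency distribution of do_brk_flags(), sampled for 10 seconds
    sudo /usr/share/bcc/tools/funclatency -d 10 do_brk_flags

    # count calls to mas_alloc_nodes() over the same window
    sudo /usr/share/bcc/tools/funccount -d 10 mas_alloc_nodes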


The page_test of AIM9 does the following work in a single thread:

    newbrk = sbrk(1024 * 1024);         /* move up 1 megabyte */
    while (true) {                      /* while not done */
        newbrk = sbrk(-4096 * 16);      /* deallocate some space */
        for (i = 0; i < 16; i++) {      /* now get it back in pieces */
            newbrk = sbrk(4096);        /* Get pointer to new space */
        }
    }
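
A self-contained approximation of that loop, for anyone who wants to poke
at the pattern outside AIM9 (my sketch, not the actual AIM9 source; the
iteration count is arbitrary and return values are unchecked, as in the
original):

    #define _DEFAULT_SOURCE                    /* for sbrk() on glibc */
    #include <unistd.h>

    int main(void)
    {
        void *newbrk;
        int i, iter;

        newbrk = sbrk(1024 * 1024);            /* move up 1 megabyte */
        for (iter = 0; iter < 1000000; iter++) {
            newbrk = sbrk(-4096 * 16);         /* deallocate some space */
            for (i = 0; i < 16; i++)           /* get it back in pieces */
                newbrk = sbrk(4096);
        }
        (void)newbrk;                          /* silence unused warning */
        return 0;
    }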

Is it possible that this sbrk pattern triggers a corner case? Thanks.


Regards
Yin, Fengwei

2023-05-09 22:37:27

by Liam R. Howlett

Subject: Re: [linus:master] [mm/mmap] 28c5609fb2: aim9.page_test.ops_per_sec -10.8% regression

* Yin Fengwei <[email protected]> [230509 02:56]:
> Hi Liam,
>
> On 5/6/23 14:20, kernel test robot wrote:
> > Hello,
> >
> > kernel test robot noticed a -10.8% regression of aim9.page_test.ops_per_sec on:
> >
> > commit: 28c5609fb236807910ca347ad3e26c4567998526 ("mm/mmap: preallocate maple nodes for brk vma expansion")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > testcase: aim9
> > test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
> > parameters:
> >
> > testtime: 5s
> > test: all
> > cpufreq_governor: performance
> >
> > If you fix the issue, kindly add the following tags:
> > | Reported-by: kernel test robot <[email protected]>
> > | Link: https://lore.kernel.org/oe-lkp/[email protected]
> >
>
> Some related findings:
> The eBPF funclatency tool shows that the latency of do_brk_flags() doubles
> with patch 28c5609fb2.
>
> With patch 28c5609fb2, mas_alloc_nodes() is called much more often than
> without the patch.

Thank you for the insight into this test.

Right, so this patch adds a call to preallocate nodes for the worst case
possible. That certainly explains why you see so many more calls to
allocate nodes - it was meant to do just that.
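
The change boils down to reserving nodes up front, before the tree store
that expands the brk vma, along these lines (a simplified sketch of the
idea, not the exact hunk; the mas_preallocate() arguments and the error
label here are illustrative):

    /* reserve maple nodes for the worst-case store before expanding */
    if (mas_preallocate(mas, vma, GFP_KERNEL))
        goto unacct_fail;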

>
> In my local debugging environment, I see around 17009999 calls to
> mas_alloc_nodes(); without patch 28c5609fb2 the count is zero. So we are
> fairly sure the regression is connected to this patch.
>
>
> The page_test of AIM9 does the following work in a single thread:
>
>     newbrk = sbrk(1024 * 1024);         /* move up 1 megabyte */
>     while (true) {                      /* while not done */
>         newbrk = sbrk(-4096 * 16);      /* deallocate some space */
>         for (i = 0; i < 16; i++) {      /* now get it back in pieces */
>             newbrk = sbrk(4096);        /* Get pointer to new space */
>         }
>     }
>
> Is it possible that this sbrk pattern triggers a corner case? Thanks.

I appreciate the analysis and the pointer to the allocation code. This
has shown up elsewhere as well, and I'm working on reducing the
preallocations. The regression seems to be hidden, at least some of the
time, by the kmem_cache.

Regards,
Liam