2022-03-02 18:35:59

by kernel test robot

Subject: [sched/numa] 0fb3978b0a: stress-ng.fstat.ops_per_sec -18.9% regression



Greetings,

FYI, we noticed a -18.9% regression of stress-ng.fstat.ops_per_sec due to commit:


commit: 0fb3978b0aac3a5c08637aed03cc2d65f793508f ("sched/numa: Fix NUMA topology for systems with CPU-less nodes")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core

in testcase: stress-ng
on test machine: 96 threads 2 sockets Ice Lake with 256G memory
with following parameters:

nr_threads: 10%
disk: 1HDD
testtime: 60s
fs: xfs
class: filesystem
test: fstat
cpufreq_governor: performance
ucode: 0xb000280




If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# If you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.

=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
filesystem/gcc-9/performance/1HDD/xfs/x86_64-rhel-8.3/10%/debian-10.4-x86_64-20200603.cgz/lkp-icl-2sp1/fstat/stress-ng/60s/0xb000280

commit:
1087ad4e3f ("sched: replace cpumask_weight with cpumask_empty where appropriate")
0fb3978b0a ("sched/numa: Fix NUMA topology for systems with CPU-less nodes")

1087ad4e3f88c474 0fb3978b0aac3a5c08637aed03c
---------------- ---------------------------
%stddev %change %stddev
\ | \
1868219 -18.9% 1516032 stress-ng.fstat.ops
31136 -18.9% 25267 stress-ng.fstat.ops_per_sec
971742 ? 4% -38.4% 598917 ? 10% stress-ng.time.involuntary_context_switches
2866 +6.4% 3049 stress-ng.time.percent_of_cpu_this_job_got
1614 +9.0% 1759 stress-ng.time.system_time
165.36 -18.8% 134.28 stress-ng.time.user_time
3570131 -21.3% 2809861 stress-ng.time.voluntary_context_switches
19171169 -11.8% 16906168 cpuidle..usage
2.88 -0.5 2.34 mpstat.cpu.all.usr%
102649 ? 6% -21.2% 80900 ? 7% turbostat.C1
566173 -23.2% 434981 vmstat.system.cs
68.84 -1.7% 67.66 iostat.cpu.idle
28.35 +6.0% 30.04 iostat.cpu.system
2.81 -18.5% 2.29 iostat.cpu.user
38537 -9.4% 34904 ? 3% meminfo.Active
38351 -9.5% 34721 ? 3% meminfo.Active(anon)
22760154 -36.6% 14437034 ? 13% meminfo.DirectMap2M
19392447 ? 3% -32.8% 13040417 ? 2% numa-numastat.node0.local_node
19449827 ? 3% -32.8% 13077610 ? 2% numa-numastat.node0.numa_hit
16791212 ? 4% -14.4% 14371345 ? 3% numa-numastat.node1.local_node
16817525 ? 4% -14.3% 14416277 ? 3% numa-numastat.node1.numa_hit
14573 ? 2% -11.2% 12940 ? 2% numa-meminfo.node0.KernelStack
182145 ? 2% -12.6% 159104 ? 2% numa-meminfo.node0.SUnreclaim
36529 -10.7% 32607 ? 4% numa-meminfo.node1.Active
36529 -10.9% 32535 ? 4% numa-meminfo.node1.Active(anon)
1293 ? 13% +63.1% 2109 ? 27% numa-meminfo.node1.PageTables
129591 ? 2% +10.8% 143645 ? 4% numa-meminfo.node1.SUnreclaim
14459 -10.9% 12882 ? 2% numa-vmstat.node0.nr_kernel_stack
10784778 -26.7% 7905245 ? 2% numa-vmstat.node0.numa_hit
10726945 -26.7% 7866232 numa-vmstat.node0.numa_local
9155 -10.9% 8154 ? 4% numa-vmstat.node1.nr_active_anon
321.67 ? 13% +64.3% 528.50 ? 26% numa-vmstat.node1.nr_page_table_pages
31618 ? 2% +14.7% 36272 ? 5% numa-vmstat.node1.nr_slab_unreclaimable
9155 -10.9% 8154 ? 4% numa-vmstat.node1.nr_zone_active_anon
8676741 -16.2% 7274506 ? 2% numa-vmstat.node1.numa_hit
8605314 -16.5% 7185337 numa-vmstat.node1.numa_local
9599 -9.4% 8694 ? 3% proc-vmstat.nr_active_anon
111850 -1.9% 109751 proc-vmstat.nr_inactive_anon
42298 -6.3% 39630 proc-vmstat.nr_shmem
9599 -9.4% 8694 ? 3% proc-vmstat.nr_zone_active_anon
111850 -1.9% 109751 proc-vmstat.nr_zone_inactive_anon
36270505 -24.2% 27496053 proc-vmstat.numa_hit
36186813 -24.2% 27413929 proc-vmstat.numa_local
1657 ? 4% -9.1% 1507 ? 3% proc-vmstat.pgactivate
36206536 -24.1% 27482249 proc-vmstat.pgalloc_normal
35997443 -24.2% 27277470 proc-vmstat.pgfree
1.414e+10 -21.2% 1.114e+10 perf-stat.i.branch-instructions
0.86 ? 5% +0.1 0.98 ? 3% perf-stat.i.branch-miss-rate%
1.147e+08 ? 2% -7.4% 1.062e+08 perf-stat.i.branch-misses
17.27 ? 2% +10.8 28.11 ? 3% perf-stat.i.cache-miss-rate%
94097732 +44.6% 1.36e+08 perf-stat.i.cache-misses
5.473e+08 ? 2% -11.1% 4.866e+08 ? 2% perf-stat.i.cache-references
586851 -22.7% 453723 perf-stat.i.context-switches
1.37 +31.6% 1.81 perf-stat.i.cpi
1.018e+11 +5.1% 1.07e+11 perf-stat.i.cpu-cycles
5781 ? 5% +101.4% 11642 ? 5% perf-stat.i.cpu-migrations
1102 -27.0% 804.31 ? 2% perf-stat.i.cycles-between-cache-misses
2.118e+10 -21.3% 1.667e+10 perf-stat.i.dTLB-loads
1568794 ? 5% -19.2% 1268168 ? 5% perf-stat.i.dTLB-store-misses
1.254e+10 -21.7% 9.819e+09 perf-stat.i.dTLB-stores
7.519e+10 -21.2% 5.923e+10 perf-stat.i.instructions
0.73 -24.1% 0.56 perf-stat.i.ipc
1.06 +5.1% 1.11 perf-stat.i.metric.GHz
796.03 +29.0% 1027 perf-stat.i.metric.K/sec
504.06 -21.3% 396.91 perf-stat.i.metric.M/sec
83.11 +4.3 87.46 perf-stat.i.node-load-miss-rate%
13319686 +66.2% 22131430 perf-stat.i.node-load-misses
2571747 ? 2% +11.1% 2857958 perf-stat.i.node-loads
18.45 +18.1 36.52 perf-stat.i.node-store-miss-rate%
10274353 +151.0% 25786465 perf-stat.i.node-store-misses
47593011 -5.3% 45053955 perf-stat.i.node-stores
7.33 +12.7% 8.26 perf-stat.overall.MPKI
0.81 ? 2% +0.1 0.95 perf-stat.overall.branch-miss-rate%
17.17 +10.4 27.55 ? 3% perf-stat.overall.cache-miss-rate%
1.36 +32.4% 1.79 perf-stat.overall.cpi
1077 -26.7% 789.62 perf-stat.overall.cycles-between-cache-misses
0.02 ? 13% +0.0 0.03 ? 12% perf-stat.overall.dTLB-load-miss-rate%
0.74 -24.5% 0.56 perf-stat.overall.ipc
83.78 +4.7 88.44 perf-stat.overall.node-load-miss-rate%
18.00 +18.2 36.17 perf-stat.overall.node-store-miss-rate%
1.391e+10 -20.7% 1.103e+10 perf-stat.ps.branch-instructions
1.129e+08 ? 2% -7.1% 1.049e+08 perf-stat.ps.branch-misses
93034789 +43.2% 1.332e+08 perf-stat.ps.cache-misses
5.42e+08 ? 2% -10.7% 4.841e+08 ? 2% perf-stat.ps.cache-references
576128 -22.1% 448626 perf-stat.ps.context-switches
1.002e+11 +5.0% 1.052e+11 perf-stat.ps.cpu-cycles
5932 ? 5% +93.4% 11475 ? 5% perf-stat.ps.cpu-migrations
2.083e+10 -20.8% 1.65e+10 perf-stat.ps.dTLB-loads
1540779 ? 5% -18.9% 1249488 ? 5% perf-stat.ps.dTLB-store-misses
1.232e+10 -21.2% 9.712e+09 perf-stat.ps.dTLB-stores
7.394e+10 -20.7% 5.861e+10 perf-stat.ps.instructions
13239899 +63.9% 21702867 perf-stat.ps.node-load-misses
2563640 ? 3% +10.6% 2835670 perf-stat.ps.node-loads
10257883 +144.9% 25122315 perf-stat.ps.node-store-misses
46734508 -5.2% 44323339 perf-stat.ps.node-stores
4.658e+12 -19.9% 3.733e+12 perf-stat.total.instructions
10.04 -2.6 7.48 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__clone
10.03 -2.6 7.47 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
10.02 -2.6 7.46 perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
10.02 -2.6 7.46 perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__clone
10.31 -2.5 7.77 perf-profile.calltrace.cycles-pp.__clone
7.88 -2.5 5.42 perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.26 -2.1 4.14 perf-profile.calltrace.cycles-pp.dup_task_struct.copy_process.kernel_clone.__do_sys_clone.do_syscall_64
4.66 ? 2% -1.7 2.96 perf-profile.calltrace.cycles-pp.__vmalloc_node_range.dup_task_struct.copy_process.kernel_clone.__do_sys_clone
3.56 ? 2% -1.4 2.11 ? 2% perf-profile.calltrace.cycles-pp.__get_vm_area_node.__vmalloc_node_range.dup_task_struct.copy_process.kernel_clone
3.22 ? 2% -1.3 1.97 ? 2% perf-profile.calltrace.cycles-pp.alloc_vmap_area.__get_vm_area_node.__vmalloc_node_range.dup_task_struct.copy_process
1.74 ? 11% -1.2 0.57 ? 2% perf-profile.calltrace.cycles-pp.__check_object_size.strncpy_from_user.getname_flags.user_path_at_empty.vfs_statx
3.57 -1.0 2.58 perf-profile.calltrace.cycles-pp.getname_flags.user_path_at_empty.vfs_statx.do_statx.__x64_sys_statx
2.57 ? 3% -0.9 1.63 ? 2% perf-profile.calltrace.cycles-pp.ret_from_fork
2.57 ? 3% -0.9 1.63 ? 2% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
2.52 ? 3% -0.9 1.59 ? 2% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
2.00 ? 3% -0.8 1.15 ? 3% perf-profile.calltrace.cycles-pp.__vunmap.free_work.process_one_work.worker_thread.kthread
2.04 ? 3% -0.8 1.20 ? 3% perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork
2.02 ? 3% -0.8 1.18 ? 3% perf-profile.calltrace.cycles-pp.free_work.process_one_work.worker_thread.kthread.ret_from_fork
1.65 ? 3% -0.7 0.90 ? 2% perf-profile.calltrace.cycles-pp._raw_spin_lock.alloc_vmap_area.__get_vm_area_node.__vmalloc_node_range.dup_task_struct
2.50 -0.7 1.79 perf-profile.calltrace.cycles-pp.__entry_text_start.statx
1.57 ? 3% -0.7 0.86 ? 3% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.alloc_vmap_area.__get_vm_area_node.__vmalloc_node_range
2.95 -0.7 2.27 perf-profile.calltrace.cycles-pp.link_path_walk.path_lookupat.filename_lookup.user_path_at_empty.vfs_statx
2.39 -0.7 1.73 perf-profile.calltrace.cycles-pp.getname_flags.user_path_at_empty.vfs_statx.__do_sys_newlstat.do_syscall_64
2.38 -0.6 1.76 perf-profile.calltrace.cycles-pp.getname_flags.user_path_at_empty.vfs_statx.__do_sys_newstat.do_syscall_64
1.29 ? 3% -0.6 0.68 ? 3% perf-profile.calltrace.cycles-pp.remove_vm_area.__vunmap.free_work.process_one_work.worker_thread
2.15 -0.6 1.55 perf-profile.calltrace.cycles-pp.strncpy_from_user.getname_flags.user_path_at_empty.vfs_statx.do_statx
2.35 ? 2% -0.4 1.92 perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
1.46 -0.4 1.05 ? 2% perf-profile.calltrace.cycles-pp.__fxstat64
3.89 ? 2% -0.4 3.49 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary
1.40 -0.4 1.01 perf-profile.calltrace.cycles-pp.strncpy_from_user.getname_flags.user_path_at_empty.vfs_statx.__do_sys_newlstat
3.98 ? 2% -0.4 3.60 perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
5.44 ? 2% -0.4 5.06 perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
5.44 ? 2% -0.4 5.05 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
5.43 ? 2% -0.4 5.04 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
5.50 ? 2% -0.4 5.12 perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
1.40 -0.4 1.02 perf-profile.calltrace.cycles-pp.strncpy_from_user.getname_flags.user_path_at_empty.vfs_statx.__do_sys_newstat
1.60 ? 2% -0.3 1.31 ? 2% perf-profile.calltrace.cycles-pp.find_idlest_group.select_task_rq_fair.wake_up_new_task.kernel_clone.__do_sys_clone
2.37 ? 2% -0.3 2.10 perf-profile.calltrace.cycles-pp.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.39 -0.3 2.12 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
2.39 -0.3 2.12 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.36 ? 2% -0.3 2.10 perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.10 ? 2% -0.3 0.84 perf-profile.calltrace.cycles-pp.__vmalloc_area_node.__vmalloc_node_range.dup_task_struct.copy_process.kernel_clone
0.85 -0.2 0.62 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
0.84 ? 2% -0.2 0.61 perf-profile.calltrace.cycles-pp.kmem_cache_alloc.getname_flags.user_path_at_empty.vfs_statx.do_statx
0.76 -0.2 0.56 perf-profile.calltrace.cycles-pp.cp_statx.do_statx.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.73 -0.2 0.55 ? 3% perf-profile.calltrace.cycles-pp.shim_statx
0.82 ? 2% -0.2 0.64 ? 2% perf-profile.calltrace.cycles-pp.__alloc_pages_bulk.__vmalloc_area_node.__vmalloc_node_range.dup_task_struct.copy_process
1.72 ? 2% -0.1 1.57 ? 2% perf-profile.calltrace.cycles-pp.select_task_rq_fair.wake_up_new_task.kernel_clone.__do_sys_clone.do_syscall_64
0.78 ? 2% -0.1 0.66 ? 3% perf-profile.calltrace.cycles-pp.release_task.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.10 ? 2% -0.1 2.00 perf-profile.calltrace.cycles-pp.wake_up_new_task.kernel_clone.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.64 ? 3% +0.0 0.68 perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
0.64 ? 2% +0.0 0.68 perf-profile.calltrace.cycles-pp.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
0.67 ? 2% +0.0 0.72 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.start_thread
0.62 ? 2% +0.0 0.66 ? 3% perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.__softirqentry_text_start.irq_exit_rcu.sysvec_apic_timer_interrupt
0.63 +0.0 0.68 ? 3% perf-profile.calltrace.cycles-pp.rcu_core.__softirqentry_text_start.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
0.56 ? 2% +0.0 0.61 perf-profile.calltrace.cycles-pp.lookup_fast.walk_component.path_lookupat.filename_lookup.user_path_at_empty
0.66 ? 2% +0.0 0.71 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.start_thread
0.78 ? 2% +0.0 0.83 ? 2% perf-profile.calltrace.cycles-pp.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
0.76 +0.0 0.81 ? 2% perf-profile.calltrace.cycles-pp.__softirqentry_text_start.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
0.78 ? 2% +0.1 0.86 perf-profile.calltrace.cycles-pp.start_thread
1.00 +0.1 1.08 perf-profile.calltrace.cycles-pp.walk_component.path_lookupat.filename_lookup.user_path_at_empty.vfs_statx
1.59 +0.3 1.88 perf-profile.calltrace.cycles-pp.vfs_getattr.vfs_statx.do_statx.__x64_sys_statx.do_syscall_64
1.55 +0.3 1.85 perf-profile.calltrace.cycles-pp.security_inode_getattr.vfs_getattr.vfs_statx.do_statx.__x64_sys_statx
1.50 +0.3 1.82 perf-profile.calltrace.cycles-pp.common_perm_cond.security_inode_getattr.vfs_getattr.vfs_statx.do_statx
1.57 ? 2% +0.3 1.91 perf-profile.calltrace.cycles-pp.vfs_getattr.vfs_statx.__do_sys_newlstat.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.52 ? 2% +0.4 1.88 perf-profile.calltrace.cycles-pp.security_inode_getattr.vfs_getattr.vfs_statx.__do_sys_newlstat.do_syscall_64
1.48 ? 2% +0.4 1.84 perf-profile.calltrace.cycles-pp.common_perm_cond.security_inode_getattr.vfs_getattr.vfs_statx.__do_sys_newlstat
1.54 +0.4 1.90 perf-profile.calltrace.cycles-pp.vfs_getattr.vfs_statx.__do_sys_newstat.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.50 +0.4 1.87 perf-profile.calltrace.cycles-pp.security_inode_getattr.vfs_getattr.vfs_statx.__do_sys_newstat.do_syscall_64
1.45 +0.4 1.83 perf-profile.calltrace.cycles-pp.common_perm_cond.security_inode_getattr.vfs_getattr.vfs_statx.__do_sys_newstat
13.24 +0.5 13.71 perf-profile.calltrace.cycles-pp.user_path_at_empty.vfs_statx.do_statx.__x64_sys_statx.do_syscall_64
11.39 +1.1 12.54 perf-profile.calltrace.cycles-pp.user_path_at_empty.vfs_statx.__do_sys_newlstat.do_syscall_64.entry_SYSCALL_64_after_hwframe
12.37 +1.2 13.60 perf-profile.calltrace.cycles-pp.user_path_at_empty.vfs_statx.__do_sys_newstat.do_syscall_64.entry_SYSCALL_64_after_hwframe
24.30 +1.3 25.61 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.statx
23.84 +1.4 25.28 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
5.24 +1.6 6.81 perf-profile.calltrace.cycles-pp.lockref_put_or_lock.dput.path_put.vfs_statx.do_statx
5.56 +1.6 7.13 perf-profile.calltrace.cycles-pp.path_put.vfs_statx.do_statx.__x64_sys_statx.do_syscall_64
5.53 +1.6 7.12 perf-profile.calltrace.cycles-pp.dput.path_put.vfs_statx.do_statx.__x64_sys_statx
9.14 +1.6 10.75 perf-profile.calltrace.cycles-pp.filename_lookup.user_path_at_empty.vfs_statx.do_statx.__x64_sys_statx
5.14 +1.7 6.80 perf-profile.calltrace.cycles-pp.lockref_put_or_lock.dput.path_put.vfs_statx.__do_sys_newlstat
5.46 +1.7 7.14 perf-profile.calltrace.cycles-pp.path_put.vfs_statx.__do_sys_newlstat.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.43 +1.7 7.12 perf-profile.calltrace.cycles-pp.dput.path_put.vfs_statx.__do_sys_newlstat.do_syscall_64
5.04 +1.7 6.72 perf-profile.calltrace.cycles-pp.lockref_put_or_lock.dput.path_put.vfs_statx.__do_sys_newstat
8.87 +1.7 10.55 perf-profile.calltrace.cycles-pp.path_lookupat.filename_lookup.user_path_at_empty.vfs_statx.do_statx
5.35 +1.7 7.06 perf-profile.calltrace.cycles-pp.path_put.vfs_statx.__do_sys_newstat.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.33 +1.7 7.04 perf-profile.calltrace.cycles-pp.dput.path_put.vfs_statx.__do_sys_newstat.do_syscall_64
22.49 +1.8 24.28 perf-profile.calltrace.cycles-pp.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
8.74 +1.9 10.60 perf-profile.calltrace.cycles-pp.filename_lookup.user_path_at_empty.vfs_statx.__do_sys_newlstat.do_syscall_64
8.58 +1.9 10.48 perf-profile.calltrace.cycles-pp.path_lookupat.filename_lookup.user_path_at_empty.vfs_statx.__do_sys_newlstat
22.06 +1.9 23.98 perf-profile.calltrace.cycles-pp.do_statx.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
9.72 +1.9 11.64 perf-profile.calltrace.cycles-pp.filename_lookup.user_path_at_empty.vfs_statx.__do_sys_newstat.do_syscall_64
9.52 +2.0 11.49 perf-profile.calltrace.cycles-pp.path_lookupat.filename_lookup.user_path_at_empty.vfs_statx.__do_sys_newstat
22.12 +2.2 24.30 perf-profile.calltrace.cycles-pp.__lxstat64
20.95 +2.2 23.15 perf-profile.calltrace.cycles-pp.vfs_statx.do_statx.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe
22.91 +2.3 25.26 perf-profile.calltrace.cycles-pp.__xstat64
20.57 +2.6 23.18 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__lxstat64
20.38 +2.7 23.04 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__lxstat64
21.39 +2.8 24.15 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__xstat64
21.19 +2.8 24.02 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__xstat64
19.69 +2.8 22.53 perf-profile.calltrace.cycles-pp.__do_sys_newlstat.do_syscall_64.entry_SYSCALL_64_after_hwframe.__lxstat64
20.52 +3.0 23.52 perf-profile.calltrace.cycles-pp.__do_sys_newstat.do_syscall_64.entry_SYSCALL_64_after_hwframe.__xstat64
18.90 +3.0 21.95 perf-profile.calltrace.cycles-pp.vfs_statx.__do_sys_newlstat.do_syscall_64.entry_SYSCALL_64_after_hwframe.__lxstat64
19.74 +3.2 22.94 perf-profile.calltrace.cycles-pp.vfs_statx.__do_sys_newstat.do_syscall_64.entry_SYSCALL_64_after_hwframe.__xstat64
21.01 +6.5 27.55 perf-profile.calltrace.cycles-pp.complete_walk.path_lookupat.filename_lookup.user_path_at_empty.vfs_statx
20.84 +6.6 27.43 perf-profile.calltrace.cycles-pp.try_to_unlazy.complete_walk.path_lookupat.filename_lookup.user_path_at_empty
20.59 +6.6 27.24 perf-profile.calltrace.cycles-pp.__legitimize_path.try_to_unlazy.complete_walk.path_lookupat.filename_lookup
20.10 +6.7 26.82 perf-profile.calltrace.cycles-pp.lockref_get_not_dead.__legitimize_path.try_to_unlazy.complete_walk.path_lookupat
10.03 -2.6 7.46 perf-profile.children.cycles-pp.__do_sys_clone
10.02 -2.6 7.46 perf-profile.children.cycles-pp.kernel_clone
10.33 -2.5 7.79 perf-profile.children.cycles-pp.__clone
7.89 -2.5 5.43 perf-profile.children.cycles-pp.copy_process
8.46 -2.3 6.15 perf-profile.children.cycles-pp.getname_flags
6.26 -2.1 4.14 perf-profile.children.cycles-pp.dup_task_struct
4.66 ? 2% -1.7 2.96 perf-profile.children.cycles-pp.__vmalloc_node_range
3.86 ? 2% -1.7 2.16 perf-profile.children.cycles-pp._raw_spin_lock
3.40 ? 2% -1.7 1.75 ? 2% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
3.56 ? 2% -1.4 2.11 ? 2% perf-profile.children.cycles-pp.__get_vm_area_node
5.06 -1.4 3.67 perf-profile.children.cycles-pp.strncpy_from_user
3.22 ? 2% -1.3 1.97 ? 2% perf-profile.children.cycles-pp.alloc_vmap_area
3.37 -0.9 2.42 perf-profile.children.cycles-pp.__entry_text_start
2.57 ? 3% -0.9 1.63 ? 2% perf-profile.children.cycles-pp.kthread
2.52 ? 3% -0.9 1.59 ? 2% perf-profile.children.cycles-pp.worker_thread
2.72 ? 3% -0.9 1.82 ? 2% perf-profile.children.cycles-pp.ret_from_fork
2.00 ? 3% -0.8 1.15 ? 3% perf-profile.children.cycles-pp.__vunmap
2.05 ? 3% -0.8 1.20 ? 3% perf-profile.children.cycles-pp.process_one_work
2.02 ? 3% -0.8 1.18 ? 3% perf-profile.children.cycles-pp.free_work
2.88 -0.8 2.09 perf-profile.children.cycles-pp.syscall_return_via_sysret
3.03 -0.7 2.33 perf-profile.children.cycles-pp.link_path_walk
1.29 ? 3% -0.6 0.68 ? 3% perf-profile.children.cycles-pp.remove_vm_area
2.12 -0.6 1.54 perf-profile.children.cycles-pp.kmem_cache_alloc
1.99 -0.6 1.43 perf-profile.children.cycles-pp.__check_object_size
1.92 -0.5 1.42 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
1.67 -0.5 1.20 ? 2% perf-profile.children.cycles-pp.__fxstat64
0.80 ? 2% -0.4 0.37 ? 2% perf-profile.children.cycles-pp.find_vmap_area
2.38 ? 2% -0.4 1.96 perf-profile.children.cycles-pp.intel_idle
4.02 ? 2% -0.4 3.63 perf-profile.children.cycles-pp.cpuidle_enter_state
4.03 ? 2% -0.4 3.64 perf-profile.children.cycles-pp.cpuidle_enter
5.50 ? 2% -0.4 5.11 perf-profile.children.cycles-pp.do_idle
5.44 ? 2% -0.4 5.06 perf-profile.children.cycles-pp.start_secondary
5.50 ? 2% -0.4 5.12 perf-profile.children.cycles-pp.secondary_startup_64_no_verify
5.50 ? 2% -0.4 5.12 perf-profile.children.cycles-pp.cpu_startup_entry
1.65 -0.4 1.28 perf-profile.children.cycles-pp.kmem_cache_free
3.03 -0.4 2.67 perf-profile.children.cycles-pp.walk_component
0.83 ? 3% -0.4 0.48 ? 3% perf-profile.children.cycles-pp.free_vmap_area_noflush
1.30 -0.3 0.96 perf-profile.children.cycles-pp.cp_new_stat
1.24 -0.3 0.90 perf-profile.children.cycles-pp._copy_to_user
1.23 -0.3 0.93 ? 2% perf-profile.children.cycles-pp.__might_resched
1.65 ? 2% -0.3 1.35 ? 2% perf-profile.children.cycles-pp.find_idlest_group
1.10 -0.3 0.84 perf-profile.children.cycles-pp.__vmalloc_area_node
0.92 -0.3 0.66 perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
1.00 ? 2% -0.3 0.75 ? 2% perf-profile.children.cycles-pp.__might_fault
3.01 -0.2 2.78 perf-profile.children.cycles-pp.__x64_sys_exit
3.01 -0.2 2.78 perf-profile.children.cycles-pp.do_exit
0.78 -0.2 0.56 perf-profile.children.cycles-pp.__check_heap_object
0.34 ? 4% -0.2 0.14 ? 4% perf-profile.children.cycles-pp.find_vm_area
0.77 -0.2 0.57 perf-profile.children.cycles-pp.cp_statx
0.76 -0.2 0.57 ? 3% perf-profile.children.cycles-pp.shim_statx
0.82 ? 2% -0.2 0.64 ? 2% perf-profile.children.cycles-pp.__alloc_pages_bulk
0.96 -0.2 0.79 perf-profile.children.cycles-pp.step_into
0.64 -0.2 0.48 perf-profile.children.cycles-pp.__might_sleep
0.70 ? 2% -0.2 0.55 perf-profile.children.cycles-pp.vfs_getattr_nosec
0.57 ? 2% -0.2 0.42 ? 2% perf-profile.children.cycles-pp.syscall_enter_from_user_mode
0.58 ? 2% -0.1 0.44 ? 2% perf-profile.children.cycles-pp.kernel_init_free_pages
1.62 ? 3% -0.1 1.49 perf-profile.children.cycles-pp.lookup_fast
0.57 ? 2% -0.1 0.44 ? 2% perf-profile.children.cycles-pp.clear_page_erms
1.73 ? 2% -0.1 1.60 ? 2% perf-profile.children.cycles-pp.select_task_rq_fair
0.75 -0.1 0.62 perf-profile.children.cycles-pp.inode_permission
0.79 ? 2% -0.1 0.66 ? 2% perf-profile.children.cycles-pp.release_task
0.72 -0.1 0.60 ? 2% perf-profile.children.cycles-pp.path_init
0.51 ? 2% -0.1 0.38 perf-profile.children.cycles-pp._raw_spin_lock_irq
0.55 ? 2% -0.1 0.43 perf-profile.children.cycles-pp.__sched_yield
0.88 ? 3% -0.1 0.77 ? 2% perf-profile.children.cycles-pp.__d_lookup_rcu
0.52 -0.1 0.42 ? 2% perf-profile.children.cycles-pp.memcpy_erms
0.39 ? 3% -0.1 0.29 ? 2% perf-profile.children.cycles-pp.__virt_addr_valid
0.50 -0.1 0.40 ? 2% perf-profile.children.cycles-pp.arch_dup_task_struct
2.10 ? 2% -0.1 2.00 perf-profile.children.cycles-pp.wake_up_new_task
0.25 ? 2% -0.1 0.16 ? 4% perf-profile.children.cycles-pp.insert_vmap_area
0.33 ? 3% -0.1 0.24 ? 2% perf-profile.children.cycles-pp.__cond_resched
0.46 ? 2% -0.1 0.37 perf-profile.children.cycles-pp.__x64_sys_sched_yield
0.31 ? 4% -0.1 0.22 ? 3% perf-profile.children.cycles-pp.putname
0.45 -0.1 0.36 perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.29 ? 4% -0.1 0.20 ? 3% perf-profile.children.cycles-pp.alloc_pid
0.35 -0.1 0.26 ? 3% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.36 ? 5% -0.1 0.27 ? 5% perf-profile.children.cycles-pp.queued_write_lock_slowpath
0.44 ? 3% -0.1 0.36 perf-profile.children.cycles-pp.generic_permission
0.37 ? 3% -0.1 0.29 ? 2% perf-profile.children.cycles-pp.kmem_cache_alloc_node
0.30 -0.1 0.22 perf-profile.children.cycles-pp.map_id_up
0.31 ? 5% -0.1 0.24 ? 11% perf-profile.children.cycles-pp.stress_fstat_helper
0.28 ? 3% -0.1 0.22 ? 3% perf-profile.children.cycles-pp.cpumask_next
0.29 ? 4% -0.1 0.23 ? 3% perf-profile.children.cycles-pp.obj_cgroup_charge_pages
0.31 ? 2% -0.1 0.25 ? 2% perf-profile.children.cycles-pp.terminate_walk
0.24 ? 3% -0.1 0.18 ? 2% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
0.21 ? 2% -0.1 0.15 ? 3% perf-profile.children.cycles-pp.cgroup_can_fork
0.18 ? 2% -0.1 0.13 ? 3% perf-profile.children.cycles-pp.cgroup_post_fork
0.20 -0.1 0.14 ? 3% perf-profile.children.cycles-pp.__do_sys_newfstat
0.25 ? 5% -0.1 0.20 ? 4% perf-profile.children.cycles-pp.cgroup_free
0.25 ? 5% -0.1 0.19 ? 5% perf-profile.children.cycles-pp.refcount_dec_not_one
0.15 ? 4% -0.1 0.10 ? 8% perf-profile.children.cycles-pp.cgroup_release
0.41 ? 2% -0.1 0.36 ? 2% perf-profile.children.cycles-pp.__legitimize_mnt
0.12 ? 4% -0.1 0.07 ? 6% perf-profile.children.cycles-pp.insert_vmap_area_augment
0.22 ? 5% -0.1 0.17 ? 4% perf-profile.children.cycles-pp.page_counter_charge
0.27 ? 2% -0.0 0.23 ? 3% perf-profile.children.cycles-pp.try_to_wake_up
0.22 ? 3% -0.0 0.18 ? 2% perf-profile.children.cycles-pp.generic_fillattr
0.18 ? 3% -0.0 0.13 ? 2% perf-profile.children.cycles-pp.security_inode_permission
0.16 ? 4% -0.0 0.11 ? 3% perf-profile.children.cycles-pp.rcu_all_qs
0.18 ? 3% -0.0 0.13 ? 3% perf-profile.children.cycles-pp.from_kgid_munged
0.36 -0.0 0.31 ? 3% perf-profile.children.cycles-pp.dequeue_task_fair
0.31 ? 3% -0.0 0.26 ? 4% perf-profile.children.cycles-pp.nd_jump_root
0.15 ? 3% -0.0 0.10 ? 4% perf-profile.children.cycles-pp.vfs_fstat
0.21 ? 5% -0.0 0.17 ? 4% perf-profile.children.cycles-pp.perf_event_task
0.21 ? 4% -0.0 0.17 ? 4% perf-profile.children.cycles-pp.__memcg_kmem_charge_page
0.19 ? 5% -0.0 0.15 ? 6% perf-profile.children.cycles-pp.menu_select
0.18 ? 2% -0.0 0.14 ? 3% perf-profile.children.cycles-pp.from_kuid_munged
0.14 ? 4% -0.0 0.10 ? 3% perf-profile.children.cycles-pp.check_stack_object
0.13 ? 2% -0.0 0.09 ? 4% perf-profile.children.cycles-pp.do_sched_yield
0.37 ? 4% -0.0 0.33 ? 3% perf-profile.children.cycles-pp.__put_task_struct
0.21 ? 5% -0.0 0.17 ? 4% perf-profile.children.cycles-pp.perf_iterate_sb
0.20 ? 3% -0.0 0.16 ? 5% perf-profile.children.cycles-pp.___slab_alloc
0.19 ? 3% -0.0 0.16 ? 3% perf-profile.children.cycles-pp._raw_write_lock_irq
0.11 ? 3% -0.0 0.08 ? 6% perf-profile.children.cycles-pp.cgroup_exit
0.13 ? 6% -0.0 0.09 ? 5% perf-profile.children.cycles-pp.__lookup_mnt
0.12 ? 4% -0.0 0.08 ? 5% perf-profile.children.cycles-pp.rb_insert_color
0.20 ? 3% -0.0 0.16 ? 5% perf-profile.children.cycles-pp.__slab_alloc
0.18 ? 2% -0.0 0.15 ? 4% perf-profile.children.cycles-pp.common_perm
0.19 ? 3% -0.0 0.16 ? 2% perf-profile.children.cycles-pp.update_curr
0.07 ? 8% -0.0 0.04 ? 71% perf-profile.children.cycles-pp.idr_alloc_cyclic
0.07 ? 8% -0.0 0.04 ? 71% perf-profile.children.cycles-pp.idr_alloc_u32
0.46 ? 2% -0.0 0.43 ? 3% perf-profile.children.cycles-pp.obj_cgroup_uncharge_pages
0.10 -0.0 0.07 ? 7% perf-profile.children.cycles-pp.__x64_sys_newfstat
0.17 ? 5% -0.0 0.14 ? 5% perf-profile.children.cycles-pp.__x64_sys_newlstat
0.10 ? 3% -0.0 0.07 ? 5% perf-profile.children.cycles-pp.obj_cgroup_charge
0.21 ? 3% -0.0 0.18 ? 2% perf-profile.children.cycles-pp.__fget_light
0.11 ? 6% -0.0 0.08 ? 7% perf-profile.children.cycles-pp.pids_can_fork
0.16 ? 6% -0.0 0.12 ? 3% perf-profile.children.cycles-pp.add_device_randomness
0.16 ? 3% -0.0 0.12 ? 3% perf-profile.children.cycles-pp.__x64_sys_newstat
0.28 ? 2% -0.0 0.25 ? 2% perf-profile.children.cycles-pp.dequeue_entity
0.18 ? 4% -0.0 0.15 ? 5% perf-profile.children.cycles-pp.mntput_no_expire
0.26 ? 2% -0.0 0.23 ? 3% perf-profile.children.cycles-pp.enqueue_task_fair
0.14 ? 2% -0.0 0.11 ? 5% perf-profile.children.cycles-pp.__queue_work
0.08 ? 4% -0.0 0.05 ? 8% perf-profile.children.cycles-pp.cpu_cgroup_fork
0.32 ? 3% -0.0 0.29 ? 3% perf-profile.children.cycles-pp.sched_move_task
0.22 ? 3% -0.0 0.19 ? 3% perf-profile.children.cycles-pp.enqueue_entity
0.16 ? 3% -0.0 0.13 ? 5% perf-profile.children.cycles-pp.wake_up_q
0.14 ? 5% -0.0 0.12 ? 4% perf-profile.children.cycles-pp.queue_work_on
0.26 ? 3% -0.0 0.24 ? 2% perf-profile.children.cycles-pp._find_next_bit
0.10 ? 10% -0.0 0.08 ? 6% perf-profile.children.cycles-pp.vunmap_range_noflush
0.10 ? 4% -0.0 0.08 ? 8% perf-profile.children.cycles-pp.propagate_protected_usage
0.14 ? 2% -0.0 0.12 ? 4% perf-profile.children.cycles-pp.sched_clock_cpu
0.17 ? 4% -0.0 0.14 ? 3% perf-profile.children.cycles-pp.perf_event_task_output
0.14 ? 3% -0.0 0.11 ? 6% perf-profile.children.cycles-pp.ttwu_do_activate
0.07 ? 7% -0.0 0.04 ? 44% perf-profile.children.cycles-pp.__free_pages
0.15 ? 4% -0.0 0.13 ? 5% perf-profile.children.cycles-pp.free_pid
0.13 ? 5% -0.0 0.10 ? 4% perf-profile.children.cycles-pp.allocate_slab
0.12 ? 5% -0.0 0.09 ? 5% perf-profile.children.cycles-pp.restore_nameidata
0.12 ? 4% -0.0 0.09 ? 6% perf-profile.children.cycles-pp._mix_pool_bytes
0.12 ? 4% -0.0 0.09 perf-profile.children.cycles-pp.sched_post_fork
0.09 ? 5% -0.0 0.07 ? 5% perf-profile.children.cycles-pp.__list_add_valid
0.09 ? 5% -0.0 0.06 ? 7% perf-profile.children.cycles-pp.task_rq_lock
0.19 ? 5% -0.0 0.17 ? 2% perf-profile.children.cycles-pp.unmap_page_range
0.08 ? 7% -0.0 0.06 ? 9% perf-profile.children.cycles-pp.__purge_vmap_area_lazy
0.12 -0.0 0.10 perf-profile.children.cycles-pp.native_sched_clock
0.07 -0.0 0.05 ? 7% perf-profile.children.cycles-pp.should_failslab
0.08 ? 5% -0.0 0.06 ? 7% perf-profile.children.cycles-pp.copy_thread
0.14 ? 3% -0.0 0.13 ? 3% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.07 ? 5% -0.0 0.05 ? 7% perf-profile.children.cycles-pp.task_fork_fair
0.08 ? 4% -0.0 0.06 ? 7% perf-profile.children.cycles-pp.rb_erase
0.15 ? 4% -0.0 0.13 ? 3% perf-profile.children.cycles-pp.zap_pte_range
0.09 ? 7% -0.0 0.07 ? 6% perf-profile.children.cycles-pp.mntput
0.06 ? 7% -0.0 0.05 ? 7% perf-profile.children.cycles-pp.radix_tree_delete_item
0.09 ? 5% -0.0 0.07 ? 6% perf-profile.children.cycles-pp.task_change_group_fair
0.10 ? 3% +0.0 0.11 ? 4% perf-profile.children.cycles-pp.tlb_flush_mmu
0.06 ? 7% +0.0 0.08 ? 4% perf-profile.children.cycles-pp.proc_fd_link
0.06 ? 7% +0.0 0.08 ? 4% perf-profile.children.cycles-pp.pid_revalidate
0.06 ? 8% +0.0 0.07 ? 5% perf-profile.children.cycles-pp.pid_update_inode
0.10 ? 4% +0.0 0.11 ? 4% perf-profile.children.cycles-pp.flush_tlb_mm_range
0.05 +0.0 0.07 ? 7% perf-profile.children.cycles-pp.__task_rq_lock
0.09 ? 4% +0.0 0.11 ? 5% perf-profile.children.cycles-pp.__x64_sys_rt_sigprocmask
0.15 ? 3% +0.0 0.17 ? 4% perf-profile.children.cycles-pp.put_pid
0.11 ? 3% +0.0 0.13 ? 3% perf-profile.children.cycles-pp.tlb_finish_mmu
0.11 ? 3% +0.0 0.13 ? 3% perf-profile.children.cycles-pp.__switch_to
0.07 ? 7% +0.0 0.08 ? 5% perf-profile.children.cycles-pp.try_grab_compound_head
0.04 ? 44% +0.0 0.06 ? 6% perf-profile.children.cycles-pp.futex_cleanup_begin
0.13 ? 5% +0.0 0.15 ? 2% perf-profile.children.cycles-pp.get_futex_key
0.23 ? 2% +0.0 0.25 ? 2% perf-profile.children.cycles-pp.__list_del_entry_valid
0.10 ? 7% +0.0 0.12 ? 8% perf-profile.children.cycles-pp.pthread_create@@GLIBC_2.2.5
0.06 ? 9% +0.0 0.08 perf-profile.children.cycles-pp.tid_fd_revalidate
0.12 ? 9% +0.0 0.14 ? 6% perf-profile.children.cycles-pp.stress_fstat
0.11 ? 4% +0.0 0.13 ? 3% perf-profile.children.cycles-pp.gup_pgd_range
0.12 ? 6% +0.0 0.14 ? 2% perf-profile.children.cycles-pp.internal_get_user_pages_fast
0.22 ? 3% +0.0 0.25 ? 3% perf-profile.children.cycles-pp.pick_link
0.17 ? 5% +0.0 0.20 ? 5% perf-profile.children.cycles-pp.sigprocmask
0.12 ? 3% +0.0 0.16 ? 4% perf-profile.children.cycles-pp.proc_pid_get_link
0.03 ? 70% +0.0 0.07 ? 10% perf-profile.children.cycles-pp.llist_add_batch
0.20 ? 4% +0.0 0.24 ? 3% perf-profile.children.cycles-pp.try_to_unlazy_next
0.14 ? 4% +0.0 0.18 ? 4% perf-profile.children.cycles-pp.__slab_free
0.19 +0.0 0.24 ? 3% perf-profile.children.cycles-pp.update_rq_clock
0.01 ?223% +0.1 0.06 perf-profile.children.cycles-pp.security_task_free
0.00 +0.1 0.05 ? 8% perf-profile.children.cycles-pp.sched_ttwu_pending
0.01 ?223% +0.1 0.06 ? 7% perf-profile.children.cycles-pp.schedule_tail
0.00 +0.1 0.06 ? 8% perf-profile.children.cycles-pp.flush_smp_call_function_queue
0.00 +0.1 0.06 ? 8% perf-profile.children.cycles-pp.__set_task_blocked
0.01 ?223% +0.1 0.06 ? 7% perf-profile.children.cycles-pp.recalc_sigpending
0.00 +0.1 0.06 ? 6% perf-profile.children.cycles-pp.memset_erms
0.00 +0.1 0.06 ? 9% perf-profile.children.cycles-pp.__set_current_blocked
0.00 +0.1 0.06 perf-profile.children.cycles-pp.task_dump_owner
0.00 +0.1 0.07 ? 5% perf-profile.children.cycles-pp.get_pid_task
0.17 ? 4% +0.1 0.24 ? 6% perf-profile.children.cycles-pp.__unfreeze_partials
0.79 ? 2% +0.1 0.87 perf-profile.children.cycles-pp.start_thread
0.00 +0.1 0.08 ? 9% perf-profile.children.cycles-pp.flush_smp_call_function_from_idle
0.54 ? 4% +0.1 0.68 perf-profile.children.cycles-pp.lockref_put_return
4.72 +1.0 5.72 perf-profile.children.cycles-pp.vfs_getattr
4.60 +1.0 5.63 perf-profile.children.cycles-pp.security_inode_getattr
4.47 +1.0 5.52 perf-profile.children.cycles-pp.common_perm_cond
22.54 +1.8 24.32 perf-profile.children.cycles-pp.__x64_sys_statx
22.10 +1.9 24.01 perf-profile.children.cycles-pp.do_statx
22.54 +2.1 24.61 perf-profile.children.cycles-pp.__lxstat64
23.34 +2.2 25.57 perf-profile.children.cycles-pp.__xstat64
37.17 +2.8 39.97 perf-profile.children.cycles-pp.user_path_at_empty
19.71 +2.8 22.55 perf-profile.children.cycles-pp.__do_sys_newlstat
20.55 +3.0 23.54 perf-profile.children.cycles-pp.__do_sys_newstat
81.93 +3.7 85.60 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
81.16 +3.9 85.05 perf-profile.children.cycles-pp.do_syscall_64
16.39 +5.0 21.34 perf-profile.children.cycles-pp.path_put
15.60 +5.0 20.56 perf-profile.children.cycles-pp.lockref_put_or_lock
16.63 +5.0 21.62 perf-profile.children.cycles-pp.dput
27.67 +5.4 33.05 perf-profile.children.cycles-pp.filename_lookup
27.09 +5.5 32.61 perf-profile.children.cycles-pp.path_lookupat
21.03 +6.5 27.57 perf-profile.children.cycles-pp.complete_walk
20.91 +6.6 27.47 perf-profile.children.cycles-pp.try_to_unlazy
20.82 +6.7 27.47 perf-profile.children.cycles-pp.__legitimize_path
20.31 +6.8 27.07 perf-profile.children.cycles-pp.lockref_get_not_dead
59.70 +8.4 68.13 perf-profile.children.cycles-pp.vfs_statx
3.38 ? 2% -1.6 1.73 ? 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
2.84 -0.8 2.07 perf-profile.self.cycles-pp.syscall_return_via_sysret
2.38 -0.6 1.73 perf-profile.self.cycles-pp.strncpy_from_user
1.95 -0.5 1.41 perf-profile.self.cycles-pp.statx
2.37 ? 2% -0.4 1.94 perf-profile.self.cycles-pp.intel_idle
1.23 -0.4 0.88 perf-profile.self.cycles-pp.__entry_text_start
1.23 -0.3 0.88 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
1.14 -0.3 0.80 ? 5% perf-profile.self.cycles-pp.alloc_vmap_area
1.19 ? 2% -0.3 0.86 perf-profile.self.cycles-pp.kmem_cache_alloc
1.29 -0.3 0.97 ? 2% perf-profile.self.cycles-pp.kmem_cache_free
1.12 -0.3 0.85 ? 2% perf-profile.self.cycles-pp.__might_resched
0.86 ? 2% -0.2 0.62 ? 2% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
1.40 ? 2% -0.2 1.15 perf-profile.self.cycles-pp.find_idlest_group
0.88 -0.2 0.64 perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
0.77 ? 2% -0.2 0.54 ? 4% perf-profile.self.cycles-pp.__check_object_size
0.84 -0.2 0.62 perf-profile.self.cycles-pp.link_path_walk
0.71 -0.2 0.51 perf-profile.self.cycles-pp.__check_heap_object
0.70 -0.2 0.50 perf-profile.self.cycles-pp.do_syscall_64
0.71 -0.2 0.53 ? 3% perf-profile.self.cycles-pp.shim_statx
0.64 -0.2 0.46 ? 3% perf-profile.self.cycles-pp.__xstat64
0.64 -0.2 0.47 perf-profile.self.cycles-pp.__lxstat64
0.63 -0.2 0.46 perf-profile.self.cycles-pp.getname_flags
0.59 ? 2% -0.2 0.42 perf-profile.self.cycles-pp.step_into
0.89 -0.2 0.74 perf-profile.self.cycles-pp._raw_spin_lock
0.57 ? 2% -0.1 0.42 perf-profile.self.cycles-pp.vfs_statx
0.57 ? 2% -0.1 0.44 ? 2% perf-profile.self.cycles-pp.clear_page_erms
0.54 -0.1 0.41 perf-profile.self.cycles-pp.__might_sleep
0.48 ? 2% -0.1 0.35 ? 2% perf-profile.self.cycles-pp.syscall_enter_from_user_mode
0.46 ? 2% -0.1 0.35 perf-profile.self.cycles-pp.filename_lookup
0.47 ? 2% -0.1 0.36 perf-profile.self.cycles-pp.vfs_getattr_nosec
0.41 -0.1 0.30 perf-profile.self.cycles-pp.__x64_sys_statx
0.85 ? 4% -0.1 0.74 perf-profile.self.cycles-pp.__d_lookup_rcu
0.52 -0.1 0.41 ? 3% perf-profile.self.cycles-pp.memcpy_erms
0.41 ? 2% -0.1 0.31 ? 3% perf-profile.self.cycles-pp.lookup_fast
0.37 ? 2% -0.1 0.27 ? 2% perf-profile.self.cycles-pp.path_lookupat
0.34 ? 2% -0.1 0.25 ? 3% perf-profile.self.cycles-pp.__fxstat64
0.34 ? 2% -0.1 0.24 ? 3% perf-profile.self.cycles-pp.do_statx
0.35 -0.1 0.26 ? 2% perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.36 ? 2% -0.1 0.27 perf-profile.self.cycles-pp.__virt_addr_valid
0.34 ? 2% -0.1 0.26 ? 4% perf-profile.self.cycles-pp.walk_component
0.33 ? 2% -0.1 0.24 ? 2% perf-profile.self.cycles-pp.exit_to_user_mode_prepare
0.32 ? 2% -0.1 0.24 ? 3% perf-profile.self.cycles-pp.path_init
0.41 ? 3% -0.1 0.33 perf-profile.self.cycles-pp.generic_permission
0.25 ? 3% -0.1 0.17 ? 3% perf-profile.self.cycles-pp.free_vmap_area_noflush
0.30 -0.1 0.22 ? 3% perf-profile.self.cycles-pp.user_path_at_empty
0.27 ? 2% -0.1 0.20 ? 3% perf-profile.self.cycles-pp.map_id_up
0.23 ? 3% -0.1 0.16 ? 4% perf-profile.self.cycles-pp.find_vmap_area
0.25 ? 5% -0.1 0.18 ? 4% perf-profile.self.cycles-pp.putname
0.18 ? 3% -0.1 0.11 ? 4% perf-profile.self.cycles-pp.insert_vmap_area
0.26 ? 2% -0.1 0.20 ? 2% perf-profile.self.cycles-pp.cp_new_stat
0.27 ? 3% -0.1 0.21 ? 2% perf-profile.self.cycles-pp.inode_permission
0.24 ? 2% -0.1 0.18 ? 2% perf-profile.self.cycles-pp.cp_statx
0.20 ? 4% -0.1 0.14 ? 2% perf-profile.self.cycles-pp.__vmalloc_area_node
0.25 ? 5% -0.1 0.19 ? 5% perf-profile.self.cycles-pp.refcount_dec_not_one
0.20 ? 3% -0.1 0.14 ? 2% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
0.12 ? 6% -0.0 0.07 ? 6% perf-profile.self.cycles-pp.__alloc_pages_bulk
0.40 ? 2% -0.0 0.35 ? 2% perf-profile.self.cycles-pp.__legitimize_mnt
0.20 -0.0 0.15 ? 3% perf-profile.self.cycles-pp.__might_fault
0.21 ? 3% -0.0 0.17 ? 2% perf-profile.self.cycles-pp.generic_fillattr
0.21 -0.0 0.16 ? 3% perf-profile.self.cycles-pp.dput
0.19 ? 4% -0.0 0.14 ? 5% perf-profile.self.cycles-pp.page_counter_charge
0.17 ? 2% -0.0 0.12 ? 6% perf-profile.self.cycles-pp.__cond_resched
0.14 ? 2% -0.0 0.10 ? 4% perf-profile.self.cycles-pp.__do_sys_newlstat
0.16 ? 5% -0.0 0.12 ? 4% perf-profile.self.cycles-pp.terminate_walk
0.19 ? 3% -0.0 0.16 ? 3% perf-profile.self.cycles-pp._raw_write_lock_irq
0.15 ? 3% -0.0 0.11 ? 5% perf-profile.self.cycles-pp.__legitimize_path
0.28 -0.0 0.25 perf-profile.self.cycles-pp._raw_spin_lock_irq
0.12 ? 5% -0.0 0.08 ? 5% perf-profile.self.cycles-pp.rb_insert_color
0.12 ? 6% -0.0 0.09 ? 7% perf-profile.self.cycles-pp._copy_to_user
0.11 ? 5% -0.0 0.08 ? 6% perf-profile.self.cycles-pp.rcu_all_qs
0.15 ? 3% -0.0 0.11 ? 3% perf-profile.self.cycles-pp.security_inode_permission
0.19 ? 7% -0.0 0.16 ? 5% perf-profile.self.cycles-pp.queued_write_lock_slowpath
0.14 ? 2% -0.0 0.10 ? 4% perf-profile.self.cycles-pp.try_to_unlazy
0.13 ? 3% -0.0 0.09 ? 7% perf-profile.self.cycles-pp.nd_jump_root
0.11 ? 6% -0.0 0.08 perf-profile.self.cycles-pp.__lookup_mnt
0.10 ? 4% -0.0 0.07 perf-profile.self.cycles-pp.check_stack_object
0.17 ? 4% -0.0 0.14 ? 5% perf-profile.self.cycles-pp.mntput_no_expire
0.09 -0.0 0.06 ? 9% perf-profile.self.cycles-pp.__x64_sys_newfstat
0.06 ? 7% -0.0 0.03 ? 70% perf-profile.self.cycles-pp.___slab_alloc
0.16 ? 3% -0.0 0.14 ? 3% perf-profile.self.cycles-pp.common_perm
0.11 ? 5% -0.0 0.08 ? 7% perf-profile.self.cycles-pp.pids_can_fork
0.32 -0.0 0.29 ? 3% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.11 ? 3% -0.0 0.08 ? 7% perf-profile.self.cycles-pp.vfs_getattr
0.12 ? 4% -0.0 0.09 ? 4% perf-profile.self.cycles-pp.__do_sys_newstat
0.10 ? 11% -0.0 0.07 ? 10% perf-profile.self.cycles-pp.vunmap_range_noflush
0.20 ? 3% -0.0 0.17 ? 2% perf-profile.self.cycles-pp.__fget_light
0.11 ? 4% -0.0 0.08 perf-profile.self.cycles-pp.enqueue_entity
0.10 ? 5% -0.0 0.07 ? 6% perf-profile.self.cycles-pp.zap_pte_range
0.14 ? 3% -0.0 0.11 perf-profile.self.cycles-pp.__x64_sys_newstat
0.23 ? 3% -0.0 0.21 ? 3% perf-profile.self.cycles-pp._find_next_bit
0.10 ? 6% -0.0 0.08 ? 6% perf-profile.self.cycles-pp.security_inode_getattr
0.10 ? 6% -0.0 0.08 ? 6% perf-profile.self.cycles-pp.propagate_protected_usage
0.12 ? 6% -0.0 0.09 ? 7% perf-profile.self.cycles-pp.complete_walk
0.10 ? 6% -0.0 0.07 ? 6% perf-profile.self.cycles-pp.restore_nameidata
0.31 ? 3% -0.0 0.29 ? 2% perf-profile.self.cycles-pp.__schedule
0.15 ? 5% -0.0 0.12 ? 3% perf-profile.self.cycles-pp.__x64_sys_newlstat
0.09 ? 7% -0.0 0.06 ? 7% perf-profile.self.cycles-pp.legitimize_links
0.11 ? 6% -0.0 0.09 ? 6% perf-profile.self.cycles-pp._mix_pool_bytes
0.13 -0.0 0.11 ? 3% perf-profile.self.cycles-pp.dup_task_struct
0.11 ? 4% -0.0 0.09 ? 4% perf-profile.self.cycles-pp.native_sched_clock
0.14 ? 3% -0.0 0.12 ? 4% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.10 ? 5% -0.0 0.08 ? 6% perf-profile.self.cycles-pp.update_curr
0.08 ? 5% -0.0 0.07 ? 11% perf-profile.self.cycles-pp.cpumask_next
0.08 ? 5% -0.0 0.07 ? 7% perf-profile.self.cycles-pp.__list_add_valid
0.08 ? 4% -0.0 0.06 perf-profile.self.cycles-pp.rb_erase
0.06 -0.0 0.04 ? 44% perf-profile.self.cycles-pp.dequeue_task_fair
0.05 ? 8% +0.0 0.07 ? 7% perf-profile.self.cycles-pp.finish_task_switch
0.11 ? 4% +0.0 0.12 ? 4% perf-profile.self.cycles-pp.__switch_to
0.07 ? 7% +0.0 0.08 ? 5% perf-profile.self.cycles-pp.try_grab_compound_head
0.10 ? 8% +0.0 0.12 ? 8% perf-profile.self.cycles-pp.pthread_create@@GLIBC_2.2.5
0.04 ? 44% +0.0 0.07 ? 5% perf-profile.self.cycles-pp.__GI___pthread_timedjoin_ex
0.22 +0.0 0.25 ? 2% perf-profile.self.cycles-pp.__list_del_entry_valid
0.10 ? 5% +0.0 0.14 ? 3% perf-profile.self.cycles-pp.start_thread
0.03 ? 70% +0.0 0.07 ? 9% perf-profile.self.cycles-pp.llist_add_batch
0.14 ? 3% +0.0 0.18 ? 5% perf-profile.self.cycles-pp.__slab_free
0.11 ? 6% +0.1 0.16 ? 4% perf-profile.self.cycles-pp.update_rq_clock
0.00 +0.1 0.06 ? 6% perf-profile.self.cycles-pp.memset_erms
0.00 +0.1 0.07 ? 5% perf-profile.self.cycles-pp.get_pid_task
0.06 ? 7% +0.1 0.17 ? 6% perf-profile.self.cycles-pp.select_task_rq_fair
0.54 ? 5% +0.1 0.68 perf-profile.self.cycles-pp.lockref_put_return
4.26 +1.1 5.33 perf-profile.self.cycles-pp.common_perm_cond
15.45 +4.9 20.37 perf-profile.self.cycles-pp.lockref_put_or_lock
20.12 +6.7 26.82 perf-profile.self.cycles-pp.lockref_get_not_dead




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (56.72 kB)
config-5.17.0-rc1-00030-g0fb3978b0aac (164.45 kB)
job-script (8.40 kB)
job.yaml (5.72 kB)
reproduce (570.00 B)

2022-03-03 08:49:33

by Huang, Ying

Subject: Re: [sched/numa] 0fb3978b0a: stress-ng.fstat.ops_per_sec -18.9% regression

Hi, Oliver,

Thanks for the report.

I still cannot connect the regression with the patch. To double check,
I have run the test again with the "sched_verbose" kernel command line,
and verified that the sched_domain isn't changed at all by the patch.

kernel test robot <[email protected]> writes:
> 0.11 6% +0.1 0.16 4% perf-profile.self.cycles-pp.update_rq_clock
> 0.00 +0.1 0.06 6% perf-profile.self.cycles-pp.memset_erms
> 0.00 +0.1 0.07 5% perf-profile.self.cycles-pp.get_pid_task
> 0.06 7% +0.1 0.17 6% perf-profile.self.cycles-pp.select_task_rq_fair
> 0.54 5% +0.1 0.68 perf-profile.self.cycles-pp.lockref_put_return
> 4.26 +1.1 5.33 perf-profile.self.cycles-pp.common_perm_cond
> 15.45 +4.9 20.37 perf-profile.self.cycles-pp.lockref_put_or_lock
> 20.12 +6.7 26.82 perf-profile.self.cycles-pp.lockref_get_not_dead

From the perf-profile above, the most visible change is more cycles in
lockref_get_not_dead(), which will loop with cmpxchg on
dentry->d_lockref. So this appears to be related to the memory layout.
I will try to debug that.
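
As a rough, self-contained illustration of that retry pattern (hypothetical
names only, this is not the kernel's lib/lockref.c), each failed
compare-and-swap re-reads the shared cache line, so cross-node contention on
a hot dentry->d_lockref shows up directly as extra cycles and
node-store-misses:

/*
 * Illustrative sketch only (made-up names, not lib/lockref.c): pack a
 * "lock" and a reference count into one word and retry a compare-and-swap
 * until it wins.  Every failed attempt re-reads the cache line, so heavy
 * cross-node use of one dentry->d_lockref turns into repeated remote
 * cache-line transfers.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct fake_lockref {
	_Atomic uint64_t lock_count;	/* low 32 bits: lock, high 32 bits: count */
};

static bool fake_lockref_get_not_dead(struct fake_lockref *ref)
{
	uint64_t old = atomic_load_explicit(&ref->lock_count,
					    memory_order_relaxed);

	for (;;) {
		int32_t count = (int32_t)(old >> 32);

		if (count < 0)			/* marked dead */
			return false;
		if ((uint32_t)old != 0)		/* "lock" held: give up, use slow path */
			return false;

		uint64_t new = (uint64_t)(uint32_t)(count + 1) << 32;

		/* on failure, 'old' is refreshed and we spin on the same line */
		if (atomic_compare_exchange_weak_explicit(&ref->lock_count,
							  &old, new,
							  memory_order_acq_rel,
							  memory_order_relaxed))
			return true;
	}
}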

Because stress-ng is a weird "benchmark" (although it's a very good
functionality test), and I cannot connect the patch with the test case
and the performance metrics collected, I think this regression should
be a low-priority one which shouldn't prevent the merging, etc. But I
will continue to investigate the regression to try to root-cause it.

Best Regards,
Huang, Ying

2022-03-09 09:43:52

by Huang, Ying

Subject: Re: [LKP] Re: [sched/numa] 0fb3978b0a: stress-ng.fstat.ops_per_sec -18.9% regression

Hi, All,

"Huang, Ying" <[email protected]> writes:

> Hi, Oliver,
>
> Thanks for the report.
>
> I still cannot connect the regression with the patch. To double check,
> I have run the test again with the "sched_verbose" kernel command line,
> and verified that the sched_domain isn't changed at all by the patch.
>
> kernel test robot <[email protected]> writes:
>> 0.11 6% +0.1 0.16 4% perf-profile.self.cycles-pp.update_rq_clock
>> 0.00 +0.1 0.06 6% perf-profile.self.cycles-pp.memset_erms
>> 0.00 +0.1 0.07 5% perf-profile.self.cycles-pp.get_pid_task
>> 0.06 7% +0.1 0.17 6% perf-profile.self.cycles-pp.select_task_rq_fair
>> 0.54 5% +0.1 0.68 perf-profile.self.cycles-pp.lockref_put_return
>> 4.26 +1.1 5.33 perf-profile.self.cycles-pp.common_perm_cond
>> 15.45 +4.9 20.37 perf-profile.self.cycles-pp.lockref_put_or_lock
>> 20.12 +6.7 26.82 perf-profile.self.cycles-pp.lockref_get_not_dead
>
> From the perf-profile above, the most visible change is more cycles in
> lockref_get_not_dead(), which will loop with cmpxchg on
> dentry->d_lockref. So this appears to be related to the memory layout.
> I will try to debug that.
>
> Because stress-ng is a weird "benchmark" (although it's a very good
> functionality test), and I cannot connect the patch with the test case
> and the performance metrics collected, I think this regression should
> be a low-priority one which shouldn't prevent the merging, etc. But I
> will continue to investigate the regression to try to root-cause it.

I have done more investigation on this. It turns out that the
sched_domain has been changed by commit 0fb3978b0a, although the change
isn't shown in the default sched_verbose output: sd->imb_numa_nr of the
"NUMA" level has changed from 24 to 12 after the commit. So the
following debug patch restores the performance.

From 6c79c858248196ea71da4be02a0774bb6e277581 Mon Sep 17 00:00:00 2001
From: Huang Ying <[email protected]>
Date: Wed, 9 Mar 2022 10:45:18 +0800
Subject: [PATCH] dbg: restore sd->imb_numa_nr

---
kernel/sched/topology.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 5e05fb7a817b..c348747e4166 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2351,6 +2351,10 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *attr)
 				sd->name, imb, factor, sd->imb_numa_nr);
 #endif
 			}
+#ifdef CONFIG_SCHED_DEBUG
+			if (!strcmp(sd->name, "NUMA"))
+				sd->imb_numa_nr = 24;
+#endif
 		}
 	}
 

--
2.30.2

I haven't researched the influence of sd->imb_numa_nr much yet. But it
seems that commit 0fb3978b0a just makes this machine follow the behavior
of a normal 2-socket (2-LLC) machine, which is intended. So the
regression report isn't important for the commit itself.

As for the impact of sd->imb_numa_nr, we will continue to work on that.
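
To make the 24 -> 12 change concrete, here is a tiny hypothetical sketch
(not the actual build_sched_domains() code; the numbers are only assumptions
for this 96-CPU, 2-socket box): if the allowed imbalance is roughly a
quarter of the CPUs a domain spans, computing it against one 48-CPU node
instead of the full 96-CPU span halves it from 24 to 12.

/*
 * Hypothetical illustration only -- NOT kernel/sched/topology.c.
 * It just shows how "about 25% of the spanned CPUs" yields 24 when
 * measured against the whole 96-CPU NUMA span but 12 when measured
 * against a single 48-CPU node.
 */
#include <stdio.h>

static unsigned int allowed_numa_imbalance(unsigned int span_weight)
{
	return span_weight >> 2;	/* ~25% of the CPUs in the span */
}

int main(void)
{
	unsigned int cpus_per_node = 48, nr_nodes = 2;

	printf("whole span: imb_numa_nr = %u\n",
	       allowed_numa_imbalance(cpus_per_node * nr_nodes));	/* 24 */
	printf("one node:   imb_numa_nr = %u\n",
	       allowed_numa_imbalance(cpus_per_node));			/* 12 */
	return 0;
}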

Best Regards,
Huang, Ying

2022-03-09 11:46:19

by Mel Gorman

Subject: Re: [LKP] Re: [sched/numa] 0fb3978b0a: stress-ng.fstat.ops_per_sec -18.9% regression

On Wed, Mar 09, 2022 at 05:28:55PM +0800, Huang, Ying wrote:
> Hi, All,
>
> "Huang, Ying" <[email protected]> writes:
>
> > Hi, Oliver,
> >
> > Thanks for the report.
> >
> > I still cannot connect the regression with the patch. To double check,
> > I have run the test again with the "sched_verbose" kernel command line,
> > and verified that the sched_domain isn't changed at all by the patch.
> >
> > kernel test robot <[email protected]> writes:
> >> 0.11 6% +0.1 0.16 4% perf-profile.self.cycles-pp.update_rq_clock
> >> 0.00 +0.1 0.06 6% perf-profile.self.cycles-pp.memset_erms
> >> 0.00 +0.1 0.07 5% perf-profile.self.cycles-pp.get_pid_task
> >> 0.06 7% +0.1 0.17 6% perf-profile.self.cycles-pp.select_task_rq_fair
> >> 0.54 5% +0.1 0.68 perf-profile.self.cycles-pp.lockref_put_return
> >> 4.26 +1.1 5.33 perf-profile.self.cycles-pp.common_perm_cond
> >> 15.45 +4.9 20.37 perf-profile.self.cycles-pp.lockref_put_or_lock
> >> 20.12 +6.7 26.82 perf-profile.self.cycles-pp.lockref_get_not_dead
> >
> > From the perf-profile above, the most visible change is more cycles in
> > lockref_get_not_dead(), which will loop with cmpxchg on
> > dentry->d_lockref. So this appears to be related to the memory layout.
> > I will try to debug that.
> >
> > Because stress-ng is a weird "benchmark" (although it's a very good
> > functionality test), and I cannot connect the patch with the test case
> > and the performance metrics collected, I think this regression should
> > be a low-priority one which shouldn't prevent the merging, etc. But I
> > will continue to investigate the regression to try to root-cause it.
>
> I have done more investigation on this. It turns out that the
> sched_domain has been changed by commit 0fb3978b0a, although the change
> isn't shown in the default sched_verbose output: sd->imb_numa_nr of the
> "NUMA" level has changed from 24 to 12 after the commit. So the
> following debug patch restores the performance.
>

If Ice Lake has multiple last-level caches per socket (I didn't check),
then sd->imb_numa_nr would have changed. I didn't dig into what
stress-ng fstat is doing, as it's a stress test more than a performance
test, but given that the number of threads is 10% of the total, it's
possible that the workload is being split across nodes differently.
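
As a minimal sketch of how such a threshold is typically consumed (assumed
names and constants, not the actual kernel balancing code): with a smaller
imb_numa_nr the balancer starts correcting a node imbalance with fewer
tasks running on the busy node, so a workload using only ~10% of the CPUs
can end up spread across both sockets instead of staying on one.

/*
 * Illustrative only (hypothetical helper, not the kernel's
 * adjust_numa_imbalance()): below the imb_numa_nr threshold a small
 * imbalance between nodes is tolerated so communicating tasks stay
 * local; above it, the imbalance is balanced away as usual.
 */
long numa_imbalance_sketch(long imbalance, int dst_running, int imb_numa_nr)
{
	if (dst_running > imb_numa_nr)
		return imbalance;	/* busy enough: balance normally */

	if (imbalance <= 2)
		return 0;		/* tolerate a tiny imbalance */

	return imbalance;
}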

--
Mel Gorman
SUSE Labs