2023-12-05 14:58:14

by Oliver Sang

Subject: [peterz-queue:locking/futex] [futex] e1a4bd5d6d: will-it-scale.per_thread_ops -11.2% regression



Hello,

kernel test robot noticed a -11.2% regression of will-it-scale.per_thread_ops on:


commit: e1a4bd5d6d978ba147f823c669373e3596e0bbcc ("futex: Implement FUTEX2_NUMA")
https://git.kernel.org/cgit/linux/kernel/git/peterz/queue.git locking/futex

testcase: will-it-scale
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
parameters:

nr_task: 16
mode: thread
test: futex1
cpufreq_governor: performance




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231205/[email protected]

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/thread/16/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/futex1/will-it-scale

commit:
38d12f1c15 ("mm: Add vmalloc_huge_node()")
e1a4bd5d6d ("futex: Implement FUTEX2_NUMA")

38d12f1c15069458 e1a4bd5d6d978ba147f823c6693
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
1.29 -0.1 1.16 mpstat.cpu.all.usr%
16082 ± 47% +268.8% 59317 ± 46% numa-meminfo.node3.AnonHugePages
443502 ± 10% -21.7% 347354 ± 16% numa-numastat.node3.numa_hit
443856 ± 10% -21.7% 347355 ± 16% numa-vmstat.node3.numa_hit
1821 ± 30% -45.9% 985.13 ± 52% sched_debug.cfs_rq:/.load_avg.stddev
9224874 ± 5% +54.4% 14242474 ± 5% meminfo.DirectMap2M
163286 ± 5% +30.9% 213804 ± 5% meminfo.DirectMap4k
0.55 ± 7% -14.3% 0.47 turbostat.IPC
72.33 +1.8% 73.67 turbostat.PkgTmp
1.155e+08 -11.2% 1.026e+08 will-it-scale.16.threads
7220531 -11.2% 6414312 will-it-scale.per_thread_ops
1.155e+08 -11.2% 1.026e+08 will-it-scale.workload
2.035e+10 -8.9% 1.853e+10 perf-stat.i.branch-instructions
0.31 -0.0 0.30 perf-stat.i.branch-miss-rate%
62615280 -12.4% 54851709 perf-stat.i.branch-misses
0.54 +9.3% 0.59 perf-stat.i.cpi
0.00 ± 5% +0.0 0.00 ± 2% perf-stat.i.dTLB-load-miss-rate%
139076 ± 5% +104.7% 284748 ± 2% perf-stat.i.dTLB-load-misses
2.634e+10 -8.2% 2.418e+10 perf-stat.i.dTLB-loads
1.927e+10 -8.8% 1.756e+10 perf-stat.i.dTLB-stores
55538465 -10.4% 49774500 ± 4% perf-stat.i.iTLB-load-misses
2514504 -10.7% 2245869 perf-stat.i.iTLB-loads
1.25e+11 -8.1% 1.149e+11 perf-stat.i.instructions
1.85 -8.5% 1.69 perf-stat.i.ipc
294.40 -8.6% 268.98 perf-stat.i.metric.M/sec
0.31 -0.0 0.30 perf-stat.overall.branch-miss-rate%
0.54 +9.3% 0.59 perf-stat.overall.cpi
0.00 ± 5% +0.0 0.00 ± 2% perf-stat.overall.dTLB-load-miss-rate%
0.00 ± 6% +0.0 0.00 ± 5% perf-stat.overall.dTLB-store-miss-rate%
1.85 -8.5% 1.69 perf-stat.overall.ipc
325727 +3.2% 336234 perf-stat.overall.path-length
2.028e+10 -8.9% 1.847e+10 perf-stat.ps.branch-instructions
62436489 -12.4% 54701854 perf-stat.ps.branch-misses
138701 ± 5% +104.7% 283927 ± 2% perf-stat.ps.dTLB-load-misses
2.625e+10 -8.2% 2.409e+10 perf-stat.ps.dTLB-loads
1.92e+10 -8.8% 1.75e+10 perf-stat.ps.dTLB-stores
55348676 -10.4% 49598644 ± 4% perf-stat.ps.iTLB-load-misses
2506036 -10.7% 2238080 perf-stat.ps.iTLB-loads
1.246e+11 -8.1% 1.145e+11 perf-stat.ps.instructions
3.763e+13 -8.3% 3.451e+13 perf-stat.total.instructions
14.56 ± 2% -1.5 13.06 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.syscall
27.62 ± 2% -1.4 26.24 perf-profile.calltrace.cycles-pp.get_user_pages_fast.get_futex_key.futex_wake.do_futex.__x64_sys_futex
25.52 ± 2% -1.0 24.52 perf-profile.calltrace.cycles-pp.internal_get_user_pages_fast.get_user_pages_fast.get_futex_key.futex_wake.do_futex
11.08 ± 2% -0.6 10.48 ± 2% perf-profile.calltrace.cycles-pp.gup_pte_range.gup_pgd_range.lockless_pages_from_mm.internal_get_user_pages_fast.get_user_pages_fast
3.74 ± 2% -0.5 3.26 ± 3% perf-profile.calltrace.cycles-pp.try_grab_folio.gup_pte_range.gup_pgd_range.lockless_pages_from_mm.internal_get_user_pages_fast
1.04 ± 3% -0.3 0.77 ± 3% perf-profile.calltrace.cycles-pp.is_valid_gup_args.get_user_pages_fast.get_futex_key.futex_wake.do_futex
2.05 ± 4% -0.2 1.90 ± 2% perf-profile.calltrace.cycles-pp.testcase
1.64 ± 3% -0.1 1.51 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
1.33 ± 3% -0.1 1.21 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.syscall
0.98 ± 3% -0.1 0.87 ± 2% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.syscall
1.02 ± 3% -0.1 0.91 ± 2% perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
0.69 ± 2% -0.1 0.63 ± 3% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
3.88 ± 5% +0.6 4.44 ± 5% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state
4.71 ± 6% +0.6 5.31 ± 5% perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
47.98 ± 2% +2.7 50.66 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
43.62 ± 2% +3.1 46.74 perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
2.64 ± 3% +3.2 5.87 perf-profile.calltrace.cycles-pp.futex_hash.futex_wake.do_futex.__x64_sys_futex.do_syscall_64
41.53 ± 2% +3.3 44.86 perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
40.19 ± 2% +3.5 43.64 perf-profile.calltrace.cycles-pp.futex_wake.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
27.78 ± 2% -1.4 26.37 perf-profile.children.cycles-pp.get_user_pages_fast
25.77 ± 2% -1.0 24.73 perf-profile.children.cycles-pp.internal_get_user_pages_fast
9.17 ± 2% -0.9 8.28 perf-profile.children.cycles-pp.entry_SYSCALL_64
11.42 ± 2% -0.7 10.77 ± 2% perf-profile.children.cycles-pp.gup_pte_range
5.61 ± 3% -0.6 5.06 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
4.30 ± 2% -0.5 3.85 perf-profile.children.cycles-pp.try_grab_folio
1.11 ± 3% -0.3 0.80 ± 3% perf-profile.children.cycles-pp.is_valid_gup_args
2.09 ± 4% -0.2 1.91 perf-profile.children.cycles-pp.testcase
2.05 ± 3% -0.2 1.88 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
1.42 ± 3% -0.1 1.29 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
1.12 ± 3% -0.1 1.00 perf-profile.children.cycles-pp.syscall_return_via_sysret
1.02 ± 3% -0.1 0.91 ± 2% perf-profile.children.cycles-pp.syscall_enter_from_user_mode
0.69 ± 2% -0.1 0.63 ± 3% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.18 ± 9% -0.0 0.13 ± 7% perf-profile.children.cycles-pp.syscall@plt
0.39 ± 5% -0.0 0.35 ± 3% perf-profile.children.cycles-pp.folio_fast_pin_allowed
0.08 ± 12% +0.0 0.12 ± 12% perf-profile.children.cycles-pp.rcu_sched_clock_irq
0.06 ± 17% +0.0 0.10 ± 14% perf-profile.children.cycles-pp.rcu_pending
0.00 +0.1 0.06 ± 9% perf-profile.children.cycles-pp.update_rq_clock
0.00 +0.1 0.06 ± 17% perf-profile.children.cycles-pp.check_cpu_stall
0.04 ± 45% +0.1 0.12 ± 6% perf-profile.children.cycles-pp.sched_clock_cpu
0.00 +0.1 0.08 ± 11% perf-profile.children.cycles-pp.hrtimer_forward
1.02 ± 7% +0.3 1.29 ± 6% perf-profile.children.cycles-pp.ktime_get
48.20 ± 2% +2.7 50.86 perf-profile.children.cycles-pp.do_syscall_64
43.65 ± 2% +3.1 46.74 perf-profile.children.cycles-pp.__x64_sys_futex
2.65 ± 3% +3.2 5.88 perf-profile.children.cycles-pp.futex_hash
41.68 ± 2% +3.3 44.99 perf-profile.children.cycles-pp.do_futex
40.38 ± 2% +3.4 43.81 perf-profile.children.cycles-pp.futex_wake
7.80 ± 3% -0.8 6.98 perf-profile.self.cycles-pp.syscall
5.48 ± 3% -0.5 4.94 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
4.24 ± 2% -0.5 3.71 perf-profile.self.cycles-pp.futex_wake
4.28 ± 2% -0.5 3.80 perf-profile.self.cycles-pp.try_grab_folio
2.60 ± 2% -0.3 2.28 ± 3% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
1.01 ± 2% -0.3 0.70 ± 2% perf-profile.self.cycles-pp.is_valid_gup_args
3.79 ± 2% -0.3 3.50 perf-profile.self.cycles-pp.entry_SYSCALL_64
1.96 ± 3% -0.2 1.74 ± 3% perf-profile.self.cycles-pp.internal_get_user_pages_fast
1.83 ± 4% -0.2 1.63 perf-profile.self.cycles-pp.__x64_sys_futex
1.79 ± 4% -0.2 1.64 perf-profile.self.cycles-pp.testcase
1.44 ± 2% -0.1 1.29 ± 2% perf-profile.self.cycles-pp.do_futex
1.40 ± 3% -0.1 1.25 perf-profile.self.cycles-pp.do_syscall_64
1.42 ± 3% -0.1 1.29 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
1.12 ± 3% -0.1 1.00 perf-profile.self.cycles-pp.syscall_return_via_sysret
0.96 ± 3% -0.1 0.86 ± 2% perf-profile.self.cycles-pp.syscall_enter_from_user_mode
0.96 ± 3% -0.1 0.87 ± 2% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.27 ± 6% -0.0 0.24 ± 3% perf-profile.self.cycles-pp.folio_fast_pin_allowed
0.00 +0.1 0.06 ± 17% perf-profile.self.cycles-pp.check_cpu_stall
0.00 +0.1 0.08 ± 12% perf-profile.self.cycles-pp.sched_clock_cpu
0.00 +0.1 0.08 ± 11% perf-profile.self.cycles-pp.hrtimer_forward
0.97 ± 8% +0.3 1.24 ± 6% perf-profile.self.cycles-pp.ktime_get
5.72 ± 3% +2.1 7.83 perf-profile.self.cycles-pp.get_futex_key
2.51 ± 3% +3.2 5.73 perf-profile.self.cycles-pp.futex_hash
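
For readers checking the table by hand, here is a minimal sketch (not part of
the robot's tooling) of how the middle column appears to be derived. Rows such
as will-it-scale.per_thread_ops carry a relative change in percent, while the
perf-profile rows and the "rate%" rows carry an absolute difference in
percentage points. The helper names pct_change and point_delta are made up for
this illustration; the input numbers are copied from the table above.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change in percent (rows whose middle column ends in '%')."""
    return (new - base) / base * 100.0


def point_delta(base: float, new: float) -> float:
    """Absolute difference in percentage points (perf-profile rows)."""
    return new - base


# Throughput and dTLB rows: relative change.
per_thread_ops = pct_change(7220531, 6414312)
dtlb_load_misses = pct_change(139076, 284748)

# perf-profile.self rows: percentage-point difference.
futex_hash_self = point_delta(2.51, 5.73)
get_futex_key_self = point_delta(5.72, 7.83)

# CPI is the reciprocal of IPC, so the two rows move in opposite directions.
cpi_base = 1 / 1.85
cpi_new = 1 / 1.69

print(f"per_thread_ops     {per_thread_ops:+.1f}%")
print(f"dTLB-load-misses   {dtlb_load_misses:+.1f}%")
print(f"futex_hash (self)  {futex_hash_self:+.1f} points")
print(f"get_futex_key self {get_futex_key_self:+.1f} points")
print(f"cpi                {cpi_base:.2f} -> {cpi_new:.2f}")
```

Recomputing from the rounded values printed in the report can differ from the
reported change in the last digit (for example, the IPC row), presumably
because the robot works from unrounded samples.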




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki