2024-03-04 06:00:22

by Yujie Liu

Subject: [linus:master] [x86/bugs] 6613d82e61: stress-ng.mutex.ops_per_sec -7.9% regression

Hello,

kernel test robot noticed a -7.9% regression of stress-ng.mutex.ops_per_sec on:

commit: 6613d82e617dd7eb8b0c40b2fe3acea655b1d611 ("x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

nr_threads: 100%
testtime: 60s
test: mutex
cpufreq_governor: performance
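
The parameters above correspond roughly to a stress-ng command line. The Python sketch below builds such a command. This report does not show how the lkp harness actually invokes stress-ng, so the helper name build_stress_ng_argv, the mapping of nr_threads=100% to one worker per online CPU, and the choice of --metrics-brief are assumptions made for illustration. The cpufreq governor is configured outside stress-ng and is not covered here.

```python
import os


def build_stress_ng_argv(test, nr_threads_pct, testtime, ncpus=None):
    """Build a stress-ng argv list for one stressor (hypothetical helper).

    Assumption: nr_threads is a percentage of online CPUs, so 100% on a
    64-thread machine means 64 workers.
    """
    if ncpus is None:
        ncpus = os.cpu_count() or 1
    workers = max(1, ncpus * nr_threads_pct // 100)
    return [
        "stress-ng",
        f"--{test}", str(workers),   # e.g. --mutex 64
        "--timeout", testtime,       # e.g. 60s
        "--metrics-brief",           # report bogo-ops metrics at the end
    ]


# Parameters from this report: test=mutex, nr_threads=100%, testtime=60s,
# on the 64-thread test machine.
argv = build_stress_ng_argv("mutex", 100, "60s", ncpus=64)
print(" ".join(argv))
```

The list is returned rather than executed so that it can be inspected or passed to subprocess.run() on a machine where stress-ng is installed.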


In addition to that, the commit also has impact on the following tests:

+------------------+-------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.ptrace.ops_per_sec -3.9% regression                                  |
| test machine     | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory |
| test parameters  | cpufreq_governor=performance                                                              |
|                  | nr_threads=100%                                                                           |
|                  | test=ptrace                                                                               |
|                  | testtime=60s                                                                              |
+------------------+-------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.getdent.ops_per_sec 5.8% improvement                                 |
| test machine     | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory |
| test parameters  | cpufreq_governor=performance                                                              |
|                  | disk=1HDD                                                                                 |
|                  | fs=btrfs                                                                                  |
|                  | nr_threads=100%                                                                           |
|                  | test=getdent                                                                              |
|                  | testtime=60s                                                                              |
+------------------+-------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_thread_ops 4.0% improvement                              |
| test machine     | 104 threads 2 sockets (Skylake) with 192G memory                                          |
| test parameters  | cpufreq_governor=performance                                                              |
|                  | mode=thread                                                                               |
|                  | nr_task=100%                                                                              |
|                  | test=futex4                                                                               |
+------------------+-------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops -2.1% regression                             |
| test machine     | 104 threads 2 sockets (Skylake) with 192G memory                                          |
| test parameters  | cpufreq_governor=performance                                                              |
|                  | mode=process                                                                              |
|                  | nr_task=100%                                                                              |
|                  | test=futex2                                                                               |
+------------------+-------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops 3.7% improvement                             |
| test machine     | 104 threads 2 sockets (Skylake) with 192G memory                                          |
| test parameters  | cpufreq_governor=performance                                                              |
|                  | mode=process                                                                              |
|                  | nr_task=100%                                                                              |
|                  | test=futex3                                                                               |
+------------------+-------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240304/[email protected]

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/mutex/stress-ng/60s

commit:
a0e2dab44d ("x86/entry_32: Add VERW just before userspace transition")
6613d82e61 ("x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key")

a0e2dab44d22b913 6613d82e617dd7eb8b0c40b2fe3
---------------- ---------------------------
%stddev %change %stddev
\ | \
11556 ± 15% -24.2% 8755 ± 10% numa-meminfo.node0.Active
11529 ± 15% -24.5% 8702 ± 10% numa-meminfo.node0.Active(anon)
417861 -8.0% 384591 vmstat.system.cs
287897 -5.2% 273070 vmstat.system.in
182670 +9.0% 199032 stress-ng.mutex.nanosecs_per_mutex
18139421 -7.9% 16702171 stress-ng.mutex.ops
302318 -7.9% 278364 stress-ng.mutex.ops_per_sec
12040142 -7.3% 11161921 stress-ng.time.involuntary_context_switches
9424624 -7.6% 8707796 stress-ng.time.voluntary_context_switches
1.36 -5.7% 1.28 perf-stat.i.MPKI
0.31 -0.0 0.30 perf-stat.i.branch-miss-rate%
11445088 -4.4% 10944702 perf-stat.i.branch-misses
21081580 -6.7% 19679133 perf-stat.i.cache-misses
57754062 -6.7% 53909365 perf-stat.i.cache-references
429726 -7.6% 397018 perf-stat.i.context-switches
120047 -7.3% 111272 perf-stat.i.cpu-migrations
9063 +7.3% 9727 perf-stat.i.cycles-between-cache-misses
8.62 -7.5% 7.97 perf-stat.i.metric.K/sec
1.35 -5.9% 1.27 perf-stat.overall.MPKI
0.31 -0.0 0.30 perf-stat.overall.branch-miss-rate%
8893 +7.0% 9514 perf-stat.overall.cycles-between-cache-misses
11240262 -4.4% 10751121 perf-stat.ps.branch-misses
20680093 -6.7% 19302166 perf-stat.ps.cache-misses
56715466 -6.7% 52937829 perf-stat.ps.cache-references
422630 -7.6% 390583 perf-stat.ps.context-switches
118070 -7.3% 109477 perf-stat.ps.cpu-migrations
10.01 -0.5 9.54 perf-profile.calltrace.cycles-pp.find_lock_lowest_rq.push_rt_task.push_rt_tasks.finish_task_switch.__schedule
20.36 -0.3 20.04 perf-profile.calltrace.cycles-pp.push_rt_task.push_rt_tasks.finish_task_switch.__schedule.schedule
21.10 -0.3 20.84 perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
21.08 -0.3 20.83 perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
21.86 -0.3 21.60 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
21.85 -0.2 21.60 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
17.30 -0.2 17.07 perf-profile.calltrace.cycles-pp.__futex_wait.futex_wait.do_futex.__x64_sys_futex.do_syscall_64
17.32 -0.2 17.09 perf-profile.calltrace.cycles-pp.futex_wait.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
17.15 -0.2 16.93 perf-profile.calltrace.cycles-pp.futex_wait_queue.__futex_wait.futex_wait.do_futex.__x64_sys_futex
17.11 -0.2 16.88 perf-profile.calltrace.cycles-pp.schedule.futex_wait_queue.__futex_wait.futex_wait.do_futex
17.10 -0.2 16.88 perf-profile.calltrace.cycles-pp.__schedule.schedule.futex_wait_queue.__futex_wait.futex_wait
4.16 -0.2 3.98 perf-profile.calltrace.cycles-pp.__sched_yield
3.72 -0.2 3.54 perf-profile.calltrace.cycles-pp.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.73 -0.2 3.55 perf-profile.calltrace.cycles-pp.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
4.10 -0.2 3.92 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__sched_yield
4.09 -0.2 3.91 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
3.99 -0.2 3.81 perf-profile.calltrace.cycles-pp.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
15.61 -0.1 15.46 perf-profile.calltrace.cycles-pp.finish_task_switch.__schedule.schedule.futex_wait_queue.__futex_wait
2.64 -0.1 2.50 perf-profile.calltrace.cycles-pp.push_rt_tasks.finish_task_switch.__schedule.schedule.__x64_sys_sched_yield
14.86 -0.1 14.72 perf-profile.calltrace.cycles-pp.push_rt_tasks.finish_task_switch.__schedule.schedule.futex_wait_queue
2.95 -0.1 2.81 perf-profile.calltrace.cycles-pp.finish_task_switch.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64
0.92 ± 3% -0.1 0.82 ± 2% perf-profile.calltrace.cycles-pp.cpupri_set.enqueue_task_rt.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler
0.72 ± 4% -0.1 0.65 ± 2% perf-profile.calltrace.cycles-pp.cpupri_set.dequeue_rt_stack.dequeue_task_rt.__sched_setscheduler._sched_setscheduler
0.74 ± 4% -0.1 0.66 ± 2% perf-profile.calltrace.cycles-pp.dequeue_rt_stack.dequeue_task_rt.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler
1.07 -0.0 1.04 perf-profile.calltrace.cycles-pp.task_rq_lock.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler
3.76 -0.0 3.73 perf-profile.calltrace.cycles-pp.futex_wake.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.01 -0.0 0.98 perf-profile.calltrace.cycles-pp._raw_spin_lock.task_rq_lock.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler
0.59 +0.0 0.62 perf-profile.calltrace.cycles-pp.rt_mutex_adjust_pi.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler
0.68 +0.0 0.71 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__schedule.schedule_idle.do_idle
0.69 +0.0 0.72 perf-profile.calltrace.cycles-pp._raw_spin_lock.__schedule.schedule_idle.do_idle.cpu_startup_entry
5.78 +0.1 5.84 perf-profile.calltrace.cycles-pp.sysvec_call_function_single.asm_sysvec_call_function_single.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler
5.86 +0.1 5.92 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function_single.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler
5.78 +0.1 5.84 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.__sched_setscheduler
5.78 +0.1 5.84 perf-profile.calltrace.cycles-pp.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.__sched_setscheduler._sched_setscheduler
6.87 +0.1 6.94 perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single
2.26 +0.1 2.34 perf-profile.calltrace.cycles-pp._raw_spin_lock.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler
2.26 +0.1 2.33 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler
8.53 +0.1 8.61 perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
5.44 +0.1 5.52 perf-profile.calltrace.cycles-pp.activate_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.__sysvec_call_function_single
8.45 +0.1 8.58 perf-profile.calltrace.cycles-pp.enqueue_task_rt.activate_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue
8.26 +0.1 8.39 perf-profile.calltrace.cycles-pp._raw_spin_lock.enqueue_task_rt.activate_task.ttwu_do_activate.sched_ttwu_pending
10.00 +0.2 10.16 perf-profile.calltrace.cycles-pp.activate_task.push_rt_task.push_rt_tasks.finish_task_switch.__schedule
10.11 +0.2 10.27 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.enqueue_task_rt.activate_task.ttwu_do_activate
9.69 +0.2 9.86 perf-profile.calltrace.cycles-pp.enqueue_task_rt.activate_task.push_rt_task.push_rt_tasks.finish_task_switch
9.30 +0.2 9.48 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.enqueue_task_rt.activate_task.push_rt_task
9.38 +0.2 9.56 perf-profile.calltrace.cycles-pp._raw_spin_lock.enqueue_task_rt.activate_task.push_rt_task.push_rt_tasks
47.73 +0.4 48.08 perf-profile.calltrace.cycles-pp.enqueue_task_rt.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler
64.42 +0.4 64.78 perf-profile.calltrace.cycles-pp.__sched_setscheduler
64.32 +0.4 64.68 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__sched_setscheduler
64.31 +0.4 64.68 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_setscheduler
59.94 +0.4 60.35 perf-profile.calltrace.cycles-pp.__x64_sys_sched_setscheduler.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_setscheduler
59.94 +0.4 60.35 perf-profile.calltrace.cycles-pp.do_sched_setscheduler.__x64_sys_sched_setscheduler.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_setscheduler
59.64 +0.4 60.05 perf-profile.calltrace.cycles-pp.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler.do_syscall_64
59.82 +0.4 60.24 perf-profile.calltrace.cycles-pp._sched_setscheduler.do_sched_setscheduler.__x64_sys_sched_setscheduler.do_syscall_64.entry_SYSCALL_64_after_hwframe
46.02 +0.4 46.45 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.enqueue_task_rt.__sched_setscheduler._sched_setscheduler
46.37 +0.4 46.82 perf-profile.calltrace.cycles-pp._raw_spin_lock.enqueue_task_rt.__sched_setscheduler._sched_setscheduler.do_sched_setscheduler
11.13 -0.5 10.60 perf-profile.children.cycles-pp.find_lock_lowest_rq
25.80 -0.4 25.37 perf-profile.children.cycles-pp.schedule
27.01 -0.4 26.60 perf-profile.children.cycles-pp.__schedule
22.66 -0.4 22.28 perf-profile.children.cycles-pp.push_rt_task
22.81 -0.3 22.48 perf-profile.children.cycles-pp.finish_task_switch
21.45 -0.3 21.13 perf-profile.children.cycles-pp.push_rt_tasks
21.10 -0.3 20.84 perf-profile.children.cycles-pp.__x64_sys_futex
21.08 -0.3 20.83 perf-profile.children.cycles-pp.do_futex
17.30 -0.2 17.07 perf-profile.children.cycles-pp.__futex_wait
2.20 ± 3% -0.2 1.97 ± 2% perf-profile.children.cycles-pp.cpupri_set
17.32 -0.2 17.09 perf-profile.children.cycles-pp.futex_wait
17.15 -0.2 16.93 perf-profile.children.cycles-pp.futex_wait_queue
4.18 -0.2 3.99 perf-profile.children.cycles-pp.__sched_yield
3.99 -0.2 3.81 perf-profile.children.cycles-pp.__x64_sys_sched_yield
0.94 ± 4% -0.1 0.85 ± 2% perf-profile.children.cycles-pp.dequeue_rt_stack
0.50 ± 5% -0.1 0.43 ± 3% perf-profile.children.cycles-pp.find_lowest_rq
0.46 ± 5% -0.1 0.40 ± 4% perf-profile.children.cycles-pp.cpupri_find_fitness
0.82 -0.0 0.78 perf-profile.children.cycles-pp.task_woken_rt
0.32 ± 2% -0.0 0.28 ± 3% perf-profile.children.cycles-pp.pull_rt_task
0.31 ± 2% -0.0 0.28 ± 2% perf-profile.children.cycles-pp.pick_next_task_rt
0.58 -0.0 0.55 perf-profile.children.cycles-pp.enqueue_pushable_task
3.76 -0.0 3.73 perf-profile.children.cycles-pp.futex_wake
0.11 ± 4% -0.0 0.10 ± 3% perf-profile.children.cycles-pp.balance_rt
0.43 -0.0 0.41 perf-profile.children.cycles-pp.rto_push_irq_work_func
0.14 ± 2% -0.0 0.13 perf-profile.children.cycles-pp.select_task_rq
0.13 ± 2% -0.0 0.12 perf-profile.children.cycles-pp.select_task_rq_rt
0.07 -0.0 0.06 perf-profile.children.cycles-pp.update_rt_rq_load_avg
0.26 +0.0 0.27 perf-profile.children.cycles-pp.irq_exit_rcu
0.59 +0.0 0.62 perf-profile.children.cycles-pp.rt_mutex_adjust_pi
0.49 ± 2% +0.0 0.53 perf-profile.children.cycles-pp.scheduler_tick
1.14 +0.0 1.18 perf-profile.children.cycles-pp.update_curr_rt
0.58 +0.0 0.63 perf-profile.children.cycles-pp.tick_nohz_highres_handler
0.58 +0.0 0.62 perf-profile.children.cycles-pp.update_process_times
0.62 +0.0 0.68 perf-profile.children.cycles-pp.hrtimer_interrupt
0.60 +0.1 0.65 perf-profile.children.cycles-pp.__hrtimer_run_queues
0.58 +0.1 0.63 perf-profile.children.cycles-pp.tick_sched_handle
0.62 +0.1 0.68 perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.87 +0.1 0.93 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.94 +0.1 1.00 perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
8.53 +0.1 8.61 perf-profile.children.cycles-pp.cpu_startup_entry
8.53 +0.1 8.61 perf-profile.children.cycles-pp.secondary_startup_64_no_verify
8.53 +0.1 8.61 perf-profile.children.cycles-pp.do_idle
14.31 +0.1 14.44 perf-profile.children.cycles-pp.sched_ttwu_pending
13.62 +0.1 13.76 perf-profile.children.cycles-pp.ttwu_do_activate
23.84 +0.3 24.19 perf-profile.children.cycles-pp.activate_task
59.94 +0.4 60.36 perf-profile.children.cycles-pp.__x64_sys_sched_setscheduler
59.94 +0.4 60.35 perf-profile.children.cycles-pp.do_sched_setscheduler
59.82 +0.4 60.24 perf-profile.children.cycles-pp._sched_setscheduler
88.70 +0.5 89.18 perf-profile.children.cycles-pp._raw_spin_lock
87.93 +0.5 88.41 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
71.06 +0.7 71.77 perf-profile.children.cycles-pp.enqueue_task_rt
124.24 +0.8 125.02 perf-profile.children.cycles-pp.__sched_setscheduler
2.19 ± 3% -0.2 1.96 ± 2% perf-profile.self.cycles-pp.cpupri_set
0.31 ± 6% -0.0 0.27 ± 3% perf-profile.self.cycles-pp.cpupri_find_fitness
0.30 ± 3% -0.0 0.27 ± 4% perf-profile.self.cycles-pp.pull_rt_task
0.26 ± 3% -0.0 0.23 ± 3% perf-profile.self.cycles-pp.pick_next_task_rt
0.54 -0.0 0.52 perf-profile.self.cycles-pp.enqueue_pushable_task
0.15 -0.0 0.14 perf-profile.self.cycles-pp.switch_mm_irqs_off
0.65 +0.0 0.67 perf-profile.self.cycles-pp._raw_spin_lock_irqsave
1.00 +0.0 1.04 perf-profile.self.cycles-pp.update_curr_rt
87.92 +0.5 88.40 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
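
Each row of the comparison above has the form "base [± stddev%] change new [± stddev%] metric". For counters the change column is a relative percentage. For ratio-style metrics such as the perf-profile and rate% rows it is an absolute difference with no percent sign. The Python sketch below parses one row and recomputes the relative change, which is a quick way to check the headline figure. The regular expression and the names ROW, parse_row and pct_change are illustrative assumptions, not part of the lkp tooling. The pattern also accepts "?" where "±" is expected, since some archive renderings show the sign that way.

```python
import re

# A number such as 302318, 0.31 or 1.17e+08.
NUM = r"\d[\d.]*(?:e[+-]?\d+)?"

# base [± sd%] change new [± sd%] metric
ROW = re.compile(
    rf"^\s*(?P<base>{NUM})\s+"
    rf"(?:[±?]\s*(?P<base_sd>\d+)%\s+)?"
    rf"(?P<change>[+-][\d.]+%?)\s+"
    rf"(?P<new>{NUM})\s+"
    rf"(?:[±?]\s*(?P<new_sd>\d+)%\s+)?"
    rf"(?P<metric>\S+)\s*$"
)


def parse_row(line):
    """Return a dict for one comparison row, or None if it does not match."""
    m = ROW.match(line)
    if m is None:
        return None
    row = m.groupdict()
    row["base"] = float(row["base"])
    row["new"] = float(row["new"])
    return row


def pct_change(row):
    """Relative change from the base commit to the new commit, in percent."""
    return (row["new"] - row["base"]) / row["base"] * 100.0


row = parse_row("302318 -7.9% 278364 stress-ng.mutex.ops_per_sec")
print(row["metric"], f"{pct_change(row):+.1f}%")
# prints: stress-ng.mutex.ops_per_sec -7.9%
```

The stddev fields are kept as optional strings so that rows with and without a noise estimate go through the same pattern.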


***************************************************************************************************
lkp-icl-2sp7: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/ptrace/stress-ng/60s

commit:
a0e2dab44d ("x86/entry_32: Add VERW just before userspace transition")
6613d82e61 ("x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key")

a0e2dab44d22b913 6613d82e617dd7eb8b0c40b2fe3
---------------- ---------------------------
%stddev %change %stddev
\ | \
1476651 -3.9% 1418563 vmstat.system.cs
46602054 -3.9% 44765510 stress-ng.ptrace.ops
776688 -3.9% 746080 stress-ng.ptrace.ops_per_sec
93178718 -3.9% 89501356 stress-ng.time.voluntary_context_switches
41454 ± 26% -69.0% 12835 ± 93% proc-vmstat.numa_pages_migrated
363994 ± 3% -5.7% 343290 ± 3% proc-vmstat.pgfree
41454 ± 26% -69.0% 12835 ± 93% proc-vmstat.pgmigrate_success
36755 ± 34% -41.9% 21353 ± 30% proc-vmstat.pgreuse
0.70 +0.1 0.75 ± 2% perf-stat.i.branch-miss-rate%
44257013 ± 2% +7.7% 47672895 ± 3% perf-stat.i.branch-misses
1534064 -4.0% 1472825 perf-stat.i.context-switches
24.03 -4.0% 23.08 perf-stat.i.metric.K/sec
0.68 ± 2% +0.1 0.73 ± 2% perf-stat.overall.branch-miss-rate%
43429354 ± 2% +7.7% 46789221 ± 2% perf-stat.ps.branch-misses
1506894 -3.9% 1447769 perf-stat.ps.context-switches
45.76 -0.5 45.22 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.cgroup_enter_frozen.ptrace_stop.ptrace_do_notify
45.99 -0.5 45.46 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.cgroup_enter_frozen.ptrace_stop.ptrace_do_notify.ptrace_notify
23.04 -0.3 22.74 perf-profile.calltrace.cycles-pp.cgroup_enter_frozen.ptrace_stop.ptrace_do_notify.ptrace_notify.syscall_trace_enter
23.04 -0.3 22.78 perf-profile.calltrace.cycles-pp.cgroup_enter_frozen.ptrace_stop.ptrace_do_notify.ptrace_notify.syscall_exit_to_user_mode_prepare
7.91 -0.0 7.88 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.getgid
0.61 +0.0 0.64 perf-profile.calltrace.cycles-pp.__schedule.schedule.do_wait.kernel_wait4.__do_sys_wait4
0.63 +0.0 0.66 perf-profile.calltrace.cycles-pp.schedule.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
1.80 +0.1 1.85 perf-profile.calltrace.cycles-pp.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.89 +0.1 1.94 perf-profile.calltrace.cycles-pp.kernel_wait4.__do_sys_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe.wait4
1.92 +0.1 1.98 perf-profile.calltrace.cycles-pp.__do_sys_wait4.do_syscall_64.entry_SYSCALL_64_after_hwframe.wait4
2.12 +0.1 2.19 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.wait4
2.13 +0.1 2.20 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.wait4
2.34 +0.1 2.42 perf-profile.calltrace.cycles-pp.wait4
1.24 +0.1 1.34 ± 3% perf-profile.calltrace.cycles-pp.__x64_sys_ptrace.do_syscall_64.entry_SYSCALL_64_after_hwframe.ptrace
1.27 +0.1 1.36 ± 3% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ptrace
1.26 +0.1 1.36 ± 3% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ptrace
1.34 +0.1 1.44 ± 2% perf-profile.calltrace.cycles-pp.ptrace
22.52 +0.3 22.84 perf-profile.calltrace.cycles-pp.cgroup_leave_frozen.ptrace_stop.ptrace_do_notify.ptrace_notify.syscall_trace_enter
44.96 +0.4 45.33 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.cgroup_leave_frozen.ptrace_stop.ptrace_do_notify
45.31 +0.4 45.72 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.cgroup_leave_frozen.ptrace_stop.ptrace_do_notify.ptrace_notify
46.10 -0.5 45.55 perf-profile.children.cycles-pp.cgroup_enter_frozen
90.76 -0.2 90.57 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
0.25 +0.0 0.27 perf-profile.children.cycles-pp.switch_mm_irqs_off
0.16 ± 2% +0.0 0.17 ± 2% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.60 +0.0 0.63 ± 2% perf-profile.children.cycles-pp.ttwu_do_activate
1.80 +0.1 1.85 perf-profile.children.cycles-pp.do_wait
1.89 +0.1 1.95 perf-profile.children.cycles-pp.kernel_wait4
1.92 +0.1 1.98 perf-profile.children.cycles-pp.__do_sys_wait4
2.36 +0.1 2.44 perf-profile.children.cycles-pp.wait4
1.47 +0.1 1.56 perf-profile.children.cycles-pp.__schedule
1.50 +0.1 1.58 perf-profile.children.cycles-pp.schedule
1.24 +0.1 1.34 ± 3% perf-profile.children.cycles-pp.__x64_sys_ptrace
1.36 +0.1 1.46 ± 2% perf-profile.children.cycles-pp.ptrace
45.42 +0.4 45.82 perf-profile.children.cycles-pp.cgroup_leave_frozen
90.76 -0.2 90.57 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.38 -0.0 0.37 perf-profile.self.cycles-pp.ptrace_stop
0.93 +0.1 1.02 perf-profile.self.cycles-pp._raw_spin_lock_irq



***************************************************************************************************
lkp-icl-2sp8: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-12/performance/1HDD/btrfs/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/getdent/stress-ng/60s

commit:
a0e2dab44d ("x86/entry_32: Add VERW just before userspace transition")
6613d82e61 ("x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key")

a0e2dab44d22b913 6613d82e617dd7eb8b0c40b2fe3
---------------- ---------------------------
%stddev %change %stddev
\ | \
317900 ± 4% +51.9% 482822 ± 4% cpuidle..usage
3.05 +3.6% 3.15 iostat.cpu.user
2.49 ± 4% -0.1 2.38 ± 4% mpstat.cpu.all.idle%
15342 ± 3% +56.1% 23954 ± 3% vmstat.system.cs
178479 +2.4% 182761 vmstat.system.in
9197 ± 2% +47.8% 13598 ± 3% sched_debug.cpu.nr_switches.avg
23944 ± 6% +58.8% 38014 ± 6% sched_debug.cpu.nr_switches.max
5660 ± 12% +72.0% 9736 ± 4% sched_debug.cpu.nr_switches.stddev
58917432 ± 4% -40.0% 35337032 numa-numastat.node0.local_node
58973202 ± 4% -40.0% 35382357 numa-numastat.node0.numa_hit
37066235 +76.6% 65448120 numa-numastat.node1.local_node
37099580 +76.5% 65478720 numa-numastat.node1.numa_hit
269394 ± 4% -71.3% 77312 ± 28% numa-meminfo.node0.KReclaimable
269394 ± 4% -71.3% 77312 ± 28% numa-meminfo.node0.SReclaimable
387028 ± 2% -51.5% 187589 ± 12% numa-meminfo.node0.Slab
93129 ± 12% +194.7% 274479 ± 8% numa-meminfo.node1.KReclaimable
93129 ± 12% +194.7% 274479 ± 8% numa-meminfo.node1.SReclaimable
155845 ± 5% +120.5% 343568 ± 7% numa-meminfo.node1.Slab
67916 ± 3% -71.2% 19547 ± 28% numa-vmstat.node0.nr_slab_reclaimable
59072793 ± 4% -40.0% 35463515 numa-vmstat.node0.numa_hit
59017023 ± 4% -40.0% 35418189 numa-vmstat.node0.numa_local
23698 ± 13% +192.1% 69229 ± 9% numa-vmstat.node1.nr_slab_reclaimable
37209604 +76.6% 65720661 numa-vmstat.node1.numa_hit
37176256 +76.7% 65690060 numa-vmstat.node1.numa_local
9705 -9.2% 8816 stress-ng.getdent.nanosecs_per_getdents_call
1.17e+08 +5.8% 1.238e+08 stress-ng.getdent.ops
1949907 +5.8% 2063349 stress-ng.getdent.ops_per_sec
97203 ± 6% +12.9% 109764 stress-ng.time.involuntary_context_switches
85913623 +5.8% 90920658 stress-ng.time.minor_page_faults
82.78 ± 2% +6.7% 88.32 stress-ng.time.user_time
372113 ± 7% +74.4% 649143 ± 3% stress-ng.time.voluntary_context_switches
90376 -1.7% 88797 proc-vmstat.nr_slab_reclaimable
19745 ± 31% -26.3% 14551 ± 2% proc-vmstat.numa_hint_faults
11950 ± 41% -36.7% 7560 ± 7% proc-vmstat.numa_hint_faults_local
96087443 ± 3% +5.2% 1.011e+08 proc-vmstat.numa_hit
95998301 ± 3% +5.2% 1.01e+08 proc-vmstat.numa_local
1.012e+08 ± 3% +4.7% 1.059e+08 proc-vmstat.pgalloc_normal
86033810 +5.9% 91111926 proc-vmstat.pgfault
1.009e+08 ± 3% +4.7% 1.057e+08 proc-vmstat.pgfree
14992 ± 6% -8.3% 13744 proc-vmstat.pgreuse
3.29 -4.1% 3.15 perf-stat.i.MPKI
1.031e+10 +5.0% 1.082e+10 perf-stat.i.branch-instructions
77903770 +5.3% 82008784 perf-stat.i.branch-misses
45.24 -2.3 42.98 perf-stat.i.cache-miss-rate%
3.596e+08 ± 2% +6.5% 3.83e+08 perf-stat.i.cache-references
15896 ± 3% +56.8% 24926 ± 3% perf-stat.i.context-switches
4.51 -5.2% 4.27 perf-stat.i.cpi
339.16 ± 8% +30.7% 443.20 ± 4% perf-stat.i.cpu-migrations
4.991e+10 +5.0% 5.243e+10 perf-stat.i.instructions
0.24 +5.0% 0.25 perf-stat.i.ipc
44.19 +5.9% 46.82 perf-stat.i.metric.K/sec
1411214 +5.9% 1494386 perf-stat.i.minor-faults
1411214 +5.9% 1494386 perf-stat.i.page-faults
3.30 -3.7% 3.17 perf-stat.overall.MPKI
45.68 -2.3 43.40 perf-stat.overall.cache-miss-rate%
4.49 -4.6% 4.28 perf-stat.overall.cpi
0.22 +4.8% 0.23 perf-stat.overall.ipc
1.014e+10 +4.9% 1.063e+10 perf-stat.ps.branch-instructions
76113957 +5.3% 80174083 perf-stat.ps.branch-misses
3.541e+08 ± 2% +6.3% 3.765e+08 perf-stat.ps.cache-references
15523 ± 3% +56.4% 24284 ± 3% perf-stat.ps.context-switches
331.55 ± 9% +30.6% 433.03 ± 4% perf-stat.ps.cpu-migrations
4.907e+10 +4.9% 5.149e+10 perf-stat.ps.instructions
1388739 +5.8% 1468698 perf-stat.ps.minor-faults
1388739 +5.8% 1468698 perf-stat.ps.page-faults
3.005e+12 +4.2% 3.133e+12 perf-stat.total.instructions
59.17 -2.9 56.25 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
59.24 -2.9 56.31 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall
59.68 -2.9 56.79 perf-profile.calltrace.cycles-pp.syscall
29.18 -1.5 27.70 perf-profile.calltrace.cycles-pp.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
28.64 -1.5 27.16 perf-profile.calltrace.cycles-pp.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
29.82 -1.5 28.37 perf-profile.calltrace.cycles-pp.__x64_sys_getdents.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
29.03 -1.4 27.58 perf-profile.calltrace.cycles-pp.iterate_dir.__x64_sys_getdents.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
9.19 ± 3% -1.1 8.13 ± 2% perf-profile.calltrace.cycles-pp.proc_readdir_de.proc_tgid_net_readdir.iterate_dir.__x64_sys_getdents64.do_syscall_64
9.24 ± 3% -1.1 8.19 ± 2% perf-profile.calltrace.cycles-pp.proc_tgid_net_readdir.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe
9.17 ± 3% -1.0 8.12 ± 2% perf-profile.calltrace.cycles-pp.proc_readdir_de.proc_tgid_net_readdir.iterate_dir.__x64_sys_getdents.do_syscall_64
9.25 ± 3% -1.0 8.21 ± 2% perf-profile.calltrace.cycles-pp.proc_tgid_net_readdir.iterate_dir.__x64_sys_getdents.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.51 ± 4% -0.6 4.89 ± 2% perf-profile.calltrace.cycles-pp.proc_readdir_de.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.49 ± 3% -0.6 4.88 ± 2% perf-profile.calltrace.cycles-pp.proc_readdir_de.iterate_dir.__x64_sys_getdents.do_syscall_64.entry_SYSCALL_64_after_hwframe
4.22 ± 4% -0.5 3.72 ± 2% perf-profile.calltrace.cycles-pp._raw_read_lock.proc_readdir_de.proc_tgid_net_readdir.iterate_dir.__x64_sys_getdents64
4.20 ± 4% -0.5 3.71 ± 2% perf-profile.calltrace.cycles-pp._raw_read_lock.proc_readdir_de.proc_tgid_net_readdir.iterate_dir.__x64_sys_getdents
2.78 ± 4% -0.3 2.47 ± 2% perf-profile.calltrace.cycles-pp._raw_read_lock.proc_readdir_de.iterate_dir.__x64_sys_getdents64.do_syscall_64
2.77 ± 3% -0.3 2.47 ± 2% perf-profile.calltrace.cycles-pp._raw_read_lock.proc_readdir_de.iterate_dir.__x64_sys_getdents.do_syscall_64
0.90 ± 4% -0.1 0.80 ± 2% perf-profile.calltrace.cycles-pp._raw_read_lock.proc_lookup_de.proc_tgid_net_lookup.lookup_open.open_last_lookups
0.56 ± 2% +0.0 0.58 perf-profile.calltrace.cycles-pp.alloc_pages_mpol.vma_alloc_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault
0.62 ± 2% +0.0 0.64 perf-profile.calltrace.cycles-pp.vma_alloc_folio.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.56 +0.0 0.59 ± 2% perf-profile.calltrace.cycles-pp.d_alloc_parallel.lookup_open.open_last_lookups.path_openat.do_filp_open
0.76 ± 3% +0.1 0.81 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.new_inode.proc_get_inode.proc_lookup_de
0.65 ± 3% +0.1 0.71 ± 5% perf-profile.calltrace.cycles-pp.apparmor_file_free_security.security_file_free.__fput.__x64_sys_close.do_syscall_64
0.66 ± 3% +0.1 0.72 ± 5% perf-profile.calltrace.cycles-pp.security_file_free.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.01 ± 2% +0.1 1.08 perf-profile.calltrace.cycles-pp._raw_spin_lock.new_inode.proc_get_inode.proc_lookup_de.proc_tgid_net_lookup
1.35 +0.1 1.43 perf-profile.calltrace.cycles-pp.new_inode.proc_get_inode.proc_lookup_de.proc_tgid_net_lookup.lookup_open
1.40 +0.1 1.49 perf-profile.calltrace.cycles-pp.proc_get_inode.proc_lookup_de.proc_tgid_net_lookup.lookup_open.open_last_lookups
0.73 +0.1 0.82 perf-profile.calltrace.cycles-pp.may_open.do_open.path_openat.do_filp_open.do_sys_openat2
0.67 +0.1 0.75 perf-profile.calltrace.cycles-pp.inode_permission.may_open.do_open.path_openat.do_filp_open
1.91 ± 3% +0.1 2.03 ± 2% perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dput.__fput.__x64_sys_close
0.81 ± 3% +0.1 0.94 ± 4% perf-profile.calltrace.cycles-pp.apparmor_file_alloc_security.security_file_alloc.init_file.alloc_empty_file.path_openat
1.10 ± 3% +0.1 1.23 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.__dentry_kill.dput.__fput
1.13 ± 3% +0.1 1.26 ± 3% perf-profile.calltrace.cycles-pp.init_file.alloc_empty_file.path_openat.do_filp_open.do_sys_openat2
0.85 ± 3% +0.1 0.98 ± 3% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.__dentry_kill.dput
0.89 ± 3% +0.1 1.04 ± 3% perf-profile.calltrace.cycles-pp.security_file_alloc.init_file.alloc_empty_file.path_openat.do_filp_open
1.47 ± 3% +0.1 1.61 ± 3% perf-profile.calltrace.cycles-pp.alloc_empty_file.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
0.66 +0.2 0.86 ± 22% perf-profile.calltrace.cycles-pp.__legitimize_path.try_to_unlazy.link_path_walk.path_openat.do_filp_open
0.67 +0.2 0.87 ± 22% perf-profile.calltrace.cycles-pp.try_to_unlazy.link_path_walk.path_openat.do_filp_open.do_sys_openat2
0.88 ± 8% +0.2 1.10 ± 5% perf-profile.calltrace.cycles-pp.up_read.kernfs_dop_revalidate.lookup_fast.walk_component.link_path_walk
1.49 +0.2 1.73 ± 7% perf-profile.calltrace.cycles-pp.lookup_fast.open_last_lookups.path_openat.do_filp_open.do_sys_openat2
1.30 ± 5% +0.3 1.56 ± 5% perf-profile.calltrace.cycles-pp.apparmor_file_open.security_file_open.do_dentry_open.do_open.path_openat
1.31 ± 4% +0.3 1.57 ± 5% perf-profile.calltrace.cycles-pp.security_file_open.do_dentry_open.do_open.path_openat.do_filp_open
2.39 ± 3% +0.3 2.65 ± 4% perf-profile.calltrace.cycles-pp.do_dentry_open.do_open.path_openat.do_filp_open.do_sys_openat2
1.49 ± 2% +0.3 1.76 ± 5% perf-profile.calltrace.cycles-pp.up_read.kernfs_iop_permission.inode_permission.link_path_walk.path_openat
1.53 ± 5% +0.3 1.81 ± 2% perf-profile.calltrace.cycles-pp.down_read.kernfs_iop_permission.inode_permission.link_path_walk.path_openat
1.09 ± 10% +0.3 1.40 ± 2% perf-profile.calltrace.cycles-pp.up_read.kernfs_fop_readdir.iterate_dir.__x64_sys_getdents.do_syscall_64
1.08 ± 11% +0.3 1.40 ± 2% perf-profile.calltrace.cycles-pp.up_read.kernfs_fop_readdir.iterate_dir.__x64_sys_getdents64.do_syscall_64
1.22 ± 9% +0.3 1.56 ± 4% perf-profile.calltrace.cycles-pp.down_read.kernfs_dop_revalidate.lookup_fast.walk_component.link_path_walk
1.09 ± 2% +0.3 1.44 ± 24% perf-profile.calltrace.cycles-pp.dput.terminate_walk.path_openat.do_filp_open.do_sys_openat2
1.13 ± 2% +0.4 1.48 ± 23% perf-profile.calltrace.cycles-pp.terminate_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
1.43 ± 9% +0.4 1.81 perf-profile.calltrace.cycles-pp.down_read.kernfs_fop_readdir.iterate_dir.__x64_sys_getdents.do_syscall_64
1.41 ± 9% +0.4 1.81 ± 2% perf-profile.calltrace.cycles-pp.down_read.kernfs_fop_readdir.iterate_dir.__x64_sys_getdents64.do_syscall_64
0.17 ±141% +0.4 0.58 ± 2% perf-profile.calltrace.cycles-pp.kernfs_dop_revalidate.lookup_fast.open_last_lookups.path_openat.do_filp_open
3.51 ± 2% +0.4 3.93 ± 3% perf-profile.calltrace.cycles-pp.do_open.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
0.17 ±141% +0.5 0.70 ± 28% perf-profile.calltrace.cycles-pp.lockref_get_not_dead.__legitimize_path.try_to_unlazy.link_path_walk.path_openat
0.00 +0.6 0.56 perf-profile.calltrace.cycles-pp.kernfs_iop_permission.inode_permission.may_open.do_open.path_openat
2.14 ± 8% +0.6 2.71 ± 4% perf-profile.calltrace.cycles-pp.kernfs_dop_revalidate.lookup_fast.walk_component.link_path_walk.path_openat
3.14 ± 3% +0.6 3.71 ± 4% perf-profile.calltrace.cycles-pp.kernfs_iop_permission.inode_permission.link_path_walk.path_openat.do_filp_open
4.18 ± 2% +0.6 4.77 ± 3% perf-profile.calltrace.cycles-pp.inode_permission.link_path_walk.path_openat.do_filp_open.do_sys_openat2
4.89 ± 4% +0.6 5.50 ± 2% perf-profile.calltrace.cycles-pp.walk_component.link_path_walk.path_openat.do_filp_open.do_sys_openat2
3.29 ± 5% +0.6 3.93 ± 3% perf-profile.calltrace.cycles-pp.lookup_fast.walk_component.link_path_walk.path_openat.do_filp_open
3.23 ± 7% +0.7 3.96 perf-profile.calltrace.cycles-pp.kernfs_fop_readdir.iterate_dir.__x64_sys_getdents.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.22 ± 8% +0.7 3.96 ± 2% perf-profile.calltrace.cycles-pp.kernfs_fop_readdir.iterate_dir.__x64_sys_getdents64.do_syscall_64.entry_SYSCALL_64_after_hwframe
10.92 ± 2% +1.4 12.34 ± 2% perf-profile.calltrace.cycles-pp.link_path_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
24.85 +2.5 27.32 perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
24.92 +2.5 27.39 perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
26.02 +2.5 28.52 perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
26.05 +2.5 28.55 perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
26.11 +2.5 28.61 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
26.13 +2.5 28.63 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.open64
26.32 +2.5 28.83 perf-profile.calltrace.cycles-pp.open64
29.41 ± 3% -3.3 26.07 ± 2% perf-profile.children.cycles-pp.proc_readdir_de
57.69 -2.9 54.77 perf-profile.children.cycles-pp.iterate_dir
59.85 -2.9 56.97 perf-profile.children.cycles-pp.syscall
18.49 ± 3% -2.1 16.39 ± 2% perf-profile.children.cycles-pp.proc_tgid_net_readdir
15.47 ± 4% -1.8 13.70 ± 2% perf-profile.children.cycles-pp._raw_read_lock
29.19 -1.5 27.70 perf-profile.children.cycles-pp.__x64_sys_getdents64
29.83 -1.4 28.38 perf-profile.children.cycles-pp.__x64_sys_getdents
94.11 -0.3 93.85 perf-profile.children.cycles-pp.do_syscall_64
94.19 -0.3 93.94 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.89 ± 2% -0.0 0.86 ± 3% perf-profile.children.cycles-pp.proc_readfd_common
0.08 ± 6% -0.0 0.05 perf-profile.children.cycles-pp.main
0.08 ± 6% -0.0 0.05 perf-profile.children.cycles-pp.run_builtin
0.07 ± 11% -0.0 0.05 perf-profile.children.cycles-pp.__cmd_record
0.07 ± 11% -0.0 0.05 perf-profile.children.cycles-pp.cmd_record
0.12 ± 4% -0.0 0.10 ± 4% perf-profile.children.cycles-pp._raw_spin_lock_irq
0.10 +0.0 0.11 perf-profile.children.cycles-pp.atime_needs_update
0.05 +0.0 0.06 perf-profile.children.cycles-pp.nd_jump_root
0.09 +0.0 0.10 perf-profile.children.cycles-pp.__init_rwsem
0.17 +0.0 0.18 perf-profile.children.cycles-pp.generic_permission
0.06 +0.0 0.07 perf-profile.children.cycles-pp.proc_pid_readdir
0.06 +0.0 0.07 perf-profile.children.cycles-pp.process_measurement
0.12 +0.0 0.13 perf-profile.children.cycles-pp.uncharge_batch
0.18 +0.0 0.19 perf-profile.children.cycles-pp.vsnprintf
0.22 ± 2% +0.0 0.24 perf-profile.children.cycles-pp.native_irq_return_iret
0.19 ± 2% +0.0 0.21 ± 2% perf-profile.children.cycles-pp.stress_getdents_dir
0.17 +0.0 0.18 ± 2% perf-profile.children.cycles-pp.memchr
0.08 +0.0 0.09 ± 5% perf-profile.children.cycles-pp.path_init
0.08 ± 5% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.locks_remove_posix
0.24 +0.0 0.25 perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
0.38 +0.0 0.40 perf-profile.children.cycles-pp.getname_flags
0.10 +0.0 0.12 ± 4% perf-profile.children.cycles-pp.page_counter_uncharge
0.14 ± 3% +0.0 0.16 perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
0.10 +0.0 0.12 perf-profile.children.cycles-pp.percpu_counter_add_batch
0.19 ± 2% +0.0 0.21 ± 2% perf-profile.children.cycles-pp.inode_init_always
0.20 ± 2% +0.0 0.22 ± 2% perf-profile.children.cycles-pp.mod_objcg_state
0.33 +0.0 0.35 perf-profile.children.cycles-pp.__cond_resched
0.18 ± 4% +0.0 0.20 ± 2% perf-profile.children.cycles-pp.strlcat
0.56 +0.0 0.58 perf-profile.children.cycles-pp.alloc_inode
0.52 +0.0 0.55 ± 2% perf-profile.children.cycles-pp.d_alloc
0.66 +0.0 0.69 perf-profile.children.cycles-pp.__slab_free
0.12 +0.0 0.15 ± 18% perf-profile.children.cycles-pp.try_to_unlazy_next
0.31 +0.0 0.34 perf-profile.children.cycles-pp.__memcg_slab_free_hook
0.70 +0.0 0.74 perf-profile.children.cycles-pp.d_alloc_parallel
0.77 +0.0 0.81 perf-profile.children.cycles-pp.filldir64
0.77 +0.0 0.82 ± 2% perf-profile.children.cycles-pp.filldir
0.11 ± 4% +0.1 0.17 ± 7% perf-profile.children.cycles-pp.security_current_getsecid_subj
0.00 +0.1 0.05 ± 8% perf-profile.children.cycles-pp.cpu_startup_entry
0.00 +0.1 0.05 ± 8% perf-profile.children.cycles-pp.do_idle
0.00 +0.1 0.05 ± 8% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
0.00 +0.1 0.05 ± 8% perf-profile.children.cycles-pp.start_secondary
0.10 ± 4% +0.1 0.16 ± 7% perf-profile.children.cycles-pp.apparmor_current_getsecid_subj
1.04 +0.1 1.09 perf-profile.children.cycles-pp.rcu_do_batch
1.04 +0.1 1.10 perf-profile.children.cycles-pp.rcu_core
1.05 +0.1 1.11 perf-profile.children.cycles-pp.__do_softirq
0.65 ± 4% +0.1 0.71 ± 4% perf-profile.children.cycles-pp.apparmor_file_free_security
0.66 ± 3% +0.1 0.72 ± 4% perf-profile.children.cycles-pp.security_file_free
0.19 ± 4% +0.1 0.25 ± 6% perf-profile.children.cycles-pp.ima_file_check
1.19 +0.1 1.27 perf-profile.children.cycles-pp.kmem_cache_free
0.43 ± 13% +0.1 0.51 ± 5% perf-profile.children.cycles-pp.smpboot_thread_fn
0.44 ± 13% +0.1 0.52 ± 5% perf-profile.children.cycles-pp.kthread
0.44 ± 13% +0.1 0.52 ± 5% perf-profile.children.cycles-pp.ret_from_fork
0.44 ± 13% +0.1 0.52 ± 5% perf-profile.children.cycles-pp.ret_from_fork_asm
0.42 ± 15% +0.1 0.51 ± 5% perf-profile.children.cycles-pp.run_ksoftirqd
0.74 +0.1 0.83 perf-profile.children.cycles-pp.may_open
1.92 ± 3% +0.1 2.04 ± 2% perf-profile.children.cycles-pp.evict
1.14 ± 3% +0.1 1.27 ± 3% perf-profile.children.cycles-pp.init_file
0.81 ± 3% +0.1 0.95 ± 4% perf-profile.children.cycles-pp.apparmor_file_alloc_security
0.90 ± 2% +0.1 1.04 ± 3% perf-profile.children.cycles-pp.security_file_alloc
1.47 ± 3% +0.1 1.61 ± 3% perf-profile.children.cycles-pp.alloc_empty_file
2.19 ± 2% +0.2 2.34 perf-profile.children.cycles-pp.new_inode
2.26 ± 2% +0.2 2.42 perf-profile.children.cycles-pp.proc_get_inode
0.53 ± 4% +0.2 0.70 ± 7% perf-profile.children.cycles-pp.apparmor_file_permission
0.55 ± 5% +0.2 0.72 ± 6% perf-profile.children.cycles-pp.security_file_permission
1.30 ± 5% +0.3 1.56 ± 5% perf-profile.children.cycles-pp.apparmor_file_open
1.32 ± 4% +0.3 1.57 ± 5% perf-profile.children.cycles-pp.security_file_open
2.40 ± 4% +0.3 2.66 ± 4% perf-profile.children.cycles-pp.do_dentry_open
1.35 +0.3 1.69 ± 19% perf-profile.children.cycles-pp.try_to_unlazy
1.14 ± 2% +0.4 1.50 ± 23% perf-profile.children.cycles-pp.terminate_walk
1.40 +0.4 1.77 ± 20% perf-profile.children.cycles-pp.__legitimize_path
1.00 ± 2% +0.4 1.39 ± 26% perf-profile.children.cycles-pp.lockref_get_not_dead
7.02 ± 3% +0.4 7.42 ± 3% perf-profile.children.cycles-pp.dput
3.52 ± 2% +0.4 3.94 ± 3% perf-profile.children.cycles-pp.do_open
4.91 ± 4% +0.6 5.53 ± 2% perf-profile.children.cycles-pp.walk_component
3.62 ± 3% +0.7 4.29 ± 3% perf-profile.children.cycles-pp.kernfs_iop_permission
4.87 ± 2% +0.7 5.54 ± 2% perf-profile.children.cycles-pp.inode_permission
2.61 ± 8% +0.7 3.30 ± 4% perf-profile.children.cycles-pp.kernfs_dop_revalidate
4.80 ± 4% +0.9 5.69 ± 3% perf-profile.children.cycles-pp.lookup_fast
5.71 ± 6% +1.2 6.89 ± 3% perf-profile.children.cycles-pp.up_read
10.94 ± 2% +1.4 12.38 ± 2% perf-profile.children.cycles-pp.link_path_walk
6.48 ± 8% +1.5 7.95 ± 2% perf-profile.children.cycles-pp.kernfs_fop_readdir
6.24 ± 7% +1.5 7.75 ± 2% perf-profile.children.cycles-pp.down_read
24.88 +2.5 27.36 perf-profile.children.cycles-pp.path_openat
24.94 +2.5 27.42 perf-profile.children.cycles-pp.do_filp_open
26.06 +2.5 28.56 perf-profile.children.cycles-pp.do_sys_openat2
26.07 +2.5 28.58 perf-profile.children.cycles-pp.__x64_sys_openat
26.37 +2.5 28.88 perf-profile.children.cycles-pp.open64
15.34 ± 4% -1.8 13.59 ± 2% perf-profile.self.cycles-pp._raw_read_lock
13.66 ± 4% -1.7 11.95 ± 2% perf-profile.self.cycles-pp.proc_readdir_de
1.61 ± 4% -0.2 1.46 ± 2% perf-profile.self.cycles-pp.proc_lookup_de
0.12 ± 4% -0.0 0.10 ± 4% perf-profile.self.cycles-pp._raw_spin_lock_irq
0.10 +0.0 0.11 perf-profile.self.cycles-pp.page_counter_uncharge
0.10 +0.0 0.11 perf-profile.self.cycles-pp.percpu_counter_add_batch
0.05 +0.0 0.06 perf-profile.self.cycles-pp.refill_obj_stock
0.08 +0.0 0.09 perf-profile.self.cycles-pp.number
0.09 +0.0 0.10 perf-profile.self.cycles-pp.pid_revalidate
0.19 ± 2% +0.0 0.21 ± 2% perf-profile.self.cycles-pp.__cond_resched
0.26 +0.0 0.28 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.22 ± 2% +0.0 0.24 perf-profile.self.cycles-pp.native_irq_return_iret
0.12 +0.0 0.13 ± 3% perf-profile.self.cycles-pp.__memcg_slab_free_hook
0.13 +0.0 0.14 ± 3% perf-profile.self.cycles-pp.generic_permission
0.08 +0.0 0.09 ± 5% perf-profile.self.cycles-pp.locks_remove_posix
0.09 ± 5% +0.0 0.10 perf-profile.self.cycles-pp.__call_rcu_common
0.18 ± 2% +0.0 0.19 perf-profile.self.cycles-pp.proc_tgid_net_lookup
0.17 ± 2% +0.0 0.18 ± 2% perf-profile.self.cycles-pp.mod_objcg_state
0.13 ± 3% +0.0 0.15 ± 3% perf-profile.self.cycles-pp.inode_init_always
0.16 ± 3% +0.0 0.18 ± 2% perf-profile.self.cycles-pp.do_syscall_64
0.17 ± 2% +0.0 0.19 ± 2% perf-profile.self.cycles-pp.get_proc_task_net
0.23 ± 2% +0.0 0.25 perf-profile.self.cycles-pp.syscall
0.65 +0.0 0.68 perf-profile.self.cycles-pp.__slab_free
0.38 ± 2% +0.0 0.41 ± 6% perf-profile.self.cycles-pp.inode_permission
0.56 +0.0 0.60 perf-profile.self.cycles-pp.filldir
0.84 ± 2% +0.0 0.89 ± 2% perf-profile.self.cycles-pp.lockref_get_not_dead
0.55 +0.0 0.60 ± 2% perf-profile.self.cycles-pp.filldir64
0.00 +0.1 0.05 perf-profile.self.cycles-pp.proc_tgid_net_readdir
0.10 ± 4% +0.1 0.16 ± 7% perf-profile.self.cycles-pp.apparmor_current_getsecid_subj
0.65 ± 3% +0.1 0.71 ± 5% perf-profile.self.cycles-pp.apparmor_file_free_security
0.80 ± 3% +0.1 0.93 ± 4% perf-profile.self.cycles-pp.apparmor_file_alloc_security
0.49 ± 5% +0.2 0.66 ± 6% perf-profile.self.cycles-pp.apparmor_file_permission
1.29 ± 4% +0.3 1.54 ± 5% perf-profile.self.cycles-pp.apparmor_file_open
5.66 ± 6% +1.2 6.84 ± 3% perf-profile.self.cycles-pp.up_read
6.15 ± 7% +1.5 7.66 ± 2% perf-profile.self.cycles-pp.down_read



***************************************************************************************************
lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/thread/100%/debian-12-x86_64-20240206.cgz/lkp-skl-fpga01/futex4/will-it-scale

commit:
a0e2dab44d ("x86/entry_32: Add VERW just before userspace transition")
6613d82e61 ("x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key")

a0e2dab44d22b913 6613d82e617dd7eb8b0c40b2fe3
---------------- ---------------------------
%stddev %change %stddev
\ | \
33.61 +1.2% 34.01 boot-time.boot
3130 +1.3% 3171 boot-time.idle
70814196 +4.0% 73672365 will-it-scale.104.threads
680905 +4.0% 708387 will-it-scale.per_thread_ops
70814196 +4.0% 73672365 will-it-scale.workload
89530 -1.7% 88005 proc-vmstat.nr_active_anon
92711 -1.7% 91127 proc-vmstat.nr_shmem
89530 -1.7% 88005 proc-vmstat.nr_zone_active_anon
76969 -1.7% 75654 proc-vmstat.pgactivate
1086126 -1.8% 1066713 proc-vmstat.pgalloc_normal
40426 ± 3% +10.6% 44714 ± 4% proc-vmstat.pgreuse
10727 ± 61% +52.8% 16392 ± 5% sched_debug.cfs_rq:/.load_avg.max
0.07 ± 12% -18.3% 0.06 ± 3% sched_debug.cfs_rq:/.nr_running.stddev
0.92 ± 74% +383.2% 4.45 ± 30% sched_debug.cfs_rq:/.removed.runnable_avg.avg
6.89 ± 72% +161.3% 18.00 ± 17% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
0.92 ± 74% +383.1% 4.45 ± 30% sched_debug.cfs_rq:/.removed.util_avg.avg
6.89 ± 72% +161.2% 17.99 ± 17% sched_debug.cfs_rq:/.removed.util_avg.stddev
1259 ± 2% -17.5% 1039 ± 16% sched_debug.cfs_rq:/.util_est.max
3796 ± 3% +29.1% 4902 ± 12% sched_debug.cpu.max_idle_balance_cost.stddev
0.00 +11.9% 0.00 ± 2% sched_debug.cpu.next_balance.stddev
876.78 ± 7% +10.9% 972.39 ± 7% sched_debug.cpu.nr_switches.min
6.71 ± 8% -16.5% 5.60 ± 4% sched_debug.cpu.nr_uninterruptible.stddev
6.114e+09 +4.3% 6.376e+09 perf-stat.i.branch-instructions
1.35 +0.2 1.53 perf-stat.i.branch-miss-rate%
81670984 +19.2% 97330429 perf-stat.i.branch-misses
6.05 -3.3% 5.85 perf-stat.i.cpi
4.754e+10 +4.0% 4.944e+10 perf-stat.i.instructions
0.17 +2.9% 0.17 perf-stat.i.ipc
1.34 +0.2 1.53 perf-stat.overall.branch-miss-rate%
6.07 -3.5% 5.86 perf-stat.overall.cpi
0.16 +3.6% 0.17 perf-stat.overall.ipc
6.094e+09 +4.3% 6.354e+09 perf-stat.ps.branch-instructions
81368878 +19.2% 96977141 perf-stat.ps.branch-misses
4.738e+10 +4.0% 4.928e+10 perf-stat.ps.instructions
0.03 ± 47% +57.9% 0.05 ± 9% perf-stat.ps.major-faults
1.439e+13 +3.6% 1.491e+13 perf-stat.total.instructions
44.04 -21.1 22.93 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
55.72 -19.2 36.49 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall
19.53 -18.4 1.17 ± 2% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
20.23 -6.0 14.24 perf-profile.calltrace.cycles-pp.futex_wait.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
22.07 -5.9 16.15 perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
20.74 -5.9 14.83 perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
6.35 -4.0 2.31 ± 2% perf-profile.calltrace.cycles-pp.__get_user_nocheck_4.futex_get_value_locked.futex_wait_setup.__futex_wait.futex_wait
6.88 -3.8 3.10 perf-profile.calltrace.cycles-pp.futex_get_value_locked.futex_wait_setup.__futex_wait.futex_wait.do_futex
29.51 -2.6 26.91 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.syscall
12.81 -2.6 10.22 perf-profile.calltrace.cycles-pp.futex_wait_setup.__futex_wait.futex_wait.do_futex.__x64_sys_futex
13.99 -2.4 11.54 perf-profile.calltrace.cycles-pp.__futex_wait.futex_wait.do_futex.__x64_sys_futex.do_syscall_64
0.64 +0.1 0.69 perf-profile.calltrace.cycles-pp.get_futex_key.futex_wait_setup.__futex_wait.futex_wait.do_futex
0.76 +0.1 0.82 perf-profile.calltrace.cycles-pp.futex_q_unlock.futex_wait_setup.__futex_wait.futex_wait.do_futex
99.58 +0.1 99.65 perf-profile.calltrace.cycles-pp.syscall
0.97 +0.2 1.19 perf-profile.calltrace.cycles-pp.futex_hash.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait
8.56 +0.6 9.13 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.syscall
1.21 +0.8 2.00 perf-profile.calltrace.cycles-pp._raw_spin_lock.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait
3.64 +0.9 4.58 ± 2% perf-profile.calltrace.cycles-pp.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait.do_futex
7.91 +2.7 10.62 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.syscall
0.00 +17.5 17.48 perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.syscall
44.20 -21.7 22.46 perf-profile.children.cycles-pp.do_syscall_64
56.13 -18.9 37.23 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
19.74 -18.5 1.26 ± 2% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
20.38 -6.0 14.36 perf-profile.children.cycles-pp.futex_wait
22.12 -5.9 16.21 perf-profile.children.cycles-pp.__x64_sys_futex
20.80 -5.9 14.90 perf-profile.children.cycles-pp.do_futex
6.53 -3.9 2.68 ± 2% perf-profile.children.cycles-pp.__get_user_nocheck_4
7.08 -3.8 3.29 ± 2% perf-profile.children.cycles-pp.futex_get_value_locked
29.66 -2.6 27.09 perf-profile.children.cycles-pp.syscall_return_via_sysret
13.00 -2.5 10.47 perf-profile.children.cycles-pp.futex_wait_setup
14.01 -2.4 11.58 perf-profile.children.cycles-pp.__futex_wait
0.18 ± 2% -0.1 0.13 ± 6% perf-profile.children.cycles-pp.amd_clear_divider
0.18 ± 2% -0.0 0.13 ± 3% perf-profile.children.cycles-pp.futex_setup_timer
0.44 -0.0 0.41 ± 3% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
0.05 +0.0 0.06 perf-profile.children.cycles-pp.syscall@plt
0.13 +0.0 0.16 ± 3% perf-profile.children.cycles-pp.testcase
0.80 +0.1 0.86 perf-profile.children.cycles-pp.futex_q_unlock
0.67 +0.1 0.72 perf-profile.children.cycles-pp.get_futex_key
0.98 +0.2 1.21 perf-profile.children.cycles-pp.futex_hash
1.26 +0.9 2.13 perf-profile.children.cycles-pp._raw_spin_lock
3.76 +1.0 4.75 ± 2% perf-profile.children.cycles-pp.futex_q_lock
4.25 +1.4 5.62 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
11.05 +1.9 12.94 perf-profile.children.cycles-pp.entry_SYSCALL_64
1.26 +17.4 18.64 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
19.27 -18.5 0.77 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
6.32 -3.8 2.51 perf-profile.self.cycles-pp.__get_user_nocheck_4
6.23 -3.5 2.68 perf-profile.self.cycles-pp.futex_wait
29.64 -2.6 27.05 perf-profile.self.cycles-pp.syscall_return_via_sysret
0.13 -0.0 0.10 perf-profile.self.cycles-pp.futex_setup_timer
0.39 ± 2% -0.0 0.36 ± 4% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
0.06 +0.0 0.09 ± 5% perf-profile.self.cycles-pp.amd_clear_divider
0.58 +0.0 0.61 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.13 ± 3% +0.0 0.16 ± 3% perf-profile.self.cycles-pp.testcase
0.65 +0.0 0.69 perf-profile.self.cycles-pp.get_futex_key
0.78 +0.0 0.83 perf-profile.self.cycles-pp.futex_q_unlock
0.96 +0.1 1.07 perf-profile.self.cycles-pp.__futex_wait
0.44 +0.1 0.58 perf-profile.self.cycles-pp.do_futex
0.85 +0.2 1.06 perf-profile.self.cycles-pp.futex_wait_setup
0.93 +0.2 1.17 perf-profile.self.cycles-pp.futex_hash
1.23 +0.9 2.09 perf-profile.self.cycles-pp._raw_spin_lock
9.85 +1.9 11.73 perf-profile.self.cycles-pp.entry_SYSCALL_64
2.11 +2.3 4.37 perf-profile.self.cycles-pp.syscall
1.86 +2.3 4.19 perf-profile.self.cycles-pp.do_syscall_64
12.24 +3.1 15.38 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
1.11 +17.4 18.46 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
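
For readers cross-checking the table above: the %change column appears to be the relative change of the second commit's mean against the first commit's mean, while rows that are already percentages (the *-rate% and perf-profile rows) show an absolute difference with no % sign. The Python sketch below reproduces a few rows on that assumption; the helper names are illustrative and are not part of the LKP tooling.

```python
def pct_change(base: float, head: float) -> float:
    """Relative change of head vs. base, in percent (assumed %change formula)."""
    return (head - base) / base * 100.0


def point_change(base: float, head: float) -> float:
    """Absolute difference, as shown for rows that are already percentages."""
    return head - base


# (base, head) means copied from the futex4 comparison above.
rows = {
    "will-it-scale.per_thread_ops": (680905, 708387),
    "perf-stat.i.branch-misses": (81670984, 97330429),
    "perf-stat.i.cpi": (6.05, 5.85),
}

for name, (base, head) in rows.items():
    print(f"{name}: {pct_change(base, head):+.1f}%")

# branch-miss-rate% is itself a percentage, so the report lists a point delta.
print(f"perf-stat.i.branch-miss-rate%: {point_change(1.35, 1.53):+.1f}")
```

The same arithmetic applies to the other comparison tables in this report; only the input means differ.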



***************************************************************************************************
lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/process/100%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/futex2/will-it-scale

commit:
a0e2dab44d ("x86/entry_32: Add VERW just before userspace transition")
6613d82e61 ("x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key")

a0e2dab44d22b913 6613d82e617dd7eb8b0c40b2fe3
---------------- ---------------------------
%stddev %change %stddev
\ | \
228.17 ± 8% -23.3% 175.00 ± 15% perf-c2c.HITM.local
25.31 ± 26% +1883.0% 501.86 ±132% sched_debug.cfs_rq:/.removed.load_avg.stddev
5561 ± 52% -43.3% 3154 ± 11% turbostat.C1
17507 ± 17% +19.7% 20950 ± 4% proc-vmstat.numa_hint_faults_local
61472 +4.9% 64491 ± 2% proc-vmstat.pgactivate
66711960 -2.1% 65339777 will-it-scale.104.processes
641460 -2.1% 628266 will-it-scale.per_process_ops
66711960 -2.1% 65339777 will-it-scale.workload
0.33 ± 21% -31.2% 0.23 ± 18% perf-sched.sch_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
0.09 ± 16% +82.6% 0.16 ± 15% perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
534.00 ± 4% -10.5% 478.00 ± 3% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
187.33 ± 7% -16.7% 156.00 ± 10% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
0.09 ± 16% +82.6% 0.16 ± 15% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
1.296e+10 -1.9% 1.271e+10 perf-stat.i.branch-instructions
1.00 +0.1 1.06 perf-stat.i.branch-miss-rate%
1.286e+08 +4.8% 1.348e+08 perf-stat.i.branch-misses
3.34 +2.1% 3.41 perf-stat.i.cpi
66597836 -1.9% 65315938 perf-stat.i.dTLB-load-misses
1.946e+10 -1.9% 1.909e+10 perf-stat.i.dTLB-loads
0.00 ± 59% -0.0 0.00 perf-stat.i.dTLB-store-miss-rate%
58479 -6.6% 54625 perf-stat.i.dTLB-store-misses
1.439e+10 -1.9% 1.411e+10 perf-stat.i.dTLB-stores
73446151 -8.8% 67017808 ± 4% perf-stat.i.iTLB-load-misses
8.619e+10 -2.0% 8.443e+10 perf-stat.i.instructions
1175 ± 2% +7.7% 1266 ± 4% perf-stat.i.instructions-per-iTLB-miss
0.30 -2.1% 0.29 perf-stat.i.ipc
450.05 -1.9% 441.52 perf-stat.i.metric.M/sec
192401 ± 5% -6.8% 179259 ± 6% perf-stat.i.node-load-misses
0.99 +0.1 1.06 perf-stat.overall.branch-miss-rate%
3.33 +2.2% 3.41 perf-stat.overall.cpi
0.00 -0.0 0.00 perf-stat.overall.dTLB-store-miss-rate%
1173 ± 2% +7.6% 1262 ± 4% perf-stat.overall.instructions-per-iTLB-miss
0.30 -2.1% 0.29 perf-stat.overall.ipc
1.292e+10 -1.9% 1.267e+10 perf-stat.ps.branch-instructions
1.282e+08 +4.8% 1.344e+08 perf-stat.ps.branch-misses
66375435 -1.9% 65097246 perf-stat.ps.dTLB-load-misses
1.94e+10 -1.9% 1.903e+10 perf-stat.ps.dTLB-loads
58320 -6.6% 54460 perf-stat.ps.dTLB-store-misses
1.434e+10 -1.9% 1.407e+10 perf-stat.ps.dTLB-stores
73202477 -8.8% 66790734 ± 4% perf-stat.ps.iTLB-load-misses
8.59e+10 -2.0% 8.415e+10 perf-stat.ps.instructions
191780 ± 5% -6.8% 178656 ± 6% perf-stat.ps.node-load-misses
2.598e+13 -2.2% 2.541e+13 perf-stat.total.instructions
17.48 -16.6 0.83 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
61.84 -10.1 51.77 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall
48.64 -5.8 42.84 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
13.22 -5.6 7.61 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.syscall
21.25 -1.3 19.98 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.syscall
7.92 -0.2 7.68 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.syscall
99.68 +0.1 99.74 perf-profile.calltrace.cycles-pp.syscall
1.04 +0.1 1.13 perf-profile.calltrace.cycles-pp.try_grab_folio.gup_pte_range.gup_pgd_range.lockless_pages_from_mm.internal_get_user_pages_fast
0.61 +0.3 0.96 perf-profile.calltrace.cycles-pp.futex_hash.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait
0.89 +0.4 1.25 perf-profile.calltrace.cycles-pp._raw_spin_lock.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait
0.00 +0.9 0.87 perf-profile.calltrace.cycles-pp.__pte_offset_map.gup_pte_range.gup_pgd_range.lockless_pages_from_mm.internal_get_user_pages_fast
2.36 ± 5% +1.6 3.97 perf-profile.calltrace.cycles-pp.futex_q_lock.futex_wait_setup.__futex_wait.futex_wait.do_futex
3.56 +1.9 5.43 perf-profile.calltrace.cycles-pp.gup_pte_range.gup_pgd_range.lockless_pages_from_mm.internal_get_user_pages_fast.get_user_pages_fast
2.16 ± 2% +2.3 4.47 perf-profile.calltrace.cycles-pp.__get_user_nocheck_4.futex_get_value_locked.futex_wait_setup.__futex_wait.futex_wait
6.48 +3.3 9.79 perf-profile.calltrace.cycles-pp.gup_pgd_range.lockless_pages_from_mm.internal_get_user_pages_fast.get_user_pages_fast.get_futex_key
2.49 ± 2% +3.5 6.00 perf-profile.calltrace.cycles-pp.futex_get_value_locked.futex_wait_setup.__futex_wait.futex_wait.do_futex
7.50 +3.9 11.39 perf-profile.calltrace.cycles-pp.lockless_pages_from_mm.internal_get_user_pages_fast.get_user_pages_fast.get_futex_key.futex_wait_setup
8.30 +4.7 12.95 perf-profile.calltrace.cycles-pp.internal_get_user_pages_fast.get_user_pages_fast.get_futex_key.futex_wait_setup.__futex_wait
9.10 +5.4 14.50 perf-profile.calltrace.cycles-pp.get_user_pages_fast.get_futex_key.futex_wait_setup.__futex_wait.futex_wait
10.86 +6.8 17.64 perf-profile.calltrace.cycles-pp.get_futex_key.futex_wait_setup.__futex_wait.futex_wait.do_futex
24.84 +7.9 32.70 perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
23.47 +8.1 31.60 perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
22.92 +8.1 31.06 perf-profile.calltrace.cycles-pp.futex_wait.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
20.40 +9.2 29.60 perf-profile.calltrace.cycles-pp.__futex_wait.futex_wait.do_futex.__x64_sys_futex.do_syscall_64
17.06 +11.5 28.57 perf-profile.calltrace.cycles-pp.futex_wait_setup.__futex_wait.futex_wait.do_futex.__x64_sys_futex
0.00 +13.6 13.64 perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.syscall
17.67 -16.8 0.89 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
62.15 -9.3 52.89 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
48.78 -6.9 41.92 perf-profile.children.cycles-pp.do_syscall_64
13.14 -3.0 10.14 perf-profile.children.cycles-pp.entry_SYSCALL_64
6.90 -2.8 4.09 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
21.41 -1.3 20.13 perf-profile.children.cycles-pp.syscall_return_via_sysret
0.78 ± 3% -0.5 0.29 perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
0.17 ± 2% -0.1 0.06 ± 6% perf-profile.children.cycles-pp.amd_clear_divider
0.34 ± 2% +0.0 0.39 ± 2% perf-profile.children.cycles-pp.is_valid_gup_args
0.06 ± 9% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.pud_huge
1.05 +0.1 1.13 perf-profile.children.cycles-pp.try_grab_folio
0.07 ± 5% +0.1 0.16 ± 3% perf-profile.children.cycles-pp.pmd_huge
0.94 +0.4 1.30 perf-profile.children.cycles-pp._raw_spin_lock
0.61 +0.4 0.98 perf-profile.children.cycles-pp.futex_hash
0.45 +0.4 0.88 perf-profile.children.cycles-pp.__pte_offset_map
2.48 ± 5% +1.6 4.06 perf-profile.children.cycles-pp.futex_q_lock
3.64 +1.9 5.52 perf-profile.children.cycles-pp.gup_pte_range
2.28 ± 2% +2.8 5.08 perf-profile.children.cycles-pp.__get_user_nocheck_4
2.54 ± 2% +2.9 5.49 perf-profile.children.cycles-pp.futex_get_value_locked
6.56 +3.3 9.90 perf-profile.children.cycles-pp.gup_pgd_range
7.54 +3.9 11.44 perf-profile.children.cycles-pp.lockless_pages_from_mm
8.42 +4.7 13.12 perf-profile.children.cycles-pp.internal_get_user_pages_fast
9.20 +5.5 14.70 perf-profile.children.cycles-pp.get_user_pages_fast
10.90 +6.8 17.70 perf-profile.children.cycles-pp.get_futex_key
24.90 +7.9 32.76 perf-profile.children.cycles-pp.__x64_sys_futex
23.54 +8.1 31.67 perf-profile.children.cycles-pp.do_futex
23.00 +8.2 31.16 perf-profile.children.cycles-pp.futex_wait
20.42 +9.2 29.65 perf-profile.children.cycles-pp.__futex_wait
17.15 +11.5 28.70 perf-profile.children.cycles-pp.futex_wait_setup
1.24 +13.4 14.65 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
16.85 -16.3 0.54 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
12.04 -3.0 9.08 perf-profile.self.cycles-pp.entry_SYSCALL_64
3.22 -2.3 0.95 perf-profile.self.cycles-pp.__futex_wait
13.61 -1.6 12.00 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
21.39 -1.3 20.10 perf-profile.self.cycles-pp.syscall_return_via_sysret
2.48 -1.0 1.43 perf-profile.self.cycles-pp.futex_wait
0.74 ± 3% -0.5 0.26 perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
1.36 ± 3% -0.3 1.07 perf-profile.self.cycles-pp.__x64_sys_futex
0.09 ± 4% -0.0 0.08 perf-profile.self.cycles-pp.futex_setup_timer
0.05 ± 8% +0.0 0.08 ± 5% perf-profile.self.cycles-pp.pud_huge
0.31 ± 2% +0.0 0.35 ± 2% perf-profile.self.cycles-pp.is_valid_gup_args
0.00 +0.1 0.05 perf-profile.self.cycles-pp.syscall@plt
1.03 +0.1 1.09 perf-profile.self.cycles-pp.try_grab_folio
0.05 ± 7% +0.1 0.13 ± 2% perf-profile.self.cycles-pp.pmd_huge
2.36 +0.1 2.46 perf-profile.self.cycles-pp.syscall
0.27 ± 5% +0.2 0.42 ± 5% perf-profile.self.cycles-pp.futex_get_value_locked
0.61 ± 2% +0.2 0.84 perf-profile.self.cycles-pp.futex_wait_setup
0.90 +0.4 1.26 perf-profile.self.cycles-pp._raw_spin_lock
0.59 +0.4 0.95 perf-profile.self.cycles-pp.futex_hash
0.44 +0.4 0.87 perf-profile.self.cycles-pp.__pte_offset_map
0.90 +0.5 1.41 perf-profile.self.cycles-pp.lockless_pages_from_mm
0.51 ± 2% +0.7 1.24 perf-profile.self.cycles-pp.get_user_pages_fast
0.90 +0.8 1.73 perf-profile.self.cycles-pp.internal_get_user_pages_fast
0.96 ± 12% +0.9 1.81 ± 2% perf-profile.self.cycles-pp.futex_q_lock
1.69 +1.3 2.98 perf-profile.self.cycles-pp.get_futex_key
5.82 +1.3 7.13 perf-profile.self.cycles-pp.do_syscall_64
2.81 +1.4 4.16 perf-profile.self.cycles-pp.gup_pgd_range
2.06 +1.4 3.46 perf-profile.self.cycles-pp.gup_pte_range
2.24 ± 2% +2.8 5.01 perf-profile.self.cycles-pp.__get_user_nocheck_4
1.08 +13.4 14.51 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack



***************************************************************************************************
lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/process/100%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/futex3/will-it-scale

commit:
a0e2dab44d ("x86/entry_32: Add VERW just before userspace transition")
6613d82e61 ("x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key")

a0e2dab44d22b913 6613d82e617dd7eb8b0c40b2fe3
---------------- ---------------------------
%stddev %change %stddev
\ | \
13.83 ± 10% +25.7% 17.39 ± 9% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
13.66 ± 11% +25.8% 17.19 ± 9% sched_debug.cfs_rq:/.removed.util_avg.stddev
0.44 ± 8% -20.4% 0.35 ± 17% perf-sched.wait_and_delay.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
283.50 ± 8% +14.2% 323.67 ± 4% perf-sched.wait_and_delay.count.irqentry_exit_to_user_mode.asm_sysvec_reschedule_ipi.[unknown].[unknown]
0.44 ± 8% -20.4% 0.35 ± 17% perf-sched.wait_time.avg.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown].[unknown]
76308359 +3.7% 79094461 will-it-scale.104.processes
733733 +3.7% 760523 will-it-scale.per_process_ops
76308359 +3.7% 79094461 will-it-scale.workload
55603 -2.2% 54361 proc-vmstat.nr_active_anon
58137 -2.3% 56792 proc-vmstat.nr_shmem
55603 -2.2% 54361 proc-vmstat.nr_zone_active_anon
57819 ± 2% -3.5% 55794 proc-vmstat.pgactivate
4.625e+09 +3.7% 4.793e+09 perf-stat.i.branch-instructions
1.76 +0.3 2.10 perf-stat.i.branch-miss-rate%
81504213 +23.8% 1.009e+08 perf-stat.i.branch-misses
7.84 -3.1% 7.59 perf-stat.i.cpi
76204495 +3.7% 79030797 perf-stat.i.dTLB-load-misses
8.857e+09 +3.7% 9.18e+09 perf-stat.i.dTLB-loads
0.00 -0.0 0.00 perf-stat.i.dTLB-store-miss-rate%
74968 +2.1% 76523 perf-stat.i.dTLB-store-misses
6.71e+09 +3.6% 6.954e+09 perf-stat.i.dTLB-stores
3.674e+10 +3.3% 3.794e+10 perf-stat.i.instructions
0.13 +3.2% 0.13 perf-stat.i.ipc
194.14 +3.6% 201.22 perf-stat.i.metric.M/sec
76.87 +1.3 78.12 perf-stat.i.node-store-miss-rate%
1.76 +0.3 2.10 perf-stat.overall.branch-miss-rate%
7.84 -3.1% 7.59 perf-stat.overall.cpi
0.00 -0.0 0.00 perf-stat.overall.dTLB-store-miss-rate%
0.13 +3.2% 0.13 perf-stat.overall.ipc
4.609e+09 +3.6% 4.778e+09 perf-stat.ps.branch-instructions
81226256 +23.8% 1.005e+08 perf-stat.ps.branch-misses
75948753 +3.7% 78766248 perf-stat.ps.dTLB-load-misses
8.827e+09 +3.7% 9.15e+09 perf-stat.ps.dTLB-loads
74738 +2.1% 76323 perf-stat.ps.dTLB-store-misses
6.688e+09 +3.6% 6.931e+09 perf-stat.ps.dTLB-stores
3.662e+10 +3.3% 3.781e+10 perf-stat.ps.instructions
1.106e+13 +3.2% 1.141e+13 perf-stat.total.instructions
39.96 -26.1 13.91 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
52.30 -23.3 29.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall
21.46 -20.2 1.24 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
15.92 -9.7 6.22 perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
14.55 -9.7 4.85 perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
13.86 -9.7 4.18 perf-profile.calltrace.cycles-pp.futex_wake.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.41 -4.2 1.17 perf-profile.calltrace.cycles-pp.get_futex_key.futex_wake.do_futex.__x64_sys_futex.do_syscall_64
5.45 -4.1 1.31 perf-profile.calltrace.cycles-pp.futex_hash.futex_wake.do_futex.__x64_sys_futex.do_syscall_64
32.42 -3.9 28.48 perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.syscall
99.16 -2.6 96.55 perf-profile.calltrace.cycles-pp.syscall
8.66 -1.7 6.99 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.syscall
8.99 +0.3 9.32 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.syscall
0.58 +3.4 3.94 ± 7% perf-profile.calltrace.cycles-pp.testcase
0.00 +21.1 21.12 perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.syscall
40.10 -26.8 13.28 perf-profile.children.cycles-pp.do_syscall_64
52.74 -22.8 29.91 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
21.67 -20.4 1.32 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
14.14 -9.8 4.32 perf-profile.children.cycles-pp.futex_wake
14.61 -9.7 4.90 perf-profile.children.cycles-pp.do_futex
15.97 -9.7 6.27 perf-profile.children.cycles-pp.__x64_sys_futex
5.45 -4.3 1.20 perf-profile.children.cycles-pp.get_futex_key
5.46 -4.1 1.32 perf-profile.children.cycles-pp.futex_hash
32.59 -3.9 28.68 perf-profile.children.cycles-pp.syscall_return_via_sysret
99.58 -2.3 97.28 perf-profile.children.cycles-pp.syscall
4.64 -0.7 3.96 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
11.74 -0.7 11.08 perf-profile.children.cycles-pp.entry_SYSCALL_64
0.45 ± 5% -0.1 0.36 perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
0.18 ± 3% -0.1 0.09 ± 4% perf-profile.children.cycles-pp.amd_clear_divider
0.05 +0.0 0.10 ± 3% perf-profile.children.cycles-pp.syscall@plt
0.58 +2.7 3.30 ± 8% perf-profile.children.cycles-pp.testcase
1.35 +21.0 22.37 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
21.18 -20.3 0.88 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
5.37 -4.2 1.16 perf-profile.self.cycles-pp.get_futex_key
5.22 -4.0 1.23 perf-profile.self.cycles-pp.futex_hash
32.55 -4.0 28.57 perf-profile.self.cycles-pp.syscall_return_via_sysret
3.42 -1.5 1.87 perf-profile.self.cycles-pp.futex_wake
10.45 -0.7 9.80 perf-profile.self.cycles-pp.entry_SYSCALL_64
0.40 ± 6% -0.1 0.32 ± 3% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
0.05 +0.0 0.07 ± 5% perf-profile.self.cycles-pp.amd_clear_divider
0.05 +0.0 0.10 ± 4% perf-profile.self.cycles-pp.syscall@plt
0.51 +0.1 0.62 perf-profile.self.cycles-pp.do_futex
0.61 +0.3 0.91 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.34 ± 2% +2.3 2.63 ± 9% perf-profile.self.cycles-pp.testcase
1.98 +2.8 4.80 perf-profile.self.cycles-pp.do_syscall_64
1.65 +3.9 5.51 ± 3% perf-profile.self.cycles-pp.syscall
13.00 +4.4 17.39 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
1.19 +21.0 22.18 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



2024-03-04 17:59:07

by Dave Hansen

Subject: Re: [linus:master] [x86/bugs] 6613d82e61: stress-ng.mutex.ops_per_sec -7.9% regression

On 3/3/24 21:53, kernel test robot wrote:
> kernel test robot noticed a -7.9% regression of stress-ng.mutex.ops_per_sec on:
>
> commit: 6613d82e617dd7eb8b0c40b2fe3acea655b1d611 ("x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

This _looks_ like noise to me.

Some benchmarks went up, some went down. The differential profile shows
random gunk that basically amounts to "my computer is slow" because it's
mostly things that change when the result changes, like:

> 182670 +9.0% 199032 stress-ng.mutex.nanosecs_per_mutex

Does anyone think there's something substantial to chase after here?

2024-03-05 05:51:40

by Feng Tang

Subject: Re: [linus:master] [x86/bugs] 6613d82e61: stress-ng.mutex.ops_per_sec -7.9% regression

Hi Dave,

On Mon, Mar 04, 2024 at 09:58:53AM -0800, Dave Hansen wrote:
> On 3/3/24 21:53, kernel test robot wrote:
> > kernel test robot noticed a -7.9% regression of stress-ng.mutex.ops_per_sec on:
> >
> > commit: 6613d82e617dd7eb8b0c40b2fe3acea655b1d611 ("x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> This _looks_ like noise to me.
>
> Some benchmarks went up, some went down. The differential profile shows
> random gunk that basically amounts to "my computer is slow" because it's
> mostly things that change when the result changes, like:
>
> > 182670 +9.0% 199032 stress-ng.mutex.nanosecs_per_mutex
>
> Does anyone think there's something substantial to chase after here?

We checked this further, and it appears to be another case of the data/text
alignment effect: 6613d82e617d removes the static key 'mds_user_clear',
which sits in the '.bss' section, and that changes the address alignment of
the data that follows it in that section.
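
To illustrate the kind of shift described above, here is a minimal sketch
(not kernel code). The 64-byte cache line, the 16-byte size of the removed
object and the addresses used are assumptions chosen for illustration; the
helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LINE 64u /* assumed cache line size in bytes */

/* Offset of an address within its cache line. */
static size_t line_offset(size_t addr)
{
	return addr % CACHE_LINE;
}

/* True if an object of `size` bytes starting at `addr` spans two cache lines. */
static bool straddles_line(size_t addr, size_t size)
{
	assert(size > 0 && size <= CACHE_LINE);
	return line_offset(addr) + size > CACHE_LINE;
}

/*
 * Address of the same object after `removed` bytes placed ahead of it
 * disappear from the section. This ignores any padding the linker may
 * insert to satisfy the object's own alignment requirement.
 */
static size_t shifted_addr(size_t addr, size_t removed)
{
	assert(removed <= addr);
	return addr - removed;
}
```

For example, a hypothetical 24-byte object at 0x1040 starts exactly on a
cache line boundary. If 16 bytes ahead of it are removed, it moves to 0x1030
(line offset 48) and now spans two cache lines, which can change the cache
behaviour of hot data without any change to the code that uses it.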

With the debug patch below, which restores the alignment, we can see that
the performance recovers:

a0e2dab44d22b913 6613d82e617dd7eb8b0c40b2fe3 398e7f0da8595354dc330938831
---------------- --------------------------- ---------------------------

302318 -7.9% 278364 +0.3% 303161 stress-ng.mutex.ops_per_sec
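
The %change columns above are relative to the first (base) commit. A small
sketch of that arithmetic, with a hypothetical helper name:

```c
#include <assert.h>

/* Percentage change of `value` relative to `base`. */
static double pct_change(double base, double value)
{
	assert(base != 0.0);
	return (value - base) / base * 100.0;
}
```

With base 302318, pct_change(302318, 278364) is about -7.9 and
pct_change(302318, 303161) is about +0.3, matching the table.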

---
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 48d049cd74e7..1876865dc954 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -111,6 +111,9 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_cond_ibpb);
/* Control unconditional IBPB in switch_mm() */
DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb);

+DEFINE_STATIC_KEY_FALSE(test_static_key);
+EXPORT_SYMBOL_GPL(test_static_key);
+
/* Control MDS CPU buffer clear before idling (halt, mwait) */
DEFINE_STATIC_KEY_FALSE(mds_idle_clear);
EXPORT_SYMBOL_GPL(mds_idle_clear);
---

There was another similar case, which changed the alignment of the
percpu section:
https://lore.kernel.org/lkml/ZSeF6T0mkrH5pOgD@feng-clx/

Thanks,
Feng