2022-10-28 07:17:57

by kernel test robot

Subject: [tip:x86/core] [x86/retbleed] 80e4c1cd42: will-it-scale.per_thread_ops -5.4% regression


Hi Thomas,

Although the title calls this a 'regression', following the parent-vs-commit
rule used in our reporting, we understand from the commit message that this is
actually a big improvement compared to the 'microcode mitigation', which could
cause up to a 30% performance drop.

We still report it as an FYI about the possible performance impact on some
micro benchmarks.


Greetings,

FYI, we noticed a -5.4% regression of will-it-scale.per_thread_ops due to commit:


commit: 80e4c1cd42fff110bfdae8fce7ac4f22465f9664 ("x86/retbleed: Add X86_FEATURE_CALL_DEPTH")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/core

in testcase: will-it-scale
on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake) with 192G memory
with the following parameters:

nr_task: 100%
mode: thread
test: futex3
cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale



If you fix the issue, kindly add the following tags
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-lkp/[email protected]


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
gcc-11/performance/x86_64-rhel-8.3/thread/100%/debian-11.1-x86_64-20220510.cgz/lkp-csl-2ap4/futex3/will-it-scale

commit:
bea75b3389 ("x86/Kconfig: Introduce function padding")
80e4c1cd42 ("x86/retbleed: Add X86_FEATURE_CALL_DEPTH")

bea75b33895f7f87 80e4c1cd42fff110bfdae8fce7a
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.335e+09 -5.4% 1.263e+09 will-it-scale.192.threads
6951370 -5.4% 6578078 will-it-scale.per_thread_ops
1.335e+09 -5.4% 1.263e+09 will-it-scale.workload
33.29 -3.0% 32.30 boot-time.dhcp
0.97 ± 2% +0.1 1.07 ± 2% mpstat.cpu.all.irq%
83145 ±146% -94.2% 4796 ± 6% turbostat.C1
878.33 ± 4% +11.7% 981.00 ± 11% proc-vmstat.direct_map_level2_splits
77333 +2.2% 79018 proc-vmstat.nr_slab_unreclaimable
47455 ± 12% -23.2% 36450 ± 17% proc-vmstat.numa_hint_faults
43003 ± 32% -37.6% 26846 ± 37% proc-vmstat.numa_pages_migrated
43003 ± 32% -37.6% 26846 ± 37% proc-vmstat.pgmigrate_success
198321 ± 12% +21.9% 241714 ± 14% numa-meminfo.node1.AnonPages
200294 ± 12% +21.0% 242442 ± 14% numa-meminfo.node1.Inactive
200294 ± 12% +21.0% 242442 ± 14% numa-meminfo.node1.Inactive(anon)
229302 ± 15% -28.5% 163948 ± 17% numa-meminfo.node2.AnonPages
231172 ± 16% -28.5% 165270 ± 17% numa-meminfo.node2.Inactive
231172 ± 16% -28.5% 165270 ± 17% numa-meminfo.node2.Inactive(anon)
49578 ± 12% +22.1% 60515 ± 14% numa-vmstat.node1.nr_anon_pages
50070 ± 12% +21.2% 60697 ± 14% numa-vmstat.node1.nr_inactive_anon
50071 ± 12% +21.2% 60697 ± 14% numa-vmstat.node1.nr_zone_inactive_anon
57327 ± 15% -28.4% 41064 ± 17% numa-vmstat.node2.nr_anon_pages
57794 ± 16% -28.4% 41393 ± 17% numa-vmstat.node2.nr_inactive_anon
57794 ± 16% -28.4% 41393 ± 17% numa-vmstat.node2.nr_zone_inactive_anon
0.01 ± 4% +7.7% 0.02 perf-stat.i.MPKI
8.662e+10 -5.4% 8.197e+10 perf-stat.i.branch-instructions
3.336e+08 -4.5% 3.187e+08 perf-stat.i.branch-misses
15.22 ± 2% +1.1 16.33 ± 2% perf-stat.i.cache-miss-rate%
1193768 ± 4% +8.9% 1300334 ± 2% perf-stat.i.cache-misses
0.99 +5.9% 1.05 perf-stat.i.cpi
1.439e+11 -5.4% 1.362e+11 perf-stat.i.dTLB-loads
0.00 +0.0 0.00 perf-stat.i.dTLB-store-miss-rate%
255388 -1.9% 250535 perf-stat.i.dTLB-store-misses
1.079e+11 -5.4% 1.021e+11 perf-stat.i.dTLB-stores
5.753e+11 -5.4% 5.444e+11 perf-stat.i.instructions
1.01 -5.5% 0.96 perf-stat.i.ipc
1762 -5.4% 1667 perf-stat.i.metric.M/sec
233635 ± 3% +6.3% 248433 perf-stat.i.node-load-misses
106279 ± 3% +13.5% 120679 perf-stat.i.node-store-misses
0.01 ± 4% +7.1% 0.02 perf-stat.overall.MPKI
15.04 +1.0 16.08 ± 2% perf-stat.overall.cache-miss-rate%
0.99 +5.9% 1.05 perf-stat.overall.cpi
0.00 +0.0 0.00 perf-stat.overall.dTLB-store-miss-rate%
1.01 -5.5% 0.96 perf-stat.overall.ipc
8.633e+10 -5.4% 8.17e+10 perf-stat.ps.branch-instructions
3.325e+08 -4.5% 3.176e+08 perf-stat.ps.branch-misses
1.434e+11 -5.4% 1.357e+11 perf-stat.ps.dTLB-loads
254865 -1.9% 249975 perf-stat.ps.dTLB-store-misses
1.075e+11 -5.4% 1.017e+11 perf-stat.ps.dTLB-stores
5.734e+11 -5.4% 5.426e+11 perf-stat.ps.instructions
232956 ± 3% +6.3% 247689 perf-stat.ps.node-load-misses
105863 ± 3% +13.6% 120215 perf-stat.ps.node-store-misses
1.739e+14 -5.4% 1.645e+14 perf-stat.total.instructions
33.36 -2.0 31.32 perf-profile.calltrace.cycles-pp.__entry_text_start.syscall
1.70 -0.3 1.36 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
6.47 -0.3 6.14 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
2.22 -0.1 2.13 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.syscall
97.70 -0.1 97.62 perf-profile.calltrace.cycles-pp.syscall
0.92 -0.1 0.86 perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.syscall
3.48 +0.0 3.51 perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
1.98 +0.0 2.02 perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
8.68 +0.7 9.40 perf-profile.calltrace.cycles-pp.futex_hash.futex_wake.do_futex.__x64_sys_futex.do_syscall_64
5.94 +1.2 7.16 perf-profile.calltrace.cycles-pp.get_futex_key.futex_wake.do_futex.__x64_sys_futex.do_syscall_64
23.75 +1.5 25.23 perf-profile.calltrace.cycles-pp.futex_wake.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
51.01 +2.2 53.22 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall
43.94 +2.6 46.55 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
32.18 +3.0 35.18 perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
26.90 +3.4 30.27 perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
21.50 -1.3 20.19 perf-profile.children.cycles-pp.__entry_text_start
12.92 -0.8 12.09 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
7.38 -0.5 6.89 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
1.90 -0.4 1.53 perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
2.40 -0.1 2.30 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.19 ± 7% +0.0 0.23 ± 4% perf-profile.children.cycles-pp.perf_prepare_sample
0.22 ± 6% +0.0 0.26 ± 3% perf-profile.children.cycles-pp.perf_tp_event
0.22 ± 6% +0.0 0.26 ± 3% perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
0.01 ±223% +0.0 0.06 ± 9% perf-profile.children.cycles-pp.account_user_time
0.01 ±223% +0.1 0.06 ± 14% perf-profile.children.cycles-pp.account_system_index_time
0.36 ± 4% +0.1 0.41 ± 5% perf-profile.children.cycles-pp.scheduler_tick
0.31 ± 5% +0.1 0.37 ± 4% perf-profile.children.cycles-pp.task_tick_fair
0.24 ± 9% +0.1 0.30 ± 5% perf-profile.children.cycles-pp.update_curr
0.01 ±223% +0.1 0.08 ± 12% perf-profile.children.cycles-pp.__perf_event_header__init_id
0.01 ±223% +0.1 0.08 ± 12% perf-profile.children.cycles-pp.__task_pid_nr_ns
0.47 ± 7% +0.1 0.57 ± 5% perf-profile.children.cycles-pp.update_process_times
0.77 ± 4% +0.1 0.87 ± 4% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.71 ± 4% +0.1 0.82 ± 4% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.48 ± 8% +0.1 0.59 ± 5% perf-profile.children.cycles-pp.tick_sched_handle
0.67 ± 4% +0.1 0.78 ± 4% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.66 ± 4% +0.1 0.78 ± 4% perf-profile.children.cycles-pp.hrtimer_interrupt
0.54 ± 7% +0.1 0.65 ± 4% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.50 ± 8% +0.1 0.61 ± 5% perf-profile.children.cycles-pp.tick_sched_timer
8.78 +0.8 9.55 perf-profile.children.cycles-pp.futex_hash
6.02 +1.3 7.37 perf-profile.children.cycles-pp.get_futex_key
24.17 +1.9 26.11 perf-profile.children.cycles-pp.futex_wake
51.45 +2.2 53.67 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
44.71 +2.6 47.28 perf-profile.children.cycles-pp.do_syscall_64
32.55 +3.1 35.65 perf-profile.children.cycles-pp.__x64_sys_futex
27.26 +3.2 30.44 perf-profile.children.cycles-pp.do_futex
12.56 -0.8 11.75 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
16.26 -0.7 15.60 perf-profile.self.cycles-pp.syscall
9.62 -0.6 9.02 perf-profile.self.cycles-pp.__entry_text_start
6.82 -0.4 6.46 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
1.59 -0.2 1.37 perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
2.21 -0.2 1.98 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
2.88 -0.1 2.75 perf-profile.self.cycles-pp.do_syscall_64
5.51 -0.1 5.40 perf-profile.self.cycles-pp.syscall_return_via_sysret
2.38 -0.1 2.27 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
3.41 +0.0 3.43 perf-profile.self.cycles-pp.exit_to_user_mode_prepare
0.32 ± 2% +0.0 0.34 ± 2% perf-profile.self.cycles-pp.syscall@plt
0.01 ±223% +0.0 0.06 ± 9% perf-profile.self.cycles-pp.account_user_time
0.01 ±223% +0.1 0.06 ± 14% perf-profile.self.cycles-pp.account_system_index_time
0.01 ±223% +0.1 0.07 ± 12% perf-profile.self.cycles-pp.__task_pid_nr_ns
8.59 +0.4 9.01 perf-profile.self.cycles-pp.futex_hash
9.35 +0.5 9.83 perf-profile.self.cycles-pp.futex_wake
3.22 +1.0 4.17 perf-profile.self.cycles-pp.do_futex
5.71 +1.2 6.92 perf-profile.self.cycles-pp.get_futex_key
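
For readers who want to sanity-check the headline figures in the table above,
here is a minimal sketch in Python. It assumes that the %change column is the
relative change from the parent commit to the tested commit, and that unit-less
deltas on rows that are already percentages (such as the perf-profile rows) are
plain differences in percentage points. The helper names pct_change and
point_change are hypothetical and are not part of lkp-tests.

```python
# Values copied from the comparison table (parent bea75b3389 vs. commit 80e4c1cd42).
THREADS = 192  # nr_task=100% on the 192-thread test machine


def pct_change(before: float, after: float) -> float:
    """Relative change in percent (assumed meaning of the %change column)."""
    return (after - before) / before * 100.0


def point_change(before: float, after: float) -> float:
    """Absolute difference, assumed for rows that are already percentages."""
    return after - before


per_thread_before, per_thread_after = 6951370, 6578078

# Headline metric: will-it-scale.per_thread_ops
ops_change = pct_change(per_thread_before, per_thread_after)

# will-it-scale.workload should equal per_thread_ops times the thread count.
workload_before = per_thread_before * THREADS
workload_after = per_thread_after * THREADS

# perf-stat.i.instructions fell by about the same fraction as the throughput.
insn_change = pct_change(5.753e11, 5.444e11)

# perf-profile.calltrace row for get_futex_key: 5.94 -> 7.16
futex_key_delta = point_change(5.94, 7.16)

print(f"per_thread_ops change: {ops_change:+.1f}%")
print(f"workload: {workload_before:.3e} -> {workload_after:.3e}")
print(f"instructions change: {insn_change:+.1f}%")
print(f"get_futex_key calltrace delta: {futex_key_delta:+.1f} points")
```

Under these assumptions the computed values round to the -5.4%, 1.335e+09,
1.263e+09 and +1.2 entries reported in the table. Rows whose inputs are heavily
rounded in the report (for example cpi and ipc) may not reproduce exactly from
the displayed digits.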




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:
(No filename) (13.27 kB)
config-6.1.0-rc1-00040-g80e4c1cd42ff (168.74 kB)
job-script (7.82 kB)
job.yaml (5.37 kB)
reproduce (356.00 B)