(Regarding the previous report "[x86/signal] 3aac3ebea0: will-it-scale.per_thread_ops
-11.9% regression" [1], which commit 6c3118c321 is meant to address: we found that we
had only tested tglx's diff from https://lore.kernel.org/lkml/87bl1s357p.ffs@tglx/
and had not tested this patch itself, so we are still sending out this report, FYI.
[1] https://lore.kernel.org/lkml/20211207012128.GA16074@xsang-OptiPlex-9020/)
Greetings,
FYI, we noticed a 13.2% improvement in will-it-scale.per_thread_ops due to commit:
commit: 6c3118c32129b4197999a8928ba776bcabd0f5c4 ("signal: Skip the altstack update when not needed")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: will-it-scale
on test machine: 144 threads 4 sockets Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory
with the following parameters:
	nr_task: 100%
	mode: thread
	test: signal1
	cpufreq_governor: performance
	ucode: 0x16
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
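For readers unfamiliar with the workload, the call chains in the profile further down (raise, handler, __restore_rt) suggest that each iteration raises a signal at the calling thread and runs a handler. The following is only a minimal user-space sketch of that loop, not the will-it-scale source; the choice of SIGUSR1, the counting handler, and the name run_iterations are assumptions made for illustration.

```python
import signal


def run_iterations(n):
    """Raise a signal at the calling thread n times.

    Returns (iterations, handled). SIGUSR1 and the counting handler are
    assumptions for this sketch; they are not taken from the report.
    Must be called from the main thread (a CPython requirement for
    installing signal handlers).
    """
    handled = 0

    def handler(signum, frame):
        # Count deliveries only; the handler in the real test may differ.
        nonlocal handled
        handled += 1

    signal.signal(signal.SIGUSR1, handler)
    iterations = 0
    for _ in range(n):
        signal.raise_signal(signal.SIGUSR1)  # same idea as raise(3) in C
        iterations += 1
    return iterations, handled


if __name__ == "__main__":
    print(run_iterations(1000))
```

The interpreter overhead makes this unsuitable for measurement; it only shows the shape of one iteration. In the profile, the equivalent C loop spends its time under raise (rt_sigprocmask and tgkill), in signal delivery before the handler, and in rt_sigreturn afterwards.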
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/thread/100%/debian-10.4-x86_64-20200603.cgz/lkp-hsw-4ex1/signal1/will-it-scale/0x16
commit:
cabdc3a847 ("sched,x86: Don't use cluster topology for x86 hybrid CPUs")
6c3118c321 ("signal: Skip the altstack update when not needed")
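The title of the second commit says the altstack update is skipped when it is not needed. As a rough user-space model of that idea (this is not the kernel code, and the class and field names below are hypothetical), an update can compare the requested settings with the stored ones and return before taking the lock when nothing would change. The sketch assumes only the owning thread ever writes these fields, so the unlocked comparison is safe.

```python
import threading


class AltStackState:
    """User-space model of per-task alternate-stack settings (hypothetical)."""

    def __init__(self, sp=0, size=0, flags=0):
        self.sp, self.size, self.flags = sp, size, flags
        self.lock = threading.Lock()  # stands in for the contended lock
        self.lock_acquisitions = 0    # instrumentation for this sketch only

    def update(self, sp, size, flags):
        """Store new settings; return False if the update was skipped."""
        # Fast path: nothing would change, so do not touch the lock at all.
        if (self.sp, self.size, self.flags) == (sp, size, flags):
            return False
        with self.lock:
            self.lock_acquisitions += 1
            self.sp, self.size, self.flags = sp, size, flags
        return True


if __name__ == "__main__":
    state = AltStackState()
    print(state.update(0x7000, 8192, 0))  # real change: takes the lock
    print(state.update(0x7000, 8192, 0))  # same values again: skipped
    print(state.lock_acquisitions)
```

A signal handler that never changes its alternate stack would hit the fast path on every return from the handler, which is consistent with the do_sigaltstack rows dropping to 0.00 in the profile below.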
cabdc3a8475b918e 6c3118c32129b4197999a8928ba
---------------- ---------------------------
(columns: base value ± %stddev | %change or absolute delta | new value ± %stddev | metric)
252712 +13.2% 286175 will-it-scale.144.threads
1754 +13.2% 1987 will-it-scale.per_thread_ops
252712 +13.2% 286175 will-it-scale.workload
45399 ± 49% +65.3% 75024 ± 5% numa-numastat.node1.other_node
1461 ± 31% -43.9% 820.50 ± 25% numa-meminfo.node1.Active
1461 ± 31% -43.9% 820.50 ± 25% numa-meminfo.node1.Active(anon)
2010729 ± 44% -99.8% 3939 ± 96% numa-meminfo.node2.FilePages
2006836 ± 44% -100.0% 135.17 ± 88% numa-meminfo.node2.Unevictable
361.83 ± 30% -43.3% 205.00 ± 26% numa-vmstat.node1.nr_active_anon
361.83 ± 30% -43.3% 205.00 ± 26% numa-vmstat.node1.nr_zone_active_anon
67182 ± 32% +43.4% 96361 ± 4% numa-vmstat.node1.numa_other
502682 ± 44% -99.8% 984.83 ± 96% numa-vmstat.node2.nr_file_pages
501709 ± 44% -100.0% 33.33 ± 89% numa-vmstat.node2.nr_unevictable
501709 ± 44% -100.0% 33.33 ± 89% numa-vmstat.node2.nr_zone_unevictable
30244982 -3.0% 29346668 perf-stat.i.cache-references
1689 +1.2% 1709 perf-stat.i.context-switches
6.01e+08 ± 2% +9.4% 6.572e+08 perf-stat.i.dTLB-stores
0.33 -3.3% 0.32 perf-stat.overall.MPKI
1.104e+08 -11.6% 97613125 perf-stat.overall.path-length
30288439 -2.7% 29479951 perf-stat.ps.cache-references
1680 +1.3% 1701 perf-stat.ps.context-switches
5.995e+08 ± 2% +9.4% 6.557e+08 perf-stat.ps.dTLB-stores
10120822 ± 3% -9.2e+05 9195947 ± 2% syscalls.sys_getpid.noise.2%
9318091 ± 4% -1.1e+06 8255028 ± 3% syscalls.sys_getpid.noise.25%
10007277 ± 3% -9.4e+05 9071322 ± 2% syscalls.sys_getpid.noise.5%
10229292 ± 3% -7.6e+05 9466206 ± 3% syscalls.sys_gettid.noise.2%
9734831 ± 3% -8e+05 8938625 ± 3% syscalls.sys_gettid.noise.25%
10103340 ± 3% -7.8e+05 9322792 ± 3% syscalls.sys_gettid.noise.5%
1.597e+09 ± 9% -7.3e+08 8.641e+08 ± 22% syscalls.sys_rt_sigprocmask.noise.2%
4.123e+08 ± 41% -3.6e+08 52527108 ± 85% syscalls.sys_rt_sigprocmask.noise.25%
1.552e+09 ± 9% -7.8e+08 7.753e+08 ± 27% syscalls.sys_rt_sigprocmask.noise.5%
349534 -10.1% 314361 ± 4% syscalls.sys_tgkill.max
1.551e+09 ± 6% -6.2e+08 9.279e+08 ± 20% syscalls.sys_tgkill.noise.2%
3.251e+08 ± 35% -2.6e+08 66880605 ± 79% syscalls.sys_tgkill.noise.25%
1.503e+09 ± 7% -6.6e+08 8.453e+08 ± 24% syscalls.sys_tgkill.noise.5%
12.27 -12.3 0.00 perf-profile.calltrace.cycles-pp.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
12.24 -12.2 0.00 perf-profile.calltrace.cycles-pp.restore_altstack.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
12.23 -12.2 0.00 perf-profile.calltrace.cycles-pp.do_sigaltstack.restore_altstack.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
12.14 -12.1 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.do_sigaltstack.restore_altstack.__x64_sys_rt_sigreturn.do_syscall_64
12.11 -12.1 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.do_sigaltstack.restore_altstack.__x64_sys_rt_sigreturn
72.76 -4.8 67.94 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
72.78 -4.8 67.96 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.raise
72.97 -4.8 68.18 perf-profile.calltrace.cycles-pp.raise
15.95 +1.1 17.02 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.__lock_task_sighand.do_send_sig_info.do_send_specific
15.99 +1.1 17.06 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.__lock_task_sighand.do_send_sig_info.do_send_specific.do_tkill
15.99 +1.1 17.06 perf-profile.calltrace.cycles-pp.__lock_task_sighand.do_send_sig_info.do_send_specific.do_tkill.__x64_sys_tgkill
16.22 +1.1 17.33 perf-profile.calltrace.cycles-pp.do_send_sig_info.do_send_specific.do_tkill.__x64_sys_tgkill.do_syscall_64
16.30 +1.1 17.40 perf-profile.calltrace.cycles-pp.__x64_sys_tgkill.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
16.28 +1.1 17.39 perf-profile.calltrace.cycles-pp.do_send_specific.do_tkill.__x64_sys_tgkill.do_syscall_64.entry_SYSCALL_64_after_hwframe
16.30 +1.1 17.40 perf-profile.calltrace.cycles-pp.do_tkill.__x64_sys_tgkill.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
14.11 +1.9 16.01 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.__set_current_blocked.signal_setup_done.arch_do_signal_or_restart
14.14 +1.9 16.04 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.__set_current_blocked.signal_setup_done.arch_do_signal_or_restart.exit_to_user_mode_prepare
14.27 +1.9 16.20 perf-profile.calltrace.cycles-pp.signal_setup_done.arch_do_signal_or_restart.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
14.26 +1.9 16.19 perf-profile.calltrace.cycles-pp.__set_current_blocked.signal_setup_done.arch_do_signal_or_restart.exit_to_user_mode_prepare.syscall_exit_to_user_mode
14.29 +1.9 16.22 perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.handler
14.34 +1.9 16.28 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.handler
14.53 +1.9 16.46 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.get_signal.arch_do_signal_or_restart.exit_to_user_mode_prepare
14.34 +1.9 16.28 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.handler
14.34 +1.9 16.28 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.handler
14.56 +1.9 16.50 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.get_signal.arch_do_signal_or_restart.exit_to_user_mode_prepare.syscall_exit_to_user_mode
14.40 +1.9 16.35 perf-profile.calltrace.cycles-pp.handler
14.88 +2.0 16.84 perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
14.74 +2.0 16.72 perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
15.26 +2.0 17.30 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
12.25 +2.8 15.03 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.__set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64
12.28 +2.8 15.07 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.__set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
12.39 +2.8 15.22 perf-profile.calltrace.cycles-pp.__set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe.__restore_rt
12.42 +2.8 15.25 perf-profile.calltrace.cycles-pp.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe.__restore_rt
12.46 +2.8 15.30 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__restore_rt
12.46 +2.8 15.30 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__restore_rt
12.47 +2.8 15.31 perf-profile.calltrace.cycles-pp.__restore_rt
29.10 +3.9 33.02 perf-profile.calltrace.cycles-pp.arch_do_signal_or_restart.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
28.35 +4.2 32.52 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.__set_current_blocked.sigprocmask.__x64_sys_rt_sigprocmask
28.42 +4.2 32.61 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.__set_current_blocked.sigprocmask.__x64_sys_rt_sigprocmask.do_syscall_64
28.69 +4.2 32.92 perf-profile.calltrace.cycles-pp.sigprocmask.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
28.68 +4.2 32.90 perf-profile.calltrace.cycles-pp.__set_current_blocked.sigprocmask.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe
28.72 +4.2 32.96 perf-profile.calltrace.cycles-pp.__x64_sys_rt_sigprocmask.do_syscall_64.entry_SYSCALL_64_after_hwframe.raise
12.24 -12.2 0.00 perf-profile.children.cycles-pp.restore_altstack
12.24 -12.2 0.00 perf-profile.children.cycles-pp.do_sigaltstack
24.70 -9.4 15.30 perf-profile.children.cycles-pp.__x64_sys_rt_sigreturn
72.99 -4.8 68.20 perf-profile.children.cycles-pp.raise
81.54 -1.3 80.22 perf-profile.children.cycles-pp._raw_spin_lock_irq
97.30 -0.3 97.04 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
99.66 -0.1 99.61 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
99.64 -0.1 99.59 perf-profile.children.cycles-pp.do_syscall_64
0.05 +0.0 0.06 perf-profile.children.cycles-pp.restore_sigcontext
0.06 +0.0 0.07 perf-profile.children.cycles-pp.__setup_rt_frame
0.11 +0.0 0.12 ± 3% perf-profile.children.cycles-pp.__send_signal
0.09 ± 5% +0.0 0.11 ± 3% perf-profile.children.cycles-pp.__rb_reserve_next
0.11 ± 4% +0.0 0.12 ± 3% perf-profile.children.cycles-pp.__entry_text_start
0.14 ± 3% +0.0 0.16 ± 3% perf-profile.children.cycles-pp.recalc_sigpending
0.13 ± 2% +0.0 0.15 ± 3% perf-profile.children.cycles-pp.__set_task_blocked
0.13 ± 3% +0.0 0.16 ± 3% perf-profile.children.cycles-pp.ring_buffer_lock_reserve
0.15 +0.0 0.17 ± 4% perf-profile.children.cycles-pp.ftrace_syscall_exit
0.18 ± 2% +0.0 0.20 ± 2% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
0.17 ± 2% +0.0 0.20 ± 3% perf-profile.children.cycles-pp.trace_buffer_lock_reserve
0.20 ± 3% +0.0 0.24 ± 2% perf-profile.children.cycles-pp.syscall_trace_enter
0.18 ± 3% +0.0 0.21 ± 3% perf-profile.children.cycles-pp.ftrace_syscall_enter
16.00 +1.1 17.07 perf-profile.children.cycles-pp._raw_spin_lock_irqsave
15.99 +1.1 17.06 perf-profile.children.cycles-pp.__lock_task_sighand
16.23 +1.1 17.33 perf-profile.children.cycles-pp.do_send_sig_info
16.30 +1.1 17.40 perf-profile.children.cycles-pp.__x64_sys_tgkill
16.28 +1.1 17.39 perf-profile.children.cycles-pp.do_send_specific
16.30 +1.1 17.40 perf-profile.children.cycles-pp.do_tkill
14.27 +1.9 16.20 perf-profile.children.cycles-pp.signal_setup_done
14.40 +1.9 16.35 perf-profile.children.cycles-pp.handler
14.75 +2.0 16.73 perf-profile.children.cycles-pp.get_signal
12.47 +2.8 15.31 perf-profile.children.cycles-pp.__restore_rt
29.16 +3.9 33.06 perf-profile.children.cycles-pp.exit_to_user_mode_prepare
29.10 +3.9 33.02 perf-profile.children.cycles-pp.arch_do_signal_or_restart
29.62 +4.0 33.58 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
28.69 +4.2 32.92 perf-profile.children.cycles-pp.sigprocmask
28.72 +4.2 32.96 perf-profile.children.cycles-pp.__x64_sys_rt_sigprocmask
55.35 +9.0 64.33 perf-profile.children.cycles-pp.__set_current_blocked
97.30 -0.3 97.04 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.10 ± 4% +0.0 0.11 perf-profile.self.cycles-pp.recalc_sigpending
0.28 +0.0 0.32 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
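The numbers in the table above can be cross-checked with a little arithmetic. The sketch below recomputes the headline %change from the will-it-scale.workload row and sums the native_queued_spin_lock_slowpath call-chain rows to compare against the perf-profile.children total. All values are copied from the table; the dictionary keys are shorthand labels chosen here, not perf symbols.

```python
def percent_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


# will-it-scale.workload, base commit vs. 6c3118c321
base_ops, new_ops = 252712, 286175

# cycles-pp spent in native_queued_spin_lock_slowpath, per call chain
slowpath_before = {
    "rt_sigprocmask": 28.35,
    "tgkill": 15.95,
    "get_signal": 14.53,
    "signal_setup_done": 14.11,
    "rt_sigreturn: __set_current_blocked": 12.25,
    "rt_sigreturn: do_sigaltstack": 12.11,
}
slowpath_after = {
    "rt_sigprocmask": 32.52,
    "tgkill": 17.02,
    "get_signal": 16.46,
    "signal_setup_done": 16.01,
    "rt_sigreturn: __set_current_blocked": 15.03,
}

if __name__ == "__main__":
    print(round(percent_change(base_ops, new_ops), 1))  # 13.2
    print(round(sum(slowpath_before.values()), 2))      # 97.3
    print(round(sum(slowpath_after.values()), 2))       # 97.04
```

Both sums match the perf-profile.children totals for native_queued_spin_lock_slowpath (97.30 and 97.04). In other words, the lock slow path still accounts for about 97% of cycles after the commit: the do_sigaltstack chain is gone and its share is spread across the remaining five chains, while throughput rises by 13.2%.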
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected]       Intel Corporation
Thanks,
Oliver Sang