2021-05-21 10:16:35

by kernel test robot

Subject: [kentry] 5c61d03b2b: will-it-scale.per_thread_ops -5.5% regression



Greetings,

FYI, we noticed a -5.5% regression of will-it-scale.per_thread_ops due to commit:


commit: 5c61d03b2b823992b0e8eba73d2be61947f00323 ("kentry: Simplify the common syscall API")
https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git x86/kentry


in testcase: will-it-scale
on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
with the following parameters:

nr_task: 50%
mode: thread
test: lseek2
cpufreq_governor: performance
ucode: 0x5003006

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based version of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
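
For orientation, the following is a minimal sketch of the kind of loop a thread-mode lseek testcase exercises. It is not the will-it-scale lseek2 source: the function names, the per-thread temporary file, and the fixed duration are assumptions made for illustration, and Python's interpreter lock means the absolute numbers are not comparable with the figures in this report.

```python
import os
import tempfile
import threading
import time


def lseek_worker(stop, counts, idx):
    """Seek to offset 0 in a loop and count iterations (hypothetical helper)."""
    # Assumption: each thread uses its own temporary file.
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        n = 0
        while not stop.is_set():
            os.lseek(fd, 0, os.SEEK_SET)  # one lseek() syscall per iteration
            n += 1
        counts[idx] = n


def run_lseek_threads(nr_threads=4, duration=0.2):
    """Run nr_threads workers for `duration` seconds.

    Returns (total_ops, ops_per_thread).
    """
    counts = [0] * nr_threads
    stop = threading.Event()
    threads = [
        threading.Thread(target=lseek_worker, args=(stop, counts, i))
        for i in range(nr_threads)
    ]
    for t in threads:
        t.start()
    time.sleep(duration)
    stop.set()
    for t in threads:
        t.join()
    total = sum(counts)
    return total, total / nr_threads


if __name__ == "__main__":
    total, per_thread = run_lseek_threads()
    print(f"total ops: {total}, ops per thread: {per_thread:.0f}")
```

Each iteration is dominated by syscall entry and exit, which is why a change to the common syscall path can show up directly in per_thread_ops.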



If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
bin/lkp run generated-yaml-file

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/thread/50%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp9/lseek2/will-it-scale/0x5003006

commit:
a27da10c7e ("x86/entry: Convert ret_from_fork to C")
5c61d03b2b ("kentry: Simplify the common syscall API")

a27da10c7ee27727 5c61d03b2b823992b0e8eba73d2
---------------- ---------------------------
%stddev %change %stddev
\ | \
3.22e+08 -5.5% 3.042e+08 will-it-scale.44.threads
7318218 -5.5% 6912604 will-it-scale.per_thread_ops
3.22e+08 -5.5% 3.042e+08 will-it-scale.workload
273.17 -1.5% 269.04 turbostat.PkgWatt
1474 ± 7% +14.8% 1692 ± 7% slabinfo.dmaengine-unmap-16.active_objs
1474 ± 7% +14.8% 1692 ± 7% slabinfo.dmaengine-unmap-16.num_objs
1488 ± 6% -20.3% 1187 ± 16% interrupts.CPU10.CAL:Function_call_interrupts
183.33 ± 37% -68.8% 57.17 ± 61% interrupts.CPU53.RES:Rescheduling_interrupts
1208 ± 13% -18.7% 982.33 ± 13% interrupts.CPU7.CAL:Function_call_interrupts
602.33 ± 22% -49.3% 305.50 ± 47% interrupts.CPU84.TLB:TLB_shootdowns
6010 ± 20% -34.8% 3918 ± 39% interrupts.CPU87.NMI:Non-maskable_interrupts
6010 ± 20% -34.8% 3918 ± 39% interrupts.CPU87.PMI:Performance_monitoring_interrupts
10028 ± 18% -26.4% 7379 ± 16% softirqs.CPU44.SCHED
14260 ± 12% -21.1% 11254 ± 10% softirqs.CPU47.RCU
15354 ± 12% -24.3% 11624 ± 13% softirqs.CPU53.RCU
19973 ± 45% +80.6% 36071 ± 11% softirqs.CPU53.SCHED
15299 ± 10% -21.3% 12044 ± 17% softirqs.CPU69.RCU
15325 ± 17% -23.8% 11671 ± 11% softirqs.CPU70.RCU
24871 ± 36% -64.6% 8796 ± 45% softirqs.CPU9.SCHED
0.01 ± 19% -100.0% 0.00 perf-sched.sch_delay.max.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.03 ± 8% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
518.50 ± 10% -100.0% 0.00 perf-sched.wait_and_delay.count.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.20 ±102% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.03 ± 26% -100.0% 0.00 perf-sched.wait_time.avg.ms.exit_to_user_mode_prepare.kentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
0.03 ± 8% -100.0% 0.00 perf-sched.wait_time.avg.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.19 ± 88% -100.0% 0.00 perf-sched.wait_time.max.ms.exit_to_user_mode_prepare.kentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]
1.20 ±102% -100.0% 0.00 perf-sched.wait_time.max.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.03 ± 2% +4.5% 0.03 ± 2% perf-stat.i.MPKI
3.195e+10 -5.6% 3.016e+10 perf-stat.i.branch-instructions
2.367e+08 ± 3% -7.8% 2.182e+08 perf-stat.i.branch-misses
13.98 -0.4 13.59 perf-stat.i.cache-miss-rate%
0.86 +4.1% 0.90 perf-stat.i.cpi
4.292e+10 -3.6% 4.138e+10 perf-stat.i.dTLB-loads
0.00 -0.0 0.00 perf-stat.i.dTLB-store-miss-rate%
46888 -7.1% 43557 ± 2% perf-stat.i.dTLB-store-misses
2.867e+10 -2.4% 2.799e+10 perf-stat.i.dTLB-stores
2.247e+08 ± 3% -6.8% 2.094e+08 perf-stat.i.iTLB-load-misses
1.434e+11 -3.9% 1.378e+11 perf-stat.i.instructions
1.16 -3.9% 1.12 perf-stat.i.ipc
1176 -3.9% 1131 perf-stat.i.metric.M/sec
14.27 -0.4 13.89 perf-stat.overall.cache-miss-rate%
0.86 +4.1% 0.90 perf-stat.overall.cpi
0.00 -0.0 0.00 ± 2% perf-stat.overall.dTLB-store-miss-rate%
1.16 -3.9% 1.12 perf-stat.overall.ipc
133998 +1.8% 136413 perf-stat.overall.path-length
3.184e+10 -5.6% 3.006e+10 perf-stat.ps.branch-instructions
2.36e+08 ± 3% -7.8% 2.176e+08 perf-stat.ps.branch-misses
4.278e+10 -3.6% 4.124e+10 perf-stat.ps.dTLB-loads
46753 -7.1% 43435 ± 2% perf-stat.ps.dTLB-store-misses
2.857e+10 -2.4% 2.789e+10 perf-stat.ps.dTLB-stores
2.24e+08 ± 3% -6.8% 2.087e+08 perf-stat.ps.iTLB-load-misses
1.429e+11 -3.9% 1.373e+11 perf-stat.ps.instructions
4.315e+13 -3.8% 4.149e+13 perf-stat.total.instructions
1.47 ± 12% -0.8 0.72 ± 18% perf-profile.calltrace.cycles-pp.shmem_file_llseek.ksys_lseek.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_lseek64
1.32 ± 10% -0.6 0.67 ± 13% perf-profile.calltrace.cycles-pp.___might_sleep.mutex_lock.__fdget_pos.ksys_lseek.do_syscall_64
0.92 ± 11% -0.2 0.70 ± 11% perf-profile.calltrace.cycles-pp.generic_file_llseek_size.ksys_lseek.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_lseek64
0.00 +0.8 0.85 ± 11% perf-profile.calltrace.cycles-pp.kentry_syscall_begin.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_lseek64
0.00 +1.7 1.71 ± 11% perf-profile.calltrace.cycles-pp.kentry_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_lseek64
7.02 ± 11% +3.5 10.52 ± 9% perf-profile.calltrace.cycles-pp.__fget_light.__fdget_pos.ksys_lseek.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.49 ± 12% -0.8 0.73 ± 18% perf-profile.children.cycles-pp.shmem_file_llseek
1.32 ± 10% -0.6 0.67 ± 13% perf-profile.children.cycles-pp.___might_sleep
0.95 ± 11% -0.2 0.72 ± 12% perf-profile.children.cycles-pp.generic_file_llseek_size
0.43 ± 12% -0.1 0.31 ± 10% perf-profile.children.cycles-pp.__f_unlock_pos
0.35 ± 13% +0.1 0.47 ± 12% perf-profile.children.cycles-pp.__x64_sys_lseek
0.00 +0.3 0.33 ± 11% perf-profile.children.cycles-pp.kentry_enter_from_user_mode
0.00 +0.5 0.50 ± 13% perf-profile.children.cycles-pp.kentry_syscall_end
0.00 +0.9 0.85 ± 11% perf-profile.children.cycles-pp.kentry_syscall_begin
0.00 +1.7 1.71 ± 11% perf-profile.children.cycles-pp.kentry_exit_to_user_mode
7.27 ± 11% +3.6 10.85 ± 9% perf-profile.children.cycles-pp.__fget_light
1.46 ± 12% -0.7 0.72 ± 18% perf-profile.self.cycles-pp.shmem_file_llseek
1.31 ± 10% -0.6 0.67 ± 12% perf-profile.self.cycles-pp.___might_sleep
0.94 ± 11% -0.2 0.71 ± 11% perf-profile.self.cycles-pp.generic_file_llseek_size
0.22 ± 12% -0.1 0.15 ± 9% perf-profile.self.cycles-pp.__f_unlock_pos
0.35 ± 12% +0.1 0.47 ± 12% perf-profile.self.cycles-pp.__x64_sys_lseek
0.17 ± 12% +0.2 0.33 ± 12% perf-profile.self.cycles-pp.rcu_nocb_flush_deferred_wakeup
0.00 +0.2 0.17 ± 11% perf-profile.self.cycles-pp.kentry_enter_from_user_mode
0.00 +0.5 0.49 ± 12% perf-profile.self.cycles-pp.kentry_syscall_end
0.00 +0.7 0.68 ± 10% perf-profile.self.cycles-pp.kentry_syscall_begin
0.00 +1.2 1.21 ± 11% perf-profile.self.cycles-pp.kentry_exit_to_user_mode
1.25 ± 12% +1.4 2.62 ± 10% perf-profile.self.cycles-pp.do_syscall_64
1.11 ± 10% +2.6 3.76 ± 11% perf-profile.self.cycles-pp.__fget_light
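
As a cross-check on the headline rows, the sketch below recomputes a few figures from the values printed in the comparison table. The helper names are hypothetical, and two relationships are inferred from the numbers rather than stated by the report: that per_thread_ops is workload divided by the 44 threads, and that perf-stat.overall.path-length is total instructions divided by workload. Inputs are the rounded table values, so results agree only approximately.

```python
def pct_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def ratio(numerator, denominator):
    """Plain division, named for readability in the checks below."""
    return numerator / denominator


# will-it-scale.per_thread_ops: parent commit a27da10c7e vs. 5c61d03b2b
per_thread_delta = pct_change(7318218, 6912604)

# Assumption: per_thread_ops ~= workload / number of threads (44 here)
per_thread_base = ratio(3.22e8, 44)
per_thread_new = ratio(3.042e8, 44)

# Assumption: path-length ~= perf-stat.total.instructions / workload
path_len_base = ratio(4.315e13, 3.22e8)
path_len_new = ratio(4.149e13, 3.042e8)

if __name__ == "__main__":
    print(f"per_thread_ops change: {per_thread_delta:.2f}%")
    print(f"per-thread ops (derived): {per_thread_base:.0f} -> {per_thread_new:.0f}")
    print(f"path length (derived): {path_len_base:.0f} -> {path_len_new:.0f}")
    print(f"path length change: {pct_change(path_len_base, path_len_new):.2f}%")
```

Read this way, the workload completes about 5.5% fewer operations while executing slightly more instructions per operation, which matches the new kentry_* symbols appearing in the profile.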



[ASCII plots not reproduced: will-it-scale.44.threads, will-it-scale.per_thread_ops, will-it-scale.workload, and four further plots whose titles were lost. In the three titled plots, the "O" samples cluster near the bottom of the y-axis range, below the other samples.]


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (21.32 kB)
config-5.13.0-rc1-00237-g5c61d03b2b82 (176.78 kB)
job-script (8.00 kB)
job.yaml (5.53 kB)
reproduce (347.00 B)