Greetings,
FYI, we noticed a -14.9% regression of will-it-scale.per_thread_ops due to commit:
commit: a4a118f2eead1d6c49e00765de89878288d4b890 ("hugetlbfs: flush TLBs correctly after huge_pmd_unshare")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: will-it-scale
on test machine: 104 threads 2 sockets Skylake with 192G memory
with the following parameters:
nr_task: 100%
mode: thread
test: context_switch1
cpufreq_governor: performance
ucode: 0x2006a0a
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based version of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/thread/100%/debian-10.4-x86_64-20200603.cgz/lkp-skl-fpga01/context_switch1/will-it-scale/0x2006a0a
commit:
v5.16-rc2
a4a118f2ee ("hugetlbfs: flush TLBs correctly after huge_pmd_unshare")
v5.16-rc2 a4a118f2eead1d6c49e00765de8
---------------- ---------------------------
%stddev %change %stddev
\ | \
22094930 -14.9% 18801170 will-it-scale.104.threads
212450 -14.9% 180780 will-it-scale.per_thread_ops
22094930 -14.9% 18801170 will-it-scale.workload
104.51 +6.4% 111.15 turbostat.RAMWatt
21864416 -14.9% 18613340 vmstat.system.cs
1.61 ± 14% +42.6% 2.29 ± 11% perf-stat.i.MPKI
3.726e+10 -13.5% 3.224e+10 perf-stat.i.branch-instructions
5.173e+08 -14.1% 4.441e+08 perf-stat.i.branch-misses
1.71 ± 14% +8.5 10.23 ± 7% perf-stat.i.cache-miss-rate%
4566699 ± 12% +689.0% 36029296 ± 4% perf-stat.i.cache-misses
22042272 -14.9% 18767811 perf-stat.i.context-switches
1.52 +16.1% 1.76 perf-stat.i.cpi
170640 ± 18% -95.0% 8502 ± 4% perf-stat.i.cycles-between-cache-misses
44430650 -14.6% 37926361 perf-stat.i.dTLB-load-misses
5.32e+10 -13.6% 4.594e+10 perf-stat.i.dTLB-loads
0.00 ± 4% +0.0 0.00 ± 10% perf-stat.i.dTLB-store-miss-rate%
3.23e+10 -13.7% 2.786e+10 perf-stat.i.dTLB-stores
68025283 -21.9% 53120420 ± 2% perf-stat.i.iTLB-load-misses
1.836e+11 -13.5% 1.589e+11 perf-stat.i.instructions
2820 +9.5% 3089 ± 2% perf-stat.i.instructions-per-iTLB-miss
0.66 -13.2% 0.57 perf-stat.i.ipc
1183 -13.5% 1023 perf-stat.i.metric.M/sec
274656 ± 40% +535.1% 1744238 ± 8% perf-stat.i.node-load-misses
1.59 ± 13% +41.3% 2.25 ± 11% perf-stat.overall.MPKI
1.59 ± 16% +8.6 10.18 ± 8% perf-stat.overall.cache-miss-rate%
1.51 +15.3% 1.74 perf-stat.overall.cpi
61473 ± 10% -87.5% 7707 ± 4% perf-stat.overall.cycles-between-cache-misses
0.08 -0.0 0.08 perf-stat.overall.dTLB-load-miss-rate%
0.00 ± 4% +0.0 0.00 ± 11% perf-stat.overall.dTLB-store-miss-rate%
2700 ± 2% +10.8% 2992 ± 2% perf-stat.overall.instructions-per-iTLB-miss
0.66 -13.3% 0.57 perf-stat.overall.ipc
32.91 ± 37% +37.6 70.48 ± 5% perf-stat.overall.node-load-miss-rate%
2504472 +1.7% 2546759 perf-stat.overall.path-length
3.714e+10 -13.5% 3.214e+10 perf-stat.ps.branch-instructions
5.156e+08 -14.1% 4.427e+08 perf-stat.ps.branch-misses
4556813 ± 12% +687.7% 35896229 ± 4% perf-stat.ps.cache-misses
21967784 -14.8% 18706255 perf-stat.ps.context-switches
44284414 -14.6% 37805127 perf-stat.ps.dTLB-load-misses
5.302e+10 -13.6% 4.58e+10 perf-stat.ps.dTLB-loads
3.219e+10 -13.7% 2.777e+10 perf-stat.ps.dTLB-stores
67799006 -21.9% 52946940 ± 2% perf-stat.ps.iTLB-load-misses
1.83e+11 -13.5% 1.584e+11 perf-stat.ps.instructions
274060 ± 40% +534.0% 1737650 ± 8% perf-stat.ps.node-load-misses
5.534e+13 -13.5% 4.788e+13 perf-stat.total.instructions
29.33 -0.8 28.53 perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_write.new_sync_write.vfs_write.ksys_write
28.26 -0.8 27.48 perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write
28.70 -0.8 27.93 perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_write.new_sync_write.vfs_write
28.51 -0.8 27.76 perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write.new_sync_write
32.31 -0.5 31.76 perf-profile.calltrace.cycles-pp.pipe_write.new_sync_write.vfs_write.ksys_write.do_syscall_64
33.10 -0.5 32.56 perf-profile.calltrace.cycles-pp.new_sync_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
12.74 -0.5 12.20 perf-profile.calltrace.cycles-pp.dequeue_task_fair.__schedule.schedule.pipe_read.new_sync_read
14.03 -0.5 13.50 perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
13.95 -0.5 13.42 perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
34.07 -0.4 33.64 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
1.04 ± 2% +0.1 1.16 ± 3% perf-profile.calltrace.cycles-pp.copy_page_to_iter.pipe_read.new_sync_read.vfs_read.ksys_read
0.68 ± 4% +0.1 0.81 ± 6% perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.82 +0.2 1.04 ± 2% perf-profile.calltrace.cycles-pp.atime_needs_update.touch_atime.pipe_read.new_sync_read.vfs_read
1.00 +0.3 1.32 ± 3% perf-profile.calltrace.cycles-pp.touch_atime.pipe_read.new_sync_read.vfs_read.ksys_read
37.78 +0.3 38.13 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
38.34 +0.4 38.74 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read
1.38 ± 3% -0.9 0.51 ± 5% perf-profile.children.cycles-pp.__task_pid_nr_ns
1.53 ± 3% -0.9 0.67 ± 5% perf-profile.children.cycles-pp.perf_event_pid_type
2.36 ± 3% -0.8 1.55 ± 4% perf-profile.children.cycles-pp.__perf_event_header__init_id
29.35 -0.8 28.55 perf-profile.children.cycles-pp.__wake_up_common_lock
28.28 -0.8 27.50 perf-profile.children.cycles-pp.try_to_wake_up
28.70 -0.8 27.94 perf-profile.children.cycles-pp.__wake_up_common
28.52 -0.8 27.77 perf-profile.children.cycles-pp.autoremove_wake_function
33.12 -0.5 32.58 perf-profile.children.cycles-pp.new_sync_write
32.35 -0.5 31.80 perf-profile.children.cycles-pp.pipe_write
12.75 -0.5 12.21 perf-profile.children.cycles-pp.dequeue_task_fair
13.96 -0.5 13.43 perf-profile.children.cycles-pp.enqueue_task_fair
14.03 -0.5 13.50 perf-profile.children.cycles-pp.ttwu_do_activate
34.08 -0.4 33.66 perf-profile.children.cycles-pp.vfs_write
0.12 ± 5% -0.0 0.08 ± 3% perf-profile.children.cycles-pp.fput
0.12 ± 3% -0.0 0.10 ± 6% perf-profile.children.cycles-pp.child
0.37 ± 2% -0.0 0.35 ± 2% perf-profile.children.cycles-pp.tick_sched_handle
0.10 ± 5% +0.0 0.12 ± 5% perf-profile.children.cycles-pp.__list_add_valid
0.20 ± 3% +0.0 0.23 ± 4% perf-profile.children.cycles-pp.make_kgid
0.09 ± 6% +0.0 0.12 ± 3% perf-profile.children.cycles-pp.clear_buddies
0.13 ± 5% +0.0 0.17 ± 5% perf-profile.children.cycles-pp.local_clock
0.05 ± 5% +0.0 0.08 ± 7% perf-profile.children.cycles-pp.rb_insert_color
0.11 ± 4% +0.0 0.14 ± 3% perf-profile.children.cycles-pp.check_cfs_rq_runtime
0.28 ± 3% +0.0 0.31 ± 3% perf-profile.children.cycles-pp.map_id_range_down
0.48 ± 3% +0.0 0.53 ± 3% perf-profile.children.cycles-pp.__might_sleep
0.35 ± 4% +0.1 0.40 ± 3% perf-profile.children.cycles-pp.__might_fault
0.83 ± 2% +0.1 0.88 ± 2% perf-profile.children.cycles-pp.set_next_entity
0.00 +0.1 0.06 ± 6% perf-profile.children.cycles-pp.default_wake_function
0.51 ± 3% +0.1 0.62 ± 3% perf-profile.children.cycles-pp.pick_next_entity
0.15 ± 6% +0.1 0.26 ± 9% perf-profile.children.cycles-pp.timestamp_truncate
0.40 ± 7% +0.1 0.52 ± 10% perf-profile.children.cycles-pp.file_update_time
1.07 ± 2% +0.1 1.19 ± 2% perf-profile.children.cycles-pp.copy_page_to_iter
0.00 +0.1 0.12 ± 34% perf-profile.children.cycles-pp.__mark_inode_dirty
0.00 +0.1 0.12 ± 32% perf-profile.children.cycles-pp.generic_update_time
1.28 ± 3% +0.2 1.45 ± 4% perf-profile.children.cycles-pp.security_file_permission
0.86 +0.2 1.06 ± 2% perf-profile.children.cycles-pp.atime_needs_update
2.51 +0.3 2.78 ± 2% perf-profile.children.cycles-pp.pick_next_task_fair
1.00 +0.3 1.32 ± 3% perf-profile.children.cycles-pp.touch_atime
1.37 ± 3% -0.9 0.50 ± 5% perf-profile.self.cycles-pp.__task_pid_nr_ns
1.23 ± 4% -0.4 0.86 ± 6% perf-profile.self.cycles-pp.update_curr
0.32 ± 3% -0.0 0.27 ± 3% perf-profile.self.cycles-pp.schedule
0.20 ± 6% -0.0 0.16 ± 7% perf-profile.self.cycles-pp.current_time
0.12 ± 2% -0.0 0.10 ± 6% perf-profile.self.cycles-pp.child
0.06 ± 6% +0.0 0.07 ± 5% perf-profile.self.cycles-pp.__might_fault
0.13 ± 3% +0.0 0.14 ± 3% perf-profile.self.cycles-pp.__cond_resched
0.12 ± 3% +0.0 0.14 ± 4% perf-profile.self.cycles-pp.put_prev_entity
0.12 ± 4% +0.0 0.14 ± 6% perf-profile.self.cycles-pp.touch_atime
0.08 ± 6% +0.0 0.10 ± 3% perf-profile.self.cycles-pp.clear_buddies
0.17 ± 4% +0.0 0.20 ± 6% perf-profile.self.cycles-pp.ksys_write
0.06 ± 7% +0.0 0.09 ± 4% perf-profile.self.cycles-pp.check_cfs_rq_runtime
0.05 +0.0 0.08 ± 7% perf-profile.self.cycles-pp.rb_insert_color
0.26 ± 4% +0.0 0.29 ± 3% perf-profile.self.cycles-pp.map_id_range_down
0.12 ± 7% +0.0 0.16 ± 5% perf-profile.self.cycles-pp.local_clock
0.17 ± 3% +0.0 0.21 ± 3% perf-profile.self.cycles-pp.set_next_entity
0.41 ± 4% +0.0 0.45 ± 3% perf-profile.self.cycles-pp.__might_sleep
0.00 +0.1 0.06 ± 5% perf-profile.self.cycles-pp.default_wake_function
0.37 ± 3% +0.1 0.46 ± 13% perf-profile.self.cycles-pp.vfs_write
0.39 ± 4% +0.1 0.49 ± 6% perf-profile.self.cycles-pp.new_sync_read
0.38 ± 3% +0.1 0.48 ± 3% perf-profile.self.cycles-pp.pick_next_entity
0.14 ± 7% +0.1 0.25 ± 10% perf-profile.self.cycles-pp.timestamp_truncate
0.00 +0.1 0.12 ± 34% perf-profile.self.cycles-pp.__mark_inode_dirty
0.86 ± 3% +0.1 1.00 ± 4% perf-profile.self.cycles-pp.pipe_write
0.36 ± 4% +0.1 0.50 ± 8% perf-profile.self.cycles-pp.vfs_read
0.26 ± 5% +0.1 0.41 ± 5% perf-profile.self.cycles-pp.atime_needs_update
0.19 ± 11% +0.2 0.35 ± 14% perf-profile.self.cycles-pp.security_file_permission
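For readers who want to sanity-check the %change column, here is a minimal Python sketch. It is not part of lkp-tests; the helper names `pct_change` and `point_delta` are made up for illustration. It assumes that rows with a trailing "%" in the change column are relative changes against the v5.16-rc2 baseline, and that rows without it (the rate and cycles-pp metrics) are plain differences between the two columns, which is what the numbers in the table appear to show.

```python
# Sketch only: recompute a few "%change" entries from the comparison table.
# pct_change and point_delta are hypothetical helper names, not lkp-tests APIs.

def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def point_delta(base: float, new: float) -> float:
    """Plain difference, as the rate and cycles-pp rows appear to use."""
    return new - base


# (baseline v5.16-rc2, a4a118f2ee) pairs copied from the table above.
relative_rows = {
    "will-it-scale.per_thread_ops": (212450, 180780),
    "will-it-scale.workload": (22094930, 18801170),
    "vmstat.system.cs": (21864416, 18613340),
    "perf-stat.i.cache-misses": (4566699, 36029296),
}
delta_rows = {
    "perf-stat.i.cache-miss-rate%": (1.71, 10.23),
    "perf-stat.overall.cache-miss-rate%": (1.59, 10.18),
}

for name, (base, new) in relative_rows.items():
    print(f"{pct_change(base, new):+8.1f}%  {name}")
for name, (base, new) in delta_rows.items():
    print(f"{point_delta(base, new):+8.1f}   {name}")
```

The recomputed figures only confirm the table's arithmetic; they say nothing about the run-to-run noise, which is what the ± columns report.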
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
Thanks,
Oliver Sang