Greeting,
FYI, we noticed a -27.4% regression of stress-ng.msg.ops_per_sec due to commit:
commit: 9223d0dccb8f8523754122f68316dd1a4f39f7f8 ("thermal: Move therm_throt there from x86/mce")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: stress-ng
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
with following parameters:
nr_threads: 10%
disk: 1HDD
testtime: 60s
fs: ext4
class: os
test: msg
cpufreq_governor: performance
ucode: 0x5003006
If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml
=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
os/gcc-9/performance/1HDD/ext4/x86_64-rhel-8.3/10%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp5/msg/stress-ng/60s/0x5003006
commit:
4f432e8bb1 ("x86/mce: Get rid of mcheck_intel_therm_init()")
9223d0dccb ("thermal: Move therm_throt there from x86/mce")
4f432e8bb15b352d 9223d0dccb8f8523754122f6831
---------------- ---------------------------
%stddev %change %stddev
\ | \
3.042e+08 ± 4% -27.4% 2.209e+08 ± 2% stress-ng.msg.ops
5069470 ± 4% -27.4% 3682272 ± 2% stress-ng.msg.ops_per_sec
2335 ± 8% -35.0% 1517 ± 18% stress-ng.time.involuntary_context_switches
1676 -4.9% 1594 stress-ng.time.percent_of_cpu_this_job_got
1019 -4.6% 973.25 stress-ng.time.system_time
20.97 ± 2% -19.3% 16.93 ± 2% stress-ng.time.user_time
1549846 ±159% +1738.5% 28494120 ± 16% stress-ng.time.voluntary_context_switches
2050707 ±134% +1531.7% 33461885 ± 21% cpuidle.POLL.time
1354230 ±171% +1585.0% 22819237 ± 19% cpuidle.POLL.usage
82.39 +1.2% 83.37 iostat.cpu.idle
17.20 -5.3% 16.29 iostat.cpu.system
0.04 ± 5% +0.0 0.05 ± 3% mpstat.cpu.all.soft%
0.37 ± 2% -0.1 0.30 ± 2% mpstat.cpu.all.usr%
1337633 ± 28% -65.8% 457460 ± 95% numa-numastat.node0.local_node
456832 ± 26% -67.5% 148621 ± 25% numa-numastat.node1.local_node
500231 ± 26% -64.1% 179602 ± 12% numa-numastat.node1.numa_hit
82.00 +1.2% 83.00 vmstat.cpu.id
49550 ±153% +1673.1% 878579 ± 16% vmstat.system.cs
197941 +26.4% 250229 ± 6% vmstat.system.in
41605 -1.4% 41020 proc-vmstat.nr_slab_unreclaimable
1907890 ± 15% -62.0% 724187 ± 63% proc-vmstat.numa_hit
1821249 ± 16% -65.0% 637560 ± 71% proc-vmstat.numa_local
1925858 ± 15% -61.7% 737459 ± 62% proc-vmstat.pgalloc_normal
1796629 ± 16% -66.1% 609149 ± 75% proc-vmstat.pgfree
9427 ± 3% -11.6% 8337 ± 2% slabinfo.kmalloc-512.active_objs
1182 ± 3% -11.5% 1045 ± 2% slabinfo.kmalloc-512.active_slabs
9459 ± 3% -11.5% 8370 ± 2% slabinfo.kmalloc-512.num_objs
1182 ± 3% -11.5% 1045 ± 2% slabinfo.kmalloc-512.num_slabs
82005 ± 7% -24.2% 62197 ± 5% slabinfo.kmalloc-64.active_objs
1282 ± 7% -24.1% 973.67 ± 5% slabinfo.kmalloc-64.active_slabs
82119 ± 7% -24.1% 62339 ± 5% slabinfo.kmalloc-64.num_objs
1282 ± 7% -24.1% 973.67 ± 5% slabinfo.kmalloc-64.num_slabs
8300 ± 11% +41.5% 11747 ± 10% softirqs.CPU21.SCHED
6707 ± 23% +39.9% 9384 ± 15% softirqs.CPU41.SCHED
8143 ± 17% +50.5% 12255 ± 27% softirqs.CPU56.SCHED
8052 ± 14% +82.7% 14708 ± 29% softirqs.CPU58.SCHED
8701 ± 19% +70.6% 14847 ± 23% softirqs.CPU6.SCHED
6902 ± 18% +50.7% 10401 ± 18% softirqs.CPU72.SCHED
8097 ± 14% +35.4% 10962 ± 17% softirqs.CPU76.SCHED
7938 ± 18% +24.3% 9864 ± 7% softirqs.CPU79.SCHED
839280 ± 4% +14.7% 963029 ± 2% softirqs.SCHED
248119 ± 63% +1378.0% 3667155 ± 28% interrupts.CAL:Function_call_interrupts
4.83 ±169% +5389.7% 265.33 ±166% interrupts.CPU12.RES:Rescheduling_interrupts
709.00 ± 31% +2004.3% 14919 ±120% interrupts.CPU28.CAL:Function_call_interrupts
873.33 ± 64% +7449.7% 65933 ±107% interrupts.CPU4.CAL:Function_call_interrupts
506.33 ± 3% +9102.5% 46595 ±158% interrupts.CPU43.CAL:Function_call_interrupts
27.33 ± 75% +750.6% 232.50 ± 99% interrupts.CPU48.RES:Rescheduling_interrupts
558.67 ± 19% +10960.8% 61792 ±145% interrupts.CPU50.CAL:Function_call_interrupts
706.00 ± 66% +3143.2% 22897 ±147% interrupts.CPU53.CAL:Function_call_interrupts
732.00 ± 44% +6764.9% 50251 ±131% interrupts.CPU56.CAL:Function_call_interrupts
1613 ± 84% +8272.0% 135082 ±130% interrupts.CPU58.CAL:Function_call_interrupts
839.00 ± 42% +23427.3% 197394 ± 68% interrupts.CPU6.CAL:Function_call_interrupts
11.17 ± 79% +4409.0% 503.50 ± 76% interrupts.CPU6.RES:Rescheduling_interrupts
717.00 ± 67% +1775.8% 13449 ±109% interrupts.CPU81.CAL:Function_call_interrupts
6.67 ±171% +1212.5% 87.50 ± 96% interrupts.CPU81.RES:Rescheduling_interrupts
1189 ± 20% +988.6% 12948 ± 11% interrupts.RES:Rescheduling_interrupts
12.68 ± 6% +20.0% 15.22 ± 2% perf-stat.i.MPKI
3.86e+09 ± 3% -9.9% 3.478e+09 ± 2% perf-stat.i.branch-instructions
0.67 ± 3% +0.2 0.84 perf-stat.i.branch-miss-rate%
24129751 +14.6% 27646121 ± 4% perf-stat.i.branch-misses
50806 ±154% +1684.2% 906487 ± 16% perf-stat.i.context-switches
2.58 ± 3% +12.6% 2.91 ± 2% perf-stat.i.cpi
4.686e+09 ± 2% -8.6% 4.282e+09 ± 2% perf-stat.i.dTLB-loads
2.615e+09 ± 2% -6.3% 2.452e+09 ± 3% perf-stat.i.dTLB-stores
83.94 +3.6 87.55 perf-stat.i.iTLB-load-miss-rate%
11626382 ± 5% +41.1% 16403258 ± 6% perf-stat.i.iTLB-load-misses
1.915e+10 ± 3% -10.8% 1.707e+10 ± 2% perf-stat.i.instructions
1660 ± 7% -34.9% 1080 ± 6% perf-stat.i.instructions-per-iTLB-miss
0.39 ± 3% -10.7% 0.35 ± 2% perf-stat.i.ipc
0.81 ± 6% -42.1% 0.47 ± 28% perf-stat.i.metric.K/sec
119.24 ± 2% -8.2% 109.49 ± 2% perf-stat.i.metric.M/sec
13094331 ± 11% -17.0% 10864787 ± 9% perf-stat.i.node-store-misses
12.83 ± 5% +20.0% 15.40 ± 3% perf-stat.overall.MPKI
0.63 ± 3% +0.2 0.79 ± 2% perf-stat.overall.branch-miss-rate%
2.60 ± 3% +12.9% 2.94 ± 2% perf-stat.overall.cpi
84.88 +3.9 88.76 perf-stat.overall.iTLB-load-miss-rate%
1653 ± 7% -36.8% 1044 ± 6% perf-stat.overall.instructions-per-iTLB-miss
0.38 ± 3% -11.5% 0.34 ± 2% perf-stat.overall.ipc
3.799e+09 ± 3% -9.9% 3.422e+09 ± 2% perf-stat.ps.branch-instructions
23742874 +14.6% 27204564 ± 4% perf-stat.ps.branch-misses
49922 ±154% +1687.0% 892147 ± 16% perf-stat.ps.context-switches
4.613e+09 ± 2% -8.6% 4.214e+09 ± 2% perf-stat.ps.dTLB-loads
2.574e+09 ± 2% -6.3% 2.413e+09 ± 3% perf-stat.ps.dTLB-stores
11442374 ± 5% +41.1% 16142526 ± 6% perf-stat.ps.iTLB-load-misses
1.884e+10 ± 3% -10.9% 1.68e+10 ± 2% perf-stat.ps.instructions
12879684 ± 11% -17.0% 10690222 ± 9% perf-stat.ps.node-store-misses
1.185e+12 ± 3% -10.3% 1.063e+12 ± 3% perf-stat.total.instructions
34.69 ± 20% -8.3 26.43 ± 3% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
6.46 ± 11% -1.7 4.79 ± 4% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.do_msgrcv.do_syscall_64.entry_SYSCALL_64_after_hwframe
5.42 ± 11% -1.3 4.14 ± 10% perf-profile.calltrace.cycles-pp.load_msg.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe
4.13 ± 11% -1.1 3.04 ± 12% perf-profile.calltrace.cycles-pp.__kmalloc.load_msg.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.33 ± 6% -0.7 1.60 ± 18% perf-profile.calltrace.cycles-pp.ipc_obtain_object_check.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.94 ± 14% -0.5 1.48 ± 16% perf-profile.calltrace.cycles-pp.store_msg.do_msg_fill.do_msgrcv.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.68 ± 13% -0.4 1.26 ± 5% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.68 ± 13% -0.4 1.26 ± 5% perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.67 ± 13% -0.4 1.26 ± 6% perf-profile.calltrace.cycles-pp.seq_read_iter.seq_read.vfs_read.ksys_read.do_syscall_64
1.67 ± 13% -0.4 1.26 ± 6% perf-profile.calltrace.cycles-pp.seq_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.42 ± 14% -0.4 1.05 ± 6% perf-profile.calltrace.cycles-pp.sysvipc_find_ipc.seq_read_iter.seq_read.vfs_read.ksys_read
1.36 ± 16% -0.3 1.01 ± 24% perf-profile.calltrace.cycles-pp.__check_object_size.store_msg.do_msg_fill.do_msgrcv.do_syscall_64
1.04 ± 16% -0.3 0.78 ± 6% perf-profile.calltrace.cycles-pp.__radix_tree_lookup.sysvipc_find_ipc.seq_read_iter.seq_read.vfs_read
0.68 ± 9% -0.2 0.45 ± 44% perf-profile.calltrace.cycles-pp.refill_obj_stock.kfree.free_msg.do_msgrcv.do_syscall_64
0.44 ± 73% +0.5 0.90 ± 6% perf-profile.calltrace.cycles-pp.menu_select.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
0.00 +0.8 0.81 ± 18% perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
0.00 +0.8 0.84 ± 19% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
0.09 ±223% +1.2 1.30 ± 22% perf-profile.calltrace.cycles-pp.__schedule.schedule.do_msgrcv.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.09 ±223% +1.3 1.35 ± 22% perf-profile.calltrace.cycles-pp.schedule.do_msgrcv.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +1.4 1.40 ± 15% perf-profile.calltrace.cycles-pp.try_to_wake_up.wake_up_q.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +1.5 1.46 ± 14% perf-profile.calltrace.cycles-pp.wake_up_q.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.10 ±223% +2.0 2.10 ± 22% perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
27.77 ± 10% +5.2 33.00 ± 4% perf-profile.calltrace.cycles-pp.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe
34.70 ± 20% -8.1 26.57 ± 3% perf-profile.children.cycles-pp.intel_idle
13.36 ± 9% -4.3 9.10 ± 15% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
5.43 ± 11% -1.3 4.16 ± 10% perf-profile.children.cycles-pp.load_msg
4.18 ± 11% -1.1 3.09 ± 12% perf-profile.children.cycles-pp.__kmalloc
1.95 ± 14% -0.5 1.49 ± 16% perf-profile.children.cycles-pp.store_msg
1.99 ± 14% -0.4 1.56 ± 16% perf-profile.children.cycles-pp.__check_object_size
1.68 ± 13% -0.4 1.26 ± 6% perf-profile.children.cycles-pp.seq_read
1.68 ± 13% -0.4 1.26 ± 5% perf-profile.children.cycles-pp.seq_read_iter
1.68 ± 13% -0.4 1.27 ± 6% perf-profile.children.cycles-pp.ksys_read
1.68 ± 13% -0.4 1.27 ± 6% perf-profile.children.cycles-pp.vfs_read
0.70 ± 9% -0.4 0.31 ± 13% perf-profile.children.cycles-pp.get_obj_cgroup_from_current
1.42 ± 14% -0.4 1.05 ± 6% perf-profile.children.cycles-pp.sysvipc_find_ipc
1.49 ± 14% -0.3 1.16 ± 5% perf-profile.children.cycles-pp.__radix_tree_lookup
0.52 ± 17% -0.2 0.28 ± 9% perf-profile.children.cycles-pp.ipcperms
0.35 ± 25% -0.2 0.11 ± 55% perf-profile.children.cycles-pp.__slab_alloc
0.34 ± 25% -0.2 0.10 ± 59% perf-profile.children.cycles-pp.___slab_alloc
0.61 ± 21% -0.2 0.39 ± 27% perf-profile.children.cycles-pp.__list_del_entry_valid
0.64 ± 14% -0.2 0.47 ± 19% perf-profile.children.cycles-pp.idr_find
0.70 ± 9% -0.2 0.54 ± 5% perf-profile.children.cycles-pp.refill_obj_stock
0.69 ± 10% -0.1 0.56 ± 2% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.47 ± 9% -0.1 0.35 ± 7% perf-profile.children.cycles-pp.__entry_text_start
0.60 ± 9% -0.1 0.49 ± 3% perf-profile.children.cycles-pp.__check_heap_object
0.37 ± 7% -0.1 0.28 ± 9% perf-profile.children.cycles-pp.drain_obj_stock
0.48 ± 12% -0.1 0.39 ± 7% perf-profile.children.cycles-pp._copy_from_user
0.48 ± 10% -0.1 0.40 ± 6% perf-profile.children.cycles-pp.ksys_msgsnd
0.44 ± 9% -0.1 0.36 ± 3% perf-profile.children.cycles-pp.copy_user_generic_unrolled
0.29 ± 6% -0.1 0.22 ± 9% perf-profile.children.cycles-pp.page_counter_cancel
0.29 ± 6% -0.1 0.22 ± 9% perf-profile.children.cycles-pp.page_counter_uncharge
0.23 ± 10% -0.1 0.18 ± 2% perf-profile.children.cycles-pp.__put_user_nocheck_8
0.28 ± 9% -0.1 0.22 ± 4% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
0.23 ± 7% -0.1 0.18 ± 10% perf-profile.children.cycles-pp.__memcg_kmem_uncharge
0.18 ± 9% -0.0 0.14 ± 4% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.24 ± 9% -0.0 0.20 ± 8% perf-profile.children.cycles-pp.__get_user_8
0.23 ± 11% -0.0 0.18 ± 5% perf-profile.children.cycles-pp.seq_printf
0.20 ± 10% -0.0 0.16 ± 4% perf-profile.children.cycles-pp.__might_sleep
0.22 ± 10% -0.0 0.18 ± 5% perf-profile.children.cycles-pp.vsnprintf
0.22 ± 10% -0.0 0.18 ± 5% perf-profile.children.cycles-pp.seq_vprintf
0.23 ± 10% -0.0 0.19 ± 4% perf-profile.children.cycles-pp.sysvipc_msg_proc_show
0.11 ± 6% -0.0 0.08 ± 8% perf-profile.children.cycles-pp.rcu_read_unlock_strict
0.08 ± 8% -0.0 0.06 ± 16% perf-profile.children.cycles-pp.msgctl_stat
0.09 ± 10% +0.0 0.12 ± 9% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.07 ± 18% +0.0 0.11 ± 9% perf-profile.children.cycles-pp.native_irq_return_iret
0.02 ±141% +0.1 0.07 ± 18% perf-profile.children.cycles-pp._raw_spin_trylock
0.00 +0.1 0.06 ± 13% perf-profile.children.cycles-pp.switch_mm_irqs_off
0.15 ± 12% +0.1 0.22 ± 5% perf-profile.children.cycles-pp.msgctl_info
0.00 +0.1 0.07 ± 19% perf-profile.children.cycles-pp.ttwu_do_wakeup
0.00 +0.1 0.07 ± 23% perf-profile.children.cycles-pp.hrtimer_next_event_without
0.00 +0.1 0.07 ± 28% perf-profile.children.cycles-pp.tick_nohz_idle_enter
0.03 ±102% +0.1 0.10 ± 14% perf-profile.children.cycles-pp.read_tsc
0.01 ±223% +0.1 0.09 ± 16% perf-profile.children.cycles-pp.update_cfs_group
0.00 +0.1 0.08 ± 14% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.00 +0.1 0.08 ± 13% perf-profile.children.cycles-pp.__update_load_avg_se
0.00 +0.1 0.09 ± 14% perf-profile.children.cycles-pp.reweight_entity
0.01 ±223% +0.1 0.10 ± 17% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.00 +0.1 0.09 ± 17% perf-profile.children.cycles-pp.get_next_timer_interrupt
0.00 +0.1 0.10 ± 18% perf-profile.children.cycles-pp.send_call_function_single_ipi
0.01 ±223% +0.1 0.13 ± 25% perf-profile.children.cycles-pp.available_idle_cpu
0.01 ±223% +0.1 0.15 ± 17% perf-profile.children.cycles-pp.native_sched_clock
0.00 +0.1 0.14 ± 20% perf-profile.children.cycles-pp.__switch_to_asm
0.02 ±144% +0.1 0.16 ± 16% perf-profile.children.cycles-pp.sched_clock
0.00 +0.1 0.15 ± 46% perf-profile.children.cycles-pp.__sysvec_call_function_single
0.03 ±106% +0.1 0.18 ± 16% perf-profile.children.cycles-pp.sched_clock_cpu
0.02 ±223% +0.2 0.17 ± 32% perf-profile.children.cycles-pp.update_ts_time_stats
0.00 +0.2 0.16 ± 34% perf-profile.children.cycles-pp.llist_add_batch
0.00 +0.2 0.16 ± 34% perf-profile.children.cycles-pp.__smp_call_single_queue
0.01 ±223% +0.2 0.17 ± 23% perf-profile.children.cycles-pp.select_task_rq_fair
0.17 ± 46% +0.2 0.34 ± 13% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.00 +0.2 0.17 ± 45% perf-profile.children.cycles-pp.sysvec_call_function_single
0.02 ±223% +0.2 0.19 ± 32% perf-profile.children.cycles-pp.nr_iowait_cpu
0.01 ±223% +0.2 0.18 ± 24% perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
0.02 ±223% +0.2 0.20 ± 40% perf-profile.children.cycles-pp.flush_smp_call_function_queue
0.00 +0.2 0.19 ± 42% perf-profile.children.cycles-pp.finish_task_switch
0.02 ±223% +0.2 0.22 ± 17% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.02 ±223% +0.2 0.21 ± 29% perf-profile.children.cycles-pp.tick_nohz_idle_exit
0.01 ±223% +0.2 0.22 ± 43% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.01 ±223% +0.2 0.24 ± 21% perf-profile.children.cycles-pp.__switch_to
0.00 +0.2 0.23 ± 13% perf-profile.children.cycles-pp.set_next_entity
0.01 ±223% +0.3 0.27 ± 15% perf-profile.children.cycles-pp.update_rq_clock
0.01 ±223% +0.3 0.29 ± 12% perf-profile.children.cycles-pp.update_load_avg
0.03 ±223% +0.3 0.33 ± 28% perf-profile.children.cycles-pp.update_curr
0.01 ±223% +0.3 0.34 ± 14% perf-profile.children.cycles-pp.pick_next_task_fair
0.02 ±223% +0.3 0.36 ± 23% perf-profile.children.cycles-pp.ttwu_queue_wakelist
0.55 ± 32% +0.4 0.91 ± 6% perf-profile.children.cycles-pp.menu_select
0.02 ±223% +0.4 0.39 ± 16% perf-profile.children.cycles-pp.enqueue_entity
0.06 ±223% +0.4 0.48 ± 36% perf-profile.children.cycles-pp.sched_ttwu_pending
0.02 ±223% +0.5 0.49 ± 18% perf-profile.children.cycles-pp.enqueue_task_fair
0.02 ±223% +0.5 0.50 ± 18% perf-profile.children.cycles-pp.ttwu_do_activate
0.04 ±223% +0.5 0.54 ± 23% perf-profile.children.cycles-pp.dequeue_entity
0.07 ±223% +0.5 0.58 ± 38% perf-profile.children.cycles-pp.flush_smp_call_function_from_idle
0.04 ±223% +0.5 0.58 ± 24% perf-profile.children.cycles-pp.dequeue_task_fair
0.02 ±223% +0.6 0.65 ± 16% perf-profile.children.cycles-pp.wake_q_add
0.03 ±223% +0.8 0.84 ± 19% perf-profile.children.cycles-pp.schedule_idle
0.10 ±223% +1.3 1.36 ± 22% perf-profile.children.cycles-pp.schedule
0.05 ±223% +1.4 1.40 ± 15% perf-profile.children.cycles-pp.try_to_wake_up
0.12 ± 90% +1.4 1.49 ± 14% perf-profile.children.cycles-pp.wake_up_q
0.14 ±166% +2.0 2.11 ± 22% perf-profile.children.cycles-pp.poll_idle
0.13 ±200% +2.0 2.12 ± 20% perf-profile.children.cycles-pp.__schedule
27.85 ± 10% +5.3 33.15 ± 4% perf-profile.children.cycles-pp.do_msgsnd
34.70 ± 20% -8.1 26.57 ± 3% perf-profile.self.cycles-pp.intel_idle
13.28 ± 9% -4.2 9.03 ± 15% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.66 ± 10% -0.4 0.29 ± 14% perf-profile.self.cycles-pp.get_obj_cgroup_from_current
1.41 ± 15% -0.3 1.09 ± 5% perf-profile.self.cycles-pp.__radix_tree_lookup
0.51 ± 17% -0.2 0.27 ± 9% perf-profile.self.cycles-pp.ipcperms
0.60 ± 22% -0.2 0.38 ± 28% perf-profile.self.cycles-pp.__list_del_entry_valid
0.24 ± 36% -0.2 0.05 ± 46% perf-profile.self.cycles-pp.___slab_alloc
0.63 ± 14% -0.2 0.46 ± 18% perf-profile.self.cycles-pp.idr_find
0.69 ± 10% -0.1 0.56 ± 2% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.59 ± 9% -0.1 0.47 ± 3% perf-profile.self.cycles-pp.__check_heap_object
0.46 ± 10% -0.1 0.35 ± 7% perf-profile.self.cycles-pp.__entry_text_start
0.43 ± 9% -0.1 0.35 ± 5% perf-profile.self.cycles-pp.copy_user_generic_unrolled
0.33 ± 13% -0.1 0.26 perf-profile.self.cycles-pp.refill_obj_stock
0.18 ± 13% -0.1 0.11 ± 8% perf-profile.self.cycles-pp.sysvipc_find_ipc
0.27 ± 7% -0.1 0.20 ± 10% perf-profile.self.cycles-pp.page_counter_cancel
0.23 ± 10% -0.1 0.17 ± 4% perf-profile.self.cycles-pp.__put_user_nocheck_8
0.27 ± 10% -0.0 0.22 ± 4% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.17 ± 6% -0.0 0.12 ± 3% perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.24 ± 10% -0.0 0.19 ± 8% perf-profile.self.cycles-pp.__get_user_8
0.26 ± 10% -0.0 0.22 ± 7% perf-profile.self.cycles-pp.obj_cgroup_charge
0.19 ± 11% -0.0 0.15 ± 3% perf-profile.self.cycles-pp.__might_sleep
0.08 ± 13% -0.0 0.05 ± 45% perf-profile.self.cycles-pp.drain_obj_stock
0.13 ± 11% -0.0 0.10 ± 8% perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
0.11 ± 12% -0.0 0.08 ± 5% perf-profile.self.cycles-pp.store_msg
0.10 ± 14% -0.0 0.08 perf-profile.self.cycles-pp.__might_fault
0.10 ± 7% -0.0 0.08 ± 8% perf-profile.self.cycles-pp.page_counter_try_charge
0.05 ± 9% +0.0 0.09 ± 14% perf-profile.self.cycles-pp.wake_up_q
0.09 ± 10% +0.0 0.12 ± 10% perf-profile.self.cycles-pp.do_msg_fill
0.07 ± 18% +0.0 0.11 ± 9% perf-profile.self.cycles-pp.native_irq_return_iret
0.02 ±141% +0.1 0.07 ± 18% perf-profile.self.cycles-pp._raw_spin_trylock
0.00 +0.1 0.06 ± 15% perf-profile.self.cycles-pp.switch_mm_irqs_off
0.00 +0.1 0.07 ± 19% perf-profile.self.cycles-pp.dequeue_entity
0.00 +0.1 0.07 ± 22% perf-profile.self.cycles-pp.update_curr
0.03 ±102% +0.1 0.10 ± 12% perf-profile.self.cycles-pp.read_tsc
0.00 +0.1 0.08 ± 14% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.10 ± 12% +0.1 0.18 ± 5% perf-profile.self.cycles-pp.msgctl_info
0.00 +0.1 0.08 ± 22% perf-profile.self.cycles-pp.update_cfs_group
0.00 +0.1 0.08 ± 29% perf-profile.self.cycles-pp.finish_task_switch
0.00 +0.1 0.08 ± 10% perf-profile.self.cycles-pp.__update_load_avg_se
0.01 ±223% +0.1 0.09 ± 52% perf-profile.self.cycles-pp.ttwu_queue_wakelist
0.00 +0.1 0.09 ± 13% perf-profile.self.cycles-pp.reweight_entity
0.00 +0.1 0.09 ± 16% perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
0.00 +0.1 0.10 ± 22% perf-profile.self.cycles-pp.enqueue_task_fair
0.00 +0.1 0.10 ± 18% perf-profile.self.cycles-pp.send_call_function_single_ipi
0.00 +0.1 0.11 ± 16% perf-profile.self.cycles-pp.try_to_wake_up
0.01 ±223% +0.1 0.12 ± 31% perf-profile.self.cycles-pp.perf_trace_sched_wakeup_template
0.00 +0.1 0.12 ± 13% perf-profile.self.cycles-pp.set_next_entity
0.00 +0.1 0.12 ± 12% perf-profile.self.cycles-pp.update_load_avg
0.01 ±223% +0.1 0.13 ± 25% perf-profile.self.cycles-pp.available_idle_cpu
0.01 ±223% +0.1 0.15 ± 18% perf-profile.self.cycles-pp.native_sched_clock
0.00 +0.1 0.14 ± 20% perf-profile.self.cycles-pp.__switch_to_asm
0.37 ± 30% +0.1 0.52 ± 2% perf-profile.self.cycles-pp.menu_select
0.01 ±223% +0.2 0.16 ± 27% perf-profile.self.cycles-pp.enqueue_entity
0.00 +0.2 0.16 ± 34% perf-profile.self.cycles-pp.llist_add_batch
0.02 ±223% +0.2 0.19 ± 32% perf-profile.self.cycles-pp.nr_iowait_cpu
0.01 ±223% +0.2 0.19 ± 17% perf-profile.self.cycles-pp.do_idle
0.00 +0.2 0.19 ± 18% perf-profile.self.cycles-pp.update_rq_clock
0.02 ±223% +0.2 0.22 ± 17% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.01 ±223% +0.2 0.23 ± 21% perf-profile.self.cycles-pp.__switch_to
0.01 ±223% +0.4 0.41 ± 19% perf-profile.self.cycles-pp.__schedule
0.02 ±223% +0.6 0.65 ± 17% perf-profile.self.cycles-pp.wake_q_add
0.13 ±166% +1.9 2.01 ± 22% perf-profile.self.cycles-pp.poll_idle
9.95 ± 12% +4.4 14.31 ± 4% perf-profile.self.cycles-pp.do_msgsnd
4.52 ± 22% +6.1 10.58 ± 9% perf-profile.self.cycles-pp._raw_spin_lock
stress-ng.msg.ops_per_sec
5.4e+06 +-----------------------------------------------------------------+
5.2e+06 |-+ + + +..+ : .+.+ +.. |
| : : + : + : + : : + |
5e+06 |:+ .+.+ : : + : + :+ : : |
4.8e+06 |:+.+ + : + : .+.+. + :+ + : : |
| + + .. : : +.+. + + : |
4.6e+06 |-+ + :: + |
4.4e+06 |-+ + |
4.2e+06 |-+ |
| |
4e+06 |-+ O O O |
3.8e+06 |-+ O O O O O O O O |
| O O O O O O O O O O O O |
3.6e+06 |-+ O O O O O O |
3.4e+06 +-----------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
Thanks,
Oliver Sang
On Tue, Apr 13, 2021 at 09:58:01PM +0800, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed a -27.4% regression of stress-ng.msg.ops_per_sec due to commit:
>
>
> commit: 9223d0dccb8f8523754122f68316dd1a4f39f7f8 ("thermal: Move therm_throt there from x86/mce")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
Hmm, so I went and ran your reproducer, but simplified (see end of
mail), on a KBL box here. The kernel is tip:x86/urgent from last week:
5.12.0-rc6+
-----------
stress-ng: info: [1430] dispatching hogs: 9 msg
stress-ng: info: [1430] successful run completed in 60.01s (1 min, 0.01 secs)
stress-ng: info: [1430] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
stress-ng: info: [1430] (secs) (secs) (secs) (real time) (usr+sys time)
stress-ng: info: [1430] msg 237390147 60.01 104.03 255.85 3955872.56 659636.95
stress-ng: info: [1430] for a 60.01s run time:
stress-ng: info: [1430] 360.08s available CPU time
stress-ng: info: [1430] 104.11s user time ( 28.91%)
stress-ng: info: [1430] 255.93s system time ( 71.08%)
stress-ng: info: [1430] 360.04s total time ( 99.99%)
stress-ng: info: [1430] load average: 8.47 3.71 1.48
Now the same kernel with
> 4f432e8bb1 ("x86/mce: Get rid of mcheck_intel_therm_init()")
> 9223d0dccb ("thermal: Move therm_throt there from x86/mce")
reverted.
5.12.0-rc6-rev+
---------------
stress-ng: info: [1246] dispatching hogs: 9 msg
stress-ng: info: [1246] successful run completed in 60.02s (1 min, 0.02 secs)
stress-ng: info: [1246] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
stress-ng: info: [1246] (secs) (secs) (secs) (real time) (usr+sys time)
stress-ng: info: [1246] msg 215174467 60.01 99.64 260.24 3585438.79 597906.15
stress-ng: info: [1246] for a 60.02s run time:
stress-ng: info: [1246] 360.10s available CPU time
stress-ng: info: [1246] 99.72s user time ( 27.69%)
stress-ng: info: [1246] 260.32s system time ( 72.29%)
stress-ng: info: [1246] 360.04s total time ( 99.98%)
stress-ng: info: [1246] load average: 7.98 2.33 0.80
so if I'm reading this correctly, reverting the patches here brings the
*slow-down*.
What's up?
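A quick cross-check of the two bogo ops/s (real time) figures above, assuming the real-time column is the relevant one here:

```shell
# Relative change of the reverted kernel vs. the kernel with the patch applied,
# using the two "bogo ops/s (real time)" numbers from the runs above.
awk 'BEGIN {
    with_patch = 3955872.56   # 5.12.0-rc6+     (9223d0dccb applied)
    reverted   = 3585438.79   # 5.12.0-rc6-rev+ (patches reverted)
    printf "%+.1f%%\n", (reverted - with_patch) / with_patch * 100
}'
```

i.e. about -9.4%, so on this box the reverted kernel is the slower one.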
reproducer:
----------
#!/usr/bin/bash
for cpu_dir in /sys/devices/system/cpu/cpu[0-9]*
do
online_file="$cpu_dir"/online
[ -f "$online_file" ] && [ "$(cat "$online_file")" -eq 0 ] && continue
file="$cpu_dir"/cpufreq/scaling_governor
[ -f "$file" ] && echo "performance" > "$file"
done
stress-ng --timeout 60 --times --verify --metrics-brief --msg 9
--
Regards/Gruss,
Boris.
SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg
On Tue, 2021-04-13 at 21:58 +0800, kernel test robot wrote:
>
> Greeting,
>
> FYI, we noticed a -27.4% regression of stress-ng.msg.ops_per_sec due
> to commit:
>
>
> commit: 9223d0dccb8f8523754122f68316dd1a4f39f7f8 ("thermal: Move
> therm_throt there from x86/mce")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
> master
>
>
> in testcase: stress-ng
> on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz
> with 192G memory
> with following parameters:
>
> nr_threads: 10%
> disk: 1HDD
> testtime: 60s
> fs: ext4
> class: os
> test: msg
> cpufreq_governor: performance
> ucode: 0x5003006
>
>
>
Is this reproducible? This patch doesn't set any value which restricts
performance; this driver only reports threshold violations.
It is possible that we somehow reached a thermal limit and got many
interrupts during one of the runs.
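One way to check that hypothesis is to read the per-CPU throttle event counters after a run. The sketch below assumes an x86 machine that exposes the usual thermal_throttle sysfs nodes; on other machines or older kernels they may be absent:

```shell
#!/usr/bin/env bash
# Sum core thermal throttle events across all online CPUs
# (x86 thermal_throttle sysfs nodes; absent on some setups).
total=0 found=0
for f in /sys/devices/system/cpu/cpu[0-9]*/thermal_throttle/core_throttle_count; do
    [ -r "$f" ] || continue
    found=1
    total=$(( total + $(cat "$f") ))
done
if [ "$found" -eq 1 ]; then
    echo "core throttle events: $total"
else
    echo "no thermal_throttle counters exposed"
fi
```

A non-zero total after the stress-ng run would support the thermal-limit explanation.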
Thanks.
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <[email protected]>
>
>
> Details are as below:
> -------------------------------------------------------------------
> ------------------------------->
>
>
> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is
> attached in this email
> bin/lkp split-job --compatible job.yaml
> bin/lkp run compatible-job.yaml
>
> =====================================================================
> ====================
> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbo
> x_group/test/testcase/testtime/ucode:
> os/gcc-9/performance/1HDD/ext4/x86_64-rhel-8.3/10%/debian-10.4-
> x86_64-20200603.cgz/lkp-csl-2sp5/msg/stress-ng/60s/0x5003006
>
> commit:
> 4f432e8bb1 ("x86/mce: Get rid of mcheck_intel_therm_init()")
> 9223d0dccb ("thermal: Move therm_throt there from x86/mce")
>
> 4f432e8bb15b352d 9223d0dccb8f8523754122f6831
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 3.042e+08 ± 4% -27.4% 2.209e+08 ± 2% stress-ng.msg.ops
> 5069470 ± 4% -27.4% 3682272 ± 2% stress-
> ng.msg.ops_per_sec
> 2335 ± 8% -35.0% 1517 ± 18% stress-
> ng.time.involuntary_context_switches
> 1676 -4.9% 1594 stress-
> ng.time.percent_of_cpu_this_job_got
> 1019 -4.6% 973.25 stress-
> ng.time.system_time
> 20.97 ± 2% -19.3% 16.93 ± 2% stress-
> ng.time.user_time
> 1549846 ±159% +1738.5% 28494120 ± 16% stress-
> ng.time.voluntary_context_switches
> 2050707 ±134% +1531.7% 33461885 ± 21% cpuidle.POLL.time
> 1354230 ±171% +1585.0% 22819237 ± 19% cpuidle.POLL.usage
> 82.39 +1.2% 83.37 iostat.cpu.idle
> 17.20 -5.3% 16.29 iostat.cpu.system
> 0.04 ± 5% +0.0 0.05 ± 3% mpstat.cpu.all.soft%
> 0.37 ± 2% -0.1 0.30 ± 2% mpstat.cpu.all.usr%
> 1337633 ± 28% -65.8% 457460 ± 95% numa-
> numastat.node0.local_node
> 456832 ± 26% -67.5% 148621 ± 25% numa-
> numastat.node1.local_node
> 500231 ± 26% -64.1% 179602 ± 12% numa-
> numastat.node1.numa_hit
> 82.00 +1.2% 83.00 vmstat.cpu.id
> 49550 ±153% +1673.1% 878579 ± 16% vmstat.system.cs
> 197941 +26.4% 250229 ± 6% vmstat.system.in
> 41605 -1.4% 41020 proc-
> vmstat.nr_slab_unreclaimable
> 1907890 ± 15% -62.0% 724187 ± 63% proc-vmstat.numa_hit
> 1821249 ± 16% -65.0% 637560 ± 71% proc-vmstat.numa_local
> 1925858 ± 15% -61.7% 737459 ± 62% proc-
> vmstat.pgalloc_normal
> 1796629 ± 16% -66.1% 609149 ± 75% proc-vmstat.pgfree
> 9427 ± 3% -11.6% 8337 ± 2% slabinfo.kmalloc-
> 512.active_objs
> 1182 ± 3% -11.5% 1045 ± 2% slabinfo.kmalloc-
> 512.active_slabs
> 9459 ± 3% -11.5% 8370 ± 2% slabinfo.kmalloc-
> 512.num_objs
> 1182 ± 3% -11.5% 1045 ± 2% slabinfo.kmalloc-
> 512.num_slabs
> 82005 ± 7% -24.2% 62197 ± 5% slabinfo.kmalloc-
> 64.active_objs
> 1282 ± 7% -24.1% 973.67 ± 5% slabinfo.kmalloc-
> 64.active_slabs
> 82119 ± 7% -24.1% 62339 ± 5% slabinfo.kmalloc-
> 64.num_objs
> 1282 ± 7% -24.1% 973.67 ± 5% slabinfo.kmalloc-
> 64.num_slabs
> 8300 ± 11% +41.5% 11747 ± 10% softirqs.CPU21.SCHED
> 6707 ± 23% +39.9% 9384 ± 15% softirqs.CPU41.SCHED
> 8143 ± 17% +50.5% 12255 ± 27% softirqs.CPU56.SCHED
> 8052 ± 14% +82.7% 14708 ± 29% softirqs.CPU58.SCHED
> 8701 ± 19% +70.6% 14847 ± 23% softirqs.CPU6.SCHED
> 6902 ± 18% +50.7% 10401 ± 18% softirqs.CPU72.SCHED
> 8097 ± 14% +35.4% 10962 ± 17% softirqs.CPU76.SCHED
> 7938 ± 18% +24.3% 9864 ± 7% softirqs.CPU79.SCHED
> 839280 ± 4% +14.7% 963029 ± 2% softirqs.SCHED
> 248119 ± 63% +1378.0% 3667155 ±
> 28% interrupts.CAL:Function_call_interrupts
> 4.83 ±169% +5389.7% 265.33
> ±166% interrupts.CPU12.RES:Rescheduling_interrupts
> 709.00 ± 31% +2004.3% 14919
> ±120% interrupts.CPU28.CAL:Function_call_interrupts
> 873.33 ± 64% +7449.7% 65933
> ±107% interrupts.CPU4.CAL:Function_call_interrupts
> 506.33 ± 3% +9102.5% 46595
> ±158% interrupts.CPU43.CAL:Function_call_interrupts
> 27.33 ± 75% +750.6% 232.50 ±
> 99% interrupts.CPU48.RES:Rescheduling_interrupts
> 558.67 ± 19% +10960.8% 61792
> ±145% interrupts.CPU50.CAL:Function_call_interrupts
> 706.00 ± 66% +3143.2% 22897
> ±147% interrupts.CPU53.CAL:Function_call_interrupts
> 732.00 ± 44% +6764.9% 50251
> ±131% interrupts.CPU56.CAL:Function_call_interrupts
> 1613 ± 84% +8272.0% 135082
> ±130% interrupts.CPU58.CAL:Function_call_interrupts
> 839.00 ± 42% +23427.3% 197394 ±
> 68% interrupts.CPU6.CAL:Function_call_interrupts
> 11.17 ± 79% +4409.0% 503.50 ±
> 76% interrupts.CPU6.RES:Rescheduling_interrupts
> 717.00 ± 67% +1775.8% 13449
> ±109% interrupts.CPU81.CAL:Function_call_interrupts
> 6.67 ±171% +1212.5% 87.50 ±
> 96% interrupts.CPU81.RES:Rescheduling_interrupts
> 1189 ± 20% +988.6% 12948 ±
> 11% interrupts.RES:Rescheduling_interrupts
> 12.68 ± 6% +20.0% 15.22 ± 2% perf-stat.i.MPKI
>   3.86e+09 ±  3%      -9.9%  3.478e+09 ±  2%  perf-stat.i.branch-instructions
>       0.67 ±  3%       +0.2       0.84        perf-stat.i.branch-miss-rate%
>   24129751           +14.6%   27646121 ±  4%  perf-stat.i.branch-misses
>      50806 ±154%   +1684.2%     906487 ± 16%  perf-stat.i.context-switches
>       2.58 ±  3%     +12.6%       2.91 ±  2%  perf-stat.i.cpi
>  4.686e+09 ±  2%      -8.6%  4.282e+09 ±  2%  perf-stat.i.dTLB-loads
>  2.615e+09 ±  2%      -6.3%  2.452e+09 ±  3%  perf-stat.i.dTLB-stores
>      83.94             +3.6      87.55        perf-stat.i.iTLB-load-miss-rate%
>   11626382 ±  5%     +41.1%   16403258 ±  6%  perf-stat.i.iTLB-load-misses
>  1.915e+10 ±  3%     -10.8%  1.707e+10 ±  2%  perf-stat.i.instructions
>       1660 ±  7%     -34.9%       1080 ±  6%  perf-stat.i.instructions-per-iTLB-miss
>       0.39 ±  3%     -10.7%       0.35 ±  2%  perf-stat.i.ipc
>       0.81 ±  6%     -42.1%       0.47 ± 28%  perf-stat.i.metric.K/sec
>     119.24 ±  2%      -8.2%     109.49 ±  2%  perf-stat.i.metric.M/sec
>   13094331 ± 11%     -17.0%   10864787 ±  9%  perf-stat.i.node-store-misses
>      12.83 ±  5%     +20.0%      15.40 ±  3%  perf-stat.overall.MPKI
>       0.63 ±  3%       +0.2       0.79 ±  2%  perf-stat.overall.branch-miss-rate%
>       2.60 ±  3%     +12.9%       2.94 ±  2%  perf-stat.overall.cpi
>      84.88             +3.9      88.76        perf-stat.overall.iTLB-load-miss-rate%
>       1653 ±  7%     -36.8%       1044 ±  6%  perf-stat.overall.instructions-per-iTLB-miss
>       0.38 ±  3%     -11.5%       0.34 ±  2%  perf-stat.overall.ipc
>  3.799e+09 ±  3%      -9.9%  3.422e+09 ±  2%  perf-stat.ps.branch-instructions
>   23742874           +14.6%   27204564 ±  4%  perf-stat.ps.branch-misses
>      49922 ±154%   +1687.0%     892147 ± 16%  perf-stat.ps.context-switches
>  4.613e+09 ±  2%      -8.6%  4.214e+09 ±  2%  perf-stat.ps.dTLB-loads
>  2.574e+09 ±  2%      -6.3%  2.413e+09 ±  3%  perf-stat.ps.dTLB-stores
>   11442374 ±  5%     +41.1%   16142526 ±  6%  perf-stat.ps.iTLB-load-misses
>  1.884e+10 ±  3%     -10.9%   1.68e+10 ±  2%  perf-stat.ps.instructions
>   12879684 ± 11%     -17.0%   10690222 ±  9%  perf-stat.ps.node-store-misses
>  1.185e+12 ±  3%     -10.3%  1.063e+12 ±  3%  perf-stat.total.instructions
>      34.69 ± 20%      -8.3       26.43 ±  3%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
>       6.46 ± 11%      -1.7        4.79 ±  4%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.do_msgrcv.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       5.42 ± 11%      -1.3        4.14 ± 10%  perf-profile.calltrace.cycles-pp.load_msg.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       4.13 ± 11%      -1.1        3.04 ± 12%  perf-profile.calltrace.cycles-pp.__kmalloc.load_msg.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       2.33 ±  6%      -0.7        1.60 ± 18%  perf-profile.calltrace.cycles-pp.ipc_obtain_object_check.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       1.94 ± 14%      -0.5        1.48 ± 16%  perf-profile.calltrace.cycles-pp.store_msg.do_msg_fill.do_msgrcv.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       1.68 ± 13%      -0.4        1.26 ±  5%  perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       1.68 ± 13%      -0.4        1.26 ±  5%  perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       1.67 ± 13%      -0.4        1.26 ±  6%  perf-profile.calltrace.cycles-pp.seq_read_iter.seq_read.vfs_read.ksys_read.do_syscall_64
>       1.67 ± 13%      -0.4        1.26 ±  6%  perf-profile.calltrace.cycles-pp.seq_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       1.42 ± 14%      -0.4        1.05 ±  6%  perf-profile.calltrace.cycles-pp.sysvipc_find_ipc.seq_read_iter.seq_read.vfs_read.ksys_read
>       1.36 ± 16%      -0.3        1.01 ± 24%  perf-profile.calltrace.cycles-pp.__check_object_size.store_msg.do_msg_fill.do_msgrcv.do_syscall_64
>       1.04 ± 16%      -0.3        0.78 ±  6%  perf-profile.calltrace.cycles-pp.__radix_tree_lookup.sysvipc_find_ipc.seq_read_iter.seq_read.vfs_read
>       0.68 ±  9%      -0.2        0.45 ± 44%  perf-profile.calltrace.cycles-pp.refill_obj_stock.kfree.free_msg.do_msgrcv.do_syscall_64
>       0.44 ± 73%      +0.5        0.90 ±  6%  perf-profile.calltrace.cycles-pp.menu_select.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>       0.00            +0.8        0.81 ± 18%  perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
>       0.00            +0.8        0.84 ± 19%  perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>       0.09 ±223%      +1.2        1.30 ± 22%  perf-profile.calltrace.cycles-pp.__schedule.schedule.do_msgrcv.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.09 ±223%      +1.3        1.35 ± 22%  perf-profile.calltrace.cycles-pp.schedule.do_msgrcv.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +1.4        1.40 ± 15%  perf-profile.calltrace.cycles-pp.try_to_wake_up.wake_up_q.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +1.5        1.46 ± 14%  perf-profile.calltrace.cycles-pp.wake_up_q.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.10 ±223%      +2.0        2.10 ± 22%  perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
>      27.77 ± 10%      +5.2       33.00 ±  4%  perf-profile.calltrace.cycles-pp.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      34.70 ± 20%      -8.1       26.57 ±  3%  perf-profile.children.cycles-pp.intel_idle
>      13.36 ±  9%      -4.3        9.10 ± 15%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>       5.43 ± 11%      -1.3        4.16 ± 10%  perf-profile.children.cycles-pp.load_msg
>       4.18 ± 11%      -1.1        3.09 ± 12%  perf-profile.children.cycles-pp.__kmalloc
>       1.95 ± 14%      -0.5        1.49 ± 16%  perf-profile.children.cycles-pp.store_msg
>       1.99 ± 14%      -0.4        1.56 ± 16%  perf-profile.children.cycles-pp.__check_object_size
>       1.68 ± 13%      -0.4        1.26 ±  6%  perf-profile.children.cycles-pp.seq_read
>       1.68 ± 13%      -0.4        1.26 ±  5%  perf-profile.children.cycles-pp.seq_read_iter
>       1.68 ± 13%      -0.4        1.27 ±  6%  perf-profile.children.cycles-pp.ksys_read
>       1.68 ± 13%      -0.4        1.27 ±  6%  perf-profile.children.cycles-pp.vfs_read
>       0.70 ±  9%      -0.4        0.31 ± 13%  perf-profile.children.cycles-pp.get_obj_cgroup_from_current
>       1.42 ± 14%      -0.4        1.05 ±  6%  perf-profile.children.cycles-pp.sysvipc_find_ipc
>       1.49 ± 14%      -0.3        1.16 ±  5%  perf-profile.children.cycles-pp.__radix_tree_lookup
>       0.52 ± 17%      -0.2        0.28 ±  9%  perf-profile.children.cycles-pp.ipcperms
>       0.35 ± 25%      -0.2        0.11 ± 55%  perf-profile.children.cycles-pp.__slab_alloc
>       0.34 ± 25%      -0.2        0.10 ± 59%  perf-profile.children.cycles-pp.___slab_alloc
>       0.61 ± 21%      -0.2        0.39 ± 27%  perf-profile.children.cycles-pp.__list_del_entry_valid
>       0.64 ± 14%      -0.2        0.47 ± 19%  perf-profile.children.cycles-pp.idr_find
>       0.70 ±  9%      -0.2        0.54 ±  5%  perf-profile.children.cycles-pp.refill_obj_stock
>       0.69 ± 10%      -0.1        0.56 ±  2%  perf-profile.children.cycles-pp.syscall_return_via_sysret
>       0.47 ±  9%      -0.1        0.35 ±  7%  perf-profile.children.cycles-pp.__entry_text_start
>       0.60 ±  9%      -0.1        0.49 ±  3%  perf-profile.children.cycles-pp.__check_heap_object
>       0.37 ±  7%      -0.1        0.28 ±  9%  perf-profile.children.cycles-pp.drain_obj_stock
>       0.48 ± 12%      -0.1        0.39 ±  7%  perf-profile.children.cycles-pp._copy_from_user
>       0.48 ± 10%      -0.1        0.40 ±  6%  perf-profile.children.cycles-pp.ksys_msgsnd
>       0.44 ±  9%      -0.1        0.36 ±  3%  perf-profile.children.cycles-pp.copy_user_generic_unrolled
>       0.29 ±  6%      -0.1        0.22 ±  9%  perf-profile.children.cycles-pp.page_counter_cancel
>       0.29 ±  6%      -0.1        0.22 ±  9%  perf-profile.children.cycles-pp.page_counter_uncharge
>       0.23 ± 10%      -0.1        0.18 ±  2%  perf-profile.children.cycles-pp.__put_user_nocheck_8
>       0.28 ±  9%      -0.1        0.22 ±  4%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
>       0.23 ±  7%      -0.1        0.18 ± 10%  perf-profile.children.cycles-pp.__memcg_kmem_uncharge
>       0.18 ±  9%      -0.0        0.14 ±  4%  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
>       0.24 ±  9%      -0.0        0.20 ±  8%  perf-profile.children.cycles-pp.__get_user_8
>       0.23 ± 11%      -0.0        0.18 ±  5%  perf-profile.children.cycles-pp.seq_printf
>       0.20 ± 10%      -0.0        0.16 ±  4%  perf-profile.children.cycles-pp.__might_sleep
>       0.22 ± 10%      -0.0        0.18 ±  5%  perf-profile.children.cycles-pp.vsnprintf
>       0.22 ± 10%      -0.0        0.18 ±  5%  perf-profile.children.cycles-pp.seq_vprintf
>       0.23 ± 10%      -0.0        0.19 ±  4%  perf-profile.children.cycles-pp.sysvipc_msg_proc_show
>       0.11 ±  6%      -0.0        0.08 ±  8%  perf-profile.children.cycles-pp.rcu_read_unlock_strict
>       0.08 ±  8%      -0.0        0.06 ± 16%  perf-profile.children.cycles-pp.msgctl_stat
>       0.09 ± 10%      +0.0        0.12 ±  9%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
>       0.07 ± 18%      +0.0        0.11 ±  9%  perf-profile.children.cycles-pp.native_irq_return_iret
>       0.02 ±141%      +0.1        0.07 ± 18%  perf-profile.children.cycles-pp._raw_spin_trylock
>       0.00            +0.1        0.06 ± 13%  perf-profile.children.cycles-pp.switch_mm_irqs_off
>       0.15 ± 12%      +0.1        0.22 ±  5%  perf-profile.children.cycles-pp.msgctl_info
>       0.00            +0.1        0.07 ± 19%  perf-profile.children.cycles-pp.ttwu_do_wakeup
>       0.00            +0.1        0.07 ± 23%  perf-profile.children.cycles-pp.hrtimer_next_event_without
>       0.00            +0.1        0.07 ± 28%  perf-profile.children.cycles-pp.tick_nohz_idle_enter
>       0.03 ±102%      +0.1        0.10 ± 14%  perf-profile.children.cycles-pp.read_tsc
>       0.01 ±223%      +0.1        0.09 ± 16%  perf-profile.children.cycles-pp.update_cfs_group
>       0.00            +0.1        0.08 ± 14%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
>       0.00            +0.1        0.08 ± 13%  perf-profile.children.cycles-pp.__update_load_avg_se
>       0.00            +0.1        0.09 ± 14%  perf-profile.children.cycles-pp.reweight_entity
>       0.01 ±223%      +0.1        0.10 ± 17%  perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
>       0.00            +0.1        0.09 ± 17%  perf-profile.children.cycles-pp.get_next_timer_interrupt
>       0.00            +0.1        0.10 ± 18%  perf-profile.children.cycles-pp.send_call_function_single_ipi
>       0.01 ±223%      +0.1        0.13 ± 25%  perf-profile.children.cycles-pp.available_idle_cpu
>       0.01 ±223%      +0.1        0.15 ± 17%  perf-profile.children.cycles-pp.native_sched_clock
>       0.00            +0.1        0.14 ± 20%  perf-profile.children.cycles-pp.__switch_to_asm
>       0.02 ±144%      +0.1        0.16 ± 16%  perf-profile.children.cycles-pp.sched_clock
>       0.00            +0.1        0.15 ± 46%  perf-profile.children.cycles-pp.__sysvec_call_function_single
>       0.03 ±106%      +0.1        0.18 ± 16%  perf-profile.children.cycles-pp.sched_clock_cpu
>       0.02 ±223%      +0.2        0.17 ± 32%  perf-profile.children.cycles-pp.update_ts_time_stats
>       0.00            +0.2        0.16 ± 34%  perf-profile.children.cycles-pp.llist_add_batch
>       0.00            +0.2        0.16 ± 34%  perf-profile.children.cycles-pp.__smp_call_single_queue
>       0.01 ±223%      +0.2        0.17 ± 23%  perf-profile.children.cycles-pp.select_task_rq_fair
>       0.17 ± 46%      +0.2        0.34 ± 13%  perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
>       0.00            +0.2        0.17 ± 45%  perf-profile.children.cycles-pp.sysvec_call_function_single
>       0.02 ±223%      +0.2        0.19 ± 32%  perf-profile.children.cycles-pp.nr_iowait_cpu
>       0.01 ±223%      +0.2        0.18 ± 24%  perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
>       0.02 ±223%      +0.2        0.20 ± 40%  perf-profile.children.cycles-pp.flush_smp_call_function_queue
>       0.00            +0.2        0.19 ± 42%  perf-profile.children.cycles-pp.finish_task_switch
>       0.02 ±223%      +0.2        0.22 ± 17%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>       0.02 ±223%      +0.2        0.21 ± 29%  perf-profile.children.cycles-pp.tick_nohz_idle_exit
>       0.01 ±223%      +0.2        0.22 ± 43%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
>       0.01 ±223%      +0.2        0.24 ± 21%  perf-profile.children.cycles-pp.__switch_to
>       0.00            +0.2        0.23 ± 13%  perf-profile.children.cycles-pp.set_next_entity
>       0.01 ±223%      +0.3        0.27 ± 15%  perf-profile.children.cycles-pp.update_rq_clock
>       0.01 ±223%      +0.3        0.29 ± 12%  perf-profile.children.cycles-pp.update_load_avg
>       0.03 ±223%      +0.3        0.33 ± 28%  perf-profile.children.cycles-pp.update_curr
>       0.01 ±223%      +0.3        0.34 ± 14%  perf-profile.children.cycles-pp.pick_next_task_fair
>       0.02 ±223%      +0.3        0.36 ± 23%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
>       0.55 ± 32%      +0.4        0.91 ±  6%  perf-profile.children.cycles-pp.menu_select
>       0.02 ±223%      +0.4        0.39 ± 16%  perf-profile.children.cycles-pp.enqueue_entity
>       0.06 ±223%      +0.4        0.48 ± 36%  perf-profile.children.cycles-pp.sched_ttwu_pending
>       0.02 ±223%      +0.5        0.49 ± 18%  perf-profile.children.cycles-pp.enqueue_task_fair
>       0.02 ±223%      +0.5        0.50 ± 18%  perf-profile.children.cycles-pp.ttwu_do_activate
>       0.04 ±223%      +0.5        0.54 ± 23%  perf-profile.children.cycles-pp.dequeue_entity
>       0.07 ±223%      +0.5        0.58 ± 38%  perf-profile.children.cycles-pp.flush_smp_call_function_from_idle
>       0.04 ±223%      +0.5        0.58 ± 24%  perf-profile.children.cycles-pp.dequeue_task_fair
>       0.02 ±223%      +0.6        0.65 ± 16%  perf-profile.children.cycles-pp.wake_q_add
>       0.03 ±223%      +0.8        0.84 ± 19%  perf-profile.children.cycles-pp.schedule_idle
>       0.10 ±223%      +1.3        1.36 ± 22%  perf-profile.children.cycles-pp.schedule
>       0.05 ±223%      +1.4        1.40 ± 15%  perf-profile.children.cycles-pp.try_to_wake_up
>       0.12 ± 90%      +1.4        1.49 ± 14%  perf-profile.children.cycles-pp.wake_up_q
>       0.14 ±166%      +2.0        2.11 ± 22%  perf-profile.children.cycles-pp.poll_idle
>       0.13 ±200%      +2.0        2.12 ± 20%  perf-profile.children.cycles-pp.__schedule
>      27.85 ± 10%      +5.3       33.15 ±  4%  perf-profile.children.cycles-pp.do_msgsnd
>      34.70 ± 20%      -8.1       26.57 ±  3%  perf-profile.self.cycles-pp.intel_idle
>      13.28 ±  9%      -4.2        9.03 ± 15%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>       0.66 ± 10%      -0.4        0.29 ± 14%  perf-profile.self.cycles-pp.get_obj_cgroup_from_current
>       1.41 ± 15%      -0.3        1.09 ±  5%  perf-profile.self.cycles-pp.__radix_tree_lookup
>       0.51 ± 17%      -0.2        0.27 ±  9%  perf-profile.self.cycles-pp.ipcperms
>       0.60 ± 22%      -0.2        0.38 ± 28%  perf-profile.self.cycles-pp.__list_del_entry_valid
>       0.24 ± 36%      -0.2        0.05 ± 46%  perf-profile.self.cycles-pp.___slab_alloc
>       0.63 ± 14%      -0.2        0.46 ± 18%  perf-profile.self.cycles-pp.idr_find
>       0.69 ± 10%      -0.1        0.56 ±  2%  perf-profile.self.cycles-pp.syscall_return_via_sysret
>       0.59 ±  9%      -0.1        0.47 ±  3%  perf-profile.self.cycles-pp.__check_heap_object
>       0.46 ± 10%      -0.1        0.35 ±  7%  perf-profile.self.cycles-pp.__entry_text_start
>       0.43 ±  9%      -0.1        0.35 ±  5%  perf-profile.self.cycles-pp.copy_user_generic_unrolled
>       0.33 ± 13%      -0.1        0.26        perf-profile.self.cycles-pp.refill_obj_stock
>       0.18 ± 13%      -0.1        0.11 ±  8%  perf-profile.self.cycles-pp.sysvipc_find_ipc
>       0.27 ±  7%      -0.1        0.20 ± 10%  perf-profile.self.cycles-pp.page_counter_cancel
>       0.23 ± 10%      -0.1        0.17 ±  4%  perf-profile.self.cycles-pp.__put_user_nocheck_8
>       0.27 ± 10%      -0.0        0.22 ±  4%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
>       0.17 ±  6%      -0.0        0.12 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
>       0.24 ± 10%      -0.0        0.19 ±  8%  perf-profile.self.cycles-pp.__get_user_8
>       0.26 ± 10%      -0.0        0.22 ±  7%  perf-profile.self.cycles-pp.obj_cgroup_charge
>       0.19 ± 11%      -0.0        0.15 ±  3%  perf-profile.self.cycles-pp.__might_sleep
>       0.08 ± 13%      -0.0        0.05 ± 45%  perf-profile.self.cycles-pp.drain_obj_stock
>       0.13 ± 11%      -0.0        0.10 ±  8%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
>       0.11 ± 12%      -0.0        0.08 ±  5%  perf-profile.self.cycles-pp.store_msg
>       0.10 ± 14%      -0.0        0.08        perf-profile.self.cycles-pp.__might_fault
>       0.10 ±  7%      -0.0        0.08 ±  8%  perf-profile.self.cycles-pp.page_counter_try_charge
>       0.05 ±  9%      +0.0        0.09 ± 14%  perf-profile.self.cycles-pp.wake_up_q
>       0.09 ± 10%      +0.0        0.12 ± 10%  perf-profile.self.cycles-pp.do_msg_fill
>       0.07 ± 18%      +0.0        0.11 ±  9%  perf-profile.self.cycles-pp.native_irq_return_iret
>       0.02 ±141%      +0.1        0.07 ± 18%  perf-profile.self.cycles-pp._raw_spin_trylock
>       0.00            +0.1        0.06 ± 15%  perf-profile.self.cycles-pp.switch_mm_irqs_off
>       0.00            +0.1        0.07 ± 19%  perf-profile.self.cycles-pp.dequeue_entity
>       0.00            +0.1        0.07 ± 22%  perf-profile.self.cycles-pp.update_curr
>       0.03 ±102%      +0.1        0.10 ± 12%  perf-profile.self.cycles-pp.read_tsc
>       0.00            +0.1        0.08 ± 14%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
>       0.10 ± 12%      +0.1        0.18 ±  5%  perf-profile.self.cycles-pp.msgctl_info
>       0.00            +0.1        0.08 ± 22%  perf-profile.self.cycles-pp.update_cfs_group
>       0.00            +0.1        0.08 ± 29%  perf-profile.self.cycles-pp.finish_task_switch
>       0.00            +0.1        0.08 ± 10%  perf-profile.self.cycles-pp.__update_load_avg_se
>       0.01 ±223%      +0.1        0.09 ± 52%  perf-profile.self.cycles-pp.ttwu_queue_wakelist
>       0.00            +0.1        0.09 ± 13%  perf-profile.self.cycles-pp.reweight_entity
>       0.00            +0.1        0.09 ± 16%  perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
>       0.00            +0.1        0.10 ± 22%  perf-profile.self.cycles-pp.enqueue_task_fair
>       0.00            +0.1        0.10 ± 18%  perf-profile.self.cycles-pp.send_call_function_single_ipi
>       0.00            +0.1        0.11 ± 16%  perf-profile.self.cycles-pp.try_to_wake_up
>       0.01 ±223%      +0.1        0.12 ± 31%  perf-profile.self.cycles-pp.perf_trace_sched_wakeup_template
>       0.00            +0.1        0.12 ± 13%  perf-profile.self.cycles-pp.set_next_entity
>       0.00            +0.1        0.12 ± 12%  perf-profile.self.cycles-pp.update_load_avg
>       0.01 ±223%      +0.1        0.13 ± 25%  perf-profile.self.cycles-pp.available_idle_cpu
>       0.01 ±223%      +0.1        0.15 ± 18%  perf-profile.self.cycles-pp.native_sched_clock
>       0.00            +0.1        0.14 ± 20%  perf-profile.self.cycles-pp.__switch_to_asm
>       0.37 ± 30%      +0.1        0.52 ±  2%  perf-profile.self.cycles-pp.menu_select
>       0.01 ±223%      +0.2        0.16 ± 27%  perf-profile.self.cycles-pp.enqueue_entity
>       0.00            +0.2        0.16 ± 34%  perf-profile.self.cycles-pp.llist_add_batch
>       0.02 ±223%      +0.2        0.19 ± 32%  perf-profile.self.cycles-pp.nr_iowait_cpu
>       0.01 ±223%      +0.2        0.19 ± 17%  perf-profile.self.cycles-pp.do_idle
>       0.00            +0.2        0.19 ± 18%  perf-profile.self.cycles-pp.update_rq_clock
>       0.02 ±223%      +0.2        0.22 ± 17%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
>       0.01 ±223%      +0.2        0.23 ± 21%  perf-profile.self.cycles-pp.__switch_to
>       0.01 ±223%      +0.4        0.41 ± 19%  perf-profile.self.cycles-pp.__schedule
>       0.02 ±223%      +0.6        0.65 ± 17%  perf-profile.self.cycles-pp.wake_q_add
>       0.13 ±166%      +1.9        2.01 ± 22%  perf-profile.self.cycles-pp.poll_idle
>       9.95 ± 12%      +4.4       14.31 ±  4%  perf-profile.self.cycles-pp.do_msgsnd
>       4.52 ± 22%      +6.1       10.58 ±  9%  perf-profile.self.cycles-pp._raw_spin_lock
>
>
>
>
>                           stress-ng.msg.ops_per_sec
>
>   [ASCII trend plot garbled by email line wrapping. Recoverable summary:
>   the y-axis spans 3.4e+06 to 5.4e+06 ops/sec; bisect-good samples ([*],
>   parent commit) fluctuate roughly between 4.2e+06 and 5.4e+06, while
>   bisect-bad samples ([O], patched commit) cluster between roughly
>   3.5e+06 and 4e+06.]
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> ---
> 0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
> https://lists.01.org/hyperkitty/list/[email protected]         Intel Corporation
>
> Thanks,
> Oliver Sang
>
Hi Boris, Srinivas,
On Tue, Apr 13, 2021 at 07:28:27PM +0200, Borislav Petkov wrote:
> On Tue, Apr 13, 2021 at 09:58:01PM +0800, kernel test robot wrote:
> > Greeting,
> >
> > FYI, we noticed a -27.4% regression of stress-ng.msg.ops_per_sec due to commit:
> >
> >
> > commit: 9223d0dccb8f8523754122f68316dd1a4f39f7f8 ("thermal: Move therm_throt there from x86/mce")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
This seems to be another case where a performance swing is caused by a
change in the kernel's data alignment triggered by an unrelated patch.
With a debug patch that force-aligns all data sections of the .o files,
the performance diff is reduced from -27.4% to -2.8%.
And from the perf profile and perf-c2c data, we did see a difference in
spinlock contention around the do_msgrcv()/do_msgsnd() calls between
the two commits.
> Hmm, so I went and ran your reproducer, but simplified (see end of
> mail), on a KBL box here. The kernel is tip:x86/urgent from last week:
>
> 5.12.0-rc6+
> -----------
> stress-ng: info: [1430] dispatching hogs: 9 msg
> stress-ng: info: [1430] successful run completed in 60.01s (1 min, 0.01 secs)
> stress-ng: info: [1430] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
> stress-ng: info: [1430] (secs) (secs) (secs) (real time) (usr+sys time)
> stress-ng: info: [1430] msg 237390147 60.01 104.03 255.85 3955872.56 659636.95
> stress-ng: info: [1430] for a 60.01s run time:
> stress-ng: info: [1430] 360.08s available CPU time
> stress-ng: info: [1430] 104.11s user time ( 28.91%)
> stress-ng: info: [1430] 255.93s system time ( 71.08%)
> stress-ng: info: [1430] 360.04s total time ( 99.99%)
> stress-ng: info: [1430] load average: 8.47 3.71 1.48
>
> Now the same kernel with
>
> > 4f432e8bb1 ("x86/mce: Get rid of mcheck_intel_therm_init()")
> > 9223d0dccb ("thermal: Move therm_throt there from x86/mce")
>
> reverted.
>
> 5.12.0-rc6-rev+
> ---------------
> stress-ng: info: [1246] dispatching hogs: 9 msg
> stress-ng: info: [1246] successful run completed in 60.02s (1 min, 0.02 secs)
> stress-ng: info: [1246] stressor bogo ops real time usr time sys time bogo ops/s bogo ops/s
> stress-ng: info: [1246] (secs) (secs) (secs) (real time) (usr+sys time)
> stress-ng: info: [1246] msg 215174467 60.01 99.64 260.24 3585438.79 597906.15
> stress-ng: info: [1246] for a 60.02s run time:
> stress-ng: info: [1246] 360.10s available CPU time
> stress-ng: info: [1246] 99.72s user time ( 27.69%)
> stress-ng: info: [1246] 260.32s system time ( 72.29%)
> stress-ng: info: [1246] 360.04s total time ( 99.98%)
> stress-ng: info: [1246] load average: 7.98 2.33 0.80
>
> so if I'm reading this correctly, reverting the patches here brings the
> *slow-down*.
>
> What's up?
>
> reproducer:
> ----------
>
> #!/usr/bin/bash
>
> for cpu_dir in /sys/devices/system/cpu/cpu[0-9]*
> do
> online_file="$cpu_dir"/online
> [ -f "$online_file" ] && [ "$(cat "$online_file")" -eq 0 ] && continue
>
> file="$cpu_dir"/cpufreq/scaling_governor
> [ -f "$file" ] && echo "performance" > "$file"
> done
>
> stress-ng --timeout 60 --times --verify --metrics-brief --msg 9
The original test case uses 'nr_threads=10%', which translates to 9
workers on the 96-CPU, 2-socket Cascade Lake platform. So I guess it may
not reproduce on a 1-socket platform, and sometimes the kernel config
also matters for a micro-benchmark like 'stress-ng'.
Thanks,
Feng
> --
> Regards/Gruss,
> Boris.
>
> SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg