2021-06-28 07:17:40

by kernel test robot

[permalink] [raw]
Subject: [trace] 3d3d9c072e: stress-ng.msg.ops_per_sec 18.5% improvement



Greetings,

FYI, we noticed an 18.5% improvement of stress-ng.msg.ops_per_sec due to commit:


commit: 3d3d9c072e377a0c5fa0a1f661a95134e787db3a ("trace: replace WB_REASON_FOREIGN_FLUSH with a string")
https://git.kernel.org/cgit/linux/kernel/git/rostedt/linux-trace.git for-next


in testcase: stress-ng
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
with following parameters:

nr_threads: 10%
disk: 1HDD
testtime: 60s
fs: ext4
class: os
test: msg
cpufreq_governor: performance
ucode: 0x5003006






Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
bin/lkp run generated-yaml-file

=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
os/gcc-9/performance/1HDD/ext4/x86_64-rhel-8.3/10%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp5/msg/stress-ng/60s/0x5003006

commit:
957cdcd9bd ("ring-buffer: Use fallthrough pseudo-keyword")
3d3d9c072e ("trace: replace WB_REASON_FOREIGN_FLUSH with a string")

957cdcd9bd7e035d 3d3d9c072e377a0c5fa0a1f661a
---------------- ---------------------------
%stddev %change %stddev
\ | \
1.997e+08 +18.5% 2.367e+08 stress-ng.msg.ops
3327615 +18.5% 3943995 stress-ng.msg.ops_per_sec
1246 ± 10% +62.2% 2022 ± 5% stress-ng.time.involuntary_context_switches
1565 +5.7% 1655 stress-ng.time.percent_of_cpu_this_job_got
955.26 +5.8% 1010 stress-ng.time.system_time
40528886 ± 9% -77.6% 9060408 ± 40% stress-ng.time.voluntary_context_switches
20278090 ± 4% -29.3% 14340237 turbostat.IRQ
37628340 ± 9% -76.4% 8880621 ± 51% cpuidle.POLL.time
30309198 ± 9% -77.7% 6751779 ± 55% cpuidle.POLL.usage
83.75 -1.3% 82.62 iostat.cpu.idle
15.92 +7.0% 17.03 iostat.cpu.system
1248983 ± 9% -77.5% 280750 ± 40% vmstat.system.cs
312514 ± 4% -29.2% 221116 vmstat.system.in
16252 ± 19% -30.5% 11301 ± 16% softirqs.CPU34.SCHED
11413 ± 16% -25.4% 8518 ± 28% softirqs.CPU38.SCHED
14279 ± 30% -43.8% 8025 ± 33% softirqs.CPU39.SCHED
119136 ± 35% +316.6% 496375 ± 35% numa-numastat.node0.local_node
181940 ± 6% +202.3% 549926 ± 33% numa-numastat.node0.numa_hit
141089 ± 30% +107.4% 292570 ± 35% numa-numastat.node1.local_node
164880 ± 7% +97.5% 325601 ± 34% numa-numastat.node1.numa_hit
381961 +137.1% 905716 ± 30% proc-vmstat.numa_hit
295354 +177.3% 819154 ± 33% proc-vmstat.numa_local
393066 +137.5% 933616 ± 30% proc-vmstat.pgalloc_normal
263755 +204.3% 802655 ± 35% proc-vmstat.pgfree
11952 ± 4% +36.4% 16297 ± 6% slabinfo.kmalloc-512.active_objs
376.33 ± 4% +36.4% 513.50 ± 6% slabinfo.kmalloc-512.active_slabs
12055 ± 4% +36.4% 16445 ± 6% slabinfo.kmalloc-512.num_objs
376.33 ± 4% +36.4% 513.50 ± 6% slabinfo.kmalloc-512.num_slabs
54896 ± 3% +47.6% 81015 ± 7% slabinfo.kmalloc-64.active_objs
858.17 ± 3% +47.7% 1267 ± 7% slabinfo.kmalloc-64.active_slabs
54962 ± 3% +47.6% 81140 ± 7% slabinfo.kmalloc-64.num_objs
858.17 ± 3% +47.7% 1267 ± 7% slabinfo.kmalloc-64.num_slabs
7755346 ± 12% -76.8% 1802412 ± 7% interrupts.CAL:Function_call_interrupts
456.17 ±141% -92.8% 32.67 ± 63% interrupts.CPU0.RES:Rescheduling_interrupts
281.67 ± 86% -89.3% 30.17 ±120% interrupts.CPU20.RES:Rescheduling_interrupts
37397 ± 75% -80.2% 7409 ± 68% interrupts.CPU34.CAL:Function_call_interrupts
2076 ± 96% -83.4% 343.67 ±167% interrupts.CPU34.NMI:Non-maskable_interrupts
2076 ± 96% -83.4% 343.67 ±167% interrupts.CPU34.PMI:Performance_monitoring_interrupts
45020 ± 71% -95.8% 1880 ± 80% interrupts.CPU39.CAL:Function_call_interrupts
407.50 ± 47% -77.1% 93.50 ±148% interrupts.CPU68.RES:Rescheduling_interrupts
443.50 ±103% -82.1% 79.50 ± 20% interrupts.CPU93.NMI:Non-maskable_interrupts
443.50 ±103% -82.1% 79.50 ± 20% interrupts.CPU93.PMI:Performance_monitoring_interrupts
19756 ± 18% -77.7% 4403 ± 21% interrupts.RES:Rescheduling_interrupts
3.43e+09 ± 3% -7.1% 3.185e+09 ± 2% perf-stat.i.branch-instructions
0.74 ± 4% -0.2 0.58 ± 6% perf-stat.i.branch-miss-rate%
24452178 ± 5% -29.2% 17322512 ± 5% perf-stat.i.branch-misses
1290940 ± 9% -77.6% 289762 ± 39% perf-stat.i.context-switches
4.379e+09 ± 3% -9.4% 3.967e+09 ± 3% perf-stat.i.dTLB-loads
2.503e+09 ± 3% -11.6% 2.212e+09 ± 3% perf-stat.i.dTLB-stores
87.84 -5.0 82.86 perf-stat.i.iTLB-load-miss-rate%
13313579 ± 6% -36.5% 8456850 ± 6% perf-stat.i.iTLB-load-misses
1.688e+10 ± 3% -6.4% 1.58e+10 ± 2% perf-stat.i.instructions
1321 ± 4% +45.4% 1921 ± 6% perf-stat.i.instructions-per-iTLB-miss
0.35 ± 2% -6.0% 0.33 ± 3% perf-stat.i.ipc
111.30 ± 3% -9.2% 101.04 ± 2% perf-stat.i.metric.M/sec
16080 ± 3% +371.9% 75880 ± 32% perf-stat.i.node-stores
0.71 ± 3% -0.2 0.54 ± 3% perf-stat.overall.branch-miss-rate%
2.93 ± 2% +6.7% 3.12 ± 2% perf-stat.overall.cpi
89.24 -5.1 84.11 perf-stat.overall.iTLB-load-miss-rate%
1272 ± 4% +47.5% 1876 ± 6% perf-stat.overall.instructions-per-iTLB-miss
0.34 ± 3% -6.3% 0.32 ± 2% perf-stat.overall.ipc
3.376e+09 ± 3% -7.1% 3.135e+09 ± 2% perf-stat.ps.branch-instructions
24066583 ± 5% -29.2% 17045901 ± 5% perf-stat.ps.branch-misses
1270763 ± 9% -77.6% 285171 ± 40% perf-stat.ps.context-switches
4.31e+09 ± 3% -9.4% 3.905e+09 ± 3% perf-stat.ps.dTLB-loads
2.463e+09 ± 3% -11.6% 2.177e+09 ± 3% perf-stat.ps.dTLB-stores
13105434 ± 6% -36.5% 8323297 ± 6% perf-stat.ps.iTLB-load-misses
1.662e+10 ± 3% -6.4% 1.556e+10 ± 2% perf-stat.ps.instructions
15767 ± 3% +374.0% 74743 ± 32% perf-stat.ps.node-stores
1.048e+12 ± 2% -6.4% 9.805e+11 ± 3% perf-stat.total.instructions
1.08 ± 17% +0.4 1.49 ± 6% perf-profile.calltrace.cycles-pp.get_obj_cgroup_from_current.__kmalloc.load_msg.do_msgsnd.do_syscall_64
2.12 ± 9% +0.6 2.77 ± 9% perf-profile.calltrace.cycles-pp.__mod_memcg_state.__mod_memcg_lruvec_state.kfree.free_msg.do_msgrcv
2.22 ± 9% +0.7 2.88 ± 10% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.kfree.free_msg.do_msgrcv.do_syscall_64
1.66 ± 23% +0.7 2.34 ± 9% perf-profile.calltrace.cycles-pp.ipc_obtain_object_check.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.74 ± 10% +0.9 3.60 ± 10% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.do_msgrcv.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.94 ± 14% +1.0 2.98 ± 10% perf-profile.calltrace.cycles-pp.__mod_memcg_state.__mod_memcg_lruvec_state.__kmalloc.load_msg.do_msgsnd
2.02 ± 14% +1.1 3.09 ± 9% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__kmalloc.load_msg.do_msgsnd.do_syscall_64
1.83 ± 28% +1.9 3.76 ± 20% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.22 ± 14% +2.4 8.66 ± 8% perf-profile.calltrace.cycles-pp.__kmalloc.load_msg.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe
7.17 ± 15% +2.6 9.80 ± 9% perf-profile.calltrace.cycles-pp.load_msg.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.12 ± 8% -2.4 0.76 ± 42% perf-profile.children.cycles-pp.__schedule
2.49 ± 8% -1.9 0.60 ± 58% perf-profile.children.cycles-pp.poll_idle
1.82 ± 9% -1.3 0.48 ± 55% perf-profile.children.cycles-pp.schedule
1.78 ± 8% -1.3 0.51 ± 44% perf-profile.children.cycles-pp.wake_up_q
1.69 ± 8% -1.2 0.45 ± 49% perf-profile.children.cycles-pp.try_to_wake_up
1.38 ± 9% -1.1 0.30 ± 22% perf-profile.children.cycles-pp.schedule_idle
0.84 ± 8% -0.6 0.22 ± 56% perf-profile.children.cycles-pp.dequeue_task_fair
0.77 ± 6% -0.6 0.18 ± 37% perf-profile.children.cycles-pp.ttwu_do_activate
0.76 ± 9% -0.6 0.19 ± 59% perf-profile.children.cycles-pp.dequeue_entity
0.73 ± 5% -0.6 0.17 ± 35% perf-profile.children.cycles-pp.enqueue_task_fair
0.73 ± 21% -0.5 0.24 ±100% perf-profile.children.cycles-pp.flush_smp_call_function_from_idle
0.68 ± 20% -0.5 0.21 ± 81% perf-profile.children.cycles-pp.sched_ttwu_pending
0.58 ± 6% -0.4 0.13 ± 38% perf-profile.children.cycles-pp.enqueue_entity
0.55 ± 11% -0.4 0.12 ± 34% perf-profile.children.cycles-pp.pick_next_task_fair
0.76 ± 11% -0.4 0.39 ± 29% perf-profile.children.cycles-pp.menu_select
0.53 ± 10% -0.4 0.17 ± 42% perf-profile.children.cycles-pp.wake_q_add
0.44 ± 22% -0.3 0.11 ± 23% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.44 ± 8% -0.3 0.11 ± 32% perf-profile.children.cycles-pp.update_load_avg
0.48 ± 24% -0.3 0.15 ± 59% perf-profile.children.cycles-pp.ttwu_queue_wakelist
0.36 ± 14% -0.3 0.06 ± 52% perf-profile.children.cycles-pp.set_next_entity
0.40 ± 13% -0.3 0.11 ± 84% perf-profile.children.cycles-pp.update_curr
0.36 ± 8% -0.3 0.09 ± 40% perf-profile.children.cycles-pp.__switch_to
0.35 ± 22% -0.3 0.08 ± 27% perf-profile.children.cycles-pp.finish_task_switch
0.33 ± 22% -0.2 0.08 ± 28% perf-profile.children.cycles-pp.sysvec_call_function_single
0.34 ± 6% -0.2 0.10 ± 33% perf-profile.children.cycles-pp.update_rq_clock
0.28 ± 23% -0.2 0.06 ±123% perf-profile.children.cycles-pp.flush_smp_call_function_queue
0.28 ± 23% -0.2 0.07 ± 30% perf-profile.children.cycles-pp.__sysvec_call_function_single
0.27 ± 9% -0.2 0.06 ±145% perf-profile.children.cycles-pp.tick_nohz_idle_exit
0.40 ± 15% -0.2 0.23 ± 31% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.26 ± 5% -0.2 0.10 ± 32% perf-profile.children.cycles-pp.sched_clock_cpu
0.22 ± 35% -0.2 0.06 ± 87% perf-profile.children.cycles-pp.__smp_call_single_queue
0.22 ± 35% -0.2 0.06 ± 87% perf-profile.children.cycles-pp.llist_add_batch
0.22 ± 20% -0.2 0.06 ±113% perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
0.23 ± 7% -0.1 0.08 ± 37% perf-profile.children.cycles-pp.native_sched_clock
0.30 ± 17% -0.1 0.20 ± 34% perf-profile.children.cycles-pp.tick_nohz_next_event
0.11 ± 8% -0.1 0.05 ± 48% perf-profile.children.cycles-pp.read_tsc
0.12 ± 12% -0.0 0.08 ± 11% perf-profile.children.cycles-pp.native_irq_return_iret
0.07 ± 14% +0.0 0.10 ± 13% perf-profile.children.cycles-pp.get_mem_cgroup_from_objcg
0.16 ± 14% +0.0 0.19 ± 7% perf-profile.children.cycles-pp.__get_user_8
0.14 ± 9% +0.0 0.18 ± 8% perf-profile.children.cycles-pp.drain_obj_stock
0.10 ± 10% +0.1 0.16 ± 13% perf-profile.children.cycles-pp.obj_cgroup_charge_pages
0.18 ± 10% +0.1 0.23 ± 12% perf-profile.children.cycles-pp.obj_cgroup_charge
0.07 ± 15% +0.1 0.14 ± 12% perf-profile.children.cycles-pp.msgctl_info
0.13 ± 13% +0.1 0.22 ± 11% perf-profile.children.cycles-pp.ksys_msgctl
0.22 ± 10% +0.1 0.33 ± 19% perf-profile.children.cycles-pp.ipcperms
0.28 ± 18% +0.1 0.41 ± 32% perf-profile.children.cycles-pp.__list_del_entry_valid
1.09 ± 17% +0.4 1.51 ± 6% perf-profile.children.cycles-pp.get_obj_cgroup_from_current
3.46 ± 15% +0.9 4.38 ± 9% perf-profile.children.cycles-pp.ipc_obtain_object_check
4.07 ± 11% +1.7 5.77 ± 9% perf-profile.children.cycles-pp.__mod_memcg_state
4.26 ± 11% +1.7 6.00 ± 9% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
6.26 ± 14% +2.4 8.72 ± 8% perf-profile.children.cycles-pp.__kmalloc
7.18 ± 15% +2.6 9.82 ± 9% perf-profile.children.cycles-pp.load_msg
4.60 ± 16% +2.8 7.40 ± 9% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
2.31 ± 8% -1.8 0.56 ± 59% perf-profile.self.cycles-pp.poll_idle
0.63 ± 9% -0.5 0.15 ± 38% perf-profile.self.cycles-pp.__schedule
0.53 ± 10% -0.4 0.17 ± 42% perf-profile.self.cycles-pp.wake_q_add
0.35 ± 8% -0.3 0.08 ± 40% perf-profile.self.cycles-pp.__switch_to
0.29 ± 8% -0.2 0.06 ± 59% perf-profile.self.cycles-pp.enqueue_entity
0.24 ± 6% -0.2 0.06 ± 50% perf-profile.self.cycles-pp.do_idle
0.22 ± 35% -0.2 0.06 ± 87% perf-profile.self.cycles-pp.llist_add_batch
0.29 ± 7% -0.2 0.13 ± 32% perf-profile.self.cycles-pp.menu_select
0.22 ± 6% -0.1 0.08 ± 39% perf-profile.self.cycles-pp.native_sched_clock
0.23 ± 9% -0.1 0.12 ± 32% perf-profile.self.cycles-pp.do_msg_fill
0.11 ± 7% -0.1 0.05 ± 48% perf-profile.self.cycles-pp.read_tsc
0.12 ± 12% -0.0 0.08 ± 8% perf-profile.self.cycles-pp.native_irq_return_iret
0.08 ± 8% -0.0 0.05 ± 45% perf-profile.self.cycles-pp.wake_up_q
0.07 ± 14% +0.0 0.10 ± 13% perf-profile.self.cycles-pp.get_mem_cgroup_from_objcg
0.15 ± 12% +0.0 0.19 ± 7% perf-profile.self.cycles-pp.__get_user_8
0.06 ± 13% +0.0 0.10 ± 14% perf-profile.self.cycles-pp.msgctl_info
0.00 +0.1 0.05 ± 7% perf-profile.self.cycles-pp.format_decode
0.26 ± 12% +0.1 0.34 ± 11% perf-profile.self.cycles-pp.sysvipc_find_ipc
0.21 ± 11% +0.1 0.32 ± 19% perf-profile.self.cycles-pp.ipcperms
0.06 ± 50% +0.1 0.17 ± 33% perf-profile.self.cycles-pp.free_msg
0.28 ± 17% +0.1 0.41 ± 31% perf-profile.self.cycles-pp.__list_del_entry_valid
0.57 ± 26% +0.3 0.83 ± 19% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
1.07 ± 17% +0.4 1.48 ± 6% perf-profile.self.cycles-pp.get_obj_cgroup_from_current
2.18 ± 18% +0.7 2.90 ± 12% perf-profile.self.cycles-pp.ipc_obtain_object_check
4.07 ± 11% +1.7 5.77 ± 9% perf-profile.self.cycles-pp.__mod_memcg_state
4.55 ± 16% +2.8 7.34 ± 9% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
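As a sanity check on the headline number (this sketch is not part of the report; the two means are the stress-ng.msg.ops_per_sec values quoted in the comparison table above), the %change column is simply the relative difference between the patched and base means:

```python
# Recompute the headline %change from the two ops_per_sec means in the table.
base = 3327615     # mean for 957cdcd9bd ("ring-buffer: Use fallthrough pseudo-keyword")
patched = 3943995  # mean for 3d3d9c072e ("trace: replace WB_REASON_FOREIGN_FLUSH with a string")

change_pct = (patched / base - 1) * 100
print(f"{change_pct:+.1f}%")  # prints "+18.5%", matching the subject line
```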



stress-ng.msg.ops_per_sec

4.4e+06 +-----------------------------------------------------------------+
| O |
4.2e+06 |-+ O O O O |
| O O O O O |
4e+06 |-O O OO O O O OO O O O O O O O O O OO |
| O O O O O O O O O |
3.8e+06 |-+ |
| |
3.6e+06 |-+ .+ .+ |
| + + +.++ + |
3.4e+06 |-+ +. : + ++ +. + |
| .+ +.: : +.+ +.+ |
3.2e+06 |-++.+ + : .+ .+ + |
|.+ + +.+ + |
3e+06 +-----------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (17.92 kB)
config-5.13.0-rc5-00009-g3d3d9c072e37 (176.85 kB)
job-script (8.41 kB)
job.yaml (5.83 kB)
reproduce (548.00 B)
Download all attachments