2021-03-11 02:01:51

by kernel test robot

[permalink] [raw]
Subject: [printk] 996e966640: stress-ng.klog.ops_per_sec 1097.4% improvement



Greeting,

FYI, we noticed a 1097.4% improvement of stress-ng.klog.ops_per_sec due to commit:


commit: 996e966640ddea7b535cbe7b749e87a3b24f37e8 ("printk: remove logbuf_lock")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master


in testcase: stress-ng
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
with following parameters:

nr_threads: 10%
disk: 1HDD
testtime: 60s
fs: ext4
class: os
test: klog
cpufreq_governor: performance
ucode: 0x5003006






Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml

=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
os/gcc-9/performance/1HDD/ext4/x86_64-rhel-8.3/10%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp5/klog/stress-ng/60s/0x5003006

commit:
f9f3f02db9 ("printk: introduce a kmsg_dump iterator")
996e966640 ("printk: remove logbuf_lock")

f9f3f02db98bbe67 996e966640ddea7b535cbe7b749
---------------- ---------------------------
%stddev %change %stddev
\ | \
26314 +1097.3% 315061 stress-ng.klog.ops
438.54 +1097.4% 5250 stress-ng.klog.ops_per_sec
961.83 ? 2% -15.9% 809.00 stress-ng.time.involuntary_context_switches
8450 ? 39% +4572.7% 394845 ?162% cpuidle.POLL.usage
7225 ? 22% +27.2% 9187 ? 11% softirqs.CPU2.SCHED
1844 -3.7% 1775 vmstat.system.cs
176153 ? 70% -50.0% 88051 ?141% numa-vmstat.node0.nr_dirtied
176152 ? 70% -50.0% 88051 ?141% numa-vmstat.node0.nr_written
88056 ?141% +100.1% 176157 ? 70% numa-vmstat.node1.nr_dirtied
88055 ?141% +100.1% 176156 ? 70% numa-vmstat.node1.nr_written
38516 +39.4% 53679 ? 27% sched_debug.cpu.clock.avg
38521 +39.4% 53683 ? 27% sched_debug.cpu.clock.max
38510 +39.4% 53674 ? 27% sched_debug.cpu.clock.min
38388 +39.2% 53441 ? 27% sched_debug.cpu.clock_task.avg
38507 +39.1% 53581 ? 27% sched_debug.cpu.clock_task.max
38513 +39.4% 53676 ? 27% sched_debug.cpu_clk
38019 +39.9% 53181 ? 28% sched_debug.ktime
38879 +39.0% 54033 ? 27% sched_debug.sched_clk
4.47 ? 7% -92.3% 0.35 ? 37% perf-stat.i.MPKI
1.824e+09 +709.2% 1.476e+10 perf-stat.i.branch-instructions
0.49 ? 7% -0.2 0.28 ? 4% perf-stat.i.branch-miss-rate%
8126698 ? 3% +291.8% 31840686 perf-stat.i.branch-misses
30.49 ? 10% -14.0 16.46 ? 3% perf-stat.i.cache-miss-rate%
9809108 ? 8% -83.9% 1581490 ? 3% perf-stat.i.cache-misses
32576128 ? 4% -70.0% 9756759 ? 6% perf-stat.i.cache-references
1700 -4.5% 1623 perf-stat.i.context-switches
3.61 -86.3% 0.49 perf-stat.i.cpi
106.88 -1.6% 105.21 perf-stat.i.cpu-migrations
2857 ? 7% +557.6% 18789 ? 4% perf-stat.i.cycles-between-cache-misses
1.543e+09 +608.5% 1.093e+10 ? 2% perf-stat.i.dTLB-loads
7.862e+08 +893.9% 7.813e+09 perf-stat.i.dTLB-stores
46.51 +3.6 50.13 ? 5% perf-stat.i.iTLB-load-miss-rate%
1764355 +14.1% 2013317 perf-stat.i.iTLB-load-misses
7.571e+09 +699.3% 6.051e+10 perf-stat.i.instructions
4308 +598.5% 30094 perf-stat.i.instructions-per-iTLB-miss
0.29 +636.7% 2.13 perf-stat.i.ipc
12.32 -8.6% 11.26 perf-stat.i.major-faults
0.63 ? 11% +72.3% 1.08 ? 12% perf-stat.i.metric.K/sec
43.60 +700.6% 349.10 perf-stat.i.metric.M/sec
3214 -2.0% 3149 perf-stat.i.minor-faults
97.27 -10.4 86.91 perf-stat.i.node-load-miss-rate%
2442015 ? 9% -89.9% 246595 ? 17% perf-stat.i.node-load-misses
96.91 -11.0 85.92 ? 8% perf-stat.i.node-store-miss-rate%
1985853 ? 9% -97.0% 59228 ? 6% perf-stat.i.node-store-misses
3226 -2.0% 3160 perf-stat.i.page-faults
4.29 ? 3% -96.2% 0.16 ? 6% perf-stat.overall.MPKI
0.44 ? 3% -0.2 0.22 perf-stat.overall.branch-miss-rate%
30.37 ? 11% -14.1 16.27 ? 6% perf-stat.overall.cache-miss-rate%
3.64 -87.4% 0.46 perf-stat.overall.cpi
2824 ? 7% +522.1% 17572 ? 3% perf-stat.overall.cycles-between-cache-misses
0.00 ? 81% -0.0 0.00 ? 74% perf-stat.overall.dTLB-load-miss-rate%
0.00 ? 31% -0.0 0.00 ? 30% perf-stat.overall.dTLB-store-miss-rate%
46.46 +3.7 50.11 ? 5% perf-stat.overall.iTLB-load-miss-rate%
4306 +598.3% 30070 perf-stat.overall.instructions-per-iTLB-miss
0.27 +694.3% 2.18 perf-stat.overall.ipc
98.28 -13.2 85.05 perf-stat.overall.node-load-miss-rate%
99.06 -21.7 77.41 ? 6% perf-stat.overall.node-store-miss-rate%
1.795e+09 +709.1% 1.452e+10 perf-stat.ps.branch-instructions
7911307 ? 3% +296.0% 31331201 perf-stat.ps.branch-misses
9665170 ? 8% -83.9% 1555589 ? 3% perf-stat.ps.cache-misses
31965660 ? 4% -70.0% 9598649 ? 6% perf-stat.ps.cache-references
1664 -4.1% 1597 perf-stat.ps.context-switches
104.80 -1.3% 103.49 perf-stat.ps.cpu-migrations
1.518e+09 +608.7% 1.076e+10 ? 2% perf-stat.ps.dTLB-loads
7.733e+08 +894.4% 7.689e+09 perf-stat.ps.dTLB-stores
1730085 +14.5% 1980925 perf-stat.ps.iTLB-load-misses
7.449e+09 +699.5% 5.955e+10 perf-stat.ps.instructions
2408929 ? 9% -89.9% 242555 ? 17% perf-stat.ps.node-load-misses
1959947 ? 9% -97.0% 58243 ? 6% perf-stat.ps.node-store-misses
4.701e+11 +699.2% 3.757e+12 perf-stat.total.instructions
58.25 ? 7% -58.3 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock.syslog_print_all.do_syslog.__x64_sys_syslog.do_syscall_64
57.16 ? 7% -57.2 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.syslog_print_all.do_syslog.__x64_sys_syslog
0.80 ? 8% +0.2 1.00 ? 4% perf-profile.calltrace.cycles-pp.desc_read.desc_read_finalized_seq._prb_read_valid.prb_read_valid.syslog_print_all
0.00 +0.6 0.57 ? 4% perf-profile.calltrace.cycles-pp.put_dec_trunc8.number.vsnprintf.sprintf.info_print_prefix
0.00 +0.6 0.58 ? 3% perf-profile.calltrace.cycles-pp.memmove.record_print_text.syslog_print_all.do_syslog.__x64_sys_syslog
0.00 +0.6 0.61 ? 7% perf-profile.calltrace.cycles-pp.copy_user_generic_unrolled._copy_to_user.syslog_print_all.do_syslog.__x64_sys_syslog
0.88 ? 8% +0.7 1.61 ? 3% perf-profile.calltrace.cycles-pp.desc_read_finalized_seq._prb_read_valid.prb_read_valid.syslog_print_all.do_syslog
0.00 +0.7 0.74 ? 4% perf-profile.calltrace.cycles-pp.memcpy_erms._prb_read_valid.prb_read_valid.syslog_print_all.do_syslog
0.00 +0.8 0.84 ? 5% perf-profile.calltrace.cycles-pp.memcpy_erms.record_print_text.syslog_print_all.do_syslog.__x64_sys_syslog
0.00 +1.3 1.28 ? 4% perf-profile.calltrace.cycles-pp.desc_read.desc_read_finalized_seq._prb_read_valid.prb_read_valid_info.find_first_fitting_seq
0.00 +1.4 1.40 ? 4% perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string._copy_to_user.syslog_print_all.do_syslog.__x64_sys_syslog
0.00 +1.4 1.44 ? 4% perf-profile.calltrace.cycles-pp.__check_heap_object.__check_object_size.syslog_print_all.do_syslog.__x64_sys_syslog
0.00 +2.0 2.02 ? 4% perf-profile.calltrace.cycles-pp.memchr.record_print_text.syslog_print_all.do_syslog.__x64_sys_syslog
0.00 +2.0 2.03 ? 2% perf-profile.calltrace.cycles-pp.desc_read_finalized_seq._prb_read_valid.prb_read_valid_info.find_first_fitting_seq.syslog_print_all
0.00 +2.2 2.20 ? 4% perf-profile.calltrace.cycles-pp.memchr._prb_read_valid.prb_read_valid_info.find_first_fitting_seq.syslog_print_all
0.00 +2.9 2.93 ? 2% perf-profile.calltrace.cycles-pp.__check_object_size.syslog_print_all.do_syslog.__x64_sys_syslog.do_syscall_64
0.00 +2.9 2.93 ? 3% perf-profile.calltrace.cycles-pp._copy_to_user.syslog_print_all.do_syslog.__x64_sys_syslog.do_syscall_64
1.12 ? 8% +3.3 4.44 ? 4% perf-profile.calltrace.cycles-pp._prb_read_valid.prb_read_valid.syslog_print_all.do_syslog.__x64_sys_syslog
1.13 ? 8% +3.5 4.63 ? 4% perf-profile.calltrace.cycles-pp.prb_read_valid.syslog_print_all.do_syslog.__x64_sys_syslog.do_syscall_64
0.00 +3.6 3.63 ? 3% perf-profile.calltrace.cycles-pp.memcpy_erms.vsnprintf.sprintf.info_print_prefix.record_print_text
0.00 +3.9 3.86 ? 3% perf-profile.calltrace.cycles-pp.number.vsnprintf.sprintf.info_print_prefix.record_print_text
0.00 +4.2 4.24 ? 2% perf-profile.calltrace.cycles-pp.memcpy_erms.vsnprintf.sprintf.info_print_prefix.find_first_fitting_seq
0.00 +4.6 4.56 ? 4% perf-profile.calltrace.cycles-pp.number.vsnprintf.sprintf.info_print_prefix.find_first_fitting_seq
0.00 +5.1 5.09 ? 3% perf-profile.calltrace.cycles-pp.format_decode.vsnprintf.sprintf.info_print_prefix.record_print_text
0.00 +6.1 6.08 ? 6% perf-profile.calltrace.cycles-pp.format_decode.vsnprintf.sprintf.info_print_prefix.find_first_fitting_seq
0.47 ? 44% +6.4 6.84 ? 4% perf-profile.calltrace.cycles-pp._prb_read_valid.prb_read_valid_info.find_first_fitting_seq.syslog_print_all.do_syslog
0.49 ? 44% +6.6 7.12 ? 3% perf-profile.calltrace.cycles-pp.prb_read_valid_info.find_first_fitting_seq.syslog_print_all.do_syslog.__x64_sys_syslog
1.41 ? 6% +15.7 17.08 ? 3% perf-profile.calltrace.cycles-pp.vsnprintf.sprintf.info_print_prefix.record_print_text.syslog_print_all
1.50 ? 7% +16.8 18.34 ? 3% perf-profile.calltrace.cycles-pp.sprintf.info_print_prefix.record_print_text.syslog_print_all.do_syslog
1.57 ? 7% +17.7 19.25 ? 3% perf-profile.calltrace.cycles-pp.info_print_prefix.record_print_text.syslog_print_all.do_syslog.__x64_sys_syslog
1.61 ? 6% +18.4 20.00 ? 3% perf-profile.calltrace.cycles-pp.vsnprintf.sprintf.info_print_prefix.find_first_fitting_seq.syslog_print_all
1.70 ? 6% +19.8 21.54 ? 3% perf-profile.calltrace.cycles-pp.sprintf.info_print_prefix.find_first_fitting_seq.syslog_print_all.do_syslog
1.78 ? 6% +20.8 22.59 ? 3% perf-profile.calltrace.cycles-pp.info_print_prefix.find_first_fitting_seq.syslog_print_all.do_syslog.__x64_sys_syslog
2.08 ? 7% +22.4 24.43 ? 3% perf-profile.calltrace.cycles-pp.record_print_text.syslog_print_all.do_syslog.__x64_sys_syslog.do_syscall_64
2.39 ? 7% +27.8 30.14 ? 3% perf-profile.calltrace.cycles-pp.find_first_fitting_seq.syslog_print_all.do_syslog.__x64_sys_syslog.do_syscall_64
58.35 ? 7% -58.3 0.04 ? 71% perf-profile.children.cycles-pp._raw_spin_lock
57.20 ? 7% -57.2 0.00 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
0.00 +0.1 0.11 ? 7% perf-profile.children.cycles-pp.check_stack_object
0.00 +0.1 0.12 ? 3% perf-profile.children.cycles-pp.put_dec
0.00 +0.1 0.14 ? 5% perf-profile.children.cycles-pp.___might_sleep
0.00 +0.1 0.15 ? 5% perf-profile.children.cycles-pp.__printk_safe_enter
0.00 +0.2 0.18 ? 4% perf-profile.children.cycles-pp.__might_sleep
0.01 ?223% +0.4 0.41 ? 5% perf-profile.children.cycles-pp.__virt_addr_valid
0.01 ?223% +0.4 0.43 ? 4% perf-profile.children.cycles-pp.__might_fault
0.01 ?223% +0.5 0.46 ? 7% perf-profile.children.cycles-pp.memcpy
0.06 ? 9% +0.6 0.62 ? 3% perf-profile.children.cycles-pp.memmove
0.07 ? 14% +0.7 0.72 ? 7% perf-profile.children.cycles-pp.copy_user_generic_unrolled
0.07 ? 19% +0.7 0.79 ? 5% perf-profile.children.cycles-pp.get_data
0.09 ? 11% +0.9 1.01 ? 3% perf-profile.children.cycles-pp.put_dec_trunc8
0.09 ? 16% +1.1 1.19 ? 3% perf-profile.children.cycles-pp.skip_atoi
0.12 ? 16% +1.4 1.48 ? 4% perf-profile.children.cycles-pp.__check_heap_object
0.14 ? 7% +1.4 1.54 ? 4% perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
0.90 ? 8% +1.4 2.31 ? 4% perf-profile.children.cycles-pp.desc_read
0.26 ? 6% +2.7 2.97 ? 3% perf-profile.children.cycles-pp._copy_to_user
1.04 ? 8% +2.8 3.81 ? 3% perf-profile.children.cycles-pp.desc_read_finalized_seq
0.24 ? 10% +2.8 3.06 ? 2% perf-profile.children.cycles-pp.__check_object_size
1.14 ? 8% +3.5 4.68 ? 4% perf-profile.children.cycles-pp.prb_read_valid
0.40 ? 7% +4.3 4.67 ? 4% perf-profile.children.cycles-pp.memchr
0.57 ? 8% +6.6 7.17 ? 3% perf-profile.children.cycles-pp.prb_read_valid_info
0.70 ? 9% +7.9 8.56 ? 4% perf-profile.children.cycles-pp.number
0.83 ? 8% +9.3 10.13 ? 2% perf-profile.children.cycles-pp.memcpy_erms
1.69 ? 7% +9.8 11.47 ? 3% perf-profile.children.cycles-pp._prb_read_valid
0.89 ? 4% +10.3 11.20 ? 3% perf-profile.children.cycles-pp.format_decode
2.09 ? 7% +22.4 24.50 ? 3% perf-profile.children.cycles-pp.record_print_text
2.39 ? 7% +27.8 30.18 ? 3% perf-profile.children.cycles-pp.find_first_fitting_seq
3.08 ? 6% +35.2 38.28 ? 3% perf-profile.children.cycles-pp.vsnprintf
3.21 ? 6% +36.9 40.07 ? 3% perf-profile.children.cycles-pp.sprintf
3.37 ? 6% +38.6 41.98 ? 3% perf-profile.children.cycles-pp.info_print_prefix
57.20 ? 7% -57.2 0.00 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
1.15 ? 10% -1.1 0.04 ? 71% perf-profile.self.cycles-pp._raw_spin_lock
0.00 +0.1 0.08 ? 11% perf-profile.self.cycles-pp.check_stack_object
0.00 +0.1 0.10 ? 10% perf-profile.self.cycles-pp.__might_fault
0.00 +0.1 0.11 ? 5% perf-profile.self.cycles-pp.__printk_safe_enter
0.00 +0.1 0.12 ? 3% perf-profile.self.cycles-pp.put_dec
0.00 +0.1 0.14 ? 5% perf-profile.self.cycles-pp.___might_sleep
0.00 +0.2 0.15 ? 7% perf-profile.self.cycles-pp.prb_read_valid
0.00 +0.2 0.18 ? 4% perf-profile.self.cycles-pp.__might_sleep
0.00 +0.3 0.26 ? 5% perf-profile.self.cycles-pp.prb_read_valid_info
0.00 +0.3 0.31 ? 4% perf-profile.self.cycles-pp._copy_to_user
0.00 +0.4 0.38 ? 3% perf-profile.self.cycles-pp.__virt_addr_valid
0.00 +0.4 0.38 ? 7% perf-profile.self.cycles-pp.find_first_fitting_seq
0.01 ?223% +0.4 0.42 ? 8% perf-profile.self.cycles-pp.memcpy
0.14 ? 14% +0.5 0.61 ? 5% perf-profile.self.cycles-pp.syslog_print_all
0.06 ? 9% +0.6 0.61 ? 3% perf-profile.self.cycles-pp.memmove
0.06 ? 14% +0.6 0.70 ? 6% perf-profile.self.cycles-pp.copy_user_generic_unrolled
0.06 ? 21% +0.7 0.75 ? 5% perf-profile.self.cycles-pp.get_data
0.09 ? 11% +0.9 1.01 ? 3% perf-profile.self.cycles-pp.put_dec_trunc8
0.08 ? 18% +1.0 1.10 ? 3% perf-profile.self.cycles-pp.skip_atoi
0.09 ? 12% +1.0 1.13 ? 4% perf-profile.self.cycles-pp.__check_object_size
0.90 ? 8% +1.3 2.17 ? 4% perf-profile.self.cycles-pp.desc_read
0.16 ? 9% +1.3 1.44 ? 5% perf-profile.self.cycles-pp.record_print_text
0.11 ? 15% +1.3 1.44 ? 4% perf-profile.self.cycles-pp.__check_heap_object
0.13 ? 5% +1.4 1.50 ? 4% perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
0.14 ? 11% +1.4 1.56 ? 4% perf-profile.self.cycles-pp.desc_read_finalized_seq
0.14 ? 6% +1.6 1.71 ? 4% perf-profile.self.cycles-pp.sprintf
0.16 ? 13% +1.8 1.98 ? 5% perf-profile.self.cycles-pp.info_print_prefix
0.30 ? 7% +3.4 3.70 ? 4% perf-profile.self.cycles-pp._prb_read_valid
0.40 ? 8% +4.2 4.64 ? 4% perf-profile.self.cycles-pp.memchr
0.59 ? 8% +6.6 7.23 ? 4% perf-profile.self.cycles-pp.number
0.78 ? 3% +8.5 9.26 ? 4% perf-profile.self.cycles-pp.format_decode
0.79 ? 8% +8.9 9.71 ? 2% perf-profile.self.cycles-pp.memcpy_erms
0.84 ? 6% +10.3 11.11 ? 4% perf-profile.self.cycles-pp.vsnprintf



stress-ng.klog.ops

350000 +------------------------------------------------------------------+
|O OO OO OO OO OO OO OO OO OO OO OO OO OO OO OO |
300000 |-+ |
| |
250000 |-+ |
| |
200000 |-+ |
| |
150000 |-+ |
| |
100000 |-+ |
| |
50000 |-+ |
|+.++.++.++.++.++.++.++.++.++.++.++.++.++.++.++.++.++.++.++.++.++.+|
0 +------------------------------------------------------------------+


stress-ng.klog.ops_per_sec

5500 +--------------------------------------------------------------------+
5000 |-OO OO OO OO OO OO OO O OO OO OO OO OO OO OO O O |
| |
4500 |-+ |
4000 |-+ |
3500 |-+ |
3000 |-+ |
| |
2500 |-+ |
2000 |-+ |
1500 |-+ |
1000 |-+ |
| |
500 |.++.++.++.++.++.++.++.+.++.++.++.++.++.++.++.+.++.++.++.++.++.++.++.|
0 +--------------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (21.83 kB)
config-5.11.0-09233-g996e966640dd (174.97 kB)
job-script (8.11 kB)
job.yaml (5.64 kB)
reproduce (549.00 B)
Download all attachments