2022-02-22 09:21:22

by kernel test robot

Subject: [random] f73c522c4c: stress-ng.getrandom.ops_per_sec 8450.8% improvement



Greetings,

FYI, we noticed an 8450.8% improvement of stress-ng.getrandom.ops_per_sec due to commit:


commit: f73c522c4c2094d1c434083ae362bbd4a2ed7348 ("random: use simpler fast key erasure flow on per-cpu keys")
url: https://github.com/0day-ci/linux/commits/Yusuf-Khan/pga-dfl-pci-Make-sure-DMA-related-error-check-is-not-done-twice/20220222-123031

in testcase: stress-ng
on test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 112G memory
with following parameters:

nr_threads: 100%
testtime: 60s
class: cpu
test: getrandom
cpufreq_governor: performance
ucode: 0x42e
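
As a rough illustration of what the getrandom test measures, the following minimal sketch calls getrandom(2) in a tight loop from a single process and reports operations per second. It is not the stress-ng implementation: the function name `measure_getrandom`, the 8192-byte request size, and the one-second duration are assumptions made for this example, and the real job runs one worker per hardware thread (nr_threads: 100%) for 60 seconds.

```python
import os
import time


def measure_getrandom(duration_s=1.0, nbytes=8192):
    """Call getrandom(2) repeatedly for duration_s seconds.

    Returns (ops, ops_per_sec). nbytes is an assumed request size,
    not a value taken from the report.
    """
    ops = 0
    start = time.perf_counter()
    deadline = start + duration_s
    while time.perf_counter() < deadline:
        # os.getrandom() wraps the Linux getrandom(2) system call and
        # may return fewer bytes than requested; only calls are counted.
        os.getrandom(nbytes)
        ops += 1
    elapsed = time.perf_counter() - start
    return ops, ops / elapsed


if __name__ == "__main__":
    ops, rate = measure_getrandom()
    print(f"{ops} ops, {rate:.0f} ops/sec")
```

Running several copies of this loop in parallel on kernels before and after the commit should show the same kind of contrast as the report, although absolute numbers will differ from stress-ng.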






Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# If you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.

=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
cpu/gcc-9/performance/x86_64-rhel-8.3/100%/debian-10.4-x86_64-20200603.cgz/lkp-ivb-2ep1/getrandom/stress-ng/60s/0x42e

commit:
a086a3a1cb ("random: absorb fast pool into input pool after fast load")
f73c522c4c ("random: use simpler fast key erasure flow on per-cpu keys")

a086a3a1cbfe32bb f73c522c4c2094d1c434083ae36
---------------- ---------------------------
%stddev %change %stddev
\ | \
760691 +8450.6% 65043882 stress-ng.getrandom.ops
12678 +8450.8% 1084065 stress-ng.getrandom.ops_per_sec
29172 +19.8% 34936 ± 3% stress-ng.time.involuntary_context_switches
2852 -1.3% 2817 stress-ng.time.system_time
0.43 ± 6% +6609.4% 28.63 stress-ng.time.user_time
100146 ± 3% +44.5% 144715 ± 3% softirqs.RCU
1635 +10.9% 1813 ± 2% vmstat.system.cs
0.76 ± 2% +0.2 0.99 mpstat.cpu.all.irq%
0.27 ± 4% +0.9 1.21 mpstat.cpu.all.usr%
161.88 +9.4% 177.12 turbostat.CorWatt
190.93 +7.8% 205.84 turbostat.PkgWatt
21.27 ± 3% -10.3% 19.09 ± 3% turbostat.RAMWatt
1.18 ± 9% -80.9% 0.23 ± 14% perf-stat.i.MPKI
6.717e+09 +47.1% 9.884e+09 perf-stat.i.branch-instructions
10570014 ± 6% +17.7% 12440119 ± 5% perf-stat.i.branch-misses
34.23 -26.5 7.78 ± 3% perf-stat.i.cache-miss-rate%
9666129 -92.4% 731662 ± 6% perf-stat.i.cache-misses
28512914 -82.0% 5132520 ± 3% perf-stat.i.cache-references
1361 ± 2% +14.9% 1563 ± 2% perf-stat.i.context-switches
4.64 -84.7% 0.71 perf-stat.i.cpi
14249 +5410.5% 785198 ± 3% perf-stat.i.cycles-between-cache-misses
0.03 ± 9% -0.0 0.02 ± 2% perf-stat.i.dTLB-load-miss-rate%
1658772 ± 12% +170.3% 4483462 ± 3% perf-stat.i.dTLB-load-misses
7.003e+09 +367.8% 3.276e+10 perf-stat.i.dTLB-loads
0.10 ± 4% -0.1 0.02 ± 2% perf-stat.i.dTLB-store-miss-rate%
372166 ± 3% +654.1% 2806515 perf-stat.i.dTLB-store-misses
3.729e+08 +5647.3% 2.143e+10 perf-stat.i.dTLB-stores
92.41 +3.9 96.26 perf-stat.i.iTLB-load-miss-rate%
203482 ± 11% +688.4% 1604335 perf-stat.i.iTLB-load-misses
19298 ± 5% +226.7% 63041 ± 19% perf-stat.i.iTLB-loads
2.89e+10 +589.7% 1.993e+11 perf-stat.i.instructions
280121 ± 16% -55.5% 124599 perf-stat.i.instructions-per-iTLB-miss
0.23 +532.6% 1.43 perf-stat.i.ipc
927.95 -82.8% 159.69 perf-stat.i.metric.K/sec
293.60 +354.6% 1334 perf-stat.i.metric.M/sec
43.93 +1.3 45.22 perf-stat.i.node-load-miss-rate%
5038505 -97.8% 109393 ± 3% perf-stat.i.node-load-misses
6287060 -97.5% 156525 perf-stat.i.node-loads
39.30 -4.5 34.77 ± 2% perf-stat.i.node-store-miss-rate%
2061218 -94.0% 123404 ± 7% perf-stat.i.node-store-misses
3170200 -92.6% 236074 ± 5% perf-stat.i.node-stores
0.99 -97.4% 0.03 ± 3% perf-stat.overall.MPKI
0.16 ± 6% -0.0 0.13 ± 5% perf-stat.overall.branch-miss-rate%
33.90 -19.7 14.24 ± 3% perf-stat.overall.cache-miss-rate%
4.73 -85.5% 0.69 perf-stat.overall.cpi
14157 +1225.6% 187672 ± 6% perf-stat.overall.cycles-between-cache-misses
0.02 ± 12% -0.0 0.01 ± 3% perf-stat.overall.dTLB-load-miss-rate%
0.10 ± 4% -0.1 0.01 perf-stat.overall.dTLB-store-miss-rate%
91.26 +5.0 96.22 perf-stat.overall.iTLB-load-miss-rate%
0.21 +590.1% 1.46 perf-stat.overall.ipc
44.49 -3.4 41.11 ± 2% perf-stat.overall.node-load-miss-rate%
39.40 -5.1 34.33 ± 6% perf-stat.overall.node-store-miss-rate%
6.611e+09 +47.1% 9.727e+09 perf-stat.ps.branch-instructions
10407491 ± 6% +17.6% 12240249 ± 5% perf-stat.ps.branch-misses
9512419 -92.4% 719946 ± 6% perf-stat.ps.cache-misses
28059987 -82.0% 5052170 ± 3% perf-stat.ps.cache-references
1339 ± 2% +14.8% 1538 ± 2% perf-stat.ps.context-switches
1632412 ± 12% +170.3% 4412272 ± 3% perf-stat.ps.dTLB-load-misses
6.891e+09 +367.8% 3.224e+10 perf-stat.ps.dTLB-loads
366290 ± 3% +654.0% 2762000 perf-stat.ps.dTLB-store-misses
3.67e+08 +5646.9% 2.109e+10 perf-stat.ps.dTLB-stores
200266 ± 11% +688.4% 1578805 perf-stat.ps.iTLB-load-misses
18995 ± 5% +226.8% 62085 ± 19% perf-stat.ps.iTLB-loads
2.844e+10 +589.7% 1.962e+11 perf-stat.ps.instructions
4958355 -97.8% 107622 ± 3% perf-stat.ps.node-load-misses
6187037 -97.5% 154096 perf-stat.ps.node-loads
2028401 -94.0% 121256 ± 7% perf-stat.ps.node-store-misses
3119670 -92.6% 232039 ± 5% perf-stat.ps.node-stores
1.798e+12 +590.4% 1.242e+13 perf-stat.total.instructions
97.64 -97.6 0.00 perf-profile.calltrace.cycles-pp.extract_crng.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
96.12 -96.1 0.00 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.extract_crng.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe
95.92 -95.9 0.00 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.extract_crng.urandom_read_nowarn.do_syscall_64
99.86 -1.5 98.33 perf-profile.calltrace.cycles-pp.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
99.88 -0.8 99.04 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
99.88 -0.8 99.09 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getrandom
99.89 -0.1 99.80 perf-profile.calltrace.cycles-pp.getrandom
0.00 +0.6 0.55 ± 2% perf-profile.calltrace.cycles-pp.chacha_permute.chacha_block_generic.crng_fast_key_erasure.crng_make_state.get_random_bytes_user
0.00 +0.6 0.56 ± 3% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
0.00 +0.6 0.64 perf-profile.calltrace.cycles-pp.chacha_block_generic.crng_fast_key_erasure.crng_make_state.get_random_bytes_user.urandom_read_nowarn
0.00 +0.7 0.71 perf-profile.calltrace.cycles-pp.crng_fast_key_erasure.crng_make_state.get_random_bytes_user.urandom_read_nowarn.do_syscall_64
0.00 +0.8 0.76 perf-profile.calltrace.cycles-pp.crng_make_state.get_random_bytes_user.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +1.2 1.16 perf-profile.calltrace.cycles-pp.__might_sleep.__might_fault._copy_to_user.get_random_bytes_user.urandom_read_nowarn
0.00 +1.3 1.25 perf-profile.calltrace.cycles-pp.check_stack_object.__check_object_size.get_random_bytes_user.urandom_read_nowarn.do_syscall_64
0.00 +1.9 1.90 perf-profile.calltrace.cycles-pp.__might_resched.__might_fault._copy_to_user.get_random_bytes_user.urandom_read_nowarn
0.00 +3.1 3.06 perf-profile.calltrace.cycles-pp.__check_object_size.get_random_bytes_user.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +3.5 3.54 perf-profile.calltrace.cycles-pp.__might_fault._copy_to_user.get_random_bytes_user.urandom_read_nowarn.do_syscall_64
0.00 +6.3 6.27 perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string._copy_to_user.get_random_bytes_user.urandom_read_nowarn.do_syscall_64
0.00 +11.6 11.56 perf-profile.calltrace.cycles-pp._copy_to_user.get_random_bytes_user.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +71.5 71.52 perf-profile.calltrace.cycles-pp.chacha_permute.chacha_block_generic.get_random_bytes_user.urandom_read_nowarn.do_syscall_64
0.00 +80.3 80.33 perf-profile.calltrace.cycles-pp.chacha_block_generic.get_random_bytes_user.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +97.9 97.86 perf-profile.calltrace.cycles-pp.get_random_bytes_user.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
98.40 -98.4 0.00 perf-profile.children.cycles-pp.extract_crng
97.63 -97.6 0.00 perf-profile.children.cycles-pp._raw_spin_lock_irqsave
97.41 -97.4 0.00 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
99.86 -1.5 98.33 perf-profile.children.cycles-pp.urandom_read_nowarn
99.93 -0.8 99.09 perf-profile.children.cycles-pp.do_syscall_64
99.93 -0.8 99.14 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
99.90 -0.0 99.86 perf-profile.children.cycles-pp.getrandom
0.06 ± 11% +0.0 0.09 ± 9% perf-profile.children.cycles-pp.task_tick_fair
0.08 ± 5% +0.0 0.12 ± 6% perf-profile.children.cycles-pp.scheduler_tick
0.00 +0.1 0.05 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.11 ± 3% +0.1 0.17 ± 7% perf-profile.children.cycles-pp.tick_sched_handle
0.00 +0.1 0.06 perf-profile.children.cycles-pp.copy_user_generic_unrolled
0.10 ± 4% +0.1 0.17 ± 8% perf-profile.children.cycles-pp.update_process_times
0.12 ± 4% +0.1 0.18 ± 7% perf-profile.children.cycles-pp.tick_sched_timer
0.00 +0.1 0.07 ± 5% perf-profile.children.cycles-pp.__x64_sys_getrandom
0.16 ± 4% +0.1 0.23 ± 5% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.31 ± 3% +0.1 0.39 ± 4% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.23 ± 3% +0.1 0.32 ± 4% perf-profile.children.cycles-pp.hrtimer_interrupt
0.23 ± 3% +0.1 0.33 ± 4% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.26 ± 4% +0.1 0.36 ± 4% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.00 +0.3 0.35 perf-profile.children.cycles-pp.__entry_text_start
0.00 +0.4 0.36 perf-profile.children.cycles-pp.syscall_return_via_sysret
0.00 +0.6 0.57 ± 3% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.00 +0.7 0.71 perf-profile.children.cycles-pp.crng_fast_key_erasure
0.00 +0.8 0.76 perf-profile.children.cycles-pp.crng_make_state
0.00 +1.2 1.21 perf-profile.children.cycles-pp.__might_sleep
0.00 +1.5 1.55 perf-profile.children.cycles-pp.check_stack_object
0.18 ± 4% +1.8 1.96 perf-profile.children.cycles-pp.__might_resched
0.00 +3.4 3.45 perf-profile.children.cycles-pp.__check_object_size
0.21 ± 3% +3.7 3.92 perf-profile.children.cycles-pp.__might_fault
0.05 ± 8% +6.4 6.46 perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
0.28 ± 2% +11.7 12.00 perf-profile.children.cycles-pp._copy_to_user
0.82 ± 2% +71.4 72.20 perf-profile.children.cycles-pp.chacha_permute
1.37 ± 2% +79.8 81.17 perf-profile.children.cycles-pp.chacha_block_generic
0.00 +98.3 98.32 perf-profile.children.cycles-pp.get_random_bytes_user
97.41 -97.4 0.00 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.00 +0.1 0.06 ± 9% perf-profile.self.cycles-pp.copy_user_generic_unrolled
0.00 +0.1 0.06 perf-profile.self.cycles-pp.__x64_sys_getrandom
0.00 +0.1 0.06 ± 7% perf-profile.self.cycles-pp.crng_fast_key_erasure
0.00 +0.1 0.07 ± 11% perf-profile.self.cycles-pp.do_syscall_64
0.00 +0.1 0.08 ± 6% perf-profile.self.cycles-pp.getrandom
0.00 +0.3 0.30 perf-profile.self.cycles-pp.__entry_text_start
0.00 +0.4 0.36 perf-profile.self.cycles-pp.syscall_return_via_sysret
0.00 +0.5 0.53 ± 3% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.00 +0.9 0.88 perf-profile.self.cycles-pp.__might_fault
0.00 +1.0 1.02 ± 2% perf-profile.self.cycles-pp.__might_sleep
0.00 +1.3 1.28 perf-profile.self.cycles-pp.check_stack_object
0.18 ± 4% +1.7 1.91 perf-profile.self.cycles-pp.__might_resched
0.00 +1.8 1.75 perf-profile.self.cycles-pp._copy_to_user
0.00 +2.0 2.02 perf-profile.self.cycles-pp.__check_object_size
0.00 +2.0 2.04 perf-profile.self.cycles-pp.get_random_bytes_user
0.05 +6.1 6.11 perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
0.55 ± 2% +8.4 8.96 perf-profile.self.cycles-pp.chacha_block_generic
0.82 ± 3% +70.9 71.76 perf-profile.self.cycles-pp.chacha_permute
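
For readers checking the table above, the %change column can be recomputed from the two value columns. The sketch below assumes that rows holding counts or rates report a relative change in percent, and that rows whose metric is itself a percentage (names ending in "%", or perf-profile cycles-pp) report an absolute difference in percentage points. The helper names `pct_change` and `point_change` are made up for this example and are not part of lkp-tests.

```python
def pct_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def point_change(base_pct, new_pct):
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - base_pct


if __name__ == "__main__":
    # Values copied from the comparison table (parent commit vs. f73c522c4c).
    print(f"ops:         {pct_change(760691, 65043882):+.1f}%")
    print(f"ops_per_sec: {pct_change(12678, 1084065):+.1f}%")
    print(f"usr%:        {point_change(0.27, 1.21):+.1f}")
```

The table entries are averages over several runs, so values recomputed from the rounded columns can differ from the reported change in the last digit.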




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (16.07 kB)
config-5.17.0-rc4-00015-gf73c522c4c20 (177.41 kB)
job-script (7.95 kB)
job.yaml (5.44 kB)
reproduce (352.00 B)