Greetings,
FYI, we noticed a 47.7% improvement of stress-ng.getrandom.ops_per_sec due to commit:
commit: 2ee25b6968b1b3c66ffa408de23d023c1bce81cf ("random: avoid superfluous call to RDRAND in CRNG extraction")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: stress-ng
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 512G memory
with the following parameters:
nr_threads: 100%
testtime: 60s
class: cpu
test: getrandom
cpufreq_governor: performance
ucode: 0x5003102
In addition to that, the commit also has a significant impact on the following tests:
+------------------+---------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.getrandom.ops_per_sec 33.0% improvement |
| test machine | 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 112G memory |
| test parameters | class=cpu |
| | cpufreq_governor=performance |
| | nr_threads=100% |
| | test=getrandom |
| | testtime=60s |
| | ucode=0x42e |
+------------------+---------------------------------------------------------------------------------+
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached to this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if you come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.
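For a quick sanity check without the full lkp setup, the workload can be roughly approximated by calling getrandom(2) in a tight loop and counting completed calls. The sketch below is only an illustration, not the stress-ng implementation: the function name `getrandom_ops_per_sec`, the 256-byte request size and the single-threaded loop are assumptions, whereas the reported runs use one stressor per hardware thread.

```python
import os
import time

def getrandom_ops_per_sec(duration_s=0.2, nbytes=256):
    """Count random-byte requests completed per second on one thread.

    Hypothetical helper for illustration; the request size is an assumption.
    """
    # os.getrandom wraps the Linux getrandom(2) syscall; fall back to
    # os.urandom on platforms where it is not available.
    read = os.getrandom if hasattr(os, "getrandom") else os.urandom
    ops = 0
    start = time.perf_counter()
    deadline = start + duration_s
    while time.perf_counter() < deadline:
        read(nbytes)  # result is discarded, only the call count matters
        ops += 1
    elapsed = time.perf_counter() - start
    return ops / elapsed

if __name__ == "__main__":
    print(f"{getrandom_ops_per_sec(1.0):.0f} getrandom ops/sec (single thread)")
```

Numbers from this loop are not comparable with stress-ng.getrandom.ops_per_sec; it only helps to see whether the two kernels differ in the same direction.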
=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
cpu/gcc-9/performance/x86_64-rhel-8.3/100%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp7/getrandom/stress-ng/60s/0x5003102
commit:
96562f2868 ("random: early initialization of ChaCha constants")
2ee25b6968 ("random: avoid superfluous call to RDRAND in CRNG extraction")
96562f286884e2db 2ee25b6968b1b3c66ffa408de23
---------------- ---------------------------
%stddev %change %stddev
\ | \
1582098 ± 2% +47.7% 2337043 stress-ng.getrandom.ops
26367 ± 2% +47.7% 38950 stress-ng.getrandom.ops_per_sec
36932 +2.3% 37791 stress-ng.time.involuntary_context_switches
3.365e+08 ± 3% +24.5% 4.191e+08 ± 7% cpuidle..time
402.02 -1.8% 394.71 pmeter.Average_Active_Power
59.00 -4.0% 56.67 turbostat.PkgTmp
211108 -1.0% 209101 vmstat.system.in
4.66 +1.9 6.53 ± 10% mpstat.cpu.all.idle%
0.50 +0.0 0.54 ± 2% mpstat.cpu.all.irq%
0.14 ± 3% -0.0 0.12 ± 14% mpstat.cpu.all.usr%
191308 ± 11% -55.2% 85772 ± 3% numa-numastat.node0.local_node
249395 ± 2% -30.7% 172877 numa-numastat.node0.numa_hit
58087 ± 39% +50.0% 87104 numa-numastat.node0.other_node
173851 ± 12% +58.9% 276208 numa-numastat.node1.local_node
202838 ± 3% +36.2% 276198 numa-numastat.node1.numa_hit
7408 +5.8% 7840 proc-vmstat.nr_active_anon
2668 ± 3% -5.2% 2528 proc-vmstat.nr_page_table_pages
11807 +3.7% 12242 proc-vmstat.nr_shmem
7408 +5.8% 7840 proc-vmstat.nr_zone_active_anon
21247 +3.5% 21981 proc-vmstat.pgactivate
75695 ± 6% -26.3% 55749 ± 22% numa-meminfo.node0.KReclaimable
75695 ± 6% -26.3% 55749 ± 22% numa-meminfo.node0.SReclaimable
182572 ± 5% -14.6% 155921 numa-meminfo.node0.Slab
26308 ± 17% +74.9% 46018 ± 25% numa-meminfo.node1.KReclaimable
1145005 ± 7% +67.8% 1921879 ± 45% numa-meminfo.node1.MemUsed
26308 ± 17% +74.9% 46018 ± 25% numa-meminfo.node1.SReclaimable
111636 ± 7% +24.4% 138864 numa-meminfo.node1.Slab
18923 ± 6% -26.4% 13936 ± 22% numa-vmstat.node0.nr_slab_reclaimable
3899 ± 4% +8.3% 4221 ± 5% numa-vmstat.node0.numa_interleave
1525275 ± 7% -14.6% 1302321 ± 12% numa-vmstat.node0.numa_local
59275 ± 41% +53.7% 91128 numa-vmstat.node0.numa_other
6576 ± 17% +74.9% 11500 ± 25% numa-vmstat.node1.nr_slab_reclaimable
4170 ± 4% -7.4% 3860 ± 6% numa-vmstat.node1.numa_interleave
722855 ± 17% +30.3% 941837 ± 17% numa-vmstat.node1.numa_local
67164 ± 35% -47.5% 35243 numa-vmstat.node1.numa_other
98.00 -0.1 97.91 perf-profile.calltrace.cycles-pp._extract_crng.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
0.69 ± 7% +0.7 1.36 perf-profile.calltrace.cycles-pp.chacha_block_generic._extract_crng.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.17 ±141% +1.0 1.12 perf-profile.calltrace.cycles-pp.chacha_permute.chacha_block_generic._extract_crng.urandom_read_nowarn.do_syscall_64
98.76 -0.1 98.69 perf-profile.children.cycles-pp._extract_crng
0.08 ± 6% +0.0 0.11 ± 4% perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
0.00 +0.1 0.05 perf-profile.children.cycles-pp.__might_fault
0.12 ± 6% +0.1 0.18 perf-profile.children.cycles-pp._copy_to_user
0.48 ± 6% +0.7 1.14 perf-profile.children.cycles-pp.chacha_permute
0.69 ± 7% +0.7 1.37 perf-profile.children.cycles-pp.chacha_block_generic
0.89 ± 10% -0.8 0.10 ± 8% perf-profile.self.cycles-pp._extract_crng
0.08 ± 6% +0.0 0.11 perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
0.26 ± 5% +0.1 0.40 ± 2% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.48 ± 6% +0.7 1.14 perf-profile.self.cycles-pp.chacha_permute
4.65 ± 4% +24.6% 5.79 ± 2% perf-stat.i.MPKI
5.33e+09 +1.4% 5.402e+09 perf-stat.i.branch-instructions
0.95 ± 9% -0.2 0.70 ± 10% perf-stat.i.cache-miss-rate%
1.189e+08 ± 3% +35.5% 1.611e+08 ± 3% perf-stat.i.cache-references
9.83 -8.1% 9.03 perf-stat.i.cpi
557463 -3.3% 539225 perf-stat.i.cycles-between-cache-misses
5.913e+09 +5.5% 6.241e+09 perf-stat.i.dTLB-loads
6.913e+08 +37.7% 9.518e+08 perf-stat.i.dTLB-stores
2.548e+10 +8.4% 2.762e+10 perf-stat.i.instructions
0.13 +6.5% 0.13 perf-stat.i.ipc
12.24 -1.7% 12.03 perf-stat.i.major-faults
125.53 +5.8% 132.84 perf-stat.i.metric.M/sec
80.14 +2.6 82.73 perf-stat.i.node-load-miss-rate%
55782 -10.1% 50128 perf-stat.i.node-loads
57.92 ± 6% +23.3 81.21 perf-stat.i.node-store-miss-rate%
49315 ± 6% +19.2% 58807 ± 3% perf-stat.i.node-store-misses
39223 ± 4% -44.3% 21849 ± 2% perf-stat.i.node-stores
4.67 ± 4% +24.9% 5.83 ± 3% perf-stat.overall.MPKI
0.57 -0.2 0.42 ± 5% perf-stat.overall.cache-miss-rate%
10.05 -8.0% 9.24 perf-stat.overall.cpi
0.00 ± 2% -0.0 0.00 ± 13% perf-stat.overall.dTLB-store-miss-rate%
0.10 +8.8% 0.11 perf-stat.overall.ipc
70.08 +2.0 72.05 perf-stat.overall.node-load-miss-rate%
55.46 ± 5% +17.3 72.72 perf-stat.overall.node-store-miss-rate%
5.246e+09 +1.4% 5.318e+09 perf-stat.ps.branch-instructions
1.171e+08 ± 3% +35.4% 1.586e+08 ± 3% perf-stat.ps.cache-references
5.821e+09 +5.5% 6.143e+09 perf-stat.ps.dTLB-loads
6.808e+08 +37.7% 9.371e+08 perf-stat.ps.dTLB-stores
2.508e+10 +8.4% 2.719e+10 perf-stat.ps.instructions
57596 -11.8% 50795 perf-stat.ps.node-loads
48478 ± 7% +19.4% 57879 ± 3% perf-stat.ps.node-store-misses
38847 ± 4% -44.1% 21704 ± 2% perf-stat.ps.node-stores
1.581e+12 +9.6% 1.733e+12 perf-stat.total.instructions
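As a reading aid for the table above: the %change column is the relative difference between the base commit (left value) and the tested commit (right value). The sketch below recomputes it for the two headline rows; the helper name `percent_change` is hypothetical and the input values are copied from the table.

```python
def percent_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0

# (base commit 96562f2868, tested commit 2ee25b6968), values from the table above
rows = {
    "stress-ng.getrandom.ops": (1582098, 2337043),
    "stress-ng.getrandom.ops_per_sec": (26367, 38950),
}

for name, (base, new) in rows.items():
    print(f"{percent_change(base, new):+.1f}%  {name}")
```

Rows whose change is printed without a percent sign (for example mpstat.cpu.all.idle%) are already percentages, so their change is the plain difference in percentage points instead.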
***************************************************************************************************
lkp-ivb-2ep1: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 112G memory
=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
cpu/gcc-9/performance/x86_64-rhel-8.3/100%/debian-10.4-x86_64-20200603.cgz/lkp-ivb-2ep1/getrandom/stress-ng/60s/0x42e
commit:
96562f2868 ("random: early initialization of ChaCha constants")
2ee25b6968 ("random: avoid superfluous call to RDRAND in CRNG extraction")
96562f286884e2db 2ee25b6968b1b3c66ffa408de23
---------------- ---------------------------
%stddev %change %stddev
\ | \
1749360 +33.0% 2327260 stress-ng.getrandom.ops
29155 +33.0% 38787 stress-ng.getrandom.ops_per_sec
79229 ± 4% +6.8% 84630 ± 3% meminfo.AnonHugePages
9206 ± 3% -5.5% 8699 ± 2% proc-vmstat.pgactivate
112056 ± 3% +8.1% 121114 ± 3% softirqs.SCHED
35.12 +2.0% 35.82 boot-time.boot
1458 +2.1% 1489 boot-time.idle
1.81e+08 ± 12% +27.3% 2.304e+08 ± 13% cpuidle..time
401544 ± 14% +29.3% 519074 ± 12% cpuidle..usage
129917 ± 4% -7.8% 119774 ± 5% numa-numastat.node0.local_node
126626 ± 3% +8.4% 137254 ± 5% numa-numastat.node1.local_node
1632 ± 2% +3.9% 1695 vmstat.system.cs
104682 -1.0% 103645 vmstat.system.in
4.99 ± 13% +1.8 6.82 ± 9% mpstat.cpu.all.idle%
0.85 ± 3% +0.1 0.94 ± 4% mpstat.cpu.all.irq%
0.01 ± 7% +0.0 0.02 ± 8% mpstat.cpu.all.soft%
11520 ± 96% -86.7% 1534 ± 87% numa-meminfo.node1.AnonHugePages
52249 ± 86% -72.2% 14547 ± 25% numa-meminfo.node1.AnonPages
71509 ± 64% -53.2% 33434 ± 13% numa-meminfo.node1.AnonPages.max
2428 ± 2% +12.8% 2740 ± 7% numa-vmstat.node0.numa_interleave
13085 ± 86% -72.3% 3628 ± 25% numa-vmstat.node1.nr_anon_pages
2831 ± 2% -11.4% 2509 ± 8% numa-vmstat.node1.numa_interleave
137674 ± 70% +87.2% 257735 ± 13% turbostat.C1E
200554 ± 13% +22.4% 245492 ± 13% turbostat.C6
4.55 ± 7% +1.4 6.00 ± 8% turbostat.C6%
2.49 ± 22% +40.9% 3.51 ± 6% turbostat.CPU%c6
1.05 ± 3% +61.6% 1.70 ± 24% turbostat.Pkg%pc2
0.03 ± 77% +475.0% 0.15 ± 54% turbostat.Pkg%pc6
71.00 -3.8% 68.33 turbostat.PkgTmp
2.18 ± 6% +40.0% 3.05 ± 15% perf-stat.i.MPKI
0.34 ± 10% +0.2 0.51 ± 15% perf-stat.i.branch-miss-rate%
62441682 ± 3% +20.3% 75094122 ± 2% perf-stat.i.cache-references
1387 +3.2% 1432 ± 2% perf-stat.i.context-switches
4.27 -4.0% 4.10 perf-stat.i.cpi
1.366e+11 -1.2% 1.349e+11 perf-stat.i.cpu-cycles
7.358e+09 +2.4% 7.535e+09 perf-stat.i.dTLB-loads
1293946 ± 16% +32.9% 1719408 ± 8% perf-stat.i.dTLB-store-misses
7.417e+08 +27.5% 9.46e+08 perf-stat.i.dTLB-stores
278891 +12.7% 314323 ± 4% perf-stat.i.iTLB-load-misses
3.144e+10 +4.1% 3.274e+10 perf-stat.i.instructions
184251 -14.1% 158318 ± 2% perf-stat.i.instructions-per-iTLB-miss
10.79 -2.7% 10.50 perf-stat.i.major-faults
2.85 -1.2% 2.81 perf-stat.i.metric.GHz
63.24 ± 6% +33.4% 84.34 ± 7% perf-stat.i.metric.K/sec
310.37 +2.5% 318.16 perf-stat.i.metric.M/sec
1.99 ± 3% +15.5% 2.29 ± 2% perf-stat.overall.MPKI
4.34 -5.1% 4.12 perf-stat.overall.cpi
112722 -7.4% 104351 ± 5% perf-stat.overall.instructions-per-iTLB-miss
0.23 +5.4% 0.24 perf-stat.overall.ipc
61452519 ± 3% +20.3% 73919165 ± 2% perf-stat.ps.cache-references
1365 +3.3% 1410 ± 2% perf-stat.ps.context-switches
1.344e+11 -1.2% 1.328e+11 perf-stat.ps.cpu-cycles
7.241e+09 +2.4% 7.416e+09 perf-stat.ps.dTLB-loads
1273486 ± 16% +32.9% 1692449 ± 8% perf-stat.ps.dTLB-store-misses
7.299e+08 +27.6% 9.312e+08 perf-stat.ps.dTLB-stores
274494 +12.8% 309549 ± 4% perf-stat.ps.iTLB-load-misses
3.094e+10 +4.1% 3.223e+10 perf-stat.ps.instructions
10.63 -2.3% 10.38 perf-stat.ps.major-faults
1.966e+12 +5.8% 2.08e+12 perf-stat.total.instructions
97.38 -0.3 97.12 perf-profile.calltrace.cycles-pp._extract_crng.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
94.02 -0.1 93.93 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave._extract_crng.urandom_read_nowarn.do_syscall_64
99.82 -0.0 99.80 perf-profile.calltrace.cycles-pp.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
99.86 -0.0 99.84 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getrandom
99.86 -0.0 99.84 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getrandom
1.31 +0.7 2.03 ± 6% perf-profile.calltrace.cycles-pp.chacha_permute.chacha_block_generic._extract_crng.urandom_read_nowarn.do_syscall_64
1.74 +0.9 2.60 perf-profile.calltrace.cycles-pp.chacha_block_generic._extract_crng.urandom_read_nowarn.do_syscall_64.entry_SYSCALL_64_after_hwframe
98.15 -0.3 97.89 perf-profile.children.cycles-pp._extract_crng
95.50 -0.1 95.42 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
99.84 -0.0 99.82 perf-profile.children.cycles-pp.urandom_read_nowarn
99.92 -0.0 99.90 perf-profile.children.cycles-pp.do_syscall_64
99.92 -0.0 99.90 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.12 +0.0 0.13 perf-profile.children.cycles-pp.tick_sched_handle
0.12 +0.0 0.13 perf-profile.children.cycles-pp.update_process_times
0.13 +0.0 0.14 ± 3% perf-profile.children.cycles-pp.tick_sched_timer
0.12 ± 6% +0.0 0.14 ± 5% perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
0.31 ± 2% +0.0 0.33 ± 2% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.09 ± 9% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.__check_object_size
0.40 ± 2% +0.0 0.43 perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.00 +0.1 0.05 perf-profile.children.cycles-pp.check_stack_object
0.12 ± 13% +0.1 0.23 ± 2% perf-profile.children.cycles-pp.__might_resched
0.18 ± 9% +0.1 0.31 perf-profile.children.cycles-pp.__might_fault
0.35 ± 5% +0.2 0.50 perf-profile.children.cycles-pp._copy_to_user
1.32 +0.7 2.05 ± 6% perf-profile.children.cycles-pp.chacha_permute
1.76 +0.9 2.63 perf-profile.children.cycles-pp.chacha_block_generic
1.25 ± 4% -1.1 0.12 ± 4% perf-profile.self.cycles-pp._extract_crng
95.50 -0.1 95.42 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.06 ± 7% +0.0 0.08 ± 6% perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
0.05 ± 8% +0.0 0.07 perf-profile.self.cycles-pp.urandom_read_nowarn
0.11 ± 4% +0.0 0.13 ± 3% perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
0.05 +0.0 0.07 ± 12% perf-profile.self.cycles-pp.__check_object_size
0.00 +0.1 0.05 ± 8% perf-profile.self.cycles-pp.__might_fault
0.12 ± 13% +0.1 0.22 ± 2% perf-profile.self.cycles-pp.__might_resched
0.39 ± 3% +0.1 0.50 ± 6% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.44 +0.1 0.58 ± 18% perf-profile.self.cycles-pp.chacha_block_generic
1.32 +0.7 2.05 ± 6% perf-profile.self.cycles-pp.chacha_permute
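When cross-checking the perf-stat rows, note that IPC (instructions per cycle) is the reciprocal of CPI (cycles per instruction), so perf-stat.overall.ipc should follow from perf-stat.overall.cpi. The sketch below checks this for both machines using the values reported above; the helper name `ipc_from_cpi` is hypothetical.

```python
def ipc_from_cpi(cpi):
    """Instructions per cycle is the reciprocal of cycles per instruction."""
    return 1.0 / cpi

# perf-stat.overall.cpi values (base commit, tested commit) from the tables above
overall_cpi = {
    "lkp-csl-2sp7": (10.05, 9.24),
    "lkp-ivb-2ep1": (4.34, 4.12),
}

for machine, (base_cpi, new_cpi) in overall_cpi.items():
    base_ipc = ipc_from_cpi(base_cpi)
    new_ipc = ipc_from_cpi(new_cpi)
    print(f"{machine}: ipc {base_ipc:.2f} -> {new_ipc:.2f}")
```

The rounded results agree with the reported perf-stat.overall.ipc rows (0.10 to 0.11 and 0.23 to 0.24). The reported %change for ipc is presumably computed from unrounded values, so it cannot be reproduced exactly from the two-decimal figures.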
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation
Thanks,
Oliver Sang