2022-10-31 08:51:21

by kernel test robot

Subject: [linus:master] [rcu] d96c52fe49: fio.write_iops 5.8% improvement

Greetings,

FYI, we noticed a 5.8% improvement of fio.write_iops due to commit:

commit: d96c52fe4907c68adc5e61a0bef7aec0933223d5 ("rcu: Add polled expedited grace-period primitives")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: fio-basic
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
with following parameters:

disk: 2pmem
fs: xfs
mount_option: dax
runtime: 200s
nr_task: 50%
time_based: tb
rw: write
bs: 4k
ioengine: mmap
test_size: 200G
cpufreq_governor: performance

test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio
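As a rough illustration of how the parameters above could map onto a fio job file, here is a minimal Python sketch. It is not the job that LKP generates (that one is in the attached job.yaml). The mount point, the job section name, reading 200G as GiB, and the even split of test_size across jobs are all assumptions made for this example, and only one of the two pmem mounts is shown.

```python
# Sketch only: builds a fio job file from the parameters listed in the report.
NR_CPUS = 96                      # "96 threads" from the test machine line
NR_TASK_PERCENT = 50              # nr_task: 50%
TEST_SIZE_BYTES = 200 * 1024**3   # test_size: 200G (assumed to mean GiB)


def build_fio_job(directory: str, rw: str = "write") -> str:
    """Return fio job-file text for one DAX-mounted XFS directory."""
    numjobs = NR_CPUS * NR_TASK_PERCENT // 100
    # Assumption: the total test size is divided evenly among the jobs.
    size_per_job = TEST_SIZE_BYTES // numjobs
    lines = [
        "[global]",
        "ioengine=mmap",          # ioengine: mmap
        f"rw={rw}",               # rw: write (or read for the second test)
        "bs=4k",                  # bs: 4k
        "runtime=200s",           # runtime: 200s
        "time_based",             # time_based: tb
        f"directory={directory}",
        f"numjobs={numjobs}",
        f"size={size_per_job}",
        "group_reporting",
        "",
        "[dax-mmap]",             # hypothetical job section name
    ]
    return "\n".join(lines) + "\n"


# "/mnt/pmem0" is a hypothetical mount point for one of the two pmem disks.
job = build_fio_job("/mnt/pmem0")
print(job)
```

Writing the returned text to a file and passing that file to fio would run the workload. The filesystem setup (XFS on pmem, mounted with -o dax) and the performance cpufreq governor are handled outside of fio.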

In addition, the commit also has a significant impact on the following test:

+------------------+-----------------------------------------------------------------------------------------------+
| testcase: change | fio-basic: fio.read_iops 7.7% improvement                                                     |
| test machine     | 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory |
| test parameters  | bs=4k                                                                                         |
|                  | cpufreq_governor=performance                                                                  |
|                  | disk=2pmem                                                                                    |
|                  | fs=xfs                                                                                        |
|                  | ioengine=mmap                                                                                 |
|                  | mount_option=dax                                                                              |
|                  | nr_task=50%                                                                                   |
|                  | runtime=200s                                                                                  |
|                  | rw=read                                                                                       |
|                  | test_size=200G                                                                                |
|                  | time_based=tb                                                                                 |
+------------------+-----------------------------------------------------------------------------------------------+


Details are as below:

=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/mount_option/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based:
4k/gcc-11/performance/2pmem/xfs/mmap/x86_64-rhel-8.3/dax/50%/debian-11.1-x86_64-20220510.cgz/200s/write/lkp-csl-2sp7/200G/fio-basic/tb

commit:
e4333cb20f ("rcutorture: Verify that polled GP API sees synchronous grace periods")
d96c52fe49 ("rcu: Add polled expedited grace-period primitives")

e4333cb20f047d96 d96c52fe4907c68adc5e61a0bef
---------------- ---------------------------
%stddev %change %stddev
\ | \
23.87 ± 3% +32.7 56.58 fio.latency_20us%
76.01 -32.6 43.37 ± 2% fio.latency_50us%
4.463e+08 +5.8% 4.722e+08 fio.time.minor_page_faults
521.13 +6.9% 557.11 fio.time.user_time
4.463e+08 +5.8% 4.722e+08 fio.workload
8717 +5.8% 9221 fio.write_bw_MBps
23210 -6.8% 21632 fio.write_clat_90%_us
24960 -5.6% 23552 fio.write_clat_95%_us
21149 -5.6% 19960 fio.write_clat_mean_us
2231570 +5.8% 2360791 fio.write_iops
2.70 +6.8% 2.89 iostat.cpu.user
0.03 +0.0 0.03 ± 6% mpstat.cpu.all.soft%
43.52 +1.0% 43.98 turbostat.RAMWatt
1925180 ± 2% +13.8% 2191273 ± 6% numa-meminfo.node1.MemUsed
84399 ± 4% +11.2% 93823 ± 10% numa-meminfo.node1.SUnreclaim
64201 ±138% +78.6% 114665 ± 74% numa-meminfo.node1.Unevictable
21100 ± 4% +11.2% 23455 ± 10% numa-vmstat.node1.nr_slab_unreclaimable
16050 ±138% +78.6% 28665 ± 74% numa-vmstat.node1.nr_unevictable
16050 ±138% +78.6% 28665 ± 74% numa-vmstat.node1.nr_zone_unevictable
431413 +5.8% 456351 proc-vmstat.nr_page_table_pages
2574662 +3.6% 2668198 proc-vmstat.numa_hit
2485812 +3.8% 2581127 proc-vmstat.numa_local
2572931 +3.7% 2668242 proc-vmstat.pgalloc_normal
4.47e+08 +5.8% 4.729e+08 proc-vmstat.pgfault
2460652 +4.1% 2560513 proc-vmstat.pgfree
872002 +5.8% 922500 proc-vmstat.thp_fault_fallback
7.577e+09 +4.0% 7.878e+09 perf-stat.i.branch-instructions
0.22 ± 2% +0.0 0.25 ± 3% perf-stat.i.branch-miss-rate%
16697023 ± 3% +19.7% 19981806 ± 3% perf-stat.i.branch-misses
1.862e+08 +5.3% 1.961e+08 perf-stat.i.cache-misses
2.653e+08 +5.1% 2.787e+08 perf-stat.i.cache-references
4.42 -3.6% 4.26 perf-stat.i.cpi
731.57 ± 3% -6.1% 686.63 perf-stat.i.cycles-between-cache-misses
7.891e+09 +4.1% 8.217e+09 perf-stat.i.dTLB-loads
0.47 +0.0 0.48 perf-stat.i.dTLB-store-miss-rate%
12378051 +7.7% 13336505 perf-stat.i.dTLB-store-misses
2.643e+09 +5.7% 2.793e+09 perf-stat.i.dTLB-stores
67.44 +3.5 70.99 perf-stat.i.iTLB-load-miss-rate%
12036801 ± 3% +18.8% 14298923 ± 2% perf-stat.i.iTLB-load-misses
3.046e+10 +3.9% 3.166e+10 perf-stat.i.instructions
2572 ± 4% -12.5% 2249 perf-stat.i.instructions-per-iTLB-miss
0.23 +3.9% 0.24 perf-stat.i.ipc
1171 +5.6% 1237 perf-stat.i.metric.K/sec
191.44 +4.3% 199.67 perf-stat.i.metric.M/sec
2217598 +6.0% 2349848 perf-stat.i.minor-faults
360845 ± 4% +5.7% 381564 ± 2% perf-stat.i.node-loads
44614403 ± 2% +7.1% 47802681 ± 2% perf-stat.i.node-store-misses
2217612 +6.0% 2349862 perf-stat.i.page-faults
0.22 ± 2% +0.0 0.25 ± 3% perf-stat.overall.branch-miss-rate%
4.41 -3.6% 4.25 perf-stat.overall.cpi
720.90 -4.9% 685.85 perf-stat.overall.cycles-between-cache-misses
0.47 +0.0 0.48 perf-stat.overall.dTLB-store-miss-rate%
67.43 +3.6 70.99 perf-stat.overall.iTLB-load-miss-rate%
2533 ± 3% -12.6% 2215 ± 2% perf-stat.overall.instructions-per-iTLB-miss
0.23 +3.7% 0.24 perf-stat.overall.ipc
13751 -1.9% 13485 perf-stat.overall.path-length
7.54e+09 +4.0% 7.839e+09 perf-stat.ps.branch-instructions
16623549 ± 3% +19.7% 19891545 ± 3% perf-stat.ps.branch-misses
1.853e+08 +5.3% 1.951e+08 perf-stat.ps.cache-misses
2.64e+08 +5.0% 2.773e+08 perf-stat.ps.cache-references
7.852e+09 +4.1% 8.177e+09 perf-stat.ps.dTLB-loads
12317699 +7.7% 13270764 perf-stat.ps.dTLB-store-misses
2.631e+09 +5.7% 2.779e+09 perf-stat.ps.dTLB-stores
11979927 ± 3% +18.8% 14229710 ± 2% perf-stat.ps.iTLB-load-misses
3.031e+10 +3.9% 3.15e+10 perf-stat.ps.instructions
2206821 +6.0% 2338290 perf-stat.ps.minor-faults
359302 ± 4% +5.7% 379788 ± 2% perf-stat.ps.node-loads
44388499 ± 2% +7.1% 47560032 ± 2% perf-stat.ps.node-store-misses
2206835 +6.0% 2338303 perf-stat.ps.page-faults
6.138e+12 +3.7% 6.367e+12 perf-stat.total.instructions
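The %change column mixes two kinds of figures. Rows with a percent sign after the number (for example fio.write_iops) line up with a relative change against the base commit, while rows whose metric is itself a percentage (for example fio.latency_20us%) line up with a plain difference in percentage points. The following Python sketch recomputes a few rows from the table above under that reading; the function names are made up for this example and are not part of lkp-tests.

```python
def relative_change(base: float, new: float) -> float:
    """Relative change in percent, as in rows such as fio.write_iops."""
    return (new - base) / base * 100.0


def point_change(base: float, new: float) -> float:
    """Plain difference, as in rows whose metric is already a percentage."""
    return new - base


# Values copied from the write table above (base commit, tested commit).
iops = relative_change(2231570, 2360791)   # fio.write_iops
cpi = relative_change(4.41, 4.25)          # perf-stat.overall.cpi
lat20 = point_change(23.87, 56.58)         # fio.latency_20us%
lat50 = point_change(76.01, 43.37)         # fio.latency_50us%

print(f"fio.write_iops          {iops:+.1f}%")
print(f"perf-stat.overall.cpi   {cpi:+.1f}%")
print(f"fio.latency_20us%       {lat20:+.1f}")
print(f"fio.latency_50us%       {lat50:+.1f}")
```

Read this way, the two latency rows say that roughly a third of all writes moved from the 50us bucket into the 20us bucket, which is consistent with the drop in fio.write_clat_mean_us.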


***************************************************************************************************
lkp-csl-2sp7: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/mount_option/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based:
4k/gcc-11/performance/2pmem/xfs/mmap/x86_64-rhel-8.3/dax/50%/debian-11.1-x86_64-20220510.cgz/200s/read/lkp-csl-2sp7/200G/fio-basic/tb

commit:
e4333cb20f ("rcutorture: Verify that polled GP API sees synchronous grace periods")
d96c52fe49 ("rcu: Add polled expedited grace-period primitives")

e4333cb20f047d96 d96c52fe4907c68adc5e61a0bef
---------------- ---------------------------
%stddev %change %stddev
\ | \
18.06 ± 4% +41.5 59.58 ± 3% fio.latency_20us%
81.86 -41.5 40.40 ± 5% fio.latency_50us%
8656 +7.7% 9320 fio.read_bw_MBps
23466 -8.4% 21504 fio.read_clat_90%_us
24789 -8.8% 22613 fio.read_clat_95%_us
28501 -5.7% 26880 fio.read_clat_99%_us
21373 -7.2% 19825 fio.read_clat_mean_us
2216141 +7.7% 2386084 fio.read_iops
4.432e+08 +7.7% 4.772e+08 fio.time.minor_page_faults
452.30 +7.5% 486.28 fio.time.user_time
4.432e+08 +7.7% 4.772e+08 fio.workload
2.36 +7.3% 2.53 iostat.cpu.user
14396416 ± 10% -16.9% 11959978 ± 8% meminfo.DirectMap2M
0.65 ± 2% -0.1 0.51 ± 44% perf-profile.calltrace.cycles-pp.insert_pfn.__vm_insert_mixed.dax_fault_iter.dax_iomap_pte_fault.__xfs_filemap_fault
0.63 ± 3% -0.1 0.50 ± 44% perf-profile.calltrace.cycles-pp.__get_locked_pte.insert_pfn.__vm_insert_mixed.dax_fault_iter.dax_iomap_pte_fault
0.60 ± 2% -0.1 0.46 ± 44% perf-profile.calltrace.cycles-pp._raw_spin_lock.__get_locked_pte.insert_pfn.__vm_insert_mixed.dax_fault_iter
0.65 ± 2% -0.1 0.59 ± 9% perf-profile.children.cycles-pp.insert_pfn
0.63 ± 3% -0.1 0.57 ± 9% perf-profile.children.cycles-pp.__get_locked_pte
1.14 ± 2% -0.1 0.99 ± 9% perf-profile.self.cycles-pp._raw_spin_lock
429439 +7.2% 460393 proc-vmstat.nr_page_table_pages
2550960 +5.6% 2693959 proc-vmstat.numa_hit
2464007 +5.8% 2606912 proc-vmstat.numa_local
2551059 +5.6% 2694047 proc-vmstat.pgalloc_normal
4.439e+08 +7.7% 4.779e+08 proc-vmstat.pgfault
2447827 +5.4% 2579922 proc-vmstat.pgfree
865972 +7.7% 932383 proc-vmstat.thp_fault_fallback
7.80 +3.9% 8.11 ± 3% perf-stat.i.MPKI
7.99e+09 +4.9% 8.382e+09 perf-stat.i.branch-instructions
0.25 ± 5% +0.0 0.29 ± 9% perf-stat.i.branch-miss-rate%
20400526 ± 5% +15.4% 23535017 ± 2% perf-stat.i.branch-misses
1.795e+08 +7.1% 1.922e+08 perf-stat.i.cache-misses
2.486e+08 +7.2% 2.664e+08 perf-stat.i.cache-references
4.23 -4.6% 4.04 perf-stat.i.cpi
749.57 -6.6% 700.00 perf-stat.i.cycles-between-cache-misses
9606077 +6.2% 10206063 perf-stat.i.dTLB-load-misses
8.018e+09 +4.7% 8.396e+09 perf-stat.i.dTLB-loads
2231154 +7.5% 2399136 perf-stat.i.dTLB-store-misses
2.668e+09 +7.3% 2.862e+09 perf-stat.i.dTLB-stores
73.29 ± 6% +4.4 77.67 perf-stat.i.iTLB-load-miss-rate%
14219792 ± 8% +19.2% 16946973 perf-stat.i.iTLB-load-misses
3.183e+10 +4.8% 3.337e+10 perf-stat.i.instructions
2312 ± 8% -13.1% 2009 ± 2% perf-stat.i.instructions-per-iTLB-miss
0.24 +4.7% 0.25 perf-stat.i.ipc
1273 +7.4% 1368 perf-stat.i.metric.K/sec
197.16 +5.2% 207.39 perf-stat.i.metric.M/sec
2205049 +7.5% 2371531 perf-stat.i.minor-faults
7323132 +4.0% 7616704 perf-stat.i.node-store-misses
2205063 +7.5% 2371545 perf-stat.i.page-faults
7.81 +2.2% 7.98 perf-stat.overall.MPKI
0.26 ± 5% +0.0 0.28 ± 2% perf-stat.overall.branch-miss-rate%
4.22 -4.7% 4.02 perf-stat.overall.cpi
748.61 -6.7% 698.69 perf-stat.overall.cycles-between-cache-misses
73.28 ± 6% +4.4 77.69 perf-stat.overall.iTLB-load-miss-rate%
2252 ± 8% -12.6% 1969 perf-stat.overall.instructions-per-iTLB-miss
0.24 +4.9% 0.25 perf-stat.overall.ipc
14444 -2.5% 14084 perf-stat.overall.path-length
7.951e+09 +4.9% 8.341e+09 perf-stat.ps.branch-instructions
20309598 ± 5% +15.4% 23427778 ± 2% perf-stat.ps.branch-misses
1.786e+08 +7.1% 1.912e+08 perf-stat.ps.cache-misses
2.474e+08 +7.2% 2.651e+08 perf-stat.ps.cache-references
9559588 +6.2% 10156429 perf-stat.ps.dTLB-load-misses
7.979e+09 +4.7% 8.355e+09 perf-stat.ps.dTLB-loads
2220324 +7.5% 2387471 perf-stat.ps.dTLB-store-misses
2.655e+09 +7.3% 2.848e+09 perf-stat.ps.dTLB-stores
14152666 ± 7% +19.2% 16863826 perf-stat.ps.iTLB-load-misses
3.167e+10 +4.8% 3.321e+10 perf-stat.ps.instructions
2194343 +7.5% 2359991 perf-stat.ps.minor-faults
7285981 +4.0% 7577980 perf-stat.ps.node-store-misses
2194357 +7.5% 2360005 perf-stat.ps.page-faults
6.402e+12 +5.0% 6.721e+12 perf-stat.total.instructions
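Several rows in the two tables can be cross-checked against each other. The Python sketch below rests on two assumptions that the report does not state: that the *_bw_MBps figures are IOPS times the 4k block size expressed in MiB/s and truncated to an integer, and that perf-stat.overall.path-length is perf-stat.total.instructions divided by fio.workload. Both readings match the reported numbers to within rounding of the printed values.

```python
BLOCK_SIZE = 4 * 1024   # bs=4k, assumed to mean 4096 bytes
MIB = 1024 * 1024


def bandwidth_mib_per_s(iops: int) -> float:
    """Bandwidth implied by an IOPS figure at the fixed block size."""
    return iops * BLOCK_SIZE / MIB


def path_length(total_instructions: float, workload: float) -> float:
    """Instructions retired per unit of workload (assumed definition)."""
    return total_instructions / workload


# (metric, base IOPS, tested IOPS, reported base bw, reported tested bw)
bw_rows = [
    ("write", 2231570, 2360791, 8717, 9221),
    ("read", 2216141, 2386084, 8656, 9320),
]
for name, base_iops, new_iops, base_bw, new_bw in bw_rows:
    print(name, int(bandwidth_mib_per_s(base_iops)), base_bw,
          int(bandwidth_mib_per_s(new_iops)), new_bw)

# (test, total instructions, workload, reported path-length), base then tested
pl_rows = [
    ("write base", 6.138e12, 4.463e8, 13751),
    ("write new", 6.367e12, 4.722e8, 13485),
    ("read base", 6.402e12, 4.432e8, 14444),
    ("read new", 6.721e12, 4.772e8, 14084),
]
for name, instructions, workload, reported in pl_rows:
    print(name, round(path_length(instructions, workload)), reported)
```

If the path-length reading holds, the tested commit retires slightly fewer instructions per operation (-1.9% for write, -2.5% for read) in addition to the lower cpi, and both effects point the same way as the IOPS gain.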


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://01.org/lkp


Attachments:
(No filename) (15.33 kB)
config-5.19.0-rc3-00008-gd96c52fe4907 (165.93 kB)
job-script (8.63 kB)
job.yaml (5.93 kB)
reproduce (1.00 kB)