
2022-10-31 08:51:21

Subject: [linus:master] [rcu] d96c52fe49: fio.write_iops 5.8% improvement

Greeting,

FYI, we noticed a 5.8% improvement of fio.write_iops due to commit:

commit: d96c52fe4907c68adc5 ("rcu: Add polled expedited grace-period primitives")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: fio-basic
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio

In addition to that, the commit also has significant impact on the following tests:

+------------------+-----------------------------------------------------------------------------------------------+
| testcase: change | fio-basic: fio.read_iops 7.7% improvement                                                     |
| test machine     | 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory |
| test parameters  | bs=4k                                                                                         |
|                  | cpufreq_governor=performance                                                                  |
|                  | disk=2pmem                                                                                    |
|                  | fs=xfs                                                                                        |
|                  | ioengine=mmap                                                                                 |
|                  | mount_option=dax                                                                              |
|                  | nr_task=50%                                                                                   |
|                  | runtime=200s                                                                                  |
|                  | rw=read                                                                                       |
|                  | test_size=200G                                                                                |
|                  | time_based=tb                                                                                 |
+------------------+-----------------------------------------------------------------------------------------------+

Details are as below:

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.

==============================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/mount_option/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based:
  4k/gcc-11/performance/2pmem/xfs/mmap/x86_64-rhel-8.3/dax/50%/debian-11.1-x86_64-20220510.cgz/200s/write/lkp-csl-2sp7/200G/fio-basic/tb

commit:
  e4333cb20f ("rcutorture: Verify that polled GP API sees synchronous grace periods")
  d96c52fe49 ("rcu: Add polled expedited grace-period primitives")

e4333cb20f047d96 d96c52fe4907c68adc5e61a0bef
---------------- ---------------------------
         %stddev  %change          %stddev
             \       |                 \
     23.87 ±  3%    +32.7      56.58        fio.latency_20us%
     76.01          -32.6      43.37 ±  2%  fio.latency_50us%
 4.463e+08          +5.8%  4.722e+08        fio.time.minor_page_faults
    521.13          +6.9%     557.11        fio.time.user_time
 4.463e+08          +5.8%  4.722e+08        fio.workload
      8717          +5.8%       9221        fio.write_bw_MBps
     23210          -6.8%      21632        fio.write_clat_90%_us
     24960          -5.6%      23552        fio.write_clat_95%_us
     21149          -5.6%      19960        fio.write_clat_mean_us
   2231570          +5.8%    2360791        fio.write_iops
      2.70          +6.8%       2.89        iostat.cpu.user
      0.03           +0.0       0.03 ±  6%  mpstat.cpu.all.soft%
     43.52          +1.0%      43.98        turbostat.RAMWatt
   1925180 ±  2%   +13.8%    2191273 ±  6%  numa-meminfo.node1.MemUsed
     84399 ±  4%   +11.2%      93823 ± 10%  numa-meminfo.node1.SUnreclaim
     64201 ±138%   +78.6%     114665 ± 74%  numa-meminfo.node1.Unevictable
     21100 ±  4%   +11.2%      23455 ± 10%  numa-vmstat.node1.nr_slab_unreclaimable
     16050 ±138%   +78.6%      28665 ± 74%  numa-vmstat.node1.nr_unevictable
     16050 ±138%   +78.6%      28665 ± 74%  numa-vmstat.node1.nr_zone_unevictable
    431413          +5.8%     456351        proc-vmstat.nr_page_table_pages
   2574662          +3.6%    2668198        proc-vmstat.numa_hit
   2485812          +3.8%    2581127        proc-vmstat.numa_local
   2572931          +3.7%    2668242        proc-vmstat.pgalloc_normal
  4.47e+08          +5.8%  4.729e+08        proc-vmstat.pgfault
   2460652          +4.1%    2560513        proc-vmstat.pgfree
    872002          +5.8%     922500        proc-vmstat.thp_fault_fallback
 7.577e+09          +4.0%  7.878e+09        perf-stat.i.branch-instructions
      0.22 ±  2%     +0.0       0.25 ±  3%  perf-stat.i.branch-miss-rate%
  16697023 ±  3%   +19.7%   19981806 ±  3%  perf-stat.i.branch-misses
 1.862e+08          +5.3%  1.961e+08        perf-stat.i.cache-misses
 2.653e+08          +5.1%  2.787e+08        perf-stat.i.cache-references
      4.42          -3.6%       4.26        perf-stat.i.cpi
    731.57 ±  3%    -6.1%     686.63        perf-stat.i.cycles-between-cache-misses
 7.891e+09          +4.1%  8.217e+09        perf-stat.i.dTLB-loads
      0.47           +0.0       0.48        perf-stat.i.dTLB-store-miss-rate%
  12378051          +7.7%   13336505        perf-stat.i.dTLB-store-misses
 2.643e+09          +5.7%  2.793e+09        perf-stat.i.dTLB-stores
     67.44           +3.5      70.99        perf-stat.i.iTLB-load-miss-rate%
  12036801 ±  3%   +18.8%   14298923 ±  2%  perf-stat.i.iTLB-load-misses
 3.046e+10          +3.9%  3.166e+10        perf-stat.i.instructions
      2572 ±  4%   -12.5%       2249        perf-stat.i.instructions-per-iTLB-miss
      0.23          +3.9%       0.24        perf-stat.i.ipc
      1171          +5.6%       1237        perf-stat.i.metric.K/sec
    191.44          +4.3%     199.67        perf-stat.i.metric.M/sec
   2217598          +6.0%    2349848        perf-stat.i.minor-faults
    360845 ±  4%    +5.7%     381564 ±  2%  perf-stat.i.node-loads
  44614403 ±  2%    +7.1%   47802681 ±  2%  perf-stat.i.node-store-misses
   2217612          +6.0%    2349862        perf-stat.i.page-faults
      0.22 ±  2%     +0.0       0.25 ±  3%  perf-stat.overall.branch-miss-rate%
      4.41          -3.6%       4.25        perf-stat.overall.cpi
    720.90          -4.9%     685.85        perf-stat.overall.cycles-between-cache-misses
      0.47           +0.0       0.48        perf-stat.overall.dTLB-store-miss-rate%
     67.43           +3.6      70.99        perf-stat.overall.iTLB-load-miss-rate%
      2533 ±  3%   -12.6%       2215 ±  2%  perf-stat.overall.instructions-per-iTLB-miss
      0.23          +3.7%       0.24        perf-stat.overall.ipc
     13751          -1.9%      13485        perf-stat.overall.path-length
  7.54e+09          +4.0%  7.839e+09        perf-stat.ps.branch-instructions
  16623549 ±  3%   +19.7%   19891545 ±  3%  perf-stat.ps.branch-misses
 1.853e+08          +5.3%  1.951e+08        perf-stat.ps.cache-misses
  2.64e+08          +5.0%  2.773e+08        perf-stat.ps.cache-references
 7.852e+09          +4.1%  8.177e+09        perf-stat.ps.dTLB-loads
  12317699          +7.7%   13270764        perf-stat.ps.dTLB-store-misses
 2.631e+09          +5.7%  2.779e+09        perf-stat.ps.dTLB-stores
  11979927 ±  3%   +18.8%   14229710 ±  2%  perf-stat.ps.iTLB-load-misses
 3.031e+10          +3.9%   3.15e+10        perf-stat.ps.instructions
   2206821          +6.0%    2338290        perf-stat.ps.minor-faults
    359302 ±  4%    +5.7%     379788 ±  2%  perf-stat.ps.node-loads
  44388499 ±  2%    +7.1%   47560032 ±  2%  perf-stat.ps.node-store-misses
   2206835          +6.0%    2338303        perf-stat.ps.page-faults
 6.138e+12          +3.7%  6.367e+12        perf-stat.total.instructions

************************************************************************
lkp-csl-2sp7: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
==============================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/mount_option/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based:
  4k/gcc-11/performance/2pmem/xfs/mmap/x86_64-rhel-8.3/dax/50%/debian-11.1-x86_64-20220510.cgz/200s/read/lkp-csl-2sp7/200G/fio-basic/tb

commit:
  e4333cb20f ("rcutorture: Verify that polled GP API sees synchronous grace periods")
  d96c52fe49 ("rcu: Add polled expedited grace-period primitives")

e4333cb20f047d96 d96c52fe4907c68adc5e61a0bef
---------------- ---------------------------
         %stddev  %change          %stddev
             \       |                 \
     18.06 ±  4%    +41.5      59.58 ±  3%  fio.latency_20us%
     81.86          -41.5      40.40 ±  5%  fio.latency_50us%
      8656          +7.7%       9320        fio.read_bw_MBps
     23466          -8.4%      21504        fio.read_clat_90%_us
     24789          -8.8%      22613        fio.read_clat_95%_us
     28501          -5.7%      26880        fio.read_clat_99%_us
     21373          -7.2%      19825        fio.read_clat_mean_us
   2216141          +7.7%    2386084        fio.read_iops
 4.432e+08          +7.7%  4.772e+08        fio.time.minor_page_faults
    452.30          +7.5%     486.28        fio.time.user_time
 4.432e+08          +7.7%  4.772e+08        fio.workload
      2.36          +7.3%       2.53        iostat.cpu.user
  14396416 ± 10%   -16.9%   11959978 ±  8%  meminfo.DirectMap2M
      0.65 ±  2%     -0.1       0.51 ± 44%  perf-profile.calltrace.cycles-pp.insert_pfn.__vm_insert_mixed.dax_fault_iter.dax_iomap_pte_fault.__xfs_filemap_fault
      0.63 ±  3%     -0.1       0.50 ± 44%  perf-profile.calltrace.cycles-pp.__get_locked_pte.insert_pfn.__vm_insert_mixed.dax_fault_iter.dax_iomap_pte_fault
      0.60 ±  2%     -0.1       0.46 ± 44%  perf-profile.calltrace.cycles-pp._raw_spin_lock.__get_locked_pte.insert_pfn.__vm_insert_mixed.dax_fault_iter
      0.65 ±  2%     -0.1       0.59 ±  9%  perf-profile.children.cycles-pp.insert_pfn
      0.63 ±  3%     -0.1       0.57 ±  9%  perf-profile.children.cycles-pp.__get_locked_pte
      1.14 ±  2%     -0.1       0.99 ±  9%  perf-profile.self.cycles-pp._raw_spin_lock
    429439          +7.2%     460393        proc-vmstat.nr_page_table_pages
   2550960          +5.6%    2693959        proc-vmstat.numa_hit
   2464007          +5.8%    2606912        proc-vmstat.numa_local
   2551059          +5.6%    2694047        proc-vmstat.pgalloc_normal
 4.439e+08          +7.7%  4.779e+08        proc-vmstat.pgfault
   2447827          +5.4%    2579922        proc-vmstat.pgfree
    865972          +7.7%     932383        proc-vmstat.thp_fault_fallback
      7.80          +3.9%       8.11 ±  3%  perf-stat.i.MPKI
  7.99e+09          +4.9%  8.382e+09        perf-stat.i.branch-instructions
      0.25 ±  5%     +0.0       0.29 ±  9%  perf-stat.i.branch-miss-rate%
  20400526 ±  5%   +15.4%   23535017 ±  2%  perf-stat.i.branch-misses
 1.795e+08          +7.1%  1.922e+08        perf-stat.i.cache-misses
 2.486e+08          +7.2%  2.664e+08        perf-stat.i.cache-references
      4.23          -4.6%       4.04        perf-stat.i.cpi
    749.57          -6.6%     700.00        perf-stat.i.cycles-between-cache-misses
   9606077          +6.2%   10206063        perf-stat.i.dTLB-load-misses
 8.018e+09          +4.7%  8.396e+09        perf-stat.i.dTLB-loads
   2231154          +7.5%    2399136        perf-stat.i.dTLB-store-misses
 2.668e+09          +7.3%  2.862e+09        perf-stat.i.dTLB-stores
     73.29 ±  6%     +4.4      77.67        perf-stat.i.iTLB-load-miss-rate%
  14219792 ±  8%   +19.2%   16946973        perf-stat.i.iTLB-load-misses
 3.183e+10          +4.8%  3.337e+10        perf-stat.i.instructions
      2312 ±  8%   -13.1%       2009 ±  2%  perf-stat.i.instructions-per-iTLB-miss
      0.24          +4.7%       0.25        perf-stat.i.ipc
      1273          +7.4%       1368        perf-stat.i.metric.K/sec
    197.16          +5.2%     207.39        perf-stat.i.metric.M/sec
   2205049          +7.5%    2371531        perf-stat.i.minor-faults
   7323132          +4.0%    7616704        perf-stat.i.node-store-misses
   2205063          +7.5%    2371545        perf-stat.i.page-faults
      7.81          +2.2%       7.98        perf-stat.overall.MPKI
      0.26 ±  5%     +0.0       0.28 ±  2%  perf-stat.overall.branch-miss-rate%
      4.22          -4.7%       4.02        perf-stat.overall.cpi
    748.61          -6.7%     698.69        perf-stat.overall.cycles-between-cache-misses
     73.28 ±  6%     +4.4      77.69        perf-stat.overall.iTLB-load-miss-rate%
      2252 ±  8%   -12.6%       1969        perf-stat.overall.instructions-per-iTLB-miss
      0.24          +4.9%       0.25        perf-stat.overall.ipc
     14444          -2.5%      14084        perf-stat.overall.path-length
 7.951e+09          +4.9%  8.341e+09        perf-stat.ps.branch-instructions
  20309598 ±  5%   +15.4%   23427778 ±  2%  perf-stat.ps.branch-misses
 1.786e+08          +7.1%  1.912e+08        perf-stat.ps.cache-misses
 2.474e+08          +7.2%  2.651e+08        perf-stat.ps.cache-references
   9559588          +6.2%   10156429        perf-stat.ps.dTLB-load-misses
 7.979e+09          +4.7%  8.355e+09        perf-stat.ps.dTLB-loads
   2220324          +7.5%    2387471        perf-stat.ps.dTLB-store-misses
 2.655e+09          +7.3%  2.848e+09        perf-stat.ps.dTLB-stores
  14152666 ±  7%   +19.2%   16863826        perf-stat.ps.iTLB-load-misses
 3.167e+10          +4.8%  3.321e+10        perf-stat.ps.instructions
   2194343          +7.5%    2359991        perf-stat.ps.minor-faults
   7285981          +4.0%    7577980        perf-stat.ps.node-store-misses
   2194357          +7.5%    2360005        perf-stat.ps.page-faults
 6.402e+12          +5.0%  6.721e+12        perf-stat.total.instructions

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

--
0-DAY CI Kernel Test Service
https://01.org/lkp
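The %change column and a couple of derived rows in this report can be cross-checked by hand. What follows is a minimal shell sketch that uses awk for the floating-point arithmetic. The function names (pct_change, pp_change, workload, path_length) are hypothetical helpers, not part of lkp-tests. Two relations are inferred from the reported numbers rather than taken from LKP documentation: fio.workload appears to equal the IOPS figure multiplied by the 200s runtime, and perf-stat.overall.path-length appears to equal perf-stat.total.instructions divided by fio.workload. Rows whose values are already percentages (such as fio.latency_20us%) appear to show an absolute difference rather than a relative one.

```sh
#!/bin/sh
# Minimal sketch for cross-checking numbers in this report.
# The helper names are hypothetical and not part of lkp-tests.

# pct_change OLD NEW: relative change in percent, e.g. "+5.8%".
pct_change() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%+.1f%%\n", (after - before) / before * 100 }'
}

# pp_change OLD NEW: absolute difference, apparently used for rows that are
# already percentages (e.g. fio.latency_20us%).
pp_change() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%+.1f\n", after - before }'
}

# workload IOPS RUNTIME_SECONDS: total I/O operations over the run
# (inferred relation: fio.workload ~= iops * runtime).
workload() {
    awk -v iops="$1" -v secs="$2" 'BEGIN { printf "%.4g\n", iops * secs }'
}

# path_length TOTAL_INSTRUCTIONS WORKLOAD: instructions per I/O operation
# (inferred relation: path-length ~= total.instructions / workload).
path_length() {
    awk -v insns="$1" -v work="$2" 'BEGIN { printf "%.0f\n", insns / work }'
}

pct_change 2231570 2360791     # fio.write_iops     -> +5.8%
pct_change 2216141 2386084     # fio.read_iops      -> +7.7%
pp_change 23.87 56.58          # fio.latency_20us%  -> +32.7
workload 2360791 200           # write run          -> 4.722e+08
path_length 6.721e12 4.772e8   # read run           -> 14084
```

For the write run, path_length 6.367e12 4.722e8 gives 13484 against the 13485 listed. The one-unit gap comes from feeding in the rounded figures printed in the report. The read-run value of 14084 matches exactly.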

Attachments:

(No filename) (15.33 kB)
config-5.19.0-rc3-00008-gd96c52fe4907 (165.93 kB)
job-script (8.63 kB)
job.yaml (5.93 kB)
reproduce (1.00 kB)
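For a quick experiment outside LKP, the write job can be roughly approximated with a direct fio invocation. This is a sketch under stated assumptions and does not replace the attached job.yaml and reproduce script. The mount point /fs/pmem0 is a hypothetical default. Two parameter mappings are also assumptions: that nr_task=50% means 48 fio jobs on this 96-thread machine, and that test_size=200G is split evenly across those jobs. The script further assumes the pmem device is already formatted with XFS and mounted with the dax option. By default it only prints the command; set RUN=1 to execute it.

```sh
#!/bin/sh
# Rough stand-alone approximation of the fio-basic "write" job in this report.
# ASSUMPTIONS (not taken from job.yaml): the DAX-mounted XFS file system is at
# $MNT (hypothetical default /fs/pmem0), nr_task=50% means 48 jobs on the
# 96-thread test machine, and test_size=200G is split evenly across the jobs.
MNT=${MNT:-/fs/pmem0}
NR_CPU=96
NR_JOBS=$((NR_CPU * 50 / 100))      # nr_task=50%            -> 48 jobs
SIZE_MB=$((200 * 1024 / NR_JOBS))   # 200G shared by 48 jobs -> 4266 MiB each

# Parameters from the report: ioengine=mmap, rw=write, bs=4k,
# runtime=200s, time_based.
set -- fio --name=fio-basic-write --directory="$MNT" \
    --ioengine=mmap --rw=write --bs=4k \
    --numjobs="$NR_JOBS" --size="${SIZE_MB}M" \
    --runtime=200 --time_based --group_reporting

# Print the command by default; run it only when RUN=1 is set.
echo "$@"
if [ "${RUN:-0}" = 1 ]; then
    exec "$@"
fi
```

Changing --rw=write to --rw=read gives the corresponding sketch for the fio.read_iops test listed in the "In addition to that" table.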