Greetings,
FYI, we noticed a 22.4% improvement in fio.read_iops due to commit:
commit: 46a525e199e4037516f7e498c18f065b09df32ac ("io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: fio-basic
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
with the following parameters:
disk: 2pmem
fs: ext2
mount_option: dax
runtime: 200s
nr_task: 50%
time_based: tb
rw: read
bs: 2M
ioengine: io_uring
test_size: 200G
cpufreq_governor: performance
test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio
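For orientation, the parameters above correspond roughly to the standalone fio invocation sketched below. The device, mount point, job name and numjobs value (48, i.e. 50% of the 96 hardware threads) are illustrative assumptions, not the exact lkp-generated job file:
# assumes an ext2 filesystem already created on a pmem device, mounted with DAX
mount -o dax /dev/pmem0 /fs/pmem0
fio --name=seqread --directory=/fs/pmem0 --rw=read --bs=2M \
    --ioengine=io_uring --size=200G --runtime=200s --time_based \
    --numjobs=48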
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if you come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.
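The comparison in this report is against the immediate parent commit (see the commit pair below). Outside of lkp, a minimal manual A/B check would follow the same pattern; kernel build and boot steps are elided here:
git checkout b000145e99   # parent: io_uring/rw: defer fsnotify calls to task context
# build and boot this kernel, run the fio job, record fio.read_iops
git checkout 46a525e199   # io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL
# build and boot again, rerun the identical fio job, compare the two runs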
=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/mount_option/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based:
2M/gcc-11/performance/2pmem/ext2/io_uring/x86_64-rhel-8.3/dax/50%/debian-11.1-x86_64-20220510.cgz/200s/read/lkp-csl-2sp7/200G/fio-basic/tb
commit:
b000145e99 ("io_uring/rw: defer fsnotify calls to task context")
46a525e199 ("io_uring: don't gate task_work run on TIF_NOTIFY_SIGNAL")
b000145e99078094 46a525e199e4037516f7e498c18
---------------- ---------------------------
%stddev %change %stddev
\ | \
21.62 ± 28% +13.5 35.14 ± 17% fio.latency_100ms%
0.21 ± 7% -0.0 0.17 ± 6% fio.latency_20ms%
11.42 ± 20% -7.1 4.34 ± 21% fio.latency_500ms%
0.68 ± 62% -0.6 0.06 ± 11% fio.latency_750ms%
20312 ± 5% +22.4% 24856 fio.read_bw_MBps
2.66e+08 ± 7% -34.2% 1.751e+08 ± 5% fio.read_clat_90%_us
3.305e+08 ± 6% -28.9% 2.349e+08 ± 9% fio.read_clat_95%_us
4.549e+08 ± 14% -33.7% 3.016e+08 ± 2% fio.read_clat_99%_us
1.515e+08 ± 5% -18.5% 1.235e+08 fio.read_clat_mean_us
84751148 ± 10% -38.1% 52462994 ± 8% fio.read_clat_stddev
10156 ± 5% +22.4% 12428 fio.read_iops
34028 ± 4% -17.5% 28081 ± 4% fio.read_slat_mean_us
207091 ± 7% +23.0% 254784 ± 5% fio.read_slat_stddev
1322240 ± 6% +39.5% 1844804 ± 2% fio.time.involuntary_context_switches
152665 ± 3% +415.2% 786457 fio.time.minor_page_faults
8350 ± 2% +6.6% 8904 fio.time.percent_of_cpu_this_job_got
16651 ± 2% +6.8% 17775 fio.time.system_time
96.17 ± 8% -14.3% 82.42 ± 5% fio.time.user_time
2032503 ± 5% +22.4% 2487213 fio.workload
715266 ± 13% +22.3% 874621 ± 6% numa-numastat.node1.numa_hit
715333 ± 13% +22.2% 874330 ± 6% numa-vmstat.node1.numa_hit
1.966e+09 ± 17% -52.7% 9.303e+08 ± 12% cpuidle..time
4301667 ± 16% -53.0% 2021131 ± 20% cpuidle..usage
10.98 ± 15% -49.4% 5.56 ± 11% iostat.cpu.idle
88.19 +6.4% 93.80 iostat.cpu.system
9.76 ± 17% -53.3% 4.56 ± 15% turbostat.CPU%c1
57.07 +4.1% 59.40 turbostat.RAMWatt
10.12 ± 17% -5.5 4.66 ± 13% mpstat.cpu.all.idle%
0.25 ± 12% -0.1 0.12 ± 6% mpstat.cpu.all.soft%
0.83 ± 10% -0.2 0.65 ± 6% mpstat.cpu.all.usr%
10.60 ± 15% -50.9% 5.20 ± 16% vmstat.cpu.id
159.40 ± 4% +27.7% 203.50 ± 2% vmstat.procs.r
15208 ± 4% +15.1% 17506 vmstat.system.cs
20749 +2.9% 21355 proc-vmstat.nr_kernel_stack
225429 ± 7% +282.2% 861493 proc-vmstat.numa_hint_faults
136434 ± 8% +456.6% 759435 proc-vmstat.numa_hint_faults_local
1090679 +2.6% 1119565 proc-vmstat.numa_hit
6836 ± 3% +520.3% 42407 proc-vmstat.numa_huge_pte_updates
1005727 +2.6% 1031985 proc-vmstat.numa_local
775153 ± 8% +31.7% 1020704 ± 8% proc-vmstat.numa_pages_migrated
3833507 ± 3% +490.7% 22643711 proc-vmstat.numa_pte_updates
1090769 +2.5% 1118337 proc-vmstat.pgalloc_normal
979930 +65.5% 1621767 proc-vmstat.pgfault
895767 ± 15% +7.3% 961415 proc-vmstat.pgfree
775153 ± 8% +31.7% 1020704 ± 8% proc-vmstat.pgmigrate_success
1462 ± 8% +31.9% 1927 ± 8% proc-vmstat.thp_migration_success
98.27 +0.4 98.64 perf-profile.calltrace.cycles-pp.ret_from_fork
97.37 +0.4 97.73 perf-profile.calltrace.cycles-pp.io_read.io_issue_sqe.io_wq_submit_work.io_worker_handle_work.io_wqe_worker
98.23 +0.4 98.60 perf-profile.calltrace.cycles-pp.io_worker_handle_work.io_wqe_worker.ret_from_fork
98.23 +0.4 98.61 perf-profile.calltrace.cycles-pp.io_wqe_worker.ret_from_fork
1.13 ± 12% -0.3 0.86 ± 4% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.27 ± 6% -0.0 0.23 ± 4% perf-profile.children.cycles-pp.try_to_wake_up
0.28 ± 6% -0.0 0.24 ± 3% perf-profile.children.cycles-pp.__wake_up_common
0.28 ± 6% -0.0 0.24 ± 4% perf-profile.children.cycles-pp.__wake_up_common_lock
0.26 ± 6% -0.0 0.22 ± 5% perf-profile.children.cycles-pp.autoremove_wake_function
0.08 ± 6% -0.0 0.06 ± 8% perf-profile.children.cycles-pp.select_task_rq
0.00 +0.3 0.33 ± 4% perf-profile.children.cycles-pp.io_assign_current_work
98.27 +0.4 98.64 perf-profile.children.cycles-pp.ret_from_fork
98.23 +0.4 98.60 perf-profile.children.cycles-pp.io_worker_handle_work
98.23 +0.4 98.61 perf-profile.children.cycles-pp.io_wqe_worker
131360 ± 8% +37.7% 180837 ± 3% sched_debug.cfs_rq:/.MIN_vruntime.avg
1.49 ± 5% +26.4% 1.88 sched_debug.cfs_rq:/.h_nr_running.avg
195967 ± 10% -22.4% 152129 ± 10% sched_debug.cfs_rq:/.load.stddev
144.62 ± 20% +44.6% 209.08 ± 9% sched_debug.cfs_rq:/.load_avg.min
131362 ± 8% +37.7% 180842 ± 3% sched_debug.cfs_rq:/.max_vruntime.avg
201691 ± 7% +13.2% 228371 ± 3% sched_debug.cfs_rq:/.min_vruntime.avg
1.32 ± 5% +25.7% 1.66 sched_debug.cfs_rq:/.nr_running.avg
0.54 ± 5% +14.1% 0.62 ± 3% sched_debug.cfs_rq:/.nr_running.stddev
1584 ± 5% +25.2% 1983 sched_debug.cfs_rq:/.runnable_avg.avg
2.63 ± 13% +77.8% 4.68 ± 13% sched_debug.cfs_rq:/.spread.avg
5.35 ± 15% +36.7% 7.31 ± 9% sched_debug.cfs_rq:/.spread.stddev
1549299 ± 6% -31.1% 1067585 ± 7% sched_debug.cpu.avg_idle.avg
3658796 ± 10% -30.2% 2554830 ± 15% sched_debug.cpu.avg_idle.max
276671 ± 18% -41.5% 161849 ± 20% sched_debug.cpu.avg_idle.min
718901 ± 13% -31.3% 494161 ± 11% sched_debug.cpu.avg_idle.stddev
872915 ± 9% -36.4% 555389 ± 3% sched_debug.cpu.max_idle_balance_cost.avg
1335161 ± 12% -42.1% 773347 ± 15% sched_debug.cpu.max_idle_balance_cost.max
602677 ± 9% -17.0% 500000 sched_debug.cpu.max_idle_balance_cost.min
166368 ± 11% -64.5% 59141 ± 46% sched_debug.cpu.max_idle_balance_cost.stddev
1.49 ± 5% +26.0% 1.88 ± 2% sched_debug.cpu.nr_running.avg
16805 ± 4% +16.8% 19629 sched_debug.cpu.nr_switches.avg
9904 ± 6% +17.8% 11670 ± 2% sched_debug.cpu.nr_switches.min
4092 ± 9% +25.8% 5147 ± 9% sched_debug.cpu.nr_switches.stddev
40.89 +1.2% 41.37 perf-stat.i.MPKI
2.899e+09 ± 5% +20.7% 3.498e+09 perf-stat.i.branch-instructions
0.20 ± 4% -0.0 0.16 ± 4% perf-stat.i.branch-miss-rate%
6.679e+08 ± 5% +22.1% 8.153e+08 perf-stat.i.cache-misses
7.021e+08 ± 5% +21.9% 8.556e+08 perf-stat.i.cache-references
15269 ± 5% +15.1% 17569 perf-stat.i.context-switches
14.35 ± 5% -14.9% 12.21 perf-stat.i.cpi
2.381e+11 +5.8% 2.517e+11 perf-stat.i.cpu-cycles
372.98 ± 5% -16.3% 312.29 perf-stat.i.cycles-between-cache-misses
0.00 ± 9% -0.0 0.00 ± 6% perf-stat.i.dTLB-load-miss-rate%
2.976e+09 ± 5% +20.7% 3.59e+09 perf-stat.i.dTLB-loads
0.00 ± 2% +0.0 0.01 perf-stat.i.dTLB-store-miss-rate%
123809 ± 5% +48.9% 184353 perf-stat.i.dTLB-store-misses
2.799e+09 ± 5% +21.3% 3.394e+09 perf-stat.i.dTLB-stores
74.58 ± 3% +6.6 81.23 perf-stat.i.iTLB-load-miss-rate%
383587 ± 13% -37.2% 240807 ± 14% perf-stat.i.iTLB-loads
1.707e+10 ± 5% +20.9% 2.063e+10 perf-stat.i.instructions
0.07 ± 5% +15.1% 0.08 perf-stat.i.ipc
2.48 +5.8% 2.62 perf-stat.i.metric.GHz
1290 ± 9% -38.8% 790.15 ± 2% perf-stat.i.metric.K/sec
98.96 ± 5% +21.4% 120.15 perf-stat.i.metric.M/sec
4017 ± 2% +80.4% 7248 perf-stat.i.minor-faults
46.99 ± 4% -10.2 36.84 ± 3% perf-stat.i.node-load-miss-rate%
36469552 ± 11% +20.0% 43763092 perf-stat.i.node-loads
25.92 ± 9% -22.8 3.08 ± 14% perf-stat.i.node-store-miss-rate%
41252120 ± 10% -89.2% 4473071 ± 20% perf-stat.i.node-store-misses
1.465e+08 ± 9% +38.1% 2.022e+08 ± 2% perf-stat.i.node-stores
4017 ± 2% +80.4% 7248 perf-stat.i.page-faults
0.16 ± 5% -0.0 0.14 ± 4% perf-stat.overall.branch-miss-rate%
14.08 ± 6% -13.1% 12.24 perf-stat.overall.cpi
359.74 ± 6% -14.0% 309.45 perf-stat.overall.cycles-between-cache-misses
0.00 ± 10% -0.0 0.00 ± 6% perf-stat.overall.dTLB-load-miss-rate%
0.00 ± 2% +0.0 0.01 perf-stat.overall.dTLB-store-miss-rate%
64.21 ± 4% +11.1 75.32 ± 2% perf-stat.overall.iTLB-load-miss-rate%
26608 ± 5% +14.3% 30425 ± 6% perf-stat.overall.instructions-per-iTLB-miss
0.07 ± 5% +14.7% 0.08 perf-stat.overall.ipc
44.39 ± 2% -7.9 36.50 ± 3% perf-stat.overall.node-load-miss-rate%
22.12 ± 8% -20.1 2.06 ± 19% perf-stat.overall.node-store-miss-rate%
2.881e+09 ± 5% +20.9% 3.484e+09 perf-stat.ps.branch-instructions
6.644e+08 ± 5% +22.3% 8.128e+08 perf-stat.ps.cache-misses
6.983e+08 ± 5% +22.1% 8.529e+08 perf-stat.ps.cache-references
15070 ± 5% +15.5% 17412 perf-stat.ps.context-switches
2.383e+11 +5.6% 2.515e+11 perf-stat.ps.cpu-cycles
2.957e+09 ± 5% +20.9% 3.576e+09 perf-stat.ps.dTLB-loads
123050 ± 5% +49.6% 184070 perf-stat.ps.dTLB-store-misses
2.783e+09 ± 5% +21.5% 3.382e+09 perf-stat.ps.dTLB-stores
358319 ± 14% -37.8% 222740 ± 12% perf-stat.ps.iTLB-loads
1.697e+10 ± 5% +21.1% 2.055e+10 perf-stat.ps.instructions
4008 ± 2% +79.2% 7183 perf-stat.ps.minor-faults
36583387 ± 11% +19.6% 43742068 perf-stat.ps.node-loads
41398082 ± 11% -89.7% 4252522 ± 20% perf-stat.ps.node-store-misses
1.459e+08 ± 9% +38.5% 2.021e+08 ± 2% perf-stat.ps.node-stores
4008 ± 2% +79.2% 7183 perf-stat.ps.page-faults
3.428e+12 ± 5% +21.2% 4.155e+12 perf-stat.total.instructions
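Two notes on the tables above. First, as a consistency check on the headline result, fio.read_bw_MBps is simply fio.read_iops multiplied by the 2M block size (10156 * 2 = 20312, 12428 * 2 = 24856), so both rows show the same change of (12428 - 10156) / 10156, i.e. +22.4%. Second, the perf-stat.* rows come from hardware counters sampled during the run; a roughly comparable (though not identical) set of system-wide counters can be collected on any test machine with, for example:
perf stat -a -I 1000 -e cycles,instructions,cache-references,cache-misses,branch-instructions,branch-misses,context-switches,page-faults -- sleep 200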
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://01.org/lkp