2021-03-17 17:17:47

by kernel test robot

Subject: [io_uring] 7a612350a9: fio.read_iops -6.5% regression



Greetings,

FYI, we noticed a -6.5% regression of fio.read_iops due to commit:


commit: 7a612350a989866510dc5c874fd8ffe1f37555d2 ("io_uring: fix complete_post races for linked req")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: fio-basic
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 256G memory
with the following parameters:

disk: 2pmem
fs: ext2
mount_option: dax
runtime: 200s
nr_task: 50%
time_based: tb
rw: read
bs: 2M
ioengine: mmap
test_size: 200G
cpufreq_governor: performance
ucode: 0x5003006
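
As a rough reading of the parameters above, the sketch below turns them into working numbers. It assumes (the report does not say so) that nr_task: 50% is taken relative to the 96 hardware threads, that bs: 2M means 2 MiB, and that the mmap reads fault in 4 KiB base pages. All names are illustrative and not part of LKP or fio.

```python
# Sketch: derive working numbers from the job parameters listed above.
# Assumptions (not stated in the report): nr_task is a percentage of the
# hardware threads, "2M" is 2 MiB, and base pages are 4 KiB.

HW_THREADS = 96                  # "96 threads" test machine
NR_TASK_PERCENT = 50             # nr_task: 50%
BLOCK_SIZE = 2 * 1024 * 1024     # bs: 2M, assumed to mean 2 MiB
PAGE_SIZE = 4 * 1024             # assumed base page size


def derived_parameters():
    """Return the derived job quantities as a plain dict."""
    return {
        # number of concurrent fio tasks under the percentage assumption
        "nr_task": HW_THREADS * NR_TASK_PERCENT // 100,
        # base pages touched by one block-sized read
        "pages_per_block": BLOCK_SIZE // PAGE_SIZE,
        # MiB moved per I/O, i.e. the expected ratio of bandwidth to IOPS
        "mib_per_io": BLOCK_SIZE / (1024 * 1024),
    }


if __name__ == "__main__":
    for key, value in derived_parameters().items():
        print(f"{key}: {value}")
```

Under these assumptions the job runs 48 tasks, each 2M read covers 512 pages, and bandwidth in MiB/s should be about twice the IOPS figure.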

test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio



If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml

=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/mount_option/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based/ucode:
2M/gcc-9/performance/2pmem/ext2/mmap/x86_64-rhel-8.3/dax/50%/debian-10.4-x86_64-20200603.cgz/200s/read/lkp-csl-2sp6/200G/fio-basic/tb/0x5003006

commit:
33cc89a9fc ("io_uring: add io_disarm_next() helper")
7a612350a9 ("io_uring: fix complete_post races for linked req")

33cc89a9fc248a48 7a612350a989866510dc5c874fd
---------------- ---------------------------
%stddev %change %stddev
\ | \
96.76 -88.0 8.72 ± 43% fio.latency_10ms%
3.22 ± 15% +88.0 91.27 ± 4% fio.latency_20ms%
10129 -6.5% 9473 fio.read_bw_MBps
9590101 +7.5% 10310997 fio.read_clat_90%_us
9743018 +7.4% 10463914 fio.read_clat_95%_us
10223616 +6.8% 10922666 fio.read_clat_99%_us
9475384 +6.9% 10131741 fio.read_clat_mean_us
5064 -6.5% 4736 fio.read_iops
5.187e+08 -6.5% 4.851e+08 fio.time.minor_page_faults
466.31 -7.2% 432.57 fio.time.user_time
1013022 -6.5% 947398 fio.workload
2491451 ± 20% -27.3% 1812105 ± 7% cpuidle.C1.time
2.40 -7.1% 2.23 iostat.cpu.user
2276 ± 8% +13.6% 2584 ± 6% slabinfo.task_group.active_objs
2276 ± 8% +13.6% 2584 ± 6% slabinfo.task_group.num_objs
4195 ± 15% +43.6% 6023 ± 31% softirqs.CPU36.RCU
17509 ± 37% -65.8% 5991 ± 39% softirqs.CPU36.SCHED
13188 ± 50% +88.4% 24850 ± 9% softirqs.CPU84.SCHED
501094 -6.5% 468624 proc-vmstat.nr_page_table_pages
2291 ± 66% +134.5% 5374 ± 31% proc-vmstat.nr_written
3230950 -3.9% 3106141 proc-vmstat.numa_hit
3144164 -4.0% 3019355 proc-vmstat.numa_local
494753 ± 2% -5.5% 467629 ± 4% proc-vmstat.numa_pte_updates
3402336 -3.6% 3280876 proc-vmstat.pgalloc_normal
5.2e+08 -6.5% 4.864e+08 proc-vmstat.pgfault
3042976 -4.2% 2915839 proc-vmstat.pgfree
0.42 ± 9% -0.1 0.36 ± 11% perf-profile.children.cycles-pp.native_irq_return_iret
0.48 ± 10% -0.1 0.41 ± 11% perf-profile.children.cycles-pp._raw_read_lock
0.20 ± 11% -0.0 0.16 ± 13% perf-profile.children.cycles-pp.grab_mapping_entry
0.10 ± 13% -0.0 0.08 ± 14% perf-profile.children.cycles-pp.get_unlocked_entry
0.12 ± 10% -0.0 0.09 ± 13% perf-profile.children.cycles-pp.dax_iomap_pfn
0.09 ± 12% -0.0 0.07 ± 17% perf-profile.children.cycles-pp.xas_find_conflict
0.12 ± 11% -0.0 0.10 ± 12% perf-profile.children.cycles-pp.xas_store
0.12 ± 10% -0.0 0.10 ± 10% perf-profile.children.cycles-pp.dax_unlock_entry
0.47 ± 10% -0.1 0.40 ± 12% perf-profile.self.cycles-pp._raw_read_lock
0.42 ± 9% -0.1 0.36 ± 11% perf-profile.self.cycles-pp.native_irq_return_iret
0.08 ± 10% -0.0 0.06 ± 16% perf-profile.self.cycles-pp.xas_find_conflict
37.17 ± 82% +526.0% 232.67 ± 87% interrupts.CPU33.TLB:TLB_shootdowns
4426 ± 41% +69.1% 7485 ± 9% interrupts.CPU36.NMI:Non-maskable_interrupts
4426 ± 41% +69.1% 7485 ± 9% interrupts.CPU36.PMI:Performance_monitoring_interrupts
85.17 ± 65% +118.6% 186.17 ± 11% interrupts.CPU36.RES:Rescheduling_interrupts
4893 ± 48% +56.5% 7656 ± 4% interrupts.CPU67.NMI:Non-maskable_interrupts
4893 ± 48% +56.5% 7656 ± 4% interrupts.CPU67.PMI:Performance_monitoring_interrupts
4248 ± 40% +59.8% 6788 ± 14% interrupts.CPU70.NMI:Non-maskable_interrupts
4248 ± 40% +59.8% 6788 ± 14% interrupts.CPU70.PMI:Performance_monitoring_interrupts
2783 ± 24% +138.2% 6629 ± 27% interrupts.CPU76.NMI:Non-maskable_interrupts
2783 ± 24% +138.2% 6629 ± 27% interrupts.CPU76.PMI:Performance_monitoring_interrupts
573.50 ± 4% +11.0% 636.83 ± 8% interrupts.CPU81.CAL:Function_call_interrupts
129.17 ± 43% -79.1% 27.00 ± 72% interrupts.CPU84.RES:Rescheduling_interrupts
38.00 ± 80% +151.3% 95.50 ± 47% interrupts.CPU84.TLB:TLB_shootdowns
39.33 ± 63% +127.5% 89.50 ± 31% interrupts.CPU87.TLB:TLB_shootdowns
41.33 ± 51% +93.1% 79.83 ± 23% interrupts.CPU88.TLB:TLB_shootdowns
45.33 ± 42% +113.6% 96.83 ± 30% interrupts.CPU93.TLB:TLB_shootdowns
15.18 -3.4% 14.67 perf-stat.i.MPKI
5.037e+09 -2.9% 4.893e+09 perf-stat.i.branch-instructions
0.33 -0.0 0.33 perf-stat.i.branch-miss-rate%
16306932 -4.4% 15591709 perf-stat.i.branch-misses
2.322e+08 -6.1% 2.18e+08 perf-stat.i.cache-misses
3.253e+08 -6.2% 3.052e+08 perf-stat.i.cache-references
6.25 +3.1% 6.44 perf-stat.i.cpi
582.91 +6.4% 620.25 perf-stat.i.cycles-between-cache-misses
0.97 -0.0 0.93 perf-stat.i.dTLB-load-miss-rate%
53274570 -6.9% 49597287 perf-stat.i.dTLB-load-misses
5.431e+09 -3.1% 5.26e+09 perf-stat.i.dTLB-loads
1.7e+09 -6.0% 1.598e+09 perf-stat.i.dTLB-stores
83.93 +2.6 86.55 perf-stat.i.iTLB-load-miss-rate%
13088877 ± 4% +15.3% 15097164 perf-stat.i.iTLB-load-misses
2466124 -6.4% 2307680 perf-stat.i.iTLB-loads
2.138e+10 -3.0% 2.073e+10 perf-stat.i.instructions
1647 ± 4% -16.1% 1381 perf-stat.i.instructions-per-iTLB-miss
0.16 -2.7% 0.16 perf-stat.i.ipc
131.32 -3.5% 126.68 perf-stat.i.metric.M/sec
2571493 -6.5% 2405116 perf-stat.i.minor-faults
43574923 ± 10% -19.9% 34910133 ± 12% perf-stat.i.node-load-misses
27094070 -7.1% 25159107 perf-stat.i.node-stores
2572882 -6.5% 2406509 perf-stat.i.page-faults
15.22 -3.3% 14.72 perf-stat.overall.MPKI
6.26 +3.1% 6.46 perf-stat.overall.cpi
576.54 +6.5% 614.09 perf-stat.overall.cycles-between-cache-misses
0.97 -0.0 0.93 perf-stat.overall.dTLB-load-miss-rate%
84.13 +2.6 86.74 perf-stat.overall.iTLB-load-miss-rate%
1635 ± 4% -16.0% 1373 perf-stat.overall.instructions-per-iTLB-miss
0.16 -3.0% 0.15 perf-stat.overall.ipc
4258659 +3.6% 4412636 perf-stat.overall.path-length
5.011e+09 -2.9% 4.868e+09 perf-stat.ps.branch-instructions
16226538 -4.4% 15516193 perf-stat.ps.branch-misses
2.31e+08 -6.1% 2.169e+08 perf-stat.ps.cache-misses
3.237e+08 -6.2% 3.036e+08 perf-stat.ps.cache-references
53008395 -6.9% 49349734 perf-stat.ps.dTLB-load-misses
5.404e+09 -3.1% 5.234e+09 perf-stat.ps.dTLB-loads
1.692e+09 -6.0% 1.59e+09 perf-stat.ps.dTLB-stores
13029271 ± 4% +15.3% 15021875 perf-stat.ps.iTLB-load-misses
2453727 -6.4% 2295986 perf-stat.ps.iTLB-loads
2.127e+10 -3.0% 2.063e+10 perf-stat.ps.instructions
2558645 -6.5% 2393171 perf-stat.ps.minor-faults
43355231 ± 10% -19.9% 34734698 ± 12% perf-stat.ps.node-load-misses
26960382 -7.1% 25037880 perf-stat.ps.node-stores
2560034 -6.5% 2394561 perf-stat.ps.page-faults
4.314e+12 -3.1% 4.18e+12 perf-stat.total.instructions



[Per-sample ASCII plots omitted: fio.read_bw_MBps, fio.read_iops, fio.read_clat_mean_us, fio.read_clat_90__us, fio.read_clat_95__us, fio.read_clat_99__us, fio.latency_10ms_, fio.latency_20ms_, fio.workload, fio.time.minor_page_faults]


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (26.02 kB)
config-5.12.0-rc2-00020-g7a612350a989 (175.56 kB)
job-script (8.47 kB)
job.yaml (5.92 kB)
reproduce (984.00 B)