2021-04-15 00:29:14

by kernel test robot

[permalink] [raw]
Subject: [btrfs] 5deb17e18e: fio.write_iops 1.2% improvement



Greeting,

FYI, we noticed a 1.2% improvement of fio.write_iops due to commit:


commit: 5deb17e18e27a3502f21581ba4d086e762b86b31 ("btrfs: track ordered bytes instead of just dio ordered bytes")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: fio-basic
on test machine: 16 threads Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz with 32G memory
with following parameters:

runtime: 300s
disk: 1HDD
fs: btrfs
nr_task: 1
test_size: 128G
rw: randwrite
bs: 4k
ioengine: sync
cpufreq_governor: performance
ucode: 0xde

test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio





Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml

=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/ucode:
4k/gcc-9/performance/1HDD/btrfs/sync/x86_64-rhel-8.3/1/debian-10.4-x86_64-20200603.cgz/300s/randwrite/lkp-cfl-e1/128G/fio-basic/0xde

commit:
ac1ea10e75 ("btrfs: add a trace point for reserve tickets")
5deb17e18e ("btrfs: track ordered bytes instead of just dio ordered bytes")

ac1ea10e757a57fb 5deb17e18e27a3502f21581ba4d
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.10 ? 6% -0.0 0.05 ? 6% fio.latency_1000us%
0.01 ? 11% +0.0 0.05 ? 5% fio.latency_100ms%
7.51 ? 3% -3.8 3.74 ? 4% fio.latency_100us%
5.26 -0.7 4.59 fio.latency_10ms%
14.97 ? 3% +1.3 16.22 fio.latency_10us%
0.02 ? 4% +0.0 0.03 fio.latency_2000ms%
0.80 -0.2 0.62 fio.latency_20ms%
4.41 -0.9 3.50 ? 4% fio.latency_20us%
1.00 ? 5% -0.2 0.84 fio.latency_250us%
8.89 ? 2% +4.2 13.07 fio.latency_2ms%
7.38 ? 3% +3.0 10.41 fio.latency_4ms%
0.06 ? 11% -0.0 0.01 ? 11% fio.latency_500ms%
0.17 ? 6% -0.0 0.12 ? 4% fio.latency_50ms%
4.73 ? 3% -2.2 2.57 ? 4% fio.latency_50us%
1693210 +1.3% 1715361 fio.time.file_system_outputs
211651 +1.3% 214420 fio.workload
2.75 +1.2% 2.78 fio.write_bw_MBps
2916352 -5.7% 2749235 fio.write_clat_90%_us
4642133 -10.1% 4171366 fio.write_clat_95%_us
11315882 ? 2% -26.3% 8336179 fio.write_clat_99%_us
1419243 -1.2% 1402015 fio.write_clat_mean_us
22694034 +4.5% 23720487 fio.write_clat_stddev
704.13 +1.2% 712.73 fio.write_iops
9.06 -6.9% 8.44 iostat.cpu.iowait
52914 ? 4% -23.5% 40465 ? 4% softirqs.TIMER
5529 +1.6% 5617 vmstat.io.bo
6263 +4.3% 6530 vmstat.system.cs
108039 +15.0% 124211 meminfo.Active
94360 +16.5% 109929 meminfo.Active(file)
2835 ? 3% +50.0% 4253 ? 3% meminfo.Dirty
2475 ? 2% +11.3% 2755 ? 2% slabinfo.btrfs_extent_buffer.active_objs
2478 ? 2% +11.4% 2761 ? 2% slabinfo.btrfs_extent_buffer.num_objs
2724 ? 4% +8.9% 2966 ? 6% slabinfo.fsnotify_mark_connector.num_objs
2824 ? 74% -67.1% 928.20 ? 63% interrupts.133:IR-PCI-MSI.2097154-edge.eth1-TxRx-1
236.67 ? 6% -22.3% 184.00 ? 26% interrupts.CPU13.NMI:Non-maskable_interrupts
236.67 ? 6% -22.3% 184.00 ? 26% interrupts.CPU13.PMI:Performance_monitoring_interrupts
2824 ? 74% -67.1% 928.20 ? 63% interrupts.CPU2.133:IR-PCI-MSI.2097154-edge.eth1-TxRx-1
23440 +17.5% 27544 proc-vmstat.nr_active_file
401066 -1.8% 393830 proc-vmstat.nr_dirtied
694.67 ? 3% +51.0% 1048 ? 3% proc-vmstat.nr_dirty
173383 +1.3% 175566 proc-vmstat.nr_inactive_file
301375 -2.7% 293337 proc-vmstat.nr_written
23440 +17.5% 27544 proc-vmstat.nr_zone_active_file
173383 +1.3% 175566 proc-vmstat.nr_zone_inactive_file
1105434 +1.2% 1118728 proc-vmstat.numa_hit
1105434 +1.2% 1118728 proc-vmstat.numa_local
1680626 +1.8% 1710624 proc-vmstat.pgpgout
79275921 +3.7% 82180925 perf-stat.i.branch-instructions
54448091 +4.4% 56830814 perf-stat.i.cache-references
6302 +4.1% 6560 perf-stat.i.context-switches
46476020 +1.5% 47154582 perf-stat.i.dTLB-stores
3.912e+08 +3.9% 4.063e+08 perf-stat.i.instructions
0.53 +3.0% 0.54 perf-stat.i.metric.K/sec
17.37 +2.8% 17.85 perf-stat.i.metric.M/sec
557137 +2.8% 572549 perf-stat.overall.path-length
79095771 +3.7% 81994540 perf-stat.ps.branch-instructions
54269739 +4.4% 56645144 perf-stat.ps.cache-references
6280 +4.1% 6539 perf-stat.ps.context-switches
46349077 +1.5% 47028173 perf-stat.ps.dTLB-stores
3.903e+08 +3.9% 4.053e+08 perf-stat.ps.instructions
1.179e+11 +4.1% 1.228e+11 perf-stat.total.instructions
99.55 ? 5% +27.1% 126.57 ? 14% sched_debug.cfs_rq:/.load_avg.avg
272.17 ? 2% +12.1% 305.09 ? 5% sched_debug.cfs_rq:/.load_avg.stddev
2.13 ?223% +940.0% 22.16 ? 52% sched_debug.cfs_rq:/.removed.load_avg.avg
34.10 ?223% +702.2% 273.53 ? 49% sched_debug.cfs_rq:/.removed.load_avg.max
8.25 ?223% +794.7% 73.85 ? 50% sched_debug.cfs_rq:/.removed.load_avg.stddev
0.67 ?223% +1186.0% 8.57 ? 52% sched_debug.cfs_rq:/.removed.runnable_avg.avg
10.67 ?223% +963.4% 113.43 ? 54% sched_debug.cfs_rq:/.removed.runnable_avg.max
2.58 ?223% +1054.3% 29.80 ? 52% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
0.48 ?223% +1655.9% 8.49 ? 52% sched_debug.cfs_rq:/.removed.util_avg.avg
7.73 ?223% +1349.1% 112.07 ? 55% sched_debug.cfs_rq:/.removed.util_avg.max
1.87 ?223% +1474.3% 29.47 ? 53% sched_debug.cfs_rq:/.removed.util_avg.stddev
29.13 ? 17% -28.8% 20.75 ? 28% sched_debug.cfs_rq:/.util_est_enqueued.avg
117919 ? 8% +21.3% 142999 ? 8% sched_debug.cpu.nr_switches.max
19720 ? 11% +34.5% 26518 ? 8% sched_debug.cpu.nr_switches.stddev
32.58 ? 44% -51.4% 15.83 ? 14% sched_debug.cpu.nr_uninterruptible.max
-37.74 -38.3% -23.30 sched_debug.cpu.nr_uninterruptible.min
14.75 ? 24% -32.9% 9.89 ? 13% sched_debug.cpu.nr_uninterruptible.stddev
0.05 ? 3% -17.7% 0.04 ? 7% perf-sched.sch_delay.avg.ms.btrfs_start_ordered_extent.btrfs_run_ordered_extent_work.btrfs_work_helper.process_one_work
0.01 ? 9% -100.0% 0.00 perf-sched.sch_delay.avg.ms.schedule_timeout.flush_space.btrfs_async_reclaim_metadata_space.process_one_work
0.08 ? 9% +52.3% 0.12 ? 29% perf-sched.sch_delay.max.ms.rwsem_down_read_slowpath.__btrfs_tree_read_lock.btrfs_read_lock_root_node.btrfs_search_slot
0.08 ? 13% -100.0% 0.00 perf-sched.sch_delay.max.ms.schedule_timeout.flush_space.btrfs_async_reclaim_metadata_space.process_one_work
1.28 ? 3% -10.7% 1.14 ? 2% perf-sched.wait_and_delay.avg.ms.btrfs_start_ordered_extent.btrfs_run_ordered_extent_work.btrfs_work_helper.process_one_work
0.28 ? 24% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.schedule_timeout.flush_space.btrfs_async_reclaim_metadata_space.process_one_work
3526 ? 24% +58.6% 5592 ? 9% perf-sched.wait_and_delay.count.btrfs_start_ordered_extent.btrfs_run_ordered_extent_work.btrfs_work_helper.process_one_work
650.33 ? 23% -100.0% 0.00 perf-sched.wait_and_delay.count.schedule_timeout.flush_space.btrfs_async_reclaim_metadata_space.process_one_work
1862 ? 24% +59.3% 2966 ? 11% perf-sched.wait_and_delay.count.schedule_timeout.wait_for_completion.btrfs_wait_ordered_extents.btrfs_wait_ordered_roots
3.43 ? 15% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.schedule_timeout.flush_space.btrfs_async_reclaim_metadata_space.process_one_work
13.69 ? 5% +28.7% 17.62 ? 9% perf-sched.wait_and_delay.max.ms.schedule_timeout.wait_for_completion.btrfs_wait_ordered_extents.btrfs_wait_ordered_roots
1739 ? 15% +70.3% 2961 ? 24% perf-sched.wait_and_delay.max.ms.wait_current_trans.start_transaction.btrfs_finish_ordered_io.btrfs_work_helper
1.23 ? 3% -10.5% 1.10 ? 3% perf-sched.wait_time.avg.ms.btrfs_start_ordered_extent.btrfs_run_ordered_extent_work.btrfs_work_helper.process_one_work
0.27 ? 25% -100.0% 0.00 perf-sched.wait_time.avg.ms.schedule_timeout.flush_space.btrfs_async_reclaim_metadata_space.process_one_work
0.01 ? 24% -37.4% 0.00 ? 24% perf-sched.wait_time.avg.ms.schedule_timeout.io_schedule_timeout.wait_for_completion_io.write_all_supers
3.40 ? 15% -100.0% 0.00 perf-sched.wait_time.max.ms.schedule_timeout.flush_space.btrfs_async_reclaim_metadata_space.process_one_work
0.03 ? 56% -70.6% 0.01 ? 63% perf-sched.wait_time.max.ms.schedule_timeout.io_schedule_timeout.wait_for_completion_io.write_all_supers
13.66 ? 5% +28.8% 17.59 ? 9% perf-sched.wait_time.max.ms.schedule_timeout.wait_for_completion.btrfs_wait_ordered_extents.btrfs_wait_ordered_roots
1738 ? 15% +70.3% 2961 ? 24% perf-sched.wait_time.max.ms.wait_current_trans.start_transaction.btrfs_finish_ordered_io.btrfs_work_helper
5.53 ? 4% +0.5 6.04 ? 4% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
5.65 ? 4% +0.5 6.19 ? 4% perf-profile.calltrace.cycles-pp.ret_from_fork
5.65 ? 4% +0.5 6.19 ? 4% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
1.15 ? 7% -0.3 0.88 ? 21% perf-profile.children.cycles-pp.sched_clock_cpu
0.14 ? 34% -0.1 0.05 ? 99% perf-profile.children.cycles-pp.__switch_to_asm
0.06 ? 51% +0.1 0.12 ? 52% perf-profile.children.cycles-pp.filename_lookup
0.01 ?223% +0.1 0.12 ? 29% perf-profile.children.cycles-pp.___perf_sw_event
0.21 ? 30% +0.2 0.38 ? 22% perf-profile.children.cycles-pp.dequeue_task_fair
0.16 ? 26% +0.2 0.33 ? 20% perf-profile.children.cycles-pp.dequeue_entity
0.67 ? 12% +0.2 0.91 ? 15% perf-profile.children.cycles-pp.schedule
1.88 ? 5% +0.3 2.19 ? 11% perf-profile.children.cycles-pp.tick_irq_enter
1.10 ? 9% +0.3 1.41 ? 7% perf-profile.children.cycles-pp.do_writepages
5.53 ? 4% +0.5 6.04 ? 4% perf-profile.children.cycles-pp.worker_thread
5.67 ? 4% +0.5 6.20 ? 4% perf-profile.children.cycles-pp.ret_from_fork
5.65 ? 4% +0.5 6.19 ? 4% perf-profile.children.cycles-pp.kthread
3.12 ? 13% -0.6 2.56 ? 11% perf-profile.self.cycles-pp.cpuidle_enter_state
0.14 ? 34% -0.1 0.05 ? 99% perf-profile.self.cycles-pp.__switch_to_asm
0.22 ? 24% -0.1 0.15 ? 33% perf-profile.self.cycles-pp.__softirqentry_text_start
0.01 ?223% +0.1 0.09 ? 46% perf-profile.self.cycles-pp.___perf_sw_event
0.16 ? 45% +0.1 0.25 ? 18% perf-profile.self.cycles-pp.update_rq_clock
0.15 ? 24% +0.1 0.27 ? 30% perf-profile.self.cycles-pp.update_blocked_averages



fio.write_iops

800 +---------------------------------------------------------------------+
| O O O O O O O O O O O O |
700 |.+.++.+.+.++.+.+.++.+.+.++.+.+.++.+.+.++.+.+.++.+.+.++.+.+.++.+.+ OO |
600 |-+ |
| |
500 |-+ |
| |
400 |-+ |
| |
300 |-+ |
200 |-+ |
| |
100 |-+ |
| |
0 +---------------------------------------------------------------------+


fio.latency_100us_

9 +-----------------------------------------------------------------------+
| + .+. .+. |
8 |.+. +.+.+.+.+ + .+. .+.+ ++ +.+.++.+.+.+.++.+.+.+. +.+.+.+.+ |
7 |-+ + + +.++ + |
| |
6 |-+ |
5 |-+ |
| |
4 |-+ O O O O O O O O O O O O |
3 |-O OO O O O O O OO O O O O O |
| OO O O O OO O O O O |
2 |-+ |
1 |-+ |
| |
0 +-----------------------------------------------------------------------+


fio.latency_2ms_

14 +----------------------------------------------------------------------+
| O O O O O O O OO O O O O O OO O OO O |
12 |-O OO O O O OO O O O O O OO O |
| |
10 |-+ |
|.+.++.+.+. +.+.+. .++.+.+. .+. .++. .++. .+ .+.+.+. +.+.+.+ |
8 |-+ + + ++ +.+ +.+ +.+ + + |
| |
6 |-+ |
| |
4 |-+ |
| |
2 |-+ |
| |
0 +----------------------------------------------------------------------+


fio.latency_4ms_

14 +----------------------------------------------------------------------+
| O O O OO O O OO O O |
12 |-O O O O OO O O O OO O O O O O O O O |
| O O |
10 |-+ O O O OO O |
| |
8 |-+.+ .+. .++. .+. |
|.+ + +.++.+.+.+ +.+.++.+.+.+.++.+.+.++.+.+.++.+.+ ++.+.+.+ |
6 |-+ |
| |
4 |-+ |
| |
2 |-+ |
| |
0 +----------------------------------------------------------------------+


fio.latency_500ms_

0.07 +--------------------------------------------------------------------+
| +. :+ + |
0.06 |-+ +.+.+ + + .+.+. +. .+ : + + |
|.+ +.+. : + +. .++. .+.+ .+.+.++ + + + +. : +. |
0.05 |-+:+ + + + + + + + |
| + |
0.04 |-+ |
| |
0.03 |-+ |
| |
0.02 |-+ O |
| O O O OO |
0.01 |-OO O O O O O O O O O O O O O |
| |
0 +--------------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (21.02 kB)
config-5.11.0-rc7-00063-g5deb17e18e27 (175.07 kB)
job-script (8.03 kB)
job.yaml (5.53 kB)
reproduce (745.00 B)
Download all attachments