Greeting,
FYI, we noticed a -12.4% regression of fio.write_bw_MBps due to commit:
commit: 302167c50b32e7fccc98994a91d40ddbbab04e52 ("btrfs: don't end the transaction for delayed refs in throttle")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git pending-fixes
in testcase: fio-basic
on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64G memory
with following parameters:
runtime: 300s
nr_task: 8t
disk: 1SSD
fs: btrfs
rw: randwrite
bs: 4k
ioengine: sync
test_size: 400g
cpufreq_governor: performance
ucode: 0xb00002e
test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio
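For reference, those parameters map roughly onto a plain fio invocation like
the one below (a hand-written approximation, not the exact job file lkp
generates; the /fio mount point and the even size split are assumptions):

  # assumes a btrfs-formatted SSD mounted at /fio (hypothetical path) and
  # the 400g test_size split evenly across the 8 jobs
  fio --name=randwrite --directory=/fio --rw=randwrite --bs=4k \
      --ioengine=sync --numjobs=8 --size=50g --runtime=300 --time_based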
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/ucode:
4k/gcc-7/performance/1SSD/btrfs/sync/x86_64-rhel-7.2/8t/debian-x86_64-2018-04-03.cgz/300s/randwrite/lkp-bdw-ep3b/400g/fio-basic/0xb00002e
commit:
  a627947076 ("Btrfs: fix deadlock when allocating tree block during leaf/node split")
  302167c50b ("btrfs: don't end the transaction for delayed refs in throttle")

a6279470762c19ba 302167c50b32e7fccc98994a91
---------------- --------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
          :4           25%           1:4     dmesg.WARNING:at#for_ip_interrupt_entry/0x
         %stddev     %change         %stddev
             \          |                \
0.02 ± 4% -0.0 0.01 fio.latency_100ms%
41.36 ± 2% -14.7 26.66 ± 12% fio.latency_100us%
0.85 ± 6% +0.3 1.14 ± 14% fio.latency_10us%
0.01 +0.0 0.02 ± 3% fio.latency_2000ms%
0.02 ± 18% -0.0 0.01 ± 5% fio.latency_20ms%
0.50 ± 11% +0.1 0.56 ± 11% fio.latency_20us%
0.03 ± 11% -0.0 0.01 ± 10% fio.latency_250ms%
8.90 ± 5% -2.1 6.80 ± 3% fio.latency_250us%
0.03 ± 7% -0.0 0.02 ± 7% fio.latency_500ms%
0.03 ± 15% -0.0 0.01 fio.latency_50ms%
41.49 ± 3% +16.2 57.73 ± 5% fio.latency_50us%
44895412 ± 2% -12.5% 39295860 fio.time.file_system_outputs
36.25 ± 3% -16.6% 30.25 fio.time.percent_of_cpu_this_job_got
98.06 ± 3% -18.2% 80.23 fio.time.system_time
5558064 ± 2% -12.7% 4851975 fio.time.voluntary_context_switches
5610728 ± 2% -12.5% 4909544 fio.workload
72.97 ± 2% -12.4% 63.91 fio.write_bw_MBps
427.18 ± 2% +14.2% 487.93 fio.write_clat_mean_us
13691 ± 2% +43.7% 19669 fio.write_clat_stddev
18680 ± 2% -12.4% 16360 fio.write_iops
0.97 -0.7 0.30 ± 2% mpstat.cpu.iowait%
3.94 ± 3% -1.5 2.40 mpstat.cpu.sys%
2875717 -13.4% 2489058 softirqs.BLOCK
5107622 ± 3% +27.5% 6510241 ± 4% softirqs.RCU
30695 ± 15% -30.2% 21424 ± 11% numa-meminfo.node0.Writeback
179069 ± 19% +134.0% 419038 ± 20% numa-meminfo.node1.Active
36182 ±105% +701.8% 290125 ± 30% numa-meminfo.node1.Active(file)
1.096e+09 ± 3% -22.2% 8.531e+08 ± 7% cpuidle.C1.time
57940399 -34.0% 38218420 ± 4% cpuidle.C1.usage
13565831 ± 7% -67.4% 4420507 ± 16% cpuidle.POLL.time
4064467 ± 5% -72.0% 1136676 ± 12% cpuidle.POLL.usage
124.33 ± 2% -59.2% 50.74 ± 3% iostat.sda.avgqu-sz
18410 -13.2% 15975 iostat.sda.w/s
300245 -21.0% 237217 iostat.sda.wkB/s
9.15 ± 10% -42.0% 5.31 ± 19% iostat.sda.wrqm/s
300252 -21.0% 237234 vmstat.io.bo
1.00 -100.0% 0.00 vmstat.procs.b
3.00 -33.3% 2.00 vmstat.procs.r
392814 -36.9% 247683 vmstat.system.cs
12975351 -10.0% 11683920 meminfo.Inactive
12742134 -10.1% 11450539 meminfo.Inactive(file)
1336423 -10.4% 1197060 meminfo.SUnreclaim
36875 ± 15% -35.8% 23682 ± 8% meminfo.Writeback
97963 ± 4% -9.3% 88890 ± 2% meminfo.max_used_kB
9315760 ± 11% -24.4% 7044222 ± 9% numa-vmstat.node0.nr_dirtied
7593 ± 15% -30.2% 5301 ± 8% numa-vmstat.node0.nr_writeback
9253810 ± 11% -24.4% 6992866 ± 9% numa-vmstat.node0.nr_written
9053 ±105% +699.4% 72375 ± 30% numa-vmstat.node1.nr_active_file
9053 ±105% +699.4% 72375 ± 30% numa-vmstat.node1.nr_zone_active_file
197.50 ± 2% -20.8% 156.50 ± 4% turbostat.Avg_MHz
7.59 ± 4% -1.1 6.45 ± 7% turbostat.Busy%
57935368 -34.0% 38214519 ± 4% turbostat.C1
3.97 ± 3% -0.9 3.10 ± 7% turbostat.C1%
117.34 ± 5% -10.4% 105.14 ± 3% turbostat.PkgWatt
6.93 -5.8% 6.53 ± 3% turbostat.RAMWatt
23703837 -21.2% 18668822 proc-vmstat.nr_dirtied
11565487 +2.6% 11866577 proc-vmstat.nr_free_pages
3186566 -10.0% 2867899 proc-vmstat.nr_inactive_file
14987 -2.0% 14683 proc-vmstat.nr_kernel_stack
203124 -2.2% 198730 proc-vmstat.nr_slab_reclaimable
334281 -10.4% 299452 proc-vmstat.nr_slab_unreclaimable
23643508 -21.2% 18622029 proc-vmstat.nr_written
3186566 -10.0% 2867899 proc-vmstat.nr_zone_inactive_file
9200220 ± 4% -16.8% 7655217 ± 2% proc-vmstat.numa_hit
9182883 ± 4% -16.8% 7637938 ± 2% proc-vmstat.numa_local
15866899 ± 3% -34.3% 10421136 ± 2% proc-vmstat.pgalloc_normal
15347481 -37.3% 9620050 ± 3% proc-vmstat.pgfree
94578712 -21.2% 74490196 proc-vmstat.pgpgout
1.653e+09 -28.2% 1.188e+09 ± 2% perf-stat.i.branch-instructions
16239810 ± 6% -20.2% 12960638 ± 7% perf-stat.i.cache-misses
1.771e+08 ± 4% -21.6% 1.389e+08 ± 6% perf-stat.i.cache-references
397106 -37.0% 250140 perf-stat.i.context-switches
1.75e+10 ± 5% -21.7% 1.37e+10 ± 6% perf-stat.i.cpu-cycles
8.56 ± 17% -55.8% 3.79 ± 15% perf-stat.i.cpu-migrations
2.408e+09 -24.3% 1.823e+09 ± 2% perf-stat.i.dTLB-loads
1.351e+09 ± 6% -18.8% 1.097e+09 ± 2% perf-stat.i.dTLB-stores
6077563 ± 3% -14.6% 5188983 ± 6% perf-stat.i.iTLB-loads
8.756e+09 -25.6% 6.518e+09 perf-stat.i.instructions
48.01 ± 18% +12.6 60.57 ± 7% perf-stat.i.node-load-miss-rate%
2697176 ± 11% -36.8% 1705410 ± 12% perf-stat.i.node-loads
50.90 ± 16% +12.8 63.72 ± 5% perf-stat.overall.node-load-miss-rate%
486504 ± 2% -15.1% 412869 ± 2% perf-stat.overall.path-length
1.648e+09 -28.2% 1.184e+09 ± 2% perf-stat.ps.branch-instructions
16185048 ± 6% -20.2% 12917198 ± 7% perf-stat.ps.cache-misses
1.765e+08 ± 4% -21.6% 1.384e+08 ± 6% perf-stat.ps.cache-references
395744 -37.0% 249290 perf-stat.ps.context-switches
1.744e+10 ± 5% -21.7% 1.365e+10 ± 6% perf-stat.ps.cpu-cycles
8.54 ± 17% -55.7% 3.78 ± 15% perf-stat.ps.cpu-migrations
2.4e+09 -24.3% 1.817e+09 ± 2% perf-stat.ps.dTLB-loads
1.347e+09 ± 6% -18.8% 1.094e+09 ± 2% perf-stat.ps.dTLB-stores
6056751 ± 3% -14.6% 5171616 ± 6% perf-stat.ps.iTLB-loads
8.727e+09 -25.6% 6.497e+09 perf-stat.ps.instructions
2688159 ± 11% -36.8% 1699709 ± 12% perf-stat.ps.node-loads
2.729e+12 -25.7% 2.026e+12 perf-stat.total.instructions
7679 ± 2% -37.9% 4771 ± 7% sched_debug.cfs_rq:/.exec_clock.avg
25109 ± 10% -20.3% 20001 ± 12% sched_debug.cfs_rq:/.exec_clock.max
6099 ± 20% -24.1% 4629 ± 7% sched_debug.cfs_rq:/.exec_clock.stddev
96721 ± 8% -43.2% 54939 ± 37% sched_debug.cfs_rq:/.load.avg
243210 ± 4% -27.0% 177643 ± 21% sched_debug.cfs_rq:/.load.stddev
105.27 ± 15% -43.2% 59.81 ± 22% sched_debug.cfs_rq:/.load_avg.avg
197.18 ± 11% -21.2% 155.31 ± 8% sched_debug.cfs_rq:/.load_avg.stddev
0.13 ± 6% -31.5% 0.09 ± 25% sched_debug.cfs_rq:/.nr_running.avg
49.64 ± 12% -49.3% 25.18 ± 28% sched_debug.cfs_rq:/.runnable_load_avg.avg
689.54 ± 4% -9.3% 625.71 ± 5% sched_debug.cfs_rq:/.runnable_load_avg.max
142.56 ± 7% -26.4% 104.98 ± 12% sched_debug.cfs_rq:/.runnable_load_avg.stddev
97240 ± 8% -46.4% 52094 ± 33% sched_debug.cfs_rq:/.runnable_weight.avg
243593 ± 4% -28.5% 174272 ± 19% sched_debug.cfs_rq:/.runnable_weight.stddev
147.89 ± 8% -27.2% 107.65 ± 13% sched_debug.cfs_rq:/.util_avg.avg
192.27 ± 8% -20.2% 153.44 ± 8% sched_debug.cfs_rq:/.util_avg.stddev
43.61 ± 16% -60.4% 17.27 ± 44% sched_debug.cfs_rq:/.util_est_enqueued.avg
493.75 ± 16% -44.6% 273.67 ± 33% sched_debug.cfs_rq:/.util_est_enqueued.max
120.70 ± 13% -52.0% 57.95 ± 35% sched_debug.cfs_rq:/.util_est_enqueued.stddev
26.69 ± 32% -43.1% 15.20 ± 13% sched_debug.cpu.cpu_load[0].avg
107.63 ± 13% -22.0% 84.01 ± 8% sched_debug.cpu.cpu_load[0].stddev
28.23 ± 30% -46.4% 15.13 ± 10% sched_debug.cpu.cpu_load[1].avg
96.80 ± 14% -19.5% 77.96 ± 4% sched_debug.cpu.cpu_load[1].stddev
28.35 ± 27% -50.8% 13.93 ± 13% sched_debug.cpu.cpu_load[2].avg
26.83 ± 28% -54.8% 12.13 ± 16% sched_debug.cpu.cpu_load[3].avg
76.35 ± 21% -27.2% 55.61 ± 9% sched_debug.cpu.cpu_load[3].stddev
24.61 ± 29% -58.0% 10.35 ± 18% sched_debug.cpu.cpu_load[4].avg
67.78 ± 23% -29.6% 47.73 ± 16% sched_debug.cpu.cpu_load[4].stddev
217.01 ± 9% -29.1% 153.85 ± 11% sched_debug.cpu.curr->pid.avg
65004 ± 18% -52.3% 31025 ± 31% sched_debug.cpu.load.avg
200774 ± 8% -31.1% 138243 ± 19% sched_debug.cpu.load.stddev
0.09 ± 12% -33.7% 0.06 ± 17% sched_debug.cpu.nr_running.avg
0.27 ± 5% -16.5% 0.23 ± 9% sched_debug.cpu.nr_running.stddev
735069 -32.2% 498554 sched_debug.cpu.nr_switches.avg
2860144 ± 11% -27.4% 2076064 ± 13% sched_debug.cpu.nr_switches.max
665483 ± 24% -31.6% 455234 ± 9% sched_debug.cpu.nr_switches.stddev
0.13 ± 7% -30.1% 0.09 ± 12% sched_debug.cpu.nr_uninterruptible.avg
735117 -32.2% 498430 sched_debug.cpu.sched_count.avg
2858539 ± 11% -27.4% 2076509 ± 13% sched_debug.cpu.sched_count.max
665356 ± 24% -31.6% 454947 ± 9% sched_debug.cpu.sched_count.stddev
366543 -32.2% 248579 sched_debug.cpu.sched_goidle.avg
1428344 ± 11% -27.4% 1036752 ± 13% sched_debug.cpu.sched_goidle.max
332365 ± 24% -31.6% 227301 ± 9% sched_debug.cpu.sched_goidle.stddev
368002 -32.2% 249386 sched_debug.cpu.ttwu_count.avg
3059342 -9.8% 2760232 slabinfo.Acpi-State.active_objs
60835 -10.0% 54758 slabinfo.Acpi-State.active_slabs
3102644 -10.0% 2792672 slabinfo.Acpi-State.num_objs
60835 -10.0% 54758 slabinfo.Acpi-State.num_slabs
40884 ± 7% -42.6% 23477 ± 21% slabinfo.avc_xperms_data.active_objs
323.25 ± 7% -41.8% 188.00 ± 21% slabinfo.avc_xperms_data.active_slabs
41459 ± 7% -41.8% 24144 ± 21% slabinfo.avc_xperms_data.num_objs
323.25 ± 7% -41.8% 188.00 ± 21% slabinfo.avc_xperms_data.num_slabs
1524 ± 18% -25.4% 1136 ± 11% slabinfo.biovec-128.active_objs
1536 ± 18% -24.8% 1155 ± 11% slabinfo.biovec-128.num_objs
1681 ± 7% -20.8% 1331 ± 13% slabinfo.biovec-64.active_objs
1681 ± 7% -20.8% 1331 ± 13% slabinfo.biovec-64.num_objs
2654 ± 10% -56.1% 1166 ± 13% slabinfo.biovec-max.active_objs
671.00 ± 10% -55.3% 300.00 ± 12% slabinfo.biovec-max.active_slabs
2685 ± 10% -55.3% 1201 ± 12% slabinfo.biovec-max.num_objs
671.00 ± 10% -55.3% 300.00 ± 12% slabinfo.biovec-max.num_slabs
21641 ± 9% -12.3% 18989 ± 7% slabinfo.btrfs_delayed_ref_head.active_objs
22866 ± 8% -10.1% 20556 ± 7% slabinfo.btrfs_delayed_ref_head.num_objs
67913 ± 4% -12.5% 59451 ± 3% slabinfo.btrfs_extent_buffer.active_objs
1237 ± 4% -14.7% 1055 ± 3% slabinfo.btrfs_extent_buffer.active_slabs
71775 ± 4% -14.7% 61246 ± 3% slabinfo.btrfs_extent_buffer.num_objs
1237 ± 4% -14.7% 1055 ± 3% slabinfo.btrfs_extent_buffer.num_slabs
6184518 -10.1% 5562477 slabinfo.btrfs_extent_map.active_objs
110462 -10.1% 99345 slabinfo.btrfs_extent_map.active_slabs
6185888 -10.1% 5563352 slabinfo.btrfs_extent_map.num_objs
110462 -10.1% 99345 slabinfo.btrfs_extent_map.num_slabs
26097 ± 3% -27.1% 19016 ± 9% slabinfo.btrfs_ordered_extent.active_objs
673.75 ± 4% -26.8% 493.50 ± 9% slabinfo.btrfs_ordered_extent.active_slabs
26301 ± 4% -26.8% 19264 ± 9% slabinfo.btrfs_ordered_extent.num_objs
673.75 ± 4% -26.8% 493.50 ± 9% slabinfo.btrfs_ordered_extent.num_slabs
13863 ± 5% -39.9% 8328 ± 17% slabinfo.btrfs_path.active_objs
387.25 ± 5% -39.4% 234.50 ± 16% slabinfo.btrfs_path.active_slabs
13954 ± 5% -39.3% 8467 ± 16% slabinfo.btrfs_path.num_objs
387.25 ± 5% -39.4% 234.50 ± 16% slabinfo.btrfs_path.num_slabs
13884 ± 9% -25.6% 10330 ± 15% slabinfo.kmalloc-128.active_objs
439.75 ± 8% -24.7% 331.25 ± 15% slabinfo.kmalloc-128.active_slabs
14089 ± 8% -24.6% 10617 ± 15% slabinfo.kmalloc-128.num_objs
439.75 ± 8% -24.7% 331.25 ± 15% slabinfo.kmalloc-128.num_slabs
1554 ± 3% -10.8% 1386 ± 5% slabinfo.kmalloc-rcl-96.active_objs
1554 ± 3% -10.8% 1386 ± 5% slabinfo.kmalloc-rcl-96.num_objs
10158 ± 8% -28.3% 7284 ± 15% slabinfo.mnt_cache.active_objs
10369 ± 8% -26.9% 7581 ± 14% slabinfo.mnt_cache.num_objs
1660 ± 7% -15.2% 1408 ± 11% slabinfo.skbuff_fclone_cache.active_objs
1660 ± 7% -15.2% 1408 ± 11% slabinfo.skbuff_fclone_cache.num_objs
17.20 ± 15% -10.1 7.14 ± 5% perf-profile.calltrace.cycles-pp.btrfs_mark_extent_written.btrfs_finish_ordered_io.normal_work_helper.process_one_work.worker_thread
19.67 ± 13% -9.6 10.08 ± 7% perf-profile.calltrace.cycles-pp.btrfs_finish_ordered_io.normal_work_helper.process_one_work.worker_thread.kthread
14.18 ± 16% -9.5 4.73 ± 6% perf-profile.calltrace.cycles-pp.btrfs_search_slot.btrfs_mark_extent_written.btrfs_finish_ordered_io.normal_work_helper.process_one_work
20.52 ± 13% -9.1 11.40 ± 5% perf-profile.calltrace.cycles-pp.normal_work_helper.process_one_work.worker_thread.kthread.ret_from_fork
27.59 ± 9% -8.7 18.88 ± 4% perf-profile.calltrace.cycles-pp.ret_from_fork
27.59 ± 9% -8.7 18.88 ± 4% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
24.79 ± 10% -6.3 18.45 ± 4% perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork
25.03 ± 9% -6.2 18.79 ± 4% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
5.57 ± 21% -4.2 1.36 ± 7% perf-profile.calltrace.cycles-pp.btrfs_lock_root_node.btrfs_search_slot.btrfs_mark_extent_written.btrfs_finish_ordered_io.normal_work_helper
5.55 ± 21% -4.2 1.35 ± 7% perf-profile.calltrace.cycles-pp.btrfs_tree_lock.btrfs_lock_root_node.btrfs_search_slot.btrfs_mark_extent_written.btrfs_finish_ordered_io
4.87 ± 20% -3.6 1.31 ± 10% perf-profile.calltrace.cycles-pp.btrfs_read_lock_root_node.btrfs_search_slot.btrfs_mark_extent_written.btrfs_finish_ordered_io.normal_work_helper
4.84 ± 20% -3.6 1.28 ± 10% perf-profile.calltrace.cycles-pp.btrfs_tree_read_lock.btrfs_read_lock_root_node.btrfs_search_slot.btrfs_mark_extent_written.btrfs_finish_ordered_io
3.84 ± 24% -3.1 0.75 ± 7% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.prepare_to_wait_event.btrfs_tree_lock.btrfs_lock_root_node.btrfs_search_slot
3.76 ± 24% -3.0 0.72 ± 7% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.prepare_to_wait_event.btrfs_tree_lock.btrfs_lock_root_node
3.60 ± 22% -2.8 0.81 ± 9% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.prepare_to_wait_event.btrfs_tree_read_lock.btrfs_read_lock_root_node.btrfs_search_slot
3.54 ± 22% -2.7 0.79 ± 10% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.prepare_to_wait_event.btrfs_tree_read_lock.btrfs_read_lock_root_node
3.47 ± 19% -2.7 0.80 ± 6% perf-profile.calltrace.cycles-pp.prepare_to_wait_event.btrfs_tree_lock.btrfs_lock_root_node.btrfs_search_slot.btrfs_mark_extent_written
3.25 ± 17% -2.4 0.85 ± 10% perf-profile.calltrace.cycles-pp.prepare_to_wait_event.btrfs_tree_read_lock.btrfs_read_lock_root_node.btrfs_search_slot.btrfs_mark_extent_written
1.85 ± 8% -1.2 0.65 ± 3% perf-profile.calltrace.cycles-pp.unlock_up.btrfs_search_slot.btrfs_mark_extent_written.btrfs_finish_ordered_io.normal_work_helper
1.83 ± 9% -1.2 0.63 ± 4% perf-profile.calltrace.cycles-pp.__wake_up_common_lock.unlock_up.btrfs_search_slot.btrfs_mark_extent_written.btrfs_finish_ordered_io
1.45 ± 17% -1.2 0.26 ±100% perf-profile.calltrace.cycles-pp.btrfs_search_slot.setup_leaf_for_split.btrfs_duplicate_item.btrfs_mark_extent_written.btrfs_finish_ordered_io
1.71 ± 8% -1.1 0.60 ± 5% perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.unlock_up.btrfs_search_slot.btrfs_mark_extent_written
1.69 ± 9% -1.1 0.59 ± 6% perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.unlock_up.btrfs_search_slot
1.63 ± 9% -1.1 0.57 ± 7% perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.unlock_up
2.12 ± 13% -0.7 1.43 ± 5% perf-profile.calltrace.cycles-pp.setup_leaf_for_split.btrfs_duplicate_item.btrfs_mark_extent_written.btrfs_finish_ordered_io.normal_work_helper
2.75 ± 10% -0.7 2.09 ± 5% perf-profile.calltrace.cycles-pp.btrfs_duplicate_item.btrfs_mark_extent_written.btrfs_finish_ordered_io.normal_work_helper.process_one_work
0.76 ± 5% -0.2 0.57 ± 8% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
0.73 ± 5% -0.2 0.56 ± 8% perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
0.99 ± 7% -0.1 0.87 ± 6% perf-profile.calltrace.cycles-pp.__btrfs_cow_block.btrfs_cow_block.btrfs_search_slot.btrfs_mark_extent_written.btrfs_finish_ordered_io
0.99 ± 7% -0.1 0.87 ± 7% perf-profile.calltrace.cycles-pp.btrfs_cow_block.btrfs_search_slot.btrfs_mark_extent_written.btrfs_finish_ordered_io.normal_work_helper
0.88 ± 10% +0.3 1.21 ± 13% perf-profile.calltrace.cycles-pp.get_next_timer_interrupt.tick_nohz_next_event.tick_nohz_get_sleep_length.menu_select.do_idle
0.27 ±100% +0.4 0.62 ± 10% perf-profile.calltrace.cycles-pp.setup_items_for_insert.btrfs_insert_empty_items.btrfs_csum_file_blocks.add_pending_csums.btrfs_finish_ordered_io
0.99 ± 10% +0.4 1.38 ± 7% perf-profile.calltrace.cycles-pp.btrfs_insert_empty_items.btrfs_csum_file_blocks.add_pending_csums.btrfs_finish_ordered_io.normal_work_helper
1.71 ± 17% +0.5 2.17 ± 13% perf-profile.calltrace.cycles-pp.tick_nohz_get_sleep_length.menu_select.do_idle.cpu_startup_entry.start_secondary
0.29 ±100% +0.5 0.76 ± 12% perf-profile.calltrace.cycles-pp.__next_timer_interrupt.get_next_timer_interrupt.tick_nohz_next_event.tick_nohz_get_sleep_length.menu_select
1.50 ± 7% +0.5 2.00 ± 9% perf-profile.calltrace.cycles-pp.add_pending_csums.btrfs_finish_ordered_io.normal_work_helper.process_one_work.worker_thread
1.49 ± 7% +0.5 2.00 ± 9% perf-profile.calltrace.cycles-pp.btrfs_csum_file_blocks.add_pending_csums.btrfs_finish_ordered_io.normal_work_helper.process_one_work
0.31 ±103% +0.5 0.83 ± 12% perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues
0.15 ±173% +0.6 0.75 ± 6% perf-profile.calltrace.cycles-pp.btrfs_search_slot.btrfs_insert_empty_items.btrfs_csum_file_blocks.add_pending_csums.btrfs_finish_ordered_io
0.14 ±173% +0.6 0.75 ± 12% perf-profile.calltrace.cycles-pp.split_leaf.setup_leaf_for_split.btrfs_duplicate_item.btrfs_mark_extent_written.btrfs_finish_ordered_io
0.29 ±100% +0.6 0.91 ± 27% perf-profile.calltrace.cycles-pp.clockevents_program_event.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state
1.08 ± 18% +0.6 1.71 ± 12% perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt
0.00 +0.7 0.66 ± 13% perf-profile.calltrace.cycles-pp.push_leaf_right.split_leaf.setup_leaf_for_split.btrfs_duplicate_item.btrfs_mark_extent_written
1.21 ± 19% +0.7 1.92 ± 12% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt
1.37 ± 20% +0.8 2.20 ± 12% perf-profile.calltrace.cycles-pp.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
3.28 ± 11% +1.1 4.33 ± 10% perf-profile.calltrace.cycles-pp.menu_select.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
0.00 +1.1 1.13 ± 24% perf-profile.calltrace.cycles-pp.__filemap_fdatawrite_range.btrfs_write_marked_extents.btrfs_write_and_wait_transaction.btrfs_commit_transaction.flush_space
0.00 +1.1 1.14 ± 24% perf-profile.calltrace.cycles-pp.btrfs_write_marked_extents.btrfs_write_and_wait_transaction.btrfs_commit_transaction.flush_space.btrfs_async_reclaim_metadata_space
0.00 +1.1 1.15 ± 23% perf-profile.calltrace.cycles-pp.btrfs_write_and_wait_transaction.btrfs_commit_transaction.flush_space.btrfs_async_reclaim_metadata_space.process_one_work
2.54 ± 17% +1.3 3.81 ± 12% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state
0.00 +1.4 1.35 ± 16% perf-profile.calltrace.cycles-pp.btrfs_run_delayed_refs.btrfs_commit_transaction.flush_space.btrfs_async_reclaim_metadata_space.process_one_work
0.00 +1.4 1.35 ± 16% perf-profile.calltrace.cycles-pp.__btrfs_run_delayed_refs.btrfs_run_delayed_refs.btrfs_commit_transaction.flush_space.btrfs_async_reclaim_metadata_space
3.59 ± 12% +1.9 5.50 ± 8% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle
0.15 ±173% +2.5 2.67 ± 14% perf-profile.calltrace.cycles-pp.btrfs_async_reclaim_metadata_space.process_one_work.worker_thread.kthread.ret_from_fork
0.15 ±173% +2.5 2.67 ± 14% perf-profile.calltrace.cycles-pp.flush_space.btrfs_async_reclaim_metadata_space.process_one_work.worker_thread.kthread
0.00 +2.7 2.67 ± 14% perf-profile.calltrace.cycles-pp.btrfs_commit_transaction.flush_space.btrfs_async_reclaim_metadata_space.process_one_work.worker_thread
6.62 ± 16% +3.0 9.58 ± 5% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry
7.43 ± 10% +3.3 10.76 ± 2% perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
52.83 ± 4% +4.2 57.01 ± 4% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
62.44 ± 3% +7.2 69.61 ± 3% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
69.18 ± 3% +8.3 77.44 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
69.26 ± 3% +8.3 77.52 perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
69.25 ± 3% +8.3 77.52 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
69.97 ± 3% +8.8 78.74 perf-profile.calltrace.cycles-pp.secondary_startup_64
fio.write_clat_stddev
21000 +-+-----------------------------------------------------------------+
| O O O O |
20000 O-O O O O O O O O O O O O O O O O O O O |
19000 +-+ O O O |
| |
18000 +-+ |
17000 +-+ |
| |
16000 +-+ |
15000 +-+ |
| |
14000 +-+ .+ .+.. .+. .+. .+. .+.+.+.+.+.. .+.|
13000 +-+. .+. + .+.+.+.+.+ + +.+ + +..+.+.+ +.+.+ |
| + + |
12000 +-+-----------------------------------------------------------------+
fio.latency_2000ms_
0.019 O-+-----------------------------O---O------------O------------------+
| O O O O O O O O |
0.018 +-+ O O |
0.017 +-+ O O O O |
| O O O O O O O |
0.016 +-O |
0.015 +-+ O |
| |
0.014 +-+ |
0.013 +-+ |
| |
0.012 +-+ |
0.011 +-+ |
| |
0.01 +-+-----------------------------------------------------------------+
fio.time.voluntary_context_switches
5.8e+06 +-+---------------------------------------------------------------+
| + + |
5.6e+06 +-+ + +.. : + .+. .+ : : |
|. + : + : : +. + + : : : +.|
| + : + + : + +. .+. + : .+. .+.+.+ :+ |
5.4e+06 +-+ +.+ + + + + +.+.+..+.+ + |
| |
5.2e+06 +-+ |
| |
5e+06 +-+ O O O O O |
| O O O O O O |
O O O O O O O O O O O O O O |
4.8e+06 +-+ O O |
| |
4.6e+06 +-+---------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Rong Chen
Hi, Josef,
kernel test robot <[email protected]> writes:
> Greeting,
>
> FYI, we noticed a -12.4% regression of fio.write_bw_MBps due to commit:
>
>
> commit: 302167c50b32e7fccc98994a91d40ddbbab04e52 ("btrfs: don't end the transaction for delayed refs in throttle")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git pending-fixes
>
> in testcase: fio-basic
> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64G memory
> with following parameters:
>
> runtime: 300s
> nr_task: 8t
> disk: 1SSD
> fs: btrfs
> rw: randwrite
> bs: 4k
> ioengine: sync
> test_size: 400g
> cpufreq_governor: performance
> ucode: 0xb00002e
>
> test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
> test-url: https://github.com/axboe/fio
>
>
Do you have time to take a look at this regression?
Best Regards,
Huang, Ying
"Huang, Ying" <[email protected]> writes:
> Hi, Josef,
>
> kernel test robot <[email protected]> writes:
>
>> Greeting,
>>
>> FYI, we noticed a -12.4% regression of fio.write_bw_MBps due to commit:
>>
>>
>> commit: 302167c50b32e7fccc98994a91d40ddbbab04e52 ("btrfs: don't end the transaction for delayed refs in throttle")
>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git pending-fixes
>>
>> in testcase: fio-basic
>> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64G memory
>> with following parameters:
>>
>> runtime: 300s
>> nr_task: 8t
>> disk: 1SSD
>> fs: btrfs
>> rw: randwrite
>> bs: 4k
>> ioengine: sync
>> test_size: 400g
>> cpufreq_governor: performance
>> ucode: 0xb00002e
>>
>> test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
>> test-url: https://github.com/axboe/fio
>>
>>
>
> Do you have time to take a look at this regression?
Ping
Best Regards,
Huang, Ying
"Huang, Ying" <[email protected]> writes:
> "Huang, Ying" <[email protected]> writes:
>
>> Hi, Josef,
>>
>> kernel test robot <[email protected]> writes:
>>
>>> Greeting,
>>>
>>> FYI, we noticed a -12.4% regression of fio.write_bw_MBps due to commit:
>>>
>>>
>>> commit: 302167c50b32e7fccc98994a91d40ddbbab04e52 ("btrfs: don't end
>>> the transaction for delayed refs in throttle")
>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git pending-fixes
>>>
>>> in testcase: fio-basic
>>> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64G memory
>>> with following parameters:
>>>
>>> runtime: 300s
>>> nr_task: 8t
>>> disk: 1SSD
>>> fs: btrfs
>>> rw: randwrite
>>> bs: 4k
>>> ioengine: sync
>>> test_size: 400g
>>> cpufreq_governor: performance
>>> ucode: 0xb00002e
>>>
>>> test-description: Fio is a tool that will spawn a number of threads
>>> or processes doing a particular type of I/O action as specified by
>>> the user.
>>> test-url: https://github.com/axboe/fio
>>>
>>>
>>
>> Do you have time to take a look at this regression?
>
> Ping
Ping again.
Best Regards,
Huang, Ying
On Fri, May 24, 2019 at 03:46:17PM +0800, Huang, Ying wrote:
> "Huang, Ying" <[email protected]> writes:
>
> > "Huang, Ying" <[email protected]> writes:
> >
> >> Hi, Josef,
> >>
> >> kernel test robot <[email protected]> writes:
> >>
> >>> Greeting,
> >>>
> >>> FYI, we noticed a -12.4% regression of fio.write_bw_MBps due to commit:
> >>>
> >>>
> >>> commit: 302167c50b32e7fccc98994a91d40ddbbab04e52 ("btrfs: don't end
> >>> the transaction for delayed refs in throttle")
> >>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git pending-fixes
> >>>
> >>> in testcase: fio-basic
> >>> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64G memory
> >>> with following parameters:
> >>>
> >>> runtime: 300s
> >>> nr_task: 8t
> >>> disk: 1SSD
> >>> fs: btrfs
> >>> rw: randwrite
> >>> bs: 4k
> >>> ioengine: sync
> >>> test_size: 400g
> >>> cpufreq_governor: performance
> >>> ucode: 0xb00002e
> >>>
> >>> test-description: Fio is a tool that will spawn a number of threads
> >>> or processes doing a particular type of I/O action as specified by
> >>> the user.
> >>> test-url: https://github.com/axboe/fio
> >>>
> >>>
> >>
> >> Do you have time to take a look at this regression?
> >
> > Ping
>
> Ping again.
>
This happens because now we rely more on on-demand flushing than the catchup
flushing that happened before. This is just one case where it's slightly worse;
overall this change provides better latencies, and even in this result it
provided better completion latencies because we're not randomly flushing at the
end of a transaction. It does appear to be costing writes in that they will
spend more time flushing than before, so you get slightly lower throughput on
pure small write workloads. I can't actually see the slowdown locally.
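One way to watch this on-demand flushing happen is btrfs's flush-space
tracepoint; a rough sketch, assuming tracefs is mounted at the usual
/sys/kernel/debug/tracing location:

  # enable the btrfs_flush_space tracepoint and watch flush_space()
  # fire while the fio workload runs
  echo 1 > /sys/kernel/debug/tracing/events/btrfs/btrfs_flush_space/enable
  cat /sys/kernel/debug/tracing/trace_pipe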
This patch is here to stay; it just shows we need to continue to refine the
flushing code to be less spiky/painful. Thanks,
Josef
Josef Bacik <[email protected]> writes:
> On Fri, May 24, 2019 at 03:46:17PM +0800, Huang, Ying wrote:
>> "Huang, Ying" <[email protected]> writes:
>>
>> > "Huang, Ying" <[email protected]> writes:
>> >
>> >> Hi, Josef,
>> >>
>> >> kernel test robot <[email protected]> writes:
>> >>
>> >>> Greeting,
>> >>>
>> >>> FYI, we noticed a -12.4% regression of fio.write_bw_MBps due to commit:
>> >>>
>> >>>
>> >>> commit: 302167c50b32e7fccc98994a91d40ddbbab04e52 ("btrfs: don't end
>> >>> the transaction for delayed refs in throttle")
>> >>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git pending-fixes
>> >>>
>> >>> in testcase: fio-basic
>> >>> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 64G memory
>> >>> with following parameters:
>> >>>
>> >>> runtime: 300s
>> >>> nr_task: 8t
>> >>> disk: 1SSD
>> >>> fs: btrfs
>> >>> rw: randwrite
>> >>> bs: 4k
>> >>> ioengine: sync
>> >>> test_size: 400g
>> >>> cpufreq_governor: performance
>> >>> ucode: 0xb00002e
>> >>>
>> >>> test-description: Fio is a tool that will spawn a number of threads
>> >>> or processes doing a particular type of I/O action as specified by
>> >>> the user.
>> >>> test-url: https://github.com/axboe/fio
>> >>>
>> >>>
>> >>
>> >> Do you have time to take a look at this regression?
>> >
>> > Ping
>>
>> Ping again.
>>
>
> This happens because now we rely more on on-demand flushing than the catchup
> flushing that happened before. This is just one case where it's slightly worse;
> overall this change provides better latencies, and even in this result it
> provided better completion latencies because we're not randomly flushing at the
> end of a transaction. It does appear to be costing writes in that they will
> spend more time flushing than before, so you get slightly lower throughput on
> pure small write workloads. I can't actually see the slowdown locally.
>
> This patch is here to stay; it just shows we need to continue to refine the
> flushing code to be less spiky/painful. Thanks,
Thanks for the detailed explanation. We will ignore this regression.
Best Regards,
Huang, Ying