2018-06-05 07:21:06

by kernel test robot

[permalink] [raw]
Subject: [lkp-robot] [xfs] b027d4c97b: fio.latency_2ms% +7.1% regression


Greeting,

FYI, we noticed a +7.1%% regression of fio.latency_2ms% due to commit:


commit: b027d4c97b9675c2ad75dec94be4e46dceb3ec74 ("xfs: don't retry xfs_buf_find on XBF_TRYLOCK failure")
https://git.kernel.org/cgit/fs/xfs/xfs-linux.git xfs-4.18-merge

in testcase: fio-basic
on test machine: 56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory
with following parameters:

disk: 2pmem
fs: xfs
mount_option: dax
runtime: 200s
nr_task: 50%
time_based: tb
rw: randwrite
bs: 4k
ioengine: libaio
test_size: 200G
cpufreq_governor: performance

test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio



Details are as below:
-------------------------------------------------------------------------------------------------->

=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/mount_option/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based:
4k/gcc-7/performance/2pmem/xfs/libaio/x86_64-rhel-7.2/dax/50%/debian-x86_64-2016-08-31.cgz/200s/randwrite/lkp-hsw-ep6/200G/fio-basic/tb

commit:
8925a3dc47 ("xfs: make xfs_buf_incore out of line")
b027d4c97b ("xfs: don't retry xfs_buf_find on XBF_TRYLOCK failure")

8925a3dc4771004b b027d4c97b9675c2ad75dec94b
---------------- --------------------------
%stddev %change %stddev
\ | \
46.56 ? 3% +7.1 53.61 fio.latency_2ms%
8.19 +0.2 8.40 fio.latency_100ms%
0.74 ? 3% -0.1 0.68 ? 6% fio.latency_250ms%
25.20 ? 6% -7.3 17.86 ? 6% fio.latency_4ms%
0.46 ? 9% +0.2 0.69 ? 13% fio.latency_750us%
90130 -1.8% 88543 fio.time.involuntary_context_switches
617.00 -4.4% 589.75 fio.time.percent_of_cpu_this_job_got
1162 -4.8% 1106 fio.time.system_time
10148 -0.8% 10066 fio.write_clat_mean_us
322.93 -0.8% 320.30 fio.write_slat_mean_us
7300608 ? 11% -18.2% 5972992 meminfo.DirectMap2M
4923 ? 9% +12.2% 5525 ? 9% numa-meminfo.node1.KernelStack
25.67 +2.1% 26.22 boot-time.boot
14.06 +2.7% 14.44 boot-time.dhcp
14.93 +3.7% 15.49 boot-time.kernel_boot
4490207 ? 3% -85.2% 662708 ? 3% numa-numastat.node0.local_node
4498546 ? 3% -85.1% 668260 ? 2% numa-numastat.node0.numa_hit
4648677 ? 2% -86.8% 615846 ? 3% numa-numastat.node1.local_node
4651468 ? 2% -86.6% 621397 ? 2% numa-numastat.node1.numa_hit
3353173 ? 5% -75.1% 834355 ? 2% numa-vmstat.node0.numa_hit
3344442 ? 5% -75.2% 827896 ? 3% numa-vmstat.node0.numa_local
4923 ? 9% +12.2% 5524 ? 9% numa-vmstat.node1.nr_kernel_stack
3043769 ? 5% -77.6% 680788 ? 3% numa-vmstat.node1.numa_hit
2869969 ? 5% -82.4% 504637 ? 6% numa-vmstat.node1.numa_local
9174615 -85.7% 1311343 proc-vmstat.numa_hit
9163481 -85.8% 1300236 proc-vmstat.numa_local
1635 ? 4% -6.6% 1527 ? 4% proc-vmstat.pgactivate
13537158 -57.7% 5726921 proc-vmstat.pgalloc_normal
13517845 -58.1% 5663328 proc-vmstat.pgfree
50269839 ? 34% -41.0% 29656634 ? 33% sched_debug.cfs_rq:/.load.max
50265844 ? 34% -41.0% 29653364 ? 33% sched_debug.cfs_rq:/.runnable_weight.max
1693994 ? 12% +28.2% 2172260 ? 5% sched_debug.cpu.avg_idle.max
351794 ? 13% +19.3% 419766 ? 9% sched_debug.cpu.avg_idle.stddev
779.56 ? 3% +17.3% 914.08 ? 6% sched_debug.cpu.curr->pid.stddev
0.00 ? 12% +27.4% 0.00 ? 14% sched_debug.cpu.next_balance.stddev
0.42 ? 6% +14.3% 0.48 ? 8% sched_debug.cpu.nr_running.stddev
-35.17 -8.8% -32.06 sched_debug.cpu.nr_uninterruptible.min
2.256e+11 -1.3% 2.227e+11 perf-stat.branch-instructions
2.49 +0.0 2.53 perf-stat.branch-miss-rate%
11.25 -0.5 10.73 ? 2% perf-stat.cache-miss-rate%
2.782e+09 -5.4% 2.632e+09 ? 2% perf-stat.cache-misses
1.69 -2.2% 1.65 perf-stat.cpi
2.367e+12 -3.0% 2.296e+12 perf-stat.cpu-cycles
68767 +4.2% 71688 perf-stat.cpu-migrations
4.851e+08 -2.8% 4.716e+08 perf-stat.iTLB-load-misses
0.59 +2.3% 0.61 perf-stat.ipc
56.55 ? 2% -3.6 52.92 ? 4% perf-stat.node-load-miss-rate%
54.38 -1.9 52.43 perf-stat.node-store-miss-rate%
3.276e+08 ? 2% -15.3% 2.774e+08 ? 7% perf-stat.node-store-misses
2.748e+08 -8.6% 2.511e+08 ? 4% perf-stat.node-stores
82016 -1.6% 80720 perf-stat.path-length
12.81 ? 3% -1.5 11.27 ? 4% perf-profile.calltrace.cycles-pp.xfs_bmapi_write.xfs_iomap_write_direct.xfs_file_iomap_begin.iomap_apply.dax_iomap_rw
12.26 ? 4% -1.5 10.74 ? 4% perf-profile.calltrace.cycles-pp.xfs_bmapi_convert_unwritten.xfs_bmapi_write.xfs_iomap_write_direct.xfs_file_iomap_begin.iomap_apply
10.81 ? 5% -1.4 9.37 ? 4% perf-profile.calltrace.cycles-pp.xfs_bmap_add_extent_unwritten_real.xfs_bmapi_convert_unwritten.xfs_bmapi_write.xfs_iomap_write_direct.xfs_file_iomap_begin
2.73 ? 6% -1.0 1.70 ? 5% perf-profile.calltrace.cycles-pp.xfs_btree_rshift.xfs_btree_make_block_unfull.xfs_btree_insrec.xfs_btree_insert.xfs_bmap_add_extent_unwritten_real
3.02 ? 6% -1.0 1.99 ? 3% perf-profile.calltrace.cycles-pp.xfs_btree_make_block_unfull.xfs_btree_insrec.xfs_btree_insert.xfs_bmap_add_extent_unwritten_real.xfs_bmapi_convert_unwritten
3.62 ? 5% -1.0 2.61 ? 3% perf-profile.calltrace.cycles-pp.xfs_btree_insrec.xfs_btree_insert.xfs_bmap_add_extent_unwritten_real.xfs_bmapi_convert_unwritten.xfs_bmapi_write
3.68 ? 5% -1.0 2.68 ? 3% perf-profile.calltrace.cycles-pp.xfs_btree_insert.xfs_bmap_add_extent_unwritten_real.xfs_bmapi_convert_unwritten.xfs_bmapi_write.xfs_iomap_write_direct
1.64 ? 9% -0.9 0.69 ? 2% perf-profile.calltrace.cycles-pp.xfs_btree_increment.xfs_btree_rshift.xfs_btree_make_block_unfull.xfs_btree_insrec.xfs_btree_insert
1.52 ? 9% -0.9 0.58 ? 4% perf-profile.calltrace.cycles-pp.xfs_buf_get_map.xfs_buf_read_map.xfs_btree_reada_bufl.xfs_btree_readahead_lblock.xfs_btree_increment
1.52 ? 9% -0.9 0.58 ? 4% perf-profile.calltrace.cycles-pp.xfs_buf_read_map.xfs_btree_reada_bufl.xfs_btree_readahead_lblock.xfs_btree_increment.xfs_btree_rshift
1.54 ? 9% -0.9 0.60 ? 4% perf-profile.calltrace.cycles-pp.xfs_btree_readahead_lblock.xfs_btree_increment.xfs_btree_rshift.xfs_btree_make_block_unfull.xfs_btree_insrec
1.53 ? 9% -0.9 0.60 ? 4% perf-profile.calltrace.cycles-pp.xfs_btree_reada_bufl.xfs_btree_readahead_lblock.xfs_btree_increment.xfs_btree_rshift.xfs_btree_make_block_unfull
0.67 ? 5% +0.1 0.76 ? 9% perf-profile.calltrace.cycles-pp.try_to_wake_up.xlog_grant_head_wake.xfs_log_space_wake.xfs_log_done.xfs_log_commit_cil
0.71 ? 4% +0.1 0.82 ? 8% perf-profile.calltrace.cycles-pp.xlog_grant_head_wake.xfs_log_space_wake.xfs_log_done.xfs_log_commit_cil.__xfs_trans_commit
2.45 ? 5% +0.2 2.65 ? 2% perf-profile.calltrace.cycles-pp.__softirqentry_text_start.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state
3.31 ? 5% +0.2 3.51 ? 3% perf-profile.calltrace.cycles-pp.xfs_trans_reserve.xfs_trans_alloc.xfs_iomap_write_direct.xfs_file_iomap_begin.iomap_apply
3.23 ? 5% +0.2 3.44 ? 3% perf-profile.calltrace.cycles-pp.xfs_log_reserve.xfs_trans_reserve.xfs_trans_alloc.xfs_iomap_write_direct.xfs_file_iomap_begin
6.17 ? 2% +0.3 6.49 ? 2% perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
6.14 ? 2% +0.3 6.47 ? 2% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry
3.12 ? 8% +0.3 3.46 ? 3% perf-profile.calltrace.cycles-pp.xfs_buf_delwri_submit_buffers.xfsaild.kthread.ret_from_fork
0.00 +0.6 0.58 ? 4% perf-profile.calltrace.cycles-pp.xfs_buf_find.xfs_buf_get_map.xfs_buf_read_map.xfs_btree_reada_bufl.xfs_btree_readahead_lblock
0.00 +2.0 1.97 ? 9% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.xfs_buf_find.xfs_buf_get_map.xfs_buf_read_map
0.00 +2.2 2.16 ? 9% perf-profile.calltrace.cycles-pp._raw_spin_lock.xfs_buf_find.xfs_buf_get_map.xfs_buf_read_map.xfs_trans_read_buf_map
0.00 +3.4 3.39 ? 11% perf-profile.calltrace.cycles-pp.xfs_buf_find.xfs_buf_get_map.xfs_buf_read_map.xfs_trans_read_buf_map.xfs_btree_read_buf_block
6.46 ? 8% -6.5 0.00 perf-profile.children.cycles-pp._xfs_buf_find
12.81 ? 3% -1.5 11.27 ? 4% perf-profile.children.cycles-pp.xfs_bmapi_write
6.74 ? 8% -1.5 5.21 ? 5% perf-profile.children.cycles-pp.xfs_buf_get_map
12.28 ? 4% -1.5 10.75 ? 4% perf-profile.children.cycles-pp.xfs_bmapi_convert_unwritten
6.75 ? 8% -1.5 5.23 ? 5% perf-profile.children.cycles-pp.xfs_buf_read_map
10.81 ? 5% -1.4 9.37 ? 4% perf-profile.children.cycles-pp.xfs_bmap_add_extent_unwritten_real
2.31 ? 7% -1.1 1.23 ? 6% perf-profile.children.cycles-pp.xfs_btree_reada_bufl
2.31 ? 7% -1.1 1.24 ? 6% perf-profile.children.cycles-pp.xfs_btree_readahead_lblock
2.74 ? 6% -1.0 1.71 ? 5% perf-profile.children.cycles-pp.xfs_btree_rshift
3.02 ? 6% -1.0 1.99 ? 4% perf-profile.children.cycles-pp.xfs_btree_make_block_unfull
3.65 ? 5% -1.0 2.63 ? 3% perf-profile.children.cycles-pp.xfs_btree_insrec
3.69 ? 5% -1.0 2.69 ? 2% perf-profile.children.cycles-pp.xfs_btree_insert
1.87 ? 8% -1.0 0.92 ? 3% perf-profile.children.cycles-pp.xfs_btree_increment
10.48 -0.9 9.60 ? 3% perf-profile.children.cycles-pp._raw_spin_lock
9.58 ? 2% -0.7 8.84 ? 3% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
0.56 -0.1 0.44 ? 7% perf-profile.children.cycles-pp.xfs_btree_decrement
0.75 ? 8% -0.1 0.65 ? 9% perf-profile.children.cycles-pp.xfs_perag_get
0.25 ? 4% -0.0 0.20 ? 10% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.24 ? 6% -0.0 0.20 ? 6% perf-profile.children.cycles-pp.kmem_cache_free
0.14 ? 3% -0.0 0.11 ? 7% perf-profile.children.cycles-pp.down_trylock
0.11 ? 7% -0.0 0.09 ? 8% perf-profile.children.cycles-pp.memset_erms
0.06 ? 6% +0.0 0.08 ? 15% perf-profile.children.cycles-pp.xfs_trans_ail_update_bulk
0.12 ? 10% +0.0 0.15 ? 7% perf-profile.children.cycles-pp.idle_cpu
0.19 ? 3% +0.0 0.23 ? 3% perf-profile.children.cycles-pp.__blkdev_issue_zero_pages
0.03 ?100% +0.0 0.07 ? 17% perf-profile.children.cycles-pp.kernel_wait4
0.03 ?100% +0.0 0.07 ? 15% perf-profile.children.cycles-pp.__do_sys_wait4
0.29 ? 8% +0.1 0.34 ? 9% perf-profile.children.cycles-pp.bio_alloc_bioset
0.25 ? 5% +0.1 0.31 ? 12% perf-profile.children.cycles-pp.mempool_alloc
0.55 ? 4% +0.1 0.60 ? 7% perf-profile.children.cycles-pp.get_next_timer_interrupt
0.02 ?173% +0.1 0.08 ? 15% perf-profile.children.cycles-pp.swake_up
0.00 +0.1 0.07 ? 7% perf-profile.children.cycles-pp.do_wait
0.77 ? 8% +0.1 0.92 ? 7% perf-profile.children.cycles-pp.__save_stack_trace
2.66 ? 2% +0.1 2.81 ? 3% perf-profile.children.cycles-pp.__softirqentry_text_start
0.80 ? 7% +0.1 0.95 ? 7% perf-profile.children.cycles-pp.save_stack_trace_tsk
0.94 ? 6% +0.2 1.09 ? 5% perf-profile.children.cycles-pp.__account_scheduler_latency
0.93 ? 5% +0.2 1.09 ? 7% perf-profile.children.cycles-pp.xlog_grant_head_wake
2.95 +0.2 3.15 ? 3% perf-profile.children.cycles-pp.irq_exit
1.15 ? 6% +0.2 1.35 ? 5% perf-profile.children.cycles-pp.enqueue_entity
1.20 ? 6% +0.2 1.41 ? 4% perf-profile.children.cycles-pp.ttwu_do_activate
1.18 ? 7% +0.2 1.39 ? 4% perf-profile.children.cycles-pp.enqueue_task_fair
6.83 +0.2 7.05 perf-profile.children.cycles-pp.apic_timer_interrupt
1.24 ? 5% +0.2 1.47 ? 5% perf-profile.children.cycles-pp.try_to_wake_up
6.79 +0.2 7.02 perf-profile.children.cycles-pp.smp_apic_timer_interrupt
3.12 ? 8% +0.3 3.46 ? 3% perf-profile.children.cycles-pp.xfs_buf_delwri_submit_buffers
0.00 +5.1 5.07 ? 5% perf-profile.children.cycles-pp.xfs_buf_find
9.38 -0.7 8.71 ? 3% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.71 ? 9% -0.1 0.59 ? 5% perf-profile.self.cycles-pp.xfs_perag_get
0.96 ? 3% -0.1 0.85 ? 3% perf-profile.self.cycles-pp._raw_spin_lock
0.25 ? 4% -0.0 0.20 ? 10% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.06 ? 6% -0.0 0.03 ?100% perf-profile.self.cycles-pp.tick_sched_timer
0.09 ? 9% -0.0 0.07 ? 16% perf-profile.self.cycles-pp.hrtimer_interrupt
0.11 ? 3% -0.0 0.09 ? 8% perf-profile.self.cycles-pp.memset_erms
0.18 ? 2% -0.0 0.17 perf-profile.self.cycles-pp.kmem_cache_free
0.06 ? 11% +0.0 0.08 ? 10% perf-profile.self.cycles-pp.selinux_file_permission
0.12 ? 3% +0.0 0.15 ? 5% perf-profile.self.cycles-pp.xfs_buf_get_map
0.12 ? 10% +0.0 0.15 ? 7% perf-profile.self.cycles-pp.idle_cpu
0.18 ? 12% +0.0 0.22 ? 10% perf-profile.self.cycles-pp.__next_timer_interrupt
0.01 ?173% +0.1 0.07 ? 17% perf-profile.self.cycles-pp.mutex_unlock
0.00 +0.5 0.48 ? 13% perf-profile.self.cycles-pp.xfs_buf_find



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Xiaolong


Attachments:
(No filename) (15.23 kB)
config-4.17.0-rc4-00005-gb027d4c (167.13 kB)
job-script (7.49 kB)
job.yaml (5.11 kB)
reproduce (920.00 B)
Download all attachments

2018-06-05 22:39:06

by Dave Chinner

[permalink] [raw]
Subject: Re: [lkp-robot] [xfs] b027d4c97b: fio.latency_2ms% +7.1% regression

On Tue, Jun 05, 2018 at 03:16:57PM +0800, kernel test robot wrote:
>
> Greeting,
>
> FYI, we noticed a +7.1%% regression of fio.latency_2ms% due to commit:
>
>
> commit: b027d4c97b9675c2ad75dec94be4e46dceb3ec74 ("xfs: don't retry xfs_buf_find on XBF_TRYLOCK failure")
> https://git.kernel.org/cgit/fs/xfs/xfs-linux.git xfs-4.18-merge
....
> 8925a3dc4771004b b027d4c97b9675c2ad75dec94b
> ---------------- --------------------------
> %stddev %change %stddev
> \ | \
> 46.56 ? 3% +7.1 53.61 fio.latency_2ms%
> 8.19 +0.2 8.40 fio.latency_100ms%
> 0.74 ? 3% -0.1 0.68 ? 6% fio.latency_250ms%
> 25.20 ? 6% -7.3 17.86 ? 6% fio.latency_4ms%
> 0.46 ? 9% +0.2 0.69 ? 13% fio.latency_750us%

This is not a regression. The number of IOs in the 4ms IO latency
bin has reduced by 7%, and inumber in the 2ms IO latency bin has
increased by 7%. IOWs, there's a measurable improvement in IO
latency as a result of those patches, not a regression.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2018-06-06 01:45:42

by kernel test robot

[permalink] [raw]
Subject: Re: [lkp-robot] [xfs] b027d4c97b: fio.latency_2ms% +7.1% regression

On 06/06, Dave Chinner wrote:
>On Tue, Jun 05, 2018 at 03:16:57PM +0800, kernel test robot wrote:
>>
>> Greeting,
>>
>> FYI, we noticed a +7.1%% regression of fio.latency_2ms% due to commit:
>>
>>
>> commit: b027d4c97b9675c2ad75dec94be4e46dceb3ec74 ("xfs: don't retry xfs_buf_find on XBF_TRYLOCK failure")
>> https://git.kernel.org/cgit/fs/xfs/xfs-linux.git xfs-4.18-merge
>....
>> 8925a3dc4771004b b027d4c97b9675c2ad75dec94b
>> ---------------- --------------------------
>> %stddev %change %stddev
>> \ | \
>> 46.56 ? 3% +7.1 53.61 fio.latency_2ms%
>> 8.19 +0.2 8.40 fio.latency_100ms%
>> 0.74 ? 3% -0.1 0.68 ? 6% fio.latency_250ms%
>> 25.20 ? 6% -7.3 17.86 ? 6% fio.latency_4ms%
>> 0.46 ? 9% +0.2 0.69 ? 13% fio.latency_750us%
>
>This is not a regression. The number of IOs in the 4ms IO latency
>bin has reduced by 7%, and inumber in the 2ms IO latency bin has
>increased by 7%. IOWs, there's a measurable improvement in IO
>latency as a result of those patches, not a regression.

Thanks for clarification.

Thanks,
Xiaolong

>
>Cheers,
>
>Dave.
>--
>Dave Chinner
>[email protected]