2018-06-04 05:57:05

by kernel test robot

Subject: [lkp-robot] [brd] 316ba5736c: aim7.jobs-per-min -11.2% regression


Greetings,

FYI, we noticed a -11.2% regression of aim7.jobs-per-min due to commit:


commit: 316ba5736c9caa5dbcd84085989862d2df57431d ("brd: Mark as non-rotational")
https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-4.18/block

in testcase: aim7
on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
with following parameters:

disk: 1BRD_48G
fs: btrfs
test: disk_rw
load: 1500
cpufreq_governor: performance

test-description: AIM7 is a traditional UNIX system-level benchmark suite used to test and measure the performance of multiuser systems.
test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/



Details are as below:
-------------------------------------------------------------------------------------------------->

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
gcc-7/performance/1BRD_48G/btrfs/x86_64-rhel-7.2/1500/debian-x86_64-2016-08-31.cgz/lkp-ivb-ep01/disk_rw/aim7
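The two slash-separated lines above pair up position by position: the first names the test axes, the second gives this run's value for each. A minimal sketch of that pairing follows; the helper name `decode_axes` is made up for illustration and is not part of the LKP tooling.

```python
# Pair each axis name in the header line with the value at the same
# position in the line below it. decode_axes is a hypothetical helper.
KEYS = "compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase"
VALUES = (
    "gcc-7/performance/1BRD_48G/btrfs/x86_64-rhel-7.2/1500/"
    "debian-x86_64-2016-08-31.cgz/lkp-ivb-ep01/disk_rw/aim7"
)


def decode_axes(keys, values):
    """Zip the two slash-separated lines into a {axis: value} mapping."""
    names, vals = keys.split("/"), values.split("/")
    if len(names) != len(vals):
        raise ValueError("axis names and values differ in length")
    return dict(zip(names, vals))


axes = decode_axes(KEYS, VALUES)
for name, value in axes.items():
    print(f"{name:18} {value}")
```

Read this way, the run used the btrfs filesystem on a single 48G brd device, with the aim7 disk_rw test at load 1500 on the lkp-ivb-ep01 machine group.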

commit:
522a777566 ("block: consolidate struct request timestamp fields")
316ba5736c ("brd: Mark as non-rotational")

522a777566f56696 316ba5736c9caa5dbcd8408598
---------------- --------------------------
%stddev %change %stddev
\ | \
28321 -11.2% 25147 aim7.jobs-per-min
318.19 +12.6% 358.23 aim7.time.elapsed_time
318.19 +12.6% 358.23 aim7.time.elapsed_time.max
1437526 ± 2% +14.6% 1646849 ± 2% aim7.time.involuntary_context_switches
11986 +14.2% 13691 aim7.time.system_time
73.06 ± 2% -3.6% 70.43 aim7.time.user_time
2449470 ± 2% -25.0% 1837521 ± 4% aim7.time.voluntary_context_switches
20.25 ± 58% +1681.5% 360.75 ±109% numa-meminfo.node1.Mlocked
456062 -16.3% 381859 softirqs.SCHED
9015 ± 7% -21.3% 7098 ± 22% meminfo.CmaFree
47.50 ± 58% +1355.8% 691.50 ± 92% meminfo.Mlocked
5.24 ± 3% -1.2 3.99 ± 2% mpstat.cpu.idle%
0.61 ± 2% -0.1 0.52 ± 2% mpstat.cpu.usr%
16627 +12.8% 18762 ± 4% slabinfo.Acpi-State.active_objs
16627 +12.9% 18775 ± 4% slabinfo.Acpi-State.num_objs
57.00 ± 2% +17.5% 67.00 vmstat.procs.r
20936 -24.8% 15752 ± 2% vmstat.system.cs
45474 -1.7% 44681 vmstat.system.in
6.50 ± 59% +1157.7% 81.75 ± 75% numa-vmstat.node0.nr_mlock
242870 ± 3% +13.2% 274913 ± 7% numa-vmstat.node0.nr_written
2278 ± 7% -22.6% 1763 ± 21% numa-vmstat.node1.nr_free_cma
4.75 ± 58% +1789.5% 89.75 ±109% numa-vmstat.node1.nr_mlock
88018135 ± 3% -48.9% 44980457 ± 7% cpuidle.C1.time
1398288 ± 3% -51.1% 683493 ± 9% cpuidle.C1.usage
3499814 ± 2% -38.5% 2153158 ± 5% cpuidle.C1E.time
52722 ± 4% -45.6% 28692 ± 6% cpuidle.C1E.usage
9865857 ± 3% -40.1% 5905155 ± 5% cpuidle.C3.time
69656 ± 2% -42.6% 39990 ± 5% cpuidle.C3.usage
590856 ± 2% -12.3% 517910 cpuidle.C6.usage
46160 ± 7% -53.7% 21372 ± 11% cpuidle.POLL.time
1716 ± 7% -46.6% 916.25 ± 14% cpuidle.POLL.usage
197656 +4.1% 205732 proc-vmstat.nr_active_file
191867 +4.1% 199647 proc-vmstat.nr_dirty
509282 +1.6% 517318 proc-vmstat.nr_file_pages
2282 ± 8% -24.4% 1725 ± 22% proc-vmstat.nr_free_cma
357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_inactive_file
11.50 ± 58% +1397.8% 172.25 ± 93% proc-vmstat.nr_mlock
970355 ± 4% +14.6% 1111549 ± 8% proc-vmstat.nr_written
197984 +4.1% 206034 proc-vmstat.nr_zone_active_file
357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_zone_inactive_file
192282 +4.1% 200126 proc-vmstat.nr_zone_write_pending
7901465 ± 3% -14.0% 6795016 ± 16% proc-vmstat.pgalloc_movable
886101 +10.2% 976329 proc-vmstat.pgfault
2.169e+12 +15.2% 2.497e+12 perf-stat.branch-instructions
0.41 -0.1 0.35 perf-stat.branch-miss-rate%
31.19 ± 2% +1.6 32.82 perf-stat.cache-miss-rate%
9.116e+09 +8.3% 9.869e+09 perf-stat.cache-misses
2.924e+10 +2.9% 3.008e+10 ± 2% perf-stat.cache-references
6712739 ± 2% -15.4% 5678643 ± 2% perf-stat.context-switches
4.02 +2.7% 4.13 perf-stat.cpi
3.761e+13 +17.3% 4.413e+13 perf-stat.cpu-cycles
606958 -13.7% 523758 ± 2% perf-stat.cpu-migrations
2.476e+12 +13.4% 2.809e+12 perf-stat.dTLB-loads
0.18 ± 2% -0.0 0.16 ± 9% perf-stat.dTLB-store-miss-rate%
1.079e+09 ± 2% -9.6% 9.755e+08 ± 9% perf-stat.dTLB-store-misses
5.933e+11 +1.6% 6.029e+11 perf-stat.dTLB-stores
9.349e+12 +14.2% 1.068e+13 perf-stat.instructions
11247 ± 11% +19.8% 13477 ± 9% perf-stat.instructions-per-iTLB-miss
0.25 -2.6% 0.24 perf-stat.ipc
865561 +10.3% 954350 perf-stat.minor-faults
2.901e+09 ± 3% +9.8% 3.186e+09 ± 3% perf-stat.node-load-misses
3.682e+09 ± 3% +11.0% 4.088e+09 ± 3% perf-stat.node-loads
3.778e+09 +4.8% 3.959e+09 ± 2% perf-stat.node-store-misses
5.079e+09 +6.4% 5.402e+09 perf-stat.node-stores
865565 +10.3% 954352 perf-stat.page-faults
51.75 ± 5% -12.5% 45.30 ± 10% sched_debug.cfs_rq:/.load_avg.avg
316.35 ± 3% +17.2% 370.81 ± 8% sched_debug.cfs_rq:/.util_est_enqueued.stddev
15294 ± 30% +234.9% 51219 ± 76% sched_debug.cpu.avg_idle.min
299443 ± 3% -7.3% 277566 ± 5% sched_debug.cpu.avg_idle.stddev
1182 ± 19% -26.3% 872.02 ± 13% sched_debug.cpu.nr_load_updates.stddev
1.22 ± 8% +21.7% 1.48 ± 6% sched_debug.cpu.nr_running.avg
2.75 ± 10% +26.2% 3.47 ± 6% sched_debug.cpu.nr_running.max
0.58 ± 7% +24.2% 0.73 ± 6% sched_debug.cpu.nr_running.stddev
77148 -20.0% 61702 ± 7% sched_debug.cpu.nr_switches.avg
70024 -24.8% 52647 ± 8% sched_debug.cpu.nr_switches.min
6662 ± 6% +61.9% 10789 ± 24% sched_debug.cpu.nr_switches.stddev
80.45 ± 18% -19.1% 65.05 ± 6% sched_debug.cpu.nr_uninterruptible.stddev
76819 -19.3% 62008 ± 8% sched_debug.cpu.sched_count.avg
70616 -23.5% 53996 ± 8% sched_debug.cpu.sched_count.min
5494 ± 9% +85.3% 10179 ± 26% sched_debug.cpu.sched_count.stddev
16936 -52.9% 7975 ± 9% sched_debug.cpu.sched_goidle.avg
19281 -49.9% 9666 ± 7% sched_debug.cpu.sched_goidle.max
15417 -54.8% 6962 ± 10% sched_debug.cpu.sched_goidle.min
875.00 ± 6% -35.0% 569.09 ± 13% sched_debug.cpu.sched_goidle.stddev
40332 -23.5% 30851 ± 7% sched_debug.cpu.ttwu_count.avg
35074 -26.3% 25833 ± 6% sched_debug.cpu.ttwu_count.min
3239 ± 8% +67.4% 5422 ± 28% sched_debug.cpu.ttwu_count.stddev
5232 +27.4% 6665 ± 13% sched_debug.cpu.ttwu_local.avg
15877 ± 12% +77.5% 28184 ± 27% sched_debug.cpu.ttwu_local.max
2530 ± 10% +95.9% 4956 ± 27% sched_debug.cpu.ttwu_local.stddev
2.52 ± 7% -0.6 1.95 ± 3% perf-profile.calltrace.cycles-pp.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
1.48 ± 12% -0.5 1.01 ± 4% perf-profile.calltrace.cycles-pp.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_search_slot.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write
1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter
0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.__dentry_kill.dentry_kill.dput.__fput.task_work_run
0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dentry_kill.dput.__fput
0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.dentry_kill.dput.__fput.task_work_run.exit_to_usermode_loop
0.90 ± 18% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.btrfs_evict_inode.evict.__dentry_kill.dentry_kill.dput
0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.dput.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64
1.69 -0.1 1.54 ± 2% perf-profile.calltrace.cycles-pp.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter
0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
0.71 ± 6% -0.1 0.61 ± 2% perf-profile.calltrace.cycles-pp.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write
0.69 ± 6% -0.1 0.60 ± 2% perf-profile.calltrace.cycles-pp.btrfs_clear_bit_hook.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need
96.77 +0.6 97.33 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
0.00 +0.6 0.56 ± 3% perf-profile.calltrace.cycles-pp.can_overcommit.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
96.72 +0.6 97.29 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
43.13 +0.8 43.91 perf-profile.calltrace.cycles-pp.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
42.37 +0.8 43.16 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write
43.11 +0.8 43.89 perf-profile.calltrace.cycles-pp.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
42.96 +0.8 43.77 perf-profile.calltrace.cycles-pp._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter
95.28 +0.9 96.23 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
95.22 +1.0 96.18 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
94.88 +1.0 95.85 perf-profile.calltrace.cycles-pp.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
94.83 +1.0 95.80 perf-profile.calltrace.cycles-pp.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write.do_syscall_64
94.51 +1.0 95.50 perf-profile.calltrace.cycles-pp.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write
42.44 +1.1 43.52 perf-profile.calltrace.cycles-pp._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
42.09 +1.1 43.18 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write
44.07 +1.2 45.29 perf-profile.calltrace.cycles-pp.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
43.42 +1.3 44.69 perf-profile.calltrace.cycles-pp.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
2.06 ± 18% -0.9 1.21 ± 6% perf-profile.children.cycles-pp.btrfs_search_slot
2.54 ± 7% -0.6 1.96 ± 3% perf-profile.children.cycles-pp.btrfs_dirty_pages
1.05 ± 24% -0.5 0.52 ± 9% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
1.50 ± 12% -0.5 1.03 ± 4% perf-profile.children.cycles-pp.btrfs_get_extent
1.22 ± 15% -0.4 0.79 ± 8% perf-profile.children.cycles-pp.btrfs_lookup_file_extent
0.81 ± 5% -0.4 0.41 ± 6% perf-profile.children.cycles-pp.btrfs_calc_reclaim_metadata_size
0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_lock_root_node
0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_tree_lock
0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.__dentry_kill
0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.evict
0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.dentry_kill
0.90 ± 18% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.btrfs_evict_inode
0.91 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.exit_to_usermode_loop
0.52 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.do_idle
0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.task_work_run
0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.__fput
0.90 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.dput
0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.secondary_startup_64
0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.cpu_startup_entry
0.50 ± 21% -0.3 0.17 ± 16% perf-profile.children.cycles-pp.start_secondary
0.47 ± 20% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.cpuidle_enter_state
0.47 ± 19% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.intel_idle
0.61 ± 20% -0.3 0.36 ± 11% perf-profile.children.cycles-pp.btrfs_tree_read_lock
0.47 ± 26% -0.3 0.21 ± 10% perf-profile.children.cycles-pp.prepare_to_wait_event
0.64 ± 18% -0.2 0.39 ± 9% perf-profile.children.cycles-pp.btrfs_read_lock_root_node
0.40 ± 22% -0.2 0.21 ± 5% perf-profile.children.cycles-pp.btrfs_clear_path_blocking
0.38 ± 23% -0.2 0.19 ± 13% perf-profile.children.cycles-pp.finish_wait
1.51 ± 3% -0.2 1.35 ± 2% perf-profile.children.cycles-pp.__clear_extent_bit
1.71 -0.1 1.56 ± 2% perf-profile.children.cycles-pp.lock_and_cleanup_extent_if_need
0.29 ± 25% -0.1 0.15 ± 10% perf-profile.children.cycles-pp.btrfs_orphan_del
0.27 ± 27% -0.1 0.12 ± 8% perf-profile.children.cycles-pp.btrfs_del_orphan_item
0.33 ± 18% -0.1 0.19 ± 9% perf-profile.children.cycles-pp.queued_read_lock_slowpath
0.33 ± 19% -0.1 0.20 ± 4% perf-profile.children.cycles-pp.__wake_up_common_lock
0.45 ± 15% -0.1 0.34 ± 2% perf-profile.children.cycles-pp.btrfs_alloc_data_chunk_ondemand
0.47 ± 16% -0.1 0.36 ± 4% perf-profile.children.cycles-pp.btrfs_check_data_free_space
0.91 ± 4% -0.1 0.81 ± 3% perf-profile.children.cycles-pp.clear_extent_bit
1.07 ± 5% -0.1 0.97 perf-profile.children.cycles-pp.__set_extent_bit
0.77 ± 6% -0.1 0.69 ± 3% perf-profile.children.cycles-pp.btrfs_clear_bit_hook
0.17 ± 20% -0.1 0.08 ± 10% perf-profile.children.cycles-pp.queued_write_lock_slowpath
0.16 ± 22% -0.1 0.08 ± 24% perf-profile.children.cycles-pp.btrfs_lookup_inode
0.21 ± 17% -0.1 0.14 ± 19% perf-profile.children.cycles-pp.__btrfs_update_delayed_inode
0.26 ± 12% -0.1 0.18 ± 13% perf-profile.children.cycles-pp.btrfs_async_run_delayed_root
0.52 ± 5% -0.1 0.45 perf-profile.children.cycles-pp.set_extent_bit
0.45 ± 5% -0.1 0.40 ± 3% perf-profile.children.cycles-pp.alloc_extent_state
0.11 ± 17% -0.1 0.06 ± 11% perf-profile.children.cycles-pp.btrfs_clear_lock_blocking_rw
0.28 ± 9% -0.0 0.23 ± 3% perf-profile.children.cycles-pp.btrfs_drop_pages
0.07 -0.0 0.03 ±100% perf-profile.children.cycles-pp.btrfs_set_lock_blocking_rw
0.39 ± 3% -0.0 0.34 ± 3% perf-profile.children.cycles-pp.get_alloc_profile
0.33 ± 7% -0.0 0.29 perf-profile.children.cycles-pp.btrfs_set_extent_delalloc
0.38 ± 2% -0.0 0.35 ± 4% perf-profile.children.cycles-pp.__set_page_dirty_nobuffers
0.49 ± 3% -0.0 0.46 ± 3% perf-profile.children.cycles-pp.pagecache_get_page
0.18 ± 4% -0.0 0.15 ± 2% perf-profile.children.cycles-pp.truncate_inode_pages_range
0.08 ± 5% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.btrfs_set_path_blocking
0.08 ± 6% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.truncate_cleanup_page
0.80 ± 4% +0.2 0.95 ± 2% perf-profile.children.cycles-pp.can_overcommit
96.84 +0.5 97.37 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
96.80 +0.5 97.35 perf-profile.children.cycles-pp.do_syscall_64
43.34 +0.8 44.17 perf-profile.children.cycles-pp.btrfs_inode_rsv_release
43.49 +0.8 44.32 perf-profile.children.cycles-pp.block_rsv_release_bytes
95.32 +0.9 96.26 perf-profile.children.cycles-pp.ksys_write
95.26 +0.9 96.20 perf-profile.children.cycles-pp.vfs_write
94.91 +1.0 95.88 perf-profile.children.cycles-pp.__vfs_write
94.84 +1.0 95.81 perf-profile.children.cycles-pp.btrfs_file_write_iter
94.55 +1.0 95.55 perf-profile.children.cycles-pp.__btrfs_buffered_write
86.68 +1.0 87.70 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
44.08 +1.2 45.31 perf-profile.children.cycles-pp.btrfs_delalloc_reserve_metadata
43.49 +1.3 44.77 perf-profile.children.cycles-pp.reserve_metadata_bytes
87.59 +1.8 89.38 perf-profile.children.cycles-pp._raw_spin_lock
0.47 ± 19% -0.3 0.16 ± 13% perf-profile.self.cycles-pp.intel_idle
0.33 ± 6% -0.1 0.18 ± 6% perf-profile.self.cycles-pp.get_alloc_profile
0.27 ± 8% -0.0 0.22 ± 4% perf-profile.self.cycles-pp.btrfs_drop_pages
0.07 -0.0 0.03 ±100% perf-profile.self.cycles-pp.btrfs_set_lock_blocking_rw
0.14 ± 5% -0.0 0.12 ± 6% perf-profile.self.cycles-pp.clear_page_dirty_for_io
0.09 ± 5% -0.0 0.07 ± 10% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.17 ± 4% +0.1 0.23 ± 3% perf-profile.self.cycles-pp.reserve_metadata_bytes
0.31 ± 7% +0.1 0.45 ± 2% perf-profile.self.cycles-pp.can_overcommit
86.35 +1.0 87.39 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
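
Each row of the table above reads as: value on the base commit (with an optional %stddev), the change, value on the tested commit (with an optional %stddev), and the metric name. The sketch below splits a row into those fields and recomputes the relative change for the headline numbers. The regular expression and the helper names (`parse_row`, `pct_change`) are assumptions made for illustration, not LKP code. It accepts either "±" or "?" as the deviation marker, since archived copies differ.

```python
import re

# Assumed row layout: base [± sd%]  change  new [± sd%]  metric
# "?" is accepted alongside "±" because some archived copies show it instead.
ROW = re.compile(
    r"^\s*(?P<base>[\d.e+]+)\s*(?:[±?]\s*(?P<base_sd>\d+)%)?\s+"
    r"(?P<change>[+-][\d.]+%?)\s+"
    r"(?P<new>[\d.e+]+)\s*(?:[±?]\s*(?P<new_sd>\d+)%)?\s+"
    r"(?P<metric>\S+)\s*$"
)


def parse_row(line):
    """Split one comparison row into fields, or return None if it does not match."""
    m = ROW.match(line)
    if m is None:
        return None
    return {
        "metric": m["metric"],
        "base": float(m["base"]),
        "new": float(m["new"]),
        "change": m["change"],
    }


def pct_change(base, new):
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


SAMPLE = [
    "28321 -11.2% 25147 aim7.jobs-per-min",
    "318.19 +12.6% 358.23 aim7.time.elapsed_time",
    "1437526 ± 2% +14.6% 1646849 ± 2% aim7.time.involuntary_context_switches",
]

for line in SAMPLE:
    row = parse_row(line)
    recomputed = pct_change(row["base"], row["new"])
    print(f"{row['metric']}: reported {row['change']}, recomputed {recomputed:+.1f}%")
```

Rows whose change has no "%" sign (the mpstat and perf-profile rows, for example) appear to report an absolute difference in percentage points rather than a relative change, so `pct_change` does not apply to them.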



aim7.jobs-per-min

[ASCII plot of aim7.jobs-per-min per sample; y-axis from 24000 to 29000. The line series (drawn with '+' and '.') stays between about 27500 and 28500; the samples marked O fall between about 24500 and 25500.]


[*] bisect-good sample
[O] bisect-bad sample


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Xiaolong


Attachments:
(No filename) (22.54 kB)
config-4.17.0-rc4-00028-g316ba57 (167.06 kB)
job-script (7.52 kB)
job.yaml (5.22 kB)
reproduce (728.00 B)

2018-06-04 15:51:19

by Jens Axboe

Subject: Re: [lkp-robot] [brd] 316ba5736c: aim7.jobs-per-min -11.2% regression

On 6/3/18 11:52 PM, kernel test robot wrote:
>
> Greetings,
>
> FYI, we noticed a -11.2% regression of aim7.jobs-per-min due to commit:
>
>
> commit: 316ba5736c9caa5dbcd84085989862d2df57431d ("brd: Mark as non-rotational")
> https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-4.18/block
>
> in testcase: aim7
> on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
> with following parameters:
>
> disk: 1BRD_48G
> fs: btrfs
> test: disk_rw
> load: 1500
> cpufreq_governor: performance

Does this also happen on e.g. ext4 or xfs? If not, it might point to something in
btrfs that ends up being worse for a device that isn't rotational.
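
When repeating the comparison on other filesystems, it helps to confirm what the block layer reports for the device first. The flag the commit changes is exposed in sysfs as queue/rotational (1 for rotational, 0 for non-rotational). The sketch below reads it; the function names are hypothetical, and the `sysfs_root` parameter exists only so the code can be exercised against a fake directory tree.

```python
from pathlib import Path


def is_rotational(device, sysfs_root="/sys"):
    """Return True if <sysfs_root>/block/<device>/queue/rotational reads 1."""
    flag = Path(sysfs_root) / "block" / device / "queue" / "rotational"
    return flag.read_text().strip() == "1"


def rotational_flags(sysfs_root="/sys"):
    """Map each block device that exposes the flag to its rotational setting."""
    flags = {}
    block_dir = Path(sysfs_root) / "block"
    if not block_dir.is_dir():
        return flags  # no sysfs block directory here (e.g. not Linux)
    for dev in sorted(block_dir.iterdir()):
        if (dev / "queue" / "rotational").is_file():
            flags[dev.name] = is_rotational(dev.name, sysfs_root)
    return flags


if __name__ == "__main__":
    for name, rotational in rotational_flags().items():
        print(f"{name}: {'rotational' if rotational else 'non-rotational'}")
```

On a kernel with the commit applied, a brd device such as ram0 would be expected to show up as non-rotational. This only confirms the flag's value; whether any btrfs code path depends on it is the open question here.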

CC'ing the btrfs guys, and leaving the rest of the email below.

> test-description: AIM7 is a traditional UNIX system-level benchmark suite used to test and measure the performance of multiuser systems.
> test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
>
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
> =========================================================================================
> compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
> gcc-7/performance/1BRD_48G/btrfs/x86_64-rhel-7.2/1500/debian-x86_64-2016-08-31.cgz/lkp-ivb-ep01/disk_rw/aim7
>
> commit:
> 522a777566 ("block: consolidate struct request timestamp fields")
> 316ba5736c ("brd: Mark as non-rotational")
>
> 522a777566f56696 316ba5736c9caa5dbcd8408598
> ---------------- --------------------------
> %stddev %change %stddev
> \ | \
> 28321 -11.2% 25147 aim7.jobs-per-min
> 318.19 +12.6% 358.23 aim7.time.elapsed_time
> 318.19 +12.6% 358.23 aim7.time.elapsed_time.max
> 1437526 ± 2% +14.6% 1646849 ± 2% aim7.time.involuntary_context_switches
> 11986 +14.2% 13691 aim7.time.system_time
> 73.06 ± 2% -3.6% 70.43 aim7.time.user_time
> 2449470 ± 2% -25.0% 1837521 ± 4% aim7.time.voluntary_context_switches
> 20.25 ± 58% +1681.5% 360.75 ±109% numa-meminfo.node1.Mlocked
> 456062 -16.3% 381859 softirqs.SCHED
> 9015 ± 7% -21.3% 7098 ± 22% meminfo.CmaFree
> 47.50 ± 58% +1355.8% 691.50 ± 92% meminfo.Mlocked
> 5.24 ± 3% -1.2 3.99 ± 2% mpstat.cpu.idle%
> 0.61 ± 2% -0.1 0.52 ± 2% mpstat.cpu.usr%
> 16627 +12.8% 18762 ± 4% slabinfo.Acpi-State.active_objs
> 16627 +12.9% 18775 ± 4% slabinfo.Acpi-State.num_objs
> 57.00 ± 2% +17.5% 67.00 vmstat.procs.r
> 20936 -24.8% 15752 ± 2% vmstat.system.cs
> 45474 -1.7% 44681 vmstat.system.in
> 6.50 ± 59% +1157.7% 81.75 ± 75% numa-vmstat.node0.nr_mlock
> 242870 ± 3% +13.2% 274913 ± 7% numa-vmstat.node0.nr_written
> 2278 ± 7% -22.6% 1763 ± 21% numa-vmstat.node1.nr_free_cma
> 4.75 ± 58% +1789.5% 89.75 ±109% numa-vmstat.node1.nr_mlock
> 88018135 ± 3% -48.9% 44980457 ± 7% cpuidle.C1.time
> 1398288 ± 3% -51.1% 683493 ± 9% cpuidle.C1.usage
> 3499814 ± 2% -38.5% 2153158 ± 5% cpuidle.C1E.time
> 52722 ± 4% -45.6% 28692 ± 6% cpuidle.C1E.usage
> 9865857 ± 3% -40.1% 5905155 ± 5% cpuidle.C3.time
> 69656 ± 2% -42.6% 39990 ± 5% cpuidle.C3.usage
> 590856 ± 2% -12.3% 517910 cpuidle.C6.usage
> 46160 ± 7% -53.7% 21372 ± 11% cpuidle.POLL.time
> 1716 ± 7% -46.6% 916.25 ± 14% cpuidle.POLL.usage
> 197656 +4.1% 205732 proc-vmstat.nr_active_file
> 191867 +4.1% 199647 proc-vmstat.nr_dirty
> 509282 +1.6% 517318 proc-vmstat.nr_file_pages
> 2282 ± 8% -24.4% 1725 ± 22% proc-vmstat.nr_free_cma
> 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_inactive_file
> 11.50 ± 58% +1397.8% 172.25 ± 93% proc-vmstat.nr_mlock
> 970355 ± 4% +14.6% 1111549 ± 8% proc-vmstat.nr_written
> 197984 +4.1% 206034 proc-vmstat.nr_zone_active_file
> 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_zone_inactive_file
> 192282 +4.1% 200126 proc-vmstat.nr_zone_write_pending
> 7901465 ± 3% -14.0% 6795016 ± 16% proc-vmstat.pgalloc_movable
> 886101 +10.2% 976329 proc-vmstat.pgfault
> 2.169e+12 +15.2% 2.497e+12 perf-stat.branch-instructions
> 0.41 -0.1 0.35 perf-stat.branch-miss-rate%
> 31.19 ± 2% +1.6 32.82 perf-stat.cache-miss-rate%
> 9.116e+09 +8.3% 9.869e+09 perf-stat.cache-misses
> 2.924e+10 +2.9% 3.008e+10 ± 2% perf-stat.cache-references
> 6712739 ± 2% -15.4% 5678643 ± 2% perf-stat.context-switches
> 4.02 +2.7% 4.13 perf-stat.cpi
> 3.761e+13 +17.3% 4.413e+13 perf-stat.cpu-cycles
> 606958 -13.7% 523758 ± 2% perf-stat.cpu-migrations
> 2.476e+12 +13.4% 2.809e+12 perf-stat.dTLB-loads
> 0.18 ± 2% -0.0 0.16 ± 9% perf-stat.dTLB-store-miss-rate%
> 1.079e+09 ± 2% -9.6% 9.755e+08 ± 9% perf-stat.dTLB-store-misses
> 5.933e+11 +1.6% 6.029e+11 perf-stat.dTLB-stores
> 9.349e+12 +14.2% 1.068e+13 perf-stat.instructions
> 11247 ± 11% +19.8% 13477 ± 9% perf-stat.instructions-per-iTLB-miss
> 0.25 -2.6% 0.24 perf-stat.ipc
> 865561 +10.3% 954350 perf-stat.minor-faults
> 2.901e+09 ± 3% +9.8% 3.186e+09 ± 3% perf-stat.node-load-misses
> 3.682e+09 ± 3% +11.0% 4.088e+09 ± 3% perf-stat.node-loads
> 3.778e+09 +4.8% 3.959e+09 ± 2% perf-stat.node-store-misses
> 5.079e+09 +6.4% 5.402e+09 perf-stat.node-stores
> 865565 +10.3% 954352 perf-stat.page-faults
> 51.75 ± 5% -12.5% 45.30 ± 10% sched_debug.cfs_rq:/.load_avg.avg
> 316.35 ± 3% +17.2% 370.81 ± 8% sched_debug.cfs_rq:/.util_est_enqueued.stddev
> 15294 ± 30% +234.9% 51219 ± 76% sched_debug.cpu.avg_idle.min
> 299443 ± 3% -7.3% 277566 ± 5% sched_debug.cpu.avg_idle.stddev
> 1182 ± 19% -26.3% 872.02 ± 13% sched_debug.cpu.nr_load_updates.stddev
> 1.22 ± 8% +21.7% 1.48 ± 6% sched_debug.cpu.nr_running.avg
> 2.75 ± 10% +26.2% 3.47 ± 6% sched_debug.cpu.nr_running.max
> 0.58 ± 7% +24.2% 0.73 ± 6% sched_debug.cpu.nr_running.stddev
> 77148 -20.0% 61702 ± 7% sched_debug.cpu.nr_switches.avg
> 70024 -24.8% 52647 ± 8% sched_debug.cpu.nr_switches.min
> 6662 ± 6% +61.9% 10789 ± 24% sched_debug.cpu.nr_switches.stddev
> 80.45 ± 18% -19.1% 65.05 ± 6% sched_debug.cpu.nr_uninterruptible.stddev
> 76819 -19.3% 62008 ± 8% sched_debug.cpu.sched_count.avg
> 70616 -23.5% 53996 ± 8% sched_debug.cpu.sched_count.min
> 5494 ± 9% +85.3% 10179 ± 26% sched_debug.cpu.sched_count.stddev
> 16936 -52.9% 7975 ± 9% sched_debug.cpu.sched_goidle.avg
> 19281 -49.9% 9666 ± 7% sched_debug.cpu.sched_goidle.max
> 15417 -54.8% 6962 ± 10% sched_debug.cpu.sched_goidle.min
> 875.00 ± 6% -35.0% 569.09 ± 13% sched_debug.cpu.sched_goidle.stddev
> 40332 -23.5% 30851 ± 7% sched_debug.cpu.ttwu_count.avg
> 35074 -26.3% 25833 ± 6% sched_debug.cpu.ttwu_count.min
> 3239 ± 8% +67.4% 5422 ± 28% sched_debug.cpu.ttwu_count.stddev
> 5232 +27.4% 6665 ± 13% sched_debug.cpu.ttwu_local.avg
> 15877 ± 12% +77.5% 28184 ± 27% sched_debug.cpu.ttwu_local.max
> 2530 ± 10% +95.9% 4956 ± 27% sched_debug.cpu.ttwu_local.stddev
> 2.52 ± 7% -0.6 1.95 ± 3% perf-profile.calltrace.cycles-pp.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> 1.48 ± 12% -0.5 1.01 ± 4% perf-profile.calltrace.cycles-pp.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_search_slot.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write
> 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter
> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.__dentry_kill.dentry_kill.dput.__fput.task_work_run
> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dentry_kill.dput.__fput
> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.dentry_kill.dput.__fput.task_work_run.exit_to_usermode_loop
> 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.btrfs_evict_inode.evict.__dentry_kill.dentry_kill.dput
> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.dput.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64
> 1.69 -0.1 1.54 ± 2% perf-profile.calltrace.cycles-pp.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter
> 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> 0.71 ± 6% -0.1 0.61 ± 2% perf-profile.calltrace.cycles-pp.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write
> 0.69 ± 6% -0.1 0.60 ± 2% perf-profile.calltrace.cycles-pp.btrfs_clear_bit_hook.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need
> 96.77 +0.6 97.33 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
> 0.00 +0.6 0.56 ± 3% perf-profile.calltrace.cycles-pp.can_overcommit.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
> 96.72 +0.6 97.29 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 43.13 +0.8 43.91 perf-profile.calltrace.cycles-pp.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> 42.37 +0.8 43.16 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write
> 43.11 +0.8 43.89 perf-profile.calltrace.cycles-pp.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> 42.96 +0.8 43.77 perf-profile.calltrace.cycles-pp._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter
> 95.28 +0.9 96.23 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 95.22 +1.0 96.18 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 94.88 +1.0 95.85 perf-profile.calltrace.cycles-pp.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 94.83 +1.0 95.80 perf-profile.calltrace.cycles-pp.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write.do_syscall_64
> 94.51 +1.0 95.50 perf-profile.calltrace.cycles-pp.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write
> 42.44 +1.1 43.52 perf-profile.calltrace.cycles-pp._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
> 42.09 +1.1 43.18 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write
> 44.07 +1.2 45.29 perf-profile.calltrace.cycles-pp.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> 43.42 +1.3 44.69 perf-profile.calltrace.cycles-pp.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> 2.06 ± 18% -0.9 1.21 ± 6% perf-profile.children.cycles-pp.btrfs_search_slot
> 2.54 ± 7% -0.6 1.96 ± 3% perf-profile.children.cycles-pp.btrfs_dirty_pages
> 1.05 ± 24% -0.5 0.52 ± 9% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> 1.50 ± 12% -0.5 1.03 ± 4% perf-profile.children.cycles-pp.btrfs_get_extent
> 1.22 ± 15% -0.4 0.79 ± 8% perf-profile.children.cycles-pp.btrfs_lookup_file_extent
> 0.81 ± 5% -0.4 0.41 ± 6% perf-profile.children.cycles-pp.btrfs_calc_reclaim_metadata_size
> 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_lock_root_node
> 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_tree_lock
> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.__dentry_kill
> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.evict
> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.dentry_kill
> 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.btrfs_evict_inode
> 0.91 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.exit_to_usermode_loop
> 0.52 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.do_idle
> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.task_work_run
> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.__fput
> 0.90 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.dput
> 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.secondary_startup_64
> 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.cpu_startup_entry
> 0.50 ± 21% -0.3 0.17 ± 16% perf-profile.children.cycles-pp.start_secondary
> 0.47 ± 20% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.cpuidle_enter_state
> 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.intel_idle
> 0.61 ± 20% -0.3 0.36 ± 11% perf-profile.children.cycles-pp.btrfs_tree_read_lock
> 0.47 ± 26% -0.3 0.21 ± 10% perf-profile.children.cycles-pp.prepare_to_wait_event
> 0.64 ± 18% -0.2 0.39 ± 9% perf-profile.children.cycles-pp.btrfs_read_lock_root_node
> 0.40 ± 22% -0.2 0.21 ± 5% perf-profile.children.cycles-pp.btrfs_clear_path_blocking
> 0.38 ± 23% -0.2 0.19 ± 13% perf-profile.children.cycles-pp.finish_wait
> 1.51 ± 3% -0.2 1.35 ± 2% perf-profile.children.cycles-pp.__clear_extent_bit
> 1.71 -0.1 1.56 ± 2% perf-profile.children.cycles-pp.lock_and_cleanup_extent_if_need
> 0.29 ± 25% -0.1 0.15 ± 10% perf-profile.children.cycles-pp.btrfs_orphan_del
> 0.27 ± 27% -0.1 0.12 ± 8% perf-profile.children.cycles-pp.btrfs_del_orphan_item
> 0.33 ± 18% -0.1 0.19 ± 9% perf-profile.children.cycles-pp.queued_read_lock_slowpath
> 0.33 ± 19% -0.1 0.20 ± 4% perf-profile.children.cycles-pp.__wake_up_common_lock
> 0.45 ± 15% -0.1 0.34 ± 2% perf-profile.children.cycles-pp.btrfs_alloc_data_chunk_ondemand
> 0.47 ± 16% -0.1 0.36 ± 4% perf-profile.children.cycles-pp.btrfs_check_data_free_space
> 0.91 ± 4% -0.1 0.81 ± 3% perf-profile.children.cycles-pp.clear_extent_bit
> 1.07 ± 5% -0.1 0.97 perf-profile.children.cycles-pp.__set_extent_bit
> 0.77 ± 6% -0.1 0.69 ± 3% perf-profile.children.cycles-pp.btrfs_clear_bit_hook
> 0.17 ± 20% -0.1 0.08 ± 10% perf-profile.children.cycles-pp.queued_write_lock_slowpath
> 0.16 ± 22% -0.1 0.08 ± 24% perf-profile.children.cycles-pp.btrfs_lookup_inode
> 0.21 ± 17% -0.1 0.14 ± 19% perf-profile.children.cycles-pp.__btrfs_update_delayed_inode
> 0.26 ± 12% -0.1 0.18 ± 13% perf-profile.children.cycles-pp.btrfs_async_run_delayed_root
> 0.52 ± 5% -0.1 0.45 perf-profile.children.cycles-pp.set_extent_bit
> 0.45 ± 5% -0.1 0.40 ± 3% perf-profile.children.cycles-pp.alloc_extent_state
> 0.11 ± 17% -0.1 0.06 ± 11% perf-profile.children.cycles-pp.btrfs_clear_lock_blocking_rw
> 0.28 ± 9% -0.0 0.23 ± 3% perf-profile.children.cycles-pp.btrfs_drop_pages
> 0.07 -0.0 0.03 ±100% perf-profile.children.cycles-pp.btrfs_set_lock_blocking_rw
> 0.39 ± 3% -0.0 0.34 ± 3% perf-profile.children.cycles-pp.get_alloc_profile
> 0.33 ± 7% -0.0 0.29 perf-profile.children.cycles-pp.btrfs_set_extent_delalloc
> 0.38 ± 2% -0.0 0.35 ± 4% perf-profile.children.cycles-pp.__set_page_dirty_nobuffers
> 0.49 ± 3% -0.0 0.46 ± 3% perf-profile.children.cycles-pp.pagecache_get_page
> 0.18 ± 4% -0.0 0.15 ± 2% perf-profile.children.cycles-pp.truncate_inode_pages_range
> 0.08 ± 5% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.btrfs_set_path_blocking
> 0.08 ± 6% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.truncate_cleanup_page
> 0.80 ± 4% +0.2 0.95 ± 2% perf-profile.children.cycles-pp.can_overcommit
> 96.84 +0.5 97.37 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> 96.80 +0.5 97.35 perf-profile.children.cycles-pp.do_syscall_64
> 43.34 +0.8 44.17 perf-profile.children.cycles-pp.btrfs_inode_rsv_release
> 43.49 +0.8 44.32 perf-profile.children.cycles-pp.block_rsv_release_bytes
> 95.32 +0.9 96.26 perf-profile.children.cycles-pp.ksys_write
> 95.26 +0.9 96.20 perf-profile.children.cycles-pp.vfs_write
> 94.91 +1.0 95.88 perf-profile.children.cycles-pp.__vfs_write
> 94.84 +1.0 95.81 perf-profile.children.cycles-pp.btrfs_file_write_iter
> 94.55 +1.0 95.55 perf-profile.children.cycles-pp.__btrfs_buffered_write
> 86.68 +1.0 87.70 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> 44.08 +1.2 45.31 perf-profile.children.cycles-pp.btrfs_delalloc_reserve_metadata
> 43.49 +1.3 44.77 perf-profile.children.cycles-pp.reserve_metadata_bytes
> 87.59 +1.8 89.38 perf-profile.children.cycles-pp._raw_spin_lock
> 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.self.cycles-pp.intel_idle
> 0.33 ± 6% -0.1 0.18 ± 6% perf-profile.self.cycles-pp.get_alloc_profile
> 0.27 ± 8% -0.0 0.22 ± 4% perf-profile.self.cycles-pp.btrfs_drop_pages
> 0.07 -0.0 0.03 ±100% perf-profile.self.cycles-pp.btrfs_set_lock_blocking_rw
> 0.14 ± 5% -0.0 0.12 ± 6% perf-profile.self.cycles-pp.clear_page_dirty_for_io
> 0.09 ± 5% -0.0 0.07 ± 10% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> 0.17 ± 4% +0.1 0.23 ± 3% perf-profile.self.cycles-pp.reserve_metadata_bytes
> 0.31 ± 7% +0.1 0.45 ± 2% perf-profile.self.cycles-pp.can_overcommit
> 86.35 +1.0 87.39 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>
>
>
> aim7.jobs-per-min
>
> [ASCII plot omitted: y-axis 24000 to 29000 jobs per minute; bisect-good samples lie at roughly 27500-28500, bisect-bad samples at roughly 24500-25500]
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Xiaolong
>


--
Jens Axboe


2018-07-10 05:31:28

by Kemi Wang

Subject: Re: [LKP] [lkp-robot] [brd] 316ba5736c: aim7.jobs-per-min -11.2% regression

Hi SeongJae,
Do you have any input on this regression? Thanks.

On 2018-06-04 13:52, kernel test robot wrote:
>
> Greeting,
>
> FYI, we noticed a -11.2% regression of aim7.jobs-per-min due to commit:
>
>
> commit: 316ba5736c9caa5dbcd84085989862d2df57431d ("brd: Mark as non-rotational")
> https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-4.18/block
>
> in testcase: aim7
> on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
> with following parameters:
>
> disk: 1BRD_48G
> fs: btrfs
> test: disk_rw
> load: 1500
> cpufreq_governor: performance
>
> test-description: AIM7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of multiuser system.
> test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
>
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
> =========================================================================================
> compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
> gcc-7/performance/1BRD_48G/btrfs/x86_64-rhel-7.2/1500/debian-x86_64-2016-08-31.cgz/lkp-ivb-ep01/disk_rw/aim7
>
> commit:
> 522a777566 ("block: consolidate struct request timestamp fields")
> 316ba5736c ("brd: Mark as non-rotational")
>
> 522a777566f56696 316ba5736c9caa5dbcd8408598
> ---------------- --------------------------
> %stddev %change %stddev
> \ | \
> 28321 -11.2% 25147 aim7.jobs-per-min
> 318.19 +12.6% 358.23 aim7.time.elapsed_time
> 318.19 +12.6% 358.23 aim7.time.elapsed_time.max
> 1437526 ± 2% +14.6% 1646849 ± 2% aim7.time.involuntary_context_switches
> 11986 +14.2% 13691 aim7.time.system_time
> 73.06 ± 2% -3.6% 70.43 aim7.time.user_time
> 2449470 ± 2% -25.0% 1837521 ± 4% aim7.time.voluntary_context_switches
> 20.25 ± 58% +1681.5% 360.75 ±109% numa-meminfo.node1.Mlocked
> 456062 -16.3% 381859 softirqs.SCHED
> 9015 ± 7% -21.3% 7098 ± 22% meminfo.CmaFree
> 47.50 ± 58% +1355.8% 691.50 ± 92% meminfo.Mlocked
> 5.24 ± 3% -1.2 3.99 ± 2% mpstat.cpu.idle%
> 0.61 ± 2% -0.1 0.52 ± 2% mpstat.cpu.usr%
> 16627 +12.8% 18762 ± 4% slabinfo.Acpi-State.active_objs
> 16627 +12.9% 18775 ± 4% slabinfo.Acpi-State.num_objs
> 57.00 ± 2% +17.5% 67.00 vmstat.procs.r
> 20936 -24.8% 15752 ± 2% vmstat.system.cs
> 45474 -1.7% 44681 vmstat.system.in
> 6.50 ± 59% +1157.7% 81.75 ± 75% numa-vmstat.node0.nr_mlock
> 242870 ± 3% +13.2% 274913 ± 7% numa-vmstat.node0.nr_written
> 2278 ± 7% -22.6% 1763 ± 21% numa-vmstat.node1.nr_free_cma
> 4.75 ± 58% +1789.5% 89.75 ±109% numa-vmstat.node1.nr_mlock
> 88018135 ± 3% -48.9% 44980457 ± 7% cpuidle.C1.time
> 1398288 ± 3% -51.1% 683493 ± 9% cpuidle.C1.usage
> 3499814 ± 2% -38.5% 2153158 ± 5% cpuidle.C1E.time
> 52722 ± 4% -45.6% 28692 ± 6% cpuidle.C1E.usage
> 9865857 ± 3% -40.1% 5905155 ± 5% cpuidle.C3.time
> 69656 ± 2% -42.6% 39990 ± 5% cpuidle.C3.usage
> 590856 ± 2% -12.3% 517910 cpuidle.C6.usage
> 46160 ± 7% -53.7% 21372 ± 11% cpuidle.POLL.time
> 1716 ± 7% -46.6% 916.25 ± 14% cpuidle.POLL.usage
> 197656 +4.1% 205732 proc-vmstat.nr_active_file
> 191867 +4.1% 199647 proc-vmstat.nr_dirty
> 509282 +1.6% 517318 proc-vmstat.nr_file_pages
> 2282 ± 8% -24.4% 1725 ± 22% proc-vmstat.nr_free_cma
> 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_inactive_file
> 11.50 ± 58% +1397.8% 172.25 ± 93% proc-vmstat.nr_mlock
> 970355 ± 4% +14.6% 1111549 ± 8% proc-vmstat.nr_written
> 197984 +4.1% 206034 proc-vmstat.nr_zone_active_file
> 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_zone_inactive_file
> 192282 +4.1% 200126 proc-vmstat.nr_zone_write_pending
> 7901465 ± 3% -14.0% 6795016 ± 16% proc-vmstat.pgalloc_movable
> 886101 +10.2% 976329 proc-vmstat.pgfault
> 2.169e+12 +15.2% 2.497e+12 perf-stat.branch-instructions
> 0.41 -0.1 0.35 perf-stat.branch-miss-rate%
> 31.19 ± 2% +1.6 32.82 perf-stat.cache-miss-rate%
> 9.116e+09 +8.3% 9.869e+09 perf-stat.cache-misses
> 2.924e+10 +2.9% 3.008e+10 ± 2% perf-stat.cache-references
> 6712739 ± 2% -15.4% 5678643 ± 2% perf-stat.context-switches
> 4.02 +2.7% 4.13 perf-stat.cpi
> 3.761e+13 +17.3% 4.413e+13 perf-stat.cpu-cycles
> 606958 -13.7% 523758 ± 2% perf-stat.cpu-migrations
> 2.476e+12 +13.4% 2.809e+12 perf-stat.dTLB-loads
> 0.18 ± 2% -0.0 0.16 ± 9% perf-stat.dTLB-store-miss-rate%
> 1.079e+09 ± 2% -9.6% 9.755e+08 ± 9% perf-stat.dTLB-store-misses
> 5.933e+11 +1.6% 6.029e+11 perf-stat.dTLB-stores
> 9.349e+12 +14.2% 1.068e+13 perf-stat.instructions
> 11247 ± 11% +19.8% 13477 ± 9% perf-stat.instructions-per-iTLB-miss
> 0.25 -2.6% 0.24 perf-stat.ipc
> 865561 +10.3% 954350 perf-stat.minor-faults
> 2.901e+09 ± 3% +9.8% 3.186e+09 ± 3% perf-stat.node-load-misses
> 3.682e+09 ± 3% +11.0% 4.088e+09 ± 3% perf-stat.node-loads
> 3.778e+09 +4.8% 3.959e+09 ± 2% perf-stat.node-store-misses
> 5.079e+09 +6.4% 5.402e+09 perf-stat.node-stores
> 865565 +10.3% 954352 perf-stat.page-faults
> 51.75 ± 5% -12.5% 45.30 ± 10% sched_debug.cfs_rq:/.load_avg.avg
> 316.35 ± 3% +17.2% 370.81 ± 8% sched_debug.cfs_rq:/.util_est_enqueued.stddev
> 15294 ± 30% +234.9% 51219 ± 76% sched_debug.cpu.avg_idle.min
> 299443 ± 3% -7.3% 277566 ± 5% sched_debug.cpu.avg_idle.stddev
> 1182 ± 19% -26.3% 872.02 ± 13% sched_debug.cpu.nr_load_updates.stddev
> 1.22 ± 8% +21.7% 1.48 ± 6% sched_debug.cpu.nr_running.avg
> 2.75 ± 10% +26.2% 3.47 ± 6% sched_debug.cpu.nr_running.max
> 0.58 ± 7% +24.2% 0.73 ± 6% sched_debug.cpu.nr_running.stddev
> 77148 -20.0% 61702 ± 7% sched_debug.cpu.nr_switches.avg
> 70024 -24.8% 52647 ± 8% sched_debug.cpu.nr_switches.min
> 6662 ± 6% +61.9% 10789 ± 24% sched_debug.cpu.nr_switches.stddev
> 80.45 ± 18% -19.1% 65.05 ± 6% sched_debug.cpu.nr_uninterruptible.stddev
> 76819 -19.3% 62008 ± 8% sched_debug.cpu.sched_count.avg
> 70616 -23.5% 53996 ± 8% sched_debug.cpu.sched_count.min
> 5494 ± 9% +85.3% 10179 ± 26% sched_debug.cpu.sched_count.stddev
> 16936 -52.9% 7975 ± 9% sched_debug.cpu.sched_goidle.avg
> 19281 -49.9% 9666 ± 7% sched_debug.cpu.sched_goidle.max
> 15417 -54.8% 6962 ± 10% sched_debug.cpu.sched_goidle.min
> 875.00 ± 6% -35.0% 569.09 ± 13% sched_debug.cpu.sched_goidle.stddev
> 40332 -23.5% 30851 ± 7% sched_debug.cpu.ttwu_count.avg
> 35074 -26.3% 25833 ± 6% sched_debug.cpu.ttwu_count.min
> 3239 ± 8% +67.4% 5422 ± 28% sched_debug.cpu.ttwu_count.stddev
> 5232 +27.4% 6665 ± 13% sched_debug.cpu.ttwu_local.avg
> 15877 ± 12% +77.5% 28184 ± 27% sched_debug.cpu.ttwu_local.max
> 2530 ± 10% +95.9% 4956 ± 27% sched_debug.cpu.ttwu_local.stddev
> 2.52 ± 7% -0.6 1.95 ± 3% perf-profile.calltrace.cycles-pp.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> 1.48 ± 12% -0.5 1.01 ± 4% perf-profile.calltrace.cycles-pp.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_search_slot.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write
> 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter
> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.__dentry_kill.dentry_kill.dput.__fput.task_work_run
> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dentry_kill.dput.__fput
> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.dentry_kill.dput.__fput.task_work_run.exit_to_usermode_loop
> 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.btrfs_evict_inode.evict.__dentry_kill.dentry_kill.dput
> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.dput.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64
> 1.69 -0.1 1.54 ± 2% perf-profile.calltrace.cycles-pp.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter
> 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> 0.71 ± 6% -0.1 0.61 ± 2% perf-profile.calltrace.cycles-pp.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write
> 0.69 ± 6% -0.1 0.60 ± 2% perf-profile.calltrace.cycles-pp.btrfs_clear_bit_hook.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need
> 96.77 +0.6 97.33 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
> 0.00 +0.6 0.56 ± 3% perf-profile.calltrace.cycles-pp.can_overcommit.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
> 96.72 +0.6 97.29 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 43.13 +0.8 43.91 perf-profile.calltrace.cycles-pp.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> 42.37 +0.8 43.16 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write
> 43.11 +0.8 43.89 perf-profile.calltrace.cycles-pp.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> 42.96 +0.8 43.77 perf-profile.calltrace.cycles-pp._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter
> 95.28 +0.9 96.23 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 95.22 +1.0 96.18 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 94.88 +1.0 95.85 perf-profile.calltrace.cycles-pp.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> 94.83 +1.0 95.80 perf-profile.calltrace.cycles-pp.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write.do_syscall_64
> 94.51 +1.0 95.50 perf-profile.calltrace.cycles-pp.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write
> 42.44 +1.1 43.52 perf-profile.calltrace.cycles-pp._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
> 42.09 +1.1 43.18 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write
> 44.07 +1.2 45.29 perf-profile.calltrace.cycles-pp.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> 43.42 +1.3 44.69 perf-profile.calltrace.cycles-pp.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> 2.06 ± 18% -0.9 1.21 ± 6% perf-profile.children.cycles-pp.btrfs_search_slot
> 2.54 ± 7% -0.6 1.96 ± 3% perf-profile.children.cycles-pp.btrfs_dirty_pages
> 1.05 ± 24% -0.5 0.52 ± 9% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> 1.50 ± 12% -0.5 1.03 ± 4% perf-profile.children.cycles-pp.btrfs_get_extent
> 1.22 ± 15% -0.4 0.79 ± 8% perf-profile.children.cycles-pp.btrfs_lookup_file_extent
> 0.81 ± 5% -0.4 0.41 ± 6% perf-profile.children.cycles-pp.btrfs_calc_reclaim_metadata_size
> 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_lock_root_node
> 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_tree_lock
> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.__dentry_kill
> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.evict
> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.dentry_kill
> 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.btrfs_evict_inode
> 0.91 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.exit_to_usermode_loop
> 0.52 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.do_idle
> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.task_work_run
> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.__fput
> 0.90 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.dput
> 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.secondary_startup_64
> 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.cpu_startup_entry
> 0.50 ± 21% -0.3 0.17 ± 16% perf-profile.children.cycles-pp.start_secondary
> 0.47 ± 20% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.cpuidle_enter_state
> 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.intel_idle
> 0.61 ± 20% -0.3 0.36 ± 11% perf-profile.children.cycles-pp.btrfs_tree_read_lock
> 0.47 ± 26% -0.3 0.21 ± 10% perf-profile.children.cycles-pp.prepare_to_wait_event
> 0.64 ± 18% -0.2 0.39 ± 9% perf-profile.children.cycles-pp.btrfs_read_lock_root_node
> 0.40 ± 22% -0.2 0.21 ± 5% perf-profile.children.cycles-pp.btrfs_clear_path_blocking
> 0.38 ± 23% -0.2 0.19 ± 13% perf-profile.children.cycles-pp.finish_wait
> 1.51 ± 3% -0.2 1.35 ± 2% perf-profile.children.cycles-pp.__clear_extent_bit
> 1.71 -0.1 1.56 ± 2% perf-profile.children.cycles-pp.lock_and_cleanup_extent_if_need
> 0.29 ± 25% -0.1 0.15 ± 10% perf-profile.children.cycles-pp.btrfs_orphan_del
> 0.27 ± 27% -0.1 0.12 ± 8% perf-profile.children.cycles-pp.btrfs_del_orphan_item
> 0.33 ± 18% -0.1 0.19 ± 9% perf-profile.children.cycles-pp.queued_read_lock_slowpath
> 0.33 ± 19% -0.1 0.20 ± 4% perf-profile.children.cycles-pp.__wake_up_common_lock
> 0.45 ± 15% -0.1 0.34 ± 2% perf-profile.children.cycles-pp.btrfs_alloc_data_chunk_ondemand
> 0.47 ± 16% -0.1 0.36 ± 4% perf-profile.children.cycles-pp.btrfs_check_data_free_space
> 0.91 ± 4% -0.1 0.81 ± 3% perf-profile.children.cycles-pp.clear_extent_bit
> 1.07 ± 5% -0.1 0.97 perf-profile.children.cycles-pp.__set_extent_bit
> 0.77 ± 6% -0.1 0.69 ± 3% perf-profile.children.cycles-pp.btrfs_clear_bit_hook
> 0.17 ± 20% -0.1 0.08 ± 10% perf-profile.children.cycles-pp.queued_write_lock_slowpath
> 0.16 ± 22% -0.1 0.08 ± 24% perf-profile.children.cycles-pp.btrfs_lookup_inode
> 0.21 ± 17% -0.1 0.14 ± 19% perf-profile.children.cycles-pp.__btrfs_update_delayed_inode
> 0.26 ± 12% -0.1 0.18 ± 13% perf-profile.children.cycles-pp.btrfs_async_run_delayed_root
> 0.52 ± 5% -0.1 0.45 perf-profile.children.cycles-pp.set_extent_bit
> 0.45 ± 5% -0.1 0.40 ± 3% perf-profile.children.cycles-pp.alloc_extent_state
> 0.11 ± 17% -0.1 0.06 ± 11% perf-profile.children.cycles-pp.btrfs_clear_lock_blocking_rw
> 0.28 ± 9% -0.0 0.23 ± 3% perf-profile.children.cycles-pp.btrfs_drop_pages
> 0.07 -0.0 0.03 ±100% perf-profile.children.cycles-pp.btrfs_set_lock_blocking_rw
> 0.39 ± 3% -0.0 0.34 ± 3% perf-profile.children.cycles-pp.get_alloc_profile
> 0.33 ± 7% -0.0 0.29 perf-profile.children.cycles-pp.btrfs_set_extent_delalloc
> 0.38 ± 2% -0.0 0.35 ± 4% perf-profile.children.cycles-pp.__set_page_dirty_nobuffers
> 0.49 ± 3% -0.0 0.46 ± 3% perf-profile.children.cycles-pp.pagecache_get_page
> 0.18 ± 4% -0.0 0.15 ± 2% perf-profile.children.cycles-pp.truncate_inode_pages_range
> 0.08 ± 5% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.btrfs_set_path_blocking
> 0.08 ± 6% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.truncate_cleanup_page
> 0.80 ± 4% +0.2 0.95 ± 2% perf-profile.children.cycles-pp.can_overcommit
> 96.84 +0.5 97.37 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> 96.80 +0.5 97.35 perf-profile.children.cycles-pp.do_syscall_64
> 43.34 +0.8 44.17 perf-profile.children.cycles-pp.btrfs_inode_rsv_release
> 43.49 +0.8 44.32 perf-profile.children.cycles-pp.block_rsv_release_bytes
> 95.32 +0.9 96.26 perf-profile.children.cycles-pp.ksys_write
> 95.26 +0.9 96.20 perf-profile.children.cycles-pp.vfs_write
> 94.91 +1.0 95.88 perf-profile.children.cycles-pp.__vfs_write
> 94.84 +1.0 95.81 perf-profile.children.cycles-pp.btrfs_file_write_iter
> 94.55 +1.0 95.55 perf-profile.children.cycles-pp.__btrfs_buffered_write
> 86.68 +1.0 87.70 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> 44.08 +1.2 45.31 perf-profile.children.cycles-pp.btrfs_delalloc_reserve_metadata
> 43.49 +1.3 44.77 perf-profile.children.cycles-pp.reserve_metadata_bytes
> 87.59 +1.8 89.38 perf-profile.children.cycles-pp._raw_spin_lock
> 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.self.cycles-pp.intel_idle
> 0.33 ± 6% -0.1 0.18 ± 6% perf-profile.self.cycles-pp.get_alloc_profile
> 0.27 ± 8% -0.0 0.22 ± 4% perf-profile.self.cycles-pp.btrfs_drop_pages
> 0.07 -0.0 0.03 ±100% perf-profile.self.cycles-pp.btrfs_set_lock_blocking_rw
> 0.14 ± 5% -0.0 0.12 ± 6% perf-profile.self.cycles-pp.clear_page_dirty_for_io
> 0.09 ± 5% -0.0 0.07 ± 10% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> 0.17 ± 4% +0.1 0.23 ± 3% perf-profile.self.cycles-pp.reserve_metadata_bytes
> 0.31 ± 7% +0.1 0.45 ± 2% perf-profile.self.cycles-pp.can_overcommit
> 86.35 +1.0 87.39 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>
>
>
> aim7.jobs-per-min
>
> [ASCII plot omitted: y-axis 24000 to 29000 jobs per minute; bisect-good samples lie at roughly 27500-28500, bisect-bad samples at roughly 24500-25500]
>
>
> [*] bisect-good sample
> [O] bisect-bad sample
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> Thanks,
> Xiaolong
>
>
>
>

2018-07-10 17:05:15

by SeongJae Park

Subject: Re: [LKP] [lkp-robot] [brd] 316ba5736c: aim7.jobs-per-min -11.2% regression

Oops, I only noticed this mail now. I will look into this, though it
will take some time because I will be out of the office this
week.


Thanks,
SeongJae Park
On Tue, Jul 10, 2018 at 1:30 AM kemi wrote:
>
> Hi SeongJae,
> Do you have any input on this regression? Thanks.
>
> On 2018-06-04 13:52, kernel test robot wrote:
> >
> > Greeting,
> >
> > FYI, we noticed a -11.2% regression of aim7.jobs-per-min due to commit:
> >
> >
> > commit: 316ba5736c9caa5dbcd84085989862d2df57431d ("brd: Mark as non-rotational")
> > https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-4.18/block
> >
> > in testcase: aim7
> > on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
> > with following parameters:
> >
> > disk: 1BRD_48G
> > fs: btrfs
> > test: disk_rw
> > load: 1500
> > cpufreq_governor: performance
> >
> > test-description: AIM7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of multiuser system.
> > test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
> >
> >
> >
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> >
> > =========================================================================================
> > compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
> > gcc-7/performance/1BRD_48G/btrfs/x86_64-rhel-7.2/1500/debian-x86_64-2016-08-31.cgz/lkp-ivb-ep01/disk_rw/aim7
> >
> > commit:
> > 522a777566 ("block: consolidate struct request timestamp fields")
> > 316ba5736c ("brd: Mark as non-rotational")
> >
> > 522a777566f56696 316ba5736c9caa5dbcd8408598
> > ---------------- --------------------------
> > %stddev %change %stddev
> > \ | \
> > 28321 -11.2% 25147 aim7.jobs-per-min
> > 318.19 +12.6% 358.23 aim7.time.elapsed_time
> > 318.19 +12.6% 358.23 aim7.time.elapsed_time.max
> > 1437526 ± 2% +14.6% 1646849 ± 2% aim7.time.involuntary_context_switches
> > 11986 +14.2% 13691 aim7.time.system_time
> > 73.06 ± 2% -3.6% 70.43 aim7.time.user_time
> > 2449470 ± 2% -25.0% 1837521 ± 4% aim7.time.voluntary_context_switches
> > 20.25 ± 58% +1681.5% 360.75 ±109% numa-meminfo.node1.Mlocked
> > 456062 -16.3% 381859 softirqs.SCHED
> > 9015 ± 7% -21.3% 7098 ± 22% meminfo.CmaFree
> > 47.50 ± 58% +1355.8% 691.50 ± 92% meminfo.Mlocked
> > 5.24 ± 3% -1.2 3.99 ± 2% mpstat.cpu.idle%
> > 0.61 ± 2% -0.1 0.52 ± 2% mpstat.cpu.usr%
> > 16627 +12.8% 18762 ± 4% slabinfo.Acpi-State.active_objs
> > 16627 +12.9% 18775 ± 4% slabinfo.Acpi-State.num_objs
> > 57.00 ± 2% +17.5% 67.00 vmstat.procs.r
> > 20936 -24.8% 15752 ± 2% vmstat.system.cs
> > 45474 -1.7% 44681 vmstat.system.in
> > 6.50 ± 59% +1157.7% 81.75 ± 75% numa-vmstat.node0.nr_mlock
> > 242870 ± 3% +13.2% 274913 ± 7% numa-vmstat.node0.nr_written
> > 2278 ± 7% -22.6% 1763 ± 21% numa-vmstat.node1.nr_free_cma
> > 4.75 ± 58% +1789.5% 89.75 ±109% numa-vmstat.node1.nr_mlock
> > 88018135 ± 3% -48.9% 44980457 ± 7% cpuidle.C1.time
> > 1398288 ± 3% -51.1% 683493 ± 9% cpuidle.C1.usage
> > 3499814 ± 2% -38.5% 2153158 ± 5% cpuidle.C1E.time
> > 52722 ± 4% -45.6% 28692 ± 6% cpuidle.C1E.usage
> > 9865857 ± 3% -40.1% 5905155 ± 5% cpuidle.C3.time
> > 69656 ± 2% -42.6% 39990 ± 5% cpuidle.C3.usage
> > 590856 ± 2% -12.3% 517910 cpuidle.C6.usage
> > 46160 ± 7% -53.7% 21372 ± 11% cpuidle.POLL.time
> > 1716 ± 7% -46.6% 916.25 ± 14% cpuidle.POLL.usage
> > 197656 +4.1% 205732 proc-vmstat.nr_active_file
> > 191867 +4.1% 199647 proc-vmstat.nr_dirty
> > 509282 +1.6% 517318 proc-vmstat.nr_file_pages
> > 2282 ± 8% -24.4% 1725 ± 22% proc-vmstat.nr_free_cma
> > 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_inactive_file
> > 11.50 ± 58% +1397.8% 172.25 ± 93% proc-vmstat.nr_mlock
> > 970355 ± 4% +14.6% 1111549 ± 8% proc-vmstat.nr_written
> > 197984 +4.1% 206034 proc-vmstat.nr_zone_active_file
> > 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_zone_inactive_file
> > 192282 +4.1% 200126 proc-vmstat.nr_zone_write_pending
> > 7901465 ± 3% -14.0% 6795016 ± 16% proc-vmstat.pgalloc_movable
> > 886101 +10.2% 976329 proc-vmstat.pgfault
> > 2.169e+12 +15.2% 2.497e+12 perf-stat.branch-instructions
> > 0.41 -0.1 0.35 perf-stat.branch-miss-rate%
> > 31.19 ± 2% +1.6 32.82 perf-stat.cache-miss-rate%
> > 9.116e+09 +8.3% 9.869e+09 perf-stat.cache-misses
> > 2.924e+10 +2.9% 3.008e+10 ± 2% perf-stat.cache-references
> > 6712739 ± 2% -15.4% 5678643 ± 2% perf-stat.context-switches
> > 4.02 +2.7% 4.13 perf-stat.cpi
> > 3.761e+13 +17.3% 4.413e+13 perf-stat.cpu-cycles
> > 606958 -13.7% 523758 ± 2% perf-stat.cpu-migrations
> > 2.476e+12 +13.4% 2.809e+12 perf-stat.dTLB-loads
> > 0.18 ± 2% -0.0 0.16 ± 9% perf-stat.dTLB-store-miss-rate%
> > 1.079e+09 ± 2% -9.6% 9.755e+08 ± 9% perf-stat.dTLB-store-misses
> > 5.933e+11 +1.6% 6.029e+11 perf-stat.dTLB-stores
> > 9.349e+12 +14.2% 1.068e+13 perf-stat.instructions
> > 11247 ± 11% +19.8% 13477 ± 9% perf-stat.instructions-per-iTLB-miss
> > 0.25 -2.6% 0.24 perf-stat.ipc
> > 865561 +10.3% 954350 perf-stat.minor-faults
> > 2.901e+09 ± 3% +9.8% 3.186e+09 ± 3% perf-stat.node-load-misses
> > 3.682e+09 ± 3% +11.0% 4.088e+09 ± 3% perf-stat.node-loads
> > 3.778e+09 +4.8% 3.959e+09 ± 2% perf-stat.node-store-misses
> > 5.079e+09 +6.4% 5.402e+09 perf-stat.node-stores
> > 865565 +10.3% 954352 perf-stat.page-faults
> > 51.75 ± 5% -12.5% 45.30 ± 10% sched_debug.cfs_rq:/.load_avg.avg
> > 316.35 ± 3% +17.2% 370.81 ± 8% sched_debug.cfs_rq:/.util_est_enqueued.stddev
> > 15294 ± 30% +234.9% 51219 ± 76% sched_debug.cpu.avg_idle.min
> > 299443 ± 3% -7.3% 277566 ± 5% sched_debug.cpu.avg_idle.stddev
> > 1182 ± 19% -26.3% 872.02 ± 13% sched_debug.cpu.nr_load_updates.stddev
> > 1.22 ± 8% +21.7% 1.48 ± 6% sched_debug.cpu.nr_running.avg
> > 2.75 ± 10% +26.2% 3.47 ± 6% sched_debug.cpu.nr_running.max
> > 0.58 ± 7% +24.2% 0.73 ± 6% sched_debug.cpu.nr_running.stddev
> > 77148 -20.0% 61702 ± 7% sched_debug.cpu.nr_switches.avg
> > 70024 -24.8% 52647 ± 8% sched_debug.cpu.nr_switches.min
> > 6662 ± 6% +61.9% 10789 ± 24% sched_debug.cpu.nr_switches.stddev
> > 80.45 ± 18% -19.1% 65.05 ± 6% sched_debug.cpu.nr_uninterruptible.stddev
> > 76819 -19.3% 62008 ± 8% sched_debug.cpu.sched_count.avg
> > 70616 -23.5% 53996 ± 8% sched_debug.cpu.sched_count.min
> > 5494 ± 9% +85.3% 10179 ± 26% sched_debug.cpu.sched_count.stddev
> > 16936 -52.9% 7975 ± 9% sched_debug.cpu.sched_goidle.avg
> > 19281 -49.9% 9666 ± 7% sched_debug.cpu.sched_goidle.max
> > 15417 -54.8% 6962 ± 10% sched_debug.cpu.sched_goidle.min
> > 875.00 ± 6% -35.0% 569.09 ± 13% sched_debug.cpu.sched_goidle.stddev
> > 40332 -23.5% 30851 ± 7% sched_debug.cpu.ttwu_count.avg
> > 35074 -26.3% 25833 ± 6% sched_debug.cpu.ttwu_count.min
> > 3239 ± 8% +67.4% 5422 ± 28% sched_debug.cpu.ttwu_count.stddev
> > 5232 +27.4% 6665 ± 13% sched_debug.cpu.ttwu_local.avg
> > 15877 ± 12% +77.5% 28184 ± 27% sched_debug.cpu.ttwu_local.max
> > 2530 ± 10% +95.9% 4956 ± 27% sched_debug.cpu.ttwu_local.stddev
> > 2.52 ± 7% -0.6 1.95 ± 3% perf-profile.calltrace.cycles-pp.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> > 1.48 ± 12% -0.5 1.01 ± 4% perf-profile.calltrace.cycles-pp.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> > 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_search_slot.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write
> > 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter
> > 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.__dentry_kill.dentry_kill.dput.__fput.task_work_run
> > 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dentry_kill.dput.__fput
> > 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.dentry_kill.dput.__fput.task_work_run.exit_to_usermode_loop
> > 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.btrfs_evict_inode.evict.__dentry_kill.dentry_kill.dput
> > 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.dput.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64
> > 1.69 -0.1 1.54 ± 2% perf-profile.calltrace.cycles-pp.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> > 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter
> > 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> > 0.71 ± 6% -0.1 0.61 ± 2% perf-profile.calltrace.cycles-pp.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write
> > 0.69 ± 6% -0.1 0.60 ± 2% perf-profile.calltrace.cycles-pp.btrfs_clear_bit_hook.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need
> > 96.77 +0.6 97.33 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 0.00 +0.6 0.56 ± 3% perf-profile.calltrace.cycles-pp.can_overcommit.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
> > 96.72 +0.6 97.29 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 43.13 +0.8 43.91 perf-profile.calltrace.cycles-pp.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> > 42.37 +0.8 43.16 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write
> > 43.11 +0.8 43.89 perf-profile.calltrace.cycles-pp.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> > 42.96 +0.8 43.77 perf-profile.calltrace.cycles-pp._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter
> > 95.28 +0.9 96.23 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 95.22 +1.0 96.18 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 94.88 +1.0 95.85 perf-profile.calltrace.cycles-pp.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 94.83 +1.0 95.80 perf-profile.calltrace.cycles-pp.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write.do_syscall_64
> > 94.51 +1.0 95.50 perf-profile.calltrace.cycles-pp.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write
> > 42.44 +1.1 43.52 perf-profile.calltrace.cycles-pp._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
> > 42.09 +1.1 43.18 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write
> > 44.07 +1.2 45.29 perf-profile.calltrace.cycles-pp.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> > 43.42 +1.3 44.69 perf-profile.calltrace.cycles-pp.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> > 2.06 ± 18% -0.9 1.21 ± 6% perf-profile.children.cycles-pp.btrfs_search_slot
> > 2.54 ± 7% -0.6 1.96 ± 3% perf-profile.children.cycles-pp.btrfs_dirty_pages
> > 1.05 ± 24% -0.5 0.52 ± 9% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> > 1.50 ± 12% -0.5 1.03 ± 4% perf-profile.children.cycles-pp.btrfs_get_extent
> > 1.22 ± 15% -0.4 0.79 ± 8% perf-profile.children.cycles-pp.btrfs_lookup_file_extent
> > 0.81 ± 5% -0.4 0.41 ± 6% perf-profile.children.cycles-pp.btrfs_calc_reclaim_metadata_size
> > 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_lock_root_node
> > 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_tree_lock
> > 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.__dentry_kill
> > 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.evict
> > 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.dentry_kill
> > 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.btrfs_evict_inode
> > 0.91 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.exit_to_usermode_loop
> > 0.52 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.do_idle
> > 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.task_work_run
> > 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.__fput
> > 0.90 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.dput
> > 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.secondary_startup_64
> > 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.cpu_startup_entry
> > 0.50 ± 21% -0.3 0.17 ± 16% perf-profile.children.cycles-pp.start_secondary
> > 0.47 ± 20% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.cpuidle_enter_state
> > 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.intel_idle
> > 0.61 ± 20% -0.3 0.36 ± 11% perf-profile.children.cycles-pp.btrfs_tree_read_lock
> > 0.47 ± 26% -0.3 0.21 ± 10% perf-profile.children.cycles-pp.prepare_to_wait_event
> > 0.64 ± 18% -0.2 0.39 ± 9% perf-profile.children.cycles-pp.btrfs_read_lock_root_node
> > 0.40 ± 22% -0.2 0.21 ± 5% perf-profile.children.cycles-pp.btrfs_clear_path_blocking
> > 0.38 ± 23% -0.2 0.19 ± 13% perf-profile.children.cycles-pp.finish_wait
> > 1.51 ± 3% -0.2 1.35 ± 2% perf-profile.children.cycles-pp.__clear_extent_bit
> > 1.71 -0.1 1.56 ± 2% perf-profile.children.cycles-pp.lock_and_cleanup_extent_if_need
> > 0.29 ± 25% -0.1 0.15 ± 10% perf-profile.children.cycles-pp.btrfs_orphan_del
> > 0.27 ± 27% -0.1 0.12 ± 8% perf-profile.children.cycles-pp.btrfs_del_orphan_item
> > 0.33 ± 18% -0.1 0.19 ± 9% perf-profile.children.cycles-pp.queued_read_lock_slowpath
> > 0.33 ± 19% -0.1 0.20 ± 4% perf-profile.children.cycles-pp.__wake_up_common_lock
> > 0.45 ± 15% -0.1 0.34 ± 2% perf-profile.children.cycles-pp.btrfs_alloc_data_chunk_ondemand
> > 0.47 ± 16% -0.1 0.36 ± 4% perf-profile.children.cycles-pp.btrfs_check_data_free_space
> > 0.91 ± 4% -0.1 0.81 ± 3% perf-profile.children.cycles-pp.clear_extent_bit
> > 1.07 ± 5% -0.1 0.97 perf-profile.children.cycles-pp.__set_extent_bit
> > 0.77 ± 6% -0.1 0.69 ± 3% perf-profile.children.cycles-pp.btrfs_clear_bit_hook
> > 0.17 ± 20% -0.1 0.08 ± 10% perf-profile.children.cycles-pp.queued_write_lock_slowpath
> > 0.16 ± 22% -0.1 0.08 ± 24% perf-profile.children.cycles-pp.btrfs_lookup_inode
> > 0.21 ± 17% -0.1 0.14 ± 19% perf-profile.children.cycles-pp.__btrfs_update_delayed_inode
> > 0.26 ± 12% -0.1 0.18 ± 13% perf-profile.children.cycles-pp.btrfs_async_run_delayed_root
> > 0.52 ± 5% -0.1 0.45 perf-profile.children.cycles-pp.set_extent_bit
> > 0.45 ± 5% -0.1 0.40 ± 3% perf-profile.children.cycles-pp.alloc_extent_state
> > 0.11 ± 17% -0.1 0.06 ± 11% perf-profile.children.cycles-pp.btrfs_clear_lock_blocking_rw
> > 0.28 ± 9% -0.0 0.23 ± 3% perf-profile.children.cycles-pp.btrfs_drop_pages
> > 0.07 -0.0 0.03 ±100% perf-profile.children.cycles-pp.btrfs_set_lock_blocking_rw
> > 0.39 ± 3% -0.0 0.34 ± 3% perf-profile.children.cycles-pp.get_alloc_profile
> > 0.33 ± 7% -0.0 0.29 perf-profile.children.cycles-pp.btrfs_set_extent_delalloc
> > 0.38 ± 2% -0.0 0.35 ± 4% perf-profile.children.cycles-pp.__set_page_dirty_nobuffers
> > 0.49 ± 3% -0.0 0.46 ± 3% perf-profile.children.cycles-pp.pagecache_get_page
> > 0.18 ± 4% -0.0 0.15 ± 2% perf-profile.children.cycles-pp.truncate_inode_pages_range
> > 0.08 ± 5% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.btrfs_set_path_blocking
> > 0.08 ± 6% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.truncate_cleanup_page
> > 0.80 ± 4% +0.2 0.95 ± 2% perf-profile.children.cycles-pp.can_overcommit
> > 96.84 +0.5 97.37 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 96.80 +0.5 97.35 perf-profile.children.cycles-pp.do_syscall_64
> > 43.34 +0.8 44.17 perf-profile.children.cycles-pp.btrfs_inode_rsv_release
> > 43.49 +0.8 44.32 perf-profile.children.cycles-pp.block_rsv_release_bytes
> > 95.32 +0.9 96.26 perf-profile.children.cycles-pp.ksys_write
> > 95.26 +0.9 96.20 perf-profile.children.cycles-pp.vfs_write
> > 94.91 +1.0 95.88 perf-profile.children.cycles-pp.__vfs_write
> > 94.84 +1.0 95.81 perf-profile.children.cycles-pp.btrfs_file_write_iter
> > 94.55 +1.0 95.55 perf-profile.children.cycles-pp.__btrfs_buffered_write
> > 86.68 +1.0 87.70 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> > 44.08 +1.2 45.31 perf-profile.children.cycles-pp.btrfs_delalloc_reserve_metadata
> > 43.49 +1.3 44.77 perf-profile.children.cycles-pp.reserve_metadata_bytes
> > 87.59 +1.8 89.38 perf-profile.children.cycles-pp._raw_spin_lock
> > 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.self.cycles-pp.intel_idle
> > 0.33 ± 6% -0.1 0.18 ± 6% perf-profile.self.cycles-pp.get_alloc_profile
> > 0.27 ± 8% -0.0 0.22 ± 4% perf-profile.self.cycles-pp.btrfs_drop_pages
> > 0.07 -0.0 0.03 ±100% perf-profile.self.cycles-pp.btrfs_set_lock_blocking_rw
> > 0.14 ± 5% -0.0 0.12 ± 6% perf-profile.self.cycles-pp.clear_page_dirty_for_io
> > 0.09 ± 5% -0.0 0.07 ± 10% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> > 0.17 ± 4% +0.1 0.23 ± 3% perf-profile.self.cycles-pp.reserve_metadata_bytes
> > 0.31 ± 7% +0.1 0.45 ± 2% perf-profile.self.cycles-pp.can_overcommit
> > 86.35 +1.0 87.39 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
> >
> >
> >
> > aim7.jobs-per-min
> >
> > 29000 +-+-----------------------------------------------------------------+
> > 28500 +-+ +.. + +..+.. +.. |
> > |..+ +.+..+.. : .. + .+.+..+..+.+.. .+..+.. + + + |
> > 28000 +-+ + .. : + +. + + + |
> > 27500 +-+ + + |
> > | |
> > 27000 +-+ |
> > 26500 +-+ |
> > 26000 +-+ |
> > | |
> > 25500 +-+ O O O O O |
> > 25000 +-+ O O O O O O O O O
> > | O O O O O O O O |
> > 24500 O-+O O O O |
> > 24000 +-+-----------------------------------------------------------------+
> >
> >
> > [*] bisect-good sample
> > [O] bisect-bad sample
> >
> >
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> >
> >
> > Thanks,
> > Xiaolong
> >
> >
> >
> > _______________________________________________
> > LKP mailing list
> > [email protected]
> > https://lists.01.org/mailman/listinfo/lkp
> >
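
The %change column in the quoted table can be cross-checked from the two per-commit means. The sketch below assumes the column is the plain relative difference of the patched mean against the base mean; the names pct_change, format_row and ROWS are illustrative and are not part of the LKP tooling.

```python
# Recompute the %change column of the comparison table from the two
# per-commit means (base 522a777566, patched 316ba5736c).
# Assumption: %change = (patched - base) / base * 100.


def pct_change(base: float, patched: float) -> float:
    """Relative change of `patched` against `base`, in percent."""
    return (patched - base) / base * 100.0


# (metric, base mean, patched mean), copied from the quoted report.
ROWS = [
    ("aim7.jobs-per-min", 28321, 25147),
    ("aim7.time.elapsed_time", 318.19, 358.23),
    ("aim7.time.system_time", 11986, 13691),
    ("vmstat.system.cs", 20936, 15752),
]


def format_row(metric: str, base: float, patched: float) -> str:
    """Render one row as a signed percentage followed by the metric name."""
    return f"{pct_change(base, patched):+.1f}% {metric}"


if __name__ == "__main__":
    for row in ROWS:
        print(format_row(*row))
```

Under that assumption the four rows come out as -11.2%, +12.6%, +14.2% and -24.8%, which agrees with the values the robot reported.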

2018-07-26 12:59:57

by Kemi Wang

Subject: RE: [LKP] [lkp-robot] [brd] 316ba5736c: aim7.jobs-per-min -11.2% regression

Hi SeongJae,
Is there any update, or any information you need from my side?

-----Original Message-----
From: SeongJae Park [mailto:[email protected]]
Sent: Wednesday, July 11, 2018 12:53 AM
To: Wang, Kemi <[email protected]>
Cc: Ye, Xiaolong <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]
Subject: Re: [LKP] [lkp-robot] [brd] 316ba5736c: aim7.jobs-per-min -11.2% regression

Oops, I only just noticed this mail. I will look into it, though it will take some time because I will be out of the office this week.


Thanks,
SeongJae Park
On Tue, Jul 10, 2018 at 1:30 AM kemi <[email protected]> wrote:
>
> Hi, SeongJae
> Do you have any input on this regression? Thanks.
>
> On 2018-06-04 13:52, kernel test robot wrote:
> >
> > Greeting,
> >
> > FYI, we noticed a -11.2% regression of aim7.jobs-per-min due to commit:
> >
> >
> > commit: 316ba5736c9caa5dbcd84085989862d2df57431d ("brd: Mark as non-rotational")
> > https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-4.18/block
> >
> > in testcase: aim7
> > on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
> > with following parameters:
> >
> > disk: 1BRD_48G
> > fs: btrfs
> > test: disk_rw
> > load: 1500
> > cpufreq_governor: performance
> >
> > test-description: AIM7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of multiuser system.
> > test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
> >
> >
> >
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> >
> > =========================================================================================
> > compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
> > gcc-7/performance/1BRD_48G/btrfs/x86_64-rhel-7.2/1500/debian-x86_64-2016-08-31.cgz/lkp-ivb-ep01/disk_rw/aim7
> >
> > commit:
> > 522a777566 ("block: consolidate struct request timestamp fields")
> > 316ba5736c ("brd: Mark as non-rotational")
> >
> > 522a777566f56696 316ba5736c9caa5dbcd8408598
> > ---------------- --------------------------
> > %stddev %change %stddev
> > \ | \
> > 28321 -11.2% 25147 aim7.jobs-per-min
> > 318.19 +12.6% 358.23 aim7.time.elapsed_time
> > 318.19 +12.6% 358.23 aim7.time.elapsed_time.max
> > 1437526 ± 2% +14.6% 1646849 ± 2% aim7.time.involuntary_context_switches
> > 11986 +14.2% 13691 aim7.time.system_time
> > 73.06 ± 2% -3.6% 70.43 aim7.time.user_time
> > 2449470 ± 2% -25.0% 1837521 ± 4% aim7.time.voluntary_context_switches
> > 20.25 ± 58% +1681.5% 360.75 ±109% numa-meminfo.node1.Mlocked
> > 456062 -16.3% 381859 softirqs.SCHED
> > 9015 ± 7% -21.3% 7098 ± 22% meminfo.CmaFree
> > 47.50 ± 58% +1355.8% 691.50 ± 92% meminfo.Mlocked
> > 5.24 ± 3% -1.2 3.99 ± 2% mpstat.cpu.idle%
> > 0.61 ± 2% -0.1 0.52 ± 2% mpstat.cpu.usr%
> > 16627 +12.8% 18762 ± 4% slabinfo.Acpi-State.active_objs
> > 16627 +12.9% 18775 ± 4% slabinfo.Acpi-State.num_objs
> > 57.00 ± 2% +17.5% 67.00 vmstat.procs.r
> > 20936 -24.8% 15752 ± 2% vmstat.system.cs
> > 45474 -1.7% 44681 vmstat.system.in
> > 6.50 ± 59% +1157.7% 81.75 ± 75% numa-vmstat.node0.nr_mlock
> > 242870 ± 3% +13.2% 274913 ± 7% numa-vmstat.node0.nr_written
> > 2278 ± 7% -22.6% 1763 ± 21% numa-vmstat.node1.nr_free_cma
> > 4.75 ± 58% +1789.5% 89.75 ±109% numa-vmstat.node1.nr_mlock
> > 88018135 ± 3% -48.9% 44980457 ± 7% cpuidle.C1.time
> > 1398288 ± 3% -51.1% 683493 ± 9% cpuidle.C1.usage
> > 3499814 ± 2% -38.5% 2153158 ± 5% cpuidle.C1E.time
> > 52722 ± 4% -45.6% 28692 ± 6% cpuidle.C1E.usage
> > 9865857 ± 3% -40.1% 5905155 ± 5% cpuidle.C3.time
> > 69656 ± 2% -42.6% 39990 ± 5% cpuidle.C3.usage
> > 590856 ± 2% -12.3% 517910 cpuidle.C6.usage
> > 46160 ± 7% -53.7% 21372 ± 11% cpuidle.POLL.time
> > 1716 ± 7% -46.6% 916.25 ± 14% cpuidle.POLL.usage
> > 197656 +4.1% 205732 proc-vmstat.nr_active_file
> > 191867 +4.1% 199647 proc-vmstat.nr_dirty
> > 509282 +1.6% 517318 proc-vmstat.nr_file_pages
> > 2282 ± 8% -24.4% 1725 ± 22% proc-vmstat.nr_free_cma
> > 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_inactive_file
> > 11.50 ± 58% +1397.8% 172.25 ± 93% proc-vmstat.nr_mlock
> > 970355 ± 4% +14.6% 1111549 ± 8% proc-vmstat.nr_written
> > 197984 +4.1% 206034 proc-vmstat.nr_zone_active_file
> > 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_zone_inactive_file
> > 192282 +4.1% 200126 proc-vmstat.nr_zone_write_pending
> > 7901465 ± 3% -14.0% 6795016 ± 16% proc-vmstat.pgalloc_movable
> > 886101 +10.2% 976329 proc-vmstat.pgfault
> > 2.169e+12 +15.2% 2.497e+12 perf-stat.branch-instructions
> > 0.41 -0.1 0.35 perf-stat.branch-miss-rate%
> > 31.19 ± 2% +1.6 32.82 perf-stat.cache-miss-rate%
> > 9.116e+09 +8.3% 9.869e+09 perf-stat.cache-misses
> > 2.924e+10 +2.9% 3.008e+10 ± 2% perf-stat.cache-references
> > 6712739 ± 2% -15.4% 5678643 ± 2% perf-stat.context-switches
> > 4.02 +2.7% 4.13 perf-stat.cpi
> > 3.761e+13 +17.3% 4.413e+13 perf-stat.cpu-cycles
> > 606958 -13.7% 523758 ± 2% perf-stat.cpu-migrations
> > 2.476e+12 +13.4% 2.809e+12 perf-stat.dTLB-loads
> > 0.18 ± 2% -0.0 0.16 ± 9% perf-stat.dTLB-store-miss-rate%
> > 1.079e+09 ± 2% -9.6% 9.755e+08 ± 9% perf-stat.dTLB-store-misses
> > 5.933e+11 +1.6% 6.029e+11 perf-stat.dTLB-stores
> > 9.349e+12 +14.2% 1.068e+13 perf-stat.instructions
> > 11247 ± 11% +19.8% 13477 ± 9% perf-stat.instructions-per-iTLB-miss
> > 0.25 -2.6% 0.24 perf-stat.ipc
> > 865561 +10.3% 954350 perf-stat.minor-faults
> > 2.901e+09 ± 3% +9.8% 3.186e+09 ± 3% perf-stat.node-load-misses
> > 3.682e+09 ± 3% +11.0% 4.088e+09 ± 3% perf-stat.node-loads
> > 3.778e+09 +4.8% 3.959e+09 ± 2% perf-stat.node-store-misses
> > 5.079e+09 +6.4% 5.402e+09 perf-stat.node-stores
> > 865565 +10.3% 954352 perf-stat.page-faults
> > 51.75 ± 5% -12.5% 45.30 ± 10% sched_debug.cfs_rq:/.load_avg.avg
> > 316.35 ± 3% +17.2% 370.81 ± 8% sched_debug.cfs_rq:/.util_est_enqueued.stddev
> > 15294 ± 30% +234.9% 51219 ± 76% sched_debug.cpu.avg_idle.min
> > 299443 ± 3% -7.3% 277566 ± 5% sched_debug.cpu.avg_idle.stddev
> > 1182 ± 19% -26.3% 872.02 ± 13% sched_debug.cpu.nr_load_updates.stddev
> > 1.22 ± 8% +21.7% 1.48 ± 6% sched_debug.cpu.nr_running.avg
> > 2.75 ± 10% +26.2% 3.47 ± 6% sched_debug.cpu.nr_running.max
> > 0.58 ± 7% +24.2% 0.73 ± 6% sched_debug.cpu.nr_running.stddev
> > 77148 -20.0% 61702 ± 7% sched_debug.cpu.nr_switches.avg
> > 70024 -24.8% 52647 ± 8% sched_debug.cpu.nr_switches.min
> > 6662 ± 6% +61.9% 10789 ± 24% sched_debug.cpu.nr_switches.stddev
> > 80.45 ± 18% -19.1% 65.05 ± 6% sched_debug.cpu.nr_uninterruptible.stddev
> > 76819 -19.3% 62008 ± 8% sched_debug.cpu.sched_count.avg
> > 70616 -23.5% 53996 ± 8% sched_debug.cpu.sched_count.min
> > 5494 ± 9% +85.3% 10179 ± 26% sched_debug.cpu.sched_count.stddev
> > 16936 -52.9% 7975 ± 9% sched_debug.cpu.sched_goidle.avg
> > 19281 -49.9% 9666 ± 7% sched_debug.cpu.sched_goidle.max
> > 15417 -54.8% 6962 ± 10% sched_debug.cpu.sched_goidle.min
> > 875.00 ± 6% -35.0% 569.09 ± 13% sched_debug.cpu.sched_goidle.stddev
> > 40332 -23.5% 30851 ± 7% sched_debug.cpu.ttwu_count.avg
> > 35074 -26.3% 25833 ± 6% sched_debug.cpu.ttwu_count.min
> > 3239 ± 8% +67.4% 5422 ± 28% sched_debug.cpu.ttwu_count.stddev
> > 5232 +27.4% 6665 ± 13% sched_debug.cpu.ttwu_local.avg
> > 15877 ± 12% +77.5% 28184 ± 27% sched_debug.cpu.ttwu_local.max
> > 2530 ± 10% +95.9% 4956 ± 27% sched_debug.cpu.ttwu_local.stddev
> > 2.52 ± 7% -0.6 1.95 ± 3% perf-profile.calltrace.cycles-pp.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> > 1.48 ± 12% -0.5 1.01 ± 4% perf-profile.calltrace.cycles-pp.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> > 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_search_slot.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write
> > 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter
> > 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.__dentry_kill.dentry_kill.dput.__fput.task_work_run
> > 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dentry_kill.dput.__fput
> > 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.dentry_kill.dput.__fput.task_work_run.exit_to_usermode_loop
> > 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.btrfs_evict_inode.evict.__dentry_kill.dentry_kill.dput
> > 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.dput.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64
> > 1.69 -0.1 1.54 ± 2% perf-profile.calltrace.cycles-pp.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> > 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter
> > 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> > 0.71 ± 6% -0.1 0.61 ± 2% perf-profile.calltrace.cycles-pp.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write
> > 0.69 ± 6% -0.1 0.60 ± 2% perf-profile.calltrace.cycles-pp.btrfs_clear_bit_hook.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need
> > 96.77 +0.6 97.33 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 0.00 +0.6 0.56 ± 3% perf-profile.calltrace.cycles-pp.can_overcommit.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
> > 96.72 +0.6 97.29 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 43.13 +0.8 43.91 perf-profile.calltrace.cycles-pp.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> > 42.37 +0.8 43.16 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write
> > 43.11 +0.8 43.89 perf-profile.calltrace.cycles-pp.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> > 42.96 +0.8 43.77 perf-profile.calltrace.cycles-pp._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter
> > 95.28 +0.9 96.23 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 95.22 +1.0 96.18 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 94.88 +1.0 95.85 perf-profile.calltrace.cycles-pp.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 94.83 +1.0 95.80 perf-profile.calltrace.cycles-pp.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write.do_syscall_64
> > 94.51 +1.0 95.50 perf-profile.calltrace.cycles-pp.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write
> > 42.44 +1.1 43.52 perf-profile.calltrace.cycles-pp._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
> > 42.09 +1.1 43.18 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write
> > 44.07 +1.2 45.29 perf-profile.calltrace.cycles-pp.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> > 43.42 +1.3 44.69 perf-profile.calltrace.cycles-pp.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> > 2.06 ± 18% -0.9 1.21 ± 6% perf-profile.children.cycles-pp.btrfs_search_slot
> > 2.54 ± 7% -0.6 1.96 ± 3% perf-profile.children.cycles-pp.btrfs_dirty_pages
> > 1.05 ± 24% -0.5 0.52 ± 9% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> > 1.50 ± 12% -0.5 1.03 ± 4% perf-profile.children.cycles-pp.btrfs_get_extent
> > 1.22 ± 15% -0.4 0.79 ± 8% perf-profile.children.cycles-pp.btrfs_lookup_file_extent
> > 0.81 ± 5% -0.4 0.41 ± 6% perf-profile.children.cycles-pp.btrfs_calc_reclaim_metadata_size
> > 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_lock_root_node
> > 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_tree_lock
> > 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.__dentry_kill
> > 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.evict
> > 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.dentry_kill
> > 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.btrfs_evict_inode
> > 0.91 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.exit_to_usermode_loop
> > 0.52 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.do_idle
> > 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.task_work_run
> > 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.__fput
> > 0.90 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.dput
> > 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.secondary_startup_64
> > 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.cpu_startup_entry
> > 0.50 ± 21% -0.3 0.17 ± 16% perf-profile.children.cycles-pp.start_secondary
> > 0.47 ± 20% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.cpuidle_enter_state
> > 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.intel_idle
> > 0.61 ± 20% -0.3 0.36 ± 11% perf-profile.children.cycles-pp.btrfs_tree_read_lock
> > 0.47 ± 26% -0.3 0.21 ± 10% perf-profile.children.cycles-pp.prepare_to_wait_event
> > 0.64 ± 18% -0.2 0.39 ± 9% perf-profile.children.cycles-pp.btrfs_read_lock_root_node
> > 0.40 ± 22% -0.2 0.21 ± 5% perf-profile.children.cycles-pp.btrfs_clear_path_blocking
> > 0.38 ± 23% -0.2 0.19 ± 13% perf-profile.children.cycles-pp.finish_wait
> > 1.51 ± 3% -0.2 1.35 ± 2% perf-profile.children.cycles-pp.__clear_extent_bit
> > 1.71 -0.1 1.56 ± 2% perf-profile.children.cycles-pp.lock_and_cleanup_extent_if_need
> > 0.29 ± 25% -0.1 0.15 ± 10% perf-profile.children.cycles-pp.btrfs_orphan_del
> > 0.27 ± 27% -0.1 0.12 ± 8% perf-profile.children.cycles-pp.btrfs_del_orphan_item
> > 0.33 ± 18% -0.1 0.19 ± 9% perf-profile.children.cycles-pp.queued_read_lock_slowpath
> > 0.33 ± 19% -0.1 0.20 ± 4% perf-profile.children.cycles-pp.__wake_up_common_lock
> > 0.45 ± 15% -0.1 0.34 ± 2% perf-profile.children.cycles-pp.btrfs_alloc_data_chunk_ondemand
> > 0.47 ± 16% -0.1 0.36 ± 4% perf-profile.children.cycles-pp.btrfs_check_data_free_space
> > 0.91 ± 4% -0.1 0.81 ± 3% perf-profile.children.cycles-pp.clear_extent_bit
> > 1.07 ± 5% -0.1 0.97 perf-profile.children.cycles-pp.__set_extent_bit
> > 0.77 ± 6% -0.1 0.69 ± 3% perf-profile.children.cycles-pp.btrfs_clear_bit_hook
> > 0.17 ± 20% -0.1 0.08 ± 10% perf-profile.children.cycles-pp.queued_write_lock_slowpath
> > 0.16 ± 22% -0.1 0.08 ± 24% perf-profile.children.cycles-pp.btrfs_lookup_inode
> > 0.21 ± 17% -0.1 0.14 ± 19% perf-profile.children.cycles-pp.__btrfs_update_delayed_inode
> > 0.26 ± 12% -0.1 0.18 ± 13% perf-profile.children.cycles-pp.btrfs_async_run_delayed_root
> > 0.52 ± 5% -0.1 0.45 perf-profile.children.cycles-pp.set_extent_bit
> > 0.45 ± 5% -0.1 0.40 ± 3% perf-profile.children.cycles-pp.alloc_extent_state
> > 0.11 ± 17% -0.1 0.06 ± 11% perf-profile.children.cycles-pp.btrfs_clear_lock_blocking_rw
> > 0.28 ± 9% -0.0 0.23 ± 3% perf-profile.children.cycles-pp.btrfs_drop_pages
> > 0.07 -0.0 0.03 ±100% perf-profile.children.cycles-pp.btrfs_set_lock_blocking_rw
> > 0.39 ± 3% -0.0 0.34 ± 3% perf-profile.children.cycles-pp.get_alloc_profile
> > 0.33 ± 7% -0.0 0.29 perf-profile.children.cycles-pp.btrfs_set_extent_delalloc
> > 0.38 ± 2% -0.0 0.35 ± 4% perf-profile.children.cycles-pp.__set_page_dirty_nobuffers
> > 0.49 ± 3% -0.0 0.46 ± 3% perf-profile.children.cycles-pp.pagecache_get_page
> > 0.18 ± 4% -0.0 0.15 ± 2% perf-profile.children.cycles-pp.truncate_inode_pages_range
> > 0.08 ± 5% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.btrfs_set_path_blocking
> > 0.08 ± 6% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.truncate_cleanup_page
> > 0.80 ± 4% +0.2 0.95 ± 2% perf-profile.children.cycles-pp.can_overcommit
> > 96.84 +0.5 97.37 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 96.80 +0.5 97.35 perf-profile.children.cycles-pp.do_syscall_64
> > 43.34 +0.8 44.17 perf-profile.children.cycles-pp.btrfs_inode_rsv_release
> > 43.49 +0.8 44.32 perf-profile.children.cycles-pp.block_rsv_release_bytes
> > 95.32 +0.9 96.26 perf-profile.children.cycles-pp.ksys_write
> > 95.26 +0.9 96.20 perf-profile.children.cycles-pp.vfs_write
> > 94.91 +1.0 95.88 perf-profile.children.cycles-pp.__vfs_write
> > 94.84 +1.0 95.81 perf-profile.children.cycles-pp.btrfs_file_write_iter
> > 94.55 +1.0 95.55 perf-profile.children.cycles-pp.__btrfs_buffered_write
> > 86.68 +1.0 87.70 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> > 44.08 +1.2 45.31 perf-profile.children.cycles-pp.btrfs_delalloc_reserve_metadata
> > 43.49 +1.3 44.77 perf-profile.children.cycles-pp.reserve_metadata_bytes
> > 87.59 +1.8 89.38 perf-profile.children.cycles-pp._raw_spin_lock
> > 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.self.cycles-pp.intel_idle
> > 0.33 ± 6% -0.1 0.18 ± 6% perf-profile.self.cycles-pp.get_alloc_profile
> > 0.27 ± 8% -0.0 0.22 ± 4% perf-profile.self.cycles-pp.btrfs_drop_pages
> > 0.07 -0.0 0.03 ±100% perf-profile.self.cycles-pp.btrfs_set_lock_blocking_rw
> > 0.14 ± 5% -0.0 0.12 ± 6% perf-profile.self.cycles-pp.clear_page_dirty_for_io
> > 0.09 ± 5% -0.0 0.07 ± 10% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> > 0.17 ± 4% +0.1 0.23 ± 3% perf-profile.self.cycles-pp.reserve_metadata_bytes
> > 0.31 ± 7% +0.1 0.45 ± 2% perf-profile.self.cycles-pp.can_overcommit
> > 86.35 +1.0 87.39 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
> >
> >
> >
> > aim7.jobs-per-min
> >
> > 29000 +-+-----------------------------------------------------------------+
> > 28500 +-+ +.. + +..+.. +.. |
> > |..+ +.+..+.. : .. + .+.+..+..+.+.. .+..+.. + + + |
> > 28000 +-+ + .. : + +. + + + |
> > 27500 +-+ + + |
> > | |
> > 27000 +-+ |
> > 26500 +-+ |
> > 26000 +-+ |
> > | |
> > 25500 +-+ O O O O O |
> > 25000 +-+ O O O O O O O O O
> > | O O O O O O O O |
> > 24500 O-+O O O O |
> > 24000 +-+-----------------------------------------------------------------+
> >
> >
> > [*] bisect-good sample
> > [O] bisect-bad sample
> >
> >
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are
> > provided for informational purposes only. Any difference in system
> > hardware or software design or configuration may affect actual performance.
> >
> >
> > Thanks,
> > Xiaolong
> >
> >
> >
> > _______________________________________________
> > LKP mailing list
> > [email protected]
> > https://lists.01.org/mailman/listinfo/lkp
> >

2018-07-27 15:15:46

by Jens Axboe

Subject: Re: [LKP] [lkp-robot] [brd] 316ba5736c: aim7.jobs-per-min -11.2% regression

SeongJae, please take a look at this. It's been weeks now, with silence
on your end.


On 7/26/18 6:58 AM, Wang, Kemi wrote:
> Hi SeongJae,
> Is there any update, or any information you need from my side?
>
> -----Original Message-----
> From: SeongJae Park [mailto:[email protected]]
> Sent: Wednesday, July 11, 2018 12:53 AM
> To: Wang, Kemi <[email protected]>
> Cc: Ye, Xiaolong <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]
> Subject: Re: [LKP] [lkp-robot] [brd] 316ba5736c: aim7.jobs-per-min -11.2% regression
>
> Oops, I only just noticed this mail. I will look into it, though it will take some time because I will be out of the office this week.
>
>
> Thanks,
> SeongJae Park
> On Tue, Jul 10, 2018 at 1:30 AM kemi <[email protected]> wrote:
>>
>> Hi, SeongJae
>> Do you have any input on this regression? Thanks.
>>
>> On 2018-06-04 13:52, kernel test robot wrote:
>>>
>>> Greeting,
>>>
>>> FYI, we noticed a -11.2% regression of aim7.jobs-per-min due to commit:
>>>
>>>
>>> commit: 316ba5736c9caa5dbcd84085989862d2df57431d ("brd: Mark as non-rotational")
>>> https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-4.18/block
>>>
>>> in testcase: aim7
>>> on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
>>> with following parameters:
>>>
>>> disk: 1BRD_48G
>>> fs: btrfs
>>> test: disk_rw
>>> load: 1500
>>> cpufreq_governor: performance
>>>
>>> test-description: AIM7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of multiuser system.
>>> test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
>>>
>>>
>>>
>>> Details are as below:
>>> -------------------------------------------------------------------------------------------------->
>>>
>>> =========================================================================================
>>> compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
>>> gcc-7/performance/1BRD_48G/btrfs/x86_64-rhel-7.2/1500/debian-x86_64-2016-08-31.cgz/lkp-ivb-ep01/disk_rw/aim7
>>>
>>> commit:
>>> 522a777566 ("block: consolidate struct request timestamp fields")
>>> 316ba5736c ("brd: Mark as non-rotational")
>>>
>>> 522a777566f56696 316ba5736c9caa5dbcd8408598
>>> ---------------- --------------------------
>>> %stddev %change %stddev
>>> \ | \
>>> 28321 -11.2% 25147 aim7.jobs-per-min
>>> 318.19 +12.6% 358.23 aim7.time.elapsed_time
>>> 318.19 +12.6% 358.23 aim7.time.elapsed_time.max
>>> 1437526 ± 2% +14.6% 1646849 ± 2% aim7.time.involuntary_context_switches
>>> 11986 +14.2% 13691 aim7.time.system_time
>>> 73.06 ± 2% -3.6% 70.43 aim7.time.user_time
>>> 2449470 ± 2% -25.0% 1837521 ± 4% aim7.time.voluntary_context_switches
>>> 20.25 ± 58% +1681.5% 360.75 ±109% numa-meminfo.node1.Mlocked
>>> 456062 -16.3% 381859 softirqs.SCHED
>>> 9015 ± 7% -21.3% 7098 ± 22% meminfo.CmaFree
>>> 47.50 ± 58% +1355.8% 691.50 ± 92% meminfo.Mlocked
>>> 5.24 ± 3% -1.2 3.99 ± 2% mpstat.cpu.idle%
>>> 0.61 ± 2% -0.1 0.52 ± 2% mpstat.cpu.usr%
>>> 16627 +12.8% 18762 ± 4% slabinfo.Acpi-State.active_objs
>>> 16627 +12.9% 18775 ± 4% slabinfo.Acpi-State.num_objs
>>> 57.00 ± 2% +17.5% 67.00 vmstat.procs.r
>>> 20936 -24.8% 15752 ± 2% vmstat.system.cs
>>> 45474 -1.7% 44681 vmstat.system.in
>>> 6.50 ± 59% +1157.7% 81.75 ± 75% numa-vmstat.node0.nr_mlock
>>> 242870 ± 3% +13.2% 274913 ± 7% numa-vmstat.node0.nr_written
>>> 2278 ± 7% -22.6% 1763 ± 21% numa-vmstat.node1.nr_free_cma
>>> 4.75 ± 58% +1789.5% 89.75 ±109% numa-vmstat.node1.nr_mlock
>>> 88018135 ± 3% -48.9% 44980457 ± 7% cpuidle.C1.time
>>> 1398288 ± 3% -51.1% 683493 ± 9% cpuidle.C1.usage
>>> 3499814 ± 2% -38.5% 2153158 ± 5% cpuidle.C1E.time
>>> 52722 ± 4% -45.6% 28692 ± 6% cpuidle.C1E.usage
>>> 9865857 ± 3% -40.1% 5905155 ± 5% cpuidle.C3.time
>>> 69656 ± 2% -42.6% 39990 ± 5% cpuidle.C3.usage
>>> 590856 ± 2% -12.3% 517910 cpuidle.C6.usage
>>> 46160 ± 7% -53.7% 21372 ± 11% cpuidle.POLL.time
>>> 1716 ± 7% -46.6% 916.25 ± 14% cpuidle.POLL.usage
>>> 197656 +4.1% 205732 proc-vmstat.nr_active_file
>>> 191867 +4.1% 199647 proc-vmstat.nr_dirty
>>> 509282 +1.6% 517318 proc-vmstat.nr_file_pages
>>> 2282 ± 8% -24.4% 1725 ± 22% proc-vmstat.nr_free_cma
>>> 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_inactive_file
>>> 11.50 ± 58% +1397.8% 172.25 ± 93% proc-vmstat.nr_mlock
>>> 970355 ± 4% +14.6% 1111549 ± 8% proc-vmstat.nr_written
>>> 197984 +4.1% 206034 proc-vmstat.nr_zone_active_file
>>> 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_zone_inactive_file
>>> 192282 +4.1% 200126 proc-vmstat.nr_zone_write_pending
>>> 7901465 ± 3% -14.0% 6795016 ± 16% proc-vmstat.pgalloc_movable
>>> 886101 +10.2% 976329 proc-vmstat.pgfault
>>> 2.169e+12 +15.2% 2.497e+12 perf-stat.branch-instructions
>>> 0.41 -0.1 0.35 perf-stat.branch-miss-rate%
>>> 31.19 ± 2% +1.6 32.82 perf-stat.cache-miss-rate%
>>> 9.116e+09 +8.3% 9.869e+09 perf-stat.cache-misses
>>> 2.924e+10 +2.9% 3.008e+10 ± 2% perf-stat.cache-references
>>> 6712739 ± 2% -15.4% 5678643 ± 2% perf-stat.context-switches
>>> 4.02 +2.7% 4.13 perf-stat.cpi
>>> 3.761e+13 +17.3% 4.413e+13 perf-stat.cpu-cycles
>>> 606958 -13.7% 523758 ± 2% perf-stat.cpu-migrations
>>> 2.476e+12 +13.4% 2.809e+12 perf-stat.dTLB-loads
>>> 0.18 ± 2% -0.0 0.16 ± 9% perf-stat.dTLB-store-miss-rate%
>>> 1.079e+09 ± 2% -9.6% 9.755e+08 ± 9% perf-stat.dTLB-store-misses
>>> 5.933e+11 +1.6% 6.029e+11 perf-stat.dTLB-stores
>>> 9.349e+12 +14.2% 1.068e+13 perf-stat.instructions
>>> 11247 ± 11% +19.8% 13477 ± 9% perf-stat.instructions-per-iTLB-miss
>>> 0.25 -2.6% 0.24 perf-stat.ipc
>>> 865561 +10.3% 954350 perf-stat.minor-faults
>>> 2.901e+09 ± 3% +9.8% 3.186e+09 ± 3% perf-stat.node-load-misses
>>> 3.682e+09 ± 3% +11.0% 4.088e+09 ± 3% perf-stat.node-loads
>>> 3.778e+09 +4.8% 3.959e+09 ± 2% perf-stat.node-store-misses
>>> 5.079e+09 +6.4% 5.402e+09 perf-stat.node-stores
>>> 865565 +10.3% 954352 perf-stat.page-faults
>>> 51.75 ± 5% -12.5% 45.30 ± 10% sched_debug.cfs_rq:/.load_avg.avg
>>> 316.35 ± 3% +17.2% 370.81 ± 8% sched_debug.cfs_rq:/.util_est_enqueued.stddev
>>> 15294 ± 30% +234.9% 51219 ± 76% sched_debug.cpu.avg_idle.min
>>> 299443 ± 3% -7.3% 277566 ± 5% sched_debug.cpu.avg_idle.stddev
>>> 1182 ± 19% -26.3% 872.02 ± 13% sched_debug.cpu.nr_load_updates.stddev
>>> 1.22 ± 8% +21.7% 1.48 ± 6% sched_debug.cpu.nr_running.avg
>>> 2.75 ± 10% +26.2% 3.47 ± 6% sched_debug.cpu.nr_running.max
>>> 0.58 ± 7% +24.2% 0.73 ± 6% sched_debug.cpu.nr_running.stddev
>>> 77148 -20.0% 61702 ± 7% sched_debug.cpu.nr_switches.avg
>>> 70024 -24.8% 52647 ± 8% sched_debug.cpu.nr_switches.min
>>> 6662 ± 6% +61.9% 10789 ± 24% sched_debug.cpu.nr_switches.stddev
>>> 80.45 ± 18% -19.1% 65.05 ± 6% sched_debug.cpu.nr_uninterruptible.stddev
>>> 76819 -19.3% 62008 ± 8% sched_debug.cpu.sched_count.avg
>>> 70616 -23.5% 53996 ± 8% sched_debug.cpu.sched_count.min
>>> 5494 ± 9% +85.3% 10179 ± 26% sched_debug.cpu.sched_count.stddev
>>> 16936 -52.9% 7975 ± 9% sched_debug.cpu.sched_goidle.avg
>>> 19281 -49.9% 9666 ± 7% sched_debug.cpu.sched_goidle.max
>>> 15417 -54.8% 6962 ± 10% sched_debug.cpu.sched_goidle.min
>>> 875.00 ± 6% -35.0% 569.09 ± 13% sched_debug.cpu.sched_goidle.stddev
>>> 40332 -23.5% 30851 ± 7% sched_debug.cpu.ttwu_count.avg
>>> 35074 -26.3% 25833 ± 6% sched_debug.cpu.ttwu_count.min
>>> 3239 ± 8% +67.4% 5422 ± 28% sched_debug.cpu.ttwu_count.stddev
>>> 5232 +27.4% 6665 ± 13% sched_debug.cpu.ttwu_local.avg
>>> 15877 ± 12% +77.5% 28184 ± 27% sched_debug.cpu.ttwu_local.max
>>> 2530 ± 10% +95.9% 4956 ± 27% sched_debug.cpu.ttwu_local.stddev
>>> 2.52 ± 7% -0.6 1.95 ± 3% perf-profile.calltrace.cycles-pp.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
>>> 1.48 ± 12% -0.5 1.01 ± 4% perf-profile.calltrace.cycles-pp.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
>>> 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_search_slot.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write
>>> 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter
>>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.__dentry_kill.dentry_kill.dput.__fput.task_work_run
>>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dentry_kill.dput.__fput
>>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.dentry_kill.dput.__fput.task_work_run.exit_to_usermode_loop
>>> 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.btrfs_evict_inode.evict.__dentry_kill.dentry_kill.dput
>>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.dput.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64
>>> 1.69 -0.1 1.54 ± 2% perf-profile.calltrace.cycles-pp.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
>>> 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter
>>> 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
>>> 0.71 ± 6% -0.1 0.61 ± 2% perf-profile.calltrace.cycles-pp.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write
>>> 0.69 ± 6% -0.1 0.60 ± 2% perf-profile.calltrace.cycles-pp.btrfs_clear_bit_hook.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need
>>> 96.77 +0.6 97.33 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
>>> 0.00 +0.6 0.56 ± 3% perf-profile.calltrace.cycles-pp.can_overcommit.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
>>> 96.72 +0.6 97.29 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 43.13 +0.8 43.91 perf-profile.calltrace.cycles-pp.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
>>> 42.37 +0.8 43.16 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write
>>> 43.11 +0.8 43.89 perf-profile.calltrace.cycles-pp.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
>>> 42.96 +0.8 43.77 perf-profile.calltrace.cycles-pp._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter
>>> 95.28 +0.9 96.23 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 95.22 +1.0 96.18 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 94.88 +1.0 95.85 perf-profile.calltrace.cycles-pp.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 94.83 +1.0 95.80 perf-profile.calltrace.cycles-pp.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write.do_syscall_64
>>> 94.51 +1.0 95.50 perf-profile.calltrace.cycles-pp.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write
>>> 42.44 +1.1 43.52 perf-profile.calltrace.cycles-pp._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
>>> 42.09 +1.1 43.18 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write
>>> 44.07 +1.2 45.29 perf-profile.calltrace.cycles-pp.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
>>> 43.42 +1.3 44.69 perf-profile.calltrace.cycles-pp.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
>>> 2.06 ± 18% -0.9 1.21 ± 6% perf-profile.children.cycles-pp.btrfs_search_slot
>>> 2.54 ± 7% -0.6 1.96 ± 3% perf-profile.children.cycles-pp.btrfs_dirty_pages
>>> 1.05 ± 24% -0.5 0.52 ± 9% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>>> 1.50 ± 12% -0.5 1.03 ± 4% perf-profile.children.cycles-pp.btrfs_get_extent
>>> 1.22 ± 15% -0.4 0.79 ± 8% perf-profile.children.cycles-pp.btrfs_lookup_file_extent
>>> 0.81 ± 5% -0.4 0.41 ± 6% perf-profile.children.cycles-pp.btrfs_calc_reclaim_metadata_size
>>> 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_lock_root_node
>>> 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_tree_lock
>>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.__dentry_kill
>>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.evict
>>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.dentry_kill
>>> 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.btrfs_evict_inode
>>> 0.91 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.exit_to_usermode_loop
>>> 0.52 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.do_idle
>>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.task_work_run
>>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.__fput
>>> 0.90 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.dput
>>> 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.secondary_startup_64
>>> 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.cpu_startup_entry
>>> 0.50 ± 21% -0.3 0.17 ± 16% perf-profile.children.cycles-pp.start_secondary
>>> 0.47 ± 20% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.cpuidle_enter_state
>>> 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.intel_idle
>>> 0.61 ± 20% -0.3 0.36 ± 11% perf-profile.children.cycles-pp.btrfs_tree_read_lock
>>> 0.47 ± 26% -0.3 0.21 ± 10% perf-profile.children.cycles-pp.prepare_to_wait_event
>>> 0.64 ± 18% -0.2 0.39 ± 9% perf-profile.children.cycles-pp.btrfs_read_lock_root_node
>>> 0.40 ± 22% -0.2 0.21 ± 5% perf-profile.children.cycles-pp.btrfs_clear_path_blocking
>>> 0.38 ± 23% -0.2 0.19 ± 13% perf-profile.children.cycles-pp.finish_wait
>>> 1.51 ± 3% -0.2 1.35 ± 2% perf-profile.children.cycles-pp.__clear_extent_bit
>>> 1.71 -0.1 1.56 ± 2% perf-profile.children.cycles-pp.lock_and_cleanup_extent_if_need
>>> 0.29 ± 25% -0.1 0.15 ± 10% perf-profile.children.cycles-pp.btrfs_orphan_del
>>> 0.27 ± 27% -0.1 0.12 ± 8% perf-profile.children.cycles-pp.btrfs_del_orphan_item
>>> 0.33 ± 18% -0.1 0.19 ± 9% perf-profile.children.cycles-pp.queued_read_lock_slowpath
>>> 0.33 ± 19% -0.1 0.20 ± 4% perf-profile.children.cycles-pp.__wake_up_common_lock
>>> 0.45 ± 15% -0.1 0.34 ± 2% perf-profile.children.cycles-pp.btrfs_alloc_data_chunk_ondemand
>>> 0.47 ± 16% -0.1 0.36 ± 4% perf-profile.children.cycles-pp.btrfs_check_data_free_space
>>> 0.91 ± 4% -0.1 0.81 ± 3% perf-profile.children.cycles-pp.clear_extent_bit
>>> 1.07 ± 5% -0.1 0.97 perf-profile.children.cycles-pp.__set_extent_bit
>>> 0.77 ± 6% -0.1 0.69 ± 3% perf-profile.children.cycles-pp.btrfs_clear_bit_hook
>>> 0.17 ± 20% -0.1 0.08 ± 10% perf-profile.children.cycles-pp.queued_write_lock_slowpath
>>> 0.16 ± 22% -0.1 0.08 ± 24% perf-profile.children.cycles-pp.btrfs_lookup_inode
>>> 0.21 ± 17% -0.1 0.14 ± 19% perf-profile.children.cycles-pp.__btrfs_update_delayed_inode
>>> 0.26 ± 12% -0.1 0.18 ± 13% perf-profile.children.cycles-pp.btrfs_async_run_delayed_root
>>> 0.52 ± 5% -0.1 0.45 perf-profile.children.cycles-pp.set_extent_bit
>>> 0.45 ± 5% -0.1 0.40 ± 3% perf-profile.children.cycles-pp.alloc_extent_state
>>> 0.11 ± 17% -0.1 0.06 ± 11% perf-profile.children.cycles-pp.btrfs_clear_lock_blocking_rw
>>> 0.28 ± 9% -0.0 0.23 ± 3% perf-profile.children.cycles-pp.btrfs_drop_pages
>>> 0.07 -0.0 0.03 ±100% perf-profile.children.cycles-pp.btrfs_set_lock_blocking_rw
>>> 0.39 ± 3% -0.0 0.34 ± 3% perf-profile.children.cycles-pp.get_alloc_profile
>>> 0.33 ± 7% -0.0 0.29 perf-profile.children.cycles-pp.btrfs_set_extent_delalloc
>>> 0.38 ± 2% -0.0 0.35 ± 4% perf-profile.children.cycles-pp.__set_page_dirty_nobuffers
>>> 0.49 ± 3% -0.0 0.46 ± 3% perf-profile.children.cycles-pp.pagecache_get_page
>>> 0.18 ± 4% -0.0 0.15 ± 2% perf-profile.children.cycles-pp.truncate_inode_pages_range
>>> 0.08 ± 5% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.btrfs_set_path_blocking
>>> 0.08 ± 6% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.truncate_cleanup_page
>>> 0.80 ± 4% +0.2 0.95 ± 2% perf-profile.children.cycles-pp.can_overcommit
>>> 96.84 +0.5 97.37 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>>> 96.80 +0.5 97.35 perf-profile.children.cycles-pp.do_syscall_64
>>> 43.34 +0.8 44.17 perf-profile.children.cycles-pp.btrfs_inode_rsv_release
>>> 43.49 +0.8 44.32 perf-profile.children.cycles-pp.block_rsv_release_bytes
>>> 95.32 +0.9 96.26 perf-profile.children.cycles-pp.ksys_write
>>> 95.26 +0.9 96.20 perf-profile.children.cycles-pp.vfs_write
>>> 94.91 +1.0 95.88 perf-profile.children.cycles-pp.__vfs_write
>>> 94.84 +1.0 95.81 perf-profile.children.cycles-pp.btrfs_file_write_iter
>>> 94.55 +1.0 95.55 perf-profile.children.cycles-pp.__btrfs_buffered_write
>>> 86.68 +1.0 87.70 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>>> 44.08 +1.2 45.31 perf-profile.children.cycles-pp.btrfs_delalloc_reserve_metadata
>>> 43.49 +1.3 44.77 perf-profile.children.cycles-pp.reserve_metadata_bytes
>>> 87.59 +1.8 89.38 perf-profile.children.cycles-pp._raw_spin_lock
>>> 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.self.cycles-pp.intel_idle
>>> 0.33 ± 6% -0.1 0.18 ± 6% perf-profile.self.cycles-pp.get_alloc_profile
>>> 0.27 ± 8% -0.0 0.22 ± 4% perf-profile.self.cycles-pp.btrfs_drop_pages
>>> 0.07 -0.0 0.03 ±100% perf-profile.self.cycles-pp.btrfs_set_lock_blocking_rw
>>> 0.14 ± 5% -0.0 0.12 ± 6% perf-profile.self.cycles-pp.clear_page_dirty_for_io
>>> 0.09 ± 5% -0.0 0.07 ± 10% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
>>> 0.17 ± 4% +0.1 0.23 ± 3% perf-profile.self.cycles-pp.reserve_metadata_bytes
>>> 0.31 ± 7% +0.1 0.45 ± 2% perf-profile.self.cycles-pp.can_overcommit
>>> 86.35 +1.0 87.39 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>>>
>>>
>>>
>>> aim7.jobs-per-min
>>>
>>> [ASCII plot, layout lost in extraction: y-axis from 24000 to 29000 jobs per minute; bisect-good samples sit at roughly 27500 to 28500, bisect-bad samples at roughly 24500 to 25500]
>>>
>>>
>>> [*] bisect-good sample
>>> [O] bisect-bad sample
>>>
>>>
>>> Disclaimer:
>>> Results have been estimated based on internal Intel analysis and are
>>> provided for informational purposes only. Any difference in system
>>> hardware or software design or configuration may affect actual performance.
>>>
>>>
>>> Thanks,
>>> Xiaolong
>>>
>>>
>>>
>>> _______________________________________________
>>> LKP mailing list
>>> [email protected]
>>> https://lists.01.org/mailman/listinfo/lkp
>>>


--
Jens Axboe
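
Note for readers checking the quoted table: the %change column appears to be the relative difference between the mean values measured on the two commits. The following is a minimal sketch under that assumption, not the lkp tooling itself; the names pct_change, ROWS and check are made up for illustration, and the numbers are copied from the report quoted above.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


# (metric, mean on parent 522a777566, mean on 316ba5736c, %change as reported)
ROWS = [
    ("aim7.jobs-per-min", 28321, 25147, -11.2),
    ("aim7.time.elapsed_time", 318.19, 358.23, 12.6),
    ("aim7.time.system_time", 11986, 13691, 14.2),
    ("aim7.time.voluntary_context_switches", 2449470, 1837521, -25.0),
    ("vmstat.system.cs", 20936, 15752, -24.8),
]


def check(rows=ROWS):
    """Return (metric, recomputed %change, reported %change) for each row."""
    return [
        (name, round(pct_change(base, new), 1), reported)
        for name, base, new, reported in rows
    ]


if __name__ == "__main__":
    for name, computed, reported in check():
        print(f"{computed:+7.1f}%  (reported {reported:+.1f}%)  {name}")
```

Rows whose metric name ends in "%" (for example mpstat.cpu.idle%) seem to show the absolute difference in percentage points instead, so they are left out of this check.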


2018-07-27 21:56:28

by SeongJae Park

[permalink] [raw]
Subject: Re: [LKP] [lkp-robot] [brd] 316ba5736c: aim7.jobs-per-min -11.2% regression

Hello Jens,


On Fri, 27 Jul 2018, Jens Axboe wrote:

> SeongJae please take a look at this. It's been weeks now and silence
> on your end.

I'm so sorry for the late response. Unfortunately, I have been too busy to look
into this regression deeply. The things that were keeping me busy are almost
finished now, so I should have some time starting next week.

Again, I'm so sorry for this. I will not object even if you
decide to revert the patch. Anyway, I will give you some analysis results
by next Friday.


Thanks,
SeongJae Park

>
>
> On 7/26/18 6:58 AM, Wang, Kemi wrote:
> > Hi, SeongJae
> > Any update or any info you need from my side?
> >
> > -----Original Message-----
> > From: SeongJae Park [mailto:[email protected]]
> > Sent: Wednesday, July 11, 2018 12:53 AM
> > To: Wang, Kemi <[email protected]>
> > Cc: Ye, Xiaolong <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]
> > Subject: Re: [LKP] [lkp-robot] [brd] 316ba5736c: aim7.jobs-per-min -11.2% regression
> >
> > Oops, I only noticed this mail just now. I will look into this, though it will take some time because I will be out of the office this week.
> >
> >
> > Thanks,
> > SeongJae Park
> > On Tue, Jul 10, 2018 at 1:30 AM kemi <[email protected]> wrote:
> >>
> >> Hi, SeongJae
> >> Do you have any input on this regression? Thanks.
> >>
> >> On 2018-06-04 13:52, kernel test robot wrote:
> >>>
> >>> Greeting,
> >>>
> >>> FYI, we noticed a -11.2% regression of aim7.jobs-per-min due to commit:
> >>>
> >>>
> >>> commit: 316ba5736c9caa5dbcd84085989862d2df57431d ("brd: Mark as non-rotational")
> >>> https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-4.18/block
> >>>
> >>> in testcase: aim7
> >>> on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @
> >>> 3.00GHz with 384G memory with following parameters:
> >>>
> >>> disk: 1BRD_48G
> >>> fs: btrfs
> >>> test: disk_rw
> >>> load: 1500
> >>> cpufreq_governor: performance
> >>>
> >>> test-description: AIM7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of multiuser system.
> >>> test-url:
> >>> https://sourceforge.net/projects/aimbench/files/aim-suite7/
> >>>
> >>>
> >>>
> >>> Details are as below:
> >>> -------------------------------------------------------------------------------------------------->
> >>>
> >>> =========================================================================================
> >>> compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
> >>> gcc-7/performance/1BRD_48G/btrfs/x86_64-rhel-7.2/1500/debian-x86_64-2016-08-31.cgz/lkp-ivb-ep01/disk_rw/aim7
> >>>
> >>> commit:
> >>> 522a777566 ("block: consolidate struct request timestamp fields")
> >>> 316ba5736c ("brd: Mark as non-rotational")
> >>>
> >>> 522a777566f56696 316ba5736c9caa5dbcd8408598
> >>> ---------------- --------------------------
> >>> %stddev %change %stddev
> >>> \ | \
> >>> 28321 -11.2% 25147 aim7.jobs-per-min
> >>> 318.19 +12.6% 358.23 aim7.time.elapsed_time
> >>> 318.19 +12.6% 358.23 aim7.time.elapsed_time.max
> >>> 1437526 ± 2% +14.6% 1646849 ± 2% aim7.time.involuntary_context_switches
> >>> 11986 +14.2% 13691 aim7.time.system_time
> >>> 73.06 ± 2% -3.6% 70.43 aim7.time.user_time
> >>> 2449470 ± 2% -25.0% 1837521 ± 4% aim7.time.voluntary_context_switches
> >>> 20.25 ± 58% +1681.5% 360.75 ±109% numa-meminfo.node1.Mlocked
> >>> 456062 -16.3% 381859 softirqs.SCHED
> >>> 9015 ± 7% -21.3% 7098 ± 22% meminfo.CmaFree
> >>> 47.50 ± 58% +1355.8% 691.50 ± 92% meminfo.Mlocked
> >>> 5.24 ± 3% -1.2 3.99 ± 2% mpstat.cpu.idle%
> >>> 0.61 ± 2% -0.1 0.52 ± 2% mpstat.cpu.usr%
> >>> 16627 +12.8% 18762 ± 4% slabinfo.Acpi-State.active_objs
> >>> 16627 +12.9% 18775 ± 4% slabinfo.Acpi-State.num_objs
> >>> 57.00 ± 2% +17.5% 67.00 vmstat.procs.r
> >>> 20936 -24.8% 15752 ± 2% vmstat.system.cs
> >>> 45474 -1.7% 44681 vmstat.system.in
> >>> 6.50 ± 59% +1157.7% 81.75 ± 75% numa-vmstat.node0.nr_mlock
> >>> 242870 ± 3% +13.2% 274913 ± 7% numa-vmstat.node0.nr_written
> >>> 2278 ± 7% -22.6% 1763 ± 21% numa-vmstat.node1.nr_free_cma
> >>> 4.75 ± 58% +1789.5% 89.75 ±109% numa-vmstat.node1.nr_mlock
> >>> 88018135 ± 3% -48.9% 44980457 ± 7% cpuidle.C1.time
> >>> 1398288 ± 3% -51.1% 683493 ± 9% cpuidle.C1.usage
> >>> 3499814 ± 2% -38.5% 2153158 ± 5% cpuidle.C1E.time
> >>> 52722 ± 4% -45.6% 28692 ± 6% cpuidle.C1E.usage
> >>> 9865857 ± 3% -40.1% 5905155 ± 5% cpuidle.C3.time
> >>> 69656 ± 2% -42.6% 39990 ± 5% cpuidle.C3.usage
> >>> 590856 ± 2% -12.3% 517910 cpuidle.C6.usage
> >>> 46160 ± 7% -53.7% 21372 ± 11% cpuidle.POLL.time
> >>> 1716 ± 7% -46.6% 916.25 ± 14% cpuidle.POLL.usage
> >>> 197656 +4.1% 205732 proc-vmstat.nr_active_file
> >>> 191867 +4.1% 199647 proc-vmstat.nr_dirty
> >>> 509282 +1.6% 517318 proc-vmstat.nr_file_pages
> >>> 2282 ± 8% -24.4% 1725 ± 22% proc-vmstat.nr_free_cma
> >>> 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_inactive_file
> >>> 11.50 ± 58% +1397.8% 172.25 ± 93% proc-vmstat.nr_mlock
> >>> 970355 ± 4% +14.6% 1111549 ± 8% proc-vmstat.nr_written
> >>> 197984 +4.1% 206034 proc-vmstat.nr_zone_active_file
> >>> 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_zone_inactive_file
> >>> 192282 +4.1% 200126 proc-vmstat.nr_zone_write_pending
> >>> 7901465 ± 3% -14.0% 6795016 ± 16% proc-vmstat.pgalloc_movable
> >>> 886101 +10.2% 976329 proc-vmstat.pgfault
> >>> 2.169e+12 +15.2% 2.497e+12 perf-stat.branch-instructions
> >>> 0.41 -0.1 0.35 perf-stat.branch-miss-rate%
> >>> 31.19 ± 2% +1.6 32.82 perf-stat.cache-miss-rate%
> >>> 9.116e+09 +8.3% 9.869e+09 perf-stat.cache-misses
> >>> 2.924e+10 +2.9% 3.008e+10 ± 2% perf-stat.cache-references
> >>> 6712739 ± 2% -15.4% 5678643 ± 2% perf-stat.context-switches
> >>> 4.02 +2.7% 4.13 perf-stat.cpi
> >>> 3.761e+13 +17.3% 4.413e+13 perf-stat.cpu-cycles
> >>> 606958 -13.7% 523758 ± 2% perf-stat.cpu-migrations
> >>> 2.476e+12 +13.4% 2.809e+12 perf-stat.dTLB-loads
> >>> 0.18 ± 2% -0.0 0.16 ± 9% perf-stat.dTLB-store-miss-rate%
> >>> 1.079e+09 ± 2% -9.6% 9.755e+08 ± 9% perf-stat.dTLB-store-misses
> >>> 5.933e+11 +1.6% 6.029e+11 perf-stat.dTLB-stores
> >>> 9.349e+12 +14.2% 1.068e+13 perf-stat.instructions
> >>> 11247 ± 11% +19.8% 13477 ± 9% perf-stat.instructions-per-iTLB-miss
> >>> 0.25 -2.6% 0.24 perf-stat.ipc
> >>> 865561 +10.3% 954350 perf-stat.minor-faults
> >>> 2.901e+09 ± 3% +9.8% 3.186e+09 ± 3% perf-stat.node-load-misses
> >>> 3.682e+09 ± 3% +11.0% 4.088e+09 ± 3% perf-stat.node-loads
> >>> 3.778e+09 +4.8% 3.959e+09 ± 2% perf-stat.node-store-misses
> >>> 5.079e+09 +6.4% 5.402e+09 perf-stat.node-stores
> >>> 865565 +10.3% 954352 perf-stat.page-faults
> >>> 51.75 ± 5% -12.5% 45.30 ± 10% sched_debug.cfs_rq:/.load_avg.avg
> >>> 316.35 ± 3% +17.2% 370.81 ± 8% sched_debug.cfs_rq:/.util_est_enqueued.stddev
> >>> 15294 ± 30% +234.9% 51219 ± 76% sched_debug.cpu.avg_idle.min
> >>> 299443 ± 3% -7.3% 277566 ± 5% sched_debug.cpu.avg_idle.stddev
> >>> 1182 ± 19% -26.3% 872.02 ± 13% sched_debug.cpu.nr_load_updates.stddev
> >>> 1.22 ± 8% +21.7% 1.48 ± 6% sched_debug.cpu.nr_running.avg
> >>> 2.75 ± 10% +26.2% 3.47 ± 6% sched_debug.cpu.nr_running.max
> >>> 0.58 ± 7% +24.2% 0.73 ± 6% sched_debug.cpu.nr_running.stddev
> >>> 77148 -20.0% 61702 ± 7% sched_debug.cpu.nr_switches.avg
> >>> 70024 -24.8% 52647 ± 8% sched_debug.cpu.nr_switches.min
> >>> 6662 ± 6% +61.9% 10789 ± 24% sched_debug.cpu.nr_switches.stddev
> >>> 80.45 ± 18% -19.1% 65.05 ± 6% sched_debug.cpu.nr_uninterruptible.stddev
> >>> 76819 -19.3% 62008 ± 8% sched_debug.cpu.sched_count.avg
> >>> 70616 -23.5% 53996 ± 8% sched_debug.cpu.sched_count.min
> >>> 5494 ± 9% +85.3% 10179 ± 26% sched_debug.cpu.sched_count.stddev
> >>> 16936 -52.9% 7975 ± 9% sched_debug.cpu.sched_goidle.avg
> >>> 19281 -49.9% 9666 ± 7% sched_debug.cpu.sched_goidle.max
> >>> 15417 -54.8% 6962 ± 10% sched_debug.cpu.sched_goidle.min
> >>> 875.00 ± 6% -35.0% 569.09 ± 13% sched_debug.cpu.sched_goidle.stddev
> >>> 40332 -23.5% 30851 ± 7% sched_debug.cpu.ttwu_count.avg
> >>> 35074 -26.3% 25833 ± 6% sched_debug.cpu.ttwu_count.min
> >>> 3239 ± 8% +67.4% 5422 ± 28% sched_debug.cpu.ttwu_count.stddev
> >>> 5232 +27.4% 6665 ± 13% sched_debug.cpu.ttwu_local.avg
> >>> 15877 ± 12% +77.5% 28184 ± 27% sched_debug.cpu.ttwu_local.max
> >>> 2530 ± 10% +95.9% 4956 ± 27% sched_debug.cpu.ttwu_local.stddev
> >>> 2.52 ± 7% -0.6 1.95 ± 3% perf-profile.calltrace.cycles-pp.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> >>> 1.48 ± 12% -0.5 1.01 ± 4% perf-profile.calltrace.cycles-pp.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> >>> 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_search_slot.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write
> >>> 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter
> >>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.__dentry_kill.dentry_kill.dput.__fput.task_work_run
> >>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dentry_kill.dput.__fput
> >>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.dentry_kill.dput.__fput.task_work_run.exit_to_usermode_loop
> >>> 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.btrfs_evict_inode.evict.__dentry_kill.dentry_kill.dput
> >>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.dput.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64
> >>> 1.69 -0.1 1.54 ± 2% perf-profile.calltrace.cycles-pp.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> >>> 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter
> >>> 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> >>> 0.71 ± 6% -0.1 0.61 ± 2% perf-profile.calltrace.cycles-pp.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write
> >>> 0.69 ± 6% -0.1 0.60 ± 2% perf-profile.calltrace.cycles-pp.btrfs_clear_bit_hook.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need
> >>> 96.77 +0.6 97.33 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
> >>> 0.00 +0.6 0.56 ± 3% perf-profile.calltrace.cycles-pp.can_overcommit.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
> >>> 96.72 +0.6 97.29 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >>> 43.13 +0.8 43.91 perf-profile.calltrace.cycles-pp.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> >>> 42.37 +0.8 43.16 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write
> >>> 43.11 +0.8 43.89 perf-profile.calltrace.cycles-pp.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> >>> 42.96 +0.8 43.77 perf-profile.calltrace.cycles-pp._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter
> >>> 95.28 +0.9 96.23 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >>> 95.22 +1.0 96.18 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >>> 94.88 +1.0 95.85 perf-profile.calltrace.cycles-pp.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> >>> 94.83 +1.0 95.80 perf-profile.calltrace.cycles-pp.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write.do_syscall_64
> >>> 94.51 +1.0 95.50 perf-profile.calltrace.cycles-pp.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write
> >>> 42.44 +1.1 43.52 perf-profile.calltrace.cycles-pp._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
> >>> 42.09 +1.1 43.18 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write
> >>> 44.07 +1.2 45.29 perf-profile.calltrace.cycles-pp.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> >>> 43.42 +1.3 44.69 perf-profile.calltrace.cycles-pp.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> >>> 2.06 ± 18% -0.9 1.21 ± 6% perf-profile.children.cycles-pp.btrfs_search_slot
> >>> 2.54 ± 7% -0.6 1.96 ± 3% perf-profile.children.cycles-pp.btrfs_dirty_pages
> >>> 1.05 ± 24% -0.5 0.52 ± 9% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> >>> 1.50 ± 12% -0.5 1.03 ± 4% perf-profile.children.cycles-pp.btrfs_get_extent
> >>> 1.22 ± 15% -0.4 0.79 ± 8% perf-profile.children.cycles-pp.btrfs_lookup_file_extent
> >>> 0.81 ± 5% -0.4 0.41 ± 6% perf-profile.children.cycles-pp.btrfs_calc_reclaim_metadata_size
> >>> 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_lock_root_node
> >>> 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_tree_lock
> >>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.__dentry_kill
> >>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.evict
> >>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.dentry_kill
> >>> 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.btrfs_evict_inode
> >>> 0.91 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.exit_to_usermode_loop
> >>> 0.52 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.do_idle
> >>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.task_work_run
> >>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.__fput
> >>> 0.90 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.dput
> >>> 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.secondary_startup_64
> >>> 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.cpu_startup_entry
> >>> 0.50 ± 21% -0.3 0.17 ± 16% perf-profile.children.cycles-pp.start_secondary
> >>> 0.47 ± 20% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.cpuidle_enter_state
> >>> 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.intel_idle
> >>> 0.61 ± 20% -0.3 0.36 ± 11% perf-profile.children.cycles-pp.btrfs_tree_read_lock
> >>> 0.47 ± 26% -0.3 0.21 ± 10% perf-profile.children.cycles-pp.prepare_to_wait_event
> >>> 0.64 ± 18% -0.2 0.39 ± 9% perf-profile.children.cycles-pp.btrfs_read_lock_root_node
> >>> 0.40 ± 22% -0.2 0.21 ± 5% perf-profile.children.cycles-pp.btrfs_clear_path_blocking
> >>> 0.38 ± 23% -0.2 0.19 ± 13% perf-profile.children.cycles-pp.finish_wait
> >>> 1.51 ± 3% -0.2 1.35 ± 2% perf-profile.children.cycles-pp.__clear_extent_bit
> >>> 1.71 -0.1 1.56 ± 2% perf-profile.children.cycles-pp.lock_and_cleanup_extent_if_need
> >>> 0.29 ± 25% -0.1 0.15 ± 10% perf-profile.children.cycles-pp.btrfs_orphan_del
> >>> 0.27 ± 27% -0.1 0.12 ± 8% perf-profile.children.cycles-pp.btrfs_del_orphan_item
> >>> 0.33 ± 18% -0.1 0.19 ± 9% perf-profile.children.cycles-pp.queued_read_lock_slowpath
> >>> 0.33 ± 19% -0.1 0.20 ± 4% perf-profile.children.cycles-pp.__wake_up_common_lock
> >>> 0.45 ± 15% -0.1 0.34 ± 2% perf-profile.children.cycles-pp.btrfs_alloc_data_chunk_ondemand
> >>> 0.47 ± 16% -0.1 0.36 ± 4% perf-profile.children.cycles-pp.btrfs_check_data_free_space
> >>> 0.91 ± 4% -0.1 0.81 ± 3% perf-profile.children.cycles-pp.clear_extent_bit
> >>> 1.07 ± 5% -0.1 0.97 perf-profile.children.cycles-pp.__set_extent_bit
> >>> 0.77 ± 6% -0.1 0.69 ± 3% perf-profile.children.cycles-pp.btrfs_clear_bit_hook
> >>> 0.17 ± 20% -0.1 0.08 ± 10% perf-profile.children.cycles-pp.queued_write_lock_slowpath
> >>> 0.16 ± 22% -0.1 0.08 ± 24% perf-profile.children.cycles-pp.btrfs_lookup_inode
> >>> 0.21 ± 17% -0.1 0.14 ± 19% perf-profile.children.cycles-pp.__btrfs_update_delayed_inode
> >>> 0.26 ± 12% -0.1 0.18 ± 13% perf-profile.children.cycles-pp.btrfs_async_run_delayed_root
> >>> 0.52 ± 5% -0.1 0.45 perf-profile.children.cycles-pp.set_extent_bit
> >>> 0.45 ± 5% -0.1 0.40 ± 3% perf-profile.children.cycles-pp.alloc_extent_state
> >>> 0.11 ± 17% -0.1 0.06 ± 11% perf-profile.children.cycles-pp.btrfs_clear_lock_blocking_rw
> >>> 0.28 ± 9% -0.0 0.23 ± 3% perf-profile.children.cycles-pp.btrfs_drop_pages
> >>> 0.07 -0.0 0.03 ±100% perf-profile.children.cycles-pp.btrfs_set_lock_blocking_rw
> >>> 0.39 ± 3% -0.0 0.34 ± 3% perf-profile.children.cycles-pp.get_alloc_profile
> >>> 0.33 ± 7% -0.0 0.29 perf-profile.children.cycles-pp.btrfs_set_extent_delalloc
> >>> 0.38 ± 2% -0.0 0.35 ± 4% perf-profile.children.cycles-pp.__set_page_dirty_nobuffers
> >>> 0.49 ± 3% -0.0 0.46 ± 3% perf-profile.children.cycles-pp.pagecache_get_page
> >>> 0.18 ± 4% -0.0 0.15 ± 2% perf-profile.children.cycles-pp.truncate_inode_pages_range
> >>> 0.08 ± 5% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.btrfs_set_path_blocking
> >>> 0.08 ± 6% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.truncate_cleanup_page
> >>> 0.80 ± 4% +0.2 0.95 ± 2% perf-profile.children.cycles-pp.can_overcommit
> >>> 96.84 +0.5 97.37 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> >>> 96.80 +0.5 97.35 perf-profile.children.cycles-pp.do_syscall_64
> >>> 43.34 +0.8 44.17 perf-profile.children.cycles-pp.btrfs_inode_rsv_release
> >>> 43.49 +0.8 44.32 perf-profile.children.cycles-pp.block_rsv_release_bytes
> >>> 95.32 +0.9 96.26 perf-profile.children.cycles-pp.ksys_write
> >>> 95.26 +0.9 96.20 perf-profile.children.cycles-pp.vfs_write
> >>> 94.91 +1.0 95.88 perf-profile.children.cycles-pp.__vfs_write
> >>> 94.84 +1.0 95.81 perf-profile.children.cycles-pp.btrfs_file_write_iter
> >>> 94.55 +1.0 95.55 perf-profile.children.cycles-pp.__btrfs_buffered_write
> >>> 86.68 +1.0 87.70 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> >>> 44.08 +1.2 45.31 perf-profile.children.cycles-pp.btrfs_delalloc_reserve_metadata
> >>> 43.49 +1.3 44.77 perf-profile.children.cycles-pp.reserve_metadata_bytes
> >>> 87.59 +1.8 89.38 perf-profile.children.cycles-pp._raw_spin_lock
> >>> 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.self.cycles-pp.intel_idle
> >>> 0.33 ± 6% -0.1 0.18 ± 6% perf-profile.self.cycles-pp.get_alloc_profile
> >>> 0.27 ± 8% -0.0 0.22 ± 4% perf-profile.self.cycles-pp.btrfs_drop_pages
> >>> 0.07 -0.0 0.03 ±100% perf-profile.self.cycles-pp.btrfs_set_lock_blocking_rw
> >>> 0.14 ± 5% -0.0 0.12 ± 6% perf-profile.self.cycles-pp.clear_page_dirty_for_io
> >>> 0.09 ± 5% -0.0 0.07 ± 10% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> >>> 0.17 ± 4% +0.1 0.23 ± 3% perf-profile.self.cycles-pp.reserve_metadata_bytes
> >>> 0.31 ± 7% +0.1 0.45 ± 2% perf-profile.self.cycles-pp.can_overcommit
> >>> 86.35 +1.0 87.39 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
> >>>
> >>>
> >>>
> >>> aim7.jobs-per-min
> >>>
> >>> 29000 +-+-----------------------------------------------------------------+
> >>> 28500 +-+ +.. + +..+.. +.. |
> >>> |..+ +.+..+.. : .. + .+.+..+..+.+.. .+..+.. + + + |
> >>> 28000 +-+ + .. : + +. + + + |
> >>> 27500 +-+ + + |
> >>> | |
> >>> 27000 +-+ |
> >>> 26500 +-+ |
> >>> 26000 +-+ |
> >>> | |
> >>> 25500 +-+ O O O O O |
> >>> 25000 +-+ O O O O O O O O O
> >>> | O O O O O O O O |
> >>> 24500 O-+O O O O |
> >>> 24000 +-+-----------------------------------------------------------------+
> >>>
> >>>
> >>> [*] bisect-good sample
> >>> [O] bisect-bad sample
> >>>
> >>>
> >>> Disclaimer:
> >>> Results have been estimated based on internal Intel analysis and are
> >>> provided for informational purposes only. Any difference in system
> >>> hardware or software design or configuration may affect actual performance.
> >>>
> >>>
> >>> Thanks,
> >>> Xiaolong
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> LKP mailing list
> >>> [email protected]
> >>> https://lists.01.org/mailman/listinfo/lkp
> >>>
>
>
> --
> Jens Axboe
>
>

2018-08-04 03:06:30

by SeongJae Park

Subject: Re: [lkp-robot] [brd] 316ba5736c: aim7.jobs-per-min -11.2% regression

Hello,

On Mon, 4 Jun 2018, Jens Axboe wrote:

> On 6/3/18 11:52 PM, kernel test robot wrote:
> >
> > Greeting,
> >
> > FYI, we noticed a -11.2% regression of aim7.jobs-per-min due to commit:
> >
> >
> > commit: 316ba5736c9caa5dbcd84085989862d2df57431d ("brd: Mark as non-rotational")
> > https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-4.18/block
> >
> > in testcase: aim7
> > on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
> > with following parameters:
> >
> > disk: 1BRD_48G
> > fs: btrfs
> > test: disk_rw
> > load: 1500
> > cpufreq_governor: performance
>
> Does this also happen on eg ext4 or xfs? If not, it might point to something in
> btrfs that ends up being worse for a device that isn't rotational.

Sorry for the late response.

The regression is not reproducible with ext4. A similar test using ext4
did not show such a performance degradation (61483.81 jobs/min for the
original kernel, 60967.35 jobs/min with the patch applied). So the
cause of the regression is likely to be in btrfs.

Btrfs has optimizations for SSDs. It enables them if the user gives the
'ssd' mount option or if the block device is marked as non-rotational,
which is what the commit that introduced this regression does for brd.
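
As a rough illustration of that rule, here is a simplified model in Python. It is not the btrfs source: the function names `is_rotational` and `ssd_mode_enabled` and their parameters are hypothetical, and the handling of an explicit 'nossd' option is my assumption about how the two mount options interact. The sysfs file `queue/rotational` is where the block layer exposes the flag.

```python
from pathlib import Path


def is_rotational(dev: str, sysfs_root: str = "/sys/block") -> bool:
    """Read the block layer's rotational flag for a device such as 'ram0'."""
    flag = Path(sysfs_root, dev, "queue", "rotational").read_text().strip()
    return flag == "1"


def ssd_mode_enabled(mount_opts: set, rotational: bool) -> bool:
    """Simplified model (assumption): an explicit option wins,
    otherwise the decision follows the device's rotational flag."""
    if "nossd" in mount_opts:
        return False
    if "ssd" in mount_opts:
        return True
    return not rotational
```

Under this model, a brd device that reports 0 in `queue/rotational` gets the SSD optimizations without any mount option, which matches the brd-mod configuration in the measurements below.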

The profile result from the LKP robot says that lock contention has
severely increased with the commit. AFAIK, the optimizations are 1)
using a 2 MiB cluster size rather than 64 KiB, and 2) busy-wait log
syncing. The first optimization could lengthen critical sections, and
the second one can increase lock contention because it does not
voluntarily release the mutex.

So, I measured the jobs/min performance of four configurations:

  orig           Linux 4.17.0
  orig-opt       Linux 4.17.0 with the btrfs SSD optimization enabled
                 (using the 'ssd' mount option)
  brd-mod        the patch applied
  brd-btrfs-mod  the patch applied, but the btrfs SSD optimization
                 disabled

If the SSD optimizations of btrfs were the reason, orig and
brd-btrfs-mod should have similar performance, and orig-opt and brd-mod
should have similar performance. The results are as below:

  orig    orig-opt  brd-mod  brd-btrfs-mod
  22358   21403     18164    18856


The results say that the SSD optimization of btrfs can degrade
performance when a brd is used as its disk. However, it does not
completely explain the regression.
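
To make the comparison concrete, the relative changes can be worked out from the four figures above. The following is only a small arithmetic sketch in Python; the helper `pct_change` and the variable names are mine, and the inputs are copied from the table.

```python
# jobs/min figures copied from the table above
results = {
    "orig": 22358,
    "orig-opt": 21403,
    "brd-mod": 18164,
    "brd-btrfs-mod": 18856,
}


def pct_change(new: float, base: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


# Every configuration compared against the unpatched kernel.
vs_orig = {name: round(pct_change(v, results["orig"]), 1) for name, v in results.items()}

# Cost of the SSD optimization alone, without and with the brd patch.
ssd_cost_unpatched = round(pct_change(results["orig-opt"], results["orig"]), 1)
ssd_cost_patched = round(pct_change(results["brd-mod"], results["brd-btrfs-mod"]), 1)

for name, delta in vs_orig.items():
    print(f"{name:<14} {results[name]:>6} {delta:+.1f}%")
print(f"ssd cost, unpatched: {ssd_cost_unpatched:+.1f}%")
print(f"ssd cost, patched:   {ssd_cost_patched:+.1f}%")
```

With these inputs the SSD optimization costs roughly 4% on either kernel (-4.3% and -3.7%), while brd-btrfs-mod is still about 15.7% below orig, so most of the gap remains even with the optimization disabled.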

I will look into this further and report back soon.


Thanks,
SeongJae Park

>
> CC'ing the btrfs guys, and leaving the rest of the email below.
>
> > test-description: AIM7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of multiuser system.
> > test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
> >
> >
> >
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> >
> > =========================================================================================
> > compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
> > gcc-7/performance/1BRD_48G/btrfs/x86_64-rhel-7.2/1500/debian-x86_64-2016-08-31.cgz/lkp-ivb-ep01/disk_rw/aim7
> >
> > commit:
> > 522a777566 ("block: consolidate struct request timestamp fields")
> > 316ba5736c ("brd: Mark as non-rotational")
> >
> > 522a777566f56696 316ba5736c9caa5dbcd8408598
> > ---------------- --------------------------
> > %stddev %change %stddev
> > \ | \
> > 28321 -11.2% 25147 aim7.jobs-per-min
> > 318.19 +12.6% 358.23 aim7.time.elapsed_time
> > 318.19 +12.6% 358.23 aim7.time.elapsed_time.max
> > 1437526 ± 2% +14.6% 1646849 ± 2% aim7.time.involuntary_context_switches
> > 11986 +14.2% 13691 aim7.time.system_time
> > 73.06 ± 2% -3.6% 70.43 aim7.time.user_time
> > 2449470 ± 2% -25.0% 1837521 ± 4% aim7.time.voluntary_context_switches
> > 20.25 ± 58% +1681.5% 360.75 ±109% numa-meminfo.node1.Mlocked
> > 456062 -16.3% 381859 softirqs.SCHED
> > 9015 ± 7% -21.3% 7098 ± 22% meminfo.CmaFree
> > 47.50 ± 58% +1355.8% 691.50 ± 92% meminfo.Mlocked
> > 5.24 ± 3% -1.2 3.99 ± 2% mpstat.cpu.idle%
> > 0.61 ± 2% -0.1 0.52 ± 2% mpstat.cpu.usr%
> > 16627 +12.8% 18762 ± 4% slabinfo.Acpi-State.active_objs
> > 16627 +12.9% 18775 ± 4% slabinfo.Acpi-State.num_objs
> > 57.00 ± 2% +17.5% 67.00 vmstat.procs.r
> > 20936 -24.8% 15752 ± 2% vmstat.system.cs
> > 45474 -1.7% 44681 vmstat.system.in
> > 6.50 ± 59% +1157.7% 81.75 ± 75% numa-vmstat.node0.nr_mlock
> > 242870 ± 3% +13.2% 274913 ± 7% numa-vmstat.node0.nr_written
> > 2278 ± 7% -22.6% 1763 ± 21% numa-vmstat.node1.nr_free_cma
> > 4.75 ± 58% +1789.5% 89.75 ±109% numa-vmstat.node1.nr_mlock
> > 88018135 ± 3% -48.9% 44980457 ± 7% cpuidle.C1.time
> > 1398288 ± 3% -51.1% 683493 ± 9% cpuidle.C1.usage
> > 3499814 ± 2% -38.5% 2153158 ± 5% cpuidle.C1E.time
> > 52722 ± 4% -45.6% 28692 ± 6% cpuidle.C1E.usage
> > 9865857 ± 3% -40.1% 5905155 ± 5% cpuidle.C3.time
> > 69656 ± 2% -42.6% 39990 ± 5% cpuidle.C3.usage
> > 590856 ± 2% -12.3% 517910 cpuidle.C6.usage
> > 46160 ± 7% -53.7% 21372 ± 11% cpuidle.POLL.time
> > 1716 ± 7% -46.6% 916.25 ± 14% cpuidle.POLL.usage
> > 197656 +4.1% 205732 proc-vmstat.nr_active_file
> > 191867 +4.1% 199647 proc-vmstat.nr_dirty
> > 509282 +1.6% 517318 proc-vmstat.nr_file_pages
> > 2282 ± 8% -24.4% 1725 ± 22% proc-vmstat.nr_free_cma
> > 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_inactive_file
> > 11.50 ± 58% +1397.8% 172.25 ± 93% proc-vmstat.nr_mlock
> > 970355 ± 4% +14.6% 1111549 ± 8% proc-vmstat.nr_written
> > 197984 +4.1% 206034 proc-vmstat.nr_zone_active_file
> > 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_zone_inactive_file
> > 192282 +4.1% 200126 proc-vmstat.nr_zone_write_pending
> > 7901465 ± 3% -14.0% 6795016 ± 16% proc-vmstat.pgalloc_movable
> > 886101 +10.2% 976329 proc-vmstat.pgfault
> > 2.169e+12 +15.2% 2.497e+12 perf-stat.branch-instructions
> > 0.41 -0.1 0.35 perf-stat.branch-miss-rate%
> > 31.19 ± 2% +1.6 32.82 perf-stat.cache-miss-rate%
> > 9.116e+09 +8.3% 9.869e+09 perf-stat.cache-misses
> > 2.924e+10 +2.9% 3.008e+10 ± 2% perf-stat.cache-references
> > 6712739 ± 2% -15.4% 5678643 ± 2% perf-stat.context-switches
> > 4.02 +2.7% 4.13 perf-stat.cpi
> > 3.761e+13 +17.3% 4.413e+13 perf-stat.cpu-cycles
> > 606958 -13.7% 523758 ± 2% perf-stat.cpu-migrations
> > 2.476e+12 +13.4% 2.809e+12 perf-stat.dTLB-loads
> > 0.18 ± 2% -0.0 0.16 ± 9% perf-stat.dTLB-store-miss-rate%
> > 1.079e+09 ± 2% -9.6% 9.755e+08 ± 9% perf-stat.dTLB-store-misses
> > 5.933e+11 +1.6% 6.029e+11 perf-stat.dTLB-stores
> > 9.349e+12 +14.2% 1.068e+13 perf-stat.instructions
> > 11247 ± 11% +19.8% 13477 ± 9% perf-stat.instructions-per-iTLB-miss
> > 0.25 -2.6% 0.24 perf-stat.ipc
> > 865561 +10.3% 954350 perf-stat.minor-faults
> > 2.901e+09 ± 3% +9.8% 3.186e+09 ± 3% perf-stat.node-load-misses
> > 3.682e+09 ± 3% +11.0% 4.088e+09 ± 3% perf-stat.node-loads
> > 3.778e+09 +4.8% 3.959e+09 ± 2% perf-stat.node-store-misses
> > 5.079e+09 +6.4% 5.402e+09 perf-stat.node-stores
> > 865565 +10.3% 954352 perf-stat.page-faults
> > 51.75 ± 5% -12.5% 45.30 ± 10% sched_debug.cfs_rq:/.load_avg.avg
> > 316.35 ± 3% +17.2% 370.81 ± 8% sched_debug.cfs_rq:/.util_est_enqueued.stddev
> > 15294 ± 30% +234.9% 51219 ± 76% sched_debug.cpu.avg_idle.min
> > 299443 ± 3% -7.3% 277566 ± 5% sched_debug.cpu.avg_idle.stddev
> > 1182 ± 19% -26.3% 872.02 ± 13% sched_debug.cpu.nr_load_updates.stddev
> > 1.22 ± 8% +21.7% 1.48 ± 6% sched_debug.cpu.nr_running.avg
> > 2.75 ± 10% +26.2% 3.47 ± 6% sched_debug.cpu.nr_running.max
> > 0.58 ± 7% +24.2% 0.73 ± 6% sched_debug.cpu.nr_running.stddev
> > 77148 -20.0% 61702 ± 7% sched_debug.cpu.nr_switches.avg
> > 70024 -24.8% 52647 ± 8% sched_debug.cpu.nr_switches.min
> > 6662 ± 6% +61.9% 10789 ± 24% sched_debug.cpu.nr_switches.stddev
> > 80.45 ± 18% -19.1% 65.05 ± 6% sched_debug.cpu.nr_uninterruptible.stddev
> > 76819 -19.3% 62008 ± 8% sched_debug.cpu.sched_count.avg
> > 70616 -23.5% 53996 ± 8% sched_debug.cpu.sched_count.min
> > 5494 ± 9% +85.3% 10179 ± 26% sched_debug.cpu.sched_count.stddev
> > 16936 -52.9% 7975 ± 9% sched_debug.cpu.sched_goidle.avg
> > 19281 -49.9% 9666 ± 7% sched_debug.cpu.sched_goidle.max
> > 15417 -54.8% 6962 ± 10% sched_debug.cpu.sched_goidle.min
> > 875.00 ± 6% -35.0% 569.09 ± 13% sched_debug.cpu.sched_goidle.stddev
> > 40332 -23.5% 30851 ± 7% sched_debug.cpu.ttwu_count.avg
> > 35074 -26.3% 25833 ± 6% sched_debug.cpu.ttwu_count.min
> > 3239 ± 8% +67.4% 5422 ± 28% sched_debug.cpu.ttwu_count.stddev
> > 5232 +27.4% 6665 ± 13% sched_debug.cpu.ttwu_local.avg
> > 15877 ± 12% +77.5% 28184 ± 27% sched_debug.cpu.ttwu_local.max
> > 2530 ± 10% +95.9% 4956 ± 27% sched_debug.cpu.ttwu_local.stddev
> > 2.52 ± 7% -0.6 1.95 ± 3% perf-profile.calltrace.cycles-pp.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> > 1.48 ± 12% -0.5 1.01 ± 4% perf-profile.calltrace.cycles-pp.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> > 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_search_slot.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write
> > 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter
> > 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.__dentry_kill.dentry_kill.dput.__fput.task_work_run
> > 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dentry_kill.dput.__fput
> > 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.dentry_kill.dput.__fput.task_work_run.exit_to_usermode_loop
> > 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.btrfs_evict_inode.evict.__dentry_kill.dentry_kill.dput
> > 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.dput.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64
> > 1.69 -0.1 1.54 ± 2% perf-profile.calltrace.cycles-pp.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> > 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter
> > 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> > 0.71 ± 6% -0.1 0.61 ± 2% perf-profile.calltrace.cycles-pp.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write
> > 0.69 ± 6% -0.1 0.60 ± 2% perf-profile.calltrace.cycles-pp.btrfs_clear_bit_hook.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need
> > 96.77 +0.6 97.33 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 0.00 +0.6 0.56 ± 3% perf-profile.calltrace.cycles-pp.can_overcommit.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
> > 96.72 +0.6 97.29 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 43.13 +0.8 43.91 perf-profile.calltrace.cycles-pp.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> > 42.37 +0.8 43.16 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write
> > 43.11 +0.8 43.89 perf-profile.calltrace.cycles-pp.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> > 42.96 +0.8 43.77 perf-profile.calltrace.cycles-pp._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter
> > 95.28 +0.9 96.23 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 95.22 +1.0 96.18 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 94.88 +1.0 95.85 perf-profile.calltrace.cycles-pp.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > 94.83 +1.0 95.80 perf-profile.calltrace.cycles-pp.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write.do_syscall_64
> > 94.51 +1.0 95.50 perf-profile.calltrace.cycles-pp.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write
> > 42.44 +1.1 43.52 perf-profile.calltrace.cycles-pp._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
> > 42.09 +1.1 43.18 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write
> > 44.07 +1.2 45.29 perf-profile.calltrace.cycles-pp.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
> > 43.42 +1.3 44.69 perf-profile.calltrace.cycles-pp.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
> > 2.06 ± 18% -0.9 1.21 ± 6% perf-profile.children.cycles-pp.btrfs_search_slot
> > 2.54 ± 7% -0.6 1.96 ± 3% perf-profile.children.cycles-pp.btrfs_dirty_pages
> > 1.05 ± 24% -0.5 0.52 ± 9% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
> > 1.50 ± 12% -0.5 1.03 ± 4% perf-profile.children.cycles-pp.btrfs_get_extent
> > 1.22 ± 15% -0.4 0.79 ± 8% perf-profile.children.cycles-pp.btrfs_lookup_file_extent
> > 0.81 ± 5% -0.4 0.41 ± 6% perf-profile.children.cycles-pp.btrfs_calc_reclaim_metadata_size
> > 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_lock_root_node
> > 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_tree_lock
> > 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.__dentry_kill
> > 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.evict
> > 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.dentry_kill
> > 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.btrfs_evict_inode
> > 0.91 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.exit_to_usermode_loop
> > 0.52 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.do_idle
> > 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.task_work_run
> > 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.__fput
> > 0.90 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.dput
> > 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.secondary_startup_64
> > 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.cpu_startup_entry
> > 0.50 ± 21% -0.3 0.17 ± 16% perf-profile.children.cycles-pp.start_secondary
> > 0.47 ± 20% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.cpuidle_enter_state
> > 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.intel_idle
> > 0.61 ± 20% -0.3 0.36 ± 11% perf-profile.children.cycles-pp.btrfs_tree_read_lock
> > 0.47 ± 26% -0.3 0.21 ± 10% perf-profile.children.cycles-pp.prepare_to_wait_event
> > 0.64 ± 18% -0.2 0.39 ± 9% perf-profile.children.cycles-pp.btrfs_read_lock_root_node
> > 0.40 ± 22% -0.2 0.21 ± 5% perf-profile.children.cycles-pp.btrfs_clear_path_blocking
> > 0.38 ± 23% -0.2 0.19 ± 13% perf-profile.children.cycles-pp.finish_wait
> > 1.51 ± 3% -0.2 1.35 ± 2% perf-profile.children.cycles-pp.__clear_extent_bit
> > 1.71 -0.1 1.56 ± 2% perf-profile.children.cycles-pp.lock_and_cleanup_extent_if_need
> > 0.29 ± 25% -0.1 0.15 ± 10% perf-profile.children.cycles-pp.btrfs_orphan_del
> > 0.27 ± 27% -0.1 0.12 ± 8% perf-profile.children.cycles-pp.btrfs_del_orphan_item
> > 0.33 ± 18% -0.1 0.19 ± 9% perf-profile.children.cycles-pp.queued_read_lock_slowpath
> > 0.33 ± 19% -0.1 0.20 ± 4% perf-profile.children.cycles-pp.__wake_up_common_lock
> > 0.45 ± 15% -0.1 0.34 ± 2% perf-profile.children.cycles-pp.btrfs_alloc_data_chunk_ondemand
> > 0.47 ± 16% -0.1 0.36 ± 4% perf-profile.children.cycles-pp.btrfs_check_data_free_space
> > 0.91 ± 4% -0.1 0.81 ± 3% perf-profile.children.cycles-pp.clear_extent_bit
> > 1.07 ± 5% -0.1 0.97 perf-profile.children.cycles-pp.__set_extent_bit
> > 0.77 ± 6% -0.1 0.69 ± 3% perf-profile.children.cycles-pp.btrfs_clear_bit_hook
> > 0.17 ± 20% -0.1 0.08 ± 10% perf-profile.children.cycles-pp.queued_write_lock_slowpath
> > 0.16 ± 22% -0.1 0.08 ± 24% perf-profile.children.cycles-pp.btrfs_lookup_inode
> > 0.21 ± 17% -0.1 0.14 ± 19% perf-profile.children.cycles-pp.__btrfs_update_delayed_inode
> > 0.26 ± 12% -0.1 0.18 ± 13% perf-profile.children.cycles-pp.btrfs_async_run_delayed_root
> > 0.52 ± 5% -0.1 0.45 perf-profile.children.cycles-pp.set_extent_bit
> > 0.45 ± 5% -0.1 0.40 ± 3% perf-profile.children.cycles-pp.alloc_extent_state
> > 0.11 ± 17% -0.1 0.06 ± 11% perf-profile.children.cycles-pp.btrfs_clear_lock_blocking_rw
> > 0.28 ± 9% -0.0 0.23 ± 3% perf-profile.children.cycles-pp.btrfs_drop_pages
> > 0.07 -0.0 0.03 ±100% perf-profile.children.cycles-pp.btrfs_set_lock_blocking_rw
> > 0.39 ± 3% -0.0 0.34 ± 3% perf-profile.children.cycles-pp.get_alloc_profile
> > 0.33 ± 7% -0.0 0.29 perf-profile.children.cycles-pp.btrfs_set_extent_delalloc
> > 0.38 ± 2% -0.0 0.35 ± 4% perf-profile.children.cycles-pp.__set_page_dirty_nobuffers
> > 0.49 ± 3% -0.0 0.46 ± 3% perf-profile.children.cycles-pp.pagecache_get_page
> > 0.18 ± 4% -0.0 0.15 ± 2% perf-profile.children.cycles-pp.truncate_inode_pages_range
> > 0.08 ± 5% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.btrfs_set_path_blocking
> > 0.08 ± 6% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.truncate_cleanup_page
> > 0.80 ± 4% +0.2 0.95 ± 2% perf-profile.children.cycles-pp.can_overcommit
> > 96.84 +0.5 97.37 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
> > 96.80 +0.5 97.35 perf-profile.children.cycles-pp.do_syscall_64
> > 43.34 +0.8 44.17 perf-profile.children.cycles-pp.btrfs_inode_rsv_release
> > 43.49 +0.8 44.32 perf-profile.children.cycles-pp.block_rsv_release_bytes
> > 95.32 +0.9 96.26 perf-profile.children.cycles-pp.ksys_write
> > 95.26 +0.9 96.20 perf-profile.children.cycles-pp.vfs_write
> > 94.91 +1.0 95.88 perf-profile.children.cycles-pp.__vfs_write
> > 94.84 +1.0 95.81 perf-profile.children.cycles-pp.btrfs_file_write_iter
> > 94.55 +1.0 95.55 perf-profile.children.cycles-pp.__btrfs_buffered_write
> > 86.68 +1.0 87.70 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> > 44.08 +1.2 45.31 perf-profile.children.cycles-pp.btrfs_delalloc_reserve_metadata
> > 43.49 +1.3 44.77 perf-profile.children.cycles-pp.reserve_metadata_bytes
> > 87.59 +1.8 89.38 perf-profile.children.cycles-pp._raw_spin_lock
> > 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.self.cycles-pp.intel_idle
> > 0.33 ± 6% -0.1 0.18 ± 6% perf-profile.self.cycles-pp.get_alloc_profile
> > 0.27 ± 8% -0.0 0.22 ± 4% perf-profile.self.cycles-pp.btrfs_drop_pages
> > 0.07 -0.0 0.03 ±100% perf-profile.self.cycles-pp.btrfs_set_lock_blocking_rw
> > 0.14 ± 5% -0.0 0.12 ± 6% perf-profile.self.cycles-pp.clear_page_dirty_for_io
> > 0.09 ± 5% -0.0 0.07 ± 10% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
> > 0.17 ± 4% +0.1 0.23 ± 3% perf-profile.self.cycles-pp.reserve_metadata_bytes
> > 0.31 ± 7% +0.1 0.45 ± 2% perf-profile.self.cycles-pp.can_overcommit
> > 86.35 +1.0 87.39 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
> >
> >
> >
> > aim7.jobs-per-min
> >
> > 29000 +-+-----------------------------------------------------------------+
> > 28500 +-+ +.. + +..+.. +.. |
> > |..+ +.+..+.. : .. + .+.+..+..+.+.. .+..+.. + + + |
> > 28000 +-+ + .. : + +. + + + |
> > 27500 +-+ + + |
> > | |
> > 27000 +-+ |
> > 26500 +-+ |
> > 26000 +-+ |
> > | |
> > 25500 +-+ O O O O O |
> > 25000 +-+ O O O O O O O O O
> > | O O O O O O O O |
> > 24500 O-+O O O O |
> > 24000 +-+-----------------------------------------------------------------+
> >
> >
> > [*] bisect-good sample
> > [O] bisect-bad sample
> >
> >
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> >
> >
> > Thanks,
> > Xiaolong
> >
>
>
> --
> Jens Axboe
>
>

2018-12-18 09:17:04

by Kemi Wang

Subject: Re: [LKP] [lkp-robot] [brd] 316ba5736c: aim7.jobs-per-min -11.2% regression

Hi, All
Do we have any special reason to keep this patch (316ba5736c9: "brd: Mark as non-rotational"),
which leads to a performance regression when BRD is used as a disk on btrfs?

On 2018/7/10 1:27 PM, kemi wrote:
> Hi, SeongJae
> Do you have any input on this regression? Thanks.
>
> On 2018-06-04 13:52, kernel test robot wrote:
>>
>> Greeting,
>>
>> FYI, we noticed a -11.2% regression of aim7.jobs-per-min due to commit:
>>
>>
>> commit: 316ba5736c9caa5dbcd84085989862d2df57431d ("brd: Mark as non-rotational")
>> https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-4.18/block
>>
>> in testcase: aim7
>> on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
>> with following parameters:
>>
>> disk: 1BRD_48G
>> fs: btrfs
>> test: disk_rw
>> load: 1500
>> cpufreq_governor: performance
>>
>> test-description: AIM7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of multiuser system.
>> test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
>>
>>
>>
>> Details are as below:
>> -------------------------------------------------------------------------------------------------->
>>
>> =========================================================================================
>> compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
>> gcc-7/performance/1BRD_48G/btrfs/x86_64-rhel-7.2/1500/debian-x86_64-2016-08-31.cgz/lkp-ivb-ep01/disk_rw/aim7
>>
>> commit:
>> 522a777566 ("block: consolidate struct request timestamp fields")
>> 316ba5736c ("brd: Mark as non-rotational")
>>
>> 522a777566f56696 316ba5736c9caa5dbcd8408598
>> ---------------- --------------------------
>> %stddev %change %stddev
>> \ | \
>> 28321 -11.2% 25147 aim7.jobs-per-min
>> 318.19 +12.6% 358.23 aim7.time.elapsed_time
>> 318.19 +12.6% 358.23 aim7.time.elapsed_time.max
>> 1437526 ± 2% +14.6% 1646849 ± 2% aim7.time.involuntary_context_switches
>> 11986 +14.2% 13691 aim7.time.system_time
>> 73.06 ± 2% -3.6% 70.43 aim7.time.user_time
>> 2449470 ± 2% -25.0% 1837521 ± 4% aim7.time.voluntary_context_switches
>> 20.25 ± 58% +1681.5% 360.75 ±109% numa-meminfo.node1.Mlocked
>> 456062 -16.3% 381859 softirqs.SCHED
>> 9015 ± 7% -21.3% 7098 ± 22% meminfo.CmaFree
>> 47.50 ± 58% +1355.8% 691.50 ± 92% meminfo.Mlocked
>> 5.24 ± 3% -1.2 3.99 ± 2% mpstat.cpu.idle%
>> 0.61 ± 2% -0.1 0.52 ± 2% mpstat.cpu.usr%
>> 16627 +12.8% 18762 ± 4% slabinfo.Acpi-State.active_objs
>> 16627 +12.9% 18775 ± 4% slabinfo.Acpi-State.num_objs
>> 57.00 ± 2% +17.5% 67.00 vmstat.procs.r
>> 20936 -24.8% 15752 ± 2% vmstat.system.cs
>> 45474 -1.7% 44681 vmstat.system.in
>> 6.50 ± 59% +1157.7% 81.75 ± 75% numa-vmstat.node0.nr_mlock
>> 242870 ± 3% +13.2% 274913 ± 7% numa-vmstat.node0.nr_written
>> 2278 ± 7% -22.6% 1763 ± 21% numa-vmstat.node1.nr_free_cma
>> 4.75 ± 58% +1789.5% 89.75 ±109% numa-vmstat.node1.nr_mlock
>> 88018135 ± 3% -48.9% 44980457 ± 7% cpuidle.C1.time
>> 1398288 ± 3% -51.1% 683493 ± 9% cpuidle.C1.usage
>> 3499814 ± 2% -38.5% 2153158 ± 5% cpuidle.C1E.time
>> 52722 ± 4% -45.6% 28692 ± 6% cpuidle.C1E.usage
>> 9865857 ± 3% -40.1% 5905155 ± 5% cpuidle.C3.time
>> 69656 ± 2% -42.6% 39990 ± 5% cpuidle.C3.usage
>> 590856 ± 2% -12.3% 517910 cpuidle.C6.usage
>> 46160 ± 7% -53.7% 21372 ± 11% cpuidle.POLL.time
>> 1716 ± 7% -46.6% 916.25 ± 14% cpuidle.POLL.usage
>> 197656 +4.1% 205732 proc-vmstat.nr_active_file
>> 191867 +4.1% 199647 proc-vmstat.nr_dirty
>> 509282 +1.6% 517318 proc-vmstat.nr_file_pages
>> 2282 ± 8% -24.4% 1725 ± 22% proc-vmstat.nr_free_cma
>> 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_inactive_file
>> 11.50 ± 58% +1397.8% 172.25 ± 93% proc-vmstat.nr_mlock
>> 970355 ± 4% +14.6% 1111549 ± 8% proc-vmstat.nr_written
>> 197984 +4.1% 206034 proc-vmstat.nr_zone_active_file
>> 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_zone_inactive_file
>> 192282 +4.1% 200126 proc-vmstat.nr_zone_write_pending
>> 7901465 ± 3% -14.0% 6795016 ± 16% proc-vmstat.pgalloc_movable
>> 886101 +10.2% 976329 proc-vmstat.pgfault
>> 2.169e+12 +15.2% 2.497e+12 perf-stat.branch-instructions
>> 0.41 -0.1 0.35 perf-stat.branch-miss-rate%
>> 31.19 ± 2% +1.6 32.82 perf-stat.cache-miss-rate%
>> 9.116e+09 +8.3% 9.869e+09 perf-stat.cache-misses
>> 2.924e+10 +2.9% 3.008e+10 ± 2% perf-stat.cache-references
>> 6712739 ± 2% -15.4% 5678643 ± 2% perf-stat.context-switches
>> 4.02 +2.7% 4.13 perf-stat.cpi
>> 3.761e+13 +17.3% 4.413e+13 perf-stat.cpu-cycles
>> 606958 -13.7% 523758 ± 2% perf-stat.cpu-migrations
>> 2.476e+12 +13.4% 2.809e+12 perf-stat.dTLB-loads
>> 0.18 ± 2% -0.0 0.16 ± 9% perf-stat.dTLB-store-miss-rate%
>> 1.079e+09 ± 2% -9.6% 9.755e+08 ± 9% perf-stat.dTLB-store-misses
>> 5.933e+11 +1.6% 6.029e+11 perf-stat.dTLB-stores
>> 9.349e+12 +14.2% 1.068e+13 perf-stat.instructions
>> 11247 ± 11% +19.8% 13477 ± 9% perf-stat.instructions-per-iTLB-miss
>> 0.25 -2.6% 0.24 perf-stat.ipc
>> 865561 +10.3% 954350 perf-stat.minor-faults
>> 2.901e+09 ± 3% +9.8% 3.186e+09 ± 3% perf-stat.node-load-misses
>> 3.682e+09 ± 3% +11.0% 4.088e+09 ± 3% perf-stat.node-loads
>> 3.778e+09 +4.8% 3.959e+09 ± 2% perf-stat.node-store-misses
>> 5.079e+09 +6.4% 5.402e+09 perf-stat.node-stores
>> 865565 +10.3% 954352 perf-stat.page-faults
>> 51.75 ± 5% -12.5% 45.30 ± 10% sched_debug.cfs_rq:/.load_avg.avg
>> 316.35 ± 3% +17.2% 370.81 ± 8% sched_debug.cfs_rq:/.util_est_enqueued.stddev
>> 15294 ± 30% +234.9% 51219 ± 76% sched_debug.cpu.avg_idle.min
>> 299443 ± 3% -7.3% 277566 ± 5% sched_debug.cpu.avg_idle.stddev
>> 1182 ± 19% -26.3% 872.02 ± 13% sched_debug.cpu.nr_load_updates.stddev
>> 1.22 ± 8% +21.7% 1.48 ± 6% sched_debug.cpu.nr_running.avg
>> 2.75 ± 10% +26.2% 3.47 ± 6% sched_debug.cpu.nr_running.max
>> 0.58 ± 7% +24.2% 0.73 ± 6% sched_debug.cpu.nr_running.stddev
>> 77148 -20.0% 61702 ± 7% sched_debug.cpu.nr_switches.avg
>> 70024 -24.8% 52647 ± 8% sched_debug.cpu.nr_switches.min
>> 6662 ± 6% +61.9% 10789 ± 24% sched_debug.cpu.nr_switches.stddev
>> 80.45 ± 18% -19.1% 65.05 ± 6% sched_debug.cpu.nr_uninterruptible.stddev
>> 76819 -19.3% 62008 ± 8% sched_debug.cpu.sched_count.avg
>> 70616 -23.5% 53996 ± 8% sched_debug.cpu.sched_count.min
>> 5494 ± 9% +85.3% 10179 ± 26% sched_debug.cpu.sched_count.stddev
>> 16936 -52.9% 7975 ± 9% sched_debug.cpu.sched_goidle.avg
>> 19281 -49.9% 9666 ± 7% sched_debug.cpu.sched_goidle.max
>> 15417 -54.8% 6962 ± 10% sched_debug.cpu.sched_goidle.min
>> 875.00 ± 6% -35.0% 569.09 ± 13% sched_debug.cpu.sched_goidle.stddev
>> 40332 -23.5% 30851 ± 7% sched_debug.cpu.ttwu_count.avg
>> 35074 -26.3% 25833 ± 6% sched_debug.cpu.ttwu_count.min
>> 3239 ± 8% +67.4% 5422 ± 28% sched_debug.cpu.ttwu_count.stddev
>> 5232 +27.4% 6665 ± 13% sched_debug.cpu.ttwu_local.avg
>> 15877 ± 12% +77.5% 28184 ± 27% sched_debug.cpu.ttwu_local.max
>> 2530 ± 10% +95.9% 4956 ± 27% sched_debug.cpu.ttwu_local.stddev
>> 2.52 ± 7% -0.6 1.95 ± 3% perf-profile.calltrace.cycles-pp.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
>> 1.48 ± 12% -0.5 1.01 ± 4% perf-profile.calltrace.cycles-pp.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
>> 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_search_slot.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write
>> 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter
>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.__dentry_kill.dentry_kill.dput.__fput.task_work_run
>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dentry_kill.dput.__fput
>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.dentry_kill.dput.__fput.task_work_run.exit_to_usermode_loop
>> 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.btrfs_evict_inode.evict.__dentry_kill.dentry_kill.dput
>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.dput.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64
>> 1.69 -0.1 1.54 ± 2% perf-profile.calltrace.cycles-pp.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
>> 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter
>> 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
>> 0.71 ± 6% -0.1 0.61 ± 2% perf-profile.calltrace.cycles-pp.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write
>> 0.69 ± 6% -0.1 0.60 ± 2% perf-profile.calltrace.cycles-pp.btrfs_clear_bit_hook.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need
>> 96.77 +0.6 97.33 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
>> 0.00 +0.6 0.56 ± 3% perf-profile.calltrace.cycles-pp.can_overcommit.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
>> 96.72 +0.6 97.29 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
>> 43.13 +0.8 43.91 perf-profile.calltrace.cycles-pp.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
>> 42.37 +0.8 43.16 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write
>> 43.11 +0.8 43.89 perf-profile.calltrace.cycles-pp.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
>> 42.96 +0.8 43.77 perf-profile.calltrace.cycles-pp._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter
>> 95.28 +0.9 96.23 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>> 95.22 +1.0 96.18 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>> 94.88 +1.0 95.85 perf-profile.calltrace.cycles-pp.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>> 94.83 +1.0 95.80 perf-profile.calltrace.cycles-pp.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write.do_syscall_64
>> 94.51 +1.0 95.50 perf-profile.calltrace.cycles-pp.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write
>> 42.44 +1.1 43.52 perf-profile.calltrace.cycles-pp._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
>> 42.09 +1.1 43.18 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write
>> 44.07 +1.2 45.29 perf-profile.calltrace.cycles-pp.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
>> 43.42 +1.3 44.69 perf-profile.calltrace.cycles-pp.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
>> 2.06 ± 18% -0.9 1.21 ± 6% perf-profile.children.cycles-pp.btrfs_search_slot
>> 2.54 ± 7% -0.6 1.96 ± 3% perf-profile.children.cycles-pp.btrfs_dirty_pages
>> 1.05 ± 24% -0.5 0.52 ± 9% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>> 1.50 ± 12% -0.5 1.03 ± 4% perf-profile.children.cycles-pp.btrfs_get_extent
>> 1.22 ± 15% -0.4 0.79 ± 8% perf-profile.children.cycles-pp.btrfs_lookup_file_extent
>> 0.81 ± 5% -0.4 0.41 ± 6% perf-profile.children.cycles-pp.btrfs_calc_reclaim_metadata_size
>> 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_lock_root_node
>> 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_tree_lock
>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.__dentry_kill
>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.evict
>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.dentry_kill
>> 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.btrfs_evict_inode
>> 0.91 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.exit_to_usermode_loop
>> 0.52 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.do_idle
>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.task_work_run
>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.__fput
>> 0.90 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.dput
>> 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.secondary_startup_64
>> 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.cpu_startup_entry
>> 0.50 ± 21% -0.3 0.17 ± 16% perf-profile.children.cycles-pp.start_secondary
>> 0.47 ± 20% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.cpuidle_enter_state
>> 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.intel_idle
>> 0.61 ± 20% -0.3 0.36 ± 11% perf-profile.children.cycles-pp.btrfs_tree_read_lock
>> 0.47 ± 26% -0.3 0.21 ± 10% perf-profile.children.cycles-pp.prepare_to_wait_event
>> 0.64 ± 18% -0.2 0.39 ± 9% perf-profile.children.cycles-pp.btrfs_read_lock_root_node
>> 0.40 ± 22% -0.2 0.21 ± 5% perf-profile.children.cycles-pp.btrfs_clear_path_blocking
>> 0.38 ± 23% -0.2 0.19 ± 13% perf-profile.children.cycles-pp.finish_wait
>> 1.51 ± 3% -0.2 1.35 ± 2% perf-profile.children.cycles-pp.__clear_extent_bit
>> 1.71 -0.1 1.56 ± 2% perf-profile.children.cycles-pp.lock_and_cleanup_extent_if_need
>> 0.29 ± 25% -0.1 0.15 ± 10% perf-profile.children.cycles-pp.btrfs_orphan_del
>> 0.27 ± 27% -0.1 0.12 ± 8% perf-profile.children.cycles-pp.btrfs_del_orphan_item
>> 0.33 ± 18% -0.1 0.19 ± 9% perf-profile.children.cycles-pp.queued_read_lock_slowpath
>> 0.33 ± 19% -0.1 0.20 ± 4% perf-profile.children.cycles-pp.__wake_up_common_lock
>> 0.45 ± 15% -0.1 0.34 ± 2% perf-profile.children.cycles-pp.btrfs_alloc_data_chunk_ondemand
>> 0.47 ± 16% -0.1 0.36 ± 4% perf-profile.children.cycles-pp.btrfs_check_data_free_space
>> 0.91 ± 4% -0.1 0.81 ± 3% perf-profile.children.cycles-pp.clear_extent_bit
>> 1.07 ± 5% -0.1 0.97 perf-profile.children.cycles-pp.__set_extent_bit
>> 0.77 ± 6% -0.1 0.69 ± 3% perf-profile.children.cycles-pp.btrfs_clear_bit_hook
>> 0.17 ± 20% -0.1 0.08 ± 10% perf-profile.children.cycles-pp.queued_write_lock_slowpath
>> 0.16 ± 22% -0.1 0.08 ± 24% perf-profile.children.cycles-pp.btrfs_lookup_inode
>> 0.21 ± 17% -0.1 0.14 ± 19% perf-profile.children.cycles-pp.__btrfs_update_delayed_inode
>> 0.26 ± 12% -0.1 0.18 ± 13% perf-profile.children.cycles-pp.btrfs_async_run_delayed_root
>> 0.52 ± 5% -0.1 0.45 perf-profile.children.cycles-pp.set_extent_bit
>> 0.45 ± 5% -0.1 0.40 ± 3% perf-profile.children.cycles-pp.alloc_extent_state
>> 0.11 ± 17% -0.1 0.06 ± 11% perf-profile.children.cycles-pp.btrfs_clear_lock_blocking_rw
>> 0.28 ± 9% -0.0 0.23 ± 3% perf-profile.children.cycles-pp.btrfs_drop_pages
>> 0.07 -0.0 0.03 ±100% perf-profile.children.cycles-pp.btrfs_set_lock_blocking_rw
>> 0.39 ± 3% -0.0 0.34 ± 3% perf-profile.children.cycles-pp.get_alloc_profile
>> 0.33 ± 7% -0.0 0.29 perf-profile.children.cycles-pp.btrfs_set_extent_delalloc
>> 0.38 ± 2% -0.0 0.35 ± 4% perf-profile.children.cycles-pp.__set_page_dirty_nobuffers
>> 0.49 ± 3% -0.0 0.46 ± 3% perf-profile.children.cycles-pp.pagecache_get_page
>> 0.18 ± 4% -0.0 0.15 ± 2% perf-profile.children.cycles-pp.truncate_inode_pages_range
>> 0.08 ± 5% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.btrfs_set_path_blocking
>> 0.08 ± 6% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.truncate_cleanup_page
>> 0.80 ± 4% +0.2 0.95 ± 2% perf-profile.children.cycles-pp.can_overcommit
>> 96.84 +0.5 97.37 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>> 96.80 +0.5 97.35 perf-profile.children.cycles-pp.do_syscall_64
>> 43.34 +0.8 44.17 perf-profile.children.cycles-pp.btrfs_inode_rsv_release
>> 43.49 +0.8 44.32 perf-profile.children.cycles-pp.block_rsv_release_bytes
>> 95.32 +0.9 96.26 perf-profile.children.cycles-pp.ksys_write
>> 95.26 +0.9 96.20 perf-profile.children.cycles-pp.vfs_write
>> 94.91 +1.0 95.88 perf-profile.children.cycles-pp.__vfs_write
>> 94.84 +1.0 95.81 perf-profile.children.cycles-pp.btrfs_file_write_iter
>> 94.55 +1.0 95.55 perf-profile.children.cycles-pp.__btrfs_buffered_write
>> 86.68 +1.0 87.70 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>> 44.08 +1.2 45.31 perf-profile.children.cycles-pp.btrfs_delalloc_reserve_metadata
>> 43.49 +1.3 44.77 perf-profile.children.cycles-pp.reserve_metadata_bytes
>> 87.59 +1.8 89.38 perf-profile.children.cycles-pp._raw_spin_lock
>> 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.self.cycles-pp.intel_idle
>> 0.33 ± 6% -0.1 0.18 ± 6% perf-profile.self.cycles-pp.get_alloc_profile
>> 0.27 ± 8% -0.0 0.22 ± 4% perf-profile.self.cycles-pp.btrfs_drop_pages
>> 0.07 -0.0 0.03 ±100% perf-profile.self.cycles-pp.btrfs_set_lock_blocking_rw
>> 0.14 ± 5% -0.0 0.12 ± 6% perf-profile.self.cycles-pp.clear_page_dirty_for_io
>> 0.09 ± 5% -0.0 0.07 ± 10% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
>> 0.17 ± 4% +0.1 0.23 ± 3% perf-profile.self.cycles-pp.reserve_metadata_bytes
>> 0.31 ± 7% +0.1 0.45 ± 2% perf-profile.self.cycles-pp.can_overcommit
>> 86.35 +1.0 87.39 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>>
>>
>>
>> aim7.jobs-per-min
>>
>> [ASCII plot, y-axis 24000 to 29000: bisect-good samples lie at roughly 27500-28500, bisect-bad samples at roughly 24500-25500]
>>
>>
>> [*] bisect-good sample
>> [O] bisect-bad sample
>>
>>
>> Disclaimer:
>> Results have been estimated based on internal Intel analysis and are provided
>> for informational purposes only. Any difference in system hardware or software
>> design or configuration may affect actual performance.
>>
>>
>> Thanks,
>> Xiaolong
>>
>>
>>
>> _______________________________________________
>> LKP mailing list
>> [email protected]
>> https://lists.01.org/mailman/listinfo/lkp
>>

2018-12-18 18:58:29

by Jens Axboe

[permalink] [raw]
Subject: Re: [LKP] [lkp-robot] [brd] 316ba5736c: aim7.jobs-per-min -11.2% regression

On 12/18/18 2:11 AM, kemi wrote:
> Hi, All
> Do we have a special reason to keep this patch (316ba5736c9: brd: Mark as non-rotational),
> which leads to a performance regression when BRD is used as a disk on btrfs?

I really suspect that this is a btrfs issue, as this is just flagging
what is pretty obvious, that a ramdisk is NOT a rotational drive.
So whatever btrfs is doing with that information is causing it to
run slower - this really doesn't make any sense, but there we are.
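
For anyone reproducing this, the flag in question is exposed by the block layer in sysfs as queue/rotational. The following is a minimal sketch for checking it, assuming the ramdisk shows up as "ram0" (the device name and the is_rotational helper are illustrative, not part of the report):

```python
from pathlib import Path


def is_rotational(dev: str, sysfs_root: str = "/sys/block") -> bool:
    """Return True if the block layer reports `dev` as a rotational device."""
    # The sysfs attribute holds "1" for rotational and "0" for non-rotational.
    flag = Path(sysfs_root, dev, "queue", "rotational").read_text().strip()
    return flag == "1"


if __name__ == "__main__":
    dev = "ram0"  # assumed brd device name; adjust for the system under test
    try:
        kind = "rotational" if is_rotational(dev) else "non-rotational"
        print(f"{dev}: {kind}")
    except OSError as exc:
        print(f"could not read the rotational flag for {dev}: {exc}")
```

On a kernel with 316ba5736c applied, this would be expected to report the brd device as non-rotational; on the parent commit it should report rotational.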

CC'ing Chris, leaving the report below.

> On 2018/7/10 1:27 PM, kemi wrote:
>> Hi, SeongJae
>> Do you have any input on this regression? Thanks.
>>
>> On 2018-06-04 13:52, kernel test robot wrote:
>>>
>>> Greeting,
>>>
>>> FYI, we noticed a -11.2% regression of aim7.jobs-per-min due to commit:
>>>
>>>
>>> commit: 316ba5736c9caa5dbcd84085989862d2df57431d ("brd: Mark as non-rotational")
>>> https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-4.18/block
>>>
>>> in testcase: aim7
>>> on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
>>> with following parameters:
>>>
>>> disk: 1BRD_48G
>>> fs: btrfs
>>> test: disk_rw
>>> load: 1500
>>> cpufreq_governor: performance
>>>
>>> test-description: AIM7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of multiuser system.
>>> test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
>>>
>>>
>>>
>>> Details are as below:
>>> -------------------------------------------------------------------------------------------------->
>>>
>>> =========================================================================================
>>> compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
>>> gcc-7/performance/1BRD_48G/btrfs/x86_64-rhel-7.2/1500/debian-x86_64-2016-08-31.cgz/lkp-ivb-ep01/disk_rw/aim7
>>>
>>> commit:
>>> 522a777566 ("block: consolidate struct request timestamp fields")
>>> 316ba5736c ("brd: Mark as non-rotational")
>>>
>>> 522a777566f56696 316ba5736c9caa5dbcd8408598
>>> ---------------- --------------------------
>>> %stddev %change %stddev
>>> \ | \
>>> 28321 -11.2% 25147 aim7.jobs-per-min
>>> 318.19 +12.6% 358.23 aim7.time.elapsed_time
>>> 318.19 +12.6% 358.23 aim7.time.elapsed_time.max
>>> 1437526 ± 2% +14.6% 1646849 ± 2% aim7.time.involuntary_context_switches
>>> 11986 +14.2% 13691 aim7.time.system_time
>>> 73.06 ± 2% -3.6% 70.43 aim7.time.user_time
>>> 2449470 ± 2% -25.0% 1837521 ± 4% aim7.time.voluntary_context_switches
>>> 20.25 ± 58% +1681.5% 360.75 ±109% numa-meminfo.node1.Mlocked
>>> 456062 -16.3% 381859 softirqs.SCHED
>>> 9015 ± 7% -21.3% 7098 ± 22% meminfo.CmaFree
>>> 47.50 ± 58% +1355.8% 691.50 ± 92% meminfo.Mlocked
>>> 5.24 ± 3% -1.2 3.99 ± 2% mpstat.cpu.idle%
>>> 0.61 ± 2% -0.1 0.52 ± 2% mpstat.cpu.usr%
>>> 16627 +12.8% 18762 ± 4% slabinfo.Acpi-State.active_objs
>>> 16627 +12.9% 18775 ± 4% slabinfo.Acpi-State.num_objs
>>> 57.00 ± 2% +17.5% 67.00 vmstat.procs.r
>>> 20936 -24.8% 15752 ± 2% vmstat.system.cs
>>> 45474 -1.7% 44681 vmstat.system.in
>>> 6.50 ± 59% +1157.7% 81.75 ± 75% numa-vmstat.node0.nr_mlock
>>> 242870 ± 3% +13.2% 274913 ± 7% numa-vmstat.node0.nr_written
>>> 2278 ± 7% -22.6% 1763 ± 21% numa-vmstat.node1.nr_free_cma
>>> 4.75 ± 58% +1789.5% 89.75 ±109% numa-vmstat.node1.nr_mlock
>>> 88018135 ± 3% -48.9% 44980457 ± 7% cpuidle.C1.time
>>> 1398288 ± 3% -51.1% 683493 ± 9% cpuidle.C1.usage
>>> 3499814 ± 2% -38.5% 2153158 ± 5% cpuidle.C1E.time
>>> 52722 ± 4% -45.6% 28692 ± 6% cpuidle.C1E.usage
>>> 9865857 ± 3% -40.1% 5905155 ± 5% cpuidle.C3.time
>>> 69656 ± 2% -42.6% 39990 ± 5% cpuidle.C3.usage
>>> 590856 ± 2% -12.3% 517910 cpuidle.C6.usage
>>> 46160 ± 7% -53.7% 21372 ± 11% cpuidle.POLL.time
>>> 1716 ± 7% -46.6% 916.25 ± 14% cpuidle.POLL.usage
>>> 197656 +4.1% 205732 proc-vmstat.nr_active_file
>>> 191867 +4.1% 199647 proc-vmstat.nr_dirty
>>> 509282 +1.6% 517318 proc-vmstat.nr_file_pages
>>> 2282 ± 8% -24.4% 1725 ± 22% proc-vmstat.nr_free_cma
>>> 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_inactive_file
>>> 11.50 ± 58% +1397.8% 172.25 ± 93% proc-vmstat.nr_mlock
>>> 970355 ± 4% +14.6% 1111549 ± 8% proc-vmstat.nr_written
>>> 197984 +4.1% 206034 proc-vmstat.nr_zone_active_file
>>> 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_zone_inactive_file
>>> 192282 +4.1% 200126 proc-vmstat.nr_zone_write_pending
>>> 7901465 ± 3% -14.0% 6795016 ± 16% proc-vmstat.pgalloc_movable
>>> 886101 +10.2% 976329 proc-vmstat.pgfault
>>> 2.169e+12 +15.2% 2.497e+12 perf-stat.branch-instructions
>>> 0.41 -0.1 0.35 perf-stat.branch-miss-rate%
>>> 31.19 ± 2% +1.6 32.82 perf-stat.cache-miss-rate%
>>> 9.116e+09 +8.3% 9.869e+09 perf-stat.cache-misses
>>> 2.924e+10 +2.9% 3.008e+10 ± 2% perf-stat.cache-references
>>> 6712739 ± 2% -15.4% 5678643 ± 2% perf-stat.context-switches
>>> 4.02 +2.7% 4.13 perf-stat.cpi
>>> 3.761e+13 +17.3% 4.413e+13 perf-stat.cpu-cycles
>>> 606958 -13.7% 523758 ± 2% perf-stat.cpu-migrations
>>> 2.476e+12 +13.4% 2.809e+12 perf-stat.dTLB-loads
>>> 0.18 ± 2% -0.0 0.16 ± 9% perf-stat.dTLB-store-miss-rate%
>>> 1.079e+09 ± 2% -9.6% 9.755e+08 ± 9% perf-stat.dTLB-store-misses
>>> 5.933e+11 +1.6% 6.029e+11 perf-stat.dTLB-stores
>>> 9.349e+12 +14.2% 1.068e+13 perf-stat.instructions
>>> 11247 ± 11% +19.8% 13477 ± 9% perf-stat.instructions-per-iTLB-miss
>>> 0.25 -2.6% 0.24 perf-stat.ipc
>>> 865561 +10.3% 954350 perf-stat.minor-faults
>>> 2.901e+09 ± 3% +9.8% 3.186e+09 ± 3% perf-stat.node-load-misses
>>> 3.682e+09 ± 3% +11.0% 4.088e+09 ± 3% perf-stat.node-loads
>>> 3.778e+09 +4.8% 3.959e+09 ± 2% perf-stat.node-store-misses
>>> 5.079e+09 +6.4% 5.402e+09 perf-stat.node-stores
>>> 865565 +10.3% 954352 perf-stat.page-faults
>>> 51.75 ± 5% -12.5% 45.30 ± 10% sched_debug.cfs_rq:/.load_avg.avg
>>> 316.35 ± 3% +17.2% 370.81 ± 8% sched_debug.cfs_rq:/.util_est_enqueued.stddev
>>> 15294 ± 30% +234.9% 51219 ± 76% sched_debug.cpu.avg_idle.min
>>> 299443 ± 3% -7.3% 277566 ± 5% sched_debug.cpu.avg_idle.stddev
>>> 1182 ± 19% -26.3% 872.02 ± 13% sched_debug.cpu.nr_load_updates.stddev
>>> 1.22 ± 8% +21.7% 1.48 ± 6% sched_debug.cpu.nr_running.avg
>>> 2.75 ± 10% +26.2% 3.47 ± 6% sched_debug.cpu.nr_running.max
>>> 0.58 ± 7% +24.2% 0.73 ± 6% sched_debug.cpu.nr_running.stddev
>>> 77148 -20.0% 61702 ± 7% sched_debug.cpu.nr_switches.avg
>>> 70024 -24.8% 52647 ± 8% sched_debug.cpu.nr_switches.min
>>> 6662 ± 6% +61.9% 10789 ± 24% sched_debug.cpu.nr_switches.stddev
>>> 80.45 ± 18% -19.1% 65.05 ± 6% sched_debug.cpu.nr_uninterruptible.stddev
>>> 76819 -19.3% 62008 ± 8% sched_debug.cpu.sched_count.avg
>>> 70616 -23.5% 53996 ± 8% sched_debug.cpu.sched_count.min
>>> 5494 ± 9% +85.3% 10179 ± 26% sched_debug.cpu.sched_count.stddev
>>> 16936 -52.9% 7975 ± 9% sched_debug.cpu.sched_goidle.avg
>>> 19281 -49.9% 9666 ± 7% sched_debug.cpu.sched_goidle.max
>>> 15417 -54.8% 6962 ± 10% sched_debug.cpu.sched_goidle.min
>>> 875.00 ± 6% -35.0% 569.09 ± 13% sched_debug.cpu.sched_goidle.stddev
>>> 40332 -23.5% 30851 ± 7% sched_debug.cpu.ttwu_count.avg
>>> 35074 -26.3% 25833 ± 6% sched_debug.cpu.ttwu_count.min
>>> 3239 ± 8% +67.4% 5422 ± 28% sched_debug.cpu.ttwu_count.stddev
>>> 5232 +27.4% 6665 ± 13% sched_debug.cpu.ttwu_local.avg
>>> 15877 ± 12% +77.5% 28184 ± 27% sched_debug.cpu.ttwu_local.max
>>> 2530 ± 10% +95.9% 4956 ± 27% sched_debug.cpu.ttwu_local.stddev
>>> 2.52 ± 7% -0.6 1.95 ± 3% perf-profile.calltrace.cycles-pp.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
>>> 1.48 ± 12% -0.5 1.01 ± 4% perf-profile.calltrace.cycles-pp.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
>>> 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_search_slot.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write
>>> 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter
>>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.__dentry_kill.dentry_kill.dput.__fput.task_work_run
>>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dentry_kill.dput.__fput
>>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.dentry_kill.dput.__fput.task_work_run.exit_to_usermode_loop
>>> 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.btrfs_evict_inode.evict.__dentry_kill.dentry_kill.dput
>>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.dput.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64
>>> 1.69 -0.1 1.54 ± 2% perf-profile.calltrace.cycles-pp.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
>>> 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter
>>> 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
>>> 0.71 ± 6% -0.1 0.61 ± 2% perf-profile.calltrace.cycles-pp.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write
>>> 0.69 ± 6% -0.1 0.60 ± 2% perf-profile.calltrace.cycles-pp.btrfs_clear_bit_hook.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need
>>> 96.77 +0.6 97.33 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
>>> 0.00 +0.6 0.56 ± 3% perf-profile.calltrace.cycles-pp.can_overcommit.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
>>> 96.72 +0.6 97.29 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 43.13 +0.8 43.91 perf-profile.calltrace.cycles-pp.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
>>> 42.37 +0.8 43.16 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write
>>> 43.11 +0.8 43.89 perf-profile.calltrace.cycles-pp.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
>>> 42.96 +0.8 43.77 perf-profile.calltrace.cycles-pp._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter
>>> 95.28 +0.9 96.23 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 95.22 +1.0 96.18 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 94.88 +1.0 95.85 perf-profile.calltrace.cycles-pp.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>> 94.83 +1.0 95.80 perf-profile.calltrace.cycles-pp.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write.do_syscall_64
>>> 94.51 +1.0 95.50 perf-profile.calltrace.cycles-pp.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write
>>> 42.44 +1.1 43.52 perf-profile.calltrace.cycles-pp._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
>>> 42.09 +1.1 43.18 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write
>>> 44.07 +1.2 45.29 perf-profile.calltrace.cycles-pp.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
>>> 43.42 +1.3 44.69 perf-profile.calltrace.cycles-pp.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
>>> 2.06 ± 18% -0.9 1.21 ± 6% perf-profile.children.cycles-pp.btrfs_search_slot
>>> 2.54 ± 7% -0.6 1.96 ± 3% perf-profile.children.cycles-pp.btrfs_dirty_pages
>>> 1.05 ± 24% -0.5 0.52 ± 9% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>>> 1.50 ± 12% -0.5 1.03 ± 4% perf-profile.children.cycles-pp.btrfs_get_extent
>>> 1.22 ± 15% -0.4 0.79 ± 8% perf-profile.children.cycles-pp.btrfs_lookup_file_extent
>>> 0.81 ± 5% -0.4 0.41 ± 6% perf-profile.children.cycles-pp.btrfs_calc_reclaim_metadata_size
>>> 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_lock_root_node
>>> 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_tree_lock
>>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.__dentry_kill
>>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.evict
>>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.dentry_kill
>>> 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.btrfs_evict_inode
>>> 0.91 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.exit_to_usermode_loop
>>> 0.52 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.do_idle
>>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.task_work_run
>>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.__fput
>>> 0.90 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.dput
>>> 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.secondary_startup_64
>>> 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.cpu_startup_entry
>>> 0.50 ± 21% -0.3 0.17 ± 16% perf-profile.children.cycles-pp.start_secondary
>>> 0.47 ± 20% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.cpuidle_enter_state
>>> 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.intel_idle
>>> 0.61 ± 20% -0.3 0.36 ± 11% perf-profile.children.cycles-pp.btrfs_tree_read_lock
>>> 0.47 ± 26% -0.3 0.21 ± 10% perf-profile.children.cycles-pp.prepare_to_wait_event
>>> 0.64 ± 18% -0.2 0.39 ± 9% perf-profile.children.cycles-pp.btrfs_read_lock_root_node
>>> 0.40 ± 22% -0.2 0.21 ± 5% perf-profile.children.cycles-pp.btrfs_clear_path_blocking
>>> 0.38 ± 23% -0.2 0.19 ± 13% perf-profile.children.cycles-pp.finish_wait
>>> 1.51 ± 3% -0.2 1.35 ± 2% perf-profile.children.cycles-pp.__clear_extent_bit
>>> 1.71 -0.1 1.56 ± 2% perf-profile.children.cycles-pp.lock_and_cleanup_extent_if_need
>>> 0.29 ± 25% -0.1 0.15 ± 10% perf-profile.children.cycles-pp.btrfs_orphan_del
>>> 0.27 ± 27% -0.1 0.12 ± 8% perf-profile.children.cycles-pp.btrfs_del_orphan_item
>>> 0.33 ± 18% -0.1 0.19 ± 9% perf-profile.children.cycles-pp.queued_read_lock_slowpath
>>> 0.33 ± 19% -0.1 0.20 ± 4% perf-profile.children.cycles-pp.__wake_up_common_lock
>>> 0.45 ± 15% -0.1 0.34 ± 2% perf-profile.children.cycles-pp.btrfs_alloc_data_chunk_ondemand
>>> 0.47 ± 16% -0.1 0.36 ± 4% perf-profile.children.cycles-pp.btrfs_check_data_free_space
>>> 0.91 ± 4% -0.1 0.81 ± 3% perf-profile.children.cycles-pp.clear_extent_bit
>>> 1.07 ± 5% -0.1 0.97 perf-profile.children.cycles-pp.__set_extent_bit
>>> 0.77 ± 6% -0.1 0.69 ± 3% perf-profile.children.cycles-pp.btrfs_clear_bit_hook
>>> 0.17 ± 20% -0.1 0.08 ± 10% perf-profile.children.cycles-pp.queued_write_lock_slowpath
>>> 0.16 ± 22% -0.1 0.08 ± 24% perf-profile.children.cycles-pp.btrfs_lookup_inode
>>> 0.21 ± 17% -0.1 0.14 ± 19% perf-profile.children.cycles-pp.__btrfs_update_delayed_inode
>>> 0.26 ± 12% -0.1 0.18 ± 13% perf-profile.children.cycles-pp.btrfs_async_run_delayed_root
>>> 0.52 ± 5% -0.1 0.45 perf-profile.children.cycles-pp.set_extent_bit
>>> 0.45 ± 5% -0.1 0.40 ± 3% perf-profile.children.cycles-pp.alloc_extent_state
>>> 0.11 ± 17% -0.1 0.06 ± 11% perf-profile.children.cycles-pp.btrfs_clear_lock_blocking_rw
>>> 0.28 ± 9% -0.0 0.23 ± 3% perf-profile.children.cycles-pp.btrfs_drop_pages
>>> 0.07 -0.0 0.03 ±100% perf-profile.children.cycles-pp.btrfs_set_lock_blocking_rw
>>> 0.39 ± 3% -0.0 0.34 ± 3% perf-profile.children.cycles-pp.get_alloc_profile
>>> 0.33 ± 7% -0.0 0.29 perf-profile.children.cycles-pp.btrfs_set_extent_delalloc
>>> 0.38 ± 2% -0.0 0.35 ± 4% perf-profile.children.cycles-pp.__set_page_dirty_nobuffers
>>> 0.49 ± 3% -0.0 0.46 ± 3% perf-profile.children.cycles-pp.pagecache_get_page
>>> 0.18 ± 4% -0.0 0.15 ± 2% perf-profile.children.cycles-pp.truncate_inode_pages_range
>>> 0.08 ± 5% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.btrfs_set_path_blocking
>>> 0.08 ± 6% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.truncate_cleanup_page
>>> 0.80 ± 4% +0.2 0.95 ± 2% perf-profile.children.cycles-pp.can_overcommit
>>> 96.84 +0.5 97.37 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>>> 96.80 +0.5 97.35 perf-profile.children.cycles-pp.do_syscall_64
>>> 43.34 +0.8 44.17 perf-profile.children.cycles-pp.btrfs_inode_rsv_release
>>> 43.49 +0.8 44.32 perf-profile.children.cycles-pp.block_rsv_release_bytes
>>> 95.32 +0.9 96.26 perf-profile.children.cycles-pp.ksys_write
>>> 95.26 +0.9 96.20 perf-profile.children.cycles-pp.vfs_write
>>> 94.91 +1.0 95.88 perf-profile.children.cycles-pp.__vfs_write
>>> 94.84 +1.0 95.81 perf-profile.children.cycles-pp.btrfs_file_write_iter
>>> 94.55 +1.0 95.55 perf-profile.children.cycles-pp.__btrfs_buffered_write
>>> 86.68 +1.0 87.70 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>>> 44.08 +1.2 45.31 perf-profile.children.cycles-pp.btrfs_delalloc_reserve_metadata
>>> 43.49 +1.3 44.77 perf-profile.children.cycles-pp.reserve_metadata_bytes
>>> 87.59 +1.8 89.38 perf-profile.children.cycles-pp._raw_spin_lock
>>> 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.self.cycles-pp.intel_idle
>>> 0.33 ± 6% -0.1 0.18 ± 6% perf-profile.self.cycles-pp.get_alloc_profile
>>> 0.27 ± 8% -0.0 0.22 ± 4% perf-profile.self.cycles-pp.btrfs_drop_pages
>>> 0.07 -0.0 0.03 ±100% perf-profile.self.cycles-pp.btrfs_set_lock_blocking_rw
>>> 0.14 ± 5% -0.0 0.12 ± 6% perf-profile.self.cycles-pp.clear_page_dirty_for_io
>>> 0.09 ± 5% -0.0 0.07 ± 10% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
>>> 0.17 ± 4% +0.1 0.23 ± 3% perf-profile.self.cycles-pp.reserve_metadata_bytes
>>> 0.31 ± 7% +0.1 0.45 ± 2% perf-profile.self.cycles-pp.can_overcommit
>>> 86.35 +1.0 87.39 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>>>
>>>
>>>
>>> aim7.jobs-per-min
>>>
>>> [ASCII plot, y-axis 24000 to 29000: bisect-good samples lie at roughly 27500-28500, bisect-bad samples at roughly 24500-25500]
>>>
>>>
>>> [*] bisect-good sample
>>> [O] bisect-bad sample
>>>
>>>
>>> Disclaimer:
>>> Results have been estimated based on internal Intel analysis and are provided
>>> for informational purposes only. Any difference in system hardware or software
>>> design or configuration may affect actual performance.
>>>
>>>
>>> Thanks,
>>> Xiaolong
>>>
>>>
>>>
>>> _______________________________________________
>>> LKP mailing list
>>> [email protected]
>>> https://lists.01.org/mailman/listinfo/lkp
>>>


--
Jens Axboe


2018-12-19 23:37:56

by Chris Mason

[permalink] [raw]
Subject: Re: [LKP] [lkp-robot] [brd] 316ba5736c: aim7.jobs-per-min -11.2% regression

On 18 Dec 2018, at 13:57, Jens Axboe wrote:

> On 12/18/18 2:11 AM, kemi wrote:
>> Hi, All
>> Do we have a special reason to keep this patch (316ba5736c9: brd: Mark
>> as non-rotational),
>> which leads to a performance regression when BRD is used as a disk on
>> btrfs?
>
> I really suspect that this is a btrfs issue, as this is just flagging
> what is pretty obvious, that a ramdisk is NOT a rotational drive.
> So whatever btrfs is doing with that information is causing it to
> run slower - this really doesn't make any sense, but there we are.
>
> CC'ing Chris, leaving the report below.

Btrfs is changing the allocator decisions slightly for an SSD,
especially the cluster size for metadata, which should show up as more
system time spent in the btrfs allocator, but I'm not seeing that below.
It also changes how quickly btrfs dispatches synchronous IO.

But, some parts of the differential don't quite make sense to me:

>>>> 47.50 ± 58% +1355.8% 691.50 ± 92% meminfo.Mlocked

Are these changes expected?

-chris

>
>> On 2018/7/10 1:27 PM, kemi wrote:
>>> Hi, SeongJae
>>> Do you have any input for this regression? thanks
>>>
>>> On 2018-06-04 13:52, kernel test robot wrote:
>>>>
>>>> Greeting,
>>>>
>>>> FYI, we noticed a -11.2% regression of aim7.jobs-per-min due to
>>>> commit:
>>>>
>>>>
>>>> commit: 316ba5736c9caa5dbcd84085989862d2df57431d ("brd: Mark as non-rotational")
>>>> https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-4.18/block
>>>>
>>>> in testcase: aim7
>>>> on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @
>>>> 3.00GHz with 384G memory
>>>> with following parameters:
>>>>
>>>> disk: 1BRD_48G
>>>> fs: btrfs
>>>> test: disk_rw
>>>> load: 1500
>>>> cpufreq_governor: performance
>>>>
>>>> test-description: AIM7 is a traditional UNIX system level benchmark
>>>> suite which is used to test and measure the performance of
>>>> multiuser system.
>>>> test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
>>>>
>>>>
>>>>
>>>> Details are as below:
>>>> -------------------------------------------------------------------------------------------------->
>>>>
>>>> =========================================================================================
>>>> compiler/cpufreq_governor/disk/fs/kconfig/load/rootfs/tbox_group/test/testcase:
>>>> gcc-7/performance/1BRD_48G/btrfs/x86_64-rhel-7.2/1500/debian-x86_64-2016-08-31.cgz/lkp-ivb-ep01/disk_rw/aim7
>>>>
>>>> commit:
>>>> 522a777566 ("block: consolidate struct request timestamp fields")
>>>> 316ba5736c ("brd: Mark as non-rotational")
>>>>
>>>> 522a777566f56696 316ba5736c9caa5dbcd8408598
>>>> ---------------- --------------------------
>>>> %stddev %change %stddev
>>>> \ | \
>>>> 28321 -11.2% 25147 aim7.jobs-per-min
>>>> 318.19 +12.6% 358.23 aim7.time.elapsed_time
>>>> 318.19 +12.6% 358.23 aim7.time.elapsed_time.max
>>>> 1437526 ± 2% +14.6% 1646849 ± 2% aim7.time.involuntary_context_switches
>>>> 11986 +14.2% 13691 aim7.time.system_time
>>>> 73.06 ± 2% -3.6% 70.43 aim7.time.user_time
>>>> 2449470 ± 2% -25.0% 1837521 ± 4% aim7.time.voluntary_context_switches
>>>> 20.25 ± 58% +1681.5% 360.75 ±109% numa-meminfo.node1.Mlocked
>>>> 456062 -16.3% 381859 softirqs.SCHED
>>>> 9015 ± 7% -21.3% 7098 ± 22% meminfo.CmaFree
>>>> 47.50 ± 58% +1355.8% 691.50 ± 92% meminfo.Mlocked
>>>> 5.24 ± 3% -1.2 3.99 ± 2% mpstat.cpu.idle%
>>>> 0.61 ± 2% -0.1 0.52 ± 2% mpstat.cpu.usr%
>>>> 16627 +12.8% 18762 ± 4% slabinfo.Acpi-State.active_objs
>>>> 16627 +12.9% 18775 ± 4% slabinfo.Acpi-State.num_objs
>>>> 57.00 ± 2% +17.5% 67.00 vmstat.procs.r
>>>> 20936 -24.8% 15752 ± 2% vmstat.system.cs
>>>> 45474 -1.7% 44681 vmstat.system.in
>>>> 6.50 ± 59% +1157.7% 81.75 ± 75% numa-vmstat.node0.nr_mlock
>>>> 242870 ± 3% +13.2% 274913 ± 7% numa-vmstat.node0.nr_written
>>>> 2278 ± 7% -22.6% 1763 ± 21% numa-vmstat.node1.nr_free_cma
>>>> 4.75 ± 58% +1789.5% 89.75 ±109% numa-vmstat.node1.nr_mlock
>>>> 88018135 ± 3% -48.9% 44980457 ± 7% cpuidle.C1.time
>>>> 1398288 ± 3% -51.1% 683493 ± 9% cpuidle.C1.usage
>>>> 3499814 ± 2% -38.5% 2153158 ± 5% cpuidle.C1E.time
>>>> 52722 ± 4% -45.6% 28692 ± 6% cpuidle.C1E.usage
>>>> 9865857 ± 3% -40.1% 5905155 ± 5% cpuidle.C3.time
>>>> 69656 ± 2% -42.6% 39990 ± 5% cpuidle.C3.usage
>>>> 590856 ± 2% -12.3% 517910 cpuidle.C6.usage
>>>> 46160 ± 7% -53.7% 21372 ± 11% cpuidle.POLL.time
>>>> 1716 ± 7% -46.6% 916.25 ± 14% cpuidle.POLL.usage
>>>> 197656 +4.1% 205732 proc-vmstat.nr_active_file
>>>> 191867 +4.1% 199647 proc-vmstat.nr_dirty
>>>> 509282 +1.6% 517318 proc-vmstat.nr_file_pages
>>>> 2282 ± 8% -24.4% 1725 ± 22% proc-vmstat.nr_free_cma
>>>> 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_inactive_file
>>>> 11.50 ± 58% +1397.8% 172.25 ± 93% proc-vmstat.nr_mlock
>>>> 970355 ± 4% +14.6% 1111549 ± 8% proc-vmstat.nr_written
>>>> 197984 +4.1% 206034 proc-vmstat.nr_zone_active_file
>>>> 357.50 +10.6% 395.25 ± 2% proc-vmstat.nr_zone_inactive_file
>>>> 192282 +4.1% 200126 proc-vmstat.nr_zone_write_pending
>>>> 7901465 ± 3% -14.0% 6795016 ± 16% proc-vmstat.pgalloc_movable
>>>> 886101 +10.2% 976329 proc-vmstat.pgfault
>>>> 2.169e+12 +15.2% 2.497e+12 perf-stat.branch-instructions
>>>> 0.41 -0.1 0.35 perf-stat.branch-miss-rate%
>>>> 31.19 ± 2% +1.6 32.82 perf-stat.cache-miss-rate%
>>>> 9.116e+09 +8.3% 9.869e+09 perf-stat.cache-misses
>>>> 2.924e+10 +2.9% 3.008e+10 ± 2% perf-stat.cache-references
>>>> 6712739 ± 2% -15.4% 5678643 ± 2% perf-stat.context-switches
>>>> 4.02 +2.7% 4.13 perf-stat.cpi
>>>> 3.761e+13 +17.3% 4.413e+13 perf-stat.cpu-cycles
>>>> 606958 -13.7% 523758 ± 2% perf-stat.cpu-migrations
>>>> 2.476e+12 +13.4% 2.809e+12 perf-stat.dTLB-loads
>>>> 0.18 ± 2% -0.0 0.16 ± 9% perf-stat.dTLB-store-miss-rate%
>>>> 1.079e+09 ± 2% -9.6% 9.755e+08 ± 9% perf-stat.dTLB-store-misses
>>>> 5.933e+11 +1.6% 6.029e+11 perf-stat.dTLB-stores
>>>> 9.349e+12 +14.2% 1.068e+13 perf-stat.instructions
>>>> 11247 ± 11% +19.8% 13477 ± 9% perf-stat.instructions-per-iTLB-miss
>>>> 0.25 -2.6% 0.24 perf-stat.ipc
>>>> 865561 +10.3% 954350 perf-stat.minor-faults
>>>> 2.901e+09 ± 3% +9.8% 3.186e+09 ± 3% perf-stat.node-load-misses
>>>> 3.682e+09 ± 3% +11.0% 4.088e+09 ± 3% perf-stat.node-loads
>>>> 3.778e+09 +4.8% 3.959e+09 ± 2% perf-stat.node-store-misses
>>>> 5.079e+09 +6.4% 5.402e+09 perf-stat.node-stores
>>>> 865565 +10.3% 954352 perf-stat.page-faults
>>>> 51.75 ± 5% -12.5% 45.30 ± 10% sched_debug.cfs_rq:/.load_avg.avg
>>>> 316.35 ± 3% +17.2% 370.81 ± 8% sched_debug.cfs_rq:/.util_est_enqueued.stddev
>>>> 15294 ± 30% +234.9% 51219 ± 76% sched_debug.cpu.avg_idle.min
>>>> 299443 ± 3% -7.3% 277566 ± 5% sched_debug.cpu.avg_idle.stddev
>>>> 1182 ± 19% -26.3% 872.02 ± 13% sched_debug.cpu.nr_load_updates.stddev
>>>> 1.22 ± 8% +21.7% 1.48 ± 6% sched_debug.cpu.nr_running.avg
>>>> 2.75 ± 10% +26.2% 3.47 ± 6% sched_debug.cpu.nr_running.max
>>>> 0.58 ± 7% +24.2% 0.73 ± 6% sched_debug.cpu.nr_running.stddev
>>>> 77148 -20.0% 61702 ± 7% sched_debug.cpu.nr_switches.avg
>>>> 70024 -24.8% 52647 ± 8% sched_debug.cpu.nr_switches.min
>>>> 6662 ± 6% +61.9% 10789 ± 24% sched_debug.cpu.nr_switches.stddev
>>>> 80.45 ± 18% -19.1% 65.05 ± 6% sched_debug.cpu.nr_uninterruptible.stddev
>>>> 76819 -19.3% 62008 ± 8% sched_debug.cpu.sched_count.avg
>>>> 70616 -23.5% 53996 ± 8% sched_debug.cpu.sched_count.min
>>>> 5494 ± 9% +85.3% 10179 ± 26% sched_debug.cpu.sched_count.stddev
>>>> 16936 -52.9% 7975 ± 9% sched_debug.cpu.sched_goidle.avg
>>>> 19281 -49.9% 9666 ± 7% sched_debug.cpu.sched_goidle.max
>>>> 15417 -54.8% 6962 ± 10% sched_debug.cpu.sched_goidle.min
>>>> 875.00 ± 6% -35.0% 569.09 ± 13% sched_debug.cpu.sched_goidle.stddev
>>>> 40332 -23.5% 30851 ± 7% sched_debug.cpu.ttwu_count.avg
>>>> 35074 -26.3% 25833 ± 6% sched_debug.cpu.ttwu_count.min
>>>> 3239 ± 8% +67.4% 5422 ± 28% sched_debug.cpu.ttwu_count.stddev
>>>> 5232 +27.4% 6665 ± 13% sched_debug.cpu.ttwu_local.avg
>>>> 15877 ± 12% +77.5% 28184 ± 27% sched_debug.cpu.ttwu_local.max
>>>> 2530 ± 10% +95.9% 4956 ± 27% sched_debug.cpu.ttwu_local.stddev
>>>> 2.52 ± 7% -0.6 1.95 ± 3% perf-profile.calltrace.cycles-pp.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
>>>> 1.48 ± 12% -0.5 1.01 ± 4% perf-profile.calltrace.cycles-pp.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
>>>> 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_search_slot.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write
>>>> 1.18 ± 16% -0.4 0.76 ± 7% perf-profile.calltrace.cycles-pp.btrfs_lookup_file_extent.btrfs_get_extent.btrfs_dirty_pages.__btrfs_buffered_write.btrfs_file_write_iter
>>>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.__dentry_kill.dentry_kill.dput.__fput.task_work_run
>>>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dentry_kill.dput.__fput
>>>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.dentry_kill.dput.__fput.task_work_run.exit_to_usermode_loop
>>>> 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.calltrace.cycles-pp.btrfs_evict_inode.evict.__dentry_kill.dentry_kill.dput
>>>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.calltrace.cycles-pp.dput.__fput.task_work_run.exit_to_usermode_loop.do_syscall_64
>>>> 1.69 -0.1 1.54 ± 2% perf-profile.calltrace.cycles-pp.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
>>>> 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter
>>>> 0.87 ± 4% -0.1 0.76 ± 2% perf-profile.calltrace.cycles-pp.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
>>>> 0.71 ± 6% -0.1 0.61 ± 2% perf-profile.calltrace.cycles-pp.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need.__btrfs_buffered_write
>>>> 0.69 ± 6% -0.1 0.60 ± 2% perf-profile.calltrace.cycles-pp.btrfs_clear_bit_hook.clear_state_bit.__clear_extent_bit.clear_extent_bit.lock_and_cleanup_extent_if_need
>>>> 96.77 +0.6 97.33 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
>>>> 0.00 +0.6 0.56 ± 3% perf-profile.calltrace.cycles-pp.can_overcommit.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
>>>> 96.72 +0.6 97.29 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>> 43.13 +0.8 43.91 perf-profile.calltrace.cycles-pp.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
>>>> 42.37 +0.8 43.16 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write
>>>> 43.11 +0.8 43.89 perf-profile.calltrace.cycles-pp.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
>>>> 42.96 +0.8 43.77 perf-profile.calltrace.cycles-pp._raw_spin_lock.block_rsv_release_bytes.btrfs_inode_rsv_release.__btrfs_buffered_write.btrfs_file_write_iter
>>>> 95.28 +0.9 96.23 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>> 95.22 +1.0 96.18 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>> 94.88 +1.0 95.85 perf-profile.calltrace.cycles-pp.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
>>>> 94.83 +1.0 95.80 perf-profile.calltrace.cycles-pp.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write.do_syscall_64
>>>> 94.51 +1.0 95.50 perf-profile.calltrace.cycles-pp.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write.ksys_write
>>>> 42.44 +1.1 43.52 perf-profile.calltrace.cycles-pp._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter
>>>> 42.09 +1.1 43.18 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write
>>>> 44.07 +1.2 45.29 perf-profile.calltrace.cycles-pp.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write.vfs_write
>>>> 43.42 +1.3 44.69 perf-profile.calltrace.cycles-pp.reserve_metadata_bytes.btrfs_delalloc_reserve_metadata.__btrfs_buffered_write.btrfs_file_write_iter.__vfs_write
>>>> 2.06 ± 18% -0.9 1.21 ± 6% perf-profile.children.cycles-pp.btrfs_search_slot
>>>> 2.54 ± 7% -0.6 1.96 ± 3% perf-profile.children.cycles-pp.btrfs_dirty_pages
>>>> 1.05 ± 24% -0.5 0.52 ± 9% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>>>> 1.50 ± 12% -0.5 1.03 ± 4% perf-profile.children.cycles-pp.btrfs_get_extent
>>>> 1.22 ± 15% -0.4 0.79 ± 8% perf-profile.children.cycles-pp.btrfs_lookup_file_extent
>>>> 0.81 ± 5% -0.4 0.41 ± 6% perf-profile.children.cycles-pp.btrfs_calc_reclaim_metadata_size
>>>> 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_lock_root_node
>>>> 0.74 ± 24% -0.4 0.35 ± 9% perf-profile.children.cycles-pp.btrfs_tree_lock
>>>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.__dentry_kill
>>>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.evict
>>>> 0.90 ± 17% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.dentry_kill
>>>> 0.90 ± 18% -0.3 0.56 ± 4% perf-profile.children.cycles-pp.btrfs_evict_inode
>>>> 0.91 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.exit_to_usermode_loop
>>>> 0.52 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.do_idle
>>>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.task_work_run
>>>> 0.90 ± 17% -0.3 0.57 ± 5% perf-profile.children.cycles-pp.__fput
>>>> 0.90 ± 18% -0.3 0.57 ± 4% perf-profile.children.cycles-pp.dput
>>>> 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.secondary_startup_64
>>>> 0.51 ± 20% -0.3 0.18 ± 14% perf-profile.children.cycles-pp.cpu_startup_entry
>>>> 0.50 ± 21% -0.3 0.17 ± 16% perf-profile.children.cycles-pp.start_secondary
>>>> 0.47 ± 20% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.cpuidle_enter_state
>>>> 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.children.cycles-pp.intel_idle
>>>> 0.61 ± 20% -0.3 0.36 ± 11% perf-profile.children.cycles-pp.btrfs_tree_read_lock
>>>> 0.47 ± 26% -0.3 0.21 ± 10% perf-profile.children.cycles-pp.prepare_to_wait_event
>>>> 0.64 ± 18% -0.2 0.39 ± 9% perf-profile.children.cycles-pp.btrfs_read_lock_root_node
>>>> 0.40 ± 22% -0.2 0.21 ± 5% perf-profile.children.cycles-pp.btrfs_clear_path_blocking
>>>> 0.38 ± 23% -0.2 0.19 ± 13% perf-profile.children.cycles-pp.finish_wait
>>>> 1.51 ± 3% -0.2 1.35 ± 2% perf-profile.children.cycles-pp.__clear_extent_bit
>>>> 1.71 -0.1 1.56 ± 2% perf-profile.children.cycles-pp.lock_and_cleanup_extent_if_need
>>>> 0.29 ± 25% -0.1 0.15 ± 10% perf-profile.children.cycles-pp.btrfs_orphan_del
>>>> 0.27 ± 27% -0.1 0.12 ± 8% perf-profile.children.cycles-pp.btrfs_del_orphan_item
>>>> 0.33 ± 18% -0.1 0.19 ± 9% perf-profile.children.cycles-pp.queued_read_lock_slowpath
>>>> 0.33 ± 19% -0.1 0.20 ± 4% perf-profile.children.cycles-pp.__wake_up_common_lock
>>>> 0.45 ± 15% -0.1 0.34 ± 2% perf-profile.children.cycles-pp.btrfs_alloc_data_chunk_ondemand
>>>> 0.47 ± 16% -0.1 0.36 ± 4% perf-profile.children.cycles-pp.btrfs_check_data_free_space
>>>> 0.91 ± 4% -0.1 0.81 ± 3% perf-profile.children.cycles-pp.clear_extent_bit
>>>> 1.07 ± 5% -0.1 0.97 perf-profile.children.cycles-pp.__set_extent_bit
>>>> 0.77 ± 6% -0.1 0.69 ± 3% perf-profile.children.cycles-pp.btrfs_clear_bit_hook
>>>> 0.17 ± 20% -0.1 0.08 ± 10% perf-profile.children.cycles-pp.queued_write_lock_slowpath
>>>> 0.16 ± 22% -0.1 0.08 ± 24% perf-profile.children.cycles-pp.btrfs_lookup_inode
>>>> 0.21 ± 17% -0.1 0.14 ± 19% perf-profile.children.cycles-pp.__btrfs_update_delayed_inode
>>>> 0.26 ± 12% -0.1 0.18 ± 13% perf-profile.children.cycles-pp.btrfs_async_run_delayed_root
>>>> 0.52 ± 5% -0.1 0.45 perf-profile.children.cycles-pp.set_extent_bit
>>>> 0.45 ± 5% -0.1 0.40 ± 3% perf-profile.children.cycles-pp.alloc_extent_state
>>>> 0.11 ± 17% -0.1 0.06 ± 11% perf-profile.children.cycles-pp.btrfs_clear_lock_blocking_rw
>>>> 0.28 ± 9% -0.0 0.23 ± 3% perf-profile.children.cycles-pp.btrfs_drop_pages
>>>> 0.07 -0.0 0.03 ±100% perf-profile.children.cycles-pp.btrfs_set_lock_blocking_rw
>>>> 0.39 ± 3% -0.0 0.34 ± 3% perf-profile.children.cycles-pp.get_alloc_profile
>>>> 0.33 ± 7% -0.0 0.29 perf-profile.children.cycles-pp.btrfs_set_extent_delalloc
>>>> 0.38 ± 2% -0.0 0.35 ± 4% perf-profile.children.cycles-pp.__set_page_dirty_nobuffers
>>>> 0.49 ± 3% -0.0 0.46 ± 3% perf-profile.children.cycles-pp.pagecache_get_page
>>>> 0.18 ± 4% -0.0 0.15 ± 2% perf-profile.children.cycles-pp.truncate_inode_pages_range
>>>> 0.08 ± 5% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.btrfs_set_path_blocking
>>>> 0.08 ± 6% -0.0 0.06 ± 6% perf-profile.children.cycles-pp.truncate_cleanup_page
>>>> 0.80 ± 4% +0.2 0.95 ± 2% perf-profile.children.cycles-pp.can_overcommit
>>>> 96.84 +0.5 97.37 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>>>> 96.80 +0.5 97.35 perf-profile.children.cycles-pp.do_syscall_64
>>>> 43.34 +0.8 44.17 perf-profile.children.cycles-pp.btrfs_inode_rsv_release
>>>> 43.49 +0.8 44.32 perf-profile.children.cycles-pp.block_rsv_release_bytes
>>>> 95.32 +0.9 96.26 perf-profile.children.cycles-pp.ksys_write
>>>> 95.26 +0.9 96.20 perf-profile.children.cycles-pp.vfs_write
>>>> 94.91 +1.0 95.88 perf-profile.children.cycles-pp.__vfs_write
>>>> 94.84 +1.0 95.81 perf-profile.children.cycles-pp.btrfs_file_write_iter
>>>> 94.55 +1.0 95.55 perf-profile.children.cycles-pp.__btrfs_buffered_write
>>>> 86.68 +1.0 87.70 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
>>>> 44.08 +1.2 45.31 perf-profile.children.cycles-pp.btrfs_delalloc_reserve_metadata
>>>> 43.49 +1.3 44.77 perf-profile.children.cycles-pp.reserve_metadata_bytes
>>>> 87.59 +1.8 89.38 perf-profile.children.cycles-pp._raw_spin_lock
>>>> 0.47 ± 19% -0.3 0.16 ± 13% perf-profile.self.cycles-pp.intel_idle
>>>> 0.33 ± 6% -0.1 0.18 ± 6% perf-profile.self.cycles-pp.get_alloc_profile
>>>> 0.27 ± 8% -0.0 0.22 ± 4% perf-profile.self.cycles-pp.btrfs_drop_pages
>>>> 0.07 -0.0 0.03 ±100% perf-profile.self.cycles-pp.btrfs_set_lock_blocking_rw
>>>> 0.14 ± 5% -0.0 0.12 ± 6% perf-profile.self.cycles-pp.clear_page_dirty_for_io
>>>> 0.09 ± 5% -0.0 0.07 ± 10% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
>>>> 0.17 ± 4% +0.1 0.23 ± 3% perf-profile.self.cycles-pp.reserve_metadata_bytes
>>>> 0.31 ± 7% +0.1 0.45 ± 2% perf-profile.self.cycles-pp.can_overcommit
>>>> 86.35 +1.0 87.39 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
>>>>
>>>>
>>>>
>>>> aim7.jobs-per-min
>>>>
>>>> 29000 +-+-----------------------------------------------------------------+
>>>> 28500 +-+ +.. + +..+.. +.. |
>>>> |..+ +.+..+.. : .. + .+.+..+..+.+.. .+..+.. + + + |
>>>> 28000 +-+ + .. : + +. + + + |
>>>> 27500 +-+ + + |
>>>> | |
>>>> 27000 +-+ |
>>>> 26500 +-+ |
>>>> 26000 +-+ |
>>>> | |
>>>> 25500 +-+ O O O O O |
>>>> 25000 +-+ O O O O O O O O O
>>>> | O O O O O O O O |
>>>> 24500 O-+O O O O |
>>>> 24000 +-+-----------------------------------------------------------------+
>>>>
>>>>
>>>> [*] bisect-good sample
>>>> [O] bisect-bad sample
>>>>
>>>>
>>>> Disclaimer:
>>>> Results have been estimated based on internal Intel analysis and are provided
>>>> for informational purposes only. Any difference in system hardware or software
>>>> design or configuration may affect actual performance.
>>>>
>>>>
>>>> Thanks,
>>>> Xiaolong
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> LKP mailing list
>>>> [email protected]
>>>> https://lists.01.org/mailman/listinfo/lkp
>>>>
>
>
> --
> Jens Axboe