2016-12-13 03:19:21

by kernel test robot

Subject: [ext4] 64e1c57fa4: 74.4% fio.write_bw_MBps improvement

Greetings,

FYI, we noticed a 74.4% improvement of fio.write_bw_MBps due to commit:


commit: 64e1c57fa4740ac0728afe173e5a025b0e94cd55 ("ext4: Use clean_bdev_aliases() instead of iteration")
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master

in testcase: fio-basic
on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory
with following parameters:

disk: 1SSD
fs: ext4
runtime: 300s
nr_task: 64
rw: randwrite
bs: 4k
ioengine: sync
test_size: 512g
cpufreq_governor: performance

test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio
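
For readers who want to see how the parameters above might map onto fio options, here is a minimal sketch. It is not the command that lkp-tests generates. The job name, the target directory /mnt/ext4-test, the per-job size of 8g (test_size divided by nr_task) and the use of --time_based are all assumptions made for illustration; the job.yaml attached to this email is the authoritative description.

```python
import shlex


def build_fio_command(params, directory, per_job_size):
    """Translate the report's job parameters into an fio argument list.

    `directory` and `per_job_size` are not stated in the report and are
    supplied by the caller as assumptions.
    """
    runtime_s = params["runtime"].rstrip("s")  # "300s" -> "300"
    return [
        "fio",
        "--name=randwrite-sketch",          # hypothetical job name
        f"--directory={directory}",
        f"--rw={params['rw']}",
        f"--bs={params['bs']}",
        f"--ioengine={params['ioengine']}",
        f"--numjobs={params['nr_task']}",
        f"--size={per_job_size}",
        f"--runtime={runtime_s}",
        "--time_based",                     # assumption: run for the full runtime
        "--group_reporting",
    ]


# Values copied from the parameter list in this report.
params = {
    "rw": "randwrite",
    "bs": "4k",
    "ioengine": "sync",
    "nr_task": "64",
    "runtime": "300s",
}

# Assumed: an ext4 file system mounted at /mnt/ext4-test, 512g / 64 = 8g per job.
cmd = build_fio_command(params, "/mnt/ext4-test", "8g")
print(shlex.join(cmd))
```

The sketch only prints the command line; nothing is executed, so it is safe to run on a machine without fio or a spare SSD.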



Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase:
4k/gcc-6/performance/1SSD/ext4/sync/x86_64-rhel-7.2/64/debian-x86_64-2016-08-31.cgz/300s/randwrite/lkp-bdw-ep2/512g/fio-basic
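
The two slash-separated lines above pair each test axis with its value by position. A small sketch of how they can be split into a lookup table follows; the function name parse_job_axes is made up for this example and is not part of lkp-tests.

```python
KEYS_LINE = (
    "bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/"
    "rootfs/runtime/rw/tbox_group/test_size/testcase:"
)
VALUES_LINE = (
    "4k/gcc-6/performance/1SSD/ext4/sync/x86_64-rhel-7.2/64/"
    "debian-x86_64-2016-08-31.cgz/300s/randwrite/lkp-bdw-ep2/512g/fio-basic"
)


def parse_job_axes(keys_line, values_line):
    """Pair the slash-separated axis names with their values by position."""
    keys = keys_line.rstrip(":").split("/")
    values = values_line.split("/")
    if len(keys) != len(values):
        raise ValueError(f"{len(keys)} keys but {len(values)} values")
    return dict(zip(keys, values))


axes = parse_job_axes(KEYS_LINE, VALUES_LINE)
for name, value in axes.items():
    print(f"{name:18} {value}")
```

This relies on none of the values containing a slash, which holds for the lines in this report.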

commit:
f734c89cc96e9b6f903865cd2656d9d8a7e160e7
64e1c57fa4740ac0728afe173e5a025b0e94cd55

Left columns: f734c89cc96e9b6f. Right columns: 64e1c57fa4740ac0728afe173e.
value ± %stddev | %change | value ± %stddev | metric
---------------------------------------------------------
0.01 ± 0% +2333.3% 0.24 ± 5% fio.latency_100ms%
14.77 ± 6% +162.7% 38.79 ± 5% fio.latency_10us%
21.57 ± 8% -46.0% 11.65 ± 26% fio.latency_20us%
1.38 ± 2% -16.0% 1.16 ± 1% fio.latency_250ms%
0.08 ± 24% -47.8% 0.04 ± 20% fio.latency_250us%
5.41 ± 7% +195.6% 15.98 ± 10% fio.latency_4us%
0.02 ± 0% -50.0% 0.01 ± 0% fio.latency_50ms%
54.51 ± 5% -43.9% 30.60 ± 7% fio.latency_50us%
52135170 ± 0% +74.4% 90914901 ± 0% fio.time.file_system_outputs
2739 ± 11% +87.6% 5140 ± 6% fio.time.involuntary_context_switches
53.00 ± 3% +15.1% 61.00 ± 8% fio.time.percent_of_cpu_this_job_got
144.52 ± 2% +15.2% 166.55 ± 8% fio.time.system_time
142340 ± 3% +45.3% 206879 ± 0% fio.time.voluntary_context_switches
84.80 ± 0% +74.4% 147.91 ± 0% fio.write_bw_MBps
207872 ± 0% -50.0% 103936 ± 0% fio.write_clat_99%_us
2943 ± 0% -42.7% 1687 ± 0% fio.write_clat_mean_us
31196 ± 30% -53.8% 14397 ± 0% fio.write_clat_stddev
21709 ± 0% +74.4% 37865 ± 0% fio.write_iops
830491 ± 1% -12.0% 730497 ± 0% softirqs.TIMER
4.66 ± 2% -16.6% 3.89 ± 3% turbostat.%Busy
129.33 ± 2% -19.3% 104.33 ± 4% turbostat.Avg_MHz
110.54 ± 2% -2.6% 107.65 ± 1% turbostat.PkgWatt
30478861 ± 3% +606.4% 2.153e+08 ± 4% cpuidle.C1-BDW.time
577544 ± 2% +580.0% 3927544 ± 5% cpuidle.C1-BDW.usage
1368467 ± 6% +100.0% 2737189 ± 17% cpuidle.C1E-BDW.usage
13946 ± 5% +417.0% 72102 ± 3% cpuidle.POLL.usage
3278646 ± 8% +58.3% 5189008 ± 23% numa-numastat.node0.local_node
3278656 ± 8% +58.3% 5189014 ± 23% numa-numastat.node0.numa_hit
3459247 ± 10% +69.1% 5850044 ± 19% numa-numastat.node1.local_node
3459257 ± 10% +69.1% 5850058 ± 19% numa-numastat.node1.numa_hit
52135170 ± 0% +74.4% 90914901 ± 0% time.file_system_outputs
2739 ± 11% +87.6% 5140 ± 6% time.involuntary_context_switches
53.00 ± 3% +15.1% 61.00 ± 8% time.percent_of_cpu_this_job_got
144.52 ± 2% +15.2% 166.55 ± 8% time.system_time
142340 ± 3% +45.3% 206879 ± 0% time.voluntary_context_switches
12214 ± 0% +531.2% 77105 ± 0% vmstat.io.bo
27565 ± 0% +158.1% 71138 ± 0% vmstat.memory.buff
28340902 ± 0% +33.6% 37866242 ± 0% vmstat.memory.cache
1.027e+08 ± 0% -9.3% 93134197 ± 0% vmstat.memory.free
1.00 ± 0% -100.0% 0.00 ± 0% vmstat.procs.r
5733 ± 1% +326.4% 24449 ± 1% vmstat.system.cs
93380 ± 0% +22.1% 114031 ± 1% vmstat.system.in
212875 ± 0% +20.3% 256106 ± 1% meminfo.Active
87077 ± 0% +49.9% 130489 ± 0% meminfo.Active(file)
27544 ± 0% +158.0% 71055 ± 0% meminfo.Buffers
26447733 ± 0% +34.4% 35553085 ± 0% meminfo.Cached
4276230 ± 11% +16.3% 4972555 ± 0% meminfo.DirectMap2M
26383188 ± 0% +34.5% 35485681 ± 0% meminfo.Inactive
26260236 ± 0% +34.7% 35362723 ± 0% meminfo.Inactive(file)
1.027e+08 ± 0% -9.3% 93155832 ± 0% meminfo.MemFree
1891893 ± 0% +21.1% 2290894 ± 0% meminfo.SReclaimable
2038176 ± 0% +19.6% 2438164 ± 0% meminfo.Slab
13194963 ± 0% +32.8% 17524523 ± 6% numa-meminfo.node0.FilePages
13174270 ± 0% +32.5% 17461209 ± 6% numa-meminfo.node0.Inactive
13129661 ± 0% +32.4% 17385339 ± 6% numa-meminfo.node0.Inactive(file)
14472238 ± 2% +31.4% 19020611 ± 8% numa-meminfo.node0.MemUsed
13281005 ± 0% +36.1% 18069781 ± 6% numa-meminfo.node1.FilePages
13209686 ± 0% +36.2% 17995539 ± 6% numa-meminfo.node1.Inactive
13131329 ± 0% +36.7% 17948449 ± 6% numa-meminfo.node1.Inactive(file)
51349907 ± 0% -9.8% 46337309 ± 3% numa-meminfo.node1.MemFree
14696506 ± 2% +34.0% 19697553 ± 7% numa-meminfo.node1.MemUsed
5185 ± 72% +58.6% 8222 ± 44% numa-meminfo.node1.PageTables
3222169 ± 0% +32.5% 4270967 ± 6% numa-vmstat.node0.nr_dirtied
3298679 ± 0% +32.8% 4381297 ± 6% numa-vmstat.node0.nr_file_pages
3282350 ± 0% +32.4% 4346524 ± 6% numa-vmstat.node0.nr_inactive_file
248707 ± 8% +506.8% 1509069 ± 2% numa-vmstat.node0.nr_written
3282350 ± 0% +32.4% 4346524 ± 6% numa-vmstat.node0.nr_zone_inactive_file
3214059 ± 0% +38.3% 4445618 ± 6% numa-vmstat.node1.nr_dirtied
3320171 ± 0% +36.1% 4517628 ± 6% numa-vmstat.node1.nr_file_pages
12837551 ± 0% -9.8% 11584103 ± 3% numa-vmstat.node1.nr_free_pages
3282762 ± 0% +36.7% 4487304 ± 6% numa-vmstat.node1.nr_inactive_file
1295 ± 73% +58.8% 2056 ± 44% numa-vmstat.node1.nr_page_table_pages
13.33 ± 89% +455.0% 74.00 ± 43% numa-vmstat.node1.nr_writeback
238168 ± 7% +569.3% 1594092 ± 2% numa-vmstat.node1.nr_written
3282762 ± 0% +36.7% 4487304 ± 6% numa-vmstat.node1.nr_zone_inactive_file
21769 ± 0% +49.8% 32603 ± 0% proc-vmstat.nr_active_file
6525385 ± 0% +74.4% 11383448 ± 0% proc-vmstat.nr_dirtied
6618876 ± 0% +34.5% 8900704 ± 0% proc-vmstat.nr_file_pages
25683902 ± 0% -9.3% 23294311 ± 0% proc-vmstat.nr_free_pages
6565120 ± 0% +34.6% 8835475 ± 0% proc-vmstat.nr_inactive_file
472977 ± 0% +21.1% 572545 ± 0% proc-vmstat.nr_slab_reclaimable
29.00 ± 90% +398.9% 144.67 ± 32% proc-vmstat.nr_writeback
912204 ± 0% +534.6% 5789065 ± 0% proc-vmstat.nr_written
21769 ± 0% +49.8% 32603 ± 0% proc-vmstat.nr_zone_active_file
6565120 ± 0% +34.6% 8835475 ± 0% proc-vmstat.nr_zone_inactive_file
6738831 ± 6% +63.9% 11042933 ± 3% proc-vmstat.numa_hit
6738811 ± 6% +63.9% 11042913 ± 3% proc-vmstat.numa_local
7711569 ± 0% +65.5% 12758902 ± 0% proc-vmstat.pgalloc_normal
3692150 ± 0% +530.5% 23278445 ± 0% proc-vmstat.pgpgout
6402990 ± 0% +35.7% 8691217 ± 0% slabinfo.buffer_head.active_objs
164178 ± 0% +35.7% 222851 ± 0% slabinfo.buffer_head.active_slabs
6402991 ± 0% +35.7% 8691217 ± 0% slabinfo.buffer_head.num_objs
164178 ± 0% +35.7% 222851 ± 0% slabinfo.buffer_head.num_slabs
250.33 ± 16% +105.7% 515.00 ± 57% slabinfo.dquot.active_objs
250.33 ± 16% +105.7% 515.00 ± 57% slabinfo.dquot.num_objs
219.67 ± 6% +161.9% 575.33 ± 15% slabinfo.ext4_allocation_context.active_objs
219.67 ± 6% +161.9% 575.33 ± 15% slabinfo.ext4_allocation_context.num_objs
633978 ± 1% +539.2% 4052598 ± 0% slabinfo.ext4_extent_status.active_objs
6215 ± 1% +539.3% 39731 ± 0% slabinfo.ext4_extent_status.active_slabs
633978 ± 1% +539.2% 4052598 ± 0% slabinfo.ext4_extent_status.num_objs
6215 ± 1% +539.3% 39731 ± 0% slabinfo.ext4_extent_status.num_slabs
436.00 ± 7% +176.2% 1204 ± 19% slabinfo.ext4_io_end.active_objs
436.00 ± 7% +176.2% 1204 ± 19% slabinfo.ext4_io_end.num_objs
4116 ± 2% +54.2% 6349 ± 8% slabinfo.jbd2_journal_head.active_objs
5095 ± 0% +58.3% 8066 ± 1% slabinfo.jbd2_journal_head.num_objs
1214 ± 0% +18.0% 1433 ± 3% slabinfo.kmalloc-4096.active_objs
1237 ± 0% +17.1% 1448 ± 3% slabinfo.kmalloc-4096.num_objs
5.94e+11 ± 0% -73.8% 1.554e+11 ± 7% perf-stat.branch-instructions
0.13 ± 11% +472.1% 0.77 ± 26% perf-stat.branch-miss-rate%
7.992e+08 ± 11% +46.9% 1.174e+09 ± 18% perf-stat.branch-misses
3.21 ± 3% +40.4% 4.51 ± 7% perf-stat.cache-miss-rate%
3.435e+08 ± 3% +63.0% 5.6e+08 ± 5% perf-stat.cache-misses
1.069e+10 ± 1% +16.3% 1.244e+10 ± 2% perf-stat.cache-references
1717134 ± 0% +329.5% 7374605 ± 1% perf-stat.context-switches
3.428e+12 ± 2% -19.7% 2.753e+12 ± 5% perf-stat.cpu-cycles
13563 ± 14% +25.3% 16991 ± 3% perf-stat.cpu-migrations
0.10 ± 2% +263.5% 0.37 ± 2% perf-stat.dTLB-load-miss-rate%
7.249e+08 ± 1% +4.0% 7.541e+08 ± 1% perf-stat.dTLB-load-misses
7.047e+11 ± 1% -71.4% 2.013e+11 ± 4% perf-stat.dTLB-loads
0.02 ± 2% +319.3% 0.10 ± 4% perf-stat.dTLB-store-miss-rate%
5.215e+11 ± 0% -75.5% 1.278e+11 ± 1% perf-stat.dTLB-stores
70.13 ± 2% -21.9% 54.74 ± 4% perf-stat.iTLB-load-miss-rate%
81574642 ± 6% +92.2% 1.568e+08 ± 10% perf-stat.iTLB-loads
3.139e+12 ± 0% -76.9% 7.252e+11 ± 6% perf-stat.instructions
16407 ± 1% -76.6% 3841 ± 6% perf-stat.instructions-per-iTLB-miss
0.92 ± 1% -71.3% 0.26 ± 1% perf-stat.ipc
69.72 ± 3% -8.1% 64.04 ± 4% perf-stat.node-load-miss-rate%
62030107 ± 5% +99.9% 1.24e+08 ± 7% perf-stat.node-load-misses
26888359 ± 6% +158.3% 69455983 ± 6% perf-stat.node-loads
44598884 ± 10% +40.6% 62695931 ± 4% perf-stat.node-store-misses
71457228 ± 5% +32.2% 94453944 ± 5% perf-stat.node-stores
3115 ± 1% -36.9% 1966 ± 7% sched_debug.cfs_rq:/.exec_clock.avg
65160 ± 12% -83.8% 10565 ± 23% sched_debug.cfs_rq:/.exec_clock.max
7939 ± 9% -83.6% 1299 ± 7% sched_debug.cfs_rq:/.exec_clock.stddev
21801 ± 2% -38.0% 13523 ± 10% sched_debug.cfs_rq:/.load.avg
878363 ± 0% -29.9% 615745 ± 16% sched_debug.cfs_rq:/.load.max
119543 ± 1% -29.6% 84214 ± 11% sched_debug.cfs_rq:/.load.stddev
1068 ± 5% -36.0% 684.39 ± 1% sched_debug.cfs_rq:/.load_avg.max
104.62 ± 4% -16.6% 87.23 ± 13% sched_debug.cfs_rq:/.load_avg.stddev
32033 ± 15% -20.6% 25437 ± 6% sched_debug.cfs_rq:/.min_vruntime.avg
98945 ± 13% -47.7% 51784 ± 10% sched_debug.cfs_rq:/.min_vruntime.max
23205 ± 8% -14.7% 19785 ± 6% sched_debug.cfs_rq:/.min_vruntime.min
9298 ± 8% -54.6% 4221 ± 15% sched_debug.cfs_rq:/.min_vruntime.stddev
0.21 ± 7% -11.6% 0.18 ± 2% sched_debug.cfs_rq:/.nr_running.stddev
11.54 ± 12% -77.9% 2.56 ± 10% sched_debug.cfs_rq:/.runnable_load_avg.avg
818.39 ± 2% -82.1% 146.39 ± 20% sched_debug.cfs_rq:/.runnable_load_avg.max
88.60 ± 4% -80.6% 17.20 ± 15% sched_debug.cfs_rq:/.runnable_load_avg.stddev
9297 ± 8% -54.5% 4225 ± 15% sched_debug.cfs_rq:/.spread0.stddev
923.33 ± 1% -52.5% 438.39 ± 16% sched_debug.cfs_rq:/.util_avg.max
105.06 ± 8% -52.6% 49.85 ± 15% sched_debug.cfs_rq:/.util_avg.stddev
93881 ± 11% +63.1% 153150 ± 15% sched_debug.cpu.avg_idle.min
10.14 ± 13% -77.0% 2.33 ± 20% sched_debug.cpu.cpu_load[0].avg
731.94 ± 10% -81.0% 139.22 ± 26% sched_debug.cpu.cpu_load[0].max
79.16 ± 10% -79.7% 16.07 ± 22% sched_debug.cpu.cpu_load[0].stddev
12.24 ± 10% -44.2% 6.82 ± 18% sched_debug.cpu.cpu_load[1].avg
828.00 ± 3% -66.0% 281.89 ± 23% sched_debug.cpu.cpu_load[1].max
89.72 ± 4% -60.9% 35.11 ± 20% sched_debug.cpu.cpu_load[1].stddev
11.92 ± 11% -47.6% 6.25 ± 18% sched_debug.cpu.cpu_load[2].avg
825.67 ± 3% -68.8% 257.72 ± 22% sched_debug.cpu.cpu_load[2].max
89.19 ± 4% -64.2% 31.95 ± 19% sched_debug.cpu.cpu_load[2].stddev
11.54 ± 12% -50.0% 5.77 ± 19% sched_debug.cpu.cpu_load[3].avg
823.89 ± 3% -69.3% 252.89 ± 20% sched_debug.cpu.cpu_load[3].max
88.69 ± 4% -65.6% 30.53 ± 19% sched_debug.cpu.cpu_load[3].stddev
11.14 ± 12% -52.7% 5.27 ± 19% sched_debug.cpu.cpu_load[4].avg
822.17 ± 3% -69.1% 253.83 ± 18% sched_debug.cpu.cpu_load[4].max
88.15 ± 4% -66.4% 29.59 ± 19% sched_debug.cpu.cpu_load[4].stddev
21461 ± 1% -40.0% 12871 ± 8% sched_debug.cpu.load.avg
878363 ± 0% -31.9% 598417 ± 4% sched_debug.cpu.load.max
117955 ± 1% -31.1% 81323 ± 2% sched_debug.cpu.load.stddev
581256 ± 6% -13.1% 505055 ± 0% sched_debug.cpu.max_idle_balance_cost.max
10946 ± 62% -95.1% 535.82 ± 85% sched_debug.cpu.max_idle_balance_cost.stddev
0.21 ± 5% -12.4% 0.19 ± 1% sched_debug.cpu.nr_running.stddev
10669 ± 1% +297.6% 42417 ± 2% sched_debug.cpu.nr_switches.avg
120319 ± 19% +523.0% 749589 ± 19% sched_debug.cpu.nr_switches.max
1406 ± 18% +66.4% 2340 ± 17% sched_debug.cpu.nr_switches.min
15913 ± 15% +510.8% 97200 ± 10% sched_debug.cpu.nr_switches.stddev
11244 ± 4% +286.0% 43399 ± 2% sched_debug.cpu.sched_count.avg
141633 ± 13% +425.5% 744266 ± 18% sched_debug.cpu.sched_count.max
781.72 ± 12% +119.8% 1717 ± 16% sched_debug.cpu.sched_count.min
19973 ± 14% +391.1% 98091 ± 10% sched_debug.cpu.sched_count.stddev
4628 ± 2% +314.0% 19164 ± 4% sched_debug.cpu.sched_goidle.avg
51340 ± 14% +574.1% 346066 ± 25% sched_debug.cpu.sched_goidle.max
351.00 ± 15% +124.4% 787.56 ± 17% sched_debug.cpu.sched_goidle.min
6914 ± 11% +531.9% 43686 ± 15% sched_debug.cpu.sched_goidle.stddev
5131 ± 2% +334.7% 22307 ± 0% sched_debug.cpu.ttwu_count.avg
103132 ± 15% +586.1% 707630 ± 27% sched_debug.cpu.ttwu_count.max
162.50 ± 35% +125.2% 366.00 ± 32% sched_debug.cpu.ttwu_count.min
12843 ± 10% +580.8% 87440 ± 18% sched_debug.cpu.ttwu_count.stddev
2935 ± 5% +381.4% 14135 ± 5% sched_debug.cpu.ttwu_local.avg
64965 ± 25% +510.9% 396846 ± 13% sched_debug.cpu.ttwu_local.max
8418 ± 18% +531.3% 53145 ± 7% sched_debug.cpu.ttwu_local.stddev
1.07 ± 9% -100.0% 0.00 ± -1% perf-profile.calltrace.cycles-pp.___might_sleep.__might_sleep.unmap_underlying_metadata.ext4_ext_map_blocks.ext4_map_blocks
15.17 ± 8% -100.0% 0.00 ± -1% perf-profile.calltrace.cycles-pp.__find_get_block_slow.unmap_underlying_metadata.ext4_ext_map_blocks.ext4_map_blocks.ext4_writepages
0.00 ± -1% +Inf% 1.93 ± 11% perf-profile.calltrace.cycles-pp.__generic_file_write_iter.ext4_file_write_iter.__vfs_write.vfs_write.sys_write
0.92 ± 9% +44.2% 1.33 ± 21% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.local_apic_timer_interrupt.smp_apic_timer_interrupt.__irqentry_text_start
4.29 ± 10% +43.2% 6.14 ± 20% perf-profile.calltrace.cycles-pp.__irqentry_text_start.cpuidle_enter.call_cpuidle.cpu_startup_entry.start_secondary
1.86 ± 11% -100.0% 0.00 ± -1% perf-profile.calltrace.cycles-pp.__might_sleep.unmap_underlying_metadata.ext4_ext_map_blocks.ext4_map_blocks.ext4_writepages
7.08 ± 8% -100.0% 0.00 ± -1% perf-profile.calltrace.cycles-pp.__radix_tree_lookup.radix_tree_lookup_slot.find_get_entry.pagecache_get_page.__find_get_block_slow
1.20 ± 9% +34.8% 1.61 ± 16% perf-profile.calltrace.cycles-pp.__tick_nohz_idle_enter.tick_nohz_irq_exit.irq_exit.smp_apic_timer_interrupt.__irqentry_text_start
0.00 ± -1% +Inf% 1.96 ± 12% perf-profile.calltrace.cycles-pp.__vfs_write.vfs_write.sys_write.entry_SYSCALL_64_fastpath
21.86 ± 9% -83.1% 3.69 ± 16% perf-profile.calltrace.cycles-pp.__writeback_inodes_wb.wb_writeback.wb_workfn.process_one_work.worker_thread
21.86 ± 9% -83.1% 3.69 ± 16% perf-profile.calltrace.cycles-pp.__writeback_single_inode.writeback_sb_inodes.__writeback_inodes_wb.wb_writeback.wb_workfn
73.68 ± 2% +20.5% 88.75 ± 2% perf-profile.calltrace.cycles-pp.call_cpuidle.cpu_startup_entry.start_secondary
74.62 ± 2% +21.1% 90.38 ± 2% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary
73.64 ± 2% +20.4% 88.69 ± 2% perf-profile.calltrace.cycles-pp.cpuidle_enter.call_cpuidle.cpu_startup_entry.start_secondary
68.79 ± 3% +18.5% 81.54 ± 4% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.call_cpuidle.cpu_startup_entry.start_secondary
21.86 ± 9% -83.1% 3.69 ± 16% perf-profile.calltrace.cycles-pp.do_writepages.__writeback_single_inode.writeback_sb_inodes.__writeback_inodes_wb.wb_writeback
0.00 ± -1% +Inf% 2.44 ± 11% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_fastpath
0.00 ± -1% +Inf% 1.10 ± 10% perf-profile.calltrace.cycles-pp.ext4_da_write_begin.generic_perform_write.__generic_file_write_iter.ext4_file_write_iter.__vfs_write
21.32 ± 9% -92.5% 1.59 ± 19% perf-profile.calltrace.cycles-pp.ext4_ext_map_blocks.ext4_map_blocks.ext4_writepages.do_writepages.__writeback_single_inode
0.00 ± -1% +Inf% 1.94 ± 12% perf-profile.calltrace.cycles-pp.ext4_file_write_iter.__vfs_write.vfs_write.sys_write.entry_SYSCALL_64_fastpath
21.65 ± 9% -90.8% 1.99 ± 17% perf-profile.calltrace.cycles-pp.ext4_map_blocks.ext4_writepages.do_writepages.__writeback_single_inode.writeback_sb_inodes
0.00 ± -1% +Inf% 0.82 ± 22% perf-profile.calltrace.cycles-pp.ext4_split_extent.ext4_ext_map_blocks.ext4_map_blocks.ext4_writepages.do_writepages
21.86 ± 9% -83.1% 3.68 ± 16% perf-profile.calltrace.cycles-pp.ext4_writepages.do_writepages.__writeback_single_inode.writeback_sb_inodes.__writeback_inodes_wb
9.51 ± 9% -100.0% 0.00 ± -1% perf-profile.calltrace.cycles-pp.find_get_entry.pagecache_get_page.__find_get_block_slow.unmap_underlying_metadata.ext4_ext_map_blocks
0.00 ± -1% +Inf% 1.89 ± 12% perf-profile.calltrace.cycles-pp.generic_perform_write.__generic_file_write_iter.ext4_file_write_iter.__vfs_write.vfs_write
1.43 ± 12% +39.3% 1.99 ± 21% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.local_apic_timer_interrupt.smp_apic_timer_interrupt.__irqentry_text_start.cpuidle_enter
41.93 ± 3% +14.5% 48.00 ± 4% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.call_cpuidle.cpu_startup_entry
0.74 ± 17% +50.2% 1.12 ± 28% perf-profile.calltrace.cycles-pp.irq_enter.smp_apic_timer_interrupt.__irqentry_text_start.cpuidle_enter.call_cpuidle
1.58 ± 8% +37.8% 2.17 ± 16% perf-profile.calltrace.cycles-pp.irq_exit.smp_apic_timer_interrupt.__irqentry_text_start.cpuidle_enter.call_cpuidle
22.12 ± 9% -80.5% 4.32 ± 18% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
1.51 ± 11% +40.7% 2.12 ± 20% perf-profile.calltrace.cycles-pp.local_apic_timer_interrupt.smp_apic_timer_interrupt.__irqentry_text_start.cpuidle_enter.call_cpuidle
12.16 ± 8% -100.0% 0.00 ± -1% perf-profile.calltrace.cycles-pp.pagecache_get_page.__find_get_block_slow.unmap_underlying_metadata.ext4_ext_map_blocks.ext4_map_blocks
22.01 ± 9% -81.4% 4.09 ± 19% perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork
7.92 ± 8% -100.0% 0.00 ± -1% perf-profile.calltrace.cycles-pp.radix_tree_lookup_slot.find_get_entry.pagecache_get_page.__find_get_block_slow.unmap_underlying_metadata
22.12 ± 9% -80.5% 4.32 ± 18% perf-profile.calltrace.cycles-pp.ret_from_fork
4.17 ± 10% +42.5% 5.95 ± 19% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.__irqentry_text_start.cpuidle_enter.call_cpuidle.cpu_startup_entry
74.78 ± 2% +21.1% 90.59 ± 2% perf-profile.calltrace.cycles-pp.start_secondary
0.00 ± -1% +Inf% 2.00 ± 12% perf-profile.calltrace.cycles-pp.sys_write.entry_SYSCALL_64_fastpath
1.21 ± 9% +35.7% 1.65 ± 16% perf-profile.calltrace.cycles-pp.tick_nohz_irq_exit.irq_exit.smp_apic_timer_interrupt.__irqentry_text_start.cpuidle_enter
1.02 ± 10% +32.9% 1.36 ± 17% perf-profile.calltrace.cycles-pp.tick_nohz_stop_sched_tick.__tick_nohz_idle_enter.tick_nohz_irq_exit.irq_exit.smp_apic_timer_interrupt
19.49 ± 9% -100.0% 0.00 ± -1% perf-profile.calltrace.cycles-pp.unmap_underlying_metadata.ext4_ext_map_blocks.ext4_map_blocks.ext4_writepages.do_writepages
0.00 ± -1% +Inf% 1.99 ± 12% perf-profile.calltrace.cycles-pp.vfs_write.sys_write.entry_SYSCALL_64_fastpath
21.86 ± 9% -83.1% 3.69 ± 16% perf-profile.calltrace.cycles-pp.wb_workfn.process_one_work.worker_thread.kthread.ret_from_fork
21.86 ± 9% -83.1% 3.69 ± 16% perf-profile.calltrace.cycles-pp.wb_writeback.wb_workfn.process_one_work.worker_thread.kthread
22.03 ± 9% -81.0% 4.19 ± 19% perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
21.86 ± 9% -83.1% 3.69 ± 16% perf-profile.calltrace.cycles-pp.writeback_sb_inodes.__writeback_inodes_wb.wb_writeback.wb_workfn.process_one_work
1.65 ± 11% -96.6% 0.06 ± 16% perf-profile.children.cycles-pp.___might_sleep
15.47 ± 8% -100.0% 0.00 ± -1% perf-profile.children.cycles-pp.__find_get_block_slow
0.07 ± 18% +2795.0% 1.93 ± 11% perf-profile.children.cycles-pp.__generic_file_write_iter
4.52 ± 9% +41.0% 6.37 ± 19% perf-profile.children.cycles-pp.__irqentry_text_start
2.38 ± 9% -96.8% 0.08 ± 16% perf-profile.children.cycles-pp.__might_sleep
7.36 ± 8% -96.7% 0.24 ± 20% perf-profile.children.cycles-pp.__radix_tree_lookup
1.27 ± 8% +45.9% 1.85 ± 16% perf-profile.children.cycles-pp.__tick_nohz_idle_enter
0.20 ± 10% +966.7% 2.13 ± 13% perf-profile.children.cycles-pp.__vfs_write
21.86 ± 9% -83.1% 3.69 ± 16% perf-profile.children.cycles-pp.__writeback_inodes_wb
21.86 ± 9% -83.1% 3.69 ± 16% perf-profile.children.cycles-pp.__writeback_single_inode
0.35 ± 26% +90.5% 0.67 ± 38% perf-profile.children.cycles-pp._raw_spin_lock
75.80 ± 2% +19.3% 90.40 ± 1% perf-profile.children.cycles-pp.call_cpuidle
76.75 ± 2% +20.1% 92.16 ± 1% perf-profile.children.cycles-pp.cpu_startup_entry
75.75 ± 2% +19.2% 90.33 ± 1% perf-profile.children.cycles-pp.cpuidle_enter
70.78 ± 3% +17.0% 82.81 ± 3% perf-profile.children.cycles-pp.cpuidle_enter_state
0.14 ± 11% +504.8% 0.85 ± 16% perf-profile.children.cycles-pp.do_IRQ
21.86 ± 9% -83.1% 3.69 ± 16% perf-profile.children.cycles-pp.do_writepages
0.77 ± 9% +285.8% 2.98 ± 13% perf-profile.children.cycles-pp.entry_SYSCALL_64_fastpath
0.00 ± -1% +Inf% 1.10 ± 10% perf-profile.children.cycles-pp.ext4_da_write_begin
21.32 ± 9% -92.5% 1.59 ± 19% perf-profile.children.cycles-pp.ext4_ext_map_blocks
0.06 ± 16% +3329.4% 1.94 ± 12% perf-profile.children.cycles-pp.ext4_file_write_iter
21.65 ± 9% -90.8% 1.99 ± 17% perf-profile.children.cycles-pp.ext4_map_blocks
0.12 ± 11% +588.9% 0.83 ± 21% perf-profile.children.cycles-pp.ext4_split_extent
21.86 ± 9% -83.1% 3.68 ± 16% perf-profile.children.cycles-pp.ext4_writepages
10.10 ± 8% -97.5% 0.25 ± 16% perf-profile.children.cycles-pp.find_get_entry
0.04 ± 71% +4269.2% 1.89 ± 12% perf-profile.children.cycles-pp.generic_perform_write
42.29 ± 3% +14.1% 48.25 ± 4% perf-profile.children.cycles-pp.intel_idle
0.82 ± 16% +54.3% 1.27 ± 26% perf-profile.children.cycles-pp.irq_enter
1.76 ± 6% +36.8% 2.40 ± 15% perf-profile.children.cycles-pp.irq_exit
22.12 ± 9% -80.5% 4.32 ± 18% perf-profile.children.cycles-pp.kthread
12.44 ± 8% -96.0% 0.50 ± 13% perf-profile.children.cycles-pp.pagecache_get_page
22.01 ± 9% -81.4% 4.09 ± 19% perf-profile.children.cycles-pp.process_one_work
8.19 ± 8% -96.9% 0.25 ± 18% perf-profile.children.cycles-pp.radix_tree_lookup_slot
22.12 ± 9% -80.5% 4.32 ± 18% perf-profile.children.cycles-pp.ret_from_fork
0.15 ± 12% +463.0% 0.86 ± 16% perf-profile.children.cycles-pp.ret_from_intr
4.41 ± 10% +40.3% 6.19 ± 19% perf-profile.children.cycles-pp.smp_apic_timer_interrupt
74.78 ± 2% +21.1% 90.59 ± 2% perf-profile.children.cycles-pp.start_secondary
0.21 ± 12% +928.1% 2.19 ± 13% perf-profile.children.cycles-pp.sys_write
1.26 ± 8% +40.1% 1.77 ± 16% perf-profile.children.cycles-pp.tick_nohz_irq_exit
1.08 ± 9% +43.2% 1.55 ± 16% perf-profile.children.cycles-pp.tick_nohz_stop_sched_tick
19.77 ± 9% -99.9% 0.02 ±141% perf-profile.children.cycles-pp.unmap_underlying_metadata
0.21 ± 12% +926.6% 2.19 ± 13% perf-profile.children.cycles-pp.vfs_write
21.86 ± 9% -83.1% 3.69 ± 16% perf-profile.children.cycles-pp.wb_workfn
21.86 ± 9% -83.1% 3.69 ± 16% perf-profile.children.cycles-pp.wb_writeback
22.03 ± 9% -81.0% 4.19 ± 19% perf-profile.children.cycles-pp.worker_thread
21.86 ± 9% -83.1% 3.69 ± 16% perf-profile.children.cycles-pp.writeback_sb_inodes
1.65 ± 11% -96.6% 0.06 ± 16% perf-profile.self.cycles-pp.___might_sleep
2.71 ± 9% -100.0% 0.00 ± -1% perf-profile.self.cycles-pp.__find_get_block_slow
1.29 ± 10% -98.2% 0.02 ±141% perf-profile.self.cycles-pp.__might_sleep
7.36 ± 8% -96.7% 0.24 ± 20% perf-profile.self.cycles-pp.__radix_tree_lookup
0.65 ± 9% +42.6% 0.93 ± 18% perf-profile.self.cycles-pp.cpuidle_enter_state
1.89 ± 10% -100.0% 0.00 ± -1% perf-profile.self.cycles-pp.find_get_entry
42.29 ± 3% +14.1% 48.25 ± 4% perf-profile.self.cycles-pp.intel_idle
2.64 ± 8% -100.0% 0.00 ± -1% perf-profile.self.cycles-pp.pagecache_get_page
1.07 ± 8% -100.0% 0.00 ± -1% perf-profile.self.cycles-pp.radix_tree_lookup_slot
1.88 ± 11% -100.0% 0.00 ± -1% perf-profile.self.cycles-pp.unmap_underlying_metadata
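
As a sanity check on the table above, the headline figures can be re-derived from each other. The sketch below recomputes %change from the two means, converts IOPS at bs=4k into bandwidth, and recomputes perf-stat.ipc from perf-stat.instructions and perf-stat.cpu-cycles. The helper names are made up for this example. The use of 2**20 bytes per MB is an assumption, chosen because it reproduces the reported fio.write_bw_MBps values.

```python
def percent_change(base, new):
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def bandwidth_mb_per_s(iops, block_bytes=4096, bytes_per_mb=2**20):
    """Bandwidth implied by an IOPS figure at a fixed block size."""
    return iops * block_bytes / bytes_per_mb


# Means copied from the comparison table (base commit, commit under test).
bw_change = percent_change(84.80, 147.91)          # fio.write_bw_MBps
iops_change = percent_change(21709, 37865)         # fio.write_iops
out_change = percent_change(52135170, 90914901)    # fio.time.file_system_outputs

bw_base = bandwidth_mb_per_s(21709)
bw_new = bandwidth_mb_per_s(37865)

# IPC is instructions divided by cycles.
ipc_base = 3.139e12 / 3.428e12
ipc_new = 7.252e11 / 2.753e12

print(f"bw change   {bw_change:.1f}%")
print(f"iops change {iops_change:.1f}%")
print(f"fs outputs  {out_change:.1f}%")
print(f"bw from iops: {bw_base:.2f} -> {bw_new:.2f} MB/s")
print(f"ipc: {ipc_base:.2f} -> {ipc_new:.2f}")
```

All three throughput-style metrics move by the same percentage, and IPC falls even though throughput rises. That fits the profile above: the cycles formerly spent under unmap_underlying_metadata are gone, and the CPUs spend more of their time idle.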


[ASCII plots of per-sample values, one plot per metric listed below.
 [*] marks a bisect-good sample, [O] marks a bisect-bad sample.]

perf-stat.context-switches
perf-stat.branch-miss-rate_
perf-stat.dTLB-load-miss-rate_
perf-stat.dTLB-store-miss-rate_
perf-stat.ipc
perf-stat.instructions-per-iTLB-miss
fio.write_bw_MBps
fio.write_iops
fio.write_clat_mean_us
fio.write_clat_99__us
fio.latency_10us_
fio.latency_50ms_
fio.latency_100ms_
fio.latency_250ms_
vmstat.io.bo
vmstat.system.in
vmstat.system.cs


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Ying Huang


Attachments:
(No filename) (48.98 kB)
config-4.9.0-rc1-00061-g64e1c57 (150.14 kB)
job-script (6.93 kB)
job.yaml (4.58 kB)
reproduce (470.00 B)