2018-01-02 06:36:16

by kernel test robot

Subject: [lkp-robot] [mm/cma] 2b0f904a5a: fio.read_bw_MBps -16.1% regression


Greetings,

FYI, we noticed a -16.1% regression of fio.read_bw_MBps due to commit:


commit: 2b0f904a5a8781498417d67226fd12c5e56053ae ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: fio-basic
on test machine: 56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory
with following parameters:

disk: 2pmem
fs: ext4
runtime: 200s
nr_task: 50%
time_based: tb
rw: randread
bs: 2M
ioengine: mmap
test_size: 200G
cpufreq_governor: performance

test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio



Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based:
2M/gcc-7/performance/2pmem/ext4/mmap/x86_64-rhel-7.2/50%/debian-x86_64-2016-08-31.cgz/200s/randread/lkp-hsw-ep6/200G/fio-basic/tb

commit:
f6572f9cd2 ("mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request")
2b0f904a5a ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE")

f6572f9cd248df2c 2b0f904a5a8781498417d67226
---------------- --------------------------
%stddev %change %stddev
\ | \
11451 -16.1% 9605 fio.read_bw_MBps
0.29 ± 5% +0.1 0.40 ± 3% fio.latency_1000us%
19.35 ± 5% -4.7 14.69 ± 3% fio.latency_10ms%
7.92 ± 3% +12.2 20.15 fio.latency_20ms%
0.05 ± 11% +0.0 0.09 ± 8% fio.latency_2ms%
70.22 -8.9 61.36 fio.latency_4ms%
0.29 ± 13% +0.0 0.33 ± 3% fio.latency_500us%
0.45 ± 29% +1.0 1.45 ± 4% fio.latency_50ms%
1.37 +0.1 1.44 fio.latency_750us%
9792 +31.7% 12896 fio.read_clat_90%_us
10560 +33.0% 14048 fio.read_clat_95%_us
15376 ± 10% +46.9% 22592 fio.read_clat_99%_us
4885 +19.2% 5825 fio.read_clat_mean_us
5725 -16.1% 4802 fio.read_iops
4.598e+09 -16.4% 3.845e+09 fio.time.file_system_inputs
453153 -8.4% 415215 fio.time.involuntary_context_switches
5.748e+08 -16.4% 4.806e+08 fio.time.major_page_faults
1822257 +23.7% 2254706 fio.time.maximum_resident_set_size
5089 +1.6% 5172 fio.time.system_time
514.50 -16.3% 430.48 fio.time.user_time
24569 ± 2% +9.6% 26917 ± 2% fio.time.voluntary_context_switches
54443725 -14.9% 46353339 interrupts.CAL:Function_call_interrupts
0.00 ± 79% -0.0 0.00 ± 17% mpstat.cpu.iowait%
4.45 -0.7 3.71 mpstat.cpu.usr%
1467516 +21.3% 1779543 ± 3% meminfo.Active
1276031 +23.7% 1578443 ± 4% meminfo.Active(file)
25789 ± 3% -76.7% 6013 ± 4% meminfo.CmaFree
1.296e+08 -12.6% 1.133e+08 turbostat.IRQ
41.89 -3.4% 40.47 turbostat.RAMWatt
17444 ± 2% -13.5% 15092 ± 3% turbostat.SMI
10896428 -16.4% 9111830 vmstat.io.bi
6010 -6.2% 5637 vmstat.system.cs
317438 -12.1% 278980 vmstat.system.in
1072892 ± 3% +21.5% 1303487 numa-meminfo.node0.Active
978318 +21.6% 1189809 ± 2% numa-meminfo.node0.Active(file)
222968 -25.2% 166818 numa-meminfo.node0.PageTables
47374 ± 2% +10.6% 52402 ± 7% numa-meminfo.node0.SUnreclaim
165213 +31.9% 217870 numa-meminfo.node1.PageTables
222405 +10.4% 245633 ± 2% numa-meminfo.node1.SReclaimable
102992 ± 46% -80.8% 19812 ± 38% numa-meminfo.node1.Shmem
2.475e+08 ± 2% -24.0% 1.881e+08 numa-numastat.node0.local_node
39371795 ± 14% +167.1% 1.052e+08 ± 2% numa-numastat.node0.numa_foreign
2.475e+08 ± 2% -24.0% 1.881e+08 numa-numastat.node0.numa_hit
31890417 ± 17% +40.2% 44705135 ± 8% numa-numastat.node0.numa_miss
31899482 ± 17% +40.2% 44713255 ± 8% numa-numastat.node0.other_node
2.566e+08 ± 2% -44.2% 1.433e+08 numa-numastat.node1.local_node
31890417 ± 17% +40.2% 44705135 ± 8% numa-numastat.node1.numa_foreign
2.566e+08 ± 2% -44.2% 1.433e+08 numa-numastat.node1.numa_hit
39371795 ± 14% +167.1% 1.052e+08 ± 2% numa-numastat.node1.numa_miss
39373660 ± 14% +167.1% 1.052e+08 ± 2% numa-numastat.node1.other_node
6047 ± 39% -66.5% 2028 ± 63% sched_debug.cfs_rq:/.exec_clock.min
461.37 ± 8% +64.9% 760.74 ± 20% sched_debug.cfs_rq:/.load_avg.avg
1105 ± 13% +1389.3% 16467 ± 56% sched_debug.cfs_rq:/.load_avg.max
408.99 ± 3% +495.0% 2433 ± 49% sched_debug.cfs_rq:/.load_avg.stddev
28746 ± 12% -18.7% 23366 ± 14% sched_debug.cfs_rq:/.min_vruntime.min
752426 ± 3% -12.7% 656636 ± 4% sched_debug.cpu.avg_idle.avg
144956 ± 61% -85.4% 21174 ± 26% sched_debug.cpu.avg_idle.min
245684 ± 11% +44.6% 355257 ± 2% sched_debug.cpu.avg_idle.stddev
236035 ± 15% +51.8% 358264 ± 16% sched_debug.cpu.nr_switches.max
42039 ± 22% +34.7% 56616 ± 8% sched_debug.cpu.nr_switches.stddev
3204 ± 24% -48.1% 1663 ± 30% sched_debug.cpu.sched_count.min
2132 ± 25% +38.7% 2957 ± 11% sched_debug.cpu.sched_count.stddev
90.67 ± 32% -71.8% 25.58 ± 26% sched_debug.cpu.sched_goidle.min
6467 ± 15% +22.3% 7912 ± 15% sched_debug.cpu.ttwu_count.max
1513 ± 27% -55.7% 670.92 ± 22% sched_debug.cpu.ttwu_count.min
1025 ± 20% +68.4% 1727 ± 9% sched_debug.cpu.ttwu_count.stddev
1057 ± 16% -62.9% 391.85 ± 31% sched_debug.cpu.ttwu_local.min
244876 +21.6% 297770 ± 2% numa-vmstat.node0.nr_active_file
88.00 ± 5% +19.3% 105.00 ± 5% numa-vmstat.node0.nr_isolated_file
55778 -25.1% 41765 numa-vmstat.node0.nr_page_table_pages
11843 ± 2% +10.6% 13100 ± 7% numa-vmstat.node0.nr_slab_unreclaimable
159.25 ± 42% -74.9% 40.00 ± 52% numa-vmstat.node0.nr_vmscan_immediate_reclaim
244862 +21.6% 297739 ± 2% numa-vmstat.node0.nr_zone_active_file
19364320 ± 19% +187.2% 55617595 ± 2% numa-vmstat.node0.numa_foreign
268155 ± 3% +49.6% 401089 ± 4% numa-vmstat.node0.workingset_activate
1.229e+08 -19.0% 99590617 numa-vmstat.node0.workingset_refault
6345 ± 3% -76.5% 1489 ± 3% numa-vmstat.node1.nr_free_cma
41335 +32.0% 54552 numa-vmstat.node1.nr_page_table_pages
25770 ± 46% -80.8% 4956 ± 38% numa-vmstat.node1.nr_shmem
55684 +10.4% 61475 ± 2% numa-vmstat.node1.nr_slab_reclaimable
1.618e+08 ± 8% -47.6% 84846798 ± 17% numa-vmstat.node1.numa_hit
1.617e+08 ± 8% -47.6% 84676284 ± 17% numa-vmstat.node1.numa_local
19365342 ± 19% +187.2% 55620100 ± 2% numa-vmstat.node1.numa_miss
19534837 ± 19% +185.6% 55790654 ± 2% numa-vmstat.node1.numa_other
1.296e+08 -21.0% 1.024e+08 numa-vmstat.node1.workingset_refault
1.832e+12 -7.5% 1.694e+12 perf-stat.branch-instructions
0.25 -0.0 0.23 perf-stat.branch-miss-rate%
4.666e+09 -16.0% 3.918e+09 perf-stat.branch-misses
39.88 +1.1 40.98 perf-stat.cache-miss-rate%
2.812e+10 -11.6% 2.485e+10 perf-stat.cache-misses
7.051e+10 -14.0% 6.064e+10 perf-stat.cache-references
1260521 -6.1% 1183071 perf-stat.context-switches
1.87 +9.6% 2.05 perf-stat.cpi
6707 ± 2% -5.2% 6359 perf-stat.cpu-migrations
1.04 ± 11% -0.3 0.77 ± 4% perf-stat.dTLB-load-miss-rate%
2.365e+10 ± 7% -25.9% 1.751e+10 ± 9% perf-stat.dTLB-load-misses
1.05e+12 ± 4% -9.5% 9.497e+11 ± 2% perf-stat.dTLB-stores
28.16 +2.2 30.35 ± 2% perf-stat.iTLB-load-miss-rate%
2.56e+08 -10.4% 2.295e+08 perf-stat.iTLB-loads
8.974e+12 -9.2% 8.151e+12 perf-stat.instructions
89411 -8.8% 81529 perf-stat.instructions-per-iTLB-miss
0.54 -8.8% 0.49 perf-stat.ipc
5.748e+08 -16.4% 4.806e+08 perf-stat.major-faults
52.82 +5.8 58.61 ± 2% perf-stat.node-load-miss-rate%
7.206e+09 ± 2% -18.6% 5.867e+09 ± 3% perf-stat.node-loads
17.96 ± 8% +15.7 33.69 ± 2% perf-stat.node-store-miss-rate%
2.055e+09 ± 8% +65.1% 3.393e+09 ± 4% perf-stat.node-store-misses
9.391e+09 ± 2% -28.9% 6.675e+09 perf-stat.node-stores
5.753e+08 -16.4% 4.811e+08 perf-stat.page-faults
305865 -16.3% 256108 proc-vmstat.allocstall_movable
1923 ± 14% -72.1% 537.00 ± 12% proc-vmstat.allocstall_normal
0.00 +Inf% 1577 ± 67% proc-vmstat.compact_isolated
1005 ± 4% -65.8% 344.00 ± 7% proc-vmstat.kswapd_low_wmark_hit_quickly
320062 +23.2% 394374 ± 4% proc-vmstat.nr_active_file
6411 ± 2% -76.4% 1511 ± 4% proc-vmstat.nr_free_cma
277.00 ± 12% -51.4% 134.75 ± 52% proc-vmstat.nr_vmscan_immediate_reclaim
320049 +23.2% 394353 ± 4% proc-vmstat.nr_zone_active_file
71262212 ± 15% +110.3% 1.499e+08 ± 3% proc-vmstat.numa_foreign
5.042e+08 ± 2% -34.3% 3.314e+08 proc-vmstat.numa_hit
5.041e+08 ± 2% -34.3% 3.314e+08 proc-vmstat.numa_local
71262212 ± 15% +110.3% 1.499e+08 ± 3% proc-vmstat.numa_miss
71273176 ± 15% +110.3% 1.499e+08 ± 3% proc-vmstat.numa_other
1007 ± 4% -65.6% 346.25 ± 7% proc-vmstat.pageoutrun
23070268 -16.0% 19386190 proc-vmstat.pgalloc_dma32
5.525e+08 -16.7% 4.603e+08 proc-vmstat.pgalloc_normal
5.753e+08 -16.4% 4.812e+08 proc-vmstat.pgfault
5.751e+08 -16.3% 4.813e+08 proc-vmstat.pgfree
5.748e+08 -16.4% 4.806e+08 proc-vmstat.pgmajfault
2.299e+09 -16.4% 1.923e+09 proc-vmstat.pgpgin
8.396e+08 -17.8% 6.901e+08 proc-vmstat.pgscan_direct
3.018e+08 ± 2% -13.0% 2.627e+08 proc-vmstat.pgscan_kswapd
4.1e+08 -15.1% 3.48e+08 proc-vmstat.pgsteal_direct
1.542e+08 ± 3% -20.9% 1.22e+08 ± 3% proc-vmstat.pgsteal_kswapd
23514 ± 4% -23.1% 18076 ± 16% proc-vmstat.slabs_scanned
343040 ± 2% +40.3% 481253 ± 2% proc-vmstat.workingset_activate
2.525e+08 -20.1% 2.018e+08 proc-vmstat.workingset_refault
13.64 ± 3% -1.7 11.96 ± 2% perf-profile.calltrace.cycles-pp.ext4_mpage_readpages.filemap_fault.ext4_filemap_fault.__do_fault.__handle_mm_fault
11.67 ± 3% -1.4 10.29 ± 2% perf-profile.calltrace.cycles-pp.submit_bio.ext4_mpage_readpages.filemap_fault.ext4_filemap_fault.__do_fault
11.64 ± 3% -1.4 10.25 ± 2% perf-profile.calltrace.cycles-pp.generic_make_request.submit_bio.ext4_mpage_readpages.filemap_fault.ext4_filemap_fault
11.10 ± 3% -1.3 9.82 ± 2% perf-profile.calltrace.cycles-pp.pmem_make_request.generic_make_request.submit_bio.ext4_mpage_readpages.filemap_fault
9.21 ± 3% -1.2 8.04 ± 3% perf-profile.calltrace.cycles-pp.pmem_do_bvec.pmem_make_request.generic_make_request.submit_bio.ext4_mpage_readpages
27.33 ± 4% -1.0 26.35 ± 5% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
27.33 ± 4% -1.0 26.35 ± 5% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
27.33 ± 4% -1.0 26.35 ± 5% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
27.33 ± 4% -1.0 26.35 ± 5% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
26.79 ± 4% -0.8 25.98 ± 5% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
27.98 ± 3% -0.8 27.22 ± 4% perf-profile.calltrace.cycles-pp.secondary_startup_64
5.36 ± 12% -0.6 4.76 ± 7% perf-profile.calltrace.cycles-pp.kswapd.kthread.ret_from_fork
5.36 ± 12% -0.6 4.76 ± 7% perf-profile.calltrace.cycles-pp.shrink_node.kswapd.kthread.ret_from_fork
5.30 ± 12% -0.6 4.71 ± 7% perf-profile.calltrace.cycles-pp.shrink_inactive_list.shrink_node_memcg.shrink_node.kswapd.kthread
5.35 ± 12% -0.6 4.76 ± 7% perf-profile.calltrace.cycles-pp.shrink_node_memcg.shrink_node.kswapd.kthread.ret_from_fork
5.43 ± 12% -0.5 4.88 ± 7% perf-profile.calltrace.cycles-pp.ret_from_fork
5.43 ± 12% -0.5 4.88 ± 7% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
11.04 ± 2% -0.2 10.82 ± 2% perf-profile.calltrace.cycles-pp.shrink_page_list.shrink_inactive_list.shrink_node_memcg.shrink_node.do_try_to_free_pages
62.44 ± 2% +1.9 64.38 perf-profile.calltrace.cycles-pp.page_fault
62.38 ± 2% +2.0 64.33 perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault
62.38 ± 2% +2.0 64.34 perf-profile.calltrace.cycles-pp.do_page_fault.page_fault
61.52 ± 2% +2.1 63.58 perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
61.34 ± 2% +2.1 63.44 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
30.18 ± 3% +2.3 32.45 ± 2% perf-profile.calltrace.cycles-pp.shrink_inactive_list.shrink_node_memcg.shrink_node.do_try_to_free_pages.try_to_free_pages
7.98 ± 3% +2.3 10.33 ± 2% perf-profile.calltrace.cycles-pp.add_to_page_cache_lru.filemap_fault.ext4_filemap_fault.__do_fault.__handle_mm_fault
30.48 ± 3% +2.4 32.83 ± 2% perf-profile.calltrace.cycles-pp.try_to_free_pages.__alloc_pages_slowpath.__alloc_pages_nodemask.filemap_fault.ext4_filemap_fault
30.46 ± 3% +2.4 32.81 ± 2% perf-profile.calltrace.cycles-pp.do_try_to_free_pages.try_to_free_pages.__alloc_pages_slowpath.__alloc_pages_nodemask.filemap_fault
30.46 ± 3% +2.4 32.81 ± 2% perf-profile.calltrace.cycles-pp.shrink_node.do_try_to_free_pages.try_to_free_pages.__alloc_pages_slowpath.__alloc_pages_nodemask
30.37 ± 3% +2.4 32.75 ± 2% perf-profile.calltrace.cycles-pp.shrink_node_memcg.shrink_node.do_try_to_free_pages.try_to_free_pages.__alloc_pages_slowpath
5.58 ± 4% +2.5 8.08 ± 2% perf-profile.calltrace.cycles-pp.__lru_cache_add.add_to_page_cache_lru.filemap_fault.ext4_filemap_fault.__do_fault
32.88 ± 3% +2.5 35.38 ± 2% perf-profile.calltrace.cycles-pp.__alloc_pages_nodemask.filemap_fault.ext4_filemap_fault.__do_fault.__handle_mm_fault
5.51 ± 4% +2.5 8.02 ± 2% perf-profile.calltrace.cycles-pp.pagevec_lru_move_fn.__lru_cache_add.add_to_page_cache_lru.filemap_fault.ext4_filemap_fault
4.24 ± 4% +2.5 6.76 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.pagevec_lru_move_fn.__lru_cache_add.add_to_page_cache_lru.filemap_fault
4.18 ± 4% +2.5 6.70 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.pagevec_lru_move_fn.__lru_cache_add.add_to_page_cache_lru
18.64 ± 3% +2.5 21.16 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.shrink_inactive_list.shrink_node_memcg.shrink_node
31.65 ± 3% +2.7 34.31 ± 2% perf-profile.calltrace.cycles-pp.__alloc_pages_slowpath.__alloc_pages_nodemask.filemap_fault.ext4_filemap_fault.__do_fault
17.21 ± 3% +2.7 19.93 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.shrink_inactive_list.shrink_node_memcg.shrink_node.do_try_to_free_pages
56.37 ± 2% +2.8 59.21 perf-profile.calltrace.cycles-pp.__do_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
56.36 ± 2% +2.8 59.20 perf-profile.calltrace.cycles-pp.ext4_filemap_fault.__do_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault
56.10 ± 2% +2.9 58.97 perf-profile.calltrace.cycles-pp.filemap_fault.ext4_filemap_fault.__do_fault.__handle_mm_fault.handle_mm_fault
13.66 ± 3% -1.7 11.98 ± 2% perf-profile.children.cycles-pp.ext4_mpage_readpages
11.69 ± 3% -1.4 10.30 ± 2% perf-profile.children.cycles-pp.submit_bio
11.64 ± 3% -1.4 10.26 ± 2% perf-profile.children.cycles-pp.generic_make_request
11.12 ± 3% -1.3 9.84 ± 2% perf-profile.children.cycles-pp.pmem_make_request
9.27 ± 3% -1.1 8.12 ± 3% perf-profile.children.cycles-pp.pmem_do_bvec
27.33 ± 4% -1.0 26.35 ± 5% perf-profile.children.cycles-pp.start_secondary
27.98 ± 3% -0.8 27.22 ± 4% perf-profile.children.cycles-pp.secondary_startup_64
27.98 ± 3% -0.8 27.22 ± 4% perf-profile.children.cycles-pp.cpu_startup_entry
27.98 ± 3% -0.8 27.22 ± 4% perf-profile.children.cycles-pp.do_idle
27.97 ± 3% -0.8 27.22 ± 4% perf-profile.children.cycles-pp.cpuidle_enter_state
5.36 ± 12% -0.6 4.76 ± 7% perf-profile.children.cycles-pp.kswapd
27.42 ± 4% -0.6 26.84 ± 5% perf-profile.children.cycles-pp.intel_idle
5.43 ± 12% -0.5 4.88 ± 7% perf-profile.children.cycles-pp.kthread
5.43 ± 12% -0.5 4.88 ± 7% perf-profile.children.cycles-pp.ret_from_fork
14.25 -0.4 13.80 ± 2% perf-profile.children.cycles-pp.shrink_page_list
35.60 +1.7 37.31 ± 2% perf-profile.children.cycles-pp.shrink_inactive_list
35.89 +1.8 37.67 ± 2% perf-profile.children.cycles-pp.shrink_node
35.80 +1.8 37.60 ± 2% perf-profile.children.cycles-pp.shrink_node_memcg
62.46 ± 2% +2.0 64.41 perf-profile.children.cycles-pp.page_fault
62.43 ± 2% +2.0 64.39 perf-profile.children.cycles-pp.__do_page_fault
62.41 ± 2% +2.0 64.39 perf-profile.children.cycles-pp.do_page_fault
61.55 ± 2% +2.1 63.63 perf-profile.children.cycles-pp.handle_mm_fault
61.37 ± 2% +2.1 63.49 perf-profile.children.cycles-pp.__handle_mm_fault
8.00 ± 3% +2.3 10.35 ± 2% perf-profile.children.cycles-pp.add_to_page_cache_lru
30.55 ± 3% +2.4 32.92 ± 2% perf-profile.children.cycles-pp.try_to_free_pages
30.53 ± 3% +2.4 32.91 ± 2% perf-profile.children.cycles-pp.do_try_to_free_pages
5.59 ± 4% +2.5 8.09 ± 2% perf-profile.children.cycles-pp.__lru_cache_add
5.61 ± 4% +2.5 8.12 ± 2% perf-profile.children.cycles-pp.pagevec_lru_move_fn
32.97 ± 3% +2.5 35.50 ± 2% perf-profile.children.cycles-pp.__alloc_pages_nodemask
5.26 ± 4% +2.6 7.89 perf-profile.children.cycles-pp._raw_spin_lock_irqsave
19.05 ± 3% +2.7 21.72 ± 3% perf-profile.children.cycles-pp._raw_spin_lock_irq
31.72 ± 3% +2.7 34.41 ± 2% perf-profile.children.cycles-pp.__alloc_pages_slowpath
56.29 ± 2% +2.8 59.07 perf-profile.children.cycles-pp.filemap_fault
56.38 ± 2% +2.8 59.23 perf-profile.children.cycles-pp.__do_fault
56.37 ± 2% +2.8 59.21 perf-profile.children.cycles-pp.ext4_filemap_fault
24.54 +5.3 29.82 ± 2% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
9.20 ± 3% -1.2 8.04 ± 3% perf-profile.self.cycles-pp.pmem_do_bvec
27.42 ± 4% -0.6 26.84 ± 5% perf-profile.self.cycles-pp.intel_idle
24.54 +5.3 29.82 ± 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath



fio.read_bw_MBps

11600 +-+-----------------------------------------------------------------+
11400 +-+. .+..+.+.+.+ .+.. .+. .+ + .+. .+.+.+. .+.+.+.+..+.+.+.+.|
| + + .+.+ + + + +. + |
11200 +-+ + |
11000 +-+ |
10800 +-+ |
10600 +-+ |
| |
10400 +-+ |
10200 +-+ |
10000 +-+ |
9800 +-+ |
O O O O O O O O O O |
9600 +-O O O O O O O O O |
9400 +-+-----------------------------------------------------------------+


fio.read_iops

5800 +-+------------------------------------------------------------------+
5700 +-+. .+..+.+.+.+ .+. .+. .+. + .+. .+.+.+.. .+.+.+.+.+..+.+.+.|
| + + .+..+ + + + + + |
5600 +-+ + |
5500 +-+ |
5400 +-+ |
5300 +-+ |
| |
5200 +-+ |
5100 +-+ |
5000 +-+ |
4900 +-+ |
O O O O O O O O O O |
4800 +-O O O O O O O O O |
4700 +-+------------------------------------------------------------------+


fio.read_clat_mean_us

6000 +-+------------------------------------------------------------------+
| |
5800 +-O O O O O O O O O O O O O O O |
O O O O |
| |
5600 +-+ |
| |
5400 +-+ |
| |
5200 +-+ |
| |
| +. |
5000 +-+ + +..+. .+. .+. .+. .+. |
| +.+.+..+.+.+.+ +.+.+ +..+ + +.+.+. +.+.+.+.+..+.+.+.|
4800 +-+------------------------------------------------------------------+



[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Xiaolong


Attachments:
config-4.15.0-rc3-00148-g2b0f904 (159.92 kB)
job-script (7.21 kB)
job.yaml (4.85 kB)
reproduce (831.00 B)

2018-01-03 02:04:36

by Joonsoo Kim

Subject: Re: [lkp-robot] [mm/cma] 2b0f904a5a: fio.read_bw_MBps -16.1% regression

Hello!

On Tue, Jan 02, 2018 at 02:35:28PM +0800, kernel test robot wrote:
>
> FYI, we noticed a -16.1% regression of fio.read_bw_MBps due to commit:
>
> commit: 2b0f904a5a8781498417d67226fd12c5e56053ae ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE")
>
> [...]
>
> 1822257 +23.7% 2254706 fio.time.maximum_resident_set_size
> 5089 +1.6% 5172 fio.time.system_time
> 514.50 -16.3% 430.48 fio.time.user_time

System time increased and user time decreased. There is a clue below.

> [...]
>
> 4.18 ± 4% +2.5 6.70 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.pagevec_lru_move_fn.__lru_cache_add.add_to_page_cache_lru
> 18.64 ± 3% +2.5 21.16 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.shrink_inactive_list.shrink_node_memcg.shrink_node
> 17.21 ± 3% +2.7 19.93 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.shrink_inactive_list.shrink_node_memcg.shrink_node.do_try_to_free_pages

It looks like there is more LRU lock contention. It would be caused by
this patch managing the CMA memory with ZONE_MOVABLE. In that case,
reclaim for a normal-zone allocation has to skip LRU pages that sit in
the movable zone, so it needs more time to find enough reclaim target
pages. That would increase LRU lock hold time and then cause the contention.
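
To illustrate the mechanism, here is a small userspace toy model (an
illustration only, not kernel code; the LRU contents, zone layout and
reclaim_idx values below are made up for the example). It mimics the
zone-index check reclaim applies when isolating LRU pages: pages that sit
above the highest zone usable by the allocation are skipped, and those
skips are what the pgskip_* counters in /proc/vmstat record.

/* Toy model of zone-limited LRU isolation -- illustration only. */
#include <stdio.h>

enum zone { DMA, DMA32, NORMAL, MOVABLE, NR_ZONES };
static const char *zone_name[NR_ZONES] = { "dma", "dma32", "normal", "movable" };

int main(void)
{
    /* Hypothetical LRU tail: the zone each page lives in, oldest first. */
    int lru[] = { NORMAL, MOVABLE, MOVABLE, NORMAL, MOVABLE, DMA32, MOVABLE, NORMAL };
    int nr_lru = sizeof(lru) / sizeof(lru[0]);
    int reclaim_idx = NORMAL;       /* highest zone the allocation may use */
    int nr_to_isolate = 3;
    long pgskip[NR_ZONES] = { 0 };
    int scanned = 0, isolated = 0;

    for (int i = 0; i < nr_lru && isolated < nr_to_isolate; i++) {
        scanned++;
        if (lru[i] > reclaim_idx) {     /* page is in a zone we cannot use */
            pgskip[lru[i]]++;           /* shows up here as pgskip_movable */
            continue;
        }
        isolated++;
    }

    printf("scanned %d, isolated %d\n", scanned, isolated);
    for (int z = 0; z < NR_ZONES; z++)
        if (pgskip[z])
            printf("pgskip_%s %ld\n", zone_name[z], pgskip[z]);
    return 0;
}

With a large share of the LRU sitting in the movable zone, the reclaimer
scans (and holds the LRU lock) longer for the same number of isolated pages.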

Could you give me another stat 'pgskip_XXX' in /proc/vmstat to confirm
my theory?

BTW, other stats for memory management look better than before:
less nr_free_cma, fewer pgmajfault, less reclaim scan/steal.

Thanks.

2018-01-06 09:27:25

by kernel test robot

Subject: Re: [lkp-robot] [mm/cma] 2b0f904a5a: fio.read_bw_MBps -16.1% regression

Hi,

On 01/03, Joonsoo Kim wrote:
>Hello!
>
>On Tue, Jan 02, 2018 at 02:35:28PM +0800, kernel test robot wrote:
>>
>> FYI, we noticed a -16.1% regression of fio.read_bw_MBps due to commit:
>>
>> commit: 2b0f904a5a8781498417d67226fd12c5e56053ae ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE")
>>
>> [...]
>
>It looks like there is more LRU lock contention. It would be caused by
>this patch managing the CMA memory with ZONE_MOVABLE. In that case,
>reclaim for a normal-zone allocation has to skip LRU pages that sit in
>the movable zone, so it needs more time to find enough reclaim target
>pages. That would increase LRU lock hold time and then cause the contention.
>
>Could you give me another stat 'pgskip_XXX' in /proc/vmstat to confirm
>my theory?

Attached are the /proc/vmstat sample files captured during the test; the sample interval is 1s.
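
In case it helps, here is a minimal sketch (a helper written for this
report, not part of lkp-tests) that prints the pgskip_* counters from one
of these snapshots once it is decompressed, or from a live /proc/vmstat:

/* Print the pgskip_* counters from a vmstat snapshot.
 * Usage: ./pgskip [path-to-vmstat-file]   (defaults to /proc/vmstat)
 */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/proc/vmstat";
    FILE *f = fopen(path, "r");
    char name[128];
    unsigned long long val;

    if (!f) {
        perror(path);
        return 1;
    }
    /* /proc/vmstat is one "name value" pair per line. */
    while (fscanf(f, "%127s %llu", name, &val) == 2) {
        if (strncmp(name, "pgskip_", 7) == 0)
            printf("%s %llu\n", name, val);
    }
    fclose(f);
    return 0;
}

Comparing the counters between two consecutive samples gives the per-second
skip rate for each zone.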

Thanks,
Xiaolong
>
>BTW, other stats for memory management look better than before:
>less nr_free_cma, fewer pgmajfault, less reclaim scan/steal.
>
>Thanks.


Attachments:
proc-vmstat-f6572f9cd24.gz (53.09 kB)
proc-vmstat-2b0f904a5.gz (54.01 kB)

2018-01-09 07:15:17

by Joonsoo Kim

Subject: Re: [lkp-robot] [mm/cma] 2b0f904a5a: fio.read_bw_MBps -16.1% regression

On Sat, Jan 06, 2018 at 05:26:31PM +0800, Ye Xiaolong wrote:
> Hi,
>
> On 01/03, Joonsoo Kim wrote:
> >On Tue, Jan 02, 2018 at 02:35:28PM +0800, kernel test robot wrote:
> >>
> >> FYI, we noticed a -16.1% regression of fio.read_bw_MBps due to commit:
> >>
> >> [...]
> >> 11.64 ? 3% -1.4 10.25 ? 2% perf-profile.calltrace.cycles-pp.generic_make_request.submit_bio.ext4_mpage_readpages.filemap_fault.ext4_filemap_fault
> >> 11.10 ? 3% -1.3 9.82 ? 2% perf-profile.calltrace.cycles-pp.pmem_make_request.generic_make_request.submit_bio.ext4_mpage_readpages.filemap_fault
> >> 9.21 ? 3% -1.2 8.04 ? 3% perf-profile.calltrace.cycles-pp.pmem_do_bvec.pmem_make_request.generic_make_request.submit_bio.ext4_mpage_readpages
> >> 27.33 ? 4% -1.0 26.35 ? 5% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
> >> 27.33 ? 4% -1.0 26.35 ? 5% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> >> 27.33 ? 4% -1.0 26.35 ? 5% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
> >> 27.33 ? 4% -1.0 26.35 ? 5% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
> >> 26.79 ? 4% -0.8 25.98 ? 5% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
> >> 27.98 ? 3% -0.8 27.22 ? 4% perf-profile.calltrace.cycles-pp.secondary_startup_64
> >> 5.36 ? 12% -0.6 4.76 ? 7% perf-profile.calltrace.cycles-pp.kswapd.kthread.ret_from_fork
> >> 5.36 ? 12% -0.6 4.76 ? 7% perf-profile.calltrace.cycles-pp.shrink_node.kswapd.kthread.ret_from_fork
> >> 5.30 ? 12% -0.6 4.71 ? 7% perf-profile.calltrace.cycles-pp.shrink_inactive_list.shrink_node_memcg.shrink_node.kswapd.kthread
> >> 5.35 ? 12% -0.6 4.76 ? 7% perf-profile.calltrace.cycles-pp.shrink_node_memcg.shrink_node.kswapd.kthread.ret_from_fork
> >> 5.43 ? 12% -0.5 4.88 ? 7% perf-profile.calltrace.cycles-pp.ret_from_fork
> >> 5.43 ? 12% -0.5 4.88 ? 7% perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
> >> 11.04 ? 2% -0.2 10.82 ? 2% perf-profile.calltrace.cycles-pp.shrink_page_list.shrink_inactive_list.shrink_node_memcg.shrink_node.do_try_to_free_pages
> >> 62.44 ? 2% +1.9 64.38 perf-profile.calltrace.cycles-pp.page_fault
> >> 62.38 ? 2% +2.0 64.33 perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault
> >> 62.38 ? 2% +2.0 64.34 perf-profile.calltrace.cycles-pp.do_page_fault.page_fault
> >> 61.52 ? 2% +2.1 63.58 perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
> >> 61.34 ? 2% +2.1 63.44 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
> >> 30.18 ? 3% +2.3 32.45 ? 2% perf-profile.calltrace.cycles-pp.shrink_inactive_list.shrink_node_memcg.shrink_node.do_try_to_free_pages.try_to_free_pages
> >> 7.98 ? 3% +2.3 10.33 ? 2% perf-profile.calltrace.cycles-pp.add_to_page_cache_lru.filemap_fault.ext4_filemap_fault.__do_fault.__handle_mm_fault
> >> 30.48 ? 3% +2.4 32.83 ? 2% perf-profile.calltrace.cycles-pp.try_to_free_pages.__alloc_pages_slowpath.__alloc_pages_nodemask.filemap_fault.ext4_filemap_fault
> >> 30.46 ? 3% +2.4 32.81 ? 2% perf-profile.calltrace.cycles-pp.do_try_to_free_pages.try_to_free_pages.__alloc_pages_slowpath.__alloc_pages_nodemask.filemap_fault
> >> 30.46 ? 3% +2.4 32.81 ? 2% perf-profile.calltrace.cycles-pp.shrink_node.do_try_to_free_pages.try_to_free_pages.__alloc_pages_slowpath.__alloc_pages_nodemask
> >> 30.37 ? 3% +2.4 32.75 ? 2% perf-profile.calltrace.cycles-pp.shrink_node_memcg.shrink_node.do_try_to_free_pages.try_to_free_pages.__alloc_pages_slowpath
> >> 5.58 ? 4% +2.5 8.08 ? 2% perf-profile.calltrace.cycles-pp.__lru_cache_add.add_to_page_cache_lru.filemap_fault.ext4_filemap_fault.__do_fault
> >> 32.88 ? 3% +2.5 35.38 ? 2% perf-profile.calltrace.cycles-pp.__alloc_pages_nodemask.filemap_fault.ext4_filemap_fault.__do_fault.__handle_mm_fault
> >> 5.51 ? 4% +2.5 8.02 ? 2% perf-profile.calltrace.cycles-pp.pagevec_lru_move_fn.__lru_cache_add.add_to_page_cache_lru.filemap_fault.ext4_filemap_fault
> >> 4.24 ? 4% +2.5 6.76 ? 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.pagevec_lru_move_fn.__lru_cache_add.add_to_page_cache_lru.filemap_fault
> >> 4.18 ? 4% +2.5 6.70 ? 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.pagevec_lru_move_fn.__lru_cache_add.add_to_page_cache_lru
> >> 18.64 ? 3% +2.5 21.16 ? 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.shrink_inactive_list.shrink_node_memcg.shrink_node
> >> 31.65 ? 3% +2.7 34.31 ? 2% perf-profile.calltrace.cycles-pp.__alloc_pages_slowpath.__alloc_pages_nodemask.filemap_fault.ext4_filemap_fault.__do_fault
> >> 17.21 ? 3% +2.7 19.93 ? 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.shrink_inactive_list.shrink_node_memcg.shrink_node.do_try_to_free_pages
> >
> >It looks like there is more lru lock contention. It could be caused by
> >this patch managing the CMA memory through ZONE_MOVABLE: in that case,
> >reclaim for normal memory has to skip the lru pages in the movable zone,
> >so it needs more time to find enough reclaim target pages. That would
> >increase the lru lock holding time and then cause the contention.
> >
> >Could you give me another stat, 'pgskip_XXX' in /proc/vmstat, to confirm
> >my theory?
>
> Attached is the /proc/vmstat sample file collected during the test; the sample interval is 1s.

Thanks!

pgskip_XXX is low, so my theory seems to be wrong. Another theory is
that the increase in numa misses is the reason for the regression. Could
you run the same test on a system without NUMA? I cannot test it
myself since I don't have pmem.

Thanks.
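
To make the quoted lru-contention theory above easier to picture, here is a
small, self-contained toy model in plain C. It is not kernel code; all names
(page_stub, isolate_batch), the 50% movable split and the batch size of 32 are
made up for illustration. It only models the behaviour the analysis describes:
reclaim on behalf of a non-movable allocation must walk past every LRU page
that sits in the movable zone, so each isolation batch inspects more entries,
which means more time spent under the lru lock and higher pgskip_* counts in
/proc/vmstat.

/* Toy model (not kernel code) of the skip behaviour described above. */
#include <stdio.h>
#include <stdlib.h>

enum zone_idx { ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE };

struct page_stub {
    enum zone_idx zone;
};

/*
 * Isolate up to 'batch' pages whose zone index is <= reclaim_idx.
 * 'scanned' counts how many LRU entries were inspected (a proxy for time
 * spent holding the lru lock); 'skipped' counts pages that were passed
 * over, roughly what the pgskip_* counters would record.
 */
static void isolate_batch(const struct page_stub *lru, size_t lru_len,
                          int reclaim_idx, size_t batch,
                          size_t *scanned, size_t *skipped)
{
    size_t isolated = 0;

    *scanned = *skipped = 0;
    for (size_t i = 0; i < lru_len && isolated < batch; i++) {
        (*scanned)++;
        if ((int)lru[i].zone > reclaim_idx) {
            (*skipped)++;
            continue;
        }
        isolated++;
    }
}

int main(void)
{
    const size_t lru_len = 1000000, batch = 32;
    struct page_stub *lru = malloc(lru_len * sizeof(*lru));
    size_t scanned, skipped;

    if (!lru)
        return 1;

    /* Assume half of the file LRU sits in the movable (CMA-backed) zone. */
    for (size_t i = 0; i < lru_len; i++)
        lru[i].zone = (i % 2) ? ZONE_MOVABLE : ZONE_NORMAL;

    /* Reclaim for a normal allocation: movable pages must be skipped. */
    isolate_batch(lru, lru_len, ZONE_NORMAL, batch, &scanned, &skipped);
    printf("reclaim up to NORMAL : scanned %zu, skipped %zu\n", scanned, skipped);

    /* Reclaim that may also take movable pages: nothing to skip. */
    isolate_batch(lru, lru_len, ZONE_MOVABLE, batch, &scanned, &skipped);
    printf("reclaim up to MOVABLE: scanned %zu, skipped %zu\n", scanned, skipped);

    free(lru);
    return 0;
}

With half of the LRU in the movable zone, the first call inspects about twice
as many entries per 32-page batch as the second, and the gap grows with the
movable share; that extra walking is the kind of added lru-lock hold time the
quoted analysis suspects. Conversely, if this skip path were hot, the pgskip_*
counters would be high, which is why the low pgskip_XXX values mentioned above
argue against the theory.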

2018-04-05 07:49:59

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [lkp-robot] [mm/cma] 2b0f904a5a: fio.read_bw_MBps -16.1% regression

Hello,
sorry for bothering you.

2018-01-09 16:16 GMT+09:00 Joonsoo Kim <[email protected]>:
> On Sat, Jan 06, 2018 at 05:26:31PM +0800, Ye Xiaolong wrote:
>>
>> Attached is the /proc/vmstat sample file collected during the test; the sample interval is 1s.
>
> Thanks!
>
> pgskip_XXX is low, so my theory seems to be wrong. Another theory is
> that the increase in numa misses is the reason for the regression. Could
> you run the same test on a system without NUMA? I cannot test it
> myself since I don't have pmem.

I may have found the reason for this regression. Could you test this patch
on top of this patchset?

http://lkml.kernel.org/r/<[email protected]>

Thanks.