Greetings,
FYI, we noticed a +26.5% improvement in vm-scalability.throughput due to commit:
commit: 4405c5fd8434809972dd2996c4dbfe5124b01d55 ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
in testcase: vm-scalability
on test machine: 72 threads Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz with 128G memory
with the following parameters:
runtime: 300
thp_enabled: never
thp_defrag: always
nr_task: 8
nr_pmem: 4
priority: 1
test: swap-w-seq-mt
cpufreq_governor: performance
test-description: The motivation behind this suite is to exercise functions and regions of mm/ in the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
Details are as follows:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_pmem/nr_task/priority/rootfs/runtime/tbox_group/test/testcase/thp_defrag/thp_enabled:
gcc-7/performance/x86_64-rhel-7.2/4/8/1/debian-x86_64-2016-08-31.cgz/300/lkp-hsw-ep2/swap-w-seq-mt/vm-scalability/always/never
commit:
41fd9c44d9 ("mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request")
4405c5fd84 ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE")
41fd9c44d94101ec 4405c5fd8434809972dd2996c4
---------------- --------------------------
%stddev %change %stddev
\ | \
3278698 +26.5% 4148766 ± 2% vm-scalability.throughput
0.76 ± 2% -7.9% 0.70 ± 2% vm-scalability.free_time
414083 +28.8% 533402 ± 2% vm-scalability.median
37.39 -11.2% 33.19 vm-scalability.time.elapsed_time
37.39 -11.2% 33.19 vm-scalability.time.elapsed_time.max
646.75 -10.1% 581.25 vm-scalability.time.percent_of_cpu_this_job_got
182.59 -24.6% 137.76 ± 2% vm-scalability.time.system_time
59.42 -6.8% 55.40 vm-scalability.time.user_time
8942 ± 9% +105.6% 18384 ± 6% vm-scalability.time.voluntary_context_switches
2089321 ± 5% -26.6% 1532791 ± 3% cpuidle.C1.time
62485028 -32.1% 42426989 ± 4% interrupts.CAL:Function_call_interrupts
9.87 -2.0 7.89 mpstat.cpu.sys%
859.75 ± 14% +54.6% 1329 ± 34% slabinfo.dmaengine-unmap-16.active_objs
859.75 ± 14% +54.6% 1329 ± 34% slabinfo.dmaengine-unmap-16.num_objs
2691121 ± 9% -31.9% 1832052 ± 8% softirqs.RCU
565376 -10.6% 505518 ± 2% softirqs.TIMER
1798289 ± 20% +219.4% 5744223 ± 6% numa-numastat.node0.numa_foreign
11379510 ± 6% -30.9% 7864229 ± 5% numa-numastat.node1.local_node
11380678 ± 6% -30.9% 7867707 ± 5% numa-numastat.node1.numa_hit
1798289 ± 20% +219.4% 5744223 ± 6% numa-numastat.node1.numa_miss
1799458 ± 20% +219.4% 5747704 ± 6% numa-numastat.node1.other_node
7.148e+08 ± 4% -35.5% 4.607e+08 ± 6% perf-node.node-load-misses
2.806e+08 ± 2% -25.9% 2.079e+08 ± 7% perf-node.node-loads
27.75 ± 5% +9.9% 30.50 ± 3% perf-node.node-local-load-ratio
2.166e+08 ± 3% -27.1% 1.579e+08 ± 8% perf-node.node-store-misses
2.734e+08 ± 9% -43.9% 1.534e+08 ± 10% perf-node.node-stores
447.75 ± 2% -10.8% 399.25 ± 2% turbostat.Avg_MHz
15.16 ± 2% -1.8 13.36 turbostat.Busy%
0.07 ± 5% -0.0 0.06 ± 7% turbostat.C1%
1.875e+08 -33.8% 1.242e+08 ± 4% turbostat.IRQ
152.01 -2.7% 147.85 turbostat.PkgWatt
14.18 -2.3% 13.85 turbostat.RAMWatt
586.75 ± 7% +13.5% 666.25 ± 3% vmstat.memory.buff
5757201 ± 3% +22.8% 7068800 ± 4% vmstat.memory.free
27365780 -39.1% 16652327 ± 2% vmstat.memory.swpd
9.00 -22.2% 7.00 vmstat.procs.r
829.00 ± 13% -38.6% 509.00 ± 5% vmstat.swap.si
1658990 -26.7% 1215747 vmstat.swap.so
1636758 -23.0% 1260916 vmstat.system.in
47862 ± 6% -65.1% 16689 ± 6% meminfo.CmaFree
83417008 -9.9% 75152314 meminfo.Committed_AS
1850235 -19.2% 1494102 ± 4% meminfo.Inactive
1849617 -19.3% 1493422 ± 4% meminfo.Inactive(anon)
5362375 ± 3% +22.4% 6564299 meminfo.MemAvailable
5534597 ± 2% +22.3% 6766142 meminfo.MemFree
103923 -21.3% 81789 meminfo.PageTables
73933617 +13.6% 83992710 meminfo.SwapFree
11799653 -8.0% 10850808 ± 3% numa-meminfo.node0.AnonPages
937028 -19.4% 755072 ± 6% numa-meminfo.node0.Inactive
936810 -19.4% 754801 ± 6% numa-meminfo.node0.Inactive(anon)
2741438 ± 8% +35.6% 3716602 ± 9% numa-meminfo.node0.MemFree
59023 -51.4% 28701 ± 9% numa-meminfo.node0.PageTables
921828 ± 2% -20.4% 733395 ± 5% numa-meminfo.node1.Inactive
921425 ± 2% -20.5% 732971 ± 5% numa-meminfo.node1.Inactive(anon)
45693 +17.0% 53445 ± 5% numa-meminfo.node1.PageTables
8923 ± 4% +66.6% 14863 ± 25% sched_debug.cfs_rq:/.min_vruntime.avg
16889 ± 10% +36.7% 23084 ± 14% sched_debug.cfs_rq:/.min_vruntime.max
2110 ± 9% +22.4% 2584 ± 11% sched_debug.cfs_rq:/.min_vruntime.stddev
2287 ± 30% +205.4% 6985 ± 10% sched_debug.cfs_rq:/.spread0.avg
10329 ± 13% +47.8% 15262 ± 5% sched_debug.cfs_rq:/.spread0.max
2119 ± 8% +21.2% 2569 ± 10% sched_debug.cfs_rq:/.spread0.stddev
1223 ± 17% +25.1% 1531 ± 21% sched_debug.cfs_rq:/.util_avg.max
3392 ± 36% +534.3% 21519 ± 58% sched_debug.cpu.avg_idle.min
2929484 ± 2% -8.5% 2679073 ± 3% numa-vmstat.node0.nr_anon_pages
705776 ± 9% +36.5% 963258 ± 6% numa-vmstat.node0.nr_free_pages
232015 -20.0% 185568 ± 7% numa-vmstat.node0.nr_inactive_anon
11856 ± 6% +23.8% 14680 ± 9% numa-vmstat.node0.nr_indirectly_reclaimable
14587 -51.7% 7041 ± 7% numa-vmstat.node0.nr_page_table_pages
3831258 ± 3% -25.8% 2842415 ± 11% numa-vmstat.node0.nr_vmscan_write
3831285 ± 3% -25.8% 2842319 ± 11% numa-vmstat.node0.nr_written
232105 -20.0% 185668 ± 7% numa-vmstat.node0.nr_zone_inactive_anon
942274 ± 14% +268.2% 3469716 ± 7% numa-vmstat.node0.numa_foreign
11986 ± 5% -64.6% 4238 ± 8% numa-vmstat.node1.nr_free_cma
228092 ± 2% -21.5% 179096 ± 6% numa-vmstat.node1.nr_inactive_anon
11051 ± 6% -25.6% 8224 ± 17% numa-vmstat.node1.nr_indirectly_reclaimable
2930 ± 9% -11.1% 2604 ± 10% numa-vmstat.node1.nr_mapped
11307 ± 2% +17.1% 13236 ± 6% numa-vmstat.node1.nr_page_table_pages
4507408 ± 3% -19.7% 3619831 ± 10% numa-vmstat.node1.nr_vmscan_write
4507421 ± 3% -19.7% 3619853 ± 10% numa-vmstat.node1.nr_written
228157 ± 2% -21.5% 179161 ± 6% numa-vmstat.node1.nr_zone_inactive_anon
7544972 ± 3% -36.3% 4806885 ± 9% numa-vmstat.node1.numa_hit
7381155 ± 3% -37.1% 4641005 ± 9% numa-vmstat.node1.numa_local
943340 ± 14% +268.0% 3471701 ± 7% numa-vmstat.node1.numa_miss
1107163 ± 12% +228.6% 3637587 ± 7% numa-vmstat.node1.numa_other
0.77 ± 2% -0.2 0.58 ± 4% perf-stat.branch-miss-rate%
1.461e+09 -23.3% 1.121e+09 ± 6% perf-stat.branch-misses
25.39 -1.6 23.82 ± 2% perf-stat.cache-miss-rate%
1.492e+09 ± 2% -26.0% 1.104e+09 ± 7% perf-stat.cache-misses
5.877e+09 -21.2% 4.634e+09 ± 6% perf-stat.cache-references
1.63 -10.3% 1.47 perf-stat.cpi
1.223e+12 ± 2% -12.0% 1.075e+12 ± 7% perf-stat.cpu-cycles
1.01 ± 2% -0.2 0.85 ± 6% perf-stat.dTLB-load-miss-rate%
1.873e+09 ± 10% -24.1% 1.423e+09 ± 6% perf-stat.dTLB-load-misses
1.038e+11 ± 2% -13.8% 8.95e+10 ± 7% perf-stat.dTLB-stores
5.226e+08 ± 5% -21.4% 4.108e+08 perf-stat.iTLB-load-misses
90580077 ± 5% -13.8% 78079215 ± 4% perf-stat.iTLB-loads
1434 ± 5% +24.6% 1787 ± 8% perf-stat.instructions-per-iTLB-miss
0.61 +11.5% 0.68 perf-stat.ipc
5290 ± 3% -17.7% 4356 ± 5% perf-stat.major-faults
71.12 -2.3 68.81 perf-stat.node-load-miss-rate%
7.031e+08 ± 4% -33.4% 4.681e+08 ± 6% perf-stat.node-load-misses
2.853e+08 ± 3% -25.6% 2.122e+08 ± 7% perf-stat.node-loads
45.16 ± 6% +4.0 49.16 ± 3% perf-stat.node-store-miss-rate%
2.153e+08 ± 3% -29.0% 1.529e+08 perf-stat.node-store-misses
2.634e+08 ± 11% -39.9% 1.584e+08 ± 6% perf-stat.node-stores
119681 ± 3% -27.8% 86457 ± 5% proc-vmstat.allocstall_movable
40547 ± 14% -53.6% 18814 ± 31% proc-vmstat.allocstall_normal
629.25 ± 93% -99.7% 2.00 ± 61% proc-vmstat.compact_stall
18.75 ± 31% +212.0% 58.50 ± 89% proc-vmstat.kswapd_high_wmark_hit_quickly
129618 ± 3% +26.4% 163840 ± 3% proc-vmstat.nr_dirty_background_threshold
259555 ± 3% +26.4% 328082 ± 3% proc-vmstat.nr_dirty_threshold
11625 ± 5% -64.0% 4188 ± 8% proc-vmstat.nr_free_cma
1344481 ± 3% +26.2% 1696756 ± 3% proc-vmstat.nr_free_pages
465487 -19.8% 373471 ± 3% proc-vmstat.nr_inactive_anon
189.25 ± 5% -24.7% 142.50 ± 9% proc-vmstat.nr_isolated_anon
26266 -22.0% 20495 proc-vmstat.nr_page_table_pages
8458281 ± 2% -23.7% 6456268 ± 4% proc-vmstat.nr_vmscan_write
16689767 -34.4% 10944327 ± 5% proc-vmstat.nr_written
465637 -19.8% 373643 ± 3% proc-vmstat.nr_zone_inactive_anon
4610783 ± 2% +72.2% 7938811 proc-vmstat.numa_foreign
20568554 -17.2% 17030926 proc-vmstat.numa_hit
20554622 -17.2% 17017000 proc-vmstat.numa_local
4610783 ± 2% +72.2% 7938811 proc-vmstat.numa_miss
4624715 ± 2% +72.0% 7952737 proc-vmstat.numa_other
11566637 ± 2% -14.0% 9944302 ± 5% proc-vmstat.numa_pte_updates
1439116 ± 2% -24.9% 1080676 ± 11% proc-vmstat.pgalloc_dma32
17284194 -33.1% 11570103 ± 5% proc-vmstat.pgdeactivate
9.00 ± 19% +15077.8% 1366 ± 77% proc-vmstat.pgmigrate_success
17280106 -32.9% 11600628 ± 5% proc-vmstat.pgrefill
12136735 ± 12% -38.3% 7491459 ± 14% proc-vmstat.pgscan_direct
6262493 -30.7% 4340098 ± 5% proc-vmstat.pgscan_kswapd
10431149 ± 2% -36.6% 6608259 ± 5% proc-vmstat.pgsteal_direct
6260464 -30.7% 4337402 ± 5% proc-vmstat.pgsteal_kswapd
6045 ± 17% -47.4% 3179 ± 23% proc-vmstat.pswpin
16693811 -34.4% 10948374 ± 5% proc-vmstat.pswpout
33044 ± 6% -20.3% 26344 ± 4% proc-vmstat.slabs_scanned
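As a sanity check, the %change column is simply the relative delta between the two commit means; taking the values from the vm-scalability.throughput row above, a minimal sketch:

```python
# Relative change between the parent commit's mean throughput (41fd9c44d9)
# and the patched commit's mean throughput (4405c5fd84), as listed in the
# comparison table above.
base = 3_278_698
patched = 4_148_766

change_pct = (patched - base) / base * 100
print(f"{change_pct:+.1f}%")  # prints +26.5%
```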
vm-scalability.throughput
4.4e+06 +-+---------------------------------------------------------------+
| O |
4.2e+06 +-O O O O O O O O O O O |
O O O O |
| |
4e+06 +-+ O |
| O O |
3.8e+06 +-+ |
| |
3.6e+06 +-+ |
| |
| .+ |
3.4e+06 +-+ .+ + .+.+.+.. .+.+.+. |
| + .+.+.+.+.+..+.+ +.+.+.+. .+. .+ + +.+. .+.|
3.2e+06 +-+---------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Xiaolong