2023-06-21 07:32:38

by Oliver Sang

Subject: [linus:master] [page cache] 9425c591e0: vm-scalability.throughput -20.0% regression



Hello,

kernel test robot noticed a -20.0% regression of vm-scalability.throughput on:


commit: 9425c591e06a9ab27a145ba655fb50532cf0bcc9 ("page cache: fix page_cache_next/prev_miss off by one")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: vm-scalability
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
parameters:

runtime: 300s
test: lru-file-readonce
cpufreq_governor: performance

test-description: The motivation behind this suite is to exercise functions and regions of the mm/ subsystem of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/

In addition to that, the commit also has significant impact on the following tests:

+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | vm-scalability: vm-scalability.throughput -18.9% regression |
| test machine | 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory |
| test parameters | cpufreq_governor=performance |
| | debug-setup=no-monitor |
| | runtime=300s |
| | test=lru-file-readonce |
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | vm-scalability: vm-scalability.throughput -52.8% regression |
| test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory |
| test parameters | cpufreq_governor=performance |
| | runtime=300s |
| | test=lru-file-readonce |
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | vm-scalability: vm-scalability.throughput -54.0% regression |
| test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory |
| test parameters | cpufreq_governor=performance |
| | debug-setup=no-monitor |
| | runtime=300s |
| | test=lru-file-readonce |
+------------------+----------------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/lkp-csl-2sp3/lru-file-readonce/vm-scalability

commit:
26a6ffff7d ("ocfs2: check new file size on fallocate call")
9425c591e0 ("page cache: fix page_cache_next/prev_miss off by one")

26a6ffff7de5dd36 9425c591e06a9ab27a145ba655f
---------------- ---------------------------
%stddev %change %stddev
\ | \
422457 ± 3% -20.4% 336249 vm-scalability.median
135.68 ± 22% +83.7 219.36 ± 7% vm-scalability.stddev%
40693254 ± 3% -20.0% 32554285 vm-scalability.throughput
115.74 ± 2% +27.3% 147.37 vm-scalability.time.elapsed_time
115.74 ± 2% +27.3% 147.37 vm-scalability.time.elapsed_time.max
203761 ± 3% +49.1% 303742 vm-scalability.time.involuntary_context_switches
8146 ± 3% +36.3% 11100 vm-scalability.time.system_time
1547 -23.2% 1188 vm-scalability.time.user_time
196.73 ? 5% +23.0% 242.00 ? 11% uptime.boot
1.13e+09 ? 4% +40.7% 1.591e+09 cpuidle..time
1336649 ? 4% +35.9% 1816370 cpuidle..usage
53141 ? 4% -10.1% 47762 ? 3% meminfo.Mapped
270645 +30.9% 354248 meminfo.SUnreclaim
149654 ? 2% +23.4% 184628 ? 5% numa-meminfo.node0.SUnreclaim
120878 ? 3% +40.3% 169550 ? 5% numa-meminfo.node1.SUnreclaim
13.33 ? 5% -40.0% 8.00 vmstat.cpu.us
4837 +10.5% 5345 vmstat.system.cs
9.78 ? 7% +1.0 10.80 ? 2% mpstat.cpu.all.idle%
3.03 ? 2% -0.9 2.11 mpstat.cpu.all.irq%
0.24 ? 3% -0.1 0.17 ? 2% mpstat.cpu.all.soft%
13.99 ? 3% -5.5 8.51 mpstat.cpu.all.usr%
18559382 +249.3% 64830393 numa-numastat.node0.local_node
1292876 ? 9% +152.9% 3269468 ? 11% numa-numastat.node0.numa_foreign
18595000 +248.9% 64886633 numa-numastat.node0.numa_hit
18689544 ? 5% +254.9% 66320046 numa-numastat.node1.local_node
18750875 ? 5% +254.0% 66371105 numa-numastat.node1.numa_hit
1292878 ? 9% +152.9% 3269321 ? 11% numa-numastat.node1.numa_miss
1355857 ? 7% +145.6% 3330095 ? 11% numa-numastat.node1.other_node
0.03 -0.0 0.02 turbostat.C1%
1100252 ? 5% +43.0% 1573785 ? 2% turbostat.C6
9.76 ? 6% +1.2 10.93 turbostat.C6%
8.07 ? 5% +12.2% 9.06 turbostat.CPU%c1
11528911 ? 2% +26.1% 14539834 turbostat.IRQ
10642 ? 2% +22.2% 13000 ? 9% turbostat.POLL
1.02 ? 6% +33.3% 1.36 ? 2% turbostat.Pkg%pc2
0.31 ? 13% -34.8% 0.20 ? 8% turbostat.Pkg%pc6
293.93 -2.0% 288.02 turbostat.PkgWatt
70.63 -8.6% 64.57 turbostat.RAMWatt
102.00 ? 18% +313.6% 421.83 ? 3% numa-vmstat.node0.nr_isolated_file
37415 ? 2% +23.4% 46172 ? 5% numa-vmstat.node0.nr_slab_unreclaimable
1292876 ? 9% +152.9% 3269468 ? 11% numa-vmstat.node0.numa_foreign
18594720 +249.0% 64887091 numa-vmstat.node0.numa_hit
18559102 +249.3% 64830851 numa-vmstat.node0.numa_local
92.83 ? 13% +399.5% 463.67 ? 6% numa-vmstat.node1.nr_isolated_file
30223 ? 3% +40.3% 42401 ? 5% numa-vmstat.node1.nr_slab_unreclaimable
18750723 ? 5% +254.0% 66371233 numa-vmstat.node1.numa_hit
18689392 ? 5% +254.9% 66320174 numa-vmstat.node1.numa_local
1292878 ? 9% +152.9% 3269321 ? 11% numa-vmstat.node1.numa_miss
1355857 ? 7% +145.6% 3330095 ? 11% numa-vmstat.node1.numa_other
133623 ? 4% -7.5% 123558 proc-vmstat.allocstall_movable
45240952 ?131% -92.1% 3590941 ? 21% proc-vmstat.compact_daemon_free_scanned
11993 ?131% -99.9% 16.67 ? 30% proc-vmstat.compact_fail
60587760 ?148% -94.1% 3602108 ? 21% proc-vmstat.compact_free_scanned
758488 ? 11% -37.7% 472561 ? 10% proc-vmstat.compact_isolated
57772 ?130% -99.5% 296.17 ? 6% proc-vmstat.compact_stall
45778 ?130% -99.4% 279.50 ? 6% proc-vmstat.compact_success
4855 ? 30% -43.1% 2763 proc-vmstat.kswapd_low_wmark_hit_quickly
103465 -2.2% 101219 proc-vmstat.nr_anon_pages
123102 -2.6% 119920 proc-vmstat.nr_inactive_anon
198.00 ? 12% +353.3% 897.50 ? 3% proc-vmstat.nr_isolated_file
13590 ? 3% -12.4% 11904 ? 3% proc-vmstat.nr_mapped
2781 -2.4% 2714 proc-vmstat.nr_page_table_pages
21528 -4.7% 20515 proc-vmstat.nr_shmem
67668 +30.8% 88515 proc-vmstat.nr_slab_unreclaimable
123102 -2.6% 119921 proc-vmstat.nr_zone_inactive_anon
2891364 ? 17% +90.8% 5518043 proc-vmstat.numa_foreign
37348072 ? 3% +251.5% 1.313e+08 proc-vmstat.numa_hit
37251112 ? 3% +252.1% 1.312e+08 proc-vmstat.numa_local
2890914 ? 17% +90.9% 5517733 proc-vmstat.numa_miss
2989871 ? 17% +87.9% 5618425 proc-vmstat.numa_other
4936 ? 30% -40.1% 2954 proc-vmstat.pageoutrun
44918 ? 2% +6.2% 47709 proc-vmstat.pgactivate
12277725 ? 4% -5.3% 11628469 ? 2% proc-vmstat.pgalloc_dma32
536545 +10.8% 594324 proc-vmstat.pgfault
365141 ? 9% -35.0% 237231 ? 10% proc-vmstat.pgmigrate_success
22058 ? 2% +15.7% 25524 ? 9% proc-vmstat.pgreuse
9.82e+08 +3.6% 1.017e+09 proc-vmstat.pgscan_direct
60968872 ? 6% -58.2% 25506033 ? 2% proc-vmstat.pgscan_kswapd
9.82e+08 +3.6% 1.017e+09 proc-vmstat.pgsteal_direct
60968857 ? 6% -58.2% 25506018 ? 2% proc-vmstat.pgsteal_kswapd
17243606 -4.1% 16531969 proc-vmstat.slabs_scanned
1502208 +15.4% 1733504 proc-vmstat.unevictable_pgs_scanned
0.59 ? 2% +24.3% 0.73 sched_debug.cfs_rq:/.h_nr_running.avg
0.42 ? 44% +60.0% 0.67 sched_debug.cfs_rq:/.h_nr_running.min
0.26 ? 5% -24.7% 0.20 ? 14% sched_debug.cfs_rq:/.h_nr_running.stddev
4422 ? 44% +60.6% 7100 sched_debug.cfs_rq:/.load.min
4.33 ? 34% +47.4% 6.39 ? 6% sched_debug.cfs_rq:/.load_avg.min
2886162 +103.1% 5863037 sched_debug.cfs_rq:/.min_vruntime.avg
2941518 +102.2% 5947278 sched_debug.cfs_rq:/.min_vruntime.max
2194540 ? 5% +123.1% 4896328 ? 2% sched_debug.cfs_rq:/.min_vruntime.min
97897 ? 8% +47.4% 144269 ? 7% sched_debug.cfs_rq:/.min_vruntime.stddev
0.58 ? 2% +25.0% 0.72 sched_debug.cfs_rq:/.nr_running.avg
0.42 ? 44% +60.0% 0.67 sched_debug.cfs_rq:/.nr_running.min
0.22 ? 12% -35.3% 0.14 ? 15% sched_debug.cfs_rq:/.nr_running.stddev
27.98 ? 26% -39.5% 16.94 ? 30% sched_debug.cfs_rq:/.removed.load_avg.avg
512.00 -33.3% 341.33 sched_debug.cfs_rq:/.removed.load_avg.max
114.35 ? 11% -36.5% 72.61 ? 14% sched_debug.cfs_rq:/.removed.load_avg.stddev
12.07 ? 29% -42.9% 6.89 ? 32% sched_debug.cfs_rq:/.removed.runnable_avg.avg
262.17 -33.9% 173.17 sched_debug.cfs_rq:/.removed.runnable_avg.max
50.99 ? 14% -39.5% 30.84 ? 18% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
12.07 ? 29% -42.9% 6.89 ? 32% sched_debug.cfs_rq:/.removed.util_avg.avg
262.17 -33.9% 173.17 sched_debug.cfs_rq:/.removed.util_avg.max
50.98 ? 14% -39.5% 30.84 ? 18% sched_debug.cfs_rq:/.removed.util_avg.stddev
678.41 +17.0% 793.95 sched_debug.cfs_rq:/.runnable_avg.avg
226.28 ? 11% -24.8% 170.27 ? 4% sched_debug.cfs_rq:/.runnable_avg.stddev
97911 ? 8% +47.2% 144100 ? 7% sched_debug.cfs_rq:/.spread0.stddev
666.86 +17.4% 783.08 sched_debug.cfs_rq:/.util_avg.avg
208.29 ? 13% -27.0% 152.05 ? 10% sched_debug.cfs_rq:/.util_avg.stddev
116.27 ? 42% +382.5% 560.94 ? 8% sched_debug.cfs_rq:/.util_est_enqueued.avg
109444 ? 10% +40.2% 153489 ? 18% sched_debug.cpu.clock.avg
109467 ? 10% +40.2% 153509 ? 18% sched_debug.cpu.clock.max
109414 ? 10% +40.3% 153461 ? 18% sched_debug.cpu.clock.min
107663 ? 10% +39.9% 150668 ? 18% sched_debug.cpu.clock_task.avg
108360 ? 10% +40.1% 151833 ? 18% sched_debug.cpu.clock_task.max
75131 ? 3% +39.0% 104412 sched_debug.cpu.clock_task.min
2934 ? 2% +28.9% 3783 sched_debug.cpu.curr->pid.avg
5163 ? 5% +21.3% 6262 sched_debug.cpu.curr->pid.max
0.58 +24.6% 0.73 sched_debug.cpu.nr_running.avg
0.42 ? 44% +60.0% 0.67 sched_debug.cpu.nr_running.min
0.26 ? 6% -24.7% 0.20 ? 14% sched_debug.cpu.nr_running.stddev
3876 +49.9% 5808 sched_debug.cpu.nr_switches.avg
1271 ? 7% +129.7% 2921 ? 3% sched_debug.cpu.nr_switches.min
109415 ? 10% +40.3% 153461 ? 18% sched_debug.cpu_clk
108809 ? 10% +40.5% 152855 ? 18% sched_debug.ktime
92970 ? 2% +31.9% 122619 sched_debug.sched_clk
22.60 ? 2% -9.1% 20.54 perf-stat.i.MPKI
7.85e+09 +6.0% 8.324e+09 perf-stat.i.branch-instructions
0.74 ? 4% -0.0 0.69 perf-stat.i.branch-miss-rate%
71.50 -7.0 64.53 perf-stat.i.cache-miss-rate%
6.938e+08 ? 2% -16.1% 5.818e+08 perf-stat.i.cache-misses
9.381e+08 ? 2% -7.9% 8.638e+08 perf-stat.i.cache-references
4619 +12.3% 5185 ? 2% perf-stat.i.context-switches
6.23 -3.4% 6.02 perf-stat.i.cpi
2.673e+11 -1.1% 2.644e+11 perf-stat.i.cpu-cycles
390.32 ? 4% +15.9% 452.40 perf-stat.i.cycles-between-cache-misses
0.11 ? 14% +0.0 0.16 ? 15% perf-stat.i.dTLB-load-miss-rate%
9950044 ? 17% +49.2% 14844318 ? 16% perf-stat.i.dTLB-load-misses
9.204e+09 +3.9% 9.56e+09 perf-stat.i.dTLB-loads
0.01 ? 4% +0.0 0.02 ? 6% perf-stat.i.dTLB-store-miss-rate%
1156960 ? 4% +31.0% 1515234 ? 7% perf-stat.i.dTLB-store-misses
9.015e+09 ? 2% -14.5% 7.71e+09 perf-stat.i.dTLB-stores
24391123 ? 3% -28.8% 17371098 ? 3% perf-stat.i.iTLB-load-misses
354674 ? 18% -44.4% 197172 ? 7% perf-stat.i.iTLB-loads
1879 ? 2% +36.8% 2570 ? 2% perf-stat.i.instructions-per-iTLB-miss
0.19 ? 2% +6.7% 0.20 perf-stat.i.ipc
702.24 ? 9% -9.9% 632.55 perf-stat.i.metric.K/sec
283.58 -2.1% 277.69 perf-stat.i.metric.M/sec
3012 -9.3% 2732 ? 2% perf-stat.i.minor-faults
28.79 +5.3 34.09 perf-stat.i.node-load-miss-rate%
9266946 ? 4% +29.4% 11987881 perf-stat.i.node-load-misses
13.38 ? 18% -2.3 11.06 ? 2% perf-stat.i.node-store-miss-rate%
25405624 ? 16% -32.4% 17161954 perf-stat.i.node-store-misses
2.395e+08 ? 4% -11.4% 2.121e+08 perf-stat.i.node-stores
3012 -9.3% 2732 ? 2% perf-stat.i.page-faults
22.86 ? 2% -9.0% 20.79 perf-stat.overall.MPKI
0.68 ? 2% -0.1 0.63 perf-stat.overall.branch-miss-rate%
73.97 -6.6 67.35 perf-stat.overall.cache-miss-rate%
6.52 -2.3% 6.37 perf-stat.overall.cpi
385.75 ? 2% +17.8% 454.56 perf-stat.overall.cycles-between-cache-misses
0.11 ? 17% +0.0 0.16 ? 16% perf-stat.overall.dTLB-load-miss-rate%
0.01 ? 4% +0.0 0.02 ? 6% perf-stat.overall.dTLB-store-miss-rate%
1684 ? 3% +42.1% 2393 ? 3% perf-stat.overall.instructions-per-iTLB-miss
0.15 +2.4% 0.16 perf-stat.overall.ipc
26.97 ? 2% +5.6 32.59 perf-stat.overall.node-load-miss-rate%
9.66 ? 19% -2.2 7.49 perf-stat.overall.node-store-miss-rate%
1108 ? 2% +28.4% 1423 perf-stat.overall.path-length
7.806e+09 +6.1% 8.28e+09 perf-stat.ps.branch-instructions
6.9e+08 ? 2% -16.1% 5.788e+08 perf-stat.ps.cache-misses
9.328e+08 ? 2% -7.9% 8.593e+08 perf-stat.ps.cache-references
4583 +12.4% 5153 ? 2% perf-stat.ps.context-switches
9894510 ? 17% +49.3% 14768534 ? 16% perf-stat.ps.dTLB-load-misses
9.153e+09 +3.9% 9.51e+09 perf-stat.ps.dTLB-loads
1149979 ? 4% +31.2% 1508379 ? 7% perf-stat.ps.dTLB-store-misses
8.963e+09 ? 2% -14.4% 7.67e+09 perf-stat.ps.dTLB-stores
24251454 ? 3% -28.7% 17280158 ? 3% perf-stat.ps.iTLB-load-misses
348197 ? 18% -44.1% 194509 ? 7% perf-stat.ps.iTLB-loads
4.08e+10 +1.3% 4.133e+10 perf-stat.ps.instructions
2961 -8.5% 2710 ? 2% perf-stat.ps.minor-faults
9217823 ? 5% +29.3% 11923013 perf-stat.ps.node-load-misses
25335939 ? 16% -32.6% 17074928 perf-stat.ps.node-store-misses
2.381e+08 ? 4% -11.4% 2.11e+08 perf-stat.ps.node-stores
2961 -8.5% 2710 ? 2% perf-stat.ps.page-faults
4.762e+12 ? 2% +28.4% 6.116e+12 perf-stat.total.instructions



***************************************************************************************************
lkp-csl-2sp3: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
=========================================================================================
compiler/cpufreq_governor/debug-setup/kconfig/rootfs/runtime/tbox_group/test/testcase:
gcc-12/performance/no-monitor/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/lkp-csl-2sp3/lru-file-readonce/vm-scalability

commit:
26a6ffff7d ("ocfs2: check new file size on fallocate call")
9425c591e0 ("page cache: fix page_cache_next/prev_miss off by one")

26a6ffff7de5dd36 9425c591e06a9ab27a145ba655f
---------------- ---------------------------
%stddev %change %stddev
\ | \
415430 ± 5% -19.5% 334519 vm-scalability.median
148.13 ± 15% +145.2 293.32 ± 5% vm-scalability.stddev%
40011810 ± 5% -18.9% 32458841 vm-scalability.throughput
117.21 ± 5% +25.4% 146.96 vm-scalability.time.elapsed_time
117.21 ± 5% +25.4% 146.96 vm-scalability.time.elapsed_time.max
178088 ± 6% +45.8% 259602 vm-scalability.time.involuntary_context_switches
8568 ± 7% +34.4% 11518 ± 2% vm-scalability.time.system_time
1546 ± 3% -33.8% 1023 ± 38% vm-scalability.time.user_time



***************************************************************************************************
lkp-spr-r02: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/lkp-spr-r02/lru-file-readonce/vm-scalability

commit:
26a6ffff7d ("ocfs2: check new file size on fallocate call")
9425c591e0 ("page cache: fix page_cache_next/prev_miss off by one")

26a6ffff7de5dd36 9425c591e06a9ab27a145ba655f
---------------- ---------------------------
%stddev %change %stddev
\ | \
425365 ± 3% -53.6% 197251 vm-scalability.median
1421 ± 13% +321.7 1743 vm-scalability.stddev%
96416973 ± 2% -52.8% 45524776 vm-scalability.throughput
58.36 +100.9% 117.26 vm-scalability.time.elapsed_time
58.36 +100.9% 117.26 vm-scalability.time.elapsed_time.max
131550 ± 3% +50.6% 198176 vm-scalability.time.involuntary_context_switches
16880 +6.1% 17912 vm-scalability.time.percent_of_cpu_this_job_got
9726 ± 3% +114.8% 20889 vm-scalability.time.system_time
126.96 -9.1% 115.44 vm-scalability.time.user_time
3.374e+09 ? 4% +56.8% 5.291e+09 ? 2% cpuidle..time
3869481 ? 4% +49.9% 5800349 ? 2% cpuidle..usage
24.87 ? 5% -5.3 19.59 mpstat.cpu.all.idle%
1.23 -0.4 0.85 mpstat.cpu.all.irq%
0.08 ? 5% -0.0 0.04 ? 7% mpstat.cpu.all.soft%
1.01 ? 2% -0.5 0.52 mpstat.cpu.all.usr%
26.80 ? 5% -24.1% 20.33 ? 2% vmstat.cpu.id
40855850 ? 2% -14.4% 34965830 vmstat.memory.free
6716 ? 2% -28.1% 4831 vmstat.system.cs
229787 -1.3% 226798 vmstat.system.in
2029790 ? 4% +44.0% 2922054 ? 2% turbostat.C1E
11.28 ? 4% -2.3 8.98 turbostat.C1E%
1801693 ? 5% +56.8% 2825560 turbostat.C6
13.20 ? 6% -2.7 10.48 turbostat.C6%
24.68 ? 5% -20.4% 19.64 turbostat.CPU%c1
14035561 +94.2% 27251262 turbostat.IRQ
564.42 -2.6% 549.72 turbostat.PkgWatt
44.21 -18.7% 35.93 turbostat.RAMWatt
12197 ? 13% +636.3% 89801 ? 15% meminfo.Active
11912 ? 13% +651.5% 89520 ? 15% meminfo.Active(anon)
119130 ? 4% +26.4% 150621 ? 8% meminfo.AnonHugePages
956006 +18.2% 1130087 ? 3% meminfo.Committed_AS
505842 +20.0% 607188 ? 5% meminfo.Inactive(anon)
32714 +426.7% 172321 ? 25% meminfo.Mapped
37463995 ? 4% -11.1% 33318367 ? 2% meminfo.MemFree
561220 +29.6% 727398 meminfo.SUnreclaim
135378 +128.1% 308858 ? 10% meminfo.Shmem
17069924 +280.6% 64961189 numa-numastat.node0.local_node
1609314 ? 8% +82.7% 2940082 ? 21% numa-numastat.node0.numa_foreign
17184715 +279.0% 65135268 numa-numastat.node0.numa_hit
1436738 ? 9% +114.8% 3086362 ? 9% numa-numastat.node0.numa_miss
1550946 ? 9% +109.3% 3246496 ? 8% numa-numastat.node0.other_node
17324228 +280.1% 65851482 numa-numastat.node1.local_node
1436520 ? 9% +114.8% 3085490 ? 9% numa-numastat.node1.numa_foreign
17443777 +277.9% 65920959 numa-numastat.node1.numa_hit
1609014 ? 8% +82.8% 2941512 ? 21% numa-numastat.node1.numa_miss
1729021 ? 11% +74.4% 3014651 ? 22% numa-numastat.node1.other_node
42890 ? 39% +114.0% 91791 ? 23% numa-meminfo.node0.AnonHugePages
248738 ? 37% +72.7% 429557 ? 20% numa-meminfo.node0.AnonPages.max
19682 ? 73% +236.9% 66302 ? 38% numa-meminfo.node0.Mapped
19237897 ? 4% -11.7% 16983176 numa-meminfo.node0.MemFree
297954 ? 5% +24.1% 369749 ? 5% numa-meminfo.node0.SUnreclaim
6267 ? 27% +1225.7% 83084 ? 17% numa-meminfo.node1.Active
6190 ? 28% +1239.5% 82928 ? 17% numa-meminfo.node1.Active(anon)
13113 ?109% +717.9% 107259 ? 23% numa-meminfo.node1.Mapped
263077 ? 5% +36.0% 357708 ? 6% numa-meminfo.node1.SUnreclaim
53103 ?112% +456.9% 295751 ? 11% numa-meminfo.node1.Shmem
1844018 ? 12% +15.2% 2123903 ? 2% numa-meminfo.node1.Slab
4807355 ? 5% -11.5% 4256529 numa-vmstat.node0.nr_free_pages
257.40 ? 12% +451.7% 1420 ? 6% numa-vmstat.node0.nr_isolated_file
4919 ? 73% +237.4% 16596 ? 38% numa-vmstat.node0.nr_mapped
74517 ? 5% +24.1% 92457 ? 5% numa-vmstat.node0.nr_slab_unreclaimable
1609314 ? 8% +82.7% 2940082 ? 21% numa-vmstat.node0.numa_foreign
17184665 +279.0% 65135448 numa-vmstat.node0.numa_hit
17069875 +280.6% 64961368 numa-vmstat.node0.numa_local
1436738 ? 9% +114.8% 3086362 ? 9% numa-vmstat.node0.numa_miss
1550946 ? 9% +109.3% 3246496 ? 8% numa-vmstat.node0.numa_other
1913370 ? 3% -12.9% 1667008 ? 2% numa-vmstat.node0.workingset_nodereclaim
1546 ? 28% +1222.7% 20454 ? 17% numa-vmstat.node1.nr_active_anon
4572882 ? 2% -10.5% 4094430 ? 2% numa-vmstat.node1.nr_free_pages
266.00 ? 17% +448.2% 1458 ? 4% numa-vmstat.node1.nr_isolated_file
3278 ?109% +721.9% 26948 ? 24% numa-vmstat.node1.nr_mapped
13275 ?112% +456.8% 73920 ? 11% numa-vmstat.node1.nr_shmem
65780 ? 5% +36.0% 89471 ? 6% numa-vmstat.node1.nr_slab_unreclaimable
1546 ? 28% +1222.7% 20454 ? 17% numa-vmstat.node1.nr_zone_active_anon
1436520 ? 9% +114.8% 3085490 ? 9% numa-vmstat.node1.numa_foreign
17444041 +277.9% 65921742 numa-vmstat.node1.numa_hit
17324492 +280.1% 65852266 numa-vmstat.node1.numa_local
1609014 ? 8% +82.8% 2941512 ? 21% numa-vmstat.node1.numa_miss
1729021 ? 11% +74.4% 3014651 ? 22% numa-vmstat.node1.numa_other
2265802 ? 16% -22.8% 1748207 ? 4% numa-vmstat.node1.workingset_nodereclaim
0.06 ? 14% +847.2% 0.53 sched_debug.cfs_rq:/.h_nr_running.avg
0.23 ? 5% -33.2% 0.16 ? 13% sched_debug.cfs_rq:/.h_nr_running.stddev
59062 ? 28% +12339.3% 7346893 sched_debug.cfs_rq:/.min_vruntime.avg
84434 ? 24% +8715.0% 7442895 sched_debug.cfs_rq:/.min_vruntime.max
10717 ? 44% +57375.1% 6159708 ? 8% sched_debug.cfs_rq:/.min_vruntime.min
11486 ? 35% +1079.5% 135474 ? 23% sched_debug.cfs_rq:/.min_vruntime.stddev
0.06 ? 14% +839.2% 0.53 sched_debug.cfs_rq:/.nr_running.avg
0.23 ? 5% -49.3% 0.12 ? 7% sched_debug.cfs_rq:/.nr_running.stddev
36118 ?119% -98.6% 511.83 sched_debug.cfs_rq:/.removed.load_avg.max
2419 ?118% -97.6% 56.86 ? 22% sched_debug.cfs_rq:/.removed.load_avg.stddev
522.00 -49.8% 262.00 sched_debug.cfs_rq:/.removed.runnable_avg.max
58.71 ? 32% -50.9% 28.83 ? 23% sched_debug.cfs_rq:/.removed.runnable_avg.stddev
522.00 -49.8% 261.83 sched_debug.cfs_rq:/.removed.util_avg.max
58.71 ? 32% -50.9% 28.82 ? 23% sched_debug.cfs_rq:/.removed.util_avg.stddev
131.58 ? 7% +342.4% 582.10 sched_debug.cfs_rq:/.runnable_avg.avg
232.63 ? 4% -32.7% 156.66 ? 16% sched_debug.cfs_rq:/.runnable_avg.stddev
16482 ? 90% +4569.1% 769608 ?107% sched_debug.cfs_rq:/.spread0.avg
41853 ? 34% +1958.3% 861479 ? 96% sched_debug.cfs_rq:/.spread0.max
11485 ? 35% +1075.2% 134976 ? 23% sched_debug.cfs_rq:/.spread0.stddev
131.15 ? 7% +340.7% 577.93 sched_debug.cfs_rq:/.util_avg.avg
231.87 ? 4% -41.3% 136.06 ? 12% sched_debug.cfs_rq:/.util_avg.stddev
6986 ?125% +6105.4% 433548 ? 13% sched_debug.cpu.avg_idle.min
10.04 ? 3% +211.8% 31.30 ? 51% sched_debug.cpu.clock.stddev
35371 ? 15% +73.9% 61515 sched_debug.cpu.clock_task.min
236.78 ? 13% +1736.9% 4349 sched_debug.cpu.curr->pid.avg
5232 +46.7% 7676 sched_debug.cpu.curr->pid.max
1042 ? 6% -29.9% 730.97 ? 10% sched_debug.cpu.curr->pid.stddev
0.00 ? 19% +135.4% 0.00 ? 48% sched_debug.cpu.next_balance.stddev
0.05 ? 14% +905.6% 0.53 sched_debug.cpu.nr_running.avg
0.23 ? 6% -34.4% 0.15 ? 12% sched_debug.cpu.nr_running.stddev
1088 ? 4% +57.9% 1719 sched_debug.cpu.nr_switches.avg
176.80 ? 17% +262.0% 640.00 sched_debug.cpu.nr_switches.min
50872 ? 16% +45.9% 74232 sched_debug.sched_clk
13.49 ? 28% -11.5 1.96 ? 71% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
13.49 ? 28% -11.5 1.96 ? 71% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
13.49 ? 28% -11.5 1.96 ? 71% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
13.49 ? 28% -11.5 1.96 ? 71% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
10.59 ? 58% -9.5 1.06 ?141% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
9.99 ? 54% -8.9 1.06 ?141% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
9.78 ? 59% -8.7 1.06 ?141% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
12.90 ? 30% -7.1 5.82 ?141% perf-profile.calltrace.cycles-pp.get_signal.arch_do_signal_or_restart.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode
6.89 ? 89% -6.9 0.00 perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.54 ?122% +1.0 1.53 ? 8% perf-profile.calltrace.cycles-pp.free_pgtables.exit_mmap.__mmput.exit_mm.do_exit
0.46 ?123% +1.6 2.05 ? 38% perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
0.79 ? 81% +1.7 2.52 ? 27% perf-profile.calltrace.cycles-pp.link_path_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
9.45 ? 15% +2.8 12.21 ? 8% perf-profile.calltrace.cycles-pp.cmd_record.run_builtin.main.__libc_start_main
9.45 ? 15% +2.8 12.21 ? 8% perf-profile.calltrace.cycles-pp.__cmd_record.cmd_record.run_builtin.main.__libc_start_main
2.45 ? 55% +3.1 5.59 ? 36% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
2.45 ? 55% +3.1 5.59 ? 36% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
8.43 ? 8% +3.3 11.68 ? 8% perf-profile.calltrace.cycles-pp.record__mmap_read_evlist.__cmd_record.cmd_record.run_builtin.main
8.43 ? 8% +3.3 11.68 ? 8% perf-profile.calltrace.cycles-pp.perf_mmap__push.record__mmap_read_evlist.__cmd_record.cmd_record.run_builtin
13.49 ? 28% -11.5 1.96 ? 71% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
13.49 ? 28% -11.5 1.96 ? 71% perf-profile.children.cycles-pp.start_secondary
13.49 ? 28% -11.5 1.96 ? 71% perf-profile.children.cycles-pp.cpu_startup_entry
13.49 ? 28% -11.5 1.96 ? 71% perf-profile.children.cycles-pp.do_idle
10.59 ? 58% -9.5 1.06 ?141% perf-profile.children.cycles-pp.cpuidle_idle_call
9.78 ? 59% -8.7 1.06 ?141% perf-profile.children.cycles-pp.cpuidle_enter
9.78 ? 59% -8.7 1.06 ?141% perf-profile.children.cycles-pp.cpuidle_enter_state
6.89 ? 89% -6.9 0.00 perf-profile.children.cycles-pp.poll_idle
5.39 ? 53% -4.0 1.43 ? 77% perf-profile.children.cycles-pp.__schedule
4.37 ? 52% -2.9 1.43 ? 77% perf-profile.children.cycles-pp.pick_next_task_fair
4.37 ? 52% -2.9 1.43 ? 77% perf-profile.children.cycles-pp.newidle_balance
1.00 ? 51% +1.1 2.05 ? 38% perf-profile.children.cycles-pp.__percpu_counter_init
0.73 ? 82% +1.3 2.05 ? 38% perf-profile.children.cycles-pp.do_anonymous_page
0.00 +1.5 1.53 ? 8% perf-profile.children.cycles-pp.__entry_text_start
0.00 +1.5 1.53 ? 8% perf-profile.children.cycles-pp.getdents64
0.79 ? 81% +1.7 2.52 ? 27% perf-profile.children.cycles-pp.link_path_walk
0.28 ?200% +1.8 2.07 ? 41% perf-profile.children.cycles-pp.perf_mmap__consume
0.28 ?200% +2.3 2.62 ? 62% perf-profile.children.cycles-pp.ring_buffer_write_tail
6.89 ? 89% -6.9 0.00 perf-profile.self.cycles-pp.poll_idle
1.00 ? 51% +0.5 1.53 ? 8% perf-profile.self.cycles-pp.__memcpy
11.01 +4.5% 11.50 perf-stat.i.MPKI
1.696e+10 -9.0% 1.543e+10 perf-stat.i.branch-instructions
72.35 +1.8 74.12 perf-stat.i.cache-miss-rate%
6.967e+08 ? 2% -6.4% 6.518e+08 perf-stat.i.cache-misses
9.101e+08 -6.5% 8.505e+08 perf-stat.i.cache-references
6331 -27.0% 4622 perf-stat.i.context-switches
5.35 ? 3% +23.8% 6.63 perf-stat.i.cpi
4.799e+11 ? 2% +7.2% 5.145e+11 perf-stat.i.cpu-cycles
0.03 ? 10% -0.0 0.02 ? 5% perf-stat.i.dTLB-load-miss-rate%
5192997 ? 10% -28.3% 3724044 ? 7% perf-stat.i.dTLB-load-misses
2.069e+10 -15.0% 1.758e+10 perf-stat.i.dTLB-loads
8.241e+09 -38.7% 5.052e+09 perf-stat.i.dTLB-stores
7.845e+10 -13.2% 6.81e+10 perf-stat.i.instructions
0.28 ? 4% -11.7% 0.25 perf-stat.i.ipc
0.20 ? 11% -70.3% 0.06 ? 60% perf-stat.i.major-faults
2.14 ? 2% +7.2% 2.29 perf-stat.i.metric.GHz
208.33 -16.7% 173.46 perf-stat.i.metric.M/sec
8411 ? 6% -24.0% 6396 ? 2% perf-stat.i.minor-faults
58.44 -12.8 45.66 perf-stat.i.node-load-miss-rate%
6494075 +68.5% 10941823 ? 2% perf-stat.i.node-loads
8411 ? 6% -24.0% 6396 ? 2% perf-stat.i.page-faults
11.58 +7.8% 12.48 perf-stat.overall.MPKI
0.14 ? 4% +0.0 0.15 perf-stat.overall.branch-miss-rate%
6.16 +23.2% 7.59 perf-stat.overall.cpi
695.12 +14.1% 793.11 perf-stat.overall.cycles-between-cache-misses
0.00 ? 17% +0.0 0.01 ? 6% perf-stat.overall.dTLB-store-miss-rate%
0.16 -18.8% 0.13 perf-stat.overall.ipc
57.95 -13.2 44.73 perf-stat.overall.node-load-miss-rate%
1095 +72.2% 1885 perf-stat.overall.path-length
1.711e+10 -9.1% 1.555e+10 perf-stat.ps.branch-instructions
7.014e+08 ? 2% -6.4% 6.564e+08 perf-stat.ps.cache-misses
9.165e+08 -6.6% 8.564e+08 perf-stat.ps.cache-references
6241 -26.5% 4588 perf-stat.ps.context-switches
4.874e+11 ? 2% +6.8% 5.204e+11 perf-stat.ps.cpu-cycles
5234336 ? 10% -28.5% 3741043 ? 7% perf-stat.ps.dTLB-load-misses
2.088e+10 -15.2% 1.772e+10 perf-stat.ps.dTLB-loads
8.293e+09 -38.8% 5.074e+09 perf-stat.ps.dTLB-stores
7.913e+10 -13.3% 6.862e+10 perf-stat.ps.instructions
0.19 ? 12% -67.7% 0.06 ? 63% perf-stat.ps.major-faults
7530 ? 6% -19.6% 6051 ? 3% perf-stat.ps.minor-faults
6490058 +69.2% 10980376 ? 2% perf-stat.ps.node-loads
7531 ? 6% -19.6% 6051 ? 3% perf-stat.ps.page-faults
4.703e+12 +72.2% 8.097e+12 perf-stat.total.instructions
64606 -7.2% 59963 proc-vmstat.allocstall_movable
293.00 ?120% -99.4% 1.67 ?101% proc-vmstat.compact_fail
1500 ? 89% -97.0% 44.33 ? 9% proc-vmstat.compact_stall
1207 ? 82% -96.5% 42.67 ? 6% proc-vmstat.compact_success
156.40 ? 8% -90.8% 14.33 ? 25% proc-vmstat.kswapd_high_wmark_hit_quickly
436.60 ? 5% -80.4% 85.67 ? 42% proc-vmstat.kswapd_low_wmark_hit_quickly
2977 ? 13% +654.5% 22460 ? 12% proc-vmstat.nr_active_anon
54830634 +2.0% 55903178 proc-vmstat.nr_file_pages
9352053 ? 4% -11.8% 8245404 proc-vmstat.nr_free_pages
126557 ? 2% +19.8% 151627 ? 5% proc-vmstat.nr_inactive_anon
54120346 +1.9% 55146722 proc-vmstat.nr_inactive_file
519.60 ? 13% +452.5% 2871 ? 2% proc-vmstat.nr_isolated_file
8177 +425.4% 42962 ? 24% proc-vmstat.nr_mapped
33951 +127.2% 77148 ? 9% proc-vmstat.nr_shmem
827779 ? 6% +7.4% 888839 proc-vmstat.nr_slab_reclaimable
140292 +29.7% 181955 proc-vmstat.nr_slab_unreclaimable
2977 ? 13% +654.4% 22459 ? 12% proc-vmstat.nr_zone_active_anon
126557 ? 2% +19.8% 151627 ? 5% proc-vmstat.nr_zone_inactive_anon
54120244 +1.9% 55146389 proc-vmstat.nr_zone_inactive_file
3045834 ? 7% +97.8% 6025572 ? 11% proc-vmstat.numa_foreign
158.60 ? 23% +22167.3% 35316 ? 24% proc-vmstat.numa_hint_faults
15.00 ? 59% +1.9e+05% 28600 ? 17% proc-vmstat.numa_hint_faults_local
34631463 +278.4% 1.311e+08 proc-vmstat.numa_hit
34397124 +280.3% 1.308e+08 proc-vmstat.numa_local
3045753 ? 7% +97.9% 6027874 ? 11% proc-vmstat.numa_miss
3279967 ? 7% +90.9% 6261147 ? 11% proc-vmstat.numa_other
100.20 ? 21% +13250.0% 13376 ? 45% proc-vmstat.numa_pages_migrated
233.60 ? 31% +91258.6% 213413 ? 8% proc-vmstat.numa_pte_updates
1092 ? 4% +11701.1% 128891 ? 15% proc-vmstat.pgactivate
4152968 -2.6% 4045800 proc-vmstat.pgalloc_dma32
560518 +54.1% 863978 ? 3% proc-vmstat.pgfault
2208 ? 2% +8.4% 2394 proc-vmstat.pgpgout
29221 +27.8% 37353 ? 6% proc-vmstat.pgreuse
9.864e+08 +1.3% 9.992e+08 proc-vmstat.pgscan_direct
3140 ?122% +783.4% 27746 ? 15% proc-vmstat.pgscan_khugepaged
25158513 ? 6% -50.5% 12460524 ? 3% proc-vmstat.pgscan_kswapd
9.864e+08 +1.3% 9.992e+08 proc-vmstat.pgsteal_direct
3140 ?122% +783.4% 27746 ? 15% proc-vmstat.pgsteal_khugepaged
25158494 ? 6% -50.5% 12460503 ? 3% proc-vmstat.pgsteal_kswapd
15622681 ? 6% -13.1% 13570133 ? 3% proc-vmstat.slabs_scanned
1212 ? 19% -34.9% 789.00 ? 44% proc-vmstat.unevictable_pgs_culled
2109666 ? 5% +24.0% 2616320 proc-vmstat.unevictable_pgs_scanned
4163268 ? 9% -18.3% 3401404 ? 3% proc-vmstat.workingset_nodereclaim
4550764 ? 8% +9.6% 4988655 proc-vmstat.workingset_nodes
0.03 ? 81% +6857.4% 1.91 ? 98% perf-sched.sch_delay.avg.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
0.01 ? 57% +622.2% 0.04 ? 37% perf-sched.sch_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
12.08 ?199% -100.0% 0.00 ?112% perf-sched.sch_delay.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
0.01 ? 49% +256.1% 0.03 ? 48% perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.01 ? 38% +60.7% 0.01 ? 15% perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
0.01 ? 66% +611.3% 0.08 ? 39% perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
0.20 ?163% -95.4% 0.01 ? 5% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork
0.08 ? 79% +17283.7% 14.05 ?104% perf-sched.sch_delay.max.ms.__cond_resched.stop_one_cpu.sched_exec.bprm_execve.part
0.01 ? 39% +390.9% 0.05 ? 27% perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
0.78 ?172% +2586.6% 20.98 ? 59% perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
0.01 ? 44% +383.9% 0.03 ? 32% perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
10.92 ? 22% +865.3% 105.43 ? 8% perf-sched.total_wait_and_delay.average.ms
4118 ? 9% +531.9% 26025 ? 19% perf-sched.total_wait_and_delay.count.ms
291.09 ? 4% +1572.4% 4868 ? 30% perf-sched.total_wait_and_delay.max.ms
10.87 ? 22% +869.8% 105.41 ? 8% perf-sched.total_wait_time.average.ms
291.05 ? 4% +1572.6% 4868 ? 30% perf-sched.total_wait_time.max.ms
0.25 ? 45% +1864.6% 4.90 ? 36% perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
0.85 ? 24% +22507.0% 192.97 ? 7% perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
1.31 ? 22% +5782.9% 77.01 ? 68% perf-sched.wait_and_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
47.74 ? 41% +965.5% 508.60 ? 33% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
3.09 ? 16% +39.5% 4.31 perf-sched.wait_and_delay.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
31.61 ? 25% +1733.3% 579.45 ? 11% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork
20.43 ? 26% +774.1% 178.59 ? 2% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork
964.80 ? 17% +39.3% 1344 ? 13% perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
597.20 ? 5% -76.6% 139.67 ? 16% perf-sched.wait_and_delay.count.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
544.00 ? 8% -84.1% 86.33 ? 75% perf-sched.wait_and_delay.count.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
3.80 ? 56% +27838.6% 1061 ? 48% perf-sched.wait_and_delay.count.exit_to_user_mode_loop.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
98.20 ? 15% +1296.5% 1371 ? 14% perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
1116 ? 17% +102.7% 2263 ? 26% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork
332.80 ? 7% +2035.7% 7107 ? 14% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork
209.31 ? 49% +979.6% 2259 ? 78% perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
129.13 ? 84% +774.2% 1128 ? 12% perf-sched.wait_and_delay.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
219.17 ? 48% +439.7% 1182 perf-sched.wait_and_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
268.08 ? 3% +863.6% 2583 ?114% perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
6.80 ? 32% +414.5% 34.99 ? 18% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
278.18 ? 5% +861.4% 2674 ? 41% perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork
267.76 ? 12% +664.6% 2047 perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork
0.25 ? 45% +1864.5% 4.90 ? 36% perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
35.14 ? 89% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.perf_event_release_kernel.perf_release.__fput
58.82 ? 49% -96.3% 2.19 ? 2% perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
0.85 ? 25% +22676.8% 192.74 ? 7% perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
0.22 ? 91% -100.0% 0.00 perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart
0.38 ? 6% +114.5% 0.81 ? 51% perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.00 ?183% +6688.9% 0.20 ? 78% perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
1.30 ? 21% +5820.3% 76.98 ? 68% perf-sched.wait_time.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
38.60 ? 69% +1216.8% 508.29 ? 33% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
3.05 ? 17% +40.8% 4.29 perf-sched.wait_time.avg.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
31.60 ? 25% +1733.6% 579.44 ? 11% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork
58.82 ? 49% -96.3% 2.19 ? 2% perf-sched.wait_time.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
20.23 ? 25% +782.9% 178.58 ? 2% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork
209.31 ? 49% +979.6% 2259 ? 78% perf-sched.wait_time.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
151.13 ? 81% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.perf_event_release_kernel.perf_release.__fput
216.25 ? 48% -98.0% 4.37 ? 2% perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
129.13 ? 84% +765.8% 1118 ? 12% perf-sched.wait_time.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
0.33 ? 96% -100.0% 0.00 perf-sched.wait_time.max.ms.do_task_dead.do_exit.do_group_exit.get_signal.arch_do_signal_or_restart
1.04 ? 7% +2409.3% 26.06 ? 26% perf-sched.wait_time.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
0.03 ?186% +6181.2% 2.05 ? 71% perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt
219.16 ? 48% +439.7% 1182 perf-sched.wait_time.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
215.77 ? 47% +1097.3% 2583 ?114% perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
5.00 +526.9% 31.33 ? 33% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
278.17 ? 5% +861.4% 2674 ? 41% perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork
216.27 ? 48% -98.0% 4.38 ? 2% perf-sched.wait_time.max.ms.syslog_print.do_syslog.kmsg_read.vfs_read
267.75 ? 12% +664.7% 2047 perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork


***************************************************************************************************
lkp-spr-r02: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
=========================================================================================
compiler/cpufreq_governor/debug-setup/kconfig/rootfs/runtime/tbox_group/test/testcase:
gcc-12/performance/no-monitor/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/lkp-spr-r02/lru-file-readonce/vm-scalability

commit:
26a6ffff7d ("ocfs2: check new file size on fallocate call")
9425c591e0 ("page cache: fix page_cache_next/prev_miss off by one")

26a6ffff7de5dd36 9425c591e06a9ab27a145ba655f
---------------- ---------------------------
%stddev %change %stddev
\ | \
427728 -55.3% 191045 ± 2% vm-scalability.median
1674 ± 16% +958.2 2632 ± 15% vm-scalability.stddev%
96754197 -54.0% 44538647 vm-scalability.throughput
58.61 ± 2% +102.6% 118.77 vm-scalability.time.elapsed_time
58.61 ± 2% +102.6% 118.77 vm-scalability.time.elapsed_time.max
115768 +38.5% 160336 vm-scalability.time.involuntary_context_switches
16947 +8.0% 18300 vm-scalability.time.percent_of_cpu_this_job_got
9806 +120.5% 21623 ± 2% vm-scalability.time.system_time
125.94 -9.1% 114.45 vm-scalability.time.user_time




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



Attachments:
config-6.4.0-rc4-00049-g9425c591e06a (161.11 kB)
job-script (8.81 kB)
job.yaml (6.37 kB)
reproduce (9.44 kB)

2023-06-21 16:02:26

by Mike Kravetz

Subject: Re: [linus:master] [page cache] 9425c591e0: vm-scalability.throughput -20.0% regression

On 06/21/23 15:19, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed a -20.0% regression of vm-scalability.throughput on:
>
>
> commit: 9425c591e06a9ab27a145ba655fb50532cf0bcc9 ("page cache: fix page_cache_next/prev_miss off by one")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> testcase: vm-scalability
> test machine: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
> parameters:
>
> runtime: 300s
> test: lru-file-readonce
> cpufreq_governor: performance
>
> test-description: The motivation behind this suite is to exercise functions and regions of the mm/ subsystem of the Linux kernel which are of interest to us.
> test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
>
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+----------------------------------------------------------------------------------------------------+
> | testcase: change | vm-scalability: vm-scalability.throughput -18.9% regression |
> | test machine | 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory |
> | test parameters | cpufreq_governor=performance |
> | | debug-setup=no-monitor |
> | | runtime=300s |
> | | test=lru-file-readonce |
> +------------------+----------------------------------------------------------------------------------------------------+
> | testcase: change | vm-scalability: vm-scalability.throughput -52.8% regression |
> | test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory |
> | test parameters | cpufreq_governor=performance |
> | | runtime=300s |
> | | test=lru-file-readonce |
> +------------------+----------------------------------------------------------------------------------------------------+
> | testcase: change | vm-scalability: vm-scalability.throughput -54.0% regression |
> | test machine | 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory |
> | test parameters | cpufreq_governor=performance |
> | | debug-setup=no-monitor |
> | | runtime=300s |
> | | test=lru-file-readonce |
> +------------------+----------------------------------------------------------------------------------------------------+
>

Ouch!

I suspected this change could impact page_cache_next/prev_miss users, but had
no idea how much.

Unless someone sees something wrong in 9425c591e06a, the best approach
might be to revert and then add a simple interface to check for 'folio at
a given index in the cache' as suggested by Ackerley Tng.
https://lore.kernel.org/linux-mm/98624c2f481966492b4eb8272aef747790229b73.1683069252.git.ackerleytng@google.com/
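
For context, the kind of 'folio at a given index in the cache' check meant here
could be sketched roughly as below. This is only an illustrative sketch with a
made-up helper name, not the actual proposal; it simply reports whether a real
folio (not a shadow/value entry) is present at the index, without the
gap-scanning semantics of page_cache_next/prev_miss():

#include <linux/pagemap.h>
#include <linux/xarray.h>

/* Illustrative sketch only (hypothetical helper name). */
static inline bool filemap_has_folio(struct address_space *mapping,
				     pgoff_t index)
{
	void *entry = xa_load(&mapping->i_pages, index);

	/* Shadow/swap/DAX entries are value entries, not folios. */
	return entry && !xa_is_value(entry);
}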
--
Mike Kravetz

2023-06-23 12:38:32

by Thorsten Leemhuis

Subject: Re: [linus:master] [page cache] 9425c591e0: vm-scalability.throughput -20.0% regression

[I know this has mostly been dealt with already, but due to the imminent
release I'll add this to the tracking to make sure Linus becomes aware
of it]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 21.06.23 09:19, kernel test robot wrote:

>
> kernel test robot noticed a -20.0% regression of vm-scalability.throughput on:
>
>
> commit: 9425c591e06a9ab27a145ba655fb50532cf0bcc9 ("page cache: fix page_cache_next/prev_miss off by one")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> [...]

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 9425c591e06a
#regzbot title mm/page cache: performance regression (2 reverts pending)
#regzbot monitor:
https://lore.kernel.org/all/[email protected]/
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

2023-06-26 09:14:08

by Yin, Fengwei

Subject: Re: [linus:master] [page cache] 9425c591e0: vm-scalability.throughput -20.0% regression

Hi Mike,

> On 06/21/23 15:19, kernel test robot wrote:
<snip>
> I suspected this change could impact page_cache_next/prev_miss users, but had
> no idea how much.
>
> Unless someone sees something wrong in 9425c591e06a, the best approach
> might be to revert and then add a simple interface to check for 'folio at
> a given index in the cache' as suggested by Ackerley Tng.
> https://lore.kernel.org/linux-mm/98624c2f481966492b4eb8272aef747790229b73.1683069252.git.ackerleytng@google.com/

Some findings on my side.
1. Your patch impacts the folio order for file readahead. I collected a histogram of
the order parameter passed to filemap_alloc_folio() with and without your patch:

With your patch:
page order : count distribution
0 : 892073 | |
1 : 0 | |
2 : 65120457 |****************************************|
3 : 32914005 |******************** |
4 : 33020991 |******************** |

Without your patch:
page order : count distribution
0 : 3417288 |**** |
1 : 0 | |
2 : 877012 |* |
3 : 288 | |
4 : 5607522 |******* |
5 : 29974228 |****************************************|

We can see that order 5 dominates the filemap folios without your patch. With your
patch, orders 2, 3 and 4 are the most used for filemap folios.

2. My understanding is that your patch is correct and shouldn't be reverted. I made
a small change on top of your patch, and the performance regression is gone.

diff --git a/mm/readahead.c b/mm/readahead.c
index 47afbca1d122..cca333f9b560 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -610,7 +610,7 @@ static void ondemand_readahead(struct readahead_control *ractl,
pgoff_t start;

rcu_read_lock();
- start = page_cache_next_miss(ractl->mapping, index + 1,
+ start = page_cache_next_miss(ractl->mapping, index,
max_pages);
rcu_read_unlock();

With this change, the filemap folio order is restored as well:
page order : count distribution
0 : 3357622 |**** |
1 : 0 | |
2 : 861726 |* |
3 : 285 | |
4 : 4511637 |***** |
5 : 30505713 |****************************************|

I still haven't figured out why this simple change restores the performance,
or why index + 1 was used. I will check more.


Regards
Yin, Fengwei

2023-06-27 04:48:27

by Yin, Fengwei

Subject: Re: [linus:master] [page cache] 9425c591e0: vm-scalability.throughput -20.0% regression



On 6/26/23 17:05, Yin, Fengwei wrote:
> Hi Mike,
>
>> On 06/21/23 15:19, kernel test robot wrote:
> <snip>
>> I suspected this change could impact page_cache_next/prev_miss users, but had
>> no idea how much.
>>
>> Unless someone sees something wrong in 9425c591e06a, the best approach
>> might be to revert and then add a simple interface to check for 'folio at
>> a given index in the cache' as suggested by Ackerley Tng.
>> https://lore.kernel.org/linux-mm/98624c2f481966492b4eb8272aef747790229b73.1683069252.git.ackerleytng@google.com/
>
> Some findings on my side.
> 1. Your patch impacts the folio order for file readahead. I collected a histogram of
> the order parameter passed to filemap_alloc_folio() with and without your patch:
>
> With your patch:
> page order : count distribution
> 0 : 892073 | |
> 1 : 0 | |
> 2 : 65120457 |****************************************|
> 3 : 32914005 |******************** |
> 4 : 33020991 |******************** |
>
> Without your patch:
> page order : count distribution
> 0 : 3417288 |**** |
> 1 : 0 | |
> 2 : 877012 |* |
> 3 : 288 | |
> 4 : 5607522 |******* |
> 5 : 29974228 |****************************************|
>
> We can see that order 5 dominates the filemap folios without your patch. With your
> patch, orders 2, 3 and 4 are the most used for filemap folios.
>
> 2. My understanding is that your patch is correct and shouldn't be reverted. I made
> a small change on top of your patch, and the performance regression is gone.
>
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 47afbca1d122..cca333f9b560 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -610,7 +610,7 @@ static void ondemand_readahead(struct readahead_control *ractl,
> pgoff_t start;
>
> rcu_read_lock();
> - start = page_cache_next_miss(ractl->mapping, index + 1,
> + start = page_cache_next_miss(ractl->mapping, index,
> max_pages);
> rcu_read_unlock();
>
> With this change, the filemap folio order is restored as well:
> page order : count distribution
> 0 : 3357622 |**** |
> 1 : 0 | |
> 2 : 861726 |* |
> 3 : 285 | |
> 4 : 4511637 |***** |
> 5 : 30505713 |****************************************|
>
> I still haven't figured out why this simple change restores the performance,
> or why index + 1 was used. I will check more.

The issue is the ra initialization after page_cache_next_miss() in
ondemand_readahead():
ra->start = start; (start is index + max_pages + 1 + 1 after your patch)
ra->size = start - index;

The extra +1 gets accumulated into ra->start and breaks the filemap folio order.
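
My reading of why a shifted ra->start matters: page cache folios are naturally
aligned to their order, so the largest order readahead can use at a given start
index is capped by that index's alignment. A toy user-space sketch of that cap
(example numbers only, not kernel code):

#include <stdio.h>

/* Largest folio order that stays naturally aligned when starting at @index. */
static unsigned int max_aligned_order(unsigned long index, unsigned int limit)
{
	unsigned int order = limit;

	while (order && (index & ((1UL << order) - 1)))
		order--;
	return order;
}

int main(void)
{
	/* limit 5 mirrors the order-5 folios in the histograms above */
	printf("index 160 -> max order %u\n", max_aligned_order(160, 5)); /* 5 */
	printf("index 161 -> max order %u\n", max_aligned_order(161, 5)); /* 0 */
	return 0;
}

A start index that is off by one loses its alignment, which is consistent with
the shift away from order-5 folios in the histograms above.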


Regards
Yin, Fengwei

>
>
> Regards
> Yin, Fengwei