2020-06-11 06:39:22

by Chen, Rong A

Subject: [gup] 17839856fd: stress-ng.vm-splice.ops_per_sec 2158.6% improvement

Greetings,

FYI, we noticed a 2158.6% improvement of stress-ng.vm-splice.ops_per_sec due to commit:


commit: 17839856fd588f4ab6b789f482ed3ffd7c403e1f ("gup: document and work around "COW can break either way" issue")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: stress-ng
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
with following parameters:

nr_threads: 100%
disk: 1HDD
testtime: 30s
class: pipe
cpufreq_governor: performance
ucode: 0x500002c


In addition to that, the commit also has significant impact on the following tests:

+------------------+----------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.vm-splice.ops_per_sec 372.8% improvement |
| test machine | 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory |
| test parameters | class=vm |
| | cpufreq_governor=performance |
| | disk=1HDD |
| | fs=ext4 |
| | nr_threads=10% |
| | testtime=1s |
| | ucode=0x500002c |
+------------------+----------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.vm-splice.ops_per_sec 2052.7% improvement |
| test machine | 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory |
| test parameters | class=pipe |
| | cpufreq_governor=performance |
| | disk=1HDD |
| | nr_threads=100% |
| | testtime=1s |
| | ucode=0x500002c |
+------------------+----------------------------------------------------------------------+




Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
class/compiler/cpufreq_governor/disk/kconfig/nr_threads/rootfs/tbox_group/testcase/testtime/ucode:
pipe/gcc-9/performance/1HDD/x86_64-rhel-7.6/100%/debian-x86_64-20191114.cgz/lkp-csl-2sp5/stress-ng/30s/0x500002c

commit:
f359287765 ("Merge branch 'from-miklos' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs")
17839856fd ("gup: document and work around "COW can break either way" issue")

f359287765c04711 17839856fd588f4ab6b789f482e
---------------- ---------------------------
%stddev %change %stddev
\ | \
17619 -2.1% 17246 stress-ng.time.system_time
1529 +24.1% 1898 stress-ng.time.user_time
1.533e+08 +2158.7% 3.463e+09 stress-ng.vm-splice.ops
5110873 +2158.6% 1.154e+08 stress-ng.vm-splice.ops_per_sec
90027 ± 3% -12.6% 78647 ± 6% meminfo.PageTables
7.48 +1.8 9.28 mpstat.cpu.all.usr%
85.50 -2.0% 83.75 vmstat.cpu.sy
85.79 -1.8% 84.24 iostat.cpu.system
7.41 +24.5% 9.22 iostat.cpu.user
6131 ± 66% +117.4% 13330 ± 33% proc-vmstat.numa_hint_faults
1347 ± 29% +282.1% 5149 ± 50% proc-vmstat.numa_hint_faults_local
1239 ± 9% +14.4% 1417 ± 5% slabinfo.kmalloc-rcl-96.active_objs
1239 ± 9% +14.4% 1417 ± 5% slabinfo.kmalloc-rcl-96.num_objs
0.13 ± 6% -0.0 0.11 ± 6% perf-profile.children.cycles-pp.check_preempt_curr
0.17 ± 6% +0.0 0.20 ± 10% perf-profile.children.cycles-pp.rw_verify_area
0.44 ± 5% +0.0 0.48 ± 3% perf-profile.children.cycles-pp.__irqentry_text_start
339.81 ± 10% -28.3% 243.75 ± 14% sched_debug.cfs_rq:/.exec_clock.stddev
5.95 ± 12% +24.5% 7.41 ± 6% sched_debug.cpu.clock.stddev
5.95 ± 12% +24.5% 7.41 ± 6% sched_debug.cpu.clock_task.stddev
24316 ± 7% +28.4% 31224 ± 10% interrupts.CPU15.RES:Rescheduling_interrupts
38416 ± 26% -35.0% 24985 ± 23% interrupts.CPU17.RES:Rescheduling_interrupts
20612 ± 19% +85.9% 38315 ± 38% interrupts.CPU54.RES:Rescheduling_interrupts
26788 ± 48% +66.2% 44527 ± 31% interrupts.CPU57.RES:Rescheduling_interrupts
21724 ± 15% +55.8% 33857 ± 23% interrupts.CPU70.RES:Rescheduling_interrupts
4845 ± 35% +59.4% 7723 ± 2% interrupts.CPU87.NMI:Non-maskable_interrupts
4845 ± 35% +59.4% 7723 ± 2% interrupts.CPU87.PMI:Performance_monitoring_interrupts
4843 ± 35% +59.5% 7723 ± 2% interrupts.CPU89.NMI:Non-maskable_interrupts
4843 ± 35% +59.5% 7723 ± 2% interrupts.CPU89.PMI:Performance_monitoring_interrupts
4846 ± 35% +59.3% 7721 ± 2% interrupts.CPU90.NMI:Non-maskable_interrupts
4846 ± 35% +59.3% 7721 ± 2% interrupts.CPU90.PMI:Performance_monitoring_interrupts
25860 ± 27% -27.2% 18820 ± 17% interrupts.CPU91.RES:Rescheduling_interrupts
8.59 -9.4% 7.78 perf-stat.i.MPKI
3.695e+10 +17.1% 4.328e+10 perf-stat.i.branch-instructions
2.054e+08 +28.4% 2.638e+08 perf-stat.i.branch-misses
11.31 ± 5% -2.2 9.07 ± 5% perf-stat.i.cache-miss-rate%
3.74 -54.4% 1.70 perf-stat.i.cpi
54067 ± 66% +311.0% 222217 ± 18% perf-stat.i.cycles-between-cache-misses
0.04 ± 5% -0.0 0.03 ± 5% perf-stat.i.dTLB-load-miss-rate%
6065752 ± 7% -10.9% 5407417 ± 4% perf-stat.i.dTLB-load-misses
4.522e+10 +20.4% 5.444e+10 perf-stat.i.dTLB-loads
2.697e+10 +20.9% 3.26e+10 perf-stat.i.dTLB-stores
1.464e+08 +36.3% 1.996e+08 ± 2% perf-stat.i.iTLB-load-misses
1.77e+11 +18.5% 2.098e+11 perf-stat.i.instructions
0.71 +18.5% 0.84 perf-stat.i.ipc
2.58 -1.9% 2.53 perf-stat.i.metric.GHz
1131 +18.8% 1344 perf-stat.i.metric.M/sec
85.95 -2.7 83.28 ± 2% perf-stat.i.node-load-miss-rate%
89.18 -0.9 88.25 perf-stat.i.node-store-miss-rate%
3.29 ± 2% -17.3% 2.72 perf-stat.overall.MPKI
0.56 +0.1 0.61 perf-stat.overall.branch-miss-rate%
1.42 -16.9% 1.18 perf-stat.overall.cpi
0.01 ± 8% -0.0 0.01 ± 3% perf-stat.overall.dTLB-load-miss-rate%
94.12 +1.5 95.58 perf-stat.overall.iTLB-load-miss-rate%
1198 -12.7% 1046 perf-stat.overall.instructions-per-iTLB-miss
0.71 +20.4% 0.85 perf-stat.overall.ipc
3.62e+10 +18.1% 4.276e+10 perf-stat.ps.branch-instructions
2.026e+08 +29.2% 2.618e+08 perf-stat.ps.branch-misses
4.436e+10 +21.4% 5.385e+10 perf-stat.ps.dTLB-loads
2.644e+10 +21.9% 3.222e+10 perf-stat.ps.dTLB-stores
1.448e+08 +37.0% 1.983e+08 ± 2% perf-stat.ps.iTLB-load-misses
1.734e+11 +19.5% 2.073e+11 perf-stat.ps.instructions
3.673e+13 +19.5% 4.389e+13 perf-stat.total.instructions
10749 ± 5% +7.4% 11548 ± 4% softirqs.CPU10.RCU
10512 +8.6% 11414 ± 2% softirqs.CPU13.RCU
10476 ± 2% +9.8% 11503 ± 4% softirqs.CPU14.RCU
10790 ± 2% +10.6% 11938 ± 4% softirqs.CPU18.RCU
10721 ± 2% +10.5% 11842 ± 4% softirqs.CPU19.RCU
10754 ± 2% +9.8% 11804 softirqs.CPU2.RCU
10792 ± 4% +9.5% 11819 ± 3% softirqs.CPU20.RCU
10722 +10.9% 11893 ± 3% softirqs.CPU22.RCU
10968 ± 6% +10.4% 12110 ± 5% softirqs.CPU23.RCU
11012 ± 4% +22.6% 13503 ± 14% softirqs.CPU24.RCU
10584 ± 5% +12.4% 11899 ± 5% softirqs.CPU28.RCU
10512 ± 2% +13.8% 11966 ± 6% softirqs.CPU29.RCU
10826 ± 3% +8.0% 11696 ± 4% softirqs.CPU30.RCU
10482 +8.7% 11390 ± 3% softirqs.CPU31.RCU
10400 ± 4% +8.8% 11315 ± 3% softirqs.CPU32.RCU
10356 +9.7% 11361 ± 4% softirqs.CPU34.RCU
10193 +10.3% 11239 ± 4% softirqs.CPU35.RCU
10159 ± 2% +11.7% 11346 ± 3% softirqs.CPU37.RCU
10262 ± 2% +10.7% 11360 ± 3% softirqs.CPU38.RCU
10077 ± 2% +10.8% 11163 ± 4% softirqs.CPU40.RCU
10204 +9.2% 11142 ± 2% softirqs.CPU41.RCU
10523 ± 4% +6.8% 11238 ± 3% softirqs.CPU43.RCU
10015 +10.5% 11064 ± 3% softirqs.CPU44.RCU
10376 ± 5% +8.0% 11202 ± 4% softirqs.CPU45.RCU
10164 +9.1% 11090 ± 4% softirqs.CPU46.RCU
10176 ± 4% +8.8% 11071 ± 3% softirqs.CPU47.RCU
10739 +10.4% 11853 ± 3% softirqs.CPU49.RCU
10569 ± 3% +14.7% 12120 ± 3% softirqs.CPU51.RCU
10627 ± 5% +12.7% 11976 ± 3% softirqs.CPU52.RCU
10507 +12.9% 11858 ± 5% softirqs.CPU53.RCU
10413 +10.3% 11482 ± 3% softirqs.CPU54.RCU
11050 ± 3% +6.1% 11720 ± 4% softirqs.CPU64.RCU
10663 +10.5% 11781 ± 4% softirqs.CPU65.RCU
10461 ± 2% +10.4% 11544 ± 4% softirqs.CPU67.RCU
10565 ± 2% +11.7% 11804 ± 5% softirqs.CPU71.RCU
10200 ± 2% +22.8% 12529 ± 20% softirqs.CPU74.RCU
10078 +23.0% 12396 ± 16% softirqs.CPU75.RCU
10182 +9.0% 11101 ± 4% softirqs.CPU77.RCU
9994 ± 2% +12.1% 11201 ± 4% softirqs.CPU79.RCU
9848 +11.2% 10954 ± 5% softirqs.CPU80.RCU
9802 ± 2% +12.4% 11017 ± 5% softirqs.CPU81.RCU
9924 +9.8% 10893 ± 4% softirqs.CPU82.RCU
9921 +9.9% 10906 ± 3% softirqs.CPU83.RCU
9905 ± 2% +10.7% 10963 ± 5% softirqs.CPU85.RCU
9901 +8.5% 10746 ± 3% softirqs.CPU87.RCU
9953 ± 2% +10.5% 10997 ± 2% softirqs.CPU88.RCU
9944 +11.1% 11047 ± 5% softirqs.CPU89.RCU
9986 +8.1% 10796 ± 4% softirqs.CPU90.RCU
9796 ± 2% +10.4% 10812 ± 4% softirqs.CPU91.RCU
9750 +10.9% 10816 ± 4% softirqs.CPU93.RCU
10018 +9.8% 10997 ± 3% softirqs.CPU95.RCU
1009159 +9.1% 1101411 ± 3% softirqs.RCU



stress-ng.time.user_time

2500 +--------------------------------------------------------------------+
| |
| |
2000 |-O O O O O O O O O O O O O O O O O O O O O O O O |
| O O O O |
|.+..+.+. .+. .+.+.+.. .+. .+.+.+. .+.+.+..+. .+..+.+.+.+.. |
1500 |-+ +. +.+.+. + +. +. + + |
| |
1000 |-+ |
| |
| |
500 |-+ |
| |
| |
0 +--------------------------------------------------------------------+


stress-ng.time.system_time

18000 +-------------------------------------------------------------------+
| O O O O O O O O O O O O O O O O O O O O O O O O O O O |
16000 |-+ |
14000 |-+ |
| |
12000 |-+ |
10000 |-+ |
| |
8000 |-+ |
6000 |-+ |
| |
4000 |-+ |
2000 |-+ |
| |
0 +-------------------------------------------------------------------+


stress-ng.vm-splice.ops

3.5e+09 +-----------------------------------------------------------------+
| |
3e+09 |-+ |
| |
2.5e+09 |-+ |
| |
2e+09 |-+ |
| |
1.5e+09 |-+ |
| |
1e+09 |-+ |
| |
5e+08 |-+ |
|.+.+..+.+.+.+.+..+.+.+.+.+..+.+.+.+.+..+.+.+.+.+..+.+.+.+.+..+.+ |
0 +-----------------------------------------------------------------+


stress-ng.vm-splice.ops_per_sec

1.2e+08 +-----------------------------------------------------------------+
| O O O O O O O O O O O O O O O O O O O O O O O O |
1e+08 |-+ |
| |
| |
8e+07 |-+ |
| |
6e+07 |-+ |
| |
4e+07 |-+ |
| |
| |
2e+07 |-+ |
|.+.+..+.+.+.+.+..+.+.+.+.+..+.+.+.+.+..+.+.+.+.+..+.+.+.+.+..+.+ |
0 +-----------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample

***************************************************************************************************
lkp-cfl-e1: 16 threads Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz with 32G memory


***************************************************************************************************
lkp-csl-2sp5: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
=========================================================================================
class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/testcase/testtime/ucode:
vm/gcc-9/performance/1HDD/ext4/x86_64-rhel-7.6/10%/debian-x86_64-20191114.cgz/lkp-csl-2sp5/stress-ng/1s/0x500002c

commit:
f359287765 ("Merge branch 'from-miklos' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs")
17839856fd ("gup: document and work around "COW can break either way" issue")

f359287765c04711 17839856fd588f4ab6b789f482e
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
1:4 -25% :4 kmsg.Memory_failure:#:recovery_action_for_reserved_kernel_page:Failed
1:4 -25% :4 kmsg.Memory_failure:#:reserved_kernel_page_still_referenced_by#users
0:4 14% 1:4 perf-profile.calltrace.cycles-pp.sync_regs.error_entry
0:4 15% 1:4 perf-profile.calltrace.cycles-pp.error_entry
2:4 2% 2:4 perf-profile.children.cycles-pp.error_entry
0:4 0% 0:4 perf-profile.self.cycles-pp.error_entry
%stddev %change %stddev
\ | \
3752446 ± 5% +372.8% 17743420 stress-ng.vm-splice.ops
3753307 ± 5% +372.8% 17744113 stress-ng.vm-splice.ops_per_sec
352732 ± 38% -37.5% 220415 ± 19% meminfo.AnonHugePages
6088 ± 4% -8.4% 5575 ± 4% vmstat.io.bo
26.97 +2.2% 27.56 boot-time.boot
16.63 +1.1% 16.82 boot-time.dhcp
11097144 ± 4% +9.4% 12140802 ± 3% cpuidle.C1.time
1306863 ± 5% +10.0% 1438008 ± 4% cpuidle.C1.usage
20928 ± 19% +40.5% 29401 ± 12% sched_debug.cfs_rq:/.min_vruntime.max
621.93 ± 5% -10.6% 556.11 ± 5% sched_debug.cpu.curr->pid.stddev
1413502 ± 10% +9.9% 1552782 proc-vmstat.nr_foll_pin_acquired
1413411 ± 10% +9.9% 1552695 proc-vmstat.nr_foll_pin_released
83634280 ± 3% -4.1% 80233229 proc-vmstat.pgalloc_normal
83604544 ± 3% -4.1% 80204856 proc-vmstat.pgfree
2279 ± 3% +13.7% 2591 slabinfo.khugepaged_mm_slot.active_objs
2279 ± 3% +13.7% 2591 slabinfo.khugepaged_mm_slot.num_objs
51107 ± 4% -12.7% 44626 ± 7% slabinfo.lsm_file_cache.active_objs
53253 ± 3% -12.1% 46810 ± 7% slabinfo.lsm_file_cache.num_objs
0.94 ± 8% +0.1 1.07 ± 6% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.__get_user_pages.populate_vma_page_range.__mm_populate
0.96 ± 8% +0.1 1.09 ± 6% perf-profile.calltrace.cycles-pp.handle_mm_fault.__get_user_pages.populate_vma_page_range.__mm_populate.vm_mmap_pgoff
0.11 ± 30% +0.1 0.16 ± 16% perf-profile.children.cycles-pp.anon_vma_fork
0.12 ± 18% +0.0 0.17 ± 13% perf-profile.self.cycles-pp.filemap_map_pages
6.418e+09 +1.2% 6.498e+09 perf-stat.i.branch-instructions
3572544 +2.1% 3646846 perf-stat.i.iTLB-loads
3.035e+10 +1.4% 3.078e+10 perf-stat.i.instructions
20346 -5.6% 19202 ± 2% perf-stat.i.major-faults
7.69 ± 4% -3.5% 7.42 ± 3% perf-stat.overall.MPKI
6.243e+09 +1.3% 6.322e+09 perf-stat.ps.branch-instructions
35907601 +3.8% 37255792 ± 3% perf-stat.ps.branch-misses
3.127e+09 +1.2% 3.165e+09 perf-stat.ps.dTLB-stores
3453004 +2.3% 3531803 perf-stat.ps.iTLB-loads
2.951e+10 +1.5% 2.994e+10 perf-stat.ps.instructions
19574 -5.4% 18514 ± 2% perf-stat.ps.major-faults
38744 ± 10% -27.2% 28219 ± 12% interrupts.CPU24.TLB:TLB_shootdowns
2974 ± 44% -77.7% 662.50 ± 29% interrupts.CPU27.NMI:Non-maskable_interrupts
2974 ± 44% -77.7% 662.50 ± 29% interrupts.CPU27.PMI:Performance_monitoring_interrupts
2511 ± 40% -70.7% 736.25 ± 7% interrupts.CPU28.NMI:Non-maskable_interrupts
2511 ± 40% -70.7% 736.25 ± 7% interrupts.CPU28.PMI:Performance_monitoring_interrupts
37767 ± 28% -28.1% 27157 ± 14% interrupts.CPU28.TLB:TLB_shootdowns
2729 ± 63% -60.8% 1070 ± 86% interrupts.CPU32.NMI:Non-maskable_interrupts
2729 ± 63% -60.8% 1070 ± 86% interrupts.CPU32.PMI:Performance_monitoring_interrupts
2934 ± 59% -66.9% 971.75 ± 90% interrupts.CPU33.NMI:Non-maskable_interrupts
2934 ± 59% -66.9% 971.75 ± 90% interrupts.CPU33.PMI:Performance_monitoring_interrupts
18203 ± 7% +17.7% 21419 ± 9% interrupts.CPU59.CAL:Function_call_interrupts
24514 ± 8% +19.0% 29165 ± 10% interrupts.CPU59.TLB:TLB_shootdowns
99.25 ± 11% -19.9% 79.50 ± 13% interrupts.CPU62.RES:Rescheduling_interrupts
19246 ± 8% +19.6% 23023 ± 6% interrupts.CPU68.CAL:Function_call_interrupts
26764 ± 7% +18.4% 31691 ± 8% interrupts.CPU68.TLB:TLB_shootdowns
3799 ± 52% -80.9% 726.25 ± 22% interrupts.CPU74.NMI:Non-maskable_interrupts
3799 ± 52% -80.9% 726.25 ± 22% interrupts.CPU74.PMI:Performance_monitoring_interrupts
3149 ± 49% -79.8% 635.50 ± 21% interrupts.CPU75.NMI:Non-maskable_interrupts
3149 ± 49% -79.8% 635.50 ± 21% interrupts.CPU75.PMI:Performance_monitoring_interrupts
2709 ± 42% -73.4% 719.75 ± 26% interrupts.CPU76.NMI:Non-maskable_interrupts
2709 ± 42% -73.4% 719.75 ± 26% interrupts.CPU76.PMI:Performance_monitoring_interrupts
2364 ± 40% -57.3% 1009 ± 38% interrupts.CPU79.NMI:Non-maskable_interrupts
2364 ± 40% -57.3% 1009 ± 38% interrupts.CPU79.PMI:Performance_monitoring_interrupts
3030 ± 48% -68.1% 967.50 ± 38% interrupts.CPU81.NMI:Non-maskable_interrupts
3030 ± 48% -68.1% 967.50 ± 38% interrupts.CPU81.PMI:Performance_monitoring_interrupts
2845 ± 67% -51.8% 1370 ± 83% interrupts.CPU83.NMI:Non-maskable_interrupts
2845 ± 67% -51.8% 1370 ± 83% interrupts.CPU83.PMI:Performance_monitoring_interrupts
27279 ± 11% +23.0% 33551 ± 9% interrupts.CPU84.TLB:TLB_shootdowns
36061 ± 8% -14.6% 30804 ± 6% interrupts.CPU92.TLB:TLB_shootdowns
29829 ± 5% -14.1% 25619 ± 11% interrupts.CPU94.TLB:TLB_shootdowns
160996 ± 9% -11.9% 141904 ± 11% interrupts.NMI:Non-maskable_interrupts
160996 ± 9% -11.9% 141904 ± 11% interrupts.PMI:Performance_monitoring_interrupts



***************************************************************************************************
lkp-csl-2sp5: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
=========================================================================================
class/compiler/cpufreq_governor/disk/kconfig/nr_threads/rootfs/tbox_group/testcase/testtime/ucode:
pipe/gcc-9/performance/1HDD/x86_64-rhel-7.6/100%/debian-x86_64-20191114.cgz/lkp-csl-2sp5/stress-ng/1s/0x500002c

commit:
f359287765 ("Merge branch 'from-miklos' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs")
17839856fd ("gup: document and work around "COW can break either way" issue")

f359287765c04711 17839856fd588f4ab6b789f482e
---------------- ---------------------------
%stddev %change %stddev
\ | \
816942 ± 8% -12.0% 719276 stress-ng.time.involuntary_context_switches
235279 -4.0% 225847 stress-ng.time.minor_page_faults
599.87 -2.8% 583.29 stress-ng.time.system_time
47.33 ± 2% +28.2% 60.69 stress-ng.time.user_time
5412114 ± 5% +2054.0% 1.166e+08 stress-ng.vm-splice.ops
5417324 ± 5% +2052.7% 1.166e+08 stress-ng.vm-splice.ops_per_sec
16.42 +1.2% 16.62 boot-time.dhcp
5.94 ± 4% +24.7% 7.41 ± 4% iostat.cpu.user
7.12 ± 5% +1.4 8.52 ± 2% mpstat.cpu.all.usr%
1.61e+08 ± 11% -50.9% 79016011 ± 47% cpuidle.C1E.time
984623 ± 12% -21.5% 772591 ± 14% cpuidle.C1E.usage
19231 ± 11% +21.8% 23416 ± 8% numa-meminfo.node0.Mapped
938.25 ± 85% -55.9% 413.50 ±173% numa-meminfo.node1.Inactive(file)
4850 ± 11% +18.6% 5754 ± 10% numa-vmstat.node0.nr_mapped
234.00 ± 85% -55.9% 103.25 ±173% numa-vmstat.node1.nr_inactive_file
234.00 ± 85% -55.9% 103.25 ±173% numa-vmstat.node1.nr_zone_inactive_file
9049 +13.7% 10287 ± 11% softirqs.CPU44.TIMER
9712 ± 10% -12.2% 8524 softirqs.CPU57.TIMER
9235 ± 6% -7.3% 8560 ± 3% softirqs.CPU82.TIMER
8160 +5.6% 8617 ± 5% proc-vmstat.nr_shmem
57724 +3.1% 59541 proc-vmstat.nr_slab_unreclaimable
29.50 ± 78% +11075.4% 3296 ± 69% proc-vmstat.numa_hint_faults_local
295784 -2.1% 289478 proc-vmstat.pgfault
7320 ± 9% -30.6% 5078 ± 22% sched_debug.cfs_rq:/.min_vruntime.min
836.06 ± 11% +16.2% 971.79 ± 5% sched_debug.cfs_rq:/.runnable_avg.stddev
55.75 ± 14% +18.8% 66.25 ± 6% sched_debug.cpu.nr_uninterruptible.max
-59.50 -16.8% -49.50 sched_debug.cpu.nr_uninterruptible.min
773.00 ± 6% +30.9% 1011 ± 14% interrupts.CPU14.RES:Rescheduling_interrupts
2901 ± 34% +43.0% 4147 ± 24% interrupts.CPU16.NMI:Non-maskable_interrupts
2901 ± 34% +43.0% 4147 ± 24% interrupts.CPU16.PMI:Performance_monitoring_interrupts
969.25 ± 10% +74.4% 1690 ± 51% interrupts.CPU31.RES:Rescheduling_interrupts
989.25 ± 12% +39.0% 1375 interrupts.CPU54.RES:Rescheduling_interrupts
1207 ± 18% -24.7% 908.75 ± 10% interrupts.CPU68.RES:Rescheduling_interrupts
884.75 ± 11% +33.3% 1179 ± 11% interrupts.CPU8.RES:Rescheduling_interrupts
1015 ± 8% +22.4% 1243 ± 5% interrupts.CPU81.RES:Rescheduling_interrupts
865.50 ± 8% +18.7% 1027 ± 7% interrupts.CPU9.RES:Rescheduling_interrupts
80.00 ± 69% -53.4% 37.25 ± 15% interrupts.TLB:TLB_shootdowns
4.42 ± 10% +34.9% 5.96 ± 25% perf-stat.i.MPKI
2.426e+08 +2.0% 2.473e+08 perf-stat.i.branch-misses
5.281e+08 ± 7% +10.8% 5.85e+08 ± 4% perf-stat.i.cache-references
0.02 ± 7% +0.0 0.03 ± 25% perf-stat.i.dTLB-load-miss-rate%
37835 -3.7% 36434 perf-stat.i.minor-faults
37835 -3.7% 36434 perf-stat.i.page-faults
3.25 ± 6% +10.1% 3.58 ± 3% perf-stat.overall.MPKI
2.115e+08 +2.1% 2.158e+08 perf-stat.ps.branch-misses
4.613e+08 ± 7% +10.9% 5.116e+08 ± 3% perf-stat.ps.cache-references
33390 -2.3% 32620 perf-stat.ps.minor-faults
33390 -2.3% 32620 perf-stat.ps.page-faults
1.154e+12 +1.2% 1.168e+12 perf-stat.total.instructions
11.58 -11.6 0.00 perf-profile.calltrace.cycles-pp.get_user_pages_unlocked.internal_get_user_pages_fast.iov_iter_get_pages.iter_to_pipe.do_vmsplice
11.26 -11.3 0.00 perf-profile.calltrace.cycles-pp.__get_user_pages.get_user_pages_unlocked.internal_get_user_pages_fast.iov_iter_get_pages.iter_to_pipe
11.17 -11.2 0.00 perf-profile.calltrace.cycles-pp.follow_page_pte.__get_user_pages.get_user_pages_unlocked.internal_get_user_pages_fast.iov_iter_get_pages
11.64 -9.4 2.29 ± 3% perf-profile.calltrace.cycles-pp.internal_get_user_pages_fast.iov_iter_get_pages.iter_to_pipe.do_vmsplice.__do_sys_vmsplice
11.67 -9.0 2.67 ± 3% perf-profile.calltrace.cycles-pp.iov_iter_get_pages.iter_to_pipe.do_vmsplice.__do_sys_vmsplice.do_syscall_64
11.70 -8.1 3.63 ± 3% perf-profile.calltrace.cycles-pp.iter_to_pipe.do_vmsplice.__do_sys_vmsplice.do_syscall_64.entry_SYSCALL_64_after_hwframe
12.08 -7.1 4.94 ± 3% perf-profile.calltrace.cycles-pp.do_vmsplice.__do_sys_vmsplice.do_syscall_64.entry_SYSCALL_64_after_hwframe.vmsplice
6.36 -6.4 0.00 perf-profile.calltrace.cycles-pp.mark_page_accessed.follow_page_pte.__get_user_pages.get_user_pages_unlocked.internal_get_user_pages_fast
12.16 -5.1 7.10 ± 3% perf-profile.calltrace.cycles-pp.__do_sys_vmsplice.do_syscall_64.entry_SYSCALL_64_after_hwframe.vmsplice
12.18 -4.6 7.54 ± 3% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.vmsplice
12.19 -4.4 7.77 ± 3% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.vmsplice
8.73 -3.0 5.75 ± 3% perf-profile.calltrace.cycles-pp.__splice_from_pipe.splice_from_pipe.do_splice.__x64_sys_splice.do_syscall_64
12.25 -2.9 9.32 ± 3% perf-profile.calltrace.cycles-pp.vmsplice
9.79 -1.8 7.97 ± 3% perf-profile.calltrace.cycles-pp.splice_from_pipe.do_splice.__x64_sys_splice.do_syscall_64.entry_SYSCALL_64_after_hwframe
8.60 ± 7% -0.8 7.80 ± 5% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
8.60 ± 7% -0.8 7.80 ± 5% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
8.60 ± 7% -0.8 7.80 ± 5% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
12.20 ± 2% -0.6 11.62 perf-profile.calltrace.cycles-pp.enqueue_task_fair.activate_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function
12.22 ± 3% -0.6 11.64 perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock
12.22 ± 3% -0.6 11.64 perf-profile.calltrace.cycles-pp.activate_task.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common
10.57 -0.6 10.02 perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write.new_sync_write
10.53 -0.6 9.98 perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_write
6.78 ± 2% -0.4 6.38 ± 2% perf-profile.calltrace.cycles-pp.prepare_to_wait_event.pipe_read.new_sync_read.vfs_read.ksys_read
6.57 ± 2% -0.4 6.20 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.prepare_to_wait_event.pipe_read.new_sync_read.vfs_read
6.50 ± 2% -0.4 6.13 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.prepare_to_wait_event.pipe_read.new_sync_read
0.55 ± 8% -0.3 0.26 ±100% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_fork
0.55 ± 8% -0.3 0.26 ±100% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_fork
0.55 ± 8% -0.3 0.26 ±100% perf-profile.calltrace.cycles-pp.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_fork
0.55 ± 8% -0.3 0.26 ±100% perf-profile.calltrace.cycles-pp._do_fork.__do_sys_clone.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_fork
3.23 -0.3 2.94 ± 3% perf-profile.calltrace.cycles-pp.direct_splice_actor.splice_direct_to_actor.do_splice_direct.do_sendfile.__x64_sys_sendfile64
3.19 -0.3 2.90 ± 3% perf-profile.calltrace.cycles-pp.splice_from_pipe.direct_splice_actor.splice_direct_to_actor.do_splice_direct.do_sendfile
2.94 -0.3 2.68 ± 3% perf-profile.calltrace.cycles-pp.__splice_from_pipe.splice_from_pipe.direct_splice_actor.splice_direct_to_actor.do_splice_direct
1.93 ± 2% -0.2 1.78 ± 3% perf-profile.calltrace.cycles-pp.push_pipe.iov_iter_get_pages_alloc.default_file_splice_read.do_splice.__x64_sys_splice
1.75 ± 2% -0.1 1.62 ± 3% perf-profile.calltrace.cycles-pp.xas_load.find_get_entry.pagecache_get_page.generic_file_buffered_read.generic_file_splice_read
1.39 ± 2% -0.1 1.30 ± 3% perf-profile.calltrace.cycles-pp.__alloc_pages_nodemask.push_pipe.iov_iter_get_pages_alloc.default_file_splice_read.do_splice
0.80 ± 2% -0.1 0.72 ± 2% perf-profile.calltrace.cycles-pp.sanity.copy_page_to_iter.generic_file_buffered_read.generic_file_splice_read.splice_direct_to_actor
0.55 ± 3% -0.0 0.53 perf-profile.calltrace.cycles-pp.free_unref_page.__splice_from_pipe.splice_from_pipe.do_splice.__x64_sys_splice
0.58 ± 3% +0.1 0.64 ± 4% perf-profile.calltrace.cycles-pp.__kmalloc_node.iov_iter_get_pages_alloc.default_file_splice_read.do_splice.__x64_sys_splice
1.07 ± 5% +0.1 1.19 ± 3% perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout.copy_page_to_iter.pipe_read.new_sync_read
1.20 ± 4% +0.1 1.32 ± 2% perf-profile.calltrace.cycles-pp.copyout.copy_page_to_iter.pipe_read.new_sync_read.vfs_read
1.54 ± 4% +0.2 1.74 ± 2% perf-profile.calltrace.cycles-pp.copy_page_to_iter.pipe_read.new_sync_read.vfs_read.ksys_read
0.00 +0.6 0.56 ± 4% perf-profile.calltrace.cycles-pp.mutex_lock.do_vmsplice.__do_sys_vmsplice.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.58 ± 14% +0.7 1.33 ± 5% perf-profile.calltrace.cycles-pp.security_file_permission.do_splice.__x64_sys_splice.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.96 ± 2% +0.8 1.72 ± 4% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.splice
0.00 +0.8 0.79 ± 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.vmsplice
0.00 +0.8 0.84 ± 4% perf-profile.calltrace.cycles-pp.__fget_light.__x64_sys_splice.do_syscall_64.entry_SYSCALL_64_after_hwframe.splice
0.14 ±173% +0.8 0.98 ± 7% perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.do_splice.__x64_sys_splice.do_syscall_64
0.00 +0.9 0.87 ± 2% perf-profile.calltrace.cycles-pp._copy_from_user.rw_copy_check_uvector.import_iovec.__do_sys_vmsplice.do_syscall_64
0.00 +0.9 0.89 ± 3% perf-profile.calltrace.cycles-pp.mutex_lock.splice_from_pipe.do_splice.__x64_sys_splice.do_syscall_64
0.00 +1.3 1.28 ± 2% perf-profile.calltrace.cycles-pp.rw_copy_check_uvector.import_iovec.__do_sys_vmsplice.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +1.5 1.53 ± 2% perf-profile.calltrace.cycles-pp.import_iovec.__do_sys_vmsplice.do_syscall_64.entry_SYSCALL_64_after_hwframe.vmsplice
0.00 +1.9 1.94 ± 3% perf-profile.calltrace.cycles-pp.gup_pgd_range.internal_get_user_pages_fast.iov_iter_get_pages.iter_to_pipe.do_vmsplice
23.74 +3.0 26.75 ± 3% perf-profile.calltrace.cycles-pp.splice
11.58 -11.6 0.00 perf-profile.children.cycles-pp.get_user_pages_unlocked
11.26 -11.3 0.00 perf-profile.children.cycles-pp.__get_user_pages
11.17 -11.2 0.00 perf-profile.children.cycles-pp.follow_page_pte
11.64 -9.3 2.32 ± 3% perf-profile.children.cycles-pp.internal_get_user_pages_fast
11.67 -9.0 2.68 ± 3% perf-profile.children.cycles-pp.iov_iter_get_pages
11.70 -8.0 3.67 ± 3% perf-profile.children.cycles-pp.iter_to_pipe
12.08 -7.0 5.04 ± 3% perf-profile.children.cycles-pp.do_vmsplice
7.09 -6.4 0.69 ± 4% perf-profile.children.cycles-pp.mark_page_accessed
12.16 -5.0 7.15 ± 3% perf-profile.children.cycles-pp.__do_sys_vmsplice
5.55 -4.0 1.52 ± 3% perf-profile.children.cycles-pp.page_cache_pipe_buf_release
11.81 -3.2 8.62 ± 3% perf-profile.children.cycles-pp.__splice_from_pipe
86.84 -2.7 84.19 perf-profile.children.cycles-pp.do_syscall_64
12.27 -2.5 9.74 ± 3% perf-profile.children.cycles-pp.vmsplice
87.27 -2.2 85.05 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
13.00 -2.1 10.91 ± 3% perf-profile.children.cycles-pp.splice_from_pipe
8.60 ± 7% -0.8 7.80 ± 5% perf-profile.children.cycles-pp.start_secondary
16.27 -0.6 15.63 perf-profile.children.cycles-pp.autoremove_wake_function
6.85 ± 2% -0.4 6.46 ± 2% perf-profile.children.cycles-pp.prepare_to_wait_event
3.24 -0.3 2.95 ± 3% perf-profile.children.cycles-pp.direct_splice_actor
0.57 ± 2% -0.2 0.38 ± 4% perf-profile.children.cycles-pp.page_cache_pipe_buf_confirm
1.94 ± 2% -0.1 1.79 ± 2% perf-profile.children.cycles-pp.push_pipe
1.80 ± 2% -0.1 1.67 ± 3% perf-profile.children.cycles-pp.xas_load
0.20 ± 7% -0.1 0.10 ± 19% perf-profile.children.cycles-pp.kthread
0.21 ± 6% -0.1 0.12 ± 17% perf-profile.children.cycles-pp.ret_from_fork
0.86 ± 2% -0.1 0.77 ± 2% perf-profile.children.cycles-pp.sanity
0.16 ± 5% -0.1 0.09 ± 26% perf-profile.children.cycles-pp.memcpy_erms
0.15 ± 4% -0.1 0.09 ± 26% perf-profile.children.cycles-pp.drm_fb_helper_dirty_work
0.36 -0.1 0.30 perf-profile.children.cycles-pp.alloc_pages_current
1.89 -0.1 1.83 perf-profile.children.cycles-pp.__alloc_pages_nodemask
0.16 ± 5% -0.1 0.10 ± 23% perf-profile.children.cycles-pp.worker_thread
0.16 ± 5% -0.1 0.10 ± 23% perf-profile.children.cycles-pp.process_one_work
0.56 ± 9% -0.1 0.50 ± 5% perf-profile.children.cycles-pp.__do_sys_clone
0.56 ± 9% -0.1 0.50 ± 5% perf-profile.children.cycles-pp._do_fork
0.60 ± 7% -0.1 0.55 ± 4% perf-profile.children.cycles-pp.__libc_fork
0.52 ± 9% -0.1 0.47 ± 5% perf-profile.children.cycles-pp.copy_process
0.10 ± 5% -0.0 0.07 ± 6% perf-profile.children.cycles-pp.get_task_policy
0.11 -0.0 0.09 ± 8% perf-profile.children.cycles-pp.copy_page_range
0.07 ± 5% -0.0 0.06 ± 7% perf-profile.children.cycles-pp.free_pcp_prepare
0.11 ± 4% -0.0 0.09 ± 4% perf-profile.children.cycles-pp.prep_new_page
0.17 ± 2% +0.0 0.18 perf-profile.children.cycles-pp.scheduler_tick
0.09 ± 4% +0.0 0.10 perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64
0.34 ± 2% +0.0 0.36 perf-profile.children.cycles-pp.__hrtimer_run_queues
0.37 +0.0 0.39 ± 2% perf-profile.children.cycles-pp.current_time
0.05 ± 8% +0.0 0.08 perf-profile.children.cycles-pp.kill_fasync
0.24 ± 5% +0.0 0.27 ± 4% perf-profile.children.cycles-pp.update_cfs_group
0.65 ± 2% +0.0 0.68 perf-profile.children.cycles-pp.smp_apic_timer_interrupt
0.53 ± 3% +0.0 0.56 perf-profile.children.cycles-pp.hrtimer_interrupt
0.60 +0.0 0.64 perf-profile.children.cycles-pp.fsnotify_parent
0.15 ± 12% +0.0 0.19 ± 5% perf-profile.children.cycles-pp.osq_lock
0.26 ± 7% +0.0 0.30 ± 4% perf-profile.children.cycles-pp.__fdget_pos
0.06 ± 11% +0.1 0.12 ± 56% perf-profile.children.cycles-pp.queued_read_lock_slowpath
0.06 ± 6% +0.1 0.13 ± 3% perf-profile.children.cycles-pp.splice_from_pipe_next
0.00 +0.1 0.07 ± 12% perf-profile.children.cycles-pp.__fdget
0.63 ± 3% +0.1 0.70 ± 4% perf-profile.children.cycles-pp.__kmalloc_node
0.05 ± 8% +0.1 0.13 ± 3% perf-profile.children.cycles-pp.pipe_unlock
0.07 ± 10% +0.1 0.15 ± 7% perf-profile.children.cycles-pp.check_stack_object
0.94 ± 3% +0.1 1.02 ± 2% perf-profile.children.cycles-pp.fsnotify
0.06 ± 6% +0.1 0.15 ± 4% perf-profile.children.cycles-pp.pipe_lock
0.00 +0.1 0.11 ± 4% perf-profile.children.cycles-pp.wait_for_space
0.30 ± 2% +0.1 0.41 ± 4% perf-profile.children.cycles-pp.aa_file_perm
0.00 +0.1 0.11 ± 4% perf-profile.children.cycles-pp.__x64_sys_vmsplice
0.21 ± 3% +0.1 0.32 ± 3% perf-profile.children.cycles-pp.pipe_to_null
0.08 ± 5% +0.1 0.20 ± 2% perf-profile.children.cycles-pp.splice_write_null
0.09 ± 13% +0.1 0.21 ± 12% perf-profile.children.cycles-pp.get_pipe_info
0.34 ± 2% +0.1 0.46 ± 3% perf-profile.children.cycles-pp.rcu_all_qs
0.29 ± 5% +0.1 0.41 ± 2% perf-profile.children.cycles-pp.apparmor_file_permission
0.39 +0.1 0.52 perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
1.20 ± 4% +0.1 1.33 ± 2% perf-profile.children.cycles-pp.copyout
0.14 ± 3% +0.1 0.27 ± 3% perf-profile.children.cycles-pp.generic_pipe_buf_confirm
0.32 ± 6% +0.2 0.47 perf-profile.children.cycles-pp.rw_verify_area
4.91 ± 2% +0.2 5.06 perf-profile.children.cycles-pp.copy_page_to_iter
0.15 ± 3% +0.2 0.33 perf-profile.children.cycles-pp.fpregs_assert_state_consistent
0.00 +0.2 0.22 ± 4% perf-profile.children.cycles-pp.add_to_pipe
0.14 ± 3% +0.2 0.36 ± 3% perf-profile.children.cycles-pp.__check_object_size
0.70 +0.2 0.93 perf-profile.children.cycles-pp._cond_resched
0.27 ± 5% +0.2 0.51 ± 3% perf-profile.children.cycles-pp.copy_user_generic_unrolled
1.02 ± 3% +0.3 1.28 ± 3% perf-profile.children.cycles-pp.mutex_unlock
0.18 ± 8% +0.3 0.45 ± 3% perf-profile.children.cycles-pp.iov_iter_advance
1.97 ± 3% +0.3 2.28 ± 2% perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
0.00 +0.3 0.32 ± 3% perf-profile.children.cycles-pp.try_grab_compound_head
0.61 ± 4% +0.4 1.00 perf-profile.children.cycles-pp.__might_sleep
0.29 ± 7% +0.4 0.68 perf-profile.children.cycles-pp.__might_fault
0.69 +0.6 1.34 ± 2% perf-profile.children.cycles-pp.__fget_light
1.83 ± 13% +0.7 2.48 ± 9% perf-profile.children.cycles-pp.common_file_perm
2.10 +0.7 2.79 ± 2% perf-profile.children.cycles-pp.___might_sleep
0.39 ± 3% +0.9 1.26 ± 3% perf-profile.children.cycles-pp._copy_from_user
2.95 ± 9% +0.9 3.85 ± 5% perf-profile.children.cycles-pp.security_file_permission
1.76 ± 5% +1.1 2.87 perf-profile.children.cycles-pp.mutex_lock
1.27 ± 2% +1.2 2.50 perf-profile.children.cycles-pp.syscall_return_via_sysret
0.64 ± 2% +1.3 1.91 ± 3% perf-profile.children.cycles-pp.rw_copy_check_uvector
0.75 ± 2% +1.5 2.23 ± 3% perf-profile.children.cycles-pp.import_iovec
1.54 ± 4% +1.6 3.15 ± 2% perf-profile.children.cycles-pp.entry_SYSCALL_64
0.06 ± 9% +1.9 1.97 ± 3% perf-profile.children.cycles-pp.gup_pgd_range
24.24 +2.5 26.77 ± 3% perf-profile.children.cycles-pp.splice
6.79 -6.1 0.65 ± 5% perf-profile.self.cycles-pp.mark_page_accessed
5.48 -4.0 1.47 ± 3% perf-profile.self.cycles-pp.page_cache_pipe_buf_release
0.54 -0.2 0.35 ± 6% perf-profile.self.cycles-pp.page_cache_pipe_buf_confirm
1.32 ± 3% -0.1 1.18 ± 3% perf-profile.self.cycles-pp.xas_load
0.26 ± 3% -0.1 0.19 ± 2% perf-profile.self.cycles-pp._raw_spin_lock
0.15 ± 5% -0.1 0.09 ± 26% perf-profile.self.cycles-pp.memcpy_erms
0.78 ± 2% -0.1 0.72 ± 2% perf-profile.self.cycles-pp.sanity
0.51 ± 3% -0.0 0.46 ± 3% perf-profile.self.cycles-pp.__alloc_pages_nodemask
0.09 ± 4% -0.0 0.06 perf-profile.self.cycles-pp.get_task_policy
0.14 ± 5% +0.0 0.15 ± 3% perf-profile.self.cycles-pp.current_time
0.14 ± 7% +0.0 0.16 ± 2% perf-profile.self.cycles-pp.new_sync_write
0.05 +0.0 0.08 ± 5% perf-profile.self.cycles-pp.kill_fasync
0.24 ± 5% +0.0 0.27 ± 4% perf-profile.self.cycles-pp.update_cfs_group
0.16 ± 6% +0.0 0.19 ± 2% perf-profile.self.cycles-pp.vfs_write
0.53 ± 2% +0.0 0.56 perf-profile.self.cycles-pp.fsnotify_parent
0.15 ± 12% +0.0 0.19 ± 6% perf-profile.self.cycles-pp.osq_lock
0.21 ± 2% +0.1 0.27 ± 3% perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
0.05 ± 8% +0.1 0.11 ± 4% perf-profile.self.cycles-pp.splice_from_pipe_next
0.06 ± 7% +0.1 0.12 ± 4% perf-profile.self.cycles-pp.check_stack_object
0.91 ± 3% +0.1 0.99 perf-profile.self.cycles-pp.fsnotify
0.01 ±173% +0.1 0.09 ± 11% perf-profile.self.cycles-pp.pipe_lock
0.06 ± 11% +0.1 0.14 ± 17% perf-profile.self.cycles-pp.get_pipe_info
0.25 ± 2% +0.1 0.34 ± 3% perf-profile.self.cycles-pp.rcu_all_qs
0.00 +0.1 0.09 ± 4% perf-profile.self.cycles-pp.pipe_unlock
0.00 +0.1 0.09 ± 4% perf-profile.self.cycles-pp.wait_for_space
0.10 ± 4% +0.1 0.20 ± 5% perf-profile.self.cycles-pp.__might_fault
0.28 ± 2% +0.1 0.38 ± 4% perf-profile.self.cycles-pp.aa_file_perm
0.17 ± 2% +0.1 0.27 ± 2% perf-profile.self.cycles-pp.pipe_to_null
0.00 +0.1 0.10 ± 4% perf-profile.self.cycles-pp.__x64_sys_vmsplice
0.08 ± 5% +0.1 0.18 ± 2% perf-profile.self.cycles-pp.splice_write_null
0.28 +0.1 0.38 ± 2% perf-profile.self.cycles-pp.wakeup_pipe_writers
0.32 ± 2% +0.1 0.43 ± 2% perf-profile.self.cycles-pp._cond_resched
0.27 ± 4% +0.1 0.38 ± 3% perf-profile.self.cycles-pp.apparmor_file_permission
0.26 ± 6% +0.1 0.37 ± 3% perf-profile.self.cycles-pp.wakeup_pipe_readers
0.51 ± 2% +0.1 0.63 ± 5% perf-profile.self.cycles-pp.security_file_permission
0.07 ± 6% +0.1 0.20 ± 4% perf-profile.self.cycles-pp.__check_object_size
0.08 +0.1 0.21 ± 5% perf-profile.self.cycles-pp._copy_from_user
0.29 ± 6% +0.1 0.43 perf-profile.self.cycles-pp.rw_verify_area
0.12 ± 3% +0.1 0.26 ± 4% perf-profile.self.cycles-pp.generic_pipe_buf_confirm
0.15 ± 2% +0.2 0.32 perf-profile.self.cycles-pp.fpregs_assert_state_consistent
0.11 ± 6% +0.2 0.32 ± 2% perf-profile.self.cycles-pp.import_iovec
0.00 +0.2 0.21 ± 3% perf-profile.self.cycles-pp.do_vmsplice
0.00 +0.2 0.21 ± 4% perf-profile.self.cycles-pp.add_to_pipe
3.05 ± 2% +0.2 3.27 ± 3% perf-profile.self.cycles-pp.find_get_entry
0.13 +0.2 0.35 ± 3% perf-profile.self.cycles-pp.rw_copy_check_uvector
0.26 ± 4% +0.2 0.49 ± 2% perf-profile.self.cycles-pp.copy_user_generic_unrolled
0.99 ± 3% +0.2 1.24 ± 3% perf-profile.self.cycles-pp.mutex_unlock
0.30 ± 2% +0.2 0.55 ± 4% perf-profile.self.cycles-pp.__x64_sys_splice
0.18 ± 6% +0.3 0.43 ± 3% perf-profile.self.cycles-pp.iov_iter_advance
0.00 +0.3 0.29 ± 4% perf-profile.self.cycles-pp.__do_sys_vmsplice
1.95 ± 3% +0.3 2.24 ± 2% perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
0.00 +0.3 0.31 ± 3% perf-profile.self.cycles-pp.try_grab_compound_head
0.52 ± 4% +0.3 0.86 perf-profile.self.cycles-pp.__might_sleep
0.00 +0.3 0.35 ± 5% perf-profile.self.cycles-pp.iov_iter_get_pages
0.00 +0.4 0.36 ± 2% perf-profile.self.cycles-pp.internal_get_user_pages_fast
0.58 ± 2% +0.4 0.97 perf-profile.self.cycles-pp.do_syscall_64
1.06 ± 6% +0.4 1.46 ± 2% perf-profile.self.cycles-pp.mutex_lock
0.54 ± 2% +0.4 0.98 ± 5% perf-profile.self.cycles-pp.do_splice
0.42 ± 2% +0.5 0.88 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.67 ± 3% +0.5 1.14 ± 2% perf-profile.self.cycles-pp.splice
0.00 +0.5 0.49 ± 3% perf-profile.self.cycles-pp.iter_to_pipe
0.49 ± 3% +0.5 0.98 ± 2% perf-profile.self.cycles-pp.splice_from_pipe
1.54 ± 2% +0.5 2.04 ± 2% perf-profile.self.cycles-pp.__splice_from_pipe
0.00 +0.5 0.54 ± 3% perf-profile.self.cycles-pp.vmsplice
1.52 ± 16% +0.5 2.06 ± 11% perf-profile.self.cycles-pp.common_file_perm
0.67 +0.6 1.30 ± 2% perf-profile.self.cycles-pp.__fget_light
2.02 +0.7 2.68 ± 2% perf-profile.self.cycles-pp.___might_sleep
0.76 ± 3% +0.8 1.56 ± 2% perf-profile.self.cycles-pp.entry_SYSCALL_64
1.26 ± 2% +1.2 2.48 perf-profile.self.cycles-pp.syscall_return_via_sysret
0.05 ± 8% +1.5 1.58 ± 3% perf-profile.self.cycles-pp.gup_pgd_range





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


Attachments:
config-5.7.0-01306-g17839856fd588 (206.20 kB)
job-script (7.85 kB)
job.yaml (5.54 kB)
reproduce (400.00 B)

2020-06-11 20:28:44

by Linus Torvalds

Subject: Re: [gup] 17839856fd: stress-ng.vm-splice.ops_per_sec 2158.6% improvement

On Wed, Jun 10, 2020 at 9:05 PM kernel test robot <[email protected]> wrote:
>
> FYI, we noticed a 2158.6% improvement of stress-ng.vm-splice.ops_per_sec due to commit:
>
> commit: 17839856fd588f4ab6b789f482ed3ffd7c403e1f ("gup: document and work around "COW can break either way" issue")

Well, that is amusing, and seeing improvements is always nice, but
somehow I think the test is broken.

I can't see why you'd ever see an improvement from that commit, and if
you do see one, not one by a factor of 20x.

> In addition to that, the commit also has significant impact on the following tests:
>
> | testcase: change | stress-ng: stress-ng.vm-splice.ops_per_sec 372.8% improvement |
> | testcase: change | stress-ng: stress-ng.vm-splice.ops_per_sec 2052.7% improvement |

Ok, so it's affecting other runs of the same test, and the smaller
("only" 378%) improvement seems to be just from using fewer threads.

So maybe forcing the COW ends up avoiding some very specific cache
thrashing case.

> To reproduce:
>
> git clone https://github.com/intel/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml # job file is attached in this email
> bin/lkp run job.yaml

Is there some place where you'd actually _see_ what
"stress-ng.vm-splice.ops_per_sec" actually means and does?

Yeah, I can go and find the actual stress-ng git repo, and take a
look. I kind of did. But the step from your "to reproduce" to actually
figuring out what is going on is pretty big.

It would be nice to know what it actually does - do you have a
database of descriptions for the different tests and how to run them
individually or anything like that?

IOW, rather than the above "just run all of the lkp scripts",
something like how to run the actual individual test would be good.

IOW how do those yaml files translate into _actually_ running the
'stress-ng' program?

Linus

2020-06-12 00:13:35

by Philip Li

Subject: Re: [LKP] Re: [gup] 17839856fd: stress-ng.vm-splice.ops_per_sec 2158.6% improvement

On Thu, Jun 11, 2020 at 01:24:09PM -0700, Linus Torvalds wrote:
> On Wed, Jun 10, 2020 at 9:05 PM kernel test robot <[email protected]> wrote:
> >
> > FYI, we noticed a 2158.6% improvement of stress-ng.vm-splice.ops_per_sec due to commit:
> >
> > commit: 17839856fd588f4ab6b789f482ed3ffd7c403e1f ("gup: document and work around "COW can break either way" issue")
>
> Well, that is amusing, and seeing improvements is always nice, but
> somehow I think the test is broken.
>
> I can't see why you'd ever see an improvement from that commit, and if
> you do see one, not one by a factor of 20x.
Got it, we will double check again to make sure we go through the data
and that it can be reproduced in our environment before sending out the report.

>
> > In addition to that, the commit also has significant impact on the following tests:
> >
> > | testcase: change | stress-ng: stress-ng.vm-splice.ops_per_sec 372.8% improvement |
> > | testcase: change | stress-ng: stress-ng.vm-splice.ops_per_sec 2052.7% improvement |
>
> Ok, so it's affecting other runs of the same test, and the smaller
> ("only" 378%) improvement seems to be just from using fewer threads.
>
> So maybe forcing the COW ends up avoiding some very specific cache
> thrashing case.
>
> > To reproduce:
> >
> > git clone https://github.com/intel/lkp-tests.git
> > cd lkp-tests
> > bin/lkp install job.yaml # job file is attached in this email
> > bin/lkp run job.yaml
>
> Is there some place where you'd actually _see_ what
> "stress-ng.vm-splice.ops_per_sec" actually means and does?
>
> Yeah, I can go and find the actual stress-ng git repo, and take a
> look. I kind of did. But the step from your "to reproduce" to actually
> figuring out what is going on is pretty big.
>
> It would be nice to know what it actually does - do you have a
> database of descriptions for the different tests and how to run them
> individually or anything like that?
Hi Linus, it is currently embedded in different scripts, like tests/stress-ng,
but they depend on upper-level scripts such as "lkp run" to call them and pass
in the parameters from job.yaml, so they only provide some basic information.
Meanwhile, we also generate a reproduce script for the test run; I have added
more info in the reply below, kindly check.

>
> IOW, rather than the above "just run all of the lkp scripts",
> something like how to run the actual individual test would be good.
>
> IOW how do those yaml files translate into _actually_ running the
> 'stress-ng' program?
Thanks Linus for your feedback; we will improve this to provide
clearer reproduction information.

In the attachment, there's a reproduce script with the content below,
which is another way to run stress-ng directly. It shows which
parameters we were using when we generated this report. Would you mind
having a look?

for cpu_dir in /sys/devices/system/cpu/cpu[0-9]*
do
	online_file="$cpu_dir"/online
	[ -f "$online_file" ] && [ "$(cat "$online_file")" -eq 0 ] && continue

	file="$cpu_dir"/cpufreq/scaling_governor
	[ -f "$file" ] && echo "performance" > "$file"
done

"stress-ng" "--timeout" "30" "--times" "--verify" "--metrics-brief" "--sequential" "96" "--class" "pipe" "--exclude" "spawn,exec,swap"

Thanks

>
> Linus

2020-06-15 07:19:14

by Feng Tang

Subject: Re: [LKP] Re: [gup] 17839856fd: stress-ng.vm-splice.ops_per_sec 2158.6% improvement

Hi Linus,

On Fri, Jun 12, 2020 at 08:07:32AM +0800, Philip Li wrote:
> On Thu, Jun 11, 2020 at 01:24:09PM -0700, Linus Torvalds wrote:
> >
> > IOW, rather than the above "just run all of the lkp scripts",
> > something like how to run the actual individual test would be good.
> >
> > IOW how do those yaml files translate into _actually_ running the
> > 'stress-ng' program?
> Thanks Linus for your feedback; we will improve this to provide
> clearer reproduction information.
>
> In the attachment, there's a reproduce script with the content below,
> which is another way to run stress-ng directly. It shows which
> parameters we were using when we generated this report. Would you mind
> having a look?
>
> for cpu_dir in /sys/devices/system/cpu/cpu[0-9]*
> do
> 	online_file="$cpu_dir"/online
> 	[ -f "$online_file" ] && [ "$(cat "$online_file")" -eq 0 ] && continue
>
> 	file="$cpu_dir"/cpufreq/scaling_governor
> 	[ -f "$file" ] && echo "performance" > "$file"
> done
>
> "stress-ng" "--timeout" "30" "--times" "--verify" "--metrics-brief" "--sequential" "96" "--class" "pipe" "--exclude" "spawn,exec,swap"

If you want to run only the "vm-splice" case, you can use the "exclude"
option like:

stress-ng --timeout 20 --times --verify --metrics-brief --sequential NR_CPUS --class pipe --exclude="fifo,pipe,sendfile,splice,tee"
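
To give a rough idea of what the vm-splice stressor itself does: it boils
down to vmsplice()ing a user buffer into a pipe and then splice()ing the
pipe out to /dev/null, which matches the vmsplice/splice call chains in the
profiles below. A minimal, simplified C sketch (not the actual stress-ng
source; buffer size and iteration count here are arbitrary) would be:

/* sketch only: vmsplice user pages into a pipe, then drain to /dev/null */
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
	size_t len = 64 * 1024;			/* one default pipe worth */
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	int pfd[2];
	int null_fd = open("/dev/null", O_WRONLY);

	memset(buf, 0xaa, len);			/* fault the pages in */
	pipe(pfd);

	for (long i = 0; i < 1000000; i++) {
		/* GUP of the user pages happens here */
		vmsplice(pfd[1], &iov, 1, 0);
		/* drain the pipe so the next vmsplice() does not block */
		splice(pfd[0], NULL, null_fd, NULL, len, 0);
	}
	return 0;
}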


Also, we've used the 0day framework to capture some perf info, pasted
below; hope it provides some hints. The biggest change is that
"__get_user_pages" is almost gone from the perf samples:

* perf profile compare

67.21 -67.2 0.00 pp.bt.get_user_pages_unlocked.internal_get_user_pages_fast.iov_iter_get_pages.iter_to_pipe.do_vmsplice
65.22 -65.2 0.00 pp.bt.__get_user_pages.get_user_pages_unlocked.internal_get_user_pages_fast.iov_iter_get_pages.iter_to_pipe
64.74 -64.7 0.00 pp.bt.follow_page_pte.__get_user_pages.get_user_pages_unlocked.internal_get_user_pages_fast.iov_iter_get_pages
67.58 -54.0 13.60 pp.bt.internal_get_user_pages_fast.iov_iter_get_pages.iter_to_pipe.do_vmsplice.__do_sys_vmsplice
67.67 -51.8 15.83 pp.bt.iov_iter_get_pages.iter_to_pipe.do_vmsplice.__do_sys_vmsplice.do_syscall_64
67.86 -46.3 21.56 pp.bt.iter_to_pipe.do_vmsplice.__do_sys_vmsplice.do_syscall_64.entry_SYSCALL_64_after_hwframe
70.00 -40.7 29.30 pp.bt.do_vmsplice.__do_sys_vmsplice.do_syscall_64.entry_SYSCALL_64_after_hwframe.vmsplice
37.33 -37.3 0.00 pp.bt.mark_page_accessed.follow_page_pte.__get_user_pages.get_user_pages_unlocked.internal_get_user_pages_fast

* breakdown of __do_sys_vmsplice

- old commit (f359287765):

  - 70.42%  0.06%  stress-ng-vm-sp  [kernel.kallsyms]  [k] __do_sys_vmsplice
     - 70.36% __do_sys_vmsplice
        - 69.95% do_vmsplice.part.0
           - 67.80% iter_to_pipe
              - 67.61% iov_iter_get_pages
                 - 67.52% internal_get_user_pages_fast
                    - 67.15% get_user_pages_unlocked
                       - 65.17% __get_user_pages
                          - 64.68% follow_page_pte
                             - 37.36% mark_page_accessed
                                  1.31% workingset_activation
                               26.77% try_grab_page
                         1.37% up_read
                         0.54% down_read
             1.03% mutex_unlock
             0.72% wakeup_pipe_readers

- new commit (17839856fd):

  - 41.93%  1.67%  stress-ng-vm-sp  [kernel.kallsyms]  [k] __do_sys_vmsplice
     - 40.26% __do_sys_vmsplice
        - 29.49% do_vmsplice.part.0
           - 21.50% iter_to_pipe
              - 15.71% iov_iter_get_pages
                 - 13.59% internal_get_user_pages_fast
                    - 11.52% gup_pgd_range
                         1.83% try_grab_compound_head
                1.67% iov_iter_advance
                1.22% add_to_pipe
           - 3.46% mutex_lock
                1.04% ___might_sleep
                0.75% __might_sleep
             1.42% wakeup_pipe_readers
             1.14% mutex_unlock
        - 9.01% import_iovec
           - 7.83% rw_copy_check_uvector
              - 5.24% _copy_from_user
                 - 2.14% __might_fault
                      1.07% ___might_sleep
                      0.56% __might_sleep
                   1.20% copy_user_generic_unrolled
                   0.84% copy_user_enhanced_fast_string
              - 1.32% __check_object_size
                   0.53% check_stack_object
          1.31% __fget_light

* detailed compare report

4.43 -99.2% 0.03 perf-stat.i.MPKI
3.163e+09 +1497.9% 5.055e+10 perf-stat.i.branch-instructions
1.16 -0.2 0.94 perf-stat.i.branch-miss-rate%
36601352 +1179.3% 4.682e+08 perf-stat.i.branch-misses
32.17 -21.8 10.33 perf-stat.i.cache-miss-rate%
23170333 -97.2% 653650 perf-stat.i.cache-misses
71808397 -92.2% 5636462 perf-stat.i.cache-references
1537 +0.5% 1546 perf-stat.i.context-switches
16.19 -94.0% 0.96 perf-stat.i.cpi
96108 -0.1% 95989 perf-stat.i.cpu-clock
2.636e+11 -4.8% 2.51e+11 perf-stat.i.cpu-cycles
126.06 +19.5% 150.69 perf-stat.i.cpu-migrations
11310 +5322.6% 613322 perf-stat.i.cycles-between-cache-misses
0.00 -0.0 0.00 perf-stat.i.dTLB-load-miss-rate%
47789 +162.6% 125478 perf-stat.i.dTLB-load-misses
4.223e+09 +1614.0% 7.237e+10 perf-stat.i.dTLB-loads
0.00 -0.0 0.00 perf-stat.i.dTLB-store-miss-rate%
17758 -38.2% 10983 perf-stat.i.dTLB-store-misses
2.541e+09 +1636.8% 4.414e+10 perf-stat.i.dTLB-stores
99.33 +0.6 99.89 perf-stat.i.iTLB-load-miss-rate%
23983228 +1653.0% 4.204e+08 perf-stat.i.iTLB-load-misses
83340 +192.8% 244057 perf-stat.i.iTLB-loads
1.622e+10 +1506.1% 2.605e+11 perf-stat.i.instructions
689.90 -10.0% 620.97 perf-stat.i.instructions-per-iTLB-miss
0.06 +1524.4% 1.04 perf-stat.i.ipc
2.76 -5.5% 2.61 perf-stat.i.metric.GHz
0.87 +67.8% 1.47 perf-stat.i.metric.K/sec
104.92 +1557.3% 1738 perf-stat.i.metric.M/sec
2973 +0.2% 2979 perf-stat.i.minor-faults
97.83 -12.3 85.57 perf-stat.i.node-load-miss-rate%
1777209 -94.8% 91899 perf-stat.i.node-load-misses
40064 -28.5% 28636 perf-stat.i.node-loads
98.94 -11.7 87.22 perf-stat.i.node-store-miss-rate%
1950431 -97.9% 40800 perf-stat.i.node-store-misses
14881 -17.0% 12357 perf-stat.i.node-stores
2973 +0.2% 2979 perf-stat.i.page-faults
96108 -0.1% 95989 perf-stat.i.task-clock
4.44 -99.5% 0.02 perf-stat.overall.MPKI
1.13 -0.2 0.93 perf-stat.overall.branch-miss-rate%
32.29 -20.7 11.60 perf-stat.overall.cache-miss-rate%
16.30 -94.1% 0.96 perf-stat.overall.cpi
11380 +3265.0% 382951 perf-stat.overall.cycles-between-cache-misses
0.00 -0.0 0.00 perf-stat.overall.dTLB-load-miss-rate%
0.00 -0.0 0.00 perf-stat.overall.dTLB-store-miss-rate%
99.68 +0.3 99.94 perf-stat.overall.iTLB-load-miss-rate%
673.00 -7.9% 619.63 perf-stat.overall.instructions-per-iTLB-miss
0.06 +1591.6% 1.04 perf-stat.overall.ipc
97.65 -21.6 76.08 perf-stat.overall.node-load-miss-rate%
99.29 -22.6 76.65 perf-stat.overall.node-store-miss-rate%
3.114e+09 +1502.8% 4.991e+10 perf-stat.ps.branch-instructions
35311356 +1209.5% 4.624e+08 perf-stat.ps.branch-misses
22873359 -97.2% 647182 perf-stat.ps.cache-misses
70844364 -92.1% 5580624 perf-stat.ps.cache-references
1488 +2.6% 1527 perf-stat.ps.context-switches
94719 +0.1% 94804 perf-stat.ps.cpu-clock
2.603e+11 -4.8% 2.478e+11 perf-stat.ps.cpu-cycles
116.79 +27.4% 148.85 perf-stat.ps.cpu-migrations
46047 +169.3% 124021 perf-stat.ps.dTLB-load-misses
4.163e+09 +1616.8% 7.147e+10 perf-stat.ps.dTLB-loads
17216 -36.8% 10874 perf-stat.ps.dTLB-store-misses
2.508e+09 +1637.9% 4.359e+10 perf-stat.ps.dTLB-stores
23732007 +1649.3% 4.152e+08 perf-stat.ps.iTLB-load-misses
76010 +217.5% 241345 perf-stat.ps.iTLB-loads
1.597e+10 +1510.6% 2.572e+11 perf-stat.ps.instructions
2873 +2.7% 2951 perf-stat.ps.minor-faults
1752026 -94.8% 90903 perf-stat.ps.node-load-misses
42165 -32.2% 28583 perf-stat.ps.node-loads
1926258 -97.9% 40352 perf-stat.ps.node-store-misses
13783 -10.8% 12289 perf-stat.ps.node-stores
2873 +2.7% 2951 perf-stat.ps.page-faults
94719 +0.1% 94804 perf-stat.ps.task-clock
1.283e+12 +1529.3% 2.091e+13 perf-stat.total.instructions
67.21 -67.2 0.00 pp.bt.get_user_pages_unlocked.internal_get_user_pages_fast.iov_iter_get_pages.iter_to_pipe.do_vmsplice
65.22 -65.2 0.00 pp.bt.__get_user_pages.get_user_pages_unlocked.internal_get_user_pages_fast.iov_iter_get_pages.iter_to_pipe
64.74 -64.7 0.00 pp.bt.follow_page_pte.__get_user_pages.get_user_pages_unlocked.internal_get_user_pages_fast.iov_iter_get_pages
67.58 -54.0 13.60 pp.bt.internal_get_user_pages_fast.iov_iter_get_pages.iter_to_pipe.do_vmsplice.__do_sys_vmsplice
67.67 -51.8 15.83 pp.bt.iov_iter_get_pages.iter_to_pipe.do_vmsplice.__do_sys_vmsplice.do_syscall_64
67.86 -46.3 21.56 pp.bt.iter_to_pipe.do_vmsplice.__do_sys_vmsplice.do_syscall_64.entry_SYSCALL_64_after_hwframe
70.00 -40.7 29.30 pp.bt.do_vmsplice.__do_sys_vmsplice.do_syscall_64.entry_SYSCALL_64_after_hwframe.vmsplice
37.33 -37.3 0.00 pp.bt.mark_page_accessed.follow_page_pte.__get_user_pages.get_user_pages_unlocked.internal_get_user_pages_fast
70.48 -28.4 42.10 pp.bt.__do_sys_vmsplice.do_syscall_64.entry_SYSCALL_64_after_hwframe.vmsplice
26.74 -26.7 0.00 pp.bt.try_grab_page.follow_page_pte.__get_user_pages.get_user_pages_unlocked.internal_get_user_pages_fast
70.62 -25.9 44.71 pp.bt.do_syscall_64.entry_SYSCALL_64_after_hwframe.vmsplice
70.67 -24.6 46.08 pp.bt.entry_SYSCALL_64_after_hwframe.vmsplice
24.98 -22.9 2.11 pp.bt.page_cache_pipe_buf_release.__splice_from_pipe.splice_from_pipe.do_splice.__x64_sys_splice
26.15 -16.0 10.15 pp.bt.__splice_from_pipe.splice_from_pipe.do_splice.__x64_sys_splice.do_syscall_64
71.03 -13.1 57.88 pp.bt.vmsplice
27.50 -8.9 18.57 pp.bt.splice_from_pipe.do_splice.__x64_sys_splice.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.37 -1.4 0.00 pp.bt.up_read.get_user_pages_unlocked.internal_get_user_pages_fast.iov_iter_get_pages.iter_to_pipe
1.31 -1.3 0.00 pp.bt.workingset_activation.mark_page_accessed.follow_page_pte.__get_user_pages.get_user_pages_unlocked
0.54 -0.5 0.00 pp.bt.down_read.get_user_pages_unlocked.internal_get_user_pages_fast.iov_iter_get_pages.iter_to_pipe
27.81 -0.3 27.46 pp.bt.do_splice.__x64_sys_splice.do_syscall_64.entry_SYSCALL_64_after_hwframe.splice
1.03 +0.1 1.16 pp.bt.mutex_unlock.do_vmsplice.__do_sys_vmsplice.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.5 0.52 pp.bt.generic_pipe_buf_confirm.__splice_from_pipe.splice_from_pipe.do_splice.__x64_sys_splice
0.00 +0.5 0.54 pp.bt.fpregs_assert_state_consistent.do_syscall_64.entry_SYSCALL_64_after_hwframe.splice
0.00 +0.5 0.54 pp.bt.check_stack_object.__check_object_size.rw_copy_check_uvector.import_iovec.__do_sys_vmsplice
0.00 +0.6 0.55 pp.bt.aa_file_perm.common_file_perm.security_file_permission.do_splice.__x64_sys_splice
0.00 +0.6 0.56 pp.bt.__might_sleep.__might_fault._copy_from_user.rw_copy_check_uvector.import_iovec
0.00 +0.6 0.61 pp.bt.__x64_sys_vmsplice.do_syscall_64.entry_SYSCALL_64_after_hwframe.vmsplice
0.00 +0.6 0.64 pp.bt.splice_write_null.do_splice.__x64_sys_splice.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.7 0.67 pp.bt.apparmor_file_permission.security_file_permission.do_splice.__x64_sys_splice.do_syscall_64
0.81 +0.7 1.50 pp.bt.wakeup_pipe_writers.__splice_from_pipe.splice_from_pipe.do_splice.__x64_sys_splice
0.72 +0.7 1.44 pp.bt.wakeup_pipe_readers.do_vmsplice.__do_sys_vmsplice.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.7 0.73 pp.bt.__might_sleep.mutex_lock.splice_from_pipe.do_splice.__x64_sys_splice
0.00 +0.8 0.75 pp.bt.__might_sleep.mutex_lock.do_vmsplice.__do_sys_vmsplice.do_syscall_64
0.00 +0.8 0.80 pp.bt.pipe_to_null.__splice_from_pipe.splice_from_pipe.do_splice.__x64_sys_splice
0.00 +0.8 0.85 pp.bt.copy_user_enhanced_fast_string._copy_from_user.rw_copy_check_uvector.import_iovec.__do_sys_vmsplice
0.00 +0.9 0.95 pp.bt.rw_verify_area.do_splice.__x64_sys_splice.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +1.0 0.97 pp.bt.___might_sleep.__splice_from_pipe.splice_from_pipe.do_splice.__x64_sys_splice
0.00 +1.1 1.05 pp.bt.___might_sleep.mutex_lock.do_vmsplice.__do_sys_vmsplice.do_syscall_64
0.00 +1.1 1.08 pp.bt.___might_sleep.__might_fault._copy_from_user.rw_copy_check_uvector.import_iovec
0.00 +1.1 1.08 pp.bt.___might_sleep.mutex_lock.splice_from_pipe.do_splice.__x64_sys_splice
0.00 +1.1 1.11 pp.bt.mutex_unlock.splice_from_pipe.do_splice.__x64_sys_splice.do_syscall_64
0.00 +1.2 1.21 pp.bt.copy_user_generic_unrolled._copy_from_user.rw_copy_check_uvector.import_iovec.__do_sys_vmsplice
0.00 +1.2 1.23 pp.bt.__check_object_size.rw_copy_check_uvector.import_iovec.__do_sys_vmsplice.do_syscall_64
0.00 +1.2 1.23 pp.bt.add_to_pipe.iter_to_pipe.do_vmsplice.__do_sys_vmsplice.do_syscall_64
0.00 +1.3 1.33 pp.bt.__fget_light.__do_sys_vmsplice.do_syscall_64.entry_SYSCALL_64_after_hwframe.vmsplice
0.00 +1.4 1.38 pp.bt.syscall_return_via_sysret.splice
0.00 +1.4 1.40 pp.bt.syscall_return_via_sysret.vmsplice
0.00 +1.7 1.69 pp.bt.iov_iter_advance.iter_to_pipe.do_vmsplice.__do_sys_vmsplice.do_syscall_64
0.00 +1.9 1.85 pp.bt.try_grab_compound_head.gup_pgd_range.internal_get_user_pages_fast.iov_iter_get_pages.iter_to_pipe
0.00 +2.1 2.11 pp.bt.__might_fault._copy_from_user.rw_copy_check_uvector.import_iovec.__do_sys_vmsplice
0.00 +2.3 2.33 pp.bt.__fget_light.__x64_sys_splice.do_syscall_64.entry_SYSCALL_64_after_hwframe.splice
0.77 +2.6 3.37 pp.bt.mutex_lock.splice_from_pipe.do_splice.__x64_sys_splice.do_syscall_64
0.00 +3.1 3.07 pp.bt.common_file_perm.security_file_permission.do_splice.__x64_sys_splice.do_syscall_64
0.00 +3.3 3.33 pp.bt.mutex_lock.do_vmsplice.__do_sys_vmsplice.do_syscall_64.entry_SYSCALL_64_after_hwframe
27.95 +3.8 31.74 pp.bt.__x64_sys_splice.do_syscall_64.entry_SYSCALL_64_after_hwframe.splice
0.00 +4.5 4.50 pp.bt.security_file_permission.do_splice.__x64_sys_splice.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +5.2 5.17 pp.bt._copy_from_user.rw_copy_check_uvector.import_iovec.__do_sys_vmsplice.do_syscall_64
28.02 +5.6 33.60 pp.bt.do_syscall_64.entry_SYSCALL_64_after_hwframe.splice
28.07 +6.9 34.94 pp.bt.entry_SYSCALL_64_after_hwframe.splice
0.00 +6.9 6.93 pp.bt.entry_SYSCALL_64.vmsplice
0.00 +7.0 7.05 pp.bt.entry_SYSCALL_64.splice
0.00 +7.7 7.68 pp.bt.rw_copy_check_uvector.import_iovec.__do_sys_vmsplice.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +9.1 9.13 pp.bt.import_iovec.__do_sys_vmsplice.do_syscall_64.entry_SYSCALL_64_after_hwframe.vmsplice
0.00 +11.5 11.48 pp.bt.gup_pgd_range.internal_get_user_pages_fast.iov_iter_get_pages.iter_to_pipe.do_vmsplice
28.44 +18.2 46.63 pp.bt.splice
67.21 -67.2 0.00 pp.child.get_user_pages_unlocked
65.24 -65.2 0.00 pp.child.__get_user_pages
64.75 -64.8 0.00 pp.child.follow_page_pte
67.59 -53.8 13.78 pp.child.internal_get_user_pages_fast
67.67 -51.7 15.94 pp.child.iov_iter_get_pages
67.87 -46.1 21.79 pp.child.iter_to_pipe
70.02 -40.2 29.85 pp.child.do_vmsplice
37.40 -37.4 0.00 pp.child.mark_page_accessed
70.49 -28.1 42.37 pp.child.__do_sys_vmsplice
26.80 -26.8 0.00 pp.child.try_grab_page
25.03 -22.9 2.11 pp.child.page_cache_pipe_buf_release
98.98 -20.5 78.53 pp.child.do_syscall_64
99.07 -17.8 81.22 pp.child.entry_SYSCALL_64_after_hwframe
26.17 -15.5 10.65 pp.child.__splice_from_pipe
71.12 -13.2 57.90 pp.child.vmsplice
27.51 -8.8 18.73 pp.child.splice_from_pipe
1.37 -1.4 0.00 pp.child.up_read
1.31 -1.3 0.00 pp.child.workingset_activation
0.54 -0.5 0.00 pp.child.down_read
0.40 -0.4 0.00 pp.child._raw_spin_lock
0.18 -0.2 0.00 pp.child.__libc_start_main
0.18 -0.2 0.00 pp.child.main
0.17 -0.2 0.00 pp.child.ksys_write
0.17 -0.2 0.00 pp.child.vfs_write
0.17 -0.2 0.00 pp.child.new_sync_write
0.16 -0.2 0.00 pp.child.cmd_record
0.16 -0.2 0.00 pp.child.perf_mmap__push
0.16 -0.2 0.00 pp.child.__GI___libc_write
0.16 -0.2 0.00 pp.child.generic_file_write_iter
0.16 -0.2 0.00 pp.child.__generic_file_write_iter
0.16 -0.2 0.00 pp.child.generic_perform_write
0.12 -0.1 0.00 pp.child.iov_iter_copy_from_user_atomic
0.12 -0.1 0.00 pp.child.copyin
0.11 -0.1 0.00 pp.child.find_extend_vma
0.11 -0.1 0.00 pp.child.ret_from_fork
0.10 -0.1 0.00 pp.child.kthread
0.10 -0.1 0.00 pp.child.follow_page_mask
0.09 -0.1 0.00 pp.child.follow_pmd_mask
0.06 -0.1 0.00 pp.child.page_fault
0.06 -0.1 0.00 pp.child.do_user_addr_fault
0.06 -0.1 0.00 pp.child.handle_mm_fault
0.06 -0.1 0.00 pp.child.__handle_mm_fault
0.05 -0.1 0.00 pp.child.worker_thread
0.05 -0.1 0.00 pp.child.process_one_work
0.05 -0.1 0.00 pp.child.execve
0.05 -0.1 0.00 pp.child.__x64_sys_execve
0.05 -0.1 0.00 pp.child.__do_execve_file
0.05 -0.1 0.00 pp.child.find_vma
0.05 -0.1 0.00 pp.child.memcpy_erms
0.08 +0.0 0.10 pp.child.task_tick_fair
0.10 +0.0 0.13 pp.child.scheduler_tick
27.82 +0.0 27.85 pp.child.do_splice
0.13 +0.0 0.18 pp.child.tick_sched_handle
0.13 +0.0 0.18 pp.child.update_process_times
0.00 +0.1 0.05 pp.child.perf_mux_hrtimer_handler
0.00 +0.1 0.09 pp.child.ktime_get_update_offsets_now
0.00 +0.1 0.10 pp.child.get_user_pages_fast
0.00 +0.1 0.10 pp.child.vmsplice@plt
0.00 +0.1 0.11 pp.child.vmsplice_type
0.15 +0.1 0.26 pp.child.tick_sched_timer
0.00 +0.1 0.12 pp.child.splice@plt
0.00 +0.1 0.13 pp.child.pud_huge
0.06 +0.1 0.21 pp.child.clockevents_program_event
0.19 +0.2 0.34 pp.child.__hrtimer_run_queues
0.56 +0.2 0.73 pp.child.__irqentry_text_start
0.00 +0.2 0.18 pp.child.kill_fasync
0.05 +0.2 0.25 pp.child.ktime_get
0.00 +0.3 0.26 pp.child.pmd_huge
0.00 +0.3 0.26 pp.child.__fdget
0.37 +0.3 0.69 pp.child.smp_apic_timer_interrupt
0.32 +0.3 0.65 pp.child.hrtimer_interrupt
0.00 +0.3 0.34 pp.child.kfree
0.00 +0.4 0.39 pp.child.splice_from_pipe_next
0.00 +0.4 0.41 pp.child.wait_for_space
0.00 +0.5 0.45 pp.child.pipe_lock
0.00 +0.5 0.46 pp.child.pipe_unlock
0.00 +0.6 0.56 pp.child.aa_file_perm
0.00 +0.6 0.57 pp.child.check_stack_object
0.00 +0.6 0.57 pp.child.get_pipe_info
0.00 +0.6 0.62 pp.child.__x64_sys_vmsplice
0.00 +0.7 0.68 pp.child.rcu_all_qs
0.81 +0.7 1.50 pp.child.wakeup_pipe_writers
0.00 +0.7 0.71 pp.child.apparmor_file_permission
0.72 +0.7 1.44 pp.child.wakeup_pipe_readers
0.00 +0.7 0.73 pp.child.splice_write_null
0.15 +0.8 0.93 pp.child.copy_user_enhanced_fast_string
0.00 +0.8 0.84 pp.child.generic_pipe_buf_confirm
0.00 +0.9 0.87 pp.child.pipe_to_null
0.00 +0.9 0.88 pp.child.__x86_indirect_thunk_rax
1.47 +0.9 2.36 pp.child.mutex_unlock
0.00 +1.0 0.99 pp.child.rw_verify_area
0.08 +1.0 1.13 pp.child.fpregs_assert_state_consistent
0.08 +1.2 1.27 pp.child._cond_resched
0.05 +1.2 1.24 pp.child.add_to_pipe
0.00 +1.4 1.38 pp.child.__check_object_size
0.06 +1.4 1.50 pp.child.copy_user_generic_unrolled
0.05 +1.7 1.71 pp.child.iov_iter_advance
0.00 +1.9 1.85 pp.child.try_grab_compound_head
0.09 +2.0 2.08 pp.child.__might_sleep
0.11 +2.1 2.21 pp.child.__might_fault
0.12 +3.0 3.17 pp.child.common_file_perm
0.13 +3.5 3.66 pp.child.__fget_light
0.32 +3.9 4.20 pp.child.___might_sleep
27.95 +3.9 31.85 pp.child.__x64_sys_splice
0.16 +4.4 4.59 pp.child.security_file_permission
0.23 +5.1 5.36 pp.child._copy_from_user
1.11 +5.9 7.04 pp.child.mutex_lock
0.30 +7.2 7.52 pp.child.syscall_return_via_sysret
0.30 +7.7 7.96 pp.child.rw_copy_check_uvector
0.35 +8.9 9.21 pp.child.import_iovec
0.36 +9.0 9.40 pp.child.entry_SYSCALL_64
0.29 +11.4 11.67 pp.child.gup_pgd_range
28.53 +18.2 46.70 pp.child.splice
35.94 -35.9 0.00 pp.self.mark_page_accessed
26.67 -26.7 0.00 pp.self.try_grab_page
24.93 -22.9 2.04 pp.self.page_cache_pipe_buf_release
1.37 -1.4 0.00 pp.self.up_read
1.31 -1.3 0.00 pp.self.workingset_activation
0.45 -0.5 0.00 pp.self.down_read
0.39 -0.4 0.00 pp.self._raw_spin_lock
0.17 -0.2 0.00 pp.self.follow_page_pte
0.14 -0.1 0.00 pp.self.__get_user_pages
0.09 -0.1 0.00 pp.self.follow_page_mask
0.08 -0.1 0.00 pp.self.follow_pmd_mask
0.06 -0.1 0.00 pp.self.get_user_pages_unlocked
0.06 -0.1 0.00 pp.self.find_extend_vma
0.05 -0.1 0.00 pp.self.memcpy_erms
0.00 +0.1 0.06 pp.self.get_user_pages_fast
0.00 +0.1 0.08 pp.self.pud_huge
0.00 +0.1 0.09 pp.self.ktime_get_update_offsets_now
0.00 +0.1 0.10 pp.self.vmsplice@plt
0.00 +0.1 0.11 pp.self.vmsplice_type
0.00 +0.1 0.12 pp.self.splice@plt
0.00 +0.2 0.17 pp.self.__fdget
0.00 +0.2 0.18 pp.self.kill_fasync
0.00 +0.2 0.20 pp.self.pmd_huge
0.00 +0.2 0.23 pp.self.ktime_get
0.00 +0.3 0.28 pp.self.pipe_lock
0.00 +0.3 0.28 pp.self.kfree
0.00 +0.3 0.29 pp.self.pipe_unlock
0.00 +0.3 0.33 pp.self.splice_from_pipe_next
0.00 +0.3 0.35 pp.self.wait_for_space
0.00 +0.4 0.38 pp.self.__x86_indirect_thunk_rax
0.00 +0.4 0.39 pp.self.get_pipe_info
0.00 +0.5 0.47 pp.self.check_stack_object
0.00 +0.5 0.50 pp.self.aa_file_perm
0.00 +0.5 0.51 pp.self.rcu_all_qs
0.00 +0.5 0.53 pp.self.__might_fault
0.00 +0.6 0.56 pp.self.__x64_sys_vmsplice
0.00 +0.6 0.59 pp.self._cond_resched
0.81 +0.7 1.47 pp.self.wakeup_pipe_writers
0.00 +0.7 0.66 pp.self.splice_write_null
0.00 +0.7 0.66 pp.self.apparmor_file_permission
0.71 +0.7 1.38 pp.self.wakeup_pipe_readers
0.00 +0.7 0.71 pp.self.security_file_permission
0.15 +0.7 0.87 pp.self.copy_user_enhanced_fast_string
1.47 +0.8 2.24 pp.self.mutex_unlock
0.00 +0.8 0.78 pp.self.__check_object_size
0.00 +0.8 0.78 pp.self.generic_pipe_buf_confirm
0.00 +0.8 0.79 pp.self._copy_from_user
0.00 +0.8 0.81 pp.self.pipe_to_null
0.00 +0.9 0.88 pp.self.rw_verify_area
0.08 +1.0 1.09 pp.self.fpregs_assert_state_consistent
0.05 +1.1 1.19 pp.self.add_to_pipe
0.00 +1.2 1.20 pp.self.import_iovec
0.00 +1.3 1.27 pp.self.do_vmsplice
0.06 +1.3 1.41 pp.self.copy_user_generic_unrolled
0.05 +1.3 1.40 pp.self.rw_copy_check_uvector
0.05 +1.5 1.54 pp.self.__x64_sys_splice
0.05 +1.6 1.63 pp.self.iov_iter_advance
0.06 +1.6 1.69 pp.self.__do_sys_vmsplice
0.08 +1.7 1.80 pp.self.__might_sleep
0.00 +1.8 1.80 pp.self.try_grab_compound_head
0.95 +1.8 2.79 pp.self.mutex_lock
0.08 +2.0 2.05 pp.self.iov_iter_get_pages
0.09 +2.0 2.11 pp.self.internal_get_user_pages_fast
0.09 +2.3 2.39 pp.self.do_syscall_64
0.10 +2.5 2.61 pp.self.common_file_perm
0.10 +2.6 2.73 pp.self.entry_SYSCALL_64_after_hwframe
0.09 +2.7 2.75 pp.self.do_splice
0.09 +2.8 2.88 pp.self.iter_to_pipe
0.12 +3.0 3.13 pp.self.splice
0.12 +3.0 3.15 pp.self.splice_from_pipe
0.12 +3.1 3.18 pp.self.vmsplice
0.13 +3.4 3.56 pp.self.__fget_light
0.14 +3.5 3.60 pp.self.__splice_from_pipe
0.31 +3.8 4.07 pp.self.___might_sleep
0.19 +4.6 4.76 pp.self.entry_SYSCALL_64
0.30 +7.1 7.44 pp.self.syscall_return_via_sysret
0.28 +9.2 9.47 pp.self.gup_pgd_range

Thanks,
Feng

2020-06-15 07:37:24

by Jann Horn

[permalink] [raw]
Subject: Re: [gup] 17839856fd: stress-ng.vm-splice.ops_per_sec 2158.6% improvement

On Thu, Jun 11, 2020 at 10:24 PM Linus Torvalds
<[email protected]> wrote:
>
> On Wed, Jun 10, 2020 at 9:05 PM kernel test robot <[email protected]> wrote:
> >
> > FYI, we noticed a 2158.6% improvement of stress-ng.vm-splice.ops_per_sec due to commit:
> >
> > commit: 17839856fd588f4ab6b789f482ed3ffd7c403e1f ("gup: document and work around "COW can break either way" issue")
>
> Well, that is amusing, and seeing improvements is always nice, but
> somehow I think the test is broken.
>
> I can't see why you'd ever see an improvement from that commit, and if
> you do see one, not one by a factor of 20x.

FWIW, if this is the testcase:
<https://kernel.ubuntu.com/git/cking/stress-ng.git/tree/stress-vm-splice.c>

then that testcase is essentially testing how fast vmsplice() is when
called in a loop on an uninitialized mmap() mapping. So before that
commit, I think it will create zeropage PTEs in the first iteration
(and zeropage PTEs are _PAGE_SPECIAL, see do_anonymous_page()). And
get_user_pages_fast() bails out in gup_pte_range() if pte_special().
So that testcase was always hitting the GUP slowpath.
But now the first iteration will force the creation of a normal RW
PTE, so all following iterations can go through the GUP fastpath.
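
Concretely, the pattern the test exercises boils down to roughly the
following; this is a minimal sketch of the loop described above, not the
actual stress-ng source (buffer size, iteration count, the /dev/null sink
and the error handling are made-up assumptions, and partial transfers are
not handled):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
	/* 64 KiB matches the default pipe capacity, so each vmsplice()
	 * fits into the emptied pipe in one go. */
	size_t len = 16 * 4096;
	int pipefd[2];

	/* Anonymous mapping that userspace never writes: before the
	 * commit, GUP only ever saw zero-page (pte_special) PTEs here. */
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int nullfd = open("/dev/null", O_WRONLY);

	if (buf == MAP_FAILED || nullfd < 0 || pipe(pipefd) < 0) {
		perror("setup");
		return 1;
	}

	for (int i = 0; i < 1000000; i++) {
		struct iovec iov = { .iov_base = buf, .iov_len = len };

		/* The kernel pins the backing pages via GUP here. */
		ssize_t n = vmsplice(pipefd[1], &iov, 1, 0);
		if (n < 0) {
			perror("vmsplice");
			return 1;
		}
		/* Drain the pipe into /dev/null so it never stays full. */
		if (splice(pipefd[0], NULL, nullfd, NULL, (size_t)n, 0) < 0) {
			perror("splice");
			return 1;
		}
	}
	return 0;
}

On a kernel without the commit, memset()ing the buffer once before the
loop should have much the same effect: the write faults install normal
RW PTEs, and the later vmsplice() calls can then use the GUP fastpath
as well.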

So in summary I guess the test was just really slow up until now
because it was hitting a slowpath that you wouldn't hit during normal
usage? At least for vmsplice(), writing uninitialized pages doesn't
really make a whole lot of sense...

2020-06-15 08:36:46

by Feng Tang

[permalink] [raw]
Subject: Re: [LKP] Re: [gup] 17839856fd: stress-ng.vm-splice.ops_per_sec 2158.6% improvement

On Mon, Jun 15, 2020 at 09:32:41AM +0200, Jann Horn wrote:
> On Thu, Jun 11, 2020 at 10:24 PM Linus Torvalds
> <[email protected]> wrote:
> >
> > On Wed, Jun 10, 2020 at 9:05 PM kernel test robot <[email protected]> wrote:
> > >
> > > FYI, we noticed a 2158.6% improvement of stress-ng.vm-splice.ops_per_sec due to commit:
> > >
> > > commit: 17839856fd588f4ab6b789f482ed3ffd7c403e1f ("gup: document and work around "COW can break either way" issue")
> >
> > Well, that is amusing, and seeing improvements is always nice, but
> > somehow I think the test is broken.
> >
> > I can't see why you'd ever see an improvement from that commit, and if
> > you do see one, not one by a factor of 20x.
>
> FWIW, if this is the testcase:
> <https://kernel.ubuntu.com/git/cking/stress-ng.git/tree/stress-vm-splice.c>

Yes, functionally this is the same file as the one used by 0day, though
0day uses a stable release tarball.

Thanks,
Feng

> then that testcase is essentially testing how fast vmsplice() is when
> called in a loop on an uninitialized mmap() mapping. So before that
> commit, I think it will create zeropage PTEs in the first iteration
> (and zeropage PTEs are _PAGE_SPECIAL, see do_anonymous_page()). And
> get_user_pages_fast() bails out in gup_pte_range() if pte_special().
> So that testcase was always hitting the GUP slowpath.
> But now the first iteration will force the creation of a normal RW
> PTE, so all following iterations can go through the GUP fastpath.
>
> So in summary I guess the test was just really slow up until now
> because it was hitting a slowpath that you wouldn't hit during normal
> usage? At least for vmsplice(), writing uninitialized pages doesn't
> really make a whole lot of sense...

2020-06-15 19:12:49

by Linus Torvalds

[permalink] [raw]
Subject: Re: [gup] 17839856fd: stress-ng.vm-splice.ops_per_sec 2158.6% improvement

On Mon, Jun 15, 2020 at 12:33 AM Jann Horn <[email protected]> wrote:
>
> So in summary I guess the test was just really slow up until now
> because it was hitting a slowpath that you wouldn't hit during normal
> usage? At least for vmsplice(), writing uninitialized pages doesn't
> really make a whole lot of sense...

Heh. My main worry with that commit was that we'd see the reverse
effect: somebody implicitly depending on the continued sharing of
pages in some odd splice situation, where breaking the COW would
result in a performance degradation.

The fact that it went the other way instead is somewhat ironic.

But good.

I guess I'll just ping the stable people to say "yeah, put that in
stable too". It's a fix, and apparently it _helps_ performance for the
degenerate cases rather than hurting anything.

Linus