Greetings,
FYI, we noticed a -8.8% regression of will-it-scale.per_process_ops due to commit:
commit: 3deb642f0de4c14f37437dd247f9c77839f043f8 ("fs: introduce new ->get_poll_head and ->poll_mask methods")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: will-it-scale
on test machine: 32 threads Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz with 64G memory
with the following parameters:
test: poll2
cpufreq_governor: performance
test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
Details are as below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
gcc-7/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-sb03/poll2/will-it-scale
commit:
9965ed174e ("fs: add new vfs_poll and file_can_poll helpers")
3deb642f0d ("fs: introduce new ->get_poll_head and ->poll_mask methods")
9965ed174e7d3889 3deb642f0de4c14f37437dd247
---------------- --------------------------
%stddev %change %stddev
\ | \
501456 -8.8% 457120 will-it-scale.per_process_ops
244715 -2.3% 238978 will-it-scale.per_thread_ops
0.53 ? 2% +2.4% 0.54 will-it-scale.scalability
310.44 +0.0% 310.44 will-it-scale.time.elapsed_time
310.44 +0.0% 310.44 will-it-scale.time.elapsed_time.max
15775 ? 5% +13.0% 17818 ? 10% will-it-scale.time.involuntary_context_switches
9911 +0.2% 9931 will-it-scale.time.maximum_resident_set_size
17178 +0.2% 17218 will-it-scale.time.minor_page_faults
4096 +0.0% 4096 will-it-scale.time.page_size
806.50 +0.1% 807.00 will-it-scale.time.percent_of_cpu_this_job_got
2330 +0.1% 2332 will-it-scale.time.system_time
175.19 -1.2% 173.14 ? 2% will-it-scale.time.user_time
1648 ? 15% +6.5% 1755 ? 13% will-it-scale.time.voluntary_context_switches
52370410 -6.4% 49024375 will-it-scale.workload
109841 ? 7% +6.4% 116842 ? 12% interrupts.CAL:Function_call_interrupts
25.62 +0.3% 25.69 boot-time.boot
15.04 +0.2% 15.07 boot-time.dhcp
766.39 +0.7% 771.95 boot-time.idle
15.70 +0.2% 15.73 boot-time.kernel_boot
8462 ? 22% +1.6% 8594 ? 33% softirqs.NET_RX
481569 -8.2% 442215 ? 4% softirqs.RCU
759091 -0.4% 756380 softirqs.SCHED
4791321 ? 2% -1.3% 4729715 ? 2% softirqs.TIMER
49.53 -0.0 49.49 mpstat.cpu.idle%
0.00 ? 14% +0.0 0.00 ? 16% mpstat.cpu.iowait%
0.02 ? 68% +0.0 0.07 ?111% mpstat.cpu.soft%
45.39 +0.3 45.70 mpstat.cpu.sys%
5.05 -0.3 4.74 mpstat.cpu.usr%
192.00 +0.0% 192.00 vmstat.memory.buff
1304198 -0.0% 1303934 vmstat.memory.cache
64246863 +0.0% 64247669 vmstat.memory.free
0.00 -100.0% 0.00 vmstat.procs.b
16.00 +0.0% 16.00 vmstat.procs.r
1882 ? 4% +12.3% 2112 ? 7% vmstat.system.cs
32654 +0.2% 32735 vmstat.system.in
0.00 -100.0% 0.00 numa-numastat.node0.interleave_hit
231408 ? 9% -20.9% 183107 ? 36% numa-numastat.node0.local_node
232476 ? 9% -19.7% 186744 ? 36% numa-numastat.node0.numa_hit
1070 ?164% +239.7% 3637 ? 62% numa-numastat.node0.other_node
0.00 -100.0% 0.00 numa-numastat.node1.interleave_hit
402582 ? 5% +12.0% 450977 ? 14% numa-numastat.node1.local_node
407775 ? 5% +11.2% 453631 ? 14% numa-numastat.node1.numa_hit
5193 ? 34% -48.9% 2656 ? 88% numa-numastat.node1.other_node
2026939 ? 2% +6.0% 2149119 ? 2% cpuidle.C1.time
93645 ? 5% +8.9% 101940 ? 9% cpuidle.C1.usage
5600995 ? 4% +3.3% 5783955 cpuidle.C1E.time
27629 ? 4% +7.5% 29709 ? 12% cpuidle.C1E.usage
4580663 +1.6% 4654736 ? 3% cpuidle.C3.time
14432 ? 2% +2.1% 14740 ? 2% cpuidle.C3.usage
0.00 -100.0% 0.00 cpuidle.C6.time
0.00 -100.0% 0.00 cpuidle.C6.usage
4.874e+09 +0.0% 4.876e+09 cpuidle.C7.time
4979280 -0.0% 4978987 cpuidle.C7.usage
20767 ? 2% +4.6% 21732 ? 5% cpuidle.POLL.time
1874 ? 4% +14.4% 2144 ? 14% cpuidle.POLL.usage
1573 -0.0% 1573 turbostat.Avg_MHz
51.07 -0.0 51.04 turbostat.Busy%
3088 +0.1% 3089 turbostat.Bzy_MHz
91251 ? 5% +8.7% 99209 ? 9% turbostat.C1
0.02 +0.0 0.02 turbostat.C1%
27387 ? 4% +7.5% 29434 ? 12% turbostat.C1E
0.06 ? 7% +0.0 0.06 turbostat.C1E%
14343 ? 2% +2.1% 14650 ? 2% turbostat.C3
0.05 ? 9% +0.0 0.05 ? 9% turbostat.C3%
4978589 -0.0% 4978359 turbostat.C7
48.83 +0.0 48.86 turbostat.C7%
21.60 +0.1% 21.63 turbostat.CPU%c1
27.25 +0.0% 27.26 turbostat.CPU%c3
0.07 ? 5% -3.4% 0.07 turbostat.CPU%c7
121.94 +0.9% 123.02 turbostat.CorWatt
54.75 +2.3% 56.00 ? 4% turbostat.CoreTmp
10226202 +0.2% 10250629 turbostat.IRQ
16.61 -0.1% 16.60 turbostat.Pkg%pc2
0.11 ? 35% -4.8% 0.10 ? 23% turbostat.Pkg%pc3
55.00 ? 3% +1.4% 55.75 ? 4% turbostat.PkgTmp
149.27 +0.7% 150.35 turbostat.PkgWatt
2693 +0.0% 2694 turbostat.TSC_MHz
85638 +1.2% 86662 meminfo.Active
85438 +1.2% 86462 meminfo.Active(anon)
48745 +0.2% 48823 meminfo.AnonHugePages
80292 -0.3% 80037 meminfo.AnonPages
1252146 +0.0% 1252312 meminfo.Cached
170618 ? 7% -2.1% 166957 ? 15% meminfo.CmaFree
204800 +0.0% 204800 meminfo.CmaTotal
32955004 -0.0% 32954996 meminfo.CommitLimit
238145 ? 4% -3.0% 231085 ? 2% meminfo.Committed_AS
62914560 +0.0% 62914560 meminfo.DirectMap1G
6091264 -0.1% 6087168 meminfo.DirectMap2M
159920 ? 7% +2.6% 164016 ? 14% meminfo.DirectMap4k
2048 +0.0% 2048 meminfo.Hugepagesize
9854 -0.4% 9816 meminfo.Inactive
9707 -0.4% 9669 meminfo.Inactive(anon)
8421 +0.3% 8448 meminfo.KernelStack
25445 -0.2% 25385 meminfo.Mapped
63912919 +0.0% 63913474 meminfo.MemAvailable
64246806 +0.0% 64247582 meminfo.MemFree
65910008 -0.0% 65909996 meminfo.MemTotal
1221 ? 57% -100.0% 0.00 meminfo.Mlocked
4138 -0.4% 4122 meminfo.PageTables
52017 -0.8% 51577 meminfo.SReclaimable
73138 -0.9% 72470 meminfo.SUnreclaim
15475 ? 3% +3.6% 16025 ? 3% meminfo.Shmem
125156 -0.9% 124048 meminfo.Slab
1236713 -0.0% 1236135 meminfo.Unevictable
3.436e+10 +0.0% 3.436e+10 meminfo.VmallocTotal
4.321e+12 ? 2% +8.0% 4.667e+12 ? 3% perf-stat.branch-instructions
0.27 ? 2% -0.0 0.25 perf-stat.branch-miss-rate%
1.17e+10 ? 5% -0.7% 1.161e+10 ? 4% perf-stat.branch-misses
8.85 ? 5% +0.2 9.05 ? 8% perf-stat.cache-miss-rate%
2.157e+08 ? 6% -2.4% 2.106e+08 ? 4% perf-stat.cache-misses
2.436e+09 ? 3% -3.9% 2.342e+09 ? 8% perf-stat.cache-references
575516 ? 4% +13.6% 653927 ? 8% perf-stat.context-switches
0.79 ? 2% -6.7% 0.74 ? 2% perf-stat.cpi
1.561e+13 +1.9% 1.59e+13 perf-stat.cpu-cycles
9256 -1.5% 9117 ? 4% perf-stat.cpu-migrations
0.73 ? 13% -0.3 0.44 ? 39% perf-stat.dTLB-load-miss-rate%
3.463e+10 ? 15% -30.7% 2.401e+10 ? 34% perf-stat.dTLB-load-misses
4.709e+12 ? 2% +15.7% 5.451e+12 ? 4% perf-stat.dTLB-loads
0.08 ? 59% -0.1 0.02 ? 41% perf-stat.dTLB-store-miss-rate%
2.096e+09 ? 61% -63.9% 7.557e+08 ? 37% perf-stat.dTLB-store-misses
2.745e+12 +20.0% 3.294e+12 ? 3% perf-stat.dTLB-stores
83.18 ? 2% +2.4 85.55 perf-stat.iTLB-load-miss-rate%
2.08e+09 ? 5% -7.5% 1.924e+09 ? 3% perf-stat.iTLB-load-misses
4.216e+08 ? 15% -22.9% 3.251e+08 ? 7% perf-stat.iTLB-loads
1.973e+13 ? 3% +9.2% 2.154e+13 ? 3% perf-stat.instructions
9503 ? 3% +17.9% 11203 ? 4% perf-stat.instructions-per-iTLB-miss
1.26 ? 2% +7.2% 1.35 ? 2% perf-stat.ipc
778803 -0.0% 778741 perf-stat.minor-faults
27.27 ? 5% +0.4 27.63 ? 8% perf-stat.node-load-miss-rate%
20008915 ? 18% -0.7% 19861107 ? 14% perf-stat.node-load-misses
53683432 ? 22% -3.6% 51734389 ? 5% perf-stat.node-loads
21.47 ? 6% +2.5 23.93 ? 9% perf-stat.node-store-miss-rate%
44312543 ? 7% +14.7% 50804799 ? 16% perf-stat.node-store-misses
1.619e+08 -0.9% 1.605e+08 ? 7% perf-stat.node-stores
778804 -0.0% 778753 perf-stat.page-faults
376850 ? 3% +16.6% 439328 ? 3% perf-stat.path-length
21359 +1.2% 21613 proc-vmstat.nr_active_anon
20080 -0.4% 20009 proc-vmstat.nr_anon_pages
1594833 +0.0% 1594853 proc-vmstat.nr_dirty_background_threshold
3193566 +0.0% 3193607 proc-vmstat.nr_dirty_threshold
313080 +0.0% 313125 proc-vmstat.nr_file_pages
42650 ? 7% -2.1% 41736 ? 15% proc-vmstat.nr_free_cma
16061673 +0.0% 16061877 proc-vmstat.nr_free_pages
2426 -0.3% 2418 proc-vmstat.nr_inactive_anon
22991 -0.5% 22885 proc-vmstat.nr_indirectly_reclaimable
8434 +0.2% 8453 proc-vmstat.nr_kernel_stack
6502 -0.2% 6486 proc-vmstat.nr_mapped
304.50 ? 57% -100.0% 0.00 proc-vmstat.nr_mlock
1033 -0.4% 1029 proc-vmstat.nr_page_table_pages
3864 ? 3% +3.7% 4005 ? 3% proc-vmstat.nr_shmem
13003 -0.8% 12893 proc-vmstat.nr_slab_reclaimable
18283 -0.9% 18116 proc-vmstat.nr_slab_unreclaimable
309177 -0.0% 309033 proc-vmstat.nr_unevictable
21431 +1.2% 21694 proc-vmstat.nr_zone_active_anon
2426 -0.3% 2418 proc-vmstat.nr_zone_inactive_anon
309178 -0.0% 309033 proc-vmstat.nr_zone_unevictable
1552 ? 11% +2.8% 1596 ? 6% proc-vmstat.numa_hint_faults
1424 ? 13% +1.7% 1448 ? 7% proc-vmstat.numa_hint_faults_local
663868 +0.4% 666440 proc-vmstat.numa_hit
657596 +0.4% 660136 proc-vmstat.numa_local
6271 +0.5% 6303 proc-vmstat.numa_other
1920 ? 9% +1.1% 1941 ? 5% proc-vmstat.numa_pte_updates
1392 ? 15% +14.5% 1593 ? 11% proc-vmstat.pgactivate
221716 ? 5% +10.7% 245443 ? 16% proc-vmstat.pgalloc_movable
449628 ? 2% -4.6% 428813 ? 9% proc-vmstat.pgalloc_normal
798885 +0.1% 799772 proc-vmstat.pgfault
664270 +0.5% 667803 proc-vmstat.pgfree
60396 ? 20% -5.5% 57070 ? 49% numa-meminfo.node0.Active
60246 ? 20% -5.4% 56970 ? 49% numa-meminfo.node0.Active(anon)
38993 ? 24% -6.4% 36496 ? 57% numa-meminfo.node0.AnonHugePages
56097 ? 19% -1.3% 55376 ? 50% numa-meminfo.node0.AnonPages
627504 ? 4% -1.8% 616208 ? 4% numa-meminfo.node0.FilePages
6886 ? 43% +5.6% 7272 ? 46% numa-meminfo.node0.Inactive
6848 ? 45% +5.1% 7199 ? 47% numa-meminfo.node0.Inactive(anon)
4904 ? 6% -9.8% 4422 ? 4% numa-meminfo.node0.KernelStack
13902 ? 14% -0.3% 13858 ? 14% numa-meminfo.node0.Mapped
32051702 +0.0% 32066302 numa-meminfo.node0.MemFree
32914928 +0.0% 32914928 numa-meminfo.node0.MemTotal
863224 ? 3% -1.7% 848624 numa-meminfo.node0.MemUsed
2517 ? 4% -6.2% 2362 ? 24% numa-meminfo.node0.PageTables
28789 ? 8% -5.2% 27293 ? 5% numa-meminfo.node0.SReclaimable
39196 ? 2% +1.5% 39800 ? 3% numa-meminfo.node0.SUnreclaim
11216 ? 30% -19.7% 9009 ? 26% numa-meminfo.node0.Shmem
67986 ? 3% -1.3% 67094 ? 3% numa-meminfo.node0.Slab
616100 ? 4% -1.5% 607027 ? 3% numa-meminfo.node0.Unevictable
25236 ? 46% +17.2% 29585 ? 93% numa-meminfo.node1.Active
25186 ? 47% +17.1% 29485 ? 94% numa-meminfo.node1.Active(anon)
9758 ?100% +26.3% 12322 ?173% numa-meminfo.node1.AnonHugePages
24194 ? 45% +1.9% 24652 ?112% numa-meminfo.node1.AnonPages
624823 ? 4% +1.8% 636288 ? 3% numa-meminfo.node1.FilePages
2967 ?102% -14.3% 2543 ?132% numa-meminfo.node1.Inactive
2858 ?108% -13.6% 2468 ?138% numa-meminfo.node1.Inactive(anon)
3519 ? 7% +14.4% 4026 ? 3% numa-meminfo.node1.KernelStack
11548 ? 17% -0.2% 11525 ? 18% numa-meminfo.node1.Mapped
32195068 -0.0% 32181267 numa-meminfo.node1.MemFree
32995080 -0.0% 32995068 numa-meminfo.node1.MemTotal
800010 ? 3% +1.7% 813800 numa-meminfo.node1.MemUsed
1617 ? 6% +8.7% 1758 ? 32% numa-meminfo.node1.PageTables
23225 ? 10% +4.5% 24281 ? 6% numa-meminfo.node1.SReclaimable
33937 ? 4% -3.7% 32668 ? 7% numa-meminfo.node1.SUnreclaim
4248 ? 71% +65.0% 7008 ? 31% numa-meminfo.node1.Shmem
57163 ? 5% -0.4% 56950 ? 5% numa-meminfo.node1.Slab
620612 ? 3% +1.4% 629107 ? 3% numa-meminfo.node1.Unevictable
65548 -0.5% 65250 slabinfo.Acpi-Namespace.active_objs
65549 -0.5% 65251 slabinfo.Acpi-Namespace.num_objs
81755 -0.2% 81607 slabinfo.Acpi-Operand.active_objs
81760 -0.2% 81620 slabinfo.Acpi-Operand.num_objs
879.75 ? 16% +21.7% 1071 ? 6% slabinfo.Acpi-State.active_objs
879.75 ? 16% +21.7% 1071 ? 6% slabinfo.Acpi-State.num_objs
1912 ? 20% +26.7% 2422 ? 5% slabinfo.avtab_node.active_objs
1912 ? 20% +26.7% 2422 ? 5% slabinfo.avtab_node.num_objs
57919 -1.9% 56815 slabinfo.dentry.active_objs
58456 -2.2% 57179 slabinfo.dentry.num_objs
8847 ? 3% -4.4% 8454 ? 2% slabinfo.filp.active_objs
9191 ? 3% -4.6% 8770 ? 2% slabinfo.filp.num_objs
45451 -0.5% 45226 slabinfo.inode_cache.active_objs
45637 -0.5% 45406 slabinfo.inode_cache.num_objs
2382 -2.6% 2321 ? 4% slabinfo.kmalloc-1024.active_objs
3630 -7.8% 3347 ? 6% slabinfo.kmalloc-2048.active_objs
3637 -7.7% 3358 ? 6% slabinfo.kmalloc-2048.num_objs
22142 ? 7% -3.0% 21477 ? 5% slabinfo.kmalloc-32.active_objs
22169 ? 7% -2.8% 21540 ? 5% slabinfo.kmalloc-32.num_objs
5657 ? 6% -7.8% 5214 ? 4% slabinfo.kmalloc-512.active_objs
5802 ? 7% -9.9% 5227 ? 4% slabinfo.kmalloc-512.num_objs
24097 +0.9% 24321 slabinfo.kmalloc-64.active_objs
24299 +0.8% 24492 slabinfo.kmalloc-64.num_objs
6576 ? 4% -2.0% 6445 ? 2% slabinfo.kmalloc-96.active_objs
6598 ? 4% -2.2% 6453 ? 2% slabinfo.kmalloc-96.num_objs
942.25 ? 7% -9.0% 857.75 ? 3% slabinfo.nsproxy.active_objs
942.25 ? 7% -9.0% 857.75 ? 3% slabinfo.nsproxy.num_objs
21213 ? 5% +4.1% 22093 ? 3% slabinfo.pid.active_objs
21226 ? 5% +4.1% 22093 ? 3% slabinfo.pid.num_objs
672.00 ? 16% +23.8% 832.00 ? 9% slabinfo.pool_workqueue.active_objs
672.00 ? 16% +23.8% 832.00 ? 9% slabinfo.pool_workqueue.num_objs
544.00 ? 18% +19.1% 648.00 ? 6% slabinfo.scsi_sense_cache.active_objs
544.00 ? 18% +19.1% 648.00 ? 6% slabinfo.scsi_sense_cache.num_objs
982.75 ? 8% -7.3% 911.50 ? 9% slabinfo.task_group.active_objs
982.75 ? 8% -7.3% 911.50 ? 9% slabinfo.task_group.num_objs
16136 ? 12% +5.1% 16966 ? 5% slabinfo.vm_area_struct.active_objs
16147 ? 12% +5.4% 17017 ? 5% slabinfo.vm_area_struct.num_objs
15065 ? 20% -5.5% 14242 ? 49% numa-vmstat.node0.nr_active_anon
14031 ? 19% -1.3% 13844 ? 50% numa-vmstat.node0.nr_anon_pages
156875 ? 4% -1.8% 154050 ? 4% numa-vmstat.node0.nr_file_pages
8012903 +0.0% 8016566 numa-vmstat.node0.nr_free_pages
1714 ? 44% +5.0% 1799 ? 47% numa-vmstat.node0.nr_inactive_anon
10082 ? 10% +17.2% 11813 ? 14% numa-vmstat.node0.nr_indirectly_reclaimable
4913 ? 6% -10.0% 4422 ? 4% numa-vmstat.node0.nr_kernel_stack
3515 ? 14% -0.4% 3500 ? 14% numa-vmstat.node0.nr_mapped
127.50 ? 57% -100.0% 0.00 numa-vmstat.node0.nr_mlock
629.00 ? 4% -6.2% 590.25 ? 24% numa-vmstat.node0.nr_page_table_pages
2803 ? 30% -19.7% 2251 ? 26% numa-vmstat.node0.nr_shmem
7197 ? 8% -5.2% 6823 ? 5% numa-vmstat.node0.nr_slab_reclaimable
9798 ? 2% +1.5% 9949 ? 3% numa-vmstat.node0.nr_slab_unreclaimable
154024 ? 4% -1.5% 151756 ? 3% numa-vmstat.node0.nr_unevictable
15065 ? 20% -5.5% 14242 ? 49% numa-vmstat.node0.nr_zone_active_anon
1714 ? 44% +5.0% 1799 ? 47% numa-vmstat.node0.nr_zone_inactive_anon
154024 ? 4% -1.5% 151756 ? 3% numa-vmstat.node0.nr_zone_unevictable
474897 ? 2% -8.1% 436556 ? 7% numa-vmstat.node0.numa_hit
166664 +0.1% 166790 numa-vmstat.node0.numa_interleave
473733 ? 2% -8.7% 432687 ? 6% numa-vmstat.node0.numa_local
1163 ?147% +232.5% 3868 ? 58% numa-vmstat.node0.numa_other
6302 ? 46% +17.1% 7378 ? 94% numa-vmstat.node1.nr_active_anon
6058 ? 45% +1.9% 6171 ?112% numa-vmstat.node1.nr_anon_pages
156204 ? 4% +1.8% 159071 ? 3% numa-vmstat.node1.nr_file_pages
42643 ? 7% -2.1% 41729 ? 15% numa-vmstat.node1.nr_free_cma
8048742 -0.0% 8045285 numa-vmstat.node1.nr_free_pages
714.25 ?108% -13.7% 616.75 ?138% numa-vmstat.node1.nr_inactive_anon
12908 ? 8% -14.2% 11071 ? 15% numa-vmstat.node1.nr_indirectly_reclaimable
3522 ? 7% +14.5% 4031 ? 3% numa-vmstat.node1.nr_kernel_stack
2995 ? 17% -0.3% 2987 ? 18% numa-vmstat.node1.nr_mapped
176.00 ? 57% -100.0% 0.00 numa-vmstat.node1.nr_mlock
404.00 ? 6% +8.9% 440.00 ? 32% numa-vmstat.node1.nr_page_table_pages
1060 ? 71% +65.1% 1751 ? 31% numa-vmstat.node1.nr_shmem
5806 ? 10% +4.5% 6070 ? 6% numa-vmstat.node1.nr_slab_reclaimable
8483 ? 4% -3.7% 8166 ? 7% numa-vmstat.node1.nr_slab_unreclaimable
155152 ? 3% +1.4% 157276 ? 3% numa-vmstat.node1.nr_unevictable
6361 ? 46% +17.0% 7440 ? 93% numa-vmstat.node1.nr_zone_active_anon
714.25 ?108% -13.7% 616.75 ?138% numa-vmstat.node1.nr_zone_inactive_anon
155152 ? 3% +1.4% 157276 ? 3% numa-vmstat.node1.nr_zone_unevictable
460308 ? 2% +8.1% 497743 ? 5% numa-vmstat.node1.numa_hit
166767 -0.1% 166558 numa-vmstat.node1.numa_interleave
286720 ? 4% +14.0% 326994 ? 8% numa-vmstat.node1.numa_local
173587 -1.6% 170748 numa-vmstat.node1.numa_other
28.60 ? 75% -65.3% 9.93 ?101% sched_debug.cfs_rq:/.MIN_vruntime.avg
815.14 ? 73% -61.0% 317.83 ?101% sched_debug.cfs_rq:/.MIN_vruntime.max
0.00 +0.0% 0.00 sched_debug.cfs_rq:/.MIN_vruntime.min
149.28 ? 73% -63.0% 55.30 ?101% sched_debug.cfs_rq:/.MIN_vruntime.stddev
50550 +0.2% 50644 sched_debug.cfs_rq:/.exec_clock.avg
106670 ? 3% +1.2% 107917 ? 3% sched_debug.cfs_rq:/.exec_clock.max
18301 ? 32% +1.9% 18652 ? 34% sched_debug.cfs_rq:/.exec_clock.min
20049 ? 3% +1.1% 20268 ? 2% sched_debug.cfs_rq:/.exec_clock.stddev
47381 ? 16% +2.2% 48421 ? 6% sched_debug.cfs_rq:/.load.avg
230674 ? 35% -16.9% 191596 ? 37% sched_debug.cfs_rq:/.load.max
5143 -1.7% 5053 sched_debug.cfs_rq:/.load.min
66397 ? 25% -7.2% 61609 ? 19% sched_debug.cfs_rq:/.load.stddev
54.48 ? 11% +23.6% 67.35 ? 9% sched_debug.cfs_rq:/.load_avg.avg
301.38 ? 3% +7.9% 325.33 ? 7% sched_debug.cfs_rq:/.load_avg.max
6.21 ? 11% -2.0% 6.08 ? 14% sched_debug.cfs_rq:/.load_avg.min
87.81 ? 9% +18.2% 103.80 ? 6% sched_debug.cfs_rq:/.load_avg.stddev
28.62 ? 75% -65.3% 9.93 ?101% sched_debug.cfs_rq:/.max_vruntime.avg
815.64 ? 73% -61.0% 317.83 ?101% sched_debug.cfs_rq:/.max_vruntime.max
0.00 +0.0% 0.00 sched_debug.cfs_rq:/.max_vruntime.min
149.37 ? 73% -63.0% 55.30 ?101% sched_debug.cfs_rq:/.max_vruntime.stddev
975783 +0.2% 977424 sched_debug.cfs_rq:/.min_vruntime.avg
1409368 ? 4% -0.6% 1401106 ? 3% sched_debug.cfs_rq:/.min_vruntime.max
448106 ? 24% +1.7% 455768 ? 23% sched_debug.cfs_rq:/.min_vruntime.min
251304 ? 6% -2.7% 244544 ? 3% sched_debug.cfs_rq:/.min_vruntime.stddev
0.59 ? 5% +1.7% 0.60 sched_debug.cfs_rq:/.nr_running.avg
1.04 ? 6% -4.0% 1.00 sched_debug.cfs_rq:/.nr_running.max
0.17 +0.0% 0.17 sched_debug.cfs_rq:/.nr_running.min
0.38 ? 3% +0.1% 0.38 sched_debug.cfs_rq:/.nr_running.stddev
0.92 ? 3% -6.4% 0.86 sched_debug.cfs_rq:/.nr_spread_over.avg
2.62 ? 11% -34.9% 1.71 ? 22% sched_debug.cfs_rq:/.nr_spread_over.max
0.83 +0.0% 0.83 sched_debug.cfs_rq:/.nr_spread_over.min
0.35 ? 21% -55.4% 0.15 ? 42% sched_debug.cfs_rq:/.nr_spread_over.stddev
2.67 ? 99% +408.5% 13.56 ? 43% sched_debug.cfs_rq:/.removed.load_avg.avg
85.33 ? 99% +98.6% 169.50 sched_debug.cfs_rq:/.removed.load_avg.max
14.85 ?100% +199.0% 44.39 ? 22% sched_debug.cfs_rq:/.removed.load_avg.stddev
123.37 ?100% +405.6% 623.75 ? 43% sched_debug.cfs_rq:/.removed.runnable_sum.avg
3947 ?100% +97.6% 7801 sched_debug.cfs_rq:/.removed.runnable_sum.max
686.90 ?100% +197.3% 2041 ? 22% sched_debug.cfs_rq:/.removed.runnable_sum.stddev
1.05 ?100% +357.8% 4.82 ? 42% sched_debug.cfs_rq:/.removed.util_avg.avg
33.67 ?100% +99.4% 67.12 ? 20% sched_debug.cfs_rq:/.removed.util_avg.max
5.86 ?100% +173.8% 16.04 ? 31% sched_debug.cfs_rq:/.removed.util_avg.stddev
36.99 ? 10% +6.4% 39.35 ? 2% sched_debug.cfs_rq:/.runnable_load_avg.avg
140.17 ? 2% -0.1% 140.00 sched_debug.cfs_rq:/.runnable_load_avg.max
4.96 -2.5% 4.83 sched_debug.cfs_rq:/.runnable_load_avg.min
48.16 ? 7% +4.3% 50.23 sched_debug.cfs_rq:/.runnable_load_avg.stddev
46667 ? 16% +2.0% 47609 ? 7% sched_debug.cfs_rq:/.runnable_weight.avg
225541 ? 35% -17.5% 186040 ? 40% sched_debug.cfs_rq:/.runnable_weight.max
5143 -1.7% 5053 sched_debug.cfs_rq:/.runnable_weight.min
65291 ? 25% -7.4% 60452 ? 21% sched_debug.cfs_rq:/.runnable_weight.stddev
0.02 ?173% -100.0% 0.00 sched_debug.cfs_rq:/.spread.avg
0.50 ?173% -100.0% 0.00 sched_debug.cfs_rq:/.spread.max
0.09 ?173% -100.0% 0.00 sched_debug.cfs_rq:/.spread.stddev
-51577 -21.7% -40360 sched_debug.cfs_rq:/.spread0.avg
382004 ? 29% +0.3% 383321 ? 26% sched_debug.cfs_rq:/.spread0.max
-579250 -3.0% -562009 sched_debug.cfs_rq:/.spread0.min
251311 ? 6% -2.7% 244551 ? 3% sched_debug.cfs_rq:/.spread0.stddev
598.80 ? 2% +1.6% 608.31 sched_debug.cfs_rq:/.util_avg.avg
1173 ? 4% -5.1% 1113 ? 2% sched_debug.cfs_rq:/.util_avg.max
202.08 ? 9% +0.1% 202.29 ? 10% sched_debug.cfs_rq:/.util_avg.min
342.36 ? 2% -1.5% 337.33 sched_debug.cfs_rq:/.util_avg.stddev
323.89 ? 8% -10.1% 291.09 ? 3% sched_debug.cfs_rq:/.util_est_enqueued.avg
716.25 -8.9% 652.42 ? 9% sched_debug.cfs_rq:/.util_est_enqueued.max
46.08 ? 14% +11.2% 51.25 ? 4% sched_debug.cfs_rq:/.util_est_enqueued.min
276.83 -12.0% 243.64 ? 9% sched_debug.cfs_rq:/.util_est_enqueued.stddev
815280 ? 4% -1.5% 802720 ? 3% sched_debug.cpu.avg_idle.avg
993248 -0.8% 985236 sched_debug.cpu.avg_idle.max
268671 ? 37% -10.7% 240022 ? 34% sched_debug.cpu.avg_idle.min
181491 ? 20% +2.7% 186451 ? 13% sched_debug.cpu.avg_idle.stddev
176552 -0.0% 176542 sched_debug.cpu.clock.avg
176555 -0.0% 176545 sched_debug.cpu.clock.max
176548 -0.0% 176538 sched_debug.cpu.clock.min
1.58 ? 7% +15.3% 1.82 ? 11% sched_debug.cpu.clock.stddev
176552 -0.0% 176542 sched_debug.cpu.clock_task.avg
176555 -0.0% 176545 sched_debug.cpu.clock_task.max
176548 -0.0% 176538 sched_debug.cpu.clock_task.min
1.58 ? 7% +15.3% 1.82 ? 11% sched_debug.cpu.clock_task.stddev
27.78 +0.8% 28.01 sched_debug.cpu.cpu_load[0].avg
143.88 ? 2% +2.2% 147.08 ? 6% sched_debug.cpu.cpu_load[0].max
4.96 -2.5% 4.83 sched_debug.cpu.cpu_load[0].min
39.72 +1.8% 40.46 ? 3% sched_debug.cpu.cpu_load[0].stddev
29.30 ? 5% -1.4% 28.89 sched_debug.cpu.cpu_load[1].avg
190.75 ? 7% -2.3% 186.29 ? 2% sched_debug.cpu.cpu_load[1].max
4.96 -2.5% 4.83 sched_debug.cpu.cpu_load[1].min
44.19 ? 8% -1.6% 43.47 sched_debug.cpu.cpu_load[1].stddev
29.46 ? 3% -1.3% 29.07 sched_debug.cpu.cpu_load[2].avg
212.54 ? 5% -2.5% 207.17 sched_debug.cpu.cpu_load[2].max
5.04 -0.8% 5.00 ? 2% sched_debug.cpu.cpu_load[2].min
46.52 ? 6% -1.9% 45.63 sched_debug.cpu.cpu_load[2].stddev
29.15 ? 2% -0.6% 28.98 sched_debug.cpu.cpu_load[3].avg
220.46 ? 3% -2.0% 215.96 sched_debug.cpu.cpu_load[3].max
5.38 ? 2% -1.6% 5.29 ? 6% sched_debug.cpu.cpu_load[3].min
47.42 ? 3% -1.3% 46.82 sched_debug.cpu.cpu_load[3].stddev
28.54 +0.3% 28.63 sched_debug.cpu.cpu_load[4].avg
221.79 -1.2% 219.12 sched_debug.cpu.cpu_load[4].max
4.96 +0.8% 5.00 ? 4% sched_debug.cpu.cpu_load[4].min
47.78 -0.4% 47.61 sched_debug.cpu.cpu_load[4].stddev
3147 -0.4% 3133 sched_debug.cpu.curr->pid.avg
5202 +0.0% 5203 sched_debug.cpu.curr->pid.max
1417 -0.0% 1416 sched_debug.cpu.curr->pid.min
1416 +0.4% 1422 sched_debug.cpu.curr->pid.stddev
33530 ? 7% -3.5% 32352 sched_debug.cpu.load.avg
191933 ? 40% -21.7% 150339 ? 2% sched_debug.cpu.load.max
5143 -1.7% 5053 sched_debug.cpu.load.min
50686 ? 25% -13.2% 43983 sched_debug.cpu.load.stddev
500000 +0.0% 500000 sched_debug.cpu.max_idle_balance_cost.avg
500000 +0.0% 500000 sched_debug.cpu.max_idle_balance_cost.max
500000 +0.0% 500000 sched_debug.cpu.max_idle_balance_cost.min
4294 -0.0% 4294 sched_debug.cpu.next_balance.avg
4294 -0.0% 4294 sched_debug.cpu.next_balance.max
4294 -0.0% 4294 sched_debug.cpu.next_balance.min
0.00 ? 31% +0.8% 0.00 ? 31% sched_debug.cpu.next_balance.stddev
156192 -0.0% 156167 sched_debug.cpu.nr_load_updates.avg
160511 -0.2% 160148 sched_debug.cpu.nr_load_updates.max
155075 -0.9% 153755 sched_debug.cpu.nr_load_updates.min
983.74 ? 3% +4.1% 1024 ? 3% sched_debug.cpu.nr_load_updates.stddev
0.55 ? 2% -0.2% 0.55 sched_debug.cpu.nr_running.avg
1.46 ? 9% -2.9% 1.42 ? 5% sched_debug.cpu.nr_running.max
0.17 +0.0% 0.17 sched_debug.cpu.nr_running.min
0.40 ? 4% +0.7% 0.41 sched_debug.cpu.nr_running.stddev
9103 ? 2% +5.2% 9573 ? 4% sched_debug.cpu.nr_switches.avg
39036 ? 9% -3.0% 37872 ? 11% sched_debug.cpu.nr_switches.max
1631 ? 27% +19.6% 1951 ? 10% sched_debug.cpu.nr_switches.min
8464 ? 12% -1.0% 8378 ? 8% sched_debug.cpu.nr_switches.stddev
0.00 ?110% -100.0% 0.00 sched_debug.cpu.nr_uninterruptible.avg
7.25 ? 8% +5.7% 7.67 ? 11% sched_debug.cpu.nr_uninterruptible.max
-8.54 +2.0% -8.71 sched_debug.cpu.nr_uninterruptible.min
3.44 ? 14% +10.2% 3.78 ? 8% sched_debug.cpu.nr_uninterruptible.stddev
10856 ? 4% -1.0% 10746 ? 4% sched_debug.cpu.sched_count.avg
114419 ? 16% -21.2% 90112 ? 19% sched_debug.cpu.sched_count.max
876.04 ? 36% +16.6% 1021 ? 21% sched_debug.cpu.sched_count.min
20384 ? 14% -15.8% 17163 ? 10% sched_debug.cpu.sched_count.stddev
3022 ? 3% +2.9% 3109 ? 5% sched_debug.cpu.sched_goidle.avg
14082 ? 10% +9.1% 15363 ? 15% sched_debug.cpu.sched_goidle.max
319.96 ? 48% +10.4% 353.33 ? 25% sched_debug.cpu.sched_goidle.min
3191 ? 11% -0.7% 3168 ? 13% sched_debug.cpu.sched_goidle.stddev
3462 ? 3% +6.8% 3697 ? 5% sched_debug.cpu.ttwu_count.avg
15960 ? 11% -5.7% 15054 ? 5% sched_debug.cpu.ttwu_count.max
677.54 ? 12% -7.4% 627.62 ? 32% sched_debug.cpu.ttwu_count.min
3454 ? 12% -1.8% 3392 ? 3% sched_debug.cpu.ttwu_count.stddev
1414 ? 5% +10.8% 1567 ? 5% sched_debug.cpu.ttwu_local.avg
11015 ? 16% +13.8% 12535 ? 17% sched_debug.cpu.ttwu_local.max
178.12 ? 13% +7.0% 190.58 ? 12% sched_debug.cpu.ttwu_local.min
2147 ? 10% +19.7% 2570 ? 10% sched_debug.cpu.ttwu_local.stddev
176549 -0.0% 176539 sched_debug.cpu_clk
996147 +0.0% 996147 sched_debug.dl_rq:.dl_bw->bw.avg
996147 +0.0% 996147 sched_debug.dl_rq:.dl_bw->bw.max
996147 +0.0% 996147 sched_debug.dl_rq:.dl_bw->bw.min
4.295e+09 -0.0% 4.295e+09 sched_debug.jiffies
176549 -0.0% 176539 sched_debug.ktime
0.01 -50.0% 0.00 ? 99% sched_debug.rt_rq:/.rt_nr_migratory.avg
0.17 -50.0% 0.08 ? 99% sched_debug.rt_rq:/.rt_nr_migratory.max
0.03 -50.0% 0.01 ?100% sched_debug.rt_rq:/.rt_nr_migratory.stddev
0.01 -50.0% 0.00 ? 99% sched_debug.rt_rq:/.rt_nr_running.avg
0.17 -50.0% 0.08 ? 99% sched_debug.rt_rq:/.rt_nr_running.max
0.03 -50.0% 0.01 ?100% sched_debug.rt_rq:/.rt_nr_running.stddev
950.00 +0.0% 950.00 sched_debug.rt_rq:/.rt_runtime.avg
950.00 +0.0% 950.00 sched_debug.rt_rq:/.rt_runtime.max
950.00 +0.0% 950.00 sched_debug.rt_rq:/.rt_runtime.min
0.05 ? 14% +5.3% 0.05 ? 14% sched_debug.rt_rq:/.rt_time.avg
1.55 ? 14% +5.2% 1.63 ? 13% sched_debug.rt_rq:/.rt_time.max
0.27 ? 14% +5.2% 0.28 ? 13% sched_debug.rt_rq:/.rt_time.stddev
176870 -0.0% 176859 sched_debug.sched_clk
1.00 +0.0% 1.00 sched_debug.sched_clock_stable()
4118331 +0.0% 4118331 sched_debug.sysctl_sched.sysctl_sched_features
24.00 +0.0% 24.00 sched_debug.sysctl_sched.sysctl_sched_latency
3.00 +0.0% 3.00 sched_debug.sysctl_sched.sysctl_sched_min_granularity
1.00 +0.0% 1.00 sched_debug.sysctl_sched.sysctl_sched_tunable_scaling
4.00 +0.0% 4.00 sched_debug.sysctl_sched.sysctl_sched_wakeup_granularity
29.43 ? 31% -2.3 27.12 ? 35% perf-profile.calltrace.cycles-pp.secondary_startup_64
25.83 ? 40% -1.2 24.65 ? 39% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
25.83 ? 40% -1.2 24.65 ? 39% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
25.83 ? 40% -1.2 24.65 ? 39% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
3.60 ? 70% -1.1 2.46 ?107% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64
3.60 ? 70% -1.1 2.46 ?107% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_kernel.secondary_startup_64
3.60 ? 70% -1.1 2.46 ?107% perf-profile.calltrace.cycles-pp.start_kernel.secondary_startup_64
3.59 ? 71% -1.1 2.46 ?107% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_kernel.secondary_startup_64
3.53 ? 70% -1.1 2.40 ?107% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_kernel
25.33 ? 40% -1.0 24.35 ? 39% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
1.75 ? 12% -0.9 0.87 ? 14% perf-profile.calltrace.cycles-pp.fput
23.86 ? 41% -0.5 23.37 ? 40% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
1.29 ? 31% -0.5 0.80 ? 24% perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
1.28 ? 31% -0.5 0.80 ? 24% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry
0.59 ? 62% -0.4 0.14 ?173% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle
0.32 ?102% -0.3 0.00 perf-profile.calltrace.cycles-pp.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle
0.31 ?100% -0.3 0.00 perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state
22.27 ? 14% -0.2 22.07 ? 13% perf-profile.calltrace.cycles-pp.fput.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.15 ?173% -0.2 0.00 perf-profile.calltrace.cycles-pp.__softirqentry_text_start.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state
22.97 ? 13% -0.1 22.83 ? 13% perf-profile.calltrace.cycles-pp.__fget.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64
24.80 ? 13% -0.1 24.68 ? 13% perf-profile.calltrace.cycles-pp.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.89 ? 14% -0.0 0.84 ? 12% perf-profile.calltrace.cycles-pp.__fget_light.__fget_light.do_sys_poll.__x64_sys_poll.do_syscall_64
0.86 ? 12% -0.0 0.83 ? 16% perf-profile.calltrace.cycles-pp.__fdget.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.86 ? 16% +0.0 1.86 ? 12% perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
0.94 ? 10% +0.0 0.94 ? 23% perf-profile.calltrace.cycles-pp.__kmalloc.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.64 ? 13% +0.0 0.68 ? 20% perf-profile.calltrace.cycles-pp.kfree.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.08 ? 13% +0.1 2.20 ? 10% perf-profile.calltrace.cycles-pp.copy_user_generic_string._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64
2.38 ? 12% +0.2 2.58 ? 10% perf-profile.calltrace.cycles-pp._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.8 0.83 ? 13% perf-profile.calltrace.cycles-pp.vfs_poll
0.00 +1.6 1.64 ? 12% perf-profile.calltrace.cycles-pp.vfs_poll.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
64.39 ? 13% +2.4 66.80 ? 13% perf-profile.calltrace.cycles-pp.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
64.61 ? 13% +2.4 67.05 ? 13% perf-profile.calltrace.cycles-pp.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
65.59 ? 13% +2.5 68.06 ? 13% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
65.39 ? 13% +2.5 67.86 ? 13% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
29.45 ? 31% -2.3 27.12 ? 35% perf-profile.children.cycles-pp.do_idle
29.43 ? 31% -2.3 27.12 ? 35% perf-profile.children.cycles-pp.secondary_startup_64
29.43 ? 31% -2.3 27.12 ? 35% perf-profile.children.cycles-pp.cpu_startup_entry
28.97 ? 31% -2.1 26.84 ? 35% perf-profile.children.cycles-pp.cpuidle_enter_state
27.40 ? 32% -1.6 25.77 ? 35% perf-profile.children.cycles-pp.intel_idle
25.83 ? 40% -1.2 24.65 ? 39% perf-profile.children.cycles-pp.start_secondary
3.60 ? 70% -1.1 2.46 ?107% perf-profile.children.cycles-pp.start_kernel
24.02 ? 14% -1.1 22.95 ? 13% perf-profile.children.cycles-pp.fput
1.52 ? 25% -0.4 1.08 ? 13% perf-profile.children.cycles-pp.smp_apic_timer_interrupt
1.53 ? 25% -0.4 1.09 ? 13% perf-profile.children.cycles-pp.apic_timer_interrupt
0.89 ? 22% -0.2 0.69 ? 9% perf-profile.children.cycles-pp.hrtimer_interrupt
0.52 ? 32% -0.2 0.32 ? 27% perf-profile.children.cycles-pp.irq_exit
25.69 ? 13% -0.2 25.52 ? 13% perf-profile.children.cycles-pp.__fget_light
0.43 ? 31% -0.2 0.27 ? 31% perf-profile.children.cycles-pp.__softirqentry_text_start
0.35 ? 24% -0.1 0.21 ? 28% perf-profile.children.cycles-pp.menu_select
22.97 ? 13% -0.1 22.83 ? 13% perf-profile.children.cycles-pp.__fget
0.66 ? 20% -0.1 0.53 ? 10% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.41 ? 31% -0.1 0.29 ? 19% perf-profile.children.cycles-pp.io_serial_in
0.60 ? 22% -0.1 0.48 ? 13% perf-profile.children.cycles-pp.irq_work_run_list
0.58 ? 22% -0.1 0.47 ? 14% perf-profile.children.cycles-pp.irq_work_interrupt
0.58 ? 22% -0.1 0.47 ? 14% perf-profile.children.cycles-pp.smp_irq_work_interrupt
0.58 ? 22% -0.1 0.47 ? 14% perf-profile.children.cycles-pp.irq_work_run
0.58 ? 22% -0.1 0.47 ? 14% perf-profile.children.cycles-pp.printk
0.58 ? 22% -0.1 0.47 ? 14% perf-profile.children.cycles-pp.vprintk_emit
0.59 ? 19% -0.1 0.48 ? 12% perf-profile.children.cycles-pp.serial8250_console_write
0.61 ? 19% -0.1 0.50 ? 12% perf-profile.children.cycles-pp.console_unlock
0.58 ? 19% -0.1 0.47 ? 12% perf-profile.children.cycles-pp.serial8250_console_putchar
0.58 ? 19% -0.1 0.48 ? 12% perf-profile.children.cycles-pp.wait_for_xmitr
0.58 ? 20% -0.1 0.48 ? 12% perf-profile.children.cycles-pp.uart_console_write
0.10 ? 30% -0.1 0.02 ?173% perf-profile.children.cycles-pp.lapic_next_deadline
0.16 ? 23% -0.1 0.09 ? 27% perf-profile.children.cycles-pp.tick_nohz_next_event
0.18 ? 24% -0.1 0.12 ? 21% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.11 ? 24% -0.1 0.06 ? 60% perf-profile.children.cycles-pp.get_next_timer_interrupt
0.13 ? 26% -0.1 0.07 ? 27% perf-profile.children.cycles-pp.clockevents_program_event
0.10 ? 9% -0.1 0.04 ?103% perf-profile.children.cycles-pp.run_timer_softirq
0.40 ? 15% -0.1 0.34 ? 15% perf-profile.children.cycles-pp.tick_sched_timer
0.14 ? 34% -0.1 0.08 ? 19% perf-profile.children.cycles-pp.native_write_msr
0.36 ? 16% -0.1 0.30 ? 18% perf-profile.children.cycles-pp.tick_sched_handle
0.17 ? 31% -0.1 0.12 ? 7% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.06 ? 60% -0.0 0.01 ?173% perf-profile.children.cycles-pp.irq_enter
0.08 ? 29% -0.0 0.03 ?105% perf-profile.children.cycles-pp.find_busiest_group
0.07 ? 24% -0.0 0.03 ?102% perf-profile.children.cycles-pp.__next_timer_interrupt
0.33 ? 16% -0.0 0.28 ? 14% perf-profile.children.cycles-pp.update_process_times
0.06 ? 66% -0.0 0.01 ?173% perf-profile.children.cycles-pp.sched_clock_cpu
0.16 ? 44% -0.0 0.12 ? 33% perf-profile.children.cycles-pp.rebalance_domains
0.08 ? 37% -0.0 0.04 ? 59% perf-profile.children.cycles-pp.native_irq_return_iret
0.06 ? 60% -0.0 0.02 ?173% perf-profile.children.cycles-pp.update_blocked_averages
0.04 ?107% -0.0 0.00 perf-profile.children.cycles-pp.native_sched_clock
0.06 ? 28% -0.0 0.03 ?100% perf-profile.children.cycles-pp.rcu_check_callbacks
0.06 ? 28% -0.0 0.03 ?100% perf-profile.children.cycles-pp.read_tsc
0.04 ? 59% -0.0 0.01 ?173% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.11 ? 24% -0.0 0.08 ? 67% perf-profile.children.cycles-pp.load_balance
0.86 ? 12% -0.0 0.83 ? 16% perf-profile.children.cycles-pp.__fdget
0.03 ?100% -0.0 0.00 perf-profile.children.cycles-pp._raw_spin_lock_irq
0.20 ? 11% -0.0 0.18 ? 18% perf-profile.children.cycles-pp.scheduler_tick
0.12 ? 24% -0.0 0.09 ? 23% perf-profile.children.cycles-pp.ktime_get
0.04 ?103% -0.0 0.01 ?173% perf-profile.children.cycles-pp.sched_clock
0.04 ?102% -0.0 0.01 ?173% perf-profile.children.cycles-pp.tick_irq_enter
0.02 ?173% -0.0 0.00 perf-profile.children.cycles-pp._raw_spin_trylock
0.03 ?105% -0.0 0.01 ?173% perf-profile.children.cycles-pp._raw_spin_lock
0.06 ? 64% -0.0 0.04 ?107% perf-profile.children.cycles-pp.run_rebalance_domains
0.15 ? 69% -0.0 0.13 ?110% perf-profile.children.cycles-pp.memcpy
0.01 ?173% -0.0 0.00 perf-profile.children.cycles-pp.tick_nohz_irq_exit
0.01 ?173% -0.0 0.00 perf-profile.children.cycles-pp.intel_pmu_disable_all
0.01 ?173% -0.0 0.00 perf-profile.children.cycles-pp.find_next_bit
0.01 ?173% -0.0 0.00 perf-profile.children.cycles-pp.__remove_hrtimer
0.01 ?173% -0.0 0.00 perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.01 ?173% -0.0 0.00 perf-profile.children.cycles-pp.rcu_idle_exit
0.01 ?173% -0.0 0.00 perf-profile.children.cycles-pp.timerqueue_del
0.01 ?173% -0.0 0.00 perf-profile.children.cycles-pp.call_function_interrupt
0.01 ?173% -0.0 0.00 perf-profile.children.cycles-pp.smp_call_function_interrupt
0.01 ?173% -0.0 0.00 perf-profile.children.cycles-pp.ktime_get_update_offsets_now
0.01 ?173% -0.0 0.00 perf-profile.children.cycles-pp._cond_resched
0.01 ?173% -0.0 0.00 perf-profile.children.cycles-pp.try_to_wake_up
0.01 ?173% -0.0 0.00 perf-profile.children.cycles-pp.call_timer_fn
0.16 ? 62% -0.0 0.14 ?115% perf-profile.children.cycles-pp.ret_from_fork
0.16 ? 62% -0.0 0.14 ?115% perf-profile.children.cycles-pp.kthread
0.14 ? 69% -0.0 0.12 ?109% perf-profile.children.cycles-pp.fb_flashcursor
0.14 ? 69% -0.0 0.12 ?109% perf-profile.children.cycles-pp.bit_cursor
0.14 ? 69% -0.0 0.12 ?109% perf-profile.children.cycles-pp.soft_cursor
0.14 ? 69% -0.0 0.12 ?109% perf-profile.children.cycles-pp.mga_dirty_update
0.12 ? 15% -0.0 0.11 ? 24% perf-profile.children.cycles-pp.__might_sleep
0.15 ? 62% -0.0 0.14 ?115% perf-profile.children.cycles-pp.worker_thread
0.15 ? 62% -0.0 0.14 ?115% perf-profile.children.cycles-pp.process_one_work
0.12 ? 13% -0.0 0.11 ? 15% perf-profile.children.cycles-pp.___might_sleep
0.02 ?173% -0.0 0.01 ?173% perf-profile.children.cycles-pp.ksys_read
0.01 ?173% +0.0 0.01 ?173% perf-profile.children.cycles-pp.vfs_read
0.01 ?173% +0.0 0.01 ?173% perf-profile.children.cycles-pp.__vfs_read
0.01 ?173% +0.0 0.01 ?173% perf-profile.children.cycles-pp.wake_up_klogd_work_func
0.07 ? 17% +0.0 0.07 ? 11% perf-profile.children.cycles-pp.__indirect_thunk_start
0.24 ? 23% +0.0 0.24 ? 20% perf-profile.children.cycles-pp.kmalloc_slab
1.86 ? 16% +0.0 1.87 ? 12% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.12 ? 9% +0.0 0.12 ? 27% perf-profile.children.cycles-pp.entry_SYSCALL_64_stage2
0.00 +0.0 0.01 ?173% perf-profile.children.cycles-pp.perf_evsel__read_counter
0.00 +0.0 0.01 ?173% perf-profile.children.cycles-pp.__libc_read
0.00 +0.0 0.01 ?173% perf-profile.children.cycles-pp.perf_read
0.00 +0.0 0.01 ?173% perf-profile.children.cycles-pp.smp_call_function_single
0.00 +0.0 0.01 ?173% perf-profile.children.cycles-pp.perf_event_read
0.00 +0.0 0.01 ?173% perf-profile.children.cycles-pp.cpumask_next_and
0.00 +0.0 0.01 ?173% perf-profile.children.cycles-pp.find_next_and_bit
0.00 +0.0 0.01 ?173% perf-profile.children.cycles-pp.cmd_stat
0.00 +0.0 0.01 ?173% perf-profile.children.cycles-pp.__run_perf_stat
0.00 +0.0 0.01 ?173% perf-profile.children.cycles-pp.process_interval
0.00 +0.0 0.01 ?173% perf-profile.children.cycles-pp.read_counters
0.00 +0.0 0.02 ?173% perf-profile.children.cycles-pp.__libc_start_main
0.00 +0.0 0.02 ?173% perf-profile.children.cycles-pp.main
0.00 +0.0 0.02 ?173% perf-profile.children.cycles-pp.handle_internal_command
0.00 +0.0 0.02 ?173% perf-profile.children.cycles-pp.run_builtin
0.17 ? 47% +0.0 0.19 ? 24% perf-profile.children.cycles-pp.delay_tsc
0.69 ? 11% +0.0 0.71 ? 20% perf-profile.children.cycles-pp.kfree
0.05 ? 60% +0.0 0.07 ? 31% perf-profile.children.cycles-pp.task_tick_fair
0.99 ? 10% +0.0 1.02 ? 21% perf-profile.children.cycles-pp.__kmalloc
0.00 +0.0 0.03 ?100% perf-profile.children.cycles-pp.update_load_avg
0.22 ? 15% +0.0 0.27 ? 23% perf-profile.children.cycles-pp.__might_fault
2.09 ? 13% +0.1 2.21 ? 10% perf-profile.children.cycles-pp.copy_user_generic_string
2.48 ? 12% +0.2 2.68 ? 10% perf-profile.children.cycles-pp._copy_from_user
64.39 ? 13% +2.4 66.81 ? 13% perf-profile.children.cycles-pp.do_sys_poll
64.71 ? 13% +2.4 67.15 ? 13% perf-profile.children.cycles-pp.__x64_sys_poll
0.00 +2.5 2.47 ? 13% perf-profile.children.cycles-pp.vfs_poll
65.67 ? 13% +2.5 68.14 ? 13% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
65.49 ? 13% +2.5 67.96 ? 13% perf-profile.children.cycles-pp.do_syscall_64
27.38 ? 32% -1.6 25.76 ? 35% perf-profile.self.cycles-pp.intel_idle
23.88 ? 14% -1.0 22.87 ? 13% perf-profile.self.cycles-pp.fput
22.78 ? 13% -0.2 22.60 ? 13% perf-profile.self.cycles-pp.__fget
0.41 ? 31% -0.1 0.29 ? 19% perf-profile.self.cycles-pp.io_serial_in
0.12 ? 30% -0.1 0.06 ? 62% perf-profile.self.cycles-pp.menu_select
0.14 ? 34% -0.1 0.08 ? 19% perf-profile.self.cycles-pp.native_write_msr
0.08 ? 37% -0.0 0.04 ? 59% perf-profile.self.cycles-pp.native_irq_return_iret
0.04 ?107% -0.0 0.00 perf-profile.self.cycles-pp.native_sched_clock
0.06 ? 70% -0.0 0.03 ?100% perf-profile.self.cycles-pp.__softirqentry_text_start
0.05 ? 62% -0.0 0.01 ?173% perf-profile.self.cycles-pp.find_busiest_group
0.06 ? 28% -0.0 0.03 ?100% perf-profile.self.cycles-pp.read_tsc
0.03 ?105% -0.0 0.00 perf-profile.self.cycles-pp.do_idle
0.03 ?102% -0.0 0.00 perf-profile.self.cycles-pp.__next_timer_interrupt
0.04 ? 58% -0.0 0.01 ?173% perf-profile.self.cycles-pp.run_timer_softirq
0.06 ? 63% -0.0 0.03 ?100% perf-profile.self.cycles-pp.ktime_get
0.03 ?100% -0.0 0.00 perf-profile.self.cycles-pp._raw_spin_lock_irq
0.85 ? 11% -0.0 0.83 ? 16% perf-profile.self.cycles-pp.__fdget
0.03 ?105% -0.0 0.01 ?173% perf-profile.self.cycles-pp._raw_spin_lock
0.15 ? 69% -0.0 0.13 ?110% perf-profile.self.cycles-pp.memcpy
0.03 ?102% -0.0 0.01 ?173% perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
0.03 ?102% -0.0 0.01 ?173% perf-profile.self.cycles-pp.rcu_check_callbacks
0.01 ?173% -0.0 0.00 perf-profile.self.cycles-pp.irq_exit
0.01 ?173% -0.0 0.00 perf-profile.self.cycles-pp.perf_mux_hrtimer_handler
0.01 ?173% -0.0 0.00 perf-profile.self.cycles-pp.find_next_bit
0.01 ?173% -0.0 0.00 perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.69 ? 11% -0.0 0.68 ? 14% perf-profile.self.cycles-pp.kfree
0.12 ? 13% -0.0 0.11 ? 15% perf-profile.self.cycles-pp.___might_sleep
0.11 ? 15% -0.0 0.11 ? 24% perf-profile.self.cycles-pp.__might_sleep
0.07 ? 62% -0.0 0.07 ? 12% perf-profile.self.cycles-pp.cpuidle_enter_state
0.21 ? 18% -0.0 0.20 ? 14% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.01 ?173% +0.0 0.01 ?173% perf-profile.self.cycles-pp.update_blocked_averages
0.07 ? 17% +0.0 0.07 ? 11% perf-profile.self.cycles-pp.__indirect_thunk_start
0.24 ? 23% +0.0 0.24 ? 20% perf-profile.self.cycles-pp.kmalloc_slab
1.86 ? 16% +0.0 1.87 ? 12% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.12 ? 9% +0.0 0.12 ? 27% perf-profile.self.cycles-pp.entry_SYSCALL_64_stage2
0.00 +0.0 0.01 ?173% perf-profile.self.cycles-pp.smp_call_function_single
0.10 ? 37% +0.0 0.12 ? 16% perf-profile.self.cycles-pp.__might_fault
0.17 ? 47% +0.0 0.19 ? 24% perf-profile.self.cycles-pp.delay_tsc
0.30 ? 12% +0.0 0.33 ? 18% perf-profile.self.cycles-pp.__x64_sys_poll
0.67 ? 5% +0.0 0.69 ? 23% perf-profile.self.cycles-pp.__kmalloc
0.18 ? 12% +0.0 0.23 ? 17% perf-profile.self.cycles-pp._copy_from_user
0.76 ? 10% +0.1 0.81 ? 11% perf-profile.self.cycles-pp.do_syscall_64
2.62 ? 15% +0.1 2.68 ? 9% perf-profile.self.cycles-pp.__fget_light
2.08 ? 13% +0.1 2.16 ? 12% perf-profile.self.cycles-pp.copy_user_generic_string
11.46 ? 14% +1.0 12.45 ? 13% perf-profile.self.cycles-pp.do_sys_poll
0.00 +2.5 2.47 ? 13% perf-profile.self.cycles-pp.vfs_poll
will-it-scale.per_process_ops
540000 +-+----------------------------------------------------------------+
530000 +-+ +. |
|+.+ ++.++.++.+++.++.+ .++.+ .++.+++.++.++.++.++.+++. .++. + |
520000 +-+ + + ++ + : |
510000 +-+ : |
| :+. |
500000 +-+ + +|
490000 +-+ |
480000 +-+ |
| |
470000 +-+ |
460000 OO+OO O O O O OO OO OOO OO O O OO |
| O OO OO OO O OO O O OO |
450000 +-+ O O OO |
440000 +-+----------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
Thanks,
Xiaolong
[ Added Al, since this all came in through his trees. The guilty
authors were already added by the robot ]
On Fri, Jun 22, 2018 at 5:31 PM kernel test robot <[email protected]> wrote:
>
> FYI, we noticed a -8.8% regression of will-it-scale.per_process_ops due to commit:
Guys, this seems pretty big.
What was the alleged advantage of the new poll methods again? Because
it sure isn't obvious - not from the numbers, and not from the commit
messages.
The code seems to be garbage. It replaces our nice generic "you can do
anything you damn well like in your poll function" with two limited
fixed-function "just give me the poll head and the mask".
I was assuming there was a good reason for it, but looking closer I
see absolutely nothing but negatives. The argument that keyed wake-ups
somehow make multiple wake-queues irrelevant doesn't hold water when
the code is more complex and apparently slower. It's not like anybody
ever *had* to use multiple wait-queues, but the old code was both
simpler and cleaner and *allowed* you to use multiple queues if you
wanted to.
So the old code is simpler, cleaner, and more flexible. And according
to the test robot, it also performs better.
So I'm inclined to just revert the whole mess unless I get some
serious explanations for what the supposed advantages are.
The disadvantages are obvious: every poll event now causes *two*
indirect branches to the low-level filesystem or driver - one to get
the poll head, and one to get the mask. Add to that all the new "do we
have the new-style or old sane poll interface" tests, and poll is
obviously more complicated.
Quite frankly, when I look at it, I just go "that's _stupid_". I'm
entirely missing the point of the conversion, and it's not explained
in the messages either.
If we could get the poll head by just having a direct pointer in the
'struct file', maybe that would be one thing. As it is, this all
literally just adds overhead for no obvious reason. It replaced one
simple direct call with two dependent but separate ones.
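To spell that out, here is a rough sketch of the two call sequences
being compared, using the names from the patches quoted later in this
thread (illustrative, not the exact in-tree code):

	/* old-style: a single indirect call into the driver */
	mask = file->f_op->poll(file, pt);

	/* new-style: two dependent indirect calls, plus the interface check */
	head = file->f_op->get_poll_head(file, pt->_key);
	if (!head)
		mask = DEFAULT_POLLMASK;
	else if (IS_ERR(head))
		mask = EPOLLERR;
	else {
		poll_wait(file, head, pt);
		mask = file->f_op->poll_mask(file, pt->_key);
	}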
Linus
On Fri, Jun 22, 2018 at 06:25:45PM +0900, Linus Torvalds wrote:
> What was the alleged advantage of the new poll methods again? Because
> it sure isn't obvious - not from the numbers, and not from the commit
> messages.
The primary goal is that we can implement a race-free aio poll;
the primary benefit is that we can get rid of the currently racy
and bug-prone way we do in-kernel poll-like calls for things like
eventfd.  The first is clearly in 4.18-rc and provides massive
performance advantages if used; the second is not there yet,
more on that below.
> I was assuming there was a good reason for it, but looking closer I
> see absolutely nothing but negatives. The argument that keyed wake-ups
> somehow make multiple wake-queues irrelevant doesn't hold water when
> the code is more complex and apparently slower. It's not like anybody
> ever *had* to use multiple wait-queues, but the old code was both
> simpler and cleaner and *allowed* you to use multiple queues if you
> wanted to.
It wasn't cleaner at all if you aren't poll or select, and even
for those it isn't exactly clean - see the whole mess around ->qproc.
> The disadvantages are obvious: every poll event now causes *two*
> indirect branches to the low-level filesystem or driver - one to get
> the poll head, and one to get the mask. Add to that all the new "do we
> have the new-style or old sane poll interface" tests, and poll is
> obviously more complicated.
It already caused two, and now we have three thanks to ->qproc. One
of the advantages of the new code is that we can eventually get rid
of ->qproc once all users of a non-default qproc are switched away
from vfs_poll. Which requires a little more work, but I have the
patches for that to be posted soon.
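(The ->qproc hop in question is the one taken inside poll_wait();
roughly, per include/linux/poll.h:

	static inline void poll_wait(struct file *filp,
				     wait_queue_head_t *wait_address, poll_table *p)
	{
		if (p && p->_qproc && wait_address)
			p->_qproc(filp, wait_address, p); /* __pollwait for poll/select */
	}

so every wait-queue registration goes through that extra indirect call.)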
> If we could get the poll head by just having a direct pointer in the
> 'struct file', maybe that would be one thing. As it is, this all
> literally just adds overhead for no obvious reason. It replaced one
> simple direct call with two dependent but separate ones.
People are doing weird things with their poll heads, so we can't do
that unconditionally. We could however offer a waitqueue pointer
in struct file and most users would be very happy with that.
In the meantime below is an ugly patch that removes the _qproc
indirect for ->poll only (similar patch is possible for select
assuming the code uses select). And for next merge window I plan
to kill it off entirely.
How can we get this thrown into the will it scale run?
---
From 50ca47fdcfec0a1af56aac6db8a168bb678308a5 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <[email protected]>
Date: Fri, 22 Jun 2018 11:36:26 +0200
Subject: fs: optimize away ->_qproc indirection for poll_mask based polling
Signed-off-by: Christoph Hellwig <[email protected]>
---
fs/select.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/fs/select.c b/fs/select.c
index bc3cc0f98896..54406e0ad23e 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -845,7 +845,25 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
/* userland u16 ->events contains POLL... bitmap */
filter = demangle_poll(pollfd->events) | EPOLLERR | EPOLLHUP;
pwait->_key = filter | busy_flag;
- mask = vfs_poll(f.file, pwait);
+ if (f.file->f_op->poll) {
+ mask = f.file->f_op->poll(f.file, pwait);
+ } else if (file_has_poll_mask(f.file)) {
+ struct wait_queue_head *head;
+
+ head = f.file->f_op->get_poll_head(f.file, pwait->_key);
+ if (!head) {
+ mask = DEFAULT_POLLMASK;
+ } else if (IS_ERR(head)) {
+ mask = EPOLLERR;
+ } else {
+ if (pwait->_qproc)
+ __pollwait(f.file, head, pwait);
+ mask = f.file->f_op->poll_mask(f.file, pwait->_key);
+ }
+ } else {
+ mask = DEFAULT_POLLMASK;
+ }
+
if (mask & busy_flag)
*can_busy_poll = true;
mask &= filter; /* Mask out unneeded events. */
--
2.17.1
And a version with select() also covered:
---
From 317159003ae28113cf759c632b161fb39192fe3c Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <[email protected]>
Date: Fri, 22 Jun 2018 11:36:26 +0200
Subject: fs: optimize away ->_qproc indirection for poll_mask based polling
Signed-off-by: Christoph Hellwig <[email protected]>
---
fs/select.c | 38 +++++++++++++++++++++++++++++++++++++-
1 file changed, 37 insertions(+), 1 deletion(-)
diff --git a/fs/select.c b/fs/select.c
index bc3cc0f98896..2c9d81892509 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -528,6 +528,24 @@ static int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)
wait_key_set(wait, in, out, bit,
busy_flag);
mask = vfs_poll(f.file, wait);
+ if (f.file->f_op->poll) {
+ mask = f.file->f_op->poll(f.file, wait);
+ } else if (file_has_poll_mask(f.file)) {
+ struct wait_queue_head *head;
+
+ head = f.file->f_op->get_poll_head(f.file, wait->_key);
+ if (!head) {
+ mask = DEFAULT_POLLMASK;
+ } else if (IS_ERR(head)) {
+ mask = EPOLLERR;
+ } else {
+ if (wait->_qproc)
+ __pollwait(f.file, head, wait);
+ mask = f.file->f_op->poll_mask(f.file, wait->_key);
+ }
+ } else {
+ mask = DEFAULT_POLLMASK;
+ }
fdput(f);
if ((mask & POLLIN_SET) && (in & bit)) {
@@ -845,7 +863,25 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
/* userland u16 ->events contains POLL... bitmap */
filter = demangle_poll(pollfd->events) | EPOLLERR | EPOLLHUP;
pwait->_key = filter | busy_flag;
- mask = vfs_poll(f.file, pwait);
+ if (f.file->f_op->poll) {
+ mask = f.file->f_op->poll(f.file, pwait);
+ } else if (file_has_poll_mask(f.file)) {
+ struct wait_queue_head *head;
+
+ head = f.file->f_op->get_poll_head(f.file, pwait->_key);
+ if (!head) {
+ mask = DEFAULT_POLLMASK;
+ } else if (IS_ERR(head)) {
+ mask = EPOLLERR;
+ } else {
+ if (pwait->_qproc)
+ __pollwait(f.file, head, pwait);
+ mask = f.file->f_op->poll_mask(f.file, pwait->_key);
+ }
+ } else {
+ mask = DEFAULT_POLLMASK;
+ }
+
if (mask & busy_flag)
*can_busy_poll = true;
mask &= filter; /* Mask out unneeded events. */
--
2.17.1
On Fri, Jun 22, 2018 at 6:46 PM Christoph Hellwig <[email protected]> wrote:
>
> > The disadvantages are obvious: every poll event now causes *two*
> > indirect branches to the low-level filesystem or driver - one to get
> > the poll head, and one to get the mask. Add to that all the new "do we
> > have the new-style or old sane poll interface" tests, and poll is
> > obviously more complicated.
>
> It already caused two
No it didn't. If the data was ready, all that got short-circuited, and
we just had that ->poll() call.
Only if you *waited* did you get the second one to check the result,
and the whole poll_wait(). You could just say "I already have the
data, there's no wait-queue to add, I'll just return immediately".
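Something like this hypothetical old-style ->poll - the foo_* names are
made up for illustration, not any particular driver:

	static __poll_t foo_poll(struct file *file, poll_table *wait)
	{
		struct foo_dev *dev = file->private_data;

		/* data already there: no wait-queue work at all */
		if (foo_data_ready(dev))
			return EPOLLIN | EPOLLRDNORM;

		/* only now register on the wait queue and re-check */
		poll_wait(file, &dev->waitq, wait);
		return foo_data_ready(dev) ? (EPOLLIN | EPOLLRDNORM) : 0;
	}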
You're making this all unconditionally pessimal.
Admit it. The new interface is inferior.
> In the meantime below is an ugly patch that removes the _qproc
> indirect for ->poll only (similar patch is possible for select
> assuming the code uses select). And for next merge window I plan
> to kill it off entirely.
You're just making the code even worse here.
Seriously, this patch is beyond ugly. The new interface is nasty shit.
Get your act together. Don't uglify and slow down everything else just
because you're concentrating only on aio.
Your reply so far just makes me more convinced this was not thought
through properly.
Linus
On Fri, Jun 22, 2018 at 7:02 PM Linus Torvalds
<[email protected]> wrote:
>
> Get your act together. Don't uglify and slow down everything else just
> because you're concentrating only on aio.
.. and seriously, poll and select are timing-critical. There are many
real loads where they show up as *the* thing in kernel profiles.
aio is a distant, distant second cousin. You need to make sure poll and
select are prioritized, and aio should be an "if we can do this
cleanly" case.
Linus
On Fri, Jun 22, 2018 at 12:00:14PM +0200, Christoph Hellwig wrote:
> And a version with select() also covered:
For fuck sake, if you want vfs_poll() inlined, *make* *it* *inlined*.
Is there any reason for not doing that other than EXPORT_SYMBOL_GPL
fetish? Because if there isn't, I would like to draw your attention
to the fact that _this_ pwecious inchewlekshul pwopewty can be trivially
open-coded by out-of-tree shite even if it happens to be non-GPL one.
> mask = vfs_poll(f.file, wait);
> + if (f.file->f_op->poll) {
... not to mention that here you forgot to remove the call itself while
expanding it.
Said that, you are not attacking the worst part of it - it's a static
branch, not the considerably more costly indirect ones. Remember when
I asked you about the price of those? Method calls are costly.
Another problem with the ->get_poll_head() calling conventions is
that originally you wanted to return ERR_PTR(-mask) as a way to report
not needing to call ->poll_mask(); that got shot down since quite
a few of those don't fit into 12 bits that ERR_PTR() gives us.
IIRC, the real reason for non-constant ->get_poll_head() was the sockets,
with
static struct wait_queue_head *sock_get_poll_head(struct file *file,
__poll_t events)
{
struct socket *sock = file->private_data;
if (!sock->ops->poll_mask)
return NULL;
sock_poll_busy_loop(sock, events);
return sk_sleep(sock->sk);
}
The first part isn't a problem (it is constant). The second is
static inline void sock_poll_busy_loop(struct socket *sock, __poll_t events)
{
if (sk_can_busy_loop(sock->sk) &&
events && (events & POLL_BUSY_LOOP)) {
/* once, only if requested by syscall */
sk_busy_loop(sock->sk, 1);
}
}
and the third -
static inline wait_queue_head_t *sk_sleep(struct sock *sk)
{
BUILD_BUG_ON(offsetof(struct socket_wq, wait) != 0);
return &rcu_dereference_raw(sk->sk_wq)->wait;
}
Now, ->sk_wq is modified only in sock_init_data() and sock_graft();
the latter, IIRC, is ->accept() helper. Do we ever call either of
those on a sock of already opened file? IOW, is there any real
reason for socket ->get_poll_head() not to be constant, other
than wanting to keep POLL_BUSY_LOOP handling out of ->poll_mask()?
I agree that POLL_BUSY_LOOP is ugly as hell, but you *still* have
sock_poll_mask() not free from it...
On Fri, Jun 22, 2018 at 12:01:17PM +0100, Al Viro wrote:
> For fuck sake, if you want vfs_poll() inlined, *make* *it* *inlined*.
That is not going to help with de-virtualizing _qproc, which was
the whole idea of that change. At least not without a compiler
way smarter than gcc.
But if you want it inline that is fine with me, it just seems a little
large for inlining.
Note that I plan to actually remove all calls to vfs_poll except for
poll and select in a pending series, at which point it would become
static anyway.
> Said that, you are not attacking the worst part of it - it's a static
> branch, not the considerably more costly indirect ones. Remember when
> I asked you about the price of those? Method calls are costly.
And back then it did not show up even in poll-heavy workloads. But
since then something new happened - spectre mitigations, which make
indirect calls exorbitantly more expensive.
> Now, ->sk_wq is modified only in sock_init_data() and sock_graft();
> the latter, IIRC, is ->accept() helper. Do we ever call either of
> those on a sock of already opened file? IOW, is there any real
> reason for socket ->get_poll_head() not to be constant, other
> than wanting to keep POLL_BUSY_LOOP handling out of ->poll_mask()?
> I agree that POLL_BUSY_LOOP is ugly as hell, but you *still* have
> sock_poll_mask() not free from it...
I'd have to defer to the networking folks on whether busy looping after
pollwait is what they want, but I suspect the answer is no: by the time
we are already waiting on the queue, busy waiting seems pointless.
On Fri, Jun 22, 2018 at 01:53:00PM +0200, Christoph Hellwig wrote:
> > Now, ->sk_wq is modified only in sock_init_data() and sock_graft();
> > the latter, IIRC, is ->accept() helper. Do we ever call either of
> > those on a sock of already opened file? IOW, is there any real
> > reason for socket ->get_poll_head() not to be constant, other
> > than wanting to keep POLL_BUSY_LOOP handling out of ->poll_mask()?
> > I agree that POLL_BUSY_LOOP is ugly as hell, but you *still* have
> > sock_poll_mask() not free from it...
>
> I'd have to defer to the networking folks on whether busy looping after
> pollwait is what they want, but I suspect the answer is no: by the time
> we are already waiting on the queue, busy waiting seems pointless.
So mark that in ->f_mode - I strongly suspect that
sk_can_busy_loop(sock->sk) can't change while an opened file is there.
And lift that (conditional on new FMODE_BUSY_LOOP) into do_poll()
and do_select() - we *already* have bits and pieces of that logic in
there, and that way they'd at least be gathered in one place.
Then replace ->get_poll_head() with file->f_poll_head and
see what it gives.
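[Aside, for readers without the tree handy: the "bits and pieces already in there" are the busy-poll handling in do_poll() in fs/select.c. Heavily simplified and paraphrased - this is not a verbatim copy - it has roughly the shape below; net_busy_loop_on(), busy_loop_current_time() and busy_loop_timeout() are real helpers from <net/busy_poll.h>, everything else is elided.]

	__poll_t busy_flag = net_busy_loop_on() ? POLL_BUSY_LOOP : 0;
	unsigned long busy_start = 0;

	for (;;) {
		bool can_busy_loop = false;

		/* ... walk all fds, ->poll() each with busy_flag OR'ed into the
		 * requested events; count ready fds (count), note whether the
		 * timeout expired (timed_out), and set can_busy_loop if any fd
		 * reported POLL_BUSY_LOOP back ... */

		if (count || timed_out)
			break;

		/* spin instead of sleeping while the busy-poll window lasts */
		if (can_busy_loop && !need_resched()) {
			if (!busy_start) {
				busy_start = busy_loop_current_time();
				continue;
			}
			if (!busy_loop_timeout(busy_start))
				continue;
		}
		busy_flag = 0;	/* only busy-poll once per syscall */

		/* ... otherwise sleep until woken or the timeout expires ... */
	}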
On Fri, Jun 22, 2018 at 12:56:13PM +0100, Al Viro wrote:
> So mark that in ->f_mode - I strongly suspect that
> sk_can_busy_loop(sock->sk) can't change while an opened file is there.
> And lift that (conditional on new FMODE_BUSY_LOOP) into do_poll()
> and do_select() - we *already* have bits and pieces of that logic in
> there, and that way they'd at least be gathered in one place.
The problem is that call to sk_busy_loop(), which is going to be indirect
no matter what.
>
> Then replace ->get_poll_head() with file->f_poll_head and
> see what it gives.
Working on it right now.  Works so far except for the busy loop case.
I'm looking into a separate method just for that as a first idea.
On Fri, Jun 22, 2018 at 02:07:39PM +0200, Christoph Hellwig wrote:
> On Fri, Jun 22, 2018 at 12:56:13PM +0100, Al Viro wrote:
> > So mark that in ->f_mode - I strongly suspect that
> > sk_can_busy_loop(sock->sk) can't change while an opened file is there.
> > And lift that (conditional on new FMODE_BUSY_LOOP) into do_poll()
> > and do_select() - we *already* have bits and pieces of that logic in
> > there, and that way they'd at least be gathered in one place.
>
> The problem is that call to sk_busy_loop(), which is going to be indirect
> no matter what.
if ->f_poll_head is NULL {
	use ->poll
} else {
	if can ll_poll (checked in ->f_mode)
		call ->ll_poll(), if it returns what we want - we are done
	add to ->f_poll_head
	call ->poll_mask()
}
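[Aside: one possible C rendering of the sketch above, just to make the control flow concrete. FMODE_CAN_BUSY_POLL, ->ll_poll() and file->f_poll_head are hypothetical names from this discussion, not an existing kernel API.]

static __poll_t vfs_poll_sketch(struct file *file, struct poll_table_struct *pt)
{
	__poll_t events = poll_requested_events(pt);
	__poll_t mask;

	if (!file->f_poll_head)
		return file->f_op->poll(file, pt);

	if ((file->f_mode & FMODE_CAN_BUSY_POLL) && (events & POLL_BUSY_LOOP)) {
		mask = file->f_op->ll_poll(file, events);
		if (mask & events)
			return mask;	/* already got what we wanted */
	}

	poll_wait(file, file->f_poll_head, pt);
	return file->f_op->poll_mask(file, events);
}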
On Fri, Jun 22, 2018 at 01:17:22PM +0100, Al Viro wrote:
> > The problem is that call to sk_busy_loop(), which is going to be indirect
> > no matter what.
>
> if ->f_poll_head is NULL {
> 	use ->poll
> } else {
> 	if can ll_poll (checked in ->f_mode)
> 		call ->ll_poll(), if it returns what we want - we are done
> 	add to ->f_poll_head
> 	call ->poll_mask()
What I have for now is slightly different:
	if ((events & POLL_BUSY_LOOP) && file->f_op->poll_busy_loop)
		file->f_op->poll_busy_loop(file, events);

	if (file->f_op->poll) {
		return file->f_op->poll(file, pt);
	} else if (file_has_poll_mask(file)) {
		...
	}
The "returns what we want" part is something I want to look into
once the basics are done, as it is probably not entirely trivial due to
the structure of polling in the low-level network protocols.
On Fri, Jun 22, 2018 at 02:33:07PM +0200, Christoph Hellwig wrote:
> On Fri, Jun 22, 2018 at 01:17:22PM +0100, Al Viro wrote:
> > > The problem is that call to sk_busy_loop(), which is going to be indirect
> > > no matter what.
> >
> > if ->f_poll_head is NULL {
> > 	use ->poll
> > } else {
> > 	if can ll_poll (checked in ->f_mode)
> > 		call ->ll_poll(), if it returns what we want - we are done
> > 	add to ->f_poll_head
> > 	call ->poll_mask()
>
> What I have for now is slightly different:
>
> 	if ((events & POLL_BUSY_LOOP) && file->f_op->poll_busy_loop)
> 		file->f_op->poll_busy_loop(file, events);
>
> 	if (file->f_op->poll) {
> 		return file->f_op->poll(file, pt);
> 	} else if (file_has_poll_mask(file)) {
> 		...
> 	}
>
> The "returns what we want" part is something I want to look into
> once the basics are done, as it is probably not entirely trivial due to
> the structure of polling in the low-level network protocols.
First of all, you'll get the same ->f_op for *all* sockets. So you'll be
hitting that path regardless of sk_can_busy_loop(sock->sk). What's more,
that way you get (on the fast path) even more indirect calls, AFAICS.
And I don't see any point in a separate file_has_poll_mask() - just check
->f_poll_head and that's it.
Hi Xiaolong,
can you retest this workload on the following branch:
git://git.infradead.org/users/hch/vfs.git remove-get-poll-head
Gitweb:
http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/remove-get-poll-head
On Fri, Jun 22, 2018 at 05:02:51PM +0200, Christoph Hellwig wrote:
> Hi Xiaolong,
>
> can you retest this workload on the following branch:
>
> git://git.infradead.org/users/hch/vfs.git remove-get-poll-head
>
> Gitweb:
>
> http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/remove-get-poll-head
See objections upthread re "fs,net: move poll busy loop handling into a
separate method"; as for the next one... I'd like an ACK from networking
folks.  The rest of the queue makes sense.
On Fri, Jun 22, 2018 at 04:14:09PM +0100, Al Viro wrote:
> > http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/remove-get-poll-head
>
> See objections upthread re "fs,net: move poll busy loop handling into a
> separate method"; as for the next one... I'd like an ACK from networking
> folks.  The rest of the queue makes sense.
I want to see basic results first before micro-optimizing. After that
I'll send it out to the net folks for feedback.
On Fri, Jun 22, 2018 at 05:28:50PM +0200, Christoph Hellwig wrote:
> On Fri, Jun 22, 2018 at 04:14:09PM +0100, Al Viro wrote:
> > > http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/remove-get-poll-head
> >
> > See objections upthread re "fs,net: move poll busy loop handling into a
> > separate method"; as for the next one... I'd like an ACK from networking
> > folks.  The rest of the queue makes sense.
>
> I want to see basic results first before micro-optimizing. After that
> I'll send it out to the net folks for feedback.
I looked into this a bit; in the end sk_can_busy_loop() does this:

	return sk->sk_ll_usec && !signal_pending(current);

where sk_ll_usec defaults to a value taken from a sysctl that needs to be
turned on, but can be overridden per socket.

At the same time the core poll code already checks net_busy_loop_on
to set POLL_BUSY_LOOP.  So except for sockets where people set the
timeout to 0, the code already does the right thing as-is.  IMHO it's not
really worth wasting a FMODE_* flag for it, but if you insist I'll add
it.
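[Aside on how sk_ll_usec ends up non-zero in practice: it is initialized from the net.core.busy_read sysctl when the socket is created, net_busy_loop_on() looks at net.core.busy_poll, and an application can opt a single socket in via SO_BUSY_POLL, roughly as in the sketch below; raising the value above the current one needs CAP_NET_ADMIN. This is an illustrative userspace snippet, not part of the patches under discussion.]

#include <stdio.h>
#include <sys/socket.h>

/* opt one socket into busy polling, independent of the global sysctls */
static int enable_busy_poll(int fd)
{
	int usec = 50;	/* busy-poll budget per syscall, in microseconds */

	if (setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usec, sizeof(usec)) < 0) {
		perror("setsockopt(SO_BUSY_POLL)");
		return -1;
	}
	return 0;
}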
On Fri, Jun 22, 2018 at 7:01 AM Al Viro <[email protected]> wrote:
>
> On Fri, Jun 22, 2018 at 12:00:14PM +0200, Christoph Hellwig wrote:
> > And a version with select() also covered:
>
> For fuck sake, if you want vfs_poll() inlined, *make* *it* *inlined*.
> Is there any reason for not doing that other than EXPORT_SYMBOL_GPL
> fetish? Because if there isn't, I would like to draw your attention
> to the fact that _this_ pwecious inchewlekshul pwopewty can be trivially
> open-coded by out-of-tree shite even if it happens to be non-GPL one.
>
Was this suggestion so bad that you have to insult not only the
author, but also people with speech impediments?
Sean
> > mask = vfs_poll(f.file, wait);
> > + if (f.file->f_op->poll) {
>
> ... not to mention that here you forgot to remove the call itself while
> expanding it.
>
> Said that, you are not attacking the worst part of it - it's a static
> branch, not the considerably more costly indirect ones. Remember when
> I asked you about the price of those? Method calls are costly.
>
> Another problem with the ->get_poll_head() calling conventions is
> that originally you wanted to return ERR_PTR(-mask) as a way to report
> not needing to call ->poll_mask(); that got shot down since quite
> a few of those don't fit into 12 bits that ERR_PTR() gives us.
>
> IIRC, the real reason for non-constant ->get_poll_head() was the sockets,
> with
>
> static struct wait_queue_head *sock_get_poll_head(struct file *file,
> 		__poll_t events)
> {
> 	struct socket *sock = file->private_data;
>
> 	if (!sock->ops->poll_mask)
> 		return NULL;
> 	sock_poll_busy_loop(sock, events);
> 	return sk_sleep(sock->sk);
> }
>
> The first part isn't a problem (it is constant). The second is
> static inline void sock_poll_busy_loop(struct socket *sock, __poll_t events)
> {
> 	if (sk_can_busy_loop(sock->sk) &&
> 	    events && (events & POLL_BUSY_LOOP)) {
> 		/* once, only if requested by syscall */
> 		sk_busy_loop(sock->sk, 1);
> 	}
> }
>
> and the third -
>
> static inline wait_queue_head_t *sk_sleep(struct sock *sk)
> {
> 	BUILD_BUG_ON(offsetof(struct socket_wq, wait) != 0);
> 	return &rcu_dereference_raw(sk->sk_wq)->wait;
> }
>
> Now, ->sk_wq is modified only in sock_init_data() and sock_graft();
> the latter, IIRC, is ->accept() helper. Do we ever call either of
> those on a sock of already opened file? IOW, is there any real
> reason for socket ->get_poll_head() not to be constant, other
> than wanting to keep POLL_BUSY_LOOP handling out of ->poll_mask()?
> I agree that POLL_BUSY_LOOP is ugly as hell, but you *still* have
> sock_poll_mask() not free from it...
On Fri, Jun 22, 2018 at 06:18:02PM +0200, Christoph Hellwig wrote:
> On Fri, Jun 22, 2018 at 05:28:50PM +0200, Christoph Hellwig wrote:
> > On Fri, Jun 22, 2018 at 04:14:09PM +0100, Al Viro wrote:
> > > > http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/remove-get-poll-head
> > >
> > > See objections upthread re "fs,net: move poll busy loop handling into a
> > > separate method"; as for the next one... I'd like an ACK from networking
> > > folks.  The rest of the queue makes sense.
> >
> > I want to see basic results first before micro-optimizing. After that
> > I'll send it out to the net folks for feedback.
>
> I looked into this a bit, in the end sk_can_busy_loop does this:
>
> return sk->sk_ll_usec && !signal_pending(current);
>
> where sk_ll_usec defaults to a value taken from a sysctl that needs to be
> turned on, but can be overridden per socket.
>
> At the same time the core poll code already checks net_busy_loop_on
> to set POLL_BUSY_LOOP.  So except for sockets where people set the
> timeout to 0, the code already does the right thing as-is.  IMHO it's not
> really worth wasting a FMODE_* flag for it, but if you insist I'll add
> it.
It's not just that - there's also an issue of extra indirect call on the
fast path for sockets. You get this method of yours + ->poll_mask(),
which hits another indirect to per-family ->poll_mask(). It might be
better to have these combined, sparing us an extra indirect call.
Just give it the same calling conventions as ->poll_mask() has...
On Fri, Jun 22, 2018 at 09:02:55PM +0100, Al Viro wrote:
> > While at the same time corect poll code already checks net_busy_loop_on
> > to set POLL_BUSY_LOOP. So except for sockets where people set the
> > timeout to 0 the code already does the right thing as-is. IMHO not
> > really worth wasting a FMODE_* flag for it, but if you insist I'll add
> > it.
>
> It's not just that - there's also an issue of extra indirect call on the
> fast path for sockets. You get this method of yours + ->poll_mask(),
> which hits another indirect to per-family ->poll_mask(). It might be
> better to have these combined, sparing us an extra indirect call.
>
> Just give it the same calling conventions as ->poll_mask() has...
The problem is that for the busy poll we want the actual busy poll +
__pollwait + ->poll_mask, which is going to make that new poll_busy_loop
with a return value look exactly like ->poll.
So for now I'm tempted to just do this:
---
From 4abf23f6565ff2a74f1859758f9c894abe476a00 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <[email protected]>
Date: Sat, 23 Jun 2018 09:02:59 +0200
Subject: FOLD: remove ->poll_busy_loop again
Busy looping always comes in from poll(2) or select(2). So instead of
adding a separate method we can just do it at the beginning of ->poll
for now.
Signed-off-by: Christoph Hellwig <[email protected]>
---
fs/select.c | 8 --------
include/linux/fs.h | 1 -
net/socket.c | 20 ++++++--------------
3 files changed, 6 insertions(+), 23 deletions(-)
diff --git a/fs/select.c b/fs/select.c
index 25327efca2f9..c68f7cdc777a 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -38,14 +38,6 @@ __poll_t vfs_poll(struct file *file, struct poll_table_struct *pt)
 {
 	unsigned int events = poll_requested_events(pt);
 
-	/*
-	 * XXX: might be worth adding a f_mode flag to see if busy looping is
-	 * supported. Although callers probably only keep setting it when
-	 * supported, that's why POLL_BUSY_LOOP is reported in the output.
-	 */
-	if ((events & POLL_BUSY_LOOP) && file->f_op->poll_busy_loop)
-		file->f_op->poll_busy_loop(file, events);
-
 	if (file->f_op->poll) {
 		return file->f_op->poll(file, pt);
 	} else if (file->f_poll_head) {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 82133bd1a047..bfaebdc03878 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1722,7 +1722,6 @@ struct file_operations {
 	int (*iterate_shared) (struct file *, struct dir_context *);
 	__poll_t (*poll) (struct file *, struct poll_table_struct *);
 	__poll_t (*poll_mask) (struct file *, __poll_t);
-	void (*poll_busy_loop)(struct file *file, __poll_t events);
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
diff --git a/net/socket.c b/net/socket.c
index b52e5b900e09..0aaa49190b30 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -131,19 +131,6 @@ static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
 				struct pipe_inode_info *pipe, size_t len,
 				unsigned int flags);
 
-#ifdef CONFIG_NET_RX_BUSY_POLL
-static void sock_poll_busy_loop(struct file *file, __poll_t events)
-{
-	struct socket *sock = file->private_data;
-
-	/* once, only if requested by syscall */
-	if (sk_can_busy_loop(sock->sk))
-		sk_busy_loop(sock->sk, 1);
-}
-#else
-#define sock_poll_busy_loop NULL
-#endif
-
 /*
  * Socket files have a set of 'special' operations as well as the generic file ones. These don't appear
  * in the operation structures but are done directly via the socketcall() multiplexor.
@@ -155,7 +142,6 @@ static const struct file_operations socket_file_ops = {
 	.read_iter = sock_read_iter,
 	.write_iter = sock_write_iter,
 	.poll_mask = sock_poll_mask,
-	.poll_busy_loop = sock_poll_busy_loop,
 	.poll = sock_poll,
 	.unlocked_ioctl = sock_ioctl,
 #ifdef CONFIG_COMPAT
@@ -1163,6 +1149,12 @@ static __poll_t sock_poll(struct file *file, poll_table *wait)
 	struct socket *sock = file->private_data;
 	__poll_t events = poll_requested_events(wait), mask = 0;
 
+	/*
+	 * Poll once, if requested by syscall.
+	 */
+	if ((events & POLL_BUSY_LOOP) && sk_can_busy_loop(sock->sk))
+		sk_busy_loop(sock->sk, 1);
+
 	if (sock->ops->poll) {
 		mask = sock->ops->poll(file, sock, wait);
 	} else if (sock->ops->poll_mask) {
--
2.17.1
Hi,
On 06/22, Christoph Hellwig wrote:
>Hi Xiaolong,
>
>can you retest this workload on the following branch:
>
> git://git.infradead.org/users/hch/vfs.git remove-get-poll-head
>
>Gitweb:
>
> http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/remove-get-poll-head
Here is the comparison for commit 3deb642f0d and commit 8fbedc1 ("fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer") in remove-get-poll-head branch.
3deb642f0de4c14f 8fbedc19c94fd25a2b9b327015
---------------- --------------------------
%stddev change %stddev
\ | \
457120 -7% 424608 will-it-scale.per_process_ops
238978 233862 will-it-scale.per_thread_ops
1755 ? 13% 8% 1899 ? 18% will-it-scale.time.voluntary_context_switches
2332 2342 will-it-scale.time.system_time
310 310 will-it-scale.time.elapsed_time
310 310 will-it-scale.time.elapsed_time.max
4096 4096 will-it-scale.time.page_size
0.54 0.54 will-it-scale.scalability
807 806 will-it-scale.time.percent_of_cpu_this_job_got
17218 17179 will-it-scale.time.minor_page_faults
9931 9862 will-it-scale.time.maximum_resident_set_size
173 -6% 163 will-it-scale.time.user_time
49024375 -6% 46155690 will-it-scale.workload
17818 ? 10% -19% 14397 ? 4% will-it-scale.time.involuntary_context_switches
116842 ? 12% -4% 112098 ? 5% interrupts.CAL:Function_call_interrupts
32735 32635 vmstat.system.in
2112 ? 7% -13% 1845 ? 3% vmstat.system.cs
150 150 turbostat.PkgWatt
123 122 turbostat.CorWatt
1573 1573 turbostat.Avg_MHz
15.73 13% 17.77 ? 19% boot-time.kernel_boot
15.07 12% 16.93 ? 21% boot-time.dhcp
771 8% 834 ? 12% boot-time.idle
25.69 8% 27.69 ? 12% boot-time.boot
1755 ? 13% 8% 1899 ? 18% time.voluntary_context_switches
2332 2342 time.system_time
310 310 time.elapsed_time
310 310 time.elapsed_time.max
4096 4096 time.page_size
807 806 time.percent_of_cpu_this_job_got
17218 17179 time.minor_page_faults
9931 9862 time.maximum_resident_set_size
173 -6% 163 time.user_time
17818 ? 10% -19% 14397 ? 4% time.involuntary_context_switches
428813 ? 9% 57% 672385 proc-vmstat.pgalloc_normal
41736 ? 15% 22% 50828 proc-vmstat.nr_free_cma
18116 8% 19506 ? 8% proc-vmstat.nr_slab_unreclaimable
1029 1033 proc-vmstat.nr_page_table_pages
8453 8471 proc-vmstat.nr_kernel_stack
6486 6499 proc-vmstat.nr_mapped
3193607 3194517 proc-vmstat.nr_dirty_threshold
1594853 1595308 proc-vmstat.nr_dirty_background_threshold
16061877 16064831 proc-vmstat.nr_free_pages
20009 20005 proc-vmstat.nr_anon_pages
6303 6294 proc-vmstat.numa_other
799772 797937 proc-vmstat.pgfault
667803 665906 proc-vmstat.pgfree
666440 663786 proc-vmstat.numa_hit
660136 657491 proc-vmstat.numa_local
313125 310062 proc-vmstat.nr_file_pages
1941 ? 5% 1917 ? 8% proc-vmstat.numa_pte_updates
1448 ? 7% 1421 ? 9% proc-vmstat.numa_hint_faults_local
1596 ? 6% 1558 ? 10% proc-vmstat.numa_hint_faults
12893 -6% 12152 ? 11% proc-vmstat.nr_slab_reclaimable
22885 -100% 0 proc-vmstat.nr_indirectly_reclaimable
245443 ? 16% -100% 0 proc-vmstat.pgalloc_movable
19861107 ? 14% 34% 26619357 ? 35% perf-stat.node-load-misses
51734389 ? 5% 22% 63014695 ? 25% perf-stat.node-loads
1.924e+09 ? 3% 21% 2.32e+09 ? 5% perf-stat.iTLB-load-misses
2.342e+09 ? 8% 15% 2.695e+09 ? 4% perf-stat.cache-references
3.251e+08 ? 7% 11% 3.622e+08 ? 5% perf-stat.iTLB-loads
2.106e+08 ? 4% 10% 2.323e+08 ? 11% perf-stat.cache-misses
0.74 7% 0.79 perf-stat.cpi
1.605e+08 ? 7% 6% 1.703e+08 ? 6% perf-stat.node-stores
50804799 ? 16% 5% 53535896 ? 18% perf-stat.node-store-misses
27.63 ? 8% 5% 29.07 ? 8% perf-stat.node-load-miss-rate%
85.55 86.49 perf-stat.iTLB-load-miss-rate%
0.25 0.25 perf-stat.branch-miss-rate%
778741 776946 perf-stat.minor-faults
778753 776948 perf-stat.page-faults
23.93 ? 9% 23.75 ? 12% perf-stat.node-store-miss-rate%
9117 ? 4% 8969 ? 4% perf-stat.cpu-migrations
1.59e+13 -4% 1.533e+13 perf-stat.cpu-cycles
439328 ? 3% -5% 419250 ? 5% perf-stat.path-length
9.05 ? 8% -5% 8.62 ? 9% perf-stat.cache-miss-rate%
0.44 ? 39% -6% 0.42 ? 31% perf-stat.dTLB-load-miss-rate%
1.35 -7% 1.26 perf-stat.ipc
3.294e+12 ? 3% -9% 2.988e+12 ? 3% perf-stat.dTLB-stores
5.451e+12 ? 4% -10% 4.905e+12 ? 4% perf-stat.dTLB-loads
4.667e+12 ? 3% -10% 4.195e+12 ? 4% perf-stat.branch-instructions
2.154e+13 ? 3% -10% 1.935e+13 ? 4% perf-stat.instructions
1.161e+10 ? 4% -10% 1.043e+10 ? 5% perf-stat.branch-misses
2.401e+10 ? 34% -13% 2.093e+10 ? 36% perf-stat.dTLB-load-misses
653927 ? 8% -13% 568299 ? 3% perf-stat.context-switches
11203 ? 4% -26% 8344 perf-stat.instructions-per-iTLB-miss
0.02 ? 41% -50% 0.01 ? 47% perf-stat.dTLB-store-miss-rate%
7.557e+08 ? 37% -53% 3.521e+08 ? 49% perf-stat.dTLB-store-misses
Thanks,
Xiaolong
On Tue, Jun 26, 2018 at 02:03:38PM +0800, Ye Xiaolong wrote:
> Hi,
>
> On 06/22, Christoph Hellwig wrote:
> >Hi Xiaolong,
> >
> >can you retest this workload on the following branch:
> >
> > git://git.infradead.org/users/hch/vfs.git remove-get-poll-head
> >
> >Gitweb:
> >
> > http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/remove-get-poll-head
>
> Here is the comparison for commit 3deb642f0d and commit 8fbedc1 ("fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer") in remove-get-poll-head branch.
Especially the boot time numbers and some of the others look like there
are additional changes involved.
Can you compare the baseline of my tree, which is
894b8c00 ("Merge tag 'for_v4.18-rc2' of
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs") against 8fbedc1
(("fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer") ?
On 06/27, Christoph Hellwig wrote:
>On Tue, Jun 26, 2018 at 02:03:38PM +0800, Ye Xiaolong wrote:
>> Hi,
>>
>> On 06/22, Christoph Hellwig wrote:
>> >Hi Xiaolong,
>> >
>> >can you retest this workload on the following branch:
>> >
>> > git://git.infradead.org/users/hch/vfs.git remove-get-poll-head
>> >
>> >Gitweb:
>> >
>> > http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/remove-get-poll-head
>>
>> Here is the comparison for commit 3deb642f0d and commit 8fbedc1 ("fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer") in remove-get-poll-head branch.
>
>Especially the boot time ones and others look like they have additional
>changes.
>
>Can you compare the baseline of my tree, which is
>894b8c00 ("Merge tag 'for_v4.18-rc2' of
>git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs") against 8fbedc1
>(("fs: replace f_ops->get_poll_head with a static ->f_poll_head pointer") ?
Update the result:
testcase/path_params/tbox_group/run: will-it-scale/poll2-performance/lkp-sb03
894b8c000ae6c106 8fbedc19c94fd25a2b9b327015
---------------- --------------------------
%stddev change %stddev
\ | \
404611 ? 4% 5% 424608 will-it-scale.per_process_ops
1489 ? 21% 28% 1899 ? 18% will-it-scale.time.voluntary_context_switches
45828560 46155690 will-it-scale.workload
2337 2342 will-it-scale.time.system_time
806 806 will-it-scale.time.percent_of_cpu_this_job_got
310 310 will-it-scale.time.elapsed_time
310 310 will-it-scale.time.elapsed_time.max
4096 4096 will-it-scale.time.page_size
233917 233862 will-it-scale.per_thread_ops
17196 17179 will-it-scale.time.minor_page_faults
9901 9862 will-it-scale.time.maximum_resident_set_size
14705 ? 3% 14397 ? 4% will-it-scale.time.involuntary_context_switches
167 163 will-it-scale.time.user_time
0.66 ? 25% -17% 0.54 will-it-scale.scalability
120508 ? 15% -7% 112098 ? 5% interrupts.CAL:Function_call_interrupts
1670 ? 3% 10% 1845 ? 3% vmstat.system.cs
32707 32635 vmstat.system.in
121 122 turbostat.CorWatt
149 150 turbostat.PkgWatt
1573 1573 turbostat.Avg_MHz
17.54 ? 19% 17.77 ? 19% boot-time.kernel_boot
824 ? 12% 834 ? 12% boot-time.idle
27.45 ? 12% 27.69 ? 12% boot-time.boot
16.96 ? 21% 16.93 ? 21% boot-time.dhcp
1489 ? 21% 28% 1899 ? 18% time.voluntary_context_switches
2337 2342 time.system_time
806 806 time.percent_of_cpu_this_job_got
310 310 time.elapsed_time
310 310 time.elapsed_time.max
4096 4096 time.page_size
17196 17179 time.minor_page_faults
9901 9862 time.maximum_resident_set_size
14705 ? 3% 14397 ? 4% time.involuntary_context_switches
167 163 time.user_time
18320 6% 19506 ? 8% proc-vmstat.nr_slab_unreclaimable
1518 ? 7% 1558 ? 10% proc-vmstat.numa_hint_faults
1387 ? 8% 1421 ? 9% proc-vmstat.numa_hint_faults_local
1873 ? 5% 1917 ? 8% proc-vmstat.numa_pte_updates
19987 20005 proc-vmstat.nr_anon_pages
8464 8471 proc-vmstat.nr_kernel_stack
309815 310062 proc-vmstat.nr_file_pages
50828 50828 proc-vmstat.nr_free_cma
16065590 16064831 proc-vmstat.nr_free_pages
3194669 3194517 proc-vmstat.nr_dirty_threshold
1595384 1595308 proc-vmstat.nr_dirty_background_threshold
798886 797937 proc-vmstat.pgfault
6510 6499 proc-vmstat.nr_mapped
659089 657491 proc-vmstat.numa_local
665458 663786 proc-vmstat.numa_hit
1037 1033 proc-vmstat.nr_page_table_pages
669923 665906 proc-vmstat.pgfree
676982 672385 proc-vmstat.pgalloc_normal
6368 6294 proc-vmstat.numa_other
13013 -7% 12152 ? 11% proc-vmstat.nr_slab_reclaimable
51213164 ? 18% 23% 63014695 ? 25% perf-stat.node-loads
22096136 ? 28% 20% 26619357 ? 35% perf-stat.node-load-misses
2.079e+08 ? 9% 12% 2.323e+08 ? 11% perf-stat.cache-misses
515039 ? 3% 10% 568299 ? 3% perf-stat.context-switches
3.283e+08 ? 22% 10% 3.622e+08 ? 5% perf-stat.iTLB-loads
Thanks,
Xiaolong
On Thu, Jun 28, 2018 at 08:38:34AM +0800, Ye Xiaolong wrote:
> Update the result:
>
> testcase/path_params/tbox_group/run: will-it-scale/poll2-performance/lkp-sb03
So this looks like a huge improvement in the per process ops, but not
as large as the original regression, and no change in the per-thread
ops.
But the baseline already looks much lower, e.g. this shows an improvement
from 404611 to 424608 for the per-process ops, while the original showed
a regression from 501456 to 457120. Are we measuring on different
hardware? Did we gain new spectre mitigations elsewhere? Either way
I'm going to send these patches out for review, but I'd like to understand
the numbers a bit more.
>
> 894b8c000ae6c106 8fbedc19c94fd25a2b9b327015
> ---------------- --------------------------
> %stddev change %stddev
> \ | \
> 404611 ? 4% 5% 424608 will-it-scale.per_process_ops
> 1489 ? 21% 28% 1899 ? 18% will-it-scale.time.voluntary_context_switches
> 45828560 46155690 will-it-scale.workload
> 2337 2342 will-it-scale.time.system_time
> 806 806 will-it-scale.time.percent_of_cpu_this_job_got
> 310 310 will-it-scale.time.elapsed_time
> 310 310 will-it-scale.time.elapsed_time.max
> 4096 4096 will-it-scale.time.page_size
> 233917 233862 will-it-scale.per_thread_ops
> 17196 17179 will-it-scale.time.minor_page_faults
> 9901 9862 will-it-scale.time.maximum_resident_set_size
> 14705 ? 3% 14397 ? 4% will-it-scale.time.involuntary_context_switches
> 167 163 will-it-scale.time.user_time
> 0.66 ? 25% -17% 0.54 will-it-scale.scalability
> 120508 ? 15% -7% 112098 ? 5% interrupts.CAL:Function_call_interrupts
> 1670 ? 3% 10% 1845 ? 3% vmstat.system.cs
> 32707 32635 vmstat.system.in
> 121 122 turbostat.CorWatt
> 149 150 turbostat.PkgWatt
> 1573 1573 turbostat.Avg_MHz
> 17.54 ? 19% 17.77 ? 19% boot-time.kernel_boot
> 824 ? 12% 834 ? 12% boot-time.idle
> 27.45 ? 12% 27.69 ? 12% boot-time.boot
> 16.96 ? 21% 16.93 ? 21% boot-time.dhcp
> 1489 ? 21% 28% 1899 ? 18% time.voluntary_context_switches
> 2337 2342 time.system_time
> 806 806 time.percent_of_cpu_this_job_got
> 310 310 time.elapsed_time
> 310 310 time.elapsed_time.max
> 4096 4096 time.page_size
> 17196 17179 time.minor_page_faults
> 9901 9862 time.maximum_resident_set_size
> 14705 ? 3% 14397 ? 4% time.involuntary_context_switches
> 167 163 time.user_time
> 18320 6% 19506 ? 8% proc-vmstat.nr_slab_unreclaimable
> 1518 ? 7% 1558 ? 10% proc-vmstat.numa_hint_faults
> 1387 ? 8% 1421 ? 9% proc-vmstat.numa_hint_faults_local
> 1873 ? 5% 1917 ? 8% proc-vmstat.numa_pte_updates
> 19987 20005 proc-vmstat.nr_anon_pages
> 8464 8471 proc-vmstat.nr_kernel_stack
> 309815 310062 proc-vmstat.nr_file_pages
> 50828 50828 proc-vmstat.nr_free_cma
> 16065590 16064831 proc-vmstat.nr_free_pages
> 3194669 3194517 proc-vmstat.nr_dirty_threshold
> 1595384 1595308 proc-vmstat.nr_dirty_background_threshold
> 798886 797937 proc-vmstat.pgfault
> 6510 6499 proc-vmstat.nr_mapped
> 659089 657491 proc-vmstat.numa_local
> 665458 663786 proc-vmstat.numa_hit
> 1037 1033 proc-vmstat.nr_page_table_pages
> 669923 665906 proc-vmstat.pgfree
> 676982 672385 proc-vmstat.pgalloc_normal
> 6368 6294 proc-vmstat.numa_other
> 13013 -7% 12152 ? 11% proc-vmstat.nr_slab_reclaimable
> 51213164 ? 18% 23% 63014695 ? 25% perf-stat.node-loads
> 22096136 ? 28% 20% 26619357 ? 35% perf-stat.node-load-misses
> 2.079e+08 ? 9% 12% 2.323e+08 ? 11% perf-stat.cache-misses
> 515039 ? 3% 10% 568299 ? 3% perf-stat.context-switches
> 3.283e+08 ? 22% 10% 3.622e+08 ? 5% perf-stat.iTLB-loads
>
> Thanks,
> Xiaolong
---end quoted text---