2021-04-23 05:29:45

by kernel test robot

Subject: [mm/writeback] e5dbd33218: will-it-scale.per_process_ops -3.8% regression



Greetings,

FYI, we noticed a -3.8% regression of will-it-scale.per_process_ops due to commit:


commit: e5dbd33218bd8d87ab69f730ab90aed5fab7eb26 ("mm/writeback: Add wait_on_page_writeback_killable")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: will-it-scale
on test machine: 88 threads Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
with following parameters:

nr_task: 16
mode: process
test: mmap2
cpufreq_governor: performance
ucode: 0x5003006

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
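
For orientation, the mmap testcases can be pictured as a tight map/unmap loop, which is consistent with the __mmap and __munmap call traces in the perf-profile data of this report. The following is only a minimal Python sketch of that workload shape, not the will-it-scale source (which is written in C). The function name mmap_munmap_loop and the 128 MiB default size are assumptions made for illustration.

```python
import mmap

# Assumed mapping size for illustration only; this report does not state it.
MEMSIZE = 128 * 1024 * 1024


def mmap_munmap_loop(iterations: int, size: int = MEMSIZE) -> int:
    """Map and immediately unmap an anonymous region `iterations` times.

    Returns the number of completed map/unmap pairs, i.e. the kind of
    counter a per-process ops metric could be built from.
    """
    completed = 0
    for _ in range(iterations):
        region = mmap.mmap(-1, size)  # fileno -1 requests an anonymous mapping
        region.close()                # closing the object unmaps the region
        completed += 1
    return completed
```

Calling mmap_munmap_loop(1000) runs a thousand map/unmap pairs in one process. The benchmark itself runs nr_task such processes in parallel and reports the per-process rate.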

In addition, the commit also has a significant impact on the following tests:

+------------------+-----------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops -3.8% regression         |
| test machine     | 88 threads Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory |
| test parameters  | cpufreq_governor=performance                                          |
|                  | mode=process                                                          |
|                  | nr_task=50%                                                           |
|                  | test=mmap2                                                            |
|                  | ucode=0x5003006                                                       |
+------------------+-----------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops -4.4% regression         |
| test machine     | 88 threads Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory |
| test parameters  | cpufreq_governor=performance                                          |
|                  | mode=process                                                          |
|                  | nr_task=50%                                                           |
|                  | test=mmap1                                                            |
|                  | ucode=0x5003006                                                       |
+------------------+-----------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops -2.8% regression         |
| test machine     | 104 threads Skylake with 192G memory                                  |
| test parameters  | cpufreq_governor=performance                                          |
|                  | mode=process                                                          |
|                  | nr_task=16                                                            |
|                  | test=mmap2                                                            |
|                  | ucode=0x2006a0a                                                       |
+------------------+-----------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops -4.7% regression         |
| test machine     | 88 threads Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory |
| test parameters  | cpufreq_governor=performance                                          |
|                  | mode=process                                                          |
|                  | nr_task=100%                                                          |
|                  | test=mmap1                                                            |
|                  | ucode=0x5003006                                                       |
+------------------+-----------------------------------------------------------------------+


If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/16/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp9/mmap2/will-it-scale/0x5003006

commit:
39f985c8f6 ("fs/cachefiles: Remove wait_bit_key layout dependency")
e5dbd33218 ("mm/writeback: Add wait_on_page_writeback_killable")

39f985c8f667c80a e5dbd33218bd8d87ab69f730ab9
---------------- ---------------------------
%stddev %change %stddev
\ | \
9359770 -3.8% 9001769 will-it-scale.16.processes
584985 -3.8% 562610 will-it-scale.per_process_ops
9359770 -3.8% 9001769 will-it-scale.workload
15996 -1.2% 15811 proc-vmstat.nr_kernel_stack
23577 ± 10% +18.5% 27937 ± 7% softirqs.CPU48.SCHED
5183 ± 41% +47.2% 7630 ± 7% interrupts.CPU1.NMI:Non-maskable_interrupts
5183 ± 41% +47.2% 7630 ± 7% interrupts.CPU1.PMI:Performance_monitoring_interrupts
54.33 ± 12% +18.4% 64.33 ± 7% perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll
153.34 ± 24% -45.9% 83.00 ± 25% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
153.33 ± 24% -45.9% 82.99 ± 25% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
2.424e+10 -3.8% 2.332e+10 perf-stat.i.branch-instructions
0.47 +3.7% 0.48 perf-stat.i.cpi
2.529e+10 -4.0% 2.428e+10 perf-stat.i.dTLB-loads
1.15e+10 -3.8% 1.106e+10 perf-stat.i.dTLB-stores
54249733 -4.8% 51627939 perf-stat.i.iTLB-load-misses
1.004e+11 -3.8% 9.661e+10 perf-stat.i.instructions
2.15 -3.6% 2.07 perf-stat.i.ipc
693.66 -3.9% 666.70 perf-stat.i.metric.M/sec
0.46 +3.7% 0.48 perf-stat.overall.cpi
2.15 -3.6% 2.08 perf-stat.overall.ipc
2.416e+10 -3.8% 2.324e+10 perf-stat.ps.branch-instructions
2.52e+10 -4.0% 2.419e+10 perf-stat.ps.dTLB-loads
1.146e+10 -3.8% 1.102e+10 perf-stat.ps.dTLB-stores
54065825 -4.8% 51454019 perf-stat.ps.iTLB-load-misses
1.001e+11 -3.8% 9.628e+10 perf-stat.ps.instructions
3.025e+13 -3.9% 2.908e+13 perf-stat.total.instructions
0.89 ± 14% -0.1 0.77 ± 11% perf-profile.calltrace.cycles-pp.atime_needs_update.touch_atime.shmem_mmap.mmap_region.do_mmap
0.14 ± 13% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.common_mmap
0.61 ± 12% -0.1 0.52 ± 12% perf-profile.children.cycles-pp.common_file_perm
0.21 ± 8% -0.0 0.17 ± 11% perf-profile.children.cycles-pp.vma_set_page_prot
0.12 ± 8% -0.0 0.09 ± 12% perf-profile.children.cycles-pp.blocking_notifier_call_chain
0.12 ± 14% -0.0 0.09 ± 15% perf-profile.children.cycles-pp.get_mmap_base
0.09 ± 8% -0.0 0.07 ± 11% perf-profile.children.cycles-pp.vm_pgprot_modify
0.13 ± 15% +0.1 0.19 ± 8% perf-profile.children.cycles-pp.cap_capable
0.03 ±102% +0.1 0.12 ± 12% perf-profile.children.cycles-pp.munmap@plt
0.14 ± 13% +0.1 0.24 ± 6% perf-profile.children.cycles-pp.testcase
0.33 ± 10% -0.1 0.23 ± 10% perf-profile.self.cycles-pp.cap_vm_enough_memory
0.13 ± 11% -0.1 0.03 ±100% perf-profile.self.cycles-pp.common_mmap
0.48 ± 12% -0.1 0.41 ± 12% perf-profile.self.cycles-pp.common_file_perm
0.49 ± 12% -0.1 0.43 ± 13% perf-profile.self.cycles-pp.vm_area_alloc
0.12 ± 8% -0.0 0.09 ± 12% perf-profile.self.cycles-pp.blocking_notifier_call_chain
0.12 ± 13% -0.0 0.09 ± 14% perf-profile.self.cycles-pp.get_mmap_base
0.11 ± 8% +0.0 0.16 ± 10% perf-profile.self.cycles-pp.__x64_sys_munmap
0.11 ± 14% +0.1 0.18 ± 8% perf-profile.self.cycles-pp.cap_capable
0.12 ± 11% +0.1 0.20 ± 6% perf-profile.self.cycles-pp.testcase
0.01 ±223% +0.1 0.11 ± 13% perf-profile.self.cycles-pp.munmap@plt
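
The %change column and the per-process figure can be cross-checked from the two sample means in the table above. This is a small sketch: the helper names pct_change and per_process_ops are hypothetical, and the use of integer division is an assumption about how the per-process value is truncated.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from `base` to `new`, in percent."""
    return (new - base) / base * 100.0


def per_process_ops(workload: int, nr_task: int) -> int:
    """Total operations divided evenly across the running processes."""
    return workload // nr_task


# will-it-scale.workload for 39f985c8f6 (base) and e5dbd33218 (new), nr_task=16
base_workload, new_workload = 9359770, 9001769

print(round(pct_change(base_workload, new_workload), 1))  # -3.8
print(per_process_ops(base_workload, 16))                 # 584985
print(per_process_ops(new_workload, 16))                  # 562610
```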



[Plots: will-it-scale.16.processes, will-it-scale.per_process_ops and will-it-scale.workload, one point per bisect sample. Good samples lie between about 9.35e+06 and 9.4e+06 (processes, workload) and near 585000 (per_process_ops). Bad samples (O) lie at about 9e+06 and between 560000 and 565000 respectively.]

[*] bisect-good sample
[O] bisect-bad sample

***************************************************************************************************
lkp-csl-2sp9: 88 threads Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/50%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp9/mmap2/will-it-scale/0x5003006

commit:
39f985c8f6 ("fs/cachefiles: Remove wait_bit_key layout dependency")
e5dbd33218 ("mm/writeback: Add wait_on_page_writeback_killable")

39f985c8f667c80a e5dbd33218bd8d87ab69f730ab9
---------------- ---------------------------
%stddev %change %stddev
\ | \
25487561 -3.8% 24516984 will-it-scale.44.processes
579262 -3.8% 557203 will-it-scale.per_process_ops
25487561 -3.8% 24516984 will-it-scale.workload
3365 ± 12% -18.1% 2758 ± 15% numa-meminfo.node1.PageTables
841.83 ± 12% -18.2% 688.67 ± 15% numa-vmstat.node1.nr_page_table_pages
1.68 ± 12% +24.0% 2.09 ± 8% perf-sched.wait_time.avg.ms.rcu_gp_kthread.kthread.ret_from_fork
511.00 ± 3% +18.7% 606.50 ± 9% interrupts.CPU13.CAL:Function_call_interrupts
985.50 ± 39% -42.1% 571.00 ± 13% interrupts.CPU51.CAL:Function_call_interrupts
0.14 ± 10% -0.1 0.05 ± 45% perf-profile.children.cycles-pp.common_mmap
0.14 ± 8% +0.1 0.20 ± 9% perf-profile.children.cycles-pp.cap_capable
0.20 ± 9% +0.1 0.29 ± 11% perf-profile.children.cycles-pp.apparmor_mmap_file
0.02 ±141% +0.1 0.13 ± 17% perf-profile.children.cycles-pp.munmap@plt
0.15 ± 11% +0.1 0.28 ± 11% perf-profile.children.cycles-pp.testcase
0.14 ± 12% -0.1 0.04 ± 71% perf-profile.self.cycles-pp.common_mmap
0.35 ± 8% -0.1 0.25 ± 8% perf-profile.self.cycles-pp.cap_vm_enough_memory
0.11 ± 9% +0.1 0.18 ± 9% perf-profile.self.cycles-pp.__x64_sys_munmap
0.12 ± 11% +0.1 0.19 ± 9% perf-profile.self.cycles-pp.cap_capable
0.19 ± 8% +0.1 0.28 ± 11% perf-profile.self.cycles-pp.apparmor_mmap_file
0.13 ± 10% +0.1 0.23 ± 11% perf-profile.self.cycles-pp.testcase
0.00 +0.1 0.12 ± 19% perf-profile.self.cycles-pp.munmap@plt
6.572e+10 -3.8% 6.322e+10 perf-stat.i.branch-instructions
1.979e+08 -3.2% 1.916e+08 perf-stat.i.branch-misses
0.45 +3.8% 0.47 perf-stat.i.cpi
6.853e+10 -3.8% 6.59e+10 perf-stat.i.dTLB-loads
3.112e+10 -3.8% 2.994e+10 perf-stat.i.dTLB-stores
1.438e+08 -4.0% 1.381e+08 perf-stat.i.iTLB-load-misses
1912507 +2.0% 1949928 perf-stat.i.iTLB-loads
2.721e+11 -3.8% 2.618e+11 perf-stat.i.instructions
2.21 -3.8% 2.13 perf-stat.i.ipc
1879 -3.8% 1807 perf-stat.i.metric.M/sec
0.45 +3.9% 0.47 perf-stat.overall.cpi
0.00 +0.0 0.00 perf-stat.overall.dTLB-store-miss-rate%
2.21 -3.8% 2.13 perf-stat.overall.ipc
6.549e+10 -3.8% 6.301e+10 perf-stat.ps.branch-instructions
1.972e+08 -3.2% 1.91e+08 perf-stat.ps.branch-misses
6.829e+10 -3.8% 6.568e+10 perf-stat.ps.dTLB-loads
3.101e+10 -3.8% 2.983e+10 perf-stat.ps.dTLB-stores
1.433e+08 -3.9% 1.376e+08 perf-stat.ps.iTLB-load-misses
1905905 +2.0% 1943294 perf-stat.ps.iTLB-loads
2.711e+11 -3.8% 2.609e+11 perf-stat.ps.instructions
8.194e+13 -3.8% 7.88e+13 perf-stat.total.instructions
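
The cpi and ipc rows are two views of the same ratio, so one can be recovered from the other. A short check using the perf-stat.overall values from the table above; cpi_from_ipc is a hypothetical helper name.

```python
def cpi_from_ipc(ipc: float) -> float:
    """Cycles per instruction is the reciprocal of instructions per cycle."""
    return 1.0 / ipc


print(round(cpi_from_ipc(2.21), 2))  # 0.45 (39f985c8f6)
print(round(cpi_from_ipc(2.13), 2))  # 0.47 (e5dbd33218)
```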



***************************************************************************************************
lkp-csl-2sp9: 88 threads Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/50%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp9/mmap1/will-it-scale/0x5003006

commit:
39f985c8f6 ("fs/cachefiles: Remove wait_bit_key layout dependency")
e5dbd33218 ("mm/writeback: Add wait_on_page_writeback_killable")

39f985c8f667c80a e5dbd33218bd8d87ab69f730ab9
---------------- ---------------------------
%stddev %change %stddev
\ | \
29240907 -4.4% 27946132 will-it-scale.44.processes
664565 -4.4% 635139 will-it-scale.per_process_ops
29240907 -4.4% 27946132 will-it-scale.workload
10398 ± 30% -54.6% 4721 ± 13% proc-vmstat.numa_hint_faults
-15.86 +42.0% -22.53 sched_debug.cpu.nr_uninterruptible.min
11801 ± 8% -15.8% 9934 ± 12% numa-vmstat.node0.nr_slab_reclaimable
24848 ± 6% -14.2% 21328 ± 9% numa-vmstat.node0.nr_slab_unreclaimable
4235 ± 8% -11.6% 3745 ± 6% slabinfo.skbuff_head_cache.active_objs
4256 ± 8% -11.6% 3761 ± 5% slabinfo.skbuff_head_cache.num_objs
3575 ± 44% +55.9% 5572 ± 41% interrupts.CPU62.NMI:Non-maskable_interrupts
3575 ± 44% +55.9% 5572 ± 41% interrupts.CPU62.PMI:Performance_monitoring_interrupts
534.33 ± 6% +34.5% 718.67 ± 27% interrupts.CPU74.CAL:Function_call_interrupts
47209 ± 8% -15.8% 39739 ± 12% numa-meminfo.node0.KReclaimable
1344385 ± 3% -15.6% 1134906 ± 8% numa-meminfo.node0.MemUsed
47209 ± 8% -15.8% 39739 ± 12% numa-meminfo.node0.SReclaimable
99393 ± 6% -14.2% 85314 ± 9% numa-meminfo.node0.SUnreclaim
1252104 ± 3% +17.0% 1464694 ± 7% numa-meminfo.node1.MemUsed
121777 ± 6% +17.4% 142988 ± 9% numa-meminfo.node1.Slab
0.15 ± 11% -0.1 0.07 ± 13% perf-profile.children.cycles-pp.__x64_sys_mmap
0.17 ± 9% -0.1 0.09 ± 11% perf-profile.children.cycles-pp.get_mmap_base
0.25 ± 6% -0.0 0.20 ± 9% perf-profile.children.cycles-pp.cap_mmap_file
0.18 ± 10% +0.0 0.21 ± 5% perf-profile.children.cycles-pp.tlb_flush_mmu
0.19 ± 10% +0.0 0.24 ± 10% perf-profile.children.cycles-pp.cap_capable
0.27 ± 8% +0.1 0.35 ± 12% perf-profile.children.cycles-pp.apparmor_mmap_file
0.17 ± 10% +0.1 0.26 ± 10% perf-profile.children.cycles-pp.testcase
0.44 ± 9% -0.2 0.27 ± 9% perf-profile.self.cycles-pp.cap_vm_enough_memory
0.13 ± 13% -0.1 0.05 ± 45% perf-profile.self.cycles-pp.__x64_sys_mmap
0.16 ± 12% -0.1 0.09 ± 11% perf-profile.self.cycles-pp.get_mmap_base
0.23 ± 7% -0.0 0.18 ± 10% perf-profile.self.cycles-pp.cap_mmap_file
0.12 ± 9% +0.0 0.15 ± 8% perf-profile.self.cycles-pp.tlb_flush_mmu
0.18 ± 12% +0.0 0.22 ± 11% perf-profile.self.cycles-pp.cap_capable
0.14 ± 11% +0.1 0.21 ± 10% perf-profile.self.cycles-pp.testcase
0.13 ± 7% +0.1 0.21 ± 12% perf-profile.self.cycles-pp.__x64_sys_munmap
0.25 ± 9% +0.1 0.33 ± 11% perf-profile.self.cycles-pp.apparmor_mmap_file
6.752e+10 -4.4% 6.455e+10 perf-stat.i.branch-instructions
1.885e+08 -3.4% 1.821e+08 perf-stat.i.branch-misses
0.44 +4.6% 0.46 perf-stat.i.cpi
6.851e+10 -4.2% 6.563e+10 perf-stat.i.dTLB-loads
3.064e+10 -4.4% 2.929e+10 perf-stat.i.dTLB-stores
1.251e+08 -2.1% 1.225e+08 perf-stat.i.iTLB-load-misses
1904859 +1.4% 1931131 perf-stat.i.iTLB-loads
2.786e+11 -4.4% 2.664e+11 perf-stat.i.instructions
2232 -2.4% 2178 perf-stat.i.instructions-per-iTLB-miss
2.26 -4.4% 2.16 perf-stat.i.ipc
1893 -4.3% 1812 perf-stat.i.metric.M/sec
0.28 +0.0 0.28 perf-stat.overall.branch-miss-rate%
0.44 +4.7% 0.46 perf-stat.overall.cpi
0.00 +0.0 0.00 perf-stat.overall.dTLB-store-miss-rate%
2227 -2.4% 2174 perf-stat.overall.instructions-per-iTLB-miss
2.27 -4.4% 2.17 perf-stat.overall.ipc
6.728e+10 -4.4% 6.433e+10 perf-stat.ps.branch-instructions
1.879e+08 -3.4% 1.815e+08 perf-stat.ps.branch-misses
6.827e+10 -4.2% 6.54e+10 perf-stat.ps.dTLB-loads
3.054e+10 -4.4% 2.92e+10 perf-stat.ps.dTLB-stores
1.246e+08 -2.1% 1.22e+08 perf-stat.ps.iTLB-load-misses
1898462 +1.4% 1924678 perf-stat.ps.iTLB-loads
2.776e+11 -4.4% 2.654e+11 perf-stat.ps.instructions
8.39e+13 -4.4% 8.022e+13 perf-stat.total.instructions



***************************************************************************************************
lkp-skl-fpga01: 104 threads Skylake with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/16/debian-10.4-x86_64-20200603.cgz/lkp-skl-fpga01/mmap2/will-it-scale/0x2006a0a

commit:
39f985c8f6 ("fs/cachefiles: Remove wait_bit_key layout dependency")
e5dbd33218 ("mm/writeback: Add wait_on_page_writeback_killable")

39f985c8f667c80a e5dbd33218bd8d87ab69f730ab9
---------------- ---------------------------
%stddev %change %stddev
\ | \
6286058 -2.8% 6112306 will-it-scale.16.processes
392878 -2.8% 382018 will-it-scale.per_process_ops
6286058 -2.8% 6112306 will-it-scale.workload
11705 ± 7% +14.6% 13414 ± 7% softirqs.CPU23.RCU
0.75 ± 16% -0.2 0.58 ± 11% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.56 ± 12% -0.1 0.45 ± 12% perf-profile.children.cycles-pp.tick_sched_timer
0.51 ± 12% -0.1 0.41 ± 12% perf-profile.children.cycles-pp.tick_sched_handle
0.11 ± 10% -0.0 0.08 ± 6% perf-profile.children.cycles-pp.get_mmap_base
0.16 ± 10% +0.0 0.20 ± 4% perf-profile.children.cycles-pp.testcase
0.12 ± 17% +0.1 0.17 ± 11% perf-profile.children.cycles-pp.cap_capable
0.00 +0.1 0.10 ± 13% perf-profile.children.cycles-pp.munmap@plt
0.43 ± 9% -0.1 0.29 ± 6% perf-profile.self.cycles-pp.cap_vm_enough_memory
0.45 ± 9% -0.1 0.36 ± 9% perf-profile.self.cycles-pp.common_file_perm
0.10 ± 9% -0.0 0.08 ± 6% perf-profile.self.cycles-pp.get_mmap_base
0.00 +0.1 0.08 ± 14% perf-profile.self.cycles-pp.munmap@plt
0.19 ± 11% +0.1 0.27 ± 8% perf-profile.self.cycles-pp.__x64_sys_munmap
5721 ± 47% -62.2% 2162 ± 79% interrupts.41:PCI-MSI.67633156-edge.eth0-TxRx-3
108.83 ± 24% -42.7% 62.33 ± 30% interrupts.CPU101.NMI:Non-maskable_interrupts
108.83 ± 24% -42.7% 62.33 ± 30% interrupts.CPU101.PMI:Performance_monitoring_interrupts
29.67 ± 16% +146.1% 73.00 ± 77% interrupts.CPU11.RES:Rescheduling_interrupts
5721 ± 47% -62.2% 2162 ± 79% interrupts.CPU33.41:PCI-MSI.67633156-edge.eth0-TxRx-3
211.00 ± 99% -69.4% 64.67 ± 32% interrupts.CPU43.NMI:Non-maskable_interrupts
211.00 ± 99% -69.4% 64.67 ± 32% interrupts.CPU43.PMI:Performance_monitoring_interrupts
99.00 ± 21% -26.9% 72.33 ± 21% interrupts.CPU44.NMI:Non-maskable_interrupts
99.00 ± 21% -26.9% 72.33 ± 21% interrupts.CPU44.PMI:Performance_monitoring_interrupts
108.67 ± 33% -37.4% 68.00 ± 18% interrupts.CPU49.NMI:Non-maskable_interrupts
108.67 ± 33% -37.4% 68.00 ± 18% interrupts.CPU49.PMI:Performance_monitoring_interrupts
97.67 ± 20% -24.9% 73.33 ± 17% interrupts.CPU96.NMI:Non-maskable_interrupts
97.67 ± 20% -24.9% 73.33 ± 17% interrupts.CPU96.PMI:Performance_monitoring_interrupts
0.03 ± 47% -66.3% 0.01 ± 42% perf-sched.sch_delay.max.ms.do_nanosleep.hrtimer_nanosleep.__x64_sys_nanosleep.do_syscall_64
1678 ± 38% +59.5% 2677 ± 12% perf-sched.wait_and_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
1678 ± 38% +59.5% 2677 ± 12% perf-sched.wait_and_delay.avg.ms.do_syslog.part.0.kmsg_read.vfs_read
4759 ± 44% +70.0% 8092 ± 10% perf-sched.wait_and_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
4759 ± 44% +70.0% 8092 ± 10% perf-sched.wait_and_delay.max.ms.do_syslog.part.0.kmsg_read.vfs_read
4762 ± 44% +70.0% 8094 ± 10% perf-sched.wait_and_delay.max.ms.pipe_read.new_sync_read.vfs_read.ksys_read
4319 ± 49% +78.4% 7706 ± 11% perf-sched.wait_and_delay.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
1678 ± 38% +59.5% 2677 ± 12% perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
1678 ± 38% +59.5% 2677 ± 12% perf-sched.wait_time.avg.ms.do_syslog.part.0.kmsg_read.vfs_read
4759 ± 44% +70.0% 8092 ± 10% perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
4759 ± 44% +70.0% 8092 ± 10% perf-sched.wait_time.max.ms.do_syslog.part.0.kmsg_read.vfs_read
4762 ± 44% +70.0% 8094 ± 10% perf-sched.wait_time.max.ms.pipe_read.new_sync_read.vfs_read.ksys_read
4319 ± 49% +78.4% 7706 ± 11% perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
1.647e+10 -2.6% 1.603e+10 perf-stat.i.branch-instructions
82309505 ± 3% -6.4% 77043640 perf-stat.i.branch-misses
5.99 ± 24% +7.5 13.49 ± 44% perf-stat.i.cache-miss-rate%
0.08 ± 4% -0.0 0.07 ± 2% perf-stat.i.dTLB-load-miss-rate%
13214811 ± 4% -6.1% 12407570 ± 2% perf-stat.i.dTLB-load-misses
1.719e+10 -2.6% 1.674e+10 perf-stat.i.dTLB-loads
7.906e+09 -2.6% 7.699e+09 perf-stat.i.dTLB-stores
87.09 ± 3% -5.7 81.35 ± 4% perf-stat.i.iTLB-load-miss-rate%
13127160 ± 2% -5.1% 12459801 perf-stat.i.iTLB-load-misses
1919142 ± 23% +49.4% 2867789 ± 24% perf-stat.i.iTLB-loads
6.825e+10 -2.6% 6.645e+10 perf-stat.i.instructions
399.89 -2.7% 389.17 perf-stat.i.metric.M/sec
6.22 ± 28% +7.3 13.52 ± 39% perf-stat.overall.cache-miss-rate%
0.08 ± 4% -0.0 0.07 ± 2% perf-stat.overall.dTLB-load-miss-rate%
87.27 ± 3% -5.8 81.44 ± 4% perf-stat.overall.iTLB-load-miss-rate%
1.641e+10 -2.6% 1.598e+10 perf-stat.ps.branch-instructions
82037823 ± 3% -6.4% 76784758 perf-stat.ps.branch-misses
13169722 ± 4% -6.1% 12365594 ± 2% perf-stat.ps.dTLB-load-misses
1.713e+10 -2.7% 1.668e+10 perf-stat.ps.dTLB-loads
7.879e+09 -2.6% 7.674e+09 perf-stat.ps.dTLB-stores
13082418 ± 2% -5.1% 12417659 perf-stat.ps.iTLB-load-misses
1912636 ± 23% +49.4% 2857819 ± 24% perf-stat.ps.iTLB-loads
6.802e+10 -2.6% 6.623e+10 perf-stat.ps.instructions
2.056e+13 -2.8% 1.999e+13 perf-stat.total.instructions



***************************************************************************************************
lkp-csl-2sp9: 88 threads Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/100%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp9/mmap1/will-it-scale/0x5003006

commit:
39f985c8f6 ("fs/cachefiles: Remove wait_bit_key layout dependency")
e5dbd33218 ("mm/writeback: Add wait_on_page_writeback_killable")

39f985c8f667c80a e5dbd33218bd8d87ab69f730ab9
---------------- ---------------------------
%stddev %change %stddev
\ | \
30112325 -4.7% 28693708 will-it-scale.88.processes
342185 -4.7% 326064 will-it-scale.per_process_ops
30112325 -4.7% 28693708 will-it-scale.workload
8800 ± 14% +130.8% 20313 ± 70% cpuidle.POLL.time
1884 ± 32% -45.5% 1026 ± 42% interrupts.CPU67.CAL:Function_call_interrupts
121.50 ± 12% -20.0% 97.17 ± 7% perf-sched.wait_and_delay.count.preempt_schedule_common.__cond_resched.down_write_killable.__vm_munmap.__x64_sys_munmap
78709 ± 60% -48.1% 40873 ±122% numa-meminfo.node0.Active
78545 ± 60% -48.2% 40686 ±122% numa-meminfo.node0.Active(anon)
19679 ± 60% -48.1% 10211 ±122% numa-vmstat.node0.nr_active_anon
19679 ± 60% -48.1% 10211 ±122% numa-vmstat.node0.nr_zone_active_anon
6.879e+10 -4.7% 6.556e+10 perf-stat.i.branch-instructions
0.28 -0.0 0.28 perf-stat.i.branch-miss-rate%
1.883e+08 -6.9% 1.753e+08 perf-stat.i.branch-misses
0.83 +5.7% 0.88 perf-stat.i.cpi
7.004e+10 -4.7% 6.678e+10 perf-stat.i.dTLB-loads
27999 ± 3% -10.2% 25154 ± 11% perf-stat.i.dTLB-store-misses
3.146e+10 -4.7% 2.999e+10 perf-stat.i.dTLB-stores
1.007e+08 -6.0% 94644724 perf-stat.i.iTLB-load-misses
2.842e+11 -4.7% 2.709e+11 perf-stat.i.instructions
2829 +1.4% 2870 perf-stat.i.instructions-per-iTLB-miss
1.21 -5.4% 1.14 perf-stat.i.ipc
1935 -4.7% 1844 perf-stat.i.metric.M/sec
0.27 -0.0 0.27 perf-stat.overall.branch-miss-rate%
0.83 +5.7% 0.88 perf-stat.overall.cpi
2822 +1.4% 2862 perf-stat.overall.instructions-per-iTLB-miss
1.21 -5.4% 1.14 perf-stat.overall.ipc
6.855e+10 -4.7% 6.534e+10 perf-stat.ps.branch-instructions
1.877e+08 -6.9% 1.748e+08 perf-stat.ps.branch-misses
6.98e+10 -4.7% 6.655e+10 perf-stat.ps.dTLB-loads
3.135e+10 -4.7% 2.989e+10 perf-stat.ps.dTLB-stores
1.003e+08 -6.0% 94321542 perf-stat.ps.iTLB-load-misses
2.832e+11 -4.7% 2.699e+11 perf-stat.ps.instructions
8.563e+13 -4.7% 8.161e+13 perf-stat.total.instructions
30.17 -1.3 28.83 perf-profile.calltrace.cycles-pp.__mmap
26.81 -1.2 25.57 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__mmap
25.89 -1.1 24.77 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
22.19 -1.1 21.08 perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
25.44 -1.1 24.35 perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
25.11 -1.1 24.02 perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.__mmap
17.98 -1.0 17.01 perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
5.83 -0.3 5.57 perf-profile.calltrace.cycles-pp.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
3.33 -0.2 3.08 perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
2.40 -0.1 2.26 perf-profile.calltrace.cycles-pp.vma_link.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
0.90 ± 4% -0.1 0.76 ± 4% perf-profile.calltrace.cycles-pp.perf_event_mmap_output.perf_iterate_sb.perf_event_mmap.mmap_region.do_mmap
3.35 -0.1 3.21 perf-profile.calltrace.cycles-pp.zap_pte_range.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
2.90 -0.1 2.76 perf-profile.calltrace.cycles-pp.rcu_all_qs.__cond_resched.unmap_page_range.unmap_vmas.unmap_region
0.87 ± 2% -0.1 0.74 ± 3% perf-profile.calltrace.cycles-pp.down_write_killable.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.76 -0.1 1.66 perf-profile.calltrace.cycles-pp.__entry_text_start.__munmap
1.78 -0.1 1.68 perf-profile.calltrace.cycles-pp.__vma_link_rb.vma_link.mmap_region.do_mmap.vm_mmap_pgoff
1.74 -0.1 1.66 perf-profile.calltrace.cycles-pp.__entry_text_start.__mmap
0.76 -0.1 0.70 perf-profile.calltrace.cycles-pp.security_vm_enough_memory_mm.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
0.58 -0.0 0.53 ± 2% perf-profile.calltrace.cycles-pp.cap_vm_enough_memory.security_vm_enough_memory_mm.mmap_region.do_mmap.vm_mmap_pgoff
0.64 -0.0 0.59 perf-profile.calltrace.cycles-pp.free_pgtables.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
0.79 -0.0 0.75 perf-profile.calltrace.cycles-pp.down_write_killable.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.95 -0.0 0.91 perf-profile.calltrace.cycles-pp.__vma_rb_erase.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
0.70 -0.0 0.68 perf-profile.calltrace.cycles-pp.vmacache_find.find_vma.__do_munmap.__vm_munmap.__x64_sys_munmap
0.84 +0.1 0.90 perf-profile.calltrace.cycles-pp.security_mmap_file.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.27 +0.1 1.34 perf-profile.calltrace.cycles-pp.vm_unmapped_area.arch_get_unmapped_area_topdown.get_unmapped_area.do_mmap.vm_mmap_pgoff
5.84 +0.2 6.08 perf-profile.calltrace.cycles-pp.__cond_resched.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
2.05 +0.4 2.43 perf-profile.calltrace.cycles-pp.find_vma.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
23.20 +0.6 23.76 perf-profile.calltrace.cycles-pp.___might_sleep.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
36.46 +0.9 37.31 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
9.76 +1.0 10.72 perf-profile.calltrace.cycles-pp.free_pgd_range.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
40.96 +1.0 41.96 perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
70.65 +1.3 71.93 perf-profile.calltrace.cycles-pp.__munmap
67.34 +1.4 68.76 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
66.47 +1.5 67.94 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
66.18 +1.5 67.67 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
65.70 +1.6 67.29 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
53.60 +1.7 55.34 perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
63.66 +1.8 65.49 perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
30.19 -1.4 28.83 perf-profile.children.cycles-pp.__mmap
22.27 -1.1 21.16 perf-profile.children.cycles-pp.do_mmap
25.48 -1.1 24.39 perf-profile.children.cycles-pp.ksys_mmap_pgoff
25.17 -1.1 24.08 perf-profile.children.cycles-pp.vm_mmap_pgoff
18.11 -1.0 17.16 perf-profile.children.cycles-pp.mmap_region
6.00 -0.3 5.73 perf-profile.children.cycles-pp.perf_event_mmap
3.39 -0.3 3.14 perf-profile.children.cycles-pp.perf_iterate_sb
1.73 -0.2 1.55 perf-profile.children.cycles-pp.down_write_killable
0.90 -0.2 0.74 perf-profile.children.cycles-pp.__might_sleep
0.94 ± 4% -0.1 0.79 ± 4% perf-profile.children.cycles-pp.perf_event_mmap_output
2.48 -0.1 2.34 perf-profile.children.cycles-pp.vma_link
3.41 -0.1 3.27 perf-profile.children.cycles-pp.zap_pte_range
2.26 -0.1 2.13 perf-profile.children.cycles-pp.__entry_text_start
1.13 -0.1 1.01 ± 2% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.58 ± 2% -0.1 0.47 ± 5% perf-profile.children.cycles-pp.tlb_gather_mmu
1.78 -0.1 1.68 perf-profile.children.cycles-pp.__vma_link_rb
2.16 ± 2% -0.1 2.07 perf-profile.children.cycles-pp.syscall_return_via_sysret
0.90 -0.1 0.83 ± 6% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.51 -0.1 0.44 perf-profile.children.cycles-pp.downgrade_write
0.85 -0.1 0.78 ± 5% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.80 -0.1 0.74 ± 5% perf-profile.children.cycles-pp.hrtimer_interrupt
0.81 -0.1 0.74 ± 5% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.78 -0.1 0.72 perf-profile.children.cycles-pp.security_vm_enough_memory_mm
0.43 ± 5% -0.1 0.38 ± 7% perf-profile.children.cycles-pp.lru_add_drain_cpu
0.60 -0.1 0.55 perf-profile.children.cycles-pp.cap_vm_enough_memory
0.68 -0.1 0.63 perf-profile.children.cycles-pp.free_pgtables
0.50 -0.0 0.45 ± 2% perf-profile.children.cycles-pp.vma_set_page_prot
0.34 ± 4% -0.0 0.29 ± 3% perf-profile.children.cycles-pp.up_read
0.15 ± 4% -0.0 0.11 ± 30% perf-profile.children.cycles-pp.ktime_get
0.33 ± 2% -0.0 0.29 ± 5% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
0.58 -0.0 0.54 ± 6% perf-profile.children.cycles-pp.tick_sched_timer
0.96 -0.0 0.92 perf-profile.children.cycles-pp.__vma_rb_erase
0.61 -0.0 0.57 perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.28 ± 2% -0.0 0.24 ± 4% perf-profile.children.cycles-pp.userfaultfd_unmap_complete
0.33 ± 2% -0.0 0.30 perf-profile.children.cycles-pp.unlink_anon_vmas
0.48 -0.0 0.45 perf-profile.children.cycles-pp.sync_mm_rss
0.74 -0.0 0.71 perf-profile.children.cycles-pp.vmacache_find
0.37 -0.0 0.34 ± 2% perf-profile.children.cycles-pp.up_write
0.40 ± 2% -0.0 0.38 perf-profile.children.cycles-pp.__vm_enough_memory
0.46 -0.0 0.44 perf-profile.children.cycles-pp.security_mmap_addr
0.25 ± 3% -0.0 0.23 ± 2% perf-profile.children.cycles-pp.unmap_single_vma
0.18 ± 2% -0.0 0.16 ± 6% perf-profile.children.cycles-pp.kfree
0.34 ± 2% -0.0 0.32 perf-profile.children.cycles-pp.userfaultfd_unmap_prep
0.17 -0.0 0.15 ± 3% perf-profile.children.cycles-pp.blocking_notifier_call_chain
0.08 -0.0 0.06 ± 7% perf-profile.children.cycles-pp.should_failslab
0.11 ± 4% -0.0 0.09 ± 4% perf-profile.children.cycles-pp.unlink_file_vma
0.19 -0.0 0.18 perf-profile.children.cycles-pp.rcu_read_unlock_strict
0.10 ± 23% +0.0 0.14 ± 7% perf-profile.children.cycles-pp.vm_area_free
0.06 +0.0 0.11 ± 3% perf-profile.children.cycles-pp.__x86_indirect_thunk_r9
2.22 +0.1 2.27 perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
0.26 ± 7% +0.1 0.31 ± 4% perf-profile.children.cycles-pp.strlen
1.28 +0.1 1.35 perf-profile.children.cycles-pp.vm_unmapped_area
0.27 ± 2% +0.1 0.36 ± 2% perf-profile.children.cycles-pp.vmacache_update
0.87 +0.1 0.98 perf-profile.children.cycles-pp.security_mmap_file
94.27 +0.2 94.46 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
21.07 +0.3 21.37 perf-profile.children.cycles-pp.___might_sleep
92.45 +0.4 92.80 perf-profile.children.cycles-pp.do_syscall_64
2.21 +0.4 2.59 perf-profile.children.cycles-pp.find_vma
9.79 +1.0 10.74 perf-profile.children.cycles-pp.free_pgd_range
41.00 +1.0 42.00 perf-profile.children.cycles-pp.unmap_vmas
39.84 +1.1 40.91 perf-profile.children.cycles-pp.unmap_page_range
70.67 +1.3 71.95 perf-profile.children.cycles-pp.__munmap
66.25 +1.5 67.75 perf-profile.children.cycles-pp.__x64_sys_munmap
65.75 +1.6 67.31 perf-profile.children.cycles-pp.__vm_munmap
53.70 +1.7 55.42 perf-profile.children.cycles-pp.unmap_region
63.84 +1.8 65.66 perf-profile.children.cycles-pp.__do_munmap
2.07 -0.2 1.89 perf-profile.self.cycles-pp.__do_munmap
0.99 -0.2 0.83 ± 2% perf-profile.self.cycles-pp.do_mmap
0.88 ± 4% -0.1 0.74 ± 4% perf-profile.self.cycles-pp.perf_event_mmap_output
0.81 -0.1 0.67 ± 2% perf-profile.self.cycles-pp.__might_sleep
3.10 -0.1 2.99 perf-profile.self.cycles-pp.__cond_resched
2.39 -0.1 2.29 perf-profile.self.cycles-pp.perf_iterate_sb
0.56 ± 2% -0.1 0.46 ± 5% perf-profile.self.cycles-pp.tlb_gather_mmu
1.75 -0.1 1.66 perf-profile.self.cycles-pp.__vma_link_rb
2.46 -0.1 2.37 perf-profile.self.cycles-pp.zap_pte_range
2.17 -0.1 2.09 perf-profile.self.cycles-pp.perf_event_mmap
2.15 ± 2% -0.1 2.06 perf-profile.self.cycles-pp.syscall_return_via_sysret
0.54 -0.1 0.46 ± 4% perf-profile.self.cycles-pp.__vm_munmap
1.01 -0.1 0.94 perf-profile.self.cycles-pp.__entry_text_start
0.48 -0.1 0.42 perf-profile.self.cycles-pp.downgrade_write
0.42 ± 6% -0.1 0.36 ± 7% perf-profile.self.cycles-pp.lru_add_drain_cpu
0.25 ± 3% -0.1 0.19 ± 2% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.71 ± 2% -0.1 0.66 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.93 -0.0 0.88 perf-profile.self.cycles-pp.vm_area_alloc
0.36 ± 2% -0.0 0.31 ± 2% perf-profile.self.cycles-pp.cap_vm_enough_memory
0.68 -0.0 0.63 perf-profile.self.cycles-pp.down_write_killable
0.14 ± 4% -0.0 0.10 ± 30% perf-profile.self.cycles-pp.ktime_get
0.49 -0.0 0.45 perf-profile.self.cycles-pp.unmap_vmas
0.27 ± 3% -0.0 0.23 ± 5% perf-profile.self.cycles-pp.userfaultfd_unmap_complete
0.93 -0.0 0.90 perf-profile.self.cycles-pp.__vma_rb_erase
0.33 ± 2% -0.0 0.29 ± 3% perf-profile.self.cycles-pp.__x64_sys_munmap
0.46 ± 2% -0.0 0.43 perf-profile.self.cycles-pp.sync_mm_rss
0.43 -0.0 0.40 perf-profile.self.cycles-pp.exit_to_user_mode_prepare
0.31 -0.0 0.28 ± 2% perf-profile.self.cycles-pp.unlink_anon_vmas
0.39 ± 2% -0.0 0.36 ± 2% perf-profile.self.cycles-pp.unmap_region
0.64 -0.0 0.61 perf-profile.self.cycles-pp.vmacache_find
0.54 -0.0 0.52 ± 2% perf-profile.self.cycles-pp.get_unmapped_area
0.30 ± 2% -0.0 0.27 ± 5% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
0.35 -0.0 0.33 perf-profile.self.cycles-pp.up_write
0.16 ± 5% -0.0 0.14 ± 4% perf-profile.self.cycles-pp.__vm_enough_memory
0.29 ± 2% -0.0 0.27 ± 2% perf-profile.self.cycles-pp.up_read
0.17 ± 3% -0.0 0.15 ± 3% perf-profile.self.cycles-pp.blocking_notifier_call_chain
0.17 ± 4% -0.0 0.15 ± 2% perf-profile.self.cycles-pp.do_syscall_64
0.20 ± 3% -0.0 0.18 ± 2% perf-profile.self.cycles-pp.syscall_enter_from_user_mode
0.07 ± 6% -0.0 0.06 perf-profile.self.cycles-pp.ktime_get_update_offsets_now
0.20 ± 3% +0.0 0.21 ± 2% perf-profile.self.cycles-pp.cap_mmap_file
0.82 +0.0 0.85 perf-profile.self.cycles-pp.arch_get_unmapped_area_topdown
0.17 ± 2% +0.0 0.21 ± 4% perf-profile.self.cycles-pp.lru_add_drain
1.24 +0.0 1.28 perf-profile.self.cycles-pp.vm_unmapped_area
0.05 ± 75% +0.1 0.11 ± 9% perf-profile.self.cycles-pp.vm_area_free
0.00 +0.1 0.07 perf-profile.self.cycles-pp.arch_vma_name
0.24 ± 3% +0.1 0.32 ± 2% perf-profile.self.cycles-pp.vmacache_update
0.41 ± 2% +0.1 0.50 perf-profile.self.cycles-pp.security_mmap_file
2.32 +0.1 2.46 perf-profile.self.cycles-pp.rcu_all_qs
1.24 +0.3 1.57 perf-profile.self.cycles-pp.find_vma
9.70 +1.0 10.65 perf-profile.self.cycles-pp.free_pgd_range
14.97 +1.0 15.94 perf-profile.self.cycles-pp.unmap_page_range





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (49.97 kB)
config-5.12.0-rc4-00002-ge5dbd33218bd (175.56 kB)
job-script (7.86 kB)
job.yaml (5.16 kB)
reproduce (347.00 B)

2021-04-23 12:52:27

by Matthew Wilcox

Subject: Re: [mm/writeback] e5dbd33218: will-it-scale.per_process_ops -3.8% regression

On Fri, Apr 23, 2021 at 01:46:01PM +0800, kernel test robot wrote:
> FYI, we noticed a -3.8% regression of will-it-scale.per_process_ops due to commit:
> commit: e5dbd33218bd8d87ab69f730ab90aed5fab7eb26 ("mm/writeback: Add wait_on_page_writeback_killable")

That commit just adds a function. It doesn't add any callers. It must
just be moving something around ...

> 39f985c8f667c80a e5dbd33218bd8d87ab69f730ab9
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 9359770 -3.8% 9001769 will-it-scale.16.processes
> 584985 -3.8% 562610 will-it-scale.per_process_ops
> 9359770 -3.8% 9001769 will-it-scale.workload
> 15996 -1.2% 15811 proc-vmstat.nr_kernel_stack
> 23577 ± 10% +18.5% 27937 ± 7% softirqs.CPU48.SCHED
> 5183 ± 41% +47.2% 7630 ± 7% interrupts.CPU1.NMI:Non-maskable_interrupts
> 5183 ± 41% +47.2% 7630 ± 7% interrupts.CPU1.PMI:Performance_monitoring_interrupts
> 54.33 ± 12% +18.4% 64.33 ± 7% perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll
> 153.34 ± 24% -45.9% 83.00 ± 25% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
> 153.33 ± 24% -45.9% 82.99 ± 25% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
> 2.424e+10 -3.8% 2.332e+10 perf-stat.i.branch-instructions
> 0.47 +3.7% 0.48 perf-stat.i.cpi
> 2.529e+10 -4.0% 2.428e+10 perf-stat.i.dTLB-loads
> 1.15e+10 -3.8% 1.106e+10 perf-stat.i.dTLB-stores
> 54249733 -4.8% 51627939 perf-stat.i.iTLB-load-misses
> 1.004e+11 -3.8% 9.661e+10 perf-stat.i.instructions
> 2.15 -3.6% 2.07 perf-stat.i.ipc
> 693.66 -3.9% 666.70 perf-stat.i.metric.M/sec
> 0.46 +3.7% 0.48 perf-stat.overall.cpi
> 2.15 -3.6% 2.08 perf-stat.overall.ipc
> 2.416e+10 -3.8% 2.324e+10 perf-stat.ps.branch-instructions
> 2.52e+10 -4.0% 2.419e+10 perf-stat.ps.dTLB-loads
> 1.146e+10 -3.8% 1.102e+10 perf-stat.ps.dTLB-stores
> 54065825 -4.8% 51454019 perf-stat.ps.iTLB-load-misses
> 1.001e+11 -3.8% 9.628e+10 perf-stat.ps.instructions
> 3.025e+13 -3.9% 2.908e+13 perf-stat.total.instructions
> 0.89 ± 14% -0.1 0.77 ± 11% perf-profile.calltrace.cycles-pp.atime_needs_update.touch_atime.shmem_mmap.mmap_region.do_mmap
> 0.14 ± 13% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.common_mmap
> 0.61 ± 12% -0.1 0.52 ± 12% perf-profile.children.cycles-pp.common_file_perm
> 0.21 ± 8% -0.0 0.17 ± 11% perf-profile.children.cycles-pp.vma_set_page_prot
> 0.12 ± 8% -0.0 0.09 ± 12% perf-profile.children.cycles-pp.blocking_notifier_call_chain
> 0.12 ± 14% -0.0 0.09 ± 15% perf-profile.children.cycles-pp.get_mmap_base
> 0.09 ± 8% -0.0 0.07 ± 11% perf-profile.children.cycles-pp.vm_pgprot_modify
> 0.13 ± 15% +0.1 0.19 ± 8% perf-profile.children.cycles-pp.cap_capable
> 0.03 ±102% +0.1 0.12 ± 12% perf-profile.children.cycles-pp.munmap@plt
> 0.14 ± 13% +0.1 0.24 ± 6% perf-profile.children.cycles-pp.testcase
> 0.33 ± 10% -0.1 0.23 ± 10% perf-profile.self.cycles-pp.cap_vm_enough_memory
> 0.13 ± 11% -0.1 0.03 ±100% perf-profile.self.cycles-pp.common_mmap
> 0.48 ± 12% -0.1 0.41 ± 12% perf-profile.self.cycles-pp.common_file_perm
> 0.49 ± 12% -0.1 0.43 ± 13% perf-profile.self.cycles-pp.vm_area_alloc
> 0.12 ± 8% -0.0 0.09 ± 12% perf-profile.self.cycles-pp.blocking_notifier_call_chain
> 0.12 ± 13% -0.0 0.09 ± 14% perf-profile.self.cycles-pp.get_mmap_base
> 0.11 ± 8% +0.0 0.16 ± 10% perf-profile.self.cycles-pp.__x64_sys_munmap
> 0.11 ± 14% +0.1 0.18 ± 8% perf-profile.self.cycles-pp.cap_capable
> 0.12 ± 11% +0.1 0.20 ± 6% perf-profile.self.cycles-pp.testcase
> 0.01 ±223% +0.1 0.11 ± 13% perf-profile.self.cycles-pp.munmap@plt

I'm struggling to see anything in that that says anything other than
"we did 3-4% less work". Maybe someone else has something useful to
say about it?
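
The "3-4% less work" reading can be checked directly against the quoted figures. The snippet below is only a sketch of that arithmetic: the numbers are copied from the report above, `pct_change` is a helper name introduced here, and treating per_process_ops as workload divided by nr_task is an assumption that happens to match the reported values. Because the inputs are already rounded, a result may differ from the report's %change column by a tenth of a percent.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0

# Figures copied from the quoted report (parent commit vs. e5dbd33218).
workload_base, workload_new = 9359770, 9001769   # will-it-scale.workload
nr_task = 16                                     # test parameter

# Assumption: per_process_ops is the workload split evenly across tasks.
ops_base = workload_base / nr_task
ops_new = workload_new / nr_task

instr_base, instr_new = 1.004e11, 9.661e10       # perf-stat.i.instructions
ipc_base, ipc_new = 2.15, 2.07                   # perf-stat.i.ipc

print(f"per_process_ops: {ops_base:.0f} -> {ops_new:.0f} "
      f"({pct_change(ops_base, ops_new):+.1f}%)")
print(f"instructions:    {pct_change(instr_base, instr_new):+.1f}%")
print(f"ipc:             {pct_change(ipc_base, ipc_new):+.1f}%")
# CPI is the reciprocal of IPC, so a lower IPC shows up as a higher CPI.
print(f"cpi from ipc:    {1 / ipc_base:.2f} -> {1 / ipc_new:.2f}")
```

Throughput, retired instructions and IPC all move by roughly the same amount, which is consistent with the observation that the profile mostly shows less work being done rather than a single new hot spot.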

2021-04-28 06:16:40

by Xing Zhengjun

Subject: Re: [LKP] Re: [mm/writeback] e5dbd33218: will-it-scale.per_process_ops -3.8% regression

Hi Matthew,

On 4/23/2021 8:47 PM, Matthew Wilcox wrote:
> On Fri, Apr 23, 2021 at 01:46:01PM +0800, kernel test robot wrote:
>> FYI, we noticed a -3.8% regression of will-it-scale.per_process_ops due to commit:
>> commit: e5dbd33218bd8d87ab69f730ab90aed5fab7eb26 ("mm/writeback: Add wait_on_page_writeback_killable")
> That commit just adds a function. It doesn't add any callers. It must
> just be moving something around ...

Micro-benchmarks like will-it-scale are sensitive to alignment
(text/data), so I applied the data-alignment debug patch and re-tested;
the regression was reduced to -1.5%.
=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/nr_task/mode/test/cpufreq_governor/ucode:
lkp-csl-2sp9/will-it-scale/debian-10.4-x86_64-20200603.cgz/x86_64-rhel-8.3-ge5dbd33218bd-no-dynamic/gcc-9/16/process/mmap2/performance/0x5003006

commit:
  a142a3781e3dc0c03a48688cac619c2684eed18f (fs/cachefiles: Remove wait_bit_key layout dependency)
  86460bf788cb360a14811fadb3f94f9765ba5a23 (mm/writeback: Add wait_on_page_writeback_killable)

a142a3781e3dc0c0 86460bf788cb360a14811fadb3f
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   9089952            -1.5%    8953838 will-it-scale.16.processes
    568121            -1.5%     559614 will-it-scale.per_process_ops
   9089952            -1.5%    8953838        will-it-scale.workload
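
To illustrate how a commit that only adds an uncalled function could still shift performance, here is a toy model of code placement. It is a sketch under stated assumptions, not a description of what the kernel linker did in this case: the function names and sizes are hypothetical, functions are simply placed back to back with 16-byte start alignment, and the cache line is assumed to be 64 bytes.

```python
CACHE_LINE = 64  # bytes; assumed cache-line size

def layout(functions, align=16, base=0x1000):
    """Place functions back to back, rounding each start up to `align` bytes."""
    addr, placed = base, {}
    for name, size in functions:
        addr = (addr + align - 1) // align * align
        placed[name] = addr
        addr += size
    return placed

def lines_spanned(start, size):
    """Number of cache lines touched by `size` bytes starting at `start`."""
    return (start + size - 1) // CACHE_LINE - start // CACHE_LINE + 1

# Hypothetical function names and sizes, for illustration only.
before = [("func_a", 192), ("hot_loop", 120), ("func_c", 90)]
after = [("func_a", 192), ("new_unused_helper", 40),
         ("hot_loop", 120), ("func_c", 90)]

for label, funcs in (("before", before), ("after", after)):
    start = layout(funcs)["hot_loop"]
    size = dict(funcs)["hot_loop"]
    print(f"{label}: hot_loop at {start:#x}, "
          f"offset in cache line {start % CACHE_LINE}, "
          f"spans {lines_spanned(start, size)} cache lines")
```

In this model the unchanged `hot_loop` moves from a cache-line-aligned address to an offset inside a line and touches one more cache line, even though the added helper is never called. With other sizes the shift could just as easily improve the layout, which is why forcing alignment is a reasonable way to separate a layout effect from a real code regression.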

>> 39f985c8f667c80a e5dbd33218bd8d87ab69f730ab9
>> ---------------- ---------------------------
>> %stddev %change %stddev
>> \ | \
>> 9359770 -3.8% 9001769 will-it-scale.16.processes
>> 584985 -3.8% 562610 will-it-scale.per_process_ops
>> 9359770 -3.8% 9001769 will-it-scale.workload
>> 15996 -1.2% 15811 proc-vmstat.nr_kernel_stack
>> 23577 ± 10% +18.5% 27937 ± 7% softirqs.CPU48.SCHED
>> 5183 ± 41% +47.2% 7630 ± 7% interrupts.CPU1.NMI:Non-maskable_interrupts
>> 5183 ± 41% +47.2% 7630 ± 7% interrupts.CPU1.PMI:Performance_monitoring_interrupts
>> 54.33 ± 12% +18.4% 64.33 ± 7% perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_sys_poll
>> 153.34 ± 24% -45.9% 83.00 ± 25% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
>> 153.33 ± 24% -45.9% 82.99 ± 25% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
>> 2.424e+10 -3.8% 2.332e+10 perf-stat.i.branch-instructions
>> 0.47 +3.7% 0.48 perf-stat.i.cpi
>> 2.529e+10 -4.0% 2.428e+10 perf-stat.i.dTLB-loads
>> 1.15e+10 -3.8% 1.106e+10 perf-stat.i.dTLB-stores
>> 54249733 -4.8% 51627939 perf-stat.i.iTLB-load-misses
>> 1.004e+11 -3.8% 9.661e+10 perf-stat.i.instructions
>> 2.15 -3.6% 2.07 perf-stat.i.ipc
>> 693.66 -3.9% 666.70 perf-stat.i.metric.M/sec
>> 0.46 +3.7% 0.48 perf-stat.overall.cpi
>> 2.15 -3.6% 2.08 perf-stat.overall.ipc
>> 2.416e+10 -3.8% 2.324e+10 perf-stat.ps.branch-instructions
>> 2.52e+10 -4.0% 2.419e+10 perf-stat.ps.dTLB-loads
>> 1.146e+10 -3.8% 1.102e+10 perf-stat.ps.dTLB-stores
>> 54065825 -4.8% 51454019 perf-stat.ps.iTLB-load-misses
>> 1.001e+11 -3.8% 9.628e+10 perf-stat.ps.instructions
>> 3.025e+13 -3.9% 2.908e+13 perf-stat.total.instructions
>> 0.89 ± 14% -0.1 0.77 ± 11% perf-profile.calltrace.cycles-pp.atime_needs_update.touch_atime.shmem_mmap.mmap_region.do_mmap
>> 0.14 ± 13% -0.1 0.04 ± 71% perf-profile.children.cycles-pp.common_mmap
>> 0.61 ± 12% -0.1 0.52 ± 12% perf-profile.children.cycles-pp.common_file_perm
>> 0.21 ± 8% -0.0 0.17 ± 11% perf-profile.children.cycles-pp.vma_set_page_prot
>> 0.12 ± 8% -0.0 0.09 ± 12% perf-profile.children.cycles-pp.blocking_notifier_call_chain
>> 0.12 ± 14% -0.0 0.09 ± 15% perf-profile.children.cycles-pp.get_mmap_base
>> 0.09 ± 8% -0.0 0.07 ± 11% perf-profile.children.cycles-pp.vm_pgprot_modify
>> 0.13 ± 15% +0.1 0.19 ± 8% perf-profile.children.cycles-pp.cap_capable
>> 0.03 ±102% +0.1 0.12 ± 12% perf-profile.children.cycles-pp.munmap@plt
>> 0.14 ± 13% +0.1 0.24 ± 6% perf-profile.children.cycles-pp.testcase
>> 0.33 ± 10% -0.1 0.23 ± 10% perf-profile.self.cycles-pp.cap_vm_enough_memory
>> 0.13 ± 11% -0.1 0.03 ±100% perf-profile.self.cycles-pp.common_mmap
>> 0.48 ± 12% -0.1 0.41 ± 12% perf-profile.self.cycles-pp.common_file_perm
>> 0.49 ± 12% -0.1 0.43 ± 13% perf-profile.self.cycles-pp.vm_area_alloc
>> 0.12 ± 8% -0.0 0.09 ± 12% perf-profile.self.cycles-pp.blocking_notifier_call_chain
>> 0.12 ± 13% -0.0 0.09 ± 14% perf-profile.self.cycles-pp.get_mmap_base
>> 0.11 ± 8% +0.0 0.16 ± 10% perf-profile.self.cycles-pp.__x64_sys_munmap
>> 0.11 ± 14% +0.1 0.18 ± 8% perf-profile.self.cycles-pp.cap_capable
>> 0.12 ± 11% +0.1 0.20 ± 6% perf-profile.self.cycles-pp.testcase
>> 0.01 ±223% +0.1 0.11 ± 13% perf-profile.self.cycles-pp.munmap@plt
> I'm struggling to see anything in that that says anything other than
> "we did 3-4% less work". Maybe someone else has something useful to
> say about it?

--
Zhengjun Xing