2021-10-05 14:10:10

by kernel test robot

Subject: [mm] 243418e392: will-it-scale.per_process_ops 3.0% improvement



Greetings,

FYI, we noticed a 3.0% improvement of will-it-scale.per_process_ops due to commit:


commit: 243418e3925d5b5b0657ae54c322d43035e97eed ("mm: fs: invalidate bh_lrus for only cold path")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: will-it-scale
on test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with the following parameters:

nr_task: 100%
mode: process
test: brk1
cpufreq_governor: performance
ucode: 0x5003006

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based version of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale





Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/100%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap2/brk1/will-it-scale/0x5003006

commit:
b7cd9fa5cc ("lib/zlib_inflate/inffast: check config in C to avoid unused function warning")
243418e392 ("mm: fs: invalidate bh_lrus for only cold path")

b7cd9fa5ccc392d9 243418e3925d5b5b0657ae54c32
---------------- ---------------------------
%stddev %change %stddev
\ | \
2.068e+08 +3.0% 2.131e+08 will-it-scale.192.processes
1077243 +3.0% 1109687 will-it-scale.per_process_ops
2.068e+08 +3.0% 2.131e+08 will-it-scale.workload
759.50 ± 67% +596.6% 5290 ± 55% interrupts.CPU28.RES:Rescheduling_interrupts
12703 -8.6% 11607 perf-sched.wait_and_delay.count.__sched_text_start.__sched_text_start.preempt_schedule_common.__cond_resched.unmap_vmas
28481 ± 6% +11.7% 31799 ± 9% softirqs.CPU28.RCU
32901 ± 5% +10.3% 36284 ± 3% softirqs.CPU54.RCU
0.18 ± 24% -0.0 0.15 ± 2% perf-stat.i.branch-miss-rate%
1.083e+11 ± 3% +2.4% 1.108e+11 perf-stat.i.dTLB-loads
0.16 ± 3% -0.0 0.15 ± 2% perf-stat.overall.branch-miss-rate%
0.00 +0.0 0.00 perf-stat.overall.dTLB-store-miss-rate%
2813 ± 6% +11.5% 3137 ± 2% perf-stat.overall.instructions-per-iTLB-miss
565816 -3.3% 547233 perf-stat.overall.path-length
1.079e+11 ± 3% +2.4% 1.105e+11 perf-stat.ps.dTLB-loads
17.76 -2.3 15.48 perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.73 -2.0 0.76 ± 5% perf-profile.calltrace.cycles-pp.lru_add_drain.unmap_region.__do_munmap.__x64_sys_brk.do_syscall_64
35.83 -1.6 34.23 perf-profile.calltrace.cycles-pp.__do_munmap.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
78.46 -0.6 77.82 perf-profile.calltrace.cycles-pp.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
81.63 -0.5 81.10 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
83.66 -0.5 83.20 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.brk
7.49 -0.4 7.10 perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__x64_sys_brk
9.06 -0.4 8.68 perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__x64_sys_brk.do_syscall_64
4.02 -0.1 3.90 perf-profile.calltrace.cycles-pp.zap_pte_range.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
1.68 -0.1 1.61 perf-profile.calltrace.cycles-pp.tlb_gather_mmu.unmap_region.__do_munmap.__x64_sys_brk.do_syscall_64
1.08 +0.0 1.11 perf-profile.calltrace.cycles-pp.__vma_rb_erase.__do_munmap.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.77 +0.0 0.80 perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
1.13 +0.0 1.16 perf-profile.calltrace.cycles-pp.up_read.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
1.37 +0.0 1.42 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
0.61 +0.1 0.66 perf-profile.calltrace.cycles-pp.sync_mm_rss.zap_pte_range.unmap_page_range.unmap_vmas.unmap_region
1.27 +0.1 1.34 perf-profile.calltrace.cycles-pp.vmacache_find.find_vma.__do_munmap.__x64_sys_brk.do_syscall_64
1.84 +0.1 1.93 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.__do_munmap.__x64_sys_brk.do_syscall_64
2.87 +0.1 2.98 perf-profile.calltrace.cycles-pp.down_write_killable.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
5.98 +0.2 6.22 perf-profile.calltrace.cycles-pp.perf_iterate_sb.perf_event_mmap.do_brk_flags.__x64_sys_brk.do_syscall_64
10.79 +0.3 11.09 perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
11.42 +0.3 11.74 perf-profile.calltrace.cycles-pp.__entry_text_start.brk
3.99 +0.3 4.30 perf-profile.calltrace.cycles-pp.find_vma.__do_munmap.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
32.96 +0.6 33.61 perf-profile.calltrace.cycles-pp.do_brk_flags.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
18.01 -2.3 15.74 perf-profile.children.cycles-pp.unmap_region
2.79 -2.0 0.81 ± 5% perf-profile.children.cycles-pp.lru_add_drain
36.15 -1.6 34.55 perf-profile.children.cycles-pp.__do_munmap
78.72 -0.6 78.09 perf-profile.children.cycles-pp.__x64_sys_brk
81.86 -0.5 81.34 perf-profile.children.cycles-pp.do_syscall_64
83.91 -0.5 83.46 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
7.59 -0.4 7.16 perf-profile.children.cycles-pp.unmap_page_range
9.11 -0.4 8.73 perf-profile.children.cycles-pp.unmap_vmas
4.12 -0.1 4.01 perf-profile.children.cycles-pp.zap_pte_range
1.68 -0.1 1.61 perf-profile.children.cycles-pp.tlb_gather_mmu
1.13 +0.0 1.16 perf-profile.children.cycles-pp.up_read
0.78 +0.0 0.81 perf-profile.children.cycles-pp.syscall_enter_from_user_mode
0.46 +0.0 0.51 perf-profile.children.cycles-pp.tlb_flush_mmu
0.61 +0.1 0.66 perf-profile.children.cycles-pp.sync_mm_rss
1.63 +0.1 1.69 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
1.48 +0.1 1.56 perf-profile.children.cycles-pp.vmacache_find
1.94 +0.1 2.03 perf-profile.children.cycles-pp.tlb_finish_mmu
3.07 +0.1 3.19 perf-profile.children.cycles-pp.down_write_killable
7.37 +0.2 7.58 perf-profile.children.cycles-pp.__entry_text_start
6.18 +0.2 6.42 perf-profile.children.cycles-pp.perf_iterate_sb
11.14 +0.3 11.43 perf-profile.children.cycles-pp.perf_event_mmap
5.32 +0.4 5.68 perf-profile.children.cycles-pp.find_vma
33.18 +0.7 33.86 perf-profile.children.cycles-pp.do_brk_flags
2.85 -0.3 2.50 perf-profile.self.cycles-pp.unmap_page_range
2.26 -0.1 2.13 perf-profile.self.cycles-pp.zap_pte_range
1.62 -0.1 1.56 perf-profile.self.cycles-pp.tlb_gather_mmu
0.64 +0.0 0.67 perf-profile.self.cycles-pp.exit_to_user_mode_prepare
0.72 +0.0 0.75 perf-profile.self.cycles-pp.syscall_enter_from_user_mode
1.07 ± 2% +0.0 1.11 perf-profile.self.cycles-pp.up_read
1.47 +0.0 1.51 perf-profile.self.cycles-pp.tlb_finish_mmu
1.43 +0.0 1.47 perf-profile.self.cycles-pp.downgrade_write
0.56 +0.0 0.61 perf-profile.self.cycles-pp.sync_mm_rss
0.28 ± 3% +0.0 0.33 ± 2% perf-profile.self.cycles-pp.tlb_flush_mmu
2.09 +0.1 2.14 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
2.25 +0.1 2.31 perf-profile.self.cycles-pp.vm_area_alloc
2.26 +0.1 2.34 perf-profile.self.cycles-pp.__x64_sys_brk
3.46 +0.1 3.53 perf-profile.self.cycles-pp.do_brk_flags
1.32 +0.1 1.40 perf-profile.self.cycles-pp.vmacache_find
2.53 +0.1 2.62 perf-profile.self.cycles-pp.down_write_killable
3.32 +0.1 3.42 perf-profile.self.cycles-pp.__entry_text_start
3.21 +0.1 3.33 ± 2% perf-profile.self.cycles-pp.kmem_cache_free
2.87 ± 2% +0.1 3.00 perf-profile.self.cycles-pp.kmem_cache_alloc
4.05 +0.1 4.20 perf-profile.self.cycles-pp.__do_munmap
5.21 +0.2 5.37 perf-profile.self.cycles-pp.brk
4.16 +0.2 4.35 ± 2% perf-profile.self.cycles-pp.perf_iterate_sb
3.41 +0.3 3.67 perf-profile.self.cycles-pp.find_vma
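
As a sanity check on the comparison table above, the reported changes can be recomputed from the raw columns. The sketch below is not part of the lkp tooling; the helper name `pct_change` and the variable names are made up for illustration. It assumes that the `%change` column is the relative difference between the two commits, that the perf-profile rows (already percentages of cycles) report an absolute difference in percentage points, and that `will-it-scale.192.processes` is `per_process_ops` multiplied by the 192 processes.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


NR_PROCESSES = 192  # taken from the will-it-scale.192.processes metric name

# (base commit, patched commit) values copied from the comparison table
per_process_ops = (1077243, 1109687)
path_length = (565816, 547233)
lru_add_drain_children = (2.79, 0.81)  # perf-profile rows are already in percent

# Relative changes, rounded to one decimal as in the report
ops_change = round(pct_change(*per_process_ops), 1)
path_change = round(pct_change(*path_length), 1)

# Assumption: total ops = per-process ops * number of processes
totals = tuple(f"{ops * NR_PROCESSES:.3e}" for ops in per_process_ops)

# perf-profile delta is a plain difference in percentage points
drain_delta = round(lru_add_drain_children[1] - lru_add_drain_children[0], 1)

print(ops_change, path_change, totals, drain_delta)
```

Under these assumptions the recomputed values agree with the table: +3.0% for per_process_ops, -3.3% for path-length, totals of 2.068e+08 and 2.131e+08, and a -2.0 point drop in lru_add_drain.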



will-it-scale.192.processes

[ASCII plot, layout lost in extraction: y-axis from 2.05e+08 to 2.14e+08; samples marked O lie at roughly 2.11e+08 to 2.14e+08, samples marked + at roughly 2.05e+08 to 2.1e+08]


will-it-scale.per_process_ops

[ASCII plot, layout lost in extraction: y-axis from 1.065e+06 to 1.115e+06; samples marked O lie at roughly 1.1e+06 to 1.11e+06, samples marked + at roughly 1.07e+06 to 1.09e+06]


will-it-scale.workload

[ASCII plot, layout lost in extraction: y-axis from 2.05e+08 to 2.14e+08; samples marked O lie at roughly 2.11e+08 to 2.14e+08, samples marked + at roughly 2.05e+08 to 2.1e+08]


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (15.94 kB)
config-5.15.0-rc2-00169-g243418e3925d (171.68 kB)
job-script (7.89 kB)
job.yaml (5.30 kB)
reproduce (347.00 B)