Greetings,
FYI, we noticed a 3.5% improvement in will-it-scale.per_process_ops due to commit:
commit: a10787e6d58c24b51e91c19c6d16c5da89fcaa4b ("bpf: Enable task local storage for tracing programs")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
in testcase: will-it-scale
on test machine: 88 threads Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
with following parameters:
nr_task: 16
mode: process
test: mmap2
cpufreq_governor: performance
ucode: 0x5003006
test-description: Will It Scale takes a testcase and runs it from 1 through n parallel copies to see whether the testcase scales. It builds both a process-based and a thread-based version of each test in order to expose any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
Details are below:
-------------------------------------------------------------------------------------------------->
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached to this email
bin/lkp split-job --compatible job.yaml
bin/lkp run compatible-job.yaml
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-9/performance/x86_64-rhel-8.3/process/16/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp9/mmap2/will-it-scale/0x5003006
commit:
9c8f21e6f8 ("xsk: Build skb by page (aka generic zerocopy xmit)")
a10787e6d5 ("bpf: Enable task local storage for tracing programs")
9c8f21e6f8856a96 a10787e6d58c24b51e91c19c6d1
---------------- ---------------------------
%stddev %change %stddev
\ | \
8990002 +3.5% 9304107 will-it-scale.16.processes
561874 +3.5% 581506 will-it-scale.per_process_ops
8990002 +3.5% 9304107 will-it-scale.workload
112185 ± 23% +46.6% 164508 ± 22% numa-numastat.node0.local_node
63.33 ± 93% -80.8% 12.17 ±130% numa-vmstat.node0.nr_inactive_file
63.33 ± 93% -80.8% 12.17 ±130% numa-vmstat.node0.nr_zone_inactive_file
14212 ± 23% +41.7% 20144 ± 14% softirqs.CPU15.SCHED
30141 ± 13% -22.5% 23370 ± 14% softirqs.CPU59.SCHED
66.17 ± 88% -90.7% 6.17 ± 48% interrupts.CPU60.RES:Rescheduling_interrupts
500.00 +86.1% 930.33 ± 60% interrupts.CPU69.CAL:Function_call_interrupts
396.17 ± 6% -18.8% 321.50 ± 21% interrupts.CPU87.NMI:Non-maskable_interrupts
396.17 ± 6% -18.8% 321.50 ± 21% interrupts.CPU87.PMI:Performance_monitoring_interrupts
5.45 ± 46% -98.5% 0.08 ± 73% perf-sched.wait_and_delay.max.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.[unknown]
176.51 ± 36% -61.2% 68.51 ± 77% perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
5.45 ± 46% -98.5% 0.08 ± 73% perf-sched.wait_time.max.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.[unknown]
176.50 ± 36% -61.2% 68.50 ± 77% perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
2.304e+10 +3.4% 2.383e+10 perf-stat.i.branch-instructions
72536156 +4.1% 75492267 perf-stat.i.branch-misses
0.48 -3.3% 0.47 perf-stat.i.cpi
0.00 ± 15% -0.0 0.00 ± 9% perf-stat.i.dTLB-load-miss-rate%
2.404e+10 +3.4% 2.487e+10 perf-stat.i.dTLB-loads
1.096e+10 +3.4% 1.133e+10 perf-stat.i.dTLB-stores
47654226 +12.8% 53744349 perf-stat.i.iTLB-load-misses
9.562e+10 +3.4% 9.889e+10 perf-stat.i.instructions
2015 -8.4% 1847 perf-stat.i.instructions-per-iTLB-miss
2.06 +3.5% 2.14 perf-stat.i.ipc
659.67 +3.4% 682.32 perf-stat.i.metric.M/sec
0.48 -3.4% 0.47 perf-stat.overall.cpi
0.00 ± 18% -0.0 0.00 ± 14% perf-stat.overall.dTLB-load-miss-rate%
2006 -8.3% 1840 perf-stat.overall.instructions-per-iTLB-miss
2.07 +3.5% 2.14 perf-stat.overall.ipc
2.297e+10 +3.4% 2.375e+10 perf-stat.ps.branch-instructions
72285805 +4.1% 75236431 perf-stat.ps.branch-misses
2.396e+10 +3.4% 2.479e+10 perf-stat.ps.dTLB-loads
1.092e+10 +3.4% 1.13e+10 perf-stat.ps.dTLB-stores
47489125 +12.8% 53563329 perf-stat.ps.iTLB-load-misses
9.529e+10 +3.4% 9.856e+10 perf-stat.ps.instructions
2.876e+13 +3.5% 2.976e+13 perf-stat.total.instructions
44.75 -7.7 37.01 ± 11% perf-profile.calltrace.cycles-pp.__munmap
42.13 -7.2 34.95 ± 11% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
41.64 -7.1 34.53 ± 11% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
41.21 -7.1 34.11 ± 11% perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
41.45 -7.1 34.36 ± 11% perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
39.74 -6.9 32.83 ± 11% perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
33.92 -6.2 27.75 ± 11% perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
25.32 -5.7 19.64 ± 11% perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
24.74 -5.7 19.08 ± 11% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
10.59 -3.7 6.89 ± 11% perf-profile.calltrace.cycles-pp.___might_sleep.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
1.60 -0.5 1.06 ± 32% perf-profile.calltrace.cycles-pp.__entry_text_start.__mmap
2.94 -0.4 2.56 ± 10% perf-profile.calltrace.cycles-pp.d_path.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
2.85 ± 2% -0.4 2.47 ± 11% perf-profile.calltrace.cycles-pp.get_unmapped_area.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
0.66 ± 6% -0.4 0.29 ±101% perf-profile.calltrace.cycles-pp.strlen.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
2.39 ± 3% -0.3 2.10 ± 11% perf-profile.calltrace.cycles-pp.kmem_cache_alloc.vm_area_alloc.mmap_region.do_mmap.vm_mmap_pgoff
1.30 ± 3% -0.2 1.08 ± 11% perf-profile.calltrace.cycles-pp.security_mmap_file.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.97 ± 2% -0.2 0.78 ± 11% perf-profile.calltrace.cycles-pp.find_vma.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
0.67 ± 3% -0.2 0.49 ± 45% perf-profile.calltrace.cycles-pp.touch_atime.shmem_mmap.mmap_region.do_mmap.vm_mmap_pgoff
0.90 ± 5% -0.2 0.73 ± 8% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
0.78 ± 5% -0.1 0.63 ± 8% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle
26.40 ± 4% +10.3 36.72 ± 17% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
26.40 ± 4% +10.3 36.72 ± 17% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
26.40 ± 4% +10.3 36.72 ± 17% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
26.11 ± 5% +10.4 36.49 ± 18% perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
26.00 ± 5% +10.4 36.40 ± 18% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary
27.39 ± 4% +11.1 38.45 ± 18% perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
25.93 ± 4% +11.4 37.32 ± 18% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
67.97 -10.6 57.41 ± 11% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
66.99 -10.4 56.56 ± 11% perf-profile.children.cycles-pp.do_syscall_64
44.75 -7.4 37.31 ± 11% perf-profile.children.cycles-pp.__munmap
41.23 -7.1 34.12 ± 11% perf-profile.children.cycles-pp.__vm_munmap
41.47 -7.1 34.38 ± 11% perf-profile.children.cycles-pp.__x64_sys_munmap
39.79 -6.9 32.88 ± 11% perf-profile.children.cycles-pp.__do_munmap
33.98 -6.2 27.81 ± 11% perf-profile.children.cycles-pp.unmap_region
25.35 -5.7 19.67 ± 11% perf-profile.children.cycles-pp.unmap_vmas
24.73 -5.6 19.12 ± 11% perf-profile.children.cycles-pp.unmap_page_range
11.68 -3.9 7.83 ± 11% perf-profile.children.cycles-pp.___might_sleep
2.98 -0.4 2.59 ± 10% perf-profile.children.cycles-pp.d_path
2.87 ± 2% -0.4 2.49 ± 11% perf-profile.children.cycles-pp.get_unmapped_area
2.49 ± 2% -0.3 2.18 ± 11% perf-profile.children.cycles-pp.kmem_cache_alloc
2.09 -0.3 1.80 ± 11% perf-profile.children.cycles-pp.__entry_text_start
2.31 -0.3 2.02 ± 10% perf-profile.children.cycles-pp.zap_pte_range
1.31 ± 3% -0.2 1.09 ± 11% perf-profile.children.cycles-pp.security_mmap_file
1.24 ± 2% -0.2 1.05 ± 10% perf-profile.children.cycles-pp.down_write
1.00 -0.2 0.81 ± 10% perf-profile.children.cycles-pp.find_vma
0.66 ± 6% -0.1 0.52 ± 15% perf-profile.children.cycles-pp.strlen
0.66 ± 3% -0.1 0.53 ± 12% perf-profile.children.cycles-pp.common_file_perm
0.69 ± 3% -0.1 0.58 ± 10% perf-profile.children.cycles-pp.touch_atime
0.36 ± 4% -0.1 0.29 ± 8% perf-profile.children.cycles-pp.sync_mm_rss
0.40 ± 3% -0.1 0.34 ± 8% perf-profile.children.cycles-pp.downgrade_write
0.19 ± 12% -0.1 0.13 ± 21% perf-profile.children.cycles-pp.cap_capable
0.25 ± 4% -0.1 0.20 ± 10% perf-profile.children.cycles-pp.vmacache_find
0.18 ± 7% -0.0 0.14 ± 10% perf-profile.children.cycles-pp.tlb_flush_mmu
0.19 ± 7% -0.0 0.15 ± 13% perf-profile.children.cycles-pp.lru_add_drain_cpu
0.13 ± 11% -0.0 0.10 ± 15% perf-profile.children.cycles-pp.__libc_start_main
0.13 ± 11% -0.0 0.10 ± 15% perf-profile.children.cycles-pp.main
0.13 ± 11% -0.0 0.10 ± 15% perf-profile.children.cycles-pp.run_builtin
0.12 ± 10% -0.0 0.09 ± 7% perf-profile.children.cycles-pp.timestamp_truncate
0.09 ± 5% -0.0 0.06 ± 20% perf-profile.children.cycles-pp.common_mmap
0.19 ± 9% -0.0 0.16 ± 5% perf-profile.children.cycles-pp.may_expand_vm
0.19 ± 6% -0.0 0.16 ± 5% perf-profile.children.cycles-pp.userfaultfd_unmap_complete
0.09 ± 12% -0.0 0.07 ± 11% perf-profile.children.cycles-pp.vm_pgprot_modify
0.08 ± 6% -0.0 0.06 ± 11% perf-profile.children.cycles-pp.get_align_mask
0.10 ± 7% +0.0 0.13 ± 14% perf-profile.children.cycles-pp.blocking_notifier_call_chain
0.08 ± 22% +0.0 0.13 ± 12% perf-profile.children.cycles-pp.munmap@plt
26.40 ± 4% +10.3 36.72 ± 17% perf-profile.children.cycles-pp.start_secondary
27.39 ± 4% +11.1 38.45 ± 18% perf-profile.children.cycles-pp.secondary_startup_64_no_verify
27.39 ± 4% +11.1 38.45 ± 18% perf-profile.children.cycles-pp.cpu_startup_entry
27.39 ± 4% +11.1 38.45 ± 18% perf-profile.children.cycles-pp.do_idle
27.10 ± 4% +11.1 38.21 ± 18% perf-profile.children.cycles-pp.cpuidle_enter
27.09 ± 4% +11.1 38.21 ± 18% perf-profile.children.cycles-pp.cpuidle_enter_state
26.00 ± 4% +11.3 37.32 ± 18% perf-profile.children.cycles-pp.intel_idle
11.56 -3.8 7.71 ± 11% perf-profile.self.cycles-pp.___might_sleep
1.28 ± 4% -0.2 1.07 ± 10% perf-profile.self.cycles-pp.perf_event_mmap
1.01 -0.2 0.84 ± 11% perf-profile.self.cycles-pp.__entry_text_start
1.08 ± 4% -0.2 0.92 ± 9% perf-profile.self.cycles-pp.kmem_cache_alloc
0.66 ± 6% -0.1 0.51 ± 14% perf-profile.self.cycles-pp.strlen
0.67 -0.1 0.54 ± 11% perf-profile.self.cycles-pp.find_vma
0.50 ± 4% -0.1 0.40 ± 12% perf-profile.self.cycles-pp.common_file_perm
0.50 ± 6% -0.1 0.41 ± 11% perf-profile.self.cycles-pp.get_obj_cgroup_from_current
0.34 ± 4% -0.1 0.28 ± 9% perf-profile.self.cycles-pp.sync_mm_rss
0.39 ± 3% -0.1 0.33 ± 8% perf-profile.self.cycles-pp.downgrade_write
0.17 ± 13% -0.1 0.11 ± 21% perf-profile.self.cycles-pp.cap_capable
0.24 ± 3% -0.0 0.20 ± 10% perf-profile.self.cycles-pp.vmacache_find
0.15 ± 7% -0.0 0.11 ± 25% perf-profile.self.cycles-pp.menu_select
0.39 ± 3% -0.0 0.34 ± 7% perf-profile.self.cycles-pp.__vm_munmap
0.08 ± 8% -0.0 0.04 ± 73% perf-profile.self.cycles-pp.common_mmap
0.13 ± 11% -0.0 0.09 ± 6% perf-profile.self.cycles-pp.tlb_flush_mmu
0.15 ± 6% -0.0 0.12 ± 12% perf-profile.self.cycles-pp.touch_atime
0.13 ± 10% -0.0 0.10 ± 10% perf-profile.self.cycles-pp.remove_vma
0.11 ± 11% -0.0 0.08 ± 6% perf-profile.self.cycles-pp.timestamp_truncate
0.18 ± 10% -0.0 0.15 ± 8% perf-profile.self.cycles-pp.may_expand_vm
0.16 ± 4% -0.0 0.13 ± 6% perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.18 ± 5% -0.0 0.15 ± 11% perf-profile.self.cycles-pp.get_unmapped_area
0.19 ± 6% -0.0 0.16 ± 5% perf-profile.self.cycles-pp.userfaultfd_unmap_complete
0.13 ± 5% -0.0 0.11 ± 10% perf-profile.self.cycles-pp.prepend
0.10 ± 7% +0.0 0.13 ± 14% perf-profile.self.cycles-pp.blocking_notifier_call_chain
26.00 ± 4% +11.3 37.32 ± 18% perf-profile.self.cycles-pp.intel_idle
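The derived perf-stat rows in the table above can be cross-checked against the raw counters. The sketch below is only a sanity check: it assumes that instructions-per-iTLB-miss is instructions divided by iTLB-load-misses and that IPC is the reciprocal of CPI, and it uses the already rounded table values, so small deviations from the reported figures are expected. The function and variable names are illustrative and not part of lkp-tests.

```python
# Sanity check of derived perf-stat metrics, using rounded values copied
# from the table (parent commit 9c8f21e6f8 vs. tested commit a10787e6d5).
# Assumption: the derived rows are simple ratios of the raw counters.

def instructions_per_miss(instructions: float, itlb_misses: float) -> float:
    """Average number of instructions retired per iTLB load miss."""
    return instructions / itlb_misses

def ipc_from_cpi(cpi: float) -> float:
    """Instructions per cycle as the reciprocal of cycles per instruction."""
    return 1.0 / cpi

# perf-stat.ps.instructions and perf-stat.ps.iTLB-load-misses
parent_per_miss = instructions_per_miss(9.529e10, 47489125)
tested_per_miss = instructions_per_miss(9.856e10, 53563329)

# perf-stat.overall.cpi
parent_ipc = ipc_from_cpi(0.48)
tested_ipc = ipc_from_cpi(0.47)

print(f"instructions per iTLB miss: parent {parent_per_miss:.0f}, tested {tested_per_miss:.0f}")
print(f"IPC derived from CPI:       parent {parent_ipc:.2f}, tested {tested_ipc:.2f}")
```

Under these assumptions the ratios land close to the reported perf-stat.overall.instructions-per-iTLB-miss values (2006 and 1840), and the reciprocal of CPI lands near the reported IPC (2.07 and 2.14) within the rounding of the two-digit CPI.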
will-it-scale.per_process_ops
585000 +------------------------------------------------------------------+
| O O O OO O O OO O |
580000 |-+ O |
| O O OO O O O O O |
575000 |-O OO O |
| O O |
570000 |-+ O O O O OO O |
| OO |
565000 |-+ O |
| .+. +. .+.++.+.+.+.++.+.+.+. |
560000 |-+ +.+.+ +.+ +.+ + |
| : |
555000 |.+.++.+. .+.+ |
| + |
550000 +------------------------------------------------------------------+
[*] bisect-good sample
[O] bisect-bad sample
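The headline figure can be recomputed from the reported averages. The following is a minimal sketch, assuming the %change column is the relative difference between the two commit averages and that per_process_ops is roughly the workload total divided by nr_task; neither relation is spelled out in this report, and the helper name `pct_change` is hypothetical.

```python
# Recompute the headline change from the averages reported in the table.
# Assumption: %change = (tested - parent) / parent * 100.

NR_TASK = 16  # job parameter nr_task

def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0

workload_parent, workload_tested = 8990002, 9304107   # will-it-scale.workload
per_proc_parent, per_proc_tested = 561874, 581506     # will-it-scale.per_process_ops

workload_change = pct_change(workload_parent, workload_tested)
per_proc_change = pct_change(per_proc_parent, per_proc_tested)

print(f"workload change:        {workload_change:+.1f}%")
print(f"per_process_ops change: {per_proc_change:+.1f}%")

# Assumed relation: per-process throughput is the total divided by nr_task.
print(f"workload / nr_task, parent: {workload_parent / NR_TASK:.1f}")
print(f"workload / nr_task, tested: {workload_tested / NR_TASK:.1f}")
```

Both changes round to +3.5%, and dividing the workload by 16 gives values within one operation of the reported per_process_ops averages, which is consistent with the assumed relation.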
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected]       Intel Corporation
Thanks,
Oliver Sang