2020-04-25 11:53:27

by Chen, Rong A

Subject: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

Greetings,

FYI, we noticed a -14.1% regression of will-it-scale.per_process_ops due to commit:


commit: 1de08dccd383482a3e88845d3554094d338f5ff9 ("x86/mce: Add a struct mce.kflags field")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: will-it-scale
on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory
with following parameters:

nr_task: 100%
mode: process
test: malloc1
cpufreq_governor: performance
ucode: 0x11

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a threads-based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
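
For context, the malloc1 testcase is essentially a tight
allocate/touch/free loop. A minimal sketch of that pattern (an
illustrative paraphrase, not the upstream source; the chunk size is an
assumption) is:

#include <stdlib.h>
#include <string.h>

/* Each iteration allocates a large buffer (served by mmap at this
 * size), dirties every page (page faults), and frees it (munmap),
 * which matches the munmap/page-fault-heavy profile further down. */
#define CHUNK (64UL << 20)	/* assumed 64 MB chunk */

int main(void)
{
	for (;;) {
		char *p = malloc(CHUNK);

		if (!p)
			return 1;
		memset(p, 1, CHUNK);	/* touch every page */
		free(p);		/* large chunk -> munmap */
	}
}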



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-7/performance/x86_64-rhel-7.6/process/100%/debian-x86_64-20191114.cgz/lkp-knm01/malloc1/will-it-scale/0x11

commit:
9554bfe403 ("x86/mce: Convert the CEC to use the MCE notifier")
1de08dccd3 ("x86/mce: Add a struct mce.kflags field")

9554bfe403bdfc08           1de08dccd383482a3e88845d355
----------------           ---------------------------
       fail:runs    %reproduction     fail:runs
              :4              25%           1:4    dmesg.WARNING:at#for_ip_interrupt_entry/0x
              :4              25%           1:4    dmesg.WARNING:at_ip___perf_sw_event/0x
         %stddev          %change       %stddev
668.00 -14.1% 573.75 will-it-scale.per_process_ops
192559 -14.1% 165344 will-it-scale.workload
424371 -20.3% 338331 ± 8% vmstat.system.in
0.00 ± 13% +0.0 0.00 ± 22% mpstat.cpu.all.soft%
0.54 -0.1 0.47 ± 3% mpstat.cpu.all.usr%
1.205e+08 -13.7% 1.039e+08 numa-numastat.node0.local_node
1.205e+08 -13.7% 1.039e+08 numa-numastat.node0.numa_hit
61585280 -13.1% 53521568 numa-vmstat.node0.numa_hit
61585799 -13.1% 53522027 numa-vmstat.node0.numa_local
1.203e+08 -13.8% 1.037e+08 proc-vmstat.numa_hit
1.203e+08 -13.8% 1.037e+08 proc-vmstat.numa_local
1.205e+08 -13.7% 1.04e+08 proc-vmstat.pgalloc_normal
60608363 -13.6% 52339576 proc-vmstat.pgfault
1.204e+08 -13.8% 1.038e+08 proc-vmstat.pgfree
0.04 ± 9% +17.4% 0.05 ± 7% sched_debug.cfs_rq:/.nr_running.stddev
52.52 ± 5% +11.9% 58.79 ± 2% sched_debug.cfs_rq:/.util_avg.stddev
2049590 ± 7% -12.8% 1788136 ± 4% sched_debug.cpu.avg_idle.avg
418.68 ± 2% -22.1% 326.13 ± 2% sched_debug.cpu.clock.stddev
418.68 ± 2% -22.1% 326.14 ± 2% sched_debug.cpu.clock_task.stddev
158439 ± 8% +29.6% 205376 ± 16% sched_debug.cpu.max_idle_balance_cost.stddev
0.00 -18.8% 0.00 ± 2% sched_debug.cpu.next_balance.stddev
0.00 ±173% +500.0% 0.00 ± 33% sched_debug.cpu.nr_uninterruptible.avg
-49.04 +76.9% -86.75 sched_debug.cpu.nr_uninterruptible.min
1117 +13.3% 1266 ± 3% sched_debug.cpu.sched_count.min
36.92 -2.4 34.52 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
36.91 -2.4 34.51 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
36.30 -2.3 33.98 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
36.30 -2.3 33.99 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.55 -0.3 0.26 ±100% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt._raw_spin_unlock_irqrestore.pagevec_lru_move_fn
0.54 ± 2% -0.3 0.26 ±100% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt._raw_spin_unlock_irqrestore.release_pages
1.02 -0.1 0.94 perf-profile.calltrace.cycles-pp.page_fault
0.81 -0.1 0.73 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_page_fault.page_fault
0.98 -0.1 0.90 perf-profile.calltrace.cycles-pp.do_page_fault.page_fault
0.77 -0.1 0.68 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_page_fault.page_fault
0.69 -0.1 0.61 perf-profile.calltrace.cycles-pp.handle_pte_fault.__handle_mm_fault.handle_mm_fault.do_page_fault.page_fault
0.65 -0.1 0.58 ± 5% perf-profile.calltrace.cycles-pp.apic_timer_interrupt._raw_spin_unlock_irqrestore.release_pages.tlb_flush_mmu.tlb_finish_mmu
0.67 ± 2% -0.1 0.60 ± 6% perf-profile.calltrace.cycles-pp.apic_timer_interrupt._raw_spin_unlock_irqrestore.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain
0.66 -0.1 0.59 ± 4% perf-profile.calltrace.cycles-pp._raw_spin_unlock_irqrestore.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region
0.62 -0.1 0.56 ± 4% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt._raw_spin_unlock_irqrestore.release_pages.tlb_flush_mmu
0.64 ± 2% -0.1 0.57 ± 6% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt._raw_spin_unlock_irqrestore.pagevec_lru_move_fn.lru_add_drain_cpu
0.66 -0.1 0.60 ± 6% perf-profile.calltrace.cycles-pp._raw_spin_unlock_irqrestore.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain.unmap_region
47.88 +0.1 48.00 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
47.86 +0.1 48.00 perf-profile.calltrace.cycles-pp.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap
47.74 +0.1 47.89 perf-profile.calltrace.cycles-pp.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap
96.85 +0.3 97.18 perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
46.33 +0.3 46.67 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region
46.30 +0.3 46.64 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.release_pages.tlb_flush_mmu.tlb_finish_mmu
47.55 +0.3 47.90 perf-profile.calltrace.cycles-pp.lru_add_drain.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
47.55 +0.4 47.90 perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap.__vm_munmap
47.52 +0.4 47.87 perf-profile.calltrace.cycles-pp.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap
96.64 +0.4 97.01 perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
46.22 +0.5 46.74 perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain.unmap_region
46.20 +0.5 46.72 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain
60.91 +2.6 63.55 perf-profile.calltrace.cycles-pp.munmap
60.60 +2.6 63.24 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
60.84 +2.6 63.47 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.munmap
60.59 +2.6 63.23 perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
60.83 +2.6 63.47 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
2.21 -0.3 1.90 ± 6% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
2.24 -0.3 1.93 ± 5% perf-profile.children.cycles-pp.apic_timer_interrupt
2.13 -0.3 1.85 ± 5% perf-profile.children.cycles-pp.smp_apic_timer_interrupt
1.84 -0.2 1.60 ± 5% perf-profile.children.cycles-pp.hrtimer_interrupt
1.57 -0.2 1.39 ± 4% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.66 -0.1 0.55 perf-profile.children.cycles-pp.vm_mmap_pgoff
0.67 -0.1 0.56 perf-profile.children.cycles-pp.ksys_mmap_pgoff
1.07 -0.1 0.96 ± 5% perf-profile.children.cycles-pp.tick_sched_timer
1.03 -0.1 0.93 ± 5% perf-profile.children.cycles-pp.tick_sched_handle
1.01 -0.1 0.91 ± 5% perf-profile.children.cycles-pp.update_process_times
0.58 -0.1 0.49 perf-profile.children.cycles-pp.do_mmap
1.01 -0.1 0.92 perf-profile.children.cycles-pp.do_page_fault
1.05 -0.1 0.97 perf-profile.children.cycles-pp.page_fault
0.83 -0.1 0.74 ± 2% perf-profile.children.cycles-pp.handle_mm_fault
0.79 ± 2% -0.1 0.71 ± 6% perf-profile.children.cycles-pp.scheduler_tick
0.92 ± 2% -0.1 0.84 ± 3% perf-profile.children.cycles-pp.unmap_vmas
0.78 -0.1 0.70 ± 2% perf-profile.children.cycles-pp.__handle_mm_fault
0.88 -0.1 0.80 ± 3% perf-profile.children.cycles-pp.unmap_page_range
0.70 -0.1 0.62 perf-profile.children.cycles-pp.handle_pte_fault
0.43 -0.1 0.36 perf-profile.children.cycles-pp.mmap_region
0.47 ± 2% -0.1 0.41 ± 2% perf-profile.children.cycles-pp.mmap64
0.55 -0.0 0.50 ± 4% perf-profile.children.cycles-pp.task_tick_fair
0.18 ± 2% -0.0 0.13 ± 3% perf-profile.children.cycles-pp.perf_event_mmap
0.11 -0.0 0.08 ± 6% perf-profile.children.cycles-pp.perf_iterate_sb
0.31 -0.0 0.27 ± 3% perf-profile.children.cycles-pp.__alloc_pages_nodemask
0.26 -0.0 0.23 perf-profile.children.cycles-pp.get_page_from_freelist
0.22 -0.0 0.19 ± 3% perf-profile.children.cycles-pp.pte_alloc_one
0.24 ± 2% -0.0 0.21 ± 4% perf-profile.children.cycles-pp.__pte_alloc
0.50 -0.0 0.47 ± 2% perf-profile.children.cycles-pp.change_protection
0.50 -0.0 0.47 ± 2% perf-profile.children.cycles-pp.change_p4d_range
0.50 -0.0 0.47 ± 2% perf-profile.children.cycles-pp.change_prot_numa
0.11 ± 4% -0.0 0.09 ± 4% perf-profile.children.cycles-pp.free_unref_page_list
0.50 -0.0 0.47 ± 2% perf-profile.children.cycles-pp.task_work_run
0.50 -0.0 0.48 perf-profile.children.cycles-pp.exit_to_usermode_loop
0.32 ± 2% -0.0 0.29 ± 4% perf-profile.children.cycles-pp.___might_sleep
0.50 -0.0 0.47 ± 2% perf-profile.children.cycles-pp.task_numa_work
0.16 ± 2% -0.0 0.14 ± 3% perf-profile.children.cycles-pp.prep_new_page
0.16 ± 2% -0.0 0.14 ± 3% perf-profile.children.cycles-pp.alloc_pages_vma
0.14 ± 3% -0.0 0.12 ± 4% perf-profile.children.cycles-pp.clear_page_erms
0.07 ± 6% -0.0 0.05 perf-profile.children.cycles-pp.kmem_cache_free
0.07 -0.0 0.05 ± 8% perf-profile.children.cycles-pp.percpu_counter_add_batch
0.17 ± 2% -0.0 0.15 ± 4% perf-profile.children.cycles-pp._cond_resched
0.12 -0.0 0.10 ± 4% perf-profile.children.cycles-pp.get_unmapped_area
0.09 ± 4% -0.0 0.07 ± 10% perf-profile.children.cycles-pp._raw_spin_lock
0.13 -0.0 0.11 ± 4% perf-profile.children.cycles-pp.__anon_vma_prepare
0.13 ± 3% -0.0 0.12 ± 3% perf-profile.children.cycles-pp.free_pgtables
0.10 ± 4% -0.0 0.08 ± 5% perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
0.16 -0.0 0.15 ± 3% perf-profile.children.cycles-pp.irq_exit
0.13 -0.0 0.12 ± 3% perf-profile.children.cycles-pp.free_pgd_range
0.10 -0.0 0.09 ± 4% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.05 +0.0 0.08 ± 6% perf-profile.children.cycles-pp.__mod_memcg_state
0.07 +0.0 0.10 perf-profile.children.cycles-pp.__mod_lruvec_state
0.14 ± 13% +0.0 0.18 ± 6% perf-profile.children.cycles-pp.__remove_hrtimer
47.91 +0.1 48.04 perf-profile.children.cycles-pp.tlb_finish_mmu
47.90 +0.1 48.03 perf-profile.children.cycles-pp.tlb_flush_mmu
47.81 +0.1 47.96 perf-profile.children.cycles-pp.release_pages
98.55 +0.1 98.70 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
98.53 +0.1 98.68 perf-profile.children.cycles-pp.do_syscall_64
96.91 +0.3 97.24 perf-profile.children.cycles-pp.__x64_sys_munmap
96.89 +0.3 97.22 perf-profile.children.cycles-pp.__vm_munmap
47.73 +0.3 48.06 perf-profile.children.cycles-pp.pagevec_lru_move_fn
96.88 +0.3 97.21 perf-profile.children.cycles-pp.__do_munmap
47.60 +0.4 47.95 perf-profile.children.cycles-pp.lru_add_drain
47.59 +0.4 47.94 perf-profile.children.cycles-pp.lru_add_drain_cpu
96.67 +0.4 97.03 perf-profile.children.cycles-pp.unmap_region
92.78 +0.8 93.60 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
92.83 +0.8 93.67 perf-profile.children.cycles-pp._raw_spin_lock_irqsave
60.92 +2.6 63.56 perf-profile.children.cycles-pp.munmap
0.17 ± 6% -0.1 0.06 ± 13% perf-profile.self.cycles-pp.__hrtimer_run_queues
0.47 ± 2% -0.1 0.42 ± 4% perf-profile.self.cycles-pp.unmap_page_range
0.10 ± 4% -0.0 0.07 ± 12% perf-profile.self.cycles-pp.hrtimer_interrupt
0.11 -0.0 0.08 ± 10% perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
0.44 -0.0 0.41 ± 3% perf-profile.self.cycles-pp.change_p4d_range
0.08 -0.0 0.06 ± 9% perf-profile.self.cycles-pp.perf_iterate_sb
0.14 -0.0 0.12 ± 3% perf-profile.self.cycles-pp.clear_page_erms
0.08 ± 5% -0.0 0.07 ± 6% perf-profile.self.cycles-pp._raw_spin_lock
0.07 ± 7% -0.0 0.05 perf-profile.self.cycles-pp.kmem_cache_free
0.08 -0.0 0.07 ± 6% perf-profile.self.cycles-pp.release_pages
0.09 -0.0 0.08 ± 5% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.05 ± 8% +0.0 0.07 ± 14% perf-profile.self.cycles-pp.___perf_sw_event
0.05 +0.0 0.07 ± 5% perf-profile.self.cycles-pp.__mod_memcg_state
0.00 +0.1 0.14 ± 9% perf-profile.self.cycles-pp.__remove_hrtimer
92.77 +0.8 93.60 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
124732 +11.7% 139275 softirqs.CPU0.TIMER
125382 +13.1% 141792 ± 2% softirqs.CPU10.TIMER
124413 +11.5% 138750 softirqs.CPU100.TIMER
123657 +12.2% 138733 softirqs.CPU101.TIMER
123899 +12.1% 138896 softirqs.CPU102.TIMER
124228 +11.7% 138717 softirqs.CPU103.TIMER
123619 +12.3% 138882 softirqs.CPU104.TIMER
123585 +12.7% 139247 softirqs.CPU105.TIMER
123593 +12.0% 138384 softirqs.CPU106.TIMER
123642 +12.2% 138741 softirqs.CPU107.TIMER
123926 +11.7% 138456 softirqs.CPU108.TIMER
123798 +12.0% 138608 softirqs.CPU109.TIMER
124136 +12.0% 139018 softirqs.CPU11.TIMER
123935 +12.5% 139487 softirqs.CPU110.TIMER
124699 +11.1% 138505 softirqs.CPU111.TIMER
123708 +12.3% 138865 softirqs.CPU112.TIMER
123122 +13.1% 139227 softirqs.CPU113.TIMER
123665 +16.1% 143543 ± 7% softirqs.CPU114.TIMER
123887 +11.6% 138272 softirqs.CPU115.TIMER
123873 +11.8% 138498 softirqs.CPU116.TIMER
123987 +11.6% 138362 softirqs.CPU117.TIMER
123891 +11.6% 138254 softirqs.CPU118.TIMER
123315 +16.1% 143144 ± 4% softirqs.CPU119.TIMER
123661 +14.9% 142093 ± 3% softirqs.CPU12.TIMER
123199 +12.3% 138367 softirqs.CPU120.TIMER
123372 +12.1% 138262 softirqs.CPU121.TIMER
123641 +11.6% 137956 softirqs.CPU122.TIMER
123389 +12.6% 138884 softirqs.CPU123.TIMER
123309 +12.4% 138608 softirqs.CPU124.TIMER
123604 +11.9% 138283 softirqs.CPU125.TIMER
123456 +12.2% 138513 softirqs.CPU126.TIMER
123526 +12.8% 139288 softirqs.CPU127.TIMER
123496 +11.9% 138241 softirqs.CPU128.TIMER
123381 +12.9% 139252 softirqs.CPU129.TIMER
125036 +11.1% 138906 softirqs.CPU13.TIMER
123722 +12.5% 139159 softirqs.CPU130.TIMER
123523 +12.1% 138454 softirqs.CPU131.TIMER
123760 +11.8% 138329 softirqs.CPU132.TIMER
123569 +12.2% 138624 softirqs.CPU133.TIMER
123669 +12.1% 138657 softirqs.CPU134.TIMER
122996 +12.7% 138591 softirqs.CPU135.TIMER
123011 +12.6% 138503 softirqs.CPU136.TIMER
123141 +12.6% 138624 softirqs.CPU137.TIMER
123505 +12.0% 138360 softirqs.CPU138.TIMER
123513 +12.0% 138343 softirqs.CPU139.TIMER
124622 +18.6% 147768 ± 6% softirqs.CPU14.TIMER
123088 +12.3% 138241 softirqs.CPU140.TIMER
123186 +13.2% 139395 softirqs.CPU141.TIMER
123249 +12.8% 139061 ± 2% softirqs.CPU142.TIMER
123274 +12.2% 138333 softirqs.CPU143.TIMER
123241 +12.5% 138599 ± 2% softirqs.CPU144.TIMER
123183 +12.1% 138053 softirqs.CPU145.TIMER
124668 +10.7% 137947 softirqs.CPU146.TIMER
123275 +12.7% 138880 softirqs.CPU147.TIMER
123278 +12.3% 138403 softirqs.CPU148.TIMER
123611 +12.3% 138764 softirqs.CPU149.TIMER
124133 +12.0% 139055 softirqs.CPU15.TIMER
123195 +12.1% 138071 softirqs.CPU150.TIMER
123687 +11.7% 138126 softirqs.CPU151.TIMER
123483 +11.9% 138191 softirqs.CPU152.TIMER
123281 +12.1% 138248 softirqs.CPU153.TIMER
124097 +11.3% 138090 softirqs.CPU154.TIMER
123803 +11.7% 138348 softirqs.CPU155.TIMER
122720 +12.6% 138171 softirqs.CPU157.TIMER
123267 +12.7% 138929 softirqs.CPU158.TIMER
123387 +12.0% 138189 softirqs.CPU159.TIMER
124075 +12.0% 138944 softirqs.CPU16.TIMER
123571 +12.4% 138834 softirqs.CPU160.TIMER
123578 +12.0% 138378 softirqs.CPU161.TIMER
123664 +11.9% 138397 softirqs.CPU162.TIMER
123363 +11.9% 138061 softirqs.CPU163.TIMER
123053 +12.4% 138264 softirqs.CPU164.TIMER
123412 +12.1% 138367 softirqs.CPU165.TIMER
123721 +11.7% 138219 softirqs.CPU166.TIMER
123512 +11.8% 138098 softirqs.CPU167.TIMER
123392 +12.1% 138382 softirqs.CPU168.TIMER
123559 +11.9% 138276 softirqs.CPU169.TIMER
124075 +12.5% 139563 softirqs.CPU17.TIMER
123019 +12.5% 138428 softirqs.CPU170.TIMER
123467 +12.0% 138328 softirqs.CPU171.TIMER
123040 +12.3% 138224 softirqs.CPU173.TIMER
123997 +11.6% 138334 softirqs.CPU174.TIMER
123787 +11.8% 138437 softirqs.CPU175.TIMER
123315 +12.2% 138419 softirqs.CPU176.TIMER
123771 +12.1% 138716 softirqs.CPU177.TIMER
123016 +12.2% 138058 softirqs.CPU178.TIMER
122844 +12.4% 138072 softirqs.CPU179.TIMER
123981 +12.1% 138975 softirqs.CPU18.TIMER
123511 +11.8% 138041 softirqs.CPU180.TIMER
123415 +12.0% 138171 softirqs.CPU181.TIMER
122954 +12.9% 138845 softirqs.CPU182.TIMER
123291 +12.0% 138113 softirqs.CPU183.TIMER
122910 +12.4% 138175 softirqs.CPU184.TIMER
123015 +12.8% 138812 ± 2% softirqs.CPU185.TIMER
123197 +11.5% 137396 softirqs.CPU186.TIMER
122914 +12.2% 137870 softirqs.CPU187.TIMER
122854 +12.7% 138509 softirqs.CPU188.TIMER
122864 +12.5% 138211 softirqs.CPU189.TIMER
124311 +11.5% 138654 softirqs.CPU19.TIMER
122961 +12.4% 138166 softirqs.CPU190.TIMER
123015 +12.3% 138134 softirqs.CPU191.TIMER
122906 +12.3% 138029 softirqs.CPU192.TIMER
122988 +12.3% 138098 softirqs.CPU193.TIMER
122863 +11.9% 137464 softirqs.CPU194.TIMER
122883 +12.3% 138039 softirqs.CPU195.TIMER
123157 +12.1% 138023 softirqs.CPU196.TIMER
122831 +12.5% 138204 softirqs.CPU197.TIMER
123113 +11.9% 137755 softirqs.CPU198.TIMER
122809 +12.2% 137771 softirqs.CPU199.TIMER
123833 +12.3% 139118 softirqs.CPU20.TIMER
122908 +12.1% 137806 softirqs.CPU200.TIMER
122641 +12.4% 137887 softirqs.CPU201.TIMER
123187 +12.2% 138253 softirqs.CPU202.TIMER
122997 +12.2% 138012 softirqs.CPU203.TIMER
123088 +11.7% 137542 softirqs.CPU204.TIMER
122928 +12.2% 137903 softirqs.CPU205.TIMER
122990 +12.1% 137911 softirqs.CPU206.TIMER
123028 +12.2% 138095 softirqs.CPU207.TIMER
122473 +12.7% 138050 softirqs.CPU208.TIMER
122665 +12.8% 138325 softirqs.CPU209.TIMER
124436 +11.5% 138798 softirqs.CPU21.TIMER
122387 +12.7% 137900 softirqs.CPU210.TIMER
122693 +12.3% 137746 softirqs.CPU211.TIMER
122651 +12.3% 137710 softirqs.CPU212.TIMER
122995 +11.9% 137644 softirqs.CPU213.TIMER
122918 +11.9% 137535 softirqs.CPU214.TIMER
122662 +12.0% 137431 softirqs.CPU215.TIMER
122226 +12.3% 137244 softirqs.CPU216.TIMER
122549 +12.1% 137338 softirqs.CPU217.TIMER
123475 +11.5% 137651 softirqs.CPU218.TIMER
123553 +11.4% 137635 softirqs.CPU219.TIMER
123621 +13.3% 140099 ± 2% softirqs.CPU22.TIMER
122924 +12.0% 137726 softirqs.CPU220.TIMER
123174 +12.3% 138348 softirqs.CPU221.TIMER
122630 +12.5% 137977 softirqs.CPU222.TIMER
123731 +11.8% 138294 softirqs.CPU223.TIMER
123492 +11.8% 138023 softirqs.CPU224.TIMER
123019 +11.9% 137651 softirqs.CPU225.TIMER
123061 +12.0% 137781 softirqs.CPU226.TIMER
123436 +11.8% 137983 softirqs.CPU227.TIMER
122711 +12.3% 137768 softirqs.CPU228.TIMER
122361 +13.2% 138515 softirqs.CPU229.TIMER
124044 +12.2% 139183 softirqs.CPU23.TIMER
122462 +13.2% 138573 softirqs.CPU230.TIMER
122970 +12.5% 138298 softirqs.CPU231.TIMER
123005 +12.2% 137962 softirqs.CPU232.TIMER
122716 +12.4% 137939 softirqs.CPU233.TIMER
122591 +12.4% 137826 softirqs.CPU234.TIMER
122794 +12.4% 138058 softirqs.CPU235.TIMER
122607 +12.6% 138015 softirqs.CPU236.TIMER
122744 +12.8% 138401 softirqs.CPU237.TIMER
122680 +12.2% 137683 softirqs.CPU238.TIMER
122589 +12.3% 137729 softirqs.CPU239.TIMER
122741 +12.5% 138046 softirqs.CPU240.TIMER
124001 +11.3% 137964 softirqs.CPU241.TIMER
122283 +12.5% 137507 softirqs.CPU242.TIMER
122722 +12.4% 137958 softirqs.CPU243.TIMER
123393 +11.4% 137463 softirqs.CPU244.TIMER
122456 +12.4% 137610 softirqs.CPU245.TIMER
122995 +12.3% 138090 softirqs.CPU246.TIMER
123687 +11.4% 137814 softirqs.CPU247.TIMER
122494 +12.9% 138288 softirqs.CPU248.TIMER
122634 +12.6% 138146 softirqs.CPU249.TIMER
124923 +11.8% 139698 softirqs.CPU25.TIMER
122661 +12.3% 137714 softirqs.CPU250.TIMER
122343 +12.5% 137689 softirqs.CPU251.TIMER
122846 +11.8% 137353 softirqs.CPU252.TIMER
122455 +12.3% 137525 softirqs.CPU253.TIMER
122334 +13.1% 138369 softirqs.CPU254.TIMER
122023 +13.2% 138083 ± 2% softirqs.CPU256.TIMER
122317 +12.2% 137246 softirqs.CPU257.TIMER
122558 +11.6% 136825 softirqs.CPU258.TIMER
122482 +15.8% 141829 ± 6% softirqs.CPU259.TIMER
123637 +12.2% 138768 softirqs.CPU26.TIMER
122120 +12.5% 137405 softirqs.CPU260.TIMER
122386 +12.2% 137365 softirqs.CPU262.TIMER
122398 +12.4% 137560 softirqs.CPU263.TIMER
122108 +12.5% 137359 softirqs.CPU264.TIMER
122101 +15.4% 140873 ± 5% softirqs.CPU265.TIMER
122163 +12.2% 137050 softirqs.CPU266.TIMER
122077 +12.6% 137416 softirqs.CPU267.TIMER
122318 +12.3% 137341 softirqs.CPU268.TIMER
122063 +12.5% 137290 softirqs.CPU269.TIMER
123854 +12.0% 138752 softirqs.CPU27.TIMER
122323 +12.0% 137011 softirqs.CPU270.TIMER
122091 +12.4% 137199 softirqs.CPU271.TIMER
122042 +13.0% 137944 softirqs.CPU272.TIMER
122898 +11.4% 136920 softirqs.CPU273.TIMER
122100 +12.5% 137340 softirqs.CPU274.TIMER
122223 +12.2% 137141 softirqs.CPU275.TIMER
122367 +11.8% 136818 softirqs.CPU276.TIMER
122216 +12.2% 137067 softirqs.CPU277.TIMER
122035 +12.2% 136880 softirqs.CPU278.TIMER
122296 +12.4% 137442 softirqs.CPU279.TIMER
124072 +12.2% 139264 softirqs.CPU28.TIMER
122058 +12.3% 137060 softirqs.CPU280.TIMER
121889 +12.5% 137107 softirqs.CPU281.TIMER
121953 +12.4% 137058 softirqs.CPU282.TIMER
122136 +12.3% 137153 softirqs.CPU283.TIMER
122152 +11.9% 136661 softirqs.CPU284.TIMER
122126 +12.1% 136891 softirqs.CPU285.TIMER
121523 +12.7% 136947 softirqs.CPU286.TIMER
117427 +12.6% 132264 ± 2% softirqs.CPU287.TIMER
124082 +11.7% 138601 softirqs.CPU29.TIMER
124520 +11.4% 138684 softirqs.CPU3.TIMER
125786 ± 3% +10.4% 138855 softirqs.CPU30.TIMER
124607 +11.5% 138958 softirqs.CPU31.TIMER
123701 +13.1% 139886 softirqs.CPU32.TIMER
124593 +11.9% 139391 softirqs.CPU33.TIMER
123719 +12.0% 138526 softirqs.CPU34.TIMER
123959 +11.9% 138665 softirqs.CPU35.TIMER
123758 +12.0% 138556 softirqs.CPU36.TIMER
123856 +11.9% 138597 softirqs.CPU37.TIMER
124053 +16.7% 144775 ± 6% softirqs.CPU38.TIMER
123675 +11.8% 138281 softirqs.CPU39.TIMER
124228 +12.0% 139164 softirqs.CPU4.TIMER
123900 +12.3% 139175 softirqs.CPU40.TIMER
123892 +12.4% 139211 softirqs.CPU41.TIMER
127063 ± 3% +9.1% 138615 softirqs.CPU42.TIMER
123679 +12.2% 138760 softirqs.CPU43.TIMER
124702 +11.9% 139566 softirqs.CPU44.TIMER
123975 +11.9% 138712 softirqs.CPU45.TIMER
124174 +11.6% 138531 softirqs.CPU46.TIMER
123644 +12.1% 138571 softirqs.CPU48.TIMER
123687 +12.3% 138843 softirqs.CPU49.TIMER
124610 +11.6% 139078 softirqs.CPU5.TIMER
124146 +15.8% 143709 ± 3% softirqs.CPU51.TIMER
123635 +12.8% 139412 softirqs.CPU52.TIMER
124065 +12.1% 139088 softirqs.CPU53.TIMER
124147 +14.2% 141788 ± 3% softirqs.CPU54.TIMER
123762 +12.2% 138905 softirqs.CPU55.TIMER
125582 +10.6% 138868 softirqs.CPU57.TIMER
125328 +18.7% 148744 ± 11% softirqs.CPU58.TIMER
123995 +12.1% 138967 softirqs.CPU59.TIMER
124120 +12.1% 139157 softirqs.CPU6.TIMER
124023 +12.3% 139227 softirqs.CPU60.TIMER
123781 +12.1% 138727 softirqs.CPU61.TIMER
123569 +12.5% 138970 softirqs.CPU62.TIMER
123608 +12.5% 139074 softirqs.CPU63.TIMER
123407 +13.1% 139543 softirqs.CPU64.TIMER
127045 ± 4% +9.2% 138760 softirqs.CPU65.TIMER
126610 ± 5% +9.5% 138633 softirqs.CPU66.TIMER
123612 +12.1% 138603 softirqs.CPU67.TIMER
123604 +12.5% 139113 softirqs.CPU68.TIMER
123505 +12.2% 138554 softirqs.CPU69.TIMER
124325 +11.7% 138838 softirqs.CPU7.TIMER
124090 +12.5% 139591 softirqs.CPU70.TIMER
123699 +11.9% 138451 softirqs.CPU71.TIMER
124613 +13.9% 141945 ± 3% softirqs.CPU72.TIMER
123431 +12.3% 138639 softirqs.CPU73.TIMER
123853 +12.1% 138852 softirqs.CPU75.TIMER
124259 +11.8% 138957 softirqs.CPU76.TIMER
123966 +12.1% 138962 softirqs.CPU77.TIMER
123577 +12.1% 138498 softirqs.CPU78.TIMER
123749 +12.0% 138577 softirqs.CPU79.TIMER
124280 +11.7% 138772 softirqs.CPU8.TIMER
124029 +11.7% 138583 softirqs.CPU80.TIMER
123492 +12.4% 138826 softirqs.CPU81.TIMER
124304 +11.6% 138753 softirqs.CPU82.TIMER
123952 +12.0% 138766 softirqs.CPU83.TIMER
123672 +15.0% 142254 ± 3% softirqs.CPU85.TIMER
123910 +12.4% 139236 softirqs.CPU86.TIMER
123664 +12.3% 138908 softirqs.CPU87.TIMER
124068 +12.1% 139076 softirqs.CPU88.TIMER
123802 +12.2% 138885 softirqs.CPU89.TIMER
123826 +12.0% 138649 softirqs.CPU90.TIMER
124086 +11.7% 138621 softirqs.CPU91.TIMER
123586 +12.4% 138925 softirqs.CPU92.TIMER
123696 +12.5% 139160 softirqs.CPU93.TIMER
123720 +11.8% 138367 softirqs.CPU94.TIMER
124536 +11.3% 138666 softirqs.CPU95.TIMER
123950 +12.2% 139071 softirqs.CPU96.TIMER
123876 +12.1% 138860 softirqs.CPU97.TIMER
123700 +12.2% 138807 softirqs.CPU98.TIMER
123452 +12.5% 138905 softirqs.CPU99.TIMER
35617264 +12.0% 39908742 softirqs.TIMER
554.75 ± 65% +547.1% 3590 ± 96% interrupts.32:IR-PCI-MSI.2097155-edge.eth0-TxRx-2
452452 -21.0% 357575 ± 8% interrupts.CPU0.LOC:Local_timer_interrupts
1315 ± 10% +55.9% 2051 ± 16% interrupts.CPU0.RES:Rescheduling_interrupts
452347 -21.1% 357112 ± 8% interrupts.CPU1.LOC:Local_timer_interrupts
452532 -21.3% 356223 ± 8% interrupts.CPU10.LOC:Local_timer_interrupts
448763 -20.9% 354890 ± 8% interrupts.CPU100.LOC:Local_timer_interrupts
448680 -20.8% 355364 ± 8% interrupts.CPU101.LOC:Local_timer_interrupts
447608 -20.7% 355149 ± 8% interrupts.CPU102.LOC:Local_timer_interrupts
448227 -20.9% 354352 ± 8% interrupts.CPU103.LOC:Local_timer_interrupts
448996 -20.6% 356349 ± 8% interrupts.CPU104.LOC:Local_timer_interrupts
448510 -20.7% 355821 ± 8% interrupts.CPU105.LOC:Local_timer_interrupts
447226 -20.7% 354825 ± 8% interrupts.CPU106.LOC:Local_timer_interrupts
446828 -20.9% 353563 ± 8% interrupts.CPU107.LOC:Local_timer_interrupts
446384 -20.5% 355009 ± 8% interrupts.CPU108.LOC:Local_timer_interrupts
446823 -20.7% 354511 ± 8% interrupts.CPU109.LOC:Local_timer_interrupts
452246 -21.1% 356984 ± 8% interrupts.CPU11.LOC:Local_timer_interrupts
447034 -20.5% 355451 ± 8% interrupts.CPU110.LOC:Local_timer_interrupts
447783 -20.9% 354154 ± 8% interrupts.CPU111.LOC:Local_timer_interrupts
446657 -20.5% 355120 ± 8% interrupts.CPU112.LOC:Local_timer_interrupts
445392 -20.3% 354902 ± 8% interrupts.CPU113.LOC:Local_timer_interrupts
448354 -20.7% 355621 ± 8% interrupts.CPU114.LOC:Local_timer_interrupts
72.00 ± 84% +293.8% 283.50 ± 28% interrupts.CPU114.RES:Rescheduling_interrupts
447962 -20.8% 354707 ± 8% interrupts.CPU115.LOC:Local_timer_interrupts
447215 -20.7% 354848 ± 8% interrupts.CPU116.LOC:Local_timer_interrupts
447530 -20.8% 354411 ± 8% interrupts.CPU117.LOC:Local_timer_interrupts
448647 -20.9% 354998 ± 8% interrupts.CPU118.LOC:Local_timer_interrupts
447753 -20.6% 355316 ± 8% interrupts.CPU119.LOC:Local_timer_interrupts
554.75 ± 65% +547.1% 3590 ± 96% interrupts.CPU12.32:IR-PCI-MSI.2097155-edge.eth0-TxRx-2
451965 -20.9% 357310 ± 8% interrupts.CPU12.LOC:Local_timer_interrupts
448234 -20.7% 355312 ± 8% interrupts.CPU120.LOC:Local_timer_interrupts
447794 -20.7% 354971 ± 8% interrupts.CPU121.LOC:Local_timer_interrupts
448691 -20.4% 357243 ± 8% interrupts.CPU122.LOC:Local_timer_interrupts
5632 ± 32% -33.5% 3745 interrupts.CPU122.NMI:Non-maskable_interrupts
5632 ± 32% -33.5% 3745 interrupts.CPU122.PMI:Performance_monitoring_interrupts
447313 -20.2% 356771 ± 9% interrupts.CPU123.LOC:Local_timer_interrupts
91.50 ± 57% -59.0% 37.50 ±128% interrupts.CPU123.RES:Rescheduling_interrupts
448249 -20.3% 357447 ± 8% interrupts.CPU124.LOC:Local_timer_interrupts
448297 -20.6% 355804 ± 8% interrupts.CPU125.LOC:Local_timer_interrupts
449032 -21.0% 354647 ± 8% interrupts.CPU126.LOC:Local_timer_interrupts
447604 -20.8% 354651 ± 8% interrupts.CPU127.LOC:Local_timer_interrupts
447392 -20.4% 356084 ± 8% interrupts.CPU128.LOC:Local_timer_interrupts
169.25 ± 48% +405.2% 855.00 ±119% interrupts.CPU128.RES:Rescheduling_interrupts
447344 -20.7% 354766 ± 8% interrupts.CPU129.LOC:Local_timer_interrupts
6632 ± 24% -29.3% 4686 ± 33% interrupts.CPU129.NMI:Non-maskable_interrupts
6632 ± 24% -29.3% 4686 ± 33% interrupts.CPU129.PMI:Performance_monitoring_interrupts
452316 -21.2% 356567 ± 8% interrupts.CPU13.LOC:Local_timer_interrupts
449103 -20.4% 357697 ± 8% interrupts.CPU130.LOC:Local_timer_interrupts
449086 -20.5% 357205 ± 8% interrupts.CPU131.LOC:Local_timer_interrupts
450238 -21.0% 355846 ± 8% interrupts.CPU132.LOC:Local_timer_interrupts
447135 -20.8% 354142 ± 8% interrupts.CPU133.LOC:Local_timer_interrupts
446450 -20.1% 356602 ± 8% interrupts.CPU134.LOC:Local_timer_interrupts
449356 -20.5% 357068 ± 9% interrupts.CPU135.LOC:Local_timer_interrupts
447938 -20.5% 356253 ± 8% interrupts.CPU136.LOC:Local_timer_interrupts
447648 -20.6% 355408 ± 8% interrupts.CPU137.LOC:Local_timer_interrupts
3763 +74.3% 6559 ± 24% interrupts.CPU137.NMI:Non-maskable_interrupts
3763 +74.3% 6559 ± 24% interrupts.CPU137.PMI:Performance_monitoring_interrupts
446747 -20.3% 355856 ± 8% interrupts.CPU138.LOC:Local_timer_interrupts
447237 -20.8% 354392 ± 8% interrupts.CPU139.LOC:Local_timer_interrupts
452138 -21.0% 357163 ± 8% interrupts.CPU14.LOC:Local_timer_interrupts
448154 -20.5% 356451 ± 8% interrupts.CPU140.LOC:Local_timer_interrupts
176.25 ± 53% +131.8% 408.50 ± 38% interrupts.CPU140.RES:Rescheduling_interrupts
447880 -20.6% 355665 ± 8% interrupts.CPU141.LOC:Local_timer_interrupts
447680 -20.7% 354855 ± 8% interrupts.CPU142.LOC:Local_timer_interrupts
446659 -20.6% 354589 ± 8% interrupts.CPU143.LOC:Local_timer_interrupts
445820 -20.5% 354379 ± 8% interrupts.CPU144.LOC:Local_timer_interrupts
6541 ± 24% -42.8% 3738 interrupts.CPU144.NMI:Non-maskable_interrupts
6541 ± 24% -42.8% 3738 interrupts.CPU144.PMI:Performance_monitoring_interrupts
447253 -20.6% 355128 ± 8% interrupts.CPU145.LOC:Local_timer_interrupts
447763 -20.8% 354763 ± 8% interrupts.CPU146.LOC:Local_timer_interrupts
447772 -20.7% 354890 ± 8% interrupts.CPU147.LOC:Local_timer_interrupts
446939 -20.7% 354440 ± 8% interrupts.CPU148.LOC:Local_timer_interrupts
447464 -20.7% 354793 ± 8% interrupts.CPU149.LOC:Local_timer_interrupts
452738 -21.2% 356559 ± 8% interrupts.CPU15.LOC:Local_timer_interrupts
446799 -20.6% 354599 ± 8% interrupts.CPU150.LOC:Local_timer_interrupts
447879 -20.9% 354287 ± 8% interrupts.CPU151.LOC:Local_timer_interrupts
448426 -20.6% 355888 ± 8% interrupts.CPU152.LOC:Local_timer_interrupts
449366 -20.7% 356542 ± 8% interrupts.CPU153.LOC:Local_timer_interrupts
447867 -20.7% 355067 ± 8% interrupts.CPU154.LOC:Local_timer_interrupts
447719 -20.7% 355013 ± 8% interrupts.CPU155.LOC:Local_timer_interrupts
446511 -20.3% 355972 ± 8% interrupts.CPU156.LOC:Local_timer_interrupts
449192 -20.9% 355497 ± 8% interrupts.CPU157.LOC:Local_timer_interrupts
446758 -20.5% 355083 ± 8% interrupts.CPU158.LOC:Local_timer_interrupts
447084 -20.6% 355107 ± 8% interrupts.CPU159.LOC:Local_timer_interrupts
452662 -21.1% 357106 ± 8% interrupts.CPU16.LOC:Local_timer_interrupts
447831 -20.8% 354684 ± 8% interrupts.CPU160.LOC:Local_timer_interrupts
109.50 ± 15% -37.2% 68.75 ± 22% interrupts.CPU160.RES:Rescheduling_interrupts
447477 -20.6% 355188 ± 8% interrupts.CPU161.LOC:Local_timer_interrupts
448495 -20.8% 355285 ± 8% interrupts.CPU162.LOC:Local_timer_interrupts
449288 -20.9% 355477 ± 8% interrupts.CPU163.LOC:Local_timer_interrupts
447985 -20.6% 355568 ± 8% interrupts.CPU164.LOC:Local_timer_interrupts
446601 -20.5% 355191 ± 8% interrupts.CPU165.LOC:Local_timer_interrupts
448135 -20.6% 355818 ± 8% interrupts.CPU166.LOC:Local_timer_interrupts
448461 -20.6% 356227 ± 8% interrupts.CPU167.LOC:Local_timer_interrupts
447391 -20.7% 354733 ± 8% interrupts.CPU168.LOC:Local_timer_interrupts
446612 -20.5% 354909 ± 8% interrupts.CPU169.LOC:Local_timer_interrupts
452748 -21.3% 356314 ± 8% interrupts.CPU17.LOC:Local_timer_interrupts
447238 -20.4% 355830 ± 8% interrupts.CPU170.LOC:Local_timer_interrupts
448308 -20.9% 354608 ± 8% interrupts.CPU171.LOC:Local_timer_interrupts
449043 -21.2% 354035 ± 8% interrupts.CPU172.LOC:Local_timer_interrupts
449395 -20.9% 355374 ± 9% interrupts.CPU173.LOC:Local_timer_interrupts
5700 ± 32% -17.8% 4687 ± 33% interrupts.CPU173.NMI:Non-maskable_interrupts
5700 ± 32% -17.8% 4687 ± 33% interrupts.CPU173.PMI:Performance_monitoring_interrupts
446781 -20.4% 355457 ± 8% interrupts.CPU174.LOC:Local_timer_interrupts
446541 -20.7% 354087 ± 8% interrupts.CPU175.LOC:Local_timer_interrupts
447728 -20.6% 355499 ± 8% interrupts.CPU176.LOC:Local_timer_interrupts
447740 -20.5% 355885 ± 8% interrupts.CPU177.LOC:Local_timer_interrupts
6623 ± 24% -43.3% 3758 interrupts.CPU177.NMI:Non-maskable_interrupts
6623 ± 24% -43.3% 3758 interrupts.CPU177.PMI:Performance_monitoring_interrupts
447747 -20.7% 355148 ± 8% interrupts.CPU178.LOC:Local_timer_interrupts
447285 -20.6% 354994 ± 8% interrupts.CPU179.LOC:Local_timer_interrupts
450911 -21.1% 355809 ± 8% interrupts.CPU18.LOC:Local_timer_interrupts
447180 -20.7% 354485 ± 8% interrupts.CPU180.LOC:Local_timer_interrupts
447702 -20.7% 354973 ± 8% interrupts.CPU181.LOC:Local_timer_interrupts
447897 -20.7% 355132 ± 8% interrupts.CPU182.LOC:Local_timer_interrupts
449321 -21.0% 355184 ± 8% interrupts.CPU183.LOC:Local_timer_interrupts
448357 -20.8% 354979 ± 8% interrupts.CPU184.LOC:Local_timer_interrupts
165.00 ± 61% -81.2% 31.00 ± 63% interrupts.CPU184.RES:Rescheduling_interrupts
447698 -20.4% 356305 ± 8% interrupts.CPU185.LOC:Local_timer_interrupts
446780 -20.6% 354611 ± 8% interrupts.CPU186.LOC:Local_timer_interrupts
447678 -20.6% 355625 ± 8% interrupts.CPU187.LOC:Local_timer_interrupts
447756 -20.3% 356660 ± 8% interrupts.CPU188.LOC:Local_timer_interrupts
448842 -20.5% 356728 ± 8% interrupts.CPU189.LOC:Local_timer_interrupts
452463 -21.2% 356696 ± 8% interrupts.CPU19.LOC:Local_timer_interrupts
448558 -20.6% 355985 ± 8% interrupts.CPU190.LOC:Local_timer_interrupts
6581 ± 24% -28.9% 4680 ± 34% interrupts.CPU190.NMI:Non-maskable_interrupts
6581 ± 24% -28.9% 4680 ± 34% interrupts.CPU190.PMI:Performance_monitoring_interrupts
448186 -20.7% 355605 ± 8% interrupts.CPU191.LOC:Local_timer_interrupts
447640 -20.7% 354849 ± 8% interrupts.CPU192.LOC:Local_timer_interrupts
447828 -20.5% 355818 ± 8% interrupts.CPU193.LOC:Local_timer_interrupts
449769 -20.7% 356689 ± 8% interrupts.CPU194.LOC:Local_timer_interrupts
449120 -20.4% 357570 ± 9% interrupts.CPU195.LOC:Local_timer_interrupts
5641 ± 32% -33.7% 3738 interrupts.CPU195.NMI:Non-maskable_interrupts
5641 ± 32% -33.7% 3738 interrupts.CPU195.PMI:Performance_monitoring_interrupts
448037 -20.4% 356497 ± 7% interrupts.CPU196.LOC:Local_timer_interrupts
446302 -20.4% 355172 ± 8% interrupts.CPU197.LOC:Local_timer_interrupts
451541 -21.3% 355341 ± 8% interrupts.CPU198.LOC:Local_timer_interrupts
449452 -21.1% 354503 ± 9% interrupts.CPU199.LOC:Local_timer_interrupts
450751 -20.8% 356778 ± 8% interrupts.CPU2.LOC:Local_timer_interrupts
451672 -21.2% 356011 ± 8% interrupts.CPU20.LOC:Local_timer_interrupts
447614 -20.9% 354143 ± 8% interrupts.CPU200.LOC:Local_timer_interrupts
446364 -20.6% 354456 ± 8% interrupts.CPU201.LOC:Local_timer_interrupts
447150 -20.4% 355847 ± 8% interrupts.CPU202.LOC:Local_timer_interrupts
5662 ± 32% -17.4% 4678 ± 34% interrupts.CPU202.NMI:Non-maskable_interrupts
5662 ± 32% -17.4% 4678 ± 34% interrupts.CPU202.PMI:Performance_monitoring_interrupts
447324 -20.5% 355784 ± 8% interrupts.CPU203.LOC:Local_timer_interrupts
450353 -21.1% 355551 ± 8% interrupts.CPU204.LOC:Local_timer_interrupts
449486 -21.1% 354766 ± 8% interrupts.CPU205.LOC:Local_timer_interrupts
47.50 ±118% -84.7% 7.25 ± 15% interrupts.CPU205.RES:Rescheduling_interrupts
447640 -20.7% 355083 ± 8% interrupts.CPU206.LOC:Local_timer_interrupts
447443 -20.4% 356366 ± 8% interrupts.CPU207.LOC:Local_timer_interrupts
446651 -20.5% 355032 ± 8% interrupts.CPU208.LOC:Local_timer_interrupts
446807 -20.4% 355505 ± 8% interrupts.CPU209.LOC:Local_timer_interrupts
5643 ± 32% -33.5% 3751 interrupts.CPU209.NMI:Non-maskable_interrupts
5643 ± 32% -33.5% 3751 interrupts.CPU209.PMI:Performance_monitoring_interrupts
452259 -21.0% 357396 ± 8% interrupts.CPU21.LOC:Local_timer_interrupts
447655 -20.7% 354771 ± 8% interrupts.CPU210.LOC:Local_timer_interrupts
447098 -20.7% 354473 ± 8% interrupts.CPU211.LOC:Local_timer_interrupts
446681 -20.6% 354768 ± 8% interrupts.CPU212.LOC:Local_timer_interrupts
446397 -20.4% 355254 ± 8% interrupts.CPU213.LOC:Local_timer_interrupts
7.75 ± 5% +1032.3% 87.75 ±105% interrupts.CPU213.RES:Rescheduling_interrupts
449196 -20.8% 355834 ± 8% interrupts.CPU214.LOC:Local_timer_interrupts
447889 -20.7% 355386 ± 8% interrupts.CPU215.LOC:Local_timer_interrupts
448595 -21.1% 353795 ± 8% interrupts.CPU216.LOC:Local_timer_interrupts
448232 -21.0% 354308 ± 8% interrupts.CPU217.LOC:Local_timer_interrupts
446761 -20.6% 354787 ± 8% interrupts.CPU218.LOC:Local_timer_interrupts
446095 -20.5% 354569 ± 8% interrupts.CPU219.LOC:Local_timer_interrupts
451994 -21.1% 356839 ± 8% interrupts.CPU22.LOC:Local_timer_interrupts
448837 -21.0% 354621 ± 8% interrupts.CPU220.LOC:Local_timer_interrupts
448495 -21.1% 353771 ± 8% interrupts.CPU221.LOC:Local_timer_interrupts
446731 -20.6% 354886 ± 8% interrupts.CPU222.LOC:Local_timer_interrupts
446768 -20.7% 354324 ± 8% interrupts.CPU223.LOC:Local_timer_interrupts
446684 -20.6% 354750 ± 8% interrupts.CPU224.LOC:Local_timer_interrupts
446780 -20.6% 354621 ± 8% interrupts.CPU225.LOC:Local_timer_interrupts
448071 -20.6% 355846 ± 8% interrupts.CPU226.LOC:Local_timer_interrupts
447211 -20.6% 355043 ± 8% interrupts.CPU227.LOC:Local_timer_interrupts
447185 -20.7% 354717 ± 8% interrupts.CPU228.LOC:Local_timer_interrupts
446926 -20.8% 353965 ± 8% interrupts.CPU229.LOC:Local_timer_interrupts
452110 -20.8% 358288 ± 8% interrupts.CPU23.LOC:Local_timer_interrupts
448401 -20.3% 357175 ± 7% interrupts.CPU230.LOC:Local_timer_interrupts
449244 -20.8% 355611 ± 8% interrupts.CPU231.LOC:Local_timer_interrupts
449265 -21.1% 354397 ± 8% interrupts.CPU232.LOC:Local_timer_interrupts
5647 ± 33% -33.6% 3747 interrupts.CPU232.NMI:Non-maskable_interrupts
5647 ± 33% -33.6% 3747 interrupts.CPU232.PMI:Performance_monitoring_interrupts
447186 -20.7% 354486 ± 8% interrupts.CPU233.LOC:Local_timer_interrupts
448107 -21.0% 354228 ± 8% interrupts.CPU234.LOC:Local_timer_interrupts
7537 -37.9% 4681 ± 35% interrupts.CPU234.NMI:Non-maskable_interrupts
7537 -37.9% 4681 ± 35% interrupts.CPU234.PMI:Performance_monitoring_interrupts
447040 -20.7% 354506 ± 8% interrupts.CPU235.LOC:Local_timer_interrupts
447193 -20.9% 353777 ± 8% interrupts.CPU236.LOC:Local_timer_interrupts
446268 -20.6% 354295 ± 8% interrupts.CPU237.LOC:Local_timer_interrupts
449634 -20.8% 356255 ± 8% interrupts.CPU238.LOC:Local_timer_interrupts
449337 -20.8% 355992 ± 8% interrupts.CPU239.LOC:Local_timer_interrupts
451802 -20.9% 357381 ± 8% interrupts.CPU24.LOC:Local_timer_interrupts
447287 -20.7% 354800 ± 8% interrupts.CPU240.LOC:Local_timer_interrupts
446264 -20.8% 353437 ± 8% interrupts.CPU241.LOC:Local_timer_interrupts
447521 -20.6% 355538 ± 8% interrupts.CPU242.LOC:Local_timer_interrupts
448368 -20.8% 355108 ± 8% interrupts.CPU243.LOC:Local_timer_interrupts
448794 -21.2% 353861 ± 8% interrupts.CPU244.LOC:Local_timer_interrupts
448265 -21.1% 353723 ± 8% interrupts.CPU245.LOC:Local_timer_interrupts
447404 -20.6% 355385 ± 8% interrupts.CPU246.LOC:Local_timer_interrupts
448265 -20.7% 355456 ± 8% interrupts.CPU247.LOC:Local_timer_interrupts
447316 -20.7% 354842 ± 8% interrupts.CPU248.LOC:Local_timer_interrupts
447886 -20.7% 355020 ± 8% interrupts.CPU249.LOC:Local_timer_interrupts
452147 -21.3% 355786 ± 8% interrupts.CPU25.LOC:Local_timer_interrupts
448406 -20.8% 355289 ± 8% interrupts.CPU250.LOC:Local_timer_interrupts
448566 -20.9% 354726 ± 8% interrupts.CPU251.LOC:Local_timer_interrupts
447607 -21.0% 353757 ± 8% interrupts.CPU252.LOC:Local_timer_interrupts
28.50 ± 68% +173.7% 78.00 ± 39% interrupts.CPU252.RES:Rescheduling_interrupts
448037 -20.9% 354453 ± 8% interrupts.CPU253.LOC:Local_timer_interrupts
449413 -20.8% 356025 ± 8% interrupts.CPU254.LOC:Local_timer_interrupts
449548 -20.8% 356202 ± 8% interrupts.CPU255.LOC:Local_timer_interrupts
41.00 ±110% -84.8% 6.25 ± 13% interrupts.CPU255.RES:Rescheduling_interrupts
449239 -21.0% 354887 ± 8% interrupts.CPU256.LOC:Local_timer_interrupts
447192 -20.8% 354224 ± 8% interrupts.CPU257.LOC:Local_timer_interrupts
448177 -20.7% 355300 ± 8% interrupts.CPU258.LOC:Local_timer_interrupts
448275 -20.6% 355711 ± 8% interrupts.CPU259.LOC:Local_timer_interrupts
451573 -21.3% 355400 ± 8% interrupts.CPU26.LOC:Local_timer_interrupts
449188 -20.7% 356207 ± 8% interrupts.CPU260.LOC:Local_timer_interrupts
449309 -20.8% 355852 ± 8% interrupts.CPU261.LOC:Local_timer_interrupts
447847 -20.8% 354847 ± 8% interrupts.CPU262.LOC:Local_timer_interrupts
446950 -20.5% 355229 ± 8% interrupts.CPU263.LOC:Local_timer_interrupts
6496 ± 24% -42.7% 3722 interrupts.CPU263.NMI:Non-maskable_interrupts
6496 ± 24% -42.7% 3722 interrupts.CPU263.PMI:Performance_monitoring_interrupts
449945 -20.8% 356417 ± 8% interrupts.CPU264.LOC:Local_timer_interrupts
450439 -21.0% 355899 ± 8% interrupts.CPU265.LOC:Local_timer_interrupts
450339 -20.7% 357335 ± 8% interrupts.CPU266.LOC:Local_timer_interrupts
450174 -20.2% 359229 ± 9% interrupts.CPU267.LOC:Local_timer_interrupts
451002 -20.9% 356683 ± 8% interrupts.CPU268.LOC:Local_timer_interrupts
450462 -21.1% 355569 ± 8% interrupts.CPU269.LOC:Local_timer_interrupts
452188 -21.0% 357345 ± 8% interrupts.CPU27.LOC:Local_timer_interrupts
453310 -21.7% 355167 ± 8% interrupts.CPU270.LOC:Local_timer_interrupts
451241 -21.2% 355504 ± 8% interrupts.CPU271.LOC:Local_timer_interrupts
451114 -21.3% 355154 ± 8% interrupts.CPU272.LOC:Local_timer_interrupts
450134 -21.1% 355326 ± 8% interrupts.CPU273.LOC:Local_timer_interrupts
450431 -20.9% 356194 ± 8% interrupts.CPU274.LOC:Local_timer_interrupts
19.00 ± 45% +173.7% 52.00 ± 62% interrupts.CPU274.RES:Rescheduling_interrupts
450807 -20.8% 357003 ± 8% interrupts.CPU275.LOC:Local_timer_interrupts
453075 -21.5% 355714 ± 8% interrupts.CPU276.LOC:Local_timer_interrupts
450048 -21.0% 355644 ± 8% interrupts.CPU277.LOC:Local_timer_interrupts
449561 -20.7% 356522 ± 8% interrupts.CPU278.LOC:Local_timer_interrupts
450017 -20.4% 358363 ± 9% interrupts.CPU279.LOC:Local_timer_interrupts
451493 -21.2% 355798 ± 8% interrupts.CPU28.LOC:Local_timer_interrupts
160.00 ± 5% +51.4% 242.25 ± 9% interrupts.CPU28.RES:Rescheduling_interrupts
450158 -21.0% 355626 ± 8% interrupts.CPU280.LOC:Local_timer_interrupts
450175 -21.0% 355780 ± 8% interrupts.CPU281.LOC:Local_timer_interrupts
450065 -20.9% 356051 ± 8% interrupts.CPU282.LOC:Local_timer_interrupts
447777 -20.5% 355761 ± 8% interrupts.CPU283.LOC:Local_timer_interrupts
450110 -20.9% 356052 ± 8% interrupts.CPU284.LOC:Local_timer_interrupts
6524 ± 24% -28.6% 4657 ± 34% interrupts.CPU284.NMI:Non-maskable_interrupts
6524 ± 24% -28.6% 4657 ± 34% interrupts.CPU284.PMI:Performance_monitoring_interrupts
449420 -20.9% 355525 ± 8% interrupts.CPU285.LOC:Local_timer_interrupts
449800 -20.7% 356713 ± 8% interrupts.CPU286.LOC:Local_timer_interrupts
459915 -20.2% 366810 ± 9% interrupts.CPU287.LOC:Local_timer_interrupts
451800 -21.2% 355897 ± 8% interrupts.CPU29.LOC:Local_timer_interrupts
451158 -21.0% 356357 ± 8% interrupts.CPU3.LOC:Local_timer_interrupts
452347 -21.3% 355924 ± 8% interrupts.CPU30.LOC:Local_timer_interrupts
3777 +98.4% 7492 interrupts.CPU30.NMI:Non-maskable_interrupts
3777 +98.4% 7492 interrupts.CPU30.PMI:Performance_monitoring_interrupts
128.75 ± 31% +86.2% 239.75 ± 10% interrupts.CPU30.RES:Rescheduling_interrupts
451634 -20.8% 357831 ± 8% interrupts.CPU31.LOC:Local_timer_interrupts
452787 -20.9% 358017 ± 8% interrupts.CPU32.LOC:Local_timer_interrupts
70.50 ± 38% +279.8% 267.75 ± 54% interrupts.CPU32.RES:Rescheduling_interrupts
451923 -20.9% 357380 ± 8% interrupts.CPU33.LOC:Local_timer_interrupts
452722 -21.4% 355941 ± 8% interrupts.CPU34.LOC:Local_timer_interrupts
448603 -20.4% 357004 ± 8% interrupts.CPU35.LOC:Local_timer_interrupts
452210 -21.3% 355998 ± 8% interrupts.CPU36.LOC:Local_timer_interrupts
451882 -21.3% 355544 ± 8% interrupts.CPU37.LOC:Local_timer_interrupts
453383 -21.4% 356383 ± 8% interrupts.CPU38.LOC:Local_timer_interrupts
452809 -21.0% 357598 ± 8% interrupts.CPU39.LOC:Local_timer_interrupts
452315 -21.0% 357529 ± 8% interrupts.CPU4.LOC:Local_timer_interrupts
450608 -21.2% 354930 ± 8% interrupts.CPU40.LOC:Local_timer_interrupts
450081 -20.9% 355990 ± 8% interrupts.CPU41.LOC:Local_timer_interrupts
450043 -20.8% 356507 ± 8% interrupts.CPU42.LOC:Local_timer_interrupts
22.75 ± 48% +236.3% 76.50 ± 40% interrupts.CPU42.RES:Rescheduling_interrupts
450140 -20.8% 356378 ± 8% interrupts.CPU43.LOC:Local_timer_interrupts
449695 -20.7% 356421 ± 8% interrupts.CPU44.LOC:Local_timer_interrupts
18.50 ± 43% +625.7% 134.25 ± 80% interrupts.CPU44.RES:Rescheduling_interrupts
450168 -20.8% 356636 ± 8% interrupts.CPU45.LOC:Local_timer_interrupts
451761 -21.3% 355664 ± 8% interrupts.CPU46.LOC:Local_timer_interrupts
450482 -20.8% 356781 ± 8% interrupts.CPU47.LOC:Local_timer_interrupts
53.75 ± 80% +245.6% 185.75 ± 57% interrupts.CPU47.RES:Rescheduling_interrupts
450849 -21.0% 356318 ± 8% interrupts.CPU48.LOC:Local_timer_interrupts
449530 -20.8% 356024 ± 8% interrupts.CPU49.LOC:Local_timer_interrupts
452185 -21.2% 356424 ± 8% interrupts.CPU5.LOC:Local_timer_interrupts
452374 -21.2% 356685 ± 8% interrupts.CPU50.LOC:Local_timer_interrupts
451158 -20.7% 357779 ± 9% interrupts.CPU51.LOC:Local_timer_interrupts
451039 -20.6% 357966 ± 8% interrupts.CPU52.LOC:Local_timer_interrupts
5682 ± 32% -34.2% 3740 interrupts.CPU52.NMI:Non-maskable_interrupts
5682 ± 32% -34.2% 3740 interrupts.CPU52.PMI:Performance_monitoring_interrupts
450978 -21.0% 356243 ± 8% interrupts.CPU53.LOC:Local_timer_interrupts
452251 -21.5% 355052 ± 9% interrupts.CPU54.LOC:Local_timer_interrupts
450986 -21.2% 355254 ± 8% interrupts.CPU55.LOC:Local_timer_interrupts
450685 -21.0% 356148 ± 8% interrupts.CPU56.LOC:Local_timer_interrupts
449475 -20.9% 355563 ± 8% interrupts.CPU57.LOC:Local_timer_interrupts
451278 -20.7% 357853 ± 8% interrupts.CPU58.LOC:Local_timer_interrupts
169.75 ± 60% -63.6% 61.75 ± 99% interrupts.CPU58.RES:Rescheduling_interrupts
450102 -20.5% 357925 ± 8% interrupts.CPU59.LOC:Local_timer_interrupts
451723 -21.3% 355340 ± 8% interrupts.CPU6.LOC:Local_timer_interrupts
454798 -21.5% 356820 ± 8% interrupts.CPU60.LOC:Local_timer_interrupts
7545 -50.2% 3758 interrupts.CPU60.NMI:Non-maskable_interrupts
7545 -50.2% 3758 interrupts.CPU60.PMI:Performance_monitoring_interrupts
451436 -21.3% 355493 ± 8% interrupts.CPU61.LOC:Local_timer_interrupts
450910 -20.8% 357077 ± 8% interrupts.CPU62.LOC:Local_timer_interrupts
450686 -20.3% 359334 ± 9% interrupts.CPU63.LOC:Local_timer_interrupts
450216 -20.8% 356564 ± 8% interrupts.CPU64.LOC:Local_timer_interrupts
449965 -21.1% 354963 ± 8% interrupts.CPU65.LOC:Local_timer_interrupts
450709 -21.0% 355952 ± 8% interrupts.CPU66.LOC:Local_timer_interrupts
450160 -21.0% 355775 ± 8% interrupts.CPU67.LOC:Local_timer_interrupts
450297 -21.1% 355375 ± 8% interrupts.CPU68.LOC:Local_timer_interrupts
449937 -20.9% 356013 ± 8% interrupts.CPU69.LOC:Local_timer_interrupts
5674 ± 32% -17.2% 4700 ± 34% interrupts.CPU69.NMI:Non-maskable_interrupts
5674 ± 32% -17.2% 4700 ± 34% interrupts.CPU69.PMI:Performance_monitoring_interrupts
451977 -21.0% 356885 ± 8% interrupts.CPU7.LOC:Local_timer_interrupts
449651 -20.9% 355504 ± 8% interrupts.CPU70.LOC:Local_timer_interrupts
451073 -20.9% 356589 ± 8% interrupts.CPU71.LOC:Local_timer_interrupts
447324 -20.8% 354292 ± 8% interrupts.CPU72.LOC:Local_timer_interrupts
6588 ± 24% -43.2% 3740 interrupts.CPU72.NMI:Non-maskable_interrupts
6588 ± 24% -43.2% 3740 interrupts.CPU72.PMI:Performance_monitoring_interrupts
240.50 ± 7% -36.0% 154.00 ± 50% interrupts.CPU72.RES:Rescheduling_interrupts
447170 -20.7% 354707 ± 8% interrupts.CPU73.LOC:Local_timer_interrupts
448611 -20.8% 355143 ± 8% interrupts.CPU74.LOC:Local_timer_interrupts
447417 -20.8% 354345 ± 8% interrupts.CPU75.LOC:Local_timer_interrupts
448190 -20.8% 355035 ± 8% interrupts.CPU76.LOC:Local_timer_interrupts
448346 -20.9% 354442 ± 8% interrupts.CPU77.LOC:Local_timer_interrupts
6594 ± 24% -43.4% 3735 interrupts.CPU77.NMI:Non-maskable_interrupts
6594 ± 24% -43.4% 3735 interrupts.CPU77.PMI:Performance_monitoring_interrupts
448452 -20.6% 355866 ± 8% interrupts.CPU78.LOC:Local_timer_interrupts
447509 -20.7% 354660 ± 8% interrupts.CPU79.LOC:Local_timer_interrupts
451912 -20.9% 357687 ± 8% interrupts.CPU8.LOC:Local_timer_interrupts
449808 -20.9% 355596 ± 8% interrupts.CPU80.LOC:Local_timer_interrupts
449111 -21.0% 354994 ± 8% interrupts.CPU81.LOC:Local_timer_interrupts
6642 ± 24% -29.3% 4699 ± 34% interrupts.CPU81.NMI:Non-maskable_interrupts
6642 ± 24% -29.3% 4699 ± 34% interrupts.CPU81.PMI:Performance_monitoring_interrupts
447536 -20.8% 354299 ± 8% interrupts.CPU82.LOC:Local_timer_interrupts
448241 -21.0% 354323 ± 8% interrupts.CPU83.LOC:Local_timer_interrupts
447498 -20.7% 355042 ± 8% interrupts.CPU84.LOC:Local_timer_interrupts
484.50 ± 82% -80.6% 94.00 ± 47% interrupts.CPU84.RES:Rescheduling_interrupts
446794 -20.5% 355249 ± 8% interrupts.CPU85.LOC:Local_timer_interrupts
448191 -20.9% 354594 ± 8% interrupts.CPU86.LOC:Local_timer_interrupts
447635 -20.9% 353877 ± 8% interrupts.CPU87.LOC:Local_timer_interrupts
448706 -20.8% 355250 ± 8% interrupts.CPU88.LOC:Local_timer_interrupts
3738 +50.5% 5627 ± 33% interrupts.CPU88.NMI:Non-maskable_interrupts
3738 +50.5% 5627 ± 33% interrupts.CPU88.PMI:Performance_monitoring_interrupts
448312 -20.9% 354709 ± 8% interrupts.CPU89.LOC:Local_timer_interrupts
452329 -21.0% 357333 ± 8% interrupts.CPU9.LOC:Local_timer_interrupts
449864 -21.1% 354745 ± 8% interrupts.CPU90.LOC:Local_timer_interrupts
451193 -21.5% 353997 ± 8% interrupts.CPU91.LOC:Local_timer_interrupts
447166 -20.4% 356159 ± 8% interrupts.CPU92.LOC:Local_timer_interrupts
447364 -20.7% 354752 ± 8% interrupts.CPU93.LOC:Local_timer_interrupts
449764 -21.0% 355410 ± 8% interrupts.CPU94.LOC:Local_timer_interrupts
448393 -20.7% 355788 ± 8% interrupts.CPU95.LOC:Local_timer_interrupts
450191 -21.1% 355404 ± 8% interrupts.CPU96.LOC:Local_timer_interrupts
447651 -20.6% 355343 ± 8% interrupts.CPU97.LOC:Local_timer_interrupts
447679 -20.8% 354531 ± 8% interrupts.CPU98.LOC:Local_timer_interrupts
447448 -20.9% 353795 ± 8% interrupts.CPU99.LOC:Local_timer_interrupts
1.293e+08 -20.8% 1.024e+08 ± 8% interrupts.LOC:Local_timer_interrupts



will-it-scale.per_process_ops

680 +---------------------------------------------------------------------+
| ++.++. .+ .++. + +.++.+ +.++.+.++. : ++.+.+ ++.++.+.++.++.|
660 |-+ + + +.+ ++.+.+ |
640 |-+ |
| |
620 |-+ |
| |
600 |-+ |
| O O OO O O |
580 |-+O OO O O O OO |
560 |-O O O |
| O O |
540 |-+ OO O O O OO O OO O |
| O |
520 +---------------------------------------------------------------------+


will-it-scale.workload

200000 +------------------------------------------------------------------+
195000 |.+ .+ +. +. .+ |
| ++.++. +. +.+ + +.++.+ ++.+.++.+ : ++.++ +.++.++.++.++.|
190000 |-+ + + +.+ +.++.+ |
185000 |-+ |
| |
180000 |-+ |
175000 |-+ |
170000 |-+ O O O |
| O O O O O O |
165000 |-+ O O O O OO O |
160000 |-O |
| O O OO OO OO OO O |
155000 |-+ O OO |
150000 +------------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


Attachments:
config-5.7.0-rc1-00011-g1de08dccd3834 (209.48 kB)
job-script (7.49 kB)
job.yaml (5.17 kB)
reproduce (322.00 B)

2020-04-25 13:05:38

by Borislav Petkov

Subject: Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

On Sat, Apr 25, 2020 at 07:44:14PM +0800, kernel test robot wrote:
> Greetings,
>
> FYI, we noticed a -14.1% regression of will-it-scale.per_process_ops due to commit:
>
>
> commit: 1de08dccd383482a3e88845d3554094d338f5ff9 ("x86/mce: Add a struct mce.kflags field")

I don't see how a struct mce member addition will cause any performance
regression. Please check your test case.

Thx.

--
Regards/Gruss,
Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

2020-08-18 08:33:45

by Feng Tang

Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

Hi Borislav,

On Sat, Apr 25, 2020 at 03:01:36PM +0200, Borislav Petkov wrote:
> On Sat, Apr 25, 2020 at 07:44:14PM +0800, kernel test robot wrote:
> > Greetings,
> >
> > FYI, we noticed a -14.1% regression of will-it-scale.per_process_ops due to commit:
> >
> >
> > commit: 1de08dccd383482a3e88845d3554094d338f5ff9 ("x86/mce: Add a struct mce.kflags field")
>
> I don't see how a struct mce member addition will cause any performance
> regression. Please check your test case.

Sorry for the late response.

We've done more rounds of testing, and the results are consistent.

We suspect the commit changes the data alignment of kernel objects
outside mce, which causes the performance change in this malloc
microbenchmark.

Without the patch, the size of 'struct mce' is 120 bytes; it grows to
128 bytes after adding the '__u64 kflags'.

And we also debugged further:

* with "mce=off" added to the kernel cmdline, the performance change
persists.

* with 'kflags' changed from __u64 to __u32 (the size of mce goes back
to 120 bytes), the performance change is gone.

* with only the '__u64 kflags' line commented out, the performance
change is gone.
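
As a quick illustration of the size change (a toy userspace stand-in,
not the real struct mce layout; the field names here are made up):

#include <stdio.h>
#include <stdint.h>

/* Toy stand-ins, NOT the real struct mce: just enough 64-bit members
 * to leave a 4-byte tail hole after the final __u32, mirroring the
 * 120- vs. 128-byte sizes discussed above. */
struct mce_old { uint64_t q[14]; uint32_t last; };
struct mce_u32 { uint64_t q[14]; uint32_t last; uint32_t kflags; };
struct mce_u64 { uint64_t q[14]; uint32_t last; uint64_t kflags; };

int main(void)
{
	printf("old: %zu, u32 kflags: %zu, u64 kflags: %zu\n",
	       sizeof(struct mce_old),   /* 112 + 4, padded to 120 */
	       sizeof(struct mce_u32),   /* fills the hole: 120    */
	       sizeof(struct mce_u64));  /* 4 pad bytes + 8: 128   */
	return 0;
}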

We also tried the perf c2c tool to capture some data, but the platform
is a Xeon Phi, which doesn't support it. Capturing the raw HITM event
also did not provide useful data.

0day has reported quite a few strange performance bumps like this:
https://lore.kernel.org/lkml/20200205123216.GO12867@shao2-debian/
https://lore.kernel.org/lkml/20200114085637.GA29297@shao2-debian/
https://lore.kernel.org/lkml/20200330011254.GA14393@feng-iot/
For some of them, the bump goes away if we hack the build to force all
kernel functions to be aligned, but that doesn't work for this case.

So together with the debugging above, we think this is a performance
bump caused by a data alignment change.

Thanks,
Feng

> Thx.

2020-08-18 20:08:22

by Luck, Tony

Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

On Tue, Aug 18, 2020 at 04:29:43PM +0800, Feng Tang wrote:
> Hi Borislav,
>
> On Sat, Apr 25, 2020 at 03:01:36PM +0200, Borislav Petkov wrote:
> > On Sat, Apr 25, 2020 at 07:44:14PM +0800, kernel test robot wrote:
> > > Greetings,
> > >
> > > FYI, we noticed a -14.1% regression of will-it-scale.per_process_ops due to commit:
> > >
> > >
> > > commit: 1de08dccd383482a3e88845d3554094d338f5ff9 ("x86/mce: Add a struct mce.kflags field")
> >
> > I don't see how a struct mce member addition will cause any performance
> > regression. Please check your test case.
>
> Sorry for the late response.
>
> We've done more rounds of testing, and the results are consistent.
>
> We suspect the commit changes the data alignment of kernel objects
> outside mce, which causes the performance change in this malloc
> microbenchmark.
>
> Without the patch, the size of 'struct mce' is 120 bytes; it grows to
> 128 bytes after adding the '__u64 kflags'.
>
> And we also debugged further:
>
> * with "mce=off" added to the kernel cmdline, the performance change
> persists.
>
> * with 'kflags' changed from __u64 to __u32 (the size of mce goes back
> to 120 bytes), the performance change is gone.
>
> * with only the '__u64 kflags' line commented out, the performance
> change is gone.
>
> We also tried the perf c2c tool to capture some data, but the platform
> is a Xeon Phi, which doesn't support it. Capturing the raw HITM event
> also did not provide useful data.
>
> 0day has reported quite a few strange performance bumps like this:
> https://lore.kernel.org/lkml/20200205123216.GO12867@shao2-debian/
> https://lore.kernel.org/lkml/20200114085637.GA29297@shao2-debian/
> https://lore.kernel.org/lkml/20200330011254.GA14393@feng-iot/
> For some of them, the bump goes away if we hack the build to force all
> kernel functions to be aligned, but that doesn't work for this case.
>
> So together with the debugging above, we think this is a performance
> bump caused by a data alignment change.

So if this were a change to a structure in some performance-sensitive
path, I'd totally understand how it could end up with a 14% change on
some benchmark that stressed that code path.

But I doubt the kernel ever touches a "struct mce" during execution
of your benchmark (I presume your test machine isn't getting thousands
of corrected memory errors during the test :-) ).

We do have some DEFINE_PER_CPU data objects of type "struct mce":

$ git grep 'DEFINE_PER_CPU(struct mce,'
arch/x86/kernel/cpu/mce/core.c:static DEFINE_PER_CPU(struct mce, mces_seen);
arch/x86/kernel/cpu/mce/core.c:DEFINE_PER_CPU(struct mce, injectm);

Maybe making those slightly bigger has pushed some other per_cpu object
into an unfortunate alignment where some frequently used data is now
split between two cache lines instead of sitting in one?

Can you collect some perf trace data for the benchmark when running
on kernels with kflags as __u32 and __u64? (That looks to be the minimal
possible change you found that still exhibits this problem.)

We'd like to find out which kernel functions are burning extra CPU
cycles and maybe understand why.

The answer isn't to tinker with "struct mce". Other changes could trigger
this same change in alignment. Anything that is this performance-sensitive
needs to have some "__attribute__((aligned(64)))" (or whatever) to
make sure arbitrary changes elsewhere don't do this.
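
Roughly like this, in kernel code (a sketch; 'hot_struct'/'hot_var' are
made-up names, but the helpers shown are the existing kernel ones):

struct hot_struct {
	long whatever;
};

/* open-coded attribute form */
static struct hot_struct hot_var __attribute__((aligned(64)));

/* or the kernel's own spellings of the same thing (L1_CACHE_BYTES
 * is 64 on x86): */
static struct hot_struct hot_var2 ____cacheline_aligned;
static DEFINE_PER_CPU_ALIGNED(struct hot_struct, hot_pcpu);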

-Tony

2020-08-19 02:09:52

by Feng Tang

[permalink] [raw]
Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

On Tue, Aug 18, 2020 at 01:06:54PM -0700, Luck, Tony wrote:
> On Tue, Aug 18, 2020 at 04:29:43PM +0800, Feng Tang wrote:
> > Hi Borislav,
> >
> > On Sat, Apr 25, 2020 at 03:01:36PM +0200, Borislav Petkov wrote:
> > > On Sat, Apr 25, 2020 at 07:44:14PM +0800, kernel test robot wrote:
> > > > Greeting,
> > > >
> > > > FYI, we noticed a -14.1% regression of will-it-scale.per_process_ops due to commit:
> > > >
> > > >
> > > > commit: 1de08dccd383482a3e88845d3554094d338f5ff9 ("x86/mce: Add a struct mce.kflags field")
> > >
> > > I don't see how a struct mce member addition will cause any performance
> > > regression. Please check your test case.
> >
> > Sorry for the late response.
> >
> > We've done more rounds of tests, and the results are consistent.
> >
> > Our suspicion is that the commit changes the data alignment of other
> > kernel domains than MCE, which causes the performance change in this
> > malloc microbenchmark.
> >
> > Without the patch, the size of 'struct mce' is 120 bytes, while it
> > becomes 128 bytes after adding the '__u64 kflags' field.
> >
> > And we also debugged further:
> >
> > * adding "mce=off" to the kernel cmdline: the performance change persists.
> >
> > * changing 'kflags' from __u64 to __u32 (the size of struct mce goes
> > back to 120 bytes): the performance change is gone.
> >
> > * commenting out the '__u64 kflags' field entirely: the performance
> > change is gone.
> >
> > We also tried the perf c2c tool to capture some data, but the platform
> > is a Xeon Phi, which doesn't support it. Capturing raw HITM events
> > also could not provide useful data.
> >
> > 0day has reported quite a few strange performance bumps like this:
> > https://lore.kernel.org/lkml/20200205123216.GO12867@shao2-debian/
> > https://lore.kernel.org/lkml/20200114085637.GA29297@shao2-debian/
> > https://lore.kernel.org/lkml/20200330011254.GA14393@feng-iot/
> > For some of them, the bump goes away if we hack the build to force all
> > kernel functions to be aligned, but that doesn't help in this case.
> >
> > So together with the debugging above, we think this could be a
> > performance bump caused by a data alignment change.
>
> So if this was a change to a structure in some performance sensitive
> path, I'd totally understand how it could end up with a 14% change on
> some benchmark that stressed that code path.

> But I doubt the kernel ever touches a "struct mce" during execution
> of your benchmark (I presume your test machine isn't getting thousands
> of corrected memory errors during the test :-) ).

No, it isn't getting any MCE errors :) It's a Xeon Phi platform.

We've tried the "mce=off" cmdline option, and the 14% persists. So we
think MCE itself isn't the cause.

> We do have some DEFINE_PER_CPU data objects of type "struct mce":
>
> $ git grep 'DEFINE_PER_CPU(struct mce,'
> arch/x86/kernel/cpu/mce/core.c:static DEFINE_PER_CPU(struct mce, mces_seen);
> arch/x86/kernel/cpu/mce/core.c:DEFINE_PER_CPU(struct mce, injectm);
>
> Maybe making those slightly bigger has pushed some other per_cpu object
> into an unfortunate alignment where some frequently used data is now
> split between two cache lines instead of sitting in one?

Yes, I also checked the percpu data part of the kernel System.map; it
seems the change only affects the alignment of several variables, from
'mce_poll_banks' to 'tsc_adjust', with the alignment restored from
'lapic_events' on, but I can't see how any of them could be related to
this malloc microbenchmark.

old map:

0000000000018c60 d mces_seen
0000000000018ce0 D injectm
0000000000018d58 D mce_poll_banks
0000000000018d60 D mce_poll_count
0000000000018d64 D mce_exception_count
0000000000018d68 D mce_device
0000000000018d70 d cmci_storm_state
0000000000018d74 d cmci_storm_cnt
0000000000018d78 d cmci_time_stamp
0000000000018d80 d cmci_backoff_cnt
0000000000018d88 d mce_banks_owned
0000000000018d90 d smca_misc_banks_map
0000000000018d94 d bank_map
0000000000018d98 d threshold_banks
0000000000018da0 d thermal_state
0000000000019260 D pqr_state
0000000000019270 d arch_prev_mperf
0000000000019278 d arch_prev_aperf
0000000000019280 D arch_freq_scale
00000000000192a0 d tsc_adjust
00000000000192c0 d lapic_events

new map:

0000000000018c60 d mces_seen
0000000000018ce0 D injectm
0000000000018d60 D mce_poll_banks
0000000000018d68 D mce_poll_count
0000000000018d6c D mce_exception_count
0000000000018d70 D mce_device
0000000000018d78 d cmci_storm_state
0000000000018d7c d cmci_storm_cnt
0000000000018d80 d cmci_time_stamp
0000000000018d88 d cmci_backoff_cnt
0000000000018d90 d mce_banks_owned
0000000000018d98 d smca_misc_banks_map
0000000000018d9c d bank_map
0000000000018da0 d threshold_banks
0000000000018dc0 d thermal_state
0000000000019280 D pqr_state
0000000000019290 d arch_prev_mperf
0000000000019298 d arch_prev_aperf
00000000000192a0 D arch_freq_scale
00000000000192c0 d tsc_adjust
0000000000019300 d lapic_events
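
As an aside: with 64-byte lines, two System.map symbols start in the
same cache line exactly when addr >> 6 is equal for both. A throwaway
userspace helper for checking that (illustration only):

#include <stdio.h>
#include <stdlib.h>

/* usage: ./same_line 19280 192a0   (hex addresses from System.map) */
int main(int argc, char **argv)
{
	unsigned long a, b;

	if (argc != 3)
		return 1;

	a = strtoul(argv[1], NULL, 16);
	b = strtoul(argv[2], NULL, 16);
	printf("%s\n", (a >> 6) == (b >> 6) ?
	       "same 64-byte cache line" : "different cache lines");
	return 0;
}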

> Can you collect some perf trace data for the benchmark when running
> on kernels with kflags as __u32 and __u64? (That looks to be the minimal
> possible change you found that still exhibits this problem.)
>
> We'd like to find out which kernel functions are burning extra CPU
> cycles and maybe understand why.

OK, will do that and report back. 0day has recently upgraded the gcc,
default config and rootfs, so it may take some time to reproduce this
with the old gcc/environment (and yes, gcc, kernel config and rootfs
can all affect a microbenchmark's results :))

Thanks,
Feng

> The answer isn't to tinker with "struct mce". Other changes could trigger
> this same change in alignment. Anything that is this performance-sensitive
> needs to have some "__attribute__((aligned(64)))" (or whatever) to
> make sure arbitrary changes elsewhere don't do this.
>
> -Tony

2020-08-19 03:38:26

by Luck, Tony

[permalink] [raw]
Subject: RE: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

0000000000019260 D pqr_state

Do you have /sys/fs/resctrl mounted? This variable is read on every context
switch. If your benchmark does a lot of context switching and this now
shares a cache line with something different (especially something that is
sometimes modified from another CPU), that could cause some cache line
bouncing.
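
(Quick check for that:

$ mount | grep resctrl

an empty output means it is not mounted.)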

-Tony

2020-08-21 02:45:12

by Feng Tang

[permalink] [raw]
Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

On Wed, Aug 19, 2020 at 10:04:37AM +0800, Feng Tang wrote:
> > We do have some DEFINE_PER_CPU data objects of type "struct mce":
> >
> > $ git grep 'DEFINE_PER_CPU(struct mce,'
> > arch/x86/kernel/cpu/mce/core.c:static DEFINE_PER_CPU(struct mce, mces_seen);
> > arch/x86/kernel/cpu/mce/core.c:DEFINE_PER_CPU(struct mce, injectm);
> >
> > Maybe making those slightly bigger has pushed some other per_cpu object
> > into an unfortunate alignment where some frequently used data is now
> > split between two cache lines instead of sitting in one?
>
> Yes, I also checked the percpu data part of the kernel System.map; it
> seems the change only affects the alignment of several variables, from
> 'mce_poll_banks' to 'tsc_adjust', with the alignment restored from
> 'lapic_events' on, but I can't see how any of them could be related to
> this malloc microbenchmark.
>
> old map:
>
> 0000000000018c60 d mces_seen
> 0000000000018ce0 D injectm
> 0000000000018d58 D mce_poll_banks
> 0000000000018d60 D mce_poll_count
> 0000000000018d64 D mce_exception_count
> 0000000000018d68 D mce_device
> 0000000000018d70 d cmci_storm_state
> 0000000000018d74 d cmci_storm_cnt
> 0000000000018d78 d cmci_time_stamp
> 0000000000018d80 d cmci_backoff_cnt
> 0000000000018d88 d mce_banks_owned
> 0000000000018d90 d smca_misc_banks_map
> 0000000000018d94 d bank_map
> 0000000000018d98 d threshold_banks
> 0000000000018da0 d thermal_state
> 0000000000019260 D pqr_state
> 0000000000019270 d arch_prev_mperf
> 0000000000019278 d arch_prev_aperf
> 0000000000019280 D arch_freq_scale
> 00000000000192a0 d tsc_adjust
> 00000000000192c0 d lapic_events
>
> new map:
>
> 0000000000018c60 d mces_seen
> 0000000000018ce0 D injectm
> 0000000000018d60 D mce_poll_banks
> 0000000000018d68 D mce_poll_count
> 0000000000018d6c D mce_exception_count
> 0000000000018d70 D mce_device
> 0000000000018d78 d cmci_storm_state
> 0000000000018d7c d cmci_storm_cnt
> 0000000000018d80 d cmci_time_stamp
> 0000000000018d88 d cmci_backoff_cnt
> 0000000000018d90 d mce_banks_owned
> 0000000000018d98 d smca_misc_banks_map
> 0000000000018d9c d bank_map
> 0000000000018da0 d threshold_banks
> 0000000000018dc0 d thermal_state
> 0000000000019280 D pqr_state
> 0000000000019290 d arch_prev_mperf
> 0000000000019298 d arch_prev_aperf
> 00000000000192a0 D arch_freq_scale
> 00000000000192c0 d tsc_adjust
> 0000000000019300 d lapic_events
>
> > Can you collect some perf trace data for the benchmark when running
> > on kernels with kflags as __u32 and __u64? (That looks to be the minimal
> > possible change you found that still exhibits this problem.)
> >
> > We'd like to find out which kernel functions are burning extra CPU
> > cycles and maybe understand why.

I could only find the old kernels for the raw tip/ras/core branch, which
reproduce this regression:

1de08dccd383 x86/mce: Add a struct mce.kflags field
9554bfe403bd x86/mce: Convert the CEC to use the MCE notifier

And the strange thing is that after switching to gcc9 and a debian10
rootfs, with the same commits the regression turns into an improvement,
though the trend holds: if we change kflags from __u64 to __u32, there
is no performance change.

Following is the comparison of the regression; I also attached the
perf-profiles for the old and new commits (let me know if you need
more data).


9554bfe403bdfc08 1de08dccd383482a3e88845d355
---------------- ---------------------------
%stddev %change %stddev
\ | \
192362 -15.1% 163343 will-it-scale.287.processes
0.91 +0.2% 0.92 will-it-scale.287.processes_idle
669.67 -15.1% 568.50 will-it-scale.per_process_ops
309.97 +0.2% 310.74 will-it-scale.time.elapsed_time
309.97 +0.2% 310.74 will-it-scale.time.elapsed_time.max
0.67 ±141% +200.0% 2.00 ± 50% will-it-scale.time.involuntary_context_switches
9921 +0.8% 10004 will-it-scale.time.maximum_resident_set_size
6110 +0.3% 6130 will-it-scale.time.minor_page_faults
4096 +0.0% 4096 will-it-scale.time.page_size
0.18 ± 2% +1.9% 0.18 ± 5% will-it-scale.time.system_time
0.25 ± 3% +0.0% 0.25 ± 4% will-it-scale.time.user_time
73.00 +12.3% 82.00 ± 3% will-it-scale.time.voluntary_context_switches
192362 -15.1% 163343 will-it-scale.workload
366.22 +0.3% 367.20 uptime.boot
15417 ± 4% +0.8% 15533 uptime.idle
1.347e+09 ± 2% -1.9% 1.321e+09 cpuidle.C1.time
2623112 ± 7% +5.7% 2773573 cpuidle.C1.usage
532385 ± 70% -98.7% 7012 ± 13% cpuidle.POLL.time
11803 ± 72% -96.7% 392.50 ± 13% cpuidle.POLL.usage
1.44 ± 4% +0.1 1.52 mpstat.cpu.all.idle%
0.00 ± 41% +0.0 0.00 ± 19% mpstat.cpu.all.soft%
98.01 -0.0 97.98 mpstat.cpu.all.sys%
0.55 ± 3% -0.1 0.50 mpstat.cpu.all.usr%
0.00 -100.0% 0.00 numa-numastat.node0.interleave_hit
1.2e+08 -14.5% 1.026e+08 numa-numastat.node0.local_node
1.2e+08 -14.5% 1.026e+08 numa-numastat.node0.numa_hit
0.00 -100.0% 0.00 numa-numastat.node0.other_node
0.00 -100.0% 0.00 numa-numastat.node1.interleave_hit
0.00 -100.0% 0.00 numa-numastat.node1.local_node
0.00 -100.0% 0.00 numa-numastat.node1.numa_hit
0.00 -100.0% 0.00 numa-numastat.node1.other_node
309.97 +0.2% 310.74 time.elapsed_time
309.97 +0.2% 310.74 time.elapsed_time.max
0.67 ±141% +200.0% 2.00 ± 50% time.involuntary_context_switches
9921 +0.8% 10004 time.maximum_resident_set_size
6110 +0.3% 6130 time.minor_page_faults
4096 +0.0% 4096 time.page_size
0.18 ± 2% +1.9% 0.18 ± 5% time.system_time
0.25 ± 3% +0.0% 0.25 ± 4% time.user_time
73.00 +12.3% 82.00 ± 3% time.voluntary_context_switches
1.00 +50.0% 1.50 ± 33% vmstat.cpu.id
97.00 +0.0% 97.00 vmstat.cpu.sy
0.00 -100.0% 0.00 vmstat.cpu.us
0.00 -100.0% 0.00 vmstat.io.bi
4.00 +0.0% 4.00 vmstat.memory.buff
1574390 -0.1% 1573361 vmstat.memory.cache
79173849 +0.0% 79177727 vmstat.memory.free
282.33 -0.1% 282.00 vmstat.procs.r
2760 -0.4% 2749 vmstat.system.cs
364380 ± 12% -1.4% 359417 vmstat.system.in
10.07 -8.7% 9.20 perf-stat.i.MPKI
1.005e+10 +1.6% 1.022e+10 perf-stat.i.branch-instructions
1.30 -0.1 1.16 perf-stat.i.branch-miss-rate%
1.26e+08 -9.7% 1.138e+08 perf-stat.i.branch-misses
13.62 +0.2 13.86 perf-stat.i.cache-miss-rate%
55442235 ± 2% -6.1% 52078517 perf-stat.i.cache-misses
4.077e+08 ± 2% -7.6% 3.766e+08 perf-stat.i.cache-references
2747 -0.3% 2739 perf-stat.i.context-switches
10.85 -1.4% 10.70 perf-stat.i.cpi
288378 +0.1% 288596 perf-stat.i.cpu-clock
4.467e+11 -0.0% 4.465e+11 perf-stat.i.cpu-cycles
267.78 +0.2% 268.24 perf-stat.i.cpu-migrations
8033 ± 2% +6.4% 8547 perf-stat.i.cycles-between-cache-misses
0.18 -0.0 0.16 perf-stat.i.iTLB-load-miss-rate%
68968473 -11.4% 61127131 perf-stat.i.iTLB-load-misses
4.114e+10 +1.4% 4.172e+10 perf-stat.i.iTLB-loads
4.109e+10 +1.4% 4.167e+10 perf-stat.i.instructions
598.48 +14.8% 687.20 perf-stat.i.instructions-per-iTLB-miss
0.09 +1.4% 0.09 perf-stat.i.ipc
1.55 -0.1% 1.55 perf-stat.i.metric.GHz
1.35 -15.1% 1.15 perf-stat.i.metric.K/sec
178.94 +1.3% 181.27 perf-stat.i.metric.M/sec
195779 -14.8% 166863 perf-stat.i.minor-faults
195779 -14.8% 166863 perf-stat.i.page-faults
288378 +0.1% 288596 perf-stat.i.task-clock
9.92 -8.9% 9.04 perf-stat.overall.MPKI
1.25 -0.1 1.11 perf-stat.overall.branch-miss-rate%
13.66 +0.2 13.89 perf-stat.overall.cache-miss-rate%
10.87 -1.5% 10.71 perf-stat.overall.cpi
8026 ± 2% +6.3% 8534 perf-stat.overall.cycles-between-cache-misses
0.17 -0.0 0.15 perf-stat.overall.iTLB-load-miss-rate%
596.26 +14.2% 681.07 perf-stat.overall.instructions-per-iTLB-miss
0.09 +1.5% 0.09 perf-stat.overall.ipc
65896092 +19.8% 78932072 perf-stat.overall.path-length
1.002e+10 +1.6% 1.018e+10 perf-stat.ps.branch-instructions
1.254e+08 -9.5% 1.134e+08 perf-stat.ps.branch-misses
55492415 ± 2% -6.1% 52128690 perf-stat.ps.cache-misses
4.062e+08 ± 2% -7.6% 3.754e+08 perf-stat.ps.cache-references
2689 -0.4% 2677 perf-stat.ps.context-switches
286795 +0.0% 286862 perf-stat.ps.cpu-clock
4.452e+11 -0.1% 4.449e+11 perf-stat.ps.cpu-cycles
253.81 +0.2% 254.32 perf-stat.ps.cpu-migrations
68711344 -11.3% 60977219 perf-stat.ps.iTLB-load-misses
4.098e+10 +1.4% 4.156e+10 perf-stat.ps.iTLB-loads
4.096e+10 +1.4% 4.153e+10 perf-stat.ps.instructions
194243 -14.6% 165836 perf-stat.ps.minor-faults
194243 -14.6% 165836 perf-stat.ps.page-faults
286795 +0.0% 286862 perf-stat.ps.task-clock
1.268e+13 +1.7% 1.289e+13 perf-stat.total.instructions
0.00 -100.0% 0.00 proc-vmstat.compact_isolated
153775 +0.1% 153982 proc-vmstat.nr_active_anon
34.00 ± 7% -5.9% 32.00 ± 9% proc-vmstat.nr_active_file
111205 -0.4% 110762 proc-vmstat.nr_anon_pages
61.00 ± 31% +14.8% 70.00 ± 5% proc-vmstat.nr_anon_transparent_hugepages
58.67 -1.1% 58.00 proc-vmstat.nr_dirtied
5.00 +0.0% 5.00 proc-vmstat.nr_dirty
1963650 +0.0% 1963749 proc-vmstat.nr_dirty_background_threshold
3932102 +0.0% 3932300 proc-vmstat.nr_dirty_threshold
360190 +0.0% 360264 proc-vmstat.nr_file_pages
49937 +0.0% 49937 proc-vmstat.nr_free_cma
19794023 +0.0% 19795020 proc-vmstat.nr_free_pages
5663 -0.0% 5661 proc-vmstat.nr_inactive_anon
98.00 ± 3% +0.0% 98.00 ± 7% proc-vmstat.nr_inactive_file
13.33 ± 60% +61.2% 21.50 ± 2% proc-vmstat.nr_isolated_anon
40539 -0.0% 40522 proc-vmstat.nr_kernel_stack
12404 -0.5% 12343 proc-vmstat.nr_mapped
430.00 -49.9% 215.50 ± 99% proc-vmstat.nr_mlock
15352 -0.0% 15347 proc-vmstat.nr_page_table_pages
48318 +1.3% 48928 proc-vmstat.nr_shmem
33638 -0.6% 33432 proc-vmstat.nr_slab_reclaimable
80590 -0.7% 80051 proc-vmstat.nr_slab_unreclaimable
311806 -0.2% 311237 proc-vmstat.nr_unevictable
0.00 -100.0% 0.00 proc-vmstat.nr_unstable
0.00 -100.0% 0.00 proc-vmstat.nr_writeback
57.67 -1.2% 57.00 proc-vmstat.nr_written
153775 +0.1% 153982 proc-vmstat.nr_zone_active_anon
34.00 ± 7% -5.9% 32.00 ± 9% proc-vmstat.nr_zone_active_file
5663 -0.0% 5661 proc-vmstat.nr_zone_inactive_anon
98.00 ± 3% +0.0% 98.00 ± 7% proc-vmstat.nr_zone_inactive_file
311806 -0.2% 311237 proc-vmstat.nr_zone_unevictable
5.00 +0.0% 5.00 proc-vmstat.nr_zone_write_pending
2788 ± 8% -1.2% 2755 ± 11% proc-vmstat.numa_hint_faults
2788 ± 8% -1.2% 2755 ± 11% proc-vmstat.numa_hint_faults_local
1.2e+08 -14.5% 1.026e+08 proc-vmstat.numa_hit
121.00 ± 28% +59.9% 193.50 ± 20% proc-vmstat.numa_huge_pte_updates
0.00 -100.0% 0.00 proc-vmstat.numa_interleave
1.2e+08 -14.5% 1.026e+08 proc-vmstat.numa_local
0.00 -100.0% 0.00 proc-vmstat.numa_other
65275 ± 26% +56.7% 102311 ± 20% proc-vmstat.numa_pte_updates
6292 ± 6% +0.7% 6335 ± 2% proc-vmstat.pgactivate
0.00 -100.0% 0.00 proc-vmstat.pgalloc_dma32
1.201e+08 -14.5% 1.027e+08 proc-vmstat.pgalloc_normal
60452926 -14.4% 51751356 proc-vmstat.pgfault
1.2e+08 -14.5% 1.026e+08 proc-vmstat.pgfree
0.00 -100.0% 0.00 proc-vmstat.pgpgin
50.00 ± 52% +18.0% 59.00 ± 10% proc-vmstat.thp_collapse_alloc
32.00 +0.0% 32.00 proc-vmstat.thp_fault_alloc
0.00 -100.0% 0.00 proc-vmstat.thp_zero_page_alloc
105.00 -0.5% 104.50 proc-vmstat.unevictable_pgs_culled
549.00 +0.0% 549.00 proc-vmstat.unevictable_pgs_mlocked
0.68 ± 70% -0.7 0.00 pp.bt.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
0.56 ± 2% -0.6 0.00 pp.bt.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.76 -0.1 0.65 pp.bt.mmap64
0.68 -0.1 0.57 pp.bt.entry_SYSCALL_64_after_hwframe.mmap64
0.64 -0.1 0.54 pp.bt.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64
0.65 -0.1 0.55 pp.bt.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64
0.67 -0.1 0.57 pp.bt.do_syscall_64.entry_SYSCALL_64_after_hwframe.mmap64
0.82 -0.1 0.74 ± 2% pp.bt.handle_mm_fault.do_page_fault.page_fault
0.78 -0.1 0.70 pp.bt.__handle_mm_fault.handle_mm_fault.do_page_fault.page_fault
0.70 -0.1 0.62 pp.bt.handle_pte_fault.__handle_mm_fault.handle_mm_fault.do_page_fault.page_fault
1.03 -0.1 0.95 pp.bt.page_fault
0.99 -0.1 0.92 ± 2% pp.bt.do_page_fault.page_fault
0.92 -0.1 0.86 pp.bt.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
0.85 -0.1 0.80 pp.bt.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
1.42 ± 4% -0.0 1.37 pp.bt.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt._raw_spin_unlock_irqrestore
0.99 ± 6% -0.0 0.95 pp.bt._raw_spin_unlock_irqrestore.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region
0.98 ± 5% -0.0 0.94 pp.bt.apic_timer_interrupt._raw_spin_unlock_irqrestore.release_pages.tlb_flush_mmu.tlb_finish_mmu
0.94 ± 6% -0.0 0.91 pp.bt.smp_apic_timer_interrupt.apic_timer_interrupt._raw_spin_unlock_irqrestore.release_pages.tlb_flush_mmu
0.82 ± 5% -0.0 0.79 pp.bt.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt._raw_spin_unlock_irqrestore.release_pages
0.82 ± 5% -0.0 0.81 pp.bt.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt._raw_spin_unlock_irqrestore.pagevec_lru_move_fn
0.95 ± 5% -0.0 0.94 pp.bt.smp_apic_timer_interrupt.apic_timer_interrupt._raw_spin_unlock_irqrestore.pagevec_lru_move_fn.lru_add_drain_cpu
0.98 ± 5% -0.0 0.97 pp.bt.apic_timer_interrupt._raw_spin_unlock_irqrestore.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain
0.98 ± 5% -0.0 0.97 pp.bt._raw_spin_unlock_irqrestore.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain.unmap_region
47.85 +0.1 47.95 pp.bt.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
47.84 +0.1 47.94 pp.bt.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap
47.71 +0.1 47.83 pp.bt.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap
97.48 +0.2 97.64 pp.bt.munmap
97.35 +0.2 97.53 pp.bt.entry_SYSCALL_64_after_hwframe.munmap
97.34 +0.2 97.52 pp.bt.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
46.37 +0.2 46.55 pp.bt._raw_spin_lock_irqsave.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region
46.34 +0.2 46.52 pp.bt.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.release_pages.tlb_flush_mmu.tlb_finish_mmu
96.76 +0.2 96.97 pp.bt.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
96.81 +0.2 97.03 pp.bt.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
96.80 +0.2 97.02 pp.bt.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
47.47 +0.2 47.71 pp.bt.lru_add_drain.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
47.46 +0.2 47.70 pp.bt.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap.__vm_munmap
47.44 +0.2 47.68 pp.bt.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap
96.55 +0.2 96.80 pp.bt.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
46.22 +0.3 46.48 pp.bt._raw_spin_lock_irqsave.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain.unmap_region
46.19 +0.3 46.47 pp.bt.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain
0.76 -0.1 0.65 pp.child.mmap64
0.66 -0.1 0.56 pp.child.vm_mmap_pgoff
0.67 -0.1 0.57 pp.child.ksys_mmap_pgoff
0.58 ± 2% -0.1 0.49 pp.child.do_mmap
0.11 ± 37% -0.1 0.03 ±100% pp.child.timerqueue_del
2.02 ± 4% -0.1 1.94 pp.child.smp_apic_timer_interrupt
2.11 ± 4% -0.1 2.03 pp.child.apic_timer_interrupt
1.51 ± 4% -0.1 1.44 pp.child.__hrtimer_run_queues
0.79 -0.1 0.71 pp.child.__handle_mm_fault
0.84 -0.1 0.77 pp.child.handle_mm_fault
1.07 -0.1 0.99 pp.child.page_fault
1.75 ± 4% -0.1 1.68 pp.child.hrtimer_interrupt
0.71 -0.1 0.64 pp.child.handle_pte_fault
0.44 ± 2% -0.1 0.36 pp.child.mmap_region
0.07 ± 70% -0.1 0.00 pp.child.rb_next
1.02 -0.1 0.95 pp.child.do_page_fault
0.06 -0.1 0.00 pp.child.free_unref_page_commit
0.93 -0.1 0.87 pp.child.unmap_vmas
2.06 ± 6% -0.1 2.00 pp.child._raw_spin_unlock_irqrestore
0.05 -0.1 0.00 pp.child.__might_sleep
0.05 -0.1 0.00 pp.child.find_vma
0.51 -0.0 0.46 pp.child.exit_to_usermode_loop
0.27 -0.0 0.22 pp.child.get_page_from_freelist
0.32 -0.0 0.27 pp.child.__alloc_pages_nodemask
0.87 -0.0 0.83 pp.child.unmap_page_range
0.39 ± 29% -0.0 0.35 ± 34% pp.child.cmd_record
0.39 ± 29% -0.0 0.35 ± 34% pp.child.perf_mmap__push
0.36 ± 27% -0.0 0.32 ± 34% pp.child.ksys_write
0.50 -0.0 0.46 pp.child.task_work_run
0.50 -0.0 0.46 pp.child.task_numa_work
0.18 ± 4% -0.0 0.14 pp.child.perf_event_mmap
0.39 ± 29% -0.0 0.35 ± 35% pp.child.__libc_start_main
0.39 ± 29% -0.0 0.35 ± 35% pp.child.main
0.38 ± 28% -0.0 0.35 ± 33% pp.child.__GI___libc_write
0.17 ± 2% -0.0 0.14 ± 3% pp.child.prep_new_page
0.35 ± 27% -0.0 0.31 ± 35% pp.child.vfs_write
0.12 ± 8% -0.0 0.08 pp.child.perf_iterate_sb
0.50 -0.0 0.46 pp.child.change_protection
0.50 -0.0 0.46 pp.child.change_prot_numa
0.50 -0.0 0.46 pp.child.change_p4d_range
0.25 ± 3% -0.0 0.21 ± 2% pp.child.__pte_alloc
0.03 ± 70% -0.0 0.00 pp.child.mem_cgroup_try_charge_delay
0.03 ± 70% -0.0 0.00 pp.child.__put_anon_vma
0.15 ± 3% -0.0 0.12 pp.child.clear_page_erms
0.23 ± 2% -0.0 0.20 ± 2% pp.child.pte_alloc_one
0.32 ± 28% -0.0 0.29 ± 34% pp.child.generic_file_write_iter
0.31 ± 29% -0.0 0.28 ± 35% pp.child.__generic_file_write_iter
0.30 ± 28% -0.0 0.28 ± 34% pp.child.generic_perform_write
0.32 ± 28% -0.0 0.30 ± 35% pp.child.new_sync_write
0.11 ± 4% -0.0 0.09 pp.child.free_unref_page_list
0.99 ± 4% -0.0 0.97 pp.child.update_process_times
1.06 ± 4% -0.0 1.04 pp.child.tick_sched_timer
0.16 -0.0 0.14 pp.child.alloc_pages_vma
0.12 -0.0 0.10 pp.child.get_unmapped_area
1.01 ± 4% -0.0 0.99 pp.child.tick_sched_handle
0.54 ± 3% -0.0 0.53 pp.child.task_tick_fair
0.78 ± 5% -0.0 0.76 pp.child.scheduler_tick
0.02 ±141% -0.0 0.00 pp.child.iov_iter_fault_in_readable
0.02 ±141% -0.0 0.00 pp.child.enqueue_hrtimer
0.32 -0.0 0.30 pp.child.___might_sleep
0.17 ± 2% -0.0 0.15 pp.child._cond_resched
0.07 ± 7% -0.0 0.05 pp.child.kmem_cache_free
0.16 -0.0 0.15 ± 3% pp.child.irq_exit
0.13 ± 3% -0.0 0.12 pp.child.free_pgtables
0.12 ± 3% -0.0 0.11 pp.child.unlink_anon_vmas
0.11 ± 11% -0.0 0.10 pp.child.perf_mux_hrtimer_handler
0.08 ± 5% -0.0 0.07 pp.child._raw_spin_lock
0.10 ± 4% -0.0 0.08 ± 5% pp.child.arch_get_unmapped_area_topdown
0.10 -0.0 0.09 pp.child.__update_load_avg_cfs_rq
0.22 ± 28% -0.0 0.21 ± 38% pp.child.shmem_write_begin
0.22 ± 28% -0.0 0.21 ± 38% pp.child.shmem_getpage_gfp
0.16 ± 5% -0.0 0.15 pp.child.update_curr
0.13 -0.0 0.12 pp.child.__anon_vma_prepare
0.13 -0.0 0.12 pp.child.free_pgd_range
0.09 ± 9% -0.0 0.08 pp.child.mem_cgroup_uncharge_list
0.07 -0.0 0.06 pp.child.vm_unmapped_area
0.07 -0.0 0.06 pp.child.percpu_counter_add_batch
0.12 -0.0 0.11 pp.child.free_p4d_range
0.09 ± 5% -0.0 0.08 pp.child.kmem_cache_alloc
0.06 ± 8% -0.0 0.05 pp.child.rcu_sched_clock_irq
0.06 ± 8% -0.0 0.05 pp.child.remove_vma
0.07 -0.0 0.07 ± 7% pp.child.rcu_all_qs
0.06 -0.0 0.06 ± 9% pp.child.run_timer_softirq
0.06 -0.0 0.06 ± 9% pp.child.malloc
0.07 ± 6% -0.0 0.07 pp.child.flush_tlb_mm_range
0.05 ± 8% -0.0 0.05 pp.child.entry_SYSCALL_64
0.05 ± 8% -0.0 0.05 pp.child.vma_link
0.06 ± 8% -0.0 0.06 ± 9% pp.child.clockevents_program_event
0.05 -0.0 0.05 pp.child.vm_normal_page
0.06 +0.0 0.06 pp.child.syscall_return_via_sysret
0.06 +0.0 0.06 pp.child.uncharge_batch
0.11 ± 4% +0.0 0.11 ± 4% pp.child.__softirqentry_text_start
0.09 ± 5% +0.0 0.10 ± 5% pp.child.__update_load_avg_se
0.16 ± 26% +0.0 0.16 ± 27% pp.child.__lru_cache_add
0.11 ± 4% +0.0 0.12 pp.child.__pagevec_lru_add_fn
0.02 ±141% +0.0 0.03 ±100% pp.child.interrupt_entry
0.07 ± 18% +0.0 0.08 pp.child.update_rq_clock
0.11 ± 4% +0.0 0.12 ± 4% pp.child.__perf_sw_event
0.09 ± 5% +0.0 0.11 ± 4% pp.child.___perf_sw_event
0.04 ± 70% +0.0 0.06 pp.child.perf_event_task_tick
0.13 ± 31% +0.0 0.15 ± 3% pp.child.__remove_hrtimer
0.07 +0.0 0.11 ± 4% pp.child.__mod_lruvec_state
0.00 +0.1 0.05 pp.child.irq_enter
0.00 +0.1 0.05 ±100% pp.child.isolate_lru_page
98.48 +0.1 98.53 pp.child.do_syscall_64
0.00 +0.1 0.06 ± 9% pp.child.mmput
0.00 +0.1 0.06 ± 9% pp.child.exit_mmap
98.50 +0.1 98.56 pp.child.entry_SYSCALL_64_after_hwframe
0.02 ±141% +0.1 0.08 ± 6% pp.child.__mod_memcg_state
0.00 +0.1 0.06 ±100% pp.child.khugepaged
0.00 +0.1 0.06 ±100% pp.child._raw_spin_lock_irq
0.00 +0.1 0.07 ±100% pp.child.ret_from_fork
0.00 +0.1 0.07 ±100% pp.child.kthread
47.88 +0.1 47.98 pp.child.tlb_finish_mmu
47.87 +0.1 47.98 pp.child.tlb_flush_mmu
47.79 +0.1 47.90 pp.child.release_pages
97.50 +0.2 97.66 pp.child.munmap
96.82 +0.2 97.03 pp.child.__x64_sys_munmap
96.81 +0.2 97.03 pp.child.__vm_munmap
96.79 +0.2 97.01 pp.child.__do_munmap
47.66 +0.2 47.91 pp.child.pagevec_lru_move_fn
47.52 +0.2 47.77 pp.child.lru_add_drain
47.51 +0.3 47.76 pp.child.lru_add_drain_cpu
96.58 +0.3 96.83 pp.child.unmap_region
92.87 +0.5 93.34 pp.child._raw_spin_lock_irqsave
92.80 +0.5 93.31 pp.child.native_queued_spin_lock_slowpath
0.15 ± 28% -0.1 0.07 pp.self.__hrtimer_run_queues
0.07 ± 70% -0.1 0.00 pp.self.rb_next
0.05 -0.1 0.00 pp.self.__pagevec_lru_add_fn
0.05 -0.1 0.00 pp.self.run_timer_softirq
0.05 -0.1 0.00 pp.self.free_unref_page_commit
0.15 ± 3% -0.0 0.11 pp.self.clear_page_erms
0.44 -0.0 0.41 pp.self.change_p4d_range
0.08 ± 5% -0.0 0.06 ± 9% pp.self.perf_iterate_sb
0.47 ± 2% -0.0 0.45 pp.self.unmap_page_range
0.02 ±141% -0.0 0.00 pp.self.smp_apic_timer_interrupt
0.30 -0.0 0.29 pp.self.___might_sleep
0.02 ±141% -0.0 0.00 pp.self.malloc
0.02 ±141% -0.0 0.00 pp.self.entry_SYSCALL_64
0.02 ±141% -0.0 0.00 pp.self.__might_sleep
0.08 ± 10% -0.0 0.07 ± 7% pp.self._raw_spin_lock
0.09 ± 14% -0.0 0.08 ± 6% pp.self.hrtimer_interrupt
0.09 ± 5% -0.0 0.08 ± 6% pp.self.release_pages
0.10 ± 8% -0.0 0.09 pp.self._raw_spin_unlock_irqrestore
0.09 -0.0 0.08 pp.self.__update_load_avg_cfs_rq
0.06 -0.0 0.05 pp.self.do_page_fault
0.06 -0.0 0.05 pp.self.kmem_cache_free
0.09 ± 5% -0.0 0.08 ± 5% pp.self.free_p4d_range
0.08 ± 5% -0.0 0.08 ± 6% pp.self._cond_resched
0.07 ± 7% -0.0 0.06 pp.self.vm_unmapped_area
0.06 ± 8% -0.0 0.05 pp.self.kmem_cache_alloc
0.06 -0.0 0.06 ± 9% pp.self.rcu_all_qs
0.08 -0.0 0.08 ± 6% pp.self.__update_load_avg_se
0.09 ± 5% -0.0 0.09 pp.self.update_curr
0.07 ± 6% -0.0 0.07 pp.self._raw_spin_lock_irqsave
0.11 ± 4% -0.0 0.11 pp.self.task_tick_fair
0.07 ± 7% -0.0 0.07 ± 7% pp.self.get_page_from_freelist
0.06 ± 8% -0.0 0.06 ± 9% pp.self.__handle_mm_fault
0.05 -0.0 0.05 pp.self.__do_munmap
0.06 +0.0 0.06 pp.self.syscall_return_via_sysret
0.02 ±141% +0.0 0.03 ±100% pp.self.update_rq_clock
0.02 ±141% +0.0 0.03 ±100% pp.self.interrupt_entry
0.04 ± 70% +0.0 0.06 ± 9% pp.self.perf_event_task_tick
0.00 +0.0 0.03 ±100% pp.self.vm_normal_page
0.05 +0.0 0.08 pp.self.___perf_sw_event
0.02 ±141% +0.1 0.08 ± 6% pp.self.__mod_memcg_state
0.00 +0.1 0.11 ± 4% pp.self.__remove_hrtimer
92.80 +0.5 93.31 pp.self.native_queued_spin_lock_slowpath
333.33 -0.2% 332.50 softirqs.BLOCK
5.00 +0.0% 5.00 softirqs.HI
17005 ± 69% -64.3% 6074 ± 8% softirqs.NET_RX
45.33 +0.4% 45.50 ± 3% softirqs.NET_TX
1322815 ± 2% -1.3% 1305414 softirqs.RCU
633707 ± 9% +1.9% 645991 ± 11% softirqs.SCHED
293.00 -0.2% 292.50 softirqs.TASKLET
35621870 +12.5% 40074312 softirqs.TIMER
344034 -0.3% 343007 interrupts.CAL:Function_call_interrupts
396.00 -0.1% 395.50 interrupts.IWI:IRQ_work_interrupts
1.102e+08 ± 13% -1.1% 1.09e+08 interrupts.LOC:Local_timer_interrupts
288.00 +0.0% 288.00 interrupts.MCP:Machine_check_polls
1451499 +1.0% 1465843 interrupts.NMI:Non-maskable_interrupts
1451499 +1.0% 1465843 interrupts.PMI:Performance_monitoring_interrupts
24121 ± 2% +1.9% 24578 ± 7% interrupts.RES:Rescheduling_interrupts
1262 ± 2% +12.5% 1421 ± 7% interrupts.TLB:TLB_shootdowns

Thanks,
Feng


Attachments:
perf-profile.old (140.42 kB)
perf-profile.new (154.79 kB)

2020-08-24 15:16:24

by Borislav Petkov

[permalink] [raw]
Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

On Fri, Aug 21, 2020 at 10:02:59AM +0800, Feng Tang wrote:
> 1de08dccd383 x86/mce: Add a struct mce.kflags field
> 9554bfe403bd x86/mce: Convert the CEC to use the MCE notifier
>
> And the strange thing is that after switching to gcc9 and a debian10
> rootfs, with the same commits the regression turns into an improvement,

How so?

> though the trend holds: if we change kflags from __u64 to __u32, there
> is no performance change.
>
> Following is the comparison of the regression; I also attached the
> perf-profiles for the old and new commits (let me know if you need
> more data).
>
>
> 9554bfe403bdfc08 1de08dccd383482a3e88845d355
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 192362 -15.1% 163343 will-it-scale.287.processes
> 0.91 +0.2% 0.92 will-it-scale.287.processes_idle
> 669.67 -15.1% 568.50 will-it-scale.per_process_ops

This is the data from your previous measurement:

9554bfe403bdfc08 1de08dccd383482a3e88845d355
---------------- ---------------------------
%stddev %change %stddev
\ | \
668.00 -14.1% 573.75 will-it-scale.per_process_ops

If I'm reading it correctly, commit

1de08dccd383 ("x86/mce: Add a struct mce.kflags field")

is still the slower one vs

9554bfe403bd ("x86/mce: Convert the CEC to use the MCE notifier")

Or am I misreading it?

In any case, this really looks like what Tony said: this enlargement of
struct mce pushes some variable into a cacheline-misaligned placement,
causing it to bounce.

The $ 10^6 question is, which variable is that...

--
Regards/Gruss,
Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

2020-08-24 15:36:49

by Feng Tang

[permalink] [raw]
Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

On Mon, Aug 24, 2020 at 05:14:25PM +0200, Borislav Petkov wrote:
> On Fri, Aug 21, 2020 at 10:02:59AM +0800, Feng Tang wrote:
> > 1de08dccd383 x86/mce: Add a struct mce.kflags field
> > 9554bfe403bd x86/mce: Convert the CEC to use the MCE notifier
> >
> > And the strange thing is that after switching to gcc9 and a debian10
> > rootfs, with the same commits the regression turns into an improvement,
>
> How so?

My understanding is that microbenchmarks like will-it-scale are
sensitive to alignment (text/data); we've found other similar cases
where, with this 0day update (compiler, kernel config, rootfs), some
previously reported regressions can no longer be reproduced.

> > though the trend holds: if we change kflags from __u64 to __u32, there
> > is no performance change.
> >
> > Following is the comparison of the regression; I also attached the
> > perf-profiles for the old and new commits (let me know if you need
> > more data).
> >
> >
> > 9554bfe403bdfc08 1de08dccd383482a3e88845d355
> > ---------------- ---------------------------
> > %stddev %change %stddev
> > \ | \
> > 192362 -15.1% 163343 will-it-scale.287.processes
> > 0.91 +0.2% 0.92 will-it-scale.287.processes_idle
> > 669.67 -15.1% 568.50 will-it-scale.per_process_ops
>
> This is the data from your previous measurement:
>
> 9554bfe403bdfc08 1de08dccd383482a3e88845d355
> ---------------- ---------------------------
> %stddev %change %stddev
> \ | \
> 668.00 -14.1% 573.75 will-it-scale.per_process_ops
>
> If I'm reading it correctly, commit
>
> 1de08dccd383 ("x86/mce: Add a struct mce.kflags field")
>
> is still the slower one vs
>
> 9554bfe403bd ("x86/mce: Convert the CEC to use the MCE notifier")
>
> Or am I misreading it?

Your reading is correct. With the original kernel (built back in April)
and the old rootfs, the regression persists, with just a small drift
between 14.1% and 15.1% (which is normal for will-it-scale); the 15.1%
was retested just last week.

>
> In any case, this really looks like what Tony said: this enlargement of
> struct mce pushes some variable into a cacheline-misaligned placement,
> causing it to bounce.

Yes, that's what we suspected. I just did another try, forcing the percpu
mce structures to be cache-aligned, and the regression is mostly gone
(reduced from 14.1% to 2%), which further proves it.

The patch is:

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 43b1519..2c020ef 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -95,7 +95,7 @@ struct mca_config mca_cfg __read_mostly = {
.monarch_timeout = -1
};

-static DEFINE_PER_CPU(struct mce, mces_seen);
+static DEFINE_PER_CPU_ALIGNED(struct mce, mces_seen);
static unsigned long mce_need_notify;
static int cpu_missing;

@@ -148,7 +148,7 @@ void mce_setup(struct mce *m)
m->microcode = boot_cpu_data.microcode;
}

-DEFINE_PER_CPU(struct mce, injectm);
+DEFINE_PER_CPU_ALIGNED(struct mce, injectm);
EXPORT_PER_CPU_SYMBOL_GPL(injectm);

> The $ 10^6 question is, which variable is that...

:) Right, this is what I'm doing right now. Some test jobs are queued on
the test box, and it may need some iterations of new patches. Hopefully
we can isolate the specific variable, given some luck.

Thanks,
Feng

2020-08-24 15:44:31

by Luck, Tony

[permalink] [raw]
Subject: RE: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

> Yes, that's what we suspected. I just did another try, forcing the percpu
> mce structures to be cache-aligned, and the regression is mostly gone
> (reduced from 14.1% to 2%), which further proves it.

I wonder whether it would be useful for bisecting performance issues
like this to change the global definition of DEFINE_PER_CPU() to make
all per-CPU definitions aligned, just like you switch compiler flags to
make all functions aligned.
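
Completely untested sketch of such a hack, e.g. temporarily redirecting
the plain macro to the aligned variant after the definitions in
include/linux/percpu-defs.h:

/* debug only: make every plain DEFINE_PER_CPU cacheline-aligned, so a
 * size change in one percpu object cannot shift the alignment of the
 * objects placed after it (costs extra percpu memory, of course) */
#undef DEFINE_PER_CPU
#define DEFINE_PER_CPU(type, name)	DEFINE_PER_CPU_ALIGNED(type, name)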

-Tony

2020-08-24 15:50:01

by Feng Tang

[permalink] [raw]
Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

On Mon, Aug 24, 2020 at 11:38:53PM +0800, Luck, Tony wrote:
> > Yes, that's what we suspected. I just did another try, forcing the percpu
> > mce structures to be cache-aligned, and the regression is mostly gone
> > (reduced from 14.1% to 2%), which further proves it.
>
> I wonder whether it would be useful for bisecting performance issues
> like this to change the global definition of DEFINE_PER_CPU() to make
> all per-CPU definitions aligned, just like you switch compiler flags to
> make all functions aligned.

Thanks for the hint! This will definitely help in tracking down strange
performance changes like this, as a general debug method.

Thanks,
Feng

>
> -Tony

2020-08-24 16:13:36

by Borislav Petkov

[permalink] [raw]
Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

On Mon, Aug 24, 2020 at 11:33:00PM +0800, Feng Tang wrote:
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 43b1519..2c020ef 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -95,7 +95,7 @@ struct mca_config mca_cfg __read_mostly = {
> .monarch_timeout = -1
> };
>
> -static DEFINE_PER_CPU(struct mce, mces_seen);
> +static DEFINE_PER_CPU_ALIGNED(struct mce, mces_seen);
> static unsigned long mce_need_notify;
> static int cpu_missing;
>
> @@ -148,7 +148,7 @@ void mce_setup(struct mce *m)
> m->microcode = boot_cpu_data.microcode;
> }
>
> -DEFINE_PER_CPU(struct mce, injectm);
> +DEFINE_PER_CPU_ALIGNED(struct mce, injectm);
> EXPORT_PER_CPU_SYMBOL_GPL(injectm);

I don't think this is the right fix. Lemme quote Tony from a previous
email:

"The answer isn't to tinker with "struct mce". Other changes could
trigger this same change in alignment. Anything that is this perfomance
sensitive needs to have some "__attribute__((aligned(64)))" (or
whatever) to make sure arbitrary changes elsewhere don't do this."

And yes, your diff is not tinkering with struct mce but it is tinkering
with percpu vars which are of type struct mce.

However, the proper fix is...

> :) Right, this is what I'm doing right now. Some test jobs are queued on
> the test box, and it may need some iterations of new patches. Hopefully
> we can isolate the specific variable, given some luck.

... yes, exactly, you need to identify the contention where this
happens: a cacheline bouncing, or a variable straddling a cacheline
boundary, causing reads to fetch two cachelines and thus the slowdown.
And then align that variable to the beginning of a cacheline.

Also, maybe I missed this but, do you trigger this only on Xeon Phi or
on "normal" x86 too?

Because if it is Xeon Phi only, then that might explain the size of the
slowdown and that it happens only there because it is a, well, "strange"
machine. :-)

Thx.

--
Regards/Gruss,
Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

2020-08-24 16:58:37

by Mel Gorman

[permalink] [raw]
Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

On Mon, Aug 24, 2020 at 06:12:38PM +0200, Borislav Petkov wrote:
>
> > :) Right, this is what I'm doing right now. Some test jobs are queued on
> > the test box, and it may need some iterations of new patches. Hopefully
> > we can isolate the specific variable, given some luck.
>
> ... yes, exactly, you need to identify the contention where this
> happens: a cacheline bouncing, or a variable straddling a cacheline
> boundary, causing reads to fetch two cachelines and thus the slowdown.
> And then align that variable to the beginning of a cacheline.
>

Given the test is malloc1, it *may* be struct per_cpu_pages embedded within
per_cpu_pageset. The cache characteristics of per_cpu_pageset are terrible
because of how it mixes up zone counters and per-cpu lists. However, if
the first per_cpu_pageset is cache-aligned then every second per_cpu_pages
will be cache-aligned and half of the lists will fit in one cache line. If
the whole structure gets pushed out of alignment then all per_cpu_pages
straddle cache lines, increase the overall cache footprint and potentially
cause problems if the cache is not large enough to hold hot structures.
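
For reference, the structures in question look roughly like this
(abbreviated from the v5.x <linux/mmzone.h>, quoted from memory):

struct per_cpu_pages {
	int count;		/* number of pages in the lists */
	int high;		/* high watermark, emptying needed */
	int batch;		/* chunk size for buddy add/remove */

	/* lists of pages, one per migrate type */
	struct list_head lists[MIGRATE_PCPTYPES];
};

struct per_cpu_pageset {
	struct per_cpu_pages pcp;
#ifdef CONFIG_NUMA
	s8 expire;
	u16 vm_numa_stat_diff[NR_VM_NUMA_STAT_ITEMS];
#endif
#ifdef CONFIG_SMP
	s8 stat_threshold;
	s8 vm_stat_diff[NR_VM_ZONE_STAT_ITEMS];
#endif
};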

The misses could potentially be inferred without c2c from looking at
perf -e cache-misses on a good and bad kernel and seeing if there is a
noticeable increase in misses in mm/page_alloc.c, with a focus on
anything using per-cpu lists.
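
E.g. something like (standard perf usage; exact event availability
varies by machine):

$ perf record -a -e cache-misses -- sleep 30	# once per kernel
$ perf report --stdio --sort symbol

and then diff the top entries between the good and the bad kernel.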

Whether the problem is per_cpu_pages or some other structure, it's not
struct mce's fault in all likelihood -- it's just the messenger.

--
Mel Gorman
SUSE Labs

2020-08-25 06:25:55

by Feng Tang

[permalink] [raw]
Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

On Mon, Aug 24, 2020 at 06:12:38PM +0200, Borislav Petkov wrote:
> > -DEFINE_PER_CPU(struct mce, injectm);
> > +DEFINE_PER_CPU_ALIGNED(struct mce, injectm);
> > EXPORT_PER_CPU_SYMBOL_GPL(injectm);
>
> I don't think this is the right fix.

Agreed :) This is a debug patch; what we want is to root-cause this
strange performance bump, as we've seen many other reports where the
culprit commit has no direct relation to the benchmark at all, and we
want to find their pattern.

> Lemme quote Tony from a previous
> email:
>
> "The answer isn't to tinker with "struct mce". Other changes could
> trigger this same change in alignment. Anything that is this perfomance
> sensitive needs to have some "__attribute__((aligned(64)))" (or
> whatever) to make sure arbitrary changes elsewhere don't do this."
>
> And yes, your diff is not tinkering with struct mce but it is tinkering
> with percpu vars which are of type struct mce.
>
> However, the proper fix is...
>
> > :) Right, this is what I'm doing right now. Some test job is queued on
> > the test box, and it may needs some iterations of new patch. Hopefully we
> > can isolate some specific variable given some luck.
>
> ... yes, exactly, you need to identify the contention where this
> happens, causing a cacheline to bounce or a variable straddles across a
> cacheline boundary, causing the read to fetch two cachelines and thus
> causes that slowdown. And then align that var to the beginning of a
> cacheline.
>
> Also, maybe I missed this but, do you trigger this only on Xeon Phi or
> on "normal" x86 too?
>
> Because if it is Xeon Phi only, then that might explain the size of the
> slowdown and that it happens only there because it is a, well, "strange"
> machine. :-)

Good point! This is only reproduced on Xeon Phi, and can't be seen on
other Skylake/Cascade Lake/Icelake platforms.

The hotspots for this Xeon Phi are even different from other platforms,
while the other platforms all share the same hotspot.

Also, one piece of good news is that we seem to have identified the 2
key percpu variables out of the list mentioned in the previous email:
'arch_freq_scale'
'tsc_adjust'

These 2 variables are accessed in 2 hot call stacks (for this 288 CPU
Xeon Phi platform):

- arch_freq_scale is accessed in scheduler tick
arch_scale_freq_tick+0xaf/0xc0
scheduler_tick+0x39/0x100
update_process_times+0x3c/0x50
tick_sched_handle+0x22/0x60
tick_sched_timer+0x37/0x70
__hrtimer_run_queues+0xfc/0x2a0
hrtimer_interrupt+0x122/0x270
smp_apic_timer_interrupt+0x6a/0x150
apic_timer_interrupt+0xf/0x20

- tsc_adjust is accessed in idle entrance
tsc_verify_tsc_adjust+0xeb/0xf0
arch_cpu_idle_enter+0xc/0x20
do_idle+0x91/0x280
cpu_startup_entry+0x19/0x20
start_kernel+0x4f4/0x516
secondary_startup_64+0xb6/0xc0

From the System.map file, for the bad kernel these 2 sit in one cache
line, while for the good kernel they sit in 2 separate cache lines.

It also explains why it turns from a regression to an improvement with
updated gcc/kconfig, as the cache line sharing situation is reversed.

The direct patch I can think of is to make 'tsc_adjust' cache-aligned,
to separate these 2 'hot' variables. What do you think?

--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -29,7 +29,7 @@ struct tsc_adjust {
bool warned;
};

-static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
+static DEFINE_PER_CPU_ALIGNED(struct tsc_adjust, tsc_adjust);


Thanks,
Feng

2020-08-25 06:51:34

by Feng Tang

[permalink] [raw]
Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

On Mon, Aug 24, 2020 at 05:56:53PM +0100, Mel Gorman wrote:
> On Mon, Aug 24, 2020 at 06:12:38PM +0200, Borislav Petkov wrote:
> >
> > > :) Right, this is what I'm doing right now. Some test jobs are queued on
> > > the test box, and it may need some iterations of new patches. Hopefully
> > > we can isolate the specific variable, given some luck.
> >
> > ... yes, exactly, you need to identify the contention where this
> > happens: a cacheline bouncing, or a variable straddling a cacheline
> > boundary, causing reads to fetch two cachelines and thus the slowdown.
> > And then align that variable to the beginning of a cacheline.
> >
>
> Given the test is malloc1, it *may* be struct per_cpu_pages embedded within
> per_cpu_pageset. The cache characteristics of per_cpu_pageset are terrible
> because of how it mixes up zone counters and per-cpu lists. However, if
> the first per_cpu_pageset is cache-aligned then every second per_cpu_pages
> will be cache-aligned and half of the lists will fit in one cache line. If
> the whole structure gets pushed out of alignment then all per_cpu_pages
> straddle cache lines, increase the overall cache footprint and potentially
> cause problems if the cache is not large enough to hold hot structures.
>
> The misses could potentially be inferred without c2c from looking at
> perf -e cache-misses on a good and bad kernel and seeing if there is a
> noticeable increase in misses in mm/page_alloc.c, with a focus on
> anything using per-cpu lists.

Thanks for the tip, which is useful for Xeon Phi. I ran it with
'cache-misses' instead of the default 'cycles', and the 2 versions of
perf data show similar hotspots:

92.62% 92.62% [kernel.kallsyms] [k] native_queued_spin_lock_slowpath - -
46.20% native_queued_spin_lock_slowpath;_raw_spin_lock_irqsave;release_pages;tlb_flush_mmu;tlb_finish_mmu;unmap_region;__do_munmap;__vm_munmap;__x64_sys_munmap;do_syscall_64;entry_SYSCALL_64_after_hwframe;munmap
46.13% native_queued_spin_lock_slowpath;_raw_spin_lock_irqsave;pagevec_lru_move_fn;lru_add_drain_cpu;lru_add_drain;unmap_region;__do_munmap;__vm_munmap;__x64_sys_munmap;do_syscall_64;entry_SYSCALL_64_after_hwframe;munmap

> Whether the problem is per_cpu_pages or some other structure, it's not
> struct mce's fault in all likelihood -- it's just the messenger.

Agreed. The mce patch itself is innocent; it just changes the alignment
of other domains' variables inadvertently.

Thanks,
Feng

> --
> Mel Gorman
> SUSE Labs

2020-08-25 16:46:34

by Luck, Tony

[permalink] [raw]
Subject: RE: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

> These 2 variables are accessed in 2 hot call stacks (for this 288 CPU
> Xeon Phi platform):

This might be the key element of "weirdness" for this system. It
has 288 CPUs ... cache alignment problems are often not too bad
on "small" systems. Then as you scale up to bigger machines, you
suddenly hit some critical point and performance drops dramatically.

It's good that you are picking up tips on how to bisect these and diagnose
the underlying problem. Number of cores is going to keep increasing, so
we will keep finding new issues like this.

-Tony

2020-08-26 01:47:22

by Feng Tang

[permalink] [raw]
Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

On Wed, Aug 26, 2020 at 12:44:37AM +0800, Luck, Tony wrote:
> > These 2 variables are accessed in 2 hot call stacks (for this 288 CPU
> > Xeon Phi platform):
>
> This might be the key element of "weirdness" for this system. It
> has 288 CPUs ... cache alignment problems are often not too bad
> on "small" systems. The as you scale up to bigger machines you
> suddenly hit some critical point and performance drops dramatically.
>
> It's good that you are picking up tips on how to bisect these and diagnose
> the underlying problem. Number of cores is going to keep increasing, so
> we will keep finding new issues like this.

Yes, now we have one more bullet for shooting down this kind of strange
performance change :)

Thanks,
Feng

> -Tony

2020-08-28 17:51:16

by Borislav Petkov

[permalink] [raw]
Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

On Tue, Aug 25, 2020 at 02:23:05PM +0800, Feng Tang wrote:
> Also, one piece of good news is that we seem to have identified the 2
> key percpu variables out of the list mentioned in the previous email:
> 'arch_freq_scale'
> 'tsc_adjust'
>
> These 2 variables are accessed in 2 hot call stacks (for this 288 CPU
> Xeon Phi platform):
>
> - arch_freq_scale is accessed in scheduler tick
> arch_scale_freq_tick+0xaf/0xc0
> scheduler_tick+0x39/0x100
> update_process_times+0x3c/0x50
> tick_sched_handle+0x22/0x60
> tick_sched_timer+0x37/0x70
> __hrtimer_run_queues+0xfc/0x2a0
> hrtimer_interrupt+0x122/0x270
> smp_apic_timer_interrupt+0x6a/0x150
> apic_timer_interrupt+0xf/0x20
>
> - tsc_adjust is accessed in idle entrance
> tsc_verify_tsc_adjust+0xeb/0xf0
> arch_cpu_idle_enter+0xc/0x20
> do_idle+0x91/0x280
> cpu_startup_entry+0x19/0x20
> start_kernel+0x4f4/0x516
> secondary_startup_64+0xb6/0xc0
>
> From the System.map file, for the bad kernel these 2 sit in one cache
> line, while for the good kernel they sit in 2 separate cache lines.
>
> It also explains why it turns from a regression to an improvement with
> updated gcc/kconfig, as the cache line sharing situation is reversed.
>
> The direct patch I can think of is to make 'tsc_adjust' cache-aligned,
> to separate these 2 'hot' variables. What do you think?
>
> --- a/arch/x86/kernel/tsc_sync.c
> +++ b/arch/x86/kernel/tsc_sync.c
> @@ -29,7 +29,7 @@ struct tsc_adjust {
> bool warned;
> };
>
> -static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
> +static DEFINE_PER_CPU_ALIGNED(struct tsc_adjust, tsc_adjust);

So why don't you define both variables with DEFINE_PER_CPU_ALIGNED and
check if all your bad measurements go away this way?
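
I.e., something like this (untested; the arch_freq_scale hunk is quoted
from memory of arch/x86/kernel/smpboot.c):

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
-DEFINE_PER_CPU(unsigned long, arch_freq_scale) = SCHED_CAPACITY_SCALE;
+DEFINE_PER_CPU_ALIGNED(unsigned long, arch_freq_scale) = SCHED_CAPACITY_SCALE;

--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
-static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
+static DEFINE_PER_CPU_ALIGNED(struct tsc_adjust, tsc_adjust);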

You'd also need to check whether there's no detrimental effect from
this change on other, i.e., !KNL platforms, and I think there won't
be because both variables will be in separate cachelines then and all
should be good.

Hmm?

--
Regards/Gruss,
Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

2020-08-31 02:18:07

by Feng Tang

[permalink] [raw]
Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

On Fri, Aug 28, 2020 at 07:48:39PM +0200, Borislav Petkov wrote:
> On Tue, Aug 25, 2020 at 02:23:05PM +0800, Feng Tang wrote:
> > Also, one piece of good news is that we seem to have identified the 2
> > key percpu variables out of the list mentioned in the previous email:
> > 'arch_freq_scale'
> > 'tsc_adjust'
> >
> > These 2 variables are accessed in 2 hot call stacks (for this 288 CPU
> > Xeon Phi platform):
> >
> > - arch_freq_scale is accessed in scheduler tick
> > arch_scale_freq_tick+0xaf/0xc0
> > scheduler_tick+0x39/0x100
> > update_process_times+0x3c/0x50
> > tick_sched_handle+0x22/0x60
> > tick_sched_timer+0x37/0x70
> > __hrtimer_run_queues+0xfc/0x2a0
> > hrtimer_interrupt+0x122/0x270
> > smp_apic_timer_interrupt+0x6a/0x150
> > apic_timer_interrupt+0xf/0x20
> >
> > - tsc_adjust is accessed in idle entrance
> > tsc_verify_tsc_adjust+0xeb/0xf0
> > arch_cpu_idle_enter+0xc/0x20
> > do_idle+0x91/0x280
> > cpu_startup_entry+0x19/0x20
> > start_kernel+0x4f4/0x516
> > secondary_startup_64+0xb6/0xc0
> >
> > From the System.map file, for the bad kernel these 2 sit in one cache
> > line, while for the good kernel they sit in 2 separate cache lines.
> >
> > It also explains why it turns from a regression to an improvement with
> > updated gcc/kconfig, as the cache line sharing situation is reversed.
> >
> > The direct patch I can think of is to make 'tsc_adjust' cache-aligned,
> > to separate these 2 'hot' variables. What do you think?
> >
> > --- a/arch/x86/kernel/tsc_sync.c
> > +++ b/arch/x86/kernel/tsc_sync.c
> > @@ -29,7 +29,7 @@ struct tsc_adjust {
> > bool warned;
> > };
> >
> > -static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
> > +static DEFINE_PER_CPU_ALIGNED(struct tsc_adjust, tsc_adjust);
>
> So why don't you define both variables with DEFINE_PER_CPU_ALIGNED and
> check if all your bad measurements go away this way?

For 'arch_freq_scale', there are other percpu variables in the same
smpboot.c: 'arch_prev_aperf' and 'arch_prev_mperf'. In the hot path
arch_scale_freq_tick(), all 3 variables are accessed, so I didn't
touch it. Or maybe we can align the first of these 3 variables, so
that they all sit in one cacheline.

> You'd also need to check whether there's no detrimental effect from
> this change on other, i.e., !KNL platforms, and I think there won't
> be because both variables will be in separate cachelines then and all
> should be good.

Yes, these kinds of changes should be verified on other platforms.

One thing still puzzles me: the 2 variables are per-cpu things, and
there is no case of many CPUs contending, so why does the cacheline
layout matter? I suspect it is due to contention on the same cache set,
and am trying to find some way to test it.

Thanks,
Feng

> Hmm?
>
> --
> Regards/Gruss,
> Boris.
>
> SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

2020-08-31 07:57:21

by Mel Gorman

[permalink] [raw]
Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

On Mon, Aug 31, 2020 at 10:16:38AM +0800, Feng Tang wrote:
> > So why don't you define both variables with DEFINE_PER_CPU_ALIGNED and
> > check if all your bad measurements go away this way?
>
> For 'arch_freq_scale', there are other percpu variables in the same
> smpboot.c: 'arch_prev_aperf' and 'arch_prev_mperf'. In the hot path
> arch_scale_freq_tick(), all 3 variables are accessed, so I didn't
> touch it. Or maybe we can align the first of these 3 variables, so
> that they all sit in one cacheline.
>
> > You'd also need to check whether there's no detrimental effect from
> > this change on other, i.e., !KNL platforms, and I think there won't
> > be because both variables will be in separate cachelines then and all
> > should be good.
>
> Yes, these kinds of changes should be verified on other platforms.
>
> One thing still puzzles me: the 2 variables are per-cpu things, and
> there is no case of many CPUs contending, so why does the cacheline
> layout matter? I suspect it is due to contention on the same cache set,
> and am trying to find some way to test it.
>

Because if you have two structures that are per-cpu and not cache-aligned
then a write in one can bounce the cache line in another due to the
cache coherency protocol. It's generally called "false cache line
sharing". https://en.wikipedia.org/wiki/False_sharing has basic examples
(let's not get into whether wikipedia is a valid citation source; there
are books on the topic if someone really cared).
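
A minimal userspace illustration of the effect, for anyone following
along (made-up demo; build with "gcc -O2 -pthread" and run under
time(1) -- uncommenting the padding makes the runtime collapse):

#include <pthread.h>
#include <stdio.h>

/* Two counters, each written by its own thread only. Without padding
 * they share one 64-byte cache line, so every increment on one CPU
 * invalidates the line in the other CPU's cache: false sharing. */
struct counters {
	volatile long a;
	/* char pad[56]; */	/* uncomment to split a and b apart */
	volatile long b;
};

static struct counters c;

static void *bump_a(void *unused)
{
	for (long i = 0; i < 100000000L; i++)
		c.a++;
	return NULL;
}

static void *bump_b(void *unused)
{
	for (long i = 0; i < 100000000L; i++)
		c.b++;
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, bump_a, NULL);
	pthread_create(&t2, NULL, bump_b, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("%ld %ld\n", c.a, c.b);
	return 0;
}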

While it's in my imagination, this should happen with the page allocator
pcpu structures because the core structure is 1.5 cache lines on 64-bit
currently and not aligned. That means that not only can two CPUs interfere
with each others lists and counters but that could happen cross-node.

The hypothesis can be tested with perf looking for abnormal cache
misses. In this case, an intense allocating process bound to one CPU
with intermittent allocations on the adjacent CPU should show unexpected
cache line bounces. It would not be perfect as collisions would happen
anyway when the pcpu lists spill over on either the alloc or free side
to the buddy lists, but in that case the cache misses would happen
on different instructions.

--
Mel Gorman
SUSE Labs

2020-08-31 08:24:19

by Feng Tang

[permalink] [raw]
Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

On Mon, Aug 31, 2020 at 08:56:11AM +0100, Mel Gorman wrote:
> On Mon, Aug 31, 2020 at 10:16:38AM +0800, Feng Tang wrote:
> > > So why don't you define both variables with DEFINE_PER_CPU_ALIGNED and
> > > check if all your bad measurements go away this way?
> >
> > For 'arch_freq_scale', there are other percpu variables in the same
> > smpboot.c: 'arch_prev_aperf' and 'arch_prev_mperf', and all three
> > variables are accessed in the hot path arch_scale_freq_tick(), so I
> > didn't touch it. Alternatively, we could align the first of these three
> > variables so that all of them sit in one cacheline.
> >
> > > You'd also need to check whether there's no detrimental effect from
> > > this change on other, i.e., !KNL platforms, and I think there won't
> > > be because both variables will be in separate cachelines then and all
> > > should be good.
> >
> > Yes, changes of this kind should be verified on other platforms.
> >
> > One thing still puzzles me: the two variables are per-CPU, and there is
> > no case of many CPUs contending for them, so why does the cacheline
> > layout matter? I suspect it is due to contention on the same cache set,
> > and I am trying to find some way to test it.
> >
>
> Because if you have two structures that are per-cpu and not cache-aligned,
> then a write in one can bounce the cache line in another due to the
> cache coherency protocol. It's generally called "false cache line
> sharing". https://en.wikipedia.org/wiki/False_sharing has basic examples
> (let's not get into whether Wikipedia is a valid citation source; there
> are books on the topic if someone really cares).

The 'arch_freq_scale' and 'tsc_adjust' percpu variables are only
accessed by their own CPU, and usually no other CPU will touch them;
the hot path only uses this_cpu_read/write/ptr. And each CPU's static
percpu variables are all packed together in one area (256KB per CPU on
this test box), so I don't think there is a scenario where multiple CPUs
access one cache line, which is what usually triggers false sharing.

Also, a separate test of ours shows that the score is higher when
'arch_freq_scale' and 'tsc_adjust' are in two separate cachelines.
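
To illustrate what I mean, the hot path's access pattern is roughly of
this shape (paraphrased and trimmed, not the exact kernel code):

/*
 * Every access goes through this_cpu ops, i.e. each CPU only ever
 * touches its own copies of these variables.
 */
static DEFINE_PER_CPU(u64, prev_aperf);
static DEFINE_PER_CPU(u64, prev_mperf);
static DEFINE_PER_CPU(unsigned long, freq_scale);

static void tick_update(u64 aperf, u64 mperf)
{
        u64 acnt = aperf - this_cpu_read(prev_aperf);
        u64 mcnt = mperf - this_cpu_read(prev_mperf);

        this_cpu_write(prev_aperf, aperf);
        this_cpu_write(prev_mperf, mperf);

        if (!mcnt)
                return;

        this_cpu_write(freq_scale,
                       div64_u64(acnt << SCHED_CAPACITY_SHIFT, mcnt));
}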


> While I'm thinking of it: this should happen with the page allocator
> pcpu structures, because the core structure is currently 1.5 cache lines
> on 64-bit and not aligned. That means not only can two CPUs interfere
> with each other's lists and counters, but that could also happen
> cross-node.
>
> The hypothesis can be tested with perf, looking for abnormal cache
> misses. In this case, an intense allocating process bound to one CPU,
> with intermittent allocations on the adjacent CPU, should show
> unexpected cache line bounces. It would not be perfect, as collisions
> would happen anyway when the pcpu lists spill over to the buddy lists
> on either the alloc or free side, but in that case the cache misses
> would happen on different instructions.
>
> --
> Mel Gorman
> SUSE Labs

Thanks,
Feng

2020-08-31 08:56:36

by Mel Gorman

[permalink] [raw]
Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

On Mon, Aug 31, 2020 at 04:23:06PM +0800, Feng Tang wrote:
> On Mon, Aug 31, 2020 at 08:56:11AM +0100, Mel Gorman wrote:
> > On Mon, Aug 31, 2020 at 10:16:38AM +0800, Feng Tang wrote:
> > > > So why don't you define both variables with DEFINE_PER_CPU_ALIGNED and
> > > > check if all your bad measurements go away this way?
> > >
> > > For 'arch_freq_scale', there are other percpu variables in the same
> > > smpboot.c: 'arch_prev_aperf' and 'arch_prev_mperf', and all three
> > > variables are accessed in the hot path arch_scale_freq_tick(), so I
> > > didn't touch it. Alternatively, we could align the first of these three
> > > variables so that all of them sit in one cacheline.
> > >
> > > > You'd also need to check whether there's no detrimental effect from
> > > > this change on other, i.e., !KNL platforms, and I think there won't
> > > > be because both variables will be in separate cachelines then and all
> > > > should be good.
> > >
> > > Yes, changes of this kind should be verified on other platforms.
> > >
> > > One thing still puzzles me: the two variables are per-CPU, and there is
> > > no case of many CPUs contending for them, so why does the cacheline
> > > layout matter? I suspect it is due to contention on the same cache set,
> > > and I am trying to find some way to test it.
> > >
> >
> > Because if you have two structures that are per-cpu and not cache-aligned,
> > then a write in one can bounce the cache line in another due to the
> > cache coherency protocol. It's generally called "false cache line
> > sharing". https://en.wikipedia.org/wiki/False_sharing has basic examples
> > (let's not get into whether Wikipedia is a valid citation source; there
> > are books on the topic if someone really cares).
>
> The 'arch_freq_scale' and 'tsc_adjust' percpu variables are only
> accessed by their own CPU, and usually no other CPU will touch them;

Read "false sharing again". Two adjacent per-CPU structures can still
interfere with each other if the structures happen to cross a cache line
boundary and are not cache aligned.

> the hot path only uses this_cpu_read/write/ptr. And each CPU's static
> percpu variables are all packed together in one area (256KB per CPU on
> this test box),

If the structure is not cache-aligned (probably 64KB) then there is a
boundary where cache line bounces can occur.

--
Mel Gorman
SUSE Labs

2020-08-31 12:54:10

by Feng Tang

[permalink] [raw]
Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression

On Mon, Aug 31, 2020 at 09:55:17AM +0100, Mel Gorman wrote:
> On Mon, Aug 31, 2020 at 04:23:06PM +0800, Feng Tang wrote:
> > On Mon, Aug 31, 2020 at 08:56:11AM +0100, Mel Gorman wrote:
> > > On Mon, Aug 31, 2020 at 10:16:38AM +0800, Feng Tang wrote:
> > > > > So why don't you define both variables with DEFINE_PER_CPU_ALIGNED and
> > > > > check if all your bad measurements go away this way?
> > > >
> > > > For 'arch_freq_scale', there are other percpu variables in the same
> > > > smpboot.c: 'arch_prev_aperf' and 'arch_prev_mperf', and all three
> > > > variables are accessed in the hot path arch_scale_freq_tick(), so I
> > > > didn't touch it. Alternatively, we could align the first of these three
> > > > variables so that all of them sit in one cacheline.
> > > >
> > > > > You'd also need to check whether there's no detrimental effect from
> > > > > this change on other, i.e., !KNL platforms, and I think there won't
> > > > > be because both variables will be in separate cachelines then and all
> > > > > should be good.
> > > >
> > > > Yes, changes of this kind should be verified on other platforms.
> > > >
> > > > One thing still puzzles me: the two variables are per-CPU, and there is
> > > > no case of many CPUs contending for them, so why does the cacheline
> > > > layout matter? I suspect it is due to contention on the same cache set,
> > > > and I am trying to find some way to test it.
> > > >
> > >
> > > Because if you have two structures that are per-cpu and not cache-aligned,
> > > then a write in one can bounce the cache line in another due to the
> > > cache coherency protocol. It's generally called "false cache line
> > > sharing". https://en.wikipedia.org/wiki/False_sharing has basic examples
> > > (let's not get into whether Wikipedia is a valid citation source; there
> > > are books on the topic if someone really cares).
> >
> > The 'arch_freq_scale' and 'tsc_adjust' percpu variables are only
> > accessed by their own CPU, and usually no other CPU will touch them;
>
> Read "false sharing again". Two adjacent per-CPU structures can still
> interfere with each other if the structures happen to cross a cache line
> boundary and are not cache aligned.

Sure, will recheck that wiki, thanks.

Some cache info about the test box: it's a Xeon Phi platform with 72
cores, each with 4 HT threads, so there are 288 CPUs. The L1 D-cache
and I-cache are both 32KB. There is no L3 cache, and every 2 cores
share one 1MB L2 cache. The L1 D-cache is 64 sets, 8 ways; the L2
cache is 1024 sets, 16 ways.
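
To make the cache-set idea concrete, here is a quick back-of-envelope
check (my speculation only: it assumes the percpu areas sit back to
back at a 256KB stride and that virtual and physical strides match):

#include <stdio.h>

int main(void)
{
        /* hypothetical base address; only the stride matters here */
        unsigned long base = 0;
        unsigned long stride = 256UL * 1024;    /* percpu area per CPU */
        int cpu;

        /* the 8 HT threads that share one L2 on this box */
        for (cpu = 0; cpu < 8; cpu++) {
                unsigned long addr = base + cpu * stride;

                printf("cpu %d: L1D set %lu/64, L2 set %lu/1024\n", cpu,
                       (addr >> 6) % 64, (addr >> 6) % 1024);
        }
        return 0;
}

Since 256KB is a multiple of both the 4KB L1 way size (64 sets * 64B)
and the 64KB L2 way size (1024 sets * 64B), every CPU's copy of the
same percpu variable aliases into the same set, so set conflicts at
least look plausible.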

Thanks,
Feng

> > the hot path only uses this_cpu_read/write/ptr. And each CPU's static
> > percpu variables are all packed together in one area (256KB per CPU on
> > this test box),
>
> If the structure is not cache-aligned (probably 64KB) then there is a
> boundary where cache line bounces can occur.
>
> --
> Mel Gorman
> SUSE Labs