2020-06-22 01:13:27

by Chen, Rong A

Subject: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

Greetings,

FYI, we noticed a -33.4% regression of vm-scalability.throughput due to commit:


commit: c0d0381ade79885c04a04c303284b040616b116e ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: vm-scalability
on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory
with the following parameters:

    runtime: 300s
    size: 8T
    test: anon-cow-seq-hugetlb
    cpufreq_governor: performance
    ucode: 0x11

test-description: The motivation behind this suite is to exercise functions and regions of the mm/ subsystem of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
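
A note on the workload, inferred from the case name (the actual case script lives
in the vm-scalability repo above): "anon-cow-seq-hugetlb" populates anonymous
private hugetlb memory, forks, and then writes through the mapping sequentially
in the child so that every huge page takes a copy-on-write fault. A minimal,
self-contained C sketch of that access pattern (file name, page count and the
2MB huge page size are illustrative assumptions, not taken from the job file):

/*
 * cow-hugetlb.c -- minimal illustration of an "anon-cow-seq-hugetlb" style
 * access pattern; this is NOT the vm-scalability case script itself.
 *
 * Assumes 2MB huge pages and enough pages reserved, e.g.:
 *   echo 64 > /proc/sys/vm/nr_hugepages
 * Build: gcc -O2 -o cow-hugetlb cow-hugetlb.c
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define HPAGE_SIZE (2UL << 20)   /* assumed huge page size */
#define NR_HPAGES  64UL

int main(void)
{
    size_t len = NR_HPAGES * HPAGE_SIZE;
    char *buf;
    pid_t pid;

    buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");   /* most likely: no hugepages reserved */
        return 1;
    }

    /* Parent populates every huge page once. */
    memset(buf, 0xaa, len);

    pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        /* Child: one sequential write per huge page => one hugetlb COW fault each. */
        for (size_t off = 0; off < len; off += HPAGE_SIZE)
            buf[off] = 0x55;
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    munmap(buf, len);
    return 0;
}

In the perf profile further down, that child write path is what shows up as
hugetlb_fault -> hugetlb_cow -> copy_user_huge_page.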



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

    git clone https://github.com/intel/lkp-tests.git
    cd lkp-tests
    bin/lkp install job.yaml   # job file is attached in this email
    bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase/ucode:
gcc-7/performance/x86_64-rhel-7.6/debian-x86_64-20191114.cgz/300s/8T/lkp-knm01/anon-cow-seq-hugetlb/vm-scalability/0x11

commit:
49aef7175c ("mm/memblock.c: remove redundant assignment to variable max_addr")
c0d0381ade ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")

49aef7175cc6eb70 c0d0381ade79885c04a04c30328
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
1:4 -25% :4 dmesg.WARNING:at#for_ip_swapgs_restore_regs_and_return_to_usermode/0x
:4 25% 1:4 dmesg.WARNING:at_ip__mutex_lock/0x
:4 25% 1:4 dmesg.WARNING:at_ip_perf_event_mmap_output/0x
%stddev %change %stddev
\ | \
37643 -29.5% 26549 ± 4% vm-scalability.median
9.18 ± 16% +9.3 18.48 ± 21% vm-scalability.median_stddev%
12666048 -33.4% 8429873 ± 4% vm-scalability.throughput
1701516 -19.5% 1369683 ± 5% vm-scalability.time.minor_page_faults
13447 ± 3% -19.3% 10848 vm-scalability.time.percent_of_cpu_this_job_got
26620 ± 4% -10.5% 23830 vm-scalability.time.system_time
14297 ± 3% -35.3% 9255 ± 4% vm-scalability.time.user_time
125377 ± 2% +621.3% 904348 ± 4% vm-scalability.time.voluntary_context_switches
2.468e+09 -19.6% 1.984e+09 ± 5% vm-scalability.workload
3.807e+10 ± 4% +20.9% 4.605e+10 ± 2% cpuidle.C1.time
9524 ± 10% -17.3% 7873 ± 4% numa-vmstat.node0.nr_mapped
2494252 -12.4% 2184278 ± 3% numa-numastat.node0.local_node
2494194 -12.4% 2184223 ± 3% numa-numastat.node0.numa_hit
49.26 ± 3% +9.4 58.62 mpstat.cpu.all.idle%
32.89 ± 4% -3.2 29.70 mpstat.cpu.all.sys%
17.79 ± 3% -6.2 11.64 ± 4% mpstat.cpu.all.usr%
4856 +47.6% 7169 numa-meminfo.node0.HugePages_Surp
4856 +47.6% 7169 numa-meminfo.node0.HugePages_Total
37725 ± 10% -17.7% 31054 ± 4% numa-meminfo.node0.Mapped
12274212 +38.3% 16981169 numa-meminfo.node0.MemUsed
49.00 ± 3% +18.9% 58.25 vmstat.cpu.id
17.00 ± 4% -33.8% 11.25 ± 3% vmstat.cpu.us
138.25 ± 2% -19.5% 111.25 ± 2% vmstat.procs.r
5836 ± 2% +88.0% 10969 ± 3% vmstat.system.cs
4855 +48.0% 7186 meminfo.HugePages_Surp
4856 +48.0% 7186 meminfo.HugePages_Total
9946017 +48.0% 14718559 meminfo.Hugetlb
48593 ± 8% -13.9% 41859 ± 3% meminfo.Mapped
12877361 +36.8% 17619364 meminfo.Memused
81607 ± 2% +29.1% 105324 meminfo.max_used_kB
2890 ±100% +285.4% 11139 ± 3% perf-stat.i.context-switches
298.31 ±100% +243.7% 1025 ± 2% perf-stat.i.cpu-migrations
0.11 ±100% +119.3% 0.24 perf-stat.i.ipc
0.04 ±100% +129.7% 0.10 ± 4% perf-stat.i.metric.K/sec
0.11 ±100% +122.1% 0.24 perf-stat.overall.ipc
3107 ±100% +137.3% 7372 ± 3% perf-stat.overall.path-length
2836 ±100% +284.4% 10902 ± 3% perf-stat.ps.context-switches
275.01 ±100% +257.9% 984.31 ± 2% perf-stat.ps.cpu-migrations
111986 -11.7% 98924 ± 2% slabinfo.anon_vma.active_objs
2434 -11.7% 2150 ± 2% slabinfo.anon_vma.active_slabs
111986 -11.7% 98924 ± 2% slabinfo.anon_vma.num_objs
2434 -11.7% 2150 ± 2% slabinfo.anon_vma.num_slabs
258615 -9.1% 235038 ± 2% slabinfo.anon_vma_chain.active_objs
4041 -9.1% 3672 ± 2% slabinfo.anon_vma_chain.active_slabs
258670 -9.1% 235049 ± 2% slabinfo.anon_vma_chain.num_objs
4041 -9.1% 3672 ± 2% slabinfo.anon_vma_chain.num_slabs
1500 -17.8% 1232 ± 5% slabinfo.hugetlbfs_inode_cache.active_objs
1500 -17.8% 1232 ± 5% slabinfo.hugetlbfs_inode_cache.num_objs
682.00 ± 6% +15.9% 790.50 ± 6% slabinfo.numa_policy.active_objs
682.00 ± 6% +15.9% 790.50 ± 6% slabinfo.numa_policy.num_objs
1082230 -19.6% 869890 ± 5% proc-vmstat.htlb_buddy_alloc_success
1721553 -6.8% 1604364 proc-vmstat.nr_dirty_background_threshold
3447317 -6.8% 3212652 proc-vmstat.nr_dirty_threshold
17371789 -6.8% 16197198 proc-vmstat.nr_free_pages
351.00 +3.8% 364.25 proc-vmstat.nr_inactive_file
39705 +1.4% 40269 proc-vmstat.nr_kernel_stack
12252 ± 7% -13.7% 10570 ± 3% proc-vmstat.nr_mapped
2632 +8.8% 2863 proc-vmstat.nr_page_table_pages
98880 -1.6% 97341 proc-vmstat.nr_slab_unreclaimable
351.00 +3.8% 364.25 proc-vmstat.nr_zone_inactive_file
6285 ± 50% -90.8% 577.50 ± 47% proc-vmstat.numa_hint_faults
6285 ± 50% -90.8% 577.50 ± 47% proc-vmstat.numa_hint_faults_local
2514799 -12.3% 2205532 ± 3% proc-vmstat.numa_hit
2514798 -12.3% 2205531 ± 3% proc-vmstat.numa_local
5.557e+08 -19.6% 4.469e+08 ± 5% proc-vmstat.pgalloc_normal
2510883 -12.9% 2187221 ± 3% proc-vmstat.pgfault
5.553e+08 -19.5% 4.468e+08 ± 5% proc-vmstat.pgfree
60107 ± 2% -19.7% 48269 ± 17% sched_debug.cfs_rq:/.exec_clock.avg
71511 ± 2% -23.4% 54794 ± 16% sched_debug.cfs_rq:/.exec_clock.max
44464 ± 7% -16.9% 36954 ± 18% sched_debug.cfs_rq:/.exec_clock.min
4619 ± 4% -47.8% 2412 ± 11% sched_debug.cfs_rq:/.exec_clock.stddev
614.00 ± 10% +371.9% 2897 ±124% sched_debug.cfs_rq:/.load_avg.max
49.58 ± 11% +321.1% 208.79 ±123% sched_debug.cfs_rq:/.load_avg.stddev
11526482 ± 4% -37.6% 7189313 ± 17% sched_debug.cfs_rq:/.min_vruntime.avg
13399855 ± 4% -40.0% 8039878 ± 15% sched_debug.cfs_rq:/.min_vruntime.max
8425626 ± 8% -37.2% 5294770 ± 18% sched_debug.cfs_rq:/.min_vruntime.min
788640 ± 4% -56.4% 343805 ± 11% sched_debug.cfs_rq:/.min_vruntime.stddev
0.37 ± 8% -40.0% 0.22 ± 10% sched_debug.cfs_rq:/.nr_running.avg
21.17 ± 5% +134.9% 49.72 ± 21% sched_debug.cfs_rq:/.nr_spread_over.avg
56.15 ± 13% +60.7% 90.25 ± 19% sched_debug.cfs_rq:/.nr_spread_over.max
6.95 ± 7% +289.4% 27.06 ± 23% sched_debug.cfs_rq:/.nr_spread_over.min
7.32 ± 5% +39.4% 10.20 ± 19% sched_debug.cfs_rq:/.nr_spread_over.stddev
413.38 ± 11% -27.5% 299.85 ± 14% sched_debug.cfs_rq:/.runnable_avg.avg
834.44 ± 53% -65.7% 286.48 ± 5% sched_debug.cfs_rq:/.runnable_avg.stddev
2625208 ± 11% -29.7% 1845657 ± 13% sched_debug.cfs_rq:/.spread0.avg
4483653 ± 6% -39.8% 2699533 ± 11% sched_debug.cfs_rq:/.spread0.max
-475138 -94.3% -26940 sched_debug.cfs_rq:/.spread0.min
801419 ± 4% -56.8% 345835 ± 11% sched_debug.cfs_rq:/.spread0.stddev
371.84 ± 8% -22.5% 288.10 ± 14% sched_debug.cfs_rq:/.util_avg.avg
382.64 ± 4% -32.5% 258.09 ± 4% sched_debug.cfs_rq:/.util_avg.stddev
330.75 ± 10% -54.8% 149.57 ± 7% sched_debug.cfs_rq:/.util_est_enqueued.avg
358.21 ± 6% -27.6% 259.25 ± 2% sched_debug.cfs_rq:/.util_est_enqueued.stddev
4047967 ± 4% +17.6% 4760305 ± 4% sched_debug.cpu.avg_idle.avg
561715 ± 52% +116.4% 1215479 ± 7% sched_debug.cpu.avg_idle.min
731.80 ± 6% -38.6% 449.23 ± 8% sched_debug.cpu.clock.stddev
731.80 ± 6% -38.6% 449.23 ± 8% sched_debug.cpu.clock_task.stddev
4741 ± 13% -43.0% 2702 ± 18% sched_debug.cpu.curr->pid.avg
5519 ± 7% -15.8% 4650 ± 11% sched_debug.cpu.curr->pid.stddev
2119875 ± 5% +16.3% 2465873 ± 4% sched_debug.cpu.max_idle_balance_cost.avg
1107931 ± 2% +24.6% 1380330 ± 3% sched_debug.cpu.max_idle_balance_cost.min
581556 ± 5% +11.1% 646020 ± 5% sched_debug.cpu.max_idle_balance_cost.stddev
0.00 ± 4% -30.8% 0.00 ± 11% sched_debug.cpu.next_balance.stddev
0.31 ± 12% -32.6% 0.21 ± 10% sched_debug.cpu.nr_running.avg
5116 +46.1% 7477 ± 11% sched_debug.cpu.nr_switches.avg
2301 ± 5% +85.0% 4257 ± 22% sched_debug.cpu.nr_switches.min
0.09 ± 44% +300.2% 0.34 ± 17% sched_debug.cpu.nr_uninterruptible.avg
296.40 ± 3% +24.1% 367.95 ± 8% sched_debug.cpu.nr_uninterruptible.max
-50.40 +38.8% -69.98 sched_debug.cpu.nr_uninterruptible.min
23.04 ± 3% +53.3% 35.32 ± 12% sched_debug.cpu.nr_uninterruptible.stddev
2343 +98.2% 4646 ± 18% sched_debug.cpu.sched_count.avg
1110 ± 2% +193.0% 3252 ± 23% sched_debug.cpu.sched_count.min
913.60 ± 2% +125.6% 2060 ± 18% sched_debug.cpu.sched_goidle.avg
356.95 +296.7% 1416 ± 25% sched_debug.cpu.sched_goidle.min
1148 +102.6% 2326 ± 18% sched_debug.cpu.ttwu_count.avg
510.15 ± 5% +80.8% 922.26 ± 21% sched_debug.cpu.ttwu_count.min
148.25 ± 6% -11.3% 131.52 ± 6% sched_debug.cpu.ttwu_local.stddev
21.96 ± 6% -10.0 11.99 ± 11% perf-profile.calltrace.cycles-pp.hugetlb_cow.hugetlb_fault.handle_mm_fault.do_page_fault.page_fault
19.05 ± 5% -8.1 10.96 ± 11% perf-profile.calltrace.cycles-pp.copy_user_huge_page.hugetlb_cow.hugetlb_fault.handle_mm_fault.do_page_fault
18.72 ± 5% -8.0 10.76 ± 11% perf-profile.calltrace.cycles-pp.copy_subpage.copy_user_huge_page.hugetlb_cow.hugetlb_fault.handle_mm_fault
17.82 ± 4% -7.6 10.17 ± 10% perf-profile.calltrace.cycles-pp.copy_page.copy_subpage.copy_user_huge_page.hugetlb_cow.hugetlb_fault
2.22 ± 44% -1.5 0.68 ± 23% perf-profile.calltrace.cycles-pp.alloc_huge_page.hugetlb_cow.hugetlb_fault.handle_mm_fault.do_page_fault
1.03 ± 22% -0.6 0.40 ±102% perf-profile.calltrace.cycles-pp.apic_timer_interrupt.do_access
0.91 ± 23% -0.6 0.32 ±103% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.do_access
0.81 ± 19% -0.2 0.60 ± 14% perf-profile.calltrace.cycles-pp.tlb_finish_mmu.exit_mmap.mmput.do_exit.do_group_exit
0.81 ± 19% -0.2 0.60 ± 14% perf-profile.calltrace.cycles-pp.tlb_flush_mmu.tlb_finish_mmu.exit_mmap.mmput.do_exit
0.80 ± 19% -0.2 0.60 ± 14% perf-profile.calltrace.cycles-pp.release_pages.tlb_flush_mmu.tlb_finish_mmu.exit_mmap.mmput
0.79 ± 19% -0.2 0.59 ± 15% perf-profile.calltrace.cycles-pp.__free_huge_page.release_pages.tlb_flush_mmu.tlb_finish_mmu.exit_mmap
1.44 ± 22% +0.6 2.00 ± 28% perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues
0.71 ± 80% +1.5 2.16 ± 33% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.osq_lock
1.03 ± 55% +1.6 2.67 ± 32% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.osq_lock.__mutex_lock
1.24 ± 57% +2.0 3.29 ± 26% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.osq_lock.__mutex_lock.hugetlb_fault
1.38 ± 55% +2.3 3.66 ± 25% perf-profile.calltrace.cycles-pp.apic_timer_interrupt.osq_lock.__mutex_lock.hugetlb_fault.handle_mm_fault
29.76 ± 21% +20.1 49.83 ± 13% perf-profile.calltrace.cycles-pp.__mutex_lock.hugetlb_fault.handle_mm_fault.do_page_fault.page_fault
15.71 ± 27% +23.2 38.88 ± 18% perf-profile.calltrace.cycles-pp.osq_lock.__mutex_lock.hugetlb_fault.handle_mm_fault.do_page_fault
21.96 ± 6% -10.0 12.00 ± 11% perf-profile.children.cycles-pp.hugetlb_cow
19.36 ± 5% -8.2 11.11 ± 11% perf-profile.children.cycles-pp.copy_user_huge_page
18.74 ± 5% -8.0 10.77 ± 11% perf-profile.children.cycles-pp.copy_subpage
18.31 ± 5% -7.8 10.54 ± 11% perf-profile.children.cycles-pp.copy_page
16.95 ± 20% -7.0 10.00 ± 53% perf-profile.children.cycles-pp.do_rw_once
2.46 ± 41% -1.7 0.80 ± 14% perf-profile.children.cycles-pp.alloc_huge_page
3.11 ± 36% -1.6 1.48 ± 4% perf-profile.children.cycles-pp._raw_spin_lock
2.64 ± 44% -1.5 1.10 ± 6% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
1.60 ± 38% -1.0 0.57 ± 14% perf-profile.children.cycles-pp.alloc_surplus_huge_page
1.81 ± 12% -0.5 1.34 ± 4% perf-profile.children.cycles-pp.do_syscall_64
1.81 ± 12% -0.5 1.35 ± 4% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.76 ± 28% -0.4 0.38 ± 12% perf-profile.children.cycles-pp.alloc_fresh_huge_page
1.21 ± 10% -0.4 0.85 ± 11% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.40 ± 52% -0.3 0.09 ± 20% perf-profile.children.cycles-pp.prep_new_huge_page
0.81 ± 19% -0.2 0.60 ± 14% perf-profile.children.cycles-pp.tlb_finish_mmu
0.81 ± 19% -0.2 0.60 ± 14% perf-profile.children.cycles-pp.tlb_flush_mmu
0.80 ± 18% -0.2 0.60 ± 15% perf-profile.children.cycles-pp.release_pages
0.79 ± 19% -0.2 0.59 ± 15% perf-profile.children.cycles-pp.__free_huge_page
0.44 ± 13% -0.2 0.26 ± 14% perf-profile.children.cycles-pp.___might_sleep
0.20 ± 14% -0.1 0.06 ± 6% perf-profile.children.cycles-pp.update_and_free_page
0.32 ± 12% -0.1 0.21 ± 6% perf-profile.children.cycles-pp.ksys_read
0.29 ± 13% -0.1 0.18 ± 10% perf-profile.children.cycles-pp.read
0.31 ± 11% -0.1 0.20 ± 8% perf-profile.children.cycles-pp.vfs_read
0.18 ± 17% -0.1 0.09 ± 8% perf-profile.children.cycles-pp._cond_resched
0.16 ± 9% -0.1 0.09 ± 17% perf-profile.children.cycles-pp.__libc_start_main
0.16 ± 11% -0.1 0.08 ± 17% perf-profile.children.cycles-pp.main
0.19 ± 14% -0.1 0.12 ± 17% perf-profile.children.cycles-pp.x86_pmu_disable
0.26 ± 8% -0.1 0.19 ± 6% perf-profile.children.cycles-pp.ksys_write
0.26 ± 9% -0.1 0.19 ± 8% perf-profile.children.cycles-pp.vfs_write
0.15 ± 12% -0.1 0.09 ± 20% perf-profile.children.cycles-pp.new_sync_read
0.15 ± 11% -0.1 0.09 ± 17% perf-profile.children.cycles-pp.pipe_read
0.24 ± 9% -0.1 0.18 ± 7% perf-profile.children.cycles-pp.new_sync_write
0.08 ± 6% -0.1 0.03 ±100% perf-profile.children.cycles-pp.__perf_sw_event
0.11 ± 14% -0.0 0.06 ± 14% perf-profile.children.cycles-pp.__wake_up_common
0.12 ± 3% -0.0 0.08 ± 10% perf-profile.children.cycles-pp.schedule_timeout
0.12 ± 13% -0.0 0.07 ± 17% perf-profile.children.cycles-pp.__wake_up_common_lock
0.10 ± 15% -0.0 0.06 ± 15% perf-profile.children.cycles-pp.autoremove_wake_function
0.11 ± 29% -0.0 0.07 ± 10% perf-profile.children.cycles-pp.swapgs_restore_regs_and_return_to_usermode
0.26 ± 6% -0.0 0.22 ± 5% perf-profile.children.cycles-pp.tick_program_event
0.09 ± 31% -0.0 0.05 ± 8% perf-profile.children.cycles-pp.prepare_exit_to_usermode
0.09 ± 9% -0.0 0.06 ± 13% perf-profile.children.cycles-pp.rcu_gp_kthread
0.08 ± 10% -0.0 0.06 ± 14% perf-profile.children.cycles-pp.cmd_stat
0.01 ±173% +0.0 0.06 ± 11% perf-profile.children.cycles-pp.select_task_rq_fair
0.11 ± 11% +0.0 0.16 ± 9% perf-profile.children.cycles-pp.finish_task_switch
0.00 +0.1 0.06 ± 14% perf-profile.children.cycles-pp.rwsem_wake
0.00 +0.1 0.06 ± 7% perf-profile.children.cycles-pp.arch_stack_walk
0.00 +0.1 0.06 ± 13% perf-profile.children.cycles-pp.dequeue_task_fair
0.00 +0.1 0.07 ± 13% perf-profile.children.cycles-pp.stack_trace_save_tsk
0.13 ± 6% +0.1 0.21 ± 10% perf-profile.children.cycles-pp.schedule_idle
0.06 ± 11% +0.1 0.14 ± 10% perf-profile.children.cycles-pp.enqueue_entity
0.03 ±100% +0.1 0.11 ± 7% perf-profile.children.cycles-pp.update_load_avg
0.00 +0.1 0.09 ± 12% perf-profile.children.cycles-pp.__account_scheduler_latency
0.07 ± 17% +0.1 0.16 ± 11% perf-profile.children.cycles-pp.activate_task
0.07 ± 17% +0.1 0.16 ± 11% perf-profile.children.cycles-pp.ttwu_do_activate
0.07 ± 17% +0.1 0.16 ± 11% perf-profile.children.cycles-pp.enqueue_task_fair
0.00 +0.1 0.09 ± 8% perf-profile.children.cycles-pp.wake_up_q
0.14 ± 24% +0.1 0.27 ± 13% perf-profile.children.cycles-pp.schedule_preempt_disabled
0.00 +0.1 0.14 ± 15% perf-profile.children.cycles-pp.sched_ttwu_pending
1.17 ± 13% +0.4 1.56 ± 8% perf-profile.children.cycles-pp.load_balance
0.99 ± 11% +0.4 1.41 ± 9% perf-profile.children.cycles-pp.find_busiest_group
0.96 ± 11% +0.4 1.38 ± 9% perf-profile.children.cycles-pp.update_sd_lb_stats
0.66 ± 9% +0.5 1.20 ± 11% perf-profile.children.cycles-pp.newidle_balance
0.53 ± 7% +0.6 1.11 ± 12% perf-profile.children.cycles-pp.pick_next_task_fair
0.71 ± 8% +0.6 1.35 ± 10% perf-profile.children.cycles-pp.schedule
0.00 +0.7 0.68 ± 17% perf-profile.children.cycles-pp.rwsem_down_read_slowpath
0.86 ± 7% +0.7 1.58 ± 9% perf-profile.children.cycles-pp.__schedule
52.14 ± 14% +10.7 62.86 ± 9% perf-profile.children.cycles-pp.page_fault
52.09 ± 14% +10.7 62.83 ± 9% perf-profile.children.cycles-pp.do_page_fault
51.96 ± 14% +10.8 62.74 ± 9% perf-profile.children.cycles-pp.handle_mm_fault
51.86 ± 14% +10.8 62.67 ± 9% perf-profile.children.cycles-pp.hugetlb_fault
29.77 ± 21% +20.1 49.84 ± 13% perf-profile.children.cycles-pp.__mutex_lock
15.83 ± 27% +23.4 39.18 ± 18% perf-profile.children.cycles-pp.osq_lock
16.53 ± 4% -7.1 9.48 ± 9% perf-profile.self.cycles-pp.copy_page
15.27 ± 22% -6.4 8.90 ± 53% perf-profile.self.cycles-pp.do_rw_once
2.38 ± 42% -1.4 1.02 ± 7% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.70 ± 9% -0.2 0.48 ± 14% perf-profile.self.cycles-pp.perf_mux_hrtimer_handler
0.38 ± 2% -0.2 0.21 ± 11% perf-profile.self.cycles-pp.copy_subpage
0.33 ± 4% -0.2 0.17 ± 11% perf-profile.self.cycles-pp.copy_user_huge_page
0.18 ± 18% -0.1 0.05 ± 8% perf-profile.self.cycles-pp.update_and_free_page
0.19 ± 14% -0.1 0.11 ± 18% perf-profile.self.cycles-pp.x86_pmu_disable
0.19 ± 6% -0.1 0.12 ± 10% perf-profile.self.cycles-pp.___might_sleep
0.35 ± 14% -0.1 0.28 ± 8% perf-profile.self.cycles-pp.run_timer_softirq
0.18 ± 10% -0.1 0.12 ± 6% perf-profile.self.cycles-pp.get_page_from_freelist
0.26 ± 6% -0.0 0.22 ± 5% perf-profile.self.cycles-pp.tick_program_event
0.07 -0.0 0.06 ± 7% perf-profile.self.cycles-pp.try_to_wake_up
0.00 +0.1 0.06 ± 9% perf-profile.self.cycles-pp.update_load_avg
0.73 ± 12% +0.3 1.06 ± 8% perf-profile.self.cycles-pp.update_sd_lb_stats
14.37 ± 25% +21.2 35.60 ± 18% perf-profile.self.cycles-pp.osq_lock
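
The profile shift above -- less time copying (copy_page, hugetlb_cow), much more
time in __mutex_lock/osq_lock under hugetlb_fault, plus a new
rwsem_down_read_slowpath entry -- is consistent with the extra serialization the
commit title describes. For orientation only, below is a trimmed, approximate C
sketch of the fault-path locking in mainline hugetlb_fault() around this time;
it is not the commit's actual diff, and the helper usage is simplified:

static vm_fault_t hugetlb_fault_locking_sketch(struct mm_struct *mm,
                                               struct vm_area_struct *vma,
                                               unsigned long address)
{
    struct hstate *h = hstate_vma(vma);
    struct address_space *mapping = vma->vm_file->f_mapping;
    pgoff_t idx;
    u32 hash;
    pte_t *ptep;

    /*
     * What c0d0381ade adds (per its title): take the mapping's i_mmap_rwsem
     * in read mode and hold it across the fault so huge PMD sharing cannot
     * race with it.  Under contention this is the rwsem_down_read_slowpath
     * time in the profile above.
     */
    i_mmap_lock_read(mapping);

    ptep = huge_pte_alloc(mm, address, huge_page_size(h));
    if (!ptep) {
        i_mmap_unlock_read(mapping);
        return VM_FAULT_OOM;
    }

    /*
     * Pre-existing per-hstate hashed fault mutex; contention on it is the
     * __mutex_lock/osq_lock time under hugetlb_fault in the profile.
     */
    idx = vma_hugecache_offset(h, vma, address);
    hash = hugetlb_fault_mutex_hash(mapping, idx);
    mutex_lock(&hugetlb_fault_mutex_table[hash]);

    /* ... hugetlb_no_page()/hugetlb_cow() work elided ... */

    mutex_unlock(&hugetlb_fault_mutex_table[hash]);
    i_mmap_unlock_read(mapping);
    return 0;
}

With 288 threads COW-faulting the same mapping in parallel, more cycles go to
waiting on these locks and fewer to copying huge pages, which lines up with the
throughput, user_time and voluntary_context_switch changes reported above.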
22773 ± 3% +11.6% 25420 ± 2% softirqs.CPU0.SCHED
15462 ± 6% +24.5% 19248 ± 5% softirqs.CPU1.SCHED
8331 ± 11% +27.1% 10590 ± 11% softirqs.CPU10.RCU
14783 ± 4% +16.0% 17145 ± 4% softirqs.CPU10.SCHED
14236 ± 3% +15.6% 16462 ± 2% softirqs.CPU100.SCHED
14117 ± 3% +18.2% 16683 ± 3% softirqs.CPU101.SCHED
13918 ± 4% +18.4% 16472 ± 5% softirqs.CPU102.SCHED
14183 ± 6% +18.0% 16742 ± 3% softirqs.CPU103.SCHED
13795 ± 5% +19.8% 16528 softirqs.CPU104.SCHED
14326 ± 6% +17.7% 16859 softirqs.CPU105.SCHED
13842 ± 5% +19.4% 16530 ± 2% softirqs.CPU106.SCHED
14155 ± 4% +20.0% 16983 ± 2% softirqs.CPU107.SCHED
13907 ± 3% +18.6% 16492 ± 2% softirqs.CPU108.SCHED
14411 ± 7% +16.1% 16726 ± 2% softirqs.CPU109.SCHED
14354 +19.0% 17085 ± 2% softirqs.CPU11.SCHED
13689 ± 6% +21.8% 16667 ± 3% softirqs.CPU110.SCHED
13461 ± 6% +25.6% 16907 ± 3% softirqs.CPU111.SCHED
13738 ± 9% +23.2% 16925 softirqs.CPU112.SCHED
14135 ± 3% +20.5% 17032 softirqs.CPU113.SCHED
14477 ± 4% +20.8% 17491 ± 4% softirqs.CPU114.SCHED
123003 ± 2% -7.8% 113426 ± 2% softirqs.CPU114.TIMER
14014 ± 4% +23.4% 17299 softirqs.CPU115.SCHED
13528 ± 6% +24.4% 16824 ± 2% softirqs.CPU116.SCHED
13530 ± 6% +24.5% 16840 ± 2% softirqs.CPU117.SCHED
14263 ± 3% +16.1% 16557 ± 2% softirqs.CPU119.SCHED
7902 ± 2% +22.1% 9648 ± 6% softirqs.CPU12.RCU
14332 ± 3% +19.5% 17130 ± 4% softirqs.CPU12.SCHED
13476 ± 5% +26.5% 17047 softirqs.CPU120.SCHED
14240 ± 6% +20.7% 17183 softirqs.CPU121.SCHED
13161 ± 5% +27.0% 16708 ± 2% softirqs.CPU122.SCHED
13037 ± 7% +28.8% 16786 softirqs.CPU123.SCHED
14229 ± 5% +16.5% 16577 ± 3% softirqs.CPU124.SCHED
14427 ± 4% +15.4% 16652 softirqs.CPU125.SCHED
14426 ± 4% +16.3% 16771 softirqs.CPU126.SCHED
6680 +39.3% 9305 ± 22% softirqs.CPU127.RCU
13659 ± 3% +23.7% 16890 ± 3% softirqs.CPU128.SCHED
14396 ± 5% +17.9% 16979 ± 3% softirqs.CPU129.SCHED
14183 ± 7% +20.2% 17048 ± 2% softirqs.CPU13.SCHED
6700 ± 2% +35.8% 9097 ± 17% softirqs.CPU130.RCU
13949 ± 4% +22.2% 17051 ± 4% softirqs.CPU130.SCHED
14271 ± 4% +21.8% 17379 ± 2% softirqs.CPU131.SCHED
129315 ± 3% -7.0% 120290 softirqs.CPU131.TIMER
14135 ± 2% +16.0% 16396 ± 2% softirqs.CPU132.SCHED
13567 ± 4% +22.5% 16617 ± 3% softirqs.CPU133.SCHED
13912 ± 5% +20.1% 16712 ± 2% softirqs.CPU134.SCHED
14309 ± 5% +18.9% 17016 ± 2% softirqs.CPU135.SCHED
13637 ± 4% +20.3% 16402 ± 5% softirqs.CPU136.SCHED
13843 ± 4% +19.3% 16518 ± 2% softirqs.CPU137.SCHED
14049 ± 5% +19.2% 16750 ± 2% softirqs.CPU138.SCHED
14244 ± 7% +18.1% 16816 softirqs.CPU139.SCHED
14139 +22.1% 17271 ± 2% softirqs.CPU14.SCHED
13460 ± 8% +22.9% 16546 softirqs.CPU140.SCHED
13645 ± 6% +21.7% 16601 ± 2% softirqs.CPU141.SCHED
13278 ± 6% +25.5% 16661 ± 3% softirqs.CPU142.SCHED
12487 ± 5% +29.2% 16136 ± 2% softirqs.CPU143.SCHED
13440 ± 9% +22.8% 16502 softirqs.CPU144.SCHED
126064 ± 2% -8.1% 115836 softirqs.CPU144.TIMER
13995 ± 6% +21.2% 16966 ± 3% softirqs.CPU145.SCHED
13834 ± 3% +14.8% 15883 ± 6% softirqs.CPU146.SCHED
14297 ± 9% +17.1% 16735 softirqs.CPU147.SCHED
14500 ± 3% +15.2% 16711 softirqs.CPU148.SCHED
14419 ± 6% +17.5% 16945 softirqs.CPU149.SCHED
8018 ± 6% +24.4% 9977 ± 7% softirqs.CPU15.RCU
13832 ± 5% +28.2% 17737 ± 5% softirqs.CPU15.SCHED
14538 ± 5% +15.0% 16723 ± 2% softirqs.CPU150.SCHED
15092 +9.5% 16526 ± 4% softirqs.CPU152.SCHED
14533 ± 2% +15.6% 16804 softirqs.CPU153.SCHED
14415 ± 2% +14.8% 16554 ± 2% softirqs.CPU154.SCHED
14583 ± 2% +15.4% 16833 ± 2% softirqs.CPU155.SCHED
14851 ± 2% +11.8% 16598 ± 2% softirqs.CPU156.SCHED
6747 +35.2% 9124 ± 19% softirqs.CPU158.RCU
13839 ± 4% +25.9% 17429 ± 4% softirqs.CPU158.SCHED
15020 ± 5% +11.8% 16785 ± 2% softirqs.CPU159.SCHED
13468 ± 5% +25.8% 16945 ± 2% softirqs.CPU16.SCHED
128305 ± 3% -8.1% 117855 softirqs.CPU160.TIMER
15111 ± 2% +13.7% 17184 softirqs.CPU162.SCHED
14788 ± 2% +12.8% 16685 ± 3% softirqs.CPU163.SCHED
13903 ± 5% +21.5% 16889 softirqs.CPU164.SCHED
14547 ± 5% +13.5% 16510 ± 2% softirqs.CPU165.SCHED
14440 ± 4% +13.3% 16354 ± 3% softirqs.CPU166.SCHED
14450 ± 5% +14.1% 16494 ± 2% softirqs.CPU167.SCHED
131454 ± 3% -6.9% 122396 ± 2% softirqs.CPU167.TIMER
13683 ± 6% +23.0% 16824 ± 3% softirqs.CPU168.SCHED
14468 ± 5% +18.1% 17085 ± 3% softirqs.CPU169.SCHED
7703 +22.5% 9437 ± 4% softirqs.CPU17.RCU
14392 +19.8% 17237 softirqs.CPU17.SCHED
14570 ± 6% +16.9% 17038 ± 3% softirqs.CPU170.SCHED
14172 ± 5% +19.3% 16905 softirqs.CPU171.SCHED
14176 ± 3% +17.9% 16709 softirqs.CPU172.SCHED
127159 ± 2% -8.4% 116497 ± 2% softirqs.CPU172.TIMER
15034 ± 2% +11.1% 16698 ± 2% softirqs.CPU174.SCHED
14923 ± 8% +17.0% 17458 ± 5% softirqs.CPU175.SCHED
14164 ± 5% +18.8% 16833 ± 2% softirqs.CPU176.SCHED
127027 ± 2% -8.4% 116378 ± 2% softirqs.CPU176.TIMER
14359 ± 3% +16.9% 16788 ± 4% softirqs.CPU177.SCHED
14466 ± 3% +18.7% 17169 softirqs.CPU178.SCHED
14632 ± 2% +15.1% 16837 softirqs.CPU179.SCHED
123562 ± 3% -8.2% 113392 ± 2% softirqs.CPU179.TIMER
7677 +27.6% 9794 ± 2% softirqs.CPU18.RCU
14177 ± 2% +21.6% 17233 softirqs.CPU18.SCHED
14429 ± 7% +15.8% 16707 softirqs.CPU180.SCHED
15194 ± 5% +12.4% 17084 softirqs.CPU181.SCHED
13917 ± 3% +20.2% 16723 ± 3% softirqs.CPU182.SCHED
14587 ± 2% +15.1% 16786 ± 3% softirqs.CPU183.SCHED
14266 ± 3% +18.0% 16827 ± 2% softirqs.CPU184.SCHED
14661 ± 3% +16.3% 17053 softirqs.CPU185.SCHED
14222 ± 5% +21.0% 17202 ± 3% softirqs.CPU186.SCHED
15059 ± 2% +13.0% 17022 ± 3% softirqs.CPU187.SCHED
125388 ± 2% -7.6% 115887 ± 2% softirqs.CPU187.TIMER
13435 ± 4% +25.0% 16801 ± 3% softirqs.CPU188.SCHED
13434 ± 2% +28.8% 17297 ± 2% softirqs.CPU189.SCHED
7719 +27.0% 9806 ± 9% softirqs.CPU19.RCU
14075 ± 2% +20.7% 16986 softirqs.CPU19.SCHED
13735 ± 3% +20.4% 16536 ± 2% softirqs.CPU190.SCHED
13772 ± 3% +19.8% 16494 softirqs.CPU192.SCHED
13911 ± 2% +17.4% 16327 ± 2% softirqs.CPU193.SCHED
6417 +34.3% 8616 ± 17% softirqs.CPU194.RCU
14008 +17.6% 16468 ± 8% softirqs.CPU194.SCHED
14094 ± 4% +19.9% 16905 softirqs.CPU195.SCHED
14387 ± 7% +16.1% 16698 ± 5% softirqs.CPU196.SCHED
13704 ± 3% +23.7% 16952 softirqs.CPU198.SCHED
8214 ± 10% +23.7% 10157 ± 7% softirqs.CPU2.RCU
14431 +17.4% 16946 ± 4% softirqs.CPU2.SCHED
7770 +30.4% 10135 ± 8% softirqs.CPU20.RCU
13996 ± 2% +23.7% 17320 softirqs.CPU20.SCHED
14611 ± 5% +12.7% 16463 ± 3% softirqs.CPU200.SCHED
14897 ± 3% +15.5% 17199 softirqs.CPU201.SCHED
14105 ± 6% +21.7% 17167 ± 2% softirqs.CPU202.SCHED
127698 ± 3% -7.1% 118685 softirqs.CPU202.TIMER
14369 ± 6% +18.6% 17048 ± 2% softirqs.CPU203.SCHED
126668 ± 3% -8.8% 115539 softirqs.CPU203.TIMER
14461 ± 4% +13.1% 16356 ± 3% softirqs.CPU205.SCHED
13139 ± 10% +26.6% 16634 ± 3% softirqs.CPU206.SCHED
14808 ± 2% +11.8% 16550 ± 2% softirqs.CPU207.SCHED
13744 ± 5% +21.0% 16624 ± 4% softirqs.CPU208.SCHED
14039 ± 5% +18.7% 16661 ± 2% softirqs.CPU209.SCHED
127488 ± 2% -8.2% 116998 softirqs.CPU209.TIMER
7755 ± 2% +30.5% 10122 ± 12% softirqs.CPU21.RCU
13731 ± 5% +25.2% 17185 ± 2% softirqs.CPU21.SCHED
14755 ± 6% +16.1% 17123 ± 4% softirqs.CPU211.SCHED
13920 ± 3% +18.5% 16501 ± 2% softirqs.CPU212.SCHED
14435 ± 4% +16.1% 16765 ± 3% softirqs.CPU213.SCHED
13857 ± 6% +21.6% 16844 ± 2% softirqs.CPU214.SCHED
13417 ± 8% +16.5% 15635 ± 2% softirqs.CPU215.SCHED
127636 ± 3% -7.4% 118224 softirqs.CPU215.TIMER
13715 ± 3% +21.7% 16686 ± 3% softirqs.CPU216.SCHED
126421 ± 2% -7.8% 116544 softirqs.CPU216.TIMER
13885 ± 5% +18.2% 16416 softirqs.CPU217.SCHED
14003 ± 8% +20.5% 16875 softirqs.CPU218.SCHED
13891 ± 4% +23.9% 17210 ± 4% softirqs.CPU22.SCHED
14901 ± 2% +12.0% 16694 ± 2% softirqs.CPU220.SCHED
14623 ± 6% +17.3% 17153 softirqs.CPU221.SCHED
14613 ± 8% +13.9% 16643 ± 2% softirqs.CPU224.SCHED
14745 ± 4% +12.9% 16643 softirqs.CPU225.SCHED
126232 -8.3% 115717 softirqs.CPU225.TIMER
14290 ± 2% +17.2% 16753 softirqs.CPU226.SCHED
15229 ± 3% +10.3% 16800 ± 3% softirqs.CPU228.SCHED
14282 ± 6% +15.4% 16479 ± 3% softirqs.CPU23.SCHED
13785 ± 9% +24.2% 17126 ± 4% softirqs.CPU230.SCHED
126791 ± 4% -9.4% 114889 ± 3% softirqs.CPU231.TIMER
14749 ± 3% +15.0% 16958 ± 3% softirqs.CPU232.SCHED
15099 ± 5% +11.9% 16893 ± 2% softirqs.CPU233.SCHED
14825 ± 6% +12.9% 16738 softirqs.CPU234.SCHED
14272 ± 5% +18.7% 16945 ± 2% softirqs.CPU236.SCHED
15187 ± 2% +10.9% 16846 softirqs.CPU237.SCHED
15267 ± 7% +11.1% 16955 ± 3% softirqs.CPU239.SCHED
14156 ± 4% +16.3% 16466 ± 3% softirqs.CPU24.SCHED
14671 ± 3% +13.9% 16711 ± 3% softirqs.CPU240.SCHED
127318 ± 2% -7.6% 117651 ± 2% softirqs.CPU240.TIMER
14218 ± 5% +20.2% 17097 softirqs.CPU241.SCHED
14554 ± 3% +15.2% 16771 softirqs.CPU242.SCHED
14003 +19.9% 16785 softirqs.CPU243.SCHED
13761 ± 6% +21.5% 16724 ± 2% softirqs.CPU244.SCHED
14418 ± 6% +16.4% 16781 softirqs.CPU245.SCHED
14760 ± 7% +14.8% 16940 ± 4% softirqs.CPU246.SCHED
14891 ± 3% +12.3% 16722 ± 4% softirqs.CPU249.SCHED
13823 ± 4% +24.1% 17148 softirqs.CPU25.SCHED
14201 ± 4% +18.1% 16774 ± 3% softirqs.CPU250.SCHED
14620 ± 10% +18.5% 17331 softirqs.CPU251.SCHED
14877 ± 3% +11.3% 16564 ± 2% softirqs.CPU252.SCHED
14517 ± 6% +16.3% 16877 ± 3% softirqs.CPU254.SCHED
14284 ± 7% +18.0% 16858 ± 4% softirqs.CPU255.SCHED
121538 ± 3% -7.5% 112473 softirqs.CPU255.TIMER
14582 +14.8% 16736 softirqs.CPU256.SCHED
14057 ± 3% +22.8% 17259 softirqs.CPU257.SCHED
15076 ± 3% +13.4% 17097 ± 3% softirqs.CPU258.SCHED
14356 ± 3% +20.2% 17251 ± 3% softirqs.CPU259.SCHED
13998 ± 3% +20.1% 16806 ± 2% softirqs.CPU26.SCHED
13899 ± 4% +19.0% 16534 ± 5% softirqs.CPU260.SCHED
128277 ± 4% -7.4% 118758 softirqs.CPU260.TIMER
14288 ± 5% +20.7% 17251 softirqs.CPU261.SCHED
14456 ± 5% +15.1% 16644 softirqs.CPU262.SCHED
14114 ± 2% +17.8% 16624 ± 4% softirqs.CPU263.SCHED
13328 ± 4% +25.0% 16657 softirqs.CPU264.SCHED
13328 ± 4% +25.1% 16676 ± 4% softirqs.CPU266.SCHED
14367 ± 3% +19.5% 17165 ± 2% softirqs.CPU267.SCHED
122663 ± 3% -7.6% 113293 ± 2% softirqs.CPU268.TIMER
13527 ± 6% +25.2% 16940 softirqs.CPU27.SCHED
14341 ± 5% +19.8% 17187 softirqs.CPU270.SCHED
14345 ± 5% +14.3% 16403 ± 5% softirqs.CPU271.SCHED
14990 ± 3% +14.4% 17152 ± 4% softirqs.CPU272.SCHED
15007 ± 2% +13.4% 17012 softirqs.CPU273.SCHED
14668 ± 8% +17.0% 17157 ± 4% softirqs.CPU274.SCHED
14287 ± 6% +17.2% 16749 ± 4% softirqs.CPU277.SCHED
6301 +41.8% 8938 ± 16% softirqs.CPU278.RCU
13980 ± 10% +24.6% 17424 ± 2% softirqs.CPU278.SCHED
14763 ± 7% +16.0% 17119 softirqs.CPU279.SCHED
13682 ± 5% +23.9% 16950 softirqs.CPU28.SCHED
14204 ± 3% +17.0% 16617 ± 3% softirqs.CPU280.SCHED
14202 ± 3% +14.4% 16240 ± 2% softirqs.CPU281.SCHED
14629 ± 7% +16.3% 17013 ± 3% softirqs.CPU283.SCHED
13692 ± 6% +22.7% 16795 softirqs.CPU284.SCHED
14194 ± 7% +17.5% 16679 ± 3% softirqs.CPU285.SCHED
123721 ± 3% -8.1% 113741 ± 3% softirqs.CPU285.TIMER
14757 ± 7% +15.5% 17050 softirqs.CPU286.SCHED
13560 ± 6% +17.2% 15894 softirqs.CPU287.SCHED
14201 ± 4% +19.9% 17020 ± 2% softirqs.CPU29.SCHED
14345 ± 3% +20.5% 17292 softirqs.CPU3.SCHED
13910 ± 4% +20.9% 16814 ± 4% softirqs.CPU30.SCHED
7587 +41.8% 10759 ± 23% softirqs.CPU31.RCU
13401 ± 4% +26.6% 16967 ± 2% softirqs.CPU31.SCHED
13593 +24.6% 16937 softirqs.CPU32.SCHED
13175 ± 2% +29.0% 16989 softirqs.CPU33.SCHED
13491 ± 5% +25.8% 16974 ± 3% softirqs.CPU34.SCHED
13520 ± 7% +25.9% 17018 ± 2% softirqs.CPU35.SCHED
13723 ± 6% +22.1% 16763 ± 2% softirqs.CPU36.SCHED
13449 ± 3% +27.1% 17091 softirqs.CPU37.SCHED
7405 +40.0% 10368 ± 17% softirqs.CPU38.RCU
14299 ± 6% +22.4% 17509 ± 3% softirqs.CPU38.SCHED
14269 ± 3% +18.6% 16918 softirqs.CPU39.SCHED
7977 ± 2% +23.6% 9862 ± 5% softirqs.CPU4.RCU
14055 ± 2% +23.4% 17349 ± 3% softirqs.CPU4.SCHED
14257 ± 2% +19.4% 17030 ± 2% softirqs.CPU40.SCHED
13774 ± 2% +24.1% 17090 softirqs.CPU41.SCHED
14425 ± 8% +17.7% 16976 softirqs.CPU42.SCHED
14314 ± 2% +21.2% 17345 ± 2% softirqs.CPU43.SCHED
13274 ± 4% +27.7% 16954 softirqs.CPU44.SCHED
135612 ± 10% -15.7% 114264 ± 3% softirqs.CPU44.TIMER
13919 ± 2% +20.9% 16829 softirqs.CPU45.SCHED
13156 +27.9% 16834 ± 2% softirqs.CPU46.SCHED
13617 ± 2% +22.0% 16614 softirqs.CPU47.SCHED
13565 ± 3% +25.7% 17054 ± 2% softirqs.CPU48.SCHED
13791 ± 4% +21.6% 16769 softirqs.CPU49.SCHED
14260 ± 3% +20.1% 17126 ± 4% softirqs.CPU5.SCHED
13510 ± 7% +25.3% 16934 softirqs.CPU50.SCHED
12790 ± 10% +30.3% 16669 softirqs.CPU51.SCHED
13989 +19.9% 16774 ± 3% softirqs.CPU52.SCHED
14279 ± 3% +17.1% 16722 ± 2% softirqs.CPU53.SCHED
7263 +36.9% 9940 ± 21% softirqs.CPU54.RCU
14088 ± 2% +20.1% 16918 softirqs.CPU54.SCHED
13508 ± 6% +23.1% 16629 softirqs.CPU55.SCHED
130302 ± 6% -9.4% 118014 ± 7% softirqs.CPU55.TIMER
13313 ± 3% +27.9% 17030 ± 2% softirqs.CPU56.SCHED
7360 ± 2% +33.2% 9803 ± 12% softirqs.CPU57.RCU
13818 ± 2% +21.3% 16768 ± 2% softirqs.CPU57.SCHED
7296 +27.1% 9274 ± 7% softirqs.CPU58.RCU
13922 ± 3% +23.4% 17181 ± 2% softirqs.CPU58.SCHED
7271 +30.3% 9476 ± 11% softirqs.CPU59.RCU
14169 ± 5% +19.7% 16956 ± 2% softirqs.CPU59.SCHED
8176 ± 7% +16.5% 9524 ± 3% softirqs.CPU6.RCU
13689 ± 3% +25.2% 17132 ± 5% softirqs.CPU6.SCHED
13645 ± 2% +23.3% 16825 softirqs.CPU60.SCHED
13441 ± 6% +24.9% 16786 ± 2% softirqs.CPU61.SCHED
13497 ± 5% +25.7% 16959 softirqs.CPU62.SCHED
13682 ± 4% +25.5% 17170 softirqs.CPU63.SCHED
13656 ± 6% +22.3% 16695 ± 3% softirqs.CPU64.SCHED
13704 ± 7% +22.7% 16819 softirqs.CPU65.SCHED
13683 ± 3% +23.0% 16826 softirqs.CPU66.SCHED
14076 ± 2% +19.3% 16797 ± 2% softirqs.CPU67.SCHED
13713 ± 4% +18.2% 16215 ± 2% softirqs.CPU68.SCHED
13396 ± 3% +25.1% 16754 ± 2% softirqs.CPU69.SCHED
8042 ± 3% +21.0% 9730 ± 7% softirqs.CPU7.RCU
13604 ± 3% +27.3% 17320 ± 5% softirqs.CPU7.SCHED
13023 ± 2% +29.2% 16828 ± 2% softirqs.CPU70.SCHED
12307 ± 3% +32.1% 16255 softirqs.CPU71.SCHED
7727 ± 22% +37.3% 10612 ± 22% softirqs.CPU72.RCU
13320 ± 3% +28.0% 17050 ± 2% softirqs.CPU72.SCHED
13838 ± 3% +22.0% 16883 ± 3% softirqs.CPU73.SCHED
14604 ± 2% +14.4% 16707 ± 4% softirqs.CPU74.SCHED
14184 ± 4% +19.1% 16892 ± 2% softirqs.CPU75.SCHED
14138 ± 4% +19.0% 16830 softirqs.CPU76.SCHED
7098 +28.1% 9090 ± 12% softirqs.CPU77.RCU
13082 ± 4% +28.0% 16741 softirqs.CPU77.SCHED
126355 ± 4% -10.8% 112765 ± 2% softirqs.CPU78.TIMER
14718 ± 5% +12.6% 16576 ± 3% softirqs.CPU79.SCHED
7959 ± 2% +27.1% 10119 ± 10% softirqs.CPU8.RCU
13775 ± 4% +25.1% 17232 ± 5% softirqs.CPU8.SCHED
14397 +13.6% 16355 ± 2% softirqs.CPU80.SCHED
13999 ± 6% +20.7% 16896 ± 2% softirqs.CPU81.SCHED
14129 ± 7% +18.2% 16706 softirqs.CPU82.SCHED
13931 +17.6% 16385 ± 3% softirqs.CPU83.SCHED
14455 ± 5% +16.1% 16776 ± 2% softirqs.CPU84.SCHED
14426 ± 5% +16.4% 16799 softirqs.CPU86.SCHED
7049 +28.2% 9036 ± 10% softirqs.CPU88.RCU
13901 ± 3% +20.1% 16697 ± 2% softirqs.CPU88.SCHED
127258 ± 2% -7.3% 118017 softirqs.CPU88.TIMER
14353 ± 5% +17.4% 16855 ± 2% softirqs.CPU89.SCHED
14823 ± 4% +14.7% 17004 softirqs.CPU9.SCHED
14278 ± 4% +18.4% 16908 softirqs.CPU90.SCHED
14225 ± 2% +18.8% 16894 ± 2% softirqs.CPU91.SCHED
7091 +26.7% 8987 ± 10% softirqs.CPU92.RCU
13726 ± 3% +24.0% 17014 ± 3% softirqs.CPU92.SCHED
14051 ± 5% +20.6% 16939 softirqs.CPU93.SCHED
14123 ± 4% +17.9% 16649 ± 3% softirqs.CPU94.SCHED
130109 ± 5% -8.2% 119400 softirqs.CPU95.TIMER
13638 ± 4% +23.5% 16847 ± 2% softirqs.CPU96.SCHED
13134 ± 2% +29.4% 17001 ± 3% softirqs.CPU97.SCHED
13357 ± 3% +25.7% 16791 ± 2% softirqs.CPU98.SCHED
14112 ± 6% +20.0% 16938 softirqs.CPU99.SCHED
1991578 +21.8% 2426642 ± 3% softirqs.RCU
4100803 +18.4% 4855612 softirqs.SCHED
454.00 ± 27% +675.8% 3522 ±139% interrupts.30:IR-PCI-MSI.2097153-edge.eth0-TxRx-0
384622 -1.6% 378485 interrupts.CAL:Function_call_interrupts
930.25 ± 8% -46.7% 496.00 ± 10% interrupts.CPU0.TLB:TLB_shootdowns
758.50 ± 9% -44.8% 418.75 ± 11% interrupts.CPU1.TLB:TLB_shootdowns
454.00 ± 27% +675.8% 3522 ±139% interrupts.CPU10.30:IR-PCI-MSI.2097153-edge.eth0-TxRx-0
886.75 ± 5% -39.4% 537.00 ± 3% interrupts.CPU10.TLB:TLB_shootdowns
169.50 ± 11% +67.0% 283.00 ± 9% interrupts.CPU100.TLB:TLB_shootdowns
178.50 ± 20% +44.0% 257.00 ± 9% interrupts.CPU101.TLB:TLB_shootdowns
169.00 ± 23% +62.7% 275.00 ± 20% interrupts.CPU104.TLB:TLB_shootdowns
1463 ± 28% +88.1% 2753 ± 22% interrupts.CPU105.NMI:Non-maskable_interrupts
1463 ± 28% +88.1% 2753 ± 22% interrupts.CPU105.PMI:Performance_monitoring_interrupts
158.75 ± 21% +73.4% 275.25 ± 4% interrupts.CPU106.TLB:TLB_shootdowns
157.50 ± 24% +67.3% 263.50 ± 13% interrupts.CPU109.TLB:TLB_shootdowns
2884 ± 27% -43.7% 1625 ± 6% interrupts.CPU11.NMI:Non-maskable_interrupts
2884 ± 27% -43.7% 1625 ± 6% interrupts.CPU11.PMI:Performance_monitoring_interrupts
713.50 ± 5% -38.8% 436.50 ± 13% interrupts.CPU11.TLB:TLB_shootdowns
146.00 ± 38% +79.3% 261.75 ± 14% interrupts.CPU110.TLB:TLB_shootdowns
162.00 ± 14% +82.6% 295.75 ± 12% interrupts.CPU112.TLB:TLB_shootdowns
445.75 ± 52% +170.1% 1204 ± 91% interrupts.CPU114.RES:Rescheduling_interrupts
160.25 ± 37% +70.8% 273.75 ± 14% interrupts.CPU114.TLB:TLB_shootdowns
1417 ± 17% +66.0% 2352 ± 34% interrupts.CPU115.NMI:Non-maskable_interrupts
1417 ± 17% +66.0% 2352 ± 34% interrupts.CPU115.PMI:Performance_monitoring_interrupts
380.75 ± 9% +37.9% 525.00 ± 15% interrupts.CPU116.RES:Rescheduling_interrupts
183.00 ± 9% +48.2% 271.25 ± 14% interrupts.CPU116.TLB:TLB_shootdowns
170.50 ± 15% +74.5% 297.50 ± 5% interrupts.CPU118.TLB:TLB_shootdowns
454.25 ± 38% +251.3% 1596 ±103% interrupts.CPU12.RES:Rescheduling_interrupts
847.00 ± 12% -36.9% 534.25 ± 10% interrupts.CPU12.TLB:TLB_shootdowns
173.25 ± 20% +65.7% 287.00 ± 13% interrupts.CPU122.TLB:TLB_shootdowns
420.50 ± 32% +80.5% 759.00 ± 37% interrupts.CPU123.RES:Rescheduling_interrupts
247.75 ± 22% +75.9% 435.75 ± 14% interrupts.CPU124.RES:Rescheduling_interrupts
181.75 ± 33% +58.7% 288.50 ± 15% interrupts.CPU126.TLB:TLB_shootdowns
364.75 ± 21% +39.1% 507.50 ± 11% interrupts.CPU128.RES:Rescheduling_interrupts
337.75 ± 24% +248.5% 1177 ± 74% interrupts.CPU13.RES:Rescheduling_interrupts
783.25 ± 14% -47.8% 409.00 ± 3% interrupts.CPU13.TLB:TLB_shootdowns
277.00 ± 14% +84.9% 512.25 ± 10% interrupts.CPU131.RES:Rescheduling_interrupts
155.75 ± 24% +71.7% 267.50 ± 21% interrupts.CPU132.TLB:TLB_shootdowns
174.25 ± 24% +50.4% 262.00 ± 19% interrupts.CPU133.TLB:TLB_shootdowns
137.00 ± 7% +116.4% 296.50 ± 11% interrupts.CPU134.TLB:TLB_shootdowns
152.00 ± 24% +86.3% 283.25 ± 17% interrupts.CPU138.TLB:TLB_shootdowns
820.00 ± 10% -32.5% 553.75 ± 3% interrupts.CPU14.TLB:TLB_shootdowns
277.75 ± 19% +81.1% 503.00 ± 7% interrupts.CPU140.RES:Rescheduling_interrupts
298.25 ± 31% +59.3% 475.00 ± 15% interrupts.CPU142.RES:Rescheduling_interrupts
192.50 ± 22% +38.6% 266.75 ± 4% interrupts.CPU143.TLB:TLB_shootdowns
290.00 ± 38% +103.4% 590.00 ± 18% interrupts.CPU145.RES:Rescheduling_interrupts
104.50 ± 34% +90.4% 199.00 ± 16% interrupts.CPU145.TLB:TLB_shootdowns
226.50 ± 10% +96.9% 446.00 ± 15% interrupts.CPU146.RES:Rescheduling_interrupts
95.75 ± 8% +145.4% 235.00 ± 12% interrupts.CPU146.TLB:TLB_shootdowns
145.00 ± 22% +65.2% 239.50 ± 16% interrupts.CPU147.TLB:TLB_shootdowns
90.25 ± 23% +212.2% 281.75 ± 19% interrupts.CPU148.TLB:TLB_shootdowns
104.25 ± 22% +142.7% 253.00 ± 11% interrupts.CPU149.TLB:TLB_shootdowns
2888 ± 16% -35.6% 1860 ± 30% interrupts.CPU15.NMI:Non-maskable_interrupts
2888 ± 16% -35.6% 1860 ± 30% interrupts.CPU15.PMI:Performance_monitoring_interrupts
472.25 ± 31% +275.2% 1771 ±106% interrupts.CPU15.RES:Rescheduling_interrupts
816.50 ± 16% -50.0% 408.50 ± 11% interrupts.CPU15.TLB:TLB_shootdowns
84.75 ± 24% +207.4% 260.50 ± 10% interrupts.CPU150.TLB:TLB_shootdowns
133.50 ± 16% +108.6% 278.50 ± 17% interrupts.CPU151.TLB:TLB_shootdowns
103.75 ± 25% +132.0% 240.75 ± 8% interrupts.CPU152.TLB:TLB_shootdowns
105.00 ± 60% +155.0% 267.75 ± 7% interrupts.CPU153.TLB:TLB_shootdowns
100.50 ± 28% +133.1% 234.25 ± 8% interrupts.CPU154.TLB:TLB_shootdowns
73.50 ± 36% +220.4% 235.50 ± 11% interrupts.CPU155.TLB:TLB_shootdowns
229.50 ± 18% +182.1% 647.50 ± 8% interrupts.CPU156.RES:Rescheduling_interrupts
114.50 ± 20% +110.9% 241.50 ± 29% interrupts.CPU156.TLB:TLB_shootdowns
125.75 ± 27% +79.7% 226.00 ± 8% interrupts.CPU157.TLB:TLB_shootdowns
97.50 ± 21% +143.1% 237.00 ± 16% interrupts.CPU158.TLB:TLB_shootdowns
94.75 ± 49% +136.4% 224.00 ± 21% interrupts.CPU159.TLB:TLB_shootdowns
452.00 ± 36% +89.2% 855.25 ± 39% interrupts.CPU16.RES:Rescheduling_interrupts
959.00 ± 16% -54.3% 438.25 ± 8% interrupts.CPU16.TLB:TLB_shootdowns
96.75 ± 9% +145.0% 237.00 ± 9% interrupts.CPU161.TLB:TLB_shootdowns
94.25 ± 24% +169.5% 254.00 ± 10% interrupts.CPU162.TLB:TLB_shootdowns
100.50 ± 8% +141.8% 243.00 ± 14% interrupts.CPU163.TLB:TLB_shootdowns
87.00 ± 42% +186.5% 249.25 ± 8% interrupts.CPU164.TLB:TLB_shootdowns
93.50 ± 29% +145.7% 229.75 ± 9% interrupts.CPU165.TLB:TLB_shootdowns
110.50 ± 33% +165.4% 293.25 ± 15% interrupts.CPU166.TLB:TLB_shootdowns
323.25 ± 37% +105.7% 665.00 ± 39% interrupts.CPU167.RES:Rescheduling_interrupts
96.75 ± 48% +140.3% 232.50 ± 14% interrupts.CPU167.TLB:TLB_shootdowns
224.00 ± 22% +140.8% 539.50 ± 40% interrupts.CPU168.RES:Rescheduling_interrupts
87.50 ± 26% +151.1% 219.75 ± 14% interrupts.CPU168.TLB:TLB_shootdowns
1120 ± 47% +79.9% 2015 ± 27% interrupts.CPU169.NMI:Non-maskable_interrupts
1120 ± 47% +79.9% 2015 ± 27% interrupts.CPU169.PMI:Performance_monitoring_interrupts
100.25 ± 22% +117.5% 218.00 ± 9% interrupts.CPU169.TLB:TLB_shootdowns
667.75 ± 12% -30.3% 465.50 ± 14% interrupts.CPU17.TLB:TLB_shootdowns
75.00 ± 28% +252.7% 264.50 ± 22% interrupts.CPU170.TLB:TLB_shootdowns
94.25 ± 19% +126.0% 213.00 ± 13% interrupts.CPU171.TLB:TLB_shootdowns
318.25 ± 28% +53.3% 488.00 ± 18% interrupts.CPU172.RES:Rescheduling_interrupts
85.25 ± 33% +221.1% 273.75 ± 19% interrupts.CPU172.TLB:TLB_shootdowns
82.25 ± 18% +130.7% 189.75 ± 20% interrupts.CPU173.TLB:TLB_shootdowns
67.75 ± 55% +261.6% 245.00 ± 12% interrupts.CPU174.TLB:TLB_shootdowns
227.75 ± 41% +588.6% 1568 ± 95% interrupts.CPU175.RES:Rescheduling_interrupts
91.50 ± 10% +150.8% 229.50 ± 10% interrupts.CPU175.TLB:TLB_shootdowns
99.50 ± 17% +147.2% 246.00 ± 22% interrupts.CPU176.TLB:TLB_shootdowns
79.25 ± 18% +200.0% 237.75 ± 27% interrupts.CPU177.TLB:TLB_shootdowns
87.50 ± 36% +129.7% 201.00 ± 4% interrupts.CPU178.TLB:TLB_shootdowns
127.25 ± 44% +82.7% 232.50 ± 15% interrupts.CPU179.TLB:TLB_shootdowns
1161 ± 15% +137.6% 2758 ± 27% interrupts.CPU18.NMI:Non-maskable_interrupts
1161 ± 15% +137.6% 2758 ± 27% interrupts.CPU18.PMI:Performance_monitoring_interrupts
425.75 ± 15% +149.5% 1062 ± 64% interrupts.CPU18.RES:Rescheduling_interrupts
985.50 ± 13% -50.1% 492.25 ± 13% interrupts.CPU18.TLB:TLB_shootdowns
112.25 ± 33% +120.7% 247.75 ± 8% interrupts.CPU180.TLB:TLB_shootdowns
126.25 ± 15% +96.0% 247.50 ± 9% interrupts.CPU181.TLB:TLB_shootdowns
229.25 ± 25% +74.6% 400.25 ± 7% interrupts.CPU182.RES:Rescheduling_interrupts
102.50 ± 41% +173.4% 280.25 ± 4% interrupts.CPU182.TLB:TLB_shootdowns
253.25 ± 13% +52.6% 386.50 ± 8% interrupts.CPU183.RES:Rescheduling_interrupts
97.00 ± 50% +153.4% 245.75 ± 4% interrupts.CPU183.TLB:TLB_shootdowns
99.00 ± 14% +143.7% 241.25 ± 18% interrupts.CPU184.TLB:TLB_shootdowns
287.50 ± 37% +57.0% 451.25 ± 32% interrupts.CPU185.RES:Rescheduling_interrupts
92.50 ± 27% +139.2% 221.25 ± 5% interrupts.CPU185.TLB:TLB_shootdowns
2302 ± 26% +37.4% 3162 ± 5% interrupts.CPU186.NMI:Non-maskable_interrupts
2302 ± 26% +37.4% 3162 ± 5% interrupts.CPU186.PMI:Performance_monitoring_interrupts
87.75 ± 16% +204.0% 266.75 ± 20% interrupts.CPU186.TLB:TLB_shootdowns
97.00 ± 31% +157.0% 249.25 ± 10% interrupts.CPU187.TLB:TLB_shootdowns
261.50 ± 10% +54.2% 403.25 ± 15% interrupts.CPU188.RES:Rescheduling_interrupts
108.50 ± 42% +103.7% 221.00 ± 26% interrupts.CPU188.TLB:TLB_shootdowns
223.75 ± 21% +88.4% 421.50 ± 18% interrupts.CPU189.RES:Rescheduling_interrupts
119.25 ± 11% +108.4% 248.50 ± 25% interrupts.CPU189.TLB:TLB_shootdowns
400.25 ± 27% +117.4% 870.25 ± 52% interrupts.CPU19.RES:Rescheduling_interrupts
699.50 ± 16% -31.8% 476.75 ± 10% interrupts.CPU19.TLB:TLB_shootdowns
99.75 ± 43% +148.6% 248.00 ± 12% interrupts.CPU190.TLB:TLB_shootdowns
98.25 ± 10% +146.6% 242.25 ± 17% interrupts.CPU191.TLB:TLB_shootdowns
105.25 ± 29% +145.4% 258.25 ± 19% interrupts.CPU192.TLB:TLB_shootdowns
89.50 ± 23% +191.3% 260.75 ± 17% interrupts.CPU193.TLB:TLB_shootdowns
66.25 ± 25% +285.7% 255.50 ± 17% interrupts.CPU194.TLB:TLB_shootdowns
278.75 ± 10% +84.9% 515.50 ± 32% interrupts.CPU195.RES:Rescheduling_interrupts
80.00 ± 18% +247.8% 278.25 ± 14% interrupts.CPU195.TLB:TLB_shootdowns
70.25 ± 12% +246.3% 243.25 ± 11% interrupts.CPU196.TLB:TLB_shootdowns
104.75 ± 26% +147.0% 258.75 ± 23% interrupts.CPU197.TLB:TLB_shootdowns
86.75 ± 62% +188.2% 250.00 ± 20% interrupts.CPU198.TLB:TLB_shootdowns
94.25 ± 17% +150.1% 235.75 ± 19% interrupts.CPU199.TLB:TLB_shootdowns
3084 ± 4% -52.9% 1453 ± 6% interrupts.CPU2.NMI:Non-maskable_interrupts
3084 ± 4% -52.9% 1453 ± 6% interrupts.CPU2.PMI:Performance_monitoring_interrupts
556.25 ± 46% +179.6% 1555 ± 83% interrupts.CPU2.RES:Rescheduling_interrupts
891.00 ± 15% -42.6% 511.50 ± 13% interrupts.CPU2.TLB:TLB_shootdowns
364.75 ± 16% +193.9% 1072 ± 40% interrupts.CPU20.RES:Rescheduling_interrupts
825.75 ± 13% -35.7% 530.75 ± 6% interrupts.CPU20.TLB:TLB_shootdowns
129.25 ± 17% +83.0% 236.50 ± 7% interrupts.CPU200.TLB:TLB_shootdowns
88.50 ± 58% +200.0% 265.50 ± 12% interrupts.CPU201.TLB:TLB_shootdowns
116.75 ± 24% +99.8% 233.25 ± 24% interrupts.CPU202.TLB:TLB_shootdowns
254.25 ± 35% +121.8% 564.00 ± 14% interrupts.CPU203.RES:Rescheduling_interrupts
83.25 ± 27% +203.0% 252.25 ± 18% interrupts.CPU203.TLB:TLB_shootdowns
110.25 ± 16% +195.2% 325.50 ± 35% interrupts.CPU204.TLB:TLB_shootdowns
332.00 ± 44% +60.5% 532.75 ± 15% interrupts.CPU205.RES:Rescheduling_interrupts
89.00 ± 32% +165.4% 236.25 ± 11% interrupts.CPU205.TLB:TLB_shootdowns
236.25 ± 23% +112.8% 502.75 ± 10% interrupts.CPU206.RES:Rescheduling_interrupts
82.75 ± 32% +195.2% 244.25 ± 13% interrupts.CPU206.TLB:TLB_shootdowns
121.25 ± 15% +86.6% 226.25 ± 18% interrupts.CPU207.TLB:TLB_shootdowns
83.75 ± 24% +170.7% 226.75 ± 5% interrupts.CPU208.TLB:TLB_shootdowns
330.25 ± 13% +131.5% 764.50 ± 44% interrupts.CPU209.RES:Rescheduling_interrupts
75.75 ± 20% +217.8% 240.75 ± 9% interrupts.CPU209.TLB:TLB_shootdowns
778.00 ± 16% -39.2% 473.00 ± 34% interrupts.CPU21.TLB:TLB_shootdowns
268.75 ± 27% +178.4% 748.25 ± 37% interrupts.CPU210.RES:Rescheduling_interrupts
114.75 ± 16% +100.0% 229.50 ± 13% interrupts.CPU210.TLB:TLB_shootdowns
112.00 ± 29% +101.1% 225.25 ± 15% interrupts.CPU211.TLB:TLB_shootdowns
97.00 ± 58% +159.3% 251.50 ± 21% interrupts.CPU212.TLB:TLB_shootdowns
263.75 ± 29% +80.3% 475.50 ± 15% interrupts.CPU213.RES:Rescheduling_interrupts
101.25 ± 25% +98.5% 201.00 ± 28% interrupts.CPU215.TLB:TLB_shootdowns
1275 ± 12% +70.6% 2177 ± 30% interrupts.CPU216.NMI:Non-maskable_interrupts
1275 ± 12% +70.6% 2177 ± 30% interrupts.CPU216.PMI:Performance_monitoring_interrupts
278.75 ± 15% +48.7% 414.50 ± 14% interrupts.CPU216.RES:Rescheduling_interrupts
95.50 ± 26% +118.3% 208.50 ± 11% interrupts.CPU216.TLB:TLB_shootdowns
1391 ± 30% +54.2% 2146 ± 33% interrupts.CPU217.NMI:Non-maskable_interrupts
1391 ± 30% +54.2% 2146 ± 33% interrupts.CPU217.PMI:Performance_monitoring_interrupts
114.50 ± 10% +84.9% 211.75 ± 16% interrupts.CPU217.TLB:TLB_shootdowns
77.00 ± 48% +308.4% 314.50 ± 55% interrupts.CPU218.TLB:TLB_shootdowns
70.75 ± 41% +259.0% 254.00 ± 21% interrupts.CPU219.TLB:TLB_shootdowns
884.75 ± 18% -36.6% 561.25 ± 12% interrupts.CPU22.TLB:TLB_shootdowns
58.25 ± 36% +315.9% 242.25 ± 13% interrupts.CPU220.TLB:TLB_shootdowns
230.50 ± 35% +103.9% 470.00 ± 18% interrupts.CPU221.RES:Rescheduling_interrupts
61.00 ± 40% +254.1% 216.00 ± 3% interrupts.CPU221.TLB:TLB_shootdowns
114.75 ± 47% +102.6% 232.50 ± 22% interrupts.CPU223.TLB:TLB_shootdowns
94.25 ± 19% +146.9% 232.75 ± 15% interrupts.CPU224.TLB:TLB_shootdowns
92.75 ± 47% +142.6% 225.00 ± 12% interrupts.CPU225.TLB:TLB_shootdowns
84.75 ± 19% +182.0% 239.00 ± 4% interrupts.CPU226.TLB:TLB_shootdowns
91.00 ± 34% +164.8% 241.00 ± 20% interrupts.CPU227.TLB:TLB_shootdowns
2345 ± 32% -40.0% 1407 ± 17% interrupts.CPU228.NMI:Non-maskable_interrupts
2345 ± 32% -40.0% 1407 ± 17% interrupts.CPU228.PMI:Performance_monitoring_interrupts
61.75 ± 24% +232.0% 205.00 ± 10% interrupts.CPU228.TLB:TLB_shootdowns
75.00 ± 49% +188.7% 216.50 ± 14% interrupts.CPU229.TLB:TLB_shootdowns
706.25 ± 17% -38.5% 434.00 ± 19% interrupts.CPU23.TLB:TLB_shootdowns
78.50 ± 20% +208.6% 242.25 ± 11% interrupts.CPU230.TLB:TLB_shootdowns
67.50 ± 35% +214.4% 212.25 ± 20% interrupts.CPU231.TLB:TLB_shootdowns
204.25 ± 15% +141.9% 494.00 ± 11% interrupts.CPU232.RES:Rescheduling_interrupts
65.25 ± 19% +254.0% 231.00 ± 13% interrupts.CPU232.TLB:TLB_shootdowns
104.75 ± 28% +106.9% 216.75 ± 26% interrupts.CPU233.TLB:TLB_shootdowns
65.00 ± 44% +226.9% 212.50 ± 8% interrupts.CPU234.TLB:TLB_shootdowns
72.00 ± 53% +186.8% 206.50 ± 20% interrupts.CPU235.TLB:TLB_shootdowns
262.50 ± 38% +75.3% 460.25 ± 13% interrupts.CPU236.RES:Rescheduling_interrupts
72.75 ± 61% +195.5% 215.00 ± 7% interrupts.CPU236.TLB:TLB_shootdowns
78.25 ± 21% +187.2% 224.75 ± 32% interrupts.CPU237.TLB:TLB_shootdowns
92.25 ± 27% +166.9% 246.25 ± 17% interrupts.CPU238.TLB:TLB_shootdowns
120.50 ± 37% +118.5% 263.25 ± 19% interrupts.CPU239.TLB:TLB_shootdowns
871.75 ± 7% -42.3% 502.75 ± 12% interrupts.CPU24.TLB:TLB_shootdowns
63.75 ± 34% +271.0% 236.50 ± 6% interrupts.CPU240.TLB:TLB_shootdowns
77.25 ± 29% +207.4% 237.50 ± 15% interrupts.CPU241.TLB:TLB_shootdowns
49.75 ± 29% +310.6% 204.25 ± 13% interrupts.CPU242.TLB:TLB_shootdowns
76.00 ± 19% +194.7% 224.00 ± 17% interrupts.CPU243.TLB:TLB_shootdowns
80.75 ± 10% +183.6% 229.00 ± 17% interrupts.CPU244.TLB:TLB_shootdowns
86.75 ± 57% +171.8% 235.75 ± 8% interrupts.CPU245.TLB:TLB_shootdowns
56.50 ± 51% +344.2% 251.00 ± 13% interrupts.CPU246.TLB:TLB_shootdowns
270.50 ± 24% +153.1% 684.75 ± 34% interrupts.CPU247.RES:Rescheduling_interrupts
88.25 ± 24% +180.2% 247.25 ± 18% interrupts.CPU247.TLB:TLB_shootdowns
81.25 ± 37% +167.1% 217.00 ± 20% interrupts.CPU248.TLB:TLB_shootdowns
67.00 ± 26% +216.4% 212.00 ± 11% interrupts.CPU249.TLB:TLB_shootdowns
740.75 ± 11% -36.9% 467.25 ± 7% interrupts.CPU25.TLB:TLB_shootdowns
67.50 ± 29% +247.0% 234.25 ± 11% interrupts.CPU250.TLB:TLB_shootdowns
92.00 ± 63% +111.1% 194.25 ± 24% interrupts.CPU251.TLB:TLB_shootdowns
72.25 ± 45% +223.5% 233.75 ± 16% interrupts.CPU252.TLB:TLB_shootdowns
78.75 ± 29% +178.7% 219.50 ± 12% interrupts.CPU253.TLB:TLB_shootdowns
64.25 ± 58% +258.4% 230.25 ± 11% interrupts.CPU254.TLB:TLB_shootdowns
267.00 ± 17% +44.3% 385.25 ± 13% interrupts.CPU255.RES:Rescheduling_interrupts
114.25 ± 36% +106.1% 235.50 ± 12% interrupts.CPU255.TLB:TLB_shootdowns
74.00 ± 33% +213.9% 232.25 ± 13% interrupts.CPU256.TLB:TLB_shootdowns
86.00 ± 81% +158.4% 222.25 ± 19% interrupts.CPU257.TLB:TLB_shootdowns
1565 ± 38% +54.4% 2417 ± 27% interrupts.CPU258.NMI:Non-maskable_interrupts
1565 ± 38% +54.4% 2417 ± 27% interrupts.CPU258.PMI:Performance_monitoring_interrupts
252.00 ± 25% +118.6% 550.75 ± 54% interrupts.CPU258.RES:Rescheduling_interrupts
81.75 ± 21% +192.7% 239.25 ± 9% interrupts.CPU258.TLB:TLB_shootdowns
106.75 ± 37% +109.1% 223.25 ± 20% interrupts.CPU259.TLB:TLB_shootdowns
899.75 ± 8% -42.9% 514.00 ± 6% interrupts.CPU26.TLB:TLB_shootdowns
67.25 ± 41% +291.8% 263.50 ± 22% interrupts.CPU260.TLB:TLB_shootdowns
59.25 ± 34% +289.0% 230.50 ± 8% interrupts.CPU261.TLB:TLB_shootdowns
292.75 ± 12% +61.0% 471.25 ± 16% interrupts.CPU262.RES:Rescheduling_interrupts
49.75 ± 45% +400.0% 248.75 ± 14% interrupts.CPU262.TLB:TLB_shootdowns
325.00 ± 17% +52.1% 494.25 ± 27% interrupts.CPU263.RES:Rescheduling_interrupts
87.00 ± 32% +143.1% 211.50 ± 11% interrupts.CPU263.TLB:TLB_shootdowns
218.25 ± 14% +71.7% 374.75 ± 5% interrupts.CPU264.RES:Rescheduling_interrupts
75.75 ± 28% +188.8% 218.75 ± 9% interrupts.CPU264.TLB:TLB_shootdowns
56.25 ± 55% +306.2% 228.50 ± 20% interrupts.CPU265.TLB:TLB_shootdowns
80.75 ± 53% +221.1% 259.25 ± 12% interrupts.CPU266.TLB:TLB_shootdowns
271.25 ± 27% +56.2% 423.75 ± 22% interrupts.CPU267.RES:Rescheduling_interrupts
60.25 ± 48% +352.7% 272.75 ± 16% interrupts.CPU267.TLB:TLB_shootdowns
56.00 ± 43% +295.5% 221.50 ± 20% interrupts.CPU268.TLB:TLB_shootdowns
97.25 ± 14% +151.2% 244.25 ± 17% interrupts.CPU269.TLB:TLB_shootdowns
688.00 ± 13% -32.2% 466.25 ± 11% interrupts.CPU27.TLB:TLB_shootdowns
208.00 ± 27% +93.6% 402.75 ± 8% interrupts.CPU270.RES:Rescheduling_interrupts
57.50 ± 30% +368.3% 269.25 ± 19% interrupts.CPU270.TLB:TLB_shootdowns
73.75 ± 34% +288.5% 286.50 ± 14% interrupts.CPU271.TLB:TLB_shootdowns
279.25 ± 32% +37.3% 383.50 ± 10% interrupts.CPU272.RES:Rescheduling_interrupts
79.25 ± 14% +189.6% 229.50 ± 28% interrupts.CPU272.TLB:TLB_shootdowns
216.50 ± 15% +201.8% 653.50 ± 66% interrupts.CPU273.RES:Rescheduling_interrupts
65.50 ± 22% +256.1% 233.25 ± 11% interrupts.CPU273.TLB:TLB_shootdowns
74.75 ± 68% +198.3% 223.00 ± 17% interrupts.CPU274.TLB:TLB_shootdowns
68.50 ± 42% +204.4% 208.50 ± 10% interrupts.CPU275.TLB:TLB_shootdowns
59.75 ± 35% +226.8% 195.25 ± 13% interrupts.CPU276.TLB:TLB_shootdowns
292.25 ± 35% +80.2% 526.75 ± 32% interrupts.CPU277.RES:Rescheduling_interrupts
73.25 ± 39% +233.8% 244.50 ± 14% interrupts.CPU277.TLB:TLB_shootdowns
61.50 ± 60% +248.0% 214.00 ± 9% interrupts.CPU278.TLB:TLB_shootdowns
163.25 ± 15% +188.1% 470.25 ± 18% interrupts.CPU279.RES:Rescheduling_interrupts
81.50 ± 10% +142.0% 197.25 ± 8% interrupts.CPU279.TLB:TLB_shootdowns
372.75 ± 20% +139.3% 892.00 ± 59% interrupts.CPU28.RES:Rescheduling_interrupts
891.00 ± 4% -41.6% 520.50 ± 10% interrupts.CPU28.TLB:TLB_shootdowns
75.75 ± 23% +223.8% 245.25 ± 12% interrupts.CPU280.TLB:TLB_shootdowns
66.25 ± 39% +234.0% 221.25 ± 12% interrupts.CPU281.TLB:TLB_shootdowns
239.25 ± 24% +90.9% 456.75 ± 15% interrupts.CPU282.RES:Rescheduling_interrupts
63.50 ± 36% +281.9% 242.50 ± 11% interrupts.CPU282.TLB:TLB_shootdowns
75.50 ± 39% +215.2% 238.00 ± 10% interrupts.CPU283.TLB:TLB_shootdowns
269.75 ± 41% +72.2% 464.50 ± 20% interrupts.CPU284.RES:Rescheduling_interrupts
66.25 ± 51% +251.7% 233.00 ± 22% interrupts.CPU284.TLB:TLB_shootdowns
103.75 ± 33% +117.3% 225.50 ± 6% interrupts.CPU285.TLB:TLB_shootdowns
274.75 ± 25% +42.6% 391.75 ± 9% interrupts.CPU286.RES:Rescheduling_interrupts
102.50 ± 22% +106.1% 211.25 ± 3% interrupts.CPU286.TLB:TLB_shootdowns
1604 ± 9% -17.7% 1320 interrupts.CPU287.CAL:Function_call_interrupts
1860 ± 18% -26.0% 1376 ± 16% interrupts.CPU287.NMI:Non-maskable_interrupts
1860 ± 18% -26.0% 1376 ± 16% interrupts.CPU287.PMI:Performance_monitoring_interrupts
245.50 ± 19% +54.3% 378.75 ± 8% interrupts.CPU287.RES:Rescheduling_interrupts
685.00 ± 13% -35.5% 441.75 ± 4% interrupts.CPU29.TLB:TLB_shootdowns
689.75 ± 18% -32.4% 466.00 ± 6% interrupts.CPU3.TLB:TLB_shootdowns
399.75 ± 15% +149.8% 998.75 ± 69% interrupts.CPU30.RES:Rescheduling_interrupts
793.00 ± 16% -36.1% 506.75 ± 7% interrupts.CPU30.TLB:TLB_shootdowns
3182 ± 24% -56.9% 1372 ± 6% interrupts.CPU31.NMI:Non-maskable_interrupts
3182 ± 24% -56.9% 1372 ± 6% interrupts.CPU31.PMI:Performance_monitoring_interrupts
283.00 ± 10% +134.5% 663.50 ± 31% interrupts.CPU31.RES:Rescheduling_interrupts
783.50 ± 20% -43.8% 440.25 ± 3% interrupts.CPU31.TLB:TLB_shootdowns
905.25 ± 8% -39.7% 546.25 ± 19% interrupts.CPU32.TLB:TLB_shootdowns
709.25 ± 4% -35.0% 461.00 ± 14% interrupts.CPU33.TLB:TLB_shootdowns
474.50 ± 31% +92.7% 914.50 ± 50% interrupts.CPU34.RES:Rescheduling_interrupts
879.50 ± 11% -43.0% 501.50 ± 9% interrupts.CPU34.TLB:TLB_shootdowns
750.00 ± 12% -42.5% 431.25 ± 10% interrupts.CPU35.TLB:TLB_shootdowns
889.00 ± 5% -45.4% 485.25 ± 6% interrupts.CPU36.TLB:TLB_shootdowns
415.25 ± 26% +34.3% 557.75 ± 9% interrupts.CPU37.RES:Rescheduling_interrupts
818.50 ± 14% -45.1% 449.25 ± 14% interrupts.CPU37.TLB:TLB_shootdowns
453.50 ± 19% +23.7% 561.00 ± 11% interrupts.CPU38.RES:Rescheduling_interrupts
899.00 ± 5% -37.8% 558.75 ± 7% interrupts.CPU38.TLB:TLB_shootdowns
752.25 ± 14% -41.3% 441.25 ± 11% interrupts.CPU39.TLB:TLB_shootdowns
1502 ± 7% +48.7% 2234 ± 24% interrupts.CPU4.NMI:Non-maskable_interrupts
1502 ± 7% +48.7% 2234 ± 24% interrupts.CPU4.PMI:Performance_monitoring_interrupts
1012 ± 26% -47.3% 534.00 ± 6% interrupts.CPU4.TLB:TLB_shootdowns
882.25 ± 14% -41.4% 517.00 ± 17% interrupts.CPU40.TLB:TLB_shootdowns
762.50 ± 13% -35.8% 489.25 ± 4% interrupts.CPU41.TLB:TLB_shootdowns
995.75 ± 12% -49.2% 506.00 ± 8% interrupts.CPU42.TLB:TLB_shootdowns
652.50 ± 12% -35.4% 421.25 ± 11% interrupts.CPU43.TLB:TLB_shootdowns
421.25 ± 18% +102.1% 851.25 ± 47% interrupts.CPU44.RES:Rescheduling_interrupts
819.00 ± 10% -39.3% 497.25 ± 7% interrupts.CPU44.TLB:TLB_shootdowns
369.00 ± 21% +79.1% 660.75 ± 22% interrupts.CPU45.RES:Rescheduling_interrupts
801.00 ± 7% -41.6% 467.75 ± 8% interrupts.CPU45.TLB:TLB_shootdowns
786.75 ± 15% -38.5% 483.75 ± 12% interrupts.CPU46.TLB:TLB_shootdowns
716.75 ± 13% -35.4% 463.25 ± 11% interrupts.CPU47.TLB:TLB_shootdowns
813.25 ± 9% -34.6% 531.75 ± 6% interrupts.CPU48.TLB:TLB_shootdowns
658.50 ± 6% -26.2% 486.25 ± 6% interrupts.CPU49.TLB:TLB_shootdowns
735.25 ± 20% -36.6% 466.00 ± 18% interrupts.CPU5.TLB:TLB_shootdowns
852.50 ± 17% -43.0% 485.75 ± 9% interrupts.CPU50.TLB:TLB_shootdowns
815.75 ± 8% -40.3% 487.25 ± 12% interrupts.CPU51.TLB:TLB_shootdowns
899.50 ± 9% -42.5% 517.50 ± 10% interrupts.CPU52.TLB:TLB_shootdowns
654.50 ± 9% -33.8% 433.00 ± 19% interrupts.CPU53.TLB:TLB_shootdowns
920.00 ± 8% -40.9% 543.50 ± 6% interrupts.CPU54.TLB:TLB_shootdowns
687.25 ± 19% -31.5% 470.75 ± 16% interrupts.CPU55.TLB:TLB_shootdowns
2467 ± 22% -41.5% 1442 ± 13% interrupts.CPU56.NMI:Non-maskable_interrupts
2467 ± 22% -41.5% 1442 ± 13% interrupts.CPU56.PMI:Performance_monitoring_interrupts
342.00 ± 7% +106.1% 705.00 ± 30% interrupts.CPU56.RES:Rescheduling_interrupts
884.75 ± 19% -45.2% 485.25 ± 8% interrupts.CPU56.TLB:TLB_shootdowns
772.00 ± 11% -43.0% 440.00 ± 11% interrupts.CPU57.TLB:TLB_shootdowns
942.50 ± 11% -46.3% 506.00 ± 10% interrupts.CPU58.TLB:TLB_shootdowns
754.25 ± 5% -38.0% 467.50 interrupts.CPU59.TLB:TLB_shootdowns
393.75 ± 9% +311.4% 1619 ±109% interrupts.CPU6.RES:Rescheduling_interrupts
860.25 ± 12% -46.6% 459.00 ± 7% interrupts.CPU6.TLB:TLB_shootdowns
894.50 ± 16% -43.4% 506.00 ± 13% interrupts.CPU60.TLB:TLB_shootdowns
1684 ± 32% +78.1% 2999 ± 14% interrupts.CPU61.NMI:Non-maskable_interrupts
1684 ± 32% +78.1% 2999 ± 14% interrupts.CPU61.PMI:Performance_monitoring_interrupts
693.50 ± 17% -34.9% 451.75 ± 9% interrupts.CPU61.TLB:TLB_shootdowns
936.25 ± 16% -49.3% 474.75 ± 8% interrupts.CPU62.TLB:TLB_shootdowns
421.25 ± 44% +112.9% 897.00 ± 11% interrupts.CPU63.RES:Rescheduling_interrupts
731.25 ± 8% -41.1% 431.00 ± 7% interrupts.CPU63.TLB:TLB_shootdowns
883.00 ± 15% -43.7% 496.75 ± 7% interrupts.CPU64.TLB:TLB_shootdowns
736.00 ± 11% -36.8% 465.50 ± 5% interrupts.CPU65.TLB:TLB_shootdowns
880.00 ± 7% -42.6% 505.25 ± 8% interrupts.CPU66.TLB:TLB_shootdowns
943.50 ± 11% -40.5% 561.75 ± 10% interrupts.CPU68.TLB:TLB_shootdowns
393.25 ± 36% +100.5% 788.50 ± 26% interrupts.CPU69.RES:Rescheduling_interrupts
790.25 ± 9% -38.3% 487.50 ± 8% interrupts.CPU69.TLB:TLB_shootdowns
415.50 ± 24% +407.0% 2106 ±123% interrupts.CPU7.RES:Rescheduling_interrupts
679.00 ± 14% -36.8% 429.25 interrupts.CPU7.TLB:TLB_shootdowns
350.25 ± 13% +92.1% 672.75 ± 12% interrupts.CPU70.RES:Rescheduling_interrupts
778.75 ± 3% -41.8% 453.50 ± 11% interrupts.CPU70.TLB:TLB_shootdowns
673.50 ± 8% -39.8% 405.75 ± 11% interrupts.CPU71.TLB:TLB_shootdowns
460.75 ± 74% +95.7% 901.75 ± 43% interrupts.CPU74.RES:Rescheduling_interrupts
160.75 ± 33% +82.1% 292.75 ± 15% interrupts.CPU76.TLB:TLB_shootdowns
198.00 ± 22% +45.8% 288.75 ± 9% interrupts.CPU77.TLB:TLB_shootdowns
2305 ± 26% -35.5% 1486 ± 12% interrupts.CPU79.NMI:Non-maskable_interrupts
2305 ± 26% -35.5% 1486 ± 12% interrupts.CPU79.PMI:Performance_monitoring_interrupts
155.25 ± 31% +72.9% 268.50 ± 8% interrupts.CPU79.TLB:TLB_shootdowns
887.25 ± 17% -45.4% 484.50 ± 10% interrupts.CPU8.TLB:TLB_shootdowns
192.00 ± 18% +32.9% 255.25 ± 12% interrupts.CPU80.TLB:TLB_shootdowns
158.75 ± 30% +95.1% 309.75 ± 12% interrupts.CPU82.TLB:TLB_shootdowns
265.75 ± 35% +120.8% 586.75 ± 30% interrupts.CPU84.RES:Rescheduling_interrupts
146.75 ± 29% +80.2% 264.50 ± 20% interrupts.CPU84.TLB:TLB_shootdowns
186.00 ± 31% +56.0% 290.25 ± 9% interrupts.CPU86.TLB:TLB_shootdowns
169.25 ± 15% +66.5% 281.75 ± 2% interrupts.CPU88.TLB:TLB_shootdowns
879.75 ± 10% -47.5% 462.00 ± 4% interrupts.CPU9.TLB:TLB_shootdowns
161.00 ± 20% +75.8% 283.00 ± 5% interrupts.CPU90.TLB:TLB_shootdowns
194.75 ± 12% +50.3% 292.75 ± 11% interrupts.CPU91.TLB:TLB_shootdowns
2722 ± 34% -37.8% 1692 ± 6% interrupts.CPU92.NMI:Non-maskable_interrupts
2722 ± 34% -37.8% 1692 ± 6% interrupts.CPU92.PMI:Performance_monitoring_interrupts
190.75 ± 14% +36.3% 260.00 ± 22% interrupts.CPU92.TLB:TLB_shootdowns
188.00 ± 15% +57.0% 295.25 ± 10% interrupts.CPU94.TLB:TLB_shootdowns
200.25 ± 11% +59.6% 319.50 ± 16% interrupts.CPU95.TLB:TLB_shootdowns
316.25 ± 55% +105.1% 648.50 ± 19% interrupts.CPU96.RES:Rescheduling_interrupts
143.75 ± 42% +91.0% 274.50 ± 8% interrupts.CPU96.TLB:TLB_shootdowns
2409 ± 41% -34.2% 1586 ± 46% interrupts.CPU97.NMI:Non-maskable_interrupts
2409 ± 41% -34.2% 1586 ± 46% interrupts.CPU97.PMI:Performance_monitoring_interrupts
186.00 ± 33% +56.0% 290.25 ± 16% interrupts.CPU98.TLB:TLB_shootdowns



vm-scalability.time.user_time

16000 +-------------------------------------------------------------------+
| + |
15000 |-+ :+ + + + |
14000 |-+ .++. +. + + +.+ :+ .++ ++. +.++ + + :+ + +|
|+.+++ ++ + + + ++.+++ +.+++.++.+ ++ + +.+ + :+ |
13000 |-+ + |
| |
12000 |-+ |
| |
11000 |-+ |
10000 |-+ |
| OO O O |
9000 |-+ OO O OO O OO O |
|O OOO O O OO |
8000 +-------------------------------------------------------------------+


vm-scalability.time.voluntary_context_switches

1e+06 +------------------------------------------------------------------+
| O O |
900000 |O+OOO OO O OOO OOO O O OO |
800000 |-+ |
| |
700000 |-+ |
600000 |-+ |
| |
500000 |-+ |
400000 |-+ |
| |
300000 |-+ |
200000 |-+ |
|+.+++.+++.+ + +.+++.++.+++.+++.+++.+++. .+ +. |
100000 +------------------------------------------------------------------+


vm-scalability.throughput

1.6e+07 +-----------------------------------------------------------------+
| |
1.5e+07 |-+ + + + |
1.4e+07 |+. + + +. + + +. + + |
| +++ +.+ +.+++ .++ +++ +.++ +++ +.+ +. |
1.3e+07 |-+ + +.+++ +++.++ +++.+++.+|
1.2e+07 |-+ |
| |
1.1e+07 |-+ |
1e+07 |-+ |
| |
9e+06 |-+ O O O O |
8e+06 |-+ O OO O OO O OO OO O |
|O O O O O |
7e+06 +-----------------------------------------------------------------+


vm-scalability.median

44000 +-------------------------------------------------------------------+
42000 |-+ + + + |
| : : : : :: |
40000 |+. : +. + ++.+ .++. : + .+++. : +. + |
38000 |-++++ + +.+ ++.+++ +++ + ++.+ + +. .+++. +. ++. |
| +++ + + +|
36000 |-+ |
34000 |-+ |
32000 |-+ |
| |
30000 |-+ |
28000 |-+ O O O |
| O O O O |
26000 |O+OOO O O OO OOO O O O |
24000 +-------------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


Attachments:
(No filename) (81.22 kB)
config-5.6.0-05750-gc0d0381ade798 (208.58 kB)
job-script (7.68 kB)
job.yaml (5.29 kB)
reproduce (4.81 kB)

2020-06-22 22:06:23

by Mike Kravetz

[permalink] [raw]
Subject: Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

On 6/21/20 5:55 PM, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed a -33.4% regression of vm-scalability.throughput due to commit:
>
>
> commit: c0d0381ade79885c04a04c303284b040616b116e ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: vm-scalability
> on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory
> with following parameters:
>
> runtime: 300s
> size: 8T
> test: anon-cow-seq-hugetlb
> cpufreq_governor: performance
> ucode: 0x11
>

Some performance regression is not surprising as the change includes acquiring
and holding the i_mmap_rwsem (in read mode) during hugetlb page faults. 33.4%
seems a bit high. But, the test is primarily exercising the hugetlb page
fault path and little else.
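
For context, roughly what the anon-cow-seq-hugetlb case does from user space is sketched below (this is not the test's actual source; the huge page size and page count are illustrative assumptions): a private anonymous hugetlb mapping is populated by the parent, and a forked child rewrites it sequentially, so every write takes a hugetlb COW fault.

#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define HPAGE_SIZE (2UL << 20)   /* assume 2MB huge pages */
#define NR_HPAGES  64UL          /* illustrative size only */

int main(void)
{
	size_t len = NR_HPAGES * HPAGE_SIZE;

	/* Private anonymous hugetlb mapping: PMD sharing is impossible here. */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED)
		return 1;

	memset(p, 1, len);              /* parent touches every huge page */

	if (fork() == 0) {
		/* child writes sequentially: each touch is a hugetlb COW fault */
		for (size_t off = 0; off < len; off += HPAGE_SIZE)
			p[off] = 2;
		_exit(0);
	}
	wait(NULL);
	return 0;
}

Because the mapping is private and anonymous, none of those faults can ever involve a shared PMD, which matters for the locking discussed next.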

The reason for taking the i_mmap_rwsem is to prevent PMD unsharing from
invalidating the pmd we are operating on. This specific test case is operating
on anonymous private mappings. So, PMD sharing is not possible and we can
eliminate acquiring the mutex in this case. In fact, we should check all
mappings (even sharable) for the possibility of PMD sharing and only take the
mutex if necessary. It will make the code a bit uglier, but will take care
of some of these regressions. We still need to take the mutex in the case
of PMD sharing. I'm afraid a regression is unavoidable in that case.
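
A minimal sketch of that idea follows (this is not the patch referred to below; the fault-path structure is heavily simplified, and vma_shareable() stands in for the existing helper in mm/hugetlb.c):

/*
 * Sketch only: take i_mmap_rwsem in the hugetlb fault path only when
 * PMD sharing is actually possible for this vma.
 */
static vm_fault_t hugetlb_fault_sketch(struct vm_area_struct *vma,
				       struct address_space *mapping,
				       unsigned long haddr)
{
	/*
	 * Anonymous private mappings (the regressing case) can never
	 * share PMDs, so the semaphore can be skipped for them.
	 */
	bool take_sem = (vma->vm_flags & VM_MAYSHARE) &&
			vma_shareable(vma, haddr);
	vm_fault_t ret = 0;

	if (take_sem)
		i_mmap_lock_read(mapping);

	/* ... actual fault handling elided ... */

	if (take_sem)
		i_mmap_unlock_read(mapping);
	return ret;
}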

I'll put together a patch.
--
Mike Kravetz

2020-06-25 23:40:39

by Mike Kravetz

[permalink] [raw]
Subject: Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

On 6/22/20 3:01 PM, Mike Kravetz wrote:
> On 6/21/20 5:55 PM, kernel test robot wrote:
>> Greeting,
>>
>> FYI, we noticed a -33.4% regression of vm-scalability.throughput due to commit:
>>
>>
>> commit: c0d0381ade79885c04a04c303284b040616b116e ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>
>> in testcase: vm-scalability
>> on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory
>> with following parameters:
>>
>> runtime: 300s
>> size: 8T
>> test: anon-cow-seq-hugetlb
>> cpufreq_governor: performance
>> ucode: 0x11
>>
>
> Some performance regression is not surprising as the change includes acquiring
> and holding the i_mmap_rwsem (in read mode) during hugetlb page faults. 33.4%
> seems a bit high. But, the test is primarily exercising the hugetlb page
> fault path and little else.
>
> The reason for taking the i_mmap_rwsem is to prevent PMD unsharing from
> invalidating the pmd we are operating on. This specific test case is operating
> on anonymous private mappings. So, PMD sharing is not possible and we can
> eliminate acquiring the mutex in this case. In fact, we should check all
> mappings (even sharable) for the possibly of PMD sharing and only take the
> mutex if necessary. It will make the code a bit uglier, but will take care
> of some of these regressions. We still need to take the mutex in the case
> of PMD sharing. I'm afraid a regression is unavoidable in that case.
>
> I'll put together a patch.

Not acquiring the mutex on faults when sharing is not possible is quite
straightforward. We can even use the existing routine vma_shareable()
to easily check. However, the next patch in the series 87bf91d39bb5
"hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race" depends
on always acquiring the mutex. If we break this assumption, then the
code to back out hugetlb reservations needs to be written. A high level
view of what needs to be done is in the commit message for 87bf91d39bb5.

I'm working on the code to back out reservations.
--
Mike Kravetz

2020-07-06 20:30:26

by Mike Kravetz

[permalink] [raw]
Subject: [RFC PATCH 0/3] hugetlbfs: address fault time regression

Commits c0d0381ade79 and 87bf91d39bb5 changed the way hugetlb locking
was performed to address BUGs. One specific change was to always take
the i_mmap_rwsem in read mode during fault processing. One result of
this change was a 33% regression for anon non-shared page faults [1].

Technically, i_mmap_rwsem only needs to be taken during page faults
if the pmd can potentially be shared. pmd sharing is not possible for
anon non-shared mappings (as in the reported regression), therefore the
code can be modified to not acquire the semaphore in this case.

Unfortunately, commit 87bf91d39bb5 depends on i_mmap_rwsem always being
taken in the fault path to prevent fault/truncation races. So, that
approach is no longer appropriate. Rather, the code now detects races
and backs out operations.
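
A rough sketch of what "detect and back out" means in the fault path (illustrative only, not the code in these patches; the surrounding no-page handling is elided, and restore_reserve_on_error() is assumed to be the existing reservation clean-up helper in mm/hugetlb.c):

/*
 * Sketch only: with i_mmap_rwsem no longer held across the fault, a
 * truncate can race with it.  Re-check i_size under the page table
 * lock and back out the newly allocated page and its reservation if
 * the offset is now beyond EOF.
 */
static vm_fault_t no_page_finish_sketch(struct hstate *h,
		struct mm_struct *mm, struct vm_area_struct *vma,
		struct address_space *mapping, pgoff_t idx,
		unsigned long haddr, pte_t *ptep, struct page *page)
{
	spinlock_t *ptl = huge_pte_lock(h, mm, ptep);
	pgoff_t size = i_size_read(mapping->host) >> huge_page_shift(h);

	if (idx >= size) {
		/* lost the race with truncate: undo the work */
		spin_unlock(ptl);
		restore_reserve_on_error(h, vma, haddr, page);
		put_page(page);
		return VM_FAULT_SIGBUS;
	}

	/* ... set_huge_pte_at() and the rest of the normal path ... */
	spin_unlock(ptl);
	return 0;
}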

This code "works" in that it only takes i_mmap_rwsem when necessary and
addresses the original BUGs. However, I am sending this as an RFC because:
- I am unsure if the added complexity is worth the performance benefit.
- There needs to be a better way/location to make a decision about taking
  the semaphore. See FIXME's in the code.

Comments and suggestions would be appreciated.

[1] https://lore.kernel.org/lkml/20200622005551.GK5535@shao2-debian

Mike Kravetz (3):
Revert: "hugetlbfs: Use i_mmap_rwsem to address page fault/truncate
race"
hugetlbfs: Only take i_mmap_rwsem when sharing is possible
hugetlbfs: handle page fault/truncate races

fs/hugetlbfs/inode.c | 69 +++++++++-----------
mm/hugetlb.c | 150 ++++++++++++++++++++++++++++++-------------
2 files changed, 137 insertions(+), 82 deletions(-)

--
2.25.4

2020-08-21 08:40:43

by Xing Zhengjun

[permalink] [raw]
Subject: Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression



On 6/26/2020 5:33 AM, Mike Kravetz wrote:
> On 6/22/20 3:01 PM, Mike Kravetz wrote:
>> On 6/21/20 5:55 PM, kernel test robot wrote:
>>> Greeting,
>>>
>>> FYI, we noticed a -33.4% regression of vm-scalability.throughput due to commit:
>>>
>>>
>>> commit: c0d0381ade79885c04a04c303284b040616b116e ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>
>>> in testcase: vm-scalability
>>> on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory
>>> with following parameters:
>>>
>>> runtime: 300s
>>> size: 8T
>>> test: anon-cow-seq-hugetlb
>>> cpufreq_governor: performance
>>> ucode: 0x11
>>>
>>
>> Some performance regression is not surprising as the change includes acquiring
>> and holding the i_mmap_rwsem (in read mode) during hugetlb page faults. 33.4%
>> seems a bit high. But, the test is primarily exercising the hugetlb page
>> fault path and little else.
>>
>> The reason for taking the i_mmap_rwsem is to prevent PMD unsharing from
>> invalidating the pmd we are operating on. This specific test case is operating
>> on anonymous private mappings. So, PMD sharing is not possible and we can
>> eliminate acquiring the mutex in this case. In fact, we should check all
>> mappings (even sharable) for the possibly of PMD sharing and only take the
>> mutex if necessary. It will make the code a bit uglier, but will take care
>> of some of these regressions. We still need to take the mutex in the case
>> of PMD sharing. I'm afraid a regression is unavoidable in that case.
>>
>> I'll put together a patch.
>
> Not acquiring the mutex on faults when sharing is not possible is quite
> straight forward. We can even use the existing routine vma_shareable()
> to easily check. However, the next patch in the series 87bf91d39bb5
> "hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race" depends
> on always acquiring the mutex. If we break this assumption, then the
> code to back out hugetlb reservations needs to be written. A high level
> view of what needs to be done is in the commit message for 87bf91d39bb5.
>
> I'm working on the code to back out reservations.
>

I find that commit 34ae204f18519f0920bd50a644abd6fefc8dbfcf ("hugetlbfs: remove
call to huge_pte_alloc without i_mmap_rwsem") fixed this regression. I
tested with that patch and the regression was reduced to 10.1%. Do you have
plans to continue improving it? Thanks.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/runtime/size/test/cpufreq_governor/ucode:

lkp-knm01/vm-scalability/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/8T/anon-cow-seq-hugetlb/performance/0x11

commit:
49aef7175cc6eb703a9280a7b830e675fe8f2704
c0d0381ade79885c04a04c303284b040616b116e
v5.8
34ae204f18519f0920bd50a644abd6fefc8dbfcf
v5.9-rc1

49aef7175cc6eb70 c0d0381ade79885c04a04c30328                        v5.8 34ae204f18519f0920bd50a644a                    v5.9-rc1
---------------- --------------------------- --------------------------- --------------------------- ---------------------------
       %stddev      %change      %stddev      %change      %stddev      %change      %stddev      %change      %stddev
           \           |            \            |            \            |            \            |            \
     38084           -31.1%       26231 ± 2%      -26.6%       27944 ± 5%       -7.0%       35405            -7.5%       35244        vm-scalability.median
      9.92 ± 9%      +12.0        21.95 ± 4%       +3.9        13.87 ± 30%      -5.3         4.66 ± 9%       -6.6         3.36 ± 7%   vm-scalability.median_stddev%
  12827311           -35.0%     8340256 ± 2%      -30.9%     8865669 ± 5%      -10.1%    11532087           -10.2%    11513595 ± 2%   vm-scalability.throughput
 2.507e+09           -22.7%   1.938e+09           -15.3%   2.122e+09 ± 6%       +8.0%   2.707e+09            +8.0%   2.707e+09 ± 2%   vm-scalability.workload



--
Zhengjun Xing

2020-08-21 21:06:30

by Mike Kravetz

[permalink] [raw]
Subject: Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

On 8/21/20 1:39 AM, Xing Zhengjun wrote:
>
>
> On 6/26/2020 5:33 AM, Mike Kravetz wrote:
>> On 6/22/20 3:01 PM, Mike Kravetz wrote:
>>> On 6/21/20 5:55 PM, kernel test robot wrote:
>>>> Greeting,
>>>>
>>>> FYI, we noticed a -33.4% regression of vm-scalability.throughput due to commit:
>>>>
>>>>
>>>> commit: c0d0381ade79885c04a04c303284b040616b116e ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>>
>>>> in testcase: vm-scalability
>>>> on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory
>>>> with following parameters:
>>>>
>>>> runtime: 300s
>>>> size: 8T
>>>> test: anon-cow-seq-hugetlb
>>>> cpufreq_governor: performance
>>>> ucode: 0x11
>>>>
>>>
>>> Some performance regression is not surprising as the change includes acquiring
>>> and holding the i_mmap_rwsem (in read mode) during hugetlb page faults. 33.4%
>>> seems a bit high. But, the test is primarily exercising the hugetlb page
>>> fault path and little else.
>>>
>>> The reason for taking the i_mmap_rwsem is to prevent PMD unsharing from
>>> invalidating the pmd we are operating on. This specific test case is operating
>>> on anonymous private mappings. So, PMD sharing is not possible and we can
>>> eliminate acquiring the mutex in this case. In fact, we should check all
>>> mappings (even sharable) for the possibly of PMD sharing and only take the
>>> mutex if necessary. It will make the code a bit uglier, but will take care
>>> of some of these regressions. We still need to take the mutex in the case
>>> of PMD sharing. I'm afraid a regression is unavoidable in that case.
>>>
>>> I'll put together a patch.
>>
>> Not acquiring the mutex on faults when sharing is not possible is quite
>> straight forward. We can even use the existing routine vma_shareable()
>> to easily check. However, the next patch in the series 87bf91d39bb5
>> "hugetlbfs: Use i_mmap_rwsem to address page fault/truncate race" depends
>> on always acquiring the mutex. If we break this assumption, then the
>> code to back out hugetlb reservations needs to be written. A high level
>> view of what needs to be done is in the commit message for 87bf91d39bb5.
>>
>> I'm working on the code to back out reservations.
>>
>
> I find that 34ae204f18519f0920bd50a644abd6fefc8dbfcf(hugetlbfs: remove call to huge_pte_alloc without i_mmap_rwsem) fixed this regression, I test with the patch, the regression reduced to 10.1%, do you have plan to continue to improve it? Thanks.

Thank you for testing!

Commit 34ae204f1851 was not created to address performance. Rather it was
created more for the sake of correctness.

IIRC, this test is going to stress the page fault path by continuing to produce
faults for the duration of the test. Commit 34ae204f1851 removed an unneeded,
somewhat dangerous and redundant call to huge_pte_alloc in the fault path.
Your testing showed that removing this call helped address a good amount of the
performance regression. The series proposed in this thread attempts to
eliminate more potentially unnecessary code in the hugetlb fault path.
Specifically, it will only acquire the i_mmap_rwsem when necessary.
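
As a purely conceptual before/after of what that removal buys in a fault-heavy workload (this is not the literal diff of 34ae204f1851; only the shape of the change is shown, and the surrounding fault code is elided):

/* Before (shape of the old path): huge_pte_alloc() could be reached
 * twice per fault, once without i_mmap_rwsem held. */
	ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));  /* no semaphore */
	/* ... */
	i_mmap_lock_read(mapping);
	ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));  /* again, locked */

/* After (shape of the new path): the pte is allocated once, with
 * i_mmap_rwsem held. */
	i_mmap_lock_read(mapping);
	ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));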

Would you be willing to test this series on top of 34ae204f1851? I will need
to rebase the series to take the changes made by 34ae204f1851 into account.

BTW - I believe that shared hugetlb mappings are the most common use case.
In the shared case, acquiring i_mmap_rwsem is necessary if pmd sharing is
possible. There is little we can do to eliminate this. The test suite does
not appear to test shared mappings.
--
Mike Kravetz

2020-08-21 23:40:19

by Mike Kravetz

[permalink] [raw]
Subject: Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

On 8/21/20 2:02 PM, Mike Kravetz wrote:
> Would you be willing to test this series on top of 34ae204f1851? I will need
> to rebase the series to take the changes made by 34ae204f1851 into account.

Actually, the series in this thread will apply/run cleanly on top of
34ae204f1851. No need to rebase or port. If we decide to move forward, more
work is required. See a few FIXME's in the patches.
--
Mike Kravetz

2020-10-12 06:18:03

by Xing Zhengjun

[permalink] [raw]
Subject: Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

Hi Mike,

I re-tested it on v5.9-rc8 and the regression still exists; it is almost
the same as with 34ae204f1851. Do you have time to look at it? Thanks.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/runtime/size/test/cpufreq_governor/ucode:

lkp-knm01/vm-scalability/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/8T/anon-cow-seq-hugetlb/performance/0x11

commit:
49aef7175cc6eb703a9280a7b830e675fe8f2704
c0d0381ade79885c04a04c303284b040616b116e
v5.8
34ae204f18519f0920bd50a644abd6fefc8dbfcf
v5.9-rc1
v5.9-rc8

49aef7175cc6eb70 c0d0381ade79885c04a04c30328                        v5.8 34ae204f18519f0920bd50a644a                    v5.9-rc1                    v5.9-rc8
---------------- --------------------------- --------------------------- --------------------------- --------------------------- ---------------------------
       %stddev      %change      %stddev      %change      %stddev      %change      %stddev      %change      %stddev      %change      %stddev
           \           |            \            |            \            |            \            |            \            |            \
     38043 ± 3%      -30.2%       26560 ± 4%      -29.5%       26815 ± 6%       -7.4%       35209 ± 2%       -7.4%       35244           -8.8%       34704        vm-scalability.median
      7.86 ± 19%      +9.7        17.54 ± 21%     +10.4        18.23 ± 34%      -3.1         4.75 ± 7%       -4.5         3.36 ± 7%      -4.0         3.82 ± 15%  vm-scalability.median_stddev%
  12822071 ± 3%      -34.1%     8450822 ± 4%      -33.6%     8517252 ± 6%      -10.7%    11453675 ± 2%      -10.2%    11513595 ± 2%     -11.6%    11331657        vm-scalability.throughput
 2.523e+09 ± 3%      -20.7%   2.001e+09 ± 5%      -19.9%   2.021e+09 ± 7%       +6.8%   2.694e+09 ± 2%       +7.3%   2.707e+09 ± 2%      +5.4%   2.661e+09        vm-scalability.workload


On 8/22/2020 7:36 AM, Mike Kravetz wrote:
> On 8/21/20 2:02 PM, Mike Kravetz wrote:
>> Would you be willing to test this series on top of 34ae204f1851? I will need
>> to rebase the series to take the changes made by 34ae204f1851 into account.
>
> Actually, the series in this thread will apply/run cleanly on top of
> 34ae204f1851. No need to rebase or port. If we decide to move forward more
> work is required. See a few FIXME's in the patches.
>

--
Zhengjun Xing

2020-10-12 17:46:32

by Mike Kravetz

[permalink] [raw]
Subject: Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

On 10/11/20 10:29 PM, Xing Zhengjun wrote:
> Hi Mike,
>
> I re-test it in v5.9-rc8, the regression still existed. It is almost the same as 34ae204f1851. Do you have time to look at it? Thanks.
>

Thank you for testing.

Just curious, did you apply the series in this thread or just test v5.9-rc8?
If just testing v5.9-rc8, no changes to this code were added after 34ae204f1851,
so results being the same are expected.

There are some functional issues with this new hugetlb locking model that
are currently being worked. It is likely to result in significantly different
code. The performance issues discovered here will be taken into account with
the new code. However, as previously mentioned, additional synchronization
is required for functional correctness. As a result, there will be some
regression in this code.

--
Mike Kravetz

> =========================================================================================
> tbox_group/testcase/rootfs/kconfig/compiler/runtime/size/test/cpufreq_governor/ucode:
>
> lkp-knm01/vm-scalability/debian-x86_64-20191114.cgz/x86_64-rhel-7.6/gcc-7/300s/8T/anon-cow-seq-hugetlb/performance/0x11
>
> commit:
> 49aef7175cc6eb703a9280a7b830e675fe8f2704
> c0d0381ade79885c04a04c303284b040616b116e
> v5.8
> 34ae204f18519f0920bd50a644abd6fefc8dbfcf
> v5.9-rc1
> v5.9-rc8
>
> 49aef7175cc6eb70 c0d0381ade79885c04a04c30328 v5.8 34ae204f18519f0920bd50a644a v5.9-rc1 v5.9-rc8
> ---------------- --------------------------- --------------------------- --------------------------- --------------------------- ---------------------------
> %stddev %change %stddev %change %stddev %change %stddev %change %stddev %change %stddev
> \ | \ | \ | \ | \ | \
> 38043 ± 3% -30.2% 26560 ± 4% -29.5% 26815 ± 6% -7.4% 35209 ± 2% -7.4% 35244 -8.8% 34704 vm-scalability.median
> 7.86 ± 19% +9.7 17.54 ± 21% +10.4 18.23 ± 34% -3.1 4.75 ± 7% -4.5 3.36 ± 7% -4.0 3.82 ± 15% vm-scalability.median_stddev%
> 12822071 ± 3% -34.1% 8450822 ± 4% -33.6% 8517252 ± 6% -10.7% 11453675 ± 2% -10.2% 11513595 ± 2% -11.6% 11331657 vm-scalability.throughput
> 2.523e+09 ± 3% -20.7% 2.001e+09 ± 5% -19.9% 2.021e+09 ± 7% +6.8% 2.694e+09 ± 2% +7.3% 2.707e+09 ± 2% +5.4% 2.661e+09 vm-scalability.workload
>
>
> On 8/22/2020 7:36 AM, Mike Kravetz wrote:
>> On 8/21/20 2:02 PM, Mike Kravetz wrote:
>>> Would you be willing to test this series on top of 34ae204f1851? I will need
>>> to rebase the series to take the changes made by 34ae204f1851 into account.
>>
>> Actually, the series in this thread will apply/run cleanly on top of
>> 34ae204f1851. No need to rebase or port. If we decide to move forward more
>> work is required. See a few FIXME's in the patches.
>>

2020-10-13 11:21:24

by Xing Zhengjun

[permalink] [raw]
Subject: Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression



On 10/13/2020 1:40 AM, Mike Kravetz wrote:
> On 10/11/20 10:29 PM, Xing Zhengjun wrote:
>> Hi Mike,
>>
>> I re-test it in v5.9-rc8, the regression still existed. It is almost the same as 34ae204f1851. Do you have time to look at it? Thanks.
>>
>
> Thank you for testing.
>
> Just curious, did you apply the series in this thread or just test v5.9-rc8?
> If just testing v5.9-rc8, no changes to this code were added after 34ae204f1851,
> so results being the same are expected.
>

I just tested v5.9-rc8. Where can I find the series of patches you mentioned
here? Or should I just wait for the next mainline release?


> There are some functional issues with this new hugetlb locking model that
> are currently being worked. It is likely to result in significantly different
> code. The performance issues discovered here will be taken into account with
> the new code. However, as previously mentioned additional synchronization
> is required for functional correctness. As a result, there will be some
> regression in this code.
>

--
Zhengjun Xing

2020-10-13 11:23:10

by Mike Kravetz

[permalink] [raw]
Subject: Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression

On 10/12/20 6:59 PM, Xing Zhengjun wrote:
>
>
> On 10/13/2020 1:40 AM, Mike Kravetz wrote:
>> On 10/11/20 10:29 PM, Xing Zhengjun wrote:
>>> Hi Mike,
>>>
>>> I re-test it in v5.9-rc8, the regression still existed. It is almost the same as 34ae204f1851. Do you have time to look at it? Thanks.
>>>
>>
>> Thank you for testing.
>>
>> Just curious, did you apply the series in this thread or just test v5.9-rc8?
>> If just testing v5.9-rc8, no changes to this code were added after 34ae204f1851,
>> so results being the same are expected.
>>
>
> I just test v5.9-rc8. Where can I find the series patches you mentioned here? Or just wait for the next mainline release?
>

My apologies. I missed that you were not cc'ed on this thread:
https://lore.kernel.org/linux-mm/[email protected]/

As mentioned, there will likely be another revision to the way locking is
handled. The new scheme will try to consider performance as is done in
the above link. I suggest you wait for the next revision. If you do not mind,
I will cc you when the new code is posted.
--
Mike Kravetz

2020-10-13 11:37:17

by Xing Zhengjun

[permalink] [raw]
Subject: Re: [LKP] Re: [hugetlbfs] c0d0381ade: vm-scalability.throughput -33.4% regression



On 10/13/2020 11:01 AM, Mike Kravetz wrote:
> On 10/12/20 6:59 PM, Xing Zhengjun wrote:
>>
>>
>> On 10/13/2020 1:40 AM, Mike Kravetz wrote:
>>> On 10/11/20 10:29 PM, Xing Zhengjun wrote:
>>>> Hi Mike,
>>>>
>>>> I re-test it in v5.9-rc8, the regression still existed. It is almost the same as 34ae204f1851. Do you have time to look at it? Thanks.
>>>>
>>>
>>> Thank you for testing.
>>>
>>> Just curious, did you apply the series in this thread or just test v5.9-rc8?
>>> If just testing v5.9-rc8, no changes to this code were added after 34ae204f1851,
>>> so results being the same are expected.
>>>
>>
>> I just test v5.9-rc8. Where can I find the series patches you mentioned here? Or just wait for the next mainline release?
>>
>
> My apologies. I missed that you were not cc'ed on this thred:
> https://lore.kernel.org/linux-mm/[email protected]/
>
> As mentioned, there will likely be another revision to the way locking is
> handled. The new scheme will try to consider performance as is done in
> the above link. I suggest you wait for next revision. If you do not mind,
> I will cc you when the new code is posted.
>

OK. I will wait for the next revision.

--
Zhengjun Xing