2018-11-05 05:08:51

by Chen, Rong A

Subject: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1% regression

Greetings,

FYI, we noticed a -64.1% regression of will-it-scale.per_thread_ops due to commit:


commit: 9bc8039e715da3b53dbac89525323a9f2f69b7b5 ("mm: brk: downgrade mmap_sem to read when shrinking")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
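For context, the blamed commit changes the locking in the brk shrinking path so that the expensive munmap work runs under the read side of mmap_sem instead of the write side. A rough pseudocode sketch of that pattern, paraphrased from the commit subject (not the verbatim diff; identifiers are from the kernel's rwsem API but the control flow here is illustrative):

```
/* sketch of the shrinking path in __do_sys_brk() after 9bc8039e71 */
if (down_write_killable(&mm->mmap_sem))
        return -EINTR;
...
if (brk <= mm->brk) {
        /* shrinking: the unmap can proceed under the read lock,
         * so concurrent page faults are no longer blocked for
         * the whole munmap */
        downgrade_write(&mm->mmap_sem);
        downgraded = true;
        /* ... perform the munmap under the read lock ... */
}
...
if (downgraded)
        up_read(&mm->mmap_sem);
else
        up_write(&mm->mmap_sem);
```

The profile below is consistent with this: time formerly spent spinning in osq_lock under down_write_killable is replaced by up_read/rwsem_wake wakeups and idle time.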

in testcase: will-it-scale
on test machine: 8 threads Ivy Bridge with 16G memory
with following parameters:

nr_task: 100%
mode: thread
test: brk1
ucode: 0x20
cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based variant of each test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale



Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
gcc-7/performance/x86_64-rhel-7.2/thread/100%/debian-x86_64-2018-04-03.cgz/lkp-ivb-d01/brk1/will-it-scale/0x20

commit:
85a06835f6 ("mm: mremap: downgrade mmap_sem to read when shrinking")
9bc8039e71 ("mm: brk: downgrade mmap_sem to read when shrinking")

85a06835f6f1ba79 9bc8039e715da3b53dbac89525
---------------- --------------------------
fail:runs %reproduction fail:runs
| | |
1:4 -25% :4 kmsg.a0f723>]usb_hcd_irq
:4 25% 1:4 kmsg.b6a4a7>]usb_hcd_irq
1:4 -25% :4 kmsg.cae42e>]usb_hcd_irq
:4 25% 1:4 kmsg.fb#f87>]usb_hcd_irq
:4 25% 1:4 kmsg.fd#>]usb_hcd_irq
%stddev %change %stddev
\ | \
196250 ± 8% -64.1% 70494 will-it-scale.per_thread_ops
127330 ± 19% -98.0% 2525 ± 24% will-it-scale.time.involuntary_context_switches
727.50 ± 2% -77.0% 167.25 will-it-scale.time.percent_of_cpu_this_job_got
2141 ± 2% -77.6% 479.12 will-it-scale.time.system_time
50.48 ± 7% -48.5% 25.98 will-it-scale.time.user_time
34925294 ± 18% +270.3% 1.293e+08 ± 4% will-it-scale.time.voluntary_context_switches
1570007 ± 8% -64.1% 563958 will-it-scale.workload
6435 ± 2% -6.4% 6024 proc-vmstat.nr_shmem
1298 ± 16% -44.5% 721.00 ± 18% proc-vmstat.pgactivate
2341 +16.4% 2724 slabinfo.kmalloc-96.active_objs
2341 +16.4% 2724 slabinfo.kmalloc-96.num_objs
6346 ±150% -87.8% 776.25 ± 9% softirqs.NET_RX
160107 ± 8% +151.9% 403273 softirqs.SCHED
1097999 -13.0% 955526 softirqs.TIMER
5.50 ± 9% -81.8% 1.00 vmstat.procs.r
230700 ± 19% +269.9% 853292 ± 4% vmstat.system.cs
26706 ± 3% +15.7% 30910 ± 5% vmstat.system.in
11.24 ± 23% +72.2 83.39 mpstat.cpu.idle%
0.00 ±131% +0.0 0.04 ± 99% mpstat.cpu.iowait%
86.32 ± 2% -70.8 15.54 mpstat.cpu.sys%
2.44 ± 7% -1.4 1.04 ± 8% mpstat.cpu.usr%
20610709 ± 15% +2376.0% 5.103e+08 ± 34% cpuidle.C1.time
3233399 ± 8% +241.5% 11042785 ± 25% cpuidle.C1.usage
36172040 ± 6% +931.3% 3.73e+08 ± 15% cpuidle.C1E.time
783605 ± 4% +548.7% 5083041 ± 18% cpuidle.C1E.usage
28753819 ± 39% +1054.5% 3.319e+08 ± 49% cpuidle.C3.time
283912 ± 25% +688.4% 2238225 ± 34% cpuidle.C3.usage
1.507e+08 ± 47% +292.3% 5.913e+08 ± 28% cpuidle.C6.time
339861 ± 37% +549.7% 2208222 ± 24% cpuidle.C6.usage
2709719 ± 5% +824.2% 25043444 cpuidle.POLL.time
28602864 ± 18% +173.7% 78276116 ± 10% cpuidle.POLL.usage
3338 ± 2% -71.8% 941.75 turbostat.Avg_MHz
90.40 ± 2% -65.1 25.31 turbostat.Busy%
3233316 ± 8% +241.5% 11042588 ± 25% turbostat.C1
0.83 ± 16% +20.2 21.02 ± 35% turbostat.C1%
783559 ± 4% +548.7% 5082972 ± 18% turbostat.C1E
1.46 ± 7% +13.9 15.34 ± 15% turbostat.C1E%
283898 ± 25% +688.4% 2238202 ± 34% turbostat.C3
1.17 ± 39% +12.5 13.66 ± 49% turbostat.C3%
339857 ± 37% +549.7% 2208205 ± 24% turbostat.C6
6.05 ± 46% +18.2 24.30 ± 28% turbostat.C6%
6.12 ± 18% +755.7% 52.39 ± 9% turbostat.CPU%c1
2.18 ±105% +735.9% 18.18 ± 45% turbostat.CPU%c3
1.30 ±117% +216.3% 4.12 ± 92% turbostat.CPU%c6
38.31 -37.2% 24.06 turbostat.CorWatt
8285923 ± 2% +13.9% 9436075 ± 6% turbostat.IRQ
43.17 -33.4% 28.77 turbostat.PkgWatt
8.92e+11 -67.9% 2.862e+11 ± 31% perf-stat.branch-instructions
0.55 ± 4% +1.0 1.53 ± 5% perf-stat.branch-miss-rate%
0.19 ± 22% +0.7 0.89 ± 69% perf-stat.cache-miss-rate%
2.401e+10 -57.3% 1.025e+10 ± 28% perf-stat.cache-references
71517245 ± 18% +264.5% 2.607e+08 ± 4% perf-stat.context-switches
1.85 -21.9% 1.44 ± 2% perf-stat.cpi
8.223e+12 ± 2% -76.8% 1.904e+12 ± 32% perf-stat.cpu-cycles
0.34 ± 11% +0.3 0.63 ± 9% perf-stat.dTLB-load-miss-rate%
4.563e+09 ± 10% -50.7% 2.252e+09 ± 27% perf-stat.dTLB-load-misses
1.357e+12 -73.2% 3.63e+11 ± 31% perf-stat.dTLB-loads
0.01 ± 28% +0.0 0.02 ± 10% perf-stat.dTLB-store-miss-rate%
20813820 ± 31% +109.2% 43542782 ± 25% perf-stat.dTLB-store-misses
97.04 -13.4 83.65 ± 2% perf-stat.iTLB-load-miss-rate%
5.871e+08 ± 8% -48.9% 3.001e+08 ± 35% perf-stat.iTLB-load-misses
17851698 ± 4% +235.7% 59936365 ± 41% perf-stat.iTLB-loads
4.456e+12 -70.6% 1.311e+12 ± 31% perf-stat.instructions
7651 ± 9% -41.0% 4513 ± 11% perf-stat.instructions-per-iTLB-miss
0.54 +28.2% 0.69 ± 2% perf-stat.ipc
131812 ± 2% -75.7% 31973 sched_debug.cfs_rq:/.exec_clock.avg
133451 ± 2% -67.5% 43371 ± 30% sched_debug.cfs_rq:/.exec_clock.max
130067 ± 4% -83.3% 21767 ± 30% sched_debug.cfs_rq:/.exec_clock.min
1220 ± 87% +592.8% 8456 ± 98% sched_debug.cfs_rq:/.exec_clock.stddev
229519 ± 6% +70.6% 391456 ± 4% sched_debug.cfs_rq:/.load.max
65462 ± 11% +141.1% 157808 ± 4% sched_debug.cfs_rq:/.load.stddev
82.00 ± 3% -83.5% 13.54 ± 30% sched_debug.cfs_rq:/.load_avg.min
985041 ± 4% -90.8% 90737 sched_debug.cfs_rq:/.min_vruntime.avg
1004586 ± 4% -87.2% 128180 ± 21% sched_debug.cfs_rq:/.min_vruntime.max
969836 ± 5% -93.6% 62139 ± 28% sched_debug.cfs_rq:/.min_vruntime.min
0.77 ± 5% -42.6% 0.44 ± 10% sched_debug.cfs_rq:/.nr_running.avg
0.34 ± 12% +39.5% 0.48 sched_debug.cfs_rq:/.nr_running.stddev
1.79 ± 7% -63.1% 0.66 ± 22% sched_debug.cfs_rq:/.nr_spread_over.avg
10.54 ± 20% -68.0% 3.38 ± 26% sched_debug.cfs_rq:/.nr_spread_over.max
3.52 ± 18% -68.3% 1.12 ± 25% sched_debug.cfs_rq:/.nr_spread_over.stddev
96.05 ± 6% -33.5% 63.92 ± 8% sched_debug.cfs_rq:/.runnable_load_avg.avg
237.58 ± 7% +31.3% 311.83 ± 2% sched_debug.cfs_rq:/.runnable_load_avg.max
67.84 ± 9% +55.7% 105.65 sched_debug.cfs_rq:/.runnable_load_avg.stddev
200408 ± 8% +80.6% 361912 ± 6% sched_debug.cfs_rq:/.runnable_weight.max
56555 ± 14% +160.3% 147211 ± 5% sched_debug.cfs_rq:/.runnable_weight.stddev
936.14 ± 2% -55.3% 418.09 ± 6% sched_debug.cfs_rq:/.util_avg.avg
1341 ± 11% -43.2% 762.54 ± 11% sched_debug.cfs_rq:/.util_avg.max
587.25 ± 18% -69.6% 178.75 ± 23% sched_debug.cfs_rq:/.util_avg.min
518.64 ± 11% -90.2% 50.84 ± 29% sched_debug.cfs_rq:/.util_est_enqueued.avg
1018 ± 4% -70.8% 297.75 ± 23% sched_debug.cfs_rq:/.util_est_enqueued.max
386.54 ± 6% -73.6% 102.24 ± 21% sched_debug.cfs_rq:/.util_est_enqueued.stddev
116574 ± 30% +67.2% 194964 ± 30% sched_debug.cpu.avg_idle.avg
93.29 ± 16% -35.1% 60.54 ± 8% sched_debug.cpu.cpu_load[0].avg
221.08 ± 8% +34.3% 296.96 ± 9% sched_debug.cpu.cpu_load[0].max
63.60 ± 9% +57.6% 100.26 ± 9% sched_debug.cpu.cpu_load[0].stddev
95.54 ± 13% -47.7% 49.98 ± 7% sched_debug.cpu.cpu_load[1].avg
41.00 ± 52% -97.7% 0.96 ± 68% sched_debug.cpu.cpu_load[1].min
44.57 ± 16% +38.2% 61.57 ± 8% sched_debug.cpu.cpu_load[1].stddev
98.62 ± 10% -58.2% 41.19 ± 7% sched_debug.cpu.cpu_load[2].avg
169.42 ± 8% -28.1% 121.79 ± 9% sched_debug.cpu.cpu_load[2].max
58.04 ± 35% -94.5% 3.17 ± 47% sched_debug.cpu.cpu_load[2].min
100.91 ± 8% -64.5% 35.80 ± 10% sched_debug.cpu.cpu_load[3].avg
155.21 ± 5% -39.6% 93.79 ± 13% sched_debug.cpu.cpu_load[3].max
70.42 ± 22% -93.3% 4.71 ± 36% sched_debug.cpu.cpu_load[3].min
102.59 ± 6% -68.9% 31.92 ± 17% sched_debug.cpu.cpu_load[4].avg
144.54 ± 5% -44.9% 79.71 ± 28% sched_debug.cpu.cpu_load[4].max
78.79 ± 15% -92.6% 5.83 ± 30% sched_debug.cpu.cpu_load[4].min
1044 ± 5% -33.5% 694.38 ± 4% sched_debug.cpu.curr->pid.avg
835.66 ± 4% +19.0% 994.65 sched_debug.cpu.curr->pid.stddev
228643 ± 6% +71.3% 391578 ± 4% sched_debug.cpu.load.max
64674 ± 11% +143.6% 157538 ± 4% sched_debug.cpu.load.stddev
0.00 ± 5% -48.4% 0.00 ± 46% sched_debug.cpu.next_balance.stddev
1.03 ± 16% -44.9% 0.57 ± 13% sched_debug.cpu.nr_running.avg
6206533 ± 12% +167.7% 16616895 ± 4% sched_debug.cpu.nr_switches.avg
6795551 ± 18% +222.4% 21911344 ± 19% sched_debug.cpu.nr_switches.max
5666982 ± 9% +115.3% 12201219 ± 17% sched_debug.cpu.nr_switches.min
351168 ± 73% +837.8% 3293280 ± 72% sched_debug.cpu.nr_switches.stddev
0.32 ± 21% +109.8% 0.67 ± 8% sched_debug.cpu.nr_uninterruptible.avg
6297340 ± 12% +163.9% 16617675 ± 4% sched_debug.cpu.sched_count.avg
6887997 ± 18% +218.1% 21912464 ± 19% sched_debug.cpu.sched_count.max
5753875 ± 8% +112.1% 12201952 ± 17% sched_debug.cpu.sched_count.min
352908 ± 73% +833.2% 3293323 ± 71% sched_debug.cpu.sched_count.stddev
3067826 ± 12% +170.7% 8305951 ± 4% sched_debug.cpu.sched_goidle.avg
3361061 ± 18% +225.9% 10953485 ± 19% sched_debug.cpu.sched_goidle.max
2798299 ± 9% +117.9% 6098033 ± 17% sched_debug.cpu.sched_goidle.min
175049 ± 72% +840.7% 1646720 ± 72% sched_debug.cpu.sched_goidle.stddev
3232125 ± 11% +159.6% 8389199 ± 4% sched_debug.cpu.ttwu_count.avg
3533980 ± 16% +275.2% 13260800 ± 36% sched_debug.cpu.ttwu_count.max
271212 ±109% +1140.0% 3363044 ± 83% sched_debug.cpu.ttwu_count.stddev
107111 ± 9% +30.6% 139843 ± 17% sched_debug.cpu.ttwu_local.max
85459 ± 6% -29.3% 60381 ± 44% sched_debug.cpu.ttwu_local.min
6551 ± 69% +374.8% 31104 ± 78% sched_debug.cpu.ttwu_local.stddev
82.07 -72.2 9.87 ± 5% perf-profile.calltrace.cycles-pp.down_write_killable.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
81.23 -72.0 9.20 ± 5% perf-profile.calltrace.cycles-pp.call_rwsem_down_write_failed_killable.down_write_killable.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
81.17 -72.0 9.18 ± 5% perf-profile.calltrace.cycles-pp.rwsem_down_write_failed_killable.call_rwsem_down_write_failed_killable.down_write_killable.__x64_sys_brk.do_syscall_64
68.76 -68.8 0.00 perf-profile.calltrace.cycles-pp.osq_lock.rwsem_down_write_failed_killable.call_rwsem_down_write_failed_killable.down_write_killable.__x64_sys_brk
94.12 -54.6 39.55 ± 2% perf-profile.calltrace.cycles-pp.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
94.54 -54.4 40.16 ± 3% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
94.69 -54.3 40.35 ± 3% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
10.16 ± 3% -10.0 0.19 ±173% perf-profile.calltrace.cycles-pp.rwsem_spin_on_owner.rwsem_down_write_failed_killable.call_rwsem_down_write_failed_killable.down_write_killable.__x64_sys_brk
0.89 ± 10% -0.2 0.67 ± 6% perf-profile.calltrace.cycles-pp.anon_vma_clone.__split_vma.__do_munmap.__x64_sys_brk.do_syscall_64
0.66 ± 6% +0.4 1.03 ± 8% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__x64_sys_brk
0.74 ± 6% +0.4 1.16 ± 8% perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__x64_sys_brk.do_syscall_64
1.42 ± 7% +0.4 1.85 ± 6% perf-profile.calltrace.cycles-pp.vma_merge.do_brk_flags.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.6 0.59 ± 11% perf-profile.calltrace.cycles-pp.__task_rq_lock.try_to_wake_up.wake_up_q.rwsem_wake.call_rwsem_wake
0.00 +0.6 0.63 ± 13% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry
0.00 +0.7 0.66 ± 18% perf-profile.calltrace.cycles-pp.tick_nohz_idle_enter.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
0.00 +0.7 0.67 ± 9% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__sched_text_start.schedule.rwsem_down_write_failed_killable.call_rwsem_down_write_failed_killable
0.00 +0.7 0.67 ± 11% perf-profile.calltrace.cycles-pp.unlink_anon_vmas.free_pgtables.unmap_region.__do_munmap.__x64_sys_brk
0.00 +0.7 0.67 ± 7% perf-profile.calltrace.cycles-pp.pick_next_task_idle.__sched_text_start.schedule.rwsem_down_write_failed_killable.call_rwsem_down_write_failed_killable
0.00 +0.7 0.68 ± 12% perf-profile.calltrace.cycles-pp.find_next_bit.__next_timer_interrupt.get_next_timer_interrupt.tick_nohz_next_event.tick_nohz_get_sleep_length
0.00 +0.7 0.68 ± 13% perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
0.67 ± 7% +0.7 1.35 ± 9% perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.14 ±173% +0.7 0.87 ± 5% perf-profile.calltrace.cycles-pp.get_unmapped_area.do_brk_flags.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.8 0.77 ± 8% perf-profile.calltrace.cycles-pp.security_vm_enough_memory_mm.do_brk_flags.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.8 0.78 ± 11% perf-profile.calltrace.cycles-pp.free_pgtables.unmap_region.__do_munmap.__x64_sys_brk.do_syscall_64
0.00 +0.9 0.87 ± 17% perf-profile.calltrace.cycles-pp.unwind_next_frame.__save_stack_trace.save_stack_trace_tsk.__account_scheduler_latency.enqueue_entity
0.00 +0.9 0.91 ± 8% perf-profile.calltrace.cycles-pp.update_curr.dequeue_entity.dequeue_task_fair.__sched_text_start.schedule
1.75 ± 10% +0.9 2.66 ± 5% perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +1.0 0.98 ± 7% perf-profile.calltrace.cycles-pp.__switch_to
0.00 +1.0 0.98 ± 9% perf-profile.calltrace.cycles-pp.__switch_to_asm
0.00 +1.1 1.10 ± 3% perf-profile.calltrace.cycles-pp.update_load_avg.dequeue_entity.dequeue_task_fair.__sched_text_start.schedule
0.00 +1.1 1.11 ± 20% perf-profile.calltrace.cycles-pp.select_task_rq_fair.try_to_wake_up.wake_up_q.rwsem_wake.call_rwsem_wake
0.00 +1.2 1.20 ± 6% perf-profile.calltrace.cycles-pp.__next_timer_interrupt.get_next_timer_interrupt.tick_nohz_next_event.tick_nohz_get_sleep_length.menu_select
0.00 +1.6 1.64 ± 7% perf-profile.calltrace.cycles-pp.set_next_entity.pick_next_task_fair.__sched_text_start.schedule_idle.do_idle
0.00 +1.9 1.86 ± 14% perf-profile.calltrace.cycles-pp.__save_stack_trace.save_stack_trace_tsk.__account_scheduler_latency.enqueue_entity.enqueue_task_fair
5.75 ± 4% +1.9 7.64 ± 5% perf-profile.calltrace.cycles-pp.__do_munmap.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +1.9 1.91 ± 14% perf-profile.calltrace.cycles-pp.save_stack_trace_tsk.__account_scheduler_latency.enqueue_entity.enqueue_task_fair.ttwu_do_activate
0.00 +2.1 2.09 ± 11% perf-profile.calltrace.cycles-pp.get_next_timer_interrupt.tick_nohz_next_event.tick_nohz_get_sleep_length.menu_select.do_idle
3.38 ± 5% +2.1 5.47 ± 4% perf-profile.calltrace.cycles-pp.do_brk_flags.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +2.2 2.25 ± 14% perf-profile.calltrace.cycles-pp.__account_scheduler_latency.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
0.00 +2.5 2.46 ± 7% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__sched_text_start.schedule_idle.do_idle.cpu_startup_entry
0.00 +2.5 2.46 ± 10% perf-profile.calltrace.cycles-pp.tick_nohz_next_event.tick_nohz_get_sleep_length.menu_select.do_idle.cpu_startup_entry
0.00 +3.1 3.06 ± 6% perf-profile.calltrace.cycles-pp.tick_nohz_get_sleep_length.menu_select.do_idle.cpu_startup_entry.start_secondary
0.00 +3.5 3.52 ± 4% perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__sched_text_start.schedule.rwsem_down_write_failed_killable
0.00 +3.8 3.80 ± 8% perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
0.00 +4.1 4.10 ± 3% perf-profile.calltrace.cycles-pp.dequeue_task_fair.__sched_text_start.schedule.rwsem_down_write_failed_killable.call_rwsem_down_write_failed_killable
0.00 +4.7 4.66 ± 11% perf-profile.calltrace.cycles-pp.__sched_text_start.schedule_idle.do_idle.cpu_startup_entry.start_secondary
0.00 +4.8 4.83 ± 11% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
0.00 +5.3 5.27 ± 6% perf-profile.calltrace.cycles-pp.menu_select.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
0.91 ± 16% +5.5 6.44 ± 4% perf-profile.calltrace.cycles-pp.try_to_wake_up.wake_up_q.rwsem_wake.call_rwsem_wake.up_write
0.95 ± 15% +5.7 6.70 ± 4% perf-profile.calltrace.cycles-pp.wake_up_q.rwsem_wake.call_rwsem_wake.up_write.__x64_sys_brk
1.73 ± 11% +5.8 7.53 ± 4% perf-profile.calltrace.cycles-pp.up_write.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +6.2 6.15 ± 2% perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.wake_up_q
1.15 ± 17% +6.2 7.33 ± 4% perf-profile.calltrace.cycles-pp.rwsem_wake.call_rwsem_wake.up_write.__x64_sys_brk.do_syscall_64
1.15 ± 17% +6.2 7.36 ± 4% perf-profile.calltrace.cycles-pp.call_rwsem_wake.up_write.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +6.9 6.93 ± 3% perf-profile.calltrace.cycles-pp.try_to_wake_up.wake_up_q.rwsem_wake.call_rwsem_wake.up_read
0.00 +6.9 6.95 ± 2% perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.wake_up_q.rwsem_wake
0.00 +7.1 7.15 ± 3% perf-profile.calltrace.cycles-pp.wake_up_q.rwsem_wake.call_rwsem_wake.up_read.__x64_sys_brk
0.00 +7.3 7.33 ± 2% perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.wake_up_q.rwsem_wake.call_rwsem_wake
0.00 +7.8 7.75 ± 3% perf-profile.calltrace.cycles-pp.rwsem_wake.call_rwsem_wake.up_read.__x64_sys_brk.do_syscall_64
0.00 +7.8 7.79 ± 3% perf-profile.calltrace.cycles-pp.call_rwsem_wake.up_read.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +7.9 7.90 ± 3% perf-profile.calltrace.cycles-pp.up_read.__x64_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +8.0 8.00 ± 4% perf-profile.calltrace.cycles-pp.__sched_text_start.schedule.rwsem_down_write_failed_killable.call_rwsem_down_write_failed_killable.down_write_killable
0.00 +8.2 8.22 ± 4% perf-profile.calltrace.cycles-pp.schedule.rwsem_down_write_failed_killable.call_rwsem_down_write_failed_killable.down_write_killable.__x64_sys_brk
0.83 ± 27% +25.9 26.75 ± 11% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
1.09 ± 32% +30.9 31.97 ± 10% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
1.62 ± 36% +44.4 46.01 ± 9% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
1.63 ± 36% +44.5 46.18 ± 9% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
1.63 ± 36% +44.6 46.21 ± 9% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
1.73 ± 29% +51.1 52.86 ± 2% perf-profile.calltrace.cycles-pp.secondary_startup_64
82.10 -72.2 9.91 ± 6% perf-profile.children.cycles-pp.down_write_killable
81.23 -72.0 9.20 ± 5% perf-profile.children.cycles-pp.call_rwsem_down_write_failed_killable
81.20 -72.0 9.18 ± 5% perf-profile.children.cycles-pp.rwsem_down_write_failed_killable
68.78 -68.8 0.01 ±173% perf-profile.children.cycles-pp.osq_lock
94.15 -54.6 39.58 ± 2% perf-profile.children.cycles-pp.__x64_sys_brk
94.59 -54.3 40.28 ± 3% perf-profile.children.cycles-pp.do_syscall_64
94.74 -54.3 40.48 ± 3% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
10.16 ± 3% -9.8 0.38 ± 61% perf-profile.children.cycles-pp.rwsem_spin_on_owner
1.25 ± 11% -0.5 0.78 ± 5% perf-profile.children.cycles-pp.find_vma
0.89 ± 10% -0.2 0.67 ± 7% perf-profile.children.cycles-pp.anon_vma_clone
0.27 ± 31% -0.1 0.12 ± 5% perf-profile.children.cycles-pp.tlb_gather_mmu
0.07 ± 7% +0.0 0.10 ± 12% perf-profile.children.cycles-pp.cap_mmap_addr
0.03 ±100% +0.0 0.07 ± 11% perf-profile.children.cycles-pp.tlb_flush_mmu_free
0.00 +0.1 0.05 perf-profile.children.cycles-pp.irq_work_run_list
0.08 ± 25% +0.1 0.14 ± 13% perf-profile.children.cycles-pp.arch_tlb_finish_mmu
0.01 ±173% +0.1 0.07 ± 12% perf-profile.children.cycles-pp.cap_capable
0.07 ± 12% +0.1 0.13 ± 23% perf-profile.children.cycles-pp.selinux_mmap_addr
0.04 ± 60% +0.1 0.10 ± 27% perf-profile.children.cycles-pp.free_pgd_range
0.00 +0.1 0.06 ± 13% perf-profile.children.cycles-pp.strlen
0.00 +0.1 0.07 ± 16% perf-profile.children.cycles-pp.sched_idle_set_state
0.00 +0.1 0.07 ± 33% perf-profile.children.cycles-pp.__list_add_valid
0.03 ±102% +0.1 0.10 ± 14% perf-profile.children.cycles-pp.__vm_enough_memory
0.00 +0.1 0.08 ± 24% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.00 +0.1 0.08 ± 30% perf-profile.children.cycles-pp.rcu_note_context_switch
0.00 +0.1 0.08 ± 14% perf-profile.children.cycles-pp.tsc_verify_tsc_adjust
0.00 +0.1 0.08 ± 15% perf-profile.children.cycles-pp.lapic_next_deadline
0.00 +0.1 0.08 ± 15% perf-profile.children.cycles-pp.__enqueue_entity
0.00 +0.1 0.08 ± 26% perf-profile.children.cycles-pp.perf_exclude_event
0.04 ± 58% +0.1 0.12 ± 31% perf-profile.children.cycles-pp.lru_add_drain_cpu
0.00 +0.1 0.08 ± 27% perf-profile.children.cycles-pp.in_sched_functions
0.09 ± 20% +0.1 0.18 ± 26% perf-profile.children.cycles-pp.tick_sched_handle
0.09 ± 21% +0.1 0.17 ± 26% perf-profile.children.cycles-pp.update_process_times
0.00 +0.1 0.09 ± 16% perf-profile.children.cycles-pp.tick_check_broadcast_expired
0.00 +0.1 0.09 ± 16% perf-profile.children.cycles-pp.native_irq_return_iret
0.23 ± 15% +0.1 0.32 ± 13% perf-profile.children.cycles-pp.remove_vma
0.00 +0.1 0.09 ± 23% perf-profile.children.cycles-pp.get_cpu_device
0.00 +0.1 0.09 ± 13% perf-profile.children.cycles-pp.vma_adjust_trans_huge
0.00 +0.1 0.09 ± 13% perf-profile.children.cycles-pp.rb_insert_color_cached
0.00 +0.1 0.09 ± 15% perf-profile.children.cycles-pp.arch_cpu_idle_enter
0.00 +0.1 0.09 ± 17% perf-profile.children.cycles-pp.hugepage_vma_check
0.00 +0.1 0.09 ± 20% perf-profile.children.cycles-pp.put_prev_task_idle
0.10 ± 15% +0.1 0.19 ± 13% perf-profile.children.cycles-pp.__vma_link_rb
0.00 +0.1 0.10 ± 15% perf-profile.children.cycles-pp.userfaultfd_unmap_prep
0.10 ± 19% +0.1 0.20 ± 21% perf-profile.children.cycles-pp.__rb_insert_augmented
0.00 +0.1 0.10 ± 12% perf-profile.children.cycles-pp.clear_buddies
0.16 ± 14% +0.1 0.27 ± 22% perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
0.03 ±100% +0.1 0.13 ± 5% perf-profile.children.cycles-pp.khugepaged_enter_vma_merge
0.13 ± 19% +0.1 0.23 ± 13% perf-profile.children.cycles-pp.tlb_finish_mmu
0.00 +0.1 0.10 ± 28% perf-profile.children.cycles-pp.pm_qos_request
0.00 +0.1 0.10 ± 34% perf-profile.children.cycles-pp.stack_access_ok
0.00 +0.1 0.11 ± 37% perf-profile.children.cycles-pp.perf_swevent_event
0.17 ± 27% +0.1 0.28 ± 7% perf-profile.children.cycles-pp.avc_has_perm_noaudit
0.00 +0.1 0.11 ± 11% perf-profile.children.cycles-pp.irq_work_needs_cpu
0.00 +0.1 0.11 ± 17% perf-profile.children.cycles-pp.reschedule_interrupt
0.00 +0.1 0.11 ± 19% perf-profile.children.cycles-pp.clockevents_program_event
0.00 +0.1 0.11 ± 42% perf-profile.children.cycles-pp.tick_nohz_tick_stopped
0.10 ± 15% +0.1 0.22 ± 24% perf-profile.children.cycles-pp.tick_sched_timer
0.00 +0.1 0.12 ± 33% perf-profile.children.cycles-pp.can_stop_idle_tick
0.53 ± 4% +0.1 0.66 ± 9% perf-profile.children.cycles-pp.vma_compute_subtree_gap
0.00 +0.1 0.12 ± 13% perf-profile.children.cycles-pp.native_load_tls
0.00 +0.1 0.13 ± 33% perf-profile.children.cycles-pp.load_new_mm_cr3
0.07 ± 23% +0.1 0.20 ± 10% perf-profile.children.cycles-pp.cap_vm_enough_memory
0.00 +0.1 0.14 ± 19% perf-profile.children.cycles-pp.__update_idle_core
0.00 +0.1 0.14 ± 15% perf-profile.children.cycles-pp.update_min_vruntime
0.00 +0.1 0.14 ± 26% perf-profile.children.cycles-pp.leave_mm
0.33 ± 6% +0.1 0.47 ± 14% perf-profile.children.cycles-pp.___might_sleep
0.00 +0.2 0.15 ± 24% perf-profile.children.cycles-pp.ttwu_stat
0.09 ± 20% +0.2 0.24 ± 13% perf-profile.children.cycles-pp.perf_event_mmap_output
0.14 ± 9% +0.2 0.30 ± 20% perf-profile.children.cycles-pp.strlcpy
0.01 ±173% +0.2 0.18 ± 31% perf-profile.children.cycles-pp.__softirqentry_text_start
0.00 +0.2 0.17 ± 25% perf-profile.children.cycles-pp.deactivate_task
0.00 +0.2 0.18 ± 12% perf-profile.children.cycles-pp.orc_find
0.00 +0.2 0.18 ± 14% perf-profile.children.cycles-pp.sched_ttwu_pending
0.24 ± 14% +0.2 0.42 ± 7% perf-profile.children.cycles-pp.security_mmap_addr
0.00 +0.2 0.18 ± 11% perf-profile.children.cycles-pp.rcu_eqs_exit
0.00 +0.2 0.19 ± 15% perf-profile.children.cycles-pp.rcu_eqs_enter
0.20 ± 10% +0.2 0.40 ± 14% perf-profile.children.cycles-pp.__might_sleep
0.07 ± 21% +0.2 0.27 ± 18% perf-profile.children.cycles-pp.__vma_rb_erase
0.00 +0.2 0.20 ± 22% perf-profile.children.cycles-pp.rb_next
0.00 +0.2 0.20 ± 20% perf-profile.children.cycles-pp.__list_del_entry_valid
0.00 +0.2 0.20 ± 22% perf-profile.children.cycles-pp.save_stack_address
0.22 ± 19% +0.2 0.43 ± 9% perf-profile.children.cycles-pp.cred_has_capability
0.00 +0.2 0.22 ± 25% perf-profile.children.cycles-pp.__orc_find
0.00 +0.2 0.23 ± 14% perf-profile.children.cycles-pp.kernel_text_address
0.03 ±100% +0.2 0.25 ± 24% perf-profile.children.cycles-pp.irq_exit
0.00 +0.2 0.23 ± 27% perf-profile.children.cycles-pp.__rwsem_mark_wake
0.00 +0.2 0.24 ± 22% perf-profile.children.cycles-pp.nr_iowait_cpu
0.00 +0.2 0.25 ± 12% perf-profile.children.cycles-pp.call_cpuidle
0.24 ± 13% +0.3 0.50 ± 9% perf-profile.children.cycles-pp.perf_iterate_sb
0.38 ± 13% +0.3 0.63 ± 14% perf-profile.children.cycles-pp.vm_area_dup
0.32 ± 7% +0.3 0.57 ± 9% perf-profile.children.cycles-pp.kmem_cache_alloc
0.00 +0.3 0.26 ± 8% perf-profile.children.cycles-pp.put_prev_entity
0.16 ± 12% +0.3 0.42 ± 11% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.00 +0.3 0.27 ± 24% perf-profile.children.cycles-pp.menu_reflect
0.00 +0.3 0.27 ± 14% perf-profile.children.cycles-pp.activate_task
0.00 +0.3 0.27 ± 23% perf-profile.children.cycles-pp.__hrtimer_next_event_base
0.00 +0.3 0.27 ± 14% perf-profile.children.cycles-pp.account_entity_dequeue
0.00 +0.3 0.28 ± 18% perf-profile.children.cycles-pp.__kernel_text_address
0.00 +0.3 0.28 ± 8% perf-profile.children.cycles-pp.get_iowait_load
0.27 ± 13% +0.3 0.56 ± 12% perf-profile.children.cycles-pp.selinux_vm_enough_memory
0.41 ± 2% +0.3 0.70 ± 9% perf-profile.children.cycles-pp.unlink_anon_vmas
0.00 +0.3 0.31 ± 13% perf-profile.children.cycles-pp.pick_next_entity
0.00 +0.3 0.31 ± 4% perf-profile.children.cycles-pp.rb_erase_cached
0.00 +0.3 0.32 ± 13% perf-profile.children.cycles-pp.cpuidle_governor_latency_req
0.00 +0.3 0.32 ± 13% perf-profile.children.cycles-pp.account_entity_enqueue
0.00 +0.3 0.33 ± 8% perf-profile.children.cycles-pp.available_idle_cpu
0.00 +0.3 0.34 ± 14% perf-profile.children.cycles-pp.update_ts_time_stats
0.00 +0.3 0.34 ± 12% perf-profile.children.cycles-pp.unwind_get_return_address
0.00 +0.3 0.35 ± 20% perf-profile.children.cycles-pp.rwsem_downgrade_wake
0.45 ± 3% +0.4 0.80 ± 11% perf-profile.children.cycles-pp.free_pgtables
0.00 +0.4 0.37 ± 8% perf-profile.children.cycles-pp.__calc_delta
0.00 +0.4 0.37 ± 24% perf-profile.children.cycles-pp.call_rwsem_downgrade_wake
0.00 +0.4 0.38 ± 8% perf-profile.children.cycles-pp.select_idle_sibling
0.68 ± 6% +0.4 1.06 ± 7% perf-profile.children.cycles-pp.unmap_page_range
0.00 +0.4 0.39 ± 7% perf-profile.children.cycles-pp.put_prev_task_fair
0.22 ± 8% +0.4 0.62 ± 8% perf-profile.children.cycles-pp.hrtimer_interrupt
0.00 +0.4 0.41 ± 2% perf-profile.children.cycles-pp.rcu_idle_exit
0.50 ± 11% +0.4 0.91 ± 6% perf-profile.children.cycles-pp.get_unmapped_area
0.00 +0.4 0.42 ± 3% perf-profile.children.cycles-pp.hrtimer_get_next_event
0.36 ± 11% +0.4 0.79 ± 9% perf-profile.children.cycles-pp.security_vm_enough_memory_mm
0.75 ± 6% +0.4 1.19 ± 8% perf-profile.children.cycles-pp.unmap_vmas
1.42 ± 7% +0.4 1.86 ± 5% perf-profile.children.cycles-pp.vma_merge
0.00 +0.5 0.46 ± 15% perf-profile.children.cycles-pp.__unwind_start
0.00 +0.5 0.46 ± 32% perf-profile.children.cycles-pp.resched_curr
0.12 ± 8% +0.5 0.62 ± 12% perf-profile.children.cycles-pp.__indirect_thunk_start
0.01 ±173% +0.5 0.53 ± 14% perf-profile.children.cycles-pp.read_tsc
0.00 +0.5 0.52 ± 23% perf-profile.children.cycles-pp.downgrade_write
0.06 ± 62% +0.5 0.58 ± 12% perf-profile.children.cycles-pp.wake_q_add
0.44 ± 26% +0.6 1.00 ± 25% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
0.00 +0.6 0.62 ± 13% perf-profile.children.cycles-pp.reweight_entity
0.04 ±106% +0.6 0.67 ± 6% perf-profile.children.cycles-pp.native_write_msr
0.00 +0.6 0.63 ± 13% perf-profile.children.cycles-pp.hrtimer_next_event_without
0.00 +0.6 0.64 ± 6% perf-profile.children.cycles-pp.___perf_sw_event
0.00 +0.7 0.68 ± 7% perf-profile.children.cycles-pp.pick_next_task_idle
0.28 ± 9% +0.7 0.96 ± 10% perf-profile.children.cycles-pp.smp_apic_timer_interrupt
0.04 ±100% +0.7 0.72 ± 10% perf-profile.children.cycles-pp.__task_rq_lock
0.02 ±173% +0.7 0.71 ± 9% perf-profile.children.cycles-pp.switch_mm_irqs_off
0.00 +0.7 0.70 ± 10% perf-profile.children.cycles-pp.update_rq_clock
0.00 +0.7 0.72 ± 6% perf-profile.children.cycles-pp.__update_load_avg_se
0.68 ± 6% +0.7 1.40 ± 9% perf-profile.children.cycles-pp.perf_event_mmap
0.00 +0.7 0.74 ± 12% perf-profile.children.cycles-pp.tick_nohz_idle_enter
0.29 ± 8% +0.7 1.03 ± 10% perf-profile.children.cycles-pp.apic_timer_interrupt
0.01 ±173% +0.7 0.76 ± 6% perf-profile.children.cycles-pp.finish_task_switch
0.01 ±173% +0.8 0.81 ± 4% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.03 ±100% +0.8 0.84 ± 11% perf-profile.children.cycles-pp.ktime_get
0.02 ±173% +0.8 0.83 ± 21% perf-profile.children.cycles-pp.check_preempt_curr
0.01 ±173% +0.8 0.83 ± 12% perf-profile.children.cycles-pp.tick_nohz_idle_exit
0.03 ±105% +0.9 0.88 ± 2% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.02 ±173% +0.9 0.92 ± 20% perf-profile.children.cycles-pp.ttwu_do_wakeup
0.46 ± 26% +0.9 1.36 ± 20% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
1.75 ± 10% +0.9 2.67 ± 5% perf-profile.children.cycles-pp.unmap_region
0.01 ±173% +0.9 0.95 ± 11% perf-profile.children.cycles-pp.find_next_bit
0.01 ±173% +1.0 0.99 ± 7% perf-profile.children.cycles-pp.update_cfs_group
0.03 ±105% +1.0 1.04 ± 8% perf-profile.children.cycles-pp.__switch_to_asm
0.04 ± 57% +1.0 1.07 ± 10% perf-profile.children.cycles-pp.update_curr
0.03 ±100% +1.2 1.24 ± 4% perf-profile.children.cycles-pp.select_task_rq_fair
0.01 ±173% +1.2 1.23 ± 6% perf-profile.children.cycles-pp.__switch_to
0.28 ± 12% +1.3 1.55 ± 10% perf-profile.children.cycles-pp._raw_spin_lock
0.01 ±173% +1.3 1.34 ± 13% perf-profile.children.cycles-pp.native_sched_clock
0.02 ±173% +1.4 1.37 ± 12% perf-profile.children.cycles-pp.sched_clock
0.00 +1.4 1.36 ± 12% perf-profile.children.cycles-pp.unwind_next_frame
0.04 ± 63% +1.4 1.49 ± 9% perf-profile.children.cycles-pp.sched_clock_cpu
0.03 ±100% +1.5 1.48 ± 10% perf-profile.children.cycles-pp.__next_timer_interrupt
0.08 ± 38% +1.7 1.77 ± 5% perf-profile.children.cycles-pp.set_next_entity
5.79 ± 4% +1.9 7.71 ± 5% perf-profile.children.cycles-pp.__do_munmap
0.00 +2.1 2.09 ± 9% perf-profile.children.cycles-pp.__save_stack_trace
3.38 ± 5% +2.1 5.47 ± 4% perf-profile.children.cycles-pp.do_brk_flags
0.00 +2.1 2.13 ± 9% perf-profile.children.cycles-pp.save_stack_trace_tsk
0.05 ± 61% +2.2 2.22 ± 7% perf-profile.children.cycles-pp.get_next_timer_interrupt
0.01 ±173% +2.5 2.53 ± 8% perf-profile.children.cycles-pp.__account_scheduler_latency
0.11 ± 47% +2.5 2.65 ± 2% perf-profile.children.cycles-pp.update_load_avg
0.06 ± 60% +2.6 2.66 ± 7% perf-profile.children.cycles-pp.tick_nohz_next_event
0.15 ± 42% +3.2 3.33 ± 6% perf-profile.children.cycles-pp.pick_next_task_fair
0.09 ± 31% +3.4 3.51 ± 7% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.14 ± 35% +3.5 3.62 ± 3% perf-profile.children.cycles-pp.dequeue_entity
0.16 ± 36% +4.0 4.12 ± 2% perf-profile.children.cycles-pp.dequeue_task_fair
0.24 ± 44% +4.1 4.30 ± 5% perf-profile.children.cycles-pp.poll_idle
0.24 ± 50% +5.3 5.54 ± 4% perf-profile.children.cycles-pp.schedule_idle
2.34 ± 9% +5.6 7.90 ± 4% perf-profile.children.cycles-pp.up_write
0.19 ± 35% +5.8 5.99 ± 6% perf-profile.children.cycles-pp.menu_select
0.20 ± 33% +6.1 6.30 ± 2% perf-profile.children.cycles-pp.enqueue_entity
0.22 ± 32% +6.7 6.96 ± 2% perf-profile.children.cycles-pp.enqueue_task_fair
0.24 ± 29% +7.1 7.35 ± 2% perf-profile.children.cycles-pp.ttwu_do_activate
0.00 +7.9 7.91 ± 3% perf-profile.children.cycles-pp.up_read
0.32 ± 37% +7.9 8.23 ± 4% perf-profile.children.cycles-pp.schedule
0.91 ± 15% +12.5 13.40 ± 3% perf-profile.children.cycles-pp.try_to_wake_up
0.95 ± 15% +12.9 13.85 ± 3% perf-profile.children.cycles-pp.wake_up_q
0.55 ± 43% +13.0 13.52 ± 4% perf-profile.children.cycles-pp.__sched_text_start
1.15 ± 17% +14.0 15.11 ± 3% perf-profile.children.cycles-pp.rwsem_wake
1.15 ± 17% +14.0 15.15 ± 3% perf-profile.children.cycles-pp.call_rwsem_wake
0.88 ± 20% +29.8 30.72 ± 5% perf-profile.children.cycles-pp.intel_idle
1.16 ± 24% +35.6 36.74 ± 4% perf-profile.children.cycles-pp.cpuidle_enter_state
1.63 ± 36% +44.6 46.21 ± 9% perf-profile.children.cycles-pp.start_secondary
1.74 ± 29% +51.1 52.81 ± 2% perf-profile.children.cycles-pp.do_idle
1.73 ± 29% +51.1 52.86 ± 2% perf-profile.children.cycles-pp.secondary_startup_64
1.73 ± 29% +51.1 52.86 ± 2% perf-profile.children.cycles-pp.cpu_startup_entry
68.56 -68.5 0.01 ±173% perf-profile.self.cycles-pp.osq_lock
10.12 ± 4% -9.7 0.38 ± 61% perf-profile.self.cycles-pp.rwsem_spin_on_owner
1.41 ± 7% -0.9 0.51 ± 17% perf-profile.self.cycles-pp.rwsem_down_write_failed_killable
1.16 ± 4% -0.6 0.52 ± 12% perf-profile.self.cycles-pp.up_write
1.02 ± 14% -0.5 0.48 ± 9% perf-profile.self.cycles-pp.find_vma
1.16 ± 5% -0.2 0.94 ± 2% perf-profile.self.cycles-pp.__vma_adjust
0.38 ± 10% -0.2 0.20 ± 17% perf-profile.self.cycles-pp.anon_vma_clone
0.26 ± 31% -0.1 0.12 ± 5% perf-profile.self.cycles-pp.tlb_gather_mmu
0.06 ± 15% +0.0 0.09 ± 15% perf-profile.self.cycles-pp.cap_mmap_addr
0.06 +0.0 0.10 ± 8% perf-profile.self.cycles-pp.__split_vma
0.01 ±173% +0.0 0.06 ± 14% perf-profile.self.cycles-pp.remove_vma
0.04 ± 58% +0.1 0.09 ± 19% perf-profile.self.cycles-pp.tlb_finish_mmu
0.00 +0.1 0.05 ± 9% perf-profile.self.cycles-pp.strlen
0.00 +0.1 0.06 ± 11% perf-profile.self.cycles-pp.unwind_get_return_address
0.04 ± 60% +0.1 0.10 ± 27% perf-profile.self.cycles-pp.free_pgd_range
0.00 +0.1 0.06 ± 13% perf-profile.self.cycles-pp.cap_capable
0.00 +0.1 0.06 ± 34% perf-profile.self.cycles-pp.__list_add_valid
0.04 ± 58% +0.1 0.11 ± 28% perf-profile.self.cycles-pp.lru_add_drain_cpu
0.00 +0.1 0.07 ± 14% perf-profile.self.cycles-pp.tsc_verify_tsc_adjust
0.00 +0.1 0.07 ± 22% perf-profile.self.cycles-pp.kernel_text_address
0.00 +0.1 0.07 ± 34% perf-profile.self.cycles-pp.put_prev_task_idle
0.00 +0.1 0.07 ± 17% perf-profile.self.cycles-pp.hugepage_vma_check
0.00 +0.1 0.08 ± 30% perf-profile.self.cycles-pp.rcu_note_context_switch
0.00 +0.1 0.08 ± 24% perf-profile.self.cycles-pp.__save_stack_trace
0.00 +0.1 0.08 ± 16% perf-profile.self.cycles-pp.clear_buddies
0.00 +0.1 0.08 ± 14% perf-profile.self.cycles-pp.__enqueue_entity
0.10 ± 31% +0.1 0.17 ± 13% perf-profile.self.cycles-pp.security_mmap_addr
0.04 ± 58% +0.1 0.12 ± 32% perf-profile.self.cycles-pp.selinux_vm_enough_memory
0.00 +0.1 0.08 ± 34% perf-profile.self.cycles-pp.free_pgtables
0.09 ± 13% +0.1 0.17 ± 18% perf-profile.self.cycles-pp.unlink_anon_vmas
0.00 +0.1 0.08 ± 10% perf-profile.self.cycles-pp.rb_insert_color_cached
0.00 +0.1 0.08 ± 31% perf-profile.self.cycles-pp.get_cpu_device
0.04 ± 58% +0.1 0.12 ± 22% perf-profile.self.cycles-pp.selinux_mmap_addr
0.00 +0.1 0.08 ± 10% perf-profile.self.cycles-pp.vma_adjust_trans_huge
0.00 +0.1 0.08 ± 15% perf-profile.self.cycles-pp.tick_check_broadcast_expired
0.05 ± 8% +0.1 0.14 ± 33% perf-profile.self.cycles-pp.userfaultfd_unmap_complete
0.00 +0.1 0.09 ± 17% perf-profile.self.cycles-pp.__vma_link_rb
0.00 +0.1 0.09 ± 16% perf-profile.self.cycles-pp.native_irq_return_iret
0.00 +0.1 0.09 ± 12% perf-profile.self.cycles-pp.ttwu_do_wakeup
0.00 +0.1 0.09 ± 13% perf-profile.self.cycles-pp.irq_work_needs_cpu
0.00 +0.1 0.09 ± 8% perf-profile.self.cycles-pp.update_ts_time_stats
0.00 +0.1 0.09 ± 32% perf-profile.self.cycles-pp.tick_nohz_tick_stopped
0.00 +0.1 0.09 ± 36% perf-profile.self.cycles-pp.stack_access_ok
0.00 +0.1 0.10 ± 18% perf-profile.self.cycles-pp.userfaultfd_unmap_prep
0.00 +0.1 0.10 ± 11% perf-profile.self.cycles-pp.__unwind_start
0.00 +0.1 0.10 ± 29% perf-profile.self.cycles-pp.pm_qos_request
0.50 ± 4% +0.1 0.60 ± 12% perf-profile.self.cycles-pp.vma_compute_subtree_gap
0.15 ± 12% +0.1 0.25 ± 9% perf-profile.self.cycles-pp.perf_iterate_sb
0.03 ±100% +0.1 0.12 ± 22% perf-profile.self.cycles-pp.rwsem_wake
0.00 +0.1 0.10 ± 23% perf-profile.self.cycles-pp.ttwu_do_activate
0.09 ± 17% +0.1 0.20 ± 22% perf-profile.self.cycles-pp.__rb_insert_augmented
0.00 +0.1 0.10 ± 17% perf-profile.self.cycles-pp.save_stack_address
0.00 +0.1 0.10 ± 8% perf-profile.self.cycles-pp.hrtimer_next_event_without
0.00 +0.1 0.10 ± 8% perf-profile.self.cycles-pp.tick_nohz_idle_exit
0.17 ± 28% +0.1 0.27 ± 9% perf-profile.self.cycles-pp.avc_has_perm_noaudit
0.01 ±173% +0.1 0.12 ± 15% perf-profile.self.cycles-pp.cap_vm_enough_memory
0.00 +0.1 0.11 ± 31% perf-profile.self.cycles-pp.sched_clock_cpu
0.09 ± 19% +0.1 0.20 ± 21% perf-profile.self.cycles-pp.arch_get_unmapped_area_topdown
0.00 +0.1 0.12 ± 33% perf-profile.self.cycles-pp.tick_nohz_get_sleep_length
0.00 +0.1 0.12 ± 10% perf-profile.self.cycles-pp.native_load_tls
0.00 +0.1 0.12 ± 33% perf-profile.self.cycles-pp.can_stop_idle_tick
0.00 +0.1 0.12 ± 15% perf-profile.self.cycles-pp.up_read
0.00 +0.1 0.13 ± 18% perf-profile.self.cycles-pp.__update_idle_core
0.00 +0.1 0.13 ± 33% perf-profile.self.cycles-pp.load_new_mm_cr3
0.00 +0.1 0.13 ± 12% perf-profile.self.cycles-pp.update_min_vruntime
0.32 ± 7% +0.1 0.45 ± 13% perf-profile.self.cycles-pp.___might_sleep
0.03 ±100% +0.1 0.16 ± 13% perf-profile.self.cycles-pp.cred_has_capability
0.11 ± 21% +0.1 0.24 ± 14% perf-profile.self.cycles-pp.get_unmapped_area
0.00 +0.1 0.14 ± 19% perf-profile.self.cycles-pp.put_prev_task_fair
0.00 +0.1 0.14 ± 30% perf-profile.self.cycles-pp.deactivate_task
0.00 +0.1 0.14 ± 46% perf-profile.self.cycles-pp.downgrade_write
0.09 ± 20% +0.1 0.23 ± 16% perf-profile.self.cycles-pp.perf_event_mmap_output
0.00 +0.1 0.15 ± 19% perf-profile.self.cycles-pp.pick_next_task_idle
0.00 +0.1 0.15 ± 13% perf-profile.self.cycles-pp.schedule_idle
0.00 +0.2 0.15 ± 24% perf-profile.self.cycles-pp.ttwu_stat
0.00 +0.2 0.15 ± 24% perf-profile.self.cycles-pp.cpu_startup_entry
0.01 ±173% +0.2 0.17 ± 28% perf-profile.self.cycles-pp.strlcpy
0.00 +0.2 0.16 ± 18% perf-profile.self.cycles-pp.schedule
0.18 ± 8% +0.2 0.34 ± 13% perf-profile.self.cycles-pp.__might_sleep
0.00 +0.2 0.16 ± 26% perf-profile.self.cycles-pp.rb_next
0.24 ± 6% +0.2 0.41 ± 11% perf-profile.self.cycles-pp.kmem_cache_alloc
0.18 ± 27% +0.2 0.35 ± 8% perf-profile.self.cycles-pp.vma_merge
0.00 +0.2 0.18 ± 12% perf-profile.self.cycles-pp.sched_ttwu_pending
0.00 +0.2 0.18 ± 22% perf-profile.self.cycles-pp.rcu_eqs_enter
0.00 +0.2 0.18 ± 12% perf-profile.self.cycles-pp.orc_find
0.00 +0.2 0.18 ± 18% perf-profile.self.cycles-pp.__hrtimer_next_event_base
0.00 +0.2 0.18 ± 12% perf-profile.self.cycles-pp.rcu_eqs_exit
0.00 +0.2 0.19 ± 18% perf-profile.self.cycles-pp.__list_del_entry_valid
0.01 ±173% +0.2 0.21 ± 20% perf-profile.self.cycles-pp.__vma_rb_erase
0.00 +0.2 0.21 ± 27% perf-profile.self.cycles-pp.get_next_timer_interrupt
0.00 +0.2 0.22 ± 13% perf-profile.self.cycles-pp.rcu_idle_exit
0.00 +0.2 0.22 ± 25% perf-profile.self.cycles-pp.__orc_find
0.00 +0.2 0.22 ± 21% perf-profile.self.cycles-pp.update_rq_clock
0.00 +0.2 0.22 ± 28% perf-profile.self.cycles-pp.menu_reflect
0.00 +0.2 0.23 ± 27% perf-profile.self.cycles-pp.__rwsem_mark_wake
0.00 +0.2 0.23 ± 21% perf-profile.self.cycles-pp.nr_iowait_cpu
0.00 +0.2 0.24 ± 13% perf-profile.self.cycles-pp.account_entity_dequeue
0.00 +0.2 0.24 ± 3% perf-profile.self.cycles-pp.tick_nohz_next_event
0.00 +0.2 0.24 ± 10% perf-profile.self.cycles-pp.call_cpuidle
0.25 ± 5% +0.2 0.49 ± 9% perf-profile.self.cycles-pp.perf_event_mmap
0.00 +0.2 0.25 ± 5% perf-profile.self.cycles-pp.rb_erase_cached
0.00 +0.2 0.25 ± 16% perf-profile.self.cycles-pp.pick_next_entity
0.38 ± 9% +0.2 0.63 ± 7% perf-profile.self.cycles-pp.__x64_sys_brk
0.00 +0.2 0.25 ± 7% perf-profile.self.cycles-pp.put_prev_entity
0.00 +0.3 0.27 ± 14% perf-profile.self.cycles-pp.activate_task
0.00 +0.3 0.28 ± 8% perf-profile.self.cycles-pp.get_iowait_load
0.24 ± 7% +0.3 0.52 ± 13% perf-profile.self.cycles-pp.__do_munmap
0.00 +0.3 0.28 ± 5% perf-profile.self.cycles-pp.cpuidle_enter_state
0.00 +0.3 0.29 ± 14% perf-profile.self.cycles-pp.account_entity_enqueue
0.00 +0.3 0.29 ± 7% perf-profile.self.cycles-pp.ktime_get
0.00 +0.3 0.33 ± 12% perf-profile.self.cycles-pp.check_preempt_curr
0.00 +0.3 0.33 ± 7% perf-profile.self.cycles-pp.available_idle_cpu
0.00 +0.3 0.33 ± 8% perf-profile.self.cycles-pp.__account_scheduler_latency
0.00 +0.3 0.35 ± 4% perf-profile.self.cycles-pp.__calc_delta
0.37 ± 5% +0.4 0.73 ± 6% perf-profile.self.cycles-pp.unmap_page_range
0.01 ±173% +0.4 0.45 ± 3% perf-profile.self.cycles-pp.wake_up_q
0.00 +0.4 0.44 ± 15% perf-profile.self.cycles-pp.dequeue_task_fair
0.12 ± 13% +0.4 0.56 ± 12% perf-profile.self.cycles-pp.__indirect_thunk_start
0.00 +0.5 0.46 ± 32% perf-profile.self.cycles-pp.resched_curr
0.00 +0.5 0.48 ± 10% perf-profile.self.cycles-pp.___perf_sw_event
0.01 ±173% +0.5 0.51 ± 14% perf-profile.self.cycles-pp.read_tsc
0.06 ± 62% +0.5 0.58 ± 12% perf-profile.self.cycles-pp.wake_q_add
0.44 ± 26% +0.6 0.99 ± 24% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.02 ±173% +0.6 0.58 ± 13% perf-profile.self.cycles-pp.switch_mm_irqs_off
0.01 ±173% +0.6 0.59 ± 17% perf-profile.self.cycles-pp.update_curr
0.00 +0.6 0.59 ± 17% perf-profile.self.cycles-pp.dequeue_entity
0.04 ±106% +0.6 0.66 ± 6% perf-profile.self.cycles-pp.native_write_msr
0.00 +0.6 0.62 ± 13% perf-profile.self.cycles-pp.reweight_entity
0.00 +0.6 0.65 ± 13% perf-profile.self.cycles-pp.__next_timer_interrupt
0.23 ± 11% +0.7 0.89 ± 13% perf-profile.self.cycles-pp._raw_spin_lock
0.00 +0.7 0.66 ± 10% perf-profile.self.cycles-pp.enqueue_task_fair
0.01 ±173% +0.7 0.69 ± 4% perf-profile.self.cycles-pp.finish_task_switch
0.00 +0.7 0.69 ± 8% perf-profile.self.cycles-pp.__update_load_avg_se
0.02 ±173% +0.8 0.80 ± 2% perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
0.01 ±173% +0.8 0.80 ± 5% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.00 +0.8 0.78 ± 5% perf-profile.self.cycles-pp.select_task_rq_fair
0.00 +0.8 0.80 ± 11% perf-profile.self.cycles-pp.unwind_next_frame
0.00 +0.8 0.83 ± 9% perf-profile.self.cycles-pp.find_next_bit
0.07 ± 17% +0.9 0.97 ± 9% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.01 ±173% +0.9 0.96 ± 7% perf-profile.self.cycles-pp.update_cfs_group
0.03 ±105% +1.0 0.98 ± 8% perf-profile.self.cycles-pp.set_next_entity
0.02 ±173% +1.0 1.01 ± 7% perf-profile.self.cycles-pp.pick_next_task_fair
0.03 ±105% +1.0 1.04 ± 8% perf-profile.self.cycles-pp.__switch_to_asm
0.04 ±103% +1.1 1.10 ± 9% perf-profile.self.cycles-pp.update_load_avg
0.01 ±173% +1.2 1.18 ± 7% perf-profile.self.cycles-pp.do_idle
0.01 ±173% +1.2 1.18 ± 5% perf-profile.self.cycles-pp.__switch_to
0.04 ±107% +1.2 1.28 ± 7% perf-profile.self.cycles-pp.enqueue_entity
0.01 ±173% +1.3 1.31 ± 14% perf-profile.self.cycles-pp.native_sched_clock
0.07 ± 28% +1.5 1.60 ± 6% perf-profile.self.cycles-pp.try_to_wake_up
0.08 ± 33% +1.7 1.81 ± 9% perf-profile.self.cycles-pp.menu_select
0.10 ± 43% +2.2 2.32 ± 9% perf-profile.self.cycles-pp.__sched_text_start
0.22 ± 44% +3.8 4.05 ± 6% perf-profile.self.cycles-pp.poll_idle
0.88 ± 20% +29.7 30.58 ± 5% perf-profile.self.cycles-pp.intel_idle



will-it-scale.per_thread_ops

220000 +-+----------------------------------------------------------------+
|. + + + +. + + +. :+ + .+. + + |
200000 +-+ +. + + .+ : + +. + + +. .+. + + .+ + + + |
180000 +-+ + + + .+. : + + +. .+ + + + +|
| + + + + |
160000 +-+ |
| |
140000 +-+ |
| |
120000 +-+ |
100000 +-O |
| |
80000 O-+ O O O O |
| O O O O O O O O O O O O O O O O O O O |
60000 +-+----------------------------------------------------------------+


will-it-scale.workload

1.8e+06 +-+---------------------------------------------------------------+
| .+ + .+ .+ + .+. .+. |
1.6e+06 +-+ + + + .+ +.+ + .+.+ + .+. + :.+ + + + |
| +.+ + + +. + +.+ +.+ +. .+ + + + +|
1.4e+06 +-+ + + + + |
| |
1.2e+06 +-+ |
| |
1e+06 +-+ |
| |
800000 +-O |
O O O |
600000 +-+ O O O O OO O O O O O O O O O O O O O |
| O O |
400000 +-+---------------------------------------------------------------+


will-it-scale.time.user_time

60 +-+--------------------------------------------------------------------+
| |
55 +-+.+ .+ .+ + .+.. .+. |
50 +-+ + .+.. .+ +.+ + .+..+ + .+. : + .+ + + + |
| +.+ + + .+. + +.+ +.+ +. : + + + +|
45 +-+ + + +.+ + |
| |
40 +-+ |
| |
35 +-+ |
30 +-O |
O O O O O O O O |
25 +-+ O O O O O O O O O O O O |
| O O O O |
20 +-+--------------------------------------------------------------------+


will-it-scale.time.system_time

2400 +-+------------------------------------------------------------------+
2200 +-+.+.+.+.+. .+.+.+.+.+.+..+.+.+.+.+.+. .+. |
| +.+.+.+.+ +.+ +.+.+.+.+.+.+.+.|
2000 +-+ |
1800 +-+ |
| |
1600 +-+ |
1400 +-+ |
1200 +-+ |
| |
1000 +-+ |
800 +-+ |
| |
600 O-O O O O O O O O O O O O O O O O O O O O O O |
400 +-+---------------------------O----O---------------------------------+


will-it-scale.time.percent_of_cpu_this_job_got

800 +-+-------------------------------------------------------------------+
|.+.+.+.+.+.+.+. .+.+.+.+.+.+.+.+.+.+.+.+.+.+.+.+.+..+.+.+.+. .+.+.+.|
700 +-+ +. + |
| |
600 +-+ |
| |
500 +-+ |
| |
400 +-+ |
| |
300 +-+ |
| |
200 O-O O O O O O |
| O O O O O O O O O O O O O O O O O O |
100 +-+-------------------------------------------------------------------+


will-it-scale.time.voluntary_context_switches

1.6e+08 +-+---------------------------------------------------------------+
| O O O O O |
1.4e+08 +-O O O O O O O |
1.2e+08 O-+ O O OO O O O O |
| O O O O |
1e+08 +-+ |
| |
8e+07 +-+ |
| |
6e+07 +-+ |
4e+07 +-+ + .|
| .+. + :.+. .+.+.+.++.+.+.+.+.+.+.+ |
2e+07 +-+. .+.+.+ + + +.+. .+.+.+.+. .+.+.+ |
| + + + |
0 +-+---------------------------------------------------------------+


will-it-scale.time.involuntary_context_switches

200000 +-+----------------------------------------------------------------+
180000 +-+ + .+ : + + .+ + + .+ |
| + : : + + : |
160000 +-+ : : : + + |
140000 +-+ + + +.: +.+. + + .+ + + |
| + + : : + +.+ +.+.+ : + + |
120000 +-+ + : : : + :|
100000 +-+ + + :|
80000 +-+ |
| |
60000 +-+ |
40000 +-+ |
| |
20000 +-+ |
0 O-O-O-O-O-O-O-O-O-O-O-O-O-O-O-O-O-OO-O-O-O-O-O-O-------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen


Attachments:
(No filename) (67.18 kB)
config-4.19.0-06995-g9bc8039 (170.99 kB)
job-script (6.93 kB)
job.yaml (4.64 kB)
reproduce (316.00 B)

2018-11-05 17:57:38

by Linus Torvalds

[permalink] [raw]
Subject: Re: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1% regression

On Sun, Nov 4, 2018 at 9:08 PM kernel test robot <[email protected]> wrote:
>
> FYI, we noticed a -64.1% regression of will-it-scale.per_thread_ops
> due to commit 9bc8039e715d ("mm: brk: downgrade mmap_sem to read when
> shrinking")

Ugh. That looks pretty bad.

> in testcase: will-it-scale
> on test machine: 8 threads Ivy Bridge with 16G memory
> with following parameters:
>
> nr_task: 100%
> mode: thread
> test: brk1
> ucode: 0x20
> cpufreq_governor: performance

The reason seems to be way more scheduler time due to lots more
context switches:

> 34925294 ± 18% +270.3% 1.293e+08 ± 4% will-it-scale.time.voluntary_context_switches

Yang Shi, would you mind taking a look at what's going on?

Linus

2018-11-05 18:28:51

by Yang Shi

[permalink] [raw]
Subject: Re: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1% regression



On 11/5/18 9:50 AM, Linus Torvalds wrote:
> On Sun, Nov 4, 2018 at 9:08 PM kernel test robot <[email protected]> wrote:
>> FYI, we noticed a -64.1% regression of will-it-scale.per_thread_ops
>> due to commit 9bc8039e715d ("mm: brk: downgrade mmap_sem to read when
>> shrinking")
> Ugh. That looks pretty bad.
>
>> in testcase: will-it-scale
>> on test machine: 8 threads Ivy Bridge with 16G memory
>> with following parameters:
>>
>> nr_task: 100%
>> mode: thread
>> test: brk1
>> ucode: 0x20
>> cpufreq_governor: performance
> The reason seems to be way more scheduler time due to lots more
> context switches:
>
>> 34925294 ± 18% +270.3% 1.293e+08 ± 4% will-it-scale.time.voluntary_context_switches
> Yang Shi, would you mind taking a look at what's going on?

No problem.

Actually, the commit mainly optimizes the long stall time caused by
holding mmap_sem for write when unmapping or shrinking a large mapping.
It downgrades mmap_sem from write to read while zapping pages, so it
looks like the downgrade incurs more context switches. This is kind of
expected.

However, the test looks like it just shrinks the mapping by one normal
4K page. At first glance, the overhead of the extra context switches
outpaces the gain in this case.

Since the optimization makes more sense for large mappings, how about
restoring the mapping-size limit, e.g. only downgrading mmap_sem for
mappings >= 1g (PUD_SIZE)?

Thanks,
Yang

>
> Linus


2018-11-05 18:37:17

by Linus Torvalds

[permalink] [raw]
Subject: Re: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1% regression

On Mon, Nov 5, 2018 at 10:28 AM Yang Shi <[email protected]> wrote:
>
> Actually, the commit mainly optimizes the long stall time caused by
> holding mmap_sem for write when unmapping or shrinking a large mapping.
> It downgrades mmap_sem from write to read while zapping pages, so it
> looks like the downgrade incurs more context switches. This is kind of
> expected.
>
> However, the test looks like it just shrinks the mapping by one normal
> 4K page. At first glance, the overhead of the extra context switches
> outpaces the gain in this case.

I'm not seeing why there should be a context switch in the first place.

Even if you have lots of concurrent brk() users, they should all block
exactly the same way as before (a write lock blocks against a write
lock, but it *also* blocks against a downgraded read lock).

So no, I don't want just some limit to hide this problem for that
particular test. There's something else going on.

Linus

2018-11-05 20:12:55

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1% regression

On 11/5/18 6:50 PM, Linus Torvalds wrote:
> On Sun, Nov 4, 2018 at 9:08 PM kernel test robot <[email protected]> wrote:
>>
>> FYI, we noticed a -64.1% regression of will-it-scale.per_thread_ops
>> due to commit 9bc8039e715d ("mm: brk: downgrade mmap_sem to read when
>> shrinking")
>
> Ugh. That looks pretty bad.
>
>> in testcase: will-it-scale
>> on test machine: 8 threads Ivy Bridge with 16G memory
>> with following parameters:
>>
>> nr_task: 100%
>> mode: thread
>> test: brk1
>> ucode: 0x20
>> cpufreq_governor: performance
>
> The reason seems to be way more scheduler time due to lots more
> context switches:
>
>> 34925294 ± 18% +270.3% 1.293e+08 ± 4% will-it-scale.time.voluntary_context_switches

And what about this:

0.83 ± 27% +25.9 26.75 ± 11% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
1.09 ± 32% +30.9 31.97 ± 10% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
1.62 ± 36% +44.4 46.01 ± 9% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
1.63 ± 36% +44.5 46.18 ± 9% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
1.63 ± 36% +44.6 46.21 ± 9% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
1.73 ± 29% +51.1 52.86 ± 2% perf-profile.calltrace.cycles-pp.secondary_startup_64

And the graphs show less user/kernel time and a lower "percent_of_cpu_this_job_got"...

I didn't spot an obvious mistake in the patch itself, so it looks
like some bad interaction between scheduler and the mmap downgrade?

> Yang Shi, would you mind taking a look at what's going on?
>
> Linus
>


2018-11-05 20:19:12

by Yang Shi

[permalink] [raw]
Subject: Re: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1% regression



On 11/5/18 10:35 AM, Linus Torvalds wrote:
> On Mon, Nov 5, 2018 at 10:28 AM Yang Shi <[email protected]> wrote:
>> Actually, the commit mainly optimizes the long stall time caused by
>> holding mmap_sem for write when unmapping or shrinking a large mapping.
>> It downgrades mmap_sem from write to read while zapping pages, so it
>> looks like the downgrade incurs more context switches. This is kind of
>> expected.
>>
>> However, the test looks like it just shrinks the mapping by one normal
>> 4K page. At first glance, the overhead of the extra context switches
>> outpaces the gain in this case.
> I'm not seeing why there should be a context switch in the first place.
>
> Even if you have lots of concurrent brk() users, they should all block
> exactly the same way as before (a write lock blocks against a write
> lock, but it *also* blocks against a downgraded read lock).

Yes, that is true. The brk() users will not get woken up. What I can
think of for now is that there might be other helper processes and/or
kernel threads waiting on mmap_sem for read; they might get woken up by
the downgrade.

But I also saw a huge increase in CPU idle time and sched_goidle
events. No clue yet why idle time goes up.

20610709 ± 15% +2376.0% 5.103e+08 ± 34% cpuidle.C1.time
28753819 ± 39% +1054.5% 3.319e+08 ± 49% cpuidle.C3.time

175049 ± 72% +840.7% 1646720 ± 72% sched_debug.cpu.sched_goidle.stddev


Thanks,
Yang

>
> So no, I don't want just some limit to hide this problem for that
> particular test. There's something else going on.
>
> Linus


2018-11-05 22:16:57

by Linus Torvalds

[permalink] [raw]
Subject: Re: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1% regression

On Mon, Nov 5, 2018 at 12:12 PM Vlastimil Babka <[email protected]> wrote:
>
> I didn't spot an obvious mistake in the patch itself, so it looks
> like some bad interaction between scheduler and the mmap downgrade?

I'm thinking it's RWSEM_SPIN_ON_OWNER that ends up being confused by
the downgrade.

It looks like the benchmark used to be basically CPU-bound, at about
800% CPU, and now it's somewhere in the 200% CPU region:

will-it-scale.time.percent_of_cpu_this_job_got

800 +-+-------------------------------------------------------------------+
|.+.+.+.+.+.+.+. .+.+.+.+.+.+.+.+.+.+.+.+.+.+.+.+.+..+.+.+.+. .+.+.+.|
700 +-+ +. + |
| |
600 +-+ |
| |
500 +-+ |
| |
400 +-+ |
| |
300 +-+ |
| |
200 O-O O O O O O |
| O O O O O O O O O O O O O O O O O O |
100 +-+-------------------------------------------------------------------+

which sounds like the downgrade really messes with the "spin waiting
for lock" logic.

I'm thinking it's the "wake up waiter" logic that has some bad
interaction with spinning, and breaks that whole optimization.

Adding Waiman and Davidlohr to the participants, because they seem to
be the obvious experts in this area.

Linus

2018-11-05 22:43:24

by Waiman Long

[permalink] [raw]
Subject: Re: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1% regression

On 11/05/2018 05:14 PM, Linus Torvalds wrote:
> On Mon, Nov 5, 2018 at 12:12 PM Vlastimil Babka <[email protected]> wrote:
>> I didn't spot an obvious mistake in the patch itself, so it looks
>> like some bad interaction between scheduler and the mmap downgrade?
> I'm thinking it's RWSEM_SPIN_ON_OWNER that ends up being confused by
> the downgrade.
>
> It looks like the benchmark used to be basically CPU-bound, at about
> 800% CPU, and now it's somewhere in the 200% CPU region:
>
> will-it-scale.time.percent_of_cpu_this_job_got
>
> 800 +-+-------------------------------------------------------------------+
> |.+.+.+.+.+.+.+. .+.+.+.+.+.+.+.+.+.+.+.+.+.+.+.+.+..+.+.+.+. .+.+.+.|
> 700 +-+ +. + |
> | |
> 600 +-+ |
> | |
> 500 +-+ |
> | |
> 400 +-+ |
> | |
> 300 +-+ |
> | |
> 200 O-O O O O O O |
> | O O O O O O O O O O O O O O O O O O |
> 100 +-+-------------------------------------------------------------------+
>
> which sounds like the downgrade really messes with the "spin waiting
> for lock" logic.
>
> I'm thinking it's the "wake up waiter" logic that has some bad
> interaction with spinning, and breaks that whole optimization.
>
> Adding Waiman and Davidlohr to the participants, because they seem to
> be the obvious experts in this area.
>
> Linus

Optimistic spinning on an rwsem is done only by writers spinning on a
writer-owned rwsem. If a write lock is downgraded to a read lock, all
the spinning waiters will quit. That may explain the drop in CPU
utilization. I do have an old patch that enables a certain amount of
reader spinning, which may help the situation. I can rebase it and send
it out for review if people are interested.

Cheers,
Longman



2018-12-28 11:55:23

by Kemi Wang

[permalink] [raw]
Subject: RE: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1% regression

Hi, Waiman
Did you post that patch? Let's see if it helps.

-----Original Message-----
From: LKP [mailto:[email protected]] On Behalf Of Waiman Long
Sent: Tuesday, November 6, 2018 6:40 AM
To: Linus Torvalds <[email protected]>; [email protected]; Davidlohr Bueso <[email protected]>
Cc: [email protected]; Linux Kernel Mailing List <[email protected]>; Matthew Wilcox <[email protected]>; [email protected]; Colin King <[email protected]>; Andrew Morton <[email protected]>; [email protected]; [email protected]; [email protected]
Subject: Re: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1% regression



_______________________________________________
LKP mailing list
[email protected]
https://lists.01.org/mailman/listinfo/lkp

2018-12-28 12:01:50

by Waiman Long

[permalink] [raw]
Subject: Re: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1% regression

On 12/27/2018 08:31 PM, Wang, Kemi wrote:
> Hi, Waiman
> Did you post that patch? Let's see if it helps.

I did post the patch a while ago. I will need to rebase it to a new
baseline. Will do that in a week or 2.

-Longman




2018-12-28 15:20:25

by Kemi Wang

[permalink] [raw]
Subject: Re: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1% regression



On 2018/12/28 10:55 AM, Waiman Long wrote:
> On 12/27/2018 08:31 PM, Wang, Kemi wrote:
>> Hi, Waiman
>> Did you post that patch? Let's see if it helps.
>
> I did post the patch a while ago. I will need to rebase it to a new
> baseline. Will do that in a week or 2.
>

OK. I will take a look at it and try to rebase it on Shi's patch to see
if the regression can be fixed.
May I know where I can get that patch? I didn't find it in my inbox. Thanks.

> -Longman
>
>>
>> -----Original Message-----
>> From: LKP [mailto:[email protected]] On Behalf Of Waiman Long
>> Sent: Tuesday, November 6, 2018 6:40 AM
>> To: Linus Torvalds <[email protected]>; [email protected]; Davidlohr Bueso <[email protected]>
>> Cc: [email protected]; Linux Kernel Mailing List <[email protected]>; Matthew Wilcox <[email protected]>; [email protected]; Colin King <[email protected]>; Andrew Morton <[email protected]>; [email protected]; [email protected]; [email protected]
>> Subject: Re: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1% regression
>>
>> On 11/05/2018 05:14 PM, Linus Torvalds wrote:
>>> On Mon, Nov 5, 2018 at 12:12 PM Vlastimil Babka <[email protected]> wrote:
>>>> I didn't spot an obvious mistake in the patch itself, so it looks
>>>> like some bad interaction between scheduler and the mmap downgrade?
>>> I'm thinking it's RWSEM_SPIN_ON_OWNER that ends up being confused by
>>> the downgrade.
>>>
>>> It looks like the benchmark used to be basically CPU-bound, at about
>>> 800% CPU, and now it's somewhere in the 200% CPU region:
>>>
>>> will-it-scale.time.percent_of_cpu_this_job_got
>>>
>>> 800 +-+-------------------------------------------------------------------+
>>> |.+.+.+.+.+.+.+. .+.+.+.+.+.+.+.+.+.+.+.+.+.+.+.+.+..+.+.+.+. .+.+.+.|
>>> 700 +-+ +. + |
>>> | |
>>> 600 +-+ |
>>> | |
>>> 500 +-+ |
>>> | |
>>> 400 +-+ |
>>> | |
>>> 300 +-+ |
>>> | |
>>> 200 O-O O O O O O |
>>> | O O O O O O O O O O O O O O O O O O |
>>> 100 +-+-------------------------------------------------------------------+
>>>
>>> which sounds like the downgrade really messes with the "spin waiting
>>> for lock" logic.
>>>
>>> I'm thinking it's the "wake up waiter" logic that has some bad
>>> interaction with spinning, and breaks that whole optimization.
>>>
>>> Adding Waiman and Davidlohr to the participants, because they seem to
>>> be the obvious experts in this area.
>>>
>>> Linus
>> Optimistic spinning on rwsem is done only on writers spinning on a
>> writer-owned rwsem. If a write-lock is downgraded to a read-lock, all
>> the spinning waiters will quit. That may explain the drop in cpu
>> utilization. I do have an old patch that enables a certain amount of
>> reader spinning, which may help the situation. I can rebase that and send
>> it out for review if people have interest.
>>
>> Cheers,
>> Longman
>>
>>
>> _______________________________________________
>> LKP mailing list
>> [email protected]
>> https://lists.01.org/mailman/listinfo/lkp
>
>

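Waiman's explanation above can be illustrated with a toy model. This is a hypothetical simulation invented for illustration, not kernel code: `try_spin_on_owner` and the owner-state list are made up, but they capture the rule that a waiting writer spins only while the lock stays writer-owned and falls back to sleeping on the wait queue as soon as a downgrade makes it reader-owned:

```python
# Toy model of rwsem optimistic spinning (hypothetical illustration,
# not the kernel implementation).

WRITER, READER, FREE = "writer", "reader", "free"

def try_spin_on_owner(owner_states):
    """A waiting writer spins while the lock is writer-owned.

    Returns ("acquired", spins) if the lock was released while spinning,
    or ("sleep", spins) if the owner became a reader (a downgrade) and
    the spinner gave up, falling back to blocking on the wait queue.
    """
    spins = 0
    for state in owner_states:
        if state == FREE:
            return ("acquired", spins)   # grabbed the lock while spinning
        if state != WRITER:
            return ("sleep", spins)      # reader-owned: stop spinning
        spins += 1
    return ("sleep", spins)

# Writer holds the lock, then releases: the spinner wins without sleeping.
print(try_spin_on_owner([WRITER, WRITER, FREE]))    # ('acquired', 2)

# Writer downgrades to read (as brk now does): the spinner quits and
# sleeps, matching the CPU-utilization drop in the report above.
print(try_spin_on_owner([WRITER, WRITER, READER]))  # ('sleep', 2)
```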
2019-01-31 00:07:24

by Tim Chen

[permalink] [raw]
Subject: Re: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1% regression

On 12/27/18 6:55 PM, Waiman Long wrote:
> On 12/27/2018 08:31 PM, Wang, Kemi wrote:
>> Hi, Waiman
>> Did you post that patch? Let's see if it helps.
>
> I did post the patch a while ago. I will need to rebase it to a new
> baseline. Will do that in a week or 2.
>
> -Longman
>

Waiman,

In a large-scale OLTP benchmark, we are also seeing a 1% loss in performance
from patch 9bc8039e71. We'll be interested to try out your patch to
restore some optimistic spinning on a reader-held rwsem.

Thanks.

Tim

2019-01-31 02:55:06

by Waiman Long

[permalink] [raw]
Subject: Re: [LKP] [mm] 9bc8039e71: will-it-scale.per_thread_ops -64.1% regression

On 01/30/2019 07:06 PM, Tim Chen wrote:
> On 12/27/18 6:55 PM, Waiman Long wrote:
>> On 12/27/2018 08:31 PM, Wang, Kemi wrote:
>>> Hi, Waiman
>>> Did you post that patch? Let's see if it helps.
>> I did post the patch a while ago. I will need to rebase it to a new
>> baseline. Will do that in a week or 2.
>>
>> -Longman
>>
> Waiman,
>
> In a large-scale OLTP benchmark, we are also seeing a 1% loss in performance
> from patch 9bc8039e71. We'll be interested to try out your patch to
> restore some optimistic spinning on a reader-held rwsem.
>
> Thanks.
>
> Tim

I am reworking the rwsem code, and reader optimistic spinning isn't ready
in the reworked code yet. I need a bit more time to work on that, and I
will certainly keep you posted once it is ready for review.

Cheers,
Longman