2018-05-28 11:44:44

by kernel test robot

Subject: [lkp-robot] [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement


Greeting,

FYI, we noticed a +23.0% improvement of vm-scalability.throughput due to commit:


commit: 309fe96bfc0ae387f53612927a8f0dc3eb056efd ("mm, memcontrol: implement memory.swap.events")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: vm-scalability
on test machine: 144 threads Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory
with following parameters:

runtime: 300s
size: 1T
test: lru-shm
cpufreq_governor: performance

test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/



Details are as below:
-------------------------------------------------------------------------------------------------->
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
gcc-7/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/300s/1T/lkp-hsx04/lru-shm/vm-scalability

commit:
ccc2f49f99 ("mm, memcontrol: move swap charge handling into get_swap_page()")
309fe96bfc ("mm, memcontrol: implement memory.swap.events")

ccc2f49f991f17cd 309fe96bfc0ae387f53612927a
---------------- --------------------------
%stddev %change %stddev
\ | \
71207426 +23.0% 87612470 vm-scalability.throughput
0.32 ? 8% -80.2% 0.06 ? 2% vm-scalability.free_time
499213 +23.4% 616000 vm-scalability.median
0.01 ? 9% -43.6% 0.00 ? 22% vm-scalability.median_stddev
71207426 +23.0% 87612470 vm-scalability.throughput
305.83 +3.5% 316.49 vm-scalability.time.elapsed_time
305.83 +3.5% 316.49 vm-scalability.time.elapsed_time.max
7.933e+08 +8.3% 8.594e+08 vm-scalability.time.minor_page_faults
2610 -18.8% 2119 vm-scalability.time.percent_of_cpu_this_job_got
5076 -20.1% 4056 vm-scalability.time.system_time
2910 -8.9% 2651 vm-scalability.time.user_time
24540 +8.2% 26563 vm-scalability.time.voluntary_context_switches
3.566e+09 +8.3% 3.863e+09 vm-scalability.workload
4435819 ? 2% +13.1% 5015715 ? 4% cpuidle.C1E.time
58453 +12.6% 65828 ? 13% cpuidle.POLL.time
630.29 -1.9% 618.41 pmeter.Average_Active_Power
112976 +25.4% 141673 pmeter.performance_per_watt
26.00 -19.2% 21.00 vmstat.procs.r
147037 -1.2% 145251 vmstat.system.in
0.00 ?173% +0.0 0.00 ?124% mpstat.cpu.iowait%
11.66 -2.6 9.02 mpstat.cpu.sys%
6.66 -0.8 5.86 mpstat.cpu.usr%
113669 -12.8% 99110 meminfo.Active
112018 -13.0% 97459 meminfo.Active(anon)
23274932 -21.5% 18277464 meminfo.Mapped
51042 -22.7% 39479 meminfo.PageTables
5675691 -22.6% 4394275 ? 2% numa-meminfo.node0.Mapped
12906 ? 9% -24.0% 9808 ? 11% numa-meminfo.node0.PageTables
5564225 ? 2% -20.1% 4445143 numa-meminfo.node1.Mapped
12478 ? 6% -23.3% 9573 ? 11% numa-meminfo.node1.PageTables
5605568 ? 2% -20.3% 4467557 ? 2% numa-meminfo.node2.Mapped
12399 ? 8% -23.0% 9545 ? 8% numa-meminfo.node2.PageTables
5747984 ? 3% -19.3% 4638538 ? 3% numa-meminfo.node3.Mapped
11853 ? 6% -16.8% 9867 ? 10% numa-meminfo.node3.PageTables
40006 ? 2% +32.6% 53040 ? 23% numa-meminfo.node3.SUnreclaim
1394386 -21.2% 1099228 ? 2% numa-vmstat.node0.nr_mapped
3220 ? 9% -23.7% 2457 ? 9% numa-vmstat.node0.nr_page_table_pages
1385184 ? 2% -19.8% 1111569 ? 2% numa-vmstat.node1.nr_mapped
3096 ? 6% -22.4% 2404 ? 11% numa-vmstat.node1.nr_page_table_pages
1392379 ? 2% -18.4% 1135757 ? 2% numa-vmstat.node2.nr_mapped
3056 ? 7% -20.7% 2422 ? 7% numa-vmstat.node2.nr_page_table_pages
1477487 ? 2% -18.7% 1201163 ? 3% numa-vmstat.node3.nr_mapped
3074 ? 4% -17.2% 2546 ? 11% numa-vmstat.node3.nr_page_table_pages
10001 ? 2% +32.6% 13259 ? 23% numa-vmstat.node3.nr_slab_unreclaimable
4316 ? 19% -26.2% 3183 ? 3% syscalls.sys_mmap.med
66119 ? 25% -41.3% 38816 ? 10% syscalls.sys_newfstat.max
71070823 ? 55% -6.4e+07 6980408 ? 20% syscalls.sys_newfstat.noise.100%
86378359 ? 45% -6.5e+07 20983557 ? 6% syscalls.sys_newfstat.noise.2%
83896012 ? 47% -6.5e+07 18902607 ? 7% syscalls.sys_newfstat.noise.25%
86279365 ? 46% -6.5e+07 20864391 ? 6% syscalls.sys_newfstat.noise.5%
79533721 ? 49% -6.4e+07 15258271 ? 8% syscalls.sys_newfstat.noise.50%
74988875 ? 52% -6.4e+07 11205147 ? 13% syscalls.sys_newfstat.noise.75%
2034 ? 9% -16.9% 1690 ? 4% syscalls.sys_read.med
1598 ? 6% -10.5% 1431 ? 3% syscalls.sys_write.med
5.102e+12 +9.0% 5.559e+12 perf-stat.branch-instructions
1.37 -21.4% 1.08 perf-stat.cpi
2.479e+13 -14.0% 2.132e+13 perf-stat.cpu-cycles
20771 +2.6% 21302 perf-stat.cpu-migrations
4.59e+12 ? 2% +9.2% 5.014e+12 perf-stat.dTLB-loads
1.483e+12 ? 4% +10.6% 1.639e+12 perf-stat.dTLB-stores
2.527e+09 +7.3% 2.712e+09 perf-stat.iTLB-load-misses
1.804e+13 ? 2% +9.3% 1.972e+13 perf-stat.instructions
0.73 +27.1% 0.93 perf-stat.ipc
7.943e+08 +8.3% 8.605e+08 perf-stat.minor-faults
2.416e+09 +4.3% 2.519e+09 perf-stat.node-stores
7.943e+08 +8.3% 8.605e+08 perf-stat.page-faults
27996 -13.0% 24359 proc-vmstat.nr_active_anon
237.75 +6.2% 252.50 proc-vmstat.nr_dirtied
33832161 -1.0% 33504100 proc-vmstat.nr_file_pages
33515106 -1.0% 33189801 proc-vmstat.nr_inactive_anon
485.50 +0.8% 489.25 proc-vmstat.nr_inactive_file
23158 -1.0% 22915 proc-vmstat.nr_kernel_stack
5811543 -24.2% 4407834 proc-vmstat.nr_mapped
12781 -25.1% 9571 proc-vmstat.nr_page_table_pages
33521883 -1.0% 33192863 proc-vmstat.nr_shmem
222.00 +13.1% 251.00 proc-vmstat.nr_written
28001 -13.0% 24362 proc-vmstat.nr_zone_active_anon
33515101 -1.0% 33189795 proc-vmstat.nr_zone_inactive_anon
485.50 +0.8% 489.25 proc-vmstat.nr_zone_inactive_file
7.959e+08 +8.3% 8.621e+08 proc-vmstat.numa_hit
7.958e+08 +8.3% 8.621e+08 proc-vmstat.numa_local
11401 ? 8% -69.4% 3491 ? 23% proc-vmstat.pgactivate
7.969e+08 +8.4% 8.635e+08 proc-vmstat.pgalloc_normal
7.944e+08 +8.3% 8.605e+08 proc-vmstat.pgfault
7.964e+08 +8.4% 8.63e+08 proc-vmstat.pgfree
76.68 ?173% -100.0% 0.00 ? 10% sched_debug.cfs_rq:/.MIN_vruntime.stddev
29153 -18.2% 23841 sched_debug.cfs_rq:/.exec_clock.avg
48865 ? 7% -14.6% 41739 ? 7% sched_debug.cfs_rq:/.exec_clock.max
26558 -18.4% 21662 sched_debug.cfs_rq:/.exec_clock.min
76.68 ?173% -100.0% 0.00 ? 10% sched_debug.cfs_rq:/.max_vruntime.stddev
4166046 -19.1% 3372283 sched_debug.cfs_rq:/.min_vruntime.avg
4360622 -19.2% 3524394 sched_debug.cfs_rq:/.min_vruntime.max
3816276 -17.3% 3154309 sched_debug.cfs_rq:/.min_vruntime.min
105670 ? 15% -32.9% 70895 ? 16% sched_debug.cfs_rq:/.min_vruntime.stddev
-361713 -46.2% -194567 sched_debug.cfs_rq:/.spread0.min
105504 ? 15% -32.8% 70895 ? 16% sched_debug.cfs_rq:/.spread0.stddev
309.71 ? 13% -22.3% 240.75 ? 21% sched_debug.cfs_rq:/.util_est_enqueued.max
6.42 ? 19% -37.9% 3.99 ? 5% sched_debug.cpu.clock.stddev
6.42 ? 19% -37.9% 3.98 ? 5% sched_debug.cpu.clock_task.stddev
5.91 ? 7% -14.8% 5.04 ? 6% sched_debug.cpu.cpu_load[4].avg
355621 ? 22% -28.6% 253956 ? 5% sched_debug.cpu.nr_switches.max
40018 ? 12% -30.5% 27804 ? 16% sched_debug.cpu.nr_switches.stddev
0.00 ? 19% +100.0% 0.01 ? 24% sched_debug.cpu.nr_uninterruptible.avg
364939 ? 24% -26.2% 269378 ? 6% sched_debug.cpu.sched_count.max
41878 ? 12% -27.3% 30433 ? 13% sched_debug.cpu.sched_count.stddev
179801 ? 22% -33.2% 120078 ? 3% sched_debug.cpu.ttwu_count.max
20153 ? 12% -32.8% 13538 ? 19% sched_debug.cpu.ttwu_count.stddev
174157 ? 23% -33.1% 116564 ? 2% sched_debug.cpu.ttwu_local.max
19436 ? 12% -34.2% 12782 ? 20% sched_debug.cpu.ttwu_local.stddev
66.14 -66.1 0.00 perf-profile.calltrace.cycles-pp.do_access
44.18 -44.2 0.00 perf-profile.calltrace.cycles-pp.page_fault.do_access
44.15 -44.1 0.00 perf-profile.calltrace.cycles-pp.do_page_fault.page_fault.do_access
44.13 -44.1 0.00 perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault.do_access
42.34 -42.3 0.00 perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault.do_access
17.02 -17.0 0.00 perf-profile.calltrace.cycles-pp.do_rw_once
6.85 ? 14% -6.9 0.00 perf-profile.calltrace.cycles-pp.__munmap
6.81 ? 14% -6.8 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
6.81 ? 14% -6.8 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
6.81 ? 14% -6.8 0.00 perf-profile.calltrace.cycles-pp.vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
6.81 ? 14% -6.8 0.00 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
8.94 -6.4 2.52 ?173% perf-profile.calltrace.cycles-pp.clear_page_erms.shmem_getpage_gfp.shmem_fault.__do_fault.__handle_mm_fault
5.83 -5.8 0.00 perf-profile.calltrace.cycles-pp.native_irq_return_iret.do_access
6.80 ? 14% -5.6 1.25 ?145% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_munmap.vm_munmap
6.80 ? 14% -5.6 1.25 ?145% perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_munmap.vm_munmap.__x64_sys_munmap
6.81 ? 14% -5.6 1.26 ?144% perf-profile.calltrace.cycles-pp.do_munmap.vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.81 ? 14% -5.6 1.26 ?144% perf-profile.calltrace.cycles-pp.unmap_region.do_munmap.vm_munmap.__x64_sys_munmap.do_syscall_64
8.00 -5.2 2.79 ?173% perf-profile.calltrace.cycles-pp.filemap_map_pages.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
5.34 -4.8 0.57 ?173% perf-profile.calltrace.cycles-pp.finish_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
5.30 -4.7 0.56 ?173% perf-profile.calltrace.cycles-pp.alloc_set_pte.finish_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault
0.00 +1.1 1.13 ? 91% perf-profile.calltrace.cycles-pp.native_irq_return_iret
0.00 +1.2 1.18 ? 31% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt
0.00 +1.2 1.18 ? 32% perf-profile.calltrace.cycles-pp.get_next_timer_interrupt.tick_nohz_next_event.tick_nohz_get_sleep_length.menu_select.do_idle
0.00 +1.3 1.27 ? 31% perf-profile.calltrace.cycles-pp.load_balance.rebalance_domains.__softirqentry_text_start.irq_exit.smp_apic_timer_interrupt
0.00 +1.3 1.33 ? 33% perf-profile.calltrace.cycles-pp.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
0.00 +1.8 1.78 ? 32% perf-profile.calltrace.cycles-pp.rebalance_domains.__softirqentry_text_start.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt
0.00 +1.9 1.88 ? 33% perf-profile.calltrace.cycles-pp.tick_nohz_next_event.tick_nohz_get_sleep_length.menu_select.do_idle.cpu_startup_entry
0.00 +2.3 2.27 ? 33% perf-profile.calltrace.cycles-pp.tick_nohz_get_sleep_length.menu_select.do_idle.cpu_startup_entry.start_secondary
0.00 +2.6 2.63 ? 32% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state
0.23 ?173% +3.1 3.36 ? 64% perf-profile.calltrace.cycles-pp.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.23 ?173% +3.2 3.38 ? 63% perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.23 ?173% +3.2 3.39 ? 63% perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.23 ?173% +3.2 3.41 ? 63% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.23 ?173% +3.2 3.42 ? 63% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
0.00 +3.2 3.21 ? 31% perf-profile.calltrace.cycles-pp.__softirqentry_text_start.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state
0.23 ?173% +3.3 3.52 ? 61% perf-profile.calltrace.cycles-pp.write
0.00 +3.5 3.51 ? 32% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle
0.00 +3.7 3.74 ? 30% perf-profile.calltrace.cycles-pp.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle
0.00 +4.3 4.35 ? 33% perf-profile.calltrace.cycles-pp.menu_select.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
0.87 ? 3% +7.6 8.51 ? 31% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry
0.87 ? 3% +7.7 8.58 ? 31% perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
7.11 +46.8 53.92 ? 30% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
8.11 +55.7 63.77 ? 30% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
8.72 +61.2 69.92 ? 30% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
8.72 +61.2 69.92 ? 30% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
8.72 +61.2 69.92 ? 30% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
8.73 +61.4 70.12 ? 30% perf-profile.calltrace.cycles-pp.secondary_startup_64
66.14 -66.1 0.00 perf-profile.children.cycles-pp.do_access
17.02 -17.0 0.00 perf-profile.children.cycles-pp.do_rw_once
6.85 ? 14% -6.9 0.00 perf-profile.children.cycles-pp.__munmap
6.81 ? 14% -5.5 1.30 ?137% perf-profile.children.cycles-pp.do_munmap
6.81 ? 14% -5.5 1.31 ?136% perf-profile.children.cycles-pp.unmap_vmas
6.81 ? 14% -5.5 1.31 ?136% perf-profile.children.cycles-pp.unmap_page_range
6.81 ? 14% -5.5 1.30 ?137% perf-profile.children.cycles-pp.vm_munmap
6.81 ? 14% -5.5 1.30 ?137% perf-profile.children.cycles-pp.unmap_region
6.81 ? 14% -5.5 1.30 ?137% perf-profile.children.cycles-pp.__x64_sys_munmap
5.88 -5.1 0.75 ?173% perf-profile.children.cycles-pp.alloc_set_pte
5.35 -4.8 0.57 ?173% perf-profile.children.cycles-pp.finish_fault
5.89 -4.8 1.13 ? 90% perf-profile.children.cycles-pp.native_irq_return_iret
4.50 ? 13% -3.9 0.63 ?155% perf-profile.children.cycles-pp.page_remove_rmap
2.89 ? 8% -2.2 0.72 ?167% perf-profile.children.cycles-pp.shmem_alloc_page
2.86 ? 8% -2.1 0.71 ?167% perf-profile.children.cycles-pp.alloc_pages_vma
2.73 ? 9% -2.1 0.67 ?165% perf-profile.children.cycles-pp.__alloc_pages_nodemask
2.48 ? 10% -1.9 0.59 ?165% perf-profile.children.cycles-pp.get_page_from_freelist
1.57 ? 16% -1.2 0.34 ?164% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
1.25 ? 20% -0.8 0.40 ? 48% perf-profile.children.cycles-pp._raw_spin_lock
0.00 +0.1 0.08 ? 21% perf-profile.children.cycles-pp.ret_from_intr
0.00 +0.1 0.08 ? 24% perf-profile.children.cycles-pp.update_rq_clock
0.00 +0.1 0.09 ? 26% perf-profile.children.cycles-pp.update_group_capacity
0.00 +0.1 0.09 ? 26% perf-profile.children.cycles-pp.intel_pmu_disable_all
0.00 +0.1 0.09 ? 28% perf-profile.children.cycles-pp.perf_event_task_tick
0.11 ? 6% +0.1 0.22 ? 15% perf-profile.children.cycles-pp.__indirect_thunk_start
0.00 +0.1 0.11 ? 34% perf-profile.children.cycles-pp.cpu_load_update
0.00 +0.1 0.12 ? 25% perf-profile.children.cycles-pp.run_posix_cpu_timers
0.00 +0.1 0.12 ? 33% perf-profile.children.cycles-pp.rb_next
0.00 +0.1 0.12 ? 19% perf-profile.children.cycles-pp.interrupt_entry
0.01 ?173% +0.1 0.13 ? 21% perf-profile.children.cycles-pp.ktime_get_update_offsets_now
0.00 +0.1 0.12 ? 36% perf-profile.children.cycles-pp.rcu_eqs_exit
0.00 +0.1 0.14 ? 39% perf-profile.children.cycles-pp.nr_iowait_cpu
0.00 +0.1 0.14 ? 38% perf-profile.children.cycles-pp.rcu_dynticks_eqs_exit
0.00 +0.1 0.14 ? 38% perf-profile.children.cycles-pp.leave_mm
0.00 +0.1 0.14 ? 26% perf-profile.children.cycles-pp.__intel_pmu_enable_all
0.00 +0.1 0.14 ? 36% perf-profile.children.cycles-pp.irq_work_needs_cpu
0.00 +0.1 0.14 ? 30% perf-profile.children.cycles-pp.idle_cpu
0.00 +0.1 0.14 ? 30% perf-profile.children.cycles-pp.call_cpuidle
0.00 +0.2 0.15 ? 35% perf-profile.children.cycles-pp.rcu_irq_exit
0.00 +0.2 0.16 ? 40% perf-profile.children.cycles-pp.rcu_needs_cpu
0.00 +0.2 0.16 ? 31% perf-profile.children.cycles-pp.get_cpu_device
0.00 +0.2 0.16 ? 28% perf-profile.children.cycles-pp.tick_check_oneshot_broadcast_this_cpu
0.00 +0.2 0.16 ? 38% perf-profile.children.cycles-pp.native_apic_mem_write
0.00 +0.2 0.16 ? 36% perf-profile.children.cycles-pp.find_next_and_bit
0.00 +0.2 0.17 ? 43% perf-profile.children.cycles-pp.timekeeping_max_deferment
0.00 +0.2 0.18 ? 34% perf-profile.children.cycles-pp.cpumask_next_and
0.00 +0.2 0.19 ? 37% perf-profile.children.cycles-pp.timerqueue_add
0.00 +0.2 0.19 ? 38% perf-profile.children.cycles-pp.enqueue_hrtimer
0.00 +0.2 0.20 ? 34% perf-profile.children.cycles-pp.update_ts_time_stats
0.00 +0.2 0.20 ? 30% perf-profile.children.cycles-pp.rcu_idle_exit
0.04 ? 58% +0.2 0.25 ? 28% perf-profile.children.cycles-pp.irq_work_run_list
0.00 +0.2 0.21 ? 28% perf-profile.children.cycles-pp.tick_nohz_irq_exit
0.03 ?100% +0.2 0.24 ? 28% perf-profile.children.cycles-pp.irq_work_interrupt
0.03 ?100% +0.2 0.24 ? 28% perf-profile.children.cycles-pp.smp_irq_work_interrupt
0.03 ?100% +0.2 0.24 ? 28% perf-profile.children.cycles-pp.irq_work_run
0.03 ?100% +0.2 0.24 ? 28% perf-profile.children.cycles-pp.printk
0.00 +0.2 0.22 ? 36% perf-profile.children.cycles-pp.arch_cpu_idle_enter
0.00 +0.2 0.23 ? 39% perf-profile.children.cycles-pp.tsc_verify_tsc_adjust
0.00 +0.2 0.24 ? 35% perf-profile.children.cycles-pp.hrtimer_get_next_event
0.00 +0.2 0.24 ? 33% perf-profile.children.cycles-pp._raw_spin_trylock
0.01 ?173% +0.2 0.25 ? 45% perf-profile.children.cycles-pp.rcu_process_callbacks
0.00 +0.3 0.26 ? 34% perf-profile.children.cycles-pp.pm_qos_read_value
0.01 ?173% +0.3 0.28 ? 26% perf-profile.children.cycles-pp.update_blocked_averages
0.00 +0.3 0.27 ? 30% perf-profile.children.cycles-pp.read_tsc
0.00 +0.3 0.28 ? 36% perf-profile.children.cycles-pp.timerqueue_del
0.00 +0.3 0.29 ? 27% perf-profile.children.cycles-pp.lapic_next_deadline
0.06 ? 7% +0.3 0.36 ? 28% perf-profile.children.cycles-pp.rcu_check_callbacks
0.03 ?173% +0.3 0.34 ? 70% perf-profile.children.cycles-pp.fbcon_putcs
0.01 ?173% +0.3 0.33 ? 33% perf-profile.children.cycles-pp.__remove_hrtimer
0.03 ?173% +0.3 0.34 ? 71% perf-profile.children.cycles-pp.bit_putcs
0.00 +0.3 0.32 ? 33% perf-profile.children.cycles-pp.__hrtimer_next_event_base
0.03 ?173% +0.3 0.35 ? 70% perf-profile.children.cycles-pp.fbcon_redraw
0.03 ?173% +0.3 0.35 ? 70% perf-profile.children.cycles-pp.lf
0.03 ?173% +0.3 0.35 ? 70% perf-profile.children.cycles-pp.con_scroll
0.03 ?173% +0.3 0.35 ? 70% perf-profile.children.cycles-pp.fbcon_scroll
0.03 ?100% +0.3 0.35 ? 26% perf-profile.children.cycles-pp.run_rebalance_domains
0.00 +0.3 0.33 ? 29% perf-profile.children.cycles-pp.native_sched_clock
0.03 ?173% +0.3 0.36 ? 71% perf-profile.children.cycles-pp.vt_console_print
0.00 +0.3 0.34 ? 34% perf-profile.children.cycles-pp.rcu_eqs_enter
0.06 ? 7% +0.3 0.40 ? 26% perf-profile.children.cycles-pp.native_write_msr
0.00 +0.3 0.35 ? 31% perf-profile.children.cycles-pp.hrtimer_next_event_without
0.00 +0.4 0.37 ? 29% perf-profile.children.cycles-pp.sched_clock
0.06 ? 11% +0.4 0.48 ? 26% perf-profile.children.cycles-pp.clockevents_program_event
0.17 ? 5% +0.4 0.59 ? 21% perf-profile.children.cycles-pp.scheduler_tick
0.05 ? 9% +0.4 0.49 ? 42% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.04 ? 57% +0.5 0.51 ? 29% perf-profile.children.cycles-pp.sched_clock_cpu
0.07 ? 10% +0.5 0.57 ? 34% perf-profile.children.cycles-pp.run_timer_softirq
0.06 +0.6 0.61 ? 30% perf-profile.children.cycles-pp.find_next_bit
0.11 ? 6% +0.6 0.68 ? 28% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.07 ? 7% +0.6 0.67 ? 31% perf-profile.children.cycles-pp.tick_irq_enter
0.09 ? 14% +0.7 0.75 ? 29% perf-profile.children.cycles-pp.ktime_get
0.09 ? 9% +0.8 0.85 ? 31% perf-profile.children.cycles-pp.irq_enter
0.09 ? 4% +0.8 0.86 ? 32% perf-profile.children.cycles-pp.find_busiest_group
0.09 ? 4% +0.8 0.90 ? 31% perf-profile.children.cycles-pp.__next_timer_interrupt
0.27 ? 5% +0.9 1.15 ? 25% perf-profile.children.cycles-pp.update_process_times
0.28 ? 4% +1.0 1.25 ? 25% perf-profile.children.cycles-pp.tick_sched_handle
0.30 ? 5% +1.1 1.43 ? 26% perf-profile.children.cycles-pp.tick_sched_timer
0.12 ? 3% +1.1 1.25 ? 32% perf-profile.children.cycles-pp.get_next_timer_interrupt
0.13 ? 5% +1.2 1.29 ? 31% perf-profile.children.cycles-pp.load_balance
0.07 ?173% +1.2 1.23 ? 66% perf-profile.children.cycles-pp.delay_tsc
0.14 ?173% +1.5 1.67 ? 62% perf-profile.children.cycles-pp.io_serial_in
0.18 ? 4% +1.6 1.81 ? 32% perf-profile.children.cycles-pp.rebalance_domains
0.19 ? 6% +1.8 2.00 ? 34% perf-profile.children.cycles-pp.tick_nohz_next_event
0.22 ? 7% +2.1 2.33 ? 33% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.50 ? 4% +2.3 2.77 ? 28% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.23 ?143% +2.5 2.77 ? 64% perf-profile.children.cycles-pp.serial8250_console_putchar
0.23 ?143% +2.6 2.83 ? 64% perf-profile.children.cycles-pp.uart_console_write
0.23 ?143% +2.7 2.91 ? 64% perf-profile.children.cycles-pp.wait_for_xmitr
0.24 ?144% +2.7 2.97 ? 64% perf-profile.children.cycles-pp.serial8250_console_write
0.38 ? 4% +2.9 3.33 ? 30% perf-profile.children.cycles-pp.__softirqentry_text_start
0.22 ?173% +3.0 3.20 ? 69% perf-profile.children.cycles-pp.devkmsg_write
0.22 ?173% +3.0 3.20 ? 69% perf-profile.children.cycles-pp.printk_emit
0.64 ? 4% +3.0 3.66 ? 28% perf-profile.children.cycles-pp.hrtimer_interrupt
0.27 ?147% +3.1 3.34 ? 64% perf-profile.children.cycles-pp.console_unlock
0.24 ?159% +3.1 3.37 ? 64% perf-profile.children.cycles-pp.__vfs_write
0.24 ?157% +3.1 3.39 ? 63% perf-profile.children.cycles-pp.vfs_write
0.24 ?157% +3.1 3.39 ? 63% perf-profile.children.cycles-pp.ksys_write
0.24 ?161% +3.2 3.44 ? 63% perf-profile.children.cycles-pp.vprintk_emit
0.25 ?153% +3.3 3.52 ? 61% perf-profile.children.cycles-pp.write
0.44 ? 4% +3.4 3.85 ? 29% perf-profile.children.cycles-pp.irq_exit
0.43 ? 5% +4.0 4.43 ? 33% perf-profile.children.cycles-pp.menu_select
1.22 ? 2% +7.5 8.74 ? 29% perf-profile.children.cycles-pp.smp_apic_timer_interrupt
1.23 ? 2% +7.6 8.81 ? 29% perf-profile.children.cycles-pp.apic_timer_interrupt
7.13 +47.0 54.12 ? 30% perf-profile.children.cycles-pp.intel_idle
8.18 +56.3 64.52 ? 30% perf-profile.children.cycles-pp.cpuidle_enter_state
8.72 +61.2 69.92 ? 30% perf-profile.children.cycles-pp.start_secondary
8.73 +61.4 70.12 ? 30% perf-profile.children.cycles-pp.secondary_startup_64
8.73 +61.4 70.12 ? 30% perf-profile.children.cycles-pp.cpu_startup_entry
8.74 +61.5 70.20 ? 30% perf-profile.children.cycles-pp.do_idle
16.94 -16.9 0.00 perf-profile.self.cycles-pp.do_rw_once
10.66 -10.7 0.00 perf-profile.self.cycles-pp.do_access
5.89 -4.8 1.13 ? 90% perf-profile.self.cycles-pp.native_irq_return_iret
3.75 ? 12% -3.2 0.54 ?154% perf-profile.self.cycles-pp.page_remove_rmap
1.57 ? 16% -1.2 0.34 ?164% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.00 +0.1 0.07 ? 17% perf-profile.self.cycles-pp.ret_from_intr
0.00 +0.1 0.08 ? 31% perf-profile.self.cycles-pp.rcu_idle_exit
0.00 +0.1 0.08 ? 26% perf-profile.self.cycles-pp.tick_irq_enter
0.00 +0.1 0.09 ? 28% perf-profile.self.cycles-pp.perf_event_task_tick
0.00 +0.1 0.11 ? 15% perf-profile.self.cycles-pp.ktime_get_update_offsets_now
0.11 ? 6% +0.1 0.22 ? 15% perf-profile.self.cycles-pp.__indirect_thunk_start
0.00 +0.1 0.11 ? 34% perf-profile.self.cycles-pp.scheduler_tick
0.00 +0.1 0.11 ? 34% perf-profile.self.cycles-pp.cpu_load_update
0.00 +0.1 0.11 ? 27% perf-profile.self.cycles-pp.__remove_hrtimer
0.00 +0.1 0.12 ? 25% perf-profile.self.cycles-pp.run_posix_cpu_timers
0.00 +0.1 0.12 ? 33% perf-profile.self.cycles-pp.rb_next
0.00 +0.1 0.12 ? 19% perf-profile.self.cycles-pp.interrupt_entry
0.00 +0.1 0.12 ? 38% perf-profile.self.cycles-pp.timerqueue_add
0.00 +0.1 0.13 ? 32% perf-profile.self.cycles-pp.sched_clock_cpu
0.00 +0.1 0.13 ? 35% perf-profile.self.cycles-pp.hrtimer_interrupt
0.00 +0.1 0.14 ? 39% perf-profile.self.cycles-pp.nr_iowait_cpu
0.00 +0.1 0.14 ? 21% perf-profile.self.cycles-pp.smp_apic_timer_interrupt
0.00 +0.1 0.14 ? 38% perf-profile.self.cycles-pp.rcu_dynticks_eqs_exit
0.00 +0.1 0.14 ? 38% perf-profile.self.cycles-pp.leave_mm
0.00 +0.1 0.14 ? 36% perf-profile.self.cycles-pp.irq_work_needs_cpu
0.00 +0.1 0.14 ? 30% perf-profile.self.cycles-pp.idle_cpu
0.00 +0.1 0.14 ? 30% perf-profile.self.cycles-pp.call_cpuidle
0.00 +0.2 0.16 ? 40% perf-profile.self.cycles-pp.rcu_needs_cpu
0.00 +0.2 0.16 ? 31% perf-profile.self.cycles-pp.get_cpu_device
0.00 +0.2 0.16 ? 28% perf-profile.self.cycles-pp.tick_check_oneshot_broadcast_this_cpu
0.00 +0.2 0.16 ? 38% perf-profile.self.cycles-pp.native_apic_mem_write
0.00 +0.2 0.16 ? 36% perf-profile.self.cycles-pp.find_next_and_bit
0.00 +0.2 0.17 ? 43% perf-profile.self.cycles-pp.timekeeping_max_deferment
0.00 +0.2 0.18 ? 29% perf-profile.self.cycles-pp.perf_mux_hrtimer_handler
0.00 +0.2 0.19 ? 30% perf-profile.self.cycles-pp.__hrtimer_run_queues
0.00 +0.2 0.19 ? 36% perf-profile.self.cycles-pp.tsc_verify_tsc_adjust
0.00 +0.2 0.19 ? 27% perf-profile.self.cycles-pp.update_blocked_averages
0.00 +0.2 0.23 ? 22% perf-profile.self.cycles-pp.irq_exit
0.00 +0.2 0.23 ? 35% perf-profile.self.cycles-pp.get_next_timer_interrupt
0.00 +0.2 0.24 ? 33% perf-profile.self.cycles-pp._raw_spin_trylock
0.00 +0.3 0.26 ? 34% perf-profile.self.cycles-pp.pm_qos_read_value
0.00 +0.3 0.27 ? 29% perf-profile.self.cycles-pp.rcu_check_callbacks
0.00 +0.3 0.27 ? 30% perf-profile.self.cycles-pp.read_tsc
0.00 +0.3 0.30 ? 35% perf-profile.self.cycles-pp.rebalance_domains
0.00 +0.3 0.30 ? 30% perf-profile.self.cycles-pp.load_balance
0.00 +0.3 0.32 ? 33% perf-profile.self.cycles-pp.__hrtimer_next_event_base
0.04 ? 57% +0.3 0.36 ? 31% perf-profile.self.cycles-pp.__softirqentry_text_start
0.00 +0.3 0.33 ? 29% perf-profile.self.cycles-pp.native_sched_clock
0.00 +0.3 0.34 ? 34% perf-profile.self.cycles-pp.rcu_eqs_enter
0.06 ? 7% +0.3 0.40 ? 26% perf-profile.self.cycles-pp.native_write_msr
0.05 ? 9% +0.4 0.45 ? 34% perf-profile.self.cycles-pp.run_timer_softirq
0.00 +0.4 0.40 ? 38% perf-profile.self.cycles-pp.tick_nohz_next_event
0.03 ?100% +0.4 0.45 ? 42% perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
0.01 ?173% +0.4 0.44 ? 34% perf-profile.self.cycles-pp.__next_timer_interrupt
0.05 ? 59% +0.5 0.52 ? 32% perf-profile.self.cycles-pp.ktime_get
0.05 +0.5 0.52 ? 31% perf-profile.self.cycles-pp.do_idle
0.11 ? 7% +0.5 0.59 ? 34% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.06 +0.6 0.61 ? 30% perf-profile.self.cycles-pp.find_next_bit
0.06 ? 6% +0.6 0.62 ? 32% perf-profile.self.cycles-pp.find_busiest_group
0.11 ? 4% +1.1 1.18 ? 33% perf-profile.self.cycles-pp.cpuidle_enter_state
0.07 ?173% +1.2 1.23 ? 66% perf-profile.self.cycles-pp.delay_tsc
0.16 ? 2% +1.4 1.52 ? 34% perf-profile.self.cycles-pp.menu_select
0.14 ?173% +1.5 1.67 ? 62% perf-profile.self.cycles-pp.io_serial_in
7.12 +46.9 54.02 ? 30% perf-profile.self.cycles-pp.intel_idle



vm-scalability.throughput

9.5e+07 +-+---------------------------------------------------------------+
| |
9e+07 +-+ O O |
O O O O O O O O O O O O O |
| O O |
8.5e+07 +-+ |
| |
8e+07 +-+ O O |
| |
7.5e+07 +-+ +. |
| + +. |
|.+.+.+.+. .+..+.+.+ +. .+.+.+.+.+.+. .+..+.+.+.+.+.+.+.+.|
7e+07 +-+ +.+ + +.+ |
| |
6.5e+07 +-+---------------------------------------------------------------+



[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Xiaolong


Attachments:
(No filename) (36.08 kB)
config-4.17.0-rc4-00145-g309fe96 (167.13 kB)
job-script (7.33 kB)
job.yaml (4.91 kB)
reproduce (296.17 kB)

2018-05-28 15:55:47

by Michal Hocko

Subject: Re: [lkp-robot] [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement

On Mon 28-05-18 19:40:19, kernel test robot wrote:
>
> Greeting,
>
> FYI, we noticed a +23.0% improvement of vm-scalability.throughput due to commit:
>
>
> commit: 309fe96bfc0ae387f53612927a8f0dc3eb056efd ("mm, memcontrol: implement memory.swap.events")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

This doesn't make any sense to me. The patch merely adds an accounting.
It doesn't optimize anything. So I strongly suspect the result is just
misleading or the test (environment) misconfigured. Not the first time
I am seeing something like that I am afraid.

--
Michal Hocko
SUSE Labs

2018-05-29 04:40:35

by Aaron Lu

Subject: Re: [LKP] [lkp-robot] [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement

On Mon, 2018-05-28 at 14:03 +0200, Michal Hocko wrote:
> On Mon 28-05-18 19:40:19, kernel test robot wrote:
> >
> > Greeting,
> >
> > FYI, we noticed a +23.0% improvement of vm-scalability.throughput due to commit:
> >
> >
> > commit: 309fe96bfc0ae387f53612927a8f0dc3eb056efd ("mm, memcontrol: implement memory.swap.events")
> > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> This doesn't make any sense to me. The patch merely adds an accounting.
> It doesn't optimize anything. So I strongly suspect the result is just
> misleading or the test (environment) misconfigured. Not the first time
> I am seeing something like that I am afraid.
>

Most likely the same situation as:
"
FYI, we noticed a -27.2% regression of will-it-scale.per_process_ops
due to commit:


commit: e27be240df53f1a20c659168e722b5d9f16cc7f4 ("mm: memcg: make sure
memory.events is uptodate when waking pollers")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
"

Where the performance change is due to layout change of
'struct mem_cgroup':
http://lkml.kernel.org/r/[email protected]

2018-05-29 07:58:54

by Michal Hocko

Subject: Re: [LKP] [lkp-robot] [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement

On Tue 29-05-18 03:15:51, Lu, Aaron wrote:
> On Mon, 2018-05-28 at 14:03 +0200, Michal Hocko wrote:
> > On Mon 28-05-18 19:40:19, kernel test robot wrote:
> > >
> > > Greeting,
> > >
> > > FYI, we noticed a +23.0% improvement of vm-scalability.throughput due to commit:
> > >
> > >
> > > commit: 309fe96bfc0ae387f53612927a8f0dc3eb056efd ("mm, memcontrol: implement memory.swap.events")
> > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >
> > This doesn't make any sense to me. The patch merely adds an accounting.
> > It doesn't optimize anything. So I strongly suspect the result is just
> > misleading or the test (environment) misconfigured. Not the first time
> > I am seeing something like that I am afraid.
> >
>
> Most likely the same situation as:
> "
> FYI, we noticed a -27.2% regression of will-it-scale.per_process_ops
> due to commit:
>
>
> commit: e27be240df53f1a20c659168e722b5d9f16cc7f4 ("mm: memcg: make sure
> memory.events is uptodate when waking pollers")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> "
>
> Where the performance change is due to layout change of
> 'struct mem_cgroup':
> http://lkml.kernel.org/r/[email protected]

I do not follow. How can _this_ patch lead to an improvement when it
actually _adds_ an accounting? The other report you are mentioning is a
_regression_ and I can imagine that the layout changes can lead to that
result.
--
Michal Hocko
SUSE Labs

2018-05-29 08:13:37

by Aaron Lu

Subject: Re: [LKP] [lkp-robot] [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement

On Tue, May 29, 2018 at 09:58:00AM +0200, Michal Hocko wrote:
> On Tue 29-05-18 03:15:51, Lu, Aaron wrote:
> > On Mon, 2018-05-28 at 14:03 +0200, Michal Hocko wrote:
> > > On Mon 28-05-18 19:40:19, kernel test robot wrote:
> > > >
> > > > Greeting,
> > > >
> > > > FYI, we noticed a +23.0% improvement of vm-scalability.throughput due to commit:
> > > >
> > > >
> > > > commit: 309fe96bfc0ae387f53612927a8f0dc3eb056efd ("mm, memcontrol: implement memory.swap.events")
> > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > >
> > > This doesn't make any sense to me. The patch merely adds an accounting.
> > > It doesn't optimize anything. So I strongly suspect the result is just
> > > misleading or the test (environment) misconfigured. Not the first time
> > > I am seeing something like that I am afraid.
> > >
> >
> > Most likely the same situation as:
> > "
> > FYI, we noticed a -27.2% regression of will-it-scale.per_process_ops
> > due to commit:
> >
> >
> > commit: e27be240df53f1a20c659168e722b5d9f16cc7f4 ("mm: memcg: make sure
> > memory.events is uptodate when waking pollers")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > "
> >
> > Where the performance change is due to layout change of
> > 'struct mem_cgroup':
> > http://lkml.kernel.org/r/[email protected]
>
> I do not follow. How can _this_ patch lead to an improvement when it
> actually _adds_ an accounting? The other report you are mentioning is a

This patch also changed the layout of 'struct mem_cgroup':

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d99b71bc2c66..517096c3cc99 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -208,6 +210,9 @@ struct mem_cgroup {
atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
struct cgroup_file events_file;

+ /* handle for "memory.swap.events" */
+ struct cgroup_file swap_events_file;
+
/* protect arrays of thresholds */
struct mutex thresholds_lock;

And I'm guessing that might be the cause.

> _regression_ and I can imagine that the layout changes can lead to that
> result.
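[Note: to see the effect Aaron is describing, the key point is that inserting a field shifts the offset of everything declared after it, which can move hot and cold members onto or off of the same 64-byte cache line. Below is a minimal user-space sketch with a hypothetical, cut-down stand-in for struct mem_cgroup, not the real definition; on an actual kernel build, pahole on vmlinux shows the true layout.]

/*
 * Minimal sketch only: "struct fake_mem_cgroup" is a hypothetical stand-in,
 * not the kernel's struct mem_cgroup.  Inserting swap_events_file shifts
 * every later field; here the imaginary hot counter moves from byte 56
 * (cache line 0) to byte 64 (cache line 1) on an LP64 machine.
 */
#include <stdio.h>
#include <stddef.h>

struct cgroup_file { void *kn; };	/* placeholder */

struct fake_mem_cgroup {
	long			memory_events[6];
	struct cgroup_file	events_file;
#ifdef WITH_SWAP_EVENTS_FILE
	struct cgroup_file	swap_events_file;	/* the newly added field */
#endif
	long			some_hot_counter;	/* imagine a frequently written field */
};

int main(void)
{
	printf("some_hot_counter: offset %zu, cache line %zu\n",
	       offsetof(struct fake_mem_cgroup, some_hot_counter),
	       offsetof(struct fake_mem_cgroup, some_hot_counter) / 64);
	return 0;
}

[Compiled once with and once without -DWITH_SWAP_EVENTS_FILE, the hypothetical counter lands on a different cache line; in the real struct the question is which fields end up sharing a line with frequently-dirtied ones.]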

2018-05-29 08:30:12

by Michal Hocko

Subject: Re: [LKP] [lkp-robot] [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement

On Tue 29-05-18 16:11:27, Aaron Lu wrote:
> On Tue, May 29, 2018 at 09:58:00AM +0200, Michal Hocko wrote:
> > On Tue 29-05-18 03:15:51, Lu, Aaron wrote:
> > > On Mon, 2018-05-28 at 14:03 +0200, Michal Hocko wrote:
> > > > On Mon 28-05-18 19:40:19, kernel test robot wrote:
> > > > >
> > > > > Greeting,
> > > > >
> > > > > FYI, we noticed a +23.0% improvement of vm-scalability.throughput due to commit:
> > > > >
> > > > >
> > > > > commit: 309fe96bfc0ae387f53612927a8f0dc3eb056efd ("mm, memcontrol: implement memory.swap.events")
> > > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > > >
> > > > This doesn't make any sense to me. The patch merely adds an accounting.
> > > > It doesn't optimize anything. So I strongly suspect the result is just
> > > > misleading or the test (environment) misconfigured. Not the first time
> > > > I am seeing something like that I am afraid.
> > > >
> > >
> > > Most likely the same situation as:
> > > "
> > > FYI, we noticed a -27.2% regression of will-it-scale.per_process_ops
> > > due to commit:
> > >
> > >
> > > commit: e27be240df53f1a20c659168e722b5d9f16cc7f4 ("mm: memcg: make sure
> > > memory.events is uptodate when waking pollers")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > "
> > >
> > > Where the performance change is due to layout change of
> > > 'struct mem_cgroup':
> > > http://lkml.kernel.org/r/[email protected]
> >
> > I do not follow. How can _this_ patch lead to an improvement when it
> > actually _adds_ an accounting? The other report you are mentioning is a
>
> This patch also changed the layout of 'struct mem_cgroup':
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index d99b71bc2c66..517096c3cc99 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -208,6 +210,9 @@ struct mem_cgroup {
> atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
> struct cgroup_file events_file;
>
> + /* handle for "memory.swap.events" */
> + struct cgroup_file swap_events_file;
> +
> /* protect arrays of thresholds */
> struct mutex thresholds_lock;
>
> And I'm guessing that might be the cause.

Ohh, you are right! Sorry, I've missed that part.

--
Michal Hocko
SUSE Labs

2018-05-29 09:02:00

by Aaron Lu

Subject: Re: [LKP] [lkp-robot] [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement

On Tue, May 29, 2018 at 10:27:51AM +0200, Michal Hocko wrote:
> On Tue 29-05-18 16:11:27, Aaron Lu wrote:
> > On Tue, May 29, 2018 at 09:58:00AM +0200, Michal Hocko wrote:
> > > On Tue 29-05-18 03:15:51, Lu, Aaron wrote:
> > > > On Mon, 2018-05-28 at 14:03 +0200, Michal Hocko wrote:
> > > > > On Mon 28-05-18 19:40:19, kernel test robot wrote:
> > > > > >
> > > > > > Greeting,
> > > > > >
> > > > > > FYI, we noticed a +23.0% improvement of vm-scalability.throughput due to commit:
> > > > > >
> > > > > >
> > > > > > commit: 309fe96bfc0ae387f53612927a8f0dc3eb056efd ("mm, memcontrol: implement memory.swap.events")
> > > > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > > > >
> > > > > This doesn't make any sense to me. The patch merely adds an accounting.
> > > > > It doesn't optimize anything. So I strongly suspect the result is just
> > > > > misleading or the test (environment) misconfigured. Not the first time
> > > > > I am seeing something like that I am afraid.
> > > > >
> > > >
> > > > Most likely the same situation as:
> > > > "
> > > > FYI, we noticed a -27.2% regression of will-it-scale.per_process_ops
> > > > due to commit:
> > > >
> > > >
> > > > commit: e27be240df53f1a20c659168e722b5d9f16cc7f4 ("mm: memcg: make sure
> > > > memory.events is uptodate when waking pollers")
> > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > > "
> > > >
> > > > Where the performance change is due to layout change of
> > > > 'struct mem_cgroup':
> > > > http://lkml.kernel.org/r/[email protected]
> > >
> > > I do not follow. How can _this_ patch lead to an improvement when it
> > > actually _adds_ an accounting? The other report you are mentioning is a
> >
> > This patch also changed the layout of 'struct mem_cgroup':
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index d99b71bc2c66..517096c3cc99 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -208,6 +210,9 @@ struct mem_cgroup {
> > atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
> > struct cgroup_file events_file;
> >
> > + /* handle for "memory.swap.events" */
> > + struct cgroup_file swap_events_file;
> > +
> > /* protect arrays of thresholds */
> > struct mutex thresholds_lock;
> >
> > And I'm guessing that might be the cause.
>
> Ohh, you are right! Sorry, I've missed that part.

Never mind, I want to thank you for taking a look at these reports :-)

I just tried to move this newly added field to the bottom of the
structure (just above 'struct mem_cgroup_per_node *nodeinfo[0];'), and
performance dropped to 82665166, still much better than base but already
worse than with this patch.

As you said in another email, this is really fragile.
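[Note: the follow-up patch Aaron mentions later in the thread ("mem_cgroup: make sure moving_account, move_lock_task and stat_cpu in the same cacheline") targets exactly this fragility. A rough, illustrative kernel-style sketch of the general technique, not the actual patch, is below: hot fields are grouped and started on their own cache line so unrelated insertions elsewhere in the struct stop moving them around.]

/*
 * Illustrative sketch, not the actual patch: group the fields that are
 * written on the hot path and force the group onto its own cache line
 * with ____cacheline_aligned_in_smp (from <linux/cache.h>), so that
 * adding or removing unrelated fields above no longer shifts them.
 */
struct mem_cgroup_sketch {			/* hypothetical stand-in */
	/* ... cold, setup-time fields ... */

	/* hot on the charge/fault path, kept together on one cache line */
	spinlock_t		move_lock ____cacheline_aligned_in_smp;
	bool			moving_account;
	struct task_struct	*move_lock_task;
	struct mem_cgroup_stat_cpu __percpu *stat_cpu;

	/* ... remaining fields ... */
};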

2018-06-01 07:26:58

by Aaron Lu

Subject: Re: [LKP] [lkp-robot] [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement

On Mon, May 28, 2018 at 07:40:19PM +0800, kernel test robot wrote:
>
> Greeting,
>
> FYI, we noticed a +23.0% improvement of vm-scalability.throughput due to commit:
>
>
> commit: 309fe96bfc0ae387f53612927a8f0dc3eb056efd ("mm, memcontrol: implement memory.swap.events")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> in testcase: vm-scalability
> on test machine: 144 threads Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory
> with following parameters:
>
> runtime: 300s
> size: 1T
> test: lru-shm
> cpufreq_governor: performance
>
> test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
> test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
>

With the patch I just sent out:
"mem_cgroup: make sure moving_account, move_lock_task and stat_cpu in the
same cacheline"

Applying this commit on top doesn't yield the 23% improvement any more, but
a 6% performance drop...

I found the culprit to be the following line introduced in this commit:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d90b0201a8c4..07ab974c0a49 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6019,13 +6019,17 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
if (!memcg)
return 0;

- if (!entry.val)
+ if (!entry.val) {
+ memcg_memory_event(memcg, MEMCG_SWAP_FAIL);
return 0;
+ }

memcg = mem_cgroup_id_get_online(memcg);

If I remove that memcg_memory_event() call, performance is restored.

It's beyond my understanding why this code path matters, since there is
no swap device set up on the test machine, so I don't see how
get_swap_page() could ever be called.

Still investigating...
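[Note: for reference, memcg_memory_event() itself is a tiny helper; roughly the following, paraphrased from memory for that kernel version, so the exact definition may differ slightly. Functionally it is just an atomic counter bump plus a poll notification, which is why a measurable cost on a path that never runs is so surprising and makes a code-placement/alignment effect a plausible suspect.]

/*
 * Sketch of the helper as of that era (include/linux/memcontrol.h);
 * paraphrased, may not match the exact source.
 */
static inline void memcg_memory_event(struct mem_cgroup *memcg,
				      enum memcg_memory_event event)
{
	atomic_long_inc(&memcg->memory_events[event]);
	cgroup_file_notify(&memcg->events_file);
}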

2018-06-06 08:51:34

by Aaron Lu

Subject: Re: [LKP] [lkp-robot] [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement

On Fri, Jun 01, 2018 at 03:26:04PM +0800, Aaron Lu wrote:
> On Mon, May 28, 2018 at 07:40:19PM +0800, kernel test robot wrote:
> >
> > Greeting,
> >
> > FYI, we noticed a +23.0% improvement of vm-scalability.throughput due to commit:
> >
> >
> > commit: 309fe96bfc0ae387f53612927a8f0dc3eb056efd ("mm, memcontrol: implement memory.swap.events")
> > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >
> > in testcase: vm-scalability
> > on test machine: 144 threads Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory
> > with following parameters:
> >
> > runtime: 300s
> > size: 1T
> > test: lru-shm
> > cpufreq_governor: performance
> >
> > test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
> > test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
> >
>
> With the patch I just sent out:
> "mem_cgroup: make sure moving_account, move_lock_task and stat_cpu in the
> same cacheline"
>
> Applying this commit on top doesn't yield the 23% improvement any more, but
> a 6% performance drop...
> I found the culprit to be the following line introduced in this commit:
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d90b0201a8c4..07ab974c0a49 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6019,13 +6019,17 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
> if (!memcg)
> return 0;
>
> - if (!entry.val)
> + if (!entry.val) {
> + memcg_memory_event(memcg, MEMCG_SWAP_FAIL);

Removing this line restored performance, but it really doesn't make any
sense. Ying suggested it might be code-alignment related and suggested
using a different compiler than gcc-7.2. I then used gcc-6.4, and it turned
out the test results were pretty much the same for the two commits:

(each test has been run 3 times)
$ grep throughput base/*/stats.json
base/0/stats.json: "vm-scalability.throughput": 89207489,
base/1/stats.json: "vm-scalability.throughput": 89982933,
base/2/stats.json: "vm-scalability.throughput": 90436592,

$ grep throughput head/*/stats.json
head/0/stats.json: "vm-scalability.throughput": 90882775,
head/1/stats.json: "vm-scalability.throughput": 90675220,
head/2/stats.json: "vm-scalability.throughput": 91173479,

So it's probably really related to code alignment, and this bisected
commit doesn't cause a performance change (as expected).

> return 0;
> + }
>
> memcg = mem_cgroup_id_get_online(memcg);
>
> If I remove that memcg_memory_event() call, performance is restored.
>
> It's beyond my understanding why this code path matters, since there is
> no swap device set up on the test machine, so I don't see how
> get_swap_page() could ever be called.
>
> Still investigating...
>