2018-05-28 11:44:44

by kernel test robot

Subject: [lkp-robot] [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement


Greeting,

FYI, we noticed a +23.0% improvement of vm-scalability.throughput due to commit:


commit: 309fe96bfc0ae387f53612927a8f0dc3eb056efd ("mm, memcontrol: implement memory.swap.events")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: vm-scalability
on test machine: 144 threads Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory
with following parameters:

runtime: 300s
size: 1T
test: lru-shm
cpufreq_governor: performance

test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/



Details are as below:
-------------------------------------------------------------------------------------------------->
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
gcc-7/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/300s/1T/lkp-hsx04/lru-shm/vm-scalability

commit:
ccc2f49f99 ("mm, memcontrol: move swap charge handling into get_swap_page()")
309fe96bfc ("mm, memcontrol: implement memory.swap.events")

ccc2f49f991f17cd 309fe96bfc0ae387f53612927a
---------------- --------------------------
%stddev %change %stddev
\ | \
71207426 +23.0% 87612470 vm-scalability.throughput
0.32 ? 8% -80.2% 0.06 ? 2% vm-scalability.free_time
499213 +23.4% 616000 vm-scalability.median
0.01 ? 9% -43.6% 0.00 ? 22% vm-scalability.median_stddev
71207426 +23.0% 87612470 vm-scalability.throughput
305.83 +3.5% 316.49 vm-scalability.time.elapsed_time
305.83 +3.5% 316.49 vm-scalability.time.elapsed_time.max
7.933e+08 +8.3% 8.594e+08 vm-scalability.time.minor_page_faults
2610 -18.8% 2119 vm-scalability.time.percent_of_cpu_this_job_got
5076 -20.1% 4056 vm-scalability.time.system_time
2910 -8.9% 2651 vm-scalability.time.user_time
24540 +8.2% 26563 vm-scalability.time.voluntary_context_switches
3.566e+09 +8.3% 3.863e+09 vm-scalability.workload
4435819 ? 2% +13.1% 5015715 ? 4% cpuidle.C1E.time
58453 +12.6% 65828 ? 13% cpuidle.POLL.time
630.29 -1.9% 618.41 pmeter.Average_Active_Power
112976 +25.4% 141673 pmeter.performance_per_watt
26.00 -19.2% 21.00 vmstat.procs.r
147037 -1.2% 145251 vmstat.system.in
0.00 ?173% +0.0 0.00 ?124% mpstat.cpu.iowait%
11.66 -2.6 9.02 mpstat.cpu.sys%
6.66 -0.8 5.86 mpstat.cpu.usr%
113669 -12.8% 99110 meminfo.Active
112018 -13.0% 97459 meminfo.Active(anon)
23274932 -21.5% 18277464 meminfo.Mapped
51042 -22.7% 39479 meminfo.PageTables
5675691 -22.6% 4394275 ? 2% numa-meminfo.node0.Mapped
12906 ? 9% -24.0% 9808 ? 11% numa-meminfo.node0.PageTables
5564225 ? 2% -20.1% 4445143 numa-meminfo.node1.Mapped
12478 ? 6% -23.3% 9573 ? 11% numa-meminfo.node1.PageTables
5605568 ? 2% -20.3% 4467557 ? 2% numa-meminfo.node2.Mapped
12399 ? 8% -23.0% 9545 ? 8% numa-meminfo.node2.PageTables
5747984 ? 3% -19.3% 4638538 ? 3% numa-meminfo.node3.Mapped
11853 ? 6% -16.8% 9867 ? 10% numa-meminfo.node3.PageTables
40006 ? 2% +32.6% 53040 ? 23% numa-meminfo.node3.SUnreclaim
1394386 -21.2% 1099228 ? 2% numa-vmstat.node0.nr_mapped
3220 ? 9% -23.7% 2457 ? 9% numa-vmstat.node0.nr_page_table_pages
1385184 ? 2% -19.8% 1111569 ? 2% numa-vmstat.node1.nr_mapped
3096 ? 6% -22.4% 2404 ? 11% numa-vmstat.node1.nr_page_table_pages
1392379 ? 2% -18.4% 1135757 ? 2% numa-vmstat.node2.nr_mapped
3056 ? 7% -20.7% 2422 ? 7% numa-vmstat.node2.nr_page_table_pages
1477487 ? 2% -18.7% 1201163 ? 3% numa-vmstat.node3.nr_mapped
3074 ? 4% -17.2% 2546 ? 11% numa-vmstat.node3.nr_page_table_pages
10001 ? 2% +32.6% 13259 ? 23% numa-vmstat.node3.nr_slab_unreclaimable
4316 ? 19% -26.2% 3183 ? 3% syscalls.sys_mmap.med
66119 ? 25% -41.3% 38816 ? 10% syscalls.sys_newfstat.max
71070823 ? 55% -6.4e+07 6980408 ? 20% syscalls.sys_newfstat.noise.100%
86378359 ? 45% -6.5e+07 20983557 ? 6% syscalls.sys_newfstat.noise.2%
83896012 ? 47% -6.5e+07 18902607 ? 7% syscalls.sys_newfstat.noise.25%
86279365 ? 46% -6.5e+07 20864391 ? 6% syscalls.sys_newfstat.noise.5%
79533721 ? 49% -6.4e+07 15258271 ? 8% syscalls.sys_newfstat.noise.50%
74988875 ? 52% -6.4e+07 11205147 ? 13% syscalls.sys_newfstat.noise.75%
2034 ? 9% -16.9% 1690 ? 4% syscalls.sys_read.med
1598 ? 6% -10.5% 1431 ? 3% syscalls.sys_write.med
5.102e+12 +9.0% 5.559e+12 perf-stat.branch-instructions
1.37 -21.4% 1.08 perf-stat.cpi
2.479e+13 -14.0% 2.132e+13 perf-stat.cpu-cycles
20771 +2.6% 21302 perf-stat.cpu-migrations
4.59e+12 ? 2% +9.2% 5.014e+12 perf-stat.dTLB-loads
1.483e+12 ? 4% +10.6% 1.639e+12 perf-stat.dTLB-stores
2.527e+09 +7.3% 2.712e+09 perf-stat.iTLB-load-misses
1.804e+13 ? 2% +9.3% 1.972e+13 perf-stat.instructions
0.73 +27.1% 0.93 perf-stat.ipc
7.943e+08 +8.3% 8.605e+08 perf-stat.minor-faults
2.416e+09 +4.3% 2.519e+09 perf-stat.node-stores
7.943e+08 +8.3% 8.605e+08 perf-stat.page-faults
27996 -13.0% 24359 proc-vmstat.nr_active_anon
237.75 +6.2% 252.50 proc-vmstat.nr_dirtied
33832161 -1.0% 33504100 proc-vmstat.nr_file_pages
33515106 -1.0% 33189801 proc-vmstat.nr_inactive_anon
485.50 +0.8% 489.25 proc-vmstat.nr_inactive_file
23158 -1.0% 22915 proc-vmstat.nr_kernel_stack
5811543 -24.2% 4407834 proc-vmstat.nr_mapped
12781 -25.1% 9571 proc-vmstat.nr_page_table_pages
33521883 -1.0% 33192863 proc-vmstat.nr_shmem
222.00 +13.1% 251.00 proc-vmstat.nr_written
28001 -13.0% 24362 proc-vmstat.nr_zone_active_anon
33515101 -1.0% 33189795 proc-vmstat.nr_zone_inactive_anon
485.50 +0.8% 489.25 proc-vmstat.nr_zone_inactive_file
7.959e+08 +8.3% 8.621e+08 proc-vmstat.numa_hit
7.958e+08 +8.3% 8.621e+08 proc-vmstat.numa_local
11401 ? 8% -69.4% 3491 ? 23% proc-vmstat.pgactivate
7.969e+08 +8.4% 8.635e+08 proc-vmstat.pgalloc_normal
7.944e+08 +8.3% 8.605e+08 proc-vmstat.pgfault
7.964e+08 +8.4% 8.63e+08 proc-vmstat.pgfree
76.68 ?173% -100.0% 0.00 ? 10% sched_debug.cfs_rq:/.MIN_vruntime.stddev
29153 -18.2% 23841 sched_debug.cfs_rq:/.exec_clock.avg
48865 ? 7% -14.6% 41739 ? 7% sched_debug.cfs_rq:/.exec_clock.max
26558 -18.4% 21662 sched_debug.cfs_rq:/.exec_clock.min
76.68 ?173% -100.0% 0.00 ? 10% sched_debug.cfs_rq:/.max_vruntime.stddev
4166046 -19.1% 3372283 sched_debug.cfs_rq:/.min_vruntime.avg
4360622 -19.2% 3524394 sched_debug.cfs_rq:/.min_vruntime.max
3816276 -17.3% 3154309 sched_debug.cfs_rq:/.min_vruntime.min
105670 ? 15% -32.9% 70895 ? 16% sched_debug.cfs_rq:/.min_vruntime.stddev
-361713 -46.2% -194567 sched_debug.cfs_rq:/.spread0.min
105504 ? 15% -32.8% 70895 ? 16% sched_debug.cfs_rq:/.spread0.stddev
309.71 ? 13% -22.3% 240.75 ? 21% sched_debug.cfs_rq:/.util_est_enqueued.max
6.42 ? 19% -37.9% 3.99 ? 5% sched_debug.cpu.clock.stddev
6.42 ? 19% -37.9% 3.98 ? 5% sched_debug.cpu.clock_task.stddev
5.91 ? 7% -14.8% 5.04 ? 6% sched_debug.cpu.cpu_load[4].avg
355621 ? 22% -28.6% 253956 ? 5% sched_debug.cpu.nr_switches.max
40018 ? 12% -30.5% 27804 ? 16% sched_debug.cpu.nr_switches.stddev
0.00 ? 19% +100.0% 0.01 ? 24% sched_debug.cpu.nr_uninterruptible.avg
364939 ? 24% -26.2% 269378 ? 6% sched_debug.cpu.sched_count.max
41878 ? 12% -27.3% 30433 ? 13% sched_debug.cpu.sched_count.stddev
179801 ? 22% -33.2% 120078 ? 3% sched_debug.cpu.ttwu_count.max
20153 ? 12% -32.8% 13538 ? 19% sched_debug.cpu.ttwu_count.stddev
174157 ? 23% -33.1% 116564 ? 2% sched_debug.cpu.ttwu_local.max
19436 ? 12% -34.2% 12782 ? 20% sched_debug.cpu.ttwu_local.stddev
66.14 -66.1 0.00 perf-profile.calltrace.cycles-pp.do_access
44.18 -44.2 0.00 perf-profile.calltrace.cycles-pp.page_fault.do_access
44.15 -44.1 0.00 perf-profile.calltrace.cycles-pp.do_page_fault.page_fault.do_access
44.13 -44.1 0.00 perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault.do_access
42.34 -42.3 0.00 perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault.do_access
17.02 -17.0 0.00 perf-profile.calltrace.cycles-pp.do_rw_once
6.85 ? 14% -6.9 0.00 perf-profile.calltrace.cycles-pp.__munmap
6.81 ? 14% -6.8 0.00 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
6.81 ? 14% -6.8 0.00 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
6.81 ? 14% -6.8 0.00 perf-profile.calltrace.cycles-pp.vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
6.81 ? 14% -6.8 0.00 perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
8.94 -6.4 2.52 ?173% perf-profile.calltrace.cycles-pp.clear_page_erms.shmem_getpage_gfp.shmem_fault.__do_fault.__handle_mm_fault
5.83 -5.8 0.00 perf-profile.calltrace.cycles-pp.native_irq_return_iret.do_access
6.80 ? 14% -5.6 1.25 ?145% perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_munmap.vm_munmap
6.80 ? 14% -5.6 1.25 ?145% perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_munmap.vm_munmap.__x64_sys_munmap
6.81 ? 14% -5.6 1.26 ?144% perf-profile.calltrace.cycles-pp.do_munmap.vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
6.81 ? 14% -5.6 1.26 ?144% perf-profile.calltrace.cycles-pp.unmap_region.do_munmap.vm_munmap.__x64_sys_munmap.do_syscall_64
8.00 -5.2 2.79 ?173% perf-profile.calltrace.cycles-pp.filemap_map_pages.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
5.34 -4.8 0.57 ?173% perf-profile.calltrace.cycles-pp.finish_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
5.30 -4.7 0.56 ?173% perf-profile.calltrace.cycles-pp.alloc_set_pte.finish_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault
0.00 +1.1 1.13 ? 91% perf-profile.calltrace.cycles-pp.native_irq_return_iret
0.00 +1.2 1.18 ? 31% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt
0.00 +1.2 1.18 ? 32% perf-profile.calltrace.cycles-pp.get_next_timer_interrupt.tick_nohz_next_event.tick_nohz_get_sleep_length.menu_select.do_idle
0.00 +1.3 1.27 ? 31% perf-profile.calltrace.cycles-pp.load_balance.rebalance_domains.__softirqentry_text_start.irq_exit.smp_apic_timer_interrupt
0.00 +1.3 1.33 ? 33% perf-profile.calltrace.cycles-pp.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
0.00 +1.8 1.78 ? 32% perf-profile.calltrace.cycles-pp.rebalance_domains.__softirqentry_text_start.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt
0.00 +1.9 1.88 ? 33% perf-profile.calltrace.cycles-pp.tick_nohz_next_event.tick_nohz_get_sleep_length.menu_select.do_idle.cpu_startup_entry
0.00 +2.3 2.27 ? 33% perf-profile.calltrace.cycles-pp.tick_nohz_get_sleep_length.menu_select.do_idle.cpu_startup_entry.start_secondary
0.00 +2.6 2.63 ? 32% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state
0.23 ?173% +3.1 3.36 ? 64% perf-profile.calltrace.cycles-pp.__vfs_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.23 ?173% +3.2 3.38 ? 63% perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.23 ?173% +3.2 3.39 ? 63% perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.23 ?173% +3.2 3.41 ? 63% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
0.23 ?173% +3.2 3.42 ? 63% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
0.00 +3.2 3.21 ? 31% perf-profile.calltrace.cycles-pp.__softirqentry_text_start.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state
0.23 ?173% +3.3 3.52 ? 61% perf-profile.calltrace.cycles-pp.write
0.00 +3.5 3.51 ? 32% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle
0.00 +3.7 3.74 ? 30% perf-profile.calltrace.cycles-pp.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle
0.00 +4.3 4.35 ? 33% perf-profile.calltrace.cycles-pp.menu_select.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
0.87 ? 3% +7.6 8.51 ? 31% perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry
0.87 ? 3% +7.7 8.58 ? 31% perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
7.11 +46.8 53.92 ? 30% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
8.11 +55.7 63.77 ? 30% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
8.72 +61.2 69.92 ? 30% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
8.72 +61.2 69.92 ? 30% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
8.72 +61.2 69.92 ? 30% perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
8.73 +61.4 70.12 ? 30% perf-profile.calltrace.cycles-pp.secondary_startup_64
66.14 -66.1 0.00 perf-profile.children.cycles-pp.do_access
17.02 -17.0 0.00 perf-profile.children.cycles-pp.do_rw_once
6.85 ? 14% -6.9 0.00 perf-profile.children.cycles-pp.__munmap
6.81 ? 14% -5.5 1.30 ?137% perf-profile.children.cycles-pp.do_munmap
6.81 ? 14% -5.5 1.31 ?136% perf-profile.children.cycles-pp.unmap_vmas
6.81 ? 14% -5.5 1.31 ?136% perf-profile.children.cycles-pp.unmap_page_range
6.81 ? 14% -5.5 1.30 ?137% perf-profile.children.cycles-pp.vm_munmap
6.81 ? 14% -5.5 1.30 ?137% perf-profile.children.cycles-pp.unmap_region
6.81 ? 14% -5.5 1.30 ?137% perf-profile.children.cycles-pp.__x64_sys_munmap
5.88 -5.1 0.75 ?173% perf-profile.children.cycles-pp.alloc_set_pte
5.35 -4.8 0.57 ?173% perf-profile.children.cycles-pp.finish_fault
5.89 -4.8 1.13 ? 90% perf-profile.children.cycles-pp.native_irq_return_iret
4.50 ? 13% -3.9 0.63 ?155% perf-profile.children.cycles-pp.page_remove_rmap
2.89 ? 8% -2.2 0.72 ?167% perf-profile.children.cycles-pp.shmem_alloc_page
2.86 ? 8% -2.1 0.71 ?167% perf-profile.children.cycles-pp.alloc_pages_vma
2.73 ? 9% -2.1 0.67 ?165% perf-profile.children.cycles-pp.__alloc_pages_nodemask
2.48 ? 10% -1.9 0.59 ?165% perf-profile.children.cycles-pp.get_page_from_freelist
1.57 ? 16% -1.2 0.34 ?164% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
1.25 ? 20% -0.8 0.40 ? 48% perf-profile.children.cycles-pp._raw_spin_lock
0.00 +0.1 0.08 ? 21% perf-profile.children.cycles-pp.ret_from_intr
0.00 +0.1 0.08 ? 24% perf-profile.children.cycles-pp.update_rq_clock
0.00 +0.1 0.09 ? 26% perf-profile.children.cycles-pp.update_group_capacity
0.00 +0.1 0.09 ? 26% perf-profile.children.cycles-pp.intel_pmu_disable_all
0.00 +0.1 0.09 ? 28% perf-profile.children.cycles-pp.perf_event_task_tick
0.11 ? 6% +0.1 0.22 ? 15% perf-profile.children.cycles-pp.__indirect_thunk_start
0.00 +0.1 0.11 ? 34% perf-profile.children.cycles-pp.cpu_load_update
0.00 +0.1 0.12 ? 25% perf-profile.children.cycles-pp.run_posix_cpu_timers
0.00 +0.1 0.12 ? 33% perf-profile.children.cycles-pp.rb_next
0.00 +0.1 0.12 ? 19% perf-profile.children.cycles-pp.interrupt_entry
0.01 ?173% +0.1 0.13 ? 21% perf-profile.children.cycles-pp.ktime_get_update_offsets_now
0.00 +0.1 0.12 ? 36% perf-profile.children.cycles-pp.rcu_eqs_exit
0.00 +0.1 0.14 ? 39% perf-profile.children.cycles-pp.nr_iowait_cpu
0.00 +0.1 0.14 ? 38% perf-profile.children.cycles-pp.rcu_dynticks_eqs_exit
0.00 +0.1 0.14 ? 38% perf-profile.children.cycles-pp.leave_mm
0.00 +0.1 0.14 ? 26% perf-profile.children.cycles-pp.__intel_pmu_enable_all
0.00 +0.1 0.14 ? 36% perf-profile.children.cycles-pp.irq_work_needs_cpu
0.00 +0.1 0.14 ? 30% perf-profile.children.cycles-pp.idle_cpu
0.00 +0.1 0.14 ? 30% perf-profile.children.cycles-pp.call_cpuidle
0.00 +0.2 0.15 ? 35% perf-profile.children.cycles-pp.rcu_irq_exit
0.00 +0.2 0.16 ? 40% perf-profile.children.cycles-pp.rcu_needs_cpu
0.00 +0.2 0.16 ? 31% perf-profile.children.cycles-pp.get_cpu_device
0.00 +0.2 0.16 ? 28% perf-profile.children.cycles-pp.tick_check_oneshot_broadcast_this_cpu
0.00 +0.2 0.16 ? 38% perf-profile.children.cycles-pp.native_apic_mem_write
0.00 +0.2 0.16 ? 36% perf-profile.children.cycles-pp.find_next_and_bit
0.00 +0.2 0.17 ? 43% perf-profile.children.cycles-pp.timekeeping_max_deferment
0.00 +0.2 0.18 ? 34% perf-profile.children.cycles-pp.cpumask_next_and
0.00 +0.2 0.19 ? 37% perf-profile.children.cycles-pp.timerqueue_add
0.00 +0.2 0.19 ? 38% perf-profile.children.cycles-pp.enqueue_hrtimer
0.00 +0.2 0.20 ? 34% perf-profile.children.cycles-pp.update_ts_time_stats
0.00 +0.2 0.20 ? 30% perf-profile.children.cycles-pp.rcu_idle_exit
0.04 ? 58% +0.2 0.25 ? 28% perf-profile.children.cycles-pp.irq_work_run_list
0.00 +0.2 0.21 ? 28% perf-profile.children.cycles-pp.tick_nohz_irq_exit
0.03 ?100% +0.2 0.24 ? 28% perf-profile.children.cycles-pp.irq_work_interrupt
0.03 ?100% +0.2 0.24 ? 28% perf-profile.children.cycles-pp.smp_irq_work_interrupt
0.03 ?100% +0.2 0.24 ? 28% perf-profile.children.cycles-pp.irq_work_run
0.03 ?100% +0.2 0.24 ? 28% perf-profile.children.cycles-pp.printk
0.00 +0.2 0.22 ? 36% perf-profile.children.cycles-pp.arch_cpu_idle_enter
0.00 +0.2 0.23 ? 39% perf-profile.children.cycles-pp.tsc_verify_tsc_adjust
0.00 +0.2 0.24 ? 35% perf-profile.children.cycles-pp.hrtimer_get_next_event
0.00 +0.2 0.24 ? 33% perf-profile.children.cycles-pp._raw_spin_trylock
0.01 ?173% +0.2 0.25 ? 45% perf-profile.children.cycles-pp.rcu_process_callbacks
0.00 +0.3 0.26 ? 34% perf-profile.children.cycles-pp.pm_qos_read_value
0.01 ?173% +0.3 0.28 ? 26% perf-profile.children.cycles-pp.update_blocked_averages
0.00 +0.3 0.27 ? 30% perf-profile.children.cycles-pp.read_tsc
0.00 +0.3 0.28 ? 36% perf-profile.children.cycles-pp.timerqueue_del
0.00 +0.3 0.29 ? 27% perf-profile.children.cycles-pp.lapic_next_deadline
0.06 ? 7% +0.3 0.36 ? 28% perf-profile.children.cycles-pp.rcu_check_callbacks
0.03 ?173% +0.3 0.34 ? 70% perf-profile.children.cycles-pp.fbcon_putcs
0.01 ?173% +0.3 0.33 ? 33% perf-profile.children.cycles-pp.__remove_hrtimer
0.03 ?173% +0.3 0.34 ? 71% perf-profile.children.cycles-pp.bit_putcs
0.00 +0.3 0.32 ? 33% perf-profile.children.cycles-pp.__hrtimer_next_event_base
0.03 ?173% +0.3 0.35 ? 70% perf-profile.children.cycles-pp.fbcon_redraw
0.03 ?173% +0.3 0.35 ? 70% perf-profile.children.cycles-pp.lf
0.03 ?173% +0.3 0.35 ? 70% perf-profile.children.cycles-pp.con_scroll
0.03 ?173% +0.3 0.35 ? 70% perf-profile.children.cycles-pp.fbcon_scroll
0.03 ?100% +0.3 0.35 ? 26% perf-profile.children.cycles-pp.run_rebalance_domains
0.00 +0.3 0.33 ? 29% perf-profile.children.cycles-pp.native_sched_clock
0.03 ?173% +0.3 0.36 ? 71% perf-profile.children.cycles-pp.vt_console_print
0.00 +0.3 0.34 ? 34% perf-profile.children.cycles-pp.rcu_eqs_enter
0.06 ? 7% +0.3 0.40 ? 26% perf-profile.children.cycles-pp.native_write_msr
0.00 +0.3 0.35 ? 31% perf-profile.children.cycles-pp.hrtimer_next_event_without
0.00 +0.4 0.37 ? 29% perf-profile.children.cycles-pp.sched_clock
0.06 ? 11% +0.4 0.48 ? 26% perf-profile.children.cycles-pp.clockevents_program_event
0.17 ? 5% +0.4 0.59 ? 21% perf-profile.children.cycles-pp.scheduler_tick
0.05 ? 9% +0.4 0.49 ? 42% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.04 ? 57% +0.5 0.51 ? 29% perf-profile.children.cycles-pp.sched_clock_cpu
0.07 ? 10% +0.5 0.57 ? 34% perf-profile.children.cycles-pp.run_timer_softirq
0.06 +0.6 0.61 ? 30% perf-profile.children.cycles-pp.find_next_bit
0.11 ? 6% +0.6 0.68 ? 28% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.07 ? 7% +0.6 0.67 ? 31% perf-profile.children.cycles-pp.tick_irq_enter
0.09 ? 14% +0.7 0.75 ? 29% perf-profile.children.cycles-pp.ktime_get
0.09 ? 9% +0.8 0.85 ? 31% perf-profile.children.cycles-pp.irq_enter
0.09 ? 4% +0.8 0.86 ? 32% perf-profile.children.cycles-pp.find_busiest_group
0.09 ? 4% +0.8 0.90 ? 31% perf-profile.children.cycles-pp.__next_timer_interrupt
0.27 ? 5% +0.9 1.15 ? 25% perf-profile.children.cycles-pp.update_process_times
0.28 ? 4% +1.0 1.25 ? 25% perf-profile.children.cycles-pp.tick_sched_handle
0.30 ? 5% +1.1 1.43 ? 26% perf-profile.children.cycles-pp.tick_sched_timer
0.12 ? 3% +1.1 1.25 ? 32% perf-profile.children.cycles-pp.get_next_timer_interrupt
0.13 ? 5% +1.2 1.29 ? 31% perf-profile.children.cycles-pp.load_balance
0.07 ?173% +1.2 1.23 ? 66% perf-profile.children.cycles-pp.delay_tsc
0.14 ?173% +1.5 1.67 ? 62% perf-profile.children.cycles-pp.io_serial_in
0.18 ? 4% +1.6 1.81 ? 32% perf-profile.children.cycles-pp.rebalance_domains
0.19 ? 6% +1.8 2.00 ? 34% perf-profile.children.cycles-pp.tick_nohz_next_event
0.22 ? 7% +2.1 2.33 ? 33% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.50 ? 4% +2.3 2.77 ? 28% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.23 ?143% +2.5 2.77 ? 64% perf-profile.children.cycles-pp.serial8250_console_putchar
0.23 ?143% +2.6 2.83 ? 64% perf-profile.children.cycles-pp.uart_console_write
0.23 ?143% +2.7 2.91 ? 64% perf-profile.children.cycles-pp.wait_for_xmitr
0.24 ?144% +2.7 2.97 ? 64% perf-profile.children.cycles-pp.serial8250_console_write
0.38 ? 4% +2.9 3.33 ? 30% perf-profile.children.cycles-pp.__softirqentry_text_start
0.22 ?173% +3.0 3.20 ? 69% perf-profile.children.cycles-pp.devkmsg_write
0.22 ?173% +3.0 3.20 ? 69% perf-profile.children.cycles-pp.printk_emit
0.64 ? 4% +3.0 3.66 ? 28% perf-profile.children.cycles-pp.hrtimer_interrupt
0.27 ?147% +3.1 3.34 ? 64% perf-profile.children.cycles-pp.console_unlock
0.24 ?159% +3.1 3.37 ? 64% perf-profile.children.cycles-pp.__vfs_write
0.24 ?157% +3.1 3.39 ? 63% perf-profile.children.cycles-pp.vfs_write
0.24 ?157% +3.1 3.39 ? 63% perf-profile.children.cycles-pp.ksys_write
0.24 ?161% +3.2 3.44 ? 63% perf-profile.children.cycles-pp.vprintk_emit
0.25 ?153% +3.3 3.52 ? 61% perf-profile.children.cycles-pp.write
0.44 ? 4% +3.4 3.85 ? 29% perf-profile.children.cycles-pp.irq_exit
0.43 ? 5% +4.0 4.43 ? 33% perf-profile.children.cycles-pp.menu_select
1.22 ? 2% +7.5 8.74 ? 29% perf-profile.children.cycles-pp.smp_apic_timer_interrupt
1.23 ? 2% +7.6 8.81 ? 29% perf-profile.children.cycles-pp.apic_timer_interrupt
7.13 +47.0 54.12 ? 30% perf-profile.children.cycles-pp.intel_idle
8.18 +56.3 64.52 ? 30% perf-profile.children.cycles-pp.cpuidle_enter_state
8.72 +61.2 69.92 ? 30% perf-profile.children.cycles-pp.start_secondary
8.73 +61.4 70.12 ? 30% perf-profile.children.cycles-pp.secondary_startup_64
8.73 +61.4 70.12 ? 30% perf-profile.children.cycles-pp.cpu_startup_entry
8.74 +61.5 70.20 ? 30% perf-profile.children.cycles-pp.do_idle
16.94 -16.9 0.00 perf-profile.self.cycles-pp.do_rw_once
10.66 -10.7 0.00 perf-profile.self.cycles-pp.do_access
5.89 -4.8 1.13 ? 90% perf-profile.self.cycles-pp.native_irq_return_iret
3.75 ? 12% -3.2 0.54 ?154% perf-profile.self.cycles-pp.page_remove_rmap
1.57 ? 16% -1.2 0.34 ?164% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.00 +0.1 0.07 ? 17% perf-profile.self.cycles-pp.ret_from_intr
0.00 +0.1 0.08 ? 31% perf-profile.self.cycles-pp.rcu_idle_exit
0.00 +0.1 0.08 ? 26% perf-profile.self.cycles-pp.tick_irq_enter
0.00 +0.1 0.09 ? 28% perf-profile.self.cycles-pp.perf_event_task_tick
0.00 +0.1 0.11 ? 15% perf-profile.self.cycles-pp.ktime_get_update_offsets_now
0.11 ? 6% +0.1 0.22 ? 15% perf-profile.self.cycles-pp.__indirect_thunk_start
0.00 +0.1 0.11 ? 34% perf-profile.self.cycles-pp.scheduler_tick
0.00 +0.1 0.11 ? 34% perf-profile.self.cycles-pp.cpu_load_update
0.00 +0.1 0.11 ? 27% perf-profile.self.cycles-pp.__remove_hrtimer
0.00 +0.1 0.12 ? 25% perf-profile.self.cycles-pp.run_posix_cpu_timers
0.00 +0.1 0.12 ? 33% perf-profile.self.cycles-pp.rb_next
0.00 +0.1 0.12 ? 19% perf-profile.self.cycles-pp.interrupt_entry
0.00 +0.1 0.12 ? 38% perf-profile.self.cycles-pp.timerqueue_add
0.00 +0.1 0.13 ? 32% perf-profile.self.cycles-pp.sched_clock_cpu
0.00 +0.1 0.13 ? 35% perf-profile.self.cycles-pp.hrtimer_interrupt
0.00 +0.1 0.14 ? 39% perf-profile.self.cycles-pp.nr_iowait_cpu
0.00 +0.1 0.14 ? 21% perf-profile.self.cycles-pp.smp_apic_timer_interrupt
0.00 +0.1 0.14 ? 38% perf-profile.self.cycles-pp.rcu_dynticks_eqs_exit
0.00 +0.1 0.14 ? 38% perf-profile.self.cycles-pp.leave_mm
0.00 +0.1 0.14 ? 36% perf-profile.self.cycles-pp.irq_work_needs_cpu
0.00 +0.1 0.14 ? 30% perf-profile.self.cycles-pp.idle_cpu
0.00 +0.1 0.14 ? 30% perf-profile.self.cycles-pp.call_cpuidle
0.00 +0.2 0.16 ? 40% perf-profile.self.cycles-pp.rcu_needs_cpu
0.00 +0.2 0.16 ? 31% perf-profile.self.cycles-pp.get_cpu_device
0.00 +0.2 0.16 ? 28% perf-profile.self.cycles-pp.tick_check_oneshot_broadcast_this_cpu
0.00 +0.2 0.16 ? 38% perf-profile.self.cycles-pp.native_apic_mem_write
0.00 +0.2 0.16 ? 36% perf-profile.self.cycles-pp.find_next_and_bit
0.00 +0.2 0.17 ? 43% perf-profile.self.cycles-pp.timekeeping_max_deferment
0.00 +0.2 0.18 ? 29% perf-profile.self.cycles-pp.perf_mux_hrtimer_handler
0.00 +0.2 0.19 ? 30% perf-profile.self.cycles-pp.__hrtimer_run_queues
0.00 +0.2 0.19 ? 36% perf-profile.self.cycles-pp.tsc_verify_tsc_adjust
0.00 +0.2 0.19 ? 27% perf-profile.self.cycles-pp.update_blocked_averages
0.00 +0.2 0.23 ? 22% perf-profile.self.cycles-pp.irq_exit
0.00 +0.2 0.23 ? 35% perf-profile.self.cycles-pp.get_next_timer_interrupt
0.00 +0.2 0.24 ? 33% perf-profile.self.cycles-pp._raw_spin_trylock
0.00 +0.3 0.26 ? 34% perf-profile.self.cycles-pp.pm_qos_read_value
0.00 +0.3 0.27 ? 29% perf-profile.self.cycles-pp.rcu_check_callbacks
0.00 +0.3 0.27 ? 30% perf-profile.self.cycles-pp.read_tsc
0.00 +0.3 0.30 ? 35% perf-profile.self.cycles-pp.rebalance_domains
0.00 +0.3 0.30 ? 30% perf-profile.self.cycles-pp.load_balance
0.00 +0.3 0.32 ? 33% perf-profile.self.cycles-pp.__hrtimer_next_event_base
0.04 ? 57% +0.3 0.36 ? 31% perf-profile.self.cycles-pp.__softirqentry_text_start
0.00 +0.3 0.33 ? 29% perf-profile.self.cycles-pp.native_sched_clock
0.00 +0.3 0.34 ? 34% perf-profile.self.cycles-pp.rcu_eqs_enter
0.06 ? 7% +0.3 0.40 ? 26% perf-profile.self.cycles-pp.native_write_msr
0.05 ? 9% +0.4 0.45 ? 34% perf-profile.self.cycles-pp.run_timer_softirq
0.00 +0.4 0.40 ? 38% perf-profile.self.cycles-pp.tick_nohz_next_event
0.03 ?100% +0.4 0.45 ? 42% perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
0.01 ?173% +0.4 0.44 ? 34% perf-profile.self.cycles-pp.__next_timer_interrupt
0.05 ? 59% +0.5 0.52 ? 32% perf-profile.self.cycles-pp.ktime_get
0.05 +0.5 0.52 ? 31% perf-profile.self.cycles-pp.do_idle
0.11 ? 7% +0.5 0.59 ? 34% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.06 +0.6 0.61 ? 30% perf-profile.self.cycles-pp.find_next_bit
0.06 ? 6% +0.6 0.62 ? 32% perf-profile.self.cycles-pp.find_busiest_group
0.11 ? 4% +1.1 1.18 ? 33% perf-profile.self.cycles-pp.cpuidle_enter_state
0.07 ?173% +1.2 1.23 ? 66% perf-profile.self.cycles-pp.delay_tsc
0.16 ? 2% +1.4 1.52 ? 34% perf-profile.self.cycles-pp.menu_select
0.14 ?173% +1.5 1.67 ? 62% perf-profile.self.cycles-pp.io_serial_in
7.12 +46.9 54.02 ? 30% perf-profile.self.cycles-pp.intel_idle



vm-scalability.throughput

9.5e+07 +-+---------------------------------------------------------------+
| |
9e+07 +-+ O O |
O O O O O O O O O O O O O |
| O O |
8.5e+07 +-+ |
| |
8e+07 +-+ O O |
| |
7.5e+07 +-+ +. |
| + +. |
|.+.+.+.+. .+..+.+.+ +. .+.+.+.+.+.+. .+..+.+.+.+.+.+.+.+.|
7e+07 +-+ +.+ + +.+ |
| |
6.5e+07 +-+---------------------------------------------------------------+



[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Xiaolong


Attachments:
(No filename) (36.08 kB)
config-4.17.0-rc4-00145-g309fe96 (167.13 kB)
job-script (7.33 kB)
job.yaml (4.91 kB)
reproduce (296.17 kB)

2018-05-28 15:55:47

by Michal Hocko

Subject: Re: [lkp-robot] [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement

On Mon 28-05-18 19:40:19, kernel test robot wrote:
>
> Greeting,
>
> FYI, we noticed a +23.0% improvement of vm-scalability.throughput due to commit:
>
>
> commit: 309fe96bfc0ae387f53612927a8f0dc3eb056efd ("mm, memcontrol: implement memory.swap.events")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

This doesn't make any sense to me. The patch merely adds an accounting.
It doesn't optimize anything. So I strongly suspect the result is just
misleading or the test (environment) misconfigured. Not the first time
I am seeing something like that I am afraid.

--
Michal Hocko
SUSE Labs

2018-05-29 04:40:35

by Aaron Lu

Subject: Re: [LKP] [lkp-robot] [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement

On Mon, 2018-05-28 at 14:03 +0200, Michal Hocko wrote:
> On Mon 28-05-18 19:40:19, kernel test robot wrote:
> >
> > Greeting,
> >
> > FYI, we noticed a +23.0% improvement of vm-scalability.throughput due to commit:
> >
> >
> > commit: 309fe96bfc0ae387f53612927a8f0dc3eb056efd ("mm, memcontrol: implement memory.swap.events")
> > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> This doesn't make any sense to me. The patch merely adds an accounting.
> It doesn't optimize anything. So I strongly suspect the result is just
> misleading or the test (environment) misconfigured. Not the first time
> I am seeing something like that I am afraid.
>

Most likely the same situation as:
"
FYI, we noticed a -27.2% regression of will-it-scale.per_process_ops
due to commit:


commit: e27be240df53f1a20c659168e722b5d9f16cc7f4 ("mm: memcg: make sure
memory.events is uptodate when waking pollers")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
"

Where the performance change is due to layout change of
'struct mem_cgroup':
http://lkml.kernel.org/r/[email protected]

2018-05-29 07:58:54

by Michal Hocko

Subject: Re: [LKP] [lkp-robot] [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement

On Tue 29-05-18 03:15:51, Lu, Aaron wrote:
> On Mon, 2018-05-28 at 14:03 +0200, Michal Hocko wrote:
> > On Mon 28-05-18 19:40:19, kernel test robot wrote:
> > >
> > > Greeting,
> > >
> > > FYI, we noticed a +23.0% improvement of vm-scalability.throughput due to commit:
> > >
> > >
> > > commit: 309fe96bfc0ae387f53612927a8f0dc3eb056efd ("mm, memcontrol: implement memory.swap.events")
> > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >
> > This doesn't make any sense to me. The patch merely adds an accounting.
> > It doesn't optimize anything. So I strongly suspect the result is just
> > misleading or the test (environment) misconfigured. Not the first time
> > I am seeing something like that I am afraid.
> >
>
> Most likely the same situation as:
> "
> FYI, we noticed a -27.2% regression of will-it-scale.per_process_ops
> due to commit:
>
>
> commit: e27be240df53f1a20c659168e722b5d9f16cc7f4 ("mm: memcg: make sure
> memory.events is uptodate when waking pollers")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> "
>
> Where the performance change is due to layout change of
> 'struct mem_cgroup':
> http://lkml.kernel.org/r/[email protected]

I do not follow. How can _this_ patch lead to an improvement when it
actually _adds_ an accounting? The other report you are mentioning is a
_regression_ and I can imagine that the layout changes can lead to that
result.
--
Michal Hocko
SUSE Labs

2018-05-29 08:13:37

by Aaron Lu

Subject: Re: [LKP] [lkp-robot] [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement

On Tue, May 29, 2018 at 09:58:00AM +0200, Michal Hocko wrote:
> On Tue 29-05-18 03:15:51, Lu, Aaron wrote:
> > On Mon, 2018-05-28 at 14:03 +0200, Michal Hocko wrote:
> > > On Mon 28-05-18 19:40:19, kernel test robot wrote:
> > > >
> > > > Greeting,
> > > >
> > > > FYI, we noticed a +23.0% improvement of vm-scalability.throughput due to commit:
> > > >
> > > >
> > > > commit: 309fe96bfc0ae387f53612927a8f0dc3eb056efd ("mm, memcontrol: implement memory.swap.events")
> > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > >
> > > This doesn't make any sense to me. The patch merely adds an accounting.
> > > It doesn't optimize anything. So I strongly suspect the result is just
> > > misleading or the test (environment) misconfigured. Not the first time
> > > I am seeing something like that I am afraid.
> > >
> >
> > Most likely the same situation as:
> > "
> > FYI, we noticed a -27.2% regression of will-it-scale.per_process_ops
> > due to commit:
> >
> >
> > commit: e27be240df53f1a20c659168e722b5d9f16cc7f4 ("mm: memcg: make sure
> > memory.events is uptodate when waking pollers")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > "
> >
> > Where the performance change is due to layout change of
> > 'struct mem_cgroup':
> > http://lkml.kernel.org/r/[email protected]
>
> I do not follow. How can _this_ patch lead to an improvement when it
> actually _adds_ an accounting? The other report you are mentioning is a

This patch also changed the layout of 'struct mem_cgroup':

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d99b71bc2c66..517096c3cc99 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -208,6 +210,9 @@ struct mem_cgroup {
atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
struct cgroup_file events_file;

+ /* handle for "memory.swap.events" */
+ struct cgroup_file swap_events_file;
+
/* protect arrays of thresholds */
struct mutex thresholds_lock;

And I'm guessing that might be the cause.

> _regression_ and I can imagine that the layout changes can lead to that
> result.
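[Note: to see the effect Aaron is describing, the key point is that inserting a field shifts the offset of everything declared after it, which can move hot and cold members onto or off of the same 64-byte cache line. Below is a minimal user-space sketch with a hypothetical, cut-down stand-in for struct mem_cgroup, not the real definition; on an actual kernel build, pahole on vmlinux shows the true layout.]

/*
 * Minimal sketch only: "struct fake_mem_cgroup" is a hypothetical stand-in,
 * not the kernel's struct mem_cgroup.  Inserting swap_events_file shifts
 * every later field; here the imaginary hot counter moves from byte 56
 * (cache line 0) to byte 64 (cache line 1) on an LP64 machine.
 */
#include <stdio.h>
#include <stddef.h>

struct cgroup_file { void *kn; };	/* placeholder */

struct fake_mem_cgroup {
	long			memory_events[6];
	struct cgroup_file	events_file;
#ifdef WITH_SWAP_EVENTS_FILE
	struct cgroup_file	swap_events_file;	/* the newly added field */
#endif
	long			some_hot_counter;	/* imagine a frequently written field */
};

int main(void)
{
	printf("some_hot_counter: offset %zu, cache line %zu\n",
	       offsetof(struct fake_mem_cgroup, some_hot_counter),
	       offsetof(struct fake_mem_cgroup, some_hot_counter) / 64);
	return 0;
}

[Compiled once with and once without -DWITH_SWAP_EVENTS_FILE, the hypothetical counter lands on a different cache line; in the real struct the question is which fields end up sharing a line with frequently-dirtied ones.]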

2018-05-29 08:30:12

by Michal Hocko

Subject: Re: [LKP] [lkp-robot] [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement

On Tue 29-05-18 16:11:27, Aaron Lu wrote:
> On Tue, May 29, 2018 at 09:58:00AM +0200, Michal Hocko wrote:
> > On Tue 29-05-18 03:15:51, Lu, Aaron wrote:
> > > On Mon, 2018-05-28 at 14:03 +0200, Michal Hocko wrote:
> > > > On Mon 28-05-18 19:40:19, kernel test robot wrote:
> > > > >
> > > > > Greeting,
> > > > >
> > > > > FYI, we noticed a +23.0% improvement of vm-scalability.throughput due to commit:
> > > > >
> > > > >
> > > > > commit: 309fe96bfc0ae387f53612927a8f0dc3eb056efd ("mm, memcontrol: implement memory.swap.events")
> > > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > > >
> > > > This doesn't make any sense to me. The patch merely adds an accounting.
> > > > It doesn't optimize anything. So I strongly suspect the result is just
> > > > misleading or the test (environment) misconfigured. Not the first time
> > > > I am seeing something like that I am afraid.
> > > >
> > >
> > > Most likely the same situation as:
> > > "
> > > FYI, we noticed a -27.2% regression of will-it-scale.per_process_ops
> > > due to commit:
> > >
> > >
> > > commit: e27be240df53f1a20c659168e722b5d9f16cc7f4 ("mm: memcg: make sure
> > > memory.events is uptodate when waking pollers")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > "
> > >
> > > Where the performance change is due to layout change of
> > > 'struct mem_cgroup':
> > > http://lkml.kernel.org/r/[email protected]
> >
> > I do not follow. How can _this_ patch lead to an improvement when it
> > actually _adds_ an accounting? The other report you are mentioning is a
>
> This patch also changed the layout of 'struct mem_cgroup':
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index d99b71bc2c66..517096c3cc99 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -208,6 +210,9 @@ struct mem_cgroup {
> atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
> struct cgroup_file events_file;
>
> + /* handle for "memory.swap.events" */
> + struct cgroup_file swap_events_file;
> +
> /* protect arrays of thresholds */
> struct mutex thresholds_lock;
>
> And I'm guessing that might be the cause.

Ohh, you are right! Sorry, I've missed that part.

--
Michal Hocko
SUSE Labs

2018-05-29 09:02:00

by Aaron Lu

Subject: Re: [LKP] [lkp-robot] [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement

On Tue, May 29, 2018 at 10:27:51AM +0200, Michal Hocko wrote:
> On Tue 29-05-18 16:11:27, Aaron Lu wrote:
> > On Tue, May 29, 2018 at 09:58:00AM +0200, Michal Hocko wrote:
> > > On Tue 29-05-18 03:15:51, Lu, Aaron wrote:
> > > > On Mon, 2018-05-28 at 14:03 +0200, Michal Hocko wrote:
> > > > > On Mon 28-05-18 19:40:19, kernel test robot wrote:
> > > > > >
> > > > > > Greeting,
> > > > > >
> > > > > > FYI, we noticed a +23.0% improvement of vm-scalability.throughput due to commit:
> > > > > >
> > > > > >
> > > > > > commit: 309fe96bfc0ae387f53612927a8f0dc3eb056efd ("mm, memcontrol: implement memory.swap.events")
> > > > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > > > >
> > > > > This doesn't make any sense to me. The patch merely adds an accounting.
> > > > > It doesn't optimize anything. So I strongly suspect the result is just
> > > > > misleading or the test (environment) misconfigured. Not the first time
> > > > > I am seeing something like that I am afraid.
> > > > >
> > > >
> > > > Most likely the same situation as:
> > > > "
> > > > FYI, we noticed a -27.2% regression of will-it-scale.per_process_ops
> > > > due to commit:
> > > >
> > > >
> > > > commit: e27be240df53f1a20c659168e722b5d9f16cc7f4 ("mm: memcg: make sure
> > > > memory.events is uptodate when waking pollers")
> > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > > "
> > > >
> > > > Where the performance change is due to layout change of
> > > > 'struct mem_cgroup':
> > > > http://lkml.kernel.org/r/[email protected]
> > >
> > > I do not follow. How can _this_ patch lead to an improvement when it
> > > actually _adds_ an accounting? The other report you are mentioning is a
> >
> > This patch also changed the layout of 'struct mem_cgroup':
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index d99b71bc2c66..517096c3cc99 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -208,6 +210,9 @@ struct mem_cgroup {
> > atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS];
> > struct cgroup_file events_file;
> >
> > + /* handle for "memory.swap.events" */
> > + struct cgroup_file swap_events_file;
> > +
> > /* protect arrays of thresholds */
> > struct mutex thresholds_lock;
> >
> > And I'm guessing that might be the cause.
>
> Ohh, you are right! Sorry, I've missed that part.

Never mind, I want to thank you for taking a look at these reports :-)

I just tried to move this newly added field to the bottom of the
structure (just above 'struct mem_cgroup_per_node *nodeinfo[0];'), and
performance dropped to 82665166, still much better than base but already
worse than with this patch.

As you said in another email, this is really fragile.
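[Note: the follow-up patch Aaron mentions later in the thread ("mem_cgroup: make sure moving_account, move_lock_task and stat_cpu in the same cacheline") targets exactly this fragility. A rough, illustrative kernel-style sketch of the general technique, not the actual patch, is below: hot fields are grouped and started on their own cache line so unrelated insertions elsewhere in the struct stop moving them around.]

/*
 * Illustrative sketch, not the actual patch: group the fields that are
 * written on the hot path and force the group onto its own cache line
 * with ____cacheline_aligned_in_smp (from <linux/cache.h>), so that
 * adding or removing unrelated fields above no longer shifts them.
 */
struct mem_cgroup_sketch {			/* hypothetical stand-in */
	/* ... cold, setup-time fields ... */

	/* hot on the charge/fault path, kept together on one cache line */
	spinlock_t		move_lock ____cacheline_aligned_in_smp;
	bool			moving_account;
	struct task_struct	*move_lock_task;
	struct mem_cgroup_stat_cpu __percpu *stat_cpu;

	/* ... remaining fields ... */
};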

2018-06-01 07:26:58

by Aaron Lu

Subject: Re: [LKP] [lkp-robot] [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement

On Mon, May 28, 2018 at 07:40:19PM +0800, kernel test robot wrote:
>
> Greeting,
>
> FYI, we noticed a +23.0% improvement of vm-scalability.throughput due to commit:
>
>
> commit: 309fe96bfc0ae387f53612927a8f0dc3eb056efd ("mm, memcontrol: implement memory.swap.events")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> in testcase: vm-scalability
> on test machine: 144 threads Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory
> with following parameters:
>
> runtime: 300s
> size: 1T
> test: lru-shm
> cpufreq_governor: performance
>
> test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
> test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
>

With the patch I just sent out:
"mem_cgroup: make sure moving_account, move_lock_task and stat_cpu in the
same cacheline"

Applying this commit on top doesn't yield the 23% improvement any more, but
a 6% performance drop...

I found the culprit to be the following line introduced in this commit:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d90b0201a8c4..07ab974c0a49 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6019,13 +6019,17 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
if (!memcg)
return 0;

- if (!entry.val)
+ if (!entry.val) {
+ memcg_memory_event(memcg, MEMCG_SWAP_FAIL);
return 0;
+ }

memcg = mem_cgroup_id_get_online(memcg);

If I remove that memcg_memory_event() call, performance is restored.

It's beyond my understanding why this code path matters, since there is
no swap device set up on the test machine, so I don't see how
get_swap_page() could ever be called.

Still investigating...
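[Note: for reference, memcg_memory_event() itself is a tiny helper; roughly the following, paraphrased from memory for that kernel version, so the exact definition may differ slightly. Functionally it is just an atomic counter bump plus a poll notification, which is why a measurable cost on a path that never runs is so surprising and makes a code-placement/alignment effect a plausible suspect.]

/*
 * Sketch of the helper as of that era (include/linux/memcontrol.h);
 * paraphrased, may not match the exact source.
 */
static inline void memcg_memory_event(struct mem_cgroup *memcg,
				      enum memcg_memory_event event)
{
	atomic_long_inc(&memcg->memory_events[event]);
	cgroup_file_notify(&memcg->events_file);
}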

2018-06-06 08:51:34

by Aaron Lu

Subject: Re: [LKP] [lkp-robot] [mm, memcontrol] 309fe96bfc: vm-scalability.throughput +23.0% improvement

On Fri, Jun 01, 2018 at 03:26:04PM +0800, Aaron Lu wrote:
> On Mon, May 28, 2018 at 07:40:19PM +0800, kernel test robot wrote:
> >
> > Greeting,
> >
> > FYI, we noticed a +23.0% improvement of vm-scalability.throughput due to commit:
> >
> >
> > commit: 309fe96bfc0ae387f53612927a8f0dc3eb056efd ("mm, memcontrol: implement memory.swap.events")
> > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >
> > in testcase: vm-scalability
> > on test machine: 144 threads Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory
> > with following parameters:
> >
> > runtime: 300s
> > size: 1T
> > test: lru-shm
> > cpufreq_governor: performance
> >
> > test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
> > test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
> >
>
> With the patch I just sent out:
> "mem_cgroup: make sure moving_account, move_lock_task and stat_cpu in the
> same cacheline"
>
> Applying this commit on top doesn't yield the 23% improvement any more, but
> a 6% performance drop...
> I found the culprit to be the following line introduced in this commit:
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d90b0201a8c4..07ab974c0a49 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6019,13 +6019,17 @@ int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry)
> if (!memcg)
> return 0;
>
> - if (!entry.val)
> + if (!entry.val) {
> + memcg_memory_event(memcg, MEMCG_SWAP_FAIL);

Removing this line restored performance, but it really doesn't make any
sense. Ying suggested it might be code-alignment related and suggested
using a different compiler than gcc-7.2. I then used gcc-6.4, and it turned
out the test results were pretty much the same for the two commits:

(each test has been run 3 times)
$ grep throughput base/*/stats.json
base/0/stats.json: "vm-scalability.throughput": 89207489,
base/1/stats.json: "vm-scalability.throughput": 89982933,
base/2/stats.json: "vm-scalability.throughput": 90436592,

$ grep throughput head/*/stats.json
head/0/stats.json: "vm-scalability.throughput": 90882775,
head/1/stats.json: "vm-scalability.throughput": 90675220,
head/2/stats.json: "vm-scalability.throughput": 91173479,

So it's probably really related to code alignment, and this bisected
commit doesn't cause a performance change (as expected).

> return 0;
> + }
>
> memcg = mem_cgroup_id_get_online(memcg);
>
> If I remove that memcg_memory_event() call, performance is restored.
>
> It's beyond my understanding why this code path matters, since there is
> no swap device set up on the test machine, so I don't see how
> get_swap_page() could ever be called.
>
> Still investigating...
>