Date: Thu, 21 Apr 2016 17:01:50 -0700
From: Andrew Morton <akpm@linux-foundation.org>
To: Vinayak Menon <vinmenon@codeaurora.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, dan.j.williams@intel.com,
    mgorman@suse.de, vbabka@suse.cz, kirill.shutemov@linux.intel.com,
    dave.hansen@linux.intel.com, hughd@google.com
Subject: Re: [PATCH] mm: make fault_around_bytes configurable
Message-Id: <20160421170150.b492ffe35d073270b53f0e4d@linux-foundation.org>
In-Reply-To: <1460992636-711-1-git-send-email-vinmenon@codeaurora.org>
References: <1460992636-711-1-git-send-email-vinmenon@codeaurora.org>

On Mon, 18 Apr 2016 20:47:16 +0530 Vinayak Menon <vinmenon@codeaurora.org> wrote:

> Mapping pages around a fault is found to cause performance degradation
> in certain use cases. The test performed here is a launch of 10 apps,
> one by one, doing something with the app each time, and then repeating
> the same sequence once more, on an ARM 64-bit Android device with 2GB
> of RAM. The time taken to launch the apps is found to be better when
> the fault-around feature is disabled by setting fault_around_bytes to
> the page size (4096 in this case).

Well, that's one workload, and a somewhat strange one.  What is the
effect on other workloads (of which there are a lot!)?

> The tests were done on a 3.18 kernel. 4 extra vmstat counters were
> added for debugging. pgpgoutclean accounts for the clean pages
> reclaimed via __delete_from_page_cache. pageref_activate,
> pageref_activate_vm_exec, and pageref_keep account for the mapped file
> pages activated and retained by page_check_references.
>
> === Without swap ===
>                              3.18      3.18-fault_around_bytes=4096
> -----------------------------------------------------------------------
> workingset_refault           691100    664339
> workingset_activate          210379    179139
> pgpgin                       4676096   4492780
> pgpgout                      163967    96711
> pgpgoutclean                 1090664   990659
> pgalloc_dma                  3463111   3328299
> pgfree                       3502365   3363866
> pgactivate                   568134    238570
> pgdeactivate                 752260    392138
> pageref_activate             315078    121705
> pageref_activate_vm_exec     162940    55815
> pageref_keep                 141354    51011
> pgmajfault                   24863     23633
> pgrefill_dma                 1116370   544042
> pgscan_kswapd_dma            1735186   1234622
> pgsteal_kswapd_dma           1121769   1005725
> pgscan_direct_dma            12966     1090
> pgsteal_direct_dma           6209      967
> slabs_scanned                1539849   977351
> pageoutrun                   1260      1333
> allocstall                   47        7
>
> === With swap ===
>                              3.18      3.18-fault_around_bytes=4096
> -----------------------------------------------------------------------
> workingset_refault           597687    878109
> workingset_activate          167169    254037
> pgpgin                       4035424   5157348
> pgpgout                      162151    85231
> pgpgoutclean                 928587    1225029
> pswpin                       46033     17100
> pswpout                      237952    127686
> pgalloc_dma                  3305034   3542614
> pgfree                       3354989   3592132
> pgactivate                   626468    355275
> pgdeactivate                 990205    771902
> pageref_activate             294780    157106
> pageref_activate_vm_exec     141722    63469
> pageref_keep                 121931    63028
> pgmajfault                   67818     45643
> pgrefill_dma                 1324023   977192
> pgscan_kswapd_dma            1825267   1720322
> pgsteal_kswapd_dma           1181882   1365500
> pgscan_direct_dma            41957     9622
> pgsteal_direct_dma           25136     6759
> slabs_scanned                689575    542705
> pageoutrun                   1234      1538
> allocstall                   110       26
>
> Looks like with fault_around there is more pressure on reclaim because
> of the presence of more mapped pages, resulting in more IO activity,
> more faults, more swapping, and more allocstalls.

A few of those things did get a bit worse?

Do you have any data on actual wall-time changes?  How much faster do
things become with the patch?  If it is "0.1%" then I'd say "umm, no".

> Make fault_around_bytes configurable so that it can be tuned to avoid
> performance degradation.

It sounds like we need to be smarter about auto-tuning this thing.
Maybe the refault code could be taught to provide the feedback path,
but that sounds hard.

Still, I do think it would be better to make this configurable at
runtime.  Move the existing debugfs tunable into /proc/sys/vm (and
document it!).  I do dislike adding even more tunables, but this one
does make sense.  People will want to run their workloads with various
values until they find the peak throughput, and requiring a kernel
rebuild for that is a huge pain.
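
For context on why writing the page size to the knob effectively turns the
feature off: the fault-around window is derived from fault_around_bytes,
rounded down to a power of two and converted to pages, so a value of 4096
collapses the window to just the faulting page.  A rough sketch of that
relationship follows; it paraphrases the idea behind the 3.18-era
mm/memory.c rather than quoting it, and the helper name and default here
are only approximate:

	/* Paraphrase of the fault-around sizing, not the literal kernel
	 * source.  rounddown_pow_of_two() is from <linux/log2.h>,
	 * PAGE_SIZE comes in via <linux/mm.h>. */
	static unsigned long fault_around_bytes __read_mostly = 65536;

	/*
	 * Pages mapped around the faulting address: 16 with the 64KB
	 * default, 1 when fault_around_bytes is set to PAGE_SIZE, which
	 * is what "disabling" fault-around means in the numbers above.
	 */
	static unsigned long fault_around_pages(void)
	{
		return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE;
	}

The value is already writable at runtime through the existing debugfs file,
/sys/kernel/debug/fault_around_bytes, when debugfs is built in and mounted.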
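
As for moving the knob into /proc/sys/vm, here is a minimal sketch of what
that could look like.  It assumes fault_around_bytes is made non-static (it
is currently a static unsigned long in mm/memory.c); the lower bound, mode
and registration path below are illustrative choices, not from any actual
patch:

	#include <linux/sysctl.h>
	#include <linux/mm.h>		/* PAGE_SIZE */

	/* Would have to be exported (made non-static) from mm/memory.c. */
	extern unsigned long fault_around_bytes;

	/* Illustrative lower bound: one page, i.e. fault-around disabled. */
	static unsigned long fault_around_bytes_min = PAGE_SIZE;

	static struct ctl_table fault_around_sysctl[] = {
		{
			.procname	= "fault_around_bytes",
			.data		= &fault_around_bytes,
			.maxlen		= sizeof(fault_around_bytes),
			.mode		= 0644,
			.proc_handler	= proc_doulongvec_minmax,
			.extra1		= &fault_around_bytes_min,
		},
		{ }
	};

The entry could either be appended to the existing vm table in
kernel/sysctl.c or registered with register_sysctl("vm",
fault_around_sysctl).  With something like this in place, tuning becomes a
sysctl write (vm.fault_around_bytes=4096) instead of a kernel rebuild, and
it can be documented alongside the other vm sysctls.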