Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757069AbbGQHBf (ORCPT ); Fri, 17 Jul 2015 03:01:35 -0400
Received: from mailout2.samsung.com ([203.254.224.25]:43812 "EHLO
	mailout2.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1756868AbbGQHBc (ORCPT );
	Fri, 17 Jul 2015 03:01:32 -0400
X-AuditID: cbfee68e-f79c56d000006efb-fc-55a8a84804d7
From: PINTU KUMAR
To: akpm@linux-foundation.org, corbet@lwn.net, vbabka@suse.cz,
	gorcunov@openvz.org, mhocko@suse.cz, emunson@akamai.com,
	kirill.shutemov@linux.intel.com, standby24x7@gmail.com,
	hannes@cmpxchg.org, vdavydov@parallels.com, hughd@google.com,
	minchan@kernel.org, tj@kernel.org, rientjes@google.com,
	xypron.glpk@gmx.de, dzickus@redhat.com, prarit@redhat.com,
	ebiederm@xmission.com, rostedt@goodmis.org, uobergfe@redhat.com,
	paulmck@linux.vnet.ibm.com, iamjoonsoo.kim@lge.com,
	ddstreet@ieee.org, sasha.levin@oracle.com, koct9i@gmail.com,
	mgorman@suse.de, cj@linux.com, opensource.ganesh@gmail.com,
	vinmenon@codeaurora.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-pm@vger.kernel.org, qiuxishi@huawei.com,
	Valdis.Kletnieks@vt.edu
Cc: cpgs@samsung.com, pintu_agarwal@yahoo.com, vishnu.ps@samsung.com,
	rohit.kr@samsung.com, iqbal.ams@samsung.com, pintu.ping@gmail.com,
	pintu.k@outlook.com
References: <1435929607-3435-1-git-send-email-pintu.k@samsung.com>
	<1437114578-2502-1-git-send-email-pintu.k@samsung.com>
In-reply-to: <1437114578-2502-1-git-send-email-pintu.k@samsung.com>
Subject: RE: [PATCHv2 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature
Date: Fri, 17 Jul 2015 12:28:47 +0530
Message-id: <06c401d0c05e$663f2ee0$32bd8ca0$@samsung.com>
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii
Content-transfer-encoding: 7bit
X-Mailer: Microsoft Outlook 14.0
Thread-index: AQIslilycOCfT8Oq4c8d4RbMXOXODQHY5cNOnRjHFRA=
Content-language: en-us
DLP-Filter: Pass
X-MTR: 20000000000000000@CPGS
X-CFilter-Loop: Reflected
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 15372
Lines: 402

Sorry, correcting a small typo below.
Please review and provide your comments.
This is version 2 of the previous patch.

> -----Original Message-----
> From: Pintu Kumar [mailto:pintu.k@samsung.com]
> Sent: Friday, July 17, 2015 12:00 PM
> To: akpm@linux-foundation.org; corbet@lwn.net; vbabka@suse.cz;
> gorcunov@openvz.org; pintu.k@samsung.com; mhocko@suse.cz;
> emunson@akamai.com; kirill.shutemov@linux.intel.com;
> standby24x7@gmail.com; hannes@cmpxchg.org; vdavydov@parallels.com;
> hughd@google.com; minchan@kernel.org; tj@kernel.org; rientjes@google.com;
> xypron.glpk@gmx.de; dzickus@redhat.com; prarit@redhat.com;
> ebiederm@xmission.com; rostedt@goodmis.org; uobergfe@redhat.com;
> paulmck@linux.vnet.ibm.com; iamjoonsoo.kim@lge.com; ddstreet@ieee.org;
> sasha.levin@oracle.com; koct9i@gmail.com; mgorman@suse.de; cj@linux.com;
> opensource.ganesh@gmail.com; vinmenon@codeaurora.org;
> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> linux-pm@vger.kernel.org; qiuxishi@huawei.com; Valdis.Kletnieks@vt.edu
> Cc: cpgs@samsung.com; pintu_agarwal@yahoo.com; vishnu.ps@samsung.com;
> rohit.kr@samsung.com; iqbal.ams@samsung.com; pintu.ping@gmail.com;
> pintu.k@outlook.com
> Subject: [PATCHv2 1/1]
kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature
>
> This patch provides 2 things:
> 1. Add a new control called shrink_memory in /proc/sys/vm/.
> This control can be used to aggressively reclaim memory system-wide in one
> shot from user space. A value of 1 will instruct the kernel to reclaim as
> much as totalram_pages in the system.
> Example: echo 1 > /proc/sys/vm/shrink_memory
>
> If any value other than 1 is written to shrink_memory, the write fails with
> EINVAL.
>
> 2. Enable the shrink_all_memory API in the kernel with the new
> CONFIG_SHRINK_MEMORY.
> Currently, the shrink_all_memory function is used only during hibernation.
> With the new config we can also use this API for the non-hibernation case,
> without disturbing the hibernation case.
>
> The detailed paper was presented at the Embedded Linux Conference, Mar-2015
> http://events.linuxfoundation.org/sites/events/files/slides/
> %5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
>
> An example is shown below:
> Device: ARMv7, Dual Core CPU 1.2GHz
> RAM: 512MB (Without SWAP/ZRAM)
> Linux Kernel: 3.10.17
> Scenario: Just after boot-up finished.
>
> BEFORE:
> -------------------------------------------------------------------------
> shell> free -tm ; cat /proc/buddyinfo
>              total       used       free     shared    buffers     cached
> Mem:           460        440         20          0         35        154
> -/+ buffers/cache:        250        209
> Swap:            0          0          0
> Total:         460        440         20
> Node 0, zone   Normal   1037    705     92     19     19     17      4      9      0      0      0
>
> shell> vmstat 1 &
>
> AFTER:
> -------------------------------------------------------------------------
> shell> echo 1 > /proc/sys/vm/shrink_memory
>
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  0  0      0  20768  35876 157876    0    0     0     0   64  177  0  1 99  0  0
> --------------------------------------------------------------------------------
> |1  0      0  33104  34864 149808    0    0     0     0   82  221  0 12 88  0  0|
> --------------------------------------------------------------------------------
>  0  0      0 188776   3000  54420    0    0     0     0  216  374  0 30 70  0  0
>  0  0      0 188400   3652  54528    0    0   740     8  188  337  2  1 95  2  0
>
> shell> free -tm ; cat /proc/buddyinfo
>              total       used       free     shared    buffers     cached
> Mem:           460        278        182          0          4         54
> -/+ buffers/cache:        219        240
> Swap:            0          0          0
> Total:         460        278        182
> Node 0, zone   Normal   5575   3158   1500    727    240     90     33     18     10      6      6
>
> RESULTS:
> -----------------------------------------------------
> Around 160MB of memory was recovered in one shot.
> Many higher-order pages were recovered in the process.
> From the vmstat output, the total CPU usage is ~12% (system) while this
> command is running, for 1 second.
> We also measured the power consumption using a H/W power monitor tool.
> Below is the result:
> Before - ~180mA
> During shrink memory - ~237mA
> Duration - ~0.5 sec
> Additional consumption: ~57mA
>
> FURTHER OBSERVATIONS:
> -----------------------------------------------------
> 37% reduction in application kills when shrink_memory is called at boot-up.
> Around 4000 fewer page faults.
> Around 43% reduction in kswapd calls.
> Movement to slowpath reduced drastically.
> Combining shrink_memory with compaction shows good benefits against
> fragmentation.
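
The two /proc/buddyinfo lines quoted above can be cross-checked against the free column of the free output. Below is a minimal sketch in C, not part of the patch. The helper names parse_buddyinfo_line() and free_pages_from_order() and the MAX_ORDERS bound are hypothetical, chosen only for illustration. It relies on the fact that column i of a buddyinfo line counts free blocks of 2^i pages, so the number of free pages is the sum of count[i] << i.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ORDERS 16	/* assumption: generous upper bound on buddy orders */

/*
 * Parse the per-order free-block counts from one /proc/buddyinfo line, e.g.
 *   "Node 0, zone Normal 1037 705 92 ..."
 * Returns the number of orders parsed, or -1 if the line is malformed.
 */
int parse_buddyinfo_line(const char *line, unsigned long counts[MAX_ORDERS])
{
	const char *p = strstr(line, "zone");
	char *end;
	int n = 0;

	if (!p)
		return -1;
	p += strlen("zone");

	/* Skip the whitespace after "zone", then the zone name itself. */
	while (*p == ' ' || *p == '\t')
		p++;
	while (*p && *p != ' ' && *p != '\t')
		p++;

	/* Read the counts until no further number can be converted. */
	while (n < MAX_ORDERS) {
		unsigned long v = strtoul(p, &end, 10);

		if (end == p)
			break;
		counts[n++] = v;
		p = end;
	}
	return n > 0 ? n : -1;
}

/* Free pages held in blocks of order >= min_order: sum of count[i] << i. */
unsigned long free_pages_from_order(const unsigned long *counts, int n,
				    int min_order)
{
	unsigned long total = 0;
	int i;

	for (i = min_order; i < n; i++)
		total += counts[i] << i;
	return total;
}
```

Applied to the BEFORE line this gives 5223 free pages and to the AFTER line 46619 free pages. Assuming 4 KiB pages (not stated in the mail, but consistent with the numbers), that is about 20 MB and about 182 MB, matching the free column in both cases. Restricting the sum to order 4 and above, an arbitrary threshold picked for illustration, gives 2256 pages before and 22912 pages after.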
>
> APPLICATION LAUNCH BEHAVIOR:
> -----------------------------------------------------
> During First Launch:
> ============================================================================
> Application    Before_shrink_memory    After_shrink_memory    Difference
> Camera         1.981                   1.86                    0.121
> Gallery        1.276                   0.94                    0.336
> contacts       1.112                   0.941                   0.171
> messaging      0.886                   0.795                   0.091
> settings       1.257                   1.212                   0.045
> Music          1.854                   2.098                  -0.244
> Gmail          1.872                   1.935                  -0.063
> Browser        2.569                   2.677                  -0.108
> ============================================================================
>
> During Re-launch:
> ============================================================================
> Application    Before_shrink_memory    After_shrink_memory    Difference
> Camera         1.248                   0.976                   0.272
> Gallery        0.697                   0.633                   0.064
> contacts       0.506                   0.561                  -0.055
> messaging      0.533                   0.489                   0.044
> settings       0.833                   0.805                   0.028
> Music          0.832                   0.769                   0.063
> Gmail          0.913                   0.841                   0.072
> Browser        0.579                   0.57                    0.009
> ============================================================================
>
> Various other use cases where this can be used:
> ----------------------------------------------------------------------------
> 1) Just after system boot-up is finished, using the sysctl configuration from
> a boot-up script.
> 2) During system suspend state, after suspend_freeze_processes()
> [kernel/power/suspend.c],
> based on certain conditions about fragmentation or free memory state.
> 3) From the Android ION system heap driver, when order-4 allocations start
> failing, by calling shrink_all_memory in a separate worker thread, based on
> certain conditions.
> 4) It can be combined with compact_memory to achieve better results on
> memory fragmentation.
> 5) It can be helpful in debugging and tuning various vm parameters.
> 6) It can help identify the maximum amount of memory that is reclaimable
> at any point in time,
> and how many higher-order pages could be formed with this amount of
> reclaimable memory.
> Thus it can help in tuning the reserved memory needs of a system
> accordingly.
> 7) It can be helpful in properly tuning the SWAP size in the system.
> In shrink_all_memory, we set may_swap = 1, which means all unused pages
> will be swapped out.
> Thus, by running shrink_memory on a heavily loaded system, we can check how
> much swap gets used.
> That can be the maximum swap size with a 10% delta.
> Also, if ZRAM is used, it helps us in compressing and storing the pages for
> later use.
> 8) It can be helpful to allow more new applications to be launched, without
> killing the older ones,
> by moving the least recently used pages to the SWAP area.
> Thus user data can be retained.
> 9) Can be part of a system tool to quickly defragment entire system memory.
> 10) This may also help in reducing fragmentation within the CMA region.
> 11) More use cases can be identified.
>
> Most importantly, it can be more effective when applied intelligently, based on
> certain conditions.
> It should be executed always and the decision is left upto the user.

* It should _not_ be executed always. The decision is left to the user.

>
> Signed-off-by: Pintu Kumar
> ---
> V2: Added min,max parameter for shrink_memory, suggested by
> Heinrich Schuchardt.
> Error handling in sysctl_shrinkmem_handler, for any value other than 1,
> suggested by Heinrich Schuchardt.
> Fixed HIBERNATION+SHRINK_MEMORY issue in shrink_all_memory,
> suggested by Valdis.Kletnieks@vt.edu.
> Restored gfp_mask to original, because of other dependencies.
> Also, adding GFP_RECLAIM_MASK does not affect anything.
> Verified power consumption data during shrink_memory,
> as suggested by Johannes Weiner.
> Verified application launch/re-launch scenarios before/after shrink_memory,
> as suggested by Xishi Qiu.
> Updated the commit message with examples and use cases.
>
>  Documentation/sysctl/vm.txt |   18 ++++++++++++++++++
>  include/linux/swap.h        |    7 +++++++
>  kernel/sysctl.c             |   16 ++++++++++++++++
>  mm/Kconfig                  |    8 ++++++++
>  mm/vmscan.c                 |   34 ++++++++++++++++++++++++++++++++--
>  5 files changed, 81 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
> index 9832ec5..54eda3a 100644
> --- a/Documentation/sysctl/vm.txt
> +++ b/Documentation/sysctl/vm.txt
> @@ -54,6 +54,7 @@ Currently, these files are in /proc/sys/vm:
>  - page-cluster
>  - panic_on_oom
>  - percpu_pagelist_fraction
> +- shrink_memory
>  - stat_interval
>  - swappiness
>  - user_reserve_kbytes
> @@ -718,6 +719,23 @@ sysctl, it will revert to this default behavior.
>
>  ==============================================================
>
> +shrink_memory
> +
> +This control is available only when CONFIG_SHRINK_MEMORY is set. This
> +control can be used to aggressively reclaim memory system-wide in one
> +shot. A value of
> +1 will instruct the kernel to reclaim as much as totalram_pages in the system.
> +For example, to reclaim all memory system-wide we can do:
> +# echo 1 > /proc/sys/vm/shrink_memory
> +
> +If any other value than 1 is written to shrink_memory an error EINVAL occurs.
> +
> +For more information about this control, please visit the following
> +presentation in embedded linux conference, 2015.
> +http://events.linuxfoundation.org/sites/events/files/slides/
> +%5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
> +
> +==============================================================
> +
>  stat_interval
>
>  The time interval between which vm statistics are updated.
The default
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 9a7adfb..6505b0b 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -333,6 +333,13 @@ extern int vm_swappiness;
>  extern int remove_mapping(struct address_space *mapping, struct page *page);
>  extern unsigned long vm_total_pages;
>
> +#ifdef CONFIG_SHRINK_MEMORY
> +extern int sysctl_shrink_memory;
> +extern int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
> +		void __user *buffer, size_t *length, loff_t *ppos);
> +#endif
> +
> +
>  #ifdef CONFIG_NUMA
>  extern int zone_reclaim_mode;
>  extern int sysctl_min_unmapped_ratio;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index c566b56..e66581b 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -275,6 +275,11 @@ static int min_extfrag_threshold;
>  static int max_extfrag_threshold = 1000;
>  #endif
>
> +#ifdef CONFIG_SHRINK_MEMORY
> +static int min_shrink_memory = 1;
> +static int max_shrink_memory = 1;
> +#endif
> +
>  static struct ctl_table kern_table[] = {
>  	{
>  		.procname = "sched_child_runs_first",
> @@ -1351,6 +1356,17 @@ static struct ctl_table vm_table[] = {
>  	},
>
>  #endif /* CONFIG_COMPACTION */
> +#ifdef CONFIG_SHRINK_MEMORY
> +	{
> +		.procname = "shrink_memory",
> +		.data = &sysctl_shrink_memory,
> +		.maxlen = sizeof(int),
> +		.mode = 0200,
> +		.proc_handler = sysctl_shrinkmem_handler,
> +		.extra1 = &min_shrink_memory,
> +		.extra2 = &max_shrink_memory,
> +	},
> +#endif
>  	{
>  		.procname = "min_free_kbytes",
>  		.data = &min_free_kbytes,
> diff --git a/mm/Kconfig b/mm/Kconfig
> index b3a60ee..8e04bd9 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -657,3 +657,11 @@ config DEFERRED_STRUCT_PAGE_INIT
>  	  when kswapd starts. This has a potential performance impact on
>  	  processes running early in the lifetime of the systemm until kswapd
>  	  finishes the initialisation.
> +
> +config SHRINK_MEMORY
> +	bool "Allow for system-wide shrinking of memory"
> +	default n
> +	depends on MMU
> +	help
> +	  It enables support for system-wide memory reclaim in one shot using
> +	  echo 1 > /proc/sys/vm/shrink_memory.
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c8d8282..e802fa7 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -58,6 +58,10 @@
>  #define CREATE_TRACE_POINTS
>  #include
>
> +#ifdef CONFIG_SHRINK_MEMORY
> +#include
> +#endif
> +
>  struct scan_control {
>  	/* How many pages shrink_list() should reclaim */
>  	unsigned long nr_to_reclaim;
> @@ -3557,7 +3561,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
>  	wake_up_interruptible(&pgdat->kswapd_wait);
>  }
>
> -#ifdef CONFIG_HIBERNATION
> +#if defined CONFIG_HIBERNATION || CONFIG_SHRINK_MEMORY
>  /*
>   * Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
>   * freed pages.
> @@ -3576,12 +3580,16 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
>  		.may_writepage = 1,
>  		.may_unmap = 1,
>  		.may_swap = 1,
> -		.hibernation_mode = 1,
>  	};
>  	struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
>  	struct task_struct *p = current;
>  	unsigned long nr_reclaimed;
>
> +	if (system_entering_hibernation())
> +		sc.hibernation_mode = 1;
> +	else
> +		sc.hibernation_mode = 0;
> +
>  	p->flags |= PF_MEMALLOC;
>  	lockdep_set_current_reclaim_state(sc.gfp_mask);
>  	reclaim_state.reclaimed_slab = 0;
> @@ -3597,6 +3605,28 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
>  }
>  #endif /* CONFIG_HIBERNATION */
>
> +#ifdef CONFIG_SHRINK_MEMORY
> +int sysctl_shrink_memory;
> +/* This is the entry point for system-wide shrink memory
> ++via /proc/sys/vm/shrink_memory */
> +int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
> +		void __user *buffer, size_t *length, loff_t *ppos)
> +{
> +	int ret;
> +
> +	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
> +	if (ret)
> +		return ret;
> +
> +	if
(write) {
> +		if (sysctl_shrink_memory & 1)
> +			shrink_all_memory(totalram_pages);
> +	}
> +
> +	return 0;
> +}
> +#endif
> +
>  /* It's optimal to keep kswapds on the same CPUs as their memory, but
>     not required for correctness. So if the last cpu in a node goes
>     away, we get changed to run anywhere: as the first one comes back,
> --
> 1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
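
For reference, a minimal userspace sketch in C of how the control could be driven from a program instead of echo. It is not part of the patch, and the helper name write_control() is hypothetical. The sketch assumes the behaviour described in the patch: the file is created with mode 0200, so the caller presumably needs root, and any value other than 1 is expected to come back as EINVAL from the min/max check in the handler. On a kernel without this patch the file does not exist, so open() fails with ENOENT.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a short string to a sysctl-style control file.
 * Returns 0 on success or a negative errno value on failure
 * (for example -ENOENT if the file is missing, -EINVAL if the
 * kernel rejects the value).
 */
int write_control(const char *path, const char *value)
{
	size_t len = strlen(value);
	ssize_t written;
	int fd, err = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	written = write(fd, value, len);
	if (written < 0)
		err = -errno;
	else if ((size_t)written != len)
		err = -EIO;	/* short write: treat as an I/O error */

	if (close(fd) < 0 && err == 0)
		err = -errno;
	return err;
}
```

A caller would use it as write_control("/proc/sys/vm/shrink_memory", "1\n") and check the return value; reporting the error rather than ignoring it makes it easy to tell a missing CONFIG_SHRINK_MEMORY apart from a rejected value.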