From: Pintu Kumar
To: akpm@linux-foundation.org, corbet@lwn.net, vbabka@suse.cz, gorcunov@openvz.org, pintu.k@samsung.com, mhocko@suse.cz, emunson@akamai.com, kirill.shutemov@linux.intel.com, standby24x7@gmail.com, hannes@cmpxchg.org, vdavydov@parallels.com, hughd@google.com, minchan@kernel.org, tj@kernel.org, rientjes@google.com, xypron.glpk@gmx.de, dzickus@redhat.com, prarit@redhat.com, ebiederm@xmission.com, rostedt@goodmis.org, uobergfe@redhat.com, paulmck@linux.vnet.ibm.com, iamjoonsoo.kim@lge.com, ddstreet@ieee.org, sasha.levin@oracle.com, koct9i@gmail.com, mgorman@suse.de, cj@linux.com, opensource.ganesh@gmail.com, vinmenon@codeaurora.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-pm@vger.kernel.org, qiuxishi@huawei.com, Valdis.Kletnieks@vt.edu
Cc: cpgs@samsung.com, pintu_agarwal@yahoo.com, vishnu.ps@samsung.com, rohit.kr@samsung.com, iqbal.ams@samsung.com, pintu.ping@gmail.com, pintu.k@outlook.com
Subject: [PATCHv2 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature
Date: Fri, 17 Jul 2015 11:59:38 +0530
Message-id: <1437114578-2502-1-git-send-email-pintu.k@samsung.com>
X-Mailer: git-send-email 1.7.9.5
In-reply-to: <1435929607-3435-1-git-send-email-pintu.k@samsung.com>
References: <1435929607-3435-1-git-send-email-pintu.k@samsung.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch provides two things:

1. It adds a new control called shrink_memory in /proc/sys/vm/.
   This control can be used from user space to aggressively reclaim
   memory system-wide in one shot. Writing a value of 1 instructs the
   kernel to try to reclaim up to totalram_pages pages, i.e. as much
   memory as possible.
   Example:
   echo 1 > /proc/sys/vm/shrink_memory
   Writing any value other than 1 to shrink_memory fails with EINVAL.

2. It enables the shrink_all_memory() API in the kernel through a new
   config option, CONFIG_SHRINK_MEMORY.
   Currently, shrink_all_memory() is used only during hibernation. With
   the new config option, this API can also be used outside hibernation,
   without disturbing the hibernation case.

The details were presented at the Embedded Linux Conference, March 2015:
http://events.linuxfoundation.org/sites/events/files/slides/%5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf

A sample run is shown below:
Device: ARMv7, dual-core CPU, 1.2GHz
RAM: 512MB (without SWAP/ZRAM)
Linux kernel: 3.10.17
Scenario: just after boot-up has finished.
BEFORE:
-------------------------------------------------------------------------
shell> free -tm ; cat /proc/buddyinfo
             total       used       free     shared    buffers     cached
Mem:           460        440         20          0         35        154
-/+ buffers/cache:        250        209
Swap:            0          0          0
Total:         460        440         20
Node 0, zone   Normal   1037    705     92     19     19     17      4      9      0      0      0

shell> vmstat 1 &

AFTER:
-------------------------------------------------------------------------
shell> echo 1 > /proc/sys/vm/shrink_memory
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0  20768  35876 157876    0    0     0     0   64  177  0  1 99  0  0
--------------------------------------------------------------------------------
|1  0      0  33104  34864 149808    0    0     0     0   82  221  0 12 88  0  0|
--------------------------------------------------------------------------------
 0  0      0 188776   3000  54420    0    0     0     0  216  374  0 30 70  0  0
 0  0      0 188400   3652  54528    0    0   740     8  188  337  2  1 95  2  0

shell> free -tm ; cat /proc/buddyinfo
             total       used       free     shared    buffers     cached
Mem:           460        278        182          0          4         54
-/+ buffers/cache:        219        240
Swap:            0          0          0
Total:         460        278        182
Node 0, zone   Normal   5575   3158   1500    727    240     90     33     18     10      6      6

RESULTS:
-----------------------------------------------------
Around 160MB of memory was recovered in one shot, and many higher-order
pages were recovered in the process.
From the highlighted vmstat sample, the CPU usage while this command is
running is ~12% (system), for 1 second.
We also measured the power consumption using a hardware power monitor
tool. The results are:
Before:                  ~180mA
During shrink_memory:    ~237mA
Duration:                ~0.5 sec
Additional consumption:  ~57mA

FURTHER OBSERVATIONS:
-----------------------------------------------------
37% fewer application kills when shrink_memory is invoked at boot-up.
Around 4000 fewer page faults.
Around 43% fewer kswapd invocations.
Entries into the page allocator slowpath are reduced drastically.
Combining shrink_memory with compaction shows good benefits against
fragmentation.
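The "higher-order pages" result can be quantified from the two /proc/buddyinfo snapshots above: column k counts free blocks of order k, and each such block holds 2^k pages. The C sketch below is illustrative only; the helper names are hypothetical, and a 4KB page size is assumed (typical for ARMv7). It sums the free pages in a buddyinfo line and counts the free blocks of order 4 or higher, the size that matters for the order-4 ION allocations mentioned among the use cases.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Room for up to 16 orders; the kernels discussed here report 11 (0..10). */
#define MAX_ORDERS 16

/*
 * Hypothetical helper: parse one /proc/buddyinfo line, e.g.
 *   "Node 0, zone   Normal   1037    705     92 ..."
 * into counts[k] = number of free blocks of order k.
 * Returns the number of orders parsed, or -1 if the line is malformed.
 */
static int parse_buddyinfo_line(const char *line, unsigned long counts[MAX_ORDERS])
{
	const char *p = strstr(line, "zone");
	char zone_name[32];
	int consumed = 0;
	int n = 0;

	if (p == NULL)
		return -1;
	/* Skip the "zone" keyword and the zone name ("DMA", "Normal", ...). */
	if (sscanf(p, "zone %31s%n", zone_name, &consumed) != 1)
		return -1;
	p += consumed;

	while (n < MAX_ORDERS) {
		char *end;
		unsigned long v = strtoul(p, &end, 10);

		if (end == p)	/* no more numbers on the line */
			break;
		counts[n++] = v;
		p = end;
	}
	return n;
}

/* Total free pages: every order-k block contains 2^k pages. */
static unsigned long buddy_free_pages(const unsigned long *counts, int n)
{
	unsigned long total = 0;

	for (int k = 0; k < n; k++)
		total += counts[k] << k;
	return total;
}

/* Number of free blocks whose order is at least min_order. */
static unsigned long buddy_blocks_at_least(const unsigned long *counts, int n,
					   int min_order)
{
	unsigned long total = 0;

	assert(min_order >= 0);
	for (int k = min_order; k < n; k++)
		total += counts[k];
	return total;
}
```

Applied to the snapshots above, this gives 5223 free pages (about 20.4MB with 4KB pages) before and 46619 free pages (about 182MB) after, consistent with the free -tm output, while the number of free blocks of order 4 or higher grows from 49 to 403.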
APPLICATION LAUNCH BEHAVIOR:
-----------------------------------------------------
Difference = Before_shrink_memory - After_shrink_memory; a positive value
means the application launched faster after shrink_memory.

During first launch:
============================================================================
Application    Before_shrink_memory    After_shrink_memory   Difference
Camera         1.981                   1.86                   0.121
Gallery        1.276                   0.94                   0.336
Contacts       1.112                   0.941                  0.171
Messaging      0.886                   0.795                  0.091
Settings       1.257                   1.212                  0.045
Music          1.854                   2.098                 -0.244
Gmail          1.872                   1.935                 -0.063
Browser        2.569                   2.677                 -0.108
============================================================================
During re-launch:
============================================================================
Application    Before_shrink_memory    After_shrink_memory   Difference
Camera         1.248                   0.976                  0.272
Gallery        0.697                   0.633                  0.064
Contacts       0.506                   0.561                 -0.055
Messaging      0.533                   0.489                  0.044
Settings       0.833                   0.805                  0.028
Music          0.832                   0.769                  0.063
Gmail          0.913                   0.841                  0.072
Browser        0.579                   0.57                   0.009
============================================================================

Various other use cases where this can be used:
----------------------------------------------------------------------------
1) Just after system boot-up has finished, using the sysctl configuration
   from a boot-up script.
2) During system suspend, after suspend_freeze_processes()
   [kernel/power/suspend.c], based on certain conditions about
   fragmentation or the free memory state.
3) From the Android ION system heap driver, when order-4 allocations start
   failing, by calling shrink_all_memory() in a separate worker thread,
   based on certain conditions.
4) It can be combined with compact_memory to achieve better results
   against memory fragmentation.
5) It can be helpful in debugging and tuning various vm parameters.
6) It can help identify the maximum amount of memory that is reclaimable
   at any point in time, and how many higher-order pages could be formed
   from that reclaimable memory. This in turn helps in tuning the
   reserved memory needs of a system.
7) It can be helpful in properly tuning the swap size of the system.
   shrink_all_memory() sets may_swap = 1, which means unused pages can be
   swapped out. Thus, by running shrink_memory on a heavily loaded system,
   we can check how full the swap gets; that amount plus a 10% margin can
   be taken as the maximum swap size. Also, if ZRAM is used, the pages are
   compressed and stored for later use.
8) It can help more new applications to be launched without killing the
   older ones, by moving the least recently used pages to the swap area.
   Thus user data can be retained.
9) It can be part of a system tool to quickly defragment the entire
   system memory.
10) It may also help in reducing fragmentation within the CMA region.
11) More use cases can be identified.

Most importantly, it can be more effective when applied intelligently,
based on certain conditions. Whether and when to run it is left up to
the user.

Signed-off-by: Pintu Kumar
---
V2:
- Added min/max parameters for shrink_memory, as suggested by
  Heinrich Schuchardt.
- Error handling in sysctl_shrinkmem_handler for any value other than 1,
  as suggested by Heinrich Schuchardt.
- Fixed the HIBERNATION+SHRINK_MEMORY issue in shrink_all_memory, as
  suggested by Valdis.Kletnieks@vt.edu.
- Restored gfp_mask to the original, because of other dependencies.
  Also, adding GFP_RECLAIM_MASK does not affect anything.
- Verified power consumption data during shrink_memory, as suggested by
  Johannes Weiner.
- Verified application launch/re-launch scenarios before/after
  shrink_memory, as suggested by Xishi Qiu.
- Updated the commit message with examples and use cases.
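Use cases 4 and 6 can be driven from user space. The C sketch below is illustrative only: the helper names are hypothetical, the reclaim-then-compact ordering is an assumption (the text above does not fix an order), and /proc/sys/vm/compact_memory is only present with CONFIG_COMPACTION. It reads MemFree from /proc/meminfo-formatted text, so the amount recovered can be recorded, and writes 1 to each procfs control in turn, returning a negative errno on failure (for example -ENOENT when the kernel lacks CONFIG_SHRINK_MEMORY, -EACCES for non-root callers since the file mode is 0200, or -EINVAL if the value is rejected).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SHRINK_MEMORY_PATH  "/proc/sys/vm/shrink_memory"  /* CONFIG_SHRINK_MEMORY */
#define COMPACT_MEMORY_PATH "/proc/sys/vm/compact_memory" /* CONFIG_COMPACTION */

/*
 * Hypothetical helper: return the value (in kB) of @key from
 * /proc/meminfo-formatted @text, e.g. key "MemFree" for the line
 * "MemFree:   20768 kB", or -1 if the key is absent.
 */
static long meminfo_kb(const char *text, const char *key)
{
	size_t klen;
	const char *line = text;

	assert(text != NULL && key != NULL);
	klen = strlen(key);
	while (line != NULL && *line != '\0') {
		/* Require an exact key match so "Mem" does not match "MemFree". */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':')
			return strtol(line + klen + 1, NULL, 10);
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}
	return -1;
}

/*
 * Hypothetical helper: write "1\n" to a procfs control, like
 * "echo 1 > path". The buffered data is flushed by fclose(), so an error
 * returned by the kernel handler (e.g. EINVAL) is reported there.
 */
static int write_one(const char *path)
{
	FILE *f = fopen(path, "w");
	int err = 0;

	if (f == NULL)
		return -errno;
	if (fputs("1\n", f) == EOF)
		err = -EIO;
	if (fclose(f) != 0 && err == 0)
		err = errno ? -errno : -EIO;
	return err;
}

/* Use case 4 as a sketch: reclaim first, then compact. */
static int shrink_then_compact(const char *shrink_path, const char *compact_path)
{
	int err = write_one(shrink_path);

	if (err != 0)
		return err;
	return write_one(compact_path);
}
```

A root-run tool could sample MemFree from /proc/meminfo, call shrink_then_compact(SHRINK_MEMORY_PATH, COMPACT_MEMORY_PATH), and sample again; the difference approximates how much memory was reclaimable at that moment (use case 6), and /proc/buddyinfo shows the orders it came back in. Taking the paths as parameters keeps the helpers testable without touching the real controls.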
 Documentation/sysctl/vm.txt |   18 ++++++++++++++++++
 include/linux/swap.h        |    7 +++++++
 kernel/sysctl.c             |   16 ++++++++++++++++
 mm/Kconfig                  |    8 ++++++++
 mm/vmscan.c                 |   34 ++++++++++++++++++++++++++++++++--
 5 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 9832ec5..54eda3a 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -54,6 +54,7 @@ Currently, these files are in /proc/sys/vm:
 - page-cluster
 - panic_on_oom
 - percpu_pagelist_fraction
+- shrink_memory
 - stat_interval
 - swappiness
 - user_reserve_kbytes
@@ -718,6 +719,23 @@ sysctl, it will revert to this default behavior.
 
 ==============================================================
 
+shrink_memory
+
+This control is available only when CONFIG_SHRINK_MEMORY is set. It can be
+used to aggressively reclaim memory system-wide in one shot. Writing a value
+of 1 instructs the kernel to try to reclaim up to totalram_pages pages.
+For example, to reclaim as much memory as possible system-wide:
+# echo 1 > /proc/sys/vm/shrink_memory
+
+Writing any value other than 1 to shrink_memory fails with EINVAL.
+
+For more information about this control, see the following presentation
+from the Embedded Linux Conference, 2015:
+http://events.linuxfoundation.org/sites/events/files/slides/
+%5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
+
+==============================================================
+
 stat_interval
 
 The time interval between which vm statistics are updated.  The default
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 9a7adfb..6505b0b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -333,6 +333,13 @@ extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
 extern unsigned long vm_total_pages;
 
+#ifdef CONFIG_SHRINK_MEMORY
+extern int sysctl_shrink_memory;
+extern int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
+			void __user *buffer, size_t *length, loff_t *ppos);
+#endif
+
+
 #ifdef CONFIG_NUMA
 extern int zone_reclaim_mode;
 extern int sysctl_min_unmapped_ratio;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c566b56..e66581b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -275,6 +275,11 @@ static int min_extfrag_threshold;
 static int max_extfrag_threshold = 1000;
 #endif
 
+#ifdef CONFIG_SHRINK_MEMORY
+static int min_shrink_memory = 1;
+static int max_shrink_memory = 1;
+#endif
+
 static struct ctl_table kern_table[] = {
 	{
 		.procname	= "sched_child_runs_first",
@@ -1351,6 +1356,17 @@ static struct ctl_table vm_table[] = {
 	},
 #endif /* CONFIG_COMPACTION */
 
+#ifdef CONFIG_SHRINK_MEMORY
+	{
+		.procname	= "shrink_memory",
+		.data		= &sysctl_shrink_memory,
+		.maxlen		= sizeof(int),
+		.mode		= 0200,
+		.proc_handler	= sysctl_shrinkmem_handler,
+		.extra1		= &min_shrink_memory,
+		.extra2		= &max_shrink_memory,
+	},
+#endif
 	{
 		.procname	= "min_free_kbytes",
 		.data		= &min_free_kbytes,
diff --git a/mm/Kconfig b/mm/Kconfig
index b3a60ee..8e04bd9 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -657,3 +657,11 @@ config DEFERRED_STRUCT_PAGE_INIT
 	  when kswapd starts. This has a potential performance impact on
 	  processes running early in the lifetime of the systemm until kswapd
 	  finishes the initialisation.
+
+config SHRINK_MEMORY
+	bool "Allow for system-wide shrinking of memory"
+	default n
+	depends on MMU
+	help
+	  It enables support for system-wide memory reclaim in one shot using
+	  echo 1 > /proc/sys/vm/shrink_memory.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c8d8282..e802fa7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -58,6 +58,10 @@
 #define CREATE_TRACE_POINTS
 #include 
 
+#ifdef CONFIG_SHRINK_MEMORY
+#include 
+#endif
+
 struct scan_control {
 	/* How many pages shrink_list() should reclaim */
 	unsigned long nr_to_reclaim;
@@ -3557,7 +3561,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
 	wake_up_interruptible(&pgdat->kswapd_wait);
 }
 
-#ifdef CONFIG_HIBERNATION
+#if defined(CONFIG_HIBERNATION) || defined(CONFIG_SHRINK_MEMORY)
 /*
  * Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
  * freed pages.
@@ -3576,12 +3580,16 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 		.may_writepage = 1,
 		.may_unmap = 1,
 		.may_swap = 1,
-		.hibernation_mode = 1,
 	};
 	struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
 	struct task_struct *p = current;
 	unsigned long nr_reclaimed;
+
+	if (system_entering_hibernation())
+		sc.hibernation_mode = 1;
+	else
+		sc.hibernation_mode = 0;
 
 	p->flags |= PF_MEMALLOC;
 	lockdep_set_current_reclaim_state(sc.gfp_mask);
 	reclaim_state.reclaimed_slab = 0;
@@ -3597,6 +3605,28 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 }
 #endif /* CONFIG_HIBERNATION */
 
+#ifdef CONFIG_SHRINK_MEMORY
+int sysctl_shrink_memory;
+/* This is the entry point for system-wide shrink memory
+ * via /proc/sys/vm/shrink_memory */
+int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
+			void __user *buffer, size_t *length, loff_t *ppos)
+{
+	int ret;
+
+	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
+	if (ret)
+		return ret;
+
+	if (write) {
+		if (sysctl_shrink_memory & 1)
+			shrink_all_memory(totalram_pages);
+	}
+
+	return 0;
+}
+#endif
+
 /* It's optimal to keep kswapds on the same CPUs as their memory, but
    not required for correctness.  So if the last cpu in a node goes
    away, we get changed to run anywhere: as the first one comes back,
-- 
1.7.9.5