2015-07-03 13:33:20

by PINTU KUMAR

Subject: [PATCH 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

This patch provides 2 things:
1. Add a new control called shrink_memory in /proc/sys/vm/.
This control can be used to aggressively reclaim memory system-wide
in one shot from user space. A value of 1 will instruct the
kernel to reclaim up to totalram_pages pages in the system.
Example: echo 1 > /proc/sys/vm/shrink_memory

2. Enable the shrink_all_memory API in the kernel with a new CONFIG_SHRINK_MEMORY option.
Currently, the shrink_all_memory function is used only during hibernation.
With the new config we can make use of this API for the non-hibernation case
also without disturbing the hibernation case.
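The control in item 1 is driven by an ordinary write to a /proc file. A minimal userspace sketch of what `echo 1 > /proc/sys/vm/shrink_memory` amounts to, assuming a kernel that carries this patch; the helper name `write_sysctl_string()` is hypothetical, and writing the real file would need root because the entry is created with mode 0200.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a short value such as "1\n" to a sysctl
 * file, which is what `echo 1 > /proc/sys/vm/shrink_memory` does.
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_sysctl_string(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);

    size_t len = strlen(value);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    ssize_t n = write(fd, value, len);
    int err = 0;
    if (n < 0)
        err = -errno;      /* e.g. -EINVAL if the handler rejects the value */
    else if ((size_t)n != len)
        err = -EIO;        /* treat a short write as a failure */

    if (close(fd) < 0 && err == 0)
        err = -errno;
    return err;
}
```

On a patched kernel, `write_sysctl_string("/proc/sys/vm/shrink_memory", "1\n")` would start the reclaim; the error path matters once the handler validates its input.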

The detailed paper was presented at the Embedded Linux Conference, Mar-2015
http://events.linuxfoundation.org/sites/events/files/slides/
%5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf

Scenarios where this can be useful:
1) Can be invoked just after system boot-up is finished.
2) Can be invoked just before entering system-wide suspend.
3) Can be invoked from the kernel when order-4 allocations start failing.
4) Can help to completely avoid or delay the kernel OOM condition.
5) Can be developed as a system tool to quickly defragment the entire system
from user space, without the need to kill any application.

Signed-off-by: Pintu Kumar <[email protected]>
---
Documentation/sysctl/vm.txt | 16 ++++++++++++++++
include/linux/swap.h | 7 +++++++
kernel/sysctl.c | 9 +++++++++
mm/Kconfig | 8 ++++++++
mm/vmscan.c | 23 +++++++++++++++++++++--
5 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 9832ec5..a959ad1 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -54,6 +54,7 @@ Currently, these files are in /proc/sys/vm:
- page-cluster
- panic_on_oom
- percpu_pagelist_fraction
+- shrink_memory
- stat_interval
- swappiness
- user_reserve_kbytes
@@ -718,6 +719,21 @@ sysctl, it will revert to this default behavior.

==============================================================

+shrink_memory
+
+This control is available only when CONFIG_SHRINK_MEMORY is set. This control
+can be used to aggressively reclaim memory system-wide in one shot. A value of
+1 will instruct the kernel to reclaim as much as totalram_pages in the system.
+For example, to reclaim all memory system-wide we can do:
+# echo 1 > /proc/sys/vm/shrink_memory
+
+For more information about this control, please visit the following
+presentation in embedded linux conference, 2015.
+http://events.linuxfoundation.org/sites/events/files/slides/
+%5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
+
+==============================================================
+
stat_interval

The time interval between which vm statistics are updated. The default
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 9a7adfb..6505b0b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -333,6 +333,13 @@ extern int vm_swappiness;
extern int remove_mapping(struct address_space *mapping, struct page *page);
extern unsigned long vm_total_pages;

+#ifdef CONFIG_SHRINK_MEMORY
+extern int sysctl_shrink_memory;
+extern int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *length, loff_t *ppos);
+#endif
+
+
#ifdef CONFIG_NUMA
extern int zone_reclaim_mode;
extern int sysctl_min_unmapped_ratio;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c566b56..2895099 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1351,6 +1351,15 @@ static struct ctl_table vm_table[] = {
},

#endif /* CONFIG_COMPACTION */
+#ifdef CONFIG_SHRINK_MEMORY
+ {
+ .procname = "shrink_memory",
+ .data = &sysctl_shrink_memory,
+ .maxlen = sizeof(int),
+ .mode = 0200,
+ .proc_handler = sysctl_shrinkmem_handler,
+ },
+#endif
{
.procname = "min_free_kbytes",
.data = &min_free_kbytes,
diff --git a/mm/Kconfig b/mm/Kconfig
index b3a60ee..8e04bd9 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -657,3 +657,11 @@ config DEFERRED_STRUCT_PAGE_INIT
when kswapd starts. This has a potential performance impact on
processes running early in the lifetime of the systemm until kswapd
finishes the initialisation.
+
+config SHRINK_MEMORY
+ bool "Allow for system-wide shrinking of memory"
+ default n
+ depends on MMU
+ help
+ It enables support for system-wide memory reclaim in one shot using
+ echo 1 > /proc/sys/vm/shrink_memory.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c8d8282..837b88d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3557,7 +3557,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
wake_up_interruptible(&pgdat->kswapd_wait);
}

-#ifdef CONFIG_HIBERNATION
+#if defined CONFIG_HIBERNATION || CONFIG_SHRINK_MEMORY
/*
* Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
* freed pages.
@@ -3571,12 +3571,17 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
struct reclaim_state reclaim_state;
struct scan_control sc = {
.nr_to_reclaim = nr_to_reclaim,
+#ifdef CONFIG_SHRINK_MEMORY
+ .gfp_mask = (GFP_HIGHUSER_MOVABLE | GFP_RECLAIM_MASK),
+ .hibernation_mode = 0,
+#else
.gfp_mask = GFP_HIGHUSER_MOVABLE,
+ .hibernation_mode = 1,
+#endif
.priority = DEF_PRIORITY,
.may_writepage = 1,
.may_unmap = 1,
.may_swap = 1,
- .hibernation_mode = 1,
};
struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
struct task_struct *p = current;
@@ -3597,6 +3602,20 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
}
#endif /* CONFIG_HIBERNATION */

+#ifdef CONFIG_SHRINK_MEMORY
+int sysctl_shrink_memory;
+/* This is the entry point for system-wide shrink memory
++via /proc/sys/vm/shrink_memory */
+int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *length, loff_t *ppos)
+{
+ if (write)
+ shrink_all_memory(totalram_pages);
+
+ return 0;
+}
+#endif
+
/* It's optimal to keep kswapds on the same CPUs as their memory, but
not required for correctness. So if the last cpu in a node goes
away, we get changed to run anywhere: as the first one comes back,
--
1.7.9.5
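Two details of the mm/vmscan.c hunk are easy to miss. The new guard `#if defined CONFIG_HIBERNATION || CONFIG_SHRINK_MEMORY` applies `defined` only to CONFIG_HIBERNATION, while CONFIG_SHRINK_MEMORY is tested by value. That happens to work, because Kconfig defines an enabled bool option to 1 and an undefined identifier in `#if` evaluates to 0, but it trips `-Wundef` and reads like a mistake. Also, the closing `#endif /* CONFIG_HIBERNATION */` comment no longer describes the condition. The self-contained sketch below simulates the Kconfig macros and shows that the patch's form and the conventional `defined(A) || defined(B)` form select the same branch in both cases; inside the kernel, `IS_ENABLED()` from <linux/kconfig.h> is another common spelling.

```c
#include <assert.h>

/* Simulate autoconf.h: enabled bool options are defined to 1,
 * disabled ones are simply not defined. */
#define CONFIG_SHRINK_MEMORY 1
/* CONFIG_HIBERNATION is intentionally left undefined. */

/* Form used in the patch: `defined` binds only to CONFIG_HIBERNATION. */
#if defined CONFIG_HIBERNATION || CONFIG_SHRINK_MEMORY
enum { PATCH_FORM_ENABLED = 1 };
#else
enum { PATCH_FORM_ENABLED = 0 };
#endif

/* Conventional form: test every symbol with defined(). */
#if defined(CONFIG_HIBERNATION) || defined(CONFIG_SHRINK_MEMORY)
enum { IDIOMATIC_FORM_ENABLED = 1 };
#else
enum { IDIOMATIC_FORM_ENABLED = 0 };
#endif

/* Now simulate a kernel with neither option enabled. */
#undef CONFIG_SHRINK_MEMORY

/* The undefined CONFIG_SHRINK_MEMORY silently becomes 0 here. */
#if defined CONFIG_HIBERNATION || CONFIG_SHRINK_MEMORY
enum { PATCH_FORM_NEITHER = 1 };
#else
enum { PATCH_FORM_NEITHER = 0 };
#endif

#if defined(CONFIG_HIBERNATION) || defined(CONFIG_SHRINK_MEMORY)
enum { IDIOMATIC_FORM_NEITHER = 1 };
#else
enum { IDIOMATIC_FORM_NEITHER = 0 };
#endif

/* Both spellings agree in both configurations. */
_Static_assert(PATCH_FORM_ENABLED == IDIOMATIC_FORM_ENABLED, "enabled case");
_Static_assert(PATCH_FORM_NEITHER == IDIOMATIC_FORM_NEITHER, "disabled case");
```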


2015-07-03 17:22:07

by Heinrich Schuchardt

Subject: Re: [PATCH 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

On 03.07.2015 15:20, Pintu Kumar wrote:
> This patch provides 2 things:
> 1. Add a new control called shrink_memory in /proc/sys/vm/.
> This control can be used to aggressively reclaim memory system-wide
> in one shot from user space. A value of 1 will instruct the
> kernel to reclaim up to totalram_pages pages in the system.
> Example: echo 1 > /proc/sys/vm/shrink_memory
>
> 2. Enable the shrink_all_memory API in the kernel with a new CONFIG_SHRINK_MEMORY option.
> Currently, the shrink_all_memory function is used only during hibernation.
> With the new config we can make use of this API for the non-hibernation case
> also without disturbing the hibernation case.
>
> The detailed paper was presented at the Embedded Linux Conference, Mar-2015
> http://events.linuxfoundation.org/sites/events/files/slides/
> %5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
>
> Scenarios where this can be useful:
> 1) Can be invoked just after system boot-up is finished.
> 2) Can be invoked just before entering system-wide suspend.
> 3) Can be invoked from the kernel when order-4 allocations start failing.
> 4) Can help to completely avoid or delay the kernel OOM condition.
> 5) Can be developed as a system tool to quickly defragment the entire system
> from user space, without the need to kill any application.
>
> Signed-off-by: Pintu Kumar <[email protected]>
> ---
> Documentation/sysctl/vm.txt | 16 ++++++++++++++++
> include/linux/swap.h | 7 +++++++
> kernel/sysctl.c | 9 +++++++++
> mm/Kconfig | 8 ++++++++
> mm/vmscan.c | 23 +++++++++++++++++++++--
> 5 files changed, 61 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
> index 9832ec5..a959ad1 100644
> --- a/Documentation/sysctl/vm.txt
> +++ b/Documentation/sysctl/vm.txt
> @@ -54,6 +54,7 @@ Currently, these files are in /proc/sys/vm:
> - page-cluster
> - panic_on_oom
> - percpu_pagelist_fraction
> +- shrink_memory
> - stat_interval
> - swappiness
> - user_reserve_kbytes
> @@ -718,6 +719,21 @@ sysctl, it will revert to this default behavior.
>
> ==============================================================
>
> +shrink_memory
> +
> +This control is available only when CONFIG_SHRINK_MEMORY is set. This control
> +can be used to aggressively reclaim memory system-wide in one shot. A value of
> +1 will instruct the kernel to reclaim as much as totalram_pages in the system.
> +For example, to reclaim all memory system-wide we can do:
> +# echo 1 > /proc/sys/vm/shrink_memory

The API should be as restrictive as possible to allow for extensibility.

You describe "1" as the only used value. So, please add here:

"If any other value than 1 is written to shrink_memory an error
EINVAL occurs."
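The rule proposed here (accept only 1, reject everything else with EINVAL) can be pinned down precisely. A userspace sketch of that validation, assuming the written buffer arrives as a NUL-terminated string as produced by `echo` (including its trailing newline); the function name is hypothetical, and in the kernel this job would fall to proc_dointvec_minmax() with both bounds set to 1 rather than to hand-written parsing.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical validator for the value written to shrink_memory:
 * accept exactly the integer 1, optionally followed by the newline that
 * `echo` appends, and reject anything else with -EINVAL.
 * Returns 0 on success. Like strtol(), it tolerates leading whitespace
 * and a leading '+'.
 */
static int parse_shrink_memory_value(const char *buf)
{
    char *end;
    long val;

    assert(buf != NULL);
    errno = 0;
    val = strtol(buf, &end, 10);
    if (end == buf || errno == ERANGE)
        return -EINVAL;   /* no digits at all, or out of range for long */
    if (*end == '\n')
        end++;            /* allow the trailing newline from echo */
    if (*end != '\0')
        return -EINVAL;   /* trailing garbage such as "1x" */
    return (val == 1) ? 0 : -EINVAL;   /* only 1 is a defined request */
}
```

Keeping every other value an error leaves room to give, say, 2 a meaning later without breaking existing users.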

> +
> +For more information about this control, please visit the following
> +presentation in embedded linux conference, 2015.
> +http://events.linuxfoundation.org/sites/events/files/slides/
> +%5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
> +
> +==============================================================
> +
> stat_interval
>
> The time interval between which vm statistics are updated. The default
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 9a7adfb..6505b0b 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -333,6 +333,13 @@ extern int vm_swappiness;
> extern int remove_mapping(struct address_space *mapping, struct page *page);
> extern unsigned long vm_total_pages;
>
> +#ifdef CONFIG_SHRINK_MEMORY
> +extern int sysctl_shrink_memory;
> +extern int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
> + void __user *buffer, size_t *length, loff_t *ppos);
> +#endif
> +
> +
> #ifdef CONFIG_NUMA
> extern int zone_reclaim_mode;
> extern int sysctl_min_unmapped_ratio;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index c566b56..2895099 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1351,6 +1351,15 @@ static struct ctl_table vm_table[] = {
> },
>
> #endif /* CONFIG_COMPACTION */
> +#ifdef CONFIG_SHRINK_MEMORY
> + {
> + .procname = "shrink_memory",
> + .data = &sysctl_shrink_memory,
> + .maxlen = sizeof(int),
> + .mode = 0200,
> + .proc_handler = sysctl_shrinkmem_handler,

Supply the value limits.

int min_shrink_memory = 1;
int max_shrink_memory = 1;

.extra1 = &min_shrink_memory,
.extra2 = &max_shrink_memory,
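The .extra1/.extra2 fields give proc_dointvec_minmax() an inclusive range to enforce, and a NULL bound means no limit on that side. The sketch below mirrors that range check in userspace with a hypothetical `struct minmax_entry` standing in for the relevant ctl_table fields (it is not the kernel's struct); it also declares the bounds `static`, as is usual for file-local limits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the ctl_table fields used by a minmax handler. */
struct minmax_entry {
    int *data;     /* where an accepted value is stored (.data)  */
    int *extra1;   /* inclusive lower bound, or NULL (.extra1)   */
    int *extra2;   /* inclusive upper bound, or NULL (.extra2)   */
};

static int min_shrink_memory = 1;
static int max_shrink_memory = 1;
static int sysctl_shrink_memory;

static struct minmax_entry shrink_memory_entry = {
    .data   = &sysctl_shrink_memory,
    .extra1 = &min_shrink_memory,
    .extra2 = &max_shrink_memory,
};

/* Store val only if it lies within [*extra1, *extra2]; returns 0 or -EINVAL. */
static int store_minmax(const struct minmax_entry *e, int val)
{
    assert(e != NULL && e->data != NULL);
    if (e->extra1 && val < *e->extra1)
        return -EINVAL;
    if (e->extra2 && val > *e->extra2)
        return -EINVAL;
    *e->data = val;
    return 0;
}
```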

> + },
> +#endif
> {
> .procname = "min_free_kbytes",
> .data = &min_free_kbytes,
> diff --git a/mm/Kconfig b/mm/Kconfig
> index b3a60ee..8e04bd9 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -657,3 +657,11 @@ config DEFERRED_STRUCT_PAGE_INIT
> when kswapd starts. This has a potential performance impact on
> processes running early in the lifetime of the systemm until kswapd
> finishes the initialisation.
> +
> +config SHRINK_MEMORY
> + bool "Allow for system-wide shrinking of memory"
> + default n
> + depends on MMU
> + help
> + It enables support for system-wide memory reclaim in one shot using
> + echo 1 > /proc/sys/vm/shrink_memory.
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c8d8282..837b88d 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -3557,7 +3557,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
> wake_up_interruptible(&pgdat->kswapd_wait);
> }
>
> -#ifdef CONFIG_HIBERNATION
> +#if defined CONFIG_HIBERNATION || CONFIG_SHRINK_MEMORY
> /*
> * Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
> * freed pages.
> @@ -3571,12 +3571,17 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
> struct reclaim_state reclaim_state;
> struct scan_control sc = {
> .nr_to_reclaim = nr_to_reclaim,
> +#ifdef CONFIG_SHRINK_MEMORY
> + .gfp_mask = (GFP_HIGHUSER_MOVABLE | GFP_RECLAIM_MASK),
> + .hibernation_mode = 0,
> +#else
> .gfp_mask = GFP_HIGHUSER_MOVABLE,
> + .hibernation_mode = 1,
> +#endif
> .priority = DEF_PRIORITY,
> .may_writepage = 1,
> .may_unmap = 1,
> .may_swap = 1,
> - .hibernation_mode = 1,
> };
> struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
> struct task_struct *p = current;
> @@ -3597,6 +3602,20 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
> }
> #endif /* CONFIG_HIBERNATION */
>
> +#ifdef CONFIG_SHRINK_MEMORY
> +int sysctl_shrink_memory;
> +/* This is the entry point for system-wide shrink memory
> ++via /proc/sys/vm/shrink_memory */
> +int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
> + void __user *buffer, size_t *length, loff_t *ppos)
> +{

Check if *buffer contains "1". If the value is not "1" return -EINVAL.

The check can be done using function proc_dointvec_minmax().
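The point here is ordering: validate the written value and propagate any error before starting reclaim, whereas the posted handler ignores the buffer and always returns 0. In kernel terms that would mean calling proc_dointvec_minmax(table, write, buffer, length, ppos) first and returning its result if it is non-zero. The userspace sketch below models only that control flow; `mock_shrink_all_memory()`, `parse_one()` and the handler name are hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static unsigned long reclaim_calls;   /* how often the mock reclaim ran */

/* Hypothetical stand-in for shrink_all_memory(): it only counts calls. */
static unsigned long mock_shrink_all_memory(unsigned long nr_to_reclaim)
{
    reclaim_calls++;
    return nr_to_reclaim;
}

/* Accept exactly "1" (optionally followed by '\n'), i.e. min = max = 1. */
static int parse_one(const char *buf)
{
    char *end;

    assert(buf != NULL);
    errno = 0;
    long val = strtol(buf, &end, 10);
    if (end == buf || errno == ERANGE)
        return -EINVAL;
    if (*end == '\n')
        end++;
    return (*end == '\0' && val == 1) ? 0 : -EINVAL;
}

/*
 * Validate first and return the error unchanged; only a successful write
 * triggers reclaim. Reads are a no-op here (the real entry is mode 0200).
 */
static int shrinkmem_handler_sketch(int write, const char *buf,
                                    unsigned long totalram_pages)
{
    if (!write)
        return 0;

    int ret = parse_one(buf);
    if (ret)
        return ret;   /* reject before doing any work */

    (void)mock_shrink_all_memory(totalram_pages);
    return 0;
}
```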

Best regards

Heinrich Schuchardt

> + if (write)
> + shrink_all_memory(totalram_pages);
> +
> + return 0;
> +}
> +#endif
> +
> /* It's optimal to keep kswapds on the same CPUs as their memory, but
> not required for correctness. So if the last cpu in a node goes
> away, we get changed to run anywhere: as the first one comes back,
>

2015-07-03 18:39:31

by Johannes Weiner

Subject: Re: [PATCH 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

On Fri, Jul 03, 2015 at 06:50:07PM +0530, Pintu Kumar wrote:
> This patch provides 2 things:
> 1. Add a new control called shrink_memory in /proc/sys/vm/.
> This control can be used to aggressively reclaim memory system-wide
> in one shot from user space. A value of 1 will instruct the
> kernel to reclaim up to totalram_pages pages in the system.
> Example: echo 1 > /proc/sys/vm/shrink_memory
>
> 2. Enable the shrink_all_memory API in the kernel with a new CONFIG_SHRINK_MEMORY option.
> Currently, the shrink_all_memory function is used only during hibernation.
> With the new config we can make use of this API for the non-hibernation case
> also without disturbing the hibernation case.
>
> The detailed paper was presented at the Embedded Linux Conference, Mar-2015
> http://events.linuxfoundation.org/sites/events/files/slides/
> %5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
>
> Scenarios where this can be useful:
> 1) Can be invoked just after system boot-up is finished.

The allocator automatically reclaims when memory is needed, that's why
the metrics quoted in those slides, free pages and fragmentation level,
don't really mean much. We don't care how much memory is free or how
fragmented it is UNTIL somebody actually asks for it. The only metric
that counts is the allocation success ratio (and possibly the latency).

> 2) Can be invoked just before entering system-wide suspend.

Why is that? Suspend already allocates as much as it needs to create
the system image.

> 3) Can be invoked from the kernel when order-4 allocations start failing.

We have compaction for that, and compaction invokes page reclaim
automatically to satisfy its need for free pages.

> 4) Can help to completely avoid or delay the kernel OOM condition.

That's not how OOM works. An OOM is triggered when there is demand for
memory but no more pages to reclaim, telling the kernel to look harder
will not change that.

> 5) Can be developed as a system tool to quickly defragment the entire system
> from user space, without the need to kill any application.

Again, the kernel automatically reclaims and compacts memory on demand.
If the existing mechanisms don't do this properly, and you have actual
problems with them, they should be reported and fixed, not bypassed.
But the metrics you seem to base this change on are not representative
of something that should matter in practice.

2015-07-05 12:33:20

by Johannes Weiner

Subject: Re: [PATCH 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

On Sat, Jul 04, 2015 at 06:04:37AM +0000, PINTU KUMAR wrote:
> >On Fri, Jul 03, 2015 at 06:50:07PM +0530, Pintu Kumar wrote:
> >> This patch provides 2 things:
> >> 1. Add new control called shrink_memory in /proc/sys/vm/.
> >> This control can be used to aggressively reclaim memory system-wide
> >> in one shot from the user space. A value of 1 will instruct the
> >> kernel to reclaim as much as totalram_pages in the system.
> >> Example: echo 1 > /proc/sys/vm/shrink_memory
> >>
> >> 2. Enable shrink_all_memory API in kernel with new CONFIG_SHRINK_MEMORY.
> >> Currently, shrink_all_memory function is used only during hibernation.
> >> With the new config we can make use of this API for non-hibernation case
> >> also without disturbing the hibernation case.
> >>
> >> The detailed paper was presented in Embedded Linux Conference, Mar-2015
> >> http://events.linuxfoundation.org/sites/events/files/slides/
> >> %5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
> >>
> >> Scenarios were this can be used and helpful are:
> >> 1) Can be invoked just after system boot-up is finished.
> >
> >The allocator automatically reclaims when memory is needed, that's why
> >the metrics quoted in those slides, free pages and fragmentation level,
> >don't really mean much. We don't care how much memory is free or how
> >fragmented it is UNTIL somebody actually asks for it. The only metric
> >that counts is the allocation success ratio (and possibly the latency).
>
> Yes, the allocator automatically reclaims memory, but only in the
> slowpath, and it reclaims only enough to satisfy the current
> allocation. That means all future higher-order allocations
> will enter the slowpath again and again. Over time
> (with multiple application launches), the higher orders (2^4 and
> above) will be gone. Entering the slowpath means that the
> first allocation attempt has already failed. In the slowpath the
> sequence is: kswapd -> compaction -> then direct reclaim. Thus
> entering the slowpath again and again is a costly operation.
>
> Thus keeping free memory ready in higher-order pages will help
> the first allocation attempt succeed.

High order allocation fastpath sounds like a bad idea, especially on
embedded devices. It takes a lot of work to create higher order
pages, so anything that relies on being able to allocate them
frequently and quickly is going to be very expensive.

But even if you need higher order pages on a regular basis, compaction
is *way* more efficient and directed than what you are proposing. My
phone has 2G of memory, which is over half a million pages. What
would it do to my battery life if you told the VM on a regular basis
to scan the LRUs until it has reclaimed half a million pages?
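The scale mentioned here is easy to check. Assuming 4 KiB pages and reading "2G" as 2 GiB, the sketch below converts a memory size to a page count and shows how large the order-4 (2^4 page) blocks discussed in this thread are. Note that `totalram_pages`, which the patch passes to shrink_all_memory(), is somewhat smaller than the physical size because of memory reserved by the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL   /* assumption: 4 KiB pages */

/* Number of whole pages in a memory region of the given size. */
static uint64_t bytes_to_pages(uint64_t bytes)
{
    return bytes / PAGE_SIZE_BYTES;
}

/* A buddy block of order n consists of 2^n contiguous pages. */
static uint64_t order_to_pages(unsigned int order)
{
    assert(order < 64);
    return 1ULL << order;
}

static uint64_t order_to_bytes(unsigned int order)
{
    return order_to_pages(order) * PAGE_SIZE_BYTES;
}
```

So 2 GiB is 524,288 pages, and a single order-4 request needs 16 physically contiguous pages (64 KiB).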

> The scenario discussed here is invoking shrink_memory from user
> space as soon as system boot is finished, because, as
> per my observation, the buffers and caches accumulated during
> boot-up are not very helpful for later application
> launches. Reclaiming all memory in one shot after boot-up will
> help grab higher-order pages and free lots of memory. Also, the
> reclaimed memory stays as actual free memory, and the cache that
> gets accumulated after application launch will get more
> hits. It is like a slightly more advanced version of drop_caches.

The buffers and cache are trivial to reclaim and compact, so that
shouldn't affect allocation success at all. And even allocation
latency should be reasonable.

drop_caches is a development/debugging tool for kernel developers, not
a tool to implement userspace memory management. If you find you need
to use it on a regular basis because of performance issues, then
please file a bug report.

> >> 2) Can be invoked just before entering system-wide suspend.
> >
> >Why is that? Suspend already allocates as much as it needs to create
> >the system image.
>
> Sorry, but I think you got it wrong here. We are not talking about
> the snapshot image creation part that belongs to hibernation. We are
> talking about the mobile world, where the system gets suspended when
> it is kept idle for a longer time. Hibernation does not come into play
> here. The idea is that shrink_memory can be best utilized when
> the system is not doing anything useful and is going from idle to
> suspend. In this scenario, we can check the state of free memory and
> perform the system-wide reclaim if necessary. Thus when the system
> resumes, it will have enough free memory. Again, this is
> mainly for the embedded world, where hibernation is not enabled. In
> the normal world, this already happens during hibernation snapshot image
> creation.

The reason they are suspending is to conserve energy, now? This is an
outrageous amount of work you propose should be done when the system
goes idle. Generally, proactive work tends to be less efficient than
on-demand work due to overproduction, so the more power-restrained
your system, the lazier and just-in-time you should be.

If your higher-order allocation latency really is an issue, at least
use targeted background compaction. But again, everybody would be
better off if you didn't rely on frequent higher-order allocations,
because they require a lot of CPU-intensive work that consumes a lot
of power, whether you schedule that work on-demand or proactively.

> >> 3) Can be invoked from the kernel when order-4 allocations start failing.
> >
> >We have compaction for that, and compaction invokes page reclaim
> >automatically to satisfy its need for free pages.
>
> That is not always true. Compaction may not always be
> successful. Again, this is related to the slowpath. When order-4 starts
> failing very often, that means all higher orders have dropped to 0, so the
> system will be entering the slowpath again and again, doing swap,
> compaction and reclaim most of the time. And even for compaction,
> there is a knob to trigger compaction from user space:
> #echo 1 > /proc/sys/vm/compact_memory

At least that's not a cache-destructive operation and just compacts
already free pages but, just like drop_caches, you shouldn't ever have
to use this in production.

> >> 4) Can help to completely avoid or delay the kernel OOM condition.
> >
> >That's not how OOM works. An OOM is triggered when there is demand for
> >memory but no more pages to reclaim, telling the kernel to look harder
> >will not change that.
>
> Yes, I know this. I am not talking about calling shrink_memory after OOM,
> but much earlier, when the first higher-order allocation attempt starts failing.
> This will delay the OOM to a much later stage.

That's not how OOM works *at all*. OOM happens when all the pages are
tied up in places where they can't be reclaimed. It has nothing to do
with fragmentation (OOM is not even defined for higher order pages) or
reclaim timing (since reclaim can't reclaim unreclaimable pages. heh).

You're really not making a lot of sense here.

> >> 5) Can be developed as a system tool to quickly defragment the entire system
> >> from user space, without the need to kill any application.
> >
> >Again, the kernel automatically reclaims and compacts memory on demand.
> >If the existing mechanisms don't do this properly, and you have actual
> >problems with them, they should be reported and fixed, not bypassed.
> >But the metrics you seem to base this change on are not representative
> >of something that should matter in practice.
>
> There is no guarantee that compaction/reclaim
> _did_some_progress_ always yields results on the fly. It takes
> some time to sync with the free memory. Thus keeping the free
> lists ready beforehand will be much more helpful.

We can always make compaction more aggressive with certain GFP flags
and tell it to wait for delayed memory frees etc.

> Anyway, the use case here is to develop a system utility which can
> perform compaction/reclaim/compaction aggressively. It's an
> additional idea that anybody interested can develop.

I'm having a hard time seeing a clear usecase from your proposal, and
the implementation is too heavy-handed and destructive to be generally
useful as a memory management tool in real life.

2015-07-05 08:46:31

by Valdis Klētnieks

Subject: Re: [PATCH 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

On Fri, 03 Jul 2015 18:50:07 +0530, Pintu Kumar said:
> This patch provides 2 things:

> 2. Enable the shrink_all_memory API in the kernel with a new CONFIG_SHRINK_MEMORY option.
> Currently, the shrink_all_memory function is used only during hibernation.
> With the new config we can make use of this API for the non-hibernation case
> also without disturbing the hibernation case.

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c

> @@ -3571,12 +3571,17 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
> struct reclaim_state reclaim_state;
> struct scan_control sc = {
> .nr_to_reclaim = nr_to_reclaim,
> +#ifdef CONFIG_SHRINK_MEMORY
> + .gfp_mask = (GFP_HIGHUSER_MOVABLE | GFP_RECLAIM_MASK),
> + .hibernation_mode = 0,
> +#else
> .gfp_mask = GFP_HIGHUSER_MOVABLE,
> + .hibernation_mode = 1,
> +#endif


That looks like a bug just waiting to happen. What happens if we
call an actual hibernation mode in a SHRINK_MEMORY=y kernel, and it finds
an extra gfp mask bit set, and hibernation_mode set to an unexpected value?
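The concern follows from the `#ifdef` choosing the scan settings at build time: with CONFIG_SHRINK_MEMORY=y every caller, hibernation included, gets the extra gfp bits and hibernation_mode = 0. One way to avoid that, sketched here with hypothetical mock types and flag values rather than the real struct scan_control and gfp_t bits, is to let the caller state its intent at run time (for example through a separate entry point or an extra parameter), so the hibernation path keeps its original settings in every configuration.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values only; the real gfp_t bits differ. */
#define MOCK_GFP_HIGHUSER_MOVABLE 0x1u
#define MOCK_GFP_RECLAIM_MASK     0x2u

/* Hypothetical subset of struct scan_control. */
struct mock_scan_control {
    unsigned long nr_to_reclaim;
    unsigned int gfp_mask;
    int hibernation_mode;
};

/* The caller chooses the mode at run time instead of the build config. */
static struct mock_scan_control make_scan_control(unsigned long nr,
                                                  bool for_hibernation)
{
    struct mock_scan_control sc = {
        .nr_to_reclaim = nr,
        .gfp_mask = for_hibernation
                        ? MOCK_GFP_HIGHUSER_MOVABLE
                        : (MOCK_GFP_HIGHUSER_MOVABLE | MOCK_GFP_RECLAIM_MASK),
        .hibernation_mode = for_hibernation ? 1 : 0,
    };
    return sc;
}
```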



2015-07-06 10:24:03

by Xishi Qiu

Subject: Re: [PATCH 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

On 2015/7/3 21:20, Pintu Kumar wrote:

> This patch provides 2 things:
> 1. Add new control called shrink_memory in /proc/sys/vm/.
> This control can be used to aggressively reclaim memory system-wide
> in one shot from the user space. A value of 1 will instruct the
> kernel to reclaim as much as totalram_pages in the system.
> Example: echo 1 > /proc/sys/vm/shrink_memory
>
> 2. Enable shrink_all_memory API in kernel with new CONFIG_SHRINK_MEMORY.
> Currently, shrink_all_memory function is used only during hibernation.
> With the new config we can make use of this API for non-hibernation case
> also without disturbing the hibernation case.
>
> The detailed paper was presented in Embedded Linux Conference, Mar-2015
> http://events.linuxfoundation.org/sites/events/files/slides/
> %5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
>
> Scenarios were this can be used and helpful are:
> 1) Can be invoked just after system boot-up is finished.
> 2) Can be invoked just before entering entire system suspend.
> 3) Can be invoked from kernel when order-4 pages starts failing.
> 4) Can be helpful to completely avoid or delay the kerenl OOM condition.
> 5) Can be developed as a system-tool to quickly defragment entire system
> from user space, without the need to kill any application.
>

Hi Pintu,

How about increasing min_free_kbytes and the Android lowmemorykiller's levels?

Thanks,
Xishi Qiu
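This alternative uses existing tunables instead of new code. Before raising vm.min_free_kbytes (for example with `sysctl -w vm.min_free_kbytes=<value>` as root), it helps to read the current value. A minimal sketch of reading a single-integer sysctl file such as /proc/sys/vm/min_free_kbytes; the helper name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: read one integer from a sysctl-style file.
 * Returns 0 and stores the value in *out, or a negative errno value.
 */
static int read_sysctl_long(const char *path, long *out)
{
    assert(path != NULL && out != NULL);

    FILE *f = fopen(path, "r");
    if (!f)
        return -errno;

    long v;
    int n = fscanf(f, "%ld", &v);
    fclose(f);
    if (n != 1)
        return -EINVAL;   /* file did not start with an integer */

    *out = v;
    return 0;
}
```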

2015-07-06 13:56:10

by PINTU KUMAR

Subject: RE: [PATCH 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature


> -----Original Message-----
> From: Heinrich Schuchardt
> Sent: Friday, July 03, 2015 10:51 PM
> To: Pintu Kumar
> Subject: Re: [PATCH 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature
>
> On 03.07.2015 15:20, Pintu Kumar wrote:
> > This patch provides 2 things:
> > 1. Add a new control called shrink_memory in /proc/sys/vm/.
> > This control can be used to aggressively reclaim memory system-wide
> > in one shot from user space. A value of 1 will instruct the
> > kernel to reclaim up to totalram_pages pages in the system.
> > Example: echo 1 > /proc/sys/vm/shrink_memory
> >
> > 2. Enable the shrink_all_memory API in the kernel with a new CONFIG_SHRINK_MEMORY option.
> > Currently, the shrink_all_memory function is used only during hibernation.
> > With the new config we can make use of this API for the non-hibernation case
> > also without disturbing the hibernation case.
> >
> > The detailed paper was presented at the Embedded Linux Conference, Mar-2015
> > http://events.linuxfoundation.org/sites/events/files/slides/
> > %5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
> >
> > Scenarios where this can be useful:
> > 1) Can be invoked just after system boot-up is finished.
> > 2) Can be invoked just before entering system-wide suspend.
> > 3) Can be invoked from the kernel when order-4 allocations start failing.
> > 4) Can help to completely avoid or delay the kernel OOM condition.
> > 5) Can be developed as a system tool to quickly defragment the entire system
> > from user space, without the need to kill any application.
> >
> > Signed-off-by: Pintu Kumar <[email protected]>
> > ---
> > Documentation/sysctl/vm.txt | 16 ++++++++++++++++
> > include/linux/swap.h | 7 +++++++
> > kernel/sysctl.c | 9 +++++++++
> > mm/Kconfig | 8 ++++++++
> > mm/vmscan.c | 23 +++++++++++++++++++++--
> > 5 files changed, 61 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
> > index 9832ec5..a959ad1 100644
> > --- a/Documentation/sysctl/vm.txt
> > +++ b/Documentation/sysctl/vm.txt
> > @@ -54,6 +54,7 @@ Currently, these files are in /proc/sys/vm:
> > - page-cluster
> > - panic_on_oom
> > - percpu_pagelist_fraction
> > +- shrink_memory
> > - stat_interval
> > - swappiness
> > - user_reserve_kbytes
> > @@ -718,6 +719,21 @@ sysctl, it will revert to this default behavior.
> >
> > ==============================================================
> >
> > +shrink_memory
> > +
> > +This control is available only when CONFIG_SHRINK_MEMORY is set. This control
> > +can be used to aggressively reclaim memory system-wide in one shot. A value of
> > +1 will instruct the kernel to reclaim as much as totalram_pages in the system.
> > +For example, to reclaim all memory system-wide we can do:
> > +# echo 1 > /proc/sys/vm/shrink_memory
>
> The API should be as restrictive as possible to allow for extensibility.
>
> You describe "1" as the only used value. So, please add here:
>
> "If any other value than 1 is written to shrink_memory an error
> EINVAL occurs."
>

OK, I will handle this error case in the next patch set.
Actually, I modeled it exactly on compact_memory, which is why I did it this way.

> > +
> > +For more information about this control, please visit the following
> > +presentation in embedded linux conference, 2015.
> > +http://events.linuxfoundation.org/sites/events/files/slides/
> > +%5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
> > +
> > +==============================================================
> > +
> > stat_interval
> >
> > The time interval between which vm statistics are updated. The default
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index 9a7adfb..6505b0b 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -333,6 +333,13 @@ extern int vm_swappiness;
> > extern int remove_mapping(struct address_space *mapping, struct page *page);
> > extern unsigned long vm_total_pages;
> >
> > +#ifdef CONFIG_SHRINK_MEMORY
> > +extern int sysctl_shrink_memory;
> > +extern int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
> > + void __user *buffer, size_t *length, loff_t *ppos);
> > +#endif
> > +
> > +
> > #ifdef CONFIG_NUMA
> > extern int zone_reclaim_mode;
> > extern int sysctl_min_unmapped_ratio;
> > diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> > index c566b56..2895099 100644
> > --- a/kernel/sysctl.c
> > +++ b/kernel/sysctl.c
> > @@ -1351,6 +1351,15 @@ static struct ctl_table vm_table[] = {
> > },
> >
> > #endif /* CONFIG_COMPACTION */
> > +#ifdef CONFIG_SHRINK_MEMORY
> > + {
> > + .procname = "shrink_memory",
> > + .data = &sysctl_shrink_memory,
> > + .maxlen = sizeof(int),
> > + .mode = 0200,
> > + .proc_handler = sysctl_shrinkmem_handler,
>
> Supply the value limits.
>
> int min_shrink_memory = 1;
> int max_shrink_memory = 1;
>
> .extra1 = &min_shrink_memory,
> .extra2 = &max_shrink_memory,
>
OK, I will include these limits as well in the new patch set.
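
Heinrich's suggestion relies on proc_dointvec_minmax() returning -EINVAL for values outside the range given by .extra1 and .extra2. The sketch below is a user-space model of only that range check, with both limits set to 1. parse_and_check() is a hypothetical helper, not a kernel function, and its parsing is deliberately simpler than the kernel's (one decimal integer, optionally followed by the newline that echo appends).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Mirrors the suggested .extra1/.extra2 limits: only 1 is accepted. */
static const int min_shrink_memory = 1;
static const int max_shrink_memory = 1;

/*
 * Hypothetical user-space model of a min/max-checked integer sysctl write.
 * Parses one decimal integer (optionally followed by the newline that
 * `echo` appends) and returns 0 if it lies in [min, max], else -EINVAL.
 * On success the parsed value is stored in *out when out is non-NULL.
 */
static int parse_and_check(const char *buf, int min, int max, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(buf, &end, 10);
    if (end == buf || errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -EINVAL;         /* not a number, or out of int range */
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL;         /* trailing characters after the number */
    if (v < min || v > max)
        return -EINVAL;         /* outside [extra1, extra2] */
    if (out)
        *out = (int)v;
    return 0;
}
```

With min and max both 1, "1" and "1\n" are accepted and everything else is rejected, which is the behavior requested for the documentation text above.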

> > + },
> > +#endif
> > {
> > .procname = "min_free_kbytes",
> > .data = &min_free_kbytes,
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index b3a60ee..8e04bd9 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -657,3 +657,11 @@ config DEFERRED_STRUCT_PAGE_INIT
> > when kswapd starts. This has a potential performance impact on
> > processes running early in the lifetime of the systemm until kswapd
> > finishes the initialisation.
> > +
> > +config SHRINK_MEMORY
> > + bool "Allow for system-wide shrinking of memory"
> > + default n
> > + depends on MMU
> > + help
> > + It enables support for system-wide memory reclaim in one shot using
> > + echo 1 > /proc/sys/vm/shrink_memory.
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index c8d8282..837b88d 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -3557,7 +3557,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
> > wake_up_interruptible(&pgdat->kswapd_wait);
> > }
> >
> > -#ifdef CONFIG_HIBERNATION
> > +#if defined CONFIG_HIBERNATION || CONFIG_SHRINK_MEMORY
> > /*
> > * Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
> > * freed pages.
> > @@ -3571,12 +3571,17 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
> > struct reclaim_state reclaim_state;
> > struct scan_control sc = {
> > .nr_to_reclaim = nr_to_reclaim,
> > +#ifdef CONFIG_SHRINK_MEMORY
> > + .gfp_mask = (GFP_HIGHUSER_MOVABLE | GFP_RECLAIM_MASK),
> > + .hibernation_mode = 0,
> > +#else
> > .gfp_mask = GFP_HIGHUSER_MOVABLE,
> > + .hibernation_mode = 1,
> > +#endif
> > .priority = DEF_PRIORITY,
> > .may_writepage = 1,
> > .may_unmap = 1,
> > .may_swap = 1,
> > - .hibernation_mode = 1,
> > };
> > struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
> > struct task_struct *p = current;
> > @@ -3597,6 +3602,20 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
> > }
> > #endif /* CONFIG_HIBERNATION */
> >
> > +#ifdef CONFIG_SHRINK_MEMORY
> > +int sysctl_shrink_memory;
> > +/* This is the entry point for system-wide shrink memory
> > +via /proc/sys/vm/shrink_memory */
> > +int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
> > + void __user *buffer, size_t *length, loff_t *ppos)
> > +{
>
> Check if *buffer contains "1". If the value is not "1" return -EINVAL.
>
> The check can be done using function proc_dointvec_minmax().
>

OK, I will handle this case as well in the new patch set.
Thanks for the review and suggestions.

> Best regards
>
> Heinrich Schuchardt
>
> > + if (write)
> > + shrink_all_memory(totalram_pages);
> > +
> > + return 0;
> > +}
> > +#endif
> > +
> > /* It's optimal to keep kswapds on the same CPUs as their memory, but
> > not required for correctness. So if the last cpu in a node goes
> > away, we get changed to run anywhere: as the first one comes back,
> >
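
On the changed guard line, `#if defined CONFIG_HIBERNATION || CONFIG_SHRINK_MEMORY` applies `defined` only to the first symbol. It still selects the intended branch because Kconfig defines an enabled bool option as 1, and the preprocessor treats an undefined identifier inside `#if` as 0. The sketch below simulates two configurations to compare it with the conventional `defined(A) || defined(B)` spelling; the CONFIG_* macros are defined by hand here purely for illustration.

```c
#include <assert.h>

/* Simulated build 1: CONFIG_SHRINK_MEMORY=y, hibernation disabled.
 * Kconfig would emit "#define CONFIG_SHRINK_MEMORY 1" for this. */
#define CONFIG_SHRINK_MEMORY 1

static int guard_as_posted_on(void)
{
#if defined CONFIG_HIBERNATION || CONFIG_SHRINK_MEMORY
    return 1;
#else
    return 0;
#endif
}

static int guard_conventional_on(void)
{
#if defined(CONFIG_HIBERNATION) || defined(CONFIG_SHRINK_MEMORY)
    return 1;
#else
    return 0;
#endif
}

/* Simulated build 2: neither option enabled. An undefined identifier
 * such as CONFIG_SHRINK_MEMORY evaluates to 0 inside #if. */
#undef CONFIG_SHRINK_MEMORY

static int guard_as_posted_off(void)
{
#if defined CONFIG_HIBERNATION || CONFIG_SHRINK_MEMORY
    return 1;
#else
    return 0;
#endif
}

static int guard_conventional_off(void)
{
#if defined(CONFIG_HIBERNATION) || defined(CONFIG_SHRINK_MEMORY)
    return 1;
#else
    return 0;
#endif
}
```

Both spellings agree in these two cases; the `defined(...)` form simply does not depend on the option expanding to a non-zero value.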

2015-07-06 14:06:28

by PINTU KUMAR

[permalink] [raw]
Subject: RE: [PATCH 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

Hi,

> -----Original Message-----
> From: Xishi Qiu [mailto:[email protected]]
> Sent: Monday, July 06, 2015 3:53 PM
> To: Pintu Kumar
> Subject: Re: [PATCH 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory
> feature
>
> On 2015/7/3 21:20, Pintu Kumar wrote:
>
> > This patch provides 2 things:
> > 1. Add new control called shrink_memory in /proc/sys/vm/.
> > This control can be used to aggressively reclaim memory system-wide in
> > one shot from the user space. A value of 1 will instruct the kernel to
> > reclaim as much as totalram_pages in the system.
> > Example: echo 1 > /proc/sys/vm/shrink_memory
> >
> > 2. Enable shrink_all_memory API in kernel with new
> CONFIG_SHRINK_MEMORY.
> > Currently, shrink_all_memory function is used only during hibernation.
> > With the new config we can make use of this API for non-hibernation
> > case also without disturbing the hibernation case.
> >
> > The detailed paper was presented in Embedded Linux Conference,
> > Mar-2015 http://events.linuxfoundation.org/sites/events/files/slides/
> > %5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
> >
> > Scenarios where this can be used and helpful are:
> > 1) Can be invoked just after system boot-up is finished.
> > 2) Can be invoked just before entering entire system suspend.
> > 3) Can be invoked from kernel when order-4 pages starts failing.
> > 4) Can be helpful to completely avoid or delay the kernel OOM condition.
> > 5) Can be developed as a system-tool to quickly defragment entire system
> > from user space, without the need to kill any application.
> >
>
> Hi Pintu,
>
> How about increase min_free_kbytes and Android lowmemorykiller's level?
>
Thanks for the review.
Actually in Tizen, we don't use Android LMK and we wanted to delay the LMK with
aggressive direct_reclaim offline.
And increasing min_free value also may not help much.
Currently, in our case free memory never falls below 10MB, with 512MB RAM
configuration.


> Thanks,
> Xishi Qiu

2015-07-06 17:23:40

by Pintu Agarwal

[permalink] [raw]
Subject: Re: [PATCH 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

Sorry, there seems to be a problem with Yahoo Mail; some emails are bouncing.
Sending again from Gmail.


----- Original Message -----
> From: "[email protected]" <[email protected]>
> To: Pintu Kumar <[email protected]>
> Sent: Sunday, 5 July 2015 1:38 AM
> Subject: Re: [PATCH 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature
>
> On Fri, 03 Jul 2015 18:50:07 +0530, Pintu Kumar said:
>
>> This patch provides 2 things:
>
>> 2. Enable shrink_all_memory API in kernel with new CONFIG_SHRINK_MEMORY.
>> Currently, shrink_all_memory function is used only during hibernation.
>> With the new config we can make use of this API for non-hibernation case
>> also without disturbing the hibernation case.
>
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>
>> @@ -3571,12 +3571,17 @@ unsigned long shrink_all_memory(unsigned long
> nr_to_reclaim)
>> struct reclaim_state reclaim_state;
>> struct scan_control sc = {
>> .nr_to_reclaim = nr_to_reclaim,
>> +#ifdef CONFIG_SHRINK_MEMORY
>> + .gfp_mask = (GFP_HIGHUSER_MOVABLE | GFP_RECLAIM_MASK),
>> + .hibernation_mode = 0,
>> +#else
>> .gfp_mask = GFP_HIGHUSER_MOVABLE,
>> + .hibernation_mode = 1,
>> +#endif
>
>
> That looks like a bug just waiting to happen. What happens if we
> call an actual hibernation mode in a SHRINK_MEMORY=y kernel, and it finds
> an extra gfp mask bit set, and hibernation_mode set to an unexpected value?
>
OK, got it. Thanks for pointing this out.
I will handle the case where both HIBERNATION and SHRINK_MEMORY are enabled and send the patch again.
I will try to handle it using ifdefs. Do you have any particular
suggestions on how this should be handled?
I verified it only on ARM without hibernation. But it is
likely that this feature will be enabled in laptop mode as well, so we
should handle it.
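
Valdis's concern is that the #ifdef picks one set of scan_control values for every caller, so a real hibernation in a SHRINK_MEMORY=y kernel would run with hibernation_mode = 0 and the extra GFP_RECLAIM_MASK bits. One possible way around that, sketched here as a user-space model only, is to choose the values at run time from the caller's intent rather than from the build configuration. enum shrink_caller, struct scan_params, scan_params_for() and the FAKE_GFP_* values are all hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for the real GFP masks; the values are arbitrary. */
#define FAKE_GFP_HIGHUSER_MOVABLE 0x1u
#define FAKE_GFP_RECLAIM_MASK     0x2u

enum shrink_caller {
    CALLER_HIBERNATION,   /* snapshot creation in the hibernation path */
    CALLER_SYSCTL,        /* a write to /proc/sys/vm/shrink_memory */
};

/* Hypothetical subset of the scan_control fields the patch touches. */
struct scan_params {
    unsigned int gfp_mask;
    int hibernation_mode;
};

/* Pick reclaim parameters per caller instead of per build, so hibernation
 * keeps its original settings even when both options are compiled in. */
static struct scan_params scan_params_for(enum shrink_caller caller)
{
    struct scan_params p;

    if (caller == CALLER_HIBERNATION) {
        p.gfp_mask = FAKE_GFP_HIGHUSER_MOVABLE;
        p.hibernation_mode = 1;
    } else {
        p.gfp_mask = FAKE_GFP_HIGHUSER_MOVABLE | FAKE_GFP_RECLAIM_MASK;
        p.hibernation_mode = 0;
    }
    return p;
}
```

In kernel terms this would correspond to passing such a flag into shrink_all_memory() or adding a thin wrapper per caller; that is an assumed design, not something settled in this thread.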

2015-07-06 18:38:08

by Pintu Agarwal

[permalink] [raw]
Subject: Re: [PATCH 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

Hi,
Please find my comments inline.

> Sent: Saturday, 4 July 2015 6:25 PM
> Subject: Re: [PATCH 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature
>
> On Sat, Jul 04, 2015 at 06:04:37AM +0000, PINTU KUMAR wrote:
>> >On Fri, Jul 03, 2015 at 06:50:07PM +0530, Pintu Kumar wrote:
>> >> This patch provides 2 things:
>> >> 1. Add new control called shrink_memory in /proc/sys/vm/.
>> >> This control can be used to aggressively reclaim memory system-wide
>> >> in one shot from the user space. A value of 1 will instruct the
>> >> kernel to reclaim as much as totalram_pages in the system.
>> >> Example: echo 1 > /proc/sys/vm/shrink_memory
>> >>
>> >> 2. Enable shrink_all_memory API in kernel with new CONFIG_SHRINK_MEMORY.
>> >> Currently, shrink_all_memory function is used only during hibernation.
>> >> With the new config we can make use of this API for non-hibernation case
>> >> also without disturbing the hibernation case.
>> >>
>> >> The detailed paper was presented in Embedded Linux Conference, Mar-2015
>> >> http://events.linuxfoundation.org/sites/events/files/slides/
>> >> %5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
>> >>
>> >> Scenarios where this can be used and helpful are:
>> >> 1) Can be invoked just after system boot-up is finished.
>> >
>> >The allocator automatically reclaims when memory is needed, that's why
>> >the metrics quoted in those slides, free pages and fragmentation level,
>> >don't really mean much. We don't care how much memory is free or how
>> >fragmented it is UNTIL somebody actually asks for it. The only metric
>> >that counts is the allocation success ratio (and possibly the latency).
>>
>> Yes, the allocator automatically reclaims memory but in the
>> slowpath. Also it reclaims only to satisfy the current allocation
>> needs. That means for all future higher-order allocations the system
>> will be entering slowpath again and again. Over a point of time
>> (with multiple application launch), the higher-orders (2^4 and
>> above) will be gone. The system entering slowpath means that the
>> first allocation attempt has already failed. Then in slowpath the
>> sequence is: kswapd -> compaction -> then direct reclaim. Thus
>> entering slowpath again and again will be a costly operation.
>>
>> Thus keeping free memory ready in higher-order pages will be helpful
>> for succeeding first allocation attempt.
>
> High order allocation fastpath sounds like a bad idea, especially on
> embedded devices. It takes a lot of work to create higher order
> pages, so anything that relies on being able to allocate them
> frequently and quickly is going to be very expensive.
>
> But even if you need higher order pages on a regular basis, compaction
> is *way* more efficient and directed than what you are proposing. My
> phone has 2G of memory, which is over half a million pages. What
> would it do to my battery life if you told the VM on a regular basis
> to scan the LRUs until it has reclaimed half a million pages?
>
Yes, maybe for 2GB and above we don't need to worry about it.
It is mainly useful for devices with less than 1GB of RAM, such as 512MB devices.
In our case, on a 512MB RAM device, totalram_pages was around 460MB.
The actual free memory was just about 30MB after boot-up, while
reclaimable memory went up to ~200MB.
In this scenario it is highly unlikely that order-4 or order-8
pages remain available after some usage.
Thus we found it useful when the system is idle and order-4 and
above become 0.
And we don't need to run it on a regular basis.
OK, I will try to get power measurements comparing the slowpath and
shrink_memory.
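
For scale, the 2G phone and the ~460MB totalram_pages figure translate into page counts as follows, assuming 4 KiB pages and binary units (1G = 2^30 bytes); other page sizes change the result. pages_for() is a hypothetical helper.

```c
#include <assert.h>

#define KIB 1024ULL
#define MIB (1024ULL * KIB)
#define GIB (1024ULL * MIB)

/* Number of pages needed to cover `bytes`, rounding up. */
static unsigned long long pages_for(unsigned long long bytes,
                                    unsigned long long page_size)
{
    return (bytes + page_size - 1) / page_size;
}
```

With 4 KiB pages, 2 GiB is 524,288 pages and 460 MiB is 117,760 pages, so a single write of 1 asks reclaim to target on the order of a hundred thousand pages even on the 512MB device.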

>> The scenario that is discussed here is about: Invoking shrink_memory
>> from user space, as soon as the system boot is finished. Because as
>> per my observation, the buffer+caches that is accumulated during
>> boot-up is not very helpful for the system for later application
>> launch. Thus reclaiming all memory in shot after the boot-up will
>> help grab higher-order pages and freeing lots of memory. Also the
>> reclaimed memory stays in as actual free memory. The cached that
>> gets accumulated after the application launch will be having more
>> hits. It is like a little advanced version of drop_caches.
>
> The buffers and cache are trivial to reclaim and compact, so that
> shouldn't affect allocation success at all. And even allocation
> latency should be reasonable.
>
> drop_caches is a development/debugging tool for kernel developers, not
> a tool to implement userspace memory management. If you find you need
> to use it on a regular basis because of performance issues, then
> please file a bug report.
>
Yes, shrink_memory can also be helpful for debugging and VM parameter
tuning, like drop_caches.
It is a way of performing direct_reclaim from user space, just like
direct_compact.
The benefits are:
It helps us identify the maximum amount of memory that is reclaimable
at any point in time.
It shows which higher-order pages, and how many, could be formed from
that reclaimable memory.
Also, shrink_all_memory sets may_swap = 1, which means all
unused pages could be swapped out to, say, a ZRAM backing store.
Thus we learn the best swap size to configure for the device
in an overloaded scenario.
So it can help with system tuning as well.

>> >> 2) Can be invoked just before entering entire system suspend.
>> >
>> >Why is that? Suspend already allocates as much as it needs to create
>> >the system image.
>>
>> Sorry, but I think you got it wrong here. We are not talking about
>> snapshot image creation part that comes under hibernation. We are
>> talking about the mobile world, where the system gets suspended when
>> it is kept idle for longer time. The hibernation part does not comes
>> here. The idea is that the shrink_memory can be best utilized when
>> the system is not doing any useful stuffs and going from idle to
>> suspend. In this scenario, we can check the state of free memory and
>> perform the system-wide reclaim if necessary. Thus when the system
>> resume again, it will have enough memory as free. Again, this is
>> mainly for embedded world where hibernation is not enabled. For
>> normal world, it already does it during hibernation snapshot image
>> creation.
>
> The reason they are suspending is to conserve energy, now? This is an
> outrageous amount of work you propose should be done when the system
> goes idle. Generally, proactive work tends to be less efficient than
> on-demand work due to overproduction, so the more power-restrained
> your system, the lazier and just-in-time you should be.
>
This work will be done only when it is really required, just
before entering suspend;
that is, only if the system is heavily fragmented and likely to
enter the slowpath again and again,
in which case the system will already be slow.

> If your higher-order allocation latency really is an issue, at least
> use targetted background compaction. But again, everybody would be
> better off if you didn't rely on frequent higher-order allocations,
> because they require a lot of CPU-intensive work that consumes a lot
> of power, whether you schedule that work on-demand or proactively.
>
Yes, we tried compaction, but according to our analysis, background
compaction is not always useful when
there is little free memory and a lot of reclaimable memory.
Also, shrink_memory should not be called all the time. It should be called only under certain
conditions, when the situation demands it.
OK, we will also perform power measurements and report back.

>> >> 3) Can be invoked from kernel when order-4 pages starts failing.
>> >
>> >We have compaction for that, and compaction invokes page reclaim
>> >automatically to satisfy its need for free pages.
>>
>> It is not always true. Compaction may not be always
>> successful. Again it is related to slowpath. When order-4 starts
>> failing very often that means all higher-orders becomes 0. Thus
>> system will be entering slowpath again and again, doing swap,
>> compaction, reclaim most of the time. And even for compaction,
>> there is a knob in user space to call compaction from user space:
>> #echo 1 > /proc/sys/vm/compact_memory
>
> At least that's not a cache-destructive operation and just compacts
> already free pages but, just like drop_caches, you shouldn't ever have
> to use this in production.
>
Yes, we will not be using this in production.
As I said earlier, this feature can also be used for tuning purposes.

>> >> 4) Can be helpful to completely avoid or delay the kernel OOM condition.
>> >
>> >That's not how OOM works. An OOM is triggered when there is demand for
>> >memory but no more pages to reclaim, telling the kernel to look harder
>> >will not change that.
>>
>> Yes, I know this. I am not talking about calling shrink_memory after OOM.
>>
>> Rather much before OOM when the first attempt of higher-order starts failing.
>> This will delay the OOM to a much later stage.
>
> That's not how OOM works *at all*. OOM happens when all the pages are
> tied up in places where they can't be reclaimed. It has nothing to do
> with fragmentation (OOM is not even defined for higher order pages) or
> reclaim timing (since reclaim can't reclaim unreclaimable pages. heh).
>
> You're really not making a lot of sense here.
>
If you check the last part of my slides in the ELC presentation, I
have covered a few scenarios.
Consider a case where you have launched 10 applications, and free
memory is not enough to launch an 11th.
Now the system has two choices: keep performing direct reclaim, or go for an OOM kill.
But we don't want the old applications to be killed; instead, we want to allow
more applications to be launched.
If, after 10 applications, we execute the memory shrinker and swap
out previous pages, we get enough free memory.
This allows us to launch a few more applications while at the same time
retaining the previous applications' contents in swap.

>> >> 5) Can be developed as a system-tool to quickly defragment entire system
>> >> from user space, without the need to kill any application.
>> >
>> >Again, the kernel automatically reclaims and compacts memory on demand.
>> >If the existing mechanisms don't do this properly, and you have actual
>> >problems with them, they should be reported and fixed, not bypassed.
>> >But the metrics you seem to base this change on are not representative
>> >of something that should matter in practice.
>>
>> It is not always guaranteed that compaction/reclaim
>> _did_some_progress_ always yield some results on the fly. It takes
>> sometime to get sync with the free memory. Thus keeping the free
>> list ready before hand will be much more helpful.
>
> We can always make compaction more aggressive with certain GFP flags
> and tell it to wait for delayed memory frees etc.
>
According to our experiments, compaction is not always helpful
on kernel 3.10.
The compaction success rate was 3 out of 20 attempts, whereas direct_reclaim
was more helpful.

>> Anyways, the use case here is to develop a system utility which can
>> perform compaction/reclaim/compaction aggressively. Its an
>> additional idea that somebody interested can develop.
>
> I'm having a hard time seeing a clear usecase from your proposal, and
> the implementation is too heavyhanded and destructive to be generally
> useful as a memory management tool in real life.
>
I have a few more use cases to present.
One is calling shrink_memory from the ION/graphics driver in a work queue, if
order-4 requests are failing for IOMMU allocation.
We also observed that, after a heavy file transfer to/from the
device and a PC, free memory becomes very low and reclaimable memory
becomes very high.
This reclaimable memory may not be very useful for future application launches,
so we can schedule a shrink_memory to turn it into free memory.

Sorry if I have misunderstood any of your requests.
If you have any other suggestions or experiments that you would like
me to perform to reach a conclusion, please let me know.
I will be happy to do them.

2015-07-07 01:39:41

by Xishi Qiu

[permalink] [raw]
Subject: Re: [PATCH 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

On 2015/7/6 22:03, PINTU KUMAR wrote:

> Hi,
>
>> -----Original Message-----
>> From: Xishi Qiu [mailto:[email protected]]
>> Sent: Monday, July 06, 2015 3:53 PM
>> To: Pintu Kumar
>> Subject: Re: [PATCH 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory
>> feature
>>
>> On 2015/7/3 21:20, Pintu Kumar wrote:
>>
>>> This patch provides 2 things:
>>> 1. Add new control called shrink_memory in /proc/sys/vm/.
>>> This control can be used to aggressively reclaim memory system-wide in
>>> one shot from the user space. A value of 1 will instruct the kernel to
>>> reclaim as much as totalram_pages in the system.
>>> Example: echo 1 > /proc/sys/vm/shrink_memory
>>>
>>> 2. Enable shrink_all_memory API in kernel with new
>> CONFIG_SHRINK_MEMORY.
>>> Currently, shrink_all_memory function is used only during hibernation.
>>> With the new config we can make use of this API for non-hibernation
>>> case also without disturbing the hibernation case.
>>>
>>> The detailed paper was presented in Embedded Linux Conference,
>>> Mar-2015 http://events.linuxfoundation.org/sites/events/files/slides/
>>> %5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
>>>
>>> Scenarios where this can be used and helpful are:
>>> 1) Can be invoked just after system boot-up is finished.
>>> 2) Can be invoked just before entering entire system suspend.
>>> 3) Can be invoked from kernel when order-4 pages starts failing.
>>> 4) Can be helpful to completely avoid or delay the kernel OOM condition.
>>> 5) Can be developed as a system-tool to quickly defragment entire system
>>> from user space, without the need to kill any application.
>>>
>>
>> Hi Pintu,
>>
>> How about increase min_free_kbytes and Android lowmemorykiller's level?
>>
> Thanks for the review.
> Actually in Tizen, we don't use Android LMK and we wanted to delay the LMK with
> aggressive direct_reclaim offline.
> And increasing min_free value also may not help much.
> Currently, in our case free memory never falls below 10MB, with 512MB RAM
> configuration.
>

How about the performance as you reclaim so much memory?
(e.g. shrink page cache, use zram, ksm, compaction...)
When launching the same app next time, it may be slow, right?

How about use cgroup to manage the apps, but I don't know how to do the detail.

Thanks,
Xishi Qiu

>
>


2015-07-07 07:02:38

by PINTU KUMAR

[permalink] [raw]
Subject: RE: [PATCH 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

Hi,

> -----Original Message-----
> From: Xishi Qiu [mailto:[email protected]]
> Sent: Tuesday, July 07, 2015 7:08 AM
> To: PINTU KUMAR
> Subject: Re: [PATCH 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory
> feature
>
> On 2015/7/6 22:03, PINTU KUMAR wrote:
>
> > Hi,
> >
> >> -----Original Message-----
> >> From: Xishi Qiu [mailto:[email protected]]
> >> Sent: Monday, July 06, 2015 3:53 PM
> >> To: Pintu Kumar
> >> Subject: Re: [PATCH 1/1] kernel/sysctl.c: Add
> >> /proc/sys/vm/shrink_memory feature
> >>
> >> On 2015/7/3 21:20, Pintu Kumar wrote:
> >>
> >>> This patch provides 2 things:
> >>> 1. Add new control called shrink_memory in /proc/sys/vm/.
> >>> This control can be used to aggressively reclaim memory system-wide
> >>> in one shot from the user space. A value of 1 will instruct the
> >>> kernel to reclaim as much as totalram_pages in the system.
> >>> Example: echo 1 > /proc/sys/vm/shrink_memory
> >>>
> >>> 2. Enable shrink_all_memory API in kernel with new
> >> CONFIG_SHRINK_MEMORY.
> >>> Currently, shrink_all_memory function is used only during hibernation.
> >>> With the new config we can make use of this API for non-hibernation
> >>> case also without disturbing the hibernation case.
> >>>
> >>> The detailed paper was presented in Embedded Linux Conference,
> >>> Mar-2015
> >>> http://events.linuxfoundation.org/sites/events/files/slides/
> >>> %5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
> >>>
> >>> Scenarios where this can be used and helpful are:
> >>> 1) Can be invoked just after system boot-up is finished.
> >>> 2) Can be invoked just before entering entire system suspend.
> >>> 3) Can be invoked from kernel when order-4 pages starts failing.
> >>> 4) Can be helpful to completely avoid or delay the kernel OOM condition.
> >>> 5) Can be developed as a system-tool to quickly defragment entire system
> >>> from user space, without the need to kill any application.
> >>>
> >>
> >> Hi Pintu,
> >>
> >> How about increase min_free_kbytes and Android lowmemorykiller's level?
> >>
> > Thanks for the review.
> > Actually in Tizen, we don't use Android LMK and we wanted to delay the
> > LMK with aggressive direct_reclaim offline.
> > And increasing min_free value also may not help much.
> > Currently, in our case free memory never falls below 10MB, with 512MB
> > RAM configuration.
> >
>
> How about the performance as you reclaim so much memory?
> (e.g. shrink page cache, use zram, ksm, compaction...) When launching the same
> app next time, it may be slow, right?
>
Yes, obviously there will be a slight performance degradation when an
application is relaunched.
But it will still be better than the first launch.
Please check the following data:
Browser launch:
01-01 12:06:26.550
01-01 12:06:28.340
Time taken: 1790 ms

Relaunch:
01-01 12:09:08.130
01-01 12:09:08.380
Time taken: 250 ms

After shrink_memory again:
01-01 12:12:17.280
01-01 12:12:17.770
Time taken: 490 ms
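
Each launch time above is the difference between its two log timestamps. A small sketch of that arithmetic, assuming the time part has the form HH:MM:SS.mmm with exactly three millisecond digits and both stamps fall on the same day (pass only the time part; the 01-01 date prefix is ignored). ts_to_ms() and elapsed_ms() are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>

/* Convert "HH:MM:SS.mmm" to milliseconds since midnight; -1 if malformed. */
static long ts_to_ms(const char *ts)
{
    int h, m, s, ms;

    if (sscanf(ts, "%d:%d:%d.%d", &h, &m, &s, &ms) != 4)
        return -1;
    return ((h * 60L + m) * 60L + s) * 1000L + ms;
}

/* Elapsed milliseconds between two well-formed, same-day timestamps. */
static long elapsed_ms(const char *start, const char *end)
{
    return ts_to_ms(end) - ts_to_ms(start);
}
```

Applied to the three pairs above, this gives 1790 ms, 250 ms and 490 ms.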

The main point here is that killing is avoided and application data is
retained.
Also, when memory pressure leads to the slowpath again and again,
we are already in a performance-degraded state.
A one-time degradation is better than a continuous one.

> How about use cgroup to manage the apps, but I don't know how to do the
> detail.
>
Yes, we already use cgroups and vmpressure to manage memory thresholds for
reclaim, swap and kill.
cgroups also have a similar force_reclaim mechanism to reclaim pages within
groups at a particular threshold.
But that happens at a later stage, and it does not care about the order of
the pages.
It does not perform system-wide reclaim.


> Thanks,
> Xishi Qiu
>
> >
> >
>

2015-07-17 06:42:46

by PINTU KUMAR

[permalink] [raw]
Subject: [PATCHv2 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

This patch provides 2 things:
1. Add new control called shrink_memory in /proc/sys/vm/.
This control can be used to aggressively reclaim memory system-wide
in one shot from the user space. A value of 1 will instruct the
kernel to reclaim as much as totalram_pages in the system.
Example: echo 1 > /proc/sys/vm/shrink_memory

If any value other than 1 is written to shrink_memory, the write fails
with EINVAL.
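
For the boot-script or daemon use described here, a program would write "1" to the file and treat EINVAL as "value rejected". The helper below is a minimal sketch; write_sysctl_value() is hypothetical, and the path is a parameter because /proc/sys/vm/shrink_memory exists only on a kernel carrying this proposed patch with CONFIG_SHRINK_MEMORY=y.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write `value` (e.g. "1\n") to a sysctl file such as
 * /proc/sys/vm/shrink_memory. Returns 0 on success or -errno on failure;
 * with the v2 semantics, writing anything other than 1 should give -EINVAL.
 */
static int write_sysctl_value(const char *path, const char *value)
{
    size_t len = strlen(value);
    ssize_t n;
    int fd = open(path, O_WRONLY);

    if (fd < 0)
        return -errno;          /* e.g. -ENOENT if the sysctl is absent */
    n = write(fd, value, len);
    if (n < 0) {
        int err = errno;        /* save before close() can overwrite it */
        close(fd);
        return -err;
    }
    if (close(fd) < 0)
        return -errno;
    return (size_t)n == len ? 0 : -EIO;   /* treat a short write as an error */
}
```

A caller would typically run write_sysctl_value("/proc/sys/vm/shrink_memory", "1\n") as root and log any negative return.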

2. Enable shrink_all_memory API in kernel with new CONFIG_SHRINK_MEMORY.
Currently, shrink_all_memory function is used only during hibernation.
With the new config we can make use of this API for non-hibernation case
also without disturbing the hibernation case.

The detailed paper was presented in Embedded Linux Conference, Mar-2015
http://events.linuxfoundation.org/sites/events/files/slides/
%5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf

An example is shown below:
Device: ARMv7, Dual Core CPU 1.2GHz
RAM: 512MB (Without SWAP/ZRAM)
Linux Kernel: 3.10.17
Scenario: Just after boot-up finished.

BEFORE:
-------------------------------------------------------------------------
shell> free -tm ; cat /proc/buddyinfo
             total       used       free     shared    buffers     cached
Mem:           460        440         20          0         35        154
-/+ buffers/cache:        250        209
Swap:            0          0          0
Total:         460        440         20
Node 0, zone   Normal   1037    705     92     19     19     17      4      9      0      0      0

shell> vmstat 1 &

AFTER:
-------------------------------------------------------------------------
shell> echo 1 > /proc/sys/vm/shrink_memory

 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0  20768  35876 157876    0    0     0     0   64  177  0  1 99  0  0
--------------------------------------------------------------------------------
|1  0      0  33104  34864 149808    0    0     0     0   82  221  0 12 88  0  0|
--------------------------------------------------------------------------------
 0  0      0 188776   3000  54420    0    0     0     0  216  374  0 30 70  0  0
 0  0      0 188400   3652  54528    0    0   740     8  188  337  2  1 95  2  0

shell> free -tm ; cat /proc/buddyinfo
             total       used       free     shared    buffers     cached
Mem:           460        278        182          0          4         54
-/+ buffers/cache:        219        240
Swap:            0          0          0
Total:         460        278        182
Node 0, zone   Normal   5575   3158   1500    727    240     90     33     18     10      6      6
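
Each /proc/buddyinfo column k counts free blocks of order k, and an order-k block holds 2^k pages, so the two buddyinfo lines can be turned into free-page totals. A minimal parsing sketch; buddyinfo_free_pages() is a hypothetical helper, and converting pages to MB assumes 4 KiB pages.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ORDERS 16

/*
 * Parse one /proc/buddyinfo line ("Node 0, zone   Normal   1037    705 ...")
 * and return the number of free pages held in blocks of order >= min_order.
 * An order-k block contains 2^k pages. Returns -1 for a malformed line.
 */
static long buddyinfo_free_pages(const char *line, int min_order)
{
    const char *p = strstr(line, "zone");
    char zone[32];
    int consumed = 0;
    long total = 0;

    if (!p || sscanf(p, "zone %31s%n", zone, &consumed) != 1)
        return -1;
    p += consumed;                       /* now just past the zone name */

    for (int order = 0; order < MAX_ORDERS; order++) {
        char *end;
        unsigned long count = strtoul(p, &end, 10);

        if (end == p)
            break;                       /* no more per-order counts */
        if (order >= min_order)
            total += (long)(count << order);
        p = end;
    }
    return total;
}
```

For the lines above this gives 5,223 free pages before (about 20.4 MB with 4 KiB pages) and 46,619 after (about 182 MB), matching the free column; pages held in order-4 and larger blocks rise from 2,256 to 22,912.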

RESULTS:
-----------------------------------------------------
Around 160MB of memory was recovered in one shot.
Many higher-order pages were also recovered in the process.
From the vmstat output, system CPU usage was ~12% during the 1 second in
which this command was running.
We also measured the power consumption using a hardware power monitor.
The results are below:
Before: ~180mA
During shrink_memory: ~237mA
Duration: ~0.5 sec
Additional current: ~57mA (237mA - 180mA)
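The ~57mA figure is the difference between the two current readings. If the extra draw is assumed to last the full ~0.5 sec, the additional charge per invocation works out as follows. The supply voltage is not given, so this is charge rather than energy.

```latex
\Delta I = 237\,\mathrm{mA} - 180\,\mathrm{mA} = 57\,\mathrm{mA}
\qquad
Q = \Delta I \cdot t \approx 57\,\mathrm{mA} \times 0.5\,\mathrm{s}
  = 28.5\,\mathrm{mA\,s} \approx 7.9\,\mu\mathrm{Ah}
```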

FURTHER OBSERVATIONS:
-----------------------------------------------------
37% fewer application kills when shrink_memory is invoked at boot-up.
Around 4000 fewer page faults.
Around 43% reduction in kswapd calls.
Entries into the page allocator slow path dropped drastically.
Combining shrink_memory with compaction shows good benefits against fragmentation.

APPLICATION LAUNCH BEHAVIOR:
-----------------------------------------------------
During First Launch:
============================================================================
Application  Before_shrink_memory  After_shrink_memory  Difference
Camera       1.981                 1.86                 0.121
Gallery      1.276                 0.94                 0.336
contacts     1.112                 0.941                0.171
messaging    0.886                 0.795                0.091
settings     1.257                 1.212                0.045
Music        1.854                 2.098                -0.244
Gmail        1.872                 1.935                -0.063
Browser      2.569                 2.677                -0.108
============================================================================

During Re-launch:
============================================================================
Application  Before_shrink_memory  After_shrink_memory  Difference
Camera       1.248                 0.976                0.272
Gallery      0.697                 0.633                0.064
contacts     0.506                 0.561                -0.055
messaging    0.533                 0.489                0.044
settings     0.833                 0.805                0.028
Music        0.832                 0.769                0.063
Gmail        0.913                 0.841                0.072
Browser      0.579                 0.57                 0.009
============================================================================
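The Difference column is Before_shrink_memory minus After_shrink_memory, so positive values mean faster launches. The tables do not state the unit of the times. This small sketch is not part of the patch; it summarizes both tables, and the struct and helper names are hypothetical.

```c
#include <stddef.h>

/* One row of the launch tables: launch time before and after shrink_memory. */
struct launch_sample {
	const char *app;
	double before;
	double after;
};

/* Values copied from the "During First Launch" table. */
static const struct launch_sample first_launch[] = {
	{ "Camera",    1.981, 1.86  },
	{ "Gallery",   1.276, 0.94  },
	{ "contacts",  1.112, 0.941 },
	{ "messaging", 0.886, 0.795 },
	{ "settings",  1.257, 1.212 },
	{ "Music",     1.854, 2.098 },
	{ "Gmail",     1.872, 1.935 },
	{ "Browser",   2.569, 2.677 },
};

/* Values copied from the "During Re-launch" table. */
static const struct launch_sample relaunch[] = {
	{ "Camera",    1.248, 0.976 },
	{ "Gallery",   0.697, 0.633 },
	{ "contacts",  0.506, 0.561 },
	{ "messaging", 0.533, 0.489 },
	{ "settings",  0.833, 0.805 },
	{ "Music",     0.832, 0.769 },
	{ "Gmail",     0.913, 0.841 },
	{ "Browser",   0.579, 0.57  },
};

/* Mean of (before - after); a positive result means launches got faster. */
static double mean_improvement(const struct launch_sample *s, size_t n)
{
	double sum = 0.0;
	for (size_t i = 0; i < n; i++)
		sum += s[i].before - s[i].after;
	return n > 0 ? sum / (double)n : 0.0;
}

/* Number of applications whose launch time went down. */
static size_t count_faster(const struct launch_sample *s, size_t n)
{
	size_t k = 0;
	for (size_t i = 0; i < n; i++)
		if (s[i].after < s[i].before)
			k++;
	return k;
}
```

With the values above, the mean improvement is about 0.044 for first launch, with 5 of 8 applications faster. For re-launch it is about 0.062, with 7 of 8 faster.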

Other possible use cases:
----------------------------------------------------------------------------
1) Just after system boot-up has finished, by writing the sysctl from a
boot-up script.
2) During system suspend, after suspend_freeze_processes()
[kernel/power/suspend.c], based on conditions such as the fragmentation or
free-memory state.
3) From the Android ION system heap driver, when order-4 allocations start
failing, by calling shrink_all_memory() from a separate worker thread under
certain conditions.
4) It can be combined with compact_memory to achieve better results against
memory fragmentation.
5) It can be helpful in debugging and tuning various vm parameters.
6) It can help identify the maximum amount of memory that is reclaimable at
any point in time, and how many higher-order pages could be formed from it.
This in turn can help in tuning the reserved-memory requirements of a system.
7) It can help in sizing swap properly. shrink_all_memory() sets
may_swap = 1, so unused pages can be swapped out. Running shrink_memory on a
heavily loaded system therefore shows how full swap gets; that figure plus a
10% margin can serve as the maximum swap size.
If ZRAM is used, the pages are also compressed and stored for later use.
8) It can allow more new applications to be launched without killing older
ones, by moving least recently used pages to swap, so user data is retained.
9) It can be part of a system tool to quickly defragment all of system
memory.
10) It may also help reduce fragmentation within the CMA region.
11) More use cases may be identified.
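For use case 6, one rough way to measure how much memory a single run reclaimed is to compare MemFree in /proc/meminfo before and after writing to shrink_memory. This minimal sketch is not part of the patch. It assumes the caller has already read /proc/meminfo into a string, and the helper name meminfo_kb is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the value in kB of a field such as "MemFree"
 * from the text of /proc/meminfo, or -1 if the field is not present.
 * The field name is matched only at the start of a line.
 */
static long meminfo_kb(const char *text, const char *field)
{
	size_t len = strlen(field);
	const char *p = text;

	while (p != NULL && *p != '\0') {
		if (strncmp(p, field, len) == 0 && p[len] == ':') {
			long kb;
			if (sscanf(p + len + 1, "%ld", &kb) == 1)
				return kb;
			return -1;
		}
		p = strchr(p, '\n');	/* advance to the next line */
		if (p != NULL)
			p++;
	}
	return -1;
}
```

Subtracting the MemFree value taken before the write from the one taken after gives the amount freed. In the run shown earlier, vmstat's free column, which reports the same MemFree figure, went from 20768 kB to 188776 kB, a difference of 168008 kB (roughly 164 MB).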

Most importantly, it is more effective when applied selectively, based on
certain conditions.
It should be executed always and the decision is left up to the user.

Signed-off-by: Pintu Kumar <[email protected]>
---
V2: Added min/max parameters for shrink_memory, as suggested by
Heinrich Schuchardt <[email protected]>.
Added error handling in sysctl_shrinkmem_handler for any value other than 1,
as suggested by Heinrich Schuchardt <[email protected]>.
Fixed the HIBERNATION+SHRINK_MEMORY issue in shrink_all_memory,
as suggested by [email protected].
Restored gfp_mask to the original because of other dependencies.
Also, adding GFP_RECLAIM_MASK does not affect anything.
Verified power consumption data during shrink_memory,
as suggested by Johannes Weiner <[email protected]>.
Verified application launch/re-launch scenarios before/after shrink_memory,
as suggested by Xishi Qiu <[email protected]>.
Updated the commit message with examples and use cases.

Documentation/sysctl/vm.txt | 18 ++++++++++++++++++
include/linux/swap.h | 7 +++++++
kernel/sysctl.c | 16 ++++++++++++++++
mm/Kconfig | 8 ++++++++
mm/vmscan.c | 34 ++++++++++++++++++++++++++++++++--
5 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 9832ec5..54eda3a 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -54,6 +54,7 @@ Currently, these files are in /proc/sys/vm:
- page-cluster
- panic_on_oom
- percpu_pagelist_fraction
+- shrink_memory
- stat_interval
- swappiness
- user_reserve_kbytes
@@ -718,6 +719,23 @@ sysctl, it will revert to this default behavior.

==============================================================

+shrink_memory
+
+This control is available only when CONFIG_SHRINK_MEMORY is set. It can be
+used to aggressively reclaim memory system-wide in one shot. Writing 1
+instructs the kernel to try to reclaim up to totalram_pages pages.
+For example, to reclaim as much memory as possible system-wide:
+# echo 1 > /proc/sys/vm/shrink_memory
+
+Writing any value other than 1 to shrink_memory fails with EINVAL.
+
+For more information about this control, see the following presentation
+from the Embedded Linux Conference 2015:
+http://events.linuxfoundation.org/sites/events/files/slides/
+%5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
+
+==============================================================
+
stat_interval

The time interval between which vm statistics are updated. The default
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 9a7adfb..6505b0b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -333,6 +333,13 @@ extern int vm_swappiness;
extern int remove_mapping(struct address_space *mapping, struct page *page);
extern unsigned long vm_total_pages;

+#ifdef CONFIG_SHRINK_MEMORY
+extern int sysctl_shrink_memory;
+extern int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *length, loff_t *ppos);
+#endif
+
+
#ifdef CONFIG_NUMA
extern int zone_reclaim_mode;
extern int sysctl_min_unmapped_ratio;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c566b56..e66581b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -275,6 +275,11 @@ static int min_extfrag_threshold;
static int max_extfrag_threshold = 1000;
#endif

+#ifdef CONFIG_SHRINK_MEMORY
+static int min_shrink_memory = 1;
+static int max_shrink_memory = 1;
+#endif
+
static struct ctl_table kern_table[] = {
{
.procname = "sched_child_runs_first",
@@ -1351,6 +1356,17 @@ static struct ctl_table vm_table[] = {
},

#endif /* CONFIG_COMPACTION */
+#ifdef CONFIG_SHRINK_MEMORY
+ {
+ .procname = "shrink_memory",
+ .data = &sysctl_shrink_memory,
+ .maxlen = sizeof(int),
+ .mode = 0200,
+ .proc_handler = sysctl_shrinkmem_handler,
+ .extra1 = &min_shrink_memory,
+ .extra2 = &max_shrink_memory,
+ },
+#endif
{
.procname = "min_free_kbytes",
.data = &min_free_kbytes,
diff --git a/mm/Kconfig b/mm/Kconfig
index b3a60ee..8e04bd9 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -657,3 +657,11 @@ config DEFERRED_STRUCT_PAGE_INIT
when kswapd starts. This has a potential performance impact on
processes running early in the lifetime of the systemm until kswapd
finishes the initialisation.
+
+config SHRINK_MEMORY
+ bool "Allow for system-wide shrinking of memory"
+ default n
+ depends on MMU
+ help
+ It enables support for system-wide memory reclaim in one shot using
+ echo 1 > /proc/sys/vm/shrink_memory.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c8d8282..e802fa7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -58,6 +58,10 @@
#define CREATE_TRACE_POINTS
#include <trace/events/vmscan.h>

+#ifdef CONFIG_SHRINK_MEMORY
+#include <linux/suspend.h>
+#endif
+
struct scan_control {
/* How many pages shrink_list() should reclaim */
unsigned long nr_to_reclaim;
@@ -3557,7 +3561,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
wake_up_interruptible(&pgdat->kswapd_wait);
}

-#ifdef CONFIG_HIBERNATION
+#if defined(CONFIG_HIBERNATION) || defined(CONFIG_SHRINK_MEMORY)
/*
* Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
* freed pages.
@@ -3576,12 +3580,16 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
.may_writepage = 1,
.may_unmap = 1,
.may_swap = 1,
- .hibernation_mode = 1,
};
struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
struct task_struct *p = current;
unsigned long nr_reclaimed;

+ if (system_entering_hibernation())
+ sc.hibernation_mode = 1;
+ else
+ sc.hibernation_mode = 0;
+
p->flags |= PF_MEMALLOC;
lockdep_set_current_reclaim_state(sc.gfp_mask);
reclaim_state.reclaimed_slab = 0;
@@ -3597,6 +3605,28 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
}
#endif /* CONFIG_HIBERNATION */

+#ifdef CONFIG_SHRINK_MEMORY
+int sysctl_shrink_memory;
+/* Entry point for system-wide memory shrinking
+ * via /proc/sys/vm/shrink_memory */
+int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *length, loff_t *ppos)
+{
+ int ret;
+
+ ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
+ if (ret)
+ return ret;
+
+ if (write) {
+ if (sysctl_shrink_memory & 1)
+ shrink_all_memory(totalram_pages);
+ }
+
+ return 0;
+}
+#endif
+
/* It's optimal to keep kswapds on the same CPUs as their memory, but
not required for correctness. So if the last cpu in a node goes
away, we get changed to run anywhere: as the first one comes back,
--
1.7.9.5

2015-07-17 07:01:35

by PINTU KUMAR

Subject: RE: [PATCHv2 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

Sorry, correcting a small typo below.

Please review and provide your comments.
This is version 2 of the previous patch.

> -----Original Message-----
> From: Pintu Kumar [mailto:[email protected]]
> Sent: Friday, July 17, 2015 12:00 PM
> Subject: [PATCHv2 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature
>
> Most importantly, it can be more effective when applied intelligently, based
> on certain conditions.
> It should be executed always and the decision is left upto the user.

* It should _not_ be executed always. The decision is left to the user.


2015-07-20 04:40:45

by PINTU KUMAR

Subject: [PATCH v3 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

This patch provides 2 things:
1. Add a new control called shrink_memory in /proc/sys/vm/.
This control can be used to aggressively reclaim memory system-wide
in one shot from user space. A value of 1 instructs the kernel to try to
reclaim up to totalram_pages pages.
Example: echo 1 > /proc/sys/vm/shrink_memory

Writing any value other than 1 to shrink_memory fails with EINVAL.

2. Enable the shrink_all_memory() API in the kernel via a new CONFIG_SHRINK_MEMORY
option. Currently, shrink_all_memory() is used only during hibernation.
With the new option, the API can also be used outside hibernation without
affecting the hibernation path.

A detailed presentation was given at the Embedded Linux Conference, March 2015:
http://events.linuxfoundation.org/sites/events/files/slides/%5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf

An example is shown below:
Device: ARMv7, Dual Core CPU 1.2GHz
RAM: 512MB (Without SWAP/ZRAM)
Linux Kernel: 3.10.17
Scenario: Just after boot-up finished.

BEFORE:
-------------------------------------------------------------------------
shell> free -tm ; cat /proc/buddyinfo
             total       used       free     shared    buffers     cached
Mem:           460        440         20          0         35        154
-/+ buffers/cache:        250        209
Swap:            0          0          0
Total:         460        440         20
Node 0, zone Normal 1037 705 92 19 19 17 4 9 0 0 0

shell> vmstat 1 &

AFTER:
-------------------------------------------------------------------------
shell> echo 1 > /proc/sys/vm/shrink_memory

 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy  id wa st
 0  0      0  20768  35876 157876    0    0     0     0   64  177  0  1  99  0  0
--------------------------------------------------------------------------------
|1  0      0  33104  34864 149808    0    0     0     0   82  221  0 12  88  0  0|
--------------------------------------------------------------------------------
 0  0      0 188776   3000  54420    0    0     0     0  216  374  0 30  70  0  0
 0  0      0 188400   3652  54528    0    0   740     8  188  337  2  1  95  2  0

shell> free -tm ; cat /proc/buddyinfo
             total       used       free     shared    buffers     cached
Mem:           460        278        182          0          4         54
-/+ buffers/cache:        219        240
Swap:            0          0          0
Total:         460        278        182
Node 0, zone Normal 5575 3158 1500 727 240 90 33 18 10 6 6

RESULTS:
-----------------------------------------------------
Around 160MB of memory was recovered in one shot.
Many higher-order pages were also recovered in the process.
From the vmstat output, system CPU usage was ~12% during the 1 second in
which this command was running.
We also measured the power consumption using a hardware power monitor.
The results are below:
Before: ~180mA
During shrink_memory: ~237mA
Duration: ~0.5 sec
Additional current: ~57mA (237mA - 180mA)

FURTHER OBSERVATIONS:
-----------------------------------------------------
37% fewer application kills when shrink_memory is invoked at boot-up.
Around 4000 fewer page faults.
Around 43% reduction in kswapd calls.
Entries into the page allocator slow path dropped drastically.
Combining shrink_memory with compaction shows good benefits against fragmentation.

APPLICATION LAUNCH BEHAVIOR:
-----------------------------------------------------
During First Launch:
============================================================================
Application  Before_shrink_memory  After_shrink_memory  Difference
Camera       1.981                 1.86                 0.121
Gallery      1.276                 0.94                 0.336
contacts     1.112                 0.941                0.171
messaging    0.886                 0.795                0.091
settings     1.257                 1.212                0.045
Music        1.854                 2.098                -0.244
Gmail        1.872                 1.935                -0.063
Browser      2.569                 2.677                -0.108
============================================================================

During Re-launch:
============================================================================
Application  Before_shrink_memory  After_shrink_memory  Difference
Camera       1.248                 0.976                0.272
Gallery      0.697                 0.633                0.064
contacts     0.506                 0.561                -0.055
messaging    0.533                 0.489                0.044
settings     0.833                 0.805                0.028
Music        0.832                 0.769                0.063
Gmail        0.913                 0.841                0.072
Browser      0.579                 0.57                 0.009
============================================================================

Other possible use cases:
----------------------------------------------------------------------------
1) Just after system boot-up has finished, by writing the sysctl from a
boot-up script.
2) During system suspend, after suspend_freeze_processes()
[kernel/power/suspend.c], based on conditions such as the fragmentation or
free-memory state.
3) From the Android ION system heap driver, when order-4 allocations start
failing, by calling shrink_all_memory() from a separate worker thread under
certain conditions.
4) It can be combined with compact_memory to achieve better results against
memory fragmentation.
5) It can be helpful in debugging and tuning various vm parameters.
6) It can help identify the maximum amount of memory that is reclaimable at
any point in time, and how many higher-order pages could be formed from it.
This in turn can help in tuning the reserved-memory requirements of a system.
7) It can help in sizing swap properly. shrink_all_memory() sets
may_swap = 1, so unused pages can be swapped out. Running shrink_memory on a
heavily loaded system therefore shows how full swap gets; that figure plus a
10% margin can serve as the maximum swap size.
If ZRAM is used, the pages are also compressed and stored for later use.
8) It can allow more new applications to be launched without killing older
ones, by moving least recently used pages to swap, so user data is retained.
9) It can be part of a system utility to quickly defragment all of system
memory.
10) It may also help reduce fragmentation within the CMA region.
11) More use cases may be identified.

Most importantly, it is more effective when applied selectively, based on
certain conditions.
It should not be executed unconditionally; the decision is left up to the user.

Signed-off-by: Pintu Kumar <[email protected]>
---
V3: Corrected a small typo at the end of the commit message.

V2: Added min/max parameters for shrink_memory, as suggested by
Heinrich Schuchardt <[email protected]>.
Added error handling in sysctl_shrinkmem_handler for any value other than 1,
as suggested by Heinrich Schuchardt <[email protected]>.
Fixed the HIBERNATION+SHRINK_MEMORY issue in shrink_all_memory,
as suggested by [email protected].
Restored gfp_mask to the original because of other dependencies.
Also, adding GFP_RECLAIM_MASK does not affect anything.
Verified power consumption data during shrink_memory,
as suggested by Johannes Weiner <[email protected]>.
Verified application launch/re-launch scenarios before/after shrink_memory,
as suggested by Xishi Qiu <[email protected]>.
Updated the commit message with examples and use cases.

Documentation/sysctl/vm.txt | 18 ++++++++++++++++++
include/linux/swap.h | 7 +++++++
kernel/sysctl.c | 16 ++++++++++++++++
mm/Kconfig | 8 ++++++++
mm/vmscan.c | 34 ++++++++++++++++++++++++++++++++--
5 files changed, 81 insertions(+), 2 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 9832ec5..54eda3a 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -54,6 +54,7 @@ Currently, these files are in /proc/sys/vm:
- page-cluster
- panic_on_oom
- percpu_pagelist_fraction
+- shrink_memory
- stat_interval
- swappiness
- user_reserve_kbytes
@@ -718,6 +719,23 @@ sysctl, it will revert to this default behavior.

==============================================================

+shrink_memory
+
+This control is available only when CONFIG_SHRINK_MEMORY is set. This control
+can be used to aggressively reclaim memory system-wide in one shot. A value of
+1 will instruct the kernel to reclaim as much as totalram_pages in the system.
+For example, to reclaim all memory system-wide we can do:
+# echo 1 > /proc/sys/vm/shrink_memory
+
+Writing any value other than 1 to shrink_memory fails with EINVAL.
+
+For more information about this control, see the following presentation
+from the Embedded Linux Conference, 2015:
+http://events.linuxfoundation.org/sites/events/files/slides/
+%5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
+
+==============================================================
+
stat_interval

The time interval between which vm statistics are updated. The default
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 9a7adfb..6505b0b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -333,6 +333,13 @@ extern int vm_swappiness;
extern int remove_mapping(struct address_space *mapping, struct page *page);
extern unsigned long vm_total_pages;

+#ifdef CONFIG_SHRINK_MEMORY
+extern int sysctl_shrink_memory;
+extern int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *length, loff_t *ppos);
+#endif
+
+
#ifdef CONFIG_NUMA
extern int zone_reclaim_mode;
extern int sysctl_min_unmapped_ratio;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c566b56..e66581b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -275,6 +275,11 @@ static int min_extfrag_threshold;
static int max_extfrag_threshold = 1000;
#endif

+#ifdef CONFIG_SHRINK_MEMORY
+static int min_shrink_memory = 1;
+static int max_shrink_memory = 1;
+#endif
+
static struct ctl_table kern_table[] = {
{
.procname = "sched_child_runs_first",
@@ -1351,6 +1356,17 @@ static struct ctl_table vm_table[] = {
},

#endif /* CONFIG_COMPACTION */
+#ifdef CONFIG_SHRINK_MEMORY
+ {
+ .procname = "shrink_memory",
+ .data = &sysctl_shrink_memory,
+ .maxlen = sizeof(int),
+ .mode = 0200,
+ .proc_handler = sysctl_shrinkmem_handler,
+ .extra1 = &min_shrink_memory,
+ .extra2 = &max_shrink_memory,
+ },
+#endif
{
.procname = "min_free_kbytes",
.data = &min_free_kbytes,
diff --git a/mm/Kconfig b/mm/Kconfig
index b3a60ee..8e04bd9 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -657,3 +657,11 @@ config DEFERRED_STRUCT_PAGE_INIT
when kswapd starts. This has a potential performance impact on
processes running early in the lifetime of the systemm until kswapd
finishes the initialisation.
+
+config SHRINK_MEMORY
+ bool "Allow for system-wide shrinking of memory"
+ default n
+ depends on MMU
+ help
+ It enables support for system-wide memory reclaim in one shot using
+ echo 1 > /proc/sys/vm/shrink_memory.
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c8d8282..e802fa7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -58,6 +58,10 @@
#define CREATE_TRACE_POINTS
#include <trace/events/vmscan.h>

+#ifdef CONFIG_SHRINK_MEMORY
+#include <linux/suspend.h>
+#endif
+
struct scan_control {
/* How many pages shrink_list() should reclaim */
unsigned long nr_to_reclaim;
@@ -3557,7 +3561,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
wake_up_interruptible(&pgdat->kswapd_wait);
}

-#ifdef CONFIG_HIBERNATION
+#if defined CONFIG_HIBERNATION || CONFIG_SHRINK_MEMORY
/*
* Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
* freed pages.
@@ -3576,12 +3580,16 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
.may_writepage = 1,
.may_unmap = 1,
.may_swap = 1,
- .hibernation_mode = 1,
};
struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
struct task_struct *p = current;
unsigned long nr_reclaimed;

+ if (system_entering_hibernation())
+ sc.hibernation_mode = 1;
+ else
+ sc.hibernation_mode = 0;
+
p->flags |= PF_MEMALLOC;
lockdep_set_current_reclaim_state(sc.gfp_mask);
reclaim_state.reclaimed_slab = 0;
@@ -3597,6 +3605,28 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
}
#endif /* CONFIG_HIBERNATION */

+#ifdef CONFIG_SHRINK_MEMORY
+int sysctl_shrink_memory;
+/* This is the entry point for system-wide shrink memory
+ * via /proc/sys/vm/shrink_memory */
+int sysctl_shrinkmem_handler(struct ctl_table *table, int write,
+ void __user *buffer, size_t *length, loff_t *ppos)
+{
+ int ret;
+
+ ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
+ if (ret)
+ return ret;
+
+ if (write) {
+ if (sysctl_shrink_memory & 1)
+ shrink_all_memory(totalram_pages);
+ }
+
+ return 0;
+}
+#endif
+
/* It's optimal to keep kswapds on the same CPUs as their memory, but
not required for correctness. So if the last cpu in a node goes
away, we get changed to run anywhere: as the first one comes back,
--
1.7.9.5
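To see the v2 range check in isolation: the patch sets both .extra1 and .extra2 to 1 and lets proc_dointvec_minmax reject anything else. The sketch below models only that validation step in user space. `model_shrink_memory_write()` and the SHRINK_MEMORY_MIN/MAX macros are hypothetical names for illustration, and the real proc_dointvec_minmax accepts more input forms (such as multiple values and surrounding whitespace) than this simplified parser does.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical names mirroring min_shrink_memory / max_shrink_memory. */
#define SHRINK_MEMORY_MIN 1
#define SHRINK_MEMORY_MAX 1

/*
 * Hypothetical user-space model of the shrink_memory write path.
 * Parses one decimal integer (a trailing newline, as written by
 * `echo 1 > ...`, is accepted) and returns -EINVAL when the text is
 * not a number or the value lies outside [MIN, MAX]. Only in-range
 * values are stored in *out, much as proc_dointvec_minmax leaves the
 * sysctl's .data untouched on a range error.
 */
int model_shrink_memory_write(const char *buf, int *out)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(buf, &end, 10);
	if (errno != 0 || end == buf)
		return -EINVAL;		/* not a number at all */
	if (*end == '\n')
		end++;			/* allow the newline from echo */
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage, e.g. "1x" */
	if (val < SHRINK_MEMORY_MIN || val > SHRINK_MEMORY_MAX)
		return -EINVAL;		/* out of range: only 1 is allowed */
	*out = (int)val;
	return 0;
}
```

For example, `model_shrink_memory_write("1\n", &v)` returns 0 and sets `v` to 1, while "0", "2" or "abc" return -EINVAL and leave `v` unchanged. Under this model, the `sysctl_shrink_memory & 1` test in sysctl_shrinkmem_handler adds nothing, since the range check has already rejected every other value.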

2015-07-20 08:28:36

by Mel Gorman

Subject: Re: [PATCH v3 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

On Mon, Jul 20, 2015 at 09:59:04AM +0530, Pintu Kumar wrote:
> This patch provides 2 things:
> 1. Add new control called shrink_memory in /proc/sys/vm/.
> This control can be used to aggressively reclaim memory system-wide
> in one shot from the user space. A value of 1 will instruct the
> kernel to reclaim as much as totalram_pages in the system.
> Example: echo 1 > /proc/sys/vm/shrink_memory
>
> If any other value than 1 is written to shrink_memory an error EINVAL
> occurs.
>
> 2. Enable shrink_all_memory API in kernel with new CONFIG_SHRINK_MEMORY.
> Currently, shrink_all_memory function is used only during hibernation.
> With the new config we can make use of this API for non-hibernation case
> also without disturbing the hibernation case.
>
> The detailed paper was presented in Embedded Linux Conference, Mar-2015
> http://events.linuxfoundation.org/sites/events/files/slides/
> %5BELC-2015%5D-System-wide-Memory-Defragmenter.pdf
>

Johannes has already reviewed this series and explained why it's a bad
idea. This is just a note to say that I agree with the points he made and also
think that adding an additional knob to reclaim data from user space is a bad
idea. Even drop_caches is only intended as a debugging tool to illustrate
cases where normal reclaim is broken. Similarly compact_node exists as a
debugging tool to check if direct compaction is not behaving as expected.

If this is invoked when high-order allocations start failing and memory is
fragmented with unreclaimable memory then it'll potentially keep thrashing
depending on the userspace monitor implementation. If the latency of
a high order allocation is important then reclaim/compaction should be
examined and improved. If the reliability of high-order allocations is
important then you need to reserve the memory in advance. If that
is undesirable due to a constrained memory environment then one approach
is to modify how pages are grouped by mobility as described in the leader
of the series "Remove zonelist cache and high-order watermark checking".
There are two suggestions there for out-of-tree patches that would make
high-order allocations more reliable that are not suitable for mainline.

Yes, I read your presentation but let's go through the use cases you
list again;

> Various other use cases where this can be used:
> ----------------------------------------------------------------------------
> 1) Just after system boot-up is finished, using the sysctl configuration from
> bootup script.

Almost no benefit. Any page cache that is active and now cold would be
trivially reclaimed later.

> 2) During system suspend state, after suspend_freeze_processes()
> [kernel/power/suspend.c]
> Based on certain condition about fragmentation or free memory state.

No gain.

> 3) From Android ION system heap driver, when order-4 allocation starts failing.
> By calling shrink_all_memory, in a separate worker thread, based on certain
> condition.

If order-4 allocations fail when shrink_all_memory works and the order-4
allocation is required to work then the aggressiveness of reclaim/compaction
needs to be fixed to reclaim all system memory if necessary. Right now
it can bail because generally it is expected that no subsystem depends on
high order allocations succeeding for functional correctness.

> 4) It can be combined with compact_memory to achieve better results on memory
> fragmentation.

Only by reclaiming the world. In 3.0 the system behaved like this. High order
stress tests could take hours to complete as the system was continually
thrashed. Today the same test would complete in about 15 minutes albeit
with lower allocation success rates. We ran into multiple issues where
high order allocation requests caused the system to thrash and triggering
such thrashing from userspace is not an improvement.

> 5) It can be helpful in debugging and tuning various vm parameters.

No more than drop_caches is.

> 6) It can be helpful to identify how much of maximum memory could be
> reclaimable at any point of time.

Only by reclaiming the world. A less destructive means is using
MemAvailable from /proc/meminfo

> And how much higher-order pages could be formed with this amount of
> reclaimable memory.

Only by reclaiming the world.

> Thus it can be helpful in accordingly tuning the reserved memory needs
> of a system.

By which time it's too late as a reboot will be necessary to set the
reserve.

> 7) It can be helpful in properly tuning the SWAP size in the system.

Only for a single point in time as it's workload dependent. The same
data can be inferred from smaps.

> In shrink_all_memory, we enable may_swap = 1, that means all unused pages
> will be swapped out.
> Thus, running shrink_memory on a heavily loaded system, we can check how much
> swap is getting full.
> That can be the maximum swap size with a 10% delta.
> Also if ZRAM is used, it helps us in compressing and storing the pages for
> later use.
> 8) It can be helpful to allow more new applications to be launched, without
> killing the older ones.

Reclaim would achieve the same effect over time.

> And moving the least recently used pages to the SWAP area.
> Thus user data can be retained.
> 9) Can be part of a system utility to quickly defragment entire system
> memory.

Any memory that is not on the LRU or indirectly pinned by pages on the
LRU is unaffected.

If high-order allocation latency or reliability is important then you
really need a different solution because unless this thing runs
continually to keep memory unused then it'll eventually fail hard and
the system will perform poorly in the meantime.

--
Mel Gorman
SUSE Labs
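Mel's less destructive alternatives, reading MemAvailable from /proc/meminfo and inferring swap usage from smaps, both come down to summing a `Key:  value kB` field. A minimal sketch, assuming the usual one-field-per-line layout of those files; `sum_kb_field()` is a hypothetical helper, not an existing library call:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: sum every "<key>: <value> kB" line in a /proc-style
 * text buffer. The key must match exactly and be followed by ':', so
 * "Swap" does not pick up "SwapPss:" or "SwapTotal:" lines.
 * Returns the total in kB, or -1 if the key never appears.
 */
long long sum_kb_field(const char *text, const char *key)
{
	size_t keylen = strlen(key);
	long long total = 0;
	int found = 0;
	const char *line = text;

	while (line != NULL && *line != '\0') {
		const char *next = strchr(line, '\n');

		if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
			long long kb;

			/* sscanf skips the padding spaces before the number. */
			if (sscanf(line + keylen + 1, "%lld", &kb) == 1) {
				total += kb;
				found = 1;
			}
		}
		line = next != NULL ? next + 1 : NULL;
	}
	return found ? total : -1;
}
```

Passing the contents of /proc/meminfo with key "MemAvailable" returns the kernel's estimate of memory available for new applications without swapping, with nothing reclaimed. Passing a process's /proc/<pid>/smaps with key "Swap" returns that process's swapped-out total. Either figure is only a snapshot of a workload-dependent quantity.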

2015-07-20 08:49:39

by Michal Hocko

Subject: Re: [PATCH v3 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

On Mon 20-07-15 09:28:10, Mel Gorman wrote:
[...]
> If high-order allocation latency or reliability is important then you
> really need a different solution because unless this thing runs
> continually to keep memory unused then it'll eventually fail hard and
> the system will perform poorly in the meantime.

Completely agreed. E.g. Vlastimil was suggesting a background compaction
daemon which might be a way to go.
--
Michal Hocko
SUSE Labs

2015-07-20 16:13:17

by PINTU KUMAR

Subject: RE: [PATCH v3 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

Hi,

Thank you all for reviewing the patch and providing your valuable comments and
suggestions.
During the ELC conference, many people suggested submitting the patch to
mainline, so I sent it to get others' opinions.

If you have any more suggestions to experiment and verify please let me know.

The suggestion was only to open up the shrink_all_memory API for some use cases.

I am not saying that it needs to be called continuously. It can be used only
under certain conditions and only when deemed necessary.
The same technique is already used in hibernation to reduce the size of the
RAM snapshot image.
But in the embedded world hibernation is not used, so this feature cannot be
utilized there.

Thanks once again for the review and feedback.


2015-07-20 17:55:52

by Mel Gorman

Subject: Re: [PATCH v3 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

On Mon, Jul 20, 2015 at 09:43:02PM +0530, PINTU KUMAR wrote:
> Hi,
>
> Thank you all for reviewing the patch and providing your valuable comments and
> suggestions.
> During the ELC conference many people suggested to release the patch to
> mainline, so this patch, to get others opinion.
>

Unfortunately, in my opinion it runs the risk of creating a different
set of problems. Either it needs to be run frequently to keep memory free
which incurs one set of penalties or it is used too late when there are
unmovable/unreclaimable pages preventing allocations succeeding in which
case you are back at the original problem. I see what you did and why it
would work in some cases but I think the main reason it works is because
it's run frequently enough so memory is never used. Grouping pages by
mobility actually took advantage of a similar property when it increased
min_free_kbytes but that was much more limited than adding a giant hammer
for userspace to reclaim the world.

> If you have any more suggestions to experiment and verify please let me know.
>

I believe I already did. If it's high-order reliability that is important
then you need to either reserve the memory or look at protecting the pages
using grouping pages by mobility. I pointed out what series to look at and
the leader explains how it could be adjusted further for the embedded case
if necessary.

If it's latency you are interested in then reclaim/compaction needs to
be modified to be more aggressive when it is somehow detected that the
high-order allocation must succeed for functional correctness. In that case
the relational starting point would be to look at should_continue_reclaim
and how it relates to compaction.

> The suggestion was only to open up the shrink_all_memory API for some use cases.
>
> I am not saying that it needs to be called continuously. It can be used only on
> certain condition and only when deemed necessary.
> The same technique is already used in hibernation to reduce the RAM snapshot
> image size.

Reducing memory usage is not the same as guaranteeing that high-order
pages are available for allocation.

--
Mel Gorman
SUSE Labs

2015-07-22 13:07:20

by PINTU KUMAR

Subject: RE: [PATCH v3 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

Dear Mel, thank you very much for your comments and suggestions.
I will drop this one and look into further improving direct reclaim and
compaction.
Just a few more comments below before I close.

Also, while working on this patch, I felt that the hibernation_mode handling in
shrink_all_memory could be corrected.
So, can I submit the change below as a separate patch?
That is, instead of hard-coding hibernation_mode, we can get the hibernation
status using:
system_entering_hibernation()

Please let me know your suggestions about these changes.

-#ifdef CONFIG_HIBERNATION
+#if defined CONFIG_HIBERNATION || CONFIG_SHRINK_MEMORY
/*
* Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
* freed pages.
@@ -3576,12 +3580,16 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
.may_writepage = 1,
.may_unmap = 1,
.may_swap = 1,
- .hibernation_mode = 1,
};
struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
struct task_struct *p = current;
unsigned long nr_reclaimed;

+ if (system_entering_hibernation())
+ sc.hibernation_mode = 1;
+ else
+ sc.hibernation_mode = 0;
+
p->flags |= PF_MEMALLOC;
lockdep_set_current_reclaim_state(sc.gfp_mask);
reclaim_state.reclaimed_slab = 0;
@@ -3597,6 +3605,28 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
}
#endif /* CONFIG_HIBERNATION */


> -----Original Message-----
> From: Mel Gorman [mailto:[email protected]]
> Sent: Monday, July 20, 2015 11:26 PM
> To: PINTU KUMAR
> Subject: Re: [PATCH v3 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature
>
> On Mon, Jul 20, 2015 at 09:43:02PM +0530, PINTU KUMAR wrote:
> > Hi,
> >
> > Thank you all for reviewing the patch and providing your valuable
> > comments and suggestions.
> > During the ELC conference many people suggested to release the patch
> > to mainline, so this patch, to get others opinion.
> >
>
> Unfortunately, in my opinion it runs the risk of creating a different set of
> problems. Either it needs to be run frequently to keep memory free which incurs
> one set of penalties or it is used too late when there are
> unmovable/unreclaimable pages preventing allocations succeeding in which case
> you are back at the original problem.

Yes, I completely agree with you that it needs to be invoked at the right time.
Running it too late is of no benefit.

> I see what you did and why it would work in some cases
> but I think the main reason it works is because it's run frequently
> enough so memory is never used.

Yes, we ran it periodically, but not very frequently, and only when required.
Actually, it gives us the best results when shrink_memory and compaction are
called together: once after boot, and once on an order-4 allocation failure in
the kernel or during suspend.
It reduced the slowpath count drastically (during a 30-application launch test):
VMSTAT              WITHOUT   WITH
slowpath_entered      16659   1859
allocstall              298    149
pageoutrun             2699   1108
compact_stall           244     37
nr_free_cma            2560   2505

Anyway, I agree that if there are not enough reclaimable pages or free swap
space, it does not yield good results.
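The before/after numbers above are /proc/vmstat-style counters (`name value`, one per line; slowpath_entered may come from a locally instrumented kernel rather than mainline). A sketch of how such a comparison could be reproduced from saved snapshots; `vmstat_counter()` and `percent_reduction()` are hypothetical helpers. Event counters are cumulative since boot, so a fair comparison subtracts a snapshot taken before each test run from one taken after it.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the value of one counter from
 * /proc/vmstat-formatted text ("name value" per line), or -1 if absent.
 */
long long vmstat_counter(const char *text, const char *name)
{
	size_t len = strlen(name);
	const char *line = text;

	while (line != NULL && *line != '\0') {
		/* Require an exact name match followed by the separating space. */
		if (strncmp(line, name, len) == 0 && line[len] == ' ') {
			long long val;

			if (sscanf(line + len, "%lld", &val) == 1)
				return val;
		}
		line = strchr(line, '\n');
		if (line != NULL)
			line++;
	}
	return -1;
}

/* Percentage drop from `without` to `with`; positive means fewer events. */
double percent_reduction(long long without, long long with)
{
	if (without == 0)
		return 0.0;
	return 100.0 * (double)(without - with) / (double)without;
}
```

Applied to the table, this gives roughly an 88.8% reduction for slowpath_entered, 50.0% for allocstall, 58.9% for pageoutrun and 84.8% for compact_stall. nr_free_cma is a gauge of currently free CMA pages rather than an event counter, so a percentage reduction is not meaningful for it.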

> Grouping pages by mobility actually took
> advantage of a similar property when it increased min_free_kbytes but that was
> much more limited than adding a giant hammer for userspace to reclaim the
> world.
>
> > If you have any more suggestions to experiment and verify please let me know.
> >
>
> I believe I already did. If it's high-order reliability that is important then
> you need to either reserve the memory or look at protecting the pages using
> grouping pages by mobility. I pointed out what series to look at and the
> leader explains how it could be adjusted further for the embedded case if
> necessary.

Thanks. I will definitely look into grouping pages by mobility and that series.

>
> If it's latency you are interested in then reclaim/compaction needs to be
> modified to be more aggressive when it is somehow detected that the
> high-order allocation must succeed for functional correctness. In that case
> the relational starting point would be to look at should_continue_reclaim
> and how it relates to compaction.
>
Thanks. I will definitely do a deep dive into should_continue_reclaim.

> > The suggestion was only to open up the shrink_all_memory API for some use cases.
> >
> > I am not saying that it needs to be called continuously. It can be
> > used only on certain condition and only when deemed necessary.
> > The same technique is already used in hibernation to reduce the RAM
> > snapshot image size.
>
> Reducing memory usage is not the same as guaranteeing that high-order pages
> are available for allocation.
>
> --
> Mel Gorman
> SUSE Labs

2015-07-22 14:05:46

by Mel Gorman

Subject: Re: [PATCH v3 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

On Wed, Jul 22, 2015 at 06:33:26PM +0530, PINTU KUMAR wrote:
> Dear Mel, thank you very much for your comments and suggestions.
> I will drop this one and look on further improving direct_reclaim and
> compaction.
> Just few more comments below before I close.
>
> Also, during this patch, I feel that the hibernation_mode part in
> shrink_all_memory can be corrected.
> So, can I separately submit the below patch?
> That is instead of hard-coding the hibernation_mode, we can get hibernation
> status using:
> system_entering_hibernation()
>
> Please let me know your suggestion about this changes.
>
> -#ifdef CONFIG_HIBERNATION
> +#if defined CONFIG_HIBERNATION || CONFIG_SHRINK_MEMORY

This appears to be a patch on top of "Add /proc/sys/vm/shrink_memory
feature" so I do not see what would be separately submitted that would
make sense.

--
Mel Gorman
SUSE Labs
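On the guard line quoted above: `#if defined CONFIG_HIBERNATION || CONFIG_SHRINK_MEMORY` tests the first option with `defined` but the second by value. It still selects the right code, because Kconfig defines enabled bool options to 1 and an undefined identifier evaluates to 0 inside `#if`, but the conventional spelling uses `defined()` on both. A small self-contained illustration; the macro setting is a hypothetical stand-in for a .config with hibernation on and shrink_memory off:

```c
/*
 * Hypothetical .config stand-in: hibernation enabled (Kconfig defines
 * enabled bool options to 1), CONFIG_SHRINK_MEMORY deliberately unset.
 */
#define CONFIG_HIBERNATION 1

/* As posted: `defined` on the first option, the second tested by value. */
#if defined CONFIG_HIBERNATION || CONFIG_SHRINK_MEMORY
int guard_as_posted(void) { return 1; }
#else
int guard_as_posted(void) { return 0; }
#endif

/* Conventional form: `defined()` on both options. */
#if defined(CONFIG_HIBERNATION) || defined(CONFIG_SHRINK_MEMORY)
int guard_idiomatic(void) { return 1; }
#else
int guard_idiomatic(void) { return 0; }
#endif
```

Both functions return 1 here. The `defined()` form states the intent directly and avoids GCC's -Wundef warning, which the posted form can trigger when neither option is set; the closing `#endif /* CONFIG_HIBERNATION */` comment would also ideally name both options.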

2015-07-29 05:12:25

by PINTU KUMAR

Subject: RE: [PATCH v3 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

Sorry for the late reply.

> -----Original Message-----
> From: Mel Gorman [mailto:[email protected]]
> Sent: Wednesday, July 22, 2015 7:36 PM
> To: PINTU KUMAR
> Subject: Re: [PATCH v3 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature
>
> On Wed, Jul 22, 2015 at 06:33:26PM +0530, PINTU KUMAR wrote:
> > Dear Mel, thank you very much for your comments and suggestions.
> > I will drop this one and look on further improving direct_reclaim and
> > compaction.
> > Just few more comments below before I close.
> >
> > Also, during this patch, I feel that the hibernation_mode part in
> > shrink_all_memory can be corrected.
> > So, can I separately submit the below patch?
> > That is instead of hard-coding the hibernation_mode, we can get
> > hibernation status using:
> > system_entering_hibernation()
> >
> > Please let me know your suggestion about this changes.
> >
> > -#ifdef CONFIG_HIBERNATION
> > +#if defined CONFIG_HIBERNATION || CONFIG_SHRINK_MEMORY
>
I was talking only about the following change.
Instead of hard-coding hibernation_mode in shrink_all_memory,
we can set it at runtime.

- .hibernation_mode = 1,

+ if (system_entering_hibernation())
+ sc.hibernation_mode = 1;
+ else
+ sc.hibernation_mode = 0;

The PM owners should confirm if this is ok.
Once confirmed, I will submit the full patch set.

> This appears to be a patch on top of "Add /proc/sys/vm/shrink_memory feature"
> so I do not see what would be separately submitted that would make sense.
>
And this change does not need the /proc/sys/vm/shrink_memory patch.

However, if required, we can also expose shrink_all_memory() outside of
hibernation using CONFIG_SHRINK_MEMORY.
Otherwise, the other changes can be dropped.

> --
> Mel Gorman
> SUSE Labs
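If the runtime selection proposed here were kept, the if/else could be written as one assignment, since system_entering_hibernation() returns bool. The sketch below models that in user space; `fake_system_entering_hibernation()`, the `entering_hibernation` flag and `struct scan_control_model` are hypothetical stand-ins, with hibernation_mode modelled as a one-bit field:

```c
#include <stdbool.h>

/* Hypothetical flag standing in for the PM core's hibernation state. */
bool entering_hibernation;

/* Hypothetical stand-in for the kernel's system_entering_hibernation(). */
bool fake_system_entering_hibernation(void)
{
	return entering_hibernation;
}

/* Hypothetical, trimmed model of struct scan_control. */
struct scan_control_model {
	unsigned int hibernation_mode:1;
};

/* Equivalent to the proposed if/else: a bool converts to exactly 0 or 1. */
void set_hibernation_mode(struct scan_control_model *sc)
{
	sc->hibernation_mode = fake_system_entering_hibernation();
}
```

Assigning a bool to a one-bit unsigned field stores exactly 0 or 1, so the behaviour matches the if/else form.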

2015-07-29 12:08:54

by Johannes Weiner

Subject: Re: [PATCH v3 1/1] kernel/sysctl.c: Add /proc/sys/vm/shrink_memory feature

On Wed, Jul 29, 2015 at 10:41:10AM +0530, PINTU KUMAR wrote:
> I was talking about only the following case.
> Instead of hard coding the hibernation_mode in shrink_all_memory,
> We can set it at runtime.
>
> - .hibernation_mode = 1,
>
> + if (system_entering_hibernation())
> + sc.hibernation_mode = 1;
> + else
> + sc.hibernation_mode = 0;

Nobody outside hibernation uses this function (and likely never will).
The hardcoding is fine.