This patch alters the admin and user reserves of the previous patches
in this series when memory is added or removed.
If memory is added and the reserves have been eliminated or increased above
the default max, then we'll trust the admin.
If memory is removed and there isn't enough free memory, then we
need to reset the reserves.
Otherwise keep the reserve set by the admin.
The reserve reset code is the same as the reserve initialization code.
Does this sound reasonable to other people? I figured that hot removal
with too large of memory in the reserves was the most important case
to get right.
I tested hot addition and removal by triggering it via sysfs. The reserves
shrunk when they were set high and memory was removed. They were reset
higher when memory was added again.
---
mm/mmap.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 63 insertions(+)
diff --git a/mm/mmap.c b/mm/mmap.c
index 8fb9d99..321348b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -33,6 +33,8 @@
#include <linux/uprobes.h>
#include <linux/rbtree_augmented.h>
#include <linux/sched/sysctl.h>
+#include <linux/notifier.h>
+#include <linux/memory.h>
#include <asm/uaccess.h>
#include <asm/cacheflush.h>
@@ -3111,3 +3113,64 @@ int __meminit init_admin_reserve(void)
return 0;
}
module_init(init_admin_reserve)
+
+/*
+ * Reinititalise user and admin reserves if memory is added or removed.
+ *
+ * If memory is added and the reserves have been eliminated or increased above
+ * the default max, then we'll trust the admin.
+ *
+ * If memory is removed and there isn't enough free memory, then we
+ * need to reset the reserves.
+ *
+ * Otherwise keep the reserve set by the admin.
+ */
+static int reserve_mem_notifier(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ unsigned long tmp, free_kbytes;
+
+ switch (action) {
+ case MEM_ONLINE:
+ tmp = sysctl_user_reserve_kbytes;
+ if (0 < tmp && tmp < (1UL << 17))
+ init_user_reserve();
+
+ tmp = sysctl_admin_reserve_kbytes;
+ if (0 < tmp && tmp < (1UL << 13))
+ init_admin_reserve();
+
+ break;
+ case MEM_OFFLINE:
+ free_kbytes = global_page_state(NR_FREE_PAGES) << (PAGE_SHIFT - 10);
+
+ if (sysctl_user_reserve_kbytes > free_kbytes) {
+ init_user_reserve();
+ pr_info("vm.user_reserve_kbytes reset to %lu\n",
+ sysctl_user_reserve_kbytes);
+ }
+
+ if (sysctl_admin_reserve_kbytes > free_kbytes) {
+ init_admin_reserve();
+ pr_info("vm.admin_reserve_kbytes reset to %lu\n",
+ sysctl_admin_reserve_kbytes);
+ }
+ break;
+ default:
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block reserve_mem_nb = {
+ .notifier_call = reserve_mem_notifier,
+};
+
+int __meminit init_reserve_notifier(void)
+{
+ if (register_memory_notifier(&reserve_mem_nb))
+ printk("Failed registering memory add/remove notifier for admin reserve");
+
+ return 0;
+}
+module_init(init_reserve_notifier)
--
1.8.0.1
On Mon, 8 Apr 2013 15:07:38 -0400 Andrew Shewmaker <[email protected]> wrote:
> This patch alters the admin and user reserves of the previous patches
> in this series when memory is added or removed.
>
> If memory is added and the reserves have been eliminated or increased above
> the default max, then we'll trust the admin.
>
> If memory is removed and there isn't enough free memory, then we
> need to reset the reserves.
>
> Otherwise keep the reserve set by the admin.
>
> The reserve reset code is the same as the reserve initialization code.
>
> Does this sound reasonable to other people? I figured that hot removal
> with too large of memory in the reserves was the most important case
> to get right.
Seems reasonable to me.
I don't understand the magic numbers 1<<13 and 1<<17. How could I?
Please add comments explaining how and why these were chosen.
Your patch adds 400 bytes of unusable code to the
CONFIG_MEMORY_HOTPLUG=n kernel. We have a fix for that in the CPU
hotplug case: register_hotcpu_notifier(). Memory hotplug has its own
hotplug_memory_notifier() but a) it's broken and b) it just doesn't
work! With my gcc-4.4.4, the unused functions are still included in
the .o file.
So I did this:
From: Andrew Morton <[email protected]>
Subject: include/linux/memory.h: implement register_hotmemory_notifier()
When CONFIG_MEMORY_HOTPLUG=n, we don't want the memory-hotplug notifier
handlers to be included in the .o files, for space reasons.
The existing hotplug_memory_notifier() tries to handle this but testing
with gcc-4.4.4 shows that it doesn't work - the hotplug functions are
still present in the .o files.
So implement a new register_hotmemory_notifier() which is a copy of
register_hotcpu_notifier(), and which actually works as desired.
hotplug_memory_notifier() and register_memory_notifier() callsites should
be converted to use this new register_hotmemory_notifier().
While we're there, let's repair the existing hotplug_memory_notifier(): it
simply stomps on the register_memory_notifier() return value, so
well-behaved code cannot check for errors. Apparently non of the existing
callers were well-behaved :(
Cc: Andrew Shewmaker <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---
include/linux/memory.h | 15 ++++++++++++---
include/linux/notifier.h | 5 ++++-
2 files changed, 16 insertions(+), 4 deletions(-)
diff -puN include/linux/memory.h~include-linux-memoryh-implement-register_hotmemory_notifier include/linux/memory.h
--- a/include/linux/memory.h~include-linux-memoryh-implement-register_hotmemory_notifier
+++ a/include/linux/memory.h
@@ -18,6 +18,7 @@
#include <linux/node.h>
#include <linux/compiler.h>
#include <linux/mutex.h>
+#include <linux/notifier.h>
#define MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS)
@@ -127,13 +128,21 @@ enum mem_add_context { BOOT, HOTPLUG };
#endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
#ifdef CONFIG_MEMORY_HOTPLUG
-#define hotplug_memory_notifier(fn, pri) { \
+#define hotplug_memory_notifier(fn, pri) ({ \
static __meminitdata struct notifier_block fn##_mem_nb =\
{ .notifier_call = fn, .priority = pri }; \
register_memory_notifier(&fn##_mem_nb); \
-}
+})
+#define register_hotmemory_notifier(nb) register_memory_notifier(nb)
+#define unregister_hotmemory_notifier(nb) unregister_memory_notifier(nb)
#else
-#define hotplug_memory_notifier(fn, pri) do { } while (0)
+static inline int hotplug_memory_notifier(notifier_fn_t fn, int priority)
+{
+ return 0;
+}
+/* These aren't inline functions due to a GCC bug. */
+#define register_hotmemory_notifier(nb) ({ (void)(nb); 0; })
+#define unregister_hotmemory_notifier(nb) ({ (void)(nb); })
#endif
/*
diff -puN include/linux/notifier.h~include-linux-memoryh-implement-register_hotmemory_notifier include/linux/notifier.h
--- a/include/linux/notifier.h~include-linux-memoryh-implement-register_hotmemory_notifier
+++ a/include/linux/notifier.h
@@ -47,8 +47,11 @@
* runtime initialization.
*/
+typedef int (*notifier_fn_t)(struct notifier_block *nb,
+ unsigned long action, void *data);
+
struct notifier_block {
- int (*notifier_call)(struct notifier_block *, unsigned long, void *);
+ notifier_fn_t notifier_call;
struct notifier_block __rcu *next;
int priority;
};
_
And then I changed your patch thusly:
--- a/mm/mmap.c~mm-reinititalise-user-and-admin-reserves-if-memory-is-added-or-removed-fix
+++ a/mm/mmap.c
@@ -3198,7 +3198,7 @@ static struct notifier_block reserve_mem
int __meminit init_reserve_notifier(void)
{
- if (register_memory_notifier(&reserve_mem_nb))
+ if (register_hotmemory_notifier(&reserve_mem_nb))
printk("Failed registering memory add/remove notifier for admin reserve");
return 0;
_
and voila, no more bloat.
On Mon, 8 Apr 2013 15:07:38 -0400 Andrew Shewmaker <[email protected]> wrote:
> This patch alters the admin and user reserves of the previous patches
> in this series when memory is added or removed.
>
> If memory is added and the reserves have been eliminated or increased above
> the default max, then we'll trust the admin.
>
> If memory is removed and there isn't enough free memory, then we
> need to reset the reserves.
>
> Otherwise keep the reserve set by the admin.
>
> The reserve reset code is the same as the reserve initialization code.
>
> Does this sound reasonable to other people? I figured that hot removal
> with too large of memory in the reserves was the most important case
> to get right.
>
> I tested hot addition and removal by triggering it via sysfs. The reserves
> shrunk when they were set high and memory was removed. They were reset
> higher when memory was added again.
I have added your Signed-off-by: to my copy of this patch.
On Mon, Apr 08, 2013 at 01:37:12PM -0700, Andrew Morton wrote:
> On Mon, 8 Apr 2013 15:07:38 -0400 Andrew Shewmaker <[email protected]> wrote:
>
> > This patch alters the admin and user reserves of the previous patches
> > in this series when memory is added or removed.
> >
> > If memory is added and the reserves have been eliminated or increased above
> > the default max, then we'll trust the admin.
> >
> > If memory is removed and there isn't enough free memory, then we
> > need to reset the reserves.
> >
> > Otherwise keep the reserve set by the admin.
> >
> > The reserve reset code is the same as the reserve initialization code.
> >
> > Does this sound reasonable to other people? I figured that hot removal
> > with too large of memory in the reserves was the most important case
> > to get right.
>
> Seems reasonable to me.
>
> I don't understand the magic numbers 1<<13 and 1<<17. How could I?
> Please add comments explaining how and why these were chosen.
I'm preparing a new version with this and the other changes you
gave me. Thank you!
Should I add the memory notifier code to mm/nommu.c too?
I'm guessing that if a system doesn't have an mmu that it also
won't be hotplugging memory.
> Your patch adds 400 bytes of unusable code to the
> CONFIG_MEMORY_HOTPLUG=n kernel. We have a fix for that in the CPU
> hotplug case: register_hotcpu_notifier(). Memory hotplug has its own
> hotplug_memory_notifier() but a) it's broken and b) it just doesn't
> work! With my gcc-4.4.4, the unused functions are still included in
> the .o file.
>
> So I did this:
>
> From: Andrew Morton <[email protected]>
> Subject: include/linux/memory.h: implement register_hotmemory_notifier()
>
> When CONFIG_MEMORY_HOTPLUG=n, we don't want the memory-hotplug notifier
> handlers to be included in the .o files, for space reasons.
>
> The existing hotplug_memory_notifier() tries to handle this but testing
> with gcc-4.4.4 shows that it doesn't work - the hotplug functions are
> still present in the .o files.
>
> So implement a new register_hotmemory_notifier() which is a copy of
> register_hotcpu_notifier(), and which actually works as desired.
> hotplug_memory_notifier() and register_memory_notifier() callsites should
> be converted to use this new register_hotmemory_notifier().
>
> While we're there, let's repair the existing hotplug_memory_notifier(): it
> simply stomps on the register_memory_notifier() return value, so
> well-behaved code cannot check for errors. Apparently non of the existing
> callers were well-behaved :(
>
> Cc: Andrew Shewmaker <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> ---
>
> include/linux/memory.h | 15 ++++++++++++---
> include/linux/notifier.h | 5 ++++-
> 2 files changed, 16 insertions(+), 4 deletions(-)
>
> diff -puN include/linux/memory.h~include-linux-memoryh-implement-register_hotmemory_notifier include/linux/memory.h
> --- a/include/linux/memory.h~include-linux-memoryh-implement-register_hotmemory_notifier
> +++ a/include/linux/memory.h
> @@ -18,6 +18,7 @@
> #include <linux/node.h>
> #include <linux/compiler.h>
> #include <linux/mutex.h>
> +#include <linux/notifier.h>
>
> #define MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS)
>
> @@ -127,13 +128,21 @@ enum mem_add_context { BOOT, HOTPLUG };
> #endif /* CONFIG_MEMORY_HOTPLUG_SPARSE */
>
> #ifdef CONFIG_MEMORY_HOTPLUG
> -#define hotplug_memory_notifier(fn, pri) { \
> +#define hotplug_memory_notifier(fn, pri) ({ \
> static __meminitdata struct notifier_block fn##_mem_nb =\
> { .notifier_call = fn, .priority = pri }; \
> register_memory_notifier(&fn##_mem_nb); \
> -}
> +})
> +#define register_hotmemory_notifier(nb) register_memory_notifier(nb)
> +#define unregister_hotmemory_notifier(nb) unregister_memory_notifier(nb)
> #else
> -#define hotplug_memory_notifier(fn, pri) do { } while (0)
> +static inline int hotplug_memory_notifier(notifier_fn_t fn, int priority)
> +{
> + return 0;
> +}
> +/* These aren't inline functions due to a GCC bug. */
> +#define register_hotmemory_notifier(nb) ({ (void)(nb); 0; })
> +#define unregister_hotmemory_notifier(nb) ({ (void)(nb); })
> #endif
>
> /*
> diff -puN include/linux/notifier.h~include-linux-memoryh-implement-register_hotmemory_notifier include/linux/notifier.h
> --- a/include/linux/notifier.h~include-linux-memoryh-implement-register_hotmemory_notifier
> +++ a/include/linux/notifier.h
> @@ -47,8 +47,11 @@
> * runtime initialization.
> */
>
> +typedef int (*notifier_fn_t)(struct notifier_block *nb,
> + unsigned long action, void *data);
> +
> struct notifier_block {
> - int (*notifier_call)(struct notifier_block *, unsigned long, void *);
> + notifier_fn_t notifier_call;
> struct notifier_block __rcu *next;
> int priority;
> };
> _
>
>
> And then I changed your patch thusly:
>
> --- a/mm/mmap.c~mm-reinititalise-user-and-admin-reserves-if-memory-is-added-or-removed-fix
> +++ a/mm/mmap.c
> @@ -3198,7 +3198,7 @@ static struct notifier_block reserve_mem
>
> int __meminit init_reserve_notifier(void)
> {
> - if (register_memory_notifier(&reserve_mem_nb))
> + if (register_hotmemory_notifier(&reserve_mem_nb))
> printk("Failed registering memory add/remove notifier for admin reserve");
>
> return 0;
> _
>
> and voila, no more bloat.
On Mon, 8 Apr 2013 17:00:40 -0400 Andrew Shewmaker <[email protected]> wrote:
> Should I add the memory notifier code to mm/nommu.c too?
> I'm guessing that if a system doesn't have an mmu that it also
> won't be hotplugging memory.
I doubt if we need to worry about memory hotplug on nommu machines,
so just do the minimum which is required to get nommu to compile
and link. That's probably "nothing".
On Tue, Apr 9, 2013 at 4:19 PM, Andrew Morton <[email protected]> wrote:
> On Mon, 8 Apr 2013 17:00:40 -0400 Andrew Shewmaker <[email protected]> wrote:
>
>> Should I add the memory notifier code to mm/nommu.c too?
>> I'm guessing that if a system doesn't have an mmu that it also
>> won't be hotplugging memory.
>
> I doubt if we need to worry about memory hotplug on nommu machines,
> so just do the minimum which is required to get nommu to compile
> and link. That's probably "nothing".
I haven't gotten myself set up to compile a nommu architecture, so I'll post
my next version, and work on verifying it compiles and links later. But I
I probably won't be able to get to that for a week and a half ... I'm leaving
on my honeymoon in the next couple days :)
Hi Andrew,
On 04/10/2013 07:56 AM, Andrew Shewmaker wrote:
> On Tue, Apr 9, 2013 at 4:19 PM, Andrew Morton <[email protected]> wrote:
>> On Mon, 8 Apr 2013 17:00:40 -0400 Andrew Shewmaker <[email protected]> wrote:
>>
>>> Should I add the memory notifier code to mm/nommu.c too?
>>> I'm guessing that if a system doesn't have an mmu that it also
>>> won't be hotplugging memory.
>> I doubt if we need to worry about memory hotplug on nommu machines,
>> so just do the minimum which is required to get nommu to compile
>> and link. That's probably "nothing".
> I haven't gotten myself set up to compile a nommu architecture, so I'll post
> my next version, and work on verifying it compiles and links later. But I
> I probably won't be able to get to that for a week and a half ... I'm leaving
> on my honeymoon in the next couple days :)
How to compile a nommu architecture? just config in menu config or a
physical machine?
On Tue, Apr 9, 2013 at 6:05 PM, Simon Jeons <[email protected]> wrote:
> Hi Andrew,
>
> On 04/10/2013 07:56 AM, Andrew Shewmaker wrote:
>>
>> On Tue, Apr 9, 2013 at 4:19 PM, Andrew Morton <[email protected]>
>> wrote:
>>>
>>> On Mon, 8 Apr 2013 17:00:40 -0400 Andrew Shewmaker <[email protected]>
>>> wrote:
>>>
>>>> Should I add the memory notifier code to mm/nommu.c too?
>>>> I'm guessing that if a system doesn't have an mmu that it also
>>>> won't be hotplugging memory.
>>>
>>> I doubt if we need to worry about memory hotplug on nommu machines,
>>> so just do the minimum which is required to get nommu to compile
>>> and link. That's probably "nothing".
>>
>> I haven't gotten myself set up to compile a nommu architecture, so I'll
>> post
>> my next version, and work on verifying it compiles and links later. But I
>> I probably won't be able to get to that for a week and a half ... I'm
>> leaving
>> on my honeymoon in the next couple days :)
>
>
> How to compile a nommu architecture? just config in menu config or a
> physical machine?
I was going to set up a qemu arm guest. Please, anyone, let me know if
there's an easier way to test nommu builds on x86.
On 04/10/2013 08:11 AM, Andrew Shewmaker wrote:
> On Tue, Apr 9, 2013 at 6:05 PM, Simon Jeons <[email protected]> wrote:
>> Hi Andrew,
>>
>> On 04/10/2013 07:56 AM, Andrew Shewmaker wrote:
>>> On Tue, Apr 9, 2013 at 4:19 PM, Andrew Morton <[email protected]>
>>> wrote:
>>>> On Mon, 8 Apr 2013 17:00:40 -0400 Andrew Shewmaker <[email protected]>
>>>> wrote:
>>>>
>>>>> Should I add the memory notifier code to mm/nommu.c too?
>>>>> I'm guessing that if a system doesn't have an mmu that it also
>>>>> won't be hotplugging memory.
>>>> I doubt if we need to worry about memory hotplug on nommu machines,
>>>> so just do the minimum which is required to get nommu to compile
>>>> and link. That's probably "nothing".
>>> I haven't gotten myself set up to compile a nommu architecture, so I'll
>>> post
>>> my next version, and work on verifying it compiles and links later. But I
>>> I probably won't be able to get to that for a week and a half ... I'm
>>> leaving
>>> on my honeymoon in the next couple days :)
>>
>> How to compile a nommu architecture? just config in menu config or a
>> physical machine?
> I was going to set up a qemu arm guest. Please, anyone, let me know if
> there's an easier way to test nommu builds on x86.
AFAK, arm7 is nommu.
On Tue, Apr 9, 2013 at 4:19 PM, Andrew Morton <[email protected]> wrote:
> On Mon, 8 Apr 2013 17:00:40 -0400 Andrew Shewmaker <[email protected]> wrote:
>
>> Should I add the memory notifier code to mm/nommu.c too?
>> I'm guessing that if a system doesn't have an mmu that it also
>> won't be hotplugging memory.
>
> I doubt if we need to worry about memory hotplug on nommu machines,
> so just do the minimum which is required to get nommu to compile
> and link. That's probably "nothing".
I don't know why I didn't think to look for a cross-compiler package first.
Anyway, nommu compiles and links without error when I disable suspend and
switch from slub to slab. Those errors didn't appear to have anything to do
with mm/nommu.c
--
Andrew Shewmaker
On Mon, Apr 08, 2013 at 01:37:12PM -0700, Andrew Morton wrote:
> On Mon, 8 Apr 2013 15:07:38 -0400 Andrew Shewmaker <[email protected]> wrote:
>
> > This patch alters the admin and user reserves of the previous patches
> > in this series when memory is added or removed.
> >
> > If memory is added and the reserves have been eliminated or increased above
> > the default max, then we'll trust the admin.
> >
> > If memory is removed and there isn't enough free memory, then we
> > need to reset the reserves.
> >
> > Otherwise keep the reserve set by the admin.
> >
> > The reserve reset code is the same as the reserve initialization code.
> >
> > Does this sound reasonable to other people? I figured that hot removal
> > with too large of memory in the reserves was the most important case
> > to get right.
>
> Seems reasonable to me.
>
> I don't understand the magic numbers 1<<13 and 1<<17. How could I?
> Please add comments explaining how and why these were chosen.
The v9 patch I posted has this too, but here is a patch against
yesterday's mmotm.
diff --git a/mm/mmap.c b/mm/mmap.c
index 099a16d..cee7e74 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3119,6 +3119,13 @@ module_init(init_admin_reserve)
/*
* Reinititalise user and admin reserves if memory is added or removed.
*
+ * The default user reserve max is 128MB, and the default max for the
+ * admin reserve is 8MB. These are usually, but not always, enough to
+ * enable recovery from a memory hogging process using login/sshd, a shell,
+ * and tools like top. It may make sense to increase or even disable the
+ * reserve depending on the existence of swap or variations in the recovery
+ * tools. So, the admin may have changed them.
+ *
* If memory is added and the reserves have been eliminated or increased above
* the default max, then we'll trust the admin.
*
@@ -3134,10 +3141,16 @@ static int reserve_mem_notifier(struct notifier_block *nb,
switch (action) {
case MEM_ONLINE:
+ /*
+ * Default max is 128MB. Leave alone if modified by operator.
+ */
tmp = sysctl_user_reserve_kbytes;
if (0 < tmp && tmp < (1UL << 17))
init_user_reserve();
+ /*
+ * Default max is 8MB. Leave alone if modified by operator.
+ */
tmp = sysctl_admin_reserve_kbytes;
if (0 < tmp && tmp < (1UL << 13))
init_admin_reserve();