2013-05-16 11:47:39

by Tang Chen

Subject: [PATCH 1/1] numa, mm, memory-hotplug: Do not allocate pagetable to local node with MEMORY_HOTREMOVE enabled.

The following patch-set allocates pagetables on the local node:
https://lkml.org/lkml/2013/4/11/829

Doing this will break memory hot-remove.

Before removing memory, the kernel offlines it. If offlining fails,
the memory cannot be removed. Pagetable pages are in use by the
kernel, so they cannot be offlined, and therefore the memory holding
them cannot be removed.

Of course, we could free the pagetable pages, because the pagetables
of memory that is about to be removed are no longer needed. But
offlining memory does not mean removing it. If users only want to
offline memory, the pagetables must not be freed.

The minimum unit of memory online/offline is a block. By default,
one block contains one section, which is 128MB by default. It is
possible that half of a block holds pagetables and the other half
is movable memory.

When we offline this kind of block, its status is uncertain. We
cannot simply free the pagetables in the block because they may
still be in use for other online blocks. But during memory
hot-remove, failing to offline a block breaks the hot-remove logic.
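
The mixed-block case can be illustrated with a small user-space sketch.
It is not kernel code: the sketch_* names and the page classification
are made up for illustration, and a 4KB page size is assumed. It only
shows that a 128MB block holds 32768 such pages and that a single
pagetable page is enough to make the whole block fail to offline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed sizes: 4KB pages and the default 128MB block from the text. */
#define SKETCH_PAGE_SIZE   (4UL * 1024)
#define SKETCH_BLOCK_SIZE  (128UL * 1024 * 1024)

/* Hypothetical page classification; not a kernel type. */
enum sketch_page_kind { SKETCH_FREE, SKETCH_MOVABLE, SKETCH_PAGETABLE };

/* Number of pages in one block: 128MB / 4KB. */
static size_t sketch_pages_per_block(void)
{
	assert(SKETCH_BLOCK_SIZE % SKETCH_PAGE_SIZE == 0);
	return SKETCH_BLOCK_SIZE / SKETCH_PAGE_SIZE;
}

/*
 * Free and movable pages can be dealt with when offlining; a pagetable
 * page is still used by the kernel, so one such page blocks the offline
 * of the whole block.
 */
static bool sketch_block_can_offline(const enum sketch_page_kind *pages,
				     size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (pages[i] == SKETCH_PAGETABLE)
			return false;
	return true;
}
```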


In order to fix it, we have three solutions:

1. Reserve the whole block (128MB) so that nobody can use the rest
of the block, and skip it when offlining memory.
When all the other blocks are offlined, free the pagetables and remove
all the memory.

But we may lose some memory this way. 128MB is a lot to waste.


2. Keep this block online. Although the offline operation fails, it is
OK to remove the memory.

But then the offline operation will always fail. And generally
speaking, offlining can fail for many reasons, so it is difficult to
detect whether it is OK to remove the memory. So we don't suggest this way.


3. Migrate the user pages and mark this block offline. Offlining memory
won't stop the kernel from using the pagetables stored in it, so this
will be OK.

But this changes the semantics of "offline". I'm not sure if we
can do it this way.


So until this problem is fixed, I think we should not allocate pagetables
on the local node when CONFIG_MEMORY_HOTREMOVE is enabled, and restore
that behavior once we have agreed on a direction and fixed the problem.
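
As a rough user-space model of the range selection this implies (not
the kernel code itself): the sketch_* names are hypothetical, and the
runtime flag hotplug_enabled stands in for the compile-time config
check. The local range is used only when hotplug is disabled and the
range is non-empty; otherwise the low range is used, and an empty low
range is reported as failure where the kernel would panic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical PFN range [min_pfn, max_pfn); empty if min_pfn >= max_pfn. */
struct sketch_pfn_range {
	unsigned long min_pfn;
	unsigned long max_pfn;
};

/*
 * Pick the range to allocate pagetable pages from.
 * Returns false when no usable range exists.
 */
static bool sketch_pick_pgt_range(bool hotplug_enabled,
				  struct sketch_pfn_range local,
				  struct sketch_pfn_range low,
				  struct sketch_pfn_range *out)
{
	assert(out != NULL);

	/* Local-node allocation is only allowed without hotplug. */
	if (!hotplug_enabled && local.max_pfn > local.min_pfn) {
		*out = local;
		return true;
	}

	/* Fall back to the low range; empty means out of memory. */
	if (low.min_pfn >= low.max_pfn)
		return false;

	*out = low;
	return true;
}
```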

This patch is based on
git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-mm

Any other solution for this problem is welcome.


Signed-off-by: Tang Chen <[email protected]>
---
arch/x86/mm/init.c | 27 ++++++++++++++++-----------
1 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 8d0007a..8cd8a2d 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -55,18 +55,23 @@ __ref void *alloc_low_pages(unsigned int num)

 	if ((pgt_buf_end + num) > pgt_buf_top || !can_use_brk_pgt) {
 		unsigned long ret;
-		if (local_min_pfn_mapped >= local_max_pfn_mapped) {
+#ifndef CONFIG_MEMORY_HOTPLUG
+		if (local_max_pfn_mapped > local_min_pfn_mapped) {
+			ret = memblock_find_in_range(
+					local_min_pfn_mapped << PAGE_SHIFT,
+					local_max_pfn_mapped << PAGE_SHIFT,
+					PAGE_SIZE * num , PAGE_SIZE);
+		} else
+#endif
+		{
 			if (low_min_pfn_mapped >= low_max_pfn_mapped)
 				panic("alloc_low_page: ran out of memory");
 			ret = memblock_find_in_range(
 					low_min_pfn_mapped << PAGE_SHIFT,
 					low_max_pfn_mapped << PAGE_SHIFT,
 					PAGE_SIZE * num , PAGE_SIZE);
-		} else
-			ret = memblock_find_in_range(
-					local_min_pfn_mapped << PAGE_SHIFT,
-					local_max_pfn_mapped << PAGE_SHIFT,
-					PAGE_SIZE * num , PAGE_SIZE);
+		}
+
 		if (!ret)
 			panic("alloc_low_page: can not alloc memory");
 		memblock_reserve(ret, PAGE_SIZE * num);
@@ -443,6 +448,11 @@ void __init init_mem_mapping(unsigned long begin, unsigned long end)
 		if (new_mapped_ram_size > mapped_ram_size)
 			step_size <<= STEP_SIZE_SHIFT;
 		mapped_ram_size += new_mapped_ram_size;
+
+		if (is_low) {
+			low_min_pfn_mapped = local_min_pfn_mapped;
+			low_max_pfn_mapped = local_max_pfn_mapped;
+		}
 	}

 	if (real_end < end) {
@@ -450,11 +460,6 @@ void __init init_mem_mapping(unsigned long begin, unsigned long end)
 		if ((end >> PAGE_SHIFT) > local_max_pfn_mapped)
 			local_max_pfn_mapped = end >> PAGE_SHIFT;
 	}
-
-	if (is_low) {
-		low_min_pfn_mapped = local_min_pfn_mapped;
-		low_max_pfn_mapped = local_max_pfn_mapped;
-	}
 }

#ifndef CONFIG_NUMA
--
1.7.1


2013-05-21 07:02:48

by Pekka Enberg

Subject: Re: [PATCH 1/1] numa, mm, memory-hotplug: Do not allocate pagetable to local node with MEMORY_HOTREMOVE enabled.

On Thu, May 16, 2013 at 2:50 PM, Tang Chen <[email protected]> wrote:
> The following patch-set allocates pagetables on the local node:
> https://lkml.org/lkml/2013/4/11/829
>
> Doing this will break memory hot-remove.
>
> Before removing memory, the kernel offlines it. If offlining fails,
> the memory cannot be removed. Pagetable pages are in use by the
> kernel, so they cannot be offlined, and therefore the memory holding
> them cannot be removed.
>
> Of course, we could free the pagetable pages, because the pagetables
> of memory that is about to be removed are no longer needed. But
> offlining memory does not mean removing it. If users only want to
> offline memory, the pagetables must not be freed.
>
> The minimum unit of memory online/offline is a block. By default,
> one block contains one section, which is 128MB by default. It is
> possible that half of a block holds pagetables and the other half
> is movable memory.
>
> When we offline this kind of block, its status is uncertain. We
> cannot simply free the pagetables in the block because they may
> still be in use for other online blocks. But during memory
> hot-remove, failing to offline a block breaks the hot-remove logic.
>
>
> In order to fix it, we have three solutions:
>
> 1. Reserve the whole block (128MB) so that nobody can use the rest
> of the block, and skip it when offlining memory.
> When all the other blocks are offlined, free the pagetables and remove
> all the memory.
>
> But we may lose some memory this way. 128MB is a lot to waste.
>
>
> 2. Keep this block online. Although the offline operation fails, it is
> OK to remove the memory.
>
> But then the offline operation will always fail. And generally
> speaking, offlining can fail for many reasons, so it is difficult to
> detect whether it is OK to remove the memory. So we don't suggest this way.
>
>
> 3. Migrate the user pages and mark this block offline. Offlining memory
> won't stop the kernel from using the pagetables stored in it, so this
> will be OK.
>
> But this changes the semantics of "offline". I'm not sure if we
> can do it this way.
>
>
> So until this problem is fixed, I think we should not allocate pagetables
> on the local node when CONFIG_MEMORY_HOTREMOVE is enabled, and restore
> that behavior once we have agreed on a direction and fixed the problem.
>
> This patch is based on
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-mm
>
> Any other solution for this problem is welcome.
>
>
> Signed-off-by: Tang Chen <[email protected]>

Ugh. Special-casing for CONFIG_MEMORY_HOTPLUG is just begging for
trouble. Were you able to determine which commit broke memory
hot-remove?

2013-05-21 07:10:09

by Tang Chen

Subject: Re: [PATCH 1/1] numa, mm, memory-hotplug: Do not allocate pagetable to local node with MEMORY_HOTREMOVE enabled.

Hi

On 05/21/2013 03:02 PM, Pekka Enberg wrote:
......
>
> Ugh. Special-casing for CONFIG_MEMORY_HOTPLUG is just begging for
> trouble. Were you able to determine which commit broke memory
> hot-remove?

Please refer to the following patch-set.
https://lkml.org/lkml/2013/4/11/829

Patches 21 and 22 allocate pagetables on the local node, which may
cause memory hot-remove to fail.

But this patch-set is not in mainline yet.

Thanks. :)