2014-07-21 03:57:09

by Wang Nan

Subject: [PATCH v2 0/7] memory-hotplug: suitable memory should go to ZONE_MOVABLE

This patch series fixes a problem that occurs when memory is added in an
unfortunate order. For example, on an x86_64 machine booted with
"mem=400M" and with 2GiB of memory installed, the following commands
cause the problem:

# echo 0x40000000 > /sys/devices/system/memory/probe
[ 28.613895] init_memory_mapping: [mem 0x40000000-0x47ffffff]
# echo 0x48000000 > /sys/devices/system/memory/probe
[ 28.693675] init_memory_mapping: [mem 0x48000000-0x4fffffff]
# echo online_movable > /sys/devices/system/memory/memory9/state
# echo 0x50000000 > /sys/devices/system/memory/probe
[ 29.084090] init_memory_mapping: [mem 0x50000000-0x57ffffff]
# echo 0x58000000 > /sys/devices/system/memory/probe
[ 29.151880] init_memory_mapping: [mem 0x58000000-0x5fffffff]
# echo online_movable > /sys/devices/system/memory/memory11/state
# echo online> /sys/devices/system/memory/memory8/state
# echo online> /sys/devices/system/memory/memory10/state
# echo offline> /sys/devices/system/memory/memory9/state
[ 30.558819] Offlined Pages 32768
# free
             total       used       free     shared    buffers     cached
Mem:        780588 18014398509432020     830552          0          0      51180
-/+ buffers/cache: 18014398509380840     881732
Swap:            0          0          0

This happens because the above commands probe higher memory after
onlining a section with online_movable, which causes ZONE_HIGHMEM (or
ZONE_NORMAL on systems without ZONE_HIGHMEM) to overlap ZONE_MOVABLE.

After the second online_movable, the problem can be observed in
zoneinfo:

# cat /proc/zoneinfo
...
Node 0, zone Movable
  pages free     65491
        min      250
        low      312
        high     375
        scanned  0
        spanned  18446744073709518848
        present  65536
        managed  65536
...

This series solves the problem by checking ZONE_MOVABLE when choosing
the zone for new memory. If the new memory lies inside or above
ZONE_MOVABLE, it is added there instead.

After applying this series, free and zoneinfo report the following
(after offlining memory9):

bash-4.2# free
             total       used       free     shared    buffers     cached
Mem:        780956      80112     700844          0          0      51180
-/+ buffers/cache:      28932     752024
Swap:            0          0          0

bash-4.2# cat /proc/zoneinfo

Node 0, zone DMA
  pages free     3389
        min      14
        low      17
        high     21
        scanned  0
        spanned  4095
        present  3998
        managed  3977
    nr_free_pages 3389
...
  start_pfn:         1
  inactive_ratio:    1
Node 0, zone DMA32
  pages free     73724
        min      341
        low      426
        high     511
        scanned  0
        spanned  98304
        present  98304
        managed  92958
    nr_free_pages 73724
...
  start_pfn:         4096
  inactive_ratio:    1
Node 0, zone Normal
  pages free     32630
        min      120
        low      150
        high     180
        scanned  0
        spanned  32768
        present  32768
        managed  32768
    nr_free_pages 32630
...
  start_pfn:         262144
  inactive_ratio:    1
Node 0, zone Movable
  pages free     65476
        min      241
        low      301
        high     361
        scanned  0
        spanned  98304
        present  65536
        managed  65536
    nr_free_pages 65476
...
  start_pfn:         294912
  inactive_ratio:    1

v1 -> v2:
- Introduce zone_for_memory() in arch-independent code to make the
arch-dependent code simpler, following Dave Hansen's comments.
- Paste free and zoneinfo results in patch 0 in response to
Zhang Yanfei.
- Fix a problem on tile so that memory is added to ZONE_HIGHMEM by
default.

Wang Nan (7):
memory-hotplug: add zone_for_memory() for selecting zone for new
memory
memory-hotplug: x86_64: suitable memory should go to ZONE_MOVABLE
memory-hotplug: x86_32: suitable memory should go to ZONE_MOVABLE
memory-hotplug: ia64: suitable memory should go to ZONE_MOVABLE
memory-hotplug: ppc: suitable memory should go to ZONE_MOVABLE
memory-hotplug: sh: suitable memory should go to ZONE_MOVABLE
memory-hotplug: tile: suitable memory should go to ZONE_MOVABLE

arch/ia64/mm/init.c | 3 ++-
arch/powerpc/mm/mem.c | 3 ++-
arch/sh/mm/init.c | 5 +++--
arch/tile/mm/init.c | 3 ++-
arch/x86/mm/init_32.c | 3 ++-
arch/x86/mm/init_64.c | 3 ++-
include/linux/memory_hotplug.h | 1 +
mm/memory_hotplug.c | 28 ++++++++++++++++++++++++++++
8 files changed, 42 insertions(+), 7 deletions(-)

--
1.8.4


2014-07-21 03:57:13

by Wang Nan

Subject: [PATCH v2 7/7] memory-hotplug: tile: suitable memory should go to ZONE_MOVABLE

This patch uses zone_for_memory() in arch_add_memory() on tile to
ensure that new, higher memory is added to ZONE_MOVABLE if the movable
zone has already been set up.

This patch also fixes a problem: on tile, new memory should be added to
ZONE_HIGHMEM by default, not MAX_NR_ZONES-1, which is ZONE_MOVABLE.

Signed-off-by: Wang Nan <[email protected]>
Cc: Zhang Yanfei <[email protected]>
Cc: Dave Hansen <[email protected]>
---
arch/tile/mm/init.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
index bfb3127..22ac6c1 100644
--- a/arch/tile/mm/init.c
+++ b/arch/tile/mm/init.c
@@ -872,7 +872,8 @@ void __init mem_init(void)
int arch_add_memory(u64 start, u64 size)
{
struct pglist_data *pgdata = &contig_page_data;
- struct zone *zone = pgdata->node_zones + MAX_NR_ZONES-1;
+ struct zone *zone = pgdata->node_zones +
+ zone_for_memory(nid, start, size, ZONE_HIGHMEM);
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;

--
1.8.4

2014-07-21 03:57:11

by Wang Nan

Subject: [PATCH v2 5/7] memory-hotplug: ppc: suitable memory should go to ZONE_MOVABLE

This patch uses zone_for_memory() in arch_add_memory() on powerpc to
ensure that new, higher memory is added to ZONE_MOVABLE if the movable
zone has already been set up.

Signed-off-by: Wang Nan <[email protected]>
Cc: Zhang Yanfei <[email protected]>
Cc: Dave Hansen <[email protected]>
---
arch/powerpc/mm/mem.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 2c8e90f..e0f7a18 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -128,7 +128,8 @@ int arch_add_memory(int nid, u64 start, u64 size)
return -EINVAL;

/* this should work for most non-highmem platforms */
- zone = pgdata->node_zones;
+ zone = pgdata->node_zones +
+ zone_for_memory(nid, start, size, 0);

return __add_pages(nid, zone, start_pfn, nr_pages);
}
--
1.8.4

2014-07-21 03:57:08

by Wang Nan

Subject: [PATCH v2 2/7] memory-hotplug: x86_64: suitable memory should go to ZONE_MOVABLE

This patch uses zone_for_memory() in arch_add_memory() on x86_64 to
ensure that new, higher memory is added to ZONE_MOVABLE if the movable
zone has already been set up.

Signed-off-by: Wang Nan <[email protected]>
Cc: Zhang Yanfei <[email protected]>
Cc: Dave Hansen <[email protected]>
---
arch/x86/mm/init_64.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index df1a992..5621c47 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -691,7 +691,8 @@ static void update_end_of_memory_vars(u64 start, u64 size)
int arch_add_memory(int nid, u64 start, u64 size)
{
struct pglist_data *pgdat = NODE_DATA(nid);
- struct zone *zone = pgdat->node_zones + ZONE_NORMAL;
+ struct zone *zone = pgdat->node_zones +
+ zone_for_memory(nid, start, size, ZONE_NORMAL);
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
int ret;
--
1.8.4

2014-07-21 03:57:07

by Wang Nan

Subject: [PATCH v2 6/7] memory-hotplug: sh: suitable memory should go to ZONE_MOVABLE

This patch uses zone_for_memory() in arch_add_memory() on sh to ensure
that new, higher memory is added to ZONE_MOVABLE if the movable zone
has already been set up.

Signed-off-by: Wang Nan <[email protected]>
Cc: Zhang Yanfei <[email protected]>
Cc: Dave Hansen <[email protected]>
---
arch/sh/mm/init.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 2d089fe..2790b6a 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -495,8 +495,9 @@ int arch_add_memory(int nid, u64 start, u64 size)
pgdat = NODE_DATA(nid);

/* We only have ZONE_NORMAL, so this is easy.. */
- ret = __add_pages(nid, pgdat->node_zones + ZONE_NORMAL,
- start_pfn, nr_pages);
+ ret = __add_pages(nid, pgdat->node_zones +
+ zone_for_memory(nid, start, size, ZONE_NORMAL),
+ start_pfn, nr_pages);
if (unlikely(ret))
printk("%s: Failed, __add_pages() == %d\n", __func__, ret);

--
1.8.4

2014-07-21 03:57:06

by Wang Nan

Subject: [PATCH v2 3/7] memory-hotplug: x86_32: suitable memory should go to ZONE_MOVABLE

This patch uses zone_for_memory() in arch_add_memory() on x86_32 to
ensure that new, higher memory is added to ZONE_MOVABLE if the movable
zone has already been set up.

Signed-off-by: Wang Nan <[email protected]>
Cc: Zhang Yanfei <[email protected]>
Cc: Dave Hansen <[email protected]>
---
arch/x86/mm/init_32.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index e395048..7d05565 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -825,7 +825,8 @@ void __init mem_init(void)
int arch_add_memory(int nid, u64 start, u64 size)
{
struct pglist_data *pgdata = NODE_DATA(nid);
- struct zone *zone = pgdata->node_zones + ZONE_HIGHMEM;
+ struct zone *zone = pgdata->node_zones +
+ zone_for_memory(nid, start, size, ZONE_HIGHMEM);
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;

--
1.8.4

2014-07-21 03:57:05

by Wang Nan

Subject: [PATCH v2 4/7] memory-hotplug: ia64: suitable memory should go to ZONE_MOVABLE

This patch uses zone_for_memory() in arch_add_memory() on ia64 to
ensure that new, higher memory is added to ZONE_MOVABLE if the movable
zone has already been set up.

Signed-off-by: Wang Nan <[email protected]>
Cc: Zhang Yanfei <[email protected]>
Cc: Dave Hansen <[email protected]>
---
arch/ia64/mm/init.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 25c3502..892d43e 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -631,7 +631,8 @@ int arch_add_memory(int nid, u64 start, u64 size)

pgdat = NODE_DATA(nid);

- zone = pgdat->node_zones + ZONE_NORMAL;
+ zone = pgdat->node_zones +
+ zone_for_memory(nid, start, size, ZONE_NORMAL);
ret = __add_pages(nid, zone, start_pfn, nr_pages);

if (ret)
--
1.8.4

2014-07-21 03:57:00

by Wang Nan

Subject: [PATCH v2 1/7] memory-hotplug: add zone_for_memory() for selecting zone for new memory

This patch introduces a zone_for_memory function in arch independent
code for arch_add_memory() using.

Many arch_add_memory() function simply selects ZONE_HIGHMEM or
ZONE_NORMAL and add new memory into it. However, with the existence of
ZONE_MOVABLE, the selection method should be carefully considered: if
new, higher memory is added after ZONE_MOVABLE is set up, the default
zone and ZONE_MOVABLE may overlap each other.

should_add_memory_movable() checks the status of ZONE_MOVABLE. If it
already contains memory, the address of the new memory is compared with
the start of the movable zone. If the new memory starts at or above that
point, it should be added to ZONE_MOVABLE instead of the default zone.

Signed-off-by: Wang Nan <[email protected]>
Cc: Zhang Yanfei <[email protected]>
Cc: Dave Hansen <[email protected]>
---
include/linux/memory_hotplug.h | 1 +
mm/memory_hotplug.c | 28 ++++++++++++++++++++++++++++
2 files changed, 29 insertions(+)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 010d125..3de3d02 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -258,6 +258,7 @@ static inline void remove_memory(int nid, u64 start, u64 size) {}
extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
void *arg, int (*func)(struct memory_block *, void *));
extern int add_memory(int nid, u64 start, u64 size);
+extern int zone_for_memory(int nid, u64 start, u64 size, int zone_default);
extern int arch_add_memory(int nid, u64 start, u64 size);
extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
extern bool is_memblock_offlined(struct memory_block *mem);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 469bbf5..348fda7 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1156,6 +1156,34 @@ static int check_hotplug_memory_range(u64 start, u64 size)
return 0;
}

+/*
+ * If movable zone has already been setup, newly added memory should be check.
+ * If its address is higher than movable zone, it should be added as movable.
+ * Without this check, movable zone may overlap with other zone.
+ */
+static int should_add_memory_movable(int nid, u64 start, u64 size)
+{
+ unsigned long start_pfn = start >> PAGE_SHIFT;
+ pg_data_t *pgdat = NODE_DATA(nid);
+ struct zone *movable_zone = pgdat->node_zones + ZONE_MOVABLE;
+
+ if (zone_is_empty(movable_zone))
+ return 0;
+
+ if (movable_zone->zone_start_pfn <= start_pfn)
+ return 1;
+
+ return 0;
+}
+
+int zone_for_memory(int nid, u64 start, u64 size, int zone_default)
+{
+ if (should_add_memory_movable(nid, start, size))
+ return ZONE_MOVABLE;
+
+ return zone_default;
+}
+
/* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
int __ref add_memory(int nid, u64 start, u64 size)
{
--
1.8.4

2014-07-21 17:20:00

by Sergei Shtylyov

Subject: Re: [PATCH v2 1/7] memory-hotplug: add zone_for_memory() for selecting zone for new memory

Hello.

On 07/21/2014 07:46 AM, Wang Nan wrote:

Some grammar nitpicking.

> This patch introduces a zone_for_memory function in arch independent
> code for arch_add_memory() using.

s/ using/'s use/.

> Many arch_add_memory() function simply selects ZONE_HIGHMEM or

Plural needed with "many".

> ZONE_NORMAL and add new memory into it. However, with the existance of
> ZONE_MOVABLE, the selection method should be carefully considered: if
> new, higher memory is added after ZONE_MOVABLE is setup, the default
> zone and ZONE_MOVABLE may overlap each other.

> should_add_memory_movable() checks the status of ZONE_MOVABLE. If it has
> already contain memory, compare the address of new memory and movable
> memory. If new memory is higher than movable, it should be added into
> ZONE_MOVABLE instead of default zone.

> Signed-off-by: Wang Nan <[email protected]>
> Cc: Zhang Yanfei <[email protected]>
> Cc: Dave Hansen <[email protected]>
[...]

> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 469bbf5..348fda7 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1156,6 +1156,34 @@ static int check_hotplug_memory_range(u64 start, u64 size)
> return 0;
> }
>
> +/*
> + * If movable zone has already been setup, newly added memory should be check.

Checked.

WBR, Sergei

2014-07-22 03:10:10

by Wang Nan

Subject: Re: [PATCH v2 7/7] memory-hotplug: tile: suitable memory should go to ZONE_MOVABLE

Hi Andrew,

Please drop patch 7/7 from the -mm tree and keep the other 6 patches.

arch_add_memory() on tile differs from the others: it has no nid
parameter, so patch 7/7 breaks the build.

I have cc'd Chris Metcalf on this mail and hope he can look at this
issue.

The other 6 patches look good.

On 2014/7/21 11:46, Wang Nan wrote:
> This patch introduces zone_for_memory() to arch_add_memory() on tile to
> ensure new, higher memory added into ZONE_MOVABLE if movable zone has
> already setup.
>
> This patch also fix a problem: on tile, new memory should be added into
> ZONE_HIGHMEM by default, not MAX_NR_ZONES-1, which is ZONE_MOVABLE.
>
> Signed-off-by: Wang Nan <[email protected]>
> Cc: Zhang Yanfei <[email protected]>
> Cc: Dave Hansen <[email protected]>
> ---
> arch/tile/mm/init.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
> index bfb3127..22ac6c1 100644
> --- a/arch/tile/mm/init.c
> +++ b/arch/tile/mm/init.c
> @@ -872,7 +872,8 @@ void __init mem_init(void)
> int arch_add_memory(u64 start, u64 size)
> {
> struct pglist_data *pgdata = &contig_page_data;
> - struct zone *zone = pgdata->node_zones + MAX_NR_ZONES-1;
> + struct zone *zone = pgdata->node_zones +
> + zone_for_memory(nid, start, size, ZONE_HIGHMEM);
> unsigned long start_pfn = start >> PAGE_SHIFT;
> unsigned long nr_pages = size >> PAGE_SHIFT;
>
>

2014-07-31 20:44:02

by Chris Metcalf

Subject: Re: [PATCH v2 7/7] memory-hotplug: tile: suitable memory should go to ZONE_MOVABLE

On 7/21/2014 11:09 PM, Wang Nan wrote:
> Hi Andrew,
>
> Please drop patch 7/7 from -mm tree and keep other 6 patches.
>
> arch_add_memory() in tile is different from others: no nid parameter.
> Patch 7/7 will block compiling.
>
> I cc this mail to Chris Metcalf and hope he can look at this issue.
>
> Other 6 patches looks good.
>
> On 2014/7/21 11:46, Wang Nan wrote:
>> This patch introduces zone_for_memory() to arch_add_memory() on tile to
>> ensure new, higher memory added into ZONE_MOVABLE if movable zone has
>> already setup.
>>
>> This patch also fix a problem: on tile, new memory should be added into
>> ZONE_HIGHMEM by default, not MAX_NR_ZONES-1, which is ZONE_MOVABLE.
>>
>> Signed-off-by: Wang Nan <[email protected]>
>> Cc: Zhang Yanfei <[email protected]>
>> Cc: Dave Hansen <[email protected]>
>> ---
>> arch/tile/mm/init.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/tile/mm/init.c b/arch/tile/mm/init.c
>> index bfb3127..22ac6c1 100644
>> --- a/arch/tile/mm/init.c
>> +++ b/arch/tile/mm/init.c
>> @@ -872,7 +872,8 @@ void __init mem_init(void)
>> int arch_add_memory(u64 start, u64 size)
>> {
>> struct pglist_data *pgdata = &contig_page_data;
>> - struct zone *zone = pgdata->node_zones + MAX_NR_ZONES-1;
>> + struct zone *zone = pgdata->node_zones +
>> + zone_for_memory(nid, start, size, ZONE_HIGHMEM);
>> unsigned long start_pfn = start >> PAGE_SHIFT;
>> unsigned long nr_pages = size >> PAGE_SHIFT;
>>

This code is entirely stale; it came from the initial port of Linux
2.6.15 to Tilera. Since we have always used DISCONTIGMEM unconditionally,
which forces NEED_MULTIPLE_NODES to be true, this code never compiles.
Note the completely irrelevant comment about x86 in this ifdef block, too :-)

The cleanest thing to do is just remove those three functions in the
ifdef block. I'll do that to our internal tree and plan to push the
change upstream later.

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com