2012-10-24 09:42:20

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH 0/2 V2] memory_hotplug: fix memory hotplug bug

We found 2 bugs while we test and develop memory hotplug.

The hotplug code does not handle node_states[N_NORMAL_MEMORY] correctly,
it may corrupt the memory.

And we ensure the SLUB do NOT respond when node_states[N_NORMAL_MEMORY]
is not changed.

The patchset is based on mainline(3d0ceac129f3ea0b125289055a3aa7519d38df77)


CC: David Rientjes <[email protected]>
Cc: Minchan Kim <[email protected]>
CC: KOSAKI Motohiro <[email protected]>
CC: Yasuaki Ishimatsu <[email protected]>
CC: Rob Landley <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Jiang Liu <[email protected]>
CC: Kay Sievers <[email protected]>
CC: Greg Kroah-Hartman <[email protected]>
CC: Mel Gorman <[email protected]>
CC: 'FNST-Wen Congyang' <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]

Lai Jiangshan (2):
memory_hotplug: fix possible incorrect node_states[N_NORMAL_MEMORY]
slub, hotplug: ignore unrelated node's hot-adding and hot-removing

Documentation/memory-hotplug.txt | 5 +-
include/linux/memory.h | 1 +
mm/memory_hotplug.c | 136 +++++++++++++++++++++++++++++++++-----
mm/slub.c | 4 +-
4 files changed, 127 insertions(+), 19 deletions(-)

--
1.7.4.4


2012-10-24 09:42:18

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH 1/2 V2] memory_hotplug: fix possible incorrect node_states[N_NORMAL_MEMORY]

Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
it forgets to manage node_states[N_NORMAL_MEMORY]. it may cause
node_states[N_NORMAL_MEMORY] becomes incorrect.

Example, if a node is empty before online, and we online a memory
which is in ZONE_NORMAL. And after online, node_states[N_HIGH_MEMORY]
is correct, but node_states[N_NORMAL_MEMORY] is incorrect,
the online code don't set the new online node to
node_states[N_NORMAL_MEMORY].

The same things like it will happen when offline(the offline code
don't clear the node from node_states[N_NORMAL_MEMORY] when needed).
Some memory managment code depends node_states[N_NORMAL_MEMORY],
so we have to fix up the node_states[N_NORMAL_MEMORY].

We add node_states_check_changes_online() and node_states_check_changes_offline()
to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
are changed while hotpluging.

Also add @status_change_nid_normal to struct memory_notify, thus
the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
are changed. (We can add a @flags and reuse @status_change_nid instead of
introducing @status_change_nid_normal, but it will add much more complicated
in memory hotplug callback in every subsystem. So introdcing
@status_change_nid_normal is better and it don't change the sematic
of @status_change_nid)

Changed from V1:
add more comments
change the function name

CC: David Rientjes <[email protected]>
Cc: Minchan Kim <[email protected]>
CC: KOSAKI Motohiro <[email protected]>
CC: Yasuaki Ishimatsu <[email protected]>
CC: Rob Landley <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Jiang Liu <[email protected]>
CC: Kay Sievers <[email protected]>
CC: Greg Kroah-Hartman <[email protected]>
CC: Mel Gorman <[email protected]>
CC: 'FNST-Wen Congyang' <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
Signed-off-by: Lai Jiangshan <[email protected]>
---
Documentation/memory-hotplug.txt | 5 +-
include/linux/memory.h | 1 +
mm/memory_hotplug.c | 136 +++++++++++++++++++++++++++++++++-----
3 files changed, 125 insertions(+), 17 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 6d0c251..6e6cbc7 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -377,15 +377,18 @@ The third argument is passed by pointer of struct memory_notify.
struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
+ int status_change_nid_normal;
int status_change_nid;
}

start_pfn is start_pfn of online/offline memory.
nr_pages is # of pages of online/offline memory.
+status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
+is (will be) set/clear, if this is -1, then nodemask status is not changed.
status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
set/clear. It means a new(memoryless) node gets new memory by online and a
node loses all memory. If this is -1, then nodemask status is not changed.
-If status_changed_nid >= 0, callback should create/discard structures for the
+If status_changed_nid* >= 0, callback should create/discard structures for the
node if necessary.

--------------
diff --git a/include/linux/memory.h b/include/linux/memory.h
index ff9a9f8..a09216d 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -53,6 +53,7 @@ int arch_get_memory_phys_device(unsigned long start_pfn);
struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
+ int status_change_nid_normal;
int status_change_nid;
};

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ec899a2..a1920fb 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -467,6 +467,53 @@ static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages,
return 0;
}

+/* check which state of node_states will be changed when online memory */
+static void node_states_check_changes_online(unsigned long nr_pages,
+ struct zone *zone, struct memory_notify *arg)
+{
+ int nid = zone_to_nid(zone);
+ enum zone_type zone_last = ZONE_NORMAL;
+
+ /*
+ * If we have HIGHMEM, node_states[N_NORMAL_MEMORY] contains nodes
+ * which have 0...ZONE_NORMAL, set zone_last to ZONE_NORMAL.
+ *
+ * If we don't have HIGHMEM, node_states[N_NORMAL_MEMORY] contains nodes
+ * which have 0...ZONE_MOVABLE, set zone_last to ZONE_MOVABLE.
+ */
+ if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+ zone_last = ZONE_MOVABLE;
+
+ /*
+ * if the memory to be online is in a zone of 0...zone_last, and
+ * the zones of 0...zone_last don't have memory before online, we will
+ * need to set the node to node_states[N_NORMAL_MEMORY] after
+ * the memory is online.
+ */
+ if (zone_idx(zone) <= zone_last && !node_state(nid, N_NORMAL_MEMORY))
+ arg->status_change_nid_normal = nid;
+ else
+ arg->status_change_nid_normal = -1;
+
+ /*
+ * if the node don't have memory befor online, we will need to
+ * set the node to node_states[N_HIGH_MEMORY] after the memory
+ * is online.
+ */
+ if (!node_state(nid, N_HIGH_MEMORY))
+ arg->status_change_nid = nid;
+ else
+ arg->status_change_nid = -1;
+}
+
+static void node_states_set_node(int node, struct memory_notify *arg)
+{
+ if (arg->status_change_nid_normal >= 0)
+ node_set_state(node, N_NORMAL_MEMORY);
+
+ node_set_state(node, N_HIGH_MEMORY);
+}
+

int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
{
@@ -478,13 +525,18 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
struct memory_notify arg;

lock_memory_hotplug();
+ /*
+ * This doesn't need a lock to do pfn_to_page().
+ * The section can't be removed here because of the
+ * memory_block->state_mutex.
+ */
+ zone = page_zone(pfn_to_page(pfn));
+
arg.start_pfn = pfn;
arg.nr_pages = nr_pages;
- arg.status_change_nid = -1;
+ node_states_check_changes_online(nr_pages, zone, &arg);

nid = page_to_nid(pfn_to_page(pfn));
- if (node_present_pages(nid) == 0)
- arg.status_change_nid = nid;

ret = memory_notify(MEM_GOING_ONLINE, &arg);
ret = notifier_to_errno(ret);
@@ -494,12 +546,6 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
return ret;
}
/*
- * This doesn't need a lock to do pfn_to_page().
- * The section can't be removed here because of the
- * memory_block->state_mutex.
- */
- zone = page_zone(pfn_to_page(pfn));
- /*
* If this zone is not populated, then it is not in zonelist.
* This means the page allocator ignores this zone.
* So, zonelist must be updated after online.
@@ -524,7 +570,7 @@ int __ref online_pages(unsigned long pfn, unsigned long nr_pages)
zone->present_pages += onlined_pages;
zone->zone_pgdat->node_present_pages += onlined_pages;
if (onlined_pages) {
- node_set_state(zone_to_nid(zone), N_HIGH_MEMORY);
+ node_states_set_node(zone_to_nid(zone), &arg);
if (need_zonelists_rebuild)
build_all_zonelists(NULL, zone);
else
@@ -874,6 +920,67 @@ check_pages_isolated(unsigned long start_pfn, unsigned long end_pfn)
return offlined;
}

+/* check which state of node_states will be changed when offline memory */
+static void node_states_check_changes_offline(unsigned long nr_pages,
+ struct zone *zone, struct memory_notify *arg)
+{
+ struct pglist_data *pgdat = zone->zone_pgdat;
+ unsigned long present_pages = 0;
+ enum zone_type zt, zone_last = ZONE_NORMAL;
+
+ /*
+ * If we have HIGHMEM, node_states[N_NORMAL_MEMORY] contains nodes
+ * which have 0...ZONE_NORMAL, set zone_last to ZONE_NORMAL.
+ *
+ * If we don't have HIGHMEM, node_states[N_NORMAL_MEMORY] contains nodes
+ * which have 0...ZONE_MOVABLE, set zone_last to ZONE_MOVABLE.
+ */
+ if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+ zone_last = ZONE_MOVABLE;
+
+ /*
+ * check whether node_states[N_NORMAL_MEMORY] will be changed.
+ * If the memory to be offline is in a zone of 0...zone_last,
+ * and it is the last present memory, 0...zone_last will
+ * become empty after offline , thus we can determind we will
+ * need to clear the node from node_states[N_NORMAL_MEMORY].
+ */
+ for (zt = 0; zt <= zone_last; zt++)
+ present_pages += pgdat->node_zones[zt].present_pages;
+ if (zone_idx(zone) <= zone_last && nr_pages >= present_pages)
+ arg->status_change_nid_normal = zone_to_nid(zone);
+ else
+ arg->status_change_nid_normal = -1;
+
+ /*
+ * node_states[N_HIGH_MEMORY] contains nodes which have 0...ZONE_MOVABLE
+ */
+ zone_last = ZONE_MOVABLE;
+
+ /*
+ * check whether node_states[N_HIGH_MEMORY] will be changed
+ * If we try to offline the last present @nr_pages from the node,
+ * we can determind we will need to clear the node from
+ * node_states[N_HIGH_MEMORY].
+ */
+ for (; zt <= zone_last; zt++)
+ present_pages += pgdat->node_zones[zt].present_pages;
+ if (nr_pages >= present_pages)
+ arg->status_change_nid = zone_to_nid(zone);
+ else
+ arg->status_change_nid = -1;
+}
+
+static void node_states_clear_node(int node, struct memory_notify *arg)
+{
+ if (arg->status_change_nid_normal >= 0)
+ node_clear_state(node, N_NORMAL_MEMORY);
+
+ if ((N_HIGH_MEMORY != N_NORMAL_MEMORY) &&
+ (arg->status_change_nid >= 0))
+ node_clear_state(node, N_HIGH_MEMORY);
+}
+
static int __ref __offline_pages(unsigned long start_pfn,
unsigned long end_pfn, unsigned long timeout)
{
@@ -907,9 +1014,7 @@ static int __ref __offline_pages(unsigned long start_pfn,

arg.start_pfn = start_pfn;
arg.nr_pages = nr_pages;
- arg.status_change_nid = -1;
- if (nr_pages >= node_present_pages(node))
- arg.status_change_nid = node;
+ node_states_check_changes_offline(nr_pages, zone, &arg);

ret = memory_notify(MEM_GOING_OFFLINE, &arg);
ret = notifier_to_errno(ret);
@@ -982,10 +1087,9 @@ repeat:
} else
zone_pcp_update(zone);

- if (!node_present_pages(node)) {
- node_clear_state(node, N_HIGH_MEMORY);
+ node_states_clear_node(node, &arg);
+ if (arg.status_change_nid >= 0)
kswapd_stop(node);
- }

vm_total_pages = nr_free_pagecache_pages();
writeback_set_ratelimit();
--
1.7.4.4

2012-10-24 09:42:16

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH 2/2 V2] slub, hotplug: ignore unrelated node's hot-adding and hot-removing

SLUB only fucus on the nodes which has normal memory, so ignore the other
node's hot-adding and hot-removing.

Aka: if some memroy of a node(which has no onlined memory) is online,
but this new memory onlined is not normal memory(HIGH memory example),
we should not allocate kmem_cache_node for SLUB.

And if the last normal memory is offlined, but the node still has memroy,
we should remove kmem_cache_node for that node.(current code delay it when
all of the memory is offlined)

so we only do something when marg->status_change_nid_normal > 0.
marg->status_change_nid is not suitable here.

The same problem doesn't exsit in SLAB, because SLAB allocates kmem_list3
for every node even the node don't have normal memory, SLAB tolerates
kmem_list3 on alien nodes. SLUB only fucus on the nodes which has normal
memory, it don't tolerates alien kmem_cache_node, the patch makes
SLUB become self-compatible and avoid WARN and BUG in a rare condition.

CC: David Rientjes <[email protected]>
Cc: Minchan Kim <[email protected]>
CC: KOSAKI Motohiro <[email protected]>
CC: Yasuaki Ishimatsu <[email protected]>
CC: Rob Landley <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Jiang Liu <[email protected]>
CC: Kay Sievers <[email protected]>
CC: Greg Kroah-Hartman <[email protected]>
CC: Mel Gorman <[email protected]>
CC: 'FNST-Wen Congyang' <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
Signed-off-by: Lai Jiangshan <[email protected]>
---
mm/slub.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index a0d6984..487f0bd 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3573,7 +3573,7 @@ static void slab_mem_offline_callback(void *arg)
struct memory_notify *marg = arg;
int offline_node;

- offline_node = marg->status_change_nid;
+ offline_node = marg->status_change_nid_normal;

/*
* If the node still has available memory. we need kmem_cache_node
@@ -3606,7 +3606,7 @@ static int slab_mem_going_online_callback(void *arg)
struct kmem_cache_node *n;
struct kmem_cache *s;
struct memory_notify *marg = arg;
- int nid = marg->status_change_nid;
+ int nid = marg->status_change_nid_normal;
int ret = 0;

/*
--
1.7.4.4

Subject: Re: [PATCH 2/2 V2] slub, hotplug: ignore unrelated node's hot-adding and hot-removing

On Wed, 24 Oct 2012, Lai Jiangshan wrote:

> SLUB only fucus on the nodes which has normal memory, so ignore the other
> node's hot-adding and hot-removing.

As far as I can see the reasoning sounds fine and the patch looks clean.

Acked-by: Christoph Lameter <[email protected]>

2012-10-25 04:18:08

by KOSAKI Motohiro

[permalink] [raw]
Subject: Re: [PATCH 1/2 V2] memory_hotplug: fix possible incorrect node_states[N_NORMAL_MEMORY]

On Wed, Oct 24, 2012 at 5:43 AM, Lai Jiangshan <[email protected]> wrote:
> Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
> it forgets to manage node_states[N_NORMAL_MEMORY]. it may cause
> node_states[N_NORMAL_MEMORY] becomes incorrect.
>
> Example, if a node is empty before online, and we online a memory
> which is in ZONE_NORMAL. And after online, node_states[N_HIGH_MEMORY]
> is correct, but node_states[N_NORMAL_MEMORY] is incorrect,
> the online code don't set the new online node to
> node_states[N_NORMAL_MEMORY].
>
> The same things like it will happen when offline(the offline code
> don't clear the node from node_states[N_NORMAL_MEMORY] when needed).
> Some memory managment code depends node_states[N_NORMAL_MEMORY],
> so we have to fix up the node_states[N_NORMAL_MEMORY].
>
> We add node_states_check_changes_online() and node_states_check_changes_offline()
> to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
> are changed while hotpluging.
>
> Also add @status_change_nid_normal to struct memory_notify, thus
> the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
> are changed. (We can add a @flags and reuse @status_change_nid instead of
> introducing @status_change_nid_normal, but it will add much more complicated
> in memory hotplug callback in every subsystem. So introdcing
> @status_change_nid_normal is better and it don't change the sematic
> of @status_change_nid)
>
> Changed from V1:
> add more comments
> change the function name

Your patch didn't fix my previous comments and don't works correctly.
Please test your own patch before resubmitting. You should consider both
zone normal only node and zone high only node.

2012-10-26 01:49:06

by Lai Jiangshan

[permalink] [raw]
Subject: Re: [PATCH 1/2 V2] memory_hotplug: fix possible incorrect node_states[N_NORMAL_MEMORY]

On 10/25/2012 12:17 PM, KOSAKI Motohiro wrote:
> On Wed, Oct 24, 2012 at 5:43 AM, Lai Jiangshan <[email protected]> wrote:
>> Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY],
>> it forgets to manage node_states[N_NORMAL_MEMORY]. it may cause
>> node_states[N_NORMAL_MEMORY] becomes incorrect.
>>
>> Example, if a node is empty before online, and we online a memory
>> which is in ZONE_NORMAL. And after online, node_states[N_HIGH_MEMORY]
>> is correct, but node_states[N_NORMAL_MEMORY] is incorrect,
>> the online code don't set the new online node to
>> node_states[N_NORMAL_MEMORY].
>>
>> The same things like it will happen when offline(the offline code
>> don't clear the node from node_states[N_NORMAL_MEMORY] when needed).
>> Some memory managment code depends node_states[N_NORMAL_MEMORY],
>> so we have to fix up the node_states[N_NORMAL_MEMORY].
>>
>> We add node_states_check_changes_online() and node_states_check_changes_offline()
>> to detect whether node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY]
>> are changed while hotpluging.
>>
>> Also add @status_change_nid_normal to struct memory_notify, thus
>> the memory hotplug callbacks know whether the node_states[N_NORMAL_MEMORY]
>> are changed. (We can add a @flags and reuse @status_change_nid instead of
>> introducing @status_change_nid_normal, but it will add much more complicated
>> in memory hotplug callback in every subsystem. So introdcing
>> @status_change_nid_normal is better and it don't change the sematic
>> of @status_change_nid)
>>
>> Changed from V1:
>> add more comments
>> change the function name
>
> Your patch didn't fix my previous comments and don't works correctly.
> Please test your own patch before resubmitting. You should consider both
> zone normal only node and zone high only node.
>

The comments in the code already answered/explained your previous comments.

Thanks,
Lai

2012-10-31 07:10:04

by Pekka Enberg

[permalink] [raw]
Subject: Re: [PATCH 2/2 V2] slub, hotplug: ignore unrelated node's hot-adding and hot-removing

On Wed, Oct 24, 2012 at 12:43 PM, Lai Jiangshan <[email protected]> wrote:
> SLUB only fucus on the nodes which has normal memory, so ignore the other
> node's hot-adding and hot-removing.
>
> Aka: if some memroy of a node(which has no onlined memory) is online,
> but this new memory onlined is not normal memory(HIGH memory example),
> we should not allocate kmem_cache_node for SLUB.
>
> And if the last normal memory is offlined, but the node still has memroy,
> we should remove kmem_cache_node for that node.(current code delay it when
> all of the memory is offlined)
>
> so we only do something when marg->status_change_nid_normal > 0.
> marg->status_change_nid is not suitable here.
>
> The same problem doesn't exsit in SLAB, because SLAB allocates kmem_list3
> for every node even the node don't have normal memory, SLAB tolerates
> kmem_list3 on alien nodes. SLUB only fucus on the nodes which has normal
> memory, it don't tolerates alien kmem_cache_node, the patch makes
> SLUB become self-compatible and avoid WARN and BUG in a rare condition.
>
> CC: David Rientjes <[email protected]>
> Cc: Minchan Kim <[email protected]>
> CC: KOSAKI Motohiro <[email protected]>
> CC: Yasuaki Ishimatsu <[email protected]>
> CC: Rob Landley <[email protected]>
> CC: Andrew Morton <[email protected]>
> CC: Jiang Liu <[email protected]>
> CC: Kay Sievers <[email protected]>
> CC: Greg Kroah-Hartman <[email protected]>
> CC: Mel Gorman <[email protected]>
> CC: 'FNST-Wen Congyang' <[email protected]>
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> Signed-off-by: Lai Jiangshan <[email protected]>

The patch looks OK but changelog doesn't say what problem this fixes,
how you found about it, and do we need this in stable.