2012-11-15 08:51:34

by Wen Congyang

[permalink] [raw]
Subject: [PART3 Patch v2 00/14] introduce N_MEMORY

This patch is part3 of the following patchset:
https://lkml.org/lkml/2012/10/29/319

Part1 is here:
https://lkml.org/lkml/2012/10/31/30

Part2 is here:
https://lkml.org/lkml/2012/10/31/73

Part4 is here:
https://lkml.org/lkml/2012/10/31/129

Part5 is here:
https://lkml.org/lkml/2012/10/31/145

Part6 is here:
https://lkml.org/lkml/2012/10/31/248

You can apply this patchset without the other parts.

Note: part1 and part2 are in mm tree now. part5 are being reimplemented(We will
post it some days later).

We need a node which only contains movable memory. This feature is very
important for node hotplug. So we will add a new nodemask
for all memory. N_MEMORY contains movable memory but N_HIGH_MEMORY
doesn't contain it.

The meaning of N_MEMORY and N_HIGH_MEMORY nodemask:
1. N_HIGH_MEMORY: the node contains the memory that kernel can use. movable
node aren't in this nodemask.
2. N_MEMORY: the node contains memory.

Why we intrdouce a new nodemask, not rename N_HIGH_MEMORY to N_MEMORY?

See the following two codes:
1.
==========================
static void *__meminit alloc_page_cgroup(size_t size, int nid)
{
gfp_t flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN;
void *addr = NULL;

addr = alloc_pages_exact_nid(nid, size, flags);
if (addr) {
kmemleak_alloc(addr, size, 1, flags);
return addr;
}

if (node_state(nid, N_HIGH_MEMORY))
addr = vzalloc_node(size, nid);
else
addr = vzalloc(size);

return addr;
}
==========================
If the node only has ZONE_MOVABLE memory, we should use vzalloc().
So we should have a mask that stores the node which has memory that
the kernel can use.

2.
==========================
static int mpol_set_nodemask(struct mempolicy *pol,
const nodemask_t *nodes, struct nodemask_scratch *nsc)
{
int ret;

/* if mode is MPOL_DEFAULT, pol is NULL. This is right. */
if (pol == NULL)
return 0;
/* Check N_HIGH_MEMORY */
nodes_and(nsc->mask1,
cpuset_current_mems_allowed, node_states[N_HIGH_MEMORY]);
...
if (pol->flags & MPOL_F_RELATIVE_NODES)
mpol_relative_nodemask(&nsc->mask2, nodes,&nsc->mask1);
else
nodes_and(nsc->mask2, *nodes, nsc->mask1);
...
}
==========================
If the user specifies 2 nodes: one has ZONE_MOVABLE memory, and the other one
doesn't. The cpuset for this task contains all nodes. nsc->mask2 should contain
these 2 nodes. So we should hava a mask that store the node which has memory,
and use this mask to calculate nsc->mask1.


The movable node will implemtent in part4. So N_MEMORY is equal to N_HIGH_MEMORY
now.

Changes from v1 to v2:
1. add your Signed-off-by, because I am on the the patch delivery path. Andrew
Morton tells me this.
2. patch13: The newest kernel adds some codes which use N_HIGH_MEMORY. It shoule
be N_MEMORY now.

Lai Jiangshan (14):
node_states: introduce N_MEMORY
cpuset: use N_MEMORY instead N_HIGH_MEMORY
procfs: use N_MEMORY instead N_HIGH_MEMORY
memcontrol: use N_MEMORY instead N_HIGH_MEMORY
oom: use N_MEMORY instead N_HIGH_MEMORY
mm,migrate: use N_MEMORY instead N_HIGH_MEMORY
mempolicy: use N_MEMORY instead N_HIGH_MEMORY
hugetlb: use N_MEMORY instead N_HIGH_MEMORY
vmstat: use N_MEMORY instead N_HIGH_MEMORY
kthread: use N_MEMORY instead N_HIGH_MEMORY
init: use N_MEMORY instead N_HIGH_MEMORY
vmscan: use N_MEMORY instead N_HIGH_MEMORY
page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states
initialization
hotplug: update nodemasks management

Documentation/cgroups/cpusets.txt | 2 +-
Documentation/memory-hotplug.txt | 5 ++-
arch/x86/mm/init_64.c | 4 +-
drivers/base/node.c | 2 +-
fs/proc/kcore.c | 2 +-
fs/proc/task_mmu.c | 4 +-
include/linux/cpuset.h | 2 +-
include/linux/memory.h | 1 +
include/linux/nodemask.h | 1 +
init/main.c | 2 +-
kernel/cpuset.c | 32 +++++++-------
kernel/kthread.c | 2 +-
mm/hugetlb.c | 24 +++++------
mm/memcontrol.c | 18 ++++----
mm/memory_hotplug.c | 87 ++++++++++++++++++++++++++++++++-------
mm/mempolicy.c | 12 +++---
mm/migrate.c | 2 +-
mm/oom_kill.c | 2 +-
mm/page_alloc.c | 42 ++++++++++---------
mm/page_cgroup.c | 2 +-
mm/vmscan.c | 4 +-
mm/vmstat.c | 4 +-
22 files changed, 162 insertions(+), 94 deletions(-)

--
1.8.0


2012-11-15 08:51:33

by Wen Congyang

[permalink] [raw]
Subject: [PART3 Patch v2 04/14] memcontrol: use N_MEMORY instead N_HIGH_MEMORY

From: Lai Jiangshan <[email protected]>

N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
mm/memcontrol.c | 18 +++++++++---------
mm/page_cgroup.c | 2 +-
2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7acf43b..1b69665 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -800,7 +800,7 @@ static unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg,
int nid;
u64 total = 0;

- for_each_node_state(nid, N_HIGH_MEMORY)
+ for_each_node_state(nid, N_MEMORY)
total += mem_cgroup_node_nr_lru_pages(memcg, nid, lru_mask);
return total;
}
@@ -1611,9 +1611,9 @@ static void mem_cgroup_may_update_nodemask(struct mem_cgroup *memcg)
return;

/* make a nodemask where this memcg uses memory from */
- memcg->scan_nodes = node_states[N_HIGH_MEMORY];
+ memcg->scan_nodes = node_states[N_MEMORY];

- for_each_node_mask(nid, node_states[N_HIGH_MEMORY]) {
+ for_each_node_mask(nid, node_states[N_MEMORY]) {

if (!test_mem_cgroup_node_reclaimable(memcg, nid, false))
node_clear(nid, memcg->scan_nodes);
@@ -1684,7 +1684,7 @@ static bool mem_cgroup_reclaimable(struct mem_cgroup *memcg, bool noswap)
/*
* Check rest of nodes.
*/
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
if (node_isset(nid, memcg->scan_nodes))
continue;
if (test_mem_cgroup_node_reclaimable(memcg, nid, noswap))
@@ -3759,7 +3759,7 @@ move_account:
drain_all_stock_sync(memcg);
ret = 0;
mem_cgroup_start_move(memcg);
- for_each_node_state(node, N_HIGH_MEMORY) {
+ for_each_node_state(node, N_MEMORY) {
for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) {
enum lru_list lru;
for_each_lru(lru) {
@@ -4087,7 +4087,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,

total_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL);
seq_printf(m, "total=%lu", total_nr);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid, LRU_ALL);
seq_printf(m, " N%d=%lu", nid, node_nr);
}
@@ -4095,7 +4095,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,

file_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL_FILE);
seq_printf(m, "file=%lu", file_nr);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
LRU_ALL_FILE);
seq_printf(m, " N%d=%lu", nid, node_nr);
@@ -4104,7 +4104,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,

anon_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL_ANON);
seq_printf(m, "anon=%lu", anon_nr);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
LRU_ALL_ANON);
seq_printf(m, " N%d=%lu", nid, node_nr);
@@ -4113,7 +4113,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,

unevictable_nr = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_UNEVICTABLE));
seq_printf(m, "unevictable=%lu", unevictable_nr);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
BIT(LRU_UNEVICTABLE));
seq_printf(m, " N%d=%lu", nid, node_nr);
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 5ddad0c..c1054ad 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -271,7 +271,7 @@ void __init page_cgroup_init(void)
if (mem_cgroup_disabled())
return;

- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
unsigned long start_pfn, end_pfn;

start_pfn = node_start_pfn(nid);
--
1.8.0

2012-11-15 08:51:43

by Wen Congyang

[permalink] [raw]
Subject: [PART3 Patch v2 11/14] init: use N_MEMORY instead N_HIGH_MEMORY

From: Lai Jiangshan <[email protected]>

N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
init/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/init/main.c b/init/main.c
index e33e09d..63ae904 100644
--- a/init/main.c
+++ b/init/main.c
@@ -857,7 +857,7 @@ static void __init kernel_init_freeable(void)
/*
* init can allocate pages on any node
*/
- set_mems_allowed(node_states[N_HIGH_MEMORY]);
+ set_mems_allowed(node_states[N_MEMORY]);
/*
* init can run on any cpu.
*/
--
1.8.0

2012-11-15 08:51:42

by Wen Congyang

[permalink] [raw]
Subject: [PART3 Patch v2 07/14] mempolicy: use N_MEMORY instead N_HIGH_MEMORY

From: Lai Jiangshan <[email protected]>

N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
mm/mempolicy.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d04a8a5..d4a084c 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -212,9 +212,9 @@ static int mpol_set_nodemask(struct mempolicy *pol,
/* if mode is MPOL_DEFAULT, pol is NULL. This is right. */
if (pol == NULL)
return 0;
- /* Check N_HIGH_MEMORY */
+ /* Check N_MEMORY */
nodes_and(nsc->mask1,
- cpuset_current_mems_allowed, node_states[N_HIGH_MEMORY]);
+ cpuset_current_mems_allowed, node_states[N_MEMORY]);

VM_BUG_ON(!nodes);
if (pol->mode == MPOL_PREFERRED && nodes_empty(*nodes))
@@ -1388,7 +1388,7 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
goto out_put;
}

- if (!nodes_subset(*new, node_states[N_HIGH_MEMORY])) {
+ if (!nodes_subset(*new, node_states[N_MEMORY])) {
err = -EINVAL;
goto out_put;
}
@@ -2361,7 +2361,7 @@ void __init numa_policy_init(void)
* fall back to the largest node if they're all smaller.
*/
nodes_clear(interleave_nodes);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
unsigned long total_pages = node_present_pages(nid);

/* Preserve the largest node */
@@ -2442,7 +2442,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
*nodelist++ = '\0';
if (nodelist_parse(nodelist, nodes))
goto out;
- if (!nodes_subset(nodes, node_states[N_HIGH_MEMORY]))
+ if (!nodes_subset(nodes, node_states[N_MEMORY]))
goto out;
} else
nodes_clear(nodes);
@@ -2476,7 +2476,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
* Default to online nodes with memory if no nodelist
*/
if (!nodelist)
- nodes = node_states[N_HIGH_MEMORY];
+ nodes = node_states[N_MEMORY];
break;
case MPOL_LOCAL:
/*
--
1.8.0

2012-11-15 08:51:40

by Wen Congyang

[permalink] [raw]
Subject: [PART3 Patch v2 03/14] procfs: use N_MEMORY instead N_HIGH_MEMORY

From: Lai Jiangshan <[email protected]>

N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
fs/proc/kcore.c | 2 +-
fs/proc/task_mmu.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 86c67ee..e96d4f1 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -249,7 +249,7 @@ static int kcore_update_ram(void)
/* Not inialized....update now */
/* find out "max pfn" */
end_pfn = 0;
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
unsigned long node_end;
node_end = NODE_DATA(nid)->node_start_pfn +
NODE_DATA(nid)->node_spanned_pages;
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 90c63f9..2d89601 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1126,7 +1126,7 @@ static struct page *can_gather_numa_stats(pte_t pte, struct vm_area_struct *vma,
return NULL;

nid = page_to_nid(page);
- if (!node_isset(nid, node_states[N_HIGH_MEMORY]))
+ if (!node_isset(nid, node_states[N_MEMORY]))
return NULL;

return page;
@@ -1279,7 +1279,7 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
if (md->writeback)
seq_printf(m, " writeback=%lu", md->writeback);

- for_each_node_state(n, N_HIGH_MEMORY)
+ for_each_node_state(n, N_MEMORY)
if (md->node[n])
seq_printf(m, " N%d=%lu", n, md->node[n]);
out:
--
1.8.0

2012-11-15 08:51:39

by Wen Congyang

[permalink] [raw]
Subject: [PART3 Patch v2 01/14] node_states: introduce N_MEMORY

From: Lai Jiangshan <[email protected]>

We have N_NORMAL_MEMORY for standing for the nodes that have normal memory with
zone_type <= ZONE_NORMAL.

And we have N_HIGH_MEMORY for standing for the nodes that have normal or high
memory.

But we don't have any word to stand for the nodes that have *any* memory.

And we have N_CPU but without N_MEMORY.

Current code reuse the N_HIGH_MEMORY for this purpose because any node which
has memory must have high memory or normal memory currently.

A) But this reusing is bad for *readability*. Because the name
N_HIGH_MEMORY just stands for high or normal:

A.example 1)
mem_cgroup_nr_lru_pages():
for_each_node_state(nid, N_HIGH_MEMORY)

The user will be confused(why this function just counts for high or
normal memory node? does it counts for ZONE_MOVABLE's lru pages?)
until someone else tell them N_HIGH_MEMORY is reused to stand for
nodes that have any memory.

A.cont) If we introduce N_MEMORY, we can reduce this confusing
AND make the code more clearly:

A.example 2) mm/page_cgroup.c use N_HIGH_MEMORY twice:

One is in page_cgroup_init(void):
for_each_node_state(nid, N_HIGH_MEMORY) {

It means if the node have memory, we will allocate page_cgroup map for
the node. We should use N_MEMORY instead here to gaim more clearly.

The second using is in alloc_page_cgroup():
if (node_state(nid, N_HIGH_MEMORY))
addr = vzalloc_node(size, nid);

It means if the node has high or normal memory that can be allocated
from kernel. We should keep N_HIGH_MEMORY here, and it will be better
if the "any memory" semantic of N_HIGH_MEMORY is removed.

B) This reusing is out-dated if we introduce MOVABLE-dedicated node.
The MOVABLE-dedicated node should not appear in
node_stats[N_HIGH_MEMORY] nor node_stats[N_NORMAL_MEMORY],
because MOVABLE-dedicated node has no high or normal memory.

In x86_64, N_HIGH_MEMORY=N_NORMAL_MEMORY, if a MOVABLE-dedicated node
is in node_stats[N_HIGH_MEMORY], it is also means it is in
node_stats[N_NORMAL_MEMORY], it causes SLUB wrong.

The slub uses
for_each_node_state(nid, N_NORMAL_MEMORY)
and creates kmem_cache_node for MOVABLE-dedicated node and cause problem.

In one word, we need a N_MEMORY. We just intrude it as an alias to
N_HIGH_MEMORY and fix all im-proper usages of N_HIGH_MEMORY in late patches.

Signed-off-by: Lai Jiangshan <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
include/linux/nodemask.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 7afc363..c6ebdc9 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -380,6 +380,7 @@ enum node_states {
#else
N_HIGH_MEMORY = N_NORMAL_MEMORY,
#endif
+ N_MEMORY = N_HIGH_MEMORY,
N_CPU, /* The node has one or more cpus */
NR_NODE_STATES
};
--
1.8.0

2012-11-15 08:51:37

by Wen Congyang

[permalink] [raw]
Subject: [PART3 Patch v2 02/14] cpuset: use N_MEMORY instead N_HIGH_MEMORY

From: Lai Jiangshan <[email protected]>

N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
Documentation/cgroups/cpusets.txt | 2 +-
include/linux/cpuset.h | 2 +-
kernel/cpuset.c | 32 ++++++++++++++++----------------
3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt
index cefd3d8..12e01d4 100644
--- a/Documentation/cgroups/cpusets.txt
+++ b/Documentation/cgroups/cpusets.txt
@@ -218,7 +218,7 @@ and name space for cpusets, with a minimum of additional kernel code.
The cpus and mems files in the root (top_cpuset) cpuset are
read-only. The cpus file automatically tracks the value of
cpu_online_mask using a CPU hotplug notifier, and the mems file
-automatically tracks the value of node_states[N_HIGH_MEMORY]--i.e.,
+automatically tracks the value of node_states[N_MEMORY]--i.e.,
nodes with memory--using the cpuset_track_online_nodes() hook.


diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 838320f..8c8a60d 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -144,7 +144,7 @@ static inline nodemask_t cpuset_mems_allowed(struct task_struct *p)
return node_possible_map;
}

-#define cpuset_current_mems_allowed (node_states[N_HIGH_MEMORY])
+#define cpuset_current_mems_allowed (node_states[N_MEMORY])
static inline void cpuset_init_current_mems_allowed(void) {}

static inline int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index f33c715..2b133db 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -302,10 +302,10 @@ static void guarantee_online_cpus(const struct cpuset *cs,
* are online, with memory. If none are online with memory, walk
* up the cpuset hierarchy until we find one that does have some
* online mems. If we get all the way to the top and still haven't
- * found any online mems, return node_states[N_HIGH_MEMORY].
+ * found any online mems, return node_states[N_MEMORY].
*
* One way or another, we guarantee to return some non-empty subset
- * of node_states[N_HIGH_MEMORY].
+ * of node_states[N_MEMORY].
*
* Call with callback_mutex held.
*/
@@ -313,14 +313,14 @@ static void guarantee_online_cpus(const struct cpuset *cs,
static void guarantee_online_mems(const struct cpuset *cs, nodemask_t *pmask)
{
while (cs && !nodes_intersects(cs->mems_allowed,
- node_states[N_HIGH_MEMORY]))
+ node_states[N_MEMORY]))
cs = cs->parent;
if (cs)
nodes_and(*pmask, cs->mems_allowed,
- node_states[N_HIGH_MEMORY]);
+ node_states[N_MEMORY]);
else
- *pmask = node_states[N_HIGH_MEMORY];
- BUG_ON(!nodes_intersects(*pmask, node_states[N_HIGH_MEMORY]));
+ *pmask = node_states[N_MEMORY];
+ BUG_ON(!nodes_intersects(*pmask, node_states[N_MEMORY]));
}

/*
@@ -1100,7 +1100,7 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
return -ENOMEM;

/*
- * top_cpuset.mems_allowed tracks node_stats[N_HIGH_MEMORY];
+ * top_cpuset.mems_allowed tracks node_stats[N_MEMORY];
* it's read-only
*/
if (cs == &top_cpuset) {
@@ -1122,7 +1122,7 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
goto done;

if (!nodes_subset(trialcs->mems_allowed,
- node_states[N_HIGH_MEMORY])) {
+ node_states[N_MEMORY])) {
retval = -EINVAL;
goto done;
}
@@ -2034,7 +2034,7 @@ static struct cpuset *cpuset_next(struct list_head *queue)
* before dropping down to the next. It always processes a node before
* any of its children.
*
- * In the case of memory hot-unplug, it will remove nodes from N_HIGH_MEMORY
+ * In the case of memory hot-unplug, it will remove nodes from N_MEMORY
* if all present pages from a node are offlined.
*/
static void
@@ -2073,7 +2073,7 @@ scan_cpusets_upon_hotplug(struct cpuset *root, enum hotplug_event event)

/* Continue past cpusets with all mems online */
if (nodes_subset(cp->mems_allowed,
- node_states[N_HIGH_MEMORY]))
+ node_states[N_MEMORY]))
continue;

oldmems = cp->mems_allowed;
@@ -2081,7 +2081,7 @@ scan_cpusets_upon_hotplug(struct cpuset *root, enum hotplug_event event)
/* Remove offline mems from this cpuset. */
mutex_lock(&callback_mutex);
nodes_and(cp->mems_allowed, cp->mems_allowed,
- node_states[N_HIGH_MEMORY]);
+ node_states[N_MEMORY]);
mutex_unlock(&callback_mutex);

/* Move tasks from the empty cpuset to a parent */
@@ -2134,8 +2134,8 @@ void cpuset_update_active_cpus(bool cpu_online)

#ifdef CONFIG_MEMORY_HOTPLUG
/*
- * Keep top_cpuset.mems_allowed tracking node_states[N_HIGH_MEMORY].
- * Call this routine anytime after node_states[N_HIGH_MEMORY] changes.
+ * Keep top_cpuset.mems_allowed tracking node_states[N_MEMORY].
+ * Call this routine anytime after node_states[N_MEMORY] changes.
* See cpuset_update_active_cpus() for CPU hotplug handling.
*/
static int cpuset_track_online_nodes(struct notifier_block *self,
@@ -2148,7 +2148,7 @@ static int cpuset_track_online_nodes(struct notifier_block *self,
case MEM_ONLINE:
oldmems = top_cpuset.mems_allowed;
mutex_lock(&callback_mutex);
- top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
+ top_cpuset.mems_allowed = node_states[N_MEMORY];
mutex_unlock(&callback_mutex);
update_tasks_nodemask(&top_cpuset, &oldmems, NULL);
break;
@@ -2177,7 +2177,7 @@ static int cpuset_track_online_nodes(struct notifier_block *self,
void __init cpuset_init_smp(void)
{
cpumask_copy(top_cpuset.cpus_allowed, cpu_active_mask);
- top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
+ top_cpuset.mems_allowed = node_states[N_MEMORY];

hotplug_memory_notifier(cpuset_track_online_nodes, 10);

@@ -2245,7 +2245,7 @@ void cpuset_init_current_mems_allowed(void)
*
* Description: Returns the nodemask_t mems_allowed of the cpuset
* attached to the specified @tsk. Guaranteed to return some non-empty
- * subset of node_states[N_HIGH_MEMORY], even if this means going outside the
+ * subset of node_states[N_MEMORY], even if this means going outside the
* tasks cpuset.
**/

--
1.8.0

2012-11-15 08:52:59

by Wen Congyang

[permalink] [raw]
Subject: [PART3 Patch v2 14/14] hotplug: update nodemasks management

From: Lai Jiangshan <[email protected]>

update nodemasks management for N_MEMORY

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
Documentation/memory-hotplug.txt | 5 ++-
include/linux/memory.h | 1 +
mm/memory_hotplug.c | 87 +++++++++++++++++++++++++++++++++-------
3 files changed, 77 insertions(+), 16 deletions(-)

diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 6e6cbc7..70bc1c7 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -378,6 +378,7 @@ struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
int status_change_nid_normal;
+ int status_change_nid_high;
int status_change_nid;
}

@@ -385,7 +386,9 @@ start_pfn is start_pfn of online/offline memory.
nr_pages is # of pages of online/offline memory.
status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
is (will be) set/clear, if this is -1, then nodemask status is not changed.
-status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
+status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask
+is (will be) set/clear, if this is -1, then nodemask status is not changed.
+status_change_nid is set node id when N_MEMORY of nodemask is (will be)
set/clear. It means a new(memoryless) node gets new memory by online and a
node loses all memory. If this is -1, then nodemask status is not changed.
If status_changed_nid* >= 0, callback should create/discard structures for the
diff --git a/include/linux/memory.h b/include/linux/memory.h
index a09216d..45e93b4 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -54,6 +54,7 @@ struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
int status_change_nid_normal;
+ int status_change_nid_high;
int status_change_nid;
};

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index dfa6a91..760095d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -475,13 +475,15 @@ static void node_states_check_changes_online(unsigned long nr_pages,
enum zone_type zone_last = ZONE_NORMAL;

/*
- * If we have HIGHMEM, node_states[N_NORMAL_MEMORY] contains nodes
- * which have 0...ZONE_NORMAL, set zone_last to ZONE_NORMAL.
+ * If we have HIGHMEM or movable node, node_states[N_NORMAL_MEMORY]
+ * contains nodes which have zones of 0...ZONE_NORMAL,
+ * set zone_last to ZONE_NORMAL.
*
- * If we don't have HIGHMEM, node_states[N_NORMAL_MEMORY] contains nodes
- * which have 0...ZONE_MOVABLE, set zone_last to ZONE_MOVABLE.
+ * If we don't have HIGHMEM nor movable node,
+ * node_states[N_NORMAL_MEMORY] contains nodes which have zones of
+ * 0...ZONE_MOVABLE, set zone_last to ZONE_MOVABLE.
*/
- if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+ if (N_MEMORY == N_NORMAL_MEMORY)
zone_last = ZONE_MOVABLE;

/*
@@ -495,12 +497,34 @@ static void node_states_check_changes_online(unsigned long nr_pages,
else
arg->status_change_nid_normal = -1;

+#ifdef CONFIG_HIGHMEM
+ /*
+ * If we have movable node, node_states[N_HIGH_MEMORY]
+ * contains nodes which have zones of 0...ZONE_HIGH,
+ * set zone_last to ZONE_HIGH.
+ *
+ * If we don't have movable node, node_states[N_NORMAL_MEMORY]
+ * contains nodes which have zones of 0...ZONE_MOVABLE,
+ * set zone_last to ZONE_MOVABLE.
+ */
+ zone_last = ZONE_HIGH;
+ if (N_MEMORY == N_HIGH_MEMORY)
+ zone_last = ZONE_MOVABLE;
+
+ if (zone_idx(zone) <= zone_last && !node_state(nid, N_HIGH_MEMORY))
+ arg->status_change_nid_high = nid;
+ else
+ arg->status_change_nid_high = -1;
+#else
+ arg->status_change_nid_high = arg->status_change_nid_normal;
+#endif
+
/*
* if the node don't have memory befor online, we will need to
- * set the node to node_states[N_HIGH_MEMORY] after the memory
+ * set the node to node_states[N_MEMORY] after the memory
* is online.
*/
- if (!node_state(nid, N_HIGH_MEMORY))
+ if (!node_state(nid, N_MEMORY))
arg->status_change_nid = nid;
else
arg->status_change_nid = -1;
@@ -511,7 +535,10 @@ static void node_states_set_node(int node, struct memory_notify *arg)
if (arg->status_change_nid_normal >= 0)
node_set_state(node, N_NORMAL_MEMORY);

- node_set_state(node, N_HIGH_MEMORY);
+ if (arg->status_change_nid_high >= 0)
+ node_set_state(node, N_HIGH_MEMORY);
+
+ node_set_state(node, N_MEMORY);
}


@@ -929,13 +956,15 @@ static void node_states_check_changes_offline(unsigned long nr_pages,
enum zone_type zt, zone_last = ZONE_NORMAL;

/*
- * If we have HIGHMEM, node_states[N_NORMAL_MEMORY] contains nodes
- * which have 0...ZONE_NORMAL, set zone_last to ZONE_NORMAL.
+ * If we have HIGHMEM or movable node, node_states[N_NORMAL_MEMORY]
+ * contains nodes which have zones of 0...ZONE_NORMAL,
+ * set zone_last to ZONE_NORMAL.
*
- * If we don't have HIGHMEM, node_states[N_NORMAL_MEMORY] contains nodes
- * which have 0...ZONE_MOVABLE, set zone_last to ZONE_MOVABLE.
+ * If we don't have HIGHMEM nor movable node,
+ * node_states[N_NORMAL_MEMORY] contains nodes which have zones of
+ * 0...ZONE_MOVABLE, set zone_last to ZONE_MOVABLE.
*/
- if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+ if (N_MEMORY == N_NORMAL_MEMORY)
zone_last = ZONE_MOVABLE;

/*
@@ -952,6 +981,30 @@ static void node_states_check_changes_offline(unsigned long nr_pages,
else
arg->status_change_nid_normal = -1;

+#ifdef CONIG_HIGHMEM
+ /*
+ * If we have movable node, node_states[N_HIGH_MEMORY]
+ * contains nodes which have zones of 0...ZONE_HIGH,
+ * set zone_last to ZONE_HIGH.
+ *
+ * If we don't have movable node, node_states[N_NORMAL_MEMORY]
+ * contains nodes which have zones of 0...ZONE_MOVABLE,
+ * set zone_last to ZONE_MOVABLE.
+ */
+ zone_last = ZONE_HIGH;
+ if (N_MEMORY == N_HIGH_MEMORY)
+ zone_last = ZONE_MOVABLE;
+
+ for (; zt <= zone_last; zt++)
+ present_pages += pgdat->node_zones[zt].present_pages;
+ if (zone_idx(zone) <= zone_last && nr_pages >= present_pages)
+ arg->status_change_nid_high = zone_to_nid(zone);
+ else
+ arg->status_change_nid_high = -1;
+#else
+ arg->status_change_nid_high = arg->status_change_nid_normal;
+#endif
+
/*
* node_states[N_HIGH_MEMORY] contains nodes which have 0...ZONE_MOVABLE
*/
@@ -976,9 +1029,13 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
if (arg->status_change_nid_normal >= 0)
node_clear_state(node, N_NORMAL_MEMORY);

- if ((N_HIGH_MEMORY != N_NORMAL_MEMORY) &&
- (arg->status_change_nid >= 0))
+ if ((N_MEMORY != N_NORMAL_MEMORY) &&
+ (arg->status_change_nid_high >= 0))
node_clear_state(node, N_HIGH_MEMORY);
+
+ if ((N_MEMORY != N_HIGH_MEMORY) &&
+ (arg->status_change_nid >= 0))
+ node_clear_state(node, N_MEMORY);
}

static int __ref __offline_pages(unsigned long start_pfn,
--
1.8.0

2012-11-15 08:53:25

by Wen Congyang

[permalink] [raw]
Subject: [PART3 Patch v2 12/14] vmscan: use N_MEMORY instead N_HIGH_MEMORY

From: Lai Jiangshan <[email protected]>

N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
mm/vmscan.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8b055e9..5f86835 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3137,7 +3137,7 @@ static int __devinit cpu_callback(struct notifier_block *nfb,
int nid;

if (action == CPU_ONLINE || action == CPU_ONLINE_FROZEN) {
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
pg_data_t *pgdat = NODE_DATA(nid);
const struct cpumask *mask;

@@ -3193,7 +3193,7 @@ static int __init kswapd_init(void)
int nid;

swap_setup();
- for_each_node_state(nid, N_HIGH_MEMORY)
+ for_each_node_state(nid, N_MEMORY)
kswapd_run(nid);
hotcpu_notifier(cpu_callback, 0);
return 0;
--
1.8.0

2012-11-15 08:53:23

by Wen Congyang

[permalink] [raw]
Subject: [PART3 Patch v2 13/14] page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization

From: Lai Jiangshan <[email protected]>

N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Since we introduced N_MEMORY, we update the initialization of node_states.

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Lin Feng <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
arch/x86/mm/init_64.c | 4 +++-
mm/page_alloc.c | 42 +++++++++++++++++++++++-------------------
2 files changed, 26 insertions(+), 20 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 3baff25..2ead3c8 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -630,7 +630,9 @@ void __init paging_init(void)
* numa support is not compiled in, and later node_set_state
* will not set it back.
*/
- node_clear_state(0, N_NORMAL_MEMORY);
+ node_clear_state(0, N_MEMORY);
+ if (N_MEMORY != N_NORMAL_MEMORY)
+ node_clear_state(0, N_NORMAL_MEMORY);

zone_sizes_init();
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5b74de6..4054aaf 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1692,7 +1692,7 @@ bool zone_watermark_ok_safe(struct zone *z, int order, unsigned long mark,
*
* If the zonelist cache is present in the passed in zonelist, then
* returns a pointer to the allowed node mask (either the current
- * tasks mems_allowed, or node_states[N_HIGH_MEMORY].)
+ * tasks mems_allowed, or node_states[N_MEMORY].)
*
* If the zonelist cache is not available for this zonelist, does
* nothing and returns NULL.
@@ -1721,7 +1721,7 @@ static nodemask_t *zlc_setup(struct zonelist *zonelist, int alloc_flags)

allowednodes = !in_interrupt() && (alloc_flags & ALLOC_CPUSET) ?
&cpuset_current_mems_allowed :
- &node_states[N_HIGH_MEMORY];
+ &node_states[N_MEMORY];
return allowednodes;
}

@@ -3194,7 +3194,7 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
return node;
}

- for_each_node_state(n, N_HIGH_MEMORY) {
+ for_each_node_state(n, N_MEMORY) {

/* Don't want a node to appear more than once */
if (node_isset(n, *used_node_mask))
@@ -3336,7 +3336,7 @@ static int default_zonelist_order(void)
* local memory, NODE_ORDER may be suitable.
*/
average_size = total_size /
- (nodes_weight(node_states[N_HIGH_MEMORY]) + 1);
+ (nodes_weight(node_states[N_MEMORY]) + 1);
for_each_online_node(nid) {
low_kmem_size = 0;
total_size = 0;
@@ -4687,7 +4687,7 @@ unsigned long __init find_min_pfn_with_active_regions(void)
/*
* early_calculate_totalpages()
* Sum pages in active regions for movable zone.
- * Populate N_HIGH_MEMORY for calculating usable_nodes.
+ * Populate N_MEMORY for calculating usable_nodes.
*/
static unsigned long __init early_calculate_totalpages(void)
{
@@ -4700,7 +4700,7 @@ static unsigned long __init early_calculate_totalpages(void)

totalpages += pages;
if (pages)
- node_set_state(nid, N_HIGH_MEMORY);
+ node_set_state(nid, N_MEMORY);
}
return totalpages;
}
@@ -4717,9 +4717,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
unsigned long usable_startpfn;
unsigned long kernelcore_node, kernelcore_remaining;
/* save the state before borrow the nodemask */
- nodemask_t saved_node_state = node_states[N_HIGH_MEMORY];
+ nodemask_t saved_node_state = node_states[N_MEMORY];
unsigned long totalpages = early_calculate_totalpages();
- int usable_nodes = nodes_weight(node_states[N_HIGH_MEMORY]);
+ int usable_nodes = nodes_weight(node_states[N_MEMORY]);

/*
* If movablecore was specified, calculate what size of
@@ -4754,7 +4754,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
restart:
/* Spread kernelcore memory as evenly as possible throughout nodes */
kernelcore_node = required_kernelcore / usable_nodes;
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
unsigned long start_pfn, end_pfn;

/*
@@ -4846,23 +4846,27 @@ restart:

out:
/* restore the node_state */
- node_states[N_HIGH_MEMORY] = saved_node_state;
+ node_states[N_MEMORY] = saved_node_state;
}

-/* Any regular memory on that node ? */
-static void __init check_for_regular_memory(pg_data_t *pgdat)
+/* Any regular or high memory on that node ? */
+static void check_for_memory(pg_data_t *pgdat, int nid)
{
-#ifdef CONFIG_HIGHMEM
enum zone_type zone_type;

- for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
+ if (N_MEMORY == N_NORMAL_MEMORY)
+ return;
+
+ for (zone_type = 0; zone_type <= ZONE_MOVABLE - 1; zone_type++) {
struct zone *zone = &pgdat->node_zones[zone_type];
if (zone->present_pages) {
- node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
+ node_set_state(nid, N_HIGH_MEMORY);
+ if (N_NORMAL_MEMORY != N_HIGH_MEMORY &&
+ zone_type <= ZONE_NORMAL)
+ node_set_state(nid, N_NORMAL_MEMORY);
break;
}
}
-#endif
}

/**
@@ -4945,8 +4949,8 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)

/* Any memory on that node */
if (pgdat->node_present_pages)
- node_set_state(nid, N_HIGH_MEMORY);
- check_for_regular_memory(pgdat);
+ node_set_state(nid, N_MEMORY);
+ check_for_memory(pgdat, nid);
}
}

@@ -6105,7 +6109,7 @@ void reset_zone_present_pages(void)
struct zone *z;
int i, nid;

- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
for (i = 0; i < MAX_NR_ZONES; i++) {
z = NODE_DATA(nid)->node_zones + i;
z->present_pages = 0;
--
1.8.0

2012-11-15 08:54:18

by Wen Congyang

[permalink] [raw]
Subject: [PART3 Patch v2 08/14] hugetlb: use N_MEMORY instead N_HIGH_MEMORY

From: Lai Jiangshan <[email protected]>

N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
drivers/base/node.c | 2 +-
mm/hugetlb.c | 24 ++++++++++++------------
2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index af1a177..31f4805 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -227,7 +227,7 @@ static node_registration_func_t __hugetlb_unregister_node;
static inline bool hugetlb_register_node(struct node *node)
{
if (__hugetlb_register_node &&
- node_state(node->dev.id, N_HIGH_MEMORY)) {
+ node_state(node->dev.id, N_MEMORY)) {
__hugetlb_register_node(node);
return true;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 59a0059..7720ade 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1057,7 +1057,7 @@ static void return_unused_surplus_pages(struct hstate *h,
* on-line nodes with memory and will handle the hstate accounting.
*/
while (nr_pages--) {
- if (!free_pool_huge_page(h, &node_states[N_HIGH_MEMORY], 1))
+ if (!free_pool_huge_page(h, &node_states[N_MEMORY], 1))
break;
}
}
@@ -1180,14 +1180,14 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
int __weak alloc_bootmem_huge_page(struct hstate *h)
{
struct huge_bootmem_page *m;
- int nr_nodes = nodes_weight(node_states[N_HIGH_MEMORY]);
+ int nr_nodes = nodes_weight(node_states[N_MEMORY]);

while (nr_nodes) {
void *addr;

addr = __alloc_bootmem_node_nopanic(
NODE_DATA(hstate_next_node_to_alloc(h,
- &node_states[N_HIGH_MEMORY])),
+ &node_states[N_MEMORY])),
huge_page_size(h), huge_page_size(h), 0);

if (addr) {
@@ -1259,7 +1259,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
if (!alloc_bootmem_huge_page(h))
break;
} else if (!alloc_fresh_huge_page(h,
- &node_states[N_HIGH_MEMORY]))
+ &node_states[N_MEMORY]))
break;
}
h->max_huge_pages = i;
@@ -1527,7 +1527,7 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy,
if (!(obey_mempolicy &&
init_nodemask_of_mempolicy(nodes_allowed))) {
NODEMASK_FREE(nodes_allowed);
- nodes_allowed = &node_states[N_HIGH_MEMORY];
+ nodes_allowed = &node_states[N_MEMORY];
}
} else if (nodes_allowed) {
/*
@@ -1537,11 +1537,11 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy,
count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
init_nodemask_of_node(nodes_allowed, nid);
} else
- nodes_allowed = &node_states[N_HIGH_MEMORY];
+ nodes_allowed = &node_states[N_MEMORY];

h->max_huge_pages = set_max_huge_pages(h, count, nodes_allowed);

- if (nodes_allowed != &node_states[N_HIGH_MEMORY])
+ if (nodes_allowed != &node_states[N_MEMORY])
NODEMASK_FREE(nodes_allowed);

return len;
@@ -1844,7 +1844,7 @@ static void hugetlb_register_all_nodes(void)
{
int nid;

- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
struct node *node = &node_devices[nid];
if (node->dev.id == nid)
hugetlb_register_node(node);
@@ -1939,8 +1939,8 @@ void __init hugetlb_add_hstate(unsigned order)
for (i = 0; i < MAX_NUMNODES; ++i)
INIT_LIST_HEAD(&h->hugepage_freelists[i]);
INIT_LIST_HEAD(&h->hugepage_activelist);
- h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]);
- h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
+ h->next_nid_to_alloc = first_node(node_states[N_MEMORY]);
+ h->next_nid_to_free = first_node(node_states[N_MEMORY]);
snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
huge_page_size(h)/1024);
/*
@@ -2035,11 +2035,11 @@ static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
if (!(obey_mempolicy &&
init_nodemask_of_mempolicy(nodes_allowed))) {
NODEMASK_FREE(nodes_allowed);
- nodes_allowed = &node_states[N_HIGH_MEMORY];
+ nodes_allowed = &node_states[N_MEMORY];
}
h->max_huge_pages = set_max_huge_pages(h, tmp, nodes_allowed);

- if (nodes_allowed != &node_states[N_HIGH_MEMORY])
+ if (nodes_allowed != &node_states[N_MEMORY])
NODEMASK_FREE(nodes_allowed);
}
out:
--
1.8.0

2012-11-15 08:54:16

by Wen Congyang

[permalink] [raw]
Subject: [PART3 Patch v2 09/14] vmstat: use N_MEMORY instead N_HIGH_MEMORY

From: Lai Jiangshan <[email protected]>

N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
mm/vmstat.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index c737057..1b5cacd 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -930,7 +930,7 @@ static int pagetypeinfo_show(struct seq_file *m, void *arg)
pg_data_t *pgdat = (pg_data_t *)arg;

/* check memoryless node */
- if (!node_state(pgdat->node_id, N_HIGH_MEMORY))
+ if (!node_state(pgdat->node_id, N_MEMORY))
return 0;

seq_printf(m, "Page block order: %d\n", pageblock_order);
@@ -1292,7 +1292,7 @@ static int unusable_show(struct seq_file *m, void *arg)
pg_data_t *pgdat = (pg_data_t *)arg;

/* check memoryless node */
- if (!node_state(pgdat->node_id, N_HIGH_MEMORY))
+ if (!node_state(pgdat->node_id, N_MEMORY))
return 0;

walk_zones_in_node(m, pgdat, unusable_show_print);
--
1.8.0

2012-11-15 08:54:45

by Wen Congyang

[permalink] [raw]
Subject: [PART3 Patch v2 10/14] kthread: use N_MEMORY instead N_HIGH_MEMORY

From: Lai Jiangshan <[email protected]>

N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
kernel/kthread.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 29fb60c..691dc2e 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -428,7 +428,7 @@ int kthreadd(void *unused)
set_task_comm(tsk, "kthreadd");
ignore_signals(tsk);
set_cpus_allowed_ptr(tsk, cpu_all_mask);
- set_mems_allowed(node_states[N_HIGH_MEMORY]);
+ set_mems_allowed(node_states[N_MEMORY]);

current->flags |= PF_NOFREEZE;

--
1.8.0

2012-11-15 08:55:18

by Wen Congyang

[permalink] [raw]
Subject: [PART3 Patch v2 05/14] oom: use N_MEMORY instead N_HIGH_MEMORY

From: Lai Jiangshan <[email protected]>

N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
mm/oom_kill.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 79e0f3e..aa2d89c 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -257,7 +257,7 @@ static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
* the page allocator means a mempolicy is in effect. Cpuset policy
* is enforced in get_page_from_freelist().
*/
- if (nodemask && !nodes_subset(node_states[N_HIGH_MEMORY], *nodemask)) {
+ if (nodemask && !nodes_subset(node_states[N_MEMORY], *nodemask)) {
*totalpages = total_swap_pages;
for_each_node_mask(nid, *nodemask)
*totalpages += node_spanned_pages(nid);
--
1.8.0

2012-11-15 08:55:17

by Wen Congyang

[permalink] [raw]
Subject: [PART3 Patch v2 06/14] mm,migrate: use N_MEMORY instead N_HIGH_MEMORY

From: Lai Jiangshan <[email protected]>

N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.

Signed-off-by: Lai Jiangshan <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Signed-off-by: Wen Congyang <[email protected]>
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 77ed2d7..d595e58 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1201,7 +1201,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
if (node < 0 || node >= MAX_NUMNODES)
goto out_pm;

- if (!node_state(node, N_HIGH_MEMORY))
+ if (!node_state(node, N_MEMORY))
goto out_pm;

err = -EACCES;
--
1.8.0

2012-11-16 00:29:24

by Andrew Morton

[permalink] [raw]
Subject: Re: [PART3 Patch v2 13/14] page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization

On Thu, 15 Nov 2012 16:57:36 +0800
Wen Congyang <[email protected]> wrote:

> N_HIGH_MEMORY stands for the nodes that has normal or high memory.
> N_MEMORY stands for the nodes that has any memory.
>
> The code here need to handle with the nodes which have memory, we should
> use N_MEMORY instead.
>
> Since we introduced N_MEMORY, we update the initialization of node_states.

reset_zone_present_pages() has been removed by the recently-queued
revert-mm-fix-up-zone-present-pages.patch, so I dropped that hunk.

We still have

akpm:/usr/src/25> grep N_HIGH_MEMORY mm/page_alloc.c
[N_HIGH_MEMORY] = { { [0] = 1UL } },
node_set_state(nid, N_HIGH_MEMORY);
if (N_NORMAL_MEMORY != N_HIGH_MEMORY &&

which I hope is correct. Can you please check it?

2012-11-16 02:55:36

by Wen Congyang

[permalink] [raw]
Subject: Re: [PART3 Patch v2 13/14] page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization

At 11/16/2012 08:29 AM, Andrew Morton Wrote:
> On Thu, 15 Nov 2012 16:57:36 +0800
> Wen Congyang <[email protected]> wrote:
>
>> N_HIGH_MEMORY stands for the nodes that has normal or high memory.
>> N_MEMORY stands for the nodes that has any memory.
>>
>> The code here need to handle with the nodes which have memory, we should
>> use N_MEMORY instead.
>>
>> Since we introduced N_MEMORY, we update the initialization of node_states.
>
> reset_zone_present_pages() has been removed by the recently-queued
> revert-mm-fix-up-zone-present-pages.patch, so I dropped that hunk.
>
> We still have
>
> akpm:/usr/src/25> grep N_HIGH_MEMORY mm/page_alloc.c
> [N_HIGH_MEMORY] = { { [0] = 1UL } },
> node_set_state(nid, N_HIGH_MEMORY);
> if (N_NORMAL_MEMORY != N_HIGH_MEMORY &&
>
> which I hope is correct. Can you please check it?
>

Yes, it is correct.

We will introduce N_MEMORY nodemask in part4, and N_MEMORY is N_HIGH_MEMORY
in this patchset. So we don't init and update N_MEMORY nodemask in this patchset.

Thanks
Wen Congyang