From: Lai Jiangshan <[email protected]>
This patch is part3 of the following patchset:
https://lkml.org/lkml/2012/10/29/319
Part1 is here:
https://lkml.org/lkml/2012/10/31/30
Part2 is here:
http://marc.info/?l=linux-kernel&m=135166705909544&w=2
You can apply this patchset without the other parts.
We need a node which contains only movable memory. This feature is very
important for node hotplug. So we add a new nodemask, N_MEMORY, for nodes
that have any memory. N_MEMORY includes nodes with only movable memory, but
N_HIGH_MEMORY does not.
We don't remove N_HIGH_MEMORY because it can be used to find which
nodes contain memory that the kernel can use.
The movable node will be implemented in part4, so for now N_MEMORY is equal
to N_HIGH_MEMORY.
Lai Jiangshan (14):
node_states: introduce N_MEMORY
cpuset: use N_MEMORY instead N_HIGH_MEMORY
procfs: use N_MEMORY instead N_HIGH_MEMORY
memcontrol: use N_MEMORY instead N_HIGH_MEMORY
oom: use N_MEMORY instead N_HIGH_MEMORY
mm,migrate: use N_MEMORY instead N_HIGH_MEMORY
mempolicy: use N_MEMORY instead N_HIGH_MEMORY
hugetlb: use N_MEMORY instead N_HIGH_MEMORY
vmstat: use N_MEMORY instead N_HIGH_MEMORY
kthread: use N_MEMORY instead N_HIGH_MEMORY
init: use N_MEMORY instead N_HIGH_MEMORY
vmscan: use N_MEMORY instead N_HIGH_MEMORY
page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states
initialization
hotplug: update nodemasks management
Documentation/cgroups/cpusets.txt | 2 +-
Documentation/memory-hotplug.txt | 5 ++-
arch/x86/mm/init_64.c | 4 +-
drivers/base/node.c | 2 +-
fs/proc/kcore.c | 2 +-
fs/proc/task_mmu.c | 4 +-
include/linux/cpuset.h | 2 +-
include/linux/memory.h | 1 +
include/linux/nodemask.h | 1 +
init/main.c | 2 +-
kernel/cpuset.c | 32 +++++++-------
kernel/kthread.c | 2 +-
mm/hugetlb.c | 24 +++++------
mm/memcontrol.c | 18 ++++----
mm/memory_hotplug.c | 87 ++++++++++++++++++++++++++++++++-------
mm/mempolicy.c | 12 +++---
mm/migrate.c | 2 +-
mm/oom_kill.c | 2 +-
mm/page_alloc.c | 40 ++++++++++--------
mm/page_cgroup.c | 2 +-
mm/vmscan.c | 4 +-
mm/vmstat.c | 4 +-
22 files changed, 161 insertions(+), 93 deletions(-)
--
1.8.0
From: Lai Jiangshan <[email protected]>
We have N_NORMAL_MEMORY to stand for the nodes that have normal memory with
zone_type <= ZONE_NORMAL.
And we have N_HIGH_MEMORY to stand for the nodes that have normal or high
memory.
But we don't have any word to stand for the nodes that have *any* memory.
And we have N_CPU but no N_MEMORY.
Current code reuses N_HIGH_MEMORY for this purpose because any node which
has memory must currently have high memory or normal memory.
A) But this reuse is bad for *readability*, because the name
N_HIGH_MEMORY just stands for high or normal memory:
A.example 1)
mem_cgroup_nr_lru_pages():
for_each_node_state(nid, N_HIGH_MEMORY)
Readers will be confused (why does this function only count high or
normal memory nodes? does it count ZONE_MOVABLE's lru pages?)
until someone tells them that N_HIGH_MEMORY is reused to stand for
nodes that have any memory.
A.cont) If we introduce N_MEMORY, we can reduce this confusion
AND make the code clearer:
A.example 2) mm/page_cgroup.c uses N_HIGH_MEMORY twice:
The first use is in page_cgroup_init(void):
for_each_node_state(nid, N_HIGH_MEMORY) {
It means that if the node has memory, we will allocate the page_cgroup map
for the node. We should use N_MEMORY here instead to make this clearer.
The second use is in alloc_page_cgroup():
if (node_state(nid, N_HIGH_MEMORY))
addr = vzalloc_node(size, nid);
It tests whether the node has high or normal memory that the kernel can
allocate from. We should keep N_HIGH_MEMORY here, and it will be better
if the "any memory" semantic of N_HIGH_MEMORY is removed.
B) This reuse becomes outdated if we introduce MOVABLE-dedicated nodes.
A MOVABLE-dedicated node should not appear in
node_states[N_HIGH_MEMORY] nor node_states[N_NORMAL_MEMORY],
because a MOVABLE-dedicated node has no high or normal memory.
On x86_64, N_HIGH_MEMORY=N_NORMAL_MEMORY, so if a MOVABLE-dedicated node
is in node_states[N_HIGH_MEMORY], it is also in
node_states[N_NORMAL_MEMORY], which breaks SLUB.
SLUB uses
for_each_node_state(nid, N_NORMAL_MEMORY)
and creates a kmem_cache_node for the MOVABLE-dedicated node, which causes
problems.
In a word, we need N_MEMORY. We just introduce it as an alias of
N_HIGH_MEMORY here and fix all improper uses of N_HIGH_MEMORY in later patches.
Signed-off-by: Lai Jiangshan <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Acked-by: Hillf Danton <[email protected]>
---
include/linux/nodemask.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 7afc363..c6ebdc9 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -380,6 +380,7 @@ enum node_states {
#else
N_HIGH_MEMORY = N_NORMAL_MEMORY,
#endif
+ N_MEMORY = N_HIGH_MEMORY,
N_CPU, /* The node has one or more cpus */
NR_NODE_STATES
};
--
1.8.0
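For illustration only, the aliasing added by this patch can be modelled in user space. This is a minimal sketch, not kernel code: every `model_`/`M_` name, the `MODEL_HIGHMEM` switch, `MODEL_MAX_NODES`, and the single `unsigned long` bitmap per state are assumptions made for the example. It shows that an enumerator defined as an alias shares the bitmap slot of N_HIGH_MEMORY, so no extra nodemask is allocated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; the names below are not kernel symbols. */
#define MODEL_MAX_NODES 8

enum model_node_states {
	M_POSSIBLE,
	M_ONLINE,
	M_NORMAL_MEMORY,
#ifdef MODEL_HIGHMEM
	M_HIGH_MEMORY,
#else
	M_HIGH_MEMORY = M_NORMAL_MEMORY,
#endif
	M_MEMORY = M_HIGH_MEMORY,	/* alias, mirroring the one-line change */
	M_CPU,
	MODEL_NR_NODE_STATES
};

/* One bitmap per distinct state value; aliases index the same slot. */
static unsigned long model_node_states[MODEL_NR_NODE_STATES];

static void model_node_set_state(int nid, enum model_node_states state)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	model_node_states[state] |= 1UL << nid;
}

static bool model_node_state(int nid, enum model_node_states state)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	return (model_node_states[state] >> nid) & 1UL;
}
```

Because `M_CPU` follows the alias, it still receives the next free value, so the array size does not grow in this model.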
From: Lai Jiangshan <[email protected]>
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <[email protected]>
---
init/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/init/main.c b/init/main.c
index 9cf77ab..9595968 100644
--- a/init/main.c
+++ b/init/main.c
@@ -855,7 +855,7 @@ static void __init kernel_init_freeable(void)
/*
* init can allocate pages on any node
*/
- set_mems_allowed(node_states[N_HIGH_MEMORY]);
+ set_mems_allowed(node_states[N_MEMORY]);
/*
* init can run on any cpu.
*/
--
1.8.0
From: Lai Jiangshan <[email protected]>
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <[email protected]>
---
kernel/kthread.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/kthread.c b/kernel/kthread.c
index 29fb60c..691dc2e 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -428,7 +428,7 @@ int kthreadd(void *unused)
set_task_comm(tsk, "kthreadd");
ignore_signals(tsk);
set_cpus_allowed_ptr(tsk, cpu_all_mask);
- set_mems_allowed(node_states[N_HIGH_MEMORY]);
+ set_mems_allowed(node_states[N_MEMORY]);
current->flags |= PF_NOFREEZE;
--
1.8.0
From: Lai Jiangshan <[email protected]>
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <[email protected]>
Acked-by: Hillf Danton <[email protected]>
---
mm/vmscan.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2624edc..98a2e11 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3135,7 +3135,7 @@ static int __devinit cpu_callback(struct notifier_block *nfb,
int nid;
if (action == CPU_ONLINE || action == CPU_ONLINE_FROZEN) {
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
pg_data_t *pgdat = NODE_DATA(nid);
const struct cpumask *mask;
@@ -3191,7 +3191,7 @@ static int __init kswapd_init(void)
int nid;
swap_setup();
- for_each_node_state(nid, N_HIGH_MEMORY)
+ for_each_node_state(nid, N_MEMORY)
kswapd_run(nid);
hotcpu_notifier(cpu_callback, 0);
return 0;
--
1.8.0
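As a rough user-space illustration of the loop shape changed above, the sketch below walks the set bits of a "nodes with memory" mask the way a `for_each_node_state(nid, N_MEMORY)` loop visits nodes. The `model_` names, the 8-node limit, and the plain `unsigned long` mask are assumptions for the example, not the kernel's nodemask implementation.

```c
#include <assert.h>

/* Hypothetical model: a node mask is a plain unsigned long. */
#define MODEL_MAX_NODES 8

/* Lowest node id >= start whose bit is set, or MODEL_MAX_NODES if none. */
static int model_find_node_from(unsigned long mask, int start)
{
	int nid;

	assert(start >= 0);
	for (nid = start; nid < MODEL_MAX_NODES; nid++)
		if ((mask >> nid) & 1UL)
			return nid;
	return MODEL_MAX_NODES;
}

/* Visit every node whose bit is set in mask, in ascending order. */
#define model_for_each_node(nid, mask)				\
	for ((nid) = model_find_node_from((mask), 0);		\
	     (nid) < MODEL_MAX_NODES;				\
	     (nid) = model_find_node_from((mask), (nid) + 1))

/* Count how many per-node threads a kswapd_init()-like loop would start. */
static int model_count_kswapd_starts(unsigned long nodes_with_memory)
{
	int nid, started = 0;

	model_for_each_node(nid, nodes_with_memory)
		started++;	/* stands in for kswapd_run(nid) */
	return started;
}
```

With a mask that includes movable-only nodes, such a loop also starts a reclaim thread for those nodes, which is the behaviour the switch to N_MEMORY is meant to preserve once N_HIGH_MEMORY stops covering them.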
From: Lai Jiangshan <[email protected]>
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.
Since we introduced N_MEMORY, we update the initialization of node_states.
Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/mm/init_64.c | 4 +++-
mm/page_alloc.c | 40 ++++++++++++++++++++++------------------
2 files changed, 25 insertions(+), 19 deletions(-)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 3baff25..2ead3c8 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -630,7 +630,9 @@ void __init paging_init(void)
* numa support is not compiled in, and later node_set_state
* will not set it back.
*/
- node_clear_state(0, N_NORMAL_MEMORY);
+ node_clear_state(0, N_MEMORY);
+ if (N_MEMORY != N_NORMAL_MEMORY)
+ node_clear_state(0, N_NORMAL_MEMORY);
zone_sizes_init();
}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5b74de6..f1f44d5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1692,7 +1692,7 @@ bool zone_watermark_ok_safe(struct zone *z, int order, unsigned long mark,
*
* If the zonelist cache is present in the passed in zonelist, then
* returns a pointer to the allowed node mask (either the current
- * tasks mems_allowed, or node_states[N_HIGH_MEMORY].)
+ * tasks mems_allowed, or node_states[N_MEMORY].)
*
* If the zonelist cache is not available for this zonelist, does
* nothing and returns NULL.
@@ -1721,7 +1721,7 @@ static nodemask_t *zlc_setup(struct zonelist *zonelist, int alloc_flags)
allowednodes = !in_interrupt() && (alloc_flags & ALLOC_CPUSET) ?
&cpuset_current_mems_allowed :
- &node_states[N_HIGH_MEMORY];
+ &node_states[N_MEMORY];
return allowednodes;
}
@@ -3194,7 +3194,7 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
return node;
}
- for_each_node_state(n, N_HIGH_MEMORY) {
+ for_each_node_state(n, N_MEMORY) {
/* Don't want a node to appear more than once */
if (node_isset(n, *used_node_mask))
@@ -3336,7 +3336,7 @@ static int default_zonelist_order(void)
* local memory, NODE_ORDER may be suitable.
*/
average_size = total_size /
- (nodes_weight(node_states[N_HIGH_MEMORY]) + 1);
+ (nodes_weight(node_states[N_MEMORY]) + 1);
for_each_online_node(nid) {
low_kmem_size = 0;
total_size = 0;
@@ -4687,7 +4687,7 @@ unsigned long __init find_min_pfn_with_active_regions(void)
/*
* early_calculate_totalpages()
* Sum pages in active regions for movable zone.
- * Populate N_HIGH_MEMORY for calculating usable_nodes.
+ * Populate N_MEMORY for calculating usable_nodes.
*/
static unsigned long __init early_calculate_totalpages(void)
{
@@ -4700,7 +4700,7 @@ static unsigned long __init early_calculate_totalpages(void)
totalpages += pages;
if (pages)
- node_set_state(nid, N_HIGH_MEMORY);
+ node_set_state(nid, N_MEMORY);
}
return totalpages;
}
@@ -4717,9 +4717,9 @@ static void __init find_zone_movable_pfns_for_nodes(void)
unsigned long usable_startpfn;
unsigned long kernelcore_node, kernelcore_remaining;
/* save the state before borrow the nodemask */
- nodemask_t saved_node_state = node_states[N_HIGH_MEMORY];
+ nodemask_t saved_node_state = node_states[N_MEMORY];
unsigned long totalpages = early_calculate_totalpages();
- int usable_nodes = nodes_weight(node_states[N_HIGH_MEMORY]);
+ int usable_nodes = nodes_weight(node_states[N_MEMORY]);
/*
* If movablecore was specified, calculate what size of
@@ -4754,7 +4754,7 @@ static void __init find_zone_movable_pfns_for_nodes(void)
restart:
/* Spread kernelcore memory as evenly as possible throughout nodes */
kernelcore_node = required_kernelcore / usable_nodes;
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
unsigned long start_pfn, end_pfn;
/*
@@ -4846,23 +4846,27 @@ restart:
out:
/* restore the node_state */
- node_states[N_HIGH_MEMORY] = saved_node_state;
+ node_states[N_MEMORY] = saved_node_state;
}
-/* Any regular memory on that node ? */
-static void __init check_for_regular_memory(pg_data_t *pgdat)
+/* Any regular or high memory on that node ? */
+static void check_for_memory(pg_data_t *pgdat, int nid)
{
-#ifdef CONFIG_HIGHMEM
enum zone_type zone_type;
- for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
+ if (N_MEMORY == N_NORMAL_MEMORY)
+ return;
+
+ for (zone_type = 0; zone_type <= ZONE_MOVABLE - 1; zone_type++) {
struct zone *zone = &pgdat->node_zones[zone_type];
if (zone->present_pages) {
- node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
+ node_set_state(nid, N_HIGH_MEMORY);
+ if (N_NORMAL_MEMORY != N_HIGH_MEMORY &&
+ zone_type <= ZONE_NORMAL)
+ node_set_state(nid, N_NORMAL_MEMORY);
break;
}
}
-#endif
}
/**
@@ -4945,8 +4949,8 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
/* Any memory on that node */
if (pgdat->node_present_pages)
- node_set_state(nid, N_HIGH_MEMORY);
- check_for_regular_memory(pgdat);
+ node_set_state(nid, N_MEMORY);
+ check_for_memory(pgdat, nid);
}
}
--
1.8.0
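The classification performed by free_area_init_nodes() together with check_for_memory() can be sketched in user space as follows. This is a simplified model under stated assumptions: it assumes a configuration in which N_NORMAL_MEMORY, N_HIGH_MEMORY and N_MEMORY are all distinct (highmem plus the movable node planned for part4), and the `MZ_`/`MS_`/`model_` names and the four-zone layout are invented for the example.

```c
#include <assert.h>

/* Hypothetical zone layout for the model, lowest zone first. */
enum model_zone { MZ_DMA, MZ_NORMAL, MZ_HIGHMEM, MZ_MOVABLE, MZ_MAX };

/* Hypothetical flag bits standing in for the three node states. */
#define MS_NORMAL_MEMORY 0x1u
#define MS_HIGH_MEMORY   0x2u
#define MS_MEMORY        0x4u

/* Derive the node states from the present page count of each zone. */
static unsigned int model_classify_node(const unsigned long present[MZ_MAX])
{
	unsigned int states = 0;
	int zt;

	assert(present);

	/* Any memory at all, in any zone, puts the node in the N_MEMORY set. */
	for (zt = 0; zt < MZ_MAX; zt++) {
		if (present[zt]) {
			states |= MS_MEMORY;
			break;
		}
	}

	/* Scan the zones below the movable zone in ascending order. */
	for (zt = 0; zt <= MZ_MOVABLE - 1; zt++) {
		if (present[zt]) {
			states |= MS_HIGH_MEMORY;
			if (zt <= MZ_NORMAL)
				states |= MS_NORMAL_MEMORY;
			break;	/* the lowest populated zone decides */
		}
	}
	return states;
}
```

Stopping at the first populated zone is sufficient because the scan is ascending: if that zone is above the normal zone, no lower zone has pages, so the node cannot belong to the normal-memory set.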
From: Lai Jiangshan <[email protected]>
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
---
mm/vmstat.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index c737057..1b5cacd 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -930,7 +930,7 @@ static int pagetypeinfo_show(struct seq_file *m, void *arg)
pg_data_t *pgdat = (pg_data_t *)arg;
/* check memoryless node */
- if (!node_state(pgdat->node_id, N_HIGH_MEMORY))
+ if (!node_state(pgdat->node_id, N_MEMORY))
return 0;
seq_printf(m, "Page block order: %d\n", pageblock_order);
@@ -1292,7 +1292,7 @@ static int unusable_show(struct seq_file *m, void *arg)
pg_data_t *pgdat = (pg_data_t *)arg;
/* check memoryless node */
- if (!node_state(pgdat->node_id, N_HIGH_MEMORY))
+ if (!node_state(pgdat->node_id, N_MEMORY))
return 0;
walk_zones_in_node(m, pgdat, unusable_show_print);
--
1.8.0
From: Lai Jiangshan <[email protected]>
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <[email protected]>
Acked-by: Hillf Danton <[email protected]>
---
drivers/base/node.c | 2 +-
mm/hugetlb.c | 24 ++++++++++++------------
2 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/drivers/base/node.c b/drivers/base/node.c
index af1a177..31f4805 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -227,7 +227,7 @@ static node_registration_func_t __hugetlb_unregister_node;
static inline bool hugetlb_register_node(struct node *node)
{
if (__hugetlb_register_node &&
- node_state(node->dev.id, N_HIGH_MEMORY)) {
+ node_state(node->dev.id, N_MEMORY)) {
__hugetlb_register_node(node);
return true;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 59a0059..7720ade 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1057,7 +1057,7 @@ static void return_unused_surplus_pages(struct hstate *h,
* on-line nodes with memory and will handle the hstate accounting.
*/
while (nr_pages--) {
- if (!free_pool_huge_page(h, &node_states[N_HIGH_MEMORY], 1))
+ if (!free_pool_huge_page(h, &node_states[N_MEMORY], 1))
break;
}
}
@@ -1180,14 +1180,14 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
int __weak alloc_bootmem_huge_page(struct hstate *h)
{
struct huge_bootmem_page *m;
- int nr_nodes = nodes_weight(node_states[N_HIGH_MEMORY]);
+ int nr_nodes = nodes_weight(node_states[N_MEMORY]);
while (nr_nodes) {
void *addr;
addr = __alloc_bootmem_node_nopanic(
NODE_DATA(hstate_next_node_to_alloc(h,
- &node_states[N_HIGH_MEMORY])),
+ &node_states[N_MEMORY])),
huge_page_size(h), huge_page_size(h), 0);
if (addr) {
@@ -1259,7 +1259,7 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
if (!alloc_bootmem_huge_page(h))
break;
} else if (!alloc_fresh_huge_page(h,
- &node_states[N_HIGH_MEMORY]))
+ &node_states[N_MEMORY]))
break;
}
h->max_huge_pages = i;
@@ -1527,7 +1527,7 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy,
if (!(obey_mempolicy &&
init_nodemask_of_mempolicy(nodes_allowed))) {
NODEMASK_FREE(nodes_allowed);
- nodes_allowed = &node_states[N_HIGH_MEMORY];
+ nodes_allowed = &node_states[N_MEMORY];
}
} else if (nodes_allowed) {
/*
@@ -1537,11 +1537,11 @@ static ssize_t nr_hugepages_store_common(bool obey_mempolicy,
count += h->nr_huge_pages - h->nr_huge_pages_node[nid];
init_nodemask_of_node(nodes_allowed, nid);
} else
- nodes_allowed = &node_states[N_HIGH_MEMORY];
+ nodes_allowed = &node_states[N_MEMORY];
h->max_huge_pages = set_max_huge_pages(h, count, nodes_allowed);
- if (nodes_allowed != &node_states[N_HIGH_MEMORY])
+ if (nodes_allowed != &node_states[N_MEMORY])
NODEMASK_FREE(nodes_allowed);
return len;
@@ -1844,7 +1844,7 @@ static void hugetlb_register_all_nodes(void)
{
int nid;
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
struct node *node = &node_devices[nid];
if (node->dev.id == nid)
hugetlb_register_node(node);
@@ -1939,8 +1939,8 @@ void __init hugetlb_add_hstate(unsigned order)
for (i = 0; i < MAX_NUMNODES; ++i)
INIT_LIST_HEAD(&h->hugepage_freelists[i]);
INIT_LIST_HEAD(&h->hugepage_activelist);
- h->next_nid_to_alloc = first_node(node_states[N_HIGH_MEMORY]);
- h->next_nid_to_free = first_node(node_states[N_HIGH_MEMORY]);
+ h->next_nid_to_alloc = first_node(node_states[N_MEMORY]);
+ h->next_nid_to_free = first_node(node_states[N_MEMORY]);
snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
huge_page_size(h)/1024);
/*
@@ -2035,11 +2035,11 @@ static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
if (!(obey_mempolicy &&
init_nodemask_of_mempolicy(nodes_allowed))) {
NODEMASK_FREE(nodes_allowed);
- nodes_allowed = &node_states[N_HIGH_MEMORY];
+ nodes_allowed = &node_states[N_MEMORY];
}
h->max_huge_pages = set_max_huge_pages(h, tmp, nodes_allowed);
- if (nodes_allowed != &node_states[N_HIGH_MEMORY])
+ if (nodes_allowed != &node_states[N_MEMORY])
NODEMASK_FREE(nodes_allowed);
}
out:
--
1.8.0
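The hugetlb code above hands the "nodes with memory" mask to a round-robin cursor (`next_nid_to_alloc`). The sketch below models that idea in user space. It is an assumption-laden approximation rather than the kernel's hstate_next_node_to_alloc(): the `model_` names, the 8-node limit, and the `unsigned long` mask are all invented for the example.

```c
#include <assert.h>

/* Hypothetical model: a node mask is a plain unsigned long. */
#define MODEL_MAX_NODES 8

/* Next node after nid (wrapping around) whose bit is set in allowed. */
static int model_next_allowed(int nid, unsigned long allowed)
{
	int i;

	assert(allowed & ((1UL << MODEL_MAX_NODES) - 1));
	for (i = 1; i <= MODEL_MAX_NODES; i++) {
		int cand = (nid + i) % MODEL_MAX_NODES;

		if ((allowed >> cand) & 1UL)
			return cand;
	}
	return nid;	/* not reached while the assertion above holds */
}

/* Return the node to allocate from and advance the cursor past it. */
static int model_next_node_to_alloc(int *next_nid, unsigned long allowed)
{
	int nid = *next_nid;

	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	if (!((allowed >> nid) & 1UL))
		nid = model_next_allowed(nid, allowed);
	*next_nid = model_next_allowed(nid, allowed);
	return nid;
}

/* Convenience helper: the n-th pick (0-based) when the cursor starts at start. */
static int model_nth_pick(unsigned long allowed, int start, int n)
{
	int cursor = start, nid = -1, i;

	for (i = 0; i <= n; i++)
		nid = model_next_node_to_alloc(&cursor, allowed);
	return nid;
}
```

In this model, widening the mask from "high or normal memory" to "any memory" simply adds more nodes to the rotation; the cursor logic itself does not change.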
From: Lai Jiangshan <[email protected]>
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <[email protected]>
Acked-by: Hillf Danton <[email protected]>
---
mm/oom_kill.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 79e0f3e..aa2d89c 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -257,7 +257,7 @@ static enum oom_constraint constrained_alloc(struct zonelist *zonelist,
* the page allocator means a mempolicy is in effect. Cpuset policy
* is enforced in get_page_from_freelist().
*/
- if (nodemask && !nodes_subset(node_states[N_HIGH_MEMORY], *nodemask)) {
+ if (nodemask && !nodes_subset(node_states[N_MEMORY], *nodemask)) {
*totalpages = total_swap_pages;
for_each_node_mask(nid, *nodemask)
*totalpages += node_spanned_pages(nid);
--
1.8.0
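The test changed in constrained_alloc() asks whether the caller's nodemask fails to cover every node that has memory. A minimal user-space sketch of that subset check follows; the `model_` names and the `unsigned long` masks are assumptions for the example and do not reproduce the kernel's nodemask API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if every bit set in a is also set in b. */
static bool model_nodes_subset(unsigned long a, unsigned long b)
{
	return (a & ~b) == 0;
}

/*
 * A mempolicy constraint is assumed to be in effect when a nodemask was
 * passed and it leaves out at least one node that has memory.
 */
static bool model_mempolicy_constrained(const unsigned long *nodemask,
					unsigned long nodes_with_memory)
{
	return nodemask && !model_nodes_subset(nodes_with_memory, *nodemask);
}
```

If the "nodes with memory" mask omitted movable-only nodes, a nodemask that excludes only such a node would look unconstrained in this model, which is why the comparison has to be made against the any-memory mask.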
From: Lai Jiangshan <[email protected]>
Update nodemask management for N_MEMORY.
Signed-off-by: Lai Jiangshan <[email protected]>
---
Documentation/memory-hotplug.txt | 5 ++-
include/linux/memory.h | 1 +
mm/memory_hotplug.c | 87 +++++++++++++++++++++++++++++++++-------
3 files changed, 77 insertions(+), 16 deletions(-)
diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 6e6cbc7..70bc1c7 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -378,6 +378,7 @@ struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
int status_change_nid_normal;
+ int status_change_nid_high;
int status_change_nid;
}
@@ -385,7 +386,9 @@ start_pfn is start_pfn of online/offline memory.
nr_pages is # of pages of online/offline memory.
status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
is (will be) set/clear, if this is -1, then nodemask status is not changed.
-status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
+status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask
+is (will be) set/clear, if this is -1, then nodemask status is not changed.
+status_change_nid is set node id when N_MEMORY of nodemask is (will be)
set/clear. It means a new(memoryless) node gets new memory by online and a
node loses all memory. If this is -1, then nodemask status is not changed.
If status_changed_nid* >= 0, callback should create/discard structures for the
diff --git a/include/linux/memory.h b/include/linux/memory.h
index a09216d..45e93b4 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -54,6 +54,7 @@ struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
int status_change_nid_normal;
+ int status_change_nid_high;
int status_change_nid;
};
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index dfa6a91..760095d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -475,13 +475,15 @@ static void node_states_check_changes_online(unsigned long nr_pages,
enum zone_type zone_last = ZONE_NORMAL;
/*
- * If we have HIGHMEM, node_states[N_NORMAL_MEMORY] contains nodes
- * which have 0...ZONE_NORMAL, set zone_last to ZONE_NORMAL.
+ * If we have HIGHMEM or movable node, node_states[N_NORMAL_MEMORY]
+ * contains nodes which have zones of 0...ZONE_NORMAL,
+ * set zone_last to ZONE_NORMAL.
*
- * If we don't have HIGHMEM, node_states[N_NORMAL_MEMORY] contains nodes
- * which have 0...ZONE_MOVABLE, set zone_last to ZONE_MOVABLE.
+ * If we don't have HIGHMEM nor movable node,
+ * node_states[N_NORMAL_MEMORY] contains nodes which have zones of
+ * 0...ZONE_MOVABLE, set zone_last to ZONE_MOVABLE.
*/
- if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+ if (N_MEMORY == N_NORMAL_MEMORY)
zone_last = ZONE_MOVABLE;
/*
@@ -495,12 +497,34 @@ static void node_states_check_changes_online(unsigned long nr_pages,
else
arg->status_change_nid_normal = -1;
+#ifdef CONFIG_HIGHMEM
+ /*
+ * If we have movable node, node_states[N_HIGH_MEMORY]
+ * contains nodes which have zones of 0...ZONE_HIGHMEM,
+ * set zone_last to ZONE_HIGHMEM.
+ *
+ * If we don't have movable node, node_states[N_HIGH_MEMORY]
+ * contains nodes which have zones of 0...ZONE_MOVABLE,
+ * set zone_last to ZONE_MOVABLE.
+ */
+ zone_last = ZONE_HIGHMEM;
+ if (N_MEMORY == N_HIGH_MEMORY)
+ zone_last = ZONE_MOVABLE;
+
+ if (zone_idx(zone) <= zone_last && !node_state(nid, N_HIGH_MEMORY))
+ arg->status_change_nid_high = nid;
+ else
+ arg->status_change_nid_high = -1;
+#else
+ arg->status_change_nid_high = arg->status_change_nid_normal;
+#endif
+
/*
* if the node don't have memory befor online, we will need to
- * set the node to node_states[N_HIGH_MEMORY] after the memory
+ * set the node to node_states[N_MEMORY] after the memory
* is online.
*/
- if (!node_state(nid, N_HIGH_MEMORY))
+ if (!node_state(nid, N_MEMORY))
arg->status_change_nid = nid;
else
arg->status_change_nid = -1;
@@ -511,7 +535,10 @@ static void node_states_set_node(int node, struct memory_notify *arg)
if (arg->status_change_nid_normal >= 0)
node_set_state(node, N_NORMAL_MEMORY);
- node_set_state(node, N_HIGH_MEMORY);
+ if (arg->status_change_nid_high >= 0)
+ node_set_state(node, N_HIGH_MEMORY);
+
+ node_set_state(node, N_MEMORY);
}
@@ -929,13 +956,15 @@ static void node_states_check_changes_offline(unsigned long nr_pages,
enum zone_type zt, zone_last = ZONE_NORMAL;
/*
- * If we have HIGHMEM, node_states[N_NORMAL_MEMORY] contains nodes
- * which have 0...ZONE_NORMAL, set zone_last to ZONE_NORMAL.
+ * If we have HIGHMEM or movable node, node_states[N_NORMAL_MEMORY]
+ * contains nodes which have zones of 0...ZONE_NORMAL,
+ * set zone_last to ZONE_NORMAL.
*
- * If we don't have HIGHMEM, node_states[N_NORMAL_MEMORY] contains nodes
- * which have 0...ZONE_MOVABLE, set zone_last to ZONE_MOVABLE.
+ * If we don't have HIGHMEM nor movable node,
+ * node_states[N_NORMAL_MEMORY] contains nodes which have zones of
+ * 0...ZONE_MOVABLE, set zone_last to ZONE_MOVABLE.
*/
- if (N_HIGH_MEMORY == N_NORMAL_MEMORY)
+ if (N_MEMORY == N_NORMAL_MEMORY)
zone_last = ZONE_MOVABLE;
/*
@@ -952,6 +981,30 @@ static void node_states_check_changes_offline(unsigned long nr_pages,
else
arg->status_change_nid_normal = -1;
+#ifdef CONFIG_HIGHMEM
+ /*
+ * If we have movable node, node_states[N_HIGH_MEMORY]
+ * contains nodes which have zones of 0...ZONE_HIGHMEM,
+ * set zone_last to ZONE_HIGHMEM.
+ *
+ * If we don't have movable node, node_states[N_HIGH_MEMORY]
+ * contains nodes which have zones of 0...ZONE_MOVABLE,
+ * set zone_last to ZONE_MOVABLE.
+ */
+ zone_last = ZONE_HIGHMEM;
+ if (N_MEMORY == N_HIGH_MEMORY)
+ zone_last = ZONE_MOVABLE;
+
+ for (; zt <= zone_last; zt++)
+ present_pages += pgdat->node_zones[zt].present_pages;
+ if (zone_idx(zone) <= zone_last && nr_pages >= present_pages)
+ arg->status_change_nid_high = zone_to_nid(zone);
+ else
+ arg->status_change_nid_high = -1;
+#else
+ arg->status_change_nid_high = arg->status_change_nid_normal;
+#endif
+
/*
* node_states[N_HIGH_MEMORY] contains nodes which have 0...ZONE_MOVABLE
*/
@@ -976,9 +1029,13 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
if (arg->status_change_nid_normal >= 0)
node_clear_state(node, N_NORMAL_MEMORY);
- if ((N_HIGH_MEMORY != N_NORMAL_MEMORY) &&
- (arg->status_change_nid >= 0))
+ if ((N_MEMORY != N_NORMAL_MEMORY) &&
+ (arg->status_change_nid_high >= 0))
node_clear_state(node, N_HIGH_MEMORY);
+
+ if ((N_MEMORY != N_HIGH_MEMORY) &&
+ (arg->status_change_nid >= 0))
+ node_clear_state(node, N_MEMORY);
}
static int __ref __offline_pages(unsigned long start_pfn,
--
1.8.0
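The three status_change_nid* fields can be modelled in user space as below. This is a sketch under explicit assumptions, not the kernel implementation: it assumes a configuration where N_NORMAL_MEMORY, N_HIGH_MEMORY and N_MEMORY are all distinct, and the `MZ_`/`MS_`/`model_` names and the four-zone layout are invented for the example.

```c
#include <assert.h>

/* Hypothetical zone layout for the model, lowest zone first. */
enum model_zone { MZ_DMA, MZ_NORMAL, MZ_HIGHMEM, MZ_MOVABLE, MZ_MAX };

/* Hypothetical flag bits standing in for the three node states. */
#define MS_NORMAL_MEMORY 0x1u
#define MS_HIGH_MEMORY   0x2u
#define MS_MEMORY        0x4u

struct model_notify {
	int status_change_nid_normal;
	int status_change_nid_high;
	int status_change_nid;
};

/* Online: which states will newly gain node nid? (-1 means unchanged) */
static struct model_notify model_check_online(int nid, enum model_zone zone,
					      unsigned int states)
{
	struct model_notify arg;

	arg.status_change_nid_normal =
		(zone <= MZ_NORMAL && !(states & MS_NORMAL_MEMORY)) ? nid : -1;
	arg.status_change_nid_high =
		(zone <= MZ_HIGHMEM && !(states & MS_HIGH_MEMORY)) ? nid : -1;
	arg.status_change_nid = !(states & MS_MEMORY) ? nid : -1;
	return arg;
}

/* Online: apply the changes, like node_states_set_node() in the patch. */
static unsigned int model_online(int nid, enum model_zone zone,
				 unsigned int states)
{
	struct model_notify arg = model_check_online(nid, zone, states);

	if (arg.status_change_nid_normal >= 0)
		states |= MS_NORMAL_MEMORY;
	if (arg.status_change_nid_high >= 0)
		states |= MS_HIGH_MEMORY;
	return states | MS_MEMORY;
}

/* Offline: does removing nr_pages from zone empty zones 0..zone_last? */
static int model_offline_empties(const unsigned long present[MZ_MAX],
				 enum model_zone zone, unsigned long nr_pages,
				 enum model_zone zone_last)
{
	unsigned long sum = 0;
	int zt;

	for (zt = 0; zt <= (int)zone_last; zt++)
		sum += present[zt];
	return zone <= zone_last && nr_pages >= sum;
}
```

A notifier callback would then look only at the field matching the structures it owns: for instance, per-node slab data depends on the normal-memory transition, while anything keyed on "node has memory at all" depends on status_change_nid.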
From: Lai Jiangshan <[email protected]>
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <[email protected]>
---
mm/mempolicy.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d04a8a5..d4a084c 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -212,9 +212,9 @@ static int mpol_set_nodemask(struct mempolicy *pol,
/* if mode is MPOL_DEFAULT, pol is NULL. This is right. */
if (pol == NULL)
return 0;
- /* Check N_HIGH_MEMORY */
+ /* Check N_MEMORY */
nodes_and(nsc->mask1,
- cpuset_current_mems_allowed, node_states[N_HIGH_MEMORY]);
+ cpuset_current_mems_allowed, node_states[N_MEMORY]);
VM_BUG_ON(!nodes);
if (pol->mode == MPOL_PREFERRED && nodes_empty(*nodes))
@@ -1388,7 +1388,7 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
goto out_put;
}
- if (!nodes_subset(*new, node_states[N_HIGH_MEMORY])) {
+ if (!nodes_subset(*new, node_states[N_MEMORY])) {
err = -EINVAL;
goto out_put;
}
@@ -2361,7 +2361,7 @@ void __init numa_policy_init(void)
* fall back to the largest node if they're all smaller.
*/
nodes_clear(interleave_nodes);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
unsigned long total_pages = node_present_pages(nid);
/* Preserve the largest node */
@@ -2442,7 +2442,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
*nodelist++ = '\0';
if (nodelist_parse(nodelist, nodes))
goto out;
- if (!nodes_subset(nodes, node_states[N_HIGH_MEMORY]))
+ if (!nodes_subset(nodes, node_states[N_MEMORY]))
goto out;
} else
nodes_clear(nodes);
@@ -2476,7 +2476,7 @@ int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context)
* Default to online nodes with memory if no nodelist
*/
if (!nodelist)
- nodes = node_states[N_HIGH_MEMORY];
+ nodes = node_states[N_MEMORY];
break;
case MPOL_LOCAL:
/*
--
1.8.0
From: Lai Jiangshan <[email protected]>
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle the nodes which have memory, so we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <[email protected]>
Acked-by: Hillf Danton <[email protected]>
---
Documentation/cgroups/cpusets.txt | 2 +-
include/linux/cpuset.h | 2 +-
kernel/cpuset.c | 32 ++++++++++++++++----------------
3 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt
index cefd3d8..12e01d4 100644
--- a/Documentation/cgroups/cpusets.txt
+++ b/Documentation/cgroups/cpusets.txt
@@ -218,7 +218,7 @@ and name space for cpusets, with a minimum of additional kernel code.
The cpus and mems files in the root (top_cpuset) cpuset are
read-only. The cpus file automatically tracks the value of
cpu_online_mask using a CPU hotplug notifier, and the mems file
-automatically tracks the value of node_states[N_HIGH_MEMORY]--i.e.,
+automatically tracks the value of node_states[N_MEMORY]--i.e.,
nodes with memory--using the cpuset_track_online_nodes() hook.
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 838320f..8c8a60d 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -144,7 +144,7 @@ static inline nodemask_t cpuset_mems_allowed(struct task_struct *p)
return node_possible_map;
}
-#define cpuset_current_mems_allowed (node_states[N_HIGH_MEMORY])
+#define cpuset_current_mems_allowed (node_states[N_MEMORY])
static inline void cpuset_init_current_mems_allowed(void) {}
static inline int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index f33c715..2b133db 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -302,10 +302,10 @@ static void guarantee_online_cpus(const struct cpuset *cs,
* are online, with memory. If none are online with memory, walk
* up the cpuset hierarchy until we find one that does have some
* online mems. If we get all the way to the top and still haven't
- * found any online mems, return node_states[N_HIGH_MEMORY].
+ * found any online mems, return node_states[N_MEMORY].
*
* One way or another, we guarantee to return some non-empty subset
- * of node_states[N_HIGH_MEMORY].
+ * of node_states[N_MEMORY].
*
* Call with callback_mutex held.
*/
@@ -313,14 +313,14 @@ static void guarantee_online_cpus(const struct cpuset *cs,
static void guarantee_online_mems(const struct cpuset *cs, nodemask_t *pmask)
{
while (cs && !nodes_intersects(cs->mems_allowed,
- node_states[N_HIGH_MEMORY]))
+ node_states[N_MEMORY]))
cs = cs->parent;
if (cs)
nodes_and(*pmask, cs->mems_allowed,
- node_states[N_HIGH_MEMORY]);
+ node_states[N_MEMORY]);
else
- *pmask = node_states[N_HIGH_MEMORY];
- BUG_ON(!nodes_intersects(*pmask, node_states[N_HIGH_MEMORY]));
+ *pmask = node_states[N_MEMORY];
+ BUG_ON(!nodes_intersects(*pmask, node_states[N_MEMORY]));
}
/*
@@ -1100,7 +1100,7 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
return -ENOMEM;
/*
- * top_cpuset.mems_allowed tracks node_stats[N_HIGH_MEMORY];
+ * top_cpuset.mems_allowed tracks node_states[N_MEMORY];
* it's read-only
*/
if (cs == &top_cpuset) {
@@ -1122,7 +1122,7 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
goto done;
if (!nodes_subset(trialcs->mems_allowed,
- node_states[N_HIGH_MEMORY])) {
+ node_states[N_MEMORY])) {
retval = -EINVAL;
goto done;
}
@@ -2034,7 +2034,7 @@ static struct cpuset *cpuset_next(struct list_head *queue)
* before dropping down to the next. It always processes a node before
* any of its children.
*
- * In the case of memory hot-unplug, it will remove nodes from N_HIGH_MEMORY
+ * In the case of memory hot-unplug, it will remove nodes from N_MEMORY
* if all present pages from a node are offlined.
*/
static void
@@ -2073,7 +2073,7 @@ scan_cpusets_upon_hotplug(struct cpuset *root, enum hotplug_event event)
/* Continue past cpusets with all mems online */
if (nodes_subset(cp->mems_allowed,
- node_states[N_HIGH_MEMORY]))
+ node_states[N_MEMORY]))
continue;
oldmems = cp->mems_allowed;
@@ -2081,7 +2081,7 @@ scan_cpusets_upon_hotplug(struct cpuset *root, enum hotplug_event event)
/* Remove offline mems from this cpuset. */
mutex_lock(&callback_mutex);
nodes_and(cp->mems_allowed, cp->mems_allowed,
- node_states[N_HIGH_MEMORY]);
+ node_states[N_MEMORY]);
mutex_unlock(&callback_mutex);
/* Move tasks from the empty cpuset to a parent */
@@ -2134,8 +2134,8 @@ void cpuset_update_active_cpus(bool cpu_online)
#ifdef CONFIG_MEMORY_HOTPLUG
/*
- * Keep top_cpuset.mems_allowed tracking node_states[N_HIGH_MEMORY].
- * Call this routine anytime after node_states[N_HIGH_MEMORY] changes.
+ * Keep top_cpuset.mems_allowed tracking node_states[N_MEMORY].
+ * Call this routine anytime after node_states[N_MEMORY] changes.
* See cpuset_update_active_cpus() for CPU hotplug handling.
*/
static int cpuset_track_online_nodes(struct notifier_block *self,
@@ -2148,7 +2148,7 @@ static int cpuset_track_online_nodes(struct notifier_block *self,
case MEM_ONLINE:
oldmems = top_cpuset.mems_allowed;
mutex_lock(&callback_mutex);
- top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
+ top_cpuset.mems_allowed = node_states[N_MEMORY];
mutex_unlock(&callback_mutex);
update_tasks_nodemask(&top_cpuset, &oldmems, NULL);
break;
@@ -2177,7 +2177,7 @@ static int cpuset_track_online_nodes(struct notifier_block *self,
void __init cpuset_init_smp(void)
{
cpumask_copy(top_cpuset.cpus_allowed, cpu_active_mask);
- top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
+ top_cpuset.mems_allowed = node_states[N_MEMORY];
hotplug_memory_notifier(cpuset_track_online_nodes, 10);
@@ -2245,7 +2245,7 @@ void cpuset_init_current_mems_allowed(void)
*
* Description: Returns the nodemask_t mems_allowed of the cpuset
* attached to the specified @tsk. Guaranteed to return some non-empty
- * subset of node_states[N_HIGH_MEMORY], even if this means going outside the
+ * subset of node_states[N_MEMORY], even if this means going outside the
* tasks cpuset.
**/
--
1.8.0
From: Lai Jiangshan <[email protected]>
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle all nodes which have memory, so we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 77ed2d7..d595e58 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1201,7 +1201,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
if (node < 0 || node >= MAX_NUMNODES)
goto out_pm;
- if (!node_state(node, N_HIGH_MEMORY))
+ if (!node_state(node, N_MEMORY))
goto out_pm;
err = -EACCES;
--
1.8.0
From: Lai Jiangshan <[email protected]>
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle all nodes which have memory, so we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <[email protected]>
---
mm/memcontrol.c | 18 +++++++++---------
mm/page_cgroup.c | 2 +-
2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7acf43b..1b69665 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -800,7 +800,7 @@ static unsigned long mem_cgroup_nr_lru_pages(struct mem_cgroup *memcg,
int nid;
u64 total = 0;
- for_each_node_state(nid, N_HIGH_MEMORY)
+ for_each_node_state(nid, N_MEMORY)
total += mem_cgroup_node_nr_lru_pages(memcg, nid, lru_mask);
return total;
}
@@ -1611,9 +1611,9 @@ static void mem_cgroup_may_update_nodemask(struct mem_cgroup *memcg)
return;
/* make a nodemask where this memcg uses memory from */
- memcg->scan_nodes = node_states[N_HIGH_MEMORY];
+ memcg->scan_nodes = node_states[N_MEMORY];
- for_each_node_mask(nid, node_states[N_HIGH_MEMORY]) {
+ for_each_node_mask(nid, node_states[N_MEMORY]) {
if (!test_mem_cgroup_node_reclaimable(memcg, nid, false))
node_clear(nid, memcg->scan_nodes);
@@ -1684,7 +1684,7 @@ static bool mem_cgroup_reclaimable(struct mem_cgroup *memcg, bool noswap)
/*
* Check rest of nodes.
*/
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
if (node_isset(nid, memcg->scan_nodes))
continue;
if (test_mem_cgroup_node_reclaimable(memcg, nid, noswap))
@@ -3759,7 +3759,7 @@ move_account:
drain_all_stock_sync(memcg);
ret = 0;
mem_cgroup_start_move(memcg);
- for_each_node_state(node, N_HIGH_MEMORY) {
+ for_each_node_state(node, N_MEMORY) {
for (zid = 0; !ret && zid < MAX_NR_ZONES; zid++) {
enum lru_list lru;
for_each_lru(lru) {
@@ -4087,7 +4087,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,
total_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL);
seq_printf(m, "total=%lu", total_nr);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid, LRU_ALL);
seq_printf(m, " N%d=%lu", nid, node_nr);
}
@@ -4095,7 +4095,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,
file_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL_FILE);
seq_printf(m, "file=%lu", file_nr);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
LRU_ALL_FILE);
seq_printf(m, " N%d=%lu", nid, node_nr);
@@ -4104,7 +4104,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,
anon_nr = mem_cgroup_nr_lru_pages(memcg, LRU_ALL_ANON);
seq_printf(m, "anon=%lu", anon_nr);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
LRU_ALL_ANON);
seq_printf(m, " N%d=%lu", nid, node_nr);
@@ -4113,7 +4113,7 @@ static int memcg_numa_stat_show(struct cgroup *cont, struct cftype *cft,
unevictable_nr = mem_cgroup_nr_lru_pages(memcg, BIT(LRU_UNEVICTABLE));
seq_printf(m, "unevictable=%lu", unevictable_nr);
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
node_nr = mem_cgroup_node_nr_lru_pages(memcg, nid,
BIT(LRU_UNEVICTABLE));
seq_printf(m, " N%d=%lu", nid, node_nr);
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 5ddad0c..c1054ad 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -271,7 +271,7 @@ void __init page_cgroup_init(void)
if (mem_cgroup_disabled())
return;
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
unsigned long start_pfn, end_pfn;
start_pfn = node_start_pfn(nid);
--
1.8.0
From: Lai Jiangshan <[email protected]>
N_HIGH_MEMORY stands for the nodes that have normal or high memory.
N_MEMORY stands for the nodes that have any memory.
The code here needs to handle all nodes which have memory, so we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <[email protected]>
Acked-by: Hillf Danton <[email protected]>
---
fs/proc/kcore.c | 2 +-
fs/proc/task_mmu.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 86c67ee..e96d4f1 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -249,7 +249,7 @@ static int kcore_update_ram(void)
/* Not inialized....update now */
/* find out "max pfn" */
end_pfn = 0;
- for_each_node_state(nid, N_HIGH_MEMORY) {
+ for_each_node_state(nid, N_MEMORY) {
unsigned long node_end;
node_end = NODE_DATA(nid)->node_start_pfn +
NODE_DATA(nid)->node_spanned_pages;
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 90c63f9..2d89601 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1126,7 +1126,7 @@ static struct page *can_gather_numa_stats(pte_t pte, struct vm_area_struct *vma,
return NULL;
nid = page_to_nid(page);
- if (!node_isset(nid, node_states[N_HIGH_MEMORY]))
+ if (!node_isset(nid, node_states[N_MEMORY]))
return NULL;
return page;
@@ -1279,7 +1279,7 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
if (md->writeback)
seq_printf(m, " writeback=%lu", md->writeback);
- for_each_node_state(n, N_HIGH_MEMORY)
+ for_each_node_state(n, N_MEMORY)
if (md->node[n])
seq_printf(m, " N%d=%lu", n, md->node[n]);
out:
--
1.8.0
On Wed, 31 Oct 2012, Wen Congyang wrote:
> From: Lai Jiangshan <[email protected]>
>
> This patch is part3 of the following patchset:
> https://lkml.org/lkml/2012/10/29/319
>
> Part1 is here:
> https://lkml.org/lkml/2012/10/31/30
>
> Part2 is here:
> http://marc.info/?l=linux-kernel&m=135166705909544&w=2
>
> You can apply this patchset without the other parts.
>
> we need a node which only contains movable memory. This feature is very
> important for node hotplug. So we will add a new nodemask
> for all memory. N_MEMORY contains movable memory but N_HIGH_MEMORY
> doesn't contain it.
>
> We don't remove N_HIGH_MEMORY because it can be used to search which
> nodes contains memory that the kernel can use.
>
This doesn't describe why we need the new node state, unfortunately. It
makes sense to boot with node(s) containing only ZONE_MOVABLE, but it
doesn't show why we need a nodemask to specify such nodes and such
information should be available from the kernel log or /proc/zoneinfo.
Node hotplug should fail if all memory cannot be offlined, so why do we
need another nodemask? Only offline the node if all memory is offlined.
At 11/01/2012 02:16 AM, David Rientjes Wrote:
> On Wed, 31 Oct 2012, Wen Congyang wrote:
>
>> From: Lai Jiangshan <[email protected]>
>>
>> This patch is part3 of the following patchset:
>> https://lkml.org/lkml/2012/10/29/319
>>
>> Part1 is here:
>> https://lkml.org/lkml/2012/10/31/30
>>
>> Part2 is here:
>> http://marc.info/?l=linux-kernel&m=135166705909544&w=2
>>
>> You can apply this patchset without the other parts.
>>
>> we need a node which only contains movable memory. This feature is very
>> important for node hotplug. So we will add a new nodemask
>> for all memory. N_MEMORY contains movable memory but N_HIGH_MEMORY
>> doesn't contain it.
>>
>> We don't remove N_HIGH_MEMORY because it can be used to search which
>> nodes contains memory that the kernel can use.
>>
>
> This doesn't describe why we need the new node state, unfortunately. It
1. Sometimes, we need the nodes which contain memory that can be used by
the kernel.
2. Sometimes, we need the nodes which contain any memory.
In case 1 we use N_HIGH_MEMORY, and in case 2 we use N_MEMORY.
> makes sense to boot with node(s) containing only ZONE_MOVABLE, but it
> doesn't show why we need a nodemask to specify such nodes and such
Sorry for confusing you.
We don't add a nodemask to specify nodes which contain only ZONE_MOVABLE.
We want to add a nodemask (N_MEMORY) to specify nodes which contain memory.
In part3, we don't implement nodes which contain only ZONE_MOVABLE, so
N_MEMORY is the same as N_HIGH_MEMORY for now. The two become distinct when
we implement nodes which contain only ZONE_MOVABLE.
In this patchset, we try to change N_HIGH_MEMORY to N_MEMORY for case 2.
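To make the two cases concrete, here is a minimal userspace sketch of the two node states. It is not kernel code: the model_* names, the 64-bit mask type, and the per-node page counts are assumptions made only for illustration. It shows that a node holding only movable memory is set in the "any memory" mask but not in the "kernel-usable memory" mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only: one bit per node, not the kernel's nodemask_t. */
typedef uint64_t model_nodemask_t;

enum model_node_state {
	MODEL_N_HIGH_MEMORY,	/* node has normal or high memory */
	MODEL_N_MEMORY,		/* node has any memory, movable included */
	MODEL_NR_NODE_STATES
};

/* Hypothetical per-node page counts, split by the kind of zone. */
struct model_node {
	unsigned long normal_pages;
	unsigned long high_pages;
	unsigned long movable_pages;
};

/* Fill both masks from the per-node page counts. */
static void model_build_node_states(const struct model_node *nodes, int nr_nodes,
				    model_nodemask_t states[MODEL_NR_NODE_STATES])
{
	int nid;

	assert(nr_nodes >= 0 && nr_nodes <= 64);
	states[MODEL_N_HIGH_MEMORY] = 0;
	states[MODEL_N_MEMORY] = 0;

	for (nid = 0; nid < nr_nodes; nid++) {
		model_nodemask_t bit = (model_nodemask_t)1 << nid;

		if (nodes[nid].normal_pages || nodes[nid].high_pages)
			states[MODEL_N_HIGH_MEMORY] |= bit;
		if (nodes[nid].normal_pages || nodes[nid].high_pages ||
		    nodes[nid].movable_pages)
			states[MODEL_N_MEMORY] |= bit;
	}
}

/* Model of node_state(nid, state): is the node's bit set in that mask? */
static bool model_node_state(const model_nodemask_t *states, int nid,
			     enum model_node_state state)
{
	return (states[state] >> nid) & 1;
}
```

With node 0 holding normal memory and node 1 holding only movable memory, both masks contain node 0, and only the MODEL_N_MEMORY mask contains node 1.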
Thanks
Wen Congyang
> information should be available from the kernel log or /proc/zoneinfo.
>
> Node hotplug should fail if all memory cannot be offlined, so why do we
> need another nodemask? Only offline the node if all memory is offlined.
>
On Thu, 1 Nov 2012, Wen Congyang wrote:
> > This doesn't describe why we need the new node state, unfortunately. It
>
> 1. Somethimes, we use the node which contains the memory that can be used by
> kernel.
> 2. Sometimes, we use the node which contains the memory.
>
> In case1, we use N_HIGH_MEMORY, and we use N_MEMORY in case2.
>
Yeah, that's clear, but the question is still _why_ we want two different
nodemasks. I know that this part of the patchset simply introduces the
new nodemask because the name "N_MEMORY" is more clear than
"N_HIGH_MEMORY", but there's no real incentive for making that change by
introducing a new nodemask where a simple rename would suffice.
I can only assume that you want to later use one of them for a different
purpose: those that do not include nodes that consist of only
ZONE_MOVABLE. But that change for MPOL_BIND is nacked since it
significantly changes the semantics of set_mempolicy() and you can't break
userspace (see my response to that from yesterday). Until that problem is
addressed, then there's no reason for the additional nodemask so nack on
this series as well.
At 11/02/2012 05:36 AM, David Rientjes Wrote:
> On Thu, 1 Nov 2012, Wen Congyang wrote:
>
>>> This doesn't describe why we need the new node state, unfortunately. It
>>
>> 1. Somethimes, we use the node which contains the memory that can be used by
>> kernel.
>> 2. Sometimes, we use the node which contains the memory.
>>
>> In case1, we use N_HIGH_MEMORY, and we use N_MEMORY in case2.
>>
>
> Yeah, that's clear, but the question is still _why_ we want two different
> nodemasks. I know that this part of the patchset simply introduces the
> new nodemask because the name "N_MEMORY" is more clear than
> "N_HIGH_MEMORY", but there's no real incentive for making that change by
> introducing a new nodemask where a simple rename would suffice.
>
> I can only assume that you want to later use one of them for a different
> purpose: those that do not include nodes that consist of only
> ZONE_MOVABLE. But that change for MPOL_BIND is nacked since it
> significantly changes the semantics of set_mempolicy() and you can't break
> userspace (see my response to that from yesterday). Until that problem is
> addressed, then there's no reason for the additional nodemask so nack on
> this series as well.
>
I still think that we need two nodemasks: one stores the nodes which have memory
that the kernel can use, and one stores the nodes which have any memory.
For example:
==========================
static void *__meminit alloc_page_cgroup(size_t size, int nid)
{
	gfp_t flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN;
	void *addr = NULL;

	addr = alloc_pages_exact_nid(nid, size, flags);
	if (addr) {
		kmemleak_alloc(addr, size, 1, flags);
		return addr;
	}

	if (node_state(nid, N_HIGH_MEMORY))
		addr = vzalloc_node(size, nid);
	else
		addr = vzalloc(size);

	return addr;
}
==========================
If the node has only ZONE_MOVABLE memory, we should use vzalloc().
So we should have a mask that stores the nodes which have memory that
the kernel can use.
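The decision in the excerpt above can be modeled in userspace as follows. This is only a sketch: the model_* names and the plain 64-bit mask are assumptions, and the enum values merely stand in for the two kernel calls.

```c
#include <assert.h>
#include <stdint.h>

enum model_vzalloc_path {
	MODEL_VZALLOC_NODE,	/* stands for vzalloc_node(size, nid) */
	MODEL_VZALLOC_ANY	/* stands for vzalloc(size) */
};

/*
 * n_high_memory models node_states[N_HIGH_MEMORY] as one bit per node.
 * A node-local kernel allocation only makes sense when the node has
 * memory that the kernel can use; otherwise fall back to any node.
 */
static enum model_vzalloc_path model_pick_vzalloc(uint64_t n_high_memory, int nid)
{
	assert(nid >= 0 && nid < 64);
	if ((n_high_memory >> nid) & 1)
		return MODEL_VZALLOC_NODE;
	return MODEL_VZALLOC_ANY;
}
```

If this check used a mask of all nodes with memory instead, a movable-only node would take the node-local path even though it has no memory the kernel can allocate from.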
==========================
static int mpol_set_nodemask(struct mempolicy *pol,
		const nodemask_t *nodes, struct nodemask_scratch *nsc)
{
	int ret;

	/* if mode is MPOL_DEFAULT, pol is NULL. This is right. */
	if (pol == NULL)
		return 0;
	/* Check N_HIGH_MEMORY */
	nodes_and(nsc->mask1,
		  cpuset_current_mems_allowed, node_states[N_HIGH_MEMORY]);
	...
	if (pol->flags & MPOL_F_RELATIVE_NODES)
		mpol_relative_nodemask(&nsc->mask2, nodes,&nsc->mask1);
	else
		nodes_and(nsc->mask2, *nodes, nsc->mask1);
	...
}
==========================
If the user specifies 2 nodes, one that has only ZONE_MOVABLE memory and one that has
normal memory, nsc->mask2 should contain both nodes. So we should have a mask that
stores the nodes which have any memory.
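The effect of the mask choice can be checked with a small userspace model of the two nodes_and() steps above (non-relative case only). The function name and the 64-bit masks are assumptions for illustration, not kernel interfaces.

```c
#include <stdint.h>

/*
 * Model of the non-relative path in mpol_set_nodemask():
 *   mask1 = cpuset_allowed & state_mask
 *   mask2 = user_nodes & mask1
 * state_mask stands for node_states[N_HIGH_MEMORY] or node_states[N_MEMORY].
 */
static uint64_t model_policy_nodes(uint64_t user_nodes, uint64_t cpuset_allowed,
				   uint64_t state_mask)
{
	uint64_t mask1 = cpuset_allowed & state_mask;

	return user_nodes & mask1;
}
```

With node 0 holding normal memory and node 1 holding only movable memory, a request for both nodes (0x3) yields 0x1 when state_mask models N_HIGH_MEMORY and 0x3 when it models N_MEMORY.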
There may be something wrong in the change for MPOL_BIND, but this patchset is needed.
Thanks
Wen Congyang
On Fri, 02 Nov 2012 15:41:55 +0800
Wen Congyang <[email protected]> wrote:
> At 11/02/2012 05:36 AM, David Rientjes Wrote:
> > On Thu, 1 Nov 2012, Wen Congyang wrote:
> >
> >>> This doesn't describe why we need the new node state, unfortunately. It
> >>
> >> 1. Somethimes, we use the node which contains the memory that can be used by
> >> kernel.
> >> 2. Sometimes, we use the node which contains the memory.
> >>
> >> In case1, we use N_HIGH_MEMORY, and we use N_MEMORY in case2.
> >>
> >
> > Yeah, that's clear, but the question is still _why_ we want two different
> > nodemasks. I know that this part of the patchset simply introduces the
> > new nodemask because the name "N_MEMORY" is more clear than
> > "N_HIGH_MEMORY", but there's no real incentive for making that change by
> > introducing a new nodemask where a simple rename would suffice.
> >
> > I can only assume that you want to later use one of them for a different
> > purpose: those that do not include nodes that consist of only
> > ZONE_MOVABLE. But that change for MPOL_BIND is nacked since it
> > significantly changes the semantics of set_mempolicy() and you can't break
> > userspace (see my response to that from yesterday). Until that problem is
> > addressed, then there's no reason for the additional nodemask so nack on
> > this series as well.
I cannot locate "my response to that from yesterday". Specificity, please!
>
> I still think that we need two nodemasks: one store the node which has memory
> that the kernel can use, and one store the node which has memory.
>
> For example:
>
> ==========================
> static void *__meminit alloc_page_cgroup(size_t size, int nid)
> {
> gfp_t flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN;
> void *addr = NULL;
>
> addr = alloc_pages_exact_nid(nid, size, flags);
> if (addr) {
> kmemleak_alloc(addr, size, 1, flags);
> return addr;
> }
>
> if (node_state(nid, N_HIGH_MEMORY))
> addr = vzalloc_node(size, nid);
> else
> addr = vzalloc(size);
>
> return addr;
> }
> ==========================
> If the node only has ZONE_MOVABLE memory, we should use vzalloc().
> So we should have a mask that stores the node which has memory that
> the kernel can use.
>
> ==========================
> static int mpol_set_nodemask(struct mempolicy *pol,
> const nodemask_t *nodes, struct nodemask_scratch *nsc)
> {
> int ret;
>
> /* if mode is MPOL_DEFAULT, pol is NULL. This is right. */
> if (pol == NULL)
> return 0;
> /* Check N_HIGH_MEMORY */
> nodes_and(nsc->mask1,
> cpuset_current_mems_allowed, node_states[N_HIGH_MEMORY]);
> ...
> if (pol->flags & MPOL_F_RELATIVE_NODES)
> mpol_relative_nodemask(&nsc->mask2, nodes,&nsc->mask1);
> else
> nodes_and(nsc->mask2, *nodes, nsc->mask1);
> ...
> }
> ==========================
> If the user specifies 2 nodes: one has ZONE_MOVABLE memory, and the other one doesn't.
> nsc->mask2 should contain these 2 nodes. So we should hava a mask that store the node
> which has memory.
>
> There maybe something wrong in the change for MPOL_BIND. But this patchset is needed.
Well, let's discuss the userspace-visible non-back-compatible mpol
change. What is it, why did it happen, what is its impact, is it
acceptable?
I grabbed "PART1" and "PART2", but that's as far as I got with the six
memory hotplug patch series.
At 11/15/2012 03:52 AM, Andrew Morton Wrote:
> On Fri, 02 Nov 2012 15:41:55 +0800
> Wen Congyang <[email protected]> wrote:
>
>> At 11/02/2012 05:36 AM, David Rientjes Wrote:
>>> On Thu, 1 Nov 2012, Wen Congyang wrote:
>>>
>>>>> This doesn't describe why we need the new node state, unfortunately. It
>>>>
>>>> 1. Somethimes, we use the node which contains the memory that can be used by
>>>> kernel.
>>>> 2. Sometimes, we use the node which contains the memory.
>>>>
>>>> In case1, we use N_HIGH_MEMORY, and we use N_MEMORY in case2.
>>>>
>>>
>>> Yeah, that's clear, but the question is still _why_ we want two different
>>> nodemasks. I know that this part of the patchset simply introduces the
>>> new nodemask because the name "N_MEMORY" is more clear than
>>> "N_HIGH_MEMORY", but there's no real incentive for making that change by
>>> introducing a new nodemask where a simple rename would suffice.
>>>
>>> I can only assume that you want to later use one of them for a different
>>> purpose: those that do not include nodes that consist of only
>>> ZONE_MOVABLE. But that change for MPOL_BIND is nacked since it
>>> significantly changes the semantics of set_mempolicy() and you can't break
>>> userspace (see my response to that from yesterday). Until that problem is
>>> addressed, then there's no reason for the additional nodemask so nack on
>>> this series as well.
>
> I cannot locate "my response to that from yesterday". Specificity, please!
>
>>
>> I still think that we need two nodemasks: one store the node which has memory
>> that the kernel can use, and one store the node which has memory.
>>
>> For example:
>>
>> ==========================
>> static void *__meminit alloc_page_cgroup(size_t size, int nid)
>> {
>> gfp_t flags = GFP_KERNEL | __GFP_ZERO | __GFP_NOWARN;
>> void *addr = NULL;
>>
>> addr = alloc_pages_exact_nid(nid, size, flags);
>> if (addr) {
>> kmemleak_alloc(addr, size, 1, flags);
>> return addr;
>> }
>>
>> if (node_state(nid, N_HIGH_MEMORY))
>> addr = vzalloc_node(size, nid);
>> else
>> addr = vzalloc(size);
>>
>> return addr;
>> }
>> ==========================
>> If the node only has ZONE_MOVABLE memory, we should use vzalloc().
>> So we should have a mask that stores the node which has memory that
>> the kernel can use.
>>
>> ==========================
>> static int mpol_set_nodemask(struct mempolicy *pol,
>> const nodemask_t *nodes, struct nodemask_scratch *nsc)
>> {
>> int ret;
>>
>> /* if mode is MPOL_DEFAULT, pol is NULL. This is right. */
>> if (pol == NULL)
>> return 0;
>> /* Check N_HIGH_MEMORY */
>> nodes_and(nsc->mask1,
>> cpuset_current_mems_allowed, node_states[N_HIGH_MEMORY]);
>> ...
>> if (pol->flags & MPOL_F_RELATIVE_NODES)
>> mpol_relative_nodemask(&nsc->mask2, nodes,&nsc->mask1);
>> else
>> nodes_and(nsc->mask2, *nodes, nsc->mask1);
>> ...
>> }
>> ==========================
>> If the user specifies 2 nodes: one has ZONE_MOVABLE memory, and the other one doesn't.
>> nsc->mask2 should contain these 2 nodes. So we should hava a mask that store the node
>> which has memory.
>>
>> There maybe something wrong in the change for MPOL_BIND. But this patchset is needed.
>
> Well, let's discuss the userspace-visible non-back-compatible mpol
> change. What is it, why did it happen, what is its impact, is it
> acceptable?
With all the patchsets applied, we can create a node which has only ZONE_MOVABLE memory.
When we tested this feature, we found a problem: we can't bind a task to
such a node, because there is no normal memory on this node.
According to the comment in policy_nodemask():
===============
static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
{
	/* Lower zones don't get a nodemask applied for MPOL_BIND */
	if (unlikely(policy->mode == MPOL_BIND) &&
			gfp_zone(gfp) >= policy_zone &&
			cpuset_nodemask_valid_mems_allowed(&policy->v.nodes))
		return &policy->v.nodes;
	return NULL;
}
===============
The mempolicy may only affect the memory for userspace, so I think we should
allow the user to bind a task to a movable node.
So we modify the function is_valid_nodemask() in part6 to allow the user to
do this.
We also modify the function policy_nodemask() in part6, because
we may allocate memory in task context (for example, when forking a process we
allocate memory to manage the new task), and that memory is used by the kernel (we
can't access it in userspace). In this case, gfp_zone() is ZONE_NORMAL, and
gfp_zone() >= policy_zone is true. Without the change, we would return
policy->v.nodes and try to allocate the memory on the movable node, and the
allocation would fail.
So we modify the function policy_nodemask() to fix this problem.
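The problem can be illustrated with a userspace model of the zone check. The first function mirrors the "gfp_zone(gfp) >= policy_zone" test quoted above. The second is only one hypothetical way to avoid the failure and is not taken from part6: when none of the bound nodes has kernel-usable memory, the mask is applied to movable allocations only. All model_* names are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Zones ordered from low to high, as a simplified model. */
enum model_zone { MODEL_ZONE_DMA, MODEL_ZONE_NORMAL, MODEL_ZONE_MOVABLE };

/* Existing rule: the MPOL_BIND mask applies at or above policy_zone. */
static bool model_bind_applies(enum model_zone alloc_zone,
			       enum model_zone policy_zone)
{
	return alloc_zone >= policy_zone;
}

/*
 * Hypothetical variant: if no bound node is set in the modeled
 * N_HIGH_MEMORY mask, raise the threshold so that kernel (normal zone)
 * allocations are not restricted to nodes they cannot use.
 */
static bool model_bind_applies_movable_aware(uint64_t bound_nodes,
					     uint64_t n_high_memory,
					     enum model_zone alloc_zone,
					     enum model_zone policy_zone)
{
	enum model_zone threshold = policy_zone;

	if (!(bound_nodes & n_high_memory))
		threshold = MODEL_ZONE_MOVABLE;
	return alloc_zone >= threshold;
}
```

For a task bound only to a movable node, the existing rule applies the mask to a normal-zone allocation, while the variant applies it to movable allocations and leaves kernel allocations unrestricted.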
Does this change mpol?
Thanks
Wen Congyang
>
> I grabbed "PART1" and "PART2", but that's as far as I got with the six
> memory hotplug patch series.
>