This patchset allows more configs to make use of movable nodes. When
CONFIG_MOVABLE_NODE is selected, there are two ways to introduce such
nodes into the system:
1. Discover movable nodes at boot. Currently this is only possible on
x86, but we will enable configs supporting fdt to do the same.
2. Hotplug and online all of a node's memory using online_movable. This
is already possible on any config supporting memory hotplug, not
just x86, but the Kconfig doesn't say so. We will fix that.
We'll also remove some cruft on power which would prevent (2).
/* changelog */
v7:
* Fix error when !CONFIG_HAVE_MEMBLOCK, found by the kbuild test robot.
* Remove the prefix of "linux,hotpluggable". Document the property's purpose.
v6:
* http://lkml.kernel.org/r/[email protected]
* Add a patch enabling the fdt to describe hotpluggable memory.
v5:
* http://lkml.kernel.org/r/[email protected]
* Drop the patches which recognize the "status" property of dt memory
nodes. Firmware can set the size of "linux,usable-memory" to zero instead.
v4:
* http://lkml.kernel.org/r/[email protected]
* Rename of_fdt_is_available() to of_fdt_device_is_available().
Rename of_flat_dt_is_available() to of_flat_dt_device_is_available().
* Instead of restoring top-down allocation, ensure it never goes
bottom-up in the first place, by making movable_node arch-specific.
* Use MEMORY_HOTPLUG instead of PPC64 in the mm/Kconfig patch.
v3:
* http://lkml.kernel.org/r/[email protected]
* Use Rob Herring's suggestions to improve the node availability check.
* More verbose commit log in the patch enabling CONFIG_MOVABLE_NODE.
* Add a patch to restore top-down allocation the way x86 does.
v2:
* http://lkml.kernel.org/r/[email protected]
* Use the "status" property of standard dt memory nodes instead of
introducing a new "ibm,hotplug-aperture" compatible id.
* Remove the patch which explicitly creates a memoryless node. This set
no longer has any bearing on whether the pgdat is created at boot or
at the time of memory addition.
v1:
* http://lkml.kernel.org/r/[email protected]
Reza Arbab (5):
powerpc/mm: allow memory hotplug into a memoryless node
mm: remove x86-only restriction of movable_node
mm: enable CONFIG_MOVABLE_NODE on non-x86 arches
of/fdt: mark hotpluggable memory
dt: add documentation of "hotpluggable" memory property
Documentation/devicetree/booting-without-of.txt | 7 +++++++
Documentation/kernel-parameters.txt | 2 +-
arch/powerpc/mm/numa.c | 13 +------------
arch/x86/kernel/setup.c | 24 ++++++++++++++++++++++++
drivers/of/fdt.c | 19 +++++++++++++++++++
include/linux/of_fdt.h | 1 +
mm/Kconfig | 2 +-
mm/memory_hotplug.c | 20 --------------------
8 files changed, 54 insertions(+), 34 deletions(-)
--
1.8.3.1
Summarize the "hotpluggable" property of dt memory nodes.
Signed-off-by: Reza Arbab <[email protected]>
---
Documentation/devicetree/booting-without-of.txt | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/Documentation/devicetree/booting-without-of.txt b/Documentation/devicetree/booting-without-of.txt
index 3f1437f..280d283 100644
--- a/Documentation/devicetree/booting-without-of.txt
+++ b/Documentation/devicetree/booting-without-of.txt
@@ -974,6 +974,13 @@ compatibility.
4Gb. Some vendors prefer splitting those ranges into smaller
segments, but the kernel doesn't care.
+ Additional properties:
+
+ - hotpluggable : The presence of this property provides an explicit
+ hint to the operating system that this memory may potentially be
+ removed later. The kernel can take this into consideration when
+ doing nonmovable allocations and when laying out memory zones.
+
e) The /chosen node
This node is a bit "special". Normally, that's where Open Firmware
--
1.8.3.1
Remove the check which prevents us from hotplugging into an empty node.
The original commit b226e4621245 ("[PATCH] powerpc: don't add memory to
empty node/zone"), states that this was intended to be a temporary measure.
It is a workaround for an oops which no longer occurs.
Signed-off-by: Reza Arbab <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Cc: Nathan Fontenot <[email protected]>
Cc: Bharata B Rao <[email protected]>
---
arch/powerpc/mm/numa.c | 13 +------------
1 file changed, 1 insertion(+), 12 deletions(-)
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index a51c188..0cb6bd8 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1085,7 +1085,7 @@ static int hot_add_node_scn_to_nid(unsigned long scn_addr)
int hot_add_scn_to_nid(unsigned long scn_addr)
{
struct device_node *memory = NULL;
- int nid, found = 0;
+ int nid;
if (!numa_enabled || (min_common_depth < 0))
return first_online_node;
@@ -1101,17 +1101,6 @@ int hot_add_scn_to_nid(unsigned long scn_addr)
if (nid < 0 || !node_online(nid))
nid = first_online_node;
- if (NODE_DATA(nid)->node_spanned_pages)
- return nid;
-
- for_each_online_node(nid) {
- if (NODE_DATA(nid)->node_spanned_pages) {
- found = 1;
- break;
- }
- }
-
- BUG_ON(!found);
return nid;
}
--
1.8.3.1
In commit c5320926e370 ("mem-hotplug: introduce movable_node boot
option"), the memblock allocation direction is changed to bottom-up and
then back to top-down like this:
1. memblock_set_bottom_up(true), called by cmdline_parse_movable_node().
2. memblock_set_bottom_up(false), called by x86's numa_init().
Even though (1) occurs in generic mm code, it is wrapped by #ifdef
CONFIG_MOVABLE_NODE, which depends on X86_64.
This means that when we extend CONFIG_MOVABLE_NODE to non-x86 arches,
things will be unbalanced. (1) will happen for them, but (2) will not.
This toggle was added in the first place because x86 has a delay between
adding memblocks and marking them as hotpluggable. Since other arches do
this marking either immediately or not at all, they do not require the
bottom-up toggle.
So, resolve things by moving (1) from cmdline_parse_movable_node() to
x86's setup_arch(), immediately after the movable_node parameter has
been parsed.
Signed-off-by: Reza Arbab <[email protected]>
---
Documentation/kernel-parameters.txt | 2 +-
arch/x86/kernel/setup.c | 24 ++++++++++++++++++++++++
mm/memory_hotplug.c | 20 --------------------
3 files changed, 25 insertions(+), 21 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 37babf9..adcccd5 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2401,7 +2401,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
that the amount of memory usable for all allocations
is not too small.
- movable_node [KNL,X86] Boot-time switch to enable the effects
+ movable_node [KNL] Boot-time switch to enable the effects
of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details.
MTD_Partition= [MTD]
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 9c337b0..4cfba94 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -985,6 +985,30 @@ void __init setup_arch(char **cmdline_p)
parse_early_param();
+#ifdef CONFIG_MEMORY_HOTPLUG
+ /*
+ * Memory used by the kernel cannot be hot-removed because Linux
+ * cannot migrate the kernel pages. When memory hotplug is
+ * enabled, we should prevent memblock from allocating memory
+ * for the kernel.
+ *
+ * ACPI SRAT records all hotpluggable memory ranges. But before
+ * SRAT is parsed, we don't know about it.
+ *
+ * The kernel image is loaded into memory at very early time. We
+ * cannot prevent this anyway. So on NUMA system, we set any
+ * node the kernel resides in as un-hotpluggable.
+ *
+ * Since on modern servers, one node could have double-digit
+ * gigabytes memory, we can assume the memory around the kernel
+ * image is also un-hotpluggable. So before SRAT is parsed, just
+ * allocate memory near the kernel image to try the best to keep
+ * the kernel away from hotpluggable memory.
+ */
+ if (movable_node_is_enabled())
+ memblock_set_bottom_up(true);
+#endif
+
x86_report_nx();
/* after early param, so could get panic from serial */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index cad4b91..e43142c1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1727,26 +1727,6 @@ static bool can_offline_normal(struct zone *zone, unsigned long nr_pages)
static int __init cmdline_parse_movable_node(char *p)
{
#ifdef CONFIG_MOVABLE_NODE
- /*
- * Memory used by the kernel cannot be hot-removed because Linux
- * cannot migrate the kernel pages. When memory hotplug is
- * enabled, we should prevent memblock from allocating memory
- * for the kernel.
- *
- * ACPI SRAT records all hotpluggable memory ranges. But before
- * SRAT is parsed, we don't know about it.
- *
- * The kernel image is loaded into memory at very early time. We
- * cannot prevent this anyway. So on NUMA system, we set any
- * node the kernel resides in as un-hotpluggable.
- *
- * Since on modern servers, one node could have double-digit
- * gigabytes memory, we can assume the memory around the kernel
- * image is also un-hotpluggable. So before SRAT is parsed, just
- * allocate memory near the kernel image to try the best to keep
- * the kernel away from hotpluggable memory.
- */
- memblock_set_bottom_up(true);
movable_node_enabled = true;
#else
pr_warn("movable_node option not supported\n");
--
1.8.3.1
To support movable memory nodes (CONFIG_MOVABLE_NODE), at least one of
the following must be true:
1. This config has the capability to identify movable nodes at boot.
Right now, only x86 can do this.
2. Our config supports memory hotplug, which means that a movable node
can be created by hotplugging all of its memory into ZONE_MOVABLE.
Fix the Kconfig definition of CONFIG_MOVABLE_NODE, which currently
recognizes (1), but not (2).
Signed-off-by: Reza Arbab <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Balbir Singh <[email protected]>
---
mm/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/Kconfig b/mm/Kconfig
index 86e3e0e..061b46b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -153,7 +153,7 @@ config MOVABLE_NODE
bool "Enable to assign a node which has only movable memory"
depends on HAVE_MEMBLOCK
depends on NO_BOOTMEM
- depends on X86_64
+ depends on X86_64 || MEMORY_HOTPLUG
depends on NUMA
default n
help
--
1.8.3.1
When movable nodes are enabled, any node containing only hotpluggable
memory is made movable at boot time.
On x86, hotpluggable memory is discovered by parsing the ACPI SRAT,
making corresponding calls to memblock_mark_hotplug().
If we introduce a dt property to describe memory as hotpluggable,
configs supporting early fdt may then also do this marking and use
movable nodes.
Signed-off-by: Reza Arbab <[email protected]>
Tested-by: Balbir Singh <[email protected]>
---
drivers/of/fdt.c | 19 +++++++++++++++++++
include/linux/of_fdt.h | 1 +
mm/Kconfig | 2 +-
3 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index c89d5d2..c9b5cac 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -1015,6 +1015,7 @@ int __init early_init_dt_scan_memory(unsigned long node, const char *uname,
const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
const __be32 *reg, *endp;
int l;
+ bool hotpluggable;
/* We are scanning "memory" nodes only */
if (type == NULL) {
@@ -1034,6 +1035,7 @@ int __init early_init_dt_scan_memory(unsigned long node, const char *uname,
return 0;
endp = reg + (l / sizeof(__be32));
+ hotpluggable = of_get_flat_dt_prop(node, "hotpluggable", NULL);
pr_debug("memory scan node %s, reg size %d,\n", uname, l);
@@ -1049,6 +1051,13 @@ int __init early_init_dt_scan_memory(unsigned long node, const char *uname,
(unsigned long long)size);
early_init_dt_add_memory_arch(base, size);
+
+ if (!hotpluggable)
+ continue;
+
+ if (early_init_dt_mark_hotplug_memory_arch(base, size))
+ pr_warn("failed to mark hotplug range 0x%llx - 0x%llx\n",
+ base, base + size);
}
return 0;
@@ -1146,6 +1155,11 @@ void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
memblock_add(base, size);
}
+int __init __weak early_init_dt_mark_hotplug_memory_arch(u64 base, u64 size)
+{
+ return memblock_mark_hotplug(base, size);
+}
+
int __init __weak early_init_dt_reserve_memory_arch(phys_addr_t base,
phys_addr_t size, bool nomap)
{
@@ -1168,6 +1182,11 @@ void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
WARN_ON(1);
}
+int __init __weak early_init_dt_mark_hotplug_memory_arch(u64 base, u64 size)
+{
+ return -ENOSYS;
+}
+
int __init __weak early_init_dt_reserve_memory_arch(phys_addr_t base,
phys_addr_t size, bool nomap)
{
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index 4341f32..271b3fd 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -71,6 +71,7 @@ extern int early_init_dt_scan_memory(unsigned long node, const char *uname,
extern void early_init_fdt_scan_reserved_mem(void);
extern void early_init_fdt_reserve_self(void);
extern void early_init_dt_add_memory_arch(u64 base, u64 size);
+extern int early_init_dt_mark_hotplug_memory_arch(u64 base, u64 size);
extern int early_init_dt_reserve_memory_arch(phys_addr_t base, phys_addr_t size,
bool no_map);
extern void * early_init_dt_alloc_memory_arch(u64 size, u64 align);
diff --git a/mm/Kconfig b/mm/Kconfig
index 061b46b..33a9b06 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -153,7 +153,7 @@ config MOVABLE_NODE
bool "Enable to assign a node which has only movable memory"
depends on HAVE_MEMBLOCK
depends on NO_BOOTMEM
- depends on X86_64 || MEMORY_HOTPLUG
+ depends on X86_64 || OF_EARLY_FLATTREE || MEMORY_HOTPLUG
depends on NUMA
default n
help
--
1.8.3.1
On 15/11/16 09:02, Reza Arbab wrote:
> In commit c5320926e370 ("mem-hotplug: introduce movable_node boot
> option"), the memblock allocation direction is changed to bottom-up and
> then back to top-down like this:
>
> 1. memblock_set_bottom_up(true), called by cmdline_parse_movable_node().
> 2. memblock_set_bottom_up(false), called by x86's numa_init().
>
> Even though (1) occurs in generic mm code, it is wrapped by #ifdef
> CONFIG_MOVABLE_NODE, which depends on X86_64.
>
> This means that when we extend CONFIG_MOVABLE_NODE to non-x86 arches,
> things will be unbalanced. (1) will happen for them, but (2) will not.
>
> This toggle was added in the first place because x86 has a delay between
> adding memblocks and marking them as hotpluggable. Since other arches do
> this marking either immediately or not at all, they do not require the
> bottom-up toggle.
>
> So, resolve things by moving (1) from cmdline_parse_movable_node() to
> x86's setup_arch(), immediately after the movable_node parameter has
> been parsed.
>
> Signed-off-by: Reza Arbab <[email protected]>
Acked-by: Balbir Singh <[email protected]>
On 15/11/16 09:02, Reza Arbab wrote:
> When movable nodes are enabled, any node containing only hotpluggable
> memory is made movable at boot time.
>
> On x86, hotpluggable memory is discovered by parsing the ACPI SRAT,
> making corresponding calls to memblock_mark_hotplug().
>
> If we introduce a dt property to describe memory as hotpluggable,
> configs supporting early fdt may then also do this marking and use
> movable nodes.
>
> Signed-off-by: Reza Arbab <[email protected]>
> Tested-by: Balbir Singh <[email protected]>
Also
Acked-by: Balbir Singh <[email protected]>
Reza Arbab <[email protected]> writes:
> In commit c5320926e370 ("mem-hotplug: introduce movable_node boot
> option"), the memblock allocation direction is changed to bottom-up and
> then back to top-down like this:
>
> 1. memblock_set_bottom_up(true), called by cmdline_parse_movable_node().
> 2. memblock_set_bottom_up(false), called by x86's numa_init().
>
> Even though (1) occurs in generic mm code, it is wrapped by #ifdef
> CONFIG_MOVABLE_NODE, which depends on X86_64.
>
> This means that when we extend CONFIG_MOVABLE_NODE to non-x86 arches,
> things will be unbalanced. (1) will happen for them, but (2) will not.
>
> This toggle was added in the first place because x86 has a delay between
> adding memblocks and marking them as hotpluggable. Since other arches do
> this marking either immediately or not at all, they do not require the
> bottom-up toggle.
>
> So, resolve things by moving (1) from cmdline_parse_movable_node() to
> x86's setup_arch(), immediately after the movable_node parameter has
> been parsed.
Considering that we now can mark memblock hotpluggable, do we need to
enable the bottom up allocation for ppc64 also ?
>
> Signed-off-by: Reza Arbab <[email protected]>
> ---
> Documentation/kernel-parameters.txt | 2 +-
> arch/x86/kernel/setup.c | 24 ++++++++++++++++++++++++
-aneesh
On Tue, Nov 15, 2016 at 12:35:42PM +0530, Aneesh Kumar K.V wrote:
>Considering that we now can mark memblock hotpluggable, do we need to
>enable the bottom up allocation for ppc64 also ?
No, we don't, because early_init_dt_scan_memory() marks the memblocks
hotpluggable immediately when they are added. There is no gap between
the addition and the marking, as there is on x86, during which an
allocation might accidentally occur in a movable node.
--
Reza Arbab