2016-10-06 18:37:11

by Reza Arbab

[permalink] [raw]
Subject: [PATCH v4 0/5] powerpc/mm: movable hotplug memory nodes

These changes enable the dynamic creation of movable nodes on power.

On x86, the ACPI SRAT memory affinity structure can mark memory
hotpluggable, allowing the kernel to possibly create movable nodes at
boot.

While power has no analog of this SRAT information, we can still create
a movable memory node, post boot, by hotplugging all of the node's
memory into ZONE_MOVABLE.

We provide a way to describe the extents and numa associativity of such
a node in the device tree, while deferring the memory addition to take
place through hotplug.

In v1, this patchset introduced a new dt compatible id to explicitly
create a memoryless node at boot. Here, things have been simplified to
be applicable regardless of the status of node hotplug on power. We
still intend to enable hotadding a pgdat, but that's now untangled as a
separate topic.

v4:
* Rename of_fdt_is_available() to of_fdt_device_is_available().
Rename of_flat_dt_is_available() to of_flat_dt_device_is_available().

* Instead of restoring top-down allocation, ensure it never goes
bottom-up in the first place, by making movable_node arch-specific.

* Use MEMORY_HOTPLUG instead of PPC64 in the mm/Kconfig patch.

v3:
* http://lkml.kernel.org/r/[email protected]

* Use Rob Herring's suggestions to improve the node availability check.

* More verbose commit log in the patch enabling CONFIG_MOVABLE_NODE.

* Add a patch to restore top-down allocation the way x86 does.

v2:
* http://lkml.kernel.org/r/[email protected]

* Use the "status" property of standard dt memory nodes instead of
introducing a new "ibm,hotplug-aperture" compatible id.

* Remove the patch which explicitly creates a memoryless node. This set
no longer has any bearing on whether the pgdat is created at boot or
at the time of memory addition.

v1:
* http://lkml.kernel.org/r/[email protected]

Reza Arbab (5):
drivers/of: introduce of_fdt_device_is_available()
drivers/of: do not add memory for unavailable nodes
powerpc/mm: allow memory hotplug into a memoryless node
mm: make processing of movable_node arch-specific
mm: enable CONFIG_MOVABLE_NODE on non-x86 arches

arch/powerpc/mm/numa.c | 13 +------------
arch/x86/mm/numa.c | 35 ++++++++++++++++++++++++++++++++++-
drivers/of/fdt.c | 29 ++++++++++++++++++++++++++---
include/linux/of_fdt.h | 2 ++
mm/Kconfig | 2 +-
mm/memory_hotplug.c | 31 -------------------------------
6 files changed, 64 insertions(+), 48 deletions(-)

--
1.8.3.1


2016-10-06 18:36:59

by Reza Arbab

[permalink] [raw]
Subject: [PATCH v4 1/5] drivers/of: introduce of_fdt_device_is_available()

In __fdt_scan_reserved_mem(), the availability of a node is determined
by testing its "status" property.

Move this check into its own function, borrowing logic from the
unflattened version, of_device_is_available().

Another caller will be added in a subsequent patch.

Signed-off-by: Reza Arbab <[email protected]>
Acked-by: Rob Herring <[email protected]>
---
drivers/of/fdt.c | 26 +++++++++++++++++++++++---
include/linux/of_fdt.h | 2 ++
2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 085c638..b138efb 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -151,6 +151,23 @@ int of_fdt_match(const void *blob, unsigned long node,
return score;
}

+bool of_fdt_device_is_available(const void *blob, unsigned long node)
+{
+ const char *status;
+ int statlen;
+
+ status = fdt_getprop(blob, node, "status", &statlen);
+ if (!status)
+ return true;
+
+ if (statlen) {
+ if (!strcmp(status, "okay") || !strcmp(status, "ok"))
+ return true;
+ }
+
+ return false;
+}
+
static void *unflatten_dt_alloc(void **mem, unsigned long size,
unsigned long align)
{
@@ -647,7 +664,6 @@ static int __init __fdt_scan_reserved_mem(unsigned long node, const char *uname,
int depth, void *data)
{
static int found;
- const char *status;
int err;

if (!found && depth == 1 && strcmp(uname, "reserved-memory") == 0) {
@@ -667,8 +683,7 @@ static int __init __fdt_scan_reserved_mem(unsigned long node, const char *uname,
return 1;
}

- status = of_get_flat_dt_prop(node, "status", NULL);
- if (status && strcmp(status, "okay") != 0 && strcmp(status, "ok") != 0)
+ if (!of_flat_dt_device_is_available(node))
return 0;

err = __reserved_mem_reserve_reg(node, uname);
@@ -809,6 +824,11 @@ int __init of_flat_dt_match(unsigned long node, const char *const *compat)
return of_fdt_match(initial_boot_params, node, compat);
}

+bool __init of_flat_dt_device_is_available(unsigned long node)
+{
+ return of_fdt_device_is_available(initial_boot_params, node);
+}
+
struct fdt_scan_status {
const char *name;
int namelen;
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index 26c3302..4ff8c8e 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -37,6 +37,7 @@ extern bool of_fdt_is_big_endian(const void *blob,
unsigned long node);
extern int of_fdt_match(const void *blob, unsigned long node,
const char *const *compat);
+extern bool of_fdt_device_is_available(const void *blob, unsigned long node);
extern void *of_fdt_unflatten_tree(const unsigned long *blob,
struct device_node *dad,
struct device_node **mynodes);
@@ -59,6 +60,7 @@ extern const void *of_get_flat_dt_prop(unsigned long node, const char *name,
int *size);
extern int of_flat_dt_is_compatible(unsigned long node, const char *name);
extern int of_flat_dt_match(unsigned long node, const char *const *matches);
+extern bool of_flat_dt_device_is_available(unsigned long node);
extern unsigned long of_get_flat_dt_root(void);
extern int of_get_flat_dt_size(void);

--
1.8.3.1

2016-10-06 18:37:26

by Reza Arbab

[permalink] [raw]
Subject: [PATCH v4 4/5] mm: make processing of movable_node arch-specific

Currently, CONFIG_MOVABLE_NODE depends on X86_64. In preparation to
enable it for other arches, we need to factor a detail which is unique
to x86 out of the generic mm code.

Specifically, as documented in kernel-parameters.txt, the use of
"movable_node" should remain restricted to x86:

movable_node [KNL,X86] Boot-time switch to enable the effects
of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details.

This option tells x86 to find movable nodes identified by the ACPI SRAT.
On other arches, it would have no benefit, only the undesired side
effect of setting bottom-up memblock allocation.

Since #ifdef CONFIG_MOVABLE_NODE will no longer be enough to restrict
this option to x86, move it to an arch-specific compilation unit
instead.

Signed-off-by: Reza Arbab <[email protected]>
---
arch/x86/mm/numa.c | 35 ++++++++++++++++++++++++++++++++++-
mm/memory_hotplug.c | 31 -------------------------------
2 files changed, 34 insertions(+), 32 deletions(-)

diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index fb68210..e95cab4 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -887,6 +887,38 @@ EXPORT_SYMBOL(cpumask_of_node);
#endif /* !CONFIG_DEBUG_PER_CPU_MAPS */

#ifdef CONFIG_MEMORY_HOTPLUG
+
+static int __init cmdline_parse_movable_node(char *p)
+{
+#ifdef CONFIG_MOVABLE_NODE
+ /*
+ * Memory used by the kernel cannot be hot-removed because Linux
+ * cannot migrate the kernel pages. When memory hotplug is
+ * enabled, we should prevent memblock from allocating memory
+ * for the kernel.
+ *
+ * ACPI SRAT records all hotpluggable memory ranges. But before
+ * SRAT is parsed, we don't know about it.
+ *
+ * The kernel image is loaded into memory at very early time. We
+ * cannot prevent this anyway. So on NUMA system, we set any
+ * node the kernel resides in as un-hotpluggable.
+ *
+ * Since on modern servers, one node could have double-digit
+ * gigabytes memory, we can assume the memory around the kernel
+ * image is also un-hotpluggable. So before SRAT is parsed, just
+ * allocate memory near the kernel image to try the best to keep
+ * the kernel away from hotpluggable memory.
+ */
+ memblock_set_bottom_up(true);
+ movable_node_enabled = true;
+#else
+ pr_warn("movable_node option not supported\n");
+#endif
+ return 0;
+}
+early_param("movable_node", cmdline_parse_movable_node);
+
int memory_add_physaddr_to_nid(u64 start)
{
struct numa_meminfo *mi = &numa_meminfo;
@@ -899,4 +931,5 @@ int memory_add_physaddr_to_nid(u64 start)
return nid;
}
EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
-#endif
+
+#endif /* CONFIG_MEMORY_HOTPLUG */
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9d29ba0..79c709a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1738,37 +1738,6 @@ static bool can_offline_normal(struct zone *zone, unsigned long nr_pages)
}
#endif /* CONFIG_MOVABLE_NODE */

-static int __init cmdline_parse_movable_node(char *p)
-{
-#ifdef CONFIG_MOVABLE_NODE
- /*
- * Memory used by the kernel cannot be hot-removed because Linux
- * cannot migrate the kernel pages. When memory hotplug is
- * enabled, we should prevent memblock from allocating memory
- * for the kernel.
- *
- * ACPI SRAT records all hotpluggable memory ranges. But before
- * SRAT is parsed, we don't know about it.
- *
- * The kernel image is loaded into memory at very early time. We
- * cannot prevent this anyway. So on NUMA system, we set any
- * node the kernel resides in as un-hotpluggable.
- *
- * Since on modern servers, one node could have double-digit
- * gigabytes memory, we can assume the memory around the kernel
- * image is also un-hotpluggable. So before SRAT is parsed, just
- * allocate memory near the kernel image to try the best to keep
- * the kernel away from hotpluggable memory.
- */
- memblock_set_bottom_up(true);
- movable_node_enabled = true;
-#else
- pr_warn("movable_node option not supported\n");
-#endif
- return 0;
-}
-early_param("movable_node", cmdline_parse_movable_node);
-
/* check which state of node_states will be changed when offline memory */
static void node_states_check_changes_offline(unsigned long nr_pages,
struct zone *zone, struct memory_notify *arg)
--
1.8.3.1

2016-10-06 18:37:42

by Reza Arbab

[permalink] [raw]
Subject: [PATCH v4 3/5] powerpc/mm: allow memory hotplug into a memoryless node

Remove the check which prevents us from hotplugging into an empty node.

This limitation has been questioned before [1], and judging by the
response, there doesn't seem to be a reason we can't remove it. No issues
have been found in light testing.

[1] http://lkml.kernel.org/r/CAGZKiBrmkSa1yyhbf5hwGxubcjsE5SmkSMY4tpANERMe2UG4bg@mail.gmail.com
http://lkml.kernel.org/r/[email protected]

Signed-off-by: Reza Arbab <[email protected]>
Reviewed-by: Aneesh Kumar K.V <[email protected]>
Acked-by: Balbir Singh <[email protected]>
Cc: Nathan Fontenot <[email protected]>
Cc: Bharata B Rao <[email protected]>
---
arch/powerpc/mm/numa.c | 13 +------------
1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 75b9cd6..d7ac419 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1121,7 +1121,7 @@ static int hot_add_node_scn_to_nid(unsigned long scn_addr)
int hot_add_scn_to_nid(unsigned long scn_addr)
{
struct device_node *memory = NULL;
- int nid, found = 0;
+ int nid;

if (!numa_enabled || (min_common_depth < 0))
return first_online_node;
@@ -1137,17 +1137,6 @@ int hot_add_scn_to_nid(unsigned long scn_addr)
if (nid < 0 || !node_online(nid))
nid = first_online_node;

- if (NODE_DATA(nid)->node_spanned_pages)
- return nid;
-
- for_each_online_node(nid) {
- if (NODE_DATA(nid)->node_spanned_pages) {
- found = 1;
- break;
- }
- }
-
- BUG_ON(!found);
return nid;
}

--
1.8.3.1

2016-10-06 18:37:51

by Reza Arbab

[permalink] [raw]
Subject: [PATCH v4 5/5] mm: enable CONFIG_MOVABLE_NODE on non-x86 arches

To support movable memory nodes (CONFIG_MOVABLE_NODE), at least one of
the following must be true:

1. We're on x86. This arch has the capability to identify movable nodes
at boot by parsing the ACPI SRAT, if the movable_node option is used.

2. Our config supports memory hotplug, which means that a movable node
can be created by hotplugging all of its memory into ZONE_MOVABLE.

Fix the Kconfig definition of CONFIG_MOVABLE_NODE, which currently
recognizes (1), but not (2).

Signed-off-by: Reza Arbab <[email protected]>
---
mm/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index be0ee11..5d0818f 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -153,7 +153,7 @@ config MOVABLE_NODE
bool "Enable to assign a node which has only movable memory"
depends on HAVE_MEMBLOCK
depends on NO_BOOTMEM
- depends on X86_64
+ depends on X86_64 || MEMORY_HOTPLUG
depends on NUMA
default n
help
--
1.8.3.1

2016-10-06 18:38:16

by Reza Arbab

[permalink] [raw]
Subject: [PATCH v4 2/5] drivers/of: do not add memory for unavailable nodes

Respect the standard dt "status" property when scanning memory nodes in
early_init_dt_scan_memory(), so that if the node is unavailable, no
memory will be added.

The use case at hand is accelerator or device memory, which may be
unusable until post-boot initialization of the memory link. Such a node
can be described in the dt as any other, given its status is "disabled".
Per the device tree specification,

"disabled"
Indicates that the device is not presently operational, but it
might become operational in the future (for example, something
is not plugged in, or switched off).

Once such memory is made operational, it can then be hotplugged.

Signed-off-by: Reza Arbab <[email protected]>
---
drivers/of/fdt.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index b138efb..08e5d94 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -1056,6 +1056,9 @@ int __init early_init_dt_scan_memory(unsigned long node, const char *uname,
} else if (strcmp(type, "memory") != 0)
return 0;

+ if (!of_flat_dt_device_is_available(node))
+ return 0;
+
reg = of_get_flat_dt_prop(node, "linux,usable-memory", &l);
if (reg == NULL)
reg = of_get_flat_dt_prop(node, "reg", &l);
--
1.8.3.1

2016-10-07 06:37:35

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH v4 4/5] mm: make processing of movable_node arch-specific

Reza Arbab <[email protected]> writes:

> Currently, CONFIG_MOVABLE_NODE depends on X86_64. In preparation to
> enable it for other arches, we need to factor a detail which is unique
> to x86 out of the generic mm code.
>
> Specifically, as documented in kernel-parameters.txt, the use of
> "movable_node" should remain restricted to x86:
>
> movable_node [KNL,X86] Boot-time switch to enable the effects
> of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details.
>
> This option tells x86 to find movable nodes identified by the ACPI SRAT.
> On other arches, it would have no benefit, only the undesired side
> effect of setting bottom-up memblock allocation.
>
> Since #ifdef CONFIG_MOVABLE_NODE will no longer be enough to restrict
> this option to x86, move it to an arch-specific compilation unit
> instead.

Reviewed-by: Aneesh Kumar K.V <[email protected]>

>
> Signed-off-by: Reza Arbab <[email protected]>
> ---
> arch/x86/mm/numa.c | 35 ++++++++++++++++++++++++++++++++++-
> mm/memory_hotplug.c | 31 -------------------------------
> 2 files changed, 34 insertions(+), 32 deletions(-)
>
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index fb68210..e95cab4 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -887,6 +887,38 @@ EXPORT_SYMBOL(cpumask_of_node);
> #endif /* !CONFIG_DEBUG_PER_CPU_MAPS */
>
> #ifdef CONFIG_MEMORY_HOTPLUG
> +
> +static int __init cmdline_parse_movable_node(char *p)
> +{
> +#ifdef CONFIG_MOVABLE_NODE
> + /*
> + * Memory used by the kernel cannot be hot-removed because Linux
> + * cannot migrate the kernel pages. When memory hotplug is
> + * enabled, we should prevent memblock from allocating memory
> + * for the kernel.
> + *
> + * ACPI SRAT records all hotpluggable memory ranges. But before
> + * SRAT is parsed, we don't know about it.
> + *
> + * The kernel image is loaded into memory at very early time. We
> + * cannot prevent this anyway. So on NUMA system, we set any
> + * node the kernel resides in as un-hotpluggable.
> + *
> + * Since on modern servers, one node could have double-digit
> + * gigabytes memory, we can assume the memory around the kernel
> + * image is also un-hotpluggable. So before SRAT is parsed, just
> + * allocate memory near the kernel image to try the best to keep
> + * the kernel away from hotpluggable memory.
> + */
> + memblock_set_bottom_up(true);
> + movable_node_enabled = true;
> +#else
> + pr_warn("movable_node option not supported\n");
> +#endif
> + return 0;
> +}
> +early_param("movable_node", cmdline_parse_movable_node);
> +
> int memory_add_physaddr_to_nid(u64 start)
> {
> struct numa_meminfo *mi = &numa_meminfo;
> @@ -899,4 +931,5 @@ int memory_add_physaddr_to_nid(u64 start)
> return nid;
> }
> EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
> -#endif
> +
> +#endif /* CONFIG_MEMORY_HOTPLUG */
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 9d29ba0..79c709a 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1738,37 +1738,6 @@ static bool can_offline_normal(struct zone *zone, unsigned long nr_pages)
> }
> #endif /* CONFIG_MOVABLE_NODE */
>
> -static int __init cmdline_parse_movable_node(char *p)
> -{
> -#ifdef CONFIG_MOVABLE_NODE
> - /*
> - * Memory used by the kernel cannot be hot-removed because Linux
> - * cannot migrate the kernel pages. When memory hotplug is
> - * enabled, we should prevent memblock from allocating memory
> - * for the kernel.
> - *
> - * ACPI SRAT records all hotpluggable memory ranges. But before
> - * SRAT is parsed, we don't know about it.
> - *
> - * The kernel image is loaded into memory at very early time. We
> - * cannot prevent this anyway. So on NUMA system, we set any
> - * node the kernel resides in as un-hotpluggable.
> - *
> - * Since on modern servers, one node could have double-digit
> - * gigabytes memory, we can assume the memory around the kernel
> - * image is also un-hotpluggable. So before SRAT is parsed, just
> - * allocate memory near the kernel image to try the best to keep
> - * the kernel away from hotpluggable memory.
> - */
> - memblock_set_bottom_up(true);
> - movable_node_enabled = true;
> -#else
> - pr_warn("movable_node option not supported\n");
> -#endif
> - return 0;
> -}
> -early_param("movable_node", cmdline_parse_movable_node);
> -
> /* check which state of node_states will be changed when offline memory */
> static void node_states_check_changes_offline(unsigned long nr_pages,
> struct zone *zone, struct memory_notify *arg)
> --
> 1.8.3.1

2016-10-07 06:40:39

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: [PATCH v4 5/5] mm: enable CONFIG_MOVABLE_NODE on non-x86 arches

Reza Arbab <[email protected]> writes:

> To support movable memory nodes (CONFIG_MOVABLE_NODE), at least one of
> the following must be true:
>
> 1. We're on x86. This arch has the capability to identify movable nodes
> at boot by parsing the ACPI SRAT, if the movable_node option is used.
>
> 2. Our config supports memory hotplug, which means that a movable node
> can be created by hotplugging all of its memory into ZONE_MOVABLE.
>
> Fix the Kconfig definition of CONFIG_MOVABLE_NODE, which currently
> recognizes (1), but not (2).
>

We now enable a lot of new code on different arch, such as the new node list
N_MEMORY.

Reviewed-by: Aneesh Kumar K.V <[email protected]>

> Signed-off-by: Reza Arbab <[email protected]>
> ---
> mm/Kconfig | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/Kconfig b/mm/Kconfig
> index be0ee11..5d0818f 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -153,7 +153,7 @@ config MOVABLE_NODE
> bool "Enable to assign a node which has only movable memory"
> depends on HAVE_MEMBLOCK
> depends on NO_BOOTMEM
> - depends on X86_64
> + depends on X86_64 || MEMORY_HOTPLUG
> depends on NUMA
> default n
> help
> --
> 1.8.3.1

2016-10-11 12:42:11

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH v4 4/5] mm: make processing of movable_node arch-specific



On 07/10/16 05:36, Reza Arbab wrote:
> Currently, CONFIG_MOVABLE_NODE depends on X86_64. In preparation to
> enable it for other arches, we need to factor a detail which is unique
> to x86 out of the generic mm code.
>
> Specifically, as documented in kernel-parameters.txt, the use of
> "movable_node" should remain restricted to x86:
>
> movable_node [KNL,X86] Boot-time switch to enable the effects
> of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details.
>
> This option tells x86 to find movable nodes identified by the ACPI SRAT.
> On other arches, it would have no benefit, only the undesired side
> effect of setting bottom-up memblock allocation.
>
> Since #ifdef CONFIG_MOVABLE_NODE will no longer be enough to restrict
> this option to x86, move it to an arch-specific compilation unit
> instead.
>
> Signed-off-by: Reza Arbab <[email protected]>

Acked-by: Balbir Singh <[email protected]>

2016-10-11 13:17:43

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH v4 5/5] mm: enable CONFIG_MOVABLE_NODE on non-x86 arches



On 07/10/16 05:36, Reza Arbab wrote:
> To support movable memory nodes (CONFIG_MOVABLE_NODE), at least one of
> the following must be true:
>
> 1. We're on x86. This arch has the capability to identify movable nodes
> at boot by parsing the ACPI SRAT, if the movable_node option is used.
>
> 2. Our config supports memory hotplug, which means that a movable node
> can be created by hotplugging all of its memory into ZONE_MOVABLE.
>
> Fix the Kconfig definition of CONFIG_MOVABLE_NODE, which currently
> recognizes (1), but not (2).
>
> Signed-off-by: Reza Arbab <[email protected]>
> ---
> mm/Kconfig | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/Kconfig b/mm/Kconfig
> index be0ee11..5d0818f 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -153,7 +153,7 @@ config MOVABLE_NODE
> bool "Enable to assign a node which has only movable memory"
> depends on HAVE_MEMBLOCK
> depends on NO_BOOTMEM
> - depends on X86_64
> + depends on X86_64 || MEMORY_HOTPLUG
> depends on NUMA
> default n
> help
>

Acked-by: Balbir Singh <[email protected]>

2016-10-11 13:59:08

by Rob Herring

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] drivers/of: do not add memory for unavailable nodes

On Thu, Oct 6, 2016 at 1:36 PM, Reza Arbab <[email protected]> wrote:
> Respect the standard dt "status" property when scanning memory nodes in
> early_init_dt_scan_memory(), so that if the node is unavailable, no
> memory will be added.
>
> The use case at hand is accelerator or device memory, which may be
> unusable until post-boot initialization of the memory link. Such a node
> can be described in the dt as any other, given its status is "disabled".
> Per the device tree specification,
>
> "disabled"
> Indicates that the device is not presently operational, but it
> might become operational in the future (for example, something
> is not plugged in, or switched off).
>
> Once such memory is made operational, it can then be hotplugged.
>
> Signed-off-by: Reza Arbab <[email protected]>
> ---
> drivers/of/fdt.c | 3 +++
> 1 file changed, 3 insertions(+)

Acked-by: Rob Herring <[email protected]>

2016-10-20 03:31:07

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH v4 3/5] powerpc/mm: allow memory hotplug into a memoryless node



On 07/10/16 05:36, Reza Arbab wrote:
> Remove the check which prevents us from hotplugging into an empty node.
>
> This limitation has been questioned before [1], and judging by the
> response, there doesn't seem to be a reason we can't remove it. No issues
> have been found in light testing.
>
> [1] http://lkml.kernel.org/r/CAGZKiBrmkSa1yyhbf5hwGxubcjsE5SmkSMY4tpANERMe2UG4bg@mail.gmail.com
> http://lkml.kernel.org/r/[email protected]
>
> Signed-off-by: Reza Arbab <[email protected]>
> Reviewed-by: Aneesh Kumar K.V <[email protected]>
> Acked-by: Balbir Singh <[email protected]>
> Cc: Nathan Fontenot <[email protected]>
> Cc: Bharata B Rao <[email protected]>
> ---
> arch/powerpc/mm/numa.c | 13 +------------
> 1 file changed, 1 insertion(+), 12 deletions(-)
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 75b9cd6..d7ac419 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -1121,7 +1121,7 @@ static int hot_add_node_scn_to_nid(unsigned long scn_addr)
> int hot_add_scn_to_nid(unsigned long scn_addr)
> {
> struct device_node *memory = NULL;
> - int nid, found = 0;
> + int nid;
>
> if (!numa_enabled || (min_common_depth < 0))
> return first_online_node;
> @@ -1137,17 +1137,6 @@ int hot_add_scn_to_nid(unsigned long scn_addr)
> if (nid < 0 || !node_online(nid))
> nid = first_online_node;
>
> - if (NODE_DATA(nid)->node_spanned_pages)
> - return nid;
> -
> - for_each_online_node(nid) {
> - if (NODE_DATA(nid)->node_spanned_pages) {
> - found = 1;
> - break;
> - }
> - }
> -
> - BUG_ON(!found);
> return nid;

FYI, these checks were temporary to begin with

I found this in git history

b226e462124522f2f23153daff31c311729dfa2f (powerpc: don't add memory to empty node/zone)

Balbir Singh.

2016-10-20 14:39:12

by Reza Arbab

[permalink] [raw]
Subject: Re: [PATCH v4 3/5] powerpc/mm: allow memory hotplug into a memoryless node

On Thu, Oct 20, 2016 at 02:30:42PM +1100, Balbir Singh wrote:
>FYI, these checks were temporary to begin with
>
>I found this in git history
>
>b226e462124522f2f23153daff31c311729dfa2f (powerpc: don't add memory to empty node/zone)

Nice find! I spent some time digging, but this had eluded me.

--
Reza Arbab

2016-10-21 06:23:08

by Alistair Popple

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] drivers/of: do not add memory for unavailable nodes

Hi Reza,

On Thu, 6 Oct 2016 01:36:32 PM Reza Arbab wrote:
> Respect the standard dt "status" property when scanning memory nodes in
> early_init_dt_scan_memory(), so that if the node is unavailable, no
> memory will be added.

What happens if a kernel without this patch is booted on a system with some
status="disabled" device-nodes? Do older kernels just ignore this memory or do
they try to use it?

>From what I can tell it seems that kernels without this patch will try and use
this memory even if it is marked in the device-tree as status="disabled" which
could lead to problems for older kernels when we start exporting this property
from firmware.

Arguably this might not be such a problem in practice as we probably don't
have many (if any) existing kernels that will boot on hardware exporting these
properties. However given this patch seems fairly independent perhaps it is
worth sending as a separate fix if it is not going to make it into this
release?

Regards,

Alistair

> The use case at hand is accelerator or device memory, which may be
> unusable until post-boot initialization of the memory link. Such a node
> can be described in the dt as any other, given its status is "disabled".
> Per the device tree specification,
>
> "disabled"
> Indicates that the device is not presently operational, but it
> might become operational in the future (for example, something
> is not plugged in, or switched off).
>
> Once such memory is made operational, it can then be hotplugged.
>
> Signed-off-by: Reza Arbab <[email protected]>
> ---
> drivers/of/fdt.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> index b138efb..08e5d94 100644
> --- a/drivers/of/fdt.c
> +++ b/drivers/of/fdt.c
> @@ -1056,6 +1056,9 @@ int __init early_init_dt_scan_memory(unsigned long
node, const char *uname,
> } else if (strcmp(type, "memory") != 0)
> return 0;
>
> + if (!of_flat_dt_device_is_available(node))
> + return 0;
> +
> reg = of_get_flat_dt_prop(node, "linux,usable-memory", &l);
> if (reg == NULL)
> reg = of_get_flat_dt_prop(node, "reg", &l);
>

2016-10-23 01:51:19

by Reza Arbab

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] drivers/of: do not add memory for unavailable nodes

Hi Alistair,

On Fri, Oct 21, 2016 at 05:22:54PM +1100, Alistair Popple wrote:
>From what I can tell it seems that kernels without this patch will try
>and use this memory even if it is marked in the device-tree as
>status="disabled" which could lead to problems for older kernels when
>we start exporting this property from firmware.
>
>Arguably this might not be such a problem in practice as we probably
>don't have many (if any) existing kernels that will boot on hardware
>exporting these properties.

Yes, I think you've got it right.

>However given this patch seems fairly independent perhaps it is worth
>sending as a separate fix if it is not going to make it into this
>release?

Michael,

If this set as a whole is going to miss the release, would it be helpful
for me to resend 1/5 and 2/5 as a separate set? They are the minimum
needed to prevent the possible forward compatibility issue Alistair
describes.

--
Reza Arbab

2016-10-24 10:24:15

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] drivers/of: do not add memory for unavailable nodes

Alistair Popple <[email protected]> writes:

> Hi Reza,
>
> On Thu, 6 Oct 2016 01:36:32 PM Reza Arbab wrote:
>> Respect the standard dt "status" property when scanning memory nodes in
>> early_init_dt_scan_memory(), so that if the node is unavailable, no
>> memory will be added.
>
> What happens if a kernel without this patch is booted on a system with some
> status="disabled" device-nodes? Do older kernels just ignore this memory or do
> they try to use it?
>
> From what I can tell it seems that kernels without this patch will try and use
> this memory even if it is marked in the device-tree as status="disabled" which
> could lead to problems for older kernels when we start exporting this property
> from firmware.

The code already looks for "linux,usable-memory" in preference to "reg".
Can you use that instead?

That would have the advantage that existing kernels already understand
it.

Another problem with using "status" is we could have device trees out
there that have status = disabled and we don't know about it, and by
changing the kernel to use that property we break people's systems.
Though for memory nodes my guess is that's not true, but you never know ...

cheers

2016-10-24 18:20:17

by Reza Arbab

[permalink] [raw]
Subject: Re: [PATCH v4 2/5] drivers/of: do not add memory for unavailable nodes

On Mon, Oct 24, 2016 at 09:24:04PM +1100, Michael Ellerman wrote:
>The code already looks for "linux,usable-memory" in preference to
>"reg". Can you use that instead?

Yes, we could set the size of "linux,usable-memory" to zero instead of
setting status to "disabled".

I'll send a v5 of this set which drops 1/5 and 2/5. That would be the
only difference here.

>That would have the advantage that existing kernels already understand
>it.
>
>Another problem with using "status" is we could have device trees out
>there that have status = disabled and we don't know about it, and by
>changing the kernel to use that property we break people's systems.
>Though for memory nodes my guess is that's not true, but you never know ...

--
Reza Arbab

2016-10-25 09:40:08

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH v4 3/5] powerpc/mm: allow memory hotplug into a memoryless node

Balbir Singh <[email protected]> writes:
> FYI, these checks were temporary to begin with
>
> I found this in git history
>
> b226e462124522f2f23153daff31c311729dfa2f (powerpc: don't add memory to empty node/zone)

Nice thanks for digging it up.

commit b226e462124522f2f23153daff31c311729dfa2f
Author: Mike Kravetz <[email protected]>
AuthorDate: Fri Dec 16 14:30:35 2005 -0800
^^^^

That is why maintainers don't like to merge "temporary" patches :)

cheers

2016-10-25 12:16:02

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH v4 4/5] mm: make processing of movable_node arch-specific



On 11/10/16 23:26, Balbir Singh wrote:
>
>
> On 07/10/16 05:36, Reza Arbab wrote:
>> Currently, CONFIG_MOVABLE_NODE depends on X86_64. In preparation to
>> enable it for other arches, we need to factor a detail which is unique
>> to x86 out of the generic mm code.
>>
>> Specifically, as documented in kernel-parameters.txt, the use of
>> "movable_node" should remain restricted to x86:
>>
>> movable_node [KNL,X86] Boot-time switch to enable the effects
>> of CONFIG_MOVABLE_NODE=y. See mm/Kconfig for details.
>>
>> This option tells x86 to find movable nodes identified by the ACPI SRAT.
>> On other arches, it would have no benefit, only the undesired side
>> effect of setting bottom-up memblock allocation.
>>
>> Since #ifdef CONFIG_MOVABLE_NODE will no longer be enough to restrict
>> this option to x86, move it to an arch-specific compilation unit
>> instead.
>>
>> Signed-off-by: Reza Arbab <[email protected]>
>
> Acked-by: Balbir Singh <[email protected]>
>

After the ack, I realized there were some more checks needed, IOW
questions for you :)

1. Have you checked to see if our memblock allocations spill
over to probably hotpluggable nodes?
2. Shouldn't we be marking nodes discovered as movable via
memblock_mark_hotplug()?

Balbir Singh.

2016-10-25 15:55:19

by Reza Arbab

[permalink] [raw]
Subject: Re: [PATCH v4 4/5] mm: make processing of movable_node arch-specific

On Tue, Oct 25, 2016 at 11:15:40PM +1100, Balbir Singh wrote:
>After the ack, I realized there were some more checks needed, IOW
>questions for you :)

Hey! No takebacks!

The short answer is that neither of these is a concern.

Longer; if you use "movable_node", x86 can identify these nodes at boot.
They call memblock_mark_hotplug() while parsing the SRAT. Then, when the
zones are initialized, those markings are used to determine ZONE_MOVABLE.

We have no analog of this SRAT information, so our movable nodes can
only be created post boot, by hotplugging and explicitly onlining with
online_movable.

>1. Have you checked to see if our memblock allocations spill
>over to probably hotpluggable nodes?

Since our nodes don't exist at boot, we don't have that short window
before the zones are drawn where the node has normal memory, and a
kernel allocation might occur within.

>2. Shouldn't we be marking nodes discovered as movable via
>memblock_mark_hotplug()?

Again, this early boot marking mechanism only applies to movable_node.

--
Reza Arbab

2016-10-25 22:34:40

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH v4 4/5] mm: make processing of movable_node arch-specific



On 26/10/16 02:55, Reza Arbab wrote:
> On Tue, Oct 25, 2016 at 11:15:40PM +1100, Balbir Singh wrote:
>> After the ack, I realized there were some more checks needed, IOW
>> questions for you :)
>
> Hey! No takebacks!
>

I still believe we need your changes, I was wondering if we've tested
it against normal memory nodes and checked if any memblock
allocations end up there. Michael showed me some memblock
allocations on node 1 of a two node machine with movable_node
I'll double check at my end. See my question below


> The short answer is that neither of these is a concern.
>
> Longer; if you use "movable_node", x86 can identify these nodes at boot. They call memblock_mark_hotplug() while parsing the SRAT. Then, when the zones are initialized, those markings are used to determine ZONE_MOVABLE.
>
> We have no analog of this SRAT information, so our movable nodes can only be created post boot, by hotplugging and explicitly onlining with online_movable.
>

Is this true for all of system memory as well or only for nodes
hotplugged later?

Balbir Singh.

2016-10-25 23:00:03

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH v4 4/5] mm: make processing of movable_node arch-specific



On 26/10/16 02:55, Reza Arbab wrote:
> On Tue, Oct 25, 2016 at 11:15:40PM +1100, Balbir Singh wrote:
>> After the ack, I realized there were some more checks needed, IOW
>> questions for you :)
>
> Hey! No takebacks!
>

I still believe we need your changes, I was wondering if we've tested
it against normal memory nodes and checked if any memblock
allocations end up there. Michael showed me some memblock
allocations on node 1 of a two node machine with movable_node
I'll double check at my end. See my question below


> The short answer is that neither of these is a concern.
>
> Longer; if you use "movable_node", x86 can identify these nodes at boot. They call memblock_mark_hotplug() while parsing the SRAT. Then, when the zones are initialized, those markings are used to determine ZONE_MOVABLE.
>
> We have no analog of this SRAT information, so our movable nodes can only be created post boot, by hotplugging and explicitly onlining with online_movable.
>

Is this true for all of system memory as well or only for nodes
hotplugged later?

Balbir Singh.

2016-10-26 00:49:43

by Reza Arbab

[permalink] [raw]
Subject: Re: [PATCH v4 4/5] mm: make processing of movable_node arch-specific

On Wed, Oct 26, 2016 at 09:34:18AM +1100, Balbir Singh wrote:
>I still believe we need your changes, I was wondering if we've tested
>it against normal memory nodes and checked if any memblock
>allocations end up there. Michael showed me some memblock
>allocations on node 1 of a two node machine with movable_node

The movable_node option is x86-only. Both of those nodes contain normal
memory, so allocations on both are allowed.

>> Longer; if you use "movable_node", x86 can identify these nodes at
>> boot. They call memblock_mark_hotplug() while parsing the SRAT. Then,
>> when the zones are initialized, those markings are used to determine
>> ZONE_MOVABLE.
>>
>> We have no analog of this SRAT information, so our movable nodes can
>> only be created post boot, by hotplugging and explicitly onlining
>> with online_movable.
>
>Is this true for all of system memory as well or only for nodes
>hotplugged later?

As far as I know, power has nothing like the SRAT that tells us, at
boot, which memory is hotpluggable. So there is nothing to wire the
movable_node option up to.

Of course, any memory you hotplug afterwards is, by definition,
hotpluggable. So we can still create movable nodes that way.

--
Reza Arbab

2016-10-26 10:53:24

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH v4 4/5] mm: make processing of movable_node arch-specific

Reza Arbab <[email protected]> writes:

> On Wed, Oct 26, 2016 at 09:34:18AM +1100, Balbir Singh wrote:
>>I still believe we need your changes, I was wondering if we've tested
>>it against normal memory nodes and checked if any memblock
>>allocations end up there. Michael showed me some memblock
>>allocations on node 1 of a two node machine with movable_node
>
> The movable_node option is x86-only. Both of those nodes contain normal
> memory, so allocations on both are allowed.
>
>>> Longer; if you use "movable_node", x86 can identify these nodes at
>>> boot. They call memblock_mark_hotplug() while parsing the SRAT. Then,
>>> when the zones are initialized, those markings are used to determine
>>> ZONE_MOVABLE.
>>>
>>> We have no analog of this SRAT information, so our movable nodes can
>>> only be created post boot, by hotplugging and explicitly onlining
>>> with online_movable.
>>
>>Is this true for all of system memory as well or only for nodes
>>hotplugged later?
>
> As far as I know, power has nothing like the SRAT that tells us, at
> boot, which memory is hotpluggable.

On pseries we have the ibm,dynamic-memory device tree property, which
can contain ranges of memory that are not yet "assigned to the
partition" - ie. can be hotplugged later.

So in general that statement is not true.

But I think you're focused on bare-metal, in which case you might be
right. But that doesn't mean we couldn't have a similar property, if
skiboot/hostboot knew what the ranges of memory were going to be.

cheers

2016-10-26 17:03:56

by Reza Arbab

[permalink] [raw]
Subject: Re: [PATCH v4 4/5] mm: make processing of movable_node arch-specific

On Wed, Oct 26, 2016 at 09:52:53PM +1100, Michael Ellerman wrote:
>> As far as I know, power has nothing like the SRAT that tells us, at
>> boot, which memory is hotpluggable.
>
>On pseries we have the ibm,dynamic-memory device tree property, which
>can contain ranges of memory that are not yet "assigned to the
>partition" - ie. can be hotplugged later.
>
>So in general that statement is not true.
>
>But I think you're focused on bare-metal, in which case you might be
>right. But that doesn't mean we couldn't have a similar property, if
>skiboot/hostboot knew what the ranges of memory were going to be.

Yes, sorry, I should have qualified that statement to say I wasn't
talking about pseries.

I can amend this set to actually implement movable_node on power too,
but we'd have to settle on a name for the dt property. Is
"linux,movable-node" too on the nose?

--
Reza Arbab