2010-11-17 04:45:41

by Zheng, Shaohui

Subject: [3/8,v3] NUMA Hotplug Emulator: Userland interface to hotplug-add fake offlined nodes.

From: Haicheng Li <[email protected]>

Add a sysfs entry "probe" under /sys/devices/system/node/:

- to show all fake offlined nodes:
$ cat /sys/devices/system/node/probe

- to hotadd a fake offlined node, e.g. nodeid is N:
$ echo N > /sys/devices/system/node/probe
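
The two shell commands above can also be driven from a program. The following is a minimal userland sketch, not part of the patch: the helper names probe_read() and probe_hotadd() are hypothetical, and it assumes the file reads back as a single line (presumably a node list such as "2-3") and accepts a decimal node id followed by a newline, as `echo N` would send.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Location of the attribute added by this patch. */
#define NODE_PROBE_PATH "/sys/devices/system/node/probe"

/*
 * Read the first line of `path` (the hidden-node list) into `buf`.
 * Returns 0 on success or a negative errno value if the open fails.
 * An empty file yields an empty string.
 */
static int probe_read(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(path && buf && len > 0);
	f = fopen(path, "r");
	if (!f)
		return -errno;
	if (!fgets(buf, (int)len, f))
		buf[0] = '\0';          /* nothing to read: no hidden nodes */
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline */
	return 0;
}

/*
 * Request hot-add of hidden node `nid` by writing "nid\n" to `path`.
 * Returns 0 on success or a negative errno value on failure.
 */
static int probe_hotadd(const char *path, int nid)
{
	FILE *f;
	int ret = 0;

	assert(path);
	f = fopen(path, "w");
	if (!f)
		return -errno;
	if (fprintf(f, "%d\n", nid) < 0)
		ret = -EIO;
	/* stdio buffers the write, so a store error may only show up here. */
	if (fclose(f) != 0 && ret == 0)
		ret = -errno;
	return ret;
}
```

Both helpers take the path as a parameter so they can be exercised against an ordinary file; on a kernel built with this patch a caller would pass NODE_PROBE_PATH and would need root privileges for the write.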

CC: Dave Hansen <[email protected]>
CC: Christoph Lameter <[email protected]>
Signed-off-by: Haicheng Li <[email protected]>
Signed-off-by: Shaohui Zheng <[email protected]>
---
Index: linux-hpe4/Documentation/ABI/testing/sysfs-devices-node
===================================================================
--- linux-hpe4.orig/Documentation/ABI/testing/sysfs-devices-node 2010-11-15 17:13:02.433461413 +0800
+++ linux-hpe4/Documentation/ABI/testing/sysfs-devices-node 2010-11-15 17:13:07.093461818 +0800
@@ -5,3 +5,11 @@
When this file is written to, all memory within that node
will be compacted. When it completes, memory will be freed
into blocks which have as many contiguous pages as possible
+
+What: /sys/devices/system/node/probe
+Date: Jun 2010
+Contact: Haicheng Li <[email protected]>
+Description:
+ This file lists all the available hidden nodes. When a nid
+ is written to this interface and that nid is in the available
+ node list, the hidden node becomes visible.
Index: linux-hpe4/drivers/base/node.c
===================================================================
--- linux-hpe4.orig/drivers/base/node.c 2010-11-15 17:13:02.433461413 +0800
+++ linux-hpe4/drivers/base/node.c 2010-11-15 17:13:07.093461818 +0800
@@ -538,6 +538,25 @@
unregister_node(&node_devices[nid]);
}

+#ifdef CONFIG_NODE_HOTPLUG_EMU
+static ssize_t store_nodes_probe(struct sysdev_class *class,
+ struct sysdev_class_attribute *attr,
+ const char *buf, size_t count)
+{
+ long nid;
+
+ strict_strtol(buf, 0, &nid);
+ if (nid < 0 || nid > nr_node_ids - 1) {
+ printk(KERN_ERR "Invalid NUMA node id: %ld (0 <= nid < %d).\n",
+ nid, nr_node_ids);
+ return -EPERM;
+ }
+ hotadd_hidden_nodes(nid);
+
+ return count;
+}
+#endif
+
/*
* node states attributes
*/
@@ -566,26 +585,35 @@
return print_nodes_state(na->state, buf);
}

-#define _NODE_ATTR(name, state) \
+#define _NODE_ATTR_RO(name, state) \
{ _SYSDEV_CLASS_ATTR(name, 0444, show_node_state, NULL), state }

+#define _NODE_ATTR_RW(name, store_func, state) \
+ { _SYSDEV_CLASS_ATTR(name, 0644, show_node_state, store_func), state }
+
static struct node_attr node_state_attr[] = {
- _NODE_ATTR(possible, N_POSSIBLE),
- _NODE_ATTR(online, N_ONLINE),
- _NODE_ATTR(has_normal_memory, N_NORMAL_MEMORY),
- _NODE_ATTR(has_cpu, N_CPU),
+ [N_POSSIBLE] = _NODE_ATTR_RO(possible, N_POSSIBLE),
+#ifdef CONFIG_NODE_HOTPLUG_EMU
+ [N_HIDDEN] = _NODE_ATTR_RW(probe, store_nodes_probe, N_HIDDEN),
+#endif
+ [N_ONLINE] = _NODE_ATTR_RO(online, N_ONLINE),
+ [N_NORMAL_MEMORY] = _NODE_ATTR_RO(has_normal_memory, N_NORMAL_MEMORY),
#ifdef CONFIG_HIGHMEM
- _NODE_ATTR(has_high_memory, N_HIGH_MEMORY),
+ [N_HIGH_MEMORY] = _NODE_ATTR_RO(has_high_memory, N_HIGH_MEMORY),
#endif
+ [N_CPU] = _NODE_ATTR_RO(has_cpu, N_CPU),
};

static struct sysdev_class_attribute *node_state_attrs[] = {
- &node_state_attr[0].attr,
- &node_state_attr[1].attr,
- &node_state_attr[2].attr,
- &node_state_attr[3].attr,
+ &node_state_attr[N_POSSIBLE].attr,
+#ifdef CONFIG_NODE_HOTPLUG_EMU
+ &node_state_attr[N_HIDDEN].attr,
+#endif
+ &node_state_attr[N_ONLINE].attr,
+ &node_state_attr[N_NORMAL_MEMORY].attr,
+ &node_state_attr[N_CPU].attr,
#ifdef CONFIG_HIGHMEM
- &node_state_attr[4].attr,
+ &node_state_attr[N_HIGH_MEMORY].attr,
#endif
NULL
};
Index: linux-hpe4/mm/Kconfig
===================================================================
--- linux-hpe4.orig/mm/Kconfig 2010-11-15 17:13:02.443461606 +0800
+++ linux-hpe4/mm/Kconfig 2010-11-15 17:21:05.535335091 +0800
@@ -147,6 +147,21 @@
depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
depends on MIGRATION

+config NUMA_HOTPLUG_EMU
+ bool "NUMA hotplug emulator"
+ depends on X86_64 && NUMA && MEMORY_HOTPLUG
+
+ ---help---
+
+config NODE_HOTPLUG_EMU
+ bool "Node hotplug emulation"
+ depends on NUMA_HOTPLUG_EMU && MEMORY_HOTPLUG
+ ---help---
+ Enable Node hotplug emulation. The machine will be setup with
+ hidden virtual nodes when booted with "numa=hide=N*size", where
+ N is the number of hidden nodes, size is the memory size per
+ hidden node. This is only useful for debugging.
+
#
# If we have space for more page flags then we can enable additional
# optimizations and functionality.
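
In the node.c hunk, store_nodes_probe() does not check the return value of strict_strtol(), and it reports an out-of-range id with -EPERM. The following is a userland sketch of a stricter parse, not code from the patch: parse_nid() is a hypothetical name, strtol() stands in for strict_strtol(), and returning -EINVAL for every rejected input is an assumption about what callers would expect.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a node id from `buf` and check that 0 <= id < nr_node_ids.
 * Returns 0 and stores the id in *nid on success, -EINVAL otherwise.
 * One trailing newline is accepted, as produced by `echo N`.
 */
static int parse_nid(const char *buf, long nr_node_ids, long *nid)
{
	char *end;
	long val;

	assert(buf && nid);
	errno = 0;
	val = strtol(buf, &end, 0);     /* base 0: decimal, octal or hex */
	if (errno != 0 || end == buf)
		return -EINVAL;         /* overflow, or no digits at all */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;         /* trailing garbage such as "3abc" */
	if (val < 0 || val >= nr_node_ids)
		return -EINVAL;         /* outside the valid node id range */
	*nid = val;
	return 0;
}
```

With this shape the range check only runs on a value that was actually parsed, and *nid is left untouched on every error path.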

--
Thanks & Regards,
Shaohui


2010-11-17 08:17:00

by David Rientjes

Subject: Re: [3/8,v3] NUMA Hotplug Emulator: Userland interface to hotplug-add fake offlined nodes.

On Wed, 17 Nov 2010, [email protected] wrote:

> From: Haicheng Li <[email protected]>
>
> Add a sysfs entry "probe" under /sys/devices/system/node/:
>
> - to show all fake offlined nodes:
> $ cat /sys/devices/system/node/probe
>
> - to hotadd a fake offlined node, e.g. nodeid is N:
> $ echo N > /sys/devices/system/node/probe
>

This would be much more powerful if we just reserved an amount of memory
at boot and then allowed users to hot-add a given amount with a
non-online node id. Then we can test nodes of various sizes rather than
being statically committed at boot.

This should be fairly straightforward by faking
ACPI_SRAT_MEM_HOT_PLUGGABLE entries, for example.

> Index: linux-hpe4/mm/Kconfig
> ===================================================================
> --- linux-hpe4.orig/mm/Kconfig 2010-11-15 17:13:02.443461606 +0800
> +++ linux-hpe4/mm/Kconfig 2010-11-15 17:21:05.535335091 +0800
> @@ -147,6 +147,21 @@
> depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
> depends on MIGRATION
>
> +config NUMA_HOTPLUG_EMU
> + bool "NUMA hotplug emulator"
> + depends on X86_64 && NUMA && MEMORY_HOTPLUG
> +
> + ---help---
> +
> +config NODE_HOTPLUG_EMU
> + bool "Node hotplug emulation"
> + depends on NUMA_HOTPLUG_EMU && MEMORY_HOTPLUG
> + ---help---
> + Enable Node hotplug emulation. The machine will be setup with
> + hidden virtual nodes when booted with "numa=hide=N*size", where
> + N is the number of hidden nodes, size is the memory size per
> + hidden node. This is only useful for debugging.
> +

That's clearly wrong, but I don't see why this needs to be a new Kconfig
option to begin with. Can't we enable all of this functionality by default
under CONFIG_NUMA_EMU && CONFIG_MEMORY_HOTPLUG?