2010-11-30 08:45:17

by Zheng, Shaohui

Subject: [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe

From: Shaohui Zheng <[email protected]>

Implement a debugfs interface, /sys/kernel/debug/mem_hotplug/probe, for memory
hotplug emulation. It accepts the same parameters as
/sys/devices/system/memory/probe.

Document the interface usage in Documentation/memory-hotplug.txt.

CC: Dave Hansen <[email protected]>
Signed-off-by: Shaohui Zheng <[email protected]>
Signed-off-by: Haicheng Li <[email protected]>
--
Index: linux-hpe4/mm/memory_hotplug.c
===================================================================
--- linux-hpe4.orig/mm/memory_hotplug.c 2010-11-30 14:15:23.587622002 +0800
+++ linux-hpe4/mm/memory_hotplug.c 2010-11-30 14:16:45.447622001 +0800
@@ -983,4 +983,35 @@
}

module_init(node_debug_init);
+
+#ifdef CONFIG_ARCH_MEMORY_PROBE
+
+static ssize_t debug_memory_probe_store(struct file *file, const char __user *buf,
+					size_t count, loff_t *ppos)
+{
+	return parse_memory_probe_store(buf, count);
+}
+
+static const struct file_operations memory_probe_file_ops = {
+	.write = debug_memory_probe_store,
+	.llseek = generic_file_llseek,
+};
+
+static int __init memory_debug_init(void)
+{
+	if (!memhp_debug_root)
+		memhp_debug_root = debugfs_create_dir("mem_hotplug", NULL);
+	if (!memhp_debug_root)
+		return -ENOMEM;
+
+	if (!debugfs_create_file("probe", S_IWUSR, memhp_debug_root,
+					NULL, &memory_probe_file_ops))
+		return -ENOMEM;
+
+	return 0;
+}
+
+module_init(memory_debug_init);
+
+#endif /* CONFIG_ARCH_MEMORY_PROBE */
#endif /* CONFIG_DEBUG_FS */
Index: linux-hpe4/Documentation/memory-hotplug.txt
===================================================================
--- linux-hpe4.orig/Documentation/memory-hotplug.txt 2010-11-30 14:15:23.587622002 +0800
+++ linux-hpe4/Documentation/memory-hotplug.txt 2010-11-30 14:40:27.267622000 +0800
@@ -198,23 +198,41 @@
In some environments, especially virtualized environment, firmware will not
notify memory hotplug event to the kernel. For such environment, "probe"
interface is supported. This interface depends on CONFIG_ARCH_MEMORY_PROBE.
+It can also be used for physical memory hotplug emulation.

-Now, CONFIG_ARCH_MEMORY_PROBE is supported only by powerpc but it does not
-contain highly architecture codes. Please add config if you need "probe"
+Now, CONFIG_ARCH_MEMORY_PROBE is supported by powerpc and x86_64, but it does
+not contain highly architecture-specific code. Please add config if you need "probe"
interface.

-Probe interface is located at
-/sys/devices/system/memory/probe
+Both sysfs and debugfs interfaces exist for memory probe. They are located at
+/sys/devices/system/memory/probe (sysfs) and /sys/kernel/debug/mem_hotplug/probe
+(debugfs). Either one can be used; they accept the same parameters.

You can tell the physical address of new memory to the kernel by

-% echo start_address_of_new_memory > /sys/devices/system/memory/probe
+% echo start_address_of_new_memory > memory/probe

Then, [start_address_of_new_memory, start_address_of_new_memory + section_size)
memory range is hot-added. In this case, hotplug script is not called (in
current implementation). You'll have to online memory by yourself.
Please see "How to online memory" in this text.

+The probe interface can accept flexible parameters, for example:
+
+Add a memory section (128M) to node 3 (booted with mem=1024m):
+
+ echo 0x40000000,3 > memory/probe
+
+To make it friendlier, it is also possible to add memory with
+
+ echo 3g > memory/probe
+ echo 1024m,3 > memory/probe
+
+Another format suggested by Dave Hansen:
+
+ echo physical_address=0x40000000 numa_node=3 > memory/probe
+
+You can also use the mem_hotplug/probe (debugfs) interface in the above examples.

4.3 Node hotplug emulation
------------

--
Thanks & Regards,
Shaohui


2010-12-02 00:57:43

by David Rientjes

Subject: Re: [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe

On Tue, 30 Nov 2010, [email protected] wrote:

> From: Shaohui Zheng <[email protected]>
>
> Implement a debugfs interface, /sys/kernel/debug/mem_hotplug/probe, for memory
> hotplug emulation. It accepts the same parameters as
> /sys/devices/system/memory/probe.
>

NACK, we don't need two interfaces to do the same thing.

2010-12-02 01:08:55

by Zheng, Shaohui

Subject: Re: [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe

On Wed, Dec 01, 2010 at 04:57:35PM -0800, David Rientjes wrote:
> On Tue, 30 Nov 2010, [email protected] wrote:
>
> > From: Shaohui Zheng <[email protected]>
> >
> > Implement a debugfs interface, /sys/kernel/debug/mem_hotplug/probe, for memory
> > hotplug emulation. It accepts the same parameters as
> > /sys/devices/system/memory/probe.
> >
>
> NACK, we don't need two interfaces to do the same thing.

You may not know the background: the sysfs memory/probe interface is a general
interface. Even though we have a debugfs interface, we should still keep it.

For test purposes, sysfs is enough; we created the debugfs interface according to
the comments from Greg & Dave.

--
Thanks & Regards,
Shaohui

2010-12-02 01:21:55

by David Rientjes

Subject: Re: [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe

On Thu, 2 Dec 2010, Shaohui Zheng wrote:

> > > From: Shaohui Zheng <[email protected]>
> > >
> > > Implement a debugfs interface, /sys/kernel/debug/mem_hotplug/probe, for memory
> > > hotplug emulation. It accepts the same parameters as
> > > /sys/devices/system/memory/probe.
> > >
> >
> > NACK, we don't need two interfaces to do the same thing.
>
> You may not know the background: the sysfs memory/probe interface is a general
> interface. Even though we have a debugfs interface, we should still keep it.
>
> For test purposes, sysfs is enough; we created the debugfs interface according to
> the comments from Greg & Dave.
>

I doubt either Greg or Dave suggested adding duplicate interfaces for the
same functionality.

The difference is that we needed to add the add_node interface in a new
mem_hotplug debugfs directory because it's only useful for debugging
kernel code and, thus, doesn't really have an appropriate place in sysfs.
Nobody is going to use add_node unless they lack hotpluggable memory
sections in their SRAT and want to debug the memory hotplug callers. For
example, I already wrote all of this node hotplug emulation stuff when I
wrote the node hotplug support for SLAB.

Memory hotplug, however, does serve a non-debugging function and is
appropriate in sysfs since this is how people hotplug memory. It's an ABI
that we can't simply remove without deprecation over a substantial period
of time and in this case it doesn't seem to have a clear advantage. We
need not add special emulation support for something that is already
possible for real systems, so adding a duplicate interface in debugfs is
inappropriate.

2010-12-02 01:50:57

by Zheng, Shaohui

Subject: Re: [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe

>
> I doubt either Greg or Dave suggested adding duplicate interfaces for the
> same functionality.
>
> The difference is that we needed to add the add_node interface in a new
> mem_hotplug debugfs directory because it's only useful for debugging
> kernel code and, thus, doesn't really have an appropriate place in sysfs.
> Nobody is going to use add_node unless they lack hotpluggable memory
> sections in their SRAT and want to debug the memory hotplug callers. For
> example, I already wrote all of this node hotplug emulation stuff when I
> wrote the node hotplug support for SLAB.
>
> Memory hotplug, however, does serve a non-debugging function and is
> appropriate in sysfs since this is how people hotplug memory. It's an ABI
> that we can't simply remove without deprecation over a substantial period
> of time and in this case it doesn't seem to have a clear advantage. We
> need not add special emulation support for something that is already
> possible for real systems, so adding a duplicate interface in debugfs is
> inappropriate.

So we should still keep the sysfs memory/probe interface without any modifications,
but for the debugfs mem_hotplug/probe interface, we can add the memory region
to a desired node. It is an extension of the sysfs memory/probe interface that can
be used for memory hotplug emulation. Do I understand it correctly?

--
Thanks & Regards,
Shaohui

2010-12-02 02:13:25

by David Rientjes

Subject: Re: [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe

On Thu, 2 Dec 2010, Shaohui Zheng wrote:

> so we should still keep the sysfs memory/probe interface without any modifications,
> but for the debugfs mem_hotplug/probe interface, we can add the memory region
> to a desired node.

This feature would be distinct from the add_node interface already
provided: instead of hotplugging a new node to test the memory hotplug
callbacks, this new interface would only be hotadding new memory to a node
other than the one it has physical affinity with. For that support, I'd
suggest new probe files in debugfs for each online node:

/sys/kernel/debug/mem_hotplug/add_node (already exists)
/sys/kernel/debug/mem_hotplug/node0/add_memory
/sys/kernel/debug/mem_hotplug/node1/add_memory
...

and then you can offline and remove that memory with the existing hotplug
support (CONFIG_MEMORY_HOTPLUG and CONFIG_MEMORY_HOTREMOVE, respectively).
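
As a rough illustration of the layout proposed above, a userspace test tool could derive the per-node trigger path from a node id as sketched below. The nodeN/add_memory names come from the proposal in this message and are not an existing kernel ABI as far as this thread shows; the helper name node_add_memory_path and the MEMHP_DEBUG_ROOT macro are made up for the example.

```c
#include <assert.h>	/* only needed if you check results with assert() */
#include <stdio.h>
#include <string.h>

/* debugfs directory used by the emulator patches in this thread. */
#define MEMHP_DEBUG_ROOT "/sys/kernel/debug/mem_hotplug"

/*
 * Hypothetical helper: build the proposed per-node add_memory path.
 * Returns 0 on success, -1 for a negative node id or a too-small buffer.
 */
static int node_add_memory_path(int node, char *buf, size_t len)
{
	int n;

	if (node < 0)
		return -1;

	n = snprintf(buf, len, MEMHP_DEBUG_ROOT "/node%d/add_memory", node);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* output error or truncated path */

	return 0;
}
```

Because the node id is carried by the path, the data written to the file would only need to be the physical address.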

2010-12-02 02:35:35

by Zheng, Shaohui

Subject: RE: [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe

Why should we add so many interfaces for memory hotplug emulation? If so, we should create both sysfs and debugfs
entries for an online node; we would be adding redundant code logic.

We need not make a simple thing so complicated. Simple is beautiful; I'd prefer to rename the mem_hotplug/probe
interface to mem_hotplug/add_memory.

/sys/kernel/debug/mem_hotplug/add_node (already exists)
/sys/kernel/debug/mem_hotplug/add_memory (rename probe as add_memory)

Thanks & Regards,
Shaohui



2010-12-02 23:34:09

by David Rientjes

Subject: RE: [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe

On Thu, 2 Dec 2010, Zheng, Shaohui wrote:

> Why should we add so many interfaces for memory hotplug emulation?

Because they are functionally different from real memory hotplug and we
want to support different configurations such as mapping memory to a
different node id or onlining physical nodes that don't exist.

They are in debugfs because the emulation, unlike real memory hotplug, is
used only for testing and debugging.

> If so, we should create both sysfs and debugfs
> entries for an online node; we would be adding redundant code logic.
>

We do not need sysfs triggers for onlining a node, that already happens
automatically if the memory that is being onlined has a hotpluggable node
entry in the SRAT that has an offline node id.

> We need not make a simple thing so complicated. Simple is beautiful; I'd prefer to rename the mem_hotplug/probe
> interface to mem_hotplug/add_memory.
>
> /sys/kernel/debug/mem_hotplug/add_node (already exists)
> /sys/kernel/debug/mem_hotplug/add_memory (rename probe as add_memory)
>

No, add_memory would then require these bizarre lines that you've been
parsing like

echo 'physical_addr=0x80000000 node_id=3' > /sys/kernel/debug/mem_hotplug/add_memory

which is unnecessary if you introduce my proposal for per-node debugfs
directories similar to that under /sys/devices/system/node that is
extendable later if we add additional per-node triggers under
CONFIG_DEBUG_FS.

Adding /sys/kernel/debug/mem_hotplug/node2/add_memory that you write a
physical address to is a much more robust, simple, and extendable
interface.
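
To make the difference concrete, the two input styles could be handled roughly as below. This is only a userspace sketch under stated assumptions: the key names are copied from the example line above, and parse_keyval and parse_addr are hypothetical helpers, not functions from the patch series.

```c
#include <assert.h>	/* only needed if you check results with assert() */
#include <stdio.h>
#include <stdlib.h>

/* Single-file variant: address and node must both be pulled out of one string. */
static int parse_keyval(const char *s, unsigned long long *addr, int *node)
{
	/* %llx accepts an optional 0x prefix. */
	if (sscanf(s, "physical_addr=%llx node_id=%d", addr, node) != 2)
		return -1;
	return 0;
}

/* Per-node variant: the node id comes from the path, so only an address is parsed. */
static int parse_addr(const char *s, unsigned long long *addr)
{
	char *end;

	*addr = strtoull(s, &end, 0);	/* base 0: accepts 0x... and decimal */
	if (end == s || (*end != '\0' && *end != '\n'))
		return -1;
	return 0;
}
```

The per-node form needs no format string and no agreement on key names, which is the simplification argued for above.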

2010-12-06 01:23:04

by Zheng, Shaohui

Subject: RE: [8/8, v6] NUMA Hotplug Emulator: implement debugfs interface for memory probe

After introducing the per-node interface, the following commands can be avoided:

echo '0x80000000,3' > /sys/kernel/debug/mem_hotplug/add_memory
echo 'physical_addr=0x80000000 node_id=3' > /sys/kernel/debug/mem_hotplug/add_memory

I have already implemented a draft in another thread and am waiting for comments. Thanks for the proposal.

Thanks & Regards,
Shaohui

